Designus Maximus Unleashed! (Unabridged & Unexpurgated)
By Clive "Max" Maxfield
Newnes: Boston, Oxford, Johannesburg, Melbourne, New Delhi, Singapore
Newnes is an imprint of Butterworth-Heinemann. Copyright © 1998 by Butterworth-Heinemann
A member of the Reed Elsevier group
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher.
Recognizing the importance of preserving what has been written, Butterworth-Heinemann prints its books on acid-free paper whenever possible. Butterworth-Heinemann supports the efforts of American Forests and the Global ReLeaf program in its campaign for the betterment of trees, forests, and our environment.

ISBN: 0-7506-9089-5

With regards to the book: All of the technical illustrations were created by Clive "Max" Maxfield. The caricatures in Chapter 26 are copyright Maxfield & Montrose Interactive Inc., Madison, AL.

With regards to the CD-ROM: The Beboputer Virtual Computer is copyright Maxfield & Montrose Interactive Inc., Madison, AL, USA (http://ro.com/maxmon). The BOOL Logic Synthesizer is copyright Alon Kfir, CA, USA. The MMLogic MultiMedia Logic Design System is copyright Softronics Inc., Northboro, MA, USA (www.softronix.com). The Analog Magic demo and all related files and datasheets are copyright IMP, Inc., San Jose, CA, USA (www.impweb.com).
The publisher offers special discounts on bulk orders of this book. For information, please contact:

Manager of Special Sales
Butterworth-Heinemann
225 Wildwood Avenue
Woburn, MA 01801-2041
Tel: 781-904-2500
Fax: 781-904-2620

For information on all Newnes electronics publications available, contact our World Wide Web home page at: http://www.bh.com

10 9 8 7 6 5 4 3 2 1
Printed in the United States of America
To my "little" brother Andrew James, one of the genuinely nice people who make the world a better place for the rest of us.
About the Author
Clive " M a x " Maxfield is 6'1" tall, outrageously handsome, English and proud of it. In addition to being a hero, trend setter, and leader of fashion, he is widely regarded as an expert in all aspects of electronics (at least by his mother). After receiving his B.Sc. in Control Engineering in 1980 from Sheffield Polytechnic (now Sheffield Hallam University), England, Max began his career as a designer of central processing units for mainframe computers. To cut a long stow short, Max now finds himself a Member of the Technical Staff (MTS) at Intergraph Computer Systems, Huntsville, Alabama, USA, where he gets to play with their high-performance 3D graphics workstations. In his s p a r e t i m e (Ha[), Max is a contributing editor to Electronic Design News (EDN) magazine and a member of the advisory board to the Computer History Association of California (CHAC). In addition to numerous technical articles and papers appearing in magazines and at conferences around the world, Max is also the author of the outstandingly successful Bebop to the Boolean Boogie (An Unconventional Guide to Electronics), and the co-author of Bebop
BYTES Back (An Unconventional Guide to Computers). O n t h e off-chance that you're still not impressed, Max was once referred to as an "industry notable" and a "semiconductor design expert" by someone famous who wasn't prompted, coerced, or remunerated in any way!
Acknowledgments
Just before we leap into the fray, I would like to take a few moments to thank my staunch friend and good companion Alvin for his help, support, and encouragement in this project, and I'd like to thank Alvin's wife Sue for letting him come round to play at my house. I'd also like to thank Sue for her many kindnesses (including, but not limited to, punishing her kids by making them weed my garden whilst I sat chained to the computer). Thanks also go to my ex-wife Steph for the long hours she spent proof-reading the final manuscript (any remaining errors are mine own).

While a large part of this book is devoted to my musings on miscellaneous topics, a key element is the accompanying CD-ROM, whose contents I hope you'll enjoy experimenting with for years to come. So my grateful appreciation goes to Maxfield & Montrose Interactive Inc., Madison, AL, USA (http://ro.com/maxmon) for letting me present a demo copy of the Beboputer™ Virtual Computer; Alon Kfir, CA, USA, for allowing me to showcase his BOOL Logic Synthesizer; George Mills of Softronics Inc., Northboro, MA, USA (www.softronix.com) for allowing me to feature his MMLogic MultiMedia Logic Design System; and Dave Gillooly of IMP, Inc., San Jose, CA, USA (www.impweb.com) for providing the material on Analog Magic and related datasheets.

Many of the topics presented in this book have previously appeared in a condensed and edited form as magazine articles, so I'd like to thank the following magazines for allowing me to reprint these articles in their original form: Electronic Design News (EDN) magazine (www.ednmag.com), Electronic Design (ED) magazine (www.penton.com/ed), and Electronic Design & Technology Network (EDTN) magazine (www.edtn.com). Last but certainly not least, my thanks to HighText Publications, Solana Beach, CA (www.hightext-publications.com) and to Doone Publications, Madison, AL (www.doone.com) for kindly permitting me to abstract and expand on certain topics from my earlier works.
Contents
Section 1" Capriciously Cunning Software Chapter 1: Who was that Masked Man? ....................................................................................... 3 Chapter 2: The
BeboputerTM Virtual
Computer ................................................................ 15
Chapter 3: B O O L Logic Synthesis .................................................................................................. 27
Chapter 4: MMLogic Multimedia Logic Design System ........................................... 33
Section 2- Design Capture, Synthesis, and Simulation Chapter 5: Capturing and Synthesizing a Design ............................................................ 39 Chapter 6: Digital Logic Simulation ................................................................................................ 57 Chapter 7: Static and Dynamic Timing Analysis .............................................................. 73 Chapter 8: Digital Fault Simulation ................................................................................................. 85 Chapter 9: Digital Simulation Logic Value Systems ...................................................... 97 Chapter 10: Unknown X Values ...................................................................................................... 107 Chapter 11: Analog and Mixed-Signal Simulation ....................................................... 119
Section 3: Binary Arithmetic Chapter 12: A minus B = A + NOT(B) + 1 ......................................................................... 135 Chapter 13: Binary Multiplication .................................................................................................. 155 Chapter 14: Binary Division ................................................................................................................. 167
Section 4: Design Musings Chapter 15: State Machines ................................................................................................................. 183
Chapter 16: Asynchronous Design ............................................................................................... 199 Chapter 17: Linear Feedback Shift Registers .................................................................... 219 Chapter 18: Designing a Three-Phase Clock .................................................................... 233 Chapter 19: Field Programmable Devices ........................................................................... 245 Chapter 20: Library of Parameterized Modules ............................................................. 275
Introduction
Section 5: Miscellaneous Musings Chapter 21: Reconfigurable Logic ................................................................................................287 Chapter 22: Genetic Algorithms ......................................................................................................297 Chapter 23: Reed MUller Logic .........................................................................................................307 Chapter 24: Testing RAMs and ROMs ......................................................................................317 Chapter 25: Deep Submicron Delay Effects ......................................................................331 Chapter 26: Logic Diagrams and Machines ........................................................................353 Chapter 27: Transistors of the Future ........................................................................................361 Chapter 28: Protein-Based Switches and Nanotechnology .............................. 371 Chapter 29: Interrupts and Interrupt Handling ............................................................... 381 Chapter 30: A Letter from America .............................................................................................401
Appendix A: Installing your B e b o p u t e r .............................................................................417 Appendix B: B e b o p u t e r Addressing Modes and Instruction Set ....... 421 Index ............................................................................................................................................................................... 429
The EDN Series for Design Engineers

A. Kularatna, Power Electronics Design Handbook: Low-Power Components and Applications
EDN Design Ideas (CD-ROM)
C. Schroeder, Printed Circuit Board Design Using AutoCAD
J. Lenk, Simplified Design of Voltage-Frequency Converters
J. Lenk, Simplified Design of Data Converters
F. Imdad-Haque, Inside PC Card: CardBus and PCMCIA Design
C. Schroeder, Inside OrCAD
J. Lenk, Simplified Design of IC Amplifiers
J. Lenk, Simplified Design of Micropower and Battery Circuits
J. Williams, The Art and Science of Analog Circuit Design
J. Lenk, Simplified Design of Switching Power Supplies
V. Lakshminarayanan, Electronic Circuit Design Ideas
J. Lenk, Simplified Design of Linear Power Supplies
M. Brown, Power Supply Cookbook
B. Travis and I. Hickman, EDN Designer's Companion
J. Dostal, Operational Amplifiers, Second Edition
T. Williams, Circuit Designer's Companion
R. Marston, Electronic Circuits Pocket Book: Passive and Discrete Circuits (Vol. 2)
N. Dye and H. Granberg, Radio Frequency Transistors: Principles and Practical Applications
Gates Energy Products, Rechargeable Batteries: Applications Handbook
T. Williams, EMC for Product Designers
J. Williams, Analog Circuit Design: Art, Science, and Personalities
R. Pease, Troubleshooting Analog Circuits
I. Hickman, Electronic Circuits, Systems and Standards
R. Marston, Electronic Circuits Pocket Book: Linear ICs (Vol. 1)
R. Marston, Integrated Circuit and Waveform Generator Handbook
I. Sinclair, Passive Components: A User's Guide
Chapter 1: Who was that Masked Man?

"And now my friends, the show that never ends,"

In this chapter you will discover:
It's a Funny Old World ... 4
It all Seemed Rather Pointless Really ... 4
Beware the Jabberwock my Son ... 5
International Computers: Shifters/Rotators ... 6
Cirrus Designs: Test and Simulation ... 8
Intergraph Corporation: EDA and Back to Computers ... 10
Articles, Conferences, and Books ... 11
So Here We Are ... 14
It's a Funny Old World

It really is a funny old world when you come to think about it. Deep in the mists of time when I was but a junior engineer, I occasionally thought that it would be really cool to write technical articles and have them published in electronics magazines. Only two things stood in my way:

a) I didn't know anything worth writing about.
b) Magazines weren't interested in me because I hadn't written anything before.

Over the years I've come to realize that point (a) doesn't necessarily hold you back as much as one might think, but point (b) is certainly something of a show-stopper. As time passed I began to have the odd article published (and some of them were really odd), until one day it appeared as though I'd crossed some undocumented article-producing threshold known only to magazine editors. Instead of being obliged to grovel with editors to have an article published, I suddenly found myself having to beat them off with a stick. Now they call me pleading for an article on "xyz," and happily ignore any protestations on my part that I know nothing whatsoever about the subject in hand. Why should this be? I don't know. That's just the way it is.
It all Seemed Rather Pointless Really

When I started the English equivalent of High School at the tender age of 11 (Abbydale Grange Grammar School, Millhouses, Sheffield, England, on the off-chance anyone is interested) I wasn't very good at English football (what the Americans would call soccer). In addition to my seeming to have two left feet, no one ever bothered to explain the rules to me. So whenever I actually gained command of the ball, the referee would shout something like "You're offside you dingbat," and then they took the ball away from me again. It all seemed rather pointless really. On the bright side, one of my friends (who also didn't like football) was really rather clever. Even for a precocious 11-year old, Carl Clements,[1] for that was his name, knew an awful lot about electronics. So while our compatriots were making life miserable for the old pig's bladder, Carl would be explaining how to make an audio amplifier using just one transistor (not a high-fidelity unit you understand), describing the role of the bias voltage and drawing circuit diagrams in the playground dust.
[1] Carl, long time no see. If you happen to read this, drop me an email at [email protected].
The years rolled by and we moved on to more esoteric topics such as bio-feedback. By the age of 14 we were huddled in our corner of the playground designing brainwave amplifiers that could filter out alpha waves, amplify them, and feed them back into headphones as pink noise.[2] Sad to relate, these devices never actually worked. The theory was sound, but we didn't have any test equipment (in fact we didn't have much more than a soldering iron between us) and we didn't have any training in testing circuits and resolving faults. Still later we migrated into simple digital logic and then progressed to creating rudimentary robots that rolled around the house and plugged themselves into power outlets (on occasion they even managed to do this without vaporizing every fuse in the house). Looking back I guess all of this might seem pretty weird. If I were to observe two 11-year olds of today sketching circuit diagrams in the playground while their friends played games, I'd probably pause to ponder what cruel quirk of fate had dealt them such a pathetic existence. But we actually had a great deal of "fun in the sun," and it all seemed perfectly normal at the time.
Beware the Jabberwock my Son
In my last year at High School I was summoned into the august presence of the Careers Officer, who loomed over the top of his desk and spake unto me thusly: "Do you have the faintest clue what you want to be when you grow up you sniveling little toad?" (Truth to tell, this master of the polyester fashion statement didn't actually voice the words "you sniveling little toad," but it was certainly implied in his sneer and his demeanor.)[3] When I replied: "I'm going to be an electronics engineer," he responded with the following carefully considered gems of advice, which I still treasure to this day: "Good .... you can go now .... send in the next one on your way out." And so it was that I meandered my way into the electronics course at Sheffield Polytechnic (now Sheffield Hallam University). I hated it. We seemed to spend an inordinate amount of time calculating the angular momentum of electrons and listening to lecturers spout forth on similar apparently meaningless drivel. Not that I have anything against electrons having angular momentum you understand, or even that I particularly dislike calculating what it is. But once you've calculated one electron's angular momentum you've done the lot as far as I'm concerned, so this sort of thing became somewhat tiresome after the third week.

[2] "Pink Noise" is sort of like "White Noise," but it's a bit more colorful (pun intended). For example, the sound of falling rain might be equated to white noise, while the "chuff ... chuff ... chuff" of an old-fashioned steam train would be a form of pink noise.
[3] One gets to be more charitable over the years .... perhaps his surly disposition was caused by his underwear riding up or some similar such happenstance.
Knowing that there are a lot of electrons in the universe,[4] and with a growing certainty that I would be required to calculate the angular momentum for each and every one of them, I began to desperately search for a way out. After a little rooting around I was amazed to discover that the "Electrical & Electronics" course was one of the last places you wanted to be if you actually wanted to build electronic devices and (dare we say it) have a bit of fun during the process. So after my first year I transferred to a 4-year degree course in Control Engineering, which turned out to be just what I'd wanted to do all along. The Control Engineering course was based on a core of mathematics and control theory, with surrounding subjects in electronics, mechanics, and hydraulics and fluids. The entire course was geared up to building things and controlling them, from factory systems to robots to fly-by-wire aircraft. Also, along the way we were introduced to computers. Of course, in those days we wrote programs in the FORTRAN language (ugggg, arrggghhh), entered them on punched cards, walked through wind, rain, and sleet to the computer building, waited a week to get the results (which were invariably along the lines of "Syntax error on line 2, missing comma."), and then started the whole process again. In fact when you finally got a program to work, you usually couldn't remember what had prompted you to write it in the first place!
International Computers: Shifters/Rotators
After graduating with a B.Sc. in Control Engineering in the Summer of 1980, my first position ("Look Mom, a real job!") was with International Computers Limited (ICL) in Manchester, England. At that time ICL made honking big mainframe computers (they probably still do as far as I know), and I was hired as a member of one of their Central Processing Unit (CPU) design teams. It didn't take me long to discover that little of what I'd studied at college had much bearing on my new job.[5] I also quickly realized that subjects which had appeared easy in the classroom (when the lecturer was doing the bulk of the work) were somewhat trickier when you had to do them in earnest. Fortunately, ICL had a really good policy whereby junior woodchucks like myself were partnered with more experienced team leaders. I was lucky in this regard to be assigned to one Dave Potts, who taught me far more than I'm sure he ever realized. Working under Dave was somewhat frustrating, however, in that he would never answer even the simplest question directly; for example:

[4] Unless there's only one electron that moves around jolly quickly and pops in and out of existence hither and thither, as was once postulated by the legendary physicist Richard Feynman.
[5] To be fair, almost everything I learned at college has subsequently come in handy at one time or another.
Max: Hey Dave, what time is it?
Dave: Where is the sun in the sky, which way is the wind blowing, what side of the tree does the moss grow, how ......
To cut a long story short, Dave's policy was to lead you through a series of questions, thereby guiding you to discover the answers to life, the universe, and everything for yourself. In many ways this proved to be a superb learning experience (but you quickly learned not to ask Dave for the time).

My first task at ICL was to design a 128-bit barrel shifter and rotator; that is, a unit that could shift or rotate the contents of a 128-bit bus by any amount from 1 to 128 bits in a single clock cycle. Dave informed me that the project called for this unit to be implemented using eight Application-Specific Integrated Circuits (ASICs), each of which would handle a 16-bit chunk of the data bus. Furthermore, all of the ASICs were to be functionally identical in order to keep the project within budget. Initially my task didn't appear to be particularly strenuous. The only tricky details involved inserting the appropriate values into the most- and least-significant bits for right and left shifts, respectively (and handling logical versus arithmetic shifts of course). The solution was to employ two of the pins on each ASIC to act as a device ID; that is, these two pins could be presented with one of four binary patterns (00, 01, 10, and 11), thereby instructing the device as to its position in the chain. For example, the six devices forming the middle of the chain could have these two pins driven to 00, the device on the left-hand end of the chain could have its pins driven to 01, and the device on the right could be driven to 10.
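By way of illustration only (this sketch is mine, written for this edition, and bears no resemblance to the original ASIC implementation), here's what such a unit computes, modeled in C as a 128-bit rotate-left built from two 64-bit halves; the real hardware also performed shifts, and spread the work across the eight identical 16-bit slices described above:

    #include <stdint.h>

    /* Behavioral model only: a 128-bit rotate-left using two 64-bit halves.
       The real unit was partitioned across eight identical 16-bit ASICs,
       each locating itself in the chain via its two device-ID pins. */
    typedef struct { uint64_t hi, lo; } u128;

    u128 rotl128(u128 x, unsigned n)
    {
        n &= 127;                            /* a rotate by 128 is a no-op   */
        if (n == 0) return x;
        u128 r;
        if (n < 64) {
            r.hi = (x.hi << n) | (x.lo >> (64 - n));
            r.lo = (x.lo << n) | (x.hi >> (64 - n));
        } else {                             /* swap halves, rotate the rest */
            n -= 64;
            r.hi = x.lo;
            r.lo = x.hi;
            if (n) {
                uint64_t t = r.hi;
                r.hi = (r.hi << n) | (r.lo >> (64 - n));
                r.lo = (r.lo << n) | (t  >> (64 - n));
            }
        }
        return r;
    }

A shift is the same juggling act, except that zeros (or copies of the sign bit, for an arithmetic shift) are fed in at one end while bits fall off the other.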
When I'd completed this part of the exercise, Dave deigned to inform me that he'd neglected one slight detail, which was that in addition to shifting all 128 bits, the shifter/rotator also had to be capable of shifting only the least-significant 64 bits or the least-significant 32 bits. OK, my task had just become a tad trickier, but it still wasn't all that bad and a few days later I returned with my latest offering. "Ah, Ha!" said Dave, "now we're getting there, but in addition to binary shifts, this device also has to be able to handle Binary-Coded Decimal (BCD) data!" And so it went. Every time I finished a problem another feature would be added to my portion of the project. In reality, of course, the specification already contained all of these details, but if I'd been presented with the full requirements on my first day, my brains would have leaked out of my ears and I would have been reduced to a gibbering wreck. These days engineers have access to some very sophisticated tools, such as schematic capture (Chapter 5), logic simulation (Chapter 6), timing analysis (Chapter 7), and so forth. But in those days of yore, the best one could hope for
was a good data book, a sharp pencil, a lot of paper, and a pocket calculator (if you were lucky). The project specifications called for a certain clock frequency, from which one could derive the maximum permitted delays across my devices. My next task was to calculate the input-to-output delays by hand, then juggle the logic gates inside the device until I'd achieved my target. Last but not least, each device could only accommodate a certain number of logic gates and registers, and I was dreadfully over budget. Unfortunately, we didn't have access to anything like today's logic synthesis technology (Chapters 3 & 5), but we had something far better! One of the members of our team was a 30-year-old Chinese engineer, whose forte was logic minimization and optimization. I'm sad to say that I no longer remember the lad's name, but this guy was incredible (a grand master of logic) and I'd put him up against today's automatic tools without any hesitation at all.
Cirrus Designs: Test and Simulation

My mother was delighted when I was accepted by ICL, because she could see a steady progression through the corporate hierarchy, yay even unto the highest pinnacles of power (my mother had remarkably good eyesight in those days). So you can only imagine her surprise and delight when, after just one year, I departed ICL to join a start-up company called Cirrus Designs. Upon discovering that my new employer had only been incorporated for a few days, my mother became full of doom and despondency (and she wasn't one to suffer quietly or alone; sorry Dad). Strange to relate, she now claims that this was the best move she ever told me to make, and when I question her recollection of events she stridently informs me that: "A mother remembers these things!" and who amongst us can argue with that.[6]

I was the sixth person to join Cirrus Designs, and the other five told me that I was a lucky swine because I'd joined the day after the desks and chairs had arrived. Our sole computer facilities comprised a PDP-11/23, with two terminals and keyboards between us. The hard disk only had a single directory that we all had to share,[7] and each of us was allocated only 2000 blocks of disk space (where each block held 1024 characters). It seemed like a lot at the time!

[6] My mother's memory is so acute that she can remember things that haven't even happened yet!
[7] The last character of a file name indicated its owner. My letter was "M", so any file called xxxM.xxx was one of mine.
When Cirrus Designs commenced operations, we used to have a weekly "state of the company" meeting to see how well we were doing. Unfortunately, I think the founders had assumed that when you did some work for somebody they would pay you for it immediately, so it came as something of a shock to discover that it was common to wait for 90 days before you smelled any cash. Thus, week by week the company funds headed ever South, until at one point we were asked to provide our own pencils and paper. Then suddenly the graph flattened out ...... a little later it even began to rise ...... and ...... by golly we'd made it!

I've been very fortunate in my career to have been exposed to many aspects of electronics, but perhaps the most fortuitous was learning how to write test programs. The main service provided by Cirrus Designs in the early years (circa 1981 to 1984) was to write functional test programs for circuit boards designed by other companies. These programs were developed using GenRad's 2225 testers, into which one plugged the board to be tested.[8] A test program consisted of driving a pattern of logic 0s and logic 1s into the board's inputs, waiting for their effect to propagate through the board, and checking that the corresponding pattern of logic 0s and logic 1s on the board's outputs matched an expected response.[9] Each pattern applied to the inputs was called a test vector, and a test program could consist of thousands (or tens of thousands) of such vectors.

My personal view is that I've never received better training than learning to write these test programs. To understand why this should be so, you have to realize that all one was provided with was a "good" circuit board (which often wasn't), and a set of "good" schematics (which often weren't). It wasn't unknown (he said with a wry grin) for the schematics to actually correspond to an earlier or later version of the board, which made life interesting to say the least. First you had to look at the schematics and try to decide exactly what it was that the board was intended to do, then you had to write a test program. Next you ran the test program while applying a fault to the board (such as shorting a device's output pin to ground) in order to ensure that the program could detect and resolve the fault, and you repeated this for every possible fault that could occur on the board. So in addition to reverse-engineering someone else's logic (which is an educational experience in itself), you also discovered portions of the design that were effectively untestable, and you did this to boards designed for myriad applications by different engineers in diverse companies using widely different design styles. As I said, this was the best training in digital logic one could hope to receive.
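To give a feel for what one of those test programs boiled down to, here's a minimal C sketch of my own devising (it's not GenRad 2225 code, and the trivial inverter "board" is invented purely for the illustration):

    #include <stdint.h>
    #include <stdio.h>

    /* The test-vector idea: drive a pattern of 0s and 1s onto the board's
       inputs, let the effects propagate, then compare the outputs against
       the response a known-good board would give. */
    typedef struct {
        uint8_t inputs;    /* pattern driven onto the board's inputs */
        uint8_t expected;  /* response expected from a good board    */
    } test_vector;

    /* Stand-in for the board under test: this one just inverts its inputs. */
    static uint8_t board_under_test(uint8_t in) { return (uint8_t)~in; }

    int main(void)
    {
        const test_vector vectors[] = {
            { 0x00, 0xFF },
            { 0xFF, 0x00 },
            { 0xA5, 0x5A },
        };
        const int n = (int)(sizeof vectors / sizeof vectors[0]);

        for (int i = 0; i < n; i++) {
            uint8_t out = board_under_test(vectors[i].inputs);
            if (out != vectors[i].expected) {
                printf("vector %d failed: got $%02X, expected $%02X\n",
                       i, out, vectors[i].expected);
                return 1;  /* a real program would go on to resolve the fault */
            }
        }
        printf("all %d vectors passed\n", n);
        return 0;
    }

A real program also had to detect and resolve deliberately injected faults, which is where the thousands of vectors (and the real work) came in.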
After a few years of test programming, I wandered into my boss's office to find him pondering a request to write some digital simulation models. The problem was that none of us had actually seen a digital logic simulator "in the flesh," but the remuneration (for the company) was particularly attractive. I remarked that this project looked like it could be fun, and he promptly told me that I'd got the job. I firmly believe that everyone is really good at something, but some people wander through life without ever discovering any of their particular callings. In my case, one of my vocations turned out to be digital simulation, which I took to like a duck to water. Over the next few years I wrote simulation models for everything from ASIC cell libraries to microprocessors, and I also started to pen white papers specifying possible enhancements to the logic, timing, and fault simulators. From there I branched out into giving advanced simulation modeling classes all over Europe, Asia, and America. This served to place me in contact with integrated circuit manufacturers, from whom I collected all sorts of useful nuggets of knowledge about exotic low-level delay effects (Chapter 25).

[8] Cirrus Designs subsequently became a wholly owned subsidiary of GenRad, Concord, MA, USA.
[9] If the board contained register elements or memory devices, then a pattern appearing on the board's outputs didn't necessarily correspond to the current pattern on its inputs, but was instead a function of one or more preceding patterns.
Intergraph Corporation: EDA and Back to Computers

After nine years with Cirrus Designs (the latter two of which were spent on the South coast of England with our sister company, Cirrus Computers, who specialized in simulation technology), I accepted a position with the electronics division of Intergraph Corporation, Huntsville, AL, USA. I spent a happy first year plunging myself into myriad aspects of electronic design capture, and also writing specifications for a futuristic digital simulation language and simulator. Sad to say this language and simulator were never to see the light of day, because Verilog and VHDL were fast becoming industry standards, and it soon became apparent that the last thing the world needed was "Yet another language and simulator," irrespective of how good they may have been. However, it was during this time that I came into contact with Alon Kfir (the creator of the BOOL Logic Synthesis utility presented in Chapter 3). Alon, who is an expert in many aspects of electronics and mathematics, was based in California, so we ended up spending countless hours on the phone debating fine points of logic value systems (Chapter 9) and simulation algorithms.

Following this first year, my life took yet another turn. Recognizing my expertise in digital simulation, those who strode the corridors of power and donned the undergarments of authority at Intergraph Electronics resolved that I was destined to become the analog and mixed-signal marketing manager, so I was thrust into a whole new ball game (life is never boring as long as you're learning something new). After five years with the electronics division (which subsequently became known as VeriBest Inc.), I was given the opportunity to join Intergraph Computer
Systems (ICS), where I remain to this day. Computers have sure come a long way since my days designing mainframes and life is pretty exciting at ICS, because we specialize in making state-of-the-art single and multiple Pentium® Pro and Pentium II® systems running the Windows® 95 and Windows NT® operating systems. Furthermore, Intergraph's RealiZm graphics cards are acknowledged as providing the fastest and most powerful 3D graphics available on Intel/Windows NT platforms. These cards are employed for the highest-performance 3D simulations and virtual reality visualizations (and I get to play with them!). As I pen these words I find it difficult to believe that I've now spent seven years at Intergraph, because it really seems as though I arrived in America only a few weeks ago.
Articles, Conferences, and Books

At some stage after commencing employment with Intergraph, I was fortunate enough to have an article accepted by a magazine (I forget which one now), followed by another, and another, and ..... suddenly the flood gates opened, and I was being published in one magazine or another almost every month. The combination of these articles and my job responsibilities also caused me to be invited to present papers at conferences, first in America with the Printed Circuit Design and the Analog and Mixed-Signal Design conferences, and subsequently in Korea, Taiwan, and China[10] with the Electronics Design Automation & Test (EDA&T) conference, of which I also served as a member of the technical advisory board for two years.

And then there are my books. It's strange really, because I had no conception as to what life had in store for me on this front. This all came about when I was lying in bed planning a trip down to a local bookstore one Saturday morning in the autumn of 1992. For reasons unknown I decided that it would be really cool to walk into a bookshop and see a book I'd written on the shelf ...... so I wrote one. (It sounds easy if you say it quickly, but it actually took two years of evenings and weekends.) I called my first book Bebop to The Boolean Boogie (An Unconventional Guide to Electronics),[11] because it was my book and I could call it what I liked! It's safe to say that Bebop is somewhat unusual, because it starts at ground zero with "What is an atom and what is an electron," and ends up with such esoteric topics as holographic optical interconnects and nanotechnology. Along the way we explore subjects like the secrets behind

[10] I got to see the Forbidden City in Beijing and walk on the Great Wall of China, both of which are really, truly amazing.
[11] HighText Publications (www.hightext-publications.com), ISBN 1-878707-22-1, and for your shopping convenience there's an order form for Bebop to the Boolean Boogie in the back of this book (if I might make so bold).
musical socks (the ones that your aunts buy you for Christmas that play an immediately annoying tune when you squeeze them; the socks, not the aunts), and there's a rather spicy recipe for a "No-Holds-Barred Seafood Gumbo" (this isn't your mother's electronics book). Once I'd penned the words "The End," I started to think about looking for a publisher. I'd heard some horror stories about this part of the process, but it turned out to be less of a problem than I'd feared. A magazine editor friend of mine suggested I call HighText Publications in Solana Beach, California, because he'd been hearing good things about them. So I called Carol Lewis, the president of HighText, who said "Send the contents list and a couple of example chapters." I posted them on Tuesday, they arrived Thursday morning, and I got a call Thursday lunchtime saying: "We're sending you a draft contract." So there we were. Carol later told me that prospective American authors typically approach a publisher saying: "I've got a great idea for a book, how much of an advance will you give me to write it?", while Europeans almost invariably write their books first and then approach the publisher saying: "I've written this book, would you like to publish it?"
Bebop hit the streets in February 1995. As soon as our local bookstore (Madison Books and Computers on Old Madison Pike; a great store, highly recommended) got some copies in, they called me and I rushed down with my daughter, Lucie. We stood there for a while looking at my books on the shelf, then we went to McDonalds for a burger and some fries. So that was that; I'd done what I set out to do and it was all over ...... or so I thought. What I didn't know was that this book was to open all sorts of doors to future adventures, such as Electronics Design News (EDN) offering me a regular column, and eventually making me a contributing editor.[12]

Writing Bebop was just about the hardest thing I'd ever done up to that time (Ah, the loneliness of the long-distance writer), and I swore that I'd never attempt another book in my life. Plus I'd hardly seen any television for two years, so I prepared to put some hours in as a couch potato. Have you watched television recently? The vast majority of it is absolute dross! After a couple of weeks I began to fear for my sanity, not the least that I could hear the computer calling plaintively to me from my study. Also I'd had some ideas for a new project, but I didn't want to suffer through the writing of another book alone (suffering loves company). So I approached a chum of mine called Alvin and tempted him with tall tales of how much fun writing a book could be (he's older and wiser now ...... I should bill him for the education).
[12] Contrary to popular belief, being a "Contributing Editor" doesn't imply a salaried position. It's really an honorary title, but it has made my mother very proud, so that's all right!
Two and a half years later we finished Bebop BYTES Back (An Unconventional Guide To Computers).[13] One of the cunning things about this book is that it's accompanied by a CD-ROM (for Windows 95) containing the fully-functional Beboputer™ Virtual Computer. In fact the Beboputer is the topic of the next chapter, and a demo copy of the Beboputer is included on the CD-ROM accompanying this book.

Once we'd handed Bebop BYTES Back over to the publisher,[14] I was prepared to take life easy for a while, by which I mean I'd put in my ten hours a day at Intergraph and spend my evenings reading books and doing not much of anything at all. Sadly this was not to be, because the opportunity came my way to collect a number of my magazine articles and miscellaneous musings together into the tome you're reading as we speak. The really tempting aspect about this third book was that I could present my articles in their original form. One of the problems with magazines is that they each have an internal style, which typically excludes humor and the use of first person pronouns. For example, an article I submit to a magazine might commence as follows:

"When we were younger we thought we knew everything there was to know about binary arithmetic, but as the years roll by we've come to realize that we haven't got a clue."

The idea I'm trying to convey here is that I'm "one of the lads," in the trenches, up to my armpits in mud, standing side-by-side with the reader. But the way this reemerges after editing is quite likely to be along the lines of the following:

"When you were younger you thought you knew everything there was to know about binary arithmetic, but as the years roll by you've come to realize that you haven't got a clue."

As you can see, this presents the somewhat different impression that I'm stood on top of the pitcher's mound pontificating furiously and talking down to everyone. This isn't to say that I'm against editing in general ...... I just don't like it being done to me! So the chance to be able to write a book in my own words without fear of having them mutate into another form while my back was turned was just too tempting to resist.

[13] Doone Publications (www.doone.com), ISBN 0-9651934-0-3. Also, it may not surprise you to learn that there's an order form for Bebop BYTES Back in the back of this book.
[14] The publisher of Bebop BYTES Back is Doone Publications, but this is no reflection whatsoever on HighText who published Bebop to the Boolean Boogie. Doone is owned by friends of ours who live in the immediate vicinity, and HighText have been strongly supportive and offered invaluable advice throughout.
So Here We Are

If I were a betting man, I'd take a stab in the dark and guess that you think I'm at the beginning of this project as I write these words. Ah, the innocence of youth. In fact I've already finished the rest of the book (that is, the other twenty-nine chapters, two appendices, acknowledgments, and suchlike), and as soon as I pen these final words I'm going to stick the whole thing in the post to the publisher, then wander out to a purveyor of fine refreshments to blow the froth off a few beers with some friends.

So why have I left the first chapter until last? Well for one thing I find the technical stuff easier to write than my life story. But perhaps more importantly, one of the capriciously cunning pieces of software I was going to feature in this first section turned out to be not quite so capriciously cunning as I'd first thought, so I decided to chuck it out. But that left me with the problem that the other chapters reference each other, and there's no way I could face going through the whole thing changing all of these references again (plus thirty chapters seemed to be an awfully nice round number, much better than twenty-nine). So I decided to lose the Preface and to write this chapter instead (hey, it's an engineering solution, OK?).

And that's all there is to it really. Of course I've condensed the tale somewhat, omitting the wild parties, the dancing girls, the walrus, the trips to exotic lands, and the months where nothing much happened at all (where this latter state of affairs was more common than not). I've also tended to "accentuate the positive and eliminate the negative," as they say; for example, I haven't once mentioned the fact that the only consistent way I've found of dismounting the unicycle in the house is to slam headfirst into the pantry door, then slide down the wall getting splinters up my nose (Ooops, you tricked it out of me). So I'll close right here and now before we get maudlin, and leave you free to root through the rest of my meandering musings to your heart's content.

With my very best wishes
Clive "Max" Maxfield (April 1st 1997 - June 30th 1997).
Chapter 2: The Beboputer™ Virtual Computer

"When is a computer not a computer?"

In this chapter you will discover:
Pedagogical and Fantasmagorical ... 16
For Your Delectation and Delight ... 17
It's Almost Time to Rock and Roll ... 17
The Switch Panel ... 18
The Hex Keypad and Memory Walker Display ... 21
The CPU Register Display ... 24
Multimedia Introductions and the Web ... 24
Other Cool Stuff ... 25
Pedagogical and Fantasmagorical

In November 1994, my chum Alvin agreed to collaborate with me on a book on computers. We told Alvin's wife, Sue, that it would only take about a year and that it wouldn't occupy our every waking moment (the frightening thing is that we believed this to be true!). Two and a half years later, after spending almost every evening and weekend[1] slaving over hot computer keyboards, we staggered out of our respective studies with stooped shoulders and the finished manuscript clutched firmly in our shaking hands. That book, Bebop BYTES Back (An Unconventional Guide to Computers), was one of the most demanding projects I've ever undertaken, but I'm forced to admit that it's also a project of which I'm inordinately proud. To complement the book, we created what came to be known as "The pedagogical and fantasmagorical Beboputer™ Virtual Computer." To cut a long story short, we designed a simple microprocessor with an easily understood instruction set. Next we designed and implemented a computer based on our microprocessor, and we christened this system the Beboputer (pronounced "Bee-bop-you-ter"). Now comes the capriciously cunning part of our tale, because instead of constructing the Beboputer in hardware (out of silicon chips), we implemented it as a virtual machine in software.

To a large extent we treated the Beboputer as a hardware project. We'd have regular design review meetings in which we'd define new devices to be plugged into the Beboputer's input or output ports. For example, in the case of our virtual QWERTY keyboard, we created a complete specification describing the way in which the keyboard would capture key codes in an internal latch, how this latch would be automatically cleared when the Beboputer read its contents via an input port, what would happen when the key was pressed (on our virtual keyboard), and so forth. Then Alvin would wander off into the depths of his study to create the virtual device, while I commenced to document the first typewriter patent,[2] and then proceeded to describe the evolution of these devices via printing telegraphs and teleprinters into the computer keyboards of today. It sounds easy if you say it quickly!

[1] We both have full-time jobs in the day.
[2] The first patent for a typewriter was granted by the grace of Queen Anne to the English engineer Henry Mill in 1714.
For Your Delectation and Delight

It's important to note that the Beboputer is to all intents and respects a real computer, for which one can create and run programs. Furthermore, in addition to a wealth of virtual input and output devices, the Beboputer is equipped with a smorgasbord of useful utilities, such as the CPU Register Display which lets you see the current contents of the registers and status flags within the CPU. In addition to describing where computers came from and how they work (including an in-depth investigation of the design of our virtual Central Processing Unit (CPU)), Bebop BYTES Back documents a series of step-by-step interactive laboratories to be performed on the Beboputer. But I can tell that your excitement is beginning to run wild and adrenaline is beginning to flood your nervous system, so I'll cut to the chase. For your delectation and delight I've included a demo copy of the Beboputer on the CD-ROM accompanying this book. Throughout the remainder of this chapter I'll walk you through a quick demonstration, then I'll leave you free to experiment with the little rascal to your heart's desire.
It's Almost Time to Rock and Roll

If you haven't already installed your Beboputer from the CD-ROM accompanying this book, do so now by following the instructions in Appendix A. (Note that the Beboputer is only certified for use with Microsoft Windows 95. Also note that your screen resolution should be 800 x 600 or higher.) Should you have any problems with the installation process, check out the Frequently Asked Questions (FAQ) section of our Beboputer-specific web pages at:
http://ro.com/maxmon/bebopbytes.htm
install xxx.dll because this file is a/ready in use," just select Ignore because if the
file is already in use you have already got it and you don't need to load it. Once you've completed the installation, ensure that the CD is still in the appropriate drive in your computer, then use S t a r t - > Programs-> Bebopui;er ->13eboput,er Computer to invoke your yew own pedagogical and fantasmagorica] Beboputer Virtual Computer. OK, you're now in the Beboputer's main project window. When you know what you're doing you can just kick off from here invoking whichever tools and utilities you wish, but to save time we've already created a project for you. Use the File - > Open Project command to invoke the project called ~iemoprj, which will hurl a number of devices onto your screen.
The Switch Panel

One of the more prominent devices is called the switch panel (Figure 2-1). In ye olden tymes computers didn't have typewriter-style keyboards, and the switch panel was one of the main methods for the operator to instruct the computer as to which operations it was to perform.
Figure 2-1: The Beboputer's switch panel

Running along the upper half of the switch panel are sixteen switches which correspond to the signals forming the Beboputer's 16-bit address bus. Similarly, the lower left-hand side of the switch panel contains eight switches which correspond to the Beboputer's 8-bit data bus. In the bottom right-hand corner is the main ON/OFF switch, which is used to power-up the Beboputer (we'll look at the other buttons in due course). Use your mouse to click the ON/OFF switch on the switch panel. The switch toggles, the display lights flash, and, if your home computer is equipped with a sound board, you'll hear the cooling fans begin to roar into action (if you didn't shell out the extra lucre for a sound board ...... we bet you're sorry now).

In addition to the main switch panel, there are a number of other devices on the screen, including an 8-bit switch input device (Figure 2-2). This device is plugged into one of the Beboputer's input ports at address $F000 (dollar "$" characters are used to indicate hexadecimal values). A switch in the down position corresponds to a logic 0, while a switch in the up position corresponds to a logic 1.

Figure 2-2: A simple 8-bit switch input device (the left-most switch corresponds to input bit 7, the right-most to input bit 0)
Similarly, we've plugged a dual decoded 7-segment LED display into one of the Beboputer's output ports at address $F023 (Figure 2-3). The most-significant 4 bits of the output port are used to drive the left-hand digit, while the least-significant 4 bits drive the right-hand digit. (Note that the device on your screen won't be displaying any numbers at this stage, because we haven't actually written any data to this output port.)

Figure 2-3: A dual decoded 7-segment display

What we're going to do is to enter a program that will perform an endless loop, reading the state of the switches on the 8-bit input device and writing this state to the dual 7-segment display. The flowchart for this program, along with its associated opcodes and data bytes, is shown in Figure 2-4.
$4000  $91   Load the accumulator from the address pointed to by the following two bytes
$4001  $F0   (which are $F0 and $00 = $F000)
$4002  $00
$4003  $99   Store the accumulator to the address pointed to by the following two bytes
$4004  $F0   (which are $F0 and $23 = $F023)
$4005  $23
$4006  $C1   Jump back to the address pointed to by the following two bytes
$4007  $40   (which are $40 and $00 = $4000)
$4008  $00

Figure 2-4: A simple "load and store" program

Note that the Beboputer's ROM occupies addresses $0000 through $3FFF, while its RAM occupies addresses $4000 through $EFFF (addresses higher than this are used for input/output ports and suchlike), so this program happens to commence at the first location in the RAM. From the flowchart we see that the first action is to load the accumulator with whatever value is represented by the 8-bit switch input device. In this flavor of a load, the $91 opcode at address $4000 instructs the CPU to load the accumulator with the contents of the memory location which is pointed to by the following two bytes. Also note that the Beboputer's designers (Alvin and myself) decided that addresses would be stored in memory with the most-significant byte first. Thus, the contents of the two bytes stored at addresses $4001 and $4002 ($F0 and $00, respectively) are understood by the CPU to represent the address $F000.
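As a quick worked example of my own (not from the original text): with the most-significant byte first, the pair $F0 $00 is evaluated as 0xF0 * 256 + 0x00 = 0xF000, which is precisely the address of the input port we're about to read.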
However, we're going to trick the CPU, because although it thinks it's reading from memory, the address we're using actually points to the input port into which we've connected the 8-bit switch device. Similarly, the $99 opcode at address $4003 instructs the CPU to copy the contents of its accumulator into memory location $F023 as specified by the address contained in the following two bytes (this is the address of the output port that drives the dual 7-segment display). Finally, the $C1 opcode at address $4006 instructs the CPU to jump back to the memory location specified by the address contained in the following two bytes. In the case of this program, we've decided that the jump instruction should cause the CPU to return to address $4000, which is both the beginning of the program and the beginning of the loop. Just to get the feel of how painful programming a computer used to be, we (well you actually) are going to employ the main switch panel to load this program as shown in the following sequence of illustrations.
1. Set the address switches to $4000 and the data switches to $91, then click the Enter button. This loads $91 into address $4000.
2. Set the address switches to $4001 and the data switches to $F0, then click the Enter button. This loads $F0 into address $4001.
3. Set the address switches to $4002 and the data switches to $00, then click the Enter button. This loads $00 into address $4002.
4. Set the address switches to $4003 and the data switches to $99, then click the Enter button. This loads $99 into address $4003.
5. Set the address switches to $4004 and the data switches to $F0, then click the Enter button. This loads $F0 into address $4004.
6. Set the address switches to $4005 and the data switches to $23, then click the Enter button. This loads $23 into address $4005.
7. Set the address switches to $4006 and the data switches to $C1, then click the Enter button. This loads $C1 into address $4006.
8. Set the address switches to $4007 and the data switches to $40, then click the Enter button. This loads $40 into address $4007.
9. Set the address switches to $4008 and the data switches to $00, then click the Enter button. This loads $00 into address $4008.

Don't forget that you can always correct any errors by overwriting a location's contents with new data. Once you've entered the program and you're happy that all is as it should be, return the address switches to the program's start address of $4000 (Figure 2-5), then click the Run button to let the Beboputer rip.

Figure 2-5: Start address

Now although not a lot seems to be happening, the Beboputer is actually working furiously, reading the values of the switches on the 8-bit input device and writing what it finds there to the 7-segment displays. Play around for a while toggling the switches on the 8-bit input device and watching the 7-segment displays respond.
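If you'd like to see the same endless loop expressed in conventional code, here's a minimal C sketch of my own (this is emphatically not how the Beboputer itself is implemented; read_switches() and the six-step cutoff are invented for the illustration) that mimics the CPU fetching and executing the nine bytes:

    #include <stdint.h>
    #include <stdio.h>

    static uint8_t mem[0x10000];               /* 64K address space      */

    /* Stand-in for the 8-bit switch device on input port $F000 */
    static uint8_t read_switches(void) { return 0x42; }

    int main(void)
    {
        /* the nine bytes we just entered via the switch panel */
        static const uint8_t prog[9] = { 0x91, 0xF0, 0x00,    /* load  [$F000] */
                                         0x99, 0xF0, 0x23,    /* store [$F023] */
                                         0xC1, 0x40, 0x00 };  /* jump   $4000  */
        for (int i = 0; i < 9; i++) mem[0x4000 + i] = prog[i];

        uint16_t pc  = 0x4000;                 /* program counter        */
        uint8_t  acc = 0;                      /* accumulator            */

        for (int step = 0; step < 6; step++) { /* a few trips round the loop */
            uint8_t  op   = mem[pc];
            /* addresses are stored most-significant byte first */
            uint16_t addr = (uint16_t)((mem[pc + 1] << 8) | mem[pc + 2]);
            switch (op) {
            case 0x91:                         /* load accumulator, indirect  */
                acc = (addr == 0xF000) ? read_switches() : mem[addr];
                pc += 3;
                break;
            case 0x99:                         /* store accumulator, indirect */
                if (addr == 0xF023)            /* output port: two 4-bit digits */
                    printf("display shows %X %X\n", acc >> 4, acc & 0xF);
                else
                    mem[addr] = acc;
                pc += 3;
                break;
            case 0xC1:                         /* unconditional jump          */
                pc = addr;
                break;
            }
        }
        return 0;
    }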
The Hex Keypad and Memory Walker Display

Now we could continue to experiment with the switch panel, but we've still got a lot of other things to do, so once your adrenaline rush has run its course, click the On/Off switch to power down the Beboputer, then click the dismiss button in the switch panel's upper right-hand corner. As you've discovered, entering programs using a switch panel is somewhat tedious. The next stage in home computing was to replace the switch panel with a simple keypad device. In the case of the Beboputer, we've supplied you with a hexadecimal keypad (Figure 2-6).
Use Setup -> Hex Keypad to invoke this device on your screen. (The keypad's buttons are labeled as follows: On = On/Off, Ad = Address, Da = Data, Clr = Clear, Rst = Reset, Ent = Enter, Stp = Step, Ru = Run.) Use your mouse to drag the hex keypad by its title bar to a clear area on your screen. Now use
Display -> Memory Walker to access a rather useful utility that lets us view the contents of the Beboputer's memory (Figure 2-7).
Figure 2-6: The Beboputer's hex keypad

Once again, use your mouse to drag the memory walker by its title bar to a clear area of your screen. Note that the memory walker on your screen is grayed out, thereby indicating that the Beboputer isn't powered up yet. By default, the memory walker starts at address $4000, which is the first location in the Beboputer's RAM. Use your mouse to drag the bottom edge of the memory walker down such that you can see locations $4000 through $400A as illustrated in Figure 2-7. Now click the ON switch on the hex keypad to power the Beboputer up again, and observe that the memory walker display clears to white to indicate that the Beboputer is ready for action. Note the $XX data values in the memory walker, which indicate that these RAM locations currently contain random, unknown values.
Figure 2-7: The Beboputer's memory walker display (locations $4000 through $400A, each currently showing $XX)

OK, we're going to need a small program to play with, such as the one we entered with the switch panel, which consisted of just nine bytes: $91, $F0, $00, $99, $F0, $23, $C1, $40, and $00. If we wished, we could use the hex keypad for this purpose (it would be one heck of a lot easier than was the switch panel). To start this process, all we would have to do would be to click the keypad's Ad (Address) button and enter the start address of $4000, then click the Da (Data) button and enter the first byte of data, which was $91. If we now clicked the Ent (Enter) button, this would load the $91 data into address $4000, automatically increment the address field to point to address $4001, and leave
the data field active, thereby allowing us to alternate between entering data and clicking the Ent (Enter) button. Alternatively, we could use the memory walker display to enter this data directly by simply double-clicking on whichever data field we wish to edit.
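For the terminally curious, the enter-and-auto-increment behavior just described is easy to spell out. The following is a minimal sketch of my own in Python (the KeypadLoader class is a hypothetical stand-in, not the Beboputer's actual software):

    class KeypadLoader:
        def __init__(self):
            self.memory = {}             # address -> byte
            self.address = 0x0000
        def set_address(self, addr):     # the Ad button followed by hex digits
            self.address = addr & 0xFFFF
        def enter(self, data):           # the Da digits followed by Ent
            self.memory[self.address] = data & 0xFF
            self.address = (self.address + 1) & 0xFFFF   # the auto-increment

    loader = KeypadLoader()
    loader.set_address(0x4000)
    for byte in (0x91, 0xF0, 0x00, 0x99, 0xF0, 0x23, 0xC1, 0x40, 0x00):
        loader.enter(byte)               # nine clicks of Ent load the whole program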
But both of these techniques would be boring. Instead, you're going to load this program from a virtual paper tape that we prepared for you while you weren't looking (Figure 2-8).

Figure 2-8: The Beboputer's paper tape reader/writer (with "Tape in" and "Tape out" spools)

This is a top-down view of the paper tape reader/writer's mechanism. Click the hex keypad's Ad (Address) button to make the address field active and enter address $0050. This is the address of the paper tape reader subroutine in our monitor program, which is stored in the Beboputer's ROM. Now click the keypad's Ru (Run) button to initiate the subroutine. In the real world each paper tape would have a hand-written label, and it would be up to the operator to locate the tape from a rack (or the bottom of someone's desk drawer). In our virtual world, you're presented with a dialog asking you to select a tape. In fact your only choice in this demo is to select the item called demotape and then click the Open button, which will cause the contents of this tape to be loaded into the Beboputer's memory. Note that the memory walker updates to reflect the fact that the program from the paper tape has now been loaded into the Beboputer. Now click the hex keypad's Ad (Address) button again, enter our program's start address of $4000, click the Ru (Run) button to initiate the program, and play with the 8-bit switches for a while to convince yourself that the program still works. Note that the act of running the program caused the memory walker to gray out again, to indicate that its contents are not guaranteed to be valid while the Beboputer is in its Run mode. Now click the keypad's Stp (Step) button, which causes the Beboputer to transition into its Step mode. The memory walker returns to white, and a chevron (arrow) character appears in its "Step" column to indicate the next opcode to be executed (note that you can't predict the opcode at which the Beboputer will pause, because this depends on whatever it happens to be doing when you enter the Step mode). Click the Stp (Step) button a few more times and watch what happens in the memory walker, then proceed to the next section.
The CPU Register Display

The memory walker display has a variety of uses that we haven't considered, such as setting and clearing breakpoints, but once again we need to move onwards and upwards. Dismiss the memory walker by clicking the dismiss button in the upper right-hand corner of its title bar, then use Display -> CPU Registers to activate the CPU register display (Figure 2-9).
Figure 2-9: The Beboputer's CPU register display
As its name might suggest, this tool is used to display the contents of the Beboputer's internal registers. Click the hex keypad's Stp (Step) button a few times to see the program counter and instruction register change. Note that the "program counter" shows the address of the next instruction to be executed, while the "instruction register" field reflects the last instruction to be executed.
Now click the Ru (Run) button to let the Beboputer run wild and free, and watch the activity in the CPU register display. Note that the accumulator field in the CPU register display isn't changing, because our program causes it to reflect the value on the 8-bit switch device. But if you click the switches on the 8-bit device, you'll see the accumulator field update to reflect this new value. Last but not least, click the keypad's Rst (Reset) button to return the Beboputer to its Reset mode.
Multimedia Introductions and the Web

Amongst a myriad of other topics, Bebop BYTES Back (the book accompanying the Beboputer) documents a series of interactive labs (similar to what we've done here, but in much more detail), and each of these labs comes equipped with a multimedia introduction. Although it doesn't strictly match our discussions here, we've included the multimedia overview from lab 4 for your edification and enjoyment. To invoke the multimedia viewer, either use Tools -> Multimedia, or click the icon that looks like a film projector on the Beboputer's project window toolbar (note that a sound card is strongly recommended). Click the Exit button in the multimedia theater when the presentation is completed.
In addition to its multimedia content, the Beboputer is also "Internet Ready," by which we mean that you can use it to connect to some special Beboputer web pages (this assumes that you have a modem and access to the Internet and the World Wide Web). As you'll discover, these web pages contain all sorts of cool stuff like competitions and additional programs and subroutines written by ourselves and other users.
Other Cool Stuff

Although we haven't had time to go over it here, your demo copy of the Beboputer contains a lot of other cool stuff, including a virtual sound card (which allows the Beboputer to do things like speak numbers aloud), a virtual QWERTY keyboard, and a virtual memory-mapped computer screen. You can access a lot of these devices by using Setup -> Input Ports and Setup -> Output Ports, and there's also a lot of information available in the online help. Furthermore, Appendix B of this book provides some additional discussions on the Beboputer's addressing modes, along with a summary of the Beboputer's instruction set. On the down side, you can't use this demo copy to execute the File -> Save Project command or the Memory -> Save RAM command (see the online help for details on what these do). Also, the real Beboputer comes equipped with a bevy of general-purpose subroutines and a smorgasbord of support tools, such as a cross assembler and a character editor, but all of these treats have been omitted from this demo copy. Having said this, the demo copy contains a lot of really good stuff, which should keep you occupied for many evenings to come. Enjoy!
Should you be interested in acquiring Bebop BYTES Back (An Unconventional Guide to Computers), ISBN 0-9651934-0-3, accompanied by a fully-functional copy of the Beboputer, then you can check out the publisher's web pages at www.doone.com or use the order form.
Chapter 3:

BOOL Logic Synthesis

"Not without a sense of humor"

In this chapter you will discover:
What is BOOL?
Installing BOOL
Running BOOL
What is BOOL?

On the CD-ROM accompanying this book you'll find a directory called BOOL, which contains the fully-functional BOOL logic synthesis package. This tool was created by a friend of mine, Alon Kfir, a man with a size 16 turbo-charged brain. Alon has kindly put BOOL into the public domain, which means that you can make copies of the software and give these copies away, but you may not charge any fees apart from nominal shipping and handling charges. Also, any such copies must contain all of the original files and you may not misrepresent the origin of these files. BOOL is a general-purpose tool for synthesizing two-level logic from a behavioral description, and is particularly useful for ASIC and PLD synthesis. BOOL contains an HDL compiler, a two-level minimizer, a high-level event-driven simulator, and sophisticated output formatting routines. More details on BOOL, including the terms and conditions for its use, are to be found in three files in the BOOL directory: readme.doc, highlite.doc, and manual.doc, where this latter file is a complete 11,000-line tutorial manual and reference guide (so I'd strongly recommend that you print it out one chapter at a time :-)
Installing BOOL

BOOL can run on any class of IBM PC and compatibles under DOS 2.0 or above. Installing BOOL essentially consists of copying the files to your hard disk and setting an appropriate search path to the executable and library files:

1) Create a directory called BOOL at the top level of your main hard drive.
2) Copy all of the files from the BOOL directory on the CD-ROM into the BOOL directory on your hard drive.
3) Add the BOOL directory to your DOS search path (for example, by appending it to the PATH statement in your autoexec.bat file).
Running BOOL

The manual.doc file in the BOOL directory will walk you through a series of test cases that demonstrate the full capabilities of BOOL. For our purposes here, we just want to perform a simple experiment in order to show you the general principles. The topic in Chapter 18 is a rambling discussion on the design of a three-phase clock. During the course of these discussions we derive a truth table and extract a set of Boolean equations (Figure 3-1).
cl3 = (~p3 & p2 & p1) | (p3 & p2 & p1) | (p3 & p2 & ~p1)
cl2 = (~p3 & ~p2 & p1) | (~p3 & p2 & p1) | (p3 & p2 & p1)
cl1 = (~p3 & ~p2 & ~p1) | (~p3 & ~p2 & p1) | (~p3 & p2 & p1)

Figure 3-1: An example truth table and Boolean equations ('~' = NOT, '&' = AND, and '|' = OR)
If we were to implement these equations directly (taking shared product terms into account), we'd require five 3-input AND gates and three 3-input OR gates (note that we don't require any NOT gates, because we can pick up the negated versions of these signals from the complementary outputs of the D-type flip-flops that are used to generate the p1, p2, and p3 signals). Of course, the equations shown in Figure 3-1 are not optimized or minimized in any way. The discussions in Chapter 18 proceed to use Karnaugh Map techniques to minimize these equations, but for our purposes here we'll use BOOL. In fact there are many user scenarios for working with BOOL, depending on the way your system is set up. BOOL itself runs under DOS, but my home computer is also loaded with Microsoft Windows 95, so the way in which I use BOOL reflects my system configuration. First of all I invoke the Notepad editor supplied with Windows 95, and I enter the following text (note that BOOL prefers to see keywords like minimize and print in lowercase):
input p1, p2, p3;
output cl1, cl2, cl3;
cl3 = (!p3 & p2 & p1) | (p3 & p2 & p1) | (p3 & p2 & !p1);
cl2 = (!p3 & !p2 & p1) | (!p3 & p2 & p1) | (p3 & p2 & p1);
cl1 = (!p3 & !p2 & !p1) | (!p3 & !p2 & p1) | (!p3 & p2 & p1);
minimize(cl3, cl2, cl1);
print(cl3, cl2, cl1);
end;
The format of this file is really simple. First we declare our inputs and outputs; then we specify the Boolean equations we want to minimize; next we call the minimize function and tell it which signals we want it to process; and finally we call the print function, which will output the results. Note that BOOL uses exclamation marks "!" (also known as "shriek" characters) to indicate inversions (as compared to Figure 3-1, in which we used tilde "~" characters). Next I use File -> Save in the text editor to save this source code to a file called c:\bool\3pclock.txt (substitute the name of your hard drive for c: if it's different). Now I use Start -> Programs -> MS-DOS Prompt to open up a DOS window, then I use the command cd c:\bool to change the directory to the work directory I created earlier (once again, substitute the name of your hard drive for c: if it's different). Now I use the dir command to assure myself that my 3pclock.txt file is indeed in this directory (call me "old-fashioned" if you will), then I run BOOL on this file using the command bool 3pclock.txt, which results in the following equations being displayed on my screen:
cl3 = (p2 & p3) | (p1 & p2);
cl2 = (p1 & !p3) | (p1 & p2);
cl1 = (!p2 & !p3) | (p1 & !p3);

You can only imagine my surprise and delight to discover that these equations are the same as those I generated by hand using my Karnaugh Maps in Chapter 18. Due to the fact that some of the product terms are shared between these equations, I can now implement this logic using just four 2-input AND gates and three 2-input OR gates (Figure 3-2).
Figure 3-2: The minimized solution from BOOL requires four 2-input AND gates and three 2-input OR gates
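Incidentally, should you wish to convince yourself that BOOL's minimized equations really are equivalent to the originals, a brute-force check over all eight input combinations will do the trick. The following is a little sketch of my own in Python (any language would serve equally well; this is not part of the BOOL package):

    from itertools import product

    for p1, p2, p3 in product((False, True), repeat=3):
        # The original sum-of-products forms from Figure 3-1
        orig_cl3 = (not p3 and p2 and p1) or (p3 and p2 and p1) or (p3 and p2 and not p1)
        orig_cl2 = (not p3 and not p2 and p1) or (not p3 and p2 and p1) or (p3 and p2 and p1)
        orig_cl1 = (not p3 and not p2 and not p1) or (not p3 and not p2 and p1) or (not p3 and p2 and p1)
        # The minimized forms reported by BOOL
        min_cl3 = (p2 and p3) or (p1 and p2)
        min_cl2 = (p1 and not p3) or (p1 and p2)
        min_cl1 = (not p2 and not p3) or (p1 and not p3)
        assert (orig_cl3, orig_cl2, orig_cl1) == (min_cl3, min_cl2, min_cl1)
    print("The minimized equations match the originals for all input combinations")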
For your interest, you may care to note that I didn't close the Notepad editor containing my source file, because I might want to make some changes later. In fact I always keep both the editor and the DOS window on my screen. Thus, whenever I make any changes to my source file, all I have to do is perform a Save operation and then re-run BOOL. This interactive technique of bouncing back and forth between Notepad and the DOS window provides a really fast and efficient way of interacting with BOOL. Last but not least, note that this trivial example did little to demonstrate the power of BOOL. This surprisingly sophisticated package can perform a wealth of cunning tricks, and it has proved itself to be endlessly useful to me over the years. For example, BOOL can synthesize hazard-free logic[1] if directed to do so, which is something that even the "big-boy" synthesis tools aren't particularly good at. In order to fully appreciate BOOL's features and capabilities, I strongly recommend that you read the manual.doc file provided with BOOL and perform all of the tutorial examples therein. Enjoy!
[1] Hazards and hazard-free logic are discussed in Chapter 16.
Chapter 4:

MMLogic Multimedia Logic Design System

"A syncopated symphony of delight"

In this chapter you will discover:
What is MMLogic?
Installing MMLogic
Running MMLogic
What is MMLogic?

OK, this is really, really cool. As you're soon to discover, MMLogic (which was created by George Mills of Softronics Inc.) is a terrific program for learning the fundamentals of digital logic and experimenting with said logic. In fact the only problem is that Alvin and I were planning on doing something very similar, but now there doesn't seem to be much point, because George has done such an outstanding job. George very kindly gave me permission to include MMLogic on the CD-ROM accompanying this book. Note, however, that MMLogic is NOT in the public domain. The version included on this CD is shareware, which means that once you install MMLogic you may use it in its fully functional trial mode for 30 days, after which the tool will automatically switch to a read-only mode. The readme file accompanying MMLogic describes how you can acquire a license, which will entitle you to use MMLogic in its fully functional mode indefinitely, and which also entitles you to all applicable upgrades to this version.
Installing MMLogic

MMLogic is suitable for use on Win32 Windows platforms, including Windows 3.x (so long as you've got Win32s)[1] and Windows 95. On the CD-ROM accompanying this book is a directory called mmlogic, and inside this directory is a file called mmlog110.exe. Use your file manager or Windows Explorer (or whatever) to locate this file and double-click on it with your mouse. This file is a self-extracting executable, which means that double-clicking on it will launch the process required to install MMLogic on your system; simply follow the prompts as they appear on your screen.
Running MMLogic

Assuming that you're running Windows 95, use Start -> Programs -> MultiMedia Logic -> MultiMedia Logic to launch MMLogic. Once you've launched MMLogic, you'll see a user window and a palette. You can use the contents of the palette to quickly draw your own circuit diagrams on the screen and then simulate them. For example, click the AND gate symbol on the palette (left-hand column, second row from the top), move your cursor into the user area and click to place a copy of this gate, then move your mouse a little and click again to place a second copy of the gate.
[1] Note that you must have a complete version of Win32s, which is not included with MMLogic. If you don't already have Win32s, you can download it over the web from www.microsoft.com.
Now click the pointer tool in the upper left-hand corner of the palette, then double-click on one of your AND gates. This results in a form appearing, which allows you to specify certain attributes associated with this gate, such as its number of inputs and whether or not the output should be inverted (thereby forming a NAND) (Figure 4-1). Try selecting the 3 Inputs option and setting the Invert Output box, then click OK to see these changes occur in your schematic.

Figure 4-1: The properties form for the AND gate

Similarly, the properties form associated with the flip-flop device (right-hand column, fifth row from the top on the palette) allows you to select between different flavors of latches and flip-flops. Once you've placed all the components you require and have wired them together, you can proceed to simulate your circuit. In fact MMLogic comes equipped with a wealth of example circuits, so let's use one of these. Use File -> Open to reveal an Examples directory. Double-clicking on this directory reveals a series of sub-directories, including one called Moderate, which is the one we're interested in here. Double-clicking on Moderate exposes a file called Bandit, and double-clicking on this file invokes the circuit for a one-armed bandit. At this time we're still in MMLogic's Draw mode, which is the mode we use to edit schematics. But we want to simulate this circuit, so use Simulate -> Run to place us into the Simulation mode. On the left-hand side of the schematic you'll spot four push buttons. Click the button marked Reset on the extreme left of the schematic to start the one-armed bandit running, and note the three devices on the right-hand side of the screen as they start to display bitmaps. When you're ready, click the three buttons marked Stop 1, Stop 2, and Stop 3 (pausing for a few seconds between each button), which will halt their associated windows. Don't worry, once you see this running you'll quickly get the idea. Once you've finished playing with this example, use Simulate -> Stop to return to the Draw mode, then close this example. Now feel free to root around the other examples, or to start creating and simulating your own circuits. Enjoy!
Chapter 5:

Capturing and Synthesizing a Design

"Textual, graphical, and mixed-level techniques"

In this chapter you will discover:
Electronic Design Automation
The Early Days
The Dawning of EDA
The Introduction of Hardware Description Languages
Migrating Existing Gate-Level Designs to PLDs
The Introduction of Verilog and VHDL
Top-Down, Middle-Out, and Bottom-Up
Graphical Entry Mechanisms
Logic Synthesis and Language-Driven Design
Mixed-Level Design
Electronic Design Automation

The phrase Electronic Design Automation (EDA) encompasses a number of distinct specialties, the main ones being Computer-Aided Engineering (CAE), Computer-Aided Design (CAD), Computer-Aided Manufacture (CAM), and Computer-Aided Test (CAT) (Figure 5-1).
Figure 5-1: Electronic design automation includes computer-aided engineering, design, manufacture, and test

At the circuit board level, CAE includes capturing the design and verifying its functionality and timing; CAD is concerned with laying out the board (that is, placing components on the board and describing the connections between them), along with any real-world verification such as signal integrity and thermal analysis; CAM involves anything to do with manufacturing the board; and CAT refers to the testing of the board. By comparison, when working with Application-Specific Integrated Circuits (ASICs) and Field-Programmable Devices (FPDs), CAE is generally accepted to encompass both the capture process and any mapping software used to squeeze the design into the device.[1] Apropos of nothing at all, some engineers feel that the terms CAE and CAD are misleading, in that it would make more sense to use CAD to refer to the process of capturing the design and something like Computer-Aided Layout (CAL) to indicate the process of layout. The reason for using the terms CAE and CAD as we do is that the layout fraternity started using computer-aided techniques first, so they grabbed the CAD designation. Thus, when design engineers finally caught up, they were obliged to choose a new label, and CAE was apparently the best they could come up with. Over time, layout guys and gals came to refer to themselves as layout designers or simply designers, based on the fact that they practiced CAD. Similarly, the layout fraternity refer to people who conceive and capture designs as engineers on the basis that they practice CAE. Unfortunately, nobody bothered to inform the "engineers," who therefore blissfully stroll through life referring to themselves
as "designers," and referring to the layout guys as .... "the layout guys." For the purposes of this book we will use the terms "designer" and "engineer" interchangeably, thereby giving everyone an equal opportunity to have something to moan and groan about.
The Early Days

As I poise myself to pen these words a grim chill is stealing over me, because I know that I'm about to open a huge "can of worms," but this is the path we have to tread to make any sense out of the horrors that are to come. Way back in the mists of time, say around the 1960s, the majority of electronic designs were created by large companies, because they were the only ones who could afford to do it. Neglecting for the moment manufacturing and test, the world as seen by engineers was divided into two camps: the engineers themselves and the drafting office. Prior to the advent of computer-aided techniques, engineers captured designs by drawing circuit diagrams on whatever surface came to hand, including scraps of paper, shirt sleeves, and tablecloths. These masterpieces were then passed to the drafting department, where they were manually redrawn and annotated to the company's internal standard. (To increase the fun, each organization had its own standard.) The resulting "official" schematics were subsequently handed on to the layout draftsmen who performed the physical layout of the board by hand, and who were considered (at least by themselves and their mothers) to be at the top of the drafting totem pole. (In order to be "politically correct" we should perhaps say "draftsmen and draftswomen," but in those days you could count the number of draftspersons of the female persuasion on the fingers of one foot.) A common scenario in those days was for engineers to be on the staff and for drafting personnel to be union members (this scenario still occurs to this day in certain industries; for example, the automotive industry springs to mind). Unfortunately, it was not unknown for friction to develop between these two groups, and woe betide any hapless engineers who got on the wrong side of the drawing office, because their jobs were mysteriously prone to "unforeseen" delays, and some jobs might even disappear into the nether regions from whence they never reemerged into the light of day (in much the same way that today's engineers quickly learn not to annoy their system administrators). For their part, the layout draftsmen accepted a schematic and generated the circuit board's physical layout. In order to do this they typically worked at a scale of 4:1, which meant that everything was four times its actual size. First the draftsman applied sticky cutouts called footprints to a transparent sheet of mylar, where these footprints were used to represent the physical locations of the
components. Next they marked the positions of the tracks on the sheet using colored pens (in the case of double-sided boards, a red pen was used to represent the tracks on one side of the board while a blue pen was employed to indicate the tracks on the opposing side). The next stage in the process was referred to as "taping up," in which sticky black tape was used to represent the true widths of the tracks and pads (still at a scale of 4:1).[2] The resulting taped-up artworks would eventually be photographed and reduced to generate a 1:1 film artwork, which would subsequently be used in the process of etching the board.[3] Meanwhile, the 4:1 taped artworks would be passed back to the drafting office to be redrawn by hand to create the "official" assembly and manufacturing drawings. Finally, the physical layout would be compared to the schematic by eye in order to ensure that the two were synchronized. By golly, we certainly knew how to have fun in those days!

[2] You can always recognize layout draftsmen of yesteryear by the fact that they abhor woolly sweaters (because tiny woolen fibers could become attached to the tape, be included in the artwork, and end up causing process problems and potential short circuits on the resulting circuit board).
[3] The process of creating circuit boards is more fully described in my book: Bebop to the Boolean Boogie (An Unconventional Guide to Electronics), ISBN 1-878707-22-1, HighText Publications.
The Dawning of EDA

When discussing the origins of EDA, it's tempting to assume that there was some master plan in mind, which would therefore allow us to plot a linear development from ground zero. In reality, different disciplines in a multitude of organizations, including industry and academia, were all merrily doing their own thing, to the extent that it's a miracle anything came together at all. Also, many interesting computer-aided activities were being pursued during the 1960s that addressed integrated circuit design issues, but we'll start by looking at board-level design, which is easier to comprehend in many ways. Board-level design in the early 1960s was predominantly based on the use of discrete components such as transistors, resistors, and suchlike, and circuit boards themselves were predominantly single- or double-sided (which means that they only had tracks on one or both of their outer surfaces). On the layout side of the fence, a typical design cycle could be viewed as:
2 weeks (Design activity) 2 weeks (Post-design activity) 2 weeks (Post-design activity)
It was recognized that there were significant productivity advantages to be gained by reducing the time and effort associated with the post-design portions of the 2You can always recognize layout draftsmen of yesteryear by the fact that they abhor woolly sweaters (because tiny woolen fibers could become attached to the tape, be included in the artwork, and end up causing process problems and potential short circuits on the resulting circuit board). aThe process of creating circuit boards is more fully described in my book: Bebop to the Boolean Boogie (An Unconventional Guide to Electronics), ISBN 1-878707-22-1, HighTextPublications.
process. Thus, one of the first applications of computer-aided techniques at the board level was for the layout designers to use digitizing tables to capture the initial design layout. First they placed their transparent 4:1 mylar films carrying the sticky footprints and the pen-drawn tracks, pads, and vias onto a back-lit digitizing table. Then they proceeded to digitize and record the locations of all of the entities on the board. The resulting electronic representations could subsequently be used for such purposes as generating film artworks by feeding the data to a photoplotter; generating drawing office quality assembly drawings via a pen plotter; and constructing a wiring list that could be used to assist in the still manually-intensive process of checking the layout against the schematic. The result was a 50% reduction in the time spent on post-design activities. Meanwhile, engineers were starting to make use of analog simulation in the form of a program called SPICE, which stands for Simulation Program with Integrated Circuit Emphasis. As its name might suggest, SPICE (which was designed in the mid-1960s at the University of California, Berkeley) was originally intended for designing integrated circuits, but it quickly found use in analog board-level designs.

Figure 5-2: The first analog simulators used text-based input and output (a textual netlist and tabular stimulus go in; a tabular response comes out, with reference to a device knowledge database)

In order to use SPICE, the engineer had to provide it with a description of the circuit in the form of a netlist. Initially this was supplied as a set of punched cards called a deck, and the term SPICE deck persists to this day.[4] In addition to this netlist, the engineer also had to supply SPICE with a description of the stimulus
to be applied to the circuit's inputs. This stimulus was described in terms of voltage and current values presented in tabular form. Similarly, the output from the simulator was also presented in tabular form, which made it an absolute swine to read and comprehend (but it was a whole lot better than trying to perform the calculations by hand) (Figure 5-2). Note that the file formats shown in Figure 5-2 are pseudo formats created solely for the purposes of this illustration (we're not trying to teach SPICE syntax here). Also note that analog simulation is discussed in greater detail in Chapter 11. For the purposes of this discussion, we need only be aware that it began to strike engineers that drawing a circuit diagram on a piece of paper and keying in the netlist by hand was a somewhat painful process. So the next step in the game was to use a program called schematic capture, which could be used to graphically draw the circuit on a computer screen using predefined component symbols and connecting them together. The schematic capture tool could then automatically generate a netlist for use with the analog simulator (Figure 5-3).

[4] The first iterations of SPICE ran as batch jobs. Later versions came equipped with a simple user interface called Nutmeg, whose name had no relevance beyond the fact that nutmeg is a spice. Oh, how we laughed!
Figure 5-3: Schematic capture could be used to generate a netlist for use with the analog simulator (a component-level schematic, with parts such as R1 = 10K, C1 = 5uF, and GND = 0V, is converted into a textual netlist that is passed to the analog simulator)
Thus, engineers took a step towards the light. Strange as it may seem, however, schematic capture was still regarded as an engineering "toy" (or "tool" if you prefer), but it was not considered to represent the final design. The engineer could certainly use the schematic capture utility to print out a circuit diagram, but this diagram was then passed to the drafting office to be manually redrawn to the company standard, and it was several years before the output from schematic capture programs came to be accepted as correct and archivable documentation. Be this as it may, the fact that engineers were now generating netlists had a number of important ramifications, not the least being that there were now two netlists associated with a design: the one generated by the engineers and the one digitized by the layout designers (albeit in different forms). Thus, the stage was set
for the arrival of tools that could automatically compare the two netlists to ensure that the design and layout databases were synchronized. Still later, automatic layout tools began to appear, which could accept a netlist as input and perform large portions of the layout process by themselves. Meanwhile, digital electronics was starting to gain a stronger foothold, because integrated circuits could be created containing a number of simple digital functions, such as four 2-input AND gates. Thus, in addition to placing the components and generating the connections between them (a process referred to as place-and-route), layout tools were augmented with the ability to experiment by swapping gates between different integrated circuit packages and swapping pins on gates. This led to what we might refer to as the classical board-level design flow, which involves capturing the design as flat (non-hierarchical) gate- and/or component-level schematics, and communicating the design's topology to layout in the form of a flat netlist (Figure 5-4).
Figure 5-4: The classical board-level design flow (flat, multi-sheet schematics generate a flat netlist for layout, with back annotation flowing in the opposite direction)

The reason we describe these schematics as being "flat" is that, with the exception of partitioning the board into major functional units, board-level designers typically make minimal use of hierarchy. One reason for this is that the layout tool assigns individual names to component packages, such as IC1, IC2, IC3, and so forth. At some stage this information has to be back-annotated into the schematic, which can be very complex if the original design was captured in a hierarchical form. By comparison, IC and ASIC designs are not based on components contained in physical packages, so names can be arbitrarily assigned to individual blocks, and designers therefore make far greater use of hierarchic representations (in fact they are obliged to employ such representations due to the relative complexity of their designs).
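As an aside, the netlist-comparison tools mentioned a moment ago are conceptually simple: treat each netlist as a set of net-to-pin connections and report the differences. The following Python fragment is a sketch of my own (the netlist representation is made up purely for illustration, not any real tool's format):

    def connections(netlist):
        # A netlist here is a dict mapping each net name to a list of pins,
        # where each pin is written as "component.pin"
        return {(net, pin) for net, pins in netlist.items() for pin in pins}

    def compare(design, layout):
        d, l = connections(design), connections(layout)
        return d - l, l - d          # missing from layout, missing from design

    design = {"CLK": ["IC1.1", "IC2.14"], "RST": ["IC1.2", "IC3.7"]}
    layout = {"CLK": ["IC1.1", "IC2.14"], "RST": ["IC1.2"]}
    missing, extra = compare(design, layout)
    print("In design but not layout:", missing)   # {('RST', 'IC3.7')}
    print("In layout but not design:", extra)     # set()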
The Introduction of Hardware Description Languages

The classical board-level design flow began to change in the early 1970s with the advent of Programmable Logic Devices (PLDs).[5] In the case of these devices, the designer specified the function the device was to perform using rudimentary, proprietary Hardware Description Languages (HDLs) such as ABEL from Data I/O. These languages could be used to describe Boolean equations and simple truth tables, and also had additional constructs to declare entities such as registered outputs. The text file containing this description was then fed to a PLD tool that optimized and minimized the Boolean equations, and used a knowledge database to generate a fuse map targeted to a device specified by the user. These fuse files could be created in a number of standard formats such as JEDEC (Figure 5-5).

[5] PLDs are discussed in greater detail in Chapter 19.
Figure 5-5: One of the first uses of HDLs was to specify the function of PLDs (a textual HDL file, e.g., ABEL, passes through optimization and minimization, drawing on a device knowledge database, to produce a fuse file, e.g., JEDEC)
Originally, it was the designer's responsibility to instruct the PLD tool as to which device it should target. But these tools became more sophisticated over time, and later versions allowed the designer to augment the optimization and minimization software's knowledge database with information as to each device's price and availability, along with extra data pertaining to such aspects as the user's own preferences. The tool could then determine the most cost-effective device that could accommodate this particular functional description. The tools were subsequently further augmented with the ability to automatically split large designs across multiple devices. The topic of PLDs is covered in more detail in Chapter 19. For our purposes here we need only note that there were several problems with this early PLD methodology, most of which revolved around the fact that the design of the
board and the design of the programmable devices were almost completely distinct. For example, there weren't any automatic systems in place to ensure that the data associated with different versions and revisions of the board and devices kept in step. Secondly, early PLD tools often automatically dictated which device pins were assigned to which signals. This meant that the person capturing the schematic couldn't connect any wires to the symbol for the programmable device until the PLD tool had been run, which could therefore impact the layout designer. Even worse, modifying the HDL source and re-running the PLD tool could result in the pins being reassigned, which might therefore require modifying the board's layout. Not surprisingly, these problems were somewhat aggravating to all concerned (particularly the layout designer), but we digress ....
Migrating Existing Gate-Level Designs to PLDs
Following the introduction of the first PLDs, an interesting flavor of design capture emerged to take account of the fact that so many existing designs were represented as gate-level schematics and netlists. A market developed for designers to take these designs (or portions thereof), automatically translate the schematics into their HDL equivalents, and then re-implement that design using one or more PLDs. This allowed designers to dramatically reduce the size and cost of new generations of the board-level product with relatively little expenditure of time and effort (Figure 5-6).
Figure 5-6: Migrating existing gate-level designs to PLD implementations (an existing gate-level schematic or netlist is translated into textual HDL, which is then compiled into a fuse file)

As an interesting counterpoint to this approach, some designers work in sort of the reverse manner. These designers like to take advantage of specifying their design in the form of Boolean equations, but they wish to target a resulting netlist
toward discrete primitive gate and register devices. One recent example known to
the author involves a medical application, whose designers, for reasons of their own, simply have more confidence in discrete devices as opposed to programmable logic.
The Introduction of Verilog and VHDL

As was previously noted, the early PLD optimization and minimization tools employed simple proprietary HDLs. This may have remained the case indefinitely, except that other segments of the EDA market also began to employ HDL representations for simulation and logic synthesis, where synthesis was predominantly targeted towards IC and ASIC designs. Initially these simulation and synthesis HDLs were also proprietary, but over time the industry standardized on two languages: VHDL (IEEE 1076) and Verilog (IEEE 1364) (Figure 5-7).
Figure 5-7: High-level comparison of Verilog and VHDL (both languages cover functional-level constructs, from Boolean equations through FSMs to RTL; VHDL extends further into the behavioral/algorithmic level, and neither fully covers the structural gate and switch levels)

The reason Figure 5-7 shows both of these languages as not fully covering gate- and switch-level constructs is due to timing. Although each language is capable of representing the functionality of gates and switches, they have varying degrees of success in representing delays, and neither language is fully capable of handling the delay effects seen in deep-submicron integrated circuit technologies (see also the discussions in Chapter 25).
The lowest level of modeling abstraction is called structural, which refers to switch- and gate-level netlists. The next level of abstraction is referred to as functional, which ranges from Boolean equations, through Finite State Machines (FSMs), to Register Transfer Language (RTL) descriptions. (RTL is shown as being at a higher level than FSM, because the latter representations are, by definition, bounded to a finite number of states.) The highest level of modeling abstraction is referred to as behavioral, which is, in the author's humble opinion, a somewhat misleading term, in that all levels of abstraction essentially describe behavior. A more appropriate designation might be algorithmic, but we have to live in harmony with the rest of the world, so we're stuck with "behavioral." Each language has its advocates and its detractors. Proponents of VHDL point to its power, versatility, and capability for extremely high levels of modeling abstraction, while the supporters of Verilog feel that this language is more intuitive to use and more closely relates to the hardware it is employed to represent. During the late 1980's and early 1990's, a period referred to by some observers as "the HDL wars," the advocates of each language loudly proclaimed the early demise of the other. However, it now appears certain that both languages will maintain a significant presence for the foreseeable future. Also of interest is the fact that both VHDL and Verilog support the concept of hierarchy, which is of particular application in the design of ASICs and ICs (but of less interest in the case of board-level designs). The early PLD HDLs did not support the concept of hierarchy, although this has now been added to languages like ABEL. Many of the early HDLs are still in use, because a great many tools have been developed to generate and process them over the years. However, a large proportion of such tools now also accept VHDL, Verilog, or both.
Top-Down, Middle-Out, and Bottom-Up

There are three distinct design methodologies that are commonly referred to as top-down, middle-out, and bottom-up. A top-down methodology means that the designer specifies the design, or portions thereof, at a high level of abstraction, and subsequently progresses the design to the implementation level (where the implementation level is considered to be the lowest level with which a designer is concerned; for example, a gate-level netlist in the case of a digital designer or a transistor-level netlist in the case of an analog designer). In practice, designers always employ a top-down methodology in the initial stages of a design, even if only as a paper exercise (it's rare indeed for a designer to run into the office shouting: "Look at this great new device, now what can we design around it?"). A middle-out methodology refers to the use of previously created functions. These functions may range in complexity from relatively simple ASIC macro-
functions, through discrete integrated circuits (ICs), to complete sub-systems. Last but not least, a bottom-up methodology means that the designer directly implements the design, or portions thereof, at the lowest levels of abstraction, such as transistors, primitive logic functions, or ASIC cells. Prior to the use of HDLs combined with the minimization and optimization software used to design programmable logic devices, the top-down component of the design process typically only encompassed the partitioning of the design into a hierarchy of functional blocks. These functional blocks were then progressed to the implementation level using either the middle-out or bottom-up methodologies. The introduction of PLD techniques (followed by synthesis technology as discussed below) augmented the top-down design methodology by providing an automated technique to progress the design, or portions thereof, from a high level of abstraction to the implementation level. In reality, however, complex designs are rarely amenable to the use of a single design methodology, and designers must therefore have the ability to combine top-down, middle-out, and bottom-up techniques.
Graphical Entry Mechanisms

An important consideration pertaining to the use of HDLs is their software nature. Hardware engineers often find it difficult to visualize a design in terms of an HDL, and many designers find HDLs too verbose and difficult to enter textually. These problems are best alleviated by graphical entry mechanisms, which more closely reflect the way in which designers think; for example, state-diagram and flowchart editors (Figure 5-8).
Figure 5-8: Graphical entry mechanisms can be used to generate HDL representations (a graphical state diagram or flowchart is converted into textual HDL, such as Verilog, VHDL, or ABEL, which is then passed to logic synthesis)
A key feature of these graphical entry mechanisms is that they are capable of automatically generating simulatable and synthesizable HDL representations. Additionally, many of these tools provide the ability to select between alternative HDLs, which allows the designer to take full advantage of various technology-specific tools.
Logic Synthesis and Language-Driven Design
One of the strange aspects about the electronics industry is the way in which devices and methodologies appear, fade away, and reappear in different guises. If questioned, many would point to the early-1980s as the emergence of the
ASIC, but the concept actually originated one and a half decades earlier, thereby occurring alongside the introduction of the first rudimentary (and conceptually much simpler) programmable logic devices. In 1967, Fairchild introduced a device called the Micromosaic, which contained a few hundred transistors. The key feature of the Micromosaic was that the transistors were not initially connected together. A designer used a computer program to specify the function the device was required to perform, and the program then determined the necessary interconnections and generated the masks required to complete the device. The Micromosaic therefore led the field as the forerunner to the modern ASIC, and also as one of the first real applications of computer-aided design. This device also exhibited one of the first examples, albeit rudimentary, of high-level design coupled with logic synthesis. Had the concepts behind Micromosaic been pursued, high-level design techniques would almost certainly have enjoyed widespread acceptance much sooner than they did, but, unfortunately, this technology faded away into the background for a while. This meant that when the use of ASICs started to become somewhat more commonplace in the early-1980s, their design methodologies were based on the traditional techniques that had been established for circuit boards. Thus, the designers of early ASICs used schematic capture to describe the function of their circuit as primitive logic functions and the connections between them. The schematic approach does convey certain advantages, not the least that it reflects the way in which designers think at the lowest level of abstraction, and it also allows expert designers to hand-craft extremely efficient functions. However, gate-level schematics are time consuming to enter and they don't lend themselves to "what-if" analysis at the architectural level. Also, verification using simulation is extremely CPU intensive at the gate level, and it is difficult to re-target gate-level schematics to new device technologies.
Since the early ASICs typically only supported between 2,000 and 5,000 primitive gates, the schematic capture approach was at least tenable, and it remained in force throughout most of the 1980s. However, as gate counts continued to rise through 10,000, 15,000, 20,000, and beyond, it became increasingly difficult to design these devices using traditional techniques. Thus, the late-1980s and the early-1990s saw increasing use of Language-Driven Design (LDD), which involved the combination of HDLs and logic synthesis technology (Figure 5-9).
Figure 5-9: Language-driven design refers to the combination of HDLs and synthesis technology (textual HDL is synthesized into a gate-level netlist, drawing on a device knowledge database)
Logic synthesis itself involves two distinct steps. First, the high-level HDL description is converted into a canonical form such as "sum-of-products"; second, these expressions are optimized and minimized taking into account the features of the target technology, such as how many loads each type of gate can drive. Also, designers can specify constraints such as maximum delay paths through the device, and the user can also instruct the synthesis tool to optimize different portions of the design to minimize either their delays or the area they occupy on the device. The early examples of language-driven design involved proprietary languages, but (as we've already discussed) the electronics industry eventually (largely) standardized on VHDL and Verilog. These HDLs are appropriate for describing both control and datapath logic at a reasonably high level of abstraction. This means that in the early stages of a project, the designer can concentrate on the architecture of the design as opposed to worrying about implementation details. Also, it is much more efficient to simulate designs at a high level of abstraction compared to the gate level, and it is far easier to perform "what-if" analysis at the architectural level. Synthesis technology then allows these high-level representations to be migrated to the implementation level, and it facilitates design reuse by allowing the design to be re-targeted to alternative implementation technologies.
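To make the first of these two steps concrete, the following is a toy sketch of my own in Python (a real synthesis tool is vastly more sophisticated); it expands an arbitrary Boolean function into its canonical sum-of-products form by enumerating the truth table:

    from itertools import product

    def to_sum_of_products(func, names):
        # Collect one product term for every input combination that makes
        # the function return 1 (that is, every minterm of the function)
        terms = []
        for values in product((0, 1), repeat=len(names)):
            if func(*values):
                literals = [n if v else "!" + n for n, v in zip(names, values)]
                terms.append("(" + " & ".join(literals) + ")")
        return " | ".join(terms)

    # For example, y = (a & b) | c expands to five product terms:
    print(to_sum_of_products(lambda a, b, c: (a and b) or c, ("a", "b", "c")))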
Unfortunately, neither VHDL nor Verilog was conceived with synthesis in mind (Verilog was initially targeted towards simulation, while VHDL was intended for documentation), which means that one can use these languages to describe constructs that cannot be synthesized (Figure 5-10).
Figure 5-10: Verilog and VHDL both support constructs that can't be synthesized (behavioral/algorithmic constructs are not amenable to synthesis using traditional techniques, while functional and structural constructs can be synthesized using traditional techniques)
To solve these problems, the Japanese came up with their own HDL called UDL/I, which was designed with synthesis and simulation in mind. Unfortunately, by the time UDL/I arrived on the scene, Verilog and VHDL already held the high ground, and it never managed to attract much attention outside of Japan (interest in UDL/I is now almost non-existent even in Japan). LDD methodologies also have other problems, such as the fact that the textual gate-level netlists generated by synthesis tools are difficult to comprehend. Hence, another tool that began to appear was schematic synthesis, which accepts a gate-level netlist and automatically generates an equivalent schematic (Figure 5-11).
Figure 5-11: Schematic synthesis helps designers "visualize" the results from logic synthesis (textual HDL is synthesized into a gate-level netlist, from which an equivalent gate-level schematic is generated)

LDD techniques are generally accepted to increase designer productivity (in terms of gates-per-day) by a factor of ten compared to designing at the gate level. Based on these promises of delectation and delight, a number of early adopters became overly enthusiastic, in that they decided LDD was the only way to go and discarded schematics as being "yesterday's technology." However, in addition to having to learn an HDL along with any associated software design techniques and disciplines, designers also have to learn how certain constructs and statements affect the synthesis tool. Today's synthesis tools are very powerful, but although it's possible to obtain good results (in the form of small and efficient designs), it's also easy for designs to get "out of control." Also, writing HDL code in two different ways that simulate exactly the same at the RTL level can synthesize to radically different gate-level implementations.
Mixed-Level Design
Although language-driven design can be very effective, certain portions of designs may not be amenable to logic synthesis techniques, in which case these portions inherently have to be captured at the gate level. Thus, the industry came to realize that schematics still had a role to play: first in describing the design graphically in terms of high-level functional blocks and the connections between them; and second, in describing certain portions of the design at the gate level. The resulting "mixed-level" design style offered the best of both worlds by allowing designers to mix schematic and HDL representations together.
These systems were subsequently further extended to incorporate additional graphical entry mechanisms, such as the state-diagram and flowchart editors that had originally been conceived as standalone applications for programmable logic devices. Thus, today's mixed-level design capture systems support multiple entry mechanisms in a heterogeneous environment, thereby allowing designers to use the most appropriate tool for each portion of the design (Figure 5-12).
Figure 5-12: Mixed-level systems allow each portion of the design to be represented at the most appropriate level of abstraction

In this scenario, designers commence by creating a block-level schematic at a high level of abstraction. When designers subsequently decide to "push" into a block, they can choose to represent its contents as a flowchart, a state diagram, textual HDL, or a gate-level schematic. Furthermore, modern capture systems support the concept of "views," in which each block may have multiple representations at varying levels of abstraction. Also note that we've only introduced some of the more prominent design capture tools here, but many more are available. For example, there are tools that accept HDL descriptions of state machines as input and synthesize their graphical counterparts, while other tools can be used to input a state machine in hand-entered HDL and generate a corresponding HDL output that's optimized for simulation and synthesis, and the list of such applications goes on, and on, and ......
The topic in this chapter was published in an edited form in Electronics Design & Technology Network (EDTN) in July 1997, and is reproduced in its original form here with their kind permission. EDTN is a web-only publication for electronics engineers, and provides know-how, news, and data sheet specifications for a broad range of technologies. It's actually pretty cool, and well worth your checking them out at www.edtn.com.

For your further reading pleasure, an excellent guide to the use of hardware description languages is HDL Chip Design (A Practical Guide for Designing, Synthesizing, and Simulating ASICs and FPGAs using VHDL or Verilog) by Doug Smith, ISBN 0-9651934-3-8, Doone Publications (www.doone.com, call 1-800-311-3753).

With regard to the design tools discussed in this chapter, there are a number of vendors of such applications, and new features and techniques become available on an almost daily basis. A great starting point, however, would be VeriBest Inc., Boulder, CO, USA (www.veribest.com, call 1-800-VERIBEST), who can supply capture, simulation, synthesis, and layout utilities, either as individual applications or as an integrated suite.
Chapter 6:
Digital Logic Simulation "Event-driven, cycle-based, hardware, and home-brewed"
In this chapter you will discover:
From "Suck-it-and-See" to Simulation
Traditional Logic Simulation
Verilog and VHDL
Event-Driven Simulators
Distributed Delays and Narrow Pulses
Mixed-Level Simulation
Cycle-Based Simulators
Home-Brewed Simulators
Hardware Simulators AKA Hardware Emulators
From "Suck-it-and-See" to Simulation

There are a number of techniques that can be employed to verify the functionality of a digital circuit, ranging from the "suck-it-and-see" approach (which means that first you build it and then you see if it works), through various flavors of logic simulation, all the way up to formal verification technology. Prior to the advent of computer-aided techniques, the only way to determine whether or not your circuit would work was to use a technique called breadboarding, in which you insert components into a special circuit board, use small wires with pins on the end to connect the components together, and then power up the circuit to see if it functions as planned. The breadboarding technique is still used for portions of board-level designs to this day, but this approach is extremely inefficient for large circuits. Thus, during the mid to late 1970s, a number of programs known as digital logic simulators started to become commercially available.
Traditional Logic Simulation
When digital simulation first arrived on the scene, the majority of board-level digital circuits consisted of simple logic functions (gates and registers) implemented in a technology known as Transistor-Transistor Logic (TTL). Both of these factors were to have significant implications as to the way in which early logic simulators were conceived and designed. In order to use one of these early logic simulators, the engineer was obliged to provide it with a description of the circuit in the form of a gate-level netlist. In addition to the netlist, the engineer also had to supply the simulator with a description of the stimulus to be applied to the circuit's inputs, where this stimulus was presented as a text file in tabular form. Similarly, the output from the simulator was presented in tabular form, which made it awkward to read and comprehend (Figure 6-1). Note that the file formats shown in Figure 6-1 are pseudo formats created solely for the purposes of this illustration. When the circuit is passed to the simulator, it accesses a pre-defined model library to determine the functionality and timing associated with each component, and then constructs a virtual circuit in the computer's memory. The simulator then applies the test vectors to the circuit and reports the results.
Figure 6-1: The first digital simulators used text-based input and output (a textual netlist and tabular stimulus go in, a tabular response comes out, and the simulator refers to a device knowledge database)
As computers became more powerful, these time-consuming and error-prone text-based techniques were superseded by graphical methods for capturing the schematic and stimulus and displaying the results (Figure 6-2).
Figure 6-2: Later digital simulators employed graphical input and output techniques (a gate-level schematic and graphical stimulus go in, and a graphical response comes out)
Note that the circuit can be described at many levels of abstraction, including flowcharts, state diagrams, and high-level hardware description language (HDL) representations (see also the discussions in Chapter 5). For the sake of simplicity, however, these discussions concentrate on gate-level views of the circuit, but the following points are typically applicable to all levels of modeling abstraction. Also, although we're predominantly going to consider gate-level views, the simulation models for these gate-level components could themselves be represented at different levels of abstraction.
Verilog and VHDL

Early logic simulators were typically based on the concept of "simulation primitives" (simple logic gates and registers which were often represented as truth tables and which were inherently understood by the simulator), and any other devices were obliged to be modeled as a collection of these primitives. Later simulators used a plethora of proprietary hardware description languages, but the industry has now largely standardized on two main "tongues," Verilog and VHDL, which are described by the IEEE 1364 and 1076 standards, respectively (Figure 6-3).
Figure 6-3: High-level comparison of Verilog and VHDL

Both Verilog and VHDL can describe circuits at different levels of abstraction, from primitive switches and gates to behavioral representations. In some respects VHDL is more powerful than Verilog at the behavioral level; however, the majority of today's synthesis tools cannot accept anything more abstract than register transfer language (RTL) descriptions, which tends to level the playing field between the two languages. Note that Figure 6-3 indicates that neither Verilog nor VHDL completely cover the switch and gate levels. This is because
although both languages can represent the functionality of these primitive elements, they have varying amounts of success when it comes to modeling sophisticated timing effects, and neither language can completely handle the more esoteric delay models required for deep submicron technologies (see also Chapter 25). But we digress .....
Event-Driven Simulators

The most common form of logic simulation is classed as event-driven, because the simulator sees the world as a series of discrete events. When an input value changes, the simulator evaluates the gate to determine whether this will cause a change at the output and, if so, the simulator schedules an event for some time in the future (Figure 6-4).
Figure 6-4: The simulator evaluates the effects of changes on a gate's input(s) and schedules future events to occur on the gate's output(s)
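As an illustration of how such a scheduler might hang together, here's a stripped-down sketch in C (entirely my own invention, not lifted from any commercial simulator) that keeps pending output changes in a time-ordered queue; production simulators typically use cleverer structures such as "timing wheels," but the principle is the same.

    #include <stdio.h>
    #include <stdlib.h>

    /* One pending output change: at time t, drive signal sig to value val */
    typedef struct Event {
        long t;
        int sig, val;
        struct Event *next;
    } Event;

    static Event *queue = NULL;   /* kept sorted by ascending time */

    static void schedule(long t, int sig, int val)
    {
        Event *e = malloc(sizeof *e), **p = &queue;
        e->t = t; e->sig = sig; e->val = val;
        while (*p && (*p)->t <= t) p = &(*p)->next;  /* insertion point */
        e->next = *p;
        *p = e;
    }

    int main(void)
    {
        /* An input change at t=0 on a gate with a 10 ns delay causes the
           simulator to schedule the resulting output change at t=10, etc. */
        schedule(10, 1, 1);
        schedule(25, 1, 0);

        while (queue) {                    /* advance simulated time */
            Event *e = queue;
            queue = e->next;
            printf("t=%ldns: signal %d -> %d\n", e->t, e->sig, e->val);
            free(e);                       /* evaluating the fanout gates
                                              here would schedule further
                                              events in the future */
        }
        return 0;
    }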
In fact, most event-driven logic simulators allow you to attach minimum, typical, and maximum (min:typ:max) delays to each model. This allows us to represent the fact that electronic devices switch at different speeds depending on environmental conditions such as their temperature and voltage supply (which can vary across a circuit board), and also that slight variations in the manufacturing process cause differences in the switching speeds of otherwise identical integrated circuits (Figure 6-5). Note that LH (meaning "low-to-high") and HL (meaning "high-to-low") are used to represent rising and falling transitions at the gate's output, respectively. For example, assume that the OR gate in Figure 6-5 was specified as having a LH delay of 5:10:15 ns (where ns is the abbreviation for nanoseconds, meaning one thousandth of one millionth of a second, or 10⁻⁹ seconds). This would mean that any change on an input causing the output to transition from a logic 0 to a logic 1 would take a minimum of 5 ns and a maximum of 15 ns to propagate through the gate and affect its output.
Figure 6-5: Many logic simulators support min:typ:max delays

When you run the simulator, you can select one of the min:typ:max delay modes and the simulator will use that mode for all of the gates in the circuit. Also, some simulators allow you to select one delay mode as the default, and then force certain gates to adopt another mode. For example, you might set all the gates in your datapath to use minimum delays and all the gates in your control path to use maximum delays, thereby allowing you to perform a "cheap and cheerful" form of timing analysis (see also Chapter 7).

One of the problems facing the creators of simulation models is that delay specifications are becoming more complex over time. In the early 1970s, it was common for all the delays of a simple gate to be specified in the data book as being identical. Over time, however, delays began to be specified more accurately, and today each input to output path typically has its own delay for both rising and falling transitions at the output (Figure 6-6).
Figure 6-6: Delay specifications have become more complex over time (from a single delay shared by all paths in the 1970s, to separate rising and falling delays, to individual min:typ:max LH and HL delays for each input-to-output path by the 1990s)
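In data-structure terms, a modern model's timing might be carried along these lines; this is a hedged sketch with field names of my own choosing, rather than anything taken from a real simulator's model library:

    /* min:typ:max values for one delay, in nanoseconds */
    typedef struct { double min, typ, max; } Delay3;

    /* One input-to-output path, with separate delays for rising (LH)
       and falling (HL) transitions at the output */
    typedef struct { Delay3 lh, hl; } PathDelay;

    typedef enum { MODE_MIN, MODE_TYP, MODE_MAX } DelayMode;

    static double pick(const Delay3 *d, DelayMode mode)
    {
        return mode == MODE_MIN ? d->min
             : mode == MODE_TYP ? d->typ : d->max;
    }

    /* Example: the OR gate's a-to-y path with the 5:10:15 ns LH delay
       from Figure 6-5 (the HL values here are invented for illustration):
           PathDelay a_to_y = { { 5, 10, 15 }, { 4, 8, 12 } };
           pick(&a_to_y.lh, MODE_MAX) then returns 15.0                  */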
Another problem facing designers is that each tool (such as simulation and synthesis) typically has its own model library, and it's frightening how often different tools return different delays. One trend I think we'll see is for the timing and functionality portions of models to become separate and distinct entities,¹ and something that needs to happen in the not-so-distant future is for diverse tools to make use of common timing libraries. I can envision a day when all of the tools use a common timing model that returns different levels of accuracy depending on the information that is fed into it. Thus, in the early (pre-layout) part of the design cycle the timing model would return delays at one level of accuracy, and these delays would become increasingly accurate as more and more information becomes available throughout the course of the design.

¹I accept that this is non-trivial in certain cases such as negative setup and hold violations, but it can be done with a little thought.
Distributed Delays and Narrow Pulses
In the early simulators, it was only possible to attach delays to primitive elements. Thus, when you built a model such as a multiplexer, you had to distribute the delays over the primitive elements forming that model; this was known as the "distributed delay" style of modeling (and it was a complete pain in the rear end, let me tell you) (Figure 6-7a).
Figure 6-7: Early logic simulators only supported distributed delays (a), but modern tools also support pin-to-pin (Pn-Pn) specifications (b)

The problem was that data books only give delays from the component's inputs to its outputs, so it was up to the person writing the models to fragment these delays and distribute portions of them throughout the model in order to achieve the correct total delays through each path (you can always tell an old simulation modeler by the fact that they're phenomenally good at solving simultaneous equations). By comparison, modern simulators usually support Pin-to-Pin (Pn-Pn) delay specifications, which can be taken straight from the data book and applied as total path delays from the component's inputs to its outputs (Figure 6-7b).

Note that the term Pin-to-Pin (Pn-Pn) delay means different things to different people. Design engineers regard Pn-Pn delays as being the delays through a component from its inputs to its outputs. By comparison, layout designers are typically not concerned about what's in an integrated circuit package, and they usually use the term Pin-to-Pin delay to refer to the time it takes a signal to propagate from one component's output, through a track, to another component's input (design engineers would call these Point-to-Point (Pt-Pt) delays). In this book we're using Pn-Pn and Pt-Pt to refer to component and track delays, respectively.

One of the arguments in favor of the distributed delay style of modeling is that it provides better handling of narrow pulses and is closer to the real world. (Of course, this point is usually put forward by the vendors who don't support Pn-Pn delay specifications.) This argument is countered by the fact that a model's contents usually bear only a passing correspondence to the internal structures of the physical device, so the way in which distributed delay models handle narrow pulses is speculative at best.

Another interesting delay effect is that pulses can be stretched or compressed as they pass through gates due to "unbalanced delays" on those gates (Figure 6-8).
Figure 6-8: Unbalanced delays can "stretch" or "compress" pulses (with LH = 10 ns and HL = 6 ns, a 24 ns positive-going input pulse emerges from a non-inverting gate as a 20 ns output pulse)
Note that Figure 6-8 only shows typical delays for reasons of simplicity. Due to the fact that the rising delay is larger than the falling delay in this example, a positive-going pulse applied to the input is compressed by the difference between the two delays. Similarly, a negative-going pulse applied to the input would be stretched by the difference. Also remember that the LH and HL annotations apply to transitions at the output, so if the gate includes a negation (such as a NOT, NAND, or NOR), then the opposite effect will occur (Figure 6-9).
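The arithmetic is easy to check for yourself; here's a one-liner in C capturing the relationship for a non-inverting gate, using the numbers from Figure 6-8 (the function name is mine, for illustration only):

    /* For a non-inverting gate, the leading edge of a positive pulse is
       delayed by LH and the trailing edge by HL, so the output width is
       the input width minus (LH - HL). */
    static double out_pulse_width(double in_width, double lh, double hl)
    {
        return in_width - (lh - hl);   /* 24 - (10 - 6) = 20 ns */
    }

For an inverting gate the two edges swap roles, which is why the same 24 ns pulse emerges 28 ns wide in Figure 6-9.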
Figure 6-9: An inverting gate has the opposite effect (the same delays turn a 24 ns input pulse into a 28 ns output pulse)

This leads us nicely into the consideration of what the simulator does with narrow pulses; that is, what occurs when a pulse is applied to a gate's input when that pulse is narrower than the propagation delay of the gate. The first logic simulators were targeted toward simple TTL devices at the board level. These devices typically rejected narrow pulses, so that's what the simulators did, and this was called the "inertial delay model" (Figure 6-10).
Figure 6-10: The inertial delay model rejects any pulse that is narrower than the gate's propagation delay (with LH & HL = 10 ns, a 6 ns pulse is rejected while a 15 ns pulse passes)

However, later technologies like ECL will pass pulses that are narrower than their propagation delays (as will devices such as delay lines), so the next step was for the modeler to be able to select between the inertial delay model and a "transport delay model" (Figure 6-11).
Figure 6-11: The transport delay model propagates any pulse, irrespective of its width (even a 6 ns pulse passes through a gate with LH & HL = 10 ns)
The problem with both the inertial and transport delay models is that they only provide for extreme cases. Over time, simulators began to use more sophisticated narrow pulse handling techniques, leading to the current state of the art, which is known as the "3-band delay model" (Figure 6-12).
Figure 6-12: The 3-band delay model is the current state-of-the-art when it comes to handling narrow pulses (pulses narrower than r% of the propagation delay Tp are rejected, pulses between r% and p% are ambiguous, and pulses of p% or more propagate)
With the 3-band model, each input-to-output delay can be qualified with two values called "r" and "p", which are specified as percentages of the total propagation delay for that path. If a pulse applied to the input is greater than or equal to p% of the propagation delay, then that pulse will pass through the gate in a transport delay mode. If the pulse is greater than or equal to r% of the propagation delay but less than p% of the delay, then it will appear at the output as an ambiguous, unknown X value.² Finally, if the pulse is less than r% of the propagation delay, it will be rejected in an inertial delay mode. The 3-band model is quite sophisticated and allows the accurate modeling of any technology. Also, this model provides backwards compatibility with earlier modeling styles, because setting the r% and p% values to be 100% of the propagation delay will result in a pure inertial delay for that path, while setting the r% and p% values to be 0% of the propagation delay will result in a pure transport delay for that path.

²Unknown X values are discussed in more detail in Chapters 9 and 10.
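Expressed as code, the decision boils down to two comparisons. Here's a hedged sketch in C (the type and function names are mine, not from any particular simulator):

    typedef enum { PULSE_REJECTED, PULSE_AMBIGUOUS, PULSE_PROPAGATES } PulseFate;

    /* width and tp (total propagation delay) are in ns; r_pct and p_pct
       are the "r" and "p" qualifiers as percentages of tp, with r <= p. */
    static PulseFate classify_pulse(double width, double tp,
                                    double r_pct, double p_pct)
    {
        if (width >= tp * p_pct / 100.0)
            return PULSE_PROPAGATES;     /* behaves like a transport delay */
        if (width >= tp * r_pct / 100.0)
            return PULSE_AMBIGUOUS;      /* emerges as an unknown X value  */
        return PULSE_REJECTED;           /* behaves like an inertial delay */
    }

Setting r_pct = p_pct = 100 reproduces the pure inertial model, while setting r_pct = p_pct = 0 reproduces the pure transport model, just as described above.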
Mixed-Level Simulation

In Chapter 5 we described how modern mixed-level design systems allow each portion of a design to be described at the most appropriate level of abstraction, including graphical state diagrams and flowcharts, textual HDL, and gate-level schematics (Figure 6-13).
Figure 6-13: Mixed-level capture systems allow each portion of the design to be represented at the most appropriate level of abstraction

When the first combinations of schematic capture and logic simulation became available, the schematic was only used to generate a netlist to feed the simulator. Over time these systems became more sophisticated; first by allowing the user to select any signals to be displayed by clicking the appropriate wires in the schematic with the mouse; and later by actually displaying the logic values on the wires in the schematic whilst the user was single-stepping through the simulation. Similarly, when the first mixed-level design capture systems reached the market, they had a relatively low level of integration with the simulator. For example, although the user might describe a portion of the design as a graphical state diagram, this diagram would have to be automatically translated into an HDL equivalent before the simulator could use it. Furthermore, in order to debug the design, the user would have to single-step through reams of unfamiliar HDL. In order to be able to achieve the highest efficiency, users must be able to simulate each portion of the design in the form with which they are most familiar; that is, the form in which each portion is entered. For example, assume that the user is trying to debug a portion of the design that was originally captured as a state diagram. In this case, single-stepping through the simulation should automatically cause the active state to be highlighted on the screen. Thus, today's more sophisticated mixed-level simulation environments interact with each portion of the design at whatever level of abstraction is dictated by the user.
Cycle-Based Simulators
The great thing about event-driven logic simulation using models with accurate timing data is that it is applicable to almost every type of digital design, and also that it gives you a real good feel for what's going on in your circuit. The downside is that the simulation models can commandeer a lot of memory to hold the timing data (several thousand bytes per gate in some cases) and logic simulators aren't as fast as one would ideally prefer. Thus, it may not be possible to simulate a really large design in a reasonable amount of time. One solution would be to use a hardware simulator (this concept is discussed in more detail a little later in this chapter). Another approach is to use a cycle-based simulator, such as SpeedSim from Quickturn. Cycle-based simulators are particularly well-suited for the evaluation of classical synchronous designs, which means that they cover quite a lot of ground (see also Chapter 16). In this context, I'm considering a classical synchronous design to comprise "globs" of combinational logic sandwiched between blocks of registers (Figure 6-14).
Figure 6-14: A classical synchronous design consists of "globs" of combinational logic sandwiched between blocks of registers
The cycle-based and event-driven simulators can conceptually accept the same netlist. However, a cycle-based simulator such as SpeedSim will throw away all of the timing information and convert the gate-level netlists for each of the combinational blocks into "flat" Boolean equations. The end result is that a cycle-based simulator is very efficient in terms of memory usage and it can be used to verify extremely large designs very quickly, but you sacrifice the ability to check any timing. This means that you also have to perform some independent form of timing verification such as static timing analysis. However, by some strange quirk of fate, the types of designs that are amenable to cycle-based simulation are also particularly suitable for static timing analysis, which means that life isn't quite as hard as it might otherwise be.³

³Static timing analysis is considered in Chapter 7.
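To give a flavor of why this is so fast, here's a minimal sketch in C of the cycle-based idea (the register banks and Boolean equations are invented for illustration): the flattened combinational logic is evaluated exactly once per clock cycle, with no event queue and no delays.

    #include <stdio.h>
    #include <stdint.h>

    typedef struct { uint32_t q0, q1; } State;   /* two register banks */

    /* One clock cycle: evaluate the flattened Boolean equations for the
       combinational "globs" and update all registers together. */
    static State clock_cycle(State s, uint32_t in)
    {
        State next;
        next.q0 = (s.q1 & in) | (~s.q0 & 1u);
        next.q1 = s.q0 ^ in;
        return next;
    }

    int main(void)
    {
        State s = { 0, 0 };
        for (int cycle = 0; cycle < 4; cycle++) {
            s = clock_cycle(s, cycle & 1u);
            printf("cycle %d: q0=%u q1=%u\n", cycle, s.q0 & 1u, s.q1 & 1u);
        }
        return 0;
    }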
One point that the cycle-based marketing boys and girls often neglect to mention is that this form of simulation is well-suited to hardware/software integration. In the not-so-distant past (in geological terms), I was working alongside a team who were designing a RISC hardware accelerator. As is common, this project had two main facets: the hardware itself and the operating system. If a system of this complexity is verified using an event-driven simulator, pretty much the best you can hope for is to simulate a few tens of cycles of the system clock, which doesn't give you much confidence in the quality of the system's software. Of course you can build the board and then run the operating system, but this pushes a large portion of the software debugging towards the end of the design cycle, and it may be that some of the bugs ideally require modifications to the hardware. For these reasons, it is preferable to debug the software in conjunction with the hardware as early as possible in the design cycle. To solve this problem, the team spent a few weeks creating a relatively simple cycle-based simulator, which allowed them to verify the entire system up to receiving the operating system prompt on the screen and executing a few simple commands. Had they attempted to do the same thing with an event-driven simulator, they would probably still be waiting for the results.
Home-Brewed Simulators
Home-brewed simulators can appear in a variety of different guises. I've always had a soft spot for this type of thing, because they're usually relatively easy to understand and it's almost always possible to pick up a few new tricks along the way. To provide an example of a home-brewed simulator, I contacted a friend of mine, Ed Smith, a master of the mystic programming arts at VeriBest Inc., Boulder, CO. I provided the specification, he pondered over the problem for a couple of evenings, and before you could say "Buy the author of this book a large beer and put it on my tab," a brand-new home-brewed simulator was hurling its way toward me across the Internet.

The idea behind this particular simulator is really quite simple. It's written in C, and the logic gates and registers are provided as C functions. To create a circuit to be simulated, you simply declare a main function that calls the primitive gates and connects them together. For example, you could describe and simulate the circuit for an 8-bit linear feedback shift register (LFSR) as follows:⁴

    #include "models.c"
    #include "sim.c"

    main()
    {
        /* Describe the circuit */
        xnor2 ("G1", "q7", "q1", "w1");
        xnor2 ("G2", "q7", "q2", "w2");
        xnor2 ("G3", "q7", "q3", "w3");
        dff   ("R0", "clear", "clock", "q7", "q0");
        dff   ("R1", "clear", "clock", "q0", "q1");
        dff   ("R2", "clear", "clock", "w1", "q2");
        dff   ("R3", "clear", "clock", "w2", "q3");
        dff   ("R4", "clear", "clock", "w3", "q4");
        dff   ("R5", "clear", "clock", "q4", "q5");
        dff   ("R6", "clear", "clock", "q5", "q6");
        dff   ("R7", "clear", "clock", "q6", "q7");

        /* Simulate the circuit */
        simulate(stimulus, response);
    }

⁴Linear feedback shift registers are introduced in more detail in Chapter 17.
Note that I've cut out a few lines for the sake of brevity (such as the bit that determines the names of the stimulus and response files), but to a large extent this is all there is. The simulate function is contained in the file sim.c, while the xnor2 and dff functions are declared in the file models.c. We've provided a selection of simple logic models, but you can easily add new models of your own. If you're interested in perusing the source code for this simulator (which occupies around ten pages of C), you can root around in the homebrewed directory on the CD-ROM accompanying this book. Please note, however, that this is totally unsupported software; neither Ed nor myself have the time to walk you through it (apart from the notes provided on the CD), so you'll be on your own. Have fun with it and feel free to let me know what you think.
Hardware Simulators AKA Hardware Emulators

As we previously discussed, logic simulation models can commandeer a lot of memory, and software-based logic simulators aren't as fast as one would ideally prefer. Thus, it may not be possible to simulate a really large design in a reasonable amount of time. One solution is to use a hardware simulator (which have started to be referred to as "hardware emulators"), of which there are a number of flavors. One of the leading purveyors of hardware emulation is Quickturn Design Systems Inc., Mountain View, CA, which supports two main flavors of such emulators: Field Programmable Gate Array (FPGA)-based and microprocessor-based.
FPGAs are integrated circuits that can contain tens of thousands of logic gates and registers, where the function of individual blocks of logic and the interconnections between these blocks can be specified "on-the-fly" by loading configuration data into the devices while they are resident on a circuit board in a system (FPGAs are introduced in more detail in Chapters 19 and 21). In the case of Quickturn's FPGA-based solution, the hardware emulator essentially comprises a "box" containing one or more circuit boards, each of which carries a large number of FPGAs. By comparison, in the case of their microprocessor-based solution, each circuit board in the hardware emulator carries a number of special microprocessor integrated circuits, each of which contains a number (say 64) of simple processor cores (where all of the processor cores on a chip and all of the microprocessor chips can run concurrently). Both of these hardware emulation solutions are combined with a simulation environment that resides on your main (host) computer. First you capture your design as a mixture of gate-level and HDL-level representations, which are subsequently passed to the hardware emulator for verification. In the case of the FPGA-based emulator, the simulation environment will map your design into the physical gates in the FPGAs, thereby allowing you to simulate hundreds of thousands of gates (or more). By comparison, in the case of the microprocessor-based solution, the simulation environment will map your design into the low-level opcodes used by the processor cores, thereby allowing you to simulate several million gates. Each of these solutions has its advantages and disadvantages depending on what you're trying to simulate, but both solutions allow you to simulate a design at megahertz speeds (software-based logic simulators can typically only manage to simulate complex systems at speeds of 2 Hz to 20 Hz). These forms of simulation are particularly appropriate for testing devices such as ASICs (which can contain hundreds of thousands of gates), because in addition to your being able to directly specify test waveforms to simulate individual components, the hardware emulator can also be connected directly into the main design. To put this another way, let's suppose that you are part of a team that's designing a circuit board that includes one or more 100,000-gate (or larger) ASICs. Due to the slow speed of software simulation, these large ASICs would typically be verified in isolation, and any board-level verification would have to wait until the physical ASICs had returned from the vendor. By comparison, a hardware emulator can be plugged into the ASIC's socket on the circuit board and used to emulate the ASIC's functionality, which allows the board-level design and verification to proceed concurrently with the ASIC's design and verification. Furthermore, while you are waiting for your "first silicon" to return from the ASIC
vendor, both the board and ASIC designers can start working in conjunction with the system engineers to test and verify the system software. Hardware emulators aren't particularly cheap, but neither is creating a complex design that requires a number of time-consuming iterations to make it work. The ability to present your design to the market in a timely manner can make the difference between success and failure, and hardware emulation may well be an appropriate solution.
The topic in this chapter was published in a condensed form under the title Digital Logic Simulation: Event-Driven, Cycle-Based, and Home-Brewed, in the July 4th, 1996 issue of EDN (www.ednmag.com), and is reproduced in its original form here with their kind permission.
With regard to the design tools discussed in this chapter, there are a number of vendors of such applications. In the case of digital logic simulation, a good starting point would be VeriBest Inc., Boulder, CO, USA (www.veribest.com, call 1-800-VERIBEST). Alternatively, if your taste runs more to cycle-based simulation and hardware emulation, then you could do a lot worse than talking to Quickturn Design Systems Inc., Mountain View, CA, USA (www.quickturn.com). Also, please feel free to contact a friend of mine at Quickturn called Chuck Paglicco (Email = chuckp@qcktrn.com) and don't forget to mention that I sent you (so he'll owe me one).
Chapter 7:
Static and Dynamic Timing Analysis "Time flies like a Ptarmigan"
In this chapter you will discover:
In Those Days of Yore
Static Timing Analysis
Dynamic Timing Analysis
If You're on a Shopping Spree
Timing Diagram Analyzers
In Those Days of Yore
If we were to peer deep into the mists of time, say around 1980, the vast majority of engineers simply didn't have sophisticated timing analysis tools at their fingertips. In those days of yore, the best most of us could hope for was a four-function calculator and a sharp pencil! To calculate the delays through a circuit, one used a design rule-book that gave the minimum, typical, and maximum (min:typ:max) delays through each gate and the estimated loading and track delays for that gate (one load equals 'x' ns, two loads equals 'y' ns, and so forth).¹ Working in this fashion was tedious and prone to error, and life only got worse as the complexity of the average design increased. Thus, the way was paved for some type of computer-aided solution, and the development of such solutions was facilitated by the fact that design engineers were gaining the ability to automatically generate netlists from their schematic capture tools as discussed in Chapter 5.

¹Remember that ns is the abbreviation for nanoseconds, meaning one thousandth of one millionth of a second, or 10⁻⁹ seconds.

Static Timing Analysis

Static timing analysis is conceptually quite simple as, to a large extent, it simply automates the process of summing gate and track delays together. First let's assume that we're considering a simple block of combinational logic (or "combinatorial" if you prefer). The user provides the analyzer with a gate-level netlist, and the analyzer then determines all of the possible paths from the primary inputs to the primary outputs. The analyzer accesses a database containing the min:typ:max delays for each gate, along with a rule-base describing the estimated loading effects and track delays, and ultimately reports the total min:typ:max delays for each input-to-output path (Figure 7-1). Note that the process of generating the netlist from the schematic and feeding it to the static timing analyzer has been omitted from Figure 7-1 for purposes of simplicity. Also note that the scenario depicted by this illustration is substantially more sophisticated than was offered by early tools.

Like many engineers, the first timing analysis package I ever laid my hands on (in 1980) was a static timing analyzer that was created internally by the group I was working with. At that time we had to key-in the netlist by hand, while the device knowledge database was in fact a set of simple ASCII text files, one for each type of component. For example, in the case of an SN7400 quad 2-input TTL NAND gate, we might have a text file called something like sn7400.del (Figure 7-2).
Figure 7-1: A static timing analyzer reports the total min:typ:max delays for each input-to-output path
Figure 7-2: The timing models for early static timing analyzers were actually simple text files such as this one (filename = sn7400.del)

When the timing analyzer found a reference to an SN7400 in the netlist, it would open up the corresponding text file to access its delays, and so on for all of the other devices in the netlist. The contents of a delay file were presented in the form of a matrix, and the analyzer simply read down the inputs on the left until it found the one it was interested in, then scanned across the row to find the delays in the appropriate output column. In fact, modern static timing analyzers behave in an almost identical manner, except that the timing information for the devices is usually more detailed than shown here (including separate delays for each input to each output for both rising and falling transitions at the outputs). Also, modern analyzers typically maintain the timing data in a compiled form, and the timing models contain some minimal amount of functional knowledge about their associated devices, such as any inversions between the inputs and outputs.
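Mechanically, the heart of the job is a longest-path traversal. Here's a hedged sketch in C of the summing process for the max delays (the data layout is invented for illustration; real analyzers track min:typ:max values, rising and falling transitions, and loading rules as well):

    #include <stdio.h>

    #define MAX_FANIN 4

    typedef struct {
        int    fanin[MAX_FANIN];  /* indices of driving gates, -1 = unused  */
        double delay;             /* max gate delay in ns, including any
                                     estimated track/loading contribution   */
        double arrival;           /* computed latest arrival at the output  */
    } Gate;

    /* Gates must be listed in topological order (drivers before loads). */
    static void compute_arrivals(Gate *g, int n)
    {
        for (int i = 0; i < n; i++) {
            double latest = 0.0;
            for (int k = 0; k < MAX_FANIN; k++)
                if (g[i].fanin[k] >= 0 && g[g[i].fanin[k]].arrival > latest)
                    latest = g[g[i].fanin[k]].arrival;
            g[i].arrival = latest + g[i].delay;
        }
    }

    int main(void)
    {
        /* g0 and g1 are driven by primary inputs; g2 sums the worst path */
        Gate g[3] = {
            { { -1, -1, -1, -1 }, 2.0, 0.0 },
            { { -1, -1, -1, -1 }, 3.0, 0.0 },
            { {  0,  1, -1, -1 }, 5.0, 0.0 },
        };
        compute_arrivals(g, 3);
        printf("max delay to g2 output = %.1f ns\n", g[2].arrival); /* 8.0 */
        return 0;
    }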
Static timing analysis is particularly well-suited to classical synchronous designs such as pipelined processor architectures, in which "chunks" of combinational logic are partitioned by blocks of registers (Figure 7-3).

Figure 7-3: A classical synchronous design consists of "chunks" of combinational logic sandwiched between blocks of registers
In this case, the primary input provided by the user is the netlist along with details about the clock (period or frequency and possibly its mark/space ratio). The analyzer understands the concept of registers, and the models for the register devices will contain details about their setup and hold times. Thus, when the analyzer calculates the delays through each block of combinational logic, it can also determine whether or not these delays will violate the setup and hold times of the registers associated with that block.

The main advantages of static timing analysis are that it is (relatively) fast, it doesn't require a test waveform, and it exhaustively tests every possible path into the ground. However, this form of analysis does nothing to check the functionality of the design, so if you've mistakenly used an AND gate where you intended to use a NAND, then the analyzer doesn't know or care and will simply tell you whether or not the delays meet your requirements. This means that static timing analysis has to be combined with some type of functional verification such as simulation. In fact, static timing analysis goes hand-in-hand with cycle-based simulation which, by some strange quirk of fate, is great at checking the functionality of a design but no use at all in verifying its timing.²

²Cycle-based simulation was introduced in Chapter 6.

Earlier we noted that one of the advantages of static analysis is that it exhaustively checks every possible input-to-output path, but this may also be considered to be a disadvantage in certain cases. Static analyzers are notorious for detecting false paths that will never be exercised in the course of the design's normal operation, so you can end up spending a lot of time instructing the analyzer as to which paths to ignore. The problem here is that there can be hundreds of such paths, and it's not unheard of for an engineer to become so carried away by the
monotony of saying "ignore, ignore, ignore,..." that a real problem becomes accidentally discarded in the process. The other main problem with static analyzers is that they function poorly (or not at all) in the case of designs that feature feedback loops, so you may be obliged to spend a substantial amount of time "breaking" the loops manually and passing the analyzer a number of different "views" of the design.
Dynamic Timing Analysis

Another form of timing verification is called dynamic timing analysis, or sometimes "worst-case" analysis. This form of analysis is based on an event-driven simulator, and it does require the user to specify a test waveform (Figure 7-4).
Figure 7-4: Dynamic timing analysis is a form of event-driven simulation which does require the user to specify a test waveform
The key difference between a standard logic simulator and a dynamic timing analyzer is that the former only uses a single min, typ, or max delay for each path, while the latter uses a delay pair: either min:typ, typ:max, or min:max. To illustrate the difference between logic simulation and dynamic timing analysis, consider the way in which they would each simulate a simple buffer gate (Figure 7-5).
Figure 7-5: Unlike a logic simulator, which operates in a single-delay mode, a dynamic timing analyzer simulates the ambiguity across a delay-pair

In the case of the logic simulator, a new value presented to input a will cause the output y to change after either the min or max delays (or the typical delay, which is not shown here for clarity) depending on which delay was selected by the user. By comparison, assuming that we've instructed the dynamic analyzer to use the min:max delay pair, the output y begins to change after the min delay and doesn't complete its transition until it reaches the max delay. Note that the vertical lines shown between the logic 0 and logic 1 levels in the dynamic timing analyzer trace DO NOT indicate an unknown X value.³ That is, the y output is considered to be performing a good transition, it's just that we don't know exactly when this transition will occur. For this reason, the guys and gals who create these tools typically call this state something like "gone high, don't know when" (there would also be a corresponding "gone low, don't know when" state).

³Unknown X values are discussed in more detail in Chapters 9 and 10.

Now assume that the output from our buffer gate is presented to the input of another gate further down the line. This second gate "sees" the ambiguity on the incoming signal and adds its own ambiguity to the signal based on its delays. And so it goes throughout the circuit, with each gate supplying its own amount of ambiguity until the signal reaches a primary output from the design.

One disadvantage with dynamic timing analysis is that it's a CPU-intensive process, being approximately three times slower than logic simulation and a lot slower than its static analysis counterpart. But possibly its main disadvantage is that it doesn't exhaustively check every conceivable input-to-output path,
because it can only evaluate those paths that are exercised by the test waveform, which throws the onus back on the user to write a good set of test vectors (see also the discussion on Fault Simulation in Chapter 8). Having said this, the main advantage dynamic analyzers have over their static timing cousins is that the dynamic approach is suitable for just about every type of circuit, including those containing feedback loops (such as asynchronous state machines) which would cause a static analyzer to throw up its metaphorical hands in horror. Also, dynamic analyzers can detect and report timing problems that static analyzers wouldn't even consider. For example, consider the case of a humble AND gate buried deep in the heart of a circuit (Figure 7-6).
Figure 7-6: Dynamic timing analyzers can detect faults that static analyzers wouldn't even look for (the figure compares single-delay mode with min delays, single-delay mode with max delays, and dual-delay mode with min:max delays)

Assume that, in this particular instance, we want input a to this AND gate to be driven to a logic 0 some time before input b is driven to a logic 1, thereby trying to ensure that output y will be held at a logic 0. Also remember that we're considering this gate to be embedded in the middle of a circuit, so each of its inputs is being driven by a "chain" of other gates, each of which will have its own min:typ:max delays. In the case of a static analyzer, the only information that will be reported is the total min, typ, and max delays from the primary inputs, through this gate, and on to the primary outputs (note that it is possible for the analyzer to provide a detailed breakdown of each portion of the delay path, but you quickly end up being buried up to your ears in data). In the case of logic simulation, if you instruct the simulator to set all of the gate delays to their min values and run the
simulation, it appears that there aren't any problems. Similarly, no problems would be reported if you performed a simulation of the circuit with all of the gate delays set to their max values. However, in the case of dynamic timing analysis, the analyzer simulates across the range of delays. As we see from the example in Figure 7-6, if the gates feeding the a input were from a batch that were running at their max delays, while the gates feeding the b input were from a batch running at their min delays, then there's a chance that the b input will go to logic 1 before the a input has fallen to logic 0, in which case we may well see a glitch appearing at the output from the gate. In many cases such a glitch would not be a problem; for example, if it occurs in the middle of one of the blocks of combinational logic in a classical synchronous design as was illustrated in Figure 7-3. However, if this gate happened to be located in the middle of an asynchronous state machine, then we would certainly want to be informed that we have a potential problem on our hands.
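The bookkeeping behind this dual-delay behavior is easy to sketch. The following C fragment (names mine, not taken from any particular analyzer) shows how each gate widens the (earliest, latest) window of an incoming transition by its own min and max delays:

    /* "Gone high (or low), don't know when": the transition happens
       somewhere between 'earliest' and 'latest'. */
    typedef struct { double earliest, latest; } Window;

    static Window through_gate(Window in, double dmin, double dmax)
    {
        Window out;
        out.earliest = in.earliest + dmin;  /* output can't change before */
        out.latest   = in.latest   + dmax;  /* ...and has changed by here */
        return out;
    }

    /* A clean edge at t=0 through a 5:15 ns gate and then a 2:4 ns gate
       arrives somewhere in the 7..19 ns window. */

Note that naively accumulating ambiguity like this is exactly what leads to the pessimism problems with reconvergent fanout discussed in the next section.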
If You're on a Shopping Spree ...
Should you happen to find yourself on a shopping spree for a dynamic timing analyzer, then there are a number of things to look for. First of all, you want the ability to attach ambiguity to the signals in your test waveform, thereby allowing you to simulate the effects of skew on the inputs. This ambiguity can be used to represent the skew from other boards and the backplane in your system, or, in the event that you are writing a waveform to test a single integrated circuit on a component tester, then this ambiguity can be used to simulate the effects of pin skew at the test head. Note that adding ambiguity to your stimulus does not mean that you have to laboriously modify each edge in the waveform. Instead, you should be provided with the ability to bring up a dialog form that allows you to specify x ns of skew on signal a, y ns of skew on signal b, and so forth. Of course, when you choose to apply this ambiguity to the stimulus, it should immediately appear in your graphical waveform display for all the world to see. Also, depending on the analyzer in question, you should be able to specify different amounts of ambiguity both before and after the original edges in the waveform, and you should also be able to specify different skew values for rising and falling edges.
should consider when purchasing a dynamic timing analyzer algorithms it uses to detect and handle reconvergent fanout as common-mode ambiguity). This refers to the case where a multiple paths, and then reconverges somewhere down the
Chapter 7 Static and Dynamic Timing Analysis
wb
g2 wd gl
g4 WO
g3 Figure 7-7: Reconvergent fanout is where a signal diverges into multiple paths and reconverges somewhere down the line Obviously this isn't a particularly meaningful circuit, but it will serve to illustrate the point. When the dynamic analyzer is evaluating the effect of the signals wb and wc on gate g4, it must recognize the fact that some portion of the ambiguity on these signals came from gate gl and is common to both paths. Thus, the analyzer must remove this common ambiguity from its calculations to prevent it from being unduly pessimistic. This sort of thing is particularly common in the case of functions such as shift registers and counters, in which a single clock signal is used to drive a number of registers. The analyzer must recognize the fact that any ambiguity on the clock is common to all of the registers, otherwise it would flag timing problems all over the place. The purveyors of dynamic analyzers may try to blind you with science as to the quality of their algorithms, but this is often a case of "the quickness of the hand deceives the eye." The key question you should ask is: "Through how many levels of logic can the analyzer track and handle reconvergent fanout?" If the answer is "one gate," then it's effectively useless and you should start to cast your eyes elsewhere. Last but not least, you should ensure that your dynamic timing analyzer supports the concept of correlation (even if you don't need it this week, you'll be sure to find a use for it at some stage, especially if you haven't got it). The idea here is that all of the gates in a particular integrated circuit package are correlated to some degree. For example, if you plunge your hand into a bucket of hex buffers and pull two devices out, then, due to process variations and manufacturing tolerances, you stand a fair chance that one of the devices will be "fast" and the other will be "slow." This is one of the reasons why simulation models are specified with min:typ:maxtimings, which, in addition to environmental conditions such as temperature and voltage, also cover process variations. The point is that, although you don't know whether a particular device will be fast or slow, you do
81
Timing Diagram Analyzers

In addition to the static and dynamic timing analysis programs introduced above, there are a variety of related tools which can be of use in evaluating a digital circuit's delay paths and timing constraints. One such product is a timing diagram analyzer such as the TimingDesigner application from Chronology Corporation, Redmond, WA, USA (www.chronology.com), which allows you to describe the relationships between different signals in the circuit in the form of intelligent graphical timing diagrams (Figure 7-8).
Figure 7-8: Signal relationships can be specified in the form of intelligent graphical timing diagrams At a first glance, a timing diagram analyzer may appear to be little more than a standard waveform editor of the type used to create stimulus for a digital simulation, but these tools are far more sophisticated than they seem. First of all they come equipped with libraries containing the propagation delay and timing violation parameters associated with thousands of common devices (you can also enter your own data into these libraries), and you can associate different signals with individual pins on specific devices. Next you can associate specific edges on signals with edges on other signals, and you can describe relationships in the form
Chapter 7 Static and Dynamic Timing Analysis
of equations such as: "The delay between 'these' two edges must be less than (or greater than) 'that' pulse width plus 5 ns." When you subsequently drag an edge or a pulse on one of the signals, the analyzer will report any timing violations that ensue. Furthermore, some of the signals may be defined as being clocks (with a specified frequency or period and mark/space ratio), and transitions on the other signals can be defined in relation to edges on a clock. This allows you to play "what-if" games like increasing the clock frequency to determine the point at which the circuit will fail. While they don't provide the same level of in-depth verification offered by simulation, timing diagram analyzers are exceptionally useful for purposes of specification and documentation, and their ability to represent cause-and-effect relationships and timing constraints in a clear and unambiguous manner is unsurpassed by almost any other tool.
The topic in this chapter was published in a condensed form under the title Time Flies When You re Analyzing It, in the August 1st, 1996 issue of EDN (www.ednmag.com),and is reproduced in its original form here with their kind permission,
With regard to the static and dynamic analysis tools discussed in this chapter, the vendors of such applications seem to come and go (or get acquired by other vendors) with the seasons, so your best bet is to root around on the Internet to see who's doing what to whom, and then talk to people who are actually using these tools and find out what their feelings are on the subject. In the case of timing diagram analyzers, however, TimingDesigner from Chronology Corporation, Redmond, WA, USA (www.chronology.com) stands proud in the crowd.
Chapter 8:
Digital Fault Simulation "Whose fault is it anyway?"
In this chapter you will discover:
Conceptually Simple Yet Capriciously Cunning
Stuck-At, Open, and Drive Faults
Fault Collapsing
Additional Fault Types
Further Considerations
Last But Not Least
Conceptually Simple Yet Capriciously Cunning

Fault simulation is conceptually very simple yet capriciously cunning at the same time. Having captured your design, you create a set of test vectors describing stimulus to be applied to the circuit along with the expected responses from the circuit. The fault simulator analyses the circuit to determine any faults which might occur in the real world (such as shorted wires or breaks in the interconnect), and then proceeds to evaluate how many of these faults would be detected by your test vectors (Figure 8-1). The criterion for detection is that a fault must manifest itself at one or more of the circuit's primary outputs in the form of a response that differs from that of the fault-free circuit.
Figure 8-1: The fault simulator determines how many faults would be detected by the test vectors at the circuit's primary outputs
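At its heart, the detection criterion boils down to a compare-against-reference loop. The following Python fragment is a minimal sketch of that loop; the simulate() function and the with_fault() method are hypothetical stand-ins for a real simulator's machinery:

    # Run the vectors once on the good circuit, then once per injected fault;
    # a fault counts as detected if any primary output differs from the reference.
    def fault_coverage(circuit, vectors, fault_list, simulate):
        good_responses = simulate(circuit, vectors)      # fault-free reference
        detected = []
        for fault in fault_list:
            faulty = circuit.with_fault(fault)           # e.g., a wire stuck at 0
            if simulate(faulty, vectors) != good_responses:
                detected.append(fault)
        return len(detected) / len(fault_list)           # the coverage figure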
Perhaps the most popular misconception about fault simulation is that its only role in life is to verify the quality of the test vectors which will eventually be used in conjunction with Automatic Test Equipment (ATE) to check physical boards. This misconception is promulgated by reference manuals whose process flows show fault simulation at the downstream end of the design cycle along with Automatic Test Pattern Generation (ATPG). However, fault simulation is also extremely efficacious in the early stages of the design (Figure 8-2).
Figure 8-2: Fault simulation should be used in the early stages of the design as part of the functional verification process
Here's the way it works: after describing the circuit and test vectors, you use a digital logic simulator to verify the functionality of your design. Maybe it passes and maybe it fails. In the latter case, you root around to determine whether the problem lies in the circuit or the test vectors, and then you fix one or the other or both. At this stage, many designers proceed directly to timing verification, but what's the point? Just because your initial test vectors appear to pass muster doesn't mean that you've got a good design, because your first-pass vectors probably don't test all of your circuit. This is where you should use fault simulation to discover which parts of the circuit have not been exercised by the test vectors, then loop back and augment the test vectors until you are satisfied that they do in fact put the design through its paces. It's only when you are convinced that you have a comprehensive test sequence that you should consider moving to timing verification.

Note that the scenario in Figure 8-2 assumes that the timing verification is performed using a dynamic timing verifier, and that all three simulators (logic, fault, and timing) use the same simulation models and test vectors. In fact, some designers (particularly at the board level) neglect any form of verification by simulation and employ static timing analysis to perform their timing verification. The arguments for and against static versus dynamic timing analysis are many and varied (see also Chapter 7). Suffice it to say that the proponents of static timing analysis would certainly point to the fact that it's exhaustive, relatively fast, and doesn't require a stimulus waveform. The problem with static timing analysis is that it myopically checks the timing without paying overmuch regard to the functionality (excepting simple cases such as understanding when an inversion has taken place). This means that static timing analysis will never inform you that you've used an AND gate where you really
need an OR; it will only report whether whatever you did use passes or fails whatever timing constraints you specified as your goal. However, we've digressed. In addition to the fact that running the fault simulator early in the design cycle increases confidence in your test vectors and, by extension, confidence in your design, there's also the added benefit that it will help you to locate any areas of the design that are difficult to test or simply untestable. This allows you to modify your design early in the design cycle to ensure its testability, thereby gaining the undying gratitude of the poor souls who are eventually going to build and test your masterpiece. (If more designers were actually forced to walk one of their creations through the layout, manufacturing, and testing processes, I'd bet my ex-wife's life savings that they would radically change the way in which they went about creating any future designs!)
Stuck-At, Open, and Drive Faults

The next point to consider is the different types of faults that the simulator might apply to the circuit. The three most common fault classes are known as stuck-at, open, and drive faults (Figure 8-3).
Figure 8-3: The three most common fault classes are stuck-at faults (on wires), drive faults (on component outputs), and open faults (on component inputs)
Stuck-at faults occur on wires and represent a short circuit to a ground plane or power supply; thus, assuming positive logic, these are abbreviated to S0 and S1 respectively. Drive faults only apply to a component's output terminals, where D0 and D1 indicate that the component has failed such that it's constantly driving either a logic 0 or logic 1 value at that component's normal driving strength (see also Chapter 9), and the DZ fault represents the cases where the component has failed or has become disconnected from the wire such that it's driving a high-impedance Z state. Correspondingly, open faults only apply to a component's input terminals: all three open fault types represent cases where the input terminal has become disconnected from the wire, where O0 and O1 indicate that
this input subsequently "floats" to a logic 0 or logic 1 (caused perhaps by an internal pull-down or pull-up resistor), and OZ indicates that the input assumes a high-impedance value.

In the case of the circuit segment in Figure 8-3, an S0 fault on wire1 would manifest itself to the outside world in an identical manner to a D0 fault on g1.y or an O0 fault on g2.a (similarly for an S1 on wire1 and a D1 fault on g1.y or an O1 fault on g2.a). The fact that, for this example, these faults return the same results leads us toward the subject of fault collapsing, which we'll discuss a little later. The main point to note at this stage is that, with the exception of a DZ, drive faults are really only of interest in the case where multiple components are driving the same wire. Similarly, with the exception of an OZ, open faults are only really of interest in the case where the wire is connected to two or more component inputs.

Of course, not all fault simulators are created equal, and many will only consider a subset of the faults described above (along with additional faults which are yet to be introduced). In fact, some fault simulators are only equipped to handle stuck-at faults, in which case the purveyors of said simulators will go to great lengths (using multi-colored charts of the highest quality) to persuade you that you don't really need to consider the other faults at all. In my humble opinion, this is akin to a computer salesman proclaiming that a mouse is an unwarranted luxury item; having the ability to distinguish open and drive faults from stuck-ats provides a much deeper understanding of any potential problems that may eventually leap up and bite you in a sensitive portion of your anatomy when you least expect it.

If you are swayed by the argument that you can get by only using stuck-at faults, another thing to watch out for is fault simulators that only handle these faults on the interconnect, but not inside the component models themselves. The main argument behind this approach is historically based on the fact that early simulators only inherently understood models of primitive gates and registers. Thus, if you wanted to create a model of a component like a multiplexer, you did so by wiring a number of simulation primitives together. This meant that, although your component model was functionally equivalent to the device in question, the model only approximated the physical reality, so many pundits proclaimed that any fault simulation of its internal structure was meaningless. In fact this is somewhat of a gray area; although there is some validity to the naysayers' arguments, I've personally detected and resolved many unexpected issues by performing fault simulations that include the contents of component models, even when the internal structures of those models only bore a passing resemblance to the real devices.
The proponents of the "don't simulate component internals" party may also point to the fact that it's difficult, if not impossible, to fully fault simulate a circuit containing large components the size of microprocessors unless you restrict yourself to the interconnect. The answer is .... yes and no. If you've got devices the size of a Pentium II on your circuit board, fault simulating every aspect of each model would bring your computer to its knees, and you would be old and gray by the time you got to see any results (if you are already old and gray, the chances are you've discovered this for yourself). However, there's no reason why you shouldn't at least fault simulate the device's primary internal registers and major data paths.

Of course, this leads us to the fact that the device models you're using may come from a third party and may therefore be encrypted. In this case, the "don't simulate component internals" brigade would gleefully point to the fact that you don't know what's inside the model anyway. However, although the model's creators may have signed a non-disclosure agreement with the device's manufacturer, it's no great secret that a component like a microprocessor includes registers such as an accumulator, program counter, and so forth. All that is required is for the modeler to make a limited number of internal constructs visible to the outside world (in the form of the fault simulator), and to write a brief application note telling the user what those constructs are called and what they represent; for example, "ACC stands for Accumulator."
Fault Collapsing

Fault simulation can be a time-consuming hobby, so anything that can be done to speed it up is very gratefully received. One particularly useful trick is fault equivalencing and collapsing, in which the simulator recognizes any faults that will manifest themselves identically at the primary outputs (equivalencing) and then only actually simulates one fault from that group (collapsing). As a simple example of this, consider a chain of inverters (Figure 8-4).
Figure 8-4: The fault simulator can recognize faults which generate equivalent results and collapse them to a single fault
A fault simulator that supports fault collapsing would recognize that an S1 fault on wire w1 would generate the same result as an S0 on wire w2, which, in turn, would generate the same result as an S1 on wire w3. Thus, the simulator would select one of these faults as being the prime fault, equivalence the other faults to it, and only simulate the prime (similarly for an S0 on w1, an S1 on w2, and an S0 on w3). The simulator would also detect any drive faults that were equivalent to open faults and, in turn, any open faults that were equivalent to stuck-at faults. Thus, even if you're simulating all fault types, the end result will probably be somewhat less horrific than you might at first fear. For example, consider the fault collapsing that might be performed on a 2-input AND gate (Figure 8-5).
Figure 8-5: The fault simulator attempts to equivalence drive and open faults to stuck-at faults

In the case of an AND gate, any logic 0 applied to an input will force the output to logic 0. This means that the O0 on g1.a can be equivalenced to the S0 on w1; the O0 on g1.b can be equivalenced to the S0 on w2; and both the S0 on w1 and the S0 on w2 can be equivalenced to an S0 on w3 (along with the D0 on g1.y). Thus, all six S0, O0, and D0 faults end up being equivalenced to a single S0 on w3 (which may itself end up being equivalenced to some other fault further down the line). Similarly, the O1 on g1.a can be equivalenced to the S1 on w1, the O1 on g1.b can be equivalenced to the S1 on w2, and the D1 on g1.y can be equivalenced to the S1 on w3. The end result is that the fault simulator only actually has to simulate four of the twelve potential faults shown here.

However, we should note that the equivalencies shown in this example are based on the assumption that wires w1 and w2 are each only connected to a single driving gate, and that wire w3 is only connected to a single load gate. Also note that stuck-at faults are considered to have the highest precedence. This means that, given a choice, the simulator will always attempt to make a stuck-at fault the prime and equivalence any drive and open faults to that stuck-at.
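For the curious, the following Python sketch mechanizes the equivalences just described for the AND gate of Figure 8-5 (the names w1, w2, w3, and g1 follow the figure; a real tool would, of course, build such structures for the entire netlist):

    # Each fault is a (type, location) pair; the table records "is equivalent
    # to" relationships, and prime_fault() chases them to the surviving prime.
    equivalences = {
        ("O0", "g1.a"): ("S0", "w1"),   # open-to-0 on an input == stuck-at-0 on its wire
        ("O0", "g1.b"): ("S0", "w2"),
        ("S0", "w1"):   ("S0", "w3"),   # a 0 on either AND input forces the output to 0
        ("S0", "w2"):   ("S0", "w3"),
        ("D0", "g1.y"): ("S0", "w3"),
        ("O1", "g1.a"): ("S1", "w1"),
        ("O1", "g1.b"): ("S1", "w2"),
        ("D1", "g1.y"): ("S1", "w3"),
    }

    def prime_fault(fault):
        while fault in equivalences:
            fault = equivalences[fault]
        return fault

    all_faults = list(equivalences) + [("S0", "w3"), ("S1", "w1"),
                                       ("S1", "w2"), ("S1", "w3")]
    primes = {prime_fault(f) for f in all_faults}
    print(len(all_faults), "faults collapse to", len(primes))   # 12 faults collapse to 4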
Additional Fault Types

In addition to the stuck-at, open, and drive faults discussed above, there are a variety of other faults that we might wish our simulator to consider. For example, some fault simulators understand the primitive gate faults that can occur in certain technologies, such as an AND gate failing in such a way that it behaves as if it were an OR gate. Another type of fault that can be useful to simulate is the short fault. Unlike stuck-ats (which imply an undesired connection to a power or ground plane), short faults represent unwanted connections between a pair of signal wires. The main problem with this type of fault is specifying the pairs of wires to be evaluated. An exhaustive approach would be to consider potential shorts between every possible combination of signal wires, but the universe could end before your simulation does. Another technique is to explicitly specify each pair of wires for which you wish to simulate a short, but this is extremely tedious to say the least. In the days when dual in-line packages were all the rage, certain simulators provided the ability to automatically generate a list of potential short faults based on adjacent pins on the component package. Unfortunately, I'm not aware of any of today's simulators that have extended this technique to be able to handle grid-array type packages (which isn't to say that such simulators don't exist).

Although all of the examples discussed thus far are based on primitive gates, some simulators understand the concept of "functional faults," which they can apply to hardware description language (HDL) representations of components, circuits, or systems. To illustrate this concept, consider the following portion of a model written in Verilog:
    always @(posedge clock)
      if (sigA == 0)
        regA <= regB;
      else
        regA <= regC;

This model segment says that a positive edge occurring on the signal called clock will cause the register called regA to be loaded with the contents of regB or regC depending on the state of the signal sigA. One example of a functional fault would be for the simulator to hold sigA at logic 0 (similar to an S0 on an interconnect) and see what happens, then repeat the exercise while holding sigA at logic 1. At the very least, simulating functional faults in this way can reveal portions of the HDL which haven't been exercised by your test waveform.
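As a rough illustration (in Python rather than Verilog, with the simulator's internals waved away), a functional fault amounts to overriding the value of sigA before the model is evaluated:

    # Model the always block as a function; stuck_sigA injects a functional fault.
    def clocked_update(sigA, regB, regC, stuck_sigA=None):
        effective = stuck_sigA if stuck_sigA is not None else sigA
        return regB if effective == 0 else regC    # the new value of regA

    # Good machine versus a machine with sigA functionally stuck at 1:
    print(clocked_update(sigA=0, regB=0xAA, regC=0x55))                 # 170 (regB)
    print(clocked_update(sigA=0, regB=0xAA, regC=0x55, stuck_sigA=1))   # 85 (regC)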
Further Considerations

As we previously noted, in addition to being incredibly useful at the front end of the design, fault simulators are also employed to verify the quality of the test vectors that will ultimately be used to check the physical Unit Under Test (UUT), which may be an individual device or a full circuit board. Some testers can only differentiate between logic 0 and logic 1 values at the UUT's primary outputs, while others can also detect high-impedance Z values. Thus, it should be possible to instruct the fault simulator as to which output values it is allowed to recognize based on the tester's specification. Another point is that a fault applied by the simulator may cause unknown X values to appear at the outputs (see also Chapters 9 and 10). In the real world these X values would actually appear at the UUT's outputs as logic 0s or logic 1s. But the simulator doesn't know what will happen in the real world, so it considers an X caused by a fault to represent a "potential detection." Assuming for the sake of this discussion that your tester can't detect Z values and that you've therefore instructed your fault simulator to only distinguish between 0s, 1s, and Xs (the simulator will coerce Zs into Xs), then it's possible to create a simple matrix to illustrate the difference between detections and potential detections (Figure 8-6).
                     Faulty value
                   0      1      X
  Good value  0    -      D      PD
              1    D      -      PD

  (D = Detected, PD = Potentially detected, - = Not detected)

Figure 8-6: Faults causing X values to occur at the UUT's primary outputs are only potentially detected
If the known-good value was logic 0 and the faulty value was logic 1 (or vice versa), then the fault has definitely been detected, but if the known-good value is logic 0 or logic 1 and the faulty value is X, then the fault is only potentially detected. The point is that a fault simulator with any level of sophistication should allow you to specify the number of potential detections that must occur before that fault is dropped from the active list, otherwise the simulator could continue to simulate that fault ad infinitum or until you reprogram it with a mallet in frustration.
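A minimal sketch of this bookkeeping (hypothetical Python, but it follows the matrix in Figure 8-6) might look like this:

    # good is the known-good output (0 or 1); faulty may be 0, 1, or X
    # (Zs having already been coerced into Xs by the simulator).
    def classify(good, faulty):
        if faulty == "X":
            return "PD"                          # potentially detected
        return "D" if faulty != good else "-"    # detected / not detected

    # Drop a fault once it's detected, or once enough potential detections pile up:
    def should_drop(results, pd_limit=4):
        return "D" in results or results.count("PD") >= pd_limit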
Some fault simulators are based on a serial algorithm, which means that they check each fault independently, one after another. This can be acceptable in a hardware simulator, but is pretty much untenable in a software implementation. Many software simulators favor a parallel approach to evaluating faults; this algorithm is particularly powerful at the start of a simulation when many faults are active, but it can become dog-slow toward the end when only a few faults remain to be detected.
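The parallel approach typically packs one "machine" into each bit of a computer word, so a single bitwise operation evaluates a gate for the good circuit and a whole batch of faulty circuits simultaneously. Here's a toy Python sketch of the idea (the encoding is illustrative, not any particular product's):

    WIDTH = 4                    # bit 0 = good machine, bits 1-3 = faulty machines
    ALL = (1 << WIDTH) - 1

    def inject(value, stuck0_mask=0, stuck1_mask=0):
        # Force a signal low/high in just the machines where it is faulty.
        return (value & ~stuck0_mask) | stuck1_mask

    # Wires a and b both carry 1, but machine 1 has an S0 fault on a,
    # and machine 2 has an S0 fault on b:
    a = inject(ALL, stuck0_mask=0b0010)
    b = inject(ALL, stuck0_mask=0b0100)

    y = a & b                    # one AND evaluates all four machines at once
    good = y & 1
    detected = [m for m in range(1, WIDTH) if ((y >> m) & 1) != good]
    print(detected)              # [1, 2] -- both faults are visible at output y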
Last But Not Least
As well as the advantages discussed above, I personally believe that using a fault simulator conveys a number of benefits, including guiding you toward creating more testable circuits and more rigorous test waveforms. If you do start to use a fault simulator, any additional time you spend up-front creating your initialization sequence will reap rewards downstream. One trick that I've found to be particularly useful is to try to initialize every register element using at least two different techniques before allowing the simulator to start applying its faults. For example, if a register has both set and reset inputs, first make one active and then the other. Alternatively, if a register only has a reset input, first pulse the reset and then load a value into the register using the clock (it doesn't matter whether the value loaded with the clock is a logic 0 or logic 1).

The reasoning behind this is that, if you only use one method such as the reset to initialize your register, then when the simulator applies an S1 fault to the reset input (assuming that this signal is active low), the register will end up containing unknown (or uninitialized) X states, which can flood throughout the simulation. This results in the simulator only making potential detections, increases its memory usage, and slows it down to a crawl. Also note that using XOR gates in your circuits makes them more testable, which can ease the task of writing your test waveform and speed up your fault simulation.

Last but not least, we might ask ourselves how fault simulators could evolve in the future. Off the top of my head I can think of at least two features that I'd love to see. I'm going to keep the first one to myself (in the hopes that it will one day make me rich beyond my wildest dreams), but I'd be more than willing to share the other one with you. Suppose that I'm designing something like a traffic light controller, and I know that I never want it to fail such that each of the roads at the intersection is presented with a green light. What I'd like to be able to do is to instruct the fault simulator to apply my test waveform to the circuit, and to inform me of any individual faults or combinations of a specifiable number of faults that would cause the circuit to fail in such a way as to present a particular combination of values to its outputs (all the traffic lights being green in this example). Obviously this could take forever if I let the simulator rampage throughout every
gate in the circuit, but this shouldn't be necessary in a mission-critical design like this. If I've done my job right as a designer, my circuit should be compartmentalized such that the controller portion of the circuit is partitioned from the driver circuitry, and the driver circuitry should contain fail-safes to prevent my worst dreams turning into nightmares. On this basis, I should be able to restrict the simulator to only exhaustively examine the driver portion of the circuit for fault combinations leading to my "all green" condition.
The topic in this chapter was published in a condensed form under the title Whose Fault is it Anyway? An Introduction to Digital Fault Simulation, in the June 6th, 1996 issue of EDN (www.ednmag.com), and is reproduced in its original form here with their kind permission. With regard to fault simulators, the vendors of such applications seem to come and go (or get acquired by other vendors) with the seasons, so your best bet is to root around on the Internet to see who's doing what to whom, and then talk to people who are actually using these tools and find out what their feelings are on the subject.
Chapter 9:
Digital Simulation Logic Value Systems "When Xs rear their ugly heads!"

In this chapter you will discover:
Cool Beans!
Cross-Product Value Sets
Interval Value Sets
Cosimulation Between Logic Value Sets
Cool Beans!

One of the administrative assistants where I spend my days is wont to meander around the office muttering obscure phrases like: "Cool Beans." I rarely understand what she means and it makes my brain hurt, so I retreat from any vain attempts to decipher these obscure mumblings into the relative sanity of topics such as the logic value systems used by digital simulators. Unfortunately "relative" is a relative term, because there tends to be more to logic value systems than meets the eye.

The vast majority of today's digital electronics are based on binary logic; that is, logic gates based on two distinct voltage levels associated with two distinct logical values. Some experiments have been performed on tertiary logic, which is based on three distinct voltage levels and logical values, and whose digits are referred to as "trits," but thus far this technology hasn't made any inroads into commercial applications (for which what's left of my brain is truly thankful).(1) But we digress.

The minimum set of values required to represent the operation of basic logic functions is logic 0 and logic 1, and it's certainly possible to implement a functional logic simulator using only these two values; for example, the home-brewed simulator discussed in Chapter 6. However, simulators that only support logic 0 and logic 1 values are somewhat limited, and the next step up the ladder is the ability to represent unknown X values. These X values can be used to represent a variety of different conditions, such as a clash between two devices driving the same wire with opposing logical signals (Figure 9-1a).(2)
Figure 9-1: Unknown X values can be used to represent clashes between signals

(1) There are advantages to tertiary logic, but these are offset by the fact that logical functions capable of generating and recognizing three distinct voltage levels require substantially more transistors than functions based on two voltage levels.
(2) The use of unknown X values is discussed in more detail in Chapter 10.
One interesting point to note is that the simulator can automatically derive the delays to and from X values from the LH (0 to 1) and HL (1 to 0) delays specified within the component models (Figure 9-1b). For example, a transition from a 0 to an X at a gate's output will assume that gate's LH delay. The reason for this is that an X in this context is assumed to represent an unknown between a 0 and a 1, so a transition from a 0 to an X actually represents either a 0 to 0 or a 0 to 1 transition, where only the latter case is of interest in the context of timing relationships. Similarly, a 1 to X transition will assume the HL delay, an X to 0 transition will assume the HL delay, and an X to 1 transition will assume the LH delay.

Last but not least, it's useful for the simulator to have the ability to represent high-impedance Z values in order to be capable of simulating tri-state devices. Thus, designers of logic simulators tend to regard 0, 1, X, and Z as being the minimum logic set required to verify the basic functionality of the majority of digital circuits.

But choosing a value set is only part of the story. In order to reduce pessimism, the simulator must employ some technique to resolve the final value on a wire being driven by multiple sources. Consider a portion of a circuit containing a capacitive element combined with a resistive driving element and a strongly driving element (Figure 9-2).
Figure 9-2: A simulator must be able to resolve the final value on a wire being driven by multiple sources
First assume that the resistive FET is enabled and the normal FET is disabled, which means that the node (wire) under consideration will be driven to a weak logic 1 by the resistive FET. Next assume that the resistive FET is subsequently disabled, which will result in the wire being maintained at a very weak logic 1 by the charge on the capacitive element. Now assume that the normal FET is enabled, at which time it will overpower the capacitive element and commence to drive the wire with a strong logic 0. Finally, assume that both the resistive and normal FETs are enabled, in which case the normal FET will be the dominant player and will continue to drive the wire with a strong logic 0.
From this we see that it's not enough for a simulator to be merely able to represent logical values; it is also necessary for the simulator to be able to represent the strengths of those values. But how should we do this? Ah, therein lies the problem ......
Cross-Product Value Sets

One intuitively appealing approach when deciding on a set of logic values is to consider each value as having a number of distinct attributes chosen from independent categories. Thus, for example, one may characterize a node's logic state and the "strength" of that state independently. A plethora of logic simulators use the values formed by the cross-product of the various value attributes; for example, consider the cross-product value set shown in Figure 9-3.
                 State 0    State 1    State X
  Forced (F)       F0         F1         FX
  Driven (D)       D0         D1         DX
  Weak (W)         W0         W1         WX
  Charged (C)      C0         C1         CX

Figure 9-3: Cross-product value systems are defined by the products of strength and state attributes

In this particular example, the Forced strength is used to refer to the value of a power supply or ground, with the underlying assumption that these signals will provide enough sink or source capability to be unaffected by the contributions made by any transistors or gates. The Driven strength is assumed to refer to the output from normal transistor switches, while the Weak strength is assumed to come from a resistive source, such as a depletion mode transistor. Finally, the Charged strength refers to the charge stored on a capacitive element, and is generally assumed to remain unchanged until overridden by some other part of the network (although some simulators allow the user to specify decay times on these values).

Unfortunately, there's a significant amount of confusion in this area, in that there are few (if any) standards with regard to the terminology. Thus, some simulators refer to states as levels or values, some refer to the Forced strength as Supply, some refer to the Driven strength as Strong, Forced, or Active, and some refer to the Weak strength as Resistive or Passive. To further add to the confusion, some
simulators regard the high-impedance Z value as being a state (or level), while others regard it as being a strength. Last but not least, some simulators include the concept of an undefined U state that can be used to differentiate between cases such as uninitialized registers and the unknown X values generated by clashes between opposing values.
However, the main problem with simulators based on the cross-product technique is that they can be overly pessimistic in certain cases, because they tend to regard the various cross-product values as forming a regimented hierarchy (Figure 9-4).
At a first glance this hierarchy may appear to be perfectly reasonable. After all, we would certainly expect a DH ("Driving High") to override a WL ("Weak Low") ..... wouldn't we? However, the situation may be less than straightforward when we start to consider more involved scenarios, such as transistors whose control inputs are presented with unknown X values (Figure 9-5).
Figure 9-4: Cross-product values are considered to form a regimented hierarchy
When the simulator comes to analyze this node, it does so by incrementally analyzing each of the driving terminals. Let's assume that both of the transistors were initially disabled and the capacitive element was originally presenting the node with a CL ("Charged Low") value from a previous evaluation. Let's further assume that the current evaluation cycle is triggered by a logic 0 being presented to the control input of the resistive FET at the same time that an unknown X is presented to the control input of the normal FET. Finally, let's assume that the simulator evaluates the driving terminals in the order shown below:

1) Capacitive element
2) Normal FET
3) Resistive FET

Figure 9-5: Problems may arise in cross-product-based simulators when unknown X values rear their ugly heads
So, the evaluation cycle commences when the simulator looks at the capacitive element and tentatively assigns a value of CL ("Charged Low") to the node. Next the simulator considers the normal FET, which, due to the unknown X on its control input, may either be driving a high-impedance Z or a DH ("Driving High") value onto the node. In this case, the simulator determines that the state of the normal FET's output terminal is some flavor of an X, but the strength of this signal is somewhat uncertain. However, assigning a Weak or Charged strength to the signal would be inappropriate, because these understate the case where the output from this transistor could be a strong logic 1, so the only reasonable course of action is for the simulator to assign a DX ("Driving Unknown") value to the node. Finally, the simulator comes to consider the resistive FET, which is clearly presenting a WH ("Weak High") value to the node, but the simulator doesn't remember that the DX value was the result of the ambiguity between a Z and a DH, so its only course of action is to leave the node set to a DX value.

While this final DX value is not wrong per se, it is certainly more pessimistic than is necessary. Of more concern is the fact that, depending on the strength of the simulator's algorithms, varying the order of the terminal evaluations may have returned a different result. That is, if the simulator had evaluated the resistive FET before the normal FET, it would have assigned a tentative value of WH ("Weak High") to the node. Then, when the simulator subsequently came to evaluate the normal FET, its algorithms may be sufficiently powerful to recognize the fact that combining a definite WH with either a Z or a DH means that the final value assigned to the node should be some flavor of a logic 1.

In fact, simulators that use cross-product sets usually attempt to perform their incremental analysis in an order that would yield a reasonable answer on the example circuit shown here, but it is possible to confound them with X values in more complex circuits. It is certainly possible to "tweak" the algorithms so as to build effective simulators using cross-product value sets, but the fact remains that such simulators are prone to make pessimistic evaluations on circuits containing X values, which can cause difficulty in initializing the circuits and leads to the over-propagation (flooding) of Xs through the circuit.
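The order-sensitivity described above is easy to demonstrate. The following Python sketch uses simplified stand-in strengths and a deliberately naive merge rule; it is not a faithful model of any real simulator, but it shows how an early DX can swallow information that a later WH could have restored:

    STRENGTH = {"C": 0, "W": 1, "D": 2, "F": 3}   # Charged < Weak < Driven < Forced

    def merge(acc, contrib):
        # Values are (strength, state) pairs; the stronger contribution wins,
        # and an equal-strength clash yields an unknown X.
        (s1, v1), (s2, v2) = acc, contrib
        if STRENGTH[s2] > STRENGTH[s1]:
            return contrib
        if STRENGTH[s2] < STRENGTH[s1]:
            return acc
        return (s1, v1 if v1 == v2 else "X")

    # Capacitive CL, then the normal FET's X treated as DX, then resistive WH:
    node = ("C", "L")
    for terminal in [("D", "X"), ("W", "H")]:
        node = merge(node, terminal)
    print(node)   # ('D', 'X') -- the WH arrives too late to rescue the node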
Interval Value Sets

The difficulties with the cross-product concept are caused by the separation of the notions of strength and state. Once a node has been set to an unknown X value of some strength, it cannot be returned to a "normal" value unless it is overpowered by a stronger signal. Thus, if a node has been tentatively set to the strongest X value, it remains at that value for the rest of the computation. As was illustrated in the previous example, this leads to conservative predictions due to
the lack of suitable alternatives. Specifically, our theoretical simulator was obliged to pick the highest strength to be on the safe side, because there was no value available to reflect the fact that the high-impedance Z component that contributed to the intermediate X value was of very low strength, and hence that this component might be overridden by later network contributions.

An alternative approach to constructing a set of possible node values is based on the concept of intervals. We commence by defining a set of prime values, each of which encapsulates both strength and state information. For example, we might commence by defining a set of prime values comprising the eight non-X values we derived in our cross-product table in the previous section along with a high-impedance Z (Figure 9-6).
Figure 9-6: An interval-value set is generated by deriving intermediate values from a set of prime values

In this case, derived interval values that do not cross or touch the Z centerline correspond to a valid logic state, and intervals below and above the line represent some flavor of a logic 0 or logic 1, respectively. By comparison, intervals that do cross the centerline represent X values (the X values in the cross-product set we discussed earlier correspond to intervals with equal strengths at their end points). The end result of all of this is that X values in the interval value set represent the ambiguity as to which base values best represent the true node value. As we shall see, this is far more satisfactory than thinking of X as an independent logic state.
When an interval value simulator merges two node values, it takes account of the fact that some prime values will overpower other prime values, and then chooses the smallest interval that covers all of the possible remaining node values. However, unlike the cross-product technique, the interval value set can represent X values without losing track of the strengths of the signals that lead to those values. To illustrate this aspect of the interval value set, let's return to consider the circuit problem we looked at earlier (Figure 9-7).

Figure 9-7: Interval-value-based simulators aren't daunted when unknown X values rear their ugly heads

When the interval value simulator compares the CL ("Charged Low") from the capacitive element with the ambiguity between the high-impedance Z and DH ("Driving High") values arising from the normal FET, the resulting intermediate value will be the interval encompassing the range between CL and DH. When the WH contribution from the resistive FET is subsequently discovered, the simulator understands that the prime WH value from the resistive FET will overpower the weaker CL component from the capacitive element, so it can narrow the range of the final nodal value to the interval encompassing WH and DH. Although there remains some ambiguity about the strength of this signal, it still boils down to some flavor of a logic 1, which is a sensible answer. Furthermore, the act of merging the contributions from the individual terminals driving the node is both commutative and associative, which means that the simulator can process the network in any order without affecting the final value that will be assigned to the node. Finally, we should note that the interval value set is not (of itself) particularly CPU intensive, because the intermediate evaluations can be performed using a lookup table approach.
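To make the interval idea concrete, here is a small Python sketch. The prime values are ordered along a single axis from strongest low to strongest high with Z at the center, as in Figure 9-6; the merge shown here only widens intervals, and the overpowering step that a real simulator performs is reduced to a hand-applied narrowing for brevity:

    AXIS = ["FL", "DL", "WL", "CL", "Z", "CH", "WH", "DH", "FH"]
    IDX = {v: i for i, v in enumerate(AXIS)}

    def widen(a, b):
        # The smallest interval covering both contributions.
        return (min(a[0], b[0], key=IDX.get), max(a[1], b[1], key=IDX.get))

    def flavor(lo, hi):
        if IDX[hi] < IDX["Z"]: return "some flavor of logic 0"
        if IDX[lo] > IDX["Z"]: return "some flavor of logic 1"
        return "X (the interval spans the Z centerline)"

    node = widen(("CL", "CL"), ("Z", "DH"))   # capacitive CL vs the FET's Z-or-DH
    print(node, "->", flavor(*node))          # ('CL', 'DH') -> X, strengths retained

    node = ("WH", "DH")                       # the prime WH overpowers CL and Z...
    print(flavor(*node))                      # ...so: some flavor of logic 1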
Cosimulation Between Logic Value Sets

Designers of digital simulators are faced with disparate considerations when it comes to specifying the logic value set they intend to implement. If the set is too small, it may not be possible to precisely describe the actual node value using one of the available values, which means that the simulator has to choose an approximation. If this approximation involves some variant of an unknown X, then it may cause effects beyond those dictated by the network under consideration, thereby resulting in conservative or pessimistic evaluations. By comparison, if the logic value set is too large, it becomes difficult to determine
whether the simulator's predictions are correct, and there may also be problems in displaying the results in a form that is meaningful to the user. These problems are exacerbated when we come to consider the cosimulation of two or more simulators, each of which has its own way of representing the world in the form of its logic value set.

The two most popular Hardware Description Languages (HDLs) in use today are VHDL and Verilog. In the case of VHDL, one can essentially create one's own data types and logic sets, and one can also define the resolution function to be used, where a resolution function controls how conflicts between signals of different values are to be resolved. By comparison, Verilog is restricted to a choice between a compact signal set comprising 0, 1, X, and Z, or an expanded set in which signals include both the logic value and associated strength information. This expanded set, which is of the interval value persuasion, comprises 15 prime states. In addition to the Z state, there are seven flavors each of high and low which, in descending order of strength, are as follows: Supply Drive, Strong Drive, Pull Drive, Large Capacitive, Weak Drive, Medium Capacitive, and Small Capacitive.
In the not-so-distant past, designers tended to purchase a simulator based on the HDL of their choice and stick to it. However, there is an increasing trend towards the co-development of products across different companies, and projects spanning multiple sites and multiple companies must be able to accommodate existing investments in, and individual preferences for, diverse languages. Thus, there is an increasing requirement for simulation environments that are capable of supporting the cosimulation of VHDL and Verilog, which means that it is necessary to define the way in which signals from a module in one domain will be coerced, or mapped, into signals understood by the other domain. Although VHDL allows users to define their own data types, the majority of cosimulation environments only support mapping at the VHDL side of the simulator-to-simulator interface using signals based on the "std_logic" data types as referenced in the IEEE-1164 standard. This standard defines "std_logic" as having nine values as follows:

U - Uninitialized unknown
X - Strong unknown
0 - Strong logic 0
1 - Strong logic 1
Z - High impedance
W - Weak unknown
L - Weak logic 0
H - Weak logic 1
- - Don't care
Note that the unknown U value is particularly useful for distinguishing cases such as uninitialized registers from X values caused by clashes between multiple devices driving opposing signals onto a common wire. Also note that the "don't care" value is of little interest to a simulator, but is instead predominantly intended for use by synthesis utilities. Be this as it may, the problem remains that in order for Verilog/VHDL cosimulation environments to function, it is necessary to define some technique to map between the extended Verilog and VHDL "std_logic" value sets (Figure 9-8).
Figure 9-8: One possible mapping between the extended Verilog and VHDL "std_logic" value sets

Note that the mapping shown in Figure 9-8 was originally proposed in the paper entitled "Standard Verilog-VHDL Interoperability," which was presented at the Spring 1994 VIUF conference by Victor Berman of Cadence Design Systems Inc., and which was subsequently passed to the IEEE PAR-1076.5 committee for their consideration as part of a VHDL-Verilog interoperability standard.
The topic in this chapter was published in a condensed form in a 1997 issue of EDN (www.ednmag.com), and is reproduced in its original form here with their kind permission.
Chapter 10:
Unknown X Values "Beware, here be dragons!"

In this chapter you will discover:
Beware, Here Be Dragons!
Logic Value Sets
I Don't Know and I Don't Care!
What Does "Unknown" Actually Mean?
Who's Making the Decisions Around Here Anyway?
Cringing Whiners or Reckless Thrill-Seekers?
Xs and Initialization
Xs in Mixed-Signal Environments
Xs of the Future
Static versus Dynamic Xs
Inverse (NOT) Xs
Xs with ID Numbers
And Finally
Beware, Here Be Dragons!

In ancient times, the phrase "Beware, here be dragons" was penned on maps to indicate unknown territory. Today, the same words of caution should be applied to the X value used in digital simulation. This value, which is employed to indicate an unknown state, directly affects anyone using a digital simulator, yet few users fully appreciate all of its ramifications. This lack of understanding can be extremely unfortunate, because the quality of simulation results can be drastically compromised by the underlying (and often incompatible) assumptions made by the people who write simulation models and the poor souls who actually end up having to use them.
Logic Value Sets
Each digital simulator can represent a specific number of logic values. For example, certain specialized simulators only consider signals as carrying logic 0 and logic 1 values, while others might consider 128 different values or more. Additionally, the way in which these values are represented and evaluated varies between simulators. Some simulators employ cross-product techniques, while others prefer to use an interval-value approach. Digital simulation logic value sets were introduced in excruciating detail in the previous chapter, so for the purposes of these discussions we will concentrate on the almost universally available logic value subset comprising 0, 1, X, and Z (where X and Z are used to represent "unknown" and "high-impedance," respectively).
I Don't Know and I Don't Care!

In addition to 0, 1, X, and Z, most Hardware Description Languages (HDLs) also support a ? value (VHDL uses the "-" character) meaning "don't care." In fact, ?s aren't really considered to be true simulation values in the sense that they can't actually be assigned to outputs as driven states. Instead, ?s are predominantly used to describe the way in which a model's inputs should respond to any signals that are presented to them. For example, consider the model for a simple 2:1 multiplexer (Figure 10-1).
Chapter 10 Unknown X Values .,, N,,,
,, .
.
.
.
.
.
.
.
.
.
.
,,,,, , , , ,
2:1 Mux ii|
b
~
O, O, ? => O, O, 1 7=>1; 1, 7 , 0 = > 0 ,
Y~
ii!iiiiiiiiiil i "
1,?,1 =>1;
ENDCASE;
~el Figure 10- I: Pseudo HDL for a 2: I multiplexer using ? = "don't care" states
For the sake of completeness (and to prevent me from receiving a mailbag of irate letters), we should also note that some HDLs do permit "don't care" values to be assigned to outputs. In this case, however, the "don't cares" are intended for future use by a logic synthesis utility and not by the digital simulator itself (the simulator automatically converts these assignments to unknown X values at run time). One of the most common mistakes made by novice simulation modelers stems from the fact that data-books tend to use X characters to represent "don't care" conditions. If the model writer neglects to translate these into ?s, which represent "don't cares" to the simulator, then somewhere down the line, some poor soul is going to spend one heck of a long time trying to figure out what's happening (and I'm tired of that someone being myself).(1)

(1) Call me "old-fashioned" if you will.
What Does "Unknown" Actually Mean?

A significant problem with today's digital simulators is that they tend to use Xs to represent a wide variety of different conditions depending on the portion of the circuit in which they appear. For example, consider a D-type flip flop (Figure 10-2).
Figure 10-2: X values can be used to represent uninitialized registers
Let's assume that power has recently been applied to the system and that the register was powered-up with its clear input in the inactive state. Thus, the unknown X value on the register's output represents an uninitialized state. Additionally, if we assume that sufficient time has elapsed for the register to stabilize internally, then we may also say that this X represents a good, stable logic 0 or logic 1 value, we just don't happen to know which particular logic value it is. Now consider two tri-state buffers driving the same node. Let's assume that both of the buffers are enabled, and that one is attempting to drive a logic 0 value while the other is attempting to drive a logic 1 (Figure 10-3).
Figure 10-3: X values can be used to represent clashes between multiple signals driving the same wire
In this case, the physical voltage on the output depends on the relative drive strengths of the two tri-state buffers, and the X may be used by the simulator to "warn" downstream logic functions that this signal is unknown. In a worst-case scenario, this X potentially represents an uncontrolled oscillation of unknown frequency and unknown amplitude. As a final example, consider three inverters connected so as to form a feedback loop (Figure 10-4).
Figure 10-4: X values can be used to represent a controlled oscillation
In this case, the X being generated by the inverter loop actually represents a controlled oscillation between good logic 0 and logic 1 values (in this context, the term "good" refers to the fact that the output isn't stuck at an intermediate voltage level, but that it is achieving real logic 0 and logic 1 thresholds). So without trying particularly hard we've discovered three cases where the simulator might use an X value to represent different things. In fact the logic value sets employed by some HDLs do support the concept of an uninitialized U value (for example, the VHDL "Standard Logic Value" set as defined in the IEEE
1164).(2) These U values can be used to distinguish uninitialized registers from other types of Xs, but the situation is still far from satisfactory ......

(2) This nine-value set is sometimes unofficially referred to as MVL-9.
Who's Making the Decisions Around Here Anyway?
If you're using digital simulation to verify a design, and if your company is creating its own simulation models, then it is absolutely imperative for the project leaders to decide and document exactly what they want X values to represent (or, more precisely, how they want their simulation models to deal with Xs). For example, consider the case of a 2:1 multiplexer, which has logic 1 values presented to both of its data inputs and an X presented to its select input (Figure 10-5).
Figure 10-5: Someone has to decide what happens when an X value is presented to a multiplexer's select input
The decision that has to be made is: "What do we want this model to generate as its output?" Remember that the X driving the multiplexer's select input is coming from some other part of the circuit (as seen by the simulator). But the simulator (and thus the person writing the model) cannot differentiate between Xs, and it is therefore not possible to determine what any particular X represents. One option is to assume that Xs are generally "well-behaved"; that is, that an X either represents a stable logic 0 or logic 1 as shown in Figure 10-2, or even a well-defined oscillation as shown in Figure 10-4. If we do decide to assume that Xs are well-behaved, then we may take the optimistic attitude that the output from the multiplexer should be a logic 1, on the basis that if both of the inputs are logic 1, then it doesn't really matter which one is being selected. On the other hand, if we acknowledge that Xs may actually represent an uncontrolled oscillation of unknown frequency and amplitude, then we should really take the pessimistic approach and cause the output to drive an X.
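As a sketch of the two policies (in Python, with an illustrative encoding that isn't from any particular simulator):

    def mux2(a, b, sel, policy="pessimistic"):
        if sel == "X":
            if policy == "optimistic" and a == b and a in ("0", "1"):
                return a      # the data inputs agree, so the select can't matter
            return "X"        # assume the worst: the X could be an oscillation
        return a if sel == "0" else b

    print(mux2("1", "1", "X", policy="optimistic"))    # 1
    print(mux2("1", "1", "X", policy="pessimistic"))   # X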
Cringing Whiners or Reckless Thrill-Seekers?

We can use the final decision to judge the cowardliness versus aggressiveness of the project leaders. Are they merely cringing whiners of no account, or are they reckless irresponsible thrill-seekers who want to obtain an adrenaline-rush on the company's budget? Actually, I'm not advocating any particular view as to how Xs should be treated (I could but I won't), but I am stating that someone, somewhere has to make an informed decision. The absolutely worst thing that can happen is for different models to be created by multiple modelers without any documented standard as to how their models should view and handle Xs. If this is the case, then "Murphy's Law"(3) dictates that the different modelers are guaranteed to use different approaches. The result is that, without a company standard, the end-user doesn't know what any individual model may do when presented with unknown X values. If a designer uses two models with identical functions but created by different modelers, the models could respond differently to the same stimulus (this is generally not considered to be a good thing to happen). Even worse, consider a case where one model is a high-drive equivalent of the other; by simply exchanging models to test the effects of using a different drive capability, designers may completely change their simulation results, which can potentially consume endless painful hours before it's tracked down to differences in the way in which the two simulation models handle X values.(4)

(3) Anything that can go wrong will go wrong! This was first espoused in a more sophisticated form by Captain Edward A. Murphy in 1949 as a good working assumption in safety-critical engineering.
(4) In addition to internally-developed models, it is also necessary to define exactly what you expect when you're acquiring models from outside sources; especially if these models are coming from multiple vendors.
Xs and Initialization
X values can perform a number of different (and, as we have seen, often incompatible) roles in digital simulation, but one very common role is that of indicating uninitialized elements. In this case, some users take the view that every memory element should power-up containing an X, and if you can't clear them out as part of your initialization sequence then "shame on you." In the real world, however, there's no such thing as an X, and designers are endlessly inventive in making use of the fact that, in certain cases, it doesn't matter whether a particular element contains a logic 0 or a logic 1. For example, consider a number of D-type registers configured as a divide-by-n counter. In some applications, the designer may simply not care how the individual elements of the counter initialize, so long as the little rascal counts.
3Anything that can go wrong will go wrong! This was first espoused in a more sophisticated form by Captain Edward A. Murphy in 1949 as a good working assumption in safety-critical engineering.
4In addition to internally-developed models, it is also necessary to define exactly what you expect when you're acquiring models from outside sources, especially if these models are coming from multiple vendors.
Now a purist would
recommend using registers with clear inputs, but the overhead of tracking an additional clear signal around the circuit may be unacceptable to the designer. Similarly, in the case of RAMs, one would generally both expect and require them to power-up with Xs in the simulation to indicate that they contain random logic 0 and logic 1 values. However, if the output from the RAM feeds into some other logic such as a state machine which has already been initialized, then the Xs from the RAM may "escape" into this downstream logic and "poison" it. Unfortunately, there is no all-embracing answer that covers every situation. Almost every digital simulator will allow you to "force" a value onto a selected signal. For example, in the case of the inverter loop we discussed in Figure 10-4, it would be possible to force one of the signals forming the loop to a logic 0, hold that value for a sufficient amount of time for its effect to propagate around the loop, and then remove the forced value to leave the loop acting as an oscillator. However, you are advised to be parsimonious in your use of this technique, because you've just introduced something into your simulation that does not reflect the circuit's real-world behavior, and it is not uncommon for problems to arise sometime in the future when the circuit is revised. In particular, you must document what you've done and ensure that other team members are aware of your use of this methodology. Another common technique is to use an internal simulator function to randomly coerce uninitialized X values into logic 0s and logic 1s. This can be a very useful strategy, but, if the simulator supports it, you should restrain yourself to only targeting specific "trouble spots." Also, at a minimum, you must ensure that you repeat the simulation with a variety of different random seed values. Perhaps the best advice one can offer is that hardware initialization techniques should be employed wherever possible, and that simulator "tricks" should be used sparingly and with caution. It is not unheard of (he said tongue-in-cheek) for a design to function in the simulation domain only to fail on the test bench, because the designer used the simulator to force conditions that simply could not occur in the physical world.
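As an illustration of the random-coercion technique just described (a sketch only; the register names and the initialize function are hypothetical, not any simulator's actual interface), note how different seeds produce different power-up patterns, which is precisely why the run must be repeated:

    import random

    def initialize(registers, seed):
        """Coerce every uninitialized (X) element to a random 0 or 1."""
        rng = random.Random(seed)
        return {name: rng.randint(0, 1) if value == "X" else value
                for name, value in registers.items()}

    power_up = {"count0": "X", "count1": "X", "mode": 0}

    # One seed exercises only one of the 2**n possible power-up patterns,
    # so the simulation should be repeated with several different seeds.
    for seed in (1, 2, 3):
        print(seed, initialize(power_up, seed))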
Xs in Mixed-Signal Environments
Last but not least, particular care must be taken with regard to Xs in mixed-signal designs, in which a digital simulator is interfaced to, and is simulating concurrently with, an analog simulator. Obviously, X values are absolutely meaningless in the analog domain, so they have to be coerced into voltage levels representing artificial logic 0s and logic 1s as they are passed from the digital simulator to its analog counterpart.
The problem is that the analog portion of the circuit can act like an "X filter." For example, consider a digital portion of a circuit containing uninitialized Xs driving into an analog portion, which in turn feeds back into the digital portion. The simulation environment will remove Xs by coercing them into logic 0s and logic 1s as the signals move across the digital-to-analog boundary, but there is no way to restore them at the analog-to-digital interface. Thus, the downstream digital portion of the circuit sees an optimistic view of the world, which may lead portions of the circuit to appear to be initialized when, in fact, they are not. Some mixed-signal simulation environments only offer the option to coerce the X values into logic 0s (or logic 1s) as they are handed over to the analog simulator. This technique should generally be avoided. Wherever possible, you should employ the technique of coercing the Xs to random logic 0s and logic 1s. Additionally, some systems combine an initial random assignment with an alternating scheme, in which every subsequent X on a given signal is coerced to the opposite logic value to the one used previously. In all of these cases, you should plan on performing a number of simulations using a variety of random seed values. The moral of our story is that X values should be treated with both respect (they can be extremely useful) and caution (they can potentially be the source of diverse and subtle problems). It is not enough to simply say that Xs mean unknown, because first we have to define what we mean by "unknown." Similarly, it is absolutely essential to ensure that everyone involved in creating and using a particular set of simulation models is in complete agreement as to exactly how they expect those models to behave, both in the way the models generate Xs and the way in which they respond to them.
Xs of the Future
The X values that we know and love so well are not necessarily the same Xs we will play with in the future. There are a number of possibilities for the evolution of Xs that developers of digital simulators might consider, should circuit designers determine that they need such capabilities. A few of these options are introduced below.
Static versus Dynamic Xs
As we previously discussed, one of the problems with current digital simulation technology is that Xs are used to represent multiple conditions, from a steady-state and well-behaved unknown (which we might call a "static X"), all the way through to an uncontrolled oscillation of unknown frequency and unknown amplitude (which we may call a "dynamic X"). One solution would be for
both the designer and the simulator to be able to distinguish between these extremes. For example, we might use two symbols, X and #X, to represent static and dynamic unknowns, respectively. In this case, an uninitialized register element (as shown in Figure 10-2) could generate well-behaved X values, while two gates driving incompatible values onto a common signal (as was illustrated in Figure 10-3) could result in a more pessimistic #X value. To illustrate one possible application of how these two values could be used, consider a modified pseudo-HDL representation of a 2:1 multiplexer (Figure 10-6).
y := CASE {sel} OF
        0  => a,
        1  => b,
        X  => a EQU b,
        #X => #X;
     ENDCASE;
Figure 10-6: Pseudo-HDL for a 2:1 multiplexer using X and #X values
Note that for the purposes of this discussion, the EQU operator in our pseudo-HDL is assumed to return an X if the values on inputs a and b are different, and to return whichever value is on a and b if they are the same. As we see, unlike today's digital simulators, in which everyone involved (the model writer and the model user) would have to agree whether an unknown X applied to the select input should cause an optimistic or pessimistic response at the output, the ability to differentiate between X and #X would allow the model to respond appropriately in both cases.
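In a conventional programming language, the pseudo-HDL above might be rendered along the following lines (a Python sketch under the same assumptions, with "#X" spelled DX because # starts a comment in Python):

    X, DX = "X", "#X"  # static unknown and dynamic unknown

    def equ(a, b):
        """Return the common value if a and b agree, otherwise a static X."""
        return a if a == b and a not in (X, DX) else X

    def mux2(a, b, sel):
        if sel == 0:
            return a
        if sel == 1:
            return b
        if sel == X:
            return equ(a, b)  # well-behaved unknown on the select input
        return DX             # dynamic unknown: stay pessimistic

    print(mux2(1, 1, X))   # -> 1   (the selection doesn't matter)
    print(mux2(1, 0, X))   # -> X
    print(mux2(1, 1, DX))  # -> #X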
Inverse Xs
Today's simulators do not consider the effect that inverting logic functions have on Xs. For example, if an X is presented to the input of a simple NOT gate, the resulting output from the function is also X. Thus, another possibility for the developers of simulators would be to introduce the concept of NOT X (which we might represent as ~X). The concept of ~X values could greatly reduce pessimism and aid the simulator in clearing out uninitialized unknown values (Figure 10-7).
Figure 10-7: Using ~Xs to reduce pessimism
This is obviously a contrived example, but it serves to illustrate the point. Assume that the circuit has recently powered-up and the registers contain unknown states. With today's simulators, you could apply clock pulses to clock1 until the end of time without any useful effect. However, if the simulator inherently understood the concept of ~X, then the first positive edge on clock1 would cause that register's q output to toggle from its original X to a ~X value (similarly, the complementary qb output would toggle from a ~X to an X). The simulator could be made to recognize that one of these two values would have to represent a logic 0, which would cause the downstream register to be placed into a known state. Thus, even though the contents of the first register remain unknown, the simulator could use its transition from X to ~X to determine that the downstream register should now hold a known value, which could subsequently be used to clear unknowns out of other portions of the circuit.
Xs with ID Numbers
As a final suggestion (and remembering that there are many other possibilities), we might consider giving each X a unique identification number (ID). Every X in existing simulators is indistinguishable from its counterparts, which can result in undue pessimism (Figure 10-8).
Figure 10-8: Today's Xs are unduly pessimistic (an XOR gate fed on both inputs from a common X source returns X today, where a logic 0 is what we'd prefer)
Even in a relatively simple case involving an XOR gate, the gate cannot recognize the fact that the X applied to both of its inputs comes from a common source. If the simulator could tell that both of these Xs were in fact the same (and assuming that we know that this is a "well-behaved" static X and not a dynamic #X as discussed above), then the output from the gate could be assigned the less-pessimistic value of logic 0, which could aid in initializing other downstream logic (remember that 0 XOR 0 = 0 and 1 XOR 1 = 0). The potential benefits increase dramatically if we consider a signal that diverges into multiple paths, where each path passes through several levels of logic, and then two or more of the paths reconverge at some point downstream. In fact, tracking Xs in this way would be similar in principle to the way in which today's dynamic timing analyzers resolve timing pessimism in circuits exhibiting reconvergent fanout (also known as common-mode ambiguity).(5) Another, perhaps less obvious, application of Xs having IDs could be in pinpointing an X's origin. Assume that you're running your first-pass simulation, you're only monitoring the primary outputs from the circuit, and at some stage during the simulation you see some Xs in the output waveform display. The problem is that these Xs could have originated deep in the bowels of the circuit thousands of time-steps in the past. With today's simulators, your only recourse is to rerun the simulation and work your way back from the suspect primary output like a salmon swimming upstream. However, if the Xs had unique IDs, which were related in some way so as to keep track of parent Xs and their offspring, then it would be feasible to click your mouse on an X in the waveform display, and for the simulator to inform you that: "This X originated at time 6854 ns at gate G4569". Thus, you could immediately target the offending gate and monitor all of the signals in its immediate vicinity to quickly pinpoint and isolate the problem.
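A minimal sketch of the idea (the UnknownX class and its fields are hypothetical, not any real simulator's data structures): each X carries a unique ID and a record of its origin, so reconvergent Xs can be resolved and stray Xs can be traced back to their source:

    class UnknownX:
        """A well-behaved (static) unknown tagged with a unique ID."""
        _next_id = 0

        def __init__(self, origin):
            self.id = UnknownX._next_id
            UnknownX._next_id += 1
            self.origin = origin  # e.g. "time 6854 ns at gate G4569"

    def xor(a, b):
        if isinstance(a, UnknownX) and isinstance(b, UnknownX):
            # The same ID means the same source, and 0 XOR 0 = 1 XOR 1 = 0.
            return 0 if a.id == b.id else UnknownX("XOR of unrelated Xs")
        if isinstance(a, UnknownX) or isinstance(b, UnknownX):
            return UnknownX("XOR with an unknown input")
        return a ^ b

    x1 = UnknownX("uninitialized register R1")
    print(xor(x1, x1))  # -> 0, where today's simulators would report X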
And Finally
Each of the above suggestions (X versus #X versus ~X versus Xs with IDs) could be useful in isolation, but a combination of all of them could dramatically improve the quality of digital simulations. One downside to all of this is that even today's simple Xs tend to negatively impact simulation speed, because they can procreate and propagate throughout a circuit at a frightening rate. The more sophisticated Xs discussed here, especially those with individual IDs, would certainly slow the simulator down even further. However, there are a number of ways to mitigate these detrimental effects. For example, one suggestion is for the simulator to only differentiate between X, ~X,
5Dynamic timing analysis was introduced in Chapter 7.
and Xs with IDs during the time when the circuit is undergoing its initialization sequence. Once the circuit has been initialized (at a time specified by the user), the simulator could then revert to only considering X and #X values. And finally, we should note that this excursion into the world of "Wouldn't it be nice if..." has only scratched the surface of what is possible. Ultimately, it is you, the designers in the trenches, who will determine how simulation tools evolve in the future by communicating your requirements to their developers. So don't be shy: "Hug a developer today!"
The topic in this chapter was published in a condensed form under the title Xs in Digital Simulation: Beware, Here be Dragons, in the October 12th, 1995 issue of EDN (www.ednmag.com), and is reproduced in its original form here with their kind permission.
Chapter 11:
Analog and Mixed-Signal Simulation
"Does it wriggle, or does it go ker-thunk?"
In this chapter you will discover:
There's No Such Thing As Digital! ................................................ 120
Beware of Terms Like "Clocking Frequency" ..................................... 120
Analog Simulation ............................................................... 121
Digital Simulation .............................................................. 125
Mixed-Signal Verification Strategies ............................................ 126
A/d, A/D, and a/D ............................................................... 127
Alternative Cosimulation Strategies ............................................. 128
Mixing A/D and A/d .............................................................. 131
Summary ......................................................................... 132
There's No Such Thing As Digital!
The phrase "mixed-signal" is typically understood to refer to designs containing both analog and digital functions or components, but this is something of an oversimplification. In the real world every electronic component behaves in an analog fashion, but these components can be connected together so as to form functions whose behavior is amenable to digital approximations. As the speed, or clocking frequency, of a system increases, we start to move into an area known as "High-Speed Design," in which circuits containing only digital functions begin to exhibit increasingly significant analog effects in their interconnect. At some point (which depends on the system and the technology and is subject to debate) these effects become significant enough that they must be considered in the analog domain. As the clocking frequency increases still further, the digital functions themselves begin to exhibit analog effects. At sufficiently high frequencies, which vary by system and technology, "digital" and "analog" components and functions become almost indistinguishable. Thus, the phrase mixed-signal should actually be taken to refer to designs that exhibit both analog and digital characteristics. This distinction is important, because many designers are moving into the mixed-signal arena, even though they may consider their designs to be "purely digital."
Beware of Terms Like "Clocking Frequency"
The introductory discussions above were wont to say things like: "As the clocking frequency of a system increases..." But we have to be very careful here. Many people associate High-Speed Design with higher clocking frequencies, but the two are not directly related. What is important is the rate of change of signals; that is, the speed of their "edges" (Figure 11-1).
Figure 11-1: Edge speed is more important than clock frequency when it comes to categorizing "high-speed" designs
The three signals shown in this figure all have the same frequency, but they transition between logic 0 and logic 1 values at different speeds. As the edge speed increases we move closer to having a perfect square wave. But wait! A square wave is actually composed of multiple sine waves at different harmonics. The closer we approach a perfect square wave, the higher the frequencies of the sine waves required to form it. This is also true of individual edges; the faster the edge rate, the higher the frequencies of that edge's harmonic components. So "clock frequency" is something of a "red herring." When we increase the frequency of the system clock, we also need to increase the edge-speed of our signals so as to squeeze the same amount of activity into each clock cycle. To a large extent it's the high-frequency components of these "sharper edges" that cause us problems. Thus, some engineers may experience "high-speed" design problems in products driven by relatively slow system clocks.
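A widely quoted rule of thumb (an approximation borrowed from general signal-integrity practice, not from this chapter's text) puts the significant frequency content of an edge at roughly 0.35 divided by its 10%-to-90% rise time, which makes the point numerically:

    def knee_bandwidth_hz(rise_time_s):
        """Approximate significant frequency content of an edge (rule of thumb)."""
        return 0.35 / rise_time_s

    # Two signals with identical clock frequencies but different edge speeds:
    for t_rise in (10e-9, 1e-9):  # 10 ns versus 1 ns rise times
        print(f"{t_rise * 1e9:4.0f} ns edge -> ~{knee_bandwidth_hz(t_rise) / 1e6:.0f} MHz")

The 1 ns edge carries roughly 350 MHz components regardless of how slowly the system clock is ticking, which is exactly why "slow" designs can still exhibit high-speed problems.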
Analog Simulation
With the exception of certain applications such as computers, the majority of electronic designs in the 1960s and early 1970s contained substantial amounts of analog circuitry. One reason for this was that there was a relatively limited variety of digital functions available as integrated circuits, and the majority of these were at the small-, medium-, and large-scale integration level of complexity.(1) In addition to the lack of availability of digital functions, speed requirements demanded that certain portions of designs were implemented in analog. Clock frequencies of typical mid-1970s digital circuits were in the order of 5 MHz,(2) so signal processing or conditioning was predominantly handled by analog circuitry. Analog design can be somewhat esoteric (as can some analog designers), which prompted a number of universities to begin investigations into computer-based analog design aids during the latter half of the 1960s. One of the first analog simulators to be presented to the industry was the Simulation Program with Integrated Circuit Emphasis (SPICE), which emerged from the University of California at Berkeley in the 1970s. SPICE 1 became available around the beginning of the 1970s, while the more popular SPICE 2 appeared around the middle of the 1970s. There were a number of versions of SPICE 2, commencing with 2A1 and culminating in 2G6.(3) At that time, commercial vendors almost
1By one convention, SSI = 1 to 12 gates, MSI = 13 to 99 gates, and LSI = 100 to 999 gates.
2The phrase "in the order of" is a slippery one (which is, of course, why I use it). In fact "in the order of 5 MHz" means anywhere from 500 kHz to 50 MHz, where the majority of mid-1970s digital circuits favored the lower end of this range.
3There was a 2G7 version of SPICE, which was essentially 2G6 with a BSIM MOSFET model, but it never gained wide acceptance.
uniformly based their products on Berkeley SPICE, and most were based on version 2G6. The same was true of in-house systems created by companies such as National Semiconductor and Texas Instruments. All of the SPICE 2Gx versions were written in FORTRAN and were primarily intended to be run as batch processes, which is the way in which many designers still think of analog simulation today. Around 1988 to 1989, Berkeley developed SPICE 3, which was, to a large extent, SPICE 2 rewritten in C and restructured in an attempt to make it easier to add models. Additionally, SPICE 3 had a somewhat snazzier user interface, which was called NUTMEG (a spice, get it?). Unfortunately SPICE 3 contained (and still contains) a number of problems, including some serious algorithmic bugs. However, it is important to remember that (to a large extent) Berkeley's goal is research rather than generating production code, and in this respect they have succeeded beyond all expectations. To understand the problems involved in cosimulating analog and digital simulators, it is first necessary to understand the way in which these simulators operate. In the case of an analog simulator, a circuit is represented as a matrix of differential equations and, for a given stimulus, the simulator solves the matrix in an iterative way as it attempts to converge on a solution. Consider the simple circuit shown in Figure 11-2.
Differential equation: RC x (dVout/dt) + Vout = Vin
Analytical solution: Vout(t) = 1 - e^(-t/RC)
Figure 11-2: Analog simulation: simple example circuit
The differential equation describes how the circuit behaves for any time-varying input, and the combination of this equation and the input stimulus is the only data available to the simulator. (Note that the analytical solution shown in Figure 11-2 is only applicable for the special case in which a step function is applied to Vin and which, in any case, is unknown to the simulator.) The analog simulator is faced with two objectives: to determine and return accurate results, and to do so quickly and efficiently. Unfortunately, these objectives clash with each other to a large extent. As the delta-time used to solve
a differential equation approaches zero, the result returned by the equation approaches its optimal accuracy. But solving the equation an infinite number of times for infinitesimal time steps would mean that the computer would never actually arrive at a final result. Ideally, the computer would prefer to use large increments of time and therefore only be obliged to solve the equation on a limited number of occasions, but the possibility of error increases as a function of the size of the time increment. The solution adopted by analog simulators is illustrated in Figure 11-3 which, for the purposes of this example, assumes a step function applied to Vin.
Figure 11-3: Analog simulation: solving differential equations numerically
The simulator selects a time-step for some time in the future (called t in this example) and attempts to solve the differential equation (shown as point (1) in the figure). The simulator then examines this solution to see how close it came with respect to some tolerance criteria. If the required tolerance was not achieved, the simulator reiterates the calculation in an attempt to come closer to the required solution (shown as points (2) and (3) in the figure). A more complex circuit would be represented by a larger number of differential equations, each of which may affect the others, which is why the simulator stores the equations as a matrix and solves them all simultaneously. Remembering that the analytical solution is unknown to the simulator, it is reasonable to wonder how this process works; that is, how does the simulator know how well it's doing if it doesn't know what the answer should be? One of the easiest ways to visualize this is to consider the case where a colleague asks you to calculate the square root of a number, say 30. Although you don't know what the square root of 30 is, you do know a method that will allow you to calculate a solution. Using one technique, you may start with the number 5 and square that to get 25. Realizing that this is too low, you would then iteratively
modify your starting number until you achieved a result that satisfied you to within some level of tolerance that you were prepared to accept:

5.00 x 5.00 = 25.00
5.20 x 5.20 = 27.04
5.30 x 5.30 = 28.09
5.40 x 5.40 = 29.16
5.45 x 5.45 = 29.70
5.47 x 5.47 = 29.92
If you feel that the final result is close enough for your needs (that is, within the required tolerance), then you may decide to stop at this point and inform your colleague that (as far as you are concerned) the square root of 30 is 5.47. This process is conceptually very similar to the iterative approach employed by the analog simulator to solve its differential equations and to converge on a solution. Thus, in addition to "tolerances," a common term that is often used with respect to analog simulation is "convergence," which means that the simulator has sufficiently powerful algorithms, and the circuit is described using sufficiently accurate equations, to enable the simulator to achieve a numerical solution. If, after a specified maximum number of iterations, the simulator has not managed to converge on an acceptable solution, then it shortens the time-step and tries again. Similarly, if the simulator converges on a solution using relatively few iterations, then it will employ a larger time-step to calculate the next point in the sequence. All modern analog simulators use a form of this adaptive time-step algorithm to achieve the optimal tradeoff between simulation accuracy and CPU efficiency. This affects the cosimulation of analog and digital simulators, because the digital world uses a fixed time-step, while the analog time-step can be dynamically changing. One of the classic problems with the original SPICE (and many of its derivatives) is that it pretty much required an expert to use it. Designers can spend an inordinate amount of time fine-tuning the simulator and adjusting the tolerances, thereby tightening or loosening the convergence criteria. Tightening the tolerances generally results in improved accuracy, but also causes the simulator to take longer to evaluate the circuit. Additionally, tightening the tolerances may result in the simulator never converging at all. Of more concern is the fact that changing the tolerances can significantly modify the output from the simulator. Another problem with SPICE is found in the core primitives, some of which have discontinuities in their equations, which can cause difficulties when the simulator tries to converge in the vicinity of those discontinuities.
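The square-root analogy translates directly into code. This Python sketch (a loose caricature of the real numerical machinery, not how SPICE actually works) iterates until the result falls within a tolerance, and shows why tightening the tolerance costs extra iterations:

    def iterative_sqrt(target, guess, tolerance, max_iterations=100):
        """Refine a guess until guess*guess is within tolerance of target."""
        for iteration in range(1, max_iterations + 1):
            error = guess * guess - target
            if abs(error) <= tolerance:
                return guess, iteration        # converged within tolerance
            guess -= error / (2 * guess)       # Newton-style correction step
        raise RuntimeError("failed to converge")  # loosen tolerance or give up

    print(iterative_sqrt(30.0, 5.0, tolerance=0.5))    # few iterations needed
    print(iterative_sqrt(30.0, 5.0, tolerance=1e-12))  # more iterations needed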
By comparison, modern analog simulators typically have more sophisticated core equations than SPICE. Additionally, some modern simulators employ heuristic techniques which allow them to select different algorithms and attempt different solutions rather than simply saying "I'm sorry old bean, but I couldn't converge." There are several other areas where modern analog simulators differentiate themselves from their forefathers. Traditionally, creating stimulus (as tables of numerical values) could be almost as taxing as designing the circuit itself. Today's graphical environments can offer utilities to facilitate the construction of highly complex waveforms, allowing designers to graphically create waveforms using straight-line and free-form drawing features. Such waveforms may then be modified using spline-based techniques or manipulated mathematically; for example, multiplying one waveform by another. In addition to displaying waveforms in the time domain, designers can also display and modify the waveform's spectral components (magnitude and phase). These techniques allow designers to construct complex "real world" waveforms which include such features as non-linearities, overshoot, ringing, and high-frequency noise. Finally, analog simulators are not obliged to operate at the transistor level; they may support multiple levels of modeling abstraction, including digital logic primitives (such as gates and registers) with analog interface characteristics. Also of interest are analog behavioral languages, which allow portions of a design to be represented at a high level of abstraction. In fact, some analog simulators are primarily used for system-level and control engineering design tasks, and are relatively poor at the transistor level required for integrated circuit design.
Digital Simulation
As for their analog counterparts, digital simulators also started out as university projects. For example, one of the first, called HILO, was developed by a team headed by Dr. Gerry Musgrave at Brunel University in England. The use of digital simulators trailed their analog equivalents, and digital simulation only began to attract significant commercial attention towards the end of the 1970s. Digital simulators are based on the concepts of a fixed time-step and an "event wheel," in which events are scheduled to take place at some time in the future.(4) When an input to a digital function changes state, the simulator evaluates the logical function to determine whether this change should cause a corresponding change at the output.
4Note that a digital simulator does not blindly simulate every single time-step. Once all of the actions associated with a particular time-step have been executed, the simulator skips any empty time-steps and leaps directly to the next time-step in the event wheel which has an action to be performed.
If the simulator determines that an output change is required, it looks up the delay associated with this change, then posts an event to the event wheel to be actioned at the appropriate future time.
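The event wheel can be sketched in a few lines of Python using a priority queue (a deliberate simplification: real simulators add delta cycles, event cancellation, and far more efficient ring-buffer structures):

    import heapq

    class EventWheel:
        """Toy event-driven scheduler; events are (time, signal, value)."""

        def __init__(self):
            self.queue = []

        def post(self, time, signal, value):
            heapq.heappush(self.queue, (time, signal, value))

        def run(self):
            while self.queue:
                time, signal, value = heapq.heappop(self.queue)
                # Empty time-steps are never visited: we leap directly
                # to the next time-step that has an action to perform.
                print(f"t = {time} ns: {signal} -> {value}")

    wheel = EventWheel()
    wheel.post(0, "a", 1)
    wheel.post(5, "y", 0)  # e.g. a gate output after a 5 ns propagation delay
    wheel.run()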
The original digital simulators were targeted towards the designs of the day, which predominantly involved gate-level TTL. Additionally, the original simulators were based on proprietary hardware description languages (HDLs), which were generally little more than netlists calling simulator primitives. These languages were gradually extended to the functional level of Boolean expressions and Register Transfer Language (RTL) capability, followed by excursions into a more behavioral level of abstraction. Currently, the two most popular digital HDLs are VHDL and Verilog HDL, while the C programming language is also commonly used to represent certain functions at the board and system levels. Early digital simulators were based on a distributed delay model, in which each primitive gate had its own individual delays. This model remains useful for some applications, but modern digital simulators typically also support pin-to-pin (Pn-Pn) delay specifications, which are particularly advantageous for modeling the cells used in integrated circuit designs. Additionally, due to the fact that the early digital simulators were tailored to representing TTL logic at the board level, they predominantly used an inertial delay model, in which pulses that were narrower than the gate's propagation delay were rejected. The inertial delay model is insufficient for components such as delay lines, so digital simulators were augmented to support transport delay specifications, in which pulses are always propagated, irrespective of the width of the pulse or the propagation delay of the logic gate.(5)
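The difference between the two delay models can be demonstrated with a toy pulse filter (illustrative only, with input pulses given as (start_time, width) pairs in nanoseconds):

    def propagate(pulses, delay, model):
        """Apply an inertial or transport delay to a list of (start, width) pulses."""
        out = []
        for start, width in pulses:
            if model == "inertial" and width < delay:
                continue  # pulse narrower than the propagation delay: rejected
            out.append((start + delay, width))  # transport: always propagated
        return out

    pulses = [(10, 2), (20, 8)]  # a 2 ns glitch followed by an 8 ns pulse
    print(propagate(pulses, delay=5, model="inertial"))   # -> [(25, 8)]
    print(propagate(pulses, delay=5, model="transport"))  # -> [(15, 2), (25, 8)]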
Mixed-Signal Verification Strategies
The traditional approach for designs that contain both analog and digital elements is to partition the design at the beginning of its development cycle. The digital and analog portions are then captured and verified in isolation, and they are only reunited at the prototyping stage. This is the simplest technique, and it may be appropriate for highly partitioned designs without feedback. However, this strategy does not provide the designer with much confidence that the digital and analog portions will interface correctly. A development of this technique is to verify one portion of the design, analog or digital, using the relevant simulator. The simulation output is captured and then coerced (mapped) into a suitable format for use as stimulus for the other portion. Once again, this technique may be suitable for highly partitioned designs without feedback, but it is typically painful and time consuming.
5The way in which digital simulators work (including details on distributed delays versus Pn-Pn delays, and inertial delays versus transport delays) was introduced in more detail in Chapter 6.
Neither of the above techniques can be categorized as true mixed-signal verification. To be classed as mixed-signal, the simulation environment must support the concurrent verification of both the analog and digital portions of the design. There are many cases where true mixed-signal verification is the only realistic alternative. Designs in this category include, but are not limited to, those exhibiting tightly coupled feedback between the analog and digital portions; for example, circuits employing Phase-Locked Loop (PLL) techniques. Additionally, the advent of multimedia technology is dramatically increasing the usage of Digital Signal Processing (DSP) techniques, which can involve sophisticated and tightly integrated combinations of analog and digital circuitry. Mixed-signal verification may also be required in the case of high-speed designs employing only digital components. Although the bulk of the design may be verified using a digital simulator, it may be necessary to subject the critical interconnection paths to a more exhaustive analog analysis. In some cases it may be sufficient to perform signal integrity analysis off-line (on the interconnect only), but cases where the parasitic effects are dependent on the state of other signals may mandate a mixed-signal approach.(6)
A/d, A/D, and a/D
Until recently, the phrase "mixed-signal simulation" has typically been associated with the cosimulation of analog and digital simulators, but a more precise classification of simulation technology is beginning to emerge:

D   = Pure digital simulator
a/D = Digital simulator with native analog capability
A/D = Cosimulation of analog and digital simulators
A/d = Analog simulator with native digital capability
A   = Pure analog simulator
Designs containing large, complex portions of analog and digital may mandate the use of an A/D technique, which involves linking analog and digital simulators together. Many designs, however, may be responsive to the a/D or A/d forms of evaluation. The a/D technique is typically employed for primarily digital designs containing some proportion of analog. However, in this case the analog portions are usually in the form of relatively large functions which are amenable to behavioral representations; for example, analog-to-digital converters. Also, the a/D technique relies on a digital simulator which can support C models (or VHDL models with signals of type REAL). 6The concept of parasitic effects that are dependent on the state of other signals is introduced in more detail in Chapter 25.
By comparison, the A/d technique relies on the use of an analog simulator that inherently understands the concept of digital primitives such as logic gates and registers. These gates are handled by the simulator in a digital fashion, but can have associated analog interface characteristics such as input and output impedances, input switching thresholds, output slope, overshoot, and frequency damping. A modern analog simulator with A/d capability can simulate a few tens of thousands of logic gates in this manner. The A/d technique is typically of use with primarily analog designs containing a limited amount of digital functionality in the form of relatively simple logic functions. The A/d technique is of particular interest for a number of reasons, not the least of which is that analog simulators with this capability usually contain their own version of an event wheel. This can become significant when the analog simulator is cosimulated with a digital simulator, as discussed below.
Alternative Cosimulation Strategies
Before introducing the various A/D cosimulation strategies, it is appropriate to consider the environment necessary to support full mixed-signal simulation. Today's mixed-level design practices require that each portion of a design may be represented at the most appropriate level of abstraction. On the digital side, the design system should ideally allow individual portions of a design to be represented using graphical techniques such as state diagrams, state tables, and truth tables, as textual HDL, as gate-level schematics and/or netlists, and as physical devices interfaced using a hardware modeler. Similarly, on the analog side, the design system should ideally allow individual portions of a design to be represented using an analog behavioral language, native digital logic (with analog interface characteristics), and as transistor-level schematics and/or netlists. The first requirement of such a system is that some hierarchical blocks may contain analog views and some may contain digital views. Additionally, the analog and digital portions of a design may not always be amenable to being partitioned into discrete blocks. Thus, the system should also support analog and digital elements in the same block; for example, primitive logic gates combined with transistors, resistors, and capacitors. After the design has been captured, the system should be capable of automatically partitioning the analog and digital functions and presenting them to the appropriate simulation engines. Additionally, the system should inject special elements to interface the analog and digital worlds (Figure 11-4).
Figure 11-4: The mixed-signal environment should automatically inject special interface elements
For example, in the case of a digital function driving into the analog world, the interface element must map the "sharp" digital edges into analog equivalents such as impedances and slopes. To avoid the necessity for manual intervention, the system should automatically inject the appropriate interface element based on a technology parameter (such as S, AS, LS, ALS, F, ...) associated with the digital symbol. The system should also support user-defined technologies for full-custom applications. Last but not least, the system should support cross-probing of analog and digital signals from the schematic, and display both analog and digital traces in the same analysis window.
Moving on to the actual verification of the design, there are a number of cosimulation strategies available. The majority of cosimulation techniques may be classed as either Unified, Simulation Backplane, Glued, or Coupled. In the case of the unified approach, a single simulation database is shared by the digital and analog simulation engines. This is generally regarded as offering the fastest simulation speed, but the engines themselves are typically inferior. Note that, in this context, the term "inferior" does not necessarily imply that there is anything wrong with the engines, just that they are usually restricted in the levels of modeling abstraction they support and the types of analysis they can perform. For example, unified solutions are typically only capable of time domain analysis, even when performing "pure" analog simulations.
Simulation backplanes are usually promoted in terms of "plug-and-play" capability; that is, the ability to combine digital and analog simulators of choice.
In reality the supported simulators are usually limited, the final solution is often restricted to the smallest subset of capabilities offered by all of the simulators, and there can also be significant initialization problems. There is also an obvious overhead associated with having three processes communicating with each other (the two simulators and the backplane). Also, there can be a huge overhead associated with type conversion between simulation engines, and, potentially more troublesome, backplanes may perform inefficient and somewhat inaccurate type conversions via some intermediate form. By comparison, the glued approach refers to the case where two simulators are linked via a C interface in a master/slave relationship. The typical case is for the digital simulator to be the master, and for this master to "see" the analog simulator as a C model. The glued approach offers a much tighter coupling than a simulation backplane, as well as increased accuracy and efficiency. Last but certainly not least, we have the coupled approach, which is somewhat similar to the glued technique in that it involves linking the two simulators via a C interface. However, the coupled approach is based on an analog simulator with A/d capability, in which the analog simulator inherently understands the concept of digital primitives such as logic gates and registers. As was noted above, analog simulators with this capability usually contain their own version of an event wheel. In the coupled approach, both simulators have the ability to mutually schedule and access events on this event wheel. The proponents of this scheme believe that the coupled approach offers the best combination of features from the unified and glued techniques. A major consideration with all of the alternative cosimulation strategies discussed above is the way in which the analog and digital simulators are synchronized in time. As has already been noted, this is a non-trivial problem, because the digital world uses a fixed time-step, while the analog time-step can be dynamically changing. The two most common synchronization techniques are known as the Lockstep and Calaveras algorithms. The Lockstep algorithm requires that the two engines are locked together in time throughout the course of the simulation, which means that the digital engine has to take the smallest time-step that the analog engine does. Additionally, the first engine to complete a particular time-step must wait for the other engine to catch up. By comparison, the Calaveras algorithm allows each simulator to run ahead of the other. If it is subsequently determined that evaluations in one domain would have affected the other domain, then time is "wound back" (as far as is necessary to account for the interaction) and the process is repeated with the new data. Champions of the Calaveras algorithm would say that the Lockstep technique is inefficient, because one simulator is always waiting for the other. Similarly,
proponents of the Lockstep algorithm would say that the Calaveras technique is inefficient on the basis that the simulators are constantly throwing data away, and the simulation of a particular group of time-steps may be performed several times. In reality, each approach has its strengths and weaknesses, and the performance of each is strongly dependent on the topology and content of the particular circuit under evaluation. One technique that might be used to improve the Calaveras algorithm would be to monitor simulation activity to see which simulator runs ahead the most, and to then vary the proportion of CPU time that is made available to each simulator to balance the load.
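As a cartoon of the Lockstep scheme (nothing here models a real cosimulation backplane; the function and step values are invented for illustration), the digital engine simply shadows whatever adaptive step the analog engine chooses, and whichever engine finishes a step first must wait:

    def lockstep(total_time, analog_steps):
        """Advance both engines together; the analog engine picks each step."""
        t = 0.0
        for dt in analog_steps:  # adaptive analog time-steps
            if t + dt > total_time:
                break
            t += dt
            # The digital engine is forced to take the same (possibly tiny)
            # step, so neither engine ever runs ahead of the other.
            print(f"both engines synchronized at t = {t:.2f} us")

    lockstep(1.0, [0.4, 0.1, 0.05, 0.3, 0.2])

A Calaveras-style sketch would instead let each engine free-run and roll time back on a detected interaction, trading wasted work for less waiting.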
Mixing A/D and A/d
As we previously discussed, the A/d technique involves an analog simulator that inherently understands the concept of digital primitives such as logic gates and registers. These gates are simulated in a digital fashion, but can have associated analog interface characteristics such as input and output impedances, input switching thresholds, output slope, overshoot, and frequency damping. A modern analog simulator with A/d capability can simulate a few tens of thousands of logic gates in this manner. These A/d models combine the simulation accuracy of pure analog with simulation speeds which can be 250 times faster than their transistor-level equivalents; however, this is still substantially slower than simulating logic gates using a digital simulator. More important is the limitation on the number of logic gates that can be simulated using the A/d technique, because today's integrated circuits can contain many hundreds of thousands of logic gates. One answer to this problem is to use a combination of A/D and A/d simulation techniques (Figure 11-5).
Figure 11-5: Mixing A/D and A/d techniques (digital simulators connect across A/D boundaries to an analog simulator, within which the glue logic appears as digital functions with analog interface characteristics at the A/d boundaries)
The idea here is that devices at this ultra-large scale of integration typically contain complex digital macro-functions such as microprocessor cores, blocks of memory, and communications functions. These functions can be rigorously characterized on an individual basis, and then represented as C models, as behavioral HDL models (such as VHDL or Verilog HDL), as physical devices interfaced using a hardware modeler, as gate-level netlists, or as a combination of all of these techniques. Thus, the strongly characterized digital macro-functions can be simulated in the digital domain, which can be interfaced to the analog domain using A/D mixed-signal techniques. Meanwhile, the critical paths in the remaining "glue logic" (the simple digital primitives linking the macro-functions) can be simulated in the analog domain using A/d mixed-signal simulation techniques.
Summary
A number of independent surveys indicate that the late 1990s will see a rapid increase in the use of mixed-signal simulation technology. As we have discussed, there are a variety of strategies available, including: digital simulators with native analog capability (a/D); analog simulators with native digital capability (A/d); and full cosimulation of analog and digital simulators (A/D). Also, the most advantageous approach in many cases may be to use a mixture of strategies; for example, A/D combined with A/d. With regard to cosimulation, there are a number of techniques, and it is difficult to state that any particular approach is superior to any other, because each design presents its own unique problems which demand their own unique solutions. In fact, the choice of a particular cosimulation technique may be largely governed by the target application; for example, small numbers of large, complex analog and digital functions may be appropriate for one approach, while large numbers of small, simple analog and digital functions may dictate an alternative technique. That's it. You are on your own. Be careful. It's a jungle out there.
The topic in this chapter was published in a condensed form under the title Some Designs Send Mixed Signals, in the October 9th, 1997 issue of EDN (www.ednmag.com), and is reproduced in its original form here with their kind permission.
Chapter 12:
A minus B = A + NOT(B) + 1
"Has anyone seen my two's complementor?"
In this chapter you will discover:
It All Seemed So Easy Then ...................................................... 136
Complement Techniques .......................................................... 136
Signed Binary Numbers ........................................................... 140
The Arithmetic/Logic Unit (ALU) ................................................. 143
The "Core" ALU ................................................................... 144
Extending the Core ALU to Perform Subtractions ................................. 148
It All Seemed So Easy Then
Following the joys of the "Microprocessor 101" courses we endured at college, most of us are reasonably confident that we understand the way in which computers add and subtract binary numbers. Thereafter, we bask in the rosy glow that assembly-level instructions such as ADD ("add without carry"), ADDC ("add with carry"), SUB ("subtract without carry"), and SUBC ("subtract with carry") hold no fear for masters of the universe such as we.(1) The years race by as if they're in a desperate hurry to get somewhere, until we arrive at that grim and fateful day when someone asks us to replicate an adder/subtractor function ourselves, either as a chunk of logic for a design, or possibly using the hardware description language of our choice to create a model for simulation and synthesis. Our first port of call may well be to blow the cobwebs off our microprocessor course notes, only to find that there's more to this than we seem to recall. The description of the ADD instruction (which doesn't require a carry-in) looks simple enough, saying something like a[7:0] plus b[7:0] = a[7:0] + b[7:0] (assuming an 8-bit datapath). It's when we turn our attention to the SUB, whose definition may be something along the lines of a[7:0] minus b[7:0] = a[7:0] + NOT(b[7:0]) + 1, that we realize that perhaps we should have paid just a tad more attention in Prof. Gonzo Dribbler's Monday morning lectures (possibly to the extent of actually staying awake). Light begins to dawn after a few moments' contemplation, as we start to recall that these calculations are performed using two's complement arithmetic. So it comes as something of a shock when we come to peer at the block diagram of the Arithmetic-Logic Unit (ALU), desperately searching for a two's complementor, only to find a humble one's complementor glaring back at us as though it has every right to be there. "Holy socks Batman, how can this be?" Obviously we need to go back to first principles in order to figure this out.
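Before we dig into the "why," the SUB definition itself is easy to confirm by brute force. This Python snippet (illustrative only) checks the identity over every pair of 8-bit values, with & 0xFF modeling the 8-bit datapath discarding any carry-out:

    for a in range(256):
        for b in range(256):
            # NOT(b) on an 8-bit datapath is b ^ 0xFF (the one's complement).
            assert (a - b) & 0xFF == (a + (b ^ 0xFF) + 1) & 0xFF
    print("a - b == a + NOT(b) + 1 for all 8-bit values")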
Complement Techniques
There are two forms of complement associated with every number system, the radix complement and the diminished radix complement, where the term "radix" (which comes from the Latin word meaning "root") refers to the base of the number system. Under the decimal (base-10) system, the radix complement is also known as the ten's complement and the diminished radix complement is known as the nine's complement. First consider a decimal subtraction performed using the nine's complement technique - a process known in ancient times as "Casting out the nines" (Figure 12-1).
1In many microprocessor instruction sets, the mnemonic SUBB ("subtract with borrow") is used in preference to SUBC ("subtract with carry"), but the resulting actions are identical.
Figure 12-1: Decimal subtractions performed using nine's complements don't require any borrows, but they do require an end-around carry
(In the figure, the standard subtraction 647 - 283 = 364 is performed by taking the nine's complement of 283, which is 999 - 283 = 716; adding it to the minuend, 647 + 716 = 1363; and performing the end-around carry, 363 + 1 = 364.)
The standard way of performing the operation would be to subtract the subtrahend (283) from the minuend (647) which, as in this example, may require the use of one or more borrow operations. To perform the equivalent operation using a nine's complement technique, each of the digits of the subtrahend is first subtracted from a 9. The resulting nine's complement value is added to the minuend, then an end-around-carry operation is performed. The advantage of the nine's complement technique is that it is never necessary to perform a borrow operation (hence its attraction to those of limited numerical ability in the days of yore). Now consider the same subtraction performed using the ten's complement technique (Figure 12-2). The advantage of the ten's complement is that it is not necessary to perform an end-around-carry, because any carry-out resulting from the addition of the most-significant digits is simply dropped from the final result. The disadvantage is that, during the process of creating the ten's complement, it is necessary to perform a borrow operation for every non-zero digit in the subtrahend. (This problem can be overcome by first taking the nine's complement of the subtrahend, adding one to the result, and then performing the remaining operations as for the ten's complement). Nine's complement equivalent
Standard subtraction .
647 -283 =364
.
.
.
.
.
.
.
.
.
.
.
.
.
.
=
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
1 000 283 717
Take tens complement
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
647, + 717 =1 3 6 4 364
i
drop any carry
Figure 12-2: Decimal subtractions performed using ten's complements do require borrows, but they don't require an end-around carry
Similar techniques may be employed with any number system, including binary (base-2), in which the radix complement is known as the two's complement and the diminished radix complement is known as the one's complement. First consider a binary subtraction performed using the one's complement technique on 8-bit unsigned binary values, where such values can be used to represent positive integers in the range 0 to 255 (Figure 12-3).
Figure 12-3: Binary subtractions performed using one's complements don't require any borrows, but they do require an end-around carry
(In the figure, the one's complement of 00011110 is 11111111 - 00011110 = 11100001; adding it to the minuend gives 00111001 + 11100001 = 100011010; and the end-around carry yields 00011010 + 1 = 00011011, i.e., 57 - 30 = 27 in decimal.)
Once again, the standard way of performing the operation would be to subtract the subtrahend (00011110) from the minuend (00111001), which may require the use of one or more borrow operations. (Don't beat your head against a wall trying to understand the standard binary subtraction, because we won't ever be doing one; simply take my word as to the result.) To perform the equivalent operation in one's complement, each of the digits of the subtrahend is first subtracted from a 1. The resulting one's complement value is added to the minuend, then an end-around-carry operation is performed. As for the nine's complement process, the advantage of the one's complement technique is that it is never necessary to perform a borrow operation. In fact, it isn't even necessary to perform a subtraction operation, because the one's complement of a binary number can be generated simply by inverting all of its bits; that is, by exchanging all of the 0s with 1s and vice versa. This means that, even if we stopped here, you already know how to perform a simple binary subtraction using only inversion and addition, without any actual subtraction being involved! Now consider the same binary subtraction performed using the two's complement technique (Figure 12-4).
Standard subtraction: 00111001 - 00011110 = 00011011
Two's complement equivalent: take the two's complement of the subtrahend (100000000 - 00011110 = 11100010), add it to the minuend (00111001 + 11100010 = 100011011), then drop any carry to leave 00011011 (57 - 30 = 27 in decimal)
Figure 12-4: Binary subtractions performed using two's complements do require borrows, but they don't require an end-around carry
As for the ten's complement technique, the advantage of the two's complement is that it is not necessary to perform an end-around-carry, because any carry-out resulting from the addition of the two most-significant bits is simply dropped from the final result. The disadvantage is that, during the process of creating the two's complement, it is necessary to perform a borrow operation for every non-zero digit in the subtrahend. This problem can be overcome by first taking the one's complement of the subtrahend, adding one to the result, and then performing the remaining operations as for the two's complement. As fate would have it, there is also a short-cut approach available to generate the two's complement of a binary number. Commencing with the least significant bit of the value to be complemented, each bit up to and including the first 1 is copied directly, then the remaining bits are inverted (Figure 12-5).
Commencing at the LSB of the value to be complemented, copy each bit up to and including the first 1, then invert any remaining bits
Figure 12-5: Fortunately there's a shortcut technique for generating the two's complement of a binary value
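To make all of this a little more concrete, here's a minimal C sketch (our own illustration, not part of the original article) that performs the 57 - 30 subtraction from Figures 12-3 and 12-4 using both the one's and two's complement techniques, along with the shortcut from Figure 12-5; all of the function and variable names are our own inventions:

#include <stdio.h>
#include <stdint.h>

/* The shortcut from Figure 12-5: copy bits from the LSB up to and
   including the first 1, then invert any remaining bits. */
static uint8_t twos_shortcut(uint8_t v)
{
    uint8_t result = 0;
    int seen_one = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t bit = (uint8_t)((v >> i) & 1);
        result |= (uint8_t)((seen_one ? bit ^ 1 : bit) << i);
        if (bit) seen_one = 1;
    }
    return result;
}

int main(void)
{
    uint8_t minuend    = 0x39;   /* 00111001 = 57 in decimal */
    uint8_t subtrahend = 0x1E;   /* 00011110 = 30 in decimal */

    /* One's complement: invert every bit, add to the minuend, then
       perform the end-around carry. */
    uint16_t sum = (uint16_t)(minuend + (uint8_t)~subtrahend);
    uint8_t ones_result = (uint8_t)((uint8_t)sum + (sum >> 8));

    /* Two's complement: invert every bit and add 1, add to the
       minuend, then simply drop any carry out of bit 7 (the 8-bit
       type drops it for us). */
    uint8_t twos_result = (uint8_t)(minuend + (uint8_t)(~subtrahend + 1));

    printf("%u %u %u\n", ones_result, twos_result,
           (unsigned)twos_shortcut(subtrahend));  /* prints 27 27 226 */
    return 0;
}

Note that twos_shortcut(30) returns 226, which is exactly the same bit pattern (11100010) that we get by inverting 00011110 and adding 1.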
Unfortunately, both the one's and two's complement techniques will return incorrect results if we're using unsigned binary representations and a larger value is subtracted from a smaller value; that is, for these techniques to work, the final result must be greater than or equal to zero. The reason for this is fairly obvious,
because subtracting a larger number from a smaller number results in a negative value, but we've been using unsigned binary numbers which, by definition, can only be used to represent positive values. It is obviously impractical to only ever perform calculations that will have positive results, so we are obliged to come up with some way of representing negative values. One solution is to use signed binary numbers.
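As a quick illustration of the problem (a hypothetical C fragment of our own devising), attempting 30 - 57 with unsigned 8-bit values simply wraps around:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t small = 30, large = 57;
    /* 30 - 57 via the two's complement technique described above */
    uint8_t result = (uint8_t)(small + (uint8_t)(~large + 1));
    printf("%u\n", result);  /* prints 229, not the -27 we wanted */
    return 0;
}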
Signed Binary Numbers
In standard decimal arithmetic, negative numbers are typically represented in
sign-magnitude form by prefixing the value with a minus sign. For example, a value of plus twenty-seven would be represented as +27 (or just 27 for short), while a value of minus twenty-seven would be indicated as -27 (where the '+' or '-' is the "sign" and the '27' is the "magnitude", hence the "sign-magnitude" designation). Similarly, we could replicate the sign-magnitude form in binary, by simply using the most significant bit to represent the sign of the number (0 = positive, 1 = negative). However, computers rarely employ the sign-magnitude form, but instead use a format known as signed binary. Signed binary numbers can be used to represent both positive and negative values, and they do this in a rather cunning way. To illustrate the differences between the sign-magnitude and signed binary formats, consider the decimal sign-magnitude representations of plus and minus twenty-seven, along with the same values represented as signed binary numbers (Figure 12-6).
Positive value: +27 in decimal = 00011011 in binary; negative value: -27 in decimal = 11100101 in binary, where (-128) + (+101) = -27
Figure 12-6: Positive and negative versions of a number look radically different in the signed binary format
Unlike the decimal values, the bit patterns of the two binary numbers are very different, because the signed binary equivalent of -27 is formed by combining -128 with +101. That is, the least significant bits continue to represent the same positive quantities as for unsigned binary numbers,
while the most-significant bit is used to represent an actual negative quantity rather than a simple plus or minus. In the case of a signed 8-bit number, a '1' in the sign bit represents -2^7 (= -128), and the remaining bits are used to represent positive values in the range 0 through +127. Thus, an 8-bit signed binary number can be used to represent values in the range -128 through +127.
At a first glance, signed binary numbers appear to be an outrageously complex solution to a fairly simple problem. In addition to representing an asymmetrical range of negative and positive numbers (-128 through +127 in the case of an 8-bit value), the way in which these values are formed is, to put it mildly, alien to the way we're used to thinking of numbers. Why then, you may ask, don't we simply use the most-significant bit to represent the sign of the number and leave it at that? Well, as you may expect, there's reason behind our madness. First, if we did use the most significant bit to represent only the sign of the number, then such numbers would accommodate both +0 and -0 values. Although this may not seem like a particularly hairy stumbling block, computers are essentially dumb, and it would introduce complications in recognizing whether or not a given value was less than zero or equal to zero (or whether +0 was greater than or equal to -0).
But there's a lot more to signed binary numbers than this. Now pay attention, because this is the clever part: closer investigation of the two binary values in Figure 12-6 reveals that each bit pattern is in fact the two's complement of the other! To put this another way, taking the two's complement of a positive signed binary value returns its negative equivalent, and vice versa (the only problem being that, due to the asymmetrical range, the largest negative number can't be negated; for example, in an 8-bit number, you can't negate -128 to get +128 because the maximum positive value supported is +127).
The end result of all this rigmarole is that using signed binary numbers (which are also commonly referred to as two's-complement numbers) greatly reduces the complexity of the operations within a computer. To illustrate why this is so, let's consider one of the simplest operations: that of addition. Compare the following additions of positive and negative decimal values in sign-magnitude form with their signed binary counterparts (Figure 12-7). First examine the standard decimal calculations - the one at the top is easy to understand, because it's a straightforward addition of two positive values. However, even though we are familiar with decimal addition, the other three problems aren't quite as simple, because we have to decide exactly what to do with the negative values. By comparison, the signed binary calculations on the right are all simple additions, irrespective of whether the individual values are positive or negative.
Decimal sign-magnitude        Signed binary
   57 +   30  =  87           00111001 + 00011110 = 01010111
   57 + (-30) =  27           00111001 + 11100010 = 00011011
 (-57) +  30  = -27           11000111 + 00011110 = 11100101
 (-57) + (-30) = -87          11000111 + 11100010 = 10101001
Figure 12-7: The signed binary form facilitates adding both positive and negative values
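Just to prove the point, here's a small C sketch of our own (with invented names) showing that all four of the additions in Figure 12-7 are handled by one and the same 8-bit addition, with any carry out of bit 7 simply dropped:

#include <stdio.h>
#include <stdint.h>

/* One routine serves every combination of signs; the cast to
   uint8_t drops any carry out of bit 7, just as the hardware does. */
static int8_t add8(int8_t a, int8_t b)
{
    return (int8_t)((uint8_t)a + (uint8_t)b);
}

int main(void)
{
    printf("%d %d %d %d\n",
           add8(57, 30), add8(57, -30),
           add8(-57, 30), add8(-57, -30));  /* prints 87 27 -27 -87 */
    return 0;
}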
If a computer were forced to use a binary version of the sign-magnitude form to perform additions, then instead of performing its calculations effortlessly and quickly, it would have to perform a painful sequence of operations. First of all, the computer would have to compare the signs of the two numbers. If the signs were the same, the computer could simply add the two values (excluding the sign bits themselves), because, in this case, the result would always have the same sign as the original numbers. However, if the signs were different, the computer would have to subtract the smaller value from the larger value, and then ensure that the correct sign was appended to the result. As well as being time consuming, performing all of these operations would require a substantial number of logic gates. Thus, the advantage of the signed binary format for addition operations is apparent: signed binary numbers can always be directly added together to provide the correct result in a single operation, irrespective of whether they represent positive or negative values. That is, the operations a + b, a + (-b), (-a) + b, and (-a) + (-b) are all performed in exactly the same way, by simply adding the two values together. This results in adders that can be constructed using a minimum number of logic gates and are consequently fast.
Now consider the case of subtraction. We all know that 10 - 3 = 7 in decimal arithmetic, and that the same result can be obtained by negating the right-hand value and inverting the operation; that is, 10 + (-3) = 7. This technique is also true for signed binary arithmetic, although the negation of the right-hand value is performed by taking its two's complement rather than by changing its sign. For example, consider a generic signed binary subtraction represented by a - b. Generating the two's complement of b results in -b, allowing the operation to be performed as an addition: a + (-b). Similarly, equivalent operations to a - (-b), (-a) - b, and (-a) - (-b) are all performed in exactly the same way, by simply taking the two's complement of b and adding the result to a, irrespective of whether a or b represent positive or negative values. This means that computers do not require two different blocks of logic (one to add numbers and another to subtract them); instead, they only require an adder and the ability to generate the two's complement of a number, which tends to make life a lot easier for all concerned.
Early digital computers were often based on one's complement arithmetic for a variety of reasons, including the fact that two's complement techniques were not well understood. But designers quickly migrated to the two's complement approach, because of the inherent advantages it provides. Unfortunately, the problem that we noted at the beginning of this chapter remains, which is that when we examine a computer's ALU, there isn't a two's complementor in sight; instead a humble one's complementor glares balefully at us from its nest of logic gates. So where is our two's complementor? Is this part of some nefarious government scheme to deprive us of the benefits of two's complement arithmetic? Well fear not my braves, because the ghastly truth is about to be revealed.....
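Before we unmask the culprit, here's a quick C sketch (ours, not the book's circuitry) of subtraction performed as negate-and-add; note that the one routine covers a - b, a - (-b), (-a) - b, and (-a) - (-b) without any special cases:

#include <stdio.h>
#include <stdint.h>

/* Subtract by adding the two's complement of the right-hand value. */
static int8_t sub8(int8_t a, int8_t b)
{
    uint8_t neg_b = (uint8_t)(~(uint8_t)b + 1);  /* two's complement of b   */
    return (int8_t)((uint8_t)a + neg_b);         /* subtraction as addition */
}

int main(void)
{
    printf("%d %d %d %d\n",
           sub8(10, 3), sub8(10, -3),
           sub8(-10, 3), sub8(-10, -3));  /* prints 7 13 -13 -7 */
    return 0;
}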
The Arithmetic/Logic Unit (ALU)
The heart (or, perhaps, the guts) of the CPU is the arithmetic/logic unit (ALU), where all of the number crunching and data manipulation takes place. For the purposes of these discussions we'll assume a computer whose data bus is 8 bits wide, and whose ALU therefore works with 8-bit chunks of data (Figure 12-8). The ALU accepts two 8-bit words A[7:0] and B[7:0] as input, "scrunches" them together using some arithmetic or logical operation, and outputs an 8-bit result which we've named F[7:0]. Whatever operation is performed on the data is dictated by the pattern of logic 0s and logic 1s fed into the ALU's instruction inputs. For example, one pattern may instruct the ALU to add A[7:0] and B[7:0] together, while another pattern may request the ALU to logically AND each bit of A[7:0] with the corresponding bit in B[7:0].
Figure 12-8: The ALU is where the number crunching and data manipulation takes place
Note that our ALU is completely asynchronous, which means that it is not directly controlled by the main system's clock. As soon as any changes are presented to the ALU's data, instruction, or carry-in inputs, these changes will immediately start to ripple through its logic gates and will eventually appear at the data and status outputs.
The "Core" ALU
The number of instruction bits required to drive the ALU depends on the number of functions we require it to perform: two bits can be used to represent four different functions, three bits can represent eight functions, and so forth. We can consider the ALU as having layers like an onion, and we might visualize the core of an extremely rudimentary ALU as performing only five simple functions (Table 12-I).
Function          Outputs F[7:0] equal    Flags modified
Logical AND       A[7:0] AND B[7:0]       N, Z
Logical OR        A[7:0] OR B[7:0]        N, Z
Logical XOR       A[7:0] XOR B[7:0]       N, Z
Addition (ADD)    A[7:0] + B[7:0]         O, N, Z, CO
Compare (CMP)     (no data outputs)       Z, CO
Table 12-I: We might visualize the core of an extremely rudimentary ALU as performing only five simple functions
The instruction-bit patterns we might assign to these functions are not important for our purposes at this time; suffice it to say that the five functions shown here would only require three instruction bits. As we shall see, implementing a core ALU to perform these tasks is really not too complex. First consider how we could implement the AND function, which actually only requires eight 2-input AND gates (Figure 12-9).
Figure 12-9: Implementing the core ALU's AND function only requires eight 2-input AND gates
Similarly, the OR function would only require eight 2-input OR gates, and the XOR function would require eight 2-input XOR gates. Things are a little more complex in the case of the ADD function, but not unduly so. For example, a basic ripple-through adder would only require a total of sixteen AND gates and twenty-four XOR gates, plus an additional XOR gate to generate the overflow output (the overflow output is generated by XOR-ing the carry-in and carry-out associated with the most-significant bit of the result). Similarly, the CMP (compare) is a little more complex than the primitive logical functions, but nothing that a few handfuls of cunningly connected gates can't handle. Thus, it wouldn't take us long to generate our five core functions as individual entities (Figure 12-10). Note that the Oadd ("overflow from the adder") output from the ADD function could be directly connected to the main O ("overflow") output coming out of the core ALU. However, the ADD function's CIadd ("carry-in") input and COadd ("carry-out") output would not be connected to the core ALU's CI and CO terminals, because we have other plans for these signals as we shall see.
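As an aside, here's a gate-level C sketch of our own of such a ripple-through adder; counting the operations per stage (two ANDs and three XORs, using an XOR in place of an OR to merge the two carry terms, which is safe because they can never both be 1) gives exactly the sixteen AND gates and twenty-four XOR gates mentioned above, plus the final XOR for the overflow:

#include <stdint.h>

/* An 8-bit ripple-through adder built from 2-input gate operations. */
static uint8_t ripple_add8(uint8_t a, uint8_t b, int ci, int *co, int *ov)
{
    uint8_t f = 0;
    int carry = ci, carry_into_msb = 0;
    for (int i = 0; i < 8; i++) {
        int ai = (a >> i) & 1;
        int bi = (b >> i) & 1;
        int axb = ai ^ bi;                    /* XOR gate                */
        int sum = axb ^ carry;                /* XOR gate                */
        if (i == 7) carry_into_msb = carry;   /* remember C7's carry-in  */
        carry = (ai & bi) ^ (axb & carry);    /* two ANDs plus one XOR   */
        f |= (uint8_t)(sum << i);
    }
    *co = carry;                    /* carry-out of the MS bit            */
    *ov = carry_into_msb ^ carry;   /* the extra XOR gate: overflow       */
    return f;
}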
Figure 12-10: The core ALU's five functions could be constructed in isolation using relatively few logic gates
For the purpose of these discussions, we'll consider the inputs to the CMP ("compare") function as being signed binary numbers. Also, the AgtB output will be driven to logic 1 if A[7:0] is greater than B[7:0], while the AeqB output will be driven to logic 1 if A[7:0] is equal to B[7:0]. So at this stage in the proceedings we know how to implement the core ALU functions in isolation (although admittedly we've skimped on some of the nitty-gritty details). The next point to ponder is the means by which we can "glue" them all together to form the ALU itself. One possible approach would be to hurl a multiplexer into our cauldron of logic gates, stir things up a little, and see what develops (Figure 12-11).
Figure 12-11: The outputs from the simple functions (excepting the CMP) are used to drive a 4:1 multiplexer
In this scenario, we use two of our instruction bits (which can represent four patterns of 0s and 1s) to control a 4:1 multiplexer, where each of the input channels feeding the multiplexer is 8 bits wide. The A[7:0] and B[7:0] signals are presented to all of the functions, but only the outputs from the function of interest are selected. The reason we only need a 4:1 multiplexer is that the fifth function, the CMP, only outputs status information, but doesn't generate any data as such.
The advantage of this multiplexer-based approach is that it's easy to understand, but in fact we would be unlikely to use it in a real-world implementation. This is because we're only interested in being able to perform a single function at any particular time, so we would examine the functions to find areas of commonality and share gates between them. To put this another way, instead of having multiple distinct functions feeding a multiplexer, it's likely that we'd lose the multiplexer and "scrunch" all of the functions together into one "super function," thereby allowing us to reduce the ALU's total gate count and increase its speed. On the other hand, there's nothing intrinsically wrong with this multiplexer-based technique, so we'll stick with it for the purposes of these discussions. So now we know how to implement the data-processing portion of our core ALU, but we've yet to decide how we're going to use the AgtB, AeqB, Oadd, and COadd signals, and also how we're going to generate the CO, O, N, and Z status outputs (Figure 12-12).
Figure 12-12: Generating the status outputs from the five core functions is fairly simple
The N ("negative") status output is the easiest of all, because it's simply a copy of the most-significant bit of the data outputs (that is, F[7]).
Things get a little more complicated when we come to the Z ("zero") output, because it depends on the type of operation that the ALU is performing. In the case of the AND, OR, XOR, and ADD functions, the zero output is set to logic 1 if the result from the operation is all 0s. We can create an internal signal called Zint to implement this by simply feeding all of the F[7:0] data outputs into an 8-bit NOR gate. However, in the case of the CMP function, we wish the Z output to be set to logic 1 if the two data values A[7:0] and B[7:0] are equal (this is represented by the AeqB signal coming out of the CMP block). The bottom line is that we've got a single output called Z, which we want to reflect the state of one of two signals, Zint and AeqB, depending on the function being performed. We can achieve this by feeding Zint and AeqB into a 2:1 multiplexer, whose select input could be controlled by a third instruction bit driving the core ALU. Similarly, we usually want the CO status output to reflect the carry out from the ADD function on its COadd signal; but if we're performing a CMP instruction, then we want the CO signal to be set to logic 1 if the unsigned binary value on A[7:0] is greater than that on B[7:0]. Once again, we can achieve this by feeding both the COadd and AgtB signals into a 2:1 multiplexer controlled by our third instruction bit.
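Pulling the whole thing together, here's a behavioral C sketch of the core ALU (a sketch of ours, not the book's gate-level design; the operation codes and the flags structure are invented for illustration, and the AgtB test is performed on unsigned values, per the CO discussion just above):

#include <stdint.h>

enum { OP_AND, OP_OR, OP_XOR, OP_ADD, OP_CMP };   /* hypothetical encodings */

typedef struct { int co, o, n, z; } Flags;

static uint8_t core_alu(int op, uint8_t a, uint8_t b, int ci, Flags *fl)
{
    uint8_t f = 0;
    switch (op) {
    case OP_AND: f = a & b; break;
    case OP_OR:  f = a | b; break;
    case OP_XOR: f = a ^ b; break;
    case OP_ADD: {
        uint16_t sum = (uint16_t)(a + b + (ci ? 1 : 0));
        f = (uint8_t)sum;
        fl->co = (sum >> 8) & 1;                 /* COadd: carry out of bit 7 */
        /* Oadd: XOR of the carry-in and carry-out of the MS bit;
           (a ^ b ^ f) recovers the carry into bit 7. */
        fl->o = (((a ^ b ^ f) >> 7) & 1) ^ fl->co;
        break;
    }
    case OP_CMP:
        fl->co = (a > b);     /* AgtB routed to CO via the 2:1 multiplexer */
        fl->z  = (a == b);    /* AeqB routed to Z via the 2:1 multiplexer  */
        return 0;             /* CMP generates no data as such             */
    }
    fl->n = (f >> 7) & 1;     /* N is a copy of F[7]                       */
    fl->z = (f == 0);         /* Zint: the 8-bit NOR of F[7:0]             */
    return f;
}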
Extending the Core ALU to Perform Subtractions
Thus far we've designed a core ALU that can perform five simple functions, but we know that our CPU requires more of us. For example, our core ALU has an ADD function that can add two 8-bit signed binary numbers together (along with the carry-in status input), but the CPU needs to be able to perform both additions and subtractions in the form of the instructions that we referenced at the beginning of this chapter: ADD ("add without carry"), ADDC ("add with carry"), SUB ("subtract without carry"), and SUBC ("subtract with carry"). This means that we need to extend our core ALU in strange and wondrous ways, such that we can control the value being fed into the B[7:0] inputs by means of a complementor block (Figure 12-13). Before delving into this new complementor block, it may be best to determine exactly what we want it to do. In the case of instructions such as AND, OR, XOR, ADD, and ADDC, we want the new block to pass whatever value is on the BB[7:0] inputs directly through to its outputs without any modification. However, in the case of SUB and SUBC instructions, our new block is going to negate (generate the one's complement of) the value on the BB[7:0] inputs before passing it on to the core ALU (Figure 12-14).
Figure 12-13: The ALU requires a complementor block in order to perform subtractions (the complementor block sits in front of the core ALU's BB[7:0] inputs to form the extended ALU)
Figure 12-14: The complementor block consists of a negator and a 2:1 multiplexer (plus we have to augment the status logic in the core ALU)
As we see, this example complementor block only contains two functions: a 2:1 multiplexer and a negator, where the negator could simply comprise eight NOT gates - one for each signal in the data path. If the pattern on the instruction bits represents an operation such as AND, OR, XOR, ADD, or ADDC, then we'll decode them in such a way that they cause the 2:1 multiplexer in the complementor block to select the value on BB[7:0]. By comparison, a SUB or SUBC will cause the multiplexer to select the outputs from the negator, whose value is the inverse of that found on BB[7:0]. In reality, we could derive a far more efficient and aesthetically pleasing complementor block by simply replacing its negator and multiplexer by eight 2-input XOR gates, where one input from each of these gates would be driven by the multiplexer's select input in Figure 12-14a. In this case, a logic 0 on the select signal would cause the XOR gates to pass the values on their
other inputs directly to their outputs, while a logic 1 on the select signal would cause the XOR gates to invert the values at their outputs. (In fact the only reason we considered the multiplexer-based technique at all is that it's easier to understand for beginners.)
One question that is probably on your lips is: "Why are we using a negator (which generates the one's complement of the value on BB[7:0]) instead of a two's complementor?" After all, we devoted a lot of effort in the early portions of this chapter singing the praises of two's complement versus one's complement representations. The answer to this question will make you begin to appreciate the wily ways of the CPU designer, but let's first review what we know thus far:
i) We can perform a subtraction such as a - b by converting b into its negative equivalent and then performing an addition; that is, another way to represent a - b is to say a + (-b).
ii) We could convert b to -b by generating its two's complement value.
iii) We already have an ADD function in our core ALU.
0\-.. 0 Ov This le not the a d d e r ~ in the Gore ALU
Thle ie not t In the cot
(a) One way to generate the
twos complement of 131317:0]
~0.,~3"
..0.~h '~
(b) Another way to generate the twos complement of 131317:0]
Figure 12-15: These two techniques for generating the two's complement of BB[7:0] both require the use of an additional 8-bit adder
C h a p t e r 12 A minus B = A + NOT(B) + I
Remember that we're not actually going to use a two's complementor; this portion of our discussions is simply intended to "set the scene." The process of inverting the bits using the negator is easy (we can simply feed each bit through a NOT gate), but adding 1 to the result would require us to build a second 8-bit adder, which would require a substantial number of additional logic gates. Given a choice, we would prefer not to have two adder blocks in our ALU, but what can we do? In fact, the answer is floating around under our noses. Consider the methods for generating a two's complement shown in Figure 12-15. The first (Figure 12-15a) requires us to force our new adder's CI input to a logic 0, and to connect its second set of inputs to a hard-wired $01 value (where the '$' character indicates a hexadecimal value). Alternatively, we can achieve the same effect by connecting the second set of inputs to a hard-wired $00 value, and forcing the CI input to our new adder to a logic 1 (Figure 12-15b). Hmmm, just a cotton-pickin' moment. Earlier in our discussions we noted that, if we decided to use a two's complementor for a simple SUB operation, then we would have to force the CIadd carry-in signal to the adder in the core ALU to logic 0, but we seem to have introduced some redundancy here. To understand why this should be so, let's combine the two's complementor in Figure 12-15b with the main adder in the core ALU (note that we've omitted the multiplexer in the complementor block to simplify the issue) (Figure 12-16).
Figure 12-16: This would be the situation if we were to use a two's complementor
Does anything leap out at you from this figure? Well, the CI input to the adder in the two's complementor is being forced to logic 1, while the CIadd input to the adder in the core ALU is being forced to a logic 0. It doesn't take much thought to realize that we can achieve exactly the same effect by forcing the CI input to the adder in the two's complementor to a logic 0, and forcing the CIadd input to the adder in the core ALU to a logic 1. This is the clever bit. If the CI input to the adder in the two's complementor is forced to a logic 0, then this adder isn't actually doing anything at all. That is, we've ended up with a block that's adding one set of inputs (the outputs from the negator) to a value of zero with a carry-in of zero. As any value plus zero equals itself, why do we need this adder? The answer is that we don't! If we force the CIadd input to the adder in the core ALU to a logic 1, we can simply lose the adder from the two's complementor. Hence the fact that Figure 12-14a showed our complementor block as only containing a negator and a multiplexer.
So let's summarize where we're at so far:
1) To perform a SUB function (without a carry-in), we need to perform the operation a[7:0] - b[7:0].
2) We know that this is equivalent to a[7:0] + (-b[7:0]).
3) We also know that we can generate (-b[7:0]) by taking the two's complement of b[7:0] using (NOT(b[7:0]) + 1), which would allow us to perform the operation a[7:0] + (NOT(b[7:0]) + 1).
4) But we don't want to use a two's complementor, because this would require too many gates; instead, we only want to use a negator (one's complementor). The one's complement of b[7:0] is NOT(b[7:0]), which would allow us to perform the operation a[7:0] + (NOT(b[7:0])).
5) We know that the two's complement of a number equals its one's complement + 1, so we can also say that the one's complement of a number equals the two's complement - 1. This means that a[7:0] + (NOT(b[7:0])) is equivalent to a[7:0] + (-b[7:0]) - 1.
6) So forcing the CI input to the adder in the core ALU to logic 1 means that the operation we're actually performing is a[7:0] + (-b[7:0]) - 1 + 1.
If we cancel out the -1 and +1 in step (6), then we're left with an identical expression to that shown in step (2), which is what we wanted in the first place. This explains why we only require a negator (one's complementor) in our complementor block, because forcing a logic 1 onto the carry-in input to the
adder in the core ALU allows us to take the one's complement output from the negator and convert it into a full two's complement value. And it's as simple as that .... phew!
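To see the whole trick in one place, here's a C sketch of our own (with invented names) in which the complementor block is just eight XOR gates acting as a conditional negator, and the forced carry-in supplies the "+1":

#include <stdint.h>

/* a + NOT(b) + 1 == a - b: the XOR mask plays the part of the eight
   2-input XOR gates, and ci plays the part of the forced carry-in. */
static uint8_t extended_alu(uint8_t a, uint8_t b, int subtract)
{
    uint8_t mask = subtract ? 0xFF : 0x00;   /* decoded from the instruction bits */
    uint8_t bb   = (uint8_t)(b ^ mask);      /* negator: NOT(b) when subtracting  */
    int     ci   = subtract ? 1 : 0;         /* forced carry-in supplies the +1   */
    return (uint8_t)(a + bb + ci);           /* one adder serves ADD and SUB      */
}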
The topic in this chapter was published in a condensed form as a two-part article under the title A minus B = A + NOT(B) + 1 in the December 5th, 1996 and the January 12th, 1997 issues of EDN (www.ednmag.com), and is reproduced in its original form here with their kind permission.
For your further reading pleasure, this article was itself abstracted from the book: Bebop BYTES Back (An Unconventional Guide to Computers), ISBN 0-9651934-0-3, with the kind permission of Doone Publications (www.doone.com) (see also the order form in the back of this book).
Chapter 13:
Binary Multiplication
"The product of all fears"
In this chapter you will discover:
Primitive Computers with Simple Instruction Sets
The Shift-and-Add Technique
Signed Binary Multiplication
Example Subroutine
Primitive Computers with Simple Instruction Sets
Modern computers typically have special instructions and employ dedicated logic (either inside or outside the central processing unit) to perform multiplication operations. But early microcomputers didn't have this capability, with the result that their users had a limited number of options: either they made do without the ability to multiply numbers together (which would have restricted their ability to write useful programs to the point of absurdity), or they were obliged to come up with some technique to perform multiplication using whatever primitive instructions were available to them.
The Shift-and-Add Technique
One technique for performing multiplication in any number base is by means of repeated addition; for example, 6 x 4 = 6 + 6 + 6 + 6 = 24 in decimal (similarly, 4 x 6 = 4 + 4 + 4 + 4 + 4 + 4 = 24). However, although computers can perform millions of operations every second, this technique can be extremely time-consuming when the values to be multiplied are large. For example, if we wished to multiply two unsigned 16-bit numbers together (where an unsigned 16-bit number can be used to represent values in the range 0 through 65,535 in decimal ($0000 through $FFFF in hexadecimal)), then the worst-case for a subroutine based on the repeated addition technique would be to have to perform 65,535 additions, which would be somewhat less than amusing. As an alternative, we can perform multiplications using a "shift-and-add" technique based on the generation of intermediate results called partial products. For example, consider the multiplication of two 8-bit unsigned binary numbers (Figure 13-1).
8-bit multiplicand (00010110 = 22 in decimal) x 8-bit multiplier (01110011 = 115 in decimal): one left-shifted partial product is generated for each multiplier bit, and adding the partial products gives the 16-bit result (0000100111100010 = 2530 in decimal)
Figure 13-1: Binary multiplication can be performed using a shift-and-add technique
Using this algorithm, a partial product is generated for each bit in the multiplier (each partial product being padded with 0s to the necessary width). If the value of a multiplier bit is 0, its corresponding partial product consists only of 0s; but if the value of the multiplier bit is 1, its corresponding partial product is a copy of the multiplicand. Also, each partial product is left-shifted as a function of the multiplier bit with which it is associated; for example, the partial product associated with bit 0 in the multiplier is left-shifted zero bits, while the partial product associated with bit 1 in the multiplier is left-shifted one bit, and so on. Once all of the partial products have been generated, they are added together to generate the result, whose width is equal to the sum of the widths of the multiplicand and the multiplier. As was illustrated in Figure 13-1, multiplying two 8-bit numbers together generates a 16-bit result. Each of our unsigned 8-bit numbers can carry values in the range 0 through 255 in decimal ($00 through $FF in hexadecimal), so the result will be in the range 0 through 65,025 in decimal ($0000 through $FE01 in hexadecimal). This shift-and-add approach can be implemented in hardware or software. If we were considering a hardware implementation, then we could use a dedicated block of logic to generate all of the partial products simultaneously and add them together. But a hardware solution wasn't an option in the case of early microprocessor users, so they were obliged to use a software solution, which means that they had to generate each partial product individually and add it into the result. Even so, the beauty of this scheme is that it only requires the same number of additions as there are bits in the multiplier, which means eight additions for this particular example. But there are several considerations here that we need to ponder. For example, if we are considering a computer with an 8-bit data bus (like the early microcomputers), then each of the partial products in Figure 13-1 would be 16 bits wide (including the 0s used for padding), and we would have to perform eight 16-bit additions. Also, we would have to expend a substantial amount of effort splitting our multiplicand into the 8-bit quantities required to form each 16-bit partial product. To illustrate the situation in more detail, consider the partial product associated with bit 4 of the multiplier (remember that bits are numbered from right to left starting at 0) (Figure 13-2).
The partial product generated from bit 4 in the multiplier (00010110 shifted four places) straddles two bytes, and a number of instructions would be required to create these two bytes
Figure 13-2: Generating 16-bit partial products is time-consuming if your computer only has an 8-bit data bus
The problem is that we would have to perform a number of operations to split our 8-bit multiplicand across these two bytes. Also, we would have to split the multiplicand at different locations for each partial product. This isn't to say that we couldn't use this technique, but rather that we would prefer a method that could achieve the same effect with fewer instructions and less hassle. In fact, by some strange quirk of fate, there is indeed a rather cunning ploy that we can use (Figure 13-3).
The 16-bit result is initially loaded with 0s; the 8-bit multiplicand is added into the result's most-significant byte, then the whole result is shifted one bit to the right through the carry flag
Figure 13-3: We can minimize our code using a cunning ploy
Initially, the two bytes we're using to represent our result are loaded with 0s. Next we look at bit 0 in the multiplier to see if it contains a 0 or a 1. If bit 0 of the multiplier contains a 0, we add a byte of 0s to the most-significant byte of the result, but if bit 0 of the multiplier contains a 1, then we add a copy of the multiplicand into the most-significant byte of the result. Note that, in both cases, we're interested in the resulting state of the carry flag. Next we shift both bytes of the result one bit to the right. When we shift the most-significant (left-hand) byte of the result, it is important that we shift the carry flag into its most-significant bit. Also, the act of right-shifting the most-significant byte causes the bit that "drops off the end" (this byte's original least-significant bit) to be loaded into the carry flag. Similarly, when we shift the least-significant (right-hand) byte of the result, it is important that whatever bit "dropped off the end" from the most-significant byte is shifted into the most-significant bit of the least-significant byte (hmmm, you might wish to ruminate on that last sentence for awhile). The point is that we can repeat this procedure for each of the bits in the multiplier, either adding a byte of 0s or a copy of the multiplicand into the most significant byte of the result, and then shifting both bytes of the result one bit to the right. If we were to sketch this process out on a piece of paper, we'd find that performing this sequence for each of the bits in the multiplier ultimately generates the same result as our original technique based on 16-bit partial products, but this method has the advantage of requiring far fewer computer instructions.
Sad to relate, there's still one more problem before we continue - the task of extracting and testing the bits in the multiplier. One approach would be to individually mask out the bits. For example, to determine the value in bit 0 of the multiplier we could AND it with $01 (or 00000001 in binary), and then use a JZ ("jump if zero") instruction to vary our actions depending on the result (alternatively, we could use a JNZ ("jump if not zero") if that better served our purposes). Similarly, to determine the value in bit 1 of the multiplier we could AND it with $02 (or 00000010 in binary); to determine the value in bit 2 of the multiplier we could AND it with $04 (or 00000100 in binary); and so forth. However, if we actually try to implement this technique, we would again discover that it requires a lot of instructions and muddling around. As an alternative, we could simply shift the multiplier one bit to the right (which would result in its least-significant bit falling off the end and dropping into the carry flag), and then use a JC ("jump if carry") instruction to vary our actions depending on the result (as usual, we could use a JNC ("jump if not carry") instruction if that better served our purposes). Finally, we might note that we are already shifting our 16-bit result one bit to the right for each bit in our 8-bit multiplier, which means that whatever is initially loaded into the least-significant byte of the result will ultimately be thrown away anyway. Thus, we can save ourselves considerable effort by initializing the result such that its most-significant byte contains 0s, while its least-significant byte contains the multiplier (Figure 13-4).
The most-significant byte of the result is initially loaded with 0s, while the least-significant byte of the result is initially loaded with the multiplier
Figure 13-4: Pre-loading the multiplier into the least-significant byte of the result also saves instructions
This means that every time we stroll around the main loop, which involves shifting both bytes of the result one bit to the right, the next bit of interest in the multiplier will automatically end up in the carry flag, ready and waiting for the following iteration of the loop. Don't worry if you find the above to be a bit mind-boggling at first - after a while you'll find that this sort of "wheels-within-wheels" thinking starts to come naturally (and that would be the time to start worrying).
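If it helps, here's the same cunning ploy rendered as a C sketch of our own (names invented; the carry flag and the rotate-right-through-carry operations are modeled explicitly):

#include <stdio.h>
#include <stdint.h>

/* Unsigned 8 x 8 = 16 shift-and-add multiply, as per Figures 13-3
   and 13-4: the low byte is pre-loaded with the multiplier, and each
   right shift delivers the next multiplier bit for testing. */
uint16_t umult8(uint8_t multiplicand, uint8_t multiplier)
{
    uint8_t hi = 0;             /* MS byte of result, initially 0s     */
    uint8_t lo = multiplier;    /* LS byte pre-loaded with multiplier  */

    for (int i = 0; i < 8; i++) {
        int carry = 0;
        if (lo & 1) {                           /* next multiplier bit */
            uint16_t sum = (uint16_t)(hi + multiplicand);
            hi = (uint8_t)sum;
            carry = (sum >> 8) & 1;             /* carry from the add  */
        }
        /* Shift both bytes right: carry goes into the MS bit of hi,
           and the bit that "drops off the end" of hi goes into the
           MS bit of lo. */
        int fell_off = hi & 1;
        hi = (uint8_t)((hi >> 1) | (carry << 7));
        lo = (uint8_t)((lo >> 1) | (fell_off << 7));
    }
    return (uint16_t)((hi << 8) | lo);
}

int main(void)
{
    printf("%u\n", umult8(22, 115));  /* prints 2530, as in Figure 13-1 */
    return 0;
}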
Signed Binary Multiplication
Unfortunately, the technique introduced overleaf can only handle unsigned numbers. If we decide to consider our 8-bit numbers as representing signed binary values, then sometimes this routine will work and sometimes it won't, which many would consider to be somewhat less than satisfactory. Thus, if we wish to multiply signed 8-bit numbers with any level of confidence, we will need to modify our technique. In fact the solution is really rather simple. From the discussions above we already know how to multiply unsigned (positive) numbers together, so all we need to do to multiply signed (positive and negative) numbers together is to convert any negative values into their positive equivalents, multiply the values together, and then correct the sign of the result if necessary. For some reason, it's usually easier to visualize the way in which this sort of thing works in terms of a hardware implementation (Figure 13-5).
Bit 7 (the sign bit) of the multiplicand selects, via a multiplexer, between the multiplicand and its two's complement; bit 7 of the multiplier does the same for the multiplier; the two positive values feed the multiplier array, and a final two's complementor and multiplexer (controlled by the XOR of the two sign bits) produce the final result
Figure 13-5: It's usually easier to visualize multiplying signed binary numbers as a hardware implementation
Chapter 13 BinaryMultiplication
is also connected to the multiplexer, whose s~l~c~ input is driven from the multiplicand's sign bit. This is the clever part, because if the sign bit contains a logic 0 (indicating a positive value), this instructs the multiplexer to select the multiplicand. Conversely, if the sign bit contains a logic I (indicating a negative value), this instructs the multiplexer to select the output from the two's complementor, which is the positive equivalent of the multiplicand. Similar actions are also performed in the case of the multiplier. Thus, the multiplier array is always going to be presented with two positive numbers, which it proceeds to multiply together with alacrity and dispatch. Assuming that the multiplier array is based on the partial product technique introduced earlier in this chapter, then this array will generate all of the partial products, add them together, and present the 16-bit result at its outputs. Now comes another ingenious legerdemain, in which we decide whether or not to negate the output from the multiplier array. If both the multiplicand and the multiplier were positive then we don't have to invert the output from the array, because a positive times a positive equals a positive. Similarly, we don't have to invert the output from the array if both the multiplicand and the multiplier were negative, because a negative times a negative equals a positive. Thus, it is only when the multiplicand and the multiplier have different signs that the result should be negative, in which case we will need to invert the output from the array. As is illustrated in Figure 13-5, the output from the array is fed into a multiplexer and two's complementor arrangement similar to the ones used for the multiplicand and the multiplier. However, in this case, the multiplexer is controlled by the output from an XOR gate, whose inputs are the sign bits from the multiplicand and the multiplier. If both of the sign bits have the same value, then the output from the XOR gate will be a logic 0 and the multiplexer will select the outputs from the multiplier array. Alternatively, if the sign bits have different values, then the output from the XOR will be a logic 1, thereby causing the multiplexer to select the outputs from the two's complementor.
Example Subroutine For the purposes of these discussions we are going to realize our solution as a software subroutine as opposed to a hardware implementation, but the same general principles apply. Also, software practitioners keep a few tricks up their own sleeves; for example, to generate the two's complement of a number we can simply subtract that number from zero; alternatively, we can achieve the same effect by inverting all of the bits in the number and then adding 1 to the result (you will see this technique being used in the code for the subroutine overleaf).
The procedure presented above does raise one slight problem, in that signed 8-bit numbers can carry values in the range -128 through +127 in decimal ($80 through $7F in hexadecimal). The problem lies in the fact that we can't convert any -128 values into their positive equivalents, because our 8-bit fields simply cannot represent a value of +128. This means that we have to introduce a rule that says we can only guarantee the results returned by our subroutine if it is presented with values in the range -127 through +127 in decimal ($81 through $7F in hexadecimal). (Note that it's the user's responsibility to ensure that this condition is met.) Thus, the most-negative result we can receive will be -127 x +127 (or +127 x -127) = -16,129, while the most-positive will be +127 x +127 (or -127 x -127) = +16,129, so our 16-bit result will be in the range -16,129 through +16,129 in decimal ($C0FF through $3F01 in hexadecimal). Now let's proceed to the assembly language listing of the subroutine itself, which retrieves two 8-bit signed numbers from the stack, multiplies them together, and places the 16-bit signed result on the top of the stack (note that this subroutine assumes that any 16-bit numbers are stored with the most-significant byte "on top" of the least-significant byte).
###################################################################
# Copyright (C) Maxfield & Montrose Interactive Inc, 1996, 1997.
#
# The authors are not responsible for the consequences of using
# this software, no matter how awful, even if they arise from
# defects in it.
###################################################################
# Name:     _SMULT8
#
# Entry:    Top of stack
#              Most-significant byte of return address
#              Least-significant byte of return address
#              First 8-bit number (multiplicand)
#              Second 8-bit number (multiplier)
#
# Function: Multiplies two 8-bit signed numbers (in the range
#           -127 to +127) and returns a 16-bit signed result.
#
# Exit:     Top of stack
#              Most-significant byte of result
#              Least-significant byte of result
#
# Modifies: Accumulator
#           Index register
#
# Size:     Program = 128 bytes
#           Data    = 6 bytes
###################################################################
_SMULT8:  BLDX  9             # Load the index register with 9, which
                              # equals the number of times we want to
                              # go around the loop +1
          POPA                # Retrieve MS byte of return address
          STA   [_AD_RADD]    # from stack and store it
          POPA                # Retrieve LS byte of return address
          STA   [_AD_RADD+1]  # from stack and store it
          POPA                # Retrieve multiplicand from stack
          STA   [_AD_MAND]    # and store it
          POPA                # Retrieve multiplier from stack and
          STA   [_AD_RES+1]   # store it in LS byte of result
          LDA   0             # Load the accumulator with 0 and
          STA   [_AD_RES]     # store it in the MS byte of result
####
#### Invert input values if necessary and load the output flag
####
_AD_TSTA: LDA   [_AD_MAND]    # Load the multiplicand and
          STA   [_AD_FLAG]    # save it to the flag
          JNN   [_AD_TSTB]    # If multiplicand is positive then jump
                              # to '_AD_TSTB', otherwise..
          XOR   $FF           # ..invert the contents of the ACC then..
          INCA                # ..add 1 to ACC
          STA   [_AD_MAND]    # ..store now-positive multiplicand
_AD_TSTB: LDA   [_AD_FLAG]    # Load the flag,
          XOR   [_AD_RES+1]   # XOR it with the multiplier,
          STA   [_AD_FLAG]    # then store the flag again
          LDA   [_AD_RES+1]   # Load the multiplier into the ACC
          JNN   [_AD_DUMY]    # If multiplier is positive then jump
                              # to '_AD_DUMY', otherwise..
          XOR   $FF           # ..invert the contents of the ACC
          INCA                # ..add 1 to ACC
          STA   [_AD_RES+1]   # ..store now-positive multiplier
_AD_DUMY: ADD   0             # Add zero to the accumulator (dummy
                              # instruction whose sole purpose is to
                              # set the carry flag to 0)
####
#### Hold tight - this is the start of the main multiplication loop
####
_AD_LOOP: LDA   [_AD_RES]     # Load ACC with MS byte of result
                              # (doesn't affect the carry flag)
          JNC   [_AD_SHFT]    # If carry=0, jump to start shifting,
          ADD   [_AD_MAND]    # otherwise add the multiplicand (which
                              # may modify the carry flag)
_AD_SHFT: RORC                # Rotate the accumulator (MS byte of
                              # result) 1-bit right. This shifts the
                              # carry flag into the MS bit and also
                              # updates the carry flag with the bit
                              # that "falls off the end"
          STA   [_AD_RES]     # Now store the MS byte of result
                              # (doesn't affect the carry flag)
          LDA   [_AD_RES+1]   # Load ACC with LS byte of result
                              # (doesn't affect the carry flag)
          RORC                # Rotate the LS byte of the result 1-bit
                              # right. This shifts the carry flag into
                              # the MS bit and also updates the carry
                              # flag with the multiplier bit that
                              # "falls off the end"
          STA   [_AD_RES+1]   # Now store the LS byte of result
                              # (doesn't affect the carry flag)
          DECX                # Decrement the index register (which
                              # doesn't affect the carry flag).
          JNZ   [_AD_LOOP]    # If the index register isn't 0 then jump
                              # back to the beginning of the loop
####
#### Breathe out - this is the end of the main multiplication loop.
#### Now check the flag and negate the output result if necessary
####
_AD_TSTC: LDA   [_AD_FLAG]    # Load ACC with the flag
          JNN   [_AD_SAVE]    # If MS bit of flag is 0 then jump to
                              # '_AD_SAVE', otherwise..
          LDA   [_AD_RES+1]   # ..load ACC with LS byte of result
          XOR   $FF           # ..invert the contents of the ACC
          INCA                # ..add 1 to ACC (updates carry flag)
          STA   [_AD_RES+1]   # ..store negated LS byte (doesn't
                              # affect carry flag)
          LDA   [_AD_RES]     # ..load ACC with MS byte of result
                              # (doesn't affect carry flag)
          XOR   $FF           # ..invert the contents of the ACC
                              # (doesn't affect carry flag)
          ADDC  $00           # ..propagate any carry from LS byte
          STA   [_AD_RES]     # ..store negated MS byte
####
#### Save the result on the stack and then let's exit the subroutine
####
_AD_SAVE: LDA   [_AD_RES+1]   # Load ACC with LS byte of result
          PUSHA               # and stick it on the stack
          LDA   [_AD_RES]     # Load ACC with MS byte of result
          PUSHA               # and stick it on the stack
          LDA   [_AD_RADD+1]  # Load ACC with LS byte of return
          PUSHA               # address from temp location and
                              # stick it back on the stack
          LDA   [_AD_RADD]    # Load ACC with MS byte of return
          PUSHA               # address from temp location and
                              # stick it back on the stack
          RTS                 # That's it, bug out of the subroutine

_AD_RADD: .2BYTE              # Reserve 2-byte temp location for
                              # the return address
_AD_MAND: .BYTE               # Reserve 1-byte temp location for
                              # the multiplicand
_AD_RES:  .2BYTE              # Reserve 2-byte temp location for
                              # the result
_AD_FLAG: .BYTE               # Reserve 1-byte to be used as a flag
                              # for negating the result (or not)
The topic in this chapter was publish~ in a condens~ form under the title The Product of all Feana: Binary Multiplication in the April lOth, 1997 Issue of EDN (www.~nmag.com), and is reproduced in its original form here with their kind permiseion.
[i'~ii'~ [ [
!
For your further reading pleasure, this article was iteelf abstracted from iii the book: Bebop BYTES Back (An Unconventional Guide to Computera), [ii 1513N 0-9651934-0-3, with the kinct permission of Doone Publications
(www.doone.com) (see also the order form in the back of this book),
liii
iil
Chapter 14:
Binary Division "Divide and conquer" In this chapter you will discover: L o n g Division is a Pain ................................................................................................................168 Taking Stabs in t h e D a r k .........................................................................................................168 A 4-Bit Test C a s e ..............................................................................................................................170 E x a m p l e S u b r o u t i n e .......................................................................................................................173
168 Designus Maximus Unleashed! s
Division is a P a i n
The previous chapter considered the problem of performing binary multiplication using the elementary instructions available to the users of simple microprocessors. Continuing the theme, we now turn our attention to binary division, which is one of the most difficult operations for a computer to perform. To understand why this should be so, consider the way in which one might perform a long division in the decimal numbering system; for example, let's divide 14 into 663 (Figure 14-1). 47 14 ) 4 x 14 = 5 6
66
- 5 6 = 10
Drop the 3 to form 103 7 x 14 = 9 8
663 56[
l~eeult equals 47 plus 5 remainder
,.10 ~103 "
98
103 - 9 8 = 5
Figure 14-I. Performing a long division in decimal requires a little effort Assuming that we didn't have the calculations shown in Figure 14-1 in front o f us, our thought processes would probably go something like this: We know that we can't divide 14 into 6, so we'd take the next option and try to divide 14 into 66. But how should we set about doing this? Most of us would probably hunt around in our minds and on our fingers, saying something like: "3 x 14 - 42, but that's too small; 5 • 14 -- 70, but that's too big; 4 x 14 -- 56, a n d that's as close as w e can get."
So now we know that the first digit of our result is 4 (from the 4 x 14). Next we subtract 56 from 66 leaving 10, drop the 3 down to form 103, and go through the process again, saying: "6 • 14 = 84, but that's too small; 8 • 14 - 112, but that's too big; 7 x 14 - 98, a n d that's as close as w e can get." Thus, we now know that the second digit of our result is 7. Finally we subtract 98 from 103 leaving 5, which is too small to divide by 14 (note that we're performing an integer division), so we know that our result is 47 with a remainder of 5.
~ a k i n g 5 t a b s in t h e D a r k The point about the example above is that a CPU has to go through a similar process, which means that it has to iterate on the result by taking stabs in the dark, seeing if it went too far, and backtracking if it did. To confuse the issue even
Chapter 14 BinaryDivision further, we sort of end up doing everything backwards, because we use the same sort of cunning tricks that we employed in the previous chapter to make our multiplications easier for the CPU to handle. To cut a long story short, assume that we have two 16-bit numbers called the dividend and the divisor, and we wish to divide the former by the latter. Ultimately we'll end up with a 32-bit result, in which the most-significant 16 bits will contain the remainder, while the least-significant 16 bits will contain the quotient (final value) (Figure 14-2).
~
~
Carry \ flag
~i2
16- b't Di idend (in itial valu e)
: ~ ~ ~ ~ ~ ~
"
~t-- Sh~ ~ 16-bit Remainder
\
~
/
/
16-bit Divisor
T~I
3 2 - b i t Result \
~-
Sh~ ~--
/
16-bit Quotient (final value) /
Figure 14-2: Graphical representation of a 16-bit unsigned divide
The way in which this works may seem kind of difficult to follow at first, but everything will come out in the wash. First, we reserve a 2-byte field in which to store our divisor, along with a 4-byte field in which to store our result. Also, we initialize the most-significant two bytes of the result to contain all zeros, and we load the least-significant two bytes with our dividend (the number to be divided). Once we've initialized everything, we have to perform the following sequence of operations sixteen times:
a) Shift the entire 32-bit result one bit to the left (shift a zero into the least-significant bit). b) Subtract the 2-byte divisor from the most-significant 2 bytes of the result and store the answer back into these 2 bytes.
c) If the carry flag contains a logic 0 following step (b), then this indicates a positive result, which means that the divisor was smaller than the 2 mostsignificant bytes of the result. In this case, force the least-significant bit of the result to a logic 1.
169
170 Designus Maximus Unleashed! Otherwise, if the carry flag contains a logic 1, then this indicates a negative result, which means that the divisor was bigger than the 2 most-significant bytes of the result. In this case, leave the least-significant bit of the result containing a logic 0 (from the shift), add the 2-byte divisor back to the most-significant 2 bytes of the result, and store the answer back into these 2 bytes. Note particularly the case in which the carry flag contains a logic I when we enter step (c). Every time this occurs, it means that the divisor was too big to subtract from the portion of the dividend that we're currently examining. But we've discovered this too late, because we've already performed the subtraction, which means that we now have to add the divisor back in. In fact, this is known as a restoring division algorithm for just this reason; namely, that we have to keep on restoring things every time we "go too far." There are also non-restoring algorithms which are somewhat more efficient, but also a tad more complicated and convoluted ...... so we'll ignore them in the hope they'll go away.
4-
it r
ease
Now, unless you've got a size-16 brain (and one of the models equipped with turbo cooling at that), the above has probably left you feeling overheated, confused, lost, and alone. But don't be afraid, because computer division can bring the best of us to our knees. The easiest way to understand this (or, at a minimum, to convince ourselves that it actually works as promised) is to examine a much simpler test case based on 4-bit numbers. For example, let's consider how we'd divide 00112 (3 in decimal)into 10112 (11 in decimal). Our first step would be to set up some initial conditions (Figure 14-3). Carry flag
Zen:~
JO0
Dividen~e
O,v,,or
Figure 14-3: Initial conditions for our 4-bit test case
Remembering that our thought-experiment is based on 4-bit numbers, this means that the most-significant 4 bits of what will eventually be our 8-bit result are set to zero, while our dividend will be loaded into the least-significant 4 bits. Now consider what happens during the first cycle of the process (Figure 14-4).
Chapter 14 BinaryDivision 1 71
I!!
ill I Oiii/
(a) lntial con~iitions at start of this cycle
(b) 5hi~ the entire 8-bit
(c) Subtract ~iivisor from the 4 MS bit~ of the result
(b) Add ~llvisor back into the 4 MS bits of result
result 1-bit to the left;
Figure 14-4: First cycle of our 4-bit division test case
Commencing with our initial conditions at the start of this cycle (Figure 14-4a), the first thing we do is to shift the entire 8-bit result one bit to the left, and also shift a logic 0 into the least-significant bit during the process (Figure 14-4b). Next we subtract the divisor from the 4 most-significant bits of the result (Figure 14-4c), but this sets the carry flag to a logic 1, which tells us that we've gone too far. Thus, we complete this cycle by adding the divisor back into the 4 most significant bits of the result (Figure 14-4d). As fate would have it, the second cycle offers another helping of the same thing (Figure 14-5). ~m
liii iii iiiil (a) Intiai con~litions at start of this cycle
I
(b) Shif~ the entire 8-bit
result 1-bit to the left;
i!i!iiiil
(c) Subtract ~iivisor from the 4 MS bits of the result
(b) Add divisor back into the 4 MS bits of result
Figure 14-5: Second cycle of our 4-bit division test case
Once again, the first thing we do is to shift the entire 8-bit result one bit to the left, and also shift a logic 0 into the least-significant bit during the process (Figure 14-5b). Next we subtract the divisor from the 4 most-significant bits of the result (Figure 14-5c), but this sets the carry flag to a logic 1, which tells us that we've go too far. Thus, we complete this cycle by adding the divisor back into the
172 Designus Maximus Unleashed! 4 most-significant bits of the result (Figure 14-5d). We can but hope that the third cycle will do something to break the monotony (Figure 14-6).
(a) lntlal conditions at etar~ of this cycle
(b) 5hif~ the entire 8-bit result 1-bit to the left \1/
i00,
, /
(c) Subtract ~livisor from the 4 MS bits of the result
I x
(b) Set L5 bit of result to logic 1
Figure 14-6: Third cycle of our 4-bit division test case
As usual we commence this cycle by shifting our 8-bit result one bit to the left (Figure 14-6b) and subtracting the divisor from the 4 most-significant bits of the result (Figure 14-6c). But in the case of this cycle, the carry flag is left containing a logic O, which means that the only thing we have to do is to force the least-significant bit of the result to a logic I (Figure 14-6d). Now the excitement really starts to mount, because we've only got one more cycle to go, but we don't appear to be closing on a result. Can things possible work out to our satisfaction? (Figure 14-7). -.q
[~
I00 (a) Intlal conciltlons at e t a ~ of this cycle
I010i~|t0010/
IooI~!
(b) Shift the entire 8-bit result 1-bit to the left \1/
~ Ioo ~o/I,,oo! o! I0o 1 1 (c) Subtract divisor from the 4 MS bite of the result
[~I Ioo ! o l I o o ~9111111!11~ Ioo~ ~,/ (b) Set LS bit of result to logic 1
Figure 14-7: Fourth and final cycle of our 4-bit division test case
As before, we start by shifting the 8-bit result one bit to the left (Figure 14-7b) and subtracting the divisor from the 4 most-significant bits of the result (Figure 14-7c). Once again, the carry flag is left containing a logic O, which means
.,,,
....
,
Chapter 14 BinaryDivision
,
that the only thing we have to do is to force the least-significant bit of the result to a logic I (Figure 14-7d). When we divide 11 by 3, we expect a quotient (result) of 3 and a remainder of 2. Well tickle my toes with a turtle, just look at what's lurking in our 8-bit result. The most-significant 4 bits (which represent the remainder) contain 00102 (2 in decimal), while the least-significant 4 bits (which represent the quotient) contain 00112 (3 in decimal). Good grief, it works! (If you're feeling frisky with your newfound knowledge, try picking on a passerby and attempt to explain just how it works!) Unfortunately, our division algorithm will not always perform correctly if faced with negative values, so we have to perform similar tricks to those we used in the case of the signed multiplication subroutine we introduced in the previous chapter. That is, we need to check the signs of the numbers, change any negative values into their positive counterparts using two's complement techniques, perform the division, then correct the sign of the result if necessary.
Example
Subroutine
Assume that we're working with an 8-bit microprocessor, and that we wish to create a subroutine to perform signed binary division on 16-bit numbers. The following listing describes just such a subroutine, which retrieves two 16-bit signed numbers from the stack, divides one into the other, and places the 16-bit signed result on the top of the stack (note that this subroutine assumes any 16-bit numbers are stored with the most-significant byte "on top" of the least-significant byte). ################################################################### # Copyright(c) #
Maxfield
& Montrose
Interactive
Inc.,1996,1997.# #
# The a u t h o r s are not r e s p o n s i b l e for the c o n s e q u e n c e s of # u s i n g this software, no m a t t e r h o w awful, e v e n if t h e y # a r i s e from d e f e c t s in it. # ################################################################### # Name : #
_SDIVI 6
# Entry: # #
Top of s t a c k M o s t - s i g n i f i c a n t b y t e of r e t u r n a d d r e s s L e a s t - s i g n i f i c a n t b y t e of r e t u r n a d d r e s s
# Function: # #
Divides -32,767
# # # # # #
two 1 6 - b i t s i g n e d n u m b e r s (in the r a n g e # to + 3 2 , 7 6 7 ) : r e t u r n s a 1 6 - b i t s i g n e d r e s u l t # #
# # #
173
174
Designus Maximus Unleashed! # # # # #
MS LS MS LS
# Modifies" # #
Accumulator Index register
# Exit# # #
Byte Byte Byte Byte
of of of of
Ist ist 2nd 2nd
16-bit 16-bit 16-bit 16-bit
number number number number
(divisor) (divisor) (Dividend) (Dividend)
Top of s t a c k M o s t - s i g n i f i c a n t b y t e of r e s u l t L e a s t - s i g n i f i c a n t b y t e of r e s u l t
# SizeP r o g r a m = 226 b y t e s # Data = 9 bytes ###################################################################
_SDIVI 6 9
BLDX
POPA STA POPA STA POPA STA POPA STA
16
[_AH_RADD] [_AH_RADD+I] [_AH_DIV] [_AH_DIV+I]
# # # #
L o a d the i n d e x r e g i s t e r w i t h 1 6 , w h i c h e q u a l s the n u m b e r of times we w a n t to go a r o u n d the loop
# # # #
R e t r i e v e MS b y t e of r e t u r n a d d r e s s from s t a c k a n d store R e t r i e v e LS b y t e of r e t u r n a d d r e s s from s t a c k a n d store
# # # #
Retrieve from the Retrieve from the
it it
MS b y t e of the d i v i s o r s t a c k and store it LS b y t e of the d i v i s o r s t a c k a n d store it
# N o t e that the r e s u l t is 4 b y t e s in size (_AH_RES+0, +i, +2, # a n d +3), w h e r e _ A H _ R E S + 0 is the m o s t - s i g n i f i c a n t b y t e POPA # R e t r i e v e MS d i v i d e n d from stack STA [_AH_RES+2] # a n d s t o r e it in b y t e 2 of # result POPA # R e t r i e v e LS d i v i d e n d from s t a c k STA [_AH_RES+3] # and s t o r e it in b y t e 3 of # result LDA 0 # L o a d the a c c u m u l a t o r w i t h 0 and STA [_AH_RES] # s t o r e it in b y t e 0 of result STA [_AH_RES+I] # then in b y t e 1 of r e s u l t
Chap~r 14 B ~ a ~ Division ####
#### #### ####
Check that we're not trying t h e n it's an ERROR, so j u s t
_AH_TSTZ-
LDA OR
[_AH_DIV] [_AH_DIV+I]
JNZ
[_AH_TSTA]
PUSHA PUSHA JMP ####
#### ####
Invert
_AH_TSTA-
_AH_TSTB-
[_AH_RET]
input
values
# # # # # # # # # # #
to d i v i d e b y zero. If w e return zero and bomb out
L o a d M S b y t e of the d i v i s o r a n d O R it w i t h the LS b y t e of t h e divisor If the r e s u l t i s n ' t z e r o t h e n w e ' v e g o t at l e a s t o n e l o g i c i, so j u m p to the b i t to t e s t for -ve n u m b e r . O t h e r w i s e p u s h the z e r o in A C C o n t o the s t a c k twice, t h e n j u m p to the l a s t c h u n k of the r e t u r n routine
if n e c e s s a r y
LDA STA JNN
[_AH_DIV] [_AH_FLAG ] [_AH_TSTB ]
LDA SUB STA LDA SUBC STA
0 [_AH_DIV+I] [_AH_DIV+I] 0 [_AH_DIV] [_AH_DIV]
LDA XOR STA
are
and
load
the
output
flag
# # # # # # # # # #
L o a d A C C w i t h M S b y t e of d i v i s o r a n d s a v e it to the f l a g if the d i v i s o r is p o s i t i v e t h e n j u m p to ' _ A H _ T S T B ' , o t h e r w i s e . . l o a d the a c c u m u l a t o r w i t h 0 . .s u b t r a c t LS b y t e of d i v i s o r .. (no c a r r y in) s t o r e r e s u l t . . l o a d the a c c u m u l a t o r w i t h 0 . . s u b t r a c t MS b y t e of d i v i s o r . . (yes c a r r y in) s t o r e r e s u l t
[_AH_FLAG] [_AH_RES+2] [_AH_FLAG]
# # # #
L o a d the flag, X O R it w i t h MS b y t e dividend, then store again
LDA JNN
[_AH_RES+2] [_AH_LOOP]
LDA SUB STA LDA SUBC STA
0 [_AH_RES+3] [_AH_RES+3] 0 [_AH_RES+2] [_AH_RES+2]
# L o a d M S d i v i d e n d i n t o the A C C # If d i v i d e n d is p o s i t i v e t h e n # j u m p to ' _ A H _ L O O P , o t h e r w i s e # . . l o a d the a c c u m u l a t o r w i t h 0 # . . s u b t r a c t LS b y t e of d i v i d e n d # ..(no c a r r y in) s t o r e r e s u l t # . . l o a d the a c c u m u l a t o r w i t h 0 # . . s u b t r a c t MS b y t e of d i v i d e n d # . . ( y e s c a r r y in) s t o r e r e s u l t
of the
flag
175
176 Designus Maximus Unleashed! ####
#### H o l d ####
_AH_LOOP-
tight LDA SHL STA LDA ROLC STA LDA ROLC STA LDA ROLC STA
- this
is the start
[_AH_RES+3] [_AH_RES+3] [_AH_RES+2] [_AH_RES+2] [_AH_RES+I] [_AH_RES+I] [_AH_RES] [_AH_RES]
# # # # # # # # # # # #
of the m a i n
Load ACC dividend S t o r e it Load ACC dividend S t o r e it Load ACC remainder S t o r e it Load ACC remainder S t o r e it
division
loop
w i t h LS b y t e of and s h i f t left 1 bit w i t h MS b y t e of and r o t a t e left 1 bit w i t h LS b y t e of and r o t a t e left
1 bit
w i t h MS b y t e of and r o t a t e left
1 bit
# N o w we w a n t to s u b t r a c t the 1 6 - b i t d i v i s o r from # the m o s t - s i g n i f i c a n t two b y t e s of the r e s u l t LDA [_AH_RES+I] # L o a d A C C w i t h LS b y t e of # remainder SUB [_AH_DIV+I] # S u b t r a c t LS b y t e of d i v i s o r STA [_AH_RES+I] # S t o r e it in LS b y t e of # remainder LDA [_AH_RES] # L o a d A C C w i t h MS b y t e of # remainder SUBC [_AH_DIV] # S u b t r a c t MS b y t e of d i v i s o r # (w carry) STA [_AH_RES] # S t o r e it in MS b y t e of # remainder # If the c a r r y flag is zero, set the LS bit of the # r e s u l t to logic 1 and jump to the e n d of the loop. # O t h e r w i s e u n d o the h a r m w e ' v e just d o n e by a d d i n g # the 1 6 - b i t d i v i s o r b a c k into the MS two bytes of # the r e s u l t JNC [_AH_ADD] # If c a r r y flag not zero jump to LDA [_AH_RES+3] # to _ A H _ A D D , o t h e r w i s e load A C C # w i t h LS b y t e of result, OR $01 # use OR to set LS STA [_AH_RES+3] # bit to I, then s t o r e it and # jump to _ A H _ T S T L JMP [_AH_TSTL] # (test at end of the loop) _AH_ADD-
LDA
[_AH_RES+I]
# L o a d A C C w i t h LS b y t e # remainder
of
Chapter 14 Binary Division
_AH_TSTL:
ADD
[_AH_DIV+I]
STA
[_AH_RES+I]
LDA
[_AH_RES]
ADDC
[_AH_DIV]
STA
[_AH_RES]
DECX JNZ
[_AH_LOOP]
####
# # # # # # # # # #
A d d LS b y t e of d i v i s o r (w/o carry) Store it in LS b y t e of remainder L o a d A C C w i t h MS b y t e of remainder A d d MS b y t e of d i v i s o r (with carry) Store it in MS b y t e of remainder
# # # #
D e c r e m e n t the index register. If the index r e g i s t e r isn't 0 then jump b a c k to the b e g i n n i n g of the loop
#### B r e a t h e out - this is the end of the m a i n d i v i s i o n loop #### N o w check the flag and n e g a t e the q u o t i e n t p o r t i o n of the #### r e s u l t if n e c e s s a r y (see also the notes f o l l o w i n g the #### subroutine) ####
_AH_TSTC-
####
LDA JNN
[_AH_FLAG ] [_AH_SAVE ]
LDA SUB STA LDA SUBC STA
0 [_AH_RES+3] [_AH_RES+3] 0 [_AH_RES+2] [_AH_RES+2]
# # # # # # # # #
L o a d ACC w i t h the flag If MS bit of flag is 0 then jump to '_AH_SAVE', otherwise.. ..load the a c c u m u l a t o r w i t h 0 ..subtract LS b y t e of q u o t i e n t ..(no c a r r y in) store result ..load the a c c u m u l a t o r w i t h 0 ..subtract MS b y t e of q u o t i e n t ..(yes c a r r y in) store result
#### Save result on the s t a c k and bug out of here. #### that w e ' r e only r e t u r n i n g the 16-bit q u o t i e n t #### the result ####
_AH_SAVE-
LDA
[_AH_RES+3]
PUSHA LDA [_AH_RES+2] PUSHA _AH_RET-
LDA PUSHA
[_AH_RADD+I]
# # # # # #
Remember p o r t i o n of
L o a d ACC w i t h LS b y t e of q u o t i e n t and s t i c k it on the s t a c k L o a d ACC w i t h MS b y t e of q u o t i e n t and s t i c k it on the stack
# L o a d ACC w i t h LS b y t e of r e t u r n # a d d r e s s from temp l o c a t i o n and # s t i c k it b a c k on the s t a c k
177
178 Designus Maximus Unleashed! LDA
_AH_FLAG-
[_AH_RADD ]
PUSHA
# L o a d A C C w i t h MS b y t e of r e t u r n # address from temp location and # s t i c k it b a c k on the s t a c k
RTS
# That's
.B Y T E
_ A H _ R A D D : .2 B Y T E _ A H _ D IV-
.2 B Y T E
_AH_RES-
.4 B Y T E
# # # # # # # # # # # #
it,
exit
the
subroutine
R e s e r v e 1 - b y t e f i e l d to be u s e d as f l a g to d e c i d e w h e t h e r or not to n e g a t e the r e s u l t Reserve 2-byte temp location for the r e t u r n a d d r e s s Reserve 2-byte temp location for the d i v i s o r Reserve 4-byte temp location for the r e s u l t . T h e MS two b y t e s of w h i c h w i l l c o n t a i n the r e m a i n d e r a n d t h e LS two b y t e s the q u o t i e n t
Note that the 2-byte remainder from the division ends up in the most significant two bytes of the result ( . ~ RES and ._AH_RES+I). Thus, if we decided that we wanted our subroutine to return this remainder, all we would have to do would be to add two more pairs of LDA and PUSHA instructions in the _ . ~ _ S A V E section of the subroutine (just before we push the return address onto the stack). However, it's more usual to create a separate subroutine that just returns the remainder. Alternatively, another common technique would be to modify this subroutine to have multiple entry and exit points, depending on whether we wish it to return the remainder or the quotient. Both of these techniques save us from passing the remainder back and forth when we don't wish to use it. Also note that as we (in our wisdom) decided that we weren't concerned with returning the remainder in this subroutine, we've managed to sidestep a rather convoluted problem. This problem may be summarized as follows: "If you divide a positive number with a negative number (or a negative number with a positive number), then what sign should be associated with the remainder?" To put this another way, if we divide +3 into -17, we know that the quotient is going to be -5, but should the remainder be +2 or -2? In fact one can get completely bogged down on this subject (people devote entire books to this sort of thing). The bottom line is that if you decide that you do indeed want to return the remainder from a signed division, then it's up to you to stroll down to your local library, read as many books on computer mathematics as you can stomach, and then make
.
.
.
.
,.
.
.
.
.
.
.
.
.
.
.
.
.
,
,.
.,,
,, .
.
.
.
.
.
.
.
.
.
.
.
.
.
,,,,
Chapter 14 Binary Division .
.
.
.
,.,
,,,, .
.
.
.
,.
.
.
.
9our own decision as to what you want the sign of the remainder to be. (Yes, I am weaseling out of a tortuously tricky topic.) Note that the assembly language shown here doesn't correspond to any particular microprocessor, because it's one that the author and his accomplice (Alvin Brown) designed for the Beboputer T M Virtual Computer accompanying their book Bebop BYTES Back. The topic in this chapter was published in a condensed form under the title Divide and Conquer: Binary Division in the May 8th, 1997 issue of EDN (www.ednmag.com), and is reproduced in its original form here with their kind permission. For your further reading pleasure, this article was itself abstracted from the book: Bel~op BYTES Back (An Unconventional Guide to Computena), ISBN 0-9651934-0-3, with the kind permission of Doone Publications (www.doone.com) (see also the order form in the back of this book).
.
,
179
This Page Intentionally Left Blank
This Page Intentionally Left Blank
Chapter 15:
State Machines "Deus e x M a c h i n a " In this chapter you will discover: Deus ex M a c h i n a
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
184
W h a t A r e S t a t e M a c h i n e s ? ...............................................................................................184 M o o r e a n d M e a l y e t al ............................................................................................................185 O n e - H o t versus B i n a r y - E n c o d e d ..............................................................................186 S t a t e - p e r - B i t a n d B i t - p e r - S t a t e ....................................................................................188 R e g i s t e r e d O u t p u t M a c h i n e s ........................................................................................188 Initialization a n d U n d e f i n e d S t a t e s ......................................................................189 O n e - B i g a n d M a n y - S m a l l D e f i n i t i o n s ...............................................................191 N o n - H i e r a r c h i c a l versus H i e r a r c h i c a l ...............................................................191 It's a C r i m e t o W a s t e a C l o c k ......................................................................................192 P e e r - t o - P e e r I m p l e m e n t a t i o n s ....................................................................................194 H o w t o D e s i g n M a n y - S m a l l M a c h i n e s ...........................................................194 R e a l - W o r l d E x a m p l e s ....................................................................................................................195 S u m m a r y ............................................................................................................................................................ 197
184 DesignusMaximusUnleashed! Deus ex ~~china The phrase "Deus e x Machina" (literally "a god from a machine") originally referred to an actor playing a deity who was introduced at the end of convoluted Greek dramas to decide the final outcome. (The actor was lowered onto the stage using a mechanical hoist; hence the "machine" portion of the expression.) More recently, the term has come to mean a person (or thing) that appears suddenly and unexpectedly to provide a contrived solution to an apparently insoluble difficulty. Who among us would deny that this latter meaning is a particularly apposite I~l description of digital designers and their state machines? Let us be brutally honest; if you hope to one day achieve that pinnacle of perfection which is embodied in the form of a digital logic designer, then it won't be long before you run into state machines in one or more of their many incarnations. But turn that frown upside down into a smile, because state machines can provide endless hours of amusement (as long as you establish early on who is going to be in charge ~ you, or the machine). "But where are we going with this?" you cry plaintively (and somewhat pathetically I might add). Well, if you'll settle down and curb your enthusiasm, I'II tell you. The purpose of this discussion is to consider some of the considerations and trade-offs between designing a single large state machine (which we might refer to as "One-Big") versus a number of smaller state machines, which are subsequently merged into a harmonious whole ("Many-Small"). But first, ......
What ~re State ~achines? Before we consider the issue of One-Big versus Many-Small, it probably wouldn't hurt to ensure that we're all tap-dancing to the same set of bagpipes, not the least that l'm not afraid to create my own definitions should it prove to be necessary (although I'II attempt to give fair warning if I do). It should come as no great surprise (for our younger viewers) to learn that a state machine is a logical entity that, at its core, consists of a set of states. Possibly the two most important concepts associated with state machines are those of the current state and the next state. In the case of a synchronous state machine, the current state is stored in a set of registers known as state variables, while the next state is decoded from a combination of the current state and any external inputs (Figure 15-1). The state machine performs its function by sequencing from state to state, and the external inputs can cause the machine to switch paths between alternate sequences. (Some state machines don't require external data inputs as such; for IQuick, whip out your dictionary
Chapter 15 State Machines
example, a simple synchronous binary counter which follows a rigid, immutable sequence.)
Next-state decode
From external inpute (if any)
Next, State
State variables ~
Current State To output logic
clock
Figure 15-I- The core of a synchronous state machine
It is also worth noting that state machines can be created using both synchronous or asynchronous techniques. For the purposes of these discussions, however, we're going to focus on synchronous machines. This is not to imply that there's anything wrong with asynchronous implementations, which can be capriciously cunning and which have their own advantages (and disadvantages), but synchronous machines are currently more prevalent.
~oore and ~ealy
e t al
In the early digital computers dating from the 1940s and 1950s, control functions were often based on shift register techniques called timing chains. The individual registers in these timing chains were originally implemented using relays, then vacuum tubes, and ultimately from discrete transistors. Unfortunately, in those days of yore, digital techniques were not well-understood, so designers merrily did their own thing and often ended up "reinventing the wheel." In fact, it was not until the mid-1950s that D.A. Huffman and his contemporaries G.H. Mealy and E.E Moore formalized synchronous logic in general, and state machines in particular, into the generic forms that we know today. In fact, two classical forms of synchronous state machines quickly came to be referred to as Moore and Mealy machines. In the case of a Moore machine (Figure 15-2), the outputs from the machine are derived only from the current state of the machine. By comparison, the outputs from a Mealy machine (Figure 15-3) are derived from a combination of the current state of the machine and one or more input variables (which means that you can make the outputs wriggle and squirm between clock cycles).
185
186 Designus Maximus Unleashed! Next-state
Output decode
State variables
decode
From Inputs
To outputs clock Figure 15-2: The Moore machine's outputs are derived only from the current state
Next-state decode From
inpute
State va rla bles
i i Outpu= decode
._._Jl.:
F'/II: / . . . .
99
clock
Figure 15-3: The Mealy machine's outputs are derived from the current state and one or more inputs
O n e - H o t versus
in
ry-Encoded
At some stage during the course of designing a state machine, it is necessary to assign the various states to the state variables, a process that is a rich source of pratfalls for the unwary. However, underlying the complexity, there are two fundamental assignment architectures that are commonly referred to as One-Hot and Binary-Encoded. 121For example, consider a state machine comprising five unique states called SO through $ 4 implemented using these two techniques (Figure 15-4). 2Additional coding techniques are introduced in the book/-/DL Chip Design (A Practical Guide for
Designing, Synthesizing, and Simulating ASICs and FPGAs using VI-IDL or Verilog) by Doug
Smith, ISBN 0-9651934-3-8, Doone Publications (www.doone.com).
Chapter 15 State Machines 1 8 7 State Variables
State Variables
oooo~-~ o
o
o o~ o~
o o
~ o
o o o
,oooo Remaining 27
o
)~I~ ii~i F~~I
N
Iooo-~ o
o
~
o
~
~
o ~ o , oo
combinations are unused
Remaining 3 combinations are unused
(a) One-hot
(b) Binary-encoded
l~il~
i~l
I,,~~~,:1~
N
Figure 15-4: There are a number of techniques for assigning states to state variables
In the case of a one-hot architecture, each state in the machine is described by its own unique state-variable bit. In a simple (non-hierarchical) machine, only a single state can be active at any particular time, so the one-hot architecture refers to the fact that only a single state variable is "on," or "hot," at that time. By comparison, in a binary-encoded architecture, each state in the machine is described using a unique binary code in the state variables. As usual there are pros and cons to both techniques. The combinational logic associated with a one-hot architecture is typically narrower, shallower, and faster, while the equivalent logic for a binary-encoded implementation is generally wider, deeper, and slower. However, the one-hot approach can require one heck of a lot of registers and undefined states are a major problem. Let's consider a machine with 1,000 states (if you're new to the game then this will probably make your eyes water, but it's certainly not unheard of). To implement this machine using a one-hot technique would require 1,000 registers, while a binary-encoded approach would only require 10. Of course, design decisions are often influenced by outside factors such as the type of device being used. For example, PLDs are relatively rich in combinational logic and poor in registers, thereby hinting at binary-encoded solutions; FPGAs are richer in registers, thereby tempting us with one-hot implementations; and ASICs are both logic and register rich, thereby allowing us to do what we jolly well please.
188 Designus Maximus Unleashed! State-per-
it r
it-per-State
Other appellations seem to be springing up from the nether regions, often from FPGA companies who tend to coin new terms as though they're going out of style. For example, the designations state-per-bit and bit-per-state, which try to differentiate themselves from one-hot in the case of hierarchical state machines. However, I personally don't understand any of the convoluted reasonings that l've been exposed to, and nobody else I know understands the distinctions, so I'm going to ignore state-per-bit and bit-per-state completely (you can do this sort of thing if you're precocious or eccentric). The problem is that so many terms which purport to be engineering concepts are actually invented by "the back room marketing boys" in a pitiful attempt to differentiate their products from those of their competitors. So if you aren't careful, you recklessly shell out your hard-earned lucre to purchase a "New and Revolutionary/" tool or product, only to find that you've just acquired something that's identical to three similarly useless articles that have been gathering dust in your office for the last 18 months. But, much like Forest Gump, 131"That's all l'm going to say about that."
Registered-Output
fachine$
As is invariably the case in electronics design, there are numerous ways to solve almost any problem. A variant (some may say "subset") of the Moore machine is to omit the output decoder logic completely, and to cause the state variable registers to also serve as the outputs (Figure 15-5).
Next-state decode
State variables
From
inpute ,,.,, ,~.,,,,
clock i~ii~ iili 84 ....
Figure 15-5: In registered output machines the state variables directly drive the outputs
Chapter 15 State Machines 1 8 9 In some cases this occurs naturally (as in the synchronous binary counter), while in others you have to work at it. To a large extent this form of design is conceived using a "back-to-front" approach: 1) Commence with an initial number of state variable registers equal to the number of outputs. 2) Using a state table (or equivalent), determine when you want your outputs (state variables) to be logic 0 or logic 1. 3) Start generating the next-state logic. 4) Add additional state variables (which aren't connected to outputs) as it becomes necessary to uniquely identify each state. 5) Iterate steps (3) and (4) until it works, which sounds easy if you say it quickly, but it can get quite involved. On the down-side, registered output machines tend to require more combinational logic for the next-state decoder than the other architectures. In fact, this type of implementation is prevalent in PLD realizations which are register poor internally, but which are combinationally rich and have registered outputs. However, this style is not limited to PLD implementations, because it has certain advantages in its own right, not the least that the outputs become available in the quickest possible time following the clock's active edge, thereby making this an optimal implementation for high-performance synchronous designs.
~ n i t i a l i z a t i o n a n d I/Indefined S t a t e s It is generally necessaw to provide some way of initializing or resetting your state machine, and, due to its larger number of registers, this initialization will require more intensive routing resources for a one-hot architecture. (This point may be somewhat negated if you're using certain FPGAs, which allow registers to be assigned a default state on power-up). Of more concern is the possibility of some random occurrence, such as a glitch in your power supply, which can throw the machine into an undefined state. If this occurs, then the machine may eventually recover (for example, by leaping to an arbitrary point in the main sequence), or it may hang up completely. The consequences of either of these scenarios obviously depend on the application; there's a world of difference between a toy car failing to respond to its young owner's commands and a nuclear power station locking up and pumping radioactive effluent into a river. 3For those young engineers 100 years in the future who are gazing at this tome in wonder and delight: "Forest Gump" was the title character in a film of the same name that was pretty big news in the mid-1990s.
190 Designus Maximus Unleashed! One solution is to ensure that there are no undefined states; that is, any state not being used is programmed to act in a well-behaved manner and return to a specified entry point in the main sequence. Some designers would regard this as over-kill, while others believe that it's the only way to fly. Once again, the end application may dictate the outcome. The point is that if you do intend to define every state, then this is easier to achieve with a binary-encoded machine than with a one-hot architecture. Consider the simple examples shown in Figure 15-4, each of which involved only five states. In the binary-encoded scenario, this only leaves three undefined states to be accounted for, but the five state variables in the one-hot implementation provide thirty-two possible states, of which twentyseven are undefined. Thus, using one-hot, the undefined state problem can quickly become unmanageable (try contemplating the number of potentially undefined states in a 1,000 state one-hot architecture!). In fact the problem of handling undefined states causes many designers to steer away from one-hot implementations completely. On the other hand, one advantage of the one-hot approach is that state assignment is not overly taxing, while binary-encoded state assignment can bring you to you knees. In the example in Figure 15-4, there are 6,720 different ways in which the five state variables can be assigned to the eight possible states (8 x 7 x 6 x 5 x 4), each of which can influence the number of literals and product terms required to implement both the next-state and the output-decode logic. Now although this 6,720 value ignores reflections and mirror images (in order to make it sound more impressive), finding the optimal solution remains daunting to say the least. The reason for this is that binary-encoded state variable assignment is an "NP" problem (where NP stands for "Non-deterministic Polynomial"). In fact this is similar to the legendary t r a v e l i n g s a l e s m a n problem, whereby a salesman has to determine the shortest possible route between a number of randomly located cities while only visiting each city once. Although the "traveling salesman" appellation may seem humorous, this problem is extremely relevant in many cases; for example, calculating the shortest (and therefore fastest) path that can be taken by the drill head when drilling vias in a circuit board. Some problems are well behaved, in that the time taken to solve them is directly related to the number of variables involved. By comparison, in the case of a traveling salesman-type problem, the solution space explodes as the number of variables are increased. However, there are acceptable techniques for solving these problems (even on paper) and there are software tools to help. While the resulting solutions can not be guaranteed to be optimal, they can be guaranteed to be "no w o r s e t h a n ... ". Although a theoretician might be left wanting more, engineers have to be artful in gauging "good enough."
Chapter 15 State Machines 1 9 1
One-~ig a n d ~ l a n y - S m a l l Definitions Now, where were we? Good grief, we set out to talk about "One-Big" versus "Many-Small," and we haven't even started yet. Fundamentally, there is a spectrum of state machine design, ranging from a single large machine to a number of smaller machines that are interlocked together. Although the one-big approach may be suitable for many problems, it has to deal with all of the input variables; its clock rate throughout has to be sufficient to handle the fastest situation it has to deal with; and you can end up clocking the heck out of a substantial amount of logic that's not actually doing anything at the time (thereby consuming unnecessary power). At the other end of the spectrum are multiple, simple machines, but integrating these small machines together can prove to be a pain in the rear-end. However, each of these sub-machines can be created with a tightly focused number of states, each requires a minimal amount of combinational logic, and each can be potentially operated at the lowest clock rate that is sufficient to its purposes. Unfortunately, it is not sufficient to simply say "many-small" and leave it at that. As is usually the case, I find that there is insufficient established terminology for me to present the problem to my satisfaction. (Alternatively, the terminology may in fact exist, but nobody bothered to keep me informed.) So I will take my usual course of action in these situations and merrily define my own nomenclature as follows: a) b) c) d) e)
One big (non-hierarchical) sequential Many-small (hierarchical) sequential Many-small (hierarchical) concurrent Many-small (peer-to-peer) sequential Many-small (peer-to-peer) concurrent
But l'm not proud (I can't afford to be :-), so once we've discussed these different cases, feel free to email me at [email protected] if you can come up with any alternative designations that you believe to be more appropriate.
JVon-Hierarchical v e r s u s Hierarchical Let us commence this portion of our discussions by saying that, for many applications, there's nothing wrong with a one-big approach. Consider a very linear and sequential problem such as a simple coin operated machine, where nothing is happening in parallel and at every stage the decision is just to do this or do that. In this case, there's no inherent disadvantage to a one big approach and there's no obligation to partition the machine per se. On the other hand, if the machine contains a reasonably large number of states, then partitioning the design into a many-small implementation may convey certain advantages.
192 Designus Maximus Unleashed! If it doesn't cost too much, the act of dividing any problem into well-defined subunits almost invariably results in a net advantage, not least in terms of allowing one to relatively quickly and easily specify, understand, analyze, test, and subsequently modify the individual units. Of these points, perhaps analysis and test are the most important, because it may be computationally impractical to comprehensively analyze a large convoluted problem, while a number of small, well-bounded problems are each amenable to exhaustive analysis. This is, of course, why software programmers tend to employ a number of small functions as opposed to a single large "blob" of code. Similarly, even in the case of a very linear and sequential state machine, it may be productive to partition the design into a number of smaller sub-machines. Apart from anything else, this strategy allows the chief designer to mastermind the design at the conceptual level and to partition the problem, leaving the "junior woodchucks" to frolic around implementing the sub-machines. The classical way to partition a very linear and sequential state machine is to divide the problem into logically self-sufficient units, and to then create an executive (master) machine to control them. A common technique is for the executive to have state variables of the one-hot persuasion, 141each of which activates a particular sub-machine. When activated, the sub-machine locks its associated variable in the executive, which is only released when the sub-machine signals its exit condition. This form of implementation, which we may call "many-small (hierarchical) sequential," is relatively easy to implement, and assuming that all of the machines are running from a common clock (and at a common frequency), interlocking the various machines is comparatively simple. Unfortunately, in the real world, you may be obliged to partition a machine to map it into a number of physical devices. As a general rule of thumb, it's relatively easy to partition your machine into logically self-sufficient units, but it can be an absolute pain to create artificial partitions due to device constraints (for example, if you're using PLDs). If the fates are with you, the logical partitioning will also satisfy the device partitioning; if not, you can end up replicating states in multiple devices, which muddies the waters completely; and which can cause even trivial future modifications to grow into a nightmare, because a simple change may need to be replicated in multiple locations.
~Tt's a e r i m e
to Waste a elock
A key qualifier in the many-small (hierarchical) sequential discussion above was "where nothing is happening in parallel." Ah, if only life were always so simple (but if electronics was easy then everyone would be doing it). There's an old 41t's also possible for the master to be binary-encoded if necessary.
Chapter 15 State Machines
designer's maxim that "It's a cr/me to waste a clock", which can occur if a totally sequential approach is employed. To provide a real world example, I rooted out a friend, Preston jett (a man who, off the top of his head, can tell you the rate at which the switching threshold of the early PMOS transistors deteriorated with use over time). Preston responded by burying me under a mountain of design notes for a mid-1980s memory controller which he implemented using a many-small (hierarchical) concurrent style. In this case, the controller contained four sub-machines: address latch, read, write, and refresh (Figure 15-6).
[~i!!ii i i i i i i i i i i i i ii~lN~y!i iN~fii !i i!i i i!i!i i i i i i !!!!ii i l
I
I
Figure 15-6: Memory controller implemented in a many-small (hierarchical) concurrent style In this particular example, the address latch also served as the executive machine, because it was necessary to latch an address as soon as it appeared, before that address wandered off again to whence it came. One of the reasons that a concurrent implementation was employed was that, even if a refresh was taking place, it was necessary to keep on latching any addresses that came the machine's way. Additionally, this machine also used an "overlapping" approach, in that the write sub-machine flagged that it had completed its task while it was still tidying things up before it had actually finished. This allowed other portions of the machine to start to strut their funky stuff, thereby allowing machine as a whole to save clock cycles. Preston noted that the hardest facet of using independent machines in this style was the complexity of the interlocking, because certain states in each machine needed to keep a watchful eye on states in the other machines. Also that designing concurrently like this is always harder to do, because: "There's more to think about and more to go wrong." However, the advantage of using sub-machines was the simplicity of the machines themselves: there were no
193
194 DesignusMaximusUnleashed! wasted states, and all of the states were doing something or were there for a reason. Had the machine been implemented using a one-big approach, then it would have still needed to constantly monitor for addresses that needed to be latched. This would have resulted in multiple branches, with each branch doing multiple things, in which case managing the branching quickly grows to be as complex as interlocking the smaller machines. Also, unless you jump through hoops (a skill honed to perfection by most digital designers), applying the one-big technique to this type of problem can easily result in latency (and wasted clock cycles), because the machine has to finish processing one thing before it can go back and check something else. Finally, due to the fact that it was to be implemented using PLDs, this particular design employed registered-output machines. But Preston noted that if he were to repeat this design using a single device (such as an ASIC) that was rich in registers and combinational logic, he would probably continue to use exactly the same many-small, registered-output techniques because of the effectiveness of the final result.
Peer-to-Peer flmplementations In this context, l'm using the term peer-to-peer to refer to a generic set of many-small implementations in which the individual sub-machines aren't particularly close friends. By this I mean to imply that there is no hierarchical view containing a high-level executive machine, but rather a collection of individual machines that communicate with each other using semaphores (waving digital flags at each other). In the case of a peer-to-peer sequential machine, only one machine is active at any particular time and, on completion, it passes the baton to another machine of its choosing. By comparison, the peer-to-peer concurrent approach refers to a collection of autonomous machines which are running concurrently, passing information between themselves as required. As usual, in the real world, an peer-to-peer machine will typically contain a mixed bag of sequential and concurrent sub-machines and inter-machine relationships.
H o w to Design M a n y - S m a l l ~ a c h i n e $ Unfortunately, this is going to be short, sharp, sweet, and anti-climatic, because it involves very few specific points and a large dollop of common sense (why do we call it "common" when it's so rare?). The key advise offered by the experts is to always consider the datapath first, because it's the data that is important (what kind, how wide, how often sourced, how often delivered, how processed, and so forth). Ultimately, it's the data
Chapter 15 State Machines
manipulation that determines the requirements of the system controller, and it is only after the system controller's requirements have been determined (from the datapath design) that the controller's architecture should be addressed. Those designers who make early assumptions about the controller's architecture and implementation, but who neglect to take the time to fully understand the data processing requirements, will almost invariably discover that the proposed controller architecture is not equal to the task. Unfortunately, the process of fully understanding and documenting the data requirements requires patience and discipline, which is in short supply with some designers who are eager to "get on with the job." The problem is only exacerbated when dealing with customers who can be excruciatingly vague about their requirements; but if they can't define what they want, how can you possibly make any assumptions about what you're going to deliver? So, it's as simple as that (he said with a wry grin and with his fingers crossed behind his back): 1) Understand the data processing requirements. 2) Document the data processing requirements. 3) Evaluate alternative controller architectures. 4) Select and document the optimal controller architecture. 5) Implement the controller. 6) Prepare to receive applause and accolades from your peers and a pleasant surprise in your pay packet (we can all dream).
i~eal-World Examples It sometimes happens that the author of a technical piece rambles on and on about some pet theory and blinds himself with his own rhetoric ("No," you cry, "tell me this isn't so"). To avoid falling into this trap, I determined to ask some designers in the trenches about any many-small implementations they may have done. Design Analysis Associates (Logan, Utah, USA), kindly volunteered to search their design database, and pulled out the following examples of some of their smaller designs that are currently working in the field, each of which illustrates specific problems and solutions.
E x a m p l e I" ~ a n y - $ m a l l , S y n c h r o n o u s x .3 This design involved a complete DRAM memory subsystem interfacing to a non-trivial system bus. The controller solution was to have three interlocked synchronous machines. Machine # 1 is a very small machine which receives and
195
196 Designus Maximus Unleashed! holds memory access requests, and sends output to (and receives acknowledgment from) machine #2. Machine #2 receives access requests and also the type of access required (DMA, system process read, and suchlike), and initializes the bus datapaths. Machine #2 also communicates high-level information to machine #3, which actually implements all the RAS and CAS timing and the read and write accesses into the memory. Machine #3 also implements the DRAM refresh and the error detection and correction control. Machine # 1 is small, machine #2 is bigger, and machine #3 is the biggest. A major point here is that the "total" machine was partitioned along natural lines, and that each of the sub-machines employs the minimum logic required to do the job. For example, once the initial access request is dealt with, the state machine resources that handled it are released and fall dormant.
Example 2: ~lang-Smail~ S g n c h r o n o u s x 2 a n d ~ s g n c h r o n o u s x I In this case, the design involved three sub-machines to interface a system processor to an asynchronous system bus. Machine # 1 is a synchronous state machine which creates the timing environment for the VLSI processor (based on the processor's clock). Machine #2 is a second synchronous state machine which manages the memory resources, accommodating read/write accesses from the processor and DMA accesses from the system bus. Machine #3 is a fully asynchronous state machine, which implements the asynchronous protocol for the system bus and communicates with machines # 1 and #2 on the board. The point here is that a given controller design may naturally partition down into multiple machines, and that the linked machines can be both synchronous and asynchronous. Example 3: ~lang-SmaU, S g n c h r o n o u s x 3 a n d D i f f e r e n t ~requencies This design featured three sub-machines implementing a bus-to-bus interface, which was required to connect an internal, processor-specific bus to an external 100 megabyte-per-second synchronous system bus. Again, the three machines are all synchronous, but one of the machines has a non-integer clock-frequency relationship to the frequency of the other machines. Machine # 1 uses a 33 MHz clock which is synchronous with the system processor and which talks to machine #2. Machine #2, a rather large machine containing 512 states, is clocked at 25 MHz and talks to machine #3. Finally, machine #3, which implements the synchronous system-bus protocol, is clocked at 12.5 MHz. This example illustrates a case of linking sub-machines across multiple timing domains.
Chapter 15 State Machines
Example ~: ~ l a n y - $ m a l l , S y n c h r o n o u s x 2 a n d P i p e l i n e d This final case involved interfacing a multiprocessor internal synchronous system bus to an external asynchronous system bus. The resulting design involved two machines: the smaller Machine # 1 deals with the control of the internal synchronous multiprocessor bus, while the larger Machine #2 deals with the external asynchronous system bus. Although it is not apparent in this brief summary, the programming environment for Machine #2 had to take account of the fact that the output registers caused a pipelining effect for the control system. This required a special effort to ensure that the pipelining effects did not violate the timing requirements of the system bus.
Summary It is not the intention of the author to recommend one implementation over another, because every design presents its own unique problems which require equally unique solutions. Ultimately, any particular implementation depends on a combination of the target application and the engineering virtuosity of the designer. State machines can provide endless hours of amusement and they can also provide many opportunities for creative expression. However, your creativity should not involve making your design depend on some "corner" behavior of a circuit or device; the focus of your creativity should be in designing the most minimalistic and efficient (in terms of time, power, states, wires, and so forth) total machine which implements the customer's requirements. The topic in this chapter was published in a condensed form under the title Deue ex Machina: State Machines: One-Big or Many-Small?, in the November 9th, 1995 issue of EDN (www.ednmag.com), and is reproduced in ire original form here with their kind permission. For your further reading pleasure, an excellent guide to the implementation of state machines in VHDL or Verilog is HDL Chip Design (A Practical Guide for Deeigning, Syntheeizing, and Simulating ASICe and FPGAe ueing VHDL or Verilog) by Doug Smith, ISBN 0-9651934-3-8, Doone Publicatione (www.doone.com, call 1-800-311-3753).
197
This Page Intentionally Left Blank
Chapter 16:
AsynchronousDesign "Asynchronous is like a box of chocolates In this chapter you will discover: Creators of Ingenious C o n t r i v a n c e s ......................................................................... 200 W h a t is A s y n c h r o n o u s Logic? ............................................................................................... 200 W h a t Did A s y n c h r o n o u s Ever Do For You? ................................................... 201 We C o u l d H a v e Been Heroes, But N o - O n e T a u g h t Us H o w ................................................................................................. 203 A s y n c h r o n o u s C o n t r o l Circuits ............................................................................................ 204 Races, Hazards, a n d Oscillations ................................................................................... 205 Classical S y n c h r o n o u s D a t a - P r o c e s s i n g ............................................................. 207 T u n e d - R a c e s a n d W a v e - P i p e l i n i n g T e c h n i q u e s ................................... 210 Islands of Logic ............................................................................................................................................... 211 A s y n c h r o n o u s Microprocessors? ..................................................................................... 214 Real-World Examples ............................................................................................................................ 216 S u m m a r y .................................................................................................................................................................... 217
Creators of Ingenious Contrivances
As fate would have it, I was taking a break from riding the unicycle (as you do), and polishing up my knowledge on medieval war machines (as you do), when I came across an interesting nugget of trivia I thought I might share: the origin of the word "engineer." The trebuchet (a honking big catapult), which was invented in China between 500 and 300 BC, was a machine that could hurl heavy missiles with great force. As it turns out, a common word for trebuchet in the West was "engine," which came from the Latin ingenium, meaning "an ingenious contrivance." Similarly, the people who designed and built these devices were called ingeniators, which, over time, mutated into "engineers." Who would disagree that naming our honorable profession after "the creators of ingenious contrivances" is exceedingly appropriate? Of course, the doubting Thomases and Thomasinas among us may say that this has absolutely nothing whatsoever to do with the subject of asynchronous logic, but I have no fear that I can furnish a suitably cunning link before we reach the end of this chapter.
What is Asynchronous Logic?
Digital designers often have a knee-jerk response that equates "asynchronous" with "complex," "dangerous," "unmanageable," or just plain "bad," so it's surprising how many seem to find it difficult to define exactly what asynchronous means. First of all, any particular digital logic function may be categorized as being either combinational (otherwise called combinatorial) or sequential. In the case of a combinational function, the logic values at that function's outputs are directly related to the current combination of values on its inputs. By comparison, in the case of a sequential function, the logic values on that function's outputs depend not only on its current input values, but also on previous input values; that is, the output values depend on a sequence of input values. For example, consider two alternative configurations for two 2-input NAND gates (Figure 16-1).
Figure 16-1: Combinational versus sequential logic ((a) combinational; (b) sequential)
The differentiator between these circuits is feedback; that is, in the case of a sequential topology, one or more of the outputs from the function are fed back to act as inputs to that function. In this particular example, the sequential function is obviously that of a common RS-latch whose inputs are level-sensitive. As we are all aware, it is also possible to extend the feedback concept to create more sophisticated sequential functions such as D-type flip-flops, in which any data presented to the data input is loaded into the function by an appropriate transition on the edge-sensitive clock input. In the case of a D-type flip-flop, any changes on the data input have to be synchronized to edges on the clock, so this function is said to be synchronous. By comparison, in the case of an RS-latch, the function immediately responds whenever the inputs change. Because the inputs to the latch aren't obliged to be synchronized to any other signals, this function is said to be asynchronous (literally "not synchronous"). Thus, as a set of rock-bottom, ground-zero definitions, we can say that a synchronous circuit is formed from sequential logic based on clocked memory elements, while an asynchronous circuit is formed from sequential logic which does not employ any clocked memory elements. For this reason, asynchronous circuits are also referred to as self-timed or timing-independent circuits. In a (probably futile) effort to prevent brick-bats from being hurled my way, I should point out that some would regard purely combinational functions as being asynchronous, in that their outputs respond directly to changes on their inputs and they aren't clocked. Ultimately, this depends on one's conceptual vocabulary, but, for the nonce, I intend to stand by my definition that to be considered asynchronous a circuit has to contain feedback.
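By way of a quick illustration, here's a minimal C sketch (my own, not from the original article) that models the cross-coupled NAND flavor of the RS-latch; the feedback is mimicked by iterating the two gate equations until the outputs stop changing:

#include <stdio.h>

static int q = 0, q_n = 1;             /* the two fed-back outputs */

/* Apply the active-low set/reset inputs, then settle the feedback loop */
static void rs_latch(int s_n, int r_n)
{
    for (;;) {
        int new_q   = !(s_n && q_n);   /* NAND(s_n, q_n) */
        int new_q_n = !(r_n && q);     /* NAND(r_n, q)   */
        if (new_q == q && new_q_n == q_n)
            break;                     /* outputs stable: the loop has settled */
        q = new_q;
        q_n = new_q_n;
    }
}

int main(void)
{
    rs_latch(0, 1); printf("set:   q = %d\n", q);   /* q -> 1 */
    rs_latch(1, 1); printf("hold:  q = %d\n", q);   /* q stays 1 */
    rs_latch(1, 0); printf("reset: q = %d\n", q);   /* q -> 0 */
    rs_latch(1, 1); printf("hold:  q = %d\n", q);   /* q stays 0 */
    return 0;
}

Note that the two "hold" calls return different values depending on what happened beforehand; that history-dependence is precisely what makes the function sequential rather than combinational.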
What Did Asynchronous Ever Do For You?
In Monty Python's film "The Life of Brian," there was a sequence in which the revolutionary leader was haranguing his followers as to the results of the Roman occupation. The conversation went something like this (L = Leader, F = Follower): L: "What did the Romans ever do for you?"
F: "They built the roads." L: "Well, obviously they built the roads, you can take the roads for granted,
but, apart from the roads, what did the Romans ever do for you?" F: "Education?" L: "OK, I'II give you that one, but apart from the roads and education, what did the Romans ever do for you?"
F: "Well, they did introduce the public baths." L: "All right, all right, but apart from ..... " And so it goes. Similarly, if one was eavesdropping on a discussion between a synchronous die-hard and an asynchronous hero, one might perhaps hear something along the following lines: S: "What did asynchronous ever do for you?" A: "It requires fewer transistors." S: "Well, obviously it requires fewer transistors, you can take fewer transistors for granted, but, apart from fewer transistors, what did asynchronous ever do for you?" A: "Lower power consumption?" S: "OK, I'II give you that one, but apart from fewer transistors and lower power consumption, what did asynchronous ever do for you?" A: "Well, it does offer. ..... " And, once again, so it goes. But if asynchronous logic does indeed convey certain advantages in certain situations, then why don't we make more use of it? Well as usual there are pros and cons, and the issues can become somewhat convoluted. One of the more obvious considerations is that unlike synchronous logic which can typically be viewed as a series of sequential actions, asynchronous logic generally has to be viewed in a concurrent manner, and we tend to find it harder to conceptualize multiple things occurring simultaneously as opposed to an ordered sequence of events. (OK, I know that all digital logic operates in a concurrent manner, the point here is that we don't worry about concurrency in synchronous systems in the same way that we do in asynchronous systems.) It is also true that asynchronous circuits can exhibit a variety of nasty effects, and that sufficiently powerful design tools for these circuits are somewhat thin on the ground. However, possibly one of the most significant factors is the traditional lack of educational emphasis on asynchronous design techniques, which means that designers don't use these techniques, which curtails the demand for appropriate design tools, which means that EDA vendors don't invest in developing such tools, which .... and round the circle we go.
We Could Have Been Heroes, But No-One Taught Us How
In the early days of electronics and computing, around the 1940s and 1950s, asynchronous design was much more prevalent than it is today. This was primarily due to the fact that designers were working at the level of individual switches implemented as vacuum tubes or transistors (there weren't any digital integrated circuits at that time). In those days of yore, components were significantly more expensive (in relative terms) than they are today; in fact, at one time designers were cheaper than transistors! The introduction of digital integrated circuits resulted in the fact that many of today's practicing designers grew up using the 7474 D-type flip-flop or its later derivatives as the register (sometimes synchronizer) of choice. By some strange quirk of fate, the emergence of devices such as these did more than simply gather a lot of logic gates into the same package; they also allowed us to bundle up the majority of nasty asynchronous issues into the package and to subsequently forget about them. The end result is that, assuming that designers observe the various data-to-clock setup and hold relationships associated with these functions, the functions themselves can be treated as synchronous devices, while their internal asynchronous machinations can remain hidden ("out-of-sight, out-of-mind" as the saying goes). As we now know, the infrastructure that subsequently grew up around this approach became extremely pervasive and successful, and the majority of designers are now "living on the edge" (pun intended). A telling point would be to ask how many of today's designers could actually design a 7474. Be warned that this is a non-trivial task, because the device itself is essentially a totally asynchronous sequential circuit. A somewhat related point is that asynchronous synthesis techniques (as described in hardware textbooks) all produce different designs, sometimes radically so. If you were to collect samples of even a simple binary toggle from a variety of hardware textbooks, you would end up with a pile of substantially different solutions. Also, these text-book examples of binary toggles almost never have as few gates as you would obtain by starting out with a data book schematic of a 7474, connecting the complementary Q output back into the data input, and ripping out the unused inputs (there's a sketch of this trick at the end of this section). The old electronics books used to cover asynchronous design in depth, but authors of more recent books generally seem to have decided that asynchronous designs are "not today's methodology." Similarly, college courses primarily concentrate on synchronous techniques, while asynchronous approaches are, to a large extent, glossed over. For example, when introducing the concept of state machines, very few text books or college courses put forward a balanced view of
sequential machines in the form of sequential synchronous and sequential asynchronous. Instead, they concentrate almost exclusively on sequential synchronous and slap down D-type flip-flops as though they were going out of style, but they rarely address what's actually inside the D-type flip-flop to any level of detail. In fact, the D-type flip-flop contains multiple interconnected loops of un-clocked feedback. (Note that actual 7474 implementations don't exactly follow the data-book's gate-level schematics, but the discussion points we're making here are valid nonetheless.) If these devices were presented in the right way, it's possible that the scales would fall from the students' eyes, and they would realize that the whole world (of electronics) hangs together in a cohesive way.
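As for the data-book toggle trick promised above, here's what it boils down to at the behavioral level; a minimal C sketch of my own, in which the complementary output wired back to the data input makes the flip-flop divide its clock by two:

#include <stdio.h>

int main(void)
{
    int q = 0;                               /* the flip-flop's Q output */
    for (int edge = 1; edge <= 8; edge++) {
        int d = !q;                          /* ~Q wired back to the D input */
        q = d;                               /* active clock edge loads D into Q */
        printf("edge %d: q = %d\n", edge, q);
    }
    return 0;                                /* q toggles 1,0,1,0,... which is to
                                                say the clock is divided by two */
}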
Asynchronous Control Circuits
There are two major application areas for digital electronic circuits: either as control logic or for data processing and manipulation (or, not uncommonly, as a mixture of both). Thus, one possible use for asynchronous logic is to implement a control structure as an asynchronous state machine. An obvious example of where an asynchronous machine would be preferred to its synchronous counterpart is when interfacing to an asynchronous system bus (as was discussed in the Real-World Examples section of the previous chapter). To illustrate the gross differences between the two approaches, first consider a traditional synchronous state machine (Figure 16-2).

Figure 16-2: The core of a traditional synchronous state machine
Hopefully, this synchronous representation will be reasonably familiar. The values on the input variables are combined with the current state from the state variable registers to generate the next state values, which, in turn, will be loaded into the state variable registers on the next active edge of the clock. The main
requirement on the designer is to ensure that the next-state data satisfies the setup and hold requirements of the registers. By comparison, an asynchronous state machine doesn't have a clock or registers (Figure 16-3), but instead it acts as its own memory. (When we refer to state variables in this type of machine, we are actually referring to the outputs from the next-state decode logic, which are fed back to act as internal inputs.)
Figure 16-3: The core of an asynchronous state machine
Irrespective of the fact that the asynchronous machine doesn't have registers per se, it is still a state machine in every sense. A change on the inputs is combined with the current state to generate the next state values, and the machine quickly comes to adopt a new, stable state. One advantage of the asynchronous implementation is that it will immediately respond to any changes presented to its inputs (unlike the synchronous machine, which is obliged to wait for the next active clock edge). However, although the asynchronous machine appears to be the simpler of the two diagrammatically, in reality it presents a number of timing-related effects that need to be accounted for, such as races, hazards, and oscillations.
Races, Hazards, and Oscillations
One of the more common pieces of advice offered in texts on asynchronous state machine design is to: "Ensure that only a single input can change at a time, and only allow a single output to change at a time." In reality, the world tends to not give a hoot about unit state difference, so you typically have to deal with the fact that your inputs are going to change when they want and, unless you bend over backwards, your outputs are going to change as the logic demands.
If you can achieve the ideal case, in which each next-state code differs from the current state by only a single bit (that is, only a single state variable is ever unstable at any particular time), then your cup runneth over and your state machine is blessed. However, life is rarely so accommodating, at least on the planet that the author is wont to call home. If more than one feedback variable is transitioning at any time, then a race condition is said to exist. For example, assume that our asynchronous machine has three state variables, and consider a change at the inputs that causes these variables to change from 011 to 000. Due to unequal delays in the circuit, the variables could actually pass through two alternative intermediate states on their way: 010 or 001. Depending on whether or not it matters if the machine passes through these intermediate states, the race condition would be classed as a "critical race" or a "non-critical race," respectively. In the case of a critical race, the intermediate states could potentially cause the machine to hang up, to generate unwanted output on a temporary basis, or to set the machine off on a completely different path. In a worst-case scenario, the machine could go into oscillation, wreaking untold havoc on the rest of the system (this is generally not considered to be a good thing to happen). However, there are several well-documented techniques for converting critical races into their non-critical counterparts, including (a) adding extra intermediate states, (b) duplicating existing states, and (c) providing phantom states that also transition to the destination state (for example, in the case of 011 to 000 as discussed above, ensuring that states 010 and 001 are both designed to make an unconditional transfer into state 000).

In addition to the fact that delays inside the circuit can cause the machine to pass through intermediate states, they can also cause unwanted spikes or glitches, which are known as hazards. These hazards can circulate around the loop to appear on the fed-back inputs to the machine, which can be detrimental to that machine's functioning correctly, to say the least. Thus, we need to ensure that hazards don't occur, but first we need to understand where they come from in the first place. Consider the equations resulting from a standard Karnaugh-map minimization of an example function (Figure 16-4). Assume that these gates are buried in the middle of the asynchronous state machine's next-state decode logic. As we see, if inputs a and b both carry logic 1 values, then a falling transition on input c can cause a hazard on output y. (In this particular example the hazard would be referred to as a static hazard, because it only involves a single glitch on the output. By comparison, a dynamic hazard would have caused multiple glitches.) The solution is to add an extra term into the Karnaugh map (and thus an extra gate into the circuit) to eliminate the hazard (Figure 16-5).
y = (a & ~c) | (b & c)

Figure 16-4: Simple function which generates a hazard
y = (a & ~c) | (b & c) | (a & b)

Figure 16-5: Modifying the function to eliminate the hazard

It therefore becomes more difficult to generate the combinational logic for an asynchronous state machine than for a synchronous implementation, because every transition function must be created so as to be hazard-free. Unfortunately, although state assignment programs specifically intended for asynchronous machines typically take hazards into account, the majority of general-purpose logic synthesis tools do not support the generation of hazard-free logic. (Note that the BOOL logic synthesis utility discussed in Chapter 2 and provided on the CD-ROM accompanying this book will generate hazard-free logic if requested to do so.)
Classical Synchronous Data-Processing
As was previously noted, digital electronic circuits may be employed as control logic or for data processing and manipulation applications (or as a mixture of both). A classical scenario for a synchronous data-processing implementation (a chain of register banks separated by blocks of combinational logic, all driven by a common clock) is illustrated in Figure 16-6.
Figure 16-6: Classical synchronous data-processing scenario

The beauty of this generic model, which is the one that's predominantly taught at school, is its inherent simplicity resulting from the partitioning of the combinational logic into well-defined functions. All of the registers are driven (or derived) from a common clock, where each active clock edge causes the current results from a portion of combinational logic to be stored into its downstream registers, and new data to be presented to the combinational logic from its upstream registers. In addition to the fact that it works, the main advantages of synchronous design are that it's (relatively) easy to understand, it partitions the problem into well-defined chunks, and it can employ well-understood techniques. Synthesis technology can be used to generate the logic, and both simulation and static timing analysis tools can be used for verification purposes. Synchronous designs are also amenable to automatic test pattern generation (ATPG) and to automatic testing and fault isolation. Furthermore, synchronous designers are relatively unconcerned about generating hazard-free logic in the combinational portions of their designs; their only requirement is that the combinational outputs are stable (and observe any setup and hold specifications) around the next active clock edge. On the other hand, the disadvantages of this approach are speed, power consumption, and the difficulty of getting the clock to go where and when it's needed. First, the clock period has to be set such that it provides for the slowest portion of combinational logic. There are a variety of clever tricks and cunning ploys that can be used to alleviate this, but these often exhibit more than a hint of (dare we say it?) asynchronous behavior. Also, operations such as addition and multiplication can take an arbitrary amount of time depending on the data being processed, but the clock can only account for the worst-case scenario. In addition to accommodating the longest combinational delay path, the clock period must be long enough to ensure that the design will operate under worst-case operating
conditions. That is, the clock period must account for delays over the entire temperature range and manufacturing tolerance of the design's constituent parts, and it cannot take advantage of the fact that the actual delays will probably be far shorter than these worst-case delays. In addition to the physical problems of routing the clock (both at the integrated circuit and circuit board levels), the clock signal propagates throughout the system at a finite speed, resulting in clock skew as the signal arrives at different points on the track at different times. Although there are a number of techniques for distributing clocks around a system, there will always be some amount of skew. (In the case of deep-submicron chip design, the minimum clock skew can easily equal several gate delays, even when the clocks are distributed.) Ultimately, the maximum clock skew between different portions of the system must be added to the worst-case delay path to arrive at the minimum clock period (a numeric sketch of this arithmetic appears at the end of this section), which means that the larger the system becomes, the slower the clock can run. To mitigate this, some systems employ local clocks on a subsystem basis, but this then obliges the subsystems to synchronize, which, in addition to hairy timing issues, leads to an average 1/2 clock delay before a subsystem "sees" data from its adjacent upstream subsystem, plus a second clock cycle for the subsystem to respond. Another disadvantage of the synchronous approach is its power consumption, particularly in the case of VLSI and ULSI integrated circuits. These circuits employ CMOS technology, whose static power dissipation (when it's not switching) is negligible, but whose dynamic power dissipation (when it is switching) can be horrendous when you're dealing with millions of gates. The problem with a synchronous system is that every register is potentially switching on every active clock edge, irrespective of whether or not that register (and its associated downstream combinational logic) is actually doing anything. A related point is that systems such as microprocessors contain a large amount of functional latency, in that only a relatively small portion of the logic is typically being used for a real purpose during any given clock cycle, but even the unused functions continue to burn up power when all of the registers are being stimulated on every clock. Of course, this all goes some way to explain why today's high-end processor chips can dissipate tens of watts of raw power! Having said all this, you should not assume that the purpose of this article is to convince you that asynchronous techniques are the light at the end of the tunnel and that they are going to lead us forth into a new era, because they're not. Also, you should not assume that this article is attacking synchronous techniques unnecessarily, because all of the above problems are well-known and well-understood aspects of synchronous designs. As we noted at the beginning of this section, one of the most important aspects of synchronous designs and
methodologies is that they work, which is why chip vendors can roll out new microprocessor architectures as quickly as they do.
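And the clock-period arithmetic promised above really does amount to nothing more than an addition; here's a back-of-the-envelope sketch in C, in which every number is invented purely for illustration:

#include <stdio.h>

int main(void)
{
    double t_clk_to_q = 2.0;   /* ns, register clock-to-output delay (hypothetical) */
    double t_comb_max = 18.0;  /* ns, slowest combinational path, worst case        */
    double t_setup    = 1.5;   /* ns, register setup requirement                    */
    double t_skew     = 2.5;   /* ns, worst-case clock skew across the system       */

    /* the clock period must cover all four, added end to end */
    double t_period = t_clk_to_q + t_comb_max + t_setup + t_skew;
    printf("minimum clock period = %.1f ns (maximum frequency = %.1f MHz)\n",
           t_period, 1000.0 / t_period);
    return 0;
}

Note how the skew term is pure overhead: it buys no computation whatsoever, yet it grows with the physical size of the system.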
Tuned-Races and Wave-Pipelining Techniques
Before we consider some aspects of asynchronous data-processing designs, perhaps a word of advice is in order. One of the biggest problems with asynchronous design today is its controversial nature. In the "good old days," the ability to get a lot done with a small amount of asynchronous logic was considered to be a mark of skill. By comparison, in many of today's electronics environments, even mentioning the word "asynchronous" can brand you as being immature or dangerous. Some design houses (and designers) take the attitude: "Synchronous was good enough for my manager, it was good enough for my team leader, and by golly it's good enough for me!" The end result is that even designs that would obviously lend themselves to an asynchronous implementation are often hammered, battered, and squeezed into a synchronous mold. So, if you work in this sort of environment, and if you ever feel the urge to solve a problem using an asynchronous solution, then for goodness sake refer to it by another name such as "pseudo-synchronous." Asynchronous techniques started to be formally investigated in the 1950s and 1960s, when components were expensive and circuits were too slow. One of the pioneers of asynchronous design was David E. Muller, who devised a delay-insensitive logic gate called a C-element that produced an output only when all of its inputs were the same. Eventually, the fundamental concepts of asynchronous design were formalized by S. H. Unger in 1969 in his book Asynchronous Sequential Switching Circuits (John Wiley & Sons, Inc.). Over time we have come to recognize a number of different asynchronous techniques, one of the more daring being known as matched delays or tuned races. The premise here is that, if you know your delays and if you know the degree to which these delays can be matched, then you can build regular arrays of circuits which take advantage of these timing details, and the resulting computation can be made to go much faster. (Note that this is not asynchronous control per se; it's just that the process implicitly embeds control in the propagation delays of the data pathways.) An example of this technique is found in wave pipelining, which relies on the inherent propagation delays of combinational circuits to store data. Consider the synchronous structure that we previously discussed in Figure 16-6, and assume that one of the blocks of combinational logic represents a multiplier array. The traditional way in which this portion of the system would operate
would be for its upstream registers to present new data to the inputs of the array, for this data to eventually propagate through the array, and for the next clock edge to store the results from the array into its downstream registers. However, data can take a significant time to propagate through a multiplier array, which could well be the slowest portion of combinational logic in the design. Under this traditional scenario, the clock period must be extended to accommodate the delays through the array, resulting in the whole system being slowed to the speed of the array. Assume for the purposes of this discussion that the maximum delay through the array is 50 ns. The majority of classical synchronous designers would be predominantly concerned with this maximum delay, but they will typically be relatively unconcerned as to what the minimum delay turns out to be. Now assume that, by tuning the delays, it were possible to guarantee that the minimum delay were 40 ns, thereby resulting in a 10 ns window. To a large extent, the wave pipelining technique is based on this "window of opportunity." As usual, data is presented to the inputs of the array from its upstream registers, which initiates a wave of activity propagating through the array. However, as soon as we can guarantee not to interfere with the current wave of data, the upstream registers can be caused to present new data to the array, thereby casting off a new wave. Using this mechanism, multiple waves of data can be propagating through the array simultaneously, separated only by time. In our example, the waves could potentially be separated by 10 ns, allowing up to five waves to be present in the array at any particular time. Assuming that the multiplier array is at least five times slower than any of the other portions of combinational logic in the system, and ignoring any side-issues which would only muddy the waters, the end result of the wave pipelining approach in this example is that the system's clock speed can be increased by a factor of 5x. Unfortunately, although tuned-race techniques such as wave pipelining are exciting, they do require extremely precise process knowledge and extensive engineering and simulation. Some work is being performed on timing-driven silicon compilation techniques, and there have been some notable real-world successes in commercial products, but it will be quite some time before these techniques become widely available to mainstream designers.
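For what it's worth, the wave-pipelining arithmetic from this section reduces to a couple of lines; here's a quick sketch using the 50 ns and 40 ns figures quoted above:

#include <stdio.h>

int main(void)
{
    double t_max = 50.0;   /* ns, slowest path through the array            */
    double t_min = 40.0;   /* ns, guaranteed fastest path after tuning      */

    double window = t_max - t_min;          /* safe wave separation: 10 ns  */
    int    waves  = (int)(t_max / window);  /* waves in flight at once: 5   */

    printf("launch a new wave every %.0f ns; up to %d waves in the array\n",
           window, waves);
    return 0;
}

Notice that the speed-up depends on how tightly the minimum delay can be guaranteed: let the minimum sag toward zero and the window collapses to the classical one-wave-at-a-time case.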
Islands of Logic
Perhaps a more generic asynchronous technique is to divide circuits into "islands of logic," which communicate by handshaking using asynchronous links (Figure 16-7). When a process receives a flag saying "I've got something for you" from its upstream counterpart, it performs the necessary operations on the data
being presented to its input variables, transfers the results to its output variables, and also passes its own "I've got something for you" flag along to the next process downstream. In addition to the "I've got something for you" flag coming in, each process also passes back an "I'm ready for more" flag to its upstream companion.
Figure 16-7: Generic asynchronous data-processing scenario

This use of local completion detectors and hand-shaking between fully asynchronous blocks can be extremely robust, because proper operation of the system is not dependent on detailed process-specific delay information. Of course, nothing's simple as usual, and there are at least two main techniques for this style of asynchronous system. One is to have special control logic which senses when operations are completed based on information embedded in the circuit path, but this form of control tends to slow the circuit down. Another approach is to use two wires for each bit of information. In one flavor of this approach, one of the wires is used to represent the data bit while the other is used to carry the control/timing information. In another flavor, the four different binary combinations that can be carried by the two wires are used to reflect both logic values and control status; for example, 01 = True, 10 = False, while 00 and 11 = Not Ready (see the sketch at the end of this section). In both of these flavors the control typically does not slow the data, but it does require more silicon.

The main interest in asynchronous data processing is its potential for creating faster systems that use less power. Using the asynchronous techniques described here, each subsystem is almost completely independent, and the only timing constraints are those at the interfaces between the subsystems. The point is that, as long as the design can implicitly guarantee the integrity of the signals at the subsystem interfaces, then the system as a whole will always run at its maximum possible operating frequency for its current operating conditions and manufacturing tolerances. In regards to power usage, the asynchronous system deals with each operation on an "as-needed" basis, which means that an asynchronous system conceptually halts when calculations are complete and starts up again immediately when new data appears.
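As promised, here's the two-wire encoding pinned down in a few lines of C (a sketch of my own; the type and function names are invented):

#include <stdio.h>

typedef enum { NOT_READY, DATA_FALSE, DATA_TRUE } rail_state;

/* Decode one dual-rail bit: 01 = True, 10 = False, 00 and 11 = Not Ready */
rail_state decode(int wire1, int wire0)
{
    if (wire1 == 0 && wire0 == 1) return DATA_TRUE;
    if (wire1 == 1 && wire0 == 0) return DATA_FALSE;
    return NOT_READY;              /* no valid data on the link yet */
}

int main(void)
{
    printf("%d %d %d %d\n",
           decode(0, 1),           /* DATA_TRUE  */
           decode(1, 0),           /* DATA_FALSE */
           decode(0, 0),           /* NOT_READY  */
           decode(1, 1));          /* NOT_READY  */
    return 0;
}

The appealing property is that validity rides along with the data itself: a receiver simply waits until the pair of wires leaves the Not Ready codes, with no clock required to tell it when to look.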
Of course, synchronous designers would point to the push for "Green PCs," in which serious power savings have been obtained from a collusion between the hardware and the software. In fact these designers have done amazing things, such as slowing or even stopping the system clock when the processing falls idle. However, they usually can't do it on an operation-by-operation basis, which is an inherent facet of an asynchronous system. Additionally, when the processor is running in a synchronous system, every function is "woken up" and consumes power on every clock cycle, irrespective of whether or not that function is actually being used.(2) By comparison, in an asynchronous system, the only portions of the logic that are active (switching) at any particular time are the ones which are actually doing something at that time. Another benefit of this style of asynchronous system is the ease of future design modification. As opposed to synchronous designs, in which it can be difficult to incrementally improve the speed of the system as a whole, the asynchronous approach supports future modification of selected bottleneck portions; that is, individual modules may be upgraded or replaced, and all of the other portions automatically speed up. If we did have the ability to create fully asynchronous designs at the level of microprocessors, then the ability to evolve such processors one section at a time confers obvious (and major) benefits. However, having sung the asynchronous song "in four-part harmony and stuff like that," we should also note that, in addition to their inherent complexity, one of the biggest disadvantages of designing asynchronous systems is the lack of widely-available, product-quality design and verification tools. For example, logic synthesis tends to fall flat on its face when dealing with sophisticated asynchronous designs, and static timing analysis isn't much better. In the case of synchronous designs, it is generally only necessary to know the slowest path (the longest delay on any path between registers), but in asynchronous designs it is necessary to know both the slowest and fastest paths, and usually every combination in-between for every related path; that is, the designer must worry about signals arriving too early as well as too late. Static timing tends to be good at detecting worst-case paths (maximum delays), but it's not quite so focused toward detecting "best-case" paths (minimum delays). Additionally, along with a number of other tools, static timing analysis usually requires the designer to break asynchronous loops, which can, in turn, introduce other problems.

(2) Note that really advanced synchronous designs such as Intel's Pentium Pro and Pentium II processors do have a reasonably high level of ability to "turn off" logic that's not being used on an instruction-by-instruction basis (but they still burn approximately 40 watts of power).
On the brighter side, digital logic simulators are generally well-equipped for verifying asynchronous designs, because they tend to function in the same way as does the logic. Digital logic simulators can verify minimum and maximum delay paths, while their dynamic timing verifier cousins can verify all of the minimum-maximum delay combinations. The relative advantages and disadvantages of static timing analysis versus dynamic timing verification often provokes intense debate, but it is generally accepted that dynamic timing verification holds the high ground in the case of verifying asynchronous designs. In fact the only real concern with regards to the simulation of asynchronous logic is in the initialization of the circuit and the clearing out of unknown X values. However, this does not tend to be a major problem, because the majority of simulators provide a variety of automatic and designer-driven techniques for converting these unknown values into good logic 0s and logic 1s (see also Chapter 10).
Asynchronous Microprocessors?
The first completely asynchronous microprocessor was created at the California Institute of Technology (Caltech) in 1989. Although this device was nowhere near as complex as a modern high-powered synchronous processor, it was a substantial enough machine that it started attracting attention from the "big boys." Subsequently, a team at the University of Manchester, England, constructed an asynchronous version of the ARM-6 chip (the original synchronous version is the processor which is used in Apple's personal organizer, the Newton), and other teams around the world are proceeding apace. Proponents of asynchronous systems often say that, theoretically, systems at the complexity level of microprocessors could gain a 2x to 3x speed advantage if designed using asynchronous techniques, and that they would potentially only use 20% of the power required by their synchronous counterparts. To repudiate these claims, the synchronous guys are constantly coming out with improvements and, pointing toward the maturity of their tools and methodologies, would claim that synchronous designs will always be better. Comparing the sub-processes in an asynchronous design to runners in a relay race, it now appears as though the bulk of the speed savings in the individual processes (comparable to the individual runners sprinting) are canceled out by the effort required to stop the processes stomping all over each other at the system level (when they pass the baton). However, the energy savings may be real, because a number of asynchronous designs have been realized for relatively complex functions that do in fact use only 20% of the power required by their synchronous equivalents.
So could we design an asynchronous equivalent of a Pentium II-level microprocessor today? Unfortunately, the answer is probably no. There are several reasons for this, not the least of which is that it would require a large design team, and the majority of designers aren't conversant in asynchronous techniques to a high enough degree. Also, the tools and design methodologies aren't there, so designers prefer to use synchronous techniques if at all possible because they are well-understood and safer. Having said this, the team at Caltech now boast a complete asynchronous programming environment which supports formal description and formal verification. The resulting "correct by transformation" logic is delivered with all the necessary handshaking logic and guards in place. The Caltech team are currently working on a Silicon Graphics MIPS-compatible processor, and their tools seem poised to move out of the research environment. However, it is also true that billions of dollars have been invested in synchronous tools and training, and that such humongous stakes are involved in generating a next-generation microprocessor that, not unreasonably, it is difficult to find someone who will risk a project by basing it on a totally different technology. Having said this, it is widely known that Sun fellow and Turing laureate, Ivan E. Sutherland, was working on a project to evaluate the feasibility of creating an asynchronous version of Sun's SPARC chip. In fact, the rumor on the streets is that Ivan's team is hoping to come up with a next-generation asynchronous UltraSPARC. So does this swing the balance back in favor of commercial asynchronous microprocessors? After all, if Ivan (and therefore Sun) do achieve their goal, could the other microprocessor houses afford to miss the boat? The answer (and you can quote me on this in years to come) is that ... no-one knows, which is, after all, what makes electronics so interesting (Good grief, I love this industry!). What is certain is that asynchronous tools and techniques are becoming available, and that there's a growing resurgence in asynchronous digital design. What seems to be happening is that microprocessor design houses are quietly replacing targeted portions of their systems with asynchronous units. For example, HaL Computers replaced their processor's division unit with an asynchronous equivalent which is said to be 4x as fast, and Hewlett-Packard added an asynchronous floating point multiplier into their 100-MHz RISC processor. Other microprocessor vendors seem to be taking the same tack, which is not to attempt to plunge all the way into asynchronous designs, but rather to replace relatively small portions at a time. In reality, these hybrid solutions are probably the way the world will go, and we will see more and more complex digital systems, including microprocessors, that are based on asynchronous sub-units residing in a synchronous framework. Perhaps the last word on the subject comes from a designer who is equally at home in the synchronous and asynchronous domains,
who told me: "As clock frequencies continue to increase, there will come a time where there's no such thing as classical synchronous design any more."
Real-World Examples
As for the previous chapter, I approached the boys in the trenches at Design Analysis Associates (Logan, Utah, USA) for some real-world examples. As they were so helpful in the case of my state machine questions, I decided to press my luck and ask about their experiences with asynchronous designs. Strangely, I thought that I could hear someone sobbing in the background when they realized that it was me again (doubtless some problem with the phone line). Anyway, as usual, they came through like troopers with the following:
Example #1: Asynchronous Controller Linking Two Timing Domains
We have executed too many of this type of design to mention. The general characteristics of this problem are: the need to link between two timing domains, the need to move data between the two timing domains (control buffer directions and drive states, control latches/registers), and the need to pass control (request, acknowledgment) between the two domains.
Example #2: Asynchronous FIFO (First-In First-Out) Buffer Controller
FIFOs have been used for many years in digital systems design, well before "commodity" single-chip FIFOs became available. A number of asynchronous FIFO controllers have been designed using fast static RAMs for FIFO storage. Our experience teaches us that it's highly desirable to be able to design the asynchronous FIFO controller as a distinct control element, separate from: (a) the system on the input side of the FIFO, (b) the system on the output side of the FIFO, and (c) the control mechanism which might already be associated with the FIFO's memory. It very seldom occurs that these three subsystems, each characterized by their own signal transitions and timing constraints, accidentally work well (much less optimally) with each other.
Example #3: Phase-Locked Loop Circuit for Clock and Data Recovery
PLL circuits are needed anywhere that timing is embedded in a serial bit stream. In this case, there is no separate clock line and the clock has to be derived completely from the signal transitions in the incoming information stream. (Note that PLLs are also being used for clock multiplication and distribution within and between VLSI chips in high performance computing systems.)
The key asynchronous sequential controller that is almost universally evident in all PLLs is the phase and/or frequency detector circuit. It is this circuit's task to compare the phase (frequency) of the incoming signal with the signal generated by the local clock. The requirements of this circuit are to be extremely fast (in comparison to the derived clock frequency), and to introduce no spurious influences into the clock recovery system. Many designs have been executed for clock and data recovery in high-speed magnetic disk storage systems and in video transmission systems. In all three of these areas (and there are many more), we've learned that it is essential to be able to view the requirements of each asynchronous controller design from a fresh perspective. Once the requirements are understood, it may be possible to make use of a previous design. However, trying to modify an existing design before the new requirements are formally articulated is never done.
Summary
As usual, the purpose of these discussions is not to cause you to run around in ever-decreasing circles shouting "Don't panic, don't panic!", or to drop whatever it is that you're doing to adopt an entirely new design methodology. However, you should not always assume that a synchronous approach will provide the optimal solution to any problem, and you should steer clear of trying to force a "round" asynchronous problem into a "square" synchronous hole. So when you're evaluating your next design, look at it carefully to see if perhaps a small portion of it could benefit from an asynchronous approach. And finally, we return to trebuchets and their cunning link to asynchronous logic that was promised in the introduction. A trebuchet was based on a long beam with a sling at the end. One end of the sling was attached directly to the beam, while the other was hooked over a metal prong stuck in the beam. When the beam was rotated and had reached the correct point of arc, the end of the sling slipped off the prong and released the projectile. Obviously, correct adjustment of the prong was critical in order to release the sling at exactly the correct time. The last time a trebuchet was used in anger was at the siege of Mexico City in 1521. Due to the fact that they were running out of ammunition, Don Cortes instructed his men to build a trebuchet. But Cortes' men were inexperienced with this weapon, and when they used the beast for the first time the missile flew straight up and then returned on the path from whence it came, smashing the trebuchet to pieces in the process. So, the moral of this story is that designing asynchronous logic is just like using a trebuchet ... it's all a matter of timing really!
The topic in this chapter was published in a condensed form under the title To be or not to be Asynchronous; that is the question, in the December 7th, 1995 issue of EDN (www.ednmag.com), and is reproduced in its original form here with their kind permission.
Chapter 17:
Linear Feedback Shift Registers
"The ouroboros of the digital consciousness"
In this chapter you will discover:
Hold That Couch Carl!
Many-to-One Implementations
More Taps Than You Can Swing a Stick At
One-to-Many Implementations
Seeding an LFSR
Using LFSRs in FIFOs
Modifying LFSRs to Sequence 2ⁿ Values
Accessing the Previous Value
A Wealth of Applications
Last But Not Least
Hold That Couch Carl!
The Ouroboros - a symbol of a serpent or dragon devouring its own tail and thereby forming a circle - has been employed by a variety of ancient cultures around the world to depict eternity or renewal. (Not to be confused with the Amphisbaena, a serpent in classical mythology having a head at each end and capable of moving in either direction; not unlike someone occupying a management position.) In fact, due to its widespread emergence, the great psychoanalyst Carl Jung speculated that the Ouroboros could be part of the collective consciousness. In the rarefied realm of the digital designer, one obvious equivalent to the Ouroboros would be a Linear Feedback Shift Register (LFSR), in which the output from a standard shift register is cunningly manipulated and fed back into its input in such a way as to cause the function to endlessly cycle through a sequence of patterns. In many ways LFSRs combine a pleasing simplicity with a seductive appeal and "to know them is to love them" (it's interesting to speculate what Jung would have had to say about this ... "Hold that couch Carl, I'm on my way!"). As it happens, comparing an LFSR to the Ouroboros is particularly apt for another reason, because, in much the same way that the Ouroboros symbolizes renewal, the subject of LFSRs keeps on rearing its head again, and again, and again ... Strangely enough, although LFSRs are simple to construct and are useful for a wide variety of applications, they are often sadly neglected by designers. So in addition to tweaking the things you already know but which have faded away into the nooks and crannies of your mind, this discussion also proffers a few interesting twists on LFSRs for you to peruse and ponder.
Many-to-One Implementations
One of the more common forms of LFSR is formed from a simple shift register with feedback from two or more points, known as taps, in the register chain (Figure 17-1).
Figure 17-1: LFSR with XOR feedback path ((a) symbol; (b) implementation)
All of the register elements share a common clock, which is omitted from the symbol for reasons of clarity. The taps in this example are at bit 2 and bit 0, which can be referenced as [2,0]. The data input to this type of LFSR is generated by XOR-ing (or XNOR-ing) the tap bits, while the remaining bits function as a standard shift register. In fact, the reason LFSRs are so named is due to their XOR or XNOR feedback functions, which are referred to as linear operators. The sequence of states generated by the LFSR is determined by both its feedback function (XOR versus XNOR) and tap selection. For example, consider the sequence followed by an XOR-based LFSR with taps at [2,0] loaded with an initial value of binary 100 (Figure 17-2).
Figure 17-2: An n-bit maximal-length LFSR sequences through 2ⁿ - 1 states

This LFSR is said to be of maximal-length, because it sequences through every possible state (excluding the state where all of the bits are 0) before returning to its initial value. A binary field with n bits can assume 2ⁿ unique values, but an unmodified maximal-length LFSR with n register bits will only sequence through (2ⁿ - 1) states. LFSRs with XOR feedback paths will not sequence through the state where all the bits are 0, while their XNOR counterparts will not sequence through the state where all the bits are 1 (Figure 17-3). Additionally, an unmodified LFSR cannot be permitted to enter into its prohibited state (all 0s for XOR feedback paths or all 1s for XNOR feedback paths) during random initialization on power-up, because there is no way for the function to exit that state without some external assistance in the form of additional circuitry.
(a) XOR-based LFSR (excludes the all-0s state); (b) XNOR-based LFSR (excludes the all-1s state)
Figure 17-3: XOR versus XNOR feedback paths
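If you'd like to watch the sequence of Figure 17-2 crawl by for yourself, here's a minimal C sketch of my own (using the shift-left convention of the program shown later in this chapter) for the 3-bit, taps-[2,0], XOR-based LFSR:

#include <stdio.h>

int main(void)
{
    unsigned lfsr = 0x4;   /* seed = binary 100 */
    for (int i = 0; i < 8; i++) {
        printf("%u%u%u\n", (lfsr >> 2) & 1, (lfsr >> 1) & 1, lfsr & 1);
        unsigned feedback = ((lfsr >> 2) ^ lfsr) & 1;   /* XOR of taps 2 and 0  */
        lfsr = ((lfsr << 1) & 0x7) | feedback;          /* shift left, insert   */
    }
    return 0;   /* prints 100, 001, 011, 111, 110, 101, 010, and back to 100:
                   all seven non-zero states, then home again */
}

The exact ordering of the states depends on conventions such as the shift direction, so a figure drawn the other way around will yield a different (but equally maximal) sequence.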
More Taps Than You Can Swing a Stick At
By some strange quirk of fate, each LFSR supports a multitude of tap combinations that will generate maximal-length sequences; the problem is weeding out the ones that do from the ones that don't (badly chosen taps can result in the register entering a loop comprising a limited number of states). Another, more insidious, problem is the fact that documented tap combinations often contain errors. In the days of yore, before I became the cynic that I am today, I contacted a publisher requesting permission to reference such combinations from one of their books, but the cheeky little rascals responded by asking me to send them some currency of the realm. As an alternative to parting with my hard-earned lucre, I penned a simple program to ascertain valid tap combinations. As luck would have it, it turned out that the publisher had performed me a service, because I discovered that approximately 5% of the values in their book were incorrect! So just because something is written down in black and white, one can't assume that it's
necessarily correct, but we digress .... Using my program I extracted all of the possible tap combinations for maximal-length LFSRs with 2 to 32 bits. The program ran for several weeks and generated humongous quantities of alternative tap combinations, samples of which are presented in Table 17-1.
Table 17-1: Taps for maximal-length LFSRs with 2 to 32 bits (* annotations indicate sequences whose length is a prime number)
The taps are identical for both XOR-based and XNOR-based LFSRs, although the resulting sequence will, of course, differ. Also, as was noted above, multiple alternative tap combinations may also yield maximal-length LFSRs. For example, in the case of a 10-bit LFSR, there are two 2-tap combinations that result in a maximal-length sequence ([2,9] and [6,9]), along with a further twenty 4-tap combinations, twenty-eight 6-tap combinations, and ten 8-tap combinations that also satisfy the maximal-length criteria. However, once again, note that the actual sequence of states will vary depending on the tap selections that are used. My original program was a little long-winded because it was testing every conceivable tap combination. Even worse, this program disappeared into the nether regions when my hard disk threw a wobbly, but a simple example for a 12-bit XOR-based LFSR using the taps from Table 17-1 could be as follows:

#include <stdio.h>

/* Note there's a pre-calculated lookup table called "lookup" not shown here */
extern int lookup[4096];

int main(void)
{
                          /*  Bit 11                  Bit 0 */
                          /*    v                       v   */
    int lfsr = 1;         /* Initial value = 0000 0000 0001 */
    int mask = 2089;      /* Mask value    = 1000 0010 1001 */
    int index;            /* Used to store the intermediate result
                             and index into a pre-calculated table */

    while (1)                            /* Loop around generating values */
    {
        index = lfsr & mask;             /* Get the values on the taps */
        lfsr <<= 1;                      /* Shift the LFSR left one bit */
        lfsr |= lookup[index];           /* Insert serial data (see notes below) */
        lfsr &= 0xFFF;                   /* Discard the bit shifted out of bit 11 */
        printf("value is %d\n", lfsr);   /* Display the output */
    }                                    /* Return to the beginning of the loop */
}
The only tricky point is the line of code above the printf statement. After combining the contents of the lfsr with the mask in the first line of the while loop, the value of index contains between zero and four bits set to 1. We want the XOR of these 1s, and the quickest way to do this in terms of execution speed (and the simplest to describe) is to use the index variable as an index into a pre-calculated array of integers (not shown here) called lookup[], which contains values of 0 for even numbers of 1s in the index and values of 1 (equivalent to a '1' in the lfsr's bit 0 position) for odd numbers of 1s in the index. Whichever value is returned from the lookup table is OR'ed into lfsr to emulate the serial data input.
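The pre-calculated table itself isn't shown in the listing above, but it's easy enough to build; here's a sketch of one way it might be done (the 4096-entry size is my own choice, picked to cover every possible 12-bit index value):

#include <stdio.h>

#define TABLE_SIZE 4096            /* one entry per possible 12-bit index */
int lookup[TABLE_SIZE];

/* Fill lookup[i] with the parity of i: 1 for an odd number of 1s, 0 for even */
void build_lookup(void)
{
    for (int i = 0; i < TABLE_SIZE; i++) {
        int parity = 0;
        for (int bits = i; bits != 0; bits >>= 1)
            parity ^= (bits & 1);  /* XOR together all the 1s in the index */
        lookup[i] = parity;
    }
}

A function along these lines would be called once at start-up, before entering the while loop of the main program.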
One-to-Many Implementations
Consider the case of an 8-bit LFSR, for which the minimum number of taps that will generate a maximal-length sequence is four. In the real world, XOR gates only have two inputs, so a four-input XOR function has to be created using three XOR gates arranged as two levels of logic. Even in those cases where an LFSR does support a minimum of two taps, you may actually wish to use a greater number of taps such as eight (which would result in three levels of XOR logic). One LFSR application for which you may wish to use more taps is in the generation of Cyclic Redundancy Codes (CRCs) (these codes are discussed in more detail later in this chapter). In this case, the taps are selected such that an error in a single data bit will cause the maximum possible disruption to the final contents of the register (these final contents are referred to as the checksum value in the case of CRC applications). Thus, in addition to their sequences being of maximal-length, these LFSRs may also be qualified as being maximal-
displacement. The problem is that increasing the levels of logic in the combinational feedback path can impact the maximum clocking frequency of the function. One solution is to transpose the many-to-one implementations discussed above into their one-to-many counterparts. The easiest way to explain this is by means of an illustration, but where are we going to find such a beast at this time of the night ... Ah, I do believe that I can see one racing towards us as we speak (Figure 17-4).
(a) Many-to-one implementation; (b) One-to-many implementation
Figure 17-4: Many-to-one versus one-to-many implementations

The traditional many-to-one implementation for the eight-bit LFSR has taps at [7,3,2,1]. To convert this into its one-to-many counterpart, the most-significant tap (which is always the most-significant bit) is fed back directly into the least
To convert this into its one-to-many counterpart, the most-significant tap (which is always the most-significant bit) is fed back directly into the least-significant bit, and is also individually XORed with the other original taps (bits [3,2,1] in this example). Note that although both styles result in maximal-length LFSRs, the actual sequences of values will differ between them. But the main point is that using the one-to-many style means that there is only ever one level of combinational logic in the feedback path, irrespective of the number of taps being employed.
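In software, by the way, the one-to-many style corresponds to what is often called the "Galois" form of an LFSR. The following sketch is one plausible rendering of the 8-bit example; the 0x1D feedback mask is my own encoding of the [7,3,2,1] taps (bit 0 receives the fed-back MSB, while bits 2, 3, and 4 receive the MSB XORed with the old tap bits 1, 2, and 3):

    #include <stdio.h>

    int main(void)
    {
        unsigned lfsr = 0x01;                 /* any non-zero seed          */
        int i;
        for (i = 0; i < 255; i++)             /* (2^8 - 1) states per cycle */
        {
            unsigned fb = (lfsr >> 7) & 1;    /* the most-significant tap   */
            lfsr = (lfsr << 1) & 0xFF;        /* shift left one bit         */
            if (fb)
                lfsr ^= 0x1D;                 /* one level of XOR "gates"   */
            printf("%02X\n", lfsr);
        }
        return 0;
    }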
Seeding an LFSR

As we previously discussed, one quirk with an XOR-based LFSR without any form of external data input is that, should it happen to find itself in the all-0s value, it will happily continue to shift all 0s indefinitely (similarly for an XNOR-based LFSR and the all-1s value). This is of particular concern when power is first applied to the circuit, because each register bit can randomly initialize containing either a logic 0 or a logic 1, and the LFSR can therefore wake up containing its "forbidden" value. For this reason, it is necessary to provide a mechanism to initialize the function with a seed value.

One method for loading a seed value is to use registers with reset or set inputs. A single control signal can be connected to the reset inputs on some of the registers and the set inputs on the other registers. Thus, when the control signal is placed in its active state, the LFSR will be loaded with a hard-wired seed value. In certain applications, however, it is desirable to be able to vary the seed value. One technique for achieving this is to include a 2:1 multiplexer at the LFSR's input (Figure 17-5).

Figure 17-5: Loading alternative seed values

When the data input is selected, the device functions as a standard shift register and any desired seed value may be loaded. After loading the seed value, the feedback path is selected and the device returns to its LFSR mode of operation.
Using LFSRs in FIFOs

The fact that an LFSR generates an unusual sequence of values is irrelevant in many applications. For example, consider an SRAM-based 4-bit x 16-word First-In First-Out (FIFO) memory device (Figure 17-6):
Figure 17-6: First-in first-out (FIFO) memory

A brief summary of the operation of the FIFO is as follows. The write pointer and read pointer are essentially 4-bit registers, whose outputs are processed by 4:16 decoders to select one of the sixteen words in the memory array. The reset input is used to initialize the device, primarily by clearing the write pointer and read pointer such that they both point to the same memory word. This initialization also causes the empty output to be placed in its active state and the full output to be placed in its inactive state.

The write and read pointers chase each other around the memory array in an endless loop. An active edge on the write input causes any data on the data-in bus to be written into the word pointed to by the write pointer; the empty output is placed in its inactive state and the write pointer is incremented to point to the next empty word. Data can be written into the FIFO until all the words in the array contain data. When the write pointer catches up to the read pointer, the full output is placed in its active state and no more data can be written into the device.

An active edge on the read input causes the data in the word pointed to by the read pointer to be copied into the output register; the full output is placed in its inactive state and the read pointer is incremented to point to the next word
containing data. Data can be read out of the FIFO until the array is empty. When the read pointer catches up to the write pointer, the empty output is placed in its active state and no more data can be read out of the device.(2)

The write and read pointers for a 16-word FIFO are often implemented using 4-bit binary counters. However, a moment's reflection reveals that there is no intrinsic advantage to a binary sequence for this particular application; the sequence generated by a 4-bit LFSR would serve equally well. Additionally, while the combinational "next-state" logic for the 4-bit binary counter requires a number of AND and OR gates, the combinational logic for the 4-bit LFSR only consists of a single 2-input XOR gate. This means that the LFSR requires fewer logic gates and wires, and is therefore significantly more efficient in terms of silicon real estate. Additionally, the LFSR's feedback only passes through a single level of logic, while the binary counter's feedback passes through multiple levels of logic. This means that the new data value to be loaded into the LFSR is available sooner, which allows the LFSR to be clocked at a higher frequency. These differentiations become even more pronounced for FIFOs with more words requiring pointers with more bits. Thus, implementing read and write pointers using LFSRs should certainly be high on the list of things to consider for the discerning designer of FIFOs.
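By way of illustration, here's a minimal software sketch of such a pointer (the taps [3,0] are one maximal-length choice for a 4-bit register; the function stands in for the register-plus-XOR hardware):

    /* Advance a 4-bit LFSR-based FIFO pointer by one step; a single */
    /* 2-input XOR forms the feedback, and the 4:16 decoders don't   */
    /* care that the 15 states arrive in a non-binary order.         */
    unsigned next_pointer(unsigned ptr)
    {
        unsigned fb = ((ptr >> 3) ^ ptr) & 1;   /* XOR of taps [3,0]  */
        return ((ptr << 1) | fb) & 0x0F;        /* shift, insert data */
    }

Replacing the binary counter then simply means writing wr_ptr = next_pointer(wr_ptr) rather than wr_ptr++. Note that the all-0s value never occurs, which is precisely the 15-versus-16 issue addressed in the next section.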
Modifying LFSRs to Sequence 2^n Values

The main downside to the 4-bit LFSRs shown in the FIFO scenario above is that, unlike a binary counter's sequence of 16 values (2^4), they will only sequence through 15 values (2^4 - 1). Designers may not regard this as a problem, especially in the case of larger FIFOs. However, if it is required for an LFSR to sequence through every possible value, then there is a simple solution (Figure 17-7).
Figure 17-7: Modifying a 4-bit LFSR to sequence through 2^n values

2Note that although these discussions assume write-and-increment and read-and-increment techniques, some FIFOs use an increment-and-write and increment-and-read approach.
For the value where all the bits are 0 to appear, the preceding value must have comprised a logic 1 in the most-significant bit (MSB) and logic 0s in the remaining bit positions (the MSB in these examples is taken to be on the left-hand side of the register). In an unmodified LFSR, the next clock would result in a logic 1 in the least-significant bit (LSB) and logic 0s in the remaining bit positions. By comparison, in the case of our newly modified LFSR, the output from the NOR gate is a logic 0 for every state except two: 1) the value preceding the one where all of the bits are 0, and 2) the value where all the bits are 0. These two cases force the NOR gate's output to a logic 1, which serves to invert the usual output from the XOR in the feedback loop. This in turn causes the sequence to first enter the all-0s value and then resume its normal course. In the case of LFSRs with XNOR feedback paths, the NOR gate can be replaced with an AND, which will cause the sequence to cycle through the value where all of the bits are 1.
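In software terms, the NOR-gate fix amounts to a single extra term in the feedback calculation. Here's a minimal sketch for the same 4-bit, taps-[3,0] register used above:

    /* One step of a 4-bit LFSR modified to pass through all 16 values: */
    /* the NOR of the three least-significant bits inverts the feedback */
    /* for states 1000 and 0000, splicing the all-0s value into the     */
    /* loop between them.                                               */
    unsigned next_value(unsigned v)
    {
        unsigned fb  = ((v >> 3) ^ v) & 1;        /* usual XOR feedback */
        unsigned nor = ((v & 0x7) == 0) ? 1 : 0;  /* NOR of bits [2:0]  */
        return ((v << 1) | (fb ^ nor)) & 0x0F;
    }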
Accessing the Previous Value

In some applications it is required to make use of a register's previous value. For example, in certain FIFO implementations, the full condition is detected when the write pointer is pointing to the location preceding the location pointed to by the read pointer (try saying that ten times in rapid succession!). This implies that a comparator must be used to compare the current value in the write pointer with the previous value in the read pointer. Similarly, the empty condition may be detected when the read pointer is pointing to the location preceding the location pointed to by the write pointer. This implies that a second comparator must be used to compare the current value in the read pointer with the previous value in the write pointer. In the case of binary counters, there are two primary techniques by which the previous value in the sequence may be accessed. The first approach requires the provision of a second set of registers called "shadow registers" for each counter. Every time the read or write pointers are incremented, the current contents of that pointer are first copied into its shadow registers. Alternatively, each pointer can be equipped with an extra block of combinational logic, which can be used to decode the previous value from the current value. Unfortunately, both of these techniques involve a substantial overhead in terms of additional gates or registers. By comparison, LFSRs inherently remember their previous value, and all that is required is the addition of a single register appended to the most-significant bit (Figure 17-8).
Figure 17-8: Accessing an LFSR's previous value
A Wealth of Applications

Unfortunately we've only been able to skim the surface of LFSRs in these discussions. In addition to the FIFO applications introduced earlier, LFSRs are of use in a wealth of other applications, some examples of which are as follows:
Encryption and Decryption: The unusual sequence of values generated by an LFSR can be gainfully employed in the encryption (scrambling) and decryption (unscrambling) of data in a communications system. In the case of digital data, the data stream can be simply XOR'ed with the output of the LFSR to generate its encrypted equivalent (a similar process is used at the receiver to decrypt the incoming signal).
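As a minimal sketch of the idea (the 8-bit width and taps are carried over from the earlier one-to-many example rather than being prescribed by any particular standard), note that the same function serves as both scrambler and descrambler, because XORing with an identically seeded keystream twice returns the original data:

    /* Scramble (or descramble) one byte by XORing it with eight  */
    /* keystream bits taken from the LFSR's most-significant bit. */
    unsigned char scramble_byte(unsigned char data, unsigned *lfsr)
    {
        unsigned char key = 0;
        unsigned fb;
        int i;
        for (i = 0; i < 8; i++)
        {
            key = (unsigned char)((key << 1) | ((*lfsr >> 7) & 1));
            fb = (*lfsr >> 7) & 1;                           /* advance the LFSR */
            *lfsr = ((*lfsr << 1) & 0xFF) ^ (fb ? 0x1D : 0);
        }
        return data ^ key;    /* x ^ k ^ k == x */
    }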
Data Communications: For certain communications applications in environments with significant background noise, modulating the data using a maximal-length LFSR gives the modulated signal an auto-correlation function that allows the data to be recovered despite a noise-to-signal ratio(3) in excess of many dB.

3Yes, I really mean "noise-to-signal."

Data Integrity: A traditional application for LFSRs is in Cyclic Redundancy Code (CRC) calculations, which can be used to detect errors in data communications by using the stream of data bits being transmitted to modify the values fed back into an LFSR. The final CRC value stored in the LFSR, which is known as a checksum, is dependent on every bit in the data stream. The receiver compares an internally generated checksum value with the checksum sent by the transmitter to determine whether any corruption occurred during the course of the transmission. This form of error detection is very efficient in terms of the small number of bits that have to be transmitted in addition to the data.
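A bit-serial software model of such a CRC calculator might look as follows (the 0x1D polynomial is an assumption on my part, being a common 8-bit choice, and real-world CRC standards usually also specify initial and final XOR values):

    /* Clock each message bit into the feedback path of an 8-bit LFSR; */
    /* the register's final contents are the checksum.                 */
    unsigned crc8(const unsigned char *msg, int len)
    {
        unsigned crc = 0;
        int i, bit;
        for (i = 0; i < len; i++)
            for (bit = 7; bit >= 0; bit--)
            {
                unsigned fb = ((crc >> 7) ^ (msg[i] >> bit)) & 1;
                crc = ((crc << 1) & 0xFF) ^ (fb ? 0x1D : 0);
            }
        return crc;
    }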
Similar techniques can be used to generate a CRC based on the contents of a file. The resulting checksum is then attached to, or associated with, that file, and can subsequently be used to check that the contents of the file have not been corrupted. An offshoot of this application is in the detection of computer viruses (although more sophisticated viruses have evolved that can counteract these measures).

Data Compression: The CRC calculators discussed above can also be used in a data compression role. One such application is found in the circuit board test strategy known as functional test. The board is plugged into a functional tester by means of its edge connector. The tester applies a pattern of signals to the board's inputs, allows sufficient time for any effects to propagate around the board, and then compares the actual values seen on the outputs with a set of expected values stored in the system. This process is repeated for a series of input patterns, which may number in the thousands.
If the board fails the preliminary tests, a more sophisticated form of analysis known as guided probe analysis may be employed to identify the cause of the failure. The tester instructs the operator to place the probe at a particular location on the board, then the entire sequence of test patterns is rerun. The tester compares the actual sequence of values seen by the probe with a sequence of expected values which are stored in the system. This process (placing the probe and running the tests) is repeated until the tester has isolated the faulty component or track. A major consideration when supporting a guided probe strategy is the amount of expected data that must be stored. One solution is to employ LFSR-based CRC calculators. The sequence of expected values for each track can be passed through a CRC calculator implemented in software. Similarly, the sequence of actual values seen by the guided probe can be passed through an identical CRC calculator implemented in hardware. In this case, the calculated checksum values are also known as signatures, while a guided probe process based on this technique may be referred to as signature analysis. Irrespective of the number of test patterns used, the system only has to store a single signature for each track. Additionally, for each application of the guided probe, the tester only has to compare the signature gathered by the probe with the expected signature stored in the system. Thus, compressing the data results in storage requirements that are orders of magnitude smaller and comparison times that are orders of magnitude faster than the uncompressed approach. Built-ln Self-Test: One test strategy which may be employed in complex integrated circuits is that of Built-ln Self Test (BIST). Devices using BIST contain special test generation and result gathering circuits, both of which can be
implemented using LFSRs (the results gathering LFSR generally requires some modifications to allow it to accept parallel data).
Pseudo-Random Numbers: Many computer programs (including games, digital and analog simulators, and graphics packages) rely on an element of randomness. LFSRs can be used to generate pseudo-random sequences, which can themselves be used to form pseudo-random numbers. In fact, pseudo-random numbers have an advantage over truly random numbers, because a large proportion of computer applications typically require repeatability. However, designers also need the ability to modify the seed value of the pseudo-random number generator so as to spawn alternative pseudo-random sequences as required.
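A seedable generator is only a few lines; this sketch reuses the 8-bit register from the earlier examples. Strictly speaking, a different non-zero seed on the same LFSR yields a different starting point within the same maximal-length cycle rather than an unrelated sequence (changing the taps changes the sequence itself):

    static unsigned prng_state = 1;

    void prng_seed(unsigned seed)
    {
        prng_state = seed & 0xFF;
        if (prng_state == 0)
            prng_state = 1;     /* avoid the XOR-based LFSR's "stuck" value */
    }

    unsigned prng_next(void)
    {
        unsigned fb = (prng_state >> 7) & 1;
        prng_state = ((prng_state << 1) & 0xFF) ^ (fb ? 0x1D : 0);
        return prng_state;
    }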
Last But Not Least
LFSRs are simple to construct and are useful for a wide variety of applications, but be warned that choosing the optimal polynomial (which ultimately boils down to selecting the tap points) for a particular application is a task that is usually reserved for a master of the mystic arts, not least because the maths can be hairy enough to make a grown man break down and cry (and don't even get me started on the subject of cyclotomic polynomials).(4) And finally, the purpose of this discussion is not to persuade you to try to implement every aspect of every design using only LFSRs. However, when you're evaluating your next design, you might want to keep a watchful eye open for cases where an LFSR might be appropriate.
The topic in this chapter was published in a condensed form under the title The Ouroboros of the Digital Consciousness: Linear Feedback Shift Registers, in the January 4th, 1996 issue of EDN (www.ednmag.com), and is reproduced in its original form here with their kind permission. For your further reading pleasure, this article was itself abstracted from the book: Bebop to the Boolean Boogie (An Unconventional Guide to Electronics), ISBN 1-878707-22-1, with the kind permission of HighText Publications (www.hightext-publications.com) (see also the order form in the back of this book).
4Because I haven't got the faintest clue what a cyclotomic polynomial is!
Chapter 18:
Designing a Three-Phase Clock

"Have you ever had 'One of those days'?"

In this chapter you will discover:
Don't Hate Me Because I'm Beautiful! ........ 234
Have You Ever Had One of Those Days? ........ 234
Mayhap a State Machine Solution? ........ 235
Extracting the Boolean Equations ........ 237
Optimizing the Boolean Equations ........ 238
Optimizing the Implementation ........ 239
Epilog ........ 240
Don't Hate Me Because I'm Beautiful!

The following discussion first appeared as an article in EDN in the autumn of 1996. Sad to relate, this was not my "finest hour," because I completely neglected a fundamental electronics concept, whose omission prompted one reader to email me saying: "Either this was a joke or you are a fool!"(1) But the really scary thing is the number of readers who contacted me praising the piece. So I hope you enjoy this topic, but please don't judge me until you read the epilog.

1Don't hold back -- tell me what you really think :-)
Have You Ever Had One of Those Days?

A few weeks ago I was wrestling with a cunning conundrum in the wee hours of the morning, when my email beeped in a threatening sort of way. "Good grief," I thought, "what sort of a lunatic would be playing on his computer at 3:00 AM. Don't people have lives anymore?" On perusing the ensuing message, I found a plaintive plea for help from one of my loyal readers (the other one being on vacation at the time). Promptly parsing the preamble I reached the first paragraph of substance, whereupon I was informed that this young man's manager had tasked him with creating some logic to transmogrify a master clock signal into three sub-clocks with certain requirements of a phase-relationship nature. Unfortunately I'm an engineer, not a literature major, which means that I think in terms of pictures and equations, not those little squiggly black whatchamacallits ...... words. What we needed here was a picture, so I closed my eyes and envisioned ...... (Figure 18-1).

Figure 18-1: The problem is to take a master clock and generate three sub-clocks
Ah, all became clear. The problem was obviously to take a master clock and generate three sub-clocks. Why couldn't the boy bring himself to say so in the first place? Reading on, I discovered that the master clock generator (which already existed) was some kind of voltage controlled oscillator, whose frequency could be varied, but whose duty-cycle could not be guaranteed. However, the three outputs, Phase 1, Phase 2, and Phase 3, were each required
to have a duty-cycle reasonably close to 50% (that is, an equal mark-space ratio). Additionally, the Phase 2 output was required to lag the Phase 1 output by 60 degrees, while the Phase 3 output was required to lag the Phase 2 output by a further 60 degrees. What on earth was the lad babbling on about? Another figure was clearly the order of the day (Figure 18-2).

Figure 18-2: The Phase 2 clock must lag Phase 1 by 60 degrees, while Phase 3 must lag Phase 2 by a further 60 degrees

Last but not least, the entire thing had to be constructed from as few "cheap and cheerful," commercially available, off-the-shelf components as possible (so what else is new?). So now I understood the problem, but did I really want this hot potato? Like most engineers I'm already putting in 26 hours a day at work, quite apart from trying to squeeze in a social life (or what passes for a social life in Huntsville, Alabama). So I took the only sensible course of action, which was to put the problem on the back burner, eat some breakfast, and head to bed.
Mayhap a State Machine Solution?

The following morning I decided that my best course of action for dealing with this poser was to forget all about it, which is my usual mode of operation when my boss gives me something to do. But you know what it's like; some things just seem to keep on niggling away incessantly at the back of your mind, and the only way to put them to sleep is to solve the gosh-darned things. Initially I started pondering some highfalutin ideas involving sine-wave generators, but I quickly kicked that into touch. Next I began to have visions of phase-locked loops (PLLs), but I don't know anything about them, so that horse was dead at the starting gate for sure. Eventually I slapped myself around the head to knock some sense into it and started from ground-zero. First of all I re-drew my waveforms to include the master clock, which seemed to drop out naturally at six times the frequency of the phase clocks (Figure 18-3).
Figure 18-3: The master clock falls out at six times the frequency of the phase clocks

After a few seconds thought, one solution that sprang to mind was to implement this as a classical synchronous state machine, so I immediately added the binary values of the phase clocks to the bottom of the diagram. Furthermore, in this particular case, the problem was crying out for a registered-output implementation, in which the state variable registers directly drive the outputs (Figure 18-4).(2)
Figure 18-4: A registered output implementation appeared to be the order of the day

One reason why a registered output machine is so attractive in this case is that the outputs from this type of machine all transition simultaneously (or reasonably
so), which usually isn't the case if we use combinational logic to generate the outputs by decoding the state variables. As there are six unique states, we only need three state variable registers to represent them should we decide to use a binary encoding technique.(3)

2Registered output machines were introduced in Chapter 15.
Extracting the Boolean Equations

Assume that we're using positive-edge triggered D-type flip-flops for our state variables. Let's call the data inputs to these registers d1, d2, and d3, where these inputs represent the machine's "next state." Also, we'll call the outputs from the registers p1, p2, and p3, where these outputs represent the machine's "current state," and they also serve to form the output clocks (p1 = Phase 1, p2 = Phase 2, p3 = Phase 3). The next step is to draw a state table for the combinational logic and extract the Boolean equations (Figure 18-5).
d3 = (~p3 & p2 & p1) | (p3 & p2 & p1) | (p3 & p2 & ~p1)
d2 = (~p3 & ~p2 & p1) | (~p3 & p2 & p1) | (p3 & p2 & p1)
d1 = (~p3 & ~p2 & ~p1) | (~p3 & ~p2 & p1) | (~p3 & p2 & p1)

Figure 18-5: Extracting Boolean equations from the state table ('~' = NOT, '&' = AND, and '|' = OR)

Note that as we're only using six of the eight states that can be represented by our three binary-encoded state variables, we have to figure out what to do with the two unused states (the two bottom rows of the state table in Figure 18-5). Remember that D-type flip-flops can power-up containing random logic 0s and logic 1s, which means that our circuit could initialize in one of these unwanted states. Even worse, if we're not careful our machine could get hung-up in a loop, cycling back and forth between these two states. One technique would be to use D-type flip-flops with clear inputs for our state variables, and then create some
3Binary encoding was introduced in Chapter 15.
sort of power-on reset circuit to initialize them all to zero. However, we want to minimize the number of components and the complexity of the design as much as possible, so another alternative is to cause our unused states to jump into one of the used states (for example, state 000 as shown in Figure 18-5).
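For the suspicious-minded, the equations are easily checked in software before reaching for the soldering iron. This sketch simply evaluates the Figure 18-5 equations for all eight states, confirming the six-state loop and that both unused states fall into state 000:

    #include <stdio.h>

    int main(void)
    {
        int s;
        for (s = 0; s < 8; s++)    /* s packs the current state as p3 p2 p1 */
        {
            int p3 = (s >> 2) & 1, p2 = (s >> 1) & 1, p1 = s & 1;
            int d3 = (!p3 & p2 & p1) | (p3 & p2 & p1) | (p3 & p2 & !p1);
            int d2 = (!p3 & !p2 & p1) | (!p3 & p2 & p1) | (p3 & p2 & p1);
            int d1 = (!p3 & !p2 & !p1) | (!p3 & !p2 & p1) | (!p3 & p2 & p1);
            printf("%d%d%d -> %d%d%d\n", p3, p2, p1, d3, d2, d1);
        }
        return 0;
    }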
Optimizing the Boolean Equations

Arrggghh! As soon as I'd extracted the Boolean equations I realized that I'd made a big mistake, which is the fact that I hate minimizing and optimizing Boolean equations. On the one hand I could have performed the minimization using a logic synthesis package such as BOOL,(4) but throwing logic synthesis at these piffling little equations seemed to be bordering on overkill, so I promptly did the next best thing, which was to throw them all away. My next line of attack was to employ the tricks that we learned at college in "Karnaugh Maps 101," which allow us to quickly and easily extract the optimized equations (Figure 18-6).(5)
d3 = (p2 & p1) | (p3 & p2)
d2 = (~p3 & p1) | (p2 & p1)
d1 = (~p3 & ~p2) | (~p3 & p1)
Figure 18-6: The optimized equations can be quickly determined using Karnaugh Map techniques

Glancing at these expressions, it becomes apparent that certain product terms appear more than once; that is, (p2 & p1) appears in the equations for d3 and d2, while (~p3 & p1) appears in the equations for d2 and d1. Note that the inverted inputs can be taken directly from the flip-flops' complementary outputs, which means that we don't need to use any NOT gates. Thus, our penultimate implementation of the combinational logic only requires four 2-input AND gates and three 2-input OR gates (Figure 18-7).
4The BOOL logic synthesis program was introduced in Chapter 3, and is provided on the CD-ROM accompanying this book.
5The use of Karnaugh Maps is introduced in more detail in Bebop to the Boolean Boogie (An Unconventional Guide to Electronics), ISBN 1-878707-22-1, HighText Publications (www.hightext-publications.com) (see also the order form in the back of this book).
Figure 18-7: Our penultimate implementation of the combinational logic only requires four 2-input AND gates and three 2-input ORs
Optimizing the Implementation

Why did we refer to the above as being our "penultimate" solution? Well, the maximum frequency we can use to clock our state machine depends on the propagation delays through the feedback path formed by the combinational logic. Assuming minimalist CMOS implementations (but excluding weird pass-transistor configurations), each AND and OR gate requires 6 transistors and uses two internal levels of transistors. This is because an AND gate is actually formed by inverting the output from a NAND with a NOT, while an OR gate is formed by inverting the output from a NOR. Thus, AND gates are slower than NANDs, while ORs are slower than NORs. To cut a long story short, all that is required is to perform a DeMorgan transformation of the OR gates driving the outputs and see what happens. (Remember that a DeMorgan transformation means that we swap the OR gates for NANDs and invert their inputs.) The end result is that the topology of the circuit stays "as-is," and the only thing that changes is that we replace all of the gates in Figure 18-7 (the slow ANDs and ORs) with NAND gates, which require fewer transistors and which are faster (Figure 18-8).
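If you don't quite trust DeMorgan (shame on you), the transformation is easy to verify exhaustively. This sketch checks the d3 equation in both its AND-OR and NAND-only forms across all input combinations; the other two equations can be checked the same way:

    #include <stdio.h>

    #define NAND(a, b) (!((a) & (b)))   /* 2-input NAND on 0/1 values */

    int main(void)
    {
        int s, ok = 1;
        for (s = 0; s < 8; s++)
        {
            int p3 = (s >> 2) & 1, p2 = (s >> 1) & 1, p1 = s & 1;
            int d3_andor = (p2 & p1) | (p3 & p2);
            int d3_nand  = NAND(NAND(p2, p1), NAND(p3, p2));
            if (d3_andor != d3_nand)
                ok = 0;
        }
        printf(ok ? "equivalent\n" : "mismatch!\n");
        return 0;
    }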
Figure 18-8: Our ultimate implementation of the combinational logic requires seven 2-input NAND gates

So our final circuit requires three D-type flip-flops and seven 2-input NAND gates. If I were implementing this circuit I'd almost certainly use one of the smallest programmable devices that I could lay my hands on. However, one could also use an SN74xx175 (quad D-type flip-flop) and two SN74xx00 (quad 2-input NAND) devices, or components of that ilk. And there we have it, a swift ten minutes with a cup of java and my trusty white board, and the problem was laid to rest. I'd managed to increase the sum total of happiness in the world, and I was free to get on with the rest of my life. All that remained was to triumphantly email the solution back to its originator, which I promptly did with gusto and abandon, and then awaited the response in dread anticipation. Sad to relate my joy was short lived, because the following reply quickly came hurling my way over the Internet: "Hmmm, that's really very nice, but I actually meant to say 6 degrees out of phase, not 60 ......" At this point there was much gnashing of teeth and renting of garb in my office, let me tell you. Have you ever had one of those days?
Epilog

As I mentioned at the beginning of this chapter, I neglected to consider a fundamental electronics concept while pondering this puzzle. So my original title,
which was "Have you ever had one of those days," turned out to be far more appropriate than I could have hoped (or wished) for. Assume for the sake of argument that we are still interested in generating three clocks that are 60 degrees out of phase with each other. The concept that slipped my mind is known as a Johnson counter or a ring counter, which (at its most rudimentary level) simply involves daisy-chaining D-type flip-flops together as a shift register, and feeding back the inverted output from the last "flop" to drive the input to the register chain (Figure 18-9).
Figure 18-9: A basic Johnson Counter implementation (without any bells and whistles)

Note that the p1, p2, and p3 signals actually form our phase clock outputs. These signals could be used to drive the outputs directly, or they might be buffered if the loading on these outputs were significant. Assuming for the sake of discussion that the registers power-up containing 000, then our circuit will follow the sequence shown in Figure 18-10. Apart from the fact that the Johnson Counter gives us just the outputs we require to generate our phase clocks, one interesting point to note is that these outputs follow a gray code sequence. This means that only one register (and therefore one output) changes state for each clock cycle, so the outputs are "glitch-free," which can be quite useful in certain circumstances. Of course, the key point in the previous discussions was our assumption that the registers powered-up containing 000.
Figure 18-10: The Johnson Counter gives the sequence we want
In fact we would be quite happy for the registers to power-up in any of our "good" states, which are 000, 100, 110, 111, 011, and 001. But problems arise if the registers power up containing 010 or 101, because our circuit will simply commence to alternate between these two values. One solution would be to employ a special power-on reset circuit to force the registers into a "good" initial value. Alternatively, we might decide to modify our equations so as to coerce any "bad" states to transition into good ones. There are a number of ways to do this; for example:
Original Equations:        New Equations:
d1 = ~p3                   d1 = ~p3 | (p1 & ~p2)
d2 = p1                    d2 = p1
d3 = p2                    d3 = p2
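These new equations are easily traced in software; the following sketch clocks them three times from each of the eight possible power-up states (written p1 p2 p3, as in the text), showing the "bad" states being swept into the main sequence:

    #include <stdio.h>

    int main(void)
    {
        int s, clk;
        for (s = 0; s < 8; s++)
        {
            int p1 = (s >> 2) & 1, p2 = (s >> 1) & 1, p3 = s & 1;
            printf("%d%d%d", p1, p2, p3);
            for (clk = 0; clk < 3; clk++)
            {
                int d1 = !p3 | (p1 & !p2);   /* the modified equation */
                int d2 = p1;
                int d3 = p2;
                p1 = d1; p2 = d2; p3 = d3;
                printf(" -> %d%d%d", p1, p2, p3);
            }
            printf("\n");
        }
        return 0;
    }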
If the registers were to power-up containing 101, then our new equations would cause the next clock to load them with 110, which is one of the values in our main sequence. The only niggle is that if the registers were to power up containing 010, then the first clock would load them with 101, and only on the second clock would we proceed into the main sequence. Countering this problem is the fact that our new solution still requires very few gates (Figure 18-11).
Figure 18-11: Modified Johnson Counter to escape from "bad" states (010 and 101)

This new circuit is perfectly acceptable, it's just that having a single 2-input OR gate and a single 2-input AND offends the purist in me. In the real world, if this were a portion of a larger circuit and if we were implementing this circuit on a board using SSI and MSI components, then there's a good chance that we might have these gates "left over" from other portions of the circuit, in which case there's no problem. However, if we happened to be building our counter in
isolation, then using a single OR gate from an integrated circuit that contains four of the little rascals is something of a waste of space (similarly in the case of the AND gate). Once again, all that is required is to perform a DeMorgan transformation of the OR gate and see what happens. Performing a DeMorgan transformation on the OR gate means that we swap the OR for a NAND and invert its inputs, which in turn means that we can lose the inverter on the p3 output and swap our AND for a NAND (Figure 18-12).
Figure 18-12: Performing a DeMorgan transformation allows us to replace the OR and AND with NANDs

In fact the only reason we continue to show the inverter is for purposes of clarity. If we were using a quad D-type flip-flop device like an SN74xx175, which supports both true and complement outputs, then we could lose the remaining inverter and use the complemented (~p2) output to drive the NAND gate, which means that we could implement our entire three-phase clock generator using just two simple "jelly-bean" integrated circuits.
The topic in this chapter was published in a condensed form under the title Have You Ever Had One of Those Days?, in the September 2nd, 1996 issue of EDN (www.ednmag.com), and is reproduced in its original form here with their kind permission.
Chapter 19:
Field-Programmable Devices

"A perplexing plethora of parts"

In this chapter you will discover:
Taxonomy ........ 246
Terminology ........ 246
Timeline ........ 247
Programming Technologies ........ 248
Fusible Links (Lateral and Vertical Fuses) ........ 248
Antifuses ........ 250
EPROM Transistors ........ 250
EEPROM Transistors ........ 252
FLASH Transistors ........ 253
Simple PLDs (SPLDs) ........ 253
Removing Unwanted Fusible Links ........ 254
Special SPLD Notation ........ 255
True and Inverse Inputs, and AND and OR Arrays ........ 255
Programmable Logic Arrays (PLAs) ........ 257
Programmable Array Logic (PAL) ........ 258
Programmable Read-Only Memories (PROMs) ........ 258
Additional Programmable Options ........ 259
Reprogrammable PLDs ........ 263
Programming PLDs ........ 264
Complex PLDs (CPLDs) ........ 266
Field-Programmable Gate Arrays (FPGAs) ........ 268
Field-Programmable Interconnect Devices (FPIDs) ........ 272
Field-Programmable Analog and Mixed-Signal Devices ........ 273
Taxonomy

The term Application-Specific Integrated Circuit (ASIC) is commonly used to refer to gate arrays, standard cell devices, and full custom integrated circuits, all of which may have their functionality defined by the designer, but all of which are actually constructed by a device vendor. However, in its more generic usage, ASIC also encompasses the group of devices referred to as Field-Programmable Devices (FPDs) (Figure 19-1).
Figure 19-1: Taxonomy (ASICs versus FPDs)

FPD is the contemporary name for any integrated circuit that can be customized by the user "in the field." That is, if the user has access to the appropriate electronics design automation (EDA) software utilities and associated physical device programming tools, then he or she can customize an FPD to perform a certain task or function. One of the problems with FPDs is that there are a lot of them about (Figure 19-2).
Figure 19-2: Taxonomy continued: FPDs encompass a plethora of parts
Terminology

Another problem with FPDs is that their terminology is subject to change. For example, in early 1995 the term "FPD" was generally considered to only apply to digital logic such as SPLDs, CPLDs, and FPGAs, but the FPD designation is now
generally accepted to include Field-Programmable Analog Devices (FPADs) and Field-Programmable Mixed-Signal Devices (FPMSDs).

ASIC    Application-Specific Integrated Circuit
PLD     Programmable Logic Device
SPLD    Simple PLD
CPLD    Complex PLD
PLA     Programmable Logic Array
PAL     Programmable Array Logic
GAL     Generic Array Logic
PROM    Programmable Read-Only Memory
EPROM   Erasable Programmable Read-Only Memory
EEPROM  Electrically-Erasable Programmable Read-Only Memory
EPLD    Erasable PLD
EEPLD   Electrically-Erasable PLD
FPGA    Field-Programmable Gate Array
FPID    Field-Programmable Interconnect Device
FPAD    Field-Programmable Analog Device
FPMSD   Field-Programmable Mixed-Signal Device
Timeline

The first point-contact transistors arrived on the scene in late 1947. These were followed in the early 1950s by Bipolar Junction Transistors (BJTs), which were joined by Metal-Oxide Semiconductor Field-Effect Transistors (MOSFETs) in the early 1960s. Many references state that PLDs didn't arrive until the early 1970s, but prototype versions of these devices were available from around 1965. Similarly, many engineers believe that ASICs are a 1980s technology, but Fairchild introduced a device called the Micromosaic in 1967, which many regard as being the forerunner to the modern ASIC (Figure 19-3).
Figure 19-3: Timeline (dates are approximate)
Meanwhile, digital FPGAs began to arrive on the scene toward the end of the 1980s; their analog counterparts, FPADs, began to appear in 1994; and mixed-signal versions, FPMSDs, started to become commercially available in 1997.
Programming Technologies

The term Programming Technology (PT) refers to the physical techniques used to create the user-programmable switches that are employed in programmable devices. The most commonly used PTs are fusible links, antifuses, EPROM transistors, EEPROM cells/transistors, and SRAM cells (Figure 19-4).
Technology     Predominantly used by
Fusible link   SPLDs
Antifuse       FPGAs
EPROM          SPLDs, CPLDs
EEPROM         SPLDs, CPLDs
Figure 19-4: Alternative programming technologies

As is illustrated by Figure 19-4, different PTs are predominantly associated with certain classes of devices. However, the reason for the qualifier "predominantly" is that there are always oddball components popping up at the fringes ("the exceptions that prove the rule").
Fusible Links (Lateral and Vertical Fuses)

As its name might suggest, a fusible link is similar to an electrical fuse, in that it involves a connection whose characteristics can be modified in a fairly forthright way by providing an excess of current. As we will come to see a little later in this chapter, fusible links are used to provide programmable connections in the AND and OR planes of SPLDs. There are two basic types of fusible links: lateral fuses and vertical fuses. A lateral fuse is based on an alloy wire in series with a bipolar junction transistor (Figure 19-5). This type of link starts off as a short circuit, but we can blow the fuse to create an open circuit. To program a lateral fuse we essentially pull the product wire high, pull the input wire low, and wait for the fuse to blow. The junction between the
base and emitter of a bipolar junction transistor acts like a diode, and the transistor can pass sufficient current to melt the tungsten-titanium fuse. This sort of fuse is referred to as One-Time Programmable (OTP), because once it's been programmed there's no going back.
Figure 19-5: Fusible links: Lateral fuses

By comparison, a vertical fuse consists of the diode formed at the base-emitter junction of a bipolar junction transistor. Initially this sort of link starts off as an open circuit, because the transistor acts like two back-to-back diodes which prevent current from flowing. However, if a sequence of current pulses are forced through that transistor's emitter terminal, an "avalanche effect" occurs and the emitter collapses to create a short circuit (Figure 19-6).
Figure 19-6: Fusible links: Vertical fuses
The resistance of the vertical fuse is measured during the process of applying current pulses to the transistor, and the fuse is considered to be programmed when its resistance has fallen to a defined value. Once again, this type of fuse is referred to as One-Time Programmable (OTP), because once it's been blown there's no going back.
Antifuses

As an alternative to fusible links, some FPDs (predominantly CPLDs and FPGAs) employ antifuse technology. Antifuse links are formed by creating a via of amorphous (non-crystalline) silicon between two layers of metalization. In its unprogrammed state the amorphous silicon is an insulator with a very high resistance in excess of one billion ohms, but the user can program an antifuse link by applying signals of relatively high current (approximately 20 mA) to the device's inputs. The programming signal effectively grows a link by changing the insulating amorphous silicon into conducting polysilicon (Figure 19-7).
Figure 19-7: Growing an antifuse link
The programmed links in the original antifuse technologies had resistance values in the order of 1,000 ohms, but modern processes have reduced this value to as low as 50 ohms, along with a parasitic capacitance in the region of 1 fF (one femtofarad). As for fusible link technologies, devices based on antifuses are One-Time Programmable (OTP), because once an antifuse link has been grown it cannot be removed.
EPROM Transistors

Erasable Programmable Read-Only Memory (EPROM) transistors are so-called because read-only memories of this flavor were the first devices to employ them; however, these transistors find application in a variety of FPDs. EPROM transistors have the same basic structure as standard MOS transistors, but with the addition of a second polysilicon floating gate isolated by layers of oxide (Figure 19-8).
Figure 19-8: An EPROM transistor has a "floating gate" that is isolated by layers of oxide

In its unprogrammed state, the floating gate is uncharged and doesn't affect the normal operation of the control gate. In order to program the transistor, a relatively high voltage in the order of 12V is applied between the gate and drain terminals. This causes the transistor to be turned hard on, and energetic electrons tunnel (force) their way from the substrate through the oxide into the floating gate in a process known as hot (high energy) electron injection. When the programming signal is removed, a negative charge remains on the floating gate. This charge is very stable and will not dissipate for more than a decade under normal operating conditions. The stored charge on the floating gate inhibits the normal operation of the control gate, and thus distinguishes those cells that have been programmed from those which have not.

EPROM transistors are efficient in terms of silicon real estate, being only half the size of a dynamic RAM (DRAM) cell and an order of magnitude smaller than fusible links. However, their main claim to fame is that they can be erased and reprogrammed. The transistor is erased by discharging the electrons on the floating gate. The energy required to discharge the electrons is provided by a source of ultraviolet radiation (UV). An FPD based on these transistors is delivered in a ceramic package with a small quartz window in the top, where this window is usually covered with a piece of opaque sticky tape. To erase the device, it is first removed from the circuit board, the quartz window is uncovered, and the package is placed in an enclosed container with an intense ultraviolet source.

The main problems with this technology are the expensive packaging requirements, the necessity for a special erasing tool, the fact that the entire device must be erased (that is, it is not possible to erase individual portions of a device), and the time it takes to erase the devices (which is in the order of 20 minutes). Also, a foreseeable problem with next generation devices is
paradoxically related to the improvements in process technology which allow transistors to be made increasingly smaller. The problem is one of scaling, because not all of the structures on a silicon chip are shrinking at the same rate. Most notably, transistors are shrinking faster than the metal interconnections. Thus, as feature sizes become smaller, a larger percentage of the surface of the die is covered by metal, which makes it difficult for the ultraviolet light to penetrate the device and reach the floating gate.
EEPROM Transistors

Electrically-Erasable Programmable Read-Only Memory (EEPROM or E2PROM) transistors are so-called, because read-only memories of this flavor were the first devices to employ them. Once again, however, these transistors can be found in a variety of FPDs (Figure 19-9).
Figure 19-9: An EEPROM transistor and cell
An EEPROM cell is approximately 2.5 times larger than its EPROM equivalent, because it actually contains two transistors: a standard MOSFET and the EEPROM transistor shown here. The EEPROM transistor is similar to its EPROM counterpart in that it contains a floating gate, but this gate is constructed slightly differently such that part of it is situated very close to the transistor channel; also, the insulating oxide layers surrounding the floating gate are somewhat thinner. A large proportion of EEPROM devices use hot (high energy) electron injection and require 12V signals for programming similar to EPROM components. In this case, the transistor is programmed by connecting the source terminal and the drain terminal to GND (0 volts), while the control gate terminal is connected to +12 volts. This causes electrons in the substrate to "tunnel" through the oxide layer into the floating gate where they are trapped. Additionally, due to their thinner oxide layers, EEPROM transistors can be erased by reversing the bias voltages used during programming, which causes any electrons stored in the
~E~SH ~ransistors Yet another technology, known as FLASH, is generally regarded as an evolutionary step that combines the best features from EPROM and EEPROM (the name FLASH is derived from its fast reprogramming time compared to EPROM). Although many engineers regard FLASH as being a fairly recent contender, it has actually been under development since the end of the 1970s and was officially described in 1985, but the technology did not initially receive a great deal of interest. Towards the end of the 1980s, however, the demand for portable computer and communication systems increased dramatically and FLASH began to attract the attention of designers. All variants of FLASH are electrically erasable like EEPROMs. Some devices are based on a single transistor cell, which provides a greater capacity than an EEPROM, but which must be erased and reprogrammed on a device-wide basis similar to an EPROM. Other devices are based on a dual transistor cell and can be erased and reprogrammed on a word-by-word basis. FLASH is considered to be of particular value when the designer requires the ability to reprogram a system in the field or via a communications link whilst the devices remain resident on the circuit board, where such devices are referred to as In-System
Programmable (ISP).
Simple P s
(SPf.Ds)
Experiments on prototype programmable devices commenced in the mid-1960s, and such devices became commercially available in the early-1970s. This class of components were first referred to as Programmable Logic Devices (PLDs), but more recently they have come to be called Simple PLDs (SPLDs) to contrast them with their bigger cousins Complex PLDs (CPLDs). Traditional PLDs are based on logic gates formed from diodes in wired-AND and wired-OR configurations (Figure 19-10).
Figure 19-10: Diode implementations of AND and OR gates

In the case of the 2-input AND, the pull-up resistor R attempts to hold the output y at a logic 1 value. When both of the inputs are at logic 1, there is no difference in potential across the diodes and y remains at logic 1. However, if input a is connected to logic 0, diode Da will conduct and y will be pulled down to logic 0; similarly for input b and diode Db. Additional inputs can be formed by adding more diodes.

In the case of the 2-input OR, the pull-down resistor R attempts to hold the output y at a logic 0 value. When both of the inputs are at logic 0, there is no difference in potential across the diodes and y remains at logic 0. However, if input a is connected to logic 1, diode Da will conduct and y will be pulled up to logic 1; similarly for input b and diode Db. Once again, additional inputs can be formed by adding more diodes.
Removing Unwanted Fusible Links

One of the most common SPLD technologies is that of fusible links, where each diode has an associated fuse. The designer can individually remove unwanted fuses by applying pulses of relatively high voltage and current to the device's inputs. These pulses are sufficient to effectively vaporize the fuse. This process is typically known as programming the device, but may also be referred to as blowing or burning the device. As we previously noted, devices based on fusible links are said to be One-Time Programmable (OTP), because once a fuse has been blown it cannot be replaced (Figure 19-11).
Figure 19-11: Removing unwanted fusible links
Special SPLD Notation

Due to the fact that SPLD structures are very regular, and also because they differ from standard logic gates, a special form of notation has been adopted. Consider a device delivered with four diode-link pairs forming a 4-input AND function (Figure 19-12).
The AND symbol indicates the function and encompasses any additional components such as pull-up resistors and transistors. The diode-fuse pairs are represented by the presence or absence of crosses (a cross indicates that the corresponding link has been left intact, while the absence of a cross indicates that the link has been blown away).

Figure 19-12: Special SPLD notation

Thus, the circuit diagram for an unprogrammed fusible-link device will show crosses at each intersection of the matrix.
True and Inverse Inputs, and AND and OR Arrays

To increase the versatility of SPLDs, their inputs are inverted inside the device and both true and inverse versions of the inputs are presented to an array (Figure 19-13).
The number of AND functions is independent of the number of inputs, and additional ANDs can be formed by introducing more rows into the array. Similar techniques can be used to create an array of OR functions, and SPLDs often contain an AND array feeding into an OR array (Figure 19-14). a
t%
b
c
r=a&b&c
J~
x
)(
< ~f
\
b ~b ,x
~( ~
I
}\
)(
I t= ~b&~c
Pro~ramma:ble OR: array g
C ~C /
Programmable AND array
w
x
y = (a & b & c)
x = (a & b & c) | (~b & ~c)
w = (a & c) | (~b & ~c)
Figure 19-14: An A N D array f e e d i n g into an OR array
The number of OR functions is independent of both the number of inputs and the number of AND functions, and additional ORs can be formed by introducing more columns into the OR array. The sum-of-products representations most often used to specify Boolean equations can be directly mapped onto these AND-OR structures, while other equation formats can be accommodated using standard Boolean algebraic techniques; for example, DeMorgan transformations. Having said this, SPLDs are not obliged to have AND input arrays feeding OR output arrays; some devices have two NAND arrays; others have two NOR arrays; some have a NAND array driving a NOR array; and there's even a case called "folded logic," which is based on a single array in which the outputs are fed back into the array to implement sum-of-products expressions. Although core SPLD concepts are relatively simple, the wider arena may prove to be a morass of confusion to the unwary. There are a multiplicity of SPLD alternatives, most of which seem to have mnemonics formed from different combinations of the same three or four letters. This may be a strategy to separate the "priests" from the "acolytes," or it may be that the inventors of the devices had no creative energy left to dream up meaningful names for them. Whatever the case, the most common SPLD variants are introduced below.
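Before we look at those variants, it may help to see the sum-of-products mapping in concrete terms. The following Python sketch (the names and data layout are mine, purely for illustration) models a programmable AND array feeding a programmable OR array, using the example equations from Figure 19-14:

    def spld_outputs(inputs, and_array, or_array):
        # and_array: each product term maps an input name to the value it
        # requires (1 = true input left connected, 0 = inverse input).
        # or_array: each output name lists the product terms it sums.
        terms = [all(inputs[name] == want for name, want in term.items())
                 for term in and_array]
        return {out: int(any(terms[i] for i in rows))
                for out, rows in or_array.items()}

    and_array = [{'a': 1, 'b': 1, 'c': 1},   # term 0: a & b & c
                 {'b': 0, 'c': 0},           # term 1: ~b & ~c
                 {'a': 1, 'c': 1}]           # term 2: a & c
    or_array = {'y': [0], 'x': [0, 1], 'w': [2, 1]}
    print(spld_outputs({'a': 1, 'b': 0, 'c': 1}, and_array, or_array))
    # -> {'y': 0, 'x': 0, 'w': 1}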
Programmable Logic Arrays (PLAs)

The most user-configurable of the traditional PLDs are Programmable Logic Arrays (PLAs), because both the AND and OR arrays are programmable (Figure 19-15).
Figure 19-15: Programmable logic array (PLA) (programmable AND array and programmable OR array)
This diagram indicates a fusible-link technology, because all the links are present when the device is in its unprogrammed state; similarly, the following examples are based on this technology unless otherwise stated.
Programmable Array Logic (PAL)

Many applications do not require both the AND and OR arrays to be programmable. For example, in the case of Programmable Array Logic (PAL)(1) devices, only the AND array is programmable while the OR array is predefined (Figure 19-16).

Figure 19-16: Programmable array logic (PAL) (programmable AND array and predefined OR array)
Although PALs are less flexible than PLAs, they operate significantly faster, because hard-wired connections take less time to switch than their programmable equivalents. Also, PALs are more efficient in terms of silicon real-estate, because hard-wired connections occupy less space than their programmable counterparts.
Programmable Read-Only Memories (PROMs)

The last of the traditional SPLDs are Programmable Read-Only Memories (PROMs), which can be viewed as being a predefined AND array driving a programmable OR array (Figure 19-17).

(1) PAL is a registered trademark of Monolithic Memories Inc.
Figure 19-17: Programmable read-only memory (PROM) (a predefined AND array decoding addresses 0 through 7 drives a programmable OR array)
Remember that this is just one way to visualize these devices. In reality a PROM's internal architecture is more akin to a decoder driving a programmable OR array, while another way to view a PROM is as a decoder driving a two-dimensional array of memory cells. PROMs are generally considered to be memory devices, where each address applied to the inputs returns a value programmed into the device. However, PROMs are also PLDs in the classical sense, in that they can be used in the role of hardware truth tables or to implement equations requiring a large number of product terms.
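To make the "hardware truth table" view concrete, here's a minimal Python sketch of a 3-input, 3-output PROM along the lines of Figure 19-17 (the contents shown are purely hypothetical):

    # Eight 3-bit words, indexed by the address formed from inputs a, b, c.
    prom = [0b011, 0b100, 0b000, 0b110, 0b101, 0b001, 0b111, 0b010]

    def read_prom(a, b, c):
        word = prom[(a << 2) | (b << 1) | c]   # predefined AND array = address decoder
        return (word >> 2) & 1, (word >> 1) & 1, word & 1  # outputs w, x, y

    print(read_prom(1, 0, 1))  # address 5 -> (0, 0, 1)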
Additional Programmable Options

Many PLDs have tri-state buffers on their outputs. All the tri-state buffers share a common enable control, which therefore requires only a single input pin on the device (Figure 19-18).
Figure 19-18: PLD with tri-statable outputs (all tri-state buffers driven from the OR array share a common active-low ~enable control)

Figure 19-19: Programming a PLD's tri-statable outputs
Some devices may contain additional programmable fuses which have not been shown for reasons of clarity. By means of these additional fuses, each output can be individually programmed to be either permanently enabled or to remain under the control of the enable input as required (Figure 19-19). In this example, fuses have been blown such that the tri-state buffer driving the w output has been disconnected from the ~enable control and connected to a constant logic 0. As a result, the w output is permanently enabled irrespective of the value on the ~enable input. To increase the fun, some devices are constructed in such a way that the tri-state outputs are fed back into the AND array (Figure 19-20).
Figure 19-20: Some PLDs have tri-statable outputs feeding back into the AND array
Once again there are additional programmable fuses that have not been shown for reasons of clarity. By means of these fuses, each output's tri-state buffer can be individually configured so as to always drive a high-impedance Z state. Pins that are configured in this way can be used as additional inputs (Figure 19-21).
Figure 19-21: Tri-statable outputs can be programmed to act as inputs
In this example, the unprogrammed device commenced with three inputs and three outputs, but the user actually required a device with four inputs and only two outputs. By means of the appropriate fuses, the tri-state buffer on the y output was disconnected from the ~enable control and connected to a constant logic 1. Because w and x are still required to function as outputs, their associated links in the AND array must be blown away to ensure that these pins will not have any effect as inputs. The ability to configure pins as outputs or inputs provides a great amount of flexibility, and saves the designer from having to purchase myriad devices with every conceivable combination of inputs and outputs. Certain PLDs are equipped with registers on the outputs, and others with latches. Depending on the particular device, the registers (or latches) may be provided on all the outputs or on a subset of the outputs. Registered devices are particularly useful for implementing Finite State Machines (FSMs). All of the registers (or latches) typically share a common control signal, which therefore requires only a single input pin on the device (Figure 19-22).
Figure 19-22: PLD with registered outputs (outputs from the OR array registered with positive-edge-triggered D-type flip-flops)

In this example, the outputs are shown as being registered with D-type flip-flops, but alternative register types such as JK flip-flops or T-type flip-flops may be more suitable for certain applications. It can be inconvenient to support a dedicated device for each type of register. As a solution, some devices have configurable register elements whose type can be selected by programming appropriate fuses. Registered (or latched) outputs may also incorporate bypass multiplexers (Figure 19-23).
Figure 19-23: Registered outputs with bypass multiplexers (each multiplexer selects between the raw OR-array output and the output of a positive-edge-triggered D-type flip-flop)
By means of appropriate fuses, the control inputs to the multiplexers can be individually set to select either the non-registered data or its registered equivalent. There are also a variety of other common options, such as the ability to select true or complemented outputs. An individual PLD typically only provides a subset of the above capabilities, but these may be combined in a variety of ways; for example, registered outputs may be followed by tri-state buffers.
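As a rough behavioral model of such an output stage (the class and method names are mine, and no fuses are modeled beyond the multiplexer setting):

    class OutputCell:
        def __init__(self, bypass=False):
            self.bypass = bypass   # fuse-programmed multiplexer setting
            self.q = 0             # contents of the D-type flip-flop

        def clock_edge(self, d):   # positive clock edge registers the OR-array value
            self.q = d

        def output(self, d):       # multiplexer: registered or non-registered data
            return d if self.bypass else self.q

    cell = OutputCell(bypass=False)
    cell.clock_edge(1)
    print(cell.output(0))  # -> 1 (the registered value, not the current data)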
Reprogrammable PLDs

One consideration with fusible link (and antifuse) technologies is that once they have been programmed there is no going back. This may be of particular concern with PROMs, as the data they store is prone to change, but it's also true for the other PLDs. In reality, all of the components, including the diodes, transistors, and fuses, are created on the surface of a single piece of silicon substrate (Figure 19-24a); however, it can be useful to visualize the device as consisting of two distinct strata (Figure 19-24b).
Figure 19-24: Visualizing components and fuses as occupying two distinct strata: (a) real world, in which all the components (including the fuses) are constructed on a single piece of substrate; (b) pretend world, in which the fuses occupy one stratum and the AND and OR arrays another

Earlier in our discussions we introduced the concepts of EPROM, EEPROM, and FLASH transistors and/or cells. To see how these fit into the scheme of things, we can simply visualize replacing the fuses in the previous example with these reprogrammable transistors (Figure 19-25). Once again, don't be confused by these figures. Irrespective of the technology used, from standard PLDs to EEPLDs, all of the components are created on the surface of a single piece of silicon substrate; it's just sometimes easier to regard the programming fuses/switches as occupying their own stratum.
Figure 19-25: PLDs versus EPLDs and EEPLDs: (a) PLD with fusible links; (b) EPLD with EPROM transistors; (c) EEPLD with EEPROM transistors
Note that Generic Array Logic (GAL) devices are sophisticated versions of EEPLDs with a few extra "bells and whistles." Reprogrammable devices offer advantages over fusible-link and antifuse devices, in that they can be more rigorously tested at the factory by undergoing one or more program-and-erase cycles before being shipped to the end user. Also, in those cases where components can be programmed while remaining resident on the circuit board, these devices are referred to as being In-System Programmable (ISP).
Programming PLDs

Programming a traditional PLD is relatively painless because there are dedicated computer programs and tools for the task. The user first creates a computer file known as a PLD source file containing a textual Hardware Description Language (HDL) description of the required functionality (Figure 19-26).
Figure 19-26: Using a textual source file to create a fuse file (a textual HDL source file, e.g. ABEL, undergoes optimization and minimization, guided by a device knowledge database, to generate a fuse file, e.g. JEDEC)
In addition to Boolean equations, the PLD source file may also support truth tables, state tables, and other constructs, all in textual format. Additional statements allow the user to specify which outputs are to be tri-statable, which are to be registered, and any of the other programmable options associated with PLDs. A special computer program is used to process the PLD source file. This program makes use of a knowledge database which contains details about the internal construction of all the different types of PLDs. After the user has instructed the program as to which type of device they wish to use, it analyzes the equations in the source file and performs algebraic minimization to ensure optimal utilization of the device's resources. The program accesses the knowledge database for details about the designated device and evaluates which fuses need to be blown to implement the desired functionality. The program then generates a textual output file comprising 0 and 1 characters which represent the fuses to be blown. There are a number of formats that can be used to represent these fuse files, where one common standard is known as the JEDEC format.(2)
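Conceptually, the fuse file is just a long run of 0 and 1 characters. The following Python fragment shows the flavor of the thing; a real JEDEC file also carries header fields, fuse counts, field addresses, and checksums, all of which are omitted here:

    def fuse_map_text(fuses, per_line=32):
        # fuses: list of booleans, True meaning "blow this fuse"
        bits = ''.join('1' if blown else '0' for blown in fuses)
        return '\n'.join(bits[i:i + per_line]
                         for i in range(0, len(bits), per_line))

    print(fuse_map_text([True, False, False, True] * 16))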
Figure 19-27: Programming a physical SPLD: (a) the main computer holding the fuse file; (b) the device programmer, into which the unprogrammed device is placed and from which the programmed device emerges

As an alternative to the user specifying a particular device, the PLD program can be instructed to automatically select the best device for the task. The program can base its selection on a variety of criteria such as the speed, cost, and power consumption of the devices. The program may also be used to partition a large design across several devices, in which case it will output a separate fuse file for each device. Finally, the designer takes a virgin device of the appropriate type and places it in a tool called a programmer, blower, or burner (Figure 19-27).

(2) The fuse files for PROM (and related) devices are typically represented using Intel Hex or Motorola S-Record formats.
The main computer passes the JEDEC file to the programmer, which uses the contents of the file to determine which fuses to burn. The designer presses the GO button, the programmer applies the appropriate signals to the device's inputs, and a new device is born.
Complex PLDs (CPLDs)

A CPLD essentially consists of multiple SPLDs on a single chip. The programmable switches may be based on fusible links, antifuses, EPROM transistors, EEPROM transistors, or SRAM cells (Figure 19-28).
Figure 19-28: Generic CPLD (SPLD-like blocks connected by a programmable interconnect, surrounded by input/output pins)
Note that the programmable interconnect may contain a lot of wires (say 100), but that it would be impractical to feed all of these wires into each SPLD block. Thus, the blocks are interfaced to the interconnect using some form of programmable multiplexer (Figure 19-29).
Figure 19-29: A multiplexer is used to select a subset of the programmable interconnect (say 100 wires) to be fed to each SPLD block
As usual, both the logic and the programmable switches are really constructed on the same piece of silicon, but it is sometimes easier to visualize them as occupying two distinct strata (Figure 19-30).
Figure 19-30: Visualizing components and programmable switches as occupying two distinct strata

In the case of CPLDs based on SRAM programmable switches, some variants increase their versatility by allowing individual blocks of SRAM to be used either as programmable switches or as an actual chunk of memory. Note that the above illustrations represent the architecture of a fairly simple CPLD, but these devices can have a lot of SPLD blocks and be much more sophisticated (Figure 19-31).

Figure 19-31: A more sophisticated CPLD architecture (many SPLD blocks, input/output blocks, and interconnect)
One of the main advantages of CPLDs is that, in addition to their reasonably high densities, they offer fairly predictable timing because of their regular structures. Over the last few years the market for CPLDs has grown significantly, and a variety of competing products are now available from several companies. CPLDs are being utilized in many commercial applications, one of the most significant being the re-working of existing SPLD-based designs into more cost-effective implementations that use fewer chips.
Field-Programmable Gate Arrays

SPLDs and CPLDs are tremendously useful for a wide variety of tasks, but they are somewhat limited by the structures of their programmable AND and OR planes. At the other end of the spectrum are full-blown ASICs, which include gate arrays, standard cell, and full custom devices. Perhaps the simplest of the full-blown ASICs are gate arrays, which are based on the concept of basic cells (Figure 19-32).
Figure 19-32: Examples of gate array basic cells (a pure CMOS basic cell and a BiCMOS basic cell)

Each ASIC vendor determines the mix of transistors, resistors, and other components that will be contained in their particular basic cell. Silicon die are then constructed containing large numbers of these basic cells, which may be arranged in a number of different ways; for example, consider channeled gate array architectures (Figure 19-33). There are two main facets to the silicon die: the transistors (and other components) and the interconnect. As we've seen, the transistors (in the form of basic cells) are pre-created by the foundry. The designer then provides a gate-level netlist representing the desired functionality of the device, and the foundry creates the custom masks used to lay down the interconnect. Thus, these devices may be referred to as Mask Programmable Gate Arrays (MPGAs).
Figure 19-33: Channeled gate array architectures (I/O cells/pads surrounding single-column and dual-column arrays of basic cells separated by routing channels)
These devices are characterized by being very generic, having fine-grained architectures (at the level of primitive functions such as gates and registers), and having very high capacities of up to 800K equivalent gates or more, but they also have high startup costs and long lead times. Thus, there is a large gap between SPLDs and CPLDs at the lower end of complexity and ASICs at the high end (Figure 19-34).
Figure 19-34: FPGAs were designed to bridge the gap between SPLDs and ASICs
Towards the end of the 1980s, a new breed of devices called Field-Programmable Gate Arrays (FPGAs) began to appear on the scene. These devices combined many aspects of mask-programmable gate arrays (such as high density) with characteristics of SPLDs and CPLDs (such as the ability to program them in the field) (Figure 19-35).
Figure 19-35: Spectrum of "application-specific" devices

One differentiating factor is that the majority of FPGAs are coarse-grained, which means that they consist of islands of programmable logic surrounded by programmable interconnect. One of the things that makes FPGAs a little tricky to work with is that all of the vendors field their own unique architectures, but a generic coarse-grained device might be as shown in Figure 19-36.
Figure 19-36: Generic coarse-grained FPGA architecture (programmable logic blocks, programmable switching matrices, and programmable connection matrices)

The majority of FPGAs are either based on antifuse or SRAM programmable switches. In the case of coarse-grained devices, the two predominant architectural variants are to use Look-Up Tables (LUTs) or multiplexers. First consider the LUT approach, of which there are two main sub-variants (Figure 19-37).
Figure 19-37: Look-up table (LUT)-based FPGAs (the truth table of the required function, y = (a & b) | c, is stored in SRAM cells addressed by the inputs a, b, and c)

Assuming 3-input LUTs, one technique involves using a 3:8 decoder to select one of eight SRAM cells that contain the required truth table output values. Alternatively, the SRAM cells can be used to drive a "pyramid" structure of multiplexers, where the multiplexers are themselves controlled by the values on the input variables and "funnel down" to generate the output. As opposed to LUTs, some FPGA architectures are based almost purely on multiplexers (Figure 19-38).
Figure 19-38: Multiplexer-based approach (this example features an Actel-type logic module implementing y = (a & b) | c)
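Both the LUT and multiplexer approaches are easy to mimic in software. The following Python sketch (the names are mine) models a 3-input LUT programmed for y = (a & b) | c, followed by the same function "funneled down" through 2:1 multiplexers in the spirit of an Actel-type logic module:

    # LUT: eight SRAM cells hold the truth table; the inputs form the address.
    lut = [(a & b) | c for a in (0, 1) for b in (0, 1) for c in (0, 1)]

    def lut_read(a, b, c):
        return lut[(a << 2) | (b << 1) | c]

    # Multiplexer-based: the same function built purely from 2:1 muxes.
    def mux(sel, d0, d1):
        return d1 if sel else d0

    def mux_module(a, b, c):
        return mux(a, c, mux(b, c, 1))   # y = (a & b) | c

    assert all(lut_read(a, b, c) == mux_module(a, b, c)
               for a in (0, 1) for b in (0, 1) for c in (0, 1))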
Unfortunately, there are several problems with these coarse-grained FPGA architectures. First, interconnect delays are not as predictable as they are with SPLDs and CPLDs. Second, each vendor employs special fitting software to map
designs into their devices, which makes it well-nigh impossible to migrate a design from one vendor to another while maintaining anything like the same propagation delays. Third, the majority of synthesis tools (which are geared towards fine-grained ASIC architectures) output gate-level netlists, but FPGA fitting tools usually do a less than superb job of placement or packing from these netlists. Thus, in addition to relatively poor utilization of device resources, it's difficult to estimate realistic propagation delays prior to routing, which means that you sometimes have to perform a lot of finagling of the design in the downstream portions of the design cycle to make it work. However, there are some design techniques that are particularly apt for coarse-grained FPGA architectures, such as the EDIF Library of Parameterized Modules (LPMs).(3) Also, there is an increasing trend toward fine-grained FPGA architectures, such as QuickLogic's pASIC-2™ devices, whose cells can act in either a coarse- or fine-grained fashion, or the CrossFire™ architecture from Crosspoint Solutions, which employs a super-fine-grained half-gate basic building block.
Field-Programmable Interconnect Devices (FPIDs)

Field-Programmable Interconnect Devices (FPIDs), which may also be known as Field-Programmable Interconnect Chips (FPICs),(4) are a relatively recent breed of components that act as SRAM-based switching matrices (Figure 19-39).

Figure 19-39: FPIDs are a relatively new breed of devices that act as SRAM-based switching matrices (connecting FPGAs, ASICs, or any other ICs)
(3) LPMs are introduced in more detail in Chapter 20. (4) FPIC is a trademark of Aptix Corporation, San Jose, CA.
These devices, which are used to connect other devices together, can be dynamically reconfigured in the same way as standard SRAM-based FPGAs. Due to the fact that each FPID may have around 1,000 pins, only a few such devices are typically required on a circuit board. One example of an application based on these devices has been developed by Aptix themselves in the form of a reconfigurable development board, which allows the designer to effectively connect any point on the board to any other point (or points). This is similar in concept to traditional breadboards, except that instead of playing around with jumper cables (with their associated high parasitic inductances and so forth), designers use a program to generate the FPID configuration data and download it directly into the development board, thereby allowing changes to the interconnect to be applied "on the fly."
Field-Programmable Analog and Mixed-Signal Devices

One of the more exciting current trends in FPDs is their extension into the analog and mixed-signal domains. One of the first Field-Programmable Analog Devices (FPADs) was presented to the market in 1994 by IMP, San Jose, CA. This device, which IMP prefer to call an Electrically Programmable Analog Circuit (EPAC), contains a number of analog modules, such as analog multiplexers, sample-and-hold amplifiers, and so forth. IMP also provide an easy-to-use programming interface called "Analog Magic,"(5) which allows designers to specify the modules' parameters (such as the gain and offset of amplifiers) and how they are to be connected together. These EPACs are SRAM-based, which allows them to be reprogrammed "on the fly," but they also have an onboard block of EEPROM that can be loaded with the configuration data, thereby allowing them to adopt a default configuration on power-up. These devices are positioned for analog designers, but they are also of interest to digital designers who require a limited amount of analog in their designs. Another interesting device that deserves mention is the AL220 programmable analog fuzzy-logic microcontroller from Adaptive Logic, San Jose, CA. However, although the AL220 is ideal for implementing a wide variety of control systems, it's not really a general-purpose programmable analog device in the wider sense.
(5) A demonstration copy of Analog Magic is included in the IMP directory on the CD-ROM accompanying this book, along with a number of datasheets and suchlike, with the kind permission of IMP, San Jose, CA, USA (www.impweb.com, Tel: 408-432-9100). Note that in order to view some of these files you will require access to the Adobe Acrobat reader, which you can download from Adobe's web site at www.adobe.com.
In fact the first truly generic FPAD is being created by Motorola. The initial release of this device consists of a 4 x 5 array of 20 analog cells, each of which can be configured to act as a high-level function such as an amplifier, adder/subtractor, integrator/differentiator, and so forth. The parameters of each cell (such as the gain of an amplifier) can be individually configured, and the designer can program the way in which all of the cells are connected together, thereby supporting quite sophisticated analog functions. Additionally, these SRAM-based devices (called MPAAs) share the same configuration scheme as Motorola's digital FPGAs (called MPAs), and both devices sport an interface bus allowing them to be connected together to implement two-chip mixed-signal designs. Motorola had functional silicon towards the end of 1996, and intended to commence the official beta sampling of the parts in early 1997. Last but not least, truly general-purpose, single-chip, programmable mixed-signal devices probably won't become available until around 1998 (although, just to cover myself, I wouldn't be at all surprised if this occurred sooner ...... or later). As usual, the life of a design engineer is nothing if not exciting, and the only real problem is finding the time to play with all of the capriciously cunning devices that the manufacturers are continually developing for us.

The topic in this chapter was published in a condensed form under the title Field-Programmable Devices in the October 10th, 1996 issue of EDN (www.ednmag.com), and is reproduced in its original form here with their kind permission. For your further reading pleasure, the SPLD portion of this article was itself abstracted from the book: Bebop to the Boolean Boogie (An Unconventional Guide to Electronics), ISBN 1-878707-22-1, with the kind permission of HighText Publications (www.hightext-publications.com) (see also the order form in the back of this book).
Chapter 20:

Library of Parameterized Modules

"High-level design using low-level techniques"

In this chapter you will discover:
Some You Win and Some ...... 276
The First Taste of High-Level Design .... 276
A Schematic-Only Approach .... 276
Language-Driven Design .... 277
Flowcharts and State Diagrams .... 278
Problems with Coarse-Grained Architectures .... 279
LPMs to the Rescue .... 281
LPMs Can't Do Everything .... 284
Epilog .... 284
Some You Win and Some ......
When I first penned this piece in the summer of 1996, I had great hopes for the design technique known as the Library of Parameterized Modules (LPMs). In fact the concepts embodied by LPMs live on and grow ever-stronger in proprietary offerings, but the advantages of the LPMs introduced here were predicated on their becoming an accepted industry standard ...... which sadly failed to materialize. The underlying philosophy behind LPMs remains valid, however, so I decided to keep this topic for you to ruminate on.
The First Taste of High-Level Design

There is a baffling and bewildering array of tools available to today's designers; so much so that one might question the need for yet one more. However, a relative newcomer known as the Library of Parameterized Modules (LPMs) is proving its worth by providing a path for users to adopt a high-level design methodology using (conceptually) low-level techniques. Before we begin to examine LPMs in detail, it is appropriate to describe the "big picture" in order to understand where LPMs "fit in." One of the strange aspects about electronics is the way in which devices and methodologies appear, fade away, and reappear in different guises. If questioned, many would point to the early 1980s for the emergence of the Application-Specific Integrated Circuit (ASIC), but the concept actually originated one and a half decades earlier alongside the introduction of the conceptually much simpler Programmable Logic
Devices (PLDs). In 1967, Fairchild introduced a device called the Micromosaic, which contained a few hundred transistors. A key feature of the Micromosaic was that the transistors were not initially connected together. A designer used a computer program to specify the function the device was required to perform, and the program then determined the necessary interconnections and generated the masks required to complete the device. The Micromosaic therefore led the field as the forerunner to the modern ASIC and as one of the first real applications of computer-aided design. This device also exhibited one of the first examples, albeit rudimentary, of high-level design coupled with logic synthesis technology.
A Schematic-Only Approach

Had the concepts behind Micromosaic been pursued, high-level design techniques would almost certainly have enjoyed widespread acceptance much sooner than they did, but, unfortunately, this technology faded away into the background for a while. This meant that when the use of ASICs started to
become somewhat more commonplace in the early 1980s, their design methodologies were based on the traditional techniques that had been established for circuit boards. Thus, designers of early ASICs used schematic capture to describe their contents as primitive logic functions and the connections between them. The schematic approach does convey certain advantages, not the least being that it reflects the way in which designers think at the lowest level of abstraction, and it does allow expert designers to hand-craft extremely efficient functions. But gate-level schematics are time consuming to enter and they don't lend themselves to "what-if" analysis at the architectural level. Also, verification using simulation is extremely CPU intensive at the gate level, and it is difficult to re-target gate-level schematics to new device technologies due to the diverse symbols and design rules associated with the cell libraries provided by different vendors. Since the early ASICs typically only supported between 2,000 and 5,000 primitive gates, the schematic capture approach was at least tenable, and it remained in force throughout most of the 1980s. However, as gate counts continued to rise onwards and upwards through 10,000, 15,000, 20,000, and so forth, it became increasingly difficult to design these devices using traditional techniques. Thus, the tail-end of the 1980s and the beginning of the 1990s saw the emergence of Language-Driven Design (LDD), which involved the combination of Hardware Description Languages (HDLs) and logic synthesis technology.
Language-Driven Design
The early examples of language-driven design involved proprietary languages, but the industry eventually standardized on two languages: VHDL and Verilog. HDLs are appropriate for describing both control and datapath logic at a reasonably high level of abstraction. This means that during the early stages of a project, designers can concentrate on the architecture of the design as opposed to worrying about implementation details. It is much more efficient to simulate designs at a high level of abstraction compared to the gate level, and it is far easier to perform "what-if" analysis at the architectural level. Synthesis technology then allows these high-level representations to be migrated to the implementation level, and it facilitates design reuse by facilitating the re-targeting of designs to alternative implementation technologies. LDD techniques are generally accepted to increase designer productivity (in terms of gates-per-day) by a factor of ten (or more) compared to designing at the gate-level. Based on these promises of glory, a number of early adopters became overly enthusiastic, in that they decided LDD was the only way to go
and discarded schematics as being "yesterday's technology." In reality nothing comes for free in this life, and LDD methodologies have their own problems, such as the fact that describing a design using an HDL doesn't usually reflect the way in which designers think. Also, in addition to being obliged to learn an HDL along with software design techniques and disciplines, designers have to understand how certain language constructs and statements will affect the synthesis tool. Today's synthesis utilities are very powerful and it's certainly possible to obtain good results using them, but it's also easy for designs to get "out-of-control." Writing the source HDL code in two different ways (which simulate exactly the same at the RTL level) can synthesize to radically different gate-level implementations. Even worse, certain portions of designs may not be amenable to logic synthesis techniques, in which case these portions inherently have to be captured at the gate level. Thus, the industry came to realize that schematics still had a role to play: first in describing the design graphically in terms of high-level functional blocks and the connections between them; and second, in describing certain portions of the design at the gate level. The resulting "mixed-level" design style offered the best of both worlds by allowing designers to mix schematic and HDL representations together.
Flowcharts and State Diagrams

The early mixed-level design tools allowed designers to create a schematic of functional blocks and the connections between them, where each block represented a functional unit. The designer could then "push" into a block and describe its contents as a gate-level schematic or as textual HDL using an appropriate editor. However, the problems still remained that the designer had to learn the HDL in question, and that there are numerous subtle tricks of the trade when it comes to creating HDL representations in such a way that they will synthesize efficiently. Thus, the current generation of mixed-level design tools augment the schematic and textual capabilities of the first generation by supporting more sophisticated graphical entry mechanisms such as flowcharts and state diagrams (Figure 20-1). Once again, designers can commence by creating a block-level schematic at a high level of abstraction, but on "pushing" into a block they can choose to represent its contents as a flowchart or a state diagram. These graphical representations can then be processed to automatically generate the HDL of choice (VHDL or Verilog) in a form that is guaranteed to simulate and synthesize efficiently.
Figure 20-1: Mixed-level design tools allow each portion of the design to be represented at the most appropriate level of abstraction (a block-level schematic whose blocks may contain graphical state diagrams, graphical flowcharts, textual HDL, or gate-level schematics)
Other advantages of the graphical flowchart and state diagram techniques are that they closely reflect the way in which designers visualize the control portions of their designs, and they can be used by designers who are not expert (or even familiar) with the HDL in question. In fact, these graphical tools can also serve as a training aid, because the designer can gain familiarity with the language by analyzing the generated code.
Problems with Coarse-Grained Architectures

Unfortunately, although flowcharts and state diagrams are ideal for describing control functions, neither are particularly well suited to representing datapath logic. Another problem with language-driven design as a whole is that the majority of today's synthesis tools are focused toward, and most effective with, fine-grained architectures such as those found on ASICs. By "fine-grained" we mean devices that, at the implementation level as "seen" by the synthesis tool, are realized as primitive logic gates and simple register elements. Toward the end of the 1980s, a new breed of devices became available to designers: field-programmable gate arrays (FPGAs). Although some FPGAs are fine-grained, the majority are constructed around various flavors of coarse-grained architectures (Figure 20-2).
Figure 20-2: Some FPGAs (like the one shown here) have coarse-grained architectures which are strongly layout dependent (programmable logic blocks, programmable switching matrices, and programmable connection matrices)
The problem with FPGAs is that each vendor can sport radically different architectures. Figure 20-2 is intended to illustrate a generic coarse-grained FPGA architecture. Each of the programmable logic blocks contains a number of primitive gates and registers (in some architectures the primitive gates are essentially used to implement small truth tables). The inputs and outputs from the programmable logic blocks are connected to programmable connection matrices, which are, in turn, connected to programmable switching matrices. This type of FPGA is strongly layout dependent in terms of functionality and propagation delays, and these devices require special "fitting" software to map a design into the device. Unfortunately, as we previously noted, LDD involving traditional logic synthesis is focused toward, and most effective with, fine-grained architectures. Thus, if the designer creates an HDL representation (either by hand or using a graphical technique), and subsequently migrates this representation to a gate-level netlist by means of logic synthesis, then, after the fitting tool has performed its task, the final result is typically less than optimal in terms of utilization and timing. To put this another way, if the designer could work at a slightly higher level of abstraction, such as multi-bit multiplexers, registers, counters, adders, and so forth, then the "fitter" could be made to recognize these higher-level functions and treat them as special cases. The benefit of this approach is that the fitter could contain special hand-crafted algorithms for each of the higher-level functions, and could therefore return more optimal results.
LPMs to the Rescue
In a rare example of cooperation for the electronics industry, a group of design tool vendors and chip manufacturers met in October 1990 to hammer out the details of an alternative design methodology. Their goals were to offer a vendor-independent methodology that did not target any specific FPGA architecture; that did not oblige the user to learn an HDL; and that did not require any investment in synthesis technology. The result was the Library of Parameterized Modules (LPMs). The idea behind LPMs was to use graphical techniques based on a library of high-level parameterizable building blocks, similar in concept to connecting SSI and MSI devices together at the board level. It should be noted that the basic idea behind LPMs is not radically new, and there are a number of proprietary implementations of template-driven approaches available. But one of the key differentiators of the LPM approach is that they are part of a non-proprietary industry standard. The preliminary version of this standard was presented in March 1991, and was subsequently adopted by the EDIF committee as an extension to the EDIF 2.0.0 standard (EIA-548-A). The result was a generic set of twenty-five parameterizable templates for some of the most common datapath functions used in digital design. These templates (which allow designers to represent complex behavior without worrying about implementation-specific details) comprise the following functions:

Basic Gates: Invert, AND, OR, XOR, Mux, Constant, Decode, Tristate, Shift
Arithmetic Components: Add/Sub, Compare, Multiply, Counter, Absolute Value
Storage Components: Latch, DFF, TFF, RAM, ROM
Table Primitives: Truth Table, Finite State Machine
Pad Primitives: Input, Output, Bi-directional
As an initial example, consider one of the simpler functions, the LPM_AND. This function supports two properties called width and size, where the width reflects the number of gates represented by the function and the size denotes the number of inputs to each gate (Figure 20-3).
Figure 20-3: The width and size parameters allow a single schematic symbol to represent multiple gates (e.g. an LPM_AND with width = 3 and size = 2 is equivalent to three 2-input AND gates)
This means that an LPM_AND with width = 3 and size = 2 represents three 2-input AND gates, while a similar function with width = 2 and size = 3 would indicate two 3-input AND gates. Even at this simple level, it is easy to see how the use of LPM symbols can help to reduce the complexity and increase the clarity of a schematic. But although these primitive functions are extremely useful for creating compact schematic representations, the more complex functions really begin to demonstrate the true power of LPMs.
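If it helps to see the width and size properties in action, here's a minimal Python sketch (the function name and data layout are mine):

    def lpm_and(gates, width=3, size=2):
        # gates: a list of 'width' input groups, each containing 'size' bits
        assert len(gates) == width and all(len(g) == size for g in gates)
        return [int(all(g)) for g in gates]

    print(lpm_and([[1, 1], [1, 0], [0, 0]]))  # three 2-input ANDs -> [1, 0, 0]
    print(lpm_and([[1, 1, 1], [1, 0, 1]], width=2, size=3))  # two 3-input ANDs -> [1, 0]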
eloacl
aconet
econet
aclr
eclr
Figure 20-4: The LPM_COUNTER function supports a number of parameterizable values, including its width and modulus
q[O..n]
elata[O..n]
clock enable
up~down
v
LPM_COUNTER
eqe-1 =
teetenable testin
t~s~out
First consider the ports (that is, the inputs and outputs). Some of these ports are of course mandatory, such as the data outputs, but many of the others are optional. In a schematic capture package the ports would have different colors: one color would be used to indicate inputs with mandatory connections, another color to indicate inputs requiring at least one connection, and another color to
indicate optional inputs. This means that, if you're using LPMs at the schematic level, then you only need to connect signals to those ports that you wish to employ. For example, if you connect a signal to the aclr (asynchronous clear) input, then you've automatically instructed the system that you wish this function to have an asynchronous clear. Similarly, the decision as to whether or not the counter supports SCAN technology is made by either connecting signals to the appropriate pins (testin, testout, and testenable) or by leaving them unconnected. The point is that when you generate the EDIF/LPM netlist from the schematic, only those pins to which you've made connections are referenced in the sub-circuit call. Additionally, when this netlist is passed to the technology-specific fitter tool for an FPGA device, the fitter only creates the minimum amount of logic required to generate the optimal implementation of this function. Perhaps of more interest is the fact that some capture systems allow you to automatically derive the width of a component by associating an appropriate parameter with the bus that's connected to it. For example, assume that the q[0..n] outputs from the LPM_COUNTER above were connected to a bus, which was in turn connected to the inputs of a multiplexer and other LPM components. In this case, attaching a width parameter to the bus could be used to set the width of the counter, multiplexer, and so forth, thereby facilitating the speed and ease of implementing high-level design modifications. In addition to being able to customize LPMs by either connecting signals to ports or by leaving those ports unconnected, each function also supports a set of properties. For example, in the case of the LPM_COUNTER, it is possible to request an unsigned binary counter (the default) or a Gray-code counter. Also, in addition to being able to specify the modulus of the counter, it is also possible to specify constant values that can be loaded either synchronously or asynchronously, and also an initialization value to be loaded at power-on. There are a number of reasons to use LPMs. One of the main reasons is that they provide efficient access to unique silicon architectures (such as those found in coarse-grained FPGAs) without requiring the designer to have detailed knowledge of the silicon technology. Also, although LPMs were not originally positioned as a capture methodology, they are actually very useful in this role, because designing with LPMs at the schematic level facilitates the rapid evaluation of alternative design architectures. Furthermore, some LPM tools also facilitate design re-use and re-targeting designs between alternative FPGA and ASIC implementations. For example, if you wish to re-target an LPM-based design from a coarse-grained FPGA
architecture to a fine-grained ASIC architecture, then some tools will generate a high-level HDL representation of the LPM which can be passed into your existing logic synthesis tools. This type of tool can obviously save you a great deal of time. For example, consider the LPM_COUNTER example that we looked at above. Hand-crafting this function as textual HDL is a non-trivial task, requiring a substantial amount of verification by simulation before this code can be used with any level of confidence. By comparison, the HDL generated by an automatic tool inspires a much greater level of confidence.
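To get a feel for what's involved, here's a heavily simplified Python model of such a counter; only a few of the real template's ports and properties (width, modulus, enable, up/down, and synchronous load) are represented, and the names are mine:

    class LpmCounter:
        def __init__(self, width=4, modulus=None, init=0):
            # Default modulus is 2**width, giving a free-running binary counter.
            self.modulus = modulus if modulus else (1 << width)
            self.q = init % self.modulus

        def clock_edge(self, enable=1, up=1, sload=0, data=0):
            if sload:                     # synchronous load takes priority
                self.q = data % self.modulus
            elif enable:                  # count up or down, wrapping at modulus
                self.q = (self.q + (1 if up else -1)) % self.modulus
            return self.q

    counter = LpmCounter(width=4, modulus=10)           # a decade counter
    print([counter.clock_edge() for _ in range(12)])    # 1, 2, ..., 9, 0, 1, 2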
LPMs Can't Do Everything
In the same way that state diagrams and flowcharts are particularly useful for representing control functions, but are less effective for datapath logic; LPMs are particularly effective for representing datapath logic, but are much less appropriate for control functions. Thus, an optimal design solution will allow you to capture (and verify) your design using a mixture of whatever styles are appropriate for the various design portions. These styles include, but are not limited to, hierarchical schematics (including mixing block-level, gate-level, and LPM level), graphical techniques (including state diagrams and flowcharts), and textual HDLs.
Epilog

As was noted at the beginning of this chapter, when this piece was first penned it appeared as though LPMs were destined for greatness and glory. Unfortunately, only a limited number of design-tool and FPGA vendors adopted LPMs wholeheartedly, while the rest continued to "do their own thing." The end result was that support for LPMs was quickly eroded. On the bright side, there are a variety of LPM-look-alike tools available, some of which are surprisingly sophisticated, and each of which has its own capabilities and restrictions. However, the design-tool market is fast-moving to say the least, and these tools are evolving all the time, so there's little point in my giving a recommendation. Ultimately it's your responsibility to determine the feature set that best matches your particular requirements, and then to check with the various design-tool vendors to see what's available to meet your needs.

The topic in this chapter was published in a condensed form under the title LPMs: High-level Design Using Low-level Techniques in the May 9th, 1996 issue of EDN (www.ednmag.com), and is reproduced in its original form here with their kind permission.
Chapter 21:

Reconfigurable Logic

"Logic that mutates while you wait"

In this chapter you will discover:
Stretch-Resistant Socks .... 288
Configurable Logic .... 288
Reconfigurable Logic .... 289
Dynamically Reconfigurable Logic .... 290
Virtual Logic .... 293
Limitations of Current Design Methodologies and Tools .... 295
Stretch-Resistant Socks

The phrase "reconfigurable logic" is similar to the phrase "stretch-resistant socks," in that they both mean different things to different people. As those of us who are older, wiser, and a little sadder know, "stretch-resistant" actually refers to socks that will stretch; they just do their best to resist it for a while! Similarly, the term "reconfigurable" is subject to a myriad of diverse interpretations depending on the observer's point of view. As a starting point, the term "reconfigurable" refers to electronic products whose function can be customized to a particular system or application. There are obvious benefits to making one product (that can be customized) many times, as opposed to making many application-specific products once. Unfortunately, the perception of what is implied by reconfigurable is a moving target which is evolving over time as new technologies and techniques become available. Throughout most of the 1980s, the most sophisticated level of customization was displayed by products based on Programmable Logic Devices (PLDs). Products of this type were usually targeted at a particular application and then focused towards a specific implementation. By comparison, the advent of Field-Programmable Gate Arrays (FPGAs) in the late 1980s opened the door to products which could be almost totally customized for radically diverse applications. In order to permit the presentation of a consistent view, we will adopt the following terminology:
Configurable Logic: A product whose function may be customized once or a very few times.

Reconfigurable Logic: A product whose function may be customized many times.

Remotely Reconfigurable Logic: A product whose function may be customized remotely, by telephone or radio, while remaining resident in the system.

Dynamically Reconfigurable Logic: A product whose function may be customized on-the-fly while remaining resident in the system.

Virtual Logic: An extension of dynamically reconfigurable logic.
Configurable Logic

A classic example of a product whose function may only be customized once is provided by car radios, of which there may be a variety of low-end, mid-range,
and high-end versions. It is not unusual for all of these variations to be constructed on identical circuit boards, which can be configured by adding or removing certain components and modifying certain switches or jumpers (Figure 21-1).
Figure 21-1: Car radios are an example of configurable logic: (a) cheap and cheerful model; (b) expensive model (the same board is customized by changing jumpers/wire links, adding components, and adding buttons and flashing lights)
In fact it is common for a number of competing automotive manufacturers to use the same basic circuit board, and for the main differences between the various models to be the quality of their cases and the number of buttons and dials that they support. Of course the circuit boards used in these radios are only configurable from the viewpoint of the manufacturer; from the perspective of the user their function is cast in stone. A similar example that may be a little closer to home revolves around digital wrist watches. It is possible to purchase very inexpensive watches with very limited functionality. It is also possible to purchase extremely expensive watches that can play sixteen immediately annoying tunes and simultaneously display the current time in Tokyo, London, and New York. But it's not beyond the bounds of possibility that both models contain identical integrated circuits! In the case of the simpler model, a hard-wired voltage level applied to one of the device's pins instructs it to pretend to be "cheap and cheerful." Once again, the major difference between the two models is the quality of their cases and ... the price tag.
Reconfigurable Logic

Hardware that is simply configurable is obviously limited, because everything that the product can do has to be designed into its base configuration, which has to encompass all possible variants. One technique for producing a product whose function may be extended beyond its original design objectives is to base that product on devices that can be programmed. For example, PLDs can be employed to act as hardware truth tables, Boolean equations, or state machines (Figure 21-2).
Figure 21-2: Circuits using PROMs provide examples of reconfigurable logic (exchanging the old PROM for a new one with different contents changes the truth table mapping inputs a, b, c to outputs w, x, y)

In all of these cases, the functions of the truth tables, Boolean equations, and state machines can be modified by simply exchanging the programmable device with an upgraded version. Another, similar technique is to use non-volatile memory devices to store firmware programs for use by a microprocessor or a microcontroller. An example could be a set of instructions used by a microprocessor to play a tune such as the National Anthem on a musical door chime. Different versions of the PROM could be used to allow the product to be marketed in different countries. Additionally, in the case of those countries that count revolution as a national sport, the product could be easily reconfigured to reflect the "tune of the day." The above examples could employ PLD, EPLD, EEPLD, or FLASH-PLD components. In all of these cases, from a board-level perspective the board itself would be classified as reconfigurable. However, from a device-level viewpoint, PLDs would fall into the category of configurable, while their more sophisticated cousins, EPLDs, EEPLDs, and FLASH-PLDs, would be categorized as reconfigurable. Additionally, certain EE-based and FLASH-based components may be referred to as In-System Programmable (ISP), because they can be reprogrammed while remaining resident on the circuit board.
Dynamically Reconfigurable Logic

The advent of SRAM-based FPGAs in the late 1980s presented a new capability to the electronics community: dynamically reconfigurable logic, which means designs that can be reconfigured "on-the-fly." Unfortunately FPGAs can be difficult to characterize, because each FPGA vendor fields a proprietary
architecture. However, a generic architecture that illustrates the sophistication of FPGAs as compared to traditional PLDs could be represented as follows (Figure 21-3).

Figure 21-3: Generic coarse-grained FPGA architecture (programmable logic blocks, programmable switching matrices, and programmable connection matrices)

The device consists of a number of programmable logic blocks, each of which is connected to a number of programmable connection matrices, which are in turn connected to a number of programmable switching matrices. Each programmable logic block may contain a selection of primitive logic gates and register elements. By programming the appropriate links, each logic block can be individually configured to provide a variety of combinational and/or sequential functions. The programmable connection matrices are used to establish links to the inputs and outputs of the relevant logic blocks, while the programmable switch matrices are used to route signals between the various connection matrices. In short, by programming the appropriate links in the connection and switch matrices, the inputs and outputs of any logic block can be connected to the inputs and outputs of any other logic block. FPGAs can contain a large number of logic gates and registers which can be connected together in widely different ways to achieve a desired function. SRAM-based variants augment the capabilities of standard FPGAs by allowing new configuration data to be down-loaded into the device by the main system in a fraction of a second. In the case of these devices, a few of the external pins are dedicated to the task of loading the data; these include enable, clock, and data inputs. When the enable input is placed in its active state, edges on the clock are used to load the device's SRAM with a stream of logic 0s and logic 1s, which are presented to the serial data input. Although all of the logic gates and SRAM cells are created on the surface of a single piece of silicon substrate, it may be useful to visualize the device as comprising two distinct strata: the logic gates and the programmable SRAM "switches" (Figure 21-4).
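The serial loading just described can be modeled as a simple shift register (the class and values below are mine, purely illustrative; a real bitstream is, of course, far longer and carries framing information as well):

    class SramFpga:
        def __init__(self, n_cells):
            self.cells = [0] * n_cells     # the programmable SRAM "fuses"

        def clock_edge(self, enable, data):
            if enable:                     # shift one bit in per clock edge
                self.cells = self.cells[1:] + [data]

    fpga = SramFpga(8)
    for bit in [1, 0, 1, 1, 0, 0, 1, 1]:   # the configuration bitstream
        fpga.clock_edge(enable=1, data=bit)
    print(fpga.cells)  # -> [1, 0, 1, 1, 0, 0, 1, 1]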
Figure 21-4: SRAM-based FPGAs support dynamically reconfigurable logic

The versatility of these devices, which may be referred to as In-Circuit Reconfigurable (ICR), opens the floodgates to a wealth of possibilities. For example, when a system is first powered-up, it might configure all of the FPGAs to perform diagnostic functions, both on themselves and on the circuit board on which they reside. After the diagnostic checks have been completed, the system can dynamically reconfigure the FPGAs to fulfill the main function of the design. Another example is illustrated by the Tomahawk cruise missile, which uses one technique to control itself while flying over water and another while soaring over land. When the Tomahawk crosses the boundary from water to land or vice versa, it causes its FPGAs to be dynamically reconfigured, changing from water-navigation mode to land-navigation mode in a fraction of a second. (Of course, some of us might take the view that it is inherently unwise to have an armed missile flying around in a mindless state while it reprograms its own brain, but philosophical questions such as these are beyond the scope of this article.)

Perhaps a more interesting application of dynamically reconfigurable logic was proposed by the British company Pilkington Microelectronics (PMEL). PMEL's idea is to construct a device containing an array of DSP cores. Each of these cores would be dynamically reconfigurable, as would the interconnect between them. Perhaps the most interesting point about this proposed device is that the configuration data would not be loaded serially through special pins, but would instead be appended to the front-end of the data stream and loaded in parallel (Figure 21-5).
Figure 21-5: The power of dynamically reconfigurable logic is demonstrated by PMEL's proposed DSP device

Assuming that your home computer contained such a device and that you wished to view a video file, then irrespective of the compression scheme used to create the video file (MPEG, Motion JPEG, fractal-based, and so forth), the operating system would simply append the appropriate configuration data to the front of the data stream and the device would reconfigure itself accordingly. The advantage of this technique is that it can accommodate evolutionary changes to existing compression algorithms, and has the potential to handle completely new compression techniques as and when they become available. The end result is that a computer based on this technology might actually survive more than three months before becoming obsolete!
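To make the idea concrete, the following C fragment sketches how a driver might dispatch such a stream. Note that the header layout and the dsp_array_* functions are invented here purely for illustration; PMEL's actual format is not being described:

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical stream layout -- invented for illustration only. */
    typedef struct {
        uint32_t config_bytes;    /* size of the configuration block */
    } stream_header;

    extern void dsp_array_configure(const uint8_t *config, uint32_t len);
    extern void dsp_array_process(FILE *data);

    /* Peel the configuration data off the front of the stream, reconfigure
       the DSP array, then feed it the remaining (compressed video) data. */
    void play_stream(FILE *stream)
    {
        stream_header hdr;
        static uint8_t config[1 << 16];

        if (fread(&hdr, sizeof hdr, 1, stream) != 1) return;
        if (hdr.config_bytes > sizeof config) return;
        if (fread(config, 1, hdr.config_bytes, stream) != hdr.config_bytes) return;

        dsp_array_configure(config, hdr.config_bytes);  /* device rebuilds itself */
        dsp_array_process(stream);                      /* decode with new logic  */
    }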
Virtual Logic

The main limitation with the majority of SRAM-based FPGAs is that it is necessary to load the whole device. Apart from anything else, it is usually necessary to halt the operation of the entire circuit board while these devices are being reconfigured. Additionally, the contents of any registers in the FPGAs are irretrievably lost during the process. To address these issues, a new generation of FPGAs was introduced around the beginning of 1994. In addition to supporting the dynamic reconfiguration of selected portions of the internal logic, these devices also feature:
a) No disruption to the device's inputs and outputs.
b) No disruption to the system-level clocking.
c) The continued operation of any portions of the device that are not undergoing reconfiguration.
d) No disruption to the contents of internal registers during reconfiguration, even in the area being reconfigured.
The latter point is of particular interest, because it allows one instantiation of a function to hand over data to a new instantiation of a function. For example, a group of registers may initially be configured to act as a binary counter. Then, at some time determined by the main system, the same registers may be reconfigured to operate as a Linear Feedback Shift Register (LFSR),[1] whose seed value is determined by the final contents of the counter before it was reconfigured.

[1] Linear Feedback Shift Registers were introduced in Chapter 17.

Although these devices are evolutionary in terms of technology, they are revolutionary in terms of the potential they offer. To reflect their new capabilities, appellations such as virtual logic, adaptive logic, and Cache Logic are beginning to emerge (Cache Logic is a trademark of Atmel Corporation). Because it appears likely that this nomenclature may quickly work its way into mainstream usage, it is appropriate to take a few moments to explain the roots of these terms.

The phrase "virtual logic" is derived from its software equivalent, "virtual memory," and both are used to imply something that is not really there. In the case of virtual memory, the computer's operating system pretends that it has access to more memory than is actually available. For example, a program running on the computer may require ten megabytes to store its data, but the computer may have only five megabytes of memory available. To get around this problem, whenever the program attempts to access a memory location that does not physically exist, the operating system performs a sleight-of-hand and exchanges some of the contents in the memory with data on the hard disk. This practice, known as swapping, allows the program to perform its task without having to wait while someone runs down to the store to buy some more memory chips.

Similarly, the phrase "Cache Logic" is derived from its similarity to the concept of "cache memory," in which high-speed, expensive SRAM is used to store active data, while the bulk of the data resides in slower, lower-cost memory devices such as DRAM (in this context, "active data" refers to data or instructions that a program is currently using, or which the operating system believes that the program will want to use in the immediate future).

In fact the concepts behind virtual logic are actually quite easy to understand. Each large macro-function in a device is usually formed by the combination of a number of smaller micro-functions such as counters, shift registers, and multiplexers. Two things become apparent when a group of macro-functions are divided into their respective micro-functions. First, there is functional redundancy, in which an element such as a counter may be used several times in different places. Second, there is functional latency, which means that only a portion of
the micro-functions are active during any given clock cycle. Thus, the ability to dynamically reconfigure individual portions of a virtual logic device means that a relatively small amount of logic can be used to implement a number of different macro-functions. By tracking the occurrence and usage of each micro-function, then consolidating functionality and eliminating redundancy, virtual logic devices can perform far more complex tasks than they would appear to have logic gates available. For example, in a complex function requiring 10,000 equivalent gates, only 2,000 gates may be active at any one time. Thus, by storing, or caching, the functions implemented by the extra 8,000 gates in a separate memory device, the proponents of this technology claim that a smaller, faster 2,000-gate device can be used to replace a larger, slower 10,000-gate component (Figure 21-6).
Figure 21-6: Some FPGAs support the concept of virtual logic

In fact, it is even potentially possible to "compile" new design variations in real-time, which may be thought of as dynamically creating subroutines in hardware! Hence the phrase "adaptive logic" noted above.
Limitations of Current Design Methodologies and Tools
One of the major problems in the electronics industry today is managing designs with multiple variants, each of which may have a number of versions, each of which, in turn, may have a number of revisions. Another problem is in accurately representing sophisticated interconnect delay effects, particularly in the case of
deep submicron technologies. These problems are only exacerbated in the case of dynamically reconfigurable systems. Today's designers are armed with an impressive arsenal of tools which offer a wealth of capabilities, such as graphical representations (including state diagrams and flow charts), automatic HDL generation from these graphical representations, and synthesis technology to progress the designs to the implementation level. The problem is that today's tools are predominantly focused on static views of a design, in which the design's core functionality does not change. Additionally, the current method of communicating design data at the implementation level between tools such as design capture and design layout is usually in the form of netlists. The limitations of the tools dictate that each permutation of a design must be verified in isolation, which results in extremely cumbersome methodologies. Also, in the case of dynamically reconfigurable logic, the designer loses access to anything that is happening during the process of reconfiguration. Today's methodologies and tools can be used for designs that have a limited number of permutations; for example, designs with a self-test mode and an operating mode as discussed above. But the problems may well become untenable in the case of virtual logic, in which there may be hundreds or thousands of different configurations. Even worse is the possibility that, within the foreseeable future, some of these configurations may be determined by the hardware itself. If this seems unlikely, consider the somewhat equivalent case of neural networks, which can make decisions without their designers being fully cognizant of the process by which these decisions are arrived at. Thus, although the wave of new devices introduced above offers almost unbounded possibilities, current methodologies and design tools are ill-equipped to adequately manage dynamically reconfigurable designs. Addressing these methodologies and creating these tools is likely to be one of the major tasks facing the design tool industry in the coming years.

The topic in this chapter was published in a condensed form under the title Logic that Mutates While You Wait, in the November 7th, 1996 issue of EDN (www.ednmag.com), and is reproduced in its original form here with their kind permission. For your further reading pleasure, portions of this article were abstracted from the book: Bebop to the Boolean Boogie (An Unconventional Guide to Electronics), ISBN 1-878707-22-1, with the kind permission of HighText Publications (www.hightext-publications.com) (see also the order form in the back of this book).
Chapter 22:
Genetic Algorithms
"Programs that boggle the mind" In this chapter you will discover: Solution Spaces a n d D r e a m s c a p e s ..................................................................298 Alternative Search Techniques ..................................................................................300 G e n e t i c Algorithms ..........................................................................................................................301 So W h a t Are They G o o d For? ......................................................................................... 304
Solution Spaces and Dreamscapes

Let's assume that (for purposes unknown) you construct a simple device consisting of a power source and two potentiometers[1] in series with an incandescent light bulb (Figure 22-1). Let's further assume that the potentiometers can be rotated through a 180 degree range from -90 degrees to +90 degrees, and that the point of least resistance for each potentiometer is in its center (upright) position.
Figure 22-1: A simple device consisting of a power source and two potentiometers in series with a light
Now suppose that you were to set the potentiometers to random positions, then proffer the device to a friend whom you ask to play with it so as to make the light as bright as possible (using the potentiometers - they aren't allowed to do anything weird like doubling the supply voltage). Most people would select one of the potentiometers, turn it a little way to the left or right, and note the result. If the light gets brighter they will continue to turn the potentiometer in that direction. Alternatively, if the light dims, they will reverse their original action and try turning the potentiometer the other way. After a little experimentation, your friend would soon discover the optimal setting for the first potentiometer, for which the lamp's brightness reaches its peak. They would then turn their attentions to the second potentiometer and perform a similar sequence of actions.

The phrase "solution space" refers to all of the possible solutions associated with a particular problem. We can represent the simple solution space corresponding to this problem in the form of a three-dimensional graph (Figure 22-2). In this particular case, we're assuming that the x axis corresponds to the rotation of the first potentiometer, the z axis corresponds to the second potentiometer, and the y axis reflects the brightness of the light.

[1] Potentiometers are a form of variable resistor.
Figure 22-2: The solution space corresponding to our simple problem can be represented as a three-dimensional graph

If we now assume the presence of some computer-controlled actuators capable of rotating the potentiometers, and also of some sensor to measure the brightness of the bulb (say a photo-detector, or perhaps some method of measuring the current in the circuit), then we could decide to write a computer program to automatically determine the optimal settings for the potentiometers. But what algorithm should our program employ?

One possible approach would be to use a "hill climbing" algorithm, which is based on the same scenario we discussed with your imaginary friend. Using this technique, the computer would commence at some random point and make a small modification to one of the potentiometers. The program would then check to see if the quality of the solution (as reflected by the sensor measurement) had improved or otherwise. In the case of the former, the program would continue to effect changes to the potentiometer in the same direction; but if the original modification had caused the solution to worsen, then the computer would reverse the direction in which it had changed the potentiometer and try again. Also, the program would require some way to recognize when it had reached its "peak" with this potentiometer; that is, when a change in either direction causes the solution to worsen. At this point the program must realize that it's time to move on to the other potentiometer, otherwise the program would commence to oscillate around the first potentiometer's center position without progressing any further.

This "hill-climbing" technique is certainly of use for some classes of problems, but it can run into trouble as the complexity of the solution space increases. For example, suppose we added more potentiometers to our device, and further suppose that the relationships between the various potentiometers were nontrivial; that is, turning a pot in one direction may cause the lamp to brighten or dim depending on the positions of one or more of the other potentiometers. In this case, our solution space may consist of a number of "hills" and "valleys" (Figure 22-3).
The problem here is that our hill climbing algorithm is obliged to commence at some random point. This means that it's more than possible for the algorithm to manipulate the potentiometers in such a way that the program locates some local maxima (the top of one of the "hills"), but the algorithm has no way of "seeing" any other "hills," some of which may be higher. A low-level solution would be to perform a number of searches, commencing each new search from a randomly generated starting point, but this still fails to provide 100% confidence that our algorithm has discovered the "Everest" of its solution space. Furthermore, as the number of variables used to describe a problem increases, the result can be a solution space containing constructs such as "tunnels," "bridges," and even weirder elements like "cities in the sky" (the names of these constructs are fairly self-explanatory). As the topology of a solution space approaches these levels of complexity, finding the highest peak (or even being able to tell the difference between "up" and "down") becomes highly problematical.

Figure 22-3: Some solution spaces may contain a number of "hills" and "valleys"
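For the terminally curious, the hill climbing loop really is only a few lines of C. In the following minimal sketch the brightness() function is a smooth, single-peaked stand-in for the real photo-detector, so don't expect it to cope with the "hills" and "valleys" just discussed:

    #include <stdio.h>

    #define STEP 1.0   /* degrees per adjustment */

    /* Hypothetical stand-in for the sensor: returns the lamp brightness
       for the given potentiometer angles. Faked here with a smooth,
       single-peaked function whose maximum sits at (0, 0). */
    static double brightness(double p1, double p2)
    {
        return 100.0 - (p1 * p1 + p2 * p2) / 100.0;
    }

    /* Climb one potentiometer at a time until neither direction improves. */
    int main(void)
    {
        double pot[2] = { -63.0, 41.0 };   /* random starting positions */

        for (int p = 0; p < 2; p++) {
            for (;;) {
                double here = brightness(pot[0], pot[1]);
                pot[p] += STEP;            /* try turning one way...    */
                if (brightness(pot[0], pot[1]) > here) continue;
                pot[p] -= 2.0 * STEP;      /* ...then try the other way */
                if (brightness(pot[0], pot[1]) > here) continue;
                pot[p] += STEP;            /* neither helped: at a peak */
                break;
            }
        }
        printf("pot1 = %.1f, pot2 = %.1f degrees\n", pot[0], pot[1]);
        return 0;
    }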
Alternative Search Techniques
The hill climbing algorithm discussed above is only one of a variety of search techniques, where such techniques may be grouped into three main classes (Figure 22-4).
Figure 22-4: Search techniques may be categorized as being calculus-based, enumerative, or guided random searches
Calculus-based techniques can be sub-divided into two main groups called direct and indirect methods, where direct methods (such as those described by Fibonacci and Newton) skip merrily around the search space and evaluate the gradient at each new point as they trek around looking for peaks. Our hill climbing algorithm falls into this class, which is typically only of use in the case of "well-behaved" solution spaces. By comparison, enumerative techniques tend to search every point in the solution space, one at a time, so they are relatively easy to implement and are applicable to a wide range of problems, but they can take an inordinate amount of processing time, which can therefore make them effectively unusable for many problems. For example, consider another simple circuit involving 100 switches, each of which can be "up" or "down." Now assume that each of these switches provides a contribution to some measurable value (such as voltage, current, resistance, ...), but also that the amount of each switch's contribution (and whether or not said contribution is additive or subtractive) depends on the state of one or more of the other switches. (Note there's an underlying assumption that the switches' contributions and interactions are non-random, but instead reflect some underlying meaningful relationships.) This problem would certainly be applicable to a rudimentary enumerative solution, in that we could simply cycle through every possible combination of switches and measure the result. However, the fact that there are 2^100 different combinations means that even if we evaluated 10 million solutions a second, it would be one heck of a long time before we found the optimal result (try working out the math for yourself, the answer will scare you).
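(If you'd rather peek at the answer: 2^100 is roughly 1.27 x 10^30 combinations, so at 10 million evaluations per second the exhaustive search would take about 1.27 x 10^23 seconds, which works out to approximately 4 x 10^15 years, or hundreds of thousands of times the present age of the universe.)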
The third major class of search algorithms is guided random searches, which are in some respects similar to enumerative techniques, except that they employ additional information to guide the search. One subclass of this category is simulated annealing, which is based on the thermodynamic model of cooling metals. (Algorithms based on the simulated annealing concept have appeared in some surprising applications, such as place-and-route software for circuit boards.) The other major subclass of guided random searches encompasses evolutionary algorithms, of which one of the most interesting groups (at least to me) is that of genetic algorithms.
Genetic Algorithms

The key concept underlying genetic algorithms is that they mimic evolutionary processes in the natural world; specifically, those of natural selection based on the "fitness" of individuals in a population that evolves by exchanging genetic material and also by random mutations. The principles underlying genetic
algorithms are actually quite simple, and were first described by J.H. Holland in the early 1970s.[2] First of all we manipulate the problem under consideration in such a way that its variables can be represented as a string of 0s and 1s. This would obviously be a simple task in the case of variables that can occupy a limited number of discrete states, such as the switches in our 100-switch example discussed above (we'll consider how to handle analog variables a little later). Then we "seed" our environment with an initial "population" of randomly generated strings; that is, strings containing random sequences of 0s and 1s (Figure 22-5a). Next we evaluate each string in our population by testing the measurable quantity (or quantities) in our system to see how close we came to some desired result. We use the results of this evaluation to assign a "fitness" to each string, then we rank the strings in order of this fitness (Figure 22-5b). Low-ranking strings may be discarded, while high-ranking strings represent the individuals that will be permitted to "breed"; that is, the strings that ranked the highest will be permitted to generate the offspring that will form part of the next generation.

[2] Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, MI, 1975.
Figure 22-5: After generating a random initial population (a), we rank the individual strings according to their fitness (b), then the fittest strings are allowed to "breed" (c)
Now here's the clever part, because the strings that will act as the parents for the next generation undergo a process known as "crossover," in which we emulate the natural process of exchanging genetic material in order to create "offspring" strings (Figure 22-5c). Note that the original parent strings remain part of this new population, because we don't want to discard the best solutions we've obtained thus far. Also note that we've only shown a small initial population and the
mating of two pairs of strings; in reality we would have a much larger population pool and there would be many such unions. Furthermore, some algorithms allow strings to "mate" proportional to their fitness, in which case a high-ranking string will be allowed to mate with more strings than a lower-ranking companion. Thus we see that a key feature to genetic algorithms is "survival of the fittest," whereby the fitter strings generate more offspring and therefore have a higher chance of passing their "genetic information" on to future generations.

Now let's consider the process of crossover in a little more detail. First we take two strings that have been selected to "mate" (we'll focus on strings B and D from Figure 22-5). Next we randomly select a "crossover point," which can occur anywhere throughout the length of the strings, and we merge the left side of string B with the right side of string D, and vice versa; thereby generating two new strings (Figure 22-6).
Figure 22-6: After generating a random "crossover point," we fragment the parent strings around this point and recombine them to generate their offspring

Note that, for the purposes of simplicity, Figure 22-6 only considers strings containing 10 bits; in reality, the strings used by genetic algorithms can be hundreds of bits in length. One problem with the crossover technique is that it's possible to lose valuable "genetic" material. For example, if a number of strings evolve to have a group of 0s at the same location, then mating these strings will never manage to generate 1s in those positions. Thus, another important component to genetic algorithms is that of mutation. Following crossover, every bit in each of the new strings has some finite chance of undergoing mutation (that is, our algorithm might decide to flip its state from a 0 to a 1, or vice versa). Note that the probability of mutation is maintained at a very low level (say a chance of 1 in 10,000); also note that each bit is treated independently, such that the mutation of one bit doesn't affect the probability of surrounding bits being mutated. From the above we can derive a high-level view of the genetic algorithm process as follows:
Establish the encoding mechanism
Establish a fitness function
Establish a selection mechanism
Establish crossover and mutation functions

Generate an initial population
While termination is not reached {
    Evaluate the population
    Rank the population
    Select individuals to mate
    Perform crossover
    Perform mutation
}
End
Note that all of the actions in the inner loop would equate to one cycle (or generation), and that termination of the loop may occur after some specific number of cycles (which could number in the tens of thousands) or when one or more solutions come close enough to the desired result. The above discussions focused on digital variables such as switches, but genetic algorithms can easily handle analog variables as well. For example, the angular position of the potentiometers in Figure 22-1 could easily be encoded into a certain number of bits; a 1-byte field would allow us to encode the 180 degree range of one of the potentiometers to an accuracy of 0.7 degrees, and more bits could be used to increase this accuracy as required. Also, multiple potentiometers simply equate to more bits in the main string, and both analog and digital quantities can easily be mixed together, because the genetic algorithm just sees them as strings of 0s and 1s.
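By way of illustration, the crossover and mutation steps boil down to just a few lines of C. The following is a minimal sketch under the conventions above (the population handling, fitness function, and selection mechanism are left out, and the constants are purely illustrative):

    #include <stdlib.h>

    #define BITS 100             /* bits per string            */
    #define MUTATION_ODDS 10000  /* 1-in-10,000 chance per bit */

    /* Single-point crossover: merge the left side of parent a with the
       right side of parent b, and vice versa, around a random point. */
    static void crossover(const char *a, const char *b, char *child1, char *child2)
    {
        int point = rand() % BITS;
        for (int i = 0; i < BITS; i++) {
            child1[i] = (i < point) ? a[i] : b[i];
            child2[i] = (i < point) ? b[i] : a[i];
        }
    }

    /* Mutation: each bit independently has a small chance of flipping. */
    static void mutate(char *s)
    {
        for (int i = 0; i < BITS; i++)
            if (rand() % MUTATION_ODDS == 0)
                s[i] ^= 1;   /* strings hold raw 0/1 values, so XOR flips a bit */
    }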
So What Are They Good For?

Although the whole concept of genetic algorithms might seem a bit nebulous, their use of pseudo-natural selection and mutation manages to direct the search towards regions of high potential in the solution space. These mechanisms also allow genetic algorithms to explore a greater range of potential solutions than do more conventional search techniques, and to converge on optimal results in complex solution spaces faster and more efficiently than other approaches. One application for genetic algorithms in the electronics arena would be to fine-tune analog circuits, which often have contradictory requirements such as increasing edge-rates while reducing power consumption. These problems, which contain large numbers of variables (component values) with complex
interrelationships, are ideally suited to a genetic algorithm approach. In this case the algorithm would be used in conjunction with an analog simulator: the algorithm modifies the variables, and the simulator measures the results (thereby determining the fitness of the genetic population). Two other fields of interest are those of fuzzy logic and neural networks. These disciplines have been successfully combined on a number of occasions, resulting in two hybrids: neuro-fuzzy (in which the neural network is used to establish a set of fuzzy rules) and fuzzy-neural (in which fuzzy logic is used to fine-tune a neural network). Similarly, the concept of genetic algorithms opens the doors to four more hybrids: genetic-neuro, neuro-genetic, genetic-fuzzy, and fuzzy-genetic. However, perhaps one of the more exciting potential areas for future research involves the possibility of combining the concepts of genetic algorithms with those of adaptive logic. As was discussed in Chapter 21, the phrase "adaptive logic" refers to the latest generation of FPGAs, whereby individual portions of the device can be reconfigured "on-the-fly" without disturbing the operation of the rest of the device. This makes it possible to "compile" new design variations in real-time, which may be thought of as dynamically creating subroutines in hardware! Thus, we might consider implementing a genetic algorithm in the form of dynamically evolving hardware. The implications of all of this may make the mind boggle, but it's better than being bored!

The topic in this chapter was published in a condensed form under the title Genetic algorithms: programs that boggle the mind, in the March 3rd, 1997 issue of EDN (www.ednmag.com), and is reproduced in its original form here with their kind permission.
Chapter 23:
Reed-Müller Logic

"Hug an XOR gate today!"

In this chapter you will discover:
Representations Other Than Boolean
Pure Reed-Müller and Karnaugh Maps
Binary-to-Gray Converters Revealed
Partial Reed-Müller Implementations
Is the Partial Reed-Müller Concept Advantageous?
Representations Other Than Boolean

As designers we're used to representing digital functions using Boolean expressions in conventional sum-of-products or product-of-sums forms, which we predominantly implement using AND, OR, NAND, NOR, and NOT gates. However, there are other forms of representation, such as Reed-Müller logic, in which XOR gates are prominently featured. Employing XOR gates in functions often facilitates the testing of circuits, and such implementations can also offer significant benefits in terms of reduced area utilization by employing fewer transistors and tracks. The classical form of Reed-Müller logic is based on sum-of-products-style representations in which the OR gates are replaced with XORs. But there are also functions which can be implemented using only XOR and XNOR gates, which, for the purposes of this discussion, I have chosen to call "pure" Reed-Müller functions.
Pure Reed-Müller and Karnaugh Maps

One indication as to whether a function is suitable for a pure Reed-Müller form of implementation is if that function's Karnaugh Map displays a checkerboard pattern of 0s and 1s. For example, consider the truth table and Karnaugh Map associated with a familiar 2-input function (where & = AND, | = OR, ^ = XOR, and ~ = NOT) (Figure 23-1).
a b | y
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 0

Boolean: y = (~a & b) | (a & ~b)
Reed-Müller: y = a ^ b
Figure 23-1: The checkerboard pattern for this 2-input function indicates its suitability for a pure Reed-Müller implementation
As this truth table is immediately recognizable as being that for an XOR function, it should come as no great surprise to find that implementing it as a single XOR gate is somewhat simpler than an implementation based on multiple AND, OR,
and NOT gates. Furthermore, similar checkerboard patterns apply to functions with more inputs (Figure 23-2).
(a) 3-input function: y = a ^ b ^ c
(b) 4-input function: y = a ^ b ^ c ^ d
Figure 23-2: The checkerboard patterns for these 3- and 4-input functions indicate their suitability for pure Reed-Müller implementations

As XORs are both commutative (a ^ b) = (b ^ a) and associative (a ^ b) ^ c = a ^ (b ^ c), it doesn't matter which combinations of inputs are applied to the individual gates. The simple checkerboard patterns illustrated thus far are reasonably well documented. However, while playing mind-games one evening, I realized that checkerboard patterns involving larger groups of 0s and 1s also indicate functions suitable for pure Reed-Müller implementations. Some of these patterns may not initially appear to be obvious, so the trick I use is to visualize the Karnaugh Map as being a ceramic tile, and to then imagine the effect of covering a surface using a number of such tiles (Figure 23-3). Having recognized a checkerboard pattern, there is a quick "rule of thumb" for determining the variables to be used in the Reed-Müller implementation. First we select any group of 0s or 1s and identify the significant and redundant input variables associated with that group (where the significant variables are those whose values are the same for all of the boxes forming the group, while the redundant variables are those whose values vary between boxes). The resulting
Reed-Müller expression is determined by simply XOR-ing the significant variables (note that it is irrelevant whether the significant variables have 0 or 1 values) (Figure 23-4).
Figure 23-3: Visualizing a Karnaugh Map as a floor tile can reveal larger checkerboard patterns suitable for pure Reed-Müller implementations
(Karnaugh maps corresponding to the functions:)

y = b ^ d        y = a ^ c        y = a ^ b
y = a ^ b ^ c    y = b ^ c        y = a ^ b ^ d
Figure 23-4: Example functions suitable for pure Reed-Müller implementation
Note that there are a tremendous number of these checkerboard patterns, and the ones shown in this article represent only a small sample. Due to the fact that all of the checkerboard patterns in Figure 23-4 include a logic 0 in the box in the upper left-hand corner of the Karnaugh Map (which corresponds to all of the inputs being 0), the resulting Reed-Müller implementations can be realized using
only XOR gates. In fact any pair of XOR gates may be replaced with XNORs, the only requirement being that there are an even number of XNORs. However, if the checkerboard pattern includes a logic 1 in the box in the upper left-hand corner of the Karnaugh Map, then the Reed-Müller implementation must contain an odd number of XNOR gates or an inversion at the output. As usual, it does not matter which combinations of inputs are applied to the individual XORs and XNORs (Figure 23-5).
(Karnaugh maps corresponding to the functions:)

y = ~(c ^ d)     y = ~(a ^ c)     y = ~(a ^ b ^ c ^ d)
y = ~(a ^ d)     y = ~(a ^ c ^ d)
Figure 23-5: A 1 in the upper left-hand corner of the map indicates implementations requiring an odd number of XNOR gates or an inversion at the output

There are a variety of mathematical techniques for extracting Reed-Müller expressions, but these are typically time consuming and complex (they certainly make my eyes water). By comparison, the checkerboard Karnaugh Map patterns shown here are easy to recognize and evaluate.
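In the same spirit as the simple C program mentioned in the note at the end of this chapter (though the sketch below is written from scratch and makes no claim to be that program), recognizing a pure Reed-Müller function by machine is straightforward: a function is pure Reed-Müller exactly when its output is the XOR of some subset of its inputs, possibly inverted, so we can simply try every subset:

    #include <stdio.h>

    /* Return the parity (XOR-sum) of the set bits in v. */
    static int parity(unsigned v)
    {
        int p = 0;
        while (v) { p ^= v & 1; v >>= 1; }
        return p;
    }

    /* Given a truth table f[] for n inputs (f[x] is the output for input
       combination x), report whether the function is the XOR (or XNOR) of
       some subset of its inputs -- i.e., pure Reed-Muller. */
    static int is_pure_reed_muller(const int *f, int n, unsigned *mask_out, int *inv_out)
    {
        unsigned rows = 1u << n;
        for (unsigned mask = 1; mask < rows; mask++) {
            for (int inv = 0; inv <= 1; inv++) {
                unsigned x;
                for (x = 0; x < rows; x++)
                    if (f[x] != (parity(x & mask) ^ inv))
                        break;
                if (x == rows) { *mask_out = mask; *inv_out = inv; return 1; }
            }
        }
        return 0;
    }

    int main(void)
    {
        int xor2[4] = { 0, 1, 1, 0 };   /* the 2-input XOR of Figure 23-1 */
        unsigned mask; int inv;
        if (is_pure_reed_muller(xor2, 2, &mask, &inv))
            printf("pure Reed-Muller: mask=%x, inverted=%d\n", mask, inv);
        return 0;
    }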
Binary-to-Gray Converters Revealed

Reed-Müller implementations are often appropriate for circuits performing arithmetic or encoding functions; for example, binary-to-gray converters and their gray-to-binary counterparts (where a gray code is one in which only a single bit differs between adjacent states).
When digital textbooks come to consider the task of converting binary codes to gray codes and vice versa, they almost invariably show implementations based on XOR gates, but they rarely explain how these solutions were derived. As a student I found this immensely frustrating, because, although I could extract the Boolean equations, it was difficult to figure out how to boil them down into the XOR implementations presented in the books. However, the graphical techniques described above immediately reveal the potential for pure Reed-Müller implementations. For example, consider the case of a binary-to-gray code converter (Figure 23-6).
b[3:0]   g[3:0]
 0000     0000
 0001     0001
 0010     0011
 0011     0010
 0100     0110
 0101     0111
 0110     0101
 0111     0100
 1000     1100
 1001     1101
 1010     1111
 1011     1110
 1100     1010
 1101     1011
 1110     1001
 1111     1000
(Karnaugh maps for g[3] through g[0]; the checkerboard patterns yield g[3] = b[3], g[2] = b[3] ^ b[2], g[1] = b[2] ^ b[1], and g[0] = b[1] ^ b[0].)
Figure 23-6: Checkerboard Karnaugh Map patterns reveal that a binary-to-gray converter is amenable to a pure Reed-Müller solution
Similar checkerboard patterns are also seen in the case of a gray-to-binary converter, but this is left as an exercise for the reader.
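Incidentally, the XOR structure revealed by these maps corresponds to the familiar shift-and-XOR conversion. A minimal C sketch of both directions follows (the gray-to-binary half being one way to check your answer to the exercise):

    #include <assert.h>

    /* Each gray bit is the XOR of the corresponding binary bit with the
       next-more-significant binary bit, which collapses to a single
       shift-and-XOR. */
    unsigned binary_to_gray(unsigned b)
    {
        return b ^ (b >> 1);
    }

    /* The reverse conversion must ripple: each binary bit is the XOR of
       all gray bits at or above that position. */
    unsigned gray_to_binary(unsigned g)
    {
        unsigned b = 0;
        while (g) { b ^= g; g >>= 1; }
        return b;
    }

    int main(void)
    {
        for (unsigned i = 0; i < 16; i++)   /* round-trip every 4-bit value */
            assert(gray_to_binary(binary_to_gray(i)) == i);
        return 0;
    }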
Partial Reed-Müller Implementations

For those applications that are susceptible to pure Reed-Müller solutions, the savings in the numbers of gates required to realize the function can be extremely significant. Unfortunately, only a relatively few functions are amenable to being implemented in this way, and these functions tend to be the ones that we already know about, such as parity generators and binary-to-gray converters. However, a
surprising number of well-known functions are susceptible to partial Reed-Müller implementations. For example, consider a simple 4-bit synchronous binary counter based on four D-type flip-flops and some combinational logic (Figure 23-7).
Figure 23-7: A 4-bit synchronous binary counter can be considered as a state machine
The traditional approach to determine the contents of the combinational logic is to draw a truth table showing the current count value as inputs and the next count value as outputs, and to then extract the Boolean equations for the combinational logic required to generate the next count value. This would result in the following first-pass solution:
d3 = (~q3 & q2 & q1 & q0) | (q3 & ~q2) | (q3 & ~q1) | (q3 & ~q0)
d2 = (~q2 & q1 & q0) | (q2 & ~q1) | (q2 & ~q0)
d1 = (~q1 & q0) | (q1 & ~q0)
d0 = ~q0
Of course, this first pass solution neglects to take account of any characteristics associated with the physical implementation, such as the switching speeds of the logic gates or the number of transistors they require. However, we know that NAND gates require fewer transistors and switch faster than AND and OR gates, so we can use standard DeMorgan transformations to achieve a more optimal solution as follows:
d3 = ~( ~(~q3 & q2 & q1 & q0) & ~(q3 & ~q2) & ~(q3 & ~q1) & ~(q3 & ~q0) )
d2 = ~( ~(~q2 & q1 & q0) & ~(q2 & ~q1) & ~(q2 & ~q0) )
d1 = ~( ~(~q1 & q0) & ~(q1 & ~q0) )
d0 = ~q0
Now consider an alternative approach based on an old PLD designer's trick (well, he wasn't all that old), which we might class as being a partial Reed-Müller implementation. In the case of this particular function, any data input d(n) can be determined by XOR-ing the output from the corresponding q(n) register with the AND of all of the less-significant register outputs. This would lead to a first-pass solution as follows:
d3 = q3 ^ (q2 & q1 & q0)
d2 = q2 ^ (q1 & q0)
d1 = q1 ^ q0
d0 = ~q0
Once again, the first pass solution takes no account of any characteristics associated with the physical implementation. We could obtain a more efficacious solution by exchanging the AND gates with NANDs and replacing the XORs with XNORs as follows:
d3 = ~( q3 ^ ~(q2 & q1 & q0) )
d2 = ~( q2 ^ ~(q1 & q0) )
d1 = ~( q1 ^ ~q0 )
d0 = ~q0
However, we can achieve a still more optimal solution by keeping the original XORs and driving their (non-NAND driven) inputs from the flip-flops' inverted outputs as follows:

d3 = ~q3 ^ ~(q2 & q1 & q0)
d2 = ~q2 ^ ~(q1 & q0)
d1 = ~q1 ^ ~q0
d0 = ~q0
The reason why this latter solution is preferable to replacing the XORs with XNORs is that it removes some of the loads from the flip-flops' true outputs. This balances the loading on the true and complement outputs, thereby reducing loading delays.
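If you don't trust the algebra, a brute-force check is quick. The following C sketch (assuming nothing beyond the equations above) verifies that the partial Reed-Müller equations step through the same count sequence as a binary increment:

    #include <assert.h>

    int main(void)
    {
        for (unsigned q = 0; q < 16; q++) {
            unsigned q3 = (q >> 3) & 1, q2 = (q >> 2) & 1;
            unsigned q1 = (q >> 1) & 1, q0 = q & 1;

            /* Partial Reed-Muller form: d(n) is q(n) XORed with the AND
               of all of the less-significant register outputs. */
            unsigned d3 = q3 ^ (q2 & q1 & q0);
            unsigned d2 = q2 ^ (q1 & q0);
            unsigned d1 = q1 ^ q0;
            unsigned d0 = !q0;

            unsigned next = (d3 << 3) | (d2 << 2) | (d1 << 1) | d0;
            assert(next == ((q + 1) & 0xF));   /* must match an increment */
        }
        return 0;
    }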
Is the Partial Reed-Müller Concept Advantageous?

At this point it would not be unreasonable to ask whether there are any practical benefits to partial Reed-Müller implementations, or whether this is simply a pointless exercise in logic manipulation. To answer this question, consider the following results obtained for the optimized solutions described above using a commercial ASIC library based on a 0.9 micron CMOS technology (Table 23-1).
Table 23-1: A comparison of the binary counter's combinational logic implemented using standard Boolean logic with an alternative partial Reed-Müller implementation (the comparison covered max delay, transistors, tracks, and track segments).
These results take full account of loading effects experienced by the registers' outputs, but do not include the transistors used to implement the registers (as the registers are common to both implementations). Note that the XORs in this particular ASIC library were buffered pass-transistor implementations, each requiring only ten transistors (as opposed to being constructed from ANDs, ORs, and NOTs, which would require more transistors and would be slower). Also note that no slippery, slimy tricks of an underhand nature were involved: I optimized both solutions as best I could, and I literally used the first ASIC data book that came to hand from my bookshelf. Thus, for this example, the partial Reed-Müller implementation is only slightly slower than its Boolean counterpart, but it offers significant benefits in terms of reduced area utilization because it requires fewer transistors and tracks. This example shows that partial Reed-Müller implementations can offer some interesting benefits to the designer. Unfortunately, the majority of currently available synthesis and optimization tools typically make minimal use of XOR gates due to the complexity of extracting these partial solutions, but it is more than possible that the use of partial Reed-Müller solutions will increase as new techniques become available.
The topic in this chapter was published in a condensed form under the title Hug an XOR gate today: An introduction to Reed-Müller logic, in the March 1st, 1996 issue of EDN (www.ednmag.com), and is reproduced in its original form here with their kind permission. For your further reading pleasure, portions of this article were abstracted from the book: Bebop to the Boolean Boogie (An Unconventional Guide to Electronics), ISBN 1-878707-22-1, with the kind permission of HighText Publications (www.hightext-publications.com) (see also the order form in the back of this book).

As a point of interest, I created a simple C program to examine truth tables, determine those which are suitable for pure Reed-Müller implementations, and extract the appropriate Reed-Müller expressions. One of my "back-burner" projects is to try to develop this program to recognize functions which are amenable to partial Reed-Müller solutions. Should you be at all interested, you can find the source for this program in Bebop to the Boolean Boogie.
Chapter 24:
Testing RAMs and ROMs

"My memory is not what it used to be!"

In this chapter you will discover:
I Never Forget a Whatchamacallit
Whose Fault is it Anyway?
The "Nameless" Test Sequence
The Nameless Sequence Unmasked!
Access Tests for a Single RAM
Internal Tests for a Single RAM
Testing ROMs
Overcoming Serial Access Problems
Unique Problems and Unique Solutions
I Never Forget a Whatchamacallit

While roaming the electronics groups on the Internet, one often sees plaintive pleas for help along the lines of: "Can anyone tell me a method for testing my memory?" Of course, the world being what it is, it's not uncommon for some wag to respond as follows: a) Look at these numbers for ten seconds: 23, 06, 93, 65, 87, 42. b) Now close your eyes and see if you can recite them forwards and backwards. However, although merry quips like this help to make the world go round, the situation tends to lose some of its humor when portions of your circuit are transmogrifying themselves into globs of incandescent slag. In fact, knowing how to test memory devices invariably comes in handy at some juncture and many of the concepts are applicable to testing digital logic in general, so I decided to pen a few choice words on the subject.
Whose Fault is it Anyway?

The term "fault" refers to a physical failure mechanism such as a broken wire, while the term "fault-effect" refers to the way in which a fault manifests itself to the outside world. Although we are certainly interested in fault-effects at the device level, the fault-effects we most often see are the ones that appear at the primary outputs from the system under consideration. In the case of memory devices, faults can be categorized as being either functional or dynamic, where functional faults include bad memory cells or bad access to these cells, while dynamic faults refer to timing failures. For the purpose of this article we will only be considering functional faults. One set of functional faults are predominantly associated with the interconnect (both on the circuit board and in the device), the majority of which will be either stuck-at, bridging, or open faults. A stuck-at fault is a short between a signal and a ground or power plane, so (assuming positive logic) these are referred to as stuck-at-0 and stuck-at-1 faults, respectively. Bridging faults are similar to stuck-ats in that they share common mechanisms (such as solder splashes at the board level or internal shorts at the device level), but in the case of a bridging fault, the unwanted connection is between two or more signals rather than between a signal and a power plane. Finally, an open fault refers to the lack of a desired connection, such as a broken track or a bad solder joint at the board level or a disconnected bonding wire at the device level. Open faults are referenced as open-0, open-1, or open-Z depending on the way in which they manifest themselves (where Z indicates a high-impedance value). For example,
an open-0 fault indicates that a signal or input has become disconnected from its driving device(s), and that this signal or input will consequently "float" to a weak logic 0 value.
The "Nameless" Test Sequence

Assuming for the moment that we're interested in a single RAM device (either in isolation or embedded in the middle of a circuit), the first thing we need to do is to test our access to the device in the form of the address and data busses. The reason we perform these tests first is that they are relatively quick and painless, and it's only after we've proved that we can actually "talk" to the device that we would wish to proceed to the time-consuming process of verifying its internal structures. Before we look at the tests themselves, first consider a group of eight wires named a through h, and assume that we can drive signals into one end of these wires and monitor the results at the other end. The task is to determine the minimum number of test patterns that are required to detect every possible stuck-at, bridging, and open fault on these wires (Figure 24-1).
Figure 24-1: What is the minimum number of test patterns that are required to detect every possible stuck-at, bridge, and open fault on eight wires?

First of all, we know that we must check that each wire can be driven to a logic 0 and a logic 1. This will ensure that there are no stuck-at faults and, ignoring any weird capacitive effects, no open faults. To do this we could use just two test patterns, 00000000 and 11111111, but this would not reveal any bridging faults. In order to detect bridging faults we have to ensure that every wire can carry the opposite logic value to every other wire. One of the simplest test sequences is the "walking ones," in which each wire is driven with a logic 1 while all of the other wires are driven with logic 0s. Thus, for n wires this sequence requires n test patterns, which, at a first glance, doesn't appear to be an unduly excessive requirement (Figure 24-2a). However, for reasons that will become apparent, we often wish to use the smallest possible test sequence that we can. An alternative test sequence that I call the nameless
sequence (because I made it up myself and had never actually seen it documented anywhere until after I'd penned this piece) commences by dividing the wires into two groups. We start by driving the "left-hand" group with logic 1s and the "right-hand" group with logic 0s; then we proceed to divide each group into two sub-groups, and to drive each "left-hand" sub-group with logic 1s and each right-hand sub-group with logic 0s. This continues until we have alternating logic 1s and logic 0s on each wire, at which point we terminate the sequence by simply inverting all of the wires (Figure 24-2b).
(a) Walking ones sequence:

a b c d e f g h
1 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 1 0 0 0 0
0 0 0 0 1 0 0 0
0 0 0 0 0 1 0 0
0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 1

(b) Nameless sequence:

a b c d e f g h
1 1 1 1 0 0 0 0
1 1 0 0 1 1 0 0
1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1
Figure 24-2: The "nameless" sequence requires fewer tests than a "walking ones"
The beauty of the nameless sequence is that whenever the number of wires doubles, we only have to add one new test pattern. That is, 8 wires require 4 test patterns, 16 wires require 5 test patterns, 32 wires require 6 test patterns, and so on. Thus, as the number of wires increases, so does the efficiency of the nameless sequence in comparison to the walking ones sequence. Note that if you're using the nameless sequence and the number of wires is not equal to a power of two (2, 4, 8, 16, 32, 64, ...), then you can add some imaginary wires to create the sequence and discard them at the end. For example, if you have 5, 6, or 7 wires, you would add enough pseudo-wires to bring the total number of wires up to 8, write the test sequence based on 8 wires, then drop the pseudo-wires.

The Nameless Sequence Unmasked!
After this piece first appeared in EDN magazine, I received a jolly pleasant letter from Mr. Norman Megill, Vice President of Engineering at Production Services Corporation, Belmont, MA. Mr. Megill pointed out that my nameless sequence should more properly be referred to as the Modified Counting Sequence Algorithm as per a 1989 IEEE paper:
N. Jarwala and C.W. Yau, "A New Framework for Analyzing Test Generation and Diagnosis Algorithms for Wiring Interconnects," Proceedings, IEEE International Test Conference, 1989, pp. 63-70.

Mr. Megill went on to note that, to the best of his knowledge, this algorithm was first documented by himself in a 1979 paper:

N.D. Megill, "Techniques for Reducing Pattern Counts for Functional Testing," Digest of Papers, IEEE Test Conference 1979, pp. 90-94.

In fact my discussions on the nameless sequence prompted a slew of emails from readers who had independently come up with the same thing. Furthermore, several readers noted a trick they used to generate a variation on the nameless sequence, which simply involves writing down a standard binary count sequence, commencing at 1, proceeding up to the number of wires you wish to test, and then "rotating" the results. For example, assuming that we wish to test 10 wires called a through j for stuck-ats, bridges, and open faults, we would commence by writing the binary values for 1 to 10 (Figure 24-3a).

(a) Standard binary count:

 1 -> 0001
 2 -> 0010
 3 -> 0011
 4 -> 0100
 5 -> 0101
 6 -> 0110
 7 -> 0111
 8 -> 1000
 9 -> 1001
10 -> 1010

(b) Resulting test sequence:

a b c d e f g h i j
1 1 1 0 0 0 0 0 0 0
0 0 0 1 1 1 1 0 0 0
1 0 0 1 1 0 0 1 1 0
0 1 0 1 0 1 0 1 0 1
Figure 24-3: Generating a variation of the nameless sequence
Once we've generated the binary count, we "rotate" the table 90 degrees clockwise (or anti-clockwise if you are so-inclined) to create the final test sequence (Figure 24-3b). This scheme has an advantage over my nameless sequence in that it results in one less test for any number of wires except 2^n (I learn something new every day).
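This count-and-rotate trick is trivially mechanized. The following minimal C sketch reproduces Figure 24-3b for 10 wires (the wire ordering follows the figure, with the leftmost wire carrying the bits of the highest count value):

    #include <stdio.h>

    /* Print the "count-and-rotate" test sequence for n wires: assign the
       counting values 1..n to the wires (highest value leftmost, as in
       Figure 24-3), then emit one test pattern per bit position. */
    void print_test_sequence(int n)
    {
        int bits = 0;
        while ((1 << bits) < n + 1)    /* bit positions needed to count to n */
            bits++;

        for (int row = bits - 1; row >= 0; row--) {
            for (int value = n; value >= 1; value--)
                printf("%d", (value >> row) & 1);
            printf("\n");
        }
    }

    int main(void)
    {
        print_test_sequence(10);   /* the four patterns of Figure 24-3b */
        return 0;
    }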
Access Tests for a Single RAM
As we previously noted, the first thing that we need to do is to ensure that we can actually "talk" to the RAM in question, and for this we need to check the address
and data busses to make sure that there aren't any stuck-at, bridging, or open faults. As you might expect, we can base these tests on either the "walking ones" sequence or the "nameless sequence" as discussed above. Now, you may be thinking that this is reasonably trivial, but in fact there are numerous pitfalls for the unwary. Let's consider a simple RAM that's 256 words deep and 1 bit wide; that is, a device with an 8-bit address bus and a 1-bit data bus. It is immediately obvious that we can't test the address bus in the same way in which we treated the simple wires in Figure 24-1, because, although we can apply stimulus to one end of the address bus, we can't directly monitor any response at the other end of the bus. The problem is that the device itself acts as a buffer, and the only way to monitor the actions of the address bus is by means of the data bus, which, in this case, is only 1 bit wide. Assuming that we decide to base our tests on a "walking ones" sequence, then one approach would be as shown in Figure 24-4.
Values written and read on the 1-bit data bus:

address[7:0]    pass 0   pass 1   pass 2   ...   pass 8
00000000           1        0        0              0
00000001           0        1        0              0
00000010           0        0        1              0
00000100           0        0        0              0
00001000           0        0        0              0
00010000           0        0        0              0
00100000           0        0        0              0
01000000           0        0        0              0
10000000           0        0        0              1

(passes 3 thru 7 continue the pattern, with the solitary 1 moving down one address per pass)
Figure 24-4: This RAM access test based on a "walking ones" sequence requires 81 write cycles and 81 read cycles

Note that we've augmented the walking ones sequence to include an initial address of all 0s. This wasn't necessary in the case of the simple wires in Figure 24-1, because we could actually "see" the stimulus values. But as we just noted, the RAM acts like a buffer, which means that we can't directly "see" the values on the address bus. So if we omitted our new all-0s pattern, then we wouldn't be able to detect any cases where one of the address bits was shorted to a logic 0. Due to the fact that this RAM is only 1 bit wide, the test sequence actually requires a number of cycles, or passes. The first pass commences by writing a 1 into the first address location followed by 0s into all of the other locations, and then reading all of the locations back to ensure that the data is as it should be.
The entire pass is then repeated eight more times, changing the location containing the 1 each time. Thus, the total test would require 81 writes and 81 reads (9 writes and 9 reads for each of the 9 passes). We can improve this test somewhat by noting that only two bits in the memory actually change state for each pass. Thus, we could shorten the test to have 9 writes on the first pass, followed by only two writes on each subsequent pass (resulting in 25 writes in total), but we'd still require 9 reads per pass to ensure that our writes hadn't caused any other bits to change.
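By way of a concrete example, here's how the optimized test might look in C; the wr_bit()/rd_bit() helpers are hypothetical stand-ins for whatever mechanism actually drives the device, and are not part of the original article.

```c
/* Hypothetical access helpers: write/read one bit at an 8-bit
   address; on real hardware these would poke the actual device. */
extern void wr_bit(unsigned char addr, int bit);
extern int  rd_bit(unsigned char addr);

#define N_ADDR 9   /* the all-0s address plus eight walking-1s addresses */

/* Optimized walking-ones access test: 9 writes on the first pass,
   2 writes on each subsequent pass, and 9 reads on every pass,
   giving the 25 writes and 81 reads noted in the text. */
int walking_ones_access_test(void)
{
    static const unsigned char addr[N_ADDR] =
        { 0x00, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80 };

    for (int pass = 0; pass < N_ADDR; pass++) {
        if (pass == 0) {
            for (int i = 0; i < N_ADDR; i++)   /* write full background */
                wr_bit(addr[i], i == 0);
        } else {
            wr_bit(addr[pass - 1], 0);         /* only two bits change */
            wr_bit(addr[pass], 1);
        }
        for (int i = 0; i < N_ADDR; i++)       /* verify every location */
            if (rd_bit(addr[i]) != (i == pass))
                return 0;                      /* fault detected */
    }
    return 1;                                  /* all passes passed */
}
```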
Now consider the effect of basing our test on the "nameless" sequence (Figure 24-5). Note that we have to augment the nameless sequence to include an initial address of all 0s (for the same reasons as discussed above for the walking ones test). But this still means that we only have five address patterns to worry about, so we will only have to perform five passes. The first pass requires 5 writes and 5 reads, while each subsequent pass requires 2 writes and 5 reads, resulting in a grand total of 13 writes and 25 reads (as opposed to the 25 writes and 81 reads required for the optimized "walking ones" test).
Values written and read on the 1-bit data bus:

    address[7:0]: 00000000 11110000 11001100 10101010 01010101
    pass 0:           1        0        0        0        0
    pass 1:           0        1        0        0        0
    pass 2:           0        0        1        0        0
    pass 3:           0        0        0        1        0
    pass 4:           0        0        0        0        1
Figure 24-5: This RAM access test based on the "nameless" sequence requires only 13 write cycles and 25 read cycles
Now you may be wondering what all of the fuss is about, because the clock speeds of today's systems mean that either test would only require a fraction of a second. But remember that this was a very simple RAM, and the scale of the task can increase dramatically for deeper memories or for those cases in which you only have serial access to the address bus via a shift register. Before moving on, we should note that having more data bits can make the task both simpler and more complex at the same time. For example, consider a similar RAM with an 8-bit address bus and a 3-bit data bus. In this case, we might be tempted to use the test sequence shown in Figure 24-6.
    address[7:0]: 00000000 11110000 11001100 10101010 01010101
    data[2:0]:       000      001      010      011      100
Figure 24-6: This RAM access test only requires 5 write cycles and 5 read cycles, but this is not a good test

As we see, the fact that this RAM is 3 bits wide allows us to associate a different pattern of 0s and 1s with each of our input addresses (in this case we simply assigned a binary count to the patterns on the data bus). The reason this test sequence appears to be attractive is that it only requires a single pass involving 5 writes followed by 5 reads. The sequence does ensure that none of the address or data wires have any stuck-at or open faults, and also that there are no bridging faults between any of the address lines or between any of the data lines. However, the test is sadly lacking, because it won't detect bridging faults between address[5] and data[0], address[3] and data[1], or address[0] and data[2]. We could try rearranging the data values, but the simplest solution might be to add one or more test vectors in which address[5], address[3], and address[0] carry opposing logic values to data[0], data[1], and data[2], respectively.

Beware! Note that the title of this section was: "Access tests for a single RAM." Things get a little trickier if you have multiple devices connected together to increase the width or depth of the total RAM. This is nothing to worry about as long as you take it into consideration when you're writing your tests, but whatever you do, don't make it an afterthought. In the days of yore, I wrote a test based on a schematic that showed a 64 K-word RAM where each word was 16 bits wide. Being but a young whipper-snapper, I didn't realize that this was a hierarchical schematic and that there were actually sixteen RAM devices, each 1 bit wide. The end result was an extremely cunning test sequence that didn't actually test anything at all!
Internal Tests for a Single RAM

Once we've proved that we can indeed "talk" to the device as discussed above, the next step is to test that device's internal structures. In fact, there are two distinct types of functional faults that we're interested in here. The first involves detecting memory cells that have failed such that they can't actually store either a logic 0 or a logic 1, while the second requires us to detect cases where a cell's contents are altered during the process of writing (or reading) other cells.

One of the simplest and fastest general-purpose methods of testing the internal structures of a RAM device is the "checkerboard sequence," in which logic 0s and logic 1s are written into alternate memory locations in a checkerboard pattern. For example, in the case of a RAM with an 8-bit data bus, address 0 would be loaded with $55, address 1 with $AA, address 2 with $55, address 3 with $AA, and so on for the remaining locations (where '$' indicates a hexadecimal value, so $55 = binary 01010101 and $AA = binary 10101010). Once you've loaded the RAM you wait for some amount of time (which could be seconds or minutes), then you read back all of the memory locations to ensure that these patterns are still there, thereby confirming the device's ability to retain data. The entire test would then be repeated with the inverse patterns; that is, address 0 would be loaded with $AA, address 1 with $55, and so on. Thus, this test requires 2 writes and 2 reads for each memory location.

At the other end of the spectrum is what we might call the "moving inversions sequence," which, assuming an 8-bit data bus, requires 17 writes and 18 reads for each memory location (a C sketch of this sequence follows the steps below):
a) Load every location with $00
b) Read address 0 and check that it still contains $00
c) Load address 0 with $01 (binary 00000001)
d) Read address 0 and check that it contains $01
e) Load address 0 with $03 (binary 00000011)
f) Read address 0 and check that it contains $03
g) Load address 0 with $07 (binary 00000111)
h) Read address 0 and check that it contains $07
i) Continue the sequence until address 0 contains $FF (binary 11111111)
j) Repeat points (b) through (i) for each of the remaining addresses
At this point, every location in the RAM should contain $FF, so we continue the sequence as follows:
k) Read address 0 and check that it still contains $FF
l) Load address 0 with $FE (binary 11111110)
m) Read address 0 and check that it contains $FE
n) Load address 0 with $FC (binary 11111100)
o) Read address 0 and check that it contains $FC
p) Load address 0 with $F8 (binary 11111000)
q) Read address 0 and check that it contains $F8
r) Continue the sequence until address 0 contains $00
s) Repeat points (k) through (r) for each of the remaining addresses
The checkerboard sequence is a little under-powered for many people's taste, while the moving inversions sequence can consume more time than you wish to spend. An alternative sequence, which falls some way between the two yet only requires 3 writes and 2 reads for each location, would be as follows (again, a C sketch appears after the list):
a) Load every location with $55
b) Read address 0 and check that it still contains $55
c) Write address 0 with $AA
d) Repeat points (b) and (c) for each of the remaining addresses
e) Read address 0 and check that it still contains $AA
f) Write address 0 with $55
g) Repeat points (e) and (f) for each of the remaining addresses
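Here's a minimal C sketch of this intermediate sequence, using the same hypothetical wr8()/rd8() helpers and assumed DEPTH as the previous example.

```c
/* The 3-write/2-read intermediate test: sweep once verifying $55 and
   inverting to $AA, then sweep again verifying $AA and restoring $55. */
int alternating_pattern_test(void)
{
    for (unsigned a = 0; a < DEPTH; a++)    /* step (a) */
        wr8(a, 0x55);

    for (unsigned a = 0; a < DEPTH; a++) {  /* steps (b) to (d) */
        if (rd8(a) != 0x55) return 0;
        wr8(a, 0xAA);
    }
    for (unsigned a = 0; a < DEPTH; a++) {  /* steps (e) to (g) */
        if (rd8(a) != 0xAA) return 0;
        wr8(a, 0x55);
    }
    return 1;
}
```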
Unfortunately, none of the above tests give us 100% confidence that any particular cell will remain undisturbed when writing to or reading from its neighboring cells. If we assume that memory cells are square and form a regular matrix on the silicon, then we might also make the assumption that each cell has eight close neighbors (Figure 24-7).
Figure 24-7: Assuming that memory cells are square and form a regular matrix on the silicon, then each cell has eight close neighbors

So we'd ideally load each cell with a logic 0 and load the eight surrounding cells with logic 1s. We would then perform a "surround write disturb" test by re-writing the surrounding cells with logic 1s a certain number of times, followed by a "surround read disturb" test by reading the surrounding cells a certain number of times, and then checking that the target cell still contains a logic 0. We would then load the target cell with a logic 1 and the surrounding cells with logic 0s and repeat the surround disturb tests, and also perform the same tests on every other cell in the device.
If you actually perform these tests, then you can safely say that you've tested the functionality of the device into the ground. However, in addition to the fact that these "surround disturb" tests take an unbelievably long time to perform, there's also the problem that you rarely have knowledge of the device's internal structure. For example, consider the first three words of a 256-word by 8-bit RAM, and note the cell containing a logic 0 that appears to be surrounded by logic 1s (that is, the cell referenced as data[1] at address 1):
    address[7:0]    data[7:0]
         0          00000111
         1          00000101
         2          00000111
The reason we say that this cell only "appears" to be surrounded by logic 1s is that the way the cells are physically laid out on the silicon doesn't necessarily conform to our mental visualization. Thus, it's quite possible that our target cell is actually surrounded by one or more logic 0s from the other cells shown here. The end result is that if you wish to perform tests at this level, then you must have knowledge of the underlying silicon architecture. The strange thing about memory tests is that there's always something more to learn. For example, one of the readers of EDN Magazine emailed me with a problem he'd encountered on a piece of hardware that was supposedly working correctly. Assume that we're dealing with a RAM whose width is 18 bits. This engineer discovered that if he was feeding a RAM location with a series of patterns containing successively more logic 1s, then at some point (say 13 logic 1s in the 18-bit word) the device would start dropping bits. When the board's designers investigated, they discovered that it was necessary to add more decoupling capacitors to the board to maintain the supply voltage at an acceptable level whilst switching a large number of bits. Note that the checkerboard sequence discussed above might well miss this problem, whilst the moving inversions sequence would probably catch it.
Testing ROMs

In the case of read-only devices, including ROMs, PROMs, EPROMs, and EEPROMs, it's rarely worth the effort to try to distinguish between access tests and internal tests. The problem with access tests on these devices is that you can spend a tremendous amount of time trying to determine which locations contain
the specific patterns of 0s and 1s that you need to isolate any faults. In fact, in the case of devices that only have a 1-bit data bus, writing any form of meaningful access test is well-nigh impossible. Additionally, as testing the contents of a read-only device essentially only requires a single read to each location, you're generally better off simply performing a full internal test and letting your ability to access the device "fall out in the wash."

Assuming that you have full access to the device's address and data busses, perhaps the simplest way of testing the device is to read each word in turn and compare it to the corresponding word in the original design file that you used to "program" the device in the first place. Of course, for a 64 K-byte device, this means that you've got to maintain a 64 K-byte file somewhere on your system, which can be a pain. Additionally, if you plan on using this type of technique to perform some form of built-in self test on a ROM in an embedded system, then you'd end up requiring two devices with identical contents, which would be meaningless.

In fact, the most common way of testing read-only devices is to use a Cyclic Redundancy Check (CRC) approach. The idea here is to use a software equivalent of a linear-feedback shift-register (LFSR) to process the data in the original design file and to generate a checksum value.¹ The physical device can then be tested by reading each word and using an identical algorithm to generate a new checksum, then comparing this result to the original checksum. It is actually a common practice to store the original checksum somewhere in the device itself, which saves you having to remember the checksum value.²
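By way of illustration, here's one way the checksum might be computed in C. The book doesn't prescribe a particular LFSR or polynomial, so the common CRC-16-CCITT polynomial ($1021), the $FFFF seed, and the rom_read() helper are purely my own assumptions.

```c
#include <stdint.h>

/* Hypothetical helper: read one byte from the device (or from the
   original design file, when computing the reference checksum). */
extern uint8_t rom_read(uint32_t addr);

/* Compute a 16-bit checksum over the ROM contents with a software
   LFSR. Run the identical routine over the design file and over the
   physical device, then compare the two results. */
uint16_t rom_checksum(uint32_t depth)
{
    uint16_t crc = 0xFFFF;                        /* seed value */

    for (uint32_t addr = 0; addr < depth; addr++) {
        crc ^= (uint16_t)((uint16_t)rom_read(addr) << 8);
        for (int bit = 0; bit < 8; bit++)         /* 8 shifts per byte */
            crc = (crc & 0x8000)
                    ? (uint16_t)((crc << 1) ^ 0x1021)
                    : (uint16_t)(crc << 1);
    }
    return crc;
}
```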
Overcoming Serial Access Problems

For reasons that defy explanation, a large proportion of the ROM tests that have fallen on my shoulders have involved circuits where the access to the device's address bus was via a shift register (I have no feel for whether this is a common situation or whether it's simply a case of bad luck rolling downhill and myself living in the valley) (Figure 24-8).
¹LFSRs and CRCs were introduced in Chapter 17.
²Apropos of nothing at all, it's also good practice to reserve at least one byte in the device to act as a revision number for its firmware contents (this can save a lot of time in prototyping environments where sticky labels on programmable devices can either fall off or fail to be updated when the device is reprogrammed).
Figure 24-8: In certain cases the address access to a ROM may be via a serial shift register
In cases such as these, it can be extremely time-consuming to cycle through all of the addresses using a binary sequence. For example, testing a 256-word device (which would therefore have an 8-bit address bus) using a binary addressing sequence would require us to clock the shift register 8 × 256 times. This is because we need 8 clocks to load the first address of 00000000₂, another 8 clocks to load the second address of 00000001₂, and so on up to the last address of 11111111₂.

In fact there are possibilities for cutting the number of clocks down. For example, consider the last two addresses in this binary sequence: 11111110₂ and 11111111₂. A little thought reveals that moving from the penultimate address to the final value does not require us to load eight logic 1s into the shift register, but simply to shift one more logic 1 into the penultimate value's most-significant bit. There are a number of cases like this in the binary count sequence, but detecting and handling them is more trouble than it's worth.

As a simple alternative, I've often found it efficacious to use an LFSR implemented in software to drive the serial input. Every time the software LFSR is "clocked," the virtual bit that "falls off the end" is used to drive the physical serial address input. Thus, excluding any special cases, our example above only requires 256 clocks to cycle through all of the addresses. The only special cases that have to be taken into account are when pre-loading the shift register at the start of the sequence and address zero (because an XOR-based LFSR won't pass through the all-0s value). (Note that the algorithm used to generate the original checksum value from the design file must also take this LFSR-based addressing mechanism into account.)
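Here's a rough C sketch of the idea; the tap positions, the pre-load value, and the clock_serial_addr() helper are all illustrative assumptions rather than anything mandated by the text.

```c
#include <stdint.h>

/* Hypothetical helper: clock the device's serial address shift
   register once, shifting in the given bit. */
extern void clock_serial_addr(int bit);

/* Drive the serial address input from a software LFSR. On each
   clock, the virtual bit that "falls off the end" of the LFSR is fed
   to the device, so (special cases aside) each new address costs one
   clock instead of eight. The taps (8,6,5,4) give a maximal-length
   sequence; address zero needs separate handling, because an
   XOR-based LFSR never passes through the all-0s state, and the
   shift register must first be pre-loaded with a known value. */
void lfsr_address_walk(void)
{
    uint8_t lfsr = 0x01;                     /* assumed pre-load value */

    for (int i = 0; i < 255; i++) {
        int out = (lfsr >> 7) & 1;           /* the bit falling off */
        int fb  = ((lfsr >> 7) ^ (lfsr >> 5) ^
                   (lfsr >> 4) ^ (lfsr >> 3)) & 1;
        lfsr = (uint8_t)((lfsr << 1) | (uint8_t)fb);
        clock_serial_addr(out);
        /* ...read and check the word at the address now selected... */
    }
}
```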
Unique Problems and Unique Solutions

As usual, this discussion has only skimmed the surface of a very interesting field. In reality, every design presents its own unique problems, which require equally unique solutions. For example, a surprising number of memory devices have both serial address and serial data busses, which therefore mandate their own esoteric testing strategies. In fact, the only useful general-purpose advice anyone can offer is that, when you start to design a system, for goodness sake think about how that system is eventually going to be tested, and provide sufficient access to the bowels of the circuit to facilitate the testing process.
The topic in this chapter was published in a condensed form under the title "My memory is not what it used to be: Testing RAMs and ROMs," in the February 1st, 1996 issue of EDN (www.ednmag.com), and is reproduced in its original form here with their kind permission.

For those who are interested in knowing more about testing digital logic in general, I've personally found the following books to be of use:³

Alexander Miczo, "Digital Logic Testing and Simulation," John Wiley & Sons, ISBN 0-471-60365-1

R.G. Bennetts, "Introduction to Digital Board Testing," Crane Russak (New York), Edward Arnold (London), ISBN 0-8448-1385-0

R.G. Bennetts, "Design of Testable Logic Circuits," Addison Wesley, ISBN 0-201-14403-4
³Note that these are books that have found their way to my bookshelf over the years, and I can't promise that they are still in print.
Chapter 25:
Deep Submicron Delay Effects

"Sometimes it's just one delay after another!"

In this chapter you will discover:

The Evolution of Delay Specifications
A Potpourri of Definitions
  Signal Slopes
  Input Switching Thresholds
  Intrinsic versus Extrinsic
  Pn-Pn and Pt-Pt Delays
  State and Slope Dependency
Alternative Interconnect Models
  The Lumped-load Model
  The Distributed RC Model
  The "pure LC" Model
  The RLC Model
Deep-Submicron Delay Effects
  Path-Specific Pn-Pn Delays
  Threshold-Dependent Pn-Pn Delays
  Slope-Dependent Pn-Pn Delays
  State-Dependent Pn-Pn Delays
  Path-Dependent Drive Capability
  Slope-Dependent Drive Capability
  State-Dependent Drive Capability
  State-Dependent Switching Thresholds
  State-Dependent Terminal Parasitics
  Multi-Input Transitions (Pn-Pn delays)
  Multi-Input Transitions (drive capability)
  Reflected Parasitics
Verification Technologies
Summary
Evolution of Delay Specifications

Way back in the mists of time, sometime after the Jurassic period when dinosaurs ruled the earth, say around the early 1980s, the lives of IC and ASIC designers were somewhat simpler than they are today. Delay specifications for the early (multi-micron) technologies were rudimentary at best. Consider the case of a simple 2-input AND gate, for which input-to-output databook delays were originally specified as being identical for all of the inputs and for both rising and falling transitions at the output (Figure 25-1).
->
y
(LH,
a,b
->
y
a,b
->
y
a
->
a
->
B
HL)
=
?ns
+
?ns/p]
(LH)
=
?ns
+
?ns/pl
(HL)
=
?ns
+
?ns/p
y
(LH)
=
?ns
+
?ns/p~
y
(HL)
=
?ns
+
?ns/p
->
y
(LH)
a
?ns
+
?ns/p
B
->
y
(HL)
=
?ns
+
?ns/p
?
->
?
?
+
-
?
La~ 1970s
!
inor~asing Gomplexity of delay
~peGiflcations LaCe 1990s
Figure 25-1: Delay specifications have become more complex over time
As device geometries shrank, however, delay specifications became increasingly complex; first by differentiating delays for rising and falling output transitions, and later by specifying different delays for each input. Additionally, these early delays were typically of the form "?ns + ?ns/pF," which means a fixed delay associated with the gate itself combined with some additional delay caused by capacitive loading. As we will see, these forms of specification simply cannot handle the types of delay effects characteristic of deep-submicron technologies, not least in the area of RLC interconnect delays as discussed below.
A Potpourri of Definitions

Before plunging headlong into the subject of deep-submicron delays, it is first necessary to introduce a number of definitions as follows:

Signal Slopes: The "slope" of a signal is its rate-of-change when transitioning from a logic 0 to a logic 1, or vice versa. An instantaneous transition would be considered to represent the maximum possible slope value (Figure 25-2). The slope of the signal is a function of the output characteristics of the driving gate combined with the characteristics of the interconnect and the input characteristics of any load gate(s).
Figure 25-2: The "slope" of a signal is the time taken to transition between logic values

Input Switching Thresholds: An "input switching threshold" is the point at which an input to a load gate first "sees" a transition as occurring. That is, the point at which the signal presented to the input crosses some threshold value, at which point the downstream gate deigns to notice that something is happening. Input switching thresholds are usually specified as a percentage of the value (voltage differential) between a logic 0 and a logic 1, and each input may have different switching thresholds for rising and falling transitions (Figure 25-3).
Figure 25-3: Input switching thresholds may differ for rising and falling transitions

Intrinsic versus Extrinsic: The term "intrinsic" refers to any delay effects that are internal to a logic function, while the term "extrinsic" refers to any delay effects that are associated with the interconnect (Figure 25-4).
In the early multi-micron technologies, intrinsic delays dominated over their extrinsic counterparts. For example, in the case of devices with 2.0 µm geometries, the intrinsic delay typically accounted for approximately two-thirds of the total delay. But extrinsic delays became increasingly important with shrinking geometries. By the time that devices with 1 µm geometries became available, the relative domination of the intrinsic and extrinsic delays had effectively reversed.
(2.0 micron vs. 1.0 micron: total delay (= 100%) divided between gate delay (intrinsic) and interconnect, including fan-in (extrinsic))
Figure 25-4: Intrinsic versus extrinsic delays

This trend seems set to continue, the reason being that interconnect is not shrinking at the same rate as transistors and logic gates. In the case of today's sub-micron technologies, the extrinsic delay can account for up to 80% of the total delay.
Pn-Pn and Pt-Pt Delays: To a large extent, Pin-to-Pin (Pn-Pn) and Point-to-Point (Pt-Pt) delays are simply modern terms for intrinsic and extrinsic delays, respectively. A Pn-Pn delay is measured between a transition occurring at the input to a gate and a corresponding transition occurring at the output from that gate, while a Pt-Pt delay is measured between the output from a driving gate to the input of a load gate (Figure 25-5).
Figure 25-5: Pn-Pn versus Pt-Pt delays

¹See also the discussions on Pt-Pt delays in Chapter 6.
To be more precise, a Pn-Pn delay is the time between a signal on a gate's input reaching that input's switching threshold to a corresponding response beginning at its output, while a Pt-Pt delay is the time from the output of a driving gate beginning its transition to a corresponding load gate perceiving that transition as crossing its input switching threshold ...... phew!

There are a number of reasons why we're emphasizing the fact that we consider the time when the output begins to respond to mark the end of the Pn-Pn delay and the start of the Pt-Pt delay. In the past, these delays were measured from the time when the output reached 50% of the value between a logic 0 and a logic 1. This was considered to be acceptable because load gates were all assumed to have input switching thresholds of 50%. But consider a rising transition on the output and assume that the load gate's input switching threshold for a rising transition is 30%. If we were to assume that delays are measured from the time the output crosses its 50% value, then it's entirely possible that the load gate could "see" the transition before we consider the output to have changed. Also, when we come to consider mixed-signal (analog and digital) simulation,² then the only meaningful time to pass an event from a gate's output transitioning in the digital realm into the analog domain is the point at which the gate's output begins its transition.

²Mixed-signal simulation was introduced in Chapter 11.
State and Slope Dependency: Any attribute associated with an input to a gate (including a Pn-Pn delay) that is a function of the logic values on other inputs to that gate is said to be "State Dependent." Similarly, any attribute associated with an input to a gate (including a Pn-Pn delay) that is a function of the slope of the signal presented to that input is said to be "Slope Dependent." These state and slope dependency definitions might not appear to make much sense at the moment, but they'll come to the fore in the not-so-distant future.

Alternative Interconnect Models

As the geometries of structures on the silicon shrink and the number of gates in a device increase, interconnect delays assume a greater significance. Increasingly sophisticated algorithms are required to accurately represent the effects of the interconnect: from "pure RC" (lumped-load) calculations, through distributed RC calculations, to more complex RLC formulae that also take input switching threshold values into account.

The Lumped-load Model: As was previously discussed, Pn-Pn gate delays in early multi-micron technologies dominated over Pt-Pt interconnect delays. Additionally, the rise and fall times of signals were typically greater than the time
taken for them to propagate through the interconnect. In these cases, the lumped-load interconnect model was usually sufficient (Figure 25-6).
Figure 25-6: The "lumped-load" interconnect model
In the lumped-load model, all of the capacitances associated with the track and with the inputs to the load gates are added together to give a single, equivalent capacitance. This capacitance is then multiplied by the drive capability of the driving gate (which was specified in terms of ns/pF) to give a resulting Pt-Pt delay. The lumped-load model is characterized by the fact that all of the nodes on the track are considered to commence transitioning at the same time and with the same slope. This model may also be referred to as a "pure RC" model. The Distributed RC Model: The shrinking device geometries of the mid-1980s began to mandate a more accurate representation of the interconnect than was provided by the lumped-load model. Thus, the distributed RC model was born (where R and C represent resistance and capacitance, respectively) (Figure 25-7).
In the distributed RC model, each segment of the track is treated as an RC network. The distributed RC model is characterized by the fact that all of the nodes on the track are considered to commence transitioning at the same time but with different slopes. Another way to view this is that the signal's edge is collapsing (or deteriorating) as it propagates down the track. However, verification tools such as static timing analyzers and logic simulators don't see things in quite this way. A timing calculator program is used to evaluate the characteristics of the signal using whatever level of data is provided to it, and the calculator then returns simplified Pn-Pn and Pt-Pt delays for use by the appropriate verification tool (timing calculators are discussed in more detail later in this chapter).
Figure 25-7: The "distributed RC" interconnect model
The "pure LC" Model: At the circuit board level, some high-speed interconnects take on the characteristics of transmission lines. This pure LC model can be represented as a "sharp" transition propagating down the track as a wavefront (where L and C represent inductance and capacitance, respectively) (Figure 25-8).
Figure 25-8: The "pure LC" interconnect model

Pure transmission line effects do not occur at the IC or ASIC level. However, large submicron devices do begin to exhibit certain aspects of these delay effects, as discussed below.

The RLC Model: In the case of large devices with deep submicron geometries, the speed of the signals coupled with relatively long traces results in the
interconnect exhibiting some transmission line type effects. However, the resistive nature of IC interconnect does not support pure LC effects; instead, these traces may be described as exhibiting RLC effects (Figure 25-9).
Figure 25-9: The "RLC" interconnect model

The RLC model is characterized by a discrete wavefront supplied by the interconnect's LC constituents combined with a collapsing (or deteriorating) signal edge caused by the interconnect's RC constituents.
Deep-Submicron Delay Effects

Path-Specific Pn-Pn Delays: Each input-to-output path typically has its own Pn-Pn delay. For example, in the case of a 2-input OR gate, a change on input a causing a transition on output y would have a different delay to that of a change on input b causing a transition on output y (Figure 25-10). (Note that this example assumes input switching thresholds of 50%, and remember that Pn-Pn delays are measured from the time when a signal presented to an input crosses that input's switching threshold to the time when the output first begins to respond.) Similarly, each rising and falling transition at the output typically has its own Pn-Pn delay. For example, in the case of the OR gate, a change on input a causing a rising transition on output y would have a different delay to that of a change on input a causing a falling transition on output y. Path- and transition-specific Pn-Pn delays are not limited to submicron technologies and they should come as no surprise, but they are presented here to prepare the stage for the horrors that are to come.
Figure 25-10: Path-specific Pn-Pn delays

Threshold-Dependent Pn-Pn Delays: Pn-Pn delays depend on the switching thresholds associated with inputs, at least to the extent that the delay through the gate doesn't actually commence until the signal presented to the input crosses the threshold (Figure 25-11).
Figure 25-11: Threshold-dependent Pn-Pn delays

For example, if the input switching threshold for a rising transition on input a were 30% of the value between the logic 0 and logic 1 levels, then the input would "see" the transition earlier than it would if its input switching threshold were 70%. Additionally, the slope of a signal being presented to an input affects the time at which that signal crosses the input switching threshold (Figure 25-12).
Figure 25-12: The slope of an incoming signal affects the time at which the input "sees" that signal
Assuming for the sake of simplicity that input a has a switching threshold of 50%, then decreasing the slope of the signal presented to this input would change the time at which a "sees" the event as occurring, and consequently the time at which the Pn-Pn delay commences. Note, however, that this is NOT the same as "slope-dependent" Pn-Pn delays, which are discussed in more detail below.

Slope-Dependent Pn-Pn Delays: The previous example illustrated in Figure 25-12 was somewhat simplistic, in that it showed two Pn-Pn delays as being identical, irrespective of the slope of the incoming signal. Some vendors of computer-aided design tools refer to the previous case as "slope-dependency," but this is not a correct usage of the term. In fact a variety of delay effects may be truly slope-dependent in deep-submicron technologies; that is, they may be directly modified by the slope of a signal. For example, a gate's Pn-Pn delays from an input to an output may depend on the slope of the signal presented to that input. To put this another way: if we commence at the point at which the signal presented to an input crosses that input's switching threshold, then the Pn-Pn delay from this point may be a function of the rate-of-change of the incoming signal (Figure 25-13).
Figure 25-13: Slope-dependent Pn-Pn delays

Actually, the effect illustrated here, in which a decreasing slope causes an increasing Pn-Pn delay, is only one possible scenario. This particular case applies to gates and/or technologies where the predominant effect is that the switching speeds of the transistors forming the gate are directly related to the rate-of-change of charge applied to their inputs. By comparison, in the case of certain technologies, a decreasing slope actually results in faster Pn-Pn delays (as measured from the switching threshold of the input). This latter case results from the fact that a sufficiently long slope permits internal transistors to become pre-charged almost to the point of switching. Thus, when the input signal actually crosses the input's switching threshold, the gate is poised at the starting blocks and appears to switch faster than it would if a "sharp" edge had been applied to the input.
To further increase your pleasure and double your fun, both effects may be present simultaneously. Thus, applying a "sharp" edge to the input may result in a certain Pn-Pn delay, and gradually decreasing the slope of the applied signal could cause a gradual increase in the Pn-Pn delay. At some point, however, further decreasing the slope of the applied input will cause a reduction in the Pn-Pn delay, possibly to the point where it becomes smaller than the Pn-Pn delay associated with our original "sharp" edge!³
State-Dependent Pn-Pn Delays: In addition to being slope-dependent, Pn-Pn delays are often state-dependent; that is, they depend on the logic values on other inputs (Figure 25-14).
Figure 25-14: State-dependent Pn-Pn delays
This example illustrates two cases in which a signal presented to the a input causes an identical response (in terms of logic values) at the co output. However, even assuming that the slopes of the signals presented to a and the switching thresholds on a are identical in both cases, the Pn-Pn delays may be different due to the logic values present on inputs b and ci.
Path-Dependent Drive Capability: This is where life really starts to get interesting (trust me, have I ever lied to you before?⁴). Up to this point, we have only considered effects which impact Pn-Pn delays through a gate, but many of these effects also influence the gate's ability to drive signals at its output(s). For example, the driving capability of a gate may be path-dependent (Figure 25-15). In this case, in addition to the fact that inputs a and b have different Pn-Pn delays, the driving capability of the gate (and hence the slope of the output signal) is dependent on which input caused the output transition to occur. Until recently this phenomenon has typically been associated with MOS technologies, and has generally not been linked to bipolar technologies such as TTL. However,
go figure!
34
342
Designus Maximus Unleashed!
as we plunge into deep-submicron, many of these more esoteric delay effects are beginning to manifest themselves across technologies with little regard for traditional boundaries.
Figure 25-15: Path-dependent drive capability

Slope-Dependent Drive Capability: In addition to being dependent on which input causes an output transition to occur (as discussed in the previous point), the driving capability of the gate (and hence the slope of the output signal) may also be dependent on the slope of the signal presented to the input (Figure 25-16). (Are we having fun yet?)
Figure 25-16: Slope-dependent drive capability

State-Dependent Drive Capability: Yet another factor that can influence the drive capability of an output is the logic values present on inputs other than the one actually causing the output transition to occur. This effect is known as state-dependent drive capability (Figure 25-17). This example illustrates two cases in which a signal presented to the a input causes an identical response (in terms of logic values) at the co output. However, even assuming that the slopes of the signals presented to a and the switching thresholds on a are identical in both cases, the driving capability of the gate (and hence the slope of the output signal) may be different due to the logic values present on inputs b and ci.
State-Dependent Switching Thresholds: As you may have noticed, the previous points on state-dependent Pn-Pn delays and state-dependent drive capabilities included the phrase "...assuming that the input switching thresholds [on a particular input] are identical in both cases..." If this caused a few alarm bells to start ringing in your mind then, if nothing else, at least these discussions are serving to hone your abilities to survive the dire and dismal depths of the deep-submicron domain.
Figure 25-17: State-dependent drive capability

By some strange quirk of fate, an input's switching threshold may be state-dependent; that is, it may depend on the logic values present on other inputs (Figure 25-18).
Figure 25-18: State-dependent input switching thresholds
In this example, the switching threshold of input a (the point at which a "sees" a transition as occurring) depends on the logic values presented to inputs b and ci.
State-Dependent Terminal Parasitics: In addition to an input's switching threshold being state-dependent, further characteristics associated with that input (such as its parasitic values) may also depend on the logic values presented to other inputs. For example, consider a 2-input OR gate (Figure 25-19).
Figure 25-19: State-dependent terminal parasitics
The terminal capacitance of input g2.a may depend on the logic value presented to input g2.b. If input g2.b is a logic 0, a transition on input g2.a will cause the output of the OR gate to switch. In this case, g1.y (the output of the gate driving g2.a) will "see" a relatively high capacitance. However, if input g2.b is a logic 1, a transition on input g2.a will not cause the output of the OR gate to switch. In this case, g1.y will "see" a relatively small capacitance. This particular effect first manifested itself in ECL technologies. In fact as far back as the late 1980s, I was made aware of one ECL gate-array technology in which the terminal capacitance of a load gate (as perceived by the driving gate) varied by close to 100% due to this form of state-dependency. But this effect is no longer confined to ECL, because as we plunge into deep-submicron, many delay effects are beginning to manifest themselves across technologies with scant regard for traditional boundaries.
Multi-Input Transitions (Pn-Pn delays): Up to this point, we have only considered cases in which a signal presented to a single input causes an output response. The picture does, of course, become more complex when multi-input transitions are considered. For example, take the case of a 2-input OR gate (Figure 25-20). For the sake of simplicity we will assume that both the a and b inputs are fully symmetrical; that is, both have identical input switching thresholds and both have identical Pn-Pn delays.
Figure 25-20: Multi-input transitions and Pn-Pn delays

First consider the case where a transition applied to a single input (for example, input a) causes a response at the output. The resulting Pn-Pn delay is the one that is usually specified in the databook for this cell. However, if both inputs transition simultaneously, the resulting Pn-Pn delay may be reduced to close to 50% of the value specified in the databook. These two cases (a single input transition occurring in isolation versus multi-input transitions occurring simultaneously) provide us with worst-case end-points. However, it is also necessary to consider those cases where the inputs don't transition simultaneously, but they do transition closely together. For example, take the OR gate shown in Figure 25-20 and assume that both inputs are initially at logic 0. Now assume that input a is presented with a rising transition which initiates the standard databook Pn-Pn delay, but before this delay has fully completed, input b is also presented with a rising transition. The result is that the actual Pn-Pn delay could occur anywhere between the two worst-case end-points.
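As a simple illustration of how a tool might approximate the region between these end-points, consider the following C sketch; the linear model and its coefficients are my own invention, not a published algorithm.

```c
/* Linearly interpolate a 2-input gate's Pn-Pn delay between the
   databook value (one input switching in isolation) and roughly 50%
   of that value (both inputs switching simultaneously), based on the
   skew between the two input transitions. Purely illustrative. */
double multi_input_pnpn(double databook_delay, double input_skew)
{
    double min_delay = 0.5 * databook_delay;   /* simultaneous case */

    if (input_skew >= databook_delay)          /* second edge arrives */
        return databook_delay;                 /* too late to matter  */
    return min_delay + (databook_delay - min_delay)
                     * (input_skew / databook_delay);
}
```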
Multi-Input Transitions (drive capability): In addition to modifying Pn-Pn delays, multi-input transitions may also affect the driving capability of the gate, and hence the slope of the output signal (Figure 25-21).
Figure 25-21: Multi-input transitions and drive capability
All of these multi-input transition effects can be estimated with simple linear approximations. Unfortunately, today's verification tools, such as static timing analyzers and digital logic simulators, are not well-equipped to perform on-the-fly calculations of this type (the limitations of today's tools are discussed in a little more detail below).

Reflected Parasitics: In the technologies of yesteryear, it was fairly safe to assume that parasitic effects had limited scope and were generally only visible to logic gates in their immediate vicinity. For example, consider the three gates in Figure 25-22.
Figure 25-22: Reflected parasitics
Traditionally it was safe to assume that gate g2 would buffer the output of g1 from wire w2 and gate g3. Thus, the output g1.y would only "see" any parasitics such as the capacitances associated with wire w1 and gate terminal g2.a. These assumptions become less valid in the deep-submicron domain. Returning to the three gates shown in Figure 25-22, it is now possible for some proportion of the parasitics associated with wire w2 and gate terminal g3.a to be "reflected back" through gate g2 and to be made visible to output g1.y. Additionally, if gate g2 were a multi-input gate such as a 2-input XOR, then the proportion of these parasitics reflected back through g2 may well be state-dependent; that is, they may vary depending on the logic value presented to the other input of g2. Fortunately, reflected parasitics are typically low-order effects, even in the case of technologies with geometries as low as 0.35 microns. However, if history has taught us anything, it is to be afraid (be very afraid), because these effects may assume a much greater significance as geometries pass through 0.25 microns and beyond.

Verification Technologies

Timing verification of digital ICs and ASICs is typically performed with a static timing analyzer or a digital simulator. The latter includes logic simulation using either minimum or maximum delays, and dynamic timing (worst case) verification
which uses both minimum and maximum delays at the same time. Static timing analysis is faster than digital simulation, does not require a waveform, and performs an exhaustive evaluation of all possible paths. Unfortunately, a static timing analyzer may also report a great number of false paths, which can require a significant amount of effort on the part of the designer to resolve. By comparison, digital simulation does require a waveform and only evaluates those paths sensitized by that waveform (which may be considered to offer certain advantages). Additionally, digital simulation is applicable to all forms of design, while static timing analysis may not be suitable for some (typically asynchronous) designs. Today's verification tools rely on the assumption that all of the delays can be pre-calculated before the actual analysis takes place (Figure 25-23).
Figure 25-23: Timing verification is typically based on pre-calculation (the delay calculator is fed with topology, timing, and function data)
Note that in the case of static timing analysis the components are largely treated as "black boxes," and the functionality data passed to the analyzer is pretty much limited to whether or not any inversions take place between the component's input(s) and output(s). For both forms of analysis, the delay calculator either estimates pre-layout interconnect effects or uses more complex formulae to take back-annotated post-layout data into account. The end result is a series of pre-calculated Pn-Pn and Pt-Pt values. The technique of fully pre-calculating the delays was reasonably accurate for multi-micron technologies with simple delay effects and simple interconnect models. For example, let's consider a typical mid-1980s scenario involving the combination of a multi-micron technology with a lumped-load interconnect model (Figure 25-24).
Figure 25-24: Mid-1980s multi-micron technology and lumped-load interconnect

The library models for gates g1, g2, and g3 would include Pn-Pn delay specifications, input capacitances specified in terms of picofarads (pF), and drive capability specified in terms of nanoseconds-per-picofarad (ns/pF). Alternatively, the input capacitances and drive capability may have been specified in terms of unit loads (UL and ns/UL, respectively), where 1 UL represented some quantity of capacitance. As an aid to memory, a lumped-load interconnect model means that all of the capacitances associated with the track and with the load gate inputs are added together to give a single, equivalent capacitance.
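To make the arithmetic concrete, here's a trivial C rendition of the lumped-load calculation; the capacitance and drive values in the closing comment are invented for illustration.

```c
/* Lumped-load Pt-Pt delay: sum every capacitance hanging on the
   track into one equivalent value, then multiply by the driving
   gate's drive capability (ns/pF). */
double lumped_load_ptpt(const double loads_pf[], int n,
                        double drive_ns_per_pf)
{
    double total_pf = 0.0;
    for (int i = 0; i < n; i++)
        total_pf += loads_pf[i];        /* single equivalent capacitance */
    return drive_ns_per_pf * total_pf;  /* time to reach the 50% point   */
}

/* Example: a track of 0.12 pF plus 0.05 pF each at g2.a and g3.a,
   driven at 2.5 ns/pF, gives a Pt-Pt delay of 2.5 * 0.22 = 0.55 ns. */
```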
Now consider the way in which a delay calculator and simulator would "see" this portion of the circuit (Figure 25-25).
Figure 25-25: Simple delay calculation for multi-micron/lumped load

A rising edge applied to input g1.a would, after the appropriate Pn-Pn delay, cause a response at output g1.y. The actual, real-world output response would be in the form of an exponential curve with an RC time constant, but the delay calculator would use a linearized approximation. The product of the drive capability of g1.y with the total capacitive load (ns/pF × pF) was understood to represent the time taken for the output to reach 50% of the value between logic 0 and logic 1 (all inputs were assumed to have switching thresholds of 50%). Thus, the product of drive capability and the total capacitive load was used to represent
the Pt-Pt delay. In many cases the calculated Pt-Pt delay was simply added to the driving gate's Pn-Pn delay to form a new combined delay. This practice was reasonably valid due to the inherent assumptions of the lumped-load model; namely, that all of the nodes on the track are considered to commence transitioning at the same time and with the same slope.

The above example is obviously a fairly simplistic view of the world, but it was considered sufficient at the time. However, as timing specifications became more precise, delay calculators began to use more sophisticated algorithms. Today's delay calculators and verification tools can typically support individual Pn-Pn delays for each drive-load pair and for rising and falling edges. Thus, for the circuit illustrated in Figure 25-24, a modern verification environment would typically evaluate individual Pt-Pt delays for g1.y to g2.a (rising edge), g1.y to g2.a (falling edge), g1.y to g3.a (rising edge), and g1.y to g3.a (falling edge). Additionally, a few (actually very few) environments support path-specific Pt-Pt delays. That is, each Pn-Pn delay through gate g1 may have a unique set of Pt-Pt delays associated with it.

Unfortunately, this level of sophistication is the exception rather than the rule, and even these relatively elegant solutions are not sufficient to handle the more esoteric delay effects that we've been discussing in this chapter. In addition to being dependent on a particular Pn-Pn delay, Pt-Pt delays in deep-submicron technologies may be a function of state-dependent effects in the load gates. Similarly, in addition to being dependent on the logic values on other inputs, Pn-Pn delays may be a function of the slopes of the signals associated with the Pt-Pt delays which trigger them. The end result is that the total delay from one particular input to a gate, through that gate, and onwards to its load gate can vary by 100% or more from transition to transition. Because all of these effects are interrelated, accurate analysis of a circuit becomes an extremely complex problem.

To illustrate this in more detail, consider a portion of a circuit comprising three 2-input XOR gates (Figure 25-26). First of all, note that we are considering an RLC interconnect model, which is characterized by a discrete wavefront supplied by the interconnect's LC constituents combined with a collapsing (or deteriorating) signal edge caused by the interconnect's RC constituents. The sequence of actions associated with a signal propagating across this portion of the circuit may be summarized as follows:

1) Assume (for the sake of simplicity) that a "sharp" transition is presented to either g1.a or g1.b (Figure 25-26a).
2) After an appropriate Pn-Pn delay (which may depend on the logic value present on the other input), the output g1.y begins to respond (Figure 25-26b).
3) The drive capability at g1.y depends on which input, g1.a or g1.b, caused the transition to occur (Figure 25-26c).
4) The input switching threshold of g2.a may be a function of the logic value on g2.b (Figure 25-26d).
5) The terminal parasitics such as the capacitance of g2.a may be a function of the logic value on g2.b (Figure 25-26e).
6) The combination of the drive capability of g1.y and the terminal parasitics of g2.a affects the slope of the signal presented to g2.a (Figure 25-26f). Additionally, this combination affects the slope of the on-going signal heading towards g3.a.
7) The combination of the slope of the signal presented to g2.a and the input switching threshold of g2.a affects the time at which g2.a "sees" the input transition as occurring. Additionally, the Pn-Pn delay from the time g2.a "sees" the input transition to the time g2.y begins to respond may be a function of the slope of the signal presented to g2.a (Figure 25-26g).
Figure 25-26: Combining deep-submicron delay effects (panels a through g show the Pn-Pn and Pt-Pt delays developing across the RLC interconnect between gates g1, g2, and g3)
Thus, the actual Pn-Pn and Pt-Pt delays related to any particular path can vary from transition to transition, because they depend on a variety of dynamically
changing circumstances, such as the logic values presented to other inputs on both the driving and load gates. The bottom line is that, to accurately represent these effects, it is computationally impractical to pre-calculate delays ahead of time. For this reason, it would appear that static timing analysis techniques will not be capable of performing an exhaustive analysis on all the variables discussed above, and digital simulation may become the only realistic option. However, in order to achieve the accuracy necessary to account for these deep-submicron delay effects, the models would have to have dynamic delay values which could be calculated "on-the-fly" during simulation. Previous attempts at on-the-fly delay calculation have resulted in drastically increased simulation times, but these attempts did not fully utilize modern workstation architectures.

Many of the delay effects discussed above may be successfully represented using a linear approximation of the form ((i x j) + k). In one scenario, a set of base-level Pn-Pn and Pt-Pt delays could be pre-calculated prior to simulation. Subsequently, during simulation, a driving gate would pass two pieces of data to the load gate's input: the time at which the signal will begin its transition at that input and the slope of the signal associated with the transition (Figure 25-27).
Figure 25-27: One possible alternative simulation structure
The load gate would first combine these two pieces of data with its knowledge of the input's switching threshold to determine at what time the input will actually "see" the transition. The load gate would also evaluate the effect of the slope on its Pn-Pn delay. Both these calculations can be represented using the linear approximation noted above. To reduce the impact on simulator speed, these calculations could be performed by the floating-point coprocessors resident in certain workstations. This technique would result in simulations with a predicted run-time only 5% greater than traditional algorithms, but with a level of accuracy approaching that of an analog simulation.[5]

Unfortunately, supporting this type of structure with existing digital simulators would necessitate a major rework of their internal data structures and timing algorithms. In fact, for digital simulators to fully support deep-submicron effects in the future, it will probably be necessary to rebuild them from the ground up. Not surprisingly, simulation vendors are less than eager to commit these resources until the market begins to demand this level of timing capability. However, there is a technique available today that goes a long way toward the accurate simulation of deep-submicron timing effects, which is to use a combination of A/D and A/d mixed-signal simulation technologies.[6]
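Before moving on, and to give a feel for how cheap the per-event arithmetic in the scheme outlined above could be, here is a minimal sketch of the two-value message-passing idea, again in illustrative Python with invented coefficient values; it is not anyone's production implementation, just one way the ((i x j) + k) form could be applied at a load pin.

# Hypothetical sketch of "on-the-fly" delay evaluation using the
# linear approximation ((i * j) + k); every number here is invented.

def linear(i, j, k):
    # One multiply and one add per evaluation, hence the low overhead.
    return (i * j) + k

class LoadInput:
    """One input pin of a load gate."""
    def __init__(self, thresh_coeff, base_offset, pnpn_coeff, base_pnpn):
        self.thresh_coeff = thresh_coeff  # i-term for the "time seen" correction
        self.base_offset  = base_offset   # k-term: base arrival offset
        self.pnpn_coeff   = pnpn_coeff    # i-term for the Pn-Pn correction
        self.base_pnpn    = base_pnpn     # k-term: pre-calculated base Pn-Pn delay

    def receive(self, t_transition, slope):
        # The driver passes two values: the time at which the signal begins
        # to switch at this input, and the slope of the edge.
        t_seen = t_transition + linear(self.thresh_coeff, slope, self.base_offset)
        pn_pn  = linear(self.pnpn_coeff, slope, self.base_pnpn)
        return t_seen + pn_pn  # time at which the gate's output responds

pin = LoadInput(thresh_coeff=0.45, base_offset=0.02,
                pnpn_coeff=0.10, base_pnpn=0.60)
print(pin.receive(10.0, 0.3))  # approximately 10.785 (all times in ns)

Because each correction is a single multiply-accumulate, the scheme maps naturally onto floating-point hardware, which is precisely why the predicted run-time penalty is so small.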
Summary

The majority of the delay effects introduced in this chapter have always been present, even in the case of multi-micron technologies, but many of these effects have traditionally been third or fourth order, and were therefore considered to be relatively insignificant. However, as device geometries plunged through the 0.5 micron barrier to 0.35 microns, followed by 0.25 microns and beyond, some of these effects are assuming second and even first order status, and their significance will only increase with smaller geometries operating at lower voltage levels. Unfortunately, many design verification tools are not keeping pace with silicon technology. Unless these tools are enhanced to fully account for deep-submicron effects, designers will be forced to use restrictive design rules to ensure that their designs actually function. Thus, designers may find it impossible to fully realize the potential of the new and exciting technology developments that are becoming available.
The topic in this chapter was published in a condensed form under the title Delay Effects Rule in Deep-Submicron ICs, in the June 12th, 1995 issue of Electronic Design (ED) magazine (www.penton.com/ed), and is reproduced in its original form here with their kind permission.

[5] Note that the solution shown here is something of a simplification. Believe it or not, the full solution is rather more complex.

[6] The combination of A/D and A/d mixed-signal simulation technologies was introduced in Chapter 11.
Chapter 26:
Logic Diagrams and Machines
"It's very easy to take more than nothing"
In this chapter you will discover:
Aristotle and the Tree of Porphyry
Euler and Venn
Marquand, Carroll, and Karnaugh
Lull, Leibniz, and Swift
Carroll, Stanhope, and Jevons
Marquand, Burack, and Shannon
Aristotle and the Tree of Porphyry

When one mentions the word "logic," most engineers would leap to the conclusion that the conversation was focused on electronics and computers, but there has historically been a great deal of interest in logic in general. This fascination was initially expressed in the form of logic diagrams, and later in the construction of special-purpose machines for manipulating logical expressions and representations. Diagrams used to represent logical concepts have been around in one form or another for a very long time. For example, Aristotle was certainly familiar with the idea of using a stylized tree figure to represent the relationships between (and successive sub-divisions of) such things as different species. Diagrams of this type, which are known as the Tree of Porphyry, are often to be found in medieval pictures.
Euler and Venn
Following the Tree of Porphyry, there seems to have been a dearth of activity on the logic diagram front until 1761, when the brilliant Swiss mathematician Leonhard Euler (pronounced "Oiler" in America) introduced a geometric system that could generate solutions for problems in class logic. However, Euler's work in this area didn't really catch on because it was somewhat awkward to use, and it was eventually supplanted in the 1890s by a more polished scheme proposed by the English logician John Venn. Venn was heavily influenced by the work of George Boole (see sidebar) and his Venn Diagrams very much complemented Boolean Algebra.

Sidebar: George Boole made significant contributions in several areas of mathematics, but was immortalized for two works in 1847 and 1854, in which he represented logical expressions in a mathematical form now known as Boolean Algebra. Unfortunately, Boolean Algebra was destined to remain largely unknown and unused for the better part of a century. It was not until 1938 that Claude E. Shannon published an article based on his master's thesis at MIT. In his paper, Shannon showed how Boole's concepts of TRUE and FALSE could be used to represent the functions of switches in electronic circuits, and Boolean Algebra quickly became one of the mainstays of the digital designer's tool-chest.
Marquand, Carroll, and Karnaugh

Venn Diagrams were strongly based on the interrelationships between overlapping circles or ellipses. The first logic diagrams based on squares or rectangles were introduced in 1881 by Allan Marquand, a lecturer in logic and
ethics at Johns Hopkins University. Marquand's diagrams spurred interest among a number of other contenders, including one offering by an English logician and author, the Reverend Charles Lutwidge Dodgson. Dodgson's diagrammatic technique first appeared in his book The Game of Logic, which was published in 1886, but he is better known to us by his pen-name, Lewis Carroll, and as being the author of Alice's Adventures in Wonderland. Apart from anything else, these rectangular diagrams are of interest to us because they were the forerunners of a more modern form known as Karnaugh Maps. Karnaugh Maps, which were invented by Maurice Karnaugh in the 1950s, can be exceptionally useful for performing logical optimizations and simplifications.[1]
Lull, Leibniz, and Swift
Possibly the first person in the history of formal logic to use a mechanical device to generate (so-called) logical proofs was the Spanish theologian Ramon Lull. In 1274, Lull climbed Mount Randa in Majorca in search of spiritual sustenance. After fasting and contemplating his navel for several days, Lull experienced what he believed to be a divine revelation, and he promptly rushed back down the mountain to pen his famous Ars Magna. This magnum opus described a number of eccentric logical techniques, but the one of which Lull was most proud (and which received the most attention) was based on concentric disks of card, wood, or metal mounted on a central axis.
Figure 26-I" Ramon Lull's disks
[1] The use of Karnaugh Maps is discussed in exquisite detail in my book: Bebop to the Boolean Boogie (An Unconventional Guide to Electronics), HighText Publications, ISBN 1-878707-22-1.
Lull's idea was that each disk should contain a number of different words or symbols, which could be combined in different ways by rotating the disks (Figure 26-1). In the case of our somewhat jocular example, we can achieve 4 x 4 x 4 = 64 different sentences along the lines of "I love mice," "You hate cats," and "They eat frogs."

Of course Lull had a more serious purpose in mind, which was to prove the truth of everything contained within the Bible. For example, he used his disks to show that "God's mercy is infinite," "God's mercy is mysterious," "God's mercy is just," and so forth. Lull's devices were far more complex than our simple example might suggest, with several containing as many as sixteen different words or symbols on each disk. His masterpiece was the figura universalis, which consisted of fourteen concentric circles; the mind boggles at the range of combinations that could be generated by this device. Strange as it may seem to us, Lull's followers (called Lullists) flourished in the late middle ages and the renaissance, and Lullism spread far and wide across Europe.

Why is all of this of interest to us? Well, by some strange quirk of fate, Lull's work fired the imagination of several fascinating characters, including the German Baron Gottfried von Leibniz (Figure 26-2), who is famous for introducing a mechanical calculator called the Step Reckoner in 1671. Leibniz also strongly advocated the use of the binary number system, which is fundamental to the operation of modern computers. Although Leibniz had little regard for Lull's work in general, he believed there was a chance it could be extended to apply to formal logic. In a rare flight of fancy, Leibniz
Figure 26-2: Gottfried von Leibniz (Courtesy Maxfield & Montrose Interactive Inc.)
conjectured that it might be possible to create a universal algebra that could represent just about everything under the sun, including (but not limited to) moral and metaphysical truths. In 1666, at the age of 19, Leibniz wrote his Dissertatio de Arte Combinatoria, from which comes a famous quote describing the way in which he believed the world could be in the future: "If controversies were to arise," said Leibniz, "there would be no more need of disputation between two philosophers than between two accountants. For it would suffice to take their pencils in their hands, and say to each other: Let us calculate."
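As an aside, the 4 x 4 x 4 = 64 combinations from our jocular three-disk example earlier are trivial to enumerate mechanically. The following fragment (illustrative Python, with word lists invented to match the example sentences) plays the part of the disks:

from itertools import product

# One word list per disk; "rotating" a disk selects one word from its list.
disk1 = ["I", "You", "We", "They"]
disk2 = ["love", "hate", "eat", "see"]
disk3 = ["mice", "cats", "frogs", "dogs"]

sentences = [" ".join(words) for words in product(disk1, disk2, disk3)]
print(len(sentences))  # 64, i.e., 4 x 4 x 4
print(sentences[0])    # "I love mice"

One suspects Lull would have approved.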
Of course Lull also has his detractors (which is a kind way of saying that many people considered him to be a raving lunatic). In 1726, the Anglo-Irish satirist Jonathan Swift (Figure 26-3) wrote Gulliver's Travels,[2] which was originally intended as an attack on the hypocrisy of the establishment (including the government, the courts, and the clergy; Swift didn't like to restrict himself unduly), but which was so pleasingly written that it immediately became a children's favorite.[3]

[2] On the off chance you were wondering, Swift penned his great work nine years before the billiard cue was invented. Prior to this, players used to strike the balls with a small mace.

[3] It's a funny old world when you come to think about it.

In part III, chapter 5 of the tale, a professor of Laputa shows Gulliver a machine that generates random sequences of words. This device was based on a 20 foot square frame supporting wires threaded through wooden cubes, where each face of every cube had a piece of paper bearing a word pasted onto it. Students randomly changed the words using forty handles mounted around the frame. The students then examined the cubes, and if three or four adjacent words formed part of a sentence that made any sense, they were immediately written down by scribes. The professor told Gulliver that by means of this technique: "The most ignorant person at a reasonable charge, and with little bodily labor, may write books in philosophy, poetry, law, mathematics, and theology, without the least assistance from genius or study."

Figure 26-3: Jonathan Swift (Courtesy Maxfield & Montrose Interactive Inc.)

The point is that Swift is believed to have been mocking Lull's art when he penned this part of his story. (Having said this, computer programs have been used to create random poetry and music, which makes you wonder what Swift would have written about us.) In fact Swift continues to affect us in strange and wondrous ways to this day. When a computer uses multiple bytes to represent a number, there are two main techniques for storing those bytes in memory: either the most-significant byte is stored in the location with the lowest address (in which case we might say it's stored "big-end-first"), or the least-significant byte is stored in the lowest address (in which case we might say it's stored "little-end-first"). Not surprisingly, some computer designers favor one style while others take the opposite tack. This didn't really matter until people became interested in creating heterogeneous computing environments in which multiple diverse machines were connected together, at which point many acrimonious arguments ensued. In 1980, a famous
paper written by Danny Cohen entitled "On Holy Wars and a Plea for Peace" used the terms big-endian and little-endian to refer to the two techniques for storing data. These terms, which are still in use today, were derived from that part of Gulliver's tale whereby two countries go to war over which end of a hard-boiled egg should be eaten first: the little end or the big end!
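For readers who would like to see the two orderings side by side, the following fragment (illustrative Python, using the standard struct module; the value is an arbitrary example) shows how the same 32-bit number lands in memory under each convention:

import struct

value = 0x12345678

# ">" requests big-endian packing (most-significant byte first);
# "<" requests little-endian packing (least-significant byte first).
big    = struct.pack(">I", value)
little = struct.pack("<I", value)

print(big.hex())     # 12345678  -- the "big end" is stored first
print(little.hex())  # 78563412  -- the "little end" is stored first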
Carroll, Stanhope, and Jevons

Leaping from one subject to another with the agility of a mountain goat, we might also note that Lewis Carroll (Figure 26-4) enjoyed posing logical conundrums in many of his books, such as Alice's Adventures in Wonderland (1865), Through the Looking-Glass (1872), and The Hunting of the Snark (1876). For example, consider this scene from the Mad Hatter's tea party in Chapter 7 of Alice's Adventures in Wonderland:[4]
"Take some more tea," the March Hare said to Alice, very earnestly.

"I've had nothing yet," Alice replied in an offended tone: "so I can't take more."

"You mean you can't take less," said the Hatter: "it's very easy to take more than nothing."

And we would have to chastise ourselves soundly if we neglected the scene involving Tweedledum and Tweedledee in Chapter 4 of Through the Looking-Glass:

"I know what you're thinking about," said Tweedledum; "but it isn't so, nohow."

"Contrariwise," continued Tweedledee, "if it was so, it might be; and if it were so, it would be; but as it isn't, it ain't. That's logic."

Figure 26-4: Lewis Carroll (Courtesy Maxfield & Montrose Interactive Inc.)

You have to admit, these gems of information aren't to be found in your average technical book, are they? But once again we've wandered off the beaten path ("No," you cry, "tell me it isn't so!").
The world's first real logic machine (in the sense that it could actually be used to solve formal logic problems, as opposed to Lull's, which tended to create more problems than it solved) was invented in the early 1800s by the British scientist and statesman Charles Stanhope (third Earl of Stanhope). A man of many talents, the Earl designed a device called the Stanhope Demonstrator, which was a small box with a window in the top, along with two different colored slides that the user pushed into slots in the sides. Although this doesn't sound like much, it was a start,[5] but Stanhope wouldn't publish any details and instructed his friends not to say anything about what he was doing. In fact it wasn't until around sixty years after his death that the Earl's notes and one of his devices fell into the hands of the Reverend Robert Harley, who subsequently published an article on the Stanhope Demonstrator in 1879.

[5] There was more to the Stanhope Demonstrator than we can cover here.

Working on a somewhat different approach was the British logician and economist William Stanley Jevons, who, in 1869, produced the earliest model of his famous Jevons' Logic Machine. This device is notable because it was the first machine that could solve a logical problem faster than that problem could be solved without using the machine! Jevons was an aficionado of Boolean logic, and his solution was something of a cross between a logical abacus and a piano (in fact it was sometimes referred to as a "Logic Piano"). This device, which was about 3 feet tall, consisted of keys, levers, and pulleys, along with letters that could be either visible or hidden. When the operator pressed keys representing logical operations, the appropriate letters appeared to reveal the result.
Marquand, Burack, and Shannon

The next real advance in logic machines was made by Allan Marquand, whom we previously met in connection with his work on logic diagrams. In 1881, by means of the ingenious use of rods, levers, and springs, Marquand extended Jevons' work to produce the Marquand Logic Machine. Like Jevons' device, Marquand's machine could only handle four variables, but it was smaller and significantly more intuitive to use.[6]

[6] Following the invention of his logic machine, Marquand abandoned logical pursuits to become a professor of art and archeology at Princeton University.

Things continued to develop apace. In 1936, the American psychologist Benjamin Burack from Chicago constructed what was probably the world's first electrical logic machine. Burack's device used light bulbs to display the logical relationships between a collection of switches, but for some reason he didn't publish anything about his work until 1949. In fact the connection between Boolean Algebra and circuits based on switches had been recognized as early as 1886 by a teacher called Charles Peirce, but nothing substantial happened in this
area until Claude E. Shannon published his 1938 paper (as was discussed earlier in this chapter). Following Shannon's paper, a substantial amount of attention was focused on developing electronic logic machines. Unfortunately, interest in special-purpose logic machines waned in the 1940s with the advent of general-purpose computers, which proved to be much more powerful and for which programs could be written to handle formal logic.[7]
For your further reading pleasure, this topic was abstracted from the book: Bebop BYTES Back (An Unconventional Guide to Computers), ISBN 0-9651934-0-3, with the kind permission of Doone Publications (www.doone.com) (see also the order form in the back of this book).
[7] An example of one such program is logic synthesis, which can be used to translate high-level descriptions of circuits into optimized gate-level representations (see also Chapters 3 and 5).
Chapter 27:
Transistors of the Future
"Heterojunction, diamond, and plastic transistors"
In this chapter you will discover:
Faster is Better
Heterojunction Transistors
Diamond Substrates
Chip-On-Chip (COC)
And the Band Plays On
Faster is Better

If there is one truism in electronics, it is that "faster is better," and a staggering amount of research and development is invested in increasing the speed of electronic devices. Ultimately there are only two ways to increase the speed of transistor switches based on existing semiconductor technologies. The first is to reduce the size of the structures on the semiconductor, thereby obtaining smaller transistors that are closer together and use less power. The second is to use alternative semiconductor materials that inherently switch faster. For example, the band-gap effects associated with gallium arsenide's III-V valence structure mean that these transistors switch approximately eight times faster and use a tenth of the power of their silicon counterparts. However, gallium arsenide is a difficult material to work with, while silicon is cheap, readily available, and relatively robust. Additionally, the electronics industry has billions of dollars invested in silicon-based processes, and would be reluctant to leap into something outrageously new unless there were extremely compelling benefits associated with doing so.
For these reasons, speed improvements have traditionally been achieved by making transistors smaller. However, it is becoming apparent that we are reaching the end of this route using conventional technologies. At one time, the limiting factors appeared to be simple process limitations: the quality of the resist, the ability to manufacture accurate masks, and the features that could be achieved with the wavelength of ultraviolet light. Around 1990, when structures with dimensions of 1.0 microns first became available, it was believed that structures of 0.5 microns would be the effective limit that could be achieved with opto-lithographic processes, and that the next stage would be a move to X-ray lithography. However, there have been constant improvements in the techniques associated with mask fabrication, optical systems and lenses, and servo motors and positioning systems. Also, there have been significant advances in chemical engineering, such as chemically-amplified resists, in which the application of a relatively small quantity of ultraviolet light stimulates the formation of chemicals in the resist which accelerate the degrading process. This reduces the amount of ultraviolet light required to degrade the resist and allows the creation of finer features with improved accuracy. The combination of all these factors means that 0.25 micron processes started to come online in the early part of 1997, and it is now considered feasible to achieve structures as small as 0.1 microns by continuing to refine existing processes.

As we previously noted, the speed of a transistor is strongly related to its size, which affects the distance electrons have to travel. Thus, to enable transistors to switch faster, technologists have concentrated on a strategy referred to as scaling,
which means reducing the size of the transistors. However, while reducing the size of transistor structures, it is necessary to maintain certain levels of dopants to achieve the desired effect. This means that, as the size of the structures is reduced, it is necessary to increase the concentration of dopant atoms. Increasing the concentration beyond a certain level causes leakage, resulting in the transistor being permanently ON and therefore useless (you can't call it a "switch" if it's always ON). Thus, technologists are increasingly considering alternative materials and structures.
Heterojunction Transistors
An interface between two regions of semiconductor having the same basic composition but opposing types of doping is called a homojunction. For example, consider a generic NMOS transistor (Figure 27-1).
Figure 27-1: A standard NMOS transistor is based on homojunctions (showing the source, gate, and drain metal tracks, the N-type regions, the silicon dioxide layer, and the P-type silicon substrate)
Assume that we're dealing with a positive-logic system, in which a logic 1 value has a more positive potential than a logic 0. In this case, when a logic 1 value is presented to the gate terminal of the transistor, the gate terminal's positive potential (relative to a logic 0) repels the positively charged holes in the P-type material, thereby opening a channel and allowing current to flow between the source and drain terminals. In this type of transistor, all of the doped regions are formed in the same piece of semiconductor, so the junctions between the N- and P-type regions are homojunctions.

By comparison, the interface between two regions of dissimilar semiconductor materials is called a heterojunction. Homojunctions dominate current processes because they are easier to fabricate, but the interface of a heterojunction has naturally occurring electric fields which can be used to accelerate electrons, and
transistors created using heterojunctions can switch much faster than their homojunction counterparts of the same size. One form of heterojunction that is attracting a lot of interest is found at the interface between silicon and germanium. Silicon and germanium are in the same family of elements and have similar crystalline structures which, in theory, should make it easy to combine them (in practice, it's a little more difficult). A process currently being evaluated is to create a standard silicon wafer with doped regions, and to then grow extremely thin layers of a silicon-germanium alloy where required.

One of the most popular methods of depositing these layers is chemical vapor deposition (CVD), in which a gas containing the required molecules is converted into a plasma by heating it to extremely high temperatures using microwaves (where plasma is a gaseous state in which the atoms or molecules are dissociated to form ions). The plasma carries atoms to the surface of the wafer, where they are attracted to the crystalline structure of the substrate. This underlying structure acts as a template, and the new atoms continue to develop the structure to build up a layer on the substrate's surface.

Ideally, such a heterojunction would be formed between a pure silicon substrate and a pure layer of germanium. Unfortunately, germanium atoms are approximately 4 percent larger than silicon atoms, the resulting crystal lattice cannot tolerate the strains that develop, and the result is defects in the structure. In fact, millions of minute inclusions occur in every square millimeter, preventing the chip from working. Hence the solution of growing a layer of silicon-germanium alloy, which relieves the stresses in the crystalline structure, thereby preventing the formation of inclusions (Figure 27-2).
Figure 27-2: Heterojunctions between pure silicon and pure germanium have inclusions (a), but using a silicon-germanium alloy prevents this (b)
Silicon-germanium heterojunction devices offer the potential to create transistors which switch as fast as, or faster than, those on gallium arsenide, but which use significantly less power and are based on a robust silicon substrate. Additionally, such transistors can be produced on existing fabrication lines, thereby preserving the investment and leveraging current expertise in silicon-based manufacturing processes.

Diamond Substrates
As was noted in the previous section, there is a constant drive towards smaller, more densely packed transistors switching at higher speeds. Unfortunately, although smaller transistors individually use less power than their larger cousins, modern devices can contain millions of the little rascals, which use a significant amount of power and generate a substantial amount of heat. Thus, although we can see our way to building devices containing more than 100 million transistors by the year 2000, there's a strong chance that such devices would melt into a pool of incandescent slag if we were to use them at their full potential.

And so we come to diamond, which derives its name from the Greek adamas, meaning "invincible." Diamond is famous as the hardest substance known, but it also has a number of other interesting characteristics: it is a better conductor of heat at room temperatures than any other material (it can conduct five times as much heat as copper, which is the second most thermally-conductive material known), in its pure form it is a good electrical insulator, it is one of the most transparent materials available, and it is extremely strong and non-corrosive. For all of these reasons, diamond would form an ideal substrate material for multichip modules.

There are a number of methods for depositing or growing diamond crystals, one of the most successful being chemical vapor deposition (CVD), which we introduced in our earlier discussions on heterojunction transistors. In this CVD process, microwaves are used to heat mixtures of hydrogen and hydrocarbons into a plasma, out of which diamond films nucleate and form on suitable substrates. Although the plasma chemistry underlying this phenomenon is not fully understood, polycrystalline diamond films can be nucleated on a wide variety of materials, including metals such as titanium, molybdenum, and tungsten, ceramics, and other hard materials such as quartz, silicon, and sapphire.

CVD processes work by growing layers of diamond directly onto a substrate. A similar, more recent technique, known as chemical vapor infiltration (CVI), commences by placing diamond powder in a mold. Additionally, thin posts, or columns, can be pre-formed in the mold, and the diamond powder can be
deposited around them. When exposed to the same plasma as used in the CVD technique, the diamond powder coalesces into a polycrystalline mass. After the CVI process has been performed, the posts can be dissolved, leaving holes through the diamond for use in creating vias. CVI processes can produce diamond layers twice the thickness of those obtained using CVD techniques at a fraction of the cost.

An alternative, relatively new technique for creating diamond films involves heating carbon with laser beams in a vacuum. Focusing the lasers on a very small area generates extremely high temperatures, which rip atoms away from the carbon and also strip away some of their electrons. The resulting ions fly off and stick to a substrate placed in close proximity. Because the lasers are tightly focused, the high temperatures they generate are localized on the carbon, permitting the substrate to remain close to room temperature. Thus, this process can be used to create diamond films on almost any substrate, including semiconductors, metals, and plastics.

Last, but not least, in the late 1980s, a maverick inventor called Ernest Nagy de Nagybaczon invented a simple, cheap, and elegant technique for creating thin diamond films. Nagy's process involves treating a soft pad with diamond powder, spinning the pad at approximately 30,000 revolutions per minute, and maintaining the pad in close contact with a substrate. Although the physics underlying the process is not fully understood, diamond is transferred from the pad to form a smooth and continuous film on the substrate. Interestingly enough, Nagy's technique appears to work with almost any material on almost any substrate!

In addition to multichip modules, diamond has potential for a variety of other electronics applications. Because diamond is in the same family of elements as silicon and germanium, it can function as a semiconductor and could be used as a substrate for integrated circuits. In fact, in many ways, diamond would be far superior to silicon: it is stronger, it is capable of withstanding high temperatures, and it is relatively immune to the effects of radiation (the bane of components intended for nuclear and space applications). Additionally, due to diamond's high thermal conductivity, each die would act as its own heat sink and would rapidly conduct heat away. It is believed that diamond-based devices could switch up to 50 times faster than silicon and operate at temperatures over 500°C.

All of the techniques for forming artificial diamond described above result in films that come respectfully close, if not equal, to the properties of natural diamond in such terms as heat conduction. Unfortunately, these techniques all result in nanophase structures, where nanophase materials are a new form of matter
which was only recently discovered, in which small clusters of atoms form the building blocks of a larger structure. These structures differ from those of naturally occurring crystals, in which individual atoms arrange themselves into a lattice. In fact, it is believed that it may be possible to create more than thirty previously unknown forms of diamond using these techniques.

Substrates for integrated circuits require the single, large crystalline structures found only in natural diamond, but natural gems are relatively small, and today's semiconductor processes are geared to work with wafers 200 mm to 300 mm in diameter. Unfortunately, there are currently no known materials onto which a single-crystal diamond layer will grow, with the exception of single-crystal diamond itself (which sort of defeats the point of doing it in the first place). The only answer appears to be to modify the surface of the substrate onto which the diamond layer is grown, and many observers believe that this technology may be developed in the near future.

Chip-On-Chip (COC)
The inter-chip connections linking bare die on a multichip module are a source of fairly significant delays. One obvious solution is to mount the die (unpackaged chips) as closely together as possible, thereby reducing the lengths of the tracks and the delays associated with them. However, each die can only have a limited number of other dies mounted in close proximity on a 2D substrate. The solution is to proceed into three dimensions. Each die is very thin and, if they are mounted on top of each other, it is possible to have over a hundred die forming a 1 cm cube.

One problem with this chip-on-chip technique is the amount of heat that is generated, which drastically affects the inner layers forming the cube. However, this problem would be alleviated if the die were constructed out of diamond as discussed above: first, because diamond devices have the potential to operate at temperatures up to 500°C; and second, because diamond is such a good conductor of heat. Furthermore, the fact that diamond is one of the most transparent materials available would facilitate inter-chip communication throughout the 3D cube using surface-emitting laser diodes and photo-transistors constructed alongside the standard transistors on the die. Thus, the ability to create consistent wafer-sized, single-crystal diamond films would revolutionize electronics as we know it today. If it does prove possible to create such films then, in addition to being "a girl's best friend," diamonds would quickly become "an electronics
engineer's biggest buddy."
And the Band Plays On
While the idea of diamond transistors is extremely interesting, we should note that researchers are experimenting with a variety of other potentially exciting techniques. For example, scientists in France have developed plastic transistors; that is, transistor-like switches that can be created by screen-printing multiple layers of polymers with different electrical characteristics. Potential applications for these devices range from roll-up computer displays to animated illustrations in comics, newspapers, and textbooks.

Another interesting phenomenon is that of electromagnetic transistor fabrication. For some time it has been known that the application of strong electromagnetic fields to special compound semiconductors can create structures that behave like transistors. The original technique was to coat the surface of a semiconductor substrate with a layer of dopant material, and to then bring an extremely strong, concentrated electromagnetic field into close proximity. The theory behind this technique was that the intense field caused the electromigration of the dopant into the substrate. However, much to everyone's surprise, it was later found that this process remained effective without the presence of the dopant! Strange as it may seem, nobody actually understands the mechanism that causes this phenomenon, but some physicists suspect that the strong electromagnetic fields cause microscopic native defects in the crystals to migrate through the crystal lattice and cluster together.

Yet another possibility is that of switches based on organic molecules called proteins.[1] Organic molecules have a number of useful properties, not the least being that their structures are intrinsically "self healing" and reject contamination. Also, in addition to being extremely small, many organic molecules have excellent electronic properties. Unlike metallic conductors, they transfer energy by moving electron excitations from place to place rather than relocating entire electrons. Although today's silicon transistors are extremely small (with dimensions measured in fractions of a millionth of a meter), each silicon transistor is still composed of millions of atoms.
[1] A protein is a complex organic molecule formed from chains of amino acids, which are themselves formed from combinations of certain atoms, namely: carbon, hydrogen, nitrogen, oxygen, usually sulfur, and occasionally phosphorus or iron. Additionally, the chain "folds in on itself," forming an extremely complex 3D shape.
By comparison, protein switches are thousands of times smaller; they switch orders of magnitude faster; and they consume a fraction of the power of their silicon counterparts. Thus far, researchers have only managed to construct individual switches and a few primitive logic functions using these techniques. However, the first semiconductor transistor was only constructed fifty years ago[2] as I pen these words, and the pace of development has accelerated dramatically since those days, so who knows what the future might bring?
The topic in this chapter was published in a condensed form under the title Transistors of the Future, in the July 17th, 1997 issue of EDN (www.ednmag.com), and is reproduced in its original form here with their kind permission. For your further reading pleasure, portions of this article were abstracted from the book: Bebop to the Boolean Boogie (An Unconventional Guide to Electronics), ISBN 1-878707-22-1, with the kind permission of HighText Publications (www.hightext-publications.com) (see also the order form in the back of this book).
[2] Bell Laboratories in the United States began research into semiconductors in 1945, and physicists William Shockley, Walter Brattain, and John Bardeen succeeded in creating the first point-contact germanium transistor on the 23rd December, 1947 (they took a break for the Christmas holidays before publishing their achievement, which is why some reference books state that the first transistor was created in 1948).
Chapter 28:
Protein-Based Switches and Nanotechnology
"It's alive, it's alive!"
In this chapter you will discover:
Running Head-First Into a Brick Wall
The Humble Water Molecule
The Hydrogen Bond
Protein Switches
Nanotechnology
Running Head-First Into a Brick Wall
Reducing the size of a transistor both increases the speed with which it switches and decreases its power consumption. In early 1997, the processes used to create integrated circuits were refined to the point that structures with feature sizes of 0.25 um (millionths of a meter) became commercially available. Processes that can create structures of 0.18 um are in the experimental stage, and it is anticipated that current technologies can be extended to create structures as small as 0.10 um. At this point, it appears that the opto-lithographic techniques used to create the majority of today's integrated circuits will run head-first into a "brick wall," and device vendors will have to turn to alternative approaches to improve performance; for example, the heterojunction transistors or diamond substrates introduced in the previous chapter. However, some technologists are considering more esoteric techniques, such as protein-based switches and nanotechnology.
The Humble Water Molecule

Before pondering proteins, it is instructive to reflect on the way in which humble water molecules are formed and interact. Matter, the stuff that everything is made of, is formed from atoms. The heart of an atom, the nucleus, is composed of protons and neutrons, and is surrounded by a "cloud" of electrons.[1] It may help to visualize electrons as orbiting the nucleus in the same way that the moon orbits the earth (Figure 28-1). In the real world things aren't quite this simple, because electrons behave both as particles and as wave functions (which means that, generally speaking, an individual electron can only be said to have a "probability" of being in a certain place at any particular time), but the concept of orbiting electrons serves our purpose for these discussions.
Figure 28-1: A helium atom consists of two protons (P), two neutrons (N), and two electrons (e)

[1] Protons, neutrons, and electrons are themselves composed from elementary particles called "quarks," which have been aptly described as: "The dreams that stuff is made of!"
Each proton carries a single positive (+ve) charge and each electron carries a single negative (-ve) charge. The neutrons are neutral and act like glue, holding the nucleus together and resisting the natural tendency of the protons to repel each other. Protons and neutrons are approximately the same size, while electrons are very much smaller. If baseballs were used to represent the protons and neutrons, the nucleus of the helium atom would be approximately the same size as a basketball and, on the same scale, medium-sized garden peas could be used to represent the electrons. In this case, the diameter of each electron's orbit would be approximately that of 250 American football fields (excluding the end zones). Thus, the majority of an atom consists of empty space, so if all the empty space were removed from the atoms forming a camel, it would be possible for that camel to pass through the eye of a needle!

The number of protons in the nucleus determines the type of the element; for example, hydrogen has one proton, helium two, lithium three, and so forth. Atoms vary greatly in size, from hydrogen with its single proton to those containing hundreds of protons. The number of neutrons does not necessarily equal the number of protons, and there may be several different flavors, called isotopes, of the same element differing only in their number of neutrons; for example, hydrogen has three isotopes with zero, one, and two neutrons respectively.[2]

Left to their own devices, each proton in the nucleus will have a complementary electron. If additional electrons are forcibly added to an atom, the result is a negative ion of that atom; correspondingly, if electrons are forcibly removed from an atom, the result is a positive ion. In an atom where each proton is balanced by a complementary electron, one would assume that the atom would be stable and content with its lot in life, but things are not always as they seem. Although every electron contains the same amount of negative charge, they orbit the nucleus at different levels known as quantum levels or electron shells. Each electron shell requires a specific number of electrons to fill it; the first shell requires two electrons, the second requires eight, and so forth. Thus, as helium atoms contain two protons and two electrons they are electrically balanced, and as helium's two electrons completely fill its first electron shell, this atom is very stable. Under normal circumstances helium atoms do not go around casually making molecules with every other atom they meet, so helium is known as one of the inert or noble gases.
[2] Hydrogen atoms containing one neutron are referred to as deuterium (from the Greek deuteros, meaning "secondary"), while hydrogen atoms with two neutrons are called tritium (from the Greek tritos, meaning "third"). Tritium, which is relatively rare, is a radioactive isotope having a half-life of 12.5 years. (Strictly speaking, it is deuterium that forms the "heavy water" used in certain atomic reactors.)
By comparison, although a hydrogen atom (chemical symbol H) contains both a proton and an electron and is therefore electrically balanced, it is still not completely happy. Given a choice, a hydrogen atom would prefer to have a second electron to fill its first electron shell. However, simply adding a second electron is not the solution, because although the first electron shell would now be filled, the extra electron would result in an electrically unbalanced negative ion. Similarly, an oxygen atom (chemical symbol O) contains eight protons and eight electrons. Two of the electrons are used to fill the first electron shell, which leaves six left over for the second shell. Unfortunately for oxygen, its second shell would ideally prefer eight electrons to fill it. Obviously this is a bit of a stumper, but the maker of the universe came up with a solution: atoms can use the electrons in their outermost shell to form bonds with other atoms. The atoms share each other's electrons, thereby forming more complex structures. One such structure is called a molecule; for example, two hydrogen atoms can bond together with an oxygen atom to form a water molecule (chemical symbol H2O) (Figure 28-2).
Figure 28-2: A water molecule consists of two hydrogen atoms lending electrons to, and borrowing electrons from, an oxygen atom

These types of bonds are called valence bonds. Each hydrogen atom lends its electron to the oxygen atom and at the same time borrows an electron from the oxygen atom. This leads both of the hydrogen atoms to believe they have two electrons in their first electron shell. Similarly, the oxygen atom lends two electrons (one to each hydrogen atom) and at the same time borrows two electrons (one from each hydrogen atom). When the two borrowed electrons are added to the original six in the oxygen atom's second shell, this shell appears to contain the eight electrons necessary to fill it. This process can be compared to two circus jugglers passing objects between themselves, where "the quickness of the hand deceives the eye," and an observer can't tell how many objects are
~he Hydrogen ~ond But electrons are not distributed evenly in a water molecule, because the oxygen atom is a bigger, more robust fellow which grabs more than its fair share. The result is that the oxygen atom ends up with an overall negative charge, while the two hydrogen atoms are left feeling somewhat on the positive side. The resulting "pressure" from the electron surfeit on the oxygen atom pushes the two hydrogen atoms together, but the fact that each hydrogen atom has a slight positive charge causes them to repel each other, resulting in a bond angle between the hydrogen atoms of approximately 105 degrees. Furthermore, this unequal distribution of charge means that the hydrogen atoms are attracted to anything with a negative bias; for example, the oxygen atom of another water molecule. Although the strength of the resulting bond, known as a hydrogen bond, is weaker than the bond between the hydrogen atom and its "parent" oxygen atom, hydrogen bonds are still relatively respectable. When water is cooled until it freezes, its resulting crystalline structure is based on these hydrogen bonds. Even in water's liquid state, the randomly wandering molecules are constantly forming hydrogen bonds with each other. These bonds persist for a short time until another water molecule clumsily barges into them and knocks them apart. From this perspective, a glass of water actually contains billions of tiny ice crystals that are constantly forming and being broken apart again. Similarly, the reason why ice is slippery is that the hydrogen bonds at the surface are constantly breaking away from the main body and then reattaching themselves, so the surface of a block of ice may be considered to consist of boiling water! (Water is unusual in many respects, such as the fact that it is slippery in its frozen state; most other frozen compounds don't exhibit this characteristic.) The reason hydrogen bonds are of interest to us here is that similar effects occur in larger molecules such as proteins.
Protein Switches Before considering protein switches, it's important to understand that this concept doesn't imply anything gross like liquidizing hamsters to extract their proteins! Out of all the elements nature has to play with, only carbon, hydrogen, oxygen, nitrogen, and sulfur are used to any great extent in living tissues, along with the occasional smattering of phosphorous, minuscule portions of a few choice metals,
376
Designus Maximus Unleashed!
and elements like calcium for bones. The basic building blocks of living tissues are twenty or so relatively simple molecules called amino acids. For example, consider three of the amino acids called threonine, alanine, and ser/ne (Figure 28-3).
Threonine
Alanine
Serine
H
H
I
H--C--H
I H--C--O~H I H ~\ N-- C--C --O - - H Hj I II H O
I
H
0
H--C--H
H ~ C --O
I
H
I
~ N - - C - - C --O - - H Hj I II H O
I
H
I
N--C - - C ~ O ~ H Hj I II H O
Figure 28-3: Threonine, alanine, and serine are three of the twenty or so biological building blocks called amino acids

These blocks can join together to form chains, where the links between the blocks are referred to as peptide bonds, which are formed by discarding a water molecule (H2O) from adjacent COOH and NH2 groups (Figure 28-4).
Figure 28-4: Amino acids can link together using peptide bonds to form long polypeptide chains

Proteins consist of hundreds or thousands of such chains of amino acids. Note that the distribution of electrons in each amino acid varies depending on the size of that acid's constituent atoms, leaving areas that are slightly more positively or negatively charged (similar to a water molecule). The linear chain shown in
Figure 28-4 is known as the primary structure of the protein, but this chain subsequently coils up into a spring-like helix, whose shape is maintained by the attractions between the positively and negatively charged areas in the chain. This helix is referred to as the protein's secondary structure, but there's more, because the entire helix subsequently "folds" up into an extremely complex three-dimensional structure, whose shape is once again determined by the interactions between the positively and negatively charged areas on the helix. Although this may seem to be arbitrarily random, this resulting tertiary structure represents the lowest possible energy level for the protein, so proteins of the same type always fold up into identical (and stable) configurations.

As we've already discussed, atoms can bond together to make a variety of structures. In some cases, such as the amorphous crystalline structures seen in metals, the electrons are only weakly bonded to their "parent" atoms, and they can wander between atoms relatively freely; thus, metals are considered to be good conductors of electricity. In other cases, such as rubber, the electrons are tightly bonded to their "parent" atoms, so we class these structures as "insulators." In reality, there's no such thing as an insulator: any material will conduct if a sufficiently strong electric field is applied to it. For example, if you are foolish enough to fly a kite in a thunderstorm whilst wearing rubber boots, it would only take one bolt of lightning to convince you that your rubber boots will indeed conduct! (DO NOT TRY THIS UNDER ANY CIRCUMSTANCES!!!) But we digress. For our purposes we collectively choose to class some materials as conductors and others as insulators.

Similarly, some proteins will conduct electrons relatively easily (conductors) while others will resist a lot harder (insulators). Also, in the case of certain proteins, it's possible to coerce an electron to move to one end of the protein or the other, where it will remain until it's coerced back again (note that the term "end" is somewhat nebulous in this context). Thus, a protein of this type can essentially be used to store and represent a logic 0 or a logic 1 value based on the location of this electron.[3] Similarly, it's possible for some protein structures to be persuaded to act in the role of switches. In the case of traditional semiconductor-based transistors, even when one considers structures measured in fractions of a millionth of a meter, each transistor consists of millions upon millions of atoms.

[3] In the case of some proteins, rather than physically moving an electron from one "end" to the other, it's possible to simply transfer an excitation from one electron to another. This requires far less power and occurs much faster than moving the electron itself, but it's a little too esoteric a concept to explore in detail here.
3 78 Designus Maximus Unleashed! based switches and registers can be constructed using a few thousand atoms, which means that they are thousands of times smaller, thousands of times faster, and consume a fraction of the power of their semiconductor counterparts.
Nanotechnology

The main problem with protein-based switches is their microscopic size, which makes it extremely difficult to manipulate them and assemble them into useful structures. In 1959, the legendary physicist Richard Feynman gave a visionary talk in which he described how sub-microscopic computers might one day be constructed. Feynman's ideas have subsequently been extended into a field now known as nanotechnology. In fact there are a number of different flavors of nanotechnology, but one of the more common concepts is based on the way in which biological systems operate: specifically, the way in which complex proteins called enzymes act as biological catalysts to assemble large, complex molecules from smaller molecular building blocks.

Imagine a "soup" consisting of large quantities of many different types of molecules, two of which, Ma and Mb, may be combined to form larger molecules of type Mab (Figure 28-5).
Figure 28-5: An enzyme can be used to combine molecules Ma and Mb to form molecule Mab
A catalyst is a substance that initiates a chemical reaction under different conditions (such as lower temperatures) than would otherwise be possible (the catalyst itself remains unchanged at the end of the reaction). The two molecules Ma and Mb won't bond together by themselves, because the process of bonding requires a small "nudge" of energy. However, the process of forming the bond releases a relatively large amount of energy, thereby leaving the resulting Mab molecule in a lower, more stable energy configuration. The role of the enzyme is to provide the initial "nudge" of energy that initiates the reaction (the enzyme recovers this energy from the energy that is released when Ma and Mb bond together), and some natural enzymes can process more than half a million molecules a second.

There are numerous natural enzymes, each dedicated to the task of "matchmaking" for two of their favorite molecules. As for standard protein molecules, the surface of an enzyme is also an extremely complex three-dimensional shape with a jigsaw-puzzle patchwork of positively and negatively charged areas. The enzyme floats around (Figure 28-5a) until it bumps into a molecule of type Ma, to which it bonds (Figure 28-5b). The enzyme then continues on its trek until it locates a molecule of type Mb. When the enzyme bonds to molecule Mb (Figure 28-5c), it orientates Mb in exactly the right way to complete the puzzle with molecule Ma and provides the initial energy required to initiate the bonding process. The resulting bond between Ma and Mb is far stronger than their bonds to the enzyme. Furthermore, the act of bonding Ma to Mb redistributes the charge across the surface of the ensuing Mab molecule, which results in the enzyme being repelled by this molecule. So the enzyme and its offspring part ways (Figure 28-5d), leaving the enzyme free to search for two more molecules and start the process all over again. The saga continues, because another, larger enzyme may see its task in life as bringing Mab together with yet another molecule Mcd. And so it continues, onwards and upwards, until the final result, whatever that may be, is achieved.

As our ability to create "designer molecules" increases, it becomes increasingly probable that we will one day be able to create "designer enzymes." This would enable us to mass-produce structures similar to "designer proteins" that could act as alternatives to semiconductors. At a more sophisticated level, it may be possible for such a process to directly create the requisite combinations of protein switches as self-replicating logic structures across the face of a two-dimensional substrate, and eventually to extend the process to create three-dimensional logic arrays. It is also conceivable that similar techniques could be used to assemble non-organic structures such as microscopic electromechanical artifacts. All that would be required (he said casually) would be for the individual components to be shaped in such a way that naturally occurring electrostatic fields would cause them to form bonds when they were brought together with their soul mates. In fact, this is one step along the path towards molecular-sized robots known as nanobots.

Taken to extremes, the discipline of electronics in the future may not involve the extreme temperatures, pressures, and noxious chemicals that are in vogue today. Instead, electronics may simply involve "cook book" style recipes, in which a number of nanobots are added to a container of chemicals, which they initially use to self-replicate until some required number is achieved, at which point they commence to build other structures (including, perhaps, larger nanobots).

Although some of the above may appear to be a little far-fetched, nature manages to use these processes to create everything from bacteria and insects to aardvarks and elephants, and there is nothing theoretically stopping humans from using similar techniques. Thus far, scientists have experimented with individual protein switches and have even constructed a number of simple logical functions. This means that protein-switch technology is at roughly the same state as was the electronics industry when the first transistor was fabricated in 1947. In the case of nanotechnology, a number of organizations are now actively investigating these techniques, so if protein switches and nanotechnology develop at anything like the same pace as semiconductors, computer designers can look forward to some interesting developments in the coming years.
The topic in this chapter was published in an edited form in Electronic Design & Technology Network (EDTN) in June 1997, and is reproduced in its original form here with their kind permission. EDTN is a web-only publication for electronics engineers, and provides know-how, news, and data sheet specifications for a broad range of technologies. It's actually pretty cool, and well worth your checking them out at www.edtn.com
Chapter 29:
Interrupts and Interrupt Handling

"Excuse me, but do you mind if I cut in?"

In this chapter you will discover:
Beware of Low-Flying Grockles ........ 382
Using a Polling Strategy ........ 382
The Interrupt Request (IRQ) Input ........ 387
Non-Maskable Interrupts (NMIs) ........ 390
Software Interrupts (SWIs) ........ 391
The HALT Instruction ........ 392
The Interrupt Acknowledge (IACK) Output ........ 393
Interrupt-Driven Input/Output ........ 393
Handling Multiple Interrupt Request Signals ........ 395
Priority Encoding ........ 396
Beware of Low-Flying Grockles
Let's suppose that you've just taken possession of a brand-new car equipped with an on-board computer, whose tasks include closing the windows (when instructed to do so) and activating the airbag (in the event of a crash). Now assume that you're merrily cruising down the highway and you flick the "Close Window" button, which causes the computer to enter a loop saying "Is the window closed yet? If not, I'll keep on closing it." Suddenly, as if from nowhere, a gregarious gaggle of rampaging grockles appear! Swerving to avoid them you rocket off the road, screech across a farmyard, and collide rather forcibly with an extremely large pile of manure. It's fortunate indeed that you're wearing your seat belt, because your airbag sadly fails to make an appearance (your computer is still looping around saying "Is the window closed yet? ..."). Thanking your lucky stars, you reach for the steaming-hot coffee that you recently acquired from a well-known purveyor of fast foods. But at the selfsame moment that you raise the coffee to your lips, the window finishes closing and the computer finally gets around to checking what's happening in the outside world. Realizing that there's a problem, the computer immediately activates the airbag, you unexpectedly find yourself taking a somewhat larger gulp of coffee than was your original intent, and you're well on the way to having another "one of those days." Unfortunately, this scenario is not as uncommon (in general terms) as you might assume, because it can be tricky to ensure that a computer is made aware of external events in a timely manner so as to handle them appropriately.
Using a Polling Strategy

Assume that you have a rudimentary QWERTY keyboard device that's plugged into one of your computer's input ports (we'll consider an 8-bit data bus and 8-bit ports for these discussions). This keyboard contains an 8-bit latch that stores the ASCII code associated with whichever key you last pressed. Also, the act of reading from the keyboard automatically clears the latch to contain a null code of $00 (where '$' indicates a hexadecimal value). Now assume that you create a simple program to loop around reading characters from the QWERTY keyboard and writing them to some form of output display (Figure 29-1).

Figure 29-1: Flowchart for a program to monitor the keyboard and display the codes associated with keys as they're pressed

Further assume that, whilst performing this task, you also want your computer to act as a burglar alarm that
monitors the state of a switch connected to the front door of your house. For the purposes of these discussions, let's say that opening the door will cause the switch to close, in which case we want the computer to respond by ringing a bell. One way for the CPU to "see" things occurring in the outside world is via its input ports. On this basis, we might decide to connect our burglar alarm switch to a bit on one of these ports, say bit[0], and to connect the other bits to logic 0 (Figure 29-2).
Figure 29-2: One might connect an external signal (such as a burglar alarm switch) to an input port
Note that we've omitted the circuitry that generates the input port enable signals for simplicity. The way in which this particular circuit is configured means that when the switch is OPEN (meaning the door is closed), bit[0] of the input port will be presented with a logic 1 value via the pull-up resistor. By comparison, when the switch is CLOSED (meaning the door is open), bit[0] of the port will be presented with a logic 0 value (we could easily have wired the switch such that its OPEN and CLOSED positions were represented by logic 0 and logic 1 values, respectively ... but we didn't).

We now have to modify our program to check the status of our burglar alarm switch, but this may not be quite as simple as it first appears, because even a rudimentary task like this one offers myriad opportunities for mistakes (Figure 29-3). Our original program (Figure 29-3a) loops around reading from the input port connected to the keyboard until it sees a non-zero value indicating that a key has been pressed. When a non-zero value is detected, the program writes that value to the display and then returns to look for the next key (remember that the act of reading a value from this particular input port automatically clears the latch in the keyboard).
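Before moving on, here's the switch test of Figure 29-2 rendered as a minimal C fragment. The port address and the volatile-pointer idiom are illustrative assumptions (in reality the address decoding logic would dictate the value), but the bit-level logic follows the wiring just described:

```c
#include <stdint.h>

/* Hypothetical address for the input port of Figure 29-2; a real system's
   (omitted) address decoding logic would dictate the actual value. */
#define ALARM_PORT ((volatile uint8_t *)0x8001u)

/* With the pull-up wiring described above, bit[0] reads as logic 1 while
   the switch is OPEN (door closed), and logic 0 when it is CLOSED. */
static int door_is_open(void)
{
    return (*ALARM_PORT & 0x01u) == 0u;
}
```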
Figure 29-3: Augmenting the program to monitor the switch and ring the bell is not as easy as it may at first appear: (a) original flowchart; (b) not a good idea; (c) a better solution
In our first-pass solution (Figure 29-3b), we might simply add the test to read the alarm switch onto the end of our original program, but this isn't a particularly good idea ... can you see why? The problem is that this version of the program only checks the state of the switch after you activate a key on the keyboard. So while you're pondering which key to press, a burglar could have entered your abode and be creeping up behind you ... Imagine the scene when you eventually press a key on the keyboard: the bell rings, you leap to your feet shouting "Don't panic, we've got a burglar, don't panic," you turn around, and there he is! (This may well be the time when you contemplate investing in a better alarm system.)

As an alternative scenario, the burglar could slip into your house and close the door while you're pondering over the keyboard. In this case, the alarm won't be sounded even after you've pressed a key, because the door will be closed by the time the computer finally comes to look at it. So now you've got a burglar roaming wild and free throughout your house, while your computer is essentially saying: "Don't worry about a thing my little fruitbat, because the front door is safely closed."

Jocularity aside, this latter point is quite important. A key aspect of external signals, such as the switch forming our burglar alarm, is that they're typically asynchronous. This means that they can occur at any time and are not synchronized to the computer system's clock, which therefore means that we usually have to latch such signals. In this particular scenario, we could place a latch between the switch and the port (Figure 29-4).
Figure 29-4: Latching the external signal allows the CPU to detect actions that happened in the past

The act of opening the door will change the state of the latch, which will retain this new state even when the door is closed again. Thus, when our program eventually manages to limp around to check the state of the door, the value in the latch will tell it that the door is either currently open or has been opened. (We could also arrange the circuit such that the act of reading from this port would automatically reset the latch.)

Unfortunately, even if we did add a latch to our circuit, the program represented by Figure 29-3b still would not warn us that the door has been opened until we press a key on the keyboard, which makes it next to useless as a burglar alarm. The solution is to check the state of the door every time we go around the loop that tests to see if a key has been pressed (Figure 29-3c).

Thus we see that ensuring the CPU recognizes the door's opening in a timely manner does require a little thought, and the problems can only become more pronounced as we increase the number of signals from the outside world. For example, we might decide to add burglar alarm switches to all of the doors and windows in our house. We might also decide to connect a few smoke detectors to our computer, and perhaps even add a sensor to warn us if the Jacuzzi in the master bedroom starts to overflow. Thus, we now have to perform a process known as polling (meaning surveying or sampling), which requires us to modify our program to check for each of these signals in turn (Figure 29-5).

One thing we now have to consider is the relative priority of the various signals. For example, unlike the outline presented in Figure 29-5b, we might decide that checking whether or not the house was on fire takes precedence over testing to see if a key had been pressed on the keyboard. In fact, we have to prioritize all of our external signals and determine the order in which they should be evaluated.
Figure 29-5: A polling strategy requires the program to sample each of the external signals in turn: (a) original program without any extraneous tests; (b) new program that checks lots of external signals
Another consideration is that our original program only contained one simple loop, but this could be a small portion of a larger program containing a multitude of loops and sub-loops (Figure 29-6). In this case we'd probably bundle all of the switch/sensor tests into a subroutine, and then ensure that we called this subroutine at the appropriate point (or points) from within each loop.
Figure 29-6: A polling strategy becomes increasingly complex as more local loops are added to the main program (our test subroutine probably needs to be called from all of the local loops)

The end result is that, if we're not careful, we might spend more time thinking about when to call the tests for the external signals than we do creating the rest of
the program. Also, if we decide to add any new switches or sensors (or remove any existing ones), then we will have to re-prioritize everything and update every program that includes these tests. Last but not least, our programs might expend more effort checking the switches and sensors than they do performing the tasks for which they are predominantly intended. This can be extremely inefficient, especially in those cases when the external conditions occur infrequently (how many times do we really expect the house to catch fire on an average day?). Thus, we have two somewhat contradictory requirements, in that we don't want our programs to spend the bulk of their time checking for conditions that will rarely transpire, but when something important does arise (such as a smoke detector being activated), then we want the computer to respond quickly and effectively.
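To make the bookkeeping concrete, here's a rough C sketch of such a polling scheme. The chapter's own examples are written in assembly, so treat this as pseudocode in C clothing; every address, bit assignment, and handler name below is invented for the illustration:

```c
#include <stdint.h>

#define KEYBOARD_PORT ((volatile uint8_t *)0x8000u)  /* assumed address */
#define SENSOR_PORT   ((volatile uint8_t *)0x8001u)  /* assumed address */

#define DOOR_BIT    0x01u  /* invented bit assignments */
#define SMOKE_BIT   0x02u
#define JACUZZI_BIT 0x04u

void sound_fire_alarm(void);   /* handlers left to the imagination */
void ring_burglar_bell(void);
void drain_jacuzzi(void);
void display_char(uint8_t c);

/* Check every external signal in turn, most important first:
   fire beats burglars, and burglars beat soggy carpets. */
static void poll_sensors(void)
{
    uint8_t sensors = *SENSOR_PORT;

    if (sensors & SMOKE_BIT)    sound_fire_alarm();
    if (!(sensors & DOOR_BIT))  ring_burglar_bell();  /* switch pulls bit low */
    if (sensors & JACUZZI_BIT)  drain_jacuzzi();
}

void main_loop(void)
{
    for (;;) {
        uint8_t key = *KEYBOARD_PORT;  /* reading clears the latch to $00 */
        if (key != 0x00u)
            display_char(key);
        poll_sensors();  /* every local loop must remember to do this! */
    }
}
```

Note how the checks in poll_sensors() are written in priority order, and how easy it would be for one of a larger program's many local loops to simply forget to call it.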
The Interrupt Request (IRQ) Input

Let's take a step back and re-think exactly what it is we're trying to do. We wish to create a program that can concentrate on the task for which it was intended, without being obliged to constantly check to see what's happening in the outside world. However, when an external situation meriting action does arise, then we want the computer's response to be fast and furious. Bearing this in mind, let's return to our original program that loops around reading characters from our QWERTY keyboard and writing them to our display. Let's also regress to having a single external signal to worry about, such as the burglar alarm switch on the front door. What we really want is for our program to spend the bulk of its time dealing with the keyboard, and for the act of opening the door to interrupt whatever the computer is doing and force it to do something else. To facilitate this sort of thing, CPUs are equipped with a special interrupt request input, or IRQ for short (some CPUs have multiple IRQs, but we'll leave this point for later) (Figure 29-7).
Figure 29-7: An external signal (such as a burglar alarm) can be connected directly into the CPU's IRQ input
When the IRQ enters its active state, this fact is stored in a special latching circuit inside the CPU, thereby circumventing the problem of the IRQ going inactive before the CPU manages to check it (this is similar to the external latch we considered in Figure 29-4, except that this one is inside the CPU). In some CPUs this interrupt latch can be programmed to look for active-high (logic 1) or active-low (logic 0) signals, but many simply assume that the IRQ's active state is a logic 0.

The CPU also contains a special status flag called the interrupt mask, which is used to enable or disable interrupts, and which can be set or cleared under program control. By default, the CPU powers up with the interrupt mask in its inactive state (which we will assume to be a logic 0, see sidebar). Thus, in order for the CPU to be able to "see" an IRQ, the programmer has to use a SETIM ("set interrupt mask") instruction to place the mask in its active state. Similarly, if the programmer subsequently wishes to prevent the CPU from responding to IRQs, then he or she can use a CLAIM ("clear interrupt mask") instruction to return the mask to its inactive state.

Sidebar: The interrupt mask in some microprocessors is considered to be an interrupt disable, which means that a logic 0 in this status flag enables interrupts while a logic 1 disables them. By comparison, in other microprocessors the interrupt mask is considered to act as an interrupt enable, which means that a logic 0 in this status flag disables interrupts while a logic 1 enables them. The bottom line is that there is no standard way of treating this particular flag, so the way it works is at the discretion of the designers.
The CPU checks the state of the interrupt mask every time it completes a machine code instruction (that is, an opcode). If the mask is inactive the CPU simply proceeds to the next instruction; but if the mask is active, then the CPU takes a peek inside the interrupt latch to determine whether or not an interrupt has been requested (Figure 29-8).
When the CPU does decide to service an interrupt it has to perform a sequence of tasks. At a minimum it has to push a copy of the current contents of the program counter onto the top of the stack, followed by a copy of the contents of the status register. The CPU next places the interrupt mask into its inactive state, thereby preventing any subsequent activity on the IRQ input from confusing the issue (we'll discuss this in more detail later). Some CPUs also push copies of one or more of the other internal registers (such as the accumulator and the index register) onto the stack, because there's a good chance that the act of servicing the interrupt will modify the contents of these registers. If the CPU doesn't do this automatically, then it's up to the programmer to save the contents of any registers he or she deems to be important as soon as the interrupt service routine is entered.
Figure 29-8: The CPU checks to see if it needs to service an interrupt after executing every instruction. The flowchart boils down to four steps:
1. Read the next opcode and execute the instruction.
2. Once the instruction has been executed, check the state of the interrupt mask. If the mask is inactive then loop back, otherwise ...
3. ... check the state of the interrupt latch to see if an interrupt has been requested. If the latch is inactive then loop back, otherwise ...
4. ... call an interrupt service routine to deal with the interrupt. When this routine terminates, return to the main program to process the next instruction.

But what is an interrupt service routine and where might one be found? In fact this routine, which is very similar to a subroutine, is a sequence of instructions that has been created by the programmer and stored somewhere in the computer's memory. As soon as the CPU has placed copies of the program counter and status register (and any other registers) on the top of the stack, it loads a hard-wired address into the program counter, then uses this address to point to a location in memory (Figure 29-9).
Figure 29-9: A hard-wired address points to the interrupt vector, which, in turn, points to the interrupt service routine
The location in memory identified by the hard-wired address contains the first byte of yet another address called the interrupt vector (IV), which, in turn, points to the first instruction in the interrupt service routine. Thus, the CPU effectively uses its hard-wired address to perform an unconditional jump using the indirect addressing mode, which eventually leaves it at the beginning of the interrupt service routine. Note that the interrupt vector may be stored in either the RAM or the ROM, as can the interrupt service routine; it all depends on what the system is being used for.

Once the interrupt service routine has performed whatever actions are required to deal with the interrupt, it can be terminated using an RTI ("return from interrupt") instruction. This is similar to an RTS ("return from subroutine") instruction, except that it reloads the status register with whatever byte is residing on the top of the stack before loading the program counter with the return address from the stack. Also, if the CPU is of a type that automatically pushes the contents of any other registers onto the stack following an interrupt request, then these registers would also be restored from the stack before loading the program counter with the return address.

One advantage of using this sort of interrupt strategy is that (to a large extent) the interrupt service routine is distinct from the main program, so it's conceptually much simpler to develop, maintain, and update. Also, we are now in a position to design the body of our program to concentrate on a certain task without explicitly having to monitor what's going on in the outside world. When an external event occurs that requires attention, the CPU automatically hands control over to the interrupt service routine; and when this routine has finished dealing with the interrupt, it returns control to the main program, which picks up where it left off.
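If you were modeling this machinery in software, say in a simple CPU simulator written in C, the service sequence might look roughly like the following. The stack layout, byte ordering, mask bit, and hard-wired address are all stand-ins rather than the conventions of any particular CPU:

```c
#include <stdint.h>

#define IRQ_VECTOR_ADDR 0xFFF8u  /* hard-wired address (illustrative) */

typedef struct {
    uint16_t pc;          /* program counter */
    uint8_t  status;      /* status register, including the interrupt mask */
    uint16_t sp;          /* stack pointer (grows downwards here) */
    uint8_t  mem[65536];  /* unified memory */
} Cpu;

static void push(Cpu *cpu, uint8_t v) { cpu->mem[cpu->sp--] = v; }

/* Called between instructions when the interrupt latch is found to be set. */
static void service_irq(Cpu *cpu)
{
    /* Save the return address and status on the stack ... */
    push(cpu, (uint8_t)(cpu->pc >> 8));
    push(cpu, (uint8_t)(cpu->pc & 0xFFu));
    push(cpu, cpu->status);

    cpu->status &= (uint8_t)~0x01u;  /* deactivate the mask (bit choice assumed) */

    /* ... then jump indirect: the hard-wired address holds the interrupt
       vector, and the vector points to the service routine itself. */
    cpu->pc = (uint16_t)((cpu->mem[IRQ_VECTOR_ADDR] << 8)
                        | cpu->mem[IRQ_VECTOR_ADDR + 1u]);
}
```

An RTI would simply be the mirror image: pop the status register, then pop the two halves of the return address back into the program counter.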
Non-Maskable Interrupts (NMIs)

In addition to the interrupt request (IRQ) input discussed above, many processors also sport a non-maskable interrupt (NMI), which has its own latch within the CPU. As its name might suggest, an active event on the NMI will always cause the CPU to respond, irrespective of the state of the interrupt mask. Thus, the flowchart shown in Figure 29-8 would now be modified to include a test for the NMI before the test for the interrupt mask, and the NMI therefore has a higher precedence than an IRQ. Apart from the fact that it can't be masked, the system responds to an NMI in much the same way that it handles an IRQ, the only differences being that the NMI has its own hard-wired address inside the CPU; this new hard-wired address points to a separate interrupt vector in the system's memory; and this second interrupt vector points to its own interrupt service routine.

The non-maskable interrupt tends to be used in mission-critical circumstances. For example, we might decide that an alert from a smoke detector takes priority over a warning that the Jacuzzi is overflowing. For the sake of discussion, let's assume your Jacuzzi does begin to overflow, and a short time later the resulting deluge shorts out a power point and starts a fire (yes, it's turning out to be yet another "one of those days"). The problem is that when the CPU "sees" the IRQ generated by the Jacuzzi, it will immediately leap into action and start performing the appropriate interrupt service routine. But, as you may recall, one of the first things the CPU does when it responds to an IRQ is to disable the interrupt mask, thereby preventing any other IRQs from being seen (we'll consider ways to get around this later). So if the smoke detector also generated an IRQ, the computer wouldn't see it because it would be too busy working on the Jacuzzi problem. However, if the smoke detector generates an NMI, then this will take precedence over anything else that the computer is doing, including servicing an IRQ. (Note that if the CPU has an NMI input but we're not using it, then we can just "tie it off" to its inactive state using a pull-up or pull-down resistor.)
Software Interrupts (SWIs)
Generally speaking we regard an interrupt as being caused by an external event as discussed above. However, some CPU instruction sets include special instructions to trigger an interrupt from within the program, and these are known as software interrupts (SWIs). If the CPU supports both IRQs and NMIs, then there may be equivalent SWI instructions for each type. SWIs have a variety of uses, not the least of which is that they allow the programmer to perform some level of testing on the interrupt service routines without having to physically trigger an external interrupt (such as burning the house down). Also, these instructions may find application in debugging the body of a program. For example, we could create an interrupt service routine whose only task was to display the current values of the CPU's registers on some form of output device (such as our memory-mapped display). We could then insert SWI instructions at strategic locations within our program, such that whenever the CPU sees one of these instructions it will leap to the interrupt service routine, display the current contents of the registers, then return to the body of the program. (This is, of course, one way by which source-level debuggers can be implemented.)
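In that spirit, the register-dump service routine amounts to very little. Here's a hedged C sketch, with an invented register set (our hypothetical CPU's internals are whatever we say they are):

```c
#include <stdio.h>
#include <stdint.h>

/* The registers our imaginary CPU happens to have (an assumption). */
typedef struct {
    uint16_t pc;      /* program counter */
    uint8_t  acc;     /* accumulator */
    uint8_t  x;       /* index register */
    uint8_t  status;  /* status flags */
} Registers;

/* The routine reached via the SWI's interrupt vector: display the
   registers and return, leaving the program none the wiser. */
static void swi_debug_isr(const Registers *r)
{
    printf("PC=$%04X ACC=$%02X X=$%02X STATUS=$%02X\n",
           r->pc, r->acc, r->x, r->status);
}
```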
The HALT Instruction
All of the program examples above have required the CPU to be constantly doing something, such as looping around reading an input port and waiting until it sees a certain value. However, it sometimes happens that the only thing we actually want the CPU to do is to wait for an interrupt to occur and then service it. Of course, we could achieve this in our program by creating some sort of a dummy loop; consider the following assembly statement (Figure 29-10a).
Figure 29-10: It's possible to create a dummy loop in the main program: (a) assembly source; (b) machine code
Once this code has been assembled into machine code (Figure 29-10b), it will cause the CPU to continuously perform unconditional jumps back to itself. In this example we're assuming that the DUMMY label occurs at address $4F05, so the resulting machine code contains a $C1 opcode at $4F05 (we're also assuming that $C1 equates to a JMP ("unconditional jump") instruction). The two operand bytes $4F and $05 cause the CPU to return to address $4F05, from whence it reads the $C1 opcode again, and ... so it goes. The only way to break out of this loop is to call an interrupt or reset the computer (where the latter option is a somewhat stern measure).

Unfortunately, when we do call an interrupt, the CPU will automatically push the return address $4F05 onto the top of the stack. So once the interrupt service routine has completed its task, it will return control to address $4F05 and the CPU will return to mindlessly looping around, which means that it will never be able to proceed to the instruction following the loop. We could get around this by causing the interrupt service routine to finagle the return address on the top of the stack, but this is both aesthetically unpleasing and intellectually unsatisfying.

The solution is to replace our dummy loop with a HALT instruction, which uses the implied addressing mode and only occupies a single byte in memory. When the CPU sees a HALT, it stops executing the program and commences to generate internal NOP ("no-operation") instructions. Once again, the only way to break out of the HALT is to call an interrupt or to reset the computer. However, during the process of reading the HALT opcode, the CPU automatically increments the program counter to point to the next instruction. Thus, when an interrupt occurs,
the return address placed on the stack will be for the instruction following the HALT (pretty cunning, huh?).
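Modeled in simulator terms (again with invented names, and leaning on the fact that the program counter has already stepped past the opcode), HALT needs almost no code at all:

```c
/* A sketch of HALT in a hypothetical simulator's instruction loop. */
enum CpuState { RUNNING, HALTED };
static enum CpuState cpu_state = RUNNING;

void service_interrupt(void);         /* pushes PC/status, vectors to the ISR */
int  interrupt_pending(void);         /* latch set (and mask active, for IRQs)? */
void execute_next_instruction(void);  /* normal fetch/decode/execute */

void cpu_step(void)
{
    if (interrupt_pending()) {
        cpu_state = RUNNING;  /* an interrupt is the only way out of HALT */
        service_interrupt();  /* return address = instruction after the HALT */
        return;
    }
    if (cpu_state == HALTED)
        return;               /* effectively an internal NOP */
    execute_next_instruction();  /* may itself set cpu_state = HALTED */
}
```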
The Interrupt Acknowledge (IACK) Output

Until now we've been considering the source of our interrupt requests to be simple devices such as switches and sensors, but this is not necessarily the case. In some circumstances the interrupt request may come from a more sophisticated device, and this device may have more than a passing interest in knowing when the CPU begins to respond to its request. Thus, CPUs are typically equipped with an interrupt acknowledge (IACK) output. Assuming that all of our control signals are active-low (which is often the case), the game commences when the external device places a logic 0 value on either the IRQ or the NMI input. In turn, as soon as it starts to respond to the interrupt request, the CPU drives a logic 0 onto its IACK output, thereby informing the external device that its plea for attention has been heard and is being acted upon. Once the CPU has finished servicing the interrupt, it returns the IACK output to a logic 1, which tells the external devices that the CPU is now willing and able to accept a new interrupt.
Interrupt-Driven Input/Output

As we previously discussed, one technique for handling devices such as a QWERTY keyboard is to use a polling strategy: for example, creating a program that loops around reading the port connected to the keyboard until a key has been pressed; passing the code for this key to an output device; and returning to looping around waiting for the next key. But a modern computer can execute many millions of instructions a second, which means that 99.9% of the time our CPU is just hanging around twiddling its metaphorical thumbs. This is not to say that there's anything particularly wrong with this technique, provided we only want to perform simple tasks like copying characters from the keyboard to the display. However, instead of recklessly squandering all of this processing power, we might wish to employ it in a gainful way. For example, while the CPU is waiting for us to press the next key, we could be using it to perform some useful task like reformatting the contents of the display to line all of the words up nicely. The problem is that if we do create a routine to reformat the screen, then this routine will need to keep on checking the keyboard to see if we've pressed another key. What we'd really like is to leave the reformatting routine free to perform its machinations, and break in as soon as a key is pressed on the
keyboard. Just a moment, doesn't this sound suspiciously like a task for an interrupt? In fact, that's exactly where we're heading, in that we could easily equip our keyboard with the ability to generate an interrupt whenever a key is pressed (Figure 29-11).
Figure 29-11: Interrupt-driven I/O frees the CPU to perform other tasks while waiting for a key to be pressed

In this scenario, the CPU can be happily performing some task or other without having to monitor the state of the keyboard. Whenever a key is pressed, the keyboard would issue an interrupt request, which would cause the CPU to hand control over to the associated interrupt service routine. In turn, this routine would read the input port connected to the keyboard, copy the resulting value to the display, then return control to the main program. Also, when the CPU starts to respond to the interrupt request, it would activate its interrupt acknowledge output, thereby informing the keyboard that things were on the move. As soon as the service routine had terminated, the CPU would return the interrupt acknowledge to its inactive state, which would inform the keyboard that it is now free to clear its internal latch. This type of interrupt-driven input control is quite common with devices such as the keyboard and the mouse. Similarly, output devices might generate interrupts to inform the CPU when they are ready to accept more data. Of course, this implies that multiple devices might be generating interrupt requests, but our example CPU only supports a single IRQ input, which means that we need to come up with a cunning ruse ...
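Rendered as a C sketch, the scenario just described boils down to a few lines. The port address is assumed, and we're trusting that the service routine has been wired into the interrupt vector (something we'd do by hand in assembly):

```c
#include <stdint.h>

#define KEYBOARD_PORT ((volatile uint8_t *)0x8000u)  /* assumed address */

void display_char(uint8_t c);  /* writes to the output display */

/* Interrupt service routine: entered via the interrupt vector whenever the
   keyboard requests an interrupt. The CPU asserts IACK on entry and returns
   it to its inactive state when the routine terminates. */
void keyboard_isr(void)
{
    uint8_t key = *KEYBOARD_PORT;  /* reading also clears the keyboard's latch */
    if (key != 0x00u)
        display_char(key);
}

/* Meanwhile, the main program never looks at the keyboard at all. */
void main_program(void)
{
    for (;;) {
        /* reformat the display, line the words up nicely, etc. */
    }
}
```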
Handling Multiple Interrupt Request Signals

Let's assume that our CPU only has a single IRQ input, but that we have two external devices that wish to generate interrupt requests. One technique we can use to achieve this is to connect both of these signals together in what is referred to as a wired-AND configuration (Figure 29-12).
Figure 29-12" One technique for handling multiple interrupt requests is to use a wired-and approach The idea here is to modify each of the external devices such that when they aren't calling for an interrupt, they effectively disconnect themselves from the IRQ signal, which is therefore coerced to a weak logic I value (it's inactive state) by the pull-up resistor. However, if one of the devices does wish to call an interrupt, it can overpower the pull-up resistor by driving a strong logic 0 onto the wire. Also, the interrupt acknowledge output from the CPU can be connected to both of the external devices, thereby allowing each of them to tell if one of their number has already called an interrupt. The advantage of this scheme is that it's relatively easy to hook additional devices up to the interrupt request signal. The disadvantage is that when the CPU receives an interrupt request, it doesn't actually know which of the devices called it, so the interrupt service routine's first task is to check each device in turn to determine which device is attempting to gain the CPU's attention (using some type of polling strategy). An alternative technique for handling multiple interrupts is to simply equip the CPU with more IRQ inputs, each with its own interrupt latch, hard-wired address, interrupt vector, and interrupt service routine. In this case, the CPU's status register would now contain individual interrupt mask flags for each of the IRQ inputs.
Priority Encoding

There are many different strategies for handling interrupts and it isn't possible to cover them all here. However, it would be remiss of us to neglect the topic of priority encoding, if only because it's quite an interesting subject. We commence by attaching a special device called a priority encoder to the data bus (Figure 29-13).
Figure 29-13: A priority encoder facilitates the handling of multiple interrupt requests

In this particular example, our priority encoder accepts sixteen external interrupt request inputs called XIRQ[15:0] (where 'X' stands for "external") and, if any of these signals becomes active, the encoder generates a master interrupt request which is fed to the CPU. One of the first things the CPU does when it receives a master interrupt request is to read a value from the priority encoder which, in this instance, acts in a similar manner to an input port. (As for a standard input port, the priority encoder would have an enable input which would be decoded from the address and control busses, but this isn't shown here for reasons of clarity.)

Now here's one of the clever bits. The priority encoder converts its sixteen inputs into a 4-bit binary code (the most-significant four bits of the data byte can be set to logic 0), and it's this code the CPU sees when it reads a value from the encoder (Figure 29-14). Note that this figure only illustrates those cases in which a single external interrupt request is activated; we'll consider what happens when multiple interrupts occur in a little while (also note that we're going to simplify things just a tad for the sake of understandability).

Somewhere in the system's memory are sixteen interrupt vectors organized as a table, and the hard-wired address in the CPU points to the "base" interrupt vector in this table (Figure 29-15a). When the CPU receives an interrupt request and reads the value from the priority encoder, it adds this value
to its hard-wired address, thereby generating a new address which points to the appropriate interrupt vector in the table (Figure 29-15b). This combined address is then loaded into the program counter and used by the CPU to retrieve an interrupt vector, which in turn points to the appropriate interrupt service routine.
Figure 29-14: One of the priority encoder's tasks is to convert any interrupt seen on one of its 16 inputs into a 4-bit code
One small point to consider is that, if we assume that our CPU has a 16-bit address bus and an 8-bit data word, then each interrupt vector will occupy two bytes in memory, which means that the CPU has to multiply the value from the priority encoder by two before adding it to the hard-wired address (it can easily achieve this by automatically shifting the value left by one bit).
Figure 29-15: The value returned from the priority encoder is combined with a hard-wired address to target an appropriate interrupt vector (each interrupt vector points to a different service routine)
All of this can be a little tricky to understand at first, so let's walk through a simple example. Purely for the sake of discussion, we'll assume that the base address of the interrupt vector table is located at address $9000, which is therefore the value
represented by the CPU's hard-wired address. This means that the first interrupt vector occupies addresses $9000 and $9001, the second interrupt vector occupies $9002 and $9003, the third occupies $9004 and $9005, and so forth. Now assume that the external device connected to the XIRQ[2] signal requests an interrupt, which causes the priority encoder to activate the main interrupt request signal to the CPU. After completing its current instruction, the CPU pushes the values in its program counter and status register onto the stack, and then reads a value from the priority encoder. As XIRQ[2] was the signal that called the interrupt, the code generated by the encoder will be $02 (or 00000010 in binary). The CPU multiplies this value by two (by shifting it one bit to the left) to generate $04 (or 00000100 in binary). The CPU then adds this value to the hard-wired address to generate a new address of $9004, which it loads into the program counter in order to point to the appropriate interrupt vector. Finally, the CPU performs an unconditional jump to address $9004 using the indirect addressing mode, which causes it to end up at the first instruction in the relevant interrupt service routine.

Let's now return to consider what happens if the priority encoder receives multiple requests on its sixteen XIRQ[15:0] inputs, of which there are 2^16 = 65,536 potential combinations. By some strange quirk of fate, the reason this device is called a priority encoder is that it prioritizes things. Let's assume that, by default, XIRQ[0] is considered to have a higher priority than XIRQ[1], which, in turn, has a higher priority than XIRQ[2], and so forth. Thus, if the priority encoder should happen to simultaneously receive interrupt requests on XIRQ[15], XIRQ[12], and XIRQ[9], the value it eventually hands over to the CPU will be the $09 (or 00001001 in binary) corresponding to XIRQ[9], because this input has the highest priority. Also, if the system is already dealing with an interrupt when another, higher-priority interrupt occurs, then there are techniques we can use to permit this new signal to interrupt the first (but we won't go into that here).

Another interesting point is that the CPU can write values to the priority encoder, because, in addition to acting like an input port, this device can also behave in a similar fashion to an output port. Why would we wish to do this? Well, one common scenario is that the priority encoder would contain its own 16-bit interrupt mask register (completely distinct from the interrupt mask in the CPU), thereby giving programmers the power to enable or disable the external interrupt requests on an individual basis. For example, if we loaded this interrupt mask register to contain 0000 0100 0000 1001 in binary, then the priority encoder would only respond to interrupt requests on the XIRQ[10], XIRQ[3], and XIRQ[0] signals.
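The arithmetic in this walkthrough is simple enough to render directly in C; here's a sketch that follows the example's conventions ($9000 table base, lowest-numbered input wins, per-line mask register):

```c
#include <stdint.h>

#define VECTOR_TABLE_BASE 0x9000u  /* the CPU's hard-wired address */

/* Priority encoder: given the sixteen XIRQ lines as a bit mask (bit 0 =
   XIRQ[0], bit 1 = XIRQ[1], ...), return the 4-bit code of the
   highest-priority active request; the lowest index wins. */
static uint8_t priority_encode(uint16_t xirq, uint16_t mask_reg)
{
    uint16_t active = xirq & mask_reg;  /* apply the per-line interrupt masks */
    for (uint8_t code = 0; code < 16; code++) {
        if (active & (uint16_t)(1u << code))
            return code;
    }
    return 0xFFu;  /* no request pending */
}

/* CPU side: double the code (shift left one bit, because each vector
   occupies two bytes) and add it to the hard-wired table base. */
static uint16_t vector_address(uint8_t code)
{
    return (uint16_t)(VECTOR_TABLE_BASE + ((uint16_t)code << 1));
}

/* E.g., simultaneous requests on XIRQ[15], XIRQ[12], and XIRQ[9] encode
   to $09, giving a vector address of $9000 + $12 = $9012. */
```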
Chapter 29 Interrupts and Interrupt Handling Unfortunately, this article has only begun to probe the subject of interrupts, and there are a variety of additional techniques that can be used to augment or supplant those discussed here. Furthermore, there are a host of programming considerations when it comes to topics such as nested interrupts, which means enabling further interrupt requests whilst an earlier interrupt is already in the process of being serviced. For those who are interested, however, these topics are introduced in exquisite detail in the book Bebop BYTES Back (An Unconventional Guide to Computers), which also includes assembly code examples of nested interrupt service routines for use with the Beboputer T M Virtual Computer. c~l
The topic in this chapter was published in a condensed form under the title "Don't Interrupt Your Computer" in the September 1st, 1997 issue of EDN (www.ednmag.com), and is reproduced in its original form here with their kind permission.
For your further reading pleasure, this article was itself abstracted from the book Bebop BYTES Back (An Unconventional Guide to Computers), ISBN 0-9651934-0-3, with the kind permission of Doone Publications (www.doone.com) (see also the order form in the back of this book).

(1) The Beboputer Virtual Computer was introduced in Chapter 2.
Chapter 30:
A Letter From America

"The weather is fine, wish you were here"

In this chapter you will discover:
Prolog ........ 402
A Letter From America ........ 402
Mississippi, Tennessee, and Arkansas ........ 403
Missouri and Kansas ........ 404
Nebraska ........ 406
South Dakota ........ 407
Wyoming and Colorado ........ 410
Oklahoma and the Return Home ........ 412
Epilog ........ 415
Prolog

In October 1990, myself, my wife Steph, and our daughters Lucie (16) and Abby (14) moved to America to set up camp in Huntsville, Alabama.(1) After two years of laboring furiously without a break, Steph and I found ourselves in the unusual position of being by ourselves (the girls were doing their own thing), so we decided to do the unthinkable and take a week's vacation.

One of the strange things about living far removed from family and friends is that the folks back home are desperately interested in hearing what you're doing in painstaking and excruciating detail. The problem being that writing individual letters to parents, siblings, and friends describing a week's vacation could take the rest of your natural life. As a solution, I penned the first of a series of short stories called "A Letter From America," which we subsequently duplicated and sent to everyone we knew.

The years have rolled by as is their wont, and many Pooh Sticks have sailed under the bridge since those far-off carefree days. But as I came to start preparing this manuscript, a friend reminded me as to how much he'd enjoyed my first story and insisted that it should be included in this tome, so here it is. Enjoy!
A Letter From America
This being the epic tale of two pioneers roving behind the beyond, behind which few have gone behind beyond before. A true and accurate account penned by Max (half man, half beast, and half wit).(2) Any resemblance to actual people, places, or other paraphernalia is purely coincidental.

As Lucie was safely out of the way in North Carolina visiting her friends, and Abby had jet-setted her way to visit family members in England for three weeks, Steph and I found ourselves free to do anything we desired. So with a song in our hearts we decided to see something of America without the plaintive cries of "What are we doing in this place?", "I'm bored, I'm bored," "I'm starving! When are we going to eat?", "I need a rest room NOW!" and "I didn't ask to be born you know," issuing from the back of the van as is our usual custom when traveling anywhere.

For some time Steph had been interested in visiting South Dakota to see Mount Rushmore, so this is what we determined to do. America is linked by an impressive array of well-signposted Interstate highways, but we decided to eschew these and, like the pioneers of old, only travel paths that no one with any sense would attempt. In order to ensure our survival I secretly invested in a wibbly-wobbly compass, which I proceeded to superglue onto the van's dashboard while Steph wasn't looking, and thus equipped we set off on a Saturday at the crack of noon.

(1) Famous for its night life.
(2) Nerves of steel, muscles of iron, and ... wait for it, wait for it ... brains of stone!
Mississippi, Tennessee, and Arkansas

Initially we followed a well-worn path North-East across Alabama from Huntsville, cutting across the top-right corner of Mississippi and the bottom-left corner of Tennessee to arrive in Memphis. We paused in Memphis to take a picture of an enigmatic ten-story stainless-steel pyramid with no signs as to what it does or what it's for, and then plunged across the Mississippi river into Arkansas. For reasons unknown, Arkansas is actually pronounced "Ar-kin-saw" (as opposed to "Ar-kan-zas" as one might expect); we asked almost everyone in the state why this should be so, but no one had a clue.

We made camp around 9:00 pm at a rather nice hotel called the Wilson Inn, Jonesboro, Arkansas, which was very reasonably priced at $40 (twenty English pounds) for the pair of us. By some strange quirk of fate, the rest of the hotel was occupied by three girls' softball teams, whose youthful exuberance kept us entertained well into the early hours. In hindsight (the one exact science) we should perhaps have seen this as a harbinger of the horrors that were to come ...

In the morning we continued in a general North-Westerly direction.
Just a few miles up the road, Steph was saddened to discover that we'd missed a tent revival meeting that had been held only the previous evening. For my part I've long wanted to plunge my naked (and oiled) body into pits of rattlesnakes, a seemingly key feature of the ceremony we so tragically missed, but those are the breaks I guess.

A few miles further we paused at a picturesque waterfall to acquire some postcards. I'd just started to turn the rotating postcard stand when Steph exclaimed "Max, that man is trying to read those!" I quickly poked my head around the stand to apologize, but before I could speak the chap in question smiled sheepishly and said: "It's all right, I can't read." Strange to relate, I couldn't think of a single response that seemed to fit the occasion. Why didn't they instruct me about times like these in my social studies courses at high school, instead of waffling on about the inane dribble of which they are so fond? This episode left me somewhat flustered, and I only began to cheer up a few hours later when we crossed the border into Missouri (now there's something you don't expect to hear yourself saying on a regular basis).
Missouri and Kansas

Not a lot to say about Missouri, except that we drove straight through it until we reached Kansas City, at which point we hung a left into Kansas and continued to a town called Topeka. As it was getting close to the time to rest our weary eyes, we celebrated by driving the wrong way down a one-way street, which, much to our surprise, turned out to be the exit ramp for an interstate. Mayhap the signs saying "You are going the wrong way" should have given us a clue, but as we weren't personally acquainted with the person who wrote them we had no basis on which to determine their veracity, so we decided to ignore them in the hope they'd go away. We celebrated our continued existence by staying at the Holiday Inn in Topeka. A word to the wise ... don't stay at the Holiday Inn in Topeka ... it wasn't worth the service we received (and I use the term "service" in its loosest sense).

The next day we proceeded across Kansas, which is about as flat a state as one could hope to find. There's little wonder that Dorothy
grabbed Toto and flung herself into the first passing tornado in the desperate hope of being transported to someplace with a little more to offer on the mountain front. In fact Kansas is actually a pretty nice place, so long as you like your scenery to be totally devoid of anything resembling a bump in the ground. Also, should you have a phobia about being flooded out in a thunderstorm, then Kansas probably isn't the state for you, because there's no chance of building your house higher than the surrounding landscape (there is no place higher than the surrounding landscape!). One can only assume that any excess rainwater runs off into neighboring states.
As we drove across Kansas we were constantly accosted by signs promising a "Historical Marker Ahead." There are unbelievable numbers of these signs all over America, but should you be tempted to stop, they are almost invariably
fatuous and inform you of inane nuggets of trivia along the lines of: "Fifty years ago today at the Battle of Big-Butt Ridge, General Goober IV Junior shot himself in the armpit on this spot." Wow!

We continued West until we were halfway across Kansas, then we hung a right and headed North. You have to be in a state as vast as Kansas to fully appreciate the value of a compass, because the maps are often less than useful, and your only hope of getting where you're going is to drive in vaguely the right direction until you reach somewhere from whence you can take your bearings again. As that somewhere (perhaps the next town) may well be 50 miles away, you really do want to try to be heading in the approximate direction of where you're actually trying to get.

As I said, we had decided to eschew the interstates to follow the less well traveled roads, and it was about this time that we realized our wibbly-wobbly compass was performing its sole function in life slightly less well than one might have hoped. If all else fails ... read the instructions. According to the manual, the compass had to be corrected to compensate for the effect of the metal portions of the van. The manual further suggested pointing the van to the North and using the appropriate correcting screw, then to the East and using another screw, and so forth. Unfortunately, by the time we discovered these facts, we were in the middle of an area of Kansas farmland called something like Abandon-All-Hope; the only way to accurately determine where North lay was to use a compass; and the only compass in the immediate vicinity was immovably secured to the dashboard of the van. Using the legendary survival skills I acquired in the Boy Scouts ("... moss grows on the North side of the tree ... or perhaps the South ... mayhap the East ...") we finally fixed the thing as best we could, and set off once again in something more closely approximating the right direction. (Steph was eager to use the fact that the sun rises in the East and sets in the West, but as it was close to noon and the sun was directly overhead, this snippet of trivia proved to be rather less than helpful.)

In the middle of Kansas (in the East-West direction), and half an inch from the top (on our map), there's a town called Lebanon. A couple of miles north of Lebanon is the geographical center of the conterminous United States. We wandered over to take a picture of the granite block marking the spot; surprisingly we were the only people there, and no-one else seemed to care. We reflected that if the USA were attacked at that moment, we would form the final line of resistance, because we'd be the last people the attackers reached. Holding this thought of "Maxfield's Last Stand," we continued heading North (or as close to North as our compass felt behooven to inform us) and plotted a course for Nebraska.
Nebraska

Nebraska is often described as a 75,000 square mile patch of nothing much at all, but we thought it was quite pretty (reminiscent of Kansas but with a few more bumps). In the middle of the state there's a river called the Platte, which at some times of the year is two or three miles wide. It looks very impressive until you realize that it's only about 3 inches deep and you could ford it on roller-skates. One thing we noticed in Kansas, Nebraska, and several other central states is the size of their telephone poles, which are teensy-weensy. They stand about 8 feet tall and look really weird. Also, the pylons carrying power are wooden, and look as though each one was individually constructed from any old pieces of wood the farmer happened to find lying around.

The cassette player in the van had broken some time before we commenced our epic adventure, so in a forlorn effort to keep from losing our minds we played the radio, the only problem being that American radio is designed for people who've already lost their minds. Since arriving in America, we had heard numerous reports that Country & Western music had languished and was only just starting to make a comeback. But you can rest assured that we have experienced no evidence of any diminishment in its popularity! In many places the only music you can find on the radio is Country & Western, and it doesn't take long before you start to seriously wonder if you're on the right planet.(3) Nebraska provided a cunning alternative, in that we could only pick up one FM station (nothing on AM), and this channel provided sterling entertainment in the form of live, after-dinner speeches from a political rally. Lord, give me strength! After three hours of this (I kid you not) we couldn't wait for good ol' Country and Western again.

During large portions of our trek we were more or less permanently lost. Most of the time we were just "kind of lost," but whenever we got near something we were intent on seeing we would become "really lost." Generally it would take us about an hour to realize that we'd passed from "kind of" to "really." Unlike English maps, in which even the public houses (bars) are noted in exquisite detail, entire towns can disappear from their American counterparts. Similarly, although most of the interstates are fairly well documented, you take your chances with lesser roads. So you look at the map and think you've spotted a useful shortcut between the towns of Coma and Ringworm, but once you've set off you quickly find yourself condemned to wander back and forth through a network of unrecorded country roads. Needless to say, there are few (if any) signs on the roads themselves to tell you their names or where they lead. This is all the more remarkable when you consider the vast quantities of non-essential information that are signposted every few yards, providing you with such tasty tidbits of gossip as the fact that the Honorable Percy Hornblower is running for the position of County Sheriff, frog-spawning season is almost upon us, your speed is being monitored by radar from an aircraft, and there is a National Hamster Farm only three miles down the road (needless to say this farm isn't shown on your map). But I digress ......

We spent Monday night in Broken Bow, Nebraska. The townsfolk tried three other names that were rejected by the US postal service as being too similar to those of other townships, before someone remembered once seeing a broken bow. Being somewhat desperate they submitted "Broken Bow" as a name and it was accepted. The food at our Inn was excellent and served with great flourish on fine bone china; the only quirk being that they didn't own any saucers. In lieu of formal entertainment, our waitress cast off her traditional role (as one who might condescend to bring sustenance to the guests) and joined us at our table to regale us with tortuous tales of how much she loved England.(4) On further questioning it turned out that the closest she'd actually been to the motherland was to read "ALL the James Herriot books!"

As fate would have it we'd arrived in town just in time to miss the last performance of the annual circus, but all was not lost because the performers were staying in the same hotel; in fact I came to believe that the elephants and sea lions were sharing a room just above ours! The bedroom had an interesting version of mood lighting in that, when I got into my side of the bed, all of the lights in the room promptly went out. By means of experimentation we found that if I sat doubled up (with my left leg behind my right ear) the light came on again. After reading a book in this position for a short time I realized I was strangely tired, so we retired for the night, secure in the knowledge that if we heard a strange noise, all I had to do was sit up for the lights to come on.

(3) Q: What do you get if you play a Country and Western record backwards? A: You stop drinking, your wife gets out of jail, and your mother comes back home on a slow train strumming her guitar!
(4) And she LOVED our accents!
South Dakota

On Tuesday we headed North-East again, striking out for South Dakota. I spent much of my time sewing patches over holes in my jeans while Steph was driving, a task I'd been putting off until there was nothing better to do (and driving across
Nebraska provided the ideal opportunity). It gave some of the redneck truck drivers pause for thought, however, to see Steph driving and myself doing what they took to be embroidery! When we reached Alliance, Nebraska, we turned North. Just up the road from Alliance, in the middle of miles of empty fields, is a reasonably faithful replica of Stonehenge called "Car Henge," built out of cars painted gray. Although this may sound a little tacky, it is actually pretty darned impressive, let me tell you.

At some stage during our meander through the backroads something strange began to occur. I was in the "command seat" driving along the winding country lanes on automatic pilot, when it began to dawn on me that there were one heck of a lot of motorcycles in front of the van. I vaguely recalled there being no bikes approximately ten minutes earlier, then a couple of bikes, then a few more ...... and now there were more than thirty. The riders looked like something out of a "Mad Max" movie ...... weighing in at around 300 pounds, wearing cut-off leathers with "Death's Head" logos, sporting more hair and beards than you could swing a stick at, flaunting a smorgasbord of X-rated tattoos ...... and that was just the women! Even worse, upon glancing in the rearview mirror I discovered that we were being trailed by roughly the same number of bikes, and more were joining us at the front and rear with every passing minute. I was seriously beginning to wish that I'd donned some clean underwear that morning (or any underwear at all for that matter) when we shot through a small town called something like Groin, which was decked with banners proclaiming: "Groin Welcomes The Sturgis Bikers."

We later discovered that the annual Sturgis Rally attracts something like 10,000 bikers to Sturgis, South Dakota. Every town for miles around was packed to the gills with bikes (and bikers) of all shapes and sizes, and it was something of a feat to find a spot large enough to park the van. In fact we had occasion to meet numerous members of the biking fraternity over the course of the next couple of days, and they all turned out to be jolly nice chaps and chappesses (a surprisingly high number of whom were revealed to be high-school teachers, lawyers, and police officers, which sort of gives one pause for thought).

We arrived in South Dakota around 4:00 pm and drove straight to Mount Rushmore, which we finally reached at 7:00 pm. Our trip through the Black Hills was made all the more interesting than it would normally have been, because they were building the road as we went! Every few miles we had to stop and wait for a large explosion as they blasted another section of road in order for us to proceed. Mount Rushmore turned out to be really spectacular: the four presidents' heads carved into the rock face are 60 feet from chin to hairline! Sad to relate, Steph found it difficult to see much of the view, because we were surrounded by several hundred attendees of the Sturgis Rally, so I had to
describe just what it was that we were looking at. We took several pictures of each other smiling and pointing at the carvings (although most of the images of Steph show her to be bouncing frantically up and down trying to peer over the heads of the bikers), then we headed off on a quest to find somewhere to rest our weary heads.

You can only begin to imagine our surprise and delight when we discovered that every room in a 100 mile radius was booked by the attendees of the bike rally. It was turning out to be a dark and stormy night as we were finally refused shelter at the last hotel in Keystone. But as we ambled slowly away with our shoulders drooping, our heads bowed, and a little tear rolling down my cheek, I chanced to hear the squeak of the manager's voice calling "Mr. and Mrs. Maxfield, are you still there?" It appeared that someone had just called the hotel to cancel their booking as we'd strolled out of the lobby, and although there were plenty of other takers for the room ...... it was bequeathed to us because the manager just LOVED our accents!

The following morning we drove to Rapid City, and from there out into the Badlands of South Dakota. The Badlands are almost indescribable; huge areas of strangely shaped hills and valleys and multi-colored rock formations. When we chanced to stop, there were signs everywhere saying "Do not remove the rocks." Unfortunately for the sign-painters, Steph is a habitual collector of rocks (and a habitual ignorer of signs telling her not to do things; she's not proud, any sign will do). However, the warnings detailing the dread things that could happen to people who ignored these signs did impress Steph to the extent that she hid her prizes in my backpack while I wasn't looking (so perhaps there's hope for her yet). We drove around the Badlands in wonder for a few hours, and then headed back to Rapid City via a town called Wall.

Before I tell you about Wall, you've got to know that in America it's common to see billboards 30 feet wide and 15 feet high at the side of the road. In the lonesome heartlands of states like Kansas (where you might not see another car from one hour to the next) they're pretty much the only stimulation you can look forward to. These signs can start humongous distances away from the subject they're advertising, and you may see one that says: "Visit Gnarly Grotto, just 89 miles!" A couple of miles later you see: "Free codpieces at Gnarly Grotto, just 78 miles." This keeps on with sign after sign promising the most awesome experience one could ever hope to have (at least in South Dakota). During the last 2 or 3 miles of the approach to "Gnarly Grotto,"
the pace picks up with signs every couple of hundred yards. Finally, when you think there's only a few more yards to go, you find another billboard the size of a tennis court instructing you to turn left and drive for another 23 miles. (It's even worse when they just say "Turn Left Now" and don't give a distance. We once fell for this cunning ploy and disappeared 60 miles into the deepest depths of darkest Tennessee following a sign for the Jack Daniel's distillery!)(5)

Anyway, as I said, fifty miles east of Rapid City is the little town of Wall, home of Wall Drug: "The most famous drug store in the West." You know it's coming because every few yards you pass a big billboard telling you so. It's the advertising equivalent of the Chinese water torture. In 1931, in the middle of the Great Depression, a guy named Ted Hustead bought Wall Drug. Although Wall's population could barely support the meanest drug store, Hustead had a cunning plan up his sleeve. Realizing that people driving across South Dakota were so numb with boredom that they would stop and look at almost anything, Hustead put up a plethora of eye-catching attractions, such as a stuffed buffalo, a life-size dinosaur, and a huge pole with arrows giving the distance and directions from Wall Drug to places all over the world. He also erected billboards on every road in a 50 mile radius to let people know he was there, and he filled the store with the most eclectic assortment of tourist rubbish available on the planet.(6) Today, Wall Drug is surrounded by parking lots so enormous you could land a jumbo jet on them, and it employs most of the town. Believe it or not, in summer they can have more than 20,000 visitors a day, and this place is as close to the middle of nowhere as you can get!

After experiencing the delights of Wall Drug (which took about 5 minutes), we returned to Rapid City and then struck off North-East to Deadwood, a town that, if nothing else, lives up to the first half of its name. Following the discovery of gold in the Black Hills in the 1870s, Deadwood is rumored to have been quite an exciting place, in which one might hope to meet such characters as Calamity Jane and Wild Bill Hickok (at least until he was shot dead while playing cards in a local saloon). Today the town makes a living by relieving tourists of their money, but we didn't have any so they left us alone.

(5) Only to discover that the distillery is in a dry county when we got there, so we couldn't actually buy any of Jack's gift to mankind. (The situation may have changed by the time you read this.)
(6) It makes you proud to be a human being.
Wyoming and Colorado

We quickly passed through Deadwood and, a few miles later, the equally dross town of Lead, and headed West across the border into Wyoming. We passed through Sundance, the town from which the Sundance Kid took his name (which was quite possibly the only thing in town worth taking). As the sun sank over the horizon, we embarked on the difficult business of not being able to find a room for the night. This isn't helped when one's driving companion identifies every passing landmark as a potential place to stay. The bike rally was in full swing by now (bless their little cotton socks), and we were told by all and sundry that there were no rooms to be had anywhere in the vicinity. Eventually the tourist help station gave us the address of a possible room in an unnamed motel near an obscure lake (it sounded pretty good at the time), so we set off in search of this promise of delight.

In fact we did get a room, but it was horrible. The shag-pile carpet was literally crawling with insects; the bed sagged in the middle and bore the imprint of its previous occupants (a brace of 300-pound mud-wrestlers unless I missed my guess); the bathroom was so small you had to reverse out into the corridor to turn around; and the shower curtain was slashed to shreds, reminiscent of Hitchcock's Psycho. I didn't want to dwell on this latter point too much, but I took the time to hide the scissors at the first opportunity to prevent Steph from getting any ideas. In fact we spent the night lying fully clothed on the top of the bed accompanied by our backpacks, which we didn't trust to leave on the floor (it really was pretty gross). The room was only 20 dollars between us, but that was 20 dollars too much!

We hit the ground running at the crack of dawn, and headed toward Devil's Tower as featured in Spielberg's "Close Encounters of the Third Kind." I can't describe the Tower; it was just too amazing for words! We then set off South, going all the way through Wyoming, and halfway through Colorado. There's nothing like a highway stretching off to an ever-receding vanishing point to make you feel as if you're going nowhere. You can drive down a 30 mile stretch of road that's as straight as an arrow without seeing a living soul. Every once in a while you zip through a dusty little town called something like Armpit, which consists of little more than a gas station, a tree, and a dog, and that's your excitement for the next hour. As the minute hand crawls its way around and around your watch, your nether regions grow so numb that they begin to feel as though they belong to someone else (until you move a little, the "pins-and-needles" start, and you realize that they do indeed belong to you ...... Arrggghhh!).

We spent Thursday night in an Econo-Lodge hotel at Cheyenne Wells, Colorado, and took a room with a water-bed. This was the first time we'd experienced the
delights of one of these aqua-beasts, so we were a little unsure what to expect. On the way to our room Steph forgot about the bed, and on entering our abode she threw herself down on it without thinking ...... I wish I'd had the camera ready!(7) As it turned out they're comfortable, so much so that we overslept, and thus we didn't manage to resume our journey until noon the following day.

In the fullness of time we stopped in the small town of Sheridan Lake to send some postcards. This threw the woman behind the post office counter into a state of some confusion, because she told us that they'd never had any call to send any "foreign mail" and she didn't know how much to charge. But after blowing the dust off a long-neglected book of shipping prices she rallied magnificently, and managed to overcharge us with great aplomb. These small towns are a treat. Having English people in town is considered to be as much fun as a circus, and if you stop for breakfast it only takes a few surreptitious phone calls on the part of the kitchen staff to entice the rest of the town to casually drop by and watch you eat! One possible explanation for this intense interest in outsiders is that, during the course of our travels, we discovered that a lot of the small-town occupants don't seem to journey far from home. We asked one woman if she could recommend any good places to eat in the only big town within 50 miles, and she said that she didn't know because she'd never been!
Oklahoma and the Return Home
We proceeded South through Colorado and hung a left (East) into Oklahoma. We decided to stop for the night in Woodward and dropped into a Pizza Hut for something to eat. The waiter was a 16-year-old called Robert who had been left in charge (the master of all he surveyed) and was determined to do his best. When we ordered coffee he rushed two cups to us within seconds, and then informed us that it was cold! He said that he'd just put a fresh pot on, but he had brought us this cold coffee to keep us going until the hot stuff arrived! We stayed in a place called the Northwest Inn, which came highly recommended by Robert (because his mother worked there). This Inn was a pretty good deal at 42 dollars for the pair of us, including the bag we found hanging on our door in the morning containing a thermos of coffee and a newspaper. We thought this was pretty amazing value compared to England.

(7) A "Kodak Moment" if ever I saw one!
We meandered our way through Oklahoma and stopped for lunch in a town called Guthrie, which featured beautiful buildings and women who smoked cigars while driving huge pick-up trucks. The cafe in which we dined had a number of historic artifacts on the walls, so Steph went for a wander, shouting things from the far side of the room like "Oooh Max, look at this, he had three sons!" This was much to the bewilderment of the other diners, who didn't have a clue who Steph was talking to (I was gazing fixedly out of the window pretending that I was dining alone). The quantity of food was amazing: for 10 dollars a head we had so much that we couldn't eat again until the following day! Our waitress was exceedingly proud of her onion rings and, when we refused to buy any (because we couldn't squeeze another crumb in our stomachs), she brought us a giant bowl for free in a desperate last-stand attempt to prove that the rings were as good as advertised. We were lucky to escape the town without being fed to death!

After lunch we went looking for some presents for Lucie and Abby. We found one shop that sold bull's testicles, which look like leather and are about the size of a 2-pound bag of sugar (quite useful for storing one's trinkets in, I presume). However, we weren't sure if these would go down well so we passed them by. In the end we didn't manage to find any presents for the girls, but Steph bought some socks and I picked up a pair of Levi 501 jeans for just 21 dollars, so that was all right!
We continued across Oklahoma seeing signs like "Beware, Hitch-hikers may be escaping Inmates," which certainly gave us pause for thought. We passed a place called "Uncle John's Creek," and stopped at a gas station to make a rest stop. It's difficult to describe, but on entering the gas station one became aware of an aura of strange foreboding. I was waiting for the organ music to start when I noticed the most incredibly huge bald woman one could hope to meet sitting behind the counter. I ambled my way towards the men's rest room, which was located at the end of a corridor, only to find that the door was inexplicably propped open by a full oxygen cylinder. I started to exchange some
electrolytic fluids and take a weight off my mind (all right, all right, I started to relieve myself; good grief, do you want to know all the details?) when I happened to glance over my shoulder. Believe it or not, there was a mirror mounted in the corridor's ceiling, and peering into the mirror I locked eyes with the lady behind the counter! As I quickly broke eye-contact, the hairs on the back of my neck started to quiver and a cold shiver ran down my spine as I began to imagine turning round and finding this flower of Oklahoma barreling down the passageway towards me. In a surprisingly short time I was ready to hit the road, so I wheeled around and rocketed down the corridor with my head lowered; shot out of the garage door like a speeding bullet ("Who was that masked man?"); and bounded across the parking lot into the van screaming: "Drive! Drive! Don't ask any questions! For God's sake just DRIVE!"

From Oklahoma we returned into Arkansas, where my chief navigator was delighted to find a town called Bald Knob on the map! One can only assume that this doesn't have the same connotations in America that it would in England, but following my experience in the gas station I personally didn't want to take any chances, so we avoided Bald Knob and continued our trek across Arkansas to spend an easily forgettable night in a town called Conway.

On Sunday morning we treated ourselves to a no-holds-barred breakfast, which is something of a national pastime in America. Sunday breakfast is a lot of fun, the only problem being that it can take an inordinate amount of time to order even the simplest platter; for example, eggs, bacon, hash browns (fried potatoes), and toast. The problem is that there are so many choices. The waitress is going to ask you how you want your eggs (sunny-side up, over easy, over medium, over hard, scrambled, poached, hard-boiled, in an omelet, ...); do you want American or Canadian bacon; should your hash browns be complemented by onions, tomatoes, cheese, ham, chili, or any combination thereof; and would you like the bread for your toast to be white, rye, whole wheat, stone ground, sourdough, ... and the list goes on. And we haven't even touched on the low-sodium bacon, the cholesterol-free egg substitutes, the choice of butter versus a medley of alternatives, the selection of jellies and jams, eight varieties of fruit juice, and of course your main drink (hot tea, iced tea, coffee, hot chocolate, and so forth). By the time you've agreed on an order, both you and your waitress need to sit down for ten minutes to recover over a cup of coffee (of which there are at least ten gourmet blends to choose from). I'm starting to salivate just thinking about it.

Anyway, after gorging ourselves on the taste-fest sensation of our choosing, we set off on the last stage of our journey, and finally arrived back at the apartment at 5:30 pm. We'd traveled 3,570 miles, and our butts (bottoms in English) could attest to every one of them.
But although it's somewhat wearing on your nether regions, I'd happily recommend this back-roads style of travel to anyone, and I personally would like to do a number of similar trips: one to New York and a selection of the Northern States, and one to Texas, New Mexico, and the South-Western region ...... at which point I'll doubtless be penning another "Letter From America."
Epilog
Well, as I said at the beginning of this saga, a lot of Pooh Sticks have floated under the bridge since I originally penned this piece. As I close this chapter, Lucie, who is now 24 years old, has recently married a great guy called Welles, and she called just last night to say that she thinks she's pregnant.(8) Abby is almost 22 years old and presented us with our first grandchild, Willow Sierra, two and a half years ago. I was expecting the worst on the naming front, and would not have been unduly amazed to be presented with a little bundle of joy condemned to be known by a moniker like "Sunbeam Moonglow Starchild" for the rest of her natural days, so "Willow" (which is an old English name) made me very happy. In fact Willow is a little cutie who can bend me round her little finger, and she's the prettiest flower in her grandfather's garden (that would be me).

Sad to report, Steph and I eventually decided to go our separate ways, but we still manage to keep in touch. In fact keeping in touch has become a lot easier recently since Steph became my lodger on her way between houses -- a circumstance which has left my neighbors agog with unfounded speculation (honestly, I could write a book!).

(8) Lucie will be somewhat surprised when she reaches this point in the tale, let me tell you!
Ladies and Gentlemen ...... Max has left the building.
Appendix A: Installing Your Beboputer
In this appendix you will discover:
Mandatory System Requirements
Optional System Features
Installation Guide
Important
Mandatory System Requirements

To use the Beboputer Virtual Computer V1.00 for Windows 95, you will need:
• Microsoft® Windows 95™
• 8 MB RAM (16 MB recommended)
• A Windows 95 compatible mouse
• A VGA or higher graphics adapter capable of displaying resolutions of 800 x 600 or better
• 15 MB of available space on your hard drive
Optional System Features

• An 8-bit sound card or better
Installation Guide

The installation program (setup.exe) decompresses and copies the Beboputer and associated files to a directory on your hard drive (some program and multimedia files will remain on the CD-ROM). When you are ready to install your Beboputer, ensure Windows 95 is up and running, insert the CD-ROM into your CD-ROM drive, then complete the following steps (note that only the main dialog windows were shown in the original; in the case of any other dialogs, simply click the Next button to proceed):

1) Click the Start button on the Windows 95 taskbar, then select the Settings entry, followed by the Control Panel entry, followed by the Add/Remove Programs item, which will result in the Add/Remove Program Properties dialog appearing on your screen.
2) Click the Install button on the Add/Remove Program Properties dialog, which will return the Install From Floppy Disk or CD-ROM dialog.
3) Click the Next button on the Install From Floppy Disk or CD-ROM dialog, which will result in the Run Installation Program dialog. Initially the command line field will be blank, because the setup program is in a sub-directory on your CD. So either use the Browse button or enter the path by hand to make the command line read d:\beboputer\setup.exe (substitute d: for the drive letter of your CD-ROM as necessary).
4) Click the Finish button to move on, then click the Next buttons of any subsequent forms to complete the installation. Note that before running your Beboputer, we strongly recommend that your screen resolution be set to 800 x 600 or higher.
Important

If you know that you don't have Microsoft "Video for Windows" loaded on your system (or if you think you do have it loaded, but you only get sound without video when you attempt to play our multimedia lab introduction), then:

a) Recommence the installation process as described above, but ......

b) Once you've invoked the Run Installation Program dialog as discussed in point (3) above, use the Browse button on the form to locate the file called either setup.exe or install.exe in the d:\... directory on the CD-ROM (substitute d: for the drive letter of your CD-ROM as necessary).

c) Keep clicking the Next or Finish buttons to complete the installation as required.
Appendix B: Beboputer Addressing Modes and Instruction Set

In this appendix you will discover:
Addressing Modes
Implied Addressing (imp)
Immediate Addressing (imm)
Absolute Addressing (abs)
"Big" Addressing Modes
Big immediate
Big absolute
Beboputer Instructions by Categories
Beboputer Instruction Set Summary
Addressing Modes

The phrase "addressing modes" refers to the way in which the CPU determines or resolves the addresses of any data to be used in the execution of its instructions. Different computers can support a wide variety of addressing modes, where the selection of such modes depends both on the computer's architecture and the whims of the designer. Some computers employ very few addressing modes, while others boast enough to make your eyes water. Due to its educational nature, the Beboputer supports seven addressing modes, which are far more than it really needs (although these are only a subset of all possible modes). However, we'll only be using three of these modes during the course of the demonstrations described in this book: Implied, Immediate, and Absolute.
Implied Addressing (imp)

The implied addressing mode refers to instructions that only comprise an opcode without an operand; for example, INCA ("increment accumulator"). In this case, any data required by the instruction and the destination of any results from the instruction are implied by the instruction itself (Figure B-1).
Figure B-1: The implied addressing mode

An implied sequence commences when the Program Counter (PC) reaches the opcode for an implied instruction (Figure B-1a), loads that opcode into the Instruction Register (IR) (Figure B-1b), and increments the program counter (Figure B-1c). Recognizing that this is an implied instruction, the CPU executes it and continues on to the next instruction. Instructions that use implied addressing are: CLRIM, DECA, DECX, HALT, INCA,
INCX, NOP, POPA, POPSR, PUSHA, PUSHSR, ROLC, RORC, RTI, RTS, SETIM, SHL, and SHR.
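To make the sequence concrete, here's a minimal sketch in Python (not part of the book's CD-ROM tools, and using an invented opcode value rather than the Beboputer's real one) of how a CPU might step through an implied instruction such as INCA:

# A sketch of the implied addressing mode. The opcode value is hypothetical;
# the point is that INCA carries no operand bytes: the opcode alone implies
# that the accumulator is both the data source and the destination.

INCA = 0x80                 # invented opcode, purely for this illustration

memory = [INCA]             # a one-instruction program at address 0
pc = 0                      # program counter
acc = 0x41                  # accumulator

opcode = memory[pc]         # fetch the opcode into the instruction register
pc += 1                     # increment the program counter
if opcode == INCA:          # implied: execute with no further memory reads
    acc = (acc + 1) & 0xFF  # 8-bit wrap-around, matching an 8-bit data bus

print(hex(acc), pc)         # 0x42 1 -- one byte consumed, no operand fetched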
Immediate Addressing (imm)

An instruction using immediate addressing has one data operand byte following the opcode; for example, ADD $03 ("add $03 to the contents of the accumulator") (Figure B-2).(1)
Figure B-2: The immediate addressing mode

The sequence commences when the program counter reaches the opcode for an immediate instruction (Figure B-2a), loads that opcode into the instruction register (Figure B-2b), and increments the program counter (Figure B-2c). Recognizing that this is an immediate instruction, the CPU reads the data byte pointed to by the program counter, executes the instruction using this data, stores the result in the accumulator (Figure B-2d), and increments the program counter to look for the next instruction (Figure B-2e). Instructions that can use immediate addressing are: ADD, ADDC, AND, CMPA, LDA, OR, SUB, SUBC, and XOR.

(1) Note that dollar "$" characters are used to indicate hexadecimal values. Also note that the Beboputer has a 16-bit address bus and an 8-bit data bus.
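Here's the companion sketch for the immediate mode using ADD $03 (the opcode value is again invented for illustration; the "$" prefix marks hexadecimal, as elsewhere in this book):

# A sketch of immediate addressing (hypothetical opcode). One data operand
# byte follows the opcode, so the CPU makes one extra read via the PC.

ADD_IMM = 0x10                    # invented opcode for ADD (immediate)

memory = [ADD_IMM, 0x03]          # ADD $03
pc, acc = 0, 0x20

opcode = memory[pc]; pc += 1      # fetch the opcode, increment the PC
if opcode == ADD_IMM:
    data = memory[pc]; pc += 1    # read the operand byte, increment the PC
    acc = (acc + data) & 0xFF     # the result lands in the accumulator

print(hex(acc), pc)               # 0x23 2 -- two bytes consumed in total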
Absolute Addressing (abs)
An instruction using absolute addressing has two address operand bytes following the opcode, and these two bytes are used to point to a byte of data (or to a byte in which to store data); for example, ADD [$4B06] ("add the data stored in location $4B06 to the contents of the accumulator") (Figure B-3).

The sequence commences when the program counter reaches the opcode for an absolute instruction (Figure B-3a), loads that opcode into the instruction register (Figure B-3b), and increments the program counter (Figure B-3c). Recognizing that this is an absolute instruction, the CPU reads the most-significant address byte from memory,(2) stores it in the most-significant byte of the temporary program counter (Figure B-3d), and increments the main program counter (Figure B-3e). The CPU then reads the least-significant address byte from memory, stores it in the least-significant byte of the temporary program counter (Figure B-3f), and increments the main program counter (Figure B-3g).

(2) The Beboputer assumes that addresses are stored with the most-significant byte "on top" of the least-significant byte.
Figure B-3: The absolute addressing mode

The main program counter is now "put on hold" while the CPU uses the temporary program counter to point to the target address containing the data (Figure B-3h). The CPU executes the original instruction using this data, stores the result into the accumulator (Figure B-3i), and returns control to the main program counter to look for the next instruction.
Instructions that can use standard absolute addressing are: ADD, ADDC, AND, CMPA, LDA, OR, STA, SUB, SUBC, and XOR. Note that, in the case of a STA ("store accumulator"), the contents of the accumulator would be copied (stored) into the data byte in memory. Also note that the jump instructions JMP, JC, JNC, JN, JNN, JO, JNO, JZ, JNZ, and JSR can be considered to use absolute addressing. In this case, however, the address operand bytes point to the target address which will be loaded into the main program counter.
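Continuing the same sketch style (invented opcode, Python standing in for the CPU's internals), here's the absolute mode for ADD [$4B06], including the temporary program counter assembled from the two address operand bytes, most-significant byte first:

# A sketch of absolute addressing (hypothetical opcode). Two address operand
# bytes follow the opcode; together they form a 16-bit pointer to the data.

ADD_ABS = 0x11                          # invented opcode for ADD (absolute)

memory = {0x0000: ADD_ABS,              # ADD [$4B06]
          0x0001: 0x4B,                 # most-significant address byte
          0x0002: 0x06,                 # least-significant address byte
          0x4B06: 0x05}                 # the target data byte
pc, acc = 0x0000, 0x20

opcode = memory[pc]; pc += 1
if opcode == ADD_ABS:
    temp_pc = (memory[pc] << 8) | memory[pc + 1]  # temporary program counter
    pc += 2
    acc = (acc + memory[temp_pc]) & 0xFF          # fetch the data via temp_pc

print(hex(acc), pc)                     # 0x25 3 -- opcode plus two address bytes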
"Big" Addressing Modes

Some instructions are used to load or store 16-bit quantities; for example, the instructions that load the CPU's Index (X), Stack Pointer (SP), and Interrupt
Vector (IV) registers. In the case of these instructions, we can consider the Beboputer to support two additional addressing flavors:
Big immediate: This is very similar to the standard immediate mode, except that an instruction using big immediate addressing has two data operand bytes following the opcode; for example, BLDSP $01C4 ("load $01C4 into the stack pointer"). Instructions that can use big immediate addressing are: BLDSP, BLDX, and BLDIV.

Big absolute: This is very similar to the standard absolute mode, but the two address operand bytes following the opcode are used to point to a pair of bytes from which to load or store data; for example, BLDSP [$4B06]
("load the two bytes of data starting at location $4B06 into the stack pointer"). Instructions that use big absolute addressing are: BLDSP, BLDX, BLDIV, BSTSP, and BSTX.
Beboputer Instructions by Categories

Figure B-4: Beboputer instructions by categories

Shifts & Rotates:
  SHL    Shift the accumulator left, 1 bit
  SHR    Shift the accumulator right, 1 bit
  ROLC   Rotate the accumulator left, 1 bit (through carry)
  RORC   Rotate the accumulator right, 1 bit (through carry)

Increments & Decrements:
  INCA   Increment the accumulator
  DECA   Decrement the accumulator
  INCX   Increment the index register
  DECX   Decrement the index register

Loads & Stores:
  LDA    Load data in memory into the accumulator
  STA    Store data in the accumulator into memory
  BLDX   Load data in memory into the index register
  BSTX   Store data in the index register into memory
  BLDSP  Load data in memory into the stack pointer
  BSTSP  Store data in the stack pointer into memory
  BLDIV  Load data in memory into the interrupt vector

Push & Pop:
  PUSHA  Push the accumulator onto the stack
  POPA   Pop the accumulator from the stack
  PUSHSR Push the status register onto the stack
  POPSR  Pop the status register from the stack

Jumps:
  JMP    Jump to a new memory location
  JSR    Jump to a subroutine
  JZ     Jump if the result was zero
  JNZ    Jump if the result wasn't zero
  JN     Jump if the result was negative
  JNN    Jump if the result wasn't negative
  JC     Jump if the result generated a carry
  JNC    Jump if the result didn't generate a carry
  JO     Jump if the result generated an overflow
  JNO    Jump if the result didn't generate an overflow
  RTS    Return from a subroutine
  RTI    Return from an interrupt
Beboputer Instruction Set Summary

Figure B-5: Beboputer instruction set summary

Figure B-6: Beboputer instruction set summary (continued)
430
/ Ano, og $ ( h e x a d e c i m a l v a l u e ) ................................ 151 ? ( d o n ' t care states) ............................. 108, 109 - X v a l u e s .................................................................. 1 1 5 # X v a l u e s ................................................................... 1 1 5 1 0 7 2 (IEEE s t a n d a r d ) ........................................ 4 8 1 1 - 2 3 P D P c o m p u t e r ............................................ 8 1 1 6 4 (IEEE s t a n d a r d ) ........................ 1 0 5 , 1 1 0 1 3 6 4 (IEEE s t a n d a r d ) ........................................ 4 8 2:1 m u l t i p l e x e r ........................... 1 0 9 , 1 1 1 , 1 1 5 2 2 2 5 t e s t e r ...................................................................... 9 2 A 1 ( S P I C E ) ........................................................... 1 2 1 2 G 6 ( S P I C E ) .......................................................... 1 2 1 2 G 7 ( S P I C E ) .......................................................... 1 2 1 ' 2 n v a l u e s ( L F S R s ) ............................................. 2 2 8 3 : 5 v a l e n c e ............................................................... 3 6 2 3 - b a n d d e l a y s ........................................................... 6 6 4:1 a r t w o r k .................................................................. 4 2 4 - b i t b i n a r y division test c a s e ................... 1 7 0 7 4 7 4 D - t y p e flip-flip ......................................... 2 0 3
AAA A b b y ................................................................ 4 0 2 , 4 1 5 A b b y d a l e G r a n g e G r a m m a r S c h o o l ......... 4 A B E L ................................................................................. 4 6 a b s o l u t e a d d r e s s i n g .......................................... 4 2 3 a c t i v e (logic s t r e n g t h ) ...................................... 1 0 0 a / D ................................................................................... 1 2 7 A / D ..................................................................... 1 2 7 , 1 3 1 A / d ..................................................................... 1 2 7 , 1 3 1 a d a m a s .......................................................................... 3 6 5
Adaptive logic (the c o n c e p t ) ........................... 2 9 4 , L o g i c (the c o m p a n y ) .................................. t i m e - s t e p a l g o r i t h m ..................................... A D D .................................................. 1 3 6 , 1 4 4 , A D D C .............................................................. 1 3 6 ,
305 273 124 148 148
bold - key definition    * - footnote    ~ - sidebar

addition ..... 145
addressing modes ..... 421, 422
    absolute addressing ..... 423
    "big" addressing modes ..... 424
    immediate addressing ..... 423
    implied addressing ..... 422
adrenaline rush ..... 112
airbag ..... 382
AL220 ..... 273
alanine ..... 376
algorithmic representations ..... 49
algorithms
    adaptive time-step ..... 124
    Calaveras algorithm ..... 130
    fault simulation ..... 94
    genetic algorithms ..... 297, 301
    hill-climbing algorithm ..... 299
    lockstep algorithm ..... 130
    non-restoring algorithm ..... 170
    restoring algorithm ..... 170
Alice's Adventures in Wonderland ..... 358
Allan Marquand ..... 354, 359
Alliance, Nebraska ..... 408
Alon Kfir ..... v, 28
alpha waves ..... 5
ALU ..... 143
    core ALU ..... 144
    subtractions ..... 148
Alvin Brown ..... v, 16, 166, 179
ambiguity (timing analysis) ..... 78, 80
    common-mode ambiguity ..... 80, 117
    correlation ..... 81
    re-convergent fanout ..... 80, 117
American football fields ..... 373
amino acids ..... 376
amorphous silicon ..... 250
Amphisbaena ..... 220
amplifier
    audio amplifier ..... 4
    brainwave amplifier ..... 5
Analog
    and Mixed Signal conference ..... 11
    cosimulation with digital ..... 128
    effects ..... 120
    interface characteristics ..... 125
    Magic ..... v, 273
    mixed-signal ..... 120, 127
    simulation ..... 43
        a/D simulation ..... 127
        A/D simulation ..... 127, 131
        A/d simulation ..... 127, 131
        adaptive time-step ..... 124
        analytical solution ..... 122
        convergence ..... 124
        cosimulation with digital ..... 128
        differential equations ..... 122
        discontinuities ..... 124
        matrix ..... 122
        mixed-signal ..... 126
        SPICE ..... 43, 121
        SPICE 1 ..... 121
        SPICE 2 ..... 121
        SPICE 3 ..... 122
        tolerance criteria ..... 123, 124
analysis
    guided probe analysis ..... 231
    signature analysis ..... 231
    timing analysis ..... 73
        cheap-and-cheerful ..... 62
        dynamic timing analysis ..... 77, 87
        static timing analysis ..... 68, 74, 87
        timing diagram analyzers ..... 82
        worst-case analysis ..... 77
analytical solution ..... 122
AND arrays ..... 255
angular momentum of electrons ..... 5
annealing, simulated ..... 301
annoying tunes ..... 12
antifuses ..... 250
Apple ..... 214
Application-Specific Integrated Circuits ..... 7, 40, 52, 82, 246, 268
    and state machines ..... 187
    basic cells ..... 268
    cell libraries ..... 10
    designing ..... 56~
    Micromosaic ..... 51, 247, 276
Aptix Corporation ..... 272*
Aristotle ..... 354
Arithmetic Logic Unit ..... 143
    core ALU ..... 144
    subtractions ..... 148
Arkansas ..... 403, 414
ARM-6 ..... 214
armpit ..... 405
Ars Magna ..... 355
artwork ..... 42
ASCII ..... 74
ASICs ..... 7, 40, 52, 82, 246, 268
    and state machines ..... 187
    basic cells ..... 268
    cell libraries ..... 10
    designing ..... 56~
    Micromosaic ..... 51, 247, 276
asynchronous ..... 199
    control circuits ..... 204
    examples, real-world ..... 216
    hazards ..... 205
        static hazards ..... 206
        dynamic hazards ..... 206
    islands of logic ..... 211
    matched delays ..... 210
    microprocessors ..... 214
    oscillations ..... 205
    pseudo-synchronous ..... 210
    race conditions ..... 205
        critical races ..... 206
        non-critical races ..... 206
    real-world examples ..... 216
    state machines ..... 79, 204
    tuned-races ..... 210
    wave-pipelining ..... 210
ATE ..... 86
Atmel Corporation ..... 294
atoms ..... 372
ATPG ..... 86, 208
audio amplifier ..... 4
automatic test
    equipment ..... 86
    pattern generation ..... 86, 208
automotive industry ..... 41
avalanche effect ..... 249

B

backplane (cosimulation technique) ..... 129
Badlands, South Dakota ..... 409
bagpipes ..... 184
Bald Knob, Arkansas ..... 414
band-gap effects ..... 362
barrel shifter/rotator ..... 7
base (of number system) ..... 136
baseballs ..... 373
basic cells ..... 268
basketballs ..... 373
batch processes (SPICE) ..... 122
Batman ..... 136
Battle of Big-Butt Ridge ..... 405
BCD ..... 7
beans
    cool beans ..... 98
    old beans ..... 125
Bebop
    BYTES Back ..... 13, 16
    to the Boolean Boogie ..... 11
Beboputer Virtual Computer ..... 16
    addressing modes ..... 421
    CPU register display ..... 24
    hex keypad ..... 21
    installing ..... 419
    instruction set ..... 421, 426
    memory walker display ..... 21
    multimedia ..... 24
    paper tape reader/writer ..... 23
    QWERTY keyboard ..... 25
    sound card ..... 25
    switch panel ..... 18
    web pages ..... 25
behavioral representations ..... 49
Benjamin Burack ..... 359
Berkeley University ..... 43, 121
Berman, Victor ..... 106
bias voltage ..... 4
Big
    -Butt Ridge ..... 405
    -endian ..... 358
Bike Rally (Sturgis) ..... 408
billboards ..... 409
Bill Hickock ..... 410
billiard cue ..... 357*
binary
    arithmetic ..... 133
        ADD ..... 136, 144, 148
        ADDC ..... 136, 148
        addition ..... 145
        borrow operations ..... 137
        CMP ..... 144, 148
        complement techniques ..... 136
        diminished radix complements ..... 136, 138
        division ..... 166
        end-around-carry ..... 137, 138
        minuend ..... 137, 138
        multiplication ..... 155
        one's complements ..... 138
        overflow ..... 145
        radix complements ..... 136, 138
        sign-magnitude format ..... 140
        signed binary numbers ..... 140
        SUB ..... 136, 148
        SUBB ..... 136
        SUBC ..... 136, 148
        subtraction ..... 148
        subtrahend ..... 137, 138
        two's complements ..... 138, 139, 141
        unsigned binary numbers ..... 138
    -coded decimal ..... 7
    counter ..... 185
    division ..... 166
        4-bit test case ..... 170
        example subroutine ..... 173
        long division ..... 168
        negative values ..... 173
        non-restoring algorithm ..... 170
        remainder (sign of) ..... 178
        restoring algorithm ..... 170
    -encoded state machines ..... 186, 187, 190
    logic (versus tertiary logic) ..... 98
    multiplication ..... 155
        example subroutine ..... 161
        partial products ..... 156
        shift-and-add technique ..... 156
        signed binary multiplication ..... 160
    numbers
        signed binary numbers ..... 140
        unsigned binary numbers ..... 138
    to gray code converter ..... 311
bio-feedback ..... 5
biological catalysts ..... 378
bipolar junction transistors ..... 247
BIST (LFSRs) ..... 231
bit per state ..... 188
Black Hills ..... 408
bonds
    hydrogen bonds ..... 375
    peptide bonds ..... 376
Boolean Algebra ..... 354~, 359
Boole, George ..... 354~
BOOL logic synthesis ..... v, 28, 207*, 238
    hazard-free logic ..... 207*
borrow operations ..... 137
bottom-up ..... 49
brainwave amplifier ..... 5
Brian, The Life of ..... 201
bridging faults ..... 318
Broken Bow, Nebraska ..... 407
Brown
    Alvin Brown ..... v, 16, 166, 179
    Sue Brown ..... v, 16
Brunel University ..... 43, 121
BSIM MOSFET model ..... 121*
built-in self-test (LFSRs) ..... 231
bull's testicles ..... 413
Burack, Benjamin ..... 359
burglar alarm ..... 382

C

Cache logic ..... 294
CAD ..... 40
Cadence Design Systems ..... 106
CAE ..... 40
CAL ..... 40
Calamity Jane ..... 410
Calaveras algorithm ..... 130
calcium ..... 376
calculator, mechanical ..... 356
Calculus-based search techniques ..... 301
Caltech ..... 214, 215
CAM ..... 40
camels ..... 373
can of worms ..... 41
canonical form ..... 52
capacitive
    drive (Verilog logic value) ..... 105
        large capacitive ..... 105
        medium capacitive ..... 105
        small capacitive ..... 105
    element ..... 99, 101
Captain Edward A. Murphy ..... 112*
carbon ..... 375
cards, punched ..... 6
    SPICE deck ..... 43
Careers Officer ..... 5
Carl
    Clements ..... 4
    Jung ..... 220
Carol Lewis ..... 12
Car Henge ..... 408
car radio ..... 288
Carroll, Lewis ..... 355, 358
casting out the nines ..... 136
CAT ..... 40
catalysts, biological ..... 378
catapult ..... 200
C-element ..... 210
center of USA ..... 405
ceramic package ..... 251
CH (charged high) ..... 101, 103
chains (timing chains) ..... 185
channeled gate arrays ..... 269
charged (logic strength) ..... 100
    CH (charged high) ..... 101, 103
    CL (charged low) ..... 101, 103
    CX (charged X) ..... 101
Charles
    Dodgson ..... 355, 358
    Pierce ..... 359
    Stanhope ..... 359
cheap-and-cheerful timing analysis ..... 62
checkerboard
    memory test sequence ..... 325
    patterns ..... 308
checksum values ..... 225, 328
chemically-amplified resists ..... 362
chemical vapor
    deposition ..... 364, 365
    infiltration ..... 365
Cheyenne Wells, Colorado ..... 411
China ..... 11
    Forbidden City ..... 11*
    Great Wall of China ..... 11*
chip-on-chip ..... 367
Chronology Corporation ..... 82
Chuck Paglicco ..... 72~
cigar-smoking women ..... 413
circuit board ..... 41
    artwork ..... 42
    film artwork ..... 42
    footprint ..... 41
    taping up ..... 42
    woolly sweaters ..... 42*
circus jugglers ..... 374
Cirrus
    Computers ..... 10
    Designs ..... 8
cities in the sky ..... 300
CL (charged low) ..... 101, 103
clashes (Xs) ..... 98, 110
Claude Shannon ..... 354~, 360
Clements, Carl ..... 4
clock
    frequency ..... 76, 83
        beware ..... 120
    mark/space ratio ..... 76, 83
    period ..... 76, 83
    three-phase clock ..... 233
Close Encounters of the Third Kind ..... 411
CLRIM instruction ..... 388
CMOS ..... 209
CMP ..... 144, 148
coarse-grained architectures ..... 269
    problems with ..... 279
COC ..... 367
Cohen, Danny ..... 358
cold (low energy) electron injection ..... 253
collapsing faults ..... 90
collective consciousness ..... 220
Colorado ..... 411
combinational ..... 200
combinatorial ..... 200
common-mode ambiguity ..... 80, 117
common timing libraries ..... 63
communications (LFSRs) ..... 230
compass ..... 405
complement techniques ..... 136
    casting out the nines ..... 136
    diminished radix complements ..... 136, 138
    nine's complements ..... 136, 137
    one's complements ..... 138
    radix complements ..... 136, 138
    ten's complements ..... 136, 137
    two's complements ..... 138, 139, 141
complex PLDs ..... 266
compressing
    data ..... 231, 293
    pulses ..... 64
Computer-Aided
    Design ..... 40
    Engineering ..... 40
    Layout ..... 40
    Manufacture ..... 40
    Test ..... 40
computers
    Cirrus Computers ..... 10
    mainframe computers ..... 6
    microprocessors ..... 10
conductors ..... 377
conferences
    Analog and Mixed Signal ..... 11
    Electronics Design Automation & Test ..... 11
    Printed Circuit Design ..... 11
configurable logic ..... 288
consciousness, collective ..... 220
Control
    circuits, asynchronous ..... 204
    Engineering ..... 6
    logic ..... 52
controlled oscillations (Xs) ..... 110
convergence ..... 124
convoluted Greek dramas ..... 184
Conway, Arkansas ..... 414
cool beans ..... 98
core
c o r e ALU .................................................................... 1 4 4 c o r r e l a t i o n .................................................................... 81 Cortes, D o n ............................................................. 2 1 7 cosimulation between a n a l o g a n d digital s i m u l a t o r s ............... 1 2 8 c o u p l e d t e c h n i q u e ..................... 129, 1 3 0 b a c k p l a n e t e c h n i q u e ............................. 1 2 9 g l u e d t e c h n i q u e ............................ 129, 1 3 0 unified t e c h n i q u e ..................................... 1 2 9 logic v a l u e sets ................................................. 1 0 4 counters J o h n s o n C o u n t e r s ........................................ 2 4 1 ring c o u n t e r s ..................................................... 2 4 1 C o u n t r y & W e s t e r n ........................................ 4 0 6 coupled ( c o s i m u l a t i o n t e c h n i q u e ) ............... 129, 1 3 0 C P L D s ........................................................................ 2 6 6 C P U register display ........................................... 2 4 C R C s .................................................. 2 2 5 , 2 3 0 , 3 2 8 cringing w h i n e r s ................................................... 1 1 2 critical r a c e s ............................................................. 2 0 6 c r o s s o v e r (genetic algorithms)... 2 9 7 , 3 0 1 c r o s s - p r o d u c t v a l u e sets ............................... 1 0 0 cruise missile ........................................................... 2 9 2 c u r r e n t state ............................................................. 1 8 4 C V D .............................................................................. 3 6 4 CVI .............................................................................. 3 6 5 C X ( c h a r g e d X) .................................................... 101 c y c l e - b a s e d s i m u l a t i o n .................................. 6 8 Q u i c k t u r n ................................................................ 6 8 S p e e d S i m ............................................................... 6 8 cyclic redundancy codes. 225, 230, 328 c y c l o t o m i c p o l y n o m i a l s ................................. 2 3 2
DDD DO ( d r i v e 0 f a u l t ) ................................................. 8 8 D1 (drive 1 fault) ................................................... 8 8 D.A. H u f f m a n ........................................................ 185 d a n c i n g girls ............................................................... 14 D a n n y C o h e n ........................................................ 3 5 8
data
c o m m u n i c a t i o n s (LFSRs) ...................... c o m p r e s s i o n (LFSRs) ................................ integrity (LFSRs) ........................................... t y p e s (VHDL) ...................................................
230 213 230 105
/
databook delays ...........................................................................62 X v e r s u s ? v a l u e s ........................................... 1 0 9 d a t a p a t h l o g i c ........................................................ 5 2
Dave G i l l o o l y .......................................................................... v Potts .............................................................................. 6 David Muller ......................................................... 2 1 0 D e a d w o o d , S o u t h D a k o t a .......................... 4 1 0 d e c k of c a r d s ............................................................. 4 3 d e c r y p t i o n (LFSRs) .......................................... 2 3 0 d e e p - s u b m i c r o n d e l a y s . 48, 61, 82, 3 3 1 m u l t i - i n p u t transitions drive capability ........................................... 3 4 5 P n - P n d e l a y s ............................................... 3 4 4 path - d e p e n d e n t drive ..................................... 3 4 1 -specific P n - P n ........................................... 3 3 8 r e f l e c t e d parasitics ........................................ 3 4 6 slope-dependent drive capability ........................................... 3 4 2 P n - P n d e l a y s ............................................... 3 4 0 state-dependent drive capability ........................................... 3 4 2 i n p u t t h r e s h o l d s ......................................... 3 4 3 P n - P n ................................................................. 3 4 1 t e r m i n a l parasitics .................................... 3 4 4 t h r e s h o l d - d e p e n d e n t P n - P n ................ 3 3 9 verification t e c h n o l o g i e s ......................... 3 4 6 d e i t y .............................................................................. 1 8 4 d e l a y lines ....................................................... 65, 1 2 6 delays 3 - b a n d d e l a y s ...................................................... 6 6 d a t a b o o k d e l a y s ................................................ 6 2 d e e p - s u b m i c r o n d e l a y s 48, 61, 82, 3 3 1 d i s t r i b u t e d d e l a y s ................................ 63, 1 2 6 inertial d e l a y s .......................................... 65, 1 2 6 m a x d e l a y s ................................................... 61, 7 4 m i n d e l a y s .................................................... 61, 7 4 m i n : m a x d e l a y s .................................................. 77 m i n : t y p d e l a y s ..................................................... 7 7 m i n : t y p : m a x d e l a y s .............................. 61, 7 4 P i n - t o - P i n d e l a y s .................... 64, 126, 3 3 4 P n - P n d e l a y s .............................. 64, 126, 3 3 4 P o i n t - t o - P o i n t d e l a y s .................................. 3 3 4
b o l d - k e y definition * - f o o t n o t e ~ - s i d e b a r
    Pt-Pt delays ..... 334
    to and from X values ..... 99
    transport delays ..... 65, 126
    typ delays ..... 61, 74
    typ:max delays ..... 61, 74
    unbalanced delays ..... 64
    X values and delays ..... 99
delta time ..... 122
DeMorgan Transformations ..... 239, 243, 257
Design
    Analysis Associates ..... 195, 216
    capture ..... 40
    methodologies
        bottom-up ..... 49
        middle-out ..... 49
        mixed-level ..... 54
        top-down ..... 49
    synchronous designs ..... 68, 76, 207
designer molecules ..... 379
designers versus engineers ..... 40
designing ASICs and FPGAs ..... 56~
detecting Zs and Xs ..... 93
Deus ex Machina ..... 184
deuterium ..... 373*
Devil's Tower, Wyoming ..... 411
DH (driving high) ..... 101, 103
diamond
    substrates ..... 365
    transistors ..... 366
differential equations (SPICE) ..... 122
digital
    cosimulation with analog ..... 128
    fault simulation ..... 85
        algorithms ..... 94
        collapsing faults ..... 90
        detecting Zs and Xs ..... 93
        drive faults ..... 88
        equivalencing faults ..... 90
        equivalent faults ..... 91
        fault collapsing ..... 90
        functional faults ..... 92
        initialization ..... 94
        open faults ..... 88
        potential detections ..... 93, 94
        primitive gate faults ..... 92
        short faults ..... 92
        stuck-at faults ..... 88, 92
        Xs (detecting) ..... 93
        Zs (detecting) ..... 93
    logic simulation ..... 57, 125
        3-band delays ..... 66
        compressing pulses ..... 64
        cosimulation with analog ..... 128
        cycle-based ..... 68
        distributed delays ..... 63, 126
        event
            -driven ..... 61
            -wheel ..... 125
        hardware emulators ..... 70
        HILO ..... 125
        home-brewed ..... 69
        narrow pulses ..... 63, 126
        inertial delays ..... 65, 126
        logic value systems ..... 97
        mixed-level ..... 66
        Pin-to-Pin delays ..... 64, 126
        Pn-Pn delays ..... 64, 126
        stretching pulses ..... 64
        traditional logic ..... 58
        transport delays ..... 65, 126
        unbalanced delays ..... 64
    signal processing ..... 127
        reconfigurable logic ..... 292
    there's no such thing ..... 120
diminished radix complements ..... 136, 138
dingbat ..... 4
discontinuities ..... 124
Dissertio de Arte Combinatoria ..... 356
distributed
    delays ..... 63, 126
    RC interconnect ..... 336
division, binary ..... 166
    4-bit test case ..... 170
    example subroutine ..... 173
    long division ..... 168
    negative values ..... 173
    non-restoring algorithm ..... 170
    remainder (sign of) ..... 178
    restoring algorithm ..... 170
DL (driving low) ..... 101, 103
Dodgson, Charles ..... 355, 358
Don Cortes ..... 217
don't
    care ..... 108, 109
    know ..... 108
Doone Publications ..... v, 13*, 186*
Dorothy ..... 404
Doug Smith ..... 56~, 186*
drafting office ..... 41
draftsmen and draftswomen ..... 41
dragons ..... 108
Dribbler, Gonzo ..... 136
drive
    capability
        multi-input transitions ..... 345
        path-dependent ..... 341
        slope-dependent ..... 342
        state-dependent ..... 342
    faults ..... 88
driven/driving
    logic strength ..... 100
        DH (driving high) ..... 101, 103
        DL (driving low) ..... 101, 103
        DX (driving X) ..... 101
    faults ..... 88
        D0 (drive 0) ..... 88
        D1 (drive 1) ..... 88
        DZ (drive Z) ..... 88
DSPs ..... 127
    reconfigurable logic ..... 292
D-type flip-flop ..... 109, 116
    7474 ..... 203
dynamic
    memory faults ..... 318
    power dissipation ..... 209
    RAM ..... 251
    timing analysis ..... 77, 87
    X values ..... 114, 117
dynamically reconfigurable logic ..... 288
DX (driving X) ..... 101
DZ (drive Z fault) ..... 88
E

ECL ..... 65
EDA ..... 40
    the dawning of ..... 42
EDA&T ..... 11
EDIF ..... 281
ED magazine ..... v
Ed Smith ..... 69
edge speed ..... 120
editor
    state diagram editor ..... 50, 55
    flowchart editor ..... 50, 55
EDN magazine ..... v
EDTN magazine ..... v, 56~
Edward A. Murphy ..... 112*
EEPLDs ..... 264
EEPROM transistors ..... 252, 263
E. F. Moore ..... 185
effluent, radioactive ..... 189
electromagnetic transistor fabrication ..... 368
Electronic Design
    Automation ..... 40
        the dawning of ..... 42
    magazine ..... v
    & Technology Network online magazine ..... v, 56~
    News magazine ..... v
Electronics Design Automation & Test conference ..... 11
electrons ..... 372
    angular momentum of ..... 5
    excitations ..... 377*
    existence of ..... 6*
    Fowler-Nordheim tunneling ..... 253
    shells ..... 373
elephants ..... 407
encoding (state machines)
    binary encoded ..... 186, 187, 190
    one-hot encoding ..... 186, 187, 190
encryption (LFSRs) ..... 230
end-around-carry ..... 137, 138
Engineering, Control ..... 6
engineers versus designers ..... 40
English football ..... 4
Enumerative search techniques ..... 301
environmental conditions ..... 61
enzymes ..... 378
    designer enzymes ..... 379
EPACs ..... 273
EPLDs ..... 264
EPROM transistors ..... 250, 263
EQU operator ..... 115
equivalencing faults ..... 90
equivalent faults ..... 91
Ernest Nagy de Nagybaczon ..... 366
Euler, Leonhard ..... 354
evaluation cycle ..... 102
event
    -driven ..... 61
    -wheel ..... 125
examples
    asynchronous designs ..... 216
    Many-Small state machines ..... 191, 194
    subroutines
        division (binary) ..... 173
        multiplication (binary) ..... 161
extrinsic delays ..... 333
F

Fairchild ..... 51
false paths ..... 76
fanout, re-convergent ..... 80, 117
fashion statement, polyester ..... 5
fault
    collapsing ..... 90
    simulation ..... 85
        algorithms ..... 94
        collapsing faults ..... 90
        detecting Zs and Xs ..... 93
        drive faults ..... 88
        equivalencing faults ..... 90
        equivalent faults ..... 91
        fault collapsing ..... 90
        functional faults ..... 92
        initialization ..... 94
        open faults ..... 88
        potential detections ..... 93, 94
        primitive gate faults ..... 92
        short faults ..... 92
        stuck-at faults ..... 88, 92
        Xs (detecting) ..... 93
        Zs (detecting) ..... 93
faults ..... 85
    bridging faults ..... 318
    collapsing faults ..... 90
    drive faults ..... 88
    equivalencing faults ..... 90
    equivalent faults ..... 91
    fault collapsing ..... 90
    functional faults ..... 92
    open faults ..... 88, 318
    primitive gate faults ..... 92
    short faults ..... 92
    stuck-at faults ..... 88, 92, 318
    versus fault effects ..... 318
feedback
    loops ..... 77, 79
    mixed-signal circuits ..... 126
    sequential logic ..... 201
FET ..... 99, 102
Feynman, Richard ..... 6*, 378
FH (forced high) ..... 101, 103
Fibonacci ..... 301
field-effect transistors ..... 247
field-programmable
    analog devices ..... 247, 273
    devices ..... 245
        antifuses ..... 250
        CPLDs ..... 266
        EEPLDs ..... 264
        EEPROM transistors ..... 252, 263
        EPLDs ..... 264
        EPROM transistors ..... 250, 263
        FLASH transistors ..... 253
        folded logic ..... 257
        FPADs ..... 247, 273
        FPGAs ..... 40, 268, 290
        FPICs ..... 272
        FPIDs ..... 272
        FPMSDs ..... 247, 273
        fusible links ..... 248, 254
            lateral fuses ..... 249
            vertical fuses ..... 249
field-programmable / FPDs 4 3 9
field-programmable (continued)
finite state machines (continued)
devices (continued) i n - s y s t e m p r o g r a m m a b l e ..... 2 5 3 , 2 9 0
n e x t state ..............................................................1 8 4
I S P .......................................................... 2 5 3 , 2 9 0 one-time
One -Big m a c h i n e s ............................................. 191
n o n - h i e r a r c h i c a l .............................................. 191
p r o g r a m m a b l e ............... 2 4 9 , 2 5 0 , 2 5 4
- h o t e n c o d i n g ................... 186, 187, 1 9 0
O T P .......................................... 2 4 9 , 2 5 0 , 2 5 4
p e e r - t o - p e e r i m p l e m e n t a t i o n s ........... 1 9 4
P A L s ................................................................... 2 5 8
PLDs
P L A s ................................................................... 2 5 7 P L D s ......................................................... 4 6 , 253
r e a l - w o r l d e x a m p l e s ................................... 1 9 5
.............................................187, 189, 1 9 4
p r o g r a m m i n g t e c h n o l o g i e s .............. 2 4 8
state variables ...................................................1 8 4
a n t i f u s e s ...................................................... 2 5 0
state-per-bit ........................................................1 8 8
r e g i s t e r e d - o u t p u t m a c h i n e s ..... 188, 2 3 6
E E P R O M t r a n s i s t o r s ........... 2 5 2 , 2 6 3
u n d e f i n e d states ................................ 187, 1 8 9
E P R O M t r a n s i s t o r s ............... 2 5 0 , 2 6 3
bold - key definition    * - footnote    ~ - sidebar

field-programmable devices (continued)
    fusible links .... 248
    PROMs .... 258
    SPLDs .... 46, 253
    taxonomy .... 246
    terminology .... 246
    timeline .... 247
field-programmable gate arrays .... 40, 268, 290
    and state machines .... 187, 189
    designing .... 56~
    FPGA-based hardware emulators .... 70
field-programmable interconnect devices .... 272
field-programmable mixed-signal devices .... 247, 273
FIFO .... 216
    as an LFSR application .... 227
figura universalis .... 356
film artwork .... 42
filters ("X filters") .... 114
fine-grained architectures .... 269
finite state machines .... 49, 183, 261
    ASICs .... 187
    asynchronous .... 79, 204
    binary encoded .... 186, 187, 190
    bit per state .... 188
    current state .... 184
    examples, real-world .... 195
    FPGAs .... 187, 189
    hierarchical .... 191
    initialization .... 189
    Many-Small machines .... 191, 194
    Mealy machines .... 185, 186
    Moore machines .... 185, 186
first-in first-out RAM .... 216
    as an LFSR application .... 227
fitness (genetic algorithms) .... 302
FL (forced low) .... 101, 103
FLASH transistors .... 253
flat schematics .... 45
flip-flop, D-type .... 109, 116
    7474 .... 203
flowchart editor .... 50, 55
flowcharts .... 59, 278
fly-by-wire aircraft .... 6
folded logic .... 257
football fields .... 373
footprints .... 41
Forbidden City .... 11*
forced (logic strength) .... 100
    FH (forced high) .... 101, 103
    FL (forced low) .... 101, 103
    FX (forced X) .... 101
Forest Gump .... 188, 189*
formal verification .... 58
FORTRAN .... 6
    SPICE .... 122
four-part harmony .... 213
Fowler-Nordheim electron tunneling .... 253
FPADs .... 247, 273
FPDs .... 245
    antifuses .... 250
    CPLDs .... 266
    EEPLDs .... 264
    EEPROM transistors .... 252, 263
    EPLDs .... 264
    EPROM transistors .... 250, 263
    FLASH transistors .... 253
    folded logic .... 257
    FPADs .... 247, 273
    FPGAs .... 40, 268
    FPICs .... 272
    FPIDs .... 272
    FPMSDs .... 247, 273
    fusible links .... 248, 254
        lateral fuses .... 249
        vertical fuses .... 249
    in-system programmable .... 253, 290
    ISP .... 253, 290
    one-time programmable .... 249, 250, 254
    OTP .... 249, 250, 254
    PALs .... 258
    PLAs .... 257
    PLDs .... 46, 253
    programming technologies .... 248
        antifuses .... 250
        EEPROM transistors .... 252, 263
        EPROM transistors .... 250, 263
        fusible links .... 248
    PROMs .... 258
    SPLDs .... 46, 253
    taxonomy .... 246
    terminology .... 246
    timeline .... 247
FPGAs .... 40, 268
    and state machines .... 187, 189
    -based hardware emulators .... 70
    designing .... 56~
FPICs .... 272
FPIDs .... 272
FPMSDs .... 247, 273
fractal-based compression .... 293
frequency, clock .... 76, 83
    beware .... 120
FSMs .... 49, 183, 261
functional
    faults .... 92
    latency .... 294
    memory faults .... 318
    redundancy .... 294
    representations .... 49
    test & test programs .... 9, 231
        guided probe analysis .... 231
fuse files .... 46
fusible links .... 248, 254
    lateral fuses .... 249
    vertical fuses .... 249
future Xs .... 114
    ~X values .... 115
    #X values .... 115
    ID numbers .... 116
    static versus dynamic .... 114, 117
fuzzy logic .... 305
    microcontroller .... 273
FX (forced X) .... 101

G
GaAs .... 362
GALs .... 264
GAs (genetic algorithms) .... 297, 301
gallium arsenide .... 362
garden peas .... 373
gas, noble .... 373
gate swapping .... 45
General Goober IV Jr .... 405
generating
    one's complements .... 138
    two's complements .... 139
generic array logic .... 264
genetic algorithms .... 297, 301
GenRad .... 9*
    2225 tester .... 9
George
    Boole .... 354~
    Mills .... v, 34
germanium .... 364
Gerry Musgrave .... 125
G.H. Mealy .... 185
Gillooly, Dave .... v
girls softball teams .... 403
glitches .... 206
globs of logic .... 68
glued (cosimulation technique) .... 129, 130
Gnarly Grotto .... 409
gone
    -high, don't know when .... 78
    -low, don't know when .... 78
Gonzo Dribbler .... 136
Gottfried von Leibniz .... 356
graphical entry mechanisms .... 50
    state diagram editors .... 50, 55
    flowchart editors .... 50, 55
gray codes .... 311
Great Wall of China .... 11*
Greek dramas .... 184
Green PCs .... 213
grockles .... 382
guided
    probe analysis .... 231
    random searches .... 301
Gulliver's Travels .... 357
Gumbo .... 12
Gump, Forest .... 188, 189*
Guthrie, Oklahoma .... 412

H
HaL Computers .... 215
HALT instruction .... 392
hamsters
    farms .... 407
    liquidizing .... 375
hardware
    description languages .... 46, 264, 277
        ABEL .... 46
        algorithmic representations .... 49
        behavioral representations .... 49
        functional representations .... 49
        graphical entry mechanisms .... 50
        HDL wars .... 49
        language-driven design .... 51, 52
        structural representations .... 49
        UDL/I .... 53
        Verilog .... 48, 53, 56~, 60, 105
        VHDL .... 48, 53, 56~, 60, 105
    emulators .... 70
    -software integration .... 69
    simulators .... 70
Harley, Robert .... 359
harmonics .... 121
harmony, four-part .... 213
hazard-free logic .... 207
hazards .... 205
    static hazards .... 206
    dynamic hazards .... 206
HDL Chip Design (the book) .... 56~, 186*
HDLs .... 46, 264, 277
    ABEL .... 46
    algorithmic representations .... 49
    behavioral representations .... 49
    functional representations .... 49
    graphical entry mechanisms .... 50
    HDL wars .... 49
    language-driven design .... 51, 52, 277
    structural representations .... 49
    UDL/I .... 53
    Verilog .... 48, 53, 56~, 60, 105
    VHDL .... 48, 53, 56~, 60, 105
heavy water .... 373*
helium atom .... 373
helix .... 377
herring, red .... 121
heterojunction transistors .... 363
heuristic techniques .... 125
Hewlett Packard .... 215
hex keypad .... 21
Hickock, Wild Bill .... 410
hierarchical
    logic strengths .... 101
    representations .... 45
    state machines .... 191
high
    energy electron injection .... 251, 252
    -frequency noise .... 125
    -impedance (Z) values .... 99, 101, 103
        fault simulation .... 89, 93
    -speed design .... 120
    -to-low transition .... 61
HighText Publications .... v, 11*
    Carol Lewis .... 12
high-to-low transition .... 61
hill-climbing algorithm .... 299
historical marker .... 404
HL (high-to-low) .... 61
Holland, J.H. .... 302
holographic optical interconnects .... 11
Holy
    Socks .... 136
    Wars .... 358
home-brewed logic simulators .... 69
homojunction transistors .... 363
honking big
    catapults .... 200
    mainframe computers .... 6
hot (high energy) electron injection .... 251, 252
Huffman, D.A. .... 185
Hustead, Ted .... 410
hydrogen
    atoms .... 373
    bonds .... 375

I
IACK .... 393
ice crystals .... 375
ICL .... 6
ICR .... 292
ICS .... 10
ID numbers (for X values) .... 116
identification numbers (for X values) .... 116
IEEE .... 48, 105, 110
immediate addressing .... 423
IMP Inc .... v, 273
implied addressing .... 422
in-circuit reconfigurable .... 292
inclusions .... 364
inert gas .... 373
inertial delays .... 65, 126
ingeniators .... 200
initialization considerations
    fault simulation .... 94
    LFSRs .... 226
    state machines .... 189
    X values .... 112
input switching thresholds .... 333
    state-dependent .... 343
installing your Beboputer .... 419
insulators .... 377
in-system programmable .... 253, 290
Intel Hex format .... 265*
interconnect models .... 335
    distributed RC model .... 336
    lumped-load model .... 335
    pure LC model .... 337
    RLC model .... 337
interface elements .... 128, 129
Intergraph
    Computer Systems .... 10
    Corporation .... 10
    Electronics .... 10
    RealiZm graphics cards .... 11
International Computers Ltd .... 6
interrupt(s)
    acknowledge output .... 393
    CLRIM instruction .... 388
    -driven I/O .... 393
    HALT instruction .... 392
    IACK .... 393
    IRQ .... 387
    IV .... 390
    latch .... 384
    mask (status flag) .... 388
    nested interrupts .... 399
    NMI .... 390
    non-maskable interrupts .... 390
    NOP instruction .... 392
    polling .... 382
    priority encoding .... 396
    request input .... 387
    RTI instruction .... 390
    SETIM instruction .... 388
    service routine .... 389
    software interrupts .... 391
    SWI .... 391
    vector .... 390
interval value sets .... 102
    prime (logic) values .... 103
in the order of .... 121*
intrinsic delays .... 333
inverse (NOT) X values .... 115
I/O, interrupt-driven .... 393
ions .... 373
IRQ .... 387
Isaac Newton .... 301
islands of logic .... 211
isotopes .... 373
ISP .... 253, 290
IV .... 390
Ivan Sutherland .... 215

J
Jack Daniels .... 410
Jacuzzi .... 391
James Herriot books .... 407
JEDEC .... 46, 265
Jett, Preston .... 193
Jevons
    Logic Machine .... 359
    William Jevons .... 359
J.H. Holland .... 302
John
    Hopkins University .... 355
    Venn .... 354
Johnson Counter .... 241
Jonathan Swift .... 357
Jonesboro, Arkansas .... 403
jugglers .... 374
Jung, Carl .... 220
junior woodchucks .... 6, 192

K
Kansas and Kansas City .... 404
Karnaugh Maps .... 355
    Maurice Karnaugh .... 355
ker-thunk .... 119
Keystone, South Dakota .... 409
Kfir, Alon .... v, 28
kite in thunderstorm .... 377
Korea .... 11

L
language-driven design .... 51, 52, 277
Laputa .... 357
large
    capacitive (logic value) .... 105
    -scale integration .... 121
latch, interrupt .... 384
lateral fuses .... 249
Law, Murphy's .... 112
layout
    automatic tools .... 45
    designers .... 40
    place-and-route .... 45, 301
LDD .... 51, 52, 277
Lebanon, Kansas .... 405
Leibniz, Gottfried von .... 356
Leonhard Euler .... 354
Letter from America .... 401
Lewis
    Carol .... 12
    Carroll .... 355, 358
LFSR .... 69, 219, 294, 328
    2^n values .... 228
    applications .... 230-232
    checksum values .... 225
    CRCs .... 225, 230
    cyclic redundancy codes .... 225, 230
    FIFO applications .... 227
    initializing .... 226
    many-to-one LFSRs .... 220
    maximal
        -displacement .... 225
        -length .... 221
    one-to-many LFSRs .... 225
    previous value .... 229
    seed value .... 226
    taps .... 220
        LFSRs with 2 to 32 bits .... 223
    XOR versus XNOR .... 221
LH (low-to-high) .... 61
library of parameterized modules .... 275
linear feedback shift registers .... 69, 219, 294, 328
    2^n values .... 228
    applications .... 230-232
    checksum values .... 225
    CRCs .... 225, 230
    cyclic redundancy codes .... 225, 230
    FIFO applications .... 227
    initializing .... 226
    many-to-one LFSRs .... 220
    maximal
        -displacement .... 225
        -length .... 221
    one-to-many LFSRs .... 225
    previous value .... 229
    seed value .... 226
    taps .... 220
        LFSRs with 2 to 32 bits .... 223
    XOR versus XNOR .... 221
liquidizing hamsters .... 375
lithium atom .... 373
little-endian .... 358
lockstep algorithm .... 130
logic
    diagrams .... 353
        Allan Marquand .... 354, 359
        Aristotle .... 354
        Carroll, Lewis .... 355, 358
        Charles Dodgson .... 355, 358
        Dodgson, Charles .... 355, 358
        Euler, Leonhard .... 354
        John Venn .... 354
        Karnaugh
            Maps .... 355
            Maurice Karnaugh .... 355
        Leonhard Euler .... 354
        Lewis Carroll .... 355, 358
        Marquand, Allan .... 354, 359
        Maurice Karnaugh .... 355
        Porphyry, Tree of .... 354
        The Game of Logic .... 355
        Tree of Porphyry .... 354
        Venn
            Diagrams .... 354
            John Venn .... 354
    machines .... 353
        Allan Marquand .... 354, 359
        Benjamin Burack .... 359
        Burack, Benjamin .... 359
        Charles Stanhope .... 359
        Gulliver's Travels .... 357
        Harley, Robert .... 359
        Jevons
            Logic Machine .... 359
            William Jevons .... 359
        Jonathan Swift .... 357
        Lull, Ramon .... 355
        Marquand
            Allan Marquand .... 354, 359
            Logic Machine .... 359
        Ramon Lull .... 355
        Robert Harley .... 359
        Stanhope
            Charles Stanhope .... 359
            Demonstrator .... 359
        Swift, Jonathan .... 357
        William Jevons .... 359
    piano .... 359
    simulation
        3-band delays .... 66
        compressing pulses .... 64
        cycle-based .... 68
        distributed delays .... 63, 126
        event-driven .... 61
        hardware emulators .... 70
        home-brewed .... 69
        inertial delays .... 65, 126
        logic value systems .... 97
        mixed-level .... 66
        narrow pulses .... 63, 126
        Pin-to-Pin delays .... 64, 126
        Pn-Pn delays .... 64, 126
        stretching pulses .... 64
        traditional logic .... 58
        transport delays .... 65, 126
        unbalanced delays .... 64
    synthesis .... 52
        BOOL .... v, 28, 207*, 238
    value systems .... 97
        cross-product sets .... 100
        interval value sets .... 102
long division .... 168
look-up tables .... 270
low
    -energy electron injection .... 253
    -flying grockles .... 382
    -to-high transition .... 61
LPMs .... 275
LSI .... 121*
Lucie .... 402, 415
Lull, Ramon .... 355
lumped-load interconnect .... 335
LUT .... 270
M
Mad
    as a Hatter .... 358
    Max movies .... 408
Madison Books and Computers .... 12
magazines
    Electronic Design .... v
    Electronic Design & Technology Network .... v, 56~
    Electronic Design News .... v
magnitude .... 125
mainframe computers .... 6
mallet, reprogramming with .... 93
Manchester University .... 214
many
    -small state machines .... 191, 194
    -to-one LFSRs .... 220
maps and dragons .... 108
March Hare .... 358
mark/space ratio .... 76, 83
Marquand
    Allan Marquand .... 354, 359
    Logic Machine .... 359
mask
    -programmable gate arrays .... 268
    (status flag) .... 388
masters of the universe .... 136
matched delays .... 210
matrix of differential equations (SPICE) .... 122
Maurice Karnaugh .... 355
max (maximum) delays .... 61, 74
Maxfield, Stephanie .... v, 402, 415
maximal
    -displacement LFSRs .... 225
    -length LFSRs .... 221
Mealy, G.H. .... 185
    state machines .... 185, 186
mechanical calculator .... 356
medieval war machines .... 200
    trebuchet .... 200, 217
medium
    capacitive (logic value) .... 105
    -scale integration .... 121
Megill, Norman .... 320
memory
    controller .... 193
    testing .... 317
        RAMs .... 317, 321
        ROMs .... 317, 327
    walker display .... 21
Memphis .... 403
mercury (in Top Hats) .... 358*
metal-oxide semiconductor transistors .... 247
Micromosaic .... 51, 247, 276
microprocessor
    101 courses .... 136
    asynchronous microprocessors .... 214
    -based hardware simulators .... 70
middle-out .... 49
migrating designs to PLDs .... 47
Mill, Henry .... 16*
Mills, George .... v, 34
min (minimum) delays .... 61, 74
minimization .... 46
min:max delays .... 77
min:typ delays .... 77
min:typ:max delays .... 61, 74
minuend .... 137, 138
Mississippi .... 403
Missouri .... 404
mixed
    -level
        design .... 54
        simulation .... 66
    -signal simulation .... 120, 127
        a/D .... 127
        A/D .... 127, 131
        A/d .... 127, 131
        cosimulation strategies .... 128
        interface elements .... 128, 129
        synchronization techniques .... 130
        verification strategies .... 126
        X values .... 113
MMLogic .... v, 34
Modified Counting Sequence .... 320
molecules .... 374
    designer molecules .... 379
molybdenum .... 365
Monolithic Memories Inc .... 258*
Monty Python .... 201
Moon .... 372
Moore, E.F. .... 185
    state machines .... 185, 186
MOS transistors .... 247
moss growing on trees .... 7, 405
Motion JPEG .... 293
Motorola
    FPADs and FPMSDs .... 274
    S-Record format .... 265*
Mount Rushmore .... 408
MPAs .... 274
MPAAs .... 274
MPEG .... 293
MPGAs .... 268
MSI .... 121*
mud-wrestlers .... 411
Muller, David .... 210
multi
    -colored charts .... 89
    -input transitions
        drive capability .... 345
        Pn-Pn delays .... 344
multimedia, Beboputer .... 24
multiplexers .... 109, 111, 115
    FPGA architectures .... 271
multiplication, binary .... 155
    example subroutine .... 161
    partial products .... 156
    shift-and-add technique .... 156
    signed binary multiplication .... 160
Murphy, Edward .... 112*
Murphy's Law .... 112
Musgrave, Gerry .... 125
musical socks .... 12
mutation (genetic algorithms) .... 303
mylar .... 41

N
N (status flag) .... 147
Nagybaczon, Ernest Nagy .... 366
nameless test sequence .... 319
Nanobots .... 379
nanophase materials .... 366
nanotechnology .... 11, 371, 378
narrow pulses .... 63, 126
Nebraska .... 406
nefarious government schemes .... 143
negative ions .... 373
nested interrupts .... 399
netlists
    gate level netlists .... 58
    transistor-level netlists .... 44
neural networks .... 305
neutrons .... 372
Newton
    Isaac Newton .... 301
    personal organizer .... 214
next state .... 184
nine's complements .... 136, 137
nitrogen .... 375
NMI .... 390
noble gas .... 373
noise
    high-frequency noise .... 125
    pink noise .... 5
    white noise .... 5*
non
    -critical races .... 206
    -deterministic polynomial .... 190
    -hierarchical .... 191
    -maskable interrupts .... 390
    -restoring algorithm .... 170
NOP instruction .... 392
normal FET .... 99, 102
Norman Megill .... 320
Northwest Inn, Oklahoma .... 412
no such thing (as digital) .... 120
NP problem .... 190
nucleus .... 372
numbers
    signed binary numbers .... 140
    unsigned binary numbers .... 138
Nutmeg .... 43*, 122

O
O (status flag) .... 147
O0 (open 0 fault) .... 88, 318
O1 (open 1 fault) .... 88, 318
Oklahoma .... 412
old bean .... 125
One
    -Big machines .... 191
    -hot encoding .... 186, 187, 190
    of those days .... 234
    -time programmable .... 249, 250, 254
    -to-many LFSRs .... 225
one's complements .... 138
    generating .... 138
open faults .... 88, 318
    O0 (open 0) .... 88
    O1 (open 1) .... 88
    OZ (open Z) .... 89
optical interconnects .... 11
optimization .... 46
OR arrays .... 255
order of .... 121*
organic molecules .... 368
oscillations
    asynchronous circuits .... 205
    uncontrolled oscillations (Xs) .... 110, 111, 114
OTP .... 249, 250, 254
Ouroboros .... 220
overflow .... 145
overshoot .... 125
oxygen cylinders .... 413
OZ (open Z fault) .... 89, 318

P
Paglicco, Chuck .... 72~
PALs .... 258
paper tape reader/writer .... 23
parallel fault simulation algorithms .... 94
parasitics
    reflected parasitics .... 346
    terminal parasitics .... 344
partial
    products .... 156
    Reed-Müller implementations .... 312
passive (logic strength) .... 100
path
    -dependent drive capability .... 341
    -specific Pn-Pn delays .... 338
PCB .... 41
    artwork .... 42
    film artwork .... 42
    footprint .... 41
    taping up .... 42
    woolly sweaters .... 42*
PDP 11-23 .... 8
peas (medium sized) .... 373
peer-to-peer (state machines) .... 194
peptide bonds .... 376
period, clock .... 76, 83
phase .... 125
    -lock loop .... 127
phosphorus .... 375
Piano (Logic Piano) .... 359
Pierce, Charles .... 359
Pilkington Microelectronics .... 292
pink noise .... 5
pin swapping .... 45
Pin-to-Pin delays .... 64, 126, 334
    multi-input transitions .... 344
    path-specific .... 338
    slope-dependent .... 340
    state-dependent .... 341
place-and-route .... 45, 301
PLAs .... 257
plasma .... 364
plastic transistors .... 368
Platte river .... 406
playground dust .... 4
PLDs .... 46, 253
    and state machines .... 187, 189, 194
    migrating designs to .... 47
PLL .... 127, 216
PMEL .... 292
Pn-Pn delays .... 64, 126, 334
    multi-input transitions .... 344
    path-specific .... 338
    slope-dependent .... 340
    state-dependent .... 341
Point
    contact transistors .... 247
    -to-Point delays .... 334
polling (interrupts) .... 382
polyester fashion statement .... 5
polynomials
    cyclotomic .... 232
    non-deterministic .... 190
polysilicon floating gate .... 251
Pooh Sticks .... 415
Porphyry, Tree of .... 354
positive ions .... 373
potential detections .... 93, 94
potentiometers .... 298
Potts, Dave .... 6
power dissipation .... 209
    dynamic .... 209
    static .... 193
Preston Jett .... 193
previous value (LFSRs) .... 229
primary structure (protein) .... 377
prime (logic) values .... 103
primitive gate faults .... 92
primitives, simulation primitives .... 60
printed circuit
    board .... 41
        artwork .... 42
        film artwork .... 42
        footprint .... 41
        taping up .... 42
        woolly sweaters .... 42*
    design conference .... 11
printing telegraphs .... 16
priority encoding .... 396
probe (guided probe analysis) .... 231
Production Services Corporation .... 320
product-of-sums .... 308
Prof. Gonzo Dribbler .... 136
programmable
    array logic .... 258
    logic arrays .... 257
    logic devices .... 46, 253
        and state machines .... 187, 189, 194
        migrating designs to .... 47
    read-only memories .... 258
programming
    PLDs .... 264
    technologies .... 248
        antifuses .... 250
        EEPROM transistors .... 252, 263
        EPROM transistors .... 250, 263
        fusible links .... 248
PROMs .... 258
protein .... 376
    -based switches .... 371, 375
    designer proteins .... 379
protons .... 372
pseudo
    -random numbers .... 232
    -synchronous .... 210
ptarmigan .... 73
Pt-Pt delays .... 334
pull drive (Verilog logic value) .... 105
pulses
    compressing pulses .... 64
    narrow pulses .... 63, 126
    stretching pulses .... 64
punched cards .... 6
    SPICE deck .... 43
Pure
    LC interconnect .... 337
    Reed-Müller .... 308
Python, Monty .... 201

Q
quantum levels .... 373
quarks .... 372*
quartz
    substrate .... 365
    window .... 251
Queen Anne .... 16*
Quickturn .... 68, 70, 72~
QWERTY keyboard .... 25, 382
R
race conditions ..... 205
  critical races ..... 206
  non-critical races ..... 206
radioactive effluent ..... 189
radix (of number system) ..... 136
  complements ..... 136, 138
RAM
  dynamic RAM ..... 251
  initializing RAM ..... 113
  testing RAM ..... 317, 321
    access tests ..... 321
    internal tests ..... 325
Ramon Lull ..... 355
rampaging grockles ..... 382
random
  0s and 1s ..... 113, 114
  numbers (pseudo) ..... 232
Rapid City, South Dakota ..... 409
Rattlesnakes ..... 403
RealiZm graphics cards ..... 11
reckless thrill-seekers ..... 112
reconfigurable logic ..... 287, 289
re-convergent fanout ..... 80, 117
red herring ..... 121
Reed Müller logic ..... 307
reflected parasitics ..... 346
register transfer language ..... 49, 54, 126
registered-output state machines ..... 188, 236
  PLD implementations ..... 189
registers, initializing ..... 113
relays ..... 185
remainder (sign of) ..... 178
reprogramming with a mallet ..... 93
resistive
  FET ..... 99, 102
  (logic strength) ..... 100
resolution function ..... 105
restoring algorithm ..... 170
Revival meetings ..... 403
revolution as a national sport ..... 288
Richard Feynman ..... 6*, 378
ring counter ..... 241
ringing ..... 125
ripple adder ..... 145
RISC hardware accelerator ..... 69
RLC interconnect ..... 337
Robert Harley ..... 359
robots, rudimentary ..... 5
roller skates ..... 406
Romans (and Roman roads) ..... 201
ROMs, testing ..... 317, 327
  serial access problems ..... 328
rotator/shifter, barrel ..... 7
RTI instruction ..... 390
RTL ..... 49, 54, 126
rubber boots ..... 377
rudimentary robots ..... 5

S
S0 (stuck-at 0 fault) ..... 88
S1 (stuck-at 1 fault) ..... 88
salesman, traveling ..... 190
sapphire ..... 365
scaling ..... 362
schematic
  capture ..... 44
  flat schematics ..... 45
  synthesis ..... 53
Seafood Gumbo ..... 12
sea lions ..... 407
search techniques ..... 300
secondary structure (protein) ..... 377
seed value (LFSRs) ..... 226
sequential ..... 200
serial
  access problems ..... 328
  fault simulation algorithms ..... 94
serine ..... 376
SETIM instruction ..... 388
sewing ..... 407
Shannon, Claude ..... 354~, 360
S. H. Unger ..... 210
Sheffield
  Hallam University ..... 5
  Polytechnic ..... 5
Sheridan Lake, Colorado ..... 411
shift-and-add technique ..... 156
shifter/rotator, barrel ..... 7
short faults ..... 92
siege of Mexico City ..... 217
sign-magnitude format ..... 140
signature analysis ..... 231
signatures ..... 231
signed binary
  multiplication ..... 160
  numbers ..... 140
Silicon Graphics ..... 215
simple PLDs ..... 46, 253
simulated annealing ..... 301
simulation
  analog simulation ..... 43, 121
    a/D ..... 127
    A/D ..... 127, 131
    A/d ..... 127, 131
    adaptive time-step ..... 124
    analytical solution ..... 122
    convergence ..... 124
    cosimulation with digital ..... 128
    differential equations ..... 122
    discontinuities ..... 124
    matrix ..... 122
    mixed-signal ..... 126
    SPICE ..... 43, 121
    SPICE 1 ..... 121
    SPICE 2 ..... 121
    SPICE 3 ..... 122
    tolerance criteria ..... 123, 124
  digital simulation ..... 58, 125
    3-band delays ..... 66
    compressing pulses ..... 64
    cosimulation with analog ..... 128
    cycle-based ..... 68
    distributed delays ..... 63, 126
    event
      -driven ..... 61
      -wheel ..... 125
    hardware emulators ..... 70
    HILO ..... 125
    home-brewed ..... 69
    inertial delays ..... 65, 126
    logic value systems ..... 97
    mixed-level ..... 66
    narrow pulses ..... 63, 126
    Pin-to-Pin delays ..... 64, 126
    Pn-Pn delays ..... 64, 126
    stretching pulses ..... 64
    traditional logic ..... 58
    transport delays ..... 65, 126
    unbalanced delays ..... 64
  mixed-signal simulation ..... 120, 127
    a/D ..... 127
    A/D ..... 127, 131
    A/d ..... 127, 131
    cosimulation strategies ..... 128
    interface elements ..... 128, 129
    synchronization techniques ..... 130
    verification strategies ..... 126
    X values ..... 113
  primitives ..... 60
sine waves ..... 121
single-crystal diamond ..... 367
skew ..... 80, 209
slope
  -dependent
    drive capability ..... 342
    Pn-Pn delays ..... 340
  dependency ..... 335
  of signals ..... 332
small
  capacitive (logic value) ..... 105
  -scale integration ..... 121
Smith
  Doug Smith ..... 56~, 186*
  Ed Smith ..... 69
sn7400 ..... 74
Snark ..... 358
sniveling little toads ..... 5
soccer ..... 4
socks
  holy socks ..... 136
  musical socks ..... 12
  stretch-resistant socks ..... 288
softball teams ..... 403
Softronics Inc. ..... v, 34
software
  and hardware integration ..... 69
  -interrupts ..... 391
solution space ..... 298
sound card, Beboputer ..... 25
South Dakota ..... 407
SPARC ..... 215
spectral components ..... 125
SpeedSim ..... 68
SPICE ..... 43, 121
  deck ..... 43
  Nutmeg ..... 43*
  SPICE 1 ..... 121
  SPICE 2 ..... 121
  SPICE 3 ..... 122
spikes ..... 206
SPLDs ..... 253
spline-based techniques ..... 125
S-Record format ..... 265*
SSI ..... 121*
Stanhope
  Charles Stanhope ..... 359
  Demonstrator ..... 359
state
  dependency ..... 335
  -dependent
    drive capability ..... 342
    input thresholds ..... 343
    Pn-Pn ..... 341
    terminal parasitics ..... 344
  diagram editor ..... 50, 55
  diagrams ..... 59, 278
  difference ..... 205
  logic state ..... 100, 102
  machines ..... 49, 183, 261
    ASICs ..... 187
    asynchronous ..... 79, 204
    binary encoded ..... 186, 187, 190
    bit per state ..... 188
    current state ..... 184
    examples, real-world ..... 195
    FPGAs ..... 187, 189
    hierarchical ..... 191
    initialization ..... 189
    Many-Small machines ..... 191, 194
    Mealy machines ..... 185, 186
    Moore machines ..... 185, 186
    next state ..... 184
    non-hierarchical ..... 191
    One
      -Big machines ..... 191
      -hot encoding ..... 186, 187, 190
    peer-to-peer implementations ..... 194
    PLDs ..... 187, 189, 194
    registered-output machines ..... 188, 236
    state variables ..... 184
    state-per-bit ..... 188
    undefined states ..... 187, 189
  -per-bit ..... 188
  variables ..... 184, 205
static
  power dissipation ..... 209
  timing analysis ..... 68, 74, 87
  X values ..... 114, 117
status flags ..... 147
std_logic (VHDL) ..... 105
step
  function ..... 122, 123
  Reckoner ..... 356
Stephanie Maxfield ..... v, 402, 415
sticky
  cutouts ..... 41
  tape ..... 251
Stone Henge ..... 408
strength (logic) ..... 100, 102
stretching pulses ..... 64
stretch-resistant socks ..... 288
strong
  drive (Verilog logic value) ..... 105
  generic logic strength ..... 100
structural representations ..... 49
stuck-at faults ..... 88, 92, 318
  S0 (stuck-at 0) ..... 88
  S1 (stuck-at 1) ..... 88
Sturgis Bike Rally ..... 408
SUB ..... 136, 148
SUBB ..... 136*
SUBC ..... 136, 148
submicron delays ..... 48, 61, 82
subroutine examples
  division (binary) ..... 173
  multiplication (binary) ..... 161
subtraction ..... 148
subtrahend ..... 137, 138
suck-it-and-see ..... 58
Sue Brown ..... v, 16
sulfur ..... 375
sum-of-products ..... 52, 308
SUN ..... 215
Sundance
  Kid ..... 410
  town in Wyoming ..... 410
supply
  drive (Verilog logic value) ..... 105
  generic logic strength ..... 100
surround read/write disturb tests ..... 326
Sutherland, Ivan ..... 215
SWI ..... 391
Swift, Jonathan ..... 357
switching thresholds ..... 333
  state-dependent ..... 343
switch panel, Beboputer ..... 18
synchronization techniques ..... 130
  Calaveras algorithm ..... 130
  lockstep algorithm ..... 130
synchronous
  binary counter ..... 185
  designs ..... 68, 76, 207
syntax error ..... 6
synthesis
  logic synthesis ..... 52
  schematic synthesis ..... 53
system administrators ..... 41

T
tabular stimulus and response ..... 43, 58, 125
Taiwan ..... 11
tap-dancing ..... 184
taping up ..... 42
taps ..... 220
  for LFSRs with 2 to 32 bits ..... 223
taxonomy (FPDs) ..... 246
Ted Hustead ..... 410
telephone poles ..... 406
teleprinters ..... 16
temperature ..... 61
ten's complements ..... 136, 137
terminal parasitics ..... 344
terminology (FPDs) ..... 246
tertiary
  logic (versus binary logic) ..... 98
  structure (protein) ..... 377
  trits ..... 98
test, functional ..... 9, 231
testicles, bull's ..... 413
testing
  RAMs ..... 317, 321
    access tests ..... 321
    internal tests ..... 325
  ROMs ..... 317, 327
    serial access problems ..... 328
The
  Game of Logic ..... 355
  Hunting of the Snark ..... 358
  Life of Brian ..... 201
three-phase clock ..... 233
threonine ..... 376
threshold-dependent Pn-Pn delays ..... 339
thrill-seekers, reckless ..... 112
Through the Looking Glass ..... 358
time
  -line (FPDs) ..... 247
  -step, adaptive algorithm ..... 124
timing
  analysis ..... 73
    cheap-and-cheerful ..... 62
    dynamic timing analysis ..... 77, 87
      common-mode ambiguity ..... 80, 117
      correlation ..... 81
      re-convergent fanout ..... 80, 117
      skew ..... 80
    static timing analysis ..... 68, 74, 87
    timing diagram analyzers ..... 82
    TimingDesigner ..... 82
    worst-case analysis ..... 77
  chains ..... 185
TimingDesigner ..... 82
titanium ..... 364
toad, sniveling ..... 5
tolerance criteria ..... 123, 124
Tomahawk cruise missile ..... 292
top
  -down ..... 49
  hats ..... 358*
Topeka, Kansas ..... 404
tornado ..... 404
Toto ..... 404
traffic light controller ..... 94
transistor-transistor logic ..... 58, 126
transistors
  bipolar junction transistors ..... 247
  diamond transistors ..... 366
  EEPROM transistors ..... 252, 263
  EPROM transistors ..... 250, 263
  field-effect transistors ..... 247
  FLASH transistors ..... 253
  heterojunction transistors ..... 363
  homojunction transistors ..... 363
  metal-oxide semiconductors ..... 247
  of the future ..... 361
  plastic transistors ..... 368
  point contact transistors ..... 247
transport delays ..... 65, 126
traveling salesman problem ..... 190
trebuchet ..... 200, 217
Tree of Porphyry ..... 354
tri-state
  devices ..... 99, 110
  PLD outputs ..... 260
trit ..... 98
tritium ..... 373*
TTL ..... 58, 126
tuned-races ..... 210
tungsten ..... 365
  -titanium fuse ..... 249
Tweedledee and Tweedledum ..... 358
two's complements ..... 138, 139, 141
  generating ..... 139
typewriter, first patent ..... 16*
typ (typical) delays ..... 61, 74
typ:max delays ..... 77

U
U (undefined/uninitialized) values ..... 101, 105, 110
UDL/I ..... 53
ULSI ..... 209
UltraSPARC ..... 215
ultraviolet light ..... 251, 362
unbalanced delays ..... 64
uncontrolled oscillations (Xs) ..... 110, 111, 114
undefined
  states in state machines ..... 187, 189
  logic states ..... 110
  U values ..... 101, 105, 110
underwear riding up ..... 5*
Unger, S. H. ..... 210
unicycle ..... 13, 200
unified (cosimulation technique) ..... 129
uninitialized
  U values ..... 101, 105, 110
  X values ..... 110
unit
  state difference ..... 205
  under test ..... 93
United States, center of ..... 405
universe, masters of ..... 136
University
  Brunel University ..... 43, 121
  Johns Hopkins University ..... 355
  of California, Berkeley ..... 43, 121
  of Manchester ..... 214
  of Michigan ..... 302*
  Sheffield Hallam University ..... 5
unknown X values ..... 98, 102, 103, 109
  ~X values ..... 115
  #X values ..... 115
  delays to and from ..... 99
  dynamic Xs ..... 114, 117
  fault detecting ..... 93
  filters ("X filters") ..... 114
  future Xs ..... 114
    ~X values ..... 115
    #X values ..... 115
    ID numbers ..... 116
    static versus dynamic ..... 114, 117
  ID numbers ..... 116
  initialization considerations ..... 112
  inverse (NOT) X values ..... 115
  mixed-signal considerations ..... 113
  representing
    clashes ..... 98, 110
    controlled oscillations ..... 110
    uncontrolled oscillations ..... 110, 111, 114
    uninitialized states ..... 110
  static Xs ..... 114, 117
  versus
    ~X values ..... 115
    #X values ..... 115
    ? values ..... 109
    U values ..... 110
  well-behaved Xs ..... 111
  with ID numbers ..... 116
unsigned binary numbers ..... 138
USA, center of ..... 405
UUT ..... 93
UV ..... 251, 362
V
vacuum tubes ..... 185
valence bonds ..... 374
value systems ..... 97
  cross-product sets ..... 100
  interval value sets ..... 102
Venn
  Diagrams ..... 354
  John Venn ..... 354
VeriBest Inc. ..... 10, 56~, 69, 72~
Verilog ..... 48, 53, 60
  data types & logic sets ..... 105
  designing with ..... 56~
vertical fuses ..... 249
VHDL ..... 48, 53, 60
  data types & logic sets ..... 105
  designing with ..... 56~
  std_logic ..... 105
Victor Berman ..... 106
views ..... 55
virtual logic ..... 293
VIUF ..... 106
VLSI ..... 209
voltage ..... 61
von Leibniz, Gottfried ..... 356
W
walking ones test ..... 319
Wall
  Drugs ..... 410
  Great Wall of China ..... 11*
  town in South Dakota ..... 409
walrus ..... 14
war machines, medieval ..... 200
  trebuchet ..... 200, 217
water
  frozen ..... 375
  molecule ..... 374
wave-pipelining ..... 210
waves
  alpha waves ..... 5
  sine waves ..... 121
weak
  drive (Verilog logic value) ..... 105
  generic logic strength ..... 100
  WH (weak high) ..... 101, 103
  WL (weak low) ..... 101, 103
  WX (weak X) ..... 101
web pages, Beboputer ..... 25
well-behaved X values ..... 111
Welles Lane (Lucie's Husband) ..... 415
WH (weak high) ..... 101, 103
what-if analysis ..... 52, 83
whiners, cringing ..... 112
white noise ..... 5*
Wild
  Bill Hickok ..... 410
  parties ..... 14
William Jevons ..... 359
Willow ..... 415
Wilson Inn, Arkansas ..... 403
wired
  -AND ..... 253, 395
  -OR ..... 253
WL (weak low) ..... 101, 103
woodchucks ..... 6, 192
Woodward, Oklahoma ..... 412
woolly sweaters ..... 42*
worst-case
  analysis ..... 77
  operating conditions ..... 209
wriggle ..... 119
WX (weak X) ..... 101
Wyoming ..... 410

X
X (unknown) values ..... 98, 102, 103, 109
  ~X values ..... 115
  #X values ..... 115
  delays to and from ..... 99
  dynamic Xs ..... 114, 117
  fault detecting ..... 93
  filters ("X filters") ..... 114
  future Xs ..... 114
    ~X values ..... 115
    #X values ..... 115
    ID numbers ..... 116
    static versus dynamic ..... 114, 117
  ID numbers ..... 116
  initialization considerations ..... 112
  inverse (NOT) X values ..... 115
  mixed-signal considerations ..... 113
  representing
    clashes ..... 98, 110
    controlled oscillations ..... 110
    uncontrolled oscillations ..... 110, 111, 114
    uninitialized states ..... 110
  static Xs ..... 114, 117
  versus
    ~X values ..... 115
    #X values ..... 115
    ? values ..... 109
    U values ..... 110
  well-behaved Xs ..... 111
  with ID numbers ..... 116
XORs
  in Reed Müller logic ..... 308
  versus XNORs (LFSRs) ..... 221
X-ray lithography ..... 362
Z
Z (high-impedance) values ..... 99, 101, 103
  fault detecting ..... 93
Z (status flag) ..... 147, 148
bold = key definition   * = footnote   ~ = sidebar
How to get your very own copy of ......
Bebop to the Boolean Boogie (An Unconventional Guide to Electronics)
by Clive "Max" Maxfield
ISBN: 1-878707-22-1

To obtain copies of this book, please mail this form, along with your check or money order, to: HighText Publications, PO Box 1489, Solana Beach, CA 92075, USA. For credit card orders, call toll-free: 1-800-247-6553, or fax this form with your credit card information to 540-567-2539.

Name .........................................................................................................
Address .........................................................................................................
Phone .........................................................................................................
[ ] Please send me .............. copies of Bebop to the Boolean Boogie at $39.00 per copy ($35.00 plus $4.00 shipping and handling).
[ ] I have enclosed a check .............. or money order .............. made out to HighText Publications in the amount of $ .....................
[ ] Please charge this order to my credit card:
    Name on card .....................................................................................
    Type of card .....................................................................................
    Number .....................................................................................
    Expiration date .....................................................................................
    Signature .....................................................................................
[ ] Please send me a copy of your current catalog.
International orders, please call +1-419-281-1802 or fax +1-419-281-6833.
Visit our web site at www.hightext-publications.com
How to get your very own copy of ......
Bebop BYTES Back (An Unconventional Guide to Computers)
Includes the Beboputer™ Virtual Computer on CD-ROM (for Windows 95)
by Clive "Max" Maxfield & Alvin Brown
ISBN: 0-9651934-0-3
To obtain copies of this book, please mail this form, along with your check or money order, to: Doone Publications, 7950 Hwy 72W, #G106, Madison, AL 35758, USA. For credit card orders, call toll-free: 1-800-311-3753, or fax this form with your credit card information to 205-837-0580.
Name .........................................................................................................
Address .........................................................................................................
Phone .........................................................................................................
[ ] Please send me ................ copies of Bebop BYTES Back at $49.95 per copy, plus $7 S&H priority [ ] or $4 S&H book rate [ ]
[ ] I have enclosed a check ................ or money order ................ made out to Doone Publications in the amount of $ ................
[ ] Please charge this order to my credit card (Visa or MasterCard):
    Name on card .....................................................................................
    Type of card .....................................................................................
    Number .....................................................................................
    Expiration date .....................................................................................
    Signature .....................................................................................
International orders, please call or fax +1-205-837-0580.
Visit our web site at www.doone.com