FIGURE 25.1 Electron fog model in metal and semiconductor (hole fog below Ev).
Vojin Oklobdzija/Digital Systems and Applications 6195_C025 Final Proof page 3 4.10.2007 3:47pm Compositor Name: VBalamugundan
25-3
Microelectronics for Home Entertainment
FIGURE 25.2 Bipolar transistor action: n-type emitter, p-type base, n-type collector. Key relations from the figure: barrier φ = φi − VBE; Ln = √(Dn τn); Qn = AE WB n(0)/2; τt = WB²/(2Dn); IE = Io exp(VBE/kT) = β IB; IB = Qn/τn; Io = q AE Dn (ni²/Na)/WB.
semiconductor are depicted as the moisture on the top of a floating box in water. If the box is heavy, the water surface is very close to the top of the box and there is a lot of moisture. This corresponds to the n-type semiconductor band diagram. If the box is relatively light, only a small bottom portion of the box is submerged in the water, the top of the box can be quite dry, and there will be a lot of bubbles (holes) under the bottom of the box. This corresponds to the p-type semiconductor. Applying these p- and n-type semiconductor box models, a diode behavior model can be constructed and the diode rectifying characteristics can be explained.
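The box analogy corresponds to ordinary equilibrium carrier statistics. As a rough numerical sketch (the doping levels and the intrinsic density value are illustrative, not from the text), the mass-action law n·p = ni² fixes the minority "fog" once the majority density is set:

```python
NI = 1.5e10  # approximate intrinsic carrier density of Si at 300 K, cm^-3

def carrier_densities(doping_cm3, dopant_type):
    """Equilibrium (n, p) from the mass-action law n*p = ni**2,
    assuming full dopant ionization."""
    if dopant_type == "n":      # heavy box: dense electron fog on top
        n = doping_cm3
        p = NI ** 2 / n         # very few hole "bubbles" remain
    else:                       # light box: the hole fog dominates
        p = doping_cm3
        n = NI ** 2 / p
    return n, p

print(carrier_densities(1e17, "n"))  # approx. (1e17, 2.25e3) cm^-3
print(carrier_densities(1e16, "p"))  # approx. (2.25e4, 1e16) cm^-3
```

Ten orders of magnitude separate the two fogs, which is why one carrier type dominates conduction in each region.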
25.2.2
Bipolar Transistor Device Model
Figure 25.2 shows energetic boys (electron fog in the emitter region) trying to climb a hill (base region) to catch the girls on the hill (hole fog, the majority carrier in the base region). Some of the boys can luckily catch girls on the hill, recombine, become happy, and disappear as light or heat energy. But the hill is very narrow, and most of the boys will not have enough time to catch girls before they fall down the cliff (the base-collector depletion region). The poor boys are then collected deep down the cliff, in the collector region. In the time interval Δt, IEΔt boys jump onto the hill to catch girls. The number of girls caught by the energetic boys in Δt is IBΔt, which is proportional to the average number of boys on the hill, Qn. The girls are supplied as the base current IB. Other salient physical parameters normally used in bipolar transistor device modeling are also given in the figure.
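The relations collected in Figure 25.2 chain together into a back-of-the-envelope calculator. The parameter values below are illustrative guesses (not from the chapter), with lengths in cm:

```python
import math

KT_Q = 0.0259    # thermal voltage kT/q at 300 K, volts
Q_E = 1.602e-19  # electron charge, C

def npn_first_order(AE, WB, Dn, Na, ni, tau_n, VBE):
    """First-order npn relations from Figure 25.2."""
    Io = Q_E * AE * Dn * (ni ** 2 / Na) / WB  # saturation current
    IE = Io * math.exp(VBE / KT_Q)            # boys jumping onto the hill
    tau_t = WB ** 2 / (2 * Dn)                # base transit time
    beta = tau_n / tau_t                      # most boys cross without recombining
    IB = IE / beta                            # girls supplied by the base current
    return IE, IB, beta

IE, IB, beta = npn_first_order(AE=1e-6, WB=1e-4, Dn=20.0, Na=1e17,
                               ni=1.5e10, tau_n=1e-7, VBE=0.65)
print(f"beta = {beta:.0f}, IE = {IE:.2e} A")
```

Because β = τn/τt and τt grows as WB², a narrow base (here 1 μm) is what gives the transistor its high current gain.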
25.2.3
MOSFET Model
Figure 25.3 shows a MOSFET structure. If one follows how the electron fog moves from the left source n+ region to the right n+ region through the Si–SiO2 surface under the MOS gate, one can see that it, too, can be considered electron transport along an npn structure. In this case, however, the potential in the p-region is controlled by the gate voltage across the thin isolating oxide. The figure shows the electron fog moving from the source to the region under the gate at the onset of strong inversion at the Si–SiO2 surface. At this point the electron fog density at the channel is equal to the density of the majority ''hole fog'' in the p-type Si substrate, and the gate voltage at this point is defined to be the threshold voltage Vth of the MOSFET. Figure 25.4 shows water flowing from the right source region to the left drain region through the water gate. The depth of the channel Vch is given as (Vg − Vth), where Vg is the applied gate voltage
FIGURE 25.3 MOSFET at onset, VG = Vth: band diagram along the channel (source n+, gate over p-Si, drain n+) with Enfog(z), EC(z), φS(inv), and biases VS, VD, VBB. Key relations from the figure: Vth(VBB, VS) = VFB + {B + VS − VBB} + γ√(B + VS − VBB); VFB = VBB − (kT/q) ln(Nc NA/ni²) − (χSi − φm)/q + QSS/COX; γ = √(2K); K = q εSi NA/C²OX; B = 2(kT/q) ln(NA/ni).
FIGURE 25.4 MOSFET I-V characteristics (water model). Key relations from the figure: I = μQE; E = (VDrain − VSource)/L; Q = WCo ΔV = WCo (Vch − VSf); Vch = VGate − Vth.
which induces the channel depth Vch = (Vg − Vth). The amount of the water flow I is proportional to the mobility μ, the amount of water Q under the gate, and the electric field E; that is, I = μQE in this rough approximation. In the first approximation, take E = (Vd − Vs)/L, where Vd, Vs, and L are the drain voltage, the source voltage, and the gate channel length. The total charge can be approximated as Q = WCoΔV, where W and Co are the channel width and the oxide capacitance of the actual corresponding MOSFET transistor, respectively. Now, ΔV corresponds to the voltage difference between the average water surface (Vd + Vs)/2 and the channel potential Vch = (Vg − Vth).
That is, ΔV = (Vd + Vs)/2 − Vch. Hence, since Q = WCoΔV, the equivalent amount Q of the water (or charge) under the gate is given as Q = WCo[(Vd + Vs)/2 − Vch], where Vch = (Vg − Vth) and E = (Vd − Vs)/L. Substituting these relationships into the original equation I = μQE leads, without going through the calculations normally done in the classical gradual channel approximation, to the classical MOS I-V equation:

I = (W/2L) μ Co [Vd + Vs − 2Vch](Vd − Vs) = (W/2L) μ Co [Vd + Vs − 2(Vg − Vth)](Vd − Vs)
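The closing expression is easy to evaluate directly. A minimal sketch follows; the device parameters are illustrative, and under this sign convention the electron current flowing from source to drain comes out negative:

```python
def mos_triode_current(W, L, mu, Cox, Vg, Vth, Vd, Vs=0.0):
    """Classical MOS I-V from the water-flow model:
    I = (W/2L) * mu * Cox * [Vd + Vs - 2*(Vg - Vth)] * (Vd - Vs).
    Units: W, L in cm; mu in cm^2/(V s); Cox in F/cm^2; voltages in V."""
    return (W / (2.0 * L)) * mu * Cox * (Vd + Vs - 2.0 * (Vg - Vth)) * (Vd - Vs)

# Illustrative values loosely in the 0.25 um-generation range:
I = mos_triode_current(W=10e-4, L=0.25e-4, mu=400.0, Cox=7e-7,
                       Vg=2.5, Vth=0.5, Vd=0.1)
print(f"I = {I * 1e3:.2f} mA")  # about -2.18 mA with this convention
```

Note that for small Vd the current grows linearly with Vd and with the gate overdrive (Vg − Vth), exactly the triode behavior the gradual-channel approximation predicts.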
25.2.4
Buried Channel CCD Structure
Figure 25.5 shows the physical structure and the potential profile of a buried channel CCD. The signal charge is the electron fog in the lightly doped n-region at the surface. As can be seen, these signal charges are isolated from direct contact with the Si–SiO2 interface and therefore do not suffer from charge trapping. This structure gives a good CCD charge-transfer efficiency of more than 99.9999% along the buried channel CCD shift register. Under very strong light, excess charge can be drained into the substrate by lowering the well voltage Vwell or by making the substrate voltage very deep, inducing the punch-through mode in the n-p-n(sub) structure. High-density, high-performance solid-state imagers became available by applying this structure as the scanning system. The surface n-layer is completely depleted when there is no signal charge. It is dynamically operated and can be considered an extended application of dynamic MOS device operation; the most well-known dynamic operation of a MOS device is the DRAM data storage operation.
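The quoted transfer efficiency matters because it compounds over every transfer in the register. A quick numerical check (the transfer counts are illustrative, chosen as plausible path lengths in a large imager):

```python
# Fraction of the signal charge surviving n transfers when each single
# transfer keeps a fraction eps of the charge.
def net_retention(eps, n):
    return eps ** n

eps = 0.999999                 # the chapter's ">99.9999%" per transfer
for n in (1_000, 10_000):
    print(f"{n} transfers -> {net_retention(eps, n):.4f} of the charge")
```

Even at 99.9999% per stage, ten thousand transfers still lose about 1% of the charge, which is why anything less would visibly smear the image.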
25.2.5
HAD Sensor, a pnp-Substructure
The floating diode structure for the image-sensing unit was well known in the early 1970s. The author proposed instead to use a pnp substructure for the imaging element. Figure 25.6 shows the proposed structure. It is a simple pnp bipolar transistor structure with a very lightly doped base region, operated in the strong cut-off mode with the base majority charge completely depleted. It is the first practical application of the bipolar transistor in dynamic operation mode, which turned out to be the best structure and way to convert photons to electrons for imaging, including the current
FIGURE 25.5 Buried channel CCD structure: gate VG on SiO2 over an n/p/n-sub layer stack, with well and substrate biases VWell and VSub; the signal charge (electron fog) is held in the buried n channel.

FIGURE 25.6 A typical pnp bipolar transistor structure of the early 1970s, and the proposed application as an image-sensing element in 1975 (p+/n/p layers under SiO2 on an n-substrate, with biases VE, VB, VC, VHAD, VWell, and VSub).
MOS imager applications. The sensor structure is now called the HAD sensor in Sony's current video cameras and digital still cameras.
25.3
LSI Chips for Home Entertainment
25.3.1
Digital Still Camera
The picture in Fig. 25.7 shows a 2/3 in. 190 K pixel IT CCD imager, ICX016/XC-37, which the author designed when he was still a young CCD design engineer in early 1981. This imager became the basis of the world's first consumer CCD video camera for mass production in 1983. The goal now is to become ''Imaging Device No. 1!'' CCDs and LCDs are now used in many applications, as seen in Fig. 25.8.
FIGURE 25.7 The world's first consumer CCD video camera for mass production, 1983 (CCD-G5, using the 2/3 in. 190 K pixel IT CCD imager), together with the pixel-count progression from QVGA (320 × 240) and VGA (640 × 480) through SVGA (800 × 600), XGA (1024 × 768), SXGA (1280 × 1024), and HD (1280 × 720) to digital still camera formats up to 2048 × 1536 (DSC-P1).
FIGURE 25.8 Applications of CCD and LCD: probe, scanner, DSC, projector in jets, HT poly-Si LCD, rear projector, camcorder, Glasstron, front projector.

25.3.2
AIBO, a Home Entertainment Robot
This subsection reviews the most popular product, the entertainment robot AIBO, shown in Fig. 25.9. A brand new AIBO is like a baby: it has no acquired knowledge, only a certain preprogrammed level of intelligence. As you play with AIBO, it will gradually learn to recognize your gestures and voice, and it will remember the wonderful time you spent together with it.
FIGURE 25.9 AIBO model ERS-110: stereo microphone; tactile sensor; acceleration sensor, gyrometer, etc.; memory stick; CCD color camera (180 K pixel); 64-bit RISC processor with 16 MB memory and Aperios OS; speaker; 18 DOF; Li-ion battery (7.2 V, 2900 mAh); weight 1.6 kg; size 156 × 266 × 274 mm (without tail).
Actually, the experience and knowledge AIBO accumulates during these memorable moments are stored in a chewing-gum-sized NVRAM called a memory stick, shown in Fig. 25.9. This memory stick can also be used in other products such as PCs, digital audio players, and DSCs. Unfortunately, it is not used in PS and PS2, for reasons of generation compatibility, as of now. But in one form or another, there is a definite need for NVRAMs in PS, DSC, digital audio, PC, and the future home entertainment robots. The twenty-first century will become an era of autonomous robots, which are partners of human beings. Autonomous robots will help and support people in the future. AIBO is designed to be the first product model of robot entertainment systems. The main application of this robot is a pet-style robot, which must be lifelike in appearance. Although AIBO is not a nursing robot, its development is the first step into the era of autonomous robots in the twenty-first century. The following are some works done in the Digital Creation Laboratory at Sony. Most of the work was actually done by the pioneering engineers, Mr. Fujita, Mr. Kageyama, Mr. Kitano, and Mr. Sabe. The epoch-making debut of AIBO, model ERS-110, in 1999 offered the following features. First of all, it has a CCD color camera with 180 K pixels; of course, it has no mechanical shutter and no eyelids. It has an audio sensor, a pair of microphones for stereo audio pick-up. It also has an acceleration sensor, a gyrometer, and a tactile sensor. So, if you pat it on the head gently, it will show a happy gesture; if you strike it on the head, it will interpret that as a scolding. The moving joints have 18 degrees of freedom in total. Before the introduction of this first AIBO model, ERS-110, the basic research period lasted about five years. Now there is a second-generation AIBO model, ERS-210, and also another type of robot, the Sony Dream Robot, SDR-3, as seen in Fig. 25.10.
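The pat-versus-strike behavior described above amounts to thresholding the head sensor reading. A toy sketch, where the pressure scale and thresholds are invented for illustration and are not taken from AIBO's actual software:

```python
def head_touch_response(pressure):
    """Toy mapping from a normalized tactile reading to a reaction."""
    if pressure < 0.1:
        return "ignore"            # noise, or no contact at all
    if pressure < 0.5:
        return "happy gesture"     # gentle pat
    return "scolded gesture"       # hard strike

print(head_touch_response(0.3), "/", head_touch_response(0.9))
```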
The second-generation AIBO model, ERS-210, has the following features:
- Joint DOF: neck 3, mouth 1, ears 2, legs 3 × 4, tail 2; total 20
- Sensors: color CMOS image sensor (1100 K pixel), microphone × 2, infrared sensor, acceleration sensor × 3

FIGURE 25.10 New AIBO models: the second-generation AIBO, ERS-210, and the Sony Dream Robot, SDR-3.
- Tactile sensor × 7
- CPU: 64-bit RISC processor (192 MHz)
- Memory: 32 MB DRAM
- OS, architecture: Aperios, OPEN-R 1.1
- IF: PCMCIA, memory stick

The model SDR-3 has the following features:
- Joint DOF: neck 2, body 2, arms 4 × 2, legs 6 × 2; total 24
- Sensors: color CCD camera (1800 K pixel), microphone × 2, infrared sensor, acceleration sensor × 2, gyrometer × 2, tactile sensor × 8
- CPU: 64-bit RISC processor × 2
- Memory: 32 MB DRAM × 2
- OS, architecture: Aperios, OPEN-R

It weighs 5.0 kg and its size is 500 × 220 × 140 mm. It has an OPEN-R architecture and is made of configurable physical components (CPCs). The CPU in the head recognizes the robot configuration automatically; the components are built for plug & play or hot plug-in use, and the relevant information for each segment is stored in that segment's CPC. Each CPC may have a different function, such as behavior planning, motion detection, color detection, walking, or serving as the camera module, and each CPC is also provided with the corresponding object-oriented programming and software components. With this OPEN-R architecture, the body can be decomposed or assembled in any way for plug & play or hot plug-in use. The diagram in Fig. 25.11 shows the details of the logical hardware block diagram, which contains the DMAC, FBK/CDT, DSP/IPE, and HUB blocks on an internal bus.
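The plug-and-play idea above can be sketched in a few lines: each CPC carries its own description, and the head CPU simply enumerates whatever is on the bus. All class and field names here are invented for illustration; this is not the real OPEN-R API:

```python
from dataclasses import dataclass

@dataclass
class CPC:
    name: str       # e.g., "leg-front-left"
    function: str   # e.g., "walking", "color detection", "camera module"
    dof: int        # joints this segment contributes

def discover(bus):
    """Hot plug-in: the configuration is whatever is on the bus right now."""
    config = {cpc.name: cpc for cpc in bus}
    return config, sum(cpc.dof for cpc in bus)

bus = [CPC("head", "camera module", 3), CPC("leg-fl", "walking", 3),
       CPC("leg-fr", "walking", 3), CPC("tail", "gesture", 2)]
config, total_dof = discover(bus)
print(f"{len(config)} components, {total_dof} DOF")  # 4 components, 11 DOF
```

Removing or adding an element of `bus` and re-running `discover` models hot plug-in: the robot's total degrees of freedom follow the attached segments automatically.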
FIGURE 25.11 Logical hardware block diagram: RISC CPU, DMAC, FBK/CDT (image data), DSP/IPE, SDRAM, and flash ROM on the internal bus; a CCD camera input; serial and parallel ports to a remote computer for development; a peripheral interface with memory stick, PC card, and battery manager; and an OPEN-R bus host controller fanning out through HUBs to AD/DA converters serving potentiometer/actuator pairs, the tactile sensor, the microphone, and the speaker.
FIGURE 25.12 Topology of ERS-110: a central CPU on the OPEN-R bus, connected through AGR-L and AGR-S links to the CCD camera, microphone (× 2), speaker, gravity sensor, gyrometer, switches, and the leg motors (× 3 each).
The following two figures, Figs. 25.12 and 25.13, show the topologies of model ERS-110 and model SDR-3x, respectively. At the same time, it is very important to have a powerful software platform that covers everything from the top semantic layer down to the device-driver object coding. Careful design is very important in making the middleware software components.
25.3.3
Memory Stick
AIBO, VAIO PCs, and other audio and video products now use memory sticks as digital data recording media. In July 1997, Sony made a technical announcement. The following year, in January 1998, the VAIO center was inaugurated. In July 1998, Sony made a product announcement, and the 4 Mbyte and 8 Mbyte memory sticks went on sale in September 1998. In February 1999, Sony announced Magic Gate, that is, memory sticks with a copyright-protection feature. Figure 25.14 shows the form comparison. The memory stick is unique in its chewing-gum-like shape, and it is much longer than other media; the difference in size and features from the other media is clear. Figure 25.15 shows the internal structure. It is foolproof: it features a simple 10-pin connection, and it is impossible to touch the terminals directly. The shape was designed intentionally to make exchanging media easy, without having to actually see them, and to guide the direction for easy and correct insertion. Considerable ingenuity went into the design. In order to decrease the number of connector pins and so ensure the reliability of the connectors, a serial interface was adopted instead of the parallel interface used in conventional memory cards. As a result, the connector pins were reduced to 10, and as the structure is such that these pins do not touch the terminals directly, extremely high reliability is ensured. The length, 50 mm, is the same as an AA-size battery, for further deployment in portable appliances. The width is 21.5 mm and the thickness is 2.8 mm.
FIGURE 25.13 Topology of SDR-3x: CPUs on the OPEN-R bus, connected through AGR-L and AGR-S links to the CCD cameras (× 2), microphone, speaker, gravity sensor, gyrometer, switches, and the limb motors (× 2 and × 3 groups).

FIGURE 25.14 Form comparison: memory stick, 50 × 21.5 × 2.8 mm; smart media, 45 × 37 × 0.8 mm; compact flash, 43 × 36 × 3.3 mm.
The memory stick consists of Flash EEPROM and a controller that manages multiple Flash EEPROM devices, is flexible with respect to their variations, and is capable of correcting the errors unique to the different Flash EEPROMs used. Because the memory stick converts parallel data to and from serial data with a controller designed in compliance with the serial interface protocol, any kind of existing or future Flash EEPROM can be used in the memory stick. The functional load on the controller chip is not excessive, and its cost can be kept to a minimum. It is light, and the shape makes it easy to carry around and handle. Also, the write-protection switch enables easy protection of valuable data.
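The parallel-to-serial conversion the controller performs can be sketched as a bit shifter: a 512-byte flash page goes out one bit per clock on the single data line. The framing here is invented for illustration; the real memory stick protocol is more involved:

```python
def serialize_page(page: bytes):
    """Shift a flash page out MSB-first, one bit per serial clock."""
    assert len(page) == 512, "one 512-byte flash page"
    for byte in page:
        for bit in range(7, -1, -1):
            yield (byte >> bit) & 1

page = bytes(range(256)) * 2       # dummy 512-byte payload
bits = list(serialize_page(page))
print(len(bits))                   # 4096 clocks per page
```

Trading 4096 clocks for a 10-pin connector is the design choice the text describes: fewer pins, and therefore a more reliable connector, at the cost of serial transfer time.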
FIGURE 25.15 Internal structure. The 10-pin interface (Vss, BS, Vcc, DIO, INS, SCLK, plus reserved pins) feeds a controller containing a register block with serial/parallel interface, a 512 B page buffer with 3 B ECC, an attribute ROM, a sequencer, an oscillator, and a reset block; the flash interface drives the Flash EEPROM arrays in 512 B + 16 B pages.
For the still-image format, DCF, standardized by JEIDA, is applied. DCF stands for Design rule for Camera File system, and JEIDA stands for the Japan Electronic Industry Development Association. For the voice format, ITU-T Recommendation G.726 ADPCM is adopted. The format is regulated for applications that convert voice data to text data by inserting a memory stick into a PC. The memory stick can handle multiple applications such as still image, moving image, voice, and music on the same media. In order to do this, the formats of the respective applications and the directory management must be stipulated to realize compatibility among appliances. Thus, simply by specifying the ''control information'' format, one can have a new form of enjoyment through connecting AV appliances and the PC. This format, which links data handled in AV appliances, enables relating multiple AV applications. For example, voice recorded on an IC recorder can be dubbed onto a still-image file recorded by a digital still camera. Presently, the music world is going from analog to digital, and the copyright-protection issue is becoming serious along with the wide use of the Internet. The memory stick can provide a solution to this problem by introducing ''Magic Gate (MG),'' a new technology. OpenMG means (1) allowing music download through multiple electronic music distribution platforms, (2) enabling playback of music files and extraction from CDs on PCs (OpenMG Jukebox), and (3) transferring contents securely from PCs to portable devices. Figure 25.16 shows the stack technology applied to the memory stick, with four stacked chips.
25.3.4
PlayStation 2
PlayStation 2 was originally aimed at the fusion of graphics, audio/video, and the PC. The chipset includes a 128-bit CPU called the ''Emotion Engine,'' with a 300 MHz clock frequency and 32 Mbytes of direct Rambus DRAM as main memory. The chipset also includes a graphics synthesizer chip with a 150 MHz clock frequency and 4 MB of video RAM as an embedded cache. In addition, the chipset has an I/O processor serving the 24× speed CD-ROM drive and 4× speed DVD-ROM, together with the sound processing units (SPUs). Figure 25.17 shows the PlayStation 2 (SCPH-10000) system block diagram. PlayStation 2, which Sony Computer Entertainment, Inc., released in March 2000, integrates games, music, and movies into a new dimension. It is designed to become the boarding gate for computer entertainment. PlayStation 2 uses an ultra-fast computer and 3-D graphics technology to allow the creation of video expressions that were not previously possible. Although it supports DVD, the latest media, it also features backward compatibility with PlayStation CD-ROM so that users can enjoy the several thousand titles of PlayStation software. PlayStation 2 is designed as a new generation computer entertainment system that incorporates a wide range of future possibilities. The table below shows the performance specifications of the graphics synthesizer chip, CXD2934.

FIGURE 25.16 Stack technology: four stacked chips applied to the memory stick.
FIGURE 25.17 PSX2 (SCPH-10000) system block diagram: Emotion Engine (128-bit CPU) with direct RDRAM main memory; I/O processor with its own memory, controller ports, and Memory Link/USB; graphics synthesizer CXD2934GB feeding the digital video encoder and the analog video encoder CXA3525R for display output; sound processor CXD2942R for sound output; and the CD/DVD front-end CXD1869Q with disc controller CXP102064R, RF amplifier CXA2605R, PD IC CXA250BH2, two-wavelength laser coupler SLK3201PE, and motor.
FIGURE 25.18 4 MB EmDRAM for PSX2: 0.25 μm CMOS, 4 MB embedded DRAM, 42.7 M transistors, 150 MHz clock, 48 GB/s bandwidth, 75 M polygons per second, 384-pin BGA. The cross-sectional view shows the capacitor, P-TEOS, bit line, word line, W LOCOS, logic transistor, and BMD (buried metal diffusion).

Graphics synthesizer (CXD2934) performance specifications:

Clock frequency: 150 MHz
Number of pixel engines: 16 parallel processors
Hybrid DRAM capacity: 4 MB @ 150 MHz
Total memory bandwidth: 48 GB/s (2560-bit bus)
Maximum number of display colors: 32 bits (RGBA: 8 bits each)
Z buffer: 32 bits
Process technology: 0.25 μm
Total number of transistors: 43 M
Package: 384-pin BGA
Image output formats: NTSC/PAL, D-TV, VESA (up to 1280 × 1024 dots)
In addition to the 128-bit CPU Emotion Engine and the I/O processor, PlayStation 2 adopts several advanced technologies. The graphics synthesizer graphics engine, CXD2934GB, takes full advantage of embedded DRAM system LSI technology. Figure 25.18 shows the chip photograph of Sony's 0.25 μm CMOS 4 MB embedded DRAM, which has 42.7 M transistors. The clock rate is 150 MHz, with 48 GB/s bandwidth; it can draw 75 M polygons per second and comes in a 384-pin BGA. Its cross-sectional view is also shown. The semiconductor's optical integrated device technology contributes significantly to miniaturization and high reliability in the optical pickup through the SLK3201PE, a two-wavelength laser coupler chip. PlayStation 2 also adopts an optical disc system chip solution with a solid track record, which has earned the trust of the optical disc system market: the CXD2942R sound processor chip, the CXD1869 (CD/DVD signal processor LSI), the CXP102064R (disc controller), the CXA2605R (CD/DVD RF matrix amplifier), and the CXA3525R (analog video encoder). The first commercial products for use in consumer electronics were the 0.5 μm LSI chips for 8-mm camcorders in 1995. Then came Sony's 0.35 μm LSI chips for MD products, with low-voltage operation at 2.0 V. Now, the 0.25 μm PlayStation 2 graphics synthesizer has eDRAM with 48 GB/s bandwidth. Figure 25.19 shows the EmDRAM history; Sony Em-DRAM reaches a high-bandwidth performance of 76.8 GB/s, as shown in Fig. 25.20. The following three figures, Figs. 25.21 through 25.23, show the memory cell size trend, some details of the embedded DRAM history, and the vertical critical dimensions of the 0.25 and 0.18 μm EmDRAM processes, respectively.
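The 48 GB/s figure follows directly from the width of the on-chip DRAM bus, which is exactly what embedding the DRAM makes affordable: the graphics synthesizer's 2560-bit bus (from the specification table) clocked at 150 MHz moves

```python
bus_bits = 2560          # embedded-DRAM bus width from the spec table
clock_hz = 150e6         # 150 MHz clock
bandwidth_bytes = (bus_bits / 8) * clock_hz
print(f"{bandwidth_bytes / 1e9:.0f} GB/s")  # -> 48 GB/s
```

A 2560-bit bus would be impossible through package pins; routing it on-chip is the core argument for embedded DRAM.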
FIGURE 25.19 Embedded DRAM history: the first EmDRAM product in 0.50 μm (mass production from 1995; 8 mm camcorder LSI, 512 kbit DRAM + 50 kG); 0.35 μm (1996; MD LSI, 2 Mbit DRAM + 200 kG, 2.0 V low-voltage operation); and 0.25 μm (1999; PS2 graphics synthesizer, 32 Mbit DRAM + 1500 kG, 48 GB/s DRAM bandwidth), enabling higher-performance systems and many applications and products.

FIGURE 25.20 Performance of embedded DRAM: peak bandwidth (GB/s) versus mass-production start quarter (1996–2000), from 0.5–0.6 GB/s SDRAM and 1.6 GB/s Direct Rambus DRAM up through 2.0, 10.6, and 33 GB/s embedded DRAMs to the 76.8 GB/s Sony Em-DRAM (ISSCC '98 champion).
Now, a few words on the features and critical issues of the 130 nm EmDRAM LSI process:
- The most advanced design rules, to achieve high-performance transistors: enhance resolution and refine the OPC system (speed, accuracy)
- Large variation in duty cycles: reduce isolation-density bias
- High global steps: enlarge the depth of focus (DOF)
- High-aspect-ratio hole process: enhance etching durability
FIGURE 25.21 CMOS memory cell size: memory cell size (μm²) versus design rule (0.8 μm down to 0.13 μm) for embedded SRAM, embedded DRAM (ASC4 through ASC8), and commodity DRAM (4 Mb through 4 Gb generations). The embedded DRAM adopts a COB (capacitor-over-bit-line) structure from 0.25 μm.

FIGURE 25.22 Embedded DRAM history.

Design rule | DRAM cell size | Key process technology | Mass production
0.7 μm | 2.10 μm × 4.28 μm = 8.99 μm² | 2-metal (Al) layer, BPSG reflow, spin-on glass, stacked capacitor | 1992~
0.35 μm | 1.28 μm × 2.74 μm = 3.51 μm² | 3-metal (Al-Cu) layer, SiO2 dummy, blanket W (tungsten), stacked capacitor | 1994~
0.25 μm | 0.60 μm × 1.32 μm = 0.79 μm² | 5-metal (Al-Cu) layer, buried metal diffusion, CMP, poly shrunken contact, cylindrical capacitor | 1998~
0.18 μm | 0.44 μm × 0.84 μm = 0.37 μm² | 6-metal (Al-Cu) layer, self-aligned contact, hemispherical-grained silicon, self-aligned Co silicide, shallow trench isolation | 2000~
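Expressing the Figure 25.22 cell sizes in units of F² (where F is the design rule) makes the layout efficiency of each generation directly comparable:

```python
# (design rule in um) -> (cell area in um^2), taken from Figure 25.22
cells = {0.7: 8.99, 0.35: 3.51, 0.25: 0.79, 0.18: 0.37}
for rule, area in cells.items():
    print(f"{rule} um: {area} um^2 = {area / rule ** 2:.1f} F^2")
```

The normalized area drops sharply at the 0.25 μm node (to roughly 12–13 F², from nearly 29 F² at 0.35 μm), consistent with the switch to the capacitor-over-bit-line structure noted in Figure 25.21.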
OPC = optical proximity correction; DOF = depth of focus. In the 0.18 μm EmDRAM process, optical proximity correction (OPC) technology and phase-shift mask (PSM) technology were very important; see Figs. 25.24 and 25.25. Many high-performance automatic manufacturing and measurement machines, such as the one shown in Fig. 25.26, are necessary.
FIGURE 25.23 Em-DRAM process technology, 0.25 μm versus 0.18 μm. The 0.25 μm process: DRAM cell 1.32 × 0.60 μm², W-polycide single gate (L = 0.25 μm), BMD (buried metal on diffusion layer). The 0.18 μm process: DRAM cell 0.88 × 0.42 μm², W-polycide dual gate (L = 0.15 μm), high-aspect-ratio first metal contact, first-metal W CMP, HSG capacitor, SAC, STI, and a logic transistor gate with oxynitride and Co salicide (0.15 μm dual gate).
FIGURE 25.24 Optical proximity correction: mask data without OPC versus with OPC, and the resulting printed patterns for the DRAM cell and the SRAM cell.

FIGURE 25.25 Phase-shift mask (PSM) technology: 0.16 μm photoresist patterns in the logic area and the DRAM area.
FIGURE 25.26 LA3200 overlay accuracy measurement system: 3 nm TIS accuracy, 130 wafers per hour (300 mm φ), Ethernet data transfer system; it measures the offset δ between reference pattern A and resist pattern B over a 40 μm field. (Courtesy of Hitachi Electronics Engineering Co., Ltd.)

FIGURE 25.27 Cross-sectional view of the 0.18 μm embedded DRAM logic: five metal layers with their vias and contacts, the cell capacitor, bit line, word line, and P1L (0.15 μm gate), across the DRAM region and the logic region.
Figure 25.27 shows the cross-sectional view of the 0.18 μm EmDRAM, which was realized by utilizing all of these technologies and high-performance machines. Now some comments on key factors: technology extensions such as optical extension and fully flat process technology. The KrF lithography optical extension features high NA, ultra-resolution, thin photoresist, and the OPC technology. The wiring consists of fully planarized Cu dual-damascene interlayers. The EmDRAM features a fully planarized capacitor with a globally stepless DRAM/logic structure achieved by a self-aligned process.
25.4
Conclusion
Some introductory comments on the basic semiconductor device concepts were given. They are strongly related to the microelectronics of present home entertainment LSI chips. The chapter covered in detail some product specifications and performance aspects of home entertainment LSI chip sets, such as those used in digital cameras, home robotics, and games. The cost of EmDRAM, and the solutions it enables, are strongly tied to the creation of new markets such as that for PSX2. The EmDRAM technology behind the PS2/computer and other future home entertainment electronics has the potential to be the technology driver in the years to come.
References
1. Yoshiaki Hagiwara, ''Solid State Device Lecture Series APh/EE183 at CalTech,'' 1998–1999, http://www.ssdp.caltech.edu/apheel83/.
2. Yoshiaki Hagiwara, ''Measurement technology for home entertainment LSI chips,'' presentation at the Tutorial Session of ICMTS 2001, Kobe, Japan, March 19–22, 2001.
3. M. Fujita and H. Kitano, ''Development of an autonomous quadruped robot for robot entertainment,'' Autonomous Robots, vol. 5, pp. 7–8, Kluwer Academic Publishers, Dordrecht, the Netherlands, 1998.
4. Kohtaro Sabe, ''Architecture of entertainment robot—development of AIBO,'' IEEE Computer Element MESA Workshop 2001, Mesa, Arizona, Jan. 14–17, 2001.
5. JP 1215101 (Japanese Patent #58-46905), Nov. 10, 1975, by Yoshiaki Hagiwara.
Vojin Oklobdzija/Digital Systems and Applications 6195_C026 Final Proof page 1 11.10.2007 8:30pm Compositor Name: TSuresh
26
Mobile and Wireless Computing

John F. Alexander and Raymond Barrett, University of North Florida
Babak Daneshrad, University of California
Samiha Mourad and Garret Okamoto, Santa Clara University
Mohammad Ilyas, Florida Atlantic University
Abdul H. Sadka, University of Surrey
Giovanni Seni, Motorola Human Interface Labs
Jayashree Subrahmonia, IBM Thomas J. Watson Research Center
Larry Yaeger, Indiana University
Ingrid Verbauwhede, Katholieke Universiteit Leuven and UCLA

26.1 Bluetooth—A Cable Replacement and More.............. 26-2
What is Bluetooth? . Competing Infrared Technology . Secure Data Link . Master and Slave Roles . Bluetooth SIG Working Groups . The Transport Protocol Group . The Bluetooth Transceiver . The Middleware Protocol Group . The Application Protocol Group . Bluetooth Development Kits . Interoperability . Bluetooth Hardware Implementation Issues

26.2 Signal Processing ASIC Requirements for High-Speed Wireless Data Communications ............. 26-8
Introduction . Emerging High-Speed Wireless Systems . VLSI Architectures for Signal Processing Blocks . Conclusions

26.3 Communication System-on-a-Chip .......................... 26-16
Introduction . System-on-a-Chip (SoC) . Need for Communication Systems . Communication SoCs . System Latency . Communication MCMs . Summary

26.4 Communications and Computer Networks ............. 26-27
A Brief History . Introduction . Computer Networks . Resource Allocation Techniques . Challenges and Issues . Summary and Conclusions

26.5 Video over Mobile Networks ..................................... 26-39
Introduction . Evolution of Standard Image/Video Compression Algorithms . Digital Representation of Raw Video Data . Basic Concepts of Block-Based Video Coding Algorithms . Subjective and Objective Evaluation of Perceptual Quality . Error Resilience for Mobile Video . New Generation Mobile Networks . Provision of Video Services over Mobile Networks . Conclusions

26.6 Pen-Based User Interfaces—An Applications Overview ................................................ 26-50
Introduction . Pen Input Hardware . Handwriting Recognition . Ink and the Internet . Extension of the Pen-and-Paper Metaphor . Pen Input and Multimodal Systems . Summary

26.7 What Makes a Programmable DSP Processor Special?........................................................ 26-72
Introduction . DSP Application Domain . DSP Architecture . DSP Data Paths . DSP Memory and Address Calculation Units . DSP Pipeline . Conclusions and Future Trends

26-1
26.1 Bluetooth—A Cable Replacement and More

John F. Alexander and Raymond Barrett

26.1.1 What is Bluetooth?
Anyone who has spent the time and effort to connect a desktop computer to its peripheral devices, network connections, and power source knows the challenges involved, despite the use of color-coded connectors, idiot-proof icon identification, clear illustrations, and step-by-step instructions. As computing becomes more and more portable, the problems are compounded in the laptop computer case and the palmtop device case, let alone the cell phone case, where cabling solutions are next to impossible. The challenges associated with cabling a computer are tough enough for purposes of establishing the "correct" configuration, but are nearly unmanageable if the configuration must be dismantled each time a portable device is carried about in its portable mode. Like a knight in shining armor, along comes Bluetooth, offering instant connectivity, intelligent service identification, software-driven system configuration, and a myriad of other advantages associated with replacing cabling with an RF link. All of this is provided for a target price of $5 per termination, a cost substantially lower than that of most cables with a single pair of terminations. This miracle of modern communication technology is achieved with a 2.4-GHz frequency-hopping transceiver and a collection of communications protocols. At least, that is the promise. The participants, who include such industrial giants as IBM, Motorola, Ericsson, Toshiba, and Nokia, along with over a thousand other consortium members, lend credibility to the promise. There has been considerable interest in the press over the past few years in the evolution of the open Bluetooth [1] specification for short-range wireless networking [2]. Bluetooth is one of many modern technological "open" specifications that are publicly available. The dream is to support Bluetooth short-range wireless communications (10–100 m) anywhere in the world.
The 2.4 GHz frequency spectrum was selected for Bluetooth primarily because it is license-free and available globally. As we entered the twenty-first century there were already more than 1800 members of the Bluetooth special interest group (SIG) [3]. Its reasonably high data rate (1 Mb/s gross data rate) and advanced error correction make it a serious, even irresistible, consideration for hundreds of companies in a very diverse group of industries, all interested in ad hoc wireless data and voice linkages. The Bluetooth specification utilizes a frequency-hopping spread spectrum algorithm for the hardware and specifies rapid frequency hopping of 1600 times per second. As one might expect, 2.4 GHz digital radio transceivers that support this type of high-frequency hopping are quite complex; even so, the hardware design and implementation is just the tip of the iceberg in understanding Bluetooth. The goal of this chapter is to provide the reader with a thorough overview of Bluetooth. The standard itself provides an overview, but the Bluetooth specifications alone run to thousands of pages. Some of the proposed and existing Bluetooth usage models are the cordless computer, the ultimate headset, the three-in-one phone, the interactive conference (file transfer), the Internet bridge, the speaking laptop, the automatic synchronizer, the instant postcard, ad hoc networking, and hidden computing.
26.1.2 Competing Infrared Technology
First, a brief digression will be taken into infrared wireless communication. With the advent of the personal digital assistant (PDA), the need became obvious for a low-cost, low-power means of wireless communication between a user's devices and peripherals. At an Apple Newton users group one could see hundreds of enthusiasts "beaming" business cards back and forth. As other vendors came out with PDAs, each had its own proprietary infrared communication scheme. Eventually one "standard" method of communication between user applications came about as an outgrowth of the work of the Infrared Data Association, and the specification became known as IrDA [4]. The IrDA is an international organization that creates and promotes interoperable, low-cost infrared data interconnection standards supporting a walk-up, point-to-point user model. The standards support a broad range of appliances, computing, and communications devices. Several reasons exist for mentioning the IrDA. First, many of the companies involved in the Bluetooth effort are members of the IrDA and have many products that support IrDA protocols; thus, much of the experience gained in developing and attempting to implement a workable open standard for ad hoc short-range wireless communication is in house. Also, IrDA has been one of many well-thought-out high-technology products that never gained much user acceptance. Many members of the Bluetooth SIG were anxious not to make the same mistake, and to find a way to profit from all the hard work invested in IrDA. The proposed solution seemed simple: include more or less the entire IrDA software protocol stack in Bluetooth. Thus, the many already developed but seldom-used "beaming" applications could readily use Bluetooth RF connectivity. Whether this was a good idea, only time will tell, but it is important in understanding the Bluetooth specification, which is so heavily influenced by millions of hours of corporate IrDA experience and frustration.
26.1.3 Secure Data Link
Providing a secure data link is a fundamental goal for the Bluetooth SIG. One can envision the horror of walking through an airport with a new proprietary proposal on your laptop and having a competitor wirelessly link to your machine and steal a copy. Without good security, Bluetooth could never gain wide acceptance in the cell phones, laptops, PDAs, and automobiles the drafters envisioned. Secure and nonsecure modes of operation are designed into the Bluetooth specification. Simple security is provided via authentication, link keys, and PIN codes, similar to bank ATMs. The relatively rapid frequency hopping at 1600 hops/s adds significantly to the security of the wireless link. Several levels of encryption are available if desired. In some cases this can be problematic, in that the level of encryption allowed for data and voice varies between countries, and within countries over time. The Bluetooth system provides a very secure environment in which eavesdropping is difficult; Bluetooth will probably be shown to be more secure than landline data transmission [5].
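The PIN-based authentication mentioned above follows a familiar challenge-response pattern. The sketch below illustrates only that pattern: HMAC-SHA-256 is used as a stand-in, not Bluetooth's actual E1/SAFER+ algorithm, and the function names are invented for illustration; the 16-byte random challenge mirrors the figure quoted later in this section.

```python
import hashlib
import hmac
import os

def issue_challenge() -> bytes:
    """Verifier sends a 16-byte random number, as in the text."""
    return os.urandom(16)

def respond(pin: bytes, challenge: bytes) -> bytes:
    # Claimant proves knowledge of the shared PIN without ever sending it.
    # HMAC-SHA-256 is a stand-in for Bluetooth's actual E1 function.
    return hmac.new(pin, challenge, hashlib.sha256).digest()

def verify(pin: bytes, challenge: bytes, response: bytes) -> bool:
    # Constant-time comparison avoids leaking information via timing.
    return hmac.compare_digest(respond(pin, challenge), response)

# Pairing succeeds only when both sides share the same PIN.
challenge = issue_challenge()
assert verify(b"1234", challenge, respond(b"1234", challenge))
assert not verify(b"1234", challenge, respond(b"9999", challenge))
```

Once verified, the devices would store a derived link key so that future connections can authenticate automatically, as the text describes.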
26.1.4 Master and Slave Roles
The Bluetooth system provides a simple network, called a piconet, nominally 10 m in radius. This is the 1-mW power mode (0 dBm). There is also a 10-mW mode allowed, which probably could reach 100 m in ideal cases, but it may not become widely implemented. One should think of a Bluetooth piconet as a 10 m personal bubble providing a moderately fast and secure peer-to-peer network. The specification permits any Bluetooth device to be either a master or a slave. At the baseband level, once two devices establish a connection, one has to be a master and the other a slave. The master is responsible for establishing and communicating the frequency-hopping pattern, based on the Bluetooth device address, and the phase for the sequence, based on its clock [6]. Up to seven active slaves are allowed, all of which must hop in unison with the master. The Bluetooth specification allows for the direct addressing of up to 255 total slave units, but all but seven of the slaves must be in a "parked" mode. The master-slave configuration is necessary at the low protocol levels to control the complex details of the frequency hopping; however, at higher levels the communication protocol is peer-to-peer, and the connection established looks like point-to-point. The protocol supports several modes, which include active, sniff, hold, and park. Active uses the most power. While a slave is in sniff mode, it conserves power by only periodically becoming active. In hold mode, a slave is suspended but wakes up periodically, based on timing from the master, to "see" if any data are ready for it. While a slave is in park mode it consumes the least power, but it still maintains synchronization with the master. A more complex Bluetooth communication topology is the scatternet. In one of the simpler scatternet schemes there are two masters, with a common slave device active in two piconets.
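The piconet membership rules just described — at most seven active slaves, with additional devices parked — can be sketched as follows; the class and method names here are invented for illustration, not part of any Bluetooth API:

```python
class Piconet:
    """Illustrative model of Bluetooth piconet membership rules."""
    MAX_ACTIVE = 7  # active slaves addressable at one time, per the text

    def __init__(self, master):
        self.master = master
        self.active = []   # slaves hopping in unison with the master
        self.parked = []   # synchronized but dissociated slaves

    def join(self, slave):
        # A new slave becomes active if a slot is free; otherwise parked.
        if len(self.active) < self.MAX_ACTIVE:
            self.active.append(slave)
        else:
            self.parked.append(slave)

net = Piconet("phone")
for i in range(9):
    net.join(f"device-{i}")
print(len(net.active), len(net.parked))  # 7 2
```

A real implementation would also let the master wake a parked slave (freeing an active slot first), which is the beacon-channel mechanism described in Section 26.1.6.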
In another variation on the scatternet, one device is a slave in one piconet and the master in another. Using this scatternet idea, some have speculated that an entire wireless network could be formed by having many piconets, each with one common node. Theoretically this would work, but the rest of the original Bluetooth specification is not designed to make Bluetooth a wireless LAN. It is likely the newer SIG work group on personal area networking will be interested in expanding the definition and capability of Bluetooth scatternets. Currently there is much interest in forming location-aware ad hoc wireless networks [7]. NASA has already approached this author for ideas on using Bluetooth for ad hoc small-area networks in space missions. The appeal of a wireless link made up of five-dollar, very small, low-power, self-configuring parts capable of connecting various sensors is irresistible for complex space missions, where power and payload weight are at a premium.
26.1.5 Bluetooth SIG Working Groups
To understand the Bluetooth specification it is important to understand how the very large Bluetooth SIG is organized. The actual work of producing the various specifications is done by the SIG working groups. Given that the Bluetooth specification is thousands of pages of detailed technical documentation, it is not practical to simply sit down and read the specification. Briefly, five major groups compose the SIG: the air interface group, the software group, the interoperability group, the legal group, and the marketing group [3]. The software group contains three working subgroups primarily responsible for the Bluetooth protocol stack. These are the lower Transport Protocol Group, the Middleware Protocol Group, and the Application Group. The protocol stack follows the International Organization for Standardization (ISO) seven-layer reference model for open system interconnection [8].
26.1.6 The Transport Protocol Group
The Transport Protocol Group covers ISO layers one and two, comprising the Bluetooth radio, the link controller (baseband), the link manager, the logical link control and adaptation protocol (L2CAP) layer, and the host controller interface. Collectively, this set of protocols forms a virtual pipe to move voice and data from one Bluetooth device to another. Audio applications bypass all of the higher-level layers to move voice from one user to another [6]. The L2CAP layer shields the higher layers from the complexity of the frequency-hopping Bluetooth radio, its control, and the special packets used over the Bluetooth air interface. The responsibility of the L2CAP layer is to coordinate and maintain the desired level of service requested and to coordinate new incoming traffic. The L2CAP layer is concerned with asynchronous (ACL packet) transmission [6]. This layer does not know about details of the Bluetooth air interface such as master and slave roles, polling, and frequency hopping. Its job is to support higher-layer protocol multiplexing so that multiple applications can establish connectivity over the same Bluetooth link simultaneously [9]. Device authentication is based on an interactive transaction with the link manager. When an unknown Bluetooth device requests connectivity, the requested device asks the requester to send back a 16-byte random-number key, in a procedure similar to the familiar bank ATM PIN code. Once a device is authenticated, the device stores the authentication codes so this process can be automatic in future connections. Link encryption with keys of up to 128 bits is supported and is controlled by desirability and the governing legal issues of the area. Encryption applies only to the data payload and is symmetric. Power management of connected devices is also handled at this level.
In sniff mode, the slave must wake up and listen at the beginning of each even-numbered slot to see if the master intends to transmit [6]. In hold mode, the slave is suspended for a specified time; the API for hold mode puts the master in charge, but provisions are available to negotiate the time. In park mode, the slave dissociates itself from the piconet while still maintaining synchronization with the hopping sequence. Before going into park mode, the master informs the slave of a low-bandwidth beacon channel that the master can use to wake the parked slave if there are not already seven active slaves.
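The sniff-mode timing just described is easy to tabulate. In the sketch below, the 625 µs slot length is the standard Bluetooth baseband figure (it is not stated in the text), and the function name is invented:

```python
SLOT_US = 625  # Bluetooth baseband slot length in microseconds (standard figure)

def sniff_listen_times(n_slots):
    """Times (in µs) at which a sniffing slave wakes to listen.

    Per the text, the slave listens at the start of each even-numbered
    slot and can sleep for the rest of the time, saving power.
    """
    return [slot * SLOT_US for slot in range(n_slots) if slot % 2 == 0]

print(sniff_listen_times(6))  # [0, 1250, 2500]
```

In the real protocol the sniff interval is negotiated and can be much longer than every other slot; the even-slot rule shown is the simplest case described here.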
Paging schemes allow for more rapid reconnection of Bluetooth devices. For example, paging is used in the event that a master and a slave need to switch roles to solve some problem, such as forming some sort of local area network. Support for handling paging is optional in the Bluetooth specification. Another role of the link managers is to exchange information about each other to make passing data back and forth more efficient.
26.1.7 The Bluetooth Transceiver
Bluetooth systems operate in the industrial, scientific, and medical (ISM) 2.4 GHz band. This band is available license free on a global basis and is set aside for wireless data communications. In the United States, the Federal Communications Commission (FCC) sets the rules for transmitters operating in the ISM band under Section 15.247 of the Code of Federal Regulations. The frequency range allocated is 2,400 MHz to 2,483.5 MHz. The Bluetooth transceiver operates over 79 channels, each of which is one megahertz wide. At least 75 of the 79 frequencies hopped to must be chosen pseudo-randomly. Bluetooth uses all 79 channels and hops at a rate of 1600 hops per second.
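The channel arithmetic above is easy to check. In this sketch, the 2402 MHz center frequency for channel 0 is a standard Bluetooth figure (not stated in the text), and the hop sequence shown is a plain seeded random choice, not the real Bluetooth hop-selection kernel:

```python
import random

CHANNELS = 79       # 1-MHz-wide channels, per the text
BASE_MHZ = 2402     # channel 0 center frequency (standard figure, assumed)
HOP_RATE = 1600     # hops per second, per the text

def channel_freq_mhz(k):
    """Center frequency of channel k, in MHz."""
    assert 0 <= k < CHANNELS
    return BASE_MHZ + k

# Dwell time on each channel before the next hop:
dwell_us = 1_000_000 / HOP_RATE
print(dwell_us)  # 625.0 microseconds per hop

# Illustrative hop sequence (NOT Bluetooth's actual selection kernel):
rng = random.Random(0)
hops = [rng.randrange(CHANNELS) for _ in range(10)]
freqs = [channel_freq_mhz(k) for k in hops]
assert all(2402 <= f <= 2480 for f in freqs)  # stays inside the ISM band
```

The 625 µs dwell time is also the baseband slot length, which is why one packet normally occupies one hop.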
26.1.8 The Middleware Protocol Group
The Middleware Protocol Group covers ISO layers three through six and is made up of the RFCOMM protocol, the service discovery protocol (SDP), the IrDA interoperability protocols, and the audio and telephony control protocols. Fitting Bluetooth into the ISO model is really up to the developer: it can be made to fit, but there is a lot of strange baggage in the protocols embedded in Bluetooth that makes the mapping difficult to see. First, we have already seen voice communication connect down at the L2CAP layer. Now we are faced with how to toss in multiplexed serial port emulation, IrDA interoperability, and a bunch of protocols from the telephony world. No wonder the standard goes on for thousands of pages, and hundreds of companies around the world are struggling with compatibility testing of the various Bluetooth devices designed from this very complex specification.
26.1.9 The Application Protocol Group
The Application Protocol Group includes ISO layer seven. This grouping contains the most extensive variety of special-purpose profiles, all of which rely on the six lower levels for service. These include the generic profiles, the serial and object exchange profiles, the telephony profiles, and the networking profiles. The generic profiles include the generic access profile and the service discovery application profile. The serial and object exchange profiles contain the serial port profile, the generic object exchange profile, the object push profile, the file transfer profile, and the synchronization profile. The networking profiles include the dial-up networking profile, the LAN access profile, and the fax profile. The telephony profiles include the cordless telephony profile, the intercom profile, and the headset profile. Most of these application profiles are self-explanatory and are only of detailed interest to the software developer when developing a specific application using the appropriate profile. This is not to say that they are not important, but they provide very detailed application programmer interfaces (APIs) [15]. The possible Bluetooth applications keep expanding, which stimulates interest in expanding the array of application profiles in the Bluetooth specification. Several of the newer application profiles are the car profile, a richer audio/video profile, and a local positioning profile.
26.1.10 Bluetooth Development Kits
Given the obvious complexity of Bluetooth hardware and software applications, having access to good development kits is essential to speed implementation of the specification. The first inexpensive development kit to become widely available to universities was Ericsson's Bluetooth Application and Training Toolkit. This is a first-generation Bluetooth kit that demonstrates important Bluetooth features and has a well-defined but extensive proprietary API in C++. Application development is possible, but it is time-consuming and tedious, requiring knowledge of C++ to learn a vast API. Newer kits, designed specifically for development, are more efficient. Cambridge Silicon Radio (CSR) has been very well publicized in the Bluetooth development community and features an all-CMOS, one-chip solution. The CSR development kit includes software for the CSR BlueCore [11] IC with on-chip Bluetooth protocol and a PC development environment. Tools for embedded one-chip products are provided, along with the BlueCore-to-host serial protocol and an integrated Bluetooth protocol stack, BlueStack. An innovative feature is that BlueCore devices enable users to configure the level of BlueStack that loads at boot time using software switches. CSR claims that running the full Bluetooth protocol locally on a BlueCore device significantly reduces the load on the host embedded processor, delivering major advantages to users of their Bluetooth system-on-a-chip solution [12]. Many other development tools can currently be found at http://www.bluetooth.com/product/dev_tools/development.asp. The above two are referenced because they have been around for a year or so and the authors have direct experience with them [2].
26.1.11 Interoperability
There is a conflict with the IEEE 802.11 wireless network specification, which uses a direct sequence spread spectrum approach in the same frequency band. The direct sequence modulation is incompatible with the frequency hopping approach employed in Bluetooth. It is unlikely that an elegant interoperability solution can be found without duplicating the entire hardware solution for each; however, some early ad hoc reports in the trade press suggest that the interference between 802.11 and Bluetooth is minor [13,14]. Both operate in the 2.4 GHz ISM band, and both are forms of spread spectrum, but 802.11 uses direct sequence modulation and allows more power, while Bluetooth is frequency hopping and low power.
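A crude way to see why the interference may be tolerable is to estimate how often a Bluetooth hop lands inside an 802.11 direct-sequence channel. The 22 MHz channel width used below is the standard 802.11 DSSS figure, an assumption not stated in the text, and the uniform-hop model ignores Bluetooth's actual pseudo-random selection kernel:

```python
BLUETOOTH_CHANNELS = 79   # 1 MHz each, per the text
DSSS_CHANNEL_MHZ = 22     # width of one 802.11 DSSS channel (assumed)

# Probability that a uniformly chosen hop lands inside the 802.11 channel:
p_collision = DSSS_CHANNEL_MHZ / BLUETOOTH_CHANNELS
print(f"{p_collision:.2%}")  # roughly 28% of hops overlap

# Equivalently, about 72% of the 1600 hops/s avoid the 802.11 channel
# entirely, which is one reason the trade press found the conflict minor.
assert 0.27 < p_collision < 0.29
```

This first-order estimate ignores power levels, capture effects, and retransmission, but it matches the qualitative conclusion cited in [13,14].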
26.1.12 Bluetooth Hardware Implementation Issues
First, for Bluetooth to achieve the stated goals of widespread usage at low cost, there are severe hardware constraint issues to be addressed. Second, the environment into which Bluetooth is likely to be deployed is rapidly changing. Finally, the business models for adoption of Bluetooth technology are also affected. Broadly speaking, there are two classes of hardware implementation for Bluetooth: one employs multiple discrete chips to produce a solution, and a second makes Bluetooth an embedded intellectual property (IP) block in a system-on-a-chip (SoC) product. In the short run, the multiple-chip strategy provides an effective implementation directed at prototype and assembly-level products. The strategy is effective during the initial period of development for Bluetooth, while the specification is still evolving and product volumes are still low; however, a strong case can be made that high volumes, low cost, and the evolving environment make an IP block approach inevitable if Bluetooth is to enjoy wide acceptance. In addressing the environmental issues, the most widely dispersed communications product today is the cell phone, with its service variants. The Internet connectivity of cell phones will soon surpass that of desktop computers. Because the cell phone drives the environment for all information-processing connectivity, its constraints of low power, tight packaging, high volume, and low-cost manufacturing force an examination of IP blocks for SoC solutions. Bluetooth is highly attractive for cell phone products as a wire replacement, enabling many of the existing profiles from a cell phone as well as providing expansion for future applications. The desktop computer also embraces Bluetooth as a wire replacement, but has a history of services supported by cabling.
The cell phone cannot support many services with cabling, and in contrast to the desktop, service extensions for the raw communication capability of 3G and 4G cell phones must be addressed by wireless solutions.
Once Bluetooth IP block solutions exist, market forces will drive high-volume products toward either embedded or single-chip Bluetooth implementations. Technological hurdles must be overcome on the road toward Bluetooth IP block solutions. Presently, the RF front-end solutions for Bluetooth are nearly all implemented in bipolar IC technology, implying at least a BiCMOS IC, which is widely recognized as too high in cost for high-volume SoC products. As the lithography becomes available for denser CMOS IC products, deep submicron devices provide the density and speed increases to support SoC solutions in the digital arena, and also improve the frequency response of the analog circuitry, enabling the possibility of future all-CMOS implementations. In addition, communications system problems must be solved to ensure the feasibility of an all-CMOS implementation. For example, one of the more popular architectures for a modern communications receiver is the zero-IF (ZIF) approach. Unfortunately, the ZIF approach usually converts the RF energy immediately to baseband without significant amplification, which places the very small signal in the range of the 1/f noise of the semiconductor devices employed. Typically, the only devices with substantially low enough noise are bipolar devices, which are to be avoided for system-level considerations. Alternative architectures include variants of the super-heterodyne architecture, which usually require tuned amplifiers that are also seldom suitable for integration. One approach that seems to meet all the requirements is a variant of the super-heterodyne architecture known as low-IF, which places the energy high enough in the spectrum to avoid the noise considerations, but low enough to be addressed by DSP processing to achieve the requisite filtering.
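The low-IF trade-off can be made concrete with a few numbers. In the sketch below, the 2 MHz IF placement and the 1 MHz flicker-noise corner are illustrative assumptions, not figures from the text; only the 1 MHz Bluetooth channel width comes from the chapter:

```python
flicker_corner_hz = 1e6   # assumed 1/f noise corner of a CMOS process
channel_bw_hz = 1e6       # one Bluetooth channel is 1 MHz wide (per the text)

# Pick an IF high enough that the whole channel clears the 1/f corner...
if_center_hz = 2e6        # assumed low-IF placement
assert if_center_hz - channel_bw_hz / 2 > flicker_corner_hz

# ...but low enough that a modest ADC can digitize it for DSP filtering.
min_sample_rate_hz = 2 * (if_center_hz + channel_bw_hz / 2)  # Nyquist bound
print(min_sample_rate_hz / 1e6)  # 5.0 MHz
```

A zero-IF receiver would instead place the channel directly across DC, where the assumed 1/f corner sits inside the signal band, which is the problem the low-IF variant avoids.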
Regardless of the particular architecture chosen, the rapid channel switching involved in the frequency-hopping scheme necessitates frequency synthesis for the local oscillator functions. There is considerable design challenge in developing a fully integrated voltage-controlled oscillator (VCO) for use in a synthesizer that slews rapidly and still maintains low phase noise. To compound these issues, true IP block portability implies a level of process independence that is not currently enjoyed by any of the available architectures. Portability issues are likely to be addressed by intelligence in the CAD tools used to customize the IP blocks to the target process through process migration and shrink paths.
References
1. Bluetooth is a trademark owned by Telefonaktiebolaget L M Ericsson, Sweden, and licensed to the promoters and adopters of the Bluetooth Special Interest Group.
2. http://www.bluetooth.com/developer/specification/specification.asp, Bluetooth Specification v1.1 core and v1.1 profiles.
3. http://www.bluetooth.com/sig/sig/sig.asp, Bluetooth Special Interest Group (SIG).
4. Infrared Data Association (IrDA), http://www.irda.org.
5. Bray, J. and Sturman, C.F., Bluetooth: Connect Without Cables, Prentice-Hall, Englewood Cliffs, NJ, 2001.
6. Miller, B.A. and Bisdikian, C., Bluetooth Revealed, Prentice-Hall, Englewood Cliffs, NJ, 2001.
7. Tseng, Y., Wu, S., Liao, W., and Chao, C., Location awareness in ad hoc wireless mobile networks, IEEE Computer, pp. 46–52, June 2001.
8. International Organization for Standardization, Information Processing Systems—Open Systems Interconnection—Connection Oriented Transport Protocol Specification, International Standard 8825, ISO, Switzerland, May 1987.
9. Held, G., Data Over Wireless Networks, McGraw-Hill, New York, 2001.
10. http://www.comtec.sigma.se/, Ericsson's Bluetooth Application and Training Toolkit.
11. BlueCore and BlueStack are registered trademarks of Cambridge Silicon Radio, Cambridge, England, 2001.
12. http://www.csr.com/software.htm, Development software for BlueCore ICs, Cambridge Silicon Radio, Cambridge, England, 2001.
13. Dornan, A., Wireless Ethernet: neither bitten nor blue, Network Magazine, May 2001.
14. Merritt, R., Conflicts between Bluetooth and wireless LANs called minor, EE Times, February 2001.
15. Muller, N.J., Bluetooth Demystified, McGraw-Hill, New York, 2000.
26.2 Signal Processing ASIC Requirements for High-Speed Wireless Data Communications

Babak Daneshrad

26.2.1 Introduction
To date, the role of application-specific integrated circuits (ASICs) in wireless communication systems has been rather limited. Almost all of the signal processing demands of second-generation cellular systems such as GSM and IS-136 (US TDMA) can be met with the current generation of general-purpose DSP chips (e.g., the TI TMS320, Analog Devices' ADSP-21xx, or Lucent's DSP16xx families). The use of ASICs in wireless data communications has been limited to wireless LAN systems such as Lucent's WaveLAN and to the front-end, chip-rate processing needs of DSSS-CDMA based systems such as IS-95 (US CDMA). Several major factors are redirecting the industry's attention toward ASICs for the realization of highly complex and power-efficient wireless communications equipment. First is the move toward third-generation (3-G) cellular systems capable of delivering data rates of up to 384 kbps in outdoor macro-cellular environments (an order of magnitude higher than present second-generation systems) and 2 Mbps in indoor micro-cellular environments. Second is the emergence of high-speed wireless data communications, whether in the form of high-speed wireless LANs [1] or in the form of broadband fixed access networks [2]. A third, but somewhat more subtle, factor is the increased appeal of software radios: radios that can be programmed to transmit and receive different waveforms and thus enable multi-mode and multi-standard operation. Although ASICs are by nature not programmable, they are parameterizable. In other words, ASIC designers targeting wireless applications must develop their architectures in such a way as to provide the user with features such as variable symbol rates and carrier frequency, as well as the ability to shut off parts of the circuit that may be unused under benign channel conditions. For DSSS systems, the ASICs should provide sufficient flexibility to accommodate programmability of the chip rate, the spreading factor, and the spreading code to be used.
The next subsection further explores these elements and identifies key signal processing tasks that are suited for ASIC implementation. Section 26.2.3 presents signal processing algorithms and ASIC architectures for the realization of these blocks, and Section 26.2.4 offers concluding remarks.
26.2.2 Emerging High-Speed Wireless Systems

26.2.2.1 Third Generation (3-G) Cellular Networks
Second-generation cellular systems such as IS-136, GSM, and IS-95 have mainly focused on providing digital voice services and low-speed data traffic. With the growing popularity of the Internet and the need for multimedia networking, standardization bodies throughout the world are looking at the evolution of current systems to support high-speed data and multimedia services. The technology of choice for all such 3-G systems is wideband code division multiple access (W-CDMA) based on direct sequence spread spectrum (DSSS) techniques [3,4]. The targeted chipping rate for these systems is 3.84 Mcps for the European UTRA standardization work, and a multiple of 1.2288 Mcps for the CDMA2000 proposal. In addition to higher data rates, which come about in part from the increased bandwidth utilization of 3-G systems, a second and equally important aim of these systems is to increase the capacity of a cell (the number of simultaneous calls supported by a cell). To this end, all the current
proposals call for the use of sophisticated receivers utilizing multi-user detection and possibly smart antenna technologies. To better appreciate the signal processing requirements of these receiver units, consider the block diagrams presented in Fig. 26.1. Figure 26.1a depicts the transmitter of a DSSS system, along with a candidate successive interference canceller (SIC) shown in Fig. 26.1b [5]. The details of the rake receiver are shown in Fig. 26.1c. The tremendous processing requirements of this architecture become evident by considering a modest system operating at a chip rate of, say, 4 Mcps with a 32-tap shaping filter, four rake fingers per user, four complex correlators per rake finger, and 10 users in the cell: the number of operations (real multiply-adds) needed for a 5-stage SIC is upwards of 14 billion operations per second, or 14 giga-operations per second (GOPS). This amount of processing can easily overwhelm even the latest generation of general-purpose processors, such as the TI TMS320C6x, which delivers 1.6 giga-instructions per second (GIPS) but only 400 mega multiply-add operations per second [6]. At an anticipated power dissipation of 850 mW per processor, the overall power consumption of a SIC circuit based on such units would be quite large. It is also worth noting that many operands in the SIC or other MUD receivers require only a small number of bits (e.g., multiplication with a 1-bit PN code sequence). This fact can be exploited in a dedicated ASIC datapath architecture but not in a general-purpose, software-programmable architecture.
FIGURE 26.1 Block diagram of (a) generic DSSS transmitter, (b) successive interference canceller for multiuser detection, and (c) rake receiver for a system with parallel pilot channel (i.e., IS-95).
26.2.2.2
Broadband Wireless Networks
Emerging broadband fixed wireless access systems provide high-speed connectivity between a cellular base station and a home or office building at data rates of a few Mbps to a few tens of Mbps. On the other hand, standardization activities currently targeting high-speed wireless micro-cellular (wireless LAN) systems are looking at delivering over-the-air data rates of 10–20 Mbps in the near future, with higher rates projected in the long term. It is generally accepted that to achieve such high data rates, beam switching or beamforming techniques must be integrated into the development of the nodes. In addition, single carrier systems must include adaptive equalization to overcome time varying channel impairments, while multicarrier systems based on OFDM will require a large number of subcarriers [7]. The signal processing requirements for such high data rate systems could easily mount into the tens of GOPS range, thus necessitating the development of ASICs. Furthermore, the flexibility of a digital implementation of the down-conversion path, compared to an analog one, makes a digital IF architecture more appealing. Figure 26.2 depicts the detailed block diagram of a single carrier high-speed wireless communication receiver complete with adaptive beamforming, adaptive equalization, and variable symbol rates. The flexibility offered by such an architecture can meet the demands of different systems requiring different levels of performance. In this architecture, the direct digital frequency synthesizer (DDFS) serves three roles. First, it enables down-conversion of any carrier frequency up to half the sampling frequency of the analog-to-digital converter. Second, it can replace or complement a VCO for the purposes of carrier recovery, and third, it can easily generate the different phases needed by the beamforming circuit.
The variable rate decimator block is a key element in variable symbol rate systems where it is desired to maintain the exact same analog filtering yet accommodate user defined symbol rates. This is particularly important in wireless systems where a predefined data rate is difficult to guarantee due to statistical channel variations such as fading and shadowing. In such scenarios, the user can simply back off on the symbol rate and provide connectivity, albeit at a lower data rate. The flexible decimation architecture depicted in Fig. 26.2 consists of two stages. The first is a coarse decimator block, which can decimate the signal by 2^N for N = 0, 1, 2, . . . , M. This section is realized
FIGURE 26.2 Block diagram of an all-digital receiver for a single carrier system (i.e., QAM) featuring digital IF sampling, beamforming, variable symbol rate, adaptive equalization, all digital timing, and carrier recovery loops.
using a cascade of N decimate-by-two stages. The second part of the decimator is a variable rate interpolator block, which can change the sampling rate by any value in the range of 2–4. Not only can this block be used to change the sampling frequency of the signal, but it is also the vital element in the realization of an all digital timing recovery loop. The matched filter is typically a fixed-coefficient finite impulse response (FIR) filter. This block is followed by a decision feedback equalizer (DFE) that helps mitigate the effects of intersymbol interference (ISI) caused by the multipath nature of the channel. The DFE is made up of two adaptive FIR filters referred to as the feedforward filter (FFF) and the feedback filter (FBF). The amount of processing (in terms of real multiply-adds per second) needed to realize these blocks can easily run into several GOPS. As an example, a baseband QAM receiver consisting of a 30-tap matched filter, a 10-tap FFF, and a 5-tap FBF adapted using the least mean squares (LMS) algorithm, running at 10 Mbaud, requires close to 2.5 GOPS of processing. Once the processing needs of the DDFS, variable rate filters, and the beamforming network are also factored in, the processing requirements can easily reach 7–8 GOPS.
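The adaptation loop behind that estimate can be sketched behaviorally. The following is a minimal LMS-adapted DFE in training mode, using the text's FFF/FBF tap counts but an assumed (made-up) multipath channel and BPSK symbols standing in for QAM:

```python
import numpy as np

# Minimal decision-feedback equalizer with LMS adaptation (training mode).
# Tap counts follow the text's example; the channel and signal are invented.
rng = np.random.default_rng(0)

N_FFF, N_FBF, MU = 10, 5, 0.01
symbols = rng.choice([-1.0, 1.0], size=2000)      # BPSK stand-in for QAM
channel = np.array([1.0, 0.4, -0.2])              # assumed multipath channel
rx = np.convolve(symbols, channel)[:len(symbols)]

fff = np.zeros(N_FFF)                             # feedforward filter
fbf = np.zeros(N_FBF)                             # feedback filter
past_decisions = np.zeros(N_FBF)

errs = []
for n in range(N_FFF, len(symbols)):
    x = rx[n - N_FFF + 1 : n + 1][::-1]           # newest sample first
    y = fff @ x - fbf @ past_decisions            # equalizer output
    d = symbols[n]                                # training symbol
    e = d - y
    fff += MU * e * x                             # LMS coefficient updates
    fbf -= MU * e * past_decisions
    past_decisions = np.roll(past_decisions, 1)
    past_decisions[0] = d
    errs.append(e * e)

print(f"MSE first 100: {sum(errs[:100])/100:.3f}, last 100: {sum(errs[-100:])/100:.3f}")
```

The squared error shrinks as both filters adapt, which is the behavior the hardware FFF/FBF pair implements at the symbol rate.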
26.2.3
VLSI Architectures for Signal Processing Blocks
26.2.3.1
Fixed Coefficient Filters
The most intuitive means of implementing an FIR filter is to use the direct form implementation presented in Fig. 26.3a [12]. Applying the transposition theorem to this filter, we get the transposed structure shown in Fig. 26.3b. The two structures are identical in terms of I/O; however, the transposed form is ideal for high speed filtering operations, since the critical path for an N-tap filter is always one multiplier delay plus one adder delay. The critical path of the direct form, however, is one multiplier delay plus N − 1 adder delays. The fact is that the symbol rate for most wireless communication systems is a few tens of megahertz, whereas a typical multiplier in today's CMOS process technologies can easily reach speeds of 80–100 MHz. It is thus desirable to use the hybrid architecture shown in Fig. 26.3c where each multiplier
FIGURE 26.3 Alternative FIR filter structures: (a) direct form FIR structure, (b) transposed form FIR structure, and (c) hybrid FIR structure.
Vojin Oklobdzija/Digital Systems and Applications 6195_C026 Final Proof page 12 11.10.2007 8:30pm Compositor Name: TSuresh
26-12
Digital Systems and Applications
accumulator is time-shared between several taps (three in this case), resulting in a more compact circuit for lower symbol rates. The implementation of fixed coefficient FIR filters can be further simplified by moving away from the use of 2's complement number notation and using a signed-digit number system in which each digit can take on one of three values {−1, 0, 1}. In general, there are multiple signed-digit representations for the same number, and a canonic signed-digit (CSD) representation can be defined for which no two nonzero digits are adjacent [8]. The added flexibility of signed-digit numbers allows us to realize the same coefficient using fewer nonzero digits than would be possible with a simple 2's complement representation. Using an optimization program, it is possible to design an FIR filter using CSD coefficients with as few as three or four nonzero digits per coefficient. This can significantly reduce the complexity of fixed coefficient multipliers, since the number of partial products generated is directly proportional to the number of nonzero digits in the multiplier.
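The CSD recoding can be illustrated with a short routine. The digit-selection rule below is the standard non-adjacent-form recoding, shown as an illustration rather than the filter-design optimization of [8]:

```python
def to_csd(value):
    """Canonic signed-digit (non-adjacent form) of a positive integer,
    returned LSB first with digits in {-1, 0, +1}."""
    digits = []
    x = value
    while x != 0:
        if x % 2:
            d = 2 - (x % 4)     # pick +1 or -1 so no two nonzeros are adjacent
            digits.append(d)
            x -= d
        else:
            digits.append(0)
        x //= 2
    return digits

def nonzeros(digits):
    return sum(1 for d in digits if d)

# 23 = 10111 in binary: four nonzero bits, but only three CSD digits,
# so a fixed multiplier by 23 needs one fewer partial product.
csd = to_csd(23)
assert sum(d * 2**i for i, d in enumerate(csd)) == 23
print(csd, nonzeros(csd))
```

For 23 the routine yields digits for 32 − 8 − 1, i.e., three nonzero digits against four in plain binary.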
26.2.3.2
Direct Digital Frequency Synthesizer (DDFS)
Given an input frequency word W, a DDFS will produce a frequency proportional to W. The most common techniques for realizing a DDFS consist of first accumulating the frequency word W in a phase accumulator and then producing the sine and cosine of the phase accumulator value using a table lookup or a coordinate rotation (CORDIC) algorithm. These two approaches are depicted in Fig. 26.4. The two metrics for measuring the performance of a DDFS are the minimum frequency resolution Δf and the spurious free dynamic range (SFDR). The frequency resolution can be improved by increasing the wordlength used in the accumulator, while the SFDR is affected by the wordlengths in both the accumulator and the sine/cosine generation block. One of the main challenges in the development of the table lookup DDFS has been to limit the size of the sine/cosine table. This has been accomplished through two steps [9]. First, by exploiting the symmetry of the sine and cosine functions, it is only necessary to store a quarter of the period of a sine wave and derive the remainder of the period through manipulation of the saved portion. Second, the number of bits per entry can be reduced by dividing the sine table between a coarse ROM and a fine ROM, with the final result obtained after simple post-processing of the values. Combining these two techniques can result in the reduction of the sine tables by an order of magnitude or better. In the CORDIC algorithm, Fig. 26.4, the sine and cosine of the argument are calculated using a cascade of stages, each of which rotates its input complex vector by ±δ/2^k (δ = π/2), with the sign of the rotation determined by the kth bit of W. Thus each stage performs the following matrix operation:
\[
\begin{pmatrix} x_{\mathrm{out}} \\ y_{\mathrm{out}} \end{pmatrix}
= \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} x_{\mathrm{in}} \\ y_{\mathrm{in}} \end{pmatrix}
= \cos\theta \begin{pmatrix} 1 & -\tan\theta \\ \tan\theta & 1 \end{pmatrix}
\begin{pmatrix} x_{\mathrm{in}} \\ y_{\mathrm{in}} \end{pmatrix}
\]

FIGURE 26.4 Two most common DDFS architectures: (a) table lookup and (b) coordinate rotation (CORDIC).
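A behavioral sketch of the table-lookup DDFS of Fig. 26.4a, with the quarter-wave storage described above, may help fix ideas. The accumulator and table wordlengths here are illustrative choices, not values from the text:

```python
import math

# Behavioral model of a table-lookup DDFS: an N-bit phase accumulator
# addresses a quarter-wave sine table. Wordlengths are assumed values.
ACC_BITS = 24
TABLE_BITS = 8                      # table holds 2**TABLE_BITS quarter-wave entries
TABLE = [math.sin(math.pi / 2 * i / 2**TABLE_BITS) for i in range(2**TABLE_BITS)]

def quarter_wave_sin(phase, phase_bits):
    """Reconstruct a full sine period from the stored quarter wave."""
    quadrant = phase >> (phase_bits - 2)                        # top 2 bits
    idx = (phase >> (phase_bits - 2 - TABLE_BITS)) & (2**TABLE_BITS - 1)
    if quadrant in (1, 3):
        idx = 2**TABLE_BITS - 1 - idx                           # mirror 2nd/4th quadrant
    s = TABLE[idx]
    return -s if quadrant >= 2 else s                           # negate 2nd half period

def ddfs(freq_word, n_samples):
    acc, out = 0, []
    for _ in range(n_samples):
        acc = (acc + freq_word) & (2**ACC_BITS - 1)             # accumulator wraps
        out.append(quarter_wave_sin(acc, ACC_BITS))
    return out

# Output frequency is (freq_word / 2**ACC_BITS) * f_clk; this word
# gives one full period every 64 clock samples.
samples = ddfs(freq_word=2**18, n_samples=64)
```

Only a quarter of the sine period is stored; sign and index mirroring recover the rest, which is exactly the first table-compression step of [9].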
In [10] a simplification of the CORDIC DDFS is presented in which, for small θ, tan(θ) is simply approximated by θ. In [11] a different modification to the CORDIC architecture is proposed that facilitates low-power operation in cases where a sustained frequency is to be generated. This is achieved by calculating the necessary angle of rotation for each sampling clock period and dedicating a single rotation stage in a feedback configuration to continually rotate the phasor through the desired angle.
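A floating-point model of the CORDIC recursion illustrates the shift-add rotations. This variant steers each micro-rotation by the sign of the residual angle and rotates by arctan(2^−k) per stage, a common formulation; the bit-steered version described above differs only in how the direction is chosen:

```python
import math

# CORDIC sine/cosine by successive micro-rotations of +/- atan(2**-k).
# The product of the per-stage cos terms is folded into one constant gain.
K = 16  # number of stages

def cordic_sin_cos(theta):
    """Compute (cos(theta), sin(theta)) for |theta| < pi/2."""
    x, y, z = 1.0, 0.0, theta
    for k in range(K):
        d = 1.0 if z >= 0 else -1.0          # rotation direction
        x, y = x - d * y * 2**-k, y + d * x * 2**-k   # shift-add rotation
        z -= d * math.atan(2**-k)            # residual angle
    gain = math.prod(math.cos(math.atan(2**-k)) for k in range(K))
    return x * gain, y * gain

c, s = cordic_sin_cos(0.7)
print(round(c, 4), round(s, 4))   # close to cos(0.7), sin(0.7)
```

In hardware the multiplications by 2^−k are wire shifts, so each stage costs only adders, which is the appeal of the CORDIC DDFS.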
26.2.3.3
Decimate/Interpolate Filters
Variable rate interpolation and decimation filters play a very important role in the development of highly flexible and self-contained all-digital receivers. As previously mentioned, they are the critical element of all digital timing recovery loops, as well as of systems capable of operating at a host of user defined symbol rates. Additionally, digital resampling allows the ASIC designer to ensure that the clock frequency in every portion of the circuit is the minimum needed to properly represent the signals. This can have a significant impact on the size and power consumption of the resulting ASIC, since power scales with the clock frequency and the square of the supply voltage. Thus, for a given circuit with a critical path of, say, t seconds, if the data rate into the block is lowered by a factor K, then the frequency dependent portion of the dissipated power is scaled by the same factor; however, additional power savings can be achieved by noting that the block now has Kt seconds to complete its task. Because the speed of a digital circuit is proportional to the supply voltage, we can reduce the supply voltage and still ensure that the circuit meets the speed constraints. Given the coefficients of an FIR decimation or interpolation filter, the structure of choice for the realization of a decimate-by-D or an interpolate-by-D filter is the polyphase structure [12] shown in Fig. 26.5. The attractiveness of this structure lies in the fact that the filter is always operated at the lower sampling frequency. In many cases it is desirable to resample the signal by a power of two, 2^N, in which case N decimate (interpolate) by two stages can be cascaded one after the other. Each stage consists of a halfband filter followed by a decimate-by-two operation. The halfband filter can be realized using the polyphase structure to simplify its implementation.
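The power argument can be made concrete with a toy calculation; the supply voltages below are assumed values for illustration only:

```python
# Dynamic power scales roughly as P ~ f * V^2 for fixed switched capacitance.
def power_ratio(k, v_old, v_new):
    """Relative power after cutting the data rate by a factor k and
    lowering the supply from v_old to v_new (illustrative model)."""
    return (1.0 / k) * (v_new / v_old) ** 2

# Halving the rate alone gives 0.5x; the extra timing slack then permits a
# lower supply (3.3 V -> 2.0 V assumed here), for roughly 5x total reduction.
print(f"{power_ratio(2, 3.3, 2.0):.2f}x")
```

Halving the clock contributes a factor of two, and the quadratic supply term contributes the rest, which is why resampling down to the minimum adequate rate pays off twice.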
Moreover, these filters are typically very small, consisting of anywhere from 7 to 15 taps depending on the specified stopband attenuation and the size of the transition band. Their implementation can be simplified by exploiting the fact that close to half of the coefficients are zero and the remainder are symmetric about the main tap, due to the linear phase characteristics of the halfband filter. Finally, since these are fixed-coefficient filters, they can be realized using CSD coefficients [13]. It is interesting to note that for the special case of a decimate (interpolate) by 2^N, it is possible to reuse the same hardware element and simply recirculate the data through it. In this architecture, the filter is run at the highest data sampling rate. The first pass through the filter uses up 1/2 of its computational resources, the second pass uses up 1/4 of the resources, and so on [14]. Although conceptually attractive, the clock generation circuit for such an architecture is quite critical and complex, and this approach loses its appeal for recirculating factors greater than 3 or 4.
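The polyphase identity of Fig. 26.5, that filtering in D branches at the low output rate reproduces exactly the result of filtering at the high rate and keeping every Dth sample, can be verified with a short script. The coefficients and input here are arbitrary test data:

```python
# Polyphase decimate-by-D: split h into D subfilters h[p::D] and run them
# at the low output rate. y[k] = sum_m h[m] * x[k*D - m].
def polyphase_decimate(x, h, D):
    subfilters = [h[p::D] for p in range(D)]       # branch p: h_p, h_{p+D}, ...
    y = []
    for k in range(len(x) // D):
        acc = 0.0
        for p, hp in enumerate(subfilters):
            for q, coeff in enumerate(hp):
                idx = k * D - p - q * D
                if 0 <= idx < len(x):
                    acc += coeff * x[idx]
        y.append(acc)
    return y

# Reference: plain convolution followed by discarding D-1 of every D samples.
def direct_decimate(x, h, D):
    full = [sum(h[m] * x[n - m] for m in range(len(h)) if 0 <= n - m < len(x))
            for n in range(len(x))]
    return full[::D]

x = [float(i % 7) for i in range(32)]
h = [0.5, 1.0, 0.5, 0.25]        # arbitrary (dyadic, so comparison is exact)
assert polyphase_decimate(x, h, 2) == direct_decimate(x, h, 2)
```

The polyphase version performs the same multiply-adds but only at the output rate, which is precisely the hardware saving the text describes.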
FIGURE 26.5 Polyphase filter structures for interpolation and decimation: for interpolation by U, branch p holds coefficients h_p, h_{p+U}, h_{p+2U}, . . . ; for decimation by D, branch p holds h_p, h_{p+D}, h_{p+2D}, . . . .
FIGURE 26.6 Variable rate interpolation: the input at rate 1/Ts is conceptually passed through a DAC and a continuous time filter and resampled at 1/Ti.

FIGURE 26.7 Farrow structure: fixed subfilters b3(i), b2(i), b1(i), b0(i) process x(m), and their outputs are combined through multipliers nested in μk to form y(k).
In cases where the oversampling ratio is large (e.g., a narrowband signal), an alternative approach using a cascaded integrator-comb (CIC) structure can be used to implement a multiplierless decimator. The interested reader is referred to [15] for a brief overview of a CIC ASIC. The continuously variable decimator block shown in Fig. 26.2 can resample the input signal by any factor α in the range of 2–4. The operation of this block is equivalent to that shown in Fig. 26.6, where the input data x(n), originally sampled at 1/Ts, is resampled to produce an output sequence y(k) sampled at 1/Ti. The entire operation is performed digitally. To better understand the operation of this block, let us define the variable μk to be the time difference between the output sample y(k) and the most recent input sample x1. The job of the variable rate interpolator is to weight the adjacent input samples ( . . . , x0, x1, . . . ) based on the ratio μk/Ts and add the weighted input samples to obtain the value of the output sample y(k). Mathematically, a number of interpolation schemes can perform the desired operation; however, many of them, such as sinc-based interpolation, require excessive computational resources for a practical hardware implementation. For real-time calculation, Erup et al. [16] found polynomial-based interpolation to yield satisfactory results while minimizing the hardware complexity. In this approach, the weights of the input samples are given as polynomials in the variable μk and can be easily implemented in hardware using the Farrow structure [17] shown in Fig. 26.7. In this structure, all the filter coefficients are fixed, and the polynomials in μk are realized by nesting the multipliers. The signal contained in the image band will cause aliasing after resampling; however, proper choice of the coefficients in the Farrow structure can help optimize the frequency response of the interpolator for a particular application. An alternative method to determine the filter coefficients is outlined in [18]; the frequency response of the resulting polynomial-based interpolator is shown in Fig. 26.8.

FIGURE 26.8 Frequency response of polynomial-based interpolator [18].
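A minimal Farrow interpolator can be sketched as follows. The classical cubic-Lagrange coefficient matrix is used here as an illustrative choice; the coefficients of [16,18] are optimized differently:

```python
# Cubic-Lagrange interpolator in Farrow form: four fixed FIR "coefficient
# filters" produce c0..c3 from the taps, then a Horner ladder in mu forms
# the output. Rows are mu^0..mu^3; columns are taps x[n-1], x[n], x[n+1], x[n+2].
FARROW = [
    [0.0,  1.0,  0.0,  0.0],
    [-1/3, -1/2, 1.0, -1/6],
    [1/2,  -1.0, 1/2,  0.0],
    [-1/6,  1/2, -1/2, 1/6],
]

def farrow_interp(x, n, mu):
    """Value of the signal between x[n] and x[n+1], at fraction mu in [0, 1)."""
    taps = x[n - 1 : n + 3]
    c = [sum(w * t for w, t in zip(row, taps)) for row in FARROW]
    y = 0.0
    for coeff in reversed(c):        # Horner: ((c3*mu + c2)*mu + c1)*mu + c0
        y = y * mu + coeff
    return y

# A straight line is reproduced exactly by cubic interpolation.
x = [2.0 * t for t in range(8)]
print(farrow_interp(x, 3, 0.25))
```

Only μk changes from output to output; the subfilters stay fixed, which is what makes the structure attractive for continuously variable resampling in hardware.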
26.2.4
Conclusions
Section 26.2 reviewed trends in the wireless communications industry toward high speed data communications in both the macrocellular and the microcellular environments. These trends will push designers of the underlying digital circuits toward dedicated circuits and ASICs, and the section has outlined the major signal processing tasks that such ASICs will have to implement.
References

1. K. Pahlavan, et al., "Wideband local access: wireless LAN and wireless ATM," IEEE Commun. Mag., pp. 34–40, Nov. 1997.
2. J. Mikkonen, et al., "Emerging wireless broadband networks," IEEE Commun. Mag., vol. 36, no. 2, pp. 112–117, Feb. 1998.
3. E. Dahlman, B. Gudmundson, M. Nilsson, and J. Skold, "UMTS/IMT-2000 based on wideband CDMA," IEEE Commun. Mag., pp. 70–80, Sept. 1998.
4. Y. Furuya, "W-CDMA: an approach toward next generation mobile radio system, IMT-2000," Proc. IEEE GaAs IC Symposium, pp. 3–6, Oct. 1997.
5. A. Duel-Hallen, J. Holtzman, and Z. Zvonar, "Multiuser detection for CDMA systems," IEEE Pers. Commun. Mag., pp. 46–58, April 1995.
6. http://www.ti.com/sc/docs/dsps/products.htm
7. B. Daneshrad, et al., "Performance and implementation of clustered OFDM for wireless communications," ACM MONET special issue on PCS, vol. 2, no. 4, pp. 305–314, 1997.
8. H. Samueli, "An improved search algorithm for the design of multiplierless FIR filters with powers-of-two coefficients," IEEE TCAS, vol. 36, no. 7, pp. 1044–1047, July 1989.
9. H.T. Nicholas, III and H. Samueli, "A 150-MHz direct digital frequency synthesizer in 1.25-μm CMOS with −90 dBc spurious performance," IEEE JSSC, vol. 25, no. 12, pp. 1959–1969, Dec. 1991.
10. A. Madisetti, A. Kwentus, and A.N. Willson, Jr., "A sine/cosine direct digital frequency synthesizer using an angle rotation algorithm," Proc. IEEE ISSCC '95, pp. 262–263.
11. E. Grayver and B. Daneshrad, "Direct digital frequency synthesis using a modified CORDIC," IEEE ISCAS, June 1998.
12. J.G. Proakis and D.G. Manolakis, Introduction to Digital Signal Processing, Macmillan, London, 1988.
13. J. Laskowsky and H. Samueli, "A 150-MHz 43-tap halfband FIR digital filter in 1.2-μm CMOS generated by silicon compiler," Proc. IEEE CICC '92, pp. 11.4/1–4, May 1992.
14. T.J. Lin and H. Samueli, "A VLSI architecture for a universal high-speed multirate FIR digital filter with selectable power-of-two decimation/interpolation ratios," Proc. ICASSP '91, pp. 1813–1816, May 1991.
15. A. Kwentus, O. Lee, and A. Willson, Jr., "A 250 Msample/sec programmable cascaded integrator-comb decimation filter," VLSI Signal Processing, IX, IEEE, New York, pp. 231–240, 1996.
16. L. Erup, F.M. Gardner, and R.A. Harris, "Interpolation in digital modems. II. Implementation and performance," IEEE Trans. on Commun., vol. 41, no. 6, pp. 998–1008, June 1993.
17. C.W. Farrow, "A continuously variable digital delay element," Proc. ISCAS '88, pp. 2641–2645, June 1988.
18. J. Vesma and T. Saramaki, "Interpolation filters with arbitrary frequency response for all-digital receivers," IEEE ISCAS '96, pp. 568–571, May 1996.
26.3
Communication System-on-a-Chip
Samiha Mourad and Garret Okamoto

26.3.1
Introduction
Communication traffic worldwide is exploding: wired and wireless, data, voice, and video. This traffic is doubling every 100 days, and it is anticipated that there will be a million people online by 2005. Today, more people are actually using mobile phones than are surfing the Internet. This unprecedented growth has been encouraged by the deployment of digital subscriber lines (DSL) and cable modems, which telephone companies have provided promptly and at a relatively low price. Virtual corporations have been created by the availability and dependability of communication products such as laptops, mobile phones, and pagers, which all support mobile employees. For example, vending machines may contact their suppliers when the merchandise level is low, so that suppliers can remotely vary the prices of the merchandise according to supply and demand. With such proliferation in communication products and the need for high volume, high speed transfer of data, new standards such as ATM and the ITU-T recommendations are being developed. In addition, a vast body of knowledge, central to problems arising in the design and planning of communication systems, has been published; however, in fabricating products to meet these needs, the industry has continually attempted to use new design approaches that have not been fully researched or documented. Communication devices need to be of small size and low power dissipation for portability, and need to operate at very high speed. Any of these devices, like other digital products, may consist of a single integrated circuit (IC) or, more likely, many ICs mounted on a printed circuit board (PCB). Although the new technology (small feature size) has resulted in higher speed ICs, the transfer of data from one IC to another still creates a bottleneck of information. The I/O pads, with their increasing inductance, cause supply surges that compromise signal integrity.
As an alternative to PCB design, another design approach known as the multichip module (MCM) consists of placing more than one chip in the same package. The connections between modules, however, have a large capacitive load that slows down communication among the modules. In the late 1990s, a new design paradigm called system-on-a-chip (SoC) was successfully used to integrate the components of an entire system on one chip. This is in contrast to the traditional design where the components are implemented in separate ICs and then assembled on a PCB. Section 26.3.2 describes the SoC design paradigm and outlines its beneficial attributes. The remainder of the section concentrates on communication devices: Section 26.3.3 emphasizes the need for these systems, and descriptions of communication SoCs and projections on their characteristics are given in Section 26.3.4. Latency, an important attribute, is the subject of Section 26.3.5; and Section 26.3.6 describes the integration of these systems with analog parts in MCMs.
26.3.2
System-on-a-Chip (SoC)
The shift toward very deep submicron technology has encouraged IC designers to increase the complexity of their designs to the extent that an entire system is now implemented on a single chip. To increase the design productivity and decrease time-to-market, reuse of previously designed modules
FIGURE 26.9 A system-on-a-chip (SoC): RAM blocks, UDL, an interface block (RT level), a microprocessor (layout), a DSP (netlist), a controller (algorithm), and an FPGA.
is becoming common practice in SoC design; however, the reuse approach is not limited to in-house designs. It extends to modules that have been designed by others as well. Such modules are referred to as embedded cores. This design approach has encouraged the founding of several companies that specialize in providing embedded cores to multiple customers. It is predicted that in the near future cores will populate 90% of a chip, with 40% to 60% of them coming from external sources (Smith 1997). Except for a very few, individual companies do not have the wide range of expertise that can match the spectrum of design types in demand today. Core-based design, justified by the need to decrease time-to-market, has created a host of challenging problems for the design and testing community. First, there are legal issues for the core provider and the user regarding intellectual property (IP). Second, there are problems with integrating and verifying a mix of proprietary and external cores that are more involved than simply integrating ICs on a PCB. A typical SoC configuration is shown in Fig. 26.9. It consists of several cores, which are also referred to as modules, blocks, or macros; often, these terms are used interchangeably. These cores may be DSPs, RAM modules, or controllers. The same image of an SoC may be perceived as a PCB with the cores being the ICs mounted on it. It also resembles standard cells laid on the floor of an IC; in the latter case, the blocks are elementary gates of the same layout height. That is, they are all ICs in the PCB case or all standard cells in the IC case. For an SoC, they may be of several types, as described below. A UDL is user defined logic that is basically equivalent to glue logic in microprocessors. Cores are classified in three categories: hard, firm, and soft (Gupta 1997). Hard cores are optimized for area and performance, and they are mapped into a specific technology and possibly a specific foundry.
They are provided as layout files that cannot be modified by the users. Soft cores, on the other hand, may be available as technology-independent HDL files. From a design point of view, the layout of a soft core is flexible, although some guidelines may be necessary for good performance. This flexibility allows optimization to the desired levels of performance or area. Firm cores are usually provided as technology-dependent netlists using library cells whose size, aspect ratio, and pin location can be changed to meet the customer's needs. Table 26.1 summarizes the attributes of reusable cores. The table indicates a clear trade-off between design flexibility on one hand and predictability, and hence time-to-market, performance, and complexity on the other. Soft cores are easily embedded in a design. The ASIC designers have complete control over the implementation of such a core, but it is the designer's job to optimize it for area, test, or power performance. Hard cores are very appropriate for time critical applications, whereas soft cores are candidates for frequent customization. The relationship between flexibility and predictability is illustrated in Fig. 26.10. The cores can also be classified from a testing perspective. For example, there is typically no way to test a hard core unless the supplier provides a test set for this core, whereas a test set for the soft core needs to
TABLE 26.1 Categorizing Reusable Cores

Type | Flexibility                  | Design Flow               | Representation       | Libraries                                 | Process Technology | Portability
Soft | Very flexible, unpredictable | System design, RTL design | Behavioral, RTL      | Not applicable                            | Independent        | Unlimited
Firm | Flexible                     | Floor planning, placement | RTL, blocks, netlist | Reference, footprint, timing model        | Generic            | Library mapping
Hard | Inflexible, predictable      | Routing, verification     | Polygon data         | Process specific library and design rules | Fixed              | Process mapping

Source: Hunt 1996.
FIGURE 26.10 Trade-offs among types of cores (Hunt 1996): from soft to firm to hard, flexibility decreases while predictability, performance, and complexity increase.
be created if it is not provided by the core provider. This makes hard cores more demanding when developing a test strategy for the chip. For example, it would be difficult to transport through hard cores a test for an adjacent block, which may be another core or a UDL component. In some special cases, the problem may be alleviated if the core includes well-described testability functions.
26.3.2.1
Design and Test Flow
An integrated design and test process is highly recommended, and nowhere is this approach more appropriate than for core-based systems. Conceptually, the SoC paradigm is analogous to the integration of several ICs on a PCB, but there is a fundamental difference. Whereas on a PCB the different ICs have been designed, verified, fabricated, and tested independently from the board, fabrication and testing of an SoC are done only after integration of the different cores. This implies that even if the cores are accompanied by test sets, incorporating those test sets is not simple and must be considered while integrating the system. In other words, reuse of a design does not translate into easy reuse of its test set. What makes this task even more difficult is that the system may include different cores that have different test strategies. Also, the cores may cover a wide range of functions as well as a diverse range of technologies, and they may be described at different levels, from HDL languages such as Verilog, VHDL, and Hardware C down to GDSII layout. The basic design flow applies to SoC design in the sense that the entire system needs to be entered, debugged, modified for testability, validated, and mapped to a technology, but all of this has to be done in an integrated framework. Before starting the design process, an overall strategy needs to be charted to
facilitate the integration. In this respect, the specification phase is enlarged and a test strategy is included. This shifts more of the design effort to the system level and less to the logic level. The design must first be partitioned. Then decisions must be made on such questions as:

- Which partition can be instantiated by an existing core?
- Should a core be supplied by a vendor or designed in-house?
- What type of core should be used?
- What is the integration process to facilitate verification and testing?
Because of the wide spectrum of core choices and the diversity of design approaches, SoC design requires a meta-methodology: a methodology that can streamline the demands of all the other methodologies used to design and test the reusable blocks, as well as their integration with user defined logic. To get the most out of core-based design, an industry group deemed it necessary to establish a common set of specifications. This group, known as the virtual socket interface alliance (VSIA), was announced formally in September 1996. Its intent is to establish standards that facilitate communication between core creators and users, the SoC designers (IEEE 1999a). An example of using multiple cores is the IBM-designed PowerPC product line, based on the PowerPC 40X chip series (Rincon 1997). The PowerPC micro-controller consisted of a hard core and several soft cores. For timing critical components such as the CPU, a hard core was selected, while soft cores were used for peripheral functions such as the DMA controller, external bus interface unit (EBIU), timers, and serial port unit (SPU). The EBIU may be substituted by, say, a hard core from Rambus. A change in the simulation and synthesis processes is required for embedded cores, due primarily to the need to protect the intellectual property of the core provider. Firm cores may be encrypted in such a manner as to respond to the simulator without being readable by humans. For synthesis, the core is instantiated in the design. In the case of a soft core, the parameters are sometimes scaled to meet the design constraints. To preserve the core performance, the vendor may include an environment option to prevent the synthesis program from changing some parts of the design. This protects the core during optimization, although the designer may remove such an option and make changes in the design. A hard or a firm core is treated as a black box from the library and goes through the synthesis process untouched.

26.3.2.2
Advantages of SoCs
The overall size of the end product is reduced because manufacturers can put the major system functions on a single chip, as opposed to putting them on several chips. This reduces the total number of chips needed for the end product and, for the same reason, the power consumption. SoC products provide faster chip speeds due to the integration of the components/functions into one chip. Many applications, such as high-speed communication devices (VoIP, MoIP, wireless LAN, 3G cellular phones), require chip speeds that may be unattainable with separate IC products. This is primarily due to the physical limitations of moving data from one chip to another through bonding pads, wires, buses, etc. Integrating components/functions into one chip eliminates the need to physically move data from one chip to another, thereby producing faster chip speeds. Another important advantage of SoCs is the reuse of previously designed circuits, thereby reducing the design process time; this consequently translates into shorter time-to-market. In addition to decreasing time-to-market, it is very important to decrease the cost of packaging and testing, which are constantly increasing with the finer technology features. Instead of testing several chips and the PCB on which they are assembled, testing is reduced to only one IC. SoCs are, however, very complex, and standards are now being developed to facilitate their testing (IEEE 1995b). In the remainder of this section, we focus on communication systems, which we will refer to as communication SoCs or simply SoCs.
26.3.3 Need for Communication Systems
Public switched telephone networks (PSTN) are becoming congested due to increasing Internet traffic as shown in Fig. 26.11. This drives the development of broadband access technology and high-speed optical
Vojin Oklobdzija/Digital Systems and Applications 6195_C026 Final Proof page 20 11.10.2007 8:31pm Compositor Name: TSuresh
26-20
Digital Systems and Applications
FIGURE 26.11 Internet growth: (a) WWW users and devices in millions, 1995–2002; (b) annual bandwidth growth, 1997–2000.
networks. Another important factor is the convergence of voice, data, and video. As a consequence, there is a need for low- and uniform-latency devices for real-time traffic. In addition, Internet service providers (ISPs) and corporate intranets need voice and data IP gateways. Mobile users drive the development of wireless and satellite devices, and there is an increasing demand for routers/switches, DSL modems, etc. All of the needs mentioned above call for smaller and faster communication devices. Telephone calls that used to last an average of three minutes now exceed an hour or more when connected to the Internet. This has increased the demand for DSL, and transmitting data over Internet protocols (IP), as in voice-over-IP (VoIP) and mobile-over-IP (MoIP), and over wireless links requires speeds that may be unattainable with separate IC products. Examples of products:

1. 2G and 3G wireless devices (CDMA2000, WCDMA), etc.
2. DSL modems
3. Infrastructure, carrier, and enterprise circuit-switched, packet-switched, and VoIP devices
4. Satellite modems
5. Cable modems and HFC routing devices
6. De/MUX for data streams on optical networks
7. Web browsers (WAP) or short messaging systems (i-mode)
8. LAN telephony
9. ATM systems
10. Enterprise, edge network, and media-over-IP switches and high-speed routers
11. Wireless LAN (IEEE 802.11, IEEE 802.11a, and IEEE 802.11b)
12. Bluetooth
Perhaps the most important example of an emerging wireless communication standard is Bluetooth. This is a wireless personal area network (PAN) technology from the Bluetooth special interest group (SIG), founded in 1998 by Ericsson, IBM, Intel, Nokia, 3Com, Lucent, Microsoft, Motorola, and Toshiba. Bluetooth is an open standard for short-range transmission of digital voice and data between mobile devices (cellular phones, PDAs, laptops) and desktop devices. Bluetooth may provide a common standard that enables PDAs, laptop and desktop computers, cellular phones, thermostats, and virtually every other home and business electronic device to communicate with each other. Manufacturers will rely on SoC advances to help reach the target of $5 added cost to a consumer appliance by 2001. A study by Merrill Lynch projected that Bluetooth semiconductor revenue would reach $3.4 billion in 2005, with
Bluetooth included in 1.7 billion devices that year, and the Bluetooth SIG estimated that the technology would be a standard feature in 100 million mobile phones by the end of 2001.
26.3.4 Communication SoCs
The exponential growth of the Internet and of bandwidth, shown in Fig. 26.11, indicates that more communication products are geared towards this technology, which requires a communication mode different from that used in traditional switched telephony. For example, in a PSTN, circuit switching is used, which requires a dedicated physical circuit through the network for the life of a telephone session. In Internet and ATM technology, however, packet switching is used. Packet switching is a connectionless technology, in which a message is broken into several small packets to be sent to a destination. The packet header contains the destination and source addresses, plus a sequence number so that the message can be reassembled. There is a paradigm shift in digital communication, motivated by the evolution of the Internet into a mission-critical service, that demands migration from circuit switching to packet switching. The older paradigm carried data traffic as part of the telephone network, whereas the new paradigm supports the convergence of voice, data, and video, carrying voice traffic as part of the data network and requiring a new class of media-over-IP systems, hence communication SoCs for VoIP. Most communication SoCs consist of a few components clustered around a central processing unit (CPU), which controls some or all of the following: (1) packet processing, (2) programmable DSPs for data and signaling algorithm/protocol implementation, (3) I/O for interfacing with voice and data networks such as ATM, PCI, Ethernet, and H.100/H.110, (4) a memory system for intermediate storage of voice and data streams, (5) hardwired DSPs or accelerators for codecs and multilevel mod/demod to increase system throughput, and (6) MPEG cores for media-over-IP (MoIP) processing (Fig. 26.12). Communication SoCs are actually a mix of software and hardware. Some of the circuits contain hardwired algorithms for code processing, but software for protocols that process data can be stored on the chip.
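The header/sequence-number mechanism described above can be sketched in a few lines. The field layout and helper names below are illustrative only; real protocols such as IP define their own header formats:

```python
import struct

# Illustrative header: source, destination, sequence number (8 bytes total).
HEADER_FMT = "!HHI"

def packetize(message: bytes, src: int, dst: int, size: int = 4):
    """Break a message into packets, each carrying addressing and a sequence number."""
    return [struct.pack(HEADER_FMT, src, dst, seq) + message[i:i + size]
            for seq, i in enumerate(range(0, len(message), size))]

def reassemble(packets):
    """Packets may arrive out of order; sequence numbers restore the message."""
    chunks = {}
    for p in packets:
        src, dst, seq = struct.unpack(HEADER_FMT, p[:8])
        chunks[seq] = p[8:]
    return b"".join(chunks[s] for s in sorted(chunks))

pkts = packetize(b"connectionless delivery", src=1, dst=2)
pkts.reverse()  # simulate out-of-order arrival over a connectionless network
assert reassemble(pkts) == b"connectionless delivery"
```

Each packet is self-describing, which is exactly what lets a connectionless network route them independently.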
Figure 26.13 shows the software for a typical VoIP SoC. This includes several layers of software and IP, such as:
1. Telephony signaling: network interface protocols, which include address translation and parsing, and protocols such as H.3xx, media gateway control protocol (MGCP), and real-time conferencing protocol (RTCP).
2. Voice processing: includes the voice-coding unit using G.xx protocols, voice activity detection (VAD), and comfort noise generation (CNG), which is used in fax-to-fax communication.
3. User interface: provides system services to the user, such as keypad and display drivers and user procedures.
FIGURE 26.12 Components of a communication SoC: a CPU connected via the P-Bus to DSP cores (1…n) with algorithm accelerators and to I/O comm/data cores, and via the M-Bus to the memory system.
FIGURE 26.13 Software for a VoIP SoC: telephony signaling (call processing, address translation and parsing, H.323 protocols, H.225, H.245, SGCP/MGCP, RAS/RTSP), user interface (display, keypad, and audible drivers, user procedures), network management (agent, web server/Java apps, software upload), system services (RTOS, startup/init, POST, BSP, WDT driver, memory manager), voice processing (PCM interface with u-law/A-law/linear PCM, tone generation and detection, echo cancellation, gain control, VAD, G.711/G.726 voice coding, packet playout, RTP encapsulation, delay jitter and lost-packet handling), and network interface protocols (TCP, UDP, IP, MAC/ARP, Ethernet driver).
4. Network management: software upload and handling of Java applets.
5. Network interface protocols: such as the transmission control protocol (TCP) and user datagram protocol (UDP) of the TCP/IP suite, and the Ethernet driver.

Other software and protocols may also be included, such as packet processing and network management protocols, call control/signaling protocols, fax and modem tone detection, an echo canceller, VAD, CNG, a real-time operating system (RTOS), and other software components for MoIP systems. Communication SoCs that accomplish the above tasks are expected to grow in size, as projected in Fig. 26.14. The number of gates per chip will increase from one million in 1999 to 7 million in 2003. A major component of a communication SoC is the embedded memory banks, which are also expected to increase from 1 to 16 Mbit. The type of memory used will change from static RAM (SRAM) to enhanced dynamic RAM (EDRAM), which is much more compact.
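As an aside on the voice-coding unit mentioned above: codecs such as G.711 are built on mu-law (or A-law) companding, which gives quiet samples proportionally finer resolution than loud ones. Below is a sketch of the continuous mu-law curve only; the 8-bit quantization that the real codec applies on top is omitted:

```python
import math

MU = 255.0  # mu-law parameter used by G.711 in North America and Japan

def mulaw_compress(x: float) -> float:
    """Continuous mu-law companding of a sample x in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

# Round trip is lossless before quantization; the gain is largest for small x.
for x in (0.01, 0.1, 0.9):
    assert abs(mulaw_expand(mulaw_compress(x)) - x) < 1e-9
```

Companding is what lets 8 bits per sample carry telephone-quality speech instead of the 12–13 bits a uniform quantizer would need.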
FIGURE 26.14 Communication SoCs: density (1–7 Mgates) and embedded memory size (1–16 Mbit), with memory type moving from SSRAM through hybrid to EDRAM, 1999–2003.
FIGURE 26.15 Communication SoC processing power (MIPS, dashed line) and external memory bandwidth (MB/s, solid line), 1999–2003.
The processing power of these SoCs is also expected to increase, as illustrated in Fig. 26.15. Processing power is measured in millions of instructions per second (MIPS) and is predicted to grow from 100 to 1000 MIPS (dashed line) from 1999 to 2003. In the same period, the memory bandwidth (solid line) will increase from 100 to 1000 MB/s. The growth of the number of DSP processors per SoC is shown in Fig. 26.16a. With all of this growth, it is interesting that the price of SoCs is estimated to decrease according to the trend shown in Fig. 26.16b. Several predictions have been made for the bandwidth of communication chips; two of them are shown in Fig. 26.17. One assumes that bandwidth will triple each year over the next 25 years, as illustrated by the solid line (George Gilder, Telecosm). The other has it growing 8–16 times a year (Sun Microsystems). In the 1990s, Bill Gates claimed that "we will have infinite bandwidth in a decade of time" (Gates 1994).
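The growth rates quoted above are easy to cross-check with a little arithmetic (treating 1999–2003 as four annual steps):

```python
def annual_factor(start, end, steps):
    """Per-year growth factor implied by a start value, end value, and year count."""
    return (end / start) ** (1 / steps)

# 100 -> 1000 MIPS (and 100 -> 1000 MB/s) over 1999-2003:
soc = annual_factor(100, 1000, 4)   # about 1.78x per year
assert 1.77 < soc < 1.79

# Fig. 26.17's "2x every 3-4 months" is exactly the 8-16x/year range:
doubling_3mo = 2 ** (12 / 3)        # 16x per year
doubling_4mo = 2 ** (12 / 4)        # 8x per year
assert doubling_4mo == 8 and doubling_3mo == 16
```

So the bandwidth predictions in Fig. 26.17 outpace the projected SoC processing-power growth by roughly an order of magnitude per year.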
26.3.5 System Latency
Latency is defined as the delay experienced at a certain processing stage. The latency trends in Fig. 26.18 refer to the time taken to map voice data into a packet to be transmitted. Three main types of latency are usually identified:

- Frame/packetization delay
- Media processing delay/complexity of the system
- Bridging delay, e.g., for conferencing or a multi-SoC system

FIGURE 26.16 (a) Number of DSP processors per SoC, 1999–2003. (b) Price ($) per functional VoIP channel, 1999–2003.
FIGURE 26.17 Bandwidth trends (log scale), CY95–CY07: WAN/MAN bandwidth doubling every 3–4 months versus processor performance doubling every 18 months.
FIGURE 26.18 System latency: end-to-end delay from the first bit transmitted at the media transmitter, through processing delay at the sender, network transit delay between central offices/gateways (CO/GW), and processing delay at the receiver, to the last bit received.
These delays may occur at different times in the life of the data in the communication system. A simplified communication system is shown in Fig. 26.18. It starts with the sender transmitting data through the network to a receiver at the other end. The total system latency is known as the end-to-end delay. It extends from the time the first bit of a packet is sent to the time the last bit in the stream of data is received, i.e., it comprises:

- Delay in processing the data at the sending end
- Transit delay within the network
- Delay in processing the data at the receiving end
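Under this three-component breakdown, end-to-end delay can be modeled in one line; all numbers below are hypothetical:

```python
def end_to_end_delay(bits, bandwidth_bps, send_proc_s, transit_s, recv_proc_s):
    """Latency from first bit sent to last bit received (Fig. 26.18-style model)."""
    transmission = bits / bandwidth_bps  # time to clock all bits onto the link
    return send_proc_s + transmission + transit_s + recv_proc_s

# Hypothetical numbers: a 1500-byte packet on a 1 Mbit/s access link,
# 5 ms of processing at each end, and 20 ms of network transit delay.
d = end_to_end_delay(1500 * 8, 1_000_000, 0.005, 0.020, 0.005)
assert abs(d - 0.042) < 1e-9  # 12 ms transmission + 30 ms processing/transit
```

The model makes clear why packetization delay matters for voice: a fixed per-packet overhead is paid on every frame, however small the payload.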
With the use of SoCs, latency has been reduced, and this reduction is projected to continue as technology features get finer. The trend is illustrated in Fig. 26.19. Several SoCs may themselves be integrated in one multichip module (MCM), as discussed next.
26.3.6 Communication MCMs
Digital communication SoCs are usually connected to external analog functions and I/O, as depicted in Fig. 26.20. To optimize the interface between digital SoCs and analog functions, it is beneficial to
FIGURE 26.19 Latency (ms) for voice-to-packet conversion in communication SoCs, 1999–2003.
integrate both designs in an MCM. The simplest definition of an MCM is a single electronic package containing more than one IC (Doanne 1993). An MCM thus combines high-performance ICs with a custom-designed common substrate structure that provides mechanical support for the chips and multiple layers of conductors to interconnect them. Such an arrangement takes advantage of the performance of the ICs because the interconnect length is much shorter.

FIGURE 26.20 Communication MCMs: a comm. SoC, DDR/SDRAM, and specialized I/O and analog functions in one MCM package.

Multichip modules are not new; they preceded SoCs. They have several advantages: they improve the maximum external memory bandwidth achieved, reduce the size and weight of the product, increase the operating speed, and decrease the power dissipation of the system. However, they are limited by wiring capacitance to frequencies below 150 MHz (e.g., Sony's HandyCam), and thus by slower memory, in comparison with the massive parallel processing power of an SoC with embedded memory. MCM wide-bus pin-out is restricted by cost and yield, whereas an SoC provides high-throughput data processing with a wide 256–1024-bit on-chip data bus. System configurability is harder to achieve in an MCM than in an SoC, which is software configurable. Analog and digital functions are separately optimized in an MCM, while in an SoC many analog functions are optimized and their yield improved by using on-chip integrated DSP algorithms. Multiple communication SoCs and analog functions can be packaged on a single MCM. The advantage of MCMs is even more pronounced when the package is enhanced; for example, flip-chips or even more advanced packages may be used. The interconnections between the various SoCs and the memory chips are the major paths for crosstalk and other types of signal distortion. Reducing the routing length of the connections helps to increase the operating speed. This can be achieved with a chip-on-chip (CoC) module.
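The interconnect-length argument can be put in rough numbers with a time-of-flight estimate; the 0.5c propagation speed and the trace lengths below are illustrative assumptions, not measured values:

```python
C = 3.0e8        # speed of light in vacuum, m/s
PROP = 0.5 * C   # assumed signal propagation speed in a package/board trace

def time_of_flight_ns(length_m: float) -> float:
    """Propagation delay of a trace, in nanoseconds."""
    return length_m / PROP * 1e9

board_trace = time_of_flight_ns(0.10)   # ~10 cm between separately packaged chips
mcm_trace = time_of_flight_ns(0.005)    # ~5 mm between dies on an MCM substrate
assert abs(board_trace / mcm_trace - 20.0) < 1e-9  # delay scales with length
```

In practice the RC loading of long traces, not just time of flight, dominates the speed limit, which is why the text cites a capacitance-bound frequency ceiling for MCMs.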
The metal redistribution layers are fabricated on top of the processor and the two memory chips, while the original bond pads remain for wire bonding to the substrate. The memory chips can be mounted on top of the processor using flip-chip technology. Redistribution layers replace the bond wires and traces on the substrate to provide the interconnections between the memory chips
and the processor. Since known-good-die memory chips are usually used, testing only requires an open/short test between the processor and the memory chips. No burn-in or extensive memory tests are required, so the connection to the package ball can be removed once a new test program implements the open/short test of the memory interface through other I/O paths of the VGA processor.
26.3.7 Summary
Broadband access, infrastructure, carrier, and enterprise communication SoCs will demand higher MIPS, integration, and memory bandwidth. They will also demand lower latency, power dissipation, and cost per channel or function. Communication SoCs use programmable DSPs, hardwired DSP accelerators, and I/O to implement communication protocols and systems in a highly integrated form. Higher memory access frequencies, DSP interface speeds, and specialized analog functions will demand the integration of communication SoCs on communication MCMs.
References

Batista, Elisa, "Bluetooth Promises and Hurdles," Wired News, June 2000.
Doanne, D.A. and P.D. Franzon, Eds. (1993), Multichip Module Technologies and Alternatives: The Basics, Van Nostrand Reinhold, New York.
Gehring and Koutroubinas, "Designing cableless devices with the Bluetooth specification," Communication Systems Design, February 2000.
Gupta, R.K. and Y. Zorian (1997), "Introduction to core-based system design," IEEE Des. Test Comput., Vol. 14, No. 4, pp. 15–25.
Hunt, M. and J.A. Rowson (1996), "Blocking in a system on a chip," IEEE Spectrum, Vol. 36, No. 11, pp. 35–41.
IEEE (1999a), P1450 Web site, http://grouper.ieee.org/groups/1450/.
IEEE (1999b), P1500 Web site, http://grouper.ieee.org/groups/1500/.
Mourad, S. and Y. Zorian (2000), Principles of Testing Electronic Systems, Wiley, New York.
Mourad, S. and B. Greene (2000), "Scan-path based testing of system on a chip," Proc. IEEE International Conference on Electronics, Circuits and Systems, Cyprus, pp. 1081–1084.
Murray, B.T. and J.P. Hayes (1996), "Testing ICs: getting to the core of the problem," IEEE Computer, Vol. 29, No. 11, pp. 32–38.
Okamoto, G. (1999), Smart Antenna Systems and Wireless LANs, Kluwer Academic Publishers, Boston, MA.
Okamoto, G., S.-S. Jeng, and G. Xu (1999), "Evaluation of timing synchronization algorithms for the smart wireless LAN system," Proc. IEEE VTC '99, May 1999, pp. 2014–2018.
Okamoto, G. and C.-W. Chen (2000), "Capacity improvement of smart antenna systems via the maximum SINR beamforming algorithm," Proc. ICSPAT 2000, October 2000.
Okamoto, G., et al. (2000), "An improved algorithm for dynamic slot assignment for the SWL system," Proc. Asilomar 2000 Conference, Pacific Grove, CA, October 2000.
Smith, G. (1997), "Test and system level integration," IEEE Des. Test Comput., Vol. 14, No. 4.
Varma, P. and S. Bhatia (1997), "A structured test reuse methodology for core-based system chips," Proc. IEEE International Test Conference, pp. 294–302.
Zorian, Y. (1993), "A distributed BIST control scheme for complex VLSI devices," Proc. 11th IEEE VLSI Test Symposium, pp. 6–11.
Zorian, Y. (1997), "Test requirements for embedded core-based systems and IEEE P1500," Proc. IEEE International Test Conference, pp. 191–199.
Zorian, Y., et al. (1998), "Testing embedded-core based system chips," Proc. IEEE International Test Conference, pp. 135–149.
VSI (1998), VSI Alliance Web site, http://www.vsi.org/.
Digianswer, Bluetooth Web site, http://www.digianswer.com/bluetooth/.
26.4 Communications and Computer Networks
Mohammad Ilyas

The field of communications and computer networks deals with the efficient and reliable transfer of information from one point to another. The need to exchange information is not new, but the techniques employed to achieve it have been steadily improving. During the past few decades, these techniques have experienced unprecedented and innovative growth. Several factors have been, and continue to be, responsible for this growth. The Internet is the most visible product of this growth, and it has impacted the life of each and every one of us. This section describes the salient features and operational details of communications and computer networks. The material is organized into several subsections. Section 26.4.1 gives a brief history of the field of communications. Section 26.4.2 introduces communication and computer networks. Section 26.4.3 describes the operational details of computer networks. Section 26.4.4 discusses resource allocation mechanisms. Section 26.4.5 briefly describes the challenges and issues in communication and computer networks that are still to be overcome. Section 26.4.6 summarizes the section.
26.4.1 A Brief History
Exchange of information (communications) between two or more entities has been a necessity since the existence of human life. It started with some form and shape of human voice that one entity could create and other(s) could listen to and interpret. Over a period of several centuries, these voices evolved into languages. As the population of the world grew, more and more languages were born. For a long time, languages were used for face-to-face communications. If there were ever a need to convey some information (a message) over a distance, someone would be briefed and sent to deliver the message to a distant site. Gradually, additional methods were developed to represent and exchange information, including symbols, shapes, and eventually alphabets. This development facilitated information recording and the use of nonvocal means for exchanging information. Hence, preservation, dissemination, sharing, and communication of knowledge became easier. Until about 150 years ago, all communication was via wireless means, including smoke signals, the beating of drums, and the use of reflective surfaces for reflecting light signals (optical wireless). The efficiency of these techniques was heavily influenced by environmental conditions; for instance, smoke signals were not very effective in windy conditions. In any case, as we will note later, some of the techniques used centuries ago for conveying information over a distance were similar to the techniques we use today; the only difference is that their implementation is exceedingly more sophisticated now than it was then. As technological progress continued and electronic devices started to appear, the field of communications also began to make use of these innovative technologies. Alphabets were translated into their electronic representations so that information could be electronically transmitted.
Morse code was developed for telegraphic exchange of information. Further developments led to the use of the telephone. It is important to note that in the early days of these technological masterpieces, users would go to a common site where they could send a telegraphic message over a distance or have a telephone conversation with a person at a remote location. This was a classic example of resource sharing. Of course, human help was needed to establish a connection with remote sites. As the benefits of the advances in communication technologies were being harvested, electronic computers were also emerging and making the news. The earlier computers were not only expensive and less reliable, they were also huge in size. For instance, the computers that used vacuum tubes were the size of a large room and used roughly 10,000 vacuum tubes. These computers would stop working whenever a vacuum tube burnt out, and the tube would need to be replaced using a ladder. On average, those computers would function for only a few minutes before another vacuum tube's replacement was
necessary. A few minutes of computer time was not enough to execute a large computer program. With the advent of transistors, computers became not only smaller and less expensive but also more reliable. These aspects of computers resulted in their widespread application. With the development of personal computers, there is hardly any aspect of our lives that has not been impacted by the use of computers. The field of communications is no exception, and the use of computers has escalated our communication capabilities to new heights.
26.4.2 Introduction
Communication of information from one point to another in an efficient and reliable manner has always been a necessity. A typical communication system consists of the following components, as shown in Fig. 26.21:

- Source that generates or has the information to be transported
- Transmitter that prepares the information for transportation
- Transmission medium that carries the information from one end to the other
- Receiver that receives the information and prepares it for delivery to the destination
- Destination that takes the information from the receiver and utilizes it as necessary
The information can be generated in analog or in digital form. Analog information is represented as a continuous signal that varies smoothly in time. As one speaks into a microphone, an analog voice signal is generated. Digital information is represented by a signal that stays at some fixed level for some duration of time, followed by a change to another fixed level. A computer works with digital information that has two levels (binary digital signals). Figure 26.22 shows an example of analog and digital signals. Transmission of information can also be in analog or in digital form. Therefore, we have the following four possibilities in a communication system [21]:

- Analog information transmitted as an analog signal
- Analog information transmitted as a digital signal
- Digital information transmitted as an analog signal
- Digital information transmitted as a digital signal
FIGURE 26.21 A typical communication system: source, transmitter, transmission medium, receiver, and destination.

FIGURE 26.22 Typical analog and digital signals (amplitude versus time): (a) analog signal; (b) digital signal.
There may not be a choice regarding the form (analog or digital) of the information being generated by a device. For instance, the voice signal as one speaks, the video signal generated by a camera, the speed signal generated by a moving vehicle, and the altitude signal generated by the equipment in a plane will always be analog in nature. However, there is a choice regarding the form (analog or digital) of the information being transmitted over a transmission medium. Transmitted information can be analog or digital in nature, and information can easily be converted from one form to another. Each of these possibilities has its pros and cons. When a signal carrying information is transmitted, it loses energy and strength and gathers interference (noise) as it propagates away from the transmitter. If the energy of the signal is not boosted at some intermediate point, it may attenuate beyond recognition before it reaches its intended destination, which would certainly be a wasted effort. To boost the energy and strength of a signal, it must be amplified (in the case of analog signals) or rebuilt (in the case of digital signals). When an analog signal is amplified, the noise is amplified with it, which lowers the chances of receiving the signal at its destination in its original (or nearly original) form. Digital signals, on the other hand, can be processed and reconstructed at any intermediate point, so the noise can essentially be filtered out. Moreover, transmission of information in digital form has many other advantages, including processing of information for error detection and correction and the application of encryption and decryption techniques to sensitive information. Thus, digital transmission has become the dominant technology in the field of communications [9,18]. As indicated earlier, communication technology has experienced phenomenal growth over the past several decades.
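The amplify-versus-regenerate argument above can be simulated directly. The noise magnitudes and hop count below are arbitrary illustrative values:

```python
import random

random.seed(1)

def noisy_hop(samples, noise=0.05):
    """One transmission hop adds a little random noise to each sample."""
    return [s + random.uniform(-noise, noise) for s in samples]

bits = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0]

analog = digital = bits
for _ in range(20):  # 20 hops with a repeater after each one
    analog = noisy_hop(analog)                                       # amplifier: noise accumulates
    digital = [1.0 if s > 0.5 else 0.0 for s in noisy_hop(digital)]  # regenerator: noise discarded

assert digital == bits  # the digital signal is recovered exactly at every hop
assert analog != bits   # the analog signal has drifted from its original values
```

The regenerator works because each hop's noise stays below the decision threshold, so the signal is restored to a clean level before noise can accumulate.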
The following two factors have always played a critical role in shaping the future of communications [20]:

- Severity of user needs to exchange information
- State of the technology related to communications
Historically, inventions have always been triggered by the severity of needs, and this has been true for the field of communications as well. In addition, there is always an urge and curiosity to make things happen faster. When electricity was discovered and people (scattered around the globe) wanted to exchange information over longer distances and in less time, the telegraph was invented. Morse code was developed with shorter sequences (of dots and dashes) for the more frequent letters, which resulted in the transmission of messages in a shorter duration of time. The availability of electricity, and the capability of wires to carry information over longer distances, led to the development of devices that converted human voice into electrical signals, and thus to the development of telephone systems. Behind this invention was also a need/desire to establish full-duplex (two-way simultaneous) communication in human voice. As the use of the telephone became widespread, there was a need for a telephone user to be connected to any other user, and that led to the development of switching offices. In the early days, the switching offices were operated manually. As the state of the technology improved, the manual switching offices were replaced by automatic switching offices. Each telephone user was assigned a telephone number for identification purposes, and a user was able to dial the number to establish a connection with the called party. As computer technology improved and computers became more affordable and smaller in size, they found countless uses, including in communications. Computers not only replaced the automatic (electromechanical) switching offices, they were also employed in many other aspects of communication systems, such as conversion of information from analog to digital and vice versa, processing of information for error detection and/or correction, compression of information, and encryption/decryption of information.
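The frequency-to-length mapping just described can be seen in the real International Morse table; a small excerpt is enough to show the effect:

```python
# A few International Morse Code assignments: frequent letters get short codes.
MORSE = {"E": ".", "T": "-", "A": ".-", "N": "-.",
         "I": "..", "Q": "--.-", "J": ".---", "Z": "--.."}

def encode(word):
    """Encode a word as space-separated Morse sequences."""
    return " ".join(MORSE[c] for c in word)

# A word of frequent letters encodes far more compactly than one of rare letters.
assert len(encode("TEE")) < len(encode("JQZ"))
assert encode("EAT") == ". .- -"
```

This is an early, hand-designed form of the variable-length source coding later formalized by information theory.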
As computers became more powerful, there were many other applications that surfaced. The most visible application was the amount of information that users started sharing among themselves. The volume of information being exchanged among users has been growing exponentially over the last three decades. As users needed to exchange such a mammoth amount of information, new techniques were invented to facilitate the process. There was not only a need for users to exchange information with
others in an asynchronous fashion, there was also a need for computers to exchange information among themselves. The information exchanged in this fashion has different characteristics than the information exchanged through telephone systems. This need led to the interconnection of computers with each other, and that is what is called a computer network.
26.4.3 Computer Networks
A computer network is an interconnection of computers. The interconnection forms a facility that provides reliable and efficient means of communication among users and other devices. User communication in computer networks is assisted by computers, and the facility also provides communication among computers. Computer networks are also referred to as computer communication networks. Interconnection among computers may be via wired or wireless transmission media [5,6,10,13,18]. There are two broad categories of computer networks:

- Wide area networks
- Local/metropolitan area networks
Wide area computer networks, as the name suggests, span a wide geographical area and essentially have a global scope. On the other hand, local/metropolitan area networks span a limited distance. Local area networks are generally confined to an industrial building or an academic institution. Metropolitan area networks also have limited geographical scope, but one relatively larger than that of local area networks [19]. Typical wide and local/metropolitan area networks are shown in Fig. 26.23. Once a user is connected to a computer network, it can communicate with any other user that is also connected to the network at some point. It is not required that a user be connected directly to another user in order to communicate; in fact, in wide area networks, two communicating users will rarely be directly connected with each other. This implies that users share the transmission links for exchanging their information, which is one of the most important aspects of computer networks. Sharing of resources improves their utilization and is, of course, cost-effective as well. In addition to sharing transmission links, users also share the processing power of the computers at the switching nodes, the buffering capacity to store information at the switching nodes, and any other resources connected to the computer network. A user connected to a computer network at any switching node has immediate access to all the resources (databases, research articles, surveys, and much more) that are connected to the network as well. Of course, access to specific information may be restricted, and a user may require appropriate authorization to access it. Information from one user to another may need to pass through several switching nodes and transmission links before reaching its destination.
This implies that a user may have many options for selecting a sequence of transmission links and switching nodes to exchange its information, which adds to the reliability of the information exchange process: if one path is not available, not feasible, or not functional, some other path may be used. In addition, for better and more effective sharing of resources among several users, it is not appropriate to let any one user exchange a large quantity of information at a time; however, it is not uncommon for some users to have a large quantity of information to exchange. In that case, the information is broken into smaller units known as packets. Each packet is sent toward the destination as a separate entity, and all packets are assembled together at the destination side to re-create the original piece of information [2].

Because of the resource-sharing environment, users may not be able to exchange their information whenever they wish, because the resources (switching nodes, transmission links) may be busy serving other users. In that case, some users may have to wait before they can begin their communication. Designers of computer networks should design the network so that the total delay (including wait time) is as small as possible and the total amount of information successfully exchanged (throughput) is as large as possible.
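The packetization process just described can be sketched in a few lines of Python. This is a schematic illustration under names of our own choosing (`packetize`, `reassemble`), not any particular protocol's packet format:

```python
import random

def packetize(message: bytes, size: int) -> list[tuple[int, bytes]]:
    """Split a message into numbered packets of at most `size` payload bytes."""
    return [(seq, message[i:i + size])
            for seq, i in enumerate(range(0, len(message), size))]

def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the original message; packets may arrive out of order."""
    return b"".join(payload for _, payload in sorted(packets))

msg = b"a large piece of information to exchange"
pkts = packetize(msg, 8)
random.shuffle(pkts)          # a connectionless network may reorder packets
assert reassemble(pkts) == msg
```

The sequence number carried with each packet is what lets the destination restore the original order before reassembly.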
Vojin Oklobdzija/Digital Systems and Applications 6195_C026 Final Proof page 31 11.10.2007 8:31pm Compositor Name: TSuresh
26-31
Mobile and Wireless Computing
FIGURE 26.23 (a) A typical wide area computer communication network. (b) A typical local/metropolitan area communication bus network. (c) A typical local/metropolitan area communication ring network.
As can be noted, many aspects must be addressed to enable networks to transport users' information from one point to another. The major aspects are listed below:

- Addressing mechanism to identify users
- Addressing mechanism for information packets to identify their source and destination
- Establishing a connection between sender and receiver and maintaining it
- Choosing a path or a route (sequence of switching nodes and transmission links) to carry the information from a sender to a receiver
- Implementing a selected route or path
- Checking information packets for errors and recovering from errors
- Encryption and decryption of information
- Controlling the flow of information so that shared resources are not overtaxed
- Informing the sender that the information has been successfully delivered to the intended destination (acknowledgment)
- Billing for the use of resources
- Making sure that different computers running different applications and operating systems can exchange information
- Preparing information appropriately for transmission over a given transmission medium
This is not an exhaustive list of the items that need to be addressed in computer networks. In any case, all such issues are addressed by very systematic and detailed procedures called communication protocols. The protocols are implemented at the switching nodes by a combination of hardware and software. It is not advisable to implement all of these features in one module of hardware or software, because such a module would become very difficult to manage. Standard practice is to divide these features into smaller modules and then interface the modules together so that they collectively provide the implementation. The International Organization for Standardization (ISO) has suggested dividing these features into seven distinct modules called layers. The proposed model is referred to as the Open Systems Interconnection (OSI) reference model. The seven layers proposed in the OSI reference model are [2]:

- Application layer
- Presentation layer
- Session layer
- Transport layer
- Network layer
- Data link layer
- Physical layer
The physical layer deals with the transmission of information on the transmission medium. The data link layer handles the information on a single link, and the network layer deals with the path or route of information from the switching node where the source is connected to the switching node where the receiver is connected; it also monitors end-to-end information flow. The remaining four layers reside with the user equipment. The transport layer deals with the information exchange from the source to the receiver. The session layer handles the establishment of a session between the source and the receiver and maintains it. The presentation layer deals with the form in which information is presented to the lower layer; encryption/decryption of information can also be performed at this layer. The application layer deals with the application that generates the information at the source side and with what happens to it when it is delivered at the receiver side.

As the information begins from the application layer at the sender side, it is processed at every layer according to the specific protocols implemented at that layer. Each layer processes the information and appends a header and/or a trailer before passing it on to the next layer. The headers and trailers appended by the various layers contribute to the overhead but are necessary for transporting the information. Finally, at the physical layer, the bits of the information packets are converted to an appropriate signal and transmitted over the transmission medium. At the destination side, the physical layer receives the information packets from the transmission medium and prepares them for the next higher layer. As a packet is processed by the protocol layers at the destination side, its headers and trailers are stripped off before it is passed to the next layer. By the time the information reaches the application layer, it should be in the same form as when it was transmitted by the source.
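The append-on-send, strip-on-receive behavior of the layers can be illustrated with a toy stack. The layer names are the OSI names, but the bracketed headers are placeholders of our own invention, not real protocol headers:

```python
# Each layer wraps the data from the layer above with its own header,
# then unwraps in the reverse order at the receiver.
LAYERS = ["application", "presentation", "session", "transport",
          "network", "data link", "physical"]

def send(data: str) -> str:
    for layer in LAYERS:               # walk down the stack, adding headers
        data = f"[{layer}-hdr]{data}"
    return data                        # what goes on the wire

def receive(frame: str) -> str:
    for layer in reversed(LAYERS):     # walk up the stack, stripping headers
        prefix = f"[{layer}-hdr]"
        assert frame.startswith(prefix), f"malformed {layer} header"
        frame = frame[len(prefix):]
    return frame

assert receive(send("hello")) == "hello"
```

Note that the outermost header on the wire is the physical layer's, since it was appended last; the receiver therefore strips it first.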
Once a user is ready to send information to another user, there are two options: the user can establish a communication with the destination prior to exchanging information, or can simply give the information to the network and let the network deliver it to its destination. If communication is established prior to exchanging the information, the process is referred to as connection-oriented service and is implemented using virtual circuit connections. On the other hand, if no communication is established prior to sending the information, the process is called connectionless service, which is implemented using a datagram environment. In connection-oriented
(virtual circuit) service, all packets between two users travel over the same path through the network and hence arrive at their destination in the same order in which they were sent by the source. In connectionless service, however, each packet finds its own path through the network while traveling toward its destination. Each packet may therefore experience a different delay, and the packets may arrive at their destination out of sequence. In that case, the destination is required to put all the packets in the proper sequence before assembling them [2,10,13].

As in all resource-sharing systems, the allocation of resources in computer networks requires careful attention. The main idea is that the resources should be shared among the users of a computer network as fairly as possible. At the same time, it is desirable to maintain the network performance as close to its optimal level as possible. The definition of fairness, however, varies from one individual to another and depends on how one is associated with a computer network. Alongside fairness of resource sharing, two performance parameters are considered for computer networks: delay and throughput. The delay is the duration of time from the moment information is submitted by a user for transmission to the moment it is successfully delivered to its destination. The throughput is the amount of information successfully delivered to its intended destination per unit time. Because of the resource-sharing environment in computer networks, these two performance parameters are in conflict. It is desirable to have the delay as small as possible and the throughput as large as possible. To increase throughput, a computer network must handle increased information traffic, but the increased level of traffic also causes higher buffer occupancy at the switching nodes and, hence, more waiting time for information packets, which results in an increase in delay.
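The tension between throughput and delay can be made concrete with an idealized single-queue (M/M/1) model of one switching node. This is a standard queueing-theory sketch of our choosing, not a model prescribed by the text:

```python
def mm1_delay(arrival_rate: float, service_rate: float) -> float:
    """Mean time a packet spends at one switching node (M/M/1 queue)."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrivals exceed capacity")
    return 1.0 / (service_rate - arrival_rate)

service = 1000.0  # packets per second the node can process (illustrative)
for load in (0.5, 0.9, 0.99):  # offered traffic as a fraction of capacity
    print(f"throughput={load * service:7.1f} pkt/s  "
          f"delay={mm1_delay(load * service, service) * 1000:7.2f} ms")
```

Pushing throughput from 50% to 99% of capacity in this model multiplies the per-node delay fiftyfold, which is the buffer-occupancy effect the paragraph describes.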
On the other hand, if information traffic is reduced to reduce the delay, the throughput is adversely affected. A reasonable compromise between throughput and delay is necessary for the satisfactory operation of a computer network [10,11].

26.4.3.1 Wide Area Computer Networks
A wide area network consists of switching nodes and transmission links, as shown in Fig. 26.23a. The layout of switching nodes and transmission links is based on the traffic patterns and the expected volume of traffic flow from one site to another. Switching nodes give users access to the network and implement the communication protocols. When a user is ready to transmit information, the switching node to which the user is connected will establish a connection if a connection-oriented service has been chosen; otherwise, the information is transmitted in a connectionless environment. In either case, the switching nodes play a key role in determining the path of the information flow according to well-established routing criteria, which include performance (delay and throughput) objectives among other factors based on user needs.

For keeping the network traffic within a reasonable range, some traffic flow control mechanisms are necessary. In the late 1960s and early 1970s, when the data rates of transmission media used in computer networks were low (a few thousand bits per second), these mechanisms were fairly simple. A common method for controlling traffic over a transmission link or a path was an understanding that the sender would continue sending information until the receiver sent a request to stop, and resume as soon as the receiver sent another request. Basically, the receiver side had the final say in controlling the flow of information over a link or a path. As the data rates of transmission media started increasing, this method was no longer deemed efficient, and a sliding window scheme was adopted for controlling the flow of information over relatively faster transmission media. According to this scheme, the sender sends information packets continuously, but only up to a certain limit.
Once the limit has been reached, the sender stops sending information packets and waits for acknowledgment of the packets that have been transmitted. As soon as an acknowledgment is received, the sender may send another packet. This method ensures that no more than a certain number of packets are in transit from sender to receiver at any given time; again, the receiver controls the amount of information that the sender can transmit. These techniques for controlling the information traffic are referred to as reactive or feedback-based techniques, because the decision to transmit or not to transmit is based on the current traffic conditions.
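The sliding window discipline can be sketched as a trace of sends and acknowledgments. The in-order, one-ACK-frees-one-slot "network" below is a deliberate simplification for illustration:

```python
def sliding_window_send(packets, window):
    """Trace a sliding-window sender: at most `window` unacknowledged
    packets are in transit at once; when the window is full, the sender
    must wait for an acknowledgment before sending the next packet."""
    in_flight, log = [], []
    for pkt in packets:
        if len(in_flight) == window:     # window full: wait for an ACK
            acked = in_flight.pop(0)
            log.append(f"ack {acked}")
        in_flight.append(pkt)
        log.append(f"send {pkt}")
    return log

trace = sliding_window_send([0, 1, 2, 3], window=2)
# After the first two sends, every further send must be preceded by an ACK.
```

With a window of 2, the trace interleaves ACKs and sends so that the number of outstanding packets never exceeds the window.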
The reactive techniques are acceptable at low to moderate data rates. As data rates increase from kilobits per second to megabits and gigabits per second, the situation changes. Over the past several years, data rates have increased manifold: optical fibers provide enormously high data rates, the size of computer networks has grown tremendously, and the amount of traffic flowing through these networks has been increasing exponentially. Given that, the traffic control techniques used in earlier networks are no longer very effective [11,12,22]. One more factor adding to the complexity of the situation is that users now exchange different types of information through the same network. Consider the example of the Internet: its geographical scope is essentially global, extensive use of optical fiber as the transmission medium provides very high data rates, and users exchange any type of information they come across, including voice, video, data, and so on. All these factors have necessitated a modified approach to traffic management in computer networks. The main factor driving this change is that information packets move so fast through the networks that any feedback-based (or reactive) control is too slow to be of any use. Therefore, preventive mechanisms have been developed to keep the information traffic inside a computer network at a comfortable level. Such techniques are implemented at the sender side by ensuring that only as much traffic is allowed to enter the network as the network can comfortably handle [1,20,22].

Based on users' needs and the state of the technology, providing faster communications for different types of services (voice, video, data, and others) in the same computer network, in an integrated and unified manner, has become a necessity.
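Preventive, sender-side admission of traffic is often illustrated with a token-bucket regulator. The sketch below is our own minimal version with illustrative names, not a mechanism prescribed by the text:

```python
class TokenBucket:
    """Preventive (open-loop) traffic control at the network entry point:
    a packet is admitted only if a token is available. Tokens accrue at
    `rate` per time unit, up to `burst`, so the long-term admission rate
    is capped no matter how fast the user generates packets."""

    def __init__(self, rate: float, burst: float):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = burst, 0.0

    def admit(self, now: float) -> bool:
        # Refill tokens for the elapsed time, then spend one if possible.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # packet must wait at the edge (or be dropped)

# Ten back-to-back packets, one per tick, against a bucket that sustains
# one packet every two ticks with a burst allowance of three:
bucket = TokenBucket(rate=0.5, burst=3.0)
admitted = [bucket.admit(float(t)) for t in range(10)]
```

The initial burst is let through, after which admissions settle to the sustained rate; no feedback from inside the network is needed, which is the point of a preventive scheme.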
These computer networks are referred to as broadband integrated services digital networks (BISDNs). Broadband ISDNs provide end-to-end digital connectivity, and users can access any type of communication service from a single point of access. Asynchronous transfer mode (ATM) is expected to be used as the transfer mechanism in broadband ISDNs. ATM is essentially a fast packet switching technique in which information is transmitted in the form of small fixed-size packets called cells. Each cell is 53 bytes long and includes a header of 5 bytes. The information is primarily transported in a connection-oriented (virtual circuit) environment [3,4,8,12,17].

Another aspect of wide area networks is the processing speed of the switching nodes. As the data rates of transmission media increase, it is essential to have faster processing capability at the switching nodes; otherwise, the switching nodes become bottlenecks and the faster transmission media cannot be fully utilized. When the transmission medium is optical fiber, the incoming information at a switching node is converted from optical form to electronic form so that it may be processed and appropriately switched to an outgoing link; before transmission, it is converted back from electronic form to optical form. This slows down the information transfer process and increases the delay. To remedy this situation, research is being conducted to develop large optical switches for use as switching nodes. Optical switches will not require conversion of information from optical to electronic form and vice versa at the switching nodes; however, these switches must also possess the capability of optically processing the information. When reasonably sized optical switches become available, the use of optical fiber as the transmission medium together with optical switches will lead to all-optical computer and communication networks.
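The fixed 53-byte ATM cell format described above lends itself to a short segmentation sketch. The 5-byte header here is a zero-filled placeholder, not a real ATM header with VPI/VCI and HEC fields:

```python
ATM_CELL = 53    # bytes per cell, fixed by the ATM standard
ATM_HEADER = 5   # header bytes, leaving 48 bytes of payload
PAYLOAD = ATM_CELL - ATM_HEADER

def to_cells(data: bytes) -> list[bytes]:
    """Segment data into fixed-size ATM cells; the final cell's payload
    is zero-padded up to 48 bytes."""
    cells = []
    for i in range(0, len(data), PAYLOAD):
        chunk = data[i:i + PAYLOAD]
        header = bytes(ATM_HEADER)               # placeholder header
        cells.append(header + chunk.ljust(PAYLOAD, b"\x00"))
    return cells

cells = to_cells(b"x" * 100)   # 100 payload bytes -> 3 cells (48 + 48 + 4)
```

The fixed cell size is what makes hardware switching fast: every cell can be clocked through a switch in the same amount of time.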
Information packets will not need to be stored for processing at the switching nodes, and that will certainly improve the delay performance. In addition, wavelength division multiplexing techniques are allowing optical transmission media to be used to their fullest capacity [14].

26.4.3.2 Local and Metropolitan Area Networks
A local area network has a limited geographical scope (no more than a few kilometers) and is generally limited to a building or an organization. It uses a single transmission medium, with all users connected to the same medium at various points. The transmission medium may be open-ended (a bus), as shown in Fig. 26.23b, or it may be in the form of a loop (a ring), as shown in Fig. 26.23c. Metropolitan area networks also have a single transmission medium shared by all the users connected to the network, but the medium spans a relatively larger geographical area, up to 150 km.
They also use transmission media with relatively higher data rates. Local and metropolitan area networks use a layered implementation of communication protocols, as wide area networks do; however, these protocols are relatively simpler because of the simple topology, the absence of switching nodes, and the limited distance between senders and receivers. All users share the same transmission medium to exchange their information. Obviously, if two or more users transmit at the same time, the information from the different users will interfere, causing a collision. In such cases, the information of all users involved in the collision is destroyed and must be retransmitted. Therefore, there must be well-defined procedures so that all users may share the same transmission medium in a civilized manner and exchange information successfully. These procedures are called medium access control (MAC) protocols. There are two broad categories of MAC protocols:

- Controlled access protocols
- Contention-based access protocols
In controlled access MAC protocols, users take turns transmitting their information, and only one user is allowed to transmit at a time; when one user has finished transmitting, the next user begins. The control may be centralized or distributed. No collisions occur and, hence, no information is lost due to two or more users transmitting at the same time. Examples of controlled access MAC protocols include token-passing bus and token-passing ring local area networks. In both of these examples, a token (a small control packet) circulates among the stations: a station that has the token is allowed to transmit information, and the other stations wait until they receive the token [19].

In contention-based MAC protocols, users do not take turns transmitting. When a user becomes ready, it makes its own decision to transmit and thereby risks a collision with other stations that decide to transmit at about the same time. If no collision occurs, the information may be successfully delivered to its destination; if a collision does occur, the information from all users involved must be retransmitted. An example of a contention-based MAC protocol is carrier sense multiple access with collision detection (CSMA/CD), which is used in Ethernet. In CSMA/CD, a user senses the shared transmission medium prior to transmitting. If the medium is sensed as busy (someone is already transmitting), the user refrains from transmitting; if the medium is sensed as free, the user transmits. Intuitively, this MAC protocol should avoid collisions, but collisions still do take place. The reason is that transmissions travel along the transmission medium at a finite speed.
If one user senses the medium at one point and finds it free, that does not mean that another user located at another point of the medium has not already begun transmitting. This is referred to as the effect of the finite propagation delay of the electromagnetic signal along the transmission medium, and it is the single most important parameter causing performance deterioration in contention-based local area networks [11,19].

The design of local area networks has also been significantly impacted by the availability of transmission media with higher data rates. As the data rate of a transmission medium increases, the effects of propagation delay become even more visible. In higher-speed local area networks such as Gigabit Ethernet and 100BASE-FX, the medium access protocols are designed to reduce the effects of propagation delay; if special attention is not given to these effects, the performance of high-speed local area networks becomes very poor [15,19].

Metropolitan area networks deal with essentially the same issues as local area networks. These networks are generally used as backbones for interconnecting different local area networks. They are high-speed networks spanning a relatively larger geographical area, and their MAC protocols for sharing the transmission media are based on controlled access. The two most common examples of metropolitan area networks are the fiber distributed data interface (FDDI) and the distributed queue dual bus (DQDB). In FDDI, the transmission medium is in the form of two rings, whereas DQDB uses two
buses. The two FDDI rings carry information in opposite directions, an arrangement that improves the reliability of communication; in DQDB, the two buses likewise carry information in opposite directions. The MAC protocol for FDDI is based on token passing and supports voice and data communication among its users, while DQDB uses a reservation-based access mechanism and also supports voice and data communication among its users [19].
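The propagation-delay effect discussed earlier also dictates the minimum frame size in CSMA/CD networks: a sender must still be transmitting when news of a collision returns. A rough calculation, assuming a typical signal speed of about 2×10^8 m/s (the exact figures depend on the medium):

```python
def min_frame_bits(distance_m: float, data_rate_bps: float,
                   signal_speed: float = 2e8) -> float:
    """Smallest frame (in bits) whose transmission time covers the
    round-trip propagation delay, so a CSMA/CD sender is still
    transmitting when a collision signal gets back to it."""
    round_trip = 2 * distance_m / signal_speed
    return data_rate_bps * round_trip

# The penalty grows linearly with data rate: the same 2.5 km network
# needs a 100x larger minimum frame at 1 Gb/s than at 10 Mb/s.
print(min_frame_bits(2500, 10e6))   # classic 10 Mb/s Ethernet scale
print(min_frame_bits(2500, 1e9))    # gigabit rates
```

This is why high-speed LAN designs must either shrink the network diameter or pad short frames, as the text notes for Gigabit Ethernet.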
26.4.3.3 Wireless and Mobile Communication Networks
Communication without being physically tied to wires has always been of interest, and mobile and wireless communication networks promise exactly that. The last few years have witnessed unprecedented growth in wireless communication networks. Significant advancements have been made in the technologies that support the wireless communication environment, and there is much more to come. The devices used for wireless communication require certain features that wired communication devices do not necessarily need, including low power consumption, light weight, and worldwide communication ability.

In wireless and mobile communication networks, the access to the network is wireless, so the end users remain free to move; the rest of the communication path may be wired, wireless, or a combination of the two. In general, a mobile user has a wireless connection to a fixed communication facility while communicating, and the rest of the communication path remains wired. The range of wireless communication is always limited, and therefore the range of user mobility is limited as well. To overcome this limitation, the cellular communication environment has been devised. In a cellular environment, a geographical region is divided into smaller regions called cells, hence the name cellular. Each cell has a fixed communication device that serves all mobile devices within that cell. As a mobile device in active communication moves out of one cell and into another, service of that connection is transferred from one cell to the other; this is called the handoff process [7,16].

The cellular arrangement has many attractive features. Because the cell size is small, mobile devices do not need very high transmitting power to communicate, which leads to smaller devices that consume less power.
In addition, it is well known that the frequency spectrum usable for wireless communication is limited and can therefore support only a small number of wireless connections at a time. Dividing the communication region into cells allows the same frequencies to be reused in different cells, as long as those cells are sufficiently far apart to avoid interference; this increases the number of mobile devices that can be supported.

Advances in digital signal processing algorithms and faster electronics have led to very powerful, small, elegant, and versatile mobile communication devices. These devices offer tremendous mobile communication abilities, including wireless Internet access, wireless e-mail and news, and (though limited) wireless video communication on handheld devices. Wireless telephones are already available and operate in different communication environments across the continents. The day is not far when a single communication number will be assigned to every newborn and will stay with that person irrespective of his/her location.

Another field that is emerging rapidly is that of ad hoc wireless communication networks. These networks are of a temporary nature, established for a certain need and for a certain duration, and require no elaborate setup. As a few mobile communication devices come into one another's proximity, they can establish a communication network among themselves. Typical situations in which ad hoc wireless networks can be used are classroom environments, corporate meetings, conferences, disaster recovery situations, and so on. Once the need for networking is satisfied, the ad hoc networking setup disappears.
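The capacity benefit of frequency reuse described above can be quantified with a simple cluster calculation. The channel counts below are illustrative, not drawn from any deployed system:

```python
def system_capacity(total_channels: int, cluster_size: int,
                    num_cells: int) -> int:
    """Channels simultaneously usable across a cellular system: each
    cluster of `cluster_size` cells uses every channel exactly once,
    and the full channel set is reused in every cluster."""
    per_cell = total_channels // cluster_size
    return per_cell * num_cells

# 280 channels, 7-cell reuse clusters, a city covered by 84 cells:
# each cell gets 40 channels, and the system carries 3360 simultaneous
# calls, far more than the 280 a single-transmitter design could support.
print(system_capacity(280, 7, 84))
```

Shrinking the cells (more cells over the same area) raises capacity further, at the cost of more infrastructure and more frequent handoffs.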
26.4.4 Resource Allocation Techniques
As discussed earlier, computer networks are resource-sharing systems. Users share common resources such as the transmission media, the processing power and buffering capacity at the switching nodes, and other resources that are part of the network. A key to the successful operation of computer networks is a fair and
efficient allocation of resources among its users. Historically, there have been two approaches to allocating resources to users in computer networks:

- Static allocation of resources
- Dynamic allocation of resources
Static allocation means that a desired quantity of resources is allocated to each user, to use whenever needed; if a user does not use its allocated resources, no one else can. Dynamic allocation, on the other hand, means that a desired quantity of resources is allocated to users on the basis of their demands and for the duration of their need. Once the need is satisfied, the allocation is retrieved, and someone else can use the resources. Static allocation results in wasted resources, but it does not incur the overhead associated with dynamic allocation. Which technique should be used in a given situation is subject to the familiar concept of supply and demand: if resources are abundant and demand is not too high, it may be better to allocate resources statically; when resources are scarce and demand is high, dynamic allocation is almost a necessity to avoid waste.

Historically, communication and computer networks have dealt with both situations. Early communication environments used dynamic allocation of resources, when users would walk to a public call office to make a telephone call or send a telegraphic message. Some years later, static allocation was adopted, with users allocated their own dedicated communication channels that were not shared with others. In the late 1960s, the era of computer networks dawned with dynamic allocation of resources, and all communication and computer networks have continued this tradition to date. With the advent of optical fiber, it was felt that transmission resources were abundant and could satisfy any demand at any time, and many researchers and manufacturers favored going back to static allocation; however, the decision was made to continue with dynamic resource allocation, and that approach is here to stay for many years to come [10].
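The supply-and-demand trade-off can be seen in a toy comparison; the usage schedule below is invented purely for illustration:

```python
# Four users, each active in a different quarter of the hour. Static
# allocation dedicates one channel per user (mostly idle); dynamic
# allocation only needs enough channels for the peak simultaneous load.
schedules = {                 # (start_minute, end_minute) of activity
    "A": (0, 15), "B": (15, 30), "C": (30, 45), "D": (45, 60),
}

static_channels = len(schedules)            # one channel per user
dynamic_channels = max(                     # peak simultaneous activity
    sum(1 for s, e in schedules.values() if s <= t < e)
    for t in range(60)
)
print(static_channels, dynamic_channels)    # static needs 4, dynamic needs 1
```

When activity periods overlap heavily, the two figures converge and dynamic allocation's bookkeeping overhead buys little, which is the supply-and-demand point in the text.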
26.4.5 Challenges and Issues
Many challenges and issues related to communications and computer networks are still to be overcome; only the most important ones are described in this subsection.

The high data rates provided by optical fibers and the high-speed processing available at the switching nodes have lowered the delay in transferring information from one point to another. However, the propagation delay (the time for a signal to propagate from one end to the other) has essentially remained unchanged: it depends only on the distance, not on the data rate or the type of transmission medium. This is referred to as the latency versus delay issue [11]. In this situation, traditional feedback-based reactive traffic management techniques become ineffective, and new preventive techniques for effective traffic management and control are essential for achieving the full potential of these communication and computer networks [22].

The integration of different services in the same network has also posed new challenges. Each type of service has its own requirements for achieving a desired level of quality of service (QoS), and within the network, any attempt to satisfy QoS for one particular service can jeopardize the QoS requirements of other services. Therefore, any attempt to achieve a desired level of quality of service must be applied uniformly to the traffic inside a communication and computer network and should not be targeted at any specific service or user. That is another challenge that needs to be carefully addressed and solved [13].

Maintaining the security and integrity of information is another continuing challenge. The threat of sensitive information passively or actively falling into unauthorized hands is very real, as are proactive, unauthorized attempts to gain access to secure databases.
These issues need to be resolved to gain the confidence of consumers so that they may use the innovations in communications and computer networking technologies to their fullest [13].
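The latency-versus-delay point above can be quantified with the bandwidth-delay product: the amount of data already inside the network before any feedback can reach the sender. The link figures below are illustrative:

```python
def bits_in_flight(data_rate_bps: float, one_way_delay_s: float) -> float:
    """Bandwidth-delay product: bits committed to the 'pipe' before any
    feedback from the receiver can possibly reach the sender."""
    return data_rate_bps * 2 * one_way_delay_s   # one round trip of data

# Coast-to-coast link with ~20 ms one-way propagation delay:
kilobit_era = bits_in_flight(56e3, 0.020)    # ~2.2 kbits outstanding
gigabit_era = bits_in_flight(1e9, 0.020)     # 40 Mbits outstanding
print(kilobit_era, gigabit_era)
```

At kilobit rates, only a packet or two is exposed before feedback arrives; at gigabit rates, tens of megabits are already committed, which is why preventive rather than reactive control becomes necessary.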
26.4.6 Summary and Conclusions
Section 26.4 discussed the fundamentals of communications and computer networks and the latest developments in these fields. Communications and computer networks have witnessed tremendous growth and sophisticated improvements over the last several decades. Computer networks are essentially resource-sharing systems in which users share the transmission media and the switching nodes; they are used for exchanging information among users that are not necessarily connected directly. Transmission rates have increased manifold, and the processing power of the switching nodes (which are essentially computers) has also multiplied. Emerging computer networks support communication of different types of services in an integrated fashion, with all information, irrespective of its type and source, transported in the form of packets (e.g., ATM cells). Resources are allocated to users on a dynamic basis for better utilization. Wireless communication networks are emerging to provide worldwide connectivity and exchange of information at any time. These developments have also posed some challenges: effective traffic management techniques, meeting QoS requirements, and information security are the major challenges that must be surmounted in order to win the confidence of users.
Vojin Oklobdzija/Digital Systems and Applications 6195_C026 Final Proof page 39 11.10.2007 8:31pm Compositor Name: TSuresh
Mobile and Wireless Computing
26.5 Video over Mobile Networks
Abdul H. Sadka

26.5.1 Introduction
Due to the growing need for digital video information in multimedia communications, especially in mobile environments, research efforts have been focusing on developing standard algorithms for the compression and transport of video signals over these networking platforms. Digital video signals, by nature, require a huge amount of bandwidth for storage and transmission. A monochrome video clip of QCIF (176 × 144) resolution at a frame rate of 30 Hz, with an 8-bit luminance (intensity) value per pixel, requires over 742 kbytes of raw video data per second, so a 6-second clip occupies roughly 4.4 Mbytes. When such a digital signal is intended for storage or remote transmission, the occupied bandwidth becomes too large to be accommodated, and compression becomes necessary for the efficient processing of the video content. Therefore, in order to transmit video data over communication channels of limited bandwidth, some form of compression must be applied before transmission. Video compression technology has witnessed a noticeable evolution over the last decade as research efforts have revolved around the development of efficient techniques for the compression of still images and discrete raw video sequences. This evolution has progressed into improved coding algorithms that can handle both errors and the varying bandwidth availability of contemporary communication media. The contemporary standard video coding algorithms provide both optimal coding efficiency and error resilience potential. Current research activity focuses on the technologies associated with the provision of video services over the future mobile networks at user-acceptable quality and with minimal cost. Section 26.5 discusses the basic techniques employed by video coding technology and the most prominent associated error resilience mechanisms used to ensure an optimal trade-off between the coding efficiency and quality of service of standard video coding algorithms.
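As a sanity check on these figures, the raw data rate of the QCIF example can be computed directly (a quick sketch of the arithmetic above; 8-bit monochrome, 30 frames per second, 6-second clip):

```python
# Raw-bandwidth estimate for uncompressed 8-bit monochrome QCIF video.
WIDTH, HEIGHT = 176, 144
FPS, CLIP_SECONDS = 30, 6

bytes_per_frame = WIDTH * HEIGHT                        # 25,344 bytes per frame
kbytes_per_second = bytes_per_frame * FPS / 1024        # ~742.5 kbytes/s
clip_mbytes = kbytes_per_second * CLIP_SECONDS / 1024   # ~4.35 Mbytes for 6 s
```

A color sequence, or a higher resolution such as CIF (352 × 288), scales these numbers up further, which is why compression is unavoidable.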
This section also sheds light on the algorithmic concepts underlying these technologies and provides a thorough presentation of the capabilities of contemporary mobile access networks, such as the general packet radio service (GPRS), to accommodate the transmission of compressed video streams under various network conditions and application scenarios.
26.5.2 Evolution of Standard Image/Video Compression Algorithms
The expanding interest in mobile multimedia communications and the concurrently expanding growth of data traffic requirements have led to a tremendous amount of research work, spanning more than 15 years, on efficient image and video compression algorithms. Both the International Telecommunication Union (ITU) and the International Organization for Standardization (ISO) have released a number of standards for still-image and video coding algorithms that employ the discrete cosine transform (DCT) and the macroblock (MB) structure of an image to suppress the temporal and spatial redundancies in a sequence of images. These standardized algorithms aim at establishing an optimal trade-off between coding efficiency and the perceptual quality of the reconstructed signal. After the release of the first still-image coding standard, namely JPEG [1], CCITT recommended the standardisation of the first video compression algorithm for low bit rate communications at p × 64 kbit/s over ISDN, namely ITU-T H.261 [2], in 1990. Since the early 1990s, intensive work has been carried out to develop improved versions of the aforementioned ITU standard, and this has
culminated in a number of video coding standards, namely MPEG-1 [3] for audiovisual data storage (1.5 Mbit/s) on CD-ROM, MPEG-2 [4] (or ITU-T H.262) for HDTV applications (4–9 Mbit/s), ITU-T H.263 [5] for very low bit rate (<64 kbit/s) communications over PSTN networks, and then the first content-based, object-oriented audiovisual compression algorithm, namely MPEG-4 [6], for multimedia communications over mobile networks in 1998. Recent standardization work has produced annexes to the ITU-T H.263 standard, namely H.263+ [7] and H.263++ [8], for improved coding efficiency, bit rate scalability, and error resilience performance. ITU-T is currently considering the standardization of H.26L, a new video compression algorithm expected to outperform H.263 in very low bit rate applications. Despite this remarkable evolution of digital video coding technology, the common feature of all the standards released so far is that they employ the same algorithmic concepts and build on them for further improvement in both quality and coding efficiency. In this section, the fundamental techniques that constitute the core of today's video coders are presented.
26.5.3 Digital Representation of Raw Video Data
A video signal is a sequence of still images. When played at a high enough rate, the sequence of images (usually referred to as video frames) gives the impression of an animated video scene. Video frames are captured by a camcorder at a certain sampling rate and processed as a sequence of still pictures correlated by motion dependencies. When adjacent frames are strongly correlated, much of the redundancy in the video signal can be removed by encoding only the difference between successive frames. The process of exploiting temporal redundancies between adjacent frames, by subtracting the prediction image (sometimes referred to as the motion-compensated image) from the original input image and then coding the resulting residual, is called INTER frame coding. If no motion prediction is employed in encoding a video frame and only spatial redundancies are exploited, the frame is said to be INTRA coded. Each video frame is a two-dimensional matrix of pixels, each of which is represented by a luminance (intensity) component Y and two chrominance (color) components U and V. In block-based video coders, each frame is divided into groups of blocks (GOBs), and each GOB is divided into a number of macroblocks (MBs). A MB corresponds to 16 pixels by 16 lines of luminance Y and the spatially corresponding 8 pixels by 8 lines of chrominance U and V; it consists of four Y blocks and two spatially corresponding color-difference blocks. Figure 26.24 depicts the hierarchical layering structure of a video frame of Quarter Common Intermediate Format (QCIF) resolution, i.e., 176 pixels by 144 lines.
26.5.4 Basic Concepts of Block-Based Video Coding Algorithms
Despite their differences, the video coding standards share the same core structure. They all adopt the MB structure described in the previous section and consist of the same major building blocks. The standard video coding algorithms employ one of two coding modes, INTRA or INTER. A typical block diagram of a block-based transform video coder is depicted in Fig. 26.25.

26.5.4.1 Discrete Cosine Transform (DCT)
The 64 coefficients of an 8 × 8 block of data are passed through a DCT transformer. The DCT extracts the spatial redundancies of the video block by gathering the largest portion of its energy in the low-frequency components, which are located in the top left corner of the block. The transfer function of the two-dimensional DCT employed in a block-based video coder is given in Eq. 26.1:

$$F(u,v) = \frac{1}{4}\,C(u)\,C(v)\sum_{x=0}^{7}\sum_{y=0}^{7} f(x,y)\cos\!\left[\frac{\pi(2x+1)u}{16}\right]\cos\!\left[\frac{\pi(2y+1)v}{16}\right] \qquad (26.1)$$
FIGURE 26.24 Hierarchical layering structure for a QCIF frame in block-based video coders. (A 176 × 144 picture is divided into 9 GOBs; each GOB contains 11 MBs; each MB covers 16 × 16 Y pixels with 8 × 8 Cb and 8 × 8 Cr, organised as four 8 × 8 Y blocks plus one 8 × 8 Cb block and one 8 × 8 Cr block.)
FIGURE 26.25 Block diagram of a block-based video coder. (The input residual passes through the discrete cosine transform (DCT), quantisation (Q), zigzag scanning, run-length coding (RLC), and Huffman coding (HUFF); a local decoding loop of inverse quantisation (IQ), inverse DCT (IDCT), frame memory (FM), motion estimation (ME), and motion compensation (MC) produces the prediction subtracted from the input.)
with u, v, x, y = 0, 1, 2, ..., 7, where x and y are the spatial coordinates in the pixel domain, u and v are the coordinates in the transform domain, and

$$C(u) = \begin{cases} \dfrac{1}{\sqrt{2}} & \text{for } u = 0 \\ 1 & \text{otherwise} \end{cases} \qquad C(v) = \begin{cases} \dfrac{1}{\sqrt{2}} & \text{for } v = 0 \\ 1 & \text{otherwise} \end{cases}$$
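For illustration, Eq. 26.1 can be implemented directly as a quadruple loop (a clarity-first sketch; practical coders use fast, factorised DCTs instead):

```python
import math

def dct2d(block):
    """Direct 8x8 2-D DCT as in Eq. 26.1 (O(N^4), for clarity not speed)."""
    def c(k):
        # Normalisation factor C(k): 1/sqrt(2) for the DC index, 1 otherwise.
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (block[x][y]
                          * math.cos(math.pi * (2 * x + 1) * u / 16)
                          * math.cos(math.pi * (2 * y + 1) * v / 16))
            out[u][v] = c(u) * c(v) * s / 4
    return out
```

A flat block of identical pixels transforms into a single nonzero DC coefficient F(0,0), which is exactly the energy-compaction behaviour described above.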
26.5.4.2 Quantization
Quantization is a process that maps the symbols representing the DCT-transformed coefficients from one set of levels to a narrower one in order to minimise the number of bits required to transmit the symbols. Quantization in block-based coders is a lossy process and thus has a negative impact on the perceptual quality of the reconstructed video sequence. The quantization parameter (Qp) is a user-defined parameter that determines the level of distortion affecting the video quality: the higher the quantization level Qp, the coarser the quantization process. Quantization uses different techniques depending on the coding mode employed (INTRA or INTER), the position of the coefficient in a video block (DC or AC coefficients), and the coding algorithm under consideration.
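As a minimal illustration of the idea, and not the exact formulae of any standard (which differ between INTRA and INTER modes and between coders), a uniform quantiser controlled by a Qp-style step parameter might look like this:

```python
def quantise(coeff, qp):
    """Map a DCT coefficient to a quantisation level; larger qp => coarser."""
    return int(coeff / (2 * qp))          # truncation toward zero

def dequantise(level, qp):
    """Reconstruct an approximate coefficient from its level."""
    return level * 2 * qp
```

For example, a coefficient of 100 reconstructs as 96 with qp = 16 but as 100 with qp = 2: the reconstruction error, and hence the distortion, grows with Qp.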
26.5.4.3 Raster Scan Coding
Raster scan coding is also known as zigzag pattern coding. The aim of zigzag coding the 8 × 8 matrix of quantised DCT coefficients is to convert the two-dimensional array into a one-dimensional stream of indices with a high occurrence of successive zero coefficients. The long runs of zeros can then be coded efficiently, as shown in the next subsection. The order followed by a zigzag pattern encoder is depicted in Fig. 26.26.
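The zigzag order can be generated by sorting block positions along anti-diagonals (a sketch; the standards simply tabulate the 64 positions):

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in zigzag scan order:
    walk the anti-diagonals, alternating direction on each one."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

# Flatten a quantised block into the one-dimensional scan:
# scanned = [block[r][c] for r, c in zigzag_order()]
```

Because the high-frequency (bottom-right) coefficients are usually quantised to zero, this ordering ends the scan with long zero runs.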
26.5.4.4 Run-Length Coding
The run-length encoder takes the one-dimensional array of quantised coefficients as input and generates coded runs as output. Instead of coding each coefficient separately, the run-length coder searches for runs of similar consecutive coefficients (normally zeros after the DCT and quantisation stages) and codes the length of each run together with the nonzero level that follows it. A 1-bit flag (LAST) is sent with each run to indicate whether or not it is the last one in the current block. Run-lengths and levels are then fed to the Huffman coder to be assigned variable-length codewords before transmission on the video channel.
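A simplified (LAST, RUN, LEVEL) encoder in the spirit of this description might look as follows (a sketch only; the standard event tables are more involved):

```python
def run_length_encode(coeffs):
    """Encode a zigzag-scanned coefficient list as (last, run, level) triples:
    `run` counts the zeros preceding each nonzero `level`, and `last` = 1
    marks the final event of the block (trailing zeros are implied)."""
    events, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            events.append([0, run, c])   # LAST flag patched below
            run = 0
    if events:
        events[-1][0] = 1                # mark the final event
    return [tuple(e) for e in events]
```

For instance, the scan [12, 0, 0, -3, 5, 0, 0, 0] becomes the three events (0, 0, 12), (0, 2, -3), (1, 0, 5); the trailing zeros never need to be sent.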
26.5.4.5 Huffman Coding
Huffman coding, traditionally referred to as entropy coding, is a variable-length coding algorithm that assigns codewords to source-generated bit patterns based on their frequency of occurrence within the generated bit stream. The higher the likelihood of a symbol, the shorter the codeword assigned to it, and vice versa. Entropy coding therefore yields the optimum average codeword size for a given set of runs and levels.
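The principle can be sketched with the classic Huffman tree construction over symbol frequencies (a generic illustration; the video standards use fixed, pre-computed VLC tables rather than codes built at run time):

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Build a Huffman code over {symbol: frequency}; frequent symbols
    receive shorter codewords. Symbols must not themselves be lists."""
    tiebreak = count()                      # avoids comparing unlike nodes
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # merge the two least-frequent
        f2, _, right = heapq.heappop(heap)  # nodes into an internal node
        heapq.heappush(heap, (f1 + f2, next(tiebreak), [left, right]))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, list):          # internal node: recurse
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                               # leaf: record the codeword
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes
```

With frequencies {"a": 5, "b": 2, "c": 1, "d": 1}, the common symbol "a" receives a 1-bit codeword while the rare "c" and "d" receive 3-bit codewords, and the resulting code is prefix-free by construction.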
26.5.4.6 Motion Estimation and Prediction
FIGURE 26.26 Sequence of zigzag-coding coefficients of a quantised 8 × 8 block.

For each MB in the currently processed video frame, a sum of absolute differences (SAD) is calculated between its pixels and those of each 16 × 16 matrix of pixels that lies inside a window of user-defined size, called the search window, in the previous frame. The 16 × 16 matrix that yields the smallest SAD is considered to most resemble the current MB and is referred to as the "best match." The displacement vector between the currently coded MB and the matrix that spatially corresponds to its best match in the previous frame is called the motion vector (MV), and the corresponding difference matrix is called the MB residual matrix. If the smallest SAD is less than a certain threshold, the MB is INTER coded by sending the MV and the DCT coefficients of the residual matrix; otherwise the MB is INTRA coded. The coordinates of the MV are transmitted differentially using the coordinates of one or
FIGURE 26.27 Motion estimation process in a block-based video coder. (The best-match 16 × 16 matrix MB1 at (x1, y1) in frame N−1 is located within the maximum-area search window around the currently processed MB at (x, y) in frame N; MVx = x − x1 and MVy = y − y1.)

FIGURE 26.28 Motion prediction in two ITU-T video coding standards. (H.261 predicts from the left neighbour: MVDx = MVx − MV1x, MVDy = MVy − MV1y. H.263 uses the median of three neighbours: MVDx = MVx − Px, MVDy = MVy − Py, with Px = median(MV1x, MV2x, MV3x) and Py = median(MV1y, MV2y, MV3y). MVD denotes the differentially coded motion vector.)
more MVs corresponding to neighboring MBs (left MB in ITU-T H.261 or left, top, and top right MBs in ITU-T H.263 and ISO MPEG-4) within the same video frame. Figures 26.27 and 26.28 illustrate the motion estimation and prediction processes of contemporary video coding algorithms.
26.5.5 Subjective and Objective Evaluation of Perceptual Quality
The performance of a video coding algorithm can be evaluated subjectively simply by visually comparing the reconstructed video sequence to the original one. Two major types of subjective method are used to assess perceptual video quality. In the first, an overall quality rating is assigned to the image (usually the last decoded frame of a sequence). In the second, quality impairment is induced on a standard image until it becomes similar to the reference image, or vice versa. Objectively, video quality is measured using mathematical criteria, the most common of which is the peak signal-to-noise ratio (PSNR) defined in Eq. 26.2.
FIGURE 26.29 150th frame of the original "Suzie" sequence (a), and its compressed version at 64 kbit/s using (b) H.261, (c) baseline H.263, and (d) full-option H.263.
$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\dfrac{1}{MN}\displaystyle\sum_{i=0}^{M-1}\sum_{j=0}^{N-1}\left(x(i,j)-\hat{x}(i,j)\right)^2} \qquad (26.2)$$
For a fair comparison of perceptual quality between two video coding algorithms, the objective and subjective results must be evaluated at the same target bit rates. Because the bit rate in kbit/s is directly proportional to the number of frames coded per unit of time, the frame rate (f/s) must also be stated in the evaluation. Figures 26.29 and 26.30 show the subjective and objective results, respectively, for coding 150 frames of the "Suzie" sequence at a bit rate of 64 kbit/s and a frame rate of 25 f/s.
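Eq. 26.2 translates directly into code (a sketch for 8-bit frames stored as nested lists, where x is the original and x̂ the reconstruction):

```python
import math

def psnr(original, reconstructed):
    """Peak signal-to-noise ratio (Eq. 26.2) for two M x N 8-bit images."""
    m, n = len(original), len(original[0])
    mse = sum((original[i][j] - reconstructed[i][j]) ** 2
              for i in range(m) for j in range(n)) / (m * n)
    # Identical images have zero MSE, i.e., infinite PSNR.
    return float("inf") if mse == 0 else 10 * math.log10(255 ** 2 / mse)
```

For colour video, the PSNR is usually reported per component; the Y-PSNR curves of Fig. 26.30 are computed over the luminance plane only.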
26.5.6 Error Resilience for Mobile Video
Mobile channels are characterised by a high level of hostility resulting from high bit error ratios (BER) and information loss. Because of the bit rate variability and the spatial and temporal predictions, coded video streams are highly sensitive to transmission errors. This sensitivity can cause an ungraceful degradation of video quality, and hence the total failure of the video communication service; a single bit error can do disastrous damage to perceptual quality. The most damaging errors are those that cause a loss of synchronisation at the decoder. In this case, the decoder is unable to determine the size of the affected variable-length video parameter and therefore drops the stream bits following the position of the error until it resynchronises at the next synch word. Consequently, it is vital to employ an error resilience mechanism for the success of the underlying video communication service. A popular technique used to mitigate the effects of errors is error concealment [9]: a decoder-based, zero-redundancy error control scheme whereby the decoder makes use of previously received error-free video data to reconstruct the incorrectly decoded video segment. A commonly used approach conceals the effect of errors on a damaged MB by relying on the content of the spatially corresponding MB in the previous frame. In the case where motion data is corrupted, the damaged
40.0 (a) (b) (c)
39.0 38.0 37.0 36.0 Y-PSNR (dB)
35.0 34.0 33.0 32.0 31.0 30.0 29.0 28.0 27.0 26.0 25.0 0
25
50
75
100
125
150
Frame No.
FIGURE 26.30 and (c) H.261.
PSNR values for Suzie sequence compressed at 64 kbit=s (a) baseline H.263, (b) full-option H.263,
motion vector can be predicted from the motion vectors of spatially neighboring MBs in the same picture. Transform coefficients, likewise, can be interpolated from pixels in neighboring blocks. However, error concealment schemes cannot provide satisfactory results for networks with high BERs and long error bursts. In this case, error concealment must be used in conjunction with error resilience schemes that make the coded streams more robust to transmission errors and video packet loss. In the literature, there are a large number of error resilience techniques specified in the ISO MPEG-4 standard [10] and in the annexes to ITU-T H.263 defined in recommendations H.263+ [11] and H.263++ [12]. One of the most effective ways of preventing the propagation of errors in encoded video sequences is the regular insertion of INTRA-coded frames, which do not use any information from previously transmitted frames; however, this method has the disadvantage of making the traffic characteristics of a video sequence extremely bursty, since a much larger number of bits is required to obtain the same quality level as for INTER (predictively coded) frames. A more efficient improvement on INTRA-frame refresh is the regular coding of a number of INTRA MBs per frame, referred to as Adaptive INTRA Refresh (AIR), where the INTRA-coded MBs are identified as part of the most active region in the video scene. Inserting a fixed number of INTRA-coded MBs per frame smooths out the bit rate fluctuations caused by coding the whole frame in INTRA mode. In the following subsections, we present two major standard-compliant error resilience algorithms specified in the MPEG-4 video coding standard, namely data partitioning and two-way decoding with reversible codewords.
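The simple temporal concealment discussed above, copying the spatially co-located MB from the previous error-free frame, can be sketched in a few lines (an illustrative sketch on frames stored as nested lists):

```python
def conceal_mb(prev_frame, cur_frame, mb_x, mb_y, n=16):
    """Replace a damaged n x n MB of the current frame, whose top-left pixel
    is at (mb_x, mb_y), with the co-located MB of the previous frame."""
    for j in range(n):
        for i in range(n):
            cur_frame[mb_y + j][mb_x + i] = prev_frame[mb_y + j][mb_x + i]
```

This works well for static backgrounds but poorly for moving regions, which is why motion-vector recovery from neighbouring MBs is used when the motion data survives.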
26.5.6.1 Video Data Partitioning
The non-error-resilient syntax of video coding standards transmits video data on a MB basis. In other words, the order of transmission is such that all the parameters pertaining to a particular MB are sent before any parameter of the following MB is transmitted. This implies that a bit error detected in the texture data of an early MB in the video frame leads to the loss of all forthcoming MBs in the frame. Data partitioning changes the order of transmission of video data from a MB basis to a frame basis, or a Visual Object Plane (VOP) basis in MPEG-4 terminology. Each video packet that corresponds to a VOP consists of two partitions separated by specific bit patterns called markers (the DC marker for INTRA-coded VOPs and the motion marker for INTER-coded VOPs). The first
FIGURE 26.31 Data partitioning for error-resilient video communication. (A video packet consists of a synch word, the motion/shape data partition, the motion marker, and the texture data partition.)
partition contains the shape information, motion data, and some administrative parameters, such as COD and MCBPC, of all the MBs in the VOP, while the second partition contains the texture data (i.e., the transform coefficients TCOEFF) of all the MBs inside the VOP and other control parameters such as CBPY. With this partitioning structure, illustrated in Fig. 26.31, errors that hit the data bits of the second partition do not lead to the loss of the whole frame, since the error-sensitive motion data will already have been correctly decoded.

26.5.6.2 Two-Way Decoding and Reversible Variable-Length Codewords
Two-way decoding is used with reversible VLC words in order to reduce the size of the damaged area in a video bit stream. This error resilience technique enables the video decoder to reconstruct part of a stream that would have been skipped in ordinary one-way decoding because of loss of synchronisation. This is achieved by allowing the variable-length codewords of the video bit stream to be decoded in the reverse direction. Reversible codewords are symbols that can be decoded in both the forward and reverse directions. One way to construct reversible VLCs is to use a set of codewords in which each codeword contains the same number of occurrences of its starting symbol, either 1 or 0. For instance, in the set of variable-length codewords 0100, 11001, 10101, 01010, 10011, 0010, each codeword contains exactly three occurrences of its starting symbol (1 or 0, respectively). In conventional one-way decoding, the decoder loses synchronisation upon detection of a bit error. This is mainly due to the variable-rate nature of compressed video streams and the variable-length Huffman codes assigned to the symbols that represent the video parameters. In order to restore synchronisation, the decoder skips all the data bits following the position of the error until it reaches the first error-free synch word in the stream. The skipped bits are then discarded, regardless of their correctness, resulting in an effective error ratio that is larger than the channel BER by orders of magnitude. The response of a one-way video decoder to a bit error is depicted in Fig. 26.32. With two-way decoding, part of the skipped segment of bits can be recovered by enabling decoding in the reverse direction, as shown in Fig. 26.33. Upon detection of a bit error, the decoder stops and searches for the next synch word in the bit stream.
Upon regaining synchronization at the synch word, the decoder resumes its operation in the backward direction, thereby rescuing part of the bit stream that would otherwise have been discarded in the forward direction. If no error is detected in the reverse direction, the damaged area is confined to the MB where the bit error was detected in the forward direction. If an error is also flagged in the backward direction, then the segment of bits between the error positions in the forward and backward directions is discarded as the damaged area, as shown in Fig. 26.33. In many cases, a combination of error resilience techniques is used to further enhance the robustness of compressed video streams to the transmission errors of mobile environments. For instance,
FIGURE 26.32 One-way decoding of variable-length codes. (All bits between the position of the error and the next synch word cannot be decoded.)
FIGURE 26.33 Two-way decoding of variable-length codes. (Only the segment between the forward and backward error positions cannot be decoded.)
both data partitioning and two-way decoding can be jointly employed to protect the error-sensitive motion data of the first video partition. The motion vectors and the administrative parameters contained in the first partition are all coded with reversible VLC words. The detection of a bit error in the forward direction triggers the decoder to stop its operation, regain synchronisation at the motion marker separating the two partitions in the corresponding VOP, and then decode backwards to salvage some of the correctly received bits that were initially skipped in the forward direction.
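Using the example codeword set quoted earlier, two-way decodability can be demonstrated in a few lines (a sketch; MPEG-4's actual RVLC tables are constructed differently):

```python
# The reversible codeword set from the text: each codeword contains exactly
# three occurrences of its starting symbol, and the set remains prefix-free
# when every codeword is reversed.
CODEBOOK = ["0100", "11001", "10101", "01010", "10011", "0010"]

def decode(bits, codewords):
    """Greedy prefix decoding of a bit string into a list of codewords;
    returns None if the stream does not parse cleanly."""
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in codewords:
            out.append(buf)
            buf = ""
    return out if buf == "" else None

stream = "0100" + "10011" + "0010"
forward = decode(stream, CODEBOOK)
# Backward pass: decode the reversed stream against the reversed codewords,
# then undo both reversals to recover the original codeword sequence.
backward = decode(stream[::-1], [c[::-1] for c in CODEBOOK])
recovered = [c[::-1] for c in backward][::-1]
```

Both passes yield the same codeword sequence, which is what lets the decoder approach a damaged segment from either end.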
26.5.7 New Generation Mobile Networks
Packet-switched mobile access networks such as GPRS [13] and EGPRS [14] are intended to give subscribers access to a variety of mobile multimedia services that run on different networking platforms, be it the core mobile network (i.e., UMTS), ATM, or even the Internet. The packet-switched mobile access networks share a basic feature in that they are all IP-based and allow time multi-slotting on a given radio interface. The multi-slotting capability enables the underlying networking platform to accommodate higher bit rates by providing the end user with a larger physical-layer capacity. Real-time interactive and conversational services are very delay-critical, so their provision over mobile networks can only be achieved by a service class capable of guaranteeing the delay constraints, with one-way delays in the order of 200 ms being required. To achieve such delay requirements, it is necessary to avoid any retransmission or repeat-request scenarios by operating the RLC layer of the GPRS protocol stack in its unacknowledged mode of operation. Similarly, the transport-layer protocol employed must be the user datagram protocol (UDP), which operates over IP and does not use any repeat-request mechanism. IP networks do not guarantee the delivery of packets, nor do they provide any mechanism to guarantee the orderly arrival of packets. This implies that not only does the inter-packet arrival time vary, but packets may also arrive out of order. Therefore, in order to transmit real-time video information, some transport-layer functionality must be overlaid on the network layer to provide the timing information from which streaming video may be reconstructed. To offer this end-to-end transport functionality, the IETF real-time transport protocol (RTP) [15] is used.
RTP fulfills functions such as payload type identification, sequence numbering, timestamping, and delivery monitoring, and operates on top of IP and UDP for the provision of real-time services and video applications over IP-based mobile networks. On the other hand, the mobile access networks employ channel protection schemes that provide error control against multipath fading and channel interferers. For instance, GPRS employs four channel coding schemes (CS-1 to CS-4), offering flexibility in the degree of protection and the data traffic capacity available to the user. Varying the channel coding scheme allows the throughput across the radio interface to be optimised as the channel quality varies. The data rates provided by GPRS with the channel coding schemes enabled are 8 kbit/s for CS-1, 12.35 kbit/s for CS-2, 14.55 kbit/s for CS-3, and 20.35 kbit/s for CS-4; however, almost 15% of the bits in the payload of a radio block are used up by header information belonging to the overlying protocols. Therefore, the rates presented to the video source for each of the channel coding schemes, per time slot, are 6.8 kbit/s for CS-1, 10.5 kbit/s for CS-2, 12.2 kbit/s for CS-3, and 17.2 kbit/s for CS-4. It is, however, envisaged that the CS-1 and CS-2
schemes will be used for video applications. Obviously, the throughput available to a single terminal will be a multiple of the given per-slot rates, depending on the multi-slotting capabilities of the terminal. In contrast, EGPRS provides nine channel coding schemes of different protection rates and capabilities, and the choice of a suitable scheme is again a trade-off between throughput and error protection.
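The per-slot payload arithmetic above can be reproduced with the quoted rates and the ~15% header overhead (a sketch; the computed CS-3 figure, ≈12.4 kbit/s, differs slightly from the 12.2 kbit/s quoted in the text):

```python
# GPRS channel-coding-scheme rates quoted above (kbit/s per time slot)
# and the ~15% protocol-header overhead of the overlying protocols.
CS_RATES = {"CS-1": 8.0, "CS-2": 12.35, "CS-3": 14.55, "CS-4": 20.35}
HEADER_OVERHEAD = 0.15

def video_payload_rate(scheme, slots=1):
    """Approximate rate (kbit/s) seen by the video source over `slots` slots."""
    return CS_RATES[scheme] * (1 - HEADER_OVERHEAD) * slots
```

For example, CS-1 yields 6.8 kbit/s per slot, and with eight time slots the 54.4 kbit/s video payload rate cited below for conversational MPEG-4 services.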
26.5.8 Provision of Video Services over Mobile Networks
Considering the traffic characteristics of a coded video source employing a fixed quantiser, we observe that the output bit rate is highly variable, with high peaks occurring each time an INTRA-coded frame is transmitted. INTRA frames require, on average, roughly three times the bandwidth needed to transmit a predictively coded frame. Therefore, if the frequency of INTRA frames is increased for error control purposes, as discussed in Section 26.5.6, the encoder will have to discard a number of frames following each INTRA-coded frame until some bandwidth becomes available. Although a fixed quantiser leads to constant spatial quality, the frequent insertion of INTRA frames degrades the temporal quality of the entire video sequence. To counter this, it is advisable to employ a rate control mechanism at the video encoder before the coded video bit stream is sent over the mobile channel. One method is to vary the quantiser value so as to truncate the high-frequency DCT coefficients in accordance with the target bit rate of the video coder and the number of bits available to code a particular frame, VOP, or MB. Coding an INTRA frame with a coarse quantiser results in poor spatial quality, but it helps improve the temporal quality of the video sequence by maintaining the original frame rate and reducing the jittering effect caused by the disparity in size between INTRA- and INTER-coded frames. Video delivery over mobile channels can take the form of real-time delay-sensitive conversational services, delay-critical (on-demand or live) streaming services, or delay-insensitive multimedia messaging applications. The latter requires a guarantee of error-free delivery of the intended messages without any stipulation on the duration of transmission, and therefore allows retransmission of erroneous messages.
The former two categories of video services, however, are rather more delay-critical and necessitate the use of both application- and transport-layer end-to-end error control schemes for the robust transmission of compressed video in mobile environments. The analysis of GPRS protocol efficiency shows that a reduction of 15% in the data rate per time slot, as seen by the video encoder, is enough to compensate for all the protocol overheads. The video quality that can be achieved in video communications over the new generation mobile networks is a function of the time-slot/coding-scheme combination and the channel conditions during video packet transmission. It is observed that in error-free conditions CS-1 yields sub-optimal quality, due to the large overhead it places on the available bandwidth of each time slot; however, in error-prone conditions, and for C/I ratios lower than 15 dB, CS-1 presents the best error protection capabilities and offers the best video quality of all the channel coding schemes. When eight time slots are used with CS-1, GPRS can offer a video payload data rate of 54.4 kbit/s. At this rate, it has been demonstrated that QCIF-resolution conversational MPEG-4 video services can be offered over GPRS at a frame rate of 10 f/s with fairly good perceptual quality, especially when frequency hopping is used; however, for highly detailed scenes involving a large amount of motion, the error-free video quality at high C/I ratios suffers both spatially and temporally, because of the coarse quantiser used and the jitter resulting from the large number of discarded frames, respectively. The error protection schemes of the GPRS protocol are used in conjunction with the application-layer error resilience techniques specified by the MPEG-4 video compression standard.
Figure 26.34 shows the subjective video quality achieved by transmitting an MPEG-4 coded video sequence (at 18 kbit/s) over a GPRS channel with and without error resilience (AIR) when CS-1 and four time slots are used. On the other hand, video services on EGPRS are less likely to encounter the same problems posed by the lack of bandwidth in GPRS networks. When EGPRS employs the channel coding scheme MCS-9,
Vojin Oklobdzija/Digital Systems and Applications 6195_C026 Final Proof page 49 11.10.2007 8:31pm Compositor Name: TSuresh
FIGURE 26.34 One frame of the Suzie sequence encoded with MPEG-4 at 18 kbit/s and transmitted over a GPRS channel with C/I = 15 dB, with CS-1 and 4 time slots used: (a) no error resilience and (b) AIR.
the terminal can be offered a data rate of 402.4 kbit/s when 8 time slots are employed. At this data rate, there is obviously much greater flexibility in selecting the operating picture resolution and the video content intended for transmission over the mobile network.
26.5.9 Conclusions
The provision of video services over the new generation mobile networks is made possible through the enabling technologies supported by the error protection schemes and the multi-slotting capabilities of the radio interface. Conversational video applications are delay-sensitive and thus do not support retransmission of corrupted video data. To provide user-acceptable video quality, the video application must employ an error resilience mechanism in conjunction with the physical-layer channel coding schemes. A wide range of error resilience techniques has been developed in recent video compression algorithms and their annexed versions. The use of error resilience techniques for supporting the provision of video services over mobile networks helps enhance the perceptual quality, especially at times when the mobile channel suffers from low C/I ratios, which result in high BERs and radio block loss ratios.
References
1. ISO/IEC JTC1 10918 & ITU-T Rec. T.81: Information Technology—Digital compression and coding of continuous-tone still images: Requirements and guidelines, 1994.
2. CCITT Recommendation H.261: Video codec for audiovisual services at p × 64 kbit/s, COM XV-R 37-E, 1990.
3. ISO/IEC CD 11172: Coding of moving pictures and associated audio for digital storage media at 1.5 Mbit/s, December 1991.
4. ISO/IEC CD 13818-2: Generic coding of moving pictures and associated audio, November 1993.
5. Draft ITU-T Recommendation H.263: Video coding for low bit rate communication, May 1996.
6. ISO/IEC JTC1/SC29/WG11 N2802: Information technology—Generic coding of audiovisual objects—Part 2: Visual, ISO/IEC 14496-2, MPEG Vancouver meeting, July 1999.
7. Draft ITU-T Recommendation H.263 Version 2 (H.263+): Video coding for low bit rate communications, January 1998.
8. Rapporteur for Q.15/16—Draft for H.263++, Annexes U, V and W to Recommendation H.263, ITU Telecommunication Standardisation Sector, November 2000.
9. Y. Wang and Q.F. Zhu, "Error control and concealment for video communication: a review," Proc. of the IEEE, Vol. 86, No. 5, pp. 974–997, May 1998.
10. R. Talluri, "Error resilient video coding in the MPEG-4 standard," IEEE Communications Magazine, pp. 112–119, June 1998.
11. S. Wenger, G. Knorr, J. Ott, and F. Kossentini, "Error resilience support in H.263+," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 8, No. 7, Nov. 1998.
12. G. Sullivan, "Rapporteur for Q.15/16—Draft for H.263++, Annexes U, V and W to Recommendation H.263," ITU Telecommunication Standardisation Sector, November 2000.
13. Digital Cellular Telecommunications System (Phase 2+), "General Packet Radio Service (GPRS); Overall description of the GPRS radio interface; Stage 2," ETSI/SMG, GSM 03.64, V. 5.2.0, January 1998.
14. Tdoc SMG2 086/00, "Outcome of Drafting Group on MS EGPRS Rx Performance," EDGE Drafting Group, January 2000.
15. H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications," RFC 1889, January 1996.
26.6 Pen-Based User Interfaces—An Applications Overview
Giovanni Seni, Jayashree Subrahmonia, and Larry Yaeger
26.6.1 Introduction
A critical feature of any computer system is its interface with the user. This has led to the development of user interface technologies such as the mouse, the touch screen, and pen-based input devices. All offer significant flexibility and options for computer input; however, touch screens and mice cannot take full advantage of human fine motor control, and their use is mostly restricted to data selection, i.e., as pointing devices. Pen-based interfaces, on the other hand, allow, in addition to pointing, other forms of input such as handwriting, gestures, and drawings. Because handwriting is one of the most familiar forms of communication, pen-based interfaces offer a very easy and natural input method. A pen-based interface consists of a fine-tipped stylus and a transducer device that allows the movement of the stylus to be captured. This information is usually given as a time-ordered sequence of x–y coordinates (digital ink) together with an indication of inking, i.e., whether the pen is up or down. Digital ink can be passed on to recognition software that converts the pen input into text or computer actions. Alternatively, the handwritten input can be organized into ink documents, notes, or messages that can be stored for later retrieval or exchanged through telecommunication. Such ink documents are appealing because they capture information as the user composed it, including text in any mix of languages and drawings such as equations and graphs. Pen-based interfaces are desirable in mobile computing devices (e.g., personal digital assistants [PDAs]) and mobile phones because they scale down well: keyboards tolerate only small reductions in size before they become awkward to use, yet if they are not shrunk, the devices can never be very portable.
This is even more problematic as mobile devices develop into multimedia terminals with numerous functions ranging from agenda and address book to wireless web browser, because of the increasing amounts of text that must be entered. Voice-based interfaces may appear to be a solution, but they entail all the problems that cell phones already have introduced in terms of disturbing bystanders and loss of privacy. Furthermore, using voice commands to control applications such as a web browser can be difficult and tedious; by contrast, clicking on a link with a pen, or entering a short piece of text by writing, is very natural and takes place in silence. Recent hardware advances in alternative ink capture devices based on ultrasonic and optical tracking technologies have also contributed to the renewed interest in pen-based systems. These technologies avoid the need for pad electronics, thus reducing the cost, weight, and thickness of a pen-enabled system. Furthermore, they can sometimes be retrofitted to existing writing surfaces such as whiteboards [1,2] or used with paper, either specially marked [3,4] or plain [5].
This section reviews a number of applications, old and new, in which the pen can be used as a convenient and natural form of input. Our emphasis will be on the user experience, highlighting limitations of existing solutions and suggesting ways of improving them. We begin with a short review of currently available pen data acquisition technologies in Section 26.6.2. Then, in Section 26.6.3, we discuss handwriting recognition user interfaces for mobile devices and the need for making applications aware of the handwriting recognition process. In Section 26.6.4, we present Internet-related applications such as ink messaging. In Section 26.6.5 we discuss some methods for combining computer and paper inking, and then analyze some of the benefits and issues associated with working with digital ink on the computer. Finally, in Section 26.6.6, we present examples of synergistic interfaces being developed which combine the pen with other input modalities.
26.6.2 Pen Input Hardware
The function of the pen input hardware is to convert pen tip position over time into X,Y coordinates at a temporal and spatial resolution sufficient for handwriting recognition and visual presentation [6]. A pen input system consists of a combination of pen, tablet, and in some cases, paper. Examples include PDAs and some electronic or graphics tablets. Some of these have a glass surface sitting directly atop the display. These integrated tablet-plus-displays allow you to point and write where you are looking, and are fairly intuitive to use. Others, like many graphics tablets, have an input device that is separate from the display. These opaque tablets are less intuitive, requiring the user to write in one place while looking in another. Users new to opaque tablets often find them difficult to use for text, image editing, and the like, but with practice, users can become quite proficient with the devices. Paper-based systems provide another alternative for inputting digital ink, when used with special pens and, sometimes, special paper. They provide a natural-feeling writing surface and high resolution, and do not suffer from screen glare or parallax issues. However, they are necessarily always at a remove from the digital ink they are producing or from the user interface they might wish to control. Both tablet-based and paper-based pen input systems must detect and report when the pen tip is in contact with the writing surface. Paper systems face an additional challenge because of the need to keep a consistent, familiar pressure between pen tip and paper, while still sensing this contact. Pen hardware platforms available today use one of the following four kinds of technologies:
1. Magnetic tracking: Sequentially energized coils embedded in the pad couple a magnetic field into a pen tank circuit (coil and capacitor). Neighboring coils pick up the magnetic field from the pen, and their relative strength determines pen location [7]. The magnetic field can also be generated in the pen, requiring a battery that increases pen weight and thickness [8] but can help enable wireless tablets [9].
2. Electric tracking: The conductive properties of a hand and a normal pen can be used for tracking [10]. A transmitter electrode in the pad couples a small displacement current through the paper to the hand, down through the pen, and back through the paper to an array of receiver electrodes. Pen location is calculated as the center of mass of the received signal strengths.
3. Ultrasonic tracking: Ultrasonic tracking is based on the relatively slow speed of sound in air (330 m/s). A pen generates a burst of acoustic energy, and electronics in the pad measure the time of arrival at two stationary ultrasonic receivers [1,2]. The ultrasonic transmission is either synchronized to the pad, typically with an infrared signal, or a third ultrasonic receiver is used [11].
4. Optical tracking: Optical sensors mounted in the tip of the pen [12,13] can provide either relative tracking (like a mouse) or absolute position tracking (like a touch screen). Optics may also be mounted at the top of a pen [5] and used to determine the pen's position relative to some constant frame of reference, such as the edges of a piece of paper or a tablet PC, though the accuracy of these devices has not yet been demonstrated. Yet another approach captures a sequence of small images of handwriting and assembles them to reconstruct the entire page [14].
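The ultrasonic time-of-arrival scheme above can be sketched as a two-receiver intersection of range circles. The receiver spacing and timings below are hypothetical, chosen only to illustrate the geometry; a real pad must also handle synchronization (e.g., the infrared pulse mentioned above) and air-current error.

```python
from math import sqrt

SPEED_OF_SOUND = 330.0   # m/s, as quoted in the text

def pen_position(t1, t2, d):
    """Receivers at (0, 0) and (d, 0); t1, t2 are burst times of arrival in s.
    Returns (x, y) with y > 0 (the writing-area side of the pad edge)."""
    r1 = SPEED_OF_SOUND * t1           # range circle radius from receiver 1
    r2 = SPEED_OF_SOUND * t2           # range circle radius from receiver 2
    x = (r1**2 - r2**2 + d**2) / (2.0 * d)   # intersection of the two circles
    y = sqrt(max(r1**2 - x**2, 0.0))
    return x, y

# Example: pen 10 cm from receiver 1 and ~17.3 cm from receiver 2,
# receivers 20 cm apart, places the pen near (5 cm, 8.7 cm).
x, y = pen_position(0.10 / 330.0, 0.1732 / 330.0, 0.20)
```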
26.6.2.1 Discussion of Input Hardware
Magnetic tracking is the most widely deployed system, owing to high spatial resolution (>1000 dpi), acceptable temporal resolution (>100 Hz), reliability, and relatively low cost [7]. Magnetic and electric tracking require pad electronics and shielding, adding modest thickness and weight to portable devices. Electric tracking uses a normal pen but has no direct way to measure pen tip contact, and must rely on less reliable pen trajectory analysis [15]. Ultrasonic tracking does not require the same writing-surface electronics, thus potentially eliminating the added-thickness issue. Relative tracking can reach 256 dpi, but absolute spatial resolution is limited to about 50 dpi because of air currents that cause Doppler shifts. Optical tracking with tip-mounted sensors offers high spatial (>2000 dpi) and temporal (>200 Hz) resolution, and can utilize a self-contained pen that remembers everything written. Tiny dots, acting like bar codes, can provide absolute positioning and can also encode page number, eliminating overwrites when a person forgets to tell the digitizer they have changed pages (a challenge that pen hardware systems with paper interfaces have to address). Optical tracking with top-mounted sensors is still in its infancy as of this writing, but prototype units are projected to produce >600 dpi relative spatial resolution, on the order of 0.25 mm absolute spatial resolution, and >100 Hz temporal resolution. Optical methods based on CMOS technology should lend themselves to low-power, low-cost designs.
26.6.3 Handwriting Recognition
Handwriting is a very well-developed skill that humans have used for over 5000 years as a means of communicating and recording information. With the widespread acceptance of computers and computer keyboards, the future role of handwriting in our culture might seem questionable. However, as we discussed in the introduction, a number of applications exist where the pen can be more convenient than a keyboard. This is particularly so in the mobile computing space where keyboards are not ergonomically feasible. Handwriting recognition is fundamentally a pattern classification task. The objective is to take an input graphical mark—the handwritten signal collected via a digitizing device—and classify it as one of a prespecified set of symbols. These symbols correspond to the characters or words in a given language encoded in a computerized representation such as ASCII (see Fig. 26.35). In this field, the term online has been used to refer to systems devised for the recognition of patterns captured with digitizing devices that preserve the pen trajectory; the term offline refers to OCR (optical character recognition) techniques, which instead take as input a static two-dimensional image representation, usually acquired by means of a scanner. Handwriting recognition systems can be further grouped according to the constraints they impose on the user with respect to writing style (see Fig. 26.36a). The more restricted the allowed handwritten input, the easier the recognition task and the lower the required computational resources [16]. At the most restrictive end of the spectrum, in the boxed-discrete style, users write one character at a time
FIGURE 26.35 The handwriting recognition problem. The image of a handwritten character, word, or phrase is classified as one of the symbols, or symbol strings, from a known list. Some systems use knowledge about the language in the form of dictionaries (or Lexicons) and frequency information (i.e., language models) to aid the recognition process. Typically, a score is associated with each recognition result.
FIGURE 26.36 Different handwriting styles. In (a), Latin characters, ordered from top to bottom according to presumed difficulty of recognition. (Adapted from Tappert, C.C., Adaptive on-line handwriting recognition, in 7th International Conference on Pattern Recognition, Montreal, Canada, 1984.) In (b), the Graffiti unistroke (i.e., written with a single pen trace) alphabet, which restricts each character to a unique prespecified form to simplify automatic recognition; the square dot indicates the starting position of the pen.
within predefined areas. This removes one difficult step—character segmentation (the partitioning of strokes into letters)—from the recognition process. Further recognition accuracy can be gained by requiring users to adhere to rules that restrict character shapes so as to minimize letter similarity (see Fig. 26.36b). Of course, such techniques require users to learn a new alphabet. At the least restrictive end of the spectrum, in the mixed style, users are allowed to write words or phrases the same way they do on paper—in their own personal style—whether they print, write in cursive, or use a mixture of the two. Recognition of mixed-style handwriting is a difficult task owing to ambiguity in segmentation and large variations in letter style. Segmentation is complex because it is often possible to wrongly break up letters into parts that are in turn meaningful (e.g., the cursive letter d can be subdivided into the letters c and l). Variability in letter shape is partly due to coarticulation (the influence of one letter on another) and the presence of ligatures (connected characters), which frequently give rise to unintended, spurious letters being detected in the script. Writing styles also vary substantially from individual to individual. In addition to the writing-style constraints, the complexity of the recognition task is also determined by dictionary-size and writer-adaptation requirements. The size of a dictionary or language model can vary from extremely small (for tasks such as state name recognition) to huge or even open-ended (for tasks like proper name recognition). In open vocabulary recognition, any sequence of letters is a plausible recognition result, which is the most difficult scenario for a recognizer. Yet open-ended, out-of-dictionary writing is essential in many situations, and the best recognizers balance probabilities at the letter and word levels to prefer in-dictionary solutions while still permitting out-of-dictionary ones.
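The balance between letter-level shape evidence and a word-level language model can be sketched as follows. The tiny lexicon, the word priors, and the per-character shape probabilities are invented for illustration; a real recognizer would derive them from trained models.

```python
import math

LEXICON = {"nearly": 0.001, "jump": 0.0005}   # hypothetical word priors
OOV_CHAR_PROB = 1e-4                          # per-character fallback probability

def hypothesis_score(chars, shape_probs):
    """Log-score of one candidate string given per-character shape probabilities."""
    shape = sum(math.log(p) for p in shape_probs)
    if chars in LEXICON:
        language = math.log(LEXICON[chars])
    else:
        # Out-of-dictionary path: any letter sequence is allowed,
        # but at a much lower probability than dictionary words.
        language = len(chars) * math.log(OOV_CHAR_PROB)
    return shape + language

# "jump" outranks "jvmp" even when the shape evidence slightly favors "v",
# because the dictionary prior outweighs the small shape difference.
s_jump = hypothesis_score("jump", [0.9, 0.3, 0.9, 0.9])
s_jvmp = hypothesis_score("jvmp", [0.9, 0.5, 0.9, 0.9])
```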
In the writer-adaptation dimension, systems capable of out-of-the-box recognition are called writer independent; i.e., they can recognize the writing of many writers. This gives good average performance across different writing styles. However, considerable improvement in recognition accuracy can be obtained by customizing the letter models of the system to a writer's specific style; recognition in this case is called writer dependent. Despite these challenges, significant progress has been made in building writer-independent systems capable of handling unconstrained text with dictionary sizes of 20,000 words [17–19] and more. The writer-independent recognizer that shipped in second-generation Newton PDAs (the Print Recognizer, circa 1996), widely regarded as the world's first genuinely usable handwriting recognizer, used a language model with over 75,000 words in any combination with close to 200 prefixes and over 500 suffixes, plus regular-expression grammars for punctuation, dates, times, phone numbers, postal codes, and money, in addition to allowing completely out-of-dictionary sequences of characters [20]. When updated as Mac OS X's Inkwell [21–23], in 2002, explicit support for writing
URLs was added. Handwriting recognition is also an integral part of Microsoft's Windows XP Tablet PC Edition [24], introduced in 2002 for Tablet PCs, improved and extended in 2005's Service Pack 2, and also driving the so-called Ultra-Mobile PCs [25] introduced in 2006. For a comprehensive survey of the basic concepts behind written language recognition algorithms, see Refs. [26–28].
26.6.3.1 Character-Based Interfaces
Figure 26.37 presents examples of character-based user interfaces for handwritten text input that are representative of those found on many mobile devices. Because of the limited CPU and memory resources available on these platforms, handwritten input is often restricted to the boxed-discrete style, in which one character is entered at a time. The following are additional highlights of the user interface of these text input methods: 1. Special input area: Users are not allowed the freedom of writing anywhere on the screen. Instead, there is an area of the screen specially designated for the handwriting user interface, whether for text input or control. This design choice offers the following advantages: a. No toggling between edit/control and ink mode: Pen input inside the input method area is treated as ink to be recognized by the recognizer; pen input outside this area is treated as mouse events (for pressing on-screen buttons, selecting text, scrolling, etc.). Without this separation, special provisions, sometimes intrusive and nonintuitive, must be taken to distinguish between the two pen modes. b. Better user control: Within the specially designated writing window, it is possible to have additional GUI (graphical user interface) elements that help the user with the input task. For instance, there might be buttons for common edit keys such as backspace, newline, and delete. Similarly, a list of recognition alternates can be easily displayed and selected from. This is
FIGURE 26.37 Character-based text input methods on today's mobile devices. In (a), user interface for English character input on a cellular phone. In (b), user interface for Chinese character input on a two-way pager.
particularly valuable because top-N recognition accuracy—a measure of how often the correct answer is among the highest-ranked N results—is generally much higher than top-1 accuracy. c. Consistent UI metaphor: Despite its ergonomic limitations, an on-screen keyboard is generally available as one of the text input methods on the device. Using a special input area for handwriting makes the user interfaces of alternative text entry methods similar. 2. Modal input: The possibilities of the user's input are selectively limited in order to increase recognition accuracy. Common modes include digits, symbols, uppercase letters, and lowercase letters in English, or traditional versus simplified characters in Chinese. By limiting the number of characters against which given input ink is matched, the opportunities for confusion and misrecognition are decreased, thus improving recognition accuracy. Writing modes represent another trade-off between making life simpler for the system and simpler for the user (and can cause difficulties if, for example, the system is expecting digits for a phone number field but the user tries to enter 1–800–GO–FEDEX). 3. Natural character set: Any character writing style commonly used in the given language can be used, with no need to learn a special alphabet. Characters can be multi-stroke, i.e., written with more than one pen trace. 4. Multi-boxed input: Having the user write every character in its own box provides valuable information for case disambiguation, helping the recognizer distinguish between an uppercase S and a lowercase s, between C and c, and so on, based simply on letter height relative to box height. It also almost entirely eliminates the character segmentation problem, since the strokes in a single box comprise a single character.
In addition, when multi-stroke input is allowed, the end of writing is generally detected by means of a timer that is set after each stroke is completed; the input is deemed concluded if a set amount of time elapses before any more input is received in the writing area. This time-out scheme is sometimes confusing to users, and gives the perception that recognition takes longer than it actually does. Multiple boxes give better performance in this regard, because a character in one box can be concluded as soon as input is received in another box, removing the need to wait for the timer to finish. (However, using a word-level language model, rather than recognizing just individual characters, almost always improves recognition accuracy; this implies that the best hypothesis about a given character may change multiple times as recognition proceeds, which can also be confusing to users. And if the tablet technology provides entering- and exiting-proximity data, indicating when the pen is near the tablet surface, the end-of-writing timer can conveniently be short-circuited by terminating words or phrases when the pen leaves the proximity of the tablet.) Of all the restrictions imposed on users by these character-based input methods, modality is the one where user feedback has been strongest: people want modeless input. One challenge facing modeless recognition is the unavoidable increase in perplexity—the range and variability, and thus the confusability, in the set of possible answers—that results from having to allow recognition of all modes—letters, numbers, symbols, words, dates, times, etc.—simultaneously. Another challenge results from the fact that distinguishing between letters that have very similar forms across modes can be virtually impossible without additional information.
In English orthography, for instance, there are letters whose lowercase version is merely a smaller version of the uppercase one; examples include Cc, Oo, Ss, Uu, Ww, etc. Simple attempts at building modeless character recognizers can result in a disconcerting user experience, because uppercase letters, or digits, may appear inserted into the middle of lowercase text. Such m1Xed ModE w0rdS (mixed mode words) look like gibberish to users. In usability studies, the authors have further found that as the text data entry needs on wireless PDA devices shift from short address book or calendar items to longer notes or e-mail messages, users deem writing one letter at a time to be inconvenient and unnatural.
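The end-of-writing detection described above, with a per-stroke time-out that is short-circuited when the user writes in a different box, can be sketched as follows. The time-out value, class names, and event API are hypothetical.

```python
TIMEOUT = 0.6   # seconds of inactivity before a pending character is concluded

class BoxedInput:
    def __init__(self):
        self.active_box = None       # box with a character still being written
        self.last_stroke_end = None  # time the most recent stroke ended
        self.concluded = []          # boxes whose characters are finalized

    def stroke(self, box, t_end):
        # Input in a new box concludes the character pending in the old one
        # immediately, without waiting for the timer.
        if self.active_box is not None and box != self.active_box:
            self.concluded.append(self.active_box)
        self.active_box = box
        self.last_stroke_end = t_end

    def tick(self, now):
        # Timer path: conclude after TIMEOUT seconds with no new strokes.
        if self.active_box is not None and now - self.last_stroke_end >= TIMEOUT:
            self.concluded.append(self.active_box)
            self.active_box = None

ui = BoxedInput()
ui.stroke(box=0, t_end=0.0)   # first stroke of a multi-stroke character
ui.stroke(box=0, t_end=0.3)   # second stroke, same box: still pending
ui.stroke(box=1, t_end=0.5)   # new box: box 0 concluded immediately
ui.tick(now=1.2)              # box 1 concluded by the time-out
```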
26.6.3.2 More Natural User Interfaces
One known way of dealing with the character confusion difficulties described in the previous section is to use contextual information in the recognition process. At the simplest level, this means recognizing characters in the context of their surrounding characters and taking advantage of visual clues derived from word shape. One simple technique is to maintain a running estimate of capital letter height (post-recognition, so even lowercase letters can inform the estimate), and use this to help disambiguate case, and even numbers versus letters [20]. More sophisticated applications of this geometric context might look at relative character baselines, heights, sizes, and adjacency between pairs of adjacent characters [29]. Beyond shape, interpreting the textual context in which letters are recognized can assist in the disambiguation process. Even a simple bias toward case and mode consistency—so that lowercase is more likely to follow lowercase everywhere except at the beginnings of words, and numbers are more likely to follow numbers—can improve recognition accuracy [20]. At a higher level, contextual knowledge can take the form of lexical constraints; e.g., a dictionary of known words in the language may be used to restrict or guide interpretations of the input ink. These ideas lead naturally to the notion of a word-based text input method. By word, we mean a string of characters that, if printed in text using normal conventions, would be surrounded by white-space characters (see Fig. 26.38a). Consider a mixed-case word recognition scenario where the size and position of a character, relative to other characters in the word, is taken into account during letter identification (see Fig. 26.38b). Such additional information allows us to disambiguate between the lowercase and uppercase versions of letters that otherwise look very much alike. Figure 26.38b also illustrates how relative position within
FIGURE 26.38 Word-based text input method for mobile devices. In (a), a user interface prototype. In (b), an image of the mixed-case word Wow, where relative size information can be used to distinguish among the letters in the Ww pair. In (c), an image of the digit string 90187, where ambiguity in the identity of the first three characters can be resolved after the last two characters are identified as digits.
the word could enable us to correctly identify trailing punctuation marks such as periods and commas. A different kind of contextual information can be used to enforce some notion of consistency among the characters within a word. For instance, we could have a digit-string recognition context that favors word hypotheses in which all the characters can be viewed as digits; in the image example of Fig. 26.38c, the recognizer would thus rank the string "90187" higher than the string "gol87." The use of lexical dictionaries (really just word lists, though ubiquitously referred to as dictionaries) can further guide word-level recognition. For example, despite the tremendous observed variability in how people write their letters u and v, the presence of the word jump in a recognizer's language model can predispose it to produce jump instead of jvmp (not to mention preferring mixed over m1Xed). Of course, if a user actually wants to write jvmp, she may then have to write very carefully indeed to overcome this bias, and even then only if the recognizer allows writing outside of its dictionaries. A strictly or rigidly applied language model might refuse to recognize words not in its dictionaries. This can produce disturbing and sometimes comical whole-word substitutions when a word is misrecognized; e.g., a person might write "catching on?" and get back recognition results such as "egg freckles." The Newton PDA's first-generation recognizer suffered from this problem. Together with an untenably small dictionary of only about 10,000 words, this resulted in the Doonesbury effect—those whole-word substitutions—so named for the lampooning of the Newton in Gary Trudeau's Doonesbury cartoon strip (using the "egg freckles" example above, among others).
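The running capital-height estimate described earlier in this section, used to disambiguate size-ambiguous letter pairs such as Ss and Cc, can be sketched as follows. The 0.75 threshold, the exponential update rule, and the pixel heights are hypothetical choices for illustration.

```python
class CaseDisambiguator:
    def __init__(self, initial_cap_height=10.0):
        self.cap_height = initial_cap_height   # running estimate, in pixels

    def classify(self, letter, ink_height):
        """Pick upper or lower case for size-ambiguous letters (Cc, Oo, Ss, ...)."""
        if ink_height > 0.75 * self.cap_height:
            result = letter.upper()
            # Confident capitals refine the running height estimate.
            self.cap_height = 0.9 * self.cap_height + 0.1 * ink_height
        else:
            result = letter.lower()
        return result

d = CaseDisambiguator(initial_cap_height=10.0)
word = "".join(d.classify(c, h) for c, h in
               [("s", 9.5), ("s", 4.0), ("o", 4.2)])  # ink heights in pixels
# The first 's' is capital-sized; the remaining letters are lowercase.
```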
A loosely applied language model in the second-generation Newton Print Recognizer allowed users to write outside the dictionaries [20] (besides having much larger dictionaries and generally higher recognition accuracy), and produced much better results for users. This looseness was achieved by incorporating a regular-expression grammar that allowed any character anywhere, but at a much lower probability than the word-based parts of the overall language model. In addition to the modeless input enabled by a word-based input method, there is a writing throughput advantage over character-based ones. In Fig. 26.39 we show the results of a timing experiment in which eight users were asked to transcribe a 42-word paragraph using our implementation of both kinds of input methods on a keyboardless PDA device. The paragraph was derived from a newspaper story and contained mixed-case words and a few digits, symbols, and punctuation marks. The length of the text was assumed to be representative of a long message that users might want to compose on such devices. For comparison purposes, users were also timed with a standard on-screen
FIGURE 26.39 Boxplots of time to enter a 42-word message using three different text input methods on a PDA device: an on-screen QWERTY keyboard, a word-based handwriting recognizer, and a character-based handwriting recognizer; median writing throughput was 15.9, 13.6, and 11.4 words/min, respectively.
Vojin Oklobdzija/Digital Systems and Applications 6195_C026 Final Proof page 58 11.10.2007 8:32pm Compositor Name: TSuresh
26-58
Digital Systems and Applications
(software) keyboard. Each timing experiment was repeated three times. The median times were 158, 185, and 220 s for the keyboard, word-based, and character-based input methods, respectively. Thus, entering text with the word-based input method was, on average, faster than using the character-based method. In this study, the word-based input method did not have, on average, a time advantage over the soft keyboard; however, the user who was fastest with the word-based input method (presumably someone for whom the recognition accuracy was very high and who thus had few corrections to make) was able to complete the task in 141 s, which is below the median soft keyboard time. Furthermore, the authors believe that the time gap between these two input methods will be (and may have been already) reversed with improved recognition accuracy and an intuitive, modeless means of error correction, such as were present in the later Newtons and in modern Tablet PCs. We also expect handwriting to offer improvements relative to soft-keyboard text entry in the case of European languages, which have accent marks requiring additional key presses on a soft keyboard. As one can expect, the advantages of nonmodality and speed associated with a word-based input method over a character-based one come at the expense of additional computational resources. Some word-based recognition engines have been shown to require a 10× increase in MIPS and memory resources compared to character-based engines. One should also say that, as evidenced by the variability in the timing data shown in the above plots, there isn't a single input method that works best for every user. It is thus important to offer users a variety of input methods from which to choose.
26.6.3.3 Write-Anywhere Interfaces
FIGURE 26.40 Write-anywhere text input method for mobile devices. Example of an Address Book application with the Company field appearing with focus. Handwritten input is not restricted to a delimited area of the screen but rather can occur anywhere. The company name Data Warehouse has been written.
In the same way that writing words, as opposed to writing one letter at a time, constitutes an improvement in terms of naturalness of the user experience, we must explore recognition systems capable of handling continuous handwriting such as phrases. For any kind of computer, and especially the kind of mobile devices we have been considering here, with very limited screen real estate, this idea leads to the notion of a write-anywhere interface in which the user is allowed to write anywhere on the screen; i.e., on top of any application and system element on the screen (see Fig. 26.40). A write-anywhere text input method is also appealing because there is no special inking area covering up part of the application in the foreground. However, a special mechanism is needed to distinguish pen movement events intended to manipulate user interface elements such as buttons, scrollbars, and menus (i.e., edit/control/mouse mode) from pen events corresponding to handwriting (i.e., ink/pen mode). The solution typically involves a tap-and-hold scheme wherein the pen has to be maintained down without dragging it for a certain amount of time in order to get the stylus to act temporarily as a mouse; otherwise its input is treated as ink. Alternative methods for distinguishing mousing from inking include treating the pen as a mouse, except when users hold down a particular barrel button on the side of the pen or a particular key on the keyboard, whereupon they get ink. The barrel button mousing-vs-inking option showed up in Mac OS X's Inkwell [21] as of the Tiger (OS X 10.4)
Mobile and Wireless Computing
26-59
release, as did an additional, unusual option that permitted inking in the air above the tablet surface (while the pen is in close proximity to, but not actually pressing against, the tablet) whenever the barrel button is pressed. These options, plus the use of a key on the keyboard to trigger inking, were previously introduced in the Motion [30] professional video creation and editing application on the Mac. When using the more common method of distinguishing mousing from inking (requiring the user to tap and hold to perform mousing), there is another potential user interface problem, which is that users expect certain user interface elements to perform immediate actions. Unlike a tap-and-drag operation, such as one might use to make a text selection, and for which a delayed action is tolerable, when users tap in a scrollbar or tap an open button, they expect the corresponding action to take place immediately. The same is likely true if a user attempts to drag a window by tapping and dragging the window's titlebar. To accommodate these expectations, a good pen interface should add two more important mouse-vs-ink disambiguations: (1) identify tapping (by the short time between pen down and pen up, and the short distance traveled) and treat the input as a mousing action, and (2) identify certain controls or regions of the screen as instant mousers. If the pen lands in an instant mouser, it immediately behaves as a mouse, instead of an ink pen. The first accommodation takes care of simple issues like tapping on a scrollbar or button, but the second accommodation is required to support immediate dragging of windows, scrollbar thumbs, movable icons, and the like. One must be careful, of course, not to identify too many user interface elements as instant mousers, lest the user feel that the decision about where to write on the screen is overly complicated.
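The tap and instant-mouser rules just described can be sketched as a small decision function. The thresholds and event fields below are hypothetical values chosen for illustration; a real system would tune them per device and digitizer.

```python
# Sketch of mouse-vs-ink disambiguation for a write-anywhere interface.
# All thresholds are assumed values, not taken from any shipping system.

TAP_TIME_MS = 300    # max pen-down duration counted as a "tap" (assumed)
TAP_DIST_PX = 5      # max pen travel counted as a "tap" (assumed)
HOLD_TIME_MS = 500   # tap-and-hold delay to enter mouse mode (assumed)

def classify_pen_down(duration_ms, distance_px, hit_instant_mouser):
    """Return 'mouse' or 'ink' for a completed pen-down episode."""
    if hit_instant_mouser:
        return "mouse"   # scrollbars, titlebars, etc. act as a mouse at once
    if duration_ms <= TAP_TIME_MS and distance_px <= TAP_DIST_PX:
        return "mouse"   # a short tap is treated as a click
    if duration_ms >= HOLD_TIME_MS and distance_px <= TAP_DIST_PX:
        return "mouse"   # tap-and-hold without dragging enters mouse mode
    return "ink"         # everything else is handwriting
```

Note how the instant-mouser check comes first: dragging a window titlebar must act as a mouse immediately, without waiting out the tap-and-hold delay.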
An additional user interface issue with a write-anywhere text input paradigm is that there are usually no input method control elements visible anywhere on the screen. This frees up precious screen real estate, but hides functionality. For instance, access to recognition alternates might require a special pen gesture. As such, a write-anywhere interface will generally have more appeal to advanced users. Furthermore, recognition in the write-anywhere case is more difficult because there is no implicit information on word separation, orientation, or expected size of the text.
26.6.3.4 Scrolling Input Window
An interesting and innovative approach to pen input on devices with severely constrained screen sizes is to allow the pen input window to scroll while the user is writing [31]. The originators of this idea refer to this approach as a treadmill technique, for the way the input window moves continuously underneath the user's writing. For a language written left-to-right, such as English, the inking area would continuously scroll to the left, so users could continue to write fairly normally, but in place on the screen. This combines the advantages of a dedicated writing area with those of a seemingly large (in fact, unlimited) writing area. Though users must adapt their writing style somewhat and this technology has yet to make it out of the research lab, the adjustments users must make seem minor and the technique seems promising.
26.6.3.5 Recognition-Aware Applications
Earlier in this section, we discussed how factors such as segmentation ambiguity, letter co-articulation, and ligatures make exact recognition of continuous handwritten input a very difficult task. To illustrate this point, consider the image shown in Fig. 26.41 and the set of plausible interpretations given for it. Can we choose with certainty one of these recognition results as the correct one? Clearly, additional information, not contained within the image, is required to make such a selection. One such source of information already mentioned is the dictionary, or lexicon, for constraining the letter strings generated by the recognizer. At a higher level, information from the surrounding words can be used to decide, for example, between a verb and a noun word possibility. It is safe to say that the more constraints explicitly available during the recognition process, the more ambiguity in the input can be automatically resolved. Less ambiguity results in higher recognition accuracy and thus an improved user experience.
FIGURE 26.41 Inherent ambiguity in continuous handwriting recognition. In (a), a sample image of a handwritten word. In (b), possible recognition results (log, dug, lug, lag, clay, clug, clcrj, doij, clog, dog); strings not in the English lexicon are in italics.
For many common applications in PDA devices, e.g., contacts, agenda, and web browser, it is possible to specify the words and patterns of words that can be entered in certain data fields. Examples of structured data fields are telephone numbers, zip codes, city names, dates, times, URLs, etc. In order for recognition-based input methods to take advantage of this kind of contextual information, the text input framework on PDA devices needs to allow applications to specify the expected context for a given input field. When text is entered using a keyboard, lexical context is usually not required to obtain accurate text, although certain very precisely defined input fields, such as social security numbers, credit card numbers, and state two-letter codes, might benefit from such constraints. However, operating systems and applications employed largely on pen-based devices can improve the user experience significantly by supporting and providing specific contexts for text input fields and communicating them to recognizers. One typically uses a grammar to define the permitted strings in a language, e.g., the language of valid telephone numbers. A grammar consists of a set of rules or productions specifying the sequences of characters or lexical items forming allowable strings in the defined language. Two common classes of grammars are the BNF, or context-free, grammar and the regular grammar (see [32] for a formal treatment). Grammars are also used in the field of speech recognition, and recently the W3C (World Wide Web Consortium) Voice Browser Working Group has suggested an XML-based syntax for representing BNF-like grammars [33]. In Fig. 26.42 we show a fragment of a possible grammar for defining telephone number strings.
In an ink-aware text input framework, this grammar, together with the handwritten ink, could be passed along to the recognition engine when an application knows that the user is expected to enter a telephone number.
FIGURE 26.42 Example of an XML grammar defining telephone numbers, written as per the W3C Voice Browser Working Group specification. There are four private rule definitions that are combined to make the main rule called phone-num.
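Since a telephone-number language is regular, the same constraint can also be captured with a regular expression. The pattern below is our own illustrative sketch of common North American layouts, not the grammar fragment shown in Fig. 26.42.

```python
import re

# Illustrative regular grammar for telephone-number strings: an optional
# "1-" prefix, an optional area code, then a 3-digit exchange and a
# 4-digit subscriber number. This is a sketch, not a complete numbering plan.
PHONE_NUM = re.compile(r"""
    ^(1[-\s])?                  # optional country-code prefix, e.g. "1-"
    (\(\d{3}\)\s?|\d{3}[-\s])?  # optional area code, "(415) " or "415-"
    \d{3}[-\s]?\d{4}$           # exchange and subscriber number
""", re.VERBOSE)

def is_phone_number(s):
    """True if s belongs to the sketched telephone-number language."""
    return PHONE_NUM.match(s) is not None
```

An ink-aware framework could compile such a pattern once per field type and hand it, or an equivalent grammar, to the recognizer along with the ink.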
An alternative approach, one used in the Newton for example, is for the text input framework to provide a set of common classes of input, such as phones, dates, times, general text, etc., and let the application specify one or more of these categories, as appropriate. Given this kind of contextual information, recognizers should, in general, use the information to boost the probability of the relevant text strings, but should not, unless expressly requested, make the constraint a rigidly applied one. And it may be better to factor the soft constraints appropriate to a recognizer from the hard constraints the text input framework or the application itself might need to apply in order to verify, for example, a correctly formatted credit card number or two-letter state code. Most text input fields are not so rigid, and users may be frustrated when they try to enter 1-800-MY-APPLe in a phone number field if the recognizer refuses to recognize anything but digits (and hyphens and parentheses). Information about how the ink was collected, such as resolution and sampling rate of the capture device, whether writing guidelines or other writing size hints were used, spatial relationships to nearby objects in the application interface, etc., should also be made available to the recognition engine for improved recognition accuracy. To be fully ink- or recognition-aware, the text input framework and applications need to provide an easy, in-place, modeless correction mechanism. This is best supported by overwriting and a top-N list of recognition alternatives. Overwriting is the ability to write directly over the text to be replaced, making a separate selection process unnecessary, and is particularly convenient if the interface permits overwriting individual characters. A list of recognition alternatives leverages the fact that recognition accuracy for, say, the top-5 alternatives is usually substantially higher than for just the top-1 alternative.
Either the recognizer or the text input framework may also explicitly substitute particularly common or convenient alternatives in these lists, such as repeating the top choice but with altered case for the first letter. One user interface technique for accessing such an alternatives list with a write-anywhere recognition model is to expose the alternatives in a pop-up menu when the user taps twice and holds down the pen on the second tap. Another technique is to take advantage of a given system’s existing contextual menu access method, such as right-clicking (Windows) or control-clicking (Macintosh) on a recognized word, to bring up the recognition alternates menu.
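The "boost but do not forbid" behavior recommended above can be sketched as a soft rescoring of the recognizer's hypothesis list. The score scale, the bonus weight, and the hypothesis format here are illustrative assumptions, not any particular engine's API.

```python
# Soft-constraint rescoring sketch: hypotheses that satisfy the field's
# context receive a score bonus, but out-of-context strings stay viable.
# The bonus value and score scale are hypothetical.
CONTEXT_BONUS = 2.0

def rescore(hypotheses, fits_context):
    """hypotheses: list of (string, score); fits_context: predicate."""
    rescored = [(s, score + (CONTEXT_BONUS if fits_context(s) else 0.0))
                for s, score in hypotheses]
    return sorted(rescored, key=lambda p: p[1], reverse=True)

# A digits-only context nudges "555-1212" above a look-alike letter string
# without making non-digit input unrecognizable.
hyps = [("SSS-l2l2", 5.0), ("555-1212", 4.0)]
digits_ctx = lambda s: all(c.isdigit() or c == "-" for c in s)
best = rescore(hyps, digits_ctx)[0][0]
```

Because the constraint is only a score adjustment, a sufficiently confident out-of-context hypothesis (such as 1-800-MY-APPLe in a phone field) can still win.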
26.6.4 Ink and the Internet
Digital ink does not always need to be recognized in order to be useful. Two daily life applications where users take full advantage of the range of graphical representations that are possible with a pen are messaging, as when we leave someone a post-it note with a handwritten message, and annotation, as when we circle some text in a printed paragraph or make a mark in an image inside of a document. This subsection discusses Internet-related applications that will enable similar functionality. Both applications draw attention to the need for a standard representation of digital ink that is appropriate in terms of efficiency, robustness, and quality.
26.6.4.1 Ink Messaging
Two-way transmission of digital ink, possibly wireless, offers PDA users a compelling new way to communicate. Users can draw or write with a stylus on the PDA screen to compose a note in their own handwriting. Such an ink note can then be addressed and delivered to other PDA users, e-mail users, or fax machines. The recipient views the message as the sender composed it, including drawings and text in any mix of languages (see Fig. 26.43). In the context of mobile-data communications it is important for the size of such ink messages to be small. There are two distinct modes for coding digital ink: raster scanning and curve tracing [34,35]. Facsimile (fax machine) coding algorithms belong to the first mode, and exploit the correlations within consecutive scan lines. Chain coding (CC), belonging to the second mode, represents the pen trajectory as a sequence of transitions between successive points in a regular lattice. It is known that curve tracing algorithms result in a higher coding efficiency if the total trace length is not too long. Furthermore, use of a raster-based technique implies the loss of all time-dependent information.
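The curve-tracing idea can be made concrete with a basic 8-direction chain coder over a unit lattice. This is a generic Freeman-style sketch offered only to illustrate the trajectory-as-transitions idea; production ink codecs add differential coding, multiple ring sizes, and entropy coding.

```python
# 8-direction chain coding sketch: represent a pen trajectory on a unit
# lattice as a sequence of transitions between successive points.
# Directions 0..7 step E, NE, N, NW, W, SW, S, SE; each code needs 3 bits,
# versus a full coordinate pair per point in a raw representation.
STEPS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]
CODE = {step: i for i, step in enumerate(STEPS)}

def chain_encode(points):
    """Encode lattice points (adjacent steps only) as direction codes."""
    return [CODE[(x2 - x1, y2 - y1)]
            for (x1, y1), (x2, y2) in zip(points, points[1:])]

def chain_decode(start, codes):
    """Recover the full trajectory from a start point and direction codes."""
    pts = [start]
    for c in codes:
        dx, dy = STEPS[c]
        x, y = pts[-1]
        pts.append((x + dx, y + dy))
    return pts
```

Unlike a raster (fax-style) scan, the decoded output preserves the order in which the points were written, i.e., the time-dependent information the text refers to.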
FIGURE 26.43 Example of ink messaging application for mobile devices. Users can draw or write with a stylus on the device screen to compose an e-mail in their own handwriting; no automatic recognition is necessarily involved.
Message sizes of about 500 bytes have been reported for messages composed in a typical PDA screen size, using a CC-based algorithm known as multi-ring differential chain coding (MRDCC) [36]. MRDCC is attractive for transmission of ink messages in terms of data syntax, decoding simplicity, and transmission error control; however, MRDCC is lossy, i.e., the original pen trajectory cannot be fully recovered. If exact reconstructability is important, a lossless compression technique is required. This is likely to be the case if the message recipient might wish to run verification or recognition algorithms on the received ink, e.g., if the ink in the message corresponds to a signature that is to be used for computer authentication. One example of a lossless curve-tracing algorithm proposed by the ITU (International Telecommunication Union) is zone coding [37]. Our unpublished evaluation of zone coding, however, reveals there is ample room for improvement. Additional requirements for an ink messaging application include support for embedded ASCII text, support for embedded basic shapes (such as rectangles, circles, and lines), and support for different pen-trace attributes (such as color and thickness).
26.6.4.2 InkML and SMIL
SMIL, pronounced smile, stands for synchronized multimedia integration language. It is a W3C recommendation [38] defining an XML-compliant language that allows a spatially and temporally synchronized description of multimedia presentations. In other words, it enables authors to choreograph multimedia presentations where audio, video, text, and graphics are combined in real time. A SMIL document can also interact with a standard HTML page. SMIL documents are becoming common on the web as part of streaming technologies [39,40]. The following are the basic elements in a SMIL presentation: a root-layout, which defines things like the size and color of the background of the document; a region, which defines where and how a media element such as an image can be rendered, e.g., location, size, overlay order, scaling method; one or more media elements such as text, img, audio, and video; means for specifying a timeline of events, e.g., seq and par indicate a block of media elements that will all be shown sequentially or in parallel, respectively, dur gives an explicit duration, and begin delays the start of an element relative to when the document began or the end of other elements; means for skipping some part of an audio or a video (clip-begin and clip-end); means for adapting the behavior of the presentation to the end-user system capabilities (switch); means for freezing a media element after its end (fill); and a means for hyperlinking (a). For a complete introduction and tutorial see Refs. [41,42]. Digital ink is not currently supported as a SMIL native media type. One option would be to convert the ink into a static image, say in GIF format, and render it as an img element; however, this would preclude the possibility of displaying the ink as continuous media (like an animation). Another option is to use the SMIL generic media reference ref (see Fig. 26.44); this option requires the existence of an appropriate MIME content-type/subtype.
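The element inventory above can be exercised with a minimal document. This sketch assembles one using Python's xml.etree; the element names follow the SMIL basics just listed, while the region geometry and media file names are made-up values for illustration.

```python
import xml.etree.ElementTree as ET

# Build a minimal SMIL presentation: a layout with one region, then an
# image and an audio clip played in parallel. Sizes and URLs are made up.
smil = ET.Element("smil")
head = ET.SubElement(smil, "head")
layout = ET.SubElement(head, "layout")
ET.SubElement(layout, "root-layout", width="240", height="320")
ET.SubElement(layout, "region", id="main", top="0", left="0",
              width="240", height="320")
body = ET.SubElement(smil, "body")
par = ET.SubElement(body, "par")  # children of <par> are shown in parallel
ET.SubElement(par, "img", src="note.gif", region="main", dur="5s")
ET.SubElement(par, "audio", src="note.wav")

doc = ET.tostring(smil, encoding="unicode")
```

Rendering digital ink in such a presentation would require either rasterizing it into the img element, or using the generic ref element with a suitable MIME type, as discussed above.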
Another W3C working group has produced a working draft of an Ink Markup Language, InkML [43], which can serve as the data format for representing ink entered with an electronic pen or stylus. InkML supports the input and processing of handwriting, gestures, sketches, music, and other notational languages in web-based and non-web-based applications, and provides a common format for the exchange of ink data between hardware devices and between software components such as handwriting
(a) <smil> <meta name="title" content="Ink and SMIL"/>