1 INTRODUCTION
Level n: Virtual machine Mn, with machine language Ln. Programs in Ln are either interpreted by an interpreter running on a lower machine, or are translated to the machine language of a lower machine.
…
Level 3: Virtual machine M3, with machine language L3.
Level 2: Virtual machine M2, with machine language L2. Programs in L2 are either interpreted by interpreters running on M1 or M0, or are translated to L1 or L0.
Level 1: Virtual machine M1, with machine language L1. Programs in L1 are either interpreted by an interpreter running on M0, or are translated to L0.
Level 0: Actual computer M0, with machine language L0. Programs in L0 can be directly executed by the electronic circuits.
Figure 1-1. A multilevel machine.
Level 5: Problem-oriented language level. Supported by translation (compiler).
Level 4: Assembly language level. Supported by translation (assembler).
Level 3: Operating system machine level. Supported by partial interpretation (operating system).
Level 2: Instruction set architecture level. Supported by interpretation (microprogram) or direct execution.
Level 1: Microarchitecture level. Supported by hardware.
Level 0: Digital logic level.
Figure 1-2. A six-level computer. The support method for each level is indicated below it (along with the name of the supporting program).
*JOB, 5494, BARBARA
*XEQ
*FORTRAN
FORTRAN program
*DATA
Data cards
*END
Figure 1-3. A sample job for the FMS operating system.
Year  Name               Made by          Comments
1834  Analytical Engine  Babbage          First attempt to build a digital computer
1936  Z1                 Zuse             First working relay calculating machine
1943  COLOSSUS           British gov’t    First electronic computer
1944  Mark I             Aiken            First American general-purpose computer
1946  ENIAC I            Eckert/Mauchly   Modern computer history starts here
1949  EDSAC              Wilkes           First stored-program computer
1951  Whirlwind I        M.I.T.           First real-time computer
1952  IAS                Von Neumann      Most current machines use this design
1960  PDP-1              DEC              First minicomputer (50 sold)
1961  1401               IBM              Enormously popular small business machine
1962  7094               IBM              Dominated scientific computing in the early 1960s
1963  B5000              Burroughs        First machine designed for a high-level language
1964  360                IBM              First product line designed as a family
1964  6600               CDC              First scientific supercomputer
1965  PDP-8              DEC              First mass-market minicomputer (50,000 sold)
1970  PDP-11             DEC              Dominated minicomputers in the 1970s
1974  8080               Intel            First general-purpose 8-bit computer on a chip
1974  CRAY-1             Cray             First vector supercomputer
1978  VAX                DEC              First 32-bit superminicomputer
1981  IBM PC             IBM              Started the modern personal computer era
1985  MIPS               MIPS             First commercial RISC machine
1987  SPARC              Sun              First SPARC-based RISC workstation
1990  RS6000             IBM              First superscalar machine
Figure 1-4. Some milestones in the development of the modern digital computer.
[Diagram: memory, a control unit, an arithmetic logic unit containing the accumulator, input, and output.]
Figure 1-5. The original von Neumann machine.
[Diagram: the CPU, memory, console terminal, paper tape I/O, and other I/O all attached to a single bus, the omnibus.]
Figure 1-6. The PDP-8 omnibus.
Property                          Model 30  Model 40  Model 50  Model 65
Relative performance              1         3.5       10        21
Cycle time (nsec)                 1000      625       500       250
Maximum memory (KB)               64        256       256       512
Bytes fetched per cycle           1         2         4         16
Maximum number of data channels   3         3         4         6
Figure 1-7. The initial offering of the IBM 360 product line.
[Plot: transistors per chip, on a logarithmic scale from 1 to 100,000,000, versus year from 1965 to 1995. Data points: memory chips of 1K, 4K, 16K, 64K, 256K, 1M, 4M, 16M, and 64M bits.]
Figure 1-8. Moore’s law predicts a 60 percent annual increase in the number of transistors that can be put on a chip. The data points given in this figure are memory sizes, in bits.
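A 60 percent annual increase compounds quickly. As a rough check, the Java sketch below assumes growth at exactly 60 percent per year (the real data points scatter around the trend) and computes the doubling time, log 2 / log 1.6, together with the growth over a decade. The class and method names are illustrative only.

```java
// Sketch: compound growth implied by a constant annual rate (hypothetical helper class).
class MooreGrowth {
    // Years needed to double when the count grows by annualRate per year (0.60 = 60%).
    static double doublingTimeYears(double annualRate) {
        return Math.log(2.0) / Math.log(1.0 + annualRate);
    }

    // Growth factor after the given number of years at a constant annual rate.
    static double growthFactor(double annualRate, int years) {
        return Math.pow(1.0 + annualRate, years);
    }

    public static void main(String[] args) {
        double t = doublingTimeYears(0.60);
        System.out.printf("Doubling time: %.2f years (%.1f months)%n", t, t * 12);
        System.out.printf("Growth over 10 years: %.0fx%n", growthFactor(0.60, 10));
    }
}
```

The doubling time comes out at about 1.47 years, roughly 18 months, and a decade of 60 percent growth multiplies the transistor count by about 110.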
Type                         Price ($)  Example application
Disposable computer          1          Greeting cards
Embedded computer            10         Watches, cars, appliances
Game computer                100        Home video games
Personal computer            1K         Desktop or portable computer
Server                       10K        Network server
Collection of workstations   100K       Departmental minisupercomputer
Mainframe                    1M         Batch data processing in a bank
Supercomputer                10M        Long range weather prediction
Figure 1-9. The current spectrum of computers available. The prices should be taken with a grain (or better yet, a metric ton) of salt.
Chip         Date     MHz      Transistors  Memory  Notes
4004         4/1971   0.108    2,300        640     First microprocessor on a chip
8008         4/1972   0.108    3,500        16 KB   First 8-bit microprocessor
8080         4/1974   2        6,000        64 KB   First general-purpose CPU on a chip
8086         6/1978   5-10     29,000       1 MB    First 16-bit CPU on a chip
8088         6/1979   5-8      29,000       1 MB    Used in IBM PC
80286        2/1982   8-12     134,000      16 MB   Memory protection present
80386        10/1985  16-33    275,000      4 GB    First 32-bit CPU
80486        4/1989   25-100   1.2M         4 GB    Built-in 8K cache memory
Pentium      3/1993   60-233   3.1M         4 GB    Two pipelines; later models had MMX
Pentium Pro  3/1995   150-200  5.5M         4 GB    Two levels of cache built in
Pentium II   5/1997   233-400  7.5M         4 GB    Pentium Pro plus MMX
Figure 1-10. The Intel CPU family. Clock speeds are measured in MHz (megahertz), where 1 MHz is 1 million cycles/sec.
[Plot: transistors per chip, on a logarithmic scale from 1 to 10M, versus year of introduction from 1970 to 1998, for the 4004, 8008, 8080, 8086, 8088, 80286, 80386, 80486, Pentium, Pentium Pro, and Pentium II, with the Moore's law trend line.]
Figure 1-11. Moore’s law for CPU chips.
2 COMPUTER SYSTEMS ORGANIZATION
[Diagram: a central processing unit (CPU) containing a control unit, an arithmetic logical unit (ALU), and registers, connected by a bus to main memory and to two I/O devices, a disk and a printer.]
Figure 2-1. The organization of a simple computer with one CPU and two I/O devices.
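[Diagram: operands A and B travel from the registers over the ALU input bus into the two ALU input registers; the ALU computes A + B into the ALU output register, from which the result can be stored back into a register.]
Figure 2-2. The data path of a typical von Neumann machine.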
public class Interp {
    static int PC;                  // program counter holds address of next instr
    static int AC;                  // the accumulator, a register for doing arithmetic
    static int instr;               // a holding register for the current instruction
    static int instr_type;          // the instruction type (opcode)
    static int data_loc;            // the address of the data, or -1 if none
    static int data;                // holds the current operand
    static boolean run_bit = true;  // a bit that can be turned off to halt the machine

    public static void interpret(int[] memory, int starting_address) {
        // This procedure interprets programs for a simple machine with instructions having
        // one memory operand. The machine has a register AC (accumulator), used for
        // arithmetic. The ADD instruction adds an integer in memory to the AC, for example.
        // The interpreter keeps running until the run bit is turned off by the HALT instruction.
        // The state of a process running on this machine consists of the memory, the
        // program counter, the run bit, and the AC. The input parameters consist of
        // the memory image and the starting address.
        PC = starting_address;
        while (run_bit) {
            instr = memory[PC];                       // fetch next instruction into instr
            PC = PC + 1;                              // increment program counter
            instr_type = get_instr_type(instr);       // determine instruction type
            data_loc = find_data(instr, instr_type);  // locate data (-1 if none)
            if (data_loc >= 0)                        // if data_loc is -1, there is no operand
                data = memory[data_loc];              // fetch the data
            execute(instr_type, data);                // execute instruction
        }
    }

    private static int get_instr_type(int instr) { ... }
    private static int find_data(int instr, int type) { ... }
    private static void execute(int type, int data) { ... }
}
Figure 2-3. An interpreter for a simple computer (written in Java).
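Figure 2-3 deliberately leaves get_instr_type, find_data, and execute unwritten. To make the fetch-decode-execute cycle concrete, the sketch below fills them in for a made-up four-instruction machine. The encoding (each word is opcode × 1000 + address, with opcodes 0 = HALT, 1 = LOAD, 2 = ADD, 3 = STORE) and the class name SimpleInterp are assumptions for illustration, not part of the original figure.

```java
// Runnable sketch of the Figure 2-3 loop. ASSUMPTION: invented encoding,
// word = opcode * 1000 + address; opcodes 0 = HALT, 1 = LOAD, 2 = ADD, 3 = STORE.
class SimpleInterp {
    static final int HALT = 0, LOAD = 1, ADD = 2, STORE = 3;

    static int PC, AC, instr, instrType, dataLoc, data;
    static boolean runBit;
    static int[] mem;  // kept so that STORE can write back

    static void interpret(int[] memory, int startingAddress) {
        mem = memory;
        PC = startingAddress;
        AC = 0;
        runBit = true;
        while (runBit) {
            instr = memory[PC];                    // fetch next instruction
            PC = PC + 1;                           // increment program counter
            instrType = getInstrType(instr);       // decode
            dataLoc = findData(instr, instrType);  // locate operand (-1 if none)
            if (dataLoc >= 0 && instrType != STORE)
                data = memory[dataLoc];            // fetch operand (STORE writes instead)
            execute(instrType, data);
        }
    }

    static int getInstrType(int instr) { return instr / 1000; }

    static int findData(int instr, int type) {
        return type == HALT ? -1 : instr % 1000;
    }

    static void execute(int type, int data) {
        switch (type) {
            case LOAD:  AC = data; break;
            case ADD:   AC = AC + data; break;
            case STORE: mem[dataLoc] = AC; break;
            case HALT:  runBit = false; break;
            default: throw new IllegalStateException("bad opcode " + type);
        }
    }

    public static void main(String[] args) {
        // Program at 0..3: LOAD 10, ADD 11, STORE 12, HALT; data at 10 and 11.
        int[] memory = new int[16];
        memory[0] = 1010; memory[1] = 2011; memory[2] = 3012; memory[3] = 0;
        memory[10] = 5; memory[11] = 7;
        interpret(memory, 0);
        System.out.println(memory[12]);  // prints 12
    }
}
```

Running main executes LOAD 10, ADD 11, STORE 12, HALT, leaving 5 + 7 = 12 in memory word 12. Only the three helper methods depend on the invented encoding; the loop itself mirrors Figure 2-3.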
[Diagram: (a) Stages S1 to S5: instruction fetch unit, instruction decode unit, operand fetch unit, instruction execution unit, write back unit. (b) Stage contents over time: instruction 1 enters S1 in cycle 1, instruction 2 in cycle 2, and so on; each instruction advances one stage per clock cycle.]
Figure 2-4. (a) A five-stage pipeline. (b) The state of each stage as a function of time. Nine clock cycles are illustrated.
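In the ideal case of Figure 2-4, instruction i sits in stage s during clock cycle i + s - 1. The sketch below assumes no stalls or branches and simply tabulates that formula for the nine cycles shown; the class and method names are illustrative.

```java
// Sketch of Figure 2-4(b), ASSUMING an ideal pipeline: one new instruction per
// cycle, no stalls, no branches.
class PipelineTable {
    static final String[] STAGES = {"S1 fetch", "S2 decode", "S3 operand fetch",
                                    "S4 execute", "S5 write back"};

    // Instruction number (1-based) in stage (1..5) during cycle (1-based),
    // or 0 if that stage has not yet received an instruction.
    static int occupant(int stage, int cycle) {
        int instr = cycle - stage + 1;
        return instr >= 1 ? instr : 0;
    }

    public static void main(String[] args) {
        int cycles = 9;
        for (int s = 1; s <= STAGES.length; s++) {
            StringBuilder row = new StringBuilder(String.format("%-17s", STAGES[s - 1]));
            for (int t = 1; t <= cycles; t++) {
                int i = occupant(s, t);
                row.append(i == 0 ? "  ." : String.format("%3d", i));
            }
            System.out.println(row);
        }
    }
}
```

From cycle 5 onward every stage is busy, so one instruction completes per cycle even though each one needs five cycles to pass through the pipeline.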
[Diagram: a single instruction fetch unit (S1) feeds two pipelines, each with its own instruction decode unit (S2), operand fetch unit (S3), instruction execution unit (S4), and write back unit (S5).]
Figure 2-5. (a) Dual five-stage pipelines with a common instruction fetch unit.
[Diagram: S1 instruction fetch unit, S2 instruction decode unit, S3 operand fetch unit, S4 five functional units (ALU, ALU, LOAD, STORE, floating point), S5 write back unit.]
Figure 2-6. A superscalar processor with five functional units.
[Diagram: a control unit broadcasts instructions to an 8 × 8 grid of processor/memory pairs.]
Figure 2-7. An array processor of the ILLIAC IV type.
[Diagram: (a) several CPUs and a shared memory on one bus; (b) several CPUs, each with its own local memory, plus a shared memory, on one bus.]
Figure 2-8. (a) A single-bus multiprocessor. (b) A multicomputer with local memories.
[Diagram: (a) 12 cells of 8 bits, addresses 0 to 11. (b) 8 cells of 12 bits, addresses 0 to 7. (c) 6 cells of 16 bits, addresses 0 to 5.]
Figure 2-9. Three ways of organizing a 96-bit memory.
Computer          Bits/cell
Burroughs B1700   1
IBM PC            8
DEC PDP-8         12
IBM 1130          16
DEC PDP-15        18
XDS 940           24
Electrologica X8  27
XDS Sigma 9       32
Honeywell 6180    36
CDC 3600          48
CDC Cyber         60
Figure 2-10. Number of bits per cell for some historically interesting commercial computers.
[Diagram: four 32-bit words at addresses 0, 4, 8, and 12. (a) Big endian: bytes are numbered from the left (high-order) end of each word: 0 1 2 3, 4 5 6 7, 8 9 10 11, 12 13 14 15. (b) Little endian: bytes are numbered from the right (low-order) end: 3 2 1 0, 7 6 5 4, 11 10 9 8, 15 14 13 12.]
Figure 2-11. (a) Big endian memory. (b) Little endian memory.
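The difference in Figure 2-11 is easiest to see by serializing one integer both ways. The sketch below uses the standard java.nio.ByteBuffer class, whose order method selects ByteOrder.BIG_ENDIAN or ByteOrder.LITTLE_ENDIAN; the value 0x01020304 and the class name are arbitrary choices for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Lays out one 32-bit integer byte by byte in big endian and little endian order.
class EndianDemo {
    static byte[] bytesOf(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    public static void main(String[] args) {
        int value = 0x01020304;
        // Big endian: the lowest address holds the high-order byte.
        System.out.println(Arrays.toString(bytesOf(value, ByteOrder.BIG_ENDIAN)));    // [1, 2, 3, 4]
        // Little endian: the lowest address holds the low-order byte.
        System.out.println(Arrays.toString(bytesOf(value, ByteOrder.LITTLE_ENDIAN))); // [4, 3, 2, 1]
    }
}
```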
[Diagram: a record holding the name JIM SMITH, the age 21, and the department 260, one 32-bit word per row at addresses 0, 4, 8, 12, and 16; the two integers are stored as the bytes 0 0 0 21 and 0 0 1 4. (a) Big endian layout. (b) Little endian layout. (c) The big endian bytes copied unchanged to a little endian machine. (d) The copy after the bytes of each word are swapped.]
Figure 2-12. (a) A personnel record for a big endian machine. (b) The same record for a little endian machine. (c) The result of transferring the record from a big endian machine to a little endian machine. (d) The result of byte-swapping (c).
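The sketch below replays Figure 2-12 under the assumption, read off the figure, that the record is a 12-byte name (JIM SMITH, zero-padded) followed by two 32-bit integers, age 21 and department 260. It shows that copying the bytes unchanged garbles the integers, and that swapping every word repairs the integers but scrambles the text. RecordTransfer and its helper methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// ASSUMED layout: 12-byte name, then age and department as 32-bit integers.
class RecordTransfer {
    // Big endian image of the record, as the sending machine stores it.
    static byte[] bigEndianRecord() {
        ByteBuffer buf = ByteBuffer.allocate(20).order(ByteOrder.BIG_ENDIAN);
        buf.put("JIM SMITH\0\0\0".getBytes(StandardCharsets.US_ASCII)); // name padded to 12 bytes
        buf.putInt(21);   // age
        buf.putInt(260);  // department (bytes 0 0 1 4)
        return buf.array();
    }

    // Reverse the bytes within every 4-byte word, as in part (d).
    static byte[] swapWords(byte[] image) {
        byte[] out = new byte[image.length];
        for (int w = 0; w < image.length; w += 4)
            for (int k = 0; k < 4; k++)
                out[w + k] = image[w + 3 - k];
        return out;
    }

    // How a little endian machine interprets the 32-bit word at offset.
    static int readLittleEndian(byte[] image, int offset) {
        return ByteBuffer.wrap(image).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
    }

    static String name(byte[] image) {
        return new String(image, 0, 12, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] copied = bigEndianRecord();                    // (c) bytes copied unchanged
        System.out.println(readLittleEndian(copied, 12));     // 352321536, not 21
        byte[] swapped = swapWords(copied);                   // (d) every word swapped
        System.out.println(readLittleEndian(swapped, 12));    // 21
        System.out.println(readLittleEndian(swapped, 16));    // 260
        System.out.println(name(swapped).replace('\0', '.')); // " MIJTIMS...H": text scrambled
    }
}
```

This is why a blind byte swap cannot fix a mixed record: the receiver has to know which fields are numbers and which are strings.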
Word size  Check bits  Total size  Percent overhead
8          4           12          50
16         5           21          31
32         6           38          19
64         7           71          11
128        8           136         6
256        9           265         4
512        10          522         2
Figure 2-13. Number of check bits for a code that can correct a single error.
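The numbers in Figure 2-13 follow from a counting argument. With m data bits and r check bits (n = m + r), each of the 2^m legal words needs its own n illegal neighbors at distance 1, so (n + 1)2^m ≤ 2^n, which simplifies to (m + r + 1) ≤ 2^r. The sketch below finds the smallest such r for each word size and reproduces the table; the class name is illustrative.

```java
// Reproduces Figure 2-13 from the single-error-correction bound (m + r + 1) <= 2^r.
class CheckBits {
    // Smallest number of check bits r that satisfies the bound for m data bits.
    static int checkBits(int m) {
        int r = 0;
        while ((m + r + 1) > (1L << r)) r++;
        return r;
    }

    public static void main(String[] args) {
        System.out.println("Word  Check  Total  Overhead%");
        for (int m = 8; m <= 512; m *= 2) {
            int r = checkBits(m);
            System.out.printf("%4d  %5d  %5d  %8d%n", m, r, m + r, Math.round(100.0 * r / m));
        }
    }
}
```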
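[Diagram: (a) three overlapping circles A, B, and C, with the four data bits placed in the regions where the circles overlap. (b) A parity bit in each circle's remaining region gives each circle even parity. (c) A single error in region AC leaves circles A and C with odd parity.]
Figure 2-14. (a) Encoding of 1100. (b) Even parity added. (c) Error in AC.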
[Diagram: the 21-bit code word, positions 1 to 21, with parity bits at positions 1, 2, 4, 8, and 16 and the data bits in the remaining positions. Resulting code word: 0 0 1 0 1 1 1 0 0 0 0 0 1 0 1 1 0 1 1 1 0.]
Figure 2-15. Construction of the Hamming code for the memory word 1111000010101110 by adding 5 check bits to the 16 data bits.
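The construction in Figure 2-15 can be checked mechanically. In the sketch below, which assumes even parity and numbers bit positions from 1 as the figure does, each check bit at a power-of-two position p covers every position whose binary index has bit p set. Recomputing the checks on a damaged word yields a syndrome equal to the position of the flipped bit. The class and method names are hypothetical.

```java
// Sketch of a single-error-correcting Hamming code with even parity.
class Hamming {
    // dataBits is a string of '0'/'1' characters, most significant first.
    static int[] encode(String dataBits) {
        int m = dataBits.length();
        int r = 0;
        while (m + r + 1 > (1 << r)) r++;      // number of check bits needed
        int n = m + r;
        int[] code = new int[n + 1];           // index 0 unused; positions 1..n
        int d = 0;
        for (int pos = 1; pos <= n; pos++)
            if (Integer.bitCount(pos) != 1)    // not a power of two: data position
                code[pos] = dataBits.charAt(d++) - '0';
        for (int p = 1; p <= n; p <<= 1) {     // each check bit
            int parity = 0;
            for (int pos = 1; pos <= n; pos++)
                if ((pos & p) != 0 && pos != p) parity ^= code[pos];
            code[p] = parity;                  // makes the covered group's parity even
        }
        return code;
    }

    // Sum of the positions of failing check bits: 0 if no error, else the bad bit.
    static int syndrome(int[] code) {
        int n = code.length - 1, bad = 0;
        for (int p = 1; p <= n; p <<= 1) {
            int parity = 0;
            for (int pos = 1; pos <= n; pos++)
                if ((pos & p) != 0) parity ^= code[pos];
            if (parity != 0) bad += p;
        }
        return bad;
    }

    static String toBits(int[] code) {
        StringBuilder sb = new StringBuilder();
        for (int pos = 1; pos < code.length; pos++) sb.append(code[pos]);
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] code = encode("1111000010101110");
        System.out.println(toBits(code));   // 001011100000101101110
        code[5] ^= 1;                       // flip bit 5 to simulate a memory error
        System.out.println(syndrome(code)); // 5
    }
}
```

For the word 1111000010101110 this yields the code word 001011100000101101110; flipping bit 5 (chosen arbitrarily here) makes checks 1 and 4 fail, and 1 + 4 = 5 identifies the bad bit.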
[Diagram: the CPU, the cache, and main memory, with a bus.]
Figure 2-16. The cache is logically between the CPU and main memory. Physically, there are several possible places it could be located.
[Diagram: a module carrying 4-MB memory chips, with an edge connector.]
Figure 2-17. A single inline memory module (SIMM) holding 32 MB. Two of the chips control the SIMM.
[Diagram: levels from top to bottom: registers, cache, main memory, magnetic disk, and tape and optical disk.]
Figure 2-18. A five-level memory hierarchy.
[Diagram: each sector consists of a preamble, 4096 data bits, and an ECC, and sectors are separated by an intersector gap. Track width is 5–10 microns; the width of 1 bit is 0.1 to 0.2 microns. Labels: read/write head, disk arm, direction of arm motion, direction of disk rotation.]
Figure 2-19. A portion of a disk track. Two sectors are illustrated.
[Diagram: eight recording surfaces, numbered 0 to 7, with one read/write head per surface; all heads move together in the direction of arm motion.]
Figure 2-20. A disk with four platters.
Parameters        LD 5.25"  HD 5.25"  LD 3.5"  HD 3.5"
Size (inches)     5.25      5.25      3.5      3.5
Capacity (bytes)  360K      1.2M      720K     1.44M
Tracks            40        80        80       80
Sectors/track     9         15        9        18
Heads             2         2         2        2
Rotations/min     300       360       300      300
Data rate (kbps)  250       500       250      500
Type              Flexible  Flexible  Rigid    Rigid
Figure 2-21. Characteristics of the four kinds of floppy disks.
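The capacities in Figure 2-21 are simply tracks × sectors/track × heads × bytes/sector. The sketch below assumes 512-byte sectors, which the figure does not state, and K = 1024 bytes; the class name is illustrative.

```java
// Checks the Figure 2-21 capacities. ASSUMPTIONS: 512-byte sectors, K = 1024 bytes.
class FloppyCapacity {
    static long capacityBytes(int tracks, int sectorsPerTrack, int heads, int bytesPerSector) {
        return (long) tracks * sectorsPerTrack * heads * bytesPerSector;
    }

    public static void main(String[] args) {
        String[] names = {"LD 5.25\"", "HD 5.25\"", "LD 3.5\"", "HD 3.5\""};
        int[] tracks = {40, 80, 80, 80};
        int[] sectors = {9, 15, 9, 18};
        for (int i = 0; i < names.length; i++) {
            long bytes = capacityBytes(tracks[i], sectors[i], 2, 512);
            System.out.println(names[i] + ": " + bytes + " bytes = " + bytes / 1024 + "K");
        }
    }
}
```

The results are 360K, 1200K, 720K, and 1440K; the familiar "1.2M" and "1.44M" labels treat 1M as 1000K.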
Name                Data bits  Bus MHz  MB/sec
SCSI-1              8          5        5
SCSI-2              8          5        5
Fast SCSI-2         8          10       10
Fast & wide SCSI-2  16         10       20
Ultra SCSI          16         20       40
Figure 2-22. Some of the possible SCSI parameters.
[Diagram: (a) RAID level 0: strips 0 to 11 striped across four drives. (b) RAID level 1: the same striping, duplicated on four backup drives. (c) RAID level 2: bits 1 to 7 spread across seven drives. (d) RAID level 3: bits 1 to 4 on four drives plus a parity drive. (e) RAID level 4: strips 0 to 11 on four drives plus a parity drive holding P0-3, P4-7, and P8-11. (f) RAID level 5: strips 0 to 19 with parity strips P0-3, P4-7, P8-11, P12-15, and P16-19 distributed across five drives.]
Figure 2-23. RAID levels 0 through 5. Backup and parity drives are shown shaded.
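RAID levels 4 and 5 store, for each row of strips, a parity strip that is the bitwise XOR of the data strips. Because XOR is its own inverse, XORing the surviving strips with the parity regenerates any single lost strip. The sketch below demonstrates this with made-up 4-byte strips; the class name and data are illustrative.

```java
import java.util.Arrays;

// Sketch of RAID 4/5 parity: parity = XOR of the data strips; a lost strip is
// rebuilt by XORing the survivors with the parity. Strip contents are made up.
class RaidParity {
    static byte[] xorStrips(byte[]... strips) {
        byte[] out = new byte[strips[0].length];
        for (byte[] s : strips)
            for (int i = 0; i < out.length; i++)
                out[i] ^= s[i];
        return out;
    }

    public static void main(String[] args) {
        byte[] s0 = {1, 2, 3, 4}, s1 = {10, 20, 30, 40}, s2 = {5, 6, 7, 8}, s3 = {9, 9, 9, 9};
        byte[] p = xorStrips(s0, s1, s2, s3);       // parity strip P0-3
        byte[] rebuilt = xorStrips(s0, s1, s3, p);  // drive holding strip 2 has failed
        System.out.println(Arrays.equals(rebuilt, s2)); // true
    }
}
```

The same identity gives the small-write update used by RAID 4 and 5: new parity = old parity XOR old data XOR new data, so only the target drive and the parity drive need to be read and rewritten.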
[Diagram: a spiral groove of pits and lands; a 2K block of user data occupies a stretch of the spiral.]
Figure 2-24. Recording structure of a Compact Disc or CD-ROM.
[Diagram: symbols of 14 bits each; 42 symbols make 1 frame; frames of 588 bits, each containing 24 data bytes; 98 frames make 1 sector. A mode 1 sector (2352 bytes) holds a 16-byte preamble, 2048 data bytes, and 288 bytes of ECC.]
Figure 2-25. Logical data layout on a CD-ROM.
[Diagram: layers from top to bottom: printed label, protective lacquer, reflective gold layer, dye layer, and a 1.2-mm polycarbonate substrate. When writing, the laser burns a dark spot in the dye layer. The optics consist of an infrared laser diode, a prism, a lens, and a photodetector; the disk moves past them.]
Figure 2-26. Cross section of a CD-R disk and laser (not to scale). A silver CD-ROM has a similar structure, except without the dye layer and with a pitted aluminum layer instead of a gold layer.
[Diagram: two 0.6-mm single-sided disks bonded by an adhesive layer; each consists of a polycarbonate substrate, a semireflective layer, and an aluminum reflector.]
Figure 2-27. A double-sided, dual layer DVD disk.
[Diagram: plug-in boards (SCSI controller, sound card, modem) seated by their edge connectors in the card cage.]
Figure 2-28. Physical structure of a personal computer.
[Diagram: the CPU and memory on a single bus, together with a video controller (monitor), keyboard controller (keyboard), floppy disk controller (floppy disk drive), and hard disk controller (hard disk drive).]
Figure 2-29. Logical structure of a simple personal computer.
[Diagram: the CPU with its cache and main memory on the memory bus; a PCI bridge connects to the PCI bus, which carries the SCSI controller (driving a SCSI bus with a SCSI scanner and a SCSI disk), the video controller, the network controller, and an ISA bridge; the ISA bus carries the sound card, printer controller, and modem.]
Figure 2-30. A typical modern PC with a PCI bus and an ISA bus. The modem and sound card are ISA devices; the SCSI controller is a PCI device.
[Diagram: (a) electron gun, grid, vertical deflection plate, vacuum, and screen, with the spot on the screen. (b) Horizontal scan lines, horizontal retrace, and vertical retrace.]
Figure 2-31. (a) Cross section of a CRT. (b) CRT scanning pattern.
[Diagram: (a) light source, rear polaroid, rear glass plate with rear electrode, liquid crystal, front glass plate with front electrode, and front polaroid, producing bright and dark regions as in a notebook computer screen. (b) The grooves of the rear and front plates, along the y and z axes.]
Figure 2-32. (a) The construction of an LCD screen. (b) The grooves on the rear and front plates are perpendicular to one another.
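[Diagram: the CPU writes characters with their attribute bytes (shown as A2B2C2) over the bus into the video RAM on the video board; the board produces the analog video signal that shows ABC on the monitor. Main memory is also on the bus.]
Figure 2-33. Terminal output on a personal computer.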
[Diagram: the CPU and memory with a serial I/O card (UART and RS-232-C connector), connected through a modem, an analog telephone line, and a second modem to a terminal with a keyboard.]
Some signals: Protective ground (1), Transmit (2), Receive (3), Request to send (4), Clear to send (5), Data set ready (6), Common return (7), Carrier detect (8), Data terminal ready (20).
Figure 2-34. Connection of an RS-232-C terminal to a computer. The numbers in parentheses in the list of signals are the pin numbers.
[Diagram: a window with a menu (Cut, Paste, Copy) and a pointer controlled by the mouse; the mouse has buttons and a rubber ball underneath.]
Figure 2-35. A mouse being used to point to menu items.
Figure 2-36. (a) The letter “A” on a 5 × 7 matrix. (b) The letter “A” printed with 24 overlapping needles.
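[Diagram: a laser beam reflects off a rotating octagonal mirror and strikes the drum, which has been sprayed and charged; the drum picks up toner, a scraper and discharger clean it, blank paper passes over the drum and through heated rollers, and the pages collect as stacked output.]
Figure 2-37. Operation of a laser printer.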
Figure 2-38. Halftone dots for various gray scale ranges. (a) 0–6. (b) 14–20. (c) 28–34. (d) 56–62. (e) 105–111. (f) 161–167.
[Diagram: voltage versus time for the bits 0 1 0 0 1 0 1 1 0 0 0 1 0 0. (a) Two voltage levels, V1 and V2. (b) High and low amplitude. (c) High and low frequency. (d) Phase changes.]
Figure 2-39. Transmission of the binary number 01001011000100 over a telephone line bit by bit. (a) Two-level signal. (b) Amplitude modulation. (c) Frequency modulation. (d) Phase modulation.
[Diagram: customer's equipment (ISDN terminals, an ISDN telephone, and an ISDN alarm) on the T interface to the NT1 box; carrier's equipment: the U interface, a digital bit pipe, to the ISDN exchange and on to the carrier's internal network.]
Figure 2-40. ISDN for home use.
3 THE DIGITAL LOGIC LEVEL
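[Diagram: (a) one transistor (collector, base, emitter) with input Vin and output Vout, powered from +VCC. (b) Two transistors in series, inputs V1 and V2, output Vout. (c) Two transistors in parallel, inputs V1 and V2, output Vout.]
Figure 3-1. (a) A transistor inverter. (b) A NAND gate. (c) A NOR gate.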
(a) NOT: A = 0 gives X = 1; A = 1 gives X = 0.

A  B   (b) NAND  (c) NOR  (d) AND  (e) OR
0  0   1         1        0        0
0  1   1         0        0        1
1  0   1         0        0        1
1  1   0         0        1        1
Figure 3-2. The symbols and functional behavior for the five basic gates.
A  B  C  M
0  0  0  0
0  0  1  0
0  1  0  0
0  1  1  1
1  0  0  0
1  0  1  1
1  1  0  1
1  1  1  1
[Diagram: (b) inverters produce Ā, B̄, and C̄; four AND gates compute ĀBC, AB̄C, ABC̄, and ABC; an OR gate combines them into M.]
Figure 3-3. (a) The truth table for the majority function of three variables. (b) A circuit for (a).
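Figure 3-3(b) implements the majority function as the OR of one AND term per row where M = 1. The sketch below writes that sum of products directly in Java and checks it against the definition (at least two of the three inputs are 1); the class and method names are illustrative.

```java
// The majority function of Figure 3-3 in two forms (hypothetical helper class).
class Majority {
    // Sum of products over the rows where M = 1 (' means NOT):
    // A'BC + AB'C + ABC' + ABC
    static boolean sumOfProducts(boolean a, boolean b, boolean c) {
        return (!a && b && c) || (a && !b && c) || (a && b && !c) || (a && b && c);
    }

    // Definition: true when at least two inputs are true.
    static boolean byCount(boolean a, boolean b, boolean c) {
        int ones = (a ? 1 : 0) + (b ? 1 : 0) + (c ? 1 : 0);
        return ones >= 2;
    }

    public static void main(String[] args) {
        System.out.println("A B C M");
        for (int row = 0; row < 8; row++) {
            boolean a = (row & 4) != 0, b = (row & 2) != 0, c = (row & 1) != 0;
            boolean m = sumOfProducts(a, b, c);
            if (m != byCount(a, b, c)) throw new AssertionError("row " + row);
            System.out.printf("%d %d %d %d%n", a ? 1 : 0, b ? 1 : 0, c ? 1 : 0, m ? 1 : 0);
        }
    }
}
```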
[Diagram: Ā, AB, and A + B, each built once from NAND gates only and once from NOR gates only.]
Figure 3-4. Construction of (a) NOT, (b) AND, and (c) OR gates using only NAND gates or only NOR gates.
A  B  C   AB  AC  AB + AC   B + C  A(B + C)
0  0  0   0   0   0         0      0
0  0  1   0   0   0         1      0
0  1  0   0   0   0         1      0
0  1  1   0   0   0         1      0
1  0  0   0   0   0         0      0
1  0  1   0   1   1         1      1
1  1  0   1   0   1         1      1
1  1  1   1   1   1         1      1
(Columns AB, AC, and AB + AC belong to part (a); columns B + C and A(B + C) to part (b). Each part also includes its gate circuit.)
Figure 3-5. Two equivalent functions. (a) AB + AC. (b) A(B + C).
Name              AND form                  OR form
Identity law      1A = A                    0 + A = A
Null law          0A = 0                    1 + A = 1
Idempotent law    AA = A                    A + A = A
Inverse law       AĀ = 0                    A + Ā = 1
Commutative law   AB = BA                   A + B = B + A
Associative law   (AB)C = A(BC)             (A + B) + C = A + (B + C)
Distributive law  A + BC = (A + B)(A + C)   A(B + C) = AB + AC
Absorption law    A(A + B) = A              A + AB = A
De Morgan's law   NOT(AB) = Ā + B̄           NOT(A + B) = ĀB̄
(Ā denotes NOT A.)
Figure 3-6. Some identities of Boolean algebra.
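With only two or three variables, an identity can be proved by checking all 2^n rows of its truth table, which is exactly what Figure 3-5 does for AB + AC = A(B + C). The sketch below applies the same brute-force method to several laws from Figure 3-6; the functional interface F3 and the class name are made up for this example.

```java
// Brute-force check of Boolean identities over every combination of inputs.
// With three variables, trying all 8 rows is a complete proof.
class BooleanIdentities {
    interface F3 { boolean apply(boolean a, boolean b, boolean c); }

    static boolean equivalent(F3 f, F3 g) {
        for (int row = 0; row < 8; row++) {
            boolean a = (row & 4) != 0, b = (row & 2) != 0, c = (row & 1) != 0;
            if (f.apply(a, b, c) != g.apply(a, b, c)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Distributive law, AND over OR (the Figure 3-5 equivalence)
        System.out.println(equivalent((a, b, c) -> (a && b) || (a && c), (a, b, c) -> a && (b || c)));
        // Distributive law, OR over AND
        System.out.println(equivalent((a, b, c) -> a || (b && c), (a, b, c) -> (a || b) && (a || c)));
        // Absorption law
        System.out.println(equivalent((a, b, c) -> a && (a || b), (a, b, c) -> a));
        // De Morgan's laws
        System.out.println(equivalent((a, b, c) -> !(a && b), (a, b, c) -> !a || !b));
        System.out.println(equivalent((a, b, c) -> !(a || b), (a, b, c) -> !a && !b));
        // A non-identity for contrast: A + B differs from AB
        System.out.println(equivalent((a, b, c) -> a || b, (a, b, c) -> a && b)); // false
    }
}
```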
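[Diagram: each gate beside its equivalent De Morgan form: (a) NAND, NOT(AB) = Ā + B̄; (b) NOR, NOT(A + B) = ĀB̄; (c) AND, AB = NOT(Ā + B̄); (d) OR, A + B = NOT(ĀB̄).]
Figure 3-7. Alternative symbols for some gates: (a) NAND. (b) NOR. (c) AND. (d) OR.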
A  B  XOR
0  0  0
0  1  1
1  0  1
1  1  0
[Diagram: (b) to (d) three different gate circuits, each with inputs A and B, that compute XOR.]
Figure 3-8. (a) The truth table for the XOR function. (b)-(d) Three circuits for computing it.
(a) Electrical     (b) Positive logic   (c) Negative logic
A    B    F        A  B  F              A  B  F
0V   0V   0V       0  0  0              1  1  1
0V   5V   0V       0  1  0              1  0  1
5V   0V   0V       1  0  0              0  1  1
5V   5V   5V       1  1  1              0  0  0
Figure 3-9. (a) Electrical characteristics of a device. (b) Positive logic. (c) Negative logic.
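[Diagram: a 14-pin package with pins 1 to 7 along one edge and 8 to 14 along the other; VCC is pin 14, GND is pin 7, and a notch at one end marks the orientation.]
Figure 3-10. An SSI chip containing four gates.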
[Diagram: data inputs D0 to D7, control inputs A, B, and C (used together with their complements), and output F.]
Figure 3-11. An eight-input multiplexer circuit.
[Diagram: (a) a multiplexer package with data inputs D0 to D7, control inputs A, B, and C, and output F. (b) The same package with each of D0 to D7 tied to ground or to VCC.]
Figure 3-12. (a) An MSI multiplexer. (b) The same multiplexer wired to compute the majority function.
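Behaviorally, the multiplexer of Figures 3-11 and 3-12 uses its three control lines as a binary number that picks one data input. The sketch below assumes A is the high-order control bit (consistent with the row order of the Figure 3-3 truth table) and wires the data inputs to the majority function's output column, as in Figure 3-12(b). The names are hypothetical.

```java
// Behavioral sketch of an eight-input multiplexer. ASSUMPTION: A is the
// high-order select bit, so A B C = 0 1 1 selects D3.
class Multiplexer {
    static boolean mux8(boolean[] d, boolean a, boolean b, boolean c) {
        int select = (a ? 4 : 0) | (b ? 2 : 0) | (c ? 1 : 0);
        return d[select];
    }

    // Figure 3-12(b): each data input tied to the majority function's value for
    // its row (false = ground, true = VCC) makes the multiplexer compute it.
    static final boolean[] MAJORITY_INPUTS = {false, false, false, true, false, true, true, true};

    public static void main(String[] args) {
        for (int row = 0; row < 8; row++) {
            boolean a = (row & 4) != 0, b = (row & 2) != 0, c = (row & 1) != 0;
            System.out.println(row + " -> " + mux8(MAJORITY_INPUTS, a, b, c));
        }
    }
}
```

The design point the figure makes carries over: any three-variable function can be produced by the same chip, just by changing what the eight data inputs are tied to.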
[Diagram: inputs A, B, and C with their complements drive eight AND gates, producing outputs D0 to D7.]
Figure 3-13. A 3-to-8 decoder circuit.
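[Diagram: four EXCLUSIVE OR gates compare A0 with B0, A1 with B1, A2 with B2, and A3 with B3; their outputs are combined into the single output A = B.]
Figure 3-14. A simple 4-bit comparator.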
[Diagram: 12 inputs, each available true and complemented (12 × 2 = 24 input signals), feed 50 AND gates (numbered 0 to 49) through the upper fuse matrix; the 50 AND gate outputs feed 6 OR gates (outputs 0 to 5) through the lower fuse matrix. Example labels: if one fuse is blown, B is not an input to AND gate 1; if another is blown, AND gate 1 is not an input to OR gate 5.]
Figure 3-15. A 12-input, 6-output programmable logic array. The little squares represent fuses that can be burned out to determine the function to be computed. The fuses are arranged in two matrices: the upper one for the AND gates and the lower one for the OR gates.
[Diagram: inputs D0 to D7, outputs S0 to S7, and a control line C selecting the shift direction.]
Figure 3-16. A 1-bit left/right shifter.
A  B  Sum  Carry
0  0  0    0
0  1  1    0
1  0  1    0
1  1  0    1
[Diagram: (b) an EXCLUSIVE OR gate computes Sum from A and B; an AND gate computes Carry.]
Figure 3-17. (a) Truth table for 1-bit addition. (b) A circuit for a half adder.
A  B  Carry in  Sum  Carry out
0  0  0         0    0
0  0  1         1    0
0  1  0         1    0
0  1  1         0    1
1  0  0         1    0
1  0  1         0    1
1  1  0         0    1
1  1  1         1    1
[Diagram: (b) a gate circuit with inputs A, B, and Carry in and outputs Sum and Carry out.]
Figure 3-18. (a) Truth table for full adder. (b) Circuit for a full adder.
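The gate equations behind Figures 3-17 and 3-18 translate directly into bitwise operators. The sketch below builds a full adder from two half adders plus an OR gate, which is one common construction that matches the full-adder truth table, and chains full adders into a ripple-carry adder. The class and method names are illustrative.

```java
// Gate-level sketch: half adder, full adder, and ripple-carry adder.
// Bits are ints 0/1; ^ is XOR, & is AND, | is OR.
class Adders {
    // Half adder: returns {sum, carry}.
    static int[] halfAdd(int a, int b) {
        return new int[] {a ^ b, a & b};
    }

    // Full adder from two half adders and an OR gate: returns {sum, carryOut}.
    static int[] fullAdd(int a, int b, int carryIn) {
        int[] h1 = halfAdd(a, b);
        int[] h2 = halfAdd(h1[0], carryIn);
        return new int[] {h2[0], h1[1] | h2[1]};
    }

    // Ripple-carry addition of two n-bit values: the carry out of bit i feeds bit i+1.
    // Returns the n-bit sum; the final carry is discarded, as in fixed-width hardware.
    static int rippleAdd(int x, int y, int n) {
        int carry = 0, sum = 0;
        for (int i = 0; i < n; i++) {
            int[] r = fullAdd((x >> i) & 1, (y >> i) & 1, carry);
            sum |= r[0] << i;
            carry = r[1];
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(rippleAdd(100, 27, 8));  // 127
        System.out.println(rippleAdd(200, 100, 8)); // 44, since 300 overflows 8 bits
    }
}
```

The loop makes the cost of rippling visible: bit i cannot finish until the carry from bit i - 1 arrives, so the delay grows with the word length.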
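[Diagram: a logical unit computing AB, A + B, and B̄ from inputs A and B, which are gated by the enable lines ENA and ENB (A can be inverted by INVA); a decoder driven by F0 and F1 selects one function; a full adder with carry in and carry out produces the sum; all feed the output.]
Figure 3-19. A 1-bit ALU.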
[Diagram: eight 1-bit ALUs with inputs A7/B7 through A0/B0 and outputs O7 through O0, sharing control lines F1 and F0; the carry chains from Carry in at the low-order slice to Carry out at the high-order slice.]
Figure 3-20. Eight 1-bit ALU slices connected to make an 8-bit ALU. The enables and invert signals are not shown for simplicity.
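The 8-bit ALU of Figure 3-20 is eight copies of the Figure 3-19 slice with the carries chained. The sketch below models each slice behaviorally. It assumes that F0F1 = 00, 01, 10, and 11 select A AND B, A OR B, NOT B, and A + B; that ENA and ENB gate the inputs; and that INVA inverts A after enabling. Treat this encoding as an assumption to be checked against the text; the class and method names are hypothetical.

```java
// Behavioral sketch of a 1-bit ALU slice and an 8-bit ALU built from eight slices.
// ASSUMPTION: F0F1 = 00 AND, 01 OR, 10 NOT B, 11 sum. Bits are ints 0/1.
class AluSlices {
    // Returns {output, carryOut} for one slice.
    static int[] slice(int a, int b, int ena, int enb, int inva, int f0, int f1, int carryIn) {
        int x = (a & ena) ^ inva;  // enable A, then optionally invert it
        int y = b & enb;           // enable B
        int f = (f0 << 1) | f1;
        switch (f) {
            case 0: return new int[] {x & y, 0};
            case 1: return new int[] {x | y, 0};
            case 2: return new int[] {1 - y, 0};
            default: {             // full adder
                int sum = x ^ y ^ carryIn;
                int carryOut = (x & y) | (carryIn & (x ^ y));
                return new int[] {sum, carryOut};
            }
        }
    }

    // Eight slices; the carry out of slice i is the carry in of slice i+1.
    static int alu8(int a, int b, int ena, int enb, int inva, int f0, int f1, int carryIn) {
        int result = 0, carry = carryIn;
        for (int i = 0; i < 8; i++) {
            int[] r = slice((a >> i) & 1, (b >> i) & 1, ena, enb, inva, f0, f1, carry);
            result |= r[0] << i;
            carry = r[1];
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(alu8(12, 10, 1, 1, 0, 1, 1, 0)); // A + B = 22
        System.out.println(alu8(12, 10, 1, 1, 0, 0, 0, 0)); // A AND B = 8
        System.out.println(alu8(12, 30, 1, 1, 1, 1, 1, 1)); // B + NOT A + 1 = B - A = 18
    }
}
```

Setting INVA and carry in to 1 with the sum function selected computes B + NOT A + 1, which is B - A in two's complement, so subtraction needs no extra hardware.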
[Diagram: (a) a clock source with a delay producing signals C1 and C2. (c) Signals A, B, and C.]
Figure 3-21. (a) A clock. (b) The timing diagram for the clock. (c) Generation of an asymmetric clock.
[Diagram: two cross-coupled NOR gates with inputs S and R and outputs Q and Q̄. (a) S = 0, R = 0, Q = 0, Q̄ = 1. (b) S = 0, R = 0, Q = 1, Q̄ = 0.]
A  B  NOR
0  0  1
0  1  0
1  0  0
1  1  0
Figure 3-22. (a) NOR latch in state 0. (b) NOR latch in state 1. (c) Truth table for NOR.
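The latch in Figure 3-22 holds its state because each NOR gate's output is an input to the other. The sketch below assumes the standard cross-coupled arrangement, Q = NOR(R, Q̄) and Q̄ = NOR(S, Q), and re-evaluates both gates until nothing changes. It walks through set, hold, reset, and hold again; the names are illustrative.

```java
// Sketch of a NOR latch: Q = NOR(R, Qbar), Qbar = NOR(S, Q), re-evaluated until
// stable. With S = R = 1 both outputs settle to 0, the input the latch is not
// meant to receive.
class NorLatch {
    boolean q = false, qBar = true;  // start in state 0, as in Figure 3-22(a)

    static boolean nor(boolean x, boolean y) { return !(x || y); }

    void apply(boolean s, boolean r) {
        for (int i = 0; i < 10; i++) {  // a few passes are enough to settle
            boolean newQ = nor(r, qBar);
            boolean newQBar = nor(s, newQ);
            if (newQ == q && newQBar == qBar) return;  // stable
            q = newQ;
            qBar = newQBar;
        }
    }

    public static void main(String[] args) {
        NorLatch latch = new NorLatch();
        latch.apply(true, false);   // set
        latch.apply(false, false);  // hold: the latch remembers 1
        System.out.println(latch.q);  // true
        latch.apply(false, true);   // reset
        latch.apply(false, false);  // hold: the latch remembers 0
        System.out.println(latch.q);  // false
    }
}
```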
[Diagram: inputs S, R, and clock; outputs Q and Q̄.]
Figure 3-23. A clocked SR latch.
[Diagram: input D (with a clock); outputs Q and Q̄.]
Figure 3-24. A clocked D latch.
[Diagram: (a) signal points a, b, c, and d, where d = b AND c and ∆ is the gate delay. (b) The waveforms at a, b, c, and d versus time.]
Figure 3-25. (a) A pulse generator. (b) Timing at four points in the circuit.
[Diagram: input D (with a clock); outputs Q and Q̄.]
Figure 3-26. A D flip-flop.
[Diagram: four symbols, (a) to (d), each with inputs D and CK and output Q.]
Figure 3-27. D latches and flip-flops.
[Diagram: (a) a 14-pin package (VCC pin 14, GND pin 7) holding two D flip-flops, each with D, CK, PR, CLR, Q, and Q̄ pins. (b) A 20-pin package (VCC pin 20, GND pin 10) holding eight D flip-flops, each with its own D and Q pins, sharing common CK and CLR lines.]
Figure 3-28. (a) Dual D flip-flop. (b) Octal flip-flop.
[Diagram: data inputs I2, I1, and I0; address lines A1 and A0 decoded into word select lines 0 to 3; twelve D flip-flops arranged as words 0 to 3 of 3 bits each, loaded through the write gate; control inputs CS, RD, and OE; outputs O1, O2, and O3 driven when output enable = CS • RD • OE.]
Figure 3-29. Logic diagram for a 4 × 3 memory. Each row is one of the four 3-bit words. A read or write operation always reads or writes a complete word.
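The 4 × 3 memory in Figure 3-29 can be modeled at the behavioral level. The sketch below assumes, following the figure's labels, that a write happens when CS is 1 and RD is 0, and that the outputs are driven only when CS, RD, and OE are all 1; it returns -1 to stand for the outputs being disconnected. The class and method names are hypothetical.

```java
// Behavioral sketch of a 4 x 3 memory: A1 A0 select one of four 3-bit words.
// ASSUMPTIONS: write when CS = 1 and RD = 0; outputs driven when CS, RD, OE all 1.
class Memory4x3 {
    private final int[] words = new int[4];  // each entry holds 3 bits

    // Returns the output lines as a 3-bit value, or -1 when the outputs are not
    // enabled (standing in for the disconnected, high-impedance state).
    int cycle(int a1, int a0, boolean cs, boolean rd, boolean oe, int dataIn) {
        int word = (a1 << 1) | a0;               // decoder picks a word select line
        if (cs && !rd) {                          // write gate
            words[word] = dataIn & 0b111;
            return -1;
        }
        if (cs && rd && oe) return words[word];  // output enable = CS and RD and OE
        return -1;
    }

    public static void main(String[] args) {
        Memory4x3 mem = new Memory4x3();
        mem.cycle(1, 0, true, false, false, 0b101);               // write 101 into word 2
        System.out.println(mem.cycle(1, 0, true, true, true, 0));  // 5
        System.out.println(mem.cycle(1, 0, true, true, false, 0)); // -1: OE low
    }
}
```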
[Diagram: a buffer with data in, data out, and a control input.]
Figure 3-30. (a) A noninverting buffer. (b) Effect of (a) when control is high. (c) Effect of (a) when control is low. (d) An inverting buffer.
[Diagram: (a) a 512K × 8 memory chip (4 Mbit) with address lines A0 to A18, data lines D0 to D7, and control lines CS, WE, and OE. (b) A 4096K × 1 memory chip (4 Mbit) with address lines A0 to A10, data line D, and control lines RAS, CAS, CS, WE, and OE.]
Figure 3-31. Two ways of organizing a 4-Mbit memory chip.
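The pin counts in Figure 3-31 follow from the word counts: 512K = 2^19 words need 19 address lines (A0 to A18), while 4096K = 2^22 one-bit words need 22 address bits, which organization (b) sends in two 11-bit halves, first strobed by RAS and then by CAS. The short sketch below does this arithmetic; the class name is illustrative.

```java
// Address-pin arithmetic for the two 4-Mbit chip organizations.
class ChipPins {
    // log2 of a power of two.
    static int log2(long x) { return 63 - Long.numberOfLeadingZeros(x); }

    public static void main(String[] args) {
        long words8 = 512L * 1024;     // (a) 512K words of 8 bits
        long words1 = 4096L * 1024;    // (b) 4096K words of 1 bit
        int pinsA = log2(words8);      // 19 -> A0..A18
        int pinsB = log2(words1) / 2;  // 22 bits sent as row + column halves -> A0..A10
        System.out.println("(a) address pins: " + pinsA + ", data pins: 8");
        System.out.println("(b) address pins: " + pinsB + ", data pins: 1");
        System.out.println(words8 * 8 == words1);  // both hold 4 Mbit: true
    }
}
```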
Type    Category     Erasure       Byte alterable  Volatile  Typical use
SRAM    Read/write   Electrical    Yes             Yes       Level 2 cache
DRAM    Read/write   Electrical    Yes             Yes       Main memory
ROM     Read-only    Not possible  No              No        Large volume appliances
PROM    Read-only    Not possible  No              No        Small volume equipment
EPROM   Read-mostly  UV light      No              No        Device prototyping
EEPROM  Read-mostly  Electrical    Yes             No        Device prototyping
Flash   Read/write   Electrical    No              No        Film for digital camera
Figure 3-32. A comparison of various memory types.
[Diagram: a typical microprocessor with pin groups for addressing, data, bus control, bus arbitration, coprocessor, status, interrupts, and miscellaneous signals, plus Φ (the symbol for the clock signal), +5v (power is 5 volts), and the symbol for electrical ground.]
Figure 3-33. The logical pinout of a generic CPU. The arrows indicate input signals and output signals. The short diagonal lines indicate that multiple pins are used. For a specific CPU, a number will be given to tell how many.
[Diagram: a CPU chip containing registers, an ALU, an on-chip bus, and a bus controller; a memory bus connects to memory, and an I/O bus connects to a disk, a modem, and a printer.]
Figure 3-34. A computer system with multiple buses.
Master       Slave        Example
CPU          Memory       Fetching instructions and data
CPU          I/O device   Initiating data transfer
CPU          Coprocessor  CPU handing instruction off to coprocessor
I/O          Memory       DMA (Direct Memory Access)
Coprocessor  CPU          Coprocessor fetching operands from CPU
Figure 3-35. Examples of bus masters and slaves.
[Figure: (a) 8088: a 20-bit address plus control lines. (b) 80286: the 20-bit address extended by 4 more address bits, with additional control lines. (c) 80386: a further 8 address bits added, again with additional control lines.]
Figure 3-36. Growth of an address bus over time.
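The point of Fig. 3-36 is that every added address line doubles the memory a CPU can reach. A small Python sketch of that arithmetic follows; the line counts (20, 24, and 32) are read off the field widths in the figure, and the names `ADDRESS_LINES`, `address_space_bytes`, and `human_readable` are hypothetical.

```python
# Address-space size implied by the width of the address bus (Fig. 3-36).
# Assumption: the field widths in the figure add up to 20 address lines on
# the 8088, 24 on the 80286, and 32 on the 80386.

ADDRESS_LINES = {"8088": 20, "80286": 20 + 4, "80386": 20 + 4 + 8}


def address_space_bytes(lines: int) -> int:
    """Each additional address line doubles the number of addressable bytes."""
    return 1 << lines


def human_readable(size: int) -> str:
    """Format a power-of-two byte count using binary units."""
    for unit in ("bytes", "KB", "MB", "GB"):
        if size < 1024:
            return f"{size} {unit}"
        size //= 1024
    return f"{size} TB"


for cpu, lines in ADDRESS_LINES.items():
    print(f"{cpu}: {lines} lines -> {human_readable(address_space_bytes(lines))}")
```

Run as is, it reports 1 MB for the 8088, 16 MB for the 80286, and 4 GB for the 80386.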
(a) [Figure: read cycle with 1 wait state over clock cycles T1, T2, and T3, showing Φ, ADDRESS (memory address to be read), DATA, MREQ, RD, and WAIT against time, annotated with TAD, TML, TM, TRL, TDS, TMH, TRH, and TDH.]
(b)
Symbol  Parameter                                   Min  Max  Unit
TAD     Address output delay                             11   nsec
TML     Address stable prior to MREQ                6         nsec
TM      MREQ delay from falling edge of Φ in T1          8    nsec
TRL     RD delay from falling edge of Φ in T1            8    nsec
TDS     Data setup time prior to falling edge of Φ  5         nsec
TMH     MREQ delay from falling edge of Φ in T3          8    nsec
TRH     RD delay from falling edge of Φ in T3            8    nsec
TDH     Data hold time from negation of RD          0         nsec
Figure 3-37. (a) Read timing on a synchronous bus. (b) Specification of some critical times.
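The limits in Fig. 3-37(b) determine how fast the memory must respond. The Python sketch below works out that budget; it assumes that each cycle begins on a rising edge of Φ, that the CPU samples DATA on the falling edge of Φ in T3 (2.5 cycles after T1 begins), and a hypothetical 25 nsec clock period that the figure does not give.

```python
# Timing budget for the read cycle of Fig. 3-37, using the limits in (b).
# Assumptions: the clock period is a free parameter; ADDRESS appears at most
# T_AD after the rising edge that begins T1; MREQ is asserted at most T_M
# after the falling edge of Φ in T1; and the CPU samples DATA on the falling
# edge of Φ in T3, so DATA must be stable T_DS before that edge.

T_AD = 11.0  # nsec, maximum address output delay
T_M = 8.0    # nsec, maximum MREQ delay from the falling edge of Φ in T1
T_DS = 5.0   # nsec, minimum data setup time before the falling edge of Φ


def budget_from_address(cycle_ns: float) -> float:
    """Nsec from a stable address to the latest moment DATA may arrive.

    The falling edge of Φ in T3 comes 2.5 cycles after T1 begins.
    """
    return 2.5 * cycle_ns - T_AD - T_DS


def budget_from_mreq(cycle_ns: float) -> float:
    """Nsec from MREQ assertion to the latest moment DATA may arrive.

    The falling edges of Φ in T1 and T3 are 2 cycles apart.
    """
    return 2.0 * cycle_ns - T_M - T_DS


cycle = 25.0  # hypothetical clock period (40 MHz), not given in the figure
print(budget_from_address(cycle), budget_from_mreq(cycle))
```

With the assumed 25 nsec cycle, the memory has 46.5 nsec from a stable address and 37 nsec from MREQ to put its data on the bus. Shortening the clock period eats directly into both numbers, which is why a slow memory needs wait states.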
[Figure: an asynchronous read showing ADDRESS (memory address to be read), MREQ, RD, MSYN, DATA, and SSYN.]
Figure 3-38. Operation of an asynchronous bus.
[Figure: (a) an arbiter with a bus request line shared by I/O devices 1 to 5 and a bus grant line daisy-chained through them; the bus grant may or may not be propagated along the chain. (b) the same arrangement with two levels: bus request level 1 and level 2, each with its own bus grant line.]
Figure 3-39. (a) A centralized one-level bus arbiter using daisy chaining. (b) The same arbiter, but with two levels.
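Behaviorally, daisy chaining makes priority a matter of position: the grant enters at the device nearest the arbiter and stops at the first device that is requesting. The Python sketch below models that rule, along with a multilevel arbiter that grants only on the highest-priority level with a pending request. Which level counts as "highest" is an assumption, so the caller supplies the levels in priority order; the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def daisy_chain_winner(requests: Sequence[bool]) -> Optional[int]:
    """Return the 1-based position of the device that keeps the grant.

    The grant enters at device 1 (nearest the arbiter). A device that is
    not requesting passes it on; the first requesting device keeps it.
    """
    for position, wants_bus in enumerate(requests, start=1):
        if wants_bus:
            return position
    return None  # no request pending, so no grant is issued


def multilevel_winner(
    levels_high_to_low: Sequence[Sequence[bool]],
) -> Optional[Tuple[int, int]]:
    """Grant only on the highest-priority level that has a request.

    Returns (index into levels_high_to_low, device position) or None.
    """
    for level_index, requests in enumerate(levels_high_to_low):
        winner = daisy_chain_winner(requests)
        if winner is not None:
            return level_index, winner
    return None


print(daisy_chain_winner([False, True, False, True, False]))
```

The example call returns 2: device 4 is also requesting but never sees the grant, because device 2 absorbs it first.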
[Figure: devices 1 to 5 share the bus request and busy lines; the arbitration line, tied to +5 V, enters each device at In and leaves at Out.]
Figure 3-40. Decentralized bus arbitration.
[Figure: cycles T1 to T7 showing Φ, ADDRESS (memory address to be read), DATA (a count followed by four data words), MREQ, RD, WAIT, and BLOCK.]
Figure 3-41. A block transfer.
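[Figure: the CPU's INT and INTA lines connect to an 8259A interrupt controller, which also has RD, WR, A0, CS, and data lines D0–D7; its interrupt inputs IR0–IR7 are wired to devices such as the clock, keyboard, disk, and printer; +5 V supply.]
Figure 3-42. Use of the 8259A interrupt controller.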
[Figure: the Pentium II SEC cartridge, with dimensions marked 14.0 cm, 6.3 cm, and 1.6 cm, holding the Pentium II processor with a 16 KB level 1 instruction cache and a 16 KB level 1 data cache, a 512 KB unified L2 cache, and a contact edge leading to the local bus.]
Figure 3-43. The Pentium II SEC package.
[Figure: the Pentium II CPU with signal groups for bus arbitration (including BPRI# and LOCK#), request (A#, ADS#, REQ#), error, snoop, response (RS#, TRDY#), and data (D#, DRDY#, DBSY#), several of them with parity and miscellaneous lines, plus RESET#, interrupts, VID, compatibility, diagnostics, initialization, power management, miscellaneous, the clock Φ, and power.]
Figure 3-44. Logical pinout of the Pentium II. Names in upper case are the official Intel names for individual signals. Names in mixed case are groups of related signals or signal descriptions.
[Figure: bus cycles T1 to T12 of the clock Φ; transactions 1 to 7 each pass through the Req, Error, Snoop, Resp, and Data phases, staggered so that the phases of successive transactions overlap.]
Figure 3-45. Pipelining requests on the Pentium II’s memory bus.
[Figure: the chip, with the pin 1 index marked.]
Figure 3-46. The UltraSPARC II CPU chip.
[Figure: the UltraSPARC II CPU with its level 1 caches and connections to the level 2 cache tags (tag address, tag valid, tag data, tag parity), the level 2 cache data (data address, data address valid, data, parity), the UPA interface to main memory (bus arbitration, memory address, address parity, address valid, wait, reply), and the UDB II memory buffer (control, 128-bit memory data, 16-bit memory ECC).]
Figure 3-47. The main features of the core of an UltraSPARC II system.
[Figure: the microJava 701 CPU with level 1 instruction (I) and data (D) caches, 16 programmable I/O lines, a PCI bus, and a memory bus to the flash PROM and main memory.]
Figure 3-48. A microJava 701 system.
[Figure: the motherboard carries the CPU and other chips, the original PC bus connector, and the new connector for the PC/AT; a plug-in board with chips has an edge connector whose contacts mate with both.]
Figure 3-49. The PC/AT bus has two components, the original PC part and the new part.
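[Figure: the CPU reaches the level 2 cache over a cache bus and the PCI bridge over the local bus; the PCI bridge connects the memory bus (main memory) and the PCI bus. Devices on the PCI bus include SCSI, USB, the ISA bridge, an IDE disk, the graphics adaptor (monitor), and an available PCI slot; the ISA bus carries a modem, sound card, printer, and an available ISA slot. A mouse and keyboard are also shown.]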
Figure 3-50. Architecture of a typical Pentium II system. The thicker buses have more bandwidth than the thinner ones.
[Figure: a PCI arbiter with a separate REQ#/GNT# pair to each of four PCI devices.]
Figure 3-51. The PCI bus uses a centralized bus arbiter.
(a)
Signal   Lines  Master  Slave  Description
CLK      1                     Clock (33 MHz or 66 MHz)
AD       32     ×       ×      Multiplexed address and data lines
PAR      1      ×              Address or data parity bit
C/BE     4      ×              Bus command/bit map for bytes enabled
FRAME#   1      ×              Indicates that AD and C/BE are asserted
IRDY#    1      ×              Read: master will accept; write: data present
IDSEL    1      ×              Select configuration space instead of memory
DEVSEL#  1              ×      Slave has decoded its address and is listening
TRDY#    1              ×      Read: data present; write: slave will accept
STOP#    1              ×      Slave wants to stop transaction immediately
PERR#    1                     Data parity error detected by receiver
SERR#    1                     Address parity error or system error detected
REQ#     1                     Bus arbitration: request for bus ownership
GNT#     1                     Bus arbitration: grant of bus ownership
RST#     1                     Reset the system and all devices

(b)
Signal   Lines  Master  Slave  Description
REQ64#   1      ×              Request to run a 64-bit transaction
ACK64#   1              ×      Permission is granted for a 64-bit transaction
AD       32     ×              Additional 32 bits of address or data
PAR64    1      ×              Parity for the extra 32 address/data bits
C/BE#    4      ×              Additional 4 bits for byte enables
LOCK     1      ×              Lock the bus to allow multiple transactions
SBO#     1                     Hit on a remote cache (for a multiprocessor)
SDONE    1                     Snooping done (for a multiprocessor)
INTx     4                     Request an interrupt
JTAG     5                     IEEE 1149.1 JTAG test signals
M66EN    1                     Wired to power or ground (66 MHz or 33 MHz)
Figure 3-52. (a) Mandatory PCI bus signals. (b) Optional PCI bus signals.
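The widths and clock rates in Fig. 3-52 fix the best case PCI can reach. The Python sketch below assumes one full-width transfer every clock cycle (a long burst with no address or turnaround cycles), uses the nominal 33 MHz and 66 MHz figures, and takes 1 MB as 10^6 bytes. The function name is hypothetical.

```python
def peak_mb_per_s(bus_bits: int, clock_mhz: int) -> int:
    """Best-case bandwidth: one full-width transfer per clock cycle."""
    bytes_per_cycle = bus_bits // 8
    return bytes_per_cycle * clock_mhz  # bytes/cycle x 10**6 cycles/s = MB/s


for bits in (32, 64):      # AD alone, or AD plus the optional 32 lines
    for mhz in (33, 66):   # the two clock rates listed for CLK
        print(f"{bits}-bit at {mhz} MHz: {peak_mb_per_s(bits, mhz)} MB/s")
```

This gives 132, 264, 264, and 528 MB/s. Real transfers come in below these numbers, because every transaction spends at least one cycle on the address phase and reads need a turnaround cycle (compare Fig. 3-53).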
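[Figure: bus cycles T1 to T7 of Φ. A read occupies T1 to T3 (address and read command on AD and C/BE#, a turnaround cycle, then data and byte enables), T4 is idle, and a write occupies T5 to T7 (address and write command, then data and enables). FRAME#, IRDY#, DEVSEL#, and TRDY# are shown beneath.]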
Figure 3-53. Examples of 32-bit PCI bus transactions. The first three cycles are used for a read operation, then an idle cycle, and then three cycles for a write operation.
[Figure: a time axis from 0 to 3 msec divided into frames 0 to 3, each beginning with an SOF packet from the root. A frame may be idle or carry a transaction such as IN, DATA (from the device), ACK, or OUT, DATA, ACK. A packet consists of SYN, PID, PAYLOAD, and CRC fields.]
Figure 3-54. The USB root hub sends out frames every 1.00 msec.
[Figure: the 8255A parallel I/O chip with CS, A0–A1 (2 lines), WR, RD, RESET, and data lines D0–D7 (8 lines) on one side and three 8-bit ports, A, B, and C, on the other.]
Figure 3-55. An 8255A PIO chip.
[Figure: a 64K address space marked in 4K steps from 0 to 64K, with the EPROM at address 0, the RAM at address 8000H, and the PIO at FFFCH.]
Figure 3-56. Location of the EPROM, RAM, and PIO in our 64K address space.
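[Figure: (a) and (b) each show address bus lines A0 to A15 feeding decoding logic that drives the CS (chip select) inputs of a 2K × 8 EPROM, a 2K × 8 RAM, and the PIO; (a) decodes the full address, (b) only part of it.]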
Figure 3-57. (a) Full address decoding. (b) Partial address decoding.
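Address decoding amounts to a function from a 16-bit address to the chip whose CS line is asserted. Below is a Python sketch of both approaches for the map in Fig. 3-56. It assumes a 2K EPROM at 0x0000, a 2K RAM at 0x8000, and a 4-register PIO at 0xFFFC. The partial scheme, which looks only at A15 and A14, is one illustrative choice and not necessarily the circuit in Fig. 3-57(b). Both function names are hypothetical.

```python
from typing import Optional


def full_decode(addr: int) -> Optional[str]:
    """Full decoding: every address bit above a chip's own range is checked."""
    if addr >> 11 == 0b00000:        # A15-A11 = 00000: 0x0000-0x07FF
        return "EPROM"
    if addr >> 11 == 0b10000:        # A15-A11 = 10000: 0x8000-0x87FF
        return "RAM"
    if addr >> 2 == 0xFFFC >> 2:     # A15-A2 all 1: 0xFFFC-0xFFFF
        return "PIO"
    return None                      # no chip selected


def partial_decode(addr: int) -> str:
    """Hypothetical partial decoding that looks only at A15 and A14."""
    a15 = (addr >> 15) & 1
    a14 = (addr >> 14) & 1
    if a15 == 0:
        return "EPROM"               # 0x0000-0x7FFF: the 2K EPROM repeats
    return "RAM" if a14 == 0 else "PIO"


print(full_decode(0x0800), partial_decode(0x0800))
```

The example shows the trade-off. Under full decoding, 0x0800 selects nothing (None). Under partial decoding, the same address selects the EPROM again, because the gates that would distinguish these addresses were left out to save hardware.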
4 THE MICROARCHITECTURE LEVEL
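[Figure: registers MAR, MDR, PC, MBR, SP, LV, CPP, TOS, OPC, and H; MAR, MDR, PC, and MBR are the memory control registers, connected to and from main memory. Control signals enable registers onto the B bus and write the C bus into registers. H feeds the ALU's A input and the B bus its B input; the ALU has 6 control lines and N and Z outputs, and its result passes through a shifter with 2 control lines onto the C bus.]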
Figure 4-1. The data path of the example microarchitecture used in this chapter.
F0  F1  ENA  ENB  INVA  INC  Function
0   1   1    0    0     0    A
0   1   0    1    0     0    B
0   1   1    0    1     0    NOT A
1   0   1    1    0     0    NOT B
1   1   1    1    0     0    A + B
1   1   1    1    0     1    A + B + 1
1   1   1    0    0     1    A + 1
1   1   0    1    0     1    B + 1
1   1   1    1    1     1    B − A
1   1   0    1    1     0    B − 1
1   1   1    0    1     1    −A
0   0   1    1    0     0    A AND B
0   1   1    1    0     0    A OR B
0   1   0    0    0     0    0
1   1   0    0    0     1    1
1   1   0    0    1     0    −1
Figure 4-2. Useful combinations of ALU signals and the function performed.
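Each row of Fig. 4-2 can be checked by simulating the ALU. The Python sketch below models a 32-bit two's-complement ALU and makes several assumptions: F0 F1 selects 00 = A AND B, 01 = A OR B, 10 = NOT B, and 11 = A + B; ENA and ENB force their input to zero when clear; INVA complements A after that gating; and INC acts as the carry into the adder. `alu` and `to_signed` are hypothetical names.

```python
MASK = 0xFFFFFFFF  # 32-bit words


def alu(f0: int, f1: int, ena: int, enb: int, inva: int, inc: int,
        a: int, b: int) -> int:
    """Return the 32-bit ALU output for one setting of the six control bits."""
    a = a & MASK if ena else 0      # ENA = 0 forces the A input to zero
    b = b & MASK if enb else 0      # ENB = 0 forces the B input to zero
    if inva:
        a = ~a & MASK               # INVA complements A after gating
    function = (f0 << 1) | f1
    if function == 0b00:
        result = a & b              # A AND B
    elif function == 0b01:
        result = a | b              # A OR B
    elif function == 0b10:
        result = ~b & MASK          # NOT B
    else:
        result = a + b + inc        # A + B, with INC as the carry in
    return result & MASK


def to_signed(word: int) -> int:
    """Interpret a 32-bit word as a two's-complement integer."""
    return word - (1 << 32) if word & 0x80000000 else word


print(to_signed(alu(1, 1, 1, 1, 1, 1, a=3, b=10)))  # row B − A
```

The model reveals the trick behind the subtraction rows. With INVA and INC both set, the adder computes NOT A + B + 1, which in two's complement is B − A. With ENA = 0 and INVA set, the inverted "zero" input is all ones, that is −1, so the adder yields B − 1 (INC = 0) or −1 (ENB = 0 too).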
[Figure: one data path cycle. After Δw the control signals are set up to drive the data path; after Δx the H and B bus are driven; after Δy the ALU and shifter have finished and the shifter output is stable; after Δz the result has propagated from the shifter to the registers, which are loaded instantaneously from the C bus and memory on the rising edge of the clock. MPC is then available, and the new MPC is used to load MIR with the next microinstruction as clock cycle 2 starts.]
Figure 4-3. Timing diagram of one data path cycle.
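[Figure: the 32-bit MAR, which counts in words, drives the 32-bit address bus, which counts in bytes, shifted left two places: the top 2 MAR bits are discarded and the two low-order address lines are 0.]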
Figure 4-4. Mapping of the bits in MAR to the address bus.
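The wiring in Fig. 4-4 amounts to a two-bit left shift. A short Python sketch, with the hypothetical name `mar_to_bus`:

```python
def mar_to_bus(mar: int) -> int:
    """Byte address placed on the 32-bit bus for a word address in MAR."""
    # Shift left 2: MAR bits 29..0 become bus bits 31..2, MAR bits 31..30
    # are discarded, and bus bits 1..0 are 0.
    return (mar << 2) & 0xFFFFFFFF


print(mar_to_bus(1))  # word 1 starts at byte 4
```

Consecutive MAR values therefore address consecutive 4-byte words, and a byte address that is not a multiple of 4 cannot be produced through MAR.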
Field  Bits  Contents (left to right)
Addr   9     NEXT_ADDRESS
JAM    3     JMPC, JAMN, JAMZ
ALU    8     SLL8, SRA1, F0, F1, ENA, ENB, INVA, INC
C      9     H, OPC, TOS, CPP, LV, SP, PC, MDR, MAR
Mem    3     WRITE, READ, FETCH
B      4     Register to enable onto the B bus
B bus registers: 0 = MDR, 1 = PC, 2 = MBR, 3 = MBRU, 4 = SP, 5 = LV, 6 = CPP, 7 = TOS, 8 = OPC, 9–15 none
Figure 4-5. The microinstruction format for the Mic-1.
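[Figure: the data path of Fig. 4-1 (registers MAR, MDR, PC, MBR, SP, LV, CPP, TOS, OPC, and H, the ALU with its 6 control lines, and the shifter with 2) driven by a control section. A 512 × 36-bit control store holds the microprogram; it is addressed by the 9-bit MPC and loads MIR, whose Addr, J, ALU, C, M, and B fields control the machine. The 4-bit B field drives a 4-to-16 decoder that selects the register enabled onto the B bus; the memory control signals (rd, wr, fetch) go to memory. N and Z are saved in 1-bit flip-flops and, through JAMN/JAMZ, set the high bit of the next MPC, while JMPC can OR in the 8 bits of MBR.]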
Figure 4-6. The complete block diagram of our example microarchitecture, the Mic-1.
[Figure: the microinstruction at address 0x75 holds Addr = 0x92 and JAM = 001 (JAMZ bit set) along with its data path control bits; it is followed by the microinstruction at either 0x92 or 0x192, depending on Z.]
Figure 4-7. A microinstruction with JAMZ set to 1 has two potential successors.
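Fig. 4-7 can be reproduced with a small model of the next-address logic. The Python sketch below assumes the following rules: the high bit of the next MPC is NEXT_ADDRESS[8] OR (JAMN AND N) OR (JAMZ AND Z), and when JMPC is set, the 8 bits of MBR are ORed into the low 8 bits of NEXT_ADDRESS. `next_mpc` is a hypothetical helper, not part of the Mic-1 description.

```python
def next_mpc(next_address: int, jmpc: int, jamn: int, jamz: int,
             n: int, z: int, mbr: int = 0) -> int:
    """Return the 9-bit address of the next microinstruction."""
    # High bit: set by NEXT_ADDRESS itself or by a taken N/Z jump.
    high = ((next_address >> 8) & 1) | (jamn & n) | (jamz & z)
    low = next_address & 0xFF
    if jmpc:
        low |= mbr & 0xFF  # multiway branch on the opcode in MBR
    return (high << 8) | low


# Fig. 4-7: Addr = 0x92 with JAMZ set, so Z picks the successor.
print(hex(next_mpc(0x92, jmpc=0, jamn=0, jamz=1, n=0, z=0)))  # 0x92
print(hex(next_mpc(0x92, jmpc=0, jamn=0, jamz=1, n=0, z=1)))  # 0x192
# goto (MBR), assuming Addr = 0x000, and goto (MBR OR 0x100), with ILOAD in MBR.
print(hex(next_mpc(0x000, jmpc=1, jamn=0, jamz=0, n=0, z=0, mbr=0x15)))  # 0x15
print(hex(next_mpc(0x100, jmpc=1, jamn=0, jamz=0, n=0, z=0, mbr=0x15)))  # 0x115
```

The last two lines match the addresses in Fig. 4-20. A plain ILOAD dispatches to iload1 at 0x15, and after a WIDE prefix the same opcode dispatches to wide_iload1 at 0x115.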
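[Figure: (a) A's local variables a1, a2, a3 (at addresses 100, 104, 108) lie between LV and SP. (b) B's local variables b1 to b4 sit above them. (c) C's local variables c1 and c2 sit above b1 to b4. (d) D's local variables d1 to d5 sit directly above a1 to a3.]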
Figure 4-8. Use of a stack for storing local variables. (a) While A is active. (b) After A calls B. (c) After B calls C. (d) After C and B return and A calls D.
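[Figure: (a) a2 is pushed onto the stack above the local variables a1 to a3. (b) a3 is pushed above it. (c) both are replaced by their sum, a2 + a3. (d) the sum is popped and stored into local variable a1, leaving a2 + a3, a2, a3 as the local variables.]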
Figure 4-9. Use of an operand stack for doing an arithmetic computation.
[Figure: the constant pool, pointed to by CPP; local variable frames 1, 2, and 3, with the current frame pointed to by LV; the current operand stack 3, whose top is pointed to by SP; and the method area, pointed into by PC.]
Figure 4-10. The various parts of the IJVM memory.
Hex   Mnemonic             Meaning
0x10  BIPUSH byte          Push byte onto stack
0x59  DUP                  Copy top word on stack and push onto stack
0xA7  GOTO offset          Unconditional branch
0x60  IADD                 Pop two words from stack; push their sum
0x7E  IAND                 Pop two words from stack; push Boolean AND
0x99  IFEQ offset          Pop word from stack and branch if it is zero
0x9B  IFLT offset          Pop word from stack and branch if it is less than zero
0x9F  IF_ICMPEQ offset     Pop two words from stack; branch if equal
0x84  IINC varnum const    Add a constant to a local variable
0x15  ILOAD varnum         Push local variable onto stack
0xB6  INVOKEVIRTUAL disp   Invoke a method
0x80  IOR                  Pop two words from stack; push Boolean OR
0xAC  IRETURN              Return from method with integer value
0x36  ISTORE varnum        Pop word from stack and store in local variable
0x64  ISUB                 Pop two words from stack; push their difference
0x13  LDC_W index          Push constant from constant pool onto stack
0x00  NOP                  Do nothing
0x57  POP                  Delete word on top of stack
0x5F  SWAP                 Swap the two top words on the stack
0xC4  WIDE                 Prefix instruction; next instruction has a 16-bit index
Figure 4-11. The IJVM instruction set. The operands byte, const, and varnum are 1 byte. The operands disp, index, and offset are 2 bytes.
[Figure: (a) the caller's local variable frame (link ptr, parameters 1 and 2, caller's local variables, previous PC, previous LV), followed by the pushed OBJREF and parameters 1 to 3, with SP at the top. (b) the stack base has moved up: LV now points to a link ptr that overwrites OBJREF, followed by parameters 1 to 3, space for the invoked method's local variables, the caller's PC, and the caller's LV, with SP at the top.]
Figure 4-12. (a) Memory before executing INVOKEVIRTUAL. (b) After executing it.
[Figure: (a) before IRETURN: the caller's frame, then the current frame starting at LV with its link ptr, parameters 1 to 3, local variables, previous PC, and previous LV, with the return value on top at SP. (b) after IRETURN: the caller's frame (link ptr, parameters 1 and 2, caller's local variables, previous PC, previous LV) with the return value on top; SP points to the return value and LV again points to the caller's link ptr.]
Figure 4-13. (a) Memory before executing IRETURN. (b) After executing it.
i = j + k;
if (i == 3)
    k = 0;
else
    j = j - 1;
(a)
 1        ILOAD j           // i = j + k
 2        ILOAD k
 3        IADD
 4        ISTORE i
 5        ILOAD i           // if (i == 3)
 6        BIPUSH 3
 7        IF_ICMPEQ L1
 8        ILOAD j           // j = j - 1
 9        BIPUSH 1
10        ISUB
11        ISTORE j
12        GOTO L2
13   L1:  BIPUSH 0          // k = 0
14        ISTORE k
15   L2:
(b)
0x15 0x02         // i = j + k
0x15 0x03
0x60
0x36 0x01
0x15 0x01         // if (i == 3)
0x10 0x03
0x9F 0x00 0x0D
0x15 0x02         // j = j - 1
0x10 0x01
0x64
0x36 0x02
0xA7 0x00 0x07
0x10 0x00         // k = 0
0x36 0x03
(c)
Figure 4-14. (a) A Java fragment. (b) The corresponding Java assembly language. (c) The IJVM program in hexadecimal.
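After instruction  Stack (top first)
0 (start)          (empty)
1                  j
2                  k, j
3                  j + k
4                  (empty)
5                  i
6                  3, i
7                  (empty)
8                  j
9                  1, j
10                 j − 1
11                 (empty)
12                 (empty)
13                 0
14                 (empty)
15                 (empty)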
Figure 4-15. The stack after each instruction of Fig. 4-14(b).
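To check Figs. 4-14 and 4-15 end to end, the hex program can be run on a tiny interpreter. The Python sketch below handles only the opcodes that appear in Fig. 4-14(c) and assumes that i, j, and k are local variables 1, 2, and 3, that branch offsets are signed 16-bit values added to the address of the branch opcode, and that execution ends once PC runs past the last byte. It models IJVM semantics, not the Mic-1 microcode; `run`, `signed8`, and `signed16` are hypothetical names.

```python
from typing import List, Optional

PROGRAM = bytes([
    0x15, 0x02, 0x15, 0x03, 0x60, 0x36, 0x01,  # i = j + k
    0x15, 0x01, 0x10, 0x03, 0x9F, 0x00, 0x0D,  # if (i == 3)
    0x15, 0x02, 0x10, 0x01, 0x64, 0x36, 0x02,  # j = j - 1
    0xA7, 0x00, 0x07,                          # goto L2
    0x10, 0x00, 0x36, 0x03,                    # L1: k = 0
])


def signed8(b: int) -> int:
    return b - 0x100 if b & 0x80 else b


def signed16(hi: int, lo: int) -> int:
    value = (hi << 8) | lo
    return value - 0x10000 if value & 0x8000 else value


def run(code: bytes, local_vars: List[int],
        trace: Optional[List[List[int]]] = None) -> List[int]:
    """Execute code and return the final local variables.

    If trace is given, the stack after each instruction is appended to it.
    """
    lv = list(local_vars)
    stack: List[int] = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == 0x15:                      # ILOAD varnum
            stack.append(lv[code[pc + 1]])
            pc += 2
        elif op == 0x36:                    # ISTORE varnum
            lv[code[pc + 1]] = stack.pop()
            pc += 2
        elif op == 0x10:                    # BIPUSH byte (sign-extended)
            stack.append(signed8(code[pc + 1]))
            pc += 2
        elif op in (0x60, 0x64):            # IADD / ISUB
            top = stack.pop()
            below = stack.pop()
            stack.append(below + top if op == 0x60 else below - top)
            pc += 1
        elif op == 0x9F:                    # IF_ICMPEQ offset
            top = stack.pop()
            below = stack.pop()
            offset = signed16(code[pc + 1], code[pc + 2])
            pc = pc + offset if below == top else pc + 3
        elif op == 0xA7:                    # GOTO offset
            pc += signed16(code[pc + 1], code[pc + 2])
        else:
            raise ValueError(f"unsupported opcode {op:#04x} at {pc}")
        if trace is not None:
            trace.append(list(stack))
    return lv


stacks: List[List[int]] = []
print(run(PROGRAM, [0, 0, 1, 2], stacks))  # slot 0 unused; i, j, k = 0, 1, 2
print(stacks)
```

With j = 1 and k = 2, i becomes 3, so the branch is taken and k is cleared. The final locals are [0, 3, 1, 0]. The traced stacks follow instructions 1 to 7 and then 13 and 14 of Fig. 4-14(b): [1], [1, 2], [3], [], [3], [3, 3], [], [0], [] (bottom of stack first in this list).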
DEST = H
DEST = SOURCE
DEST = NOT H
DEST = NOT SOURCE
DEST = H + SOURCE
DEST = H + SOURCE + 1
DEST = H + 1
DEST = SOURCE + 1
DEST = SOURCE − H
DEST = SOURCE − 1
DEST = −H
DEST = H AND SOURCE
DEST = H OR SOURCE
DEST = 0
DEST = 1
DEST = −1
Figure 4-16. All permitted operations. Any of the above operations may be extended by adding “<< 8” to them to shift the result left by 1 byte. For example, a common operation is H = MBR << 8.
Label            Operations                               Comments
Main1            PC = PC + 1; fetch; goto (MBR)           MBR holds opcode; get next byte; dispatch
nop1             goto Main1                               Do nothing
iadd1            MAR = SP = SP − 1; rd                    Read in next-to-top word on stack
iadd2            H = TOS                                  H = top of stack
iadd3            MDR = TOS = MDR + H; wr; goto Main1      Add top two words; write to top of stack
isub1            MAR = SP = SP − 1; rd                    Read in next-to-top word on stack
isub2            H = TOS                                  H = top of stack
isub3            MDR = TOS = MDR − H; wr; goto Main1      Do subtraction; write to top of stack
iand1            MAR = SP = SP − 1; rd                    Read in next-to-top word on stack
iand2            H = TOS                                  H = top of stack
iand3            MDR = TOS = MDR AND H; wr; goto Main1    Do AND; write to new top of stack
ior1             MAR = SP = SP − 1; rd                    Read in next-to-top word on stack
ior2             H = TOS                                  H = top of stack
ior3             MDR = TOS = MDR OR H; wr; goto Main1     Do OR; write to new top of stack
dup1             MAR = SP = SP + 1                        Increment SP and copy to MAR
dup2             MDR = TOS; wr; goto Main1                Write new stack word
pop1             MAR = SP = SP − 1; rd                    Read in next-to-top word on stack
pop2                                                      Wait for new TOS to be read from memory
pop3             TOS = MDR; goto Main1                    Copy new word to TOS
swap1            MAR = SP − 1; rd                         Set MAR to SP − 1; read 2nd word from stack
swap2            MAR = SP                                 Set MAR to top word
swap3            H = MDR; wr                              Save TOS in H; write 2nd word to top of stack
swap4            MDR = TOS                                Copy old TOS to MDR
swap5            MAR = SP − 1; wr                         Set MAR to SP − 1; write as 2nd word on stack
swap6            TOS = H; goto Main1                      Update TOS
bipush1          SP = MAR = SP + 1                        MBR = the byte to push onto stack
bipush2          PC = PC + 1; fetch                       Increment PC, fetch next opcode
bipush3          MDR = TOS = MBR; wr; goto Main1          Sign-extend constant and push on stack
iload1           H = LV                                   MBR contains index; copy LV to H
iload2           MAR = MBRU + H; rd                       MAR = address of local variable to push
iload3           MAR = SP = SP + 1                        SP points to new top of stack; prepare write
iload4           PC = PC + 1; fetch; wr                   Inc PC; get next opcode; write top of stack
iload5           TOS = MDR; goto Main1                    Update TOS
istore1          H = LV                                   MBR contains index; Copy LV to H
istore2          MAR = MBRU + H                           MAR = address of local variable to store into
istore3          MDR = TOS; wr                            Copy TOS to MDR; write word
istore4          SP = MAR = SP − 1; rd                    Read in next-to-top word on stack
istore5          PC = PC + 1; fetch                       Increment PC; fetch next opcode
istore6          TOS = MDR; goto Main1                    Update TOS
wide1            PC = PC + 1; fetch; goto (MBR OR 0x100)  Multiway branch with high bit set
wide_iload1      PC = PC + 1; fetch                       MBR contains 1st index byte; fetch 2nd
wide_iload2      H = MBRU << 8                            H = 1st index byte shifted left 8 bits
wide_iload3      H = MBRU OR H                            H = 16-bit index of local variable
wide_iload4      MAR = LV + H; rd; goto iload3            MAR = address of local variable to push
wide_istore1     PC = PC + 1; fetch                       MBR contains 1st index byte; fetch 2nd
wide_istore2     H = MBRU << 8                            H = 1st index byte shifted left 8 bits
wide_istore3     H = MBRU OR H                            H = 16-bit index of local variable
wide_istore4     MAR = LV + H; goto istore3               MAR = address of local variable to store into
ldc_w1           PC = PC + 1; fetch                       MBR contains 1st index byte; fetch 2nd
ldc_w2           H = MBRU << 8                            H = 1st index byte << 8
ldc_w3           H = MBRU OR H                            H = 16-bit index into constant pool
ldc_w4           MAR = H + CPP; rd; goto iload3           MAR = address of constant in pool
Figure 4-17. The microprogram for the Mic-1 (part 1 of 3).
Label            Operations                               Comments
iinc1            H = LV                                   MBR contains index; Copy LV to H
iinc2            MAR = MBRU + H; rd                       Copy LV + index to MAR; Read variable
iinc3            PC = PC + 1; fetch                       Fetch constant
iinc4            H = MDR                                  Copy variable to H
iinc5            PC = PC + 1; fetch                       Fetch next opcode
iinc6            MDR = MBR + H; wr; goto Main1            Put sum in MDR; update variable
goto1            OPC = PC − 1                             Save address of opcode
goto2            PC = PC + 1; fetch                       MBR = 1st byte of offset; fetch 2nd byte
goto3            H = MBR << 8                             Shift and save signed first byte in H
goto4            H = MBRU OR H                            H = 16-bit branch offset
goto5            PC = OPC + H; fetch                      Add offset to OPC
goto6            goto Main1                               Wait for fetch of next opcode
iflt1            MAR = SP = SP − 1; rd                    Read in next-to-top word on stack
iflt2            OPC = TOS                                Save TOS in OPC temporarily
iflt3            TOS = MDR                                Put new top of stack in TOS
iflt4            N = OPC; if (N) goto T; else goto F      Branch on N bit
ifeq1            MAR = SP = SP − 1; rd                    Read in next-to-top word of stack
ifeq2            OPC = TOS                                Save TOS in OPC temporarily
ifeq3            TOS = MDR                                Put new top of stack in TOS
ifeq4            Z = OPC; if (Z) goto T; else goto F      Branch on Z bit
if_icmpeq1       MAR = SP = SP − 1; rd                    Read in next-to-top word of stack
if_icmpeq2       MAR = SP = SP − 1                        Set MAR to read in new top-of-stack
if_icmpeq3       H = MDR; rd                              Copy second stack word to H
if_icmpeq4       OPC = TOS                                Save TOS in OPC temporarily
if_icmpeq5       TOS = MDR                                Put new top of stack in TOS
if_icmpeq6       Z = OPC − H; if (Z) goto T; else goto F  If top 2 words are equal, goto T, else goto F
T                OPC = PC − 1; fetch; goto goto2          Same as goto1; needed for target address
F                PC = PC + 1                              Skip first offset byte
F2               PC = PC + 1; fetch                       PC now points to next opcode
F3               goto Main1                               Wait for fetch of opcode
invokevirtual1   PC = PC + 1; fetch                       MBR = index byte 1; inc. PC, get 2nd byte
invokevirtual2   H = MBRU << 8                            Shift and save first byte in H
invokevirtual3   H = MBRU OR H                            H = offset of method pointer from CPP
invokevirtual4   MAR = CPP + H; rd                        Get pointer to method from CPP area
invokevirtual5   OPC = PC + 1                             Save Return PC in OPC temporarily
invokevirtual6   PC = MDR; fetch                          PC points to new method; get param count
invokevirtual7   PC = PC + 1; fetch                       Fetch 2nd byte of parameter count
invokevirtual8   H = MBRU << 8                            Shift and save first byte in H
invokevirtual9   H = MBRU OR H                            H = number of parameters
invokevirtual10  PC = PC + 1; fetch                       Fetch first byte of # locals
invokevirtual11  TOS = SP − H                             TOS = address of OBJREF − 1
invokevirtual12  TOS = MAR = TOS + 1                      TOS = address of OBJREF (new LV)
invokevirtual13  PC = PC + 1; fetch                       Fetch second byte of # locals
invokevirtual14  H = MBRU << 8                            Shift and save first byte in H
invokevirtual15  H = MBRU OR H                            H = # locals
invokevirtual16  MDR = SP + H + 1; wr                     Overwrite OBJREF with link pointer
invokevirtual17  MAR = SP = MDR                           Set SP, MAR to location to hold old PC
invokevirtual18  MDR = OPC; wr                            Save old PC above the local variables
invokevirtual19  MAR = SP = SP + 1                        SP points to location to hold old LV
invokevirtual20  MDR = LV; wr                             Save old LV above saved PC
invokevirtual21  PC = PC + 1; fetch                       Fetch first opcode of new method
invokevirtual22  LV = TOS; goto Main1                     Set LV to point to LV Frame
Figure 4-17. The microprogram for the Mic-1 (part 2 of 3).
Label            Operations                               Comments
ireturn1         MAR = SP = LV; rd                        Reset SP, MAR to get link pointer
ireturn2                                                  Wait for read
ireturn3         LV = MAR = MDR; rd                       Set LV to link ptr; get old PC
ireturn4         MAR = LV + 1                             Set MAR to read old LV
ireturn5         PC = MDR; rd; fetch                      Restore PC; fetch next opcode
ireturn6         MAR = SP                                 Set MAR to write TOS
ireturn7         LV = MDR                                 Restore LV
ireturn8         MDR = TOS; wr; goto Main1                Save return value on original top of stack
Figure 4-17. The microprogram for the Mic-1 (part 3 of 3).
[Figure: the opcode BIPUSH (0x10) followed by a one-byte operand, BYTE.]
Figure 4-18. The BIPUSH instruction format.
[Figure: (a) ILOAD (0x15) followed by a one-byte INDEX. (b) WIDE (0xC4), then ILOAD (0x15), then INDEX BYTE 1 and INDEX BYTE 2.]
Figure 4-19. (a) ILOAD with a 1-byte index. (b) WIDE ILOAD with a 2-byte index.
[Figure: a control store spanning addresses 0x00 to 0x1FF. ILOAD executes Main1 (at 0x100) and then iload1 (at 0x15); WIDE ILOAD executes Main1, then wide1 (at 0xC4), then wide_iload1 (at 0x115).]
Figure 4-20. The initial microinstruction sequence for ILOAD and WIDE ILOAD. The addresses are examples.
[Figure: the opcode IINC (0x84) followed by an INDEX byte and a CONST byte.]
Figure 4-21. The IINC instruction has two different operand fields.
[Figure: memory bytes n (GOTO, 0xA7), n + 1 (OFFSET BYTE 1), n + 2 (OFFSET BYTE 2), and n + 3, with the registers PC, OPC, MBR, and H at the start of each microinstruction. (a) Main1: PC = n, MBR = 0xA7. (b) goto1: PC = n + 1, MBR = OFFSET BYTE 1. (c) goto2: PC = n + 1, OPC = n, MBR = OFFSET BYTE 1. (d) goto3: PC = n + 2, OPC = n, MBR = OFFSET BYTE 1. (e) goto4: PC = n + 2, OPC = n, MBR = OFFSET BYTE 2, H = OFFSET 1 << 8.]
Figure 4-22. The situation at the start of various microinstructions. (a) Main1 . (b) goto1. (c) goto2. (d) goto3. (e) goto4.
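The register contents in Fig. 4-22 come down to one piece of arithmetic: the target is the opcode's own address plus a signed 16-bit offset assembled from two bytes. A Python sketch of what goto3 to goto5 compute, using the hypothetical name `goto_target` (Python integers do not wrap at 32 bits, unlike the real registers):

```python
def goto_target(opc: int, offset_byte1: int, offset_byte2: int) -> int:
    """Branch target computed by goto3 to goto5 of the Mic-1 microprogram."""
    # goto3: H = MBR << 8; MBR sign-extends, so the first byte is signed.
    h = (offset_byte1 - 0x100 if offset_byte1 & 0x80 else offset_byte1) << 8
    # goto4: H = MBRU OR H; the second byte is zero-extended.
    h |= offset_byte2
    # goto5: PC = OPC + H, where OPC holds n, the address of the GOTO opcode.
    return opc + h


print(goto_target(21, 0x00, 0x07))  # the GOTO L2 of Fig. 4-14(c)
```

The example returns 28, which is label L2 in Fig. 4-14. Sign-extending only the high byte (MBR) while zero-extending the low byte (MBRU) is what makes a backward offset such as 0xFF 0xFC come out as −4.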
Label            Operations                               Comments
pop1             MAR = SP = SP − 1; rd                    Read in next-to-top word on stack
pop2                                                      Wait for new TOS to be read from memory
pop3             TOS = MDR; goto Main1                    Copy new word to TOS
Main1            PC = PC + 1; fetch; goto (MBR)           MBR holds opcode; get next byte; dispatch
Figure 4-23. New microprogram sequence for executing POP.
Label            Operations                               Comments
pop1             MAR = SP = SP − 1; rd                    Read in next-to-top word on stack
Main1.pop        PC = PC + 1; fetch                       MBR holds opcode; fetch next byte
pop3             TOS = MDR; goto (MBR)                    Copy new word to TOS; dispatch on opcode
Figure 4-24. Enhanced microprogram sequence for executing POP.
Label            Operations                               Comments
iload1           H = LV                                   MBR contains index; Copy LV to H
iload2           MAR = MBRU + H; rd                       MAR = address of local variable to push
iload3           MAR = SP = SP + 1                        SP points to new top of stack; prepare write
iload4           PC = PC + 1; fetch; wr                   Inc PC; get next opcode; write top of stack
iload5           TOS = MDR; goto Main1                    Update TOS
Main1            PC = PC + 1; fetch; goto (MBR)           MBR holds opcode; get next byte; dispatch
Figure 4-25. Mic-1 code for executing ILOAD.
Label            Operations                               Comments
iload1           MAR = MBRU + LV; rd                      MAR = address of local variable to push
iload2           MAR = SP = SP + 1                        SP points to new top of stack; prepare write
iload3           PC = PC + 1; fetch; wr                   Inc PC; get next opcode; write top of stack
iload4           TOS = MDR                                Update TOS
iload5           PC = PC + 1; fetch; goto (MBR)           MBR already holds opcode; fetch index byte
Figure 4-26. Three-bus code for executing ILOAD.
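[Diagram components: IMAR with a +1 incrementer; a shift register loaded from memory; MBR1 and MBR2 driving the B bus; PC with a +1, 2 incrementer, written from the C bus; a connection labeled "2 low-order bits".]
Figure 4-27. A fetch unit for the Mic-1.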
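[State diagram: states 0 to 6, the number of bytes held in the shift register. A Word fetched transition moves from state n to n + 4 (possible from states 0, 1, and 2); an MBR1 transition moves from n to n − 1; an MBR2 transition moves from n to n − 2.]
Transitions: MBR1 occurs when MBR1 is read. MBR2 occurs when MBR2 is read. Word fetched occurs when a memory word is read and 4 bytes are put into the shift register.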
Figure 4-28. A finite state machine for implementing the IFU.
[Diagram: memory control registers MAR and MDR and the instruction fetch unit (IFU) connect to main memory; registers PC, MBR, MBR2, SP, LV, CPP, TOS, OPC, and H; control signals enable registers onto the B bus and write the C bus to registers; H drives the A bus; 6 ALU control lines, the ALU, and the shifter; N and Z outputs.]
Figure 4-29. The datapath for Mic-2.
Label          Operations                               Comments
nop1           goto (MBR)                               Branch to next instruction
iadd1          MAR = SP = SP − 1; rd                    Read in next-to-top word on stack
iadd2          H = TOS                                  H = top of stack
iadd3          MDR = TOS = MDR + H; wr; goto (MBR1)     Add top two words; write to new top of stack
isub1          MAR = SP = SP − 1; rd                    Read in next-to-top word on stack
isub2          H = TOS                                  H = top of stack
isub3          MDR = TOS = MDR − H; wr; goto (MBR1)     Subtract TOS from fetched TOS-1
iand1          MAR = SP = SP − 1; rd                    Read in next-to-top word on stack
iand2          H = TOS                                  H = top of stack
iand3          MDR = TOS = MDR AND H; wr; goto (MBR1)   AND fetched TOS-1 with TOS
ior1           MAR = SP = SP − 1; rd                    Read in next-to-top word on stack
ior2           H = TOS                                  H = top of stack
ior3           MDR = TOS = MDR OR H; wr; goto (MBR1)    OR fetched TOS-1 with TOS
dup1           MAR = SP = SP + 1                        Increment SP; copy to MAR
dup2           MDR = TOS; wr; goto (MBR1)               Write new stack word
pop1           MAR = SP = SP − 1; rd                    Read in next-to-top word on stack
pop2                                                    Wait for read
pop3           TOS = MDR; goto (MBR1)                   Copy new word to TOS
swap1          MAR = SP − 1; rd                         Read 2nd word from stack; set MAR to SP
swap2          MAR = SP                                 Prepare to write new 2nd word
swap3          H = MDR; wr                              Save new TOS; write 2nd word to stack
swap4          MDR = TOS                                Copy old TOS to MDR
swap5          MAR = SP − 1; wr                         Write old TOS to 2nd place on stack
swap6          TOS = H; goto (MBR1)                     Update TOS
bipush1        SP = MAR = SP + 1                        Set up MAR for writing to new top of stack
bipush2        MDR = TOS = MBR1; wr; goto (MBR1)        Update stack in TOS and memory
iload1         MAR = LV + MBR1U; rd                     Move LV + index to MAR; read operand
iload2         MAR = SP = SP + 1                        Increment SP; move new SP to MAR
iload3         TOS = MDR; wr; goto (MBR1)               Update stack in TOS and memory
istore1        MAR = LV + MBR1U                         Set MAR to LV + index
istore2        MDR = TOS; wr                            Copy TOS for storing
istore3        MAR = SP = SP − 1; rd                    Decrement SP; read new TOS
istore4                                                 Wait for read
istore5        TOS = MDR; goto (MBR1)                   Update TOS
wide1          goto (MBR1 OR 0x100)                     Next address is 0x100 ORed with opcode
wide_iload1    MAR = LV + MBR2U; rd; goto iload2        Identical to iload1 but using 2-byte index
wide_istore1   MAR = LV + MBR2U; goto istore2           Identical to istore1 but using 2-byte index
ldc_w1         MAR = CPP + MBR2U; rd; goto iload2       Same as wide_iload1 but indexing off CPP
iinc1          MAR = LV + MBR1U; rd                     Set MAR to LV + index for read
iinc2          H = MBR1                                 Set H to constant
iinc3          MDR = MDR + H; wr; goto (MBR1)           Increment by constant and update
goto1          H = PC − 1                               Copy PC to H
goto2          PC = H + MBR2                            Add offset and update PC
goto3                                                   Have to wait for IFU to fetch new opcode
goto4          goto (MBR1)                              Dispatch to next instruction
iflt1          MAR = SP = SP − 1; rd                    Read in next-to-top word on stack
iflt2          OPC = TOS                                Save TOS in OPC temporarily
iflt3          TOS = MDR                                Put new top of stack in TOS
iflt4          N = OPC; if (N) goto T; else goto F      Branch on N bit
Figure 4-30. The microprogram for the Mic-2 (part 1 of 2).
Label            Operations                                Comments
ifeq1            MAR = SP = SP − 1; rd                     Read in next-to-top word of stack
ifeq2            OPC = TOS                                 Save TOS in OPC temporarily
ifeq3            TOS = MDR                                 Put new top of stack in TOS
ifeq4            Z = OPC; if (Z) goto T; else goto F       Branch on Z bit
if_icmpeq1       MAR = SP = SP − 1; rd                     Read in next-to-top word of stack
if_icmpeq2       MAR = SP = SP − 1                         Set MAR to read in new top-of-stack
if_icmpeq3       H = MDR; rd                               Copy second stack word to H
if_icmpeq4       OPC = TOS                                 Save TOS in OPC temporarily
if_icmpeq5       TOS = MDR                                 Put new top of stack in TOS
if_icmpeq6       Z = H − OPC; if (Z) goto T; else goto F   If top 2 words are equal, goto T, else goto F
T                H = PC − 1; goto goto2                    Same as goto1
F                H = MBR2                                  Touch bytes in MBR2 to discard
F2               goto (MBR1)
invokevirtual1   MAR = CPP + MBR2U; rd                     Put address of method pointer in MAR
invokevirtual2   OPC = PC                                  Save return PC in OPC
invokevirtual3   PC = MDR                                  Set PC to 1st byte of method code
invokevirtual4   TOS = SP − MBR2U                          TOS = address of OBJREF − 1
invokevirtual5   TOS = MAR = H = TOS + 1                   TOS = address of OBJREF
invokevirtual6   MDR = SP + MBR2U + 1; wr                  Overwrite OBJREF with link pointer
invokevirtual7   MAR = SP = MDR                            Set SP, MAR to location to hold old PC
invokevirtual8   MDR = OPC; wr                             Prepare to save old PC
invokevirtual9   MAR = SP = SP + 1                         Inc. SP to point to location to hold old LV
invokevirtual10  MDR = LV; wr                              Save old LV
invokevirtual11  LV = TOS; goto (MBR1)                     Set LV to point to zeroth parameter
ireturn1         MAR = SP = LV; rd                         Reset SP, MAR to read link ptr
ireturn2                                                   Wait for link ptr
ireturn3         LV = MAR = MDR; rd                        Set LV, MAR to link ptr; read old PC
ireturn4         MAR = LV + 1                              Set MAR to point to old LV; read old LV
ireturn5         PC = MDR; rd                              Restore PC
ireturn6         MAR = SP
ireturn7         LV = MDR                                  Restore LV
ireturn8         MDR = TOS; wr; goto (MBR1)                Save return value on original top of stack
Figure 4-30. The microprogram for the Mic-2 (part 2 of 2).
[Diagram: as in the Mic-2 datapath, with memory control registers MAR and MDR, the IFU, and registers PC, MBR1, MBR2, SP, LV, CPP, TOS, OPC, and H on the A, B, and C buses, plus an A latch, a B latch, and a C latch around the ALU and shifter; 6 ALU control lines; N and Z outputs.]
Figure 4-31. The three-bus data path used in the Mic-3.
Label   Operations              Comments
swap1   MAR = SP − 1; rd        Read 2nd word from stack; set MAR to SP
swap2   MAR = SP                Prepare to write new 2nd word
swap3   H = MDR; wr             Save new TOS; write 2nd word to stack
swap4   MDR = TOS               Copy old TOS to MDR
swap5   MAR = SP − 1; wr        Write old TOS to 2nd place on stack
swap6   TOS = H; goto (MBR1)    Update TOS
Figure 4-32. The Mic-2 code for SWAP.
Cy   Swap1         Swap2    Swap3      Swap4     Swap5         Swap6
     MAR=SP−1;rd   MAR=SP   H=MDR;wr   MDR=TOS   MAR=SP−1;wr   TOS=H;goto (MBR1)
1    B=SP
2    C=B−1         B=SP
3    MAR=C; rd     C=B
4    MDR=mem       MAR=C
5                           B=MDR
6                           C=B        B=TOS
7                           H=C; wr    C=B       B=SP
8                           Mem=MDR    MDR=C     C=B−1         B=H
9                                                MAR=C; wr     C=B
10                                               Mem=MDR       TOS=C
11                                                             goto (MBR1)
Figure 4-33. The implementation of SWAP on the Mic-3.
[Diagram: instructions 1 to 4 moving through the pipeline stages (IFU; registers onto the A and B buses; ALU and shifter; result written back over the C bus) during cycles 1 to 4; time runs from left to right.]
Figure 4-34. Graphical illustration of how a pipeline works.
[Diagram: the instruction fetch unit (from memory) feeds the decoding unit, which produces the IJVM length and a micro-op ROM index; the queueing unit copies micro-operations from the micro-operation ROM (sequences such as IADD, ISUB, ILOAD, IFLT, with Final and Goto bits) into a queue of pending micro-ops; MIR1 to MIR4, each with ALU, C, M, A, and B fields, drive stages 4 to 7 of a datapath with registers, A, B, and C buses, an ALU, a shifter, and a path to/from memory; stages are numbered 1 to 7.]
Figure 4-35. The main components of the Mic-4.
1 IFU → 2 Decoder → 3 Queue → 4 Operands → 5 Exec → 6 Write back → 7 Memory
Figure 4-36. The Mic-4 pipeline.
[Diagram: the CPU package contains the CPU chip, with split L1 instruction (L1-I) and data (L1-D) caches, and a unified L2 cache; the processor board holds a board-level cache (SRAM) serving as the unified L3 cache, main memory (DRAM), and the keyboard, graphics, and disk controllers.]
Figure 4-37. A system with three levels of cache.
Entry   Valid   Tag   Data   Addresses that use this entry
2047                         65504-65535, 131040-131071, …
…
7
6
5
4
3                            96-127, 65632-65663, 131168-131199, …
2                            64-95, 65600-65631, 131136-131167, …
1                            32-63, 65568-65599, 131104-131135, …
0                            0-31, 65536-65567, 131072-131103, …
(a)
Bits:   16     11     3      2
        TAG    LINE   WORD   BYTE
(b)
Figure 4-38. (a) A direct-mapped cache. (b) A 32-bit virtual address.
[Diagram: entries 0 to 2047, each holding four slots, Entry A, Entry B, Entry C, and Entry D, and each slot with its own Valid bit, Tag, and Data.]
Figure 4-39. A four-way associative cache.
if (i == 0) k = 1; else k = 2;
(a)
      CMP i,0      ; compare i to 0
      BNE Else     ; branch to Else if not equal
Then: MOV k,1      ; move 1 to k
      BR Next      ; unconditional branch to Next
Else: MOV k,2      ; move 2 to k
Next:
(b)
Figure 4-40. (a) A program fragment. (b) Its translation to a generic assembly language.
[Diagram: three branch history tables, each a column of slots numbered from 0. (a) Each slot holds Valid, Branch/no branch, and Branch address/tag. (b) Each slot holds Valid, Prediction bits, and Branch address/tag. (c) Each slot holds Valid, Prediction bits, Branch address/tag, and Target address.]
Figure 4-41. (a) A 1-bit branch history. (b) A 2-bit branch history. (c) A mapping between branch instruction address and target address.
[State diagram: four states, 00 Predict no branch, 01 Predict no branch one more time, 10 Predict branch one more time, and 11 Predict branch, connected by transitions labeled Branch and No branch.]
Figure 4-42. A 2-bit finite-state machine for branch prediction.
[Table: cycle-by-cycle trace (Cy) of the instructions decoded, issued (Iss), and retired (Ret), with counters of registers being read and registers being written for R0 to R7. Instruction sequence: 1 R3=R0*R1, 2 R4=R0+R2, 3 R5=R0+R1, 4 R6=R1+R4, 5 R7=R1*R2, 6 R1=R0−R2, 7 R3=R3*R1, 8 R1=R4+R4.]
Figure 4-43. Operation of a superscalar CPU with in-order issue and in-order completion.
[Table: cycle-by-cycle trace (Cy) of the instructions decoded, issued (Iss), and retired (Ret), with counters of registers being read and registers being written for R0 to R7. Instruction sequence: 1 R3=R0*R1, 2 R4=R0+R2, 3 R5=R0+R1, 4 R6=R1+R4, 5 R7=R1*R2, 6 S1=R0−R2, 7 R3=R3*S1, 8 S2=R4+R4 (S1 and S2 take the place of R1 as destinations).]
Figure 4-44. Operation of a superscalar CPU with out-of-order issue and out-of-order completion.
evensum = 0;
oddsum = 0;
i = 0;
while (i < limit) {
    k = i * i * i;
    if (((i/2) * 2) == i)
        evensum = evensum + k;
    else
        oddsum = oddsum + k;
    i = i + 1;
}
(a)
[Basic block graph: the initialization statements; the loop test while (i < limit), exiting when i >= limit; the block k = i * i * i with the evenness test, whose T and F edges lead to evensum = evensum + k and oddsum = oddsum + k; and i = i + 1, which returns to the loop test.]
(b)
Figure 4-45. (a) A program fragment. (b) The corresponding basic block graph.
[Diagram: the bus interface unit connects to the level 2 cache and to the local bus to the PCI bridge, and feeds the level 1 I-cache and level 1 D-cache; the Fetch/Decode unit, Dispatch/Execute unit, and Retire unit communicate through the micro-operation pool (ROB).]
Figure 4-46. The Pentium II microarchitecture.
[Diagram, by pipeline stage: IFU0, cache line fetcher and next IP, fed from the level 1 I-cache; IFU1, instruction length decoder and dynamic branch predictor; IFU2, instruction aligner; ID0, decoders 0, 1, and 2; ID1, micro-operation queuer, with the micro-operation sequencer and static branch predictor; RAT, register allocator; micro-operations go in the ROB.]
Figure 4-47. Internal structure of the Fetch/Decode unit (simplified).
Reservation station (from/to ROB), dispatching to:
Port 0: MMX execution unit, floating-point execution unit, integer execution unit
Port 1: MMX execution unit, floating-point execution unit, integer execution unit
Port 2: load unit (loads)
Port 3: store unit (stores)
Port 4: store unit (stores)
Figure 4-48. The Dispatch/Execute unit.
[Diagram: the memory interface unit (to main memory) and the external cache unit (level 2 cache) feed the prefetch/dispatch unit with its level 1 cache; grouping logic sends instructions to the integer execution unit (integer registers, two ALUs), the floating-point unit (FP registers, two FP ALUs, graphics unit), and the load/store unit (level 1 D-cache, load store, store queue).]
Figure 4-49. The UltraSPARC II microarchitecture.
Fetch → Decode → Group → [integer pipeline: Execute → Cache → N1 → N2 | floating-point/graphics pipeline: Register → X1 → X2 → X3] → N3 → Write
Figure 4-50. The UltraSPARC II’s pipeline.
[Block diagram: memory and I/O bus interface unit; 0-16 KB instruction cache; 0-16 KB data cache; prefetch, decode, and folding unit; execution control unit; integer and floating-point unit; 64 32-bit registers for holding the top 64 words of the stack; internal paths 32, 2 x 32, and 3 x 32 bits wide.]
Figure 4-51. The block diagram of the picoJava II with both level 1 caches and the floating-point unit. This is the configuration of the microJava 701.
Fetch from I-cache → Decode and fold → Fetch operands from stack → Execute instruction → Access data cache → Write results to stack
Figure 4-52. The picoJava II has a six-stage pipeline.
[Diagram: (a) without folding, the stack (slots 0 to 8, with SP marked) at Start, After ILOAD k, After ILOAD m, After IADD, and After ISTORE n, holding the local variables k, m, and n and the intermediate value k+m; (b) with folding, the stack at Start and After folded instruction.]
Figure 4-53. (a) Execution of a four-instruction sequence to compute n = k + m. (b) The same sequence folded to one instruction.
Group   Description                                     Example
NF      Nonfoldable instructions                        GOTO
LV      Pushing a word onto the stack                   ILOAD
MEM     Popping a word and storing it in memory         ISTORE
BG1     Operations using one stack operand              IFEQ
BG2     Operations using two stack operands             IF_ICMPEQ
OP      Computations on two operands with one result    IADD
Figure 4-54. JVM instruction groups for folding purposes.
Instruction sequence   Example
LV LV OP MEM           ILOAD, ILOAD, IADD, ISTORE
LV LV OP               ILOAD, ILOAD, IADD
LV LV BG2              ILOAD, ILOAD, IF_ICMPEQ
LV BG1                 ILOAD, IFEQ
LV BG2                 ILOAD, IF_ICMPEQ
LV MEM                 ILOAD, ISTORE
OP MEM                 IADD, ISTORE
Figure 4-55. Some of the JVM instruction sequences that can be folded.
5 THE INSTRUCTION SET ARCHITECTURE LEVEL
[Diagram: a FORTRAN 90 program and a C program are each compiled to an ISA program (software); the ISA program is executed by microprogram or hardware (hardware); the ISA level lies between the two.]
Figure 5-1. The ISA level is the interface between the compilers and the hardware.
[Diagram: a little-endian memory drawn as rows of 8 bytes, with row addresses 0, 8, 16, and 24 and byte addresses within each row. (a) An aligned 8-byte word at address 8, occupying bytes 8 to 15. (b) A nonaligned 8-byte word at address 12, occupying bytes 12 to 19.]
Figure 5-2. An 8-byte word in a little-endian memory. (a) Aligned. (b) Not aligned. Some machines require that words in memory be aligned.
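32-bit general registers; in the first four, the low 16 bits (AX, BX, CX, DX) are split into two 8-bit halves:
EAX   AX = AH | AL
EBX   BX = BH | BL
ECX   CX = CH | CL
EDX   DX = DH | DL
ESI   EDI   EBP   ESP
CS   SS   DS   ES   FS   GS
EIP
EFLAGS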
Figure 5-3. The Pentium II’s primary registers.
Register   Alt. name   Function
R0         G0          Hardwired to 0. Stores into it are just ignored.
R1–R7      G1–G7       Holds global variables
R8–R13     O0–O5       Holds parameters to the procedure being called
R14        SP          Stack pointer
R15        O7          Scratch register
R16–R23    L0–L7       Holds local variables for the current procedure
R24–R29    I0–I5       Holds incoming parameters
R30        FP          Pointer to the base of the current stack frame
R31        I7          Holds return address for the current procedure
Figure 5-4. The UltraSPARC II’s general registers.
[Diagram: two register windows, CWP = 7 and CWP = 6, each showing R0 to R31 with their alternative names. Both windows share the globals R0 to R7 (G0 to G7, Global 0 to Global 7). Each window has its own locals R16 to R23 (L0 to L7, Local 0 to Local 7). The outgoing parameters, stack pointer, and temporary R8 to R15 (O0 to O5, SP, O7) of one window overlap the incoming parameters, frame pointer, and return address R24 to R31 (I0 to I5, FP, I7) of the other; the rest of each window is marked "Part of previous window". CWP is decremented on a call. Panels (a) and (b).]
Figure 5-5. Operation of the UltraSPARC II register windows.
Type                           8 Bits   16 Bits   32 Bits   64 Bits   128 Bits
Signed integer                 ×        ×         ×
Unsigned integer               ×        ×         ×
Binary coded decimal integer   ×
Floating point                                    ×         ×
Figure 5-6. The Pentium II numeric data types. Supported types are marked with ×.
Type                           8 Bits   16 Bits   32 Bits   64 Bits   128 Bits
Signed integer                 ×        ×         ×         ×
Unsigned integer               ×        ×         ×         ×
Binary coded decimal integer
Floating point                                    ×         ×         ×
Figure 5-7. The UltraSPARC II numeric data types.
Type                           8 Bits   16 Bits   32 Bits   64 Bits   128 Bits
Signed integer                 ×        ×         ×         ×
Unsigned integer
Binary coded decimal integer
Floating point                                    ×         ×
Figure 5-8. The JVM numeric data types.
(a) OPCODE
(b) OPCODE | ADDRESS
(c) OPCODE | ADDRESS1 | ADDRESS2
(d) OPCODE | ADDR1 | ADDR2 | ADDR3
Figure 5-9. Four common instruction formats: (a) Zero-address instruction. (b) One-address instruction. (c) Two-address instruction. (d) Three-address instruction.
[Diagram: panels (a), (b), and (c), each showing instructions laid out in 1-word memory units; in (c) some instructions ("Instr.") are split across word boundaries.]
Figure 5-10. Some possible relationships between instruction and word length.
Bits:  15-12    11-8        7-4         3-0
       Opcode   Address 1   Address 2   Address 3
Figure 5-11. An instruction with a 4-bit opcode and three 4-bit address fields.
16 bits; the fields marked xxxx, yyyy, and zzzz are 4-bit address fields.

4-bit opcode, 15 3-address instructions:
0000 xxxx yyyy zzzz
0001 xxxx yyyy zzzz
0010 xxxx yyyy zzzz
…
1100 xxxx yyyy zzzz
1101 xxxx yyyy zzzz
1110 xxxx yyyy zzzz

8-bit opcode, 14 2-address instructions:
1111 0000 yyyy zzzz
1111 0001 yyyy zzzz
1111 0010 yyyy zzzz
…
1111 1011 yyyy zzzz
1111 1100 yyyy zzzz
1111 1101 yyyy zzzz

12-bit opcode, 31 1-address instructions:
1111 1110 0000 zzzz
1111 1110 0001 zzzz
…
1111 1110 1110 zzzz
1111 1110 1111 zzzz
1111 1111 0000 zzzz
1111 1111 0001 zzzz
…
1111 1111 1101 zzzz
1111 1111 1110 zzzz

16-bit opcode, 16 0-address instructions:
1111 1111 1111 0000
1111 1111 1111 0001
1111 1111 1111 0010
…
1111 1111 1111 1101
1111 1111 1111 1110
1111 1111 1111 1111

Bit number: 15-12 | 11-8 | 7-4 | 3-0
Figure 5-12. An expanding opcode allowing 15 three-address instructions, 14 two-address instructions, 31 one-address instructions, and 16 zero-address instructions. The fields marked xxxx, yyyy, and zzzz are 4-bit address fields.
Bytes:   0-5      1-2      0-1    0-1   0-4            0-4
         PREFIX   OPCODE   MODE   SIB   DISPLACEMENT   IMMEDIATE

OPCODE byte (bits):  INSTRUCTION (6) | Which operand is source? (1) | Byte/word (1)
MODE byte (bits):    MOD (2) | REG (3) | R/M (3)
SIB byte (bits):     SCALE (2) | INDEX (3) | BASE (3)
Figure 5-13. The Pentium II instruction formats.
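Field widths in bits; every format begins with a 2-bit field.
Format 1a (Register):   2 | DEST (5) | OPCODE (6) | SRC1 (5) | 0 (1) | FP-OP (8) | SRC2 (5)
Format 1b (Immediate):  2 | DEST (5) | OPCODE (6) | SRC1 (5) | 1 (1) | IMMEDIATE CONSTANT (13)
Format 2 (SETHI):       2 | DEST (5) | OP (3) | IMMEDIATE CONSTANT (22)
Format 2 (BRANCH):      2 | A (1) | COND (4) | OP (3) | PC-RELATIVE DISPLACEMENT (22)
Format 3 (CALL):        2 | PC-RELATIVE DISPLACEMENT (30)
Figure 5-14. The original SPARC instruction formats.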
[Diagram: nine JVM instruction formats, numbered 1 to 9 and drawn in 8-bit bytes. Each begins with a 1-byte OPCODE, followed by no further fields (format 1) or by fields such as BYTE, SHORT, INDEX, CONST, DIMENSIONS, #PARAMETERS (followed by a 0 byte), a 32-BIT BRANCH OFFSET (format 8), or VARIABLE LENGTH… data (format 9). BYTE = index, constant or type; SHORT = index, constant or offset.]
Figure 5-15. The JVM instruction formats.
MOV | R1 | 4
Figure 5-16. An immediate instruction for loading 4 into register 1.
      MOV R1,#0        ; accumulate the sum in R1, initially 0
      MOV R2,#A        ; R2 = address of the array A
      MOV R3,#A+1024   ; R3 = address of the first word beyond A
LOOP: ADD R1,(R2)      ; register indirect through R2 to get operand
      ADD R2,#4        ; increment R2 by one word (4 bytes)
      CMP R2,R3        ; are we done yet?
      BLT LOOP         ; if R2 < R3, we are not done, so continue
Figure 5-17. A generic assembly program for computing the sum of the elements of an array.
      MOV R1,#0       ; accumulate the OR in R1, initially 0
      MOV R2,#0       ; R2 = index, i, of current product: A[i] AND B[i]
      MOV R3,#4096    ; R3 = first index value not to use
LOOP: MOV R4,A(R2)    ; R4 = A[i]
      AND R4,B(R2)    ; R4 = A[i] AND B[i]
      OR R1,R4        ; OR all the Boolean products into R1
      ADD R2,#4       ; i = i + 4 (step in units of 1 word = 4 bytes)
      CMP R2,R3       ; are we done yet?
      BLT LOOP        ; if R2 < R3, we are not done, so continue
Figure 5-18. A generic assembly program for computing the OR of A[i] AND B[i] for two 1024-element arrays.
MOV | R4 | R2 | 124300
Figure 5-19. A possible representation of MOV R4,A(R2).
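Diagram: the formula A × ( B + C ) drawn as a train of railroad cars, one symbol per car, coming from New York. At the switch each car either goes straight on to California (the output) or is shunted onto the Texas line, whose bottom is marked ⊥.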
Figure 5-20. Each railroad car represents one symbol in the formula to be converted from infix to reverse Polish notation.
                                      Car at the switch
                                      ⊥   +   –   ×   /   (   )
Most recently arrived car        ⊥    4   1   1   1   1   1   5
on the Texas line                +    2   2   2   1   1   1   2
                                 –    2   2   2   1   1   1   2
                                 ×    2   2   2   2   2   1   2
                                 /    2   2   2   2   2   1   2
                                 (    5   1   1   1   1   1   3
Figure 5-21. Decision table used by the infix-to-reverse Polish notation algorithm.
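The decision table can drive a short loop. The Java sketch below assumes the table's entries mean: 1, the car at the switch heads for Texas; 2, the most recent car on the Texas line heads for California; 3, both cars are removed (a matched pair of parentheses); 4, stop, the formula in California is the result; 5, stop with an error (unbalanced parentheses). Variables are single letters, "*" and "-" stand for × and –, and "$" stands for ⊥. The class name is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class RailroadRpn {
    private static final String ROWS = "$+-*/(";  // most recent car on the Texas line
    private static final String COLS = "$+-*/()"; // car at the switch
    private static final int[][] TABLE = {
        // $  +  -  *  /  (  )
        {4, 1, 1, 1, 1, 1, 5}, // $ (bottom of the Texas line)
        {2, 2, 2, 1, 1, 1, 2}, // +
        {2, 2, 2, 1, 1, 1, 2}, // -
        {2, 2, 2, 2, 2, 1, 2}, // *
        {2, 2, 2, 2, 2, 1, 2}, // /
        {5, 1, 1, 1, 1, 1, 3}, // (
    };

    public static String toRpn(String infix) {
        StringBuilder california = new StringBuilder(); // output track
        Deque<Character> texas = new ArrayDeque<>();    // the stack
        texas.push('$');
        String input = infix.replace(" ", "") + "$";    // end-of-formula marker
        int pos = 0;
        while (true) {
            char car = input.charAt(pos);
            if (Character.isLetter(car)) {             // variables go straight to California
                california.append(car);
                pos++;
                continue;
            }
            int col = COLS.indexOf(car);
            if (col < 0) throw new IllegalArgumentException("unexpected symbol: " + car);
            int row = ROWS.indexOf(texas.peek());
            switch (TABLE[row][col]) {
                case 1: texas.push(car); pos++; break;          // car at switch goes to Texas
                case 2: california.append(texas.pop()); break;  // top of Texas goes to California
                case 3: texas.pop(); pos++; break;              // "(" and ")" both disappear
                case 4: return california.toString();           // done
                default: throw new IllegalArgumentException("unbalanced parentheses");
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(toRpn("A*(B+C)"));             // ABC+*
        System.out.println(toRpn("((A+B)*C+D)/(E+F+G)")); // AB+C*D+EF+G+/
    }
}
```

Entry 2 does not advance the input: the car at the switch stays put while operators of equal or higher precedence are drained from Texas, which is what gives × and / precedence over + and –.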
Infix                               Reverse Polish notation
A + B × C                           A B C × +
A × B + C                           A B × C +
A × B + C × D                       A B × C D × +
(A + B) / (C − D)                   A B + C D − /
A × B / C                           A B × C /
((A + B) × C + D)/(E + F + G)       A B + C × D + E F + G + /
Figure 5-22. Some examples of infix expressions and their reverse Polish notation equivalents.
Step  Remaining string    Instruction   Stack
1     825×+132×+4−/       BIPUSH 8      8
2     25×+132×+4−/        BIPUSH 2      8, 2
3     5×+132×+4−/         BIPUSH 5      8, 2, 5
4     ×+132×+4−/          IMUL          8, 10
5     +132×+4−/           IADD          18
6     132×+4−/            BIPUSH 1      18, 1
7     32×+4−/             BIPUSH 3      18, 1, 3
8     2×+4−/              BIPUSH 2      18, 1, 3, 2
9     ×+4−/               IMUL          18, 1, 6
10    +4−/                IADD          18, 7
11    4−/                 BIPUSH 4      18, 7, 4
12    −/                  ISUB          18, 3
13    /                   IDIV          6
Figure 5-23. Use of a stack to evaluate a reverse Polish notation formula.
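The same evaluation can be replayed in Java. This sketch assumes single-digit operands and writes the operators as *, -, and /; the class name is hypothetical. With tracing on, it prints one line per step, naming the JVM instruction that the step corresponds to and the stack contents from bottom to top, as in the table above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

public class RpnEvaluator {
    // Evaluates a reverse Polish formula of single-digit operands using an
    // operand stack, in the manner of BIPUSH/IADD/ISUB/IMUL/IDIV.
    public static int evaluate(String rpn, boolean trace) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (char c : rpn.toCharArray()) {
            String instr;
            if (Character.isDigit(c)) {
                stack.push(c - '0');
                instr = "BIPUSH " + c;
            } else {
                int right = stack.pop(); // the top of the stack is the right operand
                int left = stack.pop();
                switch (c) {
                    case '+': stack.push(left + right); instr = "IADD"; break;
                    case '-': stack.push(left - right); instr = "ISUB"; break;
                    case '*': stack.push(left * right); instr = "IMUL"; break;
                    case '/': stack.push(left / right); instr = "IDIV"; break;
                    default: throw new IllegalArgumentException("bad symbol: " + c);
                }
            }
            if (trace) System.out.println(instr + "\tstack: " + bottomToTop(stack));
        }
        if (stack.size() != 1) throw new IllegalArgumentException("malformed formula");
        return stack.pop();
    }

    // ArrayDeque iterates from the top, so walk it backward to list the bottom first.
    private static String bottomToTop(Deque<Integer> stack) {
        StringBuilder sb = new StringBuilder();
        Iterator<Integer> it = stack.descendingIterator();
        while (it.hasNext()) {
            sb.append(it.next());
            if (it.hasNext()) sb.append(", ");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(evaluate("825*+132*+4-/", true)); // (8 + 2*5) / (1 + 3*2 - 4) = 6
    }
}
```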
Format 1: OPCODE (8 bits) | 0 (1 bit) | DEST (5) | SRC1 (5) | SRC2 (5)
Format 2: OPCODE (8 bits) | 1 (1 bit) | DEST (5) | SRC1 (5) | OFFSET
Format 3: OPCODE (8 bits) | OFFSET
Figure 5-24. A simple design for the instruction formats of a three-address machine.
OPCODE (8 bits) | MODE (3) | REG (5) | OFFSET (4) | MODE (3) | REG (5) | OFFSET (4)
(Optional 32-bit direct address or offset) (Optional 32-bit direct address or offset)
Figure 5-25. A simple design for the instruction formats of a two-address machine.
R/M   MOD = 00   MOD = 01            MOD = 10             MOD = 11
000   M[EAX]     M[EAX + OFFSET8]    M[EAX + OFFSET32]    EAX or AL
001   M[ECX]     M[ECX + OFFSET8]    M[ECX + OFFSET32]    ECX or CL
010   M[EDX]     M[EDX + OFFSET8]    M[EDX + OFFSET32]    EDX or DL
011   M[EBX]     M[EBX + OFFSET8]    M[EBX + OFFSET32]    EBX or BL
100   SIB        SIB with OFFSET8    SIB with OFFSET32    ESP or AH
101   Direct     M[EBP + OFFSET8]    M[EBP + OFFSET32]    EBP or CH
110   M[ESI]     M[ESI + OFFSET8]    M[ESI + OFFSET32]    ESI or DH
111   M[EDI]     M[EDI + OFFSET8]    M[EDI + OFFSET32]    EDI or BH
Figure 5-26. The Pentium II 32-bit addressing modes. M[x] is the memory word at x.
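The table can be read as a small decoding function. The Java sketch below takes a MODE byte, splits it into MOD (bits 7-6), REG (bits 5-3), and R/M (bits 2-0), and returns the entry of the table that MOD and R/M select. The bit positions follow the MOD, REG, R/M order of the Pentium II format figure, with MOD assumed to be in the high bits. The class and method names are hypothetical.

```java
public class ModRm {
    private static final String[] REG32 = {"EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"};
    private static final String[] REG8 = {"AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"};

    // Returns the operand selected by the MOD and R/M fields of a MODE byte.
    public static String describe(int modeByte) {
        int mod = (modeByte >>> 6) & 0x3; // bits 7-6
        int rm = modeByte & 0x7;          // bits 2-0
        if (mod == 3) return REG32[rm] + " or " + REG8[rm];  // register operand
        if (rm == 4) return mod == 0 ? "SIB" : "SIB with " + (mod == 1 ? "OFFSET8" : "OFFSET32");
        if (mod == 0 && rm == 5) return "Direct";             // EBP slot reused for direct addressing
        String mem = "M[" + REG32[rm];
        if (mod == 0) return mem + "]";
        return mem + " + " + (mod == 1 ? "OFFSET8" : "OFFSET32") + "]";
    }

    public static void main(String[] args) {
        int[] samples = {0x00, 0x05, 0x44, 0x86, 0xC3};
        for (int b : samples) {
            int reg = (b >>> 3) & 0x7; // REG field (bits 5-3): the other, register, operand
            System.out.printf("MODE byte %02X: R/M operand %s, REG field %d%n", b, describe(b), reg);
        }
    }
}
```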
Stack frame addressed through EBP: a[0] is at EBP + 8, a[1] at EBP + 12, a[2] at EBP + 16; the frame also holds other local variables, and i is in EAX.
The SIB mode references M[4 * EAX + EBP + 8].
Figure 5-27. Access to a[i].
Figure 5-28. A comparison of addressing modes. The table compares the Pentium II, UltraSPARC II, and JVM for the immediate, direct, register, register indirect, indexed, based-indexed, and stack modes; immediate and indexed addressing are available on all three.
(a)
    i = 1;
L1: first-statement;
    ...
    last-statement;
    i = i + 1;
    if (i < n) goto L1;

(b)
    i = 1;
L1: if (i > n) goto L2;
    first-statement;
    ...
    last-statement;
    i = i + 1;
    goto L1;
L2:
Figure 5-29. (a) Test-at-the-end loop. (b) Test-at-the-beginning loop.
Keyboard status register: Character available bit, Interrupt enabled bit
Keyboard buffer register: Character received
Display status register: Ready for next character bit, Interrupt enabled bit
Display buffer register: Character to display
Figure 5-30. Device registers for a simple terminal.
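public static void output_buffer(int buf[ ], int count) {
    // Output a block of data to the device.
    // in() and out() stand for the device-register access primitives;
    // they are not defined in this fragment.
    int status, i, ready;

    for (i = 0; i < count; i++) {
        do {
            status = in(display_status_reg);  // get status
            ready = (status >> 7) & 0x01;     // isolate ready bit (bit 7)
        } while (ready != 1);                 // busy-wait until the display is ready
        out(display_buffer_reg, buf[i]);      // send the next character
    }
}
Figure 5-31. An example of programmed I/O.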
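Diagram: the CPU, memory, and a DMA controller share a bus. The DMA controller's registers hold Address = 100, Count = 32, Device = 4, and Direction = 1; device 4 is an RS232C controller attached to a terminal, and the data to be transferred starts at memory address 100.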
Figure 5-32. A system with a DMA controller.
Moves
MOV DST,SRC      Move SRC to DST
PUSH SRC         Push SRC onto the stack
POP DST          Pop a word from the stack to DST
XCHG DS1,DS2     Exchange DS1 and DS2
LEA DST,SRC      Load effective addr of SRC into DST
CMOV DST,SRC     Conditional move

Arithmetic
ADD DST,SRC      Add SRC to DST
SUB DST,SRC      Subtract SRC from DST
MUL SRC          Multiply EAX by SRC (unsigned)
IMUL SRC         Multiply EAX by SRC (signed)
DIV SRC          Divide EDX:EAX by SRC (unsigned)
IDIV SRC         Divide EDX:EAX by SRC (signed)
ADC DST,SRC      Add SRC to DST, then add carry bit
SBB DST,SRC      Subtract SRC & carry from DST
INC DST          Add 1 to DST
DEC DST          Subtract 1 from DST
NEG DST          Negate DST (subtract it from 0)

Binary coded decimal
DAA              Decimal adjust
DAS              Decimal adjust for subtraction
AAA              ASCII adjust for addition
AAS              ASCII adjust for subtraction
AAM              ASCII adjust for multiplication
AAD              ASCII adjust for division

Boolean
AND DST,SRC      Boolean AND SRC into DST
OR DST,SRC       Boolean OR SRC into DST
XOR DST,SRC      Boolean Exclusive OR SRC into DST
NOT DST          Replace DST with 1's complement

Shift/rotate
SAL/SAR DST,#    Shift DST left/right # bits
SHL/SHR DST,#    Logical shift DST left/right # bits
ROL/ROR DST,#    Rotate DST left/right # bits
RCL/RCR DST,#    Rotate DST through carry # bits

Test/compare
TEST SRC1,SRC2   Boolean AND operands, set flags
CMP SRC1,SRC2    Set flags based on SRC1 - SRC2

Transfer of control
JMP ADDR         Jump to ADDR
Jxx ADDR         Conditional jumps based on flags
CALL ADDR        Call procedure at ADDR
RET              Return from procedure
IRET             Return from interrupt
LOOPxx           Loop until condition met
INT ADDR         Initiate a software interrupt
INTO             Interrupt if overflow bit is set

Strings
LODS             Load string
STOS             Store string
MOVS             Move string
CMPS             Compare two strings
SCAS             Scan strings

Condition codes
STC              Set carry bit in EFLAGS register
CLC              Clear carry bit in EFLAGS register
CMC              Complement carry bit in EFLAGS
STD              Set direction bit in EFLAGS register
CLD              Clear direction bit in EFLAGS reg
STI              Set interrupt bit in EFLAGS register
CLI              Clear interrupt bit in EFLAGS reg
PUSHFD           Push EFLAGS register onto stack
POPFD            Pop EFLAGS register from stack
LAHF             Load AH from EFLAGS register
SAHF             Store AH in EFLAGS register

Miscellaneous
BSWAP DST        Change endianness of DST
CDQ              Extend EAX to EDX:EAX for division
CWDE             Extend 16-bit number in AX to EAX
ENTER SIZE,LV    Create stack frame with SIZE bytes
LEAVE            Undo stack frame built by ENTER
NOP              No operation
HLT              Halt
IN AL,PORT       Input a byte from PORT to AL
OUT PORT,AL      Output a byte from AL to PORT
WAIT             Wait for an interrupt

SRC = source; DST = destination; # = shift/rotate count; LV = # locals
Figure 5-33. A selection of the Pentium II integer instructions.
Loads
LDSB ADDR,DST       Load signed byte (8 bits)
LDUB ADDR,DST       Load unsigned byte (8 bits)
LDSH ADDR,DST       Load signed halfword (16 bits)
LDUH ADDR,DST       Load unsigned halfword (16 bits)
LDSW ADDR,DST       Load signed word (32 bits)
LDUW ADDR,DST       Load unsigned word (32 bits)
LDX ADDR,DST        Load extended (64 bits)

Stores
STB SRC,ADDR        Store byte (8 bits)
STH SRC,ADDR        Store halfword (16 bits)
STW SRC,ADDR        Store word (32 bits)
STX SRC,ADDR        Store extended (64 bits)

Arithmetic
ADD R1,S2,DST       Add
ADDCC R1,S2,DST     Add and set icc
ADDC R1,S2,DST      Add with carry
ADDCCC R1,S2,DST    Add with carry and set icc
SUB R1,S2,DST       Subtract
SUBCC R1,S2,DST     Subtract and set icc
SUBC R1,S2,DST      Subtract with carry
SUBCCC R1,S2,DST    Subtract with carry and set icc
MULX R1,S2,DST      Multiply
SDIVX R1,S2,DST     Signed divide
UDIVX R1,S2,DST     Unsigned divide
TADCC R1,S2,DST     Tagged add

Shifts/rotates
SLL R1,S2,DST       Shift left logical (32 bits)
SLLX R1,S2,DST      Shift left logical extended (64 bits)
SRL R1,S2,DST       Shift right logical (32 bits)
SRLX R1,S2,DST      Shift right logical extended (64 bits)
SRA R1,S2,DST       Shift right arithmetic (32 bits)
SRAX R1,S2,DST      Shift right arithmetic extended (64 bits)

Boolean
AND R1,S2,DST       Boolean AND
ANDCC R1,S2,DST     Boolean AND and set icc
ANDN R1,S2,DST      Boolean AND of R1 with the complement of S2
ANDNCC R1,S2,DST    ANDN and set icc
OR R1,S2,DST        Boolean OR
ORCC R1,S2,DST      Boolean OR and set icc
ORN R1,S2,DST       Boolean OR of R1 with the complement of S2
ORNCC R1,S2,DST     ORN and set icc
XOR R1,S2,DST       Boolean XOR
XORCC R1,S2,DST     Boolean XOR and set icc
XNOR R1,S2,DST      Boolean EXCLUSIVE NOR
XNORCC R1,S2,DST    Boolean EXCLUSIVE NOR and set icc

Transfer of control
BPcc ADDR           Branch with prediction
BPr SRC,ADDR        Branch on register
CALL ADDR           Call procedure
RETURN ADDR         Return from procedure
JMPL ADDR,DST       Jump and link
SAVE R1,S2,DST      Advance register windows
RESTORE R1,S2,DST   Restore register windows
Tcc CC,TRAP#        Trap on condition
PREFETCH FCN        Prefetch data from memory
LDSTUB ADDR,R       Atomic load/store
MEMBAR MASK         Memory barrier

Miscellaneous
SETHI CON,DST       Set bits 10 to 31
MOVcc CC,S2,DST     Move on condition
MOVr R1,S2,DST      Move on register
NOP                 No operation
POPC S1,DST         Population count
RDCCR V,DST         Read condition code register
WRCCR R1,S2,V       Write condition code register
RDPC V,DST          Read program counter

SRC = source register; DST = destination register; R1 = source register; S2 = source: register or immediate; ADDR = memory address; TRAP# = trap number; FCN = function code; MASK = operation type; CON = constant; V = register designator; CC = condition code set; R = destination register; cc = condition; r = LZ, LEZ, Z, NZ, GZ, GEZ
Figure 5-34. The primary UltraSPARC II integer instructions.
Instruction      How to do it
MOV SRC,DST      OR SRC with G0 and store the result in DST
CMP SRC1,SRC2    SUBCC SRC2 from SRC1 and store the result in G0
TST SRC          ORCC SRC with G0 and store the result in G0
NOT DST          XNOR DST with G0
NEG DST          SUB DST from G0 and store in DST
INC DST          ADD 1 to DST (immediate operand)
DEC DST          SUB 1 from DST (immediate operand)
CLR DST          OR G0 with G0 and store in DST
NOP              SETHI G0 to 0
RET              JMPL %I7+8,%G0
Figure 5-35. Some simulated UltraSPARC II instructions.
Loads
typeLOAD IND8            Push local variable onto stack
typeALOAD                Push array element on stack
BALOAD                   Push byte from an array on stack
SALOAD                   Push short from an array on stack
CALOAD                   Push char from an array on stack
AALOAD                   Push pointer from an array on stack

Stores
typeSTORE IND8           Pop value and store in local var
typeASTORE               Pop value and store in array
BASTORE                  Pop byte and store in array
SASTORE                  Pop short and store in array
CASTORE                  Pop char and store in array
AASTORE                  Pop pointer and store in array

Pushes
BIPUSH CON8              Push a small constant on stack
SIPUSH CON16             Push 16-bit constant on stack
LDC IND8                 Push constant from const pool
typeCONST_#              Push immediate constant
ACONST_NULL              Push a null pointer on stack

Arithmetic
typeADD                  Add
typeSUB                  Subtract
typeMUL                  Multiply
typeDIV                  Divide
typeREM                  Remainder
typeNEG                  Negate

Boolean/shift
ilAND                    Boolean AND
ilOR                     Boolean OR
ilXOR                    Boolean EXCLUSIVE OR
ilSHL                    Shift left
ilSHR                    Shift right
ilUSHR                   Unsigned shift right

Conversion
x2y                      Convert x to y
i2c                      Convert integer to char
i2b                      Convert integer to byte

Stack management
DUPxx                    Six instructions for duping
POP                      Pop an int from stack and discard
POP2                     Pop two ints from stack and discard
SWAP                     Swap top two ints on stack

Comparison
IF_ICMPrel OFFSET16      Conditional branch
IF_ACMPEQ OFFSET16       Branch if two ptrs equal
IF_ACMPNE OFFSET16       Branch if ptrs unequal
IFrel OFFSET16           Test 1 value and branch
IFNULL OFFSET16          Branch if ptr is null
IFNONNULL OFFSET16       Branch if ptr is nonnull
LCMP                     Compare two longs
FCMPL                    Compare 2 floats for <
FCMPG                    Compare 2 floats for >
DCMPL                    Compare doubles for <
DCMPG                    Compare doubles for >

Transfer of control
INVOKEVIRTUAL IND16      Method invocation
INVOKESTATIC IND16       Method invocation
INVOKEINTERFACE ...      Method invocation
INVOKESPECIAL IND16      Method invocation
JSR OFFSET16             Invoke finally clause
typeRETURN               Return value
ARETURN                  Return pointer
RETURN                   Return void
RET IND8                 Return from finally
GOTO OFFSET16            Unconditional branch

Arrays
ANEWARRAY IND16          Create array of ptrs
NEWARRAY ATYPE           Create array of atype
MULTIANEWARRAY IND16,D   Create multidim array
ARRAYLENGTH              Get array length

Miscellaneous
IINC IND8,CON8           Increment local variable
WIDE                     Wide prefix
NOP                      No operation
GETFIELD IND16           Read field from object
PUTFIELD IND16           Write field to object
GETSTATIC IND16          Get static field from class
NEW IND16                Create a new object
INSTANCEOF IND16         Determine type of obj
CHECKCAST IND16          Check object type
ATHROW                   Throw exception
LOOKUPSWITCH ...         Sparse multiway branch
TABLESWITCH ...          Dense multiway branch
MONITORENTER             Enter a monitor
MONITOREXIT              Leave a monitor

IND8/16 = index of local variable; CON8/16, D, ATYPE = constant; type, x, y = I, L, F, D; OFFSET16 = offset for branch
Figure 5-36. The JVM instruction set.
Diagram: the program counter plotted against time. (a) Without branches, it rises steadily. (b) With branches, it jumps.
Figure 5-37. Program counter as a function of time (smoothed). (a) Without branches. (b) With branches.
Diagram: pegs 1, 2, and 3, with the five disks stacked on peg 1.
Figure 5-38. Initial configuration for the Towers of Hanoi problem for five disks.
Initial state
First move 2 disks from peg 1 to peg 2
Then move 1 disk from peg 1 to peg 3
Finally move 2 disks from peg 2 to peg 3
Figure 5-39. The steps required to solve the Towers of Hanoi for three disks.
public void towers(int n, int i, int j) {
    int k;

    if (n == 1)
        System.out.println("Move a disk from " + i + " to " + j);
    else {
        k = 6 - i - j;
        towers(n - 1, i, k);
        towers(1, i, j);
        towers(n - 1, k, j);
    }
}
Figure 5-40. A procedure for solving the Towers of Hanoi.
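To check the recursion without reading printed output, the procedure can record its moves instead. The variant below is a sketch with a hypothetical class name and an extra list parameter; otherwise it follows the procedure above line for line. Because each call with n > 1 makes two calls on n − 1 disks plus one single move, n disks take 2^n − 1 moves.

```java
import java.util.ArrayList;
import java.util.List;

public class TowersTrace {
    // Same recursion as Fig. 5-40, but each move is appended as "i->j".
    public static void towers(int n, int i, int j, List<String> moves) {
        if (n == 1) {
            moves.add(i + "->" + j);
        } else {
            int k = 6 - i - j;      // pegs are 1, 2, 3, so k is the remaining peg
            towers(n - 1, i, k, moves);
            towers(1, i, j, moves);
            towers(n - 1, k, j, moves);
        }
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            List<String> moves = new ArrayList<>();
            towers(n, 1, 3, moves);
            System.out.println(n + " disks: " + moves.size() + " moves");
        }
    }
}
```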
Stack frames at addresses 1000 to 1068 (in steps of 4); each frame holds n, i, j, the return address, the old FP, and k, with SP and FP marked:
(a) One frame, for towers(3, 1, 3): n = 3, i = 1, j = 3.
(b) After the call towers(2, 1, 2): the first frame now holds k = 2; the new frame holds n = 2, i = 1, j = 2, old FP = 1000.
(c) After the call towers(1, 1, 3): the second frame holds k = 3; the new frame holds n = 1, i = 1, j = 3, old FP = 1024.
(d) After towers(1, 1, 3) has returned: two frames remain, the second holding k = 3.
(e) After the call towers(1, 1, 2): the third frame holds n = 1, i = 1, j = 2, old FP = 1024, k = 3.
Figure 5-41. The stack at several points during the execution of Fig. 5-40.
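Diagram, panels (a) and (b): a calling procedure (A) and a called procedure (B). A is called from the main program and makes several CALLs to B; each CALL starts B at its first statement and each RETURN goes back to A. Finally A returns to the main program.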
Figure 5-42. When a procedure is called, execution of the procedure always begins at the first statement of the procedure.
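Diagram, panels (a) and (b): two coroutines, A and B. A is called from the main program; thereafter A and B pass control to each other with RESUME, each continuing from where it last left off. Finally A returns to the main program.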
Figure 5-43. When a coroutine is resumed, execution begins at the statement where it left off the previous time, not at the beginning.
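Time  Event
0     User program running
10    Printer interrupt (priority 2): printer ISR starts
15    RS232 interrupt (priority 5): RS232 ISR starts; the printer ISR is suspended
20    Disk interrupt (priority 4) occurs and is held pending, since the RS232 ISR has higher priority
25    RS232 ISR finishes; the pending disk interrupt is taken and the disk ISR starts
35    Disk ISR finishes; the printer ISR resumes
40    Printer ISR finishes; the user program resumes
The stack holds the interrupted states (user program, printer ISR) while higher-priority ISRs run.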
Figure 5-44. Time sequence of multiple interrupt example.
.586                         ; compile for Pentium (as opposed to 8088 etc.)
.MODEL FLAT
PUBLIC _towers               ; export 'towers'
EXTERN _printf:NEAR          ; import printf
.CODE
_towers: PUSH EBP            ; save EBP (frame pointer)
    MOV EBP, ESP             ; set new frame pointer above ESP
    SUB ESP, 4               ; reserve room for the local variable k at [EBP-4]
    CMP DWORD PTR [EBP+8], 1 ; if (n == 1)
    JNE L1                   ; branch if n is not 1
    MOV EAX, [EBP+16]        ; printf(" ...", i, j);
    PUSH EAX                 ; note that parameters i, j and the format
    MOV EAX, [EBP+12]        ; string are pushed onto the stack
    PUSH EAX                 ; in reverse order. This is the C calling convention
    PUSH OFFSET FLAT:format  ; offset flat means the address of format
    CALL _printf             ; call printf
    ADD ESP, 12              ; remove params from the stack
    JMP Done                 ; we are finished
L1: MOV EAX, 6               ; start k = 6 - i - j
    SUB EAX, [EBP+12]        ; EAX = 6 - i
    SUB EAX, [EBP+16]        ; EAX = 6 - i - j
    MOV [EBP-4], EAX         ; k = EAX
    PUSH EAX                 ; start towers(n - 1, i, k): push k
    MOV EAX, [EBP+12]        ; EAX = i
    PUSH EAX                 ; push i
    MOV EAX, [EBP+8]         ; EAX = n
    DEC EAX                  ; EAX = n - 1
    PUSH EAX                 ; push n - 1
    CALL _towers             ; call towers(n - 1, i, 6 - i - j)
    ADD ESP, 12              ; remove params from the stack
    MOV EAX, [EBP+16]        ; start towers(1, i, j)
    PUSH EAX                 ; push j
    MOV EAX, [EBP+12]        ; EAX = i
    PUSH EAX                 ; push i
    PUSH 1                   ; push 1
    CALL _towers             ; call towers(1, i, j)
    ADD ESP, 12              ; remove params from the stack
    MOV EAX, [EBP+16]        ; start towers(n - 1, 6 - i - j, j)
    PUSH EAX                 ; push j
    MOV EAX, [EBP-4]         ; EAX = k
    PUSH EAX                 ; push k
    MOV EAX, [EBP+8]         ; EAX = n
    DEC EAX                  ; EAX = n - 1
    PUSH EAX                 ; push n - 1
    CALL _towers             ; call towers(n - 1, 6 - i - j, j)
    ADD ESP, 12              ; adjust stack pointer
Done: LEAVE                  ; restore ESP and EBP, which also frees k
    RET 0                    ; return to the caller
.DATA
format DB "Move disk from %d to %d", 10, 0 ; format string, then newline and NUL terminator
END
Figure 5-45. The Towers of Hanoi for the Pentium II.
#define N %i0         /* N is input parameter 0 */
#define I %i1         /* I is input parameter 1 */
#define J %i2         /* J is input parameter 2 */
#define K %l0         /* K is local variable 0 */
#define Param0 %o0    /* Param0 is output parameter 0 */
#define Param1 %o1    /* Param1 is output parameter 1 */
#define Param2 %o2    /* Param2 is output parameter 2 */
#define Scratch %l1   /* as an aside, cpp uses the C comment convention */
        .proc 04
        .global towers
towers: save %sp, -112, %sp
        cmp N, 1                       ! if (n == 1)
        bne Else                       ! if (n != 1) goto Else
        sethi %hi(format), Param0      ! printf("Move a disk from %d to %d\n", i, j)
        or Param0, %lo(format), Param0 ! Param0 = address of format string
        mov I, Param1                  ! Param1 = i
        call printf                    ! call printf BEFORE parameter 2 (j) is set up
        mov J, Param2                  ! use the delay slot after call to set up parameter 2
        b Done                         ! we are done now
        nop                            ! fill delay slot
Else:   mov 6, K                       ! start k = 6 - i - j
        sub K, J, K                    ! k = 6 - j
        sub K, I, K                    ! k = 6 - i - j
        add N, -1, Scratch             ! start towers(n - 1, i, k); Scratch = n - 1
        mov Scratch, Param0            ! parameter 0 = n - 1
        mov I, Param1                  ! parameter 1 = i
        call towers                    ! call towers BEFORE parameter 2 (k) is set up
        mov K, Param2                  ! use the delay slot after call to set up parameter 2
        mov 1, Param0                  ! start towers(1, i, j)
        mov I, Param1                  ! parameter 1 = i
        call towers                    ! call towers BEFORE parameter 2 (j) is set up
        mov J, Param2                  ! parameter 2 = j
        mov Scratch, Param0            ! start towers(n - 1, k, j)
        mov K, Param1                  ! parameter 1 = k
        call towers                    ! call towers BEFORE parameter 2 (j) is set up
        mov J, Param2                  ! parameter 2 = j
Done:   ret                            ! return
        restore                        ! use the delay slot after ret to restore windows
format: .asciz "Move a disk from %d to %d\n"
Figure 5-46. The Towers of Hanoi for the UltraSPARC II.
        ILOAD_0              // local 0 = n; push n
        ICONST_1             // push 1
        IF_ICMPNE L1         // if (n != 1) goto L1
        GETSTATIC #13        // n == 1; this code handles the println statement
        NEW #7               // allocate buffer for the string to be built
        DUP                  // duplicate the pointer to the buffer
        LDC #2               // push pointer to string "move a disk from "
        INVOKESPECIAL #10    // copy the string to the buffer
        ILOAD_1              // push i
        INVOKEVIRTUAL #11    // convert i to string and append to the new buffer
        LDC #1               // push pointer to string " to "
        INVOKEVIRTUAL #12    // append this string to the buffer
        ILOAD_2              // push j
        INVOKEVIRTUAL #11    // convert j to string and append to buffer
        INVOKEVIRTUAL #15    // string conversion
        INVOKEVIRTUAL #14    // call println
        RETURN               // return from towers
L1:     BIPUSH 6             // Else part: compute k = 6 - i - j
        ILOAD_1              // local 1 = i; push i
        ISUB                 // top-of-stack = 6 - i
        ILOAD_2              // local 2 = j; push j
        ISUB                 // top-of-stack = 6 - i - j
        ISTORE_3             // local 3 = k = 6 - i - j; stack is now empty
        ILOAD_0              // start working on towers(n - 1, i, k); push n
        ICONST_1             // push 1
        ISUB                 // top-of-stack = n - 1
        ILOAD_1              // push i
        ILOAD_3              // push k
        INVOKESTATIC #16     // call towers(n - 1, i, k)
        ICONST_1             // start working on towers(1, i, j); push 1
        ILOAD_1              // push i
        ILOAD_2              // push j
        INVOKESTATIC #16     // call towers(1, i, j)
        ILOAD_0              // start working on towers(n - 1, k, j); push n
        ICONST_1             // push 1
        ISUB                 // top-of-stack = n - 1
        ILOAD_3              // push k
        ILOAD_2              // push j
        INVOKESTATIC #16     // call towers(n - 1, k, j)
        RETURN               // return from towers
Figure 5-47. The Towers of Hanoi for JVM.
Diagram: three bundles, each holding INSTRUCTION 1, INSTRUCTION 2, INSTRUCTION 3, and a TEMPLATE; instructions can be chained together across bundles. Each instruction names its registers (R1, R2, R3) and a PREDICATE REGISTER.
Figure 5-48. IA-64 is based on bundles of three instructions.
(a)
if (R1 == 0) R2 = R3;

(b)
      CMP R1,0
      BNE L1
      MOV R2,R3
L1:

(c)
      CMOVZ R2,R3,R1
Figure 5-49. (a) An if statement. (b) Generic assembly code for (a). (c) A conditional instruction.
(a)
if (R1 == 0) {
    R2 = R3;
    R4 = R5;
} else {
    R6 = R7;
    R8 = R9;
}

(b)
      CMP R1,0
      BNE L1
      MOV R2,R3
      MOV R4,R5
      BR L2
L1:   MOV R6,R7
      MOV R8,R9
L2:

(c)
      CMOVZ R2,R3,R1
      CMOVZ R4,R5,R1
      CMOVN R6,R7,R1
      CMOVN R8,R9,R1
Figure 5-50. (a) An if statement. (b) Generic assembly code for (a). (c) Conditional execution.
(a)
if (R1 == R2)
    R3 = R4 + R5;
else
    R6 = R4 - R5;

(b)
      CMP R1,R2
      BNE L1
      MOV R3,R4
      ADD R3,R5
      BR L2
L1:   MOV R6,R4
      SUB R6,R5
L2:

(c)
      CMPEQ R1,R2,P4
<P4>  ADD R3,R4,R5
<P5>  SUB R6,R4,R5
Figure 5-51. (a) An if statement. (b) Generic assembly code for (a). (c) Predicated execution.
6 THE OPERATING SYSTEM MACHINE LEVEL
Level 3
Operating system machine level Operating system
Level 2
Instruction set architecture level Microprogram or hardware
Level 1
Microarchitecture level
Figure 6-1. Positioning of the operating system machine level.
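Diagram: an address space (addresses 0 to 8191) and a 4K main memory (addresses 0 to 4095), with a mapping from the upper 4K of the address space onto main memory.
Figure 6-2. A mapping in which virtual addresses 4096 to 8191 are mapped onto main memory addresses 0 to 4095.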
(a) Virtual address space
Page   Virtual addresses
15     61440 – 65535
14     57344 – 61439
13     53248 – 57343
12     49152 – 53247
11     45056 – 49151
10     40960 – 45055
9      36864 – 40959
8      32768 – 36863
7      28672 – 32767
6      24576 – 28671
5      20480 – 24575
4      16384 – 20479
3      12288 – 16383
2      8192 – 12287
1      4096 – 8191
0      0 – 4095

(b) Bottom 32K of main memory
Page frame   Physical addresses
7            28672 – 32767
6            24576 – 28671
5            20480 – 24575
4            16384 – 20479
3            12288 – 16383
2            8192 – 12287
1            4096 – 8191
0            0 – 4095
Figure 6-3. (a) The first 64K of virtual address space divided into 16 pages, with each page being 4K. (b) A 32K main memory divided up into eight page frames of 4K each.
Input register (32-bit virtual address): 0000 0000 0000 0000 0011 | 0000 0001 0110
    = 20-bit virtual page (3) | 12-bit offset (22)
Page table: the entry for virtual page 3 has present/absent bit 1 and page frame 110 (6).
Output register (15-bit memory address): 110 | 0000 0001 0110
Figure 6-4. Formation of a main memory address from a virtual address.
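The formation of the memory address reduces to a shift, a mask, and a concatenation. In the Java sketch below, the 32-bit virtual address is the input register of the figure and the page frame, 6 (binary 110), is the value read from its page table; the class and method names are hypothetical.

```java
public class VirtualAddressSplit {
    static final int OFFSET_BITS = 12;                     // 4K pages
    static final int OFFSET_MASK = (1 << OFFSET_BITS) - 1;

    public static int virtualPage(int va) { return va >>> OFFSET_BITS; } // top 20 bits
    public static int offset(int va) { return va & OFFSET_MASK; }        // low 12 bits

    // The offset is copied unchanged; the page frame replaces the virtual page.
    public static int physical(int frame, int offset) { return (frame << OFFSET_BITS) | offset; }

    public static void main(String[] args) {
        int va = 0b0000_0000_0000_0000_0011_0000_0001_0110; // input register of Fig. 6-4
        int page = virtualPage(va);                          // 3
        int off = offset(va);                                // 22
        int frame = 6;                                       // page-table entry for page 3
        int pa = physical(frame, off);
        System.out.println(page + " " + off + " " + Integer.toBinaryString(pa));
    }
}
```

The program prints 3 22 110000000010110, the 15-bit memory address shown in the output register.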
Page table
Virtual page   Present/absent   Page frame
15             0                0
14             1                4
13             0                0
12             0                0
11             1                5
10             0                0
9              0                0
8              1                3
7              0                0
6              1                7
5              1                6
4              0                0
3              1                2
2              0                0
1              1                0
0              1                1

Main memory
Page frame   Contents
7            Virtual page 6
6            Virtual page 5
5            Virtual page 11
4            Virtual page 14
3            Virtual page 8
2            Virtual page 3
1            Virtual page 0
0            Virtual page 1

1 = Present in main memory; 0 = Absent from main memory
Figure 6-5. A possible mapping of the first 16 virtual pages onto a main memory with eight page frames.
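The present/absent bit turns the lookup into a two-way decision: translate, or report a page fault. The Java sketch below hard-codes the page table of Fig. 6-5 and uses -1 to signal a fault; the class name, method name, and the -1 convention are illustrative choices, not part of the figure.

```java
public class PageTableLookup {
    static final int PAGE_SIZE = 4096;
    // Page table of Fig. 6-5, indexed by virtual page 0-15.
    static final boolean[] PRESENT = {
        true, true, false, true, false, true, true, false,  // pages 0-7
        true, false, false, true, false, false, true, false // pages 8-15
    };
    static final int[] FRAME = {1, 0, 0, 2, 0, 6, 7, 0, 3, 0, 0, 5, 0, 0, 4, 0};

    // Returns the physical address, or -1 to signal a page fault.
    public static int translate(int va) {
        if (va < 0 || va >= PRESENT.length * PAGE_SIZE)
            throw new IllegalArgumentException("outside the 64K virtual address space");
        int page = va / PAGE_SIZE;
        int offset = va % PAGE_SIZE;
        if (!PRESENT[page]) return -1; // absent: the operating system must bring the page in
        return FRAME[page] * PAGE_SIZE + offset;
    }

    public static void main(String[] args) {
        int[] samples = {0, 20, 5 * PAGE_SIZE + 100, 2 * PAGE_SIZE};
        for (int va : samples) System.out.println(va + " -> " + translate(va));
    }
}
```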
Contents of the eight page frames:
(a) Virtual pages 7, 6, 5, 4, 3, 2, 1, 0.
(b) Virtual page 8 has replaced virtual page 0: pages 7, 6, 5, 4, 3, 2, 1, 8.
(c) Virtual page 0 has come back in, replacing virtual page 1: pages 7, 6, 5, 4, 3, 2, 0, 8.
Figure 6-6. Failure of the LRU algorithm.
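The failure can be reproduced with a short simulation. The Java sketch below assumes, as the figure suggests, a program that loops over virtual pages 0 through 8 with only eight page frames; it implements LRU by keeping pages in recency order. The class and method names are hypothetical.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;

public class LruFailure {
    // Counts page faults under LRU replacement. The set is kept in recency
    // order: its first element is the least recently used page.
    public static int lruFaults(int[] refs, int frames) {
        LinkedHashSet<Integer> memory = new LinkedHashSet<>();
        int faults = 0;
        for (int page : refs) {
            if (memory.remove(page)) { // hit: re-insert to mark it most recently used
                memory.add(page);
                continue;
            }
            faults++;
            if (memory.size() == frames) { // memory full: evict the LRU page
                Iterator<Integer> lru = memory.iterator();
                lru.next();
                lru.remove();
            }
            memory.add(page);
        }
        return faults;
    }

    public static void main(String[] args) {
        int[] refs = new int[27];
        for (int r = 0; r < refs.length; r++) refs[r] = r % 9; // pages 0-8, three passes
        System.out.println("8 frames: " + lruFaults(refs, 8) + " faults in " + refs.length + " references");
        System.out.println("9 frames: " + lruFaults(refs, 9) + " faults");
    }
}
```

With eight frames every one of the 27 references faults, because the page LRU evicts is always the very next one the loop needs; with nine frames only the first nine references fault.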
Diagram: a one-dimensional virtual address space holding, from the bottom up, the symbol table, source text, constant table, parse tree, and call stack, each with a currently used part and a free part; the address space allocated to the call stack is marked.
Figure 6-7. In a one-dimensional address space with growing tables, one table may bump into another.
Diagram: five segments, each starting at address 0 (scale 0K to 20K): segment 0, symbol table; segment 1, source text; segment 2, constant table; segment 3, parse tree; segment 4, call stack.
Figure 6-8. A segmented memory allows each table to grow or shrink independently of the other tables.
Consideration                                    Paging                        Segmentation
Need the programmer be aware of it?              No                            Yes
How many linear address spaces are there?        1                             Many
Can virtual address space exceed memory size?    Yes                           Yes
Can variable-sized tables be handled easily?     No                            Yes
Why was the technique invented?                  To simulate large memories    To provide multiple address spaces
Figure 6-9. Comparison of paging and segmentation.
Memory contents, from the top of memory down:
(a) Segment 4 (7K), Segment 3 (8K), Segment 2 (5K), Segment 1 (8K), Segment 0 (4K)
(b) Segment 4 (7K), Segment 3 (8K), Segment 2 (5K), hole (3K), Segment 7 (5K), Segment 0 (4K)
(c) hole (3K), Segment 5 (4K), Segment 3 (8K), Segment 2 (5K), hole (3K), Segment 7 (5K), Segment 0 (4K)
(d) hole (3K), Segment 5 (4K), hole (4K), Segment 6 (4K), Segment 2 (5K), hole (3K), Segment 7 (5K), Segment 0 (4K)
(e) hole (10K), Segment 5 (4K), Segment 6 (4K), Segment 2 (5K), Segment 7 (5K), Segment 0 (4K)
Figure 6-10. (a)–(d) Development of external fragmentation. (e) Removal of the external fragmentation by compaction.
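Compaction slides every segment toward address 0 and merges the holes into one. The Java sketch below models memory as a list of blocks; the layout of panel (d) from address 0 upward (segment 0, segment 7, a 3K hole, segment 2, segment 6, a 4K hole, segment 5, a 3K hole) is an assumption read off the figure, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Compaction {
    // A block of memory: a named segment, or a hole when name is null. Sizes in K.
    public static final class Block {
        public final String name;
        public final int sizeK;
        public Block(String name, int sizeK) { this.name = name; this.sizeK = sizeK; }
        @Override public String toString() { return (name == null ? "hole" : name) + " (" + sizeK + "K)"; }
    }

    // Panel (d) of Fig. 6-10, listed from address 0 upward.
    public static List<Block> panelD() {
        return Arrays.asList(
            new Block("Segment 0", 4), new Block("Segment 7", 5), new Block(null, 3),
            new Block("Segment 2", 5), new Block("Segment 6", 4), new Block(null, 4),
            new Block("Segment 5", 4), new Block(null, 3));
    }

    // Keeps the segments in their original order, packed from address 0,
    // and gathers all free space into a single hole at the top.
    public static List<Block> compact(List<Block> memory) {
        List<Block> result = new ArrayList<>();
        int freeK = 0;
        for (Block b : memory) {
            if (b.name == null) freeK += b.sizeK;
            else result.add(b);
        }
        if (freeK > 0) result.add(new Block(null, freeK));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(compact(panelD()));
    }
}
```

The three holes of 3K, 4K, and 3K become a single 10K hole, as in panel (e). The cost is copying every segment that moves, which is why compaction is done rarely.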
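Two-part MULTICS address: 18-bit segment number | 6-bit page number | 10-bit offset within the page.
The segment number selects a descriptor in the descriptor segment; the descriptor locates the segment's page table; the page number selects the page frame from the page table; and the offset selects the word within the page frame.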
Figure 6-11. Conversion of a two-part MULTICS address into a main memory address.
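The three fields add up to a 34-bit address, so they fit in a Java long. The sketch below only extracts and repacks the fields; it does not model the descriptor segment or page tables. The class and method names are hypothetical.

```java
public class MulticsAddress {
    static final int OFFSET_BITS = 10, PAGE_BITS = 6, SEGMENT_BITS = 18; // 34 bits in all

    public static long segment(long addr) {
        return (addr >>> (PAGE_BITS + OFFSET_BITS)) & ((1L << SEGMENT_BITS) - 1);
    }

    public static long page(long addr) {
        return (addr >>> OFFSET_BITS) & ((1L << PAGE_BITS) - 1);
    }

    public static long offset(long addr) {
        return addr & ((1L << OFFSET_BITS) - 1);
    }

    // Packs the three fields into one value; each argument is assumed to fit its field.
    public static long make(long segment, long page, long offset) {
        return (segment << (PAGE_BITS + OFFSET_BITS)) | (page << OFFSET_BITS) | offset;
    }

    public static void main(String[] args) {
        long a = make(5, 3, 17);
        System.out.println("segment " + segment(a) + ", page " + page(a) + ", offset " + offset(a));
    }
}
```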
INDEX (13 bits) | table indicator (1 bit): 0 = GDT, 1 = LDT | privilege level (2 bits): 0–3
Figure 6-12. A Pentium II selector.
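Decoding a selector is a matter of shifts and masks. The Java sketch below assumes the fields sit, from high to low bits, in the order shown (INDEX in bits 15-3, the table indicator in bit 2, the privilege level in bits 1-0), and that each descriptor occupies 8 bytes, so the index is scaled by 8 to locate the descriptor in its table. The class and method names are hypothetical.

```java
public class Selector {
    public static int index(int sel) { return (sel >>> 3) & 0x1FFF; }          // bits 15-3
    public static boolean usesLdt(int sel) { return ((sel >>> 2) & 1) == 1; } // bit 2: 0 = GDT, 1 = LDT
    public static int privilege(int sel) { return sel & 0x3; }                // bits 1-0

    // Byte offset of the descriptor within the GDT or LDT (8 bytes per descriptor).
    public static int descriptorOffset(int sel) { return index(sel) * 8; }

    public static void main(String[] args) {
        int sel = 0x1B;
        System.out.println("index " + index(sel) + ", " + (usesLdt(sel) ? "LDT" : "GDT")
            + ", privilege level " + privilege(sel) + ", descriptor at byte " + descriptorOffset(sel));
    }
}
```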
Relative address 0 (32 bits): BASE 0–15 | LIMIT 0–15
Relative address 4 (32 bits): BASE 24–31 | G | D | 0 | LIMIT 16–19 | P | DPL | TYPE (4 bits) | BASE 16–23
G: 0 = LIMIT is in bytes, 1 = LIMIT is in pages
D: 0 = 16-bit segment, 1 = 32-bit segment
P: 0 = segment is absent from memory, 1 = segment is present in memory
DPL: privilege level (0–3)
TYPE: segment type and protection
Figure 6-13. A Pentium II code segment descriptor. Data segments differ slightly.
The selector picks a descriptor, which holds the base address, the limit, and other fields; the base address is added to the offset to form the 32-bit linear address.
Figure 6-14. Conversion of a (selector, offset) pair to a linear address.
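Because the base is split across three fields of the descriptor, forming the linear address first reassembles it. The Java sketch below assumes the code-segment descriptor layout described above, with the G, D, and P bits at positions 23, 22, and 15 of the second word, and assumes (as on real x86 hardware) that a limit in pages is scaled by 4K. The class and method names are hypothetical.

```java
public class SegmentTranslation {
    // 'low' is the word at relative address 0, 'high' the word at relative address 4.
    public static long base(int low, int high) {
        long b0to15 = (low >>> 16) & 0xFFFFL;
        long b16to23 = high & 0xFFL;
        long b24to31 = (high >>> 24) & 0xFFL;
        return (b24to31 << 24) | (b16to23 << 16) | b0to15;
    }

    // 20-bit LIMIT; when G = 1 it counts 4K pages rather than bytes.
    public static long limitBytes(int low, int high) {
        long limit = (low & 0xFFFFL) | (high & 0x000F0000L);
        boolean pageGranular = ((high >>> 23) & 1) == 1;
        return pageGranular ? (limit << 12) | 0xFFF : limit;
    }

    // Checks presence and limit, then adds the base to the offset.
    public static long linear(int low, int high, long offset) {
        if (((high >>> 15) & 1) == 0) throw new IllegalStateException("segment not present");
        if (offset > limitBytes(low, high)) throw new IllegalArgumentException("offset beyond limit");
        return (base(low, high) + offset) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        int low = 0x5678FFFF;  // BASE 0-15 = 0x5678, LIMIT 0-15 = 0xFFFF
        int high = 0x128F8034; // BASE 24-31 = 0x12, G = 1, LIMIT 16-19 = 0xF, P = 1, BASE 16-23 = 0x34
        System.out.printf("base %08X, limit %08X, linear %08X%n",
            base(low, high), limitBytes(low, high), linear(low, high, 0x10));
    }
}
```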
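(a) Linear address: DIR (10 bits) | PAGE (10 bits) | OFF (12 bits)
(b) DIR indexes the page directory to select a page table; PAGE indexes that page table to select a page frame; OFF selects the word within the page frame.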
Figure 6-15. Mapping of a linear address onto a physical address.
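The two-level walk can be sketched in Java with arrays standing in for the page directory and page tables. This is a simplified model: a null page-directory slot or a negative page-table entry stands for "not present", and entries hold frame numbers rather than the real entry format with its flag bits. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class TwoLevelPaging {
    public static int dir(int linear) { return linear >>> 22; }           // top 10 bits
    public static int page(int linear) { return (linear >>> 12) & 0x3FF; } // middle 10 bits
    public static int off(int linear) { return linear & 0xFFF; }          // low 12 bits

    public static long physical(int[][] pageDirectory, int linear) {
        int[] table = pageDirectory[dir(linear)];   // first level: select a page table
        if (table == null) throw new IllegalStateException("page table not present");
        int frame = table[page(linear)];            // second level: select a page frame
        if (frame < 0) throw new IllegalStateException("page not present");
        return ((long) frame << 12) | off(linear);  // word within the page frame
    }

    public static void main(String[] args) {
        int[][] pageDirectory = new int[1024][];    // all page tables absent ...
        pageDirectory[1] = new int[1024];           // ... except table 1
        Arrays.fill(pageDirectory[1], -1);          // every page in it absent ...
        pageDirectory[1][3] = 0x2A;                 // ... except page 3, in frame 0x2A
        int linear = 0x00403025;
        System.out.printf("DIR %d, PAGE %d, OFF %03X -> physical %X%n",
            dir(linear), page(linear), off(linear), physical(pageDirectory, linear));
    }
}
```

Splitting the table in two means a process only needs page tables for the 4M regions it actually uses; the directory slots for the rest stay empty.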
Protection levels (possible uses): level 0, kernel; level 1, system calls; level 2, shared libraries; level 3, user programs.
Figure 6-16. Protection on the Pentium II.
Page size   Virtual address (64 bits)                   Physical address (41 bits)
8K          51-bit virtual page number + 13-bit offset  28-bit page frame + 13-bit offset
64K         48-bit virtual page number + 16-bit offset  25-bit page frame + 16-bit offset
512K        45-bit virtual page number + 19-bit offset  22-bit page frame + 19-bit offset
4M          42-bit virtual page number + 22-bit offset  19-bit page frame + 22-bit offset
Figure 6-17. Virtual to physical mappings on the UltraSPARC.
(a) TLB (MMU hardware): each entry holds Valid, Context, Virtual page, Flags, and Physical page.
(b) TSB (MMU + software): each entry holds Valid, Context, Virtual page tag, Flags, and Physical page. Entry 0 is shared by all virtual pages ending in 0…0000, entry 1 by all virtual pages ending in 0…0001, and so on.
(c) Translation table (operating system): the format is entirely defined by the operating system.
Figure 6-18. Data structures used in translating virtual addresses on the UltraSPARC. (a) TLB. (b) TSB. (c) Translation table.
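The TSB is a direct-mapped cache: the low-order bits of the virtual page number choose the entry, and the tag confirms the match. The Java sketch below stores the whole virtual page number as the tag for simplicity and uses -1 for a miss (where the operating system would consult its translation table); the real TSB entry format, the class name, and the method names are not taken from the figure and are hypothetical.

```java
public class TsbSketch {
    // One entry: a tag (context plus virtual page) and the translation.
    private static final class Entry {
        boolean valid;
        int context;
        long virtualPage;
        long physicalPage;
    }

    private final Entry[] entries;

    public TsbSketch(int size) {
        if (Integer.bitCount(size) != 1) throw new IllegalArgumentException("size must be a power of two");
        entries = new Entry[size];
        for (int i = 0; i < size; i++) entries[i] = new Entry();
    }

    // Direct mapping: the low-order bits of the virtual page number pick the entry.
    private Entry slot(long virtualPage) {
        return entries[(int) (virtualPage & (entries.length - 1))];
    }

    public void insert(int context, long virtualPage, long physicalPage) {
        Entry e = slot(virtualPage); // evicts whichever page shared this entry
        e.valid = true;
        e.context = context;
        e.virtualPage = virtualPage;
        e.physicalPage = physicalPage;
    }

    // Physical page on a hit; -1 on a miss.
    public long lookup(int context, long virtualPage) {
        Entry e = slot(virtualPage);
        boolean hit = e.valid && e.context == context && e.virtualPage == virtualPage;
        return hit ? e.physicalPage : -1;
    }

    public static void main(String[] args) {
        TsbSketch tsb = new TsbSketch(16);
        tsb.insert(1, 0x10, 7);
        System.out.println(tsb.lookup(1, 0x10)); // 7
        tsb.insert(1, 0x20, 9);                  // 0x20 also ends in 0000: same entry
        System.out.println(tsb.lookup(1, 0x10)); // -1, evicted
    }
}
```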
(a) The file is a sequence of logical records (records 14 through 25 are shown). The next logical record to be read is 19, and the buffer in main memory holds logical record 18.
(b) After the read, the buffer holds logical record 19, and the next logical record to be read is 20.
Figure 6-19. Reading a file consisting of logical records. (a) Before reading record 19. (b) After reading record 19.
Diagram: a disk surface with tracks 0–4 and sectors 0–11 on each track, the read/write head, and the direction of disk rotation. (a) The file's blocks lie in consecutive sectors. (b) The file's blocks are scattered over the disk.
Figure 6-20. Disk allocation strategies. (a) A file in consecutive sectors. (b) A file not in consecutive sectors.
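(a) Free list
Track | Sector | Number of sectors in hole
0 | 0 | 5
0 | 6 | 6
1 | 0 | 10
1 | 11 | 1
2 | 1 | 1
2 | 3 | 3
2 | 7 | 5
3 | 0 | 3
3 | 9 | 3
4 | 3 | 8

(b) Bit map (1 = sector in use, 0 = sector free)
Sector | Track 0 | Track 1 | Track 2 | Track 3 | Track 4
0 | 0 | 0 | 1 | 0 | 1
1 | 0 | 0 | 0 | 0 | 1
2 | 0 | 0 | 1 | 0 | 1
3 | 0 | 0 | 0 | 1 | 0
4 | 0 | 0 | 0 | 1 | 0
5 | 1 | 0 | 0 | 1 | 0
6 | 0 | 0 | 1 | 1 | 0
7 | 0 | 0 | 0 | 1 | 0
8 | 0 | 0 | 0 | 1 | 0
9 | 0 | 0 | 0 | 0 | 0
10 | 0 | 1 | 0 | 0 | 0
11 | 0 | 0 | 0 | 0 | 1
Figure 6-21. Two ways of keeping track of available sectors. (a) A free list. (b) A bit map.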
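The two representations in Fig. 6-21 carry the same information. A short Java sketch can check this by rebuilding the bit map from the free list. It makes two assumptions:

- A 1 marks a sector in use and a 0 a free sector, as the figure's numbers indicate.
- No hole spans two tracks.

The names are hypothetical.

```java
import java.util.Arrays;

// Sketch: deriving the bit map of Fig. 6-21(b) from the free list of
// Fig. 6-21(a). Each hole is {track, first sector, number of sectors}.
class FreeSpace {
    static final int TRACKS = 5, SECTORS = 12;

    /** Returns map[sector][track]: 1 = in use, 0 = free. */
    static int[][] bitMap(int[][] holes) {
        int[][] map = new int[SECTORS][TRACKS];
        for (int[] row : map) Arrays.fill(row, 1);             // assume everything is in use
        for (int[] h : holes)                                  // then clear each free run
            for (int s = h[1]; s < h[1] + h[2]; s++) map[s][h[0]] = 0;
        return map;
    }

    static int freeSectors(int[][] map) {
        int free = 0;
        for (int[] row : map) for (int bit : row) if (bit == 0) free++;
        return free;
    }

    public static void main(String[] args) {
        int[][] holes = {
            {0, 0, 5}, {0, 6, 6}, {1, 0, 10}, {1, 11, 1}, {2, 1, 1},
            {2, 3, 3}, {2, 7, 5}, {3, 0, 3}, {3, 9, 3}, {4, 3, 8}};
        int[][] map = bitMap(holes);
        for (int s = 0; s < SECTORS; s++)
            System.out.println("Sector " + s + ": " + Arrays.toString(map[s]));
        System.out.println(freeSectors(map) + " free sectors");
    }
}
```

The free list is compact when free space comes in a few long runs. The bit map has a fixed size (one bit per sector) and makes it easy to test any single sector.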
(a) A user file directory with entries File 0 through File 10.
(b) A typical entry:
File name: Rubber-ducky
Length: 1840
Type: Anatidae dataram
Creation date: March 16, 1066
Last access: September 1, 1492
Last change: July 4, 1776
Total accesses: 144
Block 0: Track 4, Sector 6
Block 1: Track 19, Sector 9
Block 2: Track 11, Sector 2
Block 3: Track 77, Sector 0
Figure 6-22. (a) A user file directory. (b) The contents of a typical entry in a file directory.
[Diagram: processes 1, 2, and 3 plotted against time. In (a) each process runs continuously on its own CPU. In (b) one CPU is switched among the three processes, so at any moment one process is running and the others are waiting for the CPU.]
Figure 6-23. (a) True parallel processing with multiple CPUs. (b) Parallel processing simulated by switching one CPU among three processes.
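[Diagram: six states, (a) to (f), of a circular buffer, showing the positions of the In and Out pointers as data are added and removed, including a state in which In and Out point to the same slot.]
Figure 6-24. Use of a circular buffer.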
public class m {
  final public static int BUF_SIZE = 100;               // buffer runs from 0 to 99
  final public static long MAX_PRIME = 100000000000L;   // stop here
  public static int in = 0, out = 0;                    // pointers to the data
  public static long buffer[ ] = new long[BUF_SIZE];    // primes stored here
  public static producer p;                             // name of the producer
  public static consumer c;                             // name of the consumer

  public static void main(String args[ ]) {             // main class
    p = new producer( );                                // create the producer
    c = new consumer( );                                // create the consumer
    p.start( );                                         // start the producer
    c.start( );                                         // start the consumer
  }

  // This is a utility function for circularly incrementing in and out
  public static int next(int k) {if (k < BUF_SIZE - 1) return(k+1); else return(0);}
}

class producer extends Thread {                         // producer class
  public void run( ) {                                  // producer code
    long prime = 2;                                     // scratch variable

    while (prime < m.MAX_PRIME) {
      prime = next_prime(prime);                        // statement P1
      if (m.next(m.in) == m.out) suspend( );            // statement P2
      m.buffer[m.in] = prime;                           // statement P3
      m.in = m.next(m.in);                              // statement P4
      if (m.next(m.out) == m.in) m.c.resume( );         // statement P5
    }
  }

  private long next_prime(long prime){ ... }            // function that computes next prime
}

class consumer extends Thread {                         // consumer class
  public void run( ) {                                  // consumer code
    long emirp = 2;                                     // scratch variable

    while (emirp < m.MAX_PRIME) {
      if (m.in == m.out) suspend( );                    // statement C1
      emirp = m.buffer[m.out];                          // statement C2
      m.out = m.next(m.out);                            // statement C3
      if (m.out == m.next(m.next(m.in))) m.p.resume( ); // statement C4
      System.out.println(emirp);                        // statement C5
    }
  }
}
Figure 6-25. Parallel processing with a fatal race condition.
[Diagram: three successive states, (a) to (c), of the 100-slot buffer during the race. Each state is annotated with the producer's position (P1 or P5), the consumer's position (C1 or C5), the values of In and Out, and the number of primes in the buffer. Surviving labels include "Buffer empty, In = Out = 22" and "Producer at P5 sends wakeup, consumer at C1".]
Figure 6-26. Failure of the producer-consumer communication mechanism.
Instruction | Semaphore = 0 | Semaphore > 0
Up | Semaphore = semaphore + 1; if the other process was halted attempting to complete a down instruction on this semaphore, it may now complete the down and continue running | Semaphore = semaphore + 1
Down | Process halts until the other process ups this semaphore | Semaphore = semaphore - 1
Figure 6-27. The effect of a semaphore operation.
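The rules of Fig. 6-27 map directly onto a Java monitor. This is a minimal counting-semaphore sketch using `synchronized`, `wait`, and `notify`. It is not the native up/down interface that Fig. 6-28 assumes, and the class name is hypothetical.

```java
// Sketch of the semaphore semantics in Fig. 6-27 using Java's built-in
// monitor. down() blocks while the value is 0; up() increments the value
// and wakes one thread blocked in down().
class CountingSemaphore {
    private int value;

    CountingSemaphore(int initial) { value = initial; }

    synchronized void down() throws InterruptedException {
        while (value == 0) wait();   // process halts until another process ups this semaphore
        value--;
    }

    synchronized void up() {
        value++;
        notify();                    // a halted down() may now complete
    }

    synchronized int value() { return value; }

    public static void main(String[] args) throws InterruptedException {
        CountingSemaphore s = new CountingSemaphore(0);
        Thread waiter = new Thread(() -> {
            try {
                s.down();
                System.out.println("down completed");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.start();
        s.up();            // either wakes the waiter or lets its down() proceed at once
        waiter.join();
        System.out.println("value = " + s.value());
    }
}
```

The `while` loop (rather than `if`) makes a woken thread recheck the value before decrementing, which guards against spurious wakeups. The standard library's `java.util.concurrent.Semaphore` provides the same semantics through `acquire` and `release`.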
public class m {
  final public static int BUF_SIZE = 100;               // buffer runs from 0 to 99
  final public static long MAX_PRIME = 100000000000L;   // stop here
  public static int in = 0, out = 0;                    // pointers to the data
  public static long buffer[ ] = new long[BUF_SIZE];    // primes stored here
  public static producer p;                             // name of the producer
  public static consumer c;                             // name of the consumer
  public static int filled = 0, available = 100;        // semaphores

  public static void main(String args[ ]) {             // main class
    p = new producer( );                                // create the producer
    c = new consumer( );                                // create the consumer
    p.start( );                                         // start the producer
    c.start( );                                         // start the consumer
  }

  // This is a utility function for circularly incrementing in and out
  public static int next(int k) {if (k < BUF_SIZE - 1) return(k+1); else return(0);}
}

class producer extends Thread {                         // producer class
  native void up(int s); native void down(int s);       // methods on semaphores
  public void run( ) {                                  // producer code
    long prime = 2;                                     // scratch variable

    while (prime < m.MAX_PRIME) {
      prime = next_prime(prime);                        // statement P1
      down(m.available);                                // statement P2
      m.buffer[m.in] = prime;                           // statement P3
      m.in = m.next(m.in);                              // statement P4
      up(m.filled);                                     // statement P5
    }
  }

  private long next_prime(long prime){ ... }            // function that computes next prime
}

class consumer extends Thread {                         // consumer class
  native void up(int s); native void down(int s);       // methods on semaphores
  public void run( ) {                                  // consumer code
    long emirp = 2;                                     // scratch variable

    while (emirp < m.MAX_PRIME) {
      down(m.filled);                                   // statement C1
      emirp = m.buffer[m.out];                          // statement C2
      m.out = m.next(m.out);                            // statement C3
      up(m.available);                                  // statement C4
      System.out.println(emirp);                        // statement C5
    }
  }
}
Figure 6-28. Parallel processing using semaphores.
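Fig. 6-28 relies on hypothetical native `up` and `down` methods. A runnable sketch of the same structure can use `java.util.concurrent.Semaphore`, whose `acquire` and `release` play the roles of down and up. Several choices here are illustrative, not taken from the figure:

- the buffer size;
- the number of primes;
- the trial-division prime generator;
- running the consumer in the calling thread rather than in its own Thread subclass.

As in the figure, only the producer touches `in` and only the consumer touches `out`, so the two semaphores are the only synchronization needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

// Sketch of Fig. 6-28 with java.util.concurrent.Semaphore standing in for
// the native up/down methods. "available" counts free slots, "filled"
// counts slots holding a prime.
class BoundedPrimes {
    static final int BUF_SIZE = 4;                              // small, so the producer really blocks
    static final long[] buffer = new long[BUF_SIZE];
    static final Semaphore available = new Semaphore(BUF_SIZE); // free slots
    static final Semaphore filled = new Semaphore(0);           // full slots

    static int next(int k) { return (k < BUF_SIZE - 1) ? k + 1 : 0; }

    static long nextPrime(long p) {                             // smallest prime greater than p
        for (long n = p + 1; ; n++) {
            boolean prime = n >= 2;
            for (long d = 2; d * d <= n && prime; d++) if (n % d == 0) prime = false;
            if (prime) return n;
        }
    }

    /** Runs one producer thread for `count` primes and consumes them here. */
    static List<Long> run(int count) throws InterruptedException {
        Thread producer = new Thread(() -> {
            int in = 0;                                         // used only by the producer
            long prime = 1;                                     // start below 2 so 2 is produced first
            try {
                for (int i = 0; i < count; i++) {
                    prime = nextPrime(prime);                   // P1
                    available.acquire();                        // P2: down(available)
                    buffer[in] = prime;                         // P3
                    in = next(in);                              // P4
                    filled.release();                           // P5: up(filled)
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        List<Long> consumed = new ArrayList<>();
        int out = 0;                                            // used only by the consumer
        for (int i = 0; i < count; i++) {
            filled.acquire();                                   // C1: down(filled)
            consumed.add(buffer[out]);                          // C2
            out = next(out);                                    // C3
            available.release();                                // C4: up(available)
        }
        producer.join();
        return consumed;                                        // C5 (printing) is left to the caller
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(10));
    }
}
```

The `Semaphore` documentation guarantees that actions before a `release` happen-before actions after a matching `acquire` in another thread. This is what makes the consumer see each prime the producer stored.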
Category | Some examples
File management | Open, read, write, close, and lock files
Directory management | Create and delete directories; move files around
Process management | Spawn, terminate, trace, and signal processes
Memory management | Share memory among processes; protect pages
Getting/setting parameters | Get user, group, process ID; set priority
Dates and times | Set file access times; use interval timer; profile execution
Networking | Establish/accept connection; send/receive message
Miscellaneous | Enable accounting; manipulate disk quotas; reboot the system
Figure 6-29. A rough breakdown of the UNIX system calls.
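[Diagram: in user mode, the shell and user programs sit above the system call interface. In kernel mode, below that interface, are the file system (over the block cache and device drivers) and process management (IPC, scheduling, signals, and memory management), all resting on the hardware.]
Figure 6-30. The structure of a typical UNIX system.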
[Diagram: in user mode, POSIX, Win32, and OS/2 programs run on the POSIX, Win32, and OS/2 subsystems. In kernel mode, the system interface (system services) sits above the executive. The executive comprises I/O and file systems, the file cache, virtual memory, processes and threads, security, object management, device drivers, and the Win32 graphics device interface. Below the executive are the microkernel and the hardware abstraction layer, which rests on the hardware.]
Figure 6-31. The structure of Windows NT.
Item | Windows 95/98 | NT 5.0
Win32 API? | Yes | Yes
Full 32-bit system? | No | Yes
Security? | No | Yes
Protected file mappings? | No | Yes
Separate address space for each MS-DOS program? | No | Yes
Plug and play? | Yes | Yes
Unicode? | No | Yes
Runs on | Intel 80x86 | 80x86, Alpha
Multiprocessor support? | No | Yes
Re-entrant code inside OS? | No | Yes
Some critical OS data writable by user? | Yes | No
Figure 6-32. Some differences between versions of Windows.
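[Diagram: the address space runs from address 0 to 0xFFFFFFFF, with the code at the bottom, the data above it, and the stack at the top.]
Figure 6-33. The address space of a single UNIX process.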
API function | Meaning
VirtualAlloc | Reserve or commit a region
VirtualFree | Release or decommit a region
VirtualProtect | Change the read/write/execute protection on a region
VirtualQuery | Inquire about the status of a region
VirtualLock | Make a region memory resident (i.e., disable paging for it)
VirtualUnlock | Make a region pageable in the usual way
CreateFileMapping | Create a file mapping object and (optionally) assign it a name
MapViewOfFile | Map (part of) a file into the address space
UnmapViewOfFile | Remove a mapped file from the address space
OpenFileMapping | Open a previously created file mapping object
Figure 6-34. The principal API functions for managing virtual memory in Windows NT.
System call | Meaning
creat(name, mode) | Create a file; mode specifies the protection mode
unlink(name) | Delete a file (assuming that there is only 1 link to it)
open(name, mode) | Open or create a file and return a file descriptor
close(fd) | Close a file
read(fd, buffer, count) | Read count bytes into buffer
write(fd, buffer, count) | Write count bytes from buffer
lseek(fd, offset, w) | Move the file pointer as required by offset and w
stat(name, buffer) | Return information about a file
chmod(name, mode) | Change the protection mode of a file
fcntl(fd, cmd, ...) | Do various control operations such as locking (part of) a file
Figure 6-35. The principal UNIX file system calls.
// Open the file descriptors
infd = open("data", 0);
outfd = creat("newf", ProtectionBits);

// Copy loop
do {
  count = read(infd, buffer, bytes);
  if (count > 0) write(outfd, buffer, count);
} while (count > 0);

// Close the files
close(infd);
close(outfd);
Figure 6-36. A program fragment for copying a file using the UNIX system calls. This fragment is in C because Java hides the low-level system calls and we are trying to expose them.
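For comparison, the same loop shape can be written with `java.io` streams. There, `InputStream.read` returns the number of bytes read, or -1 at end of stream, much as the count returned by `read` does in Fig. 6-36. Unlike the C fragment, this does not expose the underlying system calls. The sketch uses in-memory streams so it runs without files, and the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Sketch: the copy loop of Fig. 6-36 expressed with Java streams.
class CopyLoop {
    /** Copies everything from in to out through a fixed-size buffer; returns the byte count. */
    static long copy(InputStream in, OutputStream out, int bufSize) throws IOException {
        if (bufSize <= 0) throw new IllegalArgumentException("buffer size must be positive");
        byte[] buffer = new byte[bufSize];
        long total = 0;
        int count;
        // read returns -1 at end of stream, otherwise the number of bytes placed in buffer
        while ((count = in.read(buffer)) != -1) {
            out.write(buffer, 0, count);     // write only the bytes actually read
            total += count;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "copy me, four bytes at a time".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long n = copy(new ByteArrayInputStream(data), sink, 4);
        System.out.println(n + " bytes: " + sink.toString(StandardCharsets.US_ASCII.name()));
    }
}
```

For real files, a `FileInputStream` and `FileOutputStream` would be passed instead. Since Java 9, `InputStream.transferTo` performs an equivalent loop internally.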
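[Diagram: a directory tree. The root directory contains bin, dev, lib, and usr (/bin, /dev, /lib, /usr). /usr contains the home directories ast and jim (/usr/ast, /usr/jim). /usr/ast/bin holds game 1 to game 4. The entries data, foo.c, and jotto also appear under the home directories, and data files form the leaves.]
Figure 6-37. Part of a typical UNIX directory system.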
System call | Meaning
mkdir(name, mode) | Create a new directory
rmdir(name) | Delete an empty directory
opendir(name) | Open a directory for reading
readdir(dirpointer) | Read the next entry in a directory
closedir(dirpointer) | Close a directory
chdir(dirname) | Change working directory to dirname
link(name1, name2) | Create a directory entry name2 pointing to name1
unlink(name) | Remove name from its directory
Figure 6-38. The principal UNIX directory management calls.
API function | UNIX | Meaning
CreateFile | open | Create a file or open an existing file; return a handle
DeleteFile | unlink | Destroy an existing file
CloseHandle | close | Close a file
ReadFile | read | Read data from a file
WriteFile | write | Write data to a file
SetFilePointer | lseek | Set the file pointer to a specific place in the file
GetFileAttributes | stat | Return the file properties
LockFile | fcntl | Lock a region of the file to provide mutual exclusion
UnlockFile | fcntl | Unlock a previously locked region of the file
Figure 6-39. The principal Win32 API functions for file I/O. The second column gives the nearest UNIX equivalent.
// Open files for input and output.
inhandle = CreateFile("data", GENERIC_READ, 0, NULL, OPEN_EXISTING, 0, NULL);
outhandle = CreateFile("newf", GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);

// Copy the file.
do {
  s = ReadFile(inhandle, buffer, BUF_SIZE, &count, NULL);
  if (s > 0 && count > 0) WriteFile(outhandle, buffer, count, &ocnt, NULL);
} while (s > 0 && count > 0);

// Close the files.
CloseHandle(inhandle);
CloseHandle(outhandle);
Figure 6-40. A program fragment for copying a file using the Windows NT API functions. This fragment is in C because Java hides the low-level system calls and we are trying to expose them.
API function | UNIX | Meaning
CreateDirectory | mkdir | Create a new directory
RemoveDirectory | rmdir | Remove an empty directory
FindFirstFile | opendir | Initialize to start reading the entries in a directory
FindNextFile | readdir | Read the next directory entry
MoveFile | (none) | Move a file from one directory to another
SetCurrentDirectory | chdir | Change the current working directory
Figure 6-41. The principal Win32 API functions for directory management. The second column gives the nearest UNIX equivalent, when one exists.
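[Diagram: the master file table consists of an MFT header followed by one entry per file. The MFT entry for one file holds standard information, the file name, the MS-DOS name, security information, and data.]
Figure 6-42. The Windows NT master file table.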
[Diagram: a tree of processes, all labeled A: the original process A at the root, the children of A below it, and the grandchildren of A below them.]
Figure 6-43. A process tree in UNIX.
Thread call | Meaning
pthread_create | Create a new thread in the caller's address space
pthread_exit | Terminate the calling thread
pthread_join | Wait for a thread to terminate
pthread_mutex_init | Create a new mutex
pthread_mutex_destroy | Destroy a mutex
pthread_mutex_lock | Lock a mutex
pthread_mutex_unlock | Unlock a mutex
pthread_cond_init | Create a condition variable
pthread_cond_destroy | Destroy a condition variable
pthread_cond_wait | Wait on a condition variable
pthread_cond_signal | Release one thread waiting on a condition variable
Figure 6-44. The principal POSIX thread calls.
7 THE ASSEMBLY LANGUAGE LEVEL
 | Programmer-years to produce the program | Program execution time in seconds
Assembly language | 50 | 33
High-level language | 10 | 100
Mixed approach before tuning |  | 
  Critical 10% | 1 | 90
  Other 90% | 9 | 10
  Total | 10 | 100
Mixed approach after tuning |  | 
  Critical 10% | 6 | 30
  Other 90% | 9 | 10
  Total | 15 | 40
Figure 7-1. Comparison of assembly language and high-level language programming, with and without tuning.
(a)
Label     Opcode  Operands   Comments
FORMULA:  MOV     EAX,I      ; register EAX = I
          ADD     EAX,J      ; register EAX = I + J
          MOV     N,EAX      ; N = I + J
I         DD      3          ; reserve 4 bytes initialized to 3
J         DD      4          ; reserve 4 bytes initialized to 4
N         DD      0          ; reserve 4 bytes initialized to 0

(b)
Label     Opcode  Operands   Comments
FORMULA   MOVE.L  I, D0      ; register D0 = I
          ADD.L   J, D0      ; register D0 = I + J
          MOVE.L  D0, N      ; N = I + J
I         DC.L    3          ; reserve 4 bytes initialized to 3
J         DC.L    4          ; reserve 4 bytes initialized to 4
N         DC.L    0          ; reserve 4 bytes initialized to 0

(c)
Label     Opcode  Operands             Comments
FORMULA:  SETHI   %HI(I),%R1           ! R1 = high-order bits of the address of I
          LD      [%R1+%LO(I)],%R1     ! R1 = I
          SETHI   %HI(J),%R2           ! R2 = high-order bits of the address of J
          LD      [%R2+%LO(J)],%R2     ! R2 = J
          NOP                          ! wait for J to arrive from memory
          ADD     %R1,%R2,%R2          ! R2 = R1 + R2
          SETHI   %HI(N),%R1           ! R1 = high-order bits of the address of N
          ST      %R2,[%R1+%LO(N)]
I:        .WORD   3                    ! reserve 4 bytes initialized to 3
J:        .WORD   4                    ! reserve 4 bytes initialized to 4
N:        .WORD   0                    ! reserve 4 bytes initialized to 0
Figure 7-2. Computation of N = I + J. (a) Pentium II. (b) Motorola 680x0. (c) SPARC.
Pseudoinstr | Meaning
SEGMENT | Start a new segment (text, data, etc.) with certain attributes
ENDS | End the current segment
ALIGN | Control the alignment of the next instruction or data
EQU | Define a new symbol equal to a given expression
DB | Allocate storage for one or more (initialized) bytes
DW | Allocate storage for one or more (initialized) 16-bit halfwords
DD | Allocate storage for one or more (initialized) 32-bit words
DQ | Allocate storage for one or more (initialized) 64-bit double words
PROC | Start a procedure
ENDP | End a procedure
MACRO | Start a macro definition
ENDM | End a macro definition
PUBLIC | Export a name defined in this module
EXTERN | Import a name from another module
INCLUDE | Fetch and include another file
IF | Start conditional assembly based on a given expression
ELSE | Start conditional assembly if the IF condition above was false
ENDIF | End conditional assembly
COMMENT | Define a new start-of-comment character
PAGE | Generate a page break in the listing
END | Terminate the assembly program
Figure 7-3. Some of the pseudoinstructions available in the Pentium II assembler (MASM).
(a)
          MOV  EAX,P
          MOV  EBX,Q
          MOV  Q,EAX
          MOV  P,EBX

          MOV  EAX,P
          MOV  EBX,Q
          MOV  Q,EAX
          MOV  P,EBX

(b)
SWAP      MACRO
          MOV  EAX,P
          MOV  EBX,Q
          MOV  Q,EAX
          MOV  P,EBX
          ENDM

          SWAP

          SWAP
Figure 7-4. Assembly language code for interchanging P and Q twice. (a) Without a macro. (b) With a macro.
Item | Macro call | Procedure call
When is the call made? | During assembly | During execution
Is the body inserted into the object program every place the call is made? | Yes | No
Is a procedure call instruction inserted into the object program and later executed? | No | Yes
Must a return instruction be used after the call is done? | No | Yes
How many copies of the body appear in the object program? | One per macro call | 1
Figure 7-5. Comparison of macro calls with procedure calls.
(a)
          MOV  EAX,P
          MOV  EBX,Q
          MOV  Q,EAX
          MOV  P,EBX

          MOV  EAX,R
          MOV  EBX,S
          MOV  S,EAX
          MOV  R,EBX

(b)
CHANGE    MACRO P1, P2
          MOV  EAX,P1
          MOV  EBX,P2
          MOV  P2,EAX
          MOV  P1,EBX
          ENDM

          CHANGE P, Q

          CHANGE R, S
Figure 7-6. Nearly identical sequences of statements. (a) Without a macro. (b) With a macro.
Label | Opcode | Operands | Comments | Length | ILC
MARIA: | MOV | EAX,I | EAX = I | 5 | 100
 | MOV | EBX,J | EBX = J | 6 | 105
ROBERTA: | MOV | ECX,K | ECX = K | 6 | 111
 | IMUL | EAX,EAX | EAX = I * I | 2 | 117
 | IMUL | EBX,EBX | EBX = J * J | 3 | 119
 | IMUL | ECX,ECX | ECX = K * K | 3 | 122
MARILYN: | ADD | EAX,EBX | EAX = I * I + J * J | 2 | 125
 | ADD | EAX,ECX | EAX = I * I + J * J + K * K | 2 | 127
STEPHANY: | JMP | DONE | branch to DONE | 5 | 129
Figure 7-7. The instruction location counter (ILC) keeps track of the address where the instructions will be loaded in memory. In this example, the statements prior to MARIA occupy 100 bytes.
Symbol | Value | Other information
MARIA | 100 | 
ROBERTA | 111 | 
MARILYN | 125 | 
STEPHANY | 129 | 
Figure 7-8. A symbol table for the program of Fig. 7-7.
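The symbol table of Fig. 7-8 follows from the ILC values of Fig. 7-7, as this minimal pass-one sketch shows. The ILC starts at 100 and advances by each instruction's length, and each label is recorded with the ILC value current at its statement. The lengths are supplied directly instead of being looked up in an opcode table such as Fig. 7-9. The names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of how pass one builds the symbol table of Fig. 7-8 from the
// statements of Fig. 7-7.
class PassOne {
    /** labels[i] is the label of statement i (or null); lengths[i] is its length in bytes. */
    static Map<String, Integer> buildSymbolTable(String[] labels, int[] lengths, int start) {
        Map<String, Integer> table = new LinkedHashMap<>();
        int ilc = start;                                  // instruction location counter
        for (int i = 0; i < labels.length; i++) {
            if (labels[i] != null) {
                if (table.containsKey(labels[i]))
                    throw new IllegalStateException("duplicate label " + labels[i]);
                table.put(labels[i], ilc);                // label gets the current ILC value
            }
            ilc += lengths[i];                            // advance past this instruction
        }
        return table;
    }

    public static void main(String[] args) {
        String[] labels = {"MARIA", null, "ROBERTA", null, null, null, "MARILYN", null, "STEPHANY"};
        int[] lengths   = {5, 6, 6, 2, 3, 3, 2, 2, 5};
        System.out.println(buildSymbolTable(labels, lengths, 100));
    }
}
```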
Opcode | First operand | Second operand | Hexadecimal opcode | Instruction length | Instruction class
AAA | - | - | 37 | 1 | 6
ADD | EAX | immed32 | 05 | 5 | 4
ADD | reg | reg | 01 | 2 | 19
AND | EAX | immed32 | 25 | 5 | 4
AND | reg | reg | 21 | 2 | 19
Figure 7-9. A few excerpts from the opcode table for a Pentium II assembler.
public static void pass_one( ) {
  // This procedure is an outline of pass one of a simple assembler.
  boolean more_input = true;                    // flag that stops pass one
  String line, symbol, literal, opcode = null;  // fields of the instruction (opcode stays null for comments)
  int location_counter, length, value, type;    // misc. variables
  final int END_STATEMENT = -2;                 // signals end of input

  location_counter = 0;                         // assemble first instruction at 0
  initialize_tables( );                         // general initialization

  while (more_input) {                          // more_input set to false by END
    line = read_next_line( );                   // get a line of input
    length = 0;                                 // # bytes in the instruction
    type = 0;                                   // which type (format) is the instruction

    if (line_is_not_comment(line)) {
      symbol = check_for_symbol(line);          // is this line labeled?
      if (symbol != null)                       // if it is, record symbol and value
        enter_new_symbol(symbol, location_counter);
      literal = check_for_literal(line);        // does line contain a literal?
      if (literal != null)                      // if it does, enter it in table
        enter_new_literal(literal);

      // Now determine the opcode type. -1 means illegal opcode.
      opcode = extract_opcode(line);            // locate opcode mnemonic
      type = search_opcode_table(opcode);       // find format, e.g. OP REG1,REG2
      if (type < 0)                             // if not an opcode, is it a pseudoinstruction?
        type = search_pseudo_table(opcode);
      switch(type) {                            // determine the length of this instruction
        case 1: length = get_length_of_type1(line); break;
        case 2: length = get_length_of_type2(line); break;
        // other cases here
      }
    }

    write_temp_file(type, opcode, length, line);   // useful info for pass two
    location_counter = location_counter + length;  // update loc_ctr
    if (type == END_STATEMENT) {                // are we done with input?
      more_input = false;                       // if so, perform housekeeping tasks
      rewind_temp_for_pass_two( );              // like rewinding the temp file
      sort_literal_table( );                    // and sorting the literal table
      remove_redundant_literals( );             // and removing duplicates from it
    }
  }
}
Figure 7-10. Pass one of a simple assembler.
public static void pass_two( ) {
  // This procedure is an outline of pass two of a simple assembler.
  boolean more_input = true;                    // flag that stops pass two
  String line, opcode;                          // fields of the instruction
  int location_counter, length, type;           // misc. variables
  final int END_STATEMENT = -2;                 // signals end of input
  final int MAX_CODE = 16;                      // max bytes of code per instruction
  byte code[ ] = new byte[MAX_CODE];            // holds generated code per instruction

  location_counter = 0;                         // assemble first instruction at 0

  while (more_input) {                          // more_input set to false by END
    type = read_type( );                        // get type field of next line
    opcode = read_opcode( );                    // get opcode field of next line
    length = read_length( );                    // get length field of next line
    line = read_line( );                        // get the actual line of input
    if (type != 0) {                            // type 0 is for comment lines
      switch(type) {                            // generate the output code
        case 1: eval_type1(opcode, length, line, code); break;
        case 2: eval_type2(opcode, length, line, code); break;
        // other cases here
      }
    }
    write_output(code);                         // write the binary code
    write_listing(code, line);                  // print one line on the listing
    location_counter = location_counter + length;  // update loc_ctr
    if (type == END_STATEMENT) {                // are we done with input?
      more_input = false;                       // if so, perform housekeeping tasks
      finish_up( );                             // odds and ends
    }
  }
}
Figure 7-11. Pass two of a simple assembler.
(a)
Symbol    Value  Hash code
Andy      14025  0
Anton     31253  4
Cathy     65254  5
Dick      54185  0
Erik      47357  6
Frances   56445  3
Frank     14332  3
Gerrit    32334  4
Hans      44546  4
Henri     75544  2
Jan       17097  5
Jaco      64533  6
Maarten   23267  0
Reind     63453  1
Roel      76764  7
Willem    34544  6
Wiebren   34344  1

(b)
Hash table  Linked table
0           Andy 14025 -> Maarten 23267 -> Dick 54185
1           Reind 63453 -> Wiebren 34344
2           Henri 75544
3           Frances 56445 -> Frank 14332
4           Hans 44546 -> Gerrit 32334 -> Anton 31253
5           Jan 17097 -> Cathy 65254
6           Jaco 64533 -> Willem 34544 -> Erik 47357
7           Roel 76764
Figure 7-12. Hash coding. (a) Symbols, values, and the hash codes derived from the symbols. (b) Eight-entry hash table with linked lists of symbols and values.
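The structure in Fig. 7-12(b) can be sketched in Java as an array of buckets, where each bucket is a linked list of (symbol, value) entries. The figure does not say which hash function produced its hash codes, so the character-sum function below is a hypothetical stand-in. With this function, the symbols will generally land in different buckets than they do in the figure.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Symbol table using hash coding with chaining, in the style of Fig. 7-12(b).
class ChainedSymbolTable {
    record Entry(String symbol, int value) {}

    private final List<LinkedList<Entry>> buckets = new ArrayList<>();

    ChainedSymbolTable(int size) {
        for (int i = 0; i < size; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Hypothetical hash: sum of the character codes modulo the table size.
    int hash(String symbol) {
        int sum = 0;
        for (char c : symbol.toCharArray()) {
            sum += c;
        }
        return sum % buckets.size();
    }

    void insert(String symbol, int value) {
        buckets.get(hash(symbol)).addLast(new Entry(symbol, value));
    }

    // Only the one list selected by the hash code is searched.
    Integer lookup(String symbol) {
        for (Entry e : buckets.get(hash(symbol))) {
            if (e.symbol().equals(symbol)) {
                return e.value();
            }
        }
        return null;   // symbol not in the table
    }
}
```

With 17 symbols spread over 8 buckets, a lookup examines only about two entries on average, instead of scanning all 17.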
[Figure: source procedures 1, 2, and 3 each pass through the translator, producing object modules 1, 2, and 3; the linker combines the object modules into an executable binary program.]
Figure 7-13. Generation of an executable binary program from a collection of independently translated source procedures requires using a linker.
[Figure: four object modules, each with its own addresses starting at 0.
Object module A (0 to 400): MOVE P TO X, BRANCH TO 200, CALL B
Object module B (0 to 600): MOVE Q TO X, BRANCH TO 300, CALL C
Object module C (0 to 500): MOVE R TO X, BRANCH TO 200, CALL D
Object module D (0 to 300): MOVE S TO X, BRANCH TO 200]
Figure 7-14. Each module has its own address space, starting at 0.
[Figure: (a) The modules placed in the binary image at A = 100 to 500, B = 500 to 1100, C = 1100 to 1600, D = 1600 to 1900, with their instructions still unchanged (BRANCH TO 200, CALL B; BRANCH TO 300, CALL C; BRANCH TO 200, CALL D; BRANCH TO 200). (b) After relocation and linking: A contains BRANCH TO 300 and CALL 500; B contains BRANCH TO 800 and CALL 1100; C contains BRANCH TO 1300 and CALL 1600; D contains BRANCH TO 1800. The MOVE instructions are unchanged.]
Figure 7-15. (a) The object modules of Fig. 7-14 after being positioned in the binary image but before being relocated and linked. (b) The same object modules after linking and after relocation has been performed. Together they form an executable binary program, ready to run.
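The two linker jobs shown in Fig. 7-15 are relocation and external-reference resolution. Relocation adds each module's starting address to its module-relative addresses. External-reference resolution replaces a procedure name with that procedure's address. The following is a minimal Java sketch of both. Representing instructions as strings, and recognizing only `BRANCH TO n` and `CALL name`, are simplifications. A real linker patches binary words, using the relocation dictionary and external reference table of each object module. The load address of 100 matches Fig. 7-15(b).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of a linker: lay modules out end to end, relocate branch targets,
// and resolve CALLs by procedure name. Instructions are strings for clarity.
class SimpleLinker {
    record ObjectModule(String name, int length, List<String> code) {}

    // Assign each module a starting address, beginning at loadAddress.
    static Map<String, Integer> assignBases(List<ObjectModule> modules, int loadAddress) {
        Map<String, Integer> base = new LinkedHashMap<>();
        int next = loadAddress;
        for (ObjectModule m : modules) {
            base.put(m.name(), next);
            next += m.length();
        }
        return base;
    }

    static Map<String, List<String>> link(List<ObjectModule> modules, int loadAddress) {
        Map<String, Integer> base = assignBases(modules, loadAddress);
        Map<String, List<String>> image = new LinkedHashMap<>();
        for (ObjectModule m : modules) {
            int myBase = base.get(m.name());
            List<String> out = new ArrayList<>();
            for (String instr : m.code()) {
                if (instr.startsWith("BRANCH TO ")) {
                    // Relocation: module-relative address + module's starting address.
                    int target = Integer.parseInt(instr.substring("BRANCH TO ".length()));
                    out.add("BRANCH TO " + (target + myBase));
                } else if (instr.startsWith("CALL ")) {
                    // External reference: replace the procedure name by its address.
                    String callee = instr.substring("CALL ".length());
                    Integer calleeBase = base.get(callee);
                    if (calleeBase == null) {
                        throw new IllegalStateException("unresolved reference: " + callee);
                    }
                    out.add("CALL " + calleeBase);
                } else {
                    out.add(instr);   // operands such as P and X are left symbolic here
                }
            }
            image.put(m.name(), out);
        }
        return image;
    }
}
```

When this sketch is fed the four modules of Fig. 7-14, it produces the addresses shown in Fig. 7-15(b). For example, B starts at 500, so its BRANCH TO 300 becomes BRANCH TO 800.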
[Figure: an object module consists of, from top to bottom: end of module, relocation dictionary, machine instructions and constants, external reference table, entry point table, identification.]
Figure 7-16. The internal structure of an object module produced by a translator.
[Figure: the program of Fig. 7-15(b) loaded 300 addresses higher (A = 400 to 800, B = 800 to 1400, C = 1400 to 1900, D = 1900 to 2200), with its instructions unchanged: BRANCH TO 300, CALL 500, BRANCH TO 800, CALL 1100, BRANCH TO 1300, CALL 1600, BRANCH TO 1800.]
Figure 7-17. The relocated binary program of Fig. 7-15(b) moved up 300 addresses. Many instructions now refer to an incorrect memory address.
[Figure: (a) A procedure segment containing CALL EARTH, CALL FIRE, CALL AIR, CALL WATER, CALL EARTH, and CALL WATER, each an indirect call through the linkage segment. For each of EARTH, AIR, FIRE, and WATER, the linkage segment holds an indirect word containing an invalid address, followed by the name of the procedure stored as a character string (the linkage information for that procedure). (b) After EARTH has been called, its indirect word holds the address of EARTH; the words for AIR, FIRE, and WATER are still invalid.]
Figure 7-18. Dynamic linking. (a) Before EARTH is called. (b) After EARTH has been called and linked.
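The mechanism in Fig. 7-18 amounts to lazy resolution through an indirect word. Every call goes through the linkage entry. The first call finds an invalid address, so the linker looks up the procedure by its stored name and overwrites the indirect word. Later calls then go straight through. The following is a minimal Java sketch. Procedures are modeled as `IntUnaryOperator` values, and the `library` map stands in for the system's search for a procedure by name. Both are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// Lazy linking through a linkage segment, modeled on Fig. 7-18.
class LinkageSegment {
    // One linkage entry: an indirect word plus the procedure name as a string.
    static final class Entry {
        final String name;
        IntUnaryOperator indirectWord;   // null plays the role of "invalid address"
        Entry(String name) { this.name = name; }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final Map<String, IntUnaryOperator> library;   // hypothetical name lookup
    int linkCount = 0;                                      // times the linker was invoked

    LinkageSegment(Map<String, IntUnaryOperator> library) {
        this.library = library;
    }

    // CALL name: always goes indirectly through the linkage entry.
    int call(String name, int arg) {
        Entry e = entries.computeIfAbsent(name, Entry::new);
        if (e.indirectWord == null) {
            // Invalid address: "trap" to the linker, which finds the procedure
            // by its character-string name and overwrites the indirect word.
            IntUnaryOperator proc = library.get(e.name);
            if (proc == null) {
                throw new IllegalStateException("cannot link " + e.name);
            }
            e.indirectWord = proc;
            linkCount++;
        }
        return e.indirectWord.applyAsInt(arg);
    }
}
```

Calling EARTH twice invokes the linker only once. A procedure that is never called, such as FIRE in this sketch, is never linked at all.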
[Figure: user processes 1 and 2 both map the same DLL file, which consists of a header followed by procedures A, B, C, and D.]
Figure 7-19. Use of a DLL file by two processes.
8 PARALLEL COMPUTER ARCHITECTURES
[Figure: (a) 16 CPUs (P) connected to a single shared memory. (b) A bit-map image divided into 16 sections, each assigned to one CPU.]
Figure 8-1. (a) A multiprocessor with 16 CPUs sharing a common memory. (b) An image partitioned into 16 sections, each being analyzed by a different CPU.
[Figure: (a) 16 CPUs (P), each with its own private memory (M), connected by a message-passing interconnection network. (b) The same 16 CPU-memory pairs, with one section of the image stored in each private memory.]
Figure 8-2. (a) A multicomputer with 16 CPUs, each with its own private memory. (b) The bit-map image of Fig. 8-1 split up among the 16 memories.
[Figure: in each panel, machines 1 and 2 each have four layers: application, language run-time system, operating system, and hardware. The shared memory spans the two machines at (a) the hardware layer, (b) the operating system layer, or (c) the language run-time system layer.]
Figure 8-3. Various layers where shared memory can be implemented. (a) The hardware. (b) The operating system. (c) The language runtime system.
Figure 8-4. Various topologies. The heavy dots represent switches. The CPUs and memories are not shown. (a) A star. (b) A complete interconnect. (c) A tree. (d) A ring. (e) A grid. (f) A double torus. (g) A cube. (h) A 4D hypercube.
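[Figure: four four-port switches, A, B, C, and D, arranged in a square grid, each with input and output ports. CPU 1 and CPU 2 are attached, and a packet in transit is shown with its front, middle, and end in different switches.]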
Figure 8-5. An interconnection network in the form of a four-switch square grid. Only two of the CPUs are shown.
[Figure: panels (a), (b), and (c) show the entire packet being received into a four-port switch (A, B, C, or D) before it is forwarded on the way from CPU 1 to CPU 2.]
Figure 8-6. Store-and-forward packet switching.
[Figure: CPUs 1 to 4 attached to four-port switches A, B, C, and D, each with input ports and output buffers; the circuits being set up block one another in a cycle.]
Figure 8-7. Deadlock in a circuit-switched interconnection network.
[Plot: speedup (0 to 60) versus number of CPUs (0 to 60) for the N-body problem, Awari, and skyline matrix inversion, with a dotted line for linear speedup.]
Figure 8-8. Real programs achieve less than the perfect speedup indicated by the dotted line.
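[Figure: (a) The running time T of a program on one CPU, divided into an inherently sequential part (fraction f) and a potentially parallelizable part (fraction 1 - f). (b) With 1 CPU active, the sequential part still takes fT; with n CPUs active, the parallelizable part takes (1 - f)T/n.]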
Figure 8-9. (a) A program has a sequential part and a parallelizable part. (b) Effect of running part of the program in parallel.
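Fig. 8-9(b) leads directly to Amdahl's law. The parallel running time is fT + (1 - f)T/n, so the speedup is T / (fT + (1 - f)T/n) = n / (1 + (n - 1)f). The following Java sketch computes this value in two ways: from the closed form and from the two time components. The class and method names are illustrative.

```java
// Amdahl's law, derived from Fig. 8-9(b): sequential time fT plus
// parallelizable time (1 - f)T spread over n CPUs.
class Amdahl {
    // speedup = T / (fT + (1 - f)T/n) = n / (1 + (n - 1) f)
    static double speedup(int n, double f) {
        if (n < 1 || f < 0.0 || f > 1.0) {
            throw new IllegalArgumentException("need n >= 1 and 0 <= f <= 1");
        }
        return n / (1 + (n - 1) * f);
    }

    // The same value computed directly from the two time components.
    static double speedupFromTimes(int n, double f, double t) {
        double parallelTime = f * t + (1 - f) * t / n;
        return t / parallelTime;
    }
}
```

Suppose 10% of a program is sequential (f = 0.1). Then 16 CPUs give a speedup of only 6.4, and no number of CPUs can push the speedup past 1/f = 10.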
[Figure: CPUs connected (a, b) by a single bus and (c, d) by a grid.]
Figure 8-10. (a) A 4-CPU bus-based system. (b) A 16-CPU bus-based system. (c) A 4-CPU grid-based system. (d) A 16-CPU grid-based system.
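[Figure: (a) processes P1, P2, and P3 connected in a pipeline; (b) processes P1 to P3 alternating between computation and synchronization points; (c) a tree of processes P1 to P9 created by recursively splitting the work; (d) processes taking work from a shared work queue.]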
Figure 8-11. Computational paradigms. (a) Pipeline. (b) Phased computation. (c) Divide and conquer. (d) Replicated worker.
Physical (hardware)   Logical (software)   Examples
Multiprocessor        Shared variables     Image processing as in Fig. 8-1
Multiprocessor        Message passing      Message passing simulated with buffers in memory
Multicomputer         Shared variables     DSM, Linda, Orca, etc. on an SP/2 or a PC network
Multicomputer         Message passing      PVM or MPI on an SP/2 or a network of PCs
Figure 8-12. Combinations of physical and logical sharing.
Instruction streams   Data streams   Name   Examples
1                     1              SISD   Classical Von Neumann machine
1                     Multiple       SIMD   Vector supercomputer, array processor
Multiple              1              MISD   Arguably none
Multiple              Multiple       MIMD   Multiprocessor, multicomputer
Figure 8-13. Flynn’s taxonomy of parallel computers.
Parallel computer architectures
    SISD (Von Neumann)
    SIMD
        Vector processor
        Array processor
    MISD
        ?
    MIMD
        Multiprocessors (shared memory)
            UMA: Bus, Switched
            COMA
            NUMA: CC-NUMA, NC-NUMA
        Multicomputers (message passing)
            MPP: Grid, Hypercube
            COW
Figure 8-14. A taxonomy of parallel computers.
[Figure: input vectors feeding a vector ALU.]
Figure 8-15. A vector ALU.
Operation                 Examples
A_i = f1(B_i)             f1 = cosine, square root
Scalar = f2(A)            f2 = sum, minimum
A_i = f3(B_i, C_i)        f3 = add, subtract
A_i = f4(scalar, B_i)     f4 = multiply B_i by a constant
Figure 8-16. Various combinations of vector and scalar operations.
Step  Name                 Values
1     Fetch operands       1.082 × 10^12 - 9.212 × 10^11
2     Adjust exponent      1.082 × 10^12 - 0.9212 × 10^12
3     Execute subtraction  0.1608 × 10^12
4     Normalize result     1.608 × 10^11
Figure 8-17. Steps in a floating-point subtraction.
Step \ Cycle        1       2       3       4       5       6       7
Fetch operands      B1,C1   B2,C2   B3,C3   B4,C4   B5,C5   B6,C6   B7,C7
Adjust exponent             B1,C1   B2,C2   B3,C3   B4,C4   B5,C5   B6,C6
Execute operation                   B1+C1   B2+C2   B3+C3   B4+C4   B5+C5
Normalize result                            B1+C1   B2+C2   B3+C3   B4+C4
Figure 8-18. A pipelined floating-point adder.
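The pattern in Fig. 8-18 is regular. Pair i enters the first stage in cycle i and advances one stage per cycle, so once the pipeline is full, one result leaves every cycle. The following minimal Java sketch reproduces the occupancy table and the total cycle count. It assumes that every stage takes exactly one clock cycle and that the pipeline never stalls.

```java
// Timing of the four-stage pipelined floating-point adder of Fig. 8-18,
// assuming one clock cycle per stage and no stalls.
class FpPipeline {
    static final String[] STAGES = {
        "Fetch operands", "Adjust exponent", "Execute operation", "Normalize result"
    };

    // Which element pair (1-based) is in stage s (0-based) during cycle c (1-based)?
    // Pair i enters stage 0 in cycle i and moves forward one stage per cycle.
    // Returns 0 if the stage is idle in that cycle.
    static int occupant(int s, int c, int n) {
        int i = c - s;
        return (i >= 1 && i <= n) ? i : 0;
    }

    // The last of n pairs enters in cycle n and leaves the last of k stages
    // k - 1 cycles later: n + k - 1 cycles in total, versus n * k without pipelining.
    static int totalCycles(int n, int k) {
        return n + k - 1;
    }

    // Prints the occupancy table for the first `cycles` cycles.
    static void printTable(int n, int cycles) {
        for (int s = 0; s < STAGES.length; s++) {
            StringBuilder row = new StringBuilder(String.format("%-18s", STAGES[s]));
            for (int c = 1; c <= cycles; c++) {
                int i = occupant(s, c, n);
                row.append(i == 0 ? "   .   " : String.format(" B%d,C%d ", i, i));
            }
            System.out.println(row);
        }
    }
}
```

For 64-element vectors, this pipeline needs 67 cycles instead of 256.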
[Figure: register sets: A, 8 24-bit address registers, backed by B, 64 24-bit holding registers for addresses; S, 8 64-bit scalar registers, backed by T, 64 64-bit holding registers for scalars; and 8 64-bit vector registers with 64 elements per register. Functional units: address units (ADD, MUL); scalar integer units (ADD, SHIFT, BOOLEAN, POP. COUNT); scalar/vector floating-point units (ADD, MUL, RECIP.); vector integer units (ADD, SHIFT, BOOLEAN).]
Figure 8-19. Registers and functional units of the Cray-1.
[Figure: (a) CPU 1 writes 100 and CPU 2 writes 200 to word x, while CPUs 3 and 4 each read x twice. (b) to (d) Three interleavings of the writes (W100, W200) and the reads (R3, R4), with the value each read returns.]
Figure 8-20. (a) Two CPUs writing and two CPUs reading a common memory word. (b) - (d) Three possible ways the two writes and four reads might be interleaved in time.
[Figure: writes by CPUs A, B, and C plotted against time; writes 1A to 1F, 2A to 2D, and 3A to 3C fall into successive epochs separated by synchronization points.]
Figure 8-21. Weakly consistent memory uses synchronization operations to divide time into sequential epochs.
[Figure: CPUs and a shared memory (M) on a bus: (a) without caches, (b) with a cache per CPU, (c) with a cache and a private memory per CPU.]
Figure 8-22. Three bus-based multiprocessors. (a) Without caching. (b) With caching. (c) With caching and private memories.
Action       Local request                Remote request
Read miss    Fetch data from memory
Read hit     Use data from local cache
Write miss   Update data in memory
Write hit    Update cache and memory      Invalidate cache entry
Figure 8-23. The write-through cache coherence protocol. The empty boxes indicate that no action is taken.
[Figure: three CPUs with caches and a memory on a shared bus, showing the state of block A after each step.
(a) CPU 1 reads block A: CPU 1 holds A in state Exclusive.
(b) CPU 2 reads block A: CPUs 1 and 2 hold A in state Shared.
(c) CPU 2 writes block A: CPU 2 holds A in state Modified; CPU 1's copy is invalidated.
(d) CPU 3 reads block A: CPUs 2 and 3 hold A in state Shared.
(e) CPU 2 writes block A: CPU 2 holds A in state Modified.
(f) CPU 1 writes block A: CPU 1 holds A in state Modified.]
Figure 8-24. The MESI cache coherence protocol.
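The six steps of Fig. 8-24 can be replayed with a small state machine. The following Java sketch is deliberately simplified. It tracks one block across several snooping caches and models no data values or bus timing. Its counter only records how often a Modified copy had to be written back. CPUs are indexed from 0, so index 0 corresponds to CPU 1 in the figure.

```java
import java.util.Arrays;

// Simplified MESI bookkeeping for one memory block shared by snooping caches.
class MesiBlock {
    enum State { MODIFIED, EXCLUSIVE, SHARED, INVALID }

    final State[] cache;        // state of the block in each CPU's cache
    int writeBacks = 0;         // Modified copies flushed to memory

    MesiBlock(int cpus) {
        cache = new State[cpus];
        Arrays.fill(cache, State.INVALID);
    }

    void read(int p) {
        if (cache[p] != State.INVALID) {
            return;                                       // read hit: no bus traffic
        }
        boolean othersHaveIt = false;
        for (int q = 0; q < cache.length; q++) {
            if (q == p || cache[q] == State.INVALID) continue;
            if (cache[q] == State.MODIFIED) writeBacks++;  // owner flushes dirty data first
            cache[q] = State.SHARED;
            othersHaveIt = true;
        }
        cache[p] = othersHaveIt ? State.SHARED : State.EXCLUSIVE;
    }

    void write(int p) {
        for (int q = 0; q < cache.length; q++) {
            if (q == p || cache[q] == State.INVALID) continue;
            if (cache[q] == State.MODIFIED) writeBacks++;  // flush before losing ownership
            cache[q] = State.INVALID;                      // all other copies are invalidated
        }
        cache[p] = State.MODIFIED;   // from Exclusive this needs no other cache's help
    }
}
```

Running read(0), read(1), write(1), read(2), write(1), write(0) reproduces panels (a) to (f). Two write-backs occur: one in step (d) and one in step (f).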
[Figure: (a) 8 CPUs (000 to 111) along one side and 8 memories (000 to 111) along the other, with a crosspoint switch at each intersection, some closed and some open. (b) An open crosspoint switch. (c) A closed crosspoint switch.]
Figure 8-25. (a) An 8 × 8 crossbar switch. (b) An open crosspoint. (c) A closed crosspoint.
[Figure: 16 boards (0 to 15), each with 4 UltraSPARC CPUs and 4 GB of memory in 1-GB modules, connected by a 16 × 16 crossbar switch (Gigaplane-XB) whose transfer unit is a 64-byte cache block, plus four address buses for snooping.]
Figure 8-26. The Sun Enterprise 10000 symmetric multiprocessor.
[Figure: (a) A 2 × 2 switch with inputs A and B and outputs X and Y. (b) A message consisting of Module, Address, Opcode, and Value fields.]
Figure 8-27. (a) A 2 × 2 switch. (b) A message format.
[Figure: 8 CPUs (000 to 111) connected to 8 memories (000 to 111) through 3 stages of four 2 × 2 switches each (1A to 1D, 2A to 2D, 3A to 3D); two paths, a and b, are marked.]
Figure 8-28. An omega switching network.
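The following Java sketch routes a request through an omega network. It assumes perfect-shuffle wiring before each stage and switches lettered A, B, C, D from the top. It also assumes the usual omega routing rule: the switch in stage i examines bit i of the Module field, taking the leftmost bit first, and sends the message to its upper output on a 0 and its lower output on a 1. The method name and the output format are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Destination-tag routing in an omega network with 2^k inputs and k stages.
class OmegaNetwork {
    // Returns, per stage, the switch used (e.g. "1D") and the output taken.
    static List<String> route(int src, int dst, int k) {
        int n = 1 << k;
        int line = src;
        List<String> hops = new ArrayList<>();
        for (int stage = 1; stage <= k; stage++) {
            // Perfect shuffle: rotate the k-bit line number left by one bit.
            line = ((line << 1) | (line >> (k - 1))) & (n - 1);
            int sw = line >> 1;                       // which 2 x 2 switch in this stage
            int bit = (dst >> (k - stage)) & 1;       // next module bit, leftmost first
            line = (line & ~1) | bit;                 // 0 = upper output, 1 = lower output
            hops.add(stage + "" + (char) ('A' + sw) + (bit == 0 ? " upper" : " lower"));
        }
        if (line != dst) {
            throw new AssertionError("routing failed");
        }
        return hops;
    }
}
```

With this wiring, a request from CPU 011 to memory 110 passes through switches 1D, 2D, and 3D, leaving by the lower, lower, and upper outputs.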
[Figure: four nodes, each with a CPU and memory on a local bus (an MMU is shown in one node); the local buses are joined by a system bus.]
Figure 8-29. A NUMA machine based on two levels of buses. The Cm* was the first multiprocessor to use this design.
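[Figure: (a) Nodes 0 to 255, each with a CPU, memory, and directory on a local bus, connected by an interconnection network. (b) A 32-bit address divided into Node (8 bits), Block (18 bits), and Offset (6 bits) fields. (c) The directory at node 36, with entries 0 to 2^18 - 1; entry 2 is marked 1 and records node 82 as the holder of that block, while entries 0, 1, 3, and 4 are marked 0.]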
Figure 8-30. (a) A 256-node directory-based multiprocessor. (b) Division of a 32-bit memory address into fields. (c) The directory at node 36.
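The address split in Fig. 8-30(b) is plain bit-field arithmetic. The top 8 bits select the home node. The next 18 bits index that node's directory, which holds 2^18 blocks of 2^6 = 64 bytes, or 16 MB per node. The low 6 bits select a byte within the block. The following is a minimal Java sketch; the helper names are hypothetical.

```java
// Splitting a 32-bit address into the Node / Block / Offset fields of Fig. 8-30(b).
class DirectoryAddress {
    static final int OFFSET_BITS = 6, BLOCK_BITS = 18, NODE_BITS = 8;   // 6 + 18 + 8 = 32

    static int node(int addr) {
        return (addr >>> (OFFSET_BITS + BLOCK_BITS)) & ((1 << NODE_BITS) - 1);
    }

    static int block(int addr) {
        return (addr >>> OFFSET_BITS) & ((1 << BLOCK_BITS) - 1);
    }

    static int offset(int addr) {
        return addr & ((1 << OFFSET_BITS) - 1);
    }

    static int make(int node, int block, int offset) {
        return (node << (OFFSET_BITS + BLOCK_BITS)) | (block << OFFSET_BITS) | offset;
    }
}
```

The unsigned shift (`>>>`) keeps node numbers 128 to 255 from coming out negative when the top bit of the `int` is set.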
[Figure: (a) 16 clusters (0 to 15), each with CPUs with caches, a memory, a directory (D), and an intercluster interface on a snooping local bus; the clusters are connected by a nonsnooping intercluster bus. (b) The directory for cluster 13: one entry per memory block homed there, each entry holding one bit per cluster (0 to 15) and a state (uncached, shared, or modified). The bit for cluster 0 in the entry for block 1 tells whether cluster 0 has block 1 of the memory homed here in any of its caches.]
Figure 8-31. (a) The DASH architecture. (b) A DASH directory.
[Figure: a quad board with 4 Pentium Pros and up to 4 GB of RAM, attached through a snooping bus interface to an IQ board containing a directory controller, a 32-MB cache RAM, the directory, and a data pump that connects to the SCI ring.]
Figure 8-32. The NUMA-Q multiprocessor.
[Figure: the local memory table at the home node (entries 0 to 2^19 - 1) points to the first holder of the line; the cache directories at nodes 4, 9, and 22 each hold Back, State, Tag, and Fwd fields that link the holders together.]
Figure 8-33. SCI chains all the holders of a given cache line together in a doubly-linked list. In this example, a line is shown cached at three nodes.
[Figure: nodes, each with a CPU, memory, disk and I/O, and a communication processor on a local interconnect; the communication processors connect to a high-performance interconnection network.]
Figure 8-34. A generic multicomputer.
[Figure: nodes, each with an Alpha CPU, memory, control + E registers, and a communication processor, connected by a full-duplex 3D torus; a shell and a GigaRing link the system to the network, disks, and tape.]
Figure 8-35. The Cray Research T3E.
[Figure: (a) A Kestrel board carrying two nodes, each with two Pentium Pros (PPro), 64 MB of memory, I/O, and a NIC on a 64-bit local bus. (b) An interconnection network with dimensions 38 × 32 × 2.]
Figure 8-36. The Intel/Sandia Option Red system. (a) The Kestrel board. (b) The interconnection network.
[Figure: three panels, each plotting CPU groups 0 to 7 against time and showing how jobs 1 to 9 are placed.]
Figure 8-37. Scheduling a COW. (a) FIFO. (b) Without head-of-line blocking. (c) Tiling. The shaded areas indicate idle CPUs.
[Figure: (a) Three CPUs on an Ethernet, with one packet going east and one going west. (b) An Ethernet switch built from line cards plugged into a backplane.]
Figure 8-38. (a) Three computers on an Ethernet. (b) An Ethernet switch.
[Figure: CPUs 1 to 16 attached through ports to four ATM switches; a packet is divided into cells, and two virtual circuits are shown.]
Figure 8-39. Sixteen CPUs connected by four ATM switches. Two virtual circuits are shown.
[Figure: a globally shared virtual memory of 16 pages (0 to 15), with the memories of CPUs 0 to 3 connected by a network.
(a) CPU 0: pages 0, 2, 5, 9; CPU 1: pages 1, 3, 6, 8, 10; CPU 2: pages 4, 7, 11, 12, 14; CPU 3: pages 13, 15.
(b) As in (a), except that page 10 has moved from CPU 1 to CPU 0.
(c) As in (b), except that CPU 1 also holds a copy of page 10.]
Figure 8-40. A virtual address space consisting of 16 pages spread over four nodes of a multicomputer. (a) The initial situation. (b) After CPU 0 references page 10. (c) After CPU 1 references page 10, here assumed to be a read-only page.
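The transitions in Fig. 8-40 reduce to bookkeeping about which CPUs hold which pages. When a CPU references a page it does not hold, the page fault brings the page over. A writable page migrates to the requester, while a read-only page is copied. The following minimal Java sketch covers only that bookkeeping. It does not model page transfer, page tables, or invalidation of read-only copies before a later write. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Page-based distributed shared memory reduced to "who holds which page".
class PageDsm {
    final Map<Integer, Set<Integer>> holders = new HashMap<>();   // page -> CPUs with a copy
    int faults = 0;                                                // remote references

    void place(int page, int cpu) {
        Set<Integer> s = new TreeSet<>();
        s.add(cpu);
        holders.put(page, s);
    }

    // CPU `cpu` references `page`: writable pages migrate, read-only pages replicate.
    void reference(int cpu, int page, boolean readOnly) {
        Set<Integer> copies = holders.get(page);
        if (copies == null) {
            throw new IllegalArgumentException("no such page: " + page);
        }
        if (copies.contains(cpu)) {
            return;                              // page is local: no fault
        }
        faults++;                                // page fault: fetch over the network
        if (!readOnly) {
            copies.clear();                      // migration: the old copy goes away
        }
        copies.add(cpu);
    }

    SortedSet<Integer> pagesAt(int cpu) {
        SortedSet<Integer> result = new TreeSet<>();
        holders.forEach((page, cpus) -> {
            if (cpus.contains(cpu)) result.add(page);
        });
        return result;
    }
}
```

If the initial layout of panel (a) is loaded, a writable reference to page 10 by CPU 0 gives panel (b). A read-only reference by CPU 1 then leaves page 10 at both CPUs, as in panel (c).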
("abc", 2, 5)
("matrix-1", 1, 6, 3.14)
("family", "is sister", Carolyn, Elinor)
Figure 8-41. Three Linda tuples.
Object implementation stack;
    top: integer;                              # storage for the stack
    stack: array [integer 0..N-1] of integer;

    operation push(item: integer);             # function returning nothing
    begin
        stack[top] := item;                    # push item onto the stack
        top := top + 1;                        # increment the stack pointer
    end;

    operation pop( ): integer;                 # function returning an integer
    begin
        guard top > 0 do                       # suspend if the stack is empty
            top := top - 1;                    # decrement the stack pointer
            return stack[top];                 # return the top item
        od;
    end;

begin
    top := 0;                                  # initialization
end;
Figure 8-42. A simplified ORCA stack object, with internal data and two operations.
A BINARY NUMBERS
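[Figure: the digits d_n ... d_2 d_1 d_0 . d_{-1} d_{-2} d_{-3} ... d_{-k}, where d_2, d_1, and d_0 occupy the 100's, 10's, and 1's places, and d_{-1}, d_{-2}, and d_{-3} occupy the .1's, .01's, and .001's places.]
$$\text{Number} = \sum_{i=-k}^{n} d_i \times 10^{i}$$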
Figure A-1. The general form of a decimal number.
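Binary:
$$11111010001_2 = 1\times2^{10} + 1\times2^{9} + 1\times2^{8} + 1\times2^{7} + 1\times2^{6} + 0\times2^{5} + 1\times2^{4} + 0\times2^{3} + 0\times2^{2} + 0\times2^{1} + 1\times2^{0} = 1024 + 512 + 256 + 128 + 64 + 0 + 16 + 0 + 0 + 0 + 1 = 2001$$
Octal:
$$3721_8 = 3\times8^{3} + 7\times8^{2} + 2\times8^{1} + 1\times8^{0} = 1536 + 448 + 16 + 1 = 2001$$
Decimal:
$$2001_{10} = 2\times10^{3} + 0\times10^{2} + 0\times10^{1} + 1\times10^{0} = 2000 + 0 + 0 + 1 = 2001$$
Hexadecimal:
$$\mathrm{7D1}_{16} = 7\times16^{2} + 13\times16^{1} + 1\times16^{0} = 1792 + 208 + 1 = 2001$$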
Figure A-2. The number 2001 in binary, octal, and hexadecimal.
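The expansions in Fig. A-2 apply the formula of Fig. A-1 in other radices. Each digit is multiplied by the radix raised to the digit's position, and the products are summed. The following Java sketch evaluates that sum directly, from the rightmost digit (weight 1) leftward. The class and method names are illustrative.

```java
// Direct evaluation of positional notation: value = sum of d_i * radix^i.
class Positional {
    static long value(String digits, int radix) {
        long total = 0;
        long weight = 1;                              // radix^0 for the rightmost digit
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = Character.digit(digits.charAt(i), radix);
            if (d < 0) {
                throw new IllegalArgumentException("bad digit: " + digits.charAt(i));
            }
            total += d * weight;
            weight *= radix;
        }
        return total;
    }
}
```

All four strings in Fig. A-2 evaluate to 2001. To go the other way, the standard `Integer.toString(2001, 16)` can be used, though it produces lowercase digits ("7d1").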
Decimal  Binary        Octal  Hex
0        0             0      0
1        1             1      1
2        10            2      2
3        11            3      3
4        100           4      4
5        101           5      5
6        110           6      6
7        111           7      7
8        1000          10     8
9        1001          11     9
10       1010          12     A
11       1011          13     B
12       1100          14     C
13       1101          15     D
14       1110          16     E
15       1111          17     F
16       10000         20     10
20       10100         24     14
30       11110         36     1E
40       101000        50     28
50       110010        62     32
60       111100        74     3C
70       1000110       106    46
80       1010000       120    50
90       1011010       132    5A
100      1100100       144    64
1000     1111101000    1750   3E8
2989     101110101101  5655   BAD
Figure A-3. Decimal numbers and their binary, octal, and hexadecimal equivalents.
Example 1
Hexadecimal:               1948.B6
Binary (groups of 4):      0001 1001 0100 1000 . 1011 0110
Binary (groups of 3):      001 100 101 001 000 . 101 101 100
Octal:                     14510.554

Example 2
Hexadecimal:               7BA3.BC4
Binary (groups of 4):      0111 1011 1010 0011 . 1011 1100 0100
Binary (groups of 3):      000 111 101 110 100 011 . 101 111 000 100
Octal:                     75643.5704
Figure A-4. Examples of octal-to-binary and hexadecimal-to-binary conversion.
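The conversions in Fig. A-4 need no arithmetic at all. Each hexadecimal digit expands to exactly 4 bits, and the bits are then regrouped into threes, working outward from the radix point, with zeros padded at the outer ends. The following Java sketch converts hexadecimal to octal this way. The helper names are hypothetical.

```java
// Hexadecimal -> binary -> octal by regrouping bits, as in Fig. A-4.
class RadixRegroup {
    // Each hex digit becomes exactly 4 bits.
    static String hexToBits(String hex) {
        StringBuilder bits = new StringBuilder();
        for (char c : hex.toCharArray()) {
            int v = Character.digit(c, 16);
            if (v < 0) {
                throw new IllegalArgumentException("not a hex digit: " + c);
            }
            String b = Integer.toBinaryString(v);
            bits.append("0".repeat(4 - b.length())).append(b);
        }
        return bits.toString();
    }

    // Group bits in threes outward from the radix point: pad on the left for the
    // integer part and on the right for the fraction.
    static String bitsToOctal(String bits, boolean integerPart) {
        int pad = (3 - bits.length() % 3) % 3;
        String padded = integerPart ? "0".repeat(pad) + bits : bits + "0".repeat(pad);
        StringBuilder oct = new StringBuilder();
        for (int i = 0; i < padded.length(); i += 3) {
            oct.append(Integer.parseInt(padded.substring(i, i + 3), 2));
        }
        return oct.toString();
    }

    static String hexToOctal(String hex) {
        String[] parts = hex.split("\\.", -1);
        String intOct = bitsToOctal(hexToBits(parts[0]), true).replaceFirst("^0+(?=.)", "");
        if (parts.length == 1) {
            return intOct;
        }
        return intOct + "." + bitsToOctal(hexToBits(parts[1]), false);
    }
}
```

Both examples in the figure round-trip this way: 1948.B6 gives 14510.554, and 7BA3.BC4 gives 75643.5704.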
Quotients  Remainders
1492
746        0
373        0
186        1
93         0
46         1
23         0
11         1
5          1
2          1
1          0
0          1
Reading the remainders from the bottom up: $10111010100_2 = 1492_{10}$
Figure A-5. Conversion of the decimal number 1492 to binary by successive halving, starting at the top and working downward. For example, 93 divided by 2 yields a quotient of 46 and a remainder of 1, written on the line below it.
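Successive halving translates directly into a loop. Each remainder is the next binary digit, starting from the least significant one, so the collected digits must be reversed at the end. The following is a minimal Java sketch with illustrative names.

```java
// Decimal to binary by successive halving (Fig. A-5).
class Halving {
    static String toBinary(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("non-negative values only");
        }
        if (n == 0) {
            return "0";
        }
        StringBuilder bits = new StringBuilder();
        while (n > 0) {
            bits.append(n % 2);   // remainder of this halving step
            n /= 2;               // the quotient goes on the next line
        }
        return bits.reverse().toString();   // remainders are read bottom up
    }
}
```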
Result      1 + 2 × 1499 = 2999
            1 + 2 × 749  = 1499
            1 + 2 × 374  = 749
            0 + 2 × 187  = 374
            1 + 2 × 93   = 187
            1 + 2 × 46   = 93
            0 + 2 × 23   = 46
            1 + 2 × 11   = 23
            1 + 2 × 5    = 11
            1 + 2 × 2    = 5
            0 + 2 × 1    = 2
Start here  1 + 2 × 0    = 1
Figure A-6. Conversion of the binary number 101110110111 to decimal by successive doubling, starting at the bottom. Each line is formed by doubling the one below it and adding the corresponding bit. For example, 749 is twice 374 plus the 1 bit on the same line as 749.
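Successive doubling is the reverse process. The bits are consumed from left to right, and at each step the running value is doubled and the current bit is added. This is Horner's rule for radix 2. The following is a minimal Java sketch with illustrative names.

```java
// Binary to decimal by successive doubling (Fig. A-6).
class Doubling {
    static long toDecimal(String bits) {
        long value = 0;
        for (char b : bits.toCharArray()) {
            if (b != '0' && b != '1') {
                throw new IllegalArgumentException("not a bit: " + b);
            }
            value = 2 * value + (b - '0');   // double the running value, add the bit
        }
        return value;
    }
}
```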
N        N            -N           -N           -N           -N
decimal  binary       signed mag.  1's compl.   2's compl.   excess 128
1        00000001     10000001     11111110     11111111     01111111
2        00000010     10000010     11111101     11111110     01111110
3        00000011     10000011     11111100     11111101     01111101
4        00000100     10000100     11111011     11111100     01111100
5        00000101     10000101     11111010     11111011     01111011
6        00000110     10000110     11111001     11111010     01111010
7        00000111     10000111     11111000     11111001     01111001
8        00001000     10001000     11110111     11111000     01111000
9        00001001     10001001     11110110     11110111     01110111
10       00001010     10001010     11110101     11110110     01110110
20       00010100     10010100     11101011     11101100     01101100
30       00011110     10011110     11100001     11100010     01100010
40       00101000     10101000     11010111     11011000     01011000
50       00110010     10110010     11001101     11001110     01001110
60       00111100     10111100     11000011     11000100     01000100
70       01000110     11000110     10111001     10111010     00111010
80       01010000     11010000     10101111     10110000     00110000
90       01011010     11011010     10100101     10100110     00100110
100      01100100     11100100     10011011     10011100     00011100
127      01111111     11111111     10000000     10000001     00000001
128      Nonexistent  Nonexistent  Nonexistent  10000000     00000000
Figure A-7. Negative 8-bit numbers in four systems.
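Each column of Fig. A-7 follows from a one-line rule:
- Signed magnitude sets the high bit.
- One's complement inverts every bit.
- Two's complement is the one's complement plus 1, which equals the low 8 bits of -N.
- Excess 128 stores 128 - N.

The following Java sketch computes each representation and returns `null` where the figure says "Nonexistent". The names are illustrative.

```java
// The four 8-bit representations of -n from Fig. A-7, for 1 <= n <= 128.
class NegativeEncodings {
    static String bits8(int v) {
        String s = Integer.toBinaryString(v & 0xFF);
        return "0".repeat(8 - s.length()) + s;
    }

    static String signedMagnitude(int n) { return n > 127 ? null : bits8(0x80 | n); }  // sign bit + magnitude
    static String onesComplement(int n)  { return n > 127 ? null : bits8(~n); }        // invert every bit
    static String twosComplement(int n)  { return bits8(-n); }                         // -128 still fits
    static String excess128(int n)       { return bits8(128 - n); }                    // store -n + 128
}
```

Excess 128 is two's complement with the sign bit flipped. This can be checked against any row of the table.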
Addend  Augend  Sum  Carry
0       0       0    0
0       1       1    0
1       0       1    0
1       1       0    1
Figure A-8. The addition table in binary.
Decimal   1's complement            2's complement
10        00001010                  00001010
+ (-3)    11111100                  11111101
+7        1 00000110                1 00000111
          carry 1 added back in     carry 1 discarded
          00000111                  00000111
Figure A-9. Addition in one's complement and two's complement.
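The only difference between the two columns of Fig. A-9 is what happens to the carry out of the high-order bit. In one's complement, the carry is added back in (the "end-around carry"). In two's complement, it is simply discarded. The following Java sketch works on 8-bit patterns held in `int`s; the names are illustrative.

```java
// 8-bit addition in one's complement and two's complement (Fig. A-9).
class ComplementAdd {
    static int onesComplementAdd(int a, int b) {
        int sum = (a & 0xFF) + (b & 0xFF);
        if (sum > 0xFF) {
            sum = (sum & 0xFF) + 1;      // end-around carry
        }
        return sum & 0xFF;
    }

    static int twosComplementAdd(int a, int b) {
        return ((a & 0xFF) + (b & 0xFF)) & 0xFF;   // carry out is discarded
    }
}
```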
B FLOATING-POINT NUMBERS
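[Figure: the real number line divided into seven regions:
1 Negative overflow: below -10^100
2 Expressible negative numbers: -10^100 to -10^-100
3 Negative underflow: between -10^-100 and 0
4 Zero
5 Positive underflow: between 0 and 10^-100
6 Expressible positive numbers: 10^-100 to 10^100
7 Positive overflow: above 10^100]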
Figure B-1. The real number line can be divided into seven regions.
Digits in fraction  Digits in exponent  Lower bound  Upper bound
3                   1                   10^-12       10^9
3                   2                   10^-102      10^99
3                   3                   10^-1002     10^999
3                   4                   10^-10002    10^9999
4                   1                   10^-13       10^9
4                   2                   10^-103      10^99
4                   3                   10^-1003     10^999
4                   4                   10^-10003    10^9999
5                   1                   10^-14       10^9
5                   2                   10^-104      10^99
5                   3                   10^-1004     10^999
5                   4                   10^-10004    10^9999
10                  3                   10^-1009     10^999
20                  3                   10^-1019     10^999
Figure B-2. The approximate lower and upper bounds of expressible (unnormalized) floating-point decimal numbers.
Example 1: Exponentiation to the base 2
Unnormalized: sign 0; excess-64 exponent 1010100 (84 - 64 = 20); fraction .0000 0000 0001 1011
$$2^{20}\,(1\times2^{-12} + 1\times2^{-13} + 1\times2^{-15} + 1\times2^{-16}) = 432$$
To normalize, shift the fraction left 11 bits and subtract 11 from the exponent.
Normalized: sign 0; excess-64 exponent 1001001 (73 - 64 = 9); fraction .1101 1000 0000 0000
$$2^{9}\,(1\times2^{-1} + 1\times2^{-2} + 1\times2^{-4} + 1\times2^{-5}) = 432$$

Example 2: Exponentiation to the base 16
Unnormalized: sign 0; excess-64 exponent 1000101 (69 - 64 = 5); fraction .0000 0000 0001 1011 (hexadecimal digits 0, 0, 1, B)
$$16^{5}\,(1\times16^{-3} + \mathrm{B}\times16^{-4}) = 432$$
To normalize, shift the fraction left 2 hexadecimal digits and subtract 2 from the exponent.
Normalized: sign 0; excess-64 exponent 1000011 (67 - 64 = 3); fraction .0001 1011 0000 0000 (hexadecimal digits 1, B, 0, 0)
$$16^{3}\,(1\times16^{-1} + \mathrm{B}\times16^{-2}) = 432$$
Figure B-3. Examples of normalized floating-point numbers.
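Normalization in Fig. B-3 is a loop: shift the fraction left by one digit and decrease the exponent by 1, until the leading digit is nonzero. A digit is 1 bit for base 2 and 4 bits for base 16. The following Java sketch works on the 16-bit fractions of the figure. It uses the true exponent, with the excess-64 encoding removed, and omits exponent range checks. The names are illustrative.

```java
// Normalizing the 16-bit fractions of Fig. B-3 (exponent range checks omitted).
class Normalize {
    // Returns {exponent, fraction} with the leading base-`base` digit nonzero.
    static int[] normalize(int base, int exponent, int fraction) {
        int digitBits = (base == 2) ? 1 : (base == 16) ? 4 : -1;
        if (digitBits < 0) {
            throw new IllegalArgumentException("base must be 2 or 16");
        }
        int leadingDigitMask = ((1 << digitBits) - 1) << (16 - digitBits);
        while (fraction != 0 && (fraction & leadingDigitMask) == 0) {
            fraction = (fraction << digitBits) & 0xFFFF;   // shift left one digit
            exponent--;                                      // compensate in the exponent
        }
        return new int[] {exponent, fraction};
    }

    // value = (fraction / 2^16) * base^exponent
    static double value(int base, int exponent, int fraction) {
        return fraction / 65536.0 * Math.pow(base, exponent);
    }
}
```

In base 2 the loop runs 11 times, taking the exponent from 20 to 9. In base 16 it runs twice, taking the exponent from 5 to 3. Both results still represent 432.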
[Figure: (a) Single precision: sign (1 bit), exponent (8 bits), fraction (23 bits). (b) Double precision: sign (1 bit), exponent (11 bits), fraction (52 bits).]
Figure B-4. IEEE floating-point formats. (a) Single precision. (b) Double precision.
Item                          Single precision          Double precision
Bits in sign                  1                         1
Bits in exponent              8                         11
Bits in fraction              23                        52
Bits, total                   32                        64
Exponent system               Excess 127                Excess 1023
Exponent range                -126 to +127              -1022 to +1023
Smallest normalized number    2^-126                    2^-1022
Largest normalized number     approx. 2^128             approx. 2^1024
Decimal range                 approx. 10^-38 to 10^38   approx. 10^-308 to 10^308
Smallest denormalized number  approx. 10^-45            approx. 10^-324
Figure B-5. Characteristics of IEEE floating-point numbers.
Type           Sign bit  Exponent        Fraction
Normalized     ±         0 < Exp < Max   Any bit pattern
Denormalized   ±         0               Any nonzero bit pattern
Zero           ±         0               0
Infinity       ±         111...1         0
Not a number   ±         111...1         Any nonzero bit pattern
Figure B-6. IEEE numerical types.
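In Java, the fields of a single-precision number can be read with the standard `Float.floatToRawIntBits`. The following sketch uses that method to split a number into the fields of Fig. B-4(a) and then classify it by the rules of Fig. B-6, where Max, the all-ones exponent, is 255. Double precision works the same way with `Double.doubleToRawLongBits`, an 11-bit exponent, and a 52-bit fraction. The class and method names are illustrative.

```java
// IEEE single-precision fields (Fig. B-4(a)) and number classes (Fig. B-6).
class IeeeFields {
    static int sign(float x)     { return Float.floatToRawIntBits(x) >>> 31; }
    static int exponent(float x) { return (Float.floatToRawIntBits(x) >>> 23) & 0xFF; }
    static int fraction(float x) { return Float.floatToRawIntBits(x) & 0x7FFFFF; }

    static String classify(float x) {
        int exp = exponent(x);
        int frac = fraction(x);
        if (exp == 0) {
            return frac == 0 ? "Zero" : "Denormalized";
        }
        if (exp == 0xFF) {
            return frac == 0 ? "Infinity" : "Not a number";
        }
        return "Normalized";
    }
}
```

For example, 1.0f has a biased exponent of 127 and an all-zero fraction, because the leading 1 bit is implied and not stored. Negative zero has the sign bit set but is still classified as Zero.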