) H (r - n) - E0 (grad(/?<) H (n - r) = -eo^Sfr ) l {r\) Jdr8{r ) and ip^(r,9,(p) constitute the Green's function associated with the point charge q\ located at (ri,0\, ), respectively. ) for r —> ri. Therefore, since the potential function
<
- r{)er + e0ip 5{r - n)er,
(26a) (26b)
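respectively. With $\mathbf{D}^{>}\cdot\mathbf{e}_r = D_r^{>}$ and $\mathbf{D}^{<}\cdot\mathbf{e}_r = D_r^{<}$, and the fact that the supports of the functions $H(r-r_1)$, $H(r_1-r)$, and $\delta(r-r_1)$ are mutually exclusive while their union spans the entire $r$-axis, we can split (25) and (26b) into the following equations:

$$(\nabla\cdot\mathbf{D}^{>})\,H(r-r_1) = 0 \tag{27a}$$
$$(\nabla\cdot\mathbf{D}^{<})\,H(r_1-r) = 0 \tag{27b}$$
$$D_r^{>}(r,\theta,\phi)\,\delta(r-r_1) - D_r^{<}(r,\theta,\phi)\,\delta(r-r_1) = q_1\frac{1}{r^{2}}\,\delta(r-r_1)\,\delta(\cos\theta-\cos\theta_1)\,\delta(\phi-\phi_1) \tag{27c}$$

and

$$\mathbf{D}^{>}(r,\theta,\phi)\,H(r-r_1) = -\varepsilon_0\left(\mathrm{grad}\,\varphi^{>}(r,\theta,\phi)\right)H(r-r_1) \tag{28a}$$
$$\mathbf{D}^{<}(r,\theta,\phi)\,H(r_1-r) = -\varepsilon_0\left(\mathrm{grad}\,\varphi^{<}(r,\theta,\phi)\right)H(r_1-r) \tag{28b}$$
$$-\varepsilon_0\,\varphi^{>}(r,\theta,\phi)\,\delta(r-r_1)\,\mathbf{e}_r + \varepsilon_0\,\varphi^{<}(r,\theta,\phi)\,\delta(r-r_1)\,\mathbf{e}_r = 0, \tag{28c}$$

respectively. It is worth mentioning that integrating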
Factorization of Potential and Field Distributions
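(27c) with respect to $r$ from $0$ to $\infty$ results in the "interface condition" for $D_r$ at $r = r_1$. Likewise, multiplying (28c) from both sides by $\mathbf{e}_r$ and integrating the resulting equation with respect to $r$ from $0$ to $\infty$, we obtain the "interface condition" for $\varphi$ at $r = r_1$. Putting together what we have found so far, we obtain the equations in (29), (30) and (31), which are, respectively, associated with $H(r-r_1)$, $H(r_1-r)$ and $\delta(r_1-r)$:

$$H(r-r_1):\quad \begin{cases} \nabla\cdot\mathbf{D}^{>} = 0 \\ \mathbf{D}^{>} = -\varepsilon_0\,\mathrm{grad}\,\varphi^{>} \end{cases} \tag{29}$$
$$H(r_1-r):\quad \begin{cases} \nabla\cdot\mathbf{D}^{<} = 0 \\ \mathbf{D}^{<} = -\varepsilon_0\,\mathrm{grad}\,\varphi^{<} \end{cases} \tag{30}$$
$$\delta(r_1-r):\quad \begin{cases} D_r^{>}(r_1,\theta,\phi) - D_r^{<}(r_1,\theta,\phi) = q_1\dfrac{1}{r_1^{2}}\,\delta(\cos\theta-\cos\theta_1)\,\delta(\phi-\phi_1) \\ \varphi^{>}(r_1,\theta,\phi) - \varphi^{<}(r_1,\theta,\phi) = 0 \end{cases} \tag{31}$$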
Since in the domains $r < r_1$ and $r > r_1$ we have the same differential equations (Eqs. (29) and (30)), the distinction conveyed through the signs $<$ and $>$ becomes obsolete, and therefore they will be omitted subsequently. With this omission our governing equations are:

$$\nabla\cdot\mathbf{D} = 0 \tag{32a}$$
$$\mathbf{D} = -\varepsilon_0\,\mathrm{grad}\,\varphi \tag{32b}$$
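The divergence operator $\nabla\cdot$ applied to the vector field $\mathbf{F}$ in spherical co-ordinates is defined as follows:

$$\nabla\cdot\mathbf{F} = \frac{1}{r^{2}}\frac{\partial}{\partial r}\left(r^{2}F_r\right) + \frac{1}{r\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\,F_\theta\right) + \frac{1}{r\sin\theta}\frac{\partial F_\phi}{\partial\phi} \tag{33}$$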
A.-R.
Baghai-Wadji
& E. Li
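We defined the grad operator earlier, Eq. (23). With these definitions (32) can be written in the form:

$$\frac{1}{r^{2}}\frac{\partial}{\partial r}\left(r^{2}D_r\right) + \frac{1}{r\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\,D_\theta\right) + \frac{1}{r\sin\theta}\frac{\partial D_\phi}{\partial\phi} = 0 \tag{34a}$$
$$\mathbf{D} = -\varepsilon_0\frac{\partial\varphi}{\partial r}\,\mathbf{e}_r - \varepsilon_0\frac{1}{r}\frac{\partial\varphi}{\partial\theta}\,\mathbf{e}_\theta - \varepsilon_0\frac{1}{r\sin\theta}\frac{\partial\varphi}{\partial\phi}\,\mathbf{e}_\phi \tag{34b}$$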
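Writing the vector equation (34b) in the form of three scalar equations, (34) gives the following set of first-order equations for the four scalar functions $\varphi$, $D_r$, $D_\theta$, and $D_\phi$:

$$\frac{1}{r^{2}}\frac{\partial}{\partial r}\left(r^{2}D_r\right) + \frac{1}{r\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\,D_\theta\right) + \frac{1}{r\sin\theta}\frac{\partial D_\phi}{\partial\phi} = 0 \tag{35a}$$
$$D_r = -\varepsilon_0\frac{\partial\varphi}{\partial r} \tag{35b}$$
$$D_\theta = -\varepsilon_0\frac{1}{r}\frac{\partial\varphi}{\partial\theta} \tag{35c}$$
$$D_\phi = -\varepsilon_0\frac{1}{r\sin\theta}\frac{\partial\varphi}{\partial\phi} \tag{35d}$$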
It is a simple exercise to convince ourselves that upon substituting (35b), (35c), and (35d) into (35a) we obtain the standard Laplace equation for the potential function $\varphi$. However, it is advantageous to take an alternative path. Our goal is the diagonalization of these equations with respect to $r$. To this end we proceed as follows: The first step is the identification of all terms with an $r$-derivative. Inspecting (35) we realize that $r$-derivatives of only $\varphi$ and $D_r$ occur. We refer to these field components as the essential field components for radial diagonalization. As will be seen shortly, all the remaining field components can be eliminated. In the present case these are $D_\theta$ and $D_\phi$. Note that (35b) is already in the desirable final form: Firstly, it exclusively involves the essential variables, and secondly, the $r$-derivative of only one field component occurs. Substituting (35c) and (35d) into (35a) we realize that the latter equation also meets these criteria: it only involves the essential variables $\varphi$ and $D_r$, and the $r$-derivative of only one of the variables appears in the equation. With the aforementioned substitutions we obtain:
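$$\frac{1}{r^{2}}\frac{\partial}{\partial r}\left(r^{2}D_r\right) + \frac{1}{r\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\left(-\varepsilon_0\frac{1}{r}\frac{\partial\varphi}{\partial\theta}\right)\right) + \frac{1}{r\sin\theta}\frac{\partial}{\partial\phi}\left(-\varepsilon_0\frac{1}{r\sin\theta}\frac{\partial\varphi}{\partial\phi}\right) = 0 \tag{36a}$$
$$-\frac{1}{\varepsilon_0}D_r = \frac{\partial\varphi}{\partial r} \tag{36b}$$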
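In the second and third terms in (36a) the term $1/r$ can be moved behind the $\partial/\partial\theta$ and $\partial/\partial\phi$ operators, which gives:

$$\frac{\partial}{\partial r}\left(r^{2}D_r\right) = \varepsilon_0\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial\varphi}{\partial\theta}\right) + \varepsilon_0\frac{1}{\sin^{2}\theta}\frac{\partial^{2}\varphi}{\partial\phi^{2}} \tag{37a}$$
$$\frac{\partial\varphi}{\partial r} = -\frac{1}{\varepsilon_0}\frac{1}{r^{2}}\left(r^{2}D_r\right) \tag{37b}$$

Note that in (37a) we have introduced $r^{2}D_r$. Written in matrix form we have

$$\begin{bmatrix} 0 & -\dfrac{1}{\varepsilon_0}\dfrac{1}{r^{2}} \\[2mm] \varepsilon_0\dfrac{1}{\sin\theta}\dfrac{\partial}{\partial\theta}\left(\sin\theta\dfrac{\partial}{\partial\theta}\right) + \varepsilon_0\dfrac{1}{\sin^{2}\theta}\dfrac{\partial^{2}}{\partial\phi^{2}} & 0 \end{bmatrix} \begin{bmatrix} \varphi \\ r^{2}D_r \end{bmatrix} = \frac{\partial}{\partial r}\begin{bmatrix} \varphi \\ r^{2}D_r \end{bmatrix} \tag{38}$$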
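We can obtain an alternative formulation by considering the relationship

$$\frac{\partial}{\partial r}\left(r^{2}D_r\right) = 2rD_r + r^{2}\frac{\partial}{\partial r}D_r \tag{39}$$

in (38), i.e.,

$$\begin{bmatrix} 0 & -\dfrac{1}{\varepsilon_0} \\[2mm] \varepsilon_0\dfrac{1}{r^{2}\sin\theta}\dfrac{\partial}{\partial\theta}\left(\sin\theta\dfrac{\partial}{\partial\theta}\right) + \varepsilon_0\dfrac{1}{r^{2}\sin^{2}\theta}\dfrac{\partial^{2}}{\partial\phi^{2}} & -\dfrac{2}{r} \end{bmatrix} \begin{bmatrix} \varphi \\ D_r \end{bmatrix} = \frac{\partial}{\partial r}\begin{bmatrix} \varphi \\ D_r \end{bmatrix} \tag{40}$$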
Obviously, considering $\varphi$ and $D_r$ (instead of $\varphi$ and $r^{2}D_r$) as essential field variables destroys the antisymmetric structure of the matrix at the LHS of (38).
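We conclude this section with the following comment, which is significant in numerical calculations. Introducing the variables $\sqrt{\varepsilon_0}\,\varphi$ and $r^{2}D_r/\sqrt{\varepsilon_0}$ we obtain:

$$\begin{bmatrix} 0 & -\dfrac{1}{r^{2}} \\[2mm] \dfrac{1}{\sin\theta}\dfrac{\partial}{\partial\theta}\left(\sin\theta\dfrac{\partial}{\partial\theta}\right) + \dfrac{1}{\sin^{2}\theta}\dfrac{\partial^{2}}{\partial\phi^{2}} & 0 \end{bmatrix} \begin{bmatrix} \sqrt{\varepsilon_0}\,\varphi \\[1mm] \dfrac{r^{2}D_r}{\sqrt{\varepsilon_0}} \end{bmatrix} = \frac{\partial}{\partial r}\begin{bmatrix} \sqrt{\varepsilon_0}\,\varphi \\[1mm] \dfrac{r^{2}D_r}{\sqrt{\varepsilon_0}} \end{bmatrix} \tag{41}$$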
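The radially diagonalized system can be checked numerically for a single harmonic. The following is a minimal Python sketch, not the authors' implementation: it assumes that the angular operator acting on $Y_{lm}$ may be replaced by its eigenvalue $-l(l+1)$, uses the scaled variables $u = \sqrt{\varepsilon_0}\,\varphi$ and $v = r^{2}D_r/\sqrt{\varepsilon_0}$ of (41), and marches in $r$ with a classical fourth-order Runge-Kutta scheme. The function names `rhs` and `integrate_radially` are hypothetical.

```python
def rhs(r, u, v, l):
    """Right-hand side of the radial system for one harmonic of degree l.

    u stands for sqrt(eps0)*phi and v for r^2*D_r/sqrt(eps0); the angular
    operator acting on Y_lm has been replaced by its eigenvalue -l(l+1)
    (an assumption of this sketch).
    """
    return -v / (r * r), -l * (l + 1) * u


def integrate_radially(l, r_start, r_end, steps=2000):
    """Classical RK4 march from r_start to r_end, starting on the r^l branch."""
    h = (r_end - r_start) / steps
    r = r_start
    u = r_start ** l                # u = r^l
    v = -l * r_start ** (l + 1)     # v = -l r^(l+1), consistent with u = r^l
    for _ in range(steps):
        k1u, k1v = rhs(r, u, v, l)
        k2u, k2v = rhs(r + h / 2, u + h / 2 * k1u, v + h / 2 * k1v, l)
        k3u, k3v = rhs(r + h / 2, u + h / 2 * k2u, v + h / 2 * k2v, l)
        k4u, k4v = rhs(r + h, u + h * k3u, v + h * k3v, l)
        u += h / 6 * (k1u + 2 * k2u + 2 * k3u + k4u)
        v += h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        r += h
    return u, v
```

Starting on the $r^{l}$ branch, the march should reproduce $u = r^{l}$ and $v = -l\,r^{l+1}$ at the end point. Note that $\varepsilon_0$ does not appear in the scaled system at all, which is one way to read the remark on numerical calculations.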
4. Derivation of Green's Functions in Spherical Co-ordinates

In what follows we apply the above-mentioned recipe to the simplest possible case, i.e., the determination of the potential distribution due to a point charge source in free space. In an $(r,\theta,\phi)$ spherical co-ordinate system consider a point charge specified by the strength $q_1$ and the co-ordinates $(r_1,\theta_1,\phi_1)$. As mentioned earlier, the radial distance $r_1$ of the source point from the center of the co-ordinate system subdivides the entire space into two parts: $r < r_1$ (domain I) and $r > r_1$ (domain II), which are separated by the sphere $r = r_1$. In the following the field variables in the domains I and II, respectively, will be denoted by the superindices 1 and 2.
4.1. Potential Distribution

For the potential distributions associated with $q_1$ we obtain the following expressions in the regions I and II:

$$\varphi^{(1)}(r,\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l} A^{(1)}_{lm}\,r^{l}\,Y_{lm}(\theta,\phi) \tag{42a}$$
$$\varphi^{(2)}(r,\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l} B^{(2)}_{lm}\,\frac{1}{r^{l+1}}\,Y_{lm}(\theta,\phi) \tag{42b}$$

The choice of these expressions for the potential function guarantees that the series in (42) are finite in their respective domains of definition.
Fig. 2. Given a single point charge $q_1$, we can select an arbitrary spherical co-ordinate system $(r,\theta,\phi)$.
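4.2. Radial Displacement Components

Using the relationship $D_r = -\varepsilon_0\,\partial\varphi/\partial r$ we obtain:

$$D_r^{(1)}(r,\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l} \left\{-\varepsilon_0\,l\,A^{(1)}_{lm}\,r^{l-1}\right\} Y_{lm}(\theta,\phi) \tag{43a}$$
$$D_r^{(2)}(r,\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l} \left\{\varepsilon_0\,(l+1)\,B^{(2)}_{lm}\,\frac{1}{r^{l+2}}\right\} Y_{lm}(\theta,\phi) \tag{43b}$$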
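For quick reference we summarize the formulas in (42) and (43) in the following slightly modified form:

$$\varphi^{(1)}(r,\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l} \left[A^{(1)}_{lm}r^{l}\right] Y_{lm}(\theta,\phi) \tag{44a}$$
$$\varphi^{(2)}(r,\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l} \left[B^{(2)}_{lm}\frac{1}{r^{l+1}}\right] Y_{lm}(\theta,\phi) \tag{44b}$$
$$D_r^{(1)}(r,\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l} \left\{-\varepsilon_0\frac{l}{r}\left[A^{(1)}_{lm}r^{l}\right]\right\} Y_{lm}(\theta,\phi) \tag{44c}$$
$$D_r^{(2)}(r,\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l} \left\{\varepsilon_0\frac{l+1}{r}\left[B^{(2)}_{lm}\frac{1}{r^{l+1}}\right]\right\} Y_{lm}(\theta,\phi) \tag{44d}$$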
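The bracketed groupings in (44) keep each expansion coefficient paired with a power of $r$ of matching scale. The following minimal Python sketch illustrates one floating-point motivation for this; it is an illustration under assumed values, not part of the original derivation, and the function names `naive_term` and `scaled_term` are hypothetical.

```python
def naive_term(a_scaled, r, r1, l):
    """Evaluate A * r**l with A = a_scaled / r1**l formed explicitly."""
    A = a_scaled / r1 ** l      # r1**l can exceed the float range for large l
    return A * r ** l


def scaled_term(a_scaled, r, r1, l):
    """Evaluate the same quantity as a_scaled * (r / r1)**l."""
    return a_scaled * (r / r1) ** l
```

For instance, with `r1 = 1000.0` and `l = 200` the unscaled power `r1 ** l` is outside the double-precision range and Python raises `OverflowError`, whereas the ratio `(r / r1) ** l` stays below unity for every `r < r1`.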
As will become clear shortly, it is computationally advantageous to work with the coefficients $[A^{(1)}_{lm}r_1^{l}]$ and $[B^{(2)}_{lm}/r_1^{l+1}]$ as our unknowns, rather than $A^{(1)}_{lm}$ and $B^{(2)}_{lm}$. The square brackets in (44) are meant to point out this fact. For the determination of $[A^{(1)}_{lm}r_1^{l}]$ and $[B^{(2)}_{lm}/r_1^{l+1}]$, involved in these equations, we utilize the "interface conditions," which are formulated in the next section.

4.3. Interface Conditions

The solutions in regions I and II satisfy the following interface conditions on the $r_1$-sphere:

$$\varphi^{(1)}(r_1^{-},\theta,\phi) = \varphi^{(2)}(r_1^{+},\theta,\phi) \tag{45a}$$
$$D_r^{(2)}(r_1^{+},\theta,\phi) - D_r^{(1)}(r_1^{-},\theta,\phi) = q_1\frac{1}{r_1^{2}}\,\delta(\cos\theta-\cos\theta_1)\,\delta(\phi-\phi_1) \tag{45b}$$

with $r_1^{\pm} = \lim_{\epsilon\to 0}(r_1 \pm \epsilon)$.

4.4. Determination of the Coefficients $[A^{(1)}_{lm}r_1^{l}]$ and $[B^{(2)}_{lm}/r_1^{l+1}]$

For the determination of the coefficients $[A^{(1)}_{lm}r_1^{l}]$ and $[B^{(2)}_{lm}/r_1^{l+1}]$ ($l \in \mathbb{N}_0$, $m = -l,\dots,l$) we proceed as follows ($\mathbb{N}_0$ denotes the non-negative integers):

(i) Substitute Eqs. (44) for the fields into the interface conditions (45).
(ii) Multiply both sides of the resulting equations by $Y^{*}_{l'm'}(\theta,\phi)$ ($l' \in \mathbb{N}_0$, $m' = -l',\dots,l'$). Here and in the following $*$ stands for complex conjugation.
(iii) Exchange the order of summations and integrations.
(iv) Integrate the terms on both sides of the equations over the surface of the $r_1$-sphere. (Recall that the surface element on the $r_1$-sphere is $r_1^{2}\,d\theta\,\sin\theta\,d\phi$.)
(v) Use the orthogonality relationship for the spherical functions (7).

In the following we apply the steps listed in the above recipe to the interface conditions (45).
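4.4.1. Projection of the Potential Function $\varphi^{(1)}(r_1^{-},\theta,\phi)$ onto the Co-ordinate Function $Y_{l'm'}(\theta,\phi)$

$$\begin{aligned}
&\int_0^{\pi} d\theta\,\sin\theta \int_0^{2\pi} d\phi\;\varphi^{(1)}(r_1^{-},\theta,\phi)\,Y^{*}_{l'm'}(\theta,\phi)\,(r_1^{-})^{2} \\
&\quad= \int_0^{\pi} d\theta\,\sin\theta \int_0^{2\pi} d\phi \left\{\sum_{l=0}^{\infty}\sum_{m=-l}^{l}\left[A^{(1)}_{lm}(r_1^{-})^{l}\right] Y_{lm}(\theta,\phi)\right\} Y^{*}_{l'm'}(\theta,\phi)\,(r_1^{-})^{2} &&\text{(46a)} \\
&\quad= (r_1^{-})^{2}\sum_{l=0}^{\infty}\sum_{m=-l}^{l}\left[A^{(1)}_{lm}(r_1^{-})^{l}\right] \int_0^{\pi} d\theta\,\sin\theta \int_0^{2\pi} d\phi\; Y^{*}_{l'm'}(\theta,\phi)\,Y_{lm}(\theta,\phi) &&\text{(46b)} \\
&\quad= (r_1^{-})^{2}\sum_{l=0}^{\infty}\delta_{ll'}\sum_{m=-l}^{l}\left[A^{(1)}_{lm}(r_1^{-})^{l}\right]\delta_{mm'} &&\text{(46c)} \\
&\quad= (r_1^{-})^{2}\sum_{l=0}^{\infty}\delta_{ll'}\left[A^{(1)}_{lm'}(r_1^{-})^{l}\right] &&\text{(46d)} \\
&\quad= (r_1^{-})^{2}\left[A^{(1)}_{l'm'}(r_1^{-})^{l'}\right] &&\text{(46e)}
\end{aligned}$$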
In transition from (46b) to (46c) we used the orthogonality relationship (7).
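4.4.2. Projection of the Potential Function $\varphi^{(2)}(r_1^{+},\theta,\phi)$ onto the Co-ordinate Function $Y_{l'm'}(\theta,\phi)$

$$\begin{aligned}
&\int_0^{\pi} d\theta\,\sin\theta \int_0^{2\pi} d\phi\;\varphi^{(2)}(r_1^{+},\theta,\phi)\,Y^{*}_{l'm'}(\theta,\phi)\,(r_1^{+})^{2} \\
&\quad= \int_0^{\pi} d\theta\,\sin\theta \int_0^{2\pi} d\phi \left\{\sum_{l=0}^{\infty}\sum_{m=-l}^{l}\left[B^{(2)}_{lm}\frac{1}{(r_1^{+})^{l+1}}\right] Y_{lm}(\theta,\phi)\right\} Y^{*}_{l'm'}(\theta,\phi)\,(r_1^{+})^{2} &&\text{(47a)} \\
&\quad= (r_1^{+})^{2}\sum_{l=0}^{\infty}\sum_{m=-l}^{l}\left[B^{(2)}_{lm}\frac{1}{(r_1^{+})^{l+1}}\right] \int_0^{\pi} d\theta\,\sin\theta \int_0^{2\pi} d\phi\; Y^{*}_{l'm'}(\theta,\phi)\,Y_{lm}(\theta,\phi) &&\text{(47b)} \\
&\quad= (r_1^{+})^{2}\left[B^{(2)}_{l'm'}\frac{1}{(r_1^{+})^{l'+1}}\right] &&\text{(47c)}
\end{aligned}$$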
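4.4.3. Interface Condition for the Potential Function

By virtue of (45a) we equate the results obtained from (46e) and (47c) and obtain

$$(r_1^{-})^{2}\left[A^{(1)}_{l'm'}(r_1^{-})^{l'}\right] - (r_1^{+})^{2}\left[B^{(2)}_{l'm'}\frac{1}{(r_1^{+})^{l'+1}}\right] = 0 \tag{48}$$

Replacing $l'$ and $m'$ by $l$ and $m$, respectively, together with setting $r_1^{-} = r_1^{+} = r_1$ we obtain:

$$\left[A^{(1)}_{lm}r_1^{l}\right] - \left[B^{(2)}_{lm}\frac{1}{r_1^{l+1}}\right] = 0 \tag{49}$$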
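4.4.4. Projection of the Radial Displacement Function $D_r^{(1)}(r_1^{-},\theta,\phi)$ onto the Co-ordinate Function $Y_{l'm'}(\theta,\phi)$

$$\begin{aligned}
&\int_0^{\pi} d\theta\,\sin\theta \int_0^{2\pi} d\phi\; D_r^{(1)}(r_1^{-},\theta,\phi)\,Y^{*}_{l'm'}(\theta,\phi)\,(r_1^{-})^{2} \\
&\quad= \int_0^{\pi} d\theta\,\sin\theta \int_0^{2\pi} d\phi \left\{\sum_{l=0}^{\infty}\sum_{m=-l}^{l}\left(-\varepsilon_0\frac{l}{r_1^{-}}\right)\left[A^{(1)}_{lm}(r_1^{-})^{l}\right] Y_{lm}(\theta,\phi)\right\} Y^{*}_{l'm'}(\theta,\phi)\,(r_1^{-})^{2} &&\text{(50a)} \\
&\quad= (r_1^{-})^{2}\sum_{l=0}^{\infty}\left(-\varepsilon_0\frac{l}{r_1^{-}}\right)\sum_{m=-l}^{l}\left[A^{(1)}_{lm}(r_1^{-})^{l}\right] \int_0^{\pi} d\theta\,\sin\theta \int_0^{2\pi} d\phi\; Y^{*}_{l'm'}(\theta,\phi)\,Y_{lm}(\theta,\phi) &&\text{(50b)} \\
&\quad= (r_1^{-})^{2}\left(-\varepsilon_0\frac{l'}{r_1^{-}}\right)\left[A^{(1)}_{l'm'}(r_1^{-})^{l'}\right] &&\text{(50c)}
\end{aligned}$$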
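4.4.5. Projection of the Radial Displacement Function $D_r^{(2)}(r_1^{+},\theta,\phi)$ onto the Co-ordinate Function $Y_{l'm'}(\theta,\phi)$

$$\begin{aligned}
&\int_0^{\pi} d\theta\,\sin\theta \int_0^{2\pi} d\phi\; D_r^{(2)}(r_1^{+},\theta,\phi)\,Y^{*}_{l'm'}(\theta,\phi)\,(r_1^{+})^{2} \\
&\quad= \int_0^{\pi} d\theta\,\sin\theta \int_0^{2\pi} d\phi \left\{\sum_{l=0}^{\infty}\sum_{m=-l}^{l}\left(\varepsilon_0\frac{l+1}{r_1^{+}}\right)\left[B^{(2)}_{lm}\frac{1}{(r_1^{+})^{l+1}}\right] Y_{lm}(\theta,\phi)\right\} Y^{*}_{l'm'}(\theta,\phi)\,(r_1^{+})^{2} &&\text{(51a)} \\
&\quad= (r_1^{+})^{2}\sum_{l=0}^{\infty}\left(\varepsilon_0\frac{l+1}{r_1^{+}}\right)\sum_{m=-l}^{l}\left[B^{(2)}_{lm}\frac{1}{(r_1^{+})^{l+1}}\right] \int_0^{\pi} d\theta\,\sin\theta \int_0^{2\pi} d\phi\; Y^{*}_{l'm'}(\theta,\phi)\,Y_{lm}(\theta,\phi) &&\text{(51b)} \\
&\quad= (r_1^{+})^{2}\left(\varepsilon_0\frac{l'+1}{r_1^{+}}\right)\left[B^{(2)}_{l'm'}\frac{1}{(r_1^{+})^{l'+1}}\right] &&\text{(51c)}
\end{aligned}$$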
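4.4.6. Projection of the Source Term $q_1\frac{1}{r^{2}}\,\delta(\cos\theta-\cos\theta_1)\,\delta(\phi-\phi_1)\,\delta(r-r_1)$ onto the Co-ordinate Function $Y_{l'm'}(\theta,\phi)$

$$\begin{aligned}
&\int_0^{\pi} d\theta\,\sin\theta \int_0^{2\pi} d\phi \int_0^{\infty} dr \left\{ q_1\frac{1}{r^{2}}\,\delta(\cos\theta-\cos\theta_1)\,\delta(\phi-\phi_1)\,\delta(r-r_1)\,Y^{*}_{l'm'}(\theta,\phi)\,r^{2} \right\} \\
&\quad= q_1\int_0^{\pi} d\theta\,\sin\theta\,\delta(\cos\theta-\cos\theta_1) \underbrace{\left\{\int_0^{2\pi} d\phi\,\delta(\phi-\phi_1)\,Y^{*}_{l'm'}(\theta,\phi)\int_0^{\infty} dr\,\delta(r-r_1)\right\}}_{=\,Y^{*}_{l'm'}(\theta,\phi_1)} &&\text{(52a)} \\
&\quad= q_1\int_0^{\pi} d\theta\,\sin\theta\,\delta(\cos\theta-\cos\theta_1)\,Y^{*}_{l'm'}(\theta,\phi_1) &&\text{(52b)} \\
&\quad= q_1\,Y^{*}_{l'm'}(\theta_1,\phi_1) &&\text{(52c)}
\end{aligned}$$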
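Substituting the results obtained in (50c), (51c), and (52c) into (45b) we have

$$(r_1^{-})^{2}\left(\varepsilon_0\frac{l'}{r_1^{-}}\right)\left[A^{(1)}_{l'm'}(r_1^{-})^{l'}\right] + (r_1^{+})^{2}\left(\varepsilon_0\frac{l'+1}{r_1^{+}}\right)\left[B^{(2)}_{l'm'}\frac{1}{(r_1^{+})^{l'+1}}\right] = q_1\,Y^{*}_{l'm'}(\theta_1,\phi_1) \tag{53}$$

Using $r_1^{-} = r_1^{+} = r_1$, and replacing $l'$ and $m'$ by $l$ and $m$, respectively, we arrive at:

$$l\left[A^{(1)}_{lm}r_1^{l}\right] + (l+1)\left[B^{(2)}_{lm}\frac{1}{r_1^{l+1}}\right] = \frac{q_1}{\varepsilon_0 r_1}\,Y^{*}_{lm}(\theta_1,\phi_1) \tag{54}$$

We can write (49) and (54) in matrix form as follows: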
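$$\begin{bmatrix} 1 & -1 \\ l & l+1 \end{bmatrix} \begin{bmatrix} A^{(1)}_{lm}r_1^{l} \\[1mm] B^{(2)}_{lm}\dfrac{1}{r_1^{l+1}} \end{bmatrix} = \begin{bmatrix} 0 \\[1mm] \dfrac{q_1}{\varepsilon_0 r_1}\,Y^{*}_{lm}(\theta_1,\phi_1) \end{bmatrix} \tag{55}$$

Multiplying both sides of (55) by the matrix

$$\frac{1}{2l+1}\begin{bmatrix} l+1 & 1 \\ -l & 1 \end{bmatrix} \tag{56}$$

which is the inverse of the matrix at the left-hand side of (55), we obtain

$$\begin{bmatrix} A^{(1)}_{lm}r_1^{l} \\[1mm] B^{(2)}_{lm}\dfrac{1}{r_1^{l+1}} \end{bmatrix} = \frac{1}{2l+1}\begin{bmatrix} l+1 & 1 \\ -l & 1 \end{bmatrix} \begin{bmatrix} 0 \\[1mm] \dfrac{q_1}{\varepsilon_0 r_1}\,Y^{*}_{lm}(\theta_1,\phi_1) \end{bmatrix} \tag{57a}$$
$$\phantom{\begin{bmatrix} A^{(1)}_{lm}r_1^{l} \\[1mm] B^{(2)}_{lm}\dfrac{1}{r_1^{l+1}} \end{bmatrix}} = \frac{1}{2l+1}\begin{bmatrix} \dfrac{q_1}{\varepsilon_0 r_1}\,Y^{*}_{lm}(\theta_1,\phi_1) \\[2mm] \dfrac{q_1}{\varepsilon_0 r_1}\,Y^{*}_{lm}(\theta_1,\phi_1) \end{bmatrix} \tag{57b}$$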
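Or, written explicitly,

$$\left[A^{(1)}_{lm}r_1^{l}\right] = \frac{q_1}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,Y^{*}_{lm}(\theta_1,\phi_1) \tag{58a}$$
$$\left[B^{(2)}_{lm}\frac{1}{r_1^{l+1}}\right] = \frac{q_1}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,Y^{*}_{lm}(\theta_1,\phi_1) \tag{58b}$$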
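The inversion step leading from (55) to (58) can be checked with exact rational arithmetic. The following minimal Python sketch is an illustration only; the function names are hypothetical, and `source` stands for the right-hand side $\frac{q_1}{\varepsilon_0 r_1}Y^{*}_{lm}(\theta_1,\phi_1)$, which is passed in as a plain number.

```python
from fractions import Fraction


def interface_matrix(l):
    """Matrix at the left-hand side of (55)."""
    return [[Fraction(1), Fraction(-1)],
            [Fraction(l), Fraction(l + 1)]]


def interface_inverse(l):
    """Matrix (56), claimed to be the inverse of the matrix in (55)."""
    s = Fraction(1, 2 * l + 1)
    return [[s * (l + 1), s],
            [-s * l, s]]


def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def scaled_coefficients(l, source):
    """Apply (56) to the right-hand side (0, source) of (55)."""
    inv = interface_inverse(l)
    return inv[0][1] * source, inv[1][1] * source
```

Both returned values equal `source / (2l + 1)`, in agreement with (58a) and (58b).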
Fig. 3. Network representation of the algorithm for the calculation of the expansion coefficients $[A^{(1)}_{lm}r_1^{l}]$ in the 1st layer.
Or, finally, in order to draw a conclusion, we write

$$A^{(1)}_{lm} = \frac{q_1}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,\frac{1}{r_1^{l}}\,Y^{*}_{lm}(\theta_1,\phi_1) \tag{59a}$$
$$B^{(2)}_{lm} = \frac{q_1}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,r_1^{l+1}\,Y^{*}_{lm}(\theta_1,\phi_1) \tag{59b}$$

Fig. 4. Network representation of the algorithm for the calculation of the expansion coefficients $[B^{(2)}_{lm}/r_1^{l+1}]$ in the 2nd layer.
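Substituting $A^{(1)}_{lm}$ and $B^{(2)}_{lm}$ into the expressions for $\varphi^{(1)}(r,\theta,\phi)$ and $\varphi^{(2)}(r,\theta,\phi)$ in (42) we obtain:

$$\varphi^{(1)}(r,\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l} \frac{q_1}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,\frac{1}{r_1^{l}}\,Y^{*}_{lm}(\theta_1,\phi_1)\; r^{l}\,Y_{lm}(\theta,\phi) \tag{60a}$$
$$\varphi^{(2)}(r,\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l} \frac{q_1}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,r_1^{l+1}\,Y^{*}_{lm}(\theta_1,\phi_1)\; \frac{1}{r^{l+1}}\,Y_{lm}(\theta,\phi) \tag{60b}$$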
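The expansions (60a) and (60b) can be compared with the closed-form Coulomb potential $q_1/(4\pi\varepsilon_0|\mathbf{r}-\mathbf{r}_1|)$. The following minimal Python sketch does this as an independent check only. It assumes the standard addition theorem, $\sum_m Y^{*}_{lm}(\theta_1,\phi_1)Y_{lm}(\theta,\phi) = \frac{2l+1}{4\pi}P_l(\cos\gamma)$ with $\gamma$ the angle between source and observation directions, to collapse the $m$-sum; the derivation in the text does not rely on that theorem. The names `legendre`, `potential_series`, and the truncation `l_max` are hypothetical.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m


def legendre(l_max, x):
    """Return [P_0(x), ..., P_l_max(x)] via the three-term recurrence."""
    p = [1.0, x]
    for l in range(1, l_max):
        p.append(((2 * l + 1) * x * p[l] - l * p[l - 1]) / (l + 1))
    return p[:l_max + 1]


def potential_series(q1, r1, r, cos_gamma, l_max=200):
    """Truncated series (60a) for r < r1 or (60b) for r > r1.

    The m-sum is collapsed with the addition theorem (an assumption of
    this check), leaving a single sum over Legendre polynomials.
    """
    p = legendre(l_max, cos_gamma)
    ratio = r / r1 if r < r1 else r1 / r   # always below unity
    r_big = max(r, r1)
    total = 0.0
    power = 1.0
    for l in range(l_max + 1):
        total += power * p[l]
        power *= ratio
    return q1 / (4 * math.pi * EPS0 * r_big) * total
```

For observation points away from the source sphere the truncated sum should agree with `q1 / (4 * pi * EPS0 * sqrt(r*r + r1*r1 - 2*r*r1*cos_gamma))` to near machine precision; convergence slows as `r` approaches `r1`.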
The potential functions
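4.5. Potential Functions Due to Two Point Charges on the Same Sphere

From the above analysis it is immediate that for two point charges $q_1$ and $q_2$ located at two points with the co-ordinates $(r_1,\theta_1,\phi_1)$ and $(r_1,\theta_2,\phi_2)$, respectively, the potential functions are:

$$\varphi^{(1)}(r,\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}\left\{ \frac{q_1}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,\frac{1}{r_1^{l}}\,Y^{*}_{lm}(\theta_1,\phi_1) + \frac{q_2}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,\frac{1}{r_1^{l}}\,Y^{*}_{lm}(\theta_2,\phi_2) \right\} r^{l}\,Y_{lm}(\theta,\phi) \tag{61a}$$
$$\varphi^{(2)}(r,\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}\left\{ \frac{q_1}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,r_1^{l+1}\,Y^{*}_{lm}(\theta_1,\phi_1) + \frac{q_2}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,r_1^{l+1}\,Y^{*}_{lm}(\theta_2,\phi_2) \right\} \frac{1}{r^{l+1}}\,Y_{lm}(\theta,\phi) \tag{61b}$$

We will return to these formulae when we have obtained the functions associated with three layers in the next section.

5. Three-Layer Problem

In this section we consider two point charges $q_1$ and $q_2$, which we assume to be located at $(r_1,\theta_1,\phi_1)$ and $(r_2,\theta_2,\phi_2)$. The radii $r_1$ and $r_2$ ($r_1 < r_2$) partition the entire space into three regions: Region I: $r < r_1$, Region II: $r_1 < r < r_2$, Region III: $r_2 < r$. With the formula for the radial component of the displacement vector $D_r(r,\theta,\phi)$, i.e., $D_r(r,\theta,\phi) = -\varepsilon_0\,\partial\varphi/\partial r$, we have the following expressions for the potential and the dielectric displacement in each region:

Region I:

$$\varphi^{(1)}(r,\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}\left[A^{(1)}_{lm}r^{l}\right] Y_{lm}(\theta,\phi) \tag{62a}$$
$$D_r^{(1)}(r,\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}\left(-\varepsilon_0\frac{l}{r}\right)\left[A^{(1)}_{lm}r^{l}\right] Y_{lm}(\theta,\phi) \tag{62b}$$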
Fig. 5. Two point charge sources, $q_1$ and $q_2$, with the co-ordinates $(r_1,\theta_1,\phi_1)$ and $(r_2,\theta_2,\phi_2)$, respectively, with respect to an arbitrarily chosen co-ordinate system. In this case two spheres with their centers at the origin of the co-ordinate system and radii $r_1$ and $r_2$ ($r_1 < r_2$), passing through the positions of the charges $q_1$ and $q_2$, respectively, subdivide the entire space into three domains: Domain I ($0 < r < r_1$), domain II ($r_1 < r < r_2$), and domain III ($r_2 < r$).
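Region II:

$$\varphi^{(2)}(r,\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}\left\{ \left[A^{(2)}_{lm}r^{l}\right] Y_{lm}(\theta,\phi) + \left[B^{(2)}_{lm}\frac{1}{r^{l+1}}\right] Y_{lm}(\theta,\phi) \right\} \tag{63a}$$
$$D_r^{(2)}(r,\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}\left\{ \left(-\varepsilon_0\frac{l}{r}\right)\left[A^{(2)}_{lm}r^{l}\right] Y_{lm}(\theta,\phi) + \left(\varepsilon_0\frac{l+1}{r}\right)\left[B^{(2)}_{lm}\frac{1}{r^{l+1}}\right] Y_{lm}(\theta,\phi) \right\} \tag{63b}$$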
Fig. 6. We may equally well choose the reference co-ordinate system $(\xi,\eta,\zeta)$, rather than the $(x,y,z)$ co-ordinate system.
Region III:

$$\varphi^{(3)}(r,\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}\left[B^{(3)}_{lm}\frac{1}{r^{l+1}}\right] Y_{lm}(\theta,\phi) \tag{64a}$$
$$D_r^{(3)}(r,\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}\left(\varepsilon_0\frac{l+1}{r}\right)\left[B^{(3)}_{lm}\frac{1}{r^{l+1}}\right] Y_{lm}(\theta,\phi) \tag{64b}$$

Source Functions

The source functions are specified as follows:

$$Q^{(i)}(r,\theta,\phi) = q_i\frac{1}{r^{2}}\,\delta(r-r_i)\,\delta(\cos\theta-\cos\theta_i)\,\delta(\phi-\phi_i), \qquad i = 1,2$$

In Figure 6 it is shown that we may choose any reference co-ordinate system $(\xi,\eta,\zeta)$ instead of the $(x,y,z)$ co-ordinate system. In this way we obtain various equivalent representations for the potential and field distribution functions in the entire space. A very important question for applications is the relationship between the expansion coefficients $A$ and $B$ in the system $(x,y,z)$ and those in the system $(\xi,\eta,\zeta)$. This point will not be addressed in this paper.

Solution of the Problem

For the determination of the various coefficients $A$ and $B$ in (62)-(64) we proceed as follows:

(1) Impose the "interface conditions" on the $r_1$- and $r_2$-spheres.
(2) Multiply both sides of each of the resulting equations by $Y^{*}_{l'm'}(\theta,\phi)$.
(3) Integrate over the surfaces of the $r_1$- and $r_2$-spheres.
(4) Utilize the orthogonality condition in (7).

Applying these steps we obtain:
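$$A^{(1)}_{lm}r_1^{l} = A^{(2)}_{lm}r_1^{l} + B^{(2)}_{lm}\frac{1}{r_1^{l+1}} \tag{65a}$$
$$l\,A^{(1)}_{lm}r_1^{l} - l\,A^{(2)}_{lm}r_1^{l} + (l+1)\,B^{(2)}_{lm}\frac{1}{r_1^{l+1}} = \frac{q_1}{\varepsilon_0 r_1}\,Y^{*}_{lm}(\theta_1,\phi_1) \tag{65b}$$
$$A^{(2)}_{lm}r_2^{l} + B^{(2)}_{lm}\frac{1}{r_2^{l+1}} = B^{(3)}_{lm}\frac{1}{r_2^{l+1}} \tag{65c}$$
$$l\,A^{(2)}_{lm}r_2^{l} - (l+1)\,B^{(2)}_{lm}\frac{1}{r_2^{l+1}} + (l+1)\,B^{(3)}_{lm}\frac{1}{r_2^{l+1}} = \frac{q_2}{\varepsilon_0 r_2}\,Y^{*}_{lm}(\theta_2,\phi_2) \tag{65d}$$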
Comment: Consider in greater detail the coefficients $[A^{(1)}_{lm}r_1^{l}]$ and $[B^{(3)}_{lm}/r_2^{l+1}]$, which are the field expansion coefficients in the first and the last regions in the present problem. The first region is fully specified by the radius $r_1$ ($r < r_1$). The last region is specified by only one radius $r_2$ ($r > r_2$). The second region, or generally speaking, every other region $i$ ($i = 2,\dots,N$) is characterized by two radii $r_{i-1}$ and $r_i$. Coming back to the coefficients $[A^{(1)}_{lm}r_1^{l}]$ and $[B^{(3)}_{lm}/r_2^{l+1}]$, we recognize that in $[A^{(1)}_{lm}r_1^{l}]$ the superindex of $A$ and the subindex of $r$ coincide. Furthermore, we realize that in $[B^{(3)}_{lm}/r_2^{l+1}]$ the subindex of $r$ equals the superindex of $B$ minus one. Therefore, whenever the superindex of $A$ and the index of the associated $r$ coincide, or the index of the $r$ associated with $B$ equals the superindex of $B$ minus one, we speak of the natural order of indices. If the natural order is violated in a certain equation we can restore it by multiplying and dividing by a nondimensional quantity, as explained in the following examples.

Example 1: In (65a) the term $A^{(1)}_{lm}r_1^{l}$ satisfies our condition: The superindex of $A$ and the index of $r$ are equal. But what about the term $A^{(2)}_{lm}r_1^{l}$? Obviously our condition is violated. However, we can write $[A^{(2)}_{lm}r_1^{l}] = (r_1^{l}/r_2^{l})\,[A^{(2)}_{lm}r_2^{l}]$, by introducing $[A^{(2)}_{lm}r_2^{l}]$. Note that the "correction term" is smaller than unity: $r_1^{l}/r_2^{l} = (r_1/r_2)^{l} < 1$.

Example 2: In (65a) the term $[B^{(2)}_{lm}/r_1^{l+1}]$ satisfies our condition: The subindex of $r$ equals the superindex of $B$ minus one. But what about the term $B^{(2)}_{lm}/r_2^{l+1}$ in (65c)? Obviously our condition is violated. However, we can write $[B^{(2)}_{lm}/r_2^{l+1}] = (r_1^{l+1}/r_2^{l+1})\,[B^{(2)}_{lm}/r_1^{l+1}]$, by introducing $[B^{(2)}_{lm}/r_1^{l+1}]$. Note that here also the "correction term" is smaller than unity: $r_1^{l+1}/r_2^{l+1} = (r_1/r_2)^{l+1} < 1$.

Performing these modifications, and dividing the second and fourth equations by $l+1$, (65) can be written in the following form:
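$$\left[A^{(1)}_{lm}r_1^{l}\right] - \left(\frac{r_1}{r_2}\right)^{l}\left[A^{(2)}_{lm}r_2^{l}\right] - \left[B^{(2)}_{lm}\frac{1}{r_1^{l+1}}\right] = 0 \tag{66a}$$
$$\frac{l}{l+1}\left[A^{(1)}_{lm}r_1^{l}\right] - \frac{l}{l+1}\left(\frac{r_1}{r_2}\right)^{l}\left[A^{(2)}_{lm}r_2^{l}\right] + \left[B^{(2)}_{lm}\frac{1}{r_1^{l+1}}\right] = \frac{q_1}{\varepsilon_0 r_1}\,\frac{1}{l+1}\,Y^{*}_{lm}(\theta_1,\phi_1) \tag{66b}$$
$$\left[A^{(2)}_{lm}r_2^{l}\right] + \left(\frac{r_1}{r_2}\right)^{l+1}\left[B^{(2)}_{lm}\frac{1}{r_1^{l+1}}\right] - \left[B^{(3)}_{lm}\frac{1}{r_2^{l+1}}\right] = 0 \tag{66c}$$
$$\frac{l}{l+1}\left[A^{(2)}_{lm}r_2^{l}\right] - \left(\frac{r_1}{r_2}\right)^{l+1}\left[B^{(2)}_{lm}\frac{1}{r_1^{l+1}}\right] + \left[B^{(3)}_{lm}\frac{1}{r_2^{l+1}}\right] = \frac{q_2}{\varepsilon_0 r_2}\,\frac{1}{l+1}\,Y^{*}_{lm}(\theta_2,\phi_2) \tag{66d}$$

Note also that the division by $l+1$ results in the terms $l/(l+1)$ and $1/(l+1)$, which are both less than unity for $l > 0$. Writing in matrix form we have: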
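$$\left[A^{(1)}_{lm}r_1^{l}\right] = \begin{bmatrix} \left(\dfrac{r_1}{r_2}\right)^{l} & 1 \end{bmatrix} \begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\[1mm] B^{(2)}_{lm}\dfrac{1}{r_1^{l+1}} \end{bmatrix} \tag{67a}$$
$$\frac{l}{l+1}\left[A^{(1)}_{lm}r_1^{l}\right] + \begin{bmatrix} -\dfrac{l}{l+1}\left(\dfrac{r_1}{r_2}\right)^{l} & 1 \end{bmatrix} \begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\[1mm] B^{(2)}_{lm}\dfrac{1}{r_1^{l+1}} \end{bmatrix} = \frac{q_1}{\varepsilon_0 r_1}\,\frac{1}{l+1}\,Y^{*}_{lm}(\theta_1,\phi_1) \tag{67b}$$
$$\left[B^{(3)}_{lm}\frac{1}{r_2^{l+1}}\right] = \begin{bmatrix} 1 & \left(\dfrac{r_1}{r_2}\right)^{l+1} \end{bmatrix} \begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\[1mm] B^{(2)}_{lm}\dfrac{1}{r_1^{l+1}} \end{bmatrix} \tag{67c}$$
$$\begin{bmatrix} \dfrac{l}{l+1} & -\left(\dfrac{r_1}{r_2}\right)^{l+1} \end{bmatrix} \begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\[1mm] B^{(2)}_{lm}\dfrac{1}{r_1^{l+1}} \end{bmatrix} + \left[B^{(3)}_{lm}\frac{1}{r_2^{l+1}}\right] = \frac{q_2}{\varepsilon_0 r_2}\,\frac{1}{l+1}\,Y^{*}_{lm}(\theta_2,\phi_2) \tag{67d}$$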
The above equations can be cast into the following matrix equation:
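$$\begin{bmatrix} 1 & -\left(\dfrac{r_1}{r_2}\right)^{l} & -1 & 0 \\[2mm] \dfrac{l}{l+1} & -\dfrac{l}{l+1}\left(\dfrac{r_1}{r_2}\right)^{l} & 1 & 0 \\[2mm] 0 & 1 & \left(\dfrac{r_1}{r_2}\right)^{l+1} & -1 \\[2mm] 0 & \dfrac{l}{l+1} & -\left(\dfrac{r_1}{r_2}\right)^{l+1} & 1 \end{bmatrix} \begin{bmatrix} A^{(1)}_{lm}r_1^{l} \\[1mm] A^{(2)}_{lm}r_2^{l} \\[1mm] B^{(2)}_{lm}\dfrac{1}{r_1^{l+1}} \\[1mm] B^{(3)}_{lm}\dfrac{1}{r_2^{l+1}} \end{bmatrix} = \begin{bmatrix} 0 \\[1mm] \dfrac{q_1}{\varepsilon_0 r_1}\,\dfrac{1}{l+1}\,Y^{*}_{lm}(\theta_1,\phi_1) \\[1mm] 0 \\[1mm] \dfrac{q_2}{\varepsilon_0 r_2}\,\dfrac{1}{l+1}\,Y^{*}_{lm}(\theta_2,\phi_2) \end{bmatrix} \tag{68}$$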
It is worth mentioning that the only information needed to construct the matrix at the left-hand side is the nondimensional quantities $l/(l+1)$ and $(r_1/r_2)$.
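To illustrate that remark, the following minimal Python sketch builds the matrix of (68) from $l$ and the ratio $r_1/r_2$ alone and solves the system by Gaussian elimination with partial pivoting. It is an illustration under assumptions, not the authors' procedure: the unknowns are ordered as $[A^{(1)}_{lm}r_1^{l}]$, $[A^{(2)}_{lm}r_2^{l}]$, $[B^{(2)}_{lm}/r_1^{l+1}]$, $[B^{(3)}_{lm}/r_2^{l+1}]$, the right-hand side entries are passed in as plain numbers, and the function names are hypothetical.

```python
def three_layer_matrix(l, r1, r2):
    """Matrix of (68); only l/(l+1) and r1/r2 enter."""
    k = l / (l + 1)
    s = (r1 / r2) ** l          # correction term of Example 1
    t = (r1 / r2) ** (l + 1)    # correction term of Example 2
    return [
        [1.0, -s, -1.0, 0.0],
        [k, -k * s, 1.0, 0.0],
        [0.0, 1.0, t, -1.0],
        [0.0, k, -t, 1.0],
    ]


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting on a small dense system."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for i in range(col + 1, n):
            f = a[i][col] / a[col][col]
            for j in range(col, n + 1):
                a[i][j] -= f * a[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[i][n] - tail) / a[i][i]
    return x
```

With `l = 2`, `r1 = 1.0`, `r2 = 2.0` and right-hand side `[0, 3, 0, 5]`, the second unknown depends only on the second source entry and the third unknown only on the first, which anticipates the decoupling discussed in Section 5.1.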
5.1. Calculation of the Coefficients

While we can solve (68) straightforwardly, it is instructive to first eliminate the coefficients appearing in the boundary regions, i.e., in the first and the last regions. The benefit of this elimination becomes apparent when we consider more general multilayered problems in the next section. Using (67a) and (67c) we can obviously eliminate $[A^{(1)}_{lm}r_1^{l}]$ and $[B^{(3)}_{lm}/r_2^{l+1}]$ from (67b) and (67d):
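$$\frac{l}{l+1}\begin{bmatrix} \left(\dfrac{r_1}{r_2}\right)^{l} & 1 \end{bmatrix} \begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\[1mm] B^{(2)}_{lm}\dfrac{1}{r_1^{l+1}} \end{bmatrix} + \begin{bmatrix} -\dfrac{l}{l+1}\left(\dfrac{r_1}{r_2}\right)^{l} & 1 \end{bmatrix} \begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\[1mm] B^{(2)}_{lm}\dfrac{1}{r_1^{l+1}} \end{bmatrix} = \frac{q_1}{\varepsilon_0 r_1}\,\frac{1}{l+1}\,Y^{*}_{lm}(\theta_1,\phi_1) \tag{69a}$$
$$\begin{bmatrix} \dfrac{l}{l+1} & -\left(\dfrac{r_1}{r_2}\right)^{l+1} \end{bmatrix} \begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\[1mm] B^{(2)}_{lm}\dfrac{1}{r_1^{l+1}} \end{bmatrix} + \begin{bmatrix} 1 & \left(\dfrac{r_1}{r_2}\right)^{l+1} \end{bmatrix} \begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\[1mm] B^{(2)}_{lm}\dfrac{1}{r_1^{l+1}} \end{bmatrix} = \frac{q_2}{\varepsilon_0 r_2}\,\frac{1}{l+1}\,Y^{*}_{lm}(\theta_2,\phi_2) \tag{69b}$$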
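Combining the terms at the left-hand side of these equations, we can write (69) in the following compact form:

$$\begin{bmatrix} 0 & 1+\dfrac{l}{l+1} \\[2mm] 1+\dfrac{l}{l+1} & 0 \end{bmatrix} \begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\[1mm] B^{(2)}_{lm}\dfrac{1}{r_1^{l+1}} \end{bmatrix} = \begin{bmatrix} \dfrac{q_1}{\varepsilon_0 r_1}\,\dfrac{1}{l+1}\,Y^{*}_{lm}(\theta_1,\phi_1) \\[2mm] \dfrac{q_2}{\varepsilon_0 r_2}\,\dfrac{1}{l+1}\,Y^{*}_{lm}(\theta_2,\phi_2) \end{bmatrix} \tag{70}$$

Multiplying both sides of (70) by $(l+1)/(2l+1)$ we obtain:

$$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\[1mm] B^{(2)}_{lm}\dfrac{1}{r_1^{l+1}} \end{bmatrix} = \begin{bmatrix} \dfrac{q_1}{\varepsilon_0 r_1}\,\dfrac{1}{2l+1}\,Y^{*}_{lm}(\theta_1,\phi_1) \\[2mm] \dfrac{q_2}{\varepsilon_0 r_2}\,\dfrac{1}{2l+1}\,Y^{*}_{lm}(\theta_2,\phi_2) \end{bmatrix} \tag{71}$$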
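Comment: It is interesting to note that the equation for the determination of the $A$-coefficient decouples from the equation for the determination of the $B$-coefficient. As will be shown in the next section, this decoupling takes place irrespective of the number of layers. Since the matrix at the LHS is its own inverse we obtain:

$$\begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\[1mm] B^{(2)}_{lm}\dfrac{1}{r_1^{l+1}} \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \dfrac{q_1}{\varepsilon_0 r_1}\,\dfrac{1}{2l+1}\,Y^{*}_{lm}(\theta_1,\phi_1) \\[2mm] \dfrac{q_2}{\varepsilon_0 r_2}\,\dfrac{1}{2l+1}\,Y^{*}_{lm}(\theta_2,\phi_2) \end{bmatrix} \tag{72}$$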
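Finally, writing (67a) and (67c) in matrix form we have

$$\begin{bmatrix} A^{(1)}_{lm}r_1^{l} \\[1mm] B^{(3)}_{lm}\dfrac{1}{r_2^{l+1}} \end{bmatrix} = \begin{bmatrix} \left(\dfrac{r_1}{r_2}\right)^{l} & 1 \\[2mm] 1 & \left(\dfrac{r_1}{r_2}\right)^{l+1} \end{bmatrix} \begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\[1mm] B^{(2)}_{lm}\dfrac{1}{r_1^{l+1}} \end{bmatrix} \tag{73}$$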
The following comment completes the calculation of the unknown coefficients in our problem. Comment: The matrix at the RHS of (73) has the eigenvalues
A* = | H ] ± 1.
(74)
The corresponding normalized eigenvectors are s/2 2
±
(75)
X^ ^
2
Therefore, the matrix at the RHS of (73) can be decomposed as follows:
V2 2 V2 2
\/2 2 2
(ft)' + 1 0
V2 2
V2 2
s/2
_\f2
(76)
(ft)'-
Since the eigenvector matrix in (76) is normal, substituting the decomposition (76) for the matrix in (73) and multiplying the resulting equation by the inverse of the eigenvector matrix we obtain:
f ( 4 ^ + *, 3*0 (77)
0 5.2.
fe)'"
{2) l (2) Jl2 (A r - B ,1 ^ ^!mr2 ^m^+Ty
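Explicit Formulae for the Potential Functions in Regions I, II, and III

In order to construct formulae for the potential functions in the regions $r < r_1$, $r_1 < r < r_2$, and $r_2 < r$, we first note from (72) that

$$\left[A^{(2)}_{lm}r_2^{l}\right] = \frac{q_2}{\varepsilon_0 r_2}\,\frac{1}{2l+1}\,Y^{*}_{lm}(\theta_2,\phi_2) \tag{78a}$$
$$\left[B^{(2)}_{lm}\frac{1}{r_1^{l+1}}\right] = \frac{q_1}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,Y^{*}_{lm}(\theta_1,\phi_1) \tag{78b}$$

Substituting $[A^{(2)}_{lm}r_2^{l}]$ and $[B^{(2)}_{lm}/r_1^{l+1}]$ into (73) we obtain the remaining coefficients:

$$\left[A^{(1)}_{lm}r_1^{l}\right] = \left(\frac{r_1}{r_2}\right)^{l}\frac{q_2}{\varepsilon_0 r_2}\,\frac{1}{2l+1}\,Y^{*}_{lm}(\theta_2,\phi_2) + \frac{q_1}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,Y^{*}_{lm}(\theta_1,\phi_1) \tag{79a}$$
$$\left[B^{(3)}_{lm}\frac{1}{r_2^{l+1}}\right] = \frac{q_2}{\varepsilon_0 r_2}\,\frac{1}{2l+1}\,Y^{*}_{lm}(\theta_2,\phi_2) + \left(\frac{r_1}{r_2}\right)^{l+1}\frac{q_1}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,Y^{*}_{lm}(\theta_1,\phi_1) \tag{79b}$$

Substituting these coefficients into (62a), (63a), and (64a) we obtain the formulae for $\varphi^{(1)}(r,\theta,\phi)$, $\varphi^{(2)}(r,\theta,\phi)$, and $\varphi^{(3)}(r,\theta,\phi)$.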
Fig. 7. Network representation of the algorithm for the calculation of the expansion coefficients $[A^{(1)}_{lm}r_1^{l}]$ in the 1st layer, and $[A^{(2)}_{lm}r_2^{l}]$ in the 2nd layer.

Fig. 8. Network representation of the algorithm for the calculation of the expansion coefficients $[B^{(2)}_{lm}/r_1^{l+1}]$ in the 2nd layer, and $[B^{(3)}_{lm}/r_2^{l+1}]$ in the 3rd layer.

However, the major point here is something else: We want to investigate the forms of these expressions for the case $r_1 = r_2$. This corresponds to the problem of determining the potentials due to two point charges $q_1$ and $q_2$ located at the points $(r_1,\theta_1,\phi_1)$ and $(r_1,\theta_2,\phi_2)$. In this case we have:
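$$\left[A^{(2)}_{lm}r_1^{l}\right] = \frac{q_2}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,Y^{*}_{lm}(\theta_2,\phi_2) \tag{80a}$$
$$\left[B^{(2)}_{lm}\frac{1}{r_1^{l+1}}\right] = \frac{q_1}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,Y^{*}_{lm}(\theta_1,\phi_1) \tag{80b}$$

Substituting $[A^{(2)}_{lm}r_1^{l}]$ and $[B^{(2)}_{lm}/r_1^{l+1}]$ into (73) we obtain the remaining coefficients:

$$\left[A^{(1)}_{lm}r_1^{l}\right] = \frac{q_2}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,Y^{*}_{lm}(\theta_2,\phi_2) + \frac{q_1}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,Y^{*}_{lm}(\theta_1,\phi_1) \tag{81a}$$
$$\left[B^{(3)}_{lm}\frac{1}{r_1^{l+1}}\right] = \frac{q_2}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,Y^{*}_{lm}(\theta_2,\phi_2) + \frac{q_1}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,Y^{*}_{lm}(\theta_1,\phi_1) \tag{81b}$$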
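Substituting for $[A^{(2)}_{lm}r_2^{l}]$ and $[B^{(2)}_{lm}/r_1^{l+1}]$ from (78) into (63a), with $r_2 = r_1$, we obtain

$$\varphi^{(2)}(r,\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}\left\{ \frac{q_2}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,\frac{1}{r_1^{l}}\,Y^{*}_{lm}(\theta_2,\phi_2)\; r^{l}\,Y_{lm}(\theta,\phi) + \frac{q_1}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,r_1^{l+1}\,Y^{*}_{lm}(\theta_1,\phi_1)\;\frac{1}{r^{l+1}}\,Y_{lm}(\theta,\phi) \right\} \tag{82}$$

Recall that region II is defined by $r_1 < r < r_2$. On the other hand we assumed that $r_1 = r_2$. Therefore, in the present case, region II consists of the sphere $r = r_1$. Considering this fact in (82) we obtain:

$$\varphi^{(2)}(r_1,\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}\left\{ \frac{q_2}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,\frac{1}{r_1^{l}}\,Y^{*}_{lm}(\theta_2,\phi_2)\; r_1^{l}\,Y_{lm}(\theta,\phi) + \frac{q_1}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,r_1^{l+1}\,Y^{*}_{lm}(\theta_1,\phi_1)\;\frac{1}{r_1^{l+1}}\,Y_{lm}(\theta,\phi) \right\} \tag{83}$$

(Note the appearance of $r_1$ at the LHS.) Next, substituting (79a) and (79b), with $r_2 = r_1$, into (62a) and (64a), respectively, we have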
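$$\varphi^{(1)}(r,\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}\left[ \frac{q_2}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,\frac{1}{r_1^{l}}\,Y^{*}_{lm}(\theta_2,\phi_2) + \frac{q_1}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,\frac{1}{r_1^{l}}\,Y^{*}_{lm}(\theta_1,\phi_1) \right] r^{l}\,Y_{lm}(\theta,\phi) \tag{84a}$$
$$\varphi^{(3)}(r,\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}\left[ \frac{q_2}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,r_1^{l+1}\,Y^{*}_{lm}(\theta_2,\phi_2) + \frac{q_1}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,r_1^{l+1}\,Y^{*}_{lm}(\theta_1,\phi_1) \right] \frac{1}{r^{l+1}}\,Y_{lm}(\theta,\phi) \tag{84b}$$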
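A comparison with the potential functions in (61) shows that we have obtained the same expressions. But what about the formula in (83)? Setting $r = r_1$ in (84) we obtain

$$\varphi^{(1)}(r_1,\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}\left[ \frac{q_2}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,\frac{1}{r_1^{l}}\,Y^{*}_{lm}(\theta_2,\phi_2) + \frac{q_1}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,\frac{1}{r_1^{l}}\,Y^{*}_{lm}(\theta_1,\phi_1) \right] r_1^{l}\,Y_{lm}(\theta,\phi) \tag{85a}$$
$$\varphi^{(3)}(r_1,\theta,\phi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}\left[ \frac{q_2}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,r_1^{l+1}\,Y^{*}_{lm}(\theta_2,\phi_2) + \frac{q_1}{\varepsilon_0 r_1}\,\frac{1}{2l+1}\,r_1^{l+1}\,Y^{*}_{lm}(\theta_1,\phi_1) \right] \frac{1}{r_1^{l+1}}\,Y_{lm}(\theta,\phi) \tag{85b}$$

and thus

$$\varphi^{(1)}(r_1,\theta,\phi) = \varphi^{(2)}(r_1,\theta,\phi) = \varphi^{(3)}(r_1,\theta,\phi) \tag{86}$$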
This means that <^2) (ri, 6,
problem of calculating the potential functions associated with four point charges which lead to a five-layer problem. In what follows we rely on our results from previous sections but provide sufficient information to have a proper understanding of the construction procedure.
6.1. Point Charges

Assume four point charges specified as follows:

$$Q^{(i)}(r,\theta,\phi) = q_i\frac{1}{r^{2}}\,\delta(r-r_i)\,\delta(\cos\theta-\cos\theta_i)\,\delta(\phi-\phi_i), \qquad i = 1,2,3,4 \tag{87}$$

The radial distances of the point charges from the center of a chosen spherical co-ordinate system partition the entire space into five regions. The electric potential and the radial component of the dielectric displacement in the $i$th region are denoted by the superscript $(i)$. We use the following correspondences:
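Region I: $r < r_1$

$$\varphi^{(1)} \Longleftrightarrow A^{(1)}_{lm}r^{l} \tag{88a}$$
$$D_r^{(1)} \Longleftrightarrow -\varepsilon_0\frac{l}{r}\,A^{(1)}_{lm}r^{l} \tag{88b}$$

Region II: $r_1 < r < r_2$

$$\varphi^{(2)} \Longleftrightarrow A^{(2)}_{lm}r^{l} + B^{(2)}_{lm}\frac{1}{r^{l+1}} \tag{89a}$$
$$D_r^{(2)} \Longleftrightarrow -\varepsilon_0\frac{l}{r}\,A^{(2)}_{lm}r^{l} + \varepsilon_0\frac{l+1}{r}\,B^{(2)}_{lm}\frac{1}{r^{l+1}} \tag{89b}$$

Region III: $r_2 < r < r_3$

$$\varphi^{(3)} \Longleftrightarrow A^{(3)}_{lm}r^{l} + B^{(3)}_{lm}\frac{1}{r^{l+1}} \tag{90a}$$
$$D_r^{(3)} \Longleftrightarrow -\varepsilon_0\frac{l}{r}\,A^{(3)}_{lm}r^{l} + \varepsilon_0\frac{l+1}{r}\,B^{(3)}_{lm}\frac{1}{r^{l+1}} \tag{90b}$$

Region IV: $r_3 < r < r_4$

$$\varphi^{(4)} \Longleftrightarrow A^{(4)}_{lm}r^{l} + B^{(4)}_{lm}\frac{1}{r^{l+1}} \tag{91a}$$
$$D_r^{(4)} \Longleftrightarrow -\varepsilon_0\frac{l}{r}\,A^{(4)}_{lm}r^{l} + \varepsilon_0\frac{l+1}{r}\,B^{(4)}_{lm}\frac{1}{r^{l+1}} \tag{91b}$$

Region V: $r_4 < r$

$$\varphi^{(5)} \Longleftrightarrow B^{(5)}_{lm}\frac{1}{r^{l+1}} \tag{92a}$$
$$D_r^{(5)} \Longleftrightarrow \varepsilon_0\frac{l+1}{r}\,B^{(5)}_{lm}\frac{1}{r^{l+1}} \tag{92b}$$

6.2. Interface Conditions

Satisfying the interface conditions on the $r_i$-spheres ($i = 1,2,3,4$) we obtain the following set of eight equations.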
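Interface Conditions on the $r_1$-Sphere:

$$\left[A^{(1)}_{lm}r_1^{l}\right] = A^{(2)}_{lm}r_1^{l} + \left[B^{(2)}_{lm}\frac{1}{r_1^{l+1}}\right] \tag{93a}$$
$$l\left[A^{(1)}_{lm}r_1^{l}\right] - l\,A^{(2)}_{lm}r_1^{l} + (l+1)\left[B^{(2)}_{lm}\frac{1}{r_1^{l+1}}\right] = \frac{q_1}{\varepsilon_0 r_1}\,Y^{*}_{lm}(\theta_1,\phi_1) \tag{93b}$$

Interface Conditions on the $r_2$-Sphere:

$$\left[A^{(2)}_{lm}r_2^{l}\right] + B^{(2)}_{lm}\frac{1}{r_2^{l+1}} = A^{(3)}_{lm}r_2^{l} + \left[B^{(3)}_{lm}\frac{1}{r_2^{l+1}}\right] \tag{94a}$$
$$l\left[A^{(2)}_{lm}r_2^{l}\right] - (l+1)\,B^{(2)}_{lm}\frac{1}{r_2^{l+1}} - l\,A^{(3)}_{lm}r_2^{l} + (l+1)\left[B^{(3)}_{lm}\frac{1}{r_2^{l+1}}\right] = \frac{q_2}{\varepsilon_0 r_2}\,Y^{*}_{lm}(\theta_2,\phi_2) \tag{94b}$$

Interface Conditions on the $r_3$-Sphere:

$$\left[A^{(3)}_{lm}r_3^{l}\right] + B^{(3)}_{lm}\frac{1}{r_3^{l+1}} = A^{(4)}_{lm}r_3^{l} + \left[B^{(4)}_{lm}\frac{1}{r_3^{l+1}}\right] \tag{95a}$$
$$l\left[A^{(3)}_{lm}r_3^{l}\right] - (l+1)\,B^{(3)}_{lm}\frac{1}{r_3^{l+1}} - l\,A^{(4)}_{lm}r_3^{l} + (l+1)\left[B^{(4)}_{lm}\frac{1}{r_3^{l+1}}\right] = \frac{q_3}{\varepsilon_0 r_3}\,Y^{*}_{lm}(\theta_3,\phi_3) \tag{95b}$$

Interface Conditions on the $r_4$-Sphere:

$$\left[A^{(4)}_{lm}r_4^{l}\right] + B^{(4)}_{lm}\frac{1}{r_4^{l+1}} = \left[B^{(5)}_{lm}\frac{1}{r_4^{l+1}}\right] \tag{96a}$$
$$l\left[A^{(4)}_{lm}r_4^{l}\right] - (l+1)\,B^{(4)}_{lm}\frac{1}{r_4^{l+1}} + (l+1)\left[B^{(5)}_{lm}\frac{1}{r_4^{l+1}}\right] = \frac{q_4}{\varepsilon_0 r_4}\,Y^{*}_{lm}(\theta_4,\phi_4) \tag{96b}$$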
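We multiply the $A$- and $B$-coefficients in the various layers by appropriate non-dimensional quantities in order to obtain $[A^{(s)}_{lm}r_s^{l}]$ ($s = 1,2,3,4$) and $[B^{(s)}_{lm}/r_{s-1}^{l+1}]$ ($s = 2,3,4,5$) as the new coefficients. Furthermore, in each group of the interface conditions, we divide the second equation by $l+1$. We obtain:

$$\left[A^{(1)}_{lm}r_1^{l}\right] = \left(\frac{r_1}{r_2}\right)^{l}\left[A^{(2)}_{lm}r_2^{l}\right] + \left[B^{(2)}_{lm}\frac{1}{r_1^{l+1}}\right] \tag{97a}$$
$$\frac{l}{l+1}\left[A^{(1)}_{lm}r_1^{l}\right] - \frac{l}{l+1}\left(\frac{r_1}{r_2}\right)^{l}\left[A^{(2)}_{lm}r_2^{l}\right] + \left[B^{(2)}_{lm}\frac{1}{r_1^{l+1}}\right] = \frac{q_1}{\varepsilon_0 r_1}\,\frac{1}{l+1}\,Y^{*}_{lm}(\theta_1,\phi_1) \tag{97b}$$

$$\left[A^{(2)}_{lm}r_2^{l}\right] + \left(\frac{r_1}{r_2}\right)^{l+1}\left[B^{(2)}_{lm}\frac{1}{r_1^{l+1}}\right] = \left(\frac{r_2}{r_3}\right)^{l}\left[A^{(3)}_{lm}r_3^{l}\right] + \left[B^{(3)}_{lm}\frac{1}{r_2^{l+1}}\right] \tag{98a}$$
$$\frac{l}{l+1}\left[A^{(2)}_{lm}r_2^{l}\right] - \left(\frac{r_1}{r_2}\right)^{l+1}\left[B^{(2)}_{lm}\frac{1}{r_1^{l+1}}\right] - \frac{l}{l+1}\left(\frac{r_2}{r_3}\right)^{l}\left[A^{(3)}_{lm}r_3^{l}\right] + \left[B^{(3)}_{lm}\frac{1}{r_2^{l+1}}\right] = \frac{q_2}{\varepsilon_0 r_2}\,\frac{1}{l+1}\,Y^{*}_{lm}(\theta_2,\phi_2) \tag{98b}$$

$$\left[A^{(3)}_{lm}r_3^{l}\right] + \left(\frac{r_2}{r_3}\right)^{l+1}\left[B^{(3)}_{lm}\frac{1}{r_2^{l+1}}\right] = \left(\frac{r_3}{r_4}\right)^{l}\left[A^{(4)}_{lm}r_4^{l}\right] + \left[B^{(4)}_{lm}\frac{1}{r_3^{l+1}}\right] \tag{99a}$$
$$\frac{l}{l+1}\left[A^{(3)}_{lm}r_3^{l}\right] - \left(\frac{r_2}{r_3}\right)^{l+1}\left[B^{(3)}_{lm}\frac{1}{r_2^{l+1}}\right] - \frac{l}{l+1}\left(\frac{r_3}{r_4}\right)^{l}\left[A^{(4)}_{lm}r_4^{l}\right] + \left[B^{(4)}_{lm}\frac{1}{r_3^{l+1}}\right] = \frac{q_3}{\varepsilon_0 r_3}\,\frac{1}{l+1}\,Y^{*}_{lm}(\theta_3,\phi_3) \tag{99b}$$

$$\left[A^{(4)}_{lm}r_4^{l}\right] + \left(\frac{r_3}{r_4}\right)^{l+1}\left[B^{(4)}_{lm}\frac{1}{r_3^{l+1}}\right] = \left[B^{(5)}_{lm}\frac{1}{r_4^{l+1}}\right] \tag{100a}$$
$$\frac{l}{l+1}\left[A^{(4)}_{lm}r_4^{l}\right] - \left(\frac{r_3}{r_4}\right)^{l+1}\left[B^{(4)}_{lm}\frac{1}{r_3^{l+1}}\right] + \left[B^{(5)}_{lm}\frac{1}{r_4^{l+1}}\right] = \frac{q_4}{\varepsilon_0 r_4}\,\frac{1}{l+1}\,Y^{*}_{lm}(\theta_4,\phi_4) \tag{100b}$$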
Define

$$Q^{(s)}_{lm} = \frac{q_s}{\varepsilon_0 r_s}\,\frac{1}{l+1}\,Y^{*}_{lm}(\theta_s,\phi_s), \qquad s = 1,2,3,4. \tag{101}$$
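Written in matrix form we have

$$\left[A^{(1)}_{lm}r_1^{l}\right] = \begin{bmatrix} \left(\dfrac{r_1}{r_2}\right)^{l} & 1 \end{bmatrix} \begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\[1mm] B^{(2)}_{lm}\dfrac{1}{r_1^{l+1}} \end{bmatrix} \tag{102a}$$
$$\frac{l}{l+1}\left[A^{(1)}_{lm}r_1^{l}\right] + \begin{bmatrix} -\dfrac{l}{l+1}\left(\dfrac{r_1}{r_2}\right)^{l} & 1 \end{bmatrix} \begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\[1mm] B^{(2)}_{lm}\dfrac{1}{r_1^{l+1}} \end{bmatrix} = Q^{(1)}_{lm} \tag{102b}$$

$$\begin{bmatrix} 1 & \left(\dfrac{r_1}{r_2}\right)^{l+1} \end{bmatrix} \begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\[1mm] B^{(2)}_{lm}\dfrac{1}{r_1^{l+1}} \end{bmatrix} = \begin{bmatrix} \left(\dfrac{r_2}{r_3}\right)^{l} & 1 \end{bmatrix} \begin{bmatrix} A^{(3)}_{lm}r_3^{l} \\[1mm] B^{(3)}_{lm}\dfrac{1}{r_2^{l+1}} \end{bmatrix} \tag{103a}$$
$$\begin{bmatrix} \dfrac{l}{l+1} & -\left(\dfrac{r_1}{r_2}\right)^{l+1} \end{bmatrix} \begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\[1mm] B^{(2)}_{lm}\dfrac{1}{r_1^{l+1}} \end{bmatrix} + \begin{bmatrix} -\dfrac{l}{l+1}\left(\dfrac{r_2}{r_3}\right)^{l} & 1 \end{bmatrix} \begin{bmatrix} A^{(3)}_{lm}r_3^{l} \\[1mm] B^{(3)}_{lm}\dfrac{1}{r_2^{l+1}} \end{bmatrix} = Q^{(2)}_{lm} \tag{103b}$$

$$\begin{bmatrix} 1 & \left(\dfrac{r_2}{r_3}\right)^{l+1} \end{bmatrix} \begin{bmatrix} A^{(3)}_{lm}r_3^{l} \\[1mm] B^{(3)}_{lm}\dfrac{1}{r_2^{l+1}} \end{bmatrix} = \begin{bmatrix} \left(\dfrac{r_3}{r_4}\right)^{l} & 1 \end{bmatrix} \begin{bmatrix} A^{(4)}_{lm}r_4^{l} \\[1mm] B^{(4)}_{lm}\dfrac{1}{r_3^{l+1}} \end{bmatrix} \tag{104a}$$
$$\begin{bmatrix} \dfrac{l}{l+1} & -\left(\dfrac{r_2}{r_3}\right)^{l+1} \end{bmatrix} \begin{bmatrix} A^{(3)}_{lm}r_3^{l} \\[1mm] B^{(3)}_{lm}\dfrac{1}{r_2^{l+1}} \end{bmatrix} + \begin{bmatrix} -\dfrac{l}{l+1}\left(\dfrac{r_3}{r_4}\right)^{l} & 1 \end{bmatrix} \begin{bmatrix} A^{(4)}_{lm}r_4^{l} \\[1mm] B^{(4)}_{lm}\dfrac{1}{r_3^{l+1}} \end{bmatrix} = Q^{(3)}_{lm} \tag{104b}$$

$$\begin{bmatrix} 1 & \left(\dfrac{r_3}{r_4}\right)^{l+1} \end{bmatrix} \begin{bmatrix} A^{(4)}_{lm}r_4^{l} \\[1mm] B^{(4)}_{lm}\dfrac{1}{r_3^{l+1}} \end{bmatrix} = \left[B^{(5)}_{lm}\frac{1}{r_4^{l+1}}\right] \tag{105a}$$
$$\begin{bmatrix} \dfrac{l}{l+1} & -\left(\dfrac{r_3}{r_4}\right)^{l+1} \end{bmatrix} \begin{bmatrix} A^{(4)}_{lm}r_4^{l} \\[1mm] B^{(4)}_{lm}\dfrac{1}{r_3^{l+1}} \end{bmatrix} + \left[B^{(5)}_{lm}\frac{1}{r_4^{l+1}}\right] = Q^{(4)}_{lm} \tag{105b}$$

Substituting (102a) into (102b) we obtain

$$\begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\[1mm] B^{(2)}_{lm}\dfrac{1}{r_1^{l+1}} \end{bmatrix} = \frac{l+1}{2l+1}\,Q^{(1)}_{lm} \tag{106}$$

Substituting (105a) into (105b) we obtain

$$\begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} A^{(4)}_{lm}r_4^{l} \\[1mm] B^{(4)}_{lm}\dfrac{1}{r_3^{l+1}} \end{bmatrix} = \frac{l+1}{2l+1}\,Q^{(4)}_{lm} \tag{107}$$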
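The remaining four equations in (103) and (104) written in matrix form give:

$$\begin{bmatrix} 1 & \left(\dfrac{r_1}{r_2}\right)^{l+1} & -\left(\dfrac{r_2}{r_3}\right)^{l} & -1 \end{bmatrix} \begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\ B^{(2)}_{lm}/r_1^{l+1} \\ A^{(3)}_{lm}r_3^{l} \\ B^{(3)}_{lm}/r_2^{l+1} \end{bmatrix} = 0 \tag{108}$$
$$\begin{bmatrix} \dfrac{l}{l+1} & -\left(\dfrac{r_1}{r_2}\right)^{l+1} & -\dfrac{l}{l+1}\left(\dfrac{r_2}{r_3}\right)^{l} & 1 \end{bmatrix} \begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\ B^{(2)}_{lm}/r_1^{l+1} \\ A^{(3)}_{lm}r_3^{l} \\ B^{(3)}_{lm}/r_2^{l+1} \end{bmatrix} = Q^{(2)}_{lm} \tag{109}$$
$$\begin{bmatrix} 1 & \left(\dfrac{r_2}{r_3}\right)^{l+1} & -\left(\dfrac{r_3}{r_4}\right)^{l} & -1 \end{bmatrix} \begin{bmatrix} A^{(3)}_{lm}r_3^{l} \\ B^{(3)}_{lm}/r_2^{l+1} \\ A^{(4)}_{lm}r_4^{l} \\ B^{(4)}_{lm}/r_3^{l+1} \end{bmatrix} = 0 \tag{110}$$
$$\begin{bmatrix} \dfrac{l}{l+1} & -\left(\dfrac{r_2}{r_3}\right)^{l+1} & -\dfrac{l}{l+1}\left(\dfrac{r_3}{r_4}\right)^{l} & 1 \end{bmatrix} \begin{bmatrix} A^{(3)}_{lm}r_3^{l} \\ B^{(3)}_{lm}/r_2^{l+1} \\ A^{(4)}_{lm}r_4^{l} \\ B^{(4)}_{lm}/r_3^{l+1} \end{bmatrix} = Q^{(3)}_{lm} \tag{111}$$

Combining our results we obtain

$$\begin{bmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\[1mm]
1 & \left(\frac{r_1}{r_2}\right)^{l+1} & -\left(\frac{r_2}{r_3}\right)^{l} & -1 & 0 & 0 \\[1mm]
\frac{l}{l+1} & -\left(\frac{r_1}{r_2}\right)^{l+1} & -\frac{l}{l+1}\left(\frac{r_2}{r_3}\right)^{l} & 1 & 0 & 0 \\[1mm]
0 & 0 & 1 & \left(\frac{r_2}{r_3}\right)^{l+1} & -\left(\frac{r_3}{r_4}\right)^{l} & -1 \\[1mm]
0 & 0 & \frac{l}{l+1} & -\left(\frac{r_2}{r_3}\right)^{l+1} & -\frac{l}{l+1}\left(\frac{r_3}{r_4}\right)^{l} & 1 \\[1mm]
0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\ B^{(2)}_{lm}/r_1^{l+1} \\ A^{(3)}_{lm}r_3^{l} \\ B^{(3)}_{lm}/r_2^{l+1} \\ A^{(4)}_{lm}r_4^{l} \\ B^{(4)}_{lm}/r_3^{l+1} \end{bmatrix}
=
\begin{bmatrix} \frac{l+1}{2l+1}Q^{(1)}_{lm} \\ 0 \\ Q^{(2)}_{lm} \\ 0 \\ Q^{(3)}_{lm} \\ \frac{l+1}{2l+1}Q^{(4)}_{lm} \end{bmatrix} \tag{112}$$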
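6.3. Decoupling of the Coefficient Matrix

In what follows we will demonstrate that the above system of equations decouples into two systems. Consider the first three equations:

$$\begin{bmatrix}
0 & 1 & 0 & 0 \\[1mm]
1 & \left(\frac{r_1}{r_2}\right)^{l+1} & -\left(\frac{r_2}{r_3}\right)^{l} & -1 \\[1mm]
\frac{l}{l+1} & -\left(\frac{r_1}{r_2}\right)^{l+1} & -\frac{l}{l+1}\left(\frac{r_2}{r_3}\right)^{l} & 1
\end{bmatrix}
\begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\ B^{(2)}_{lm}/r_1^{l+1} \\ A^{(3)}_{lm}r_3^{l} \\ B^{(3)}_{lm}/r_2^{l+1} \end{bmatrix}
=
\begin{bmatrix} \frac{l+1}{2l+1}Q^{(1)}_{lm} \\ 0 \\ Q^{(2)}_{lm} \end{bmatrix} \tag{113}$$

Adding the second equation to the third one, and also multiplying the second equation by $-l/(l+1)$ and adding the resulting equation to the third one, we obtain: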
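$$\begin{bmatrix}
0 & 1 & 0 & 0 \\[1mm]
1+\frac{l}{l+1} & 0 & -\left(1+\frac{l}{l+1}\right)\left(\frac{r_2}{r_3}\right)^{l} & 0 \\[1mm]
0 & -\left(1+\frac{l}{l+1}\right)\left(\frac{r_1}{r_2}\right)^{l+1} & 0 & 1+\frac{l}{l+1}
\end{bmatrix}
\begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\ B^{(2)}_{lm}/r_1^{l+1} \\ A^{(3)}_{lm}r_3^{l} \\ B^{(3)}_{lm}/r_2^{l+1} \end{bmatrix}
=
\begin{bmatrix} \frac{l+1}{2l+1}Q^{(1)}_{lm} \\ Q^{(2)}_{lm} \\ Q^{(2)}_{lm} \end{bmatrix} \tag{114}$$

Multiplying the second and third equations by $(l+1)/(2l+1)$ we obtain: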
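$$\begin{bmatrix}
0 & 1 & 0 & 0 \\[1mm]
1 & 0 & -\left(\frac{r_2}{r_3}\right)^{l} & 0 \\[1mm]
0 & -\left(\frac{r_1}{r_2}\right)^{l+1} & 0 & 1
\end{bmatrix}
\begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\ B^{(2)}_{lm}/r_1^{l+1} \\ A^{(3)}_{lm}r_3^{l} \\ B^{(3)}_{lm}/r_2^{l+1} \end{bmatrix}
= \frac{l+1}{2l+1}
\begin{bmatrix} Q^{(1)}_{lm} \\ Q^{(2)}_{lm} \\ Q^{(2)}_{lm} \end{bmatrix} \tag{115}$$

Finally, multiplying the first equation by $(r_1/r_2)^{l+1}$ and adding the resulting equation to the third equation we obtain: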
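$$\begin{bmatrix}
0 & 1 & 0 & 0 \\[1mm]
1 & 0 & -\left(\frac{r_2}{r_3}\right)^{l} & 0 \\[1mm]
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\ B^{(2)}_{lm}/r_1^{l+1} \\ A^{(3)}_{lm}r_3^{l} \\ B^{(3)}_{lm}/r_2^{l+1} \end{bmatrix}
= \frac{l+1}{2l+1}
\begin{bmatrix} Q^{(1)}_{lm} \\ Q^{(2)}_{lm} \\ \left(\frac{r_1}{r_2}\right)^{l+1}Q^{(1)}_{lm} + Q^{(2)}_{lm} \end{bmatrix} \tag{116}$$

Obviously the first and the third equations, which involve $[B^{(2)}_{lm}/r_1^{l+1}]$ and $[B^{(3)}_{lm}/r_2^{l+1}]$, decouple from the second equation, which interrelates $[A^{(2)}_{lm}r_2^{l}]$ and $[A^{(3)}_{lm}r_3^{l}]$. Next we consider the third equation in (116) together with the fourth and the fifth equations in (112):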
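$$\begin{bmatrix}
0 & 1 & 0 & 0 \\[1mm]
1 & \left(\frac{r_2}{r_3}\right)^{l+1} & -\left(\frac{r_3}{r_4}\right)^{l} & -1 \\[1mm]
\frac{l}{l+1} & -\left(\frac{r_2}{r_3}\right)^{l+1} & -\frac{l}{l+1}\left(\frac{r_3}{r_4}\right)^{l} & 1
\end{bmatrix}
\begin{bmatrix} A^{(3)}_{lm}r_3^{l} \\ B^{(3)}_{lm}/r_2^{l+1} \\ A^{(4)}_{lm}r_4^{l} \\ B^{(4)}_{lm}/r_3^{l+1} \end{bmatrix}
=
\begin{bmatrix} \frac{l+1}{2l+1}\left[\left(\frac{r_1}{r_2}\right)^{l+1}Q^{(1)}_{lm} + Q^{(2)}_{lm}\right] \\ 0 \\ Q^{(3)}_{lm} \end{bmatrix} \tag{117}$$

Adding the second equation to the third one, and subtracting $l/(l+1)$ times the second equation from the third equation we obtain: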
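$$\begin{bmatrix}
0 & 1 & 0 & 0 \\[1mm]
\frac{2l+1}{l+1} & 0 & -\frac{2l+1}{l+1}\left(\frac{r_3}{r_4}\right)^{l} & 0 \\[1mm]
0 & -\frac{2l+1}{l+1}\left(\frac{r_2}{r_3}\right)^{l+1} & 0 & \frac{2l+1}{l+1}
\end{bmatrix}
\begin{bmatrix} A^{(3)}_{lm}r_3^{l} \\ B^{(3)}_{lm}/r_2^{l+1} \\ A^{(4)}_{lm}r_4^{l} \\ B^{(4)}_{lm}/r_3^{l+1} \end{bmatrix}
=
\begin{bmatrix} \frac{l+1}{2l+1}\left[\left(\frac{r_1}{r_2}\right)^{l+1}Q^{(1)}_{lm} + Q^{(2)}_{lm}\right] \\ Q^{(3)}_{lm} \\ Q^{(3)}_{lm} \end{bmatrix} \tag{118}$$

Multiplying the second and the third equations by $(l+1)/(2l+1)$ we obtain: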
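$$\begin{bmatrix}
0 & 1 & 0 & 0 \\[1mm]
1 & 0 & -\left(\frac{r_3}{r_4}\right)^{l} & 0 \\[1mm]
0 & -\left(\frac{r_2}{r_3}\right)^{l+1} & 0 & 1
\end{bmatrix}
\begin{bmatrix} A^{(3)}_{lm}r_3^{l} \\ B^{(3)}_{lm}/r_2^{l+1} \\ A^{(4)}_{lm}r_4^{l} \\ B^{(4)}_{lm}/r_3^{l+1} \end{bmatrix}
= \frac{l+1}{2l+1}
\begin{bmatrix} \left(\frac{r_1}{r_2}\right)^{l+1}Q^{(1)}_{lm} + Q^{(2)}_{lm} \\ Q^{(3)}_{lm} \\ Q^{(3)}_{lm} \end{bmatrix} \tag{119}$$

Multiplying the first equation by $(r_2/r_3)^{l+1}$ and adding the result to the third equation we obtain: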
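$$\begin{bmatrix}
0 & 1 & 0 & 0 \\[1mm]
1 & 0 & -\left(\frac{r_3}{r_4}\right)^{l} & 0 \\[1mm]
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} A^{(3)}_{lm}r_3^{l} \\ B^{(3)}_{lm}/r_2^{l+1} \\ A^{(4)}_{lm}r_4^{l} \\ B^{(4)}_{lm}/r_3^{l+1} \end{bmatrix}
= \frac{l+1}{2l+1}
\begin{bmatrix} \left(\frac{r_1}{r_2}\right)^{l+1}Q^{(1)}_{lm} + Q^{(2)}_{lm} \\ Q^{(3)}_{lm} \\ \left(\frac{r_2}{r_3}\right)^{l+1}\left[\left(\frac{r_1}{r_2}\right)^{l+1}Q^{(1)}_{lm} + Q^{(2)}_{lm}\right] + Q^{(3)}_{lm} \end{bmatrix} \tag{120}$$

Note that replacing $r_3/r_4$ by $r_2/r_3$ we obtain the matrix in (116). Equations (116) and (120) taken together give: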
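$$\begin{bmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\[1mm]
1 & 0 & -\left(\frac{r_2}{r_3}\right)^{l} & 0 & 0 & 0 \\[1mm]
0 & 0 & 0 & 1 & 0 & 0 \\[1mm]
0 & 0 & 1 & 0 & -\left(\frac{r_3}{r_4}\right)^{l} & 0 \\[1mm]
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\ B^{(2)}_{lm}/r_1^{l+1} \\ A^{(3)}_{lm}r_3^{l} \\ B^{(3)}_{lm}/r_2^{l+1} \\ A^{(4)}_{lm}r_4^{l} \\ B^{(4)}_{lm}/r_3^{l+1} \end{bmatrix}
= \frac{l+1}{2l+1}
\begin{bmatrix} Q^{(1)}_{lm} \\ Q^{(2)}_{lm} \\ \left(\frac{r_1}{r_2}\right)^{l+1}Q^{(1)}_{lm} + Q^{(2)}_{lm} \\ Q^{(3)}_{lm} \\ \left(\frac{r_2}{r_3}\right)^{l+1}\left[\left(\frac{r_1}{r_2}\right)^{l+1}Q^{(1)}_{lm} + Q^{(2)}_{lm}\right] + Q^{(3)}_{lm} \end{bmatrix} \tag{121}$$

So far we have not used the last equation in (112). Adding this equation to (121) results in: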
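$$\begin{bmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\[1mm]
1 & 0 & -\left(\frac{r_2}{r_3}\right)^{l} & 0 & 0 & 0 \\[1mm]
0 & 0 & 0 & 1 & 0 & 0 \\[1mm]
0 & 0 & 1 & 0 & -\left(\frac{r_3}{r_4}\right)^{l} & 0 \\[1mm]
0 & 0 & 0 & 0 & 0 & 1 \\[1mm]
0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\ B^{(2)}_{lm}/r_1^{l+1} \\ A^{(3)}_{lm}r_3^{l} \\ B^{(3)}_{lm}/r_2^{l+1} \\ A^{(4)}_{lm}r_4^{l} \\ B^{(4)}_{lm}/r_3^{l+1} \end{bmatrix}
= \frac{l+1}{2l+1}
\begin{bmatrix} Q^{(1)}_{lm} \\ Q^{(2)}_{lm} \\ \left(\frac{r_1}{r_2}\right)^{l+1}Q^{(1)}_{lm} + Q^{(2)}_{lm} \\ Q^{(3)}_{lm} \\ \left(\frac{r_2}{r_3}\right)^{l+1}\left[\left(\frac{r_1}{r_2}\right)^{l+1}Q^{(1)}_{lm} + Q^{(2)}_{lm}\right] + Q^{(3)}_{lm} \\ Q^{(4)}_{lm} \end{bmatrix} \tag{122}$$

Evidently this system of equations decouples into a subsystem for the determination of the coefficients $A$ and a subsystem for the calculation of the coefficients $B$. The system for the $B$ coefficients is immediately solvable: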
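$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} B^{(2)}_{lm}/r_1^{l+1} \\ B^{(3)}_{lm}/r_2^{l+1} \\ B^{(4)}_{lm}/r_3^{l+1} \end{bmatrix}
= \frac{l+1}{2l+1}
\begin{bmatrix} Q^{(1)}_{lm} \\ \left(\frac{r_1}{r_2}\right)^{l+1}Q^{(1)}_{lm} + Q^{(2)}_{lm} \\ \left(\frac{r_2}{r_3}\right)^{l+1}\left[\left(\frac{r_1}{r_2}\right)^{l+1}Q^{(1)}_{lm} + Q^{(2)}_{lm}\right] + Q^{(3)}_{lm} \end{bmatrix} \tag{123}$$

The system for the coefficients $A$ is: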
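$$\begin{bmatrix} 1 & -\left(\frac{r_2}{r_3}\right)^{l} & 0 \\[1mm] 0 & 1 & -\left(\frac{r_3}{r_4}\right)^{l} \\[1mm] 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\ A^{(3)}_{lm}r_3^{l} \\ A^{(4)}_{lm}r_4^{l} \end{bmatrix}
= \frac{l+1}{2l+1}
\begin{bmatrix} Q^{(2)}_{lm} \\ Q^{(3)}_{lm} \\ Q^{(4)}_{lm} \end{bmatrix} \tag{124}$$

Multiplying the third equation by $(r_3/r_4)^{l}$ and adding the resulting equation to the second equation we obtain: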
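$$\begin{bmatrix} 1 & -\left(\frac{r_2}{r_3}\right)^{l} & 0 \\[1mm] 0 & 1 & 0 \\[1mm] 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\ A^{(3)}_{lm}r_3^{l} \\ A^{(4)}_{lm}r_4^{l} \end{bmatrix}
= \frac{l+1}{2l+1}
\begin{bmatrix} Q^{(2)}_{lm} \\ \left(\frac{r_3}{r_4}\right)^{l}Q^{(4)}_{lm} + Q^{(3)}_{lm} \\ Q^{(4)}_{lm} \end{bmatrix} \tag{125}$$

Multiplying the second equation by $(r_2/r_3)^{l}$ and adding the resulting equation to the first equation results in: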
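$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} A^{(2)}_{lm}r_2^{l} \\ A^{(3)}_{lm}r_3^{l} \\ A^{(4)}_{lm}r_4^{l} \end{bmatrix}
= \frac{l+1}{2l+1}
\begin{bmatrix} \left(\frac{r_2}{r_3}\right)^{l}\left[\left(\frac{r_3}{r_4}\right)^{l}Q^{(4)}_{lm} + Q^{(3)}_{lm}\right] + Q^{(2)}_{lm} \\ \left(\frac{r_3}{r_4}\right)^{l}Q^{(4)}_{lm} + Q^{(3)}_{lm} \\ Q^{(4)}_{lm} \end{bmatrix} \tag{126}$$

For the determination of the remaining coefficients, i.e., $[A^{(1)}_{lm}r_1^{l}]$ and $[B^{(5)}_{lm}/r_4^{l+1}]$, we use (102a) and (105a):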
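$$\left[A^{(1)}_{lm}r_1^{l}\right] = \left(\frac{r_1}{r_2}\right)^{l}\left[A^{(2)}_{lm}r_2^{l}\right] + \left[B^{(2)}_{lm}\frac{1}{r_1^{l+1}}\right] \tag{127a}$$
$$\left[B^{(5)}_{lm}\frac{1}{r_4^{l+1}}\right] = \left[A^{(4)}_{lm}r_4^{l}\right] + \left(\frac{r_3}{r_4}\right)^{l+1}\left[B^{(4)}_{lm}\frac{1}{r_3^{l+1}}\right] \tag{127b}$$

Substituting for the coefficients $A$ and $B$ at the RHSs of (127) from (126) and (123) we obtain:

$$\frac{2l+1}{l+1}\left[A^{(1)}_{lm}r_1^{l}\right] = \left(\frac{r_1}{r_2}\right)^{l}\left\{\left(\frac{r_2}{r_3}\right)^{l}\left[\left(\frac{r_3}{r_4}\right)^{l}Q^{(4)}_{lm} + Q^{(3)}_{lm}\right] + Q^{(2)}_{lm}\right\} + Q^{(1)}_{lm} \tag{128a}$$
$$\frac{2l+1}{l+1}\left[B^{(5)}_{lm}\frac{1}{r_4^{l+1}}\right] = \left(\frac{r_3}{r_4}\right)^{l+1}\left\{\left(\frac{r_2}{r_3}\right)^{l+1}\left[\left(\frac{r_1}{r_2}\right)^{l+1}Q^{(1)}_{lm} + Q^{(2)}_{lm}\right] + Q^{(3)}_{lm}\right\} + Q^{(4)}_{lm} \tag{128b}$$

Summarizing our results we obtain the following formulae: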
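$$\frac{2l+1}{l+1}\left[A^{(1)}_{lm}r_1^{l}\right] = \left(\frac{r_1}{r_2}\right)^{l}\left\{\left(\frac{r_2}{r_3}\right)^{l}\left[\left(\frac{r_3}{r_4}\right)^{l}Q^{(4)}_{lm} + Q^{(3)}_{lm}\right] + Q^{(2)}_{lm}\right\} + Q^{(1)}_{lm} \tag{129a}$$
$$\frac{2l+1}{l+1}\left[A^{(2)}_{lm}r_2^{l}\right] = \left(\frac{r_2}{r_3}\right)^{l}\left[\left(\frac{r_3}{r_4}\right)^{l}Q^{(4)}_{lm} + Q^{(3)}_{lm}\right] + Q^{(2)}_{lm} \tag{129b}$$
$$\frac{2l+1}{l+1}\left[A^{(3)}_{lm}r_3^{l}\right] = \left(\frac{r_3}{r_4}\right)^{l}Q^{(4)}_{lm} + Q^{(3)}_{lm} \tag{129c}$$
$$\frac{2l+1}{l+1}\left[A^{(4)}_{lm}r_4^{l}\right] = Q^{(4)}_{lm} \tag{129d}$$
$$\frac{2l+1}{l+1}\left[B^{(2)}_{lm}\frac{1}{r_1^{l+1}}\right] = Q^{(1)}_{lm} \tag{129e}$$
$$\frac{2l+1}{l+1}\left[B^{(3)}_{lm}\frac{1}{r_2^{l+1}}\right] = \left(\frac{r_1}{r_2}\right)^{l+1}Q^{(1)}_{lm} + Q^{(2)}_{lm} \tag{129f}$$
$$\frac{2l+1}{l+1}\left[B^{(4)}_{lm}\frac{1}{r_3^{l+1}}\right] = \left(\frac{r_2}{r_3}\right)^{l+1}\left[\left(\frac{r_1}{r_2}\right)^{l+1}Q^{(1)}_{lm} + Q^{(2)}_{lm}\right] + Q^{(3)}_{lm} \tag{129g}$$
$$\frac{2l+1}{l+1}\left[B^{(5)}_{lm}\frac{1}{r_4^{l+1}}\right] = \left(\frac{r_3}{r_4}\right)^{l+1}\left\{\left(\frac{r_2}{r_3}\right)^{l+1}\left[\left(\frac{r_1}{r_2}\right)^{l+1}Q^{(1)}_{lm} + Q^{(2)}_{lm}\right] + Q^{(3)}_{lm}\right\} + Q^{(4)}_{lm} \tag{129h}$$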
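From the above we obtain the following recurrence formulae for the coefficients $A$ and $B$:

$$\left[A^{(1)}_{lm}r_1^{l}\right] = \left(\frac{r_1}{r_2}\right)^{l}\left[A^{(2)}_{lm}r_2^{l}\right] + \frac{l+1}{2l+1}\,Q^{(1)}_{lm} \tag{130a}$$
$$\left[A^{(2)}_{lm}r_2^{l}\right] = \left(\frac{r_2}{r_3}\right)^{l}\left[A^{(3)}_{lm}r_3^{l}\right] + \frac{l+1}{2l+1}\,Q^{(2)}_{lm} \tag{130b}$$
$$\left[A^{(3)}_{lm}r_3^{l}\right] = \left(\frac{r_3}{r_4}\right)^{l}\left[A^{(4)}_{lm}r_4^{l}\right] + \frac{l+1}{2l+1}\,Q^{(3)}_{lm} \tag{130c}$$
$$\left[A^{(4)}_{lm}r_4^{l}\right] = \frac{l+1}{2l+1}\,Q^{(4)}_{lm} \tag{130d}$$
$$\left[B^{(2)}_{lm}\frac{1}{r_1^{l+1}}\right] = \frac{l+1}{2l+1}\,Q^{(1)}_{lm} \tag{130e}$$
$$\left[B^{(3)}_{lm}\frac{1}{r_2^{l+1}}\right] = \left(\frac{r_1}{r_2}\right)^{l+1}\left[B^{(2)}_{lm}\frac{1}{r_1^{l+1}}\right] + \frac{l+1}{2l+1}\,Q^{(2)}_{lm} \tag{130f}$$
$$\left[B^{(4)}_{lm}\frac{1}{r_3^{l+1}}\right] = \left(\frac{r_2}{r_3}\right)^{l+1}\left[B^{(3)}_{lm}\frac{1}{r_2^{l+1}}\right] + \frac{l+1}{2l+1}\,Q^{(3)}_{lm} \tag{130g}$$
$$\left[B^{(5)}_{lm}\frac{1}{r_4^{l+1}}\right] = \left(\frac{r_3}{r_4}\right)^{l+1}\left[B^{(4)}_{lm}\frac{1}{r_3^{l+1}}\right] + \frac{l+1}{2l+1}\,Q^{(4)}_{lm} \tag{130h}$$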
Fig. 9. Network representation of the algorithm for the iterative calculation of the expansion coefficients $A^{(i)}_{lm} r_i^l$ in the $i$-th layer, $i = N-1, \dots, 1$.
Fig. 10. Network representation of the algorithm for the iterative calculation of the expansion coefficients $B^{(i)}_{lm}/r_{i-1}^{l+1}$ in the $i$-th layer, $i = 2, \dots, N$.
7. Conclusion
We presented a recipe for the factorization of potential and field distributions without utilizing the Addition Theorem. Factorization plays a fundamental role in the majority of standard fast algorithms, e.g., the Fast Multipole Method and its variations. Factorization, as used in this paper, refers to the multiplicative decomposition (tensor product) of the potential functions into functions that depend separately on the co-ordinates of the source and the observation points. The multiplicative decomposition is crucial whenever we need to, e.g., integrate over a large number of source points while keeping the observation point fixed. Such circumstances routinely arise in large-scale engineering computations based on singular surface integrals. The proposed factorization relies on the diagonalization of the underlying partial differential equations with respect to a distinguished direction in space. We outlined our method in detail by introducing a constructive recipe for the diagonalization of Poisson's equation in spherical co-ordinates. We provided a self-sufficient exposition on the factorization of potential and field functions. It can be shown that the proposed recipe is applicable to problems in electrodynamics and elastodynamics involving isotropic, anisotropic, or bianisotropic materials. More generally, our recipe is applicable to any system of linear partial differential equations which can be diagonalized. Recently
we published a conjecture claiming that all linear, physically realizable systems belong to this class of problems. Therefore, it is reasonable to expect that the proposed method will enable us to apply the Fast Multipole Method to those problems in engineering practice for which no Addition Theorem has been formulated so far. To enhance the efficiency of computations further, wavelets [19]-[22] and problem-specific wavelets [23]-[25] can be used to obtain sparse "impedance matrices" in Boundary Element Method applications. Wavelets and frames based on radial functions seem to be very promising tools for solving boundary value problems. On the other hand, the functional dependence of the factorized form in cylindrical and spherical co-ordinates involves radial functions. We are presently investigating the construction of wavelets and frames induced by the radial functions that appear in multipole expansions.
Acknowledgement
This work was done while one of the authors (ARBW) was visiting the Institute for Mathematical Sciences, National University of Singapore and Institute of High Performance Computing (IHPC) in 2003. The visit was supported by the Institute and IHPC.
References
1. J. D. Jackson, "Classical Electrodynamics," John Wiley & Sons, Inc., New York, 1975.
2. L. Greengard, "The Rapid Evaluation of Potential Fields in Particle Systems," MIT, Cambridge, 1987.
3. V. Rokhlin, "Rapid Solution of Integral Equations of Scattering Theory in Two Dimensions," J. Comput. Phys., vol. 86, no. 2, pp. 414-439, 1990.
4. R. Coifman, V. Rokhlin, and S. Wandzura, "The Fast Multipole Method for the Wave Equation: A Pedestrian Prescription," IEEE Antennas Propagat. Mag., vol. 35, no. 3, pp. 7-12, June 1993.
5. C. C. Lu and W. C. Chew, "A Fast Algorithm for Solving Hybrid Integral Equation," IEE Proceedings-H, vol. 140, no. 6, pp. 455-460, December 1993.
6. N. I. Muskhelishvili, "Singular Integral Equations," P. Noordhoff N.V., Groningen, Holland, 1953.
7. C. A. Brebbia, J. C. F. Telles, and L. C. Wrobel, "Boundary Element Techniques," Springer Verlag, 1984.
8. R. F. Harrington, "Field Computation by Moment Methods," New York: Macmillan, 1968.
9. M. M. Ney, "Method of Moments as Applied to Electromagnetic Problems," IEEE Trans. Microwave Theory Tech., vol. MTT-33, pp. 972-980, 1985.
10. R. P. Kanwal, "Generalized Functions: Theory and Technique," Academic Press, 1983.
11. A. R. Baghai-Wadji, "A Unified Approach for Construction of Green's Functions," Habilitation Thesis, Vienna University of Technology, 1994.
12. A. R. Baghai-Wadji, "Theory and Applications of Green's Functions," a chapter in Selected Topics in Electronics and Systems - Vol. 20: Advances in Surface Acoustic Wave Technology, Systems and Applications (Vol. 2), C. C. W. Ruppel and T. A. Fjeldly (Editors), pp. 83-149, World Scientific, 2001.
13. A. R. Baghai-Wadji, "Fast-MoM: A Method-of-Moments Formulation for Fast Computations," in ACES Journal, special issue of the Applied Computational Electromagnetics Society Journal, pp. 75-80, vol. 12, no. 2, 1997.
14. A. R. Baghai-Wadji and D. Penunuri, "Coordinate-Free, Frequency-Independent Universal Functions for BAW Analysis in SAW Devices," in Proc. IEEE Ultrasonics Symp., pp. 287-290, 1995.
15. S. Ramberger and A. R. Baghai-Wadji, "Exact Eigenvector- and Eigenvalue Derivatives with Applications to Asymptotic Green's Functions," Proc. ACES 2002, The 18th Annual Review of Progress in Applied Computational Electromagnetics, March 18-22, Monterey, USA.
16. K. Varis and A. R. Baghai-Wadji, "Pseudo-Spectral Analysis of Radially-Diagonalized Maxwell's Equations in Cylindrical Co-ordinates," Optics Express, vol. 11, no. 23, pp. 3048-3062, 2003.
17. M. T. Manzuri-Shalmani and A. R. Baghai-Wadji, "Elemental Field Distributions in Corrugated Structures with Large-Amplitude Gratings," Electronics Letters, 2003, accepted.
18. A. R. Baghai-Wadji, a book chapter, "A Symbolic Procedure for the Diagonalization of Linear PDEs in Accelerated Computational Engineering," 15 pages, Editors: Franz Winkler and Ulrich Langer, Springer Verlag, 2003.
19. Y. Meyer, "Ondelettes et Operateurs, I: Ondelettes," Hermann, 1990. English translation: "Wavelets and Operators," Cambridge University Press, 1992.
20. I. Daubechies, "Ten Lectures on Wavelets," CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM, 1992.
21. G. G. Walter, "Wavelets and Other Orthogonal Systems with Applications," CRC Press, Inc., 1994.
22. E. Hernandez and G. Weiss, "A First Course on Wavelets," CRC Press, Inc., 1996.
23. A. R. Baghai-Wadji and G. G. Walter, "Green's Function-Based Wavelets," International IEEE-SU, Sonics and Ultrasonics Conference, Puerto Rico, 2000.
24. A. R. Baghai-Wadji and G. G. Walter, "Green's Function-Based Wavelets: Selected Properties," International IEEE-SU, Sonics and Ultrasonics Conference, Puerto Rico, 2000.
25. A. R. Baghai-Wadji and G. G. Walter, "Green's Function Induced Wavelets and Wavelet-like Orthogonal Systems for EM Applications," Proc. ACES 2002, The 18th Annual Review of Progress in Applied Computational Electromagnetics, March 18-22, Monterey, USA.
VIRTUALIZATION-AWARE APPLICATION FRAMEWORK FOR HIERARCHICAL MULTISCALE SIMULATIONS ON A GRID
Aiichiro Nakano, Rajiv K. Kalia, Ashish Sharma, Priya Vashishta
Collaboratory for Advanced Computing and Simulations, Department of Computer Science, Department of Physics & Astronomy, Department of Materials Science & Engineering, University of Southern California, Los Angeles, CA 90089-0242, USA
E-mail: (anakano, rkalia, sharmaa, priyav)@usc.edu
Shuji Ogata
Graduate School of Engineering, Nagoya Institute of Technology, Nagoya 466-8555, Japan
E-mail: [email protected]
Fuyuki Shimojo
Department of Physics, Kumamoto University, Kumamoto 860, Japan
E-mail: [email protected]
A virtualization-aware application framework is developed, based on data-locality principles, to perform hierarchical multiscale simulations of materials on a Grid of distributed computing and visualization platforms. The framework combines: linear-scaling algorithms based on space-time multiresolution techniques; topology-preserving computational-space decomposition with wavelet-based adaptive load balancing; and immersive and interactive visualization of billion-atom datasets. Multiscale simulations are performed on a Grid to seamlessly integrate quantum mechanical calculation based on the density functional theory and atomistic simulation based on the molecular dynamics method.
1. Introduction
Metacomputing on a Grid [1] of geographically distributed Teraflop-to-Petaflop computers and immersive virtual reality environments connected via high-speed networks will revolutionize science and engineering by enabling hybrid simulations that integrate multiple expertise distributed globally [2]. Such multidisciplinary applications are emerging at the forefront of computational science and engineering. For example, hierarchical multiscale simulations embed accurate calculations (e.g., quantum mechanical (QM) calculations to handle chemical reactions) within coarse models (e.g., molecular dynamics (MD) simulation to describe large-scale atomistic processes and finite element (FE) calculation for continuum mechanics) only where and when high-fidelity modeling is required [3-5].
On a Grid, applications will likely be virtualized, i.e., the user will not know on which computers the code is running. However, virtualization of high-end applications (such as billion-atom MD and thousand-atom QM simulations) is an unsolved challenge. To support virtualization, applications need to be: 1) scalable from a single processor to thousands of processors; 2) portable from one architecture to another in terms of performance; and 3) adaptive to dynamically changing computing resources.
In the past years, we have been developing a virtualization-aware application framework for hierarchical multiscale materials simulations, based on data locality principles. Our framework includes $O(N)$ algorithms ($N$ is the problem size) for broad applications such as: 1) space-time multiresolution molecular dynamics (MRMD) algorithm for large-scale atomistic simulations; and 2) linear-scaling density functional theory (LSDFT) algorithm to reduce the exponential complexity of the QM problem. To implement these algorithms on massively parallel and distributed computing platforms, our framework also includes: 1) topology-preserving computational-space decomposition for structured message passing to minimize the latency; and 2) wavelet-based curvilinear-coordinate load balancing to minimize the load-imbalance and communication costs. Finally, the framework enables interactive visualization of high-dimensional datasets from extreme-scale (e.g., billion-atom) simulations in immersive virtual environments, using multilevel, probabilistic and parallel/distributed techniques.
In Sec. 2, we describe the $O(N)$ MRMD and LSDFT algorithms, and Sec. 3 deals with the scalable parallel computing framework. Multiscale simulations and immersive/interactive visualization techniques are presented in Secs. 4 and 5, respectively. Finally, Sec. 6 contains conclusions.
2. Linear-Scaling Atomistic Simulation Algorithms
Spatiotemporal localities inherent in scientific/engineering problems can be utilized to design algorithms with low computational complexities. This section shows two such examples in MD and QM simulations.
2.1. Multiresolution Molecular Dynamics
In the MD approach, one obtains the phase-space trajectories of the system (positions and velocities of all atoms at all times) by numerically integrating Newton's equations of motion [6,7],
$$M_i \frac{d^2 \mathbf{r}_i}{dt^2} = -\frac{\partial E_{\mathrm{MD}}(\mathbf{r}^N)}{\partial \mathbf{r}_i}, \qquad (1)$$
where $M_i$ and $\mathbf{r}_i$ are the mass and position of the $i$-th atom, respectively. In Eq. (1), the atomic force laws describing how atoms interact with each other are encoded in the interatomic potential energy, $E_{\mathrm{MD}}(\mathbf{r}^N)$, which is a function of the positions of all $N$ atoms, $\mathbf{r}^N = \{\mathbf{r}_1, \mathbf{r}_2, \dots, \mathbf{r}_N\}$. In our many-body interatomic potential scheme [8], $E_{\mathrm{MD}}(\mathbf{r}^N)$ is expressed as an analytic function that depends on relative positions of atomic pairs and triplets. The hardest computation in MD simulations is the evaluation of long-range electrostatic forces, which requires $O(N^2)$ operations. The multiresolution molecular dynamics (MRMD) algorithm [6,7,9] reduces this $O(N^2)$ complexity to $O(N)$ by making use of spatial localities based on the fast multipole method (FMM) [10,11].
The first essential idea of the FMM is clustering, i.e., instead of computing interactions between all atomic pairs, atoms are clustered and cluster-cluster interactions are computed (see Fig. 1). At the source of interaction, cluster information is encapsulated in terms of the multipoles of the charge distribution with a well-defined error bound. At the destination, the electrostatic potential is expanded in terms of local terms, which is similar to the Taylor expansion. The second essential idea is to use larger clusters for longer distances, in order to reduce the computation and keep the error constant. This is achieved by recursively subdividing the simulation box into smaller cells to form an octree data structure.
The $O(N)$ algorithm traverses this tree twice. In the upward pass, multipoles are computed for all cells at all levels. First the multipoles of the leaf cells are computed using atomic charges and coordinates, and the multipoles of these children cells are shifted and combined to obtain the multipoles of the parent cells. This procedure is repeated until the root of the octree is reached. In the downward pass, these multipoles are translated to local terms for all cells at all levels, starting from the root. For a given cell at each level, only the multipoles of a constant number of interactive cells contribute to the local terms. Contributions from farther cells have already been computed at the previous coarse level, and they are inherited from the parent cell. On the other hand, the contributions from the nearest-neighbor cells will be computed at the next fine level. This procedure is repeated until we reach the leaf level. Finally, the nearest-neighbor-cell contributions at the leaf level are computed by direct summation over atoms. Since constant computation is performed at each of the $O(N)$ octree nodes, the complexity of the FMM algorithm is $O(N)$. Our scalable and portable parallel FMM algorithm, which has the unique capability to compute stress tensors with a novel complex charge method, is freely available from an open source public library [11].
Fig. 1. Schematic of the FMM. (Left) Atoms (dots) are clustered (squares), and cluster-cluster interactions are computed. (Center) Two-dimensional example of hierarchical clustering, in which the entire system at level $l = 0$ is recursively divided into subsystems. (Right) For a given cell at an octree level (shown in black in the bottom panel), only the multipoles from a constant number of neighbor cells (shown in gray in the bottom panel) are translated to the local terms. Contributions from farther cells (hatched) have already been computed at the previous coarse level, and they are inherited from the parent cell (black in the top panel). On the other hand, translation of the multipoles from the nearest-neighbor cells (white in the bottom panel) will be delegated to the children cells at the next fine level.
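The clustering idea can be illustrated with a minimal Python sketch. It is not the MRMD or FMM implementation: it uses a single cluster and keeps only the lowest-order (monopole) term, placed at the geometric center of the cluster. The function names `direct_potential` and `monopole_potential` and the toy charges are assumptions made for illustration.

```python
import math

def direct_potential(charges, positions, target):
    """Exact potential at `target`: one term per atom."""
    return sum(q / math.dist(p, target) for q, p in zip(charges, positions))

def monopole_potential(charges, positions, target):
    """Far-field approximation: the whole cluster is replaced by its total
    charge placed at the geometric center (lowest-order multipole only)."""
    total = sum(charges)
    n = len(positions)
    center = tuple(sum(p[k] for p in positions) / n for k in range(3))
    return total / math.dist(center, target)

# Hypothetical cluster of four unit charges near the origin, seen from far away.
charges = [1.0, 1.0, 1.0, 1.0]
positions = [(0.1, 0.0, 0.0), (-0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, -0.1, 0.0)]
target = (10.0, 0.0, 0.0)

exact = direct_potential(charges, positions, target)
approx = monopole_potential(charges, positions, target)
rel_error = abs(exact - approx) / abs(exact)
```

The cluster summary (total charge and center) is computed once and can be reused for every distant target, which is what removes the pairwise cost. Higher-order multipoles, omitted here, are what give the FMM its controllable error bound.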
The MRMD algorithm also uses multiple time stepping (MTS) [9,12,13] to take advantage of temporal localities. The MTS uses different force-update schedules for different force components, i.e., forces from the nearest-neighbor atoms are computed at every MD step, and forces from farther atoms are computed less frequently. This not only reduces the computational cost but also enhances the data locality, and accordingly the parallel efficiency is increased. These different force components are combined using a reversible symplectic integrator [12], and the resulting algorithm consists of nested loops to use forces from different spatial regions. It has been proven that the phase-space volume occupied by atoms is a simulation-loop invariant in this algorithm, and this loop invariant results in excellent long-time stability of the solutions.
2.2. Linear-Scaling Density Functional Theory on Hierarchical Grids
Breakage and formation of bonds in chemically reactive regions need to be described by a QM method, which explicitly treats the electronic degrees of freedom. Since each electron's wave function is a linear combination of many states, the combinatorial solution space for the many-electron problem is exponentially large. The density functional theory (DFT) avoids the exhaustive enumeration of many-electron correlations by solving $M$ single-electron problems in a common average environment ($M$ is the number of independent wave functions and is on the order of $N$) [14-16]. As a result, the problem is reduced to a self-consistent matrix eigenvalue problem, which can be solved with $O(M^3)$ operations. The DFT problem can also be formulated as a minimization of the energy, $E_{\mathrm{QM}}(\mathbf{r}^N, \psi^M)$, with respect to electron wave functions, $\psi^M(\mathbf{r}) = \{\psi_1(\mathbf{r}), \psi_2(\mathbf{r}), \dots, \psi_M(\mathbf{r})\}$, subject to orthonormalization constraints between the wave functions. We include electron-ion interaction using norm-conserving pseudopotentials [17,18] and the exchange-correlation energy associated with electron-electron interaction in a generalized gradient approximation [19]. For efficient parallel implementation of DFT, we have developed a hierarchical real-space grid method [20,21], based on higher-order finite differencing and multigrid acceleration [22-24].
In the hierarchical grid method, these real-space multigrids are adaptively refined [25] near each atom to accurately describe pseudopotentials (see Fig. 2). For larger systems ($M > 1{,}000$), however, the $O(M^3)$ orthonormalization becomes the bottleneck, and hence linear-scaling DFT algorithms become essential [26]. We have implemented [21] an $O(M)$ algorithm [27] based on the data locality principle called "quantum nearsightedness" [28]: the observation that, for most materials at most temperatures, the off-diagonal elements of the density matrix,
$$\rho(\mathbf{r},\mathbf{r}') = \sum_{n=1}^{M} \psi_n(\mathbf{r})\,\psi_n^*(\mathbf{r}'), \qquad (2)$$
decay exponentially, i.e., $\rho(\mathbf{r},\mathbf{r}') \propto \exp(-C|\mathbf{r}-\mathbf{r}'|)$ for $|\mathbf{r}-\mathbf{r}'| \to \infty$ ($C$ is a constant). Such a diagonally dominant matrix can be represented by maximally localizing each wave function, $\psi_n(\mathbf{r})$, by a unitary transformation and then truncating it with a finite cut-off radius. In addition, a Lagrange-multiplier-like technique is used to perform unconstrained minimization, avoiding the $O(M^3)$ orthonormalization procedure.
Fig. 2. Schematic of hierarchical real-space grids. Coarse multigrids (gray) are used to accelerate iterative solutions of the DFT problem. Fine grids (meshes in the bottom panel) are adaptively generated near the atoms (spheres in the bottom panel) to accurately operate pseudopotentials on the wave functions.
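The practical consequence of the exponential decay of the density matrix can be sketched in Python. The model below is an assumption made purely for illustration, not the LSDFT code: a one-dimensional chain of sites with $\rho_{ij} = \exp(-C|i-j|)$, truncated at a hypothetical cut-off `r_cut` measured in site spacings.

```python
import math

def truncated_density_matrix(m, decay, r_cut):
    """Sparse rows {j: value} of a model matrix exp(-decay*|i-j|),
    keeping only the elements with |i - j| <= r_cut."""
    rows = []
    for i in range(m):
        row = {}
        for j in range(max(0, i - r_cut), min(m, i + r_cut + 1)):
            row[j] = math.exp(-decay * abs(i - j))
        rows.append(row)
    return rows

def stored_elements(rows):
    """Number of matrix elements kept after truncation."""
    return sum(len(row) for row in rows)

# Doubling the system size adds a fixed number of stored elements per site.
small = stored_elements(truncated_density_matrix(100, decay=1.0, r_cut=5))
large = stored_elements(truncated_density_matrix(200, decay=1.0, r_cut=5))

# Magnitude of the largest element discarded by the truncation.
largest_dropped = math.exp(-1.0 * 6)
```

In this toy model the storage grows linearly with the number of sites instead of quadratically, while every discarded element is bounded by the decay law evaluated just beyond the cut-off. This is the trade-off between cost and accuracy that the cut-off radius controls.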
3. Scalable Parallel Computing Framework
Data locality principles are key to developing a scalable parallel computing framework as well. For parallelization of the MRMD and LSDFT algorithms, we have developed a topology-preserving computational spatial decomposition scheme [6,7] to minimize latency through structured message passing [9] and load-imbalance/communication costs through a novel wavelet-based load-balancing scheme [29]. In spatial decomposition, the total volume of the system is divided into $P$ subsystems of equal volume, and each subsystem is assigned to a processor in an array of $P$ processors. To calculate the force on an atom in a subsystem, the coordinates of the atoms in the boundaries of neighbor subsystems are "cached" from the corresponding processors. After the atomic positions are updated by a time-stepping procedure, some atoms may have moved out of their subsystem. These atoms are "migrated" to the proper neighbor processors. With the spatial decomposition, the computation scales as $N/P$ while communication scales in proportion to $(N/P)^{2/3}$ for an $N$-atom system.
Many MD simulations are characterized by irregular atomic distribution and associated load imbalance. For irregular data structures, the number of atoms assigned to each processor varies significantly, and this load imbalance degrades the parallel efficiency. The load-balancing problem can be stated as an optimization problem, i.e., one minimizes the load-imbalance cost as well as the size and the number of messages:
$$T = t_{\mathrm{comp}}\Big(\max_p \big|\{i \mid \mathbf{r}_i \in p\}\big|\Big) + t_{\mathrm{comm}}\Big(\max_p \big|\{i \mid \|\mathbf{r}_i - \partial p\| < r_c\}\big|\Big) + t_{\mathrm{latency}}\Big(\max_p \big[N_{\mathrm{message}}(p)\big]\Big), \qquad (3)$$
where the three terms are the load-imbalance cost, the size of messages, and the number of messages, respectively. In Eq. (3), $\partial p$ and $N_{\mathrm{message}}(p)$ denote the boundary surface of the physical volume assigned to processor $p$ and the number of messages per MD step for $p$, respectively. The expression $\max_p f(p)$ denotes the maximum value of function $f(p)$ over all the processors, and $r_c$ is the range of the interatomic potential. The prefactors $t_{\mathrm{comp}}$, $t_{\mathrm{comm}}$ and $t_{\mathrm{latency}}$ are constants related to the processor speed, communication bandwidth and latency, respectively, and they are determined experimentally by test runs on the parallel computer under consideration.
To minimize the number of messages, we preserve the 3D mesh topology, so that message passing is performed in a structured way in only 6 steps. To minimize the load-imbalance cost as well as the message size, we have developed a computational-space decomposition scheme [30]. The main idea of this scheme is that the computational space shrinks where the workload density is high and expands where the density is low, so that the workload is uniformly distributed in the computational space. To implement the curved computational space, we introduce a curvilinear coordinate transformation,
$$\boldsymbol{\xi} = \mathbf{x} + \mathbf{u}(\mathbf{x}), \qquad (4)$$
where $\mathbf{x}$ is a position in the physical Euclidean space and $\mathbf{u}(\mathbf{x})$ is a deformation field. We then use regular 3D mesh topology in the computational space, $\boldsymbol{\xi}$, to map atom $i$ to processor $p$ in an array of $P_x \times P_y \times P_z$ processors:
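$$p(\boldsymbol{\xi}_i) = p_x(\xi_{ix}) P_y P_z + p_y(\xi_{iy}) P_z + p_z(\xi_{iz}), \qquad p_\alpha(\xi_{i\alpha}) = \lfloor \xi_{i\alpha} P_\alpha / L_\alpha \rfloor \quad (\alpha = x, y, z), \qquad (5)$$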
where $\boldsymbol{\xi}_i = (\xi_{ix}, \xi_{iy}, \xi_{iz})$ is the coordinate of atom $i$ and $L_\alpha$ is the simulation box size in the $\alpha$ direction in the computational space. This regular 3D mesh partition in the computational space results in curved partition boundaries in the physical space, $\mathbf{x}$. The load-imbalance and communication costs are minimized as a functional of the coordinate transformation, $\boldsymbol{\xi}(\mathbf{x})$, using simulated annealing. We have found that wavelet representation leads to compact representation of curved partition boundaries, and accordingly to fast convergence of the minimization procedure [29].
Another critical issue in high-end parallel/distributed computing is the handling of large datasets. For example, a 1.5-billion-atom MD simulation we are currently performing produces 150 GB of data per time step (or every few seconds), including atomic species, positions, velocities, and stresses. For scalable input/output (I/O) of such large datasets, we have designed a data compression algorithm based on data localities [31]. It uses octree indexing and sorts atoms accordingly on the resulting space-filling curve, which is a mapping from the three-dimensional space to a one-dimensional list that preserves the spatial proximity of successive list elements. By storing differences between successive atomic coordinates, the I/O requirement for a given error tolerance reduces from $O(N \log N)$ to $O(N)$. An adaptive, variable-length encoding scheme is used to make the scheme tolerant to outliers. An order-of-magnitude improvement in the I/O performance was achieved for actual MD data with user-controlled error bound.
The $O(N)$ algorithms in Sec. 2 and the parallel computing framework in this section have been combined to achieve scalability up to 6.4 billion-atom MRMD and 0.44 million-electron LSDFT simulations on multi-Teraflop architectures [6].
4. Hierarchical Multiscale Simulations on a Grid
Our hierarchical multiscale simulation framework [4,5] consists of: i) hierarchical division of the physical system into subsystems of decreasing sizes and increasing quality-of-solution (QoSn) requirements (e.g., needs for describing nonlinear atomistic processes or chemical reactions), $S_0 \supset S_1 \supset \dots \supset S_n$; and ii) a suite of simulation services, $M_\alpha$ ($\alpha$
$= 0, 1, \dots, n$), of ascending order of accuracy (e.g., FE $\prec$ MD $\prec$ DFT); see Fig. 3. In this scheme, an accurate estimate of the energy of the entire system is obtained from the recurrence relation [32],
$$E_\alpha(S_i) = E_{\alpha-1}(S_i) + E_\alpha(S_{i+1}) - E_{\alpha-1}(S_{i+1}), \qquad (6)$$
where $E_\alpha(S_i)$ is the energy of system $S_i$ calculated with method $M_\alpha$. This modular, additive hybridization scheme has minimal interdependency and communication between simulation modules. Other physical quantities such as interatomic forces are obtained by derivatives of Eq. (6), and accordingly are additive as well.
Fig. 3. Extrapolation in the two-dimensional meta-model space in a hierarchical multiscale simulation. Recursive applications of Eq. (6) accurately describe a large system (denoted by star), using less compute-intensive calculations (circles). Axes: system size (small to large) and level of theory (coarse to accurate).
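The recursive use of Eq. (6) can be sketched in Python as follows. This is a hedged illustration rather than the authors' code. The callback `direct` is hypothetical and stands for running method $M_\alpha$ on subsystem $S_i$. The stopping rule, that $M_\alpha$ is applied directly only when $\alpha \le i$, is an assumption introduced here to terminate the recursion. The toy energies are made-up numbers.

```python
def hybrid_energy(alpha, i, direct):
    """Estimate E_alpha(S_i) with the additive recurrence of Eq. (6).

    direct(alpha, i) is a hypothetical callback that runs method M_alpha
    on subsystem S_i. Assumption: M_alpha is affordable on S_i only when
    alpha <= i; otherwise the energy is extrapolated via Eq. (6).
    """
    if alpha <= i:
        return direct(alpha, i)
    return (hybrid_energy(alpha - 1, i, direct)
            + hybrid_energy(alpha, i + 1, direct)
            - hybrid_energy(alpha - 1, i + 1, direct))

# Made-up energies for methods FE=0, MD=1, DFT=2 on nested systems S_0, S_1, S_2.
toy = {(0, 0): -100.0, (0, 1): -10.0, (1, 1): -12.0, (1, 2): -3.0, (2, 2): -3.5}
calls = []  # records which (method, subsystem) pairs were actually computed

def direct(alpha, i):
    calls.append((alpha, i))
    return toy[(alpha, i)]

# DFT-quality estimate for the whole system S_0.
estimate = hybrid_energy(2, 0, direct)
```

Unrolling the recursion for three levels gives $E_0(S_0) + [E_1(S_1) - E_0(S_1)] + [E_2(S_2) - E_1(S_2)]$, so the most expensive method is only ever run on the smallest subsystem.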
We have used the additive hybridization framework to perform: 1) MD/DFT simulations of crack initiation in Si in the presence of water molecules [33]; 2) FE/MD simulations of stress distributions at Si/amorphous Si3N4 interfaces [34]; and 3) multiscale FE/MD/DFT simulations of oxidation in Si [4]. We have also performed a multidisciplinary, collaborative MD/DFT simulation on a Grid of geographically distributed Linux clusters in the US and Japan, based on the modular, additive hybridization scheme (see Fig. 4) [2]. The multiscale MD/QM simulation code has been Grid-enabled based on a divide-and-conquer scheme, in which the QM region is a union of multiple QM clusters.
Fig. 4. Multiscale MD/DFT simulation of the reaction of water at a crack tip in silicon (top), on a Grid of distributed Linux clusters in the US and Japan (bottom). In this figure, five QM calculations (circles) around five water molecules are embedded in an MD simulation.
Since the energy is a sum of the QM energy corrections for the clusters in the additive divide-and-conquer hybridization scheme,
$$E = E_{\mathrm{MD}}(\mathrm{system}) + \sum_{\mathrm{cluster}} \big[E_{\mathrm{DFT}}(\mathrm{cluster}) - E_{\mathrm{MD}}(\mathrm{cluster})\big], \qquad (7)$$
each QM-cluster calculation does not access the atomic coordinates in the other clusters, and accordingly its parallel implementation involves no inter-QM-cluster communication. Furthermore, the multiple-QM-cluster scheme is computationally more efficient than the single-QM-cluster scheme because of the $O(N^3)$ scaling. (The large prefactor of $O(N)$ DFT algorithms makes conventional $O(N^3)$ algorithms faster below a few hundred atoms [21].) The hybrid MD/DFT simulation algorithm has been implemented on parallel computers by first dividing processors into the MD and DFT calculations (task decomposition) and then using spatial decomposition in each task. The additive hybridization scheme makes the MD and DFT subtasks entirely independent except for the exchange of cluster-atom coordinates and calculated forces. The MD processors compute the
energy and forces of the entire system and send the atomic coordinates of the QM clusters to each of the QM processor groups. Subsequently, the MD and QM processors independently perform the MD and QM computations on the atomic clusters. The QM energy and forces are then returned to the MD processors, where the total energy and corresponding forces are calculated and the equations of motion are integrated to update the atomic positions and velocities. The communications between the MD and QM processors are minimal, since the MD processors only need to send several hundred atomic coordinates to each QM cluster, which in return sends back several hundred calculated force components.
We have implemented the multiscale MD/DFT simulation algorithm as a single MPI (Message Passing Interface) [35] program. The Globus middleware [1] and the Grid-enabled MPI implementation, MPICH-G2 [36], have been used to implement the MPI-based multiscale MD/DFT simulation code in a Grid environment. In the initial implementation, processors on multiple PC clusters are statically allocated using a host file. The user specifies the number of processors for each QM-cluster calculation in a configuration file. In more recent MD/DFT simulations, a simple local error indicator based on atomic bond lengths has been used to automatically change the size of QM calculations at run time. The Gridified MD/QM simulation code has been used to study environmental effects of water molecules on fracture in silicon. A preliminary run of the code has achieved a parallel efficiency of 94% on 25 PCs distributed over 3 PC clusters in the US and Japan; see Fig. 4.
5. Immersive and Interactive Visualization of Massive Datasets
Data locality principles also play a critical role in designing scalable visualization techniques. Interactive visualization/mining of high-dimensional datasets is essential for understanding hybrid multiscale material simulations [37]. An immersive virtual environment is ideal for interactively exploring complex material processes, e.g., in nanoceramics and nanocomposites. We have an immersive and interactive visualization platform called ImmersaDesk, which is used to render billion-atom datasets at a near-interactive speed. To achieve this capability, we have developed a visualization system based on a parallel and distributed approach with a Linux cluster to efficiently select a data subset within the field-of-view (view-frustum culling) using the octree data structure (see Fig. 5) [38,39].
Fig. 5. Schematic of octree-based view-frustum culling. (Left) Extraction of octree nodes (in gray) that contain the viewer's field-of-view (triangle). (Right) Recursive traversal of the octree to check if the nodes at different levels intersect with the field-of-view. If a parent node is found to be visible, then its children are tested. When applied to the entire octree, the visible nodes are extracted in $O(\log N)$ time.
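The recursive traversal in Fig. 5 can be sketched in Python under simplifying assumptions. The octree is implicit (cells are generated by halving a cubic box), and the field-of-view is stood in for by an axis-aligned box rather than a true view frustum. The function names `boxes_overlap` and `visible_leaves` are hypothetical, and this is not the authors' visualization code.

```python
def boxes_overlap(lo_a, hi_a, lo_b, hi_b):
    """True if two axis-aligned boxes intersect (touching counts)."""
    return all(lo_a[k] <= hi_b[k] and lo_b[k] <= hi_a[k] for k in range(3))

def visible_leaves(lo, hi, depth, view_lo, view_hi):
    """Return the leaf cells (lo, hi) of an implicit octree that overlap
    the view box. Children are tested only if their parent is visible."""
    if not boxes_overlap(lo, hi, view_lo, view_hi):
        return []  # the whole subtree is culled without visiting it
    if depth == 0:
        return [(lo, hi)]
    mid = tuple((lo[k] + hi[k]) / 2 for k in range(3))
    found = []
    for octant in range(8):
        # Bit k of `octant` selects the lower or upper half along axis k.
        child_lo = tuple(mid[k] if (octant >> k) & 1 else lo[k] for k in range(3))
        child_hi = tuple(hi[k] if (octant >> k) & 1 else mid[k] for k in range(3))
        found += visible_leaves(child_lo, child_hi, depth - 1, view_lo, view_hi)
    return found

# An 8 x 8 x 8 box subdivided three times gives 512 unit leaf cells.
leaves = visible_leaves((0.0, 0.0, 0.0), (8.0, 8.0, 8.0), 3,
                        (0.5, 0.5, 0.5), (1.5, 1.5, 1.5))
```

Only subtrees whose bounding box touches the view region are descended, so most of the 512 leaves are never examined. A production system would replace the box test with a plane-based frustum test.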
We have also developed a novel probabilistic approach to efficiently remove hidden atoms (occlusion culling) far from the viewer [38,39]. This approach is based on a recurrence relation,
$$v_c = (1 - D_c)\, v_{c-1}, \qquad (8)$$
where $v_c$ is the visibility (i.e., the fraction of atoms that are probably seen by the user) of the $c$-th octree cell and $D_c$ is the density of atoms (normalized in the range between 0 and 1) of the $c$-th octree cell. In Eq. (8), the leaf octree cells are ordered in an ascending order of distance from the user, based on a line-drawing algorithm. When the viewer is moving, the probabilistic occlusion culling is activated to decimate atoms with probability $1 - v_c$, with typically a few percent of pixel loss. Furthermore, we use a machine-learning approach to predict the user's next movement and prefetch data from the Linux cluster to the graphics server [40]. Using this visualization system, we have demonstrated interactive visualization of a billion-atom dataset in an immersive virtual reality environment.
6. Conclusion
In summary, we are developing a virtualization-aware framework for hierarchical multiscale simulations based on data locality principles.
Such a framework will become increasingly important in the coming era of "cyber-infrastructures for science and engineering," when globally distributed multidisciplinary teams will collaborate on a Grid. The scope of hierarchical multiscale simulations is rapidly expanding. For example, we have recently performed multiscale FE/MD/DFT simulations to study the oxidation of a silicon surface.4 Such hierarchical multiscale simulations will enhance scalability on Grid architectures by performing more accurate but less scalable computations only when and where they are needed, and they are expected to play a significant role in scientific research and engineering developments in the cyber-infrastructure era.

Acknowledgments

This work was partially supported by AFRL, ARL, DOE, NSF, and USC-Berkeley-Princeton DURINT. Benchmark tests were performed at the Department of Defense (DoD) Major Shared Resource Center under a DoD Challenge Project. Parallel simulations were also performed on the 1,512-processor HPC cluster at the Research Computing Facility and on 400+ processor Linux clusters at the Collaboratory for Advanced Computing and Simulations at the University of Southern California.
References

1. I. Foster and C. Kesselman, The Grid 2: Blueprint for a New Computing Infrastructure (Morgan Kaufmann, San Francisco, 2003).
2. H. Kikuchi, R. K. Kalia, A. Nakano, et al., Proc. Supercomputing '02 (IEEE, 2002).
3. J. Q. Broughton, F. F. Abraham, N. Bernstein, et al., Phys. Rev. B 60, 2391 (1999).
4. S. Ogata, E. Lidorikis, F. Shimojo, et al., Comput. Phys. Commun. 138, 143 (2001).
5. A. Nakano, M. E. Bachlechner, R. K. Kalia, et al., Comput. Sci. Eng. 3(4), 56 (2001).
6. A. Nakano, R. K. Kalia, P. Vashishta, et al., Scientific Programming 10, 263 (2002).
7. A. Nakano, T. J. Campbell, R. K. Kalia, et al., in Handbook of Numerical Analysis, Special Volume on Computational Chemistry, ed. C. Le Bris (Elsevier, Amsterdam, 2003) p. 639.
8. P. Vashishta, R. K. Kalia, and A. Nakano, J. Nanoparticle Research 5, 119 (2003).
9. A. Nakano, R. K. Kalia, and P. Vashishta, Comput. Phys. Commun. 83, 197 (1994).
10. L. Greengard and V. Rokhlin, J. Comput. Phys. 73, 325 (1987).
11. S. Ogata, T. J. Campbell, R. K. Kalia, et al., Comput. Phys. Commun. 153, 445 (2003).
12. G. J. Martyna, M. E. Tuckerman, D. J. Tobias, et al., J. Chem. Phys. 101, 4177 (1994).
13. A. Nakano, Comput. Phys. Commun. 105, 139 (1997).
14. P. Hohenberg and W. Kohn, Phys. Rev. 136, B864 (1964).
15. W. Kohn and L. J. Sham, Phys. Rev. 140, A1133 (1965).
16. W. Kohn and P. Vashishta, in Inhomogeneous Electron Gas, eds. N. H. March and S. Lundqvist (Plenum, 1983) p. 79.
17. N. Troullier and J. L. Martins, Phys. Rev. B 43, 8861 (1991).
18. R. D. King-Smith, M. C. Payne, and J. S. Lin, Phys. Rev. B 44, 13063 (1991).
19. J. P. Perdew, K. Burke, and M. Ernzerhof, Phys. Rev. Lett. 77, 3865 (1996).
20. S. Ogata, F. Shimojo, R. K. Kalia, et al., Comput. Phys. Commun. 149, 30 (2002).
21. F. Shimojo, R. K. Kalia, A. Nakano, et al., Comput. Phys. Commun. 140, 303 (2001).
22. J. R. Chelikowsky, Y. Saad, S. Öğüt, et al., Phys. Stat. Sol. (b) 217, 173 (2000).
23. J.-L. Fattebert and J. Bernholc, Phys. Rev. B 62, 1713 (2000).
24. T. L. Beck, Rev. Mod. Phys. 72, 1041 (2000).
25. T. Ono and K. Hirose, Phys. Rev. Lett. 82, 5016 (1999).
26. S. Goedecker, Rev. Mod. Phys. 71, 1085 (1999).
27. F. Mauri and G. Galli, Phys. Rev. B 50, 4316 (1994).
28. W. Kohn, Phys. Rev. Lett. 76, 3168 (1996).
29. A. Nakano, Concurrency: Practice and Experience 11, 343 (1999).
30. A. Nakano and T. J. Campbell, Parallel Comput. 23, 1461 (1997).
31. A. Omeltchenko, T. J. Campbell, R. K. Kalia, et al., Comput. Phys. Commun. 131, 78 (2000).
32. S. Dapprich, I. Komaromi, K. S. Byun, et al., J. Mol. Struct. (Theochem) 461-462, 1 (1999).
33. S. Ogata, F. Shimojo, R. K. Kalia, et al., J. Appl. Phys. (2004) in press.
34. E. Lidorikis, M. E. Bachlechner, R. K. Kalia, et al., Phys. Rev. Lett. 87, 086104 (2001).
35. W. Gropp, E. Lusk, and A. Skjellum, Using MPI, 2nd Ed. (MIT Press, Cambridge, 1999).
36. N. T. Karonis, B. Toonen, and I. Foster, J. Parallel Distributed Comput. 63, 551 (2003).
37. A. Nakano and J. X. Chen, Comput. Sci. Eng. 5, 14 (2003).
38. A. Sharma, A. Nakano, R. K. Kalia, et al., Presence-Teleoperators and Virtual Environments 12, 85 (2003).
39. A. Sharma, R. K. Kalia, A. Nakano, et al., Comput. Sci. Eng. 5(2), 26 (2003).
40. X. Liu, A. Sharma, P. Miller, et al., in Proc. Int'l Conf. on Parallel & Distributed Processing Techniques & Applications IV (CSREA Press, 2002) p. 2054.
MOLECULAR DYNAMICS SIMULATION AND LOCAL QUANTITIES
Tamio Ikeshoji
Research Institute for Computational Sciences (RICS), National Institute of Advanced Industrial Science and Technology (AIST), AIST Tsukuba Central 2, Umezono 1-1-1, Tsukuba, 305-8568, Japan
E-mail: [email protected]

Various roles of molecular dynamics simulation are discussed first. Then, in order to bridge macroscopic simulation and atomic-level simulation, methods for calculating local quantities, particularly local pressure, in molecular dynamics simulations are discussed. It is necessary to integrate the force between atoms along the path connecting the atoms within the local volume. This integration corresponds to the virial term used in molecular dynamics calculations under periodic boundary conditions.

1. Introduction

1.1. Equilibrium and non-equilibrium

Molecular dynamics (MD) gives the trajectories of all atoms in the simulated system. MD has been developed and is used to calculate various physical properties relating to thermodynamics and kinetics under equilibrium conditions, and to calculate various processes under non-equilibrium conditions.

1.1.1. Thermodynamic states

Under equilibrium, the time average is what matters in MD calculations; the time development therefore does not need to be realistic. Many methods giving various thermodynamic states have been proposed. Nosé's thermostat [1], the Parrinello-Rahman constant-pressure algorithm [2], and so on are historically important and are generally used in present-day MD simulations. Recently, some new ensemble methods, such as the multi-canonical ensemble method, have been presented and are being tested.
1.1.2. Kinetic parameters

Some kinetic parameters can be obtained by equilibrium MD. The activation energy, for instance, is calculated by constrained MD, in which bond distances, configurations, etc. are constrained. The blue-moon sampling method [3] is an example that uses constraints. Efficient methods to simulate rare events have recently been presented, for instance the meta-dynamics method [5].

1.2. NEMD (NonEquilibrium Molecular Dynamics)

Molecular dynamics has another aspect, which concerns the time development of systems. In this case, the time development must be physically meaningful, though individual atomic trajectories may not be correct. MD simulations of fractures, phase transition processes, dislocations, etc. are in this category. This kind of simulation can be called nonequilibrium MD (NEMD). Usually, however, NEMD refers to simulations at the steady state under non-equilibrium conditions, such as simulations of thermal flow, mass flow, and so on.

1.2.1. Boundary-driven NEMD

By simple analogy with heat flow in a real system, two regions of high temperature and low temperature are artificially created in an MD system. They can be placed periodically under the periodic boundary conditions [6]. This method can give the thermal conductivity, the Soret coefficient, etc. The idea can be extended to mass flow in a two-component system: a steady-state concentration gradient is obtained by swapping masses in the concentration-controlled regions [7].

1.2.2. Perturbed NEMD

In boundary-driven NEMD, steady flow is achieved with a steady gradient. Steady flow can also be realized under a constant field by applying an additional force to the atoms [8]. An example is ionic flow under a constant electric potential gradient. Steady thermal flow can also be obtained. Transport properties are calculated from linear response theory.
Local Quantities
(Pressure) in Molecular
Dynamics
1.2.3. Unsteady state

MD simulations of unsteady states are becoming popular because longer simulation times are now possible. Here the time development itself is the target. Various mechanical processes at the atomistic scale, such as fractures, dislocations, martensitic transitions, and so on, are in this category. Other examples are the formation of atomic clusters [9] and phase transition processes.

1.3. Bridge to macroscopic systems

Unsteady-state process simulations are often done to mimic a real large system: fracturing, sintering, phase separation, and so on. In order to analyze the simulation results, it is necessary to calculate local quantities such as pressure, temperature, concentration, heat flux, etc., because the system is not uniform. In this article, methods for calculating local quantities, particularly local pressure, are discussed. In continuum dynamics, these quantities are well defined. In molecular dynamics, however, the system consists of atoms and molecules, which do not form a continuum. The calculation of local quantities will become important as large-scale and multi-scale simulation methods are developed.

2. Local quantities

2.1. Local quantities in volume
A quantity per volume, $A$, in a local volume, $V_{local}$ (see Fig. 1), is easily calculated when the corresponding atomic quantity, $a_i$, of atom $i$ is given. It is simply a summation over the atoms in the volume $V_{local}$:

$$A(V_{local}) = \frac{1}{V_{local}} \sum_{i \in V_{local}} a_i. \tag{1}$$

Fig. 1. Local quantity in volume $V_{local}$.

The local density, $\rho$, is, for instance, calculated in this way from the atomic mass, $m_i$, of atom $i$ as

$$\rho(V_{local}) = \frac{1}{V_{local}} \sum_{i \in V_{local}} m_i, \tag{2}$$

where the summation is over all atoms in the volume $V_{local}$. A thermodynamically meaningful ensemble-average quantity, $\bar{A}$, is the time average over the MD time steps of the atomic quantity in the volume:

$$\bar{A}(V_{local}) = \langle A(V_{local}) \rangle \tag{3}$$

$$= \frac{1}{V_{local}\,(t_2 - t_1 + 1)} \sum_{step=t_1}^{t_2} \; \sum_{i \in V_{local}} a_i(step), \tag{4}$$

where $\langle \cdots \rangle$ denotes the ensemble average. The local volume is assumed here to be constant. The kinetic temperature in $V_{local}$, $T(V_{local})$, can be calculated in this way (for three-dimensional coordinates):

$$T(V_{local}) = \left\langle \frac{1}{3 n_{local} k_B} \sum_{i \in V_{local}} m_i v_i^2 \right\rangle, \tag{5}$$

where $k_B$ is the Boltzmann constant and $n_{local}$ is the total number of atoms in the local volume. When there is a net or local flow in the system, the above equation must be modified as

$$T(V_{local}) = \left\langle \frac{1}{3 n_{local} k_B} \sum_{i \in V_{local}} m_i (\mathbf{v}_i - \mathbf{v}_0)^2 \right\rangle, \tag{6}$$

where $\mathbf{v}_0$ is the time-average flow velocity in the volume:

$$\mathbf{v}_0 = \frac{1}{M} \sum_{step=t_1}^{t_2} \; \sum_{i \in V_{local}} m_i \mathbf{v}_i, \tag{7}$$

$$M = \sum_{step=t_1}^{t_2} \; \sum_{i \in V_{local}} m_i. \tag{8}$$
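As a concrete illustration of Eqs. (2) and (6)-(8), the following Python sketch computes the time-averaged local density and the flow-corrected kinetic temperature in a rectangular local volume. The data layout (a list of frames, each a list of `(mass, position, velocity)` tuples), the box-shaped local volume, and all function names are assumptions made for this example; steps in which the volume is empty are simply skipped in the temperature average, and at least one atom is assumed to visit the volume.

```python
def in_box(r, lo, hi):
    return all(lo[k] <= r[k] < hi[k] for k in range(3))


def local_density(frames, lo, hi):
    """Time average of Eq. (2) over all frames."""
    volume = (hi[0] - lo[0]) * (hi[1] - lo[1]) * (hi[2] - lo[2])
    total_mass = 0.0
    for frame in frames:
        total_mass += sum(m for m, r, v in frame if in_box(r, lo, hi))
    return total_mass / (volume * len(frames))


def flow_velocity(frames, lo, hi):
    """Mass-weighted mean velocity in the volume, Eqs. (7) and (8)."""
    mass = 0.0
    momentum = [0.0, 0.0, 0.0]
    for frame in frames:
        for m, r, v in frame:
            if in_box(r, lo, hi):
                mass += m
                for k in range(3):
                    momentum[k] += m * v[k]
    return [c / mass for c in momentum]


def local_temperature(frames, lo, hi, k_b=1.0):
    """Kinetic temperature with the flow velocity removed, Eq. (6)."""
    v0 = flow_velocity(frames, lo, hi)
    acc, used = 0.0, 0
    for frame in frames:
        inside = [(m, v) for m, r, v in frame if in_box(r, lo, hi)]
        if not inside:
            continue  # no atoms in the local volume at this step
        s = sum(m * sum((v[k] - v0[k]) ** 2 for k in range(3))
                for m, v in inside)
        acc += s / (3.0 * len(inside) * k_b)
        used += 1
    return acc / used


# Two atoms drifting with mean velocity 5 in x, plus one atom outside the box.
frame = [(2.0, (0.5, 0.5, 0.5), (6.0, 0.0, 0.0)),
         (2.0, (0.2, 0.2, 0.2), (4.0, 0.0, 0.0)),
         (2.0, (5.0, 5.0, 5.0), (100.0, 0.0, 0.0))]
frames = [frame, frame]
lo, hi = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)
rho = local_density(frames, lo, hi)
temp = local_temperature(frames, lo, hi)
```

Without the subtraction of the flow velocity, the drift of the two atoms would be counted as thermal motion and the temperature would be strongly overestimated.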
2.2. Local quantities at point

It is mathematically possible to define a local quantity at a point, $\mathbf{R}$, using a delta function, $\delta$, as

$$A(\mathbf{R}) = \sum_{i}^{all} a_i\, \delta(\mathbf{r}_i - \mathbf{R}), \tag{9}$$

in which the delta function has the dimension of length$^{-3}$. It is zero except at $\mathbf{r} = 0$, but its integral over the whole space is equal to unity:

$$\delta(\mathbf{r}) = 0 \quad (\mathbf{r} \neq 0), \tag{10}$$

$$\int_{-\infty}^{\infty} \delta(\mathbf{r})\, d\mathbf{r} = 1. \tag{11}$$

The density at $\mathbf{R}$ can be written with this definition as

$$\rho(\mathbf{R}) = \sum_{i}^{all} m_i\, \delta(\mathbf{r}_i - \mathbf{R}). \tag{12}$$

Integration of this equation gives the local density in the volume:

$$\rho(V_{local}) = \frac{1}{V_{local}} \int_{V_{local}} \rho(\mathbf{R})\, d\mathbf{R} \tag{13}$$

$$= \frac{1}{V_{local}} \int_{V_{local}} \sum_{i}^{all} m_i\, \delta(\mathbf{r}_i - \mathbf{R})\, d\mathbf{R} \tag{14}$$

$$= \frac{1}{V_{local}} \sum_{i}^{all} \iiint_{V_{local}} m_i\, \delta(x_i - x)\, \delta(y_i - y)\, \delta(z_i - z)\, dx\, dy\, dz \tag{15}$$

$$= \frac{1}{V_{local}} \sum_{i \in V_{local}} m_i. \tag{16}$$

The final summation is over all atoms in the volume. This equation is, of course, equivalent to Eq. (2), and it is the form used in practice in MD calculations. It is impossible to use Eq. (9), which includes the delta function, directly.

3. Local pressure

Pressure is defined mechanically as a force onto a surface, or thermodynamically as the energy necessary to change a volume. From these definitions, it is impossible to conceive of the pressure of a single atom, so it seems difficult to define a local pressure at a point. It was, however, already shown by Irving and Kirkwood [10] to be possible, as described here. In this section, two similar terms, the pressure tensor $\mathbf{P}$ and the stress tensor $\boldsymbol{\sigma}$, are used. Since pressure is a thermodynamic quantity and stress is a mechanical quantity, the former is defined as a time-average value and the latter can be calculated instantaneously. Both are tensors, and they are related by

$$\mathbf{P} = -\langle \boldsymbol{\sigma} \rangle. \tag{17}$$

In this section, we consider mainly systems consisting of particles interacting through a pairwise potential.

3.1. Local pressure as a force onto a surface
When a surface of finite size is considered, as shown in Fig. 2, the pressure onto this surface, i.e. the sum of the forces $\mathbf{f}_{ij}$ between atoms $i$ and $j$ acting across this plane, appears to be defined as a force per surface area:

$$P^{(\alpha)} = \frac{1}{S_{local}} \left\langle \sum_{i \in left} \; \sum_{j \in right} f_{ij}^{(\alpha)} \right\rangle, \tag{18}$$

where the direction is shown by the superscript $(\alpha)$. This definition is arbitrary, however, since we do not know how the force acts across the surface. Its path may be straight, or it may be curved so as to avoid the surface, as shown in Fig. 2. If we consider an infinitely large surface, the pressure on the surface can be calculated, since the force must pass through the surface. This is known as the method of plane (MOP), which was proposed by Todd and Evans [16] to calculate the local pressure on an infinitely large surface.

Fig. 2. Local pressure on a finite surface.

3.2. Pressure in volume under the periodic boundary conditions
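Before moving to the local pressure at a point, the pressure calculation in ordinary MD of particles with a pairwise potential is reviewed. The pressure tensor, $\mathbf{P}$, of an $N$-atom system in a unit cell of size $L_x$, $L_y$, $L_z$ (volume $V = S L_x$, with $S = L_y L_z$) under the periodic boundary conditions is calculated from the virial theorem:

$$\mathbf{P} = \frac{1}{V} \left\langle \sum_{i \in cell} m_i \mathbf{v}_i \otimes \mathbf{v}_i - \frac{1}{2} \sum_{i \in cell} \sum_{j \neq i} \mathbf{r}_{ij} \otimes \mathbf{f}_{ij} \right\rangle \tag{20}$$

$$= \rho k_B T \{1\} - \frac{1}{2V} \left\langle \sum_{i \in cell} \sum_{j \neq i} \mathbf{r}_{ij} \otimes \mathbf{f}_{ij} \right\rangle, \tag{21}$$

where

$$\mathbf{r}_{ij} = \mathbf{r}_j - \mathbf{r}_i \tag{22}$$

and $\mathbf{f}_{ij}$ is the force acting between atoms $i$ and $j$.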
Fig. 3. Local pressure on a surface in the unit cell under the periodic boundary conditions. (Contribution from two atoms is shown.)

where the summation is over all atoms located to the left of the plane and those located to the right, and the force coming from the momentum change in the $\alpha$ direction when an atom passes the surface,

$$-\sigma_K^{(\alpha x)}(x) = \frac{1}{\delta t\, S} \sum_{x_i < x,\; x_i + v_i^{(x)} \delta t > x} m_i v_i^{(\alpha)}, \tag{25}$$

where the summation is over all atoms passing through the surface within the time $\delta t$. The latter force is due to the momentum change of the left part, because of the disappearance of the atom, and that of the right part; it corresponds to the force due to hitting a wall. The total stress $\boldsymbol{\sigma}$ at time $t$ in the unit cell is obtained by integrating the stress at $x$:

$$-\sigma^{(\alpha x)} = \frac{1}{L_x} \int_0^{L_x} \left[ \frac{1}{\delta t\, S} \sum_{x_i < x,\; x_i + v_i^{(x)} \delta t > x} m_i v_i^{(\alpha)} - \frac{1}{S} \sum_{x_i < x} \; \sum_{x_j > x} f_{ij}^{(\alpha)} \right] dx. \tag{26}$$

The integrand of the first term is nonzero only over the distance that an atom moves within $\delta t$. The integrand of the second term is nonzero only over the distance from atom $i$ to atom $j$. Thus:

$$-\sigma^{(\alpha x)} = \frac{1}{L_x S} \left\{ \sum_i m_i v_i^{(\alpha)} v_i^{(x)} - \frac{1}{2} \sum_i \sum_{j \neq i} r_{ij}^{(x)} f_{ij}^{(\alpha)} \right\} \tag{27}$$

$$= \frac{1}{V} \left\{ \sum_i m_i v_i^{(\alpha)} v_i^{(x)} - \frac{1}{2} \sum_i \sum_{j \neq i} r_{ij}^{(x)} f_{ij}^{(\alpha)} \right\}. \tag{28}$$

Since pressure is the time average of $-\boldsymbol{\sigma}$, the virial theorem, Eq. (21), is obtained. To get Eq. (21), the fact that the velocities in the three directions are independent at equilibrium is used for the first term; the off-diagonal parts of the first term vanish because of this fact.
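A minimal Python sketch of the instantaneous virial expression, Eq. (28) generalized to all tensor components, is given below. It assumes Lennard-Jones particles in reduced units, an orthorhombic periodic cell, the minimum-image convention with no cutoff, and the sign convention used above ($\mathbf{r}_{ij} = \mathbf{r}_j - \mathbf{r}_i$, with $\mathbf{f}_{ij}$ taken as the force on atom $i$ due to atom $j$). The function names are hypothetical and the double loop is written for clarity rather than speed.

```python
def lj_force_on_i(r_ij):
    """Force on atom i due to atom j for phi = 4 (r^-12 - r^-6); r_ij = r_j - r_i."""
    r2 = sum(c * c for c in r_ij)
    inv_r2 = 1.0 / r2
    inv_r6 = inv_r2 ** 3
    coef = -24.0 * (2.0 * inv_r6 * inv_r6 - inv_r6) * inv_r2
    return [coef * c for c in r_ij]


def minimum_image(d, box):
    return [d[k] - box[k] * round(d[k] / box[k]) for k in range(3)]


def pressure_tensor(masses, positions, velocities, box):
    """Instantaneous -sigma = (1/V) [sum m v v - (1/2) sum_i sum_{j!=i} r_ij f_ij]."""
    volume = box[0] * box[1] * box[2]
    n = len(masses)
    p = [[0.0] * 3 for _ in range(3)]
    # Kinetic term.
    for i in range(n):
        for a in range(3):
            for b in range(3):
                p[a][b] += masses[i] * velocities[i][a] * velocities[i][b]
    # Configurational term: each unordered pair counted once, which equals
    # one half of the sum over ordered pairs.
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            r_ij = minimum_image(d, box)
            f_ij = lj_force_on_i(r_ij)
            for a in range(3):
                for b in range(3):
                    p[a][b] -= r_ij[a] * f_ij[b]
    return [[c / volume for c in row] for row in p]


# Two atoms at rest, separated by distance 1 across the periodic boundary.
box = (10.0, 10.0, 10.0)
P = pressure_tensor([1.0, 1.0],
                    [(0.5, 0.0, 0.0), (9.5, 0.0, 0.0)],
                    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
                    box)
```

At separation 1 the pair is repulsive, so the only nonzero component is a positive $xx$ element; the pressure itself would be the time average of this tensor over the trajectory.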
3.3. Pressure in plane layer

A practical use of the local pressure is the calculation of pressure profiles through various interfaces. We consider a system that is divided into several plane layers parallel to the interface, as shown in Fig. 4.

Fig. 4. Layers divided to calculate pressure profile through a plane interface parallel to the y-z plane.

The pressure in an infinite layer, or in a layer under the periodic boundary conditions, can be obtained by integrating the pressure on a surface, Eq. (24) and Eq. (25), over the layer from $x_1$ to $x_2$, instead of from 0 to $L_x$ as in Eq. (26) for the pressure of the whole system.
$$-\sigma^{(\alpha x)} = \frac{1}{x_2 - x_1} \int_{x_1}^{x_2} \left[ \frac{1}{\delta t\, S} \sum_{x_i < x,\; x_i + v_i^{(x)} \delta t > x} m_i v_i^{(\alpha)} - \frac{1}{S} \sum_{x_i < x} \; \sum_{x_j > x} f_{ij}^{(\alpha)} \right] dx \tag{29}$$

$$= \frac{1}{S (x_2 - x_1)} \left\{ \sum_{i \in layer} m_i v_i^{(\alpha)} v_i^{(x)} - \frac{1}{2} \sum_i \sum_{j \neq i} \Delta_{ij}^{(x)} f_{ij}^{(\alpha)} \right\}, \tag{32}$$

where $\Delta_{ij}^{(x)}$ is the length of the part of $r_{ij}^{(x)}$ that is located in the layer from $x_1$ to $x_2$. The part that is integrated is illustrated by arrows in Fig. 5.

Fig. 5. Integration part of straight connecting line between atoms.

The summation in the second term is taken not only over particles in the layer but also over particles whose distance vector passes through the layer. The summation in the first term is only over particles in the layer. The pressure profile obtained by this method is shown in Fig. 6. The dip in the pressure is due to the surface tension at the liquid-gas interface.

3.3.1. Incorrect equation of local pressure in layer

If the summation in the second term of Eq. (32) is taken only over atoms in the layer and those interacting with them, the correct pressure is not obtained. This incorrect idea, i.e. a simple extension of the virial theorem in Eq. (21), may give the following equation for the stress.
Fig. 6. Pressure profile through a liquid-gas interface of Lennard-Jones particle system. Density profile is shown by a solid line in the upper panel. (Units are in Lennard-Jones reduced units.)

$$-\sigma^{(\alpha x)}\;[\text{incorrect}] = \frac{1}{S (x_2 - x_1)} \sum_{i \in layer} \left\{ m_i v_i^{(\alpha)} v_i^{(x)} - \frac{1}{2} \sum_{j \neq i} r_{ij}^{(x)} f_{ij}^{(\alpha)} \right\} \tag{33}$$

This equation does not give the correct pressure, as shown in Fig. 7, in which the pressure profile perpendicular to the layer is wavy even though the system is in mechanical balance. The reason for the incorrect result is that Eq. (33) is not obtained by the correct integration. It lacks, for instance, the contribution from atom pairs of which neither atom is located in the layer. Further examples are shown in Fig. 8 and Fig. 9, which show the pressure profile through the solid-solid interface of Lennard-Jones crystals (fcc) with different lattice directions. The correct expression, Eq. (32), gives a flat profile of the pressure tensor in the normal (xx) direction (Fig. 8). The pressure tensor in the transverse directions (yy and zz) does not show a flat profile, because the density is non-uniform when it is calculated on the atomic scale. This can be understood as a kind of surface tension. When the incorrect equation, Eq. (33), is used, the pressure profile in the normal direction is not flat (Fig. 9); it contains a wavy profile on the atomic scale, since the atoms vibrate around fixed positions.

Fig. 7. Incorrect pressure profile through a liquid-gas interface of Lennard-Jones particle system by Eq. (33). (Units are in Lennard-Jones reduced units.)

Fig. 8. Pressure profile through a solid-solid interface of Lennard-Jones particle system by Eq. (32). (Units are in Lennard-Jones reduced units.)

Fig. 9. Incorrect pressure profile through a solid-solid interface of Lennard-Jones particle system by Eq. (33). (Units are in Lennard-Jones reduced units.)

A more detailed discussion of the integration path and of spherical layers is given in the following section.
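The difference between the layer integration of Eq. (32) and the naive extension of Eq. (33) can be made concrete with a small Python sketch for the configurational part of the normal (xx) component. It is only an illustration under simplifying assumptions: each interacting pair is listed once as `(x_i, x_j, fx_ij)`, where `fx_ij` is the x-component of the force on atom $i$ due to atom $j$; the pair separations are assumed to be already unwrapped (no periodic images); and the function names are hypothetical.

```python
def ik_layer_pxx(pairs, edges, area):
    """Configurational -sigma^(xx) per layer, integrating along the straight
    line between the atoms (the second term of Eq. (32))."""
    p = [0.0] * (len(edges) - 1)
    for x_i, x_j, fx_ij in pairs:
        # Orient the pair so that 'lo' is the left atom; f_left is the
        # x-force on the left atom due to the right atom.
        if x_i <= x_j:
            lo, hi, f_left = x_i, x_j, fx_ij
        else:
            lo, hi, f_left = x_j, x_i, -fx_ij
        for k in range(len(p)):
            # Length of the connecting line that lies inside layer k.
            overlap = min(edges[k + 1], hi) - max(edges[k], lo)
            if overlap > 0.0:
                p[k] -= f_left * overlap / (area * (edges[k + 1] - edges[k]))
    return p


def layer_index(x, edges):
    for k in range(len(edges) - 1):
        if edges[k] <= x < edges[k + 1]:
            return k
    return None


def naive_layer_pxx(pairs, edges, area):
    """Configurational part of Eq. (33): half of the pair virial is assigned
    to the layer of each atom, wherever the connecting line runs."""
    p = [0.0] * (len(edges) - 1)
    for x_i, x_j, fx_ij in pairs:
        half_virial = -0.5 * (x_j - x_i) * fx_ij
        for x in (x_i, x_j):
            k = layer_index(x, edges)
            if k is not None:
                p[k] += half_virial / (area * (edges[k + 1] - edges[k]))
    return p


# One repulsive pair spanning three layers of unit width and unit area.
edges = [0.0, 1.0, 2.0, 3.0]
pairs = [(0.5, 2.5, -1.0)]
correct = ik_layer_pxx(pairs, edges, area=1.0)
naive = naive_layer_pxx(pairs, edges, area=1.0)
```

Both schemes give the same total when summed over the layers, but only the first assigns a contribution to the middle layer, which contains neither atom. This missing contribution is the origin of the wavy normal-pressure profiles in Figs. 7 and 9.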
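3.4. Pressure at point

The local pressure in an infinite layer seems to be well expressed, but it is necessary to discuss the integration path more precisely. Such a discussion was presented by Irving and Kirkwood in 1950 (before molecular dynamics became popular) [10], who gave atomic-scale expressions for thermodynamic and kinetic quantities. Schofield and Henderson discussed the pressure in more detail [13], following Irving and Kirkwood's work. In this subsection, we describe the pressure at a point following Schofield and Henderson. In order to get the pressure at a point, the momentum density $\mathbf{J}(\mathbf{R})$ at point $\mathbf{R}$ is considered first. It can be expressed using the $\delta$ function as

$$\mathbf{J}(\mathbf{R}) = \sum_i \mathbf{p}_i\, \delta(\mathbf{R} - \mathbf{r}_i), \tag{34}$$

$$\mathbf{p}_i = m_i \mathbf{v}_i. \tag{35}$$

The time evolution of the momentum density in the $\alpha$-direction is then expressed with a path integral along an arbitrary path $C_{0i}$ from the origin to $\mathbf{r}_i$ as

$$\frac{\partial J^{(\alpha)}(\mathbf{R})}{\partial t} = -\nabla_R \cdot \sum_i \left[ \frac{p_i^{(\alpha)} \mathbf{p}_i}{m_i}\, \delta(\mathbf{R} - \mathbf{r}_i) - \frac{\partial \Phi}{\partial r_i^{(\alpha)}} \int_{C_{0i}} \delta(\mathbf{R} - \mathbf{l})\, d\mathbf{l} \right], \tag{36}$$

where $\Phi(\mathbf{r}_i)$ is the potential energy at $\mathbf{r}_i$ and $\mathbf{l}$ is a point on the path. The time evolution of the momentum density in continuum mechanics is

$$\frac{\partial J^{(\alpha)}(\mathbf{R})}{\partial t} = \nabla_R \cdot \boldsymbol{\sigma}^{(\alpha)}(\mathbf{R}). \tag{37}$$

When we compare the above two equations, Eqs. (36) and (37), the quantities inside $\nabla_R$ may be set equal to each other up to an integration constant. We then get the expression for the $\alpha\beta$ component of the stress tensor:

$$\sigma^{(\alpha\beta)}(\mathbf{R}) = \sigma_0 - \sum_i \left[ \frac{p_i^{(\alpha)} p_i^{(\beta)}}{m_i}\, \delta(\mathbf{R} - \mathbf{r}_i) - \frac{\partial \Phi}{\partial r_i^{(\alpha)}} \int_{C_{0i}} \delta(\mathbf{R} - \mathbf{l})\, dl^{(\beta)} \right], \tag{38}$$

where $\sigma_0$ is the integration constant. By comparison with the virial theorem, Eq. (21), the integration constant is 0, and the kinetic and configurational terms are as follows.

Kinetic part (ideal part):

$$\sigma_K^{(\alpha\beta)}(\mathbf{R}) = -\sum_i \frac{p_i^{(\alpha)} p_i^{(\beta)}}{m_i}\, \delta(\mathbf{R} - \mathbf{r}_i). \tag{39}$$

Configurational part (non-ideal part):

$$\sigma_C^{(\alpha\beta)}(\mathbf{R}) = \sum_i \frac{\partial \Phi}{\partial r_i^{(\alpha)}} \int_{C_{0i}} \delta(\mathbf{R} - \mathbf{l})\, dl^{(\beta)}. \tag{40}$$

For a two-body potential, the configurational part becomes, with an arbitrary integration path $C_{ij}$ from atom $i$ to atom $j$,

$$\sigma_C^{(\alpha\beta)}(\mathbf{R}) = \frac{1}{2} \sum_i \sum_{j \neq i} f_{ij}^{(\alpha)} \int_{C_{ij}} \delta(\mathbf{R} - \mathbf{l})\, dl^{(\beta)}. \tag{41}$$

In order to use these equations, which are defined at a point, in MD calculations, it is necessary to integrate them over the volume concerned.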
$$\sigma^{(\alpha\beta)}(V_{local}) = \frac{1}{V_{local}} \int_{V_{local}} \sigma^{(\alpha\beta)}(\mathbf{R})\, d\mathbf{R}. \tag{42}$$

The kinetic part is simply integrated as

$$\sigma_K^{(\alpha\beta)}(V_{local}) = -\frac{1}{V_{local}} \sum_{i \in V_{local}} \frac{p_i^{(\alpha)} p_i^{(\beta)}}{m_i}. \tag{43}$$

The configurational part becomes

$$\sigma_C^{(\alpha\beta)}(V_{local}) = \frac{1}{2 V_{local}} \sum_i \sum_{j \neq i} f_{ij}^{(\alpha)} \int_{C_{ij} \in V_{local}} dl^{(\beta)}. \tag{44}$$

3.4.1. Integration path

There is a long history of discussion about the integration path appearing in Eq. (41). It has been emphasized that the integration path is arbitrary [13]. Irving and Kirkwood naturally presented a straight line as an example (see Fig. 10) [10]. It is now called the Irving and Kirkwood choice (IK-contour). Harasima showed another path, which consists of two parts: a line parallel to a flat or spherical plane, and a line perpendicular to it, as shown in Fig. 10 [12]. It is called the Harasima choice (H-contour). These two contours are always raised when local pressure is the subject of a paper. Some questions about the arbitrariness have, however, been presented. Blokhuis and Bedeaux stated that the H-contour does not obey a microscopic sum rule [14]. Wajnryb et al. stated that the pressure must be independent of the numbering of the atoms and of the center of the coordinates [15]. Schofield and Henderson also stated that the path must be a straight line for the pressure tensor to be symmetric [13]. In order to avoid this kind of arbitrariness, other kinds of methods for calculating the local pressure have been presented. One is the method of plane (MOP) by Todd and Evans [16], which is valid only on an infinite plane. Lovett and Baus gave a "local thermodynamic pressure" (similar to the tangential component in the H-contour) [18].

3.4.2. Pressure tensors in the flat layer by path-integral through IK-contour and H-contour

Fig. 10. Irving-Kirkwood (IK) contour and Harasima (H) contour.

Normal component: In the case of a flat layer under the periodic boundary conditions, the pressure tensor by the IK-contour is the same as that already discussed in Section 3.3. The normal component (xx) of the stress tensor can be written with two coordinates, $x_1$ and $x_2$, which are the left-side and right-side coordinates of the layer. Since the integration path runs from $x_i$ to $x_j$, the region which contributes to the integration in the layer extends from $\max(x_1, x_i)$ to $\min(x_2, x_j)$. Counting each pair once with $x_i < x_j$:

$$-\sigma_C^{(xx)} = -\frac{1}{V_{local}} \sum_{x_i \le x_2} \; \sum_{x_j \ge x_1} f_{ij}^{(x)} \int_{\max(x_1, x_i)}^{\min(x_2, x_j)} dx \tag{45}$$

$$= -\frac{1}{V_{local}} \sum_{x_i \le x_2} \; \sum_{x_j \ge x_1} f_{ij}^{(x)} \left[ \min(x_2, x_j) - \max(x_1, x_i) \right] \tag{46}$$

$$= -\frac{1}{V_{local}} \sum_{x_i \le x_2} \; \sum_{x_j \ge x_1} r_{ij}^{(x)} f_{ij}^{(x)}\, \frac{\min(x_2, x_j) - \max(x_1, x_i)}{r_{ij}^{(x)}}. \tag{47}$$

This normal component is independent of the choice of contour. Any contour which connects atoms $i$ and $j$ gives the same value, as can be understood from Fig. 11.

Fig. 11. A contour to connect atoms i and j.

Tangential component: In the case of the tangential component of the stress tensor, the expression depends on the choice of contour. The equation for the IK-contour is similar to that of the normal component:

$$-\sigma_C^{IK(yy)} = -\frac{1}{V_{local}} \sum_{x_i \le x_2} \; \sum_{x_j \ge x_1} r_{ij}^{(y)} f_{ij}^{(y)}\, \frac{\min(x_2, x_j) - \max(x_1, x_i)}{r_{ij}^{(x)}}. \tag{48}$$

For integration through the H-contour, a pair contributes only when atom $i$ is located in the layer:

$$-\sigma_C^{H(yy)} = -\frac{1}{2 V_{local}} \sum_{x_1 < x_i < x_2} \; \sum_{j \neq i} r_{ij}^{(y)} f_{ij}^{(y)}. \tag{49}$$
This is the same expression as Eq. (33), which was obtained as a simple extension of the virial theorem, but the normal-direction component of that expression gives an incorrect, non-uniform pressure profile through a flat interface at equilibrium.

3.4.3. Pressure in spherical layer

For the calculation of the pressure in spherical coordinates, the system is divided into spherical layers, as shown in Fig. 12. The pressure in the spherical layers is calculated from Eq. (39) and Eq. (40). The kinetic part does not depend on the choice of contour.

Fig. 12. Layers divided to calculate pressure profile through the spherical interface of a droplet.

IK-choice in the spherical layer: We consider a spherical shell of radius $R$ and thickness $\Delta R$. For the IK choice of the contour $C_{ij}$, the integration path is shown by arrows in Fig. 13. In order to integrate, a new variable $\lambda$ that defines the path $\mathbf{l}$ is introduced as

$$\mathbf{l} = \mathbf{r}_i + \lambda \mathbf{r}_{ij}; \quad 0 \le \lambda \le 1. \tag{50}$$

Fig. 13. Integration region by IK-choice in the spherical layer.

The integration region, $C_{ij} \in V_{local}$, is defined by $\lambda_a < \lambda < \lambda_b$, where $\lambda_a$ and $\lambda_b$ are given by $\mathbf{a} = \mathbf{r}_i + \lambda_a \mathbf{r}_{ij}$ and $\mathbf{b} = \mathbf{r}_i + \lambda_b \mathbf{r}_{ij}$, respectively, and $\mathbf{a}$ and $\mathbf{b}$ are the entry and exit points, respectively, of $C_{ij}$ in the layer. For some configurations of $i$ and $j$, $C_{ij}$ may penetrate the layer twice, as seen in Fig. 13, and $C_{ij} \in V_{local}$ then represents two separate intervals. These situations will be understood in the following as the integral $\int_{\lambda_a}^{\lambda_b}$ actually consisting of two parts, $\int_{\lambda_a}^{\lambda_b} + \int_{\lambda_{a'}}^{\lambda_{b'}}$, without being explicitly shown. Furthermore, we adopt the notation that $\lambda_a = 0$ if $\mathbf{r}_i \in V_{local}$ and $\lambda_b = 1$ if $\mathbf{r}_j \in V_{local}$. The configurational part of the $\alpha\alpha$-component of the stress tensor follows as

$$-\sigma_C^{IK(\alpha\alpha)}(V_{local}) = -\frac{1}{2 V_{local}} \sum_i \sum_{j \neq i} \int_{\lambda_a}^{\lambda_b} (\mathbf{e}_\alpha \cdot \mathbf{f}_{ij}) (\mathbf{e}_\alpha \cdot \mathbf{r}_{ij})\, d\lambda. \tag{51}$$
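Note that the unit vectors $\mathbf{e}_r$, $\mathbf{e}_\theta$, and $\mathbf{e}_\phi$ are not constant along the contour $\mathbf{l}$. Expressed in Cartesian coordinates, they are

$$\mathbf{e}_r = \frac{(l_x,\; l_y,\; l_z)}{(l_x^2 + l_y^2 + l_z^2)^{1/2}}, \tag{52}$$

$$\mathbf{e}_\theta = \frac{\left(l_x l_z,\; l_y l_z,\; -(l_x^2 + l_y^2)\right)}{(l_x^2 + l_y^2)^{1/2} (l_x^2 + l_y^2 + l_z^2)^{1/2}}, \tag{53}$$

$$\mathbf{e}_\phi = \frac{(-l_y,\; l_x,\; 0)}{(l_x^2 + l_y^2)^{1/2}}. \tag{54}$$

Assuming that $\mathbf{f}_{ij}$ and $\mathbf{r}_{ij}$ are parallel ($\mathbf{f}_{ij} = -f_{ij} \mathbf{r}_{ij} / r_{ij}$), $\sigma_C^{IK}(V_{local})$ may be written as

$$-\sigma_C^{IK(rr)}(V_{local}) = \frac{1}{2 V_{local}} \sum_i \sum_{j \neq i} \frac{f_{ij}}{r_{ij}} \int_{\lambda_a}^{\lambda_b} \frac{r_{ij}^4 \lambda^2 + 2 (\mathbf{r}_i \cdot \mathbf{r}_{ij}) r_{ij}^2 \lambda + (\mathbf{r}_i \cdot \mathbf{r}_{ij})^2}{r_{ij}^2 \lambda^2 + 2 (\mathbf{r}_i \cdot \mathbf{r}_{ij}) \lambda + r_i^2}\, d\lambda, \tag{55}$$

$$-\sigma_C^{IK(\phi\phi)}(V_{local}) = \frac{1}{2 V_{local}} \sum_i \sum_{j \neq i} \frac{f_{ij}}{r_{ij}} \int_{\lambda_a}^{\lambda_b} \frac{(r_{i,y} r_{ij,x} - r_{i,x} r_{ij,y})^2}{r_{ij,\phi}^2 \lambda^2 + 2 (\mathbf{r}_{i,\phi} \cdot \mathbf{r}_{ij,\phi}) \lambda + r_{i,\phi}^2}\, d\lambda, \tag{56}$$

where $\mathbf{r}_{i,\phi}$ and $\mathbf{r}_{ij,\phi}$ are the projections of $\mathbf{r}_i$ and $\mathbf{r}_{ij}$, respectively, onto the x-y plane. The integrals may be expressed in closed form, with the results

$$-\sigma_C^{IK(rr)}(V_{local}) = \frac{1}{2 V_{local}} \sum_i \sum_{j \neq i} \frac{f_{ij}}{r_{ij}} \left[ r_{ij}^2 (\lambda_b - \lambda_a) + F(\lambda_b) - F(\lambda_a) \right], \tag{57}$$

where

$$F(\lambda) = -\sqrt{r_i^2 r_{ij}^2 - (\mathbf{r}_i \cdot \mathbf{r}_{ij})^2}\; \arctan \left[ \frac{r_{ij}^2 \lambda + (\mathbf{r}_i \cdot \mathbf{r}_{ij})}{\sqrt{r_i^2 r_{ij}^2 - (\mathbf{r}_i \cdot \mathbf{r}_{ij})^2}} \right], \tag{58}$$

and

$$-\sigma_C^{IK(\phi\phi)}(V_{local}) = \frac{1}{2 V_{local}} \sum_i \sum_{j \neq i} \frac{f_{ij}}{r_{ij}} \left[ G(\lambda_b) - G(\lambda_a) \right], \tag{59}$$

where

$$G(\lambda) = \sqrt{r_{i,\phi}^2 r_{ij,\phi}^2 - (\mathbf{r}_{i,\phi} \cdot \mathbf{r}_{ij,\phi})^2}\; \arctan \left[ \frac{r_{ij,\phi}^2 \lambda + (\mathbf{r}_{i,\phi} \cdot \mathbf{r}_{ij,\phi})}{\sqrt{r_{i,\phi}^2 r_{ij,\phi}^2 - (\mathbf{r}_{i,\phi} \cdot \mathbf{r}_{ij,\phi})^2}} \right]. \tag{60}$$

If $r_i^2 r_{ij}^2 - (\mathbf{r}_i \cdot \mathbf{r}_{ij})^2 = 0$, i.e. particles $i$ and $j$ are on the same line from the center,

$$\left[ r_{ij}^2 (\lambda_b - \lambda_a) + F(\lambda_b) - F(\lambda_a) \right] = r_{ij}^2 (\lambda_b - \lambda_a), \tag{61}$$

$$G(\lambda_b) - G(\lambda_a) = 0. \tag{62}$$

In the calculation, the value of $\lambda$ at the crossing point of the connecting line with a sphere of radius $R$ is needed:

$$\lambda_R = \frac{1}{r_{ij}^2} \left[ -(\mathbf{r}_i \cdot \mathbf{r}_{ij}) \pm \sqrt{(\mathbf{r}_i \cdot \mathbf{r}_{ij})^2 - r_{ij}^2 (r_i^2 - R^2)} \right] \quad \text{with } 0 < \lambda_R < 1. \tag{63}$$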
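The crossing points of Eq. (63) and the resulting integration intervals $(\lambda_a, \lambda_b)$ can be computed as in the following Python sketch. The function names are hypothetical, the origin is assumed to be the center of the spherical layers, and atoms $i$ and $j$ are assumed to be at different positions. Intervals are found by cutting $[0, 1]$ at all crossing points of the inner and outer spheres and testing the midpoint of each piece, which handles the case in which the line penetrates the layer twice.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sphere_crossings(r_i, r_j, radius):
    """Values 0 < lambda < 1 at which r_i + lambda (r_j - r_i) crosses the sphere."""
    r_ij = [b - a for a, b in zip(r_i, r_j)]
    a = dot(r_ij, r_ij)
    b = dot(r_i, r_ij)
    c = dot(r_i, r_i) - radius * radius
    disc = b * b - a * c
    if disc <= 0.0:
        return []  # the line misses the sphere or only touches it
    root = math.sqrt(disc)
    return [lam for lam in ((-b - root) / a, (-b + root) / a)
            if 0.0 < lam < 1.0]


def shell_intervals(r_i, r_j, r_in, r_out):
    """Intervals (lambda_a, lambda_b) of the straight path lying in the shell."""
    r_ij = [b - a for a, b in zip(r_i, r_j)]
    cuts = sorted({0.0, 1.0,
                   *sphere_crossings(r_i, r_j, r_in),
                   *sphere_crossings(r_i, r_j, r_out)})
    intervals = []
    for lam_a, lam_b in zip(cuts, cuts[1:]):
        mid = 0.5 * (lam_a + lam_b)
        point = [p + mid * d for p, d in zip(r_i, r_ij)]
        if r_in <= math.sqrt(dot(point, point)) <= r_out:
            intervals.append((lam_a, lam_b))
    return intervals


# A pair on opposite sides of the center crosses the shell 1 <= r <= 1.5 twice.
twice = shell_intervals((-2.0, 0.0, 0.0), (2.0, 0.0, 0.0), 1.0, 1.5)
# Atom i inside the shell: the interval starts at lambda_a = 0.
from_inside = shell_intervals((1.2, 0.0, 0.0), (3.0, 0.0, 0.0), 1.0, 1.5)
```

The returned intervals can be inserted directly as the limits in Eqs. (57) and (59), summing the bracketed terms over both intervals when two are returned.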
H-choice in the spherical layer: The H-contour in spherical coordinates can be divided into two parts: a straight line from atom $i$ toward the center of the coordinates, and a line on the spherical surface on which atom $j$ is located. The former contributes to the radial direction and the latter contributes to the tangential direction. The normal component of the stress tensor is very simple. It is similar to that of the flat layer:

$$\sigma_C^{H(rr)}(V_{local}) = \frac{1}{2 V_{local}} \sum_i \sum_{j \neq i} \cdots \tag{64}$$

For the force in the tangential direction, the angle $\omega$ between atoms $i$ and $j$ along the H-contour is introduced. Then, the tangential component becomes

$$\sigma_C^{H(tt)}(V_{local}) = \frac{1}{2 V_{local}} \sum_{i \in V_{local}} \; \sum_{j \neq i} \cdots \tag{65}$$
3.4.4. Pressure profile in the liquid droplet

The pressure tensors in a droplet are calculated using the equations shown above. An example for a liquid droplet of Lennard-Jones particles is shown in Fig. 14. There is a small but clear difference between the IK and H choices.

Fig. 14. Pressure tensor in the liquid droplet (normal and transverse components). Lines are by IK-choice; the other curves are by the Harasima choice.

Pressure tensor calculations in a liquid droplet were presented by Thompson et al. [11]. They used both IK- and H-contours to calculate the pressure in spherical layers and on spherical planes, and found no difference within the errors. The highly accurate calculation in layers shown here, however, reveals the difference. The force balance is satisfied in both profiles. Which is correct, or better? An answer is obtained from the calculation in a homogeneous liquid described in the next part.
3.4.5. Pressure profile in homogeneous liquid with spherical layers

Hafskjold and Ikeshoji have recently shown that the H-contour does not give a uniform pressure for a uniform liquid of hard spheres at equilibrium when spherical coordinates are used [19]. When we use Lennard-Jones particles with the equations given in the above section, the pressure near the center differs from the bulk value, as shown in Fig. 15, even though the system is uniform, homogeneous, and mechanically balanced. The mechanical balance in spherical coordinates is expressed as

$$p_T(r) = p_N(r) + \frac{r}{2} \frac{d p_N(r)}{dr} \tag{66}$$

and

$$p_N(r) = \frac{2}{r^2} \int_0^r p_T(r')\, r'\, dr', \tag{67}$$

where the subscripts N and T denote the normal and tangential components. The pressure profile by the H-contour shown here satisfies this mechanical balance. When the IK-contour is used for the integration, the pressure is uniform, as shown in Fig. 16. Therefore, the IK-contour seems to be the better integration path, even for an inhomogeneous fluid.
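The mechanical balance of Eqs. (66) and (67) is easy to check numerically for a tabulated profile. The Python sketch below is an illustration only: it assumes a radial grid that starts at $r = 0$, uses central differences (one-sided at the two ends) for the derivative and the trapezoidal rule for the integral, and the function names are hypothetical.

```python
def tangential_from_normal(r, p_n):
    """Eq. (66): p_T = p_N + (r / 2) dp_N/dr, by finite differences."""
    n = len(r)
    p_t = []
    for k in range(n):
        lo, hi = max(k - 1, 0), min(k + 1, n - 1)
        slope = (p_n[hi] - p_n[lo]) / (r[hi] - r[lo])
        p_t.append(p_n[k] + 0.5 * r[k] * slope)
    return p_t


def normal_from_tangential(r, p_t):
    """Eq. (67): p_N(r) = (2 / r^2) * integral_0^r p_T(r') r' dr'.

    Assumes r[0] == 0, where p_N and p_T coincide.
    """
    p_n = [p_t[0]]
    integral = 0.0
    for k in range(1, len(r)):
        integral += 0.5 * (p_t[k - 1] * r[k - 1] + p_t[k] * r[k]) * (r[k] - r[k - 1])
        p_n.append(2.0 * integral / (r[k] * r[k]))
    return p_n


# Test profile: p_N = 1 + r^2, for which Eq. (66) gives p_T = 1 + 2 r^2.
r = [0.01 * k for k in range(301)]
p_n = [1.0 + x * x for x in r]
p_t = tangential_from_normal(r, p_n)
p_n_back = normal_from_tangential(r, [1.0 + 2.0 * x * x for x in r])
```

Applied to a measured pair of profiles, the difference between the tabulated $p_T$ and `tangential_from_normal(r, p_n)` gives a direct measure of how well the mechanical balance is satisfied.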
Fig. 15. Pressure tensor by H-contour in homogeneous Lennard-Jones liquid. (The line shown by the arrow is the pressure expected from the force balance.)

Fig. 16. Pressure tensor by IK-contour in homogeneous Lennard-Jones liquid.
4. Concluding remarks

Local quantities are used to bridge molecular dynamics calculations and macroscopic ones. Some quantities are easily bridged, but quantities that are functions of two or more atoms are not so simple. The pressure tensor is such a quantity and has been discussed here. The integration path problem has not been solved completely, but we found evidence that the Harasima choice should be avoided when calculating the local pressure tensor. The recommended method for calculating the local pressure tensor in a local volume is to integrate the force along the straight line connecting the two atoms within the local volume, as illustrated in Fig. 17.

Fig. 17. Recommended path integration for the local pressure tensor.
Acknowledgments

The work was partially done while the author was visiting the Institute for Mathematical Sciences, National University of Singapore, and the Institute of High Performance Computing (IHPC) in 2003. The visit was supported by the Institute and IHPC. This work was conducted in collaboration with Professor Bjørn Hafskjold at the Norwegian University of Science and Technology.

References

1. S. Nosé, Molec. Phys., 52, 255 (1984); S. Nosé, Prog. Theor. Phys. Suppl., 103, 1 (1991).
2. M. Parrinello and A. Rahman, Phys. Rev. Lett., 45, 1196 (1980).
3. M. Sprik and G. Ciccotti, J. Chem. Phys., 109, 7737 (1998) and references therein.
4. Gear, Kevrekidis, Comp. Chem. Eng. (2002); A. Laio, M. Parrinello, PNAS 99 (2002).
5. T. Ikeshoji and B. Hafskjold, Mol. Phys., 81, 251 (1994).
6. B. Hafskjold and T. Ikeshoji, Mol. Sim., 16, 139 (1996).
7. M. J. Gillan, J. Phys. C: Solid State Phys., 20, 521 (1987); D. MacGowan and D. J. Evans, Phys. Rev., A34, 2133 (1986).
8. T. Ikeshoji, B. Hafskjold, Y. Hashi and Y. Kawazoe, Phys. Rev. Lett., 76, 1792 (1996).
9. J. H. Irving and J. G. Kirkwood, J. Chem. Phys., 18, 817 (1950).
10. S. M. Thompson, K. E. Gubbins, J. P. R. B. Walton, R. A. R. Chantry, and J. S. Rowlinson, J. Chem. Phys., 81, 530 (1984).
11. A. Harasima, Adv. Chem. Phys., 1, 203 (1958).
12. P. Schofield and J. R. Henderson, Proc. R. Soc. Lond., A379, 231-246 (1982).
13. E. M. Blokhuis and D. Bedeaux, J. Chem. Phys., 97, 3576 (1992).
14. E. Wajnryb, A. R. Altenberger, and J. S. Dahler, J. Chem. Phys., 103, 9782 (1995).
15. B. D. Todd and D. J. Evans, Phys. Rev. E, 52, 1672 (1995).
16. F. Varnik, J. Baschnagel, and K. Binder, J. Chem. Phys., 113, 4444 (2000).
17. H. El Bardoumi, R. Lovett, and M. Baus, J. Chem. Phys., 113, 9804 (2000).
18. B. Hafskjold and T. Ikeshoji, Phys. Rev. E, 66, 011203 (2002).
19. T. Ikeshoji and B. Hafskjold, Molec. Simul., 29, 101 (2003).
RECENT ADVANCES IN MODELING AND SIMULATION OF HIGH-SPEED INTERCONNECTS
Michel Nakhla and Ram Achar
Department of Electronics, Carleton University, Ottawa, ON - K1S5B6, Canada
E-mail: {msn, [email protected]

The rapid increase in operating speeds, density and complexity of modern integrated circuits has made interconnect analysis a requirement for all state-of-the-art circuit simulators. Interconnect effects such as ringing, signal delay, distortion and crosstalk can severely degrade signal integrity. Interconnections can be from various levels of the design hierarchy. As the frequency of operation increases, the interconnect lengths become a significant fraction of the operating wavelength; conventional lumped models then become inadequate for describing interconnect performance, and transmission line models become necessary. This chapter describes some of the recent advances in transmission line simulation techniques. The application of model-order-reduction algorithms to high-speed interconnect analysis is also presented.

1. Introduction
With the rapid developments in VLSI technology, design and CAD methodologies, at both the chip and package level, operating frequencies are fast reaching the GHz range and switching times are approaching sub-nanosecond levels. The ever-increasing quest for high-speed applications is placing higher demands on interconnect performance and has highlighted previously negligible interconnect effects such as ringing, signal delay, distortion, reflections and crosstalk. Interconnects can exist at various levels of the design hierarchy, such as on-chip, packaging structures, multichip modules, printed circuit boards and backplanes. In addition, the trend in the VLSI industry towards miniature designs, low power consumption and increased integration of analog circuits with digital blocks has further complicated the issue of signal integrity analysis. It is predicted that interconnects will be responsible for the majority of signal degradation in high-speed systems [1-25]. High-speed interconnect problems are not always handled appropriately by conventional circuit simulators such as SPICE. If not considered during the design stage, these interconnect effects can cause logic glitches which render a fabricated digital circuit inoperable, or they can distort an analog signal such that it fails to meet specifications. Since extra iterations in the design cycle are costly, accurate prediction of these effects is a necessity in high-speed designs. Hence it becomes extremely important for designers to simulate the entire design along with interconnect subcircuits as efficiently
as possible while retaining the accuracy of simulation.

Speaking from a broader perspective, a "high-speed interconnect" is one in which the time taken by the propagating signal to travel between its end points cannot be neglected. An obvious factor which influences this definition is the physical extent of the interconnect: the longer the interconnect, the more time the signal takes to travel between its end points. Smoothness of signal propagation suffers once the line becomes long enough for the signal's rise/fall times to roughly match its propagation time through the line. Then the interconnect electrically isolates the driver from the receivers, which no longer function directly as loads to the driver. Instead, within the time of the signal's transition between its high and low voltage levels, the impedance of the interconnect becomes the load for the driver and also the input impedance to the receivers [1,15]. This leads to various transmission line effects, such as reflections, overshoot, undershoot and crosstalk, and modeling these requires a blend of EM and circuit theory.

Alternatively, the term "high-speed" can be defined in terms of the frequency content of the signal. At low frequencies an ordinary wire (in other words, an interconnect) will effectively short two connected circuits. However, this is not the case at higher frequencies. The same wire, which is so effective at lower frequencies for connection purposes, has too much inductive/capacitive effect to function as a short at higher frequencies. Faster clock speeds and sharper slew rates tend to add more and more high-frequency content.

An important criterion used for classifying interconnects is the electrical length of an interconnect. An interconnect is considered to be "electrically short" if, at the highest operating frequency of interest, the interconnect length is physically shorter than approximately one-tenth of the wavelength (i.e., length of the interconnect/$\lambda \approx 0.1$, $\lambda = v/f$). Otherwise the interconnect is referred to as "electrically long" [1,15]. In most digital applications, the desired highest operating frequency (which corresponds to the minimum wavelength) of interest is governed by the rise/fall time of the propagating signal. For example, the energy spectrum of a trapezoidal pulse is spread over an infinite frequency range; however, most of the signal energy is concentrated near the low-frequency region and decreases rapidly with increasing frequency. Hence ignoring the high-frequency components of the spectrum above a maximum frequency, $f_{max}$, will not seriously alter the overall signal shape. Consequently, for all practical purposes, the width of the spectrum can be assumed to be finite. In other words, the signal energy of interest is assumed to be contained in the major lobes of the spectrum, and the relationship between the desired $f_{max}$ and $t_r$, the rise/fall time of the signal, can be expressed as

$$ f_{max} \approx 0.35/t_r \qquad (1) $$

This implies that, for example, for a rise time of 0.1 ns, the maximum frequency of interest is approximately 3 GHz, or the minimum wavelength of interest is 10 cm. In some cases the limit can be set more conservatively as $f_{max} = 1/t_r$ [90].
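The rule of thumb in (1) and the one-tenth-wavelength criterion can be combined into a short check. The following is only a sketch: the function names, the free-space propagation velocity and the optional effective permittivity `eps_eff` are assumptions made for illustration and are not part of the chapter.

```python
C0 = 3.0e8  # assumed free-space propagation velocity in m/s


def f_max_from_rise_time(t_r, conservative=False):
    """Eq. (1): f_max ~ 0.35 / t_r, or the more conservative 1 / t_r."""
    return (1.0 if conservative else 0.35) / t_r


def is_electrically_short(length, t_r, eps_eff=1.0, conservative=False):
    """Return True if length is below one-tenth of the wavelength at f_max.

    eps_eff is a hypothetical effective relative permittivity used to scale
    the propagation velocity (v = C0 / sqrt(eps_eff)).
    """
    velocity = C0 / eps_eff ** 0.5
    wavelength = velocity / f_max_from_rise_time(t_r, conservative)
    return length < 0.1 * wavelength


if __name__ == "__main__":
    t_r = 0.1e-9  # 0.1 ns rise time, as in the text
    print("f_max [Hz]:", f_max_from_rise_time(t_r))
    for length in (0.005, 0.05):  # 5 mm and 5 cm interconnects
        print(length, "m short?", is_electrically_short(length, t_r))
```

With the unrounded 0.35/t_r value the limit is 3.5 GHz, so the one-tenth-wavelength threshold in free space comes out slightly below the 1 cm implied by the rounded 3 GHz figure.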
In summary, the primary factors which influence the decision of whether high-speed signal distortion effects should be considered are interconnect length, cross-sectional dimensions, signal slew rate and clock speed. Other factors which should also be considered are logic levels, dielectric material and conductor resistance. Electrically short interconnects can be represented by lumped models, whereas electrically long interconnects need distributed or full-wave models.

1.1 High-Speed Interconnect Models
Depending on the operating frequency, signal rise times and nature of the structure, the interconnects can be modeled as lumped, distributed (frequency independent/dependent RLCG parameters, lossy, coupled) or full-wave models.
1.1.1 Lumped Models

At lower frequencies, interconnect circuits can be modelled using lumped RC or RLC circuit models. RC circuit responses are monotonic in nature. However, in order to account for ringing in signal waveforms, RLC circuit models may be required. Lumped interconnect circuits extracted from layouts usually contain a large number of nodes, which makes the simulation highly CPU intensive.

1.1.2 Distributed Transmission Line Models
At relatively higher signal speeds, the electrical length of interconnects becomes a significant fraction of the operating wavelength, giving rise to signal-distorting effects that do not exist at lower frequencies. Consequently, the conventional lumped-impedance interconnect models become inadequate, and transmission line models based on quasi-TEM assumptions are needed. The TEM (Transverse Electromagnetic Mode) approximation represents the ideal case, where both the E and H fields are perpendicular to the direction of propagation; it is valid under the condition that the line cross-section is much smaller than the wavelength. However, the inhomogeneities in practical wiring configurations give rise to E or H fields in the direction of propagation. If the line cross-section or the extent of these nonuniformities remains a small fraction of the wavelength in the frequency range of interest, the solutions to Maxwell's equations are given by the so-called quasi-TEM modes and are characterized by distributed R, L, C, G per-unit-length parameters.

In practical situations, owing to complex interconnect geometries and varying cross-sectional areas, the interconnects may need to be modelled as nonuniform lines. In this case, the per-unit-length parameters are functions of distance.

1.1.3 Distributed Models with Frequency-Dependent Parameters
At low frequencies, the current in a conductor is distributed uniformly throughout its cross-section. However, as the operating frequency increases, the current distribution becomes uneven and concentrates more and more near the surface or edges of the conductor. This phenomenon can be categorized as follows: skin, edge and proximity effects. The skin effect causes the current to concentrate in a thin layer near the conductor surface, which reduces the effective cross-section available for signal propagation. This leads to an increase in the resistance to signal propagation and to other related effects. The edge effect causes the current to concentrate near the sharp edges of the conductor. The proximity effect causes the current to concentrate in the sections of the ground plane that are close to the signal conductor. To account for these effects, modelling based on frequency-dependent p.u.l. parameters may be necessary.

The rest of the chapter is organized as follows. Section 2 provides a detailed analysis of transmission line equations and the derivation of a generic multiconductor transmission line stamp, suitable for inclusion in an MNA analysis. In Section 3, a review of the formulation of circuit equations in the presence of distributed elements and of the limitations of conventional simulators is given. A review of efficient techniques for discretization of Telegrapher's equations is given in Section 4. Sections 5-7 give a detailed account of the simulation of interconnects using model-reduction techniques. Section 8 provides references to related topics.
2. Distributed Transmission Line Equations

Transmission line characteristics are in general described by Telegrapher's equations. Consider the transmission line system shown in Fig. 1a. Telegrapher's equations for such a structure can be derived by discretizing the line into infinitesimal sections of length $\Delta x$ and assuming uniform per-unit-length (p.u.l.) parameters of resistance (R), inductance (L), conductance (G) and capacitance (C). Each section then includes a resistance $R\Delta x$, inductance $L\Delta x$, conductance $G\Delta x$ and capacitance $C\Delta x$ (Fig. 1b). Using Kirchhoff's current and voltage laws, one can write [15]

$$ v(x+\Delta x, t) = v(x,t) - R\Delta x\, i(x,t) - L\Delta x \frac{\partial}{\partial t} i(x,t) \qquad (2) $$

or

$$ \frac{v(x+\Delta x, t) - v(x,t)}{\Delta x} = -R\, i(x,t) - L\frac{\partial}{\partial t} i(x,t) \qquad (3) $$

Taking the limit $\Delta x \to 0$, one gets

$$ \frac{\partial}{\partial x} v(x,t) = -R\, i(x,t) - L\frac{\partial}{\partial t} i(x,t) \qquad (4) $$
Fig. 1. Transmission line system and discretization: (a) transmission line system; (b) representation of a discretized section.

Similarly, we can obtain the second transmission line equation in the form

$$ \frac{\partial}{\partial x} i(x,t) = -G\, v(x,t) - C\frac{\partial}{\partial t} v(x,t) \qquad (5) $$

Taking the Laplace transform of equations (4) and (5), one can write

$$ \frac{\partial}{\partial x} V(x,s) = -(R + sL)\, I(x,s) = -Z\, I(x,s) \qquad (6) $$

$$ \frac{\partial}{\partial x} I(x,s) = -(G + sC)\, V(x,s) = -Y\, V(x,s) \qquad (7) $$
where Z and Y represent the p.u.l. impedance and admittance of the transmission line, given by

$$ Z = R + sL; \qquad Y = G + sC \qquad (8) $$

The set of equations represented by (6) and (7) can be solved if they are written in terms of one of the unknowns (either V(x, s) or I(x, s)) as follows:

$$ \frac{\partial^2}{\partial x^2} V(x,s) = ZY\, V(x,s) = \gamma^2 V(x,s) \qquad (9) $$

$$ \frac{\partial^2}{\partial x^2} I(x,s) = YZ\, I(x,s) = \gamma^2 I(x,s) \qquad (10) $$

where $\gamma(s)$ is the complex propagation constant, given by

$$ \gamma(s) = \alpha + j\beta = \sqrt{ZY} = \sqrt{(R + j\omega L)(G + j\omega C)} \qquad (11) $$
where $\alpha$ represents the real part of the propagation constant and is known as the attenuation constant, expressed in nepers/m, and $\beta$ represents the imaginary part of the propagation constant and is known as the phase constant, expressed in radians/m. The solution of (9) and (10) is given as a combination of forward and reflected waves travelling on the line as

$$ V(x,s) = V(0,s)\, e^{\pm\gamma(s)x} \qquad (12) $$

$$ I(x,s) = I(0,s)\, e^{\pm\gamma(s)x} \qquad (13) $$

The phase shift and attenuation experienced by the travelling waves are given by $e^{-j\beta(s)x}$ and $e^{-\alpha(s)x}$, respectively. If the lines are lossless, the propagation constant is given by $\gamma(s) = j\beta = \sqrt{ZY} = j\omega\sqrt{LC}$. The line in this case represents a pure-delay element.

2.1 Multiconductor Transmission Line System
Consider the multiconductor transmission line (MTL) system, with N coupled conductors, shown in Fig. 2.

Fig. 2. Multiconductor transmission line system

Using steps similar to the case of a single transmission line, we can derive the multiconductor transmission line equations. The per-unit-length parameters (R, L, G and C) in this case become matrices, and the voltage/current variables become vectors represented by v and i, respectively. Noting these changes, we can re-write (4) and (5) as

$$ \frac{\partial}{\partial x}\begin{bmatrix} v(x,t) \\ i(x,t) \end{bmatrix} = -\begin{bmatrix} 0 & R \\ G & 0 \end{bmatrix}\begin{bmatrix} v(x,t) \\ i(x,t) \end{bmatrix} - \begin{bmatrix} 0 & L \\ C & 0 \end{bmatrix}\frac{\partial}{\partial t}\begin{bmatrix} v(x,t) \\ i(x,t) \end{bmatrix} \qquad (14) $$

Taking the Laplace transform of equation (14), one can write
$$ \frac{\partial}{\partial x} V(x,s) = -Z\, I(x,s) \qquad (16) $$

$$ \frac{\partial}{\partial x} I(x,s) = -Y\, V(x,s) \qquad (17) $$

where Z and Y represent the impedance and admittance matrices, given by

$$ Z = R + sL, \qquad Y = G + sC \qquad (18) $$

The R, L, G and C matrices are obtained by a two-dimensional solution of Maxwell's equations at appropriate positions along the propagation axis. For this purpose, depending on the nature and geometry of the structure and the desired accuracy, techniques based on quasi-static or full-wave approaches can be used. The R, L, G and C matrices are symmetric and positive definite.

2.2 Multiconductor Transmission Line Stamp
In this section, we derive a stamp relating the terminal currents and voltages of MTL structures, suitable for inclusion in SPICE-like simulators. The transmission line stamp is derived through decoupling of the MTL equations.

From (16) and (17), we get two sets of coupled wave equations as

$$ \frac{\partial^2}{\partial x^2} V(x,s) = ZY\, V(x,s) \qquad (19) $$

$$ \frac{\partial^2}{\partial x^2} I(x,s) = YZ\, I(x,s) \qquad (20) $$

Decoupling of the equations in (19) or (20) can be achieved through the use of suitable modal transformation matrices. For this purpose, introduce a transformation W relating the circuit voltages $V$ and the modal voltages $\hat{V}$ as

$$ V(x,s) = W\hat{V}(x,s) \qquad (21) $$

Hence (19) can be re-written as (for simplicity, we omit the arguments (x, s))

$$ \frac{d^2}{dx^2}\hat{V} = (W^{-1}ZYW)\hat{V} \qquad (22) $$

For effective decoupling of the equations to take place, the matrix product in parentheses must be a diagonal matrix:

$$ W^{-1}ZYW = \mathrm{diag}\{\gamma_1^2, \gamma_2^2, \ldots, \gamma_N^2\} \qquad (23) $$

where the diagonal matrix contains the eigenvalues of the product ZY, which correspond to the roots of the characteristic equation

$$ |\gamma_k^2 U - ZY| = 0; \qquad k = 1, 2, \ldots, N \qquad (24) $$
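where U represents the unity matrix (we assume the general case in which there exist N distinct eigenvalues). Having obtained the propagation constants, the solution of (22) can be written in the standard form as

$$ \hat{V}_k(x) = e^{-\gamma_k x} c_{ki} + e^{\gamma_k x} c_{kr}; \qquad k = 1, 2, \ldots, N \qquad (25) $$

where $\hat{V}_k(x)$ represents the $k$th modal voltage and $c_{ki}$, $c_{kr}$ are the corresponding constants, pertaining to the incident and reflected waves, respectively. Eq. (25) can be written in matrix form as

$$ \begin{bmatrix} \hat{V}_1(x) \\ \hat{V}_2(x) \\ \vdots \\ \hat{V}_N(x) \end{bmatrix} = \mathrm{diag}\{e^{-\gamma_1 x}, \ldots, e^{-\gamma_N x}\}\begin{bmatrix} c_{1i} \\ c_{2i} \\ \vdots \\ c_{Ni} \end{bmatrix} + \mathrm{diag}\{e^{\gamma_1 x}, \ldots, e^{\gamma_N x}\}\begin{bmatrix} c_{1r} \\ c_{2r} \\ \vdots \\ c_{Nr} \end{bmatrix} \qquad (26) $$

Defining $E(x) = \mathrm{diag}\{e^{-\gamma_1 x}, \ldots, e^{-\gamma_N x}\}$ and pre-multiplying both sides of (26) by the modal transformation matrix W (from (21)), we can write (26) in terms of circuit voltages as

$$ V(x) = W[E(x)]C_1 + W[E(x)]^{-1}C_2 \qquad (27) $$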
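where $C_1$ and $C_2$ are constant vectors, which can be determined from the terminal currents and voltages (i.e. at x = 0 and x = d). A relationship between the near-end (x = 0) and far-end (x = d) voltages can be derived using (27) as

$$ \begin{bmatrix} V(0) \\ V(d) \end{bmatrix} = \begin{bmatrix} W & W \\ WE(d) & W[E(d)]^{-1} \end{bmatrix}\begin{bmatrix} C_1 \\ C_2 \end{bmatrix} \qquad (28) $$

Next, substituting (27) in (16), we have

$$ -W\Gamma[E(x)]C_1 + W\Gamma[E(x)]^{-1}C_2 = -Z\,I(x); \qquad \Gamma = \mathrm{diag}\{\gamma_1, \ldots, \gamma_N\} \qquad (29) $$

or

$$ I(x) = W_I[E(x)]C_1 - W_I[E(x)]^{-1}C_2; \qquad W_I = Z^{-1}W\Gamma \qquad (30) $$

A relationship between the near-end (x = 0) and far-end (x = d) currents can be derived using (30) as

$$ \begin{bmatrix} I(0) \\ I(d) \end{bmatrix} = \begin{bmatrix} W_I & -W_I \\ W_I E(d) & -W_I[E(d)]^{-1} \end{bmatrix}\begin{bmatrix} C_1 \\ C_2 \end{bmatrix} \qquad (31) $$

Using (28) and (31) and eliminating the constants $C_1$ and $C_2$, we get the representation in terms of y-parameters as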
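$$ \begin{bmatrix} I(0) \\ I(d) \end{bmatrix} = \begin{bmatrix} W_I & -W_I \\ W_I E(d) & -W_I[E(d)]^{-1} \end{bmatrix}\begin{bmatrix} W & W \\ WE(d) & W[E(d)]^{-1} \end{bmatrix}^{-1}\begin{bmatrix} V(0) \\ V(d) \end{bmatrix} \qquad (32) $$

Assume that the multiconductor stamp is required in the standard form shown in Fig. 3. In this case, to account for the current I(d) flowing inwards, the expression for I(d) in (32) must be multiplied by -1. Noting this and simplifying (32) further, we can write the MTL stamp in terms of y-parameters as

$$ \begin{bmatrix} I(0) \\ I(d) \end{bmatrix} = \begin{bmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{bmatrix}\begin{bmatrix} V(0) \\ V(d) \end{bmatrix} \qquad (33) $$

where

$$ Y_{11} = Y_{22} = W_I E_1 W^{-1}; \qquad Y_{12} = Y_{21} = W_I E_2 W^{-1}; $$

$$ E_1 = \mathrm{diag}\left\{\frac{e^{\gamma_k d} + e^{-\gamma_k d}}{e^{\gamma_k d} - e^{-\gamma_k d}}\right\}; \qquad E_2 = \mathrm{diag}\left\{\frac{2}{e^{-\gamma_k d} - e^{\gamma_k d}}\right\}; \qquad (k = 1, 2, \ldots, N) \qquad (34) $$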
Fig. 3. Multiconductor transmission line system

2.2.1 Matrix Exponential Stamp

An alternative form of the MTL stamp is also quite popular; it has the matrix exponential form, which is explained below. Equations (6) and (7) can be written in the hybrid form as

$$ \frac{\partial}{\partial x}\begin{bmatrix} V(x,s) \\ I(x,s) \end{bmatrix} = (D + sE)\begin{bmatrix} V(x,s) \\ I(x,s) \end{bmatrix}; \qquad D = \begin{bmatrix} 0 & -R \\ -G & 0 \end{bmatrix}; \qquad E = \begin{bmatrix} 0 & -L \\ -C & 0 \end{bmatrix} \qquad (35) $$

The solution of (35) can be written as

$$ \begin{bmatrix} V(d,s) \\ I(d,s) \end{bmatrix} = e^{(D + sE)d}\begin{bmatrix} V(0,s) \\ I(0,s) \end{bmatrix} \qquad (36) $$
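A relationship between the forms represented by (33) and (36) can be obtained as follows. Define T(s) as

$$ T(s) = \begin{bmatrix} T_{11} & T_{12} \\ T_{21} & T_{22} \end{bmatrix} = e^{(D + sE)d} \qquad (37) $$

Using some algebraic manipulations, and noting that I(d) flows into the line in (33) but in the direction of increasing x in (36), we can express the relationships between the hybrid parameters (36) and the y-parameters (33) as

$$ \begin{bmatrix} I(0) \\ I(d) \end{bmatrix} = \begin{bmatrix} -T_{12}^{-1}T_{11} & T_{12}^{-1} \\ -T_{21} + T_{22}T_{12}^{-1}T_{11} & -T_{22}T_{12}^{-1} \end{bmatrix}\begin{bmatrix} V(0) \\ V(d) \end{bmatrix} \qquad (38) $$

$$ \begin{bmatrix} V(d) \\ I(d) \end{bmatrix} = \begin{bmatrix} -Y_{12}^{-1}Y_{11} & Y_{12}^{-1} \\ -Y_{21} + Y_{22}Y_{12}^{-1}Y_{11} & -Y_{22}Y_{12}^{-1} \end{bmatrix}\begin{bmatrix} V(0) \\ I(0) \end{bmatrix} \qquad (39) $$

where the currents on the left-hand sides of (38) and (39) follow the conventions of (33) and (36), respectively. Similarly, another useful representation of the MTL stamp is in terms of ABCD parameters, which can be written (with I(d) as in (36)) as

$$ \begin{bmatrix} T_{11} & -U \\ T_{21} & 0 \end{bmatrix}\begin{bmatrix} V(0) \\ V(d) \end{bmatrix} + \begin{bmatrix} T_{12} & 0 \\ T_{22} & -U \end{bmatrix}\begin{bmatrix} I(0) \\ I(d) \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \qquad (40) $$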
In the next section, we will review a generic formulation of distributed interconnect circuit equations, suitable for general purpose circuit simulators.
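As a numerical cross-check of the two stamps of Section 2.2, the scalar (single-line) case can be worked out with the standard library alone. The sketch below is illustrative only: the function names and the line parameters are assumptions, the matrix exponential is evaluated with a simple scaling-and-squaring Taylor series rather than a production routine, and for a single line W reduces to 1 so that W_I = gamma/Z.

```python
import cmath


def matmul2(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)]
            for i in range(2)]


def expm2(m, squarings=10, terms=20):
    """Exponential of a 2x2 matrix: scale, sum a Taylor series, then square."""
    scale = 2.0 ** squarings
    scaled = [[x / scale for x in row] for row in m]
    result = [[1.0 + 0j, 0j], [0j, 1.0 + 0j]]
    term = [[1.0 + 0j, 0j], [0j, 1.0 + 0j]]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul2(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(2)]
                  for i in range(2)]
    for _ in range(squarings):
        result = matmul2(result, result)
    return result


def y_stamp(R, L, G, C, d, s):
    """Scalar version of (33)-(34): returns (Y11, Y12) with Y22=Y11, Y21=Y12."""
    Z = R + s * L
    Y = G + s * C
    gamma = cmath.sqrt(Z * Y)
    e = cmath.exp(-gamma * d)
    w_i = gamma / Z  # scalar W_I = Z^{-1} W Gamma with W = 1
    e1 = (1 + e * e) / (1 - e * e)
    e2 = -2 * e / (1 - e * e)
    return w_i * e1, w_i * e2


def t_stamp(R, L, G, C, d, s):
    """Scalar version of (36)-(37): T = exp((D + sE) d)."""
    Z = R + s * L
    Y = G + s * C
    return expm2([[0j, -Z * d], [-Y * d, 0j]])


if __name__ == "__main__":
    # Assumed example line: 10 ohm/m, 250 nH/m, 100 pF/m, 10 cm, 300 MHz.
    args = (10.0, 250e-9, 0.0, 100e-12, 0.1, 2j * cmath.pi * 0.3e9)
    y11, y12 = y_stamp(*args)
    t = t_stamp(*args)
    print("Y11:", y11, " -T11/T12:", -t[0][0] / t[0][1])
    print("Y12:", y12, " 1/T12:   ", 1 / t[0][1])
```

The two printed pairs should agree, which reflects the first row of the conversion between the hybrid and y-parameter forms (I(0) expressed through V(0) and V(d)).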
3. Formulation of Circuit Equations

The MNA and output equations for lumped linear networks can be written using a generic notation as

$$ W\dot{x}(t) + Gx(t) = Bu(t); \qquad y(t) = L^T x(t) \qquad (41) $$

where B and L are selector matrices, with entries 0 or 1. The superscript T denotes the transpose. Let b(t) = Bu(t). From (41), the MNA equations in the frequency domain can be written as

$$ (G + sW)X(s) = b(s); \qquad Y(s) = L^T X(s) \qquad (42) $$

For the case of nonlinear elements, the MNA equations in (41) can be modified as

$$ W\dot{x}(t) + Gx(t) + F(x(t)) - b(t) = 0; \qquad y(t) = L^T x(t) \qquad (43) $$

where F(x(t)) is a nonlinear function of x.

3.1 Formulation of linear subnetworks containing Distributed Elements
Consider a linear subnetwork $\pi$ containing distributed elements. Using (33), the frequency-domain equations of a distributed subnetwork containing $n_d$ coupled conductors can be written as

$$ Y_d(s)V_d(s) = I_d(s) \qquad (44) $$

where $V_d(s)$ and $I_d(s)$ represent the Laplace-domain terminal voltages and currents of the distributed element, respectively, and $Y_d(s)$ represents the admittance matrix, which has a complex dependency on frequency and is described in terms of the line parameters. Equation (42) representing the lumped linear network can be combined with (44) as

$$ \left[\, sW_\pi + G_\pi + L_d Y_d(s) L_d^T \,\right] V_\pi(s) = b_\pi \qquad (45) $$

where
• $W_\pi, G_\pi \in \Re^{N_\pi \times N_\pi}$ are constant matrices describing the lumped memory and memoryless elements of subnetwork $\pi$, respectively, and $\Re^{N_\pi}$ is the node space of subnetwork $\pi$,
• $L_d$ is the selector matrix which maps the terminal currents of the distributed subnetwork to the nodal space of the linear subnetwork $\pi$,
• $b_\pi \in \Re^{N_\pi}$ is a constant vector with entries determined by the independent voltage/current sources of subnetwork $\pi$,
• $V_\pi(s) \in \Re^{N_\pi}$ is the vector of node voltage waveforms appended by the independent voltage source currents and linear inductor current waveforms of linear subnetwork $\pi$.

Equation (45) can be concisely written as

$$ W(s)X(s) = b(s) \qquad (46) $$
Generic formulation of nonlinear circuits with distributed elements
Consider a general network containing arbitrary number of nonlinear and linear (lumped and distributed) components. For simplicity, let the linear components be grouped into a single linear subnetwork n as shown in Fig. 4. Using (43), without loss of generality, the circuit equations network 0 can be written as % ^ ' )
+ G
^W
+ t
A W + ^ ( 0 ) - » t ( 0 = 0,
139
te[0,T]
for the (47)
M. Nakhla & R. Achar
284
where W,xN.
•
W. G e SR
are constant matrices describing the lumped mem-
ory and memoryless elements of network 0 , respectively, 4.G5R
is a
constant vector with entries determined by the independent voltage and current source, • F(xJ is a function describing the nonlinear elements of the circuit, N
xJt) € 91 * is the vector of node voltage waveforms appended by independent voltage source current, linear inductor current, nonlinear capacitor charge and nonlinear inductor flux waveforms, AL is the total number of variables in the MNA formulation and nn is the total number ports for linear subnetwork TI . • Ln = [/,. •] with elements ie {1, ...,N±},j€
{1, ...,«„}
/, e { 0 , 1}
where
with a maximum of one nonzero in /i
each row or column, is a selector matrix that maps in(t) e 91 the vector of currents entering the linear subnetwork n, into the node space 91^* of the network § , The linear multi-terminal subnetwork n can be characterized in the frequency-domain by its terminal behavior as Yn(s)VK(s) = IK(s) (48) where
Yn(s)
is the y-parameter matrix of subnetwork n,
Vn(s)
is the
vector of terminal voltage nodes that connect the subnetwork to the network <)>, / (s) is the Laplace transform (in(t)).
Recent Advances in Modeling and Simulation
of High-Speed Interconnects
285
Fig. 4. Nonlinear network $\phi$ containing linear subnetwork $\pi$ with distributed elements

3.3 Interconnect simulation issues

Simulation of large interconnect networks is associated with two major bottlenecks: the mixed frequency/time problem and CPU expense.

3.3.1 Mixed frequency/Time Problem

The major difficulty in simulating high-frequency models such as distributed transmission lines is due to the fact that, while described in terms of partial differential equations, they are best represented in the frequency domain (48). As seen, they do not have a direct representation in the time domain. On the other hand, nonlinear devices can only be described in the time domain (47). These simultaneous formulations are difficult to handle with a traditional ordinary differential equation solver such as SPICE [26, 139-150].

3.3.2 CPU Expense
Frequency-domain simulation of large linear networks is conventionally done by solving (42) or (45) at each frequency point using LU decomposition and forward-backward substitution. For time-domain simulation, integration techniques are used to convert a set of time-domain differential equations into a set of difference equations. For example, application of the trapezoidal rule to (43) leads to a nonlinear set of difference equations

$$ \left(G + \frac{2}{\Delta t}W\right)x(t + \Delta t) + F(x(t + \Delta t)) = \left(\frac{2}{\Delta t}W - G\right)x(t) + b(t) + b(t + \Delta t) - F(x(t)) \qquad (49) $$

To solve (49) at each time point, Newton iterations are required, which may need several LU decompositions. Since the W and G matrices for interconnect networks are usually very large, this makes the CPU cost of a time-domain analysis expensive.

The objectives of interconnect simulation algorithms are to address the mixed frequency/time problem as well as to handle large linear circuits without too much CPU expense. Several algorithms have been proposed for this purpose, which are broadly classified into two main categories, as follows:
(a) Approaches based on macromodelling each individual transmission line set. Techniques such as the "method of characteristics" are grouped in this category and are discussed in Section 4.
(b) Approaches based on model-order reduction (such as AWE, CFH, PRIMA) of the entire linear subnetwork containing lumped as well as distributed subnetworks; these are discussed in Sections 5-7.
It is to be noted that the second approach can also be used in conjunction with the first approach.
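To make the cost argument around (49) concrete, the trapezoidal rule with a Newton iteration at every time point can be sketched for a scalar circuit. This is only an illustration under assumed values: the function name, the cubic nonlinearity and the element values are hypothetical, and in a real simulator the scalar division by the Jacobian becomes an LU solve of a large matrix.

```python
def trapezoidal_newton(W, G, F, dF, b, x0, dt, steps, tol=1e-12, max_iter=50):
    """Integrate W*dx/dt + G*x + F(x) = b(t) with the trapezoidal rule (49).

    Scalar version: W, G are numbers, F and dF are the nonlinearity and its
    derivative, b is the source as a function of time. Returns all samples.
    """
    xs = [x0]
    x = x0
    for n in range(steps):
        t = n * dt
        # Right-hand side of (49), known from the previous time point.
        rhs = (2.0 * W / dt - G) * x + b(t) + b(t + dt) - F(x)
        x_new = x  # initial Newton guess: previous solution
        for _ in range(max_iter):
            residual = (G + 2.0 * W / dt) * x_new + F(x_new) - rhs
            jacobian = G + 2.0 * W / dt + dF(x_new)
            delta = residual / jacobian  # an LU solve in the matrix case
            x_new -= delta
            if abs(delta) < tol:
                break
        x = x_new
        xs.append(x)
    return xs


if __name__ == "__main__":
    # Assumed example: 1 nF capacitor, 1 mS conductance, cubic conductance,
    # driven by a 1 mA step; the steady state solves x + x**3 = 1.
    xs = trapezoidal_newton(
        W=1e-9, G=1e-3,
        F=lambda x: 1e-3 * x ** 3, dF=lambda x: 3e-3 * x ** 2,
        b=lambda t: 1e-3, x0=0.0, dt=1e-7, steps=400)
    print("final value:", xs[-1])
```

Each time step repeats the Jacobian solve until the Newton update is small, which is the source of the repeated LU decompositions mentioned above.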
4. Simulation Techniques Based on Transmission Line Macromodels

In this approach, transmission-line networks described by Telegrapher's equations (partial differential equations) are translated into a set of ordinary differential equations (known as the macromodel) through some kind of discretization. The conventional approach for discrete modelling of distributed interconnects is to divide the line into segments of length $\Delta x$, chosen to be a small fraction of the wavelength. If each of these segments is electrically small at the frequencies of interest (i.e., with the line discretized into M segments, $\Delta x = d/M \ll \lambda$, where d is the line length), then each segment can be replaced by a lumped model. Generally, the lumped structures used to discretize MTL equations contain the series elements $L(x)\Delta x$ and $R(x)\Delta x$ and the shunt elements $G(x)\Delta x$ and $C(x)\Delta x$, where L(x), R(x), G(x) and C(x) are the per-unit-length inductance, resistance, conductance and capacitance of the line, respectively (Fig. 1).

4.1 Distributed v/s Lumped: Number of Lumped Segments
It is often of practical interest to know how many lumped segments are required to reasonably approximate a distributed model. For the purpose of illustration, consider LC segments, which can be viewed as low-pass filters. For a reasonable approximation, this filter must pass at least some multiple of the highest frequency $f_{max}$ of the propagating signal (say ten times, $f_0 \ge 10 f_{max}$). In order to relate these, we make use of the 3-dB passband of the LC filter, given by

$$ f_0 = \frac{1}{\pi\sqrt{Ld\,Cd}} = \frac{1}{\pi\tau d} \qquad (50) $$

where d is the length of the line and $\tau = \sqrt{LC}$ is the delay per unit length. From (1), we have $f_{max} = 0.35/t_r$, and using (50), we can express the relation $f_0 \ge 10 f_{max}$ in terms of the delay of the line and the rise time as

$$ \frac{1}{\pi\tau d} \ge 10 \times 0.35/t_r, \quad \text{or} \quad t_r \ge 3.5(\pi\tau d) \approx 10\tau d \qquad (51) $$

In other words, the delay allowed per segment is approximately $0.1\,t_r$. Hence the number of segments (N) is given by

$$ N = (10\tau d)/t_r \qquad (52) $$
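Equation (52) is easy to evaluate for a given line. The snippet below is a small sketch; the function name and the example line parameters are assumptions chosen for illustration, and the result is rounded up to the next integer.

```python
import math


def lumped_segments(L, C, d, t_r):
    """Eq. (52): number of LC segments N = 10 * tau * d / t_r, rounded up.

    L and C are per-unit-length inductance and capacitance, d is the line
    length in the same length unit, and t_r is the signal rise time.
    """
    tau = math.sqrt(L * C)  # delay per unit length
    return math.ceil(10.0 * tau * d / t_r)


if __name__ == "__main__":
    # Assumed example: 5 nH/cm and 1 pF/cm (given per metre), 10 cm line,
    # 0.2 ns rise time.
    print(lumped_segments(L=5e-7, C=1e-10, d=0.1, t_r=0.2e-9))
```

For this assumed line the total delay is about 0.71 ns, so a 0.2 ns rise time already calls for a few dozen segments, which illustrates how quickly the lumped model grows as rise times sharpen.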
In the case of RLC segments, in addition to satisfying (51), the series resistance of each segment must also be accounted for. The series resistance Rd representing the ohmic drop should not lead to an impedance mismatch, which can result in excessive reflection within the segment [2,3].

However, one of the major drawbacks of the above conventional discretization is that it requires a large number of sections, especially for circuits with high operating speeds and sharper rise times. This leads to large circuit sizes, and the simulation becomes CPU inefficient. In order to overcome these difficulties, several techniques for efficient discretization have been proposed in the literature. These methods can be broadly classified, based on the passivity property, as follows.
(1) Macromodels with no guarantee of passivity: an example of such techniques is the Method of Characteristics.
(2) Macromodels with guaranteed passivity by construction: examples of such techniques are the Integrated Congruent Transform and the Exponential Padé based Matrix-Rational Approximation.

4.2 Method of Characteristics

The method of characteristics (MC) transforms the partial differential equations of a transmission line into ordinary differential equations containing time-delayed controlled sources.
Consider the case of a two-conductor transmission line, as shown in Fig. 5a. An analytical solution, in terms of y-parameters, for (6) or (7) can be derived [45] as

$$ I = YV, \qquad \begin{bmatrix} I_1 \\ I_2 \end{bmatrix} = \frac{1}{Z_0(1 - e^{-2\gamma d})}\begin{bmatrix} 1 + e^{-2\gamma d} & -2e^{-\gamma d} \\ -2e^{-\gamma d} & 1 + e^{-2\gamma d} \end{bmatrix}\begin{bmatrix} V_1 \\ V_2 \end{bmatrix} \qquad (53) $$

where $\gamma$ is the propagation constant on the line and $Z_0$ is the characteristic impedance. $V_1$ and $I_1$ are the terminal voltage and current at the near end of the line; $V_2$ and $I_2$ are the terminal voltage and current at the far end of the line. The y-parameters of the transmission line are complex functions of s and in most cases cannot be directly transformed into an ordinary differential equation in the time domain. The MC succeeded in doing such a transformation, but only for lossless transmission lines. Although this method was originally developed in the time domain using what were referred to as characteristic curves (hence the name), a short alternative derivation in the frequency domain will be presented here. By re-arranging the terms in (53) we can write

$$ V_1 = Z_0 I_1 + e^{-\gamma d}\left[2V_2 - e^{-\gamma d}(Z_0 I_1 + V_1)\right] $$
$$ V_2 = Z_0 I_2 + e^{-\gamma d}\left[2V_1 - e^{-\gamma d}(Z_0 I_2 + V_2)\right] \qquad (54) $$

Next, (54) can be re-written as

$$ V_1 - Z_0 I_1 = W_1 $$
$$ V_2 - Z_0 I_2 = W_2 \qquad (55) $$

where

$$ W_1 = e^{-\gamma d}\left[2V_2 - e^{-\gamma d}(Z_0 I_1 + V_1)\right] $$
$$ W_2 = e^{-\gamma d}\left[2V_1 - e^{-\gamma d}(Z_0 I_2 + V_2)\right] \qquad (56) $$

Using (54) and (56), a recursive relation for $W_1$ and $W_2$ can be obtained as

$$ W_1 = e^{-\gamma d}\left[2V_2 - W_2\right] $$
$$ W_2 = e^{-\gamma d}\left[2V_1 - W_1\right] \qquad (57) $$

A lumped model of the transmission line can then be deduced from (54) and (57), as in Fig. 5b.
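For a lossless line the factor $e^{-\gamma d}$ in (57) is a pure delay, so in the time domain each of W1 and W2 is simply the bracketed quantity evaluated one line delay earlier. The following sketch steps such a model with resistive terminations. It is an illustration only: the function name, the terminations and the requirement that the delay be an integer number of time steps are assumptions, not part of the chapter.

```python
def simulate_lossless_line(z0, delay, r_source, r_load, v_source, dt, t_end):
    """Time-step a lossless line using the delayed sources of (55) and (57).

    Port relations (55): v1 - z0*i1 = w1 and v2 - z0*i2 = w2, with i1 and i2
    flowing into the line. The near end is driven by v_source(t) through
    r_source; the far end is loaded by r_load. Returns lists (v1, v2).
    """
    n_delay = max(1, round(delay / dt))
    n_steps = round(t_end / dt)
    w1 = [0.0] * (n_steps + 1 + n_delay)  # line initially at rest
    w2 = [0.0] * (n_steps + 1 + n_delay)
    v1, v2 = [], []
    for n in range(n_steps + 1):
        t = n * dt
        # Near end: v1 = v_source - r_source*i1 combined with v1 = w1 + z0*i1.
        i1 = (v_source(t) - w1[n]) / (r_source + z0)
        v1_n = w1[n] + z0 * i1
        # Far end: v2 = -r_load*i2 combined with v2 = w2 + z0*i2.
        v2_n = w2[n] * r_load / (r_load + z0)
        # Delayed sources: W1 and W2 of (57) one line delay later.
        w1[n + n_delay] = 2.0 * v2_n - w2[n]
        w2[n + n_delay] = 2.0 * v1_n - w1[n]
        v1.append(v1_n)
        v2.append(v2_n)
    return v1, v2


if __name__ == "__main__":
    # Assumed example: 50 ohm line, 1 ns delay, matched source, nearly open
    # far end, unit step input.
    v1, v2 = simulate_lossless_line(50.0, 1e-9, 50.0, 1e9,
                                    lambda t: 1.0, 1e-10, 5e-9)
    print("v1 at t=0:", v1[0], " v2 at t=1ns:", v2[10], " v1 at t=2ns:", v1[20])
```

With a matched source and an open far end, the far-end voltage doubles after one line delay and the near-end voltage settles after two, which is the familiar reflection behaviour reproduced here by nothing more than two delayed controlled sources.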
Fig. 5. Macromodel using Method of Characteristics

If the lines are lossless (in which case the propagation constant is purely imaginary, $\gamma = j\beta$), the frequency-domain expression (57) can be analytically converted into the time domain using the inverse Laplace transform as

$$ w_1(t + \tau) = 2v_2(t) - w_2(t) $$
$$ w_2(t + \tau) = 2v_1(t) - w_1(t) \qquad (58) $$

where $\tau$ here denotes the one-way delay of the line.

4.3 Exponential Padé based Matrix-Rational Approximation
This algorithm directly converts partial differential equations into time-domain macromodels based on Padé rational approximations of exponential matrices [41,42,89]. In this technique, the coefficients describing the macromodel are computed a priori and analytically, using a closed-form Padé approximant of exponential matrices. Since closed-form relations are used, this technique does not suffer from the usual ill-conditioning experienced with the direct application of Padé approximations. Hence it allows a higher order of approximation. It also guarantees the passivity of the resulting macromodel.

Consider the exponential form of Telegrapher's equations describing the multiconductor transmission lines, given by (36),

$$ \begin{bmatrix} V(d,s) \\ I(d,s) \end{bmatrix} = e^{Z}\begin{bmatrix} V(0,s) \\ I(0,s) \end{bmatrix}; \qquad Z = (D + sE)d; \qquad D = \begin{bmatrix} 0 & -R \\ -G & 0 \end{bmatrix}; \qquad E = \begin{bmatrix} 0 & -L \\ -C & 0 \end{bmatrix} \qquad (59) $$

where d is the length of the line. The matrix $e^{Z}$ is approximated using a matrix-rational function as

$$ P_{N,M}(Z)\, e^{Z} \approx Q_{N,M}(Z) \qquad (60) $$

where $P_{N,M}(Z)$ and $Q_{N,M}(Z)$ are polynomial matrices expressed in terms of closed-form Padé rational functions as

$$ P_{N,M}(Z) = \sum_{j=0}^{N} \frac{(M+N-j)!\,N!}{(M+N)!\,j!\,(N-j)!}(-Z)^j; \qquad Q_{N,M}(Z) = \sum_{j=0}^{M} \frac{(M+N-j)!\,M!}{(M+N)!\,j!\,(M-j)!}Z^j \qquad (61) $$

Several recursive relationships exist for $P_{N,M}(Z)$ and $Q_{N,M}(Z)$, such as
$$P_{N,M}(Z) = P_{N-1,M}(Z) - Z\left\{\frac{M}{(M+N)(M+N-1)}\right\}P_{N-1,M-1}(Z)$$
$$P_{N,M}(Z) = P_{N,M-1}(Z) + Z\left\{\frac{N}{(M+N)(M+N-1)}\right\}P_{N-1,M-1}(Z)$$
$$Q_{N,M}(Z) = Q_{N-1,M}(Z) - Z\left\{\frac{M}{(M+N)(M+N-1)}\right\}Q_{N-1,M-1}(Z)$$
$$Q_{N,M}(Z) = Q_{N,M-1}(Z) + Z\left\{\frac{N}{(M+N)(M+N-1)}\right\}Q_{N-1,M-1}(Z) \tag{62}$$
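As a quick numerical check of the closed-form Padé coefficients for the exponential, the following Python sketch evaluates them in the scalar case and forms the ratio of numerator to denominator. It is only an illustration under stated assumptions: the matrix argument $Z$ is replaced by a scalar `z`, the denominator is taken as the polynomial in $(-z)$ of degree $N$ and the numerator as the polynomial in $z$ of degree $M$, and the helper names `pade_coeffs`, `polyval` and `pade_exp` are hypothetical.

```python
from math import exp, factorial


def pade_coeffs(n_den, m_num):
    """Closed-form Pade coefficients of exp: denominator p (degree N), numerator q (degree M)."""
    N, M = n_den, m_num
    p = [factorial(M + N - j) * factorial(N)
         / (factorial(M + N) * factorial(j) * factorial(N - j)) * (-1) ** j
         for j in range(N + 1)]
    q = [factorial(M + N - j) * factorial(M)
         / (factorial(M + N) * factorial(j) * factorial(M - j))
         for j in range(M + 1)]
    return p, q


def polyval(coeffs, z):
    """Evaluate sum_j coeffs[j] * z**j."""
    return sum(c * z ** j for j, c in enumerate(coeffs))


def pade_exp(z, n=3):
    """Diagonal [n/n] approximation of exp(z) as q(z) / p(z)."""
    p, q = pade_coeffs(n, n)
    return polyval(q, z) / polyval(p, z)


approx, exact = pade_exp(0.5), exp(0.5)   # the two values agree closely
```

For matrices the division becomes a linear solve with the denominator polynomial, which is where the a priori knowledge of the coefficients pays off.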
The fact that the coefficients of $P_{N,M}(Z)$ and $Q_{N,M}(Z)$ are known a priori in closed form provides a substantial computational advantage for the proposed algorithm. The Padé rational function of (61) for $M = N = n$ can be represented in terms of subsections, obtained from the pole-zero pairs, as

$$[P_{n,n}(Z)]^{-1}Q_{n,n}(Z) = \prod_{i=1}^{n/2}\left[(a_iU - Z)(a_i^{*}U - Z)\right]^{-1}\left[(a_iU + Z)(a_i^{*}U + Z)\right] \tag{63}$$

for even values of $n$, and

$$[P_{n,n}(Z)]^{-1}Q_{n,n}(Z) = [a_0U - Z]^{-1}[a_0U + Z]\prod_{i=1}^{(n-1)/2}\left[(a_iU - Z)(a_i^{*}U - Z)\right]^{-1}\left[(a_iU + Z)(a_i^{*}U + Z)\right] \tag{64}$$

for odd values of $n$, where $U$ represents the unity matrix, $a_i = x_i + jy_i$ are the complex roots for $i > 0$ and $a_0$ is a real root. The symbol $*$ represents the complex conjugate operation. It is to be noted that the polynomial $P_{N,M}(Z)$ is a strict Hurwitz polynomial. This means that the coefficients $a_i$ and $a_0$ in (63) and (64) are positive definite constants. The hybrid stencil for such an $i$th subsection can be written as

$$[P_{n,n}(Z)]_i\begin{bmatrix} V_{i+1} \\ I_{i+1} \end{bmatrix} = [Q_{n,n}(Z)]_i\begin{bmatrix} V_i \\ I_i \end{bmatrix} \tag{65}$$
Proof of passivity of the macromodel can be found in [41]. Also, the extension of the closed-form matrix-rational approximation based technique to handle frequency-dependent parameters can be found in the literature.

4.4 Delay Extraction
Although algorithms based on the method of characteristics (MC) provide fast solutions for long low-loss lines, they do not guarantee the passivity of the resulting macromodels. This was substantiated by several numerical tests which showed that the MC can lead to non-passive macromodels. Moreover, it has been widely shown in the literature that the transient analysis of a non-passive macromodel with other passive circuit elements may lead to spurious oscillations. On the other hand, algorithms based on matrix rational-function approximations (MRA) guarantee the passivity of macromodels. However, in the presence of large delay lines (e.g., long lines with small losses), they may require high-order approximations (to accurately capture the flat delay portion), leading to inefficient transient simulation.

To address the above issues, a new algorithm for passive macromodeling of transmission line subnetworks was recently introduced. It employs a mechanism for delay reduction prior to performing the matrix rational approximation. The new algorithm leads to significantly lower order macromodels for long lossy coupled lines. A brief discussion of the algorithm is given below. Using perturbation and assuming that $\|A\| \ll \|s_{max}B\|$ (where $s_{max}$ corresponds to the maximum frequency of interest), (36) can be approximated as [160]

$$e^{(A+sB)} \approx e^{sB}\prod_{k=1}^{m} e^{C_k}, \qquad (C_k = f(A,B)) \tag{66}$$
where $\|C_1\| \gg \|C_2\| \gg \dots \gg \|C_m\|$. It is to be noted that the Lie product formula [161] provides a systematic alternative approach to obtain (66) with an error estimation ($\varepsilon_m$) and is given by

$$e^{(A+sB)} = \prod_{k=1}^{m} Q_k + \varepsilon_m; \qquad Q_k = e^{A/m}\,e^{sB/m} \tag{67}$$
Next, a theorem is introduced which enables more accurate delay extraction by modifying the Lie product formula given in (67).

Theorem: The product

$$\prod_{k=1}^{m} Q_k + \varepsilon_m; \qquad Q_k = e^{A/(2m)}\,e^{sB/m}\,e^{A/(2m)} \tag{68}$$

converges asymptotically to $e^{(A+sB)}$ as $m \to \infty$. The associated error ($\varepsilon_m$) in this case is given by

[Equation (69): error estimate $\varepsilon_m$ for (68); not legible in the source.]

Equation (68) henceforth is referred to as Modified Lie Formula - I. If $\|A\| \ll \|s_{max}B\|$ (which is the case for long low-loss lines), then an alternative form for (68) can be used with a reduced error, and is given by (referred to as Modified Lie Formula - II):

$$\prod_{k=1}^{m} Q_k + \varepsilon_m; \qquad Q_k = e^{sB/(2m)}\,e^{A/m}\,e^{sB/(2m)} \tag{70}$$
Also, it can be proved that the average of the approximations in (68) and (70), as given by (referred to as Modified Lie Formula - III):

$$\prod_{k=1}^{m} Q_k + \varepsilon_m; \qquad Q_k = \tfrac{1}{2}\left(e^{A/(2m)}\,e^{sB/m}\,e^{A/(2m)} + e^{sB/(2m)}\,e^{A/m}\,e^{sB/(2m)}\right) \tag{71}$$

further reduces the error (note that exact error estimates of each of these formulae have been derived; however, they are not given here due to lack of space). Fig. 6 demonstrates numerically the accuracy comparisons of Lie's formula and the proposed approximations (Modified Lie Formula - I, II and III) for a typical set of line parameters.
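The benefit of the symmetric (modified) products over the plain Lie product can be observed numerically on a small example. The Python sketch below compares the $m$-fold product of $e^{A/m}e^{B/m}$ with the two symmetric forms $e^{A/2m}e^{B/m}e^{A/2m}$ and $e^{B/2m}e^{A/m}e^{B/2m}$. Everything specific in it is an assumption made for illustration: the 2x2 matrices `A` and `B` are arbitrary non-commuting stand-ins (with `B` playing the role of $sB$ at a real value of $s$), the matrix exponential is a truncated Taylor series, and all function names are hypothetical.

```python
def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def scale(a, c):
    return [[c * x for x in row] for row in a]


def expm(a, terms=30):
    """Truncated Taylor series of exp(a); adequate for the small norms used here."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        term = scale(matmul(term, a), 1.0 / n)
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result


def power(a, m):
    out = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(m):
        out = matmul(out, a)
    return out


def max_err(x, y):
    return max(abs(x[i][j] - y[i][j]) for i in range(2) for j in range(2))


def splitting_errors(a, b, m):
    """Max-entry error of the Lie product and of the two symmetric products."""
    exact = expm([[a[i][j] + b[i][j] for j in range(2)] for i in range(2)])
    ea, eb = expm(scale(a, 1.0 / m)), expm(scale(b, 1.0 / m))
    ea2, eb2 = expm(scale(a, 0.5 / m)), expm(scale(b, 0.5 / m))
    return {
        "lie": max_err(power(matmul(ea, eb), m), exact),
        "modified_I": max_err(power(matmul(matmul(ea2, eb), ea2), m), exact),
        "modified_II": max_err(power(matmul(matmul(eb2, ea), eb2), m), exact),
    }


A = [[0.0, -0.2], [-0.1, 0.0]]   # hypothetical "loss" part
B = [[0.0, -1.0], [-1.0, 0.0]]   # hypothetical "delay" part (real s for simplicity)
errors = splitting_errors(A, B, 8)
```

In this example the symmetric products are noticeably closer to the exact exponential than the plain Lie product for the same order `m`.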
4.4.1 Frequency-Dependent Parameters

In the case of frequency-dependent parameters, it is ideal to extract a maximum delay without affecting the transmission line causality conditions. Let $\tilde{B}(s) = B(s) - B_{max}$. In general, a logical choice for $B_{max}$ is $B(\infty)$. Next, (68) and (70) can be modified as:

$$\prod_{k=1}^{m} Q_k + \varepsilon_m; \qquad Q_k = e^{\frac{A(s)+s\tilde{B}(s)}{2m}}\;e^{\frac{sB_{max}}{m}}\;e^{\frac{A(s)+s\tilde{B}(s)}{2m}} \tag{72}$$

$$\prod_{k=1}^{m} Q_k + \varepsilon_m; \qquad Q_k = e^{\frac{sB_{max}}{2m}}\;e^{\frac{A(s)+s\tilde{B}(s)}{m}}\;e^{\frac{sB_{max}}{2m}} \tag{73}$$
The products represented by (72) or (73) can be viewed as a cascade of $m$ transmission lines. In addition, each $k$th product term can be viewed as a cascade of lossy and lossless transmission lines. The lossy terms are macromodeled using the passive matrix rational approximation. The resulting macromodels are of significantly lower order (since a significant delay portion is already extracted from these terms). They are later combined with the lossless terms using the method of characteristics approach. For example, each $Q_k$ in (73) can be realized as shown in Fig. 7. It is to be noted that passivity of the entire macromodel is now guaranteed, as the passivity of each sub-line in (73) is preserved.

Based on the knowledge of the norms of the $A$ and $B$ line parameter matrices and the maximum frequency of interest, the approximation represented by (72) or (73) is selected, and the required order ($m$) satisfying the predefined error tolerance can be determined using (69). It is to be noted that for relatively short lossy lines, delay extraction may reduce the efficiency of the MRA macromodel. Based on the knowledge of line parameters and error estimates of MRA and the modified Lie formulae, a criterion has been developed to select the appropriate macromodel (i.e., MRA or MRA with delay extraction).

[Fig. 6. Comparison of Error Estimates of Lie Formula and Modified Lie Formulae - I, II and III. Horizontal axis: order (m); curves: Lie Formula, Modified Lie Formula I, Modified Lie Formula II, Modified Lie Formula III.]

[Fig. 7. Macromodel Realization of the Product Terms in (73): cascade of $e^{sB_{max}/2m}$ (Lossless), $e^{(A(s)+s\tilde{B}(s))/m}$ (MRA) and $e^{sB_{max}/2m}$ (Lossless).]
5. Model-Reduction Based Simulation Algorithms

Interconnect networks generally tend to have a large number of poles, spread over a wide frequency range. Even though the majority of these poles normally have very little effect on simulation results, they make the simulation CPU-intensive by forcing the simulator to take smaller step sizes.

5.1 Dominant Poles
Dominant poles are in general those which are close to the imaginary axis and significantly influence the time as well as the frequency characteristics of the system. The moment-matching techniques (MMTs) [77] capitalize on the fact that, irrespective of the presence of a large number of poles in a system, only the dominant poles are sufficient to accurately characterize a given system. A brief mathematical description of the underlying concepts of moment-matching techniques is given below. Consider a single-input/single-output system and let $H(s)$ be the transfer function. $H(s)$ can be represented in a rational form as

$$H(s) = \frac{P(s)}{Q(s)} \tag{74}$$

where $P(s)$ and $Q(s)$ are polynomials in $s$. Equivalently, (74) can be written as

$$H(s) = c + \sum_{i=1}^{N_p}\frac{k_i}{s - p_i} \tag{75}$$

where $p_i$ and $k_i$ are the $i$th pole-residue pair, $N_p$ is the total number of system poles and $c$ is the direct coupling constant. The time-domain impulse response can be computed in closed form using the inverse Laplace transform as

$$h(t) = c\,\delta(t) + \sum_{i=1}^{N_p} k_i e^{p_i t} \tag{76}$$
In the case of large networks, $N_p$, the total number of poles, can be of the order of thousands. Generating all the $N_p$ poles is highly CPU intensive even for a small network, and for large networks it is completely impractical. Model-reduction techniques address the above issue by deriving a reduced-order approximation $\hat{H}(s)$ in terms of dominant poles, instead of trying to compute all the poles of a system. Assuming that only $L$ dominant poles were extracted which give a reasonably good approximation to the original system, equation (74) and the corresponding approximate frequency and time responses can be written as

$$H(s) \approx \hat{H}(s) = \frac{\hat{P}(s)}{\hat{Q}(s)} = \hat{c} + \sum_{i=1}^{L}\frac{\hat{k}_i}{s - \hat{p}_i} \tag{77}$$

$$h(t) \approx \hat{h}(t) = \hat{c}\,\delta(t) + \sum_{i=1}^{L}\hat{k}_i e^{\hat{p}_i t} \tag{78}$$
5.2 Moments of the Response

Consider the Taylor series expansion of a given transfer function, $H(s)$, at the point $s = 0$,

$$H(s) \approx \hat{H}(s) = H(0) + s\,\frac{H^{(1)}(s)}{1!}\bigg|_{s=0} + s^2\,\frac{H^{(2)}(s)}{2!}\bigg|_{s=0} + \dots + s^n\,\frac{H^{(n)}(s)}{n!}\bigg|_{s=0} \tag{79}$$

where the superscript $(n)$ denotes the $n$th derivative. Using a simpler notation, we can rewrite (79) as

$$H(s) \approx \hat{H}(s) = m_0 + m_1 s + m_2 s^2 + \dots + m_n s^n = \sum_{i=0}^{n} s^i m_i; \qquad m_i = \frac{H^{(i)}(s)}{i!}\bigg|_{s=0} \tag{80}$$
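The coefficients of the Taylor series expansion ($m_i$) are also identical to the time-domain moments of the impulse response $h(t)$. This can be easily seen by using the Laplace transform of $h(t)$:

$$H(s) = \int_0^\infty h(t)e^{-st}\,dt = \int_0^\infty h(t)\left[1 - st + \frac{s^2t^2}{2!} - \dots\right]dt$$
$$= \int_0^\infty h(t)\,dt + s\int_0^\infty (-1)\,t\,h(t)\,dt + s^2\int_0^\infty \frac{t^2}{2!}\,h(t)\,dt + \dots = \sum_{i=0}^{\infty} s^i\,\frac{(-1)^i}{i!}\int_0^\infty t^i h(t)\,dt \tag{81}$$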
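The correspondence between Taylor coefficients and time-domain moments can be checked on a single-pole example. The Python sketch below assumes an impulse response $h(t) = k e^{pt}$ with $p < 0$ (so $H(s) = k/(s-p)$, whose Maclaurin coefficients are $-k/p^{i+1}$) and compares them with $(-1)^i/i!$ times the numerically integrated time moments. The function names, the midpoint rule and the integration limits are choices made for this illustration only.

```python
from math import exp, factorial


def taylor_moment(k, p, i):
    """i-th Maclaurin coefficient of k / (s - p)."""
    return -k / p ** (i + 1)


def time_moment(k, p, i, t_end=40.0, steps=20000):
    """(-1)^i / i! times the integral of t^i * k*exp(p*t), by the midpoint rule."""
    dt = t_end / steps
    total = 0.0
    for n in range(steps):
        t = (n + 0.5) * dt
        total += t ** i * k * exp(p * t)
    return (-1) ** i / factorial(i) * total * dt


pairs = [(taylor_moment(2.0, -1.5, i), time_moment(2.0, -1.5, i)) for i in range(3)]
```

Each pair in `pairs` holds the same moment computed in the frequency domain and in the time domain.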
Due to this analogy, the coefficients of the Taylor series expansion ($m_i$) are generally referred to as moments. It has been shown that the moments provide an estimation of delay and rise times [59, 60]. Elmore delay [59], which approximates the mid-point of the monotonic step response waveform by the mean of the impulse response, essentially matches the first moment of the response. This can be considered as one of the basic forms of approximation. However, in order to get an accurate prediction of interconnect effects, it is essential that the reduced-order model match (or preserve) as many moments as possible.

Several algorithms can be found in the literature for reduction of large interconnect subnetworks. They can be broadly classified into two categories: (1) approaches based on explicitly matching the moments to a reduced-order model, and (2) approaches based on implicitly matching the moments. Techniques such as AWE belong to the first category and are discussed in Section 6. Techniques such as PVL and PRIMA, which are based on Krylov subspace formulation, belong to the second category and are discussed in Section 7.
6. Model-Reduction Based on Explicit Moment-Matching

These techniques employ Padé approximation, based on explicit moment-matching, to extract the dominant poles and residues of a given system [61-65].

6.1 Padé Approximation

Consider a system transfer function $H(s)$ which is approximated by a rational function $\hat{H}(s)$ as

$$H(s) \approx \hat{H}(s) = \frac{P_L(s)}{Q_M(s)} = \frac{a_0 + a_1 s + a_2 s^2 + \dots + a_L s^L}{1 + b_1 s + \dots + b_M s^M} \tag{82}$$

where $a_0, \dots, a_L, b_1, \dots, b_M$ are the unknowns (total of $L + M + 1$ variables).
Consider the Taylor series expansion of $H(s)$ at $s = 0$ in terms of moments, and match it to the rational function approximation given in (82) (hence the name moment-matching techniques (MMTs), also known as Padé approximation) as follows:

$$\frac{a_0 + a_1 s + a_2 s^2 + \dots + a_L s^L}{1 + b_1 s + \dots + b_M s^M} = m_0 + m_1 s + m_2 s^2 + \dots + m_{L+M}\,s^{L+M} \tag{83}$$
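Cross-multiplying and equating the coefficients of similar powers of $s$, starting from $s^{L+1}$ to $s^{L+M}$, on both sides of (83), we can evaluate the denominator polynomial coefficients as

$$\begin{bmatrix} m_{L-M+1} & m_{L-M+2} & \cdots & m_L \\ m_{L-M+2} & m_{L-M+3} & \cdots & m_{L+1} \\ \vdots & \vdots & & \vdots \\ m_L & m_{L+1} & \cdots & m_{L+M-1} \end{bmatrix}\begin{bmatrix} b_M \\ b_{M-1} \\ \vdots \\ b_1 \end{bmatrix} = -\begin{bmatrix} m_{L+1} \\ m_{L+2} \\ \vdots \\ m_{L+M} \end{bmatrix} \tag{84}$$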
The numerator coefficients can be found by equating the remaining powers of $s$ (from $s^0$ to $s^L$) as

$$a_0 = m_0, \qquad a_1 = m_1 + b_1 m_0, \qquad \dots, \qquad a_L = m_L + \sum_{i=1}^{\min(L,M)} b_i\,m_{L-i} \tag{85}$$
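The two steps above (solve a small linear system for the denominator, then read off the numerator) can be sketched for the lowest non-trivial case $L = 1$, $M = 2$. The Python example below is an illustration only: it builds the moments of a known two-pole function, solves the 2x2 system by Cramer's rule and recovers the poles from the denominator. The function names and the test function $1/(s+1) + 3/(s+5)$ are assumptions made for this sketch.

```python
import cmath


def moments(poles, residues, count):
    """m_n = -sum_i k_i / p_i^(n+1) for a pole-residue model with c = 0."""
    return [-sum(k / p ** (n + 1) for p, k in zip(poles, residues))
            for n in range(count)]


def pade_l1_m2(m):
    """[1/2] Pade coefficients from the moments m0..m3."""
    # Powers s^2 and s^3:  m0*b2 + m1*b1 = -m2  and  m1*b2 + m2*b1 = -m3.
    det = m[0] * m[2] - m[1] * m[1]
    b2 = (-m[2] * m[2] + m[1] * m[3]) / det
    b1 = (-m[0] * m[3] + m[1] * m[2]) / det
    # Powers s^0 and s^1 give the numerator.
    a0 = m[0]
    a1 = m[1] + b1 * m[0]
    return (a0, a1), (b1, b2)


def pade_poles(b1, b2):
    """Roots of 1 + b1*s + b2*s^2."""
    root = cmath.sqrt(b1 * b1 - 4.0 * b2)
    return [(-b1 + root) / (2.0 * b2), (-b1 - root) / (2.0 * b2)]


m = moments([-1.0, -5.0], [1.0, 3.0], 4)
(a0, a1), (b1, b2) = pade_l1_m2(m)
poles = pade_poles(b1, b2)   # recovers the two poles of the test function
```

Because the test function has exactly two poles, the [1/2] approximant reproduces it; for larger systems the same steps return only an approximation of the dominant poles.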
Equations (84) and (85) yield an approximate transfer function in terms of rational polynomials. Alternatively, an equivalent pole-residue model can be found as follows. Poles $p_i$ are obtained by applying a root-solving algorithm on the denominator polynomial $Q(s)$. In order to obtain $k_i$, the approximate transfer function given by (77) is expanded using a Maclaurin series as

$$\hat{H}(s) = c - \sum_{n=0}^{\infty} s^n \sum_{i=1}^{L}\frac{k_i}{p_i^{\,n+1}} \tag{86}$$

Comparing $\hat{H}(s)$ from equations (80) and (86), we note that

$$m_0 = c - \sum_{i=1}^{L}\frac{k_i}{p_i}, \qquad m_n = -\sum_{i=1}^{L}\frac{k_i}{p_i^{\,n+1}} \quad (n \ge 1) \tag{87}$$

Residues can be evaluated by writing the equations in (87) in a matrix form as

$$\begin{bmatrix} p_1^{-1} & p_2^{-1} & \cdots & p_L^{-1} & -1 \\ p_1^{-2} & p_2^{-2} & \cdots & p_L^{-2} & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ p_1^{-L-1} & p_2^{-L-1} & \cdots & p_L^{-L-1} & 0 \end{bmatrix}\begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_L \\ c \end{bmatrix} = -\begin{bmatrix} m_0 \\ m_1 \\ \vdots \\ m_L \end{bmatrix} \tag{88}$$

In the above equations $c$ represents the direct coupling between input and output. There are more exact ways to compute $c$.

6.2 Computation of Moments
Having outlined the concept of MMTs, we need to evaluate the moments of the system, which are required by (84) - (88). Consider the simple case of lumped circuits and the corresponding MNA equations represented by (42). Expanding the vector $X(s)$ using a Taylor series, we have

$$[G + sW]\,[M_0 + M_1 s + M_2 s^2 + \dots] = b \tag{89}$$

where $M_i$ represents the $i$th moment-vector. Equating coefficients of similar powers of $s$ on both sides of (89), we obtain the following relationships
$$GM_0 = b, \qquad GM_i = -WM_{i-1}; \quad i > 0 \tag{90}$$
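The moment recursion for lumped circuits (one factorization of $G$, then repeated solves with $-WM_{i-1}$ on the right-hand side) can be sketched on a two-node RC ladder. In the Python example below the circuit and its values ($R_1 = 1$, $R_2 = 2$, $C_1 = 3$, $C_2 = 4$, unit source through $R_1$) are hypothetical, the 2x2 solve by Cramer's rule merely stands in for the LU factorization, and the function names are invented for illustration.

```python
def solve2(g, rhs):
    """Solve a 2x2 system by Cramer's rule (stand-in for reusing one LU factorization)."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [(rhs[0] * g[1][1] - g[0][1] * rhs[1]) / det,
            (g[0][0] * rhs[1] - g[1][0] * rhs[0]) / det]


def moment_vectors(g, w, b, count):
    """Moment vectors from G*M0 = b and G*Mi = -W*M(i-1)."""
    ms = [solve2(g, b)]
    for _ in range(1, count):
        prev = ms[-1]
        rhs = [-(w[i][0] * prev[0] + w[i][1] * prev[1]) for i in range(2)]
        ms.append(solve2(g, rhs))
    return ms


# Nodal equations of the ladder: conductance matrix, capacitance matrix, source.
G = [[1.0 / 1.0 + 1.0 / 2.0, -1.0 / 2.0], [-1.0 / 2.0, 1.0 / 2.0]]
W = [[3.0, 0.0], [0.0, 4.0]]
b = [1.0 / 1.0, 0.0]
M = moment_vectors(G, W, b, 3)
elmore_node2 = -M[1][1] / M[0][1]   # R1*(C1 + C2) + R2*C2 for this ladder
```

The first moment at the far node reproduces the Elmore delay of the ladder, which ties this recursion back to the delay estimate mentioned in Section 5.2.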
Equation (90) gives a closed-form relationship for the computation of moments. The moments of a particular output of interest (which are represented by $m_i$ in equations (82) - (88)) are picked from the moment-vectors $M_i$. As seen, (90) requires only one LU decomposition and a few forward/backward substitutions during the recursive computation of higher-order moments. Since the major cost involved in linear circuit simulation is due to LU decomposition, MMTs yield a very high speed advantage (100 to 1000 times) compared to conventional simulators.

6.3 Generalized Computation of Moments
In the case of networks containing transmission lines, moment computation is not straightforward. A generalized relation for recursive computation of higher-order moments can be derived as follows [71, 74]. Consider (46); expanding $\Psi(s)$ and $X(s)$ in Taylor series at an expansion point $s = \alpha$, we have [74]

$$\left[\Psi(\alpha) + \frac{\Psi^{(1)}(\alpha)}{1!}(s-\alpha) + \dots + \frac{\Psi^{(n)}(\alpha)}{n!}(s-\alpha)^n\right]\left[M_0 + M_1(s-\alpha) + \dots + M_n(s-\alpha)^n\right] = b \tag{91}$$

where $\Psi^{(n)}$ denotes the $n$th derivative of $\Psi(s)$, and $M_n$ denotes the $n$th moment of $X(s)$ at $s = \alpha$. Equating coefficients of similar powers of $(s - \alpha)$ on both sides of (91), we have

$$\Psi(\alpha)M_0 = b, \qquad \Psi(\alpha)M_n = -\sum_{r=1}^{n}\frac{\Psi^{(r)}(\alpha)}{r!}\,M_{n-r} \tag{92}$$

Generalizing (92), a recursive relation for any $n$th higher-order moment can be obtained as

$$[\Psi(\alpha)]M_0 = b, \qquad [\Psi(\alpha)]M_n = -\sum_{r=1}^{n}\frac{\Psi^{(r)}(\alpha)}{r!}\,M_{n-r} \tag{93}$$
It can be seen that the coefficient matrix on the left-hand side of (93) does not change during higher-order moment computation. Hence it requires only one LU decomposition and $n$ forward-backward substitutions to compute $n$ moments. Also, it is easy to note that lumped networks are a special case of (93) (where $\Psi^{(r)} = 0$ for $r \ge 2$, in which case (93) reduces to the form given by (90)). Relation (93) requires the derivatives of $\Psi$. These can be obtained using (45) as

[Equation (94): block-matrix expressions for $\Psi^{(1)}$ and for $\Psi^{(r)}$, $r \ge 2$; not legible in the source.]

Using (45), transmission line moments can be computed. The derivatives Tp can be obtained as a function of the derivatives of the entries on the RHS of (38) and proper application of Leibnitz's theorem. However, this requires the derivatives of the exponential stamp represented by (37). A brief review of the computation of these derivatives is given below.

6.4 Transmission Line Moments
Consider the exponential stamp represented by (37). We wish to expand the exponential matrix in a Taylor series, as follows:

$$e^{(D+sE)d} = F_0 + F_1 s + \dots + F_n s^n \tag{95}$$

From the property of matrix exponentiation of an arbitrary matrix $A$, we have

$$e^{A} = I + \frac{A}{1!} + \frac{A^2}{2!} + \dots + \frac{A^n}{n!} + \dots \tag{96}$$

Let

$$A = (D + sE)d \tag{97}$$

Hence (96) can be rewritten as

$$e^{(D+sE)d} = I + \frac{(D+sE)d}{1!} + \frac{((D+sE)d)^2}{2!} + \dots + \frac{((D+sE)d)^n}{n!} + \dots \tag{98}$$
Expanding the RHS of (98) further, and collecting the terms in powers of $s$, we have

$$e^{(D+sE)d} = \left[I + \frac{Dd}{1!} + \frac{D^2d^2}{2!} + \dots\right] + s\left[\frac{Ed}{1!} + \frac{(DE + ED)d^2}{2!} + \frac{(D^2E + DED + ED^2)d^3}{3!} + \dots\right] + s^2\left[\frac{E^2d^2}{2!} + \frac{(DE^2 + EDE + E^2D)d^3}{3!} + \dots\right] + \dots \tag{99}$$

Equating (95) with (99) gives

$$F_0 = I + \frac{Dd}{1!} + \frac{D^2d^2}{2!} + \dots = F_{0,0} + F_{0,1} + F_{0,2} + \dots$$
$$F_1 = \frac{Ed}{1!} + \frac{(DE + ED)d^2}{2!} + \frac{(D^2E + DED + ED^2)d^3}{3!} + \dots = F_{1,0} + F_{1,1} + F_{1,2} + \dots$$
$$F_2 = \frac{E^2d^2}{2!} + \frac{(DE^2 + EDE + E^2D)d^3}{3!} + \dots = F_{2,0} + F_{2,1} + F_{2,2} + \dots \tag{100}$$

and so on. From the above results, a recursive relationship for generating transmission line moments can be obtained as
$$F_i = \sum_{j=0}^{\infty} F_{i,j}; \qquad F_{i,j} = \frac{(DF_{i,j-1} + EF_{i-1,j})\,d}{i+j} \quad (i + j \ne 0)$$
$$F_{i,j} = 0 \quad (i < 0 \ \text{or}\ j < 0); \qquad F_{0,0} = I \tag{101}$$
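The double recursion for the line moments can be checked in the scalar case, where the exponential is known in closed form: for scalar $D$ and $E$ the $i$th moment is $e^{Dd}(Ed)^i/i!$. The Python sketch below builds the terms $F_{i,j}$ row by row and sums each row. It also forms full-line moments from half-line moments by a Cauchy product, which is the series equivalent of squaring the half-line expansion. The scalar restriction, the truncation at 40 terms and the function names are assumptions of this illustration.

```python
from math import exp, factorial


def line_moments(D, E, d, order, terms=40):
    """Scalar moments F_0..F_order of exp((D + s*E)*d) from the F_{i,j} recursion."""
    moments_out = []
    prev_row = None
    for i in range(order + 1):
        row = []
        for j in range(terms):
            if i == 0 and j == 0:
                row.append(1.0)                          # F_{0,0} = 1
                continue
            left = row[j - 1] if j > 0 else 0.0          # F_{i,j-1}
            up = prev_row[j] if i > 0 else 0.0           # F_{i-1,j}
            row.append((D * left + E * up) * d / (i + j))
        moments_out.append(sum(row))
        prev_row = row
    return moments_out


def square_moments(phi):
    """Moments of the squared series: F_i = sum_j phi_j * phi_(i-j)."""
    return [sum(phi[j] * phi[i - j] for j in range(i + 1)) for i in range(len(phi))]


D, E, d = -0.3, -0.7, 1.2
full = line_moments(D, E, d, 3)
from_halves = square_moments(line_moments(D, E, d / 2.0, 3))
closed_form = [exp(D * d) * (E * d) ** i / factorial(i) for i in range(4)]
```

In the matrix case the same loops apply with matrix products in place of the scalar multiplications, keeping the order of the factors as written in the recursion.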
Convergence of (101) in practice requires 20-30 terms. It is to be noted that the convergence of the series represented by (96) can suffer if, for the first few terms, $A^n$ grows quicker than $n!$. In order to control this problem, note that the growth of $A^n$ depends on its eigenvalues. If all the eigenvalues of $A$ are within the unit circle in the complex plane, then $A^n$ will decay with increasing $n$, leading to fast convergence. From (97) one can see that the eigenvalues of $A$ can be controlled by varying the length $d$. By restricting $d$ to be small enough, the eigenvalues of $(D + sE)d$ will also be small (over a given frequency range), so as not to cause truncation errors or slow convergence. This can be achieved efficiently by noting that

$$e^{(D+sE)d} = e^{(D+sE)\frac{d}{2}}\;e^{(D+sE)\frac{d}{2}} \tag{102}$$

In other words, moments of a line can be generated by squaring half-line moments. Let $\Phi_i$ represent the half-line moments; then

$$e^{(D+sE)d} = F_0 + F_1 s + \dots + F_n s^n = (\Phi_0 + \Phi_1 s + \dots + \Phi_n s^n)^2 \tag{103}$$

which will give

$$F_i = \sum_{j=0}^{i}\Phi_j\,\Phi_{i-j} \tag{104}$$

The line can be subdivided by powers of 2 (i.e., 2 sections, 4 sections, 8 sections, ...) and the moments of the smallest section that meets the convergence requirements are calculated. From these, the moments of the entire line can be recursively calculated with the help of (104).
6.5 Limitations of Single Expansion MMT Algorithms

Obtaining a lower-order approximation of the network transfer function using a single Padé expansion is commonly referred to as Asymptotic Waveform Evaluation (AWE) in the literature. However, due to the inherent limitations of Padé approximants, MMTs based on a single expansion often give inaccurate results. The following is a list of those properties which have the most impact on MMTs.

• The matrix in (84) (which is known as a Toeplitz matrix) becomes increasingly ill-conditioned as its size increases. This implies that one can only expect to detect 6 to 8 accurate poles from a single expansion.
• Padé often produces unstable poles in the right half of the complex plane.
• Padé accuracy deteriorates as we move away from the expansion point.
• Padé provides no estimates for error bounds.

In addition, there is no guarantee that the reduced model obtained as above is passive. Passivity implies that a network cannot generate more energy than it absorbs, and no passive termination of the network will cause the system to go unstable. The loss of passivity can be a serious problem because transient simulations of reduced networks may encounter artificial oscillations. In systems containing distributed elements the number of dominant poles will be significantly higher, and it is very difficult to capture all of them with a single Padé expansion. This led to the development of multi-point expansion techniques such as complex frequency hopping (CFH), which are summarized in the next section.

6.6 Complex Frequency Hopping
CFH extends the process of moment matching to multiple expansion points (hops) in the complex plane near or on the imaginary axis using a binary search algorithm. With a minimized number of frequency point expansions, enough information is obtained to enable the generation of an approximate transfer function that matches the original function up to a predefined highest frequency of interest. Using the information from all the expansion points, CFH extracts a dominant pole set as illustrated in Fig. 8(b). In addition, CFH provides an error criterion for the selection of accurate poles and transfer functions.

6.6.1 Selection and Minimization of Hops in CFH
A Padé approximation is accurate only near the point of expansion, and its accuracy decreases as we move away from the point of expansion (hop). In order to validate the accuracy of such an approximation, at least two expansion points are necessary. Accuracies of these two expansions can be verified by matching the poles generated at these two hops (referred to as the pole-matching based approach). Alternatively, the two hops can be verified for their accuracy by comparing the values of the transfer functions due to both these hops at a point intermediate to them (referred to as the transfer-function based approach). CFH relies on a binary search algorithm to determine the expansion points and to minimize the number of expansions. The steps involved in the binary search algorithm for both the above approaches are summarized below.

[Fig. 8. Illustration of CFH: (a) dominant poles from AWE, with the radius of convergence of the Padé expansion at s = 0; (b) dominant poles from CFH. Horizontal axis: Real.]

6.6.2 Transfer Function Based Approach
In this approach the transfer functions obtained at various hops (expansions) are used to ensure the accuracy of the reduced-order model up to the highest frequency of interest. Steps involved in the algorithm are given in Fig. 9. Note that the computational effort needed for a comparison as required by Step 5 is trivial, as the responses can be computed in closed form using the transfer functions generated in Steps 2 and 3. Here $\varepsilon_{th}$ is a predefined threshold relative error in the transfer functions. At the completion of the binary search algorithm, a set of transfer functions is generated. When evaluating the frequency response at a frequency point $a$, only the transfer function which is valid in the region containing $a$ is used. This is repeated for all other frequency points to obtain the frequency response of the system.

6.6.3 Pole-Matching Based Approach
In this approach, poles of the transfer function are explicitly evaluated at each hop and the hops are verified for their accuracy by comparing the poles from two adjacent hops using a binary search algorithm. If a matching pole is found between two adjacent expansions, then the binary search is stopped. The distance between the matching pole and the expansion point under consideration defines the radius of accuracy for the corresponding expansion. All the poles which are within the radius of accuracy are treated as accurate poles and are retained in the final pole set. The poles which are outside the radius of accuracy are considered inaccurate and are discarded. Once a set of dominant poles is obtained, residues of the system are obtained using (88). Further details of CFH and its search algorithms can be found in [74] and [75].
Step 1: Set $f_L = 0$ and $f_H = f_{max}$.
Step 2: Expand system's response at frequency $f_L = 0$. Determine the coefficients of the corresponding transfer function $H_L(s)$ using (84) and (85).
Step 3: Expand system's response at $f_H = f_{max}$. Determine the coefficients of the corresponding transfer function $H_H(s)$.
Step 4: Set $f_{mid} = \frac{1}{2}(f_L + f_H)$. Calculate $H_L(j2\pi f_{mid})$ and $H_H(j2\pi f_{mid})$ using the transfer function coefficients obtained in Steps 2 and 3.
Step 5: If $|H_H(j2\pi f_{mid}) - H_L(j2\pi f_{mid})| < \varepsilon_{th}$, go to Step 6. Otherwise expand at $f_{mid}$ and obtain $H_{mid}(s)$.
Step 6: If the threshold condition specified by Step 5 is satisfied, STOP. Else repeat Steps 2-5 between every two consecutive frequency points (e.g., between $f_L$ & $f_{mid}$ and $f_{mid}$ & $f_H$).

Fig. 9. Transfer function based binary search algorithm
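The binary search of Fig. 9 can be sketched in a few lines of Python. The sketch is not the CFH implementation: the routine `expand_at` is a hypothetical stand-in that returns a local model valid near one hop, and here it is realized as a truncated Taylor expansion of a known pole-residue function rather than as a Padé approximant. The guard `min_gap` and all names are likewise assumptions made for illustration.

```python
from math import pi


def make_expander(poles, residues, order=6):
    """Hypothetical local-model factory: Taylor expansion about s0 = j*2*pi*f."""
    def expand_at(f):
        s0 = 2j * pi * f
        coeffs = [-sum(k / (p - s0) ** (n + 1) for p, k in zip(poles, residues))
                  for n in range(order + 1)]
        return lambda s: sum(c * (s - s0) ** n for n, c in enumerate(coeffs))
    return expand_at


def binary_search_hops(expand_at, f_max, eps_th, min_gap=1e-6):
    """Return {hop frequency: local model}, refined until neighbours agree."""
    models = {0.0: expand_at(0.0), f_max: expand_at(f_max)}    # Steps 1-3
    pending = [(0.0, f_max)]
    while pending:
        f_lo, f_hi = pending.pop()
        f_mid = 0.5 * (f_lo + f_hi)                            # Step 4
        s_mid = 2j * pi * f_mid
        mismatch = abs(models[f_hi](s_mid) - models[f_lo](s_mid))
        if mismatch < eps_th or f_hi - f_lo < min_gap:         # Step 5
            continue
        models[f_mid] = expand_at(f_mid)                       # new hop
        pending += [(f_lo, f_mid), (f_mid, f_hi)]              # Step 6
    return dict(sorted(models.items()))


expander = make_expander([-1 + 10j, -1 - 10j], [1.0, 1.0])
hops = binary_search_hops(expander, 3.0, 1e-3)
```

The returned dictionary lists the hop frequencies in ascending order; every pair of neighbouring models agrees at the midpoint between them to within the threshold.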
7. Model-Reduction Based on Krylov-subspace Techniques

The direct moment-matching techniques such as AWE have some disadvantages. The first is the ill-conditioning associated with the moment matrix. Due to this difficulty, the number of good poles that can be extracted from any expansion point is generally fewer than 10. The second major difficulty is that they do not guarantee the passivity of reduced-order models. In order to address these difficulties, a parallel class of algorithms, which can be classified as indirect moment-matching techniques, was developed [79-90].

These algorithms are based on what is known as Krylov-subspace formulation and congruent transformation. One of the main features of these algorithms is that they construct the reduced model based on the extraction of leading eigenvalues (those with the largest magnitude) of a given system (in contrast, the reduced models from the CFH technique are based on extracting the dominant poles of a given system). In the rest of this section, we will describe the concept and important features of these algorithms.

7.1 Preliminaries
Recall from Section 3 that the time-domain MNA and the corresponding output equations can be represented in the form:

$$C\dot{x}(t) + Gx(t) = Bu(t); \qquad w = L^{T}x(t)$$
$$C, G \in \Re^{n\times n}; \qquad B \in \Re^{n\times 1}; \qquad L \in \Re^{n\times 1}; \qquad x \in \Re^{n\times 1} \tag{105}$$

where $n$ represents the total number of MNA variables. Pre-multiplying both sides of (105) by $G^{-1}$, we can write

$$A\dot{x}(t) = x(t) - Ru(t); \qquad w = L^{T}x(t); \qquad A = -G^{-1}C, \quad R = G^{-1}B \tag{106}$$

Taking the Laplace transform of (106), we can write

$$sAX(s) = X(s) - RU(s); \qquad W(s) = L^{T}X(s) \tag{107}$$

Rearranging (107), we can write the transfer function $Y(s)$ of the given system as

$$Y(s) = \frac{W(s)}{U(s)} = L^{T}(I - sA)^{-1}R \tag{108}$$

where $I$ is an identity matrix.
7.1.1 Why is direct Padé based approximation (moment-matrix) ill-conditioned?

Consider the transfer function of a system, as represented by (108). Expanding it in terms of a Taylor series, we have

$$Y(s) \approx L^{T}(I + sA + s^2A^2 + s^3A^3 + \dots + s^qA^q)R = \sum_{k=0}^{q} s^k(L^{T}A^{k}R) = \sum_{k=0}^{q} s^k m_k, \qquad \text{where } m_k = L^{T}A^{k}R \tag{109}$$
Ideally, increasing the order of the Padé approximation (which is equivalent to matching a larger number of moments) should give better approximation results. However, in practice, this is true only up to a very limited order, beyond which the Padé approximation will not yield any better results [79, 159]. This can be explained by examining the nature of higher-order moments, which are given by $m_k = L^{T}A^{k}R$. As can be seen, when successive moments are explicitly calculated, they are obtained as powers of $A$. With increasing values of $k$, this process quickly converges to an eigenvector corresponding to the eigenvalue of $A$ with the largest magnitude. As a result, for relatively large values of $k$, the explicitly calculated moments $m_k, m_{k+1}, m_{k+2}, \dots$ will not add any extra information to the moment matrix, as all of them contain information only about the largest eigenvalue. In other words, the rows beyond $k$ of the moment matrix are almost identical (or parallel to each other), making the matrix ill-conditioned.

7.1.2 Relationship between eigenvalues and poles of the system

In this section we will show the correspondence between the leading eigenvalues and poles of the system. It is important to understand this concept, as the Krylov-subspace based techniques obtain the reduced models by extracting the leading eigenvalues of a given system. Consider (106), and assume that the matrix $A$ can be diagonalized in the form

$$A = F\lambda F^{-1} \tag{110}$$
where $\lambda = \mathrm{diag}[\lambda_1\ \lambda_2\ \dots\ \lambda_n]$ is a diagonal matrix whose diagonal elements represent the eigenvalues of matrix $A$. The matrix $F$ contains the eigenvectors of matrix $A$. Using (110), the transfer function represented by (108) can be rewritten as

$$Y(s) = L^{T}(I - sF\lambda F^{-1})^{-1}R = L^{T}F(I - s\lambda)^{-1}F^{-1}R = L^{T}F\,\mathrm{diag}\!\left[\frac{1}{1 - s\lambda_1}\ \dots\ \frac{1}{1 - s\lambda_n}\right]F^{-1}R \tag{111}$$

which can be simplified as

$$Y(s) = \sum_{i=1}^{n}\frac{\eta_i}{1 - s\lambda_i} = \sum_{i=1}^{n}\frac{-(\eta_i/\lambda_i)}{s - (1/\lambda_i)} = \sum_{i=1}^{n}\frac{k_i}{s - p_i} \tag{112}$$

where $\eta_i$ is a function of the eigenvectors of matrix $A$ and $k_i$ represent the residues. From (112), we can draw the following inferences: (a) poles $p_i$ are the reciprocals of the eigenvalues of matrix $A$; the leading eigenvalues (those with the largest magnitudes) correspond to the poles closest to the origin; (b) the transfer function $Y(s)$ can be easily obtained in terms of poles and residues, once the eigenvalues and eigenvectors of $A$ are available. However, for large interconnect circuits, it would be impractical to compute all the eigenvalues and eigenvectors. Hence in the following sections, we will review some of the efficient techniques to extract leading eigenvalues.
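The reciprocal relation between the eigenvalues of $A = -G^{-1}C$ and the system poles can be verified on a small circuit. The Python sketch below uses a hypothetical two-node RC ladder ($R_1 = 1$, $R_2 = 2$, $C_1 = 3$, $C_2 = 4$), forms $A$ explicitly, computes its two eigenvalues from the characteristic quadratic and checks that their reciprocals are roots of $\det(G + pC) = 0$. The circuit values and function names are assumptions of this illustration.

```python
from math import sqrt


def eig2_real(a):
    """Eigenvalues of a 2x2 matrix with a real spectrum (true for this RC example)."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = sqrt(tr * tr - 4.0 * det)
    return [(tr + disc) / 2.0, (tr - disc) / 2.0]


def minus_ginv_c(g, c):
    """A = -inverse(G) * C for 2x2 matrices."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    ginv = [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]
    return [[-sum(ginv[i][k] * c[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def pencil_det(g, c, p):
    """det(G + p*C); zero exactly at the system poles."""
    m = [[g[i][j] + p * c[i][j] for j in range(2)] for i in range(2)]
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


G = [[1.5, -0.5], [-0.5, 0.5]]
C = [[3.0, 0.0], [0.0, 4.0]]
eigenvalues = eig2_real(minus_ginv_c(G, C))
poles = [1.0 / lam for lam in eigenvalues]
```

In this example the eigenvalue of largest magnitude maps to the pole nearest the origin, in line with inference (a).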
7.2 Computation of Eigenvalues of Matrix 'A'

In general, the numerical computation of all the eigenvalues and eigenvectors of a given matrix $A$ becomes exceedingly expensive as its size gets above a few hundred. The general approach in such cases is to approximate $A$ with a smaller matrix $\hat{A}$, such that the eigenvalues of $\hat{A}$ are a reasonable approximation of the leading eigenvalues of $A$. Due to the relatively small size of $\hat{A}$, finding its eigenvalues will be a much simpler problem than finding the eigenvalues of $A$. Next, we will review some of the basic matrix forms which would be helpful in understanding the eigenvalue computation algorithms presented in this section.

Upper Hessenberg Matrix: A matrix $H$ is called upper Hessenberg if $H_{ij} = 0$ for $i > j + 1$. For example, consider an upper Hessenberg matrix of order $q$, having the following form (which is known as the companion form)

$$H = \begin{bmatrix} 0 & 0 & \cdots & 0 & -c_1 \\ 1 & 0 & \cdots & 0 & -c_2 \\ 0 & 1 & \cdots & 0 & -c_3 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & -c_q \end{bmatrix} \tag{113}$$

One of the important advantages of the above companion form is that its characteristic polynomial, $p(x)$, can be analytically computed and is given by

$$p(x) = x^{q} + \sum_{i=1}^{q} c_i\,x^{i-1} \tag{114}$$

The roots of $p(x)$ give the eigenvalues of $H$.

Orthogonal Matrices: A real square matrix $Q$ is orthogonal if $Q^{T} = Q^{-1}$. Essentially this implies

$$QQ^{T} = Q^{T}Q = I \tag{115}$$
All columns $q_i$ (or rows) of orthogonal matrices have unit two-norms, $\|q_i\|_2 = 1$ (which implies that $q_i^{T}q_i = 1$), and are orthogonal to one another (which means that $q_i^{T}q_j = 0$ for $i \ne j$).

QR decomposition: Let $K$ be an $m \times n$ matrix with $m \ge n$. Suppose that $K$ has full column rank. Then there exist a unique $m \times n$ orthogonal matrix $Q$ and a unique upper triangular matrix $R$ with positive diagonals ($r_{ii} > 0$) such that $K = QR$. There are several techniques (such as the modified Gram-Schmidt orthogonalization process) available in the literature for this purpose [159].
Next, consider the circuit equations (106), and a simple similarity transformation as follows

$$AK = KH_q \tag{116}$$

where the transformation matrix $K$ is defined as

$$K = [R\ \ AR\ \ \dots\ \ A^{q-1}R] \tag{117}$$

and $H_q$ has the upper Hessenberg companion form discussed above. Obviously, since $H_q$ is related to the matrix $A$ through a similarity transformation, its eigenvalues are the same as those of $A$. Although it looks straightforward, this approach has the following limitations: computation of $H_q$ using the relation (116) ($H_q = K^{-1}AK$) requires the inverse of the matrix $K$. However, $K$ is a dense matrix and hence computation of its inverse will be expensive. Also, $K$ is likely to be ill-conditioned since the columns of $K$ are formed based on the sequence $A^{i}R$, which, as shown in Section 7.1, quickly converges to the eigenvector corresponding to the largest eigenvalue. In the next section, we will describe general techniques to overcome these problems. These algorithms belong to a class of methods known as Krylov-subspace techniques.
Recent Advances in Modeling and Simulation
7.3
of High-Speed Interconnects
315
Krylov-Subspace methods for Iterative Computation of Eigenvalues
We will start by replacing the matrix $K$ in (116) with an orthogonal matrix $Q$ such that, for all $q$, the leading $q$ columns of $K$ and $Q$ span the same space. This space is called a Krylov subspace, and is denoted by $\mathcal{K}(A, R, q)$. In other words, any vector which is a linear combination of the leading $q$ columns of $K$ can also be expressed as a linear combination of the leading $q$ columns of $Q$. Mathematically we will express this as

$$\mathcal{K}(A, R, q) = \mathrm{ColumnSpace}\,[\, R \quad AR \quad \cdots \quad A^{q-1}R \,] = \mathrm{ColumnSpace}\,[Q] \qquad (118)$$
In contrast to matrix $K$, the matrix $Q$ has the following advantages:
• $Q$ is well conditioned,
• It is easy to invert since $Q^{-1} = Q^T$,
• Most importantly, we can compute only as many leading columns of $Q$ as needed to get an accurate solution (more details about this are covered later in this section).

The next question is, how do we get the matrix $Q$? This can be achieved by performing QR decomposition on matrix $K$. Writing $K = QR$, we can modify (116) as

$$H_q = K^{-1} A K = (QR)^{-1} A (QR) = (R^{-1} Q^T) A (QR)$$
$$\Rightarrow \quad Q^T A Q = R H_q R^{-1} = H \qquad (119)$$
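Relation (119) can be checked numerically on a small example. The sketch below assumes a $4 \times 4$ test matrix and a starting vector of ones (both arbitrary illustrative choices) and takes the Krylov matrix to be square so that $K^{-1}$ exists. It forms the companion matrix $H_q = K^{-1} A K$, the QR factors of $K$, and compares $Q^T A Q$ with $R H_q R^{-1}$.

```python
import numpy as np

n = 4
# Assumed test matrix with eigenvalues 1, 2, 3, 4 (upper triangular).
A = np.diag([1.0, 2.0, 3.0, 4.0]) + 0.1 * np.triu(np.ones((n, n)), k=1)
r = np.ones(n)

# Square Krylov matrix K = [r, A r, A^2 r, A^3 r].
cols = [r]
for _ in range(n - 1):
    cols.append(A @ cols[-1])
K = np.column_stack(cols)

H_q = np.linalg.solve(K, A @ K)          # companion form K^{-1} A K, see (116)
Q, R = np.linalg.qr(K)                   # K = Q R
H = Q.T @ A @ Q                          # left-hand side of (119)
H_from_Hq = R @ H_q @ np.linalg.inv(R)   # right-hand side of (119)

below_subdiag = np.tril(H, k=-2)         # must vanish for upper Hessenberg H

print(np.allclose(H, H_from_Hq, atol=1e-6))
print(np.allclose(below_subdiag, 0.0, atol=1e-8))
print(np.sort(np.linalg.eigvals(H).real))  # eigenvalues of A are preserved
```

Explicitly inverting $K$ and $R$ is done here only to mirror the algebra; it is exactly the step that the Krylov-subspace approach avoids in practice.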
Since $R$ and $R^{-1}$ are both upper triangular and $H_q$ is upper Hessenberg, it is easy to prove that the new matrix, $H = R H_q R^{-1}$, is also upper Hessenberg. The implication of (119) is that we can reduce the matrix $A$ of
dimension $n \times n$ to a smaller upper Hessenberg matrix $H$ of dimension $q \times q$ using an orthogonal transformation. In addition, the eigenvalues of the smaller system $H$ are approximations of the first $q$ leading eigenvalues of the larger system represented by $A$. Next, we will show that the columns of $Q$ can be computed one at a time, giving us the advantage of computing only as many leading columns of $Q$ as needed. One of the popular approaches used for partial reduction of a large matrix to a smaller upper Hessenberg matrix by computing $Q$ is known in the literature as Arnoldi's algorithm [82-89, 159]. More details about this are given in the next section.

7.4 Arnoldi Algorithm for (partial) Reduction

Assume $Q = [\, q_1 \quad q_2 \quad \cdots \quad q_q \,]$, where $q_i$ represents the $i$th column of matrix $Q$. From (119) we have

$$A Q = Q H \qquad (120)$$
Recall that all columns $q_i$ (or rows) of orthogonal matrices have $\|q_i\|_2 = 1$ (which implies that $q_i^T q_i = 1$) and are orthogonal to one another (which means that $q_i^T q_j = 0$ for $i \neq j$). Using this information, the first few steps in obtaining the $Q$ and $H$ matrices are outlined below. Since $\|q_1\|_2 = 1$, an easy way to compute $q_1$ is to divide the vector $R$ by its magnitude $\|R\|_2$ (we get a unit vector in the direction of $R$). This step is illustrated in Fig. 10a.

$$q_1 = R / \|R\|_2 \qquad (121)$$
To determine $q_2$ and the first column of $H$, we multiply $A$ by the first column of $Q$. This gives us $A q_1$, which is the first column on the LHS of (120). Equating it with the first column of the RHS, we have

$$A q_1 = h_{11} q_1 + h_{21} q_2 \qquad (122)$$

Premultiplying both sides by $q_1^T$ we have

$$q_1^T A q_1 = h_{11}\, q_1^T q_1 + h_{21}\, q_1^T q_2 = h_{11} \qquad (123)$$
Knowing the value of $h_{11}$ and using the fact that $\|q_2\|_2 = 1$, we can compute $h_{21}$ from (122) as

$$h_{21} = \| A q_1 - h_{11} q_1 \|_2 \qquad (124)$$

The direction for $q_2$ can be obtained using (122) as (illustrated in Fig. 10b)

$$q_2 = \frac{A q_1 - h_{11} q_1}{h_{21}} \qquad (125)$$
Fig. 10. Illustration of steps in Arnoldi algorithm: (a) normalization of $R$ to obtain $q_1$; (b) construction of $q_2$ from $A q_1$.
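To make steps (121)-(125) concrete, here is a small worked example. The $2 \times 2$ matrix and the starting vector are arbitrary illustrative choices and are not taken from the original text.

```latex
\begin{align*}
A &= \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}, \qquad
R = \begin{bmatrix} 3 \\ 4 \end{bmatrix}, \qquad \|R\|_2 = 5 \\
q_1 &= R / \|R\|_2 = \begin{bmatrix} 0.6 \\ 0.8 \end{bmatrix}
  && \text{by (121)} \\
A q_1 &= \begin{bmatrix} 2.0 \\ 3.0 \end{bmatrix}, \qquad
h_{11} = q_1^T A q_1 = 0.6(2.0) + 0.8(3.0) = 3.6
  && \text{by (123)} \\
A q_1 - h_{11} q_1 &= \begin{bmatrix} 2.0 - 2.16 \\ 3.0 - 2.88 \end{bmatrix}
  = \begin{bmatrix} -0.16 \\ 0.12 \end{bmatrix}, \qquad
h_{21} = \sqrt{0.16^2 + 0.12^2} = 0.2
  && \text{by (124)} \\
q_2 &= \frac{A q_1 - h_{11} q_1}{h_{21}}
  = \begin{bmatrix} -0.8 \\ 0.6 \end{bmatrix}
  && \text{by (125)}
\end{align*}
```

As expected, $q_1^T q_2 = 0.6(-0.8) + 0.8(0.6) = 0$ and $\|q_2\|_2 = 1$.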
Similarly, the rest of the columns of the $Q$ and $H$ matrices can be obtained by generalizing the above steps. Note that we did not need to explicitly compute the products $A^i R$. As a result, we were able to avoid the ill-conditioning problem arising due to the quick convergence of the sequence $[\, R \quad AR \quad A^2R \quad A^3R \quad \cdots \,]$ to the eigenvector of the largest eigenvalue. The columns $q_i$ computed by the Arnoldi algorithm are called Arnoldi vectors. In the general step $j$, a working vector $z = A q_j$ is formed; the loop over $i$ updating $z$ corresponds to the modified Gram-Schmidt algorithm, which subtracts the components in the directions $q_1$ to $q_j$ away from $z$, leaving $z$ orthogonal to them. Computing a total of $k$ Arnoldi vectors costs $k$ matrix-vector multiplications involving $A$, plus $O(k^2 n)$ related cost.
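The generalization of steps (121)-(125) to all $q$ columns can be sketched as follows. This is a minimal Python/NumPy illustration under stated assumptions, not the authors' implementation: the function name `arnoldi`, the breakdown tolerance, and the random test data are hypothetical choices. The inner loop over `i` is the modified Gram-Schmidt update of the working vector `z` described above.

```python
import numpy as np

def arnoldi(A, r, q, tol=1e-12):
    """Sketch of the Arnoldi process with modified Gram-Schmidt.

    Returns Q (n x q, orthonormal columns) and H (q x q, upper Hessenberg)
    with H = Q^T A Q. The tolerance `tol` is an assumed breakdown threshold.
    """
    n = r.shape[0]
    Q = np.zeros((n, q))
    H = np.zeros((q, q))
    Q[:, 0] = r / np.linalg.norm(r)            # (121)
    for j in range(q):
        z = A @ Q[:, j]                        # one matrix-vector product
        for i in range(j + 1):                 # modified Gram-Schmidt loop
            H[i, j] = Q[:, i] @ z              # component of z along q_i
            z = z - H[i, j] * Q[:, i]          # remove that component
        if j + 1 < q:
            h = np.linalg.norm(z)              # generalizes (124)
            if h < tol:                        # Krylov subspace exhausted
                return Q[:, : j + 1], H[: j + 1, : j + 1]
            H[j + 1, j] = h
            Q[:, j + 1] = z / h                # generalizes (125)
    return Q, H

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 30))              # assumed test matrix
r = rng.standard_normal(30)                    # assumed starting vector
Q, H = arnoldi(A, r, 6)

print(np.allclose(Q.T @ Q, np.eye(6)))         # Arnoldi vectors are orthonormal
print(np.allclose(H, Q.T @ A @ Q))             # H = Q^T A Q
print(np.allclose(np.tril(H, k=-2), 0.0))      # H is upper Hessenberg
```

Only products of $A$ with a single vector are needed, so $A$ may be stored in sparse form and the powers $A^i R$ are never formed.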
There are several alternative methods available in the literature for finding the Krylov subspace. For example, one can use multiple passes of orthogonalization to increase the robustness of the modified Gram-Schmidt orthogonalization process. To recap, we started with the circuit equations $C \dot{x}(t) + G x(t) = b(t)$ and $w = L^T x(t)$. We formed the product $A = -G^{-1} C$. Using an orthogonal transformation, we were able to determine the leading eigenvalues of $A$, which correspond to the dominant poles of the transfer function. In the following section, we will show how to use this information to perform circuit reduction.

7.5 Circuit Reduction Using Arnoldi Algorithm
Finding the reduced-order circuit equations can be explained by a change of variables in (105), mapping the vector $x$ of dimension $n$ into a smaller vector $\hat{x}$ of dimension $q$ ($q \ll n$) using the orthogonal matrix $Q$:

$$x = Q \hat{x} \qquad (126)$$

Using (126) we can rewrite the Laplace-domain circuit equations in (106) as

$$s A Q \hat{X}(s) = Q \hat{X}(s) - R\, U(s)$$
$$W(s) = L^T Q \hat{X}(s) \qquad (127)$$
Pre-multiplying both sides of (127) by $Q^T$ and using the relation $Q^T Q = I$, we have

$$\hat{X}(s) = (I - s Q^T A Q)^{-1} Q^T R\, U(s)$$
$$W(s) = L^T Q (I - s Q^T A Q)^{-1} Q^T R\, U(s) \qquad (128)$$

Hence the transfer function of the reduced system can be written as

$$\hat{Y}(s) = L^T Q (I - s H)^{-1} Q^T R \qquad (129)$$
Comparing the original transfer function $Y(s)$ represented by (108) with the transfer function $\hat{Y}(s)$ of the reduced system represented by (129), we can draw the following conclusions. The poles of $\hat{Y}(s)$ are determined by the eigenvalues of $H$: from (129), each nonzero eigenvalue $\lambda$ of $H$ contributes a pole at $s = 1/\lambda$. Since the eigenvalues of $H$ are good approximations of the leading eigenvalues of $A$, the poles of the reduced system are good approximations of the dominant poles of the original transfer function. An important criterion during the above reduction is the accuracy of the response of the reduced system given by (129). The frequency response of the reduced system (129) is also a good approximation of the frequency response of the original transfer function (108). An indicator of the accuracy of the response of the reduced system is the total number of moments it can preserve (match) for a given order of reduction ($q$). It can be proved that the reduced system (129) of order $q$ preserves the first $q$ moments of the original network. In essence, we are able to implicitly match the moments and obtain a reduced model without the need to directly use the moments as in the AWE algorithm. Hence we will not suffer from the numerical ill-conditioning associated with direct moment-matching algorithms. The accuracy of the Arnoldi approximation gradually increases as the order $q$ is increased, since more moments of the original transfer function will be matched.
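The reduction (126)-(129) and the moment-matching property can be illustrated on a small test system. Everything specific in the sketch below is an assumption made for illustration: the RC-ladder-like matrices `G` and `C`, the input and output selectors `b` and `L`, the definitions $A = -G^{-1}C$ and $R = G^{-1}b$, the full transfer function taken as $Y(s) = L^T (I - sA)^{-1} R$, and the moments taken as $L^T A^k R$. The compact `arnoldi` helper omits breakdown handling.

```python
import numpy as np

def arnoldi(A, r, q):
    """Compact Arnoldi sketch (no breakdown handling): Q is n x q, H = Q^T A Q."""
    n = r.shape[0]
    Q = np.zeros((n, q))
    H = np.zeros((q, q))
    Q[:, 0] = r / np.linalg.norm(r)
    for j in range(q):
        z = A @ Q[:, j]
        for i in range(j + 1):               # modified Gram-Schmidt
            H[i, j] = Q[:, i] @ z
            z = z - H[i, j] * Q[:, i]
        if j + 1 < q:
            H[j + 1, j] = np.linalg.norm(z)
            Q[:, j + 1] = z / H[j + 1, j]
    return Q, H

# Hypothetical RC-ladder-like system (illustrative values only).
n, q = 20, 5
G = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
C = np.diag(np.linspace(1.0, 2.0, n))
b = np.zeros(n); b[0] = 1.0                  # input at the first node
L = np.zeros(n); L[-1] = 1.0                 # output at the last node

A = -np.linalg.solve(G, C)                   # assumed A = -G^{-1} C
R = np.linalg.solve(G, b)                    # assumed R = G^{-1} b
Q, H = arnoldi(A, R, q)

def Y_full(s):
    """Assumed original transfer function L^T (I - sA)^{-1} R."""
    return L @ np.linalg.solve(np.eye(n) - s * A, R)

def Y_reduced(s):
    """Reduced transfer function of (129)."""
    return (L @ Q) @ np.linalg.solve(np.eye(q) - s * H, Q.T @ R)

s = 1e-3j
err = abs(Y_full(s) - Y_reduced(s))
moments_full = [L @ np.linalg.matrix_power(A, k) @ R for k in range(q)]
moments_reduced = [(L @ Q) @ np.linalg.matrix_power(H, k) @ (Q.T @ R)
                   for k in range(q)]

print(err)
for mf, mr in zip(moments_full, moments_reduced):
    print(mf, mr)                            # first q moments agree
```

The reduced model has order 5 instead of 20, and its first five moments coincide with those of the full system to rounding error, in line with the statement above.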
A question that may arise here is how the accuracies of the Arnoldi-based approximation and the direct Padé-based approximation compare. It was shown in Section 8 that a Padé approximation of order $q$ matches the first $2q$ moments. However, an Arnoldi-based reduction of order $q$ matches only the first $q$ moments. Essentially, this means that, for comparable accuracy, the reduced model from Arnoldi will have double the size of the reduced model from the direct Padé-based approximation (in other words, direct Padé-based models are more compact). On the other hand, due to ill-conditioning, the direct Padé-based approximation cannot achieve higher-order approximations, whereas the Arnoldi-based approximation can.
8. Related Topics and Further Reading

In addition to the interconnect simulation algorithms discussed here, there are several related topics which may be of interest to the reader.

8.1 Passivity Preservation
Passivity implies that a network cannot generate more energy than it absorbs, and that no passive termination of the network will make the system unstable. Passivity is an important property because macromodels that are stable but not passive can lead to unstable systems when connected to other passive systems. The loss of passivity can be a serious problem because transient simulations of nonpassive networks may encounter artificial oscillations. This is illustrated in Fig. 11, which shows the transient response of a reduced-order macromodel of a large linear RLC circuit when connected to an external load of 50 kΩ. Several algorithms have been proposed in the literature for preservation of passivity during the reduction of interconnect networks [85-92].
Fig. 11. Transient response of a nonpassive macromodel with passive terminations (horizontal axis: time in seconds).

8.2 Full-Wave Models
At further sub-nanosecond rise times, the line cross-section or the nonuniformities become a significant fraction of the wavelength, and field components in the direction of propagation can no longer be neglected. Consequently, full-wave models, which take into account all possible field components and satisfy all boundary conditions, are required to give an accurate estimation of high-frequency effects. However, circuit simulation of full-wave models is highly involved. The information obtained through a full-wave analysis is in terms of field parameters such as propagation constant and characteristic impedance. A circuit simulator requires the information in terms of currents, voltages and circuit impedances. This demands a generalized method to combine modal results into circuit simulators in terms of full-wave stamps. Solution techniques and moment generation schemes for such cases are available in the literature.
8.3 Measured Data
In practice, it may not be possible to obtain accurate analytical models for interconnects because of geometric inhomogeneity and associated discontinuities. To handle such situations, modeling techniques based on measured data have been proposed in the literature [12-14, 100-112]. In general, the behavior of high-speed interconnects can easily be represented by measured frequency-dependent scattering parameters or time-domain terminal measurements. However, handling measured data in circuit simulation is a tedious and computationally expensive process.

8.4 EMI Subnetworks
Electrically long interconnects function as spurious antennas that pick up emissions from other nearby electronic systems. This makes susceptibility to emissions a major concern for designers of high-frequency products. Hence the availability of interconnect simulation tools that include the effect of incident fields is becoming an important design requirement. Analysis techniques for interconnects subjected to external EM interference, and for radiation analysis of interconnects, are available in the literature.

8.5 Sensitivity Analysis
Sensitivity analysis involving large interconnect subnetworks can be highly CPU intensive. Model-reduction-based approaches provide an efficient means for this purpose [113-117].

Acknowledgements

The work was partially done while the first author was visiting the Institute for Mathematical Sciences (IMS) and Institute of High Performance Computing (IHPC) in 2003. The visit was supported by the Institute, the National University of Singapore and IHPC.
References

1. H. B. Bakoglu, Circuits, Interconnections and Packaging for VLSI, Addison-Wesley, Reading, MA, 1990.
2. H. W. Johnson and M. Graham, High-Speed Digital Design, NJ: Prentice-Hall, 1993.
3. M. Nakhla and R. Achar, Complete Multimedia Book Series on Signal Integrity, Omniz Global Knowledge Corporation, Ottawa, ON, 2002.
4. W. W. M. Dai (Guest Editor), "Special issue on simulation, modeling, and electrical design of high-speed and high-density interconnects," IEEE Transactions on Circuits and Systems, vol. 39, no. 11, pp. 857-982, Nov. 1992.
5. M. Nakhla and A. Ushida (Guest Editors), "Special issue on modelling and simulation of high-speed interconnects," IEEE Transactions on Circuits and Systems, vol. 39, no. 11, pp. 857-982, May 2000.
6. M. Nakhla and R. Achar, Interconnect Modelling and Simulation, Chapter XVII: The VLSI Handbook, pp. 17.1-17.29, Editor: W-K Chen, Boca Raton: CRC Press, 2000.
7. E. Chiprout and M. Nakhla, Asymptotic Waveform Evaluation and Moment Matching for Interconnect Analysis, Boston: Kluwer Academic Publishers, 1993.
8. A. Deutsch, "Electrical characteristics of interconnections for high-performance systems," Proceedings of the IEEE, vol. 86, no. 2, pp. 315-355, Feb. 1998.
9. R. Goyal, "Managing signal integrity," IEEE Spectrum, pp. 54-62, Mar. 1994.
10. J. B. Faria, Multiconductor Transmission Line Structures, NY: John Wiley and Sons Inc., 1993.
11. N. Nakhla, A. Dounavis, R. Achar and M. Nakhla, "DEPACT: Delay Extraction and Passive Macromodeling of Lossy Coupled Transmission Lines," IEEE Transactions on Advanced Packaging, pp. 13-23, Feb. 2005.
12. D. Saraswat, R. Achar and M. Nakhla, "Passivity Verification and Compensation of Macromodels from Measured Data," IEEE Transactions on Very Large Scale Integration, July 2005.
13. D. Saraswat, R. Achar and M. Nakhla, "A Fast Algorithm and Practical Considerations for Passive Macromodeling of Measured/Simulated Data," IEEE Transactions on Advanced Packaging, pp. 57-70, Feb. 2004.
14. D. Saraswat, R. Achar and M. Nakhla, "Passive Reduction Algorithm for RLC Interconnect Circuits with Embedded State-Space Systems (PRESS)," IEEE Transactions on Microwave Theory and Techniques, pp. 2215-2226, October 2004.
15. C. Paul, Analysis of Multiconductor Transmission Lines, NY: John Wiley and Sons Inc., 1994.
16. C. Paul, Introduction to Electromagnetic Compatibility, New York: John Wiley, 1992.
17. K. C. Gupta and R. Garg, Microstrip Lines and Slotlines, Boston, Artech House, 1996.
18. A. E. Ruehli, "Equivalent circuit models for three dimensional multiconductor systems," IEEE Trans. Microwave Theory Tech., vol. 22, no. 3, pp. 216-224, Mar. 1974.
19. A. E. Ruehli, "Inductance calculations in a complex integrated circuit environment," IBM Journal of Research and Development, pp. 470-481, Sept. 1972.
20. P. K. Wolff and A. E. Ruehli, "Inductance computations for complex three dimensional geometries," IEEE Trans. on Circuits and Systems, pp. 16-19, 1981.
21. A. E. Ruehli and P. A. Brennan, "Efficient capacitance calculations for three dimensional multiconductor systems," IEEE Trans. on Microwave Theory and Techniques, pp. 76-82, Feb. 1973.
22. A. E. Ruehli, "Survey of computer-aided analysis of integrated circuit interconnections," IBM Journal of R&D, pp. 626-639, Nov. 1979.
23. A. E. Ruehli and H. Heeb, "Circuit models for three dimensional geometries including dielectrics," IEEE Trans. Microwave Theory Tech., pp. 1507-1516, Mar. 1992.
24. D. D. Ling and A. E. Ruehli, "Interconnection modelling," Advances in CAD for VLSI, 3, Part II, Circuit Analysis, Simulation and Design, pp. 211-291, North-Holland: Amsterdam, 1987.
25. J. Cullum, A. Ruehli and T. Zhang, "A method of reduced-order modelling and simulation of large interconnect circuits and its application to PEEC models with retardation," IEEE Transactions on Circuits and Systems, pp. 261-273, Apr. 2000.
26. T. L. Quarles, The SPICE3 Implementation Guide, Technical Report, ERL-M89/44, University of California, Berkeley, 1989.
27. D. Gao, A. Yang and S. Kang, "Modeling and simulation of interconnection delays and cross talks in high-speed integrated circuits," IEEE Trans. on Circuits and Systems, pp. 1-9, Jan. 1990.
28. H. Hasegawa and S. Seki, "Analysis of interconnection delay on very high-speed LSI/VLSI chips using a microstrip line model," IEEE Trans. Electron Devices, pp. 1954-1960, Dec. 1984.
29. T. Itoh and R. Mittra, "Spectral domain approach for calculating the dispersion characteristics of microstrip lines," IEEE Trans. Microwave Theory Tech., pp. 496-499, Feb. 1973.
30. D. Mirshekar-Syahkal, Spectral Domain Method for Microwave Integrated Circuits, Wiley & Sons Inc., New York, 1990.
31. R. H. Jansen, "Spectral domain approach for microwave integrated circuits," IEEE Trans. Microwave Theory Tech., vol. MTT-33, pp. 1043-1056, Feb. 1985.
32. R. Wang and O. Wing, "A circuit model of a system of VLSI interconnects for time response computation," IEEE Trans. Microwave Theory Tech., vol. MTT-39, pp. 688-693, April 1991.
33. A. R. Djordjevic and T. K. Sarkar, "Closed-form formulas for frequency-dependent resistance and inductance per unit length of microstrip and strip transmission lines," IEEE Trans. Microwave Theory Tech., vol. MTT-42, pp. 241-248, Feb. 1994.
34. A. R. Djordjevic, R. F. Harrington, T. K. Sarkar and M. Bazdar, Matrix Parameters for Multiconductor Transmission Lines: Software and Users Manual, Boston: Artech House, 1989.
35. A. Deutsch et al., "Modeling and characterization of long on-chip interconnections for high performance microprocessors," IBM J. Res. Develop., vol. 39, pp. 547-567, Sept. 1995.
36. A. Deutsch et al., "High-speed signal propagation on lossy transmission lines," IBM J. Res. Develop., vol. 39, pp. 601-615, July 1990.
37. A. Deutsch et al., "When are transmission-line effects important for on-chip interconnections," IEEE Trans. Microwave Theory Tech., Oct. 1997.
38. J. Poltz, "Optimizing VLSI interconnect model for SPICE simulation," J. Analog Integrated Circuits and Signal Processing, vol. 5, no. 1, Jan. 1994.
39. T. Dhaene and D. De Zutter, "Selection of lumped element models for coupled lossy transmission lines," IEEE Trans. Computer-Aided Design, vol. 11, pp. 959-967, July 1992.
40. M. Celik, A. C. Cangellaris and A. Yaghmour, "An all purpose transmission line model for interconnect simulation in SPICE," IEEE Trans. MTT, pp. 127-138, Oct. 1997.
41. A. Dounavis, X. Li, M. Nakhla and R. Achar, "Passive closed-form transmission line model for general purpose circuit simulators," IEEE Trans. on Microwave Theory and Techniques, vol. 47, pp. 2450-2459, Dec. 1999.
42. A. Dounavis, R. Achar and M. Nakhla, "Efficient passive circuit models for distributed networks with frequency-dependent parameters," IEEE Trans. CPMT, Part B, vol. 23, pp. 382-392, Aug. 2000.
43. F. H. Branin, Jr., "Transient analysis of lossless transmission lines," Proc. IEEE, vol. 55, pp. 2012-2013, 1967.
44. M. Cases and D. M. Quinn, "Transient response of uniform distributed RLC transmission lines," IEEE Trans. Circuits Syst., vol. 27, pp. 200-206, Mar. 1980.
45. F. Y. Chang, "Transient analysis of lossless coupled transmission lines in a nonhomogeneous medium," IEEE Trans. Microwave Theory Tech., vol. 18, pp. 616-626, Sept. 1970.
46. F. Y. Chang, "The generalized method of characteristics for waveform relaxation analysis of lossy coupled transmission lines," IEEE Trans. Microwave Theory Tech., vol. 37, pp. 2028-2038, Dec. 1989.
47. F. Y. Chang, "Waveform relaxation analysis of nonuniform lossy transmission lines characterized with frequency-dependent parameters," IEEE Trans. Circuits Syst., vol. 38, pp. 1484-1500, Dec. 1991.
48. Q. Chu, Y. Lau and F. Y. Chang, "Transient analysis of microwave active circuits based on time-domain characteristic models," IEEE Trans. MTT, pp. 1097-1104, Aug. 1998.
49. A. R. Djordjevic, T. K. Sarkar and R. F. Harrington, "Analysis of lossy transmission lines with arbitrary nonlinear terminal networks," IEEE Trans. Microwave Theory Tech., vol. 34, pp. 660-666, June 1986.
50. A. R. Djordjevic, T. K. Sarkar and R. F. Harrington, "Time-domain response of multiconductor transmission lines," Proc. IEEE, vol. 75, pp. 743-764, June 1987.
51. H. Grabinski, "An algorithm for computing the signal propagation on lossy VLSI interconnect systems in the time-domain," Integration, the VLSI Journal, pp. 35-48, Oct. 1989.
52. M. Nakhla, "Analysis of pulse propagation on high-speed VLSI chips," IEEE Journal of Solid-State Circuits, vol. 25, no. 2, pp. 490-494, Apr. 1990.
53. R. Griffith and M. Nakhla, "Time-domain analysis of lossy multiconductor transmission lines," IEEE Trans. Microwave Theory Tech., vol. 38, Oct. 1990.
54. R. Griffith and M. Nakhla, "Mixed frequency/time domain analysis on nonlinear circuits," IEEE Trans. Computer-Aided Design, vol. 10, pp. 1032-1043, Aug. 1992.
55. E. C. Chang and S. M. Kang, "Computationally efficient simulation of a lossy transmission line with skin effect by using numerical inversion of Laplace transform," IEEE Transactions on Circuits and Systems, vol. 39, pp. 861-868, July 1992.
56. D. Kuznetsov and J. E. Schutt-Aine, "Optimal transient simulation of transmission lines," IEEE Trans. CAS-43, pp. 110-121, Feb. 1996.
57. W. T. Beyene and J. E. Schutt-Aine, "Accurate frequency-domain modelling and efficient simulation of high-speed packaging interconnects," IEEE Transactions MTT, pp. 1941-1947, Oct. 1997.
58. R. Wang and O. Wing, "Transient analysis of dispersive VLSI interconnects terminated in nonlinear loads," IEEE Trans. Computer-Aided Design, vol. 11, pp. 1258-1277, Oct. 1992.
59. W. C. Elmore, "The transient response of damped linear networks with particular regard to wide-band amplifiers," J. Appl. Physics, pp. 55-63, Jan. 1948.
60. J. Rubinstein, P. Penfield and M. Horowitz, "Signal delay in RC trees," IEEE Trans. Computer-Aided Design, pp. 202-211, July 1983.
61. L. T. Pillage and R. A. Rohrer, "Asymptotic waveform evaluation for timing analysis," IEEE Trans. Computer-Aided Design, vol. 9, pp. 352-366, Apr. 1990.
62. G. A. Baker Jr., Essentials of Padé Approximants, New York: Academic, 1975.
63. J. H. McCabe, "A formal extension of the Padé table to include two point Padé quotients," J. Inst. Math. Applic., vol. 15, pp. 363-372, 1975.
64. S. Kumashiro, R. A. Rohrer and A. J. Strojwas, "Asymptotic waveform evaluation for transient analysis of 3-D interconnect structures," IEEE Trans. Computer-Aided Design, vol. 12, no. 7, pp. 988-996, 1993.
65. T. Tang and M. Nakhla, "Analysis of high-speed VLSI interconnect using asymptotic waveform evaluation technique," IEEE Trans. Computer-Aided Design, pp. 341-352, Mar. 1992.
66. D. Xie and M. Nakhla, "Delay and crosstalk simulation of high speed VLSI interconnects with nonlinear terminations," IEEE Trans. Computer-Aided Design, pp. 1798-1811, Nov. 1993.
67. S. Lin and E. S. Kuh, "Transient simulation of lossy interconnects based on the recursive convolution formulation," IEEE Trans. on Circuits and Systems, vol. 39, pp. 879-892, Nov. 1992.
68. S. Y. Kim, N. Gopal and L. T. Pillage, "Time-domain macromodels for VLSI interconnect analysis," IEEE Trans. Computer-Aided Design, vol. 13, no. 10, pp. 1257-1270, Oct. 1994.
69. Q. Yu and E. S. Kuh, "Exact moment-matching model of transmission lines and application to interconnect delay estimation," IEEE Trans. on VLSI, pp. 311-322, Jun. 1995.
70. V. Raghavan, J. E. Bracken and R. A. Rohrer, "AWESpice: A general tool for accurate and efficient simulation of interconnect problems," in Proc. ACM/IEEE Design Automation Conf., pp. 87-92, June 1992.
71. J. E. Bracken, V. Raghavan and R. A. Rohrer, "Interconnect simulation with asymptotic waveform evaluation (AWE)," IEEE Trans. on Circuits and Systems, pp. 869-878, Nov. 1992.
72. R. Griffith, E. Chiprout, Q. J. Zhang and M. Nakhla, "A CAD framework for simulation and optimization of high-speed VLSI interconnections," IEEE Transactions on Circuits and Systems, vol. 39, pp. 893-906, Nov. 1992.
73. T. Tang, M. Nakhla and R. Griffith, "Analysis of lossy multiconductor transmission lines using the asymptotic waveform evaluation technique," IEEE Trans. on Microwave Theory and Tech., vol. 39, no. 12, pp. 2107-2116, Dec. 1991.
74. E. Chiprout and M. Nakhla, "Analysis of interconnect networks using complex frequency hopping," IEEE Trans. Computer-Aided Design, vol. 14, pp. 186-199, Feb. 1995.
75. R. Sanaie, E. Chiprout, M. Nakhla and Q. J. Zhang, "A fast method for frequency and time domain simulation of high-speed VLSI interconnects," IEEE Trans. Microwave Theory Tech., vol. 42, no. 12, pp. 2562-2571, Dec. 1994.
76. R. Achar, M. Nakhla and Q. J. Zhang, "Full-wave analysis of high-speed interconnects using complex frequency hopping," IEEE Trans. on Computer-Aided Design, pp. 997-1016, Oct. 1998.
77. M. A. Kolbehdari, M. Srinivasan, M. Nakhla, Q. J. Zhang and R. Achar, "Simultaneous time and frequency domain solution of EM problems using finite element and CFH techniques," IEEE Trans. on Microwave Theory and Techniques, vol. 44, pp. 1526-1534, Sept. 1996.
78. R. Achar, M. Nakhla, P. Gunupudi and E. Chiprout, "Passive interconnect reduction algorithm for distributed/measured networks," IEEE Trans. Circuits and Systems-II, pp. 287-301, Apr. 2000.
79. P. Feldmann and R. W. Freund, "Efficient linear circuit analysis by Padé via Lanczos process," IEEE Transactions on Computer-Aided Design, vol. 14, pp. 639-649, May 1995.
80. R. W. Freund, "Reduced-order modelling techniques based on Krylov subspace and their use in circuit simulation," Technical Memorandum, Lucent Technologies, 1998.
81. P. Feldmann and R. W. Freund, "Reduced order modeling of large linear subcircuits via a block Lanczos algorithm," in Proc. Design Automation Conf., pp. 474-479, June 1995.
82. I. M. Elfadel and D. D. Ling, "A block rational Arnoldi algorithm for multiport passive model-order reduction of multiport RLC networks," Proc. of ICCAD-97, pp. 66-71, Nov. 1997.
83. L. M. Silveira, M. Kamon, I. Elfadel and J. White, "A coordinate transformed Arnoldi algorithm for generating guaranteed stable reduced-order models for RLC circuits," in Technical Digest ICCAD, pp. 288-294, Nov. 1996.
84. L. M. Silveira, M. Kamon and J. White, "Efficient reduced-order modelling of frequency-dependent coupling inductances associated with 3-D interconnect structures," IEEE Transactions on CPMT, pp. 283-288, May 1996.
85. K. J. Kerns and A. T. Yang, "Preservation of passivity during RLC network reduction via split congruence transformations," IEEE Trans. on Computer-Aided Design, pp. 582-591, July 1998.
86. A. Odabasioglu, M. Celik and L. T. Pillage, "PRIMA: Passive Reduced-Order Interconnect Macromodeling Algorithm," IEEE Transactions on Computer-Aided Design, pp. 645-654, Aug. 1998.
87. A. Odabasioglu, M. Celik and L. T. Pillage, "Practical considerations for passive reduction of RLC circuits," Proc. DAC, pp. 214-219, June 1999.
88. Q. Yu, J. M. L. Wang and E. S. Kuh, "Passive multipoint moment-matching model order reduction algorithm on multiport distributed interconnect networks," IEEE Trans. on Circuits and Systems-I, pp. 140-160, Jan. 1999.
89. A. Dounavis, E. Gad, R. Achar and M. Nakhla, "Passive model-reduction of multiport distributed networks including frequency-dependent parameters," IEEE Trans. on Microwave Theory and Techniques, vol. 48, pp. 2325-2334, Dec. 2000.
90. A. C. Cangellaris, S. Pasha, J. L. Prince and M. Celik, "A new discrete time-domain model for passive model order reduction and macromodelling of high-speed interconnections," IEEE Trans. CPMT, pp. 356-364, Aug. 1999.
91. M. Celik and A. C. Cangellaris, "Simulation of dispersive multiconductor transmission lines by Padé approximation via Lanczos process," IEEE Trans. MTT, pp. 2525-2535, Dec. 1996.
92. P. Gunupudi, M. Nakhla and R. Achar, "Simulation of high-speed distributed interconnects using Krylov-subspace techniques," IEEE Transactions on CAD of Integrated Circuits and Systems, vol. 19, pp. 799-808, July 2000.
93. R. Achar and M. Nakhla, Minimum Realization of Reduced-Order Models of High-Speed Interconnect Macromodels, Chapter: Signal Propagation on Interconnects, Editor: Hartmut Grabinski, Boston: Kluwer Academic Publishers, 1998.
94. M. Kamon, F. Wang and J. White, "Generating nearly optimally compact models from Krylov-subspace based reduced-order models," IEEE Transactions on Circuits and Systems, pp. 239-248, Apr. 2000.
95. O. Palusinski and A. Lee, "Analysis of transients in nonuniform and uniform multiconductor transmission lines," IEEE Trans. MTT, pp. 127-138, Jan. 1989.
96. E. Gad and M. Nakhla, "Simulation and sensitivity computation of nonuniform transmission lines via integrated congruence transform," Proceedings IEEE 12th Topical Meeting on Electrical Performance of Electronic Packaging (EPEP), pp. 259-262, Oct. 2003.
97. N. Boulejfen, A. Kouki and F. Ghannouchi, "Frequency and time-domain analysis nonuniform lossy coupled transmission lines with linear and nonlinear terminations," IEEE Trans. MTT, pp. 367-379, Mar. 2000.
98. C. Yen, Z. Fazarinc and R. L. Wheeler, "Time-domain skin-effect model for transient analysis of lossy transmission lines," Proc. IEEE, pp. 750-757, 1982.
99. R. Khazaka, E. Chiprout, M. Nakhla and Q. J. Zhang, "Analysis of high-speed interconnects with frequency dependent parameters," Proc. Intl. Symp. EMC, pp. 203-208, Zurich, March 1995.
100. W. Sui, D. A. Christensen and C. H. Durney, "Extending the two-dimensional FDTD method to hybrid electromagnetic systems with active and passive lumped elements," IEEE Trans. Microwave Theory Tech., vol. 40, no. 4, pp. 724-730, Apr. 1992.
101. D. Saraswat, R. Achar and M. Nakhla, "Passive Macromodels of Microwave Subnetworks Characterized by Measured/Simulated Data," IEEE International Microwave Symposium, pp. 999-1002, Philadelphia, PA, June 2003.
102. M. Piket-May, A. Taflove and J. Baron, "FD-TD modeling of digital signal propagation in 3-D circuits with passive and active loads," IEEE Trans. Microwave Theory Tech., vol. 42, no. 8, pp. 1514-1523, Aug. 1994.
103. P. C. Cherry and M. F. Iskander, "FDTD analysis of high frequency electronic interconnection effects," IEEE Transactions on Microwave Theory Tech., vol. 43, no. 10, pp. 2445-2451, Oct. 1995.
104. S. D. Corey and A. T. Yang, "Interconnect characterization using time-domain reflectometry," IEEE Trans. Microwave Theory Tech., vol. 43, pp. 2151-2156, Sep. 1995.
105. B. J. Cooke, J. L. Prince and A. C. Cangellaris, "S-parameter analysis of multiconductor integrated circuit interconnect systems," IEEE Trans. Computer-Aided Design, pp. 353-360, Mar. 1992.
106. L. Vakanas, A. C. Cangellaris and O. Palusinski, "Scattering parameter based simulation of transients in lossy, nonlinearly terminated packaging interconnects," IEEE Trans. CPMT-B, pp. 472-479, Feb. 1994.
107. M. Celik, A. C. Cangellaris and A. Deutsch, "A new moment generation technique for interconnects characterized by measured or calculated S-parameters," IEEE Intl. Microwave Symposium Digest, pp. 196-201, June 1996.
108. J. E. Schutt-Aine and R. Mittra, "Scattering parameter transient analysis of transmission lines loaded with nonlinear terminations," IEEE Trans. Microwave Theory Tech., pp. 529-536, 1988.
109. J. E. Schutt-Aine and R. Mittra, "Nonlinear analysis of coupled transmission lines," IEEE Trans. CAS-36, pp. 959-967, 1989.
110. W. T. Beyene and J. E. Schutt-Aine, "Efficient transient simulation of high-speed interconnects characterized by sampled data," IEEE Transactions on CPMT, Part B, vol. 21, pp. 105-113, Feb. 1998.
111. G. Zheng, Q. J. Zhang, M. Nakhla and R. Achar, "An efficient approach for simulation of measured subnetworks with complex frequency hopping," Proceedings IEEE/ACM Int. Conf. Computer Aided Design, pp. 23-26, Nov. 1996, San Jose, CA.
112. R. Achar and M. Nakhla, "Efficient transient simulation of embedded subnetworks characterized by S-parameters in the presence of nonlinear elements," IEEE Transactions on Microwave Theory and Techniques, vol. 46, pp. 2356-2363, Dec. 1998.
113. N. Liu, M. Nakhla and Q. J. Zhang, "Time domain sensitivity of high-speed VLSI interconnects," International Journal on Circuit Theory and Applications, vol. 22, pp. 479-511, Nov. 1994.
114. Q. J. Zhang, S. Lum and M. Nakhla, "Minimization of delay and crosstalk in high-speed VLSI interconnects," IEEE Trans. MTT, vol. 42, pp. 1555-1563, July 1992.
115. S. Lum, M. Nakhla and Q. J. Zhang, "Sensitivity analysis of lossy coupled transmission lines with nonlinear terminations," IEEE Trans. MTT, vol. 42, 1994.
116. R. W. Freund and P. Feldmann, "Small signal circuit analysis and sensitivity computations with PVL algorithm," IEEE Trans. CAS-II, pp. 577-585, 1996.
117. C. Jiao, A. C. Cangellaris, A. Yaghmour and J. L. Prince, "Sensitivity analysis of multiconductor transmission lines and optimization for high-speed interconnect design," IEEE Trans. CPMT, pp. 132-141, May 2000.
118. R. Khazaka and M. Nakhla, "Analysis of high-speed interconnects in the presence of electromagnetic interference," IEEE Trans. MTT, vol. 46, pp. 940-947, July 1998.
119. I. Erdin, R. Khazaka and M. Nakhla, "Simulation of high-speed interconnects in the presence of incident field," IEEE Transactions on Microwave Theory and Techniques, vol. 46, pp. 2251-2257, Dec. 1998.
120. I. Erdin, M. Nakhla and R. Achar, "Circuit analysis of electromagnetic radiation and field coupling effects for networks with embedded full-wave modules," IEEE Transactions on Electromagnetic Compatibility (EMC), vol. 42, pp. 449-460, Nov. 2000.
121. F. Olyslager, D. De Zutter and A. T. de Hoop, "New reciprocal circuit model for lossy waveguide structures based on the orthogonality of the eigenmodes," IEEE Trans. Microwave Theory Tech., vol. 42, no. 12, pp. 2261-2269, Dec. 1994.
122. F. Olyslager, D. De Zutter and K. Blomme, "Rigorous analysis of the propagation characteristics of general lossless and lossy multiconductor transmission lines in multi-layered media," IEEE Trans. Microwave Theory Tech., vol. 41, no. 1, pp. 79-88, Jan. 1993.
123. C. D. Taylor, R. S. Satterwhite and C. W. Harrison, "The response of a terminated two-wire transmission line excited by a nonuniform electromagnetic field," IEEE Trans. on Antennas and Propagation, pp. 987-989, Nov. 1965.
124. A. A. Smith, "A more convenient form of the equations for the response of a transmission line excited by nonuniform fields," IEEE Trans. on EMC, vol. 15, pp. 151-152, Aug. 1973.
125. C. R. Paul, "Frequency response of multiconductor transmission lines illuminated by an incident electromagnetic field," IEEE Trans. Microwave Theory Tech., vol. 22, no. 4, pp. 454-457, Apr. 1976.
332
126.
127.
128.
129.
130. 131.
132.
133.
134.
135. 136.
137.
138.
139.
M. Nakhla & R. Achar
C. R. Paul, "A comparison of the Contributions of Common-Mode and Differential -Mode Currents in Radiated Emissions," IEEE Trans. Electromag. Compat., pp. 189-193, May. 89. C. R. Paul, "A SPICE model for multiconductor transmission lines excited by an incident electromagnetic field," IEEE Trans. Electromag. Compat., No. 4, pp. 342-354, Nov. 1994. I. Wuyts and D. De Zutter, "Circuit model for plane-wave incidence on multiconductor transmission lines," IEEE Trans on EMC, vol 36, no. 3, pp. 206212, Aug. 1994. E. S. M. Mok, G. I. Costache, "Skin-effect considerations on transient response of a transmission line excited by an electromagnetic wave," IEEE Trans, on EMC, vol. 34, no. 3, pp. 320-329, Aug. 1992. Y. Kami and R. Sato, "Circuit-concept approach to externally excited transmission lines," IEEE Trans, on EMC, vol 27, no. 4, pp. 177-183, Nov. 1985. N. Ari and W. Blumer, "Analytic formulation of the response of a two-wire transmission line excited by a plane wave," IEEE Transactions on Electromagentic Compatibility, vol. 30, no. 4, pp. 437-448, Nov. 1988. C. R. Paul, "Literal solutions for the time-domain response of a two-conductor transmission line excited by an incident electromagnetic field," IEEE Trans, on EMC, vol. 37, No. 2, pp. 241-251, May 1995. P. Bernardi, R. Cicchetti and C. Pirone, "Transient response of a microstrip line circuit excited by an external electromagnetic source," IEEE Trans, on EMC, vol. 34, No. 2, pp. 100-108, May 1992. G. J. Burke, E. K. Miller, and S. Chakrabarti, "Using model based parameter estimation to increase the efficiency of computing electromagnetic transfer functions," IEEE trans, on Magnetics, vol. 25, No. 4, pp. 2087-2089, July 1989. F. M. Tesche, M. V. Ianoz, T. Karlsson, EMC Analysis Methods and Computational Models, Wiley, New York, 1997. A. K. Agrawal, H. J. Price and S. H. Gurbaxani, "Transient response of multiconductor transmission lines excited by a nonuniform electromagnetic field," IEEE Trans. Electromag. 
Compat, vol. 22, No. 2, pp. 119-129, May 1980. Y. Kami and R. Sato, "Circuit-concept approach to externally excited transmission lines," IEEE Trans. Electromag. Compat., vol 27, no. 4, pp. 177-183, Nov. 1985. R. Raut, W. J. Steenart and G Costache, "A note on the optimum layout of electronic circuits to minimize electromagnetic field strength," IEEE Trans. Electromag. Compat., pp. 88-89, Feb. 1988. C. W. Ho, A. E. Ruehli and P. A. Brennan, "The modified nodal approach to network analysis," IEEE Trans. Circuits and Systems, vol. CAS-22, pp. 504509, June 1975.
Recent Advances in Modeling and Simulation
140. 141. 142. 143.
144. 145.
146.
147.
148. 149. 150. 151. 152. 153. 154. 155. 156. 157. 158.
of High-Speed Interconnects
333
A. E. Ruehli, Circuit Analysis, Simulation and Design, Noth-Holland, New York: 1988. J. K. White and A. S. Vincentelli, Relaxation Techniques for the simulation of VLSI Circuits, Boston: Kluwer Academic Publishers, 1990. D. O. Pederson, "A historical review of circuit simulation," IEEE Transactions on Circuits and Systems, vol. CAS-31, no. 1, Jan. 1984. A. S. Vincentelli, "Circuit Simulation," in Computer Design Aids for VLSI Circuits, P. Antognetti, D. O. Pederson and H. De Man (editors). Martinus Nijhoff Publishers, 1986, pp. 19-112. J. K. Ousterhout, "CRYSTAL: A timing analyzer for NMOS VLSI Circuits," in Proc. 3rd Caltech. Conf. on VLSI, Mar. 1983, pp. 57-69 Norman P. Jouppi, "Timing analysis and performance improvement of MOS VLSI design," IEEE Transaction on Computer-Aided Design of ICs, vol. 6, no 4, pp. 650-665, July 1987. S. Lin, M. M. Sadowska, and E. S. Kuh, "SWEC: A step wise equivalent conductance timing simulator for CMOS VLSI circuits," in Proc. Electron. Design Automation Conf, 1991, pp. 142-148. A. S. Vincetelli, E. Lelarasmee, and A. Ruehli, "The waveform relaxation method for the time-domain analysis of large scale integrated circuits," IEEE Trans. Computer-Aided Design, vol. l,no. 3, pp. 131-145, Aug. 1982. A. Devgan and R. A. Roher, "Adaptively controlled explicit simulation," IEEE Trans, on Computer-Aided Design, vol. 13, no. 6, Jun. 1994. S. Lele, "Compact finite difference schemes with spectral-like resolution," Journal Compt. Physics, vol. 103, pp. 16-42, 1992. J. Vlach and K. Singhal, Computer Methods for Circuit Analysis and Design, New York: Van Nostrand Reinhold, 1983. T. Kailath, Linear Systems. Toronto: Printce-Hall Inc., 1980. C. T. Chen, Linear system theory and design. New York: Holt, Rinehart and Winston, 1984. L. Weinberg, Network Analysis and Synthesis, NY: McGraw-Hill Book Company Inc., 1962. W. Louis, Network analysis and synthesis. New York, NY: McGraw-Hill, 1962. E. Kuh and R. 
Rohrer, Theory of Active Linear Networks, San Francisco: Holden-day Inc, 67. E. A. Guillemin, Synthesis of Passive Networks, New York: John Wiley and Sons Inc., 1957. M. E. V Valkenburg, Introduction to Modern Network Synthesis. New York: John Wiley and Sons Inc., 1960. U. S. Pillai, Spectrum Estimation and System Identification. New York: Springer-Verlag, 1993.
334
159. 160. 161.
162.
M. Nakhla & R. Achar
J. W. Demmel, Applied Numerical Linear Algebra, Philadelphia, PA: SIAM Publishers, 1997. M. Reed and B. Simon, Methods of Modern Mathematical Physics, Vol. I, pp. 205-296, Academic Press, New York, 1975. F Fer, "Resolution de l'equation matricielle dU/dt = pU par produit infini d'exponentielles matricielles", Acad. Roy. Belg. CI. Sci., vol. 44, no. 5, pp. 818-829, 1958. A. Dounavis, N. Nakhla, R. achar and M. Nakhla, "Delay extraction and Passive macromodelling of lossy coupled transmission lines," Proceedings IEEE 12th Topical Meeting on Electrical Performance of Electronic Packaging (EPEP), pp.251-253, Oct.2003.
MULTISCALE MODELING OF DEGRADATION AND FAILURE OF INTERCONNECT LINES DRIVEN BY ELECTROMIGRATION AND STRESS GRADIENTS
Robert Atkinson and Alberto Cuitino
Department of Mechanical and Aerospace Engineering, Rutgers University, Piscataway, New Jersey 08854, USA
E-mail: [email protected]
This article introduces a multiscale modeling approach for simulating a key aspect affecting the reliability in narrow interconnect lines: migration, growth and coalescence of voids driven by electromigration and stress gradients. This approach is based on the Monte Carlo method with a Hamiltonian that contains both short-range and long-range interactions. The short-range interactions are resolved at atomic scale while the contributions of the long-range interaction are computed by solving a subsidiary continuum boundary value problem for the elastic and electric field. Thus, this approach combines both discrete and continuum techniques to provide a numerical tool to follow the complex evolving topologies that are derived from void migration, growth and coalescence. The predicted results are in good agreement with key experimental observations and other numerical and theoretical estimates.
1. Introduction
The research into line interconnect failure has seen many changes in the past 30 years. In the early to mid 1970s, the failure of line interconnects was understood to be primarily due to grain boundary diffusion. [12] [21] This was due to the size of the lines at that time or, more precisely, to the size of the grains in the line relative to the line width. Investigations into the mean times of failure (MTF) of line interconnects began in the early 1980s and spanned over a decade. [13] [10] [28] By the early 1990s it had been reported that the MTF of line interconnects levels off and begins to increase at some critical line width. These investigations were concerned with the effects of continually decreasing line width on the MTF. The results clearly indicated that there was a minimum MTF that occurs
when the line width is just slightly larger than the grain size. This is due to the transition from one primary mode of failure to another. The transition from grain boundary diffusion to surface diffusion and void motion occurs when the medium for grain boundary diffusion is removed, that is, when there are no more diffusion pathways along grain boundaries that lie parallel to the line. In fact, the grain boundaries that do exist in small lines, less than 3 μm, are primarily perpendicular to the lines. This gave rise to the term bamboo structure of a line interconnect. It was at this point in the evolution of line interconnects that a new approach toward understanding had to be undertaken. In addition, accompanying experimental research into grain orientation and failure location in the line was completed. [37] [11] These investigations explained why certain grain orientations and locations along the lines fail. A number of simulation strategies have been developed to follow the evolution of interconnect surfaces and internal voids using a continuum approach, where different mechanisms of atomic migration have been considered, including electromigration as well as surface and bulk diffusion assisted by stresses [30] [40] [5] [32] [35] [19] [43] [2] [38] [4] [44] [42] [39] [41] [40] [27] [14] [33] [3] [18]. In the present approach, however, we merge discrete and continuum methodologies to describe the mass transport process in interconnects driven by electromigration and stress sources. The approach is described in Section 2, and the predictions of the simulations are presented in Section 3.
2. Multiscale Discrete-Continuum Formulation
The geometry and topology of the line are defined by indicating the initial positions of the atoms, which is a discrete representation of the material. The evolution of the atomic positions is driven by a Monte Carlo scheme that considers both discrete and continuum effects.
In particular, the multiscale bridging link of this methodology is in the formulation of the Hamiltonian, which is defined as follows:
$$H = \sum_{i=1}^{N_a} \sum_{j=1}^{N_a} E_B(s_i, s_j)\, s_i s_j + \sum_{i=1}^{N_a} \left[ E_E(s_i) + E_\sigma(s_i) + E_A(s_i) \right] s_i \tag{1}$$
where $E_B(s_i, s_j)$ is the pair potential energy between atoms that occupy lattice sites $i$ and $j$, $E_E(s_i)$ is the electric potential energy at lattice site $i$, $E_\sigma(s_i)$ is the elastic strain energy at lattice site $i$, $E_A(s_i)$ is the activation energy at lattice site $i$, and $s_i$ is the lattice binary counter that
takes the value of 1 when the lattice site is occupied and 0 when empty. The first term in the Hamiltonian accounts for the short-range interactions discretely, while the second term introduces the long-range interactions via the evaluation of the per-site specific energies, which are obtained from surrogate continuum boundary value problems. The geometry and topology used for solving these continuum problems are dictated by the discrete simulation. In particular, we consider a Local Monte Carlo approach based on the Kawasaki algorithm. The probabilities for the movement of an atom to an open position are assigned based on the values adopted by the Hamiltonian, Eq. (1), for the before and after configurations. Our approach uses the existing lattice to calculate the solution for the current densities and elastic energies in the line. The current densities are solved using a finite difference technique, while the elastic energies are obtained by a finite element approach. These are continuum solutions to problems that require boundary conditions, and the boundaries are determined by the occupation state of the lattice. The solutions to the electric and stress problems are then overlaid on the lattice, and each lattice site is assigned a value based on the finite difference and finite element solutions. This is an iterative scheme that requires the continuous update of the electric and elastic fields due to the geometrical and topological changes driven by the local Monte Carlo selection process. This mixed discrete-continuum methodology, while consistent with continuum solutions, also provides a natural recourse for capturing the motion and aggregation of discrete/point defects leading to damage in interconnect lines. The discrete and continuum formulations are presented in Section 2.1 and Section 2.2, respectively.
2.1. Discrete Formulation
We consider a Local Monte Carlo scheme to follow the evolution of the topology and geometry of the interconnect line, where the transition probabilities are computed based on the Hamiltonian, Eq. (1). The equilibrium exchange rate between energy state $\mu$ and energy state $\nu$ is given as follows:
$$P(\mu \to \nu) = g(\mu \to \nu)\, A(\mu \to \nu) \tag{2}$$
where $g$ is the selection probability of an atom, which is always equal to $1/N_a$, where $N_a$ is the total number of atoms. Therefore the relation simplifies to the following:
$$P(\mu \to \nu) \propto A(\mu \to \nu) \tag{3}$$
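Now the function $A$ must be determined. Having identical but distinguishable particles implies that the configuration entropy is given by the following relation:
$$S = \frac{N!}{N_1!\, N_2! \cdots} \tag{4}$$
where $N$ is the total number of atoms and $N_i$ is the number of atoms that exist in energy level $i$. Using the Stirling approximation,
$$\ln(N!) \approx N \ln N - N + \tfrac{1}{2} \ln(2\pi N). \tag{5}$$
For large $N$,
$$\ln(N!) \approx N \ln N - N. \tag{6}$$
Thus,
$$\ln(S) = \Big(\sum_\mu N_\mu\Big) \ln\Big(\sum_\mu N_\mu\Big) - \sum_\mu N_\mu \ln(N_\mu) \tag{7}$$
subject to the constraints of fixed total number of atoms and fixed total energy,
$$N = \sum_\mu N_\mu \;\Rightarrow\; \alpha \sum_\mu dN_\mu = 0, \qquad E = \sum_\mu E_\mu N_\mu \;\Rightarrow\; \beta \sum_\mu E_\mu\, dN_\mu = 0 \tag{8}$$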
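where $\mu$ denotes all energy states. The constraints in differential form are zero, so adding those terms to $d\ln(S)$ is acceptable:
$$d\ln(S) = \sum_\mu \left[ \frac{\partial \ln(S)}{\partial N_\mu} + \alpha + \beta E_\mu \right] dN_\mu \tag{9}$$
where
$$\frac{\partial \ln(S)}{\partial N_\mu} = \frac{\partial}{\partial N_\mu} \left[ \Big(\sum_\mu N_\mu\Big) \ln\Big(\sum_\mu N_\mu\Big) - \sum_\mu N_\mu \ln(N_\mu) \right] = \ln N - \ln N_\mu. \tag{10}$$
Thus, at a maximum
$$d\ln(S) = 0 = \sum_\mu \left[ \ln N - \ln N_\mu + \alpha + \beta E_\mu \right] dN_\mu \tag{11}$$
which can be true if and only if
$$0 = \ln N - \ln N_\mu + \alpha + \beta E_\mu \tag{12}$$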
which can be rewritten as
$$\frac{N_\mu}{N} = \exp(\alpha) \exp(\beta E_\mu). \tag{13}$$
Solving for the constants implies
$$\sum_\mu \frac{N_\mu}{N} = 1 = \sum_\mu \exp(\alpha) \exp(\beta E_\mu) \;\Rightarrow\; \exp(\alpha) = \frac{1}{\sum_\mu \exp(\beta E_\mu)}. \tag{14}$$
Thus,
$$\frac{N_\mu}{N} = \frac{\exp(\beta E_\mu)}{\sum_\mu \exp(\beta E_\mu)} \tag{15}$$
where $\sum_\mu \exp(\beta E_\mu)$ is denoted as the partition function and $N_\mu / N$ is the probability of finding an atom in energy state $\mu$. Thus, if we are considering the probability of an atom's movement from energy state $\mu$ to an energy state $\nu$, we have the following:
$$\frac{A(\mu \to \nu)}{A(\nu \to \mu)} = \frac{N_\nu / N}{N_\mu / N} = \frac{\exp(\beta E_\nu)}{\exp(\beta E_\mu)} = \exp\left[ \beta (E_\nu - E_\mu) \right] \tag{16}$$
where for a thermodynamic system $\beta = -1/kT$; thus
$$\frac{A(\mu \to \nu)}{A(\nu \to \mu)} = \exp\left[ -\frac{E_\nu - E_\mu}{kT} \right]. \tag{17}$$
The Metropolis algorithm states that the more efficient method to accept movements is to enforce that the larger of the two acceptance probabilities, $A$, be unity. [24] This results in the following acceptance parameters:
$$A(\mu \to \nu) = \begin{cases} \exp\left[ -\dfrac{E_\nu - E_\mu}{kT} \right] & \text{if } E_\nu - E_\mu > 0 \\ 1 & \text{otherwise} \end{cases} \tag{18}$$
Therefore a movement is always successful if it reduces the total energy of the system. A movement that raises the total energy of the system is still possible, but has a finite probability defined by the previous equation. The energy of each state is calculated from the Hamiltonian. The Kawasaki simplification recognizes the fact that the single flip dynamics of the system can make the calculation of the energy states much simpler by considering only those lattice points that have been selected for movement. [24]
Fig. 1. Probability of atomic movement versus change in number of neighbors, $kT = 1$ and $E_B = 1$.
The energy term defined in the statistical mechanics equation has contributions of three types: surface energy ($\Delta E_S$), electromigration forces ($\Delta E_E$), and elastic energy ($\Delta E_\sigma$). The surface energy contribution is a result of the binding energy between atoms, where
$$E_S = (\text{No. of Neighbors})\,(E_B) \tag{19}$$
where $E_B$ is the pair potential energy associated with one atom to another. The electromigration forces create the energy $\Delta E_E$ as a result of the atom moving through the electromigration force field. The $\Delta E_\sigma$ term is the strain energy that results from the applied forces and displacements to the boundary of the specimen. The inclusion of these terms in the statistical mechanics relation is shown in Equation 20:
$$p = \exp\left[ -\frac{\Delta E_{Total}}{kT} \right] = \exp\left[ -\frac{(\Delta \text{Neighbors})\, E_B + \Delta E_E + \Delta E_\sigma}{kT} \right], \qquad \Delta E_{Total} = \Delta E_S + \Delta E_E + \Delta E_\sigma \tag{20}$$
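A minimal sketch of how Equations 18 and 20 combine into a single acceptance probability may help here. The function name and argument names are illustrative assumptions, not part of the original program, and the sign convention (a positive change in energy lowers the probability) is assumed from Equation 18.

```python
import math


def move_probability(delta_neighbors, e_b, delta_e_e=0.0, delta_e_sigma=0.0, kt=1.0):
    """Acceptance probability of a proposed move (hypothetical helper).

    The total energy change is the surface (bond) term plus the electric
    and elastic terms, as in Equation 20.
    """
    delta_e_total = delta_neighbors * e_b + delta_e_e + delta_e_sigma
    # Metropolis rule, Equation 18: a move that does not raise the energy
    # is always accepted.
    if delta_e_total <= 0.0:
        return 1.0
    return math.exp(-delta_e_total / kt)


if __name__ == "__main__":
    # Tabulate p for integer changes in neighbor count, with kT = 1 and
    # the electric and elastic terms switched off.
    for dn in range(6):
        print(dn, move_probability(dn, e_b=1.0), move_probability(dn, e_b=2.0))
```

With the electric and elastic terms set to zero, the tabulated values depend only on the integer change in neighbor count, which is the situation the text describes for a specimen with temperature effects only.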
A graphical representation of $p$, for $\Delta E_E = \Delta E_\sigma = 0$, is shown in Figure 1. Figure 1 completely describes the probability for a specimen that considers temperature effects only. Notice that the integer values along the horizontal axis represent the change in number of neighbors as a result of the movement selected. The inclusion of electric and elastic effects would result in a band of energy around each of the locations that define a change in number of neighbors. This energy band represents the region in which $\Delta E_{Total}$ may exist and therefore defines the energy bands in which the atomic movements occur. This type of energy band is displayed in Figure 2. From Figure 1, it is readily seen that the relative size of the environmental energies and of the change-in-neighbors energy term, $(\Delta \text{Neighbors})(E_B)$, determines the response of the atoms to selected movements. The distance along the x-axis between locations that define motions is a function of the pair potential energy, $E_B$. For example, if the pair potential energy were equal to 2, the probability decreases for the same $\Delta$Neighbors. A comparison plot is shown in Figure 3.
Fig. 2. Probability of atomic movement, $kT = 1$ and $E_B = 1$.
Fig. 3. Probability of atomic movements, for $kT = 1$, with $E_B = 1$ and $E_B = 2$.
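The local move described in this section can be sketched as a single exchange step. This is only an illustration under stated assumptions: a square lattice with four nearest neighbors and periodic wrap is used instead of the FCC lattice of the chapter, only the bond term of the energy is included, and each lost bond is assumed to cost $E_B$. All names are hypothetical.

```python
import math
import random

# Nearest-neighbor offsets on a square lattice (an assumption for brevity).
NEIGHBOR_OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def count_neighbors(occ, i, j, skip=None):
    """Count occupied nearest neighbors of site (i, j) with periodic wrap."""
    n_rows, n_cols = len(occ), len(occ[0])
    total = 0
    for di, dj in NEIGHBOR_OFFSETS:
        ni, nj = (i + di) % n_rows, (j + dj) % n_cols
        if (ni, nj) != skip:
            total += occ[ni][nj]
    return total


def kawasaki_step(occ, e_b, kt, rng):
    """Propose moving one random atom to a random adjacent site.

    Returns True if the move was accepted. The number of atoms is conserved.
    """
    n_rows, n_cols = len(occ), len(occ[0])
    atoms = [(i, j) for i in range(n_rows) for j in range(n_cols) if occ[i][j] == 1]
    i, j = rng.choice(atoms)  # every atom is selected with probability 1/Na
    di, dj = rng.choice(NEIGHBOR_OFFSETS)
    ti, tj = (i + di) % n_rows, (j + dj) % n_cols
    if occ[ti][tj] == 1:
        return False  # target site is occupied, nothing to do
    bonds_before = count_neighbors(occ, i, j)
    # The atom's own starting site must not count as a neighbor after the move.
    bonds_after = count_neighbors(occ, ti, tj, skip=(i, j))
    delta_e = (bonds_before - bonds_after) * e_b  # assumed: lost bonds cost energy
    if delta_e <= 0.0 or rng.random() < math.exp(-delta_e / kt):
        occ[i][j], occ[ti][tj] = 0, 1
        return True
    return False


if __name__ == "__main__":
    rng = random.Random(0)
    lattice = [[1] * 6 for _ in range(6)]
    lattice[2][2] = 0  # a single vacancy
    accepted = sum(kawasaki_step(lattice, 1.0, 1.0, rng) for _ in range(500))
    print("accepted moves:", accepted)
```

Only the selected atom and its target site enter the energy difference, which is the simplification attributed to the Kawasaki algorithm in the text.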
The comparison plot shown in Figure 3 displays how the probabilities can be altered significantly by varying $E_B$. The acceptance or rejection of a move for a randomly selected atom is based on the probability defined by Equation 20 and the generation of a random number.
2.2. Continuum Formulation
The Hamiltonian defined in Eq. (1) requires the computation of the electric potential density and elastic energy density at each of the occupied lattice sites. These contributions are obtained by setting up surrogate boundary value problems where the boundaries are defined and continuously updated by the occupancy map of the Monte Carlo (MC) calculation. Different continuum techniques can be used to compute these fields, including Finite Differences (FD) and Finite Elements (FE). In this article, we present how both FD and FE can be effectively coupled with MC. We use FD for solving the electric field and FE for solving the elastic one.
2.2.1. Electric field by finite differences
The contribution of the electric field to the motion probability of the atoms comes from the energy change associated with an atom that moves through that electric field. This energy change can be thought of as mechanical work due to the motion of the atom through a vector force field that is directly related to the gradient of the electric field. The net force experienced by an atom due to electron flow is referred to as the electromigration force. These electromigration forces are a function of the current density in the material and the effective valence of the material. Recall that the electromigration force used in simulations by numerous researchers has the following form, initially presented by J. R. Black in 1969 [6]:
$$F_E = -Z^* e \rho j \tag{21}$$
where
$$j = \sigma \nabla \Phi_E \tag{22}$$
and
$$\rho = \frac{1}{\sigma}. \tag{23}$$
Therefore
$$F_E = -Z^* e \nabla \Phi_E \tag{24}$$
where $\nabla \Phi_E$ is obtained from the solution of the Laplace equation
$$\nabla^2 \Phi_E = 0. \tag{25}$$
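As a rough illustration of Equation 24, the sketch below evaluates the force at interior grid points from a potential stored on a rectangular grid, using central differences for the gradient. The function name, the central-difference choice, and the unit electron charge default are assumptions made for the example only.

```python
def electromigration_force(phi, dx, dy, z_eff, e_charge=1.0):
    """Return (Fx, Fy) grids from F_E = -Z* e grad(Phi_E) at interior points.

    phi[r][c] is the potential at row r (y direction) and column c (x direction).
    Boundary points are left at zero in this sketch.
    """
    n_rows, n_cols = len(phi), len(phi[0])
    fx = [[0.0] * n_cols for _ in range(n_rows)]
    fy = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(1, n_rows - 1):
        for c in range(1, n_cols - 1):
            # Central differences for the two gradient components.
            dphi_dx = (phi[r][c + 1] - phi[r][c - 1]) / (2.0 * dx)
            dphi_dy = (phi[r + 1][c] - phi[r - 1][c]) / (2.0 * dy)
            fx[r][c] = -z_eff * e_charge * dphi_dx
            fy[r][c] = -z_eff * e_charge * dphi_dy
    return fx, fy


if __name__ == "__main__":
    # A potential that rises linearly along x gives a uniform force along x.
    potential = [[2.0 * c for c in range(5)] for _ in range(4)]
    force_x, force_y = electromigration_force(potential, 1.0, 1.0, z_eff=-5.0)
    print(force_x[1][2], force_y[1][2])
```

A linear potential is a convenient check because it satisfies the Laplace equation exactly and yields a constant force.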
The electromigration forces that drive mass transport are determined by solving the Laplace equation subject to specified boundary conditions. The boundary condition used in this program is an applied electric potential. An applied current density is also a possible boundary condition for this electrical problem. However, the applied potential is preferred, since enforcing an applied current density at a boundary gives surfaces at or around that boundary an artificially applied current density, which effectively specifies an electromigration force at that point. The electromigration force is the desired dependent variable and should not be an applied boundary condition. Varying the applied potential boundary conditions enables the program to simulate any desired current density. The solution of Laplace's equation across the entire lattice space is accomplished by first establishing a grid over the lattice space. Laplace's equation treats the electrical problem as a continuum, which is why the lattice space must be converted to a grid that can be operated on in a continuum fashion. The grid points house the electric potential values, and all grid units are identical. Consider an FCC crystal with its (111) plane normal to the line interconnect path and its $[10\bar{1}]$ direction parallel to the line interconnect path. This crystal has a regular and repeatable grid unit defined according to the underlying lattice space, as shown in Figure 4. From Figure 4 it is clear that each grid unit has a total of 4 atoms inside its boundaries. There are two atoms completely inside (4 and 5), two that have half of their area inside (2 and 7), and four that have one quarter of their area inside the boundary (1, 3, 6, and 8). This totals four atoms per grid unit.
Fig. 4. Electric field grid overlaid on lattice space.
Fig. 5. Electric field grid with assigned electric potentials.
Fig. 6. Electric field grid for insulated boundary cases, panels (a) and (b).
A grid unit is considered to be populated with atoms if there are enough atoms, or portions of atoms, inside the grid boundary to total 0.75 of an atom. In addition, if the grid unit has any neighboring grid units that are considered not populated, the numerical solution of the Laplace equation is altered to accommodate this fact. Consider a grouping of grids as shown in Figure 4. The numerical representation of the Laplace equation at a grid point that has all of its neighboring grid points populated is given as follows:
$$\nabla^2 \Phi \;(\text{at grid point } 0) = \frac{\partial^2 \Phi}{\partial x^2} + \frac{\partial^2 \Phi}{\partial y^2} = \frac{\Phi_1 + \Phi_2 - 2\Phi_0}{dx^2} + \frac{\Phi_3 + \Phi_4 - 2\Phi_0}{dy^2} \tag{26}$$
Setting this expression to zero and solving for $\Phi_0$ gives:
$$\Phi_0 = \frac{1}{2} \left( \frac{1}{dx^2} + \frac{1}{dy^2} \right)^{-1} \left( \frac{\Phi_1 + \Phi_2}{dx^2} + \frac{\Phi_3 + \Phi_4}{dy^2} \right) \tag{27}$$
where Figure 5 details the labeling convention of the electric potentials for each of the 5 grids. If a grid point has a neighboring grid point that is not populated, the governing numerical representation of the Laplace equation must take into consideration the fact that there now exists an insulated boundary condition along one of the sides of grid 0. Suppose grid 1 has fewer than 0.75 atoms inside of it. This grid is then considered unpopulated and its edges are considered to be insulated boundaries, as shown in Figure 6 (a). The requirement for an insulated boundary is the following:
$$j \cdot n = 0 \tag{28}$$
where $n$ is the insulator's surface normal. Notice that Figure 6 (a) has $\Phi_1 = \Phi_0$. This satisfies the insulator boundary condition at its surface normal. This insulated boundary condition changes the solution for $\Phi_0$ to the following form:
$$\Phi_0 = \left( \frac{1}{dx^2} + \frac{2}{dy^2} \right)^{-1} \left( \frac{\Phi_2}{dx^2} + \frac{\Phi_3 + \Phi_4}{dy^2} \right) \tag{29}$$
The case of Figure 6 (b) is handled in the same way.
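The interior stencil of Equation 27 and the insulated-neighbor rule $\Phi_{neighbor} = \Phi_0$ can be sketched together as a simple relaxation loop. Note that this example uses plain Gauss-Seidel sweeps as a stand-in iterative solver; it is not the solver used by the authors, and the function and argument names are hypothetical.

```python
def relax_potential(populated, phi, fixed, dx, dy, sweeps=2000):
    """Relax phi in place toward a discrete Laplace solution (illustrative).

    populated[r][c] marks grid units that contain material, fixed[r][c] marks
    grid units whose potential is prescribed (applied-potential boundaries).
    """
    n_rows, n_cols = len(phi), len(phi[0])
    wx, wy = 1.0 / dx ** 2, 1.0 / dy ** 2

    def neighbor_value(r, c, r0, c0):
        # A missing or unpopulated neighbor mirrors the center value, so no
        # current crosses that face. At convergence this reproduces the
        # insulated-boundary form of the stencil.
        if 0 <= r < n_rows and 0 <= c < n_cols and populated[r][c]:
            return phi[r][c]
        return phi[r0][c0]

    for _ in range(sweeps):
        for r in range(n_rows):
            for c in range(n_cols):
                if not populated[r][c] or fixed[r][c]:
                    continue
                p1 = neighbor_value(r, c - 1, r, c)
                p2 = neighbor_value(r, c + 1, r, c)
                p3 = neighbor_value(r - 1, c, r, c)
                p4 = neighbor_value(r + 1, c, r, c)
                # Equation 27 with mirrored values where a neighbor is absent.
                phi[r][c] = 0.5 * ((p1 + p2) * wx + (p3 + p4) * wy) / (wx + wy)
    return phi


if __name__ == "__main__":
    rows, cols = 3, 5
    material = [[True] * cols for _ in range(rows)]
    material[1][2] = False  # a one-unit void in the middle of the strip
    pinned = [[c in (0, cols - 1) for c in range(cols)] for _ in range(rows)]
    potential = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        potential[r][cols - 1] = 4.0  # applied potential on the right edge
    relax_potential(material, potential, pinned, 1.0, 1.0)
    for row in potential:
        print([round(v, 3) for v in row])
```

Because each unpopulated neighbor is handled locally, the same loop works unchanged when voids appear, split, or merge, which is the property the text emphasizes for a grid-based solver.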
The complete solution to the electric field is obtained by implementing a Newton-Raphson iterative method. This approach was selected for its simplicity and robustness. Unlike a boundary element method, it does not become increasingly complex for an increasingly multiply connected specimen. This is important when implementing a discrete atomic-level simulation, since the specimen may have voids being created, breaking up, or coalescing. In addition, a finite difference or finite element method is preferred since the true location of interest of the current densities is near the surfaces of the specimen. The boundary element method's numerical accuracy decreases as the distance from the boundary element to the point of interest becomes comparable with the length of the elements used. Near the surface is precisely where the electromigration occurs due to the electromigration forces; therefore a finite difference approach was employed. After the electric field has been solved for, the task is then to convert this electric field to electromigration forces by the relation defined in Equation 24. Recall that the grids encompass a total of 4 atoms per unit, as shown in Figure 4. The forces are assigned to the atoms in such a manner as to have a contribution from every grid unit in which an atom resides. Figure 7 shows a set of grid units numbered 0 to 8 with the atoms in grid 0 numbered 1 to 8. The forces are assigned in the following manner:
$$\begin{aligned}
F_{atom\,1} &= \tfrac{1}{4} (F_0 + F_1 + F_4 + F_8) \\
F_{atom\,2} &= \tfrac{1}{2} (F_0 + F_4) \\
F_{atom\,3} &= \tfrac{1}{4} (F_0 + F_2 + F_4 + F_7) \\
F_{atom\,4} &= F_0 \\
F_{atom\,5} &= F_0 \\
F_{atom\,6} &= \tfrac{1}{4} (F_0 + F_1 + F_3 + F_5) \\
F_{atom\,7} &= \tfrac{1}{2} (F_0 + F_3) \\
F_{atom\,8} &= \tfrac{1}{4} (F_0 + F_2 + F_3 + F_6)
\end{aligned} \tag{31}$$
Figure 8 shows an example of the electric field forces and how they are assigned to the atoms. The bold vectors denote the force assigned to each grid point, the thin vectors denote the force assigned to each atom, and the circles denote the positions of the atoms in the lattice space. The statistical mechanics portion of the program requires energy as the input variable that determines the probability of a successful move. What has been described thus far is a conversion of electric potential to force. So why not use the electric potential directly? There are two reasons. First, the electromigration of atoms in metals is a result of the forces due to the electrons bombarding atoms in the material. This resulting force has been quantified by the relation given in Equation 24, which defines the electromigration forces as a function of the electric potential gradient and a material property referred to as the effective valence ($Z^*$). Therefore, to consider only the potential would omit certain material-specific effects. The electromigration forces can be thought of in a mechanical sense, in that the energy state of the atom is not changed unless the atom moves. This simply means that the energy associated with a move is as follows:
$$\Delta E_E = F_E \cdot \Delta d \tag{32}$$
where $\Delta d$ is the change in position vector from where the atom was before the move to where it is after. The second reason it is beneficial to convert from potential energy to mechanical work is that the energy on the grid space behaves correctly away from the material/no-material interfaces. Using the energy directly from the solution of the Laplace equation can result in incorrect material movement, as discovered in the early development of the electrical portion of the simulation program presented here.
Fig. 7. Forces at grid points overlaid on atom unit cell.
Fig. 8. Forces at grid points and lattice points, example.
The mass transport model used in continuum approaches assumes that the electromigration forces at the surface of the lines sweep material along the direction in which the electrons are moving, at a rate that is linearly proportional to the magnitude of the current density at that point. Here, the current density relates directly to the force on the atom, which relates directly to the energy of a movement in that force field, and finally to the probability of a movement based on energy states. This final energy state change is where the movement probabilities are determined and, in an aggregate fashion, transform to rates of mass transport. Since the movement probabilities are a function of the energy change inside an exponential, an atom that is selected to move against the force field will do so with a probability given by the statistical mechanics relation, Equation 20. The use of the statistical mechanics relation combined with the Metropolis and Kawasaki algorithms results in the automatic acceptance of a move selected in the downstream direction, while a move selected in the upstream direction is accepted or rejected based on the energy change from that movement. From this simplified argument, it is recognized that the statistical mechanics approach results in an increasingly higher aggregate mass transport response to increasingly higher current density magnitudes. This is the same trend that the continuum mass transport models enforce; however, the mass transport response of a discrete system of atoms requires many Monte Carlo iterations and can only be understood in an aggregate sense. This study of individual random movements of atoms in an effort to understand the complex aggregate response of the system is precisely the strength of this Monte Carlo statistical mechanics methodology. The evolution of the system of atoms in response to these environmental effects is what is of interest here.
2.2.2. Elastic field by finite elements
Stress gradients in a metal affect the flow of vacancies in the bulk as well as the surface migration of atoms. Preexisting stress in the line interconnect results from the coefficient of thermal expansion (CTE) mismatch between the silicon dioxide substrate and the metal line. The CTEs for SiO2 and Cu are $0.05 \times 10^{-6}$ and $16.5 \times 10^{-6}\ \mathrm{K^{-1}}$, respectively, so the copper has roughly 300 times the CTE of the silicon dioxide. The creation of the crystal during the cooling of the metal lines after deposition on the substrate results in residual thermal stresses due to the CTE mismatch between the substrate and the metal. These preexisting thermal stresses in the line interconnects have an associated mass transport phenomenon due to the varying elastic energy field in the material. The resulting stresses in the metal lines are tensile, since the metal possesses a higher CTE than the substrate, to which it bonds at just below the metal's melting point. On cooling, the metal contracts more than the substrate/barrier and as a result experiences a tensile stress once it has cooled down to its operating temperature. Mass transport due to electromigration forces can also cause additional mechanical stresses to develop, especially if the line is embedded in a multilayered substrate or simply passivated. The resulting diffusion downstream along the interface produces a compressive stress at hillock locations and a tensile stress at void locations. This project is concerned not with these mechanisms of stress evolution, but with the effects of an existing stress state on a specimen, so that varying stress states, regardless of cause, can be investigated. At an atomic level, the term stress has no meaning. The atoms possess a pair potential energy that is a function of the distance between the atoms, and the derivative of this pair potential energy is the force between the two atoms. The elastic energy is stored in the lattice by virtue of the distances between atoms. Computing it discretely requires solving for the equilibrium position of each atom individually, which is computationally intensive, since a new equilibrium must be solved for after each Monte Carlo iteration. Elastic effects can be included with less computational effort if an overlaid continuum problem is solved, as was done with the electric problem.
This enables the simulation to include elastic effects without the need to calculate the energy state after every Monte Carlo iteration. The use of a continuum approach to solve for the elastic energy requires the partitioning of the lattice space into elements, as done in the electrical problem. In fact, the same grid pattern used to define the locations of the electric potentials defines the elements used in the finite element analysis of the specimen. The corners of the grid cells previously defined in Figure 4 are the nodes of the finite elements. Whether an element contains material is determined in the same manner as for the electrical problem: if an element has fewer than 0.75 atoms inside its boundaries, it is considered empty and makes no contribution to the elastic energy of the specimen.
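The occupancy rule just described can be sketched in a few lines of Python. This is only an illustration: the function names and the representation of the grid as a nested list of (possibly fractional) atom counts are assumptions, and how fractional counts arise is not specified here. Only the 0.75 threshold comes from the text.

```python
# Hypothetical sketch of the element occupancy rule; only the 0.75
# threshold is taken from the text, everything else is an assumption.
OCCUPANCY_THRESHOLD = 0.75


def is_active(atom_count: float, threshold: float = OCCUPANCY_THRESHOLD) -> bool:
    """An element with fewer atoms than the threshold is treated as empty."""
    return atom_count >= threshold


def active_mask(counts):
    """Map a 2-D grid of atom counts to a grid of booleans.

    True marks an element that contributes to the elastic energy,
    False marks an element treated as empty.
    """
    return [[is_active(c) for c in row] for row in counts]


# Example grid of atom counts per element (illustrative values only).
example_counts = [
    [0.0, 0.5, 4.0],
    [0.75, 4.0, 4.0],
]
example_mask = active_mask(example_counts)
```

Elements flagged False would simply be skipped when the element contributions are assembled.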
The constitutive relation used in this approach is that of an isotropic material. The underlying structure of the lattice would clearly call for a more complex, non-isotropic constitutive relation. Treating the material as isotropic in order to reduce computational time is justified, since the scope of this research is to establish the effectiveness of the statistical mechanics Monte Carlo method. We use here the simplest case to verify that the inclusion of elastic energy affects the response of the atoms. Increased complexity in the solution of the electrical and elastic problems is reserved for future efforts. The objective here is to include electric and elastic effects and compare the results to experimental and other simulation results.
3. Applications
3.1. Void Migration and Coalescence Driven by Electromigration Forces
In this section we study the translational response of a void subjected to different electric fields and temperatures. We limit our analysis to a void with a constant initial size of d = 50 ADs. Voids of different sizes are analyzed in [1]. The goal here is to uncover some of the details that lead to the spawning of a leg from an initially circular void and to identify what occurs after the leg has been spawned and is allowed to continue to grow. The simulation test conditions cover temperatures of 600 K, 700 K, and 800 K and electric currents such that $\Delta E_E/E_B$ takes the values 0.08, 0.161, and 0.242. Each of the three field values is simulated at each of the three temperatures. This results in a two-dimensional array of nine simulations, enabling a comparison of temperature effects at constant electric field, and vice versa. All nine simulations use periodic boundaries on all sides of the lattice space, so any atom may move across any lattice surface. The applied electric field is defined with the cathode on the right vertical edge of the lattice space.
3.1.1. Case 1: Low intensity
Figures 9, 10, and 11 display simulation results for T = 600 K, 700 K, and 800 K, respectively; all have an electric field such that $\Delta E_E/E_B$ = 0.08. These simulations show that for T = 700 K and 800 K the void motion is stable, but for T = 600 K the void does not translate in a stable fashion.
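The three-by-three test matrix described at the start of Section 3.1 can be enumerated as follows. This is a bookkeeping sketch only: the dictionary keys and function name are hypothetical, while the temperatures, field ratios, and the figure numbering (Figures 9 to 17, ordered by field ratio and then temperature) are taken from the text.

```python
from itertools import product

# Values quoted in the text.
TEMPERATURES_K = (600, 700, 800)
FIELD_RATIOS = (0.08, 0.161, 0.242)  # electric energy per unit binding energy
FIRST_FIGURE = 9                      # Figures 9-17 show the nine runs


def build_test_matrix():
    """Return the nine runs ordered by field ratio, then temperature."""
    runs = []
    for index, (ratio, temp) in enumerate(product(FIELD_RATIOS, TEMPERATURES_K)):
        runs.append({"T_K": temp, "field_ratio": ratio, "figure": FIRST_FIGURE + index})
    return runs


runs = build_test_matrix()
for run in runs:
    print(f"Fig. {run['figure']}: T = {run['T_K']} K, field ratio = {run['field_ratio']}")
```

Holding `field_ratio` fixed and scanning `T_K` gives the temperature comparison at constant field, and vice versa.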
Fig. 9. Void motion in an electric field, T = 600 K, $\Delta E_E/E_B$ = 0.08. Panels (a)-(i): Monte Carlo steps 1, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0, and 40.0 million.
In the three simulations presented thus far, the only difference in environment has been temperature. Therefore, there must exist a relationship between temperature, electric field, and void size such that a critical combination marks the transition from stable to unstable void translation. In addition, notice that the void motion is stable for both T = 700 K and 800 K. This is due to the fact that the increase in temperature only serves to increase the surface mobility of the void. Therefore, it is reasonable to assume that if simulations were run for T > 800 K and $\Delta E_E/E_B$ = 0.08, the void would continue to translate in a stable
Fig. 10. Void motion in an electric field, T = 700 K, $\Delta E_E/E_B$ = 0.08. Panels (a)-(i): Monte Carlo steps 1, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0, and 40.0 million.
fashion. Similarly, if simulations were run for T < 600 K and $\Delta E_E/E_B$ = 0.08, the void motion would continue to display unstable translation behavior.
3.1.2. Case 2: Intermediate intensity
Figures 12, 13, and 14 display simulation results for T = 600 K, 700 K, and 800 K, respectively; all have an electric field such that $\Delta E_E/E_B$ = 0.161. This electric field is twice the intensity of that in the previous three simulations. The void evolution in these three simulations
Fig. 11. Void motion in an electric field, T = 800 K, $\Delta E_E/E_B$ = 0.08. Panels (a)-(i): Monte Carlo steps 1, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0, and 40.0 million.
yields more complex results than can be described by the term unstable. In fact, none of the voids in these three simulations translates in a stable fashion. They all exhibit an unstable void translation ranging from simple to complex. The simple unstable translation is shown in Figure 12 (T = 600 K and $\Delta E_E/E_B$ = 0.161). From this simulation, one can see the void change shape from circular to horseshoe-like. This horseshoe shape can be thought of as the result of the growth of two legs from the initial void, one at the top and one at the bottom surface of the void. The legs spawned from the top and bottom of the void in Figure 12 are very slender and continue to grow in
Fig. 12. Void motion in an electric field, T = 600 K, $\Delta E_E/E_B$ = 0.161. Panels (a)-(i): Monte Carlo steps 1, 2.5, 5.0, 7.5, 10.0, 12.5, 15.0, 17.5, and 20.0 million.
length until the supply of empty space from the initial void is depleted. The result is two separate voids that are crack-like in shape and lie parallel to the electrical current direction. The surface of these two slender voids is fairly smooth for T = 600 K. A very similar result is obtained from the simulation in which the temperature is increased to 700 K. Figure 13 (T = 700 K and $\Delta E_E/E_B$ = 0.161) shows the same initial horseshoe-like shape change that occurred for T = 600 K, as shown in Figure 12. The first notable difference, with the increase in temperature, is the change in leg shape. The shape of the leg, as shown in
Fig. 13. Void motion in an electric field, T = 700 K, $\Delta E_E/E_B$ = 0.161. Panels (a)-(i): Monte Carlo steps 1, 2.5, 5.0, 7.5, 10.0, 12.5, 15.0, 17.5, and 20.0 million.
Figure 12 (T = 600 K and $\Delta E_E/E_B$ = 0.161), is relatively long, thin, and flat. The shape of the leg, as shown in Figure 13 (T = 700 K and $\Delta E_E/E_B$ = 0.161), is shorter, thicker, and rougher than the leg for the lower temperature. Again, the increased surface mobility at the higher temperature is the reason the voids are smoother and longer in the lower-temperature simulation. The second notable difference between the two simulation results is the shape evolution after the increase to T = 700 K. The bottom leg of the horseshoe shape that has developed begins to pinch and emit small voids. This also occurs for T = 600 K, but not until the 15
Fig. 14. Void motion in an electric field, T = 800 K, $\Delta E_E/E_B$ = 0.161. Panels (a)-(i): Monte Carlo steps 1, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0, and 40.0 million.
millionth Monte Carlo step. The release of the smaller void from the leg tip occurred at a much earlier stage in the void evolution for T = 700 K, again indicating that the higher surface mobility at the higher temperature allowed this shedding of voids to occur at a greater frequency. More complex unstable translation occurs for T = 800 K and $\Delta E_E/E_B$ = 0.161, as shown in Figure 14. This simulation shows that if the temperature and electric field combination is large enough, the increase in surface mobility enables the two legs of the horseshoe to grow toward the centerline of the void and combine. This results in an island of material floating
in a void. The void then continues to translate upstream and releases the island of material back into the bulk. The simulation for T = 800 K and $\Delta E_E/E_B$ = 0.161 shown in Figure 14 also releases many subsequent voids from the small legs that appear just after the formation of the horseshoe shape, although this is not visible in the snapshots shown in the figure.
3.1.3. Case 3: High intensity
Figures 15, 16, and 17 display simulation results for T = 600 K, 700 K, and 800 K, respectively; all have an electric field such that $\Delta E_E/E_B$ = 0.242.
Fig. 15. Void motion in an electric field, T = 600 K, $\Delta E_E/E_B$ = 0.242. Panels (a)-(i): Monte Carlo steps 1, 2.5, 5.0, 7.5, 10.0, 12.5, 15.0, 17.5, and 20.0 million.
Fig. 16. Void motion in an electric field, T = 700 K, $\Delta E_E/E_B$ = 0.242. Panels (a)-(i): Monte Carlo steps 1, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, and 16.0 million.
This is the largest electric field simulated with these conditions. These three simulation results, for $\Delta E_E/E_B$ = 0.242, show the same basic response at each temperature as the three simulations with $\Delta E_E/E_B$ = 0.161. All three simulations show unstable void motion and void breakup, as did the simulations for $\Delta E_E/E_B$ = 0.161. This is expected, since the intensity of the electric field is increased beyond a value that already led to unstable void motion and breakup. Figure 15 (T = 600 K and $\Delta E_E/E_B$ = 0.242) shows the typical horseshoe shape development seen in all but one of the unstable void translations, the exception being Figure 9 (T = 600 K and $\Delta E_E/E_B$ = 0.08).
Fig. 17. Void motion in an electric field, T = 800 K, $\Delta E_E/E_B$ = 0.242. Panels (a)-(i): Monte Carlo steps 1, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0, and 40.0 million.
Figure 15 (T = 600 K and $\Delta E_E/E_B$ = 0.242) shows that the shapes of the two voids created are similar to those in Figure 12. The primary difference between the two simulation results is that the larger electric current led to a higher frequency of void shedding from the legs. Figure 12 (T = 600 K and $\Delta E_E/E_B$ = 0.161) shows only one void being released from the tip of the leg. Therefore, the increased frequency of the release of voids from the leg tip is due to the increase in electric current. Figure 16 (T = 700 K and $\Delta E_E/E_B$ = 0.242) also has an increased frequency of void releases from the leg tips in comparison to its smaller electrical current counterpart in Figure 13
(T = 700 K and $\Delta E_E/E_B$ = 0.161). Again, the cause of this must be the increase in electric current from $\Delta E_E/E_B$ = 0.161 to $\Delta E_E/E_B$ = 0.242. Figure 16 (T = 700 K and $\Delta E_E/E_B$ = 0.242) shows that after the horseshoe is well formed, the two legs grow toward the center-line of the void at a steeper angle than seen in any of the other simulations that do not form an island of material inside the void. If the simulation were allowed to continue, the two voids would certainly begin to interact. The steepness of the angle at which the two newly formed voids grow toward the center-line is a function of both the temperature and the electrical current intensity. Figure 17 (T = 800 K and $\Delta E_E/E_B$ = 0.242), like Figure 14 (T = 800 K and $\Delta E_E/E_B$ = 0.161), evolves into a void with an island of material inside of it. This results when the current density and mobility are such that the material is swept away so fast that rounding is impossible. Like the simulation in Figure 14 (T = 800 K and $\Delta E_E/E_B$ = 0.161), the simulation for T = 800 K and $\Delta E_E/E_B$ = 0.242 releases subsequent voids from the horseshoe legs. The voids shed from the horseshoe legs for T = 800 K and $\Delta E_E/E_B$ = 0.242 are smaller than for any of the other simulated combinations of temperature and electrical current.
3.1.4. Void shedding
Figure 18 shows each of the nine simulations at the instant it sheds its first subsequent void into the bulk. This array of snapshots enables one to see the effects of the combination of temperature and electric current on the size of the voids that are released. Consider the middle row of snapshots in Figure 18. This row shows the effects of temperature on void release in a constant electric field: the size of the void that is released decreases as the temperature increases. Similarly, the top row in Figure 18 shows the same relationship between temperature and void size for a constant electric current.
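The qualitative outcomes reported for the nine runs in Cases 1 to 3 can be collected in a small lookup, which makes the stable and unstable regions of the test matrix easy to query. The sketch below only records what the text states; the names and the three outcome labels are hypothetical conveniences, not part of the simulation method.

```python
# Outcomes as reported in the text, keyed by (temperature in K, field ratio).
TEMPERATURES_K = (600, 700, 800)
FIELD_RATIOS = (0.08, 0.161, 0.242)

STABLE_RUNS = {(700, 0.08), (800, 0.08)}    # Figures 10 and 11
ISLAND_RUNS = {(800, 0.161), (800, 0.242)}  # Figures 14 and 17


def outcome(temp_k, field_ratio):
    """Return the qualitative outcome label for one run."""
    key = (temp_k, field_ratio)
    if key in STABLE_RUNS:
        return "stable"
    if key in ISLAND_RUNS:
        return "unstable, island forms"
    return "unstable"


summary = {(t, r): outcome(t, r) for r in FIELD_RATIOS for t in TEMPERATURES_K}
stable_count = sum(1 for label in summary.values() if label == "stable")
```

Read this way, only two of the nine runs translate stably, both at the lowest field ratio.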
Comparing the columns in Figure 18 also leads to the conclusion that there exists a relationship between the size of a void that is released and the electric current. Each of the columns in Figure 18 shows a decrease in released void size as the electric current is increased. Therefore, the released void size is also inversely related to the electric current. The frequency at which voids are shed is also a function of temperature and current: an increase in either prompts an increase in shedding frequency. In summary, the results from this section lead to the conclusion that combinations of temperature and current densities lead to a transition from
Fig. 18. Snapshot of first release of subsequent void for all nine simulations, panels (a)-(i). Columns are, from left to right, T = 600 K, 700 K, and 800 K. Rows are, from top to bottom, $\Delta E_E/E_B$ = 0.242, 0.161, and 0.08.
stable void motion to unstable void motion. Stable void motion is characterized by the defect's ability to translate through the bulk without changing shape to the extent that the void cannot travel as one unit. Unstable void motion is characterized by a change in shape that leads to the formation of legs from the top, bottom, or both top and bottom sides of the void. These legs grow in such a way as to give the shape of the void a horseshoe-like geometry. This horseshoe geometry then results in two non-connected voids
or an interaction between the two legs resulting in an island of material inside of a void. Whether the legs become two separate voids or combine and form an island of material inside of the void is a function of the temperature and current density. The results observed here are consistent with other computational investigations of the effect of electrical current on preexisting defects in interconnects, including the work of [5], [38], and [41].
3.2. Void Migration and Coalescence Driven by Concurrent Electromigration and Stress Gradients
The effects of electromigration are further coupled with the effects of stress driven mass transport. The difference in the CTE between the conducting layer, barrier layer, and substrate results in a constant state of stress after the deposition and annealing are complete. The stress state that exists in the line certainly plays a role in the creation of defects at the interface with the substrate. These initial surface defects provide yet another source and sink for defects in the lines. The continual decrease in line interconnect size has brought the technology to a crossroads: the techniques being used to create these line interconnects are exceeding the capabilities of the materials being used. It is possible for the line interconnects to be fractured and unusable after the creation process is complete, before any burn-through testing has occurred. This signals the importance of considering the thermal stresses that arise during normal use of any electronic device that possesses small features. This section considers the effect of stress driven mass transport based on the strain energy density associated with an atom selected for movement. Recall that the strain energy density for element i is given by
$$U_i = \frac{1}{2}\, d_i^{T} \left[ \int k_i \, dV_i \right] d_i. \qquad (33)$$
This strain energy density is then easily converted to energy based on the number of atoms inside the element and the area of the element. The approach used in this investigation utilizes a regular grid, whose cells are the elements used in the finite element calculation. Each element contains four atoms; therefore the strain energy per atom inside element i is
$$\bar{U}_i = \frac{U_i A}{4}, \qquad (34)$$
where $U_i$ is the strain energy density of element i and A is the area of the element.
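As a minimal illustration of this conversion, the sketch below multiplies an element's strain energy density by its area and divides by the four atoms it contains. The function and parameter names are hypothetical, and the units are whatever consistent set the density and area are expressed in.

```python
ATOMS_PER_ELEMENT = 4  # stated in the text for the regular grid used here


def strain_energy_per_atom(energy_density: float, element_area: float,
                           atoms: int = ATOMS_PER_ELEMENT) -> float:
    """Convert an element strain energy density to an energy per atom.

    energy_density : strain energy per unit area of the element
    element_area   : area of the element
    atoms          : number of atoms sharing the element's energy
    """
    if atoms <= 0:
        raise ValueError("an element must contain at least one atom")
    return energy_density * element_area / atoms


# Illustrative numbers only: density 2.0, area 4.0, four atoms.
example_energy = strain_energy_per_atom(2.0, 4.0)
```

Keeping the atom count as a parameter makes it easy to adapt the same conversion to a grid with a different number of atoms per element.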
All simulations in this investigation apply displacement boundary conditions on each of the vertical edges of the lattice space and zero-force boundary conditions on both horizontal edges. The stress state is defined in terms of an atom's energy state as it would be in a rectangular line interconnect (under uniaxial applied strain) with no defects on its surface or in its interior. It is expressed in the same spirit as the electric current, per unit of binding energy. This dimensionless ratio, like the one used for the electric current ($\Delta E_E/E_B$), reduces the elastic effects to a form that makes the comparison of effects due to simple surface diffusion and stress driven migration more natural for a statistical mechanics approach such as this. It is important to recognize that even though there exists a nonzero strain energy at each occupied lattice site, the driving force behind the mass transport due to elastic effects is the gradient of the elastic energy field. For example, a large specimen under uniaxial tension has a homogeneous strain energy density and provides no preferential direction for mass transport. If there were a defect in that large specimen, however, the strain energy density would be non-homogeneous, resulting in a nonzero gradient near the defect, and the effects of the defect would decay as the distance from the defect increases. Stress driven mass transport is therefore a local phenomenon that depends on the presence of a stress gradient. Consequently, in the simplified 2-D plane stress model with uniaxial (x direction) applied displacement boundary conditions, the mass transport resulting from the inclusion of elastic effects is severely localized. The result, as seen in the simulation results, is a crack-like defect spawned at the location of greatest strain energy.
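The point that only the gradient of the energy field drives transport can be illustrated with a central-difference gradient on a small grid. This is a sketch under stated assumptions: the grid values, the spacing, and the function name are hypothetical, and the finite-difference stencil is a generic choice rather than the scheme used in the simulations.

```python
def energy_gradient(field, spacing=1.0):
    """Central-difference gradient at the interior nodes of field[row][col].

    Returns a dict mapping (row, col) to (d/dx, d/dy).
    """
    n_rows, n_cols = len(field), len(field[0])
    grad = {}
    for j in range(1, n_rows - 1):
        for i in range(1, n_cols - 1):
            gx = (field[j][i + 1] - field[j][i - 1]) / (2.0 * spacing)
            gy = (field[j + 1][i] - field[j - 1][i]) / (2.0 * spacing)
            grad[(j, i)] = (gx, gy)
    return grad


# Homogeneous strain energy: nonzero everywhere, but no preferred direction.
uniform = [[1.0] * 5 for _ in range(5)]

# Hypothetical local energy concentration standing in for a defect.
perturbed = [row[:] for row in uniform]
perturbed[2][2] = 3.0

uniform_grad = energy_gradient(uniform)
perturbed_grad = energy_gradient(perturbed)
```

In the uniform field every gradient is zero, while in the perturbed field the gradient is nonzero only at the nodes next to the concentration, which mirrors the localized character of stress driven transport described above.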
3.2.1. Case 1: High stress gradients with no electromigration
In this simulation we include stress effects only. The lattice space of (x = 250, y = 250) atoms (copper) has periodic boundary conditions on all edges. The temperature of the specimen is 700 K. A uniaxial strain is applied in the x direction such that the strain energy to binding energy ratio is 0.30. A single void of 60 atomic diameters is placed at x = 75. This combination of environmental effects and boundary conditions simulates the effects of a small void in the bulk during annealing. The temperature and the strain energy ratio can be adjusted to simulate variation in operating temperature and CTE mismatch, which is directly related to the stress.
Fig. 19. Void evolution under uniaxial applied stress, such that the strain energy to binding energy ratio is 0.30, and T = 700 K. Panels (a)-(g): Monte Carlo steps 1, 2.43, 4.86, 7.29, 9.72, 12.15, and 14.58 million.
Figure 19 shows the simulation results for the case just described. Initially, the void begins to facet, as expected. Then the locations with the highest strain energy density, the corners of the facets, begin to push material away. The simulated crystallographic orientation offset ($\theta_c$ = 0°) provides a chance for growth along four corners, since the uniaxial applied stress in the x direction results in the maximum strain energy occurring at the top and bottom of the defect. Consider the effect of an offset of 30°, the result of which would be much simpler than the offset
of 0°. If the crystal offset were set to 30°, the initial faceting would result in two corners, at the top and bottom of the void, leaving no ambiguity as to where the maximum strain energy would occur. The material being swept away from the corners of the facets results in the growth of crack-like appendages from those locations, which continue to grow throughout the simulation. There are a few continuum simulations that have included the effects of stress coupled with electromigration, for example the work of Xia et al. (1997) [44] and Bathe et al. (2000) [5], which are in general agreement with the simulations presented here.
3.2.2. Case 2: High stress gradients with electromigration
The next step is to consider electromigration and stress driven mass transport simultaneously. The simulation shown in Figure 20 has all of the same geometric and environmental initial conditions as the previous case. The temperature is 700 K, a uniaxial strain (x direction) is applied such that the strain energy to binding energy ratio is 0.30, there is an applied electric potential such that $\Delta E_E/E_B$ = 0.08, and the crystal offset is 0°. The initial growth locations of the appendages from the void are the same in both this simulation and the previous one. With the inclusion of electric current, the appendages begin to sway upwind, giving a tentacle-like appearance. This swaying of the crack-like defects would inevitably result in the pinching of an appendage at some point, thus shedding a void. With the inclusion of stress, the top of the void begins to show a distinct tilt that it would not otherwise display with $\theta_c$ = 0°. The inclusion of elastic effects results in a change in the mass flow rate through the point of highest strain energy density. This effectively reduces the surface migration from the windward side of the void to the leeward side, so that the void spawns these lengthy crack-like appendages without much translation of the void itself.
3.2.3. Case 3: Low stress gradients with electromigration
Figure 21 shows the simulation results for a specimen identical to the previous one (see Figure 20), except that the applied displacement boundary condition is such that the strain energy to binding energy ratio is 0.15. The evolution of the void in this simulation is much different from that in the simulation with a ratio of 0.30. With a ratio of 0.15, many more Monte Carlo steps are required even to initiate the crack-like defect. Only a single appendage grows
Fig. 20. Void evolution under uniaxial applied stress, such that the strain energy to binding energy ratio is 0.30, applied electric current, such that $\Delta E_E/E_B$ = 0.08, and T = 700 K. Panels (a)-(i): Monte Carlo steps 1, 0.81, 1.62, 2.43, 3.24, 4.05, 4.86, 5.67, and 6.48 million.
from the void in this simulation. Recall, the previous simulation resulted in three appendages growing immediately. This single appendage is spawned from the bottom left side of the void and begins to exhibit the same swaying upwind that was seen in Figure 20. The swaying of the appendage in Figure 21 is on a smaller scale and the appendage length is certainly much smaller than those seen in Figure 20. The pinning of mass transport at the tip of the appendage that is growing in Figure 21 begins to enable the bottom surface of the void to grow perpendicular to the electric current, which
Fig. 21. Void evolution under uniaxial applied stress, such that the strain energy to binding energy ratio is 0.15, applied electric current, such that $\Delta E_E/E_B$ = 0.08, and T = 700 K. Panels (a)-(i): Monte Carlo steps 1, 10.1, 20.2, 30.3, 40.4, 50.5, 60.6, 70.7, and 80.8 million.
is similarly displayed in Figure 20. This growth of the bottom and top surfaces perpendicular to the direction of the electric current, with an offset of $\theta_c$ = 0°, was not possible when elastic effects were not considered. In summary, the predictions from this mixed discrete-continuum approach, as expected, differ from the isotropic continuum calculation [44]. The
main difference is the faceting of the specimen due to the inherent sixfold symmetry of the underlying lattice.
3.3. Void-Surface Interaction under Electromigration Forces
The effects of a void near an external surface are investigated in this section. The void is placed at varying distances from the initially flat external surface in order to see how the void and the surface interact. There exists a distance between the void and surface at which the interaction between them is strong enough to force the absorption of the void into the external surface. This scenario is of interest because the failure of line interconnects is not due solely to surface migration. The collection and interaction of defects at fixed locations, such as grain boundaries and vias, is also a potential cause of failure because these locations serve as sinks for vacancies and voids moving in the bulk. In addition, line interconnects used in devices are passivated, whereas the majority of experimental investigations into line interconnect failure have considered unpassivated lines. Consider a defect formed at the interface of the passivation layer and the conducting material. In an unpassivated line, this defect could sweep material away along the external surface and grow, because the material could be relocated. The passivation layer restricts this scenario to a certain degree, because the material is encased and cannot be redistributed along the interface as easily as along a free surface. With a passivation layer, vacancies and voids must be transported through the bulk for a defect along the passivation interface to grow. This results in a 2 to 16 times increase in lifetime for copper line interconnects with a polyimide passivation layer, as stated by Jiang and Chiou [25]. Therefore, the interaction between voids and surface defects is of increased concern for passivated lines, since surface migration is restricted in this case.
The lattice space for these simulations is (x = 250, y = 250) atoms (copper) with periodic boundary conditions along the vertical edges of the lattice space. The horizontal edge of the lattice space is set to be non-periodic in order to provide a flat surface for the void to interact with. The void's diameter is 50 ADs, and the void is placed at three different distances from the surface such that the distance between the edge of the void and the surface, a, is
12, 6, and 3 ADs. The applied electric potential is such that $\Delta E_E/E_B$ = 0.08, and the temperature is set to 700 K.
3.3.1. Case 1: Void far from surface (12 atomic spacings)
Figure 22 shows the simulation results for the void that is set 12 ADs away from the surface. The ratio of the radius of the void to the distance a is 25/12 = 2.083. The void begins to change its geometry by sweeping material away from its bottom surface faster than from its top surface, resulting in the formation of a leg similar to that seen in Figure 9. This is an expected result, since the current density in the material between the void and the external surface is larger than the current density at the top surface of the void. The external surface is also affected by the presence of a void so near to it. The material is swept away at the closest point between the void and the external surface, resulting in the formation of a ramp along the external surface. The leg grows alongside the ramp, and eventually the leg is pinched and forms a void that translates in a stable fashion. After the release of this void, another leg begins to grow from the location where the first leg pinched. The remaining void sheds another void because of the geometry of the pinched region, which is long enough to foster the growth of yet another leg and hence the release of yet another void from the initial one. Though the void and external surface clearly affect each other's evolution, the void never combines with the external surface in this simulation.
3.3.2. Case 2: Intermediate distance (6 atomic spacings)
Figure 23 shows the simulation results for the same initial conditions described in the previous simulation, except that the void is brought closer to the external surface, to a distance of 6 ADs, giving a radius-to-distance ratio of 25/6 = 4.167. The void in this simulation also begins its evolution by spawning a leg at its bottom surface.
This leg, as in the previous simulation, grows to a point and is then pinched. Thus, it releases a second void into the bulk. However, there are some notable differences between the simulation results in Figure 22 and 23. First, the size of the void that was shed decreased as the void was brought closer to the external surface. This is due to the increase in current density between the void and the external surface. This increase in current density is due to the fact that the void is now closer to the external surface, providing less material in which the current can flow. These effects
Fig. 22. Void motion near a surface in an electric field, T = 700 K, $\Delta E_E/E_B$ = 0.08, radius-to-distance ratio 2.083. Panels (a)-(i): Monte Carlo steps 1, 1.35, 2.7, 4.05, 5.4, 6.75, 8.1, 9.45, and 10.8 million.
of increased electric current at a fixed temperature resulting in smaller voids being shed were discussed in the previous section. Recall that as the electric current increased, the size of the voids that were shed became smaller. The same effect is seen in these simulations with a void near an external surface, and this is the reason the void that was shed in Figure 23 is smaller than that in Figure 22.
Fig. 23. Void motion near a surface in an electric field, T = 700 K, $\Delta E_E/E_B$ = 0.08, radius-to-distance ratio 4.167. Panels (a)-(i): Monte Carlo steps 1, 0.54, 1.08, 1.62, 2.16, 2.7, 3.24, 3.78, and 4.32 million.
Second, after the release of the void, the pinched region still attached to the initial void seemed to be reabsorbed back into the void. The simulation shown in Figure 22 resulted in a second void being released from the pinched location. The fact that the pinched region seemed to be reabsorbed into the void without spawning another leg can be attributed to the random surface shape inherent in a Monte Carlo process. More specifically, the surface just
after the release of the voids differs between Figures 22 and 23. The bottom surface of the void, just after the release of the first void, is flatter in the simulation in Figure 22. The result is the swift passing of material along that surface from the pinched region. In comparison, the bottom surface of the void in Figure 23 just after the pinching is not smooth and has to go through a redistribution of material before becoming flat.
3.3.3. Case 3: Void near to surface (3 atomic spacings)

Figure 24 shows the simulation results for a void that is 3 ADs away from the external surface (void-surface distance parameter = 8.333). The void and the external surface interact almost immediately in this simulation. The initial distance that resulted in combination is so small that it can be argued that the inherent randomness of the external surface and void could have led to combination even without the electric current.
3.3.4. The role of lattice orientation

Schimschak and Krug (2000) [38] performed exactly this simulation using ^ = 2 and ~ = 3. The differences between the simulations can be explained by the different assumptions for the surface diffusivity. The present simulations consider anisotropic lattice diffusion, while isotropy is assumed in [38]. Recall that the use of a hexagonal lattice results in a surface diffusivity that has a six-fold anisotropy. The orientation used in this investigation is therefore such that it tends to elongate the initial voids and spawn legs that grow parallel to the line interconnect. If the crystal is rotated 30°, the interaction between the void and external surface would occur without the need to bring them as close. This is clear from the simulation done with the crystal offset (θ_c) by 30°, as shown in Figure 25. The simulation shown in Figure 25 has the same initial shape, boundary conditions, and environmental variables as the simulation shown in Figure 9, namely T = 600 K and an electric current parameter of 0.08. The only difference is that θ_c = 30°.

The resulting evolution of the void geometry after the rotation of the crystal is clearly different. The void with the 30° crystal offset did not become elongated like the void with no offset. This is due to the new relative orientation between the line interconnect and the crystal. The result is a more triangular shape, with the top and bottom of the initially circular void growing at an angle to the line interconnect, as opposed to parallel to it as with no crystal offset.
Fig. 24. Void motion near a surface in an electric field, T = 700 K, electric current parameter = 0.08, void-surface distance parameter = 8.333. Panels (a)-(i): Steps 1, 27, 54, 81, 108, 135, 162, 189, and 216 thousand.
The propensity of this crystal orientation to grow defects that tend to expand along the line-width leads to the conclusion that this orientation is more capable of fostering void breakup and surface defect growth across the line, thus resulting in shorter times to failure of the line interconnect. This idea is supported by all of the continuum techniques that simulate electromigration with surface diffusivity anisotropy [41] [30] [20] [32] [5] [38] [18] [3] [34]. This idea is also validated by experimental work by Chu et al. [11], which clearly showed that the crystallographic orientation with the offset of 30° is significantly less resistant to the causes of failure than the orientation with θ_c = 0°. In fact, Chu et al. labeled the 30° offset as the preferred orientation for failure.

Fig. 25. Void motion in an electric field with a crystal offset of 30°, T = 600 K, electric current parameter = 0.08. Panels (a)-(i): Steps 1, 6.75, 13.5, 20.25, 27.0, 33.75, 40.5, 47.25, and 54.0 million.
In summary, the three Monte Carlo simulations utilized a 0° offset, which aligns the [101] direction parallel to the electric current flow direction. This, as stated by Chu et al., is the orientation most likely to result in late failure of the line interconnect. Conversely, θ_c = 30° puts the [101] direction perpendicular to the electric current flow and is less resistant to the effects that lead to failure of the line interconnect. The Monte Carlo simulation results in Figures 9 and 25 start from identical initial conditions, except that Figure 9 has its [101] direction parallel (θ_c = 0°) to the electric current flow direction and Figure 25 has its [101] direction at an angle (θ_c = 30°) to it. The comparison shows that, for θ_c = 0°, there is a distinct tendency for the void to elongate and deform in such a way that it becomes flatter than its initial shape. For θ_c = 30°, the void also becomes more elongated, but in the direction perpendicular to the electric current flow. The propensity for certain orientations to be more resistant to the growth of a transgranular slit is also evidenced in each of the continuum simulation papers that included surface diffusivity anisotropy. The continuum papers reach the same conclusion regarding the preferred orientation of a crystal for slowing the growth of a transgranular slit. A consistent factor in the orientations that led to shorter times to failure, regardless of the degree of the anisotropy, is that the tangent of the most mobile surface is parallel to the electric current flow. This is due to the fact that the ability to sweep material away from the more mobile surfaces means the defect will grow in that direction.
If that direction is transgranular, as it is for θ_c = 30°, then the surface at the top and bottom of the void will have the largest magnitude of current density acting on it and simultaneously be the surface that can resist the movement of its surface atoms the least. This is precisely why the void elongates in the vertical direction for θ_c = 30° and in the horizontal direction for θ_c = 0°.
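The geometric statement in the summary above (a close-packed direction parallel to the current for a 0° offset and perpendicular to it for a 30° offset) follows directly from the six-fold symmetry of a two-dimensional hexagonal lattice. The following sketch lists the six nearest-neighbor directions for a given offset and measures their alignment with the current direction. It assumes that the current flows along the x axis and that the crystal offset is a rigid in-plane rotation of the lattice; the function names are hypothetical.

```python
import math


def neighbor_directions(theta_c_deg):
    """Unit vectors to the six nearest neighbors of a 2D hexagonal lattice
    rotated by theta_c_deg relative to the line (x) axis."""
    directions = []
    for k in range(6):
        angle = math.radians(theta_c_deg + 60.0 * k)
        directions.append((math.cos(angle), math.sin(angle)))
    return directions


def alignment_with_current(theta_c_deg, current_dir=(1.0, 0.0)):
    """Return (largest, smallest) |cos| of the angle between the neighbor
    directions and the current direction."""
    cosines = [
        abs(dx * current_dir[0] + dy * current_dir[1])
        for dx, dy in neighbor_directions(theta_c_deg)
    ]
    return max(cosines), min(cosines)


if __name__ == "__main__":
    for offset in (0.0, 30.0, 60.0):
        print(offset, alignment_with_current(offset))
```

With a 0° offset the largest alignment is 1 (a neighbor direction lies along the current), while with a 30° offset the smallest alignment is 0 (a neighbor direction lies perpendicular to the current). An offset of 60° reproduces the 0° case, which is the six-fold periodicity.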
3.4. Interaction with Grain Boundaries
This section investigates the effects of grain boundaries on the mass transport of material without and with electric current present. The grain boundary scenarios consist of a simulation of a vertical grain boundary with no electric current, and a simulation with electric current and a grain boundary with its top edge tilted 20° in the same direction as the electron flow, which in our simulations is always right to left. Several works have used a continuum representation of the material to investigate the effects of a grain boundary on mass transport at the interface. The model of the grain boundary used in these Monte Carlo simulations is simply a thin region of the lattice space where the binding energy of the atoms is lower than in the bulk.
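As a concrete picture of this grain boundary model, the sketch below builds a per-site binding energy map for a vertical grain boundary, using the lattice size (100 x 120 atoms), region width (2 ADs), and reduced binding energy (0.85 E_B) quoted for the thermal grooving case in Section 3.4.1. It assumes a rectangular array as a stand-in for the hexagonal lattice and a grain boundary centered in x; both choices, like the function name, are illustrative and are not taken from the authors' code.

```python
import numpy as np


def binding_energy_map(nx=100, ny=120, gb_width=2, gb_factor=0.85, e_bulk=1.0):
    """Per-site binding energy, in units of the bulk value E_B, for a lattice
    containing a thin vertical grain-boundary strip.

    The strip position (centered in x) is an assumption for illustration.
    """
    energy = np.full((ny, nx), e_bulk, dtype=float)
    start = nx // 2 - gb_width // 2
    # Sites inside the strip are bound more weakly than bulk sites.
    energy[:, start:start + gb_width] = gb_factor * e_bulk
    return energy


if __name__ == "__main__":
    energy = binding_energy_map()
    weak_sites = int(np.isclose(energy, 0.85).sum())
    print(energy.shape, weak_sites)
```

A Monte Carlo move would then look up the binding energy of the site involved instead of using a single bulk value, which is how a position-dependent energy can drive grooving at the boundary without any explicit surface-tension input.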
3.4.1. Thermal grooving

Figure 26 shows the simulation results for a vertical grain boundary in a lattice that is (x = 100, y = 120) atoms (copper). There is a periodic boundary condition along the vertical edges of the lattice space. There is a free surface at the bottom edge of the lattice space. The top surface of the lattice space is not periodic but is adjusted to keep vacancies from collecting at the top edge of the lattice space. The atoms that reside in the thin region (2 ADs wide) designated as the grain boundary have an adjusted binding energy of 0.85 E_B, there is no applied electric current, and the temperature is set to 600 K.

The simulation results in Figure 26 show the creation of a groove, with the point of the groove aligned precisely with the grain boundary region. This is a logical result and is in agreement with previous works that have utilized a continuum representation to model this situation [4] [33] [27]. Results from Sun and Suo [4] show a similar grooving effect occurring at the grain boundary edge. The two methods, continuum and discrete, produce closely similar surface evolution. The driving initial condition required by the continuum simulation is the angle of inclination of the surface adjacent to the grain boundary. This angle of inclination is a function of the surface tension of both the grain boundary and the surface neighboring it. The continuum simulations [33] [27] assume that this angle is constant, and it is what drives the surface migration of material. The atomic simulation, in effect, does the same thing by defining the binding energy of the atoms based on where in the lattice space they reside. The angle of inclination shown in Figure 26 remains fairly constant in nearly all of the snapshots. The inherent randomness of the atomic motion keeps the grain boundary interface geometry changing, but the interface clearly tends to orient itself with a constant angle of inclination.
Also, as shown in Figure 26, there were instances where the tip of the groove formed vacancies and voids just behind it, and these newly created defects then walked along the grain boundary until being released into the bulk. Therefore, the grain boundary is yet another vacancy source in a line interconnect.

Fig. 26. Grain boundary triple point evolution, T = 600 K, and grain boundary binding energy = 0.85 E_B. Panels (a)-(i): Steps 1, 1.35, 2.7, 4.05, 5.4, 6.75, 8.1, 9.45, and 10.8 million.

3.4.2. Void trapping at and emission from grain boundaries

Figure 27 shows the simulation results for a tilted grain boundary (20°) with an applied electric current parameter of 0.08 and a temperature of 600 K. The lattice space is (x = 450, y = 210) atoms (copper) with the same periodicity boundary conditions as in the previous simulation. Figure 28 shows an identical simulation, the only change being an increase in temperature to 700 K. The binding energy of the atoms on the left side of the grain boundary is simply E_B. The binding energy of the atoms on the right side of the grain boundary is set to 1.15 E_B.

Fig. 27. Slanted (20°) grain boundary triple point evolution, T = 600 K. Leeward grain boundary binding energy = E_B, windward grain boundary energy = 1.15 E_B. Panels (a)-(i): Steps 1, 5.4, 10.8, 16.2, 21.6, 27.0, 32.4, 37.8, and 43.2 million.

Fig. 28. Slanted (20°) grain boundary triple point evolution, T = 700 K. Leeward grain boundary binding energy = E_B, windward grain boundary energy = 1.15 E_B. Panels (a)-(i): Steps 1, 5.4, 10.8, 16.2, 21.6, 27.0, 32.4, 37.8, and 43.2 million.

The grain boundary is tilted in an effort to determine if defects that accumulate at the grain boundary would show a tendency to walk along the interface if the dot product between the grain boundary tangent and the electric current is nonzero. A small void having a diameter of 16 atoms is placed on the left side of the grain boundary. The electron flow is from right to left; therefore, the void and any vacancies spawned from it will travel to the grain boundary
and be forced to interact with it. It is from these accumulations at the grain boundary, away from the free surface, that any vertical movement of the voids can be identified. From the results of the simulations shown in Figures 27 and 28, the small void on the left side of the grain boundary begins to translate in a stable fashion toward the grain boundary. However, the void sheds more vacancies than it absorbs and therefore disappears, but it is recombined at the grain boundary. Two voids form along the grain boundary, in addition to the wedge shape formed at the triple point of the free surface and the grain boundary. The two voids that accumulated at the grain boundary interface initially begin to translate along the grain boundary toward the wedge at the base of the specimen. This downward translation continues until the void at the grain boundary becomes so large that it begins to elongate in the x direction. This elongation then leads to the pinching of the top defect near the grain boundary, leaving a smaller defect at the grain boundary and releasing a void into the bulk of the windward grain. The bottom void that formed at the grain boundary also increases in size due to vacancy collection; however, the entire void manages to cross the grain boundary, leaving no defect at the grain boundary as the other void had. This is most likely because the lower void had gotten close enough to the wedge being formed that the electric current density in the thin region between the void and the wedge increased and swept along all of the defect. This is the same line of reasoning used in the previous set of simulations to explain the interaction between a free surface and a void just below that surface.
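The reason for tilting the grain boundary can be made concrete with the dot product mentioned above. The sketch below computes the component of the electron flow along the grain boundary tangent for a boundary whose top edge is tilted by a given angle from the vertical toward the electron flow direction (right to left). The coordinate convention (x to the right, y upward) and the function names are assumptions made for illustration; using the conventional current direction instead of the electron flow only changes the sign.

```python
import math


def gb_tangent(tilt_deg):
    """Unit tangent of a grain boundary whose top edge is tilted by tilt_deg
    from the vertical toward -x (the electron flow direction)."""
    t = math.radians(tilt_deg)
    return (-math.sin(t), math.cos(t))


def flow_along_boundary(tilt_deg, electron_flow=(-1.0, 0.0)):
    """Dot product of the boundary tangent with the electron flow direction."""
    tx, ty = gb_tangent(tilt_deg)
    return tx * electron_flow[0] + ty * electron_flow[1]


if __name__ == "__main__":
    for tilt in (0.0, 20.0):
        print(tilt, flow_along_boundary(tilt))
```

For a vertical boundary the dot product vanishes, so the electron wind has no component along the interface. For the 20° tilt used here it equals sin 20°, about 0.34, so defects trapped at the boundary experience a driving component along it.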
As displayed already by the simulations shown in Figures 22, 23, and 24, the relative orientation of the crystal to the line interconnect is such that it seems to prevent the interaction of a surface and a void. Now consider a crystal that is offset by 30°. The propensity, as shown by Figure 25, is for the void to elongate in the vertical direction. The result is that a void near a surface or grain boundary triple point is likely to combine with it instead of being swept away from it. The ability of the void to translate through a grain boundary, instead of being combined with another defect or surface, implies that defects present in the line interconnect will pass through a grain with a zero offset more easily than through a grain with an offset of 30°. This ability to pass the defects along without accumulation results in a longer lifetime of that region of the line interconnect. The wedge shape that forms in both Figures 27 and 28 is the result of the electric current sweeping material to the right side of the grain boundary,
and away from the left side of the grain boundary. The resulting effect is an accumulation of material on the windward side of the grain boundary and a depletion of material on the leeward side. A continuum simulation of this type done by Sun and Suo [4] shows precisely this phenomenon of mass accumulation and depletion.

In summary, the effect a grain boundary has on mass transport when combined with electromigration is an area of concern when the reliability of line interconnects is critical. The computational and experimental investigations into the effects of a grain boundary show that the grain boundary is certainly a potential source of failure of a line interconnect, as are interfaces between substrates, barrier layers, passivation layers, vias, and any other location where an interface between materials exists. The statistical mechanics Monte Carlo method is able to simulate the small-scale effects of a grain boundary during electromigration and compares well to continuum numerical results as well as experimental results. These simulations serve as a proof of concept for introducing additional effects such as mobile grain boundaries and multiple interfaces (polycrystals).

4. Conclusions

A mixed discrete-continuum paradigm for simulating the process of degradation and failure of interconnect lines due to electromigration and stress gradients is proposed. The discrete calculations utilize a Monte Carlo approach, while the continuum calculations use finite differences and finite elements. The bridging device is a mixed Hamiltonian, where short-range interactions are described discretely while long-range interactions are incorporated via continuous fields. This approach allows for following complex geometries and topologies resulting from defect formation, migration, and coalescence, as well as interaction with free surfaces and grain boundaries. The methodology naturally incorporates anisotropy in the diffusivity due to the crystal lattice.

Several application cases are presented, ranging from thermal grooving to trapping and emission of voids at grain boundaries. Where appropriate, the estimates of the present approach are contrasted with other theoretical and numerical estimates and with experimental results.

Acknowledgments

This work was supported by National Science Foundation CMS-96-10536. Alberto Cuitino thanks the Institute of Mathematical Sciences (IMS) at the National University of Singapore (NUS) for providing such a nice opportunity to visit the IMS as well as NUS.
References

1. R. Atkinson. Ph.D. Dissertation, Rutgers University, New Brunswick, New Jersey, January 2003.
2. E. Arzt, O. Kraft, W. Nix, and J. Sanchez. Journal of Applied Physics, 76(3):1563-1571, 1994.
3. A. Averbuch, M. Israeli, et al. Journal of Computational Physics, 167:316-371, 2001.
4. S. B. and S. Z. A finite element method for simulating interface motion. 2. Large shape change due to surface diffusion. Acta Materialia, 45(12):4953-4962, 1997.
5. D. N. Bhate, A. Kumar, and A. F. Bower. Journal of Applied Physics, 87(4):1712-1721, 2000.
6. J. R. Black. IEEE Trans. Electron Devices, ED-16(338), 1969.
7. I. A. Blech and H. Sello. Physics of Failures in Electronics, 5:496-505, 1966.
8. A. Buerke, H. Wendrock, and K. Wetzig. Crystal Research and Technology, 35(6-7):732-730, 2000.
9. S. Chiras and D. R. Clarke. Journal of Applied Physics, 88(11):6302-6312, 2000.
10. J. Cho and C. V. Thompson. Applied Physics Letters, 54(25):2577-2579, 1989.
11. X. Chu, J. A. Prybyla, S. K. Theiss, and M. A. Marcus. Applied Physics Letters, 75(24):3790-3792, 1999.
12. F. M. d'Heurle and I. Ames. Applied Physics Letters, 16(80), 1970.
13. M. L. Dreyer and C. J. Varker. Applied Physics Letters, 60(15):1861-1863, 1992.
14. Q. Duan and Y.-L. Shen. Journal of Applied Physics, 87(8):4039-4041, 2000.
15. J. D. et al. Microelectronics Reliability, 39:1617-1630, 1999.
16. R. G. F. et al. Thin Solid Films, 388:303-314, 2001.
17. A. H. Fischer, A. Abel, M. Lepper, A. E. Zitzelsberger, and A. von Glasow. Microelectronics Reliability, 41:445-453, 2001.
18. Y. Gao, H. Fan, and Z. Xiao. Mechanics of Materials, 32:315-326, 2000.
19. R. J. Gleixner and W. D. Nix. Journal of Applied Physics, 86(4):1932-1944, 1999.
20. M. R. Gungor and D. Maroudas. Applied Physics Letters, 73(26):3848-3850, 1998.
21. P. S. Ho, F. M. d'Heurle, and A. Gangulee. Electro- and thermo-transport in metals and alloys, edited by R. E. Hummel and H. B. Huntington, page 180, AIME, New York, 1977.
22. C. K. Hu, L. Gignac, S. G. Malhotra, and R. Rosenberg. Applied Physics Letters, 78(7):904-906, 2001.
23. J. S. Huang, T. L. Shofner, and J. Zhao. Journal of Applied Physics, 89(4):2130-2133, 2001.
24. M. E. J. and B. G. T. Monte Carlo methods in statistical physics. Oxford, New York, 1999.
25. J.-S. Jiang and B.-S. Chiou. Journal of Materials Science: Materials in Electronics, 12(11):655-659, 2001.
26. T. Kawanoue, H. Kaneko, M. Hasunuma, and M. Miyauchi. Journal of Applied Physics, 74(7):4423-4429, 1993.
27. M. Khenner, A. Averbuch, M. Israeli, M. Nathan, and E. Glickman. Computational Material Science, (20):235-250, 2001.
28. E. Kinsbron. Applied Physics Letters, 36(12):968-970, 1980.
29. T. G. Koetter, H. Wendrock, H. Schuehrer, C. Wenzel, and K. Wetzig. Microelectronics Reliability, 40:1295-1299, 2000.
30. O. Kraft and E. Arzt. Acta Metallurgica, 45(4):1599-1611, 1997.
31. J. T. Lau, J. A. Prybyla, and S. K. Theiss. Applied Physics Letters, 76(2):164-166, 2000.
32. M. Mahadevan and R. M. Bradley. Physical Review B, 59(16):11037-11046, 1999.
33. M. Nathan, E. Glickman, et al. Applied Physics Letters, 77(21):3355-3357, 2000.
34. T. O. Ogurtani and E. E. Oren. Journal of Applied Physics, 90(3):1564-1572, 2001.
35. Y. Park, V. K. Andleigh, and C. V. Thompson. Journal of Applied Physics, 85(7):3546-3555, 1999.
36. J. Proost, K. Maex, and L. Delaey. Journal of Applied Physics, 87(1):99-109, 2000.
37. J. H. Rose. Applied Physics Letters, 61(18):2577-2579, 1992.
38. M. Schimschak and J. Krug. Journal of Applied Physics, 87(2):695-703, 2000.
39. Z. Suo. Motions of microscopic surfaces in materials. Advances in Applied Mechanics, 33:193-294, 1997.
40. Z. Suo and W. Wang. Journal of Applied Physics, 76(6):3410-3421, 1994.
41. W. Wang and Z. Suo. A simulation of electromigration-induced transgranular slits. Journal of Applied Physics, 79(5):2394-2403, 1996.
42. W. Wang and Z. Suo. Shape change of a pore in a stressed solid via surface diffusion motivated by surface and elastic energy variation. Journal of the Mechanics and Physics of Solids, 45(5):709-729, 1997.
43. W. Wang, Z. Suo, and T. H. Hao. Journal of Applied Physics, 79(5):2394-2403, 1996.
44. L. Xia, A. Bower, Z. Suo, and C. Shih. A finite element analysis of the motion and evolution of voids due to strain and electromigration induced surface diffusion. Journal of the Mechanics and Physics of Solids, 45(9):1473-1493, 1997.