DOI: 10.2478/s11533-007-0022-4
Research article
CEJM 5(3) 2007 429–451

Scattering monodromy and the A1 singularity

Larry Bates∗ (1), Richard Cushman† (2)
(1) Department of Mathematics and Statistics, University of Calgary, Calgary, Alberta, T2N 1N4 Canada
(2) Mathematics Institute, University of Utrecht, 3508TA Utrecht, the Netherlands

Received 9 January 2007; accepted 31 May 2007

Abstract: We present the notion of scattering monodromy for a two degree of freedom hyperbolic oscillator and apply this idea to determine the Picard-Lefschetz monodromy of the isolated singular point of a quadratic function of two complex variables.
© Versita Warsaw and Springer-Verlag Berlin Heidelberg. All rights reserved.
Keywords: scattering theory, Hamiltonian mechanics
MSC (2000): 70E40
1 Introduction
We begin by defining the hyperbolic oscillator and then define the scattering angle associated to certain of its motions. Consider the Hamiltonian system on $T^*\mathbb{R}^2 = \mathbb{R}^4$ with coordinates $(x_1, x_2, y_1, y_2)$ and symplectic form $\omega = dx_1 \wedge dx_2 - dy_1 \wedge dy_2$, whose motions are governed by the Hamiltonian vector field $X_v$ associated to the Hamiltonian function
\[
v : \mathbb{R}^4 \to \mathbb{R} : (x, y) \mapsto x_1 y_1 + x_2 y_2. \tag{1}
\]
The Hamiltonian system $(v, \mathbb{R}^4, \omega)$ is called the hyperbolic oscillator. Using the defining relation $X_v \lrcorner\, \omega = dv$, a calculation shows that
\[
X_v(x, y) = y_2 \frac{\partial}{\partial x_1} - y_1 \frac{\partial}{\partial x_2} - x_2 \frac{\partial}{\partial y_1} + x_1 \frac{\partial}{\partial y_2}. \tag{2}
\]
∗ E-mail: [email protected]
† Current address: P.O. Box 209, Livelong, SK, S0M 1J0 Canada
L. Bates et al. / Central European Journal of Mathematics 5(3) 2007 429–451
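The flow of the vector field $X_v$ is
\[
\varphi^v : \mathbb{R} \times \mathbb{R}^4 \to \mathbb{R}^4 : (r, x, y) \mapsto
\begin{pmatrix}
\cosh r & 0 & 0 & \sinh r \\
0 & \cosh r & -\sinh r & 0 \\
0 & -\sinh r & \cosh r & 0 \\
\sinh r & 0 & 0 & \cosh r
\end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}. \tag{3}
\]
Removing the union of the 2-planes
\[
\Pi^{\pm} = \{ (\pm y_2, \mp y_1, y_1, y_2) \in \mathbb{R}^4 \mid (y_1, y_2) \in \mathbb{R}^2 \}, \tag{4}
\]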
which are the stable and unstable manifolds of the hyperbolic equilibrium point $(0, 0)$ of $X_v$, every integral curve of $X_v$ starting in $\mathbb{R}^4_* = \mathbb{R}^4 \setminus (\Pi^+ \cup \Pi^-)$ runs out of every compact subset of $\mathbb{R}^4_*$ containing the starting point in finite positive or negative time. Our main interest in this paper is the asymptotic behavior of the integral curves of $X_v$ in $\mathbb{R}^4_*$. A calculation shows that the Hamiltonian vector field
\[
X_u(x, y) = x_2 \frac{\partial}{\partial x_1} - x_1 \frac{\partial}{\partial x_2} - y_2 \frac{\partial}{\partial y_1} + y_1 \frac{\partial}{\partial y_2}, \tag{5}
\]
corresponding to the Hamiltonian function
\[
u : \mathbb{R}^4 \to \mathbb{R} : (x, y) \mapsto \tfrac{1}{2}(x_1^2 + x_2^2 - y_1^2 - y_2^2), \tag{6}
\]
commutes with the vector field $X_v$, that is, $[X_u, X_v] = 0$. In other words, the Poisson bracket $\{u, v\} = \omega(X_u, X_v)$ of the functions $u$ and $v$ vanishes. Thus the flow
\[
\varphi^u : \mathbb{R} \times \mathbb{R}^4 \to \mathbb{R}^4 : (t, x, y) \mapsto
\begin{pmatrix}
\cos t & \sin t & 0 & 0 \\
-\sin t & \cos t & 0 & 0 \\
0 & 0 & \cos t & \sin t \\
0 & 0 & -\sin t & \cos t
\end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix} \tag{7}
\]
of $X_u$ preserves $\Pi^{\pm}$ and therefore $\mathbb{R}^4_*$. Moreover, the flows $\varphi^v_r$ and $\varphi^u_t$ commute. The upshot of this discussion is that $(v|\mathbb{R}^4_*, u|\mathbb{R}^4_*, \mathbb{R}^4_*, \omega|\mathbb{R}^4_*)$ is a Liouville integrable system with energy-momentum mapping
\[
\mathcal{EM} : \mathbb{R}^4_* \to \mathbb{R}^2_* = \mathbb{R}^2 \setminus \{(0, 0)\} : (x, y) \mapsto (v(x, y), u(x, y)). \tag{8}
\]
Note that $\Pi^+ \cup \Pi^- = \{(x, y) \in \mathbb{R}^4 \mid v(x, y) = 0 = u(x, y)\} = \mathcal{EM}^{-1}(0, 0)$.
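The commutation and invariance statements above can be spot-checked numerically. The following is a minimal sketch, assuming only the matrices (3) and (7) and the functions (1) and (6); the sample point, the flow times and all function names are illustrative choices, not part of the paper.

```python
import math

def flow_v(r, p):
    """Apply the matrix in (3) to p = (x1, x2, y1, y2)."""
    x1, x2, y1, y2 = p
    c, s = math.cosh(r), math.sinh(r)
    return (c * x1 + s * y2, c * x2 - s * y1, -s * x2 + c * y1, s * x1 + c * y2)

def flow_u(t, p):
    """Apply the matrix in (7): the same rotation acts on x and on y."""
    x1, x2, y1, y2 = p
    c, s = math.cos(t), math.sin(t)
    return (c * x1 + s * x2, -s * x1 + c * x2, c * y1 + s * y2, -s * y1 + c * y2)

def v(p):
    x1, x2, y1, y2 = p
    return x1 * y1 + x2 * y2

def u(p):
    x1, x2, y1, y2 = p
    return 0.5 * (x1**2 + x2**2 - y1**2 - y2**2)

p0 = (0.7, -1.3, 0.4, 2.1)   # arbitrary sample point (assumption of this sketch)
r, t = 0.9, 2.3              # arbitrary flow times

a = flow_v(r, flow_u(t, p0))
b = flow_u(t, flow_v(r, p0))
commute_err = max(abs(ai - bi) for ai, bi in zip(a, b))   # the two flows commute
energy_drift = abs(v(a) - v(p0))                          # v is constant on orbits
momentum_drift = abs(u(a) - u(p0))                        # u is constant on orbits

# A point of the plane Pi^+ in (4), of the form (y2, -y1, y1, y2).
q = (2.1, -0.4, 0.4, 2.1)
on_zero_level = (abs(v(q)), abs(u(q)))                    # both vanish on Pi^+
```

All four quantities should vanish up to rounding, in line with the description of $\mathcal{EM}^{-1}(0,0)$ above.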
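We now want to consider the asymptotic behavior of a certain family of integral curves of $X_v$ on $\mathbb{R}^4_*$. In energy-momentum space $\mathbb{R}^2_*$ with coordinates $(h, \ell)$ consider the circle $S$ defined by $h^2 + \ell^2 = \tfrac{1}{4}R^4$ with $R > 0$. The map
\[
\sigma : S \subseteq \mathbb{R}^2_* \to \mathbb{R}^4_* : s = (h, \ell) \mapsto \sigma(s) = \Big( \frac{\ell}{R} + \frac{1}{2}R,\ \frac{h}{R},\ \frac{h}{R},\ -\frac{\ell}{R} + \frac{1}{2}R \Big) \tag{9}
\]
has the property that $\sigma(s) \in \mathcal{EM}^{-1}(s)$ for every $s \in S$, that is, $\sigma$ is a section of the bundle over $S$ formed by restricting $\mathcal{EM}$ (8) to $\mathcal{EM}^{-1}(S)$. For each $s \in S$
\[
\Gamma_{\sigma(s)} : \mathbb{R} \to \mathbb{R}^4_* : r \mapsto \varphi^v_r(\sigma(s)) =
\Big( (\tfrac{\ell}{R} + \tfrac{1}{2}R)\cosh r + (-\tfrac{\ell}{R} + \tfrac{1}{2}R)\sinh r,\ \tfrac{h}{R}(\cosh r - \sinh r),\ \tfrac{h}{R}(\cosh r - \sinh r),\ (\tfrac{\ell}{R} + \tfrac{1}{2}R)\sinh r + (-\tfrac{\ell}{R} + \tfrac{1}{2}R)\cosh r \Big) \tag{10}
\]
is the integral curve of $X_v$ on $\mathbb{R}^4_*$ which starts at $\sigma(s)$. The asymptotic directions of $\Gamma_{\sigma(s)}$ are $\lim_{r \to \pm\infty} \frac{1}{\|\Gamma_{\sigma(s)}(r)\|}\, \Gamma_{\sigma(s)}(r)$, where $\|\ \|$ is the norm associated to the Euclidean inner product $\langle\ ,\ \rangle$ on $\mathbb{R}^4$. Using (10) we see that the asymptotic directions of $\Gamma_{\sigma(s)}$ are given by
\[
\lim_{r \to \pm\infty} D(r)^{-1} \Big( (\tfrac{\ell}{R} + \tfrac{1}{2}R) + (-\tfrac{\ell}{R} + \tfrac{1}{2}R)\tanh r,\ \tfrac{h}{R}(1 - \tanh r),\ \tfrac{h}{R}(1 - \tanh r),\ (\tfrac{\ell}{R} + \tfrac{1}{2}R)\tanh r + (-\tfrac{\ell}{R} + \tfrac{1}{2}R) \Big), \tag{11}
\]
where $D(r) = \sqrt{R^2 + R^2 \tanh^2 r}$. Since $\lim_{r \to \pm\infty} \tanh r = \pm 1$, we conclude that the asymptotic directions of $\Gamma_{\sigma(s)}$ are
\[
\begin{cases}
\frac{1}{\sqrt{2}}(1, 0, 0, 1), & \text{as } r \to +\infty \\[4pt]
\frac{1}{\sqrt{2(h^2 + \ell^2)}}(\ell, h, h, -\ell), & \text{as } r \to -\infty.
\end{cases} \tag{12}
\]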
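The limits in (12) can be compared with a direct numerical evaluation of the curve (10). The sketch below is illustrative only: the sample value of $(h, \ell)$, the cutoff $r = \pm 18$ standing in for $r \to \pm\infty$, and the helper names are assumptions made here.

```python
import math

def sigma(h, ell):
    """The section (9); R is recovered from h^2 + ell^2 = R^4 / 4."""
    R = math.sqrt(2.0 * math.hypot(h, ell))
    return (ell / R + R / 2, h / R, h / R, -ell / R + R / 2)

def flow_v(r, p):
    """Apply the matrix in (3) to p = (x1, x2, y1, y2)."""
    x1, x2, y1, y2 = p
    c, s = math.cosh(r), math.sinh(r)
    return (c * x1 + s * y2, c * x2 - s * y1, -s * x2 + c * y1, s * x1 + c * y2)

def unit(p):
    n = math.sqrt(sum(c * c for c in p))
    return tuple(c / n for c in p)

h, ell = 0.6, -0.8                    # sample point of S (here R^2 = 2)
big = 18.0                            # stands in for r -> infinity

out_dir = unit(flow_v(big, sigma(h, ell)))
in_dir = unit(flow_v(-big, sigma(h, ell)))

expected_out = (1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2))
scale = math.sqrt(2 * (h * h + ell * ell))
expected_in = (ell / scale, h / scale, h / scale, -ell / scale)

err_out = max(abs(a - b) for a, b in zip(out_dir, expected_out))
err_in = max(abs(a - b) for a, b in zip(in_dir, expected_in))
dot = sum(a * b for a, b in zip(out_dir, in_dir))   # the two directions are orthogonal
```

The vanishing inner product reflects the fact that the angle between the two asymptotic directions in phase space is the same for every $s$.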
Let
\[
\pi : T^*\mathbb{R}^2 \to \mathbb{R}^2 : (x, y) \mapsto x \tag{13}
\]
be the bundle projection map which assigns to a point in phase space the corresponding point in configuration space. For $s = (h, \ell) \in S$ the image of $\Gamma_{\sigma(s)}$ (10) under $\pi$ is the curve
\[
\gamma_s : \mathbb{R} \to \mathbb{R}^2 : r \mapsto \Big( (\tfrac{\ell}{R} + \tfrac{1}{2}R)\cosh r + (-\tfrac{\ell}{R} + \tfrac{1}{2}R)\sinh r,\ \tfrac{h}{R}(\cosh r - \sinh r) \Big). \tag{14}
\]
The image of $\gamma_s$ is the branch $H_s$ of the hyperbola in configuration space $\mathbb{R}^2$ (with coordinates $(x_1, x_2)$) given by
\[
x_2(h x_1 - \ell x_2) = \tfrac{1}{2}h^2, \qquad
\begin{cases}
x_2 \ge 0, & \text{if } h \ge 0 \\
x_2 \le 0, & \text{if } h \le 0.
\end{cases} \tag{15}
\]
The asymptotes of $H_s$ have directions $(1, 0)$ and $\frac{1}{\sqrt{h^2 + \ell^2}}(\ell, h)$. The angle $\vartheta_s = \tan^{-1}\frac{h}{\ell}$ between these directions is called the scattering angle of $\gamma_s$. More precisely, $\vartheta_s$ is the counterclockwise rotation which carries the outgoing asymptote of $H_s$ with direction $(1, 0)$ onto the incoming asymptote with direction $\frac{1}{\sqrt{h^2 + \ell^2}}(\ell, h)$.∗

[Figure 1: nine numbered diagrams of the hyperbola branch $H_s$, placed at the corresponding points of the circle $S$ in the $(h, \ell)$-plane, with $\vartheta_s$ running from $\vartheta_s \sim 2\pi$ (diagram 1) to $\vartheta_s \sim 0$ (diagram 9).]
Fig. 1 The scattering angle $\vartheta_s$ as $s$ traverses the circle $S$ in the energy-momentum plane counterclockwise from $s^*_-$ to $s^*_+$. The variation in $\vartheta_s$ is $2\pi$.

As $s = (h, \ell)$ traverses the circle $S$ clockwise starting and ending at $s^* = (0, -\tfrac{1}{2}R^2)$, the scattering angle of $\gamma_s$ increases by $2\pi$, see figure 1. The variation of $2\pi$ in the scattering angle is called the scattering monodromy of the family $s \mapsto \gamma_s$.

The problem we address is to give a geometric interpretation of scattering monodromy in phase space. It is not the variation of the angle between the asymptotic directions of the curve $\Gamma_{\sigma(s)}$ as $s$ traverses the circle $S$, because this angle is $\tfrac{1}{2}\pi$ for every $s$. Hence its variation vanishes. Nor is it related to Hamiltonian monodromy [2], that is, the variation of the period lattice, because we show that the hyperbolic oscillator on $\mathbb{R}^4_*$ has global action-angle coordinates. In this paper we show that scattering monodromy is the variation in the asymptotic twist of $\Gamma_{\sigma(s)}$ as $s$ traverses $S$. In addition, we show that scattering monodromy is a topological invariant, because it gives rise to a nonidentity map on a certain relative first homology group.

∗ The scattering angle in physics is defined as the angle as measured from the incoming asymptote, thought of as a ray from the origin, to the outgoing asymptote, see Synge [7, figure 18, p.73]. The problem with this definition is that this angle is not continuous in the parameters $h$ and $\ell$ when $\ell < 0$ and $h$ is near 0, because the sense of the rotation measuring the scattering angle changes discontinuously as $h$ passes through 0. See diagrams 9 and 1 in figure 1.
Here is an outline of the contents of this paper. In section 2 we show that the two degree of freedom integrable Hamiltonian system $(v|\mathbb{R}^4_*, u|\mathbb{R}^4_*, \mathbb{R}^4_*, \omega|\mathbb{R}^4_*)$ has global action-angle coordinates. Even though action-angle coordinates define an affine structure on each nonzero level set of the energy-momentum map, the asymptotic twist of the curve $\Gamma_{\sigma(s)}$ cannot be seen in these coordinates, because it uses the projection map from phase space to configuration space, while action-angle coordinates, being given by a symplectic change of variables, ignore the distinction between positions and momenta. We need something more. In section 3 we introduce a connection on the principal bundle $\mathcal{EM}^{-1}(S^*)$, where $S^* = S \setminus \{s^*\}$, to measure the asymptotic twist of $\Gamma_{\sigma(s)}$. As $s$ traverses $S^*$ starting and ending at $s^*$ in the limit, the variation of the asymptotic twist of $\Gamma_{\sigma(s)}$ is $2\pi$. In section 4 we show that this variation does not depend on the choice of connection used to measure the asymptotic twist and is in fact a topological invariant. In the last section we relate the scattering monodromy to the Picard-Lefschetz monodromy of the A1 singularity associated to the critical value 0 of the holomorphic function
\[
F : \mathbb{C}^2 \to \mathbb{C} : (z_1, z_2) \mapsto \tfrac{1}{2}(z_1^2 + z_2^2), \tag{16}
\]
see [4, §3.A, pp. 36–41]. The holomorphic Hamiltonian vector field $X_F$ on $(\mathbb{C}^2, \Omega = dz_1 \wedge dz_2)$ associated to $F$ gives rise to our original integrable Hamiltonian system $(v, u, \mathbb{R}^4, \omega)$.
2 Action-angle coordinates
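In this section we construct global action-angle coordinates for the Liouville integrable system $(v|\mathbb{R}^4_*, u|\mathbb{R}^4_*, \mathbb{R}^4_*, \omega|\mathbb{R}^4_*)$. Consider the $S^1 \times \mathbb{R}$-action
\[
\Phi : (S^1 \times \mathbb{R}) \times \mathbb{R}^4_* \to \mathbb{R}^4_* : ((t, r), (x, y)) \mapsto (\varphi^v_r \circ \varphi^u_t)(x, y), \tag{17}
\]
where $\varphi^v_r$ (3) is the flow of the vector field $X_v$ (2) and $\varphi^u_t$ (7) is the flow of the vector field $X_u$ (5). Note that these flows commute. From corollary 5.1.2 of §5.1 it follows that the $S^1 \times \mathbb{R}$-action $\Phi$ on $\mathbb{R}^4_*$ is free and proper. Its orbit map is given by the energy-momentum map
\[
\mathcal{EM} : \mathbb{R}^4_* \to \mathbb{R}^2_* = \mathbb{R}^2 \setminus \{(0, 0)\} : (x, y) \mapsto (v(x, y), u(x, y)) \tag{18}
\]
of the integrable system $(v|\mathbb{R}^4_*, u|\mathbb{R}^4_*, \mathbb{R}^4_*, \omega|\mathbb{R}^4_*)$, see corollary 5.1.4. By corollary 5.1.6 the $S^1 \times \mathbb{R}$ principal bundle exhibited by the map $\mathcal{EM}$ (18) is real analytically trivial. With this preparation we can prove

Proposition 2.1. The diffeomorphism
\[
\Psi : \big((S^1 \times \mathbb{R}) \times \mathbb{R}^2_*,\ \widetilde{\Omega} = dt \wedge dh + dr \wedge d\ell\big) \to (\mathbb{R}^4_*, \omega|\mathbb{R}^4_*) : ((t, r), (h, \ell)) \mapsto (\varphi^v_r \circ \varphi^u_t)(\sigma(h, \ell)) \tag{19}
\]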
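defines global action-angle coordinates for the Liouville integrable system $(v|\mathbb{R}^4_*, u|\mathbb{R}^4_*, \mathbb{R}^4_*, \omega|\mathbb{R}^4_*)$. Here $\sigma : \mathbb{R}^2_* \to \mathbb{R}^4_*$ is the section of the bundle $\mathcal{EM} : \mathbb{R}^4_* \to \mathbb{R}^2_*$ given by equation (9).

Proof. We calculate the pull back of the 2-form $\omega|\mathbb{R}^4_*$ by the map $\Psi$ by evaluating $\omega$ on the basis $\{\frac{\partial\Psi}{\partial t}, \frac{\partial\Psi}{\partial r}, \frac{\partial\Psi}{\partial h}, \frac{\partial\Psi}{\partial \ell}\}$. Since $\varphi^u_t$ and $\varphi^v_r$ are commuting linear symplectic maps of $(T_{(x,y)}\mathbb{R}^4_*, \omega(x, y)) = (\mathbb{R}^4_*, \omega|\mathbb{R}^4_*)$ into itself, we obtain $\varphi^v_r(X_u(x, y)) = X_u(\varphi^v_r(x, y))$ and similarly $\varphi^u_t(X_v(x, y)) = X_v(\varphi^u_t(x, y))$. Also
\[
\frac{\partial \varphi^u_t}{\partial t}(x, y) = \frac{d}{dr}\Big|_{r=0} \varphi^u_{t+r}(x, y) = D\varphi^u_t(x, y)\, \frac{d}{dr}\Big|_{r=0} \varphi^u_r(x, y) = \varphi^u_t(X_u(x, y)),
\]
and similarly
\[
\frac{\partial \varphi^v_r}{\partial r}(x, y) = \varphi^v_r(X_v(x, y)).
\]
Therefore
\begin{align*}
\frac{\partial\Psi}{\partial t} &= \frac{\partial \varphi^u_t}{\partial t}(\varphi^v_r(\sigma(h, \ell))) = \varphi^u_t X_u(\varphi^v_r(\sigma(h, \ell))) = (\varphi^u_t \circ \varphi^v_r)(X_u(\sigma(h, \ell))) \\
\frac{\partial\Psi}{\partial r} &= \frac{\partial \varphi^v_r}{\partial r}(\varphi^u_t(\sigma(h, \ell))) = \varphi^v_r X_v(\varphi^u_t(\sigma(h, \ell))) = (\varphi^u_t \circ \varphi^v_r)(X_v(\sigma(h, \ell))) \\
\frac{\partial\Psi}{\partial h} &= D(\varphi^u_t \circ \varphi^v_r)(x, y)\, \frac{\partial\sigma}{\partial h} = (\varphi^u_t \circ \varphi^v_r)\, \frac{\partial\sigma}{\partial h} \\
\frac{\partial\Psi}{\partial \ell} &= (\varphi^u_t \circ \varphi^v_r)\, \frac{\partial\sigma}{\partial \ell}.
\end{align*}
Consequently it suffices to evaluate $\omega$ on the basis $\{X_u(\sigma(h, \ell)), X_v(\sigma(h, \ell)), \frac{\partial\sigma}{\partial h}, \frac{\partial\sigma}{\partial \ell}\}$. Let $\langle\ ,\ \rangle$ be the Euclidean inner product on $\mathbb{R}^4$. Then
\[
\omega(X_u(\sigma(h, \ell)), X_v(\sigma(h, \ell))) =
\left\langle
\begin{pmatrix} 0 & -1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -1 & 0 \end{pmatrix}
\begin{pmatrix} \frac{h}{R} \\ -(\frac{\ell}{R} + \frac{1}{2}R) \\ \frac{\ell}{R} - \frac{1}{2}R \\ \frac{h}{R} \end{pmatrix},
\begin{pmatrix} -\frac{\ell}{R} + \frac{1}{2}R \\ -\frac{h}{R} \\ -\frac{h}{R} \\ \frac{\ell}{R} + \frac{1}{2}R \end{pmatrix}
\right\rangle = 0,
\]
using $h^2 + \ell^2 = \tfrac{1}{4}R^4$. Similar calculations give $\omega(X_u(\sigma(h, \ell)), \frac{\partial\sigma}{\partial h}) = 1$, $\omega(X_u(\sigma(h, \ell)), \frac{\partial\sigma}{\partial \ell}) = 0$, $\omega(X_v(\sigma(h, \ell)), \frac{\partial\sigma}{\partial h}) = 0$, $\omega(X_v(\sigma(h, \ell)), \frac{\partial\sigma}{\partial \ell}) = 1$, and $\omega(\frac{\partial\sigma}{\partial h}, \frac{\partial\sigma}{\partial \ell}) = 0$. Therefore the matrix of $\widetilde{\Omega} = \Psi^*(\omega|\mathbb{R}^4_*)$ is
\[
\begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ -1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \end{pmatrix},
\]
that is, $\widetilde{\Omega} = dt \wedge dh + dr \wedge d\ell$.

Corollary 2.2. The Hamiltonian monodromy of the Liouville integrable system $(v|\mathbb{R}^4_*, u|\mathbb{R}^4_*, \mathbb{R}^4_*, \omega|\mathbb{R}^4_*)$ is trivial.

Proof. This follows immediately from the existence of global action-angle coordinates, see Duistermaat [2].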
3 Scattering phase and asymptotic twist
In this section we define an $S^1$-principal bundle with a principal connection. Using this connection we can define the scattering phase of each member of a family of integral curves of the vector field $X_v$, which lies in the bundle. The scattering phase of each such integral curve measures its asymptotic twist.

Recall that $(h, \ell)$ are coordinates on $\mathbb{R}^2_*$ and that $S$ is the circle $h^2 + \ell^2 = \tfrac{1}{4}R^4$. Because $\mathcal{EM}(\mathbb{R}^4_*) = \mathbb{R}^2_*$, we get $\mathcal{EM}^{-1}(S) \subseteq \mathbb{R}^4_*$. Therefore the action-angle coordinate diffeomorphism $\Psi$ (19) restricts to the diffeomorphism $\Psi_S : (S^1 \times \mathbb{R}) \times S \to \mathcal{EM}^{-1}(S) \subseteq \mathbb{R}^4_*$. The map $\Psi_S$ intertwines the $S^1$-action
\[
S^1 \times ((S^1 \times \mathbb{R}) \times S) \to (S^1 \times \mathbb{R}) \times S : (t', ((t, r), (h, \ell))) \mapsto ((t' + t, r), (h, \ell)),
\]
with the $S^1$-action on $\mathcal{EM}^{-1}(S)$ given by restricting the flow $\varphi^u_t$ of the vector field $X_u$ on $\mathbb{R}^4_*$ to $\mathcal{EM}^{-1}(S)$. Because $\mathcal{EM}^{-1}(S)$ is the disjoint union of $\mathcal{EM}^{-1}(s)$ for $s \in S$ and for each $s \in S$ the level set $\mathcal{EM}^{-1}(s) \subseteq \mathbb{R}^4_*$ is invariant under the flow $\varphi^u_t$, it follows that $\mathcal{EM}^{-1}(S)$ is invariant. Since the $S^1$ orbit space $((S^1 \times \mathbb{R}) \times S)/S^1$ is diffeomorphic to $S \times \mathbb{R}$, the orbit space $\mathcal{EM}^{-1}(S)/S^1$ is diffeomorphic to $S \times \mathbb{R}$. Because the $S^1$-action $\varphi^u_t$ on $\mathbb{R}^4_*$ is free and proper, it is free and proper on $\mathcal{EM}^{-1}(S)$. Therefore the orbit map
\[
\widetilde{\rho} : \mathcal{EM}^{-1}(S) \to \mathcal{EM}^{-1}(S)/S^1 = S \times \mathbb{R} : p = \Psi_S((t, r), (h, \ell)) = (\varphi^v_r \circ \varphi^u_t)(\sigma(h, \ell)) \mapsto (\mathcal{EM}(p), r) = ((h, \ell), r) \tag{20}
\]
defines an $S^1$-principal bundle and
\[
\widetilde{\Sigma} : S \times \mathbb{R} \to \mathcal{EM}^{-1}(S) : (s, r) \mapsto \Psi_S((0, r), s) = \varphi^v_r(\sigma(s)) \tag{21}
\]
is a section. Therefore the principal bundle $\widetilde{\rho}$ (20) is trivial. Let $S^* = S \setminus \{s^*\}$, where $s^* = (0, -\tfrac{1}{2}R^2)$. Since $\mathcal{EM}^{-1}(S^*)$ is the disjoint union of $\mathcal{EM}^{-1}(s)$ for $s \in S^*$ and each fiber $\mathcal{EM}^{-1}(s)$ is invariant under the flow of $X_u$ on $\mathbb{R}^4_*$, it follows that $\mathcal{EM}^{-1}(S^*)$ is $S^1$-invariant. Therefore the $S^1$-principal bundle
\[
\rho = \widetilde{\rho}\,|\,\mathcal{EM}^{-1}(S^*) : \mathcal{EM}^{-1}(S^*) \to \mathcal{EM}^{-1}(S^*)/S^1 = S^* \times \mathbb{R} \tag{22}
\]
is trivial.

We now define a connection on the principal bundle $\rho$ (22). Consider the 1-form
\[
d\theta = \frac{x_2\, dx_1 - x_1\, dx_2}{x_1^2 + x_2^2} \tag{23}
\]
on $\mathbb{R}^4_{**} = \mathbb{R}^4_* \setminus (\{x = 0\} \cap \mathbb{R}^4_*)$. The kernel of $d\theta$ defines the horizontal distribution of the connection, while the vector field $X_u$ spans its vertical distribution. From the fact that
\[
\mathcal{EM}(\{x = 0\} \cap \mathbb{R}^4_*) = \{(0, -\tfrac{1}{2}(y_1^2 + y_2^2)) \in \mathbb{R}^2_* \mid y \in \mathbb{R}^2_*\} = \{0\} \times \mathbb{R}_{<0}
\]
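it follows that $\mathcal{EM}(\{x = 0\} \cap \mathbb{R}^4_*) \cap S^* = \emptyset$. Therefore $\mathcal{EM}^{-1}(S^*)$ is contained in $\mathbb{R}^4_{**}$. So we may pull back the 1-form $d\theta$ (23) by the inclusion map $\iota : \mathcal{EM}^{-1}(S^*) \to \mathbb{R}^4_{**}$ to a 1-form $\iota^* d\theta$ on $\mathcal{EM}^{-1}(S^*)$. Since $X_u \lrcorner\, \iota^* d\theta = 1$ on $\mathcal{EM}^{-1}(S^*)$, the 1-form $\iota^* d\theta$ is invariant under the flow of $X_u$ on $\mathcal{EM}^{-1}(S^*)$. Thus $\iota^* d\theta$ defines a connection on the $S^1$-principal bundle $\rho : \mathcal{EM}^{-1}(S^*) \to S^* \times \mathbb{R}$ (22). In what follows we will denote this 1-form by $d\theta$.

Now $\widetilde{S} \subseteq \mathcal{EM}^{-1}(S)$ is the image of the circle $S$ under the diffeomorphism $\sigma = \widetilde{\Sigma} \circ \iota_S$, see (9). Here $\iota_S : S \to \{0\} \times S : s \mapsto (0, s)$ and $\widetilde{\Sigma}$ is the map given by (21). For each $s \in S$ let $\Gamma_{\sigma(s)}$ be the integral curve of $X_v$ on $\mathcal{EM}^{-1}(S)$, which starts at $\sigma(s) \in \widetilde{S}$. For each $s \in S^*$ the infinitesimal elevation† of $\Gamma_{\sigma(s)}$ with respect to the connection $d\theta$ is
\[
\frac{d\theta}{dt} = (X_v \lrcorner\, d\theta)(\Gamma_{\sigma(s)}(t)). \tag{24}
\]
The 1-form $d\theta$ on $\mathbb{R}^4_{**}$ is well defined and smooth, even though $\theta = \tan^{-1}\frac{x_1}{x_2}$ is multivalued. So equation (24) is well defined. From the expression for $X_v$ given in (2) and the definition (23) of the 1-form $d\theta$ a calculation shows that equation (24) may be written as
\[
\frac{d\theta}{dt} = \frac{x_1(t) y_1(t) + x_2(t) y_2(t)}{x_1^2(t) + x_2^2(t)}, \tag{25}
\]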
where $\Gamma_{\sigma(s)}(t) = (x(t), y(t))$.

To integrate (25) and give a proof of theorem 3.1, we apply the technique of reduction of symmetry in Hamiltonian systems, which involves the use of invariant theory, see [1, Appendix 2]. The flow of the vector field $X_u$ on $(\mathbb{R}^4_*, \omega|\mathbb{R}^4_*)$ is periodic of period $2\pi$ and thus defines an $S^1$-action on $\mathbb{R}^4_*$. The algebra of $S^1$-invariant polynomials on $\mathbb{R}^4_*$ is generated by
\[
\begin{aligned}
\pi_1 &= x_1 y_2 - x_2 y_1, & \pi_3 &= \tfrac{1}{2}(x_1^2 + x_2^2 + y_1^2 + y_2^2), \\
\pi_2 &= x_1 y_1 + x_2 y_2, & \pi_4 &= \tfrac{1}{2}(x_1^2 + x_2^2 - y_1^2 - y_2^2).
\end{aligned} \tag{26}
\]
These polynomials satisfy
\[
\pi_2^2 + \pi_4^2 = \pi_3^2 - \pi_1^2, \tag{27}
\]
which comes from the identity $\pi_1^2 + \pi_2^2 = (x_1^2 + x_2^2)(y_1^2 + y_2^2) = \pi_3^2 - \pi_4^2$. Together with the inequalities
\[
\pi_2^2 + \pi_4^2 > 0 \quad \text{and} \quad \pi_3 \ge 0 \tag{28}
\]
equation (27) defines a semialgebraic variety $V$, which is the image of the orbit map
\[
\xi : \mathbb{R}^4_* \to \mathbb{R}^4 : (x, y) \mapsto (\pi_1, \pi_2, \pi_3, \pi_4) \tag{29}
\]
of the $S^1$-action on $\mathbb{R}^4_*$. In other words, $V$ is the space of orbits of this $S^1$-action.

† This terminology is due to Gijs Tuynman.

For $s = (h, \ell) \in S$, the submanifold $\mathcal{EM}^{-1}(s)$ of $\mathbb{R}^4_*$ is defined by
\[
\begin{aligned}
x_1 y_1 + x_2 y_2 &= h \\
\tfrac{1}{2}(x_1^2 + x_2^2 - y_1^2 - y_2^2) &= \ell.
\end{aligned} \tag{30}
\]
Thus the orbit space $P_s = \mathcal{EM}^{-1}(s)/S^1$ is the semialgebraic variety defined by (27), (28),
\[
\pi_2 = h, \quad \text{and} \quad \pi_4 = \ell. \tag{31}
\]
In other words, $P_s$ is defined by
\[
\pi_3^2 - \pi_1^2 = \pi_2^2 + \pi_4^2 = h^2 + \ell^2, \qquad \pi_3 \ge \sqrt{h^2 + \ell^2} = \tfrac{1}{2}R^2 > 0 \tag{32}
\]
and therefore is diffeomorphic to $\{s\} \times \mathbb{R}$. The image under the orbit map $\xi$ of the integral curve $t \mapsto \Gamma_{\sigma(s)}(t)$ (10) of the vector field $X_v$ starting at $\sigma(s)$ is the curve
\[
C_s : \mathbb{R} \to P_s \subseteq \mathbb{R}^2 : r \mapsto (\pi_3(r), \pi_1(r)) = R^2 \big( \tfrac{1}{2}(\cosh^2 r + \sinh^2 r),\ \cosh r \sinh r \big). \tag{33}
\]
$C_s$ traces out the branch $P_s$ of the hyperbola defined by (32).

The $S^1$-principal bundle $\widetilde{\rho} : \mathcal{EM}^{-1}(S) \to S \times \mathbb{R}$ (20) is given by restricting the orbit map $\xi$ (29) to
\[
\mathcal{EM}^{-1}(S) = \{(x, y) \in \mathbb{R}^4_* \mid (x_1 y_1 + x_2 y_2)^2 + \tfrac{1}{4}(x_1^2 + x_2^2 - y_1^2 - y_2^2)^2 = \tfrac{1}{4}R^4,\ R > 0\}.
\]
To see this first note
\[
\xi(\mathcal{EM}^{-1}(S)) = \{(\pi_1, \pi_2, \pi_3, \pi_4) \in V \mid \pi_2^2 + \pi_4^2 = \tfrac{1}{4}R^4,\ R > 0\}.
\]
Using the defining relations (27) and (28) for $V$ we see that
\[
\xi(\mathcal{EM}^{-1}(S)) = \{(\pi_2, \pi_4) \in \mathbb{R}^2 \mid \pi_2^2 + \pi_4^2 = \tfrac{1}{4}R^4 > 0\} \times \{(\pi_3, \pi_1) \in \mathbb{R}^2 \mid \pi_3^2 - \pi_1^2 = \tfrac{1}{4}R^4,\ \pi_3 \ge \tfrac{1}{2}R^2 > 0\} = S \times \mathbb{R} \subseteq \mathbb{R}^2 \times \mathbb{R}^2,
\]
which proves the assertion. Since the image of $\sigma(S)$, see (9), under the bundle projection map $\widetilde{\rho} = \xi|\mathcal{EM}^{-1}(S)$ is $S \times \{(\tfrac{1}{2}R^2, 0)\}$, the cylinder $S \times \mathbb{R}$ is a hyperboloid of revolution in $\mathbb{R}^3$, whose waist is $S$ and whose fiber over $s \in S$ is $P_s$ (32). It is geometrically natural to divide the cylinder $S \times \mathbb{R}$ into an upper and a lower half each bounded by $S$. This geometric splitting will be used later on.

We now determine the Poisson structure on $\mathbb{R}^4$ with coordinates $(\pi_1, \pi_2, \pi_3, \pi_4)$. For $f, g \in C^\infty(\mathbb{R}^4_*)$ the Poisson bracket is
\[
\{f, g\} = \frac{\partial f}{\partial x_1}\frac{\partial g}{\partial x_2} - \frac{\partial f}{\partial x_2}\frac{\partial g}{\partial x_1} + \frac{\partial f}{\partial y_2}\frac{\partial g}{\partial y_1} - \frac{\partial f}{\partial y_1}\frac{\partial g}{\partial y_2}.
\]
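A calculation gives
\[
\begin{aligned}
\{\pi_1, \pi_2\} &= 2\pi_3, & \{\pi_1, \pi_3\} &= 2\pi_2, & \{\pi_1, \pi_4\} &= 0, \\
\{\pi_2, \pi_3\} &= -2\pi_1, & \{\pi_2, \pi_4\} &= 0, & \{\pi_3, \pi_4\} &= 0.
\end{aligned}
\]
This determines the structure matrix of the Poisson bracket $\{\ ,\ \}$ on $C^\infty(\mathbb{R}^4)$. The $S^1$-invariant Hamiltonian vector field $X_v|\mathbb{R}^4_*$ induces a Poisson vector field $X_{\pi_2}$ on $(\mathbb{R}^4, \{\ ,\ \})$, whose integral curves satisfy
\[
\begin{aligned}
\dot{\pi}_1 &= \{\pi_1, \pi_2\} = 2\pi_3 \\
\dot{\pi}_2 &= \{\pi_2, \pi_2\} = 0 \\
\dot{\pi}_3 &= \{\pi_3, \pi_2\} = 2\pi_1 \\
\dot{\pi}_4 &= \{\pi_4, \pi_2\} = 0.
\end{aligned} \tag{34}
\]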
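The relation (27) and the reduced equations (34) lend themselves to a quick numerical sanity check. The sketch below evaluates the invariants (26) at an arbitrarily chosen sample point and differentiates them along the flow (3) by central differences; the step size and the sample values are assumptions of the sketch, not taken from the paper.

```python
import math

def invariants(p):
    """The generators (26) evaluated at p = (x1, x2, y1, y2)."""
    x1, x2, y1, y2 = p
    pi1 = x1 * y2 - x2 * y1
    pi2 = x1 * y1 + x2 * y2
    pi3 = 0.5 * (x1**2 + x2**2 + y1**2 + y2**2)
    pi4 = 0.5 * (x1**2 + x2**2 - y1**2 - y2**2)
    return pi1, pi2, pi3, pi4

def flow_v(r, p):
    """Apply the matrix in (3) to p."""
    x1, x2, y1, y2 = p
    c, s = math.cosh(r), math.sinh(r)
    return (c * x1 + s * y2, c * x2 - s * y1, -s * x2 + c * y1, s * x1 + c * y2)

p0 = (0.7, -1.3, 0.4, 2.1)            # arbitrary sample point
pi1, pi2, pi3, pi4 = invariants(p0)
relation_27 = (pi2**2 + pi4**2) - (pi3**2 - pi1**2)   # should vanish

def rates(r, eps=1e-5):
    """Central-difference derivatives of the invariants along the flow of X_v."""
    before = invariants(flow_v(r - eps, p0))
    after = invariants(flow_v(r + eps, p0))
    return tuple((a - b) / (2 * eps) for a, b in zip(after, before))

r = 0.3
d1, d2, d3, d4 = rates(r)
q1, q2, q3, q4 = invariants(flow_v(r, p0))
# Residuals of the four equations in (34); each should be close to zero.
residuals = (d1 - 2 * q3, d2, d3 - 2 * q1, d4)
```

Finite differences only confirm (34) up to discretization error, so the residuals are small rather than exactly zero.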
Using the invariants (26) and equation (30), we can rewrite (25) as
\[
\dot{\theta} = \frac{h}{\pi_3(t) + \ell}. \tag{35}
\]
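As an illustration, the passage from (25) to (35) can be checked along a sample trajectory. The sketch below compares three quantities: the right hand side of (25) evaluated on the curve (10), the right hand side of (35) with $\pi_3(r)$ taken from (33), and a finite-difference derivative of $\theta = \tan^{-1}(x_1/x_2)$ computed with `math.atan2`. The sample value of $(h, \ell)$, the sample times and the step size are assumptions made here.

```python
import math

def sigma(h, ell):
    """The section (9); R is recovered from h^2 + ell^2 = R^4 / 4."""
    R = math.sqrt(2.0 * math.hypot(h, ell))
    return (ell / R + R / 2, h / R, h / R, -ell / R + R / 2)

def flow_v(r, p):
    """Apply the matrix in (3) to p = (x1, x2, y1, y2)."""
    x1, x2, y1, y2 = p
    c, s = math.cosh(r), math.sinh(r)
    return (c * x1 + s * y2, c * x2 - s * y1, -s * x2 + c * y1, s * x1 + c * y2)

h, ell = 0.6, -0.8
R = math.sqrt(2.0 * math.hypot(h, ell))

def theta_dot_25(r):
    x1, x2, y1, y2 = flow_v(r, sigma(h, ell))
    return (x1 * y1 + x2 * y2) / (x1**2 + x2**2)

def theta_dot_35(r):
    pi3 = 0.5 * R**2 * (math.cosh(r)**2 + math.sinh(r)**2)   # from (33)
    return h / (pi3 + ell)

def theta_dot_numeric(r, eps=1e-6):
    def theta(rr):
        x1, x2, _, _ = flow_v(rr, sigma(h, ell))
        return math.atan2(x1, x2)          # one branch of tan^{-1}(x1 / x2)
    d = theta(r + eps) - theta(r - eps)
    d = math.atan2(math.sin(d), math.cos(d))   # guard against a branch jump
    return d / (2 * eps)

samples = [-1.0, -0.2, 0.0, 0.5, 1.5]
gap_formula = max(abs(theta_dot_25(r) - theta_dot_35(r)) for r in samples)
gap_numeric = max(abs(theta_dot_25(r) - theta_dot_numeric(r)) for r in samples)
```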
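Here $s = (h, \ell) \in S^*$ and $C_s : \mathbb{R} \to P_s \subseteq \mathbb{R}^4 : r \mapsto (\pi_3(r), \pi_1(r))$ (33) is an integral curve of $X_{\pi_2}$ on $S \times \mathbb{R}$. Integrating (35) gives
\begin{align*}
\Theta(s) = \Theta(h, \ell) &= \int_{\Gamma_{\sigma(s)}} d\theta = \int_{-\infty}^{\infty} \dot{\theta}\, dt = \int_{-\infty}^{\infty} \frac{h}{\pi_3(t) + \ell}\, dt \\
&= \int_{C_s} \frac{h\, d\pi_3}{(\pi_3 + \ell)\, \dot{\pi}_3} = \int_{C_s} \frac{h\, d\pi_3}{(\pi_3 + \ell)\, 2\pi_1}, \quad \text{using the third equation in (34)} \\
&= \int_{\sqrt{h^2 + \ell^2}}^{\infty} \frac{h\, d\pi_3}{(\pi_3 + \ell)\sqrt{\pi_3^2 - (h^2 + \ell^2)}}, \quad \text{using (32)} \\
&= \tan^{-1}\frac{h}{\ell}. \tag{36}
\end{align*}
To see that equation (36) holds we compute
\begin{align*}
\frac{h\, d\pi_3}{(\pi_3 + \ell)\sqrt{\pi_3^2 - (h^2 + \ell^2)}} &= -h\, \frac{u^{-2}\, du}{u^{-1}\sqrt{(u^{-1} - \ell)^2 - h^2 - \ell^2}}, \quad \text{where } u^{-1} = \pi_3 + \ell \\
&= -h^2\, \frac{du}{\sqrt{\ell^2 + h^2 - (\ell + h^2 u)^2}} = -\frac{dv}{\sqrt{\alpha^2 - v^2}}, \quad \text{where } v = \ell + h^2 u \text{ and } \alpha = \sqrt{h^2 + \ell^2} \\
&= -d \sin^{-1}\frac{v}{\alpha} = -d \sin^{-1}\Big( \frac{1}{\sqrt{h^2 + \ell^2}} \Big( \ell + \frac{h^2}{\pi_3 + \ell} \Big) \Big).
\end{align*}
Therefore
\[
\int_{\sqrt{h^2 + \ell^2}}^{\infty} \frac{h\, d\pi_3}{(\pi_3 + \ell)\sqrt{\pi_3^2 - (h^2 + \ell^2)}} = \sin^{-1} 1 - \sin^{-1}\frac{\ell}{\sqrt{h^2 + \ell^2}} = \tan^{-1}\frac{h}{\ell}.
\]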
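The closed form for the integral of $\dot\theta$ can be compared with a direct quadrature in the time variable. The following sketch uses $\pi_3(t) = \sqrt{h^2+\ell^2}\,\cosh 2t$, which follows from (33) and $\tfrac12 R^2 = \sqrt{h^2+\ell^2}$, and a composite Simpson rule on a truncated interval. The truncation bound, the number of steps and the sample points (all with $h > 0$) are assumptions of the sketch, as is the use of the principal value `math.atan2(h, ell)` as the branch of $\tan^{-1}(h/\ell)$ at those points.

```python
import math

def scattering_phase(h, ell, T=12.0, n=20000):
    """Composite Simpson approximation of the time integral of h / (pi3(t) + ell)."""
    alpha = math.hypot(h, ell)          # equals R^2 / 2 on the circle S

    def f(t):
        return h / (alpha * math.cosh(2.0 * t) + ell)

    step = 2.0 * T / n                  # n must be even for Simpson's rule
    total = f(-T) + f(T)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(-T + k * step)
    return total * step / 3.0

samples = [(1.0, 1.0), (1.0, -1.0), (0.3, 2.0)]
errors = [abs(scattering_phase(h, ell) - math.atan2(h, ell)) for h, ell in samples]
```

The integrand decays like $e^{-2|t|}$, so truncating at $|t| = 12$ contributes an error far below the tolerance used in a check of this kind.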
For every $s \in S^*$ we call $\Theta(s)$ (36) the scattering phase of the positively oriented trajectory $\Gamma_{\sigma(s)}$ of $X_v$ on $\mathcal{EM}^{-1}(S^*)$ with respect to the connection 1-form $d\theta$. Comparing the above calculation of the scattering phase with the calculation of the scattering angle given in §1 proves

Theorem 3.1. For every $s = (h, \ell) \in S^*$ the scattering phase $\Theta(s) = \tan^{-1}\frac{h}{\ell}$ of the trajectory $\Gamma_{\sigma(s)}$ of $X_v$ on $\mathcal{EM}^{-1}(S^*)$ with respect to the connection $d\theta$ is equal to the scattering angle $\vartheta_s$ of the branch $H_s$ (15) of the hyperbola given by the image of $\Gamma_{\sigma(s)}$ under the projection map $\pi$ (13).

We now give a second proof of theorem 3.1 that does not involve the theory of reduction of symmetries of Hamiltonian systems. This proof is due to one of the referees.

Proof. Let $d\vartheta$ be the angle 1-form $\frac{x_1\, dx_2 - x_2\, dx_1}{x_1^2 + x_2^2}$ in configuration space $\mathbb{R}^2_*$. Using the definition we see that the connection 1-form $d\theta$ on $\mathcal{EM}^{-1}(S^*)$ is the pull back of $-d\vartheta$ by the bundle projection map $\pi : T^*\mathbb{R}^2 \to \mathbb{R}^2 : (x, y) \mapsto x$ followed by restriction to $\mathcal{EM}^{-1}(S^*)$. Therefore
\begin{align*}
\Theta(s) &= \int_{\Gamma_{\sigma(s)}} d\theta = -\int_{\Gamma_{\sigma(s)}} \pi^* d\vartheta \\
&= -\int_{\gamma_s = \pi \circ \Gamma_{\sigma(s)}} d\vartheta = -\int_{-\infty}^{\infty} \gamma_s^* d\vartheta, \quad \text{where } \gamma_s \text{ is the curve (14)} \\
&= -\int_{-\infty}^{\infty} d\vartheta(\gamma_s(t)) = \vartheta(\gamma_s(-\infty)) - \vartheta(\gamma_s(\infty)).
\end{align*}
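Since $\gamma_s$ traces out the branch $H_s$ of the hyperbola (15), whose incoming asymptotic direction is $\frac{1}{\sqrt{h^2 + \ell^2}}(\ell, h)$ with angle $\vartheta(\gamma_s(-\infty)) = \tan^{-1}\frac{h}{\ell}$ and whose outgoing asymptotic direction is $(1, 0)$ with angle $\vartheta(\gamma_s(\infty)) = 0$, the angle swept out by $\gamma_s$ is the counterclockwise rotation taking the outgoing asymptote to the incoming asymptote of $H_s$, namely the scattering angle $\vartheta_s = \tan^{-1}\frac{h}{\ell}$.

Let $\Gamma^{\pm}_{\sigma(s)} = \{\varphi^v_r(\sigma(s)) \in \mathcal{EM}^{-1}(S^*) \mid \pm r \ge 0\}$ be a positively oriented segment of the trajectory $\Gamma_{\sigma(s)}$ starting at $\sigma(s)$. If we integrate (35) over the segment $\Gamma^{+}_{\sigma(s)}$ we obtain
\begin{align*}
\Theta^{+}(s) &= \int_{\Gamma^{+}_{\sigma(s)}} d\theta = \int_{0}^{\infty} \dot{\theta}\, dt = \int_{C_s \cap \{\pi_1 \ge 0\}} \frac{h\, d\pi_3}{(\pi_3 + \ell)\, \dot{\pi}_3} \\
&= \frac{1}{2} \int_{C_s} \frac{h\, d\pi_3}{(\pi_3 + \ell)\, \dot{\pi}_3} = \frac{1}{2}\Theta(s); \tag{37}
\end{align*}
while if we use the segment $\Gamma^{-}_{\sigma(s)}$ we get
\begin{align*}
\Theta^{-}(s) &= \int_{\Gamma^{-}_{\sigma(s)}} d\theta = \int_{0}^{-\infty} \dot{\theta}\, dt = -\int_{-\infty}^{0} \dot{\theta}\, dt \\
&= -\int_{C_s \cap \{\pi_1 \le 0\}} \frac{h\, d\pi_3}{(\pi_3 + \ell)\, \dot{\pi}_3} = -\int_{C_s \cap \{\pi_1 \le 0\}} \frac{h\, d\pi_3}{(\pi_3 + \ell)\, 2\pi_1} \\
&= \frac{1}{2} \int_{C_s} \frac{h\, d\pi_3}{(\pi_3 + \ell)\sqrt{\pi_3^2 - (h^2 + \ell^2)}} = \frac{1}{2}\Theta(s). \tag{38}
\end{align*}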
For every $s \in S^*$ we call $\Theta^{\pm}(s)$ the scattering phase of the positively oriented segment $\Gamma^{\pm}_{\sigma(s)}$ with respect to the connection 1-form $d\theta$.

Corollary 3.2. Let $S_{\pm} = \{s = (h, \ell) \in S^* \mid \pm h \ge 0\}$ and set $s^*_{\pm} = \lim_{s \in S_{\pm},\ s \to s^*} s$. Then
\[
\Theta^{\pm}(s^*_+) = \lim_{\substack{s \in S_+ \\ s \to s^*_+}} \tfrac{1}{2}\Theta(s) = \pi \quad \text{and} \quad \Theta^{\pm}(s^*_-) = \lim_{\substack{s \in S_- \\ s \to s^*_-}} \tfrac{1}{2}\Theta(s) = 0. \tag{39}
\]
Proof. We consider only the case $\Theta^{+}(s)$. As $s \to s^*_+$ with $s \in S_+$, the point $s$ lies in the fourth quadrant of the circle $S$. Therefore $\Theta^{+}(s^*_+) = \tfrac{1}{2}\lim_{h \to 0^+} \tan^{-1}\frac{h}{\ell} = \pi$, using theorem 3.1. As $s \to s^*_-$ with $s \in S_-$, the point $s$ lies in the third quadrant of the circle $S$. Therefore $\Theta^{+}(s^*_-) = 0$.

We now explain how the scattering phase $\Theta^{\pm}(s)$ is the elevation of the positively oriented segment $\Gamma^{\pm}_{\sigma(s)}$ starting at $\sigma(s)$. To measure elevation we need a positively oriented horizontal reference segment. Because scattering phase is just the integral of the infinitesimal elevation, it vanishes as well on the reference segment. We construct a horizontal reference segment as follows. Consider the point $s_0 = (0, \tfrac{1}{2}R^2) = (h_0, \ell_0) \in S^*$. Then the infinitesimal elevation of $\Gamma^{\pm}_{\sigma(s_0)}$ vanishes, since $\dot{\theta} = \frac{h_0}{\pi_3(t) + \ell_0} = 0$. Thus the segment $\Gamma^{\pm}_{\sigma(s_0)}$ is horizontal and hence provides a reference segment for measuring elevation. Because the segments $\Gamma^{\pm}_{\sigma(s)}$ and $\Gamma^{\pm}_{\sigma(s_0)}$ have no point in common, we cannot determine the elevation of $\Gamma^{\pm}_{\sigma(s)}$ yet. To overcome this difficulty we use the parallel translation operator $P_{\sigma(s),\sigma(s_0)}$ along the curve $\gamma : [0, 1] \to \mathcal{EM}^{-1}(S^*)$ which joins $\sigma(s)$ to $\sigma(s_0)$ and is transverse to the fibers of the $S^1$-principal bundle $\rho : \mathcal{EM}^{-1}(S^*) \to S^* \times \mathbb{R}$. Thus under the map $\rho$ the curve $\gamma$ projects to a smooth curve $\overline{\gamma} : [0, 1] \to S^* \times \mathbb{R}$. The following argument shows that scattering phase is preserved under parallel translation. Let $\Gamma^{\pm}_{\sigma(s),\sigma(s_0)}$ be the oriented segment resulting from applying the parallel translation operator $P_{\sigma(s),\sigma(s_0)}$ to the oriented segment $\Gamma^{\pm}_{\sigma(s)}$. Because the curve $\gamma$ along which the parallel transport takes place is not horizontal with respect to the connection $d\theta$, the scattering phase $\Theta^{\pm}(s, s_0)$ of the parallel transported segment $\Gamma^{\pm}_{\sigma(s),\sigma(s_0)}$ is not equal to the scattering phase $\Theta^{\pm}(s)$ of the original segment $\Gamma^{\pm}_{\sigma(s)}$. Indeed, it must be corrected by the scattering phase $\Theta_{\gamma}(s)$ of the curve $\gamma$. More precisely, we have
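Lemma 3.3. For each $s \in S^*$ the scattering phase of $\Gamma^{\pm}_{\sigma(s),\sigma(s_0)}$ is
\[
\Theta^{\pm}(s, s_0) = \Theta_{\gamma}(s) + \Theta^{\pm}(s). \tag{40}
\]
Proof. Because
\[
\Gamma^{\pm}_{\sigma(s),\sigma(s_0)} = \gamma + \Gamma^{\pm}_{\sigma(s)}, \tag{41}
\]
the elevation of $\Gamma^{\pm}_{\sigma(s),\sigma(s_0)}$ with respect to $\Gamma^{\pm}_{\sigma(s_0)}$ is
\begin{align*}
\Theta^{\pm}(s, s_0) &= \int_{\Gamma^{\pm}_{\sigma(s),\sigma(s_0)}} d\theta = \int_{\gamma} d\theta + \int_{\Gamma^{\pm}_{\sigma(s)}} d\theta, \quad \text{using (41)} \\
&= \int_{\gamma} d\theta + \int_{0}^{\pm\infty} \dot{\theta}\, dt = \Theta_{\gamma}(s) + \Theta^{\pm}(s).
\end{align*}
To complete the proof we need only show that the elevation $\Theta_{\gamma}(s)$ of the curve $\gamma$ is finite. This is immediate because the domain of $\gamma$ is compact.

By definition the scattering phase $\Theta^{\pm}(s, s_0)$ of $\Gamma_{\sigma(s),\sigma(s_0)}$ is the elevation of the segment $\Gamma^{\pm}_{\sigma(s),\sigma(s_0)}$ with respect to the horizontal segment $\Gamma^{\pm}_{\sigma(s_0)}$.

Proposition 3.4. With respect to the horizontal curve $\Gamma_{\sigma(s_0)}$ the elevation $\Theta(\sigma(s), \sigma(s_0))$ of the curve $\Gamma_{\sigma(s),\sigma(s_0)}$, given by parallel transporting the curve $\Gamma_{\sigma(s)}$ along the curve $\gamma$, is equal to the scattering phase of $\Gamma_{\sigma(s)}$.

Proof. From the fact that
\begin{align*}
\Gamma_{\sigma(s),\sigma(s_0)} &= \Gamma^{+}_{\sigma(s),\sigma(s_0)} + (-\Gamma^{-}_{\sigma(s),\sigma(s_0)}) \\
&= \gamma + \Gamma^{+}_{\sigma(s)} + (-\gamma - \Gamma^{-}_{\sigma(s)}), \quad \text{using (41)} \\
&= \Gamma^{+}_{\sigma(s)} + (-\Gamma^{-}_{\sigma(s)}) = \Gamma_{\sigma(s)},
\end{align*}
it follows that
\[
\Theta(s, s_0) = \int_{\Gamma_{\sigma(s),\sigma(s_0)}} d\theta = \int_{\Gamma_{\sigma(s)}} d\theta = \Theta(s).
\]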
Next we explain how the elevation of $\Gamma^{\pm}_{\sigma(s),\sigma(s_0)}$ with respect to $\Gamma^{\pm}_{\sigma(s_0)}$ measures the asymptotic twist of $\Gamma^{\pm}_{\sigma(s),\sigma(s_0)}$ with respect to $\Gamma^{\pm}_{\sigma(s_0)}$. To do this we need the notion of an end at $\pm\infty$ of an integral curve of $X_v$. We say that two integral curves of $X_v$ on $\mathcal{EM}^{-1}(S)$ are equivalent at $\pm\infty$ if they are asymptotic to each other as either $t \to \infty$ or $t \to -\infty$. An equivalence class is called an end at $\pm\infty$ of a representative integral curve. Clearly each integral curve of $X_v$ on $\mathcal{EM}^{-1}(S)$ has at least two ends, one at $+\infty$ and the other at $-\infty$. Because the flows of the vector fields $X_v$ and $X_u$ on $\mathcal{EM}^{-1}(S)$ define an affine structure on $\mathcal{EM}^{-1}(s)$ for each $s \in S$ and distinct level sets of $\mathcal{EM}$ are disjoint, no distinct integral curves of $X_v$ on $\mathcal{EM}^{-1}(S)$ are asymptotic to each other. Therefore each integral curve $\Gamma_{\sigma(s)}$ of $X_v$ on $\mathcal{EM}^{-1}(S)$, which starts at $\sigma(s)$, has exactly two ends: $p^{+}(\sigma(s))$ at $+\infty$ and $p^{-}(\sigma(s))$ at $-\infty$. Let $\Gamma(\widetilde{S})$ be the collection of all integral curves of $X_v$ on $\mathcal{EM}^{-1}(S)$ which start at a point on $\widetilde{S} = \sigma(S)$. The collection of ends of integral curves of $X_v$ lying in $\Gamma(\widetilde{S})$ forms the disjoint union of two circles $C^{\pm}$, each of which we call a circle at $\pm\infty$. The union of $\Gamma(\widetilde{S})$ with $C^{\pm}$ is a compact cylinder $\overline{\Gamma(\widetilde{S})}$, which is homeomorphic to $S^1 \times [0, 1]$. Here $\overline{\Gamma(\widetilde{S})}$ is the closure of $\Gamma(\widetilde{S})$ in $\Gamma(\widetilde{S}) \cup C^{\pm}$.

Returning to the discussion of asymptotic twist, let $\Gamma_{\sigma(s_0)}(\widetilde{S}^*)$ be the union of curves $\Gamma_{\sigma(s),\sigma(s_0)}$ where $\sigma(s) \in \widetilde{S}^*$. Since $\Gamma_{\sigma(s_0)}(\widetilde{S}^*) = P_{\cdot,\sigma(s_0)}(\Gamma(\widetilde{S}^*))$, it follows that $\Gamma_{\sigma(s_0)}(\widetilde{S}^*)$ is diffeomorphic to $\Gamma(\widetilde{S}^*)$, which is the cylinder $\Gamma(\widetilde{S})$ less $\Gamma_{\sigma(s^*)}$. For $s \in S^*$ let $p^{\pm}(\sigma(s), \sigma(s_0))$ be the end at $\pm\infty$ of the positively oriented segment $\Gamma^{\pm}_{\sigma(s),\sigma(s_0)}$. The collection of ends $C^{\pm}_{\sigma(s_0)}$ of curves in $\Gamma_{\sigma(s_0)}(\widetilde{S}^*)$ is homeomorphic to the collection of ends of curves in $\Gamma(\widetilde{S}^*)$. From the definition of elevation it follows that the end $p^{\pm}(\sigma(s), \sigma(s_0))$ is obtained from the end $p^{\pm}(\sigma(s_0), \sigma(s_0))$ by a counterclockwise rotation through an angle $\Theta^{\pm}(s)$, provided that $\Theta^{\pm}(s)$ is positive; otherwise, it is through a clockwise rotation. Thus $\Theta^{\pm}(s)$ is the asymptotic twist of the positively oriented segment $\Gamma^{\pm}_{\sigma(s),\sigma(s_0)}$ with respect to the reference segment $\Gamma^{\pm}_{\sigma(s_0)}$, see figure 2.

[Figure 2: the cylinder $\Gamma_{\sigma(s_0)}(\widetilde{S}^*)$ with the segments $\Gamma^{+}_{\sigma(s_0)}$, $-\Gamma^{-}_{\sigma(s_0)}$, $\Gamma^{+}(\sigma(s), \sigma(s_0))$, $-\Gamma^{-}(\sigma(s), \sigma(s_0))$, the ends $p^{\pm}(\sigma(s), \sigma(s_0))$ and $p^{\pm}(\sigma(s^*), \sigma(s_0))$ on the circles $C^{\pm}_{\sigma(s_0)}$, and the angles $\Theta^{\pm}(\sigma(s))$.]
Fig. 2 For $s \in S^*$ the asymptotic twist of $\Gamma_{\sigma(s),\sigma(s_0)}$ on $\Gamma_{\sigma(s_0)}(\widetilde{S}^*)$ is $\Theta(s) = \Theta^{+}(s) + \Theta^{-}(s)$.
Fig. 3 The pictures read from top to bottom and left to right show the evolution of $\Gamma_{\sigma(s),\sigma(s_0)}$ as $s$ starts at $s^*_- = s^*$ in the top left picture and increases along $\widetilde{S}^*$, with the picture on the top right at $s_0$, until $s$ reaches $s^*_+ = s^*$.

4 Scattering monodromy
Consider the family $s \mapsto \Gamma_{\sigma(s)}$ as $s$ traces out the positively oriented $S^*$ starting at $s^*_-$ and ending at $s^*_+$. The scattering monodromy of $s \mapsto \Gamma_{\sigma(s)}$ is the variation $\operatorname{var}_{s^*_-,\, s^*_+} \Theta$ of the asymptotic twist of the associated family $s \mapsto \Gamma_{\sigma(s),\sigma(s_0)}$, see figure 3.

Proposition 4.1. As $s$ traces out $S^*$ from $s^*_-$ to $s^*_+$, the scattering monodromy of $s \mapsto \Gamma_{\sigma(s)}$ is $2\pi$.

Proof. We compute
\begin{align*}
\operatorname{var}_{s^*_-,\, s^*_+} \Theta &= \Theta(s^*_+) - \Theta(s^*_-) \\
&= 2\Theta^{+}(s^*_+) - 2\Theta^{-}(s^*_-), \quad \text{using corollary 3.2} \\
&= 2(\pi) - 2(0) = 2\pi.
\end{align*}
We may conclude from applying the inverse of the parallel transport operator $P_{\cdot,\sigma(s_0)}$ to the curves illustrated in figure 3 that the scattering monodromy of the family $s \mapsto \Gamma_{\sigma(s)}$ when $s$ traverses $S^*$ from $s^*_-$ to $s^*_+$ results from applying a Dehn twist, see [6, p.198], to the closure $\overline{\Gamma_{\sigma(s_0)}}([-\infty, \infty])$ of the image of $\Gamma_{\sigma(s_0)}$ in $\overline{\Gamma(\widetilde{S}^*)}$. More precisely, the basis $\{\widetilde{S}, \overline{\Gamma_{\sigma(s_0)}}[-\infty, \infty]\}$ of the relative first homology group $H_1(\overline{\Gamma(\widetilde{S}^*)}, \partial\overline{\Gamma(\widetilde{S}^*)} = C^{-} \cup C^{+})$ of the compactified cylinder $\overline{\Gamma(\widetilde{S}^*)}$ becomes the basis $\{\widetilde{S}, \widetilde{S} + \overline{\Gamma_{\sigma(s_0)}}[-\infty, \infty]\}$ after $s$ traverses the circle $S$ from $s^*_-$ to $s^*_+$. Thus the change of basis is given by the matrix $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$,
which is called the scattering monodromy matrix. It seems that the scattering monodromy matrix is topologically determined. To show this we prove

Theorem 4.2. The scattering monodromy of the family $s \mapsto \Gamma_{\sigma(s)}$ does not depend on the choice of principal connection on the $S^1$-principal bundle $\rho : \mathcal{EM}^{-1}(S^*) \to S^* \times \mathbb{R}$ (22).

Proof. Let $\phi$ be a connection 1-form on the $S^1$-principal bundle $\rho$ such that the scattering phase of every integral curve of the vector field $X_v$ on $\mathcal{EM}^{-1}(S^*)$ is bounded on all of $\mathcal{EM}^{-1}(S)$. Let $s \in S^*$ and let $\iota_s : \mathcal{EM}^{-1}(s) \to \mathcal{EM}^{-1}(S^*)$ be the inclusion map. The following argument shows that the 1-forms $\iota_s^* d\theta$ and $\iota_s^* \phi$ are cohomologous. Since $X_u \lrcorner\, d\theta = 1 = X_u \lrcorner\, \phi$, the period of $\iota_s^*(\phi - d\theta)$ over every integral curve of $X_u$ on $\mathcal{EM}^{-1}(s)$ vanishes. Therefore $\iota_s^*(\phi - d\theta)$ is an exact 1-form on $\mathcal{EM}^{-1}(s)$. In other words, for each $s \in S^*$ there is a smooth function $f_s$ on $\mathcal{EM}^{-1}(s)$ such that $\iota_s^* \phi = \iota_s^* d\theta + df_s$. Since $\iota_s^* d\theta$ and $\iota_s^* \phi$ are $S^1$-invariant, so is $f_s$. Note that the map $S^* \to C^\infty(S^*) : s \mapsto f_s$ is not only continuous but is also uniformly bounded. The infinitesimal elevation of $\Gamma_{\sigma(s)}$ with respect to the connection $\phi_s = \iota_s^* \phi$ is
\[
\dot{\phi}_s = X_v \lrcorner\, (\iota_s^* \phi) = X_v \lrcorner\, (\iota_s^* d\theta) + X_v \lrcorner\, df_s = \dot{\theta} + \dot{f}_s,
\]
where $\dot{f}_s = L_{X_v} f_s$ is an $S^1$-invariant smooth function on $\mathcal{EM}^{-1}(s)$. Consequently, the scattering phase of $\Gamma_{\sigma(s)}$ is
\begin{align*}
\Phi(s) &= \int_{\Gamma(s)} d\phi_s = \int_{-\infty}^{\infty} \dot{\phi}_s\, dt = \int_{-\infty}^{\infty} \dot{\theta}\, dt + \int_{-\infty}^{\infty} \dot{f}_s\, dt \\
&= \Theta(s) + [f_s(p^{+}(\sigma(s))) - f_s(p^{-}(\sigma(s)))]. \tag{42}
\end{align*}
Taking the limit it follows that (42) holds at $s^*_{\pm}$. The variation of $\Phi(s)$ as $s$ traverses $S^*$ from $s^*_-$ to $s^*_+$ is
\begin{align*}
\operatorname{var}_{s^*_-,\, s^*_+} \Phi &= \Phi(s^*_+) - \Phi(s^*_-) \\
&= [\Theta(s^*_+) - \Theta(s^*_-)] + [f_{s^*_+}(p^{+}(\sigma(s^*_+))) - f_{s^*_+}(p^{-}(\sigma(s^*_+)))] - [f_{s^*_-}(p^{+}(\sigma(s^*_-))) - f_{s^*_-}(p^{-}(\sigma(s^*_-)))] \\
&= \operatorname{var}_{s^*_-,\, s^*_+} \Theta + [f_{s^*_+}(p^{-}(\sigma(s^*_+)) + 2\pi) - f_{s^*_+}(p^{-}(\sigma(s^*_+)))] - [f_{s^*_-}(p^{-}(\sigma(s^*_-)) + 2\pi) - f_{s^*_-}(p^{-}(\sigma(s^*_-)))] \\
&= \operatorname{var}_{s^*_-,\, s^*_+} \Theta.
\end{align*}
The second to last equality holds because, by proposition 4.1, the angle on $C^{+}$ corresponding to $p^{+}(\sigma(s^*_{\pm}))$ is equal to $2\pi$ plus the angle of $p^{-}(\sigma(s^*_{\pm}))$ on $C^{-}$. The last equality above follows because $f_{s^*_{\pm}}$ is a continuous function on $C^{-} \subseteq \overline{\Gamma(\widetilde{S}^*)}$ and is therefore periodic of period $2\pi$. Consequently, the scattering monodromy does not depend on the choice of connection 1-form on the $S^1$-principal bundle $\rho$.
5 The A1 singularity: Picard-Lefschetz monodromy
In this section we show how to associate the integrable system $(v, u, \mathbb{R}^4, \omega)$ to the holomorphic function
\[
F : \mathbb{C}^2 \to \mathbb{C} : (z_1, z_2) \mapsto \tfrac{1}{2}(z_1^2 + z_2^2). \tag{43}
\]
The scattering monodromy matrix of the integrable system $(v|\mathbb{R}^4_*, u|\mathbb{R}^4_*, \mathbb{R}^4_*, \omega|\mathbb{R}^4_*)$ is equal to the Picard-Lefschetz monodromy matrix of the A1-singularity associated to $F$, see Looijenga [4]. Clearly $F$ has an isolated critical point $(0, 0)$ with nondegenerate Hessian. The corresponding critical value 0 of $F$ is isolated and $F^{-1}(0)$ exhibits the A1-singularity associated to $F$.
5.1 A holomorphic Hamiltonian system

Let $\Omega = dz_1 \wedge dz_2$ be a holomorphic symplectic form on $\mathbb{C}^2$. Corresponding to the holomorphic Hamiltonian function $F$ (43) is the holomorphic Hamiltonian vector field $X_F$, which satisfies $X_F \,\lrcorner\, \Omega = dF$. The complex integral curves of $X_F$ satisfy
\[
\frac{dz_1}{d\tau} = \frac{\partial F}{\partial z_2} = z_2
\quad\text{and}\quad
\frac{dz_2}{d\tau} = -\frac{\partial F}{\partial z_1} = -z_1 \tag{44}
\]
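for a complex time parameter $\tau$. We now prove some properties of the complex flow of the holomorphic Hamiltonian vector field $X_F$. Let $F^{-1}(0) = \{(z_1, z_2) \in \mathbb{C}^2 \mid \tfrac{1}{2}(z_1^2 + z_2^2) = 0\}$. The flow of $X_F$ on $\mathbb{C}^2 \setminus F^{-1}(0)$ defines the $SO(2, \mathbb{C})$-action
\[
\widetilde{\Phi} : SO(2, \mathbb{C}) \times (\mathbb{C}^2 \setminus F^{-1}(0)) \to \mathbb{C}^2 \setminus F^{-1}(0) :
\left( \begin{pmatrix} \cos\tau & \sin\tau \\ -\sin\tau & \cos\tau \end{pmatrix}, \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} \right)
\mapsto \begin{pmatrix} z_1 \cos\tau + z_2 \sin\tau \\ -z_1 \sin\tau + z_2 \cos\tau \end{pmatrix}. \tag{45}
\]

Lemma 5.1. The $SO(2, \mathbb{C})$-action $\widetilde{\Phi}$ (45) is free and proper.

Proof. First we show that the diffeomorphism
\[
\widetilde{\varphi} : \mathbb{C}^2 \setminus F^{-1}(0) \to \mathbb{C}^2 \setminus C : (z_1, z_2) \mapsto (\xi, \eta) = (z_1 + i z_2,\; z_1 - i z_2), \tag{46}
\]
where $C = \{(\xi, \eta) \in \mathbb{C}^2 \mid \tfrac{1}{2}\xi\eta = 0\}$, intertwines the $SO(2, \mathbb{C})$-action $\widetilde{\Phi}$ (45) with the $\mathbb{C}^*$-action
\[
\widehat{\Phi} : \mathbb{C}^* \times (\mathbb{C}^2 \setminus C) \to \mathbb{C}^2 \setminus C : (\lambda, \xi, \eta) \mapsto (\lambda \xi, \lambda^{-1} \eta). \tag{47}
\]
To see this we calculate
\[
\begin{aligned}
\widetilde{\varphi} \circ \widetilde{\Phi}(\tau, z_1, z_2)
&= \widetilde{\varphi}(\cos\tau\, z_1 + \sin\tau\, z_2,\; -\sin\tau\, z_1 + \cos\tau\, z_2) \\
&= \bigl( (\cos\tau - i \sin\tau)(z_1 + i z_2),\; (\cos\tau + i \sin\tau)(z_1 - i z_2) \bigr) \\
&= (e^{-i\tau} \xi,\; e^{i\tau} \eta) = \widehat{\Phi}(e^{-i\tau}, \widetilde{\varphi}(z_1, z_2)).
\end{aligned}
\]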
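To finish the proof we need only show that the $\mathbb{C}^*$-action $\widehat{\Phi}$ (47) on $\mathbb{C}^2 \setminus C$ is free and proper. This action is free, because $(\lambda\xi, \lambda^{-1}\eta) = (\xi, \eta)$ implies $\lambda = 1$. To show that it is a proper action it suffices to show that the map
\[
\widehat{\varphi} : \mathbb{C}^* \times (\mathbb{C}^2 \setminus C) \to (\mathbb{C}^2 \setminus C) \times (\mathbb{C}^2 \setminus C) : (\lambda, \xi, \eta) \mapsto (\xi, \eta, \lambda\xi, \lambda^{-1}\eta)
\]
is proper. Towards this goal let $K$ be a compact subset of $(\mathbb{C}^2 \setminus C) \times (\mathbb{C}^2 \setminus C)$ and let $\{(\lambda_n, \xi_n, \eta_n)\}$ be a sequence of points in $\widehat{\varphi}^{-1}(K)$. Then $\{\widehat{\varphi}(\lambda_n, \xi_n, \eta_n) = (\xi_n, \eta_n, \lambda_n\xi_n, \lambda_n^{-1}\eta_n)\}$ is a sequence of points in $K$. Because $K$ is compact, there is a subsequence $\{\widehat{\varphi}(\lambda_{n_k}, \xi_{n_k}, \eta_{n_k})\}$ which converges to $(\xi, \eta, \mu, \nu) \in K$. Then $\xi \neq 0$ and $\mu \neq 0$. Since
\[
\lim_{k\to\infty} \lambda_{n_k} = \lim_{k\to\infty} \frac{\lambda_{n_k}\xi_{n_k}}{\xi_{n_k}} = \frac{\mu}{\xi} \neq 0,
\]
the subsequence $\{(\lambda_{n_k}, \xi_{n_k}, \eta_{n_k})\}$ converges to $(\tfrac{\mu}{\xi}, \xi, \eta)$. But
\[
\widehat{\varphi}\bigl(\tfrac{\mu}{\xi}, \xi, \eta\bigr) = \Bigl( \xi,\; \eta,\; \tfrac{\mu}{\xi}\,\xi,\; \bigl(\tfrac{\mu}{\xi}\bigr)^{-1} \eta \Bigr) = (\xi, \eta, \mu, \nu),
\]
since
\[
\xi\eta = \lim_{k\to\infty} \xi_{n_k}\eta_{n_k} = \lim_{k\to\infty} \lambda_{n_k}\xi_{n_k}\, \lambda_{n_k}^{-1}\eta_{n_k} = \mu\nu .
\]
Therefore $(\tfrac{\mu}{\xi}, \xi, \eta) \in \widehat{\varphi}^{-1}(K)$. So $\widehat{\varphi}^{-1}(K)$ is compact. Thus the map $\widehat{\varphi}$ is proper.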
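The intertwining identity used in the proof of Lemma 5.1 can be spot-checked numerically. The following is a minimal sketch, not part of the paper: the function names and the sample values of the complex time and of the point are hypothetical choices made only for illustration.

```python
import cmath

def so2c_action(tau, z1, z2):
    """Complex rotation (45): the flow of X_F for complex time tau."""
    c, s = cmath.cos(tau), cmath.sin(tau)
    return c * z1 + s * z2, -s * z1 + c * z2

def to_xi_eta(z1, z2):
    """Change of coordinates (46): (xi, eta) = (z1 + i z2, z1 - i z2)."""
    return z1 + 1j * z2, z1 - 1j * z2

def cstar_action(lam, xi, eta):
    """C*-action (47): (xi, eta) -> (lam * xi, eta / lam)."""
    return lam * xi, eta / lam

# Hypothetical sample data: a complex time and a point with z1^2 + z2^2 != 0.
tau = 0.7 - 0.3j
z1, z2 = 1.2 + 0.5j, -0.4 + 2.0j

# Left side: act by (45), then change coordinates by (46).
lhs = to_xi_eta(*so2c_action(tau, z1, z2))
# Right side: change coordinates first, then act by (47) with lambda = exp(-i tau).
rhs = cstar_action(cmath.exp(-1j * tau), *to_xi_eta(z1, z2))

intertwining_error = max(abs(a - b) for a, b in zip(lhs, rhs))
print(intertwining_error)
```

The printed discrepancy should be at the level of floating point rounding, which is consistent with the computation following (47).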
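Corollary 5.2. The $S^1 \times \mathbb{R}$ action $\Phi$ (17) on $\mathbb{R}^4_*$ is free and proper.

Proof. Consider the diffeomorphism
\[
\varphi : \mathbb{R}^4_* \to \mathbb{C}^2 \setminus F^{-1}(0) : (x, y) \mapsto (z_1, z_2) = (x_1 + i y_1,\; x_2 + i y_2). \tag{48}
\]
A computation shows that $\varphi$ intertwines the $S^1 \times \mathbb{R}$-action $\Phi$ (17) with the $SO(2, \mathbb{C})$-action $\widetilde{\Phi}$ generated by the complex flow of $X_F$ on $\mathbb{C}^2 \setminus F^{-1}(0)$, see equation (45). In particular,
\[
\widetilde{\Phi}(t - i r, \varphi(x, y)) = (\varphi \circ \Phi)((t, r), (x, y)).
\]
Because $\widetilde{\Phi}$ is a free and proper $SO(2, \mathbb{C})$-action, the result follows.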
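Lemma 5.3. The orbit map
\[
\widetilde{\Pi} : \mathbb{C}^2 \setminus F^{-1}(0) \to \mathbb{C}^* : (z_1, z_2) \mapsto \tfrac{1}{2}(z_1^2 + z_2^2) \tag{49}
\]
of the $SO(2, \mathbb{C})$-action $\widetilde{\Phi}$ (45) exhibits $\mathbb{C}^2 \setminus F^{-1}(0)$ as a holomorphic $SO(2, \mathbb{C})$-principal bundle over $\mathbb{C}^*$.

Proof. Let
\[
\widehat{\Pi} : \mathbb{C}^2 \setminus C \to \mathbb{C}^* : (\xi, \eta) \mapsto \tfrac{1}{2}\xi\eta. \tag{50}
\]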
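Because $\widetilde{\Pi} = \widehat{\Pi} \circ \widetilde{\varphi}$, it suffices to show that $\widehat{\Pi}$ is the orbit map of the $\mathbb{C}^*$-action (47). Let $(\xi, \eta), (\xi', \eta') \in \mathbb{C}^2 \setminus C$ such that
\[
\tfrac{1}{2}\xi\eta = \widehat{\Pi}(\xi, \eta) = \widehat{\Pi}(\xi', \eta') = \tfrac{1}{2}\xi'\eta' .
\]
Since $\xi$ and $\xi'$ are both nonzero and $\xi' = \tfrac{\xi'}{\xi}\,\xi$, $\eta' = \bigl(\tfrac{\xi'}{\xi}\bigr)^{-1}\eta$, it follows that $(\xi', \eta')$ lies in the same $\mathbb{C}^*$-orbit as $(\xi, \eta)$. Therefore each fiber of the map $\widehat{\Pi}$ is a unique $\mathbb{C}^*$-orbit. The statement about the bundle structure follows from lemma 5.1.1.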
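Corollary 5.4. The energy-momentum map
\[
\mathrm{EM} : \mathbb{R}^4_* \to \mathbb{R}^2_* = \mathbb{R}^2 \setminus \{(0, 0)\} : (x, y) \mapsto (v(x, y), u(x, y)) \tag{51}
\]
of the $S^1 \times \mathbb{R}$-action $\Phi$ (17) exhibits $\mathbb{R}^4_*$ as a real analytic $S^1 \times \mathbb{R}$-principal bundle over $\mathbb{R}^2_*$.

Proof. The map $\iota : \mathbb{C}^* \to \mathbb{R}^2_* : z \mapsto (\operatorname{Re} z, \operatorname{Im} z) = (x, y)$ identifies $\mathbb{C}^*$ with $\mathbb{R}^2_*$. Then $\widetilde{\Pi}$ (49) is the orbit map of the $SO(2, \mathbb{C})$-action $\widetilde{\Phi}$ (45) and $\mathrm{EM} = \iota \circ \widetilde{\Pi} \circ \varphi$. Because $\varphi$ (48) intertwines the $S^1 \times \mathbb{R}$-action $\Phi$ with the $SO(2, \mathbb{C})$-action $\widetilde{\Phi}$, it follows that $\mathrm{EM}$ is the orbit map of the $S^1 \times \mathbb{R}$-action (17). The assertion about the bundle follows from lemma 5.1.1.

Proposition 5.5. The $SO(2, \mathbb{C})$-principal bundle exhibited by the map $\widetilde{\Pi}$ (49) is holomorphically trivial.

Proof. Let $\Sigma : \mathbb{C}^* \to \mathbb{C}^2 \setminus F^{-1}(0) : \xi \mapsto \bigl( \tfrac{1}{2}(\xi + 2),\; \tfrac{1}{2i}(\xi - 2) \bigr)$. From
\[
\widetilde{\Pi}(\Sigma(\xi)) = \widetilde{\Pi}\bigl( \tfrac{1}{2}(\xi + 2),\; \tfrac{1}{2i}(\xi - 2) \bigr) = \tfrac{1}{8}(\xi + 2)^2 - \tfrac{1}{8}(\xi - 2)^2 = \xi
\]
it follows that $\Sigma$ is a holomorphic section of the bundle $\widetilde{\Pi}$. Therefore, this principal $SO(2, \mathbb{C})$ bundle is holomorphically trivial.

Corollary 5.6. The $S^1 \times \mathbb{R}$-principal bundle exhibited by the map $\mathrm{EM}$ (51) is real analytically trivial.

Proof. Let $\sigma$ be the map defined in equation (9). Then $\sigma(\mathbb{R}^2_*) \subseteq \mathbb{R}^4_*$. To see this it suffices to show that $\sigma(\mathbb{R}^2_*) \cap (\Pi^+ \cup \Pi^-) = \emptyset$. Suppose that $\sigma(h, \ell) \in \Pi^+$. Then $\tfrac{\ell}{R} + \tfrac{1}{2}R = -\tfrac{\ell}{R} + \tfrac{1}{2}R$ and $-\tfrac{h}{R} = \tfrac{h}{R}$. So $(h, \ell) = (0, 0) \notin \mathbb{R}^2_*$, which is a contradiction. Now suppose that $\sigma(h, \ell) \in \Pi^-$. Then $\tfrac{\ell}{R} + \tfrac{1}{2}R = \tfrac{\ell}{R} - \tfrac{1}{2}R$, which implies $R = 0$. But this contradicts the hypothesis that $R > 0$. Therefore $\sigma(\mathbb{R}^2_*) \cap (\Pi^+ \cup \Pi^-) = \emptyset$ as desired. The map $\sigma$ is a real analytic section of the bundle $\mathrm{EM}$ (51), since we have $\mathrm{EM}(\sigma(h, \ell)) = (h, \ell)$.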
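Both section identities above reduce to short algebra, so they are easy to confirm numerically. The sketch below is an illustration only. It assumes $u = \operatorname{Re} F = \tfrac{1}{2}(x_1^2 + x_2^2 - y_1^2 - y_2^2)$ and $v = \operatorname{Im} F = x_1 y_1 + x_2 y_2$, and it treats $R$ as a free positive parameter, because its definition in equation (9) is not reproduced in this section; the function names and sample values are hypothetical.

```python
def orbit_map(z1, z2):
    """Orbit map (49): (z1, z2) -> (z1^2 + z2^2) / 2."""
    return 0.5 * (z1 * z1 + z2 * z2)

def holomorphic_section(xi):
    """Section from Proposition 5.5: xi -> ((xi + 2)/2, (xi - 2)/(2i))."""
    return 0.5 * (xi + 2), (xi - 2) / 2j

def energy_momentum(x1, x2, y1, y2):
    """EM = (v, u) with v = Im F and u = Re F (assumed formulas, see lead-in)."""
    v = x1 * y1 + x2 * y2
    u = 0.5 * (x1 ** 2 + x2 ** 2 - y1 ** 2 - y2 ** 2)
    return v, u

def real_section(h, ell, R):
    """sigma(h, ell) as written in Section 5.3; R > 0 is a free parameter here."""
    return (ell / R + R / 2, h / R, h / R, -ell / R + R / 2)

xi = 0.8 - 1.3j                      # hypothetical nonzero test value
section_error = abs(orbit_map(*holomorphic_section(xi)) - xi)

h, ell, R = 0.3, -0.4, 1.5           # hypothetical test values
v_val, u_val = energy_momentum(*real_section(h, ell, R))
em_error = max(abs(v_val - h), abs(u_val - ell))
print(section_error, em_error)
```

In this sketch the identity $\mathrm{EM}(\sigma(h, \ell)) = (h, \ell)$ holds for every $R > 0$, so the check does not depend on the particular choice of $R$.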
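5.2 An associated real integrable system

In this subsection we find two real vector fields whose integral curves satisfy a real system of equations corresponding to (44). For $j = 1, 2$ let $z_j = x_j + i y_j$. This identifies $\mathbb{C}^2$ and $\mathbb{R}^4$ (with coordinates $(x, y) = (x_1, x_2, y_1, y_2)$). Set $u(x, y) = \operatorname{Re} F$, see (6), and $v = \operatorname{Im} F$, see (1). Since $\frac{\partial}{\partial z_j} = \frac{1}{2}\bigl( \frac{\partial}{\partial x_j} - i \frac{\partial}{\partial y_j} \bigr)$, the first equation in (44) reads
\[
\frac{dx_1}{d\tau} + i \frac{dy_1}{d\tau} = \frac{\partial (u + i v)}{\partial z_2}
= \frac{1}{2}\Bigl( \frac{\partial u}{\partial x_2} + \frac{\partial v}{\partial y_2} \Bigr)
+ \frac{1}{2}\, i \Bigl( \frac{\partial v}{\partial x_2} - \frac{\partial u}{\partial y_2} \Bigr) \tag{52}
\]
and the second equation
\[
\frac{dx_2}{d\tau} + i \frac{dy_2}{d\tau}
= -\frac{1}{2}\Bigl( \frac{\partial u}{\partial x_1} + \frac{\partial v}{\partial y_1} \Bigr)
- \frac{1}{2}\, i \Bigl( \frac{\partial v}{\partial x_1} - \frac{\partial u}{\partial y_1} \Bigr). \tag{53}
\]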
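Since $F$ is holomorphic, it satisfies the Cauchy-Riemann equations
\[
\frac{\partial u}{\partial x_j} = \frac{\partial v}{\partial y_j}
\quad\text{and}\quad
\frac{\partial u}{\partial y_j} = -\frac{\partial v}{\partial x_j}, \tag{54}
\]
for $j = 1, 2$. Set $\tau = t$, where $t \in \mathbb{R}$. Equating real and imaginary parts and using (54), equations (52) and (53) become
\[
\begin{aligned}
\frac{dx_1}{dt} &= \frac{\partial u}{\partial x_2} = x_2, &
\frac{dx_2}{dt} &= -\frac{\partial u}{\partial x_1} = -x_1, \\
\frac{dy_1}{dt} &= -\frac{\partial u}{\partial y_2} = y_2, &
\frac{dy_2}{dt} &= \frac{\partial u}{\partial y_1} = -y_1 .
\end{aligned} \tag{55}
\]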
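Equation (55) is satisfied by the integral curves of a Hamiltonian vector field $X_u$ (5) on $\mathbb{R}^4$ with symplectic form $\omega = dx_1 \wedge dx_2 - dy_1 \wedge dy_2 = \operatorname{Re}(dz_1 \wedge dz_2)$ corresponding to the Hamiltonian function $u$ (6).‡ Set $\tau = -i s$, where $s \in \mathbb{R}$. Equating real and imaginary parts and using (54), equations (52) and (53) become
\[
\begin{aligned}
\frac{dx_1}{ds} &= \frac{\partial v}{\partial x_2} = y_2, &
\frac{dx_2}{ds} &= -\frac{\partial v}{\partial x_1} = -y_1, \\
\frac{dy_1}{ds} &= -\frac{\partial v}{\partial y_2} = -x_2, &
\frac{dy_2}{ds} &= \frac{\partial v}{\partial y_1} = x_1 .
\end{aligned} \tag{56}
\]
Equation (56) is satisfied by the integral curves of a Hamiltonian vector field $X_v$ (2) on $(\mathbb{R}^4, \omega)$ corresponding to the Hamiltonian function $v$ (1).

‡ The Hamiltonian vector field on $(\mathbb{R}^4, \omega)$ corresponding to a smooth function $f$ is $\frac{\partial f}{\partial x_2}\frac{\partial}{\partial x_1} - \frac{\partial f}{\partial x_1}\frac{\partial}{\partial x_2} - \frac{\partial f}{\partial y_2}\frac{\partial}{\partial y_1} + \frac{\partial f}{\partial y_1}\frac{\partial}{\partial y_2}$.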
The vector fields $X_u$ and $X_v$ on $\mathbb{R}^4$ are linear and therefore complete. Moreover, the matrices of $X_u$ and $X_v$ with respect to a basis of $(\mathbb{R}^4, \omega)$, where the matrix of $\omega$ is
\[
\begin{pmatrix} 0 & -1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -1 & 0 \end{pmatrix}, \tag{57}
\]
are
\[
\begin{pmatrix} 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -1 & 0 \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & -1 & 0 \\ 0 & -1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}, \tag{58}
\]
respectively. Because the matrices (58) commute, the functions $u$ and $v$ Poisson commute. Thus we have proved§

Proposition 5.7. The system $(v, u, \mathbb{R}^4, \omega)$ is Liouville integrable.
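The commutation claim behind Proposition 5.7 can be checked directly. The sketch below is illustrative only: it hard-codes the two matrices of (58) as read off from the linear systems (55) and (56), and it evaluates the derivative of $u$ along $X_v$, assuming $u = \tfrac{1}{2}(x_1^2 + x_2^2 - y_1^2 - y_2^2)$. The helper names are hypothetical.

```python
# Matrices of the linear vector fields X_u and X_v, rows ordered (x1, x2, y1, y2).
A_u = [[0, 1, 0, 0], [-1, 0, 0, 0], [0, 0, 0, 1], [0, 0, -1, 0]]
A_v = [[0, 0, 0, 1], [0, 0, -1, 0], [0, -1, 0, 0], [1, 0, 0, 0]]

def matmul(P, Q):
    """Plain 4x4 integer matrix product."""
    return [[sum(P[i][k] * Q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

matrices_commute = matmul(A_u, A_v) == matmul(A_v, A_u)

def du_along_Xv(x1, x2, y1, y2):
    """Derivative of u along X_v; it vanishes exactly when u and v Poisson commute."""
    grad_u = (x1, x2, -y1, -y2)      # gradient of the assumed u
    X_v = (y2, -y1, -x2, x1)         # right-hand side of (56)
    return sum(g * c for g, c in zip(grad_u, X_v))

bracket_value = du_along_Xv(1, 2, 3, 4)   # integer sample point, exact arithmetic
print(matrices_commute, bracket_value)
```

Working with integer entries keeps the comparison exact, so no tolerance is needed.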
5.3 Picard-Lefschetz theory

In this section we show that the scattering monodromy matrix of the integrable system $(v|_{\mathbb{R}^4_*}, u|_{\mathbb{R}^4_*}, \mathbb{R}^4_*, \omega|_{\mathbb{R}^4_*})$ associated to the holomorphic function $F$ (43) is equal to the Picard-Lefschetz monodromy matrix of the $A_1$-singularity corresponding to $F$. We refer the reader to Looijenga [4] and Milnor [5] for background in singularity theory. First we reduce the study of the asymptotic behavior of the integral curves of $X_v$ on $\mathbb{R}^4$ to investigating the geometry of certain integral curves of the vector field $X_u$ restricted to the 3-sphere $S^3$ in $\mathbb{R}^4$ of radius 1 centered at the origin. Motivated by the fact that the holomorphic function $F$ (43) is homogeneous, we use the projection map
\[
\Pi : \mathbb{R}^4 \setminus \{0\} \to S^3 : (x, y) \mapsto \frac{1}{\sqrt{x_1^2 + x_2^2 + y_1^2 + y_2^2}}\,(x, y). \tag{59}
\]
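§ This is a special case of a result of Flaschka [3].

For each $s = (h, \ell) \in S$ consider the integral curve $\Gamma_{\sigma(s)} : \mathbb{R} \to \mathbb{R}^4 : r \mapsto \varphi^v_r(\sigma(s))$ (10) of the vector field $X_v$, which starts at $\sigma(s) = \bigl( \tfrac{\ell}{R} + \tfrac{1}{2}R,\; \tfrac{h}{R},\; \tfrac{h}{R},\; -\tfrac{\ell}{R} + \tfrac{1}{2}R \bigr) \in \mathrm{EM}^{-1}(s)$. Now the image of the curve $\Gamma_{\sigma(s)}$ under the projection map $\Pi$ (59) is the curve $\gamma_{\sigma(s)} : \mathbb{R} \to S^3 : r \mapsto \gamma_{\sigma(s)}(r)$, where $\gamma_{\sigma(s)}(r) = \frac{1}{\|\Gamma_{\sigma(s)}(r)\|}\, \Gamma_{\sigma(s)}(r)$. By a calculation similar to the one used in §1, we find that the ends of $\gamma_{\sigma(s)}$ are
\[
p^{\pm}(\sigma(s)) =
\begin{cases}
\dfrac{1}{\sqrt{2}}\,(1, 0, 0, 1), & \text{at } +\infty \\[2ex]
\dfrac{1}{\sqrt{2(h^2 + \ell^2)}}\,(\ell, h, h, -\ell), & \text{at } -\infty .
\end{cases} \tag{60}
\]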
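The two limits in (60) can be observed numerically by following the linear flow of $X_v$ for a large positive and a large negative time and normalizing. This is a sketch under stated assumptions: the flow formula is obtained by solving the linear system (56), $R$ is treated as a free positive parameter because equation (9) is not reproduced here, and the sample values and function names are hypothetical.

```python
import math

def flow_v(r, p):
    """Time-r flow of X_v, obtained by solving the linear system (56)."""
    x1, x2, y1, y2 = p
    c, s = math.cosh(r), math.sinh(r)
    return (c * x1 + s * y2, c * x2 - s * y1, -s * x2 + c * y1, s * x1 + c * y2)

def project(p):
    """Projection (59) onto the unit sphere S^3."""
    norm = math.sqrt(sum(t * t for t in p))
    return tuple(t / norm for t in p)

def sigma(h, ell, R):
    """Starting point sigma(s) for s = (h, ell); R > 0 is a free parameter here."""
    return (ell / R + R / 2, h / R, h / R, -ell / R + R / 2)

h, ell, R = 0.3, 0.4, 1.5            # hypothetical sample values
start = sigma(h, ell, R)

end_plus = project(flow_v(20.0, start))     # approximates the end at +infinity
end_minus = project(flow_v(-20.0, start))   # approximates the end at -infinity

scale = math.sqrt(2 * (h * h + ell * ell))
expected_plus = (1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2))
expected_minus = (ell / scale, h / scale, h / scale, -ell / scale)

err_plus = max(abs(a - b) for a, b in zip(end_plus, expected_plus))
err_minus = max(abs(a - b) for a, b in zip(end_minus, expected_minus))
print(err_plus, err_minus)
```

The end at $+\infty$ does not depend on $(h, \ell)$, while the end at $-\infty$ rotates with $(h, \ell)$, which is the behavior the rest of this section exploits.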
The end $p^{+}(\sigma(s))$ lies on the integral curve
\[
t \mapsto \varphi^u_t\bigl( \tfrac{1}{\sqrt{2}}(1, 0, 0, 1) \bigr) = \tfrac{1}{\sqrt{2}}(\cos t, -\sin t, \sin t, \cos t)
\]
of the vector field $X_u|_{S^3}$. The image of this curve is the positively oriented circle $C^+ \subseteq \mathrm{EM}^{-1}((0, 0)) \cap S^3$, which is a component of the boundary of the closed annulus $\Pi(\mathrm{EM}^{-1}(S))$. The positive orientation of $C^+$ induces a positive orientation on the closed annulus $\Pi(\mathrm{EM}^{-1}(S))$ in $S^3$. The end $p^{-}(\sigma(s))$ lies on the integral curve
\[
t \mapsto \varphi^u_{-t}\bigl( \tfrac{1}{\sqrt{2}}(1, 0, 0, -1) \bigr) = \tfrac{1}{\sqrt{2}}(\cos t, \sin t, \sin t, -\cos t)
\]
of the vector field $X_u|_{S^3}$. (Take $t = \tan^{-1}\tfrac{h}{\ell}$.) The image of this curve is the circle $C^- \subseteq \mathrm{EM}^{-1}((0, 0)) \cap S^3$, which is oriented oppositely to that of $C^+$ because its orientation is induced from the positive orientation of $\Pi(\mathrm{EM}^{-1}(S))$. Geometrically, $C^\pm$ is the image under the projection map $\Pi$ (59) of the end circle $C^\pm$ of the compact cylinder $\Gamma(S)$.

Let $(S^3)^* = \Pi(\mathbb{R}^4_{**}) = S^3 \setminus K$. Here $K = \Pi(\{x = 0\} \cap \mathbb{R}^4_*)$, which is the union of $\{(0, y) \in S^3 \mid y_1^2 + y_2^2 = 1\}$ and $\mathrm{EM}^{-1}((0, 0)) \cap S^3$. Let $\gamma : \mathbb{R} \to (S^3)^* \subseteq \mathbb{R}^4_{**}$ be a smooth curve.
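Lemma 5.8. The scattering phase of the curve $\gamma$ with respect to the connection $d\theta$ (23) on $\mathbb{R}^4_{**}$ is equal to the scattering phase of the curve $\gamma$ with respect to the connection $i^* d\theta$ on $(S^3)^*$. Here $i : (S^3)^* \to \mathbb{R}^4_{**}$ is the inclusion map.

Proof. Let $\langle\ ,\ \rangle$ be the standard Euclidean inner product on $\mathbb{R}^4$. Write
\[
X_v(x, y) = Y(x, y) + \lambda(x, y)\, Z(x, y), \tag{61}
\]
where $Z(x, y) = x_1 \frac{\partial}{\partial x_1} + x_2 \frac{\partial}{\partial x_2} + y_1 \frac{\partial}{\partial y_1} + y_2 \frac{\partial}{\partial y_2}$ is the radial vector field, $\langle Y(x, y), Z(x, y) \rangle = 0$, and $\lambda : \mathbb{R}^4_* \to \mathbb{R}$ is a smooth function. We may think of a vector field on $\mathbb{R}^4_*$ as a vector in $\mathbb{R}^4_*$. Take the inner product of both sides of (61) with $(x, y)$. Using (2) and $\langle Y(x, y), (x, y) \rangle = 0$, we get
\[
\lambda(x, y) = 2\, \frac{x_1 y_2 - x_2 y_1}{x_1^2 + x_2^2 + y_1^2 + y_2^2}
\]
and $Y(x, y) = X_v(x, y) - \lambda(x, y)\, Z(x, y)$. Note that by definition $Y$ is a vector field on $(S^3)^*$. Now
\[
\begin{aligned}
\dot{\theta} = (X_v \,\lrcorner\, d\theta)(\gamma(t))
&= (Y \,\lrcorner\, d\theta)(\gamma(t)) + \lambda(\gamma(t))\, (Z \,\lrcorner\, d\theta)(\gamma(t)) \\
&= (Y \,\lrcorner\, d\theta)(\gamma(t)), \quad \text{since } Z \,\lrcorner\, \frac{x_2\, dx_1 - x_1\, dx_2}{x_1^2 + x_2^2} = 0 \\
&= (Y \,\lrcorner\, i^* d\theta)(\gamma(t)), \quad \text{since } Y \text{ is tangent to } (S^3)^* .
\end{aligned}
\]
Because the infinitesimal elevation of $\gamma$ with respect to $d\theta$ and $i^* d\theta$ are equal, the respective scattering phases of $\gamma$ are equal.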
The scattering phase $\theta(s)$ of the projected curve $\gamma_{\sigma(s)}$ is the sum of the asymptotic twist $\theta^{+}(s)$ of the segment $\gamma^{+}_{\sigma(s)}$ and the asymptotic twist $\theta^{-}(s)$ of the segment $\gamma^{-}_{\sigma(s)}$. Note that from (60) it follows that $\tfrac{1}{\sqrt{2}}(1, 0, 0, 1)$ and $\tfrac{1}{\sqrt{2}}(1, 0, 0, -1)$ are the end points $p^{\pm}(\sigma(s_0))$ of $\gamma_{\sigma(s_0)}$. Both $p^{\pm}(\sigma(s_0))$ have scattering phase equal to 0, because $\gamma_{\sigma(s_0)}$ is the image of the horizontal curve $\Gamma_{\sigma(s_0)}$ under the projection map $\Pi$. Since $p^{+}(\sigma(s)) = \varphi^u_0\bigl( \tfrac{1}{\sqrt{2}}(1, 0, 0, 1) \bigr)$, we see that $\theta^{+}(s) = 0$. Now
\[
p^{-}(\sigma(s)) = \frac{1}{\sqrt{2(h^2 + \ell^2)}}\,(\ell, h, h, -\ell) = \varphi^u_{-\tan^{-1}\frac{h}{\ell}}\bigl( \tfrac{1}{\sqrt{2}}(1, 0, 0, -1) \bigr).
\]
Therefore $\theta^{-}(s) = \tan^{-1}\tfrac{h}{\ell}$. Thus the scattering phase $\theta(s)$ of $\gamma_{\sigma(s)}$ is $\tan^{-1}\tfrac{h}{\ell}$. This is the same as the scattering phase of $\Gamma_{\sigma(s)}$.

Letting $\sigma(s)$ traverse the positively oriented circle $S$, we see that the positively oriented circle $C^+$ on the positively oriented closed annulus $\Pi(\Gamma(S))$ remains fixed; whereas the relative cycle $\Pi(\Gamma_{\sigma(s)}([-\infty, \infty]))$ on the closed annulus $\Pi(\Gamma(S))$ becomes the relative cycle $C^+ + \Pi(\Gamma_{\sigma(s)}([-\infty, \infty]))$, which equals the relative cycle $\Pi(S) + \Pi(\Gamma_{\sigma(s)}([-\infty, \infty]))$. Therefore after $s$ traverses the circle $S$, the basis $\{\Pi(S),\ \Pi(\Gamma_{\sigma(s)}([-\infty, \infty]))\}$ of the relative first homology group of the closed annulus $\Pi(\Gamma(S))$ becomes the basis $\{\Pi(S),\ \Pi(S) + \Pi(\Gamma_{\sigma(s)}([-\infty, \infty]))\}$. Therefore the Picard-Lefschetz monodromy of the $A_1$ singularity associated to the holomorphic function $F$ is $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$, see [4, §3.A, pp. 36–41]. This is equal to the scattering monodromy matrix of the integrable system $(v|_{\mathbb{R}^4_*}, u|_{\mathbb{R}^4_*}, \mathbb{R}^4_*, \omega|_{\mathbb{R}^4_*})$ associated to $F$.
References
[1] R. Cushman and L. Bates: Global aspects of classical integrable systems, Birkhäuser, Basel, 1997.
[2] J.J. Duistermaat: “On global action angle coordinates”, Commun. Pure Appl. Math., Vol. 33, (1980), pp. 687–706.
[3] H. Flaschka: “A remark on integrable Hamiltonian systems”, Phys. Lett. A., Vol. 121, (1988), pp. 505–508.
[4] E. Looijenga: Isolated singularities on complete intersections, Cambridge University Press, Cambridge, U.K., 1984.
[5] J. Milnor: Singularities of complex hypersurfaces, Princeton University Press, Princeton, 1968.
[6] J. Stillwell: Classical Topology and Combinatorial Group Theory, Graduate Texts in Mathematics, Vol. 72, Springer Verlag, Berlin, 1980.
[7] J.L. Synge: “Classical Dynamics”, In: S. Flügge (Ed.): Encyclopedia of Physics, Vol. III/1 Principles of Classical Mechanics and Field Theory, Springer Verlag, Berlin, 1960, pp. 1–225.
DOI: 10.2478/s11533-007-0019-z Research article CEJM 2007 452–469
Comparison between different duals in multiobjective fractional programming
Radu Ioan Boţ1, Robert Chares2, Gert Wanka1
1 Faculty of Mathematics, Chemnitz University of Technology, D-09107 Chemnitz, Germany
2 Center for Operations Research and Econometrics, Université Catholique de Louvain, B-1348 Louvain-la-Neuve, Belgium
Received 27 November 2006; accepted 11 May 2007
Abstract: The present paper is a continuation of [2] where we deal with the duality for a multiobjective fractional optimization problem. The basic idea in [2] consists in attaching an intermediate multiobjective convex optimization problem to the primal fractional problem, using an approach due to Dinkelbach ([6]), for which we construct then a dual problem expressed in terms of the conjugates of the functions involved. The weak, strong and converse duality statements for the intermediate problems allow us to give dual characterizations for the efficient solutions of the initial fractional problem. The aim of this paper is to compare the intermediate dual problem with other similar dual problems known from the literature. We completely establish the inclusion relations between the image sets of the duals as well as between the sets of maximal elements of the image sets.
© Versita Warsaw and Springer-Verlag Berlin Heidelberg. All rights reserved.
Keywords: multiobjective fractional programming, Fenchel duality, Fenchel-Lagrange duality, maximal elements, properly efficient elements
MSC (2000): 49N15, 90C29, 90C32
1
Introduction
In this paper we continue the study in [2] on duality assertions for multiobjective fractional optimization problems. In the mentioned paper, considering a primal optimization
R.I. Bo¸t et al. / Central European Journal of Mathematics 5(3) 2007 452–469
problem having as objective function a vector function with components that are quotients of a convex and a concave function, we attach to it an intermediate multiobjective convex optimization problem by using an approach due to Dinkelbach ([6]), which we denote by $(P_\mu)$ for $\mu \in \mathbb{R}^m$. To this last problem we construct a multiobjective dual $(D_\mu)$ expressed in terms of the conjugates of the functions involved. For the intermediate primal and dual problems we prove weak, strong and converse duality assertions which we use then to give dual characterizations for the efficient solutions of the initial fractional problem.

The aim we follow in this paper is to make a comparison of the dual $(D_\mu)$ with different dual problems to the parameterized multiobjective optimization problem $(P_\mu)$ given in the past in literature. On the one hand we consider two multiobjective problems constructed by using the approach described in [3] and on the other hand the multiobjective dual due to Ohlendorf and Tammer ([11]). The approach described by Boţ and Wanka in [3] for constructing multiobjective dual problems by using different scalar dual problems extends the results of Jahn ([8]) for Lagrange duality. Here we consider the multiobjective duals based on some conjugate duality concepts like Fenchel duality and Fenchel-Lagrange duality (for more on this see [14]). For the four dual problems we completely establish inclusion relations between the image sets of their feasible sets through their objective functions. Moreover, we prove that the sets of maximal elements of these image sets are equal for all $\mu \in \mathbb{R}^m_+$.

Similar investigations on the existence of inclusion relations between the image sets and, respectively, between the sets of maximal elements of the image sets of different multiobjective duals have been done by two of the authors in [3] and [4].
A general scheme containing the relations between the multiobjective duals of Jahn ([8]), Nakayama ([10]), Wolfe ([15]), Mond-Weir ([16]) and a conjugate dual introduced by Wanka and Boţ in [13] is presented. Furthermore, conditions under which the dual problems are equivalent are given. In the current paper we extend these investigations to fractional multiobjective optimization problems.

A duality concept for multiobjective fractional optimization problems which is not considered here, but is worth mentioning, has been introduced by Chandra, Craven and Mond in [5] and is also based on Dinkelbach’s parametrization approach. The formulation of the dual problem in the paper mentioned above is close to the ones in this paper. The feasible set of the multiobjective dual is defined by means of the Lagrange duality while we consider here the Fenchel and Fenchel-Lagrange duality concepts. An extension of the considerations we make in this paper to the dual problem introduced in [5] could be done in the lines of the theory presented in [3] and [4], where for a convex vector optimization problem different dual problems defined by means of the Lagrange, Fenchel and Fenchel-Lagrange duality concepts are introduced and investigated.

The paper is structured as follows. In Section 2 we introduce some preliminary notions and we formulate the multiobjective fractional primal problem and the intermediate convex problem $(P_\mu)$, $\mu \in \mathbb{R}^m$, which is equivalent to the original in some sense. Furthermore, we introduce the dual $(D_\mu)$ for $(P_\mu)$ and recall the weak, strong and converse duality theorems given in [2]. In Section 3 we introduce the other multiobjective duals to $(P_\mu)$ and then we give the relations of inclusion between the image sets of these problems. The existence of strict inclusion relations is shown by some examples. Finally, we prove that the sets of maximal elements of the image sets are equal for all $\mu \in \mathbb{R}^m_+$.
2
Preliminaries
In this section we give some notations and preliminary results used later in the paper. The first definition introduces the ordering relation induced on $\mathbb{R}^k$ by the ordering cone $\mathbb{R}^k_+$.

Definition 2.1. For $y, z \in \mathbb{R}^k$ we denote $y \leqq z$ if $z - y \in \mathbb{R}^k_+ = \{u = (u_1, \ldots, u_k)^T \in \mathbb{R}^k : u_i \geq 0,\ i = 1, \ldots, k\}$.

The notions we introduce now come from convex analysis.

Definition 2.2. Let $A \subseteq \mathbb{R}^n$. The indicator function of the set $A$, $\chi_A : \mathbb{R}^n \to \overline{\mathbb{R}}$, is defined by
\[
\chi_A(x) = \begin{cases} 0, & \text{if } x \in A, \\ +\infty, & \text{otherwise.} \end{cases}
\]

Definition 2.3. Let $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ be a given function. Then the conjugate function of $f$, $f^* : \mathbb{R}^n \to \overline{\mathbb{R}}$, is defined by $f^*(p) = \sup_{x \in \mathbb{R}^n} \{p^T x - f(x)\}$. Having a given subset $A \subseteq \mathbb{R}^n$ we define the conjugate function of $f$ with respect to $A$, $f_A^* : \mathbb{R}^n \to \overline{\mathbb{R}}$, as being
\[
f_A^*(p) = (f + \chi_A)^*(p) = \sup_{x \in A} \{p^T x - f(x)\}.
\]
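To make Definition 2.3 concrete, the conjugate can be approximated by replacing the supremum with a maximum over a finite grid. The sketch below is an illustration under stated assumptions: the test function $f(x) = x^2/2$ (whose conjugate is $f^*(p) = p^2/2$), the grid on $[-5, 5]$ and the truncated set $A = [0, 5]$ are hypothetical choices, and a bounded grid can only approximate a supremum that is attained inside it.

```python
def f(x):
    """Hypothetical test function; its exact conjugate is p**2 / 2."""
    return 0.5 * x * x

# Finite grid standing in for R (n = 1); step 0.001 on [-5, 5].
GRID = [i / 1000 for i in range(-5000, 5001)]

def conjugate(p, points=GRID):
    """Grid approximation of f*(p) = sup_x { p*x - f(x) }."""
    return max(p * x - f(x) for x in points)

def conjugate_on_A(p):
    """Grid approximation of f_A*(p) = sup_{x in A} { p*x - f(x) } with A = [0, 5]."""
    return conjugate(p, [x for x in GRID if x >= 0])

full_value = conjugate(1.5)             # close to 1.5**2 / 2 = 1.125
restricted_value = conjugate_on_A(-1.0)  # supremum over x >= 0 is attained at x = 0
unrestricted_value = conjugate(-1.0)     # close to 0.5
print(full_value, restricted_value, unrestricted_value)
```

Restricting the supremum to $A$ can only lower the value, which is why $f_A^*(-1) = 0$ while $f^*(-1) = 1/2$ in this example.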
The primal multiobjective fractional optimization problem considered here is
\[
(P) \qquad \operatorname*{v-min}_{x \in A} \Phi(x), \qquad
A = \bigl\{ x \in \mathbb{R}^n : g(x) = (g_1(x), \ldots, g_k(x))^T \leqq 0 \bigr\},
\]
where $A$ is assumed to be non-empty, $\forall x \in \mathbb{R}^n$,
\[
\Phi(x) = (\Phi_1(x), \ldots, \Phi_m(x))^T = \Bigl( \frac{f_1(x)}{h_1(x)}, \ldots, \frac{f_m(x)}{h_m(x)} \Bigr)^T,
\]
$f_i : \mathbb{R}^n \to \overline{\mathbb{R}} = \mathbb{R} \cup \{\pm\infty\}$ are convex and proper functions, $(-h_i) : \mathbb{R}^n \to \mathbb{R}$ are convex functions fulfilling $h_i(x) > 0$, $\forall x \in A$, $i = 1, \ldots, m$, $g_j : \mathbb{R}^n \to \mathbb{R}$ are real-valued convex functions, $j = 1, \ldots, k$, and $\bigcap_{i=1}^m \operatorname{ri}(\operatorname{dom} f_i) \neq \emptyset$.
Note that $A$ is convex, but nevertheless $(P)$ is in general a non-convex problem. In order to point out the optimal solutions of the problem $(P)$, let us introduce the following definitions of efficiency and proper efficiency.

Definition 2.4 (Efficiency for problem $(P)$). An element $\bar{x} \in A$ is said to be efficient (or minimal) for $(P)$ if
\[
\{\Phi(\bar{x}) - \mathbb{R}^m_+\} \cap \Phi(A) = \{\Phi(\bar{x})\},
\]
or, equivalently, if there is no $x \in A$ such that $\Phi(x) \leqq \Phi(\bar{x})$ and $\Phi(x) \neq \Phi(\bar{x})$.

Definition 2.5 (Proper efficiency for problem $(P)$). A point $\bar{x} \in A$ is said to be properly efficient for $(P)$ if there exists $\lambda = (\lambda_1, \ldots, \lambda_m)^T \in \operatorname{int}(\mathbb{R}^m_+)$ such that
\[
\sum_{i=1}^m \lambda_i \Phi_i(\bar{x}) \leq \sum_{i=1}^m \lambda_i \Phi_i(x), \quad \forall x \in A.
\]
Let us notice that any properly efficient solution turns out to be an efficient one, too. In order to investigate the duality for $(P)$ we considered in [2] the following parameterized optimization problem by using an idea due to Dinkelbach ([6])
\[
(P_\mu) \qquad \operatorname*{v-min}_{x \in A} \Phi^{(\mu)}(x),
\]
where
\[
\Phi^{(\mu)}(x) = \begin{pmatrix} \Phi_1^{(\mu)}(x) \\ \vdots \\ \Phi_m^{(\mu)}(x) \end{pmatrix}
= \begin{pmatrix} f_1(x) \\ \vdots \\ f_m(x) \end{pmatrix} - \begin{pmatrix} \mu_1 \cdot h_1(x) \\ \vdots \\ \mu_m \cdot h_m(x) \end{pmatrix}
\]
and $\mu = (\mu_1, \ldots, \mu_m)^T \in \mathbb{R}^m$. Note that $\Phi_i^{(\mu)}$ are proper and convex if $\mu_i \geq 0$, $i = 1, \ldots, m$. Efficiency and proper efficiency for $(P_\mu)$ are defined in an analogous manner as done above for $(P)$. Kaul and Lyall ([9]) and Bector, Chandra and Singh ([1]) stated the connections between the efficient elements of $(P)$ and $(P_\mu)$.

Theorem 2.6 (See Refs. [1] and [9]). A point $\bar{x} \in A$ is efficient for problem $(P)$ if and only if $\bar{x}$ is efficient for problem $(P_{\bar{\mu}})$, where $\bar{\mu} = (\bar{\mu}_1, \ldots, \bar{\mu}_m)$ and $\bar{\mu}_i := \frac{f_i(\bar{x})}{h_i(\bar{x})}$, $i = 1, \ldots, m$.
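In the scalar case $m = 1$, where efficiency reduces to ordinary minimality, Theorem 2.6 is the classical Dinkelbach correspondence, and it can be observed on a small example. The sketch below is illustrative only: the functions $f(x) = x^2 + 1$ (convex), $h(x) = x + 2$ (affine, positive on $A$), the set $A = [0, 3]$ and the grid search are hypothetical choices, not taken from the paper.

```python
def f(x):
    """Hypothetical convex numerator."""
    return x * x + 1.0

def h(x):
    """Hypothetical affine (hence concave) denominator, positive on A = [0, 3]."""
    return x + 2.0

# Grid standing in for the feasible set A = [0, 3].
GRID = [i / 1000 for i in range(0, 3001)]

# Minimizer of the fractional objective f/h over the grid.
x_ratio = min(GRID, key=lambda x: f(x) / h(x))
mu_bar = f(x_ratio) / h(x_ratio)

# Minimizer of the parameterized objective f - mu_bar * h over the same grid.
x_param = min(GRID, key=lambda x: f(x) - mu_bar * h(x))
param_min = f(x_param) - mu_bar * h(x_param)

print(x_ratio, x_param, mu_bar, param_min)
```

With $\bar{\mu}$ chosen as the optimal ratio, the parameterized objective has minimal value $0$ and is minimized at the same point, which mirrors the equivalence stated in the theorem.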
Another efficiency notion used in [2] is the so-called proper efficiency in the sense of Geoffrion.

Definition 2.7 (Proper efficiency in the sense of Geoffrion [7]). A point $\bar{x} \in A$ is said to be properly efficient in the sense of Geoffrion for $(P)$ if it is efficient and if there is some real number $M > 0$ such that for each $i = 1, \ldots, m$ and each $x \in A$ satisfying $\Phi_i(x) < \Phi_i(\bar{x})$ there exists at least one $j \in \{1, \ldots, m\}$ such that $\Phi_j(\bar{x}) < \Phi_j(x)$ and
\[
\frac{\Phi_i(\bar{x}) - \Phi_i(x)}{\Phi_j(x) - \Phi_j(\bar{x})} \leq M.
\]
Proper efficiency in the sense of Geoffrion for problem $(P_\mu)$ is defined in an analogous way, with $\Phi^{(\mu)}$ instead of $\Phi$.

Theorem 2.8 (See Ref. [2]). Let $\bar{x} \in A$ and assume that $\bar{\mu}_i := \frac{f_i(\bar{x})}{h_i(\bar{x})} \geq 0$, $i = 1, \ldots, m$. The point $\bar{x}$ is properly efficient in the sense of Geoffrion for problem $(P)$ if and only if $\bar{x}$ is properly efficient (in the sense of Definition 2.5) for problem $(P_{\bar{\mu}})$, where $\bar{\mu} = (\bar{\mu}_1, \ldots, \bar{\mu}_m)^T$.
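The multiobjective dual problem to $(P_\mu)$, $\mu \in \mathbb{R}^m$, introduced in [2], based on the duality concept developed by two of the authors in [13], is the following one
\[
(D_\mu) \qquad \operatorname*{v-max}_{(u, v, q, \lambda, t) \in B_\mu} \Psi^{(\mu)}(u, v, q, \lambda, t),
\]
where
\[
\Psi^{(\mu)}(u, v, q, \lambda, t) = \bigl( \Psi_1^{(\mu)}(u, v, q, \lambda, t), \ldots, \Psi_m^{(\mu)}(u, v, q, \lambda, t) \bigr)^T,
\]
\[
\Psi_i^{(\mu)}(u, v, q, \lambda, t) = -f_i^*(u_i) - (-\mu_i h_i)^*(v_i) - (q_i^T g)^*\Bigl( -\frac{1}{m \lambda_i} \sum_{j=1}^m \lambda_j (u_j + v_j) \Bigr) + t_i, \quad i = 1, \ldots, m,
\]
the set of constraints is defined by
\[
B_\mu = \Bigl\{ (u, v, q, \lambda, t) : \lambda \in \operatorname{int} \mathbb{R}^m_+,\ \sum_{i=1}^m \lambda_i q_i \geqq 0,\ \sum_{i=1}^m \lambda_i t_i = 0 \Bigr\}
\]
and the dual variables are $u = (u_1, \ldots, u_m)$, $u_i \in \mathbb{R}^n$, $v = (v_1, \ldots, v_m)$, $v_i \in \mathbb{R}^n$, $q = (q_1, \ldots, q_m)$, $q_i \in \mathbb{R}^k$, $i = 1, \ldots, m$, $\lambda = (\lambda_1, \ldots, \lambda_m)^T \in \operatorname{int} \mathbb{R}^m_+$, $t = (t_1, \ldots, t_m)^T \in \mathbb{R}^m$. The efficient elements of $(D_\mu)$ are defined in an analogous manner as for $(P)$.

Definition 2.9 (Efficiency for problem $(D_\mu)$). An element $(\bar{u}, \bar{v}, \bar{q}, \bar{\lambda}, \bar{t}) \in B_\mu$ is said to be efficient (or maximal) for $(D_\mu)$ if
\[
\{\Psi^{(\mu)}(\bar{u}, \bar{v}, \bar{q}, \bar{\lambda}, \bar{t}) + \mathbb{R}^m_+\} \cap \Psi^{(\mu)}(B_\mu) = \{\Psi^{(\mu)}(\bar{u}, \bar{v}, \bar{q}, \bar{\lambda}, \bar{t})\}.
\]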
We were able to prove the following weak duality result.

Theorem 2.10 (Weak duality [2]). Let $\mu \in \mathbb{R}^m$. There is no $(u, v, q, \lambda, t) \in B_\mu$ and $x \in A$ such that $\Psi^{(\mu)}(u, v, q, \lambda, t) \geqq \Phi^{(\mu)}(x)$ and $\Psi^{(\mu)}(u, v, q, \lambda, t) \neq \Phi^{(\mu)}(x)$.

For the strong duality theorem and the optimality conditions we need a constraint qualification. In order to formulate it let us consider the sets $L = \{j \in \{1, \ldots, k\} : g_j \text{ is affine}\}$ and $N = \{1, \ldots, k\} \setminus L$.

Constraint qualification (CQ). There exists an element $x' \in \bigcap_{i=1}^m \operatorname{ri}(\operatorname{dom} f_i)$ such that $g_j(x') < 0$, $j \in N$, and $g_j(x') \leq 0$, $j \in L$.

Theorem 2.11 (Strong duality [2]). Let $\mu \in \mathbb{R}^m_+$ and (CQ) be fulfilled. If $\bar{x}$ is a properly efficient element of $(P_\mu)$, then there exists an efficient solution $(\bar{u}, \bar{v}, \bar{q}, \bar{\lambda}, \bar{t}) \in B_\mu$ of $(D_\mu)$ and strong duality holds, i.e.
\[
\Phi^{(\mu)}(\bar{x}) = \Psi^{(\mu)}(\bar{u}, \bar{v}, \bar{q}, \bar{\lambda}, \bar{t}).
\]
Let us introduce now the following condition which will be helpful for the converse duality theorem.

Definition 2.12. Let $\mu \in \mathbb{R}^m_+$ and $\lambda \in \operatorname{int}(\mathbb{R}^m_+)$. The condition $(C_{\mu,\lambda})$ is fulfilled when from
\[
\inf_{x \in A} \sum_{i=1}^m \lambda_i \Phi_i^{(\mu)}(x) > -\infty
\]
it follows that there exists $x_\lambda \in A$ such that
\[
\inf_{x \in A} \sum_{i=1}^m \lambda_i \Phi_i^{(\mu)}(x) = \sum_{i=1}^m \lambda_i \Phi_i^{(\mu)}(x_\lambda).
\]
Now the converse duality theorem for $(P_\mu)$ can be formulated.

Theorem 2.13 (See Ref. [2]). Let $\mu \in \mathbb{R}^m_+$ be given, (CQ) be fulfilled and assume that $(C_{\mu,\lambda})$ holds for all $\lambda \in \operatorname{int}(\mathbb{R}^m_+)$.
(1) Let $(\bar{u}, \bar{v}, \bar{q}, \bar{\lambda}, \bar{t})$ be an efficient solution of $(D_\mu)$. Then
(a) $\Psi^{(\mu)}(\bar{u}, \bar{v}, \bar{q}, \bar{\lambda}, \bar{t}) \in \operatorname{cl}(\Phi^{(\mu)}(A) + \mathbb{R}^m_+)$;
(b) there exists a properly efficient solution $\bar{x}_{\bar{\lambda}} \in A$ of $(P_\mu)$ such that
\[
\sum_{i=1}^m \bar{\lambda}_i \bigl[ \Phi_i^{(\mu)}(\bar{x}_{\bar{\lambda}}) - \Psi_i^{(\mu)}(\bar{u}, \bar{v}, \bar{q}, \bar{\lambda}, \bar{t}) \bigr] = 0.
\]
(2) If, additionally, $\Phi^{(\mu)}(A)$ is $\mathbb{R}^m_+$-closed ($\Phi^{(\mu)}(A) + \mathbb{R}^m_+$ is closed), then there exists a properly efficient solution $\bar{x} \in A$ of $(P_\mu)$ such that
\[
\sum_{i=1}^m \bar{\lambda}_i \Phi_i^{(\mu)}(\bar{x}_{\bar{\lambda}}) = \sum_{i=1}^m \bar{\lambda}_i \Phi_i^{(\mu)}(\bar{x}),
\]
and
\[
\Phi^{(\mu)}(\bar{x}) = \Psi^{(\mu)}(\bar{u}, \bar{v}, \bar{q}, \bar{\lambda}, \bar{t}).
\]
By using the previous results, one can give dual characterizations for the solutions of the fractional multiobjective optimization problem $(P)$.

Theorem 2.14 (See Ref. [2]). Let (CQ) be fulfilled and $\bar{x} \in A$ be properly efficient in the sense of Geoffrion for problem $(P)$ with $\bar{\mu}_i := \frac{f_i(\bar{x})}{h_i(\bar{x})} \geq 0$, $i = 1, \ldots, m$. Let $\bar{\mu} := (\bar{\mu}_1, \ldots, \bar{\mu}_m)^T$. Then $\bar{x}$ is properly efficient for $(P_{\bar{\mu}})$, there exists $(\bar{u}, \bar{v}, \bar{q}, \bar{\lambda}, \bar{t}) \in B_{\bar{\mu}}$ that is efficient for $(D_{\bar{\mu}})$ and strong duality between $(P_{\bar{\mu}})$ and $(D_{\bar{\mu}})$ holds.

Theorem 2.15 (See Ref. [2]). Let (CQ) be fulfilled and $\bar{\mu} \in \mathbb{R}^m_+$ such that the set $\Phi^{(\bar{\mu})}(A)$ is $\mathbb{R}^m_+$-closed. Moreover, assume that $(C_{\bar{\mu},\lambda})$ holds for all $\lambda \in \operatorname{int}(\mathbb{R}^m_+)$. Let $(\bar{u}, \bar{v}, \bar{q}, \bar{\lambda}, \bar{t})$ be an efficient solution for $(D_{\bar{\mu}})$. Then there exists $\bar{x} \in A$, a properly efficient solution for $(P_{\bar{\mu}})$, and strong duality between $(P_{\bar{\mu}})$ and $(D_{\bar{\mu}})$ holds. If $\Phi(\bar{x}) = \bar{\mu}$ then $\bar{x}$ is properly efficient in the sense of Geoffrion for $(P)$.
3
Comparison with other dual problems
In this section we make a comparison between different dual problems to the parameterized multiobjective optimization problem $(P_\mu)$ when $\mu \in \mathbb{R}^m_+$. Along with the problem $(D_\mu)$ introduced in the previous section we consider two other multiobjective problems constructed by using the approach described in [3] as well as the multiobjective dual due to Ohlendorf and Tammer ([11]).

3.1 Formulation of the dual problems

Boţ and Wanka developed in [3] an approach for constructing multiobjective dual problems by using different scalar dual problems. They extended the results of Jahn ([8]) for Lagrange duality to different conjugate duality concepts like Fenchel duality and the so-called Fenchel-Lagrange duality (for more on this see [14]). We also take these two duals into consideration in order to formulate two further multiobjective dual problems to $(P_\mu)$
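\[
(D_F^\mu) \qquad \operatorname*{v-max}_{(u, v, \lambda, y) \in B_\mu^F} \Psi^{(F)}(u, v, \lambda, y),
\]
where
\[
\Psi^{(F)}(u, v, \lambda, y) = \begin{pmatrix} \Psi_1^{(F)}(u, v, \lambda, y_1) \\ \vdots \\ \Psi_m^{(F)}(u, v, \lambda, y_m) \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix},
\]
\[
B_\mu^F = \Bigl\{ (u, v, \lambda, y) : \lambda \in \operatorname{int} \mathbb{R}^m_+,\
\sum_{i=1}^m \lambda_i y_i \leq -\sum_{i=1}^m \lambda_i \bigl[ f_i^*(u_i) + (-\mu_i h_i)^*(v_i) \bigr] - \chi_A^*\Bigl( -\sum_{i=1}^m \lambda_i (u_i + v_i) \Bigr) \Bigr\}
\]
and
\[
(D_{FL}^\mu) \qquad \operatorname*{v-max}_{(u, v, q, \lambda, y) \in B_\mu^{FL}} \Psi^{(FL)}(u, v, q, \lambda, y),
\]
where
\[
\Psi^{(FL)}(u, v, q, \lambda, y) = \begin{pmatrix} \Psi_1^{(FL)}(u, v, q, \lambda, y_1) \\ \vdots \\ \Psi_m^{(FL)}(u, v, q, \lambda, y_m) \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix},
\]
\[
B_\mu^{FL} = \Bigl\{ (u, v, q, \lambda, y) : \lambda \in \operatorname{int} \mathbb{R}^m_+,\ q \in \mathbb{R}^k_+,\
\sum_{i=1}^m \lambda_i y_i \leq -\sum_{i=1}^m \lambda_i \bigl[ f_i^*(u_i) + (-\mu_i h_i)^*(v_i) \bigr] - (q^T g)^*\Bigl( -\sum_{i=1}^m \lambda_i (u_i + v_i) \Bigr) \Bigr\}.
\]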
The fourth multiobjective dual problem considered here is the so-called Fenchel-type dual according to Ohlendorf and Tammer [11]
\[
(D_O^\mu) \qquad \operatorname*{v-max}_{(p, \lambda, y) \in B_\mu^O} \Psi^{(O)}(p, \lambda, y),
\]
where
\[
\Psi^{(O)}(p, \lambda, y) = \begin{pmatrix} \Psi_1^{(O)}(p, \lambda, y_1) \\ \vdots \\ \Psi_m^{(O)}(p, \lambda, y_m) \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix},
\]
\[
B_\mu^O = \Bigl\{ (p, \lambda, y) : \lambda \in \operatorname{int} \mathbb{R}^m_+,\ p \in \mathbb{R}^n,\
\sum_{i=1}^m \lambda_i y_i = -\Bigl( -\sum_{i=1}^m \lambda_i \mu_i h_i \Bigr)_A^*(-p) - \Bigl( \sum_{i=1}^m \lambda_i f_i \Bigr)_A^*(p) \Bigr\}.
\]
The weak and strong duality assertions for the presented problems have been proved by Boţ and Wanka in [3] and Ohlendorf and Tammer in [11], respectively.
3.2 Inclusions between the image sets

For $\mu = (\mu_1, \ldots, \mu_m)^T \in \mathbb{R}^m_+$ we denote the image sets of the feasible sets of the four multiobjective duals through their objective functions by $D_\mu := \Psi^{(\mu)}(B_\mu)$, $D_{FL}^{(\mu)} := \Psi^{(FL)}(B_\mu^{FL})$, $D_F^{(\mu)} := \Psi^{(F)}(B_\mu^F)$ and $D_O^{(\mu)} := \Psi^{(O)}(B_\mu^O)$. Next we study the inclusion relations which exist between them. We omit proving the theorem below as this result can be derived from Proposition 5.2 in [3] and Proposition 2.1 in [4].

Theorem 3.1. It holds $D_\mu \cap \mathbb{R}^m \subseteq D_{FL}^{(\mu)} \subseteq D_F^{(\mu)}$, $\forall \mu \in \mathbb{R}^m_+$.

Example 5.2 in [3] and Example 2.1 in [4] show that the relations of inclusion in Theorem 3.1 can also be strict. Assuming the constraint qualification (CQ) is fulfilled, Proposition 3.1 in [4] offers a refinement of the relation above.

Theorem 3.2 (Proposition 3.1, [4]). Let (CQ) be fulfilled. Then it holds $D_{FL}^{(\mu)} = D_F^{(\mu)}$, $\forall \mu \in \mathbb{R}^m_+$.

This means that if (CQ) is fulfilled, then we have for all $\mu \in \mathbb{R}^m_+$
\[
D_\mu \cap \mathbb{R}^m \subseteq D_{FL}^{(\mu)} = D_F^{(\mu)}.
\]
In the next example we show that the first inclusion in the relation above can be strict.

Example 3.3. Let $n = 1$, $m = 2$, $k = 1$, $f_1(x) = x$, $f_2(x) = 0$, $x \in \mathbb{R}$, $g(x) = -x$, $x \in \mathbb{R}$, $h_1(x) = h_2(x) = 1$, $x \in \mathbb{R}$, and $\mu = (1, 1)^T$. Thus the feasible set $A$ looks like $A = \{x \in \mathbb{R} : x \geq 0\}$ and it is obvious that the constraint qualification (CQ) is fulfilled.
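The conjugate functions turn out to be
\[
f_1^*(u_1) = \begin{cases} 0, & \text{if } u_1 = 1, \\ +\infty, & \text{otherwise,} \end{cases}
\qquad
f_2^*(u_2) = \begin{cases} 0, & \text{if } u_2 = 0, \\ +\infty, & \text{otherwise,} \end{cases}
\]
and, respectively, for $i = 1, 2$,
\[
(-h_i)^*(v_i) = \begin{cases} 1, & \text{if } v_i = 0, \\ +\infty, & \text{otherwise.} \end{cases}
\]
For $u = (1, 0)$, $v = (0, 0)$, $\lambda = (1, 1)^T$ and $d = (-2, -2)^T$ we have that $\lambda_1 d_1 + \lambda_2 d_2 = -4$ and, on the other hand,
\[
-\lambda_1 [f_1^*(u_1) + (-h_1)^*(v_1)] - \lambda_2 [f_2^*(u_2) + (-h_2)^*(v_2)] - \chi_A^*\Bigl( -\sum_{i=1}^2 \lambda_i (u_i + v_i) \Bigr) = -2 + \inf_{x \geq 0} x = -2.
\]
This means that $(u, v, \lambda, d) \in B_\mu^F$, which is nothing else than $d \in D_F^{(\mu)}$.

Let us show now that $d \notin D_\mu$. If this were not true, then there would exist an element $(\bar{u}, \bar{v}, \bar{q}, \bar{\lambda}, \bar{t}) \in B_\mu$ such that
\[
\begin{pmatrix} -2 \\ -2 \end{pmatrix} =
\begin{pmatrix}
-f_1^*(\bar{u}_1) - (-h_1)^*(\bar{v}_1) - (\bar{q}_1 g)^*\Bigl( -\frac{1}{2\bar{\lambda}_1} \sum_{j=1}^2 \bar{\lambda}_j (\bar{u}_j + \bar{v}_j) \Bigr) + \bar{t}_1 \\[1.5ex]
-f_2^*(\bar{u}_2) - (-h_2)^*(\bar{v}_2) - (\bar{q}_2 g)^*\Bigl( -\frac{1}{2\bar{\lambda}_2} \sum_{j=1}^2 \bar{\lambda}_j (\bar{u}_j + \bar{v}_j) \Bigr) + \bar{t}_2
\end{pmatrix}.
\]
In order for this to happen we must have $\bar{u}_1 = 1$, $\bar{u}_2 = 0$, $\bar{v}_1 = 0$, $\bar{v}_2 = 0$ and so
\[
\begin{pmatrix} -2 \\ -2 \end{pmatrix} =
\begin{pmatrix}
-1 + \inf_{x \in \mathbb{R}} \bigl( \tfrac{1}{2} - \bar{q}_1 \bigr) x + \bar{t}_1 \\[1.5ex]
-1 + \inf_{x \in \mathbb{R}} \bigl( \tfrac{\bar{\lambda}_1}{2\bar{\lambda}_2} - \bar{q}_2 \bigr) x + \bar{t}_2
\end{pmatrix}.
\]
This relation can be true just if $\bar{q}_1 = \tfrac{1}{2}$, $\bar{q}_2 = \tfrac{\bar{\lambda}_1}{2\bar{\lambda}_2}$ and $\bar{t}_1 = \bar{t}_2 = -1$. As $\bar{\lambda}_1 \bar{t}_1 + \bar{\lambda}_2 \bar{t}_2 < 0$, this leads to a contradiction to $(\bar{u}, \bar{v}, \bar{q}, \bar{\lambda}, \bar{t}) \in B_\mu$.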
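The arithmetic of Example 3.3 can be replayed in a few lines. The sketch below is an illustration only: it encodes the closed-form conjugates of the example (with $\mu = (1, 1)^T$ and $h_i \equiv 1$, so that $(-\mu_i h_i)^* = (-h_i)^*$) and uses $\chi_A^*(p) = \sup_{x \geq 0} p x$, which equals $0$ for $p \leq 0$ and $+\infty$ otherwise. The function names are hypothetical.

```python
import math

INF = math.inf

def f1_conj(u):
    """Conjugate of f1(x) = x: finite only at u = 1."""
    return 0.0 if u == 1 else INF

def f2_conj(u):
    """Conjugate of f2(x) = 0: finite only at u = 0."""
    return 0.0 if u == 0 else INF

def neg_h_conj(v):
    """Conjugate of the constant function -1: finite only at v = 0."""
    return 1.0 if v == 0 else INF

def chi_A_conj(p):
    """Conjugate of the indicator of A = [0, +inf): sup over x >= 0 of p*x."""
    return 0.0 if p <= 0 else INF

u, v, lam, d = (1, 0), (0, 0), (1, 1), (-2, -2)

lhs = sum(l * di for l, di in zip(lam, d))
rhs = (-lam[0] * (f1_conj(u[0]) + neg_h_conj(v[0]))
       - lam[1] * (f2_conj(u[1]) + neg_h_conj(v[1]))
       - chi_A_conj(-sum(l * (ui + vi) for l, ui, vi in zip(lam, u, v))))
feasible_for_fenchel_dual = lhs <= rhs

# In the second part of the example t = (-1, -1) is forced, so the constraint
# lambda_1 t_1 + lambda_2 t_2 = 0 of B_mu fails for every positive lambda.
t_forced = (-1, -1)
weighted_t = sum(l * ti for l, ti in zip(lam, t_forced))
print(lhs, rhs, feasible_for_fenchel_dual, weighted_t)
```

The first check reproduces the inequality $-4 \leq -2$ that places $d$ in $D_F^{(\mu)}$, and the second shows the sign of $\bar{\lambda}_1 \bar{t}_1 + \bar{\lambda}_2 \bar{t}_2$ for one admissible $\bar{\lambda}$.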
Next we study the existence of an inclusion between $D_O^{(\mu)}$ and $D_F^{(\mu)}$, assuming that the constraint qualification (CQ) is fulfilled. To this end we formulate and study the Fenchel dual to the following scalarized primal
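\[ (P_{\mu,\lambda}) \qquad \inf_{x \in A} \Phi^{(\mu,\lambda)}(x), \]
where
\[ \Phi^{(\mu,\lambda)}(x) = \sum_{i=1}^{m} \lambda_i \cdot \Phi_i^{(\mu)}(x) = \sum_{i=1}^{m} \lambda_i \cdot (f_i(x) - \mu_i \cdot h_i(x)), \quad x \in \mathbb{R}^n, \]
and $\lambda_i > 0$, $i = 1, \dots, m$. The Fenchel dual to $(P_{\mu,\lambda})$ is (see, for example, [12])
\[ (D^{(F)}_{\mu,\lambda}) \qquad \sup_{p \in \mathbb{R}^n} \big\{ -(\Phi^{(\mu,\lambda)})^*(p) - \chi_A^*(-p) \big\}. \]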
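As we will see in the proof of the next lemma, the Fenchel dual to $(P_{\mu,\lambda})$ turns out to be
\[ (D^{(F)}_{\mu,\lambda}) \qquad \sup_{u_i, v_i \in \mathbb{R}^n,\ i = 1, \dots, m} \Big\{ -\sum_{i=1}^{m} \lambda_i [f_i^*(u_i) + (-\mu_i h_i)^*(v_i)] - \chi_A^*\Big( -\sum_{i=1}^{m} \lambda_i (u_i + v_i) \Big) \Big\}. \]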
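Lemma 3.4. Assume that (CQ) is fulfilled and $\inf(P_{\mu,\lambda})$ is finite. Then
\[ \inf(P_{\mu,\lambda}) = \max(D^{(F)}_{\mu,\lambda}), \]
and the dual problem $(D^{(F)}_{\mu,\lambda})$ has an optimal solution.

Proof. The constraint qualification (CQ) being fulfilled, it follows from Theorem 31.1 in [12] that strong duality holds between $(P_{\mu,\lambda})$ and $(D^{(F)}_{\mu,\lambda})$, namely
\[ \inf(P_{\mu,\lambda}) = \max(D^{(F)}_{\mu,\lambda}), \]
and $(D^{(F)}_{\mu,\lambda})$ has an optimal solution. Thus there exists $\bar p \in \mathbb{R}^n$ such that $\inf(P_{\mu,\lambda}) = -(\Phi^{(\mu,\lambda)})^*(\bar p) - \chi_A^*(-\bar p)$. On the other hand, as $\bigcap_{i=1}^{m} \operatorname{ri}(\operatorname{dom} f_i) \neq \emptyset$, the conjugate function of $\Phi^{(\mu,\lambda)}$ turns out to be, for all $p \in \mathbb{R}^n$ (cf. Theorem 16.4 in [12]),
\[ \begin{aligned} (\Phi^{(\mu,\lambda)})^*(p) &= \min \Big\{ \sum_{i=1}^{m} (\lambda_i f_i)^*(r_i) + \sum_{i=1}^{m} (-\lambda_i \mu_i h_i)^*(s_i) : \sum_{i=1}^{m} (r_i + s_i) = p \Big\} \\ &= \min \Big\{ \sum_{i=1}^{m} \lambda_i f_i^*(u_i) + \sum_{i=1}^{m} \lambda_i (-\mu_i h_i)^*(v_i) : \sum_{i=1}^{m} \lambda_i (u_i + v_i) = p \Big\}. \end{aligned} \]
One can see that indeed
\[ (D^{(F)}_{\mu,\lambda}) \qquad \sup_{u_i, v_i \in \mathbb{R}^n,\ i = 1, \dots, m} \Big\{ -\sum_{i=1}^{m} \lambda_i [f_i^*(u_i) + (-\mu_i h_i)^*(v_i)] - \chi_A^*\Big( -\sum_{i=1}^{m} \lambda_i (u_i + v_i) \Big) \Big\} \]
is the Fenchel dual of $(P_{\mu,\lambda})$ and that strong duality holds.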
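Theorem 3.5. Let (CQ) be fulfilled. Then it holds $D_O^{(\mu)} \subseteq D_F^{(\mu)}$.

Proof. Let $\bar d \in D_O^{(\mu)}$ with corresponding $(\bar\lambda, \bar p) \in \operatorname{int} \mathbb{R}^m_+ \times \mathbb{R}^n$. Then we get the following relations
\[ \begin{aligned} \sum_{i=1}^{m} \bar\lambda_i \bar d_i &= -\Big( -\sum_{i=1}^{m} \bar\lambda_i \mu_i h_i \Big)^*_A(-\bar p) - \Big( \sum_{i=1}^{m} \bar\lambda_i f_i \Big)^*_A(\bar p) \\ &= \inf_{x \in A} \Big[ \bar p^T x - \sum_{i=1}^{m} \bar\lambda_i \mu_i h_i(x) \Big] + \inf_{x \in A} \Big[ -\bar p^T x + \sum_{i=1}^{m} \bar\lambda_i f_i(x) \Big] \\ &\le \inf_{x \in A} \Big[ \bar p^T x - \sum_{i=1}^{m} \bar\lambda_i \mu_i h_i(x) - \bar p^T x + \sum_{i=1}^{m} \bar\lambda_i f_i(x) \Big] \\ &= \inf_{x \in A} \sum_{i=1}^{m} \bar\lambda_i (f_i(x) - \mu_i h_i(x)). \end{aligned} \]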
The right-hand side is nothing else but the scalarization of the parameterized primal problem $(P_\mu)$. According to Lemma 3.4 there exists an optimal solution to $(D^{(F)}_{\mu,\bar\lambda})$, say $(\bar u, \bar v) = (\bar u_1, \dots, \bar u_m, \bar v_1, \dots, \bar v_m)$, $\bar u_i, \bar v_i \in \mathbb{R}^n$, $i = 1, \dots, m$, such that strong duality holds. Thus,
\[ \sum_{i=1}^{m} \bar\lambda_i \bar d_i \le \max(D^{(F)}_{\mu,\bar\lambda}) = -\sum_{i=1}^{m} \bar\lambda_i f_i^*(\bar u_i) - \sum_{i=1}^{m} \bar\lambda_i (-\mu_i h_i)^*(\bar v_i) - \chi_A^*\Big( -\sum_{i=1}^{m} \bar\lambda_i (\bar u_i + \bar v_i) \Big). \]
This means that $(\bar u, \bar v, \bar\lambda, \bar d) \in B^F_\mu$ and so $\bar d \in \Psi^F(B^F_\mu) = D_F^{(\mu)}$.

Assuming the constraint qualification (CQ) is fulfilled, by Theorem 3.2 and Theorem 3.5, we have for all $\mu \in \mathbb{R}^m_+$
\[ D_O^{(\mu)} \subseteq D_{FL}^{(\mu)} = D_F^{(\mu)} . \]
Below we introduce a further example which shows that, in general, one can find an element $\mu \in \mathbb{R}^m_+$ such that the inclusion $D_\mu \cap \mathbb{R}^m \subseteq D_O^{(\mu)}$ fails. This implies that the inclusion in the relation above can indeed be strict.

Example 3.6. Let $n = 1$, $m = 2$, $k = 1$, $f_1(x) = x + 2$, $f_2(x) = -x + 2$, $h_1(x) = h_2(x) = 1$, $x \in \mathbb{R}$, and $\mu = (1, 1)^T$. For
\[ g(x) = \begin{cases} (x - 1)^2 - 1, & \text{if } x \le 1, \\ -1, & \text{otherwise,} \end{cases} \]
the feasible set is $A = \{x \in \mathbb{R} : g(x) \le 0\} = [0, +\infty)$. The constraint qualification (CQ) is again fulfilled. The conjugate functions become
\[ f_1^*(u_1) = \begin{cases} -2, & \text{if } u_1 = 1, \\ +\infty, & \text{otherwise,} \end{cases} \qquad f_2^*(u_2) = \begin{cases} -2, & \text{if } u_2 = -1, \\ +\infty, & \text{otherwise,} \end{cases} \]
and, for $i = 1, 2$,
\[ (-h_i)^*(v_i) = \begin{cases} 1, & \text{if } v_i = 0, \\ +\infty, & \text{otherwise.} \end{cases} \]
Choosing $u = (1, -1)$, $v = (0, 0)$, $q = (1, 1)$, $\lambda = (1, 1)^T$, $t = (0, 0)^T$, we get that $(u, v, q, \lambda, t) \in B_\mu$ because $\lambda \in \operatorname{int} \mathbb{R}^2_+$, $\sum_{i=1}^{2} \lambda_i q_i = 1 + 1 = 2 \ge 0$ and $\sum_{i=1}^{2} \lambda_i t_i = 0$. Furthermore,
\[ \Psi_i^{(\mu)}(u, v, q, \lambda, t) = 2 - 1 - (1 \cdot g)^*\Big( -\frac{1}{2} \sum_{i=1}^{2} (u_i + v_i) \Big) + 0 = 1 + \inf_{x \in \mathbb{R}} g(x) = 0, \quad i = 1, 2. \]
This means that the element $d = (0, 0)^T$ belongs to $D_\mu \cap \mathbb{R}^2$. But $d \notin D_O^{(\mu)}$, because otherwise there would exist $\bar\lambda = (\bar\lambda_1, \bar\lambda_2)^T \in \operatorname{int} \mathbb{R}^2_+$ and $\bar p \in \mathbb{R}$ such that
\[ \begin{aligned} \bar\lambda_1 d_1 + \bar\lambda_2 d_2 = 0 &= -\Big( -\sum_{i=1}^{2} \bar\lambda_i h_i \Big)^*_A(-\bar p) - \Big( \sum_{i=1}^{2} \bar\lambda_i f_i \Big)^*_A(\bar p) \\ &= \inf_{x \in A} [\bar p x] + \bar\lambda_1 + \bar\lambda_2 + \inf_{x \in A} [(-\bar p + \bar\lambda_1 - \bar\lambda_2) x]. \end{aligned} \]
This can be the case only if $\bar p \ge 0$, $-\bar p + \bar\lambda_1 - \bar\lambda_2 \ge 0$ and $\bar\lambda_1 + \bar\lambda_2 = 0$. As this can never be the case, the assertion is proved.

The last example of this section shows that, in general, one can find an element $\mu \in \mathbb{R}^m_+$ such that the inclusion $D_O^{(\mu)} \subseteq D_\mu \cap \mathbb{R}^m$ also fails. This means that, even if the constraint qualification (CQ) is fulfilled, between the image sets $D_O^{(\mu)}$ and $D_\mu \cap \mathbb{R}^m$ there exists no relation of inclusion which holds for all $\mu \in \mathbb{R}^m_+$.

Example 3.7. Let $n = 2$, $m = 2$, $k = 1$, $f_1(x_1, x_2) = x_2$, $f_2(x_1, x_2) = 0$, $h_1(x_1, x_2) = h_2(x_1, x_2) = 1$ for $(x_1, x_2)^T \in \mathbb{R}^2$, and $\mu = (1, 1)^T$. For $g(x_1, x_2) = x_1^2 - x_2$, $(x_1, x_2)^T \in \mathbb{R}^2$, the feasible set is $A = \{(x_1, x_2)^T \in \mathbb{R}^2 : x_1^2 \le x_2\}$ and it is obvious that the constraint qualification (CQ) is fulfilled. The conjugate functions turn out to be
\[ f_1^*(u^1) = \begin{cases} 0, & \text{if } u^1 = (0, 1)^T, \\ +\infty, & \text{otherwise,} \end{cases} \qquad f_2^*(u^2) = \begin{cases} 0, & \text{if } u^2 = (0, 0)^T, \\ +\infty, & \text{otherwise,} \end{cases} \]
and, respectively, for $i = 1, 2$,
\[ (-h_i)^*(v^i) = \begin{cases} 1, & \text{if } v^i = (0, 0)^T, \\ +\infty, & \text{otherwise.} \end{cases} \]
For $p = \big( -1, \frac{1}{2} \big)^T$, $\lambda = (1, 1)^T$ and $d = (-2, -1)^T$ we have that $\lambda_1 d_1 + \lambda_2 d_2 = -3$ and, on the other hand,
\[ -(-\lambda_1 h_1 - \lambda_2 h_2)^*_A(-p) - (\lambda_1 f_1 + \lambda_2 f_2)^*_A(p) = \inf_{x_1^2 \le x_2} \Big( -x_1 + \frac{1}{2} x_2 \Big) - 2 + \inf_{x_1^2 \le x_2} \Big( x_1 + \frac{1}{2} x_2 \Big) = -\frac{1}{2} - 2 - \frac{1}{2} = -3. \]
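This means that $(p, \lambda, d) \in B^O_\mu$, that is, $d \in D_O^{(\mu)}$. Let us now show that $d \notin D_\mu$. If this were not true, then there would exist an element $(\bar u, \bar v, \bar q, \bar\lambda, \bar t) \in B_\mu$, $\bar u = (\bar u^1, \bar u^2) \in \mathbb{R}^2 \times \mathbb{R}^2$, $\bar v = (\bar v^1, \bar v^2) \in \mathbb{R}^2 \times \mathbb{R}^2$, $\bar q = (\bar q_1, \bar q_2) \in \mathbb{R}^2_+$, $\bar\lambda = (\bar\lambda_1, \bar\lambda_2)^T \in \operatorname{int} \mathbb{R}^2_+$, $\bar t = (\bar t_1, \bar t_2)^T \in \mathbb{R}^2$, such that
\[ \begin{pmatrix} -2 \\ -1 \end{pmatrix} = \begin{pmatrix} -f_1^*(\bar u^1) - (-h_1)^*(\bar v^1) - (\bar q_1 g)^*\Big( -\frac{1}{2\bar\lambda_1} \sum_{j=1}^{2} \bar\lambda_j (\bar u^j + \bar v^j) \Big) + \bar t_1 \\ -f_2^*(\bar u^2) - (-h_2)^*(\bar v^2) - (\bar q_2 g)^*\Big( -\frac{1}{2\bar\lambda_2} \sum_{j=1}^{2} \bar\lambda_j (\bar u^j + \bar v^j) \Big) + \bar t_2 \end{pmatrix}. \]
For this to happen we must have $\bar u^1 = (0, 1)$, $\bar u^2 = (0, 0)$, $\bar v^1 = (0, 0)$, $\bar v^2 = (0, 0)$ and so
\[ \begin{pmatrix} -2 \\ -1 \end{pmatrix} = \begin{pmatrix} -1 + \inf_{(x_1, x_2)^T \in \mathbb{R}^2} \Big[ \bar q_1 x_1^2 + \Big( \frac{1}{2} - \bar q_1 \Big) x_2 \Big] + \bar t_1 \\ -1 + \inf_{(x_1, x_2)^T \in \mathbb{R}^2} \Big[ \bar q_2 x_1^2 + \Big( \frac{\bar\lambda_1}{2\bar\lambda_2} - \bar q_2 \Big) x_2 \Big] + \bar t_2 \end{pmatrix}. \]
This relation can be true only if $\bar q_1 = \frac{1}{2}$, $\bar q_2 = \frac{\bar\lambda_1}{2\bar\lambda_2}$, $\bar t_1 = -1$ and $\bar t_2 = 0$. As then $\bar\lambda_1 \bar t_1 + \bar\lambda_2 \bar t_2 < 0$, this leads to a contradiction.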
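The value $-3$ obtained in Example 3.7 can be checked numerically. The sketch below is only an illustration: the helper name `inf_linear_over_parabola` is hypothetical, and it relies on the elementary fact (an assumption of this sketch, not a statement from the paper) that for $b > 0$ the infimum of $a x_1 + b x_2$ over $\{x_1^2 \le x_2\}$ is attained on the boundary $x_2 = x_1^2$ and equals $-a^2/(4b)$.

```python
def inf_linear_over_parabola(a: float, b: float) -> float:
    """Infimum of a*x1 + b*x2 over {(x1, x2) : x1**2 <= x2}, assuming b > 0.

    For b > 0 the objective is minimized on the boundary x2 = x1**2, where
    a*x1 + b*x1**2 attains its minimum -a**2 / (4*b) at x1 = -a / (2*b).
    """
    if b <= 0:
        raise ValueError("the closed form used here assumes b > 0")
    return -a * a / (4.0 * b)


def grid_inf(a: float, b: float, steps: int = 600, radius: float = 3.0) -> float:
    """Crude cross-check: minimize along the boundary x2 = x1**2 on a grid."""
    best = float("inf")
    for k in range(steps + 1):
        x1 = -radius + 2.0 * radius * k / steps
        best = min(best, a * x1 + b * x1 * x1)
    return best


# Data of Example 3.7: p = (-1, 1/2), lambda = (1, 1), mu = (1, 1),
# h1 = h2 = 1, f1(x) = x2, f2(x) = 0.
first_term = inf_linear_over_parabola(-1.0, 0.5) - 2.0   # inf(p^T x) - (lambda1 + lambda2)
second_term = inf_linear_over_parabola(1.0, 0.5)         # inf(-p^T x + x2)
total = first_term + second_term
print(total)
```

The grid search is included only as an independent sanity check of the closed form; it is not part of the argument in the example.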
What we have proved is that, assuming that (CQ) holds, for all $\mu \in \mathbb{R}^m_+$,
\[ D_\mu \cap \mathbb{R}^m \subseteq D_{FL}^{(\mu)} = D_F^{(\mu)} \]
and
\[ D_O^{(\mu)} \subseteq D_{FL}^{(\mu)} = D_F^{(\mu)} . \]
In general, both inclusions in the relations above can be strict. Moreover, between the image sets $D_O^{(\mu)}$ and $D_\mu \cap \mathbb{R}^m$ there exists no relation of inclusion which holds for all $\mu \in \mathbb{R}^m_+$.
3.3 Inclusion between the efficiency sets

In this section we extend our study to the comparison of the sets of maximal elements of the image sets we dealt with in the previous subsection. Given a subset $D \subseteq \mathbb{R}^m$, an element $d \in D$ is said to be maximal if there exists no $\bar d \in D$ such that $\bar d - d \in \mathbb{R}^m_+$ and $\bar d \neq d$. The set of maximal elements of $D$ will be denoted by vmax(D). In the following we assume that the constraint qualification (CQ) is fulfilled. Under this assumption we can derive the first theorem from Theorem 3.2 and, respectively, Theorem 5.4 in [3].

Theorem 3.8. It holds
\[ \operatorname{vmax}(D_\mu) = \operatorname{vmax}(D_F^{(\mu)}) = \operatorname{vmax}(D_{FL}^{(\mu)}), \quad \forall \mu \in \mathbb{R}^m_+ . \]
The sets of maximal elements are nothing else than the image sets of the efficiency sets of the corresponding dual problems. It remains to investigate whether there are connections between the set of maximal elements $\operatorname{vmax}(D_O^{(\mu)})$ and the sets in the relation above. We prove first the following theorem.

Theorem 3.9. It holds
\[ \operatorname{vmax}(D_O^{(\mu)}) \subseteq \operatorname{vmax}(D_F^{(\mu)}), \quad \forall \mu \in \mathbb{R}^m_+ . \]
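The notion of maximal element used in this subsection can be illustrated on a finite set of vectors. The following is a minimal sketch, assuming a finite sample of points; the image sets studied here are in general infinite, and the names `dominates`, `vmax` and `sample` are hypothetical.

```python
from typing import Iterable, List, Tuple

Vector = Tuple[float, ...]


def dominates(d_bar: Vector, d: Vector) -> bool:
    """True if d_bar - d lies in R^m_+ and d_bar != d."""
    return all(a >= b for a, b in zip(d_bar, d)) and d_bar != d


def vmax(points: Iterable[Vector]) -> List[Vector]:
    """Maximal elements of a finite set with respect to the cone R^m_+."""
    pts = list(points)
    return [d for d in pts if not any(dominates(e, d) for e in pts)]


# A small hypothetical sample in R^2.
sample = [(-2.0, -2.0), (0.0, 0.0), (-2.0, -1.0), (1.0, -3.0)]
maximal = vmax(sample)
print(maximal)
```

In the sample, $(0, 0)$ dominates both $(-2, -2)$ and $(-2, -1)$, while $(1, -3)$ is not comparable with $(0, 0)$, so two maximal elements remain.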
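Proof. Let $\mu \in \mathbb{R}^m_+$ be fixed and $d \in \operatorname{vmax}(D_O^{(\mu)})$. This implies that $d \in D_O^{(\mu)}$ and, according to Theorem 3.5, we have $d \in D_F^{(\mu)}$. Now assume that $d \notin \operatorname{vmax}(D_F^{(\mu)})$. This means that there exists $(\bar u, \bar v, \bar\lambda, \bar d) \in B^F_\mu$ such that $d \in \bar d - (\mathbb{R}^m_+ \setminus \{0\})$. Furthermore, it holds
\[ \begin{aligned} \sum_{i=1}^{m} \bar\lambda_i d_i < \sum_{i=1}^{m} \bar\lambda_i \bar d_i &\le -\sum_{i=1}^{m} \bar\lambda_i [f_i^*(\bar u_i) + (-\mu_i h_i)^*(\bar v_i)] - \chi_A^*\Big( -\sum_{i=1}^{m} \bar\lambda_i (\bar u_i + \bar v_i) \Big) \\ &= -\sum_{i=1}^{m} \bar\lambda_i \sup_{x \in \mathbb{R}^n} \{ \bar u_i^T x - f_i(x) \} - \sum_{i=1}^{m} \bar\lambda_i \sup_{x \in \mathbb{R}^n} \{ \bar v_i^T x - (-\mu_i h_i)(x) \} \\ &\quad - \sup_{x \in \mathbb{R}^n} \Big\{ -\Big( \sum_{i=1}^{m} \bar\lambda_i (\bar u_i + \bar v_i) \Big)^T x - \chi_A(x) \Big\}. \end{aligned} \]
The supremum of a function over the whole space is always greater than or equal to the supremum over a subset of this space. Thus it follows
\[ \sum_{i=1}^{m} \bar\lambda_i d_i < -\sum_{i=1}^{m} \bar\lambda_i \sup_{x \in A} \{ \bar u_i^T x - f_i(x) \} - \sum_{i=1}^{m} \bar\lambda_i \sup_{x \in A} \{ \bar v_i^T x + \mu_i h_i(x) \} - \sup_{x \in A} \Big\{ -\Big( \sum_{i=1}^{m} \bar\lambda_i (\bar u_i + \bar v_i) \Big)^T x \Big\}, \]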
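and from here we obtain
\[ \begin{aligned} \sum_{i=1}^{m} \bar\lambda_i d_i &< -\sup_{x \in A} \sum_{i=1}^{m} \bar\lambda_i [\bar u_i^T x - f_i(x)] - \sup_{x \in A} \sum_{i=1}^{m} \bar\lambda_i [\bar v_i^T x + \mu_i h_i(x)] - \sup_{x \in A} \Big\{ -\Big( \sum_{i=1}^{m} \bar\lambda_i (\bar u_i + \bar v_i) \Big)^T x \Big\} \\ &\le -\sup_{x \in A} \Big\{ -\sum_{i=1}^{m} \bar\lambda_i [\bar u_i^T x - \mu_i h_i(x)] \Big\} - \sup_{x \in A} \sum_{i=1}^{m} \bar\lambda_i [\bar u_i^T x - f_i(x)] \\ &= \inf_{x \in A} \Big\{ \Big( \sum_{i=1}^{m} \bar\lambda_i \bar u_i \Big)^T x - \sum_{i=1}^{m} \bar\lambda_i \mu_i h_i(x) \Big\} - \sup_{x \in A} \Big\{ \Big( \sum_{i=1}^{m} \bar\lambda_i \bar u_i \Big)^T x - \sum_{i=1}^{m} \bar\lambda_i f_i(x) \Big\} \\ &= -\Big( -\sum_{i=1}^{m} \bar\lambda_i \mu_i h_i \Big)^*_A\Big( -\sum_{i=1}^{m} \bar\lambda_i \bar u_i \Big) - \Big( \sum_{i=1}^{m} \bar\lambda_i f_i \Big)^*_A\Big( \sum_{i=1}^{m} \bar\lambda_i \bar u_i \Big). \end{aligned} \]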
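Choose now $\tilde d \in \bar d + \mathbb{R}^m_+$ such that
\[ \sum_{i=1}^{m} \bar\lambda_i \tilde d_i = -\Big( -\sum_{i=1}^{m} \bar\lambda_i \mu_i h_i \Big)^*_A\Big( -\sum_{i=1}^{m} \bar\lambda_i \bar u_i \Big) - \Big( \sum_{i=1}^{m} \bar\lambda_i f_i \Big)^*_A\Big( \sum_{i=1}^{m} \bar\lambda_i \bar u_i \Big). \]
As for $\bar p := \sum_{i=1}^{m} \bar\lambda_i \bar u_i$ we have $(\bar p, \bar\lambda, \tilde d) \in B^O_\mu$, it follows that $\tilde d \in D_O^{(\mu)}$. But $\tilde d \in d + (\mathbb{R}^m_+ \setminus \{0\})$ and this contradicts the maximality of $d$ in $D_O^{(\mu)}$.

The next theorem shows that the reverse inclusion also holds.

Theorem 3.10. It holds
\[ \operatorname{vmax}(D_F^{(\mu)}) \subseteq \operatorname{vmax}(D_O^{(\mu)}), \quad \forall \mu \in \mathbb{R}^m_+ . \]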
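Proof. Let $\mu \in \mathbb{R}^m_+$ be fixed and $d \in \operatorname{vmax}(D_F^{(\mu)})$. As $d \in D_F^{(\mu)}$, there exist $\bar u = (\bar u_1, \dots, \bar u_m)$, $\bar v = (\bar v_1, \dots, \bar v_m)$, $\bar u_i, \bar v_i \in \mathbb{R}^n$, $i = 1, \dots, m$, and $\bar\lambda \in \operatorname{int} \mathbb{R}^m_+$, such that $(\bar u, \bar v, \bar\lambda, d) \in B^F_\mu$. Thus
\[ \begin{aligned} \sum_{i=1}^{m} \bar\lambda_i d_i &\le -\sum_{i=1}^{m} \bar\lambda_i [f_i^*(\bar u_i) + (-\mu_i h_i)^*(\bar v_i)] - \chi_A^*\Big( -\sum_{i=1}^{m} \bar\lambda_i (\bar u_i + \bar v_i) \Big) \\ &\le \sup_{u_i, v_i \in \mathbb{R}^n,\ i = 1, \dots, m} \Big\{ -\sum_{i=1}^{m} \bar\lambda_i [f_i^*(u_i) + (-\mu_i h_i)^*(v_i)] - \chi_A^*\Big( -\sum_{i=1}^{m} \bar\lambda_i (u_i + v_i) \Big) \Big\} \\ &= \inf_{x \in A} \sum_{i=1}^{m} \bar\lambda_i (f_i(x) - \mu_i h_i(x)) < +\infty, \end{aligned} \]
because of Lemma 3.4. The supremum in the relation above must be finite and, from the maximality of $d$ in $D_F^{(\mu)}$, one has the following equality
\[ \sum_{i=1}^{m} \bar\lambda_i d_i = \sup_{u_i, v_i \in \mathbb{R}^n,\ i = 1, \dots, m} \Big\{ -\sum_{i=1}^{m} \bar\lambda_i [f_i^*(u_i) + (-\mu_i h_i)^*(v_i)] - \chi_A^*\Big( -\sum_{i=1}^{m} \bar\lambda_i (u_i + v_i) \Big) \Big\}. \]
This means that
\[ \sum_{i=1}^{m} \bar\lambda_i d_i = \inf_{x \in A} \sum_{i=1}^{m} \bar\lambda_i (f_i(x) - \mu_i h_i(x)). \]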
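On the other hand, the infimum above can be written, equivalently, in the following way
\[ \begin{aligned} \inf_{x \in A} \sum_{i=1}^{m} \bar\lambda_i (f_i(x) - \mu_i h_i(x)) &= \inf_{x \in \mathbb{R}^n} \Big[ \sum_{i=1}^{m} \bar\lambda_i (f_i(x) - \mu_i h_i(x)) + \chi_A(x) \Big] \\ &= -\Big[ \Big( \sum_{i=1}^{m} \bar\lambda_i f_i + \chi_A \Big) + \Big( \sum_{i=1}^{m} \bar\lambda_i (-\mu_i) h_i + \chi_A \Big) \Big]^*(0). \end{aligned} \]
Using again the constraint qualification (CQ), it follows by Theorem 16.4 in [12] that there exists $\bar p \in \mathbb{R}^n$ such that
\[ \Big[ \Big( \sum_{i=1}^{m} \bar\lambda_i f_i + \chi_A \Big) + \Big( \sum_{i=1}^{m} \bar\lambda_i (-\mu_i) h_i + \chi_A \Big) \Big]^*(0) = \Big( \sum_{i=1}^{m} \bar\lambda_i f_i + \chi_A \Big)^*(\bar p) + \Big( \sum_{i=1}^{m} \bar\lambda_i (-\mu_i) h_i + \chi_A \Big)^*(-\bar p). \]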
This means that
\[ \sum_{i=1}^{m} \bar\lambda_i d_i = -\Big( -\sum_{i=1}^{m} \bar\lambda_i \mu_i h_i \Big)^*_A(-\bar p) - \Big( \sum_{i=1}^{m} \bar\lambda_i f_i \Big)^*_A(\bar p), \]
which is nothing else than $(\bar p, \bar\lambda, d) \in B^O_\mu$. Therefore $d \in D_O^{(\mu)}$.

Assuming that $d \notin \operatorname{vmax}(D_O^{(\mu)})$, there must exist $\bar d \in D_O^{(\mu)}$ such that $d \in \bar d - (\mathbb{R}^m_+ \setminus \{0\})$. According to Theorem 3.5 we have that $\bar d \in D_O^{(\mu)} \subseteq D_F^{(\mu)}$ and this contradicts the maximality of $d$ in $D_F^{(\mu)}$. In conclusion, $d$ must belong to $\operatorname{vmax}(D_O^{(\mu)})$.

We conclude the paper by giving the relation which exists under the stated assumptions between the sets of maximal elements of the image sets of the multiobjective dual problems treated, namely
\[ \operatorname{vmax}(D_\mu) = \operatorname{vmax}(D_O^{(\mu)}) = \operatorname{vmax}(D_F^{(\mu)}) = \operatorname{vmax}(D_{FL}^{(\mu)}), \quad \forall \mu \in \mathbb{R}^m_+ . \]
In other words, the image sets of the efficiency sets of all multiobjective dual problems $(D_\mu)$, $(D_F^{\mu})$, $(D_{FL}^{\mu})$ and $(D_O^{\mu})$ to the primal problem $(P_\mu)$ coincide for all $\mu \in \mathbb{R}^m_+$.

Acknowledgements. We are thankful to an anonymous reviewer for helping us to improve the quality of the paper.
References
[1] C.R. Bector, S. Chandra and C. Singh: “Duality on multiobjective fractional programming”, In: Lecture Notes in Economics and Mathematical Systems, Vol. 345, Springer-Verlag, Berlin, 1990, pp. 232–241.
[2] R.I. Boţ, R. Chares and G. Wanka: “Duality for multiobjective fractional programming problems”, Nonlinear Anal. Forum, Vol. 11, (2006), pp. 185–201.
[3] R.I. Boţ and G. Wanka: “An analysis of some dual problems in multiobjective optimization (I)”, Optimization, Vol. 53, (2004), pp. 281–300.
[4] R.I. Boţ and G. Wanka: “An analysis of some dual problems in multiobjective optimization (II)”, Optimization, Vol. 53, (2004), pp. 301–324.
[5] S. Chandra, B.D. Craven and B. Mond: “Multiobjective fractional programming duality. A Lagrangian approach”, Optimization, Vol. 22, (1991), pp. 549–556.
[6] W. Dinkelbach: “On nonlinear fractional programming”, Management Science, Vol. 13, (1967), pp. 492–497.
[7] A.M. Geoffrion: “Proper efficiency and the theory of vector maximization”, J. Math. Anal. Appl., Vol. 22, (1968), pp. 618–630.
[8] J. Jahn: “Duality in vector optimization”, Math. Program., Vol. 25, (1983), pp. 343–353.
[9] R.N. Kaul and V. Lyall: “A note on nonlinear fractional vector maximization”, OPSearch, Vol. 26, (1989), pp. 108–121.
[10] H. Nakayama: “Geometric consideration of duality in vector optimization”, J. Optimiz. Theory App., Vol. 44, (1984), pp. 625–655.
[11] E. Ohlendorf and Ch. Tammer: “Multicriteria fractional programming - an approach by means of conjugate functions”, OR Spektrum, Vol. 16, (1994), pp. 249–254.
[12] R.T. Rockafellar: Convex analysis, Princeton University Press, 1970.
[13] G. Wanka and R.I. Boţ: “A new duality approach for multiobjective convex optimization problems”, J. Nonlinear and Convex Anal., Vol. 3, (2002), pp. 41–57.
[14] G. Wanka and R.I. Boţ: “On the relations between different dual problems in convex mathematical programming”, In: P. Chamoni and R. Leisten and A. Martin and J. Minnemann and A. Stadler (Eds.), Operations Research Proceedings 2001, Springer-Verlag, Berlin, 2002, pp. 255–265.
[15] T. Weir: “Proper efficiency and duality for vector valued optimization problems”, J. Aust. Math. Soc., Vol. 43, (1987), pp. 21–34.
[16] T. Weir and B. Mond: “Generalised convexity and duality in multiple objective programming”, Bull. Aust. Math. Soc., Vol. 39, (1989), pp. 287–299.
DOI: 10.2478/s11533-007-0012-6 Research article CEJM 5(3) 2007 470–483
On the lattice of n-filters of an LMn-algebra
Dumitru Buşneag and Florentina Chirteş
Faculty of Mathematics and Computer Science, University of Craiova, Department of Mathematics, 13, Al. I. Cuza street, 200 585-Craiova, Romania
Received 6 September 2006; accepted 30 March 2007
Abstract: For an n-valued Łukasiewicz-Moisil algebra L (or LM_n-algebra for short) we denote by F_n(L) the lattice of all n-filters of L. The goal of this paper is to study the lattice F_n(L) and to give new characterizations for the meet-irreducible and completely meet-irreducible elements of F_n(L). © Versita Warsaw and Springer-Verlag Berlin Heidelberg. All rights reserved.
Keywords: LM_n-algebra, n-filter, prime n-filter, meet-irreducible n-filter, completely meet-irreducible n-filter.
MSC (2000): 03D25, 06G35
1 Introduction

LM_n-algebras arise in the area of mathematical many-valued logic. J. Łukasiewicz introduced the first system of many-valued logic in 1920. Since the development of various systems of logic has always been accompanied by the development of their algebraic counterparts (the associated Lindenbaum-Tarski algebras), interest in the purely algebraic aspects has grown steadily and the corresponding area has become an important area of algebra in its own right. Following this theme, in this paper we study the algebraic aspects of the Łukasiewicz-Moisil algebras. In 1940, Gr. C. Moisil introduced the 3- and 4-valued Łukasiewicz algebras and in 1941 the n-valued Łukasiewicz algebras, and he developed the theory of these Łukasiewicz algebras from an algebraic point of view. After 1960, when Moisil showed that Łukasiewicz algebras have applications in the study of electric circuits, there were many new advances in this theory from the algebraic point of view. An important role in these advances has been played by the Bahía Blanca School created by A. Monteiro. Important advances in this area have also been made in several doctoral theses at the University of Bucharest, written by pupils of Moisil (such as V. Boicescu, G. Georgescu, Gh. Nadiu, I. Petrescu-Voiculescu) or of S. Rudeanu (such as A. Filipoiu and A. Iorgulescu).

The structure of this paper is as follows. In Section 2 we recall some basic definitions and results relative to LM_n-algebras. In Section 3 we recall some known basic definitions and results relative to the lattice F_n(L) of all n-filters of L and we also prove some new results. Theorem 3.16 characterizes the LM_n-algebras for which the lattice of n-filters (F_n(L), ∧, ∨, ∗, {1}, L) is a Boolean algebra. In Section 4 we study the spectrum Spec_n(L) of L and the set Irc_n(L) of all completely meet-irreducible elements of the lattice F_n(L). We prove that if every F ∈ F_n(L) has a unique representation as an intersection of prime n-filters, then F_n(L) is a Boolean algebra (see Theorem 4.15). We also give new characterizations for the elements of Spec_n(L) and Irc_n(L).
2 Definitions and preliminaries

Let n be an integer, n ≥ 2, and J = {1, ..., n − 1}.

Definition 2.1. (Boicescu et al. [3]) An n-valued Łukasiewicz-Moisil algebra (or LM_n-algebra for short) is an algebra L = (L, ∧, ∨, N, {ϕ_i}_{i∈J}, 0, 1) of type (2, 2, 1, {1}_{i∈J}, 0, 0) satisfying the following conditions:
(a1) (L, ∧, ∨, N, 0, 1) is a De Morgan algebra,
(a2) ϕ_1, ..., ϕ_{n−1} : L → L are bounded lattice morphisms such that for every x, y ∈ L:
(a3) ϕ_i(x) ∨ Nϕ_i(x) = 1 for every i ∈ J,
(a4) ϕ_i(x) ∧ Nϕ_i(x) = 0 for every i ∈ J,
(a5) ϕ_i ϕ_j(x) = ϕ_j(x) for every i, j ∈ J,
(a6) ϕ_i(Nx) = Nϕ_j(x) for every i, j ∈ J with i + j = n,
(a7) ϕ_1(x) ≤ ϕ_2(x) ≤ ... ≤ ϕ_{n−1}(x),
(a8) if ϕ_i(x) = ϕ_i(y) for every i ∈ J, then x = y.

The endomorphisms {ϕ_i}_{i∈J} are called chrysippian endomorphisms; the relation (a8) is called Moisil's determination principle. As consequences of Moisil's determination principle we have:
(c1) if x, y ∈ L, then x ≤ y iff ϕ_i(x) ≤ ϕ_i(y) for all i ∈ J,
(c2) ϕ_1(x) ≤ x ≤ ϕ_{n−1}(x) for all x ∈ L.

For the remainder of this paper we denote an LM_n-algebra L = (L, ∧, ∨, N, {ϕ_i}_{i∈J}, 0, 1) by its universe L. Denote by C(L) the set of all complemented elements of the bounded lattice (L, ∧, ∨, 0, 1), which we call the center of L; it is easy to see that (C(L), ∧, ∨, N, 0, 1) is a Boolean algebra.

Lemma 2.2. If L is an LM_n-algebra, then for every x ∈ L:
(c3) x ∧ ϕ_1(Nx) = x ∧ Nϕ_{n−1}(x) = 0,
(c4) x ∨ Nϕ_1(x) = x ∨ ϕ_{n−1}(Nx) = 1.

Proof. (c3). By (a6) with i = 1 and j = n − 1 we have ϕ_1(Nx) = Nϕ_{n−1}(x). For every x ∈ L we have x ≤ ϕ_{n−1}(x), so x ∧ ϕ_1(Nx) = x ∧ Nϕ_{n−1}(x) ≤ ϕ_{n−1}(x) ∧ Nϕ_{n−1}(x) = 0 (by (a4)), hence x ∧ ϕ_1(Nx) = 0.
(c4). This is a direct consequence of (c3), replacing x by Nx.
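The axioms (a1)-(a8) and the consequences (c2)-(c4) can be checked by brute force on a small model. The sketch below assumes the n-element chain {0, 1, ..., n − 1} with Nx = n − 1 − x and ϕ_i(x) equal to the top element if i + x ≥ n and to 0 otherwise; this model and the helper names `make_chain` and `check_axioms` are illustrative choices, not taken from the paper, and passing the check on one model is not a proof of the general statements.

```python
from itertools import product


def make_chain(n):
    """Hypothetical model: the chain {0, ..., n-1} with N and phi_i as described above."""
    top = n - 1
    elems = list(range(n))
    J = list(range(1, n))

    def N(x):
        return top - x

    def phi(i, x):
        return top if i + x >= n else 0

    return elems, J, top, N, phi


def check_axioms(n):
    elems, J, top, N, phi = make_chain(n)
    meet, join = min, max
    # (a1) De Morgan algebra: N is an involution exchanging meet and join.
    for x, y in product(elems, repeat=2):
        assert N(N(x)) == x
        assert N(meet(x, y)) == join(N(x), N(y))
    for i in J:
        # (a2) each phi_i is a bounded lattice morphism.
        assert phi(i, 0) == 0 and phi(i, top) == top
        for x, y in product(elems, repeat=2):
            assert phi(i, meet(x, y)) == meet(phi(i, x), phi(i, y))
            assert phi(i, join(x, y)) == join(phi(i, x), phi(i, y))
        for x in elems:
            assert join(phi(i, x), N(phi(i, x))) == top       # (a3)
            assert meet(phi(i, x), N(phi(i, x))) == 0         # (a4)
            for j in J:
                assert phi(i, phi(j, x)) == phi(j, x)         # (a5)
            assert phi(i, N(x)) == N(phi(n - i, x))           # (a6) with j = n - i
    for x in elems:
        values = [phi(i, x) for i in J]
        assert values == sorted(values)                       # (a7)
        assert phi(1, x) <= x <= phi(n - 1, x)                # (c2)
        assert meet(x, phi(1, N(x))) == 0 == meet(x, N(phi(n - 1, x)))      # (c3)
        assert join(x, N(phi(1, x))) == top == join(x, phi(n - 1, N(x)))    # (c4)
    # (a8) Moisil's determination principle: the values phi_i(x) determine x.
    signatures = {tuple(phi(i, x) for i in J) for x in elems}
    assert len(signatures) == len(elems)
    return True


print(all(check_axioms(n) for n in range(2, 7)))
```

For n = 2 the model reduces to the two-element Boolean algebra with ϕ_1 the identity.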
Definition 2.3. (Boicescu et al. [3]) A congruence of an LM_n-algebra L is an equivalence relation on L compatible with the operations ∧, ∨, N, ϕ_i, for every i ∈ J.

Let θ be a congruence of L. It is known from universal algebra, since the class of LM_n-algebras is equational, that L/θ is organized as an LM_n-algebra (L/θ, ∧, ∨, N^θ, ϕ_1^θ, ..., ϕ_{n−1}^θ, 0/θ, 1/θ), where
(b1) x/θ ∧ y/θ = (x ∧ y)/θ, x/θ ∨ y/θ = (x ∨ y)/θ,
(b2) N^θ(x/θ) = (Nx)/θ,
(b3) ϕ_i^θ(x/θ) = (ϕ_i(x))/θ, i ∈ J.
The order relation of this algebra is defined as follows: x/θ ≤ y/θ if and only if there are a ∈ x/θ and b ∈ y/θ such that a ≤ b.

Definition 2.4. (Boicescu et al. [3], Cignoli [5]) A nonempty subset F ⊆ L is called an n-filter if F is a lattice filter of L and x ∈ F implies ϕ_1(x) ∈ F.

The concept of n-filter was introduced by Moisil [10, 11] for LM_n-algebras and later [12] for θ-valued Łukasiewicz algebras, under the name of strong filter (because there exist lattice filters that are not n-filters; see [3], p. 248).

Remark 2.5. From (a7) it follows that if F ⊆ L is an n-filter and x ∈ F, then ϕ_i(x) ∈ F for every i ∈ J.

Recall [6, 14] that for x, y ∈ L the weak implication is defined as x ⇝ y = Nϕ_1(x) ∨ y (this notion can also be found in [3], Definition 1.35 (p. 262)).

Proposition 2.6. (Boicescu et al. [3], Cignoli [6]) The weak implication ⇝ has the following properties for every x, y, z ∈ L:
(c5) x ⇝ (y ∧ z) = (x ⇝ y) ∧ (x ⇝ z),
(c6) x ≤ y implies x ⇝ y = 1,
(c7) x ≤ y iff ϕ_i(x) ⇝ ϕ_i(y) = 1, for every i ∈ J,
(c8) x ⇝ ϕ_1(x) = 1 and ϕ_1(x) ⇝ y = x ⇝ y,
(c9) if x ≤ y then z ⇝ x ≤ z ⇝ y.

Corollary 2.7. (c10) If x, y, z ∈ L and x ∧ y ≤ z, then x ≤ y ⇝ z.

Proof. By (c9) we deduce y ⇝ (x ∧ y) ≤ y ⇝ z. But by (c5), y ⇝ (x ∧ y) = (y ⇝ x) ∧ (y ⇝ y) = (y ⇝ x) ∧ 1 = y ⇝ x, so we obtain y ⇝ x ≤ y ⇝ z. Since x ≤ y ⇝ x we deduce x ≤ y ⇝ z.

Lemma 2.8. For every x, y, z ∈ L we have:
(c11) (x ⇝ y) ∨ (y ⇝ x) = 1,
(c12) (x ⇝ y) ⇝ y = ϕ_1(x) ∨ y,
(c13) ((x ⇝ y) ⇝ y) ∧ ((y ⇝ x) ⇝ x) = ϕ_1(x ∨ y) ∨ (x ∧ y),
(c14) (y ⇝ z) ⇝ ((x ⇝ z) ⇝ z) = ϕ_1(x) ∨ ϕ_1(y) ∨ z,
(c15) ϕ_i(x) ⇝ ϕ_i(y) = Nϕ_i(x) ∨ ϕ_i(y), for every i ∈ J.

Proof. (c11). We have (x ⇝ y) ∨ (y ⇝ x) = (Nϕ_1(x) ∨ y) ∨ (Nϕ_1(y) ∨ x) = (Nϕ_1(x) ∨ x) ∨ (Nϕ_1(y) ∨ y) = 1.
(c12). Using (c4), we have: (x ⇝ y) ⇝ y = (ϕ_1(x) ∧ Nϕ_1(y)) ∨ y = (ϕ_1(x) ∨ y) ∧ (Nϕ_1(y) ∨ y) = ϕ_1(x) ∨ y.
(c13). Using (c12), we have: ((x ⇝ y) ⇝ y) ∧ ((y ⇝ x) ⇝ x) = (ϕ_1(x) ∨ y) ∧ (ϕ_1(y) ∨ x) = (ϕ_1(x) ∧ x) ∨ (x ∧ y) ∨ (ϕ_1(x) ∧ ϕ_1(y)) ∨ (y ∧ ϕ_1(y)) = ϕ_1(x) ∨ (x ∧ y) ∨ ϕ_1(y) = ϕ_1(x ∨ y) ∨ (x ∧ y).
(c14). Using (c12) and (c4), we have: (y ⇝ z) ⇝ ((x ⇝ z) ⇝ z) = (ϕ_1(y) ∧ Nϕ_1(z)) ∨ (ϕ_1(x) ∨ z) = ϕ_1(x) ∨ ϕ_1(y) ∨ z.
(c15). Routine.
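The identities for the weak implication can be sanity-checked on a concrete model. The sketch below assumes the four-element chain {0, 1, 2, 3} with Nx = 3 − x and ϕ_1(x) equal to the top element exactly when x is the top element; the model and the name `wimp` for the weak implication are hypothetical choices for illustration, and the check on this single model does not replace the proofs above.

```python
from itertools import product

n = 4
top = n - 1
elems = range(n)


def N(x):
    return top - x


def phi1(x):
    # phi_1 on the chain: only the top element is sent to the top element.
    return top if x == top else 0


def wimp(x, y):
    # Weak implication: N(phi_1(x)) joined with y.
    return max(N(phi1(x)), y)


def check_weak_implication():
    for x, y, z in product(elems, repeat=3):
        assert wimp(x, min(y, z)) == min(wimp(x, y), wimp(x, z))          # (c5)
        if x <= y:
            assert wimp(x, y) == top                                      # (c6)
            assert wimp(z, x) <= wimp(z, y)                               # (c9)
        assert wimp(x, phi1(x)) == top                                    # (c8), first part
        assert wimp(phi1(x), y) == wimp(x, y)                             # (c8), second part
        if min(x, y) <= z:
            assert x <= wimp(y, z)                                        # (c10)
        assert max(wimp(x, y), wimp(y, x)) == top                         # (c11)
        assert wimp(wimp(x, y), y) == max(phi1(x), y)                     # (c12)
        assert min(wimp(wimp(x, y), y), wimp(wimp(y, x), x)) == max(
            phi1(max(x, y)), min(x, y))                                   # (c13)
        assert wimp(wimp(y, z), wimp(wimp(x, z), z)) == max(
            phi1(x), phi1(y), z)                                          # (c14)
    return True


print(check_weak_implication())
```

On a chain, x ⇝ y equals y when x is the top element and equals the top element otherwise, which makes each identity easy to confirm by hand as well.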
Definition 2.9. (Boicescu et al. [3]) A nonempty subset D ⊆ L is called a deductive system if the following conditions are satisfied:
(D1) 1 ∈ D,
(D2) if x, x ⇝ y ∈ D, then y ∈ D.

Proposition 2.10. (Boicescu et al. [3], Cignoli [5, 6]) A nonempty subset D ⊆ L is a deductive system iff D is an n-filter.

Remark 2.11. A deductive system D is proper (i.e. D ≠ L) iff 0 ∉ D iff no element a ∈ L satisfies a, a ⇝ 0 ∈ D.

If F is an n-filter, we consider the next two relations: x mod F y iff there is f ∈ F such that x ∧ f = y ∧ f, and x ∼_F y iff there exists f ∈ F such that ϕ_i(x) ∧ f = ϕ_i(y) ∧ f, for every i ∈ J.

Remark 2.12. Theorem 5.1.13 (p. 251) from [3] proves that mod F is a congruence on L and Proposition 5.1.31 (p. 259), from the same book, shows that x mod F y iff x ∼_F y iff
\[ \bigwedge_{i=1}^{n-1} [(N\varphi_i(x) \vee \varphi_i(y)) \wedge (\varphi_i(x) \vee N\varphi_i(y))] \in F. \]
In the following, we denote the quotient LM_n-algebra L/mod F = L/∼_F by L/F and the congruence class of x ∈ L by x/F; the chrysippian endomorphisms ϕ̃_i : L/F → L/F are defined by ϕ̃_i(x/F) = ϕ_i(x)/F, i ∈ J. In L/F, 0 = 0/F and 1 = 1/F; clearly, x/F = 1 iff x ∈ F.

Proposition 2.13. For x, y ∈ L we have:
(i) x/F ≤ y/F iff $\bigwedge_{i=1}^{n-1} (\varphi_i(x) \rightsquigarrow \varphi_i(y)) \in F$,
(ii) x ≤ y implies x/F ≤ y/F,
(iii) x/F ≤ y/F implies x ⇝ y ∈ F.

Proof. (i). We have x/F ≤ y/F iff (x ∧ y)/F = x/F, iff (x ∧ y) ∼_F x, that is, $\bigwedge_{i=1}^{n-1} [(N\varphi_i(x \wedge y) \vee \varphi_i(x)) \wedge (\varphi_i(x \wedge y) \vee N\varphi_i(x))] \in F$. But, by (c15),
\[ \begin{aligned} &\bigwedge_{i=1}^{n-1} [(N\varphi_i(x \wedge y) \vee \varphi_i(x)) \wedge (\varphi_i(x \wedge y) \vee N\varphi_i(x))] \\ &= \bigwedge_{i=1}^{n-1} [(N(\varphi_i(x) \wedge \varphi_i(y)) \vee \varphi_i(x)) \wedge ((\varphi_i(x) \wedge \varphi_i(y)) \vee N\varphi_i(x))] \\ &= \bigwedge_{i=1}^{n-1} [(N\varphi_i(x) \vee N\varphi_i(y) \vee \varphi_i(x)) \wedge (\varphi_i(x) \vee N\varphi_i(x)) \wedge (\varphi_i(y) \vee N\varphi_i(x))] \\ &= \bigwedge_{i=1}^{n-1} [1 \wedge 1 \wedge (N\varphi_i(x) \vee \varphi_i(y))] = \bigwedge_{i=1}^{n-1} (\varphi_i(x) \rightsquigarrow \varphi_i(y)). \end{aligned} \]
(ii). If x ≤ y then ϕ_i(x) ≤ ϕ_i(y), hence ϕ_i(x) ⇝ ϕ_i(y) = 1 for every i ∈ J, so $\bigwedge_{i=1}^{n-1} (\varphi_i(x) \rightsquigarrow \varphi_i(y)) = 1 \in F$, that is, x/F ≤ y/F (by (i)).
(iii). By (i), $\bigwedge_{i=1}^{n-1} (\varphi_i(x) \rightsquigarrow \varphi_i(y)) \in F$. Since $\bigwedge_{i=1}^{n-1} (\varphi_i(x) \rightsquigarrow \varphi_i(y)) \le \varphi_1(x) \rightsquigarrow \varphi_1(y)$, we deduce that ϕ_1(x) ⇝ ϕ_1(y) ∈ F. But, by (c15), ϕ_1(x) ⇝ ϕ_1(y) = Nϕ_1(x) ∨ ϕ_1(y) ≤ Nϕ_1(x) ∨ y = x ⇝ y, hence x ⇝ y ∈ F.
3 The lattice of n-filters of an LMn-algebra

We denote by F(L) (F_n(L)) the set of all lattice filters (n-filters) of L; clearly F_n(L) ⊆ F(L).

Definition 3.1. If X ⊆ L we denote by [X)_n the n-filter generated by X, i.e. the least n-filter including X (we recall that we denote the lattice filter of L generated by X by [X)). If X = {a} then we will denote [{a}) ([{a})_n) by [a) ([a)_n). Also, for F ∈ F_n(L) and a ∈ L we denote F(a) = [F ∪ {a})_n.

Proposition 3.2. (Boicescu et al. [3]) If X = ∅, then [X)_n = {1}, while if X ≠ ∅ then
\[ [X)_n = \Big\{ y \in L : \text{there exist } p \ge 1 \text{ and } x_1, \dots, x_p \in X \text{ such that } \varphi_1\Big( \bigwedge_{i=1}^{p} x_i \Big) \le y \Big\}. \]
In particular, for a ∈ L, [a)_n = {x ∈ L : ϕ_1(a) ≤ x} = [ϕ_1(a)) and if a ∈ C(L), then [a)_n = {x ∈ L : a ≤ x} = [a).

Remark 3.3. (Balbes and Dwinger [1], Grätzer [8]) If L is a distributive lattice then (F(L), ∧, ∨, {1}, L) is a complete distributive lattice, where for every F_1, F_2 ∈ F(L) we have F_1 ∧ F_2 = F_1 ∩ F_2 and F_1 ∨ F_2 = [F_1 ∪ F_2).

Definition 3.4. An n-filter F of L is called principal if there exists a ∈ L such that F = [a)_n.

Corollary 3.5.
(i) If X ⊆ C(L) then [X)_n = [X), while if X is a lattice filter of L then [X)_n = {y ∈ L : ϕ_1(x) ≤ y for some x ∈ X},
(ii) if F ∈ F_n(L) and a ∈ L then F(a) = F ∨ [a)_n = {y ∈ L : ϕ_1(x ∧ a) ≤ y for some x ∈ F},
(iii) if F_1, F_2 ∈ F_n(L) then [F_1 ∪ F_2)_n = {y ∈ L : ϕ_1(x_1 ∧ x_2) ≤ y for some x_1 ∈ F_1 and x_2 ∈ F_2},
(iv) if (F_i)_{i∈I} is a family of n-filters of L then $[\bigcup_{i \in I} F_i)_n = \{ y \in L : \varphi_1(x_{i_1} \wedge \dots \wedge x_{i_m}) \le y \text{ for some } x_{i_j} \in F_{i_j},\ j = 1, \dots, m \}$.

Proposition 3.6. For every x, y ∈ L we have:
(i) if x ≤ y then [y)_n ⊆ [x)_n,
(ii) [x)_n ∧ [y)_n = [x ∨ y)_n,
(iii) if F ∈ F_n(L), then F(x) ∩ F(y) = F(x ∨ y).
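Parts (i) and (ii) of Proposition 3.6 can be checked by enumeration on a small example. The sketch below assumes the product of two three-element chains with componentwise operations, which is taken here as an LM_3-algebra model (an assumption of this illustration), and uses the description [a)_n = {x : ϕ_1(a) ≤ x} from Proposition 3.2; the function names are hypothetical.

```python
from itertools import product

TOP = 2                                      # three-element chain {0, 1, 2}
ELEMS = list(product(range(3), repeat=2))    # universe of the product algebra


def leq(x, y):
    return all(a <= b for a, b in zip(x, y))


def join(x, y):
    return tuple(max(a, b) for a, b in zip(x, y))


def phi1(x):
    # Componentwise phi_1: a coordinate survives only if it is the top element.
    return tuple(TOP if a == TOP else 0 for a in x)


def principal_n_filter(a):
    """[a)_n = {x : phi_1(a) <= x}, following Proposition 3.2."""
    return frozenset(x for x in ELEMS if leq(phi1(a), x))


def check_prop_3_6():
    for x, y in product(ELEMS, repeat=2):
        fx, fy = principal_n_filter(x), principal_n_filter(y)
        if leq(x, y):
            assert fy <= fx                                   # (i)
        assert fx & fy == principal_n_filter(join(x, y))      # (ii)
    return True


print(check_prop_3_6())
```

Since [a)_n depends only on ϕ_1(a), the nine elements of this model generate just four distinct principal n-filters.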
Definition 3.7. (Balbes and Dwinger [1], Birkhoff [2], Boicescu et al. [3]) Let L = (L, ∧, ∨) be a lattice.
(i) For every y, z ∈ L, the relative pseudocomplement of y with respect to z, provided it exists, is the greatest element x such that x ∧ y ≤ z; it is denoted by y → z (i.e. y → z = max{x ∈ L : x ∧ y ≤ z}).
(ii) L is said to be relatively pseudocomplemented provided the relative pseudocomplement y → z exists for every y, z ∈ L.
(iii) A Heyting algebra is a relatively pseudocomplemented lattice with 0, i.e. a bounded one.
For x ∈ L, the element x∗ = x → 0 is called the pseudocomplement of x (clearly, x∗ = max{y ∈ L : x ∧ y = 0}).

If L is a relatively pseudocomplemented lattice, then → can be viewed as a binary operation on L and there exists the greatest element, 1, of the lattice: 1 = x → x, for all x ∈ L. Consequently, we have the following equivalent definitions:

Definition 3.8.
(1) A relatively pseudocomplemented lattice is an algebra L = (L, ∧, ∨, →, 1), where (L, ∧, ∨, 1) is a lattice with greatest element and the binary operation → on L verifies: for all x, y, z ∈ L, x ≤ y → z if and only if x ∧ y ≤ z.
(1′) A Heyting algebra is another name for a bounded relatively pseudocomplemented lattice (i.e. one with 0).

Lemma 3.9. (Balbes and Dwinger [1]) In every Heyting algebra we have:
(1) y ≤ x → y,
(2) x ≤ y iff x → y = 1 (hence x → x = x → 1 = 1 and 0 → y = 1),
(3) (x ∨ y) → z = (x → z) ∧ (y → z),
(4) x ∧ x∗ = 0 (so, if x ≤ x∗ then x∗ = 0),
(5) x ≤ (x → y) → y (for y = 0 we obtain x ≤ x∗∗),
(6) x∗ → x = x implies x = x∗∗,
(7) x = (y → x) ∧ ((y → x) → x).

Theorem 3.10. (Birkhoff [2], Theorem 15) A complete lattice is relatively pseudocomplemented if and only if it satisfies the identity:
\[ \text{(C)} \qquad a \wedge \Big( \bigvee_{i \in I} b_i \Big) = \bigvee_{i \in I} (a \wedge b_i). \]

Proposition 3.11. (Boicescu et al. [3], Cignoli [5], Iorgulescu [9])
(i) (F_n(L), ⊆) is a complete sublattice of F(L),
(ii) if F ∈ F_n(L) and (F_i)_{i∈I} is a family of n-filters of L then
\[ F \wedge \Big( \bigvee_{i \in I} F_i \Big) = \bigvee_{i \in I} (F \wedge F_i) \quad \Big( \text{where } \bigwedge_{i \in I} F_i = \bigcap_{i \in I} F_i \text{ and } \bigvee_{i \in I} F_i = \Big[ \bigcup_{i \in I} F_i \Big)_n \Big), \]
(iii) Fn (L) is an algebraic sublattice of F (L). Definition 3.12. For F1 , F2 ∈ Fn (L) we define: F1 F2 = {F ∈ Fn (L) : F ∩ F1 ⊆ F2 } (see Proposition 3.11, (ii)). Proposition 3.13. If F1 , F2 ∈ Fn (L), then (i) F1 F2 exists and F1 F2 = {a ∈ L : [a)n ∩ F1 ⊆ F2 }, (ii) for every F ∈ Fn (L), F ∩ F1 ⊆ F2 iff F ⊆ F1 F2 . Proof. (i). We have that: F1 F2 = [∪{F ∈ Fn (L) : F ∩ F1 ⊆ F2 })n = {a ∈ L : there exists x ∈ ∪{F ∈ Fn (L) : F ∩ F1 ⊆ F2 } such that ϕ1 (x) ≤ a} = {a ∈ L : there are F ∈ Fn (L) and x ∈ F such that F ∩ F1 ⊆ F2 and a ∈ [x)n }. If [a)n ∩ F1 ⊆ F2 , then taking F = [a)n and x = a we see that a ∈ F1 F2 . Conversely, suppose a ∈ F1 F2 . Let x, F be such that x ∈ F ∈ Fn (L), F ∩ F1 ⊆ F2 and a ∈ [x)n . Then ϕ1 (x) ≤ a, hence ϕ1 (x) = ϕ1 (ϕ1 (x)) ≤ ϕ1 (a). If y ∈ [a)n , then ϕ1 (a) ≤ y, hence ϕ1 (x) ≤ y, proving that y ∈ F . Thus [a)n ∩ F1 ⊆ F ∩ F1 ⊆ F2 . (ii). Let F ∈ Fn (L) such that F ∩ F1 ⊆ F2 and x ∈ F . Since [x)n ⊆ F and [x)n ∩ F1 ⊆ F ∩ F1 ⊆ F2 we deduce that x ∈ F2 , hence F ⊆ F1 F2 (by (i)). Conversely, suppose that F ⊆ F1 F2 and let x ∈ F ∩ F1 . Then x ∈ F1 F2 , hence [x)n ∩ F1 ⊆ F2 . Since x ∈ [x)n ∩ F1 , then x ∈ F2 , that is, F ∩ F1 ⊆ F2 . Remark 3.14. From Proposition 3.13, (ii), we deduce that (Fn (L), ∨, ∧, , {1}) is a Heyting algebra and for F ∈ Fn (L), the pseudocomplement of F is F ∗ = F {1} = {x ∈ L : [x)n ∩ F = {1}}. So, for a ∈ L, [a)∗n = {x ∈ L : [x)n ∩ [a)n = {1}}={x ∈ L : [x ∨ a)n = {1}} = {x ∈ L : x ∨ a = 1}. Thus, if a ∈ F and b ∈ F ∗ , then a ∨ b = 1. Corollary 3.15. (i) If F1 , F2 ∈ Fn (L), then F1∗ ∩ F2∗ = (F1 ∨ F2 )∗ , (ii) if x, y ∈ L, then [x)∗n ∩ [y)∗n = [x ∧ y)∗n . Proof. (i). Straightforward (from Lemma 3.9, (3) because Fn (L) is a Heyting algebra). (ii). Follows from (i) and Proposition 3.6, (ii). Theorem 3.16. The following conditions are equivalent: (i) (Fn (L), ∨, ∧,∗ , {1}, L) is a Boolean algebra, (ii) every n-filter of L is principal and for every x ∈ L there exists y ∈ L such that x ∨ y = 1 and ϕ1 (x) ∧ ϕ1 (y) = 0. 
Proof. (i) ⇒ (ii). Let F ∈ Fn (L); since Fn (L) is a Boolean algebra, then F ∨ F ∗ = L. So, by Corollary 3.5, (iii), for 0 ∈ L, there exist a ∈ F, b ∈ F ∗ such that ϕ1 (a ∧ b) = 0, that is, ϕ1 (a) ∧ ϕ1 (b) = 0.
If x ∈ F , since b ∈ F ∗ , we have b ∨ x = 1, hence ϕ1 (b) ∨ ϕ1 (x) = 1. Since ϕ1 (a) = ϕ1 (a)∧1 = ϕ1 (a)∧(ϕ1 (b)∨ϕ1 (x)) = (ϕ1 (a)∧ϕ1 (b))∨(ϕ1 (a)∧ϕ1 (x)) = 0∨(ϕ1 (a)∧ϕ1 (x)) = ϕ1 (a) ∧ ϕ1 (x) it follows that ϕ1 (a) ≤ ϕ1 (x) ≤ x, hence F ⊆ [a)n , therefore F = [a)n . Now suppose x ∈ L; since Fn (L) is a Boolean algebra, then 0 ∈ L = [x)n ∨ [x)∗n , hence by Corollary 3.5, (ii), there exists y ∈ [x)∗n such that ϕ1 (x ∧ y) = 0, hence ϕ1 (x) ∧ ϕ1 (y) = ϕ1 (x ∧ y) = 0. Since x ∈ [x)n and y ∈ [x)∗n , then x ∨ y = 1 (by Remark 3.14). (ii) ⇒ (i). By Remark 3.14, Fn (L) is a Heyting algebra. To prove that Fn (L) is a Boolean algebra, we must show that for F ∈ Fn (L), F ∗ = {1} only for F = L (see [1], p. 175). By hypothesis, every n-filter of L is principal, so we have a ∈ L such that F = [a)n . Also, by hypothesis, for a ∈ L, there is b ∈ L such that a ∨ b = 1 and ϕ1 (a) ∧ ϕ1 (b) = 0. Then b ∈ F ∗ = {1}, hence b = 1, so ϕ1 (a) = 0. Thus 0 = ϕ1 (a) ∈ F , hence F = L.
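Condition (ii) of Theorem 3.16 can be illustrated by enumerating all n-filters of a small example. The sketch below again assumes the product of two three-element chains with componentwise operations as an LM_3-algebra model (an assumption of this illustration, not an example from the paper); all function names are hypothetical, and the join of two n-filters is computed with the description from Corollary 3.5 (iii).

```python
from itertools import combinations, product

TOP = 2
ELEMS = list(product(range(3), repeat=2))
ONE, ZERO = (TOP, TOP), (0, 0)


def leq(x, y):
    return all(a <= b for a, b in zip(x, y))


def meet(x, y):
    return tuple(min(a, b) for a, b in zip(x, y))


def join(x, y):
    return tuple(max(a, b) for a, b in zip(x, y))


def phi1(x):
    return tuple(TOP if a == TOP else 0 for a in x)


def is_n_filter(F):
    # Nonempty, upward closed, closed under meets and under phi_1.
    return (
        len(F) > 0
        and all(y in F for x in F for y in ELEMS if leq(x, y))
        and all(meet(x, y) in F for x in F for y in F)
        and all(phi1(x) in F for x in F)
    )


def all_n_filters():
    subsets = (frozenset(c) for r in range(1, len(ELEMS) + 1)
               for c in combinations(ELEMS, r))
    return [F for F in subsets if is_n_filter(F)]


def principal(a):
    return frozenset(x for x in ELEMS if leq(phi1(a), x))


def filter_join(F1, F2):
    # Join of two n-filters, following Corollary 3.5 (iii).
    return frozenset(y for y in ELEMS
                     if any(leq(phi1(meet(x1, x2)), y) for x1 in F1 for x2 in F2))


N_FILTERS = all_n_filters()

# Condition (ii): every n-filter is principal, and every x has a partner y
# with x v y = 1 and phi_1(x) ^ phi_1(y) = 0.
every_principal = set(N_FILTERS) == {principal(a) for a in ELEMS}
has_partner = all(
    any(join(x, y) == ONE and meet(phi1(x), phi1(y)) == ZERO for y in ELEMS)
    for x in ELEMS
)

# Condition (i), checked through complements in the lattice of n-filters.
complemented = all(
    any(F & G == frozenset({ONE}) and filter_join(F, G) == frozenset(ELEMS)
        for G in N_FILTERS)
    for F in N_FILTERS
)
print(len(N_FILTERS), every_principal, has_partner, complemented)
```

In this model the n-filters are exactly the four principal ones generated by the elements of the center, so the lattice of n-filters is the four-element Boolean algebra, in agreement with the equivalence stated in the theorem.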
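4 The spectrum of an LMn-algebra

We recall [1, 8] that in a distributive lattice the notions of meet-irreducible and meet-prime element coincide.

Let L be an LM_n-algebra. For the lattice F_n(L) (which by Proposition 3.11, (ii) is distributive) we denote by Spec_n(L) the set of all meet-irreducible (hence meet-prime) elements of F_n(L) (Spec_n(L) is called the spectrum of L) and by Irc_n(L) the set of all completely meet-irreducible elements of F_n(L). By Spec(L) we denote the set of all meet-irreducible elements of the lattice F(L) (hence meet-prime elements, because the lattice F(L) is also distributive). We denote by Max_n(L) (Max(L)) the set of all maximal n-filters (filters) of L.

Proposition 4.1. (Boicescu et al. [3], Cignoli [5], Iorgulescu [9]) For a proper n-filter (filter) F of L the following conditions are equivalent:
(i) F is a prime n-filter (filter),
(ii) F ∈ Spec_n(L) (F ∈ Spec(L)),
(iii) if x, y ∈ L and x ∨ y ∈ F, then x ∈ F or y ∈ F.

Remark 4.2. From Proposition 4.1 we deduce that Spec_n(L) = F_n(L) ∩ Spec(L).

Theorem 4.3. (Boicescu et al. [3]) For F ∈ F_n(L) the following conditions are equivalent:
(i) F is a maximal n-filter,
(ii) F̄ is an ultrafilter of C(L) (where F̄ = F ∩ C(L)),
(iii) F̄ is a prime filter of C(L),
(iv) F is a prime n-filter (hence Max_n(L) = Spec_n(L)),
(v) for every i ∈ J and x ∈ L, ϕ_i(x) ∈ F or Nϕ_i(x) ∈ F,
(vi) F is a minimal prime filter.

Remark 4.4. (Boicescu et al. [3], Remark 5.2.4, p. 266) (a) Any minimal prime filter is necessarily an n-filter. (b) Prime (or, equivalently, maximal) n-filters coincide with minimal prime filters.

Theorem 4.5. (Boicescu et al. [3], p. 267) Suppose F ∈ F_n(L), ∅ ≠ S ⊆ L, F ∩ S = ∅ and x, y ∈ S imply x ∨ y ∈ S. Then there is P ∈ Spec_n(L) such that P ∩ S = ∅ and F ⊆ P.

Corollary 4.6. If F ∈ F_n(L) is proper and a ∈ L\F, then there is F′ ∈ Spec_n(L) such that F ⊆ F′ and a ∉ F′. In particular, for F = {1} we deduce that for any a ∈ L, a ≠ 1, there is F_a ∈ Spec_n(L) such that a ∉ F_a.

Proof. By Theorem 4.5, taking S = (a]_n = {x ∈ L : x ≤ ϕ_{n−1}(a)} (if a ∈ C(L) then (a]_n = (a]).

Proposition 4.7. For a proper n-filter F of L the following conditions are equivalent:
(i) F is prime,
(ii) for any a, b ∈ L, a ⇝ b ∈ F or b ⇝ a ∈ F.

Proof. (i) ⇒ (ii). Clear, by Proposition 4.1, (iii), since (a ⇝ b) ∨ (b ⇝ a) = 1 ∈ F (by (c11)).
(ii) ⇒ (i). Let a, b ∈ L be such that a ∨ b ∈ F and suppose that a ⇝ b ∈ F. By (c13) we have ((a ⇝ b) ⇝ b) ∧ ((b ⇝ a) ⇝ a) = ϕ_1(a ∨ b) ∨ (a ∧ b). Since a ∨ b ∈ F, then ϕ_1(a ∨ b) ∈ F, hence ϕ_1(a ∨ b) ∨ (a ∧ b) ∈ F. Thus (a ⇝ b) ⇝ b ∈ F and (b ⇝ a) ⇝ a ∈ F. By Proposition 2.10 we deduce that b ∈ F. If we suppose b ⇝ a ∈ F, then a ∈ F, hence F is prime.

Corollary 4.8. If F ∈ Spec_n(L), then for any a ∈ F there is b ∈ L\F such that a ∨ b = 1.

Proof. Let F ∈ Spec_n(L) and a ∈ F. We consider the set I = {x ∈ L : there is b ∈ L\F such that a ∨ b ≥ x}. If b ∈ L\F, since a ∨ b ≥ b, then b ∈ I, so L\F ⊆ I. Moreover, a ∈ I because a ∨ 0 ≥ a and 0 ∈ L\F. We shall prove that I is an ideal of the lattice L. Let x, y ∈ L be such that y ∈ I and x ≤ y. Thus, there is b ∈ L\F such that a ∨ b ≥ y ≥ x, hence a ∨ b ≥ x, so x ∈ I. If x, y ∈ I then there are b, c ∈ L\F such that a ∨ b ≥ x and a ∨ c ≥ y. If we suppose that b ∨ c ∈ F we get b ∈ F or c ∈ F because F is a prime n-filter, a contradiction. Thus b ∨ c ∈ L\F and a ∨ (b ∨ c) ≥ x ∨ y, so x ∨ y ∈ I, hence I is an ideal of the lattice L.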
Remark 4.4. (Boicescu et al. [3], Remark 5.2.4, p. 266) (a) Any minimal prime filter is necessarily an n-filter. (b) Prime (or, equivalently, maximal) n-filters coincide with minimal prime filters.

Theorem 4.5. (Boicescu et al. [3], p. 267) Suppose F ∈ Fn(L), ∅ ≠ S ⊆ L, F ∩ S = ∅ and x, y ∈ S imply x ∨ y ∈ S. Then there is P ∈ Specn(L) such that P ∩ S = ∅ and F ⊆ P.

Corollary 4.6. If F ∈ Fn(L) is proper and a ∈ L\F, then there is F′ ∈ Specn(L) such that F ⊆ F′ and a ∉ F′. In particular, for F = {1} we deduce that for any a ∈ L, a ≠ 1, there is Fa ∈ Specn(L) such that a ∉ Fa.

Proof. By Theorem 4.5, taking S = (a]n = {x ∈ L : x ≤ ϕn−1(a)} (if a ∈ C(L) then (a]n = (a]).

Proposition 4.7. For a proper n-filter F of L the following conditions are equivalent: (i) F is prime, (ii) for any a, b ∈ L, a ; b ∈ F or b ; a ∈ F.

Proof. (i) ⇒ (ii). Clearly, by Proposition 4.1, (iii), since (a ; b) ∨ (b ; a) = 1 ∈ F (by (c11)). (ii) ⇒ (i). Let a, b ∈ L such that a ∨ b ∈ F and suppose that a ; b ∈ F. By (c13) we have ((a ; b) ; b) ∧ ((b ; a) ; a) = ϕ1(a ∨ b) ∨ (a ∧ b). Since a ∨ b ∈ F, then ϕ1(a ∨ b) ∈ F, hence ϕ1(a ∨ b) ∨ (a ∧ b) ∈ F. Thus (a ; b) ; b ∈ F and (b ; a) ; a ∈ F. By Proposition 2.10 we deduce that b ∈ F. If we suppose b ; a ∈ F, then a ∈ F, hence F is prime.

Corollary 4.8. If F ∈ Specn(L), then for any a ∈ F there is b ∈ L\F such that a ∨ b = 1.

Proof. Let F ∈ Specn(L) and a ∈ F. We consider the set I = {x ∈ L : there is b ∈ L\F such that a ∨ b ≥ x}. If b ∈ L\F, since a ∨ b ≥ b then b ∈ I, so L\F ⊆ I. Moreover, a ∈ I because a ∨ 0 ≥ a and 0 ∈ L\F. We shall prove that I is an ideal of the lattice L. Let x, y ∈ L such that y ∈ I and x ≤ y. Thus, there is b ∈ L\F such that a ∨ b ≥ y ≥ x, hence a ∨ b ≥ x, so x ∈ I. If x, y ∈ I then there are b, c ∈ L\F such that a ∨ b ≥ x and a ∨ c ≥ y. If we suppose that b ∨ c ∈ F we get b ∈ F or c ∈ F because F is a prime n-filter, a contradiction. Thus b ∨ c ∈ L\F and a ∨ (b ∨ c) ≥ x ∨ y, so x ∨ y ∈ I, hence I is an ideal of the lattice L.

Now suppose 1 ∉ I. It follows that {1} ∩ I = ∅, so, by Theorem 4.5, there is a prime n-filter F′ such that I ∩ F′ = ∅. Since L\F ⊆ I, we get F′ ⊆ F. But F is minimal prime (by Theorem 4.3) and F′ is prime, so F′ = F. On the other hand a ∈ I, so a ∉ F′. We get a ∈ F\F′, which contradicts the fact that F′ = F. Thus, our assumption that 1 ∉ I is false. We conclude that 1 ∈ I, hence I = L and our proof is finished.
Proposition 4.9. For a proper n-filter F of L the following conditions are equivalent: (i) F ∈ Specn(L), (ii) for every x, y ∈ L\F, there is z ∈ L\F such that ϕ1(x) ≤ z and ϕ1(y) ≤ z.

Proof. (i) ⇒ (ii). Let F ∈ Specn(L) and x, y ∈ L\F. If, on the contrary, every a ∈ L with ϕ1(x) ≤ a and ϕ1(y) ≤ a belongs to F, then, since ϕ1(x), ϕ1(y) ≤ x ∨ y, we deduce x ∨ y ∈ F. Hence x ∈ F or y ∈ F, a contradiction. (ii) ⇒ (i). We suppose by contrary that there exist F1, F2 ∈ Fn(L) such that F1 ∩ F2 = F and F ≠ F1, F ≠ F2. So, there exist x ∈ F1\F and y ∈ F2\F. By hypothesis there is z ∈ L\F such that ϕ1(x) ≤ z and ϕ1(y) ≤ z. Clearly ϕ1(x) ∈ F1 and ϕ1(y) ∈ F2, so we deduce that z ∈ F1 ∩ F2 = F, a contradiction.

Corollary 4.10. For a proper n-filter F of L the following conditions are equivalent: (i) F ∈ Specn(L), (ii) if x, y ∈ L and [x)n ∩ [y)n ⊆ F, then x ∈ F or y ∈ F.

Proof. (i) ⇒ (ii). Let x, y ∈ L such that [x)n ∩ [y)n ⊆ F. But F is meet-prime, hence [x)n ⊆ F or [y)n ⊆ F, that is, x ∈ F or y ∈ F. (ii) ⇒ (i). Let x, y ∈ L such that x ∨ y ∈ F. Then [x ∨ y)n ⊆ F. Since [x ∨ y)n = [x)n ∩ [y)n (by Proposition 3.6, (ii)) we deduce that [x)n ∩ [y)n ⊆ F, hence x ∈ F or y ∈ F. Therefore F ∈ Specn(L).

Corollary 4.11. For a proper n-filter F of L the following conditions are equivalent: (i) F ∈ Specn(L), (ii) for every x, y ∈ L/F, x ≠ 1, y ≠ 1, there is z ∈ L/F, z ≠ 1 such that ϕ̃1(x), ϕ̃1(y) ≤ z.

Proof. (i) ⇒ (ii). If x = a/F, y = b/F, x ≠ 1, y ≠ 1, then a ∉ F and b ∉ F. By Proposition 4.9, there is c ∉ F such that ϕ1(a), ϕ1(b) ≤ c. If we consider z = c/F then z ≠ 1 and ϕ̃1(x), ϕ̃1(y) ≤ z (by Proposition 2.13, (ii)). (ii) ⇒ (i). Let a, b ∈ L\F and x = a/F, y = b/F ∈ L/F. Then x ≠ 1, y ≠ 1 (in L/F). By hypothesis there is z = c/F ≠ 1 (that is, c ∉ F) such that ϕ̃1(x), ϕ̃1(y) ≤ z. By Proposition 2.13, (iii), we deduce that ϕ1(a) ; c, ϕ1(b) ; c ∈ F. If we consider d = (ϕ1(b) ; c) ; ((ϕ1(a) ; c) ; c) = ϕ1(a) ∨ ϕ1(b) ∨ c (by (c14)) then ϕ1(a), ϕ1(b) ≤ d and d ∉ F (by Proposition 2.10). By Proposition 4.9 we deduce that F ∈ Specn(L).

Theorem 4.12. For a proper n-filter F ∈ Fn(L) the following are equivalent: (i) F ∈ Specn(L), (ii) for every D ∈ Fn(L), D → F = F or D ⊆ F.

Proof. (i) ⇒ (ii). Let F ∈ Specn(L). Since Fn(L) is a Heyting algebra, for D ∈ Fn(L) we have F = (D → F) ∩ ((D → F) → F) (by Lemma 3.9, (7)) and so F = D → F or F = (D → F) → F. If F = (D → F) → F then D ⊆ F (by Lemma 3.9, (5)).
(ii) ⇒ (i). Let F1, F2 ∈ Fn(L) such that F1 ∩ F2 = F. Then F1 ⊆ F2 → F (by Definition 3.12) and so, if F2 ⊆ F then F = F2 and if F2 → F = F, then F = F1. Therefore F ∈ Specn(L).

We recall that if L is a Heyting algebra, then two subsets associated with L (see [1], p. 153) are Rg(L) = {x ∈ L : x∗∗ = x} and D(L) = {x ∈ L : x∗ = 0}. The elements of Rg(L) are called regular and those of D(L) dense. Note that {0, 1} ⊆ Rg(L), 1 ∈ D(L), D(L) is a filter in L and Rg(L) is a Boolean algebra by the ordering of L (see [1], p. 157).

Corollary 4.13. Specn(L) ⊆ D(Fn(L)) ∪ Rg(Fn(L)).

Proof. Let F ∈ Specn(L); then by Theorem 4.12, F∗ ⊆ F or F∗ → F = F. Since Fn(L) is a Heyting algebra, then F∗ = {1} (by Lemma 3.9, (4)) or F = F∗∗ (by Lemma 3.9, (6)), hence F ∈ D(Fn(L)) ∪ Rg(Fn(L)).

Remark 4.14. (Boicescu et al. [3]) From Corollary 4.6 we deduce that for every proper F ∈ Fn(L), F = ∩{F′ ∈ Specn(L) : F ⊆ F′} and ∩{F′ ∈ Specn(L)} = {1}. Relative to the uniqueness of deductive systems as intersection of primes we have:

Theorem 4.15. If every F ∈ Fn(L) has a unique representation as an intersection of elements of Specn(L), then (Fn(L), ∨, ∧, ∗, {1}, L) is a Boolean algebra.

Proof. Let F ∈ Fn(L) and F′ = ∩{M ∈ Specn(L) : F ⊈ M} ∈ Fn(L). By Remark 4.14, F ∩ F′ = ∩{M ∈ Specn(L)} = {1}; if F ∨ F′ ≠ L, then by Corollary 4.6 there exists F″ ∈ Specn(L) such that F ∨ F′ ⊆ F″ and F″ ≠ L. Consequently, F′ has two representations F′ = ∩{M ∈ Specn(L) : F ⊈ M} = F″ ∩ (∩{M ∈ Specn(L) : F ⊈ M}), which is a contradiction. Therefore F ∨ F′ = L and so Fn(L) is a Boolean lattice.

Definition 4.16. Let F ∈ Fn(L) and a ∈ L. We say that F is maximal relative to a if a ∉ F and F is maximal with this property.

Proposition 4.17. Let F ∈ Fn(L) and a ∈ L\F. Then there exists an n-filter Fa maximal relative to a and F ⊆ Fa.

Proof. By a well-known argument based on Zorn's lemma (see [7]).
Theorem 4.18. For F ∈ Fn(L), F ≠ L the following conditions are equivalent: (i) F ∈ Ircn(L), (ii) there is a ∈ L such that F is maximal relative to a.

Proof. (i) ⇒ (ii). Let F ∈ Ircn(L). For every a ∈ L\F let Fa be the filter constructed in Proposition 4.17. Then F = ∩{Fa : a ∈ L\F} (since F ⊆ Fa and a ∉ Fa for every a ∈ L\F), hence there is a ∈ L\F such that F = Fa. (ii) ⇒ (i). Let F ∈ Fn(L) be maximal relative to a and suppose F = ∩{Fi : i ∈ I} with Fi ∈ Fn(L) for every i ∈ I. Since a ∉ F there is j ∈ I such that a ∉ Fj (F ⊆ Fj). By the maximality of F we deduce that F = Fj, therefore F ∈ Ircn(L).

Theorem 4.19. Let F ∈ Fn(L), F ≠ L and a ∈ L\F. Then the following conditions are equivalent: (i) F is maximal relative to a, (ii) for every x ∈ L\F, x ; a ∈ F.

Proof. (i) ⇒ (ii). Let x ∈ L\F and F(x) = F ∨ [x)n = {y ∈ L : ϕ1(f ∧ x) ≤ y for some f ∈ F}. Then F ⊂ F(x), hence the maximality of F implies a ∈ F(x), that is, ϕ1(f ∧ x) ≤ a for some f ∈ F. Thus ϕ1(f) ∧ ϕ1(x) ≤ a, hence (by Corollary 2.7) ϕ1(f) ≤ ϕ1(x) ; a. Since f ∈ F then ϕ1(f) ∈ F, so x ; a = ϕ1(x) ; a ∈ F (by (c8)). (ii) ⇒ (i). If by contrary there is F′ ∈ Fn(L), F′ ≠ L such that a ∉ F′ and F ⊂ F′, then there is x0 ∈ F′ such that x0 ∉ F. By hypothesis we have x0 ; a ∈ F ⊆ F′, hence a ∈ F′, a contradiction.

Corollary 4.20. Let F ∈ Fn(L), F ≠ L. If the set (L/F)\{1} has greatest element, then F ∈ Ircn(L).

Proof. Let p = a/F be the greatest element of (L/F)\{1} (hence a ∉ F) and x = b/F ∈ (L/F)\{1} (that is, b ∉ F). By hypothesis x ≤ p, hence b ; a ∈ F, that is, F ∈ Ircn(L) (by Theorem 4.19).
Acknowledgements The authors wish to express their appreciation for several excellent suggestions for improvements in this paper made by the referees.
References
[1] R. Balbes and Ph. Dwinger: Distributive Lattices, University of Missouri Press, 1974.
[2] G. Birkhoff: Lattice theory, (3rd Edition, 2nd Printing), Amer. Math. Soc., Colloquium Publications XXV, 1973.
[3] V. Boicescu, A. Filipoiu, G. Georgescu and S. Rudeanu: Łukasiewicz-Moisil Algebras, North Holland, 1991.
[4] R. Cignoli: "Boolean multiplicative closures I y II", P. Jpn. Acad., Vol. 42, (1965), pp. 1168–1174.
[5] R. Cignoli: Algebras de Moisil de orden n, Thesis (PhD), Universidad National del Sur, Bahía Blanca, 1969.
[6] R. Cignoli: Moisil Algebras, Notas de Lógica Matemática, 27, Univ. National del Sur, Bahía Blanca, 1970.
[7] G. Georgescu and M. Ploščica: "Values and minimal spectrum of an algebric lattice", Math. Slovaca, Vol. 52, (2002), pp. 247–253.
[8] G. Grätzer: Lattice theory, W. H. Freeman and Company, San Francisco, 1979.
[9] A. Iorgulescu: (1 + θ)-valued Łukasiewicz-Moisil algebras with negation, Thesis (PhD), Univ. of Bucharest, 1984.
[10] Gr.C. Moisil: "Recherches sur les logiques non-chrysippiennes", An. Sci. Univ. Jassy, Vol. 26, (1940), pp. 195–232.
[11] Gr.C. Moisil: "Applicationi dell'algebra alle calculatrici moderne", In: Atti 2a Reunione del Groupement des Math. d'Expression Latine, 26.IX-3.X.1961, Ed. Cremonese, Roma.
[12] Gr.C. Moisil: "Lukasiewiczian algebras", Preprint: Computing Center, Univ. Bucharest, 1968, pp. 311–324.
[13] A. Monteiro: "L'arithmétique des filtres et les espaces topologiques", In: Segundo Symposium Americano de Mat., Centro de Cooperatión Científica de la UNESCO para América Latina, Montevideo, 1954, pp. 129–162.
[14] A. Monteiro: "Construction des algèbres de Łukasiewicz trivalentes dans les algèbres de Boole monadiques", Math. Japon. Vol. 12, (1967), pp. 1–23.
DOI: 10.2478/s11533-007-0021-5 Research article CEJM 5(3) 2007 484–492
Distributive implication groupoids Ivan Chajda∗, Radomir Halaš† Department of Algebra and Geometry, Palacký University Olomouc, Tomkova 40, 779 00 Olomouc, Czech Republic
Received 26 June 2006; accepted 7 May 2007 Abstract: We introduce a concept of implication groupoid which is an essential generalization of the implication reduct of intuitionistic logic, i.e. a Hilbert algebra. We prove several connections among ideals, deductive systems and congruence kernels which even coincide whenever our implication groupoid is distributive. © Versita Warsaw and Springer-Verlag Berlin Heidelberg. All rights reserved. Keywords: (distributive) implication groupoid, ideal, deductive system, congruence kernel, left distributivity MSC (2000): 08A30, 06F35, 20N02
1
Introduction
The concept of Hilbert algebra as an algebraic counterpart of intuitionistic logic was introduced in the 1950s by L. Henkin and T. Skolem. Recall that a Hilbert algebra is an algebra H = (H, •, 1) of type (2, 0) satisfying the axioms
(H1) x • (y • x) = 1
(H2) (x • (y • z)) • ((x • y) • (x • z)) = 1
(H3) x • y = 1 and y • x = 1 imply x = y.
It can be shown elementarily (see e.g. [1]) that (H2) can be replaced by two simpler axioms
(LD) x • (y • z) = (x • y) • (x • z) (left distributivity)
(E) x • (y • z) = y • (x • z) (exchange).
I. Chajda et al. / Central European Journal of Mathematics 5(3) 2007 484–492
However, in certain considerations we need not assume (H3) and/or (E). This motivated us to introduce an essentially simpler concept:

Definition 1.1. An algebra A = (A, •, 1) of type (2, 0) is called an implication groupoid if it satisfies the identities
(IG1) x • x = 1
(IG2) 1 • x = x.
If A, moreover, satisfies (LD), we call it a distributive implication groupoid.

A natural first question concerns the independence of the axioms (LD) and (E), i.e. whether there exist distributive implication groupoids which do not satisfy (E). We can answer this question in the positive by the following

Example 1.2. The five element groupoid given by the table

 • | 1 a b c d
---+----------
 1 | 1 a b c d
 a | 1 1 b b 1
 b | 1 a 1 1 d
 c | 1 a 1 1 d
 d | 1 1 c c 1

is clearly an implication groupoid which does not satisfy (E) since d • (a • c) = d • b = c ≠ b = a • c = a • (d • c). The reader can verify by tedious computation that this implication groupoid is distributive, but it is not a Hilbert algebra.
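The "tedious computation" of Example 1.2 can be delegated to a brute-force check over all triples. The following is a minimal sketch, not part of the original paper; the names `ROWS`, `op`, `ld_failures` and so on are illustrative, and the table is transcribed from Example 1.2 under the assumption that rows index the left argument.

```python
from itertools import product

# Operation table of Example 1.2; TABLE[x][y] stands for x • y.
# Assumption: rows index the left argument, columns the right one.
ELEMS = "1abcd"
ROWS = {"1": "1abcd", "a": "11bb1", "b": "1a11d", "c": "1a11d", "d": "11cc1"}
TABLE = {x: dict(zip(ELEMS, ROWS[x])) for x in ELEMS}


def op(x, y):
    """Return x • y."""
    return TABLE[x][y]


def is_implication_groupoid():
    """(IG1) x • x = 1 and (IG2) 1 • x = x for every x."""
    return all(op(x, x) == "1" and op("1", x) == x for x in ELEMS)


def ld_failures():
    """Triples violating (LD): x • (y • z) = (x • y) • (x • z)."""
    return [(x, y, z) for x, y, z in product(ELEMS, repeat=3)
            if op(x, op(y, z)) != op(op(x, y), op(x, z))]


def exchange_failures():
    """Triples violating (E): x • (y • z) = y • (x • z)."""
    return [(x, y, z) for x, y, z in product(ELEMS, repeat=3)
            if op(x, op(y, z)) != op(y, op(x, z))]


def h3_failures():
    """Pairs of distinct elements with x • y = 1 = y • x, violating (H3)."""
    return [(x, y) for x, y in product(ELEMS, repeat=2)
            if x != y and op(x, y) == "1" and op(y, x) == "1"]


if __name__ == "__main__":
    print("implication groupoid:", is_implication_groupoid())
    print("(LD) violations:", ld_failures())
    print("(E) violations:", exchange_failures())
    print("(H3) violations:", h3_failures())
```

The triple (d, a, c) quoted in the example should appear among the (E) violations, and any pair returned by `h3_failures` witnesses that the groupoid is not a Hilbert algebra.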
2
Induced quasiorders
In every implication groupoid, one can introduce the so called induced relation ≤ by the setting x ≤ y if and only if x • y = 1. Clearly, this relation is reflexive. We can state several basic properties of implication groupoids: Lemma 2.1. Let A = (A, •, 1) be a distributive implication groupoid. Then A satisfies the identities x • 1 = 1 and x • (y • x) = 1.
Moreover, the induced relation ≤ is a quasiorder (i.e. a reflexive and transitive relation) on A and the following relationships are satisfied:
(i) x ≤ 1,
(ii) x ≤ y • x,
(iii) x ≤ (x • y) • y,
(iv) 1 ≤ x implies x = 1,
(v) y • z ≤ (x • y) • (x • z),
(vi) x ≤ y implies y • z ≤ x • z,
(vii) x • (y • z) ≤ y • (x • z),
(viii) x • y ≤ (y • z) • (x • z).

Proof. Evidently, x • 1 = x • (x • x) = (x • x) • (x • x) = 1 and x • (y • x) = (x • y) • (x • x) = (x • y) • 1 = 1 by (LD). Now, suppose x ≤ y and y ≤ z. Then x • y = 1 = y • z and x • z = 1 • (x • z) = (x • y) • (x • z) = x • (y • z) = x • 1 = 1, giving x ≤ z, i.e. ≤ is a quasiorder on A. Further, x • 1 = 1 gives x ≤ 1 and the proved identity x • (y • x) = 1 implies x ≤ y • x. One can compute x • [(x • y) • y] = [x • (x • y)] • (x • y) = [(x • x) • (x • y)] • (x • y) = [1 • (x • y)] • (x • y) = (x • y) • (x • y) = 1, proving x ≤ (x • y) • y. Further, 1 ≤ x yields x = 1 • x = 1. Now, using (LD), we derive (y • z) • [(x • y) • (x • z)] = (y • z) • [x • (y • z)] = [(y • z) • x] • [(y • z) • (y • z)] = [(y • z) • x] • 1 = 1, showing y • z ≤ (x • y) • (x • z). Suppose x ≤ y. Then x • y = 1 and we have (x • y) • [(y • z) • (x • z)] = [(x • y) • (y • z)] • [(x • y) • (x • z)] = [(x • y) • (y • z)] • [x • (y • z)], i.e. if x • y = 1 we obtain (y • z) • (x • z) = (y • z) • [x • (y • z)] = 1 by the previously proved identity. We conclude y • z ≤ x • z. For the next condition, we apply y ≤ x • y and the just proved relationship to get x • (y • z) = (x • y) • (x • z) ≤ y • (x • z). Analogously, we have y • z ≤ x • (y • z), thus (y • z) • [x • (y • z)] = 1. Hence, 1 = (y • z) • [x • (y • z)] = (y • z) • [(x • y) • (x • z)] ≤ (x • y) • [(y • z) • (x • z)], which proves x • y ≤ (y • z) • (x • z).

Lemma 2.2. Let A = (A, •, 1) be a distributive implication groupoid and ≤ its induced quasiorder. If ≤ is an order on A then A satisfies the exchange axiom (E).

Proof. Suppose that the quasiorder ≤ is also antisymmetric. Applying twice the relationship x • (y • z) ≤ y • (x • z), we obtain x • (y • z) ≤ y • (x • z) ≤ x • (y • z), and hence x • (y • z) = y • (x • z), proving (E).
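The relationships (i) to (viii) of Lemma 2.1 and the role of antisymmetry in Lemma 2.2 can be illustrated on the five element groupoid of Example 1.2. The sketch below is an illustration under the assumption that the table is transcribed with rows as left arguments; the helper names (`leq`, `failed_relationships`, `antisymmetry_failures`) are hypothetical and not from the paper.

```python
from itertools import product

# Table of Example 1.2 (rows: left argument); TABLE[x][y] stands for x • y.
ELEMS = "1abcd"
ROWS = {"1": "1abcd", "a": "11bb1", "b": "1a11d", "c": "1a11d", "d": "11cc1"}
TABLE = {x: dict(zip(ELEMS, ROWS[x])) for x in ELEMS}


def op(x, y):
    return TABLE[x][y]


def leq(x, y):
    """Induced relation: x ≤ y if and only if x • y = 1."""
    return op(x, y) == "1"


def failed_relationships():
    """Names of the relationships (i)-(viii) of Lemma 2.1 violated somewhere."""
    failed = set()
    for x, y, z in product(ELEMS, repeat=3):
        checks = {
            "i": leq(x, "1"),
            "ii": leq(x, op(y, x)),
            "iii": leq(x, op(op(x, y), y)),
            "iv": (not leq("1", x)) or x == "1",
            "v": leq(op(y, z), op(op(x, y), op(x, z))),
            "vi": (not leq(x, y)) or leq(op(y, z), op(x, z)),
            "vii": leq(op(x, op(y, z)), op(y, op(x, z))),
            "viii": leq(op(x, y), op(op(y, z), op(x, z))),
        }
        failed |= {name for name, ok in checks.items() if not ok}
    return failed


def is_quasiorder():
    """Reflexivity and transitivity of ≤."""
    reflexive = all(leq(x, x) for x in ELEMS)
    transitive = all(leq(x, z) for x, y, z in product(ELEMS, repeat=3)
                     if leq(x, y) and leq(y, z))
    return reflexive and transitive


def antisymmetry_failures():
    """Pairs x != y with x ≤ y and y ≤ x."""
    return [(x, y) for x, y in product(ELEMS, repeat=2)
            if x != y and leq(x, y) and leq(y, x)]


if __name__ == "__main__":
    print("violated relationships:", failed_relationships())
    print("quasiorder:", is_quasiorder())
    print("antisymmetry fails for:", antisymmetry_failures())
```

Since this groupoid does not satisfy (E), Lemma 2.2 predicts that ≤ cannot be antisymmetric here, which is what the last function is meant to exhibit.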
Corollary 2.3. Let A be a distributive implication groupoid. The induced quasiorder ≤ is an order if and only if A is a Hilbert algebra.

Proof. By [3], the induced relation on a Hilbert algebra A = (A, •, 1) is an order on A. The converse follows by Lemma 2.2 and the fact that antisymmetry of ≤ is equivalent to the axiom (H3).

The concept of implication algebra was introduced by J.C. Abbott [1] to describe properties of the logical connective "implication" in classical logic. Recall that a groupoid A = (A, •) is an implication algebra whenever it satisfies the axioms
(I1) (x • y) • x = x (contraction)
(I2) (x • y) • y = (y • x) • x (quasi-commutativity)
(I3) x • (y • z) = y • (x • z) (exchange)
(see [1] or [2]). It was shown in [1] that it satisfies also x • x = y • y, i.e. x • x is an algebraic constant which is denoted by 1. It is well-known that every implication algebra is also a Hilbert algebra (and hence an implication groupoid). Then the induced relation ≤ on an implication algebra A is an order. However, we can show that the axiom (I2) is enough to ensure this property: Lemma 2.4. Let A = (A, •, 1) be a distributive implication groupoid satisfying (I2). Then the induced relation ≤ is an order on A. Proof. By Lemma 2.1, ≤ is a quasiorder on A thus we need only to show antisymmetry of ≤. Suppose a, b ∈ A with a ≤ b and b ≤ a. Then a • b = 1, b • a = 1 and, by (I2), we have a = 1 • a = (b • a) • a = (a • b) • b = 1 • b = b.
Theorem 2.5. A distributive implication groupoid is an implication algebra if and only if it satisfies the axiom (I2).
Proof. Let A = (A, •, 1) be a distributive implication groupoid and suppose that A satisfies (I2). By Lemma 2.4, the induced relation ≤ is an order on A and, by Corollary 2.3, A is a Hilbert algebra. By the Theorem and Corollary in [5], every Hilbert algebra satisfying (I2) is an implication algebra. The converse assertion is trivial.
Remark 2.6. Theorem 2.5 is a strengthening of the main result of [5] saying that every Hilbert algebra satisfying (I2), the so called positive Hilbert algebra, is in fact an implication algebra.
3
Ideals and deductive systems
The concept of ideal for Hilbert algebras coincides with that of implication algebras and was introduced earlier by the authors. The concept of deductive systems for Hilbert algebras was introduced by A. Diego [3]. It was proved by W. Dudek [4] that these concepts coincide in every Hilbert algebra. We are interested in the conditions under which the same is valid for implication groupoids, where the formal definitions remain unchanged:

Definition 3.1. Let A = (A, •, 1) be an implication groupoid. A subset I ⊆ A is called an ideal of A whenever
(1) 1 ∈ I
(2) x ∈ A, y ∈ I imply x • y ∈ I
(3) x ∈ A, y1, y2 ∈ I imply (y2 • (y1 • x)) • x ∈ I.
Remark 3.2. The following assertion is useful: (*) If I is an ideal of an implication groupoid A = (A, •, 1) and a ∈ I, x ∈ A then (a • x) • x ∈ I. It follows immediately by (3) when one takes y1 = a and y2 = 1. Definition 3.3. Let A = (A, •, 1) be an implication groupoid. A subset D ⊆ A is called a deductive system of A whenever (1) 1 ∈ D (4) x ∈ D and x • y ∈ D imply y ∈ D. Of course, when the binary operation of A is considered to be a propositional logical connective ”implication” then (4) is an algebraic expression for Modus Ponens, thus deductive systems are just the sets of true values containing 1 (i.e. the ”highest” true) and closed under deductive derivation. Lemma 3.4. Let A be an implication groupoid. Then every ideal of A is a deductive system of A. Proof. We need to prove (4). For this, let A = (A, •, 1) be an implication groupoid, I be an ideal of A and x ∈ I, x • y ∈ I for y ∈ A. Put a1 = x • y, a2 = (x • y) • y. By (*), a2 ∈ I and a1 ∈ I by the assumption. Applying (3), we conclude y = 1 • y = [((x • y) • y) • ((x • y) • y)] • y = (a2 • (a1 • y)) • y ∈ I. We shall show that the converse of Lemma 3.4 does not hold in general. Example 3.5. Consider the following (non-distributive) implication groupoid:

 • | 1 a b
---+------
 1 | 1 a b
 a | a 1 b
 b | a b 1

One can easily verify that {1, a} is its deductive system which is not an ideal since b • a = b ∉ {1, a}. However, for distributive implication groupoids we have

Theorem 3.6. In every distributive implication groupoid, ideals and deductive systems coincide.

Proof. With respect to Lemma 3.4, we need only to show that every deductive system of a distributive implication groupoid A = (A, •, 1) is an ideal of A. Thus we need to verify the conditions (2) and (3). Let D be a deductive system of A and a ∈ D, x ∈ A. Thus, by Lemma 2.1, a • (x • a) = 1 ∈ D and, applying (4), we conclude x • a ∈ D, i.e. D satisfies (2). Now, suppose a1, a2 ∈ D and x ∈ A. Applying Lemma 2.1 and (LD), we derive a2 • [(a2 • (a1 • x)) • x] = [a2 • (a2 • (a1 • x))] • (a2 • x) = [(a2 • a2) • (a2 • (a1 • x))] • (a2 • x) = (a2 • (a1 • x)) • (a2 • x) = a2 • ((a1 • x) • x) ≥ (a1 • x) • x ≥ a1. Hence a1 • (a2 • [(a2 • (a1 • x)) • x]) = 1 ∈ D and, applying (4) twice, we infer a2 • [(a2 • (a1 • x)) • x] ∈ D and, finally, (a2 • (a1 • x)) • x ∈ D, proving (3).
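Both the counterexample of Example 3.5 and the statement of Theorem 3.6 can be checked by enumeration on small tables. The following sketch is an illustration only: the function names are hypothetical, the three element table is transcribed from Example 3.5 and the five element table from Example 1.2, in both cases assuming that rows index the left argument.

```python
from itertools import combinations


def make_op(elems, rows):
    """Build x • y from a table given row by row (rows: left argument)."""
    table = {x: dict(zip(elems, rows[x])) for x in elems}
    return lambda x, y: table[x][y]


def is_deductive_system(D, elems, op):
    """Conditions (1) and (4) of Definition 3.3."""
    return "1" in D and all(y in D for x in D for y in elems if op(x, y) in D)


def is_ideal(I, elems, op):
    """Conditions (1), (2) and (3) of Definition 3.1."""
    if "1" not in I:
        return False
    cond2 = all(op(x, y) in I for x in elems for y in I)
    cond3 = all(op(op(y2, op(y1, x)), x) in I
                for x in elems for y1 in I for y2 in I)
    return cond2 and cond3


def subsets_with_one(elems):
    """All subsets of elems that contain the constant 1."""
    rest = [e for e in elems if e != "1"]
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            yield frozenset(combo) | {"1"}


# Example 3.5 (non-distributive) and Example 1.2 (distributive).
E35 = "1ab"
OP35 = make_op(E35, {"1": "1ab", "a": "a1b", "b": "ab1"})
E12 = "1abcd"
OP12 = make_op(E12, {"1": "1abcd", "a": "11bb1", "b": "1a11d",
                     "c": "1a11d", "d": "11cc1"})

IDEALS_12 = {S for S in subsets_with_one(E12) if is_ideal(S, E12, OP12)}
DEDUCTIVE_12 = {S for S in subsets_with_one(E12)
                if is_deductive_system(S, E12, OP12)}

if __name__ == "__main__":
    D = {"1", "a"}
    print("Example 3.5, {1, a} deductive system:",
          is_deductive_system(D, E35, OP35))
    print("Example 3.5, {1, a} ideal:", is_ideal(D, E35, OP35))
    print("Example 1.2, ideals == deductive systems:",
          IDEALS_12 == DEDUCTIVE_12)
```

Enumerating all subsets is only feasible for very small carriers, which is sufficient for the examples discussed here.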
4
Congruence Kernels
Let A = (A, •, 1) be an implication groupoid. Denote by Con A its congruence lattice. If θ ∈ Con A, the subset [1]θ = {x ∈ A; ⟨x, 1⟩ ∈ θ} of A is called the congruence kernel of θ.

Lemma 4.1. Let A = (A, •, 1) be an implication groupoid, let θ ∈ Con A. Then the kernel [1]θ is a deductive system of A.

Proof. Suppose x ∈ [1]θ and x • y ∈ [1]θ for θ ∈ Con A. Then ⟨x, 1⟩ ∈ θ and hence also ⟨x • y, y⟩ = ⟨x • y, 1 • y⟩ ∈ θ. However, x • y ∈ [1]θ gives ⟨x • y, 1⟩ ∈ θ. Using symmetry and transitivity of θ, we obtain ⟨y, 1⟩ ∈ θ, proving y ∈ [1]θ. Since 1 ∈ [1]θ, we are done.
Lemma 4.2. Let A = (A, •, 1) be an implication groupoid and θ ∈ Con A. Then the kernel [1]θ is an ideal of A.

Proof. Clearly 1 ∈ [1]θ. Suppose x ∈ A, y ∈ [1]θ. Then ⟨y, 1⟩ ∈ θ and hence ⟨x • y, 1⟩ = ⟨x • y, x • 1⟩ ∈ θ, proving x • y ∈ [1]θ, thus [1]θ satisfies (2). Suppose x ∈ A and y1, y2 ∈ [1]θ. We have ⟨(y2 • (y1 • x)) • x, 1⟩ = ⟨(y2 • (y1 • x)) • x, (1 • (1 • x)) • x⟩ ∈ θ, thus also (y2 • (y1 • x)) • x ∈ [1]θ, proving (3). Hence, [1]θ is an ideal of A.
Theorem 4.3. Let A = (A, •, 1) be a distributive implication groupoid. Then every ideal I of A is a kernel of a congruence θI given by the setting ⟨x, y⟩ ∈ θI if and only if x • y ∈ I and y • x ∈ I. Moreover, θI is the greatest congruence on A having the kernel I.

Proof. Let I be an ideal of A. Since 1 ∈ I by (1), the relation θI is reflexive. Evidently, θI is symmetric. We prove transitivity of θI: Let ⟨x, y⟩ ∈ θI and ⟨y, z⟩ ∈ θI. Then x • y, y • x, y • z, z • y ∈ I and, by (2), also x • (y • z) ∈ I. Hence, applying (LD), also (x • y) • (x • z) ∈ I. By Theorem 3.6, I is also a deductive system and then x • y ∈ I and (x • y) • (x • z) ∈ I yield x • z ∈ I. Analogously we can prove z • x ∈ I, thus ⟨x, z⟩ ∈ θI.

It remains to show the substitution property of θI. For this, let ⟨x, y⟩ ∈ θI and ⟨u, v⟩ ∈ θI. Then x • y, y • x, u • v, v • u ∈ I. By (LD) and (2) we have
(x • u) • (x • v) = x • (u • v) ∈ I
(x • v) • (x • u) = x • (v • u) ∈ I
whence ⟨x • u, x • v⟩ ∈ θI. Further, by Lemma 2.1, we have
(x • v) • (y • v) ≥ y • x
(y • v) • (x • v) ≥ x • y.
However, if a ∈ I and a ≤ b then a • b = 1 and b = 1 • b = (a • b) • b ∈ I by (*). Thus the foregoing relationships give (x • v) • (y • v) ∈ I and (y • v) • (x • v) ∈ I, i.e. ⟨x • v, y • v⟩ ∈ θI. Using transitivity of θI, we conclude ⟨x • u, y • v⟩ ∈ θI and thus θI ∈ Con A.

If x ∈ I then 1 • x = x ∈ I and x • 1 = 1 ∈ I, which means ⟨x, 1⟩ ∈ θI, i.e. x ∈ [1]θI. Conversely, if x ∈ [1]θI then ⟨x, 1⟩ ∈ θI and hence x = 1 • x ∈ I. We have shown I = [1]θI, thus I is the kernel of θI. Finally, if ψ ∈ Con A and [1]ψ = I then for ⟨x, y⟩ ∈ ψ we have
⟨x • y, 1⟩ = ⟨x • y, y • y⟩ ∈ ψ
⟨y • x, 1⟩ = ⟨y • x, y • y⟩ ∈ ψ
thus x • y ∈ I and y • x ∈ I, which yield ⟨x, y⟩ ∈ θI. Hence ψ ⊆ θI, i.e. θI is the greatest congruence on A having the kernel I.

Corollary 4.4. In every distributive implication groupoid, ideals, deductive systems and congruence kernels coincide.

Proof. It follows directly by Theorem 3.6 and Theorem 4.3.
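The construction of θI in Theorem 4.3 is easy to carry out explicitly on a finite table. The sketch below, which is an illustration and not part of the paper, builds θI for the ideal I = {1, a, d} of the distributive groupoid of Example 1.2 and tests the three claims: θI is an equivalence, it has the substitution property, and its kernel is I. The choice of I and all function names are assumptions made for the example.

```python
from itertools import product

# Table of Example 1.2 (rows: left argument); TABLE[x][y] stands for x • y.
ELEMS = "1abcd"
ROWS = {"1": "1abcd", "a": "11bb1", "b": "1a11d", "c": "1a11d", "d": "11cc1"}
TABLE = {x: dict(zip(ELEMS, ROWS[x])) for x in ELEMS}


def op(x, y):
    return TABLE[x][y]


def theta(I):
    """The relation of Theorem 4.3: <x, y> with x • y in I and y • x in I."""
    return {(x, y) for x, y in product(ELEMS, repeat=2)
            if op(x, y) in I and op(y, x) in I}


def is_equivalence(rel):
    reflexive = all((x, x) in rel for x in ELEMS)
    symmetric = all((y, x) in rel for (x, y) in rel)
    transitive = all((x, z) in rel
                     for (x, y) in rel for (u, z) in rel if y == u)
    return reflexive and symmetric and transitive


def has_substitution_property(rel):
    """<x, y>, <u, v> in rel imply <x • u, y • v> in rel."""
    return all((op(x, u), op(y, v)) in rel
               for (x, y) in rel for (u, v) in rel)


def kernel(rel):
    """The class [1] of the relation."""
    return {x for x in ELEMS if (x, "1") in rel}


if __name__ == "__main__":
    I = {"1", "a", "d"}
    rel = theta(I)
    print("equivalence:", is_equivalence(rel))
    print("substitution property:", has_substitution_property(rel))
    print("kernel:", sorted(kernel(rel)))
```

For this particular ideal the relation has two classes, so the quotient is a two element groupoid.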
We have seen in Example 3.5 that for general implication groupoids their deductive systems need not be ideals. The following example shows that also ideals need not be congruence kernels:

Example 4.5. Consider the implication groupoid A given by the table

 • | 1 a b c d
---+----------
 1 | 1 a b c d
 a | a 1 c d d
 b | a a 1 c c
 c | a a a 1 c
 d | a a a a 1

One can show that {1, a} is its ideal but it is not a congruence kernel: ⟨1, a⟩ ∈ θ for some θ ∈ Con A yields ⟨b, c⟩ ∈ θ, thus also ⟨1, c⟩ ∈ θ, showing that c ∈ [1]θ ≠ {1, a}.
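The argument of Example 4.5 can be mechanised: a set I is a congruence kernel exactly when the kernel of the congruence generated by the pairs ⟨1, x⟩, x ∈ I, is I itself, because every congruence with kernel I contains that generated congruence. The sketch below is an illustration under this observation; the union-find closure and all names are choices made here, not the authors' method, and the table is transcribed from Example 4.5 with rows as left arguments.

```python
from itertools import product

# Table of Example 4.5 (rows: left argument); TABLE[x][y] stands for x • y.
ELEMS = "1abcd"
ROWS = {"1": "1abcd", "a": "a1cdd", "b": "aa1cc", "c": "aaa1c", "d": "aaaa1"}
TABLE = {x: dict(zip(ELEMS, ROWS[x])) for x in ELEMS}


def op(x, y):
    return TABLE[x][y]


def is_ideal(I):
    """Conditions (1), (2) and (3) of Definition 3.1."""
    if "1" not in I:
        return False
    cond2 = all(op(x, y) in I for x in ELEMS for y in I)
    cond3 = all(op(op(y2, op(y1, x)), x) in I
                for x in ELEMS for y1 in I for y2 in I)
    return cond2 and cond3


def generated_congruence(pairs):
    """Smallest congruence containing `pairs`, as a map element -> class root."""
    parent = {x: x for x in ELEMS}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx == ry:
            return False
        parent[rx] = ry
        return True

    for x, y in pairs:
        union(x, y)
    changed = True
    while changed:  # close under left and right translations
        changed = False
        for x, y, z in product(ELEMS, repeat=3):
            if find(x) == find(y):
                changed = union(op(x, z), op(y, z)) or changed
                changed = union(op(z, x), op(z, y)) or changed
    return {x: find(x) for x in ELEMS}


def kernel_of_generated(I):
    """Kernel of the congruence generated by the pairs <1, x> with x in I."""
    cls = generated_congruence([("1", x) for x in I])
    return {x for x in ELEMS if cls[x] == cls["1"]}


if __name__ == "__main__":
    I = {"1", "a"}
    print("ideal:", is_ideal(I))
    print("kernel of generated congruence:", sorted(kernel_of_generated(I)))
```

If the printed kernel is strictly larger than {1, a}, no congruence on A can have {1, a} as its kernel, in line with the example.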
Acknowledgment The financial support by the grant of the Czech Government Council MSM 6198959214 is gratefully acknowledged.
References
[1] J.C. Abbott: "Semi-boolean algebra", Matem. Vestnik, Vol. 4, (1967), pp. 177–198.
[2] I. Chajda and R. Halaš: "Algebraic properties of pre-logics", Math. Slovaca, Vol. 52, (2002), pp. 157–175.
[3] A. Diego: "Sur les algébres de Hilbert", Col. de Logique Math. Ser. A., Vol. 21, (1967), pp. 31–34.
[4] W. Dudek: "On ideals in Hilbert algebras", Acta Univ. Palack. Olom., Fac. rer. nat., Mathematica, Vol. 38, (1999), pp. 31–34.
[5] R. Halaš: "Remarks on commutative Hilbert algebras", Mathem. Bohemica, Vol. 127, (2002), pp. 525–529.
DOI: 10.2478/s11533-007-0020-6 Research article CEJM 5(3) 2007 493–504
Harmonic conformal flows on manifolds of constant curvature Amine Fawaz∗ Department of Mathematics, The University of Texas of the Permian Basin, 4901 East University, Odessa, TX 79762 Fax (432) 552-3230
Received 23 February 2007; accepted 24 May 2007 Abstract: We compute the energy of conformal flows on Riemannian manifolds and we prove that conformal flows on manifolds of constant curvature are critical if and only if they are isometric. © Versita Warsaw and Springer-Verlag Berlin Heidelberg. All rights reserved. Keywords: basic form, energy, foliation, geodesic curvature, harmonic, mean curvature, projectable vector field, symmetric functions, umbilical MSC (2000): 53C12, 53C15
1
Introduction
Let (M^{n+1}, g) be a smooth closed oriented Riemannian manifold of dimension n + 1 and g a smooth metric on M. We suppose M is connected. Let also L be a 1-dimensional oriented foliation on M given by a C∞ unit vector field V. The energy of L is defined by

E(L) = (1/2) ∫_M |∇V|² μ

where ∇ is the Levi-Civita connection on TM, the tangent bundle to M, and μ the volume form coming from the metric g. This is motivated by the harmonic map theory of Eells and Sampson [8] (see also [2, 3, 9, 10, 14–16]).

Definition 1.1. We say that the foliation L is harmonic if it is a critical foliation for this energy functional under variations of L through foliations Lt, |t| < ε.
A. Fawaz / Central European Journal of Mathematics 5(3) 2007 493–504
When the dimension of M is 2, harmonic flows were studied by the author [10]. We recall the main result: a foliation on a closed Riemann surface is harmonic if and only if it is given by the real part of a meromorphic (holomorphic if possible) vector field. Moreover, the energy integral diverges (except on 2-tori T 2 the energy integral need not diverge) and the finite part of the energy is given by the Green’s function associated with the Laplace operator [10]. Also in a recent paper [9] we investigated harmonic Riemannian flows on higher dimensional Riemannian manifolds. Our main result is a Riemannian flow on a Riemannian manifold of constant curvature is harmonic if and only if it is isometric. In this paper we study harmonic conformal foliations and their energy where we assume that the sectional curvature is constant. We begin by recalling some notions related to foliations in general. Let L be a p−dimensional oriented foliation on a smooth oriented manifold M (no metric yet to be involved) of dimension n = p + q. A vector field Y on M is projectable or an infinitesimal automorphism of L, if [X, Y ] ∈ L for all X ∈ L where L is the tangent bundle to L. This means that the local flow (global if M is compact) of Y preserves the foliation, i.e. maps leaves into leaves. In distinguished coordinates (x; y) = (x1 , ..., xp ; y1 , ..., yq ), such a vector field is of the form Y =
Σ_{i=1}^{p} a_i ∂/∂x_i + Σ_{α=1}^{q} b_α ∂/∂y_α, with a_i = a_i(x, y) and ∂b_α/∂x_i = 0, i.e. b_α = b_α(y).
A differential form ω of degree r is basic if i_X ω = 0 and θ(X)ω = 0 for X ∈ L. Here i_X and θ(X) are the interior product and the Lie derivative in the direction X. By Cartan's formula we have θ(X)ω = d i_X ω + i_X dω where d is the exterior derivative. In distinguished coordinates (x; y) = (x1, ..., xp; y1, ..., yq) of L a basic form of degree r is of the form

ω = Σ_{α1<...<αr} ω_{α1...αr} dy_{α1} ∧ ... ∧ dy_{αr}

where the functions ω_{α1...αr}(y) are independent of x, i.e. ∂ω_{α1...αr}/∂x_i = 0.
It is clear that projectable vector fields and basic differential forms descend to the local quotient U/L where U is an open distinguished set. For more details see [13]. Finally, when M is equipped with a metric g and L, L⊥ are respectively the tangent bundle along the leaves and the orthogonal bundle, we say that the foliation L is conformal (respectively Riemannian) if
(θ(V)g)(X, Y) = f(V) g(X, Y) (respectively θ(V)g = 0) for all x ∈ M, V ∈ L_x and X, Y ∈ L⊥_x, where θ(V) denotes the Lie derivative with respect to V. Equivalently, parallel transport of vectors in L⊥ along the fibers under the Bott partial connection ∇_V X = π[V, X] (V ∈ L, X ∈ L⊥ and π is the orthogonal projection from TM onto L⊥) is conformal. Equivalently, L is conformal (respectively Riemannian) if and only if its leaves are locally the fibers of a horizontally conformal (respectively Riemannian) submersion (see [6] for further details). We prove the following Theorem.

Theorem 1.2. Let L be a 1-dimensional oriented conformal foliation by geodesics on a smooth closed connected oriented Riemannian manifold (M^{n+1}, g) of dimension n + 1. Then L is harmonic if and only if

((n − 2)/n) ∇H + Ric(V)

is proportional to V, where H is the mean curvature of the distribution L⊥ and Ric(V) is the Ricci curvature operator in the direction V. In particular, if (M, g) has constant sectional curvature C, then L is harmonic when n = 2, and if n ≥ 2, L is harmonic if and only if it is isometric; moreover, its energy is given by E(L) = (nC/2) Vol(M), where Vol(M) is the volume of (M, g).

Remark 1.3. (i) If the curvature C satisfies C < 0, then the flow L cannot be conformal [17]. See also Section 2, Corollary 2.4. (ii) When n = 2 the Theorem implies that L is harmonic if and only if Ric(V) is parallel to V. This is already a result of Baird-Wood [6]. (iii) In the literature, a vector field V on (M, g) is conformal if and only if θ(V)g = f g for some C∞ function f. These vector fields cannot have unit speed unless they are isometric; this follows from the pointwise relation (θ(V)g)(V, V) = 2g(∇_V V, V) = f |V|². However, conformal vector fields are geodesible because their flows preserve the orthogonal distribution [13]. More precisely, let V be a conformal vector field; then the orbits of V are geodesics with respect to the metric g̃ defined by g̃ = (1/|V|²) g on L = TL and g̃ = g on L⊥. It is clear that the vector field V induces a conformal foliation by geodesics on the Riemannian manifold (M, g̃). (iv) One way of constructing examples of conformal vector fields is as follows: let (M, g) possess (at least) a 1-parameter group of isometries, and multiply the metric g by a function which is not constant along its trajectories. For the new metric, this group is composed of conformal nonisometric transformations.

Finally we recall that if the flow L is Riemannian then the orthogonal distribution L⊥ is totally geodesic, that is, any geodesic perpendicular to L at one point remains
A. Fawaz / Central European Journal of Mathematics 5(3) 2007 493–504
perpendicular to L at all points; in particular the mean curvature of L⊥ vanishes. We also recall that the flow L is isometric if θ(V )g = 0; this means that the local flow (global if M is compact) of V consists of isometric transformations. Note that isometric flows are necessarily Riemannian; conversely, a Riemannian flow is isometric if and only if it is geodesible [7, 13]. This paper is organized as follows: In Section 2, we give a geometric expression of the energy, and we recall a proof of a criterion for a 1−foliation to be harmonic [9]. In Section 3, we give a proof of Theorem 1.2 and comment on the case when the manifold has dimension 3.
2
Energy and harmonicity
Let the foliation L be 1−dimensional and the dimension of M be n + 1. Let also L = T L be the tangent bundle to L and Q ≅ L⊥ (via g) the normal bundle of L. The second fundamental form B of the plane field Q is defined in terms of the unit vector field V ∈ L by $B(X, Y) = g(\nabla_X V, Y)$ for X, Y ∈ Q. Note that B is not necessarily symmetric. Actually the symmetry of B is equivalent to the integrability of the distribution Q. To B we associate the shape operator S : Q −→ Q defined by g(S(X), Y ) = B(X, Y ) for X, Y ∈ Q. Recall that the symmetric functions of the curvature ηk of Q are defined at any point x ∈ M by
\[ \det(I + tB_x) = \sum_{k=0}^{n} \eta_k(x)\, t^k , \]
where I is the identity endomorphism of Q, and Bx is viewed as the shape operator in the direction V . Observe that η1 (x) = trace Bx is the mean curvature of Q, and ηn (x) = det(Bx ). Proposition 2.1. Let L be an oriented conformal foliation on a smooth closed connected oriented Riemannian manifold (M n+1 , g) of dimension n + 1. The energy of L is given by
\[ E(L) = \frac{1}{2}\int_M \Big\{ k^2 + 2\eta_2 - \frac{n-2}{n}\, H^2 \Big\}\,\mu \]
where k is the geodesic curvature of the leaves of L and H is the mean curvature of the orthogonal distribution L⊥ . Proof. Let V be a unit tangent vector field to L. Let also e1 , e2 , ..., en+1 be a local orthonormal frame defined on a neighborhood of a point p ∈ M such that at p, $e_1 = V$ and $e_2 = \frac{\nabla_V V}{|\nabla_V V|}$ (if (∇V V )(p) = 0 then any e2 will be convenient). We compute the matrix of (∇V )(p) with respect to the frame above. By the definition of a conformal foliation we have
$(\theta(V)g)(e_i, e_j) = g(\nabla_{e_i} V, e_j) + g(e_i, \nabla_{e_j} V) = 2\alpha\,\delta_{ij}$ for some function α, i, j = 2, 3, ..., n + 1, where δ is the Kronecker tensor. Also it is easy to see that $(\theta(V)g)(V, e_i) = k\,\delta_{i2}$ for i = 2, 3, ..., n + 1; here $k = |\nabla_V V|$ is the geodesic curvature of the leaves of L. Thus the matrix of θ(V )g with respect to the frame above is given by
\[
\begin{pmatrix}
0 & k & 0 & \cdots & 0 \\
k & 2\alpha & 0 & \cdots & 0 \\
0 & 0 & 2\alpha & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 2\alpha
\end{pmatrix}.
\]
Now, since $\theta(V)g = (\nabla V) + (\nabla V)^t$ (where $(\nabla V)^t$ is the transpose of $(\nabla V)$), we deduce immediately that
\[
\nabla V =
\begin{pmatrix}
0 & k & 0 & \cdots & 0 \\
0 & a_{22} & a_{23} & \cdots & a_{2,n+1} \\
0 & a_{32} & a_{33} & \cdots & a_{3,n+1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & a_{n+1,2} & \cdots & a_{n+1,n} & a_{n+1,n+1}
\end{pmatrix}
\]
where $\{a_{ij}\}$ is the matrix of the operator B, and satisfies $a_{ii} = \alpha$ for all i ≥ 2 and $a_{ij} = -a_{ji}$ for i ≠ j; moreover, H = nα is the usual mean curvature of the distribution L⊥ except for a factor 1/n, which will be suppressed throughout this paper. Therefore
\[ |\nabla V|^2 = k^2 + n\alpha^2 + 2\sum_{i<j} a_{ij}^2 . \]
It is also easy to see that
\[ \eta_2 = \sum_{i<j} (\alpha^2 + a_{ij}^2) = \frac{n(n-1)}{2}\,\alpha^2 + \sum_{i<j} a_{ij}^2 \]
and the Proposition follows immediately.
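The algebra in the proof of Proposition 2.1 can be checked numerically. The following is a minimal sketch, assuming the shape operator has the form B = αI + A with A skew-symmetric, as derived above; the function names and the sample values are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def conformal_shape_matrix(n, alpha, upper):
    """Return the n x n matrix alpha*I + A, where A is skew-symmetric.

    `upper` lists the entries a_ij for i < j, in the order produced by
    itertools.combinations(range(n), 2).
    """
    pairs = list(combinations(range(n), 2))
    if len(upper) != len(pairs):
        raise ValueError("expected n*(n-1)/2 upper-triangular entries")
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        B[i][i] = alpha
    for (i, j), a in zip(pairs, upper):
        B[i][j] = a
        B[j][i] = -a
    return B


def eta2(B):
    """Second symmetric function of B: the sum of its principal 2x2 minors."""
    n = len(B)
    return sum(B[i][i] * B[j][j] - B[i][j] * B[j][i]
               for i, j in combinations(range(n), 2))


def norm_sq_nabla_v(k, B):
    """|nabla V|^2 = k^2 + (sum of the squared entries of B)."""
    return k * k + sum(x * x for row in B for x in row)


def energy_density(k, B):
    """Integrand of Proposition 2.1: k^2 + 2*eta_2 - ((n-2)/n)*H^2, with H = trace B."""
    n = len(B)
    H = sum(B[i][i] for i in range(n))
    return k * k + 2 * eta2(B) - (n - 2) / n * H * H
```

For any choice of α, k and skew part, `norm_sq_nabla_v` and `energy_density` should agree, which is the pointwise identity behind the energy formula.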
Remark 2.2. (i) If L is conformal and transverse to a foliation F , then F is an umbilical foliation. (ii) When n = 2, and L is conformal then it is transversely holomorphic that is the
holonomy diffeomorphisms are biholomorphic. When (M, g) has constant sectional curvature C, the symmetric functions ηk are important; more precisely we have Theorem 2.3. (See [5]) Let (M n+1 , g) be a closed Riemannian manifold of constant curvature C, and L an oriented 1-foliation on M. We assume M is oriented. Then
\[
\int_M \eta_k\,\mu =
\begin{cases}
C^{k/2}\dbinom{n/2}{k/2}\,\mathrm{Vol}(M), & \text{if } n \text{ and } k \text{ are even},\\[4pt]
0, & \text{otherwise.}
\end{cases}
\]
See also [L]. Corollary 2.4. If L is a conformal flow on a closed connected Riemannian manifold (M, g) of constant curvature C, then C ≥ 0. Proof. First observe that when the dimension of M is even, then C = 0. This follows from the fact that M admits a nonsingular flow L (regardless of whether the flow is conformal or not), which implies that the Euler−Poincaré characteristic χ(M) of M is zero. So we may assume that n is even, that is, the dimension of M is odd. Now, we have η2 ≥ 0, and from the previous Theorem $\int_M \eta_2\,\mu = \frac{nC}{2}\,\mathrm{Vol}(M)$, and these imply C ≥ 0. In the context of Theorem 2.3 it is worthwhile to note the following fact. Proposition 2.5. Let L be an oriented isometric flow on a closed connected oriented Riemannian manifold (M, g) of constant curvature C. Then the symmetric functions of the curvature ηk satisfy
\[
\eta_k =
\begin{cases}
C^{k/2}\dbinom{n/2}{k/2}, & \text{if } n \text{ and } k \text{ are even},\\[4pt]
0, & \text{otherwise.}
\end{cases}
\]
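The right-hand side of Theorem 2.3 is easy to tabulate. The sketch below is only an illustration of the formula as stated; the function name `integrated_eta` and its argument names are hypothetical.

```python
from math import comb


def integrated_eta(n, k, C, vol):
    """Value of the integral of eta_k over M given by Theorem 2.3.

    n   : codimension of the flow (dim M = n + 1)
    k   : index of the symmetric function, 0 <= k <= n
    C   : constant sectional curvature
    vol : volume of (M, g)
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    if n % 2 == 0 and k % 2 == 0:
        return C ** (k // 2) * comb(n // 2, k // 2) * vol
    return 0.0
```

With k = 2 and n even this returns (nC/2) Vol(M), the value used in Corollary 2.4 and in the energy formula of Theorem 1.2.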
Proof. If n is odd, that is, the dimension of M is even, then C = 0. But then η2 = 0 by Theorem 2.3 (η2 ≥ 0), and this implies that the operator B is identically zero; therefore, ηk = 0 for all k. We now assume n is even. We will follow the notation of [5] exactly. For any point p in M consider a local orthonormal frame e1 , e2 , ..., en+1 defined on an open set U ⊂ M containing p such that e1 , e2 , ..., en are projectable, en+1 = N ∈ L = T L, and the frame is consistent with the orientation of M. This is possible because isometric flows induce
1−dimensional Riemannian foliations. We also let θ1 , θ2 , ..., θn+1 be the dual coframe. Since L is Riemannian and since e1 , e2 , ..., en are projectable, the forms θ1 , θ2 , ..., θn are basic. Recall that the connection forms associated with the frame e1 , e2 , ..., en+1 are defined by $\omega_{i,j}(u) = g(\nabla_u e_i, e_j)$. Define the differential n−forms ψk on U using the polynomials in t by
\[ \sum_{k=0}^{n} \psi_k\, t^k = (t\theta_1 + \omega_{1,n+1}) \wedge \dots \wedge (t\theta_n + \omega_{n,n+1}). \]
By [5], pages 22, 23, the forms ψk are well defined and satisfy $\psi_k \wedge \theta_{n+1} = \eta_{n-k}\,\mu$. Moreover there are n−forms τk such that
\[ d\tau_k = \psi_k \wedge \theta_{n+1} - C^{\frac{n-k}{2}} \binom{n/2}{(n-k)/2}\,\mu = \Big\{ \eta_{n-k} - C^{\frac{n-k}{2}} \binom{n/2}{(n-k)/2} \Big\}\,\mu . \]
For n and k even, the forms τk are defined by
\[ \tau_k = (-1)^{n+1} \Big( \frac{1}{n-k}\,\psi_{k+1} + \frac{C(k+2)}{(n-k)(n-k-2)}\,\psi_{k+3} + \dots + C^{\frac{n-k-2}{2}}\, \frac{(k+2)(k+4)\cdots(n-2)}{2\cdot 4\cdot 6\cdots(n-2)}\,\psi_{n-1} \Big) \]
and for (n − k) odd or k odd, the forms τk are defined by
\[ \tau_k = (-1)^{n+1} \Big( \frac{1}{n-k}\,\psi_{k+1} + \frac{C(k+2)}{(n-k)(n-k-2)}\,\psi_{k+3} + \dots + C^{\frac{n-k-2}{2}}\, \frac{(k+2)(k+4)\cdots(n-1)}{(n-k)(n-k-2)\cdots 3\cdot 1}\,\psi_{n} \Big). \]
See [5] pages 27, 28. We will prove that the forms τk are basic. Lemma 2.6. The coefficients {aij } of the second fundamental form of L⊥ are basic. Proof. Since the flow L is isometric its leaves are geodesics. Therefore the form dθn+1 is basic. Now since the frame e1 , e2 , ..., en is projectable the functions dθn+1 (ei , ej ) for i, j = 1, 2, ..., n, are basic. But dθn+1 (ei , ej ) = ei (θn+1 (ej )) − ej (θn+1 (ei )) − θn+1 [ei , ej ] = −g(∇ei ej − ∇ej ei , en+1 ) = g(∇ei N, ej ) − g(∇ej N, ei ) = 2aij . The lemma is proved. Lemma 2.7. The forms ψk are basic. Proof. From the definition of ψk and from the fact that the forms θ1 , ..., θn are basic, it suffices to prove that the connection forms ωi,n+1 , i = 1, 2, ..., n are basic. We have iN ωi,n+1 = ωi,n+1(N) = g(∇N ei , N) = −g(∇N N, ei ) = 0 because the leaves of L are geodesics, and
$(\theta(N)\omega_{i,n+1})(e_j) = N(\omega_{i,n+1}(e_j)) - \omega_{i,n+1}([N, e_j]) = N(g(\nabla_{e_j} e_i, N)) - 0$ (because [N, ej ] is proportional to N). Therefore $(\theta(N)\omega_{i,n+1})(e_j) = -N(a_{ji})$. But $N(a_{ji}) = 0$ by the previous lemma. We continue the proof of the Proposition. By the previous lemma the forms τk are basic. Since the forms τk are of degree n, dτk = 0 for all k, and the Proposition follows. We finish this section by recalling a criterion for a 1−foliation to be harmonic [9]; but first we introduce the following bundle differential operators. Let S 2 (M) be the bundle of smooth symmetric (0, 2)−tensors on M and χ(M) the Lie algebra of C ∞ vector fields. Define δ : S 2 (M) −→ χ(M) and δ ∗ : χ(M) −→ S 2 (M) by
\[ \delta h = -\mathrm{tr}_{12}\,\nabla h = -\sum_{i=1}^{n+1} (\nabla_{e_i} h)(e_i, -), \]
where e1 , ..., en+1 is a local orthonormal frame, and $\delta^* X = \frac{1}{2}\,\theta(X)g$; recall that θ(X)g is the Lie derivative of the metric g in the direction X. δ ∗ is the adjoint of δ with respect to the global scalar product <, > on M, that is, < δh, X > = < h, δ ∗ X > [4].
We have: Proposition 2.8. Let L be an oriented flow defined by a unit vector field V on a closed oriented Riemannian manifold (M n+1 , g) of dimension n + 1. Then L is harmonic if and only if the “vertical tension field” τ (V ) = 2δδ ∗ V + ∇H + Ric(V ) is parallel to V , where H is the mean curvature of the orthogonal distribution to L and Ric(V ) is the Ricci curvature in the direction V . Proof. First, for any vector field X on a closed Riemannian manifold (M, g) we have the following integral formula
\[ \int_M |\nabla X|^2\,\mu = \int_M \Big\{ \frac{1}{2}\,|\theta(X)g|^2 - |\mathrm{div} X|^2 + \mathrm{Ric}(X) \Big\}\,\mu . \]
For a proof see [12], 5.9, 5.10. Now, for any vector field Y perpendicular to V we consider variations of the flow L by foliations Lt given by vector fields of the form Vt = V + tY . The energy of Lt is given by
\[ E(L_t) = \frac{1}{2}\int_M |\nabla V_t|^2\,\mu = \int_M \Big\{ \frac{1}{4}\,|\theta(V_t)g|^2 - \frac{1}{2}\,|\mathrm{div} V_t|^2 + \frac{1}{2}\,\mathrm{Ric}(V_t) \Big\}\,\mu . \]
Write $\langle\,,\rangle = \int_M g(\,,)\,\mu$ and ωt = ω + tψ where ω and ψ are the dual forms of V and
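Y respectively. Also observe that if d∗ is the adjoint of the exterior derivative d, we have div V = −d∗ ω = H, the mean curvature of the orthogonal distribution Q ≈ L⊥ . We compute
\[ \frac{d}{dt} E(L_t)\Big|_{t=0} = 2\Big\langle \frac{d}{dt}\Big|_{t=0} \delta^* V_t,\ \delta^* V \Big\rangle - \Big\langle \frac{d}{dt}\Big|_{t=0} d^*\omega_t,\ d^*\omega \Big\rangle + \Big\langle \frac{d}{dt}\Big|_{t=0} \mathrm{Ric}(\omega_t),\ \omega \Big\rangle \]
since Ricci is symmetric. Thus
\[
\begin{aligned}
\frac{d}{dt} E(L_t)\Big|_{t=0} &= 2\langle \delta^* Y, \delta^* V\rangle - \langle d^*\psi, d^*\omega\rangle + \langle \mathrm{Ric}(\psi), \omega\rangle \\
&= 2\langle \delta\delta^* V, Y\rangle - \langle dd^*\omega, \psi\rangle + \langle \mathrm{Ric}(\omega), \psi\rangle \\
&= \langle 2\delta\delta^* V + \nabla H + \mathrm{Ric}(V),\ Y\rangle .
\end{aligned}
\]
Since the vector field Y is an arbitrary field perpendicular to V , the Proposition follows immediately. Remark 2.9. (i) One could also use variations of L through foliations Lt given by vector fields of the form Vt = V + tξY defined on a smooth compact domain D with smooth boundary, where ξ is a C ∞ function on D vanishing on the boundary ∂D. (ii) The energy of L is given by
\[ E(L) = \frac{1}{2}\int_M g(\tau(V), V)\,\mu . \]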
(iii) If λ is the metric dual of the unit vector field V , then L is harmonic if and only if Δλ − Ric(λ) is parallel to λ, where Δ is the Laplacian; this is the content of the Weitzenböck formula [10, 16]. In particular, if (M, g) has constant sectional curvature then L is harmonic if and only if the Laplacian of λ is parallel to λ.
3
Proof of Theorem 1.2
We begin by computing the term τ (V ) appearing in Proposition 2.8. We will use the notations of Propositions 2.1 and 2.8. Since θ(V )g = 0 on L, we have τ (V ) = 2δ(αg) + ∇(nα) + Ric(V ) = −2 trace12 ∇(αg) + n∇α + Ric(V ) = −2 trace12 ∇α ⊗ g + n∇α + Ric(V ) because ∇g = 0. Thus τ (V ) = −2∇α + n∇α + Ric(V ) = (n − 2)∇α + Ric(V ). Since H = nα, the first part of the Theorem clearly follows. For the rest of the proof, we assume that (M, g) has constant sectional curvature C, necessarily nonnegative.
If the flow L is isometric, then it is clearly harmonic because then H ≡ 0 and Ric(V ) = nCV ; moreover, $E(L) = \frac{nC}{2}\,\mathrm{Vol}(M)$ by Proposition 2.1 and Theorem 2.3. Now suppose that L is harmonic; this implies that dH = 0 on L⊥ . We will prove that H ≡ 0. If C = 0 then η2 = 0 (recall that η2 ≥ 0 and $\int_M \eta_2\,\mu = \frac{nC}{2}\,\mathrm{Vol}(M)$); this implies
that the second fundamental form B is zero. Thus L is Riemannian and hence isometric because the leaves are geodesics. It is worthwhile to observe in this case that L is transverse to a totally geodesic n-dimensional foliation F ; now the foliation F lifts to a foliation by hyperplanes of the universal cover Rn+1 of M. Thus, L is the projection of a linear foliation on the torus T n+1 . Assume that C > 0 and let λ be the metric dual of the unit vector field V. Suppose for the sake of a contradiction that the function H does not vanish identically on M. Consider the set Σ = {p ∈ M : dλ(p) ≠ 0}. Σ is clearly open. We claim that Σ ≠ ∅. If dλ = 0 on M then, on the one hand we have Δλ = dd∗ λ = −dH, and on the other hand using the Bochner formula
\[ g(\Delta\lambda, \lambda) = \frac{1}{2}\,\Delta|\lambda|^2 + |\nabla\lambda|^2 + g(\mathrm{Ric}(\lambda), \lambda) \]
we get Δλ = (|∇λ|2 + nC)λ (recall that Δλ is parallel to λ by Remark 2.9 (iii)). Comparing the two expressions we see that −V (H) = |∇λ|2 + nC and this implies that for any point in M, V (H) < 0. Now, since we assumed H is not identically zero, the function H, being continuous and differentiable, must assume an extremum (maximum and minimum); in other words, there is at least one point p ∈ M such that dH(p) = 0 and this implies that Vp (H) = 0; this contradicts V (H) < 0 on M. Hence Σ ≠ ∅ and our claim about Σ is sustained. Let ω = Hdλ. Lemma 3.1. The form ω is closed. Proof. dω = V (H)λ ∧ dλ because dH = 0 on L⊥ ; therefore it suffices to prove that V (H)dλ = 0. Let e1 , e2 , ..., en+1 be a local orthonormal frame with e1 = V and let θ1 , θ2 , ..., θn+1 be the dual coframe; of course, θ1 = λ. For i, j ≥ 2 we have $g([e_i, e_j], V) = g(\nabla_{e_i} e_j - \nabla_{e_j} e_i, V) = -g(\nabla_{e_i} V, e_j) + g(\nabla_{e_j} V, e_i) = -2a_{ij}$ by Proposition 2.1 (we are using the same notation as in Proposition 2.1). Thus $d\lambda(e_i, e_j) = -2a_{ij}$ and since $i_V d\lambda = 0$ (because the leaves of L are geodesics), we have $d\lambda = -2\sum_{i,j=2}^{n+1} a_{ij}\,\theta_i \wedge \theta_j$.
Now we can write $[e_i, e_j] = \sum_{k=2}^{n+1} b_k e_k - 2a_{ij} V$ for some local functions bk . Since ei (H) = 0 for i = 2, 3, ..., n + 1, the expression of the Lie bracket shows that $a_{ij} V(H) = 0$ for all i, j = 2, 3, ..., n + 1 and the lemma is proved. We continue the proof of the Theorem. If M ≠ Σ, let Σi , i = 1, 2, ... be the connected components of Σ. On each Σi we have V (H) = 0 by Lemma 3.1; this implies that H is a constant Hi on Σi ; also by continuity H = Hi on ∂Σi , the boundary of Σi . Thus, V (H) = 0 on the closure $\bar\Sigma_i$ of Σi . Now dλ being identically 0 on M\Σ, we have dλ = 0 on the boundaries. These imply that Δλ = 0 on any boundary component, but then using the Bochner formula again we get |∇λ|2 + nC = 0 on any boundary ∂Σi , which is a clear contradiction. Therefore Σ = M. Now, from Lemma 3.1 we have the relation V (H)dλ = 0 which implies that V (H) = 0 on M. Hence H is a constant. By the divergence theorem we have $\int_M H\,\mu = 0$, hence H = 0. Therefore the flow L is Riemannian and hence isometric
because its leaves are geodesics. This finishes the proof of Theorem 1.2. Remark 3.2. When the dimension of M is 3 then L is harmonic if and only if Ric(V ) is parallel to V ; as we mentioned earlier, this is a result of [6]; further investigation of this leads to the following result: L is harmonic if and only if |dλ ∧ λ| is a constant β say; without loss of generality β = 0 or 1. If β = 0 then L is transverse to a 2−dimensional foliation F , and if β = 1 then L is transverse to a contact structure; See [1] for details on contact forms; we decided not to include these results because of the following: Theorem 3.3. [6] Let M 3 be a 3−dimensional C ∞ Riemannian manifold of nonconstant curvature. Then there are at most two conformal foliations by geodesics of M 3 . If the following condition is satisfied: at each point of some open set of M 3 the Ricci tensor has precisely two distinct eigenvalues, then there is at most one conformal foliation by geodesics of M 3 . Remark 3.4. It is well known that the sphere S 3 admits no totally umbilical non-singular foliations of codimension one. Theorem 1.2 supports a conjecture raised by Wiegmink [14]; namely, the Hopf vector fields on the sphere S 3 are exactly the vector fields of minimal energy or total bending. See [14] page 220.
Acknowledgment The author wishes to thank Rémi Langevin and Philippe Tondeur. Additional thanks to the referees for their valuable feedback.
References
[1] D.E. Blair: “Contact manifolds in Riemannian geometry”, Lect. Notes Math., Vol. 509, Springer-Verlag, Berlin-Heidelberg-New York, 1976.
[2] V. Borelli, F. Brito and O. Gil-Medrano: “The Infimum of The Energy of Unit Vector Fields on Odd-Dimensional Spheres”, Ann. Glob. Anal. Geom., Vol. 23 (2003), pp. 129–140.
[3] F. Brito and P. Chacon: “Energy of Global Frames”, to appear in J. Aust. Math. Soc.
[4] M. Berger and D. Ebin: “Some decompositions of the space of symmetric tensors on a Riemannian manifold”, J. Differ. Geom., Vol. 3, (1969), pp. 379–392.
[5] F. Brito, R. Langevin and H. Rosenberg: “Intégrales de courbure sur des variétés feuilletées”, J. Differ. Geom., Vol. 16, (1981), pp. 19–50.
[6] P. Baird and J.C. Wood: “Harmonic Morphisms, Seifert Fibre Spaces and Conformal Foliations”, P. Lond. Math. Soc., Vol. 64, (1992), pp. 170–196.
[7] Y. Carrière: “Flots Riemanniens”, in “Structure Transverse des Feuilletages”, Astérisque, Vol. 116 (1984), pp. 31–52.
[8] J. Eells and J. Sampson: “Harmonic mappings of Riemannian manifolds”, Amer. J. Math., Vol. 86, (1964), pp. 109–160.
[9] A. Fawaz: “Energy and Riemannian Flows”, to appear in Geometriae Dedicata.
[10] A. Fawaz: “Energy and Foliations on Riemann Surfaces”, Ann. Glob. Anal. Geom., Vol. 28 (2005), pp. 75–89.
[11] R. Langevin: “Feuilletages, énergies et cristaux liquides”, Astérisque, Vols. 107–108, (1983), pp. 201–213.
[12] W. Poor: Differential Geometric Structures, McGraw Hill Book Company, New York etc., 1981.
[13] P. Tondeur: Geometry of Foliations, Monographs in Math., Vol. 90, Birkhäuser, 1997.
[14] G. Wiegmink: “Total bending of vector fields on the sphere S 3 ”, Differ. Geom. Appl., Vol. 6, (1996), pp. 219–236.
[15] G. Wiegmink: “Total bending of vector fields on Riemannian manifolds”, Math. Ann., Vol. 303, (1995), pp. 325–344.
[16] C.M. Wood: “On the energy of a unit vector field”, Geometriae Dedicata, Vol. 64 (1997), pp. 319–330.
[17] K. Yano: Integral Formulas in Riemannian Geometry, Marcel Dekker Inc., New York, 1970.
DOI: 10.2478/s11533-007-0011-7 Research article CEJM 5(3) 2007 505–511
Under which conditions is the Jacobi space $L^p_{w^{(a,b)}}[-1,1]$ subset of $L^1_{w^{(\alpha,\beta)}}[-1,1]$? Michael Felten∗ Faculty of Mathematics and Informatics University of Hagen 58084 Hagen, Germany
Received 12 October 2006; accepted 8 April 2007 Abstract: Exact conditions for α, β, a, b > −1 and 1 ≤ p ≤ ∞ are determined under which the inclusion property $L^p_{w^{(a,b)}}[-1,1] \subset L^1_{w^{(\alpha,\beta)}}[-1,1]$ is valid. It is shown that the conditions characterize the inclusion property. The paper concludes with some results, in which the inclusion property can be detected in relation with estimates of Jacobi differential operators and with Muckenhoupt’s transplantation theorems and multiplier theorems for Jacobi series. © Versita Warsaw and Springer-Verlag Berlin Heidelberg. All rights reserved. Keywords: Jacobi Spaces, Fourier expansion, Jacobi differential operators, transplantation theorems, multiplier theorems MSC (2000): 41A10, 42C10
1
Introduction
The Jacobi weight $w^{(a,b)}$ with a, b ∈ R is the function
\[ w^{(a,b)}(x) := (1-x)^a (1+x)^b, \qquad x \in [-1, 1]. \]
If a < 0 or b < 0, then $w^{(a,b)}$ has a singularity at 1 or −1 respectively. On the other hand, if a > 0 or b > 0, then $w^{(a,b)}$ has a zero at 1 or −1 respectively. The Jacobi space $L^p_{w^{(a,b)}}[-1,1]$ with a, b > −1 and 1 ≤ p ≤ ∞ is defined as follows. If 1 ≤ p < ∞, then $L^p_{w^{(a,b)}}[-1,1]$ denotes the space of all measurable functions f : [−1, 1] → R for which the weighted norm
\[ \|f\|_{L^p_{w^{(a,b)}}[-1,1]} := \Big( \int_{-1}^{1} |f(x)|^p\, w^{(a,b)}(x)\, dx \Big)^{1/p} \]
is finite. The space $L^\infty_{w^{(a,b)}}[-1,1]$ is the space of all measurable functions f : [−1, 1] → R for which the weighted norm
\[ \|f\|_{L^\infty_{w^{(a,b)}}[-1,1]} := \operatorname*{ess\,sup}_{x \in [-1,1]} |f(x)|\, w^{(a,b)}(x) \]
is finite. Our aim in this paper is to determine the conditions under which the inclusion $L^p_{w^{(a,b)}}[-1,1] \subset L^1_{w^{(\alpha,\beta)}}[-1,1]$ is valid, where α, β is another pair of values greater than −1. We will prove the following theorem. Theorem 1.1. Let α, β, a, b > −1 and 1 ≤ p ≤ ∞. Then
\[ L^p_{w^{(a,b)}}[-1,1] \subset L^1_{w^{(\alpha,\beta)}}[-1,1] \tag{1} \]
is equivalent to
\[ \frac{a+1}{\alpha+1},\ \frac{b+1}{\beta+1} \le 1 \quad \text{if } p = 1, \tag{2} \]
\[ \frac{a+1}{\alpha+1},\ \frac{b+1}{\beta+1} < p \quad \text{if } 1 < p < \infty, \tag{3} \]
\[ \frac{a}{\alpha+1},\ \frac{b}{\beta+1} < 1 \quad \text{if } p = \infty. \tag{4} \]
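Conditions (2)–(4) translate directly into a small decision procedure. The following is a minimal sketch; the function name `inclusion_holds` is hypothetical, and p = ∞ is represented by `math.inf` as an assumption of this example.

```python
import math


def inclusion_holds(a, b, alpha, beta, p):
    """Decide the inclusion (1) of Theorem 1.1 via conditions (2)-(4).

    All of a, b, alpha, beta must be greater than -1 and p must lie in
    [1, inf]; use math.inf for p = infinity.
    """
    if min(a, b, alpha, beta) <= -1:
        raise ValueError("a, b, alpha, beta must all be greater than -1")
    if p < 1:
        raise ValueError("p must satisfy 1 <= p <= inf")
    if p == 1:                      # condition (2)
        return (a + 1) / (alpha + 1) <= 1 and (b + 1) / (beta + 1) <= 1
    if math.isinf(p):               # condition (4)
        return a / (alpha + 1) < 1 and b / (beta + 1) < 1
    # condition (3), 1 < p < inf
    return (a + 1) / (alpha + 1) < p and (b + 1) / (beta + 1) < p
```

For example, with α = β = 0 (the Legendre weight) and b = 0, the exponent a = 0.5 is excluded for p = 1 but admitted for p = 2, while a = 1 is the first excluded value for p = 2.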
The inclusion property (1) in Theorem 1.1 is of importance for the well-definedness of the Fourier projections $s_n^{(\alpha,\beta)} : L^1_{w^{(\alpha,\beta)}}[-1,1] \to \Pi_n$, n ∈ N, on the Jacobi space $L^p_{w^{(a,b)}}[-1,1]$. That is to say, $s_n^{(\alpha,\beta)} f$ is well-defined for all $f \in L^p_{w^{(a,b)}}[-1,1]$ if the Fourier coefficients
\[ c_k^{(\alpha,\beta)}(f) = \int_{-1}^{1} f(t)\, p_k^{(\alpha,\beta)}(t)\, w^{(\alpha,\beta)}(t)\, dt, \qquad k \in \mathbb{N}_0, \tag{5} \]
with respect to the orthonormal Jacobi polynomials $p_k^{(\alpha,\beta)}$ exist. This is fulfilled if the inclusion property (1) is valid. Hence, Theorem 1.1 gives concrete conditions for a, b, α, β and p under which the Fourier projections $s_n^{(\alpha,\beta)}$ are well-defined on $L^p_{w^{(a,b)}}[-1,1]$. Section 2 is concerned with the characterization and the proof of the inclusion property. At the end of this paper, in Section 3, we will give some further remarks concerning Theorem 1.1. There we conclude with some results, in which the inclusion property can be detected in relation with estimates of Jacobi differential operators and with Muckenhoupt’s transplantation theorems and multiplier theorems for Jacobi series.
2
Characterization of the Inclusion Property
The task of determining the conditions under which the inclusion (1) is valid can be related to the problem of finding conditions on α, β, a, b and p for which $\|w^{(\alpha,\beta)} f\|_1$ is finite whenever f is a function with finite norm $\|w^{(a,b)} f\|_p$. The solution to this problem is presented in the next theorem. Afterwards we will prove Theorem 1.1. Theorem 2.1. Let α, β, a, b > −1 and 1 ≤ p ≤ ∞. Then
\[ \{ f \mid w^{(a,b)} f \in L^p[-1,1] \} \subset \{ f \mid w^{(\alpha,\beta)} f \in L^1[-1,1] \} \tag{6} \]
if and only if $\|w^{(\alpha-a,\beta-b)}\|_q < \infty$ with q such that $\frac{1}{p} + \frac{1}{q} = 1$.
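The criterion of Theorem 2.1 can be evaluated mechanically, because $(1-x)^{\gamma}$ is q-integrable near 1 exactly when γq > −1, and is bounded exactly when γ ≥ 0. The sketch below encodes this under that standard fact; the function names are hypothetical and `math.inf` stands for an infinite exponent.

```python
import math


def conjugate_exponent(p):
    """Return q with 1/p + 1/q = 1, using math.inf for the endpoint cases."""
    if p == 1:
        return math.inf
    if math.isinf(p):
        return 1.0
    return p / (p - 1)


def jacobi_weight_in_lq(gamma, delta, q):
    """True when w^(gamma,delta) has finite L^q[-1,1] norm."""
    if math.isinf(q):
        # essential supremum is finite iff neither endpoint is singular
        return gamma >= 0 and delta >= 0
    # |1 -+ x|^(gamma*q) is integrable at the endpoint iff gamma*q > -1
    return gamma * q > -1 and delta * q > -1


def unweighted_inclusion(a, b, alpha, beta, p):
    """Criterion of Theorem 2.1 for the inclusion (6)."""
    q = conjugate_exponent(p)
    return jacobi_weight_in_lq(alpha - a, beta - b, q)
```

With α = β = b = 0 and p = 2, the call returns True for a = 0.25 and False for a = 0.5, matching the strict bound a < α + 1 − 1/p.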
We first mention that the condition $\|w^{(\alpha-a,\beta-b)}\|_q < \infty$ in Theorem 2.1 can be easily resolved: $\|w^{(\alpha-a,\beta-b)}\|_q < \infty$ is equivalent to $\alpha - a,\ \beta - b > -\frac{1}{q}$ if 1 ≤ q < ∞ and to α − a, β − b ≥ 0 if q = ∞. Since $\frac{1}{q} = 1 - \frac{1}{p}$, it follows that the condition $\|w^{(\alpha-a,\beta-b)}\|_q < \infty$ is explicitly given by
\[
\begin{cases}
a \le \alpha \ \text{ and } \ b \le \beta & \text{if } p = 1,\\[2pt]
a < \alpha + 1 - \frac{1}{p} \ \text{ and } \ b < \beta + 1 - \frac{1}{p} & \text{if } 1 < p \le \infty.
\end{cases}
\tag{7}
\]
Thus Theorem 2.1 states that (6) and (7) are equivalent. Proof of Theorem 2.1. First, let $\|w^{(\alpha-a,\beta-b)}\|_q < \infty$ with $q = (1 - \frac{1}{p})^{-1} = \frac{p}{p-1} \in [1, \infty]$. We will show that the inclusion (6) holds true. To this end, let f be such that $w^{(a,b)} f \in L^p[-1,1]$, i.e., $\|w^{(a,b)} f\|_p < \infty$. Then from Hölder’s inequality we obtain
\[ \|w^{(\alpha,\beta)} f\|_1 = \|w^{(a,b)} f \cdot w^{(\alpha-a,\beta-b)}\|_1 \le \|w^{(a,b)} f\|_p \cdot \|w^{(\alpha-a,\beta-b)}\|_q < \infty, \tag{8} \]
which means that $w^{(\alpha,\beta)} f \in L^1[-1,1]$. We now proceed to prove the converse result. Let the inclusion (6) hold true. Below we will distinguish between p = 1, p = ∞ and 1 < p < ∞. Case p = 1. Then q = ∞. Let $f := w^{(-a+\gamma,-b+\delta)}$ with arbitrary values γ, δ > −1. Obviously, $w^{(a,b)} f = w^{(\gamma,\delta)} \in L^1[-1,1]$. From (6) it follows that $w^{(\alpha,\beta)} f = w^{(\alpha-a+\gamma,\beta-b+\delta)} \in L^1[-1,1]$, which means α − a + γ, β − b + δ > −1. Since γ, δ > −1 have been chosen arbitrarily, we obtain α − a, β − b ≥ 0 and hence $\|w^{(\alpha-a,\beta-b)}\|_\infty < \infty$. Case p = ∞. Then q = 1. Let $f := w^{(-a,-b)}$. Obviously, $w^{(a,b)} f = 1 \in L^\infty[-1,1]$. From (6) it follows that $w^{(\alpha,\beta)} f = w^{(\alpha-a,\beta-b)} \in L^1[-1,1]$, which means $\|w^{(\alpha-a,\beta-b)}\|_1 < \infty$. Case 1 < p < ∞. Then $q = \frac{p}{p-1} \in (1, \infty)$. Let
\[ f(x) := \frac{|x|}{\ln(1-|x|)}\, w^{(-a-\frac{1}{p},\,-b-\frac{1}{p})}(x). \]
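We obtain, using $-\ln(1-|x|) \ge \frac{1}{2}|x|$ for x ∈ [−1, 1],
\[
\begin{aligned}
\|w^{(a,b)} f\|_p^p &= \int_{-1}^{1} \frac{|x|^p}{(-\ln(1-|x|))^p}\, \frac{1}{1-x^2}\, dx \\
&\le C \int_{-1/2}^{1/2} \frac{|x|^p}{|x|^p}\, dx + 2 \int_{1/2}^{1} \frac{1}{(-\ln(1-x))^p}\, \frac{1}{1-x}\, dx \\
&= C + \frac{2}{1-p} \left[ \frac{1}{(-\ln(1-x))^{p-1}} \right]_{1/2}^{1-} < \infty \quad \text{since } p > 1.
\end{aligned}
\]
Hence $w^{(a,b)} f \in L^p[-1,1]$ and from (6) it follows that
\[ w^{(\alpha,\beta)} f = \frac{|\cdot|}{\ln(1-|\cdot|)}\, w^{(\alpha-a-\frac{1}{p},\,\beta-b-\frac{1}{p})} \in L^1[-1,1]. \]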
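The role of the logarithmic factor can be made concrete through the antiderivative used in the estimate above. The following is a minimal sketch, with hypothetical function names, that evaluates the tail integral over [1/2, 1 − ε] in closed form; it illustrates why the integral stays bounded for p > 1 and grows without bound for p = 1.

```python
import math


def tail_integral(p, eps):
    """Exact integral over [1/2, 1 - eps] of dx / ((1 - x) * (-ln(1 - x))**p).

    Uses the antiderivative (-ln(1 - x))**(1 - p) / (1 - p) for p != 1
    and ln(-ln(1 - x)) for p == 1.
    """
    lo = math.log(2.0)      # -ln(1 - 1/2)
    hi = -math.log(eps)     # -ln(1 - (1 - eps))
    if p == 1:
        return math.log(hi) - math.log(lo)
    return (hi ** (1 - p) - lo ** (1 - p)) / (1 - p)


def tail_limit(p):
    """Limit of tail_integral(p, eps) as eps -> 0, finite only for p > 1."""
    if p <= 1:
        return math.inf
    return math.log(2.0) ** (1 - p) / (p - 1)
```

For p = 2 the values increase towards 1/ln 2 as ε shrinks, whereas for p = 1 they keep growing like ln(−ln ε).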
A short calculation shows that
\[ \int_{-1}^{1} \frac{|x|}{\ln(1-|x|)}\, w^{(\gamma,\delta)}(x)\, dx \]
is finite if γ, δ > −1 and infinite if γ = −1 or δ = −1. Hence we obtain $\alpha - a - \frac{1}{p} > -1$ and $\beta - b - \frac{1}{p} > -1$ or, rewritten, $\alpha - a,\ \beta - b > -1 + \frac{1}{p} = -\frac{1}{q}$, which means that $\|w^{(\alpha-a,\beta-b)}\|_q < \infty$. Thus our theorem is proved. Proof of Theorem 1.1. We observe that
\[ L^p_{w^{(a,b)}}[-1,1] = \{ f \mid w^{(\frac{a}{p},\frac{b}{p})} f \in L^p[-1,1] \} \quad \text{with } 1 \le p < \infty \]
and
\[ L^\infty_{w^{(a,b)}}[-1,1] = \{ f \mid w^{(a,b)} f \in L^\infty[-1,1] \}. \]
First, let 1 ≤ p < ∞ and q be such that $\frac{1}{p} + \frac{1}{q} = 1$. From Theorem 2.1 it follows that (1) is valid if and only if $\|w^{(\alpha-\frac{a}{p},\,\beta-\frac{b}{p})}\|_q < \infty$ or, equivalently, by replacing a and b by $\frac{a}{p}$ and $\frac{b}{p}$ in (7) respectively,
\[
\begin{cases}
a \le \alpha \ \text{ and } \ b \le \beta & \text{if } p = 1,\\[2pt]
\frac{a}{p} < \alpha + 1 - \frac{1}{p} \ \text{ and } \ \frac{b}{p} < \beta + 1 - \frac{1}{p} & \text{if } 1 < p < \infty.
\end{cases}
\]
These two conditions are the same as in (2) and (3). Finally, let p = ∞. From Theorem 2.1 it follows that inclusion (1) is valid if and only if $\|w^{(\alpha-a,\beta-b)}\|_1 < \infty$, or, equivalently, α − a, β − b > −1, which is the same condition as in (4).
3
The Inclusion Property in Other Results
The present work has dealt with the problem of determining conditions under which the Jacobi space $L^p_{w^{(a,b)}}[-1,1]$ is a subset of $L^1_{w^{(\alpha,\beta)}}[-1,1]$. Since we now know concrete conditions under which the inclusion property is valid (see Theorem 1.1), we will conclude this work with some results in which the inclusion property can be detected. We begin with a result in which the inclusion property appears in relation with estimates of differential operators. To this end, let us define the Jacobi differential operator
\[ P^{(\alpha,\beta)}(D) := (w^{(\alpha,\beta)})^{-1}\, \frac{\partial}{\partial x}\, w^{(\alpha+1,\beta+1)}\, \frac{\partial}{\partial x}, \tag{9} \]
where both α and β are greater than −1. The next result shows that each differential operator $P^{(\gamma,\delta)}(D)$ can be estimated by $P^{(\alpha,\beta)}(D)$ with respect to the norm of the weighted Jacobi space $B = L^p_{w^{(a,b)}}[-1,1]$ if a mild assumption on α, β, a, b and p is fulfilled. We will see that the assumption is related to the inclusion property of Jacobi spaces. The following theorem is taken from [2]. Theorem 3.1. Let $B = L^p_{w^{(a,b)}}[-1,1]$ with a, b > −1 and 1 ≤ p ≤ ∞. Also let α, β > −1 such that
\[
\begin{cases}
\dfrac{a+1}{\alpha+1},\ \dfrac{b+1}{\beta+1} < p & \text{if } 1 \le p < \infty,\\[6pt]
\dfrac{a}{\alpha+1},\ \dfrac{b}{\beta+1} < 1 & \text{if } p = \infty.
\end{cases}
\tag{10}
\]
Moreover, let γ, δ ∈ R. Then
\[ \|P^{(\gamma,\delta)}(D) g\|_B \le C\, \|P^{(\alpha,\beta)}(D) g\|_B \tag{11} \]
for all g ∈ C 2 [−1, 1] with a positive constant C = C(α, β, γ, δ, a, b, p) independent of g. We must mention that condition (10) is closely related to the inclusion $B \subset L^1_{w^{(\alpha,\beta)}}[-1,1]$ with $B = L^p_{w^{(a,b)}}[-1,1]$. Conditions (2)–(4) are the same as condition (10) except in the case p = 1. If p = 1, then (2) reads a ≤ α and b ≤ β, whereas (10) reads a < α and b < β. That is to say, for example, the case $B = L^1_{w^{(\alpha,\beta)}}[-1,1]$ is not included in Theorem 3.1. Moreover, Hardy’s inequalities, which were used in the proof of Theorem 3.1 in [2], cannot be used for this case. It is an open problem whether Theorem 3.1 is valid for $B = L^1_{w^{(\alpha,\beta)}}[-1,1]$. We will not try to solve this problem in the present work.
What can be said is that if (10) is fulfilled, then the inclusion B ⊂ L1w(α,β) [−1, 1] is ensured. Moreover, if p is such a value that 1 < p ≤ ∞, then (10) is equivalent to B ⊂ L1w(α,β) [−1, 1]. With the aid of Theorem 3.1 it is possible in [2] to determine conditions under which the K-functionals K(f, P (α,β) (D), t)B and K(f, P (γ,δ) (D), t)B can be estimated by each other or under which they are equivalent.
We proceed with a result which Muckenhoupt [3] published in 1969. To this end, we rewrite (3) for 1 < p < ∞ as $\frac{a+1}{p} < \alpha + 1$ and $\frac{b+1}{p} < \beta + 1$. Consequently, (3) is the same as
\[ \frac{a+1}{p} - \frac{\alpha+1}{2} < \frac{\alpha+1}{2}, \qquad \frac{b+1}{p} - \frac{\beta+1}{2} < \frac{\beta+1}{2}. \]
In [3] Muckenhoupt proved a theorem in which he gave a comprehensive answer to the question as to when the Fourier projection operators $s_n^{(\alpha,\beta)}$ are uniformly bounded in $L^p_{w^{(a,b)}}[-1,1]$ with 1 < p < ∞. Muckenhoupt’s result reads as follows. Theorem 3.2 (Muckenhoupt 1969). Assume that α, β > −1, 1 < p < ∞ and a, b ∈ R such that
\[
\left| \frac{a+1}{p} - \frac{\alpha+1}{2} \right| < \min\Big\{ \frac{1}{4}, \frac{\alpha+1}{2} \Big\}, \qquad
\left| \frac{b+1}{p} - \frac{\beta+1}{2} \right| < \min\Big\{ \frac{1}{4}, \frac{\beta+1}{2} \Big\}.
\tag{12}
\]
Then
\[ \|s_n^{(\alpha,\beta)} f\|_{L^p_{w^{(a,b)}}[-1,1]} \le C\, \|f\|_{L^p_{w^{(a,b)}}[-1,1]} \]
for all n ∈ N and $f \in L^p_{w^{(a,b)}}[-1,1]$, where C = C(α, β, a, b, p) is a positive constant independent of f and n. If condition (12) holds, then it is obvious that (3) is fulfilled, meaning that the inclusion $L^p_{w^{(a,b)}}[-1,1] \subset L^1_{w^{(\alpha,\beta)}}[-1,1]$ holds true. Thus the Fourier projections $s_n^{(\alpha,\beta)}$ are well-defined on $L^p_{w^{(a,b)}}[-1,1]$.
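The remark that (12) implies (3) can be spot-checked in exact rational arithmetic. The sketch below is an illustration only: the function names and the sample grid are hypothetical, and `fractions.Fraction` is used so that the strict inequalities are not blurred by floating-point rounding.

```python
from fractions import Fraction


def muckenhoupt_condition(a, b, alpha, beta, p):
    """Condition (12) of Theorem 3.2, checked at both endpoints."""
    def one_side(c, gamma):
        centre = (gamma + 1) / 2
        return abs((c + 1) / p - centre) < min(Fraction(1, 4), centre)
    return one_side(a, alpha) and one_side(b, beta)


def condition_3(a, b, alpha, beta, p):
    """Condition (3) of Theorem 1.1 for 1 < p < infinity."""
    return (a + 1) / (alpha + 1) < p and (b + 1) / (beta + 1) < p


def implication_counterexamples():
    """Grid points where (12) holds but (3) fails; expected to be empty."""
    grid = [Fraction(i, 4) for i in range(-3, 9)]   # exponents in (-1, 2]
    ps = [Fraction(5, 4), Fraction(2), Fraction(3), Fraction(5)]
    return [(a, alpha, p)
            for a in grid for alpha in grid for p in ps
            if muckenhoupt_condition(a, a, alpha, alpha, p)
            and not condition_3(a, a, alpha, alpha, p)]
```

The converse fails: for a = b = 0, α = β = 3 and p = 2 condition (3) holds while (12) does not, so (12) is strictly stronger on this sample.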
Moreover, in 1986 Muckenhoupt [4] published transplantation theorems and multiplier theorems for Jacobi series. As a special case of Theorem 1.10 in [4, p. 4] one obtains conditions under which the multiplier operators
\[ T_n f := \sum_{k=0}^{n} m(k)\, c_k^{(\alpha,\beta)}(f)\, p_k^{(\alpha,\beta)}, \qquad n \in \mathbb{N}, \]
with the Fourier coefficients $c_k^{(\alpha,\beta)}$ given in (5), are uniformly bounded on $L^p_{w^{(a,b)}}[-1,1]$. The result is given in the following Hörmander multiplier theorem. The formulation of the theorem can also be found in Ditzian and Dai’s work [1]. Theorem 3.3 (Muckenhoupt 1986). Let m : [0, ∞) → R be a bounded function satisfying
\[ |m^{(j)}(x)| \le c\, x^{-j} \quad \text{for all } x > 0 \text{ and } j \in \mathbb{N}. \tag{13} \]
If α, β > −1, 1 < p < ∞ and a, b ∈ R such that
\[ \frac{a+1}{\alpha+1},\ \frac{b+1}{\beta+1} < p, \tag{14} \]
then
\[ \|T_n f\|_{L^p_{w^{(a,b)}}[-1,1]} \le C\, \|f\|_{L^p_{w^{(a,b)}}[-1,1]} \]
for all n ∈ N and $f \in L^p_{w^{(a,b)}}[-1,1]$, where C = C(α, β, a, b, p) is a positive constant independent of f and n. For the uniform boundedness of the operator Tn , Theorem 3.3 assumes two conditions, namely the Hörmander condition (13) and condition (14). The latter is exactly the same as (3). Hence, using Theorem 1.1, we find that (14) is equivalent to the inclusion property $L^p_{w^{(a,b)}}[-1,1] \subset L^1_{w^{(\alpha,\beta)}}[-1,1]$. Thus, the inclusion property is a natural condition and must be assumed for the multiplier operator Tn f to be well-defined for all $f \in L^p_{w^{(a,b)}}[-1,1]$.
References [1] F. Dai and Z. Ditzian: “Littlewood-Paley theory and a sharp Marchaud inequality”, Acta Sci. Math. (Szeged), Vol. 71, (2005), pp. 65–90. [2] M. Felten: “Most of the First Order Jacobi K-Functionals are Equivalent”, submitted, pp. 1–12. [3] B. Muckenhoupt: “Mean convergence of Jacobi series”, Proc. Amer. Math. Soc., Vol. 23, (1969), pp. 306–310. [4] B. Muckenhoupt: “Transplantation theorems and multiplier theorems for Jacobi series”, Mem. Amer. Math. Soc., Vol. 64, (1986), pp. iv–86.
DOI: 10.2478/s11533-007-0016-2 Research article CEJM 5(3) 2007 512–522
Holomorphic automorphisms and collective compactness in J∗-algebras of operators Jos´e M. Isidro
∗
Facultad de Matem´ aticas, Santiago de Compostela, Spain
Received 21 December 2006; accepted 11 May 2007 Abstract: Let G be the Banach-Lie group of all holomorphic automorphisms of the open unit ball BA in a J∗ -algebra A of operators. Let $\mathcal{F}$ be the family of all collectively compact subsets W contained in BA . We show that the subgroup F ⊂ G of all those g ∈ G that preserve the family $\mathcal{F}$ is a closed Lie subgroup of G and characterize its Banach-Lie algebra. We make a detailed study of F when A is a Cartan factor. © Versita Warsaw and Springer-Verlag Berlin Heidelberg. All rights reserved. Keywords: J∗ -algebras, Cartan factors, holomorphic automorphisms, Banach-Lie groups, collective compactness. MSC (2000): 32M15, 22E65, 17C65, 17B65.
1
Introduction
Let A be a J∗-algebra, that is, a norm-closed complex vector subspace of L(H, K) closed under the triple product operation A → AA∗A. Here H, K are complex Hilbert spaces, L(H, K) is the space of all bounded linear operators T : H → K endowed with the operator norm, and A∗ denotes the usual adjoint of A ∈ L(H, K). It is known [2, 9] that the open unit ball BA of A is homogeneous under the action of the group G := Aut(BA) of all holomorphic automorphisms of BA. On the other hand, G is a real Banach-Lie group in the topology of uniform convergence over BA [14], and its Banach-Lie algebra g := aut(BA) consists of all complete holomorphic vector fields on BA, endowed with the topology of uniform convergence on the ball. In this paper, we are interested in the study of some naturally defined subgroups of G. For instance, in [6, 7, 12] the authors considered the ball BA of the J∗-algebra A := L(H, K), endowed it with the topology τ induced by one of the various natural topologies on L(H, K) and studied the subgroup of G of those g ∈ G that are continuous relative to τ. Here we consider a family $\mathcal{F}$ of subsets W ⊂ BA and study the subgroup F ⊂ G of those g ∈ G that preserve $\mathcal{F}$,

$$F := \{ g \in G : g(W) \in \mathcal{F} \ \ \forall W \in \mathcal{F} \}.$$

Since the elements of G are (holomorphic) homeomorphisms of BA they automatically preserve the family $\mathcal{F}_1$ of all compact subsets W ⊂ BA, hence F = G in this case. The same happens if we take $\mathcal{F}_2$ to be the family of all subsets W ⊂ BA such that dist(W, ∂BA) > 0 since the elements of G are isometries for the Carathéodory distance in BA. One can define many families $\mathcal{F}$ between these two “extreme” cases. In general, F will not be closed in G nor will it be an algebraic subgroup of G, and there is no reason to expect that F is a Lie subgroup of G in the induced topology.

In our case $\mathcal{F}$ will be the family of all collectively compact subsets W ⊂ BA. Recall that a subset W ⊂ A is said to be collectively compact if the collective image of the unit ball BH of H, that is, the set $W(B_H) := \bigcup_{T \in W} T(B_H)$, is relatively compact in K. Notice that collective compactness of W is defined in terms of the action of the operators in W, which does not involve either the J∗-algebra structure of A or the holomorphic structure of the ball BA. It is known that collective compactness is not preserved even by surjective linear isometries, see ([1] p. 422 and example 2.6). We establish necessary and sufficient conditions for an automorphism g ∈ G to preserve the family $\mathcal{F}$ and prove that F is a closed Lie subgroup of G whose Lie algebra h admits the Cartan decomposition $h = L_{\mathcal{F}} \oplus P_{\mathcal{F}}$. Here $L_{\mathcal{F}}$, the Lie algebra of the isotropy group of the origin, consists of those derivations of A which preserve $\mathcal{F}$. Moreover, $P_{\mathcal{F}}$ consists of the vector fields that, in the canonical coordinate system (the identity Id : BA → BA as a global chart), have the form $X = Q_A \frac{\partial}{\partial X}$, where $Q_A(X) = A - XA^*X$, (X ∈ A), and $A = Q_A(0)$ is a compact operator that belongs to A.

A deeper analysis is made when A is a special Cartan factor, in which case we prove that all surjective linear isometries in the identity connected component of the unitary group of A preserve collective compactness and all derivations of A preserve the family $\mathcal{F}$. Our main references are [1, 11] for background on collective compactness in the spaces of operators L(X, Y) with X, Y Banach spaces, [2, 3] for the study of J∗-algebras of operators and [2, 4, 9] for the study of their groups of holomorphic automorphisms. See also [12] and [7] for related problems.
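The sensitivity of collective compactness to the way operators act can be seen in a standard rank-one example. The following is only a sketch: the orthonormal sequence $(e_n)$ and the operators $T_n$ are introduced here for illustration and are not taken from [1]. It shows a collectively compact set whose set of adjoints is not collectively compact.

```latex
% Assumptions (illustrative): H = K is an infinite-dimensional Hilbert space,
% (e_n)_{n \ge 1} is an orthonormal sequence, and the inner product is linear
% in its first argument.
\[
  T_n h := \langle h, e_n\rangle\, e_1 , \qquad
  T_n^* k = \langle k, e_1\rangle\, e_n , \qquad n \ge 1 .
\]
% W = {T_n}: the collective image is a bounded subset of the line spanned by e_1.
\[
  W(B_H) = \bigcup_{n} T_n(B_H) \subset \{\lambda e_1 : |\lambda| < 1\}
  \quad\Longrightarrow\quad W \text{ is collectively compact.}
\]
% W^* = {T_n^*}: the collective image contains a sequence without convergent subsequence.
\[
  T_n^*\bigl(\tfrac12 e_1\bigr) = \tfrac12 e_n , \qquad
  \bigl\|\tfrac12 e_n - \tfrac12 e_m\bigr\| = \tfrac{1}{\sqrt2}\quad (n \ne m)
  \quad\Longrightarrow\quad W^* \text{ is not collectively compact.}
\]
```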
2 Notation and Preliminaries
The open unit ball of a Banach space Z is denoted by BZ. By L(X, Y) and K(X, Y) we denote the space of bounded operators and the closed subspace of compact operators, respectively, endowed with the operator norm. For X = Y we write L(X) instead of L(X, X), and X∗ is the dual space of X. For T ∈ L(X, Y), we let T∗ ∈ L(Y∗, X∗) be the transpose of T. By Isom(X) ⊂ L(X) we denote the group of all surjective linear isometries of X, endowed with the topology of the operator norm. Whenever G is a topological group, G0 denotes the connected component of the identity in G.

Definition 2.1. A subset W ⊂ L(X, Y) is said to be collectively compact if the set $W(B_X) := \bigcup_{T \in W} T(B_X)$ is relatively compact in Y.

If W ⊂ L(X, Y) is collectively compact then W ⊂ K(X, Y). A subset W ⊂ K(X, Y) is relatively compact in K(X, Y) if and only if both W and W∗ := {T∗ : T ∈ W} are collectively compact, and hence relative compactness implies collective compactness but the converse is not true. Surjective linear isometries of the space L(X, Y) preserve relative compactness since they are homeomorphisms, but in general they do not preserve collective compactness. For later reference we state the following lemmas, the proofs of which can be found in [1].

Lemma 2.2. Let K ⊂ C be a compact set and let U, V ⊂ K(X, Y) be collectively compact sets of operators. Then: K · U, U + V and U are collectively compact sets.

Lemma 2.3. Let X, Y, Z be Banach spaces and assume that U and V are subsets of L(X, Y) and L(Y, Z), respectively.
i) If U is a bounded subset of bounded operators and V is a collectively compact set of operators, then VU is collectively compact.
ii) If U is a collectively compact set of operators and V is a compact set of bounded operators, then VU is collectively compact.

We refer to [10] for the functional calculus in J∗-algebras (odd functional calculus) used below.

Proposition 2.4. Let $f(z) = \sum_{n=0}^{\infty} c_{2n+1} z^{2n+1}$ be an odd analytic function defined by a power series with radius of convergence R > 1, and let A be a J∗-algebra of operators. For each T ∈ A let f(T) ∈ A be the operator defined by the odd functional calculus. If W ⊂ BA is a collectively compact set, then so is f(W) := {f(T) : T ∈ W}.

Proof. Since R > 1, the series $f(z) = \sum c_{2n+1} z^{2n+1}$ is absolutely convergent at the point z = 1, that is, $\rho := \sum |c_{2n+1}| < \infty$. Since f is odd, we have f(z) = z g(z) where the radius of convergence of $g(z) := \sum_{n=0}^{\infty} c_{2n+1} z^{2n}$ is R > 1. Thus, for each T ∈ BA we have

$$\Big\| \sum c_{2n+1} (T^*T)^n \Big\| \le \sum |c_{2n+1}| \, \|T\|^{2n} \le \sum |c_{2n+1}| = \rho \tag{1}$$

and $g(T^*T) := \sum_{n=0}^{\infty} c_{2n+1} (T^*T)^n$ is a well-defined element in L(H). Moreover, (1) shows that U := {g(T∗T) : T ∈ W} is a bounded (and therefore equicontinuous) set of operators in L(H). The result then follows from (2.3) and f(W) ⊂ WU.
3 Collective compactness and holomorphic automorphisms
In what follows A ⊂ L(H, K) is a J∗-algebra of operators and Isom(A) is the Lie group of all surjective linear isometries of A whereas Der(A) is the Lie algebra of all J∗-algebra derivations of A. See [4] for background on this topic. We let G := Aut(BA) be the group of all holomorphic automorphisms of the open unit ball BA, endowed with the topology of local uniform convergence over BA. Recall [14] that on G, the topology of local uniform convergence is the same as the topology of uniform convergence on the unit ball BA, which (unless dim(A) < ∞) does not coincide with the topology of uniform convergence on compact subsets of BA. Recall also ([2] th. 2 and 3) that every g ∈ G can be represented in a unique way in the form g = MA L, where L ∈ Isom(A) and MA, a Möbius transformation of BA, is given by

$$M_A(T) := (I - AA^*)^{-1/2} (T + A)(I + A^*T)^{-1} (I - A^*A)^{1/2}, \qquad T \in B_A, \tag{2}$$
where A = g(0) ∈ BA. Here positive and negative square roots are defined by the usual series expansions and I at each occurrence denotes the identity mapping on the appropriate underlying Hilbert space. We let $\mathcal{F}$ and $\mathcal{F}_{B_A}$ be the families of all collectively compact subsets W in A and in the ball BA, respectively.

Theorem 3.1. For an operator A ∈ BA the following conditions are equivalent:
i) The operator A is compact.
ii) The quadratic map QA(T) := A − T A∗ T, (T ∈ A), preserves the family $\mathcal{F}$.
iii) MA preserves the family $\mathcal{F}_{B_A}$.

Proof. i) ⇔ ii). Assume that A is compact. Let W ∈ $\mathcal{F}$ be collectively compact. Since A∗ is bounded, U := WA∗W is collectively compact by (2.3 ii)). Clearly the set V := {A} ∈ $\mathcal{F}$ is collectively compact and so is QA(W) ⊂ V + U by (2.2 ii). For the converse, notice that W := {0} ∈ $\mathcal{F}$ is collectively compact, hence so is A = QA(0).

i) ⇔ iii). Assume A is compact. Let W ∈ $\mathcal{F}_{B_A}$ be a collectively compact subset of BA. Then $\|T\| \le 1$ for all T ∈ W and, as $\|A\| < 1$, we have

$$f(T) := T (I + A^*T)^{-1} = T \sum_{n=0}^{\infty} (-1)^n (A^*T)^n = T g(T),$$

where $g(T) := \sum_{n=0}^{\infty} (-1)^n (A^*T)^n$ is norm-convergent since

$$\|g(T)\| \le \sum_{n=0}^{\infty} \|A^*T\|^n \le \sum_{n=0}^{\infty} \|A\|^n \le \frac{1}{1 - \|A\|}, \qquad T \in B_A.$$

Therefore the set g(W) = {g(T) : T ∈ W} is bounded and hence equicontinuous in L(H, K). Hence by (2.3) i) f(W) = {T g(T) : T ∈ W} is a collectively compact subset in A. Moreover by (2.3),

$$V := \{ (A + T)(I + A^*T)^{-1} : T \in W \} \subset A\, g(W) + f(W)$$

is a collectively compact subset in A and it is quite simple to show that so is $M_A(W) = (I - AA^*)^{-1/2}\, V\, (I - A^*A)^{1/2}$. Since by assumption W ⊂ BA, we actually have MA(W) ⊂ BA and so MA(W) ∈ $\mathcal{F}_{B_A}$. For the converse, notice that A = MA(0).

Corollary 3.2. For every g = MA L ∈ G the following conditions are equivalent:
i) g preserves the family $\mathcal{F}_A$.
ii) The operator A = g(0) is compact and L preserves the family $\mathcal{F}_A$.

Proof. i) ⇒ ii). Clearly W := {0} is a collectively compact set in the ball, hence −A = −g(0) is compact. Hence by (3.1), $M_{-A} = M_A^{-1}$ preserves collective compactness in BA and therefore $L = M_A^{-1} g$ preserves collective compactness in the ball and also in A due to the linearity of L. The argument for the converse is similar.

Proposition 3.3. The set $F := \{ g \in G : g(\mathcal{F}) \subset \mathcal{F} \}$ is a closed subgroup of G.

Proof. It is clear that F is a subgroup of G. By [14], the topology of G is that of uniform convergence on BA. Let $g_n = M_{A_n} L_n$, (n ∈ N), be a sequence in F and assume that $\lim_{n\to\infty} g_n = g$ where g = MA L ∈ G. Since W := {0} is a collectively compact set in BA and $g_n \in F$, the operators $A_n = g_n(0)$ are compact. From $g_n \to g$ we get that $A = \lim_{n\to\infty} A_n$ is compact and $\|A\| < 1$. By (3.1), the Möbius transformation MA preserves the family $\mathcal{F}_{B_A}$. By [14] the relation $g_n \to g$ entails $L_n \to L$, convergence in the operator norm of L(A). To complete the proof we have to check that L preserves $\mathcal{F}$, and due to linearity, it suffices to show that it preserves the family $\mathcal{F}_{B_A}$. Let W be a collectively compact set in BA. We have to show that L(W) = {L(T) : T ∈ W} is collectively compact, that is, the set $\bigcup_{T \in W} L(T) B_H$ is totally bounded in K. Let ε > 0 be given. Since $L_n \to L$ we can fix an index n ∈ N such that $\|L - L_n\| \le \varepsilon$, where the norm is the operator norm in L(A). Since by (3.2), $L_n$ preserves the family $\mathcal{F}$, the set $\bigcup_{T \in W} L_n(T) B_H$ is totally bounded in K, hence there are finite sets

$$\{T_1, \ldots, T_r\} \subset W, \qquad \{h_1, \ldots, h_s\} \subset B_H$$

with the following property: given any T ∈ W and any h ∈ BH there is a pair $(T_i, h_j)$ for which $\|L_n(T)h - L_n(T_i)h_j\| \le \varepsilon$. But then, for T ∈ W and h ∈ BH we have

$$\|L(T)h - L(T_i)h_j\| \le \|L(T)h - L_n(T)h\| + \|L_n(T)h - L_n(T_i)h_j\| + \|L_n(T_i)h_j - L(T_i)h_j\| \le 3\varepsilon,$$

which completes the proof.
It is therefore reasonable to study conditions for an isometry L of A to preserve the family $\mathcal{F}$. We first analyze the case of a C∗-algebra. Let A := L(H); then the adjunction T → T∗ is a surjective linear isometry of A which does not preserve collective compactness ([1] page 422 and example 2.6). This suggests that we should restrict our considerations to Isom(A)0.

Proposition 3.4. Let A ⊂ L(H) be a unital C∗-algebra of operators acting on a Hilbert space H. Then every isometry L ∈ Isom(A)0 preserves collective compactness.

Proof. By a well known result of Kadison, every L ∈ Isom(A) is of the form

L(A) = Uρ(A),   A ∈ A,   (3)

where U = L(I) is a unitary element in A and ρ : A → A is an isometry that satisfies

$$\rho(I) = I, \qquad \rho(A^2) = \rho(A)^2, \qquad \rho(A^*) = \rho(A)^*,$$

that is, ρ is an element in the group Aut(A+) of all ∗-automorphisms of the Jordan-C∗-algebra A+ associated to A. Let U ⊂ A be the connected component of I in the set of unitary elements of A. From (3), it is clear that Isom(A)0 consists of the transformations

A → L(A) = Uρ(A),   U ∈ U, ρ ∈ Aut(A+)0.

By (2.3), in order to prove the statement it suffices to consider the elements ρ ∈ Aut(A+)0. Assume that ρ lies in that group and that $\|\rho - \mathrm{Id}\| \le 2/3$, where the norm is the operator norm in L(A). Then by ([2] lemma 1 page 26) we have

ρ(A) = UAU∗,   A ∈ A,

where U is a unitary operator in the weak-operator closure of A in L(H). Hence ρ preserves the family $\mathcal{F}$ by (2.3). Finally, any element in Aut(A+)0 preserves collective compactness since Aut(A+)0 is a connected Lie group and therefore it is generated by any neighbourhood of its identity element.

At this point, it is reasonable to ask what happens in the case of an arbitrary J∗-algebra.

Proposition 3.5. Let A be a J∗-algebra. Then, for every derivation δ ∈ Der(A), the following conditions are equivalent:
i) δ preserves the family $\mathcal{F}$ of all collectively compact subsets in A.
ii) For each t ∈ R, the isometry of A given by g(t) := exp tδ preserves the family $\mathcal{F}$.

Proof. i) ⇒ ii). Assume that δ ∈ Der(A) preserves the family $\mathcal{F}$. We have to show that exp tδ ∈ G preserves $\mathcal{F}$. Obviously we may assume t = 1. Let W ∈ $\mathcal{F}$. We have to prove that (exp δ)W BH is totally bounded in K. Let ε > 0 be given. By the properties of the exponential, there is an index N ∈ N such that $\sum_{n=N+1}^{\infty} \frac{1}{n!} \|\delta\|^n \le \varepsilon$. Since W ⊂ BA, we have $\|T\| \le 1$ for all T ∈ W and hence

$$\Big\| \sum_{n=N+1}^{\infty} \frac{1}{n!} (\delta^n T) h \Big\| \le \sum_{n=N+1}^{\infty} \frac{1}{n!} \|\delta^n\| \, \|T\| \, \|h\| \le \sum_{n=N+1}^{\infty} \frac{1}{n!} \|\delta\|^n \le \varepsilon$$

holds for all T ∈ W and all h ∈ BH. To shorten the notation, set $p(\delta) := I + \frac{1}{1!}\delta + \cdots + \frac{1}{N!}\delta^N$. By assumption δ preserves $\mathcal{F}$, hence each of the sets $W, \delta W, \ldots, \delta^N W$ is collectively compact and so is p(δ)W due to (2.2) and the inclusion

$$p(\delta)W \subset W + \delta W + \cdots + \frac{1}{N!}\delta^N W.$$

Therefore there are finite sets

$$\{T_1, \ldots, T_r\} \subset W, \qquad \{h_1, \ldots, h_s\} \subset B_H$$

with the following property: for each T ∈ W and each h ∈ BH there is a pair $(T_i, h_j)$ such that $\|p(\delta)Th - p(\delta)T_i h_j\| \le \varepsilon$. Now

$$\|(\exp\delta)Th - p(\delta)T_i h_j\| = \Big\| \Big[ p(\delta) + \sum_{n=N+1}^{\infty} \frac{1}{n!}\delta^n \Big] Th - p(\delta)T_i h_j \Big\| \le \|p(\delta)Th - p(\delta)T_i h_j\| + \Big\| \sum_{n=N+1}^{\infty} \frac{1}{n!}\delta^n Th \Big\| \le 2\varepsilon,$$

which proves the claim.

ii) ⇒ i). Assume that t ∈ R → g(t) ∈ F is a one-parameter group of surjective linear isometries of A such that each g(t) preserves the family $\mathcal{F}$. Due to the inclusion F ⊂ G, we have a one-parameter group in G and therefore there exists a derivation δ ∈ Der(A) such that g(t) = exp tδ for t ∈ R. We claim that δ preserves the family $\mathcal{F}$. In order to prove it, we have to check that whenever W ⊂ BA is a collectively compact subset of operators in BA, the set δ(W) ⊂ A is collectively compact, that is, δ(W)BH is totally bounded in K. Let W be as mentioned and let ε > 0 be given. From the definition of the exponential we have

$$\Big\| \frac{1}{t}(\exp t\delta - I) - \delta \Big\| \le \varepsilon$$

for sufficiently small values of |t| ≤ τ, t ≠ 0, the norm being the operator norm in L(A). Fix any t = t0 in the above conditions, and set

$$f(t_0) := \frac{1}{t_0}(\exp t_0\delta - I)$$

to shorten the writing. Since $M := \sup_{T \in W} \|T\| < \infty$, we have for all T ∈ W and all h ∈ BH

$$\|f(t_0)Th - \delta(T)h\| \le \|f(t_0) - \delta\| \, \|T\| \, \|h\| \le M\varepsilon.$$

By assumption g(t0) = exp t0δ preserves the family $\mathcal{F}$, therefore the sets (exp t0δ)W ⊂ A and W are collectively compact, hence by (2.2) $f(t_0)(W) = \frac{1}{t_0}(\exp t_0\delta - I)(W) \subset A$ is collectively compact, too. Hence f(t0)W BH ⊂ K is totally bounded. Thus there are finite sets

$$\{T_1, \ldots, T_r\} \subset W, \qquad \{h_1, \ldots, h_s\} \subset B_H$$

with the following property: for each T ∈ W and h ∈ BH there is a pair $(T_i, h_j)$ such that $\|f(t_0)Th - f(t_0)T_i h_j\| \le \varepsilon$. Clearly $\{\delta(T_i)h_j : 1 \le i \le r, 1 \le j \le s\}$ is a finite subset of δ(W)BH and by construction

$$\|\delta(T)h - \delta(T_i)h_j\| \le \|\delta(T)h - f(t_0)Th\| + \|f(t_0)Th - f(t_0)T_i h_j\| + \|f(t_0)T_i h_j - \delta(T_i)h_j\| \le \varepsilon\|T\| + \varepsilon + \varepsilon\|T_i\| \le (2M + 1)\varepsilon$$

for all T ∈ W and all h ∈ BH, which shows that δ(W)BH is totally bounded.
For U, V ∈ A, the operator U □ V ∈ L(A) is given by $(U \,\square\, V)X := \frac{1}{2}(UV^*X + XV^*U)$, (X ∈ A), and U □ V − V □ U is a derivation of A. We let Inder(A) be the closure (in the operator norm of Der(A)) of the real linear span of the set of derivations that have the above form,

$$\mathrm{Inder}(A) := \overline{\mathrm{span}}\{ U \,\square\, V - V \,\square\, U : U, V \in A \}.$$

It is clear from (2.3) that U □ V − V □ U preserves collective compactness in A and so do the elements of Inder(A) as one can see by a routine argument on total boundedness similar to that made in (3.3). More generally let $\mathrm{Inder}_{\mathcal{F}}(A)$ denote the closure of Inder(A) in Der(A) relative to the topology τ of uniform convergence on the sets W in $\mathcal{F}$. Then

Lemma 3.6. If A is a J∗-algebra then $\mathrm{Inder}_{\mathcal{F}}(A)$ is a Lie subalgebra of Der(A) and each element in it preserves collective compactness in A.

In general $\mathrm{Inder}_{\mathcal{F}}(A)$ is a proper subset of Der(A). For a detailed study of derivations of a JB∗-triple see [5]. From now on, we assume that the J∗-algebra A is a special Cartan factor. This will provide us with detailed information on the group Isom(A). Indeed, for rectangular Cartan factors, which are the spaces A := L(H, K) with Hilbert spaces H, K, dim H ≤ dim K, every surjective isometry has one of the forms (see [8] Satz 4)

$$L_{U,V}(T) = VTU, \qquad L_{U,V}(T) = VT^{t}U, \qquad T \in A,$$

where $T^{t}$ stands for the transpose of T, U and V are unitary operators on H and K, respectively, and the second form does not occur unless dim H = dim K.
Let H be a Hilbert space with a conjugation $x \mapsto \bar{x}$, (x ∈ H), and let $T \mapsto T^{t}$, where $T^{t}x := \overline{T^*(\bar{x})}$, be the associated transposition on L(H). The symmetric and anti-symmetric Cartan factors are the spaces $L(H)_\varepsilon := \{ T \in L(H) : T^{t} = \varepsilon T \}$, where ε = +1 and ε = −1, respectively. The surjective linear isometries of these factors are the mappings $L_U(T) = UTU^{t}$, $T \in L(H)_\varepsilon$, where U is a unitary operator on H. Thus, for Cartan factors of type I with dim H ≠ dim K and for Cartan factors of type II or III all surjective linear isometries are spatial, that is, they are induced by isometries of the underlying Hilbert spaces H, K, and hence they preserve collective compactness. Cartan factors of type I with dim H = dim K also have the isometry $T \mapsto T^{t}$ which is not induced by any isometry of H and does not preserve collective compactness in L(H). By ([12] prop 9), the only compact operator contained in an infinite-dimensional spin factor (Cartan factor of type IV) is the null operator. Hence, in this case the family $\mathcal{F}$ only contains the set W = {0} and therefore any L ∈ Isom(A) preserves $\mathcal{F}$.

We consider the open ball BA as a Banach manifold in the canonical atlas (consisting of the identity map Id : BA → BA as a local chart). In this local coordinate, each holomorphic vector field can be represented in the form $X = h \frac{\partial}{\partial X}$, where h : BA → A is a holomorphic map, and we shall identify X with the function h with no danger of confusion. The set $\mathfrak{g} := \mathrm{aut}(B_A)$ of complete holomorphic vector fields on BA is a Lie algebra in the usual vector space operations and the Jacobi bracket. We have the vector space direct sum decomposition

$$\mathfrak{g} = L \oplus P \tag{4}$$

where L = Der(A) is the Lie algebra of all triple derivations of A and P consists of the quadratic vector fields $X = Q_A \frac{\partial}{\partial X}$ where $Q_A(X) = A - XA^*X$ for A ∈ A. Moreover

$$[L, L] \subset L, \qquad [L, P] \subset P, \qquad [P, P] \subset L$$

and more precisely

$$[\delta_1, \delta_2] = \delta_1\delta_2 - \delta_2\delta_1, \qquad [\delta, Q_A] = Q_{\delta(A)}, \qquad [Q_A, Q_B] = 2(A \,\square\, B - B \,\square\, A). \tag{5}$$

Finally, for $X = h \frac{\partial}{\partial X} \in \mathfrak{g}$ the expression $\|X\| := \sup_{\|X\| \le 1} \|h(X)\|$ defines a norm with respect to which $\mathfrak{g}$ is a Banach-Lie algebra and the decomposition (4) is topological. We say that a vector field $X = h \frac{\partial}{\partial X}$ preserves the family $\mathcal{F}$ if so does the function h, that is, $h(\mathcal{F}) \subset \mathcal{F}$. Define

$$\mathfrak{h} := \Big\{ X = h \frac{\partial}{\partial X} \in \mathfrak{g} : h(\mathcal{F}) \subset \mathcal{F} \Big\}.$$

Proposition 3.7. $\mathfrak{h}$ is a closed Lie subalgebra of $\mathfrak{g}$ and we have the topological vector space direct sum decomposition

$$\mathfrak{h} = L_{\mathcal{F}} \oplus P_{\mathcal{F}}$$

where $L_{\mathcal{F}}$ consists of those derivations δ ∈ Der(A) that preserve $\mathcal{F}$ and $P_{\mathcal{F}} := \{ Q_A : A \in K(H, K) \}$. Moreover, F is a closed Banach-Lie subgroup of G whose Lie algebra is $\mathfrak{h}$.

Proof. From (5) and (2.2) it is clear that $\mathfrak{h}$ is a Lie subalgebra of $\mathfrak{g}$. To show that $\mathfrak{h}$ is closed in $\mathfrak{g}$, let $X_n = \delta_n + Q_{A_n}$ be a sequence in $\mathfrak{h}$ with $X_n \to X$ where $X = \delta + Q_A \in \mathfrak{g}$. Since the decomposition (4) is topological, the canonical projections from $\mathfrak{h}$ onto L and P are continuous, and hence we have $\delta_n \to \delta$ and $Q_{A_n} \to Q_A$. By assumption $A_n \in K(H, K)$ are compact operators and the relation $A_n = Q_{A_n}(0) \to Q_A(0) = A$ gives that A is compact and so $Q_A \in P_{\mathcal{F}}$. The proof that $L_{\mathcal{F}}$ is closed is a repetition of the arguments made in (3.3). The fact that F is a Lie subgroup of G follows by standard arguments from (3.1) and (3.5).

Corollary 3.8. Let g = MA L be a holomorphic automorphism of the unit ball BA of a special Cartan factor A. If A is of type I with dim H ≠ dim K or of type II or III then g preserves collective compactness if and only if A = g(0) is a compact operator. If A is of type I with dim H = dim K then the same is true for the automorphisms g in the connected component of the identity of G. If A is of type IV and dim A = ∞ then g preserves collective compactness if and only if g is a surjective linear isometry of A.
Acknowledgement
This work was supported by Ministerio de Educación y Cultura of Spain, Research Project MTM 2005-02541.
References
[1] P.M. Anselone and T.W. Palmer: “Collectively compact sets of linear operators”, Pac. J. Math., Vol. 25, (1968), pp. 417–422.
[2] L.A. Harris: “Bounded symmetric homogeneous domains in infinite-dimensional spaces”, In: Proceedings on Infinite Dimensional Holomorphy, Lecture Notes in Mathematics, Vol. 364, Springer-Verlag, 1974, pp. 13–40.
[3] L.A. Harris: “A generalization of C∗-algebras”, P. Lond. Math. Soc., Vol. 42, (1981), pp. 331–361.
[4] L.A. Harris and W. Kaup: “Linear algebraic groups in infinite dimensions”, Illinois J. Math., Vol. 21, (1977), pp. 666–674.
[5] T. Ho, J. Martinez Moreno, A. Peralta and B. Russo: “Derivations on real and complex JB∗-triples”, J. Lond. Math. Soc., Vol. 65, (2002), pp. 85–102.
[6] J.M. Isidro and W. Kaup: “Weak continuity of holomorphic automorphisms in JB*-triples”, Math. Z., Vol. 210, (1992), pp. 277–288.
[7] J.M. Isidro and L.L. Stachó: “Weakly and weakly* continuous elements in JBW*-triples”, Acta Sci. Math. (Szeged), Vol. 57, (1993), pp. 555–567.
[8] W. Kaup: “Über die Automorphismen Grassmannscher Mannigfaltigkeiten unendlicher Dimension”, Math. Z., Vol. 144, (1975), pp. 75–96.
[9] W. Kaup: “A Riemann mapping theorem for bounded symmetric domains in complex Banach spaces”, Math. Z., Vol. 183, (1983), pp. 503–529.
[10] W. Kaup: “Hermitian Jordan Triple Systems and Automorphisms of Bounded Symmetric Domains”, In: Santos González (Ed.): Non-Associative Algebras and Applications, Kluwer Academic Publishers, 1994, pp. 204–214.
[11] T.W. Palmer: “Totally bounded sets of precompact linear operators”, P. Am. Math. Soc., Vol. 20, (1969), pp. 101–106.
[12] L.L. Stachó and J.M. Isidro: “Algebraically compact elements in JB∗-triples”, Acta Sci. Math. (Szeged), Vol. 54, (1990), pp. 171–190.
[13] H. Upmeier: “Symmetric Banach Manifolds and Jordan C∗-Algebras”, In: North Holland Mathematics Studies, Vol. 104, North Holland, Amsterdam, 1985.
[14] J.P. Vigué and J.M. Isidro: “Sur la topologie du groupe des automorphismes analytiques d’un domaine cerclé borné”, B. Sci. Math., Vol. 106, (1982), pp. 417–426.
DOI: 10.2478/s11533-007-0018-0 Research article CEJM 5(3) 2007 523–550
Low rank Tucker-type tensor approximation to classical potentials
B.N. Khoromskij, V. Khoromskaia
Max-Planck-Institute for Mathematics in the Sciences, Inselstr. 22-26, D-04103 Leipzig, Germany.
Received 17 December 2006; accepted 7 May 2007
Abstract: This paper investigates best rank-$(r_1, \ldots, r_d)$ Tucker tensor approximation of higher-order tensors arising from the discretization of linear operators and functions in $\mathbb{R}^d$. Super-convergence of the best rank-$(r_1, \ldots, r_d)$ Tucker-type decomposition with respect to the relative Frobenius norm is proven. Dimensionality reduction by the two-level Tucker-to-canonical approximation is discussed. Tensor-product representation of basic multi-linear algebra operations is considered, including inner, outer and Hadamard products. Furthermore, we focus on fast convolution of higher-order tensors represented by the Tucker/canonical models. Optimized versions of the orthogonal alternating least-squares (ALS) algorithm are presented taking into account the different formats of input data. We propose and test numerically the mixed CT-model, which is based on the additive splitting of a tensor as a sum of canonical and Tucker-type representations. It allows one to stabilize the ALS iteration in the case of “ill-conditioned” tensors. The best rank-$(r_1, \ldots, r_d)$ Tucker decomposition is applied to 3D tensors generated by classical potentials, for example $\frac{1}{|x-y|}$, $e^{-\alpha|x-y|}$, $\frac{e^{-|x-y|}}{|x-y|}$ and $\frac{\mathrm{erf}(|x|)}{|x|}$ with $x, y \in \mathbb{R}^d$. Numerical results for tri-linear decompositions illustrate exponential convergence in the Tucker rank, and robustness of the orthogonal ALS iteration.
© Versita Warsaw and Springer-Verlag Berlin Heidelberg. All rights reserved.
Keywords: Kronecker products, Tucker decomposition, multi-dimensional integral operators, multivariate functions, classical potentials
MSC (2000): 65F30, 65F50, 65N35, 65F10
1 Introduction
Numerical tensor decomposition methods designed initially for the problems in chemometrics and electronic engineering are becoming more and more attractive for application in large-scale numerical computations in higher dimensions [1, 2, 4–6]. Indeed, numerical tensor-product decomposition gives the possibility to construct fast and economical algorithms with linear or even sub-linear scaling for computations involving large higher-order tensors. Typical examples are the equations of many-particle modeling for electronic structure calculations which evoke rigorous computations of multi-dimensional interactions via classical potentials. For these purposes, the key point is the efficient approximation of fully populated higher-order tensors representing multivariate functions and operators by using certain data-sparse Kronecker product structures.

Computational techniques of tensor decomposition can be understood as higher-order analogues of standard linear algebra methods for matrix-vector and matrix-matrix calculations. In general, the efficient multi-linear algebra (MLA) tools cannot be derived via the straightforward extension of classical numerical linear algebra. In fact, instead of “linear operations” such as SVD or EVD factorizations (finite algorithms), we arrive at challenging nonlinear optimization problems.

In this paper we apply best rank-$(r_1, \ldots, r_d)$ approximation via the Tucker model which can be viewed as an extension of the best rank-r approximation to a matrix. This model allows dimensionality reduction via transformation of the initial large, higher-order tensor to a smaller array of representation coefficients, i.e., the core tensor, with respect to the problem-dependent orthogonal tensor-product basis, the so-called Tucker factors or orthogonal matrix components. We also discuss the possible optimization of the Tucker decomposition by applying the so-called CANDECOMP/PARAFAC (CP) model (cf. [3, 17]) to the core tensor (see Appendix for the definition). This model was first rigorously analyzed in [22].

Numerical treatment of multi-dimensional operators/functions including tensor-tensor operations is usually limited by insufficient computational resources for the required MLA. As a natural remedy, tensor-product representation provides the performance of standard tensor operations with asymptotically optimal complexity.

In Section 2 we estimate the complexity of tensor-product implementations of the inner, outer and contracted products as well as the Hadamard and convolution products of d-th order tensors. Then we discuss three versions of the orthogonal alternating least-squares algorithm (OALSA) to compute the best rank-$(r_1, \ldots, r_d)$ Tucker-type decomposition. The first one addresses decomposing the full-format tensor into the Tucker format (OALSA(F → Tr)), see [5] for a detailed description of the traditional algorithm. The second one works with the input data presented as components in the CP model (OALSA(CR → Tr)), while the third algorithm applies to the input data given in the Tucker format (OALSA(TR → Tr)). The latter modifications can be interpreted as rank reduction methods in the CP and Tucker models, respectively. In §2.4.3 we discuss the mixed CP-Tucker and Tucker-Tucker models.‡ Numerical results show that mixed models allow a stabilization of the ALS iteration in the case of “ill-conditioned” target tensors.

‡ We thank the referee for drawing our attention to the fact that a more general “block term decomposition” was introduced by L. De Lathauwer at the workshop on Tensor Decompositions, Luminy, France, 2005 (see [8]).
Lemma 2.4 proves the super-convergence property of the best rank-$(r_1, \ldots, r_d)$ Tucker-type approximation with respect to the Frobenius norm. To optimize the numerical operator calculus, we discuss different strategies to “compress” the core tensor in the Tucker model:
(A) Best N-term approximation of the core tensor via element-wise truncation;
(B) The CP decomposition of the core tensor (cf. the two-level Tucker model [20] and dimensionality reduction in [6]).
Lemma 2.5 shows that the CP decomposition of a tensor represented in the Tucker format can be reduced to the CP approximation of the corresponding “small size” core tensor. Corollary 2.6 describes a two-level method for rank-reduction in the CP model.

In Section 3, the Tucker model is applied to classical potentials. We give a number of numerical examples illustrating efficiency of the orthogonal rank-$(r_1, \ldots, r_d)$ decomposition via OALSA applied to a class of tensors related to the Newton, Yukawa and Helmholtz potentials. Furthermore, we discuss the numerical Tucker-type decomposition of certain tensors related to the Hartree-Fock equation. Main observations from our numerics are the following:
- Exponential convergence of OALSA in the Tucker rank $r = \max_\ell r_\ell$ (cf. (13)).
- Quadratic convergence for the relative energy (cf. Lemma 2.4).
- Robust convergence of ALS iteration applied to the classical potentials.
- Efficient tensor operations in the Tucker/CP formats leading to asymptotically optimal MLA, see §2.3.

In the Appendix we present auxiliary results describing the Lagrange equation for the dual maximization problem, define the canonical decomposition and discuss quadratic convergence for the eigenvalues in the familiar Rayleigh quotient approximation (linear algebra analogue to Lemma 2.4).
2 Rank-(r1, ..., rd) Tucker-type Decomposition
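2.1 Preliminaries

We consider the linear space of real-valued d-th order tensors $A = [a_{i_1 \ldots i_d}] \in \mathbb{R}^{I}$, defined on the product index set $I = I_1 \times \ldots \times I_d$ with $I_\ell := \{1, \ldots, n_\ell\}$, $(1 \le \ell \le d)$. We make use of the Frobenius ($\ell_2$-energy) norm $\|A\| := \sqrt{\langle A, A \rangle}$ induced by the inner product

$$\langle A, B \rangle := \sum_{(i_1, \ldots, i_d) \in I} a_{i_1 \ldots i_d}\, b_{i_1 \ldots i_d} \quad \text{with } A, B \in \mathbb{R}^{I}. \tag{1}$$

It corresponds to the Euclidean norm of a vector. In the following the notation “◦” means the outer product of vectors which form the canonical (rank-1) tensor

$$U \equiv \{u_{\mathbf{i}}\}_{\mathbf{i} \in I} = b \cdot U^{(1)} \circ \ldots \circ U^{(d)} \in \mathbb{R}^{I}, \qquad b \in \mathbb{R},$$

defined by the entries

$$u_{i_1 \ldots i_d} = b \cdot u^{(1)}_{i_1} \cdots u^{(d)}_{i_d} \quad \text{with } U^{(\ell)} \equiv \{u^{(\ell)}_{i_\ell}\}_{i_\ell \in I_\ell} \in \mathbb{R}^{I_\ell}. \tag{2}$$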
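As a small illustration of (1) and (2), the following sketch builds a rank-1 tensor entry by entry and evaluates its Frobenius norm. It is a minimal example rather than part of the paper's algorithms: the helper names `rank1_tensor`, `inner` and `frobenius_norm` are hypothetical, tensors are stored as dictionaries keyed by multi-index, and indices are 0-based instead of the 1-based convention used in the text.

```python
import math
from itertools import product


def rank1_tensor(b, vectors):
    """Entries (2): u[i1..id] = b * u1[i1] * ... * ud[id], stored by multi-index."""
    index_sets = [range(len(v)) for v in vectors]
    return {
        idx: b * math.prod(v[i] for v, i in zip(vectors, idx))
        for idx in product(*index_sets)
    }


def inner(A, B):
    """Inner product (1): sum of entrywise products over the whole index set."""
    return sum(A[idx] * B[idx] for idx in A)


def frobenius_norm(A):
    """Frobenius norm induced by the inner product (1)."""
    return math.sqrt(inner(A, A))


def euclid(v):
    """Euclidean norm of a plain vector."""
    return math.sqrt(sum(x * x for x in v))


# Illustrative data (assumed, not from the paper): d = 3, sizes 2 x 3 x 2.
u1, u2, u3 = [1.0, 2.0], [0.0, 3.0, 4.0], [2.0, -1.0]
U = rank1_tensor(0.5, [u1, u2, u3])

# For a rank-1 tensor the norm factorizes as |b| * ||u1|| * ||u2|| * ||u3||.
expected = 0.5 * euclid(u1) * euclid(u2) * euclid(u3)
print(frobenius_norm(U), expected)
```

The factorization of the norm used in the last lines follows directly from (1) and (2), since the sum over the product index set splits into a product of one-dimensional sums.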
526
B.N. Khoromskij et al. / Central European Journal of Mathematics 5(3) 2007 523–550
There is an alternative commonly used notation "$\otimes$" for the outer product of vectors. The Kronecker product $\otimes$ usually applies when tensors are represented in the vector/matrix formats with respect to some fixed basis. Given $A \in \mathbb{R}^{I}$, the rank-$(r_1, \ldots, r_d)$ Tucker model deals with the approximation over a sum of rank-1 tensors
$$A_{(r)} = \sum_{k_1=1}^{r_1} \ldots \sum_{k_d=1}^{r_d} b_{k_1 \ldots k_d}\, V^{(1)}_{k_1} \circ \ldots \circ V^{(d)}_{k_d} \approx A, \tag{3}$$
where the components $V^{(\ell)}_{k_\ell} \in \mathbb{R}^{I_\ell}$ ($k_\ell = 1, \ldots, r_\ell$, $\ell = 1, \ldots, d$) are real-valued vectors of the respective size $n_\ell = \# I_\ell$, $\mathbf{r} = (r_1, \ldots, r_d)$ is the Tucker rank and $b_{k_1 \ldots k_d} \in \mathbb{R}$ (cf. Fig. 1 visualizing (3) for $d = 3$). An intrinsic feature of the Tucker model is that the vectors $\{V^{(\ell)}_{k_\ell}\}$ are orthonormal,
$$\langle V^{(\ell)}_{k_\ell}, V^{(\ell)}_{m_\ell} \rangle = \delta_{k_\ell, m_\ell}, \quad k_\ell, m_\ell = 1, \ldots, r_\ell; \ \ell = 1, \ldots, d, \tag{4}$$
where $\delta_{k_\ell, m_\ell}$ is Kronecker's delta (i.e., $\mathbf{V}^{(\ell)} = [V^{(\ell)}_1 V^{(\ell)}_2 \ldots V^{(\ell)}_{r_\ell}]$ is an orthogonal matrix, $\mathbf{V}^{(\ell)T} \mathbf{V}^{(\ell)} = I$ for $\ell = 1, \ldots, d$).

[Figure: the tensor $A$ of size $I_1 \times I_2 \times I_3$ equals the core $B$ of size $r_1 \times r_2 \times r_3$ multiplied by the factor matrices $V^{(1)}$, $V^{(2)}$, $V^{(3)}$.]
Fig. 1 Visualization of the Tucker model for a third-order tensor.

In the following, we denote by $\mathcal{T}_{(\mathbf{n}, \mathbf{r})}$ (shortly $\mathcal{T}_{\mathbf{r}}$) the set of tensors parameterized by (3), (4). For $A_{(r)} \in \mathcal{T}_{\mathbf{r}}$, we use the concise notation
$$A_{(r)} = B \times_1 \mathbf{V}^{(1)} \times_2 \mathbf{V}^{(2)} \ldots \times_d \mathbf{V}^{(d)} \tag{5}$$
with the orthogonal matrices $\mathbf{V}^{(\ell)} \in \mathbb{R}^{I_\ell \times r_\ell}$ and with the core tensor $B = \{b_{\mathbf{k}}\} \in \mathbb{R}^{r_1 \times \ldots \times r_d}$. Here $\times_\ell$ ($\ell = 1, \ldots, d$) denotes the conventional $\ell$-mode product of a tensor by a matrix of the respective size.

The so-called CANDECOMP/PARAFAC (CP) decomposition can be formally introduced as a particular case of the Tucker model (3) corresponding to the choice $r_\ell = r$ ($\ell = 1, \ldots, d$), where only the super-diagonal of $B$ contains nonzero elements (see Appendix). However, from the computational point of view the Tucker and CP models have completely different features. We denote by $\mathcal{C}_{(\mathbf{n}, r)}$ (shortly $\mathcal{C}_r$) the corresponding set of $d$-th order tensors. Notice that each tensor in $\mathcal{T}_{\mathbf{r}}$ can be interpreted (as a highly redundant representation) as an element in $\mathcal{C}_r$ with $r = r_1 \cdots r_d$, i.e. we have $\mathcal{T}_{\mathbf{r}} \subset \mathcal{C}_r$.
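As a minimal illustration of the notation in (3) and (5), the following NumPy sketch evaluates a Tucker tensor from its core and factor matrices through successive $\ell$-mode products and compares the result with the explicit sum of rank-1 terms. The helper names `mode_product` and `tucker_to_full`, as well as the sizes and the random data, are assumptions made for this example only.

```python
import numpy as np

def mode_product(T, M, mode):
    """l-mode product T x_mode M for a matrix M of shape (n, r)."""
    # Contract the columns of M with axis `mode` of T (which has size r);
    # tensordot puts the new axis of size n first, so move it back.
    out = np.tensordot(M, T, axes=(1, mode))
    return np.moveaxis(out, 0, mode)

def tucker_to_full(core, factors):
    """Evaluate B x_1 V1 x_2 V2 ... x_d Vd as a full array, cf. (5)."""
    T = core
    for mode, V in enumerate(factors):
        T = mode_product(T, V, mode)
    return T

rng = np.random.default_rng(0)
n, r = (6, 7, 8), (2, 3, 4)
# Factor matrices with orthonormal columns, taken from a reduced QR.
factors = [np.linalg.qr(rng.standard_normal((n[l], r[l])))[0] for l in range(3)]
core = rng.standard_normal(r)

A = tucker_to_full(core, factors)
# The same tensor written as the sum of rank-1 terms in (3).
A_sum = np.einsum('abc,ia,jb,kc->ijk', core, *factors)
```

Because the factor columns are orthonormal, the Frobenius norms of `A` and `core` coincide, which is the identity used later in the proof of Lemma 2.4.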
We notice that in the CP decomposition the factors are not necessarily orthogonal and, in general, $r$ can be larger than $n = \max_\ell n_\ell$.
In some applications (say, in the numerical calculus of multi-dimensional operators) both the Tucker and CP models can be combined gainfully (cf. the two-level Tucker model in [20] and a dimensionality reduction approach in the calculation of the CP model [6]). Hence, in the following, we also discuss the two-level Tucker model $\mathcal{T}_{(\mathbf{n}, \mathbf{r}, q)}$, which contains all elements $A \in \mathcal{T}_{(\mathbf{n}, \mathbf{r})}$ such that the corresponding core tensor satisfies $B \in \mathcal{C}_{(\mathbf{r}, q)}$. Clearly, we have $\mathcal{T}_{(\mathbf{n}, \mathbf{r}, q)} \subset \mathcal{C}_{(\mathbf{n}, q)}$. In the following, we assume that $q \le |\mathbf{r}| := \max_\ell r_\ell$ (see the discussion in §2.3).
Remark 2.1. Let $r_\ell = r$, $n_\ell = n$ ($\ell = 1, \ldots, d$). Then the Tucker model requires only $drn$ numbers to represent the Tucker components (we do not take into account the orthogonality constraints) plus $r^d$ memory units for the core tensor. In general, the memory consumption is $\sum_{\ell=1}^{d} r_\ell n_\ell + \prod_{\ell=1}^{d} r_\ell$. Compared with the canonical representation in $\mathcal{C}_{(\mathbf{n}, r)}$, the Tucker model requires an extra memory of $r^d - r$ (our numerical results indicate that the additional cost of storing the core tensor pays off in full). On the other hand, the two-level Tucker model $\mathcal{T}_{(\mathbf{n}, \mathbf{r}, q)}$ has the reduced memory consumption $dr(n + q)$ since, in general, $drq \ll r^d$, providing a good basis for various tensor-tensor operations (see §2.2).
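The counts in Remark 2.1 are easy to tabulate. The sketch below simply encodes the three formulas of the remark for equal mode sizes; the function names and the sample values $n = 200$, $r = 20$, $q = 10$, $d = 3$ are illustrative assumptions, not data from the paper.

```python
def tucker_memory(n, r, d):
    """d*r*n numbers for the factors plus r**d for the full core."""
    return d * r * n + r ** d

def cp_memory(n, r, d):
    """d*r*n numbers for the factors plus r super-diagonal weights."""
    return d * r * n + r

def two_level_memory(n, r, q, d):
    """Factors plus a rank-q canonical core: d*r*(n + q)."""
    return d * r * (n + q)

n, r, q, d = 200, 20, 10, 3
mem_tucker = tucker_memory(n, r, d)          # 12000 + 8000
mem_cp = cp_memory(n, r, d)                  # 12000 + 20
mem_two_level = two_level_memory(n, r, q, d)
extra = mem_tucker - mem_cp                  # equals r**d - r
```

For these sample values the full core dominates neither format, but the gap $r^d - r$ grows quickly with $d$, which is the motivation for the two-level model.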
2.2 Tensorization of Basic MLA Operations
For the sake of clarity (and without loss of generality), in this section we consider the case $r_\ell = r$, $n_\ell = n$ ($\ell = 1, \ldots, d$). If there is no confusion, we skip the index $n$. We denote by $N$ the complexity of various tensor operations (say, $N_{\langle \cdot, \cdot \rangle}$) or the related memory requirements (say, $N_{mem(B)}$). We distinguish the following standard tensor-product operations: the inner product, the outer and the contracted products (cf. [1]) as well as the so-called Hadamard (component-wise) product. To estimate the complexity of the numerical decomposition in $\mathcal{T}_{\mathbf{r}}$ and related computations, we take a closer look at the standard MLA operations in the Tucker format§:
(I) Memory demands;
(II) Frobenius and generalized $\ell_2$-inner product;
(III) Various tensor-times-tensor operations including the Hadamard product;
(IV) Convolution of tensors.
§ Complexity analysis of MLA in the CP model can be viewed as the particular case corresponding to the class of tensors in $\mathcal{T}_{\mathbf{r}}$ with $\mathbf{r} = (r, \ldots, r)$ and with zero off-diagonal terms in the core tensor.

Usually, the numerical Tucker decomposition leads to a fully populated core tensor, i.e., it is represented by $r^d$ nonzero elements. However, in some cases a special data structure can be imposed (cf. [20]), which reduces the complexity of the corresponding MLA. In this discussion we assume one of the following situations:
(A) $B$ is sparsely populated; more precisely, it has only $N_{mem(B)} \ll r^d$ nonzero elements. Furthermore, we denote by $S(B)$ a sparsity pattern of $B$, such that $\#(S(B)) = N_{mem(B)}$;
(B) $B$ has zero off-diagonal elements (i.e., $\mathbf{r} = (r, \ldots, r)$, $A \in \mathcal{C}_r$ and $N_{mem(B)} = r$);
(C) $B$ is represented in the canonical format, $B \in \mathcal{C}_q$ (two-level Tucker model $\mathcal{T}_{(\mathbf{r}, q)}$).

2.2.1 Memory Requirements
In the case (A), the Tucker model requires $drn + N_{mem(B)}$ memory to represent a tensor, where $N_{mem(B)} \le r^d$. In turn, the number of parameters in the CP model (case (B)) scales linearly in the rank parameter $R$: $dRn + R$ (cf. Remark 2.1). Setting $R = \alpha r$ with $\alpha \ge 1$, we can specify the range of parameters where the Tucker model consumes less memory than the CP model:
$$r^{d-1} \le d(\alpha - 1) n.$$
Note that in some applications in mathematical physics $d$ is not too large while the grid parameter $n$ is several hundreds. For example, with $d = 3$, $\alpha = 3$, $n = 200$, we obtain $r \le \sqrt{1200} \approx 34$; however, the Tucker rank usually varies between 10 and 20 (see the numerics below). In the case (C), the memory demands are $dr(n + q)$.

2.2.2 Generalized $\ell_2$-Inner Product
For given tensors $A_1 \in \mathcal{T}_{\mathbf{r}_1}$, $A_2 \in \mathcal{T}_{\mathbf{r}_2}$ represented in the form (5), i.e.,
$$A_1 = B \times_1 \mathbf{U}^{(1)} \times_2 \mathbf{U}^{(2)} \ldots \times_d \mathbf{U}^{(d)}, \quad A_2 = C \times_1 \mathbf{V}^{(1)} \times_2 \mathbf{V}^{(2)} \ldots \times_d \mathbf{V}^{(d)}, \tag{6}$$
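the $\ell_2$-inner product (1) can be computed by
$$\langle A_1, A_2 \rangle := \sum_{\mathbf{k}=\mathbf{1}}^{\mathbf{r}_1} \sum_{\mathbf{m}=\mathbf{1}}^{\mathbf{r}_2} b_{k_1 \ldots k_d}\, c_{m_1 \ldots m_d} \prod_{\ell=1}^{d} \langle U^{(\ell)}_{k_\ell}, V^{(\ell)}_{m_\ell} \rangle. \tag{7}$$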
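We further simplify and suppose $\mathbf{r}_1 = \mathbf{r}_2$. Then the calculation in (7) includes $dr(r + 1)/2$ inner products of vectors of size $n$ (due to the symmetry argument) plus $2 \# S(B) \cdot \# S(C)$ multiplications, leading to the overall complexity
$$N_{\langle \cdot, \cdot \rangle} = O\Big(dn \frac{r(r + 1)}{2} + 2 \# S(B) \cdot \# S(C)\Big),$$
and the same for the Frobenius norm. In the case (C) the inner product can be computed in $Cq^2 + dr^2 n + dq^2 r$ operations (cf. [20], Lemma 2.8).

Let $\mathcal{A} : \mathbb{R}^{I} \to \mathbb{R}^{I}$ be a matrix having Kronecker-product form: $\mathcal{A} = A^{(1)} \otimes \ldots \otimes A^{(d)} \in \mathbb{R}^{I \times I}$ (a rank-1 tensor of order $d$) with some given $A^{(\ell)} \in \mathbb{R}^{n_\ell \times n_\ell}$. Then the generalized inner product
$$\langle \mathcal{A} A_1, A_2 \rangle := \sum_{\mathbf{k}=\mathbf{1}}^{\mathbf{r}_1} \sum_{\mathbf{m}=\mathbf{1}}^{\mathbf{r}_2} b_{k_1 \ldots k_d}\, c_{m_1 \ldots m_d} \prod_{\ell=1}^{d} \langle A^{(\ell)} U^{(\ell)}_{k_\ell}, V^{(\ell)}_{m_\ell} \rangle \tag{8}$$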
has the same computational cost as above, where $n$ is substituted by the cost of the low-dimensional matrix-vector product in $\mathbb{R}^{n}$. For example, the finite difference representation of the $d$-dimensional Laplace operator on the uniform grid is given by
$$\mathcal{A} = A^{(1)} \otimes I_2 \ldots \otimes I_d + I_1 \otimes A^{(2)} \otimes \ldots \otimes I_d + \ldots + I_1 \otimes I_2 \ldots \otimes A^{(d)}, \quad A^{(\ell)}, I_\ell \in \mathbb{R}^{n \times n},$$
with $I_\ell$ being the $n \times n$ identity and $A^{(\ell)} = \mathrm{tridiag}\{-1, 2, -1\}$. Hence each matrix in the $d$-term representation above has the required Kronecker-product form. Furthermore, if the components $A^{(\ell)} \in \mathbb{R}^{n \times n}$ have H-matrix (resp. Toeplitz) structure or inherit wavelet-based sparsity patterns, then we have
$$N_{\langle \mathcal{A} \cdot, \cdot \rangle} = O\Big(d \frac{r(r + 1)}{2}\, n \log^q n + 2 \# S(B) \cdot \# S(C)\Big).$$
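The inner product (7) never needs the full tensors: only the small Gram matrices of the factor columns and the two cores enter. The following NumPy sketch, with the hypothetical helper name `tucker_inner` and random test data chosen for this example, contracts the first core with the Gram matrices mode by mode instead of evaluating the double sum in (7) term by term; this is one possible ordering of the same arithmetic, not necessarily the one used by the authors.

```python
import numpy as np

def tucker_inner(B, U, C, V):
    """<A1, A2> for A1 = B x_l U[l] and A2 = C x_l V[l], cf. (6) and (7)."""
    T = B
    for l in range(B.ndim):
        G = U[l].T @ V[l]                      # Gram matrix of mode l
        # Replace index k_l of the core by m_l through G[k_l, m_l].
        T = np.moveaxis(np.tensordot(G, T, axes=(0, l)), 0, l)
    return float(np.sum(T * C))                # T now has the shape of C

rng = np.random.default_rng(0)
n, r1, r2 = 10, (2, 3, 4), (3, 2, 2)
U = [np.linalg.qr(rng.standard_normal((n, r)))[0] for r in r1]
V = [np.linalg.qr(rng.standard_normal((n, r)))[0] for r in r2]
B, C = rng.standard_normal(r1), rng.standard_normal(r2)

value = tucker_inner(B, U, C, V)
```

The only operations involving the grid size $n$ are the products `U[l].T @ V[l]`, in line with the $O(dnr^2)$ term of the complexity estimate.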
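2.2.3 Outer, Contracted and Hadamard Products
For given tensors $A \in \mathbb{R}^{I_1}$ and $B \in \mathbb{R}^{I_2}$, the outer product $A \circ B \in \mathbb{R}^{I_1 \times I_2}$ is of size $I_1 \times I_2$ with the components
$$(A \circ B)_{i_1, i_2} = A_{i_1} \cdot B_{i_2}, \quad i_1 \in I_1, \ i_2 \in I_2.$$
Clearly, for $A_1, A_2 \in \mathcal{T}_{\mathbf{r}}$ we are able to tensorize the outer product by
$$A_1 \circ A_2 := \sum_{1 \le k_\ell, m_\ell \le r,\ \ell = 1, \ldots, d} b_{k_1 \ldots k_d}\, c_{m_1 \ldots m_d} \big(U^{(1)}_{k_1} \circ V^{(1)}_{m_1}\big) \times_2 \ldots \times_d \big(U^{(d)}_{k_d} \circ V^{(d)}_{m_d}\big). \tag{9}$$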
This leads to the memory demands $N_{mem(A \circ B)} = O(2dr^2 n + r^{2d})$ for the naive component-wise storage. However, the implicit representation of complexity $O(2drn + r^{2d})$ can be implemented if one stores only the $2r$ components $U^{(\ell)}_{k}$, $V^{(\ell)}_{m}$ for each $\ell = 1, \ldots, d$.

The contracted product of two tensors is an extension of the matrix-vector multiplication combined with the outer product: for some portion of modes we compute the inner product while for the remaining components we calculate the outer product with the corresponding ordering of modes. For given tensors $A \in \mathbb{R}^{I \times J}$ and $B \in \mathbb{R}^{J \times M}$, the contracted product along the index $J$ results in a tensor $Z := \langle A, B \rangle_{J \times J}$ of size $I \times M$, given by
$$Z = \{z_{im}\} \in \mathbb{R}^{I \times M} \quad \text{with } z_{im} = \sum_{j \in J} a_{ij} b_{jm}.$$
Notice that the $\ell$-mode product $\times_\ell$ can be interpreted as a particular case of the contracted product of tensors. For tensors in the Tucker form with respect to $I$ we obtain
$$\langle A_1, A_2 \rangle_{I \times I} = \sum_{1 \le k_\ell, m_\ell \le r,\ \ell = 1, \ldots, d} b_{k_1 \ldots k_d}\, c_{m_1 \ldots m_d} \langle U^{(1)}_{k_1}, V^{(1)}_{m_1} \rangle_{I_1 \times I_1} \times_2 \ldots \times_d \langle U^{(d)}_{k_d}, V^{(d)}_{m_d} \rangle_{I_d \times I_d}$$
with the memory requirements
$$N_{mem(\langle A_1, A_2 \rangle_{I \times I})} = O(2dr^2 n_J n_M + r^{2d}), \quad n_J = \# J, \ n_M = \# M.$$
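For given tensors $A, B \in \mathbb{R}^{I}$, the Hadamard product $A \odot B \in \mathbb{R}^{I}$ of two tensors of the same size $I$ is defined component-wise,
$$(A \odot B)_{i} = A_{i} \cdot B_{i}, \quad i \in I.$$
Hence, for $A_1, A_2 \in \mathcal{T}_{\mathbf{r}}$ we tensorize the Hadamard product by
$$A_1 \odot A_2 := \sum_{1 \le k_\ell, m_\ell \le r,\ \ell = 1, \ldots, d} b_{k_1 \ldots k_d}\, c_{m_1 \ldots m_d} \big(U^{(1)}_{k_1} \odot V^{(1)}_{m_1}\big) \times_2 \ldots \times_d \big(U^{(d)}_{k_d} \odot V^{(d)}_{m_d}\big). \tag{10}$$
This leads to the memory requirements (due to the symmetry argument)
$$N_{mem(A \odot B)} = O\Big(d \frac{r(r + 1)}{2}\, n + r^{2d}\Big).$$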
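Formula (10) says that the Hadamard product of two Tucker tensors is again of Tucker type: the new factor columns are the pairwise component-wise products of the old columns, and the new core collects the products $b_{\mathbf{k}} c_{\mathbf{m}}$. A minimal NumPy sketch under these assumptions follows; the name `tucker_hadamard` and the test data are hypothetical, and the resulting factors are in general no longer orthonormal.

```python
import numpy as np

def tucker_hadamard(B, U, C, V):
    """Core and factors of A1 (.) A2 in the (non-orthogonal) Tucker form (10)."""
    # np.kron on d-way arrays places B[k] * C[m] at index k_l * s_l + m_l
    # in every mode l, where s_l is the size of C along that mode.
    core = np.kron(B, C)
    factors = []
    for Ul, Vl in zip(U, V):
        # Column k * s_l + m holds the component-wise product U_k * V_m.
        W = Ul[:, :, None] * Vl[:, None, :]
        factors.append(W.reshape(Ul.shape[0], -1))
    return core, factors

rng = np.random.default_rng(0)
n, r1, r2 = 9, (2, 3, 2), (3, 2, 4)
U = [rng.standard_normal((n, r)) for r in r1]
V = [rng.standard_normal((n, r)) for r in r2]
B, C = rng.standard_normal(r1), rng.standard_normal(r2)

core, factors = tucker_hadamard(B, U, C, V)
```

The rank in each mode multiplies, which is why the core of the product has up to $r^{2d}$ entries in the estimate above.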
2.2.4 Multi-Dimensional Convolution Product
The multi-dimensional convolution product is one of the basic transforms in a wide range of applications including many-particle models (see [9, 12, 18, 19] and the examples in §3). We consider the discrete version of the multi-dimensional convolution transform in $\mathbb{R}^d$ based on the Nyström type scheme (similar for the collocation with piecewise constant basis functions)
$$(f * g)(x) := \int_{\mathbb{R}^d} f(y) g(x - y)\, dy \approx h^d \sum_{i \in I} f(y_i) g(x_j - y_i), \quad I := \{1, \ldots, n\}^d,$$
where, for ease of presentation, the collocation points $x_j$, $y_i$ are assumed to be located on the same equidistant spatial tensor-product grid of size $h$. The functions $f, g$ are supposed to have the finite support $[0, A]^d$ with $A = nh$. Introducing the corresponding function-generated tensors (FGTs) $F = \{f(x_i)\}$, $G = \{g(-x_i)\} \in \mathbb{R}^{I}$, we define their discrete convolution product by
$$F * G := \Big\{ \sum_{k \in I} F_k G_{j - k} \Big\}_{j \in J}, \quad J := \{1, \ldots, 2n - 1\}^d.$$
For given $A_1, A_2 \in \mathcal{T}_{\mathbf{r}}$, we now tensorize the convolution product via
$$A_1 * A_2 := h^d \sum_{\mathbf{k}, \mathbf{m} = \mathbf{1}}^{\mathbf{r}} b_{k_1 \ldots k_d}\, c_{m_1 \ldots m_d} \big(U^{(1)}_{k_1} * V^{(1)}_{m_1}\big) \times_2 \ldots \times_d \big(U^{(d)}_{k_d} * V^{(d)}_{m_d}\big). \tag{11}$$
Assuming that the one-dimensional convolutions $U^{(\ell)}_{k_\ell} * V^{(\ell)}_{m_\ell} \in \mathbb{R}^{2n - 1}$ can be computed in $O(n \log^q n)$ operations, we arrive at the overall complexity estimate $N_{\cdot * \cdot} = O(dr^2 n \log^q n)$.
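The tensorized convolution (11) can be sketched in a few lines: every pair of factor columns is convolved in one dimension (here by a zero-padded FFT on an equidistant grid), and the cores are combined by a Kronecker product scaled by $h^d$. The function names `conv1d_fft` and `tucker_convolve`, the two-dimensional test case and the value of `h` are assumptions made for this illustration.

```python
import numpy as np

def conv1d_fft(a, b):
    """Full linear convolution of two real vectors via zero-padded FFT."""
    m = len(a) + len(b) - 1
    return np.fft.irfft(np.fft.rfft(a, m) * np.fft.rfft(b, m), m)

def tucker_convolve(B, U, C, V, h=1.0):
    """Core and factors of the discrete convolution A1 * A2, cf. (11)."""
    factors = []
    for Ul, Vl in zip(U, V):
        # Column k * s_l + m holds the 1D convolution U_k * V_m,
        # matching the index layout produced by np.kron for the core.
        cols = [conv1d_fft(Ul[:, k], Vl[:, m])
                for k in range(Ul.shape[1]) for m in range(Vl.shape[1])]
        factors.append(np.stack(cols, axis=1))
    core = h ** B.ndim * np.kron(B, C)
    return core, factors

rng = np.random.default_rng(0)
n, h = 8, 0.1
U = [rng.standard_normal((n, 2)), rng.standard_normal((n, 3))]
V = [rng.standard_normal((n, 2)), rng.standard_normal((n, 2))]
B, C = rng.standard_normal((2, 3)), rng.standard_normal((2, 2))

core, factors = tucker_convolve(B, U, C, V, h)
```

Only one-dimensional transforms of length $2n - 1$ are performed, $r^2$ of them per mode, which is the source of the $O(dr^2 n \log n)$ term for equidistant grids.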
In our particular case of equidistant grids we obtain (by setting $a = U^{(\ell)}_{k}$, $b = V^{(\ell)}_{m} \in \mathbb{R}^{n}$)
$$\big(U^{(\ell)}_{k} * V^{(\ell)}_{m}\big)_j = \sum_{k=1}^{n} a_k b_{j - k}, \quad j = 1, \ldots, 2n - 1.$$
Hence, the one-dimensional convolution can be performed by FFT in $O(n \log n)$ operations. One-dimensional $O(n \log^q n)$-complexity convolution on non-equidistant grids is discussed in [12]. It can be directly applied in (11). We notice that the convolution product appears to be one of the most computationally elaborate operations (cf. Lemma 2.3, (IV)) since in general one might have $\# S(B_1) \cdot \# S(B_2) = r^{2d}$. A significant complexity reduction is observed if one of the convolving tensors can be represented by the CP model (say, with rank $r$), so that we have $\# S(B_1) \cdot \# S(B_2) = r^{d+1}$. Hence, the complexity reduction factor is $r^{d-1}$.

Below, we give numerical examples illustrating the performance of the convolution product in the Tucker format. We use the following (rather simple) algorithm. We are given the function-generated tensors (FGTs) $A_1 \in \mathcal{T}_{\mathbf{r}_1}$, $A_2 \in \mathcal{T}_{\mathbf{r}_2}$ with the moderate grid size $n_1 = n_2 = \ldots = n_d = n$ corresponding to the tensor-product $n \times \ldots \times n$ grid in $[0, A]^d$ with arbitrary grid spacing in each spatial direction (say, an adaptive grid) and with minimal grid size $h_{min}$. We introduce an auxiliary equidistant grid of size $N \times \ldots \times N$ with $n \ll N$ which satisfies $h := A/N \le h_{min}$ due to the approximation requirements. Let $P_{n \to N}$ and $P_{N \to n}$ be the corresponding 1D linear interpolation operators from the adaptive to the fine grid and vice versa, in each spatial direction. Then we perform the following steps:
CONV($\mathcal{T}_{\mathbf{r}_1}$, $\mathcal{T}_{\mathbf{r}_2}$). Given the input tensors $A_1 \in \mathcal{T}_{\mathbf{r}_1}$, $A_2 \in \mathcal{T}_{\mathbf{r}_2}$.
Step I: Using $P_{n \to N}$, interpolate the Tucker components to the fine grid;
Step II: Compute the one-dimensional convolution products $h\, U^{(\ell)}_{k} * V^{(\ell)}_{m} \in \mathbb{R}^{2N - 1}$ in $O(dr^2 N \log N)$ operations by the FFT;
Step III: Using $P_{N \to n}$, interpolate the result to the initial grid;
Step IV: Compute the convolution product via (11) in at most $O(r^{2d})$ operations.
Remark 2.2. Algorithm CONV($\mathcal{T}_{\mathbf{r}_1}$, $\mathcal{T}_{\mathbf{r}_2}$) has linear scaling in the dimension $d$.
However, in the case of strong mesh refinement the auxiliary dimension $N$ may be so large (e.g. $N \ge 10^4$) that the complexity of Step II, $O(dr^2 N \log N)$, becomes the bottleneck of our numerical scheme. For such situations, based on the ideas in [12], one can use the special modification of the convolution of complexity $O(n \log^q n)$ that applies to composite refined grids. The corresponding numerical results will be presented elsewhere.

First, we demonstrate the reduction factor between the computational times $t_{TT}$ and $t_{CT}$, corresponding to the cases of Tucker-Tucker and CP-Tucker convolving tensors, respectively, with $d = 3$. The next table represents the ratios $t_{CT}/t_{CT}(2)$ and $t_{TT}/t_{CT}(2)$ (scaled time) for the same values of the Tucker and the Kronecker ranks, $\mathbf{r} = (r, r, r)$, $r = 2, 3, \ldots, 6$, but with fixed parameters $n = 32, 64$ and $N = 320, 640$, respectively. Here $t_{CT}(2)$ refers to the computational time-unit for the CT-type convolution with $r = 2$ and with $n = 32$.
r          | 2    | 3     | 4      | 5      | 6       | 7      | 8      | 9      | 10
CT, n = 32 | 1    | 4.67  | 7.41   | 15.78  | 30.00   | 52.22  | 85.19  | 131.48 | 194.81
CT, n = 64 | 6.93 | 22.22 | 57.41  | 122.96 | 234.44  | 410.37 | 668.17 | -      | -
TT, n = 32 | 3.52 | 25.19 | 115.19 | 392.22 | 1085.19 | -      | -      | -      | -
Note that the computational time in the full $n \times n \times n$ format is about $T_1 = 0.02$, $T_2 = 1.11$, $T_3 = 68.4$ for $n = 16, 32, 64$, respectively, all scaled in the computational time-unit $t_{CT}(10) = 194.8 \cdot t_{CT}(2)$ with $n = 32$ given in the previous table. The asymptotic complexity estimate for the $d$-dimensional convolution in the full format is $O(n^{2d})$. The second table represents the scaled computational time $t_{CT}/t_{CT}(1)$ for fixed $r = 5$ and fixed $n = 32, 64$ but for different values of $N = 2^p n$ ($p = 1, 2, \ldots, 8$). Here $t_{CT}(1)$ corresponds to the computational time for $p = 1$, $n = 32$.
p          | 1    | 2    | 3    | 4    | 5    | 6     | 7     | 8
CT, n = 32 | 1    | 1    | 1    | 1    | 1.13 | 1.85  | 4.59  | 15.46
CT, n = 64 | 7.13 | 7.13 | 7.20 | 7.17 | 8.07 | 10.87 | 21.70 | -
These data indicate that the FFT on the equidistant fine grid of size $N$ has negligible cost compared with computing the sum (11), at least in the parameter domain $N \le 4096$. Hence, Algorithm CONV($\mathcal{T}_{\mathbf{r}_1}$, $\mathcal{T}_{\mathbf{r}_2}$) can be applied successfully in the case of moderate mesh refinement such that $N \le 10^4$.

2.2.5 Résumé of the Complexity of MLA in the Tucker Model
The next Lemma collects all the previous results, now presented in the general case of a fixed sparsity pattern of the target core tensors.

Lemma 2.3. (complexity of MLA in the Tucker model). For given tensors $A_1, A_2 \in \mathcal{T}_{\mathbf{r}}$ with fixed sparsity patterns of the core tensors $S(B_1)$, $S(B_2)$, respectively, we have:
(I) Memory requirements $N_{mem(A_1)} = dnr + \#(S(B_1))$.
(II) Complexity of the $\ell_2$-inner product
$$N_{\langle A_1, A_2 \rangle} = \frac{1}{2} dnr(r + 1) + 2 \# S(B_1) \cdot \# S(B_2),$$
and the same for the Frobenius norm.
(III) Memory requirements for the outer and Hadamard products are given by
$$N_{mem(A_1 \circ A_2)} = 2dr^2 n + \# S(B_1) \cdot \# S(B_2), \quad N_{mem(A_1 \odot A_2)} = \frac{1}{2} dr(r + 1) n + \# S(B_1) \cdot \# S(B_2),$$
respectively. Complexity estimates for these operations have similar bounds.
(IV) Assuming that the one-dimensional convolution products $U^{(\ell)}_{k} * V^{(\ell)}_{m} \in \mathbb{R}^{2n - 1}$ can be computed in $O(n \log^q n)$ operations, we obtain the complexity bound
$$N_{A_1 * A_2} = dr^2 n \log^q n + \# S(B_1) \cdot \# S(B_2).$$
The complexity of the convolution product in the full format is estimated by $O(n^{2d})$, while the FFT on the uniform grid still has exponential scaling in $d$, $O(n^d \log^q n)$.

Proof. The proof is elementary and follows the lines of the previous discussion.
2.3 Some Properties of the Orthogonal Tucker Decomposition
The numerical Tucker-type approximation of $d$-th order tensors is one of the most practically important MLA operations. This operation is, in fact, one of the higher-order extensions of the best rank-$r$ approximation in linear algebra (in particular, of the truncated SVD). Given $A_0 \in \mathbb{R}^{I_1 \times \ldots \times I_d}$, its best rank-$(r_1, \ldots, r_d)$ Tucker-type approximation can be derived by straightforward minimization of the quadratic cost functional
$$f(A) := \|A - A_0\|^2 \to \min \tag{12}$$
over all rank-$\mathbf{r}$ tensors $A \in \mathcal{T}_{\mathbf{r}}$, which are parameterized as in (3) with the constraints $\mathbf{V}^{(\ell)} \in \mathcal{V}_\ell := \mathcal{V}_{n_\ell, r_\ell}$ ($\ell = 1, \ldots, d$), where
$$\mathcal{V}_{n, r} := \{Y \in \mathbb{R}^{n \times r} : Y^T Y = I \in \mathbb{R}^{r \times r}\}$$
is the so-called Stiefel manifold. This minimization problem was first addressed in [23]. The Appendix collects the results on the existence of local minima in (12) and describes the Lagrange equations for the corresponding dual problem (cf. Lemma 4.2). In general, the starting value in the minimization process for solving (12) can be computed using the so-called higher-order SVD [4], which provides, in fact, the conventional Tucker decomposition.

For a wide class of FGTs, the quality of approximation via the minimization (12) can be effectively controlled by the Tucker rank. In particular, for certain analytic generating functions we are able to prove the exponential convergence (cf. [13, 20])
$$\|A_{(r)} - A_0\| \le C e^{-\alpha r} \quad \text{with } r = \max_\ell r_\ell. \tag{13}$$
As a consequence, the approximation error $\varepsilon > 0$ can be achieved with $r = O(|\log \varepsilon|)$. The following Lemma proves that the relative energy error of the Tucker decomposition $A_{(r)}$ is estimated by the square of the relative energy norm of $A_{(r)} - A_0$. This result is reminiscent of the error bound for the Rayleigh quotient approximation to the symmetric eigenvalue problem in linear algebra (cf. (35) in the Appendix).
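Lemma 2.4. (super-convergence in energy). Let $A_{(r)} \in \mathbb{R}^{I_1 \times \ldots \times I_d}$ solve the minimization problem (12) over $A \in \mathcal{T}_{\mathbf{r}}$. Then we have the "quadratic" relative error bound
$$\frac{\|A_0\| - \|A_{(r)}\|}{\|A_0\|} \le \frac{\|A_{(r)} - A_0\|^2}{\|A_0\|^2}. \tag{14}$$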
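The bound (14) can be observed numerically. The sketch below does not solve the minimization problem (12); as a simplifying assumption it takes arbitrary orthonormal factors and the least-squares core for those fixed factors, $B = A_0 \times_1 \mathbf{V}^{(1)T} \ldots \times_d \mathbf{V}^{(d)T}$, which is already enough for the norm identity behind (14) to hold. The sizes and the random data are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
A0 = rng.standard_normal((8, 9, 10))
ranks = (3, 4, 5)

# Arbitrary factors with orthonormal columns (not the optimal ones).
V = [np.linalg.qr(rng.standard_normal((A0.shape[l], ranks[l])))[0]
     for l in range(3)]

# Least-squares core for the fixed factors and the resulting Tucker tensor.
B = np.einsum('ijk,ia,jb,kc->abc', A0, *V)
Ar = np.einsum('abc,ia,jb,kc->ijk', B, *V)

norm_A0 = np.linalg.norm(A0)
norm_Ar = np.linalg.norm(Ar)
err = np.linalg.norm(Ar - A0)

energy_gap = (norm_A0 - norm_Ar) / norm_A0   # left-hand side of (14)
bound = err ** 2 / norm_A0 ** 2              # right-hand side of (14)
```

With the optimal factors the error `err` becomes small, and the relative energy gap is then of the order of its square.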
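Proof. The first part of the proof is given for completeness (cf. [4] for a short exposition). Letting $A_{(r)} = B \times_1 \mathbf{V}^{(1)} \times_2 \mathbf{V}^{(2)} \ldots \times_d \mathbf{V}^{(d)}$, we easily obtain the identity
$$\|A_{(r)}\| = \|B\|, \tag{15}$$
since the orthogonal matrices $\mathbf{V}^{(\ell)} \in \mathcal{V}_\ell$ do not affect the Frobenius norm. Furthermore, with fixed $\mathbf{V}^{(\ell)}$ ($\ell = 1, \ldots, d$), relation (12) is merely a linear least-squares problem with respect to $B$,
$$\langle A_0, A_0 \rangle - 2 \langle A_0, B \times_1 \mathbf{V}^{(1)} \times_2 \ldots \times_d \mathbf{V}^{(d)} \rangle + \langle B, B \rangle \to \min,$$
hence the corresponding Lagrange equation takes the form
$$-\langle A_0, \delta B \times_1 \mathbf{V}^{(1)} \times_2 \ldots \times_d \mathbf{V}^{(d)} \rangle + \langle B, \delta B \rangle = 0 \quad \forall\, \delta B \in \mathbb{R}^{r_1 \times \ldots \times r_d},$$
which implies
$$B - A_0 \times_1 \mathbf{V}^{(1)T} \times_2 \ldots \times_d \mathbf{V}^{(d)T} = 0. \tag{16}$$
Next we readily obtain
$$f(A_{(r)}) = \|A_{(r)}\|^2 - 2 \langle B \times_1 \mathbf{V}^{(1)} \times_2 \ldots \times_d \mathbf{V}^{(d)}, A_0 \rangle + \|A_0\|^2 = \|A_{(r)}\|^2 + \|A_0\|^2 - 2 \langle B, A_0 \times_1 \mathbf{V}^{(1)T} \times_2 \ldots \times_d \mathbf{V}^{(d)T} \rangle = \|A_0\|^2 - \|B\|^2,$$
hence it follows that [compare with (36)]
$$\|A_0\|^2 - \|A_{(r)}\|^2 = \|A_{(r)} - A_0\|^2.$$
The latter leads to the final estimate (clearly $\|A_0\| \ge \|A_{(r)}\|$)
$$\frac{\|A_0\| - \|A_{(r)}\|}{\|A_0\|} = \frac{\|A_{(r)} - A_0\|^2}{(\|A_{(r)}\| + \|A_0\|)\, \|A_0\|} \le \frac{\|A_{(r)} - A_0\|^2}{\|A_0\|^2}.$$

The numerical efficiency of the standard tensor operations described above depends on the data-sparsity of the core tensor. The next lemma presents a simple but useful characterization of the two-level Tucker model (cf. [20]). It allows one to approximate the elements in $\mathcal{T}_{\mathbf{r}}$ via the CP decomposition applied to the small-sized core tensor (cf. the dimensionality reduction in [6]).

Lemma 2.5. (two-level Tucker-to-CP approximation). Let the target tensor $A_0 \in \mathcal{C}_{(\mathbf{n}, r)}$ in the minimization problem (32) have the form $A_0 = B_0 \times_1 \mathbf{V}^{(1)} \times_2 \mathbf{V}^{(2)} \ldots \times_d \mathbf{V}^{(d)}$ with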
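components $\mathbf{V}^{(\ell)} \in \mathbb{R}^{I_\ell \times r_\ell}$ in the Stiefel manifold, and with the core tensor $B_0 \in \mathbb{R}^{r_1 \times \ldots \times r_d}$. Then, for a given $q \le |\mathbf{r}|$,
$$\min_{A \in \mathcal{C}_{(\mathbf{n}, q)}} \|A - A_0\|^2 = \min_{B \in \mathcal{C}_{(\mathbf{r}, q)}} \|B - B_0\|^2. \tag{17}$$
Moreover, the optimal rank-$q$ CP approximation $A_{(q)} \in \mathcal{C}_{(\mathbf{n}, q)}$ of $A_0$ (if it exists) and the optimal rank-$q$ CP approximation $B_{(q)} \in \mathcal{C}_{(\mathbf{r}, q)}$ of $B_0$ are related by
$$A_{(q)} = B_{(q)} \times_1 \mathbf{V}^{(1)} \times_2 \mathbf{V}^{(2)} \ldots \times_d \mathbf{V}^{(d)}. \tag{18}$$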
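Proof. Notice that the components $Y^{(\ell)}_k$ of any test element
$$A = \sum_{k=1}^{q} \lambda_k \times_1 Y^{(1)}_k \times_2 \ldots \times_d Y^{(d)}_k \tag{19}$$
in the left-hand side of (17) can be chosen in $\mathrm{span}\{\mathbf{V}^{(\ell)}\}$ ($\ell = 1, \ldots, d$), i.e.,
$$Y^{(\ell)}_k = \sum_{m=1}^{r_\ell} \mu^{(\ell)}_{k, m} V^{(\ell)}_m, \quad k = 1, \ldots, q, \ \ell = 1, \ldots, d. \tag{20}$$
Indeed, assuming
$$Y^{(\ell)}_k = \sum_{m=1}^{r_\ell} \mu^{(\ell)}_{k, m} V^{(\ell)}_m + E^{(\ell)}_k \quad \text{with } E^{(\ell)}_k \perp \mathrm{span}\{\mathbf{V}^{(\ell)}\},$$
we conclude that $E^{(\ell)}_k$ does not affect the cost function in (17) because of the orthogonality of $\mathbf{V}^{(\ell)}$. Hence, setting $E^{(\ell)}_k = 0$ and substituting (20) into (19), we arrive at the desired Tucker decomposition $A = B \times_1 \mathbf{V}^{(1)} \times_2 \mathbf{V}^{(2)} \ldots \times_d \mathbf{V}^{(d)}$ with the respective core tensor
$$B = \sum_{k=1}^{q} b_k \times_1 U^{(1)}_k \times_2 \ldots \times_d U^{(d)}_k \in \mathcal{C}_{(\mathbf{r}, q)},$$
where $b_k = \lambda_k$, $U^{(\ell)}_k = \{\mu^{(\ell)}_{k, m_\ell}\}_{m_\ell = 1}^{r_\ell} \in \mathbb{R}^{r_\ell}$, obtained from
$$A = \sum_{k=1}^{q} \lambda_k \times_1 Y^{(1)}_k \times_2 \ldots \times_d Y^{(d)}_k = \sum_{k=1}^{q} \lambda_k \times_1 \Big( \sum_{m_1=1}^{r_1} \mu^{(1)}_{k, m_1} V^{(1)}_{m_1} \Big) \times_2 \ldots \times_d \Big( \sum_{m_d=1}^{r_d} \mu^{(d)}_{k, m_d} V^{(d)}_{m_d} \Big)$$
$$= \sum_{m_1=1}^{r_1} \ldots \sum_{m_d=1}^{r_d} \Big\{ \sum_{k=1}^{q} \lambda_k \prod_{\ell=1}^{d} \mu^{(\ell)}_{k, m_\ell} \Big\} \times_1 V^{(1)}_{m_1} \times_2 \ldots \times_d V^{(d)}_{m_d}.$$
Now the relation (17) follows since the $\ell$-mode multiplication with the orthogonal components $\mathbf{V}^{(\ell)}$ does not change the cost function. Similar arguments justify (18).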
Lemma 2.5 suggests a two-level dimensionality reduction approach that leads to a better data structure compared with the standard Tucker model. Though $A_{(q)} \in \mathcal{C}_{(\mathbf{n}, q)}$ can be represented in the CP format, its efficient storage depends on the representation of the orthogonal matrices $\mathbf{V}^{(\ell)}$. In fact, if the $\mathbf{V}^{(\ell)}$ are obtained via the Tucker decomposition, it is better to store $A_{(q)}$ in the CP format of complexity $rdn$ (adaptive two-level model [20]). However, if the $\mathbf{V}^{(\ell)}$ are represented in a fixed basis (say, a sinc or a wavelet basis), then one can store only the core tensor in the CP format, which leads to a substantial memory reduction to $qdr$ (no dependence on the data size $n$). The next statement is a direct consequence of the previous lemma.

Corollary 2.6. (two-level CP-to-CP approximation). Let the target tensor $A_0 \in \mathcal{C}_{(\mathbf{n}, r)}$ in the minimization problem (32) have the canonical form $A_0 = D_0 \times_1 \mathbf{U}^{(1)} \times_2 \mathbf{U}^{(2)} \ldots \times_d \mathbf{U}^{(d)}$ with normalized components $\mathbf{U}^{(\ell)} \in \mathbb{R}^{n \times r}$ (we ignore the orthogonality requirements), and with the superdiagonal core tensor $D_0 \in \mathbb{R}^{r \times \ldots \times r}$. Introduce the equivalent representation $A_0 = B_0 \times_1 \mathbf{V}^{(1)} \times_2 \mathbf{V}^{(2)} \ldots \times_d \mathbf{V}^{(d)}$ with components $\mathbf{V}^{(\ell)} \in \mathbb{R}^{n \times r}$ in the Stiefel manifold (say, computed by the QR-decomposition of each component $\mathbf{U}^{(\ell)}$). Then, for a given $q \le r$,
$$\min_{A \in \mathcal{C}_{(\mathbf{n}, q)}} \|A - A_0\|^2 = \min_{B \in \mathcal{C}_{(\mathbf{r}, q)}} \|B - B_0\|^2. \tag{21}$$
Moreover, the optimal rank-$q$ CP approximation $A_{(q)} \in \mathcal{C}_{(\mathbf{n}, q)}$ of $A_0$ (if it exists) and the optimal rank-$q$ CP approximation $B_{(q)} \in \mathcal{C}_{(\mathbf{r}, q)}$ of the core tensor $B_0$ are related by (18).

Corollary 2.6 indicates that the rank reduction in the CP model can be performed via the CP approximation of a "small size" core tensor arising from the component-wise orthogonalisation in the target data-array (the latter with the cost $O(dr^2 n)$).

Remark 2.7. Note that there is a simple procedure based on the SVD to reduce the Kronecker rank of the core tensor $B$. Let $d = 3$ for the sake of simplicity. Denote by $B_m \in \mathbb{R}^{r \times r}$, $m = 1, \ldots, r$, the two-dimensional slices of $B$ in some fixed direction. Hence, we can represent
$$B = \sum_{m=1}^{r} B_m \times Z_m, \quad Z_m \in \mathbb{R}^{r},$$
where $Z_m(m) = 1$, $Z_m(j) = 0$ for $j \ne m$ (there are exactly $d$ possible decompositions). For a given tolerance $\varepsilon > 0$, let $p_m$ be the number of singular values of $B_m$ which are larger than $\varepsilon$. Then, denoting by
$$B_{p_m} = \sum_{k_m=1}^{p_m} \sigma_{k_m} u_{k_m} \times v_{k_m}$$
the corresponding rank-$p_m$ approximation to $B_m$ (by truncation of $\sigma_{p_m + 1}, \ldots, \sigma_r$), we arrive at the rank-$R$ canonical representation
$$B_R := \sum_{m=1}^{r} B_{p_m} \times Z_m, \quad Z_m \in \mathbb{R}^{r}, \quad \text{with } \|B - B_R\| \le r\varepsilon,$$
which is a sum of rank-$p_m$ terms so that the total rank is $R = p_1 + \ldots + p_r \le r^2$. This can be easily extended to arbitrary $d \ge 3$ so that we have $R \le r^{d-1}$.
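The procedure of Remark 2.7 for $d = 3$ can be sketched as follows: each slice of the core is truncated by its SVD, and every retained singular triple becomes one canonical term together with the unit vector $Z_m$. The helper names `slicewise_cp` and `cp_to_full`, the choice of the third mode as the slicing direction and the test data are assumptions made for this example.

```python
import numpy as np

def slicewise_cp(B, eps):
    """Canonical terms (sigma, u, v, z) for a third-order core B,
    obtained by truncating the SVD of each slice B[:, :, m]."""
    r3 = B.shape[2]
    terms = []
    for m in range(r3):
        u, s, vt = np.linalg.svd(B[:, :, m], full_matrices=False)
        p = int(np.sum(s > eps))      # singular values larger than eps
        z = np.zeros(r3)
        z[m] = 1.0                    # unit vector Z_m selecting slice m
        terms += [(s[k], u[:, k], vt[k], z) for k in range(p)]
    return terms

def cp_to_full(terms, shape):
    """Sum the rank-1 terms back into a full array."""
    T = np.zeros(shape)
    for sigma, u, v, z in terms:
        T += sigma * np.einsum('i,j,k->ijk', u, v, z)
    return T

rng = np.random.default_rng(0)
r, eps = 5, 0.5
B = rng.standard_normal((r, r, r))
terms = slicewise_cp(B, eps)
R = len(terms)                                   # canonical rank p_1 + ... + p_r
error = np.linalg.norm(B - cp_to_full(terms, B.shape))
```

Each slice contributes a Frobenius error of at most $\sqrt{r}\,\varepsilon$, so the total error stays below $r\varepsilon$, as stated in the remark.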
2.4 Numerical Algorithms
There are a number of algorithms in the literature to compute the CP and the Tucker models (see, e.g., [1, 5, 6, 28]). Based on Lemmas 4.1 and 4.2, a rank-$(r_1, \ldots, r_d)$ approximation can be calculated by Newton-type methods.

2.4.1 General Input Data
Our current MATLAB implementation of the orthogonal ALS algorithm (OALSA) in the general case of full-format input tensors is based on the method described in [5] (see also the implementation in [1]). It contains the following steps.
OALSA ($F \to \mathcal{T}_{\mathbf{r}}$). Given the input tensor $A_0 \in \mathbb{R}^{n_1 \times \ldots \times n_d}$ and a rank parameter $\mathbf{r} = (r_1, \ldots, r_d) \in \mathbb{N}^d$.
Step I: Compute the initial guess for the Tucker components.
Step II: For each $m = 1, \ldots, d$, the ALS iteration optimizes the canonical component $\mathbf{V}^{(m)}$, while the other matrix components are kept constant (equivalent to solving equation number $m$ in the system (29) (resp. (30))). Termination criterion: a fixed number of iterations, or control of the current increment.
Step III: Compute the core tensor via the contraction (16).
Steps II and III are standard, while the method of choice in Step I depends on the particular application. We distinguish three particular versions of OALSA ($F \to \mathcal{T}_{\mathbf{r}}$) adapted to different classes of input tensors:
(F) full-format $F$ (initial guess: truncated higher-order SVD (cf. [5]), approximation with smaller Tucker rank, or the multi-way cross-approximation algorithm proposed in [25]);
(C) type $\mathcal{C}_R$ with some $R > |\mathbf{r}|$ (may correspond to an analytic approximation via sinc-quadratures or exponential fitting). Initial guess: QR-decomposition with truncated higher-order SVD;
(T) type $\mathcal{T}_{\mathbf{R}}$ (may correspond to an analytic approximation via tensor-product interpolation). Initial guess: QR-decomposition with truncated higher-order SVD, or approximation with smaller Tucker rank combined with the best rank-1 approximation to the initial increment.
Cases (C) and (T) will be discussed in the next section.
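The three steps of the full-format algorithm can be sketched in NumPy as follows. This is not the authors' MATLAB code: it is a minimal variant that assumes a truncated higher-order SVD as the initial guess, an SVD-based update of each factor with the others fixed, and a fixed number of sweeps as the termination criterion. The names `unfold`, `project` and `tucker_als` are hypothetical.

```python
import numpy as np

def unfold(T, mode):
    """Matrix unfolding with the given mode as the row index."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def project(T, factors, skip=None):
    """Contract T with V[l]^T in every mode l except `skip`."""
    for l, V in enumerate(factors):
        if l != skip:
            T = np.moveaxis(np.tensordot(V.T, T, axes=(1, l)), 0, l)
    return T

def tucker_als(A0, ranks, n_iter=20):
    d = A0.ndim
    # Step I: truncated higher-order SVD as the initial guess.
    V = [np.linalg.svd(unfold(A0, l), full_matrices=False)[0][:, :ranks[l]]
         for l in range(d)]
    # Step II: update one factor at a time with the others kept fixed.
    for _ in range(n_iter):
        for l in range(d):
            W = unfold(project(A0, V, skip=l), l)
            V[l] = np.linalg.svd(W, full_matrices=False)[0][:, :ranks[l]]
    # Step III: core tensor B = A0 x_1 V1^T ... x_d Vd^T, cf. (16).
    return project(A0, V), V

rng = np.random.default_rng(0)
true_core = rng.standard_normal((2, 3, 2))
true_factors = [rng.standard_normal((n, r)) for n, r in zip((6, 7, 8), (2, 3, 2))]
A0 = np.einsum('abc,ia,jb,kc->ijk', true_core, *true_factors)

B, V = tucker_als(A0, ranks=(2, 3, 2))
A_r = np.einsum('abc,ia,jb,kc->ijk', B, *V)
```

In this test case the input has exact Tucker rank $(2, 3, 2)$, so the reconstruction `A_r` reproduces `A0` up to rounding; for general input the sweeps only reduce the error towards a local minimum of (12).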
2.4.2 Canonical-to-Tucker and Tucker-to-Tucker Decompositions
The efficient implementation in the cases (C) and (T) is based on MLA performed in the special tensor formats described above. In fact, if the initial guess has data-type $\mathcal{T}_{\mathbf{R}}$ or $\mathcal{C}_R$ with moderate $|\mathbf{R}|$, then the truncated higher-order SVD can be performed at lower cost than for full-format input data as in the case (F). Hence, we specify the corresponding versions of the general algorithm: OALSA ($\mathcal{C}_R \to \mathcal{T}_{\mathbf{r}}$) and OALSA ($\mathcal{T}_{\mathbf{R}} \to \mathcal{T}_{\mathbf{r}}$).
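To perform the truncated higher-order SVD in Step I of OALSA ($\mathcal{C}_R \to \mathcal{T}_{\mathbf{r}}$), we notice that for each $\ell = 1, \ldots, d$ the "matrix unfolding" $A_{(\ell)}$ of the input tensor $A_0$, of dimension $I_\ell \times (I_{\ell+1} \times \ldots \times I_d \times I_1 \times \ldots \times I_{\ell-1})$, can be represented as a rank-$R$ matrix. For example, for $\ell = 1$, we have
$$A_{(1)} = [a^{(1)}_{i_1 (i_2 \ldots i_d)}] = [a_{i_1 i_2 \ldots i_d}] \in \mathbb{R}^{n_1 \times (n_2 \cdots n_d)}.$$
We introduce the "single hole" product and the related dimension parameter $\bar{n}_\ell$,
$$V^{(\ell-)}_k = V^{(1)}_k \times_2 \ldots \times_{\ell-1} V^{(\ell-1)}_k \times_{\ell+1} V^{(\ell+1)}_k \ldots \times_d V^{(d)}_k, \quad \bar{n}_\ell = n_1 \cdots n_{\ell-1} n_{\ell+1} \cdots n_d, \tag{22}$$
and represent the rank-$R$ matrix unfolding in the form $A_{(\ell)} = A_\ell B_\ell^T$ with $A_\ell \in \mathbb{R}^{n_\ell \times R}$, $B_\ell \in \mathbb{R}^{\bar{n}_\ell \times R}$ given by
$$A_\ell = \mathbf{V}^{(\ell)} D \quad \text{with } D = \mathrm{diag}\{b_1, \ldots, b_R\} \tag{23}$$
and with
$$B_\ell = [\widetilde{V}^{(\ell-)}_1, \ldots, \widetilde{V}^{(\ell-)}_R], \tag{24}$$
where $\widetilde{V}^{(\ell-)}_k$ is the vector unfolding of $V^{(\ell-)}_k$. Then the algorithm reads as follows.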
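OALSA ($\mathcal{C}_R \to \mathcal{T}_{\mathbf{r}}$). Given the input tensor $A_0 \in \mathcal{C}_R$ in the form (31).
Step I: For each $\ell = 1, \ldots, d$ perform
1. QR-decomposition of $A_\ell$ and $B_\ell$ (cf. (23) and (24), respectively) in the form $A_\ell = Q^{(\ell)}_A R^{(\ell)}_A$, $B_\ell = Q^{(\ell)}_B R^{(\ell)}_B$;
2. SVD of the matrix $S_\ell = R^{(\ell)}_A R^{(\ell)T}_B \in \mathbb{R}^{R \times R}$ in the form $S_\ell = W_\ell D_\ell V_\ell^T$;
3. Truncation of $S_\ell$ to the rank-$r_\ell$ matrix $\widetilde{W}_\ell \widetilde{D}_\ell \widetilde{V}_\ell^T$ with $\widetilde{W}_\ell \in \mathbb{R}^{R \times r_\ell}$;
4. Compute the Tucker components $\mathbf{U}^{(\ell)} = Q^{(\ell)}_A \widetilde{W}_\ell$.
Starting with the initial values $\mathbf{U}^{(\ell)}$, $\ell = 1, \ldots, d$, proceed with Steps II and III as in the general version of OALSA.

The efficient implementation of Step I in OALSA ($\mathcal{T}_{\mathbf{R}} \to \mathcal{T}_{\mathbf{r}}$) is based on the observation that for each $\ell = 1, \ldots, d$ the "matrix unfolding" $A_{(\ell)}$ of $A_0 \in \mathcal{T}_{\mathbf{R}}$ can be represented as a rank-$R_\ell$ matrix. In fact, this is a direct consequence of the representation (3). Step I of the corresponding algorithm can then be designed similarly to that in OALSA ($\mathcal{C}_R \to \mathcal{T}_{\mathbf{r}}$). The resulting algorithm reads as follows.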
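OALSA ($\mathcal{T}_{\mathbf{R}} \to \mathcal{T}_{\mathbf{r}}$). Given the input tensor $A_0 \in \mathcal{T}_{\mathbf{R}}$ in the form (3).
Step I: For each $\ell = 1, \ldots, d$ compute
1. The components $A_\ell$ and $B_\ell$ in the representation $A_{(\ell)} = A_\ell B_\ell^T$, and perform
2. QR-decomposition of $A_\ell$ and $B_\ell$ in the form $A_\ell = Q^{(\ell)}_A R^{(\ell)}_A$, $B_\ell = Q^{(\ell)}_B R^{(\ell)}_B$;
3. SVD of the matrix $S_\ell = R^{(\ell)}_A R^{(\ell)T}_B \in \mathbb{R}^{R \times R}$ in the form $S_\ell = W_\ell D_\ell V_\ell^T$;
4. Truncation of $S_\ell$ to a rank-$r_\ell$ matrix $\widetilde{W}_\ell \widetilde{D}_\ell \widetilde{V}_\ell^T$ with $\widetilde{W}_\ell \in \mathbb{R}^{R \times r_\ell}$;
5. Compute the Tucker components $\mathbf{U}^{(\ell)} = Q^{(\ell)}_A \widetilde{W}_\ell$.
Starting with the initial values $\mathbf{U}^{(\ell)}$, $\ell = 1, \ldots, d$, proceed with Steps II and III as in the general version of OALSA.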
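The core of Step I in both algorithms is a truncated SVD of a matrix that is only available in the factored form $A_\ell B_\ell^T$. The sketch below shows this step for a single mode, assuming that both factors have at least as many rows as columns so that the reduced QR factors are square; the function name `leading_left_vectors` and the test sizes are hypothetical.

```python
import numpy as np

def leading_left_vectors(A, B, r):
    """Leading r left singular vectors and values of A @ B.T,
    computed without forming the (possibly huge) product."""
    QA, RA = np.linalg.qr(A)            # A = QA RA, RA is R x R
    QB, RB = np.linalg.qr(B)            # B = QB RB, RB is R x R
    W, s, Vt = np.linalg.svd(RA @ RB.T) # small R x R problem
    return QA @ W[:, :r], s[:r]         # truncate and map back

rng = np.random.default_rng(0)
n_l, n_bar, R, r = 20, 300, 5, 3
A = rng.standard_normal((n_l, R))
B = rng.standard_normal((n_bar, R))

U_r, sigma = leading_left_vectors(A, B, r)
```

The cost is dominated by the two QR factorizations, $O((n_\ell + \bar{n}_\ell) R^2)$, whereas an SVD of the explicit unfolding would scale with $n_\ell \bar{n}_\ell$.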
2.4.3 Combination of the Tucker and CP Formats
In some quantum chemistry applications the target tensor may contain Tucker components with different scales of amplitudes and decay rates. In this case we can observe unstable convergence of the ALS iteration, likely due to the large variation in the elements of the core tensor. By analogy with the matrix case, we call this phenomenon ill-conditioning of a tensor. In such cases, to stabilize the convergence of the ALS iteration (without destroying the approximation power), we introduce the mixed model denoted by $\mathcal{M}_{(\mathcal{C}_{r_1}, \mathcal{T}_{\mathbf{r}_2})}$. We say that $A \in \mathcal{M}_{(\mathcal{C}_{r_1}, \mathcal{T}_{\mathbf{r}_2})}$ if
$$A := A_1 + A_2 \quad \text{with } A_1 \in \mathcal{C}_{r_1}, \ A_2 \in \mathcal{T}_{\mathbf{r}_2}.$$
The above format can be interpreted as a particular case of the recently proposed block-term decomposition, to appear in [8] (see the remark in the Introduction). We assume that the dominating component in $A$ can be well approximated via the CP model, $A_1 \in \mathcal{C}_{r_1}$, while the residual (which is better conditioned) can be further approximated in the Tucker format, $A_2 \in \mathcal{T}_{\mathbf{r}_2}$.

[Figure: two panels, "Tucker decomposition, AR=10, n = 65" (left) and "combined CT decomposition, AR=10, n = 65" (right), showing the errors $E_{FN}$, $E_{FE}$, $E_C$ on a logarithmic scale versus the Tucker rank.]
Fig. 2 Convergence history for the Tucker-type (left) and mixed CT (right) approximations for the tri-Slater potential (25).
To approximate the given tensor $A_0$ by the $\mathcal{M}_{(\mathcal{C}_{r_1}, \mathcal{T}_{\mathbf{r}_2})}$ model, we apply the following iterative (heuristic) algorithm. Given the input tensor $A_0 \in \mathbb{R}^{n_1 \times \ldots \times n_d}$,
1. Compute its CP approximation $C_0$ in $\mathcal{C}_{r_1}$;
2. Compute the best rank-$\mathbf{r}_2$ Tucker-type decomposition $T_0$ of $A_0 - C_0$ with $T_0 \in \mathcal{T}_{\mathbf{r}_2}$;
3. If $\|A_0 - C_0 - T_0\| \le \varepsilon_{tol}$ then stop, otherwise go to Step 1 with $A_0$ substituted by $A_0 - T_0$.
This algorithm demonstrated robust convergence in a number of numerical tests. We give a numerical example for the mixed tri-linear approximation of the FGT corresponding to
$$g(x) = C_1 e^{-\alpha_1 |x|} + C_2 e^{-\alpha_2 |x - x_2|} + C_3 e^{-\alpha_3 |x - x_3|} \tag{25}$$
in $[-AR, AR]^3 \subset \mathbb{R}^3$ with $0 < C_2, C_3 \ll C_1$, $0 < \alpha_2, \alpha_3 \ll \alpha_1$. This function has features similar to the case of the H$_2$O molecule. For this example, the first component can be well approximated by a rank-1 tensor, hence we use $A_1 \in \mathcal{C}_1$. Fig. 2 corresponds to the choice $AR = 10$, $C_1 = 150$, $C_2 = 30$, $C_3 = 20$, $\alpha_1 = 5000$, $\alpha_2 = \alpha_3 = 1$ and $x_2 = (AR/6, 0, AR/5)$, $x_3 = (AR/6, 0, -AR/5)$. The corresponding notations are described in §3.1. The pictures illustrate a stable exponential convergence in the Tucker rank for the mixed CT model, which allows one to achieve a given accuracy up to machine precision. A possible explanation of the instabilities in the Tucker convergence curve is the appearance of a very small component at the last position, which becomes meaningless.

Clearly, the format $\mathcal{M}_{(\mathcal{C}_{r_1}, \mathcal{T}_{\mathbf{r}_2})}$ can be applied successively to each of the components in the Tucker format $\mathcal{T}_{\mathbf{r}_2}$. Moreover, the complexity of the Tucker-type representation with fixed rank $r$ can be reduced by computing a decomposition in the form $\mathcal{M}_{(\mathcal{T}_{\mathbf{r}_1}, \mathcal{T}_{\mathbf{r}_2})}$ with rank parameters smaller than $r$, say with $r_1 = r_2 = r/2$. A simple iteration like the one described above has shown robust convergence.
3 Application to Classical Potentials
3.1 Introducing Notations
Here we discuss the best rank-$\mathbf{r}$ Tucker-type decomposition of 3D tensors arising as discretization of classical potentials. Let $\omega_d \subset \mathbb{R}^d$ be a uniform or adaptively refined tensor-product grid indexed by $I_1 \times ... \times I_d$. For a given function $g : \Omega \to \mathbb{R}$, with $\Omega \subset \mathbb{R}^d$ and with $\omega_d \subset \Omega$, we introduce the collocation-type function-generated tensor (FGT) of order $d$ by
\[ A_0 \equiv A(g) := [a_{i_1 ... i_d}] \in \mathbb{R}^{I_1 \times ... \times I_d} \quad \text{with} \quad a_{i_1 ... i_d} := g(x^{(1)}_{i_1}, ..., x^{(d)}_{i_d}), \]
where $(x^{(1)}_{i_1}, ..., x^{(d)}_{i_d}) \in \omega_d$ are grid collocation points. Such an approximation is known as the Nyström discretization. We are interested in the validity and the rank dependence of the above tensor decomposition algorithms for approximating the 3D FGT generated by the Newton potential, Slater-type functions, the Yukawa and the Helmholtz potentials.
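The construction of a function-generated tensor can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the helper names `cell_centered_grid` and `function_generated_tensor` are hypothetical, the grid is taken to be uniform and cell-centered on $[0, 10]^3$, and a small grid size is used so the example runs quickly.

```python
import numpy as np

def cell_centered_grid(a, b, n):
    """Midpoints of the n equal cells of [a, b] (hypothetical helper)."""
    h = (b - a) / n
    return a + h * (np.arange(n) + 0.5)

def function_generated_tensor(g, grids):
    """Sample g at every point of the tensor-product grid given by 1D grids."""
    mesh = np.meshgrid(*grids, indexing="ij")  # one coordinate array per mode
    return g(*mesh)

def newton(x1, x2, x3):
    """Newton potential 1/|x| with the Euclidean norm."""
    return 1.0 / np.sqrt(x1**2 + x2**2 + x3**2)

# Assumed setting: cube [0, 10]^3, n = 8 points per mode (small, for speed).
x = cell_centered_grid(0.0, 10.0, 8)
A0 = function_generated_tensor(newton, [x, x, x])
print(A0.shape)
```

Cell-centered points avoid evaluating the potential at its singularity in the origin, which is presumably why such a grid is convenient for $1/|x|$.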
The initial tensor $A_0$ is decomposed by the Tucker model of the rank $\mathbf{r} = (r, ..., r)$, where the rank parameter $r$ increases from $r = 1, 2, ...$ to some predefined value. The Tucker components and the core tensor of size $r \times r \times r$ are then used to reconstruct the approximating tensor $A_{(r)} \approx A_0$, which is used for estimating approximation properties of the tensor decomposition with the given rank. For every rank-$r$ Tucker decomposition, we compute the relative energy-norm (Frobenius norm) as in (1),
\[ E_{FN} = \frac{\|A_0 - A_{(r)}\|}{\|A_0\|}, \]
the relative $\ell_2$-energy
\[ E_{FE} = \frac{\|A_0\| - \|A_{(r)}\|}{\|A_0\|}, \]
as well as the maximum (Chebyshev) norm
\[ E_C := \frac{\max_{i \in I} |a_{0,i} - a_{r,i}|}{\max_{i \in I} |a_{0,i}|}. \]
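The three error measures can be evaluated as follows. This is a minimal sketch, assuming that $E_{FN}$ is the relative Frobenius norm of the difference, that $E_{FE}$ is the relative difference of the Frobenius norms, and that $E_C$ is the relative maximum entrywise error; the function name `relative_errors` is hypothetical.

```python
import numpy as np

def relative_errors(A0, Ar):
    """Return (E_FN, E_FE, E_C) for a tensor A0 and its approximation Ar."""
    norm0 = np.linalg.norm(A0)  # Frobenius norm over all entries
    e_fn = np.linalg.norm(A0 - Ar) / norm0
    e_fe = (norm0 - np.linalg.norm(Ar)) / norm0  # assumed definition of E_FE
    e_c = np.max(np.abs(A0 - Ar)) / np.max(np.abs(A0))
    return e_fn, e_fe, e_c

# Toy check: scaling a tensor by 0.9 gives a 10% error in all three measures.
A0 = np.ones((2, 2, 2))
e_fn, e_fe, e_c = relative_errors(A0, 0.9 * A0)
print(e_fn, e_fe, e_c)
```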
3.2 Newton Potential
We apply the best rank-$\mathbf{r}$ Tucker-type decomposition algorithm with $\mathbf{r} = (r, ..., r)$ for approximating the Newton potential
\[ g(x) = \frac{1}{|x|}, \quad x \in \mathbb{R}^3, \]
in the cube $[0, AR]^3$ ($AR = 10$) on the cell-centered uniform grid with $n = 64$. Here and in the following $|x|$ denotes the Euclidean norm of $x \in \mathbb{R}^d$. Fig. 3 shows the convergence of the relative energy and Chebyshev norms as well as of the relative energy with respect to the Tucker rank up to $r = 12$. The orthogonal components $U_k^{(1)}$ are given for $k = 1, ..., 6$. Exponential convergence in the Tucker rank $r$ is clearly seen.
[Figure 3. Left panel "Newton potential, AR=10, n = 64": errors $E_{FN}$, $E_{FE}$, $E_C$ (logarithmic scale) versus Tucker rank. Right panel "Orthogonal components, r=6": component values versus grid points.]
Fig. 3 Convergence history (left) and the Tucker components $U_k^{(1)}$, $k = 1, ..., 6$, (right) for the Tucker-type approximation to the Newton potential.
The particular $E_C$-error distribution is shown in Fig. 4.
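The qualitative behaviour of the error in the Tucker rank can be reproduced with a short experiment. The sketch below is not the best rank-$\mathbf{r}$ algorithm of the paper: it substitutes the truncated higher-order SVD (HOSVD), a quasi-optimal Tucker approximation, and uses a coarse grid ($n = 16$) so that it runs quickly. The helper names `unfold`, `mode_product` and `hosvd_truncate` are hypothetical.

```python
import numpy as np

def unfold(A, mode):
    """Mode-`mode` matricization: rows are indexed by the chosen mode."""
    return np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)

def mode_product(A, M, mode):
    """Multiply tensor A by matrix M along axis `mode`."""
    out = np.tensordot(A, M, axes=(mode, 1))  # contracted axis is replaced at the end
    return np.moveaxis(out, -1, mode)         # move the new axis back into place

def hosvd_truncate(A, r):
    """Truncated HOSVD with rank (r, ..., r): returns (approximation, core, factors)."""
    U = [np.linalg.svd(unfold(A, k), full_matrices=False)[0][:, :r]
         for k in range(A.ndim)]
    core = A
    for k, Uk in enumerate(U):
        core = mode_product(core, Uk.T, k)    # project onto the leading subspaces
    Ar = core
    for k, Uk in enumerate(U):
        Ar = mode_product(Ar, Uk, k)          # map the core back to full size
    return Ar, core, U

# Newton potential on a cell-centered grid in [0, 10]^3 (assumed n = 16).
n = 16
x = (np.arange(n) + 0.5) * 10.0 / n
X1, X2, X3 = np.meshgrid(x, x, x, indexing="ij")
A0 = 1.0 / np.sqrt(X1**2 + X2**2 + X3**2)

errs = [np.linalg.norm(A0 - hosvd_truncate(A0, r)[0]) / np.linalg.norm(A0)
        for r in range(1, 7)]
for r, e in enumerate(errs, start=1):
    print(r, e)
```

Because the HOSVD subspaces are nested in $r$, the relative Frobenius error printed here cannot increase with the rank; the best Tucker approximation computed by ALS can only be at least as accurate.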
[Figure 4. Left panel "Initial Newton potential, z=1". Right panel "Reconstruction error for the Newton potential, z=1" (scale $\times 10^{-5}$). Both are surface plots over the $n \times n$ grid.]
Fig. 4 A plane section of the 3D Newton potential on the level $n \times n \times 1$: The initial function (left), the $E_C$-error for the same z-coordinate level with $r = 12$ (right).
3.3 Functions Related to the Hartree-Fock Equation
The Hartree-Fock equation for the $N$-electron density function reads as the system of nonlinear eigenvalue problems
\[ \mathcal{F} \phi_i(x) = \lambda_i \phi_i(x), \quad \text{for } i = 1, ..., N/2 \tag{26} \]
with $x \in \mathbb{R}^3$. Here the solution-dependent Fock operator is defined by
\[ \mathcal{F} \phi(x) := -\frac{1}{2} \Delta \phi(x) - V_c(x) \phi(x) + (\mathcal{J} \phi)(x) + (\mathcal{K} \phi)(x), \quad x \in \mathbb{R}^3, \]
where, for the given density matrix $\rho(x, y)$, the Hartree and exchange potentials are defined by
\[ (\mathcal{J} \phi)(x) := \int_{\mathbb{R}^3} \frac{\rho(y, y)}{|x - y|} \, d^3 y \; \phi(x), \qquad (\mathcal{K} \phi)(x) := -\frac{1}{2} \int_{\mathbb{R}^3} \frac{\rho(x, y)}{|x - y|} \, \phi(y) \, d^3 y, \]
respectively. Furthermore, the nuclei potential is given by
\[ V_c(x) = \sum_{a=1}^{K} \frac{Z_a}{|x - R_a|}, \]
where $R_a$ and $Z_a$ describe the positions and charges of the nuclei, respectively. With given eigenfunctions $\phi_i$, the density matrix is defined by the corresponding spectral projection
\[ \rho(x, y) = 2 \sum_{i=1}^{N/2} \phi_i^*(x) \phi_i(y) \]
with exponential decay $\rho(x, y) \sim \exp(-\lambda |x - y|)$ for nonmetallic systems. The problem consists of a tensor-product representation of the density functions $\rho(x, y)$ and $\rho(x, x)$ as well as of the Hartree and exchange potentials involved. Along with the Newton potential discussed above, in the following we consider some simple examples of the density function. Notice that computation of the Hartree potential can be performed via direct evaluation of the convolution product between the Newton potential and the density $\rho(x, x)$ represented in the Tucker/CP format (cf. §2.2.4). In this case the Newton potential is supposed to be approximated by the CP model (see [14], [20] for more details).
The Slater function given by
\[ g(x) = \exp(-\alpha |x|) \quad \text{with } x = (x_1, x_2, x_3)^T \in \mathbb{R}^3 \]
represents the electron “orbital” ($\alpha = 1$) and the electron density function ($\alpha = 2$) corresponding to the hydrogen atom. In this case $g(x)$ satisfies the one-particle Hartree-Fock equation, which takes the form
\[ -\frac{1}{2} \Delta \phi(x) - \frac{1}{|x|} \phi(x) = \lambda \phi(x), \quad x \in \mathbb{R}^3, \quad \phi \in H^1(\mathbb{R}^3). \]
[Figure 5. Left panel "Slater potential, AR=10, n = 64": errors $E_{FN}$, $E_{FE}$, $E_C$ (logarithmic scale) versus Tucker rank. Right panel "Orthogonal components, r=6": component values versus grid points.]
Fig. 5 Convergence history and Tucker components for the Slater potential.
We apply the rank-$(r, r, r)$ Tucker-type decomposition to the FGT defined on the 3D grid with $AR = 10$. Fig. 6 presents the slices of the $9 \times 9 \times 9$ core tensor, where the numbers indicate the maximum values at the given slice of $B$. The figure shows that the energy of the decomposed function is concentrated in several upper slices of the core tensor. It exposes the potential data compression abilities of the Tucker approximation due to the sparsity of the thresholded $B$. However, our numerical experiments show that the dominating entries in $B$ are compactly concentrated in its “upper left corner”, which indicates that the core tensor truncation may have, in fact, an effect similar to that of a decomposition with a smaller Tucker rank.
[Figure 6. Left panel "Slater potential, AR=10, n = 64": nine slices of the core tensor; the slice maxima shown in the panel, listed here in decreasing order, are 1.0e+01, 7.8e−01, 1.4e−01, 3.2e−02, 9.1e−03, 2.8e−03, 8.8e−04, 2.7e−04, 8.1e−05. Right panel "erf−function potential, AR=10, n = 64": errors $E_{FN}$, $E_{FE}$, $E_C$ (logarithmic scale) versus Tucker rank.]
Fig. 6 Slices of the $9 \times 9 \times 9$ core tensor for the Tucker-type approximation of the Slater potential (left). Numbers indicate maximum values in $B$ for the given slice. Approximation error for the function $\mathrm{erf}(|x|)/|x|$ (right).
Since the Hartree-Fock equation contains the product $\frac{1}{|x|} \phi(x)$ with $\phi(x) = e^{-|x|}$ (the so-called Yukawa potential), it is interesting to compute the tensor decomposition of the Hadamard product of the discrete Newton and Slater potentials. The convergence results for the Tucker approximation are presented in Fig. 9. Next, we consider a radially symmetric potential generated by the modified erf-function given by
\[ g(x) = \frac{\mathrm{erf}(|x|)}{|x|} \quad \text{with } x = (x_1, x_2, x_3)^T \in \mathbb{R}^3, \]
which frequently arises in quantum chemistry computations. Here we define
\[ \mathrm{erf}(z) := \frac{2}{\sqrt{\pi}} \int_0^z e^{-t^2} \, dt, \quad z \ge 0. \]
The behavior of the approximation errors for the trilinear Tucker-type decomposition of the erf-function is shown in Fig. 6, right. Computations were performed on the $n \times n \times n$ grid with $n = 64$, for the interval $AR = 10$. The rank-$r$ approximation exhibits a good convergence rate already for the Tucker rank $r = 8$; the corresponding Frobenius norm error is of the order $10^{-6}$.
[Figure 7. Left panel: surface plot of a slice of the multi-centered Slater potential. Right panel "Slater−multi potential, AR=10, n = 32": errors E(FN), E(FE), E(C) (logarithmic scale) versus Tucker rank.]
Fig. 7 A slice $n \times n \times 2$ of a 3D multi-centered Slater potential (left) and the corresponding approximation error vs. the Tucker rank (right).
Finally, we analyze the “multi-centered Slater potential” obtained by displacing a single Slater potential with respect to the $m \times m \times m$ spatial grid of size $H > 0$ with randomly perturbed centres,
\[ g(x) = \sum_{i=1}^{m} \sum_{j=1}^{m} \sum_{k=1}^{m} e^{-\alpha \sqrt{(x_1 - iH)^2 + (x_2 - jH)^2 + (x_3 - kH)^2}}. \]
Fig. 7 shows the multi-centered Slater potential for $m = 4$, $H = 3$, $\alpha = 2$ and the corresponding approximation error (nonperturbed case) in the cube $[0, AR]^3$ with $AR = 10$ on the $n \times n \times n$ grid with $n = 64$; the surface level corresponds to $n \times n \times 2$. Results in the case of the randomly perturbed Slater potential are given in Fig. 8.
[Figure 8. Three panels, "Slater−Mult−Rand 1%, AR=10, n = 64", "Slater−Mult−Rand 0.1%, AR=10, n = 64" and "Slater−Mult−Rand 0.01%, AR=10, n = 64": relative energy-norm and relative energy errors (logarithmic scale) versus Tucker rank.]
Fig. 8 Convergence history for the multi-centered randomly perturbed Slater potential.
3.4 Yukawa and Helmholtz Potentials
In the next example, we consider a trilinear Tucker-type approximation of the third-order function-related tensor generated by the Yukawa potential
\[ g(x) = \frac{e^{-|x|}}{|x|} \quad \text{with } x = (x_1, x_2, x_3)^T \in \mathbb{R}^3. \]
We consider the FGT with cell-centered collocation points with respect to the $n \times n \times n$ grid over $[0, AR]^3$ with $AR = 10$. Fig. 9 shows the convergence history and the orthogonal components for the Tucker-type decomposition of the Yukawa potential given on uniform grids with $n = 32, 64, 128$, and with the core tensor of size $6 \times 6 \times 6$. These components represent the “optimal” adaptive basis, which has a tendency to reproduce shapes similar to the sinc functions. In almost all cases the ALS method was terminated after 5 iterations, indicating robust convergence in the considered applications.
[Figure 9. Left panel "Yukawa potential, AR=10, n = 64": errors $E_{FN}$, $E_{FE}$, $E_C$ (logarithmic scale) versus Tucker rank. Right panel "Orthogonal components, r=6": component values versus grid points.]
Fig. 9 Tucker-type approximation of the Yukawa potential and example of the Tucker components.
Fig. 10 provides computational results for the Helmholtz function given by
\[ g(x) = \frac{\cos |x|}{|x|} \quad \text{with } x = (x_1, x_2, x_3)^T \in \mathbb{R}^3. \]
[Figure 10. Left panel "Helmholz potential, AR=6.2832, n = 64": errors $E_{FN}$, $E_{FE}$, $E_C$ (logarithmic scale) versus Tucker rank. Right panel "Orthogonal components, r=6": component values versus grid points.]
Fig. 10 Convergence history (left) and orthogonal components $U_k$, $k = 1, ..., 6$, (right) for the Tucker-type approximation of the Helmholtz potential.
4 Appendix
4.1 Features of the Dual Maximisation Problem
For given components $V^{(\ell)} = [V_1^{(\ell)} V_2^{(\ell)} ... V_{r_\ell}^{(\ell)}]$, we define the “single hole” tensor
\[ B_{(\neg m)} = A_0 \times_1 V^{(1)T} ... \times_{m-1} V^{(m-1)T} \times_{m+1} V^{(m+1)T} ... \times_d V^{(d)T} \]
and let $B_{(\neg m)} \in \mathbb{R}^{n_m \times \bar{r}_m}$ be the corresponding matrix representation, where $\bar{r}_m = r_1 \cdots r_{m-1} r_{m+1} \cdots r_d$. The following lemma reduces the minimisation of the original quadratic functional to the dual maximization problem, thus eliminating the core tensor $B$ from the solution process.
Lemma 4.1. ([5]) For given $A_0 \in \mathbb{R}^{I_1 \times ... \times I_d}$, the minimization problem (12) on $\mathcal{T}_{\mathbf{r}}$ is equivalent to the dual maximization problem
\[ g(V^{(1)}, ..., V^{(d)}) := \left\| A_0 \times_1 V^{(1)T} \times_2 ... \times_d V^{(d)T} \right\|^2 \to \max \tag{27} \]
over a set $V^{(\ell)} \in \mathbb{R}^{|I_\ell| \times r_\ell}$ from the Stiefel manifold, i.e., $V^{(\ell)} \in \mathcal{V}_\ell$ ($\ell = 1, ..., d$). For given matrices $V^{(m)}$ ($m = 1, ..., d$), the tensor $B$ minimizing (12) is represented by
\[ B = A_0 \times_1 V^{(1)T} \times_2 ... \times_d V^{(d)T} \in \mathbb{R}^{r_1 \times ... \times r_d}. \tag{28} \]
The following lemma provides the explicit Lagrange equations for the dual maximization problem.
Lemma 4.2. ([20]) The problem (27) has at least one global maximum. At each extremal point the corresponding Lagrange equations read as
\[ 2 (I - V^{(m)} V^{(m)T}) \cdot B_{(\neg m)} \cdot B_{(\neg m)}^T \cdot V^{(m)} = 0 \quad (1 \le m \le d). \tag{29} \]
Under the compatibility condition $r_m \le \bar{r}_m$ ($1 \le m \le d$), equation (29) is solvable for any $m = 1, ..., d$.
It is readily seen that in the case of a rank-1 approximation (i.e., $\mathbf{r} = (1, ..., 1)$) the system of Lagrange equations (29) combined with (28) can be written in the form
\[ A_0 \times_1 V^{(1)T} ... \times_{m-1} V^{(m-1)T} \times_{m+1} V^{(m+1)T} ... \times_d V^{(d)T} = b_1 V^{(m)}, \]
\[ A_0 \times_1 V^{(1)T} \times_2 ... \times_d V^{(d)T} = b_1, \quad \|V^{(m)}\| = 1 \quad (1 \le m \le d) \tag{30} \]
with $b_1 \in \mathbb{R}$ (cf. [5]). These equations explicitly represent the numerical scheme of the ALS iteration for computing the best rank-1 approximation.
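Read as a fixed-point iteration, the rank-1 equations suggest the following alternating scheme: contract the tensor with all current vectors except the $m$-th, normalize the result to obtain the new $V^{(m)}$, and take the norm as $b_1$. The code below is a minimal sketch of that idea; the function name `rank1_als`, the random initial guess and the fixed number of sweeps are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def rank1_als(A, sweeps=20, seed=0):
    """Alternating iteration for a rank-1 approximation b1 * V1 x V2 x ... x Vd."""
    rng = np.random.default_rng(seed)
    V = [rng.standard_normal(n) for n in A.shape]   # assumed random start
    V = [v / np.linalg.norm(v) for v in V]
    b1 = 0.0
    for _ in range(sweeps):
        for m in range(A.ndim):
            w = A
            # Contract every mode except m; going from the last axis to the
            # first keeps the indices of the remaining lower axes unchanged.
            for k in reversed(range(A.ndim)):
                if k != m:
                    w = np.tensordot(w, V[k], axes=(k, 0))
            b1 = np.linalg.norm(w)   # after normalization, full contraction equals b1
            V[m] = w / b1
    return b1, V

# Exact rank-1 test tensor 3 * a x b x c with unit vectors a, b, c.
a = np.array([1.0, 2.0, 2.0]) / 3.0
b = np.array([3.0, 4.0]) / 5.0
c = np.array([0.0, 1.0, 0.0, 0.0])
A = 3.0 * np.einsum("i,j,k->ijk", a, b, c)

b1, V = rank1_als(A)
A1 = b1 * np.einsum("i,j,k->ijk", *V)
print(b1, np.linalg.norm(A - A1))
```

For an exactly rank-1 tensor the iteration reproduces the tensor after the first sweep; for general tensors it may converge only to a local maximum of the dual problem.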
4.2 Canonical Tensor Decomposition
The CP model is a simplified version of a general Tucker decomposition (3) defined by
\[ A_{(r)} = \sum_{k=1}^{r} b_k \times_1 V_k^{(1)} \times_2 ... \times_d V_k^{(d)} \approx A, \quad b_k \in \mathbb{C}, \tag{31} \]
where the Kronecker factors $V_k^{(\ell)} \in \mathbb{C}^{I_\ell}$ are unit-norm vectors. Indeed, the decomposition (31) can be viewed as a special case of the Tucker model (3), where $r = r_1 = ... = r_d$ and $b_{k_1 ... k_d} = 0$ unless $k_1 = k_2 = ... = k_d$, i.e., only the super-diagonal of $B = \{b_{\mathbf{k}}\}$ is non-zero. The trilinear CP-decomposition is visualized in Fig. 11.
[Figure 11. The tensor $A$ drawn as a sum of $r$ rank-1 terms, the $k$-th term having weight $b_k$ and factors $V_k^{(1)}$, $V_k^{(2)}$, $V_k^{(3)}$.]
Fig. 11 Visualization of the CP-decomposition for a third-order tensor.
The minimal number $r$ in the representation (31) is called the Kronecker rank (or just rank) of a given tensor $A_{(r)}$. Under moderate assumptions, the CP decomposition with rank $r$ is unique [7, 22, 26, 27]. We denote by $\mathcal{C}_r$ the set of component-wise normalized tensors parameterized by (31). Given $A_0 \in \mathbb{R}^{I_1 \times ... \times I_d}$, its CP approximation can be derived by minimization of the quadratic cost functional
\[ f(A) := \|A - A_0\|^2 \to \min \tag{32} \]
over all rank-$r$ tensors $A \in \mathcal{C}_r$. In the case of orthogonally decomposable tensors, simple methods to construct the CP approximation which avoid solving the minimization problem (32) can be based on greedy algorithms (cf. [30]).
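4.3 Rayleigh Quotient Approximation
For a given symmetric matrix $A \in \mathbb{R}^{n \times n}$ and $V = \mathbb{R}^n$, the so-called Rayleigh quotient
\[ R(u) := \frac{\langle Au, u \rangle}{\langle u, u \rangle}, \quad u \in \mathbb{R}^n, \]
is known to have the fundamental property that
\[ \lambda = \min_{u \in V, \, u \ne 0} R(u) \quad \text{and} \quad v = \operatorname*{argmin}_{u \in V, \, u \ne 0} R(u) \tag{33} \]
are the minimal eigenvalue and the corresponding eigenvector of $A$, $Av = \lambda v$.
Proposition 4.3. Assume that we have an approximation to $\lambda$ (resp. to $v$) via minimization on a certain subspace $V_r \subset V$ with $\dim(V_r) = r < n$,
\[ \lambda_r = \min_{u \in V_r, \, u \ne 0} R(u) \quad \text{and} \quad v_r = \operatorname*{argmin}_{u \in V_r, \, u \ne 0} R(u). \tag{34} \]
Then one obtains the quadratic error estimate for the eigenvalue $\lambda_r$,
\[ \lambda_r - \lambda \le \|A - \lambda I\|_2 \, \|v - v_r\|^2. \tag{35} \]
Proof. The proof is instructive. Supposing that $\langle v, v \rangle = \langle v_r, v_r \rangle = 1$, we obtain (cf. [29])
\[ \langle A(v - v_r), v - v_r \rangle = \langle Av, v \rangle - 2 \langle Av, v_r \rangle + \langle Av_r, v_r \rangle = \lambda - 2 \lambda \langle v, v_r \rangle + \lambda_r = \lambda (2 - 2 \langle v, v_r \rangle) + \lambda_r - \lambda = \lambda \langle v - v_r, v - v_r \rangle + \lambda_r - \lambda, \]
which implies
\[ \lambda_r - \lambda = \langle (A - \lambda I)(v - v_r), v - v_r \rangle. \tag{36} \]
Then (35) follows.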
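The identity (36) and the estimate (35) are easy to check numerically. The following sketch uses a random symmetric matrix and a random $r$-dimensional subspace; the sizes, the seed and the variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 8, 3

M = rng.standard_normal((n, n))
A = (M + M.T) / 2                      # random symmetric test matrix

evals, evecs = np.linalg.eigh(A)       # eigenvalues in ascending order
lam, v = evals[0], evecs[:, 0]         # minimal eigenpair of A

# Orthonormal basis Q of a random subspace V_r; minimizing the Rayleigh
# quotient over V_r amounts to the smallest eigenpair of Q^T A Q.
Q, _ = np.linalg.qr(rng.standard_normal((n, r)))
evals_r, Y = np.linalg.eigh(Q.T @ A @ Q)
lam_r, v_r = evals_r[0], Q @ Y[:, 0]
if v @ v_r < 0:
    v_r = -v_r                         # fix the sign so that v_r is close to v

d = v - v_r
shifted = A - lam * np.eye(n)
identity_rhs = d @ shifted @ d                   # right-hand side of (36)
bound = np.linalg.norm(shifted, 2) * (d @ d)     # right-hand side of (35)
print(lam_r - lam, identity_rhs, bound)
```

The sign normalization of `v_r` does not affect the identity, but it makes the bound as tight as possible.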
Acknowledgements. Numerous helpful suggestions by Prof. W. Hackbusch are gratefully acknowledged. The authors are grateful to the referees for valuable comments which have led to substantial improvement of the manuscript.
References
[1] B.W. Bader and T.G. Kolda: MATLAB tensor classes for fast algorithm prototyping. SANDIA Report, SAND2004-5187, Sandia National Laboratories, 2004.
[2] G. Beylkin and M.M. Mohlenkamp: “Numerical operator calculus in higher dimensions”, PNAS, Vol. 99, (2002), pp. 10246–10251.
[3] J.D. Carroll and J. Chang: “Analysis of individual differences in multidimensional scaling via an N-way generalization of ‘Eckart-Young’ decomposition”, Psychometrika, Vol. 35, (1970), pp. 283–319.
[4] L. De Lathauwer, B. De Moor and J. Vandewalle: “A multilinear singular value decomposition”, SIAM J. Matrix Anal. Appl., Vol. 21, (2000), pp. 1253–1278.
[5] L. De Lathauwer, B. De Moor and J. Vandewalle: “On the best rank-1 and rank-(R1, ..., RN) approximation of higher-order tensors”, SIAM J. Matrix Anal. Appl., Vol. 21, (2000), pp. 1324–1342.
[6] L. De Lathauwer, B. De Moor and J. Vandewalle: “Computation of the canonical decomposition by means of a simultaneous generalised Schur decomposition”, SIAM J. Matrix Anal. Appl., Vol. 26, (2004), pp. 295–327.
[7] L. De Lathauwer: “A link between the canonical decomposition in multilinear algebra and simultaneous matrix diagonalisation”, SIAM J. Matrix Anal. Appl., Vol. 28, (2006), pp. 642–666.
[8] L. De Lathauwer: Decomposition of a higher-order tensor in block terms. Part II: Definitions and uniqueness. Tech. Report no. 07-81, ESAT/SCD/SISTA, K.U. Leuven, Belgium, 2007.
[9] H.-J. Flad, W. Hackbusch, B.N. Khoromskij and R. Schneider: Concept of data-sparse tensor-product approximation in many-particle models, Leipzig-Kiel, 2006 (in preparation).
[10] I.P. Gavrilyuk, W. Hackbusch and B.N. Khoromskij: “Tensor-product approximation to elliptic and parabolic solution operators in higher dimensions”, Computing, Vol. 74, (2005), pp. 131–157.
[11] G.H. Golub and C.F. Van Loan: Matrix Computations, Johns Hopkins University Press, Baltimore, MD, 1996.
[12] W. Hackbusch: “Fast and exact projected convolution for non-equidistant grids”, Preprint: 102, MPI MIS, Leipzig 2006.
[13] W. Hackbusch and B.N. Khoromskij: “Low-rank Kronecker product approximation to multi-dimensional nonlocal operators. Part I. Separable approximation of multivariate functions”, Computing, Vol. 76, (2006), pp. 177–202.
[14] W. Hackbusch and B.N. Khoromskij: “Low-rank Kronecker product approximation to multi-dimensional nonlocal operators. Part II. HKT representations of certain operators”, Computing, Vol. 76, (2006), pp. 203–225.
[15] W. Hackbusch, B.N. Khoromskij and E.E. Tyrtyshnikov: “Hierarchical Kronecker tensor-product approximations”, J. Numer. Math., Vol. 13, (2005), pp. 119–156.
[16] W. Hackbusch, B.N. Khoromskij and E.E. Tyrtyshnikov: “Approximate Iterations for Structured Matrices”, Preprint: 112, MPI MIS, Leipzig 2005 (Numer. Math., submitted).
[17] R. Harshman: “Foundation of the PARAFAC procedure: Model and conditions for an ‘explanatory’ multi-mode factor analysis”, UCLA Working Papers in Phonetics, Vol. 16, (1970), pp. 1–84.
[18] B.N. Khoromskij: “Structured data-sparse approximation to high order tensors arising from the deterministic Boltzmann equation”, Math. Comp., Vol. 76, (2007), pp. 1275–1290.
[19] B.N. Khoromskij: “An introduction to structured tensor-product representation of discrete nonlocal operators”, Lecture Notes MPI MIS Leipzig, Vol. 27, (2005).
[20] B.N. Khoromskij: “Structured rank-(r1, ..., rd) decomposition of function-related tensors in R^d”, Comp. Meth. in Applied Math., Vol. 6, (2006), pp. 194–220.
[21] T. Kolda: “Orthogonal tensor decompositions”, SIAM J. Matrix Anal. Appl., Vol. 23, (2001), pp. 243–255.
[22] J.B. Kruskal: “Three-way arrays: rank and uniqueness of trilinear decompositions, with applications to arithmetic complexity and statistics”, Linear Algebra Appl., Vol. 18, (1977), pp. 95–138.
[23] P.M. Kroonenberg and J. De Leeuw: “Principal component analysis of three-mode data by means of alternating least squares algorithms”, Psychometrika, Vol. 45, (1980), pp. 69–97.
[24] Ch. Lubich: “On variational approximations in quantum molecular dynamics”, Math. Comp., Vol. 74, (2005), pp. 765–779.
[25] I.V. Oseledets, D.V. Savostianov and E.E. Tyrtyshnikov: “Tucker dimensionality reduction of three-dimensional arrays in linear time”, SIAM J. Matrix Anal. Appl., 2007 (to appear).
[26] N.D. Sidiropoulos and R. Bro: “On the uniqueness of multilinear decomposition of N-way arrays”, Journal of Chemometrics, Vol. 14, (2000), pp. 229–239.
[27] A. Stegeman and N.D. Sidiropoulos: “On Kruskal’s uniqueness condition for the Candecomp/Parafac decomposition”, Lin. Alg. Appl., Vol. 420, (2007), pp. 540–552.
[28] A. Smilde, R. Bro and P. Geladi: Multi-way Analysis, Wiley, 2004.
[29] G. Strang and G.J. Fix: An Analysis of the Finite Element Method, Prentice-Hall, Inc., N.J., 1973.
[30] V.N. Temlyakov: “Greedy Algorithms and M-Term Approximation with Regard to Redundant Dictionaries”, J. of Approx. Theory, Vol. 98, (1999), pp. 117–145.
[31] L.R. Tucker: “Some mathematical notes on three-mode factor analysis”, Psychometrika, Vol. 31, (1966), pp. 279–311.
[32] E.E. Tyrtyshnikov: “Tensor approximations of matrices generated by asymptotically smooth functions”, Sb. Math+., Vol. 194, (2003), pp. 941–954 (translated from Mat. Sb., Vol. 194, (2003), pp. 146–160).
[33] T. Zhang and G. Golub: “Rank-one approximation to high order tensors”, SIAM J. Matrix Anal. Appl., Vol. 23, (2001), pp. 534–550.
DOI: 10.2478/s11533-007-0013-5 Research article CEJM 5(3) 2007 551–580
Strengthened Moser’s conjecture, geometry of Grunsky coefficients and Fredholm eigenvalues∗ Samuel Krushkal Department of Mathematics, Bar-Ilan University, 52900 Ramat-Gan, Israel and Department of Mathematical Sciences, University of Memphis, Memphis, TN 38152, USA
Received 31 July 2006; accepted 3 April 2007
Abstract: The Grunsky and Teichmüller norms $\kappa(f)$ and $k(f)$ of a holomorphic univalent function $f$ in a finitely connected domain $D \ni \infty$ with quasiconformal extension to $\widehat{\mathbb{C}}$ are related by $\kappa(f) \le k(f)$. In 1985, Jürgen Moser conjectured that any univalent function in the disk $\Delta^* = \{z : |z| > 1\}$ can be approximated locally uniformly by functions with $\kappa(f) < k(f)$. This conjecture has been recently proved by R. Kühnau and the author. In this paper, we prove that approximation is possible in a stronger sense, namely, in the norm on the space of Schwarzian derivatives. Applications of this result to Fredholm eigenvalues are given. We also solve the old Kühnau problem on an exact lower bound in the inverse inequality estimating $k(f)$ by $\kappa(f)$, and in the related Ahlfors inequality.
© Versita Warsaw and Springer-Verlag Berlin Heidelberg. All rights reserved.
Keywords: quasiconformal, univalent function, Grunsky coefficient inequalities, universal Teichmüller space, subharmonic function, Strebel’s point, Kobayashi metric, generalized Gaussian curvature, holomorphic curvature, Fredholm eigenvalues.
MSC (2000): 30C35, 30C62, 32G15, 30F60, 32F45, 53A35
∗ To Reiner Kühnau on his 70th birthday
S. Krushkal / Central European Journal of Mathematics 5(3) 2007 551–580
1 Introduction and main results
1.1 Grunsky inequalities and Moser’s conjecture
The classical Grunsky theorem states that a holomorphic function $f(z) = z + \mathrm{const} + O(z^{-1})$ in a neighborhood $U_0$ of $z = \infty$ can be extended to a univalent holomorphic function on the disk
\[ \Delta^* = \{z \in \widehat{\mathbb{C}} = \mathbb{C} \cup \{\infty\} : |z| > 1\} \]
if and only if its Grunsky coefficients $\alpha_{mn}$ satisfy the inequalities
\[ \Big| \sum_{m,n=1}^{\infty} \sqrt{mn} \, \alpha_{mn} x_m x_n \Big| \le 1, \tag{1.1} \]
where $\alpha_{mn}$ are defined by
\[ \log \frac{f(z) - f(\zeta)}{z - \zeta} = - \sum_{m,n=1}^{\infty} \alpha_{mn} z^{-m} \zeta^{-n}, \quad (z, \zeta) \in (\Delta^*)^2, \tag{1.2} \]
$x = (x_n)$ runs over the unit sphere $S(l^2)$ of the Hilbert space $l^2$ with $\|x\|^2 = \sum_{1}^{\infty} |x_n|^2$, and the principal branch of the logarithmic function is chosen (cf. [1]). The quantity
\[ \kappa(f) := \sup \Big\{ \Big| \sum_{m,n=1}^{\infty} \sqrt{mn} \, \alpha_{mn} x_m x_n \Big| : x = (x_n) \in S(l^2) \Big\} \tag{1.3} \]
is called the Grunsky constant (or Grunsky norm) of $f$.
Let $\Sigma$ denote the collection of all univalent holomorphic functions
\[ f(z) = z + b_0 + b_1 z^{-1} + \cdots : \Delta^* \to \widehat{\mathbb{C}} \setminus \{0\}, \tag{1.4} \]
and let $\Sigma(k)$ be its subset of the functions with $k$-quasiconformal extensions to the unit disk $\Delta = \{|z| < 1\}$ so that $f(0) = 0$. Put $\Sigma^0 = \bigcup_k \Sigma(k)$. This collection closely relates to universal Teichmüller space $\mathbf{T}$ modelled as a bounded domain in the Banach space $\mathbf{B}$ of holomorphic functions in $\Delta^*$ with norm
\[ \|\varphi\|_{\mathbf{B}} = \sup_{\Delta^*} (|z|^2 - 1)^2 |\varphi(z)|. \tag{1.5} \]
All $\varphi \in \mathbf{B}$ can be regarded as the Schwarzian derivatives $S_f = (f''/f')' - (f''/f')^2/2$ of locally univalent holomorphic functions in $\Delta^*$. The points of $\mathbf{T}$ represent the functions $f \in \Sigma^0$ whose minimal dilatation
\[ k(f) := \inf \{ k(w^\mu) = \|\mu\|_\infty : w^\mu|\partial\Delta^* = f \} \]
determines the Teichmüller metric on $\mathbf{T}$. Here $w^\mu$ denotes a quasiconformal homeomorphism of $\widehat{\mathbb{C}}$ with the Beltrami coefficient
\[ \mu(z) = \partial_{\bar{z}} w / \partial_z w, \tag{1.6} \]
and
\[ \|\mu\|_\infty = \operatorname{ess\,sup}_{\mathbb{C}} |\mu(z)| < 1. \tag{1.7} \]
Grunsky’s theorem was strengthened for the functions with quasiconformal extensions by several authors. The basic results obtained by Kühnau, Pommerenke and Zhuravlev are as follows: for any $f \in \Sigma^0$, we have the inequality
\[ \kappa(f) \le k(f); \tag{1.8} \]
on the other hand, if a function $f \in \Sigma$ satisfies the inequality $\kappa(f) < k$ with some constant $k < 1$, then $f$ has a quasiconformal extension to $\widehat{\mathbb{C}}$ with a dilatation $k_1 = k_1(k) \ge k$ (see [2–4] and [5], pp. 82–84). An explicit bound $k_1(k)$ is given in [6] (see Section 5).
A crucial point is that for a generic function $f \in \Sigma^0$, we have in (1.8) the strict inequality
\[ \kappa(f) < k(f) \tag{1.9} \]
(see, e.g., [7–10]). The functions for which equality holds in (1.8) play an important role in applications of the Grunsky inequality technique. A characterization of such functions is presented below.
In 1985, Jürgen Moser conjectured that the set of functions with
\[ \kappa(f) = k(f) \tag{1.10} \]
is rather sparse in $\Sigma^0$, so that any function $f \in \Sigma$ is approximated by functions satisfying (1.9) uniformly on compact sets in $\Delta^*$. This conjecture was recently proved in [8].
A related conjecture, posed in [8] and still remaining open, states that $f \in \Sigma^0$ satisfying (1.9) cannot be the limit functions of locally uniformly convergent sequences $\{f_n\} \subset \Sigma^0$ with $\kappa(f_n) = k(f_n)$. This was proved in [11] under the assumption that the approximating maps $f_n$ are asymptotically conformal on the unit circle $S^1 = \partial\Delta^*$.
1.2 Main theorem
Uniform convergence on compact sets is natural and sufficient in many problems of geometric complex analysis. In applications of Schwarzian derivatives, especially to Teichmüller spaces, one has to use the strong topology defined by the norm (1.5). A question is: How sparse is the set of derivatives $\varphi = S_f$ in $\mathbf{T}$ representing the maps with the property (1.10)? Our first main result answers this question, strengthening Moser’s conjecture.
Theorem 1.1. The set of points $\varphi = S_f$, which represent the maps $f \in \Sigma^0$ with $\kappa(f) < k(f)$, is open and dense in the space $\mathbf{T}$.
Openness follows from continuity of both quantities $\kappa(f)$ and $k(f)$ as functions of the Schwarzian derivatives $S_f$ on $\mathbf{T}$ (cf. [12]). The main part of the proof concerns the density. The proof involves the density of Strebel points in $\mathbf{T}$ and relies on curvature properties of certain Finsler metrics on this space.
1.3 Application to Fredholm eigenvalues
The Fredholm eigenvalues $\rho_n$ of a smooth closed Jordan curve $L \subset \widehat{\mathbb{C}}$ are the eigenvalues of its double-layer potential, i.e., of the integral equation
\[ u(z) + \frac{\rho}{\pi} \int_L u(\zeta) \frac{\partial}{\partial n_\zeta} \log \frac{1}{|\zeta - z|} \, ds_\zeta = h(z), \]
which often appears in applications (see e.g. [5, 13–18] and the references cited there). These values are intrinsically connected with the Grunsky coefficients of the corresponding conformal maps. This is qualitatively expressed by the remarkable Kühnau-Schiffer theorem on reciprocity of $\kappa(f)$ to the least positive Fredholm eigenvalue $\rho_L$. It is defined for any oriented closed Jordan curve $L \subset \widehat{\mathbb{C}}$ by
\[ \frac{1}{\rho_L} = \sup \frac{|D_G(u) - D_{G^*}(u)|}{D_G(u) + D_{G^*}(u)}, \]
where $G$ and $G^*$ are, respectively, the interior and exterior of $L$; $D$ denotes the Dirichlet integral, and the supremum is taken over all functions $u$ continuous on $\widehat{\mathbb{C}}$ and harmonic on $G \cup G^*$ (cf. [10, 16]).
In general, there is only a rough estimate for $\rho_L$ by Ahlfors’ inequality
\[ \frac{1}{\rho_L} \le q_L, \tag{1.11} \]
where $q_L$ is the minimal dilatation of quasiconformal reflections across $L$ (that is, of the orientation reversing quasiconformal homeomorphisms of $\widehat{\mathbb{C}}$ preserving $L$ point-wise); see, e.g., [15, 19, 20]. As a consequence of Theorem 1.1, we have
Theorem 1.2. The set of quasiconformal curves $L$, for which Ahlfors’ inequality (1.11) is satisfied in the strict form $1/\rho_L < q_L$, is open and dense in the strongest topology determined by the norm (1.5).
Proof. Since all quantities in (1.11) are invariant under the action of the Möbius group $PSL(2, \mathbb{C})/\pm 1$, it suffices to exploit quasiconformal homeomorphisms $f$ of the sphere $\widehat{\mathbb{C}}$ carrying the unit circle $S^1 = \partial\Delta^*$ onto $L$ whose Beltrami coefficients $\mu_f(z) = \partial_{\bar{z}} f / \partial_z f$ have support in the unit disk $\Delta$ and which are normalized via (1.4), i.e., with restrictions $f|\Delta^* \in \Sigma^0$. Then the reflection coefficient $q_L$ equals the minimal dilatation $k(w^\mu) = \|\mu\|_\infty$ of quasiconformal extensions $w^\mu$ of $f|\Delta^*$ to $\widehat{\mathbb{C}}$, and Theorem 1.2 immediately follows from Theorem 1.1.
In the last section we solve Kühnau’s problem related to comparing the Teichmüller and Grunsky norms and establish the sharp lower bound in the inverse inequalities estimating $k(f)$ by $\kappa(f)$ or $\rho_L$. This important problem has been open since 1981.
2 Preliminaries
We briefly present here certain underlying results needed for the proof of Theorem 1.1. The exposition is adapted to our special case.
2.1 Frame maps and Strebel points
For a map $f^\mu \in \Sigma^0$, we denote by $[f^\mu]$ its equivalence class containing the maps $f^\nu \in \Sigma^0$ which coincide with $f^\mu$ on the unit circle $S^1$ (and hence on $\Delta^*$). Let $f_0 := f^{\mu_0}$ be an extremal representative of its class $[f_0]$ with dilatation
\[ k(f_0) = \|\mu_0\|_\infty = \inf \{ k(f^\mu) : f^\mu|S^1 = f_0|S^1 \} = k, \]
and assume that there exists in this class a quasiconformal map $f_1$ whose Beltrami coefficient $\mu_{f_1}$ satisfies the strong inequality $\operatorname{ess\,sup}_{A_r} |\mu_{f_1}(z)| < k$ in some annulus $A_r := \{z : r < |z| < 1\}$. Then $f_1$ is called a frame map for the class $[f_0]$ and the corresponding point of the space $\mathbf{T}$ is called a Strebel point. The following two results are fundamental in the theory of extremal quasiconformal maps and Teichmüller spaces.
Proposition 2.1. (cf. [21]) If a class $[f]$ has a frame map, then the extremal map $f_0$ in this class is unique and either conformal or a Teichmüller map with Beltrami coefficient $\mu_0 = k|\psi_0|/\psi_0$ on $\Delta$, defined by an integrable holomorphic quadratic differential $\psi_0$ on $\Delta$ and a constant $k \in (0, 1)$.
This holds, in particular, if the curves $f(S^1)$ are asymptotically conformal, which includes all smooth curves.
Proposition 2.2. (cf. [22]) The set of Strebel points is open and dense in $\mathbf{T}$.
The proof of this fact relies on the following lemma, which also will be used in the proof of Theorem 1.1.
Lemma 2.3. (cf. [22]) Suppose $f_0$ with Beltrami coefficient $\mu_0$ is extremal in its class. Fix a number $\epsilon$ between 0 and 1, and take an increasing sequence $\{r_n\}_1^\infty$ with $0 < r_n < 1$ approaching 1. Put
\[ \mu_n(z) = \begin{cases} \mu_0(z) & \text{if } |z| < r_n, \\ (1 - \epsilon) \mu_0(z) & \text{otherwise}, \end{cases} \tag{2.1} \]
and let $f_n$ be a quasiconformal map with Beltrami coefficient $\mu_n$. Then, for sufficiently large $n$, $f_n$ is a frame map for its class, and the dilatation $k_n$ of the extremal map in the class of $f_n$ approaches $k_0 = k(f_0)$.
These notions and results are extended in [22] to arbitrary Riemann surfaces; see also [36].
2.2 Basic Finsler metrics on universal Teichmüller space
The universal Teichmüller space $\mathbf{T}$ is the space of quasisymmetric homeomorphisms of the unit circle $S^1 = \partial\Delta$ factorized by Möbius maps. The canonical complex Banach structure on $\mathbf{T}$ is defined by factorization of the ball of Beltrami coefficients
\[ \mathrm{Belt}(\Delta)_1 = \{ \mu \in L_\infty(\mathbb{C}) : \mu|\Delta^* = 0, \ \|\mu\| < 1 \}, \]
letting $\mu, \nu \in \mathrm{Belt}(\Delta)_1$ be equivalent if the corresponding maps $w^\mu, w^\nu \in \Sigma^0$ coincide on $S^1$ (hence, on $\Delta^*$) and passing to Schwarzian derivatives $S_{f^\mu}$. The defining projection $\phi_{\mathbf{T}} : \mu \to S_{w^\mu}$ is a holomorphic map from $L_\infty(\Delta)$ to $\mathbf{B}$. The equivalence class of a map $w^\mu$ will be denoted by $[w^\mu]$.
An intrinsic complete metric on the space $\mathbf{T}$ is the Teichmüller metric defined by
\[ \tau_{\mathbf{T}}(\phi_{\mathbf{T}}(\mu), \phi_{\mathbf{T}}(\nu)) = \frac{1}{2} \inf \big\{ \log K\big( w^{\mu_*} \circ (w^{\nu_*})^{-1} \big) : \mu_* \in \phi_{\mathbf{T}}(\mu), \ \nu_* \in \phi_{\mathbf{T}}(\nu) \big\}. \tag{2.2} \]
It is generated by the Finsler structure on the tangent bundle $T(\mathbf{T}) = \mathbf{T} \times \mathbf{B}$ of $\mathbf{T}$ defined by
\[ F_{\mathbf{T}}(\phi_{\mathbf{T}}(\mu), \phi_{\mathbf{T}}'(\mu)\nu) = \inf \big\{ \|\nu_* (1 - |\mu|^2)^{-1}\|_\infty : \phi_{\mathbf{T}}'(\mu)\nu_* = \phi_{\mathbf{T}}'(\mu)\nu; \ \mu \in \mathrm{Belt}(\Delta)_1; \ \nu, \nu_* \in L_\infty(\mathbb{C}) \big\}. \tag{2.3} \]
The space $\mathbf{T}$ as a complex Banach manifold also has invariant metrics. Two of these (the largest and the smallest metrics) are of special interest. They are called the Kobayashi and the Carathéodory metrics, respectively, and are defined as follows.
The Kobayashi metric $d_{\mathbf{T}}$ on $\mathbf{T}$ is the largest pseudometric $d$ on $\mathbf{T}$ that does not get increased by holomorphic maps $h : \Delta \to \mathbf{T}$, so that for any two points $\psi_1, \psi_2 \in \mathbf{T}$, we have
\[ d_{\mathbf{T}}(\psi_1, \psi_2) \le \inf \{ d_\Delta(0, t) : h(0) = \psi_1, \ h(t) = \psi_2 \}, \]
where $d_\Delta$ is the hyperbolic Poincaré metric on $\Delta$ of Gaussian curvature $-4$, with the differential form
\[ ds = \lambda_{\mathrm{hyp}}(z)|dz| := |dz|/(1 - |z|^2). \tag{2.4} \]
The Carathéodory distance between $\psi_1$ and $\psi_2$ in $\mathbf{T}$ is
\[ c_{\mathbf{T}}(\psi_1, \psi_2) = \sup d_\Delta(h(\psi_1), h(\psi_2)), \]
where the supremum is taken over all holomorphic maps $h : \mathbf{T} \to \Delta$.
The corresponding differential (infinitesimal) forms of the Kobayashi and Carathéodory metrics are defined for the points $(\psi, v) \in T(\mathbf{T})$, respectively, by
\[ \begin{aligned} \mathcal{K}_{\mathbf{T}}(\psi, v) &= \inf \{ 1/r : r > 0, \ h \in \mathrm{Hol}(\Delta_r, \mathbf{T}), \ h(0) = \psi, \ dh(0) = v \}, \\ \mathcal{C}_{\mathbf{T}}(\psi, v) &= \sup \{ |df(\psi)v| : f \in \mathrm{Hol}(\mathbf{T}, \Delta), \ f(\psi) = 0 \}, \end{aligned} \tag{2.5} \]
S. Krushkal / Central European Journal of Mathematics 5(3) 2007 551–580
where $\mathrm{Hol}(X, Y)$ denotes the collection of holomorphic maps of a complex manifold $X$ into $Y$ and $\Delta_r$ is the disk $\{|z| < r\}$. For general properties of invariant metrics we refer to [23, 24].

Due to the fundamental Gardiner-Royden theorem, the Kobayashi metric on Teichmüller spaces is equal to the Teichmüller metric (see [22, 25–27]). We shall need the following strengthened version of this theorem for universal Teichmüller space given in [28].

Proposition 2.4. The differential Kobayashi metric $K_{\mathbf T}(\varphi, v)$ on the tangent bundle $T(\mathbf T)$ of the universal Teichmüller space $\mathbf T$ is logarithmically plurisubharmonic in $\varphi \in \mathbf T$, equals the canonical Finsler structure $F_{\mathbf T}(\varphi, v)$ on $T(\mathbf T)$ generating the Teichmüller metric of $\mathbf T$ and has constant holomorphic sectional curvature $\kappa_K(\varphi, v) = -4$ on $T(\mathbf T)$.

The generalized Gaussian curvature $\kappa_\lambda$ of an upper semicontinuous Finsler metric $ds = \lambda(t)|dt|$ in a domain $\Omega \subset \mathbb C$ is defined by
$$\kappa_\lambda(t) = -\frac{\Delta \log \lambda(t)}{\lambda(t)^2}, \tag{2.6}$$
where $\Delta$ is the generalized Laplacian
$$\Delta\lambda(t) = 4 \liminf_{r \to 0} \frac{1}{r^2} \Bigl\{ \frac{1}{2\pi} \int_0^{2\pi} \lambda(t + re^{i\theta})\, d\theta - \lambda(t) \Bigr\}$$
(provided that $-\infty \le \lambda(t) < \infty$). Similar to $C^2$ functions, for which $\Delta$ coincides with the usual Laplacian, one obtains that $\lambda$ is subharmonic on $\Omega$ if and only if $\Delta\lambda(t) \ge 0$; hence, at the points $t_0$ of local maxima of $\lambda$ with $\lambda(t_0) > -\infty$, we have $\Delta\lambda(t_0) \le 0$.

The sectional holomorphic curvature of a Finsler metric on a complex Banach manifold $X$ is defined in a similar way as the supremum of the curvatures (2.6) over appropriate collections of holomorphic maps from the disk into $X$ for a given tangent direction in the image. The holomorphic curvature of the Kobayashi metric $K(x, v)$ of any complete hyperbolic manifold $X$ satisfies $\kappa_{K_X} \ge -4$ at all points $(x, v)$ of the tangent bundle $T(X)$ of $X$, and for the Carathéodory metric $C_X$ we have $\kappa_C(x, v) \le -4$. For details and general properties of invariant metrics, we refer to [23, 24] (see also [28, 34]).
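As a sanity check on definition (2.6), the following minimal Python sketch approximates the generalized Laplacian by a circular mean at one small fixed radius (an assumption standing in for the lim inf) and evaluates the curvature of the hyperbolic metric (2.4). The function names are illustrative and do not come from the paper.

```python
import cmath
import math


def lam_hyp(z):
    """Density of the hyperbolic metric (2.4) on the unit disk."""
    return 1.0 / (1.0 - abs(z) ** 2)


def generalized_laplacian(u, t, r=1e-3, n=720):
    """Approximate 4/r^2 * (circular mean of u around t minus u(t)) at a small fixed r."""
    mean = sum(u(t + r * cmath.exp(2j * math.pi * k / n)) for k in range(n)) / n
    return 4.0 * (mean - u(t)) / r ** 2


def curvature(lam, t, r=1e-3):
    """Approximate the generalized Gaussian curvature (2.6) of lam(t)|dt| at the point t."""
    lap = generalized_laplacian(lambda z: math.log(lam(z)), t, r)
    return -lap / lam(t) ** 2


if __name__ == "__main__":
    for point in (0j, 0.3 + 0.2j, -0.5j):
        print(point, curvature(lam_hyp, point))
```

For a smooth metric the fixed-radius mean is only a finite-difference approximation, so the printed values should be close to, but not exactly, the constant curvature of the hyperbolic metric.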
2.3 Grunsky coefficients revised

An underlying fact in the applications of the Grunsky coefficients to Teichmüller space theory is that these coefficients regarded as the functions of Schwarzian derivatives $S_f$, which we will denote by
$$\alpha_{mn}(S_f) = \alpha_{mn}(f), \tag{2.7}$$
are, together with the Taylor coefficients of $f$, holomorphic on $\mathbf T$.
To characterize the functions $f \in \Sigma^0$ obeying the property (1.10), denote by $A_1(\Delta)$ the subspace of $L_1(\Delta)$ formed by holomorphic functions in $\Delta$, and consider the set
$$A_1^2 = \{\psi \in A_1(\Delta) : \psi = \omega^2, \ \omega \text{ holomorphic}\}$$
which consists of the integrable holomorphic functions on $\Delta$ having only zeros of even order. Put
$$\langle \mu, \psi \rangle_\Delta = \iint_\Delta \mu(z)\psi(z)\, dx\, dy, \quad \mu \in L_\infty(\Delta), \ \psi \in L_1(\Delta) \quad (z = x + iy).$$

Proposition 2.5. (cf. [7], and [20]) The equality (1.10) holds if and only if the function $f$ is the restriction to $\Delta^*$ of a quasiconformal self-map $w^{\mu_0}$ of $\widehat{\mathbb C}$ with Beltrami coefficient $\mu_0$ satisfying the condition
$$\sup |\langle \mu_0, \varphi \rangle_\Delta| = \|\mu_0\|_\infty, \tag{2.8}$$
where the supremum is taken over holomorphic functions $\varphi \in A_1^2(\Delta)$ with $\|\varphi\|_{A_1(\Delta)} = 1$. If, in addition, the class $[f]$ contains a frame map (is a Strebel point), then $\mu_0$ is of the form
$$\mu_0(z) = \|\mu_0\|_\infty |\psi_0(z)|/\psi_0(z) \quad \text{with } \psi_0 \in A_1^2 \text{ in } \Delta. \tag{2.9}$$

Geometrically the condition (2.8) means that the Carathéodory metric on the holomorphic extremal disk $\{\phi_{\mathbf T}(t\mu_0/\|\mu_0\|) : t \in \Delta\}$ in $\mathbf T$ coincides with the Teichmüller metric of this space. For analytic curves $f(S^1)$ the equality (2.9) was obtained by a different method in [10].
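To see why a coefficient of the form (2.9) gives equality in (2.8), the following short computation may help. It is a sketch that uses only the pairing defined above; the normalized function $\varphi_0 = \psi_0/\|\psi_0\|_{A_1(\Delta)}$ is a name introduced here for illustration.

```latex
\langle \mu_0, \varphi_0 \rangle_\Delta
  = \frac{\|\mu_0\|_\infty}{\|\psi_0\|_{A_1(\Delta)}}
    \iint_\Delta \frac{|\psi_0(z)|}{\psi_0(z)}\, \psi_0(z)\, dx\, dy
  = \frac{\|\mu_0\|_\infty}{\|\psi_0\|_{A_1(\Delta)}}
    \iint_\Delta |\psi_0(z)|\, dx\, dy
  = \|\mu_0\|_\infty .
```

Since $|\langle \mu_0, \varphi \rangle_\Delta| \le \|\mu_0\|_\infty \|\varphi\|_{A_1(\Delta)}$ for every $\varphi$, the supremum in (2.8) is attained at $\varphi_0$ in this case.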
2.4 Generalized Grunsky coefficients

The proof of Theorem 1.1 involves generic holomorphic disks in $\mathbf T$ and a new Finsler structure on $\mathbf T$ determined by generalized Grunsky coefficients.

The method of Grunsky inequalities is extended to bordered Riemann surfaces $X$ with a finite number of boundary components, in particular, to multiply connected domains on the complex plane (cf. [1, 29–31]). However, unlike the case of functions univalent in the disk, a quasiconformal variant of this theory has not been developed so far.

In the general case, the generating function (1.2) must be replaced by a bilinear differential
$$-\log\frac{f(z) - f(\zeta)}{z - \zeta} - R_X(z, \zeta) = \sum_{m,n=1}^{\infty} \beta_{mn}\, \varphi_m(z)\varphi_n(\zeta) : X \times X \to \mathbb C, \tag{2.10}$$
where the surface kernel $R_X(z, \zeta)$ relates to the conformal map $j_\theta(z, \zeta)$ of $X$ onto the sphere $\widehat{\mathbb C}$ slit along arcs of logarithmic spirals inclined at the angle $\theta \in [0, \pi)$ to a ray issuing from the origin so that $j_\theta(\zeta, \zeta) = 0$ and
$$j_\theta(z) = z - z_\theta + \mathrm{const} + O(1/(z - z_\theta)) \quad \text{as } z \to z_\theta = j_\theta^{-1}(\infty)$$
(in fact, only the maps $j_0$ and $j_{\pi/2}$ are applied). Here $\{\varphi_n\}_1^\infty$ is a canonical system of holomorphic functions on $X$ such that (in a local parameter)
$$\varphi_n(z) = \frac{a_{n,n}}{z^n} + \frac{a_{n+1,n}}{z^{n+1}} + \dots \quad \text{with } a_{n,n} > 0, \quad n = 1, 2, \dots,$$
and the derivatives (linear holomorphic differentials) $\varphi_n'$ form a complete orthonormal system in $H^2(X)$.

We shall deal only with simply connected domains $X \ni \infty$ with quasiconformal boundaries. For any such domain, the kernel $R_X$ vanishes identically on $X \times X$, and the expansion (2.10) assumes the form
$$-\log\frac{f(z) - f(\zeta)}{z - \zeta} = \sum_{m,n=1}^{\infty} \frac{\alpha_{mn}}{f(z)^m f(\zeta)^n}, \tag{2.11}$$
where $f$ denotes a conformal map of $X$ onto the disk $\Delta^*$ so that $f(\infty) = \infty$, $f'(\infty) > 0$, and $\alpha_{mn} = \beta_{mn}/\sqrt{mn}$ are the normalized generalized Grunsky coefficients. These coefficients also depend holomorphically on Schwarzian derivatives $S_f$.

A theorem of Milin extending the Grunsky univalence criterion to multiply connected domains $X$ states that a holomorphic function $f(z) = z + \mathrm{const} + O(z^{-1})$ in a neighborhood of the infinite point $z = \infty$ can be continued to a univalent function in the whole domain $X$ if and only if the coefficients $\beta_{mn}$ in (2.10) satisfy the inequality
$$\Bigl| \sum_{m,n=1}^{\infty} \beta_{mn}\, x_m x_n \Bigr| \le \|x\|^2 \tag{2.12}$$
for any point $x = (x_n) \in S(l^2)$ (see [30]). Accordingly, we have the generalized Grunsky constant
$$\kappa_X(f) = \sup\Bigl\{ \Bigl| \sum_{m,n=1}^{\infty} \beta_{mn}\, x_m x_n \Bigr| : x = (x_n) \in S(l^2) \Bigr\} \tag{2.13}$$
which coincides with (1.3) for $X = \Delta^*$.
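As a hedged illustration of (2.11) and (2.13) in the simplest case $X = \Delta^*$, consider the map $f(z) = z + t/z$ with $|t| < 1$, chosen here only as an example. Assuming the expansion is taken in powers of $1/z$ and $1/\zeta$ (the identity being the conformal map of $\Delta^*$ onto itself), the coefficients can be read off directly:

```latex
f(z) - f(\zeta) = (z - \zeta)\Bigl(1 - \frac{t}{z\zeta}\Bigr), \qquad
-\log\frac{f(z) - f(\zeta)}{z - \zeta}
  = -\log\Bigl(1 - \frac{t}{z\zeta}\Bigr)
  = \sum_{n=1}^{\infty} \frac{t^n}{n}\, \frac{1}{z^n \zeta^n},
\\[4pt]
\alpha_{nn} = \frac{t^n}{n}, \quad \alpha_{mn} = 0 \ (m \ne n), \qquad
\beta_{nn} = \sqrt{n \cdot n}\; \alpha_{nn} = t^n, \qquad
\sup_{x \in S(l^2)} \Bigl| \sum_{n=1}^{\infty} t^n x_n^2 \Bigr| = |t| .
```

The supremum is attained at $x = (1, 0, 0, \dots)$, so under these assumptions the Grunsky constant of this map equals $|t|$.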
3 Proof of Theorem 1.1

Denote
$$G_e := \{\varphi = S_f \in \mathbf T : f \text{ satisfies (1.10)}\}, \qquad G_i := \{\varphi = S_f \in \mathbf T \text{ with } \kappa(f) < k(f)\} = \mathbf T \setminus G_e.$$
In view of continuity of both functions
$$\kappa(S_f) := \kappa(f), \qquad k(S_f) := k(f)$$
on $\mathbf T$, we need to establish only that each point $\varphi^* = S_{f^*} \in G_e$ is the limit point of a sequence $\{\varphi_n\} \subset G_i$.
We may assume that $\varphi^*$ is a Strebel point. Then its class $[f^*]$ contains a Teichmüller extremal map with Beltrami coefficient
$$\mu^*(z) = \begin{cases} k^* |\psi^*(z)|/\psi^*(z), & |z| < 1, \\ 0, & |z| > 1, \end{cases}$$
determined by a holomorphic quadratic differential $\psi^*$ which has in the unit disk $\Delta$ only zeros of even order. By Proposition 2.5, the maps $f^{s\mu^*}$ with $|s| < 1/\|\mu^*\|_\infty$ also satisfy the equality (1.10), thus we can assume that
$$\|\varphi^*\|_\infty < 2. \tag{3.1}$$

We fix $r \in (0, 1)$ and define a family of Beltrami coefficients $\mu_t = \mu(\cdot, t)$ depending on a complex parameter $t$, letting
$$\mu(z, t) = \begin{cases} \mu^*(z), & |z| < r, \\ \bigl(1 - \frac{1}{1+t}\bigr)\mu^*(z), & r < |z| < 1, \end{cases}$$
and $\mu(z, t) = 0$ for $|z| > 1$. The admissible values of $t$ are those for which $|\mu(z, t)| < 1$. This inequality holds, provided $t$ ranges over the disk
$$\Delta_a^* = \{t \in \widehat{\mathbb C} : |t + a| > R(a)\} \quad \text{with } a = a(k^*) = 1/[1 - (k^*)^2] > 1, \quad R(a) = \sqrt{a(a - 1)}. \tag{3.2}$$
Note that this disk contains the half-plane $H_0 := \{t : \operatorname{Re} t \ge -1/2\}$, for which we have the inequality $|\mu(z, t)| \le k^*$. For $t = \infty$ and $t = 0$, we have, respectively, $\mu_\infty = \mu^*$ and
$$\mu_0(z) = \begin{cases} \mu^*(z) & \text{if } |z| < r, \\ 0 & \text{otherwise}. \end{cases}$$

Then either inequality (1.9) holds for all points in a punctured neighborhood $\{M < |t| < \infty\}$ or there is a disk $U^* = \{t \in \Delta_a^* : |t| > M\}$ consisting of points with the equality (1.10). The first case is trivial, thus we consider a more general situation, when the disk (3.2) contains a sequence $\{t_n\}$ going to infinity and such that
$$\lim_{n \to \infty} \|\mu(\cdot, t_n) - \mu(\cdot, \infty)\|_\infty = 0; \qquad \kappa(f^{\mu(\cdot, t_n)}) = k(f^{\mu(\cdot, t_n)}), \quad n = 1, 2, \dots. \tag{3.3}$$
We claim that the assumption (3.3) yields that the equality
$$\kappa(f^{\mu_t}) = k(f^{\mu_t}) \tag{3.4}$$
holds for all $t \in \Delta_a^*$.

Every class $[f^{\mu(\cdot, t)}]$ contains a unique extremal Teichmüller map, thus the images of extremal Beltrami coefficients
$$\varphi_t = \phi_{\mathbf T}(\mu_t), \quad t \in \Delta_a^*, \tag{3.5}$$
run over a holomorphically embedded disk in $\mathbf T$; denote this disk by $\Omega_a$. To establish (3.4), we construct on $\Omega_a$ a Finsler metric with generalized Gaussian curvature of at most $-4$ and compare it with the Kobayashi metric. The underlying fact is that Grunsky coefficients $\alpha_{mn}(S_f)$ generate for each $x = (x_n) \in S(l^2)$ a holomorphic map
$$h_x(\varphi) = \sum_{m,n=1}^{\infty} \sqrt{mn}\, \alpha_{mn}(\varphi)\, x_m x_n : \mathbf T \to \Delta. \tag{3.6}$$

Consider in the tangent bundle $T(\mathbf T) = \mathbf T \times \mathbf B$ the holomorphic disks $\Delta_a^*(v)$ covering the disk $\phi_{\mathbf T}(\Delta_a^*)$ in $\mathbf T$. Their points are pairs $(\varphi, v)$, where $v = \phi_{\mathbf T}'[\varphi]\mu \in \mathbf B$ is a tangent vector to $\mathbf T$ at the point $\varphi$, and $\mu$ runs over the ball
$$\mathrm{Belt}(D_\varphi)_1 = \{\mu \in L_\infty(\mathbb C) : \mu|D_\varphi^* = 0, \ \|\mu\|_\infty < 1\}.$$
Here $D_\varphi$ and $D_\varphi^*$ denote the images of $\Delta$ and $\Delta^*$ under $f = f_\varphi \in \Sigma^0$ with $S_f = \varphi$.

To get the maps $\Delta \to \mathbf T$ preserving the origin, we transform the functions (3.6) by the chain rule for Beltrami coefficients $w^\nu = w^{\sigma(\nu)} \circ (f^{\nu_0})^{-1}$ with
$$\sigma(\nu) \circ f^{\nu_0} = \frac{\nu - \nu_0}{1 - \overline{\nu_0}\, \nu}\, \frac{\partial_z f^{\nu_0}}{\overline{\partial_z f^{\nu_0}}},$$
preceded by the Möbius map
$$t \mapsto \tau = \frac{t - a(k)}{R(k)} \tag{3.7}$$
and by an appropriate self-map of $\Delta$ chosen so that $\varphi^*$ is obtained as the image of the origin of $\Delta$. Denote the composed maps by $g_x[\sigma_\varphi]$ and apply them to pulling back the hyperbolic metric (2.4) onto the disks in $T(\mathbf T)$ covering the disk
$$D_\sigma := \{\phi_{\mathbf T}(\mu_t) : t \in \Delta_a^*\} \subset \mathbf T.$$
Then we obtain on covering disks the conformal subharmonic metrics $ds = \lambda_{g_x[\sigma_\varphi]}(t)|dt|$ with
$$\lambda_{g_x[\sigma_\varphi]} = g_x[\sigma_\varphi]^*(\lambda_{\mathrm{hyp}}) = \frac{|g_x'[\sigma_\varphi]|}{1 - |g_x[\sigma_\varphi]|^2}, \tag{3.8}$$
having Gaussian curvature $-4$ at noncritical points. Consider the upper envelope of these metrics
$$\tilde\lambda_\kappa(t) = \sup \lambda_{g_x[\sigma_\varphi]}(t),$$
taking the supremum over all $x \in S(l^2)$ and all $\sigma_\varphi \in \mathrm{Belt}(\Delta)_1$, and take its upper semicontinuous regularization
$$\lambda_\kappa(t) = \limsup_{t' \to t} \tilde\lambda_\kappa(t'). \tag{3.9}$$
The last metric actually depends only on the points $\varphi \in \mathbf T$ and therefore it descends from the covering disks $\Delta_a^*(v)$ to an upper semicontinuous metric on the underlying disk $\Delta_a^*$. Denote the last metric by $\lambda_\kappa$. Applying in a straightforward way the arguments exploited in [32], one obtains:

Lemma 3.1. (a) The metric $\lambda_\kappa$ is a logarithmically subharmonic Finsler metric on $\Omega_a$;
(b) In terms of the parameter $\tau$ related to $t \in \Delta_a^*$ by (3.7), with $\tau(t^*) = 0$, we have the equality
$$\lambda_\kappa(\tau) = \kappa(\tau\varphi^*) + o(\tau) \quad \text{as } \tau \to 0, \tag{3.10}$$
where $\kappa(\tau\varphi^*)$ denotes the Grunsky constant of the map $f \in \Sigma^0$ with $S_f|\Delta^* = \tau\varphi^*$.

Another important property of this metric is given by the following lemma.

Lemma 3.2. The generalized Gaussian curvature of $\lambda_\kappa$ satisfies $\kappa_{\lambda_\kappa} \le -4$.

The last inequality is equivalent to the following one
$$\Delta \log \lambda_\kappa \ge 4\lambda_\kappa^2, \quad \text{or} \quad \Delta u_\kappa \ge 4e^{2u_\kappa},$$
where $u_\kappa = \log \lambda_\kappa$. Here $\Delta$ again means the generalized Laplacian.

Proof of Lemma 3.2 (cf. [23, 32]). Take a maximizing sequence of (renormalized) functions (3.6) for $\kappa(\varphi^*)$, so that
$$\lim_{p \to \infty} h_{x^{(p)}}(\varphi^*) = \kappa(\varphi^*),$$
and construct the corresponding maps $g_{x^{(p)}}[\sigma_\varphi]$. Restrict these maps to the disk $\Omega_a$ and apply again the parameter $\tau$ ranging over $\Delta$. The above restrictions converge uniformly on compact subsets in $\Delta$ to a holomorphic map $g_0(\tau) : \Delta \to \Delta$ such that $g_0(0) = \kappa(\varphi^*)$. This map determines on $\Omega_a$ the conformal metric
$$\lambda_{g_0}(\tau) = \frac{|g_0'(\tau)|}{1 - |g_0(\tau)|^2}$$
of constant curvature $-4$ at its noncritical points; in view of (3.10) and the definition of $\lambda_\kappa$, it is a supporting metric for $\lambda_\kappa$ at $\tau = 0$ (i.e., $\lambda_{g_0}(0) = \lambda_\kappa(0)$ and $\lambda_{g_0}(\tau) \le \lambda_\kappa(\tau)$ near $\tau = 0$). Hence, $\log(\lambda_{g_0}/\lambda_\kappa)$ has a local maximum at $0$, and therefore,
$$\Delta \log \frac{\lambda_{g_0}}{\lambda_\kappa}(0) = \Delta \log \lambda_{g_0}(0) - \Delta \log \lambda_\kappa(0) \le 0,$$
which yields
$$-\frac{\Delta \log \lambda_\kappa(0)}{\lambda_\kappa(0)^2} \le -\frac{\Delta \log \lambda_{g_0}(0)}{\lambda_{g_0}(0)^2}, \tag{3.11}$$
and the desired inequality $\kappa_{\lambda_\kappa} \le -4$ on $\Omega_a$ follows.
Let us compare now $\lambda_\kappa$ with the infinitesimal Kobayashi metric $\lambda_K$ of $\mathbf T$ restricted to $\Omega_a$ which is logarithmically subharmonic and has generalized Gaussian curvature $-4$ on this disk.

For a fixed $\varphi_0 \in \mathbf T$, we set
$$H_x(\varphi; \varphi_0) = \frac{h_x(\varphi) - h_x(\varphi_0)}{1 - \overline{h_x(\varphi_0)}\, h_x(\varphi)}$$
and define similarly to (2.3) the Finsler structure $F_\kappa(\varphi_0, v)$ on $T(\mathbf T)$ by
$$F_\kappa(\varphi_0, v) = \sup\{|dH_x(\varphi_0; \varphi_0)v| : x \in S(l^2)\}. \tag{3.12}$$
It is dominated by the canonical Finsler structure (2.3). The structure (3.12) allows us to construct in a standard way on embedded holomorphic disks $\gamma(\Delta)$ the Finsler metrics $\lambda_\gamma(t)\, ds$ by
$$\lambda_\gamma(t) = F_\kappa(\gamma(t), \gamma'(t)),$$
and, accordingly, the corresponding distances
$$d_\gamma(\varphi_1, \varphi_2) = \inf_\beta \int_\beta \lambda_\gamma(t)\, ds_t, \tag{3.13}$$
taking the infimum over $C^1$ smooth curves $\beta : [0, 1] \to \mathbf T$ joining the points $\varphi_1$ and $\varphi_2$.

Lemma 3.3. On any extremal Teichmüller disk $\Delta(\mu_0) = \{\phi_{\mathbf T}(t\mu_0) : t \in \Delta\}$ (and its isometric images in $\mathbf T$), we have the equality
$$\tanh^{-1}[\kappa(S_{f^{r\mu_0}})] = \int_0^r \lambda_\kappa(t)\, dt. \tag{3.14}$$
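Proof. Put $f_0 = f^{\mu_0}$ and consider the covering maps $\hat h_x(\mu) = h_x \circ \phi_{\mathbf T} : \mathrm{Belt}(\Delta)_1 \to \Delta$ of (3.6) for $\varphi \in \Delta(\mu_0)$. For any appropriate $h_x$, we have the equalities
$$\tanh^{-1}[\hat h_x(\varrho)] = \int_0^{\hat h_x(\varrho)} \frac{|dt|}{1 - |t|^2} = \int_0^{\varrho} \lambda_{h_x}(t)|dt|, \tag{3.15}$$
and therefore,
$$\tanh^{-1}[\kappa(f_0)] = \sup_x \int_0^{\varrho} \lambda_{h_x}(t)|dt| = \int_0^{\varrho} \sup_x \lambda_{h_x}(t)|dt|. \tag{3.16}$$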
The second equality in (3.16) is obtained by taking a sequence $\{x_n\} \subset S(l^2)$ and the corresponding monotone increasing sequence of metrics
$$\lambda_1 = \lambda_{h_{x_1}}, \quad \lambda_2 = \max(\lambda_{h_{x_1}}, \lambda_{h_{x_2}}), \quad \lambda_3 = \max(\lambda_{h_{x_1}}, \lambda_{h_{x_2}}, \lambda_{h_{x_3}}), \ \dots$$
so that
$$\lim_{n \to \infty} \lambda_n(t) = \sup_x \lambda_{h_x}(t).$$
Using the continuity of $\kappa(S_f)$ on $\mathbf T$ and the properties of quasiconformal maps, one can show that $\lambda_\kappa = \lambda_{f_0}$ is upper semicontinuous on $\Delta$ (cf. [39]). Thus the equality (3.14) follows from (3.9) and (3.16), and the lemma follows.

The last step in the proof of Lemma 3.3 can be simplified. Indeed, since the upper semicontinuous regularization of $\sup_x \lambda_{h_x}$ can decrease the function, we get from (3.15)
$$\int_0^{\varrho} \lambda_{\kappa(f_0)}(t)|dt| \le \tanh^{-1}[\kappa(f_0)].$$
But for every $h_x$, we have $\lambda_{h_x}(t) \le \lambda_{\kappa(f_0)}(t)$, which yields the opposite inequality.

Now, taking into account that the disk $\Omega_a$ touches at the points $\varphi^*$ and $\varphi_n = \phi_{\mathbf T}(\mu(\cdot, t_n))$ the Teichmüller disks $\Delta(\mu^*)$ and $\Delta(\mu_n)$ and that the metric $\lambda_\kappa$ does not depend on the tangent unit vectors whose initial points are the points of $\Omega_a$, one obtains from Lemma 3.3 and (1.8) that this metric relates to the Kobayashi metric $\lambda_K|\Omega_a$ as follows
$$\lambda_\kappa(0) = \lambda_{\kappa^*}(\varphi^*) = \lambda_K(0), \quad \lambda_\kappa(t_n) = \lambda_{\kappa_n}(\varphi_n) = \lambda_K(t_n); \qquad \lambda_\kappa(t) \le \lambda_K(t) \ \text{for all } t \in \Omega_a \setminus \{0, t_n\}, \tag{3.17}$$
which means that $\lambda_\kappa$ is a supporting metric for $\lambda_K|\Omega_a$ at $t = 0$ and $t = t_n$, $n = 1, 2, \dots$.

A more subtle comparison of these metrics is obtained by applying Minda's maximum principle:

Lemma 3.4. (cf. [33]) If a function $u : \Omega \to [-\infty, +\infty)$ is upper semicontinuous in a domain $\Omega \subset \mathbb C$ and its generalized Laplacian satisfies the inequality $\Delta u(z) \ge Ku(z)$ with some positive constant $K$ at any point $z \in \Omega$, where $u(z) > -\infty$, and if
$$\limsup_{z \to \zeta} u(z) \le 0 \quad \text{for all } \zeta \in \partial\Omega,$$
then either $u(z) < 0$ for all $z \in \Omega$ or else $u(z) = 0$ for all $z \in \Omega$.

For a sufficiently small neighborhood $U_0$ of the origin $t = 0$, we put $M = \sup\{\lambda_K(t) : t \in U_0\}$; then in this neighborhood, $\lambda_K(t) + \lambda_\kappa(t) \le 2M$. Consider the function
$$u = \log\frac{\lambda_\kappa}{\lambda_K}.$$
Then (cf. [23, 33]) for $t \in U_0$,
$$\Delta u(t) = \Delta \log \lambda_\kappa(t) - \Delta \log \lambda_K(t) \ge 4(\lambda_\kappa^2 - \lambda_K^2) \ge 8M(\lambda_\kappa - \lambda_K).$$
The elementary estimate
$$M \log(t/s) \ge t - s \quad \text{for } 0 < s \le t < M$$
(with equality only for $t = s$) implies that
$$M \log\frac{\lambda_K(t)}{\lambda_\kappa(t)} \ge \lambda_K(t) - \lambda_\kappa(t),$$
and hence,
$$\Delta u(t) \ge 4M^2 u(t).$$
Applying Lemma 3.4, we obtain that, in view of (3.17), both metrics $\lambda_\kappa$ and $\lambda_K$ must be equal in the entire disk $\Omega_a$, which is equivalent to the desired equality (3.4).

In fact, the above arguments give more, namely, that
$$\kappa(\varphi) = k(\varphi) \tag{3.18}$$
on the connected component $\Pi_0$ of the intersection $\Pi = \{t\varphi^* : t \in \mathbb C\} \cap \mathbf T$ containing the origin of $\mathbf T$ (which is simply connected by Zhuravlev's theorem, see [4, 5]). In particular, the last equality holds also at $-\varphi^*$.

The same arguments work for intersections of $\mathbf T$ with arbitrary complex lines passing through $\varphi^*$, which yields that (3.18) must hold for all points of a ball $B(\varphi^*, \delta)$ centered at $\varphi^*$. Moving this ball from the point $\varphi^*$ along the segment $[-\varphi^*, \varphi^*] \subset \Pi_0$, one derives that the equality (3.18) must hold for all points of a ball centered at the origin $\varphi = 0$. But the latter is impossible, since it contradicts the existence of points $\varphi = S_f$ in a neighborhood of $0$ at which $\kappa(f) < k(f)$. This contradiction proves the theorem.

A straightforward modification of the above arguments yields the following result.

Theorem 3.5. Let $\Omega = h(\Delta)$ be a holomorphically embedded disk in $\mathbf T$. Assume that there exists a sequence $\{\varphi_n\}$ of points of $\Omega$ convergent to $\varphi_0 \in \Omega$ so that
$$\kappa(\varphi_n) = k(\varphi_n), \quad n = 0, 1, \dots.$$
Then $\kappa(\varphi) = k(\varphi)$ for all $\varphi \in \Omega$.
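The elementary estimate $M \log(t/s) \ge t - s$ for $0 < s \le t < M$, used in the comparison argument of this section, is easy to probe numerically. The sketch below is only an illustration on a finite grid, not a proof; the function names and the default value of $M$ are assumptions made for the example.

```python
import math


def estimate_gap(s, t, M):
    """Return M*log(t/s) - (t - s), expected to be nonnegative when 0 < s <= t < M."""
    return M * math.log(t / s) - (t - s)


def min_gap_on_grid(M=2.0, n=200):
    """Smallest value of the gap over the grid points s = M*i/n <= t = M*j/n < M."""
    worst = float("inf")
    for i in range(1, n):
        s = M * i / n
        for j in range(i, n):
            t = M * j / n
            worst = min(worst, estimate_gap(s, t, M))
    return worst


if __name__ == "__main__":
    print(min_gap_on_grid())           # smallest gap, reached on the diagonal s == t
    print(estimate_gap(0.5, 1.5, 2.0))  # a strictly positive gap away from the diagonal
```

The gap vanishes only on the diagonal $s = t$, in line with the equality case stated with the estimate.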
4 Grunsky's norm of frame maps

The following theorem is a corollary of the proof of Theorem 1.1.

Theorem 4.1. If a function $f \in \Sigma^0$ has the property $\kappa(f) = k(f)$, then its frame maps $f^{\mu_r}$, $r \in (0, 1)$, with $\mu_r$ given by (2.1), either all have this property or none of them does.

Proof. The proof follows in the same manner as the first stage of the proof of Theorem 1.1. Assuming that we have for some $r \in (0, 1)$ the equality $\kappa(f^{\mu_r}) = k(f^{\mu_r})$, we can consider again the same disk $\Omega_a \subset \mathbf T$ and, applying the above arguments to the pair $(\varphi^*, \varphi_r = S_{f^{\mu_r}})$, obtain that $\kappa(\varphi) = k(\varphi)$ for all $\varphi \in \Omega_a$.
5 Inversion of Grunsky and Ahlfors inequalities

5.1 Sharp lower estimate

We now turn to the open problem on the sharp lower estimates in the inequalities inverse to (1.8) and (1.11). Such estimates are important for example in algorithms for finding the exact or approximate values of Fredholm eigenvalues and reflection coefficients of curves (cf. e.g. [6, 14–17, 19, 20, 39]). We provide a new approach which involves the conformal metrics of negative integral curvature and certain geometric features of the universal Teichmüller space.

As mentioned in Section 1, there is an explicit bound $k_1(k)$ for dilatations of quasiconformal extensions of $f \in \Sigma$ with $\kappa(f) \le k$ found in [40]. It is given by
$$K_1 := \frac{1 + k_1}{1 - k_1} \le \max\Bigl\{ \lambda\Bigl(\frac{1 + \kappa}{1 - \kappa}\Bigr)^{3/2}, \ 2\lambda\Bigl(\frac{1 + \kappa}{1 - \kappa}\Bigr) - 1 \Bigr\}, \tag{5.1}$$
where $\lambda(K) = \max |w(1)|$, taking the maximum among all $K$-quasiconformal automorphisms $w$ of $\widehat{\mathbb C}$ with the fixed points $-1, 0, \infty$. The distortion function $\lambda$ can be represented by elliptic integrals (see [41], p. 15).

For small $\kappa < 1/3$, there is a somewhat better estimate
$$k < 3\kappa, \tag{5.2}$$
(see [6, 15]). Neither of the bounds (5.1) and (5.2) is sharp.

The following theorem yields a sharp bound for an individual function and improves Proposition 2.5. We shall use here the following notations. For an element $\mu \in \mathrm{Belt}(\Delta)_1$ we define
$$\mu^*(z) = \frac{\mu(z)}{\|\mu\|_\infty}$$
so that $\|\mu^*\|_\infty = 1$. The extremal Beltrami coefficient in the class $[f]$ will be denoted by $\mu_0(z; f)$. For a measurable real valued function $u$ on the disk $\Delta$ which is locally bounded from above we define its circular mean $Mu$ by
$$Mu(r) = \frac{1}{2\pi} \int_0^{2\pi} u(re^{i\theta})\, d\theta. \tag{5.3}$$
By Jensen's inequality, for any convex function $\omega$ on an interval $I \subset \mathbb R$ containing the values of both $u$ and $Mu$, we have $\omega(Mu) \le M\omega(u)$. We will apply this inequality for $\omega(u) = e^u$ and $\omega(u) = \tanh^{-1} u$. The mean (5.3) inherits certain important properties of its original function. For example, if $u$ is subharmonic on $\Delta$, then so is $Mu$.

Theorem 5.1. For every function $f \in \Sigma^0$, we have the sharp bound
$$k(f) \le \frac{\kappa(f)}{\alpha(f)} = \frac{1}{\alpha(f)\rho(f)}, \tag{5.4}$$
where
$$\alpha(f) = \inf_{\mu_0^f \in [f]} \ \sup_{\psi \in A_1^2, \ \|\psi\|_{A_1} = 1} \Bigl| \iint_\Delta \mu_0^*(z; f)\psi(z)\, dx\, dy \Bigr| > 0. \tag{5.5}$$
If $f$ has a unique extremal extension $f^{\mu_0}$, then
$$k(f^{\mu_0}) \le \frac{1}{\alpha(f^{\mu_0})} \min_{|t| = 1} \kappa(f^{t\mu_0}), \tag{5.6}$$
with
$$\alpha(f^{\mu_0}) = \sup_{\psi \in A_1^2, \ \|\psi\|_{A_1} = 1} \Bigl| \iint_\Delta \mu_0^*(z; f)\psi(z)\, dx\, dy \Bigr| > 0. \tag{5.7}$$

Note that (5.6) is a simple consequence of (5.4), because $k(f^{t\mu_0}) = \|\mu_0\|_\infty$ for all $t \in S^1$. This is true, in particular, for Strebel points. On uniqueness of extremal maps with nonconstant dilatations see [35].

The quantity $1 - \alpha(f)$ can be regarded as a measure of deviation of the Grunsky structure $F_\kappa$ defined above from the canonical Finsler structure $F_{\mathbf T}$ on $T(\mathbf T)$. For any $f$ satisfying (1.10), $\alpha(f) = 1$. The applications of this theorem will be given in the next section.
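The Jensen inequality $\omega(Mu) \le M\omega(u)$ for the circular mean (5.3) can be illustrated with a small numerical experiment. The Python sketch below discretizes the mean with equally spaced angles and uses a sample function `u_example` with values in $[0.1, 0.9]$; both the discretization and the sample function are assumptions made for this illustration and are not part of the paper.

```python
import cmath
import math


def circular_mean(u, r, n=360):
    """Discrete version of the circular mean (5.3) over n equally spaced angles."""
    return sum(u(r * cmath.exp(2j * math.pi * k / n)) for k in range(n)) / n


def jensen_gap(u, omega, r):
    """Return M[omega(u)](r) - omega(M[u](r)), nonnegative for convex omega."""
    return circular_mean(lambda z: omega(u(z)), r) - omega(circular_mean(u, r))


def u_example(z):
    """Hypothetical sample function: 0.5 + 0.4*cos(arg z), with values in [0.1, 0.9]."""
    return 0.5 + 0.4 * (z.real / abs(z))


if __name__ == "__main__":
    print(jensen_gap(u_example, math.exp, 0.5))    # gap for omega(u) = e^u
    print(jensen_gap(u_example, math.atanh, 0.5))  # gap for omega(u) = atanh(u) on [0, 1)
```

Both choices of $\omega$ are the ones named in the text; $\tanh^{-1}$ is convex only on $[0, 1)$, which is why the sample function is kept inside that interval.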
5.2 Preliminaries: some other generalizations of Gaussian curvature, and circularly symmetric metrics

The proof of Theorem 5.1 involves certain known results on conformal metrics $ds = \lambda(z)|dz|$ on the disk $\Delta$ with $\lambda(z) \ge 0$ (called also semi-metrics) of negative integral curvature bounded from above. We already dealt with nonsmooth conformal Finsler metrics satisfying the inequality
$$\Delta \log \lambda \ge K\lambda^2, \tag{5.8}$$
where $\Delta$ meant a generalization of the Laplacian $4\partial\bar\partial$.
Another generalization of this condition is due to Ahlfors [13]. A conformal metric $\lambda|dz|$ in a domain $G$ on $\mathbb C$ (or on a Riemann surface) has curvature less than or equal to $K$ in the supporting sense if for each $K' > K$ and each $z_0$ with $\lambda(z_0) > 0$, there is a $C^2$-smooth supporting metric $\tilde\lambda$ for $\lambda$ at $z_0$ (i.e., such that $\tilde\lambda(z_0) = \lambda(z_0)$ and $\tilde\lambda(z) \le \lambda(z)$ in a neighborhood of $z_0$) with $\kappa_{\tilde\lambda}(z_0) \le K'$ (cf. [37]).

There are also integral generalizations of the inequality (5.8) (see e.g. [42, 43]). We shall use its generalization in the potential sense due to [43] and say that $\lambda$ has curvature at most $K$ in the potential sense at $z_0$ if there is a disk $U$ about $z_0$ in which the function
$$\log \lambda + \mathrm{Pot}_U(\lambda^2),$$
where $\mathrm{Pot}_U$ denotes the logarithmic potential
$$\mathrm{Pot}_U h = \frac{1}{2\pi} \iint_U h(\zeta) \log|\zeta - z|\, d\xi\, d\eta \quad (\zeta = \xi + i\eta),$$
is subharmonic. One can replace $U$ by any open subset $V \subset U$, because the function $\mathrm{Pot}_U(\lambda^2) - \mathrm{Pot}_V(\lambda^2)$ is harmonic on $U$. Note that having curvature at most $K$ in the potential sense is equivalent to $\lambda$ satisfying (5.8) in the sense of distributions.

The following three preliminary lemmas were proven in [43].

Lemma 5.2. If a conformal metric has curvature at most $K$ in the supporting sense, then it has curvature at most $K$ in the potential sense.

The following lemma concerns the circularly symmetric (or radial) metrics which are the functions of $r = |z|$. Its proof involves the fact that for the disk $\Delta$ the potential $\mathrm{Pot}_\Delta$ commutes with the average (5.3).

Lemma 5.3. If a circularly symmetric conformal metric $\lambda(|z|)|dz|$ in the unit disk has curvature at most $-4$ in the potential sense, then
$$\lambda(r) \ge \frac{a}{1 - a^2 r^2}, \tag{5.9}$$
where $a = \lambda(0)$.

The right-hand side of (5.9) defines a supporting conformal metric for $\lambda$ at the origin with constant Gaussian curvature $-4$ on the whole disk $\Delta$.

Lemma 5.4. Let $\lambda|dz|$ be a conformal metric on the unit disk which has curvature at most $-4$ in the potential sense. Then the metric $\tilde\lambda = e^{Mu}$, where $u = \log \lambda$, also has curvature at most $-4$ in the potential sense.
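The claim that the right-hand side of (5.9) has constant curvature $-4$ can be checked by a direct computation. The sketch below assumes the radial form of the ordinary Laplacian, $\Delta u = \frac{1}{r}(r u')'$, and writes $\lambda_a$ (a name introduced here for illustration) for the metric in question.

```latex
\lambda_a(r) = \frac{a}{1 - a^2 r^2}, \qquad
u(r) = \log \lambda_a(r) = \log a - \log(1 - a^2 r^2),
\\[4pt]
u'(r) = \frac{2 a^2 r}{1 - a^2 r^2}, \qquad
\Delta u = \frac{1}{r}\bigl(r\, u'(r)\bigr)'
  = \frac{1}{r} \cdot \frac{4 a^2 r}{(1 - a^2 r^2)^2}
  = 4 \lambda_a(r)^2,
\\[4pt]
\kappa_{\lambda_a} = -\frac{\Delta \log \lambda_a}{\lambda_a^2} = -4 .
```

For $a = 1$ this is the hyperbolic metric (2.4) itself.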
5.3 Proof of Theorem 5.1

The case $\kappa(f) = k(f)$ is trivial, thus we have to deal only with the maps for which $\kappa(f) < k(f)$.
Consider first the case when the class $[f]$ of a given map $f$ is a Strebel point and thus $[f]$ has a unique extremal extension $f^{\mu_0}$. Then on the extremal disk
$$\Delta(\mu_0^*) = \{\phi_{\mathbf T}(t\mu_0^*) : t \in \Delta\},$$
the infinitesimal Kobayashi metric $\lambda_K$ of $\mathbf T$ is equal to the hyperbolic metric (2.4). We can assume also that $k(f) = \|\mu_0\|_\infty$ is sufficiently small so that $\kappa(f) < k(f) < \alpha(f)$.

Put $\varphi_t = S_{f^{t\mu_0^*}}$ for any $t \in \Delta$ and construct the holomorphic maps
$$\tilde h_x(t) := h_x(\varphi_t) = \sum_{m,n=1}^{\infty} \sqrt{mn}\, \alpha_{mn}(\varphi_t)\, x_m x_n : \Delta \to \Delta, \tag{5.10}$$
taking again $x = (x_n) \in S(l^2)$; then $\sup\{|\tilde h_x(t)| : x \in S(l^2)\} = \kappa(f^{t\mu_0^*})$.

These maps determine on $\Delta(\mu_0^*)$ the corresponding conformal metrics by pulling back the hyperbolic metric (2.4):
$$\lambda_{\tilde h_x}(t) := \tilde h_x^*(\lambda_{\mathrm{hyp}}) = \frac{|\tilde h_x'(t)|}{1 - |\tilde h_x(t)|^2}.$$
Now take the upper envelope of these metrics $\tilde\lambda_\kappa(t) = \sup\{\lambda_{\tilde h_x}(t) : x \in S(l^2)\}$ and its upper semicontinuous regularization
$$\lambda_\kappa(t) = \limsup_{t' \to t} \tilde\lambda_\kappa(t'). \tag{5.11}$$
Similarly to the metric implied in the proof of Theorem 1.1, $\lambda_\kappa$ is logarithmically subharmonic on $\Delta$ and has the generalized Gaussian curvature and curvature in the supporting sense both less than or equal to $-4$. By Lemma 5.2, its curvature in the potential sense is also at most $-4$, and by Lemma 5.4 its mean $M[\lambda_\kappa](|t|)$ is a circularly symmetric metric with curvature also at most $-4$ in the potential sense.

To calculate the value $M[\lambda_\kappa](0) = \lambda_\kappa(0)$, we apply a variation of $f^\mu \in \Sigma^0$. For small $\|\mu\|_\infty$, we have
$$f^\mu(z) = z - \frac{1}{\pi} \iint_{|\zeta| < 1} \frac{\partial_{\bar\zeta} f^\mu(\zeta)}{\zeta - z}\, d\xi\, d\eta = z - \frac{1}{\pi} \iint_{|\zeta| < 1} \frac{\mu(\zeta)}{\zeta - z}\, d\xi\, d\eta + O(\|\mu\|_\infty^2) = z + \sum_{0}^{\infty} \frac{b_n}{z^n} \tag{5.12}$$
with
$$b_n = \frac{1}{\pi} \iint_{|\zeta| < 1} \mu(\zeta)\zeta^{n-1}\, d\xi\, d\eta + O(\|\mu\|_\infty^2), \quad \|\mu\|_\infty \to 0. \tag{5.13}$$
The bound of the remainder in (5.12) is uniform on each disk $\Delta_R = \{|z| \le R < \infty\}$.
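The relations (1.2) and (5.13) yield the equality
$$\alpha_{mn}(\phi_{\mathbf T}(\mu)) = -\pi^{-1} \iint_\Delta \mu(z) z^{m+n-2}\, dx\, dy + O(\|\mu\|_\infty^2), \quad \|\mu\|_\infty \to 0. \tag{5.14}$$
Note also that applying Parseval's equality to holomorphic functions $\omega(z) = \sum_0^\infty c_k z^k \in L_2(\Delta)$ yields for their squares $\psi = \omega^2 \in A_1^2(\Delta)$ the representation
$$\psi(z) = \frac{1}{\pi} \sum_{m+n=2}^{\infty} \sqrt{mn}\, x_m x_n z^{m+n-2}, \tag{5.15}$$
with $x = (x_n) \in l^2$, $\|x\| = \|\omega\|_{L_2}$. We take $\|x\| = 1$ to get $\|\psi\|_{A_1} = 1$.

We lift the maps (5.10) from the disk $\Delta(\mu_0^*)$ onto its covering disk $\tilde\Delta(\mu_0^*) := \{t\mu_0^* : |t| < 1\}$ in the ball $\mathrm{Belt}(\Delta)_1$, getting the maps $\hat h_x = h_x \circ \phi_{\mathbf T}|\tilde\Delta(\mu_0^*)$ whose differential at zero is given by
$$d\hat h_x(0)\mu_0^* = -\frac{1}{\pi} \iint_\Delta \mu_0^*(z) \sum_{m+n=2}^{\infty} \sqrt{mn}\, x_m x_n z^{m+n-2}\, dx\, dy.$$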
Comparison with (1.3), (5.14) and (5.15) yields
$$M[\lambda_\kappa](0) = \lambda_\kappa(0) = \alpha(f_0), \quad f_0 := f^{\mu_0}, \tag{5.16}$$
where $\alpha(f_0)$ is represented by (5.7). Then by Lemma 5.3,
$$M[\lambda_\kappa](r) \ge \frac{\alpha(f_0)}{1 - \alpha(f_0)^2 r^2}. \tag{5.17}$$
We claim that
$$\alpha(f_0) > 0. \tag{5.18}$$
Indeed, for small $|t|$, we have from (5.12)
$$f^{t\mu_0^*}(z) = z - \frac{t}{\pi} \langle \mu_0^*, \omega_z \rangle_\Delta + O(t^2),$$
where
$$\omega_z(\zeta) = 1/(\zeta - z), \quad \zeta \in \Delta, \ z \in \mathbb C.$$
For any $z \in \Delta^*$, the function $\omega_z$ does not vanish in $\Delta$ and thus belongs to $A_1^2$. If $\alpha(f^{\mu_0}) = 0$, we get $\langle \mu_0^*, \omega_z \rangle_\Delta = 0$ for all $z \in \Delta^*$, which yields that $\mu_0$ is orthogonal to all powers $\zeta^n$, $n = 0, 1, \dots$, and therefore to all integrable holomorphic functions in $\Delta$ (in other words, the functions $\omega_z$ with $z \in \Delta^*$ span the whole space $A_1(\Delta)$). This means that $\mu_0$ is a locally trivial Beltrami coefficient, which is impossible for extremal Beltrami coefficients (see e.g. [22, 38]), and (5.18) follows.
Now, integrating both sides of (5.17) over a radial segment $[0, \varrho]$ with $\varrho = \|\mu_0\|_\infty$, one obtains
$$\int_0^{\varrho} M[\lambda_\kappa](r)\, dr \ge \tanh^{-1}[\alpha(f^{\mu_0})\varrho] = \tanh^{-1}[\alpha(f^{\mu_0})k(f^{\mu_0})]. \tag{5.19}$$
Applying Lemma 3.3 yields
$$\int_0^{\varrho} \lambda_{\kappa(f^{\mu_0})}(t)|dt| = \tanh^{-1}[\kappa(f^{\mu_0})];$$
thus after averaging $\lambda_\kappa(re^{i\theta})$,
$$\int_0^{\varrho} M[\lambda_\kappa](r)\, dr = \frac{1}{2\pi} \int_0^{\varrho} \int_0^{2\pi} \lambda_\kappa(re^{i\theta})\, d\theta\, dr = \frac{1}{2\pi} \int_0^{2\pi} \int_0^{\varrho} \lambda_\kappa(re^{i\theta})\, dr\, d\theta = \tanh^{-1}[\kappa(f^{\mu_0})]. \tag{5.20}$$
Comparison of (5.19) and (5.20) yields the inequality (5.6) and proves the theorem for Strebel points.

Now the left-hand inequality for arbitrary $f \in \Sigma^0$ is a simple corollary of (5.6), and the second equality in (5.4) follows then from the Kühnau-Schiffer theorem. We need to be sure that the quantity (5.5) is positive. This will be shown in the next section, but it follows also from (5.1) or (5.2). Theorem 5.1 is proved completely.
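The integration step behind (5.19) uses the closed form $\int_0^{\varrho} a/(1 - a^2 r^2)\, dr = \tanh^{-1}(a\varrho)$ for the supporting metric of Lemma 5.3. The Python sketch below checks this with a plain midpoint rule; the quadrature, the function names and the sample values of `a` and `rho` are assumptions made for illustration only.

```python
import math


def supporting_metric(a, r):
    """Radial metric a / (1 - a^2 r^2) from the right-hand side of (5.9)."""
    return a / (1.0 - (a * r) ** 2)


def integrate_midpoint(f, lo, hi, n=20000):
    """Composite midpoint rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def lower_bound_integral(a, rho):
    """Numerical value of the integral of the supporting metric over [0, rho]."""
    return integrate_midpoint(lambda r: supporting_metric(a, r), 0.0, rho)


if __name__ == "__main__":
    a, rho = 0.9, 0.5  # hypothetical sample values with a * rho < 1
    print(lower_bound_integral(a, rho), math.atanh(a * rho))
```

Because $\tanh^{-1}$ is increasing, comparing the two integrals in (5.19) and (5.20) then reduces to comparing $\alpha(f^{\mu_0}) k(f^{\mu_0})$ with $\kappa(f^{\mu_0})$.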
5.4 Geometric picture

A geometric meaning of the quantity (5.7) is that this value equals the supremum of $L_\infty$-lengths of projections of the unit Teichmüller tangent vector $\varphi_0 = \phi_{\mathbf T}'(0)\mu_0^*$ to the disk $\Delta(\mu_0^*)$ at the origin onto the elements of the distinguished set $A_1^2 \cap \{\psi \in A_1(\Delta), \ \|\psi\| = 1\}$. Therefore, $\alpha(f)$ does not depend on the choice of extremal $\mu_0^*$ in a class $[f]$. More precisely, one can write
$$\alpha(f) = \sup\{|\langle \nu_{\varphi_0}, \psi \rangle_\Delta| : \psi \in A_1^2, \ \|\psi\|_{A_1(\Delta)} = 1\},$$
with
$$\nu_{\varphi_0}(z) = \frac{1}{2}(1 - |z|^2)^2 \varphi_0(1/\bar z)\, 1/\bar z^4, \tag{5.21}$$
noting that, for sufficiently small $|t|$, the Schwarzian derivative $\varphi_0(t) := S_{f^{t\mu_0^*}}$ determines by (5.21) the harmonic Beltrami coefficient of the Ahlfors-Weill extension of the map $f^{t\mu_0^*}$ across the unit circle $S^1$.
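In view of the characteristic property of extremal Beltrami differentials, we have for any such $\mu_0^*$ the equality
$$\nu_{\varphi_0} = \mu_0^* + \sigma_0, \quad \sigma_0 \in A_1(\Delta)^\perp,$$
where
$$A_1(\Delta)^\perp = \{\nu \in \mathrm{Belt}(\Delta)_1 : \langle \nu, \psi \rangle_\Delta = 0 \ \text{for all } \psi \in A_1(\Delta)\} = \ker \phi_{\mathbf T}'(0)$$
is the set of so-called locally trivial Beltrami coefficients (see e.g. [22, 38]).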
5.5 Generalization

The arguments applied in the first step of the proof of Theorem 5.1 can be extended in a straightforward manner to any hyperbolic simply connected domain $D$ containing the infinite point and bounded by a quasicircle. Denote by $\Sigma^0(D)$ the class of univalent holomorphic functions in $D$ with expansion $f(z) = z + \mathrm{const} + O(1/z)$ near $z = \infty$ and having quasiconformal extension across the boundary of $D$. The sets $A_1(D)$ and $A_1^2(D)$ are defined similarly to the case of the unit disk. Then we obtain

Theorem 5.5. Let $D$ be a quasidisk containing $z = \infty$. Then for every function $f \in \Sigma^0(D)$ having a unique extremal quasiconformal extension to the complementary domain $\widehat{\mathbb C} \setminus D$, there is a sharp bound
$$k(f) \le \frac{\kappa_D(f)}{\alpha_D(f)},$$
where $\kappa_D(f)$ is the generalized Grunsky constant (2.13) for $\Sigma^0(D)$, and
$$\alpha_D(f) = \sup_{\psi \in A_1^2(D), \ \|\psi\|_{A_1(D)} = 1} \Bigl| \iint_D \mu_0^*(z; f)\psi(z)\, dx\, dy \Bigr|.$$
5.6 Question

Is the function $\alpha(S_f) = \alpha(f)$ plurisubharmonic on $\mathbf T$? Were the answer affirmative, we would get several interesting consequences.
6 Comparing the Teichmüller and Grunsky dilatations: Kühnau's problem

6.1 Sharp lower bound

Theorem 5.1 is rich in applications. It allows us to solve an old problem of Kühnau related to comparing the Teichmüller and Grunsky dilatations. A case in point is the map
$$f_{3,t}(z) = z\Bigl(1 + \frac{t}{z^3}\Bigr)^{2/3} \in \Sigma(|t|), \quad 0 \le |t| < 1, \tag{6.1}$$
whose extremal extension to $\Delta$ is
$$\hat f_{3,t}(z) = z\Bigl(1 + t\,\frac{|z|^3}{z^3}\Bigr)^{2/3}$$
with Beltrami coefficient $\mu_3(z) := \mu_{f_{3,t}}(z) = t|z|/z$. In polar coordinates, $\mu_3(re^{i\theta}) = te^{-i\theta}$. The map (6.1) has threefold rotational symmetry
$$f(e^{2n\pi i/3} z) = e^{2n\pi i/3} f(z), \quad n = 0, 1, 2.$$
Note that $f_{3,t}(z) = f_t(z^{3/2})^{2/3}$, where
$$f_t(z) = \begin{cases} z + t/z, & |z| \ge 1, \\ z + t\bar z, & |z| < 1. \end{cases}$$

In 1981, Kühnau [9], applying the technique of Fredholm eigenvalues, discovered that the map (6.1) satisfies $\kappa(f_{3,t}) < k(f_{3,t})$. This was the first (though the simplest) example of maps not obeying the property (1.10) which actually gave rise to a new development.

Kühnau posed in [9, 15] the question of finding the sharp bound in the inequality (5.2). A strengthened problem, stated later in [40], asked if the function (6.1) has the following important feature:

Problem. Is the exact bound in (5.2) attained by the function (6.1)?

The following theorem solves this problem affirmatively and improves the estimates (5.1) and (5.2).

Theorem 6.1. For $f \in \Sigma^0$ we have the bound
$$k(f) \le \frac{3}{2\sqrt 2}\, \kappa(f) = 1.06\ldots\, \kappa(f) \tag{6.2}$$
(and accordingly for Fredholm eigenvalues) which is asymptotically sharp as $\kappa \to 0$, with equality for the map (6.1).

In view of the density of Strebel points and continuity of $k(f)$ and $\kappa(f)$ on $\mathbf T$, in order to prove Theorem 6.1, it suffices to establish the estimate (6.2) for Strebel points (classes) $[f]$. We know that each such class $[f]$ contains a unique extremal Teichmüller map $f^{\mu_0}$ with Beltrami coefficient of the form
$$\mu_0(z) = k|\varphi_0(z)|/\varphi_0(z), \quad \varphi_0 \in A_1(\Delta) \setminus \{0\}. \tag{6.3}$$
We will again use the normalized Beltrami coefficients $\mu_0^*(z; f) = |\varphi_0(z)|/\varphi_0(z)$ and denote $\mu_3^*(z) = |z|/z$.
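The first-order variation (5.13) can be compared with the map (6.1) directly: expanding $z(1 + t/z^3)^{2/3}$ in powers of $t$ gives $z + \tfrac{2}{3} t z^{-2} + O(t^2)$, so the coefficient $b_2$ should equal $2t/3$ to first order, with $b_1$ and $b_3$ vanishing. The Python sketch below evaluates the integral in (5.13) for $\mu_3(z) = t|z|/z$ with a midpoint rule in polar coordinates; the quadrature and the helper names are assumptions made for this illustration.

```python
import cmath
import math


def b_coefficient(mu, n, nr=400, ntheta=64):
    """Midpoint-rule value of (1/pi) * integral over |zeta| < 1 of mu(zeta) * zeta**(n-1)."""
    total = 0j
    hr = 1.0 / nr
    ht = 2.0 * math.pi / ntheta
    for i in range(nr):
        r = (i + 0.5) * hr
        for j in range(ntheta):
            zeta = r * cmath.exp(1j * (j + 0.5) * ht)
            # area element in polar coordinates: r dr dtheta
            total += mu(zeta) * zeta ** (n - 1) * r * hr * ht
    return total / math.pi


def mu3(t):
    """Beltrami coefficient t*|z|/z of the extension of the map (6.1)."""
    return lambda z: t * abs(z) / z


if __name__ == "__main__":
    t = 0.1
    for n in (1, 2, 3):
        print(n, b_coefficient(mu3(t), n))
```

Only the linear term in $t$ is tested here; the $O(\|\mu\|_\infty^2)$ remainder in (5.13) is not modelled.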
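It follows from Theorem 5.1 that the least admissible factor in the estimate $k(f) \le M\kappa(f)$ for all $f \in \Sigma^0$ is given by
$$\frac{1}{M} = \inf_{\mu_0^*} \ \sup_{\psi \in A_1^2, \ \|\psi\|_{A_1} = 1} |\langle \mu_0^*, \psi \rangle_\Delta|. \tag{6.4}$$
We precede the proof of Theorem 6.1 by two lemmas.

Lemma 6.2. For $f = f_{3,t}$, we have
$$\sup_{\psi \in A_1^2, \ \|\psi\|_{A_1} = 1} |\langle \mu_3^*, \psi \rangle_\Delta| = \max_{\psi_3} |\langle \mu_3^*, \psi_3 \rangle_\Delta| = \frac{2\sqrt 2}{3}, \tag{6.5}$$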
where ψ3 are the squares of nonconstant linear functions ψ3 (z) = ω1 (z)2 := (a0 + a1 z)2 with a0 = 0, a1 = 0, and ψ3 A1 = 1. Proof. For any 2
ψ(z) = ω(z) :=
∞
ak z
k
2
= a0 + 2a0 a1 z + (a21 + 2a0 a2 )z 2 + · · · ∈ A21
0
with nonzero a0 and a1 , we have |z| ∗ [a0 + 2a0 a1 z + (a21 + 2a0 a2 )z 2 + . . . ]dxdy μ3 , ψΔ = z Δ
1 2π 4π a0 a1 . [a0 e−iθ + 2a0 a1 r + (a21 + 2a0 a2 )reiθ + . . . ]dθrdr = = 3 0
(6.6)
0
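Noting that, by Parseval's equality for the orthonormal system $\{\sqrt{(n+1)/\pi}\,z^n\}_0^\infty$,
$$\|\psi\|_{A_1(\Delta)} = \|\omega\|^2_{L_2(\Delta)} = \pi \sum_{n=0}^{\infty} \frac{|a_n|^2}{n+1} = 1,$$
one derives from (6.5) and (6.6) the equality
$$\sup_{\psi \in A_1^2,\ \|\psi\|_{A_1} = 1} |\langle \mu_3^*, \psi \rangle_\Delta| = \frac{4\pi}{3} \sup_{|a_0|^2 + |a_1|^2/2 \le 1/\pi} |a_0 a_1|.$$
The function $F(a_0, a_1) = a_0^2 a_1^2$ is holomorphic on the closed ball
$$B_\pi = \Big\{ a = \Big( a_0, \frac{a_1}{\sqrt{2}} \Big) \in \mathbb{C}^2 :\ |a_0|^2 + \frac{|a_1|^2}{2} \le \frac{1}{\pi} \Big\}, \tag{6.7}$$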
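thus $|a_0 a_1|^2$ attains its maximal value on the boundary sphere $S_\pi = \{|a| = 1/\sqrt{\pi}\}$. This yields
$$\sup_{\psi \in A_1^2,\ \|\psi\|_{A_1} = 1} |\langle \mu_3^*, \psi \rangle_\Delta|^2 = \frac{16}{9}\,\pi^2 \max_{S_\pi} |a_0 a_1|^2 = \frac{16}{9}\,\pi^2 \max_{|a_1|} |a_1|^2 \Big( \frac{1}{\pi} - \frac{|a_1|^2}{2} \Big) = \frac{8}{9},$$
and this maximum is attained at the points $a = (a_0, a_1/\sqrt{2})$ with $|a_0| = 1/\sqrt{2\pi}$, $|a_1| = 1/\sqrt{\pi}$. If in (6.6) at least one of $a_0$, $a_1$ vanishes, then $\langle \mu_3^*, \psi \rangle_\Delta = 0$, which shows that such $\psi$ make no contribution to the supremum in (6.5). The lemma is proved.

The next lemma is a slight extension of the previous one to the maps
$$f_{m+2,t}(z) = f_t(z^{(m+2)/2})^{2/(m+2)}, \qquad m = 3, 5, 7, \dots,$$
whose normalized Beltrami coefficients in $\Delta$ are $\mu^*_{m+2}(z) = |z|^m / z^m$.

Lemma 6.3. For $f = f_{m+2,t}$ with $m = 3, 5, \dots$, we have the estimate
$$\alpha(f_{m+2,t}) = \sup_{\psi \in A_1^2,\ \|\psi\|_{A_1} = 1} |\langle \mu^*_{m+2}, \psi \rangle_\Delta| \ge \max_{\psi_m} |\langle \mu^*_{m+2}, \psi_m \rangle_\Delta| \ge \frac{\sqrt{(m+1)(m+3)}}{m+2} > \frac{2\sqrt{2}}{3}, \tag{6.8}$$
where $\psi_m$ are the polynomials
$$\psi_m(z) = \omega_m(z)^2 := \big( a_{(m-1)/2}\, z^{(m-1)/2} + a_{(m+1)/2}\, z^{(m+1)/2} \big)^2$$
with $\|\psi_m\|_{A_1(\Delta)} = \|\omega_m\|^2_{L_2(\Delta)} = 1$, that is,
$$\frac{|a_{(m-1)/2}|^2}{m+1} + \frac{|a_{(m+1)/2}|^2}{m+3} = \frac{1}{2\pi}.$$

Proof. Now we have to estimate
$$\langle \mu^*_{m+2}, \psi_m \rangle_\Delta = \iint_\Delta \frac{|z|^m}{z^m} \big[ a^2_{(m-1)/2}\, z^{m-1} + 2 a_{(m-1)/2}\, a_{(m+1)/2}\, z^m + a^2_{(m+1)/2}\, z^{m+1} \big]\,dx\,dy = \frac{4\pi}{m+2}\, a_{(m-1)/2}\, a_{(m+1)/2}$$
for $(a_{(m-1)/2}, a_{(m+1)/2})$ satisfying
$$\frac{|a_{(m-1)/2}|^2}{m+1} + \frac{|a_{(m+1)/2}|^2}{m+3} = \frac{1}{2\pi}.$$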
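The same arguments as in the proof of Lemma 6.2 yield the first inequality in (6.8). The maximum of $|\langle \mu^*_{m+2}, \psi_m \rangle_\Delta|$ is attained when
$$|a_{(m-1)/2}| = \frac{1}{2} \sqrt{\frac{m+1}{\pi}}, \qquad |a_{(m+1)/2}| = \frac{1}{2} \sqrt{\frac{m+3}{\pi}}.$$
Thereafter, one easily verifies by induction that the function
$$g(m) = \frac{(m+1)(m+3)}{(m+2)^2}, \qquad m = 1, 3, 5, \dots,$$
is strictly monotone increasing with $m$ (indeed, $g(m) = 1 - (m+2)^{-2}$), which yields the second inequality in (6.8).

Proof of Theorem 6.1. It remains to compare the values $\langle \mu_0^*, \psi \rangle_\Delta$ for an arbitrary extremal coefficient (6.3) with $\langle \mu^*_{m+2}, \psi \rangle_\Delta$. Without loss of generality, we can restrict ourselves to coefficients $\mu_0$ whose defining quadratic differentials $\varphi_0$ are of the form
$$\varphi_0(z) = c_m z^m + c_{m+1} z^{m+1} + \dots, \qquad m \ge 1 \text{ odd},\ c_m \ne 0,$$
that is, having a zero of odd order at the origin, because for a zero at $z_0 \ne 0$ one can take
$$\gamma^* \varphi_0 = (\varphi_0 \circ \gamma)(\gamma')^2, \qquad \gamma^* \psi = (\psi \circ \gamma)(\gamma')^2,$$
with $\gamma z = (z - z_0)/(1 - \bar{z}_0 z)$, and this transform preserves the value of $|\langle |\varphi_0|/\varphi_0, \psi \rangle_\Delta|$. Then
$$\mu_0^*(z; f_0) = \frac{|\varphi_0(z)|}{\varphi_0(z)} = c\,\frac{|z|^m}{z^m}\, \frac{\big| 1 + \frac{c_{m+1}}{c_m} z + \dots \big|}{1 + \frac{c_{m+1}}{c_m} z + \dots} =: c\,\mu^*_{m+2} + \tilde{\mu},$$
where $c = |c_m|/c_m$ and the remainder $\tilde{\mu}$ is of the form
$$\tilde{\mu}(re^{i\theta}) = {\sum_{n=-\infty}^{\infty}}{\vphantom{\sum}}' \; C_n(r)\, e^{in\theta}. \tag{6.9}$$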
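The notation $\sum'$ means that this series does not contain the term $C_{-m}(r) e^{-im\theta}$. If the coefficient $C_m(r)$ in (6.9) is real and distinct from zero, then we take the polynomial $\psi_m$ with
$$a_{(m-1)/2} = \frac{1}{2} \sqrt{\frac{m+1}{\pi}}, \qquad a_{(m+1)/2} = \frac{i}{2c} \sqrt{\frac{m+3}{\pi}},$$
getting
$$\kappa(f^{\mu_0}) \ge |\langle \mu_0^*, \psi_m \rangle_\Delta| = |c \langle \mu^*_{m+2}, \psi_m \rangle_\Delta + \langle \tilde{\mu}, \psi_m \rangle_\Delta| = \Big| i\, \frac{\sqrt{(m+1)(m+3)}}{m+2} + A_m \Big| > \frac{\sqrt{(m+1)(m+3)}}{m+2} > \kappa(f_3),$$
where
$$A_m = 2\pi \int_0^1 C_m(r)\, r\, dr$$
is also real and distinct from zero, and $f_3 = f_{3,t}$. If $C_m(r) = i\beta(r)$, $\beta(r) \in \mathbb{R} \setminus \{0\}$, we take $\psi_m$ with coefficients
$$a_{(m-1)/2} = \frac{1}{2} \sqrt{\frac{m+1}{\pi}}, \qquad a_{(m+1)/2} = \frac{1}{2c} \sqrt{\frac{m+3}{\pi}}$$
and obtain in a similar way the inequality $\kappa(f^{\mu_0}) > \kappa(f_3)$. If $C_m(r) = \alpha(r) + i\beta(r)$ with $\alpha(r) \ne 0$, $\beta(r) \ne 0$, we take $\psi_m$ with
$$a_{(m-1)/2} = \frac{1}{2} \sqrt{\frac{m+1}{\pi}}, \qquad a_{(m+1)/2} = \frac{\epsilon}{2c} \sqrt{\frac{m+3}{\pi}}$$
and appropriate $\epsilon \in S^1$ so that
$$\Big| \epsilon\, \frac{\sqrt{(m+1)(m+3)}}{m+2} + \alpha(r) + i\beta(r) \Big| > \frac{\sqrt{(m+1)(m+3)}}{m+2}.$$
In this case we again obtain $\kappa(f^{\mu_0}) > \kappa(f_3)$. Finally, in the case $C_m(r) \equiv 0$ we have for each $\psi_m$ the equality $|\langle \mu_0^*, \psi_m \rangle_\Delta| = |\langle \mu^*_{m+2}, \psi_m \rangle_\Delta|$, and $\kappa(f^{\mu_0}) \ge \kappa(f_3)$. Together with (6.4) this yields the assertion of Theorem 6.1.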
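As observed by Kühnau, we have for the map (6.1), letting $k = k(f)$, $\kappa = \kappa(f)$, by [15], formula (21) (where $r^2 = k$),
$$k \sqrt{1 + \frac{k^2}{9}} \le \frac{3\kappa}{2\sqrt{2}} =: \kappa^*,$$
or equivalently,
$$k^2 \le 3 \sqrt{(\kappa^*)^2 + \frac{9}{4}} - \frac{9}{2} = \frac{3(\kappa^*)^2}{\sqrt{(\kappa^*)^2 + \frac{9}{4}} + \frac{3}{2}};$$
hence,
$$k \le \frac{\kappa^*}{\sqrt{\sqrt{\frac{(\kappa^*)^2}{9} + \frac{1}{4}} + \frac{1}{2}}} = \kappa^* \Big[ 1 - \frac{1}{18} (\kappa^*)^2 + \dots \Big],$$
which shows that the equality in (6.2) is attained by the map (6.1) only asymptotically as $\kappa \to 0$.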
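As a numerical illustration of the last display (a sketch only; the function names below are hypothetical and are not taken from [15]), the following Python snippet evaluates the closed-form bound for $k$, checks that it solves $k\sqrt{1 + k^2/9} = \kappa^*$, and compares it with the two-term expansion $\kappa^*(1 - (\kappa^*)^2/18)$.

```python
import math

def kappa_star(kappa):
    """Rescaled quantity kappa* = 3*kappa / (2*sqrt(2))."""
    return 3.0 * kappa / (2.0 * math.sqrt(2.0))

def k_upper(ks):
    """Closed-form solution k of k*sqrt(1 + k^2/9) = kappa*."""
    return ks / math.sqrt(math.sqrt(ks * ks / 9.0 + 0.25) + 0.5)

def k_series(ks):
    """Two-term expansion kappa* (1 - kappa*^2 / 18), valid for small kappa*."""
    return ks * (1.0 - ks * ks / 18.0)

for kappa in (0.01, 0.1, 0.3):
    ks = kappa_star(kappa)
    k = k_upper(ks)
    # The residual of the defining equation should be at rounding level.
    residual = k * math.sqrt(1.0 + k * k / 9.0) - ks
    print(f"kappa={kappa}: bound={k:.8f}, series={k_series(ks):.8f}, residual={residual:.1e}")
```

The bound stays strictly below $\kappa^*$ for every $\kappa > 0$ and approaches it only as $\kappa \to 0$, in line with the remark above.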
6.2 Geometric applications

Theorem 6.1 has interesting geometric consequences. The inequalities (1.8) and (6.2) result in
$$\kappa(f) \le k(f) \le \frac{3}{2\sqrt{2}}\,\kappa(f), \tag{6.10}$$
and similarly for reciprocals of Fredholm eigenvalues of quasicircles $f(S^1)$. Since $\kappa(f) \le c_{\mathbf{T}}(0, S_f)$ and the universal Teichmüller space $\mathbf{T}$ is a homogeneous domain, one obtains from (6.10) the following inequalities estimating the behavior of invariant metrics on this space.

Theorem 6.4. For any two points $\varphi_1, \varphi_2 \in \mathbf{T}$, their Carathéodory and Kobayashi distances are related by
$$1 \le \frac{\tanh^{-1} d_{\mathbf{T}}(\varphi_1, \varphi_2)}{\tanh^{-1} c_{\mathbf{T}}(\varphi_1, \varphi_2)} \le \frac{3}{2\sqrt{2}},$$
or
$$c_{\mathbf{T}}(\varphi_1, \varphi_2) \le d_{\mathbf{T}}(\varphi_1, \varphi_2) \le \tanh \Big[ \frac{3}{2\sqrt{2}} \tanh^{-1} c_{\mathbf{T}}(\varphi_1, \varphi_2) \Big].$$

The approach exploited above can be extended to other important inequalities, for example, to exact estimation of the minimal dilatation $k_0(M)$ of quasiconformal extensions of $M$-quasisymmetric homeomorphisms of the circle.
Acknowledgments

I am grateful to the referees for their comments and suggestions which improved the exposition.
References

[1] H. Grunsky: “Koeffizientenbedingungen für schlicht abbildende meromorphe Funktionen”, Math. Z., Vol. 45, (1939), pp. 29–61.
[2] R. Kühnau: “Verzerrungssätze und Koeffizientenbedingungen vom Grunskyschen Typ für quasikonforme Abbildungen”, Math. Nachr., Vol. 48, (1971), pp. 77–105.
[3] C. Pommerenke: Univalent Functions, Vandenhoeck & Ruprecht, Göttingen, 1975.
[4] I.V. Zhuravlev: “Univalent functions and Teichmüller spaces”, Preprint: Inst. of Mathematics, Novosibirsk, 1979, p. 23 (Russian).
[5] S.L. Krushkal and R. Kühnau: Quasikonforme Abbildungen - neue Methoden und Anwendungen, Teubner-Texte zur Math., Bd. 54, Teubner, Leipzig, 1983.
[6] R. Kühnau: “Möglichst konforme Spiegelung an einer Jordankurve”, Jber. Deutsch. Math. Verein., Vol. 90, (1988), pp. 90–109.
[7] S.L. Krushkal: “Grunsky coefficient inequalities, Carathéodory metric and extremal quasiconformal mappings”, Comment. Math. Helv., Vol. 64, (1989), pp. 650–660.
[8] S.L. Krushkal and R. Kühnau: “Grunsky inequalities and quasiconformal extension”, Israel J. Math., Vol. 152, (2006), pp. 49–59.
[9] R. Kühnau: “Zu den Grunskyschen Coeffizientenbedingungen”, Ann. Acad. Sci. Fenn. Ser. A. I. Math., Vol. 6, (1981), pp. 125–130.
[10] R. Kühnau: “Wann sind die Grunskyschen Koeffizientenbedingungen hinreichend für Q-quasikonforme Fortsetzbarkeit?”, Comment. Math. Helv., Vol. 61, (1986), pp. 290–307.
[11] S.L. Krushkal: “Beyond Moser’s conjecture on Grunsky inequalities”, Georgian Math. J., Vol. 12, (2005), pp. 485–492.
[12] Y.L. Shen: “Pull-back operators by quasisymmetric functions and invariant metrics on Teichmüller spaces”, Complex Variables, Vol. 42, (2000), pp. 289–307.
[13] L. Ahlfors: “An extension of Schwarz’s lemma”, Trans. Amer. Math. Soc., Vol. 43, (1938), pp. 359–364.
[14] D. Gaier: Konstruktive Methoden der konformen Abbildung, Springer-Verlag, Berlin-Göttingen-Heidelberg, 1964.
[15] R. Kühnau: “Zur Berechnung der Fredholmschen Eigenwerte ebener Kurven”, Z. Angew. Math. Mech., Vol. 66, (1986), pp. 193–200.
[16] M. Schiffer: “Fredholm eigenvalues and Grunsky matrices”, Ann. Polon. Math., Vol. 39, (1981), pp. 149–164.
[17] G. Schober: “Estimates for Fredholm eigenvalues based on quasiconformal mapping”, In: Numerische, insbesondere approximationstheoretische Behandlung von Funktiongleichungen, Lecture Notes in Math., Vol. 333, Springer-Verlag, Berlin, (1973), pp. 211–217.
[18] S.E. Warschawski: “On the effective determination of conformal maps”, In: L. Ahlfors, E. Calabi et al. (Eds.): Contribution to the Theory of Riemann surfaces, Princeton Univ. Press, Princeton, (1953), pp. 177–188.
[19] L. Ahlfors: “Remarks on the Neumann-Poincaré integral equation”, Pacific J. Math., Vol. 2, (1952), pp. 271–280.
[20] S.L. Krushkal: “Quasiconformal extensions and reflections”, In: R. Kühnau (Ed.): Handbook of Complex Analysis: Geometric Function Theory, Vol. II, Elsevier Science, Amsterdam, 2005, pp. 507–553.
[21] K. Strebel: “On the existence of extremal Teichmueller mappings”, J. Anal. Math., Vol. 30, (1976), pp. 464–480.
[22] F.P. Gardiner and N. Lakic: Quasiconformal Teichmüller Theory, Amer. Math. Soc., 2000.
[23] S. Dineen: The Schwarz Lemma, Clarendon Press, Oxford, 1989.
[24] S. Kobayashi: Hyperbolic Complex Spaces, Springer, New York, 1998.
[25] C.J. Earle, I. Kra and S.L. Krushkal: “Holomorphic motions and Teichmüller spaces”, Trans. Amer. Math. Soc., Vol. 944, (1994), pp. 927–948.
[26] C.J. Earle and S. Mitra: “Variation of moduli under holomorphic motions”, In: Stony Brook, NY, 1998: The tradition of Ahlfors and Bers, Contemp. Math., Vol. 256, Amer. Math. Soc., Providence, RI, 2000, pp. 39–67.
[27] H.L. Royden: “Automorphisms and isometries of Teichmüller space”, Advances in the Theory of Riemann Surfaces (Ann. of Math. Stud.), Vol. 66, Princeton Univ. Press, Princeton, (1971), pp. 369–383.
[28] S.L. Krushkal: “Plurisubharmonic features of the Teichmüller metric”, Publications de l’Institut Mathématique-Beograd, Nouvelle série, Vol. 75, (2004), pp. 119–138.
[29] N.A. Lebedev: The Area Principle in the Theory of Univalent Functions, Nauka, Moscow, 1975 (Russian).
[30] I.M. Milin: “Univalent Functions and Orthonormal Systems”, Transl. of mathematical monographs, vol. 49, Transl. of Odnolistnye funktcii i normirovannie systemy, Amer. Math. Soc., Providence, RI, 1977.
[31] M. Schiffer and D. Spencer: Functionals of finite Riemann Surfaces, Princeton Univ. Press, Princeton, 1954.
[32] S.L. Krushkal: “Schwarzian derivative and complex Finsler metrics”, Contemporary Math., Vol. 382, (2005), pp. 243–262.
[33] D. Minda: “The strong form of Ahlfors’ lemma”, Rocky Mountain J. Math., Vol. 17, (1987), pp. 457–461.
[34] M. Abate and G. Patrizio: “Isometries of the Teichmüller metric”, Ann. Scuola Super. Pisa Cl. Sci., Vol. 26, (1998), pp. 437–452.
[35] V. Božin, N. Lakic, V. Markovic and M. Mateljević: “Unique extremality”, J. Anal. Math., Vol. 75, (1998), pp. 299–338.
[36] C.J. Earle and Zong Li: “Isometrically embedded polydisks in infinite dimensional Teichmüller spaces”, J. Geom. Anal., Vol. 9, (1999), pp. 51–71.
[37] M. Heins: “A class of conformal metrics”, Nagoya Math. J., Vol. 21, (1962), pp. 1–60.
[38] S.L. Krushkal: Quasiconformal Mappings and Riemann Surfaces, Wiley, New York, 1979.
[39] S.L. Krushkal: “A bound for reflections across Jordan curves”, Georgian Math. J., Vol. 10, (2003), pp. 561–572.
[40] R. Kühnau: “Über die Grunskyschen Koeffizientenbedingungen”, Ann. Univ. Mariae Curie-Sklodowska, Sect. A, Vol. 54, (2000), pp. 53–60.
[41] O. Lehto: Univalent Functions and Teichmüller Spaces, Springer-Verlag, New York, 1987.
[42] Yu.G. Reshetnyak: “Two-dimensional manifolds of bounded curvature”, Geometry, IV, Encyclopaedia Math. Sci., Vol. 70, Springer, Berlin, 1993, pp. 3–163, 245–250. Engl. transl. from: Two-dimensional manifolds of bounded curvature, Geometry, 4, Itogi Nauki i Tekhniki, Akad. Nauk SSSR, Vsesoyuz. Inst. Nauchn. i Tekhn. Inform., Moscow, 1989, pp. 189, 273–277, 279 (Russian).
[43] H.L. Royden: “The Ahlfors-Schwarz lemma: the case of equality”, J. Anal. Math., Vol. 46, (1986), pp. 261–270.
DOI: 10.2478/s11533-007-0014-4 Research article CEJM 5(3) 2007 581–595
Connections between Romanovski and other polynomials
Hans J. Weber∗
Department of Physics, University of Virginia, Charlottesville, VA 22904, USA
Received 21 February 2007; accepted 26 April 2007
Abstract: A connection between Romanovski polynomials and those polynomials that solve the one-dimensional Schrödinger equation with the trigonometric Rosen-Morse and hyperbolic Scarf potential is established. The map is constructed by reworking the Rodrigues formula in an elementary and natural way. The generating function is summed in closed form from which recursion relations and addition theorems follow. Relations to some classical polynomials are also given.
© Versita Warsaw and Springer-Verlag Berlin Heidelberg. All rights reserved.
Keywords: Romanovski polynomials, complexified Jacobi polynomials, generating function, recursion relations, addition theorems
MSC (2000): 33C45, 42C15, 33C30, 34B24
1 Introduction and review of basic results
Romanovski polynomials were discovered in 1884 by Routh [1] in the form of complexified Jacobi polynomials on the unit circle in the complex plane and were then rediscovered as real polynomials by Romanovski [2] in a statistics framework. Recently real polynomial solutions of the Scarf [3] and Rosen-Morse potentials [4] in (supersymmetric) quantum mechanics were recognized [5] to be related to the Romanovski polynomials. Here we apply to Romanovski polynomials a recently introduced natural method of reworking the Rodrigues formula [6] that leads to connections with other polynomials.

The paper is organized as follows. In the Introduction we define the complementary polynomials $Q_\nu^{(\alpha,-a)}(x)$, then establish both a recursive differential equation satisfied by them and the procedure for the systematic construction of the $Q_\nu^{(\alpha,-a)}(x)$, and derive their Sturm-Liouville differential equation (ODE), their generating function and its consequences, all based on the results of ref. [6]. Section 2 deals with a parameter addition theorem, Sect. 3 with orthogonality integrals and Sect. 4 with connections of these Romanovski polynomials to the classical polynomials. Sect. 5 deals with further applications using auxiliary polynomials.

∗ E-mail: [email protected]

Definition 1.1. The Rodrigues formula that generates the polynomials is given by
$$P_l^{(a,\alpha)}(x) = \frac{1}{w_l(x)} \frac{d^l}{dx^l} \big[ w_l(x)\,\sigma(x)^l \big] = \frac{\sigma(x)^l}{w_0(x)} \frac{d^l w_0(x)}{dx^l}, \qquad l = 0, 1, \dots, \tag{1}$$
where $\sigma(x) \equiv 1 + x^2$ is the coefficient of $y''$ of the hypergeometric ODE (1) of ref. [6] that the polynomials satisfy. The variable $x$ is real and ranges from $-\infty$ to $+\infty$. The corresponding weight functions
$$w_l(x) = \sigma(x)^{-(l+a+1)}\, e^{-\alpha \cot^{-1} x} = \sigma(x)^{-l}\, w_0(x) \tag{2}$$
depend on the parameters $a$, $\alpha$ that are independent of the degree $l$ of the polynomial $P_l^{(a,\alpha)}(x)$.

Lemma 1.2. The weight functions of the $P_l^{(a,\alpha)}(x)$ polynomials satisfy Pearson's first-order ODE
$$\sigma(x) \frac{dw_l(x)}{dx} = [\alpha - 2x(l + a + 1)]\, w_l(x). \tag{3}$$

Proof. This is straightforward to check using $d \cot^{-1} x / dx = -1/\sigma(x)$.
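The check in the proof of Lemma 1.2 can also be carried out numerically. The sketch below is an illustration under stated assumptions: the branch $\cot^{-1} x = \pi/2 - \arctan x$ is our choice (any branch with derivative $-1/\sigma(x)$ would do), the derivative is approximated by a central difference, and the function names are hypothetical.

```python
import math

def acot(x):
    """Assumed branch of cot^{-1}: pi/2 - arctan(x), smooth on the whole real line."""
    return math.pi / 2.0 - math.atan(x)

def sigma(x):
    return 1.0 + x * x

def weight(x, l, a, alpha):
    """w_l(x) = sigma(x)^{-(l+a+1)} * exp(-alpha * acot(x)), as in Eq. (2)."""
    return sigma(x) ** (-(l + a + 1)) * math.exp(-alpha * acot(x))

def pearson_residual(x, l, a, alpha, h=1e-5):
    """sigma * w_l' - [alpha - 2x(l+a+1)] * w_l, with w_l' from a central difference."""
    dw = (weight(x + h, l, a, alpha) - weight(x - h, l, a, alpha)) / (2.0 * h)
    return sigma(x) * dw - (alpha - 2.0 * x * (l + a + 1)) * weight(x, l, a, alpha)

for x in (-1.5, 0.0, 0.7, 2.0):
    # Residuals should vanish up to the finite-difference error.
    print(x, pearson_residual(x, l=2, a=0.5, alpha=1.3))
```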
We now apply the method of ref. [6] and introduce the complementary polynomials $Q_\nu^{(\alpha,-a)}(x)$, defining them inductively according to the Rodrigues representation (see Eq. (5) of ref. [6])
$$P_l^{(a,\alpha)}(x) = \frac{1}{w_l(x)} \frac{d^{l-\nu}}{dx^{l-\nu}} \big[ \sigma(x)^{l-\nu}\, w_l(x)\, Q_\nu^{(\alpha,-a)}(x) \big], \qquad \nu = 0, 1, \dots, l. \tag{4}$$
For Eq. (4) to agree with the Rodrigues formula (1) for $\nu = 0$ requires $Q_0^{(\alpha,-a)}(x) \equiv 1$. Then comparing Eq. (3) with Eq. (4) of ref. [6] we find the coefficient $\tau(x) = \alpha - \sigma'(x)(a + l)$ of $y'$ in the ODE of the polynomials. Comparing instead with the ODE (37) of ref. [5] gives their parameter $\beta = -a$. We now identify the polynomials of ref. [6] with those defined in Eq. (4):
$$P_\nu(x; l) = Q_\nu^{(\alpha,-a)}(x), \qquad l \ge \nu. \tag{5}$$
We will show below that the polynomials $Q_\nu^{(\alpha,-a)}(x)$ are independent of the parameter $l$.

Theorem 1.3. $Q_\nu^{(\alpha,-a)}(x)$ is a polynomial of degree $\nu$ that satisfies the recursive differential relation
$$Q_{\nu+1}^{(\alpha,-a)}(x) = \sigma(x) \frac{dQ_\nu^{(\alpha,-a)}(x)}{dx} + [\tau(x) + 2x(l - \nu - 1)]\, Q_\nu^{(\alpha,-a)}(x) = \sigma(x) \frac{dQ_\nu^{(\alpha,-a)}(x)}{dx} + [\alpha - 2x(a + \nu + 1)]\, Q_\nu^{(\alpha,-a)}(x), \qquad \nu = 0, 1, \dots. \tag{6}$$
Proof. The inductive proof of Theorem 2.2 of ref. [6] applied to the polynomial $Q_\nu^{(\alpha,-a)}(x)$ proves this theorem, and Eq. (6) agrees with Eq. (76) of ref. [5] provided their parameter $\beta = -a$ in ref. [6]. Since Eq. (6) is independent of the parameter $l$, so are the polynomials $Q_\nu^{(\alpha,-a)}(x)$ that are generated from it.

Comparing the recursive ODE (6) with one stated in [2] leads us to the identification of our polynomials
$$Q_k^{(\alpha,k-m)}(x) = \varphi_k(m, x), \qquad Q_\nu^{(\alpha,-a)}(x) = \varphi_\nu(a + \nu, x) \tag{7}$$
as a Romanovski polynomial (with its parameter depending on its degree), and comparing with Eq. (69) of ref. [5],
$$Q_\nu^{(\alpha,-a)}(x) = R_\nu^{(\alpha,\beta-\nu)}(x), \qquad \beta = -a. \tag{8}$$
Notice that the parameter $\alpha$ ($-\nu$ in ref. [2]) is suppressed in Romanovski's notation. The fact that the integer index $\nu$ of the complementary polynomials occurs in the parameter $m$ of the Romanovski polynomials is sometimes disadvantageous (for orthogonality), but also occasionally a definite advantage (for the generating function). Moreover,
$$P_l(x; l) = Q_l^{(\alpha,-a)}(x) = K_l\, C_l^{(\alpha,-a)}(x), \tag{9}$$
where $K_l$ is a normalization constant and the $C_l^{(\alpha,-a)}(x)$, after a change of variables, become part of the solutions of the Scarf and Rosen-Morse potentials in the Schrödinger equation [3],[4]. However, for $C_l^{(\alpha,-a)}(x)$ to be part of the solution of the Schrödinger equation with the trigonometric Rosen-Morse potential requires $\alpha = \alpha_l = \frac{2b}{l+a}$ [4]; but the polynomials may be defined for the general parameter $\alpha$.

Recursion relations and recursive ODEs are practical tools to systematically generate the polynomials.

Theorem 1.4. The polynomial $Q_\nu^{(\alpha,-a)}(x)$ satisfies the basic recursive ODE
$$\frac{dQ_\nu^{(\alpha,-a)}(x)}{dx} = -\nu(2a + \nu + 1)\, Q_{\nu-1}^{(\alpha,-a)}(x) \equiv -\lambda_\nu\, Q_{\nu-1}^{(\alpha,-a)}(x). \tag{10}$$

Proof. Eq. (10) follows from a comparison of the recursive ODE (6) with a three-term recursion relation as outlined in Corollary 4.2 of ref. [6], is ODE (32) of ref. [6], and agrees with Eq. (75) of ref. [5] provided their $\beta = -a$, which is consistent with our previous statements.

Thus, taking a derivative of $Q_\nu^{(\alpha,-a)}(x)$ just lowers its degree (and index) by unity, up to a constant factor, a property the Romanovski polynomials share with all classical polynomials.
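The recursive relation (6) and the derivative rule (10) lend themselves to a short symbolic check. The following Python sketch is an illustration only: polynomials are stored as coefficient lists (lowest degree first), the helper names are hypothetical, and integer parameter values are assumed so that the arithmetic stays exact.

```python
def poly_add(p, q):
    """Add two coefficient lists (lowest degree first)."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [u + v for u, v in zip(p, q)]

def poly_mul(p, q):
    """Multiply two coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, u in enumerate(p):
        for j, v in enumerate(q):
            out[i + j] += u * v
    return out

def poly_diff(p):
    """Derivative of a coefficient list; the zero polynomial is [0]."""
    return [k * c for k, c in enumerate(p)][1:] or [0]

def trim(p):
    """Drop trailing zero coefficients."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

SIGMA = [1, 0, 1]  # sigma(x) = 1 + x^2

def q_polys(nu_max, alpha, a):
    """Q_0, ..., Q_{nu_max} from Eq. (6): Q_{nu+1} = sigma Q_nu' + [alpha - 2x(a+nu+1)] Q_nu."""
    qs = [[1]]
    for nu in range(nu_max):
        step = poly_add(poly_mul(SIGMA, poly_diff(qs[-1])),
                        poly_mul([alpha, -2 * (a + nu + 1)], qs[-1]))
        qs.append(trim(step))
    return qs

alpha, a = 3, 2
qs = q_polys(5, alpha, a)
for nu in range(1, 6):
    lam = nu * (2 * a + nu + 1)
    # Eq. (10): dQ_nu/dx = -lambda_nu * Q_{nu-1}
    print(nu, trim(poly_diff(qs[nu])) == [-lam * c for c in qs[nu - 1]])
```

For the sample values $\alpha = 3$, $a = 2$ the first two nontrivial polynomials are $Q_1 = 3 - 6x$ and $Q_2 = 3 - 42x + 42x^2$.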
Theorem 1.5. The polynomials $Q_\nu^{(\alpha,-a)}(x)$ satisfy the differential equation of Sturm-Liouville type
$$\sigma(x) \frac{d^2 Q_\nu^{(\alpha,-a)}(x)}{dx^2} + [\alpha - \sigma'(x)(a + \nu)] \frac{dQ_\nu^{(\alpha,-a)}(x)}{dx} = -\lambda_\nu\, Q_\nu^{(\alpha,-a)}(x). \tag{11}$$

Proof. Substituting the basic ODE (10) in the recursive ODE (6) yields
$$Q_{\nu+1}^{(\alpha,-a)}(x) = -\frac{\sigma(x)}{\lambda_{\nu+1}} \frac{d^2 Q_{\nu+1}^{(\alpha,-a)}(x)}{dx^2} - \frac{1}{\lambda_{\nu+1}} [\alpha - (a + \nu + 1)\sigma'(x)] \frac{dQ_{\nu+1}^{(\alpha,-a)}(x)}{dx}, \tag{12}$$
which, for $\nu \to \nu - 1$, is the ODE of the theorem. Again, the ODE is independent of the parameter $l$ in Eq. (5).

Theorem 1.6. The polynomial $P_l(x)$ satisfies the ODE
$$\sigma(x) \frac{d^2 P_l(x)}{dx^2} + \tau(x) \frac{dP_l(x)}{dx} = -\lambda_l\, P_l(x). \tag{13}$$

Proof. For $\nu = l$, we use $P_l(x) = P_l(x; l)$ in the notation of ref. [6] to rewrite the recursive ODE (6) of ref. [6] and Eq. (10) as
$$P_l(x; l) = \sigma(x) P'_{l-1}(x; l) + \tau(x) P_{l-1}(x; l) = P_l(x) = -\frac{\sigma(x)}{\lambda_l} P''_l(x) - \frac{\tau(x)}{\lambda_l} P'_l(x), \tag{14}$$
which is the ODE (1) in ref. [6] for the polynomial $P_l(x)$. (Note that $\tau(x)$ is given right after Eq. (4).)
Theorem 1.7. The polynomials $Q_\nu^{(\alpha,-a)}(x)$ satisfy the generalized Rodrigues formulas
$$Q_\nu^{(\alpha,-a)}(x) = w_l^{-1}(x)\, \sigma(x)^{\nu-l} \frac{d^\nu}{dx^\nu} \big[ w_l(x)\, \sigma(x)^l \big] = \frac{\sigma(x)^\nu}{w_0(x)} \frac{d^\nu w_0(x)}{dx^\nu}; \tag{15}$$
$$Q_\nu^{(\alpha,-a)}(x) = w_l^{-1}(x)\, \sigma(x)^{\nu-l} \frac{d^{\nu-\mu}}{dx^{\nu-\mu}} \big[ \sigma(x)^{l-\mu}\, w_l(x)\, Q_\mu^{(\alpha,-a)}(x) \big], \qquad \mu = 0, 1, \dots, \nu. \tag{16}$$

Proof. These Rodrigues formulas are those of Theorem 2.3 of ref. [6]; they agree with Eqs. (72) and (73) of ref. [5] provided their $\beta = -a$, as we found earlier. Note that, from Eq. (2), the product $w_l(x)\sigma(x)^l$ does not depend on $l$, so there is no $l$ dependence in Eqs. (15,16).

The $Q_n^{(\alpha,-a)}(x)$ polynomials generalize the $P_n(x)$ in the sense of allowing any power of $\sigma(x)$ in the Rodrigues formula, not just $\sigma(x)^n$ as for the $P_n(x)$. In other words, the $Q_n^{(\alpha,-a)}(x)$ are associated $P_n(x)$ (or $C_n^{(\alpha,-a)}(x)$) polynomials, as in the relationship between Laguerre (Legendre) and associated Laguerre (Legendre) polynomials. A generalization of Eq. (15),
$$Q_\nu^{(\alpha,-a-l)}(x) = \frac{\sigma(x)^{\nu+l}}{w_0(x)} \frac{d^\nu}{dx^\nu} \big[ \sigma(x)^{-l}\, w_0(x) \big], \qquad l = 0, \pm 1, \dots, \tag{17}$$
just reproduces the same polynomial with a shifted parameter $a \to a + l$.
Definition 1.8. The generating function for the $Q_\nu^{(\alpha,-a)}(x)$ polynomials is defined as
$$Q(x, y; \alpha, -a) = \sum_{\nu=0}^{\infty} \frac{y^\nu}{\nu!}\, Q_\nu^{(\alpha,-a)}(x). \tag{18}$$
The generating function is our main tool for deriving recursion relations.

Theorem 1.9. The generating function can be summed in the closed form
$$w_l(x)\, Q(x, y; \alpha, -a) = \sigma(x)^{-l} \sum_{\nu=0}^{\infty} \frac{(y\sigma(x))^\nu}{\nu!} \frac{d^\nu}{dx^\nu} \big[ \sigma(x)^{-(a+1)}\, e^{-\alpha \cot^{-1} x} \big],$$
$$Q(x, y; \alpha, -a) = (1 + x^2)^{a+1}\, e^{\alpha \cot^{-1} x}\, [1 + (x + y(1 + x^2))^2]^{-(a+1)}\, e^{-\alpha \cot^{-1}(x + y(1 + x^2))}, \tag{19}$$
$$\frac{\partial^\mu}{\partial y^\mu} \big[ w_l(x)\, \sigma(x)^{l-1}\, Q(x, y; \alpha, -a) \big] = \sigma(x)^{\mu-1} \sum_{\nu=\mu}^{\infty} \frac{(y\sigma(x))^{\nu-\mu}}{(\nu - \mu)!} \frac{d^{\nu-\mu}}{dx^{\nu-\mu}} \big( \sigma(x)^{-(a+\mu+1)}\, e^{-\alpha \cot^{-1} x}\, Q_\mu^{(\alpha,-a)}(x) \big), \tag{20}$$
$$\frac{\partial^\mu Q(x, y; \alpha, -a)}{\partial y^\mu} = w_l(x)^{-1}\, \sigma(x)^{\mu-l}\, [1 + (x + y\sigma(x))^2]^{-(a+\mu+1)}\, e^{-\alpha \cot^{-1}(x + y\sigma(x))}\, Q_\mu^{(\alpha,-a)}(x + y\sigma(x)). \tag{21}$$
Both Taylor series converge if $x$ and $x + y\sigma(x)$ are regular points of the weight function.

Proof. The first relation is derived in Theorem 3.2 of ref. [6] by substituting the Rodrigues formula (15) in the defining series (18) of the generating function and recognizing it as a Taylor series. The other follows similarly from Eq. (16).

Theorem 1.10. The generating function satisfies the partial differential equation (PDE)
$$\frac{\partial Q(x, y; \alpha, -a)}{\partial y} = \frac{\sigma(x)\, Q(x, y; \alpha, -a)}{1 + [x + y\sigma(x)]^2}\, Q_1^{(\alpha,-a)}(x + y\sigma(x)). \tag{22}$$

Proof. This PDE is derived by straightforward differentiation in Theorem 3.3 of ref. [6] in preparation for recursion relations by translating the case $\mu = 1$ in Eq. (21) into a partial differential equation (PDE).

One of the main consequences of Theorem 1.10 is a general recursion relation.
Theorem 1.11. The $Q_\nu^{(\alpha,-a)}(x)$ polynomials satisfy the three-term recursion relation
$$Q_{\nu+1}^{(\alpha,-a)}(x) = [\alpha - 2x(a + \nu + 1)]\, Q_\nu^{(\alpha,-a)}(x) - \nu\, \sigma(x)(2a + \nu + 1)\, Q_{\nu-1}^{(\alpha,-a)}(x). \tag{23}$$

Proof. Equation (22) translates into this recursion relation by substituting Eq. (18) defining the generating function and thus rewriting this as
$$(1 + y^2 \sigma(x) + 2xy) \sum_{\nu=1}^{\infty} \frac{y^{\nu-1}}{(\nu - 1)!}\, Q_\nu^{(\alpha,-a)}(x) = [\alpha - 2x(a + 1) - 2y(a + 1)\sigma(x)] \sum_{\nu=0}^{\infty} \frac{y^\nu}{\nu!}\, Q_\nu^{(\alpha,-a)}(x). \tag{24}$$
Comparing the coefficients of $y^\nu$ on both sides gives Eq. (23).

Just like the recursive ODE (6), this recursion allows for a systematic construction of the Romanovski polynomials, in contrast to the Rodrigues formulas which become impractical for large values of the degree $\nu$.
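A minimal sketch of such a systematic construction, assuming nothing beyond Eq. (23) with $Q_0 \equiv 1$: the (hypothetically named) function below returns the exact values $Q_0(x), \dots, Q_\nu(x)$ at a rational point, and the result is compared with the explicit low-degree expressions $Q_1 = \alpha - 2x(a+1)$ and $Q_2 = [\alpha - 2x(a+1)][\alpha - 2x(a+2)] - 2\sigma(x)(a+1)$.

```python
from fractions import Fraction

def q_values(nu_max, x, alpha, a):
    """Exact values Q_0(x), ..., Q_{nu_max}(x) from the three-term recursion (23)."""
    sigma = 1 + x * x
    vals = [Fraction(1)]
    for nu in range(nu_max):
        prev = vals[nu - 1] if nu > 0 else 0  # the Q_{-1} term carries the factor nu = 0
        vals.append((alpha - 2 * x * (a + nu + 1)) * vals[nu]
                    - nu * sigma * (2 * a + nu + 1) * prev)
    return vals

x, alpha, a = Fraction(1, 3), Fraction(2), Fraction(1, 2)
vals = q_values(3, x, alpha, a)
sigma = 1 + x * x
q1 = alpha - 2 * x * (a + 1)
q2 = (alpha - 2 * x * (a + 1)) * (alpha - 2 * x * (a + 2)) - 2 * sigma * (a + 1)
print(vals[1] == q1, vals[2] == q2)
```

Working with `Fraction` keeps every step exact, so the comparison is an identity check rather than a floating-point approximation.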
Theorem 1.12. The polynomials $Q_\nu^{(\alpha,-a)}(x)$ satisfy the differential equation of Sturm-Liouville type
$$\frac{d}{dx} \Big[ \sigma(x)^{l-\nu+1}\, w_l(x)\, \frac{dQ_\nu^{(\alpha,-a)}(x)}{dx} \Big] = -\lambda_\nu\, \sigma(x)^{l-\nu}\, w_l(x)\, Q_\nu^{(\alpha,-a)}(x); \qquad \lambda_\nu = \nu(2a + \nu + 1). \tag{25}$$

Proof. This ODE is equivalent to the ODE (11) and agrees with Eq. (78) of ref. [5] if $\beta = -a$ there.

Note that the inductive proof in Theorem 5.1 in ref. [6] is much lengthier than our proof of Eqs. (11,13).
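The closed form of the generating function in Theorem 1.9 can be compared numerically with partial sums of the defining series (18). The sketch below makes several assumptions that are ours: the branch $\cot^{-1} x = \pi/2 - \arctan x$, sample parameter values, a truncation at 40 terms, and values of $y$ well inside the region of convergence; the function names are hypothetical.

```python
import math

def acot(x):
    """Assumed branch of cot^{-1}: pi/2 - arctan(x)."""
    return math.pi / 2.0 - math.atan(x)

def q_values(nu_max, x, alpha, a):
    """Floating-point values Q_0(x), ..., Q_{nu_max}(x) from the recursion (23)."""
    sigma = 1.0 + x * x
    vals = [1.0]
    for nu in range(nu_max):
        prev = vals[nu - 1] if nu > 0 else 0.0
        vals.append((alpha - 2.0 * x * (a + nu + 1)) * vals[nu]
                    - nu * sigma * (2.0 * a + nu + 1) * prev)
    return vals

def gen_series(x, y, alpha, a, terms=40):
    """Partial sum of the defining series (18)."""
    vals = q_values(terms, x, alpha, a)
    return sum(v * y ** nu / math.factorial(nu) for nu, v in enumerate(vals))

def gen_closed(x, y, alpha, a):
    """Closed form: sigma^{a+1} e^{alpha acot x} (1+u^2)^{-(a+1)} e^{-alpha acot u}, u = x + y sigma."""
    u = x + y * (1.0 + x * x)
    return ((1.0 + x * x) ** (a + 1) * math.exp(alpha * acot(x))
            * (1.0 + u * u) ** (-(a + 1)) * math.exp(-alpha * acot(u)))

for x, y, alpha, a in ((0.5, 0.1, 1.3, 0.5), (-1.0, 0.05, -0.7, 2.0)):
    print(gen_series(x, y, alpha, a), gen_closed(x, y, alpha, a))
```

Only the difference $\cot^{-1} x - \cot^{-1}(x + y\sigma(x))$ enters, so the comparison does not depend on an additive constant in the choice of branch.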
2 Parameter Addition
The multiplicative structure of the generating function of Eq. (19) involving the two parameters in the exponents of two separate functions, as displayed in
$$Q\Big( x, \frac{y - x}{\sigma(x)}; \alpha, -a \Big) = \Big( \frac{\sigma(x)}{\sigma(y)} \Big)^{a+1} e^{\alpha(\cot^{-1} x - \cot^{-1} y)}, \tag{26}$$
allows for the following theorems.

Theorem 2.1. The $Q_\nu^{(\alpha,-a)}(x)$ polynomials satisfy the parameter addition relation
$$Q_N^{(\alpha_1+\alpha_2,\,-a_1-a_2-1)}(x) = \sum_{\nu_1=0}^{N} \binom{N}{\nu_1}\, Q_{\nu_1}^{(\alpha_1,-a_1)}(x)\, Q_{N-\nu_1}^{(\alpha_2,-a_2)}(x). \tag{27}$$

Proof. This formula follows from the Taylor expansion
$$\sum_{\nu_1,\nu_2=0}^{\infty} \frac{y^{\nu_1+\nu_2}}{\nu_1!\,\nu_2!}\, Q_{\nu_1}^{(\alpha_1,-a_1)}(x)\, Q_{\nu_2}^{(\alpha_2,-a_2)}(x) = \sum_{N=0}^{\infty} \frac{y^N}{N!}\, Q_N^{(\alpha_1+\alpha_2,\,-a_1-a_2-1)}(x) \tag{28}$$
of the generating function identity
$$Q(x, y; \alpha_1, -a_1)\, Q(x, y; \alpha_2, -a_2) = Q(x, y; \alpha_1 + \alpha_2, -a_1 - a_2 - 1). \tag{29}$$
Given the complexity of the polynomials, the elegance and simplicity of this relation are remarkable.

Example 2.2. The case $N = 0$ is trivial, and $N = 1$ becomes the additive identity
$$Q_1^{(\alpha_1+\alpha_2,\,-(a_1+a_2+1))}(x) = \alpha_1 + \alpha_2 - 2x(a_1 + a_2 + 2) = [\alpha_1 - 2x(a_1 + 1)] + [\alpha_2 - 2x(a_2 + 1)] = Q_1^{(\alpha_1,-a_1)}(x) + Q_1^{(\alpha_2,-a_2)}(x). \tag{30}$$
The first case involving additive and multiplicative aspects of the polynomials is $N = 2$, which we decompose and multiply out as follows:
$$\begin{aligned}
Q_2^{(\alpha_1+\alpha_2,\,-a_1-a_2-1)} &= [\alpha_1 + \alpha_2 - 2x(a_1 + a_2 + 2)][\alpha_1 + \alpha_2 - 2x(a_1 + a_2 + 3)] - 2\sigma(x)(a_1 + a_2 + 2) \\
&= \{[\alpha_1 - 2x(a_1 + 1)] + [\alpha_2 - 2x(a_2 + 1)]\} \cdot \{[\alpha_1 - 2x(a_1 + 2)] + [\alpha_2 - 2x(a_2 + 1)]\} - 2\sigma(x)[(a_1 + 1) + (a_2 + 1)] \\
&= [\alpha_1 - 2x(a_1 + 1)][\alpha_1 - 2x(a_1 + 2)] - 2\sigma(x)(a_1 + 1) + [\alpha_2 - 2x(a_2 + 1)][\alpha_2 - 2x(a_2 + 2)] - 2\sigma(x)(a_2 + 1) \\
&\quad + [\alpha_1 - 2x(a_1 + 1)][\alpha_2 - 2x(a_2 + 1)] + [\alpha_2 - 2x(a_2 + 1)][\alpha_1 - 2x(a_1 + 1)] \\
&= Q_2^{(\alpha_1,-a_1)} + Q_2^{(\alpha_2,-a_2)} + Q_1^{(\alpha_1,-a_1)} Q_1^{(\alpha_2,-a_2)} + Q_1^{(\alpha_2,-a_2)} Q_1^{(\alpha_1,-a_1)}.
\end{aligned} \tag{31}$$
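The pattern of Example 2.2 can be tested for higher $N$ by evaluating both sides of the addition relation (27) exactly at rational points. The following Python sketch is an illustration with hypothetical function names; it generates the values of the polynomials from the three-term recursion (23) and uses arbitrary sample parameters.

```python
from fractions import Fraction
from math import comb

def q_values(nu_max, x, alpha, a):
    """Exact values Q_0(x), ..., Q_{nu_max}(x) of Q_nu^{(alpha,-a)} from the recursion (23)."""
    sigma = 1 + x * x
    vals = [Fraction(1)]
    for nu in range(nu_max):
        prev = vals[nu - 1] if nu > 0 else 0
        vals.append((alpha - 2 * x * (a + nu + 1)) * vals[nu]
                    - nu * sigma * (2 * a + nu + 1) * prev)
    return vals

def addition_holds(N, x, a1, alpha1, a2, alpha2):
    """Compare both sides of Eq. (27) at the point x."""
    # Upper index -a1-a2-1 corresponds to the parameter a = a1 + a2 + 1.
    lhs = q_values(N, x, alpha1 + alpha2, a1 + a2 + 1)[N]
    q1 = q_values(N, x, alpha1, a1)
    q2 = q_values(N, x, alpha2, a2)
    rhs = sum(comb(N, nu) * q1[nu] * q2[N - nu] for nu in range(N + 1))
    return lhs == rhs

for N in range(6):
    print(N, addition_holds(N, Fraction(2, 3), Fraction(1, 2), 3, 2, Fraction(-5, 4)))
```

Since both sides are polynomials of degree $N$ in $x$, agreement at $N + 1$ distinct points is already enough to confirm the identity for the chosen parameters.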
The addition theorem is consistent with the homogeneous polynomial theorem in the variables $x$, $\alpha$, $\sqrt{\sigma}$ (without using $\sigma = x^2 + 1$) which the polynomials satisfy and can be generalized to an arbitrary number of parameters.

Theorem 2.3. The $Q_\nu^{(\alpha,-a)}(x)$ polynomials satisfy the more general polynomial identity
$$Q_N^{(\alpha_1+\alpha_2+\dots+\alpha_n,\,-(a_1+a_2+\dots+a_n+n-1))}(x) = \sum_{0 \le \nu_j \le N,\ \nu_1+\dots+\nu_n=N} \frac{N!}{\prod_{1}^{n} \nu_j!} \prod_{j=1}^{n} Q_{\nu_j}^{(\alpha_j,-a_j)}(x). \tag{32}$$

Proof. This follows similarly from the Taylor expansion of the product identity of $n$ generating functions
$$\prod_{j=1}^{n} Q(x, y; \alpha_j, -a_j) = Q(x, y; \alpha_1 + \alpha_2 + \dots + \alpha_n, -(a_1 + a_2 + \dots + a_n + n - 1)). \tag{33}$$
As an application of the parameter addition theorem we now separate the two parameters $a$ and $\alpha$ into two sets of simpler polynomials $Q_\nu^{(0,-a)}$ and $Q_\mu^{(\alpha,1)}$. To this end, we expand the generating functions in the identity
$$Q(x, y; \alpha, -a) = Q(x, y; 0, -a)\, Q(x, y; \alpha, 1) \tag{34}$$
in terms of their defining polynomials. This yields

Theorem 2.4. The $Q_\nu^{(\alpha,-a)}(x)$ polynomials satisfy the decomposition identity
$$Q_N^{(\alpha,-a)}(x) = \sum_{\nu=0}^{N} \binom{N}{\nu}\, Q_\nu^{(0,-a)}(x)\, Q_{N-\nu}^{(\alpha,1)}(x). \tag{35}$$

Proof. This identity follows from a Taylor expansion of the generating function identity (34) in terms of sums of products of polynomials involving only one parameter each.

Definition 2.5. The generating function
$$e^{\alpha[\cot^{-1} x - \cot^{-1}(x + y\sigma(x))]} = \sum_{\nu=0}^{\infty} \frac{y^\nu}{\nu!}\, Q_\nu^{(\alpha,1)}(x) \tag{36}$$
defines the second set of the polynomials, while the first one will be treated in detail below upon expanding the polynomials $Q_\nu^{(0,-a)}(x)$ as finite sums of Gegenbauer polynomials in Sect. 4 or finite power series in Eq. (53).

We also note that $Q_\nu^{(0,-a)}(x) = K_\nu\, C_\nu^{(0,-a)}(x)$, so the latter also have a Gegenbauer polynomial expansion.

Theorem 2.6. The $Q_\nu^{(\alpha,-a)}(x)$ polynomials satisfy the (parity) symmetry relation
$$Q_\nu^{(-\alpha,-a)}(x) = (-1)^\nu\, Q_\nu^{(\alpha,-a)}(-x). \tag{37}$$

Proof. This relation derives from the generating function identity
$$Q(-x, -y; -\alpha, -a) = Q(x, y; \alpha, -a) \tag{38}$$
which holds because $\alpha \cot^{-1}(x + y\sigma(x))$ in the generating function Eq. (18) stays invariant under $\alpha \to -\alpha$, $x \to -x$, $y \to -y$, and $\sigma(-x) = \sigma(x)$.
3 Orthogonality Integrals
This section deals with an application of the generating function to integrals that are relevant for studying the orthogonality of the polynomials.
Definition 3.1. We define orthogonality integrals for the $Q_\nu^{(\alpha,-a)}(x)$ polynomials by [5]
$$O_{\mu,\nu}^{(a,\alpha)} = \int_{-\infty}^{\infty} dx\, \frac{Q_\mu^{(\alpha,-a)}(x)\, Q_\nu^{(\alpha,-a)}(x)\, e^{-\alpha \cot^{-1} x}}{\sigma(x)^{(\mu+\nu)/2+a+2}} = 0, \qquad \mu \ne \nu,\ a > -3/2,\ \mu + \nu \text{ even}, \tag{39}$$
while for $\mu + \nu$ odd there needs to be an extra $\sqrt{\sigma(x)}$ in the numerator for the orthogonality integrals to vanish.
Thus, the $Q_\nu^{(\alpha,-a)}(x)$ polynomials form two infinite subsets, each with general orthogonality, but polynomials from different subsets are not mutually orthogonal. While displaying infinite orthogonality, this property falls short of the general orthogonality of all classical polynomials. The $Q_\nu^{(\alpha,-a)}(x)$ polynomials form a partition of the set of all Romanovski polynomials, as shown in Eq. (8), with upper index dependent on the running index $\nu$, though. The Romanovski polynomials $R_\nu^{(\alpha,\beta)}(x)$ with upper indices independent of the degree $\nu$, the running index, form another partition that has the finite orthogonality, as discussed in more detail in ref. [5]. The orthogonality of the $C_n^{(\alpha_n,-a)}(x)$ polynomials from the Schrödinger equation with $\alpha = \alpha_n$, as discussed below Eq. (9), is yet another form of orthogonality similar to that of hydrogenic wave functions, which also differs from the mathematical orthogonality of associated Laguerre polynomials, the subject of Exercise 13.2.11 in ref. [7].

The orthogonality integrals of Eq. (39) suggest analyzing the following integral of the generating functions
$$I(y) = \int_{-\infty}^{\infty} \frac{dx}{\sigma(x)^{a+2}} \Big[ Q\Big( x, \frac{y}{\sqrt{\sigma}}; \alpha, -a \Big) \Big]^2 e^{-\alpha \cot^{-1} x} = \sum_{\nu_1,\nu_2=0}^{\infty} \frac{y^{\nu_1+\nu_2}}{\nu_1!\,\nu_2!} \int_{-\infty}^{\infty} dx\, \frac{Q_{\nu_1}^{(\alpha,-a)}(x)\, Q_{\nu_2}^{(\alpha,-a)}(x)\, e^{-\alpha \cot^{-1} x}}{\sigma(x)^{(\nu_1+\nu_2)/2+a+2}} = \sum_{\nu_1,\nu_2=0}^{\infty} \frac{y^{\nu_1+\nu_2}}{\nu_1!\,\nu_2!}\, O_{\nu_1,\nu_2}^{(a,\alpha)}, \tag{40}$$
which is written directly in terms of orthogonality integrals $O_{\nu_1,\nu_2}^{(a,\alpha)}$ defined in Eq. (39). On the other hand, we can express the integral as
$$I(y) = \int_{-\infty}^{\infty} \frac{dx\, e^{-\alpha \cot^{-1} x + 2\alpha \cot^{-1} x - 2\alpha \cot^{-1}(x + y\sqrt{\sigma})}}{\sigma(x)^{a+2}\, [1 + y^2 + \frac{2xy}{\sqrt{\sigma}}]^{2(a+1)}} = \int_0^{\infty} \frac{dx\, e^{\alpha \cot^{-1} x - 2\alpha \cot^{-1}(x + y\sqrt{\sigma})}}{\sigma^{a+2}\, [1 + y^2 + \frac{2xy}{\sqrt{\sigma}}]^{2(a+1)}} + \int_0^{\infty} \frac{dx\, e^{-\alpha \cot^{-1} x + 2\alpha \cot^{-1}(x - y\sqrt{\sigma})}}{\sigma^{a+2}\, [1 + y^2 - \frac{2xy}{\sqrt{\sigma}}]^{2(a+1)}}, \tag{41}$$
which is manifestly not even in the variable $y$. If the $Q_\nu^{(\alpha,-a)}$ polynomials were orthogonal, then the double sum in $I$ of Eq. (40) would collapse to a single sum over normalization integrals multiplied by even powers of $y$, i.e. $I$ would be an even function of $y$. This result shows that the $Q_\nu^{(\alpha,-a)}$ polynomials are not orthogonal in the conventional sense. In fact, the extra $\sqrt{\sigma}$ in the orthogonality integrals $O_{2\nu,2\mu+1}^{(a,\alpha)}$ is not built into the generating function. In other words, the fact that $I(y) \ne I(-y)$ indirectly confirms that the $Q_\nu^{(\alpha,-a)}$ polynomials have more complicated orthogonality properties than the Romanovski polynomials with parameters that are independent of the degree of the polynomial, as discussed in more detail in ref. [5].

Let us next consider the special parameter $\alpha = 0$ and analyze similarly the integral
$$I_0(y) = \int_{-\infty}^{\infty} \frac{dx}{\sigma(x)^{a+2}}\, Q\Big( x, \frac{y}{\sqrt{\sigma}}; 0, -a \Big) = \sum_{\nu=0}^{\infty} \frac{y^\nu}{\nu!} \int_{-\infty}^{\infty} \frac{dx\, Q_\nu^{(0,-a)}(x)}{\sigma^{\nu/2+2+a}} = \sum_{\nu=0}^{\infty} \frac{y^\nu}{\nu!}\, O_{\nu,0}^{(a,0)} = \int_0^{\infty} \frac{dx}{\sigma^{a+2}} \frac{1}{[1 + y^2 + \frac{2xy}{\sqrt{\sigma}}]^{a+1}} + \int_0^{\infty} \frac{dx}{\sigma^{a+2}} \frac{1}{[1 + y^2 - \frac{2xy}{\sqrt{\sigma}}]^{a+1}}, \tag{42}$$
which is an even function of $y$. If the $Q_\nu^{(0,-a)}(x)$ are orthogonal to $Q_0^{(0,-a)}(x) = 1$ then the sum in Eq. (42) will collapse to its first term and $I_0$ is a constant. It is quite a surprise that this actually happens in the interval $-1 \le y \le 1$ for all parameter values $a$ for which the integral $I_0$ converges. For example, $I_0(y) = r(a)\pi = \text{const.}$ with a rational number $r(a)$ that depends on the exponent $a$, where $r(0) = 1/2$, $r(1) = 3/8$, $r(2) = 5/16$, $r(3) = 35/128$, $r(4) = 3^2 \cdot 7/2^8$, if $a$ is a non-negative integer; in general $I_0(y) = \sqrt{\pi}\,\Gamma(a + 3/2)/\Gamma(a + 2)$. For $|y| > 1$, $I_0(y)$ varies and deviates from the constant value. From the structure of the integral, this anomalous behavior of $I_0$ is rather unexpected.

Since for $\alpha = 0$ parity is conserved in the ODE (11), the orthogonality integrals $O_{2\nu+1,0}^{(a,0)}$ are zero and $O_{2\nu+1,2\mu}^{(a,0)} = 0$ more generally. Since it is shown in ref. [5] that $O_{2\nu,2\mu}^{(a,0)}$ for $\mu \ne \nu$ vanish, the $Q_\nu^{(0,-a)}(x)$ polynomials are orthogonal in the conventional sense. Since each $C_l^{(0,-a)}(x)$ is proportional to $Q_l^{(0,-a)}(x)$, the $C_l^{(0,-a)}(x)$ polynomials are orthogonal. This is confirmed by $I_0$, and its constancy in the interval $-1 \le y \le 1$ is thus proved. The restriction to parameter value $\alpha = 0$ can be removed:
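$$I_1(y) = \int_{-\infty}^{\infty} \frac{dx}{\sigma(x)^{a+2}}\, Q\Big( x, \frac{y}{\sqrt{\sigma}}; \alpha, -a \Big)\, e^{-\alpha \cot^{-1} x} = \sum_{\nu=0}^{\infty} \frac{y^\nu}{\nu!}\, O_{\nu,0}^{(a,\alpha)} = \int_{-\infty}^{\infty} \frac{dx\, e^{-\alpha \cot^{-1}(x + y\sqrt{\sigma})}}{\sigma^{a+2}\, [1 + y^2 + \frac{2xy}{\sqrt{\sigma}}]^{a+1}} = \int_0^{\infty} \frac{dx\, e^{-\alpha \cot^{-1}(x + y\sqrt{\sigma})}}{\sigma^{a+2}\, [1 + y^2 + \frac{2xy}{\sqrt{\sigma}}]^{a+1}} + \int_0^{\infty} \frac{dx\, e^{-\alpha \cot^{-1}(-x + y\sqrt{\sigma})}}{\sigma^{a+2}\, [1 + y^2 - \frac{2xy}{\sqrt{\sigma}}]^{a+1}}, \tag{43}$$
which is neither even in $y$ nor independent of $y$. Therefore, if we wish to find the normalizations of the $Q_\nu^{(\alpha,-a)}$ polynomials we have to split up the generating function into its even and odd parts in $y$ and integrate them separately, each with the proper power of $\sigma(x)$ in the orthogonality integral.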
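The constancy of $I_0(y)$ on $-1 \le y \le 1$ discussed above is easy to probe numerically. The sketch below rests on a substitution that is ours and not part of the text: with $x = \tan\phi$ the integral in Eq. (42) becomes $I_0(y) = \int_{-\pi/2}^{\pi/2} \cos^{2a+2}\phi\,(1 + y^2 + 2y\sin\phi)^{-(a+1)}\,d\phi$, which is then approximated by a midpoint rule. The function names and the number of nodes are hypothetical choices.

```python
import math

def i0(y, a, n=4000):
    """Midpoint-rule approximation of I_0(y) after the substitution x = tan(phi)."""
    h = math.pi / n
    total = 0.0
    for j in range(n):
        phi = -math.pi / 2.0 + (j + 0.5) * h
        total += (math.cos(phi) ** (2 * a + 2)
                  / (1.0 + y * y + 2.0 * y * math.sin(phi)) ** (a + 1))
    return total * h

def i0_constant(a):
    """Value sqrt(pi) * Gamma(a + 3/2) / Gamma(a + 2) quoted in the text."""
    return math.sqrt(math.pi) * math.gamma(a + 1.5) / math.gamma(a + 2.0)

for a in (0, 1, 2):
    for y in (0.0, 0.3, -0.6, 2.0):
        # |y| < 1 should reproduce the constant; y = 2 should deviate from it.
        print(a, y, i0(y, a), i0_constant(a))
```

In the variable $t = -\sin\phi$ the integrand is the Gegenbauer generating function $(1 - 2ty + y^2)^{-(a+1)}$ times the weight $(1 - t^2)^{a+1/2}$, which is one way to see why only the $\nu = 0$ term survives for $|y| < 1$.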
4
Relations with Gegenbauer Polynomials
The relation of the Romanovski polynomials as complexified Jacobi polynomials on the unit circle in the complex plane is described in detail in ref. [5]. Therefore, we focus here on relations with Gegenbauer polynomials.
H.J. Weber / Central European Journal of Mathematics 5(3) 2007 581–595
We start with the simplest case of parameter values a = 0 = α, which also happens to be relevant for physics [4], to derive from the generating function an expression for $Q_\nu^{(\alpha,-a)}(x)$ in terms of a finite sum of Gegenbauer polynomials. For a = 0 = α, Eq. (19) takes the explicit form
\[
Q(x, y; 0, 0) = (1 + x^2)\,[1 + (x + y(1 + x^2))^2]^{-1} = \frac{1}{1 + 2xy + y^2(1 + x^2)}. \tag{44}
\]
Theorem 4.1. The $Q_m^{(0,0)}(x)$ polynomials have the expansion into Gegenbauer polynomials
\[
Q_m^{(0,0)}(x) = m! \sum_{n=0}^{[m/2]} (-1)^n x^{2n}\, C_{m-2n}^{(n+1)}(-x), \qquad m = 0, 1, \ldots. \tag{45}
\]
Proof. For $|xy|^2/|1 + 2xy + y^2| < 1$, the generating function identity (44) may be expanded as an absolutely converging geometric series
\[
Q(x, y; 0, 0) = \sum_{n=0}^{\infty} \frac{(-x^2 y^2)^n}{(1 + 2xy + y^2)^{n+1}}. \tag{46}
\]
Substituting the generating function of the Gegenbauer polynomials $C_l^{(n+1)}(x)$ [7],
\[
(1 - 2xy + y^2)^{-(n+1)} = \sum_{l=0}^{\infty} C_l^{(n+1)}(x)\, y^l, \tag{47}
\]
we obtain the expansion
\[
Q(x, y; 0, 0) = \sum_{n=0}^{\infty} (-x^2 y^2)^n \sum_{l=0}^{\infty} C_l^{(n+1)}(-x)\, y^l
= \sum_{m=0}^{\infty} y^m \sum_{n=0}^{[m/2]} (-1)^n x^{2n}\, C_{m-2n}^{(n+1)}(-x), \tag{48}
\]
where m = l + 2n was used upon interchanging the summations, with [m/2] the integer part of m/2. On comparing with Eq. (19) defining the generating function Q(x, y; α, −a) we obtain the expansion of the $Q_m^{(0,0)}(x)$ polynomials as a finite sum of Gegenbauer polynomials of Theorem 4.1.
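As a quick check of Eq. (45), the cases m = 1 and m = 2 can be worked out by hand. The sketch below uses only the standard low-order Gegenbauer polynomials $C_0^{(\lambda)}(t) = 1$, $C_1^{(\lambda)}(t) = 2\lambda t$ and $C_2^{(1)}(t) = 4t^2 - 1$, and compares the result with the Taylor coefficients in y of the right-hand side of Eq. (44):

```latex
\begin{align*}
Q_1^{(0,0)}(x) &= 1!\, C_1^{(1)}(-x) = -2x, \\
Q_2^{(0,0)}(x) &= 2!\,\bigl[ C_2^{(1)}(-x) - x^2\, C_0^{(2)}(-x) \bigr]
                = 2\bigl[(4x^2 - 1) - x^2\bigr] = 6x^2 - 2, \\
\frac{1}{1 + 2xy + y^2(1 + x^2)}
  &= 1 - 2xy + \bigl[(2x)^2 - (1 + x^2)\bigr]\, y^2 + O(y^3)
   = 1 + \frac{(-2x)}{1!}\, y + \frac{6x^2 - 2}{2!}\, y^2 + O(y^3).
\end{align*}
```

The coefficients of y and y² agree with $Q_1^{(0,0)}/1!$ and $Q_2^{(0,0)}/2!$, as the generating function requires.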
Since $Q_m^{(0,0)}(x) = K_m C_m^{(0,0)}(x)$, this result is also valid for the $C_m^{(0,0)}(x)$ polynomials. It can be generalized to parameter values $a \neq 0$:
Theorem 4.2. The $Q_N^{(0,-a)}(x)$ polynomials have the Gegenbauer polynomial expansion
\[
Q_N^{(0,-a)}(x) = N! \sum_{n=0}^{[N/2]} \binom{-a-1}{n}\, x^{2n}\, C_{N-2n}^{(n+a+1)}(-x). \tag{49}
\]
Proof. This relation follows from expanding the generating function
\[
\begin{aligned}
Q(x, y; 0, -a) &= \left[\frac{\sigma(x)}{1 + x^2 + y^2\sigma^2(x) + 2xy\,\sigma(x)}\right]^{a+1}
= \frac{1}{[1 + 2xy + y^2 + x^2 y^2]^{a+1}} \\
&= \sum_{n=0}^{\infty} \binom{-a-1}{n} \frac{(x^2 y^2)^n}{[1 + 2xy + y^2]^{n+a+1}}
= \sum_{n,l=0}^{\infty} \binom{-a-1}{n} (x^2 y^2)^n\, C_l^{(n+a+1)}(-x)\, y^l \\
&= \sum_{N=0}^{\infty} y^N \sum_{n=0}^{[N/2]} \binom{-a-1}{n}\, x^{2n}\, C_{N-2n}^{(n+a+1)}(-x)
\end{aligned}
\tag{50}
\]
in terms of the binomial series and then again using the generating functions of the Gegenbauer polynomials.
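The chain of equalities in Eq. (50) can be spot-checked with exact rational arithmetic. The sketch below is an illustration under stated assumptions rather than the author's method: it generates the Taylor coefficients in y of $[1 + 2xy + y^2(1 + x^2)]^{-(a+1)}$ from the first-order ODE satisfied by a power of a quadratic, evaluates the Gegenbauer polynomials by the standard three-term recurrence, and compares with the right-hand side of Eq. (49) divided by N!. All helper names are hypothetical.

```python
from fractions import Fraction

def gegenbauer(l, lam, t):
    """C_l^{(lam)}(t) from the recurrence k C_k = 2t(k+lam-1) C_{k-1} - (k+2lam-2) C_{k-2}."""
    c_prev, c = Fraction(1), 2 * lam * t
    if l == 0:
        return c_prev
    for k in range(2, l + 1):
        c_prev, c = c, (2 * t * (k + lam - 1) * c - (k + 2 * lam - 2) * c_prev) / k
    return c

def gen_binom(z, n):
    """Generalized binomial coefficient binom(z, n) for rational z."""
    out = Fraction(1)
    for k in range(n):
        out *= (z - k) / Fraction(k + 1)
    return out

def series_coeffs(x, a, order):
    """Taylor coefficients in y of [1 + b y + c y^2]^(-lam), b = 2x, c = 1 + x^2, lam = a + 1."""
    lam, b, c = a + 1, 2 * x, 1 + x * x
    p = [Fraction(1)]
    for n in range(1, order + 1):
        prev2 = p[n - 2] if n >= 2 else 0
        p.append(-(b * (n - 1 + lam) * p[n - 1] + c * (n - 2 + 2 * lam) * prev2) / n)
    return p

def rhs_49(N, x, a):
    """Right-hand side of Eq. (49) divided by N!."""
    return sum(gen_binom(-a - 1, n) * x ** (2 * n) * gegenbauer(N - 2 * n, n + a + 1, -x)
               for n in range(N // 2 + 1))

# Exact comparison at one rational point and a non-integer parameter a.
x, a = Fraction(2, 3), Fraction(5, 2)
coeffs = series_coeffs(x, a, 8)
for N in range(9):
    assert coeffs[N] == rhs_49(N, x, a)
```

Setting a = 0 reduces the check to Theorem 4.1, since $\binom{-1}{n} = (-1)^n$.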
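Theorem 4.3. The $Q_N^{(\alpha,-a)}(x)$ polynomials have the general Gegenbauer polynomial expansion
\[
\frac{1}{N!}\, Q_N^{(\alpha,-a)}(x) = \sum_{\nu=0}^{N} \binom{N}{\nu}\, Q_{N-\nu}^{(\alpha,1)}(x) \sum_{n=0}^{[N/2]} \binom{-a-1}{n}\, x^{2n}\, C_{N-2n}^{(n+a+1)}(-x). \tag{51}
\]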
Proof. Substituting the expansion of Theorem 4.2 into Eq. (35), in which the Gegenbauer polynomials depend only on the parameter a while the $Q_\nu^{(\alpha,1)}(x)$ depend only on α, yields the desired expansion.
The Gegenbauer polynomials are well-known generalizations of Legendre polynomials. The hyperbolic Gegenbauer ODE
\[
\sigma(x)\, y'' - (2\lambda + 1)\, x\, y' + \Lambda_l^{(\lambda)}\, y = 0 \tag{52}
\]
becomes the ODE (11) for α = 0, ν = l and 2λ + 1 = 2(a + ν), so the solutions of Eq. (52) are the $Q_l^{(0,-(\lambda-l+1/2))}(x)$ polynomials. In fact, for α = 0 we can directly solve the ODE (11) for the $Q_\nu^{(0,-a)}(x)$ polynomial solutions in terms of finite power series.
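Theorem 4.4. The $Q_N^{(0,-a)}(x)$ polynomials have the explicit finite power series
\[
Q_N^{(0,-a)}(x) = \sum_{\mu=0}^{[N/2]} x^{N-2\mu}\, a_\mu, \qquad
a_\mu = -\frac{(N - 2\mu + 2)(N - 2\mu + 1)}{2\mu\,(2a + 2\mu + 1)}\, a_{\mu-1},
\]
\[
a_1 = -\frac{N(N-1)}{2(2a+3)}, \qquad
a_\mu = (-1)^\mu\, \frac{N(N-1)\cdots(N - 2\mu + 1)}{2^\mu\, \mu!\,(2a+3)(2a+5)\cdots(2a + 2\mu + 1)}, \tag{53}
\]
\[
Q_N^{(\alpha,-a)}(x) = \sum_{\nu=0}^{N} \binom{N}{\nu}\, Q_{N-\nu}^{(\alpha,1)}(x) \sum_{\mu=0}^{[N/2]} x^{N-2\mu}\, a_\mu \tag{54}
\]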
with $a_\mu$ from Eq. (53) and [N/2] denoting the integer part of N/2.
Proof. Since the proof by mathematical induction is straightforward, we just give the results. As the ODE is invariant under the parity transformation, x → −x, we have even and odd solutions. Substituting Eq. (53) in Eq. (35) yields the second relation stated in Theorem 4.4.
This is also valid for the $C_l^{(0,-a)}(x)$ polynomials up to the normalization $K_l$.
5
Auxiliary polynomials
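Carrying out the innermost derivative of the Rodrigues formula (1), we find
\[
P_l(x) = \frac{\sigma^l}{w_0}\, \frac{d^{l-1}}{dx^{l-1}} \left( \frac{w_0}{\sigma}\, [\alpha - \sigma'(a+1)] \right)
= \alpha\, Q_{l-1}^{(\alpha,-a-1)}(x) - (a+1)\, \frac{\sigma^l}{w_0}\, \frac{d^{l-1}}{dx^{l-1}} \left( \frac{\sigma' w_0}{\sigma} \right), \tag{55}
\]
and are led to define auxiliary polynomials:
\[
S_{l+1}(x) = \frac{\sigma(x)^{l+1}}{w_0(x)}\, \frac{d^l}{dx^l} \left( \frac{\sigma'(x)\, w_0(x)}{\sigma(x)} \right). \tag{56}
\]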
Example 5.1.
\[
S_1(x) = \sigma'(x), \qquad S_2(x) = \sigma''\sigma(x) + \sigma'(x)\,[\alpha - \sigma'(x)(a+2)], \ \ldots. \tag{57}
\]
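The expression for $S_2$ in Example 5.1 can be reproduced directly from the definition (56) with l = 1. The short derivation below assumes the logarithmic derivative $\sigma w_0'/w_0 = \alpha - \sigma'(a+1)$, which is what the bracket in Eq. (55) suggests; it is offered as a consistency check rather than as part of the original argument.

```latex
\begin{align*}
S_2(x) &= \frac{\sigma^2}{w_0}\,\frac{d}{dx}\!\left(\frac{\sigma' w_0}{\sigma}\right)
        = \frac{\sigma^2}{w_0}\left[\frac{\sigma'' w_0}{\sigma} + \frac{\sigma' w_0'}{\sigma}
          - \frac{\sigma'^2 w_0}{\sigma^2}\right] \\
       &= \sigma''\sigma + \sigma'\,\frac{\sigma w_0'}{w_0} - \sigma'^2
        = \sigma''\sigma + \sigma'\,[\alpha - \sigma'(a+1)] - \sigma'^2
        = \sigma''\sigma + \sigma'\,[\alpha - \sigma'(a+2)].
\end{align*}
```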
So
\[
S_l(x) = \frac{\alpha}{a+1}\, Q_{l-1}^{(\alpha,-a-1)}(x) - \frac{P_l(x)}{a+1}. \tag{58}
\]
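Applying a derivative to $w_0 S_l/\sigma^l$ yields
\[
\frac{d^l}{dx^l} \left( \frac{\sigma'(x)\, w_0(x)}{\sigma(x)} \right)
= \frac{\alpha}{a+1}\, \frac{d}{dx} \left( \frac{w_0}{\sigma^l}\, Q_{l-1}^{(\alpha,-a-1)}(x) \right)
- \frac{1}{a+1}\, \frac{d}{dx} \left( \frac{w_0\, P_l(x)}{\sigma^l} \right). \tag{59}
\]
Using the recursive ODEs for $Q_{l-1}^{(\alpha,-a-1)}$ and $P_l$ yields
\[
\frac{\sigma(x)^{l+1}}{w_0(x)}\, \frac{d^l}{dx^l} \left( \frac{\sigma'(x)\, w_0(x)}{\sigma(x)} \right)
= \frac{\alpha}{a+1} \left( Q_l^{(\alpha,-a-1)}(x) - \sigma'\, Q_{l-1}^{(\alpha,-a-1)}(x) \right) - \frac{Q_{l+1}^{(\alpha,-a)}(x)}{a+1}. \tag{60}
\]
A comparison of Eqs. (58,60) yields
\[
P_{l+1}(x) = \alpha\, \sigma'(x)\, Q_{l-1}^{(\alpha,-a-1)}(x) + Q_{l+1}^{(\alpha,-a)}(x), \tag{61}
\]
complementing the relation $P_l(x; l) = K_l C_l^{(\alpha,-a)}(x) = Q_l^{(\alpha,-a)}(x) = P_l(x)$.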
For Laguerre polynomials σ(x) = x and relation (56) corresponds to
\[
S_{l+1}(x) = x^{l+1} e^{x}\, \frac{d^l}{dx^l}\, \frac{e^{-x}}{x} = l!\, L_l^{-l-1}(x), \tag{62}
\]
while Eq. (55) becomes
\[
l\, L_l(x) = l\, L_{l-1}(x) - x\, L^{1}_{l-1}(x). \tag{63}
\]
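Eq. (63) is easy to spot-check with exact rational arithmetic. The sketch below assumes the standard explicit sum for the associated Laguerre polynomials, $L_n^k(x) = \sum_{i=0}^{n} (-1)^i \binom{n+k}{n-i} x^i / i!$ for integer k ≥ 0; the helper names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial

def laguerre(n, k, x):
    """Associated Laguerre polynomial L_n^k(x) for integers n, k >= 0 (explicit sum)."""
    return sum(Fraction((-1) ** i * comb(n + k, n - i), factorial(i)) * x ** i
               for i in range(n + 1))

def check_eq_63(l, x):
    """Return True if l L_l(x) = l L_{l-1}(x) - x L^1_{l-1}(x) holds exactly at x."""
    lhs = l * laguerre(l, 0, x)
    rhs = l * laguerre(l - 1, 0, x) - x * laguerre(l - 1, 1, x)
    return lhs == rhs

# Test the identity on a small grid of rational points and degrees.
print(all(check_eq_63(l, Fraction(p, 7)) for l in range(1, 10) for p in range(-5, 6)))
```

Because both sides are polynomials of degree l, agreement at more than l distinct points already forces the identity for that l.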
For Jacobi polynomials σ′(x) = −2x. As −2x = 1 − x − (1 + x), where 1 ± x can be incorporated into the weight functions $w(x) = (1 - x)^a (1 + x)^b$, there is no need to introduce auxiliary polynomials. For example, Eq. (55) becomes
\[
2l\, P_l^{(a,b)}(x) = (a + l)(1 + x)\, P_{l-1}^{(a,b+1)}(x) - (b + l)(1 - x)\, P_{l-1}^{(a+1,b)}(x). \tag{64}
\]
6
Discussion
We have used a simple and natural method for constructing polynomials $Q_\nu^{(\alpha,-a)}(x)$ that are complementary to the $C_n^{(\alpha,-a)}(x)$ polynomials and related to them by a Rodrigues formula. Similar to the classical orthogonal polynomials, the $Q_\nu^{(\alpha,-a)}(x)$ appear as solutions of a Sturm-Liouville ordinary second-order differential equation and obey Rodrigues formulas themselves. On the other hand, and different from the classical polynomials, their infinite sets of orthogonality integrals are not the standard ones. These real orthogonal polynomials and their nontrivial orthogonality properties are closely related to Romanovski polynomials and to physical phenomena. In summary, all basic properties of Romanovski polynomials derive from the Rodrigues formula (1) except for the orthogonality integrals.
Acknowledgments
It is a pleasure to thank M. Kirchbach for introducing me to the $C_n^{(\alpha,-a)}(x)$ polynomials. Thanks are also due to V. Celli for help with some of the orthogonality integrals.
References
[1] E.J. Routh: “On some properties of certain solutions of a differential equation of second order”, Proc. London Math. Soc., Vol. 16, (1884), pp. 245–261.
[2] V. Romanovski: “Sur quelques classes nouvelles de polynomes orthogonaux”, C. R. Acad. Sci. Paris, Vol. 188, (1929), pp. 1023–1025.
[3] N. Cotfas: “Systems of orthogonal polynomials defined by hypergeometric type equations with application to quantum mechanics”, Cent. Eur. J. Phys., Vol. 2, (2004), pp. 456–466; “Shape invariant hypergeometric type operators with application to quantum mechanics”, Preprint: arXiv:math-ph/0603032.
[4] C.B. Compean and M. Kirchbach: “The trigonometric Rosen-Morse potential in supersymmetric quantum mechanics and its exact solutions”, J. Phys. A-Math. Gen., Vol. 39, (2006), pp. 547–557.
[5] A. Raposi, H.J. Weber, D. Alvarez-Castillo and M. Kirchbach: “Romanovski polynomials in selected physics problems”, Cent. Eur. J. Phys., to be published.
[6] H.J. Weber: “Connections between real polynomial solutions of hypergeometric-type differential equations with Rodrigues formula”, Cent. Eur. J. Math., Vol. 5, (2007), pp. 415–427.
[7] G.B. Arfken and H.J. Weber: Mathematical Methods for Physicists, 6th ed., Elsevier-Academic Press, Amsterdam, 2005.
DOI: 10.2478/s11533-007-0015-3 Research article CEJM 5(3) 2007 596–606
On hypercentral groups B.A.F.Wehrfritz∗ School of Mathematical Sciences Queen Mary, University of London, Mile End Road London E1 4NS England
Received 6 March 2007; accepted 8 May 2007
Abstract: Let G be a hypercentral group. Our main result here is that if G/G′ is divisible by finite then G itself is divisible by finite. This extends a recent result of Heng, Duan and Chen [2], who prove in a slightly weaker form the special case where G is also a p-group. If G is torsion-free, then G is actually divisible. © Versita Warsaw and Springer-Verlag Berlin Heidelberg. All rights reserved.
Keywords: hypercentral group, divisible-by-finite group
MSC (2000): 20F19
1
Introduction
We consider here the influence of the abelianization G/G′ on the structure of a hypercentral group G (a group G is hypercentral if its upper central series, continued transfinitely if necessary, reaches G, the least ordinal for which it reaches G being the central height of G). For example, if G/G′ is divisible, then G is divisible; this is part of 9.23 of Robinson’s Ergebnisse [5], a result derived from work of S.N. Chernikov and G. Baumslag of some 12 to 26 years earlier. Robinson’s definitive account of the situation in 1972 is where the topic seems to have lain until very recently. In 2006 Heng, Duan and Chen [2] published the following very interesting theorem. Let G be a hypercentral p-group for some prime p. Then G is divisible by finite if and only if the index (G : G^p) is finite and then G is a finite extension of a divisible abelian group. If (G : G^p) is finite, then trivially (G : G′G^p) is finite and the latter happens only if G/G′ is divisible by finite (recall G is a p-group here). Our first goal is to give a short proof of the following strengthening of Heng, Duan and Chen’s theorem.
∗
E-mail:
[email protected]
B.A.F. Wehrfritz / Central European Journal of Mathematics 5(3) 2007 596–606
Theorem 1.1. Let G be a hypercentral p-group for some prime p. Then the index (G : G′G^p) is finite if and only if G is divisible by finite and then G is divisible-abelian by finite with central height less than ω2 but not ω. If also G is nilpotent, then G is actually divisible-central by finite.
As an example of a group G as in Theorem 1.1, consider the infinite locally dihedral 2-group (see [3] p. 47). Here G is a 2-group, (G : G′) = (G : G^2) = 2 and G has central height ω + 1. In particular G is not centre by finite. Further G′ is a Prüfer 2-group and so is divisible abelian.
The account in [5] quoted above relates to hypercentral groups in general and not just to p-groups. The obvious question is to what extent Theorem 1.1 extends to arbitrary hypercentral groups. This is answered by the following, whose proof is more involved than that of Theorem 1.1.
Theorem 1.2. Let G be a hypercentral group. Then G is divisible by finite if and only if G/G′ is divisible by finite. If G is torsion-free with G/G′ divisible by finite, then G is divisible.
Since there exist divisible hypercentral groups with arbitrarily large central heights, unlike in Theorem 1.1, here there can be no bound on the central height of G, nor need there be a divisible abelian (or for that matter divisible of some bounded central height) subgroup of G of finite index (cf. Lemma 3.1 below). For example, Ref. [6] (p. 421, Remark e) constructs hypercentral unipotent finitary linear groups with unbounded central heights and choosing the ground field to be the rationals yields divisible such groups; alternatively one can use McLain’s construction in [4] Section 4. However quite a bit can be said about the structure of a group as in Theorem 1.2 and the following collects together a number of bits and pieces. For any group G we denote its upper central series by {ζ_α(G)}_{0≤α≤σ}, where σ is the central height of G, and its hypercentre by ζ(G).
Theorem 1.3. Let the group G be hypercentral and divisible by finite. Then G has a unique maximal divisible subgroup H and a unique maximal periodic subgroup T and they satisfy the following.
a) G/T is divisible and G/(H ∩ T) = H/(H ∩ T) × T/(H ∩ T).
b) H ∩ T is a divisible central subgroup of H and H/(H ∩ T) is divisible and torsion-free.
c) T/(H ∩ T) is finite, H ∩ T ≤ ζ_ω(G) and T ≤ ζ_{ω+r}(G) for some integer r ≥ 0.
Much of the work for Theorem 1.2 beyond that needed for Theorem 1.1 involves groups of finite exponent; that is, we will be checking the orders of elements rather than the orders of groups. On the way to proving Theorem 1.2 we will pass the following.
Theorem 1.4. Let G be a hypercentral group with G/G′ divisible by of finite exponent (equivalently, with G′G^n/G′ divisible for some positive integer n).
a) If G is nilpotent, then G is divisible by of finite exponent (dividing n^c, where n is as above and c is the nilpotency class of G).
b) If G is torsion-free, then G is divisible.
c) There exist periodic examples of such groups G with G/G′ of exponent 2 and G not divisible by of finite exponent. Moreover there exist such G with central heights ω and ω + 1.
If G is a group with G/G′ divisible by finite there are two extreme cases. The first, where G/G′ is divisible, is covered extensively by Robinson in [5]; the second, where G/G′ is finite, is easily settled.
Theorem 1.5. Let G be a hypercentral group with G/G′ finite. Then G is periodic, divisible-abelian by finite and has central height less than ω2 but not ω.
Of course the infinite locally dihedral 2-group is an example of a group as in Theorem 1.5. Note that by Theorem 1.4c) there is no analogous result for G/G′ of finite exponent.
Throughout this paper π denotes a set of primes and π′ its complementary set of primes. Robinson in [5] works with a set-up relativised to π. Specifically instead of divisible groups he works with π-divisible groups; that is, groups G such that for every x ∈ G and every π-number n (i.e. a positive integer whose prime divisors lie in π) there exists y ∈ G with x = y^n. We do the same. Thus the above theorems extend to a π version (though for Theorem 1.1 this adds nothing). The precise statements of these π-results we give below at the time of their proofs. The theorems of this introduction are then obtained by setting π equal to the set of all primes, a set that is natural to denote by ∅′. By a π-torsion-free group we mean one with no non-trivial elements of order a π-number and, of course, by a π-group we mean one in which every element has order a π-number.
We focus in this paper on divisibility. However there are other influences of G/G′ on a hypercentral group G recorded in the literature. The earliest seems to be Grün’s Lemma: if G/G′ = ⟨1⟩ then G = ⟨1⟩. Another easy result with a similar proof is that G is a π-group whenever G/G′ is a π-group. See [5] Vol. 1, p. 48.
2
Abelian Groups
The following are some simple facts about abelian groups that we will use later. We write these groups additively. Lemma 2.1. Let G be an abelian group and H a subgroup of G of exponent n with G/H π-divisible. Then nG is π-divisible. Proof. For x −→ nx is a homomorphism of G onto nG with kernel containing H. Then nG, as an image of G/H, is π-divisible.
Lemma 2.2. Let G be a finite by π-divisible, abelian group. Then G is π-divisible by finite.
Proof. Let H be a finite subgroup of G with G/H π-divisible. If n is a π-number, then H + nG = G and so (G : nG) ≤ |H|. Pick n with (G : nG) maximal. Then nG = mnG for every π-number m. Thus nG is π-divisible, while (G : nG) ≤ |H| is finite.
Lemma 2.3. Let G be an abelian p-group for some prime p. If (G : pG) is finite, then G is divisible by finite. (Conversely if G is divisible by finite, then (G : pG) is finite and G is a direct sum of a divisible group and a finite group.)
Proof. Suppose (G : pG) is finite. If G = C ⊕ H, where C is a non-trivial cyclic group, then (H : pH) = (G : pG)/p < (G : pG). Thus the conclusion will follow from induction on (G : pG). If no such C exists then every element of G of order p has infinite height and G is divisible (see Ref. [1] 27.2 and Remark C on p. 98). The result follows.
Lemma 2.4. If the abelian group G is π-divisible by (of finite exponent / resp. finite), then G is π-divisible by (a π-group of finite exponent / resp. a finite π-group).
Proof. For G has a maximal π-divisible subgroup H with G/H of finite exponent. Then ⟨0⟩ is the only π-divisible subgroup of G/H. Therefore G/H is a π-group. In the second case G/H is actually finite and the second claim follows.
Lemma 2.5. If H is a π-divisible subgroup of the abelian group G with G/H a π-group, then G splits over H.
Proof. Copy the proof of Ref. [1] 21.1 or see Remark K on p. 223.
Lemma 2.5 is an example where the full divisible result does not extend to the π-divisible case in general. For if we take π = ∅′ in 2.5 the hypothesis on G/H is that G/H is periodic, whereas in fact no such hypothesis is needed in this case. We remark in passing that although an abelian group G with G = pG for every prime p is divisible, an abelian group with (G : pG) finite for every prime p need not be divisible by finite.
Indeed even if such a group is torsion-free and reduced it need not have finite rank.
3
The Heng-Duan-Chen theorem
Lemma 3.1. Let G be a locally nilpotent group with an abelian normal subgroup A satisfying G/CG (A) finite. a) Then A ≤ ζω (G) and if G is abelian by finite, then G has central height less than ω2. b) Suppose A is torsion-free. Then A ≤ ζ1 (G) and if A has finite index in G, then G
is nilpotent.
Proof. a) Let X be a finitely generated subgroup of G with G = X.C_G(A). Then X is nilpotent, say of class c. Clearly A ∩ ζ_1(X) ≤ ζ_1(G). A simple induction yields that A ∩ ζ_c(X) ≤ ζ_c(G). Thus A ∩ X ≤ ζ_c(G) and, since G = ⋃_X X, we conclude that A ≤ ζ_ω(G). The second part follows at once.
b) Here A is torsion-free, so with X as in a) above, A ∩ X is free abelian. Now unipotent groups in characteristic zero are torsion-free and G/C_G(A) acts unipotently on A ∩ X. Therefore A ∩ X ≤ ζ_1(G) and the rest follows.
Lemma 3.2. Let G be a residually finite, locally nilpotent p-group with (G : G′G^p) finite. Then G is finite.
Proof. For G is locally finite and G = XG′G^p for some finite subgroup X of G. If φ is a homomorphism of G onto a finite group, then (G′G^p)φ is the Frattini subgroup of Gφ, Gφ = Xφ.(G′G^p)φ = Xφ and |Gφ| ≤ |X|. This is for all such φ and G is residually finite. Consequently |G| ≤ |X| and G = X is finite.
The Proof of Theorem 1.1. If X is any subgroup of G set θX = X′X^p and N = ⋂_{1≤k<∞} θ^kG (where θ^2G = θ(θG) etc.). Then G/θ^kG is soluble of finite exponent. It is also finite, for induction on k yields that G/θ^{k+1}G is elementary-abelian by finite and hence residually finite and so finite by 3.2. Therefore G/N is residually finite and so finite by 3.2 again; thus N = θ^kG = θ^{k+1}G for some k. By [5] 9.23 the group N is abelian and divisible. Clearly G = XN for some finite subgroup X of G. If G has central height ≤ ω, then X lies in some finitely indexed term of the upper central series of G and hence G is nilpotent. Thus N is central in G by [3] 1.F.1 or [5] 3.13. Finally G has central height less than ω2 by 3.1.
Lemma 3.3. Let G be a periodic hypercentral group with G/G′ divisible by finite. Then G is divisible-abelian by finite and of central height less than ω2 and not ω.
Proof. Now G is a direct product ×_p G_p, where G_p is a p-group and p ranges over all primes. Clearly G/G′ is isomorphic to the direct product of the G_p/G_p′. Thus for almost all p we have G_p/G_p′ divisible and for these p the group G_p is divisible abelian by Ref. [5] 9.23. For the remaining primes G_p is divisible-abelian by finite by Theorem 1.1 of central height less than ω2 and not ω. The claim follows.
If G is a π′-group, then G and G/G′ are π-divisible, so clearly there can be no restriction on the central height of such G, unlike in 3.3.
Lemma 3.4. Let G be a periodic hypercentral group with G/G′ π-divisible by finite. Then G is (π-divisible with its π-component abelian) by (a finite π-group).
Proof. Here G = H × K, where H is a π-group and K is a π′-group. Then H/H′, as an image of G/G′, is divisible by finite. Thus H is divisible-abelian by a finite π-group by 3.3. As a π′-group K is always π-divisible. The claims follow.
If G is nilpotent in Theorem 1.1 then G is central by finite. Here is a further condition that implies this conclusion; compare Theorem 3.3 of Ref. [2].
Lemma 3.5. Let G be a locally nilpotent, FC p-group with (G : G′G^p) finite. Then G is a finite extension of a central divisible subgroup.
Proof. As an FC group, G modulo its centre Z is residually finite (e.g. Ref. [2], 4.31). Thus G/Z is finite by 3.2, the group G is nilpotent and an application of Theorem 1.1 yields the result.
4
Groups G with G/G′ finite
Here we prove Theorem 1.5. Thus G is a hypercentral group with G/G′ finite. Suppose first that G is also torsion-free. If z ∈ ζ_2(G), then [G, z] ≤ ζ_1(G) is torsion-free and, as an image of G/G′, is finite. Thus [G, z] = ⟨1⟩ and ζ_1(G) = ζ_2(G) = G. But then G ≅ G/G′ is finite and torsion-free, so G = ⟨1⟩.
The above shows that in general G is periodic, so G = ×_p G_p, where G_p is a p-group for each prime p. Clearly G/G′ ≅ ×_p (G_p/G_p′). Thus G_p = G_p′ for almost all p and for such p we have G_p = ⟨1⟩ (e.g. by Grün’s Lemma). If G_p ≠ G_p′ then G_p is a finite extension of a divisible abelian p-group by Theorem 1.1, of central height less than ω2 and not ω. Since there are only a finite number of primes of this type, the theorem follows.
Although we have not recorded it in the statement of Theorem 1.5, it is clear from the above that if G/G′ is also a π-group, then G too is a π-group.
5
Groups G with G/G′ of finite exponent
We begin with some general and quite likely well-known lemmas.
Lemma 5.1. Let M be a non-trivial normal subgroup of a group G with M ∩ ζ(G) ≠ ⟨1⟩. Then M ∩ ζ_1(G) ≠ ⟨1⟩.
Proof. For just pick the ordinal α minimal with M ∩ ζ_α(G) ≠ ⟨1⟩. Then β = α − 1 exists and [M ∩ ζ_α(G), G] ≤ M ∩ ζ_β(G) = ⟨1⟩. Consequently α = 1.
Lemma 5.2. Let ⟨1⟩ ≤ N ≤ M be a series in the group G stabilized by G with N of finite exponent n a π-number and with M/N abelian and π-divisible. Then M is abelian, M^n is π-divisible and [M, G] = ⟨1⟩.
Proof. Now M = M^nN, so [M, G] = [M^n, G] ≤ [M, G]^n ≤ N^n = ⟨1⟩. In particular
M is abelian. Finally M^n is π-divisible by 2.1.
Lemma 5.3. Let G be a hypercentral group and π any set of primes.
a) G has a unique maximal π-divisible subgroup δ_π(G).
b) δ_π(G/δ_π(G)) = ⟨1⟩.
c) If G is poly π-divisible-by-finite, then G is π-divisible by a finite π-group.
(If π is the set of all primes we write δ(G) for δ_π(G).)
Proof. a) Let D be the subgroup of G generated by all the π-divisible subgroups of G (of which ⟨1⟩ is one of course). Then D/D′ is abelian, generated by π-divisible subgroups and hence is π-divisible. Consequently D is π-divisible by Ref. [5] 9.23. Clearly then D is the unique maximal π-divisible subgroup δ_π(G) of G.
b) Set E/D = δ_π(G/D). Then E/E′ is abelian, π-divisible by π-divisible and hence π-divisible. Consequently E is π-divisible (Ref. [5] 9.23) and so E = D.
c) By a trivial induction we may assume that G is (π-divisible by finite) by (π-divisible by finite). Also we may pass to G/δ_π(G) and assume that δ_π(G) = ⟨1⟩. Consequently we have normal subgroups N ≤ H of G with N a finite π-group, H/N π-divisible and G/H finite. But then H/C_H(N) is finite, so H/N.C_H(N) is a finite π-divisible group and hence is a π′-group. Therefore C_H(N)/ζ_1(N) ≅ N.C_H(N)/N is π-divisible. Replacing H by C_H(N) we may assume that N ≤ ζ_1(H). Set M/N = ζ_1(H/N). Then M/N is π-divisible (Ref. [5] 9.23 again). By 5.2 the group M has finite exponent (recall δ_π(G) = ⟨1⟩) a π-number. Therefore M = N, H = N and |G| = |N|(G : H) is finite. Since δ_π(G) = ⟨1⟩, so G is a π-group.
Lemma 5.4. Let G be a group with G/G′ π-divisible by of finite exponent n. Assume G has no non-trivial π-divisible central subgroups.
a) For any integers m and i satisfying 1 ≤ i < m the factor A_i = [ζ_m(G), _iG]/[ζ_m(G), _{i+1}G] has exponent dividing n and [ζ_m(G), G] has exponent dividing n^{m−1}.
b) G/C_G(ζ_m(G)) is nilpotent of class less than m and of exponent dividing n^{m−1} for each positive integer m.
c) [ζ_ω(G), G] is a periodic π(n)-group and G/C_G(ζ_ω(G)) is residually (a nilpotent π(n)-group of finite exponent).
Here π(n) denotes the set of prime divisors of the positive integer n.
Proof. By 2.4 we may assume that n is a π-number, i.e. that π ⊇ π(n).
a) Now A_i is generated by images of G/G′ and hence has a π-divisible subgroup B_i ≥ A_i^n. Now A_{m−1} ≤ ζ_1(G), so B_{m−1} is a π-divisible central subgroup of G. Thus B_{m−1} = ⟨1⟩ = A_{m−1}^n. Suppose A_j^n = ⟨1⟩ for all j satisfying i < j < m. Then [ζ_m(G), _{i+1}G] has exponent dividing n^{m−i−1}. Define H_i ≤ G by H_i/[ζ_m(G), _{i+1}G] = B_i. By repeated use of 5.2 we can ‘move’ the π-divisible factor B_i down the series. Since ⟨1⟩ is the only π-divisible central subgroup
of G, the subgroup H_i has finite exponent a π-number, B_i = ⟨1⟩ and A_i^n = ⟨1⟩. By induction on m − i we now have that A_i^n = ⟨1⟩ whenever 1 ≤ i < m, from which it follows that [ζ_m(G), G] has exponent dividing n^{m−1}.
b) Now G/C_G(ζ_m(G)) stabilizes the series ζ_m(G) ≥ [ζ_m(G), G] ≥ [ζ_m(G), _2G] ≥ ... ≥ [ζ_m(G), _{m−1}G] ≥ ⟨1⟩. By Part a) all but the first factor of this series has exponent dividing n. Then b) follows from stability theory (e.g. [3] Section 1.C).
c) [ζ_ω(G), G] = ⋃_{m<ω} [ζ_m(G), G] and C_G(ζ_ω(G)) = ⋂_{m<ω} C_G(ζ_m(G)). Thus c) follows from a) and b).
Lemma 5.5. Let G be a nilpotent group of class c with G/G′ π-divisible by of finite exponent n. Then G has a π-divisible normal subgroup N with G/N a π-group of exponent dividing n^c.
Proof. By 2.4 we may assume n is a π-number. Trivially we may assume that c ≥ 2. Let {γ^iG}_{1≤i≤c+1} denote the lower central series of G. Each factor γ^iG/γ^{i+1}G is π-divisible by of exponent dividing n (e.g. by Ref. [5] 2.26; recall 2.5). Using 5.2 we may push central π-divisible factors down to the bottom of the series. The result follows.
Lemma 5.6. Let N be a π-divisible normal subgroup of the π-torsion-free group G contained in the hypercentre of G. Then G/N is also π-torsion-free.
Proof. For each ordinal α ≥ 0 set Z_α = N ∩ ζ_α(G). If possible pick α minimal subject to G/Z_α not being π-torsion-free. Let g ∈ G\Z_α with g^n ∈ Z_α for some π-number n. If α is a limit ordinal, then g^n ∈ Z_β for some β < α, which contradicts the minimal choice of α. Thus β = α − 1 exists. Since N is π-divisible, there exists h in N with h^n = g^n. Also each ζ_{γ+1}(G)/ζ_γ(G), and hence each Z_{γ+1}/Z_γ, is π-torsion-free (Ref. [5] 2.25), so N/Z_α is π-torsion-free. Consequently h ∈ Z_α and (gh^{−1})^n ∈ g^nh^{−n}Z_β = Z_β. By the minimal choice of α we have gh^{−1} ∈ Z_β, so g ∈ Z_α. This contradiction shows that no such α exists. Since N = Z_σ for some σ, the claim follows.
Lemma 5.7. Let G be a π-torsion-free hypercentral group with G/G′ π-divisible by of finite exponent. Then G is π-divisible.
Proof. By 5.6 (and 5.3) we may assume that δ_π(G) = ⟨1⟩. Then [ζ_2(G), G] has finite exponent a π-number by 2.4 and 5.4. But G is π-torsion-free. Hence [ζ_2(G), G] = ⟨1⟩, ζ_2(G) = ζ_1(G) and G is abelian. But G/G′ is π-divisible by a π-group, δ_π(G) = ⟨1⟩ and G is π-torsion-free. Therefore G = ⟨1⟩ and the result follows.
Lemma 5.8. There exist periodic hypercentral groups G_0 and G_1 with G_i/G_i′ of exponent 2 and G_i not divisible by of finite exponent, such that G_i has central height ω + i for i = 0, 1.
Lemmas 5.5, 5.7 and 5.8 suggest there is little more that can be said about hypercentral groups with G/G′ divisible by of finite exponent. Note that Theorem 1.4 is an immediate consequence of 5.5, 5.7 and 5.8.
Proof. Let D_j = ⟨g_j, a_j⟩ be a dihedral group of order 2^{j+1}, where g_j has order 2 and a_j has order 2^j. Let G_0 be the direct product of the D_j for j ≥ 1. It is easy to check that G_0 has all the required properties; actually G_0 has infinite exponent with δ(G_0) = ⟨1⟩. In the cartesian product ∏_{j≥1} D_j set G_1 = ⟨g, a_j : j ≥ 1⟩, where g = (g_j)_{j≥1}. Clearly G_1 is periodic hypercentral with G_1/G_1′ of exponent 2. Again δ(G_1) = ⟨1⟩ and G_1 has infinite exponent. Also ⟨a_j : j ≥ 1⟩ ≤ ζ_ω(G_1). If ζ_ω(G_1) = G_1, then g ∈ ζ_m(G_1) for some m < ω. But then a_j^2 = [g, a_j] ∈ ζ_m(G_1) for all j and this is clearly false for j > m + 1. Thus G_1 has central height ω + 1.
There is nothing special about the prime 2 here; one can construct examples G_0 and G_1 as in 5.8 that are p-groups for any prime p (based, for example, on subgroups of the wreath product of a cyclic group of order p^j by a cyclic group of order p).
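The construction of G_0 relies on the dihedral group D_j of order 2^{j+1} being nilpotent of class exactly j, so that the classes of the direct factors are unbounded while each element of G_0 lies in a finite term of the upper central series. The following brute-force sketch (an illustration, not part of the proof) computes the length of the upper central series of small dihedral groups; elements are encoded as pairs (k, s) standing for a^k g^s, and all function names are hypothetical.

```python
def mul(x, y, n):
    """Product a^k1 g^s1 * a^k2 g^s2 in the dihedral group of order 2n (g a g^-1 = a^-1)."""
    (k1, s1), (k2, s2) = x, y
    return ((k1 + (-k2 if s1 else k2)) % n, s1 ^ s2)

def inv(x, n):
    """Inverse of a^k g^s: reflections are involutions, rotations negate k."""
    k, s = x
    return (k, 1) if s else ((-k) % n, 0)

def commutator(x, y, n):
    """[x, y] = x^-1 y^-1 x y."""
    return mul(mul(inv(x, n), inv(y, n), n), mul(x, y, n), n)

def nilpotency_class(n):
    """Length of the upper central series of the dihedral group of order 2n."""
    group = [(k, s) for k in range(n) for s in (0, 1)]
    centre_term, steps = {(0, 0)}, 0
    while len(centre_term) < len(group):
        # Next term: elements whose commutators with everything lie in the current term.
        nxt = {z for z in group if all(commutator(z, g, n) in centre_term for g in group)}
        if nxt == centre_term:
            raise ValueError("upper central series stalls: group is not nilpotent")
        centre_term, steps = nxt, steps + 1
    return steps

# Class of D_j (order 2^(j+1), rotation subgroup of order 2^j) for small j.
for j in range(1, 6):
    print(j, nilpotency_class(2 ** j))
```

The search is quadratic in the group order per step, which is adequate for the small orders used here; it is meant only to make the growth of the class with j visible.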
6
Groups G with G/G′ divisible by finite
A certain amount about groups as in the title can be read off from the results of Section 5.
Lemma 6.1. Let G be a hypercentral group with G/G′ π-divisible by finite.
a) Suppose δ_π(G) = ⟨1⟩. Then G/C_G(ζ_m(G)) is a finite π-group for every positive integer m.
b) If G is nilpotent, then G is π-divisible by a finite π-group.
c) If G is π-torsion-free, then G is π-divisible.
Proof. If K is any image of G of finite exponent a π-number, then K is π-divisible by finite by 3.4. It follows that K is finite. Then a), b) and c) follow respectively from 5.4 b), 5.5 and 5.7.
Lemma 6.2. Let G be a hypercentral group with G/ζ_1(G) π-divisible. Then G′ is π-divisible.
In 6.2 consider for the moment the case where π is the set of all primes. Even with G/ζ_1(G) and G′ divisible there is no need for G to be divisible; just let G be any non-divisible abelian group. Also the converse of 6.2 does not hold; if G is the infinite locally dihedral 2-group, then G′ is divisible while G/ζ_1(G) ≅ G is not.
Proof. We may pass to G/δ_π(G′); equivalently assume δ_π(G′) = ⟨1⟩. If z ∈ ζ_2(G)\ζ_1(G), then the non-trivial subgroup [z, G] of G′, as an image of G/ζ_1(G), is π-divisible. By the initial reduction no such subgroup exists. Thus no such element z
exists, G is abelian and G =< 1 > is trivially π-divisible. Lemma 6.3. Let G be a hypercentral group with a periodic abelian normal subgroup T . Suppose that G/T is π-divisible, that G/G is π-divisible by finite and that δπ (G) =< 1 >. Then G is a finite π-group. Proof. By 6.1 a) the factor G/CG (T ∩ ζ2 (G)) is a finite π-group. Also as an image of G/T it is π-divisible. Hence it is trivial, so T ∩ ζ2 (G) = T ∩ ζ1 (G). Therefore T ≤ ζ1 (G) by 5.1 and so G is π-divisible by 6.2. But δπ (G) =< 1 >. Thus G is abelian, G ∼ = G/G is π-divisible by finite and G is a finite π-group. Lemma 6.4. Let G be a hypercentral group with G/G π-divisible by finite. Then G is π-divisible by a finite π-group. Proof. Let T denote the maximum periodic subgroup of G. Then G/T is torsion-free (Ref. [5] 2.25) and hence is π-divisible by 6.1c). Thus (G/T )/δπ (G/T ) is finite by 6.3 and hence G/T is π-divisible by finite. But G/T is torsion-free, so T /T is also π-divisible by finite and consequently T is π-divisible by finite by 3.4. Therefore G is π-divisible by a finite π-group by 5.3c). Theorem 1.2 is immediate from 6.4 and 6.1c). Let H/G be a divisible subgroup of G/G of finite index, where G is a hypercentral group. If H = G, then G is divisible by [5] 9.23. If we could deduce in general that H/H was divisible, then G would be divisible by finite immediately from [5] 9.23 again. Easy examples (e.g. G = X × Y where X =< 1 > is divisible abelian, Y is say dihedral of order 8 and H = X × Y ) show that H/H need not be divisible, even if H = G , so this direct approach will not work. (Note that the other extreme case where H = G is also immediate, but now from Theorem 1.5.) It is possible to prove 6.4 along the lines of our proof of 1.1 at least for groups of central height at most ω 2 using 6.1a) and the following variant of 3.2. Lemma 6.5. Let G be a locally nilpotent group with γ r G periodic for some positive integer r. 
Suppose G is residually a finite π-group and G/G′ is π-divisible by finite. Then G is a finite π-group. Proof. Let H/G′ be a π-divisible subgroup of G/G′ of finite index. Clearly G = XH for some finitely generated nilpotent subgroup X of G. Then X ∩ γ^r G is also finitely generated (X satisfies the maximal condition on subgroups) and hence is finite. Let φ be a homomorphism of G onto a finite π-group. Then Hφ = G′φ, so Gφ = Xφ·G′φ. But G′φ = (Gφ)′ lies in the Frattini subgroup of Gφ. Hence Gφ = Xφ and |(γ^r G)φ| = |γ^r(Gφ)| = |γ^r(Xφ)| = |(γ^r X)φ| ≤ |γ^r X| ≤ |X ∩ γ^r G| < ∞. Since G is residually a finite π-group, we obtain |γ^r G| = |X ∩ γ^r G| < ∞. But then G is nilpotent and π-divisible by finite by 6.1 b). Again G is residually a finite π-group. Therefore G is a finite π-group.
Lemma 6.6. Let G be a π-divisible by finite, hypercentral group. Set H = δπ (G) and let T denote the maximum π-subgroup of G. a) G/T is π-divisible and G/(H ∩ T ) = H/(H ∩ T ) × T /(H ∩ T ). b) H ∩ T is a central π-divisible subgroup of H and H/(H ∩ T ) is π-divisible and π-torsion-free. c) T /(H ∩ T ) is finite, H ∩ T ≤ ζω (G) and T ≤ ζω+r (G) for some integer r ≥ 0. Theorem 1.3 is an immediate consequence of 6.6. There is no bound possible on the central height of H/(H ∩ T ), even if π is the set of all primes (see [6] p. 421, Remark e) or [4] Section 4 again), so there is no bound on the central height in 6.6 of G itself. If G is π-torsion-free then G = H in 6.6 and G is π-divisible, a fact we already know from 6.1c). Proof. a) Now G/H is a finite π-group by 5.3c). Also G/T is π-divisible by 6.1c). Then G/HT is a finite π-divisible π-group, so G = HT . Hence G/(H ∩ T ) = H/(H ∩ T ) × T /(H ∩ T ). b) Clearly H ∩ T is the maximum π-subgroup of H and H/(H ∩ T ) is π-divisible and π-torsion-free. By [5] 9.23 the subgroup H ∩ T is central in H and is π-divisible. c) G/H is finite, so T /(H ∩ T ) is too. Also CG (H ∩ T ) ≥ H by Part b), so H ∩ T ≤ ζω (G) by 3.1. Consequently T ≤ ζω+r (G) for some r < (G : H). The proof is complete.
References
[1] L. Fuchs: Infinite Abelian Groups, Vol. 1, Academic Press, New York, 1970.
[2] L. Heng, Z. Duan and G. Chen: “On hypercentral groups G with |G : Gn | < ∞”, Comm. Algebra, Vol. 34, (2006), pp. 1803–1810.
[3] O.H. Kegel and B.A.F. Wehrfritz: Locally Finite Groups, North-Holland, Amsterdam, 1973.
[4] D.H. McLain: “Remarks on the upper central series of a group”, Proc. Glasgow Math. Assoc., Vol. 3, (1956), pp. 38–44.
[5] D.J.S. Robinson: Finiteness Conditions and Generalized Soluble Groups, Springer-Verlag, Berlin, 1972.
[6] B.A.F. Wehrfritz: “Nilpotence in finitary linear groups”, Michigan Math. J., Vol. 40, (1992), pp. 419–432.
DOI: 10.2478/s11533-007-0017-1 Research article CEJM 5(3) 2007 607–618
Frequent oscillation in a nonlinear partial difference equation
Jun Yang1,2∗, Yu Jing Zhang1, Sui Sun Cheng3†
1 College of Science, Yanshan University, Qinhuangdao Hebei 066004, P.R. China
2 Mathematics Research Center in Hebei Province, Shijiazhuang Hebei 050000, P.R. China
3 Department of Mathematics, Tsing Hua University, Hsinchu, Taiwan 30043, R.O. China
Received 23 October 2006; accepted 16 April 2007
Abstract: This paper is concerned with a class of nonlinear delay partial difference equations with variable coefficients, which may change sign. By making use of frequency measures, some new oscillatory criteria are established. This is the first time oscillation of these partial difference equations is discussed by employing frequency measures.
© Versita Warsaw and Springer-Verlag Berlin Heidelberg. All rights reserved.
Keywords: partial difference equations, frequency measures, frequently oscillatory solution, unsaturated solution
MSC (2000): 39A11
1
Introduction
To motivate what follows, consider interconnected neuron units placed on an arbitrarily large board. The time-dependent state values of the neuron units can be quite complicated if the interconnection and activation rule is not regular. Therefore in neural network design, we tend to work on interconnection and activation rules that are ‘local and uniform’. As an example, let Z be the set of integers, Z[k, l] = {i ∈ Z | i = k, k + 1, ..., l} and
J. Yang et al. / Central European Journal of Mathematics 5(3) 2007 607–618
$Z[k, \infty) = \{i \in Z \mid i = k, k+1, \dots\}$. Consider neuron units placed on $\Omega = Z[0,\infty) \times Z[0,\infty) \setminus \{(0,0)\}$. Let $u_{i,j}^{(t)}$ be the state value of the neuron unit during the time period $t \in \{0, 1, 2, \dots\}$ and placed at the lattice point $(i,j) \in \Omega$. Then a simple interconnection arrangement is provided by connecting each neuron unit with its four nearest neighbors, and a simple activation rule is the following
$$u_{i,j}^{(t+1)} = u_{i-1,j}^{(t)} + u_{i+1,j}^{(t)} + u_{i,j-1}^{(t)} + u_{i,j+1}^{(t)}, \quad i, j \in Z[0,\infty),$$
which states that the new state value after one period of time is given by the sum of the state values of its four neighbors. There are many questions related to the properties of the above simple model. In particular, it is of interest to seek steady state solutions $u_{i,j}^{(t)}$ which are time independent, so that $u_{i,j}^{(t)} = u_{i,j}$ for $i, j, t \in Z[0,\infty)$. This then leads us to the steady state equation
$$u_{i,j} = u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1}, \quad i, j \in Z[0,\infty).$$
If more distant neighbors and more general activation rules are allowed, then more general steady state equations may arise: e.g.,
$$u_{i,j} = f(u_{i+1,j}) + g(u_{i-1,j}) + p(u_{i,j+1}) + q(u_{i,j-1}) + F(u_{i+2,j}) + G(u_{i-2,j}) + \cdots + Q(u_{i,j-2}),$$
where $f, g, \dots, Q$ are functions that reflect the properties of the individual neuron units. In this paper, we are interested in the following steady state equation
$$u_{m+1,n} + u_{m,n+1} - u_{m,n} + p_{m,n} |u_{m-k_1,n-l_1}|^{\alpha} \operatorname{sgn} u_{m-k_1,n-l_1} + q_{m,n} |u_{m-k_2,n-l_2}|^{\beta} \operatorname{sgn} u_{m-k_2,n-l_2} = 0, \tag{1}$$
for $m, n \in Z[0,\infty)$, where
(H1) $\alpha \in [0,1)$ and $\beta \in (1,\infty)$,
(H2) $p = \{p_{m,n}\}_{m,n \in Z[0,\infty)}$ and $q = \{q_{m,n}\}_{m,n \in Z[0,\infty)}$ are real double sequences,
(H3) $k_1$, $k_2$, $l_1$ and $l_2$ are nonnegative integers such that $k_1 > k_2 \geq 0$ and $l_1 > l_2 \geq 0$.
In addition to (H1–H3), we will also assume
(H4) if $p = \{p_{m,n}\}$ has negative components, then $\alpha$ and $\beta$ are also chosen such that $(\beta-1)/(\beta-\alpha)$ is a quotient of positive odd integers, while if $q = \{q_{m,n}\}$ has negative terms, then $\alpha$ and $\beta$ are also chosen such that $(1-\alpha)/(\beta-\alpha)$ is a quotient of positive odd integers.
There are several reasons for studying this particular equation. One reason is that for the case where $p_{m,n} \geq 0$ and $q_{m,n} \geq 0$ for $m, n \in Z[0,\infty)$, some oscillatory properties of the corresponding equation have been derived [1], but when $p_{m,n}$ and $q_{m,n}$ may take on negative values, no results are available. There are other reasons as well. Indeed, the usual concepts of oscillation or stability of steady state solutions do not catch all their fine details, and it is necessary to use the concept of frequency measures introduced in [2] to provide better descriptions.
Indeed, our criteria differ from the classical ones in that they are described by means of the ‘measures’ of the level sets of the involved parameter sequences. Consequently, our theorems provide information on how frequently the solutions oscillate. Such results will then likely provide a better understanding of the original neural network designs.
Our plan is as follows. In the next section, we recall some of the terminology and basic results related to the frequency measures. Then we derive several criteria for all solutions of (1) to be frequently oscillatory or unsaturated. In the final section, we give some examples to illustrate our results. We pause to point out that our results, in contrast to those in [1], allow $p_{m,n}$ and $q_{m,n}$ to take on negative values as well. The reason is that we use frequency measures instead of the values of these double sequences. Furthermore, although the techniques here are similar to those in [2] and [5], we emphasize nonlinear partial difference equations, while [2] emphasizes ordinary difference equations of the form $\Delta(x_n + c_n x_{n-k}) = f(n, x_{n-l})$, $n = 0, 1, 2, \dots$, and [5] (and [6]) emphasize linear partial difference equations such as $u_{m,n+1} + a_{m,n} u_{m+k,n} - b_{m,n} u_{m,n-\delta} - c_{m,n} u_{m-l,n} = 0$, $(m,n) \in Z(-\infty,\infty) \times Z[0,\infty)$. As a consequence, the details here are quite different from those in [2, 5] or [6].
There are still unanswered questions related to equation (1). For instance, can we find positive solutions? Are solutions bounded? While there are some answers for special cases (see e.g. [3]), we leave the general case to be studied in the future. For now, let us settle on the definition of a solution of (1): A real double sequence $\{u_{m,n}\}$ defined for $(m,n) \in Z[-k_1,\infty) \times Z[-l_1,\infty)$ is a solution of (1) if substitution of $\{u_{m,n}\}$ into (1) renders it into an identity. For the sake of convenience, $Z[-k_1,\infty) \times Z[-l_1,\infty)$ will be denoted by $\Omega$ in the following section.
Given a double sequence $\{u_{m,n}\}$, the partial differences $u_{m+1,n} - u_{m,n}$ and $u_{m,n+1} - u_{m,n}$ will be denoted by $\Delta_1 u_{m,n}$ and $\Delta_2 u_{m,n}$ respectively.
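The definition of a solution can be checked numerically on a finite window. The following Python sketch evaluates the left-hand side of (1), written in the form used in the examples of Section 5 (all terms moved to one side). The helper names `sgn_pow`, `residual` and `max_relative_residual` are ours, and the test data (α = 1/2, β = 3/2, p_{m,n} = 2^{n−1/2}, q ≡ 1, delays (4, 2) and (1, 1), u_{m,n} = (−1)^m 2^n) are an assumption modelled on Example 5.1 rather than part of the definition.

```python
import math


def sgn_pow(x, exponent):
    """Return |x|**exponent * sgn(x), the nonlinearity appearing in (1)."""
    if x == 0:
        return 0.0
    return math.copysign(abs(x) ** exponent, x)


def residual(u, p, q, alpha, beta, k1, l1, k2, l2, m, n):
    """Left-hand side of (1) at the lattice point (m, n).

    u, p and q are callables taking integer arguments (i, j).
    """
    return (u(m + 1, n) + u(m, n + 1) - u(m, n)
            + p(m, n) * sgn_pow(u(m - k1, n - l1), alpha)
            + q(m, n) * sgn_pow(u(m - k2, n - l2), beta))


# Assumed test data, modelled on Example 5.1 of the paper.
def u(i, j):
    return (1.0 if i % 2 == 0 else -1.0) * 2.0 ** j


def p(i, j):
    return 2.0 ** (j - 0.5)


def q(i, j):
    return 1.0


def max_relative_residual(size):
    """Largest |residual| over 0 <= m, n < size, relative to the q-term."""
    worst = 0.0
    for m in range(size):
        for n in range(size):
            r = residual(u, p, q, 0.5, 1.5, 4, 2, 1, 1, m, n)
            scale = abs(u(m - 1, n - 1)) ** 1.5
            worst = max(worst, abs(r) / scale)
    return worst


print(max_relative_residual(10))
```

Because the terms grow like powers of two, the residual is compared with the size of the largest term instead of with zero; a value at the level of floating-point rounding indicates that the sequence satisfies the equation on the window.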
2
Preliminary
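The union, intersection and difference of two sets $A$ and $B$ will be denoted by $A + B$, $A \cdot B$ and $A \setminus B$ respectively. The number of elements of a set $S$ will be denoted by $|S|$. Let $\Phi$ be a subset of $\Omega$. Then $X^m \Phi = \{(i+m, j) \in \Omega \mid (i,j) \in \Phi\}$ and $Y^m \Phi = \{(i, j+m) \in \Omega \mid (i,j) \in \Phi\}$ are the translations of $\Phi$. Let $\alpha$, $\beta$, $\lambda$ and $\delta$ be integers satisfying $\alpha \leq \beta$ and $\lambda \leq \delta$. The union $\sum_{i=\alpha}^{\beta} \sum_{j=\lambda}^{\delta} X^i Y^j \Phi$ will be denoted by $X_\alpha^\beta Y_\lambda^\delta \Phi$. Clearly,
$$(i,j) \in \Omega \setminus X_\alpha^\beta Y_\lambda^\delta \Phi \iff (i-s, j-t) \in \Omega \setminus \Phi$$
for $\alpha \leq s \leq \beta$ and $\lambda \leq t \leq \delta$. For any $m, n \in Z[0,\infty)$, we set $\Phi^{(m,n)} = \{(i,j) \in \Phi \mid -k_1 \leq i \leq m, \ -l_1 \leq j \leq n\}$. If
$$\limsup_{m,n \to \infty} \frac{|\Phi^{(m,n)}|}{mn}$$
exists, then the superior limit, denoted by $\mu^*(\Phi)$, will be called the upper frequency measure of $\Phi$. Similarly, if
$$\liminf_{m,n \to \infty} \frac{|\Phi^{(m,n)}|}{mn}$$
exists, then the inferior limit, denoted by $\mu_*(\Phi)$, will be called the lower frequency measure of $\Phi$. If $\mu_*(\Phi) = \mu^*(\Phi)$, then the common limit is denoted by $\mu(\Phi)$ and is called the frequency measure of $\Phi$. Clearly, $\mu(\emptyset) = 0$, $\mu(\Omega) = 1$ and $0 \leq \mu_*(\Phi) \leq \mu^*(\Phi) \leq 1$ for any subset $\Phi$ of $\Omega$. Furthermore if $\Phi$ is finite, then $\mu(\Phi) = 0$. The following results are concerned with the frequency measures and their proofs are similar to those in [3].
Lemma 2.1. Let $\Phi$ and $\Gamma$ be subsets of $\Omega$. Then $\mu^*(\Phi + \Gamma) \leq \mu^*(\Phi) + \mu^*(\Gamma)$. Furthermore, if $\Phi$ and $\Gamma$ are disjoint, then
$$\mu_*(\Phi) + \mu_*(\Gamma) \leq \mu_*(\Phi + \Gamma) \leq \mu_*(\Phi) + \mu^*(\Gamma) \leq \mu^*(\Phi + \Gamma) \leq \mu^*(\Phi) + \mu^*(\Gamma),$$
so that $\mu_*(\Phi) + \mu^*(\Omega \setminus \Phi) = 1$.
Lemma 2.2. Let $\Phi$ be a subset of $\Omega$ and $\alpha$, $\beta$, $\lambda$ and $\delta$ be integers such that $\alpha \leq \beta$ and $\lambda \leq \delta$. Then
$$\mu^*\left(X_\alpha^\beta Y_\lambda^\delta \Phi\right) \leq (\beta - \alpha + 1)(\delta - \lambda + 1)\mu^*(\Phi)$$
and
$$\mu_*\left(X_\alpha^\beta Y_\lambda^\delta \Phi\right) \leq (\beta - \alpha + 1)(\delta - \lambda + 1)\mu_*(\Phi).$$
Lemma 2.3. Let $\Phi_1, \dots, \Phi_n$ be subsets of $\Omega$. Then
$$\mu^*\left(\sum_{i=1}^{n} \Phi_i\right) \leq \sum_{i=1}^{n} \mu^*(\Phi_i) - (n-1)\mu_*\left(\prod_{i=1}^{n} \Phi_i\right)$$
and
$$\mu_*\left(\sum_{i=1}^{n} \Phi_i\right) \leq \mu_*(\Phi_1) + \sum_{i=2}^{n} \mu^*(\Phi_i) - (n-1)\mu_*\left(\prod_{i=1}^{n} \Phi_i\right).$$
Lemma 2.4. Let $\Phi$ and $\Gamma$ be subsets of $\Omega$. If $\mu_*(\Phi) + \mu^*(\Gamma) > 1$, then the intersection $\Phi \cdot \Gamma$ is infinite.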
For any real double sequence $\{v_{i,j}\}$ defined on a subset of $\Omega$, the level set $\{(i,j) \in \Omega \mid v_{i,j} > c\}$ is denoted by $(v > c)$. The notations $(v \geq c)$, $(v < c)$, $(v \leq c)$ are similarly defined. Let $u = \{u_{i,j}\}_{(i,j) \in \Omega}$ be a real double sequence. If $\mu^*(u \leq 0) = 0$, then $u$ is said to be frequently positive, and if $\mu^*(u \geq 0) = 0$, then $u$ is said to be frequently negative. $u$ is said to be frequently oscillatory if it is neither frequently positive nor frequently negative. If $\mu^*(u > 0) = \omega \in (0,1)$, then $u$ is said to have unsaturated upper positive part, and if $\mu_*(u > 0) = \omega \in (0,1)$, then $u$ is said to have unsaturated lower positive part. $u$ is said to have unsaturated positive part if $\mu_*(u > 0) = \mu^*(u > 0) = \omega \in (0,1)$. The concepts of frequently oscillatory and unsaturated double sequences were introduced in [3–6]. It was also observed that if a double sequence $u = \{u_{i,j}\}_{(i,j) \in \Omega}$ is frequently oscillatory or has unsaturated positive part, then it is oscillatory, that is, $u$ is not positive for all large $m$ and $n$, nor negative for all large $m$ and $n$. Thus if we can show that every solution of (1) is frequently oscillatory or has unsaturated positive part, then every solution of (1) is oscillatory.
We remark that the concept of frequency measure introduced here is a natural one since a solution of (1) is a double sequence $\{u_{m,n}\}$ defined for $(m,n) \in \Omega = Z[-k_1,\infty) \times Z[-l_1,\infty)$, and $|\Phi^{(m,n)}|$ is then the number of lattice points of $\Phi$ inside the lattice rectangle $Z[-k_1,m] \times Z[-l_1,n]$, which contains $(m + k_1 + 1)(n + l_1 + 1)$ lattice points. For large $m$ and $n$, $(m + k_1 + 1)(n + l_1 + 1)$ is asymptotic to $mn$. Thus $|\Phi^{(m,n)}|/mn$ (instead of $|\Phi^{(m,n)}|/2mn$) measures the relative size of the part of $\Phi$ ‘inside’ the lattice rectangle.
We remark further that the well known asymptotic density in number theory is closely related to our concept of frequency measure. From the literature, a sequence $A = \{a_n\}_{n=1}^{\infty}$ of positive integers $a_1 < a_2 < \cdots$ has lower asymptotic density $\underline{\delta}(A)$ and upper asymptotic density $\overline{\delta}(A)$ defined by
$$\underline{\delta}(A) = \liminf_{n \to \infty} \frac{A(n)}{n}, \qquad \overline{\delta}(A) = \limsup_{n \to \infty} \frac{A(n)}{n},$$
where $A(n)$ denotes the number of integers of $A$ which are not greater than $n$. These concepts and related ones such as the Schnirelmann density of $A$ have been extensively used in number theory for stating properties of integer sequences (see e.g. [8–10]). However, here we are dealing with double sequences and the emphasis is changed to their values instead of their domains of definition. Hence the above definitions and their properties are needed.
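The ratio |Φ^(m,n)|/mn that defines the frequency measures can be computed directly on finite windows. The following Python sketch is only an illustration: the function name `window_ratio` and the two sample sets are our assumptions, and a finite window can suggest, but not prove, the value of a limit.

```python
def window_ratio(indicator, m, n, k1=0, l1=0):
    """|Phi^(m,n)| / (m*n), where Phi = {(i, j) : indicator(i, j)} and the
    window is -k1 <= i <= m, -l1 <= j <= n."""
    count = sum(1
                for i in range(-k1, m + 1)
                for j in range(-l1, n + 1)
                if indicator(i, j))
    return count / (m * n)


def positive_part(i, j):
    # Level set (u > 0) for the assumed sequence u_{i,j} = (-1)^i * 2^j.
    return i % 2 == 0


def sparse_lattice(i, j):
    # Points with i a multiple of 10 and j a multiple of 15 (i, j >= 0).
    return i >= 0 and j >= 0 and i % 10 == 0 and j % 15 == 0


for size in (100, 300, 600):
    print(size,
          window_ratio(positive_part, size, size),
          window_ratio(sparse_lattice, size, size))
```

As the window grows, the first ratio approaches 1/2 and the second approaches 1/150, in line with the observation that boundary rows and columns do not affect the limit.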
3
Frequently Oscillatory Solutions
An inequality, which can be found in [7], will be used in deriving the following results: If $x, y \geq 0$ and $c, d > 0$ such that $c + d = 1$, then $cx + dy \geq x^c y^d$.
Lemma 3.1. Suppose there exist $m_0 \geq 2k_1$ and $n_0 \geq 2l_1$ such that $p_{m,n} \geq 0$, $q_{m,n} \geq 0$ for $(m,n) \in Z[m_0 - 2k_1, m_0 + 1] \times Z[n_0 - 2l_1, n_0 + 1]$. Lemma 3.2. Let $\{u_{m,n}\}$ be a solution of (1). If $u_{m,n} \geq 0$ for $(m,n) \in Z[m_0 - 2k_1, m_0 + 1] \times Z[n_0 - 2l_1, n_0 + 1]$, then $\Delta_1 u_{m,n} \leq 0$, $\Delta_2 u_{m,n} \leq 0$ for $(m,n) \in Z[m_0 - k_1, m_0] \times Z[n_0 - l_1, n_0]$, and if $u_{m,n} \leq 0$ for $(m,n) \in Z[m_0 - 2k_1, m_0 + 1] \times Z[n_0 - 2l_1, n_0 + 1]$, then $\Delta_1 u_{m,n} \geq 0$, $\Delta_2 u_{m,n} \geq 0$ for $(m,n) \in Z[m_0 - k_1, m_0] \times Z[n_0 - l_1, n_0]$.
Proof. If $u_{m,n} \geq 0$ for $(m,n) \in Z[m_0 - 2k_1, m_0 + 1] \times Z[n_0 - 2l_1, n_0 + 1]$, then from (1),
$$u_{m,n} = u_{m+1,n} + u_{m,n+1} + p_{m,n} u_{m-k_1,n-l_1}^{\alpha} + q_{m,n} u_{m-k_2,n-l_2}^{\beta} \geq u_{m+1,n} + u_{m,n+1}.$$
Hence $\Delta_1 u_{m,n} \leq 0$, $\Delta_2 u_{m,n} \leq 0$ for $(m,n) \in Z[m_0 - k_1, m_0] \times Z[n_0 - l_1, n_0]$. Similarly, we may also show $\Delta_1 u_{m,n} \geq 0$, $\Delta_2 u_{m,n} \geq 0$ for $(m,n) \in Z[m_0 - k_1, m_0] \times Z[n_0 - l_1, n_0]$.
Let
$$\theta = \min\left\{\frac{\beta - \alpha}{\beta - 1}, \frac{\beta - \alpha}{1 - \alpha}\right\}.$$
Then since $\beta \in (1,\infty)$ and $\alpha \in [0,1)$, we see that $\theta > 1$. Also let
$$p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} = \left\{p_{m,n}^{\frac{\beta-1}{\beta-\alpha}} q_{m,n}^{\frac{1-\alpha}{\beta-\alpha}}\right\}_{m,n \in Z[0,\infty)}.$$
Under the assumption (H4), $p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}}$ is well defined. We remark that if $p_{m,n} \geq 0$ or $q_{m,n} \geq 0$, then the assumption (H4) may not be needed.
Theorem 3.3. Suppose there exist constants $\omega_1$, $\omega_2$ and $\omega_3$ such that $\mu^*(p < 0) = \omega_1$, $\mu^*(q < 0) = \omega_2$, $\mu_*((p < 0) \cdot (q < 0)) = \omega_3$ and
$$\mu_*\left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} > 1\right) > 4(k_1 + 1)(l_1 + 1)(\omega_1 + \omega_2 - \omega_3).$$
Then every nontrivial solution of (1) is frequently oscillatory.
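Proof. Suppose to the contrary that $u = \{u_{m,n}\}$ is a frequently positive solution of (1). Then $\mu^*(u \leq 0) = 0$. By Lemmas 2.1, 2.2 and 2.3, we have
$$\begin{aligned}
1 &= \mu^*\left(\Omega \setminus X_{-1}^{2k_1} Y_{-1}^{2l_1}[(p < 0) + (q < 0) + (u \leq 0)]\right) + \mu_*\left(X_{-1}^{2k_1} Y_{-1}^{2l_1}[(p < 0) + (q < 0) + (u \leq 0)]\right) \\
&\leq \mu^*\left(\Omega \setminus X_{-1}^{2k_1} Y_{-1}^{2l_1}[(p < 0) + (q < 0) + (u \leq 0)]\right) + 4(k_1 + 1)(l_1 + 1)\left\{\mu^*[(p < 0) + (q < 0)] + \mu^*(u \leq 0)\right\} \\
&\leq \mu^*\left(\Omega \setminus X_{-1}^{2k_1} Y_{-1}^{2l_1}[(p < 0) + (q < 0) + (u \leq 0)]\right) + 4(k_1 + 1)(l_1 + 1)(\omega_1 + \omega_2 - \omega_3) \\
&< \mu^*\left(\Omega \setminus X_{-1}^{2k_1} Y_{-1}^{2l_1}[(p < 0) + (q < 0) + (u \leq 0)]\right) + \mu_*\left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} > 1\right).
\end{aligned}$$
Therefore by Lemma 2.4, the intersection
$$\left(\Omega \setminus X_{-1}^{2k_1} Y_{-1}^{2l_1}[(p < 0) + (q < 0) + (u \leq 0)]\right) \cdot \left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} > 1\right)$$
is infinite. This implies that there exist $m_0 \geq 2k_1$ and $n_0 \geq 2l_1$ such that
$$\theta p_{m_0,n_0}^{\frac{\beta-1}{\beta-\alpha}} q_{m_0,n_0}^{\frac{1-\alpha}{\beta-\alpha}} > 1 \tag{2}$$
and
$$p_{m,n} \geq 0, \ q_{m,n} \geq 0, \ u_{m,n} > 0 \ \text{ for } (m,n) \in Z[m_0 - 2k_1, m_0 + 1] \times Z[n_0 - 2l_1, n_0 + 1]. \tag{3}$$
In view of (3) and Lemma 3.1, we may then see that $\Delta_1 u_{m,n} \leq 0$ and $\Delta_2 u_{m,n} \leq 0$ for $(m,n) \in Z[m_0 - k_1, m_0] \times Z[n_0 - l_1, n_0]$, and hence $u_{m_0-k_1,n_0-l_1} \geq u_{m_0-k_2,n_0-l_2} \geq u_{m_0,n_0}$, so that by (1) and (2),
$$\begin{aligned}
0 &\geq u_{m_0+1,n_0} + u_{m_0,n_0+1} - u_{m_0,n_0} + p_{m_0,n_0} u_{m_0-k_2,n_0-l_2}^{\alpha} + q_{m_0,n_0} u_{m_0-k_2,n_0-l_2}^{\beta} \\
&\geq u_{m_0+1,n_0} + u_{m_0,n_0+1} - u_{m_0,n_0} + \theta p_{m_0,n_0}^{\frac{\beta-1}{\beta-\alpha}} q_{m_0,n_0}^{\frac{1-\alpha}{\beta-\alpha}} u_{m_0,n_0} \\
&\geq \left(\theta p_{m_0,n_0}^{\frac{\beta-1}{\beta-\alpha}} q_{m_0,n_0}^{\frac{1-\alpha}{\beta-\alpha}} - 1\right) u_{m_0,n_0} > 0,
\end{aligned}$$
which is a contradiction. In a similar manner, if $u = \{u_{m,n}\}$ is a frequently negative solution of (1), so that $\mu^*(u \geq 0) = 0$, then we may show that
$$\left(\Omega \setminus X_{-1}^{2k_1} Y_{-1}^{2l_1}[(p < 0) + (q < 0) + (u \geq 0)]\right) \cdot \left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} > 1\right)$$
is infinite. Again we may arrive at a contradiction as above. The proof is complete.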
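The step that produces the factor θ in the proof can be read as follows (our reading, since the paper does not spell it out): with c = (β−1)/(β−α) and d = (1−α)/(β−α), applying the inequality from [7] to p x^α / c and q x^β / d gives p x^α + q x^β ≥ c^{−c} d^{−d} p^c q^d x for p, q ≥ 0 and x > 0, and c^{−c} d^{−d} ≥ min{1/c, 1/d} = θ. The Python sketch below tests this bound on random data; the function names and sampling ranges are assumptions chosen for illustration.

```python
import random


def theta(alpha, beta):
    """theta = min{(beta-alpha)/(beta-1), (beta-alpha)/(1-alpha)}."""
    return min((beta - alpha) / (beta - 1.0), (beta - alpha) / (1.0 - alpha))


def gap(p, q, x, alpha, beta):
    """p*x**alpha + q*x**beta minus theta * p**c * q**d * x."""
    c = (beta - 1.0) / (beta - alpha)
    d = (1.0 - alpha) / (beta - alpha)
    lhs = p * x ** alpha + q * x ** beta
    rhs = theta(alpha, beta) * p ** c * q ** d * x
    return lhs - rhs


def worst_relative_gap(trials=10000, seed=0):
    """Smallest value of gap / lhs over random positive p, q, x."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        alpha = rng.uniform(0.0, 0.99)
        beta = rng.uniform(1.01, 5.0)
        p = rng.uniform(0.01, 10.0)
        q = rng.uniform(0.01, 10.0)
        x = rng.uniform(0.01, 10.0)
        lhs = p * x ** alpha + q * x ** beta
        worst = min(worst, gap(p, q, x, alpha, beta) / lhs)
    return worst


print(worst_relative_gap())
```

A nonnegative result (up to rounding) is consistent with the bound. Equality can occur, for instance at α = 1/2, β = 3/2 and p = q = x = 1, so a small tolerance is needed when comparing floating-point values.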
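Theorem 3.4. Suppose there exist constants $\omega_1$, $\omega_2$ and $\omega_3$ such that
$$\mu^*(p < 0) = \omega_1, \quad \mu^*(q < 0) = \omega_2, \quad \mu^*\left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} \leq 1\right) = \omega_3$$
and
$$\mu_*\left((p < 0) \cdot (q < 0) \cdot \left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} \leq 1\right)\right) > \frac{\omega_1 + \omega_2 + \omega_3}{2} - \frac{1}{8(k_1 + 1)(l_1 + 1)}.$$
Then every nontrivial solution of (1) is frequently oscillatory.
Proof. Suppose to the contrary that $\{u_{m,n}\}$ is a frequently positive solution of (1). Then $\mu^*(u \leq 0) = 0$. By Lemmas 2.1, 2.2 and 2.3, we get
$$\begin{aligned}
&\mu_*\left(\Omega \setminus X_{-1}^{2k_1} Y_{-1}^{2l_1}\left[(p < 0) + (q < 0) + \left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} \leq 1\right) + (u \leq 0)\right]\right) \\
&\quad = 1 - \mu^*\left(X_{-1}^{2k_1} Y_{-1}^{2l_1}\left[(p < 0) + (q < 0) + \left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} \leq 1\right) + (u \leq 0)\right]\right) \\
&\quad \geq 1 - 4(k_1 + 1)(l_1 + 1)\left\{\mu^*\left[(p < 0) + (q < 0) + \left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} \leq 1\right)\right] + \mu^*(u \leq 0)\right\} \\
&\quad \geq 1 - 4(k_1 + 1)(l_1 + 1)\left\{\mu^*(p < 0) + \mu^*(q < 0) + \mu^*\left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} \leq 1\right) - 2\mu_*\left((p < 0) \cdot (q < 0) \cdot \left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} \leq 1\right)\right)\right\} > 0.
\end{aligned}$$
Thus, we know that
$$\Omega \setminus X_{-1}^{2k_1} Y_{-1}^{2l_1}\left[(p < 0) + (q < 0) + \left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} \leq 1\right) + (u \leq 0)\right]$$
is infinite. This implies that there exist $m_0 \geq 2k_1$ and $n_0 \geq 2l_1$ such that (2) and $p_{m,n} \geq 0$, $q_{m,n} \geq 0$, $u_{m,n} > 0$ hold for $(m,n) \in Z[m_0 - 2k_1, m_0 + 1] \times Z[n_0 - 2l_1, n_0 + 1]$. By similar discussions as in the proof of Theorem 3.3, we may arrive at a contradiction against (2). In case $u = \{u_{m,n}\}$ is frequently negative, then $\mu^*(u \geq 0) = 0$. In an analogous manner, we may see that
$$\Omega \setminus X_{-1}^{2k_1} Y_{-1}^{2l_1}\left[(p < 0) + (q < 0) + \left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} \leq 1\right) + (u \geq 0)\right]$$
is infinite. This leads to a contradiction again. The proof is complete.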
4
Unsaturated Solutions
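The methods used in the above proofs can be modified to obtain the following results for unsaturated solutions.
Theorem 4.1. Suppose $\mu^*(p < 0) = \omega_1$, $\mu^*(q < 0) = \omega_2$, $\mu_*((p < 0) \cdot (q < 0)) = \omega_3$ and there is $\omega_0 \in (0,1)$ such that
$$\mu_*\left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} > 1\right) > 4(k_1 + 1)(l_1 + 1)(\omega_1 + \omega_2 + \omega_0 - \omega_3).$$
Then every nontrivial solution of (1) has unsaturated upper positive part.
Proof. Let $u = \{u_{m,n}\}$ be a nontrivial solution of (1). We assert that $\mu^*(u > 0) \in (\omega_0, 1)$. Suppose not, then $\mu^*(u > 0) \leq \omega_0$ or $\mu^*(u > 0) = 1$. In the former case, applying arguments similar to the proof of Theorem 3.3, we may then arrive at the fact that
$$\left(\Omega \setminus X_{-1}^{2k_1} Y_{-1}^{2l_1}[(p < 0) + (q < 0) + (u > 0)]\right) \cdot \left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} > 1\right)$$
is infinite and a subsequent contradiction. In the latter case, we have $\mu_*(u \leq 0) = 0$. By Lemmas 2.1, 2.2 and 2.3, we have
$$\begin{aligned}
1 &= \mu^*\left(\Omega \setminus X_{-1}^{2k_1} Y_{-1}^{2l_1}[(p < 0) + (q < 0) + (u \leq 0)]\right) + \mu_*\left(X_{-1}^{2k_1} Y_{-1}^{2l_1}[(p < 0) + (q < 0) + (u \leq 0)]\right) \\
&\leq \mu^*\left(\Omega \setminus X_{-1}^{2k_1} Y_{-1}^{2l_1}[(p < 0) + (q < 0) + (u \leq 0)]\right) + 4(k_1 + 1)(l_1 + 1)\mu_*[(p < 0) + (q < 0) + (u \leq 0)] \\
&\leq \mu^*\left(\Omega \setminus X_{-1}^{2k_1} Y_{-1}^{2l_1}[(p < 0) + (q < 0) + (u \leq 0)]\right) + 4(k_1 + 1)(l_1 + 1)\left\{\mu^*[(p < 0) + (q < 0)] + \mu_*(u \leq 0)\right\} \\
&\leq \mu^*\left(\Omega \setminus X_{-1}^{2k_1} Y_{-1}^{2l_1}[(p < 0) + (q < 0) + (u \leq 0)]\right) + 4(k_1 + 1)(l_1 + 1)(\omega_1 + \omega_2 - \omega_3) \\
&< \mu^*\left(\Omega \setminus X_{-1}^{2k_1} Y_{-1}^{2l_1}[(p < 0) + (q < 0) + (u \leq 0)]\right) + \mu_*\left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} > 1\right).
\end{aligned}$$
Therefore by Lemma 2.4, we know that the set
$$\left(\Omega \setminus X_{-1}^{2k_1} Y_{-1}^{2l_1}[(p < 0) + (q < 0) + (u \leq 0)]\right) \cdot \left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} > 1\right)$$
is infinite. Then by discussions similar to those in the proof of Theorem 3.3 again, we may arrive at a contradiction. This completes the proof.
Combining the arguments of Theorems 3.4 and 4.1, we obtain the following Theorem 4.2, whose proof is omitted.
Theorem 4.2. Suppose there exist constants $\omega_1$, $\omega_2$, $\omega_3$ and $\omega_0 \in (0,1)$ such that
$$\mu^*(p < 0) = \omega_1, \quad \mu^*(q < 0) = \omega_2, \quad \mu^*\left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} \leq 1\right) = \omega_3$$
and
$$\mu_*\left((p < 0) \cdot (q < 0) \cdot \left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} \leq 1\right)\right) > \frac{\omega_1 + \omega_2 + \omega_3 + \omega_0}{2} - \frac{1}{8(k_1 + 1)(l_1 + 1)}.$$
Then every nontrivial solution of (1) has unsaturated upper positive part.
Theorem 4.3. Suppose there exist constants $\omega_1$, $\omega_2$, $\omega_3$, $\omega_4$ and $\omega_0 \in (0,1)$ such that
$$\mu^*(p < 0) = \omega_1, \quad \mu^*(q < 0) = \omega_2, \quad \mu^*\left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} \leq 1\right) = \omega_3,$$
$$\mu_*\left((p < 0) \cdot (q < 0) \cdot \left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} \leq 1\right)\right) = \omega_4$$
and
$$4(k_1 + 1)(l_1 + 1)(\omega_1 + \omega_2 + \omega_3 + \omega_0 - 2\omega_4) < 1.$$
Then every nontrivial solution of (1) has unsaturated upper positive part.
Proof. We assert that $\mu^*(u > 0) \in (\omega_0, 1)$. First, we prove that $\mu^*(u > 0) > \omega_0$. Otherwise, if $\mu^*(u > 0) \leq \omega_0$, by Lemma 2.1, Lemma 2.2 and Lemma 2.3, we have
$$\begin{aligned}
&\mu_*\left(\Omega \setminus X_{-1}^{2k_1} Y_{-1}^{2l_1}\left[(p < 0) + (q < 0) + \left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} \leq 1\right)\right]\right) + \mu^*\left(\Omega \setminus X_{-1}^{2k_1} Y_{-1}^{2l_1}[(u > 0)]\right) \\
&\quad = 2 - \mu^*\left(X_{-1}^{2k_1} Y_{-1}^{2l_1}\left[(p < 0) + (q < 0) + \left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} \leq 1\right)\right]\right) - \mu_*\left(X_{-1}^{2k_1} Y_{-1}^{2l_1}[(u > 0)]\right) \\
&\quad \geq 2 - 4(k_1 + 1)(l_1 + 1)\left\{\mu^*(p < 0) + \mu^*(q < 0) + \mu^*\left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} \leq 1\right) + \mu^*(u > 0) - 2\mu_*\left((p < 0) \cdot (q < 0) \cdot \left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} \leq 1\right)\right)\right\} > 1.
\end{aligned}$$
Hence, by Lemma 2.4, we see that
$$\left(\Omega \setminus X_{-1}^{2k_1} Y_{-1}^{2l_1}\left[(p < 0) + (q < 0) + \left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} \leq 1\right)\right]\right) \cdot \left(\Omega \setminus X_{-1}^{2k_1} Y_{-1}^{2l_1}[(u > 0)]\right)$$
is infinite. Then there exist $m_0 \geq 2k_1$ and $n_0 \geq 2l_1$ such that (2) and $p_{m,n} \geq 0$, $q_{m,n} \geq 0$, $u_{m,n} \leq 0$ hold for $(m,n) \in Z[m_0 - 2k_1, m_0 + 1] \times Z[n_0 - 2l_1, n_0 + 1]$. Applying similar discussions as in the proof of Theorem 3.3, we can get a contradiction. Next, we prove that $\mu^*(u > 0) < 1$. Otherwise, $\mu_*(u \leq 0) = 0$. Analogously, we see that
$$\left(\Omega \setminus X_{-1}^{2k_1} Y_{-1}^{2l_1}\left[(p < 0) + (q < 0) + \left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} \leq 1\right)\right]\right) \cdot \left(\Omega \setminus X_{-1}^{2k_1} Y_{-1}^{2l_1}[(u \leq 0)]\right)$$
is infinite. Then we again obtain a contradiction. The proof is complete.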
We remark that under the same conditions in Theorem 4.1, Theorem 4.2 or Theorem 4.3, we may show that every nontrivial solution of (1) has unsaturated lower positive parts as well.
5
Examples
We give several examples to illustrate our previous results.
Example 5.1. Consider the partial difference equation
$$u_{m+1,n} + u_{m,n+1} - u_{m,n} + 2^{n-\frac{1}{2}} |u_{m-4,n-2}|^{\frac{1}{2}} \operatorname{sgn} u_{m-4,n-2} + |u_{m-1,n-1}|^{\frac{3}{2}} \operatorname{sgn} u_{m-1,n-1} = 0. \tag{4}$$
From (4), we know that $\alpha = \frac{1}{2}$, $\beta = \frac{3}{2}$, $p_{m,n} = 2^{n-\frac{1}{2}}$, $q_{m,n} = 1$ and $\theta = \min\left\{\frac{\beta-\alpha}{\beta-1}, \frac{\beta-\alpha}{1-\alpha}\right\} = 2$. It is clear that
$$\mu_*\left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} > 1\right) = 1,$$
$$\mu^*(p < 0) = \mu^*(q < 0) = \mu_*((p < 0) \cdot (q < 0)) = \mu^*\left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} \leq 1\right) = 0$$
and
$$\mu_*\left((p < 0) \cdot (q < 0) \cdot \left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} \leq 1\right)\right) = 0.$$
Hence, by Theorem 3.3 or Theorem 3.4, every nontrivial solution of (4) is frequently oscillatory. Furthermore, let $\omega_0 \in (0, 1/60)$. We then see that all conditions in Theorem 4.1, Theorem 4.2 or Theorem 4.3 are satisfied. Consequently, every nontrivial solution of (4) has unsaturated upper positive part. As a matter of fact, $u = \{(-1)^m 2^n\}$ is such a solution with $\mu^*(u > 0) = 1/2$.
Example 5.2. Consider the partial difference equation
$$u_{m+1,n} + u_{m,n+1} - u_{m,n} + p_{m,n} |u_{m-3,n-2}|^{1/3} \operatorname{sgn} u_{m-3,n-2} + q_{m,n} |u_{m-2,n-1}|^{4/3} \operatorname{sgn} u_{m-2,n-1} = 0,$$
where
$$p_{m,n} = \begin{cases} -1 & m = 10s \text{ and } n = 15t; \ s, t \in Z[0,\infty) \\ 1 & \text{otherwise} \end{cases}$$
and $q_{m,n} = 1$ for $m, n \in Z[0,\infty)$. Here we have $\alpha = 1/3$, $\beta = 4/3$ and $\theta = 3/2$. Since
$$\mu^*(p < 0) = \mu^*\left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} \leq 1\right) = \frac{1}{150}, \qquad \mu_*\left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} > 1\right) = \frac{149}{150},$$
$$\mu^*(q < 0) = \mu_*((p < 0) \cdot (q < 0)) = 0, \qquad \mu_*\left((p < 0) \cdot (q < 0) \cdot \left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} \leq 1\right)\right) = 0,$$
by Theorem 3.3 or Theorem 3.4, every nontrivial solution is frequently oscillatory. Moreover, given $\omega_0 = 3/401$, by Theorem 4.1, Theorem 4.2 or Theorem 4.3, we may see that every nontrivial solution has unsaturated upper positive part.
Example 5.3. Consider the partial difference equation
$$u_{m+1,n} + u_{m,n+1} - u_{m,n} + p_{m,n} |u_{m-2,n-2}|^{3/5} \operatorname{sgn} u_{m-2,n-2} + q_{m,n} |u_{m-1,n-1}|^{7/3} \operatorname{sgn} u_{m-1,n-1} = 0,$$
where
$$p_{m,n} = \begin{cases} -3/2 & m = 9s \text{ and } n = 10t; \ s, t \in Z[0,\infty) \\ 1 & \text{otherwise} \end{cases}, \qquad q_{m,n} = \begin{cases} -1 & m = 10s \text{ and } n = 12t; \ s, t \in Z[0,\infty) \\ 1 & \text{otherwise} \end{cases}.$$
Here we have $\alpha = 3/5$, $\beta = 7/3$ and $\theta = 13/10$. Since
$$\mu^*(p < 0) = \frac{1}{90}, \qquad \mu^*(q < 0) = \mu^*\left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} \leq 1\right) = \frac{1}{120}, \qquad \mu_*\left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} > 1\right) = \frac{119}{120}$$
and
$$\mu_*((p < 0) \cdot (q < 0)) = \mu_*\left((p < 0) \cdot (q < 0) \cdot \left(\theta p^{\frac{\beta-1}{\beta-\alpha}} q^{\frac{1-\alpha}{\beta-\alpha}} \leq 1\right)\right) = \frac{1}{5400},$$
by Theorem 3.3 or Theorem 3.4, every nontrivial solution is frequently oscillatory. On the other hand, given $\omega_0 = 1/2701$, by Theorem 4.1, Theorem 4.2 or Theorem 4.3, every nontrivial solution has unsaturated upper positive part.
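The arithmetic behind Examples 5.2 and 5.3 can be checked with exact rational numbers. The Python sketch below recomputes θ and compares the stated measure of the set where θ p^{(β−1)/(β−α)} q^{(1−α)/(β−α)} > 1 with the bound 4(k1+1)(l1+1)(ω1+ω2+ω0−ω3) appearing in Theorems 3.3 (ω0 = 0) and 4.1. The function names and the dictionary layout are ours, and the measure values are simply those quoted in the examples, taken as given.

```python
from fractions import Fraction as F


def theta(alpha, beta):
    """theta = min{(beta-alpha)/(beta-1), (beta-alpha)/(1-alpha)}."""
    return min((beta - alpha) / (beta - 1), (beta - alpha) / (1 - alpha))


def bound(k1, l1, w1, w2, w3, w0=F(0)):
    """Right-hand side 4(k1+1)(l1+1)(w1 + w2 + w0 - w3) of the condition."""
    return 4 * (k1 + 1) * (l1 + 1) * (w1 + w2 + w0 - w3)


# Values quoted in Examples 5.2 and 5.3 (assumed as stated there).
examples = {
    "5.2": dict(alpha=F(1, 3), beta=F(4, 3), k1=3, l1=2,
                w1=F(1, 150), w2=F(0), w3=F(0),
                big=F(149, 150), w0=F(3, 401)),
    "5.3": dict(alpha=F(3, 5), beta=F(7, 3), k1=2, l1=2,
                w1=F(1, 90), w2=F(1, 120), w3=F(1, 5400),
                big=F(119, 120), w0=F(1, 2701)),
}

for name, e in examples.items():
    t = theta(e["alpha"], e["beta"])
    b33 = bound(e["k1"], e["l1"], e["w1"], e["w2"], e["w3"])
    b41 = bound(e["k1"], e["l1"], e["w1"], e["w2"], e["w3"], e["w0"])
    print(name, t, e["big"] > b33, e["big"] > b41)
```

Using `Fraction` avoids rounding when the two sides of an inequality are close, which matters when choosing an admissible ω0.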
Acknowledgment This work was supported by the NNSF of P.R.China (60404022,60604004), the NSF of Hebei Province (102160) and NS of Education office in Hebei Province (2004123).
References
[1] B.G. Zhang and Q.J. Xing: “Oscillation of certain partial difference equations”, J. Math. Anal. Appl., to appear.
[2] C.J. Tian, S.L. Xie and S.S. Cheng: “Measures for oscillatory sequences”, Comput. Math. Appl., Vol. 36, (1998), pp. 149–161.
[3] S.S. Cheng: Partial Difference Equations, Taylor and Francis, 2003.
[4] Z.Q. Zhu and S.S. Cheng: “Frequent oscillation in a neutral difference equation”, Southeast Asian Bull. Math., Vol. 29, (2005), pp. 627–634.
[5] Z.Q. Zhu and S.S. Cheng: “Frequently oscillatory solutions for multi-level partial difference equations”, Int. Math. Forum, Vol. 1, (2006), pp. 1497–1509.
[6] Z.Q. Zhu and S.S. Cheng: “Unsaturated solutions for partial difference equations with forcing terms”, Cent. Eur. J. Math., to appear.
[7] E.F. Beckenbach and R. Bellman: Inequalities, Random House, N. Y. Berlin, 1961.
[8] H. Halberstam and K.F. Roth: Sequences, Springer-Verlag, 1983.
[9] I. Niven: “The asymptotic density of sequences”, Bull. Amer. Math. Soc., Vol. 57, (1951), pp. 420–434.
[10] C.J. Tian and S.S. Cheng: “Frequent convergence and applications”, Dyn. Cont. Dis. Ser. A, Vol. 13, (2006), pp. 653–668.