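$$= h(y)\big(\langle \nabla_V \eta, T(y)\rangle - \langle \nabla_V \eta, \nabla A(\eta)\rangle\big)\, e^{\langle \eta(\Omega), T(y)\rangle - A(\eta(\Omega))} = q(y,\eta)\,\langle \nabla_V \eta,\, T(y) - \nabla A(\eta)\rangle. \qquad (14)$$
By the chain rule applied to $\Psi(q(y,\eta), p(y))$, we get $\langle \Psi'(q(y,\eta), p(y)), V\rangle = \langle q'(y,\eta), V\rangle\, \partial_1\Psi(q,p)$, which gives
$$\langle D'(\Omega), V\rangle = \int_\chi q(y,\eta)\,\partial_1\Psi(q,p)\,\langle \nabla_V\eta,\, T(y) - \nabla A(\eta)\rangle\, dy.$$
We introduce
$$C = \int_\chi q(y,\eta)\,\partial_1\Psi(q,p)\,\big(T(y) - \nabla A(\eta)\big)\, dy = \mathbb{E}\big[\partial_1\Psi(q,p)\,\big(T(Y) - \mathbb{E}[T(Y)]\big)\big],$$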
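which completes the proof.

A.3 Proof of Lemma 1

When using the MLE, the term $\mathbb{E}[T(Y)]$ can be empirically estimated by $\overline{T(Y)}$, the mean of $T$ over the domain. It can therefore be differentiated easily with respect to the domain $\Omega$. We propose to directly differentiate the expression $\nabla A(\eta) = \overline{T(Y)}$, which gives
$$\sum_{j=1}^{\kappa} \langle \eta_j', V\rangle\, \frac{\partial^2 A(\eta)}{\partial\eta_i\,\partial\eta_j} = \langle \overline{T_i(Y)}', V\rangle \qquad \forall i \in [1,\kappa], \qquad (15)$$
which can be written in the compact form $\nabla_V(\overline{T}) = \ddot{A}(\eta)\,\nabla_V\eta$. We restrict our study to the full rank exponential family, where $\ddot{A}(\eta)$ is a symmetric positive-definite, hence invertible, matrix (Theorem 2). The domain derivative of the parameters $\eta$ is then uniquely determined by $\ddot{A}(\eta)^{-1}\nabla_V(\overline{T}) = \nabla_V\eta$, where $\nabla_V(\overline{T})$ is given by
$$\nabla_V(\overline{T}) = \frac{1}{|\Omega|}\int_{\partial\Omega}\big(\overline{T(y)} - T(y(a))\big)(V\cdot N)\, da(x)$$
(taking advantage of Theorem 4), and the lemma follows.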
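The following is a numerical sanity check of Lemma 1. It uses a one-dimensional Gaussian family with $T(y) = (y, y^2)$; that family is our choice, and the lemma itself is general. The MLE satisfies $\nabla A(\eta) = \overline{T}$, so a small change $d\overline{T}$ of the empirical mean should move the parameters by $\ddot{A}(\eta)^{-1} d\overline{T}$. The perturbation `dT` is hypothetical: it stands in for $\nabla_V(\overline{T})$ and is not computed from a boundary integral.

```python
import numpy as np

# Gaussian family with sufficient statistics T(y) = (y, y^2) (illustrative choice).
# Natural parameters: eta = (mu / s2, -1 / (2 s2)); MLE condition: grad A(eta) = mean of T.

def eta_from_moments(m):
    """MLE natural parameters from empirical moments m = (mean y, mean y^2)."""
    mu = m[0]
    s2 = m[1] - m[0] ** 2
    return np.array([mu / s2, -0.5 / s2])

def hessian_A(mu, s2):
    """A''(eta) = Cov[T(Y)] for T(y) = (y, y^2) under N(mu, s2)."""
    return np.array([[s2, 2 * mu * s2],
                     [2 * mu * s2, 2 * s2 ** 2 + 4 * mu ** 2 * s2]])

m = np.array([1.0, 1.5])      # mean 1, variance 0.5
dT = np.array([1e-6, -2e-6])  # small, hypothetical change of the empirical mean of T

# Change of the MLE parameters: finite difference vs. A''(eta)^{-1} dT
finite_diff = eta_from_moments(m + dT) - eta_from_moments(m)
predicted = np.linalg.solve(hessian_A(1.0, 0.5), dT)
print(finite_diff)
print(predicted)
```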
Optimization of Divergences within the Exponential Family for Image Segmentation
A.4 Proof of Lemma 2

Since $p$ and $q$ belong to the same parametric law, they share the same $h(y)$, $T(y)$ and $A(\cdot)$. Hence $\log(q) - \log(p) = \langle \eta - \eta_1, T(y)\rangle - A(\eta) + A(\eta_1)$. The value of $C$ is then $C = s_1 - s_2$, with
$$s_1 = \mathbb{E}_q\big[(\langle \eta - \eta_1, T(Y)\rangle - A(\eta) + A(\eta_1) + 1)\,(T_i(Y) - \mathbb{E}_q[T_i(Y)])\big],$$
$$s_2 = \mathbb{E}_q\Big[\frac{p}{q}\,\big(T_i(Y) - \mathbb{E}_q[T_i(Y)]\big)\Big].$$
Developing the expectation in the second term, we find $s_2 = \mathbb{E}_p\big[T_i(Y) - \mathbb{E}_q[T_i(Y)]\big]$. This is the $i$-th component of $\nabla A(\eta_1) - \nabla A(\eta)$. 

We use the linearity of the expectation and the fact that $\mathbb{E}[T_j(Y)T_i(Y)] - \mathbb{E}[T_i(Y)]\,\mathbb{E}[T_j(Y)]$ is the covariance matrix of the sufficient statistics $T$. It can therefore be replaced by $\ddot{A}(\eta)_{ij} = \mathrm{Cov}[T(Y)]_{ij} = \ddot{A}(\eta)_{ji}$. This gives $s_1 = \sum_{j=1}^{\kappa}(\eta_j - \eta_{1j})\,\ddot{A}(\eta)_{ij}$, and then $C = \ddot{A}(\eta)(\eta - \eta_1) + \nabla A(\eta) - \nabla A(\eta_1)$.
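The closed form of Lemma 2 can be checked exactly on the Bernoulli family, where expectations are finite sums. This family has $h = 1$, $T(y) = y$ and $A(\eta) = \log(1 + e^\eta)$. 

The sketch assumes $\partial_1\Psi(q,p) = \log(q/p) + 1 - p/q$. This is our reading of the split $C = s_1 - s_2$. It corresponds, for instance, to the symmetrized Kullback-Leibler integrand $\Psi(q,p) = (q - p)(\log q - \log p)$. The function names are illustrative.

```python
import math

def A(eta):
    # Log-partition of the Bernoulli family: h(y) = 1, T(y) = y, y in {0, 1}
    return math.log1p(math.exp(eta))

def dA(eta):   # A'(eta) = E[T(Y)]
    return 1.0 / (1.0 + math.exp(-eta))

def d2A(eta):  # A''(eta) = Var[T(Y)]
    m = dA(eta)
    return m * (1.0 - m)

def density(y, eta):
    return math.exp(eta * y - A(eta))

def C_exact(eta, eta1):
    """C = E_q[d1Psi(q, p) (T(Y) - E_q T(Y))], summed exactly over y in {0, 1}.

    Assumes d1Psi(q, p) = log(q/p) + 1 - p/q (hypothetical choice, see lead-in)."""
    mean_q = dA(eta)
    total = 0.0
    for y in (0, 1):
        q, p = density(y, eta), density(y, eta1)
        total += q * (math.log(q / p) + 1.0 - p / q) * (y - mean_q)
    return total

def C_lemma(eta, eta1):
    # Lemma 2 in the scalar case: A''(eta)(eta - eta1) + A'(eta) - A'(eta1)
    return d2A(eta) * (eta - eta1) + dA(eta) - dA(eta1)

print(C_exact(0.3, -1.2), C_lemma(0.3, -1.2))
```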
Convex Multi-class Image Labeling by Simplex-Constrained Total Variation

Jan Lellmann, Jörg Kappes, Jing Yuan, Florian Becker, and Christoph Schnörr

Image and Pattern Analysis Group (IPA) HCI, Dept. of Mathematics and Computer Science, University of Heidelberg
{lellmann,kappes,yuanjing,becker,schnoerr}@math.uni-heidelberg.de
Abstract. Multi-class labeling is one of the core problems in image analysis. We show how this combinatorial problem can be approximately solved using tools from convex optimization. We suggest a novel functional based on a multidimensional total variation formulation, allowing for a broad range of data terms. Optimization is carried out in the operator splitting framework using Douglas-Rachford Splitting. In this connection, we compare two methods to solve the Rudin-Osher-Fatemi type subproblems and demonstrate the performance of our approach on single- and multichannel images.
1 Introduction
In this paper, we study the variational approach
$$\inf_{u\in C} f(u), \qquad f(u) = -\int_\Omega \langle u(x), s(x)\rangle\, dx + \lambda\,\mathrm{TV}(u), \qquad \lambda > 0, \qquad (1)$$
for determining a labeling $u: \Omega \to \mathbb{R}^L$, that is, a contextual classification of each pixel $x \in \Omega$ into one out of $L$ classes. The input data is an arbitrary vector-valued similarity function $s(x) \in \mathbb{R}^L$ that has been computed from image data beforehand. The objective function (1) comprises the common form of a data term plus a regularization term. The data term is given by the $L^2$ inner product of the assignment variables $u$ and the similarity function $s$. The regularizer is a total variation (TV) formulation for vector-valued data,
$$\mathrm{TV}(u) = \int_\Omega \sqrt{\|\nabla u_1\|_2^2 + \cdots + \|\nabla u_L\|_2^2}\, dx. \qquad (2)$$
Furthermore, the constraint $u \in C$ restricts the vector field $u(x)$ at each location $x \in \Omega$ to lie in the standard probability simplex, that is, $u(x) \in \mathbb{R}^L_+$ and $\sum_{i=1}^L u(x)_i = 1$ for all $x \in \Omega$.

Our work is motivated by the following observation. Suppose that at each pixel $x \in \Omega$ there is an unambiguous assignment (labeling) of the data $s(x)$ to some class $l \in \{1, \ldots, L\}$, represented by the corresponding $l$-th unit vector, $u(x) = e_l$. Then an interface with area $A$ between two image regions labeled with $l$ and $l'$, respectively, adds $A\sqrt{2}$ to the regularization term iff $l \neq l'$, as all but two gradients under the square root vanish. As a result, under these assumptions and up to the immaterial constant $\sqrt{2}$, the TV term corresponds to the well-known Potts model, which assigns constant penalties to local changes of the labeling.

A significant difference between the Potts model and our approach (1), however, is that the former amounts to solving a discrete combinatorial problem, whereas the latter is a continuous convex optimization problem. Experiments show that our approach (1) approximates discrete decisions fairly well (Figs. 1 and 2) by computing a global optimum to a single convex optimization problem. By contrast, the state-of-the-art discrete approach [1] approximates the combinatorial solution by solving a non-uniquely defined sequence of binary problems via graph cuts. Continuous convex optimization also lends itself to parallel implementation and depends more robustly on (hyper-)parameters. Together, these points motivated us to investigate the approach (1) as a promising model for a general "labeling submodule" within computer vision systems. To this end,
– We have a closer look at the data and regularization terms (section 2).
– We apply an operator splitting approach to (1) in order to decompose the computation of a globally optimal labeling into two independent computational steps: TV denoising for vector-valued data, and projection of the labeling vectors $u(x)$ onto the canonical simplex (section 3).
– We evaluate two different algorithms for the TV denoising subroutine (section 4) and compare the performance of our convex method to a range of established graph cut-based approaches (section 5).

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 150–162, 2009. © Springer-Verlag Berlin Heidelberg 2009

Fig. 1. Left: Noisy input image. Right: The labeled image based on the non-binary assignment u as global minimizer of the convex approach (1). The discrete problem is accurately solved by a continuous approach.

Related work.
In contrast to the binary case with anisotropic discretization [2], multi-class energies are generally not submodular and thus cannot be optimized globally using graph cuts [3]. Some extensions exist, which find a local minimum by solving a sequence of binary graph cuts [1]. The continuous formulation, i.e. optimization on the set of characteristic functions, is known as continuous cut [5]. Chan et al. [6] showed that this problem can be relaxed and solved on a convex set without losing global optimality. In contrast, our work is aimed at the multi-class case. In [7], a comparable approach based on [8] was presented, which relies on a natural ordering of the labels, as given e.g. in stereo reconstruction. An approach very similar to ours was recently presented in [9]; there, the authors use a different formulation of the total variation on vector fields and an alternating optimization method. The (discrete) Potts model was studied in [10], where approximate solutions were computed by an LP relaxation with explicit constraints. In contrast, our approach considers the general TV term and a problem decomposition into efficiently solvable subproblems, without the need to introduce additional variables.

Fig. 2. Output of the standard TV approach [4] for scalar-valued images applied to the noisy input image depicted in Fig. 1, for different values of the regularization parameter λ. Irrespective of this value, the performance is worse than with the approach (1) (cf. Fig. 1, right). The reason is that the latter approximates the Potts model, which does not depend on the size (contrast) of discontinuities. Consequently, the former approach cannot remove noise without degrading weak discontinuities, as is apparent above for the horizontal discontinuities.

Notation. We consider the discretized version of our approach (1). Let $\Omega = \{1,\ldots,n_1\}\times\cdots\times\{1,\ldots,n_d\} \subseteq \mathbb{R}^d$, $d \in \mathbb{N}$, denote a regular image grid of $n := |\Omega|$ pixels. The (multidimensional) image space $X := \mathbb{R}^{n\times L}$ is equipped with the Euclidean inner product $\langle\cdot,\cdot\rangle_\Omega$ over the vectorized elements. We naturally identify $v = (v^1, \ldots, v^L) \in \mathbb{R}^{n\times L}$ with $((v^1)^\top \cdots (v^L)^\top)^\top \in \mathbb{R}^{nL}$. Superscripts $v^i$ denote a collection of vectors, while subscripts $v_k$ denote vector components. Using the notation $e = (1, 1, \ldots, 1)^\top$, the standard simplex on $\mathbb{R}^L$ and its extension $C$ on $\mathbb{R}^{n\times L}$ are given by
$$\Delta_L := \{v \in \mathbb{R}^L \mid v \ge 0,\ \langle e, v\rangle = 1\}$$
and $C := \prod_{x\in\Omega}\Delta_L$. Define $\delta_C(x)$ to be $0$ iff $x \in C$, and $+\infty$ otherwise. Let $\mathrm{grad} := (\mathrm{grad}_1, \ldots, \mathrm{grad}_d)$ be the $d$-dimensional forward difference gradient operator for Neumann boundary conditions. Accordingly, $\mathrm{div} := -\mathrm{grad}^\top$ is the backward difference divergence operator for Dirichlet boundary conditions. These operators extend to $\mathbb{R}^{n\times L}$ via $\mathrm{Grad} := (I_L \otimes \mathrm{grad})$, $\mathrm{Div} := (I_L \otimes \mathrm{div})$, where $I_L$ is the $L\times L$ identity matrix. We will also need the convex sets
$$B_\lambda := \Big\{(p^1,\ldots,p^L)\in\mathbb{R}^{d\times L} \;\Big|\; \Big(\sum_{i=1}^{L}\|p^i\|_2^2\Big)^{1/2} \le \lambda\Big\}, \qquad (3)$$
$$D_\lambda := \prod_{x\in\Omega} B_\lambda \subseteq \mathbb{R}^{n\times d\times L}, \qquad E_\lambda := \{u\in\mathbb{R}^{n\times L} \mid u = \mathrm{Div}\,p,\ p\in D_\lambda\}. \qquad (4)$$
The discrete total variation on vector-valued data is then defined as
$$\mathrm{TV}(u) := \sigma_{E_1}(u) = \sum_{x\in\Omega}\|G_x u\|_2, \qquad (5)$$
where $\sigma_M(u) := \sup_{p\in M}\langle u, p\rangle$ is the support function from convex analysis. $G_x$ is an $(Ld)\times(nL)$ matrix composed of rows of Grad, such that $G_x u$ gives the gradients of all $u^i$ at $x$, stacked one above the other.
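Below is a minimal NumPy sketch of the discrete operators and of the vector-valued TV (5) for $d = 2$. It assumes $u$ is stored as an array of shape (H, W, L). The sketch also checks two things: the adjoint relation $\mathrm{div} = -\mathrm{grad}^\top$, and the observation from the introduction that a straight interface between two hard labels costs $\sqrt{2}$ per unit length. The array layout and the function names are our own choices.

```python
import numpy as np

def grad(u):
    """Forward differences with Neumann BC: (H, W, L) -> (H, W, 2, L)."""
    g = np.zeros(u.shape[:2] + (2,) + u.shape[2:])
    g[:-1, :, 0] = u[1:] - u[:-1]        # vertical differences, zero in the last row
    g[:, :-1, 1] = u[:, 1:] - u[:, :-1]  # horizontal differences, zero in the last column
    return g

def div(p):
    """Backward-difference divergence, the negative adjoint of grad: (H, W, 2, L) -> (H, W, L)."""
    py, px = p[:, :, 0], p[:, :, 1]
    d = np.zeros(py.shape)
    d[0] += py[0]; d[1:-1] += py[1:-1] - py[:-2]; d[-1] -= py[-2]
    d[:, 0] += px[:, 0]; d[:, 1:-1] += px[:, 1:-1] - px[:, :-2]; d[:, -1] -= px[:, -2]
    return d

def tv(u):
    """Vector-valued TV (5): per pixel, the l2 norm of all label gradients, summed over pixels."""
    g = grad(u)
    return np.sqrt((g ** 2).sum(axis=(2, 3))).sum()

rng = np.random.default_rng(0)
u = rng.standard_normal((5, 4, 3))
p = rng.standard_normal((5, 4, 2, 3))
print(np.vdot(grad(u), p), -np.vdot(u, div(p)))  # equal up to rounding

labels = np.zeros((4, 6), dtype=int)
labels[:, 3:] = 1
u_hard = np.eye(3)[labels]          # one-hot labeling, shape (4, 6, 3)
print(tv(u_hard), 4 * np.sqrt(2))   # a vertical interface of length 4 costs 4*sqrt(2)
```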
2 Variational Approach
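Based on the introduced notation, our novel approach (1) reads
$$\inf_{u\in C} f(u), \qquad f(u) = \underbrace{-\langle u, s\rangle_\Omega}_{\text{data term}} + \underbrace{\lambda\,\mathrm{TV}(u)}_{\text{regularization term}}, \qquad \lambda > 0. \qquad (6)$$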
As the objective function $f$ and the constraint set $C$ are convex, the overall problem is convex as well. We will now define and motivate each term.

Data Term. The data term in (6) is fairly general. Any vector-valued similarity function $s$ can be used, whose components $s(x)_i$ indicate the affinity of some data point at $x$ with class $i$. As an example, suppose we have image features $g(x)$, $x \in \Omega$, prototypical feature vectors $G = (G_1, \ldots, G_L)$, and a distance measure $d$ on the features. We might think of $g$ as a grayscale image, of $G$ as some prototypical gray values, and of $d$ as a quadratic distance measure, possibly derived from a statistical noise model. The hard assignment of the pixel $x \in \Omega$ to a label (or class) $l(x) \in \{1, \ldots, L\}$ should then be penalized by the distance $d(g(x), G_{l(x)})$ of the corresponding feature to the prototype of the assigned class. Denoting the negative distance by $s$, and summing up over the image domain, we see that
$$\sum_{x\in\Omega} d\big(g(x), G_{l(x)}\big) = -\sum_{x\in\Omega}\langle s(x), u(x)\rangle \quad \text{for } u(x) = e_{l(x)}. \qquad (7)$$
Thus, instead of looking for $l \in \{1, \ldots, L\}^n$, we may equivalently look for $u \in \{e_1, \ldots, e_L\}^n$. However, the right-hand side formulation has the advantage that it extends naturally to the soft assignment $u \in C$: we may now solve the easier problem of optimizing for $u$ over the convex set $C$. In our experiments, we chose $d(x, y) = \|x - y\|_1$, as the $\ell_1$-norm is still convex but known to be more robust against noise and outliers. However, $s$ is not restricted to representing distances. In fact, it may be arbitrarily nonlinear and nonconvex in $x$ and $g$, and involve nonlocal operations on $g$. The complexity is completely hidden within the precomputed vector $s$.

Regularization Term. Recall that the regularizer of (6) is defined in (5) as
$$\mathrm{TV}(u) = \sup_{p\in D_1}\langle u, \mathrm{Div}\,p\rangle_\Omega = \sum_{x\in\Omega}\|G_x u\|_2. \qquad (8)$$
This definition for vector-valued $u$ parallels the definition of the "isotropic" total variation measure in the scalar-valued case [11, 4, 12]. It is also known as MTV [13, 14, 15], and was recently studied in [16] in its continuous formulation. The anisotropic discretization would substitute the sum of $\ell_1$-norms in (3). Compared with it, our definition is less biased towards edges parallel to the axes. See also [17] for an overview of TV-based research and applications.

Optimality. After solving the relaxed problem, it remains to show that a binary solution can be recovered. For the continuous, binary case, Chan et al. [6] showed that an exact solution can be obtained by thresholding at almost any threshold. However, their results do not immediately transfer to the discrete multi-class case. In particular, the crucial "layer cake" formula holds for $\ell_1$-, but not for $\ell_2$-discretizations of the TV. Contrary to the binary case, it is not clear which rounding scheme to use for vector-valued $u$. For our experiments, we chose the final class label for each pixel $x$ as the index $l$ of the maximal component $u^*_l(x)$ of the global optimum $u^*$ of (6). This defines a suboptimal discrete solution $u^*_t$. Bounding the error $f(u^*_t) - f(u^*_d)$ with respect to the unknown discrete optimum $u^*_d$ will be a subject of our future work.
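The data term and the rounding rule can be sketched as follows. The sketch assumes features stored row-wise and the $\ell_1$ distance used in the experiments. Applied to $s$ itself (that is, without regularization), the rounding yields the nearest-prototype labeling, for which identity (7) can be checked directly. Names such as `similarity_l1` are hypothetical.

```python
import numpy as np

def similarity_l1(g, prototypes):
    """Data term s(x)_i = -||g(x) - G_i||_1 for features g of shape (n, c)
    and prototypes G of shape (L, c); returns s of shape (n, L)."""
    g = np.asarray(g, dtype=float)
    G = np.asarray(prototypes, dtype=float)
    return -np.abs(g[:, None, :] - G[None, :, :]).sum(axis=2)

def round_labels(u):
    """Rounding used for the experiments: index of the largest component of u(x)."""
    return np.argmax(u, axis=1)

g = np.array([[0.1], [0.45], [0.9]])   # three gray values
G = np.array([[0.0], [0.5], [1.0]])    # three prototype gray values
s = similarity_l1(g, G)
labels = round_labels(s)               # applied to s: nearest prototype (the lambda = 0 case)
u = np.eye(3)[labels]                  # one-hot u(x) = e_l(x)

# Identity (7): total distance equals minus the data term of the one-hot labeling
total_distance = np.abs(g[:, 0] - G[labels, 0]).sum()
print(labels, total_distance, -(s * u).sum())
```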
3 Optimization
Two basic problems arise concerning the optimization of (6): nondifferentiability of the objective function due to the TV term, and handling of the simplex constraint $u \in C$. We cope with the latter using the tight Douglas-Rachford splitting method as presented in the following section. We refer to [18] for the full derivations.

Douglas-Rachford Splitting. Minimization of a proper, convex, lower-semicontinuous (lsc) function $f: X \to \mathbb{R}\cup\{+\infty\}$ can be regarded as finding a zero of its subgradient operator $T := \partial f: X \rightrightarrows X$, which is necessarily maximal monotone [19, Chap. 12]. In the operator splitting framework, $\partial f$ is assumed to be decomposable into the sum of two "simple" operators, $T = A + B$, whose forward and backward steps can be computed in practice. Here, we consider the (tight) Douglas-Rachford splitting iteration [20, 21],
$$z^{k+1} \in \big(J_{\tau A}(2J_{\tau B} - I) + (I - J_{\tau B})\big)(z^k), \qquad (9)$$
where $J_{\tau T} := (I + \tau T)^{-1}$ is the resolvent of $T$. Suppose $A$ and $B$ are maximal monotone and $A + B$ has at least one zero, a very general condition. Then the sequence $(z^k)$ converges to a point $z$, and $x := J_{\tau B}(z)$ is a zero of $T$ ([22, Thm. 3.15], [22, Prop. 3.20], [22, Prop. 3.19], [23]). In particular, consider $f = f_1 + f_2$ with $f_i$ proper, convex, lsc and $\mathrm{ri}(\mathrm{dom}\, f_1) \cap \mathrm{ri}(\mathrm{dom}\, f_2) \neq \emptyset$, where $\mathrm{ri}(S)$ denotes the relative interior of a set $S$. For such $f$, it can be shown [19, Cor. 10.9] that $\partial f = \partial f_1 + \partial f_2$, and the $\partial f_i$ are maximal monotone. As $x \in J_{\tau\partial f_i}(y) \Leftrightarrow x = \arg\min_{x'} \{(2\tau)^{-1}\|x' - y\|_2^2 + f_i(x')\}$, the computation of the resolvents reduces to proximal point optimization problems involving only the $f_i$.
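Iteration (9) can be illustrated in isolation on a toy problem of our own choosing:
- $f_1(x) = \frac12\|x - a\|^2$, giving $B = \partial f_1$, with an averaging step as resolvent;
- $f_2 = \delta_{[0,1]^n}$, giving $A = \partial f_2$, with a clipping step as resolvent.

The minimizer is the projection of $a$ onto the box, and $x = J_{\tau B}(z)$ should approach it.

```python
import numpy as np

def douglas_rachford(resolvent_A, resolvent_B, z0, iters=100):
    """Iterate z <- J_A(2 J_B z - z) + (z - J_B z), as in (9); return x = J_B(z)."""
    z = np.asarray(z0, dtype=float)
    for _ in range(iters):
        x = resolvent_B(z)
        z = resolvent_A(2 * x - z) + z - x
    return resolvent_B(z)

a = np.array([2.0, -0.5, 0.3])
tau = 1.0
J_B = lambda y: (y + tau * a) / (1 + tau)   # prox of tau * 0.5 * ||x - a||^2
J_A = lambda y: np.clip(y, 0.0, 1.0)        # prox of the indicator of [0, 1]^n
x = douglas_rachford(J_A, J_B, np.zeros(3))
print(x)  # approaches np.clip(a, 0, 1)
```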
Application. For our specific problem, we split
$$\inf_{u\in C}\big(f_1(u) + f_2(u)\big), \qquad f_1(u) = -\langle u, s\rangle_\Omega + \lambda\,\mathrm{TV}(u), \qquad f_2(u) = \delta_C(u), \qquad (10)$$
and get the following Douglas-Rachford scheme:

Algorithm 1. Outer loop (Douglas-Rachford)
1: choose some $z^0$ and a fixed step size $\tau > 0$
2: repeat
3:   solve $u^k \leftarrow \arg\min_u \{\frac{1}{2\tau}\|u - z^k\|^2 - \langle u, s\rangle + \sigma_{E_\lambda}(u)\}$
4:   solve $w^k \leftarrow \arg\min_w \{\frac{1}{2\tau}\|w - (2u^k - z^k)\|^2 + \delta_C(w)\}$
5:   $z^{k+1} \leftarrow z^k + w^k - u^k$
6: until $\|u^k - u^{k-1}\|_\infty \le \delta_{\text{outer}}$
As $f$ is continuous on the compact set $C$, it is bounded from below on $C$ and attains its minimum there. From the remarks in the last section, we get convergence of the scheme for the discrete case: $\delta_C$ and $\sigma_{E_\lambda}$ are both proper, convex, lsc with $\mathrm{dom}\,\sigma_{E_\lambda} = \mathbb{R}^{n\times L}$ and $\mathrm{ri}(C) \neq \emptyset$. In practice, the subproblems can only be solved with limited accuracy. There are extensions of the convergence result that take such inexact solutions into account [22, Prop. 4.50]. However, they require the subproblems to be solved with increasing accuracy. In our experiments, the method generally converged even though these requirements were not met.
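The structure of Algorithm 1 can be sketched with the TV step left as a pluggable callback. The callback name `solve_step3` is hypothetical. The simplex projection here uses a standard sort-based thresholding method rather than the finite algorithm of [24]; both compute the same Euclidean projection. For the usage example we take $\lambda = 0$. Step 3 then reduces to $u = z + \tau s$, and the iteration should end at the vertex $e_l$ with the largest $s_l$ (ties aside).

```python
import numpy as np

def project_simplex(v):
    """Row-wise Euclidean projection onto the unit simplex (sort-based thresholding)."""
    v = np.atleast_2d(np.asarray(v, dtype=float))
    n, L = v.shape
    mu = -np.sort(-v, axis=1)                   # each row sorted in decreasing order
    css = np.cumsum(mu, axis=1) - 1.0
    k = np.arange(1, L + 1)
    rho = (mu - css / k > 0).sum(axis=1)        # number of positive entries in the result
    theta = css[np.arange(n), rho - 1] / rho
    return np.maximum(v - theta[:, None], 0.0)

def outer_loop(s, solve_step3, tau=1.0, delta_outer=1e-6, max_iter=1000):
    """Douglas-Rachford outer loop in the spirit of Alg. 1 for s of shape (n, L).

    solve_step3(v, tau) is a hypothetical callback that must return
    argmin_u 1/(2 tau)||u - v||^2 - <u, s> + sigma_E(u), i.e. the TV-prox step."""
    z = np.zeros(s.shape)                       # z^0
    u_prev = None
    for _ in range(max_iter):
        u = solve_step3(z, tau)                 # step 3
        w = project_simplex(2 * u - z)          # step 4: projection onto C
        z = z + w - u                           # step 5
        if u_prev is not None and np.abs(u - u_prev).max() <= delta_outer:
            break                               # step 6
        u_prev = u
    return u, w

# Usage with lambda = 0 (no TV), where step 3 has the closed form u = z + tau * s:
s = np.array([[0.1, 0.5, 0.2],
              [0.9, 0.0, 0.3]])
u, w = outer_loop(s, lambda v, tau: v + tau * s)
print(np.argmax(u, axis=1), w)
```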
4 Inner Loop Optimization
The second subproblem (Alg. 1, step 4) is a projection onto the constraint set, $w^k = \Pi_C(2u^k - z^k)$. It requires one projection onto the low-dimensional unit simplex $\Delta_L$ per $x \in \Omega$. These projections can be computed in a finite number of steps [24]. The first subproblem (step 3) is equivalent to
$$u^k = \arg\min_u \frac{1}{2}\|u - (z^k + \tau s)\|^2 + (\tau\lambda)\,\mathrm{TV}(u), \qquad (11)$$
i.e. an extension to vector-valued $u$ of the classical Rudin-Osher-Fatemi (ROF, TV-$L^2$) problem with regularization parameter $\tau\lambda$. Many methods have been suggested to solve the ROF problem, e.g. PDE, fixpoint, or interior point methods for primal [4, 25], dual [26, 27, 28], or mixed [29] formulations. We evaluate two approaches. First, we formulate a particularly simple gradient projection method in the operator splitting framework, cf. [30]. This scheme was introduced in [27] and extended to the multidimensional case in [31] (see also [16]). The second approach is based on the fast half-quadratic method of Yang et al. [15].

Forward-backward approach. The optimality condition of step 3, $\tau^{-1}(z^k - u) + s \in \partial\sigma_{E_\lambda}(u)$, can be rewritten as $u = \tau\big(z^k/\tau + s - \Pi_{E_\lambda}(z^k/\tau + s)\big)$. To compute the projection $\Pi_{E_\lambda}$, we use the dual representation
$$\Pi_{E_\lambda}(x) = \arg\min_{q\in E_\lambda}\frac{1}{2}\|q - x\|_\Omega^2 = \mathrm{Div}\Big(\arg\min_{p}\ \frac{1}{2}\|\mathrm{Div}\,p - x\|_\Omega^2 + \delta_{D_\lambda}(p)\Big). \qquad (12)$$
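Step 4 of Algorithm 1, the per-pixel projection onto $\Delta_L$, can be computed by a finite iteration of the kind described in [24]. The iteration projects onto the hyperplane $\sum_{i\in I} x_i = 1$ for the current index set $I$, drops the indices that became negative, and repeats. The sketch below is our plain-Python reading of that idea for a single pixel, not a transcription of [24].

```python
def project_simplex_michelot(v):
    """Project one vector v onto the unit simplex {x >= 0, sum(x) = 1}.

    Michelot-style finite iteration: project the original entries indexed by the
    active set I onto sum(x_I) = 1, drop indices that became negative, repeat.
    Terminates after at most len(v) passes, since each pass removes an index."""
    v = [float(t) for t in v]
    active = list(range(len(v)))
    while True:
        shift = (sum(v[i] for i in active) - 1.0) / len(active)
        negative = [i for i in active if v[i] - shift < 0]
        if not negative:
            break
        active = [i for i in active if v[i] - shift >= 0]   # never empty: entries sum to 1
    x = [0.0] * len(v)
    for i in active:
        x[i] = v[i] - shift
    return x

print(project_simplex_michelot([0.6, 0.6, -1.0]))
```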
Using a simple forward-backward splitting for the inner problem results in the (gradient projection) update rule $p^{j+1} = \Pi_{D_\lambda}\big(p^j - \nu\,\mathrm{Div}^*(\mathrm{Div}\,p^j - x)\big)$. The projection $\Pi_{D_\lambda}$ can be computed explicitly and is separable in $x$, while the inner part can be computed for all models independently. This opens up the method to parallelization. Convergence is guaranteed for $\nu < 2/\|\mathrm{Div}^*\mathrm{Div}\|$ (see e.g. [22, Thm. 3.12]). Extending the argument in [26, Thm. 3.1], we find that $\|\mathrm{div}\| \le \sqrt{4d}$. Accordingly, we may set $\nu < 1/(2d)$. In our experiments, we set $\nu = 0.95/(2d)$ to avoid numerical problems close to the theoretical maximum. Wrapping up, we have

Algorithm 2. Inner loop, forward-backward approach
1: $x \leftarrow z^k/\tau + s$, choose arbitrary $p^0 \in \mathbb{R}^{n\times d\times L}$
2: repeat
3:   $p^{j+1} = \Pi_{D_\lambda}\big(p^j - \nu\,\mathrm{Div}^*(\mathrm{Div}\,p^j - x)\big)$
4: until $\|p^{j+1} - p^j\|_\infty \le \delta_{\text{inner}}$
5: $u^k \leftarrow \tau(x - \mathrm{Div}\,p^{j+1})$
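Below is a compact NumPy sketch of Algorithm 2 for $d = 2$. It assumes arrays of shape (H, W, L) and the step $\nu = 0.95/(2d)$. The stopping tolerance and the iteration cap are placeholder values. The usage check relies on a property of $\mathrm{Div}\,p$: it sums to zero over the image, so $u^k$ preserves the per-channel sum of $z^k + \tau s$.

```python
import numpy as np

def grad(u):
    """Forward differences, Neumann BC: (H, W, L) -> (H, W, 2, L)."""
    g = np.zeros(u.shape[:2] + (2,) + u.shape[2:])
    g[:-1, :, 0] = u[1:] - u[:-1]
    g[:, :-1, 1] = u[:, 1:] - u[:, :-1]
    return g

def div(p):
    """Backward differences with div = -grad^T: (H, W, 2, L) -> (H, W, L)."""
    py, px = p[:, :, 0], p[:, :, 1]
    d = np.zeros(py.shape)
    d[0] += py[0]; d[1:-1] += py[1:-1] - py[:-2]; d[-1] -= py[-2]
    d[:, 0] += px[:, 0]; d[:, 1:-1] += px[:, 1:-1] - px[:, :-2]; d[:, -1] -= px[:, -2]
    return d

def project_D(p, lam):
    """Projection onto D_lam: rescale the (2 x L) block at each pixel to l2 norm <= lam."""
    if lam == 0:
        return np.zeros_like(p)
    norm = np.sqrt((p ** 2).sum(axis=(2, 3), keepdims=True))
    return p / np.maximum(1.0, norm / lam)

def inner_fb(z, s, tau, lam, nu=0.95 / 4, delta_inner=1e-4, max_iter=500):
    """Alg. 2 for d = 2: returns u^k = tau (x - Div p) with x = z / tau + s."""
    x = z / tau + s
    p = np.zeros(x.shape[:2] + (2,) + x.shape[2:])          # p^0 = 0
    for _ in range(max_iter):
        # Div^* = -Grad, hence p - nu Div^*(Div p - x) = p + nu Grad(Div p - x)
        p_new = project_D(p + nu * grad(div(p) - x), lam)
        converged = np.abs(p_new - p).max() <= delta_inner
        p = p_new
        if converged:
            break
    return tau * (x - div(p))

rng = np.random.default_rng(1)
z = rng.standard_normal((6, 5, 3))
s = rng.standard_normal((6, 5, 3))
u = inner_fb(z, s, tau=1.0, lam=0.5)
print(u.sum(axis=(0, 1)), (z + s).sum(axis=(0, 1)))   # per-channel sums agree
```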
Half-quadratic approach. While the forward-backward method is simple and easy to implement, its convergence speed is in practice not satisfactory. As an alternative, we tested an ROF specialization of the general multichannel image restoration method by Yang et al. [15]. Starting from (11), the problem is to find
$$u^k = \arg\min_u g(u), \qquad g(u) := \frac{\mu}{2}\|u - f\|^2 + \mathrm{TV}(u), \qquad (13)$$
where $\mu := 1/(\tau\lambda)$ and $f := z^k + \tau s$. Using a half-quadratic approach [32, 33], Yang et al. derive the splitting/penalty formulation
$$(u, y) = \arg\min_{y_x\in\mathbb{R}^{Ld},\, x\in\Omega,\; u\in\mathbb{R}^{nL}} \sum_{x\in\Omega}\Big(\|y_x\| + \frac{\beta}{2}\|y_x - G_x u\|^2\Big) + \frac{\mu}{2}\|u - f\|_\Omega^2. \qquad (14)$$
The parameter $\beta$ controls smoothing of the total variation. Setting $\beta \ge n/(2\varepsilon)$ guarantees $\varepsilon$-suboptimality of the solution of the smoothed problem with respect to the original problem (for a derivation see [18]). Equation (14) can be solved using alternating minimization w.r.t. $u$ and the auxiliary variables $y_x$. The latter is highly parallelizable, as it boils down to $n$ separate explicit operations:
$$y_x^{j+1} = \max\big(\|G_x u^j\| - \beta^{-1},\, 0\big)\,\frac{G_x u^j}{\|G_x u^j\|}. \qquad (15)$$
On the other hand, minimizing (14) for $u$ amounts to solving
$$\Big(\mathrm{Grad}^\top\mathrm{Grad} + \frac{\mu}{\beta} I_{nL}\Big)\, u^{j+1} = \mathrm{Grad}^\top y^{j+1} + \frac{\mu}{\beta} f \qquad (16)$$
for $u^{j+1}$, where $y^{j+1}$ is a proper rearrangement of the $y_x$.
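The following illustrates where the bound $\beta \ge n/(2\varepsilon)$ comes from. Per pixel, $\min_y \|y\| + \frac{\beta}{2}\|y - a\|^2$ is attained by the shrinkage (15). It underestimates $\|a\|$ by at most $1/(2\beta)$, so summing over $n$ pixels costs at most $n/(2\beta)$. The sketch below checks this numerically on random vectors. It is our own illustration, not the derivation of [18].

```python
import numpy as np

def shrink(a, beta):
    """Per-pixel minimizer of ||y|| + beta/2 ||y - a||^2, cf. (15); uses 0/0 := 0."""
    norm = np.linalg.norm(a)
    if norm == 0.0:
        return np.zeros_like(a)
    return max(norm - 1.0 / beta, 0.0) * a / norm

def smoothed_norm(a, beta):
    """Value of min_y ||y|| + beta/2 ||y - a||^2 (a Huber-type smoothing of ||a||)."""
    y = shrink(a, beta)
    return np.linalg.norm(y) + 0.5 * beta * np.sum((y - a) ** 2)

beta = 4.0
rng = np.random.default_rng(0)
gaps = [np.linalg.norm(a) - smoothed_norm(a, beta)
        for a in 0.5 * rng.standard_normal((1000, 6))]
# Each pixel loses at most 1/(2 beta); over n pixels at most n/(2 beta) <= eps.
print(min(gaps), max(gaps), 1 / (2 * beta))
```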
Fig. 3. Results of the speed comparison between the forward-backward (FB) and half-quadratic (HQ) methods for the inner problem, applied to data from the first iteration of the outer problem (cf. Table 1). Left to right: original input, FB with τλ = 5, HQ with τλ = 5, FB with τλ = 20, HQ with τλ = 20. Iteration counts were fixed at 80 resp. 300 to equalize the runtime for both approaches. For larger regularization parameters, the half-quadratic method outperforms the forward-backward approach as smoothness increases.
For periodic boundary conditions, Yang et al. solved (16) rapidly using the FFT. In our case, Neumann boundary conditions and thus the Discrete Cosine Transform (DCT-2) [34] are appropriate. This requires $2L$ independent (parallelizable) individual DCTs, each of which can be computed efficiently in $O(n \log n)$. By the alternating application of the above two steps, we can solve (14) for fixed $\beta$ large enough for any required suboptimality bound. In practice, convergence can be sped up by starting with a small $\beta$ and solving a sequence of problems for increasing $\beta$, warm-starting each with the solution of the previous problem. Given an arbitrary $u^0 \in \mathbb{R}^{nL}$, the complete algorithm reads

Algorithm 3. Inner loop, half-quadratic approach
1: while stopping criterion not satisfied do
2:   compute $y^{j+1}$ from (15)
3:   compute $u^{j+1}$ from $y^{j+1}$ and (16)
4:   possibly increase $\beta$
5: end while

The stopping criteria can be based on the residual [15]. For our experiments, we set a fixed iteration count for two reasons. Increasing $\beta$ at each step turned out to give the fastest convergence, and residuals for different $\beta$ are not comparable.
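Per channel, the $u$-update (16) reduces to a linear system with the matrix $\mathrm{Grad}^\top\mathrm{Grad} + cI$, where $c = \mu/\beta$. With forward differences and Neumann boundary conditions, $\mathrm{grad}^\top\mathrm{grad}$ is diagonalized by the DCT-II along each axis, with eigenvalues $4\sin^2(\pi k/(2n))$. 

For clarity, the sketch below builds the orthonormal DCT-II matrix explicitly. It therefore runs in $O(n^{3/2})$ for a square image rather than the $O(n \log n)$ of a fast DCT, which one would use in practice. The right-hand side `b` stands for $\mathrm{Grad}^\top y^{j+1} + \frac{\mu}{\beta} f$ restricted to one label channel. The function names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (rows are the cosine basis vectors)."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (j + 0.5) * k / n)
    C[0, :] /= np.sqrt(2.0)
    return C

def solve_neumann(b, c):
    """Solve (Grad^T Grad + c I) u = b for one H x W channel (c > 0), Neumann BC."""
    H, W = b.shape
    Ch, Cw = dct_matrix(H), dct_matrix(W)
    eig_h = 4.0 * np.sin(np.pi * np.arange(H) / (2 * H)) ** 2  # eigenvalues of grad_1^T grad_1
    eig_w = 4.0 * np.sin(np.pi * np.arange(W) / (2 * W)) ** 2
    b_hat = Ch @ b @ Cw.T                                     # forward 2-D DCT-II
    u_hat = b_hat / (eig_h[:, None] + eig_w[None, :] + c)     # diagonal solve
    return Ch.T @ u_hat @ Cw                                  # inverse 2-D DCT

rng = np.random.default_rng(0)
b = rng.standard_normal((7, 5))
u = solve_neumann(b, c=0.3)
```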
5 Experiments, Performance Evaluation
Inner Problem. We compared the half-quadratic approach to the conventional forward-backward method. The difficulty with the former lies in the choice of the update strategy for $\beta$. We chose a generalization of the exponential strategy outlined in [15]: set $\beta = \beta_{\min}$ and update by multiplying with $c := (\beta_{\max}/\beta_{\min})^{1/K}$ for some $K$ until $\beta = \beta_{\max}$. We made the following observations:
– In order to rapidly minimize the objective function, it is best to use a continuation strategy, i.e. to increase $\beta$ at each step, rather than spending time on solving (14) exactly for each $\beta$.
– Increasing $K$ generally improves the quality of the result.
– For fixed $\beta_{\max}$ and $K$, there seems to be a unique optimal $\beta_{\min}$ that minimizes the final objective function value. With the continuation strategy and fixed $\beta_{\max}$, we found the optimal $\beta_{\min}$ to usually lie in the range of $10^{-5}\beta_{\max}$ to $10^{-3}\beta_{\max}$. Unfortunately, there seems to be a strong dependency on the choice of $\lambda$ as well as on the scale and complexity of $s$.

We set $\beta_{\min} = 0.2\cdot 10^{-4}\beta_{\max}$, which worked well for our data. $\beta_{\max}$ was set to $n/0.2$, corresponding to a suboptimality bound of $\varepsilon = 0.1$ (section 4). We compared the performance of the two methods in terms of the objective function value for fixed runtime of the optimized Matlab implementations (Fig. 3, Table 1). For larger $\tau\lambda$, the half-quadratic method gives better results. For $\tau\lambda = 20$, fewer than 10 iterations are required to reach the quality of 300 iterations of the forward-backward method, giving a speedup of about 4-5. However, finding the optimal parameter set is more involved than for the forward-backward method.

Table 1. Run times $t$ (in seconds), objective function values $r$ and relative differences $(r_{\mathrm{HQ}} - r_{\mathrm{FB}})/r_{\mathrm{HQ}}$ for the experiment in Fig. 3. For larger $\tau\lambda$, the half-quadratic method gives more accurate results in the same time.

| τλ         | 0.1      | 1        | 2        | 5        | 10       | 20       | 50       |
|------------|----------|----------|----------|----------|----------|----------|----------|
| t_HQ (s)   | 1.14     | 1.23     | 1.20     | 1.31     | 0.98     | 0.95     | 1.08     |
| t_FB (s)   | 1.03     | 1.02     | 1.06     | 1.03     | 1.22     | 1.25     | 1.19     |
| r_HQ       | 3901.9   | 27660.7  | 36778.5  | 40038.8  | 42262.8  | 44377.1  | 44752.5  |
| r_FB       | 3901.9   | 27660.4  | 36760.6  | 40104.3  | 42924.3  | 46988.6  | 57504.9  |
| rel. diff. | 1.17e-16 | 1.24e-5  | 4.85e-4  | -1.64e-3 | -0.0156  | -0.0588  | -0.285   |
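The continuation rule for $\beta$ can be sketched as a geometric schedule, using the values reported here: $\varepsilon = 0.1$, hence $\beta_{\max} = n/0.2$, and $\beta_{\min} = 0.2\cdot 10^{-4}\beta_{\max}$. The text does not state the number of steps $K$, so $K = 10$ below is a placeholder. The function name and the `ratio` parameter are hypothetical.

```python
def beta_schedule(n, eps=0.1, ratio=0.2e-4, K=10):
    """Geometric continuation from beta_min to beta_max in K multiplicative steps.

    beta_max = n / (2 eps) gives eps-suboptimality; beta_min = ratio * beta_max.
    K = 10 is a placeholder value, not taken from the experiments."""
    beta_max = n / (2 * eps)
    beta_min = ratio * beta_max
    c = (beta_max / beta_min) ** (1.0 / K)      # update factor c
    return [beta_min * c ** k for k in range(K + 1)]

sched = beta_schedule(32 * 32)                  # e.g. a 32 x 32 image, n = 1024
print(sched[0], sched[-1])
```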
Overall Problem. We evaluated the performance of our algorithm against five different methods in their publicly available implementations from the Middlebury MRF benchmark [35]: Belief Propagation (BP), Sequential Belief Propagation (BPS), Graph Cuts with alpha-expansion (GCE) and alpha-beta swap (GCS), and Sequential Tree-Reweighted Belief Propagation (TRWS). For each of the grayscale 32 × 32 images, 20 noisy copies were generated and segmented into four gray levels with fixed intensities. In view of the last section, and in order not to mix up speed with accuracy issues, we used the forward-backward approach for the inner loop. We set $\delta_{\text{inner}} = 1\cdot 10^{-3}$, $\delta_{\text{outer}} = 2\cdot 10^{-2}$, and $\tau = 1$. For small $\lambda$, our method shows results comparable to the other approaches with respect to the number of bad labels. We point out again that this solution to the non-binary labeling problem is obtained by solving the convex optimization problem (6) followed by local rounding as explained in section 2. In contrast to our method, the MRF benchmark algorithms optimize the anisotropic energy. To compensate, their $\lambda$ was scaled by a common factor of $\approx\sqrt{2}$ that was found empirically. Nevertheless, the discretization gives them a small advantage on images with axis-parallel edges (experiments 1 and 2).
Fig. 4. Exemplary grayscale segmentation results for the benchmarked methods for four labels. Left to right: Noisy input data, final results for BP, BPS, GCE, GCS, TRWS, and the proposed method (TV). λ was manually chosen for each method. Axis-parallel edges are better recovered by the anisotropic methods, while our isotropic discretization has an advantage on diagonal edges.

[Fig. 5 plots: incorrect labels (mean %) and standard deviation versus λ (0 to 1.4) for bp, bps, gce, gcs, trws and tv]
Fig. 5. Error rates for the first experiment in Fig. 4. For each λ, all experiments were repeated 20 times with random noise (zero-mean Gaussian with σ = 0.45, 0.35, 0.25 resp. 0.35 for experiments 1-4 and image intensities in [0, 1]), and the percentage of incorrectly assigned labels compared to ground truth was recorded. Sequential Belief Propagation (BPS) generally performed worst, while our method (TV) was on par with the others, in particular for lower λ. The figure also reveals that belief propagation (BP) gets stuck in a good, but often inferior local optimum, and does not respond to larger values of λ, i.e. stronger regularization requested by the user.
Figure 6 demonstrates the performance of our algorithm for color segmentation. Only a few outer iterations (20 in our case) are necessary for accurate optimization.
Fig. 6. Performance of our method for four-class segmentation based on $\ell_1$ color distance. Left to right: Ground truth, inspired by [29, 36]; ground truth overlaid with Gaussian noise, σ = 1; local nearest-neighbor labeling; our approach with λ = 0.7 after 20 outer iterations. The energy of the result is about 1% lower than the energy of the ground truth, suggesting that at this noise level, further improvements are limited by the model.
6 Conclusion and Future Work
In this paper, we presented a convex variational approach to solve the combinatorial multi-labeling problem for energies involving a general data term, total-variation-like regularizers, and simplex constraints. To enforce the simplex constraint, we based our approach on the globally convergent Douglas-Rachford operator splitting scheme. We evaluated two methods in order to efficiently solve the ROF-type subproblems. The half-quadratic approach converged faster, at the price of more involved parameter tuning. Experiments showed that the quality of the generated labelings is comparable to state-of-the-art discrete optimization methods, and can be achieved by just solving a convex optimization problem. Due to the generality of the data term, our method allows for a wide range of features or distance measures. To fully evaluate these possibilities in connection with variations of the TV measure is a subject of our future research.

Acknowledgements. Jing Yuan gratefully acknowledges support by the German National Science Foundation (DFG) under grant SCHN 457/9-1.
References
1. Boykov, Y., Veksler, O., Zabih, R.: Fast approximate energy minimization via graph cuts. PAMI 23(11), 1222–1239 (2001)
2. Boykov, Y., Kolmogorov, V.: An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. PAMI 26(9), 1124–1137 (2004)
3. Kolmogorov, V., Zabih, R.: What energy functions can be minimized via graph cuts? PAMI 26(2), 147–159 (2004)
4. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
5. Strang, G.: Maximal flow through a domain. Math. Prog. 26, 123–143 (1983)
6. Chan, T.F., Esedoğlu, S., Nikolova, M.: Algorithms for finding global minimizers of image segmentation and denoising models. J. Appl. Math. 66(5), 1632–1648 (2006)
7. Pock, T., Schönemann, T., Graber, G., Bischof, H., Cremers, D.: A convex formulation of continuous multi-label problems. In: ECCV, vol. 3, pp. 792–805 (2008)
8. Ishikawa, H.: Exact optimization for Markov random fields with convex priors. PAMI 25(10), 1333–1336 (2003)
9. Zach, C., Gallup, D., Frahm, J.M., Niethammer, M.: Fast global labeling for realtime stereo using multiple plane sweeps. In: VMV (2008)
10. Kleinberg, J., Tardos, E.: Approximation algorithms for classification problems with pairwise relationships: Metric labeling and MRFs. In: FOCS, pp. 14–23 (1999)
11. Ziemer, W.: Weakly Differentiable Functions. Springer, Heidelberg (1989)
12. Meyer, Y.: Oscillating Patterns in Image Processing and Nonlinear Evolution Equations. Univ. Lect. Series, vol. 22. AMS (2001)
13. Sapiro, G., Ringach, D.L.: Anisotropic diffusion of multi-valued images with applications to color filtering. Trans. Image Process. 5, 1582–1586 (1996)
14. Chan, T.F., Shen, J.: Image processing and analysis. SIAM, Philadelphia (2005)
15. Yang, J., Yin, W., Zhang, Y., Wang, Y.: A fast algorithm for edge-preserving variational multichannel image restoration. Tech. Rep. 08-09, Rice Univ. (2008)
16. Duval, V., Aujol, J.F., Vese, L.: A projected gradient algorithm for color image decomposition. CMLA Preprint (2008-21) (2008)
17. Chan, T., Esedoglu, S., Park, F., Yip, A.: Total variation image restoration: Overview and recent developments. In: The Handbook of Mathematical Models in Computer Vision. Springer, Heidelberg (2005)
18. Lellmann, J., Kappes, J., Yuan, J., Becker, F., Schnörr, C.: Convex multi-class image labeling by simplex-constrained total variation. TR, U. of Heidelberg (2008)
19. Rockafellar, R., Wets, R.J.B.: Variational Analysis, 2nd edn. Springer, Heidelberg (2004)
20. Douglas, J., Rachford, H.H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. of the AMS 82(2), 421–439 (1956)
21. Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM Journal on Numerical Analysis 16(6), 964–979 (1979)
22. Eckstein, J.: Splitting Methods for Monotone Operators with Application to Parallel Optimization. PhD thesis, MIT (1989)
23. Eckstein, J., Bertsekas, D.P.: On the Douglas-Rachford splitting method and the proximal point algorithm for max. mon. operators. M. Prog. 55, 293–318 (1992)
24. Michelot, C.: A finite algorithm for finding the projection of a point onto the canonical simplex of R^n. J. Optim. Theory and Appl. 50(1), 195–200 (1986)
25. Dobson, D.C., Vogel, C.R.: Iterative methods for total variation denoising. J. Sci. Comput 17, 227–238 (1996)
26. Chambolle, A.: An algorithm for total variation minimization and applications. JMIV 20, 89–97 (2004)
27. Chambolle, A.: Total variation minimization and a class of binary MRF models. In: Rangarajan, A., Vemuri, B.C., Yuille, A.L. (eds.) EMMCVPR 2005. LNCS, vol. 3757, pp. 136–152. Springer, Heidelberg (2005)
28. Aujol, J.F.: Some algorithms for total variation based image restoration. CMLA Preprint (2008-05) (2008)
29. Chan, T.F., Golub, G.H., Mulet, P.: A nonlinear primal-dual method for total variation-based image restoration. J. Sci. Comput. 20, 1964–1977 (1999)
30. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. SIAM J. Multisc. Model. Sim. 4(4), 1168–1200 (2005)
31. Bresson, X., Chan, T.: Fast minimization of the vectorial total variation norm and applications to color image processing. Tech. Rep. 07-25, UCLA (2007)
32. Geman, D., Yang, C.: Nonlinear image recovery with half-quadratic regularization. IEEE Trans. Image Proc. 4(7), 932–946 (1995)
33. Cohen, L.: Auxiliary variables and two-step iterative algorithms in computer vision problems. JMIV 6(1), 59–83 (1996)
34. Strang, G.: The discrete cosine transform. SIAM Review 41(1), 135–147 (1999)
35. Szeliski, R., Zabih, R., Scharstein, D., Veksler, O., Kolmogorov, V., Agarwala, A., Tappen, M., Rother, C.: A comparative study of energy minimization methods for Markov random fields. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3952, pp. 16–29. Springer, Heidelberg (2006)
36. Hintermüller, M., Stadler, G.: An infeasible primal-dual algorithm for total bounded variation-based inf-convolution-type image restoration. J. Sci. Comput. 28(1), 1–23 (2006)
Geodesically Linked Active Contours: Evolution Strategy Based on Minimal Paths
Julien Mille and Laurent D. Cohen
CEREMADE, UMR CNRS 7534, Université Paris IX-Dauphine
Place du Maréchal de Lattre de Tassigny, 75016 Paris, France
{mille,cohen}@ceremade.dauphine.fr
Abstract. The proposed method is related to parametric and geodesic active contours as well as minimal paths, in the context of image segmentation¹. Our geodesically linked active contour model consists of a set of vertices connected by paths of minimal cost. This makes up a closed, piecewise defined curve, over which an edge or region energy functional is formulated. The greedy algorithm is used to move vertices towards a configuration minimizing the energy functional. This evolution technique ensures lower sensitivity to erroneous local minima than the usual gradient descent of the energy. Our method intends to take advantage of explicit active contours, minimal paths and greedy evolution techniques.
1 Introduction
Among well-known variational models for image segmentation, active contours have drawn lively interest since their introduction by Kass et al. [1]. Their key principle is the search for a curve minimizing an energy functional, which mainly depends on the adequacy of the curve to the target object. Active contours are implemented either with a parametric curve, in which case they are often referred to as 'snakes', or in an implicit fashion based on the level set framework [2] [3]. Early active contour models are mainly parametric and boundary-based, as the data term of the energy functional is an edge indicator function integrated along the curve. The Euler-Lagrange equation, determined by calculus of variations, gives the minimizing flow to be followed by a gradient descent scheme. These models depend on the curve parameterization and are unable to adapt their topology. Moreover, gradient descent is sensitive to local minima of the energy functional. Parameterization invariance is achieved by the geodesic active contour model [4], which introduces a geometrically intrinsic functional, whereas topology adaptiveness is provided by the level set implementation. Significant attempts have been made to decrease the sensitivity to local minima, acting either on the gradient descent direction or on the minimization method itself. The balloon force [5] falls into the first category, as it adds a normal-oriented inflation or retraction component in order to increase the capture range of the snake. As regards the evolution process, several heuristics based
¹ This work was partially supported by ANR grant NanoGPSCellulaire ANR-05NANO-045-06.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 163–174, 2009. © Springer-Verlag Berlin Heidelberg 2009
J. Mille and L.D. Cohen
on local searches have been proposed as alternatives to gradient descent, including dynamic programming [6] [7] and the greedy algorithm [8] [9]. The latter, which is addressed in this paper, considers the energy as a sum of curve point energies. It basically consists of iteratively moving curve points to the locations minimizing their own energies, these locations belonging to a search window. On the other hand, the minimal path approach by Cohen and Kimmel [10], which seeks a curve of minimal cost between two end-points, can be used to recover open and closed boundaries. It is closely related to the geodesic active contour with respect to the functional to be optimized, but has the main additional benefit of finding a global minimum efficiently thanks to the Fast Marching technique [11]. In this paper, we deal with an explicit implementation of active contours, i.e. a discrete curve defined by control points, or vertices. The described method is related both to minimal paths and to greedy search. Our geodesically linked active contour model is made up of a set of vertices connected by paths of minimal cost with respect to a boundary-based metric. We define search windows centered at each vertex and evolve the vertices in a greedy fashion. Moving one vertex at a time while keeping the others fixed, we consider every geodesically linked contour passing through the points in the window of the moving vertex. This vertex is finally moved to the location leading to the contour of smallest energy. This work is motivated by several points. Firstly, the minimal path approach alone can only find a minimizer of an edge functional, with one or several fixed input end-points. Conversely, our model is suited to any energy functional, which we demonstrate by endowing it with edge-based or different region-based energies, including the minimal variance of the Chan and Vese model [12].
We believe that describing the curve with geodesics is pertinent regardless of the energy functional. Indeed, whether the functional contains edge, region and/or even shape prior terms, most of the final curve will be located on more or less salient edges. In comparison to snakes driven by gradient descent, the use of search windows significantly reduces the sensitivity to erroneous local minima and to the tuning of energy weights.
2 Background

2.1 Parametric and Geodesic Active Contours
The active contour model is represented as a plane curve Γ with C² position vector $c(u) = [x(u)\ y(u)]$. Segmentation of an object of interest is performed by finding the curve minimizing an energy functional E, which has the general form:

$$E(\Gamma) = \int_0^1 L(c, c', c'')\,du \tag{1}$$

where L is usually made up of internal terms regularizing the curve and external terms attaching the curve to image data. According to calculus of variations, the following variational derivative vanishes if the curve is a local minimizer of E:

$$\frac{\delta E}{\delta \Gamma} = \frac{\partial L}{\partial c} - \frac{d}{du}\frac{\partial L}{\partial c'} + \frac{d^2}{du^2}\frac{\partial L}{\partial c''} \tag{2}$$
Geodesically Linked Active Contours
Curve evolution is usually performed by gradient descent, taking the opposite of the variational derivative as the descent direction. Given an image I defined over D ⊂ ℝ², let g be an edge indicator, i.e. a decreasing function of the gradient magnitude (the image is usually convolved with the derivative of a Gaussian). The original parametric snake [1] has the following energy and variational derivative:

$$L_{snake}(c, c', c'') = \frac{1}{2}\left(\alpha\|c'\|^2 + \beta\|c''\|^2\right) + g(c), \qquad \frac{\delta E_{snake}}{\delta \Gamma} = -\alpha c'' + \beta c^{(4)} + \nabla g \tag{3}$$

The energy functional of the snake depends on the parameterization. This has an impact on discretization, since the energy varies with the sampling when the contour is implemented as a polygonal curve. The geodesic active contour (GAC) [4] solves the parameterization issue by introducing an intrinsic energy functional, weighting the edge indicator by the length element $\|c'\|$:

$$L_{GAC}(c, c', c'') = g(c)\,\|c'\|, \qquad \frac{\delta E_{GAC}}{\delta \Gamma} = \left(\langle \nabla g, n\rangle - \kappa g\right) n \tag{4}$$
where n and κ are the unit inward normal vector and the curvature, respectively. Hence, the flow resulting from the geometric energy also contains a regularization term. This model lends itself to level set implementation, allowing topology changes. Boundary-based models driven by gradient descent, whether parametric or geodesic, are relatively blind to neighboring structures and may get trapped in local minima induced by noise. To increase the capture range, the balloon force was introduced in [5] for parametric contours, whereas an advection term is used in [3] for level sets. Despite such techniques, gradient descent may still cause the contour to miss or pass through significant boundaries. The minimal path method addresses this issue by finding a global minimum of the energy.

2.2 Minimal Paths
The minimal path approach by Cohen and Kimmel [10] aims at finding curves of minimal length in a Riemannian space endowed with a heterogeneous isotropic metric. The length of a path C is:

$$L(C) = \int_C \tilde{P}(C(s))\,ds = \int_0^1 \tilde{P}(C(u))\,\|C'(u)\|\,du \tag{5}$$

where s is the arc length. The potential $\tilde{P}$, which defines the isotropic metric, should be chosen according to the application. Curves located on image boundaries are detected by using an edge-dependent potential $\tilde{P}(x) = w + g(x)$, where w is a regularizing constant. Hence the cost of C may be rewritten using the Euclidean length:

$$L(C) = \int_C \big(w + g(C(s))\big)\,ds = w\,L_{euclidean}(C) + \int_C g(C(s))\,ds \tag{6}$$
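To make the minimal action map $U_0$ and the back-propagation step of this subsection concrete, the following Python sketch solves the Eikonal equation $\|\nabla U\| = \tilde{P}$ on a pixel grid with a basic first-order Fast Marching scheme (4-connected neighbourhood, upwind quadratic update) and then recovers the path. It is a sketch under stated assumptions, not the authors' implementation: the names `fast_marching` and `backtrack` are hypothetical, the potential is passed as a list of rows `P` (for instance $w + g$ sampled on the grid), and a discrete steepest descent to the lowest 4-neighbour stands in for the continuous back-propagation $\gamma' = -\nabla U_0(\gamma)$.

```python
import heapq
import math

# 4-connected grid neighbourhood (an assumption of this sketch)
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def fast_marching(P, source, target=None):
    """Minimal action map U with ||grad U|| = P and U(source) = 0.

    P is a list of rows of positive costs on a unit grid; source and target
    are (row, col) tuples. If target is given, propagation stops as soon as
    it is accepted, so the rest of the grid is not visited.
    """
    rows, cols = len(P), len(P[0])
    U = [[math.inf] * cols for _ in range(rows)]
    accepted = [[False] * cols for _ in range(rows)]

    def known(r, c):
        # only accepted (frozen) values enter the upwind update
        if 0 <= r < rows and 0 <= c < cols and accepted[r][c]:
            return U[r][c]
        return math.inf

    U[source[0]][source[1]] = 0.0
    heap = [(0.0, source[0], source[1])]
    while heap:
        _, r, c = heapq.heappop(heap)
        if accepted[r][c]:
            continue  # stale heap entry
        accepted[r][c] = True
        if (r, c) == target:
            break
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols) or accepted[nr][nc]:
                continue
            a = min(known(nr, nc - 1), known(nr, nc + 1))  # horizontal neighbours
            b = min(known(nr - 1, nc), known(nr + 1, nc))  # vertical neighbours
            p = P[nr][nc]
            if abs(a - b) >= p:
                new = min(a, b) + p  # one-sided update
            else:
                new = (a + b + math.sqrt(2 * p * p - (a - b) ** 2)) / 2
            if new < U[nr][nc]:
                U[nr][nc] = new
                heapq.heappush(heap, (new, nr, nc))
    return U


def backtrack(U, source, target):
    """Discrete steepest descent on U from target back to source."""
    r, c = target
    path = [(r, c)]
    while (r, c) != tuple(source):
        candidates = [(r + dr, c + dc) for dr, dc in NEIGHBOURS
                      if 0 <= r + dr < len(U) and 0 <= c + dc < len(U[0])]
        r, c = min(candidates, key=lambda q: U[q[0]][q[1]])
        path.append((r, c))
    return path[::-1]  # ordered from source to target
```

With the early stop, a call such as `fast_marching(P, v_next, v_cur)` followed by `backtrack(U, v_next, v_cur)` plays the role of one geodesic between two vertices (reversed, if the path should run from `v_cur` to `v_next`). A high-cost pixel is avoided: with `P[0][1] = 100` on a 3×3 grid of ones, the path from `(0, 0)` to `(0, 2)` detours through the second row.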
With respect to the energy functional to be minimized, the minimal path approach is similar to the geodesic active contour model, as can be seen from the term $L_{GAC}$ of eq. 4. However, the minimal path has the advantageous difference of reaching the global minimum of the energy, given two fixed end-points $x_0$ and $x_1$. Starting from point $x_0$, the minimal action map $U_0$ is computed. It corresponds to the minimal cost integrated along a path starting at $x_0$ and ending at x:

$$U_0(x) = \inf_{C} L(C) \quad \text{s.t.} \quad C(0) = x_0,\; C(1) = x$$

The action map $U_0$ is the viscosity solution of the Eikonal equation $\|\nabla U\| = \tilde{P}$ with initial condition $U(x_0) = 0$. This allows $U_0$ to be computed by the Fast Marching method [11], which is similar in principle to Dijkstra's graph search algorithm. Once the action map has been computed, the geodesic γ, i.e. the path of minimal action linking a point $x_1$ to $x_0$, is found by back-propagation starting from $x_1$ until $x_0$ is reached: $\gamma' = -\nabla U_0(\gamma)$. In its initial formulation, the minimal path method determines an open curve between two fixed end-points. It is also able to find closed contours by providing only one point on the final contour and detecting a saddle point on the minimal action map [10].

2.3 Greedy Algorithm
Along with dynamic programming [6] [7], greedy methods deal with discrete energy functionals. The greedy algorithm for active contours, as developed in [8], seeks a minimizer of the energy by means of a set of local optimizations. It is only applicable to explicit implementations, where the contour is represented as a polygon with n vertices $\{v_i\}_{1 \le i \le n}$. The total energy is considered as a sum of vertex energies:

$$E(\Gamma) = \sum_{i=1}^{n} E_{vertex}(v_i)$$
where $E_{vertex}$ is the discretization of the energy at a given vertex, using finite differences. Considering the snake term $L_{snake}$ of eq. 3, this gives:

$$E_{vertex}(v_i) = \frac{1}{2}\left(\alpha\|v_i - v_{i-1}\|^2 + \beta\|v_{i+1} - 2v_i + v_{i-1}\|^2\right) + g(v_i) \tag{7}$$

Vertices are successively moved in order to minimize their own energies. At each iteration, a square window of width m is considered around the current vertex. The energy of the latter is computed at each tested position $\tilde{v}_i$ in the window. The vertex is then moved to the position leading to the lowest energy, which is summarized by the evolution scheme at iteration t:

$$v_i^{(t+1)} = \arg\min_{\tilde{v}_i \in W(v_i^{(t)})} E_{vertex}(\tilde{v}_i)$$

where W(x) is the window centered at point x. The initial greedy algorithm [8] performs in O(nm²) operations. The window size has an obvious impact on computational cost, but also on convergence abilities. Indeed, the contour can capture
farther structures when the window is larger. The greedy algorithm is by essence a discrete optimization heuristic. The formulation of the variational derivative is not used, and continuous calculus of variations is thus not necessary.
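The greedy iteration described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [8]: vertices are integer (x, y) tuples of a closed polygon, the window width `m` is odd, and the edge indicator `g` is supplied as a Python function. `vertex_energy` implements eq. (7) and the hypothetical `greedy_sweep` performs one O(nm²) pass, updating vertices in place so that later vertices see their already-moved neighbours.

```python
def vertex_energy(v, prev, nxt, g, alpha, beta):
    """Discrete snake energy of one vertex, following eq. (7)."""
    d1x, d1y = v[0] - prev[0], v[1] - prev[1]  # first difference
    d2x, d2y = nxt[0] - 2 * v[0] + prev[0], nxt[1] - 2 * v[1] + prev[1]  # second difference
    return 0.5 * (alpha * (d1x ** 2 + d1y ** 2) + beta * (d2x ** 2 + d2y ** 2)) + g(v)


def greedy_sweep(vertices, g, alpha, beta, m):
    """One greedy iteration over a closed polygon; `vertices` is updated in place.

    Every vertex is moved to the lowest-energy position of the m x m window
    (m odd) centred on it. Returns how many vertices moved.
    """
    n, half, moved = len(vertices), m // 2, 0
    for i in range(n):
        prev, nxt = vertices[i - 1], vertices[(i + 1) % n]
        x, y = vertices[i]
        # the current position comes first, so ties leave the vertex in place
        candidates = [(x, y)] + [(x + dx, y + dy)
                                 for dx in range(-half, half + 1)
                                 for dy in range(-half, half + 1)]
        best = min(candidates, key=lambda p: vertex_energy(p, prev, nxt, g, alpha, beta))
        if best != (x, y):
            vertices[i] = best
            moved += 1
    return moved
```

For instance, with α = β = 0 and g(p) = (p_x − 5)² + (p_y − 5)², one sweep with m = 3 moves each corner of the square [(0, 0), (10, 0), (10, 10), (0, 10)] one pixel diagonally towards (5, 5). Repeating sweeps until `greedy_sweep` returns 0 is one simple stopping rule.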
3 The Geodesically Linked Active Contour
We develop an approach taking advantage of the methods described above. Our geodesically linked active contour is based simultaneously on an explicit implementation of active contours, minimal paths and the greedy algorithm. Basically, we deal with an evolving explicit closed curve, allowing initialization inside or around the target object without providing fixed points. Minimal paths coupled with a geometric energy functional allow a parameterization-free handling of the contour. The use of the greedy algorithm, as opposed to gradient descent, guarantees better robustness to local minima.

3.1 Minimal Paths to Connect Vertices
Let us consider a set of n linked vertices $S = \{v_i\}_{1 \le i \le n}$. We denote as $\gamma_i(u) = [x_i(u)\ y_i(u)]$ the geodesic path connecting $v_i$ to $v_{i+1}$:

$$\gamma_i = \arg\min_{C} L(C) \quad \text{s.t.} \quad C(0) = v_i,\; C(1) = v_{i+1} \tag{8}$$

where the cost functional L is defined in eq. 5. At every step of the evolution algorithm, the set of geodesics $\{\gamma_i\}_{1 \le i \le n}$ describes a closed piecewise differentiable contour Γ, whose Euclidean length is:

$$L_{euclidean}(\Gamma) = \sum_{i=1}^{n}\int_0^1 \|\gamma_i'(u)\|\,du$$
One may note that a concatenation of geodesics $\gamma_i$ is not itself a geodesic, since it is forced to pass through given points. To some extent, curve Γ may be considered as a piecewise minimizer of an edge-based functional. If a uniform potential $\tilde{P}(x) = 1$ were chosen, the geodesics would become straight lines of equation $\gamma_i(u) = (1 - u)v_i + u v_{i+1}$, $u \in [0, 1]$, in which case Γ would be a polygon. Fig. 1 depicts geodesically linked contours with uniform potential and edge-based potential (dark smooth lines represent high image gradient areas). As described in Section 2.2, path $\gamma_i$ is determined by gradient descent on the minimal action map $U_{i+1}$ of origin $v_{i+1}$. Given the start point $v_{i+1}$, the Fast Marching algorithm [11] allows one or several end points to be specified (in our case $v_i$), so that propagation can be stopped when $v_i$ is reached. This prevents the whole image from being visited by the Fast Marching and saves computational time. In the case of edge-based segmentation, the interest of describing the evolving contour with geodesics is obvious. Indeed, at the end of the deformation, the
Fig. 1. Vertices linked by geodesics with uniform potential (a) and edge-based potential (b)
geodesics fit the actual boundaries of the sought object. On the other hand, in the case of region-based segmentation, image edges are not explicitly searched for. However, we believe that linking vertices with geodesics is relevant for any usual segmentation criterion. We may assume that the final contour should be partially located on more or less salient boundaries, whatever energy functional is optimized. In subsequent sections, we formulate three energies independently implemented on the geodesically linked active contour, namely the edge, region and narrow band region energies. Beforehand, we recall Green's theorem, which we use to convert domain integrals into boundary integrals. For every region R and real-valued function f over ℝ², we have:

$$\iint_R f(x)\,dx = \oint_{\partial R} P\,dx + Q\,dy \tag{9}$$

where $[P(x)\ Q(x)]$ is a continuously differentiable vector field such that:

$$Q(x, y) = \frac{1}{2}\int_{-\infty}^{x} f(t, y)\,dt, \qquad P(x, y) = -\frac{1}{2}\int_{-\infty}^{y} f(x, t)\,dt \tag{10}$$

The theorem requires ∂R to be at least piecewise smooth; it is thus applicable to the geodesically linked active contour. For instance, to express the area of region $R_{in}$ enclosed by Γ, we consider eq. 9 with f(x) = 1:

$$|R_{in}| = \frac{1}{2}\sum_{i=1}^{n}\int_0^1 \big(x_i(u)\,y_i'(u) - x_i'(u)\,y_i(u)\big)\,du$$

3.2 Edge Energy
Boundary-based segmentation is performed by minimizing an edge energy. The edge indicator function g is integrated along the geodesics. In order not to penalize long contours, the edge energy is normalized by the Euclidean length:

$$E_{edge}(\Gamma) = \frac{1}{L_{euclidean}(\Gamma)}\sum_{i=1}^{n}\int_0^1 g(\gamma_i(u))\,\|\gamma_i'(u)\|\,du$$

Note that according to eq. 6, the integral of g along $\gamma_i$ equals $U_{i+1}(v_i)$ minus $w\,L_{euclidean}(\gamma_i)$. Hence, once the action maps have been computed, the edge indicator does not need to be summed over the geodesics again. With the edge energy alone, if the search space of vertex coordinates is too small, the
contour fails to capture the actual boundaries when initialized far from them. To increase the capture range, we add an area-dependent term, whose minimization acts like a balloon force [5]:

$$E_{balloon}(\Gamma) = \frac{|D| - |R_{in}|}{|D|}$$

where |D| is the image area. In that case, the total energy is a weighted sum of the edge and balloon energies.

3.3 Region Energy
The increasing use of region terms is motivated by their ability to overcome the limitations of purely edge-based models, especially when dealing with data sets suffering from noise and lack of contrast between neighboring structures. Classical region-based deformable models segment images according to statistics computed over the object of interest and the background. Image partitions should be uniform in terms of pixel intensities or of higher level features such as texture descriptors. We rely on the intensity variance, which is close to the two-phase Mumford-Shah segmentation model by Chan and Vese [12]. The average intensity in the inner region is expressed using Green's theorem:

$$\mu(R_{in}) = \frac{1}{|R_{in}|}\iint_{R_{in}} I(x)\,dx = \frac{1}{|R_{in}|}\sum_{i=1}^{n}\int_0^1 \big(x_i'\,P(\gamma_i) + y_i'\,Q(\gamma_i)\big)\,du$$

where P and Q are the summed intensities (see the template formulas in eq. 10). Then, the inner intensity variance is:

$$\sigma^2(R_{in}) = \frac{1}{|R_{in}|}\iint_{R_{in}} \big(I(x) - \mu(R_{in})\big)^2\,dx = \frac{1}{|R_{in}|}\iint_{R_{in}} I^2(x)\,dx - \mu(R_{in})^2$$

where the integral of squared intensities may also be expanded according to Green's theorem. Corresponding quantities on the outer region may be expressed using the relation

$$\iint_{R_{out}} f(x)\,dx = \iint_D f(x)\,dx - \iint_{R_{in}} f(x)\,dx$$

3.4 Narrow Band Region Energy
The ideal case of uniform regions is rarely encountered in real applications, as the background usually contains structures of various intensities. Hence, strict homogeneity is not necessarily a desirable property. In order to account for spatially varying intensity, local statistics in region-based segmentation have emerged recently [13] [14]. The narrow band principle, which has proven its efficiency in the evolution of level sets [3], is used in our approach to formulate a local region term [15].
[Figure: contour Γ with parallel curves Γ[B] and Γ[−B] bounding the inner band B_in and the outer band B_out]
Fig. 2. Inner and outer bands for narrow band region energy
Instead of dealing with whole domains $R_{in}$ and $R_{out}$, we consider an inner band $B_{in}$ and an outer band $B_{out}$ in the vicinity of the contour, as depicted in Fig. 2. The narrow band region energy is the sum of the intensity variances over the two bands:

$$E_{band}(\Gamma) = \sigma^2(B_{in}) + \sigma^2(B_{out})$$

Our narrow band region energy is based on parallel curves [16]. We define curve $\gamma^{[B]}_i$ as a parallel curve of $\gamma_i$:

$$\gamma^{[B]}_i(u) = \gamma_i(u) + B\,n_i(u) \tag{11}$$
where B is the user-defined band thickness, constant along the curve, and $n_i$ is the inward unit normal to geodesic $\gamma_i$. Hereafter, we use the index [B] to denote all quantities related to the parallel curve. Band $B_{in}$ (respectively $B_{out}$) lies between the n geodesics $\gamma_i$ and their parallel curves $\gamma^{[B]}_i$ (respectively $\gamma^{[-B]}_i$). We assume that the geodesics are smooth enough so that their parallel curves neither self-intersect nor exhibit singularities. An important property resulting from the definition in eq. 11 is that the velocity vector of the parallel curve can be expressed as a function of the velocity vector of the initial curve, as well as its curvature and normal. Using the identity $n_i' = -\kappa_i \gamma_i'$, we have:

$$\big(\gamma^{[B]}_i\big)' = \gamma_i' + B\,n_i' = (1 - B\kappa_i)\,\gamma_i' \tag{12}$$
By a change of variable, an integral over the inner band $B_{in}$ may be expressed explicitly in terms of the curve and the band thickness:

$$\iint_{B_{in}} f(x)\,dx = \sum_{i=1}^{n}\int_0^1\int_0^B f(\gamma_i + b\,n_i)\,\|\gamma_i'\|\,(1 - b\kappa_i)\,db\,du \tag{13}$$
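We use the template formula in eq. 13 to express the mean and variance of intensities in the inner band:

$$\mu(B_{in}) = \frac{1}{|B_{in}|}\sum_{i=1}^{n}\int_0^1\int_0^B I(\gamma_i + b\,n_i)\,\|\gamma_i'\|\,(1 - b\kappa_i)\,db\,du$$

$$\sigma^2(B_{in}) = \frac{1}{|B_{in}|}\sum_{i=1}^{n}\int_0^1\int_0^B \big(I(\gamma_i + b\,n_i) - \mu(B_{in})\big)^2\,\|\gamma_i'\|\,(1 - b\kappa_i)\,db\,du$$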
and similarly for the outer band, replacing b with −b.
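As a numerical check of eq. (13), the sketch below discretizes the band integral with the midpoint rule in u and b and returns |B_in|, μ(B_in) and σ²(B_in) for a single closed curve; passing `side=-1` gives the outer band by replacing b with −b. The function names, the sampling counts and the parametric circle used as an example are assumptions for illustration only; in the model itself the curve would be the concatenation of the geodesics γ_i.

```python
import math


def band_stats(I, gamma, dgamma, normal, kappa, B, side=1, nu=400, nb=50):
    """Area, mean and variance of I over a band of thickness B along a closed curve.

    Midpoint-rule discretization of eq. (13) for one curve with u in [0, 1].
    side=+1 gives the inner band (b in [0, B]); side=-1 the outer band (b -> -b).
    """
    du, db = 1.0 / nu, B / nb
    samples = []  # (intensity, area weight) pairs
    for iu in range(nu):
        u = (iu + 0.5) * du
        x, y = gamma(u)
        nx, ny = normal(u)
        speed = math.hypot(*dgamma(u))
        k = kappa(u)
        for ib in range(nb):
            s = side * (ib + 0.5) * db
            weight = speed * (1.0 - s * k) * db * du  # Jacobian ||gamma'|| (1 - b kappa)
            samples.append((I(x + s * nx, y + s * ny), weight))
    area = sum(w for _, w in samples)
    mean = sum(v * w for v, w in samples) / area
    var = sum((v - mean) ** 2 * w for v, w in samples) / area
    return area, mean, var


def circle(R):
    """Counter-clockwise circle of radius R: gamma, gamma', inward normal, curvature."""
    t = 2 * math.pi
    return (lambda u: (R * math.cos(t * u), R * math.sin(t * u)),
            lambda u: (-t * R * math.sin(t * u), t * R * math.cos(t * u)),
            lambda u: (-math.cos(t * u), -math.sin(t * u)),
            lambda u: 1.0 / R)
```

For a circle of radius R, κ = 1/R satisfies n′ = −κγ′, and the computed areas match the annuli π(R² − (R − B)²) and π((R + B)² − R²); with I(x, y) = x² + y², R = 10 and B = 2, the inner-band mean is close to 82.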
3.5 Evolution with Greedy Algorithm
Vertices should be moved in order to minimize the selected energy. This is usually performed by gradient descent of the Euler-Lagrange equation. In our case, it is difficult to differentiate $E_{edge}$, $E_{region}$ or $E_{band}$ with respect to a given vertex $v_i$, since these energies depend on the geodesics ending at $v_i$ (see eq. (8)). The greedy algorithm presented in Section 2.3 provides a way to evolve vertices without differentiating the energy. Motion of curve points can always be decomposed into normal and tangential components. While the geometry of the curve is modified by normal displacements, tangential motion only affects the curve parameterization [4]. Since the distribution of vertices along the contour can be updated with a resampling technique, we only consider normal displacements in the greedy evolution. We define a normal-oriented window $W_N$ of length m centered at vertex $v_i$:

$$W_N(v_i) = \left\{ v_i + k\,n_{v_i} \;\middle|\; k \in \left[-\frac{m}{2}, \frac{m}{2}\right] \right\}$$

where $n_{v_i}$ is the inward unit normal vector, estimated by finite differences using the points adjacent to $v_i$ on the two geodesics meeting at $v_i$. Since steps between successive points in the window are integers, the window may be computed using a Bresenham-like algorithm.
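A minimal sketch of the normal-oriented window follows. It assumes the contour is traversed counter-clockwise in a y-up frame (so rotating the tangent by +90 degrees gives the inward normal) and that `prev_pt` and `next_pt` are the points adjacent to the vertex on its two geodesics. Simple per-step rounding (a DDA-style sampling) is used here in place of the Bresenham-like rasterization mentioned in the text; the function name is hypothetical.

```python
import math


def normal_window(v, prev_pt, next_pt, m):
    """Integer points of W_N(v) = {v + k n : k = -m//2, ..., m//2}.

    The normal is estimated by a central finite difference between the
    neighbouring contour points prev_pt and next_pt; for a counter-clockwise
    contour (y axis pointing up), rotating the tangent by +90 degrees gives
    the inward normal.
    """
    tx, ty = next_pt[0] - prev_pt[0], next_pt[1] - prev_pt[1]
    norm = math.hypot(tx, ty)
    nx, ny = -ty / norm, tx / norm
    window = []
    for k in range(-(m // 2), m // 2 + 1):
        p = (round(v[0] + k * nx), round(v[1] + k * ny))
        if p not in window:  # rounding may map two steps to one pixel
            window.append(p)
    return window
```

For a vertex at (10, 0) whose neighbours are (10, −3) and (10, 3), the inward normal is (−1, 0) and `normal_window((10, 0), (10, -3), (10, 3), 4)` lists the points from (12, 0) to (8, 0). With m = 50, as in the experiments, the window holds at most 51 pixels.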
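[Figure: vertices $v_{i-1}$, $v_i$, $v_{i+1}$, test position $\tilde{v}_i$ and geodesics $\tilde{\gamma}_{i-1}$, $\tilde{\gamma}_i$]
Fig. 3. Geodesics linking neighboring vertices to points in search window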
Greedy evolution is performed by moving vertex $v_i$ to the position in the window whose corresponding geodesically linked contour has the smallest energy. Let us consider a test position $\tilde{v}_i$ belonging to the window. The associated geodesics $\tilde{\gamma}_{i-1}$ and $\tilde{\gamma}_i$ link it to the neighbors of $v_i$, as depicted in Fig. 3. The energy of the corresponding geodesically linked contour $\tilde{\Gamma} = \{\gamma_1, \ldots, \gamma_{i-2}, \tilde{\gamma}_{i-1}, \tilde{\gamma}_i, \gamma_{i+1}, \ldots, \gamma_n\}$ is computed and compared to the energy of the initial contour Γ. All window points are tested in this way, so that the evolution scheme at iteration t is:

$$v_i^{(t+1)} = \arg\min_{\tilde{v}_i \in W_N(v_i^{(t)})} E(\tilde{\Gamma})$$
where E is one of the previously described energies. If we consider the set $H = \{1, \ldots, i-2\} \cup \{i+1, \ldots, n\}$ holding the indices of geodesics not influenced by a modification of $v_i$, all quantities involved in the energies can be written with constant and variable parts with respect to $\tilde{v}_i$. For instance, the area of the tested inner region is decomposed as:

$$|\tilde{R}_{in}| = \frac{1}{2}\sum_{j \in H}\int_0^1 \big(x_j y_j' - x_j' y_j\big)\,du + \frac{1}{2}\int_0^1 \big(\tilde{x}_{i-1}\tilde{y}_{i-1}' - \tilde{x}_{i-1}'\tilde{y}_{i-1}\big)\,du + \frac{1}{2}\int_0^1 \big(\tilde{x}_i\tilde{y}_i' - \tilde{x}_i'\tilde{y}_i\big)\,du$$

This implies that the part of the energies invariant with respect to $\tilde{v}_i$ needs to be computed only once, before moving $v_i$. Finally, once all vertices have been treated, resampling may be performed to maintain a consistent distribution of vertices along the curve.
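The caching of the invariant part can be illustrated on the area term. In this sketch (an assumption-laden illustration, not the authors' code) each geodesic is a polyline of pixel coordinates, its contribution ½∫(x y′ − x′ y) du is approximated by the discrete Green (shoelace) sum, and geodesics are indexed from 0 so that `geodesics[j]` runs from v_j to v_{j+1} cyclically; moving v_i then only changes `geodesics[i-1]` and `geodesics[i]`, and the remaining terms (the set H) are summed once. Both function names are hypothetical.

```python
def green_term(path):
    """(1/2) * sum(x dy - y dx) along one polyline: its share of the enclosed area."""
    return 0.5 * sum(x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in zip(path, path[1:]))


def area_evaluator(geodesics, i):
    """Return a function giving |R_in| for test geodesics around vertex i.

    geodesics[j] is the polyline from v_j to v_{j+1} (0-based, cyclic), so
    moving v_i only changes geodesics[i-1] and geodesics[i]; the other terms
    (the set H in the text) are summed once here.
    """
    n = len(geodesics)
    variable = {(i - 1) % n, i}
    constant = sum(green_term(g) for j, g in enumerate(geodesics) if j not in variable)

    def area(test_prev, test_next):
        # only the two geodesics linked to the test position are recomputed
        return constant + green_term(test_prev) + green_term(test_next)

    return area
```

For the counter-clockwise 4×4 square built from the polylines [(0,0),(2,0),(4,0)], [(4,0),(4,2),(4,4)], [(4,4),(0,4)] and [(0,4),(0,0)], the evaluator for vertex 1 returns 16 for the current geodesics and 18 for the test position (5, 0) linked by straight segments. Each window position then costs only two `green_term` evaluations instead of n.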
4 Experiments
We tested the geodesically linked active contour with the three different energy configurations (edge+balloon, region and narrow band region).

Fig. 4. Segmentation of left ventricle: initialization (a) and final location (b) of the geodesically linked active contour, initialization (c) and final location (d) of the parametric contour

A comparison with a parametric snake endowed with the same energies is provided. The snake was initialized as a small circle inside the area of interest, far from the target boundaries. Similarly, the initial vertices of our model are sampled on a circle. Results are shown in Fig. 4. For each row, columns (a) and (b) represent the initial and final states of the geodesically linked active contour, respectively. Columns (c) and (d) represent the same states for the snake. For all experiments, the regularization weight w was set to 0.25, which achieved sufficient regularization for all tested images. The size of the window was m = 50 and the maximal inter-vertex distance for resampling was set to 20. The image in row 1, which was segmented using the edge energy, depicts the gap-closing ability of the model. The geodesically linked active contour managed to pass through false edges and reach the actual boundaries. Thanks to the large search window, it turned out to be rather insensitive to the balloon strength, as values of coefficient α in the range [0.1, 4] were suitable. On the other hand, the balloon coefficient has a strong influence on the gradient descent-driven parametric snake, which makes parameter tuning difficult. Actually, it was not possible to find a balloon weight allowing the snake to jump over false edges while stabilizing on real ones. The image in row 2, which was segmented using the region energy, depicts a similar phenomenon. The geodesically linked contour does not get trapped in small gaps in the region, which could be of interest for the segmentation of partially occluded objects. Row 3 depicts an MRI of the heart left ventricle, which was used to put the narrow band region energy into application. The band thickness B is an important parameter. Apart from its impact on the algorithmic complexity (computing intensity means and variances on the bands takes at least O(nB) operations), it controls the trade-off between local and global features around the object.
If B = 1, the region energy is as local as an edge term. The main image property affecting the minimal band thickness is edge sharpness. Indeed, the deformable curve needs a larger band when the boundaries of the target object are fuzzy. In practice, B = 10 was a suitable value in our experiments. Note that we depict the state of the parametric snake before self-collision. One may note that an unconstrained region-based level set method would also properly segment the images in rows 2 and 3. However, this remark should be moderated by the fact that our model is dedicated to applications where topology preservation is needed.
5 Conclusion and Perspectives
We proposed the geodesically linked active contour model for image segmentation. The model relies on an explicitly implemented curve moved by an evolution method based on minimal paths and a greedy algorithm. Linking curve points with geodesics solves parameterization issues and allows the contour to fit the most salient boundaries at every step of the deformation. Displacing vertices according to a greedy search resulted in lower sensitivity to erroneous local minima than the usual gradient descent of the energy. The model was endowed with edge and
region energies and was validated on a few datasets. Further work may focus on developing an adaptive search window for the greedy evolution. Currently, the window length is constant regardless of the values of the energies or the previous positions of the vertices. We believe the algorithm could be improved by adapting the window length with respect to these properties, in order to avoid visiting positions that are unlikely to decrease the energy.
References
1. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: active contour models. International Journal of Computer Vision 1(4), 321–331 (1988)
2. Osher, S., Sethian, J.: Fronts propagation with curvature-dependent speed: algorithms based on Hamilton-Jacobi formulations. Journal of Computational Physics 79, 12–49 (1988)
3. Malladi, R., Sethian, J., Vemuri, B.: Shape modeling with front propagation: a level set approach. IEEE Transactions on Pattern Analysis and Machine Intelligence 17(2), 158–175 (1995)
4. Caselles, V., Kimmel, R., Sapiro, G.: Geodesic active contours. International Journal of Computer Vision 22(1), 61–79 (1997)
5. Cohen, L.: On active contour models and balloons. Computer Vision, Graphics, and Image Processing: Image Understanding 53(2), 211–218 (1991)
6. Amini, A., Weymouth, T., Rain, R.: Using dynamic programming for solving variational problems in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence 12(9), 855–867 (1990)
7. Geiger, D., Gupta, A., Luiz, A., Vlontzos, J.: Dynamic programming for detecting, tracking, and matching deformable contours. IEEE Transactions on Pattern Analysis and Machine Intelligence 17(3), 294–302 (1995)
8. Williams, D., Shah, M.: A fast algorithm for active contours and curvature estimation. Computer Vision, Graphics, and Image Processing: Image Understanding 55(1), 14–26 (1992)
9. Sakalli, M., Lam, K.M., Yan, H.: A faster converging snake algorithm to locate object boundaries. IEEE Transactions on Image Processing 15(5), 1182–1191 (2006)
10. Cohen, L., Kimmel, R.: Global minimum for active contour models: a minimal path approach. International Journal of Computer Vision 24(1), 57–78 (1997)
11. Sethian, J.: A fast marching level set method for monotonically advancing fronts. Proceedings of the National Academy of Science 93(4), 1591–1595 (1996)
12. Chan, T., Vese, L.: Active contours without edges. IEEE Transactions on Image Processing 10(2), 266–277 (2001)
13. Piovano, J., Rousson, M., Papadopoulo, T.: Efficient segmentation of piecewise smooth images. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 709–720. Springer, Heidelberg (2007)
14. Lankton, S., Tannenbaum, A.: Localizing region-based active contours. IEEE Transactions on Image Processing 17(11), 2029–2039 (2008)
15. Mille, J., Boné, R., Cohen, L.: Region-based 2D deformable generalized cylinder for narrow structures segmentation. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part II. LNCS, vol. 5303, pp. 392–404. Springer, Heidelberg (2008)
16. Farouki, R., Neff, C.: Analytic properties of plane offset curves. Computer Aided Geometric Design 7(1-4), 83–99 (1990)
Validation of Watershed Regions by Scale-Space Statistics
Tomoya Sakai and Atsushi Imiya
Institute of Media and Information Technology, Chiba University, Japan
{tsakai,imiya}@faculty.chiba-u.jp
Abstract. This paper shows a potential use of scale space for the statistical validation of watershed regions of a greyscale image. Watershed segmentation has difficulty in distinguishing valid watershed regions associated with real structures of the image from invalid random regions due to background noise. In this paper, a hierarchy of watershed regions is established by following the merging process of the regions in a Gaussian scale space. The distribution of the annihilation scales (lives) of the regional minima is investigated to statistically judge whether the regions are valid. Recursive validation using the hierarchy prevents oversegmentation due to randomness.
1 Introduction
The aim of this study is to develop a statistical validation scheme for the segmentation of a greyscale image. If we do not have a priori knowledge of the shapes or structures of objects in the image, topographic features of the greyscale image, and the watersheds in particular, are useful for unsupervised image segmentation. A well-known phenomenon in watershed segmentation is oversegmentation, that is, the production of a large number of undesired tiny regions. Since the undesired watershed regions are mainly caused by noise in the image, it is desirable to settle the oversegmentation problem by taking account of the statistical properties of the randomness. There is a body of literature dealing with the oversegmentation problem of watersheds [1,2,3,4,5,6,7,8]. In the antecedent work, most schemes for preventing oversegmentation attempt to hierarchically merge the oversegmented regions on the basis of similarity between adjacent regions measured by the MDL [3], colour distance [8], and so on. Diffusion-based multiscale image representations are also used for merging the regions [5, 6, 8], since scale-space theory [9,10,11,12,13,14,15] mathematically underpins the topological relationships among the topographic features without a priori knowledge about them. Oversegmentation can be reduced by selecting levels in the hierarchy of regions, or by setting a lower bound on the scale, above which the watersheds are valid and below which they are invalid. In this paper, we show that the scale-space treatment of the image is also useful for the statistical analysis of the random watershed regions. The validity
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 175–186, 2009. © Springer-Verlag Berlin Heidelberg 2009
T. Sakai and A. Imiya
of a watershed region can be quantified in terms of the statistical confidence of distinguishing it from invalid watershed regions due to randomness. We present a fully unsupervised watershed segmentation algorithm, in which the watershed regions are recursively validated according to their hierarchical relationships in the scale space.
2 Watershed Segmentation with Variable Scale

2.1 Gaussian Scale Space
In the Gaussian scale-space theory [9,10,11,12,14,15,16], a one-parameter family of nonnegative functions is derived from a d-dimensional greyscale image f(x), $x \in \mathbb{R}^d$:

$$f(x, \sigma) = G(x, \sigma) * f(x) \tag{1}$$

Here, "∗" expresses d-dimensional convolution, and G(x, σ) is an isotropic Gaussian function with scale σ:

$$G(x, \sigma) = \frac{1}{(\sqrt{2\pi}\,\sigma)^d}\exp\left(-\frac{|x|^2}{2\sigma^2}\right) \tag{2}$$

We redefine the d-dimensional greyscale image and its scale-space representation in the extended real scale and space as follows.

Definition 1. A d-dimensional greyscale image is defined as a nonnegative scalar function f(x), $x \in \bar{\mathbb{R}}^d$, with a finite net image intensity $\int_{\bar{\mathbb{R}}^d} f(x)\,dx$.

Definition 2. The scale-space image f(x, σ), $(x, \sigma) \in (\bar{\mathbb{R}}^d, \bar{\mathbb{R}}_+)$, is the convolution of the greyscale image f(x) with the isotropic Gaussian kernel G(x, σ).

Here, $\bar{\mathbb{R}}^d$ and $\bar{\mathbb{R}}_+$ denote the d-dimensional extended real space including a point at infinity and the extended real scale including an infinite scale, respectively. Although the domain of a greyscale image in practice is bounded within a limited area or volume, we embed such an image in the extended real scale space. The point at infinity will be used theoretically as a representative point of the background of the image in the watershed segmentation later.

2.2 Watershed Segmentation and Hierarchy of Regions
The watershed segmentation was derived from spatial partitioning on the basis of the drainage patterns of rainfall. As a topographic height map defines the boundaries of the catchment basins draining to the same lowest points, a two-dimensional greyscale image defines watershed boundary curves enclosing regions with local minima when we regard the image intensity as the topographic height. For a d-dimensional image, the entire space is partitioned by (d − 1)-dimensional hypersurfaces into d-dimensional watershed regions. Each watershed region defined by a smooth function f(x) contains a unique local minimum, to which any point in the watershed region is connected by a gradient curve of f(x). In practice, the watershed segmentation of the gradient image |∇f(x)| is known to provide more intuitive partitions than that of the image f(x) itself [2, 5, 6, 8], because object boundaries in a scene may cause large spatial changes in the image intensity. Simple computation of the watersheds of the images results in oversegmentation caused by tiny and insignificant catchment basins. As suggested in earlier work [3, 5, 6, 8], hierarchical relationships among the watershed regions are of great help for merging the oversegmented regions. We employ the scale-space framework to derive the hierarchy because the scale-space axioms are acceptable in the general case, where no prior information about the similarities among the unexpected watershed regions is given. If we apply the gradient watershed segmentation to the image f(x, σ) with variable scale σ, we can observe the evolution of the watersheds with respect to scale. Catastrophe theory applied to the gradient watershed segmentation in the Gaussian scale space [5] shows that the gradient watershed regions of f(x, σ) may be generically annihilated, merged, created and split with increasing scale σ. Therefore, hierarchical watershed segmentation using a multiscale representation of the image [2, 6, 8] is essentially the extraction of the hierarchical relationships among the watershed regions in the scale space through these generic events. Since every watershed region is represented by its local minimum, the trajectories of the regional minima in scale space describe the relationships among the regions. For the purpose of validating the regions, we derive the hierarchy from all the regional minima traceable from the finest scale along their trajectories in scale space. We trace the trajectories by local minimisation at every level of scale [16].
Validation of Watershed Regions by Scale-Space Statistics
In an annihilation or merging event, two regional minima and a saddle between them are involved. We regard one of these two regional minima as a child of the regional minimum that remains after the event. After a creation or splitting event we trace only one of the two local minima, because we are interested in the hierarchy of the regions at the finest scale. Note that the point at infinity is a local minimum that exists at every scale. The local minimum at infinity is the regional minimum of the image background, because the rainfall in the background region is drained to this ideal point. The following algorithm RegionHierarchy traces the trajectory of the regional minimum starting from every pixel p ∈ P at σ = 0 until the regional minimum disappears or leaves the image boundary towards the local minimum at infinity with increasing scale.
RegionHierarchy(set of pixel centres P, image f(p ∈ P))
1   let G be a graph with card(P) + 1 nodes with the labels l = 0, ..., N, where l = 0 represents the point at infinity;
2   store σ^t_l = ∞ in all nodes of G;
3   set σ_max to be the size of the convex hull of P;
4   σ := 0;
5   Q := P;
6   while card(Q) ≠ 1 or σ < σ_max do
7       Q′ := Q;
8       σ := σ + Δσ, where Δσ is small compared with the spacing of the points Q;
9       for each q_l ∈ Q do
10          update q_l by minimising |∇f(x, σ)|² with q_l as the initial position¹;
11          if q_l is outside the convex hull of P then
12              connect the two nodes of G labelled 0 and l;
13          end if
14      end for
15      let L be a list of labels corresponding to the points in Q;
16      while card(L) ≠ 1 do
17          pop a label l from L;
18          n := NearestNeighbour(L, l);
19          if |q_l − q_n| < εσ, where q_l, q_n ∈ Q and ε is the tolerance of minimisation then
20              if |q_l − q′_l| > |q_n − q′_n|, where q′_l, q′_n ∈ Q′ then
21                  child := l and parent := n;
22              else
23                  child := n and parent := l;
24              end if
25              connect the two nodes of G labelled parent and child;
26              set σ^t_child := σ;
27              remove q_child from Q;
28          end if
29      end while
30  end while
31  return G.
The resulting graph G is a set of trees representing the hierarchy of the watershed regions of the gradient image. Any node in G represents a watershed region consisting of the pixels indicated by the nodes of its subtree. The annihilation or merging scale σ^t of the regional minimum traced from a pixel p is stored at the node of G corresponding to p. We use bicubic spline interpolation [17] to search for the local minimum with subpixel precision in Step 10. The function NearestNeighbour in Step 18 searches for the point nearest to q_l among the points listed in L and returns its label. The annihilation or merging event is detected in Step 19, and the one of the two regional minima with the larger displacement is identified as the child in Step 20. Figure 1 shows an example of the trajectories of regional minima and the region hierarchy obtained by RegionHierarchy. Since the set of trees G expresses hierarchical relationships among the image pixels, any tree node with a scale σ > 0 represents a set of pixels constituting a watershed region.
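The child/parent pairing of Steps 15–27 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `merge_step`, the dictionary representation of Q, Q′ and the edges of G, and the sorted pop order of L are hypothetical choices, and the minimisation of Step 10 is assumed to have already produced the current positions.

```python
import math


def merge_step(prev, curr, sigma, eps, parent, life):
    """One pass of the pairing loop of Steps 15-27 (hypothetical sketch).

    prev, curr: dict label -> position tuple at the previous / current scale
                (Q' and Q); the label of each merged child is removed from curr.
    parent:     dict child label -> parent label (edges added to G).
    life:       dict label -> annihilation scale sigma^t.
    """
    labels = sorted(curr)                      # Step 15: the list L
    while len(labels) > 1:                     # Step 16
        lbl = labels.pop()                     # Step 17
        # Step 18: nearest remaining point to q_lbl
        nbr = min(labels, key=lambda m: math.dist(curr[lbl], curr[m]))
        # Step 19: closer than eps * sigma means the two minima have met
        if math.dist(curr[lbl], curr[nbr]) < eps * sigma:
            # Step 20: the minimum that moved farther becomes the child
            if math.dist(curr[lbl], prev[lbl]) > math.dist(curr[nbr], prev[nbr]):
                child, par = lbl, nbr
            else:
                child, par = nbr, lbl
            parent[child] = par                # Step 25
            life[child] = sigma                # Step 26
            del curr[child]                    # Step 27
            if child in labels:
                labels.remove(child)


prev = {1: (0.0,), 2: (1.0,), 3: (10.0,)}
curr = {1: (0.45,), 2: (0.5,), 3: (10.0,)}
parent, life = {}, {}
merge_step(prev, curr, sigma=1.0, eps=0.1, parent=parent, life=life)
print(parent, life)  # prints: {2: 1} {2: 1.0}
```

In a complete tracer this pass would run once per scale increment Δσ; the handling of minima that leave the convex hull (Steps 11–13) is omitted here.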
¹ It is trivial that the watershed regions of the gradient magnitude squared |∇f|² are identical to those of the gradient magnitude |∇f|.
[Figure 1: panels (a), (b), (c); panel (c) plots the trajectories against the scale axis σ]
Fig. 1. Trajectories of regional minima and region hierarchy. (a) A noisy 96 × 96 image f(x) embedded in a dark background. (b) Gradient magnitude squared |∇f(x, σ = 20)|². Brighter pixels indicate larger magnitude. (c) Trajectories of the regional minima in scale space. The thick (blue) curves are the parts of the trajectories for σ > 5. The thin straight (red) lines are the edges of G between nodes with σ > 5.
2.3 Scale Selection Problem
We need a criterion to select the scales, or the tree levels in the hierarchy. One may expect that the watersheds of the image f(x, σ) at a small scale σ approximate the boundaries of the true image regions well. However, if noise spoils the fine structure of the image, the estimated watersheds at small scales are stochastic and experimentally less reproducible. The noise is suppressed at a large scale, but there the watershed segmentation is poor in terms of detection ability and localisation: the edges of small watershed regions are smoothed out, and the boundary shapes of large regions are simplified. Since randomness is the major cause of the oversegmentation problem in watershed methods [1, 4, 5], the oversegmentation problem should be resolved in a statistical manner.
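The following hypothetical 1D sketch illustrates Eqs. (1) and (2) and the noise-suppression argument above: it smooths Gaussian white noise at several scales and counts the strict local minima of the squared gradient, i.e. the number of gradient watershed regions. It is only an illustration, assuming periodic boundaries, a kernel truncated at 4σ, central differences and an arbitrary sample size; unlike RegionHierarchy it does not trace trajectories across scales.

```python
import math
import random


def gaussian_kernel(sigma):
    """Sampled, normalised 1D Gaussian G(x, sigma) of Eq. (2), truncated at 4 sigma."""
    r = max(1, int(math.ceil(4 * sigma)))
    k = [math.exp(-x * x / (2 * sigma * sigma)) for x in range(-r, r + 1)]
    total = sum(k)
    return [v / total for v in k]


def smooth(f, sigma):
    """f(x, sigma) = G(x, sigma) * f(x), Eq. (1), with periodic boundaries (assumed)."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    n = len(f)
    return [sum(k[j] * f[(i + j - r) % n] for j in range(len(k))) for i in range(n)]


def count_gradient_minima(f, sigma):
    """Number of strict local minima of |d/dx f(x, sigma)|^2 on a periodic grid."""
    g = smooth(f, sigma)
    n = len(g)
    grad2 = [((g[(i + 1) % n] - g[i - 1]) / 2) ** 2 for i in range(n)]
    return sum(1 for i in range(n)
               if grad2[i] < grad2[i - 1] and grad2[i] < grad2[(i + 1) % n])


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
counts = {s: count_gradient_minima(noise, s) for s in (1, 2, 4, 8)}
print(counts)
```

The counts should decrease as σ grows, which mirrors the trade-off described above between many noisy regions at small scales and oversimplified regions at large scales; the exact numbers depend on the random seed.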
3 Validation of Watershed Regions
3.1 Valid Watershed Regions
Generally, a greyscale image expresses the spatial distribution of a measured physical quantity. The true image f_true(x), which we want to measure and to which we want to apply the watershed segmentation, is inevitably spoiled by random noise during measurement. Therefore, the actual image f(x) presents valid watersheds related to those of the true image f_true(x) and invalid watersheds due to the randomness.
Assertion 1. A valid watershed region of an observed gradient image |∇f(x)| is related to one of the watershed regions of the true gradient image |∇f_true(x)|.
Since the watershed regions are represented by the regional minima, the image f(x) has valid watershed regions of the gradient image |∇f(x)| iff the true gradient image |∇f_true(x)| has corresponding local minima. Contrapositively, iff |∇f_true(x)| is a featureless image without any local minimum, no valid watershed exists for any observation f(x), which should then be considered an image of the background only. Because of Definition 1, this condition means that f_true(x) = 0 everywhere in $\bar{\mathbb{R}}^d$. Therefore, f(x) for f_true(x) = 0, i.e., the noise image, produces only invalid watershed regions. A valid watershed region must be statistically distinguishable from such invalid regions. From this viewpoint, the validity of a watershed region is interpreted as the statistical confidence in rejecting the following null hypothesis.
Null hypothesis H0: The watershed region is that of the noise image.
Alternative hypothesis H1: The watershed region is not that of the noise image.
The null hypothesis H0 is rejected if the regional minimum is distinguishable from those of the noise image using test statistics.
3.2 Life Distribution
An important fact is that the randomness of the image f(x, σ) is filtered out as the scale σ increases, and deterministic features of the image f(x) emerge at large scales. In other words, deterministic features such as the valid watershed regions are established from coarse to fine. There presumably exists a critical lower bound of scale, above which the watersheds of f(x, σ) are valid and below which they are invalid. In order to observe how the valid regions survive until large scales against the scale-space filtering, we define the life of a watershed region.
Definition 3. The life of a watershed region is defined as the annihilation scale σ^t of its regional minimum.
Let W be the distribution of the lives of the watershed regions of |∇f(x, σ)| for an image of random noise. If W can be modelled parametrically, a goodness-of-fit test can be performed under the null hypothesis H0. That is, if an image f(x) is an observation of a true uniform image with noise, then the model of W fits the distribution of lives {σ^t} of its watershed regions, and H0 is accepted for every watershed region of f(x). We experimentally investigate the life distribution W for the gradient watershed regions of a Gaussian white noise image, as shown in Fig. 2(a). We averaged the frequencies of lives over one hundred noise images. We discard the lives of pixel points whose annihilations are detected in 0 < σ ≤ Δσ by RegionHierarchy, because not all the pixel centres are local minima. Figure 2(b) is the averaged histogram of lives. The obtained life histogram has a unimodal shape. This implies that there exists a scale at which the merging of regions occurs most frequently. The regional minima of the noise image are uniformly distributed random points, and the regions tend to merge with their nearest regions. Therefore, we deduce that this unimodal property is associated with the distribution of the nearest neighbour distances of random points.
In fact, the nearest neighbour distance distribution has a unimodal shape (see Appendix A). The scale of the mode can be used as a gauge of the density of invalid regions. Regional minima with significantly large values of life, lying outside the unimodal distribution W, can be identified as valid, because such regional minima are distinguishable from the invalid regional minima of the noise image.
[Figure 2: (a) noise image; (b) averaged life histogram, relative frequency versus σ]
Fig. 2. Noise image and the averaged life histogram for its gradient watershed regions. (a) The noise image has uncorrelated random pixel values. (b) The life histogram shows the relative frequency of the scale at which the regional minima of the gradient image are annihilated as Gaussian blurring of the noise image proceeds.
3.3 Recursive Validation
We can set a critical value of the scale to judge the watershed regions valid or invalid. Although computing such a critical scale in the strict statistical sense requires a parametric model of the life distribution, the critical scale can be roughly estimated from the peak and the decay of the life histogram. If the image contains valid regions, the life histogram may be multimodal, or may have a peak at a small scale relative to the outlying lives representing the valid regions. According to our experimental result in Section 3.2, a regional minimum whose life is more than six times the peak scale can be considered valid with a statistical confidence level α > 99%, under the assumption that the noise consists of uncorrelated Gaussian random pixel values of a two-dimensional image. We present an algorithm RegionDiscovery for the discovery of the valid watershed regions. This algorithm recursively validates the regions in a top-down fashion, using each tree T in the graph G obtained by RegionHierarchy. Following the hierarchy, any discovered region is split into subregions as long as they are valid. Each subregion is validated using the life histogram constructed from the lives stored in the subtree of T corresponding to the subregion.
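A minimal sketch of an IsOutlier test based on the rule of thumb above (a life exceeding six times the peak of the life histogram) might look as follows. The names `life_mode` and `is_outlier`, the fixed histogram bin width and the Δσ cutoff are assumptions for illustration; the factor of six is tied to the noise model stated above (uncorrelated 2D Gaussian noise) and would need recalibration for other noise.

```python
def life_mode(lives, bin_width, delta_sigma):
    """Centre of the fullest histogram bin of lives, ignoring lives <= delta_sigma."""
    counts = {}
    for s in lives:
        if s > delta_sigma:              # discard annihilations in (0, delta_sigma]
            b = int(s // bin_width)
            counts[b] = counts.get(b, 0) + 1
    # ties are resolved towards the smaller scale; assumes at least one life is kept
    best = max(counts, key=lambda b: (counts[b], -b))
    return (best + 0.5) * bin_width


def is_outlier(life, lives, bin_width=0.5, delta_sigma=0.1, factor=6.0):
    """Hypothetical IsOutlier: flag a life more than `factor` times the histogram peak."""
    return life > factor * life_mode(lives, bin_width, delta_sigma)


lives = [0.05, 1.1, 1.2, 1.3, 1.4, 2.2, 3.1, 12.0]
print(life_mode(lives, 0.5, 0.1), is_outlier(12.0, lives), is_outlier(3.1, lives))
# prints: 1.25 True False
```

A histogram-mode estimate is the simplest choice; a smoothed (kernel) density estimate would make the peak less sensitive to the bin width.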
RegionDiscovery(tree T, set of valid regions V, significance level α)
    let Σ be the set of life values stored in T except at the root;
    let s be the subroot node of T with the largest life value σ_max ∈ Σ;
    if IsMultimodal(Σ) or IsOutlier(σ_max, Σ, α) then
        RegionDiscovery(Subtree(T, s), V, α);
        RegionDiscovery(T \ Subtree(T, s), V, α);
    else
        push the region R := Pixels(T) into V;
    end if.
Here, the function IsMultimodal returns true if the histogram of Σ is not unimodal. IsOutlier returns true if the life σ_max is greater than the critical α-level of scale computed from the given set of lives Σ. Note that these functions discard the lives in 0 < σ ≤ Δσ. Subtree extracts the subtree with subroot node s from the tree T. Pixels returns the set of pixels whose labels are recorded at the nodes of the given tree. The following function, Watershed, executes our watershed segmentation algorithm for a given image f with a set of pixels P and a significance level α. It returns the set of valid watershed regions consisting of subsets of P.
Watershed(set of pixel centres P, image f, significance level α)
    set V := ∅;
    G := RegionHierarchy(P, f);
    for each tree T in G do
        RegionDiscovery(T, V, α);
    end for
    return V.
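The recursion of RegionDiscovery can be sketched in Python as follows, with a tree stored as a child-list dictionary and each node standing for one pixel label. The names `region_discovery`, `subtree_nodes` and the predicate `is_valid_split` (a stand-in for IsMultimodal or IsOutlier, for instance the six-times-the-peak rule of Section 3.3) are hypothetical, and choosing s as the non-root node with the largest life is our reading of "subroot node". The toy predicate `smax > 10.0` below is purely illustrative.

```python
import math


def subtree_nodes(children, s):
    """Subtree(T, s): s and all of its descendants."""
    out, stack = set(), [s]
    while stack:
        v = stack.pop()
        out.add(v)
        stack.extend(children.get(v, []))
    return out


def region_discovery(root, nodes, children, life, is_valid_split, regions):
    """Top-down recursive validation in the spirit of RegionDiscovery (sketch).

    root, nodes: the current tree T, given by its root and its node set.
    life: dict node -> annihilation scale sigma^t.
    is_valid_split(sigma_max, lives): stand-in for IsMultimodal or IsOutlier.
    regions: list collecting the valid regions V as sets of pixel labels.
    """
    others = [v for v in nodes if v != root]        # Sigma excludes the root
    if others:
        s = max(others, key=lambda v: life[v])       # node with the largest life
        if is_valid_split(life[s], [life[v] for v in others]):
            sub = subtree_nodes(children, s) & nodes
            region_discovery(s, sub, children, life, is_valid_split, regions)
            region_discovery(root, nodes - sub, children, life, is_valid_split, regions)
            return
    regions.append(set(nodes))                       # push Pixels(T) into V


children = {0: [1, 2], 1: [3], 2: [4]}
life = {0: math.inf, 1: 20.0, 2: 1.0, 3: 0.8, 4: 0.9}
regions = []
region_discovery(0, {0, 1, 2, 3, 4}, children, life,
                 lambda smax, lives: smax > 10.0, regions)
print(regions)
```

Each recursive call works on a strictly smaller node set, so the recursion terminates; the outer Watershed loop would simply call this once per tree of G.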
[Figure 3: panels (a), (b), (c); panel (b) has a scale axis from 5 to 30]
Fig. 3. An example of our watershed segmentation of a noisy image. (a) Original image with 20% noise. (b) Trajectories of local minima of the gradient magnitude of (a) in scale space. The trajectories reaching out of the spatial domain are subordinate to the local minimum at infinity. (c) Watershed regions of the gradient magnitude obtained by the algorithm Watershed. The brightness indicates the order of lives.
4 Test Example
We demonstrate our gradient watershed segmentation Watershed on a noisy greyscale image. The purpose of this section is not to test the performance of the algorithm, but to show that statistics in scale space have the potential to discover the valid watershed regions without any prior information about them. Figure 3(a) shows a 128 × 128 test image f(x) with 20% additive noise [18]. The trajectories of local minima of |∇f(x, σ)| traced from σ = 0 in scale space are shown in Fig. 3(b). We see a large number of local minima created by the noise at small scales. As the scale increases, the local minima are hierarchically grouped, and representative local minima survive at larger scales. Figure 3(c) shows the segmentation result with a confidence level α = 99% for f(x). There are nine discovered regions clearly corresponding to the major regions of the original image. The tiny faults in the regions were caused by failures in the minimisation: the affected pixels were wrongly assigned to the image background, which should be fixed in future work. For comparison, Fig. 4 shows the results of simple watershed segmentation at a few levels of scale without using the region hierarchy or statistics in scale space. We see invalid small regions at small scales, while the shapes of valid regions at large scales are distorted. It is remarkable that structural and statistical analyses using scale space can reconstruct the precise edges of statistically valid watershed regions despite the significant noise.
[Figure 4: three columns at σ = 2, σ = 6 and σ = 12]
Fig. 4. Watershed segmentation of Fig. 3(a) at different scales. First row: the scale-space image f(x, σ). Second row: the gradient magnitude |∇f(x, σ)|. Third row: the watersheds of |∇f(x, σ)|. Each column corresponds to the scale indicated below it.
5 Concluding Remarks
The scale-space treatment of the image clarifies not only the hierarchical relationships among the watershed regions but also their statistical properties. We can observe in the Gaussian scale space how the random features are suppressed and deterministic features emerge as the scale grows. A valid watershed region must be statistically distinguishable from unreproducible regions caused by the random features. Reproducibility is a desirable property of image recognition techniques. On the basis of this simple requirement, we formulated the null hypothesis H0, which is to be rejected if the watershed region is valid. A watershed region is recognised as valid at the statistical confidence level at which H0 is rejected. We presented a validation scheme for watershed segmentation using statistics in scale space. We defined the life of a watershed region, whose distribution is useful for testing H0. We showed that the life distribution for the noise image is unimodal, and that the valid regions can be identified as the regional minima with significantly large lives lying outside the unimodal distribution. The statistical properties of the life and the region hierarchy enable the recursive validation of the watershed regions. A distinctive feature of our scheme is that it does not require any similarity or dissimilarity measure between watershed regions, such as those used in many methods for preventing oversegmentation. Instead, we focused on the statistical differences between the valid and invalid regions in scale space. To take advantage of the potential of scale-space statistics, our scheme requires further investigation, especially of the model of the life distribution, as well as improvement and acceleration of the algorithms to obtain feasible segmentation results for larger real images.
References
1. Vincent, L., Soille, P.: Watersheds in digital spaces: An efficient algorithm based on immersion simulations. IEEE Trans. on Pattern Analysis and Machine Intelligence 13(6), 583–598 (1991)
2. Beucher, S.: Watershed, hierarchical segmentation and waterfall algorithm. In: Proc. Math. Morphology and Its Appl. to Image Processing, pp. 69–76 (1994)
3. Maes, F., Vandermeulen, D., Suetens, P., Marchal, G.: Computer-aided interactive object delineation using an intelligent paintbrush technique. In: Ayache, N. (ed.) CVRMed 1995. LNCS, vol. 905, pp. 77–83. Springer, Heidelberg (1995)
4. Hagyard, D., Razaz, M., Atkin, P.: Analysis of watershed algorithms for greyscale images. In: Proc. of IEEE Intl. Conf. Image Processing, vol. 3, pp. 41–44 (1996)
5. Olsen, O.F., Nielsen, M.: Generic events for the gradient squared with application to multi-scale segmentation. In: ter Haar Romeny, B.M., Florack, L.M.J., Viergever, M.A. (eds.) Scale-Space 1997. LNCS, vol. 1252, pp. 101–112. Springer, Heidelberg (1997)
6. Gauch, J.M.: Image segmentation and analysis via multiscale gradient watershed hierarchies. IEEE Trans. on Image Processing 8(1), 69–79 (1999)
7. Roerdink, J.B.T.M., Meijster, A.: The watershed transform: definitions, algorithms, and parallelization strategies. Fundamenta Informaticae 41, 187–228 (2001)
8. Vanhamel, I., Pratikakis, I., Sahli, H.: Multiscale gradient watersheds of color images. IEEE Trans. on Image Processing 12(6), 617–626 (2003)
9. Witkin, A.P.: Scale space filtering. In: Proc. of 8th IJCAI, pp. 1019–1022 (1986)
10. Koenderink, J.J.: The structure of images. Biological Cybernetics 50, 363–370 (1984)
11. Lindeberg, T.: Scale-Space Theory in Computer Vision. Kluwer, Boston (1994)
12. Weickert, J., Ishikawa, S., Imiya, A.: Linear Scale-Space has First been Proposed in Japan. Journal of Mathematical Imaging and Vision 10, 237–252 (1999)
13. Lifshitz, L.M., Pizer, S.M.: A multiresolution hierarchical approach to image segmentation based on intensity extrema. IEEE Trans. on Pattern Analysis and Machine Intelligence 12(6), 529–540 (1990)
14. Florack, L.M.J., Kuijper, A.: The topological structure of scale-space images. Journal of Mathematical Imaging and Vision 12(1), 65–79 (2000)
15. Kuijper, A.: The deep structure of Gaussian scale-space images. PhD thesis, Utrecht University (2002)
16. Sakai, T., Imiya, A.: Gradient structure of image in scale space. Journal of Mathematical Imaging and Vision 28(3), 243–257 (2007)
17. Keys, R.: Cubic convolution interpolation for digital image processing. IEEE Trans. on Acoustics, Speech, and Signal Processing 29(6), 1153–1160 (1981)
18. SAMPL database, http://sampl.ece.ohio-state.edu/database.htm
19. Suwa, N.: Quantitative morphology: stereology for biologists. Iwanami Shoten (1977) (in Japanese)
A Distribution of Nearest Neighbour Distances
We present a proof that the nearest neighbour distances obey the Weibull distribution if the points in ℝ^d are uniformly distributed in a Poisson arrangement [19]. The Poisson arrangement is defined as the uniformly random distribution of points with constant density ρ such that the number of points x in a fixed volume V follows the Poisson distribution
$$ \mathrm{Po}(x;\lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!}. \qquad (3) $$
Here, λ = ρV is the expected number of points in the volume V. Let r be the distance from an arbitrary point. The distribution of the nearest neighbour distances, p(r), can be regarded as the probability that the nearest neighbour is found in an infinitesimal shell between r and r + δr. This is the case in which no points are found within distance r, and at least one point is found between r and r + δr. Since the volume V_d of a unit d-ball and its surface area S_{d−1} satisfy d V_d = S_{d−1}, we have
$$ p(r)\,\delta r = \mathrm{Po}(0;\rho V_d r^d)\,\bigl[1 - \mathrm{Po}(0;\rho S_{d-1} r^{d-1}\delta r)\bigr] = \exp(-\rho V_d r^d)\,\bigl[1 - \exp(-\rho S_{d-1} r^{d-1}\delta r)\bigr] \approx \exp(-\rho V_d r^d)\,\rho S_{d-1} r^{d-1}\,\delta r = \exp(-\rho V_d r^d)\,\rho V_d\, d\, r^{d-1}\,\delta r. $$
Letting $s = 1/\sqrt[d]{\rho V_d}$, i.e. the radius of a d-ball whose volume equals the average volume 1/ρ per point, we obtain the Weibull distribution
$$ p(r; s, d) = \frac{d}{s}\left(\frac{r}{s}\right)^{d-1} \exp\left[-\left(\frac{r}{s}\right)^{d}\right], \qquad (4) $$
where s and d correspond to the so-called scale and shape parameters of the Weibull distribution, respectively. This distribution p(r; s, d) has its mode at $r = s\,\sqrt[d]{(d-1)/d}$. For a fixed dimensionality, the mode depends only on the scale parameter s, which enables us to calculate the point density ρ from the mode.
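Eq. (4) and its mode can be checked numerically with the following sketch, which uses only the Python standard library. The function names are illustrative, and the Monte Carlo check assumes n uniform points on a unit torus (wrap-around distances avoid boundary effects), which only approximates a Poisson arrangement with density ρ = n.

```python
import math
import random


def unit_ball_volume(d):
    """V_d, the volume of the unit d-ball."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def weibull_scale(rho, d):
    """s = (rho * V_d) ** (-1/d): radius of a d-ball holding one point on average."""
    return (rho * unit_ball_volume(d)) ** (-1.0 / d)


def nn_pdf(r, rho, d):
    """Eq. (4): p(r; s, d) = (d/s) (r/s)^(d-1) exp(-(r/s)^d)."""
    s = weibull_scale(rho, d)
    return (d / s) * (r / s) ** (d - 1) * math.exp(-((r / s) ** d))


def nn_mode(rho, d):
    """Mode of Eq. (4): r = s * ((d - 1) / d) ** (1/d)."""
    return weibull_scale(rho, d) * ((d - 1) / d) ** (1.0 / d)


def density_from_mode(mode, d):
    """Invert nn_mode to estimate the point density rho from an observed mode."""
    return ((d - 1) / d) / (unit_ball_volume(d) * mode ** d)


def mean_nn_distance_torus(n, d, seed=0):
    """Monte Carlo mean nearest-neighbour distance of n uniform points on the unit torus."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(d)] for _ in range(n)]
    total = 0.0
    for i, p in enumerate(pts):
        best = math.inf
        for j, q in enumerate(pts):
            if i != j:
                # periodic (wrap-around) squared distance avoids boundary effects
                d2 = sum(min(abs(a - b), 1.0 - abs(a - b)) ** 2 for a, b in zip(p, q))
                best = min(best, d2)
        total += math.sqrt(best)
    return total / n


# The Weibull mean is s * Gamma(1 + 1/d); compare it with a 2D simulation.
n_pts, dim = 500, 2
print(mean_nn_distance_torus(n_pts, dim), weibull_scale(n_pts, dim) * math.gamma(1 + 1 / dim))
```

The two printed values should agree to within a few percent; `density_from_mode` shows how ρ could be recovered from an observed mode, as stated above.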
Adaptation of Eikonal Equation over Weighted Graph
Vinh-Thong Ta, Abderrahim Elmoataz, and Olivier Lézoray
Université de Caen Basse-Normandie, GREYC CNRS UMR 6072, Image Team
{vinhthong.ta,abderrahim.elmoataz-billah,olivier.lezoray}@unicaen.fr
http://www.info.unicaen.fr/~vta
Abstract. In this paper, an adaptation of the eikonal equation is proposed by considering the latter on weighted graphs of arbitrary structure. This novel approach is based on a family of discrete morphological local and nonlocal gradients expressed by partial difference equations (PdEs). Our formulation of the eikonal equation on weighted graphs generalizes local and nonlocal configurations in the context of image processing and extends this equation for the processing of any unorganized high dimensional discrete data that can be represented by a graph. Our approach leads to a unified formulation for image segmentation and high dimensional irregular data processing.
1 Introduction
Solutions of the nonlinear eikonal equation have found numerous applications, for instance in geometric optics, image analysis and computer vision, including shape from shading [1, 2], medial axis or skeleton extraction [3], topographic segmentation (watershed) [4] and geodesic distance computation on discrete and parametric surfaces [5, 6, 7, 8, 9]. The latter works consider both structured and unstructured meshes on Cartesian or non-Cartesian domains. The eikonal equation is a special case of the following general continuous Hamilton-Jacobi equation:
$$ \begin{cases} H(x, f, \nabla f) = 0, & x \in \Omega \subset \mathbb{R}^n,\\ f(x) = \phi(x), & x \in \Gamma \subset \Omega, \end{cases} \qquad (1) $$
where φ in the boundary condition is a positive speed function defined on Ω and f(x) is the travelling time or distance from the source Γ. The eikonal equation can then be expressed using the following Hamiltonian:
$$ H(x, f, \nabla f) = \|\nabla f(x)\| - P(x), \qquad (2) $$
where P(x) is a given potential function. The solution of (1) represents the shortest distance from x to the zero-distance curve given by Γ (where φ(x) = 0). Solutions of (2) are usually based on a discretization of the Hamiltonian in which the derivatives are approximated by the Godunov [10] or the Lax-Friedrichs [11] schemes.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 187–199, 2009. © Springer-Verlag Berlin Heidelberg 2009
Many numerical methods have since been proposed and investigated to solve the nonlinear system described by (2). For instance, one can quote the following schemes. (i) An iterative scheme [1] relying on fixed-point methods that solves a quadratic equation was proposed. (ii) The fast sweeping methods [12] use Gauss-Seidel type iterations to update the distance function field; the key point of fast sweeping is to update the points in a certain order. (iii) Tsitsiklis [13] was the first to develop a Dijkstra-like method and proposed an optimal algorithm for solving the eikonal equation. Based on this idea, [14, 11] produced the fast marching methods. Another approach to solving (2) is to consider a time-dependent version of the equation and to evolve it to the steady state. Then, (2) can be rewritten as
$$ \begin{cases} \partial f(x,t)/\partial t = -\|\nabla f(x)\| + P(x), & x \in \Omega \subset \mathbb{R}^n,\\ f(x,t) = \phi(x), & x \in \Gamma \subset \mathbb{R}^n,\\ f(x,0) = \phi_0(x), & x \in \Omega. \end{cases} \qquad (3) $$
This paper only considers the discrete analogue of the time-dependent formulation of the eikonal equation; the stationary (time-independent) case will be considered in future work. Contributions. In this work, we propose an adaptation of (3) over weighted graphs of arbitrary structure. The goal is to provide a simple and common formulation that solves the eikonal equation for any discrete data that can be represented by a weighted graph, such as images or high-dimensional data defined on irregular domains. This alternative formulation for solving the eikonal equation is based on partial difference equations (PdEs) and discrete gradients over weighted graphs. Our formulation has several advantages. Any discrete domain that can be described by a graph can be considered without any spatial discretization. In the context of image processing, local and nonlocal configurations are directly enabled within the same formulation.
Finally, the aim of this paper is not to solve a particular application with the eikonal equation, but to show the potential of our proposal for image segmentation, data clustering and distance computation. Paper Organization. The paper is organized as follows. Section 2 recalls basics, definitions and operators on weighted graphs. Section 3 introduces our formulation for solving the eikonal equation. Section 4 shows the potential of our proposal for the segmentation of images and the processing of unorganized data. Finally, the last section concludes.
2 Discrete Derivatives on Weighted Graphs
This Section recalls basics, definitions, operators and processes on weighted graphs.
2.1 Definitions and Weighted Graphs Construction
Notations and Definitions. We consider the general situation where any discrete domain can be viewed as a weighted graph. Let G = (V, E, w) be a weighted graph composed of two finite sets: vertices V and weighted edges E ⊆ V × V. An edge (u, v) ∈ E connects two adjacent (neighbor) vertices u and v. The neighborhood of a vertex u is denoted N(u) = {v ∈ V \ {u} : (u, v) ∈ E}. The weight ω_uv of an edge (u, v) can be defined with a function w: V × V → ℝ+ such that w(u, v) = ω_uv if (u, v) ∈ E and w(u, v) = 0 otherwise. Graphs are assumed to be simple, connected and undirected, implying that the function w is symmetric. Let f: V → ℝ be a discrete real-valued function that assigns a real value f(u) to each vertex u ∈ V. We denote by H(V) the Hilbert space of such functions defined on V. Weighted Graphs Construction. Any discrete domain can be represented by a weighted graph where functions of H(V) represent the data to process. In the general case, an unorganized set of points V ⊂ ℝ^n can be seen as a function f^0: V ⊂ ℝ^n → ℝ^n. Constructing a graph from these data then consists in defining the set of edges E by modeling the neighborhood. It is based on a similarity relationship between data with a pairwise distance measure μ: V × V → ℝ+. There exist several methods to transform a set of vertices V into a neighborhood (similarity) graph (see [15] for a survey on proximity and neighborhood graphs). In this paper, we focus on two particular graphs: the τ-neighborhood graphs and a modified version of the k-nearest neighbors graphs. The k-nearest neighbors graph, denoted k-NNG, is a weighted graph where each vertex u ∈ V is connected to its k nearest neighbors, i.e., those with the smallest distance measure towards u according to the function μ. Since this graph is directed, a modified version of it is used to make it undirected.
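A sketch of the symmetrised k-NNG just described follows. The union rule (u ∼ v if either vertex is among the other's k nearest neighbors) is an assumption, since the text only states that a modified, undirected version is used; `knn_graph` is a hypothetical name, and μ defaults to the Euclidean distance via `math.dist`.

```python
import math


def knn_graph(points, k, dist=math.dist):
    """Undirected k-NN graph as a set of edges (u, v) with u < v.

    Symmetrisation by union is an assumption: u ~ v if v is among the k
    nearest neighbors of u, or u is among the k nearest neighbors of v.
    """
    n = len(points)
    edges = set()
    for u in range(n):
        # ties in distance are broken by the vertex index
        ranked = sorted((dist(points[u], points[v]), v) for v in range(n) if v != u)
        for _, v in ranked[:k]:
            edges.add((min(u, v), max(u, v)))
    return edges


pts = [(0.0,), (1.0,), (2.0,), (10.0,)]
print(knn_graph(pts, 1))
```

With k = 1, the isolated point at 10 selects vertex 2 as its nearest neighbor without being selected in return, so the directed k-NNG is not symmetric; taking the union of both directions restores symmetry.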
The τ-neighborhood graph, denoted G_τ, is a weighted graph where the τ-neighborhood N_τ of a given vertex u ∈ V is defined as N_τ(u) = {v ∈ V \ {u} : μ(u, v) ≤ τ}, with τ > 0 a threshold parameter. 2D images can be viewed as functions f^0: V ⊂ ℤ² → ℝ^n. In this case, the distance μ used to construct the neighborhood graph is usually the city block or the Chebyshev distance computed with the spatial coordinates of each vertex representing an image pixel. With these distances and the τ-neighborhood graphs with τ ≤ 1, one recovers the two usual graphs used in image processing: the 4-adjacency grid graph (denoted G0, with the city block distance) and the 8-adjacency grid graph (denoted G1, with the Chebyshev distance). Another useful graph structure in image processing is the region adjacency graph (RAG), where vertices correspond to image regions and the set of edges is obtained by considering an adjacency distance. With the τ-neighborhood (τ = 1), the RAG is the Delaunay graph of an image partition. Weights Computation. Similarities between data can be incorporated within edge weights according to a similarity measure g: E → ℝ+ that satisfies w(u, v) = g(u, v) for (u, v) ∈ E. The distance computation between data is then performed by comparing their features, which generally depend on a given initial function f^0 ∈ H(V). To this aim, each vertex u ∈ V is assigned a feature vector F(f^0, u) ∈ ℝ^m. With F, the following weight functions can be considered. For a given edge (u, v) ∈ E and a distance measure ρ: V × V → ℝ+ associated with F, we can have
$$ g_0(u,v) = 1 \quad \text{(constant weight case)}, $$
$$ g_1(u,v) = \bigl(\rho(F(f^0,u), F(f^0,v)) + \epsilon\bigr)^{-1}, \quad \epsilon > 0,\ \epsilon \to 0, $$
$$ g_2(u,v) = \exp\bigl(-\rho(F(f^0,u), F(f^0,v))^2/\sigma^2\bigr), \quad \sigma > 0, $$
where σ controls the similarity and ρ is usually the Euclidean distance. Several choices for the expression of F can be considered, depending on the features to preserve. The simplest one is F(f^0, ·) = f^0. In the context of image processing, an important feature vector F is provided by image patches, i.e., F(f^0, u) = F_τ(f^0, u) = {f^0(v) : v ∈ N_τ(u) ∪ {u}}. In the case of a grayscale image, F_τ(f^0, ·) is a vector of size (2τ+1)² corresponding to the values of f^0 in a square window of size (2τ+1) × (2τ+1) centered at vertex u (a pixel). Color images can be handled using features of dimension 3 × (2τ+1)². The resulting weight function then directly incorporates local or nonlocal features [16]. This feature vector was proposed in the context of texture synthesis [17] and further used in the context of image processing [18, 19].
2.2 Graph Based Discrete Gradients
Let $G=(V,E,w)$ be a weighted graph. The discrete weighted gradient of a function $f\in H(V)$ at a vertex $u\in V$ is defined by
$$(\nabla_w f)(u)=\big(\partial_v f(u)\big)_{(u,v)\in E},$$
where $\partial_v f(u)=\sqrt{w(u,v)}\,\big(f(v)-f(u)\big)$ is the discrete (partial) derivative of f with respect to the edge (u, v). These definitions were used in [20] for image and mesh regularization. Building on these works, [21] proposed two discrete formulations of weighted morphological gradients on graphs, namely the weighted external gradient $\nabla_w^+$ and the weighted internal gradient $\nabla_w^-$. For $u\in V$,
$$(\nabla_w^+ f)(u)=\big(\partial_v^+ f(u)\big)_{(u,v)\in E}\quad\text{and}\quad(\nabla_w^- f)(u)=\big(\partial_v^- f(u)\big)_{(u,v)\in E},\qquad(4)$$
where the external and internal discrete partial derivatives are $\partial_v^+ f(u)=\max(0,\partial_v f(u))$ and $\partial_v^- f(u)=-\min(0,\partial_v f(u))$, with $\partial_v^- f(u)=\partial_u^+ f(v)$. When the weight is constant ($w=g_0$), these definitions recover the classical directional derivative operators. The $L_p$-norm (with $0<p<\infty$) and the $L_\infty$-norm of these gradients are
$$\|(\nabla_w^\pm f)(u)\|_p=\Big(\sum_{v\sim u}w(u,v)^{p/2}\,\big|(f(v)-f(u))^\pm\big|^p\Big)^{1/p}\qquad(5)$$
and
$$\|(\nabla_w^\pm f)(u)\|_\infty=\max_{v\sim u}\Big(w(u,v)^{1/2}\,\big|(f(v)-f(u))^\pm\big|\Big).\qquad(6)$$
Adaptation of Eikonal Equation over Weighted Graph
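The notation $v\sim u$ means that vertex v is adjacent to u. $\nabla_w^\pm$ refers to both the external and the internal gradient (depending on the sign), with $(a)^+=\max(0,a)$ and $(a)^-=\min(0,a)$. Since every edge contributes to exactly one of the two parts ($|a|^p=\max(0,a)^p+|\min(0,a)|^p$), these gradients satisfy
$$\|(\nabla_w f)(u)\|_p^p=\|(\nabla_w^+ f)(u)\|_p^p+\|(\nabla_w^- f)(u)\|_p^p,\qquad 0<p<\infty.$$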
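To make definitions (4) to (6) and the weight $g_2$ concrete, here is a minimal NumPy sketch on a 4-adjacency grid graph $G_0$ with $F=f^0$. The helper names (`grid4_edges`, `g2_weight`, `gradient_norms`) are hypothetical, each edge is stored once per direction, and the code only illustrates the splitting property of the $L_p$-norm; it is not the authors' implementation.

```python
import numpy as np

def grid4_edges(h, w):
    """Directed edges (u, v) of the 4-adjacency grid graph G_0 of an h-by-w image."""
    edges = []
    for r in range(h):
        for c in range(w):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w:
                    edges.append((r * w + c, rr * w + cc))
    return edges

def g2_weight(fu, fv, sigma):
    """Gaussian weight g_2, with rho = Euclidean distance between features."""
    rho = np.linalg.norm(np.atleast_1d(fu) - np.atleast_1d(fv))
    return np.exp(-rho ** 2 / sigma ** 2)

def gradient_norms(f, edges, weights, p):
    """Per-vertex L_p norms (5) of the full, external (+) and internal (-) gradients."""
    n = len(f)
    full, ext, inn = np.zeros(n), np.zeros(n), np.zeros(n)
    for (u, v), w in zip(edges, weights):
        d = f[v] - f[u]
        scale = w ** (p / 2)                      # w(u,v)^{p/2}
        full[u] += scale * abs(d) ** p
        ext[u] += scale * max(0.0, d) ** p        # (f(v)-f(u))^+
        inn[u] += scale * abs(min(0.0, d)) ** p   # (f(v)-f(u))^-
    return full ** (1 / p), ext ** (1 / p), inn ** (1 / p)
```

For example, with `f = img.ravel()` for a small image `img` and `weights = [g2_weight(f[u], f[v], 0.5) for u, v in edges]`, the arrays returned for a given p satisfy `full**p == ext**p + inn**p` up to rounding.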
3 Eikonal Equation on Weighted Graphs
In this section, we present our formulation to approximate the eikonal equation (3) over weighted graphs, using PdEs and the morphological gradients presented in the previous section. With the morphological processes described by (8), the time-dependent eikonal formulation (3) can be viewed as an erosion process, given the minus sign, when the potential function P is null. Using the internal gradient $\nabla_w^-$ involved in the discrete PdE-based erosion process, (3) can be directly rewritten on weighted graphs. Given a graph $G=(V,E,w)$ and a function $f\in H(V)$, we obtain a discrete PdE-based version of system (3):
$$\begin{cases}\partial f(u,t)/\partial t=-\|(\nabla_w^- f)(u,t)\|_p+P(u), & u\in V,\\ f(u,t)=\phi(u), & u\in V_0\subset V,\\ f(u,0)=\phi_0(u), & u\in V,\end{cases}$$
where $V_0$ is the set of initial seed vertices. With $f^n(u)\approx f(u,n\Delta t)$, the following iterative numerical scheme is obtained for all $u\in V$:
$$f^{n+1}(u)=f^n(u)-\Delta t\,\big(\|(\nabla_w^- f^n)(u)\|_p-P(u)\big).\qquad(9)$$
The steady state of this process (i.e., after a fixed number n of iterations, or when $\|f^{n+1}-f^n\|<\epsilon$) is the solution of the eikonal equation (2). Injecting the corresponding internal gradient norm into (9), we obtain for the $L_p$-norm (5) and the $L_\infty$-norm (6), respectively,
$$f^{n+1}(u)=f^n(u)-\Delta t\Big[\Big(\sum_{v\sim u}w(u,v)^{p/2}\,\big|\min\big(0,f^n(v)-f^n(u)\big)\big|^p\Big)^{1/p}-P(u)\Big],\qquad(10)$$
$$f^{n+1}(u)=f^n(u)-\Delta t\Big[\max_{v\sim u}\Big(w(u,v)^{1/2}\,\big|\min\big(0,f^n(v)-f^n(u)\big)\big|\Big)-P(u)\Big].\qquad(11)$$
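The iterative schemes (10) and (11) can be sketched in a few lines of NumPy for an arbitrary graph given as a list of directed edges. This is a minimal illustration under stated assumptions, not the authors' code: the initial value $\phi_0$ is a large constant (`big`), the seeds are held at $\phi=0$, and `dt`, `eps` and `max_iter` are illustrative choices rather than values from the paper.

```python
import numpy as np

def eikonal_graph(n, edges, weights, seeds, p=2.0, dt=0.1, P=1.0,
                  max_iter=10_000, eps=1e-9, big=1e6):
    """Iterate scheme (10) (finite p) or (11) (p = inf) until steady state.

    edges: directed (u, v) pairs, listed in both directions; weights: w(u, v)
    per edge; seeds: vertices of V_0, kept at phi = 0 (assumption).
    """
    f = np.full(n, big)                     # phi_0: large constant (assumption)
    seed_mask = np.zeros(n, dtype=bool)
    seed_mask[list(seeds)] = True
    f[seed_mask] = 0.0
    u_idx = np.array([e[0] for e in edges])
    v_idx = np.array([e[1] for e in edges])
    w = np.asarray(weights, dtype=float)
    for _ in range(max_iter):
        # |min(0, f(v) - f(u))| on every edge: the internal-gradient part
        d = np.abs(np.minimum(0.0, f[v_idx] - f[u_idx]))
        norm = np.zeros(n)
        if np.isinf(p):
            np.maximum.at(norm, u_idx, np.sqrt(w) * d)      # L_inf norm (6)
        else:
            np.add.at(norm, u_idx, w ** (p / 2) * d ** p)   # sum in (5)
            norm = norm ** (1.0 / p)
        f_new = f - dt * (norm - P)
        f_new[seed_mask] = 0.0              # Dirichlet condition on V_0
        if np.max(np.abs(f_new - f)) < eps:
            return f_new
        f = f_new
    return f
```

On a path graph with unit weights, every non-seed vertex has a single lower neighbor at steady state, so all values of p give the hop distance to the seed; this makes a convenient sanity check.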
The proposed methodology leads to a simple and common formulation that constitutes an adaptive framework for the eikonal equation: our approach depends only on the value of p and on the weight function w. In Sect. 4, experiments show how the framework can be adapted to image segmentation and data clustering.

Relations with other schemes. Scheme (9) has the advantage of working on any graph structure. With an adapted graph topology and an appropriate weight function, the proposed formulation is therefore linked to well-known schemes such as the Osher-Sethian Hamiltonian discretization scheme or Dijkstra's graph algorithm.

Osher-Sethian scheme. Let $G_0=(V,E,g_0)$ be an unweighted 4-adjacency grid graph associated with an image. Then, with p = 2 and $G_0$, (10) exactly recovers the Osher-Sethian upwind first-order Hamiltonian discretization scheme [14]:
$$f^{n+1}(u)=f^n(u)-\Delta t\Big[\Big(\sum_{v\sim u}\big|\min\big(0,f^n(v)-f^n(u)\big)\big|^2\Big)^{1/2}-P(u)\Big].$$
Replacing the vertices $u\in V$ and their neighborhoods by their spatial coordinates (x, y), the latter expression can be rewritten as
$$f^{n+1}(x,y)=f^n(x,y)-\Delta t\Big[\Big(\max\big(0,f^n(x,y)-f^n(x-1,y)\big)^2+\min\big(0,f^n(x+1,y)-f^n(x,y)\big)^2+\max\big(0,f^n(x,y)-f^n(x,y-1)\big)^2+\min\big(0,f^n(x,y+1)-f^n(x,y)\big)^2\Big)^{1/2}-P(x,y)\Big],$$
since $\min(0,a-b)^2=\max(0,b-a)^2$. This equation corresponds to the discretization scheme of the Hamilton-Jacobi equations proposed in [14].

Dijkstra scheme. Let $G=(V,E,g_0)$ be an unweighted graph. Then (11) corresponds to an iterative version of Dijkstra's shortest path algorithm on graphs of arbitrary structure. Indeed, when $p=\infty$ and $\Delta t=1$, (11) becomes, for all $u\in V$,
$$f^{n+1}(u)=f^n(u)-\max_{v\sim u}\big|\min\big(0,f^n(v)-f^n(u)\big)\big|+P(u)=\min_{v\sim u}f^n(v)+P(u),$$
by considering the neighborhood of u as the set $N(u)\cup\{u\}$ and using the properties $\max(0,a-b)=-\min(0,b-a)$ and $\min(0,a-b)=\min(a,b)-b$. This equation corresponds to a shortest path algorithm in which, at each step, the distance f(u) at vertex u is set to the minimal distance in its neighborhood plus the potential P(u).
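The Dijkstra connection can be checked numerically. The sketch below (hypothetical helper names) runs (11) literally with $p=\infty$, $\Delta t=1$, $w=g_0=1$ and $P=1$, using integer arithmetic and a large finite initial value in place of infinity, and compares the steady state with breadth-first-search hop counts. It assumes a connected graph; on a disconnected graph the unreached values keep growing by P at each step.

```python
from collections import deque

def scheme11_unit(adj, seeds, big=10**9, max_iter=10_000):
    """Scheme (11) with p = inf, dt = 1, w = 1, P = 1 on adjacency lists `adj`."""
    f = {u: (0 if u in seeds else big) for u in adj}
    for _ in range(max_iter):
        f_new = {}
        for u in adj:
            if u in seeds:
                f_new[u] = 0                      # seeds keep phi = 0
                continue
            # max over neighbours of |min(0, f(v) - f(u))|
            grad = max((abs(min(0, f[v] - f[u])) for v in adj[u]), default=0)
            f_new[u] = f[u] - grad + 1            # dt = 1, P = 1
        if f_new == f:                            # steady state reached
            return f
        f = f_new
    return f

def bfs_hops(adj, seeds):
    """Reference hop distances from the seed set (breadth-first search)."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist
```

On unit-weight graphs Dijkstra's algorithm and breadth-first search give the same distances, which is why BFS serves as the reference here.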
4 Experiments
The proposed formulation of the eikonal equation can be used to process any function defined on the vertices of a graph or on any arbitrary discrete domain. This section illustrates the potential of our formulation through examples of weighted distance computation, image segmentation and processing of unorganized high-dimensional data. Different graph structures and weight functions are used to show the flexibility of our approach. In the sequel, all experiments use a constant potential function P = 1; a different potential function could be adapted to a particular application. The objective of the following experiments is not to solve a particular application, but only to illustrate the potential and the behavior of our formulation of the eikonal equation.

Adaptive Front Propagation and Weighted Distances. Figure 1 shows how our formulation adapts to the computation of weighted distances, with results for different values of p, graph topologies, weight functions and features F. The initial seed is located at the top left corner of the original grayscale image $f^0: V\to\mathbb{R}$. The first, second and third rows of Fig. 1 show results for p = 2, 1 and ∞, respectively, obtained with (10) and (11). All results are color distance maps (red for small and blue for large distances) with superimposed iso-level sets in white. The first and second columns of Fig. 1 show results obtained with unweighted graphs ($w=g_0$). The first column uses a 4-adjacency grid graph ($G_0$) and corresponds to the classical case. The second column uses a 25-adjacency grid graph ($G_2$) and shows the effect of a larger neighborhood. The third and fourth columns show results obtained with weighted graphs. The third column uses graph $G_0$ weighted by function $g_2$ with $F=f^0$. With non-constant weights, image information is automatically integrated into the distance computation, which modifies the front evolution speed, particularly inside the textured sub-image.
The fourth column shows the nonlocal case, where graph $G_2$ is constructed and weighted with function $g_2$ associated with patches of size 11×11. In that case, repetitive information is clearly captured by the weights, which stop the front propagation around the textured sub-image. Finally, a segmentation of the textured sub-image can simply be obtained by thresholding the computed distances.

[Fig. 1 panels. Columns: original $f^0: V\to\mathbb{R}$; $G_0$, $w=g_0$; $G_2$, $w=g_0$; $G_0$, $w=g_2$, $F=f^0$; $G_2$, $w=g_2$, $F=F_5(f^0,\cdot)$. Rows: p = 1, p = 2, p = ∞.]

Fig. 1. Front propagation and weighted distances with different p values, graph configurations G, weights w and features F. Figures represent color distance maps with iso-level sets obtained by thresholding the distances. The seed is located at the top left corner (see text for more details).

Image Segmentation with Region Based Graphs. The goal of the following two examples is not to show a perfect segmentation, but to show how graph topologies can be exploited in image segmentation. The basic idea is that image pixels are not the only relevant components of images
and more abstract elements such as image regions can be used. Hence, we suggest working directly with a reduced version of images: image partitions. Image partitions can be obtained by image pre-processing methods such as the watershed. Figures 2(b) and 3(b) show such partitions computed from Figs. 2(a) and 3(a). Figures 2(c) and 3(c) are images reconstructed from the partitions with the mean color value of each region.

Figure 2 presents an example of image segmentation based on a RAG and shows that this graph structure can accelerate segmentation. This example compares the segmentation obtained with a 4-adjacency grid graph $G_0$ weighted by function $g_2$ with pixel grayscale values (Fig. 2(d)) and the segmentation obtained with a RAG constructed from the partition (Fig. 2(b)) and weighted by function $g_2$ with mean values (Fig. 2(e)). Color distance maps are computed from the initial seeds (white points) in Fig. 2(a), and segmentations are obtained by thresholding the distances. Both graphs give similar distance maps and segmentations, while the RAG drastically speeds up the segmentation process. Indeed, the number of vertices in the RAG is approximately 3% of the number of vertices in the pixel-based graph, and this reduced amount of data directly lowers the computational cost. On a standard computer, the computing time is reduced by a factor of 10.

[Fig. 2 panels: (a) original and seeds (white); (b) partition (97% of reduction); (c) reconstructed image; (d) grid graph $G_0$, $w=g_2$; (e) RAG, $w=g_2$.]

Fig. 2. Acceleration of the image segmentation process. (a) Original image (150×235) with 35 250 pixels. (b) Partition with 999 regions (97% reduction in terms of image components). (c) Reconstructed image with mean color values. (d) and (e): at left, color distance maps (red for small and blue for large distances) and at right, final segmentations. Images (d) are obtained with a pixel-based graph computed from (a). Images (e) are obtained with a RAG constructed from (b) and (c) (see text for more details).

Figure 3 shows another benefit of using a RAG structure: segmentation of nonlocal (not spatially connected) objects. This experiment compares segmentation results obtained with a RAG (Fig. 3(d)) and with a nonlocal RAG (Fig. 3(e)). Both graphs are
computed from partition 3(b) and weighted by function $g_2$ with mean color values. In the nonlocal RAG, the neighborhood of each vertex is extended with its 5 nearest neighbors with respect to the mean value feature; the resulting graph is a RAG∪5NNG graph. Figures 3(d) and 3(e) show the color distance maps computed from the initial seeds (white stroke) in Fig. 3(a), together with the final segmentations. In the local case (Fig. 3(d)), the object marked by the seeds is well segmented, with small distances (red color). The other objects are far (blue color), and the final segmentation extracts only the marked one. In the nonlocal case (Fig. 3(e)), the distances within the marked object are small, and the distances to the other triangles in the scene are also small (red color). As a consequence, thresholding extracts all the objects in the image from a minimal number of initial seeds, even though they are not spatially close.

[Fig. 3 panels: (a) original and seeds (white); (b) partition (98% of reduction); (c) reconstructed image; (d) RAG, $w=g_2$; (e) RAG∪5-KNNG, $w=g_2$.]

Fig. 3. Nonlocal region-based image segmentation. (a) Original image (256×256) with 65 536 pixels. (b) Partition with 1 324 regions (98% reduction compared to the original). (d) and (e): at left, color distance maps (red for small and blue for large distances) and at right, final segmentations. Graphs used in (d) and (e) are computed from (b) and (c) (see text for more details).

Unorganized High Dimensional Data Processing. The following experiments apply our formulation of the eikonal equation to high-dimensional data on irregular domains. Figure 4 shows applications of the eikonal equation to data clustering and shortest path problems. The initial data set (Fig. 4(a)) consists of 133 head-pose images of size 29×29. Two applications are performed on this data set: clustering and head pose transition estimation. The goal here is not to solve machine learning problems, but to show that these problems can be addressed by our formulation of the eikonal equation. To process these data, a graph with |V| = 133 is constructed, where each vertex represents an image described by a feature vector of size 29×29 (i.e., in $\mathbb{R}^{841}$). In the following results, initial seeds (images) are shown with white boundaries. Points that are close and
far from the seeds are represented with blue and red colors, respectively, in the distance maps (Figs. 4(b) and 4(c)). Figures 4(b) and 4(d) show the application of the eikonal equation to data clustering. Such an application can be used for data set exploration or semi-supervised learning: given an input seed (query), one wants to obtain the points closest to it. Figure 4(b) shows the distance map obtained from a single initial seed, and Fig. 4(d) shows the clustering results. The initial input has a white boundary. The 10 closest images are located at the top and the 10 farthest at the bottom of Fig. 4(d). Figures 4(c) and 4(e) show another application of the eikonal equation to this data set. Given two initial images, one wants to recover a sequence of images forming a transition between them. This problem can be viewed as a shortest path problem solved by the eikonal equation. Figure 4(c) shows the distance map obtained from the initial seeds, and Fig. 4(e) shows the path obtained from the seed at top left to the seed at bottom right. These experiments show satisfying results and the ability of our approach to address machine learning problems, even though a simple Euclidean distance is used to compare data points. Results could be improved with better adapted distances or feature estimation.
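Returning to the region-based experiments of Figs. 2 and 3, a graph such as RAG∪5NNG can be assembled from a label map in a few steps. The helper below is hypothetical and simplified: two regions are taken as adjacent when they share a horizontal or vertical pixel border (an assumption; the text defines the RAG through an adjacency distance), the feature F is the region mean, and edges are weighted with $g_2$ using the Euclidean distance between means. The same k-nearest-neighbor step applies directly to vectors in $\mathbb{R}^{841}$ such as the head-pose images.

```python
import numpy as np

def rag_knn_graph(labels, image, k=5, sigma=0.1):
    """Hypothetical helper: RAG united with a kNN graph on region means, g_2 weights.

    Returns the region means and a dict {(i, j): weight} with i < j.
    """
    ids = np.unique(labels)
    index = {r: i for i, r in enumerate(ids)}
    # Feature F: mean value of each region (grayscale or color)
    means = np.array([image[labels == r].mean(axis=0) for r in ids])
    means = means.reshape(len(ids), -1)
    edges = set()
    # Local part: regions sharing a horizontal or vertical pixel border
    for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
        mask = a != b
        for ra, rb in zip(a[mask], b[mask]):
            i, j = index[ra], index[rb]
            edges.add((min(i, j), max(i, j)))
    # Nonlocal part: k nearest neighbours in feature space
    dists = np.linalg.norm(means[:, None, :] - means[None, :, :], axis=-1)
    for i in range(len(ids)):
        order = np.argsort(dists[i])
        for j in order[order != i][:k]:
            edges.add((min(i, int(j)), max(i, int(j))))
    weights = {e: float(np.exp(-dists[e] ** 2 / sigma ** 2)) for e in edges}
    return means, weights
```

The resulting weighted edges can be fed to any of the graph iterations for (10) or (11) after listing each edge in both directions.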
[Fig. 4 panels: (a) original data, 133 images of size 29×29 ($|V|=133$, $f^0: V\to\mathbb{R}^{841}$); (b) color distance map + seed (white boundaries); (c) color distance map + seeds (white boundaries); (d) local clustering with (b); (e) shortest path with (c).]
Fig. 4. High-dimensional data clustering and shortest path. (b) and (c): color distance maps (blue for small and red for large distances); images with white boundaries are the initial seeds. (d) Clustering results: at top, the 10 closest and at bottom, the 10 farthest images with respect to the seed (white boundary). (e) Shortest path between the two initial seeds (white boundaries).
5 Conclusion
In this paper, we propose a discrete version of the eikonal equation over weighted graphs of arbitrary structure and present its solution based on PdEs, discrete gradients and weighted graphs. The proposed formulation constitutes a simple, common and adaptive framework that recovers well-known definitions and unifies local and nonlocal configurations in the context of image processing. This framework can handle any discrete data that can be represented by a weighted graph. Through experiments, we have shown the potential and the flexibility of our approach to address image segmentation
and unorganized high-dimensional data processing. Ongoing work addresses the stationary (time-independent) version of the eikonal equation and its solution with fast-marching-like methods on arbitrary graphs within our framework.
References

1. Rouy, E., Tourin, A.: A viscosity solutions approach to shape-from-shading. SIAM J. Num. Anal. 29, 867–884 (1992)
2. Sethian, J.: A fast marching level set method for monotonically advancing fronts. Proc. Nat. Acad. Sci. 41(2), 199–235 (1999)
3. Siddiqi, K., Bouix, S., Tannenbaum, A., Zucker, S.W.: The Hamilton-Jacobi skeleton. In: Proc. ICCV, pp. 828–834 (1999)
4. Maragos, P., Butt, M.: Curve evolution, differential morphology and distance transforms as applied to multiscale and eikonal problems. Fundamenta Informaticae 41, 91–129 (2000)
5. Bronstein, A.M., Bronstein, M.M., Kimmel, R.: Weighted distance maps computation on parametric three-dimensional manifolds. J. Comput. Phys. 225(1), 771–784 (2007)
6. Sethian, J.A., Vladimirsky, A.: Ordered upwind methods for static Hamilton-Jacobi equations: Theory and algorithms. SIAM J. Num. Anal. 41(1), 325–363 (2003)
7. Abgrall, R.: Numerical discretization of the first-order Hamilton-Jacobi equations on triangular meshes. Comm. Pure and Applied Math. 49, 1339–1373 (1996)
8. Shu, C.-W., Zhang, Y.-T.: High order WENO schemes for Hamilton-Jacobi equations on triangular meshes. SIAM J. Scien. Comp. 24, 1005–1030 (2003)
9. Mémoli, F., Sapiro, G.: Fast computation of weighted distance functions and geodesics on implicit hyper-surfaces. J. Comput. Phys. 173, 730–764 (2001)
10. Leveque, R.: Finite Volume Methods for Hyperbolic Problems. Cambridge University Press, Cambridge (2002)
11. Sethian, J.A.: Level Set Methods and Fast Marching Methods. Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Materials Science, 2nd edn. Cambridge University Press, Cambridge (1999)
12. Zhao, H.K.: Fast sweeping method for eikonal equations. Math. Comp. 74, 603–627 (2005)
13. Tsitsiklis, J.: Efficient algorithms for globally optimal trajectories. IEEE Trans. Autom. Control 40(9), 1528–1538
14. Osher, S., Sethian, J.: Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations. J. Comput. Phys. 79, 12–49 (1988)
15. Jaromczyk, J., Toussaint, G.: Relative neighborhood graphs and their relatives. Proc. IEEE 80(9), 1502–1517 (1992)
16. Elmoataz, A., Lézoray, O., Bougleux, S., Ta, V.T.: Unifying local and nonlocal processing with partial difference operators on weighted graphs. In: Proc. LNLA, pp. 11–26 (2008)
17. Efros, A., Leung, T.: Texture synthesis by non-parametric sampling. In: Proc. ICCV, pp. 1033–1038 (1999)
18. Buades, A., Coll, B., Morel, J.: Nonlocal image and movie denoising. IJCV 76(2), 123–139 (2008)
19. Gilboa, G., Osher, S.: Nonlocal operators with applications to image processing. Report 07-23, UCLA (2007)
20. Bougleux, S., Elmoataz, A., Melkemi, M.: Discrete regularization on weighted graphs for image and mesh filtering. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 128–139. Springer, Heidelberg (2007)
21. Ta, V.T., Elmoataz, A., Lézoray, O.: Partial difference equations over graphs: Morphological processing of arbitrary discrete data. In: Proc. ECCV, pp. 668–680 (2008)
22. Brockett, R., Maragos, P.: Evolution equations for continuous-scale morphological filtering. IEEE Trans. Signal Process. 42(12), 3377–3386 (1994)
A Variational Model for Interactive Shape Prior Segmentation and Real-Time Tracking

Manuel Werlberger, Thomas Pock, Markus Unger, and Horst Bischof

Institute for Computer Graphics and Vision, Graz University of Technology
{werlberger,pock,unger,bischof}@icg.tugraz.at
http://www.gpu4vision.org
Abstract. In this paper, we introduce a semi-automated segmentation method based on minimizing the Geodesic Active Contour energy with an incorporated shape prior. The additional shape information, which represents the desired structure, increases the robustness of the segmentation result. Furthermore, the user can take corrective actions during the segmentation and adapt the position of the shape prior; such interaction is often desirable when processing difficult data, as in medical applications. To facilitate user interaction, we add a shape deformation that allows the shape position to be changed manually by the user and automatically according to underlying image features. Using a variational formulation, the optimization can be done in a globally optimal manner for a fixed shape representation. To obtain real-time behavior, which is especially important for an interactive tool, the whole method is implemented on the GPU. Experiments are performed on medical data, as well as on video data and camera streams processed in real time. For medical data, we compare our method with a segmentation done by an expert. The GPU-based binaries will be available online on our homepage.
1 Introduction
Image segmentation is a very common problem in computer vision. Many segmentation methods use low-level features to divide an image into foreground and background. Because robustness is needed, it has become common practice to incorporate high-level knowledge to obtain reasonable results. The method presented in this paper makes segmentation more robust by imposing shape information about the desired object, which allows precise results on difficult image data (Fig. 1). Pioneering contributions were made by Cremers et al. [1] with their variational 'Diffusion Snakes' approach, by Leventon et al. [2, 3] with a level set formulation, and by Paragios and Rousson [4, 5] with a region-based approach. Efficiently registering the shape prior to the desired image structure is a challenging problem. We therefore developed a semi-automated segmentation tool that allows the object position to be adjusted both by hand and by a local optimization routine; in either case, the adjustment is modelled as a shape transformation.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 200–211, 2009. © Springer-Verlag Berlin Heidelberg 2009

A Variational Model for Interactive Shape Prior Segmentation

Fig. 1. Segmentation of the metacarpal bone of a ring finger. The left image shows a simple intensity thresholding, which clearly fails. The pure GAC segmentation in the middle image does not yield a valid segmentation either. The shape prior segmentation in the right image also copes with the low-contrast regions and provides an accurate result.

For the realization of the segmentation method, we use a variational formulation of the Geodesic Active Contour (GAC) energy. The minimization is done with a fast primal-dual approach implemented on NVIDIA graphics hardware, which makes it real-time capable, so the method can even be used to track objects in videos or live camera streams. Approaches based on the calculus of variations have been very successful, and it has recently been shown that variational methods parallelize well and benefit from GPU implementations [6, 7]. The main contribution of our work is the incorporation of a shape representation into a variational segmentation framework. We show that the resulting segmentation is globally optimal for a fixed shape prior, and we provide a GPU implementation of a fast primal-dual optimization procedure with a well-defined convergence criterion. This makes it possible to interact with the shape position and obtain the segmentation result in real time. The framework also permits a local optimization of the shape position, so that objects are segmented correctly even when the prior is initially misaligned. The remainder of the paper is organized as follows. First, we give an overview of related work. Section 3 describes how a shape prior is combined with a variational formulation of the GAC segmentation model, which leads to a segmentation model that uses a Mumford-Shah (MS) like data term as shape force. In Section 3.1, we propose a fast numerical algorithm to compute the solution of the segmentation model. In Section 4, we present experiments and a qualitative assessment on reference data. Finally, Section 5 gives a short conclusion.
2 Related Work

2.1 Mumford-Shah Segmentation
In [8], Mumford and Shah (MS) proposed a segmentation model of the form
$$\min_{u,\Gamma}\int_\Omega(u-f)^2\,dx+\alpha\int_{\Omega\setminus\Gamma}|\nabla u|^2\,dx+\beta\,\mathrm{length}(\Gamma),\qquad(1)$$
where f denotes the observed image, u its piecewise smooth approximation and Γ the edges in u. Equation (1) is based on a piecewise smooth approximation of the intensity function and has been used in the computer vision community for various tasks such as denoising, inpainting, stereo matching and segmentation. A special case of the MS model, which segments an image into foreground and background, is the so-called piecewise constant MS segmentation model (2), later used by Chan and Vese [9] in combination with a level set optimization:
$$\min_{\Sigma,c_1,c_2}\ \mathrm{Per}(\Sigma)+\lambda\int_\Sigma(f-c_1)^2\,dx+\lambda\int_{\Omega\setminus\Sigma}(f-c_2)^2\,dx.\qquad(2)$$

M. Werlberger et al.
In (2), f denotes the input image and $c_1$, $c_2$ the mean values of the foreground and background intensities separated by the region Σ. This realization of the MS functional represents the Potts model [10] for two distinct classes. In addition, Chan et al. [11] provided a convex formulation in the form of a Total Variation (TV) functional for a binary segmentation $u=1_\Sigma$:
$$\min_{u\in\{0,1\}}\int_\Omega|\nabla u|\,dx+\lambda\int_\Omega u\,s(x)\,dx\qquad(3)$$
with
$$s(x)=\big(c_1-f(x)\big)^2-\big(c_2-f(x)\big)^2,\qquad(4)$$
which follows from the two-phase data term $\int_\Omega u\,(c_1-f)^2+(1-u)\,(c_2-f)^2\,dx$ after dropping the part that does not depend on u. Here $\int_\Omega|\nabla u|\,dx$ is the TV-norm, understood in the distributional sense as $\int_\Omega|Du|$. Since we work on image data, which can be interpreted as sufficiently smooth functions, the TV-norm is well defined for any input u, and we keep the notation $\int_\Omega|\nabla u|\,dx$ in this paper. In this formulation, the TV-norm measures the length of the segmentation boundary: $\int_\Omega|\nabla u|\,dx=\mathrm{Per}(\Sigma)$. Moreover, Chan et al. [11] showed that a global minimizer of (2) can be found by minimizing the relaxed problem (3) under the constraint $0\le u\le 1$. The minimizing set is then given by
$$\Sigma=\{x\in\Omega: u(x)>\mu\}\quad\text{for every }\mu\in(0,1).\qquad(5)$$

2.2 Geodesic Active Contours
Based on the Snake model of Kass et al. [12], Caselles et al. [13] and Kichenassamy et al. [14, 15] proposed an energy that is invariant with respect to reparametrizations of the contour. The Geodesic Active Contour (GAC) (called minimal surface in 3D) is defined as the variational problem
$$\min_C\int_0^{|C|}g\big(|\nabla I(C(s))|\big)\,ds,\qquad(6)$$
where |C| is the Euclidean length of the curve C and the function g is an edge detector. The edge strength has to be restricted to the interval $g\in(0,1]$. One common choice for g is
$$g(|\nabla I|)=e^{-\eta|\nabla I|^\kappa},\qquad(7)$$
for some reasonable parameters κ and η.
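The edge detector (7) is straightforward to evaluate on a discrete image. In this sketch the gradient is approximated with central differences (`numpy.gradient`), and the defaults for η and κ are illustrative assumptions, not values from the paper.

```python
import numpy as np

def edge_detector(image, eta=10.0, kappa=0.5):
    """g(|grad I|) = exp(-eta * |grad I|**kappa), eq. (7); values lie in (0, 1].

    eta and kappa are illustrative defaults, not values from the paper.
    """
    gy, gx = np.gradient(image.astype(float))  # central differences per axis
    mag = np.hypot(gx, gy)                      # |grad I|
    return np.exp(-eta * mag ** kappa)
```

Flat regions give g = 1, while strong edges drive g toward 0, so the weighted energy makes it cheap to place the contour on edges.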
The general intention of the Snake model is to place the curve at points of high edge strength while keeping the curve reasonably smooth. The main advantage of GACs is their sound mathematical framework, which makes the model versatile for different applications. The main drawbacks of the model are the non-convexity of the GAC energy and the fact that the empty set is always a global minimizer of (6). In [16, 17, 18], several authors proposed the so-called weighted Total Variation (8), which can be used to give an alternative formulation of the GAC energy. They showed that if $u=1_{\Omega_C}$ is a binary function and C is the boundary of $\Omega_C$, the energy (8) equals the GAC energy (6):
$$TV_g(u)=\int_\Omega g\,|\nabla u|\,dx.\qquad(8)$$
Note that the weighted TV-norm is similar to the regularization term of (3). The weighting function g is a pointwise multiplier that does not depend on u, so the method of Chan et al. [11, 19] mentioned in Section 2.1 remains valid. Based on these observations, it was shown that replacing $1_{\Omega_C}$ by $u\in[0,1]$ makes (8) convex, which allows a global minimizer to be computed. However, the empty set still remains a globally optimal solution. In [6, 7], Unger et al. proposed a variational formulation incorporating user constraints that avoids this drawback of the classical GAC model.
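For a binary $u=1_{\Omega_C}$, a discrete counterpart of (8) sums g-weighted gradient magnitudes, which approximates the g-weighted boundary length of $\Omega_C$. The sketch below assumes forward differences with Neumann boundary conditions, a common but not the only discretization. For a 10×10 square with g ≡ 1 it returns $38+\sqrt{2}\approx 39.41$ rather than the exact perimeter 40, because the bottom-right corner pixel contributes $\sqrt{2}$ instead of 2.

```python
import numpy as np

def weighted_tv(u, g):
    """Discrete TV_g(u) = sum_x g(x) |grad u(x)| with forward differences.

    Differences across the last row/column are set to zero (Neumann boundary).
    """
    ux = np.zeros_like(u, dtype=float)
    uy = np.zeros_like(u, dtype=float)
    ux[:, :-1] = u[:, 1:] - u[:, :-1]      # horizontal forward difference
    uy[:-1, :] = u[1:, :] - u[:-1, :]      # vertical forward difference
    return float(np.sum(g * np.sqrt(ux ** 2 + uy ** 2)))
```

Lowering g along image edges lowers the cost of a boundary placed there, which is exactly how (8) encodes the GAC edge attraction.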
3 Shape Prior Segmentation
Our main contribution is to combine GAC segmentation with an MS-like data term that incorporates shape information. We model the GAC energy with the weighted Total Variation (8) and meet its need for additional constraints by imposing the shape prior. Starting from the formulation of the Mumford-Shah like energy used by Chan and Vese [9], we obtain a variational optimization problem of the form (3). The multiplicative part of the data term is used as shape information s(x) to model the shape prior segmentation energy. In addition, we use the weighted Total Variation (8) to model the GAC energy and add a parameter λ to balance regularization and shape force:
$$\min_{0\le u\le 1}\int_\Omega g\,|\nabla u|\,dx+\lambda\int_\Omega s(x)\,u\,dx.\qquad(9)$$
For a low λ, the GAC result is preferred, whereas for increasing λ the shape prior is taken more into account. Fig. 2 shows the effects of different parameter settings. Since (9) is homogeneous of degree one, the thresholding theorem of [11] still applies in our case, which allows us to compute the global minimizer of (9) as described in Section 2.1 for the MS segmentation model (3). The optimization method is discussed in more detail in Section 3.1.
Fig. 2. Evaluation of different settings of λ. With increasing λ ∈ {0.01, 0.02, 0.1} (left to right), the segmentation is more strongly attracted to the fixed shape. For a low λ, the influence of the pure GAC energy increases and the segmentation is more attracted to significant edges.
The shape prior itself influences the segmentation by setting pixelwise foreground and background constraints, which are modelled as follows:
$$\begin{cases}s(x)<0 & \text{foreground},\\ s(x)>0 & \text{background}.\end{cases}\qquad(10)$$
Different types of shape representations can therefore be used. As a simple example, we define the shape s(x) as a binary function with s(x) = −1 inside the shape region and s(x) = 1 outside, similar to [20], where Cremers et al. use subspace methods to learn a representative set of shapes. Binary functions encoding shapes lead to problems when interpolating between two instances. As a consequence, we use a signed distance map satisfying the constraints (10), which implicitly includes distance information with respect to the shape boundary. For our algorithm, this means that the deeper a pixel lies inside the shape boundary, the more likely it belongs to the desired segmentation. Fig. 5 shows the benefit of using a signed distance map as shape representation compared to a binary one. This representation, combined with the GAC energy, allows deformations to be handled with a single prior.

3.1 Solving the Shape Prior Segmentation Model
It is well known that functionals like (9) are difficult to optimize because of the non-differentiable L1 norm |∇u|. Chan et al. [21], Carter [22] and Chambolle [23, 24] proposed dual formulations for optimizing the classical variational denoising problem of Rudin, Osher and Fatemi [25]. Such a primal-dual approach can be applied to our minimization problem (9). The main idea is to remove the singularity by introducing the dual formulation of the weighted TV-norm,
$$\int_\Omega g\,|\nabla u|\,dx=\max_{\|p\|\le g}\left\{-\int_\Omega u\,\mathrm{div}\,p\,dx\right\},\qquad(11)$$
where $p=(p^1,\dots,p^d)^T:\Omega\to\mathbb{R}^d$ is the dual variable and d the dimension of the problem. Combining this maximization problem with the initial minimization task (9) leads to
$$\min_{0\le u\le 1}\ \max_{\|p\|\le g}\ \left\{-\int_\Omega u\,\mathrm{div}\,p\,dx+\lambda\int_\Omega s(x)\,u\,dx\right\}.\qquad(12)$$
For a fixed shape prior, the primal-dual optimization algorithm is outlined as follows:

1. Primal update: the primal update performs the segmentation update by minimizing with respect to u:
$$\frac{\partial}{\partial u}\left(-\int_\Omega u\,\mathrm{div}\,p\,dx+\lambda\int_\Omega s(x)\,u\,dx\right)=-\mathrm{div}\,p+\lambda s(x).\qquad(13)$$
A gradient descent step then gives
$$u^{n+1}=\Pi_{[0,1]}\Big(u^n-\tau_P\big(-\mathrm{div}\,p+\lambda s(x)\big)\Big),\qquad(14)$$
where $\tau_P$ denotes the step length and the orthogonal projection $\Pi_{[0,1]}$ onto the interval [0, 1] amounts to a simple clipping step.

2. Dual update: the maximization with respect to p can be stated as
$$\frac{\partial}{\partial p}\left(\int_\Omega p\cdot\nabla u\,dx+\lambda\int_\Omega s(x)\,u\,dx\right)=\nabla u,\qquad(15)$$
with the additional constraint $\|p\|\le g$. This results in a gradient ascent step with an orthogonal reprojection that restricts the length of p to the weight g:
$$p^{n+1}=\Pi_{B_0^g}\big(p^n+\tau_D\nabla u\big).\qquad(16)$$
Here $B_0^g$ denotes the d-dimensional ball of radius g centered at the origin. The reprojection onto $B_0^g$ can be written as
$$\Pi_{B_0^g}(q)=\frac{q}{\max\big(1,\|q\|/g\big)}.\qquad(17)$$
3. Iterate until convergence: Solving the optimization problem (12) results in a consecutive update scheme with a gradient descent (14) and a gradient ascent step (16). Such an iterative algorithm demand on a convergence criterion. Therefore we take the energy of the single steps into account: Primal energy: The primal energy can be calculated by solving (12) by maximizing the equation towards the dual variable p. Due to (11) p can be restated as ∇u g |∇u| p ∈ B0g
p=
if ∇u =0
(18)
else
for the optimization. This results into the energy equation (19) which is the same as evaluating the energy functional (9). g |∇u| + λ s (x) u dx. (19) EP rimal = Ω
M. Werlberger et al.
Fig. 3. Relation of primal to dual energy while optimizing with the proposed primal-dual update scheme. The plot shows 100 iterations on a logarithmic scale. Note that after 20 iterations the primal-dual gap is small enough to stop iterating.
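Dual energy: The dual energy is obtained by minimizing (12) with respect to $u$:
$$\min_{0 \le u \le 1} \int_\Omega u \left( -\operatorname{div} p + \lambda s(x) \right) dx. \qquad (20)$$
This implies that the binary segmentation $u \in \{0, 1\}$ is set to $u = 1$ if $-\operatorname{div} p + \lambda s(x) < 0$ and $u = 0$ otherwise:
$$E_{Dual} = \int_\Omega \min\left( -\operatorname{div} p + \lambda s(x),\, 0 \right) dx. \qquad (21)$$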
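The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' CUDA implementation. It makes the following assumptions: a 2D pixel grid; forward differences for $\nabla$ with Neumann boundaries, paired with the matching backward-difference divergence so that $\sum p \cdot \nabla u = -\sum u\,\operatorname{div} p$ holds exactly; the initialization $u = 0.5$, $p = 0$; a strictly positive weight array `g`; and the step sizes $\tau_P = \tau_D = 1/\sqrt{2}$ reported for this paper. The function names `grad`, `div`, `energies` and `segment` are hypothetical.

```python
import math


def grad(u):
    """Forward differences; the last column (x) and last row (y) get 0 (Neumann boundary)."""
    h, w = len(u), len(u[0])
    gx = [[u[i][j + 1] - u[i][j] if j + 1 < w else 0.0 for j in range(w)] for i in range(h)]
    gy = [[u[i + 1][j] - u[i][j] if i + 1 < h else 0.0 for j in range(w)] for i in range(h)]
    return gx, gy


def div(px, py):
    """Backward differences chosen so that sum(p . grad u) == -sum(u * div p) exactly."""
    h, w = len(px), len(px[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            dx = (px[i][j] if j + 1 < w else 0.0) - (px[i][j - 1] if j > 0 else 0.0)
            dy = (py[i][j] if i + 1 < h else 0.0) - (py[i - 1][j] if i > 0 else 0.0)
            out[i][j] = dx + dy
    return out


def energies(u, px, py, s, g, lam):
    """Discrete primal energy (19) and dual energy (21)."""
    gx, gy = grad(u)
    d = div(px, py)
    h, w = len(u), len(u[0])
    e_primal = sum(g[i][j] * math.hypot(gx[i][j], gy[i][j]) + lam * s[i][j] * u[i][j]
                   for i in range(h) for j in range(w))
    e_dual = sum(min(-d[i][j] + lam * s[i][j], 0.0) for i in range(h) for j in range(w))
    return e_primal, e_dual


def segment(s, g, lam, tau_p=1 / math.sqrt(2), tau_d=1 / math.sqrt(2), max_iter=500, tol=1e-6):
    """Alternate the primal step (14) and the dual step (16)-(17) until the gap is <= tol.
    g must be strictly positive everywhere."""
    h, w = len(s), len(s[0])
    u = [[0.5] * w for _ in range(h)]
    px = [[0.0] * w for _ in range(h)]
    py = [[0.0] * w for _ in range(h)]
    gap = math.inf
    for _ in range(max_iter):
        # 1. Primal update (14): descent on u, then projection onto [0, 1] by clipping.
        d = div(px, py)
        u = [[min(1.0, max(0.0, u[i][j] - tau_p * (-d[i][j] + lam * s[i][j])))
              for j in range(w)] for i in range(h)]
        # 2. Dual update (16): ascent on p, then reprojection (17) onto the ball of radius g.
        gx, gy = grad(u)
        for i in range(h):
            for j in range(w):
                qx = px[i][j] + tau_d * gx[i][j]
                qy = py[i][j] + tau_d * gy[i][j]
                scale = max(1.0, math.hypot(qx, qy) / g[i][j])
                px[i][j], py[i][j] = qx / scale, qy / scale
        # 3. Stop once the primal-dual gap (19) - (21) is small.
        e_primal, e_dual = energies(u, px, py, s, g, lam)
        gap = e_primal - e_dual
        if gap <= tol:
            break
    return u, gap
```

As an example, take an 8×8 grid with $s = -1$ on a 4×4 square and $s = 1$ elsewhere, $g = 0.1$ and $\lambda = 1$. After a single iteration, `segment` returns $u = 1$ exactly on the square, and the gap vanishes. On real images many more iterations are needed (cf. Fig. 3). Because the discrete divergence is the exact negative adjoint of the discrete gradient, the computed gap can never become negative for feasible iterates. That guarantee is why this pairing was chosen.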
In [26, 27], Zhu et al. introduce a measure of the convergence state of primal-dual algorithms for the ROF model: the gap between the primal and the dual energy. Applied to our primal-dual optimization algorithm, the two energies approach each other as shown in Fig. 3. We use fixed step widths for the primal ($\tau_P$) and dual ($\tau_D$) updates, subject to the constraint $\tau_P \tau_D \le \frac{1}{2}$. For all results shown in this paper we used $\tau_P = \tau_D = \frac{1}{\sqrt{2}}$. We also tried adaptive time steps $(\tau_P, \tau_D)$ similar to the work of Zhu and Chan [26], but did not find a reasonable equivalent for our method.

3.2 Shape Alignment
So far, our considerations assume a spatially fixed shape prior $s(x)$. To adapt the shape prior to different locations in the image, we introduce a set of transformation parameters $\phi = \{t, R, S\}$, with $t$ for translation, $R$ for rotation and $S$ for scale. Imposing this transformation on the segmentation energy (9) introduces an additional optimization parameter:
$$\min_{u,\phi} \int_\Omega g\,|\nabla u|\,dx + \lambda \int_\Omega \left( \phi(t, R, S) \circ s(x) \right) u\,dx. \qquad (22)$$
Fig. 4. Segmentation of a vertebra in an X-ray image of the spine. Due to the very poor contrast, segmentation without a prior would fail completely. The prior was defined by us and therefore cannot be considered reference data.
In [20], Cremers et al. use Lipschitz continuity to show that, with an integrated rigid-body motion, the energy functional remains convex for fixed transformation parameters and can therefore be optimized globally. They also show that a global optimization of the shape position itself requires sampling all possible shape positions over the complete domain $\Omega$ on which the energy (9) is defined, on a rather fine grid. This is not feasible for an interactive application, which has to solve the problem in real time. We therefore adopt a semi-automated approach. The user influences the shape position, and an automated search then looks for the optimal transformation parameters in a local neighborhood. The optimization scheme of Sect. 3.1 is retained and extended by an optimization of the transformation $\phi$:
1. Solve the shape prior segmentation model.
2. Optimize the transformation parameters $\phi$ with $u$ fixed: Here we use a semi-automated position optimization of the shape prior. First, the user coarsely positions the shape; second, an optimization step tries to fit the shape to the desired surrounding structures. To this end, we evaluate the energy (19) for different transformations $\phi$. The optimal position is where (19) attains its minimum. The user gets an immediate result while changing the shape position and can therefore interact directly with the segmentation algorithm.
3. Iterate until convergence.
Note that the domain of $\phi$ is restricted for performance reasons. With a complete search over the whole parameter space of the transformation, a globally optimal solution of (22) can be calculated as in [20].
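Step 2 can be illustrated with a brute-force sketch. It makes four assumptions: only translations are searched (rotation $R$ and scale $S$ are omitted); a binary prior mask is converted to a signed-distance map with $s < 0$ inside, as described in Sect. 3; the search is exhaustive within a small window around the user's coarse placement; and the TV term of (19) is ignored because it does not depend on $\phi$, so only the shape term $\sum_x s_\phi(x)\,u(x)$ is compared. All function names are hypothetical. A real-time implementation would use a fast distance transform instead of the quadratic-time loop below.

```python
import math


def signed_distance(mask):
    """Brute-force signed distance map: negative inside the shape, positive outside.
    Requires at least one inside and one outside pixel."""
    h, w = len(mask), len(mask[0])
    inside = [(i, j) for i in range(h) for j in range(w) if mask[i][j]]
    outside = [(i, j) for i in range(h) for j in range(w) if not mask[i][j]]
    sdf = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Distance to the nearest pixel on the other side of the boundary.
            others = outside if mask[i][j] else inside
            d = min(math.hypot(i - a, j - b) for a, b in others)
            sdf[i][j] = -d if mask[i][j] else d
    return sdf


def shift_mask(mask, dy, dx):
    """Translate a binary mask by (dy, dx); pixels shifted in from outside are background."""
    h, w = len(mask), len(mask[0])
    return [[0 <= i - dy < h and 0 <= j - dx < w and bool(mask[i - dy][j - dx])
             for j in range(w)] for i in range(h)]


def shape_term(sdf, u):
    """Shape part of energy (19) for fixed u (lambda omitted: it only scales the term)."""
    return sum(s * v for srow, urow in zip(sdf, u) for s, v in zip(srow, urow))


def align(prior_mask, u, start, radius):
    """Exhaustive local search over translations within `radius` of the coarse position `start`."""
    best_energy, best_shift = math.inf, None
    for dy in range(start[0] - radius, start[0] + radius + 1):
        for dx in range(start[1] - radius, start[1] + radius + 1):
            m = shift_mask(prior_mask, dy, dx)
            n_inside = sum(map(sum, m))
            if n_inside == 0 or n_inside == len(m) * len(m[0]):
                continue  # shape left the image (or covers it): no signed distance defined
            e = shape_term(signed_distance(m), u)
            if e < best_energy:
                best_energy, best_shift = e, (dy, dx)
    return best_shift
```

As an example, consider a 3×3 prior placed by the user at offset (4, 5) in a 12×12 image whose current segmentation $u$ is a 3×3 square at (5, 6). `align(prior, u, start=(4, 5), radius=2)` returns (5, 6): the exact overlap collects the most negative signed distances, while every shifted placement moves some $u = 1$ pixels into the positive (outside) region.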
4 Experimental Results
To reach real-time performance, we have to compute the iterative solution as fast as possible. Owing to the good parallelization properties of variational algorithms, we implemented the method using GPGPU programming with NVIDIA's CUDA. This makes it possible to combine user interactivity with the computationally intensive variational method and to obtain segmentation results at interactive rates. The performance depends mainly on the size of the search region in the parameter space and on the size of the shape prior. For example, we achieve 80 frames per second on an NVIDIA GeForce GTX 280 for pure segmentation and 20 frames per second including the position optimization. Our framework offers two ways to provide a shape prior for the segmentation. In the first, the user defines a shape prior directly by segmenting a structure with the pure GAC energy, setting foreground and background seeds. This is especially useful when recurring structures have to be segmented. In the second, for more accurate results especially on difficult data, a predefined shape structure is loaded and used for the segmentation. In Fig. 1 a predefined prior is used to segment finger bones, and the result is compared with a simple intensity thresholding and with a segmentation using the pure GAC energy. The shape prior segmentation is identical to the reference data labeled by an expert. A more difficult example is shown in Fig. 4: the segmentation of a single vertebra in an X-ray image of the spine. Due to the very poor contrast, simple thresholding would fail completely, and the pure GAC energy would require a large amount of seed information.

Fig. 5. The same dataset as in Fig. 1 was used to evaluate the shape alignment step. The shape prior is placed near the desired bone (left image), and the alignment step searches for the optimal position in the local neighborhood. The result (middle image) shows that the segmentation is equivalent to the hand-labeled points. The right image shows that the alignment step fails when a binary shape prior is used. The correct result in the middle image was obtained with a signed-distance representation of the shape term.

Fig. 6. The position refinement is robust against partial occlusion.
Fig. 7. Real-time tracking of an espresso cup in a live camera stream.
Fig. 8. Multiple vertebrae of the cervical spine are tracked separately through a sequence. The intention is to ascertain the movement of the vertebrae relative to each other.
Fig. 9. Segmentation of bottles with a single shape prior.
Examples of automated shape alignment, which optimizes the transformation parameters in a local neighborhood, are shown in Figs. 5–8. Fig. 5 again shows the labeled bone dataset. The overlay with the reference data shows that the position optimization leads to the correct segmentation. Examples with non-medical image data are presented in Figs. 6 and 7. The first shows that the proposed method is robust against partial occlusion. Furthermore, the method can be used for tracking a certain structure over a sequence of frames. Fig. 7 shows a sequence of an espresso cup that was processed directly from a camera image with on-the-fly segmentation and position optimization. For a restricted domain of movement, we achieve real-time performance when tracking a sequence. In Fig. 8 we track parts of the cervical spine in a moving X-ray image series to obtain the relative movement of the vertebrae during a flexion. The predefined shapes of the four vertebrae are initialized in the first frame and then automatically tracked over the complete flexion. This can be used to determine the shapes of implants for intervertebral discs. In Fig. 9 multiple objects are segmented with a single shape prior. A prior in the form of a bottle is defined and placed roughly on each bottle in the image, and the fine adjustments are done automatically.
5 Conclusion
In this paper, we proposed a globally optimal shape prior segmentation method with additional user interaction and automated position refinement. With this approach we can handle very different images and obtain robust segmentation results. In particular, the segmentation of difficult data such as the low-contrast spine image benefits from the additional shape information. A great advantage of variational methods like this one is that they parallelize well, and modern graphics hardware can exploit this to boost the performance of such highly parallel algorithms.
References
1. Cremers, D., Tischhäuser, F., Weickert, J., Schnörr, C.: Diffusion snakes: Introducing statistical shape knowledge into the Mumford–Shah functional. International Journal of Computer Vision 50(3), 295–313 (2002)
2. Leventon, M., Faugeras, O., Grimson, W.: Level set based segmentation with intensity and curvature priors. In: Workshop on Mathematical Methods in Biomedical Image Analysis, pp. 4–11 (2000)
3. Leventon, M., Grimson, W., Faugeras, O.: Statistical shape influence in geodesic active contours. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 316–323. IEEE, Los Alamitos (2000)
4. Paragios, N., Rousson, M., Ramesh, V.: Matching distance functions: A shape-to-area variational approach for global-to-local registration. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2351, pp. 775–789. Springer, Heidelberg (2002)
5. Paragios, N., Rousson, M., Ramesh, V.: Non-rigid registration using distance functions. Computer Vision and Image Understanding 89(2-3), 142–165 (2003)
6. Unger, M., Pock, T., Bischof, H.: Continuous Globally Optimal Image Segmentation with Local Constraints. In: Computer Vision Winter Workshop (2008)
7. Unger, M., Pock, T., Trobin, W., Cremers, D., Bischof, H.: TVSeg - Interactive total variation based image segmentation. In: British Machine Vision Conference (2008)
8. Mumford, D., Shah, J.: Optimal approximations by piecewise smooth functions and variational problems. Comm. on Pure and Applied Math. XLII(5), 577–685 (1988)
9. Chan, T.F., Vese, L.A.: Active contours without edges. IEEE Trans. Image Processing 10(2), 266–277 (2001)
10. Potts, R.B.: Some generalized order-disorder transformations. Proc. Camb. Phil. Soc. 48, 106–109 (1952)
11. Chan, T.F., Esedoglu, S., Nikolova, M.: Algorithms for finding global minimizers of image segmentation and denoising models. SIAM Journal of Applied Mathematics 66(5), 1632–1648 (2006)
12. Kass, M.: Snakes: Active contour models. International Journal of Computer Vision 1(4), 321–331 (1980)
13. Caselles, V., Kimmel, R., Sapiro, G.: Geodesic active contours. International Journal of Computer Vision 22(1), 61–79 (1997)
14. Kichenassamy, S., Kumar, A., Olver, P., Tannenbaum, A., Yezzi, A.: Conformal curvature flows: From phase transitions to active vision. Archive for Rational Mechanics and Analysis, 275–301 (1996)
15. Kichenassamy, S., Kumar, A., Olver, P.J., Tannenbaum, A.R., Yezzi Jr., A.J.: Gradient flows and geometric active contour models. In: International Conference on Computer Vision, pp. 810–815 (1995)
16. Leung, S., Osher, S.: Global minimization of the active contour model with TV-inpainting and two-phase denoising. In: Paragios, N., Faugeras, O., Chan, T., Schnörr, C. (eds.) VLSM 2005. LNCS, vol. 3752, pp. 149–160. Springer, Heidelberg (2005)
17. Bresson, X., Esedoglu, S., Vandergheynst, P., Thiran, J.P., Osher, S.J.: Global minimizers of the active contour/snake model. In: International Conference on Free Boundary Problems: Theory and Applications (FBP) (2005)
18. Bresson, X., Esedoglu, S., Vandergheynst, P., Thiran, J.P., Osher, S.J.: Fast global minimization of the active contour/snake model. J. of Mathematical Imaging and Vision 28(2), 151–167 (2007)
19. Chan, T.F., Esedoglu, S.: Aspects of total variation regularized L1 function approximation. SIAM Journal of Applied Mathematics 65(5), 1817–1837 (2005)
20. Cremers, D., Schmidt, F.R., Barthel, F.: Shape priors in variational image segmentation: Convexity, Lipschitz continuity and globally optimal solutions. In: Computer Vision and Pattern Recognition, pp. 1–6 (2008)
21. Chan, T.F., Golub, G.H., Mulet, P.: A nonlinear primal-dual method for total variation-based image restoration. SIAM Journal on Scientific Computing 20(6), 1964–1977 (1999)
22. Carter, J.: Dual Methods for Total Variation-based Image Restoration. PhD thesis, UCLA (2001)
23. Chambolle, A.: An algorithm for total variation minimization and applications. Journal of Mathematical Imaging and Vision 20(1-2), 89–97 (2004)
24. Chambolle, A.: Total variation minimization and a class of binary MRF models. In: Rangarajan, A., Vemuri, B.C., Yuille, A.L. (eds.) EMMCVPR 2005. LNCS, vol. 3757, pp. 136–152. Springer, Heidelberg (2005)
25. Rudin, L.I., Osher, S.J., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena 60, 259–268 (1992)
26. Zhu, M., Chan, T.: An efficient primal-dual hybrid gradient algorithm for total variation image restoration. UCLA CAM Report 08-34 (2008)
27. Zhu, M., Wright, S.J., Chan, T.F.: Duality-based algorithms for total variation image restoration. UCLA CAM Report 08-33 (2008)
A Nonlinear Probabilistic Curvature Motion Filter for Positron Emission Tomography Images

Musa Alrefaya, Hichem Sahli, Iris Vanhamel, and Dinh Nho Hao

Vrije Universiteit Brussel, Dept. Electronics and Informatics ETRO-IRIS, Pleinlaan 2, B-1050 Brussels, Belgium
{malrefay,hsahli,iuvanham}@etro.vub.ac.be
http://www.etro.vub.ac.be
Abstract. Positron Emission Tomography (PET) is an important nuclear medicine imaging technique which enhances the effectiveness of diagnosing many diseases. The raw projection data (the sinogram) from which the PET image is reconstructed contain a very high level of Poisson noise. This noise complicates the interpretation of the PET image and may lead to erroneous diagnoses. Suitable denoising techniques applied prior to reconstruction can significantly alleviate the problem. In this paper, we propose filtering the sinogram with a constrained curvature motion diffusion, for which we compute the edge-stopping function in terms of edge probability under the assumption of contamination by Poisson noise. We demonstrate through simulations with images contaminated by Poisson noise that the performance of the proposed method substantially surpasses that of recently published methods, both visually and in terms of statistical measures.
1 Introduction
Positron Emission Tomography (PET) is an in vivo nuclear medicine imaging method that provides functional information about body tissues. The PET image is reconstructed from very noisy, low-resolution raw data, the sinogram, in which important features appear as curved structures. Enhancing the PET image has spurred a wide range of denoising models and algorithms. Some methodologies focus on enhancing the reconstructed PET image directly, whereas others prefer enhancing the sinogram prior to reconstruction. Existing methods may suffer from drawbacks such as the careful selection of a large number of parameters, smoothing of the boundaries of important features, or prohibitive computation. Recently, nonlinear diffusion techniques have been investigated for PET images. Many researchers have applied the well-known Perona and Malik anisotropic diffusion [15], in combination with diverse diffusivity functions, to PET images [2, 4, 5, 14, 26] as well as to sinograms [6, 25]. Such images are characterized by Poisson noise. Under this noise, the main drawback of the filter is that the diffusion produces strong oscillations in the gradient, which ultimately lead to a poorly smoothed image [28, 29]. Moreover, the adopted diffusivity functions do not consider the special properties of the sinogram, in which the preservation of the curved features is paramount (see Fig. 1). In [28], mean curvature motion and Gaussian curvature motion of PET images have been investigated; a total variation (TV) scheme for smoothing PET images was also discussed there. Happonen et al. [29] propose filtering the sinogram in the stackgram domain, where the signal along the sinusoidal trajectories of the sinogram can be filtered separately; they used and compared Gaussian and nonlinear filters. Filtering the sinogram has the advantage that the noise distribution is known, which is not the case after reconstruction. Consequently, this work proposes filtering the sinogram by means of a curvature-constrained filter. The amount of diffusion is modulated by a probabilistic diffusivity function suited to images contaminated with Poisson noise. In addition, the proposed method is compared with the TV-based methodologies proposed in [1] and [12]. For this purpose, we use a simulated thorax PET phantom to which varying levels of Poisson noise have been added. To evaluate the filtering approaches, contrast-noise curves (contrast versus background noise at different iteration numbers) were generated. The remainder of the paper is organized as follows. Section 2 briefly reviews the notions of curvature motion, edge-affected diffusion filtering, and self-snakes. The proposed filtering scheme is introduced in Sect. 3. Section 4.1 introduces the applied validation methods, and the remainder of Sect. 4 discusses the experimental results. Conclusions and future work are given in Sect. 5.

The authors sincerely wish to thank Prof. M. Defrise, Division of Nuclear Medicine at AZ-VUB, for his discussions and feedback. The comparison to the TV-Nesterov scheme would not have been possible without the help of Dr. Pierre Weiss from INRIA, France, who provided us with the Matlab code.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 212–223, 2009. © Springer-Verlag Berlin Heidelberg 2009

A Curvature Motion Filter for PET
2 Geometry-Driven Scale-Space Filtering
This section reviews the formulations of mean curvature motion (MCM), Edge Affected Variable Conductance Diffusion (EA-VCD), and self-snakes. Let $f$ be a scalar image defined on the spatial image domain $\Omega$. The family of diffused versions of $f$ is then given by
$$U(f) : f(\cdot) \mapsto u(\cdot, t) \quad \text{with} \quad u(\cdot, 0) = f(\cdot), \qquad (1)$$
where $U$ is referred to as the scale-space filter, $u$ is called the scale-space image, and $t \in \mathbb{R}^+$ is the scale [23, 26]. The denoised or enhanced version of $f$ is the $u(\cdot, t)$ that is closest to the unknown noise-free version of $f$.

2.1 Curvature Motion
One way of introducing smoothness in a curve is to let it evolve under its Euclidean curvature $k$. Mean curvature motion (MCM) is considered the standard curvature evolution. MCM allows diffusion solely along the level lines. In gauge coordinates, where $w$ points along the gradient and $v$ along the level line, the corresponding PDE is
$$u_t(\cdot, t) = u_{vv} = k\,|\nabla u| = |\nabla u|\,\operatorname{div}\left( \frac{\nabla u}{|\nabla u|} \right). \qquad (2)$$
Hence diffusion occurs solely along the $v$-axis.

M. Alrefaya et al.

2.2 Edge Affected Variable Conductance Diffusion
Variable Conductance Diffusion (VCD) is based on diffusion with a variable conduction coefficient that controls the rate of diffusion [23]. In Edge Affected VCD (EA-VCD), the conductance coefficient decreases as the edgeness increases. It is therefore commonly referred to as the edge-stopping function $g$, with the edgeness typically measured by the gradient magnitude. The EA-VCD is governed by
$$u_t = \operatorname{div}\left[ g(|\nabla u|)\,\nabla u \right]. \qquad (3)$$
This PDE, together with the initial condition given in (1), is completed with homogeneous Neumann boundary conditions on the boundary of the image domain. Note that the anisotropic diffusion of Perona and Malik [15] is an EA-VCD.

2.3 Self-Snakes
Self-snakes are a variant of MCM in which an edge-stopping function is introduced [19]. The main goal is to prevent further shrinking of the level lines once they have reached the important image edges. For scalar images, self-snakes are governed by
$$u_t = |\nabla u|\,\operatorname{div}\left( g(|\nabla u|)\,\frac{\nabla u}{|\nabla u|} \right). \qquad (4)$$
This equation adopts the same boundary condition as (3). Furthermore, it can be decomposed into two parts [19, 23]:
$$u_t = g\,|\nabla u|\,\operatorname{div}\left( \frac{\nabla u}{|\nabla u|} \right) + \nabla g \cdot \nabla u = g\,k\,|\nabla u| + \nabla g \cdot \nabla u. \qquad (5)$$
The first part describes a degenerate forward diffusion along the level lines, i.e. orthogonal to the local gradient, which preserves the edges. The diffusion is also limited in areas with high gradient magnitude and encouraged in smooth areas; in fact, the first term is a constrained curvature motion. The second term can be viewed as a shock filter, since it pushes the level lines towards valleys of high gradient, acting as Osher's shock filter [18].
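MCM, EA-VCD and self-snakes fit a common template $u_t = g_1 u_{vv} + g_2 u_{ww}$, with $u_{vv}$ the second derivative along the level line and $u_{ww}$ along the gradient. Two identities, valid wherever $\nabla u \neq 0$, make this visible: $\Delta u = u_{vv} + u_{ww}$ and $\nabla |\nabla u| \cdot \nabla u = |\nabla u|\,u_{ww}$. Writing $x = |\nabla u|$ and using $\nabla g = g'(x)\,\nabla |\nabla u|$, a short derivation (ours, added for illustration) gives:

```latex
\begin{aligned}
\text{MCM (2):}\quad
  & u_t = u_{vv}
  && \Rightarrow\ g_1 = 1,\ g_2 = 0,\\
\text{EA-VCD (3):}\quad
  & u_t = g\,\Delta u + \nabla g \cdot \nabla u
        = g\,u_{vv} + \bigl(g + x\,g'(x)\bigr)\,u_{ww}
  && \Rightarrow\ g_1 = g,\ g_2 = g + x\,g'(x),\\
\text{Self-snakes (5):}\quad
  & u_t = g\,k\,|\nabla u| + \nabla g \cdot \nabla u
        = g\,u_{vv} + x\,g'(x)\,u_{ww}
  && \Rightarrow\ g_1 = g,\ g_2 = x\,g'(x).
\end{aligned}
```

The EA-VCD line agrees with the rewritten form quoted in (14) below. The three flows differ only in how much diffusion they allow across the level lines.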
3 The Probabilistic Curvature Motion Filter
This section introduces the proposed filtering schemes for PET sinograms. They are based on (i) the curvature motion method and (ii) a probabilistic diffusivity function that we presented in earlier work [16], and they take the following characteristics into account:
1. The important features in the sinogram are curved structures with high contrast values. These represent the regions of interest in the reconstructed PET image, e.g. tumors.
2. The weak edges in the sinogram are the edges with low contrast values, in other words, edges with small $|u_{ww}|$.
3. The noise in the sinogram is known a priori to be Poisson noise.
The schemes presented above, namely MCM (2), EA-VCD (3) and self-snakes (5), can be derived from the following general equation:
$$u_t = g_1(|\nabla u|)\,u_{vv} + g_2(|\nabla u|)\,u_{ww}, \qquad (6)$$
where the second-order gauge derivatives of the image in the $vv$ and $ww$ directions are given by
$$u_{vv} = \frac{u_{xx} u_y^2 - 2 u_x u_y u_{xy} + u_{yy} u_x^2}{u_x^2 + u_y^2}, \qquad u_{ww} = \frac{u_{xx} u_x^2 + 2 u_x u_y u_{xy} + u_{yy} u_y^2}{u_x^2 + u_y^2}. \qquad (7)$$
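Equation (7) translates directly into code. The sketch below assumes central finite differences on a unit-spaced grid, with $x$ along columns and $y$ along rows. It adds a small `eps` to avoid division by zero where $\nabla u = 0$. It is a plain illustration, not the numerical scheme of [23], and the function names are hypothetical.

```python
def gauge_second_derivatives(ux, uy, uxx, uxy, uyy, eps=1e-12):
    """Return (u_vv, u_ww) from first and second partial derivatives, following (7).

    v is tangent to the level line and w points along the gradient; eps guards |grad u| = 0.
    """
    n2 = ux * ux + uy * uy + eps
    uvv = (uxx * uy * uy - 2.0 * ux * uy * uxy + uyy * ux * ux) / n2
    uww = (uxx * ux * ux + 2.0 * ux * uy * uxy + uyy * uy * uy) / n2
    return uvv, uww


def gauge_at(u, i, j):
    """Central differences at an interior pixel (i, j); x runs along columns, y along rows."""
    ux = (u[i][j + 1] - u[i][j - 1]) / 2.0
    uy = (u[i + 1][j] - u[i - 1][j]) / 2.0
    uxx = u[i][j + 1] - 2.0 * u[i][j] + u[i][j - 1]
    uyy = u[i + 1][j] - 2.0 * u[i][j] + u[i - 1][j]
    uxy = (u[i + 1][j + 1] - u[i + 1][j - 1] - u[i - 1][j + 1] + u[i - 1][j - 1]) / 4.0
    return gauge_second_derivatives(ux, uy, uxx, uxy, uyy)
```

For any $u$ the two outputs satisfy $u_{vv} + u_{ww} = u_{xx} + u_{yy} = \Delta u$ (up to `eps`), which is a convenient sanity check for an implementation of (7). For the quadratic $u = x^2 + 3y^2 + xy$, for instance, the sum is 8 at every interior pixel.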
Equation (6) comprises a diffusion modulated by $g_1$ along the image edges ($vv$, a smoothing term) and a diffusion adjustable by $g_2$ across the image edges ($ww$, a sharpening term). Careful modeling of these terms allows efficient denoising of PET sinograms while keeping their interesting features. In the following sections, we propose the use of a probabilistic diffusivity function and derive two diffusion schemes, for which we apply the numerical approximation of gauge derivatives described in [23].

3.1 The Probabilistic Diffusivity Function
The main idea of the probabilistic diffusivity function [16] is to express the diffusivity as the probability that the observed gradient presents no edge of interest, under a suitable marginal prior distribution for the noise-free gradient histogram. The diffusivity function was defined as
$$g_{pr}(x) = A\left( 1 - P(H_1 \mid x) \right), \qquad (8)$$
where the normalizing constant $A = 1/(1 - P(H_1 \mid 0))$ ensures that $g_{pr}(0) = 1$. The hypothesis $H_1$ states that an edge element of interest is present given the considered noise, and $H_0$ that it is absent. Formally,
$$H_0 : y \le \sigma_n, \qquad H_1 : y > \sigma_n, \qquad (9)$$
with $y$ being the ideal, noise-free gradient magnitude and $\sigma_n$ the noise standard deviation in the observed gradient image. In [16] it has been demonstrated that
$$g_{pr}(x) = \frac{1 + \mu\,\eta(0)}{1 + \mu\,\eta(x)}, \qquad (10)$$
where $\mu = P(H_1)/P(H_0)$ is the prior odds and $\eta(x) = p(x \mid H_1)/p(x \mid H_0)$ is the likelihood ratio. Considering a Laplacian prior $p(y) = \frac{\lambda}{2} e^{-\lambda |y|}$, we have $\mu = \left( e^{\lambda \sigma_n} - 1 \right)^{-1}$ [16], and the parameter $\lambda$ can be estimated as
$$\lambda = \left[ 0.5 \left( \sigma^2 - \sigma_n^2 \right) \right]^{-1/2}, \qquad (11)$$
with $\sigma^2$ denoting the variance of the noisy image and $\sigma_n^2$ as defined above. Due to limited space, the reader is referred to [16] for the detailed expression of $\eta(x)$ in (10). The proposed diffusivity function (10) has no free parameters to optimize, and it fits well in the cluster of the reference backward-forward diffusivities. For the considered PET sinograms, the noise standard deviation $\sigma_n$ in (11) is estimated as $\sigma_n^2 = \mathrm{Var}(u_{L_n})$. Here the image noise $u_{L_n}$ is reconstructed from the coefficients of the two finest resolution levels of a Daubechies-4 wavelet decomposition of $u$.

3.2 Probabilistic Constrained Curvature Motion
For the probabilistic constrained curvature motion (PCCM), we start from a constrained version of mean curvature motion: diffusion across the level lines is prohibited, whereas diffusion along the level lines is controlled via the probabilistic diffusivity function (10):
$$u_t = g_{pr}(|\nabla u|)\,u_{vv} = g_{pr}(|\nabla u|)\,k\,|\nabla u|. \qquad (12)$$
Thus, to deal with Poisson noise, the function $g_1$ in (6) is chosen as $g_1 := g_{pr}$. This filter effectively smooths the image and preserves the edges of important features such as lines, curves and flow-like structures. By its nature, however, PCCM cannot enhance the weak edges and/or features in the sinogram. The second term in (6) allows sharpening; consequently, we set $g_2 := g_1$. In this way, weak but important edges are enhanced while the noise is removed efficiently. Formally, the enhanced PCCM (ePCCM) is given by
$$u_t = g_{pr}(|\nabla u|)\,u_{vv} + g_{pr}(|\nabla u|)\,u_{ww}. \qquad (13)$$

3.3 Probabilistic Self-Snakes (PSS)
It can be demonstrated that the diffusion of scalar images via EA-VCD can be decomposed in a similar way to (5). Moreover, it can be rewritten as [7, 23]
$$u_t = g(|\nabla u|)\,u_{vv} + \left[ g(|\nabla u|) + g'(|\nabla u|)\,|\nabla u| \right] u_{ww}, \qquad (14)$$
consolidating the properties of both the self-snakes and the EA-VCD into a single diffusion scheme. Considering equation (6) and the proposed probabilistic diffusivity function, we set $g_1(x) := g_{pr}(x)$ and the sharpening term $g_2(x) := g_{pr}(x) + x\,g_{pr}'(x)$. This filter proves to be very effective and flexible for the sinogram image. There, the high-contrast regions, which represent a tumor in the reconstructed PET image, should be smoothed carefully without blurring the weak edges. Like EA-VCD, the main advantage of this filter is that the average gray value of the image is not altered during the diffusion process, which is a significant issue in the sinogram.
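A minimal Python sketch of the ingredients of (10), (11) and the PSS flow follows. The closed form of the likelihood ratio $\eta(x)$ is only given in [16], so it is passed in as a caller-supplied function. The quadratic used in the example below is a hypothetical stand-in, not the function of [16]. By (14), the PSS choice $g_1 = g_{pr}$, $g_2 = g_{pr} + x\,g_{pr}'$ coincides with EA-VCD (3) for $g = g_{pr}$. One explicit time step can therefore be written in conservative flux form with reflecting (Neumann) boundaries. As a simplification, edge gradients are approximated by one-sided differences. All names are hypothetical.

```python
import math


def estimate_lambda(var_noisy, var_noise):
    """Laplacian-prior parameter from (11): lambda = [0.5 (sigma^2 - sigma_n^2)]^(-1/2)."""
    if var_noisy <= var_noise:
        raise ValueError("image variance must exceed the noise variance")
    return (0.5 * (var_noisy - var_noise)) ** -0.5


def prior_odds(lam, sigma_n):
    """mu = P(H1)/P(H0) = (exp(lambda * sigma_n) - 1)^(-1) under the Laplacian prior."""
    return 1.0 / math.expm1(lam * sigma_n)


def make_gpr(mu, eta):
    """Diffusivity (10). `eta` is a caller-supplied likelihood ratio p(x|H1)/p(x|H0);
    its closed form is given in [16] and is NOT reproduced here."""
    a = 1.0 + mu * eta(0.0)
    return lambda x: a / (1.0 + mu * eta(x))


def pss_step(u, g, dt=0.2):
    """One explicit step of u_t = div(g(|grad u|) grad u), i.e. (6) with g1 = g and
    g2 = g + x g'(x), in flux form with reflecting (Neumann) boundaries.
    Stable for dt <= 0.25 when 0 <= g <= 1."""
    h, w = len(u), len(u[0])
    new = [row[:] for row in u]
    for i in range(h):
        for j in range(w):
            if j + 1 < w:  # flux between (i, j) and its right neighbour
                d = u[i][j + 1] - u[i][j]
                f = dt * g(abs(d)) * d
                new[i][j] += f
                new[i][j + 1] -= f
            if i + 1 < h:  # flux between (i, j) and its lower neighbour
                d = u[i + 1][j] - u[i][j]
                f = dt * g(abs(d)) * d
                new[i][j] += f
                new[i + 1][j] -= f
    return new
```

For example, with $\sigma^2 = 3$ and $\sigma_n^2 = 1$, (11) gives $\lambda = 1$ and $\mu = 1/(e - 1)$. `make_gpr` then returns a diffusivity with $g_{pr}(0) = 1$ exactly. The flux form makes the mean-preservation property stated above easy to see. Each flux is added to one pixel and subtracted from its neighbour, so the total gray value is unchanged up to rounding.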
4 Experiments and Discussion

4.1 Introduction
The experiments measure the performance of the proposed filtering methods and study their influence on two commonly used PET reconstruction methods: filtered back projection (FBP) [11] and iterative ordered subset expectation maximization (OSEM) [8]. The FBP algorithm is computationally efficient, while the OSEM algorithm can easily incorporate prior information on the image to improve image quality. In our experiments, the reconstruction parameters are set as follows: the Hamming parameter in FBP is 0.5, while for OSEM we use 16 subsets and 4 iterations. In the following, the reconstructed PET images are denoted by $U_F(t) = \mathrm{FBP}(u_t)$ and $U_O(t) = \mathrm{OSEM}(u_t)$, respectively, for a given enhanced sinogram $u_t$. A simulated thorax PET phantom containing three regions of interest (tumors) was constructed, and varying levels of Poisson noise were added to it for the evaluation. 50 realizations (noisy sinograms) with an added noise level of $1 \times 10^6$ coincident events have been generated. Each sinogram has a size of 256×256 pixels with a pixel spacing of 2×2 mm. Fig. 1(a) shows the ideal noise-free sinogram with the PET images obtained via FBP (Fig. 1(c)) and OSEM (Fig. 1(d)) reconstruction. A corresponding noise-contaminated realization is shown in Fig. 1(b),(e)-(f). The proposed PCCM and PSS diffusion schemes have been evaluated against recent total variation (TV) denoising techniques. These are the approach of Chambolle [1], denoted hereafter as TV-C, and the algorithm of Nesterov [12], denoted as TV-N.

4.2 Quantitative Evaluation Measures
Two types of evaluation measures are adopted. The first set measures the quality of the filtering techniques, whereas the second set validates the quality of the PET reconstruction. As ground-truth information, the former uses the noise-free image, whereas the latter needs prior identification of the important areas by a medical professional.
Fig. 1. (a) Original simulated sinogram and its reconstructed PET image using (b) FBP and (c) OSEM reconstruction. (d) An example of one realization of a noisy sinogram and (e-f) the corresponding reconstructed PET images. The tumors (ROIs) are the three clearly visible white spots.
Denoising Quality. The idea is to verify the quality of the denoised sinogram $u_t$ with respect to the noise-free image $I$. In this work, we adopt the following measures [22]:
DQ1: The Peak Signal-to-Noise Ratio (PSNR) is a statistical measure of error used to determine the quality of the filtered images. It represents the ratio of the signal power to the power of the corrupting noise; the higher the PSNR, the better the quality.
$$\mathrm{PSNR}(t) = 10 \log_{10} \frac{\mathrm{Card}(\Omega)}{\sum_{p \in \Omega} |I(p) - u_t(p)|} \qquad (15)$$
DQ2: The correlation $C_{m\rho}$ between the noise-free and the filtered image. The higher this correlation, the better the quality.
$$C_{m\rho}(t) = \rho\left[ I, u_t \right] \qquad (16)$$
DQ3: The variance of the noise (NV) describes the remaining noise level; therefore, it should be as small as possible.
$$\mathrm{NV}(t) = \mathrm{Var}\left( |I - u_t| \right) \qquad (17)$$
In this work, we compare the optimum of each measure (the maximum of PSNR and $C_{m\rho}$, the minimum of NV) across the different filtering approaches, which yields the best obtainable result per measure.
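For reference, DQ2 and DQ3 can be computed as follows. This sketch assumes that images are equally sized nested lists, that $\rho$ is the Pearson correlation coefficient, and that Var is the population variance over all pixels. The helper `best_scale` is a hypothetical illustration: it picks, along a sequence of filtered images, the scale that maximizes (16). In practice $I$ is only available for simulated data such as the phantom used here.

```python
import math


def _flatten(img):
    return [v for row in img for v in row]


def correlation(a, b):
    """DQ2 (16): Pearson correlation rho[I, u_t] between two equally sized images."""
    xs, ys = _flatten(a), _flatten(b)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def noise_variance(a, b):
    """DQ3 (17): population variance of |I - u_t| over all pixels."""
    d = [abs(x - y) for x, y in zip(_flatten(a), _flatten(b))]
    m = sum(d) / len(d)
    return sum((v - m) ** 2 for v in d) / len(d)


def best_scale(reference, filtered_sequence):
    """Index t of the filtered image with the highest correlation to the reference."""
    return max(range(len(filtered_sequence)),
               key=lambda t: correlation(reference, filtered_sequence[t]))
```

For example, `correlation([[1, 2], [3, 4]], [[2, 4], [6, 8]])` is 1.0, since the second image is a linear rescaling of the first. The same pair gives `noise_variance` = 1.25, the variance of the absolute differences 1, 2, 3, 4.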
The Contrast Recovery Curve. To evaluate the filtering on the reconstructed PET images, the filtered data were reconstructed using FBP and OSEM. For the proposed PDE approaches, the data were taken at discrete scales; for the TV approaches, at several regularization parameter values. From the reconstructed data sets we determine the contrast recovery curve using a set of regions of interest identified by a medical professional. In our case, these are the three white spots that represent tumors in Fig. 1. This is accomplished by quantifying a contrast gain $cg$ and its variance $\mathrm{Var}_{cg}$. We calculate the contrast gain $cg_i$ for each realization $i \in [1, N]$, $N = 50$, and its overall variance. Let $R = \{r_1, r_2, \ldots, r_n\}$ be the set of identified ROIs ($n = 3$ in our case) and $B$ a representative background tissue area; then
$$cg_i(t) = \frac{1}{n} \sum_{r \in R} \frac{1}{\mathrm{Card}(r)} \sum_{p \in r} U^{(i)}(p, t) - \frac{1}{\mathrm{Card}(B)} \sum_{p \in B} U^{(i)}(p, t),$$
$$\mathrm{Var}_{cg}(t) = \frac{1}{N} \sum_{i=1}^{N} \left( cg_i(t) - \frac{1}{N} \sum_{j=1}^{N} cg_j(t) \right)^2, \qquad (18)$$
where $p$ is a pixel. The $\mathrm{Var}_{cg}$ versus $cg$ plot provides a straightforward evaluation of the contrast-noise tradeoff [3]. The best-quality PET reconstruction is situated in the upper (high contrast gain) left (high stability) area of the plot.

4.3 Evaluation
A fundamental issue with scale spaces induced by diffusion processes, such as the ones proposed in this paper, is the automatic selection of the most salient scale. For our PET sinogram denoising application, we use an optimal scale selection approach proposed earlier [22], which adopts the maximum correlation method:
$$t_{opt} = \arg\max_t C_{m\rho}(t). \qquad (19)$$
Following [22], $C_{m\rho}(t)$ is estimated from the standard deviations $\sigma[u_t]$ and $\sigma[n_o(t)]$ and their values $\sigma[u_{t_0}]$ and $\sigma[n_o(t_0)]$ at the zeroth scale. Here $n_o$ is the so-called outlier noise, which we estimate using the wavelet-based noise estimation described in Sect. 3.1. Note that $t_0$ is the zeroth scale, thus $u_{t_0} = f$ and $n_o(t_0)$ represents the initial amount of noise. Figs. 2(a) and 2(d) illustrate the results obtained at the optimal scale using the PSS approach and the TV-C approach, respectively. Table 1 lists the quantitative results comparing the different denoising approaches; the best-performing filtering method per measure is displayed in bold. As can be seen, the best filtering performance is achieved with PSS. Furthermore, for all measures used, the proposed diffusion methods outperform the considered TV-based filters. Figure 3 depicts the contrast recovery curves for the investigated filtering methods. Recall that the best enhancement is obtained when the contrast gain is as high as possible while its variance remains as small as possible. Furthermore, the smoothness of the curve indicates the stability and the level of bias.
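The contrast gain and its variance in (18) reduce to a few averages. The sketch below makes three assumptions. Each reconstructed realization $U^{(i)}(\cdot, t)$ is a nested list indexed by pixel coordinates. The ROIs and the background are given as coordinate lists. A curve like Fig. 3 is assembled by plotting the mean contrast gain against $\mathrm{Var}_{cg}$, one point per scale, which is our guess at the construction. Function names are hypothetical.

```python
def contrast_gain(U, rois, background):
    """cg_i(t) in (18): average over ROIs of the ROI mean, minus the background mean."""
    roi_term = sum(sum(U[i][j] for i, j in r) / len(r) for r in rois) / len(rois)
    bg_term = sum(U[i][j] for i, j in background) / len(background)
    return roi_term - bg_term


def contrast_gain_variance(gains):
    """Var_cg(t) in (18): (1/N) sum_i (cg_i - mean cg)^2 over the N noise realizations."""
    n = len(gains)
    m = sum(gains) / n
    return sum((c - m) ** 2 for c in gains) / n


def contrast_noise_point(realizations, rois, background):
    """One (Var_cg, mean cg) point of a contrast-noise curve for a single scale t
    (using the mean over realizations as the plotted gain is an assumption)."""
    gains = [contrast_gain(U, rois, background) for U in realizations]
    return contrast_gain_variance(gains), sum(gains) / len(gains)
```

For example, two 3×3 realizations with ROI values 5 and 7 over a background of 1 give gains 4 and 6. The resulting curve point is then $(\mathrm{Var}_{cg}, cg) = (1.0, 5.0)$.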
M. Alrefaya et al.

Table 1. Denoising quality measures vs. filtering approaches

Measure          f         PCCM      ePCCM     PSS       TV-C      TV-N
PSNR(t_opt)    -17.510    -4.1000   -4.1000   -3.9900   -5.7200   -5.1100
NV(t_opt)        7.5100    1.6000    1.6000    1.5500    1.9300    1.8000
C_mρ(t_opt)      0.8500    0.9915    0.9916    0.9917    0.9876    0.9890
Fig. 2. Enhanced sinograms and reconstructed PET images, panels (a) to (f): PSS-based approach (first row), TV-based approach (second row).

[Fig. 3 plot: contrast gain (vertical axis, 0.8 to 1.5) versus variance (horizontal axis, ×10⁻³, 1.5 to 4.5) for PCCM, ePCCM, PSS, TV-N and TV-C, each combined with FBP and with OSEM reconstruction.]

Fig. 3. The contrast-noise curve. The best performance occurs in the area where the contrast gain is high and its variance is low.
A Curvature Motion Filter for PET
All investigated methods perform well. However, the proposed PSS yields the best performance on the given data set.
5 Conclusions
Experiments show that combining the probabilistic diffusivity function with curvature motion diffusion produces a powerful nonlinear filtering method that is appropriate for PET sinograms. It preserves the boundaries of curved shape features and smooths the regions of interest as well as the other regions. Our findings show that the PCCM method smooths the PET images and keeps the boundaries of the important features, although weak edges vanish in some cases. The ePCCM method overcomes this problem, and the enhancing term in the filter gives better contrast recovery in the ROIs. This filter yields a well-smoothed image, preserves edges, and combines the advantages of curvature motion diffusion and the shock filter. The PSS approach deals better with weak and discontinuous edges, which are common in PET images. The probabilistic diffusion function has proven to be an effective and suitable tool for controlling the diffusion process in the proposed scheme. As the contrast-noise curves show, it is well able to detect and enhance the edges of important features in highly noisy sinogram images. Moreover, the proposed diffusivity function has no free parameters to tune. All parameters are image-based and estimated automatically, and in our experiments they gave the best results.
References
1. Chambolle, A.: An Algorithm for Total Variation Minimization and Applications. JMIV 20, 89–97 (2004)
2. Chan, T., Li, H., Lysaker, M., Tai, X.C.: Level Set Method for Positron Emission Tomography. International Journal of Biomedical Imaging 2007 (2007)
3. Comtat, C., Kinahan, P.E., Fessler, J., Beyer, T., Townsend, D.W., Defrise, M., Michel, C.: Clinically feasible reconstruction of 3D whole-body PET/CT data using blurred anatomical labels. Phys. Med. Biol. 47, 1–20 (2002)
4. Demirkaya, O.: Diffusion Filtering of Functional Images using the structural information available in Hybrid imaging modalities. In: IEEE Medical Imaging Symposium, Germany (2008)
5. Demirkaya, O.: Post-reconstruction filtering of positron emission tomography whole-body emission images and attenuation maps using nonlinear diffusion filtering. Acad. Radiol. 11, 1105–1114 (2004)
6. Demirkaya, O.: Anisotropic diffusion filtering of PET attenuation data to improve emission images. Physics in Medicine and Biology 47(20), 271–278 (2002)
7. Didas, S., Weickert, J.: Combining Curvature Motion and Edge-Preserving Denoising. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 568–579. Springer, Heidelberg (2007)
8. Hudson, M., Larkin, R.: Accelerated image reconstruction using ordered subsets of projection data. IEEE Trans. Med. Imag. 13(4), 601–609 (1994)
9. Happonen, A.P., Koskinen, M.O.: Experimental Investigation of Angular Stackgram Filtering for Noise Reduction of SPECT Projection Data: Study with Linear and Nonlinear Filters. International Journal of Biomedical Imaging 2007 (2007)
10. Jonsson, E., Huang, S.C., Chan, T.: Total Variation Regularization in Positron Emission Tomography. UCLA, Tech. Rep. no. 48 (1998)
11. Kak, C.A., Slaney, M.: Principles of Computerized Tomographic Imaging. IEEE Press, Los Alamitos (1999)
12. Nesterov, Y.: Smooth minimization of non-smooth functions. Mathematical Programming, Series A 103, 127–152 (2005)
13. Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: algorithms based on Hamilton-Jacobi formulations. Journal of Computational Physics 79(1), 12–49 (1988)
14. Padfield, D.R., Manjeshwar, R.: Adaptive conductance filtering for spatially varying noise in PET images. Progress in Biomedical Optics and Imaging 7(3), no. 30 (2006)
15. Perona, P., Malik, J.: Scale space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 12, 629–639 (1990)
16. Pizurica, A., Vanhamel, I., Sahli, H., Philips, W., Katartzis, A.: A Bayesian formulation of edge-stopping functions in non-linear diffusion. IEEE Signal Processing Letters 13(8), 501–504 (2006)
17. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
18. Rudin, L.I., Osher, S.: Feature-oriented image enhancement with shock filters. Technical Report, Department of Computer Science, California Institute of Technology (1989)
19. Sapiro, G.: Geometric partial differential equations and image analysis. Cambridge University Press, Cambridge (2001)
20. Sumengen, B., Manjunath, B.S.: Edgeflow-driven Variational Image Segmentation: Theory and Performance Evaluation. Technical Report, Department of Electrical and Computer Engineering, University of California, Santa Barbara (2006)
21. Turkheimer, F.E., Boussion, N., Anderson, A.N., Pavese, N., Piccini, P., Visvikis, D.: PET Image Denoising Using a Synergistic Multiresolution Analysis of Structural (MRI/CT) and Functional Datasets. The Journal of Nuclear Medicine 49, 657–666 (2008)
22. Vanhamel, I., Mihai, C., Sahli, H., Katartzis, A., Pratikakis, I.: Scale Selection for Compact Scale-Space Representation of Vector-Valued Images. International Journal of Computer Vision 4485 (2008)
23. Vanhamel, I.: Vector valued nonlinear diffusion and its application to image segmentation. Ph.D. Thesis, Vrije Universiteit Brussel, Faculty of Engineering Sciences, Electronics and Informatics (ETRO) (2006)
24. Wang, Y., Zhou, H.: Total Variation Wavelet-Based Medical Image Denoising. International Journal of Biomedical Imaging 2006 (2006)
25. Wang, W.: Anisotropic Diffusion Filtering for Reconstruction of Poisson Noisy Sinograms. Journal of Communication and Computer 2(11), 16–23 (2005)
26. Weickert, J.: Anisotropic diffusion in image processing. ECMI Series. Teubner-Verlag, Stuttgart (1998)
27. Weiss, P., Aubert, G., Blanc-Féraud, L.: Efficient schemes for total variation minimization under constraints in image processing. Technical Report 6260, INRIA (2007)
28. Zhu, H., Shu, H., Zhou, J., Bao, X., Luo, L.: Bayesian algorithms for PET image reconstruction with mean curvature and Gauss curvature diffusion regularizations. Computers in Biology and Medicine 37(6), 793–804 (2007)
29. Zhu, H., Shu, H., Zhou, J., Toumoulin, C., Luo, L.: Image reconstruction for positron emission tomography using fuzzy nonlinear anisotropic diffusion penalty. Med. Biol. Eng. Comput. 44(11), 983–997 (2006)
Finsler Geometry on Higher Order Tensor Fields and Applications to High Angular Resolution Diffusion Imaging
Laura Astola and Luc Florack
Department of Mathematics and Computer Science, Eindhoven University of Technology, PO Box 513, NL-5600 MB Eindhoven, The Netherlands
Abstract. We study three-dimensional volumes of higher order tensors using Finsler geometry. The application considered here lies in medical image analysis, specifically High Angular Resolution Diffusion Imaging (HARDI) [1] of the brain. We want to find robust ways to reveal the architecture of the neural fibers in brain white matter. In Diffusion Tensor Imaging (DTI), the diffusion of water is modeled with a symmetric positive definite second order tensor. This rests on the assumption that there exists one dominant direction of fibers restricting the thermal motion of water molecules, and it leads naturally to a Riemannian framework. HARDI can potentially overcome the shortcomings of DTI by allowing multiple relevant directions, but this invalidates the Riemannian approach. Finsler geometry instead provides the natural geometric generalization for multi-fiber analysis. In this paper we provide the exact criterion for deciding whether a field of spherical functions has a Finsler structure. We also present a fiber tracking method in the Finsler setting. Our model also incorporates a scale parameter, which is beneficial in view of the noisy nature of the data. We demonstrate our methods on analytic as well as real HARDI data.
1 Introduction
High Angular Resolution Diffusion Imaging (HARDI) is a non-invasive medical imaging modality that measures the attenuation of the directional MRI (Magnetic Resonance Imaging) signal due to the diffusion of water molecules. Diffusion-weighted measurements are taken in several directions, typically ranging from 50 to 130 (equidistant) angular directions. It is assumed that this diffusion of water molecules reveals relevant information about the underlying tissue architecture. The so-called apparent diffusion coefficient, D(g), is computed from the Stejskal-Tanner [2] formula

$$\frac{S(g)}{S_0} = \exp(-b\,D(g)), \tag{1}$$
The Netherlands Organisation for Scientific Research (NWO) is gratefully acknowledged for financial support.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 224–234, 2009. © Springer-Verlag Berlin Heidelberg 2009
Finsler Geometry on HOT Fields
where S(g) is the signal associated with gradient direction g, S_0 the signal obtained when no diffusion gradient is applied, and b a parameter associated with the imaging protocol. In the Diffusion Tensor Imaging framework, (1) is interpreted as

$$\frac{S(g)}{S_0} = \exp(-b\,g^T D\,g), \tag{2}$$

with the 3 × 3 two-tensor D describing the probability of directional diffusivity at each voxel. A natural way to do geometric analysis on the image is to use the inverse of the diffusion tensor D as the Riemannian metric tensor [3]. This approach has been exploited to some extent in the DTI literature [4], [5], [6], [7]. Since HARDI data typically contain more directional measurements than traditional DTI data, we also study them as a metric space. However, we use a more refined model of directional information than a local, position-dependent inner product (i.e. a Riemannian metric) can account for. Higher order tensor representations [8], [9], [10], [11] of HARDI data are well suited to differential geometric methods. We note that Finsler geometry has already been introduced in the HARDI setting. In the work of Melonakos et al. [12] the homogeneity condition is enforced by normalizing the parameter vectors, whereas we take a different approach, using higher order monomial tensors and an ODE-based fiber tracking method. This paper is organized as follows. In Section 2 we give a very short introduction to Finsler geometry. In Section 3 we show that HARDI measurements can indeed be modeled with a Finsler structure and give the specific condition that ensures this. In Section 4 we discuss how to switch back and forth between iterative polynomial tensor fitting, which allows Laplace-Beltrami smoothing, and monomial tensor fitting, which is convenient for constructing a Finsler norm. In Section 5 we show results of fiber tracking based on the local Finsler metric, on an analytic example as well as on real HARDI data from a rat brain scan. In the appendix we give the details of the construction of the strong convexity criterion.
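After taking logarithms, Eq. (2) is linear in the six independent entries of the symmetric tensor D: $-\ln(S(g)/S_0)/b = g^T D g$. D can therefore be estimated voxel-wise by linear least squares, and its inverse then serves as the Riemannian metric. The sketch below is a minimal NumPy illustration under explicit assumptions: noise-free synthetic signals, a hypothetical b-value of 1000, 30 random unit gradient directions and a made-up "true" tensor. None of these values come from the paper.

```python
import numpy as np


def design_matrix(gradients):
    """Rows [gx^2, gy^2, gz^2, 2gxgy, 2gxgz, 2gygz], so that row . d = g^T D g."""
    g = np.asarray(gradients, dtype=float)
    gx, gy, gz = g[:, 0], g[:, 1], g[:, 2]
    return np.column_stack([gx**2, gy**2, gz**2,
                            2 * gx * gy, 2 * gx * gz, 2 * gy * gz])


def fit_diffusion_tensor(signals, s0, gradients, b):
    """Least-squares estimate of the symmetric 3x3 tensor D in Eq. (2)."""
    # Apparent diffusion coefficient from Eq. (1): D(g) = -ln(S/S0) / b.
    adc = -np.log(np.asarray(signals, dtype=float) / s0) / b
    d, *_ = np.linalg.lstsq(design_matrix(gradients), adc, rcond=None)
    dxx, dyy, dzz, dxy, dxz, dyz = d
    return np.array([[dxx, dxy, dxz],
                     [dxy, dyy, dyz],
                     [dxz, dyz, dzz]])


# Synthetic, noise-free example (all values hypothetical).
rng = np.random.default_rng(0)
gradients = rng.normal(size=(30, 3))
gradients /= np.linalg.norm(gradients, axis=1, keepdims=True)
D_true = np.array([[1.7e-3, 0.1e-3, 0.0],
                   [0.1e-3, 0.3e-3, 0.0],
                   [0.0, 0.0, 0.3e-3]])
b, s0 = 1000.0, 1.0
signals = s0 * np.exp(-b * np.einsum('ni,ij,nj->n', gradients, D_true, gradients))

D_fit = fit_diffusion_tensor(signals, s0, gradients, b)
riemann_metric = np.linalg.inv(D_fit)  # inverse diffusion tensor as metric, cf. [3]
```

With noisy data the same least-squares step applies, but the fitted D is then no longer guaranteed to be positive definite. This is one reason the higher order fits discussed below need their own positivity and convexity checks.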
2 Finsler Geometry
In a perfectly homogeneous and isotropic medium, geometry is Euclidean and shortest paths are straight lines. In an inhomogeneous space, geometry is Riemannian and shortest paths are geodesics. If a medium is not only inhomogeneous but also anisotropic¹, i.e. has an innate directional structure, the appropriate geometry is Finslerian [13], [14], and the shortest paths are correspondingly Finsler geodesics. As a consequence, the metric tensor depends on both position and direction. This is also a natural model for high angular resolution diffusion images.

¹ We call a medium isotropic if it is endowed with a direction-independent inner product, or Riemannian metric. In the literature such a medium is also often referred to as anisotropic, owing to the directional bias of the metric itself.
L. Astola and L. Florack
Definition 1. We denote the tangent bundle with the zero section removed, {(x, y) ∈ TM : y ≠ 0}, by TM \ {0}. A Finsler norm is a function F : TM → [0, ∞) that satisfies each of the following criteria:
1. Differentiability: F is C^∞ on TM \ {0}.
2. Homogeneity: F(x, λy) = λF(x, y) for all λ > 0.
3. Strong convexity: the Hessian matrix with components
$$g_{ij}(x,y) = \frac{1}{2}\,\frac{\partial^2 F^2(x,y)}{\partial y^i\,\partial y^j} \tag{3}$$
is positive definite at every point (x, y) of TM \ {0}.
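Criterion 3 can be checked numerically for any candidate norm by approximating the Hessian (3) of F²/2 with central differences and inspecting its eigenvalues. The sketch below uses finite differences rather than symbolic derivatives, and the step size h is an arbitrary choice. It tests two norms: a Riemannian norm $\sqrt{y^T A y}$, for which g must equal A, and the quartic 2-D norm of Eq. (12) in Section 5.1. The matrix A and the helper names are hypothetical.

```python
import numpy as np


def finsler_metric(F, y, h=1e-4):
    """Finite-difference approximation of g_ij(y) = 1/2 d^2(F^2)/dy^i dy^j, Eq. (3)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    basis = h * np.eye(n)

    def energy(v):
        return 0.5 * F(v) ** 2

    g = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = basis[i], basis[j]
            # Central difference for the mixed second derivative of F^2 / 2.
            g[i, j] = (energy(y + ei + ej) - energy(y + ei - ej)
                       - energy(y - ei + ej) + energy(y - ei - ej)) / (4 * h * h)
    return 0.5 * (g + g.T)  # remove the tiny asymmetry caused by rounding


def is_strongly_convex_at(F, y):
    """Criterion 3 of Definition 1, checked at a single direction y (fixed x)."""
    return bool(np.linalg.eigvalsh(finsler_metric(F, y)).min() > 0)


# A Riemannian norm F(y) = sqrt(y^T A y) must reproduce g = A for every y != 0.
A = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 3.0]])


def riemann_norm(v):
    return np.sqrt(v @ A @ v)


# The quartic 2-D norm of Eq. (12), written in Cartesian coordinates.
def quartic_norm(v):
    return (5 * v[0]**4 + 2 * v[0]**2 * v[1]**2 + 5 * v[1]**4) ** 0.25


y = np.array([1.0, 2.0, 3.0]) / np.sqrt(14.0)
g_riemann = finsler_metric(riemann_norm, y)
phi = 0.3
g_quartic = finsler_metric(quartic_norm, [np.cos(phi), np.sin(phi)])
```

For the quartic norm, the numerical metric can be compared term by term with the closed form (14), which gives an independent check of both.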
3 Finsler Norm on HARDI Higher Order Tensor Fields
We want to show that higher order tensors, such as those fitted to HARDI data, do define a Finsler norm, which can be used in the analysis of these data. Our point of departure is a given orientation distribution function (ODF). If normalized, the ODF is a probability density function on the sphere, and it can be computed from the data using one of the methods described in the literature [15], [16], [17], [18], [19]. It models the probability that a given direction corresponds to the direction of a fiber. We use the heuristic that a high probability of finding a fiber in direction y corresponds both to a larger diffusivity and to a shorter travel time from the diffusing particle's point of view. Just as in the Riemannian framework, we can take our metric tensor to be the inverse of a local (y-dependent) two-tensor. We use the Einstein summation convention $a_i b^i = \sum_i a_i b^i$ and put

$$\hat y = (\hat y^1, \hat y^2, \hat y^3) = (\sin\theta\cos\varphi,\ \sin\theta\sin\varphi,\ \cos\theta), \tag{4}$$

so that $\hat y$ denotes a unit vector, while $y = \|y\|\,\hat y$ is a general vector in R³. We denote by D the higher order spherical tensor (a homogeneous polynomial restricted to the sphere) approximating the ODF. As an example, we show how a field of sixth order tensors D(x) defines a Finsler norm; this extends directly to all even order tensors. We put

$$F(x,y) = \left(D_{ijklmn}(x)\,y^i y^j y^k y^l y^m y^n\right)^{1/6}. \tag{5}$$

In the following, we verify the defining criteria stated in Definition 1.
1. Differentiability: The tensor field D(x) is constructed by fitting a tensor, by least squares, to the set of angular samples at each voxel. For a fixed angle, the data are made continuous in x by linear interpolation between the sample points, and differentiable w.r.t. x by using Gaussian derivatives. Therefore the tensor field itself is differentiable in x, and because D is always positive, differentiability of F w.r.t. x follows. The differentiability of F in y is obvious from Eq. (5).
Fig. 1. Left: A fourth order spherical harmonic (or tensor) representing the (not convexified) norm function, and 3 ellipsoids illustrating the metric tensors corresponding to the 3 vectors of the same color. Right: Similarly, a sixth order spherical harmonic function with 3 metric tensors.
2. Homogeneity: Indeed, for any α ∈ R⁺, x ∈ M, v ∈ T_xM:
$$F(x,\alpha v) = \left(D_{ijklmn}(x)\,\alpha v^i\,\alpha v^j\,\alpha v^k\,\alpha v^l\,\alpha v^m\,\alpha v^n\right)^{1/6} = \alpha F(x,v). \tag{6}$$
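3. Strong convexity: We now state a strong convexity criterion for a general Finsler norm in R³, by analogy with the R²-criterion of Bao et al. [13]. The derivation of the condition is given in the appendix; here we merely state the result. We consider the so-called indicatrix of the norm function F at any fixed x, which is the set {g | g : (θ, ϕ) → R³, F(g) = 1}. In our case the indicatrix is the ODF, as follows from the homogeneity condition 2 in Definition 1:
$$F(\hat y(\theta,\varphi)) = \frac{1}{\mathrm{ODF}(\theta,\varphi)} \;\Longrightarrow\; F\big(\mathrm{ODF}(\theta,\varphi)\,\hat y(\theta,\varphi)\big) = 1.$$
We denote $\dot g_\theta := \partial g/\partial\theta$, $\ddot g_\theta := \partial^2 g/\partial\theta^2$, and similarly for ϕ. We define the following three matrices:
$$m = \begin{pmatrix} g^1 & g^2 & g^3\\ \dot g_\varphi^1 & \dot g_\varphi^2 & \dot g_\varphi^3\\ \dot g_\theta^1 & \dot g_\theta^2 & \dot g_\theta^3 \end{pmatrix},\quad
m_\theta = \begin{pmatrix} \ddot g_\theta^1 & \ddot g_\theta^2 & \ddot g_\theta^3\\ \dot g_\theta^1 & \dot g_\theta^2 & \dot g_\theta^3\\ \dot g_\varphi^1 & \dot g_\varphi^2 & \dot g_\varphi^3 \end{pmatrix},\quad
m_\varphi = \begin{pmatrix} \ddot g_\varphi^1 & \ddot g_\varphi^2 & \ddot g_\varphi^3\\ \dot g_\theta^1 & \dot g_\theta^2 & \dot g_\theta^3\\ \dot g_\varphi^1 & \dot g_\varphi^2 & \dot g_\varphi^3 \end{pmatrix}. \tag{7}$$
The last two rows of m are taken in the order $\dot g_\varphi$, $\dot g_\theta$, the reverse of the order in $m_\theta$ and $m_\varphi$. This ordering makes (8) consistent with the solution (27) of the linear system in the appendix. Strong convexity then requires
$$\frac{\det(m_\theta)}{\det(m)} > 0 \quad\text{and}\quad \frac{\det(m_\varphi)}{\det(m)} > \frac{(g_{ij}\,\dot y_\theta^i\,\dot y_\varphi^j)^2}{g_{ij}\,\dot y_\theta^i\,\dot y_\theta^j}. \tag{8}$$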
Since we use linear interpolation between tensors, we only need to check the condition at the original data points. This condition was always met in our ODF data, and we expect it to hold quite generally. The goal of this section was to define a Finsler structure, and in particular a Finsler metric tensor $g_{ij}(x, y)$, corresponding to a given tensorial ODF. Indeed, if the ODF is a symmetric tensor of order two, this metric tensor is equivalent to the Riemannian metric tensor. In our Finsler approach, instead of one metric tensor per voxel we obtain a bundle of metric tensors at every x; see Fig. 1 for an illustration.
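Evaluating Eq. (5) at one voxel amounts to a full contraction of the sixth order tensor with six copies of y, followed by a sixth root. The NumPy sketch below does exactly that and checks the homogeneity property (6). The tensor D is a hypothetical construction, not a fit to data: two fiber directions plus a small isotropic term, so that $D(y) = \sum_a w_a (a\cdot y)^6 + c\,|y|^6$ is positive for every y ≠ 0.

```python
import functools
import itertools
import numpy as np


def symmetrize(t):
    """Average a tensor over all permutations of its indices."""
    perms = list(itertools.permutations(range(t.ndim)))
    return sum(np.transpose(t, p) for p in perms) / len(perms)


def outer_power(a, order):
    """Repeated outer product a (x) a (x) ... (x) a with `order` factors."""
    return functools.reduce(np.multiply.outer, [np.asarray(a, dtype=float)] * order)


def sixth_order_norm(D, y):
    """F(x, y) of Eq. (5) at one voxel: (D_ijklmn y^i y^j y^k y^l y^m y^n)^(1/6)."""
    value = np.einsum('ijklmn,i,j,k,l,m,n->', D, y, y, y, y, y, y)
    return value ** (1.0 / 6.0)


# Hypothetical voxel: two crossing fiber directions plus an isotropic part.
fibers = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
weights = [1.0, 0.8]
c_iso = 0.1
D = sum(w * outer_power(a, 6) for w, a in zip(weights, fibers))
# delta_ij delta_kl delta_mn contracted with y six times gives |y|^6.
D = D + c_iso * symmetrize(outer_power(np.eye(3), 3))

y = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)
F_y = sixth_order_norm(D, y)
F_2y = sixth_order_norm(D, 2.0 * y)   # Eq. (6): should equal 2 * F_y
```

The isotropic term keeps D(y) strictly positive, so the sixth root in (5) is always well defined. Homogeneity holds for any D, because the contraction is a degree-6 form in y.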
4 Transforming a Polynomial Tensor to a Monomial Tensor
Assume we wish to apply Laplace-Beltrami smoothing to our spherical data, which yields a field of spherical functions at any desired scale, and that we wish to use a tensorial representation of the data instead of spherical harmonics. As shown in [10], this smoothing is easy to carry out using iterative polynomial tensor fitting. The point here is that for Finsler analysis we would rather work with a tensor representation of monomial form
$$D(y) = D_{i_1\cdots i_n}\,y^{i_1}\cdots y^{i_n} \tag{9}$$
than with the equivalent polynomial expression
$$\tilde D(y) = \sum_{k=0}^{n} \tilde D_{i_1\cdots i_k}\,y^{i_1}\cdots y^{i_k}, \tag{10}$$
while still exploiting the convenient (co-domain) scale space representation of the latter:
$$\tilde D(y,\tau) = \sum_{k=0}^{n} e^{-\tau k(k+1)}\,\tilde D_{i_1\cdots i_k}\,y^{i_1}\cdots y^{i_k}. \tag{11}$$
This poses no problem, since we can easily transform the polynomial expression into a monomial one. Because our polynomials are restricted to the sphere (Eq. (4)), we may expand a lower order tensor to a sparse higher order one and symmetrize it. Conversely, we can always transform the monomial expression into a polynomial sum of irreducible monomial tensors using Clebsch projection [20].
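The conversion from (10)/(11) to the monomial form (9) can be sketched in a few lines. On the unit sphere $\delta_{ij} y^i y^j = 1$, so taking the tensor product of an order-k coefficient tensor with δ raises its order by two without changing its values there. Symmetrizing the result then gives a monomial tensor. The sketch below is a minimal NumPy illustration, assuming only even orders k (as for antipodally symmetric ODFs) and random stand-in coefficient tensors. The damping factor $e^{-\tau k(k+1)}$ corresponds to Laplace-Beltrami smoothing only when the k-th term is a degree-k spherical harmonic, as in [10]; here it only demonstrates the bookkeeping.

```python
import itertools
import math
import numpy as np


def symmetrize(t):
    """Average a tensor over all permutations of its indices."""
    perms = list(itertools.permutations(range(t.ndim)))
    return sum(np.transpose(t, p) for p in perms) / len(perms)


def lift_to_order(t, n, dim=3):
    """Expand an order-k tensor to a symmetric order-n tensor (n - k even).

    On the unit sphere y . y = 1, so multiplying by delta_ij y^i y^j leaves the
    polynomial values unchanged while raising the tensor order by two.
    """
    t = np.asarray(t, dtype=float)
    k = t.ndim
    if n < k or (n - k) % 2:
        raise ValueError("n - k must be a non-negative even number")
    out = t
    for _ in range((n - k) // 2):
        out = np.multiply.outer(out, np.eye(dim))
    return symmetrize(out)


def monomial_at_scale(terms, n, tau):
    """Order-n monomial tensor equal, on the sphere, to D~(y, tau) of Eq. (11).

    `terms` maps an even order k to the coefficient tensor D~_{i1...ik}.
    """
    total = np.zeros((3,) * n)
    for k, t in terms.items():
        total += math.exp(-tau * k * (k + 1)) * lift_to_order(t, n)
    return total


def evaluate(t, y):
    """Contract every index of t with y: t_{i1...ik} y^{i1} ... y^{ik}."""
    value = np.asarray(t, dtype=float)
    for _ in range(value.ndim):
        value = value @ y
    return float(value)


rng = np.random.default_rng(1)
terms = {0: 1.0, 2: rng.normal(size=(3, 3)), 4: rng.normal(size=(3, 3, 3, 3))}
tau = 0.05
D4 = monomial_at_scale(terms, 4, tau)

y = rng.normal(size=3)
y /= np.linalg.norm(y)
polynomial_value = sum(math.exp(-tau * k * (k + 1)) * evaluate(t, y)
                       for k, t in terms.items())
monomial_value = evaluate(D4, y)   # agrees with polynomial_value on the sphere
```

Symmetrizing over all n! index permutations is only practical for small orders (720 permutations for n = 6). A production version would work directly with the independent components of a symmetric tensor.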
5 Fiber Tracking in HARDI Data Using Finsler Geometry

In the DTI setting, the most straightforward way of tracking fibers is to follow the principal eigenvector, corresponding to the largest eigenvalue of the diffusion tensor, until a stopping criterion is met. This method cannot reveal crossings and provides at most a single direction per voxel. If we instead compute the shortest paths according to the diffusion-induced Riemannian metric tensor, we can expect these to be candidates for real fibers [5]. Of course, most of the shortest paths (geodesics) do not represent actual fibers. We should therefore extract the potential neural fibers from arbitrary geodesics on the basis of their connectivity [6]. We show results of computing well-connected geodesics on analytic data as well as on real rat brain data.
5.1 Analytic Tensor Field
We treat an analytic norm field in R², but the situation extends directly to R³. As a convex norm function at each spatial position we take
$$F(\varphi) = (\cos 4\varphi + 4)^{1/4} = \left(5\cos^4\varphi + 2\cos^2\varphi\,\sin^2\varphi + 5\sin^4\varphi\right)^{1/4}. \tag{12}$$
This is an example of a fourth order tensor on unit vectors. Such a tensor field could represent an infinitely dense field of orthogonally crossing fibers. Since F has no x-dependence, the geodesic coefficients vanish and the geodesics coincide with the Euclidean geodesics γ(t) = (t cos ϕ, t sin ϕ), i.e. straight lines. However, the so-called connectivity of a geodesic [6], [21] is relatively large only where the directional norm function is correspondingly small. In the Finsler setting the connectivity measure m(γ) is
$$m(\gamma) = \frac{\int \sqrt{\eta_{ij}\,\dot\gamma^i\dot\gamma^j}\,dt}{\int \sqrt{g_{ij}(\gamma,\dot\gamma)\,\dot\gamma^i\dot\gamma^j}\,dt}, \tag{13}$$
where $\eta_{ij}(\gamma)$ is the covariant Euclidean metric tensor, which in Cartesian coordinates reduces to the constant identity matrix, $\dot\gamma$ is the tangent to the curve γ, and $g_{ij}(\gamma,\dot\gamma)$ is the Finsler metric tensor (which depends not only on the position on the curve but also on its tangent). For illustration, we compute the metric tensors explicitly in Cartesian coordinates:
$$g_{ij} = \frac{1}{\left(5\cos^4\varphi + 2\cos^2\varphi\sin^2\varphi + 5\sin^4\varphi\right)^{3/2}}\begin{pmatrix} g_{11} & g_{12}\\ g_{21} & g_{22}\end{pmatrix}, \tag{14}$$
where
$$\begin{aligned}
g_{11} &= 5\left(5\cos^6\varphi + 3\cos^4\varphi\sin^2\varphi + 15\cos^2\varphi\sin^4\varphi + \sin^6\varphi\right),\\
g_{12} &= g_{21} = -48\cos^3\varphi\sin^3\varphi,\\
g_{22} &= 5\left(\cos^6\varphi + 15\cos^4\varphi\sin^2\varphi + 3\cos^2\varphi\sin^4\varphi + 5\sin^6\varphi\right).
\end{aligned}$$
The strong convexity criterion in R² [13] on the indicatrix g(ϕ),
$$\frac{\ddot g^1\dot g^2 - \dot g^1\ddot g^2}{\dot g^1 g^2 - g^1\dot g^2} > 0,$$
is satisfied for every ϕ for the metric (14), since
$$\frac{\ddot g^1\dot g^2 - \dot g^1\ddot g^2}{\dot g^1 g^2 - g^1\dot g^2} = \frac{13 - 8\cos 4\varphi}{(4 + \cos 4\varphi)^2} > 0. \tag{15}$$
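The closed form (15) can be checked numerically by differentiating the indicatrix $g(\varphi) = (\cos\varphi, \sin\varphi)/F(\varphi)$ with central differences. The sketch below is a numerical check only; the step size and the sampling grid are arbitrary choices.

```python
import numpy as np


def indicatrix(phi):
    """Unit level set g(phi) = (cos phi, sin phi) / F(phi) of the norm in Eq. (12)."""
    r = (4.0 + np.cos(4.0 * phi)) ** -0.25
    return np.array([r * np.cos(phi), r * np.sin(phi)])


def convexity_ratio(phi, h=1e-4):
    """Left-hand side of the R^2 criterion, with derivatives by central differences."""
    g = indicatrix(phi)
    g_plus, g_minus = indicatrix(phi + h), indicatrix(phi - h)
    gd = (g_plus - g_minus) / (2 * h)
    gdd = (g_plus - 2 * g + g_minus) / h**2
    return (gdd[0] * gd[1] - gd[0] * gdd[1]) / (gd[0] * g[1] - g[0] * gd[1])


phis = np.linspace(0.0, 2 * np.pi, 181)
numeric = np.array([convexity_ratio(p) for p in phis])
closed_form = (13 - 8 * np.cos(4 * phis)) / (4 + np.cos(4 * phis)) ** 2
```

The ratio stays between 0.2 (at cos 4ϕ = 1) and 21/9 (at cos 4ϕ = −1), so the indicatrix is strictly convex in every direction.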
The connectivity measure for a (Euclidean) geodesic γ can be computed analytically:
$$m(\gamma) = \frac{\int dt}{\int (4 + \cos 4\varphi)^{1/4}\,dt}. \tag{16}$$
Since ϕ is constant along a straight line, this reduces to $m(\gamma) = (4 + \cos 4\varphi)^{-1/4}$. The maximal connectivities are therefore attained in the directions {π/4, 3π/4, 5π/4, 7π/4}, as expected. See Fig. 2 for an illustration. On such a norm field, the Riemannian (DTI) framework would result in Euclidean geodesics and constant connectivity over all geodesics, and would thus reveal no information at all about the angular heterogeneity.
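The connectivity (13) can also be evaluated on discretized curves, which is what one needs once geodesics are no longer straight lines. Because F is homogeneous of degree one, $\sqrt{g_{ij}(v)\,v^i v^j} = F(v)$, so the Finsler length of each polyline segment is F applied to the segment vector. The sketch below samples straight lines in 360 directions for the norm (12). The sampling choices are arbitrary.

```python
import numpy as np


def norm_eq12(v):
    """Finsler norm of Eq. (12) in Cartesian form; homogeneous of degree 1 in v."""
    return (5 * v[..., 0]**4 + 2 * v[..., 0]**2 * v[..., 1]**2 + 5 * v[..., 1]**4) ** 0.25


def connectivity(points):
    """Discrete Eq. (13): Euclidean length divided by Finsler length of a polyline."""
    segments = np.diff(points, axis=0)
    return np.linalg.norm(segments, axis=1).sum() / norm_eq12(segments).sum()


t = np.linspace(0.0, 10.0, 101)
angles = np.linspace(0.0, 2 * np.pi, 360, endpoint=False)
m = np.array([connectivity(np.column_stack([t * np.cos(a), t * np.sin(a)]))
              for a in angles])
best = np.sort(angles[np.argsort(m)[-4:]])   # the four most connected directions
```

The four best directions are the diagonals π/4, 3π/4, 5π/4 and 7π/4, and m matches $(4 + \cos 4\varphi)^{-1/4}$ from (16).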
Fig. 2. Left: A field of fourth order spherical harmonics as in the norm function of Eq. (12), representing dense crossings, together with some well-connected geodesics (red). Right: 200 equiangular metric tensors of the same norm function, and an ellipse (light blue) corresponding to the metric in direction ϕ = π/4.
Fig. 3. Left: Finsler geodesics emanating from a voxel, with the most connective ones in red. Right: Fibers through the same neighborhood obtained with traditional DTI principal eigenvector tracking.
5.2 Real Rat Brain Data
The subthalamic nucleus is a small area in the brain that is involved in the physiopathology of Parkinson's disease [22]. We computed the Finsler geodesics and their connectivities, starting from several central voxels in the subthalamic nucleus. These voxels were located by comparison with an atlas of the rat brain [23]. We tracked Finsler geodesics using the standard geodesic equation (ODE formulation) [14, p. 78] and a second order Taylor approximation. The 49 measurement directions served as initial directions, with a step size of 0.2 voxel and 10 steps. We then selected the 30% of all geodesics with the best connectivity. Compared to traditional DTI tracking, we found that one of the main directions with strong connectivity typically coincides with the DTI fibers, but we also found other potential fiber directions. For an illustration, see Fig. 3.
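The tracking-and-selection procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The geodesic coefficients $G^i$ are supplied by a hypothetical `spray` function, because we do not reproduce the formula of [14], and the geodesic equation is taken in the form $\ddot x^i + 2G^i(x,\dot x) = 0$. The position uses a second order Taylor step and the velocity a first order step. For the demo we use the x-independent norm of Eq. (12) in 2-D with 48 initial directions instead of the 49 measured 3-D directions. The step size 0.2, the 10 steps and the 30% selection follow the text.

```python
import numpy as np


def track_geodesic(x0, v0, spray, step=0.2, n_steps=10):
    """Second order Taylor integration of x'' + 2 G(x, x') = 0.

    `spray(x, v)` is a hypothetical user-supplied function returning the geodesic
    coefficients G^i(x, v); for an x-independent norm it is identically zero.
    """
    x, v = np.asarray(x0, dtype=float), np.asarray(v0, dtype=float)
    path = [x.copy()]
    for _ in range(n_steps):
        acc = -2.0 * spray(x, v)                  # x'' from the geodesic equation
        x = x + step * v + 0.5 * step**2 * acc    # second order Taylor step
        v = v + step * acc                        # first order update of x'
        path.append(x.copy())
    return np.array(path)


def select_best_connected(paths, norm, fraction=0.3):
    """Keep the given fraction of paths with the largest discrete connectivity (13)."""
    def connectivity(p):
        seg = np.diff(p, axis=0)
        return np.linalg.norm(seg, axis=1).sum() / sum(norm(s) for s in seg)

    scores = np.array([connectivity(p) for p in paths])
    n_keep = max(1, int(round(fraction * len(paths))))
    keep = np.argsort(scores)[::-1][:n_keep]
    return [paths[i] for i in keep], scores[keep]


def norm_eq12(v):
    return (5 * v[0]**4 + 2 * v[0]**2 * v[1]**2 + 5 * v[1]**4) ** 0.25


def zero_spray(x, v):
    return np.zeros_like(v)


angles = np.linspace(0.0, 2 * np.pi, 48, endpoint=False)
paths = [track_geodesic([0.0, 0.0], [np.cos(a), np.sin(a)], zero_spray)
         for a in angles]
best_paths, best_scores = select_best_connected(paths, norm_eq12)
```

With a vanishing spray the tracked geodesics are straight lines, and the selected 30% cluster around the diagonals, in line with Section 5.1. For real data, `spray` would have to be computed from the interpolated tensor field.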
6 Conclusions and Future Work
We have seen that it is indeed possible to analyze spherical tensor fields using Finsler geometry. Finsler geometry provides new methods for working with the data and has the potential to reveal new information in them. Finsler geodesics and Finsler curvatures are examples of geometric measures that can be applied in HARDI fiber analysis, and they will be the subject of extensive future work.
Acknowledgement. The rat brain data, acquired for a study [24], were kindly provided by Ellen Brunenberg.
References
1. Tuch, D., Reese, T., Wiegell, M., Makris, N., Belliveau, J., van Wedeen, J.: High angular resolution diffusion imaging reveals intravoxel white matter fiber heterogeneity. Magnetic Resonance in Medicine 48(6), 1358–1372 (2002)
2. Stejskal, E., Tanner, J.: Spin diffusion measurements: Spin echoes in the presence of a time-dependent field gradient. The Journal of Chemical Physics 42(1), 288–292 (1965)
3. Cohen de Lara, M.: Geometric and symmetry properties of a nondegenerate diffusion process. The Annals of Probability 23(4), 1557–1604 (1995)
4. O'Donnell, L., Haker, S., Westin, C.F.: New approaches to estimation of white matter connectivity in diffusion tensor MRI: Elliptic PDEs and geodesics in a tensor-warped space. In: Dohi, T., Kikinis, R. (eds.) MICCAI 2002. LNCS, vol. 2488, pp. 459–466. Springer, Heidelberg (2002)
5. Lenglet, C., Deriche, R., Faugeras, O.: Inferring white matter geometry from diffusion tensor MRI: Application to connectivity mapping. In: Pajdla, T., Matas, J.G. (eds.) ECCV 2004. LNCS, vol. 3024, pp. 127–140. Springer, Heidelberg (2004)
6. Astola, L., Florack, L., ter Haar Romeny, B.: Measures for pathway analysis in brain white matter using diffusion tensor images. In: Karssemeijer, N., Lelieveldt, B. (eds.) IPMI 2007. LNCS, vol. 4584, pp. 642–649. Springer, Heidelberg (2007)
7. Astola, L., Florack, L.: Sticky vector fields and other geometric measures on diffusion tensor images. In: MMBIA 2008, IEEE Computer Society Workshop on Mathematical Methods in Biomedical Image Analysis, held in conjunction with CVPR 2008, Anchorage, Alaska, The United States. CVPR, vol. 20, pp. 1–7. Springer, Heidelberg (2008)
8. Özarslan, E., Mareci, T.: Generalized diffusion tensor imaging and analytical relationships between diffusion tensor imaging and high angular resolution diffusion imaging. Magnetic Resonance in Medicine 50, 955–965 (2003)
9. Barmpoutis, A., Jian, B., Vemuri, B., Shepherd, T.: Symmetric positive 4th order tensors and their estimation from diffusion weighted MRI. In: Karssemeijer, N., Lelieveldt, B. (eds.) IPMI 2007. LNCS, vol. 4584, pp. 308–319. Springer, Heidelberg (2007)
10. Florack, L., Balmashnova, E.: Decomposition of high angular resolution diffusion images into a sum of self-similar polynomials on the sphere. In: Proceedings of the Eighteenth International Conference on Computer Graphics and Vision, GraphiCon 2008, Moscow, Russian Federation, June 2008, pp. 26–31 (2008) (invited paper)
11. Florack, L., Balmashnova, E.: Two canonical representations for regularized high angular resolution diffusion imaging. In: MICCAI Workshop on Computational Diffusion MRI, New York, USA, September 10, 2008, pp. 94–105 (2008)
12. Melonakos, J., Pichon, E., Angenent, S., Tannenbaum, A.: Finsler active contours. IEEE Transactions on Pattern Analysis and Machine Intelligence 30(3), 412–423 (2008)
13. Bao, D., Chern, S.S., Shen, Z.: An Introduction to Riemann-Finsler Geometry. Springer, Heidelberg (2000)
14. Shen, Z.: Lectures on Finsler Geometry. World Scientific, Singapore (2001)
15. Tuch, D.: Q-ball imaging. Magnetic Resonance in Medicine 52(4), 577–582 (2002)
16. Jansons, K., Alexander, D.: Persistent angular structure: New insights from diffusion magnetic resonance imaging data. Inverse Problems 19, 1031–1046 (2003)
17. Özarslan, E., Shepherd, T., Vemuri, B., Blackband, S., Mareci, T.: Resolution of complex tissue microarchitecture using the diffusion orientation transform. NeuroImage 31, 1086–1103 (2006)
18. Jian, B., Vemuri, B., Özarslan, E., Carney, P., Mareci, T.: A novel tensor distribution model for the diffusion-weighted MR signal. NeuroImage 37, 164–176 (2007)
19. Descoteaux, M., Angelino, E., Fitzgibbons, S., Deriche, R.: Regularized, fast and robust analytical q-ball imaging. Magnetic Resonance in Medicine 58(3), 497–510 (2006)
20. Müller, C. (ed.): Analysis of Spherical Symmetries in Euclidean Spaces. Applied Mathematical Sciences, vol. 129. Springer, New York (1998)
21. Prados, E., Soatto, S., Lenglet, C., Pons, J.P., Wotawa, N., Deriche, R., Faugeras, O.: Control Theory and Fast Marching Techniques for Brain Connectivity Mapping. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York, USA, vol. 1, pp. 1076–1083. IEEE Computer Society Press, Los Alamitos (2006)
22. Hamani, C., Saint-Cyr, J., Fraser, J., Kaplitt, M., Lozano, A.: The subthalamic nucleus in the context of movement disorders. Brain, a Journal of Neurology 127, 4–20 (2004)
23. Paxinos, G., Watson, C.: The Rat Brain in Stereotaxic Coordinates. Academic Press, San Diego (1998)
24. Brunenberg, E., Prckovska, V., Platel, B., Strijkers, G., ter Haar Romeny, B.M.: Untangling a fiber bundle knot: Preliminary results on STN connectivity using DTI and HARDI on rat brains. In: Proceedings of the 17th Meeting of the International Society for Magnetic Resonance in Medicine (ISMRM), Honolulu, Hawaii (2009)
Appendix

We seek the general condition for
$$g_{ij}(y)\,v^i v^j > 0 \tag{17}$$
to be valid in R³ (= T_xM). From the homogeneity of the norm function F, it is sufficient for this condition to hold on the unit level set of the norm. We consider this level surface, i.e. the set of vectors y for which F(y) = 1, and a parametrization y(θ, ϕ) = (y¹(θ, ϕ), y²(θ, ϕ), y³(θ, ϕ)). In what follows we abbreviate $g_{ij} = g_{ij}(x, y)$. From F(y) = 1 we have
$$g_{ij}\,y^i y^j = 1. \tag{18}$$
We take derivatives of both sides and use a consequence of Euler's theorem for homogeneous functions ([13], p. 5), namely
$$\frac{\partial g_{ij}}{\partial y^k}\,y^k = 0. \tag{19}$$
This gives
$$g_{ij}\,\dot y_\theta^i\,y^j = 0, \qquad g_{ij}\,\dot y_\varphi^i\,y^j = 0, \tag{20}$$
implying $\dot y_\theta \perp_g y$ and $\dot y_\varphi \perp_g y$. Taking derivatives once more, we get
$$g_{ij}\,\ddot y_\theta^i\,y^j = -g_{ij}\,\dot y_\theta^i\,\dot y_\theta^j, \qquad g_{ij}\,\ddot y_\varphi^i\,y^j = -g_{ij}\,\dot y_\varphi^i\,\dot y_\varphi^j, \qquad g_{ij}\,\ddot y_{\theta\varphi}^i\,y^j = -g_{ij}\,\dot y_\theta^i\,\dot y_\varphi^j. \tag{21}$$
We may express an arbitrary vector v as a linear combination of orthogonal basis vectors:
$$v = \alpha\,y + \beta\,\dot y_\theta + \gamma\left(\dot y_\varphi - \frac{\langle\dot y_\varphi,\dot y_\theta\rangle}{\langle\dot y_\theta,\dot y_\theta\rangle}\,\dot y_\theta\right), \tag{22}$$
where ⟨·,·⟩ denotes the inner product defined by $g_{ij}$. Substituting this expression for v into the left-hand side of (17), we obtain
$$g_{ij}v^iv^j = \alpha^2\,g_{ij}y^iy^j + \beta^2\,g_{ij}\dot y_\theta^i\dot y_\theta^j + \gamma^2\left(g_{ij}\dot y_\varphi^i\dot y_\varphi^j - \frac{(g_{ij}\dot y_\theta^i\dot y_\varphi^j)^2}{g_{ij}\dot y_\theta^i\dot y_\theta^j}\right), \tag{23}$$
because the mixed terms vanish due to the orthogonality of the basis vectors. On the other hand, for y on the indicatrix, Euler's theorem on homogeneous functions gives (with $F_{y^i} = \partial F/\partial y^i$)
$$F_{y^i}\,y^i = F(y) = 1. \tag{24}$$
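Differentiating Eq. (24) w.r.t. θ and ϕ, we obtain two equations,
$$F_{y^i}\,\dot y_\theta^i = 0, \tag{25}$$
$$F_{y^i}\,\dot y_\varphi^i = 0, \tag{26}$$
since F is a homogeneous function. The matrices m, m_θ, m_ϕ are as defined in Eq. (7). Solving the system of equations (24), (25) and (26), we get
$$F_{y^1} = \frac{\dot y_\varphi^2\dot y_\theta^3 - \dot y_\varphi^3\dot y_\theta^2}{\det(m)}, \quad F_{y^2} = \frac{\dot y_\varphi^3\dot y_\theta^1 - \dot y_\varphi^1\dot y_\theta^3}{\det(m)}, \quad F_{y^3} = \frac{\dot y_\varphi^1\dot y_\theta^2 - \dot y_\varphi^2\dot y_\theta^1}{\det(m)}. \tag{27}$$
Now, using the equalities $F_{y^i} = g_{ij}\,y^j$, $g_{ij}\,\ddot y_\theta^i\,y^j = F_{y^k}\,\ddot y_\theta^k$ and $g_{ij}\,\ddot y_\varphi^i\,y^j = F_{y^k}\,\ddot y_\varphi^k$, we find
$$-g_{ij}\,\ddot y_\theta^i\,y^j = \frac{\det(m_\theta)}{\det(m)}, \tag{28}$$
$$-g_{ij}\,\ddot y_\varphi^i\,y^j = \frac{\det(m_\varphi)}{\det(m)}. \tag{29}$$
Combining (18), (21) and (23), we obtain
$$g_{ij}v^iv^j = \alpha^2 - \beta^2\,g_{ij}\ddot y_\theta^i y^j - \gamma^2\left(g_{ij}\ddot y_\varphi^i y^j + \frac{(g_{ij}\dot y_\theta^i\dot y_\varphi^j)^2}{g_{ij}\dot y_\theta^i\dot y_\theta^j}\right) > 0 \tag{30}$$
if
$$\frac{\det(m_\theta)}{\det(m)} > 0 \quad\text{and}\quad \frac{\det(m_\varphi)}{\det(m)} > \frac{(g_{ij}\dot y_\theta^i\dot y_\varphi^j)^2}{g_{ij}\dot y_\theta^i\dot y_\theta^j}. \tag{31}$$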
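Equations (24) to (26) form a 3×3 linear system for the gradient $F_y$. Together with $F_{y^i} = g_{ij} y^j$, its solution links the second derivatives of the parametrization to the metric, as in (21), (28) and (29). As a numerical sanity check, the sketch below solves the system directly for the indicatrix of the Euclidean norm, the unit sphere with $g_{ij} = \delta_{ij}$. There the answers are known: $F_y = y$, $g_{ij}\ddot y_\theta^i y^j = -1$ and $g_{ij}\ddot y_\varphi^i y^j = -\sin^2\theta$. The choice of the sphere, the sample point and the finite-difference step are assumptions made for illustration.

```python
import numpy as np


def sphere(theta, phi):
    """Parametrization (4) of the unit sphere, the indicatrix of the Euclidean norm."""
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])


def solve_gradient(param, theta, phi, h=1e-4):
    """Solve (24)-(26) for F_y at y(theta, phi); derivatives by central differences."""
    y = param(theta, phi)
    y_t = (param(theta + h, phi) - param(theta - h, phi)) / (2 * h)
    y_p = (param(theta, phi + h) - param(theta, phi - h)) / (2 * h)
    y_tt = (param(theta + h, phi) - 2 * y + param(theta - h, phi)) / h**2
    y_pp = (param(theta, phi + h) - 2 * y + param(theta, phi - h)) / h**2
    system = np.vstack([y, y_t, y_p])            # one row per equation (24), (25), (26)
    F_y = np.linalg.solve(system, np.array([1.0, 0.0, 0.0]))
    return F_y, y, y_t, y_p, y_tt, y_pp


theta, phi = 0.7, 1.2
F_y, y, y_t, y_p, y_tt, y_pp = solve_gradient(sphere, theta, phi)
# On the indicatrix, g_ij y''^i y^j = F_{y^k} y''^k; cf. Eq. (21) with g_ij = delta_ij.
gyy_theta = F_y @ y_tt      # expected -|y_theta|^2 = -1
gyy_phi = F_y @ y_pp        # expected -|y_phi|^2 = -sin(theta)^2
```

The same routine applies to any parametrized indicatrix, for instance a sampled ODF surface. For such a surface F_y is no longer simply y.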
Bregman-EM-TV Methods with Application to Optical Nanoscopy
Christoph Brune, Alex Sawatzky, and Martin Burger
Westfälische Wilhelms-Universität Münster, Institut für Numerische und Angewandte Mathematik, Einsteinstr. 62, D-48149 Münster, Germany
{christoph.brune,alex.sawatzky,martin.burger}@wwu.de
http://imaging.uni-muenster.de

Abstract. Measurements in nanoscopic imaging suffer from blurring effects caused by different point spread functions (PSF). Some devices even have PSFs that depend locally on phase shifts. Additionally, raw data are affected by Poisson noise resulting from laser sampling and "photon counts" in fluorescence microscopy. In these applications, standard reconstruction methods (EM, filtered backprojection) deliver unsatisfactory and noisy results. Starting from a statistical model in terms of a MAP likelihood estimation, we combine the iterative EM algorithm with TV regularization techniques to make efficient use of a priori information. Typically, TV-based methods deliver reconstructed cartoon images suffering from contrast reduction. We propose an extension to EM-TV, based on Bregman iterations and inverse scale space methods, in order to obtain improved imaging results by simultaneous contrast enhancement. We illustrate our techniques on synthetic and experimental biological data.
1
Introduction
Image reconstruction is a fundamental problem in many fields of applied science, e.g. nanoscopic imaging, medical imaging or astronomy. Fluorescence microscopy, for example, is an important imaging technique for the investigation of biological (live) cells down to the nanoscale. In this case image reconstruction arises in the form of deconvolution problems. Undesirable blurring effects can be ascribed to the diffraction of light. Mathematically, image reconstruction in such applications can often be formulated as the solution of a linear, ill-posed inverse problem. The task consists of computing an estimate of an unknown object from given measurements. Typically these problems deal with Fredholm integral equations of the first kind

$$\bar f = \bar K u, \qquad (1)$$

where $\bar K$ is a compact operator, $\bar f$ the (exact) data and $u$ the desired image. In the case of nanoscopic imaging $\bar K$ is a convolution operator

$$(\bar K u)(x) = (k * u)(x) = \int_\Omega k(x - y)\, u(y)\, dy,$$

where $k$ is a convolution kernel describing the blurring effects created by a nanoscopic apparatus. Determining $u$ by direct inversion of $\bar K$ is not suitable, since (1) is ill-posed. In such cases regularization techniques are needed to produce reasonable reconstructions. A frequently used way to realize regularization techniques is the Bayesian model, whose aim is the computation of an estimate $u$ of the unknown object by maximizing the a-posteriori probability density $p(u|f)$ given measurements $f$. The latter is given by Bayes' formula

$$p(u|f) \propto p(f|u)\, p(u). \qquad (2)$$

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 235–246, 2009. © Springer-Verlag Berlin Heidelberg 2009
This approach is called maximum a-posteriori probability (MAP) estimation. If the measurements $f$ are given, we describe the density $p(u|f)$ as the a-posteriori likelihood function, which depends on $u$ only. The Bayesian approach (2) has the advantage that it allows one to incorporate additional information about $u$ via the prior probability density $p(u)$ into the reconstruction process. The most frequently used prior densities are Gibbs functions

$$p(u) \propto e^{-\alpha R(u)}, \qquad (3)$$
where $\alpha$ is a positive parameter and $R$ a convex energy. Usual models for the probability density $p(f|u)$ in (2) are Gaussian- or Poisson-distributed raw data $f$, i.e.

$$p(f|u) \propto e^{-\|Ku - f\|_2^2 / (2\sigma^2)}, \qquad p(f|u) \propto \prod_i \frac{(Ku)_i^{f_i}}{f_i!}\, e^{-(Ku)_i}, \qquad (4)$$
where $K$ is a semi-discrete operator based on $\bar K$. In the canonical case of additive Gaussian noise (see (4), left) the minimization of the negative log-likelihood function (2) leads to classical Tikhonov regularization [1], based on minimizing a functional of the form

$$\min_{u \ge 0}\; \frac12 \|Ku - f\|_2^2 + \alpha R(u). \qquad (5)$$

The first, so-called data-fidelity term penalizes the deviation from equality in (1), whereas $R$ is a regularization term as in (3). If we choose $K = \mathrm{Id}$ and the total variation (TV) regularization $R(u) := |u|_{BV}$, we obtain the well-known ROF model [2] for image denoising. The additional positivity constraint is necessary in typical applications, as the unknown represents a density image. In nanoscopic imaging the measured data are stochastic and pointwise; more precisely, they are called "photon counts". This property refers to laser scanning techniques in fluorescence microscopy. Consequently, the random variables of the measured data are not Gaussian- but Poisson-distributed (see (4), right), with expected value given by equation (1). Hence a MAP estimation via the negative log-likelihood function (2) leads to the following variational problem [1]:

$$\min_{u \ge 0} \int_\Omega (Ku - f \log Ku)\, d\mu + \alpha R(u). \qquad (6)$$
Up to additive terms independent of $u$, the data-fidelity term is the so-called Kullback-Leibler functional (also known as cross entropy or I-divergence) between the two probability measures $f$ and $Ku$. A particular complication of (6) compared to (5) is the strong nonlinearity in the data-fidelity term and the resulting issues in the computation of minimizers. For the case $K = \mathrm{Id}$, i.e. Poisson noise removal with total variation regularization, we refer to [3]. In the absence of regularization ($\alpha = 0$) the EM algorithm (cf. [4, 5, 6]) has become a standard scheme, which is, however, difficult to generalize to regularized cases. Robust solutions of this problem for appropriate models of $R$ are one of the novelties of this paper. The specific choice of the regularization functional $R$ in (6) determines how a-priori information about the expected solution is incorporated into the reconstruction process. Smooth, in particular quadratic, regularizations have attracted most attention in the past, mainly due to their simplicity in analysis and computation. However, such regularization approaches always lead to blurring of the reconstructions; in particular, they cannot yield reconstructions with sharp edges. Recently, singular regularization energies, in particular those of $\ell^1$- or $L^1$-type, have attracted strong attention. In this work, we introduce an approach which uses total variation (TV) as the regularization functional. TV regularization was derived as a denoising technique in [2] and subsequently generalized to various other imaging tasks. The exact definition of TV [7], used in this paper, is

$$R(u) := |u|_{BV} = \sup_{g \in C_0^\infty(\Omega, \mathbb{R}^d),\; \|g\|_\infty \le 1} \int_\Omega u \, \mathrm{div}\, g, \qquad (7)$$
which formally equals (and does so exactly if $u$ is sufficiently regular) $|u|_{BV} = \int_\Omega |\nabla u|$. The motivation for using TV is the effective suppression of noise and the realization of almost homogeneous regions with sharp edges. These features are attractive for nanoscopic imaging if the goal is to identify object shapes that are separated by sharp edges and shall be analyzed quantitatively. Unfortunately, images reconstructed by methods using TV regularization suffer from a loss of contrast. In this paper, we propose to extend EM-TV by iterative regularization to Bregman-EM-TV, attaining simultaneous contrast enhancement. More precisely, we apply total variation inverse scale space methods by employing the concept of Bregman distance regularization. The latter has been derived in [8] with a detailed analysis for Gaussian-type problems (5) and generalized to time-continuity [9] and $L^p$-norm data fitting terms [10]. Here, in the case of Poisson-type problems, the method consists in first computing a minimizer $u^1$ of (6) with $R(u) := |u|_{BV}$. Updates are then determined successively by computing

$$u^{l+1} = \arg\min_{u \in BV(\Omega)} \int_\Omega (Ku - f \log Ku)\, d\mu + \alpha \left( |u|_{BV} - \langle p^l, u \rangle \right), \qquad (8)$$

where $p^l$ is an element of the subgradient of the total variation seminorm in $u^l$. Introducing the Bregman distance with respect to $|\cdot|_{BV}$, defined via

$$D^{\tilde p}_{|\cdot|_{BV}}(u, \tilde u) := |u|_{BV} - |\tilde u|_{BV} - \langle \tilde p, u - \tilde u \rangle, \qquad \tilde p \in \partial |\tilde u|_{BV} \subseteq BV^*(\Omega), \qquad (9)$$
where $\langle \cdot, \cdot \rangle$ denotes the duality product, allows us to characterize $u^{l+1}$ in (8) as

$$u^{l+1} = \arg\min_{u \in BV(\Omega)} \int_\Omega (Ku - f \log Ku)\, d\mu + \alpha\, D^{p^l}_{|\cdot|_{BV}}(u, u^l). \qquad (10)$$
We will see that inverse scale space strategies can noticeably improve reconstructions for inverse problems with Poisson statistics like optical nanoscopy.
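To make the remark on the data-fidelity term of (6) concrete, the following sketch assumes a pixel-wise discretization in which the integral becomes a sum and $Ku$ is a strictly positive vector; the function names are hypothetical. It checks numerically that the Poisson data term and the Kullback-Leibler functional differ only by $\sum_i (f_i \log f_i - f_i)$, a quantity that does not depend on $u$.

```python
import numpy as np

def poisson_data_term(f, Ku):
    """Discretized data term of (6): sum_i (Ku)_i - f_i log (Ku)_i, assuming Ku > 0."""
    return np.sum(Ku - f * np.log(Ku))

def kl_divergence(f, g):
    """KL(f, g) = sum_i f_i log(f_i / g_i) - f_i + g_i with 0 log 0 = 0, assuming g > 0."""
    pos = f > 0
    return np.sum(f[pos] * np.log(f[pos] / g[pos])) - f.sum() + g.sum()

rng = np.random.default_rng(0)
f = rng.poisson(20.0, size=100).astype(float)        # synthetic photon counts
for Ku in (np.full(100, 20.0), rng.uniform(5.0, 40.0, size=100)):
    # Same value for both choices of Ku: sum_i f_i log f_i - f_i, independent of u.
    print(kl_divergence(f, Ku) - poisson_data_term(f, Ku))
```

Both printed values coincide up to round-off, so minimizing (6) over $u$ is equivalent to minimizing the KL distance between $f$ and $Ku$ plus the regularizer; the same `kl_divergence` quantity is what a KL-based stopping rule would monitor.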
2
Reconstruction Methods
In the literature, two types of reconstruction methods are generally used: analytic (direct) and algebraic (iterative) methods. A classical example of a direct method is the Fourier-based filtered backprojection (FBP). Although FBP is well understood and computationally efficient, iterative methods are receiving increasing attention in the applications mentioned above. The major reasons are the high noise level (low SNR) and the type of statistics, which cannot be taken into account by direct methods. Hence, we give a short review of the Expectation-Maximization (EM) algorithm [4, 11], which is a popular iterative algorithm to maximize the likelihood function $p(u|f)$ in problems with incomplete data. Then we proceed to the presentation of the proposed EM-TV and Bregman-EM-TV algorithms.

2.1
Reconstruction Method: EM Algorithm
In the absence of prior knowledge any object $u$ has the same relevance, i.e. the Gibbs a-priori density $p(u)$ in (3) is constant. We can then normalize $p(u)$ such that $R(u) \equiv 0$. Hence (6) reduces to the constrained minimization problem

$$\min_{u \ge 0} \int_\Omega (Ku - f \log Ku)\, d\mu. \qquad (11)$$
A suitable iteration scheme for computing stationary points, which also preserves positivity (assuming $K$ preserves positivity), is the so-called EM algorithm (cf. [12])

$$u_{k+1} = \frac{u_k}{K^* 1}\, K^*\!\left( \frac{f}{K u_k} \right), \qquad k = 0, 1, \ldots \qquad (12)$$

For noise-free data $f$, several convergence proofs of the EM algorithm to the maximum likelihood estimate, i.e. the solution of (11), can be found in the literature [12, 13, 14, 15]. However, iteration (12) is known to converge slowly. A further property of the iteration is a lack of smoothing, whereby the so-called "checkerboard effect" arises. For noisy data $f$ it is necessary to differentiate between discrete and continuous modeling. In the discrete case, i.e. if $K$ is a matrix and $u$ is a vector, the existence of a minimum can be guaranteed since the smallest singular value is bounded by a positive value. Hence, the iterates remain bounded and convergence is ensured. However, if $K$ is a general continuous operator, convergence is not only difficult to prove, but the EM algorithm may even diverge. Again, the reason is the ill-posedness of the integral equation (1), which transfers to problem (11). This can be attributed to the lack of a-priori knowledge about the unknown $u$, since $R(u) \equiv 0$. The EM algorithm converges to a minimizer if one exists. Consequently, in the continuous case it is essential to ensure consistency of the given data to prevent divergence of the EM algorithm. As described in [13], the EM iterates show the following typical behavior for ill-posed problems: the (metric) distance between the iterates and the solution decreases initially, before it increases as the noise is amplified during the iteration process. This can be controlled by appropriate stopping rules; in [13] it is shown that certain stopping rules indeed allow stable approximations. Ways to improve reconstruction results are the TV and Bregman-TV regularization techniques that we consider in the following sections.

2.2
Reconstruction Method: EM-TV Algorithm
The EM or Richardson-Lucy algorithm is currently the standard iterative reconstruction method for deconvolution problems with Poisson noise based on the linear equation (1). However, with the assumption $R(u) \equiv 0$, no a-priori knowledge about the expected solution is taken into account, i.e. different images have the same a-priori probability. Especially in the case of measurements with low SNR, the multiplicative fixed point iteration (12) delivers unsatisfactory and noisy results even with early termination. Therefore we propose to integrate nonlinear variational methods into the reconstruction process to make efficient use of a-priori information and to obtain improved results. An interesting approach to improve the reconstruction is the EM-TV algorithm. In the classical EM algorithm, the negative log-likelihood functional (11) is minimized. We modify the functional by adding a weighted TV term [2],

$$\min_{\substack{u \in BV(\Omega) \\ u \ge 0}} \int_\Omega (Ku - f \log Ku)\, d\mu + \alpha |u|_{BV}. \qquad (13)$$

This is exactly (6) with TV as the regularization functional $R$. That means images with smaller total variation are preferred in the minimization (they have higher prior probability). $BV(\Omega)$ is a popular function space in image processing since it can represent discontinuous functions. By minimizing TV the latter are even preferred [16, 17]. Hence, the expected reconstructions are cartoon-like images. Obviously, such an approach cannot be used for studying very small structures in an object, but it is well suited to segmenting different cell structures and analyzing them quantitatively. For the solution of (13), we propose a forward-backward splitting algorithm, which can be realized by alternating classical EM steps with almost standard TV minimization steps as encountered in image denoising. The latter are solved by using duality [18], which yields a robust and efficient algorithm. For designing the proposed alternating algorithm, we consider the first order optimality condition
of (13). Due to the total variation, this variational problem is not differentiable in the usual sense. But it is convex, since TV is convex and since we can extend the data-fidelity term to a Kullback-Leibler functional, cf. [19], without affecting the stationary points. For such problems powerful methods from convex analysis are available, e.g. a generalized derivative called the subdifferential [20], denoted by $\partial$. This generalized notion of gradients and the Karush-Kuhn-Tucker (KKT) conditions [20, Theorem 2.1.4] yield the existence of a Lagrange multiplier $\lambda \ge 0$ such that

$$\begin{cases} 0 \in K^* 1 - K^*\!\left( \dfrac{f}{Ku} \right) + \alpha\, \partial |u|_{BV} - \lambda, \\[1ex] 0 = \lambda u. \end{cases} \qquad (14)$$

By multiplying (14) with $u$ we can eliminate the Lagrange multiplier and derive the following semi-implicit iteration scheme

$$u_{k+1} - \frac{u_k}{K^* 1}\, K^*\!\left( \frac{f}{K u_k} \right) + \tilde\alpha\, u_k\, p_{k+1} = 0 \qquad (15)$$

with $p_{k+1} \in \partial |u_{k+1}|_{BV}$ and $\tilde\alpha := \frac{\alpha}{K^* 1}$. Interestingly, the second term within this iteration scheme is the EM step in (12). Consequently, method (15), solving variational problem (13), can be realized as a nested two-step iteration,

$$\begin{cases} u_{k+\frac12} = \dfrac{u_k}{K^* 1}\, K^*\!\left( \dfrac{f}{K u_k} \right) & \text{(EM step)} \\[1ex] u_{k+1} = u_{k+\frac12} - \tilde\alpha\, u_k\, p_{k+1} & \text{(TV step)} \end{cases} \qquad (16)$$

Thus, we alternate an EM step with a TV correction step. The complex second half step from $u_{k+\frac12}$ to $u_{k+1}$ can be realized by solving the following variational problem,

$$u_{k+1} = \arg\min_{u \in BV(\Omega)} \frac12 \int_\Omega \frac{(u - u_{k+\frac12})^2}{u_k} + \tilde\alpha\, |u|_{BV}. \qquad (17)$$

Inspecting the first order optimality condition confirms the equivalence of this minimization with the TV correction step in (16). Problem (17) is just a modified version of the Rudin-Osher-Fatemi (ROF) model, with weight $\frac{1}{u_k}$ in the fidelity term. This analogy creates the opportunity to carry over efficient numerical schemes known for the ROF model. For the solution of (17) we use the exact definition of TV (7) with dual variable $g$ and derive an iteration scheme for the quadratic dual problem similar to [18]. The resulting algorithm reads as follows: we initialize the dual variable $g^0$ with 0 (or the resulting $g$ from the previous TV correction step) and for any $n \ge 0$ we compute the update

$$g^{n+1} = \frac{g^n + \tau\, \nabla\!\left( \tilde\alpha\, u_k\, \mathrm{div}\, g^n - u_{k+\frac12} \right)}{1 + \tau \left| \nabla\!\left( \tilde\alpha\, u_k\, \mathrm{div}\, g^n - u_{k+\frac12} \right) \right|}, \qquad 0 < \tau < \frac{1}{4 \tilde\alpha\, u_k},$$

with the constrained damping parameter $\tau$ to ensure stability and convergence of the algorithm. For a detailed analytical examination of EM-TV we refer to [21].
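The nested iteration (16), with the TV step (17) solved by the dual update above, can be sketched as follows. This is a minimal 1D illustration under stated assumptions, not the authors' implementation: $K$ is a hypothetical "same"-mode convolution with a symmetric Gaussian kernel (so that $K^* = K$), the gradient/divergence pair uses forward differences, the damping mirrors $0 < \tau < 1/(4\tilde\alpha u_k)$ using the maximum of $\tilde\alpha u_k$, and the final clipping to nonnegative values is an added safeguard.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Hypothetical PSF: normalized, symmetric Gaussian of odd length 2*radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def forward(u, k):
    """K u via 'same' convolution; with a symmetric odd-length kernel K is symmetric, so K^* = K."""
    return np.convolve(u, k, mode="same")

def grad(u):
    g = np.zeros_like(u)
    g[:-1] = np.diff(u)                       # forward differences, zero at the right end
    return g

def div(g):
    d = np.empty_like(g)                      # negative adjoint of grad: <grad u, g> = -<u, div g>
    d[0], d[1:-1], d[-1] = g[0], g[1:-1] - g[:-2], -g[-2]
    return d

def weighted_rof(h, w, alpha_t, n_iter=100):
    """Approximately solves (17): min_u 1/2 sum (u - h)^2 / w + alpha_t TV(u).
    Dual fixed-point update from the text; the primal solution is u = h - alpha_t * w * div g."""
    lam = alpha_t * w
    tau = 0.24 / max(lam.max(), 1e-12)       # 0 < tau < 1 / (4 max(alpha_t * u_k))
    g = np.zeros_like(h)
    for _ in range(n_iter):
        q = grad(lam * div(g) - h)
        g = (g + tau * q) / (1.0 + tau * np.abs(q))
    return h - lam * div(g)

def em_tv(f, k, alpha, n_outer=50, n_inner=100, eps=1e-12):
    Kt1 = forward(np.ones_like(f), k)         # K^* 1
    alpha_t = alpha / Kt1                     # alpha-tilde := alpha / K^* 1
    u = np.full_like(f, f.mean())             # positive constant start (assumption)
    for _ in range(n_outer):
        u_half = u / Kt1 * forward(f / np.maximum(forward(u, k), eps), k)  # EM step (12)
        u = weighted_rof(u_half, u, alpha_t, n_inner)                      # TV step (17)
        u = np.maximum(u, 0.0)                # safeguard against round-off negatives (assumption)
    return u
```

With `alpha = 0` the TV step returns its input unchanged, so `em_tv` reduces to the plain EM iteration (12); for `alpha > 0` the TV step damps the noise amplification that EM shows on noisy data.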
2.3
Extension to Inverse Scale Space: Bregman-EM-TV
The EM-TV algorithm (16) presented above solves problem (13) and delivers cartoon reconstructions with sharp edges due to TV regularization. However, the realization of TV steps via the weighted ROF models (17) has the drawback that reconstructed images suffer from a loss of contrast. Thus, we propose to extend (13), and with it EM-TV, by iterative regularization to achieve simultaneous contrast correction. More precisely, we perform a contrast enhancement by inverse scale space methods and by using the Bregman iteration. These techniques have been derived in [8], with a detailed analysis for Gaussian-type problems (5), and have been generalized in [9, 10]. Following these methods, an iterative refinement is realized by a sequence of modified EM-TV problems based on (13). The inverse scale space methods concerning TV, derived in [8], follow the concept of iterative regularization by the Bregman distance [22]. In the case of the Poisson model the method initially starts with a simple EM-TV algorithm, i.e. it consists in computing a minimizer $u^1$ of (13). Then, updates are determined successively by considering variational problems with a shifted TV, namely (8), where $p^l$ is an element of the subgradient of the total variation in $u^l$. The Bregman distance concerning TV is defined in (9). This definition allows us to characterize the sequence of modified variational problems (8), by addition of constant terms, as

$$u^{l+1} = \arg\min_{u \in BV(\Omega)} \int_\Omega (Ku - f \log Ku)\, d\mu + \alpha\, D^{p^l}_{|\cdot|_{BV}}(u, u^l). \qquad (18)$$
Thus, the first iterate $u^1$ can also be realized by the variational problem (18), if $u^0$ is constant and $p^0 := 0 \in \partial |u^0|_{BV}$. The Bregman distance $D^p_{|\cdot|_{BV}}$ does not represent a distance in the common (metric) sense, since $D$ is not symmetric in general and the triangle inequality does not hold. However, compared to (8), the formulation in (18) offers the advantage that $D^p_{|\cdot|_{BV}}$ is a distance measure with

$$D^p_{|\cdot|_{BV}}(u, \tilde u) \ge 0 \quad \text{and} \quad D^p_{|\cdot|_{BV}}(u, \tilde u) = 0 \ \text{ for } u = \tilde u.$$
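These properties can be checked on a discrete 1D stand-in for $|\cdot|_{BV}$, namely $J(u) = \sum_i |u_{i+1} - u_i| = \|Du\|_1$ (an assumption made only for illustration; the function names are hypothetical). A valid subgradient at $\tilde u$ is $p = D^\top \operatorname{sign}(D\tilde u)$, so the Bregman distance (9) can be evaluated directly.

```python
import numpy as np

def tv(u):
    """Discrete 1D total variation J(u) = sum_i |u_{i+1} - u_i| = ||D u||_1."""
    return np.abs(np.diff(u)).sum()

def tv_subgradient(u):
    """One element of dJ(u): p = D^T s with s = sign(D u); sign(0) = 0 lies in [-1, 1]."""
    s = np.sign(np.diff(u))
    p = np.zeros_like(u, dtype=float)
    p[:-1] -= s          # contribution of -u_i in (D u)_i = u_{i+1} - u_i
    p[1:] += s           # contribution of +u_{i+1}
    return p

def bregman_distance(u, u_ref):
    """D^p(u, u_ref) = J(u) - J(u_ref) - <p, u - u_ref> with p in dJ(u_ref), cf. (9)."""
    p = tv_subgradient(u_ref)
    return tv(u) - tv(u_ref) - np.dot(p, u - u_ref)

rng = np.random.default_rng(1)
u_ref = rng.normal(size=50)
print(bregman_distance(u_ref, u_ref))                                           # zero for u = u_ref
print(min(bregman_distance(rng.normal(size=50), u_ref) for _ in range(100)))   # nonnegative
print(bregman_distance(2.0 * u_ref + 3.0, u_ref))   # zero up to round-off: same edge set
```

For example, `bregman_distance(np.array([0., 1., 0.]), np.zeros(3))` evaluates to 2, while swapping the arguments gives 0; this shows both the asymmetry and that $D = 0$ does not force $u = \tilde u$. The last line of the sketch relates to the contrast argument that follows: rescaling and shifting $\tilde u$ keeps its edge set and yields zero Bregman distance, so increasing intensities is not penalized.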
Moreover, the Bregman distance is convex in the first argument because $|\cdot|_{BV}$ is convex. In general, i.e. for any convex functional $J$ (see e.g. [10]), the Bregman distance can be interpreted as the difference between $J(\cdot)$ in $u$ and the Taylor linearization of $J$ around $\tilde u$ if, in addition, $J$ is continuously differentiable. Before deriving a two-step iteration corresponding to (16), we motivate the contrast enhancement by iterative regularization in (18). The TV regularization in (13) prefers functions with only few oscillations. The iterative Bregman regularization has the advantage that, with $u^l$ as an approximation to the possible solution, additional information is available. The variational problem (18) can be interpreted as follows: search for a solution that matches the Poisson distributed data after applying $K$ while simultaneously minimizing the residual of the Taylor approximation of $|\cdot|_{BV}$ around $u^l$. In the following we will see that this form of regularization does not change the position of gradients with respect
to the last computed EM-TV solution $u^l$, but that an increase of intensities is permitted. This leads to a noticeable contrast enhancement. For the derivation of a two-step iteration we consider the first order optimality condition of the variational problem (8), resp. (18). Due to the convexity of the Bregman distance in the first argument we can determine the subdifferential of (18). Analogous to the derivation of the EM-TV iteration, the subdifferential of the log-likelihood functional can be expressed by the Fréchet derivative in (14). Hence, the optimality condition is given by

$$0 \in K^* 1 - K^*\!\left( \frac{f}{K u^{l+1}} \right) + \alpha \left( \partial |u^{l+1}|_{BV} - p^l \right), \qquad p^l \in \partial |u^l|_{BV}. \qquad (19)$$

For $u^0$ constant and $p^0 := 0 \in \partial |u^0|_{BV}$ this delivers a well-defined update of the iterates $p^l$,

$$p^{l+1} := p^l - \frac{1}{\tilde\alpha} \left( 1 - \frac{1}{K^* 1}\, K^*\!\left( \frac{f}{K u^{l+1}} \right) \right) \in \partial |u^{l+1}|_{BV},$$

where $\tilde\alpha := \frac{\alpha}{K^* 1}$ results from an operator normalization. Analogous to EM-TV we can apply the idea of the nested iteration (16) in every refinement step, $l = 1, 2, \ldots$. For the solution of (18), condition (19) yields a strategy consisting of an EM step $u^{l+1}_{k+\frac12}$ followed by solving the adapted weighted ROF problem

$$u^{l+1}_{k+1} = \arg\min_{u \in BV(\Omega)} \left\{ \frac12 \int_\Omega \frac{(u - u^{l+1}_{k+\frac12})^2}{u^{l+1}_k} + \tilde\alpha \left( |u|_{BV} - \langle p^l, u \rangle \right) \right\}. \qquad (20)$$
Following [8, 9, 10], we provide an opportunity to transfer the shift term $\langle p^l, u \rangle$ to the data-fidelity term. This approach facilitates the implementation of contrast enhancement with the Bregman distance via a slightly modified EM-TV algorithm. With the scaling $v^l := \tilde\alpha\, p^l$ and (19) we obtain the following update formula

$$v^{l+1} = v^l - \left( 1 - \frac{1}{K^* 1}\, K^*\!\left( \frac{f}{K u^{l+1}} \right) \right), \qquad v^0 = 0. \qquad (21)$$

Using this scaled update we can rewrite the second step (20) as

$$u^{l+1}_{k+1} = \arg\min_{u \in BV(\Omega)} \left\{ \frac12 \int_\Omega \frac{(u - u^{l+1}_{k+\frac12})^2 - 2 u\, u^{l+1}_k v^l}{u^{l+1}_k} + \tilde\alpha\, |u|_{BV} \right\}.$$

Note that

$$(u - u^{l+1}_{k+\frac12})^2 - 2 u\, u^{l+1}_k v^l = \left( u - (u^{l+1}_{k+\frac12} + u^{l+1}_k v^l) \right)^2 - (u^{l+1}_k)^2 (v^l)^2 - 2\, u^{l+1}_{k+\frac12}\, u^{l+1}_k v^l$$

holds, where the last two terms are independent of $u$. Hence (20) simplifies to

$$u^{l+1}_{k+1} = \arg\min_{u \in BV(\Omega)} \left\{ \frac12 \int_\Omega \frac{\left( u - (u^{l+1}_{k+\frac12} + u^{l+1}_k v^l) \right)^2}{u^{l+1}_k} + \tilde\alpha\, |u|_{BV} \right\}, \qquad (22)$$
i.e. the second step (20) can be realized by a slight modification of the TV step introduced in (17). The efficient numerical implementation of the weighted ROF problem in Section 2.2, using the exact definition of TV and duality strategies, can be applied in complete analogy to (22). The update variable $v$ in (21) is an error function with reference to the optimality condition of the unregularized log-likelihood functional (11). In every refinement step of the Bregman iteration, $v^{l+1}$ differs from $v^l$ by the current error in the optimality condition of (11). Within the TV step (22) one observes that iterative regularization with the Bregman distance leads to contrast enhancement: instead of fitting to the EM solution $u^{l+1}_{k+\frac12}$ in the weighted norm, we use a function in the fidelity term whose intensities are increased by the error function $v^l$. Resulting from the idea of adaptive regularization, $v^l$ is weighted by $u^{l+1}_k$, too. As usual for iterative methods, the described reconstruction method by iterative regularization needs a stopping criterion. It should stop at an iteration offering a solution that approximates the true image as well as possible. This is necessary to prevent too much noise from arising through the inverse scale space strategy. In the case of Gaussian noise, the discrepancy principle is a reasonable stopping criterion, i.e. the procedure would stop if the residual $\|Ku^l - f\|_2^2$ reaches the variance of the noise. In the case of Poisson noise, however, it makes sense to stop the Bregman iteration if the Kullback-Leibler distance of $Ku^l$ and the given data $f$ reaches the noise level. For synthetic data the noise level is given by the KL distance of $Ku^*$ and $f$, where $u^*$ denotes the true, noise-free image. For experimental data it is necessary to find a suitable estimate for the noise level from the counts.
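Putting (16), (21) and (22) together, the outer Bregman loop with the Kullback-Leibler stopping rule can be sketched as below. As a hypothetical 1D illustration rather than the authors' code, it assumes that $K$ is a symmetric "same"-mode convolution, that each refinement step runs a fixed number of EM-TV steps warm-started from the previous refinement step, that the stopping test uses $\mathrm{KL}(f, Ku^{l})$ with $f$ as first argument, and that negative values produced by the shifted target are clipped to zero.

```python
import numpy as np

def conv(u, k):
    """K u = K^* u: 'same' convolution with a symmetric, odd-length kernel (hypothetical PSF)."""
    return np.convolve(u, k, mode="same")

def grad(u):
    g = np.zeros_like(u)
    g[:-1] = np.diff(u)                      # forward differences, zero at the right end
    return g

def div(g):
    d = np.empty_like(g)                     # negative adjoint of grad
    d[0], d[1:-1], d[-1] = g[0], g[1:-1] - g[:-2], -g[-2]
    return d

def tv_step(target, w, alpha_t, n_iter=100):
    """Weighted ROF step (22): min_u 1/2 sum (u - target)^2 / w + alpha_t TV(u), via the dual update."""
    lam = alpha_t * w
    tau = 0.24 / max(lam.max(), 1e-12)
    g = np.zeros_like(target)
    for _ in range(n_iter):
        q = grad(lam * div(g) - target)
        g = (g + tau * q) / (1.0 + tau * np.abs(q))
    return np.maximum(target - lam * div(g), 0.0)    # clipping is an added safeguard

def kl(f, g):
    """KL(f, g) with 0 log 0 = 0; assumes g > 0."""
    pos = f > 0
    return np.sum(f[pos] * np.log(f[pos] / g[pos])) - f.sum() + g.sum()

def bregman_em_tv(f, k, alpha, noise_level, n_bregman=10, n_em=30, eps=1e-12):
    Kt1 = conv(np.ones_like(f), k)                   # K^* 1
    alpha_t = alpha / Kt1                            # alpha-tilde
    u = np.full_like(f, f.mean())                    # constant u^0, so p^0 = 0 is admissible
    v = np.zeros_like(f)                             # v^0 = 0
    for l in range(n_bregman):
        for _ in range(n_em):                        # nested EM-TV iteration with shifted target
            u_half = u / Kt1 * conv(f / np.maximum(conv(u, k), eps), k)   # EM step
            u = tv_step(u_half + u * v, u, alpha_t)                      # TV step (22)
        Ku = np.maximum(conv(u, k), eps)
        v = v - (1.0 - conv(f / Ku, k) / Kt1)        # update (21)
        if kl(f, Ku) <= noise_level:                 # stop once the KL distance reaches the noise level
            break
    return u, l + 1

rng = np.random.default_rng(0)
x = np.arange(-6, 7, dtype=float)
k = np.exp(-x**2 / 8.0)
k /= k.sum()                                          # hypothetical Gaussian PSF, sigma = 2
u_true = np.full(128, 10.0)
u_true[40:80] = 50.0
f = rng.poisson(conv(u_true, k)).astype(float)
noise = kl(f, np.maximum(conv(u_true, k), 1e-12))     # synthetic-data noise level, as in the text
u_rec, steps = bregman_em_tv(f, k, alpha=2.0, noise_level=noise)
```

The first refinement step, with $v^0 = 0$, is plain EM-TV; later steps only change the target of the TV step, which is why the weighted ROF solver from Section 2.2 can be reused unchanged.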
3
Results
In recent years, revolutionary imaging techniques have been developed in light microscopy, with enormous importance for the biological and material sciences and for medicine. For a couple of decades the technology of light microscopy had been considered exhausted, as the resolution is basically limited by Abbe's law for the diffraction of light. With the development of stimulated emission depletion (STED) and 4Pi microscopy, resolutions are now achieved that are far beyond this
Fig. 1. Synthetic data for different PSFs: (a) true image; (b) Gaussian PSF; (c) convolution with the Gaussian PSF plus Poisson noise; (d) PSF appearing in 4Pi microscopy; (e) convolution with the 4Pi PSF plus Poisson noise
Fig. 2. Synthetic data: (a) raw data; (b) EM reconstruction, 20 iterations, KL distance: 3.20; (c) EM-TV, α = 0.04, KL distance: 2.43; (d) Bregman-EM-TV, α = 0.1, after 4 updates, KL distance: 1.43; (e) true image; (f)-(h) horizontal slices of EM, EM-TV and Bregman-EM-TV compared to the true image slice
Fig. 3. Experimental data: (a) protein Bruchpilot in active zones of neuromuscular synapses in larval Drosophila; (b) EM-TV; (c) Bregman-EM-TV; (d) protein Syntaxin in the cell membrane, fixed mammalian (PC12) cell; (e) EM-TV; (f) Bregman-EM-TV
diffraction barrier [23, 24]. For an impression of nanoscopic images blurred by different convolution kernels (PSFs), we refer to Figure 1. Figure 2 illustrates our techniques on a simple synthetic object. With EM-TV (see 2(c) and 2(g)) we get rid of noise and oscillations, but we are not able to separate the objects sufficiently. Using Bregman-EM-TV, a considerable improvement resulting from contrast enhancement can be achieved. This is underlined by the KL distance values of the different reconstructions. Figure 3 (a)-(c) shows the protein Bruchpilot [25] and its EM-TV and Bregman-EM-TV reconstructions. In particular, the latter delivers well-separated object segments and a high contrast level. In Figure 3 (d)-(f) we illustrate our techniques by reconstructing Syntaxin [26], a membrane-integrated protein participating in exocytosis. Here, too, the contrast-enhancing property of Bregman-EM-TV is observable compared to EM-TV, and fine structures in the image are preserved.
4
Conclusions
We have derived reconstruction methods for inverse problems with Poisson noise. In particular, we concentrated on deblurring problems in nanoscopic imaging, although the proposed methods can easily be adapted to other imaging tasks, e.g. medical imaging (PET [27]). Motivated by statistical modeling, we developed a robust EM-TV algorithm that incorporates a-priori knowledge into the reconstruction process. By combining EM with simultaneous TV regularization we can reconstruct cartoon images with sharp edges, which provide a reasonable basis for quantitative investigations. To overcome the problem of contrast reduction, we extended the reconstruction to Bregman iterations and inverse scale space methods. We applied the proposed methods to optical nanoscopy and pointed out their improvements in comparison to standard reconstruction techniques.

Acknowledgments. This work has been supported by the German Federal Ministry of Education and Research through the project INVERS. C.B. acknowledges further support by the Deutsche Telekom Foundation, and M.B. by the German Science Foundation DFG through the project "Regularisierung mit Singulären Energien". The authors thank Dr. Katrin Willig and Dr. Andreas Schönle (MPI Biophysical Chemistry, Göttingen) for providing experimental data and stimulating discussions.
References
1. Bertero, M., Lantéri, H., Zanni, L.: Iterative image reconstruction: a point of view. In: Mathematical Methods in Biomedical Imaging and Intensity-Modulated Radiation Therapy (IMRT). CRM series, vol. 8 (2008)
2. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
3. Le, T., Chartrand, R., Asaki, T.J.: A variational approach to reconstructing images corrupted by Poisson noise. J. Math. Imaging Vision 27, 257–263 (2007)
4. Shepp, L.A., Vardi, Y.: Maximum likelihood reconstruction for emission tomography. IEEE Transactions on Medical Imaging 1(2), 113–122 (1982)
5. Richardson, W.H.: Bayesian-based iterative method of image restoration. J. Opt. Soc. Am. 62, 55–59 (1972)
6. Lucy, L.B.: An iterative technique for the rectification of observed distributions. The Astronomical Journal 79, 745–754 (1974)
7. Acar, R., Vogel, C.R.: Analysis of bounded variation penalty methods for ill-posed problems. Inverse Problems 10, 1217–1229 (1994)
8. Osher, S., Burger, M., Goldfarb, D., Xu, J., Yin, W.: An iterative regularization method for total variation based image restoration. Multiscale Modelling and Simulation 4, 460–489 (2005)
9. Burger, M., Gilboa, G., Osher, S., Xu, J.: Nonlinear inverse scale space methods. Commun. Math. Sci. 4(1), 179–212 (2006)
10. Burger, M., Frick, K., Osher, S., Scherzer, O.: Inverse total variation flow. SIAM Multiscale Modelling and Simulation 6(2), 366–395 (2007)
11. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum Likelihood from Incomplete Data via the EM Algorithm. J. of the Royal Statistical Society, B 39, 1–38 (1977)
12. Natterer, F., Wübbeling, F.: Mathematical methods in image reconstruction. SIAM Monographs on Mathematical Modeling and Computation (2001)
13. Resmerita, E., et al.: The expectation-maximization algorithm for ill-posed integral equations: a convergence analysis. Inverse Problems 23, 2575–2588 (2007)
14. Vardi, Y., Shepp, L.A., Kaufman, L.: A statistical model for positron emission tomography. J. of the American Statistical Association 80(389), 8–20 (1985)
15. Iusem, A.N.: Convergence analysis for a multiplicatively relaxed EM algorithm. Mathematical Methods in the Applied Sciences 14, 573–593 (1991)
16. Evans, L.C., Gariepy, R.F.: Measure theory and fine properties of functions. Studies in Advanced Mathematics. CRC Press, Boca Raton (1992)
17. Giusti, E.: Minimal surfaces and functions of bounded variation. Birkhäuser, Basel (1984)
18. Chambolle, A.: An algorithm for total variation minimization and applications. J. of Mathematical Imaging and Vision 20, 89–97 (2004)
19. Resmerita, E., Anderssen, S.: Joint additive Kullback-Leibler residual minimization and regularization for linear inverse problems. Math. Meth. Appl. Sci. 30, 1527–1544 (2007)
20. Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I. Grundlehren der mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 305. Springer, Heidelberg (1993)
21. Brune, C., Sawatzky, A., Wübbeling, F., Kösters, T., Burger, M.: EM-TV methods for inverse problems with Poisson noise (in preparation) (2009)
22. Bregman, L.M.: The relaxation method for finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comp. Math. and Math. Phys. 7, 200–217 (1967)
23. Klar, T.A., et al.: Fluorescence microscopy with diffraction resolution barrier broken by stimulated emission. PNAS 97, 8206–8210 (2000)
24. Hell, S., Schönle, A.: Nanoscale resolution in far-field fluorescence microscopy. In: Hawkes, P.W., Spence, J.C.H. (eds.) Science of Microscopy. Springer, Heidelberg (2006)
25. Kittel, J., et al.: Bruchpilot promotes active zone assembly, Ca2+ channel clustering, and vesicle release. Science 312, 1051–1054 (2006)
26. Willig, K.I., Harke, B., Medda, R., Hell, S.W.: STED microscopy with continuous wave beams. Nature Meth. 4(11), 915–918 (2007)
27. Sawatzky, A., Brune, C., Wübbeling, F., Kösters, T., Schäfers, K.: Accurate EMTV algorithm in PET with low SNR. In: IEEE Nucl. Sci. Symp. (2008)
PDE-Driven Adaptive Morphology for Matrix Fields Bernhard Burgeth, Michael Breuß, Luis Pizarro, and Joachim Weickert Mathematical Image Analysis Group, Faculty of Mathematics and Computer Science, Saarland University, 66041 Saarbrücken, Germany {burgeth,breuss,pizarro,weickert}@mia.uni-saarland.de http://www.mia.uni-saarland.de
Abstract. Matrix fields are important in many applications since they are the adequate means to describe anisotropic behaviour in image processing models and physical measurements. A prominent example is diffusion tensor magnetic resonance imaging (DT-MRI), a medical imaging technique useful for analysing the fibre structure in the brain. Recently, morphological partial differential equations (PDEs) for dilation and erosion known for grey scale images have been extended to three-dimensional fields of symmetric positive definite matrices. In this article we propose a novel method to incorporate adaptivity into the matrix-valued, PDE-driven dilation process. The approach uses a structure tensor concept for matrix data to steer anisotropic morphological evolution in a way that enhances and completes line-like structures in matrix fields. Numerical experiments performed on synthetic and real-world data confirm the gap-closing and line-completing qualities of the proposed method.
1
Introduction
Initiated in the sixties by the pioneering research of Serra and Matheron on binary morphology [23, 31], this branch of image processing has developed into a rich field of research. Numerous monographs, e.g. [17, 24, 32, 33, 34], and proceedings, e.g. [16, 18, 22], bear witness to the variety in mathematical morphology. The building blocks of morphological operations are dilation and erosion. These are usually realised by algebraic set operations involving a probing set, a so-called structuring element; see e.g. [34] for details. An alternative approach to dilation is given [1] by the nonlinear partial differential equation (PDE)

$$\partial_t u = |\nabla u| = \sqrt{(\partial_x u)^2 + (\partial_y u)^2} \qquad (1)$$

with initial condition $u(x, y, 0) = f(x, y)$. The equation mimics the dilation of a grey scale image $f$ with respect to a ball-shaped structuring element of growing radius $t$. PDEs of this type using a continuous size parameter $t$ for the structuring element give rise to continuous-scale morphology [1, 2, 6, 29, 35]. Equation (1) has been extended in two ways:

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 247–258, 2009. © Springer-Verlag Berlin Heidelberg 2009
248
B. Burgeth et al.
Firstly, in [5] adaptivity has been incorporated by introducing a speed function β = β(u) into (1),
$$\partial_t u = \beta(u) \cdot |\nabla u|. \qquad (2)$$
Earlier attempts towards adaptivity have been made in [20, 26], where a local switch between dilation and erosion with a nonadaptive structuring element leads to a so-called morphological shock filter, and in [21], which introduced morphological amoebae in a set-theoretic framework. Secondly, in [8] scalar continuous morphology has been extended to a PDE-driven morphology of matrix-valued images, matrix fields for short. Matrix fields have received increasing attention in recent years since they are the appropriate data type to describe anisotropy in models or measurements of physical quantities. For instance, diffusion tensor magnetic resonance imaging (DT-MRI) has become a valuable tool in medicine for in vivo diagnosis. It results in three-dimensional tensor fields that describe the diffusive properties of water molecules, and as such the structure of the tissue under examination. The goal of this article is to introduce adaptivity into morphology for matrix fields. As it turns out, it is advantageous to start this generalisation from a scalar adaptive formulation for d-dimensional data u in the form of the PDE
$$\partial_t u = |M(u)\,\nabla u| \qquad (3)$$
with ∇u as a column vector and a data-dependent, symmetric, positive semidefinite d × d-matrix M = M(u), rather than from (2). For example, for grey-value images (d = 2) one has $M = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$ and (3) turns into
$$\partial_t u = \sqrt{(a\,\partial_x u + b\,\partial_y u)^2 + (b\,\partial_x u + c\,\partial_y u)^2}. \qquad (4)$$
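To make (4) concrete, the sketch below evaluates its right-hand side at a single point, given the partial derivatives and the entries a, b, c of M. The function name `adaptive_dilation_speed` is a hypothetical illustration, not code from the paper; M is assumed symmetric positive semidefinite as required for (3).

```python
import math


def adaptive_dilation_speed(ux, uy, a, b, c):
    """Right-hand side of (4): |M grad u| with M = [[a, b], [b, c]].

    M is assumed symmetric positive semidefinite, as required for (3).
    """
    return math.hypot(a * ux + b * uy, b * ux + c * uy)


# M = I recovers the isotropic dilation (1): the speed is |grad u|.
print(adaptive_dilation_speed(3.0, 4.0, 1.0, 0.0, 1.0))   # 5.0
# An elongated M lets the front advance faster along x than along y.
print(adaptive_dilation_speed(1.0, 0.0, 2.0, 0.0, 0.5))   # 2.0
print(adaptive_dilation_speed(0.0, 1.0, 2.0, 0.0, 0.5))   # 0.5
```

The anisotropy is visible directly: with M = diag(2, 0.5) a unit gradient in x yields speed 2, a unit gradient in y only 0.5.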
An application of the mapping $(x, y)^\top \mapsto M\,(x, y)^\top$ transforms a sphere centred around the origin into an ellipse. So, in fact, (3) describes a dilation with an ellipsoidal structuring element. The matrix M must contain directional information about the evolving u, and thus it may be derived from the so-called structure tensor. The structure tensor, going back to [14, 27, 4], is a classic tool in image processing to extract directional information from an image. It is given by
$$S_\rho(u(x)) := G_\rho * \left(\nabla u(x) \cdot (\nabla u(x))^\top\right) = \left(G_\rho * \left(\partial_{x_i} u(x) \cdot \partial_{x_j} u(x)\right)\right)_{i,j=1,\ldots,d} \qquad (5)$$
Here $G_\rho *$ indicates a convolution with a Gaussian of standard deviation ρ; however, more general averaging procedures can be used. For more details the reader is referred to [3] and the literature cited there. We will make use of the extended structure tensor concept for matrix fields as proposed in [10]. There it was used to steer a coherence-enhancing diffusion process for matrix fields, an anisotropic filtering process that has been proposed for scalar and colour images in [36, 37].
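The structure tensor (5) can be sketched for a 2D grey-value image (d = 2) in plain Python. This is a minimal illustration under our own assumptions, not the authors' implementation: central differences for the gradient, a Gaussian truncated at 3ρ, and replicated image borders; all function names are hypothetical.

```python
import math


def _clamp(k, n):
    """Replicated border: clamp index k into [0, n - 1]."""
    return min(max(k, 0), n - 1)


def gaussian_kernel(rho):
    """Normalised 1D Gaussian of standard deviation rho, truncated at 3*rho."""
    radius = max(1, math.ceil(3 * rho))
    weights = [math.exp(-k * k / (2.0 * rho * rho)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights], radius


def smooth(f, rho):
    """Separable convolution G_rho * f of a 2D list f[y][x]."""
    kernel, radius = gaussian_kernel(rho)
    ny, nx = len(f), len(f[0])
    tmp = [[sum(w * f[y][_clamp(x + k - radius, nx)] for k, w in enumerate(kernel))
            for x in range(nx)] for y in range(ny)]
    return [[sum(w * tmp[_clamp(y + k - radius, ny)][x] for k, w in enumerate(kernel))
             for x in range(nx)] for y in range(ny)]


def structure_tensor(u, rho):
    """Entry fields (J11, J12, J22) of S_rho(u) from (5) for an image u[y][x]."""
    ny, nx = len(u), len(u[0])
    # Central differences (an assumption; (5) does not prescribe a stencil).
    ux = [[0.5 * (u[y][_clamp(x + 1, nx)] - u[y][_clamp(x - 1, nx)]) for x in range(nx)]
          for y in range(ny)]
    uy = [[0.5 * (u[_clamp(y + 1, ny)][x] - u[_clamp(y - 1, ny)][x]) for x in range(nx)]
          for y in range(ny)]

    def product(p, q):
        return [[p[y][x] * q[y][x] for x in range(nx)] for y in range(ny)]

    return smooth(product(ux, ux), rho), smooth(product(ux, uy), rho), smooth(product(uy, uy), rho)


# A ramp in x: the tensor at the centre is close to [[1, 0], [0, 0]].
ramp = [[float(x) for x in range(9)] for _ in range(9)]
j11, j12, j22 = structure_tensor(ramp, 1.0)
print(j11[4][4], j12[4][4], j22[4][4])
```

For the ramp, the dominant eigenvector of the tensor points along the gradient (the x-axis), which is exactly the directional information the steering matrix M is meant to exploit.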
PDE-Driven Adaptive Morphology for Matrix Fields
249
In [38, 7, 13] Di Zenzo's approach [12] to construct a structure tensor for multi-channel images has been extended to matrix fields, yielding a standard structure tensor (using the notation of Section 2): $J_\rho(U(x)) := \sum_{i,j=1}^{m} S_\rho(U_{i,j}(x))$. This construction has been refined to a customisable structure tensor in [30]. The article has the following structure: in Section 2 we briefly convey basic notions of matrix analysis needed to establish a matrix-valued PDE for an adaptively steered morphological dilation process. This includes a short account of the construction of an extended structure tensor for matrix fields. In Section 3 we introduce the steering tensor that guides the dilation process adaptively. In Section 4 we explain how the numerical scheme of Rouy and Tourin is generalised to the matrix-valued setting. In our experiments we compare adaptive and isotropic dilation with coherence-enhancing diffusion (CED) when applied to synthetic matrix fields and real DT-MRI data sets; the results of this comparison are reported in Section 5. The remarks in Section 6 conclude this article.
2 Matrix Analysis and an Extended Structure Tensor Concept
This section contains the key definitions for the formulation of matrix-valued PDEs. For a more detailed exposition the reader is referred to [9]. A matrix field is considered as a mapping $U : \Omega \subset \mathbb{R}^d \to \mathrm{Sym}_m(\mathbb{R})$ from a d-dimensional image domain into the set of symmetric m × m-matrices with real entries, $U(x) = (U_{p,q}(x))_{p,q=1,\ldots,m}$. The set of positive (semi-)definite matrices, denoted by $\mathrm{Sym}^{++}_m(\mathbb{R})$ (resp., $\mathrm{Sym}^{+}_m(\mathbb{R})$), consists of all symmetric matrices A with $\langle v, Av \rangle := v^\top A v > 0$ (resp., $\geq 0$) for $v \in \mathbb{R}^m \setminus \{0\}$. This set is of special interest since DT-MRI produces data with this property. Note that at each point x the matrix U(x) of a field of symmetric matrices can be diagonalised, yielding $U(x) = V(x)^\top D(x) V(x)$, where V(x) is an orthogonal matrix and D(x) is a diagonal matrix. In the sequel we will denote m × m diagonal matrices with entries $\lambda_1, \ldots, \lambda_m \in \mathbb{R}$ from left to right simply by $\mathrm{diag}(\lambda_i)$. The extension of a function $h : \mathbb{R} \to \mathbb{R}$ to $\mathrm{Sym}_m(\mathbb{R})$ is standard [19]: with a slight abuse of notation we set $h(U) := V^\top \mathrm{diag}(h(\lambda_1), \ldots, h(\lambda_m))\, V \in \mathrm{Sym}_m(\mathbb{R})$, h now denoting a function acting on matrices as well. Specifying $h(s) = |s|$, $s \in \mathbb{R}$, as the absolute value function leads to the absolute value $|A| \in \mathrm{Sym}^+_m(\mathbb{R})$ of a matrix A. It is natural to define the partial derivative for matrix fields componentwise:
$$\partial_\omega U = (\partial_\omega U_{p,q})_{p,q=1,\ldots,m}, \qquad (6)$$
where $\omega \in \{t, x_1, \ldots, x_d\}$, that is, $\partial_\omega$ stands for a spatial or temporal derivative. Viewing a matrix as a tensor (of second order), its gradient would be a third-order tensor according to the rules of differential geometry. However, we adopt a more operator-algebraic point of view by defining the generalised gradient $\nabla U(x)$ at a voxel $x = (x_1, \ldots, x_d)$ by
$$\nabla U(x) := (\partial_{x_1} U(x), \ldots, \partial_{x_d} U(x)) \qquad (7)$$
which is an element of $(\mathrm{Sym}_m(\mathbb{R}))^d$, in close analogy to the scalar setting where $\nabla u(x) \in \mathbb{R}^d$. For $W \in (\mathrm{Sym}_m(\mathbb{R}))^d$ we set $|W|_p := \sqrt[p]{|W_1|^p + \cdots + |W_d|^p}$ for $0 < p < +\infty$. It results in a positive semidefinite matrix from $\mathrm{Sym}^+_m(\mathbb{R})$, the direct counterpart of a nonnegative real number as the length of a vector in $\mathbb{R}^d$. We will also need a symmetric multiplication of symmetric matrices. We opt for the so-called Jordan product $A \bullet_J B := \frac{1}{2}(AB + BA)$. It produces a symmetric matrix, and it is commutative and distributive over addition, but not associative. Furthermore, for later use in numerical schemes we have to clarify the notion of maximum and minimum of two symmetric matrices A, B. In direct analogy with relations known to be valid for real numbers one defines [8]:
$$\max(A, B) = \tfrac{1}{2}\,(A + B + |A - B|) \quad\text{and}\quad \min(A, B) = \tfrac{1}{2}\,(A + B - |A - B|) \qquad (8)$$
where |F| stands for the absolute value of the matrix F. With this at our disposal we formulate the matrix-valued counterpart of (3) as
$$\partial_t U = |M(U) \bullet \nabla U|_2 \qquad (9)$$
with an initial matrix field F(x) = U(x, 0). Here M(U) denotes a symmetric md × md-block matrix with d² blocks of size m × m that is multiplied block-wise with ∇U employing the symmetrised product "•". Note that $|\cdot|_2$ stands for the length of $M(U) \bullet \nabla U$ in the matrix-valued sense. The construction of M(U) is detailed in Section 3 and relies on the so-called full structure tensor. The full structure tensor $S^L$ for matrix fields as defined in [10] reads
$$S^L(U) := G_\rho * \left(\nabla U \cdot (\nabla U)^\top\right) = \left(G_\rho * (\partial_{x_i} U \cdot \partial_{x_j} U)\right)_{i,j=1,\ldots,d} \qquad (10)$$
with $G_\rho *$ indicating a convolution with a Gaussian of standard deviation ρ. $S^L(U(x))$ is a symmetric md × md-block matrix with d² blocks of size m × m, $S^L(U(x)) \in \mathrm{Sym}_d(\mathrm{Sym}_m(\mathbb{R})) = \mathrm{Sym}_{md}(\mathbb{R})$. Typically, for 3D medical DT-MRI data one has d = 3 and m = 3, yielding a 9 × 9-matrix $S^L$. It can be diagonalised as $S^L(U) = \sum_{k=1}^{md} \lambda_k w_k w_k^\top$ with real eigenvalues $\lambda_k$ (w.l.o.g. arranged in decreasing order) and an orthonormal basis $\{w_k\}_{k=1,\ldots,md}$ of $\mathbb{R}^{md}$. In order to extract useful d-dimensional directional information, $S^L(U) \in \mathrm{Sym}_{md}(\mathbb{R})$ is reduced to a structure tensor $S(U) \in \mathrm{Sym}_d(\mathbb{R})$ in a generalised projection step [10] using the block operator matrix $\mathrm{Tr}_A := \mathrm{diag}(\mathrm{tr}_A, \ldots, \mathrm{tr}_A)$ containing the trace operation. We set $\mathrm{Tr} := \mathrm{Tr}_{I_m}$, where $I_m$ denotes the m × m unit matrix. This operator matrix acts on elements of the space $(\mathrm{Sym}_m(\mathbb{R}))^d$ as well as on block matrices via formal block-wise matrix multiplication,
$$\begin{pmatrix} \mathrm{tr}_A & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \mathrm{tr}_A \end{pmatrix} \begin{pmatrix} M_{11} & \cdots & M_{1d} \\ \vdots & \ddots & \vdots \\ M_{d1} & \cdots & M_{dd} \end{pmatrix} = \begin{pmatrix} \mathrm{tr}_A(M_{11}) & \cdots & \mathrm{tr}_A(M_{1d}) \\ \vdots & \ddots & \vdots \\ \mathrm{tr}_A(M_{d1}) & \cdots & \mathrm{tr}_A(M_{dd}) \end{pmatrix}, \qquad (11)$$
provided that the square blocks $M_{ij}$ have the same size as A. The projection that is conveyed by the reduction process condenses the directional information contained in $S^L(U)$; for a more detailed reasoning we must refer the reader to [10]
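for the sake of brevity. The reduction operation is accompanied by an extension operation: the $I_m$-extension is the mapping from $\mathrm{Sym}_d(\mathbb{R})$ to $\mathrm{Sym}_{md}(\mathbb{R})$ conveyed by the Kronecker product ⊗:
$$\begin{pmatrix} v_{11} & \cdots & v_{1d} \\ \vdots & \ddots & \vdots \\ v_{d1} & \cdots & v_{dd} \end{pmatrix} \longmapsto \begin{pmatrix} v_{11} & \cdots & v_{1d} \\ \vdots & \ddots & \vdots \\ v_{d1} & \cdots & v_{dd} \end{pmatrix} \otimes \begin{pmatrix} I_m & \cdots & I_m \\ \vdots & \ddots & \vdots \\ I_m & \cdots & I_m \end{pmatrix} := \begin{pmatrix} v_{11} I_m & \cdots & v_{1d} I_m \\ \vdots & \ddots & \vdots \\ v_{d1} I_m & \cdots & v_{dd} I_m \end{pmatrix} \qquad (12)$$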
This resizing step makes a proper matrix-vector multiplication with the large generalised gradient $\nabla U(x)$ possible. By specifying the matrix A in (11) one may incorporate a priori knowledge into the direction estimation [10]. Research on these structure tensor concepts was initiated in [38, 7]. The approaches to matrix field regularisation suggested in [11] are based on differential geometric considerations. Comprehensive survey articles on the analysis of matrix fields using various techniques can be found in [39].
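The matrix calculus of this section can be tried out on symmetric 2 × 2 matrices (m = 2 rather than the m = 3 of DT-MRI, purely for brevity), where the eigendecomposition has a closed form. The sketch below, with hypothetical function names, implements h(U) via the spectral decomposition, the absolute value |A|, max/min from (8) and the Jordan product; the usage lines show that the Jordan product is commutative but not associative.

```python
import math

# A symmetric 2x2 matrix [[a, b], [b, c]] is stored as the tuple (a, b, c).


def eig_sym2(m):
    """Return (l1, l2, cos t, sin t): eigenvalues l1 >= l2 and the unit
    eigenvector (cos t, sin t) belonging to l1 (closed form for 2x2)."""
    a, b, c = m
    mean = 0.5 * (a + c)
    rad = math.hypot(0.5 * (a - c), b)
    t = 0.5 * math.atan2(2.0 * b, a - c)
    return mean + rad, mean - rad, math.cos(t), math.sin(t)


def mat_fun(h, m):
    """h(U) = V diag(h(l1), h(l2)) V^T for a scalar function h."""
    l1, l2, cs, sn = eig_sym2(m)
    h1, h2 = h(l1), h(l2)
    return (h1 * cs * cs + h2 * sn * sn,
            (h1 - h2) * cs * sn,
            h1 * sn * sn + h2 * cs * cs)


def add(p, q):
    return tuple(x + y for x, y in zip(p, q))


def sub(p, q):
    return tuple(x - y for x, y in zip(p, q))


def scale(s, p):
    return tuple(s * x for x in p)


def mat_abs(m):
    """Matrix absolute value |A|, i.e. h(s) = |s| applied to the eigenvalues."""
    return mat_fun(abs, m)


def mat_max(p, q):
    """max(A, B) = (A + B + |A - B|) / 2, cf. (8)."""
    return scale(0.5, add(add(p, q), mat_abs(sub(p, q))))


def mat_min(p, q):
    """min(A, B) = (A + B - |A - B|) / 2, cf. (8)."""
    return scale(0.5, sub(add(p, q), mat_abs(sub(p, q))))


def jordan(p, q):
    """Jordan product (AB + BA) / 2; the result is again symmetric."""
    a1, b1, c1 = p
    a2, b2, c2 = q
    return (a1 * a2 + b1 * b2,
            0.5 * (a1 * b2 + b1 * c2 + a2 * b1 + b2 * c1),
            b1 * b2 + c1 * c2)


A, B, C = (1.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
print(mat_abs((-2.0, 0.0, 3.0)))                          # approximately (2, 0, 3)
print(jordan(A, B) == jordan(B, A))                       # True
print(jordan(jordan(A, B), C), jordan(A, jordan(B, C)))   # (0.0, 0.25, 0.0) (0.0, 0.0, 0.0)
```

For commuting (e.g. diagonal) matrices, mat_max reduces to the entrywise maximum of the eigenvalues; for non-commuting ones it is in general not an upper bound in the Loewner order, which is why (8) is a definition by analogy rather than a lattice supremum.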
3 Steering Matrix M(U) for Matrix Fields
With these notions we are in a position to propose the steering matrix M for the adaptive dilation process for matrix fields. We proceed in four steps:
1. The matrix field $\mathbb{R}^d \ni x \mapsto U(x)$ provides us with a module field of generalised gradients $\nabla U(x)$, from which we construct the generalised structure tensor $S^L(U(x))$, possibly with a certain integration scale ρ. This step corresponds exactly to the scalar case.
2. We infer d-dimensional directional information by reducing $S^L(U(x))$ with $\mathrm{tr}_A$ by means of the block operator matrix $\mathrm{Tr}_A$, leading to a symmetric d × d-matrix S,
$$S(x) := \mathrm{Tr}_A\, S^L(U(x)), \qquad (13)$$
for example $S = J_\rho$ if $A = I_m$.
3. The symmetric d × d-matrix S is spectrally decomposed, and the following mapping is applied to its eigenvalues:
$$H : \mathbb{R}^d_+ \to \mathbb{R}^d, \quad (\lambda_1, \ldots, \lambda_d) \mapsto \frac{c}{\lambda_1 + \cdots + \lambda_d}\,(\lambda_d, \lambda_{d-1}, \ldots, Kc \cdot \lambda_1), \qquad (14)$$
with constants c, K > 0. H applied to S yields the steering matrix M,
$$M := H(S). \qquad (15)$$
Observe that the ellipsoid associated with the matrix M is flipped compared with the one accompanying S and, depending on the choice of K, more eccentric.
4. Finally we enlarge the d × d-matrix M to an md × md-matrix $\mathcal{M}$ (the matrix M(U) in (9)) by the extension operation:
$$\mathcal{M} = M \otimes \begin{pmatrix} I_m & \cdots & I_m \\ \vdots & \ddots & \vdots \\ I_m & \cdots & I_m \end{pmatrix} \qquad (16)$$
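The four steps can be traced at a single pixel for d = m = 2, under simplifying assumptions of our own: no Gaussian smoothing (ρ → 0), A = I_m, and H implemented literally as (14) reads. For A = I_m the reduction gives $S_{ij} = \mathrm{tr}(\partial_{x_i}U\,\partial_{x_j}U) = \sum_{p,q} \partial_{x_i}U_{pq}\,\partial_{x_j}U_{pq}$ (U is symmetric), which is why S coincides with $J_\rho$ in step 2. The zero return in flat regions is our choice, since H is undefined there. All names are hypothetical.

```python
import math

# Symmetric 2x2 matrices are stored as tuples (a, b, c) ~ [[a, b], [b, c]].


def trace_prod(p, q):
    """tr(P Q) for symmetric 2x2 P, Q: the sum of entrywise products."""
    return p[0] * q[0] + 2.0 * p[1] * q[1] + p[2] * q[2]


def reduced_structure_tensor(dux, duy):
    """Steps 1-2 with A = I_m and no smoothing: S_ij = tr(d_i U d_j U)."""
    return (trace_prod(dux, dux), trace_prod(dux, duy), trace_prod(duy, duy))


def steering_matrix(s, c, K):
    """Step 3: apply H from (14) to the eigenvalues of S (d = 2), keep eigenvectors."""
    a, b, cc = s
    mean = 0.5 * (a + cc)
    rad = math.hypot(0.5 * (a - cc), b)
    l1, l2 = mean + rad, mean - rad            # l1 >= l2 >= 0
    if l1 + l2 <= 0.0:
        # Flat region: H is undefined; the speed |M grad U| vanishes there anyway.
        return (0.0, 0.0, 0.0)
    t = 0.5 * math.atan2(2.0 * b, a - cc)      # angle of the eigenvector of l1
    cs, sn = math.cos(t), math.sin(t)
    mu1 = c * l2 / (l1 + l2)                   # along the dominant (gradient) direction
    mu2 = c * (K * c * l1) / (l1 + l2)         # along the orthogonal direction
    return (mu1 * cs * cs + mu2 * sn * sn,
            (mu1 - mu2) * cs * sn,
            mu1 * sn * sn + mu2 * cs * cs)


def extend(msmall, m=2):
    """Step 4: I_m-extension (16), i.e. the Kronecker product M (x) I_m."""
    a, b, cc = msmall
    blocks = [[a, b], [b, cc]]
    full = [[0.0] * (2 * m) for _ in range(2 * m)]
    for i in range(2):
        for j in range(2):
            for k in range(m):
                full[i * m + k][j * m + k] = blocks[i][j]
    return full


S = reduced_structure_tensor((2.0, 0.0, 0.0), (0.0, 0.0, 1.0))   # (4.0, 0.0, 1.0)
M = steering_matrix(S, c=0.5, K=2.0)                              # about (0.1, 0, 0.4)
print(S, M)
print(extend(M))
```

In this example the gradient of U is strongest along x, and M assigns the larger eigenvalue to the y-direction, which illustrates the flip mentioned after (15). If (14) is meant with K rather than Kc in its last component, only the line computing `mu2` changes.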
4 Matrix-Valued Numerical Schemes
In the context of PDE-based mathematical morphology, first-order finite difference methods such as the Osher-Sethian scheme [25] and the Rouy-Tourin method [28] are reasonable choices for solving the scalar PDE (4). We choose the latter in our experiments. For the sake of brevity we present it in its two-dimensional form:
$$u^{n+1}_{i,j} = u^n_{i,j} + \tau \left[ \left(\max\left(\max\left(-\tfrac{1}{h_x} D^x_- u^n_{i,j}, 0\right), \max\left(\tfrac{1}{h_x} D^x_+ u^n_{i,j}, 0\right)\right)\right)^2 + \left(\max\left(\max\left(-\tfrac{1}{h_y} D^y_- u^n_{i,j}, 0\right), \max\left(\tfrac{1}{h_y} D^y_+ u^n_{i,j}, 0\right)\right)\right)^2 \right]^{1/2} \qquad (17)$$
In this formulation $u^n_{i,j}$ denotes the grey value of the image u at the pixel centred at $(ih_x, jh_y) \in \mathbb{R}^2$ at time level nτ of the evolution, and $h_x, h_y$ are the spatial grid sizes. Additionally we use the standard abbreviations for forward and backward difference operators, i.e., $D^x_+ u^n_{i,j} := u^n_{i+1,j} - u^n_{i,j}$ and $D^x_- u^n_{i,j} := u^n_{i,j} - u^n_{i-1,j}$. This scheme approximates, in the pixel $(ih_x, jh_y)$,
$$u_x \approx \max\left(\max\left(-\tfrac{1}{h_x} D^x_- u^n_{i,j}, 0\right), \max\left(\tfrac{1}{h_x} D^x_+ u^n_{i,j}, 0\right)\right), \qquad (18)$$
$$u_y \approx \max\left(\max\left(-\tfrac{1}{h_y} D^y_- u^n_{i,j}, 0\right), \max\left(\tfrac{1}{h_y} D^y_+ u^n_{i,j}, 0\right)\right). \qquad (19)$$
Fig. 1. (a) Top left: 2D slice of original 3D matrix field. (b) Top right: Adaptive dilation of the original data with K = 25, ρ = 1 after t = 0.3. (c) Bottom left: Standard PDE-based dilation mimicking a ball-shaped structuring element after t = 1. (d) Bottom right: CED-filtering with ρ = 4 after t = 10.
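A direct transcription of (17) for a scalar 2D image is sketched below; replicated borders are our assumption. The optional `steer` argument inserts the upwind values (18) and (19) into (4) instead of |∇u|; this is only one hypothetical reading of the "obvious" adaptive modification mentioned next, not the authors' code.

```python
import math


def rouy_tourin_step(u, tau, hx=1.0, hy=1.0, steer=(1.0, 0.0, 1.0)):
    """One explicit step of the Rouy-Tourin dilation scheme (17).

    u[i][j] is the grey value at (i*hx, j*hy); borders are replicated.
    steer = (a, b, c) is a constant M = [[a, b], [b, c]]; the identity
    (default) reproduces (17) exactly.
    """
    a, b, c = steer
    nx, ny = len(u), len(u[0])
    out = [row[:] for row in u]
    for i in range(nx):
        for j in range(ny):
            v = u[i][j]
            dxm = v - u[max(i - 1, 0)][j]            # D^x_- u
            dxp = u[min(i + 1, nx - 1)][j] - v       # D^x_+ u
            dym = v - u[i][max(j - 1, 0)]            # D^y_- u
            dyp = u[i][min(j + 1, ny - 1)] - v       # D^y_+ u
            ux = max(max(-dxm / hx, 0.0), max(dxp / hx, 0.0))   # (18)
            uy = max(max(-dym / hy, 0.0), max(dyp / hy, 0.0))   # (19)
            out[i][j] = v + tau * math.hypot(a * ux + b * uy, b * ux + c * uy)
    return out


# A single bright pixel spreads to its four neighbours; the peak stays put.
img = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
print(rouy_tourin_step(img, 0.5))
# [[0.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 0.0]]
```

The upwind maxima only ever pick up increases from brighter neighbours, so the step is a discrete dilation; the experiments in Section 5 use τ = 0.1 with unit grid size.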
Using these approximations, we modify the original Rouy-Tourin scheme (17) in an obvious manner to obtain a numerical scheme for the adaptive version of the PDE-based dilation (3). The extension to higher dimensions poses no problem. Since linear combinations and elementary functions such as the square, square root or absolute value function for matrix fields are now at our disposal, it is straightforward to define one-sided differences in the x-direction for 2D matrix fields of m × m-matrices:
$$D^x_+ U^n(i, j) := U^n((i+1)h_x, jh_y) - U^n(ih_x, jh_y) \in \mathrm{Sym}_m(\mathbb{R}), \qquad (20)$$
$$D^x_- U^n(i, j) := U^n(ih_x, jh_y) - U^n((i-1)h_x, jh_y) \in \mathrm{Sym}_m(\mathbb{R}). \qquad (21)$$
Fig. 2. (a) Left: 2D slice of 3D DT-MRI data set. (b) Right: Adaptive dilation of the original data with K = 10, ρ = 1, t = 0.5.
In order to avoid confusion with the subscript notation for matrix components, we use the notation U(i, j) to indicate the (matrix) value of the matrix field evaluated at the voxel centred at $(ih_x, jh_y) \in \mathbb{R}^2$. The y-direction (and the z-direction in 3D) is treated accordingly. The notion of supremum and infimum of two matrices, as needed in a matrix variant of Rouy-Tourin, has been provided by (8). With these generalisations at our disposal, a modified, adaptive version of the Rouy-Tourin scheme is now available in the setting of matrix fields, obtained simply by replacing the grey values $u^n_{ij}$ by the matrices $U^n(i, j)$.
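Replacing grey values by matrices can be sketched as follows for a field of symmetric 2 × 2 matrices (m = 2 for brevity). Under our assumptions, the steering matrix is omitted (isotropic matrix-valued dilation), borders are replicated, and the "0" in (18) and (19) becomes the zero matrix, with max taken from (8) and the square and square root defined spectrally. Tiny negative eigenvalues caused by rounding are clamped before the square root. All names are hypothetical.

```python
import math

# Symmetric 2x2 matrices as tuples (a, b, c) ~ [[a, b], [b, c]].
ZERO = (0.0, 0.0, 0.0)


def mat_fun(h, m):
    """h(U) = V diag(h(l1), h(l2)) V^T via the closed-form 2x2 eigendecomposition."""
    a, b, c = m
    mean, rad = 0.5 * (a + c), math.hypot(0.5 * (a - c), b)
    t = 0.5 * math.atan2(2.0 * b, a - c)
    cs, sn = math.cos(t), math.sin(t)
    h1, h2 = h(mean + rad), h(mean - rad)
    return (h1 * cs * cs + h2 * sn * sn, (h1 - h2) * cs * sn, h1 * sn * sn + h2 * cs * cs)


def add(p, q):
    return (p[0] + q[0], p[1] + q[1], p[2] + q[2])


def sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def scale(s, p):
    return (s * p[0], s * p[1], s * p[2])


def mat_max(p, q):
    """max(A, B) = (A + B + |A - B|) / 2 from (8)."""
    return scale(0.5, add(add(p, q), mat_fun(abs, sub(p, q))))


def upwind(dm, dp, h):
    """Matrix counterpart of (18)/(19): max(max(-D_-/h, 0), max(D_+/h, 0))."""
    return mat_max(mat_max(scale(-1.0 / h, dm), ZERO), mat_max(scale(1.0 / h, dp), ZERO))


def matrix_rouy_tourin_step(U, tau, hx=1.0, hy=1.0):
    """One explicit step of (17) with grey values replaced by matrices U[i][j]."""
    nx, ny = len(U), len(U[0])
    out = [row[:] for row in U]
    for i in range(nx):
        for j in range(ny):
            V = U[i][j]
            ux = upwind(sub(V, U[max(i - 1, 0)][j]), sub(U[min(i + 1, nx - 1)][j], V), hx)
            uy = upwind(sub(V, U[i][max(j - 1, 0)]), sub(U[i][min(j + 1, ny - 1)], V), hy)
            sq = add(mat_fun(lambda s: s * s, ux), mat_fun(lambda s: s * s, uy))
            speed = mat_fun(lambda s: math.sqrt(max(s, 0.0)), sq)
            out[i][j] = add(V, scale(tau, speed))
    return out


field = [[ZERO, ZERO, ZERO], [ZERO, (1.0, 0.0, 0.5), ZERO], [ZERO, ZERO, ZERO]]
for row in matrix_rouy_tourin_step(field, 0.5):
    print([tuple(round(x, 6) for x in entry) for entry in row])
```

For a field of diagonal matrices, as in the usage lines, the step acts channel by channel like the scalar scheme; the matrix operations only make a difference once the eigenvectors vary across the field.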
5 Experiments
Fig. 3. (a) Top left: Enlarged section of the original data of Figure 2 showing the genu area. (b) Top right: Adaptive dilation of the original data with K = 10, ρ = 1, t = 0.5. (c) Bottom left: Standard PDE-based dilation mimicking a ball-shaped structuring element with t = 0.5. (d) Bottom right: CED-filtering with ρ = 1 after t = 0.5.
Fig. 4. (a) Top left: Enlarged section of the original data of Figure 2 showing the splenium area. (b) Top right: Adaptive dilation of the original data with K = 10, ρ = 1, t = 0.5. (c) Bottom left: Standard PDE-based dilation mimicking a ball-shaped structuring element with t = 0.5. (d) Bottom right: CED-filtering with ρ = 1 after t = 0.5.
The matrix data are visualised as an ellipsoid in each voxel via the level sets of the quadratic form $\{v \in \mathbb{R}^3 : v^\top U^{-2}(i, j)\, v = \text{const.}\}$ associated with the matrix
$U(i, j) \in \mathrm{Sym}^+_3(\mathbb{R})$ representing the matrix field at voxel $(ih_x, jh_y)$. By using $U^{-2}$, the lengths of the semi-axes of the ellipsoid correspond directly to the three eigenvalues of the matrix. Changing the constant const. amounts to a mere scaling of the ellipsoids. Note that only positive definite matrices produce ellipsoids as level sets of their quadratic form. In all our experiments we compare the results of the proposed matrix-valued adaptive dilation with the isotropic dilation [8] and with the matrix-valued coherence-enhancing diffusion from [10]. For the explicit numerical schemes we used a time step size of 0.1, grid size $h_x = h_y = 1$, and $c = 0.01 \cdot K$ in (14). Figure 1 shows a synthetic data set of size 32 × 32 representing an interrupted diagonal stripe built from cigar-shaped ellipsoids of equal size. All methods succeed to some degree in filling the gaps. In the case of the proposed adaptive dilation the gap is filled almost completely with tensors comparable in size to the original ones, while the width of the stripe is not altered at all. However, the numerical scheme has a slight bias towards the directions of the coordinate system, which results in mild artefacts. Standard dilation fills the gap basically as a side effect of the isotropic dilation process, which also leads to a considerable widening of the ribbon-like structure. CED for matrix fields does produce small cigar-shaped ellipsoids at the location of the gap. But the process is considerably slower than any of the dilation processes
and the neighbouring ellipsoids become smaller due to the property of mass conservation. Additionally, an undesirable widening of the stripe is observed. We also tested the proposed method on a real DT-MRI data set of a human head consisting of a 128 × 128 × 38-field of positive definite matrices. Figure 2 shows the lateral ventricles in a 40 × 55 2D section before and after applying adaptive dilation with speed parameter K = 10, integration scale ρ = 1 and stopping time t = 0.5. For a better comparison we display two enlarged regions of interest in Figures 3 and 4, namely the genu and the splenium areas, respectively. We observe that adaptive dilation preserves the shape of the ventricles better than isotropic dilation, while slightly enhancing the directional structure of the fibre tracts surrounding the ventricles. Due to measurement errors the fibre tracts are interrupted in the original Figures 3(a) and 4(a). These holes in the anisotropic regions (splenium) are quickly filled by the adaptive dilation, while CED-filtering takes much longer to do so.
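The visualisation rule stated at the beginning of this section, that the level sets of $v^\top U^{-2} v$ have semi-axes equal to the eigenvalues of U, follows in one line from the spectral decomposition. In the sketch below, $e_k$ are orthonormal eigenvectors and $\kappa^2$ stands for the constant const. (our notation):

```latex
U = \sum_{k=1}^{3} \lambda_k\, e_k e_k^\top,\quad \lambda_k > 0
\;\Longrightarrow\;
v^\top U^{-2} v = \sum_{k=1}^{3} \frac{(e_k^\top v)^2}{\lambda_k^2} = \kappa^2
\;\Longleftrightarrow\;
\sum_{k=1}^{3} \left( \frac{e_k^\top v}{\kappa\,\lambda_k} \right)^2 = 1 .
```

The level set is therefore an ellipsoid with semi-axes $\kappa\lambda_k$ along $e_k$: changing $\kappa$ only rescales it, and if some $\lambda_k = 0$ the inverse $U^{-2}$ does not exist, consistent with the remark that only positive definite matrices yield ellipsoids.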
6 Conclusion
In this article we have presented a novel method for an adaptive, PDE-based dilation process in the setting of matrix fields. The evolution governed by a matrix-valued PDE is guided by a steering tensor, the construction of which relies on an extended structure tensor concept for matrix fields. A matrix-valued extension of the Rouy-Tourin scheme that allows directional information to be included is employed to solve the novel PDE. Experiments on positive semidefinite DT-MRI and synthetic data confirm that the novel adaptive dilation process displays line-enhancing and gap-closing qualities, and as such it is superior to standard isotropic dilation, which extends structures in all directions. It is also a valuable alternative, in terms of quality and speed, to coherence-enhancing diffusion filtering for matrix fields, an anisotropic process that also aims at enhancing flow-like structures but may suffer from dissipative effects. Future research will concentrate on improving the numerical realisation of our adaptive dilation.
Acknowledgement The financial support of the German Academic Exchange Service (DAAD) for the third author is gratefully acknowledged.
References 1. Alvarez, L., Guichard, F., Lions, P.-L., Morel, J.-M.: Axioms and fundamental equations in image processing. Archive for Rational Mechanics and Analysis 123, 199–257 (1993) 2. Arehart, A.B., Vincent, L., Kimia, B.B.: Mathematical morphology: The Hamilton–Jacobi connection. In: Proc. Fourth International Conference on Computer Vision, Berlin, pp. 215–219. IEEE Computer Society Press, Los Alamitos (1993)
3. Bigün, J.: Vision with Direction. Springer, Berlin (2006) 4. Bigün, J., Granlund, G.H., Wiklund, J.: Multidimensional orientation estimation with applications to texture analysis and optical flow. IEEE Transactions on Pattern Analysis and Machine Intelligence 13(8), 775–790 (1991) 5. Breuß, M., Burgeth, B., Weickert, J.: Anisotropic continuous-scale morphology. In: Martí, J., Benedí, J.M., Mendonça, A.M., Serrat, J. (eds.) IbPRIA 2007. LNCS, vol. 4478, pp. 515–522. Springer, Heidelberg (2007) 6. Brockett, R.W., Maragos, P.: Evolution equations for continuous-scale morphological filtering. IEEE Transactions on Signal Processing 42, 3377–3386 (1994) 7. Brox, T., Weickert, J., Burgeth, B., Mrázek, P.: Nonlinear structure tensors. Image and Vision Computing 24(1), 41–55 (2006) 8. Burgeth, B., Bruhn, A., Didas, S., Weickert, J., Welk, M.: Morphology for tensor data: Ordering versus PDE-based approach. Image and Vision Computing 25(4), 496–511 (2007) 9. Burgeth, B., Didas, S., Florack, L., Weickert, J.: A generic approach to diffusion filtering of matrix-fields. Computing 81, 179–197 (2007) 10. Burgeth, B., Didas, S., Weickert, J.: A general structure tensor concept and coherence-enhancing diffusion filtering for matrix fields. Technical Report 197, Department of Mathematics, Saarland University, Saarbrücken, Germany (July 2007); to appear in: Laidlaw, D., Weickert, J. (eds.): Visualization and Processing of Tensor Fields. Springer, Heidelberg (2009) 11. Chefd’Hotel, C., Tschumperlé, D., Deriche, R., Faugeras, O.: Constrained flows of matrix-valued functions: Application to diffusion tensor regularization. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2350, pp. 251–265. Springer, Heidelberg (2002) 12. Di Zenzo, S.: A note on the gradient of a multi-image. Computer Vision, Graphics and Image Processing 33, 116–125 (1986) 13. Feddern, C., Weickert, J., Burgeth, B., Welk, M.: Curvature-driven PDE methods for matrix-valued images. 
International Journal of Computer Vision 69(1), 91–103 (2006) 14. Förstner, W., Gülch, E.: A fast operator for detection and precise location of distinct points, corners and centres of circular features. In: Proc. ISPRS Intercommission Conference on Fast Processing of Photogrammetric Data, Interlaken, Switzerland, June 1987, pp. 281–305 (1987) 15. Goutsias, J., Heijmans, H.J.A.M., Sivakumar, K.: Morphological operators for image sequences. Computer Vision and Image Understanding 62, 326–346 (1995) 16. Goutsias, J., Vincent, L., Bloomberg, D.S. (eds.): Mathematical Morphology and its Applications to Image and Signal Processing. Computational Imaging and Vision, vol. 18. Kluwer, Dordrecht (2000) 17. Heijmans, H.J.A.M.: Morphological Image Operators. Academic Press, Boston (1994) 18. Heijmans, H.J.A.M., Roerdink, J.B.T.M. (eds.): Mathematical Morphology and its Applications to Image and Signal Processing. Computational Imaging and Vision, vol. 12. Kluwer, Dordrecht (1998) 19. Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, Cambridge (1990) 20. Kramer, H.P., Bruckner, J.B.: Iterations of a non-linear transformation for enhancement of digital images. Pattern Recognition 7, 53–58 (1975) 21. Lerallut, R., Decencière, E., Meyer, F.: Image filtering using morphological amoebas. Image and Vision Computing 25(4), 395–404 (2007)
22. Louverdis, G., Vardavoulia, M.I., Andreadis, I., Tsalides, P.: A new approach to morphological color image processing. Pattern Recognition 35, 1733–1741 (2002) 23. Matheron, G.: Eléments pour une théorie des milieux poreux. Masson, Paris (1967) 24. Matheron, G.: Random Sets and Integral Geometry. Wiley, New York (1975) 25. Osher, S., Fedkiw, R.P.: Level Set Methods and Dynamic Implicit Surfaces. Applied Mathematical Sciences, vol. 153. Springer, New York (2002) 26. Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton–Jacobi formulations. Journal of Computational Physics 79, 12–49 (1988) 27. Rao, A.R., Schunck, B.G.: Computing oriented texture fields. CVGIP: Graphical Models and Image Processing 53, 157–185 (1991) 28. Rouy, E., Tourin, A.: A viscosity solutions approach to shape-from-shading. SIAM Journal on Numerical Analysis 29, 867–884 (1992) 29. Sapiro, G., Kimmel, R., Shaked, D., Kimia, B.B., Bruckstein, A.M.: Implementing continuous-scale morphology via curve evolution. Pattern Recognition 26, 1363–1372 (1993) 30. Schultz, T., Burgeth, B., Weickert, J.: Flexible segmentation and smoothing of DT-MRI fields through a customizable structure tensor. In: Bebis, G., Boyle, R., Parvin, B., Koracin, D., Remagnino, P., Nefian, A., Meenakshisundaram, G., Pascucci, V., Zara, J., Molineros, J., Theisel, H., Malzbender, T. (eds.) ISVC 2006. LNCS, vol. 4291, pp. 455–464. Springer, Heidelberg (2006) 31. Serra, J.: Echantillonnage et estimation des phénomènes de transition minier. PhD thesis, University of Nancy, France (1967) 32. Serra, J.: Image Analysis and Mathematical Morphology, vol. 1. Academic Press, London (1982) 33. Serra, J.: Image Analysis and Mathematical Morphology, vol. 2. Academic Press, London (1988) 34. Soille, P.: Morphological Image Analysis, 2nd edn. Springer, Berlin (2003) 35. van den Boomgaard, R.: Mathematical Morphology: Extensions Towards Computer Vision.
PhD thesis, University of Amsterdam, The Netherlands (1992) 36. Weickert, J.: Coherence-enhancing diffusion of colour images. In: Sanfeliu, A., Villanueva, J.J., Vitrià, J. (eds.) Proc. Seventh National Symposium on Pattern Recognition and Image Analysis, Barcelona, Spain, April 1997, vol. 1, pp. 239–244 (1997) 37. Weickert, J.: Coherence-enhancing diffusion filtering. International Journal of Computer Vision 31(2/3), 111–127 (1999) 38. Weickert, J., Brox, T.: Diffusion and regularization of vector- and matrix-valued images. In: Nashed, M.Z., Scherzer, O. (eds.) Inverse Problems, Image Analysis, and Medical Imaging. Contemporary Mathematics, vol. 313, pp. 251–268. AMS, Providence (2002) 39. Weickert, J., Hagen, H. (eds.): Visualization and Processing of Tensor Fields. Springer, Berlin (2006)
On Semi-implicit Splitting Schemes for the Beltrami Color Flow
Lorina Dascal¹, Guy Rosman¹, Xue-Cheng Tai²,³, and Ron Kimmel¹
¹ Department of Computer Science, Technion – Israel Institute of Technology, 32000, Haifa, Israel {lorina,rosman,ron}@cs.technion.ac.il
² Division of Mathematical Sciences, SPMS, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore
³ Department of Mathematics, University of Bergen, Johannes Brunsgate 12, 5007, Bergen, Norway
Abstract. The Beltrami flow is an efficient non-linear filter that has been shown to be effective for color image processing. The corresponding anisotropic diffusion operator strongly couples the spectral components. Usually, this flow is implemented by explicit schemes, which are stable only for small time steps and therefore require many iterations. In this paper we introduce a semi-implicit scheme based on the locally one-dimensional (LOD) and additive operator splitting (AOS) schemes for implementing the anisotropic Beltrami operator. The mixed spatial derivatives are treated explicitly, while the non-mixed derivatives are approximated in a semi-implicit manner. Numerical experiments demonstrate the stability of the proposed scheme. Accuracy and efficiency of the splitting schemes are tested in applications such as scale-space analysis and denoising. In order to further accelerate the convergence of the numerical scheme, the reduced rank extrapolation (RRE) vector extrapolation technique is employed.
1 Introduction
Nonlinear diffusion filters based on partial differential equations (PDEs) have been extensively used in the last decade for different tasks in image processing. Their efficient implementation requires designing numerical schemes in which accuracy, stability, and computational cost all play important roles. The Beltrami image flow is an example of a non-linear filter that is efficient for color image processing. It treats the image as a 2-D manifold embedded in a hybrid spatial-feature space. Minimization of the area of the image surface yields the Beltrami flow. The corresponding diffusion operator is anisotropic and strongly couples the spectral components. Due to its anisotropy and non-separability, there has so far been neither an efficient implicit nor an operator-splitting-based numerical scheme for the partial differential equation that describes the Beltrami flow in color.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 259–270, 2009. © Springer-Verlag Berlin Heidelberg 2009
260
L. Dascal et al.
Usual discretizations of this filter are based on explicit schemes, which limit the time step and therefore result in a large number of iterations. In [1] an acceleration technique based on the reduced rank extrapolation (RRE) algorithm [2, 3] was proposed in order to speed up the slow convergence of the explicit scheme. As an alternative to the explicit scheme, an approximation using the short-time kernel of the Beltrami operator was suggested in [4]. Although unconditionally stable, this method is still computationally demanding, since computing the kernel involves geodesic distance computation around each pixel. The bilateral filter, which can be shown to be a Euclidean approximation of the Beltrami kernel, was studied in different contexts (see [5], [6], [7], [8], [9], [10]). Recently, a related filter, the nonlocal means filter, was proposed in [11] and shown to be useful in denoising gray-scale and color images. In this paper we propose to approximate the system of nonlinear coupled equations given by the Beltrami flow by a semi-implicit finite difference scheme based on operator splitting. Additive operator splitting (AOS) schemes were first developed for (nonlinear elliptic/parabolic) monotone equations and Navier-Stokes equations [12, 13]. In image processing applications, the AOS scheme was found to be an efficient way of approximating the Perona-Malik filter [14], especially if symmetry in scale-space is required. The AOS scheme is first order in time, semi-implicit, and unconditionally stable with respect to its time step [13, 14]. In the early 1950s (see [15]) the alternating direction implicit (ADI) method was introduced, and in [16] the LOD (locally one-dimensional) splitting method was proposed. The LOD scheme and other multiplicative splitting methods were employed in the context of nonlinear diffusion image filtering in [17]. We stress that the main characteristic of this class of equations, which allows splitting, is local isotropy.
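For an isotropic operator, the kind of equation the paragraph above says admits splitting, an AOS step is easy to sketch. The example below applies the standard AOS form $u^{n+1} = \frac{1}{2}\sum_{l \in \{x,y\}} (I - 2\tau A_l)^{-1} u^n$ to the linear heat equation on a unit grid with reflecting boundaries; each factor is a tridiagonal solve (Thomas algorithm). This is our own minimal illustration, not the scheme proposed in this paper, and it does not address the mixed derivatives of the Beltrami operator.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm.

    Row i reads lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i];
    lower[0] and upper[-1] are ignored (set them to 0).
    """
    n = len(diag)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * cp[i - 1]
        cp[i] = upper[i] / denom
        dp[i] = (rhs[i] - lower[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def implicit_1d(line, s):
    """Solve (I - s*A) x = line, A = 1D Laplacian with reflecting boundaries."""
    n = len(line)
    lower = [0.0] + [-s] * (n - 1)
    upper = [-s] * (n - 1) + [0.0]
    diag = [1.0 + s * ((i > 0) + (i < n - 1)) for i in range(n)]
    return thomas(lower, diag, upper, line)


def aos_step(u, tau):
    """One AOS step for u_t = u_xx + u_yy on u[y][x]:
    u_new = 1/2 * [(I - 2 tau A_x)^{-1} + (I - 2 tau A_y)^{-1}] u."""
    ny, nx = len(u), len(u[0])
    along_x = [implicit_1d(row, 2.0 * tau) for row in u]
    along_y = [implicit_1d([u[y][x] for y in range(ny)], 2.0 * tau) for x in range(nx)]
    return [[0.5 * (along_x[y][x] + along_y[x][y]) for x in range(nx)] for y in range(ny)]


spike = [[0.0] * 5 for _ in range(5)]
spike[2][2] = 1.0
smoothed = aos_step(spike, 100.0)      # a huge time step stays bounded
print(sum(map(sum, smoothed)), max(map(max, smoothed)))
```

Even with τ = 100 the result stays within the range of the initial data and preserves the total grey value, which illustrates the unconditional stability mentioned above; the price is a splitting error that grows with τ.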
However, in the case of the anisotropic Beltrami operator, the main difficulty in splitting stems from the presence of the mixed derivatives. To overcome this problem, we suggest the following semi-implicit scheme: the spatial mixed derivatives are discretized explicitly at the current time step nΔt, while the terms that do not contain mixed derivatives are approximated using an average of two time levels, nΔt and (n + 1)Δt (Crank-Nicolson scheme). As our equations are nonlinear, a stability proof of the corresponding finite difference scheme is a non-trivial task. We provide numerical experiments which indicate that the LOD and the AOS splitting schemes for the nonlinear Beltrami color filter are stable for a wide range of time steps. We demonstrate the efficiency and stability of the splitting in applications such as Beltrami-based scale space and Beltrami-based denoising. In order to further expedite the LOD/AOS splitting schemes, we show how to speed up their convergence by using the RRE (reduced rank extrapolation) technique. The RRE method was introduced by Mešina and Eddy [2, 3] to speed up the convergence of general sequences of vectors without explicit knowledge of the sequence generator. This technique was applied in [1] in order to speed up the slow convergence of the standard explicit scheme for the Beltrami color flow. In this paper we show that in applications such as scale-space and denoising of color images, the semi-implicit LOD/AOS schemes can also be accelerated using the RRE technique.
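The RRE idea, extrapolating from a few iterates without knowing the iteration that produced them, can be sketched in its constrained least-squares form. The extrapolant is $s = \sum_{j=0}^{k} \gamma_j x_j$, where γ minimises $\|\sum_j \gamma_j (x_{j+1} - x_j)\|$ subject to $\sum_j \gamma_j = 1$. The plain-Python version below solves the small KKT system directly; it is an illustrative sketch with hypothetical names, not the implementation used in [1], and practical codes usually prefer a QR-based solve for numerical robustness.

```python
def solve_linear(a, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def rre(xs):
    """Reduced rank extrapolation from iterates xs = [x_0, ..., x_{k+1}].

    Finds gamma_0..gamma_k minimising ||sum_j gamma_j (x_{j+1} - x_j)||
    subject to sum_j gamma_j = 1 and returns sum_j gamma_j x_j.
    """
    k = len(xs) - 2
    diffs = [[q - p for p, q in zip(xs[j], xs[j + 1])] for j in range(k + 1)]
    gram = [[sum(p * q for p, q in zip(di, dj)) for dj in diffs] for di in diffs]
    # KKT system of the constrained least-squares problem:
    # [[G, 1], [1^T, 0]] [gamma; mu] = [0; 1]
    kkt = [row + [1.0] for row in gram] + [[1.0] * (k + 1) + [0.0]]
    gamma = solve_linear(kkt, [0.0] * (k + 1) + [1.0])[:k + 1]
    return [sum(g * x[i] for g, x in zip(gamma, xs)) for i in range(len(xs[0]))]


# Linear fixed-point iteration x <- A x + b in R^2, started at 0.
A = [[0.5, 0.1], [0.2, 0.3]]
b = [1.0, 1.0]
xs = [[0.0, 0.0]]
for _ in range(3):
    x = xs[-1]
    xs.append([A[0][0] * x[0] + A[0][1] * x[1] + b[0],
               A[1][0] * x[0] + A[1][1] * x[1] + b[1]])
print(rre(xs), [0.8 / 0.33, 0.7 / 0.33])   # extrapolant vs. exact fixed point
```

For a linear iteration whose matrix has a degree-2 minimal polynomial, three steps and k = 2 already recover the fixed point up to rounding; for the nonlinear LOD/AOS iterations of this paper, RRE can only be expected to accelerate convergence, not to hit the limit exactly.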
On Semi-implicit Splitting Schemes for the Beltrami Color Flow
261
This paper is organized as follows: In Section 2 we briefly summarize the Beltrami framework. In Section 3 we briefly review general semi-implicit operator splitting schemes. In Section 4 we propose a semi-implicit splitting scheme for the anisotropic Beltrami operator, based on the LOD/AOS schemes. In Section 5 we demonstrate the efficiency and stability of the LOD/AOS splitting schemes for Beltrami-based scale-space and Beltrami-based denoising. Furthermore, we propose to accelerate the LOD/AOS schemes using the RRE technique. Section 6 concludes the paper.
2 The Beltrami Framework
Let us briefly review the Beltrami framework for non-linear diffusion in computer vision [18, 19, 20, 21]. We represent images as embedding maps of a Riemannian manifold in a higher-dimensional space. We denote the map by $U : \Sigma \to M$, where Σ is a two-dimensional surface, with $(\sigma^1, \sigma^2)$ denoting coordinates on it. M is the spatial-feature manifold, embedded in $\mathbb{R}^{d+2}$, where d is the number of image channels. For example, a gray-level image can be represented as a 2D surface embedded in $\mathbb{R}^3$. The map U in this case is $U(\sigma^1, \sigma^2) = (\sigma^1, \sigma^2, I(\sigma^1, \sigma^2))$, where I is the image intensity. For color images, U is given by $U(\sigma^1, \sigma^2) = (\sigma^1, \sigma^2, I^1(\sigma^1, \sigma^2), I^2(\sigma^1, \sigma^2), I^3(\sigma^1, \sigma^2))$, where $I^1, I^2, I^3$ are the three components of the color vector. Next, we choose a Riemannian metric on this surface, g, with elements denoted by $g_{ij}$. The canonical choice of coordinates in image processing is Cartesian (we denote them here by $x_1$ and $x_2$). For such a choice, which we follow in the rest of the paper, we identify $\sigma^1 = x_1$ and $\sigma^2 = x_2$. In this case, $\sigma^1$ and $\sigma^2$ are the image coordinates. We denote the elements of the inverse of the metric by superscripts $g^{ij}$, and the determinant by $g = \det(g_{ij})$. Once images are defined as embeddings of Riemannian manifolds, it is natural to look for a measure on this space of embedding maps. Denote by (Σ, g) the image manifold and its metric, and by (M, h) the spatial-feature manifold and its metric. Then the functional S[U] assigns a real number to a map $U : \Sigma \to M$,
$$S[U, g_{ij}, h_{ab}] = \int d^s\sigma\, \sqrt{g}\, \|dU\|^2_{g,h}, \qquad (1)$$
where s is the dimension of Σ, g is the determinant of the image metric, and the range of indices is $i, j = 1, 2, \ldots, \dim(\Sigma)$ and $a, b = 1, 2, \ldots, \dim(M)$. The integrand $\|dU\|^2_{g,h}$ is expressed in a local coordinate system by $\|dU\|^2_{g,h} = (\partial_{x_i} U^a)\, g^{ij}\, (\partial_{x_j} U^b)\, h_{ab}$.
This functional, for $\dim(\Sigma) = 2$ and $h_{ab} = \delta_{ab}$, was first proposed by Polyakov [22] in the context of high-energy physics, in the theory known as string theory. The elements of the induced metric for color images with Cartesian color coordinates are
$$G = (g_{ij}) = \begin{pmatrix} 1 + \beta^2 \sum_{a=1}^{3} (U^a_{x_1})^2 & \beta^2 \sum_{a=1}^{3} U^a_{x_1} U^a_{x_2} \\ \beta^2 \sum_{a=1}^{3} U^a_{x_1} U^a_{x_2} & 1 + \beta^2 \sum_{a=1}^{3} (U^a_{x_2})^2 \end{pmatrix}, \qquad (2)$$
L. Dascal et al.
where a subscript of $U$ denotes a partial derivative and the parameter $\beta > 0$ determines the ratio between the spatial and spectral (color) distances. Using standard methods in calculus of variations, the Euler-Lagrange equations with respect to the embedding (assuming Euclidean embedding space) are
$$0 = -\frac{1}{\sqrt{g}}\, h^{ab} \frac{\delta S}{\delta U^b} = \underbrace{\frac{1}{\sqrt{g}} \operatorname{div}\left(D \nabla U^a\right)}_{\Delta_g U^a}, \qquad (3)$$
where the diffusion matrix is $D = \sqrt{g}\, G^{-1}$. Note that we can write
$$\operatorname{div}(D \nabla U) = \sum_{q,r=1}^{2} \partial_{x_q}\left(d_{qr}\, \partial_{x_r} U\right).$$
The operator that acts on $U$ is the natural generalization of the Laplacian from flat spaces to manifolds. It is called the Laplace-Beltrami operator and is denoted by $\Delta_g$. The parameter $\beta$ in the elements of the metric $g_{ij}$ determines the nature of the flow. In the limits $\beta \to 0$ and $\beta \to \infty$ we obtain, respectively, a linear diffusion flow and a nonlinear flow akin to the TV flow [23] for the case of gray-level images (see [20] for details). The Beltrami scale-space emerges as a gradient descent minimization process
$$U^a_t = -\frac{1}{\sqrt{g}} \frac{\delta S}{\delta U^a} = \Delta_g U^a, \qquad a = 1, 2, 3. \qquad (4)$$
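As a concrete illustration of Eqs. (2) and (3), the following NumPy sketch computes the induced metric, $\sqrt{g}$, and the entries $A, B, C$ of $D = \sqrt{g}\,G^{-1}$ for a color image. It is a minimal sketch under our own assumptions, not the authors' code: derivatives are NumPy central differences (`np.gradient`), axis 1 is taken as $x_1$ and axis 0 as $x_2$, and the grid spacing defaults to one pixel. The function name `beltrami_metric` is hypothetical.

```python
import numpy as np


def beltrami_metric(U, beta, h=1.0):
    """Induced metric (2) and diffusion matrix D = sqrt(g) G^{-1} of a colour image.

    U has shape (H, W, 3); axis 1 is x1 and axis 0 is x2 (assumed convention).
    Derivatives are numpy central differences (one-sided at the border).
    """
    U = np.asarray(U, dtype=float)
    Ux1 = np.gradient(U, h, axis=1)
    Ux2 = np.gradient(U, h, axis=0)
    g11 = 1.0 + beta**2 * np.sum(Ux1**2, axis=-1)
    g12 = beta**2 * np.sum(Ux1 * Ux2, axis=-1)
    g22 = 1.0 + beta**2 * np.sum(Ux2**2, axis=-1)
    sqrtg = np.sqrt(g11 * g22 - g12**2)  # g = det(g_ij) >= 1 (Cauchy-Schwarz)
    # G^{-1} = [[g22, -g12], [-g12, g11]] / g, so D = sqrt(g) G^{-1} = [[A, B], [B, C]]
    A = g22 / sqrtg
    B = -g12 / sqrtg
    C = g11 / sqrtg
    return (g11, g12, g22), sqrtg, (A, B, C)
```

Setting $\beta = 0$ gives $\sqrt{g} = 1$ and $D = I$, the linear-diffusion limit mentioned above; summing $\sqrt{g}\,h^2$ over the pixels gives a discrete version of the area functional $S(U)$ of Eq. (5).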
For Euclidean embedding, the functional in Eq. (1) reduces to
$$S(U) = \int \sqrt{g}\; dx_1\, dx_2. \qquad (5)$$
This geometric measure can be used as a regularization term for color image processing. In the variational framework, the reconstructed image is the minimizer of a cost functional. This functional can be written in the following general form,
$$\Psi(U) = \lambda \sum_{a=1}^{3} \|U^a - F^a\|^2 + S(U),$$
where the parameter $\lambda$ controls the smoothness of the solution and $F$ is the given image. The modified Euler-Lagrange equations as a gradient descent process are
$$U^a_t = -\frac{1}{\sqrt{g}} \frac{\delta \Psi}{\delta U^a} = -\frac{2\lambda}{\sqrt{g}}\left(U^a - F^a\right) + \Delta_g U^a, \qquad a = 1, 2, 3. \qquad (6)$$
On Semi-implicit Splitting Schemes for the Beltrami Color Flow
3 Operator Splitting Schemes
In this section we briefly review standard first-order accurate splitting schemes for diffusion equations. One of the main drawbacks of semi-implicit schemes for such equations in multiple dimensions is that the matrix to be inverted at each time step has no efficient inversion algorithm. To remedy this shortcoming, splitting techniques are commonly employed in solving time-dependent partial differential equations. They reduce problems in multiple spatial dimensions to a sequence of one-dimensional problems, which are easier to solve. One of the simplest splitting schemes in the class of multiplicative operator splitting schemes is the locally one-dimensional (LOD) scheme [16]. The LOD scheme only needs to invert one tridiagonal matrix for each direction. It is simple to implement, unconditionally stable, and first-order accurate. However, the system matrix is not axis-symmetric (the result depends on the order in which the axes are processed), a property that may be important in some cases. If axis symmetry is required, one can use the additive operator splitting (AOS) scheme [13], which was in fact designed for parallel implementation of splitting methods. Even for sequential implementations, AOS is almost as efficient as LOD: instead of multiplying the one-dimensional operators, one applies the two inverses independently and then averages the results. Note that the matrices for AOS use $2\Delta t$ instead of $\Delta t$. Applying dimensional splitting schemes to Beltrami-type equations is not trivial. Our goal is to construct a splitting scheme for the nonlinear anisotropic Beltrami operator that requires only tridiagonal matrix inversions, is unconditionally stable, and preserves the time-discretization accuracy obtained without splitting.
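To make the difference between the two splittings concrete, here is a minimal NumPy sketch for the linear isotropic heat equation $u_t = \Delta u$ with zero-flux boundaries, a simplifying assumption of ours (the Beltrami case is treated in Section 4). Each one-dimensional implicit solve is a tridiagonal system handled by the Thomas algorithm; LOD chains the x- and y-solves with $\Delta t$, whereas AOS averages the two solves, each taken with $2\Delta t$. All function names are illustrative.

```python
import numpy as np


def implicit_solve_1d(rhs, tau, h=1.0):
    """Solve (I - tau * D_xx) v = rhs along the last axis (Thomas algorithm).

    D_xx is the 3-point Laplacian with zero-flux (Neumann) ends; the solve is
    vectorised over all leading axes (e.g. over the rows of an image).
    """
    rhs = np.asarray(rhs, dtype=float)
    n = rhs.shape[-1]
    sub = np.full(n, -tau / h**2)
    sup = np.full(n, -tau / h**2)
    sub[0] = 0.0    # first node has no left neighbour
    sup[-1] = 0.0   # last node has no right neighbour
    diag = 1.0 - sub - sup
    cp = np.zeros(n)
    dp = np.zeros_like(rhs)
    cp[0] = sup[0] / diag[0]
    dp[..., 0] = rhs[..., 0] / diag[0]
    for i in range(1, n):            # forward elimination
        denom = diag[i] - sub[i] * cp[i - 1]
        cp[i] = sup[i] / denom
        dp[..., i] = (rhs[..., i] - sub[i] * dp[..., i - 1]) / denom
    v = dp.copy()
    for i in range(n - 2, -1, -1):   # back substitution
        v[..., i] -= cp[i] * v[..., i + 1]
    return v


def lod_step(u, dt, h=1.0):
    """LOD: (I - dt D_yy)^{-1} (I - dt D_xx)^{-1} u; axis 1 is x, axis 0 is y."""
    v = implicit_solve_1d(u, dt, h)              # x-direction, row by row
    return implicit_solve_1d(v.T, dt, h).T       # y-direction, column by column


def aos_step(u, dt, h=1.0):
    """AOS: average of the two 1-D inverses, each taken with 2 * dt."""
    ux = implicit_solve_1d(u, 2.0 * dt, h)
    uy = implicit_solve_1d(u.T, 2.0 * dt, h).T
    return 0.5 * (ux + uy)
```

Both steps conserve the image sum for any $\Delta t$, because each tridiagonal matrix is symmetric with unit row sums. For this constant-coefficient example the x- and y-operators commute, so LOD happens to be axis-symmetric here; the asymmetry discussed above appears once the coefficients vary in space.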
4 The Proposed Splitting Scheme
In this section we present an operator splitting scheme for the Beltrami filter. Before splitting, we first introduce a semi-implicit approximation scheme for our equations. A semi-implicit Crank-Nicolson scheme for an equation involving mixed derivatives can rely on the following discretization of the spatial derivative operators: mixed derivatives are computed at time step $n\Delta t$, while the non-mixed derivatives are computed as the average of the values at time steps $n\Delta t$ and $(n + 1)\Delta t$. This approach to handling mixed derivatives in semi-implicit schemes for linear equations has been considered in several previous works (see, for example, [24, 25, 26]), including in the context of image processing [27], although it was not combined with the Crank-Nicolson method in the latter case. In our numerical experiments we found that introducing the Crank-Nicolson method into the splitting scheme was necessary to maintain stability for large time steps. A simpler scheme, similar to the one used in [27], did not seem to be sufficiently stable for this PDE and the applications demonstrated in this paper. We now present the scheme we intend to use.
First, we refine our grid notation. We work on the rectangle $\Omega = (0, 1) \times (0, 1)$, which we discretize by a uniform grid of $m \times m$ pixels, such that $x_i = i\Delta x$, $y_j = j\Delta y$, $t_n = n\Delta t$, where $1 \le i \le m$, $1 \le j \le m$, $1 \le n \le J$, and $J\Delta t = T$. Let the grid size be $\Delta x = \Delta y = \frac{1}{m-1}$. For each channel $U^a$, $a = 1, 2, 3$, of the color vector, we define the discrete approximation $(U^a)^n_{ij} \approx U^a(i\Delta x, j\Delta y, n\Delta t)$. We impose Neumann boundary conditions, and initially set $U^a$ to be our initial data image.

4.1 LOD/AOS Scheme for the Beltrami Scale-Space
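We approximate the Beltrami filter given in Eq. (4) by the following semi-implicit Crank-Nicolson scheme:
$$\frac{(U^a)^{n+1} - (U^a)^n}{\Delta t} = \frac{1}{\sqrt{g^n}} \left( \sum_{l=1}^{2} \left( \frac{1}{2} A^n_{ll} (U^a)^{n+1} + \frac{1}{2} A^n_{ll} (U^a)^n \right) + \sum_{q=1}^{2} \sum_{r \ne q} A^n_{qr} (U^a)^n \right),$$
where $U^a$ is the $N$-dimensional vector denoting one of the components of the color vector, and $A^n_{qr}$ is a central difference approximation of the operator $\partial_{x_q}(d_{qr}\partial_{x_r})$ at time step $n$. Rearranging terms, we obtain
$$(U^a)^{n+1} = \left( I - \frac{\Delta t}{2\sqrt{g^n}} \sum_{l=1}^{2} A^n_{ll} \right)^{-1} \left( I + \frac{\Delta t}{\sqrt{g^n}} \sum_{q=1}^{2} \sum_{r \ne q} A^n_{qr} + \frac{\Delta t}{2\sqrt{g^n}} \sum_{l=1}^{2} A^n_{ll} \right) (U^a)^n,$$
which can also be written as
$$(U^a)^{n+1} = \left( I - \frac{\Delta t}{2} \sum_{l=1}^{2} \bar{A}^n_{ll} \right)^{-1} \left( I + \Delta t \sum_{q=1}^{2} \sum_{r \ne q} \bar{A}^n_{qr} + \frac{\Delta t}{2} \sum_{l=1}^{2} \bar{A}^n_{ll} \right) (U^a)^n,$$
where
$$\bar{A}_{11} = \frac{1}{\sqrt{g}} \partial_x (A \partial_x), \quad \bar{A}_{22} = \frac{1}{\sqrt{g}} \partial_y (C \partial_y), \quad \bar{A}_{12} = \frac{1}{\sqrt{g}} \partial_x (B \partial_y), \quad \bar{A}_{21} = \frac{1}{\sqrt{g}} \partial_y (B \partial_x),$$
and the functions $A$, $B$, $C$ are the corresponding elements of the diffusion matrix $D = \begin{pmatrix} A & B \\ B & C \end{pmatrix}$ associated with the Beltrami flow.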
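This semi-implicit scheme, however, still has a major drawback. At each iteration one needs to solve a large linear system whose coefficient matrix is not tridiagonal, and is thus costly to invert. Instead, we employ the LOD splitting scheme
$$(U^a)^{n+1} = \left( I - \frac{\Delta t}{2} \bar{A}^n_{22} \right)^{-1} \left( I - \frac{\Delta t}{2} \bar{A}^n_{11} \right)^{-1} \left[ \left( I + \frac{\Delta t}{2} \bar{A}^n_{11} \right) \left( I + \frac{\Delta t}{2} \bar{A}^n_{22} \right) + \Delta t \sum_{q=1}^{2} \sum_{r \ne q} \bar{A}^n_{qr} \right] (U^a)^n,$$
or the AOS scheme, which reads
$$(U^a)^{n+1} = \frac{1}{2} \left[ \left( I - \Delta t \bar{A}^n_{22} \right)^{-1} + \left( I - \Delta t \bar{A}^n_{11} \right)^{-1} \right] \left[ \left( I + \frac{\Delta t}{2} \bar{A}^n_{11} \right) \left( I + \frac{\Delta t}{2} \bar{A}^n_{22} \right) + \Delta t \sum_{q=1}^{2} \sum_{r \ne q} \bar{A}^n_{qr} \right] (U^a)^n.$$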
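The LOD and AOS updates above can be sketched for a single channel as follows, with the coefficient fields $A$, $B$, $C$ and $\sqrt{g}$ frozen at time step $n$ (for instance computed from the metric (2)). The following are our assumptions, not the paper's: unit grid spacing by default, a conservative three-point stencil with zero-flux ends for $\bar{A}_{11}$ and $\bar{A}_{22}$, and NumPy central differences (`np.gradient`) for the explicit mixed terms $\bar{A}_{12}$ and $\bar{A}_{21}$. Each implicit factor is a batch of tridiagonal solves; all names are hypothetical.

```python
import numpy as np


def _face_coeffs(a, h):
    """Face coefficients a_{i+1/2} / h^2 along the last axis, zero-padded at both
    ends so that no flux leaves the domain (Neumann boundary)."""
    face = 0.5 * (a[..., 1:] + a[..., :-1]) / h**2
    pad = np.zeros(a.shape[:-1] + (1,))
    lower = np.concatenate([pad, face], axis=-1)   # couples node i to node i-1
    upper = np.concatenate([face, pad], axis=-1)   # couples node i to node i+1
    return lower, upper


def apply_1d(u, a, s, h=1.0):
    """Evaluate (1/s) d/dx(a du/dx) along the last axis (conservative stencil)."""
    lower, upper = _face_coeffs(a, h)
    left = np.concatenate([u[..., :1], u[..., :-1]], axis=-1)
    right = np.concatenate([u[..., 1:], u[..., -1:]], axis=-1)
    return (lower * (left - u) + upper * (right - u)) / s


def solve_1d(rhs, a, s, c, h=1.0):
    """Solve (I - c (1/s) d/dx(a d/dx)) v = rhs along the last axis (Thomas)."""
    lower, upper = _face_coeffs(a, h)
    sub, sup = -c * lower / s, -c * upper / s
    diag = 1.0 - sub - sup
    rhs = np.asarray(rhs, dtype=float)
    cp, dp = np.zeros_like(rhs), np.zeros_like(rhs)
    cp[..., 0] = sup[..., 0] / diag[..., 0]
    dp[..., 0] = rhs[..., 0] / diag[..., 0]
    for i in range(1, rhs.shape[-1]):            # forward elimination
        denom = diag[..., i] - sub[..., i] * cp[..., i - 1]
        cp[..., i] = sup[..., i] / denom
        dp[..., i] = (rhs[..., i] - sub[..., i] * dp[..., i - 1]) / denom
    v = dp.copy()
    for i in range(rhs.shape[-1] - 2, -1, -1):   # back substitution
        v[..., i] -= cp[..., i] * v[..., i + 1]
    return v


def beltrami_cn_step(u, A, B, C, sqrtg, dt, h=1.0, scheme="lod"):
    """One split Crank-Nicolson step for one channel (axis 1 = x, axis 0 = y)."""
    c = 0.5 * dt

    def a11(v):   # (1/sqrt g) d_x(A d_x v)
        return apply_1d(v, A, sqrtg, h)

    def a22(v):   # (1/sqrt g) d_y(C d_y v), via transposition
        return apply_1d(v.T, C.T, sqrtg.T, h).T

    # explicit mixed terms (1/sqrt g) [d_x(B d_y u) + d_y(B d_x u)]
    mixed = (np.gradient(B * np.gradient(u, h, axis=0), h, axis=1)
             + np.gradient(B * np.gradient(u, h, axis=1), h, axis=0)) / sqrtg
    w = u + c * a22(u)                    # (I + dt/2 A22) u
    rhs = w + c * a11(w) + dt * mixed     # (I + dt/2 A11)(I + dt/2 A22) u + dt * mixed
    if scheme == "lod":
        v = solve_1d(rhs, A, sqrtg, c, h)             # (I - dt/2 A11)^{-1}
        return solve_1d(v.T, C.T, sqrtg.T, c, h).T    # then (I - dt/2 A22)^{-1}
    # AOS: average of the two 1-D inverses, each with dt (= 2 * dt/2)
    vx = solve_1d(rhs, A, sqrtg, dt, h)
    vy = solve_1d(rhs.T, C.T, sqrtg.T, dt, h).T
    return 0.5 * (vx + vy)
```

A constant channel is reproduced exactly by both variants, since every difference operator annihilates constants; this is a cheap sanity check before trying large time steps.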
The above splitting schemes are efficient because at each time step a single tridiagonal matrix inversion is performed for each spatial dimension. The system of differential equations we deal with is nonlinear, and the theoretical stability of the LOD/AOS-based nonlinear finite difference scheme is a non-trivial question, with theory still lagging behind common practice. Our numerical experiments indicate that the splitting is stable for a wide variety of parameters, suitable for most applications, as shown in Section 5.

4.2 LOD/AOS Scheme for the Beltrami-Based Denoising
The splitting scheme in the presence of a fidelity term requires a slight modification that we detail below. In this case we solve for each channel the equation
$$U^a_t = -\frac{2\lambda}{\sqrt{g}}\left(U^a - F^a\right) + \Delta_g U^a, \qquad (7)$$
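with Neumann boundary conditions and the initial condition $U^a(x, 0) = F^a(x)$. The Crank-Nicolson scheme approximating Eq. (7) is
$$(U^a)^{n+1} = \left( I - \frac{\Delta t}{2} \sum_{l=1}^{2} \bar{A}^n_{ll} + 2\Delta t \frac{\lambda}{\sqrt{g^n}} I \right)^{-1} \left[ \left( \left( I + \frac{\Delta t}{2} \bar{A}^n_{11} \right) \left( I + \frac{\Delta t}{2} \bar{A}^n_{22} \right) + \Delta t \sum_{q=1}^{2} \sum_{r \ne q} \bar{A}^n_{qr} \right) (U^a)^n + 2\Delta t F^a \frac{\lambda}{\sqrt{g^n}} \right]. \qquad (8)$$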
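It is possible to use LOD/AOS approximations for the inverse of the matrix in the above scheme. However, we would like to treat the fidelity term in a special way: when $\lambda/\sqrt{g^n}$ is large, we find that the scheme proposed below possesses better stability properties. We now describe the details of treating the fidelity term in our Crank-Nicolson scheme. Dividing the numerator and the denominator by the diagonal matrix $S^n = I + 2\Delta t \frac{\lambda}{\sqrt{g^n}}$ (that is, multiplying both the inverted matrix and the right-hand side by $(S^n)^{-1}$) and rearranging terms, we get
$$(U^a)^{n+1} = \left( I - \frac{\Delta t}{2} (S^n)^{-1} \sum_{l=1}^{2} \bar{A}^n_{ll} \right)^{-1} \left[ (S^n)^{-1} \left( \left( I + \frac{\Delta t}{2} \bar{A}^n_{11} \right) \left( I + \frac{\Delta t}{2} \bar{A}^n_{22} \right) + \Delta t \sum_{q=1}^{2} \sum_{r \ne q} \bar{A}^n_{qr} \right) (U^a)^n + 2 (S^n)^{-1} \Delta t F^a \frac{\lambda}{\sqrt{g^n}} \right].$$
Approximating this semi-implicit scheme by LOD splitting, we have
$$(U^a)^{n+1} = \left( I - \frac{\Delta t}{2} (S^n)^{-1} \bar{A}^n_{22} \right)^{-1} \left( I - \frac{\Delta t}{2} (S^n)^{-1} \bar{A}^n_{11} \right)^{-1} \left[ (S^n)^{-1} \left( \left( I + \frac{\Delta t}{2} \bar{A}^n_{11} \right) \left( I + \frac{\Delta t}{2} \bar{A}^n_{22} \right) + \Delta t \sum_{q=1}^{2} \sum_{r \ne q} \bar{A}^n_{qr} \right) (U^a)^n + 2 (S^n)^{-1} \Delta t F^a \frac{\lambda}{\sqrt{g^n}} \right].$$
A similar splitting scheme can be developed using AOS.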
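The rescaling by $(S^n)^{-1}$ is exact before any splitting is applied. The following one-dimensional NumPy check (a sketch with hypothetical helper names and dense matrices, for illustration only) compares the 1-D analogue of Eq. (8) with its rescaled form; the two agree up to rounding. In two dimensions the forms differ only once the inverse is approximated by LOD/AOS factors, which is where the better behavior of the rescaled version for large $\lambda/\sqrt{g^n}$, reported above, comes into play.

```python
import numpy as np


def neumann_matrix(a, h=1.0):
    """Dense matrix of d/dx(a d/dx) on n nodes with zero-flux ends."""
    n = a.size
    L = np.zeros((n, n))
    for i in range(n - 1):                 # face between nodes i and i+1
        f = 0.5 * (a[i] + a[i + 1]) / h**2
        L[i, i] -= f
        L[i, i + 1] += f
        L[i + 1, i + 1] -= f
        L[i + 1, i] += f
    return L


def cn_fidelity_direct(u, F, a, sqrtg, lam, dt):
    """1-D analogue of (8): the fidelity term sits inside the inverted matrix."""
    I = np.eye(u.size)
    Abar = neumann_matrix(a) / sqrtg[:, None]      # (1/sqrt g) d/dx(a d/dx)
    w = lam / sqrtg
    lhs = I - 0.5 * dt * Abar + 2.0 * dt * np.diag(w)
    rhs = (I + 0.5 * dt * Abar) @ u + 2.0 * dt * F * w
    return np.linalg.solve(lhs, rhs)


def cn_fidelity_scaled(u, F, a, sqrtg, lam, dt):
    """Same step after multiplying through by (S^n)^{-1}, S^n = I + 2 dt lam / sqrt g."""
    I = np.eye(u.size)
    Abar = neumann_matrix(a) / sqrtg[:, None]
    w = lam / sqrtg
    s_inv = 1.0 / (1.0 + 2.0 * dt * w)             # S^n is diagonal
    lhs = I - 0.5 * dt * s_inv[:, None] * Abar
    rhs = s_inv * ((I + 0.5 * dt * Abar) @ u) + 2.0 * s_inv * dt * F * w
    return np.linalg.solve(lhs, rhs)
```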
5 Experimental Results
We proceed to demonstrate experimentally the stability, accuracy, and efficiency of the LOD and AOS splitting schemes for the Beltrami color flow. In Figure 1 we show the results of the Beltrami flow, implemented by employing the LOD splitting scheme to approximate Eq. (4). Next we illustrate the use of the splitting schemes when the functional involves a fidelity term. A noisy image, as well as the reference denoising result based on the explicit scheme, is shown in Figure 3, next to the results of the AOS and LOD splitting schemes. Note that the visual results obtained by the two splitting schemes are similar to the reference image.

5.1 RRE Extrapolation Technique for Acceleration of the LOD Splitting Scheme
In [28, 1] vector extrapolation was applied in order to speed up the slow convergence of the explicit schemes for the Beltrami color flow. In the experiments below we demonstrate how the reduced rank extrapolation (RRE) technique can also be used to accelerate the convergence of implicit schemes. Figure 4 shows that the RRE method accelerates the LOD scheme. A comparison is also given to the convergence rate achieved by the method of [28, 1]. Extrapolation techniques also allow us to obtain a more accurate rate, if one takes a smaller time step.

Fig. 1. Top row, left: The original image, which contains JPEG artifacts. Middle: Results of the LOD splitting scheme with $\Delta t = 1$, $\beta = \sqrt{10^3}$, $\lambda = 0$, after 1 iteration. Right: Results of the LOD splitting scheme after 2 iterations. Bottom row, left: Results of the LOD splitting scheme after 4 iterations. Middle: A close-up of the original image. Right: A close-up of the resulting image after 4 iterations.

Fig. 2. The different image channels of an image patch taken from the images in Figure 1. Left to right: An image patch before denoising, its different color channels, the denoised image, and the denoised color channels. The color arrows indicate the direction of the gradient in the various color channels.
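Reduced rank extrapolation combines a few consecutive iterates $x_0, \dots, x_{k+1}$ of a fixed-point iteration into one extrapolated vector. The NumPy sketch below is our own minimal version of the standard least-squares formulation, not the implementation of [28, 1]: with $u_j = x_{j+1} - x_j$, it minimizes $\|\sum_j \gamma_j u_j\|$ subject to $\sum_j \gamma_j = 1$ and returns $s = \sum_j \gamma_j x_j$. For a linear iteration $x \mapsto Ax + f$ in $\mathbb{R}^N$ with $I - A$ invertible, taking $k = N$ (that is, $N + 2$ iterates) recovers the fixed point exactly, up to rounding.

```python
import numpy as np


def rre(xs):
    """Reduced rank extrapolation from iterates x_0, ..., x_{k+1} (at least 3).

    Minimises ||sum_j gamma_j u_j|| subject to sum_j gamma_j = 1, where
    u_j = x_{j+1} - x_j, and returns s = sum_j gamma_j x_j.
    """
    X = np.asarray(xs, dtype=float).T      # shape (N, k+2); columns are iterates
    U = np.diff(X, axis=1)                 # columns u_0, ..., u_k
    # Eliminate the constraint via gamma_0 = 1 - sum(c): U gamma = u_0 + W c.
    W = U[:, 1:] - U[:, :1]
    c, *_ = np.linalg.lstsq(W, -U[:, 0], rcond=None)
    gamma = np.concatenate(([1.0 - c.sum()], c))
    return X[:, :-1] @ gamma
```

In the setting of this section the iterates come from the nonlinear LOD step, so the extrapolated vector is only an approximation; vector extrapolation is commonly applied in cycles, restarting the iteration from the extrapolated vector, with the cycle length left as a tuning choice.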
Fig. 3. Large image at the right: An image with artifacts resulting from lossy compression. Smaller images: a close-up on a section of the image. Top row, left: The image with JPEG artifacts. Right: Beltrami-based denoising by the explicit scheme, run with 4000 explicit iterations, $\Delta t = 0.0005$. Bottom row, left: Denoising by LOD, $\Delta t = 0.02$. Right: Denoising by AOS, $\Delta t = 0.02$. $\lambda = 1$, $\beta = \sqrt{2000}$.
[Figure 4 plot: residual norm (logarithmic scale, from $10^{5}$ down to $10^{-15}$) versus CPU time (sec, 0 to 50), with curves for Explicit, Explicit+RRE, LOD and LOD+RRE.]
Fig. 4. Graph of the residuals (explicit, explicit+RRE, LOD and LOD+RRE) versus CPU time. Parameters: $\Delta t = 0.05$ for the explicit scheme, $\Delta t = 2.5$ for LOD, $\lambda = 0.5$, $\beta = \sqrt{500} \approx 22.36$.
6 Conclusions
Due to the anisotropic and non-separable nature of the Beltrami operator, no implicit or operator-splitting scheme had so far been introduced for the partial differential equations that describe the Beltrami color flow. In this paper we propose a semi-implicit splitting scheme based on LOD/AOS for the anisotropic Beltrami operator. The spatial mixed derivatives are discretized explicitly at time step $n\Delta t$, while the non-mixed derivatives are approximated using the average of the two time levels $n\Delta t$ and $(n + 1)\Delta t$. The stability of the splitting was tested empirically in applications such as Beltrami-based scale-space and Beltrami-based denoising, where it displayed stable behavior. In order to further accelerate the convergence of the splitting schemes, the RRE vector extrapolation technique is employed.
Acknowledgements. We thank Prof. Avram Sidi for interesting discussions. This research was supported by the United States-Israel Binational Science Foundation grant No. 2004274, by the Israeli Science Foundation grant No. 623/08, by the Ministry of Science grant No. 3-3414, and by the Elias Fund for Medical Research. XueCheng Tai is supported by the MOE (Ministry of Education) Tier II project T207N2202 and IDM project NRF2007IDMIDM002-010.
References
1. Rosman, G., Dascal, L., Kimmel, R., Sidi, A.: Efficient Beltrami image filtering via vector extrapolation methods. SIAM J. Imag. Sci. (2008) (submitted)
2. Mešina, M.: Convergence acceleration for the iterative solution of the equations X = AX + f. Comp. Meth. Appl. Mech. Eng. 10, 165–173 (1977)
3. Eddy, R.: Extrapolating to the limit of a vector sequence. In: Wang, P. (ed.) Information Linkage Between Applied Mathematics and Industry, New York, pp. 387–396. Academic Press, London (1979)
4. Spira, A., Kimmel, R., Sochen, N.A.: A short-time Beltrami kernel for smoothing images and manifolds. IEEE Trans. Image Process. 16(6), 1628–1636 (2007)
5. Smith, S.M., Brady, J.: Susan - a new approach to low level image processing. Intl. J. of Comp. Vision 23, 45–78 (1997)
6. Aurich, V., Weule, J.: Non-linear Gaussian filters performing edge preserving diffusion. In: Mustererkennung 1995, 17. DAGM-Symposium, London, UK, pp. 538–545. Springer, Heidelberg (1995)
7. Tomasi, C., Manduchi, R.: Bilateral filtering for gray and color images. In: Proceedings of IEEE International Conference on Computer Vision, pp. 836–846 (1998)
8. Sochen, N., Kimmel, R., Bruckstein, A.M.: Diffusions and confusions in signal and image processing. J. of Math. Imag. and Vision 14(3), 195–209 (2001)
9. Elad, M.: On the bilateral filter and ways to improve it. IEEE Trans. Image Process. 11(10), 1141–1151 (2002)
10. Barash, D.: A fundamental relationship between bilateral filtering, adaptive smoothing and the nonlinear diffusion equation. IEEE Trans. Image Process. 24(6), 844–847 (2002)
11. Buades, A., Coll, B., Morel, J.M.: A review of image denoising algorithms, with a new one. SIAM Interdisciplinary Journal 4, 490–530 (2005)
12. Lu, T., Neittaanmaki, P., Tai, X.C.: A parallel splitting up method and its application to Navier-Stokes equations. Applied Mathematics Letters 4(2), 25–29 (1991)
13. Lu, T., Neittaanmaki, P., Tai, X.C.: A parallel splitting up method for partial differential equations and its application to Navier-Stokes equations. RAIRO Mathematical Modelling and Numerical Analysis 26(6), 673–708 (1992)
14. Weickert, J., Romeny, B.M.T.H., Viergever, M.A.: Efficient and reliable schemes for nonlinear diffusion filtering. IEEE Trans. Image Process. 7(3), 398–410 (1998)
15. Peaceman, D.W., Rachford, H.H.: The numerical solution of parabolic and elliptic differential equations. Journal Soc. Ind. Appl. Math. 3, 28–41 (1955)
16. Yanenko, N.N.: The method of fractional steps. The solution of problems of mathematical physics in several variables. Springer-Verlag, New York (1971)
17. Barash, D., Schlick, T., Israeli, M., Kimmel, R.: Multiplicative operator splittings in nonlinear diffusion: from spatial splitting to multiple timesteps. J. of Math. Imag. and Vision 19(16), 33–48 (2003)
18. Kimmel, R., Malladi, R., Sochen, N.: Images as embedding maps and minimal surfaces: Movies, color, texture, and volumetric medical images. Intl. J. of Comp. Vision 39(2), 111–129 (2000)
19. Sochen, N., Kimmel, R., Malladi, R.: From high energy physics to low level vision. In: ter Haar Romeny, B.M., Florack, L.M.J., Viergever, M.A. (eds.) Scale-Space 1997. LNCS, vol. 1252, pp. 236–247. Springer, Heidelberg (1997)
20. Sochen, N., Kimmel, R., Malladi, R.: A general framework for low level vision. IEEE Trans. Image Process. 7, 310–318 (1998)
21. Yezzi, A.J.: Modified curvature motion for image smoothing and enhancement. IEEE Trans. Image Process. 7(3), 345–352 (1998)
22. Polyakov, A.M.: Quantum geometry of bosonic strings. Physics Letters 103 B, 207–210 (1981)
23. Rudin, L., Osher, S., Fatemi, E.: Non-linear total variation based noise removal algorithms. Physica D Letters 60, 259–268 (1992)
24. Yanenko, N.N.: About implicit difference methods of the calculation of the multidimensional equation of thermal conductivity. In: Proceedings of VUZ. Series of Mathematics, vol. 23(4), pp. 148–157 (1961)
25. Andreev, V.B.: Alternating direction methods for parabolic equations in two space dimensions with mixed derivatives. Zhurnal Vychislitelnoi Matematiki i Matematicheskoi Fiziki 7(2), 312–321 (1967)
26. McKee, S., Mitchell, A.R.: Alternating direction methods for parabolic equations in three space dimensions with mixed derivatives. The Computer Journal 14(3), 25–30 (1971)
27. Weickert, J.: Coherence-enhancing diffusion filtering. Intl. J. of Comp. Vision 31(2/3), 111–127 (1999)
28. Dascal, L., Rosman, G., Kimmel, R.: Efficient Beltrami filtering of color images via vector extrapolation. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 92–103. Springer, Heidelberg (2007)
Multi-scale Total Variation with Automated Regularization Parameter Selection for Color Image Restoration

Yiqiu Dong¹ and Michael Hintermüller²

¹ START-Project “Interfaces and Free Boundaries” and SFB “Mathematical Optimization and Applications in Biomedical Science”, Institute of Mathematics and Scientific Computing, University of Graz, Heinrichstrasse 36, A-8010 Graz, Austria
² Department of Mathematics, Humboldt-University of Berlin, Unter den Linden 6, 10099 Berlin, Germany, and START-Project “Interfaces and Free Boundaries” and SFB “Mathematical Optimization and Applications in Biomedical Science”, Institute of Mathematics and Scientific Computing, University of Graz, Heinrichstrasse 36, A-8010 Graz, Austria
Abstract. In this paper, a multi-scale vectorial total variation model for color image restoration is introduced. The model utilizes a spatially dependent regularization parameter in order to preserve the details during noise removal. The automated adjustment strategy of the regularization parameter is based on local variance estimators combined with a confidence interval technique. Numerical results on images are presented to demonstrate the efficiency of the method.
1 Introduction
We consider the problem of recovering color images degraded by cross-channel blurring and Gaussian noise. Without loss of generality, we assume that an image $\hat{\mathbf{u}}$ is a vectorial function defined on a bounded, piecewise smooth open subset $\Omega \subset \mathbb{R}^2$, that is, $\hat{\mathbf{u}} : \Omega \to \mathbb{R}^M$, where $M$ is the number of channels in the color model. The degraded form $\mathbf{z}$ of $\hat{\mathbf{u}}$ is given by
$$\mathbf{z} = K\hat{\mathbf{u}} + \mathbf{n},$$
where $K \in \mathcal{L}(L^2(\Omega; \mathbb{R}^M))$ is a cross-channel blurring operator, and $\mathbf{n}$ represents white Gaussian noise with zero mean and standard deviation $\sigma$. The problem of restoring $\hat{\mathbf{u}}$ from $\mathbf{z}$ with unknown $\mathbf{n}$ is known to be ill-posed [1]. In order to preserve significant edges when restoring images, Rudin, Osher and Fatemi proposed total variation regularization [2] for gray-level images. In this approach (which we call the TV-model in what follows), the image $\hat{u}$ is recovered by solving the optimization problem
$$\min_{u \in BV(\Omega)} \int_\Omega |Du| + \frac{\lambda}{2} \int_\Omega |Ku - z|^2\, dx, \qquad (1)$$

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 271–281, 2009. © Springer-Verlag Berlin Heidelberg 2009
Y. Dong and M. Hintermüller
where $BV(\Omega)$ denotes the space of functions of bounded variation and $\lambda > 0$. Because of its edge-preserving ability, the TV-model is widely accepted as a reliable tool in image restoration. Over the years, various research efforts have been devoted to studying, solving and extending the TV-model; see, e.g., [3, 4, 5, 6, 7, 8, 9] as well as the monograph [1] and the many references therein. In general, images are composed of multiple objects at different scales. This suggests that different values of $\lambda$, localized at image features of different scales, are desirable to obtain better restoration results. For this reason, a multi-scale total variation (MTV) model with a spatially varying choice of parameters was proposed in [10]. In order to enhance image regions containing details while still sufficiently smoothing homogeneous features, a spatially dependent regularization parameter selection was proposed in [11]. In this paper, we extend the multi-scale total variation with a spatially dependent regularization parameter to the restoration of degraded color images. The automated adjustment strategy of the regularization parameter is based on local variance estimators combined with a confidence interval technique. To speed up the scheme, we generalize the multi-scale representation according to [12, 13], and the corresponding subproblems are solved by a superlinearly convergent algorithm based on Fenchel duality and inexact semismooth Newton techniques. The latter extends earlier work in [9]. The rest of the paper is organized as follows. In Section 2 we introduce the multi-scale vectorial total variation model and the primal-dual algorithm for solving the associated minimization problem. In Section 3 we extend the parameter selection based on local variance estimators (LVEs) to color images. Section 4 proposes a method for color image restoration combining the multi-scale representation and spatially adaptive parameter selection. Section 5 gives numerical results to demonstrate the performance of the new method. Finally, conclusions are drawn in Section 6.
2 Multi-scale Vectorial Total Variation
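Based on the TV-model (1), the vectorial total variation (VTV) regularization was proposed in [14] for restoring color images:
$$\min_{u \in BV(\Omega; \mathbb{R}^M)} \int_\Omega |Du| + \frac{\lambda}{2} \int_\Omega |Ku - z|^2\, dx, \qquad (2)$$
where the space $BV(\Omega; \mathbb{R}^M)$ of vector-valued functions is the set of functions $u \in L^1(\Omega; \mathbb{R}^M)$ such that $\int_\Omega |Du| < \infty$, and the vectorial total variation $\int_\Omega |Du|$ is defined as
$$\int_\Omega |Du| = \sup\left\{ \int_\Omega u \cdot \operatorname{div} v \, dx : v \in C^1_c(\Omega; \mathbb{R}^{M \times 2}),\ |v| \le 1 \right\},$$
with $|v| = \sqrt{\sum_{i=1}^{M} \langle v_i, v_i \rangle}$. The space $BV(\Omega; \mathbb{R}^M)$ endowed with the norm $\|u\|_{BV(\Omega; \mathbb{R}^M)} = \|u\|_{L^1(\Omega; \mathbb{R}^M)} + \int_\Omega |Du|$ is a Banach space.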
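On a pixel grid, the vectorial total variation is commonly discretized by summing, over all pixels, the Frobenius norm of the $M \times 2$ matrix of finite-difference derivatives, which matches the choice $|v| = \sqrt{\sum_i \langle v_i, v_i \rangle}$ in the dual definition above. The NumPy sketch below uses forward differences with a zero difference at the last row and column (an assumed discretization; the function name is hypothetical). It shows the coupling between channels: an edge shared by all three channels costs $\sqrt{3}$ times the single-channel cost rather than three times it.

```python
import numpy as np


def vectorial_tv(u, h=1.0):
    """Discrete vectorial TV of u with shape (H, W, M).

    Sums, over pixels, the Frobenius norm of the M-by-2 Jacobian built from
    forward differences (zero difference at the last row and column).
    """
    u = np.asarray(u, dtype=float)
    dx = np.zeros_like(u)
    dy = np.zeros_like(u)
    dx[:, :-1, :] = (u[:, 1:, :] - u[:, :-1, :]) / h
    dy[:-1, :, :] = (u[1:, :, :] - u[:-1, :, :]) / h
    return np.sqrt((dx**2 + dy**2).sum(axis=-1)).sum() * h**2
```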
Multi-scale Total Variation
In the VTV-model (2), the parameter $\lambda$ controls the trade-off between a good fit of $z$ and a smoothness requirement due to the vectorial total variation regularization. Since images are usually composed of multiple objects at different scales, a locally varying $\lambda$ is desirable. Therefore, we consider here the multi-scale vectorial total variation (MVTV) model:
$$\min_{u \in BV(\Omega; \mathbb{R}^M)} \int_\Omega |Du| + \frac{1}{2} \int_\Omega \lambda(x) |Ku - z|^2\, dx. \qquad (3)$$
As in Section 2 of [11], one obtains the same existence and uniqueness result for the minimizer of the MVTV-model. We do not repeat the proof details here, but rather refer to [11].

2.1 Primal-Dual Approach to Multi-scale Vectorial Total Variation
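In [9] an infeasible primal-dual algorithm of generalized Newton type was proposed for solving (1). In the sequel we extend its key features to the case (3). Rather than operating on the MVTV-model (3) directly, the method is based on
$$\min_{u \in H^1_0(\Omega; \mathbb{R}^M)} \frac{\mu}{2} \int_\Omega |\nabla u|^2\, dx + \frac{1}{2} \int_\Omega \lambda |Ku - z|^2\, dx + \int_\Omega |\nabla u|\, dx, \qquad (4)$$
where $0 < \underline{\lambda} \le \lambda(x) \le \bar{\lambda}$ for almost all $x \in \Omega$ and $0 < \mu \ll \bar{\lambda}^{-1}$. The $\mu$-term serves as a function-space regularization for a convenient dualization in a Hilbert space setting. In our numerics, we typically choose $\mu = 0$. Applying the Fenchel-Legendre calculus [15] analogously to [9], the Fenchel dual of (4) reads
$$\sup_{\substack{\mathbf{p} \in \mathbf{L}^2(\Omega; \mathbb{R}^M) \\ |\mathbf{p}(x)| \le 1 \text{ a.e. in } \Omega}} -\frac{1}{2} |||K^* z - \operatorname{div} \mathbf{p}|||^2_{H^{-1}} + \frac{1}{2} \|z\|^2_{L^2}, \qquad (P_0)$$
where $|||v|||^2_{H^{-1}} = \langle H_{\mu,K} v, v \rangle_{H^1_0, H^{-1}}$ for $v \in H^{-1}(\Omega; \mathbb{R}^M)$, with $H_{\mu,K} = (K^*\lambda K - \mu\Delta)^{-1}$ the inverse of $K^*\lambda K - \mu\Delta : H^1_0(\Omega; \mathbb{R}^M) \to H^{-1}(\Omega; \mathbb{R}^M)$, and $\langle \cdot, \cdot \rangle_{H^1_0, H^{-1}}$ denotes the duality pairing between $H^1_0(\Omega; \mathbb{R}^M)$ and its dual $H^{-1}(\Omega; \mathbb{R}^M)$. Moreover, $\mathbf{L}^2(\Omega; \mathbb{R}^M) = (L^2(\Omega; \mathbb{R}^M))^2$. In order to avoid the non-uniqueness of the solution of $(P_0)$, following [9] we consider a dual regularization:
$$\sup_{\substack{\mathbf{p} \in \mathbf{L}^2(\Omega; \mathbb{R}^M) \\ |\mathbf{p}(x)| \le 1 \text{ a.e. in } \Omega}} -\frac{1}{2} |||K^* z - \operatorname{div} \mathbf{p}|||^2_{H^{-1}} + \frac{1}{2} \|z\|^2_{L^2} - \frac{\beta}{2} \|\mathbf{p}\|^2_{\mathbf{L}^2}, \qquad (P)$$
where $\beta > 0$ is the regularization parameter. In order to study the effect of the $\beta$-regularization of the Fenchel dual, we apply the Fenchel-Legendre calculus once more and find that the dual of $(P)$ is given by
$$\min_{u \in H^1_0(\Omega; \mathbb{R}^M)} \frac{\mu}{2} \int_\Omega |\nabla u|^2\, dx + \frac{1}{2} \int_\Omega \lambda |Ku - z|^2\, dx + \int_\Omega \Phi_\beta(\nabla u)\, dx, \qquad (P^*)$$
where, for $\mathbf{w} \in \mathbf{L}^2(\Omega; \mathbb{R}^M)$,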
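$$\Phi_\beta(\mathbf{w})(x) = \begin{cases} |\mathbf{w}(x)| - \dfrac{\beta}{2}, & \text{if } |\mathbf{w}(x)| \ge \beta, \\[4pt] \dfrac{1}{2\beta} |\mathbf{w}(x)|^2, & \text{if } |\mathbf{w}(x)| < \beta. \end{cases} \qquad (5)$$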
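The first-order optimality conditions of $(P^*)$ characterize the solutions $\bar{u}$ and $\bar{\mathbf{p}}$ of $(P^*)$ and $(P)$, respectively, by
$$-\mu\Delta\bar{u} + K^*\lambda K\bar{u} - \operatorname{div}\bar{\mathbf{p}} = K^*\lambda z \quad \text{in } H^{-1}(\Omega; \mathbb{R}^M), \qquad (6a)$$
$$\max(\beta, |\nabla\bar{u}|)\,\bar{\mathbf{p}} - \nabla\bar{u} = 0 \quad \text{in } \mathbf{L}^2(\Omega; \mathbb{R}^M). \qquad (6b)$$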
Note that the system (6) is non-smooth, i.e., not necessarily Fréchet-differentiable. The discrete version of this system can be solved by a semismooth Newton method [9, 11]. The generalized Newton solver converges globally, that is, regardless of the initialization, and locally at a superlinear rate [9].
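The function $\Phi_\beta$ in (5) is the Huber function of $|\mathbf{w}|$, and its gradient is $\mathbf{w}/\max(\beta, |\mathbf{w}|)$. Condition (6b) therefore states that $\bar{\mathbf{p}}$ is the gradient of $\Phi_\beta$ evaluated at $\nabla\bar{u}$, which also guarantees $|\bar{\mathbf{p}}| \le 1$. The short NumPy sketch below (helper names are ours) evaluates both quantities pointwise; the last axis of `w` is assumed to hold the $2M$ entries of the Jacobian at a pixel.

```python
import numpy as np


def huber(w, beta):
    """Phi_beta of (5), applied pointwise; the last axis of w holds the Jacobian entries."""
    r = np.linalg.norm(w, axis=-1)
    return np.where(r >= beta, r - 0.5 * beta, r**2 / (2.0 * beta))


def dual_from_gradient(w, beta):
    """p = w / max(beta, |w|), i.e. the solution of (6b) for a given w = grad u."""
    r = np.linalg.norm(w, axis=-1, keepdims=True)
    return w / np.maximum(beta, r)
```

Because $\Phi_\beta$ is continuously differentiable, the semismooth Newton method only has to deal with the non-smooth max in (6b), not with the kink of $|\mathbf{w}|$ at zero.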
3 Spatially Dependent Regularization Parameter Selection
Since the performance of the multi-scale vectorial total variation model depends mainly on the selection of the parameter $\lambda$, in this section we extend the rule for choosing $\lambda$ proposed in [11] to the MVTV-model. Suppose the variance of the Gaussian noise is $\sigma^2$, which can be estimated easily in practice. With a correct choice of $\lambda$ in the TV-model (1), the restored image $u$ satisfies the constraint
$$\int_\Omega |Ku - z|^2\, dx = \sigma^2 |\Omega| \qquad (7)$$
globally. The MVTV-model (3), in contrast, represents a localized version of this constraint by allowing $\lambda = \lambda(x)$. In order to enhance image details while preserving homogeneous regions, the choice of $\lambda$ must be based on local image features. Hence, we search for a reconstruction where the variance of the residual is close to the noise variance in both the detail regions and the homogeneous parts. To achieve this goal we introduce local variance estimators (LVEs) for an automated adaptive choice of $\lambda$.

3.1 Local Variance Estimator
Consider the discrete version of the residual image $\mathbf{r}^h = \mathbf{z}^h - K^h \mathbf{u}^h$, where $\mathbf{u}^h$ is the restored image from the minimization problem (2) with $\lambda > 0$. If we use a relatively small parameter $\lambda$, the residual $\mathbf{r}^h$ will include the noise as well as the details. Then, the average of the squared residual in a small window will reflect the distribution of details in the image. Let $\Omega^\omega_{i,j}$ denote the set of pixel coordinates in an $\omega$-by-$\omega$ window centered at $(i, j)$ (with obvious modification near the boundary), i.e.,
$$\Omega^\omega_{i,j} = \left\{ (s + i, t + j) : -\left\lfloor \frac{\omega}{2} \right\rfloor \le s, t \le \left\lfloor \frac{\omega}{2} \right\rfloor \right\},$$
where $\lfloor \cdot \rfloor$ means rounding to the nearest integer towards zero. Then we apply the mean filter with window size $\omega$ to the residual image $\mathbf{r}^h$ as follows:
$$\mathrm{LVE}^\omega_{i,j} = \frac{1}{M\omega^2} \sum_{k=1}^{M} \sum_{(s,t) \in \Omega^\omega_{i,j}} \left( \mathbf{r}^h_{s,t} \right)_k^2 = \frac{1}{M\omega^2} \sum_{k=1}^{M} \sum_{(s,t) \in \Omega^\omega_{i,j}} \left( \mathbf{z}^h_{s,t} - (K^h \mathbf{u}^h)_{s,t} \right)_k^2.$$
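The LVE is a mean filter applied to the channel-averaged squared residual, which the NumPy sketch below makes explicit. Two choices are our assumptions: the "obvious modification near the boundary" is implemented by mirrored padding, and $\omega$ is taken odd, so that $\Omega^\omega_{i,j}$ contains exactly $\omega \times \omega$ pixels. Function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def window_mean(a, omega):
    """Mean over the omega-by-omega window centred at each pixel (omega odd).

    Near the border the image is mirrored (an assumed boundary treatment).
    """
    if omega % 2 == 0:
        raise ValueError("omega must be odd")
    r = omega // 2
    padded = np.pad(a, r, mode="symmetric")
    return sliding_window_view(padded, (omega, omega)).mean(axis=(-2, -1))


def local_variance_estimator(z, Ku, omega):
    """LVE^omega for images of shape (H, W, M): the 1/(M omega^2) double sum."""
    r2 = ((np.asarray(z, float) - np.asarray(Ku, float)) ** 2).mean(axis=-1)
    return window_mean(r2, omega)
```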
Here LVE stands for “local variance estimator”. In general, $\mathrm{LVE}^\omega$ has a large value in the detail regions and a small value in the homogeneous regions. However, the noise in the residual may also lead to some large LVE values in the homogeneous regions. In order to reduce the effect of noise, we combine the LVE with the confidence interval technique well known in statistics [16, 17].

3.2 Upper Bound for the Local Variance
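In the discrete setting, all elements of $\mathbf{n}$ can be regarded as an array of independent normally distributed random variables with mean 0 and variance $\sigma^2$. Then the random variable
$$T^\omega_{i,j} = \frac{1}{\sigma^2} \sum_{k=1}^{M} \sum_{(s,t) \in \Omega^\omega_{i,j}} \left( \mathbf{n}^h_{s,t} \right)_k^2$$
has the $\chi^2$-distribution with $M\omega^2$ degrees of freedom, that is, $T^\omega_{i,j} \sim \chi^2_{M\omega^2}$. Set
$$S^\omega_{i,j} := \frac{1}{M\omega^2} \sum_{k=1}^{M} \sum_{(s,t) \in \Omega^\omega_{i,j}} \left( \mathbf{z}^h_{s,t} - (K^h \mathbf{u}^h)_{s,t} \right)_k^2.$$
If $\mathbf{u}^h = \hat{\mathbf{u}}^h$, where $\hat{\mathbf{u}}^h$ satisfies $\mathbf{n}^h = \mathbf{z}^h - K^h \hat{\mathbf{u}}^h$, then
$$S^\omega_{i,j} = \frac{1}{M\omega^2} \sum_{k=1}^{M} \sum_{(s,t) \in \Omega^\omega_{i,j}} \left( \mathbf{z}^h_{s,t} - (K^h \hat{\mathbf{u}}^h)_{s,t} \right)_k^2 = \frac{1}{M\omega^2} \sum_{k=1}^{M} \sum_{(s,t) \in \Omega^\omega_{i,j}} \left( \mathbf{n}^h_{s,t} \right)_k^2 = \frac{\sigma^2}{M\omega^2} T^\omega_{i,j}.$$
On the contrary, if the residual image $\mathbf{z}^h - K^h \mathbf{u}^h$ contains details, we expect
$$S^\omega_{i,j} = \frac{1}{M\omega^2} \sum_{k=1}^{M} \sum_{(s,t) \in \Omega^\omega_{i,j}} \left( \mathbf{z}^h_{s,t} - (K^h \mathbf{u}^h)_{s,t} \right)_k^2 > \frac{1}{M\omega^2} \sum_{k=1}^{M} \sum_{(s,t) \in \Omega^\omega_{i,j}} \left( \mathbf{n}^h_{s,t} \right)_k^2 = \frac{\sigma^2}{M\omega^2} T^\omega_{i,j}.$$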
Therefore, we search for a bound $B$ such that $S^\omega_{i,j} > B$ for some pixel $(i, j)$ implies that some details are left in the residual. Given the total number $m \times m$ of pixels in the color image with $M$ channels, we propose to consider the expected maximum of the $m^2$ random variables $\frac{\sigma^2}{M\omega^2} T^\omega_s$, $s = 1, \dots, m^2$, as the bound $B$:
$$B^{\omega,m} := \frac{\sigma^2}{M\omega^2}\, \mathbb{E}\Big( \max_{k=1,\dots,m^2} T^\omega_k \Big), \qquad (8)$$
where $\mathbb{E}$ denotes the expected value of a random variable. As in [11], we get
$$B^{\omega,m} = \frac{\sigma^2}{M\omega^2} \left( E_m(T^\omega) + d_m(T^\omega) \right),$$
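where $E_m(T^\omega) = T_d + \frac{\kappa}{\beta_m}$, $d_m(T^\omega) = \frac{\pi}{\beta_m \sqrt{6}}$, $\beta_m = m^2 f(T_d)$, $\kappa = 0.577215$ (the Euler-Mascheroni constant), $f$ is the probability density of $T^\omega$, and $T_d$ is the so-called dominant value.

3.3 Selection of the Parameter λ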
Now, we use the confidence interval for $S^\omega$ to reduce the effect of noise on the local variance estimators, in order to identify the detail regions in the image correctly. Recall that $\mathrm{LVE}^\omega$ denotes the mean of the squared residual in a given window. Ideally, there is only noise in the residual; then $\mathrm{LVE}^\omega$ should behave like $S^\omega$. Hence, whenever
$$\mathrm{LVE}^\omega_{i,j} \in \left[0, B^{\omega,m}\right), \qquad (9)$$
we assume that the window contains noise only. On the other hand, if (9) is not satisfied, we suppose that this is due to image details contained in the residual image in $\Omega^\omega_{i,j}$. This property is useful when updating the parameter $\lambda$ locally. For adapting $\lambda$ algorithmically we proceed as follows. Initially we assign a small positive value to $\lambda$. Then we restore the image iteratively, increasing $\lambda$ according to the following rule:
$$(\tilde{\lambda}_{k+1})_{i,j} = \zeta \cdot \min\left( (\tilde{\lambda}_k)_{i,j} + \rho \left( (\mathrm{LVE}^\omega_k)_{i,j} - \sigma \right)^+,\ L \right), \qquad (10a)$$
$$(\lambda_{k+1})_{i,j} = \frac{1}{\omega^2} \sum_{(s,t) \in \Omega^\omega_{i,j}} (\tilde{\lambda}_{k+1})_{s,t}, \qquad (10b)$$
where $\zeta \ge 1$, $\rho > 0$, $(x)^+ = \max(x, 0)$, $\mathrm{LVE}^\omega_k$ is obtained from $u_k$, $L$ is a large positive value to ensure $\tilde{\lambda}_k \in L^\infty(\Omega)$, and for each channel of the vectorial data we use the same $\lambda_k$ during restoration. In our numerics we choose $\zeta = 2$, which comes from the method proposed in [12] (TNV-algorithm). Finally, we set the parameter $\rho = \rho_k = \|\tilde{\lambda}_k\|_\infty / \sigma$ in order to keep the new $\tilde{\lambda}_{k+1}$ at the same scale as $\tilde{\lambda}_k$.
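The bound (8) and the update rule (10) can be sketched in NumPy as follows. Under the Gumbel approximation for the maximum of $m^2$ independent $\chi^2_{M\omega^2}$ variables, $E_m$ and $d_m$ above are the mean and the standard deviation of that approximation. We take the dominant value $T_d$ to be the point where $P(T^\omega > T_d) = 1/m^2$; this is the standard extreme-value definition, but it is our assumption since the text does not spell it out. The chi-square CDF is evaluated with the series for the regularized incomplete gamma function to keep the sketch dependency-free (SciPy's `scipy.stats.chi2` would serve equally well). The update follows (10a)-(10b) exactly as printed, with $\rho_k = \|\tilde{\lambda}_k\|_\infty/\sigma$; the cap `L = 1e6` and all function names are hypothetical.

```python
import math

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

KAPPA = 0.577215  # kappa in the text (Euler-Mascheroni constant)


def chi2_cdf(x, k):
    """CDF of chi^2_k via the series for the regularised lower incomplete gamma P(k/2, x/2)."""
    if x <= 0.0:
        return 0.0
    a, y = 0.5 * k, 0.5 * x
    term = total = 1.0 / a
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= y / (a + n)
        total += term
    return math.exp(a * math.log(y) - y - math.lgamma(a)) * total


def chi2_pdf(x, k):
    return math.exp((0.5 * k - 1.0) * math.log(x) - 0.5 * x
                    - 0.5 * k * math.log(2.0) - math.lgamma(0.5 * k))


def dominant_value(k, m):
    """T_d with P(T > T_d) = 1/m^2 for T ~ chi^2_k, found by bisection."""
    target = 1.0 / m**2
    lo, hi = 0.0, float(k)
    while 1.0 - chi2_cdf(hi, k) > target:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if 1.0 - chi2_cdf(mid, k) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def upper_bound(sigma, M, omega, m):
    """B^{omega,m} = sigma^2 / (M omega^2) * (E_m + d_m)."""
    k = M * omega**2
    t_d = dominant_value(k, m)
    beta_m = m**2 * chi2_pdf(t_d, k)
    e_m = t_d + KAPPA / beta_m
    d_m = math.pi / (beta_m * math.sqrt(6.0))
    return sigma**2 / k * (e_m + d_m)


def window_mean(a, omega):
    """omega-by-omega mean filter with mirrored borders (assumed boundary treatment)."""
    r = omega // 2
    return sliding_window_view(np.pad(a, r, mode="symmetric"), (omega, omega)).mean(axis=(-2, -1))


def update_lambda(lam_tilde, lve, sigma, omega, zeta=2.0, L=1e6):
    """Rule (10a)-(10b) as printed, with rho_k = ||lam_tilde_k||_inf / sigma."""
    rho = np.max(np.abs(lam_tilde)) / sigma
    new_tilde = zeta * np.minimum(lam_tilde + rho * np.maximum(lve - sigma, 0.0), L)
    return new_tilde, window_mean(new_tilde, omega)
```

With these helpers, condition (9) for a window becomes the elementwise test `lve < upper_bound(sigma, M, omega, m)`.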
4 Our Method
Recently, a multi-scale image decomposition method (TNV-algorithm) was proposed in [12]; it uses the TV-model (1) to extract the details in the residual and varies the regularization parameter over a sequence of dyadic scales to capture different features in the image. Although this method performs better than a number of existing methods, it satisfies the constraint (7) only globally and does not take into account the local characteristics of the features in the image. Building on this decomposition method, we intertwine its idea with the MVTV-model (3) and combine it with the spatially dependent regularization parameter selection. This results in the following algorithm:

Algorithm 2
1: Initialize $u^h_0 = 0\in\mathbb{R}^{Mm^2}$, $p^h_0 = 0\in\mathbb{R}^{Mm^2\times 2}$, $\lambda_0 = [\lambda^0, \dots, \lambda^0]\in\mathbb{R}^{Mm^2}$ with $\lambda^0\in\mathbb{R}^{m^2}$, and k = 0.
2: If k = 0, solve the discrete version of the minimization problem
$$\tilde u_0 = \arg\min_{u\in H^1_0(\Omega;\mathbb{R}^M)}\ \frac{\mu}{2}\int_\Omega|\nabla u|^2\,dx + \frac{1}{2}\int_\Omega\lambda_0\,|Ku - z|^2\,dx + \int_\Omega|\nabla u|\,dx;$$
else compute $v^h_k = z^h - K^h u^h_k$ and solve the discrete version of the minimization problem
$$\tilde u_k = \arg\min_{u\in H^1_0(\Omega;\mathbb{R}^M)}\ \frac{\mu}{2}\int_\Omega|\nabla u|^2\,dx + \frac{1}{2}\int_\Omega\lambda_k\,|Ku - v_k|^2\,dx + \int_\Omega|\nabla u|\,dx.$$
3: Update $u^h_{k+1} = u^h_k + \tilde u^h_k$.
4: Based on $u^h_{k+1}$, update
$$\tilde\lambda^{k+1} = 2\cdot\min\Big(\tilde\lambda^k + \rho\big(\mathrm{LVE}^\omega_k - \sigma\big)_+,\ L\Big),\qquad (\lambda_{k+1})_{i,j} = \frac{1}{\omega^2}\sum_{(s,t)\in\Omega^\omega_{i,j}}\tilde\lambda^{k+1}_{s,t}.$$
5: Stop; or set k := k + 1 and go to step 2.

A few remarks on the algorithm are in order. We initialize λ by a relatively small positive constant. In our numerical practice an 11-by-11 window turned out to yield reliable results; in Section 5 we study the influence of the window size on the restoration results. Similar to the Bregman iteration proposed in [18], we stop the iterative procedure as soon as the residual $\|z^h - K^h u^h_k\|_2$ drops below ξσ, where ξ > 1 relates to the image size. For m → ∞ we have ξ → 1.
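The outer loop of Algorithm 2 (steps 2 to 5) can be sketched as below. This is only a skeleton under stated assumptions: the step-2 subproblem solver and the step-4 λ update are passed in as callables (`solve` and `update_lambda` are hypothetical names standing for, e.g., an MVTV solver and the rule (10a)-(10b)), and the stopping test uses the plain Euclidean norm of the residual; how that norm is normalized relative to σ is our assumption.

```python
import numpy as np


def multiscale_restore(z, K, solve, update_lambda, lam0, sigma, xi=1.05, max_iter=20):
    """Skeleton of steps 2-5 of Algorithm 2.

    z             : observed data
    K             : callable applying the discrete blurring operator K^h
    solve         : hypothetical callable (v, lam) -> u_tilde solving the step-2 problem
    update_lambda : hypothetical callable (lam, u) -> new lam implementing step 4
    """
    u = np.zeros_like(z, dtype=float)   # step 1: u_0 = 0
    lam = lam0
    for k in range(max_iter):
        v = z - K(u)                    # step 2: residual v_k (equals z when k = 0)
        u = u + solve(v, lam)           # step 3: u_{k+1} = u_k + u_tilde_k
        lam = update_lambda(lam, u)     # step 4: spatially adapted lambda
        if np.linalg.norm(z - K(u)) <= xi * sigma:   # stopping rule
            return u, lam, k + 1
    return u, lam, max_iter
```

In a toy call with K the identity and a stand-in solver returning half of the residual, each pass adds part of the detail still left in $v_k$, which is the hierarchical mechanism the algorithm relies on.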
5 Numerical Results
In this section we provide numerical results to study the behavior of our method with respect to its image restoration capabilities. We use two RGB color images (i.e., M = 3), “Barbara” (576-by-720) and “Lena” (512-by-512), as shown in Figure 1. Furthermore, from experiments conducted on a broad variety of images we found that our method is robust with respect to the initial choice of λ. Thus, in all experiments listed here we use the same initial choice λ = 2.5.
Y. Dong and M. Hintermüller
Fig. 1. Original images: (a) “Barbara”, (b) “Lena”
Fig. 2. Results of denoising image “Barbara” (the 1st row) and “Lena” (the 2nd row): (a) Noisy images, (b) Restored images (k = 3), (c) Final values of λ
5.1 Color Image Denoising
Here, we concentrate on image denoising, i.e., $K^h$ is the identity matrix. The degraded images contain Gaussian white noise with noise level σ = 0.1. To study our method in the case of texture-like structures, we zoom into certain regions of the two images in Figure 1. In all of our experiments the image intensity range is scaled to [0, 1]. The results are shown in Figure 2 together with the number of iterations k. We can see that our method suppresses the noise successfully while preserving the details. In addition, we show the final values of λ obtained by our choice rule: in detail regions λ is large in order to preserve the details, and it is small in homogeneous regions in order to remove noise.
Fig. 3. Restored images by our method with different ω: (a) ω = 5, (b) ω = 11, (c) ω = 17
Fig. 4. Results of restoring blurred noisy image “Barbara” (the 1st row) and “Lena” (the 2nd row): (a) Blurred noisy images, (b) Restored images (k = 5), (c) Final values of λ
In order to test our method for different values of the window size ω, Figure 3 shows the restored images with ω = 5, 11, 17. Apart from some minor effects, we observe a remarkable stability with respect to ω.
5.2 Color Image Deblurring and Denoising
In this section, we illustrate the restoration ability of our method for noisy blurred images. The blurring operator K is a cross-channel blurring operator with the kernel
$$\begin{bmatrix} K_{rr} & K_{rg} & K_{rb}\\ K_{gr} & K_{gg} & K_{gb}\\ K_{br} & K_{bg} & K_{bb}\end{bmatrix} = \begin{bmatrix} 0.8\cdot(M,7,135) & 0.1\cdot(G,9,7) & 0.1\cdot(A,7)\\ 0.1\cdot(A,9) & 0.8\cdot(M,7,90) & 0.1\cdot(G,5,1)\\ 0.1\cdot(G,7,5) & 0.1\cdot(M,7,45) & 0.8\cdot(A,11)\end{bmatrix},$$
where (A, r) denotes the average blur with window size r, (G, r, σ) denotes the Gaussian blur with window size r and standard deviation σ, (M, l, θ) denotes the motion blur with length l and angle θ, and (r, g, b) are the three channels of the RGB color model. In addition, the images contain Gaussian white noise with σ = 0.02. Figure 4 depicts a part of the noisy blurred “Barbara” and “Lena” images together with the restored results and the final values of λ. We find that our method can still preserve most of the details; see, e.g., the features on the scarf. Furthermore, for noisy blurred images our method is still able to distinguish most of the detail regions properly.
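A possible NumPy/SciPy sketch of such a cross-channel operator is shown below. Output channel c is the weighted sum over input channels d of kernel $K_{cd}$ applied to channel d. The average and Gaussian kernels follow the usual normalized r-by-r conventions, but the motion kernel is only a crude rasterized line of our own, not a reproduction of MATLAB's `fspecial('motion')`; the reflecting boundary condition is also an assumption.

```python
import numpy as np
from scipy.ndimage import convolve


def average_kernel(r):
    """(A, r): r-by-r box filter."""
    return np.full((r, r), 1.0 / r**2)


def gaussian_kernel(r, s):
    """(G, r, s): r-by-r Gaussian with standard deviation s, normalized to sum 1."""
    ax = np.arange(r) - (r - 1) / 2.0
    g = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2.0 * s**2))
    return g / g.sum()


def motion_kernel(length, theta_deg):
    """(M, l, theta): crude rasterized line segment (hypothetical stand-in,
    not MATLAB's fspecial('motion'))."""
    size = length if length % 2 == 1 else length + 1
    k = np.zeros((size, size))
    center = (size - 1) / 2.0
    t = np.deg2rad(theta_deg)
    for s in np.linspace(-(length - 1) / 2.0, (length - 1) / 2.0, 4 * length):
        k[int(round(center - s * np.sin(t))), int(round(center + s * np.cos(t)))] = 1.0
    return k / k.sum()


# Row c lists (weight, kernel) pairs for the input channels d = r, g, b.
BLUR = [
    [(0.8, motion_kernel(7, 135)), (0.1, gaussian_kernel(9, 7)), (0.1, average_kernel(7))],
    [(0.1, average_kernel(9)), (0.8, motion_kernel(7, 90)), (0.1, gaussian_kernel(5, 1))],
    [(0.1, gaussian_kernel(7, 5)), (0.1, motion_kernel(7, 45)), (0.8, average_kernel(11))],
]


def cross_channel_blur(u):
    """Apply K to an (H, W, 3) RGB image: out_c = sum_d w_cd * (k_cd conv u_d)."""
    out = np.zeros(u.shape, dtype=float)
    for c in range(3):
        for d in range(3):
            w, k = BLUR[c][d]
            out[..., c] += w * convolve(u[..., d], k, mode="reflect")
    return out
```

Because every kernel sums to one and the weights in each row sum to one, a constant image passes through this operator unchanged, a quick sanity check on any implementation of the kernel above.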
6 Conclusion
A multi-scale vectorial total variation model with a spatially adapted regularization parameter λ for color image restoration is proposed in this paper. The local variance estimator LVE of the residual image is extended to the multi-channel case and turns out to be an accurate instrument for updating λ within an iterative procedure. Assuming that the noise variance σ² is known, the present algorithm is fully automated, i.e., no parameter tuning is necessary. The numerical results show that the new method can restore the degraded images efficiently while preserving most details.
References
1. Vogel, C.: Computational Methods for Inverse Problems. Frontiers Appl. Math., vol. 23. SIAM, Philadelphia (2002)
2. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
3. Dobson, D., Vogel, C.: Convergence of an iterative method for total variation denoising. SIAM J. Numer. Anal. 34, 1779–1791 (1997)
4. Chambolle, A., Lions, P.L.: Image recovery via total variation minimization and related problems. Numerische Mathematik 76, 167–188 (1997)
5. Chang, Q., Chern, I.L.: Acceleration methods for total variation-based image denoising. SIAM J. Applied Mathematics 25, 982–994 (2003)
6. Strong, D., Chan, T.: Edge-preserving and scale-dependent properties of total variation regularization. Inverse Problems 19, 165–187 (2003)
7. Chambolle, A.: An algorithm for total variation minimization and application. Journal of Mathematical Imaging and Vision 20, 89–97 (2004)
8. Hintermüller, M., Kunisch, K.: Total bounded variation regularization as bilaterally constrained optimization problem. SIAM J. Appl. Math. 64, 1311–1333 (2004)
9. Hintermüller, M., Stadler, G.: An infeasible primal-dual algorithm for total bounded variation-based inf-convolution-type image restoration. SIAM Journal on Scientific Computing 28(1), 1–23 (2006)
10. Almansa, A., Ballester, C., Caselles, V., Haro, G.: A TV based restoration model with local constraints. J. Sci. Comput. 34(3), 209–236 (2008)
11. Dong, Y., Hintermüller, M., Rincon-Camacho, M.: Automated parameter selection in a multi-scale total variation model. IFB-Report No. 22, Institute of Mathematics and Scientific Computing, University of Graz (November 2008)
12. Tadmor, E., Nezzar, S., Vese, L.: A multiscale image representation using hierarchical (BV, L2) decompositions. Multiscale Model. Simul. 2, 554–579 (2004)
13. Tadmor, E., Nezzar, S., Vese, L.: Multiscale hierarchical decomposition of images with applications to deblurring, denoising and segmentation. Comm. Math. Sci. 6, 1–26 (2008)
14. Bresson, X., Chan, T.: Fast dual minimization of the vectorial total variation norm and applications to color image processing. Inverse Problems and Imaging 2(4), 455–484 (2008)
15. Ekeland, I., Témam, R.: Convex Analysis and Variational Problems. Classics Appl. Math., vol. 28. SIAM, Philadelphia (1999)
16. Papoulis, A.: Probability, Random Variables, Stochastic Processes. McGraw Hill, New York (1991)
17. Mood, A.: Introduction to the Theory of Statistics. McGraw-Hill, New York (1974)
18. Osher, S., Burger, M., Goldfarb, D., Xu, J., Yin, W.: An iterative regularization method for total variation-based image restoration. SIAM Multiscale Model. and Simu. 4, 460–489 (2005)
Multiplicative Noise Cleaning via a Variational Method Involving Curvelet Coefficients
Sylvain Durand¹, Jalal Fadili², and Mila Nikolova³
¹ M.A.P. 5 - CNRS, University Paris Descartes, France, http://www.math-info.univ-paris5.fr/~sdurand/
² GREYC CNRS-ENSICAEN-Université de Caen, France, http://www.greyc.ensicaen.fr/~jfadili/
³ CMLA - CNRS, ENS Cachan, PRES UniverSud, France, http://www.cmla.ens-cachan.fr/~nikolova/
Abstract. Classical ways to denoise images contaminated with multiplicative noise (e.g. speckle noise) are filtering, statistical (Bayesian) methods, variational methods, and methods that convert the multiplicative noise into additive noise (using a logarithmic function) in order to apply a shrinkage estimation to the log-image data and then transform the result back using an exponential function. We propose a new method that involves several stages: we apply a reasonable under-optimal hard-thresholding to the curvelet transform of the log-image; the latter is restored using a specialized hybrid variational method combining an ℓ1 data-fitting to the thresholded coefficients and a Total Variation (TV) regularization in the image domain; the restored image is an exponential of the obtained minimizer, weighted so that the mean of the original image is preserved. The minimization stage is realized using a properly adapted fast Douglas-Rachford splitting. The existence of a minimizer of our specialized criterion and the convergence of the minimization scheme are proved. The numerical results show that our method outperforms the main alternative methods.
1 Introduction
In many active imaging systems (e.g. synthetic aperture radar, laser or ultrasound imaging), the data for the unknown image $S_0 : \Omega\to\mathbb{R}_+$, $\Omega\subset\mathbb{R}^2$, are severely corrupted with multiplicative noise. Then several independent measurements of the same image are needed:
$$S_k = S_0\,\eta_k + n_k,\qquad\forall k\in\{1,\dots,K\}, \qquad (1)$$
where $\eta_k : \Omega\to\mathbb{R}_+$ and $n_k$ represent the multiplicative noise and a typically zero-mean additive noise, respectively, ∀k. Commonly (see e.g. [27]) $\eta_k$ is modeled by a one-sided exponential probability density function (pdf) (cf. Fig. 1(a)): $\mathrm{pdf}(\eta_k) = \mu\,e^{-\mu\eta_k}\,\mathbb{1}_{\mathbb{R}_+}(\eta_k)$ for μ = 1. In practice, one takes the average of all measurements, see e.g. Fig. 2(b). Since $\frac{1}{K}\sum_{k=1}^K n_k\approx 0$, the data read (cf. e.g. [27, 1, 30]):
$$S = \frac{1}{K}\sum_{k=1}^K S_k = S_0\,\frac{1}{K}\sum_{k=1}^K\eta_k = S_0\,\eta. \qquad (2)$$
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 282–294, 2009. © Springer-Verlag Berlin Heidelberg 2009
Usually all $\eta_k$ are independent. Denoting by Γ the usual Gamma function, the noise η in (2), being the mean of the $\eta_k$, has a Gamma distribution (cf. Fig. 1(b)):
$$\eta = \frac{1}{K}\sum_{k=1}^K\eta_k:\qquad\mathrm{pdf}(\eta) = \frac{K^K\,\eta^{K-1}}{\Gamma(K)}\exp(-K\eta). \qquad (3)$$
Various adaptive filters have been proposed, see e.g. [31, 17]; they work well when the noise is moderate or weak, i.e. for large K. Bayesian, variational or diffusion-based methods have been proposed as well; see e.g. [28, 24, 18, 2]. Numerous methods convert the multiplicative noise into additive noise by
$$v = \log S = \log S_0 + \log\eta = u_0 + n, \qquad (4)$$
see e.g. [16, 30, 1, 23]. Then the pdf of n reads (cf. Fig. 1(c)):
$$n = \log\eta:\qquad\mathrm{pdf}(n) = \frac{K^K}{\Gamma(K)}\exp\big(K(n - e^n)\big). \qquad (5)$$
One can prove that $\mathbb{E}[n] = \psi_0(K) - \log K$ and $\mathrm{Var}[n] = \psi_1(K)$, where $\psi_k(z) = \big(\frac{d}{dz}\big)^{k+1}\log\Gamma(z)$ is the polygamma function. A common strategy is to decompose the log-data v into a multiscale frame for $L^2(\mathbb{R}^2)$ (an over-complete basis), say $W\equiv\{w_i : i\in I\}$, where I is a set of indexes:
$$y = Wv = Wu_0 + Wn. \qquad (6)$$
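The moments of n quoted above can be checked numerically. The sketch below (NumPy/SciPy; the sample size, seed and K = 10 are arbitrary illustrative choices) draws K one-sided exponential variables as in (1), averages them as in (2), and compares the empirical mean and variance of n = log η with ψ0(K) − log K and ψ1(K), computed with `scipy.special.digamma` and `scipy.special.polygamma`.

```python
import numpy as np
from scipy.special import digamma, polygamma


def log_noise_moments(K):
    """Theoretical E[n] and Var[n] for n = log(eta), eta ~ Gamma(K, 1/K)."""
    return digamma(K) - np.log(K), polygamma(1, K)


rng = np.random.default_rng(0)
K = 10
eta_k = rng.exponential(scale=1.0, size=(200_000, K))  # pdf(eta_k) = exp(-eta_k), mu = 1
eta = eta_k.mean(axis=1)                                # averaged noise, cf. (2)-(3)
n = np.log(eta)                                         # additive log-noise, cf. (4)-(5)

mean_th, var_th = log_noise_moments(K)
print("E[eta]  empirical:", eta.mean())
print("E[n]    empirical:", n.mean(), " theory:", mean_th)
print("Var[n]  empirical:", n.var(), " theory:", var_th)
```

Note that E[η] = 1 while E[n] < 0: the logarithm introduces a negative mean shift, which is relevant to the bias correction discussed in Section 3.3.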
By the Central Limit Theorem, the noise Wn in y is nearly Gaussian (cf. Fig. 1(d)). Then the coefficients y are denoised using shrinkage estimators $T : \mathbb{R}\to\mathbb{R}$:
$$y_T[i] = T\big((Wv)[i]\big),\qquad\forall i\in I. \qquad (7)$$
Shrinkage functions designed for multiplicative noise were proposed e.g. in [30, 1]. Let $\tilde W\equiv\{\tilde w_i : i\in I\}$ be a left inverse of W. Then a denoised log-image $v_T$ reads
$$v_T = \sum_{i\in I}T\big((Wv)[i]\big)\,\tilde w_i = \sum_{i\in I}T\big(y[i]\big)\,\tilde w_i. \qquad (8)$$
Then the sought-after image is of the form $S_T = \exp v_T$.
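The classical pipeline (4), (6)-(8) can be sketched as follows. For illustration only, an orthonormal 2-D DCT (`scipy.fft.dctn`/`idctn` with `norm="ortho"`) stands in for the multiscale frame W, so that the left inverse W̃ is simply the inverse transform; the paper itself works with curvelets. Hard and soft thresholding are given as two example shrinkage functions T, and the threshold value in the example is arbitrary.

```python
import numpy as np
from scipy.fft import dctn, idctn


def hard(t, T):
    """Hard thresholding: keep coefficients with |t| > T, zero the others."""
    return np.where(np.abs(t) > T, t, 0.0)


def soft(t, T):
    """Soft thresholding: shrink every coefficient toward 0 by T."""
    return np.sign(t) * np.maximum(np.abs(t) - T, 0.0)


def shrinkage_denoise(S, shrink, T):
    v = np.log(S)                    # (4): v = log S = u0 + n
    y = dctn(v, norm="ortho")        # (6): y = W v (orthonormal DCT stands in for W)
    y_T = shrink(y, T)               # (7): coefficient-wise shrinkage
    v_T = idctn(y_T, norm="ortho")   # (8): synthesis with the left inverse W-tilde
    return np.exp(v_T)               # S_T = exp(v_T)


rng = np.random.default_rng(1)
S0 = np.full((32, 32), 50.0)
S0[8:24, 8:24] = 150.0
S = S0 * rng.gamma(shape=10, scale=0.1, size=S0.shape)   # noise model (2)-(3), K = 10
S_T = shrinkage_denoise(S, hard, T=0.65)
```

With T = 0 both shrinkage rules leave the coefficients untouched and the pipeline returns S exactly, which is a convenient check that the forward and inverse transforms are consistent.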
Fig. 1. Noise distributions: (a) $\eta_k$; (b) $\eta = \frac{1}{K}\sum_{k=1}^K\eta_k$; (c) $n = \log\eta$; (d) $Wn$.
Our approach. We apply (4) and consider a tight-frame transform of the log-data. The restored log-image (Section 2) minimizes a criterion composed of an ℓ1-fitting to the (suboptimally) hard-thresholded frame coefficients and a Total Variation (TV) regularization in the image domain. The minimization (Section 3) uses a specialized Douglas-Rachford splitting. The full algorithm, involving a bias correction, is given in Section 4. Experiments are presented in Section 5. Some notations. $(\cdot)^T$ means transposed, $(\cdot)^*$ means convex conjugate and $(\cdot)^\star$ means adjoint.
2 Restoration of the Log-Image
Here we consider how to restore a good log-image given data $v : \Omega\to\mathbb{R}$ obtained using (4). We focus on methods which, for a given preprocessed data set, lead to convex optimization problems. We comment only on variational methods and shrinkage estimators, since they underlie our specialized hybrid objective function.
2.1 Drawbacks of Shrinkage Restoration and Variational Methods
Shrinkage restoration. The main problem with these methods, sketched in (7)-(8), is that shrinking large coefficients entails an erosion of the spiky features, while shrinking small coefficients yields Gibbs-like oscillations in the vicinity of edges and a loss of details in textured areas. On the other hand, if shrinkage is insufficient, some coefficients bearing mainly noise can remain almost unchanged (we call such coefficients outliers), and (8) shows that they yield artifacts with the shape of the functions $\tilde w_i$; see Fig. 2. Even though various improvements have been proposed, these artifacts remain visible; see the results in Fig. 3(d) and Fig. 4(c) in Section 5, obtained with the very recent Stein-block thresholding [8].
Fig. 2. (a) Noisy Lena obtained according to (1)-(2) for K = 10. (b)-(d) Restorations $\exp v_{T_H}$, where the data v are denoised by hard-thresholding their curvelet coefficients, see (12)-(13), for different choices of T: (b) $T = 2\sqrt{\mathrm{Var}[n]}$, (c) $T = 4\sqrt{\mathrm{Var}[n]}$, (d) $T = 6\sqrt{\mathrm{Var}[n]}$.
Fig. 3. Restoration of (b) using modern methods: (a) Original (256 × 256); (b) Noisy: μ = 1, K = 10, see (2)-(3); (c) û by (11) and Ŝ by (34), PSNR = 26.2 dB, MAE = 8.5; (d) Stein-block [8], PSNR = 25.5 dB, MAE = 9.4; (e) AA algorithm [2], PSNR = 25.4 dB, MAE = 9.4; (f) Our method, PSNR = 26.05 dB, MAE = 8.8. Note that (c) is a slightly improved version of [26] and that the restoration in (d) is done in the curvelet domain.
Fig. 4. (a) Shepp-Logan phantom (256 × 256). (b) Noisy, K = 10. (c) Denoised with Stein-block thresholding in the curvelet domain [8]: PSNR = 24.73 dB, MAE = 4. (d) Denoised with our algorithm: PSNR = 31.25 dB, MAE = 1.87.
Variational methods. In these methods, the restored function minimizes a criterion $F_v$ of the form
$$F_v(u) = \rho\int_\Omega\psi\big(u(t), v(t)\big)\,dt + \int_\Omega\varphi\big(|\nabla u(t)|\big)\,dt, \qquad (9)$$
where $\psi : \mathbb{R}_+\to\mathbb{R}_+$ measures closeness to the data and $\varphi(|\nabla u(\cdot)|)$ introduces priors via a trade-off parameter ρ > 0. A classical choice is $\psi = \big(u(\cdot) - v(\cdot)\big)^2$. It is usually required that the potential function $\varphi : \mathbb{R}_+\to\mathbb{R}_+$ promotes images involving edges. Analysing the minimizers of $F_v$ as solutions of PDEs on Ω, Rudin, Osher and Fatemi [25] showed that $\varphi(|\nabla u(t)|) = |\nabla u(t)|$ leads to such images, where for any $z(t) = (z_1(t), z_2(t))\in\mathbb{R}^2$, $t\in\Omega$, one sets $|z(t)|\stackrel{\text{def}}{=}\sqrt{z_1(t)^2 + z_2(t)^2}$. The resulting regularization term is known as Total Variation (TV) and will be denoted by $\|\cdot\|_{TV}$. However, whatever smooth data-fitting is chosen, this regularization yields images containing numerous constant regions (the so-called staircasing effect), hence textures and fine details are removed, see [22]. The method in [2] is of this kind and operates in the image domain; the fitting term is derived from (3), and the denoised image Ŝ, defined by
$$\hat S = \arg\min_\Sigma F_S(\Sigma)\quad\text{for}\quad F_S(\Sigma) = \rho(K)\int_\Omega\Big(\log\Sigma(t) + \frac{S(t)}{\Sigma(t)}\Big)\,dt + \|\Sigma\|_{TV}, \qquad (10)$$
exhibits constant regions (see Section 5). In [26], the regularization $\|\Sigma\|_{TV}$ is changed into $\|\log\Sigma\|_{TV}$ so as to reformulate the model as a convex problem and to avoid over-smoothing the image parts with higher gray values. To recover the denoised image, we applied $\hat S\propto\exp(\hat u)$ for
$$\hat u = \arg\min_u F_v(u),\quad\text{where}\quad F_v(u) = \rho\|u - v\|^2 + \|u\|_{TV}. \qquad (11)$$
Following [25], various edge-preserving convex functions φ have been proposed; see [3] for a recent overview. Even though φ′(0) = 0 alleviates staircasing, a systematic drawback of the resulting restored images is that the amplitude of edges is underestimated; thus sharp edges or spiky areas are subjected to erosion.
2.2 Hybrid Methods
Hybrid methods, see e.g. [9, 19, 5, 14], combine the information contained in the large coefficients y[i] obtained according to (6) with priors directly on the image u. They amount to defining the restored function û as
$$\hat u = \arg\min_u\Phi(u)\quad\text{subject to}\quad u\in\{u : |(W(u - v))[i]|\le\mu_i,\ \forall i\in I\}.$$
Using an edge-preserving regularization such as Φ = TV is a pertinent choice. The selection of the parameters $\{\mu_i\}_{i\in I}$ is trickier. This choice must take into account the magnitude of the relevant data coefficient y[i]. However, choosing $\mu_i$ based solely on y[i], as done in these papers, is too rigid, since there are either correct data coefficients that incur smoothing ($\mu_i > 0$) or noisy coefficients that are left unchanged ($\mu_i = 0$). A good compromise, which we adopt, is to determine $(\mu_i)_{i\in I}$ based both on the data and on the prior term.
2.3 A Specialized Hybrid Criterion
Given the log-data v obtained by (4), we apply a frame transform as in (6) to get $y = Wv = Wu_0 + Wn$. The noise contained in the i-th datum reads $\langle n, w_i\rangle$.
The low-frequency approximation coefficients carry important information on the image; therefore, a good choice is to keep them intact at this stage. Let $I_*\subset I$ denote the subset of all such elements of the frame. Then we apply a hard-thresholding operator $T_H$ [12] to all coefficients in $I\setminus I_*$:
$$y_{T_H}[i] = T_H\big(y[i]\big),\ \forall i\in I\setminus I_*,\quad\text{where}\quad T_H(t)\stackrel{\text{def}}{=}\begin{cases}0 & \text{if } |t|\le T,\\ t & \text{otherwise},\end{cases} \qquad (12)$$
where T is an under-optimal threshold, chosen in order to preserve the information relevant to edges and to some fine details in textured areas that is contained in the small coefficients. Let us consider
$$v_{T_H} = \sum_{i\in I_1}(Wv)[i]\,\tilde w_i,\quad\text{where}\quad I_1 = \{i\in I : |y[i]| > T\}\cup I_*. \qquad (13)$$
The image $v_{T_H}$ contains many artifacts with the shape of the $\tilde w_i$ for those y[i] that are noisy but above the threshold T, as well as information on the fine details in the original log-image $u_0$. In all cases, whatever the choice of T, an image of the form $v_{T_H}$ is unsatisfactory (see Fig. 2).
The denoised coefficients, denoted by x̂, are obtained based on the under-thresholded data $y_{T_H}$. We focus on hybrid methods of the form $\hat x = \arg\min_x F(x)$ for $F(x) = \Psi(x, y_{T_H}) + \Phi(\tilde W x)$, where Ψ is a data-fitting term in the frame domain and Φ is an edge-preserving regularization term in the log-image domain. Let us denote
$$I_0 = I\setminus I_1 = \{i\in I\setminus I_* : |y[i]|\le T\}. \qquad (14)$$
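The pre-processing (12) and the index sets in (13)-(14) reduce to boolean masks on the coefficient array. The sketch below assumes the coefficients are stored in a NumPy array and that $I_*$ is supplied as a boolean mask (for curvelets this would be the coarse-scale band; the mask in the example is arbitrary).

```python
import numpy as np


def hard_threshold_sets(y, T, low_mask):
    """Return y_TH from (12) and boolean masks for I_1 (13) and I_0 (14).

    y        : frame coefficients y = W v
    T        : (under-optimal) threshold
    low_mask : boolean mask of the approximation coefficients I_*, kept intact
    """
    I1 = (np.abs(y) > T) | low_mask   # I_1 = {i : |y[i]| > T} U I_*
    I0 = ~I1                          # I_0 = I \ I_1 = {i not in I_* : |y[i]| <= T}
    y_th = np.where(I1, y, 0.0)       # T_H applied off I_*, identity on I_*
    return y_th, I1, I0


y = np.array([0.1, -3.0, 0.5, 2.0, -0.2])
low = np.array([True, False, False, False, False])
y_th, I1, I0 = hard_threshold_sets(y, T=1.0, low_mask=low)
```

Here the first coefficient survives although it is below T, because it belongs to $I_*$; the two small detail coefficients fall into $I_0$.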
Coefficients y[i] for i ∈ I0 can be of two types:
1. coefficients y[i] bearing mainly noise; then the best choice is x̂[i] = 0;
2. coefficients y[i] relevant to edges and other details in $u_0$. Since such y[i] are difficult to distinguish from the noise, the relevant x̂[i] should be restored using the edge-preserving prior Φ. Note that a careful restoration must find a nonzero x̂[i] in order to avoid Gibbs-like oscillations in û.
Coefficients y[i] for i ∈ I1 are of the following two types:
1. large coefficients, which carry the main features of the sought-after function. They verify $y[i]\approx\langle w_i, u_0\rangle$ and can be kept intact;
2. coefficients highly contaminated by noise, i.e. $|y[i]|\gg|\langle w_i, u_0\rangle|$. We call them outliers because if we had x̂[i] = y[i], then û would contain an artifact with the shape of $\tilde w_i$, since by (13) we get $v_{T_H} = \sum_{j\neq i}\hat x[j]\,\tilde w_j + y[i]\,\tilde w_i$. Instead, x̂[i] must be restored according to the prior Φ.
This analysis clearly defines the goals that the minimizer x̂ of F is expected to achieve: restored coefficients x̂[i] have to fit $y_{T_H}[i]$ exactly if they are coherent with the prior Φ; otherwise they have to be restored according to Φ. Since [21] it is known that such requirements can be satisfied by criteria F where Ψ is non-smooth at the origin (e.g. ℓ1), see also [13]. For these reasons, we focus on
$$F(x) = \Psi(x) + \Phi(x), \qquad (15)$$
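where, for $\Lambda = \mathrm{diag}(\lambda_i)_{i\in I}$,
$$\Psi(x) = \sum_{i\in I_1\cup I_*}\lambda_i\,|(x - y)[i]| + \sum_{i\in I_0}\lambda_i\,|x[i]| = \|\Lambda(x - y_{T_H})\|_1, \qquad (16)$$
$$\Phi(x) = \int_\Omega|\nabla\tilde W x|\,ds = \|\tilde W x\|_{TV}. \qquad (17)$$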
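For a discrete image, (16)-(17) can be evaluated directly. The sketch below is illustrative only: it uses forward differences with an isotropic discrete TV (an assumed discretization), takes the synthesis operator W̃ as a function (the orthonormal inverse DCT stands in for the inverse curvelet transform), uses hypothetical λ values, and checks numerically the identity in (16) between the split sum and the weighted ℓ1 norm.

```python
import numpy as np
from scipy.fft import dctn, idctn


def tv(u):
    """Isotropic discrete TV with forward differences (an assumed discretization)."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return np.sum(np.sqrt(gx**2 + gy**2))


def Wt(coef):
    """Synthesis operator: orthonormal inverse DCT (stand-in for W-tilde)."""
    return idctn(coef, norm="ortho")


def objective(x, y_th, lam, synth):
    """F(x) = ||Lambda (x - y_TH)||_1 + ||W-tilde x||_TV, cf. (15)-(17)."""
    return np.sum(lam * np.abs(x - y_th)) + tv(synth(x))


rng = np.random.default_rng(2)
v = np.log(rng.uniform(1.0, 255.0, size=(16, 16)))
y = dctn(v, norm="ortho")
low = np.zeros(y.shape, dtype=bool)
low[:2, :2] = True                              # hypothetical I_*
I1 = (np.abs(y) > 0.5) | low
y_th = np.where(I1, y, 0.0)
lam = np.where(I1, 1.0, 2.0)                    # hypothetical lambda_1, lambda_0
x = rng.normal(size=y.shape)

# Left-hand side of (16): separate sums over I_1 (which contains I_*) and I_0.
split = np.sum(lam[I1] * np.abs(x - y)[I1]) + np.sum(lam[~I1] * np.abs(x)[~I1])
```

Because $y_{T_H}$ equals y on $I_1$ and vanishes on $I_0$, the split sum coincides with the weighted ℓ1 norm, which is why Ψ can be written compactly as $\|\Lambda(x - y_{T_H})\|_1$.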
In the pre-processing step (12) we do not recommend using a shrinkage function other than $T_H$, since it would alter all the data coefficients without restoring them faithfully. Via $T_H$, we base our restoration on data $y_{T_H}$ in which all non-thresholded coefficients keep the original information on the sought-after image. The theorem stated next addresses the existence and the uniqueness of a minimizer of F. Given y, let $G_y$ be the (convex) set of all minimizers of F:
$$G_y\stackrel{\text{def}}{=}\Big\{\hat x\in\ell^2(I) : F(\hat x) = \min_{x\in\ell^2(I)}F(x)\Big\}. \qquad (18)$$
Theorem 1 [13]. For $y\in\ell^2(I)$ and T > 0 given, consider F as defined in (15), where $\Omega\subset\mathbb{R}^2$ is open and bounded and its boundary ∂Ω is Lipschitz. Suppose that $\{w_i\}_{i\in I}$ is a frame of $L^2(\Omega)$ and that the operator $\tilde W$ is the pseudo-inverse of W. Assume also that $\lambda_{\min} = \min_{i\in I}\lambda_i > 0$. Then $G_y$ is nonempty, and for all $\hat x_1, \hat x_2\in G_y$, $\nabla\tilde W\hat x_1\propto\nabla\tilde W\hat x_2$ a.e. on Ω.
In words, $\hat S_1 = \tilde W\hat x_1$ and $\hat S_2 = \tilde W\hat x_2$ have the same level lines, i.e. they differ by a local change of contrast; the latter is usually invisible to the naked eye. The choice of $\lambda_i$ is investigated in [13]. Following this analysis, we use only two values for $\lambda_i$, depending only on whether the index i belongs to $I_0$ or to $I_1$. We focus on curvelet transforms of the log-data because (a) such a transform efficiently captures the main features of the data and (b) it is a tight frame, which is helpful for the subsequent numerical stage.
3 Minimization for the Log-Image
Let Γ0(H) denote the class of proper lower-semicontinuous convex functions on a Hilbert space H. Now we focus on the minimization problem
$$\text{find }\hat x\text{ such that }F(\hat x) = \min_x F(x)\quad\text{for }F = \Psi + \Phi, \qquad (19)$$
where Ψ and Φ are defined in (16)-(17). Clearly, $\Psi, \Phi\in\Gamma_0(\ell^2(I))$, hence $F\in\Gamma_0(\ell^2(I))$. The set $G_y$ in (18) is nonempty by Theorem 1 and can be rewritten as $G_y = \{x\in\ell^2(I)\mid x\in(\partial F)^{-1}(0)\}$, where ∂F stands for the subdifferential. Minimizing F amounts to finding a solution to the fixed-point equation
$$x = (\mathrm{Id} + \gamma\partial F)^{-1}(x), \qquad (20)$$
where $(\mathrm{Id} + \gamma\partial F)^{-1}$ is the resolvent operator associated with ∂F, γ > 0 is the proximal step size and Id is the identity map on $\ell^2(I)$. Since $(\mathrm{Id} + \gamma(\partial\Psi + \partial\Phi))^{-1}$ cannot be calculated in closed form, we focus on splitting methods that use the resolvent operators $(\mathrm{Id} + \gamma\partial\Psi)^{-1}$ and $(\mathrm{Id} + \gamma\partial\Phi)^{-1}$ separately.
3.1 Specialized Douglas-Rachford (D-R) Splitting Algorithm
The D-R family is the most general class of monotone operator splitting methods. Given a sequence $\mu_t\in(0, 2)$, D-R methods can be expressed via the recursion
$$x^{(t+1)} = \Big(1 - \frac{\mu_t}{2}\Big)x^{(t)} + \frac{\mu_t}{2}\Big(2(\mathrm{Id} + \gamma\partial\Psi)^{-1} - \mathrm{Id}\Big)\circ\Big(2(\mathrm{Id} + \gamma\partial\Phi)^{-1} - \mathrm{Id}\Big)x^{(t)}. \qquad (21)$$
Since problem (19) has solutions, we have the following convergence result:
Theorem 2. Let γ > 0 and $\mu_t\in(0, 2)$ be such that $\sum_{t\in\mathbb N}\mu_t(2 - \mu_t) = +\infty$. Take $x^{(0)}\in\ell^2(I)$ and consider the sequence of iterates defined by (21). Then $(x^{(t)})_{t\in\mathbb N}$ converges weakly to some point $\hat x\in\ell^2(I)$ and $(\mathrm{Id} + \gamma\partial\Phi)^{-1}(\hat x)\in G_y$.
The statement follows from [10, Corollary 5.2]. The sequence $\mu_t = 1$, ∀t ∈ ℕ, satisfies this condition.
3.2 Proximal Calculus
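Proximity operators, introduced in [20], generalize convex projection.
Definition 1 (Moreau [20]). Let φ ∈ Γ0(H). Then ∀x ∈ H the function $z\mapsto\varphi(z) + \|x - z\|^2/2$, for z ∈ H, achieves its infimum at a unique point denoted by $\mathrm{prox}_\varphi x$. The operator $\mathrm{prox}_\varphi : H\to H$ is the proximity operator of φ.
By the minimality condition for $\mathrm{prox}_\varphi$, it is easy to see that ∀x, p ∈ H we have $p = \mathrm{prox}_\varphi x\iff x - p\in\partial\varphi(p)\iff p = (\mathrm{Id} + \partial\varphi)^{-1}(x)$, that is, $(\mathrm{Id} + \partial\varphi)^{-1} = \mathrm{prox}_\varphi$. By introducing the reflection operator $\mathrm{rprox}_\varphi\stackrel{\text{def}}{=}2\,\mathrm{prox}_\varphi - \mathrm{Id}$, the D-R iteration (21) reads
$$x^{(t+1)} = \Big(1 - \frac{\mu_t}{2}\Big)x^{(t)} + \frac{\mu_t}{2}\,\mathrm{rprox}_{\gamma\Psi}\circ\mathrm{rprox}_{\gamma\Phi}\,x^{(t)}. \qquad (22)$$
Proximity operator of Ψ.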
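Lemma 1. Let $x\in\ell^2(I)$. Then $\mathrm{prox}_{\gamma\Psi}(x) = \Big(y_{T_H}[i] + T^S_{\gamma\lambda_i}\big(x[i] - y_{T_H}[i]\big)\Big)_{i\in I}$, where $T^S_{\gamma\lambda_i}(z[i]) = \max\big(0, |z[i]| - \gamma\lambda_i\big)\,\mathrm{sign}(z[i])$. The proof is quite standard and can be found in our Report [15]. Note that
$$\mathrm{rprox}_{\gamma\Psi}(x) = 2\Big(y_{T_H}[i] + T^S_{\gamma\lambda_i}\big(x[i] - y_{T_H}[i]\big)\Big)_{i\in I} - x. \qquad (23)$$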
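Lemma 1 and the reflected iteration (22) are easy to exercise on a toy problem. In the sketch below (an illustration, not the authors' code), `prox_psi` implements Lemma 1 and `rprox_psi` implements (23), while the TV term Φ is replaced, purely for testing, by the quadratic Φ(x) = ½‖x − b‖², whose proximity operator is (x + γb)/(1 + γ). For this toy Ψ + Φ the minimizer is known in closed form, $y_{T_H} + T^S_{\lambda}(b - y_{T_H})$, so the output prescribed by Theorem 2, $\mathrm{prox}_{\gamma\Phi}(\hat x)$, can be compared against it.

```python
import numpy as np


def soft(z, thr):
    """T^S: soft thresholding max(0, |z| - thr) * sign(z), elementwise."""
    return np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)


def prox_psi(x, y_th, lam, gamma):
    """Lemma 1: prox of gamma * Psi with Psi(x) = sum_i lam_i |x[i] - y_th[i]|."""
    return y_th + soft(x - y_th, gamma * lam)


def rprox_psi(x, y_th, lam, gamma):
    """(23): reflection 2 prox - Id."""
    return 2.0 * prox_psi(x, y_th, lam, gamma) - x


def douglas_rachford(prox_phi, y_th, lam, gamma=1.0, mu=1.0, n_iter=200):
    """Iteration (22); returns prox_{gamma Phi}(x_hat) as in Theorem 2."""
    x = np.zeros_like(y_th)
    for _ in range(n_iter):
        r = 2.0 * prox_phi(x) - x                 # rprox_{gamma Phi}
        q = rprox_psi(r, y_th, lam, gamma)        # rprox_{gamma Psi}
        x = (1.0 - mu / 2.0) * x + (mu / 2.0) * q
    return prox_phi(x)


gamma = 1.0
y_th = np.array([1.0, 0.0, -2.0, 0.5])
lam = np.array([0.5, 0.5, 1.0, 2.0])
b = np.array([3.0, 0.2, -2.5, -1.0])


def prox_phi(x):
    """prox of gamma * Phi for the toy quadratic Phi(x) = 0.5 ||x - b||^2 (not TV)."""
    return (x + gamma * b) / (1.0 + gamma)


x_dr = douglas_rachford(prox_phi, y_th, lam, gamma)
x_exact = y_th + soft(b - y_th, lam)
```

The last line of `douglas_rachford` reflects the point made in Theorem 2: the D-R iterate $x^{(t)}$ itself is not the minimizer; its image under $\mathrm{prox}_{\gamma\Phi}$ is.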
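Proximity operator of Φ. Clearly, $\Phi(x) = \|\cdot\|_{TV}\circ\tilde W(x)$. Computing $\mathrm{prox}_{\gamma\Phi}$ for an arbitrary $\tilde W$ may be intractable. We assume that:
(w1) $\tilde W : \ell^2(I)\to L^2(\Omega)$ is surjective;
(w2) $\tilde W W = \mathrm{Id}$ and $\tilde W = c^{-1}W^\star$ for 0 < c < ∞; note that $W^\star W = c\,\mathrm{Id}$;
(w3) W is bounded.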
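Let $\mathcal X = L^2(\Omega)\times L^2(\Omega)$, let $\langle\cdot,\cdot\rangle_{\mathcal X}$ be the inner product in $\mathcal X$ and $\|\cdot\|_p$, $p\in[1,\infty]$, the $L^p$-norm on $\mathcal X$. Define $B^\gamma_\infty(\mathcal X)$ as the γ-radius closed $L^\infty$-ball in $\mathcal X$,
$$B^\gamma_\infty\stackrel{\text{def}}{=}\{z\in\mathcal X : \|z\|_\infty\le\gamma\} = \{z = (z_1, z_2)\in\mathcal X : |z(t)|\le\gamma,\ \forall t\in\Omega\},$$
and $P_{B^\gamma_\infty(\mathcal X)} : \mathcal X\to B^\gamma_\infty(\mathcal X)$ the associated projector.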
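Lemma 2. Let $x\in\ell^2(I)$ and let $B^\gamma_\infty(\mathcal X)$ be as defined above. Then:
$$\text{(i)}\quad\mathrm{prox}_{\gamma\Phi}(x) = \Big(\mathrm{Id} - W\circ\big(\mathrm{Id} - \mathrm{prox}_{c^{-1}\gamma\|\cdot\|_{TV}}\big)\circ\tilde W\Big)(x); \qquad (24)$$
$$\text{(ii)}\quad\mathrm{prox}_{c^{-1}\gamma\|\cdot\|_{TV}}(u) = u - P_C(u), \qquad (25)$$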
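where
$$C = \big\{\mathrm{div}(z)\in L^2(\Omega)\;\big|\;z\in C_c^\infty(\Omega\times\Omega),\ z\in B^{\gamma/c}_\infty(\mathcal X)\big\}. \qquad (26)$$
Sketch of the proof. By (w1), $\mathrm{range}(\tilde W) = L^2(\Omega)$. Using that $\mathrm{domain}(\|\cdot\|_{TV}) = L^2(\Omega)$, we find $\mathrm{cone}\big(\mathrm{dom}\,\|\cdot\|_{TV} - \mathrm{range}\,\tilde W\big) = \{0\}$. Statement (i) follows from applying [11, Proposition 11], whose requirements are satisfied. If $\varphi\in\Gamma_0(L^2(\Omega))$ and $\varphi^*$ is its convex conjugate, the Moreau decomposition [20, Proposition 4.a] asserts
$$\mathrm{prox}_\varphi + \mathrm{prox}_{\varphi^*} = \mathrm{Id}. \qquad (27)$$
Since the conjugate function of a norm is the indicator function of the ball of its dual norm, $(c^{-1}\gamma\|\cdot\|_{TV})^*(z) = 0$ if $z\in C$ and $+\infty$ if $z\notin C$, where C is given in (26). Using Definition 1, $\mathrm{prox}_{(c^{-1}\gamma\|\cdot\|_{TV})^*} = P_C$. Identifying $c^{-1}\gamma\|\cdot\|_{TV}$ with φ and $(c^{-1}\gamma\|\cdot\|_{TV})^*$ with $\varphi^*$, equation (27) leads to (ii)¹. From (24)-(25) we easily find that
$$\mathrm{rprox}_{\gamma\Phi}(x) = \big(\mathrm{Id} - 2W\circ P_C\circ\tilde W\big)(x). \qquad (28)$$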
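The Moreau decomposition (27), together with the fact that the conjugate of a norm is the indicator of its dual-norm ball, can be checked numerically in finite dimension. As a simplified stand-in for $c^{-1}\gamma\|\cdot\|_{TV}$ (our choice, not the TV term itself), the sketch uses φ = t·Σ_j |z_j|, the sum of pointwise Euclidean norms of a 2-vector field: its prox is pointwise group soft-thresholding, and $\mathrm{prox}_{\varphi^*}$ is the pointwise projection onto {|z_j| ≤ t}, the discrete analogue of $P_{B^t_\infty}$.

```python
import numpy as np


def group_soft(z, t):
    """prox of t * sum_j |z_j| for a field z of shape (2, ...): shrink each 2-vector."""
    norm = np.sqrt(np.sum(z**2, axis=0))
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norm, 1e-300))
    return z * scale


def proj_ball(z, t):
    """prox of the conjugate (indicator of {|z_j| <= t}): pointwise projection."""
    norm = np.sqrt(np.sum(z**2, axis=0))
    return z / np.maximum(norm / t, 1.0)


rng = np.random.default_rng(3)
z = rng.normal(size=(2, 8, 8))
t = 0.7
recombined = group_soft(z, t) + proj_ball(z, t)   # Moreau decomposition (27)
```

Vectors inside the ball are reproduced entirely by the projection, while vectors outside are split into a shrunken part and a part of length exactly t, so the two pieces always add back to z.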
Calculation of the projection $P_C$ in (25) on a discrete grid. In this case, W is an M×N tight frame with M = #I and N = #Ω, and assumption (w2) reads $\tilde W W = \mathrm{Id}$ and $\tilde W = c^{-1}W^T$, c ∈ (0, ∞) (hence $W^TW = c\,\mathrm{Id}$). The discrete counterpart of $\mathcal X$ is $\mathcal X = \ell^2(\Omega)\times\ell^2(\Omega)$. We denote the discrete gradient by $\ddot\nabla$ (cf. [6] or [29]), and the discrete divergence $\mathrm{Div} : \mathcal X\to\ell^2(\Omega)$ is defined as $\mathrm{Div} = -\ddot\nabla^T$. Moreover, C in (26) admits a simpler expression:
$$C = \big\{\mathrm{Div}(z)\in\ell^2(\Omega)\;\big|\;z\in B^{\gamma/c}_\infty(\mathcal X)\big\}, \qquad (29)$$
where $B^{\gamma/c}_\infty(\mathcal X)$ is defined using the new discrete notations. The projection $P_C$ in (25) does not admit an explicit form, so we provide an iterative scheme for its calculation in the next lemma.
Lemma 3. We adapt all assumptions of Lemma 2 to the new discrete setting, as explained above. Consider the forward-backward iteration
$$z^{(t+1)} = P_{B^1_\infty(\mathcal X)}\Big(z^{(t)} + \beta_t\,\ddot\nabla\big(\mathrm{Div}(z^{(t)}) - cu/\gamma\big)\Big) \qquad (30)$$
for
$$0 < \inf_t\beta_t\le\sup_t\beta_t < 1/4, \qquad (31)$$
where ∀(i, j) ∈ Ω,
$$P_{B^1_\infty(\mathcal X)}(z)[i,j] = \begin{cases}z[i,j] & \text{if } |z[i,j]|\le 1,\\ z[i,j]/|z[i,j]| & \text{otherwise}.\end{cases} \qquad (32)$$
¹ Note that our argument (27) to compute $\mathrm{prox}_{c^{-1}\gamma\|\cdot\|_{TV}}(u)$ is not used in [6], which instead uses conjugates and bi-conjugates of the objective function.
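Then
(i) $(z^{(t)})_{t\in\mathbb N}$ converges to a point $\hat z\in B^1_\infty(\mathcal X)$;
(ii) $\big(c^{-1}\gamma\,\mathrm{Div}(z^{(t)})\big)_{t\in\mathbb N}$ converges to $c^{-1}\gamma\,\mathrm{Div}(\hat z) = \big(\mathrm{Id} - \mathrm{prox}_{c^{-1}\gamma\|\cdot\|_{TV}}\big)(u)$.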
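A NumPy sketch of the inner iteration (30)-(32) follows, together with the resulting proximity operator of the TV-norm from Lemma 2(ii) and Lemma 3(ii). It assumes the common forward-difference gradient with zero differences past the last row and column (one admissible choice for $\ddot\nabla$, in the spirit of [6]), with Div built to satisfy Div = −∇̈ᵀ; `lam` stands for $c^{-1}\gamma$, and β = 0.24 matches the value used in the experiments.

```python
import numpy as np


def grad(u):
    """Forward differences; the difference past the last row/column is set to 0."""
    g = np.zeros((2,) + u.shape)
    g[0, :-1, :] = u[1:, :] - u[:-1, :]
    g[1, :, :-1] = u[:, 1:] - u[:, :-1]
    return g


def div(p):
    """Discrete divergence, built so that Div = -grad^T."""
    d = np.zeros(p.shape[1:])
    d[:-1, :] += p[0, :-1, :]
    d[1:, :] -= p[0, :-1, :]
    d[:, :-1] += p[1, :, :-1]
    d[:, 1:] -= p[1, :, :-1]
    return d


def proj_unit_ball(z):
    """(32): pointwise projection onto {|z[i, j]| <= 1}."""
    return z / np.maximum(np.sqrt(z[0] ** 2 + z[1] ** 2), 1.0)


def prox_tv(u, lam, n_iter=200, beta=0.24):
    """prox_{lam ||.||_TV}(u) = u - P_C(u), with P_C(u) = lam * Div(z_hat) from (30)."""
    z = np.zeros((2,) + u.shape)
    for _ in range(n_iter):
        z = proj_unit_ball(z + beta * grad(div(z) - u / lam))   # (30), lam = gamma / c
    return u - lam * div(z)


def rof_energy(w, u, lam):
    """ROF energy 0.5 ||w - u||^2 + lam * TV(w), minimized by prox_tv(u, lam)."""
    g = grad(w)
    return 0.5 * np.sum((w - u) ** 2) + lam * np.sum(np.sqrt(g[0] ** 2 + g[1] ** 2))


rng = np.random.default_rng(4)
u = np.zeros((32, 32))
u[:, 16:] = 1.0
u += 0.1 * rng.normal(size=u.shape)
w = prox_tv(u, lam=0.5)
```

The bound β < 1/4 corresponds to the usual step-size restriction for this dual gradient scheme, since the squared norm of the discrete divergence is at most 8.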
The proof of this lemma can be found in our Report [15]. The iteration proposed in (30) to compute the proximity operator of the TV-norm is different from the projection algorithm of [6]. A similar iteration was proposed in [7] and in some other articles; the proof we gave is, however, simpler, as it uses known properties of proximity operators. Note that computing $\mathrm{prox}_{\|\cdot\|_{TV}}$ amounts to solving a discrete ROF denoising problem. Our iteration to solve this problem is one possibility among others, see e.g. the recent report [4]. A crucial property of the D-R scheme (22) is its robustness to numerical errors that may occur when computing the proximity operators $\mathrm{prox}_\Psi$ and $\mathrm{prox}_\Phi$, see [10]. More precisely, let $a_t\in\ell^2(I)$ be an error term that models the inexact computation of $\mathrm{prox}_{\gamma\Phi}$ in (24), as the latter is obtained through (30). If the sequences of error terms $(a_t)_{t\in\mathbb N}$ and step sizes $(\mu_t)_{t\in\mathbb N}$ in Theorem 2 obey $\sum_{t\in\mathbb N}\mu_t\|a_t\| < +\infty$, then the D-R algorithm (22) converges [10, Corollary 6.2]. In our experiments, using 200 inner iterations in (30) was sufficient to satisfy this requirement.
3.3 Bias Correction to Recover the Sought-After Image
Recall from (4) that $u_0 = \log S_0$, and set $\hat u = \tilde W\hat x^{(N_{DR})}$ as the estimator of $u_0$, where $N_{DR}$ is the number of D-R iterations in (22). Unfortunately, the estimator û is prone to bias, i.e. $\mathbb{E}[\hat u] = u_0 - b_{\hat u}$. A problem that classically arises in statistical estimation is how to correct such a bias. More important is how this bias affects the estimate after applying the inverse transformation, here the exponential. Our goal is then to ensure that the estimate Ŝ of the image satisfies $\mathbb{E}[\hat S] = S_0$. Expanding $e^{\hat u}$ in the neighborhood of $\mathbb{E}[\hat u]$, we have
$$\mathbb{E}\big[e^{\hat u}\big] = \exp(\mathbb{E}[\hat u])\big(1 + \mathrm{Var}[\hat u]/2 + R_2\big) = S_0\exp(-b_{\hat u})\big(1 + \mathrm{Var}[\hat u]/2 + R_2\big), \qquad (33)$$
where $R_2$ is the expectation of the Lagrange remainder in the Taylor series. One can observe that the posterior distribution of û is nearly symmetric, hence $R_2\approx 0$. Then $b_{\hat u}\approx\log(1 + \mathrm{Var}[\hat u]/2)$ ensures unbiasedness. Consequently, finite-sample (nearly) unbiased estimates of $u_0$ and $S_0$ are $\hat u + \log(1 + \mathrm{Var}[\hat u]/2)$ and $\exp(\hat u)(1 + \mathrm{Var}[\hat u]/2)$, respectively. $\mathrm{Var}[\hat u]$ can be reasonably estimated by $\psi_1(K)$, the variance of the noise n in (4) being given in (1). Thus, given the restored log-image û, our denoised image reads
$$\hat S = \exp(\hat u)\,\big(1 + \psi_1(K)/2\big). \qquad (34)$$
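The correction factor in (34) can be put in perspective with the noise statistics of Section 1. The check below is our own illustration, not the authors' derivation: assuming, as a simplification, that the restored log-image removes the fluctuations of n but retains its mean shift E[n] = ψ0(K) − log K < 0, one has exp(û) ≈ S0·exp(E[n]), and the sketch verifies numerically that the factor 1 + ψ1(K)/2 approximately cancels exp(E[n]).

```python
import numpy as np
from scipy.special import digamma, polygamma


def bias_corrected(u_hat, K):
    """(34): S_hat = exp(u_hat) * (1 + psi_1(K) / 2)."""
    return np.exp(u_hat) * (1.0 + polygamma(1, K) / 2.0)


S0 = np.array([10.0, 100.0, 200.0])
for K in (4, 10, 50):
    mean_shift = digamma(K) - np.log(K)      # E[n] for n = log(eta), eta ~ Gamma(K, 1/K)
    u_hat = np.log(S0) + mean_shift          # idealized restoration keeping only the shift
    ratio = bias_corrected(u_hat, K) / S0    # close to 1 if (34) compensates the shift
    print(K, ratio)
```

The agreement improves as K grows, consistent with the Taylor-expansion argument behind (33), which becomes more accurate as the variance ψ1(K) shrinks.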
4 Full Algorithm to Suppress Multiplicative Noise
Piecing together Lemmas 1 and 2, and Theorem 2, we write down the full multiplicative noise removal algorithm:
Task: Denoise an image S corrupted with multiplicative noise according to (2).
Parameters: The observed noisy image S; the numbers of iterations $N_{DR}$ (Douglas-Rachford outer iterations) and $N_{FB}$ (forward-backward inner iterations); step sizes $\mu_t\in(0, 2)$, $0 < \beta_t < 1/4$ and γ > 0; the tight-frame transform W and the initial threshold T (e.g. $T = 2\sqrt{\psi_1(K)}$); the regularization parameters $\lambda_{0,1}$ associated with the sets $I_{0,1}$.
Specific operators:
(a) $T^S_{\gamma\lambda_i}(z) = \big(\max(0, |z[i]| - \gamma\lambda_i)\,\mathrm{sign}(z[i])\big)_{i\in I}$, $\forall z\in\mathbb{R}^{\#I}$.
(b) $\forall(i, j)\in\Omega$, $P_{B^1_\infty(\mathcal X)}(z)[i,j] = z[i,j]$ if $|z[i,j]|\le 1$, and $z[i,j]/|z[i,j]|$ otherwise.
(c) $\ddot\nabla$ and Div: the discrete versions of the continuous operators ∇ and div.
(d) $\psi_1(\cdot)$ defined according to (1) (built-in Matlab function).
Initialization: Compute v = log S and the transform coefficients y = Wv. Hard-threshold y at T to get $y_{T_H}$. Choose $x^{(0)}$.
Main iteration: For t = 1 to $N_{DR}$,
(1) Inverse curvelet transform of $x^{(t)}$: $u^{(t)} = \tilde W x^{(t)}$.
(2) Initialize $z^{(0)}$; for s = 0 to $N_{FB} - 1$: $z^{(s+1)} = P_{B^1_\infty(\mathcal X)}\big(z^{(s)} + \beta_t\,\ddot\nabla(\mathrm{Div}(z^{(s)}) - c\,u^{(t)}/\gamma)\big)$.
(3) Set $z^{(t)} = z^{(N_{FB})}$ and compute $w^{(t)} = c^{-1}\gamma\,\mathrm{Div}(z^{(t)})$.
(4) Forward curvelet transform: $\alpha^{(t)} = W w^{(t)}$.
(5) Compute $r^{(t)} = \mathrm{rprox}_{\gamma\Phi}(x^{(t)}) = x^{(t)} - 2\alpha^{(t)}$.
(6) Find $q^{(t)} = \mathrm{rprox}_{\gamma\Psi}\circ\mathrm{rprox}_{\gamma\Phi}(x^{(t)}) = 2\big(y_{T_H}[i] + T^S_{\gamma\lambda_i}(r^{(t)}[i] - y_{T_H}[i])\big)_{i\in I} - r^{(t)}$.
(7) Update $x^{(t+1)} = (1 - \mu_t/2)\,x^{(t)} + (\mu_t/2)\,q^{(t)}$.
Output: Denoised image $\hat S = \exp\big(\tilde W x^{(N_{DR})}\big)\,\big(1 + \psi_1(K)/2\big)$.
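For readers who want to experiment, the listing above can be turned into a compact NumPy sketch. Important assumptions: an orthonormal 2-D DCT replaces the curvelet tight frame (so c = 1 and W̃ = Wᵀ), the set $I_*$ is a hypothetical `n_low`-by-`n_low` block of low-frequency DCT coefficients, the λ values `lam0` and `lam1` are hypothetical defaults, and $x^{(0)} = y_{T_H}$. Following Theorem 2, the sketch returns $\exp(\tilde W\,\mathrm{prox}_{\gamma\Phi}(x^{(N_{DR})}))(1 + \psi_1(K)/2)$, whereas the listing applies W̃ directly to $x^{(N_{DR})}$.

```python
import numpy as np
from scipy.fft import dctn, idctn
from scipy.special import polygamma


def W(u):
    return dctn(u, norm="ortho")      # stand-in tight frame (c = 1)


def Wt(x):
    return idctn(x, norm="ortho")     # W-tilde = W^T / c


def grad(u):
    g = np.zeros((2,) + u.shape)
    g[0, :-1, :] = u[1:, :] - u[:-1, :]
    g[1, :, :-1] = u[:, 1:] - u[:, :-1]
    return g


def div(p):
    """Discrete divergence with Div = -grad^T."""
    d = np.zeros(p.shape[1:])
    d[:-1, :] += p[0, :-1, :]
    d[1:, :] -= p[0, :-1, :]
    d[:, :-1] += p[1, :, :-1]
    d[:, 1:] -= p[1, :, :-1]
    return d


def proj_C(u, lam, n_fb, beta):
    """Steps (2)-(3): P_C(u) = lam * Div(z), z from the inner iteration (30)."""
    z = np.zeros((2,) + u.shape)
    for _ in range(n_fb):
        z = z + beta * grad(div(z) - u / lam)
        z = z / np.maximum(np.sqrt(z[0] ** 2 + z[1] ** 2), 1.0)   # projection (32)
    return lam * div(z)


def soft(z, thr):
    return np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)


def denoise_multiplicative(S, K, lam0=1.0, lam1=0.5, gamma=10.0, mu=1.0,
                           n_dr=50, n_fb=200, beta=0.24, n_low=4):
    c = 1.0
    # Initialization: log-data, coefficients, hard thresholding (12)-(14).
    y = W(np.log(S))
    low = np.zeros(y.shape, dtype=bool)
    low[:n_low, :n_low] = True                        # hypothetical I_*
    T = 2.0 * np.sqrt(polygamma(1, K))
    I1 = (np.abs(y) > T) | low
    y_th = np.where(I1, y, 0.0)
    lam = np.where(I1, lam1, lam0)
    x = y_th.copy()
    # Main D-R iteration, steps (1)-(7).
    for _ in range(n_dr):
        alpha = W(proj_C(Wt(x), gamma / c, n_fb, beta))       # steps (1)-(4)
        r = x - 2.0 * alpha                                   # step (5)
        q = 2.0 * (y_th + soft(r - y_th, gamma * lam)) - r    # step (6)
        x = (1.0 - mu / 2.0) * x + (mu / 2.0) * q             # step (7)
    x_hat = x - W(proj_C(Wt(x), gamma / c, n_fb, beta))       # prox_{gamma Phi}(x), (24)
    return np.exp(Wt(x_hat)) * (1.0 + polygamma(1, K) / 2.0)  # bias correction (34)


rng = np.random.default_rng(5)
S0 = np.full((32, 32), 60.0)
S0[8:24, 8:24] = 180.0
S = S0 * rng.gamma(shape=10, scale=0.1, size=S0.shape)
S_hat = denoise_multiplicative(S, K=10, n_dr=10)
```

On a constant image every coefficient except the DC term is (numerically) zero, the TV projection vanishes, and the sketch simply returns the constant multiplied by the bias-correction factor; this provides an easy consistency check.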
5 Experiments
In all experiments, our algorithm was run using the second-generation curvelet tight frame along with the following set of parameters: ∀t, μt ≡ 1, βt = 0.24, γ = 10 and $N_{DR}$ = 50. The initial threshold T was set to $2\sqrt{\psi_1(K)}$. For comparison purposes, some very recent multiplicative noise removal algorithms from the literature are considered: the AA algorithm [2], minimizing the criterion in (10), and the Stein-block denoising method [8] in the curvelet domain, applied to the log-transformed image. The latter is a sophisticated shrinkage-based denoiser that thresholds the coefficients by blocks rather than individually, and has been shown to be nearly minimax over a large class of images in the presence of various additive bounded noises. We also tried the L2-TV method, where the restored log-image û minimizes (11) and the denoised image Ŝ involves the bias correction (34). Thanks to the bias correction, it can be seen as an improved version of the first method proposed in the recent Report [26, § 4.1]. For a fair comparison, the hyperparameters of all competitors were tuned to reach their best level of performance on each noisy realization.
The denoising algorithms were tested on two images, Lena and Boat, both of size 256×256 with gray levels in the range [1, 256]. For each image, a noisy observation is generated by multiplying the original image by a realization of noise according to (2)-(3) with K = 10. Our denoising method runs in 1 minute 3 seconds for 50 iterations on an Intel Core Duo 2.5 GHz. The denoising performance of each algorithm is measured in terms of the peak signal-to-noise ratio (PSNR) and the mean absolute error (MAE), namely $\mathrm{PSNR} = 20\log_{10}\big(\sqrt{N}\,\|S_0\|_\infty / \|\hat S - S_0\|_2\big)$ dB and $\mathrm{MAE} = \|\hat S - S_0\|_1 / N$, where N is the number of pixels.
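The two quality measures can be computed as follows. This is a minimal NumPy sketch, assuming that N is the number of pixels and that the norms are taken over the whole image; with this normalization the PSNR equals $20\log_{10}(\|S_0\|_\infty/\mathrm{RMSE})$.

```python
import numpy as np


def psnr(s_hat, s0):
    """PSNR = 20 log10( sqrt(N) * ||S0||_inf / ||S_hat - S0||_2 ), in dB."""
    n = s0.size                                    # number of pixels N
    err = np.linalg.norm((s_hat - s0).ravel())     # l2 norm of the error image
    return 20.0 * np.log10(np.sqrt(n) * np.max(np.abs(s0)) / err)


def mae(s_hat, s0):
    """MAE = ||S_hat - S0||_1 / N."""
    return np.sum(np.abs(s_hat - s0)) / s0.size


# Toy check: a constant image with a uniform error of 1 gray level.
s0 = np.full((4, 4), 100.0)
s_hat = s0 + 1.0
print(psnr(s_hat, s0), mae(s_hat, s0))  # about 40 dB and 1.0
```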
The results are depicted in Figs. 3 and 4. Note that the AA algorithm tends to over-regularize the solution. Our denoiser clearly outperforms its competitors.
References
1. Achim, A., Tsakalides, P., Bezerianos, A.: SAR image denoising via Bayesian wavelet shrinkage based on heavy-tailed modeling. IEEE Trans. Geosci. Remote Sens. 41(8), 1773–1784 (2003)
2. Aubert, G., Aujol, J.-F.: A variational approach to remove multiplicative noise. SIAM J. on Applied Mathematics 68(4), 925–946 (2008)
3. Aubert, G., Kornprobst, P.: Mathematical Problems in Image Processing, 2nd edn. Springer, Berlin (2006)
4. Aujol, J.-F.: Some algorithms for total variation based image restoration. Report CMLA 2008-05 (2008)
5. Candès, E.J., Guo, F.: New multiscale transforms, minimum total variation synthesis: Applications to edge-preserving image reconstruction. Signal Processing 82 (2002)
6. Chambolle, A.: An algorithm for total variation minimization and applications. J. of Mathematical Imaging and Vision 20(1) (2004)
7. Chambolle, A.: Total variation minimization and a class of binary MRF models. In: Rangarajan, A., Vemuri, B.C., Yuille, A.L. (eds.) EMMCVPR 2005. LNCS, vol. 3757, pp. 136–152. Springer, Heidelberg (2005)
8. Chesneau, C., Fadili, J., Starck, J.-L.: Stein block thresholding for image denoising. Technical report (2008)
9. Coifman, R.R., Sowa, A.: Combining the calculus of variations and wavelets for image enhancement. Applied and Computational Harmonic Analysis 9 (2000)
10. Combettes, P.L.: Solving monotone inclusions via compositions of nonexpansive averaged operators. Optimization 53(5) (2004)
11. Combettes, P.L., Pesquet, J.-C.: A Douglas-Rachford splitting approach to nonsmooth convex variational signal recovery. IEEE J. of Selected Topics in Signal Processing 1(4), 564–574 (2007)
12. Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
13. Durand, S., Nikolova, M.: Denoising of frame coefficients using l1 data-fidelity term and edge-preserving regularization. SIAM J. on Multiscale Modeling and Simulation 6(2), 547–576 (2007)
14. Durand, S., Froment, J.: Reconstruction of wavelet coefficients using total variation minimization. SIAM J. on Scientific Computing 24(5), 1754–1767 (2003)
S. Durand, J. Fadili, and M. Nikolova
15. Durand, S., Fadili, J., Nikolova, M.: Multiplicative noise removal using L1 fidelity on frame coefficients. Report CMLA n.2008-40 (2008)
16. Fukuda, S., Hirosawa, H.: Suppression of speckle in synthetic aperture radar images using wavelet. Int. J. Remote Sens. 19(3), 507–519 (1998)
17. Krissian, K., Westin, C.-F., Kikinis, R., Vosburgh, K.G.: Oriented speckle reducing anisotropic diffusion. IEEE Trans. on Image Processing 16(5), 1412–1424 (2007)
18. Ma, J., Plonka, G.: Combined curvelet shrinkage and nonlinear anisotropic diffusion. IEEE Trans. on Image Processing 16(9), 2198–2206 (2007)
19. Malgouyres, F.: Mathematical analysis of a model which combines total variation and wavelet for image restoration. J. of Information Processes 2(1), 1–10 (2002)
20. Moreau, J.-J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. CRAS Sér. A Math
21. Nikolova, M.: Minimizers of cost-functions involving nonsmooth data-fidelity terms. Application to the processing of outliers. SIAM J. on Numerical Analysis 40(3), 965–994 (2002)
22. Nikolova, M.: Weakly constrained minimization. Application to the estimation of images and signals involving constant regions. J. of Mathematical Imaging and Vision 21(2), 155–175 (2004)
23. Pizurica, A., Wink, A.M., Vansteenkiste, E., Philips, W., Roerdink, J.B.T.M.: A review of wavelet denoising in MRI and ultrasound brain imaging. Current Medical Imaging Reviews 2(2), 247–260 (2006)
24. Rudin, L., Lions, P.-L., Osher, S.: Multiplicative denoising and deblurring: Theory and algorithms. In: Osher, S., Paragios, N. (eds.), pp. 103–119. Springer, Heidelberg (2003)
25. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
26. Shi, J., Osher, S.: A nonlinear inverse scale space method for a convex multiplicative noise model. In: UCLA 2007 (2007)
27. Ulaby, F., Dobson, M.C.: Handbook of Radar Scattering Statistics for Terrain. Artech House, Norwood (1989)
28. Walessa, M., Datcu, M.: Model-based despeckling and information extraction from SAR images. IEEE Trans. Geosci. Remote Sens. 38(9), 2258–2269 (2000)
29. Welk, M., Steidl, G., Weickert, J.: Locally analytic schemes: A link between diffusion filtering and wavelet shrinkage. Applied and Computational Harmonic Analysis 24, 195–224 (2008)
30. Xie, H., Pierce, L.E., Ulaby, F.T.: SAR speckle reduction using wavelet denoising and Markov random field modeling. IEEE Trans. Geosci. Remote Sensing 40(10), 2196–2212 (2002)
31. Yu, Y., Acton, S.T.: Speckle reducing anisotropic diffusion. IEEE Trans. on Image Processing 11(11), 1260–1270 (2002)
Projected Gradient Based Color Image Decomposition
Vincent Duval¹, Jean-François Aujol², and Luminita Vese³
¹ Institut TELECOM, TELECOM ParisTech, CNRS UMR 5141
² CMLA, ENS Cachan, CNRS, UniverSud
³ UCLA, Mathematics Department
Abstract. This work deals with color image processing, with a focus on color image decomposition. The problem of image decomposition consists in splitting an original image f into two components u and v = f − u. u contains the geometric information of the original image, while v is made of the oscillating patterns of f, such as textures. We propose a numerical scheme based on a projected gradient algorithm to compute the solution of various decomposition models for color images or vector-valued images. A direct convergence proof of the scheme is provided, and some analysis of color texture modeling is given.
Keywords: Color image decomposition, projected gradient algorithm, color texture modeling.
1
Introduction
Total variation regularization was introduced almost 20 years ago for image restoration in the seminal work by Rudin et al. [1]. It has since become a popular and widely used tool in image processing (see for instance [2, 3] and references therein). If we denote by f the original image, the problem we are interested in consists in minimizing energies of the type:
$\int |Du| + \mu\|f - u\|_T^k$. (1)
Here $\int |Du|$ is the total variation of u; we simply have $\int |Du| = \int |\nabla u|\,dx$ when u is regular. $\|\cdot\|_T$ stands for a norm which favors the noise and/or the textures of the original image f (in the sense that it is small for such features), and k is a positive exponent. The most basic choice for $\|\cdot\|_T$ is the $L^2$ norm with k = 2. However, inspired by the book by Y. Meyer [4], and also motivated by the work of Mumford and Gidas [5], other spaces have been considered for modeling natural images and oscillating patterns such as textures or noise. [4] has inspired many works, e.g. [6, 7, 8, 9, 10, 11, 12, 13, 14], to name a few. Image decomposition consists in splitting an original image f into two components, u and v = f − u. u contains the geometrical component of the original image (it can be seen as a sketch of the original image), while v is made of the oscillatory component (when the original image f is noise free, v is the texture component).
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 295–306, 2009. © Springer-Verlag Berlin Heidelberg 2009
In this work, we are concerned with color image processing. While some authors deal with color images using a Riemannian framework, like G. Sapiro and D. L. Ringach [15] or N. Sochen et al. [16], others combine a functional analysis viewpoint with the Chromaticity-Brightness representation [17]. The model we use is more basic: it is the same as the one used in [18] (and related to [19]). Its advantage is that it has a rich functional analysis interpretation. Note that in [20], the authors also propose a cartoon + texture color decomposition and denoising model inspired by Y. Meyer [4], using vectorial versions of the total variation and approximations of the space G(Ω) for textures (to be defined later); unlike the work presented here, they use Euler-Lagrange equations and a gradient descent scheme for the minimization. Here, we give some insight into the definition of a texture space for color images. In [21], a TV-Hilbert model was proposed for image restoration and/or decomposition:
$\int |Du| + \mu\|f - u\|_H^2$ (2)
where $\|\cdot\|_H$ stands for the norm of some Hilbert space H. This is a particular case of problem (1). Thanks to the Hilbert structure of H, different methods can be used to minimize (2), such as a projection algorithm [21]. We extend (2) to the case of color images. From a numerical point of view, (1) is not straightforward to minimize. Depending on the choice of $\|\cdot\|_T$, the minimization of (1) can be quite challenging. Even in the simplest case, when $\|\cdot\|_T$ is the $L^2$ norm and k = 2, the total variation term $\int |Du|$ must be handled with care. The most classical approach consists in writing the Euler-Lagrange equation associated with problem (1). In [1], a fixed-step gradient descent scheme is used to compute the solution.
This method has the advantage of being very easy to implement, but the disadvantage of being quite slow. To improve the convergence speed, quasi-Newton methods have been proposed [22]. Duality-based schemes have also drawn a lot of attention for solving (1): first by Chan, Golub and Mulet in [23], later by A. Chambolle in [24] with a projection algorithm. This projection algorithm has recently been extended to the case of color images in [18]. It has been shown that graph-cut based algorithms can also be used [25, 26]. It is also shown in [27, 28] that Nesterov's schemes provide fast algorithms for minimizing (1). Another variant of Chambolle's projection algorithm [24] is to use a projected gradient algorithm [25, 28, 29]. We use this approach here, since it is both easy to implement and quite efficient. The plan of the paper is the following. In Sect. 2, we define and provide some analysis of the spaces we consider in the paper. In Sect. 3, we extend the TV-Hilbert model originally introduced in [21] to the case of color images. In Sect. 4, we present a projected gradient algorithm to compute a minimizer of problem (2). This projected gradient algorithm was first proposed by A. Chambolle
in [25] for total variation regularization. A proof of convergence was given in [28] relying on optimization results by Bermudez and Moreno [30]. We derive here a simple and direct proof of convergence. In Sect. 5, we apply this scheme to solve various classical denoising and decomposition problems. We illustrate our approach with many numerical examples.
2
Definitions and Properties of the Considered Color Spaces
In this section, we introduce some notation, and we provide some analysis of the function spaces we consider to model color textures.
2.1
Introduction
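Let Ω be a Lipschitz convex bounded open set in $\mathbb{R}^2$. We model color images as $\mathbb{R}^M$-valued functions defined on Ω. The inner product in $L^2(\Omega,\mathbb{R}^M)$ is denoted by $\langle u, v\rangle_{L^2(\Omega,\mathbb{R}^M)} = \int_\Omega \sum_{i=1}^M u_i v_i$. For a vector $\xi \in \mathbb{R}^M$, we define the norms:
$|\xi|_1 = \sum_{i=1}^M |\xi_i|$, $|\xi|_2 = \sqrt{\sum_{i=1}^M \xi_i^2}$, $|\xi|_\infty = \max_{i=1,\dots,M} |\xi_i|$.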
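We say that a function $f \in L^1(\Omega,\mathbb{R}^M)$ has bounded variation if the following quantity is finite:
$|f|_{TV} = \sup_{\xi\in B} \langle f, \operatorname{div}\xi\rangle_{L^2(\Omega,\mathbb{R}^M)}$, with $B = \{\xi \in C_c^1(\Omega,\mathbb{R}^{2\times M}) : \forall x \in \Omega,\ |\xi(x)|_2 \le 1\}$. (3)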
This quantity is called the total variation. For more information on its properties, we refer the reader to [3]. The set of functions with bounded variation is a vector space classically denoted by $BV(\Omega,\mathbb{R}^M)$. For f smooth enough, the total variation of f is $|f|_{TV} = \int_\Omega \sqrt{\sum_{i=1}^M |\nabla f_i|^2}\,dx$. Other choices of the set B are possible (see [18] for a discussion); they are mathematically equivalent and define the same BV space. In image processing practice, however, it is crucial to couple the channels as in (3) in order to avoid visual artifacts.
2.2
The Color G(Ω) Space
The $G(\mathbb{R}^2)$ space was introduced by Y. Meyer in [4] to model textures in grayscale images. For the generalization to color images, we adopt the framework of [8]; the color space G(Ω) is also used in [20], as a generalization of [6] to color image decomposition and color image denoising. Definition 1. The space G(Ω) is defined by:
$G(\Omega) = \{v \in L^2(\Omega,\mathbb{R}^M) : \exists\,\xi \in L^\infty(\Omega,(\mathbb{R}^2)^M),\ \forall i = 1,\dots,M,\ v_i = \operatorname{div}\xi_i \text{ and } \xi_i\cdot N = 0 \text{ on } \partial\Omega\}$
(where $\xi_i\cdot N$ refers to the normal trace of $\xi_i$ over ∂Ω). One can endow it with the norm:
$\|v\|_G = \inf\{\|\xi\|_\infty : \forall i = 1,\dots,M,\ v_i = \operatorname{div}\xi_i,\ \xi_i\cdot N = 0 \text{ on } \partial\Omega\}$, with $\|\xi\|_\infty = \operatorname{ess\,sup}\sqrt{\sum_{i=1}^M |\xi_i|^2}$.
The following result was proved in [9] for grayscale images: it characterizes G(Ω). Working component by component, it is straightforward to extend it to color images (see [31]).
Proposition 1. $G(\Omega) = \{v \in L^2(\Omega,\mathbb{R}^M) : \int_\Omega v = 0\}$.
Remark 1. The topology induced by the G-norm on G(Ω) is coarser than the one induced by the $L^2$ norm. For $m \in \mathbb{N}^*$, consider the sequence defined on $(-\pi,\pi)^2$ by $f_m^{(k)}(x,y) = \cos mx + \cos my$ for all $k = 1,\dots,M$. The vector field $\xi^{(k)} = \big(\frac{1}{m}\sin(mx), \frac{1}{m}\sin(my)\big)$ satisfies the boundary condition, and its divergence is equal to $f_m^{(k)}$. As a consequence, $\|f_m\|_G \le \sqrt{2M}/m$ and $\lim_{m\to+\infty}\|f_m\|_G = 0$. Yet it is easy to see that $\|f_m\|^2_{L^2(\Omega,\mathbb{R}^M)} = 4M\pi^2$. The sequence $f_m$ converges to 0 for the topology induced by the G-norm, but not for the one induced by the $L^2$ norm.
More generally, oscillating patterns with zero mean have a small G norm (see [4] for more details).
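Remark 1 can be checked numerically. The sketch below, a discretization of our own choosing (an n × n midpoint grid on $(-\pi,\pi)^2$, with M = 3 channels as for RGB), shows that the squared $L^2$ norm of $f_m$ stays at $4M\pi^2$ while the sup norm of the explicit field $\xi$, which bounds $\|f_m\|_G$, decays like $\sqrt{2M}/m$. It does not compute the G-norm itself, which is an infimum over all admissible fields.

```python
import numpy as np


def remark1_quantities(m, M=3, n=512):
    """Return (squared L2 norm of f_m over M channels, sup norm of xi) on an
    n x n midpoint grid of (-pi, pi)^2."""
    h = 2.0 * np.pi / n
    t = -np.pi + h * (np.arange(n) + 0.5)            # cell midpoints
    x, y = np.meshgrid(t, t, indexing="ij")
    f = np.cos(m * x) + np.cos(m * y)                 # every channel of f_m
    l2_sq = M * np.sum(f ** 2) * h * h                # midpoint quadrature, M channels
    # |xi^(k)|^2 = (sin^2(mx) + sin^2(my)) / m^2, identical for all M channels
    xi_sq = (np.sin(m * x) ** 2 + np.sin(m * y) ** 2) / m ** 2
    xi_sup = np.sqrt(M * np.max(xi_sq))               # sup_x sqrt(sum_k |xi^(k)|^2)
    return l2_sq, xi_sup


M = 3
for m in (1, 4, 16, 64):
    l2_sq, xi_sup = remark1_quantities(m, M)
    print(m, l2_sq / (4 * M * np.pi ** 2), xi_sup, np.sqrt(2 * M) / m)
```

The ratio in the second column stays at 1, whereas the last two columns shrink as m grows.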
3
Color TV-Hilbert Model: Presentation and Mathematical Analysis
3.1
Presentation
The TV-Hilbert framework was introduced for grayscale images by J.-F. Aujol and G. Gilboa in [21] as a way to approximate the BV-G model. They prove that Chambolle's algorithm can be extended to this model. In this section we show that this is still true for color images. We are interested in solving the following problem:
$\inf_u |u|_{TV} + \frac{1}{2\lambda}\|f - u\|_H^2$ (4)
where H is the space of zero-mean functions of $L^2(\Omega,\mathbb{R}^M)$, regarded as a Hilbert space endowed with the norm $\|v\|_H^2 = \langle v, Kv\rangle_{L^2(\Omega,\mathbb{R}^M)}$. Here we assume that $K : H \to L^2(\Omega,\mathbb{R}^M)$ is a symmetric positive definite, bounded linear operator (for the topology induced by the $L^2(\Omega,\mathbb{R}^M)$ norm on H) and that $K^{-1}$ is bounded on Im(K).
Example 1 (The Rudin-Osher-Fatemi model). It was proposed in [1] for grayscale images, then extended to color images using different methods (e.g. [15] or [19]). In [18], the authors use another kind of color total variation, which is the one we use in this paper. The idea is to minimize the functional:
$|u|_{TV} + \frac{1}{2\lambda}\|f - u\|^2_{L^2(\Omega,\mathbb{R}^M)}$. (5)
Without loss of generality, we can assume that f has zero mean. Then this model becomes a particular case of the TV-Hilbert model with K = Id.
Example 2 (The OSV model). In [7], S. Osher, A. Solé and L. Vese propose to model textures by the $H^{-1}$ space. In order to generalize this model, we introduce the following functional:
$\inf_u |u|_{TV} + \frac{1}{2\lambda}\int_\Omega |\nabla\Delta^{-1}(f - u)|^2$ (6)
where $\Delta^{-1}v = (\Delta^{-1}v_1, \dots, \Delta^{-1}v_M)^T$, $\nabla\rho = (\nabla\rho_1, \dots, \nabla\rho_M)^T$, $|\nabla\rho|^2 = \sum_{j=1}^M |\nabla\rho_j|^2$ and
$\int_\Omega |\nabla\Delta^{-1}(f - u)|^2 = \int_\Omega \sum_{i=1}^M |\nabla\Delta^{-1}(f_i - u_i)|^2 = \langle f - u, -\Delta^{-1}(f - u)\rangle_{L^2(\Omega,\mathbb{R}^M)}$.
For $K = -\Delta^{-1}$, the Osher-Solé-Vese problem is a particular case of the TV-Hilbert framework. We also refer to L. Lieu and L. Vese [14] for more general $(BV, H^{-s})$ models, as particular cases of the TV-Hilbert formulation.
3.2
Mathematical Study
For $f \in L^2(\Omega,\mathbb{R}^M)$, the existence and uniqueness of the minimizer u of (4) can be proved using standard methods (see [3]). Now, let us introduce the notation v = f − u, where u is the minimizer of (4). Following Y. Meyer's steps, one can extend the result proposed in [4] for grayscale images (see [31] for a detailed proof):
Theorem 1 (Characterization of minimizers). Let $f \in L^2(\Omega,\mathbb{R}^M)$.
(i) If $\|Kf\|_G \le \lambda$, then the solution of the TV-Hilbert problem is given by (u, v) = (0, f).
(ii) If $\|Kf\|_G > \lambda$, then the solution (u, v) is characterized by: $\|Kv\|_G = \lambda$ and $\langle u, Kv\rangle_{L^2(\Omega,\mathbb{R}^M)} = \lambda|u|_{TV}$.
For λ > 0, the set $G_\lambda = \{v \in L^2(\Omega,\mathbb{R}^M) : \|v\|_G \le \lambda\}$ is a closed convex set, and so is $K^{-1}G_\lambda$. The orthogonal projection onto this set is therefore well-defined, and Theorem 1 can be reformulated as:
$v = P^H_{K^{-1}G_\lambda}(f)$, $u = f - v$.
That is, v is the orthogonal projection of f onto the set $K^{-1}G_\lambda$. Therefore, the problem is equivalent to its dual formulation, with $v = \lambda K^{-1}\operatorname{div} p$:
$\inf_{|p|\le 1} \|\lambda K^{-1}\operatorname{div} p - f\|_H^2$. (7)
4
Projected Gradient Algorithm
We present here a projection algorithm for solving this dual formulation, inspired by [24, 18], and we provide a complete proof of convergence of this scheme.
4.1
Discrete Setting
From now on, we work in the discrete case, using the following conventions. A grayscale image is a matrix of size N × N. We write $X = \mathbb{R}^{N\times N}$ for the space of grayscale images. Their gradients belong to the space Y = X × X. The $L^2$ inner product is $\langle u, v\rangle_X = \sum_{1\le i,j\le N} u_{i,j}v_{i,j}$. For the gradient and divergence operators on grayscale images, we use the same discretizations as in [24]. A color image is an element of $X^M$ and its gradient belongs to $Y^M$. The gradient and the divergence are defined component by component, so that the color divergence is still the opposite of the adjoint of the color gradient. Notice that in this framework, we have $\|\nabla\|^2 = \|\operatorname{div}\|^2 \le 8$ (see [24]).
4.2
Projected Gradient
It was recently noticed ([25], [28]) that problem (7) for grayscale images can be solved using a projected gradient descent. This is the algorithm we extend to the case of color images. Let $B = \{v \in Y^M : \forall\, 1 \le i,j \le N,\ |v_{i,j}|_2 \le 1\}$ be the discrete version of our set of test functions. The orthogonal projection onto B is easily computed: $P_B(x) = \Big(\frac{x_1}{\max\{1,|x|_2\}}, \frac{x_2}{\max\{1,|x|_2\}}\Big)$. The projected gradient descent scheme is defined by:
$p^{m+1} = P_B\big(p^m + \tau\nabla(K^{-1}\operatorname{div} p^m - f/\lambda)\big)$ (8)
which amounts to:
$p^{m+1}_{i,j} = \dfrac{p^m_{i,j} + \tau\,\nabla(K^{-1}\operatorname{div} p^m - f/\lambda)_{i,j}}{\max\big\{1, \big|p^m_{i,j} + \tau\,\nabla(K^{-1}\operatorname{div} p^m - f/\lambda)_{i,j}\big|_2\big\}}$. (9)
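Scheme (9) is short to implement. The following NumPy sketch is our own illustration, not the authors' code; it assumes color images stored as arrays of shape (M, N, N), the forward-difference gradient and matching divergence of [24] applied channel by channel, K = Id (the color ROF case, i.e. scheme (11)), and τ = 0.24, which satisfies the bound τ < 1/(4‖K⁻¹‖) = 1/4 of Proposition 2. The projection $P_B$ couples the channels by normalizing the whole 2M-vector at each pixel.

```python
import numpy as np


def grad(u):
    """Forward differences per channel; u: (M, N, N) -> (M, 2, N, N).
    The x-component vanishes on the last row, the y-component on the last column."""
    g = np.zeros((u.shape[0], 2) + u.shape[1:])
    g[:, 0, :-1, :] = u[:, 1:, :] - u[:, :-1, :]
    g[:, 1, :, :-1] = u[:, :, 1:] - u[:, :, :-1]
    return g


def div(p):
    """Discrete divergence of p (shape (M, 2, N, N)) such that div = -grad^*."""
    px, py = p[:, 0], p[:, 1]
    d = np.zeros(px.shape)
    d[:, 0, :] = px[:, 0, :]
    d[:, 1:-1, :] = px[:, 1:-1, :] - px[:, :-2, :]
    d[:, -1, :] = -px[:, -2, :]
    d[:, :, 0] += py[:, :, 0]
    d[:, :, 1:-1] += py[:, :, 1:-1] - py[:, :, :-2]
    d[:, :, -1] -= py[:, :, -2]
    return d


def project_B(p):
    """P_B: divide each pixel's 2M-vector by max(1, |p_ij|_2) (channels coupled)."""
    norm = np.sqrt(np.sum(p ** 2, axis=(0, 1)))
    return p / np.maximum(1.0, norm)


def color_rof(f, lam, tau=0.24, n_iter=200):
    """Scheme (9) with K = Id. Returns (u, v, p) with v = lam * div p, u = f - v."""
    p = np.zeros((f.shape[0], 2) + f.shape[1:])
    for _ in range(n_iter):
        p = project_B(p + tau * grad(div(p) - f / lam))
    v = lam * div(p)
    return f - v, v, p


def dual_objective(p, f, lam):
    """||lam * div p - f||^2, the quantity minimized in (7) when K = Id."""
    return np.sum((lam * div(p) - f) ** 2)


rng = np.random.default_rng(0)
f = rng.normal(size=(3, 32, 32))      # toy 3-channel image
u, v, p = color_rof(f, lam=0.5)
```

Since τ is below 2/‖div‖², each projected step cannot increase the dual objective, and v = λ div p always has zero mean in each channel, in agreement with Proposition 1.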
Since the functional is not elliptic, the standard proof of convergence of the projected gradient algorithm (see [32] for instance) needs to be adapted to this particular case.
Proposition 2. If $0 < \tau < \frac{1}{4\|K^{-1}\|}$, then algorithm (9) converges. More precisely, there exists $\tilde p \in B$ such that
$\lim_{m\to\infty} K^{-1}\operatorname{div} p^m = K^{-1}\operatorname{div}\tilde p$ and $\|\lambda K^{-1}\operatorname{div}\tilde p - f\|_H^2 = \inf_{p\in B}\|\lambda K^{-1}\operatorname{div} p - f\|_H^2$.
Proof. We only give here a sketch of the proof. Let us first notice that p is a minimizer iff p ∈ B and, for all q ∈ B and all τ > 0, $\langle q - p,\ p - \big(p + \tau\nabla(K^{-1}\operatorname{div} p - f/\lambda)\big)\rangle_{L^2} \ge 0$, or equivalently $p = P_B\big(p + \tau\nabla(K^{-1}\operatorname{div} p - f/\lambda)\big)$, where $P_B$ is the orthogonal projection onto B with respect to the $L^2$ inner product. Let p be such a minimizer.
• Now let us consider a sequence defined by (8), and write $A = -\nabla K^{-1}\operatorname{div}$. Since $P_B$ is 1-Lipschitz [32], we have $\|p^{k+1} - p\|^2 \le \|(I - \tau A)(p^k - p)\|^2$. Provided $\|I - \tau A\| \le 1$, we can deduce:
$\|p^{k+1} - p\| \le \|p^k - p\|$ (10)
and the sequence $(\|p^k - p\|)$ is convergent.
• A is a symmetric positive semi-definite operator. Writing E = ker A and F = Im A, we have $Y^M = E \overset{\perp}{\oplus} F$, and we can decompose any $q \in Y^M$ as the sum of two orthogonal components $q_E \in E$ and $q_F \in F$. Notice that, by injectivity of $K^{-1}$, E is actually equal to the kernel of the divergence operator. Let $\mu_1 = 0 < \mu_2 \le \dots \le \mu_a$ be the ordered eigenvalues of A. Then $\|I - \tau A\| = \max(|1 - \tau\mu_1|, |1 - \tau\mu_a|) = 1$ for $0 \le \tau \le 2/\mu_a$. We can restrict $I - \tau A$ to F and define $g(\tau) = \|(I - \tau A)|_F\| < 1$ for $0 < \tau < 2/\mu_a$.
• Now assume that $0 < \tau < 2/\mu_a$. Then inequality (10) holds, the sequence $(p^k)$ is bounded, and so is the sequence $(K^{-1}\operatorname{div} p^k)$. We are going to prove that $(K^{-1}\operatorname{div} p^k)$ has a unique cluster point. Let $(K^{-1}\operatorname{div} p^{\varphi(k)})$ be a convergent subsequence. By extraction, one can assume that $p^{\varphi(k)}$ is convergent too; denote by $\tilde p$ its limit. Passing to the limit in (8), the sequence $(p^{\varphi(k)+1})$ converges towards $\hat p = P_B\big(\tilde p + \tau\nabla(K^{-1}\operatorname{div}\tilde p - f/\lambda)\big)$. Using (10), we also notice that $\|\tilde p - p\| = \|\hat p - p\|$. As a consequence:
$\|\tilde p - p\|^2 = \big\|P_B\big(\tilde p + \tau\nabla(K^{-1}\operatorname{div}\tilde p - f/\lambda)\big) - P_B\big(p + \tau\nabla(K^{-1}\operatorname{div} p - f/\lambda)\big)\big\|^2 \le \|(I - \tau A)(\tilde p - p)\|^2 \le \|(\tilde p - p)_E\|^2 + g(\tau)^2\|(\tilde p - p)_F\|^2 < \|\tilde p - p\|^2$ if $(\tilde p - p)_F \ne 0$.
This last strict inequality cannot hold, which means that $(\tilde p - p)_F = 0$. Hence $\tilde p - p \in E = \ker A$ and $K^{-1}\operatorname{div}\tilde p = K^{-1}\operatorname{div} p$: the sequence $(K^{-1}\operatorname{div} p^k)$ is convergent.
• Since $\|\operatorname{div}\|^2 = \|\nabla\|^2 \le 8$ (see [24]), we conclude by noticing that $\mu_a \le 8\|K^{-1}\|$, so that $\tau < \frac{1}{4\|K^{-1}\|}$ implies $\tau < 2/\mu_a$.
Since we are only interested in $v = \lambda K^{-1}\operatorname{div} p$, Proposition 2 justifies the validity of algorithm (8). We can actually prove that the sequence $(p^m)$ defined by (8) converges (see [31], Corollary 4.1).
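The last step of the proof of Proposition 2 relies on the bound $\mu_a \le 8\|K^{-1}\|$. For a single channel and K = Id, $\mu_a$ is the largest eigenvalue of $A = -\nabla\operatorname{div}$, and the following sketch estimates it by power iteration with a Rayleigh quotient on an assumed 32 × 32 grid, using the same forward-difference discretization as [24]. This is an illustration, not part of the original paper; the color case with K = Id has the same spectrum, since gradient and divergence act channel by channel.

```python
import numpy as np


def grad(u):
    """Forward differences with zero on the last row/column; u: (N, N) -> (2, N, N)."""
    g = np.zeros((2,) + u.shape)
    g[0, :-1, :] = u[1:, :] - u[:-1, :]
    g[1, :, :-1] = u[:, 1:] - u[:, :-1]
    return g


def div(p):
    """Discrete divergence of p (shape (2, N, N)) satisfying div = -grad^*."""
    d = np.zeros(p.shape[1:])
    d[0, :] = p[0, 0, :]
    d[1:-1, :] = p[0, 1:-1, :] - p[0, :-2, :]
    d[-1, :] = -p[0, -2, :]
    d[:, 0] += p[1, :, 0]
    d[:, 1:-1] += p[1, :, 1:-1] - p[1, :, :-2]
    d[:, -1] -= p[1, :, -2]
    return d


def largest_eigenvalue_A(n=32, n_iter=500, seed=0):
    """Power iteration for mu_a, the largest eigenvalue of A = -grad div (K = Id)."""
    rng = np.random.default_rng(seed)
    p = rng.normal(size=(2, n, n))
    p /= np.linalg.norm(p)
    mu = 0.0
    for _ in range(n_iter):
        q = -grad(div(p))               # q = A p
        mu = float(np.sum(p * q))       # Rayleigh quotient <p, A p> with |p| = 1
        p = q / np.linalg.norm(q)
    return mu


mu_a = largest_eigenvalue_A()
print(mu_a, 2.0 / mu_a)   # mu_a just below 8, so tau = 0.24 < 1/4 <= 2/mu_a
```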
5
Applications to Color Image Denoising and Decomposition
In this last section, we apply the projected gradient algorithm to solve various color image problems.
5.1
TV-Hilbert Model
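The Color ROF Model. As an application of (9), we use the following scheme for the ROF model (5):
$p^{m+1}_{i,j} = \dfrac{p^m_{i,j} + \tau\,\nabla(\operatorname{div} p^m - f/\lambda)_{i,j}}{\max\big\{1, \big|p^m_{i,j} + \tau\,\nabla(\operatorname{div} p^m - f/\lambda)_{i,j}\big|_2\big\}}$. (11)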
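The Color OSV Model. For the OSV model (6), we use:
$p^{m+1}_{i,j} = \dfrac{p^m_{i,j} - \tau\,\nabla(\Delta\operatorname{div} p^m + f/\lambda)_{i,j}}{\max\big\{1, \big|p^m_{i,j} - \tau\,\nabla(\Delta\operatorname{div} p^m + f/\lambda)_{i,j}\big|_2\big\}}$. (12)
5.2
The Color A2BC Algorithm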
Following Y. Meyer [4], one can use the G(Ω) space to model textures and try to solve the problem $\inf_u \big(|u|_{BV} + \alpha\|f - u\|_G\big)$. In [8], the authors approximate this problem by minimizing the following functional:
$F_{\mu,\lambda}(u,v) = |u|_{BV} + \frac{1}{2\lambda}\|f - u - v\|^2_{L^2(\Omega)} + \chi_{G_\mu}(v)$, (13)
with $\chi_{G_\mu}(v) = 0$ if $v \in G_\mu$ and $+\infty$ otherwise.
Following [8, 17, 18], it is straightforward to extend the A2BC algorithm using the projection algorithm. We initialize with $u^0 = v^0 = 0$ and then compute iteratively until convergence¹:
$v^{n+1} = P_{G_\mu}(f - u^n)$ and $u^{n+1} = f - v^{n+1} - P_{G_\lambda}(f - v^{n+1})$.
5.3
The Color TV-L1 Model
The TV-L1 model is very popular for grayscale images. It benefits from both good theoretical properties (it is a morphological filter) and fast algorithms (see [26]). In order to extend it to color images, we consider the problem $\inf_u |u|_{TV} + \lambda\|f - u\|_1$, with the notation $\|u\|_1 = \int_\Omega \sqrt{\sum_{l=1}^M |u_l|^2}$. As for the A2BC algorithm, we are led to consider the approximation, for α > 0:
$\inf_{u,v} |u|_{BV} + \frac{1}{2\alpha}\|f - u - v\|_2^2 + \lambda\|v\|_1$.
¹ The proof of convergence of this algorithm is the same as the one in [8].
Fig. 1. From left to right: original and noisy images (WG, PSNR = 57.3 dB), denoised with color ROF (λ = 25, PSNR= 74.2 dB) and with color OSV (λ = 25, PSNR= 74.1 dB)
Fig. 2. Cartoon-texture decomposition using color A2BC algorithm (upper row) and color TVL1 (lower row). On top, the original image.
In order to generalize the TV-L1 algorithm proposed by Aujol et al. [33], we solve the alternating minimization problems:
$\inf_u |u|_{BV} + \frac{1}{2\alpha}\|f - u - v\|_2^2$ and $\inf_v \frac{1}{2\alpha}\|f - u - v\|_2^2 + \lambda\|v\|_1$.
Fig. 3. From left to right: original and noisy images (using salt and pepper noise, PSNR= 34.6 dB), denoised with color TVL1 (PSNR= 67.5 dB) and noise part
The first problem is a Rudin-Osher-Fatemi problem; scheme (9) with K = Id is well adapted to solving it. The second one can be solved by a "vectorial soft thresholding":
Proposition 3. The solution of the second problem is given by:
$v(x) = VT_{\alpha\lambda}\big(f(x) - u(x)\big) = \dfrac{f(x) - u(x)}{|f(x) - u(x)|_2}\,\max\big(|f(x) - u(x)|_2 - \alpha\lambda,\ 0\big)$ a.e.
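The pixelwise formula of Proposition 3 couples the M channels through the Euclidean norm $|\cdot|_2$: the color vector at each pixel is shrunk along its own direction rather than channel by channel. A minimal NumPy sketch, assuming a channel-first array of shape (M, N1, N2) and taking the convention $VT(0) = 0$; calling it with `thr` = αλ on w = f − u gives v.

```python
import numpy as np


def vectorial_soft_threshold(w, thr):
    """VT_thr applied pixelwise to w of shape (M, N1, N2):
    w(x) / |w(x)|_2 * max(|w(x)|_2 - thr, 0), with VT_thr(0) = 0."""
    norm = np.sqrt(np.sum(w ** 2, axis=0))            # |w(x)|_2 over the M channels
    scale = np.maximum(norm - thr, 0.0) / np.maximum(norm, np.finfo(float).tiny)
    return w * scale


# Two pixels of a 2-channel image.
w = np.zeros((2, 1, 2))
w[:, 0, 0] = [3.0, 4.0]   # |w|_2 = 5 > thr: shrunk to length 4, same direction
w[:, 0, 1] = [0.3, 0.4]   # |w|_2 = 0.5 <= thr: set to zero
v = vectorial_soft_threshold(w, thr=1.0)
```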
The proof of Proposition 3 is given in [31]. Therefore, we propose to generalize the TV-L1 algorithm by initializing with $u^0 = v^0 = 0$, then computing iteratively until convergence (the proof of convergence is the same as the one in [33]):
$v^{n+1} = VT_{\alpha\lambda}(f - u^n)$ and $u^{n+1} = f - v^{n+1} - P_{G_\alpha}(f - v^{n+1})$.
5.4
Numerical Experiments
Figure 1 displays denoising results using the ROF (5) and OSV (6) models. The images look very similar, but since the OSV model penalizes the highest frequencies much more than the ROF model does [33], the OSV-denoised image still shows the lowest frequencies of the noise. The convergence speed for the ROF model is roughly the same as with the Bresson-Chan algorithm (see [18], [31]). Figure 2 displays a cartoon-texture decomposition experiment using different kinds of texture, computed with the A2BC and TV-L1 algorithms. Both results look good. In Figure 3, a denoising experiment was performed using salt-and-pepper noise. The denoised picture looks quite good, and surprisingly better than the original image! This is because the picture we used had some compression artifacts that the algorithm removed.
Acknowledgements This work has been supported by the French "Agence Nationale de la Recherche" (ANR), under grant FREEDOM (ANR07-JCJC-0048-01), "Films, REstauration Et DOnnées Manquantes", and by the National Science Foundation under Grants DMS-0312222 and DMS-0714945. Part of this work was done while the first author was visiting the Department of Mathematics, UCLA.
References
1. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
2. Chan, T., Shen, J.: Image Processing and Analysis: Variational, PDE, Wavelet, and Stochastic Methods. SIAM Publisher, Philadelphia (2005)
3. Aubert, G., Kornprobst, P.: Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations. Applied Mathematical Sciences, vol. 147. Springer, Heidelberg (2001)
4. Meyer, Y.: Oscillating patterns in image processing and nonlinear evolution equations. In: The fifteenth Dean Jacqueline B. Lewis memorial lectures. University Lecture Series, vol. 22. American Mathematical Society, Providence, RI (2001)
5. Mumford, D., Gidas, B.: Stochastic models for generic images. Quarterly of Applied Mathematics LIV(1) (2001)
6. Vese, L., Osher, S.J.: Modeling textures with total variation minimization and oscillating patterns in image processing. Journal of Scientific Computing 19(1-3), 553–572 (2003)
7. Osher, S., Solé, A., Vese, L.: Image decomposition and restoration using total variation minimization and the $H^{-1}$ norm. SIAM Journal on Multiscale Modeling and Simulation 1(3), 349–370 (2003)
8. Aujol, J.F., Aubert, G., Blanc-Féraud, L., Chambolle, A.: Image decomposition into a bounded variation component and an oscillating component. Journal of Mathematical Imaging and Vision 22(1), 71–88 (2005)
9. Aubert, G., Aujol, J.: Modeling very oscillating signals. Application to image processing. Applied Mathematics and Optimization 51(2), 163–182 (2005)
10. Aujol, J.F., Chambolle, A.: Dual norms and image decomposition models. International Journal of Computer Vision 63(1), 85–104 (2005)
11. Yin, W., Goldfarb, D., Osher, S.: A comparison of three total variation based texture extraction models. Journal of Visual Communication and Image Representation 18(3), 240–252 (2007)
12. Garnett, J., Jones, P., Le, T., Vese, L.: Modeling oscillatory components with the homogeneous spaces $BMO^{-\alpha}$ and $W^{-\alpha,p}$. Pure and Applied Mathematics Quarterly (to appear)
13. Le, T., Vese, L.: Image decomposition using total variation and div(BMO). Multiscale Modeling and Simulation, SIAM Interdisciplinary Journal 4(2), 390–423 (2005)
14. Lieu, L., Vese, L.: Image restoration and decomposition via bounded total variation and negative Hilbert-Sobolev spaces. Applied Mathematics & Optimization 58, 167–193 (2008)
15. Sapiro, G., Ringach, D.L.: Anisotropic diffusion of multivalued images with applications to color filtering. IEEE Transactions on Image Processing 5(11), 1582–1586 (1996)
16. Sochen, N., Kimmel, R., Malladi, R.: A general framework for low level vision. IEEE Transactions on Image Processing 7(3), 310–318 (1998)
17. Aujol, J.F., Kang, S.H.: Color image decomposition and restoration. Journal of Visual Communication and Image Representation 17(4), 916–928 (2006)
18. Bresson, X., Chan, T.: Fast minimization of the vectorial total variation norm and applications to color image processing. Inverse Problems and Imaging (IPI) (accepted) (2007)
19. Blomgren, P., Chan, T.: Color TV: Total variation methods for restoration of vector valued images. IEEE Transactions on Image Processing 7(3), 304–309 (1998)
20. Vese, L., Osher, S.: Color texture modeling and color image decomposition in a variational-PDE approach. In: Proceedings of the Eighth International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC 2006), pp. 103–110. IEEE, Los Alamitos (2006)
21. Aujol, J., Gilboa, G.: Constrained and SNR-based solutions for TV-Hilbert space image denoising. Journal of Mathematical Imaging and Vision 26(1-2), 217–237 (2006)
22. Vogel, C.: Computational Methods for Inverse Problems. Frontiers in Applied Mathematics, vol. 23. SIAM, Philadelphia (2002)
23. Chan, T., Golub, G., Mulet, P.: A nonlinear primal-dual method for total variation-based image restoration. SIAM Journal on Scientific Computing 20(6), 1964–1977 (1999)
24. Chambolle, A.: An algorithm for total variation minimization and its applications. JMIV 20, 89–97 (2004)
25. Chambolle, A.: Total variation minimization and a class of binary MRF models. In: Rangarajan, A., Vemuri, B.C., Yuille, A.L. (eds.) EMMCVPR 2005. LNCS, vol. 3757, pp. 136–152. Springer, Heidelberg (2005)
26. Darbon, J., Sigelle, M.: Image restoration with discrete constrained total variation part I: Fast and exact optimization. Journal of Mathematical Imaging and Vision 26(3), 277–291 (2006)
27. Weiss, P., Aubert, G., Blanc-Féraud, L.: Efficient schemes for total variation minimization under constraints in image processing. SIAM Journal on Scientific Computing (to appear) (2007)
28. Aujol, J.: Some algorithms for total variation based image restoration. CMLA Preprint 2008-05 (2008), http://hal.archives-ouvertes.fr/hal-00260494/en/
29. Zhu, M., Wright, S., Chan, T.: Duality-based algorithms for total variation image restoration. UCLA CAM Report 08-33 (May 2008)
30. Bermudez, A., Moreno, C.: Duality methods for solving variational inequalities. Comp. and Maths. with Appls. 7(1), 43–58 (1981)
31. Duval, V., Aujol, J.F., Vese, L.: A projected gradient algorithm for color image decomposition. Technical report, UCLA, CAM Report 08-40 (2008)
32. Ciarlet, P.G.: Introduction à l'Analyse Numérique Matricielle et à l'Optimisation. Dunod (1998)
33. Aujol, J., Gilboa, G., Chan, T., Osher, S.: Structure-texture image decomposition: modeling, algorithms, and parameter selection. International Journal of Computer Vision 67(1), 111–136 (2006)
A Dual Formulation of the TV-Stokes Algorithm for Image Denoising
Christoffer A. Elo¹, Alexander Malyshev¹, and Talal Rahman²
¹ Department of Mathematics, University of Bergen, Johannes Bruns gate 12, 5007 Bergen, Norway
² Bergen University College, Faculty of Engineering, Nygårdsgaten 112, 5020 Bergen
Abstract. We propose a fast algorithm for image denoising, based on a dual formulation of a recent denoising model involving the total variation minimization of the tangential vector field under the incompressibility condition, which states that the tangential vector field should be divergence free. The model turns noisy images into smooth and visually pleasant ones and preserves the edges quite well. While the original TV-Stokes algorithm, based on the primal formulation, is extremely slow, our new dual algorithm drastically improves the computational speed and achieves the same denoising quality. Numerical experiments are provided to demonstrate the practical efficiency of our algorithm.
1
Introduction
We suppose that the observed image $d_0(x,y)$, $(x,y) \in \Omega \subset \mathbb{R}^2$, is an original image d(x, y) perturbed by an additive noise η:
$d_0 = d + \eta$. (1)
The problem of recovering the image $d$ from the noisy image $d_0$ is an inverse problem that is often solved by variational methods using total variation (TV) minimization. The corresponding Euler equation, which is a set of nonlinear partial differential equations, is typically solved by applying a gradient-descent method to a finite difference approximation of these equations. A classical total variation denoising model is the primal formulation due to Rudin, Osher and Fatemi [1] (the ROF model):
$$\min_d \|\nabla d\|_{L^1} + \frac{\lambda}{2}\|d - d_0\|_{L^2}^2. \tag{2}$$
The parameter $\lambda > 0$ can be chosen, e.g., to approximately fulfill the condition $\|d - d_0\|_{L^2} \le \sigma$, where $\sigma$ is an estimate of $\|\eta\|_{L^2}$. The Euler equation $-\operatorname{div}(\nabla d/|\nabla d|) + \lambda(d - d_0) = 0$ is usually replaced by a regularized one,
$$-\operatorname{div}\left(\frac{\nabla d}{|\nabla d|_\beta}\right) + \lambda(d - d_0) = 0, \tag{3}$$
where $|\nabla d|_\beta = \sqrt{|\nabla d|^2 + \beta^2}$ is a necessary regularization, since images contain flat areas where $|\nabla d| = \sqrt{d_x^2 + d_y^2} \approx 0$. When solving (3) numerically, an explicit time-marching scheme with an artificial time variable $t$ is typically used. However, such an algorithm is rather slow, because convergence requires severely restricted, small time steps. It is well known that the ROF model suffers from the so-called staircase effect, which is a disadvantage when denoising images with affine regions. To overcome this defect, we consider a two-step approach, where the fourth-order model studied in [2, 3, 4] is decoupled into two second-order problems. Such methods are known to overcome the staircase effect, but tend to have computational difficulties due to very large condition numbers. The authors of [5, 6] used the same two-step approach as in [7], but, adopting ideas from [8, 9], they proposed to preserve the divergence-free condition on the tangential vector field. Recall that the tangential vector field $\tau$ is orthogonal to the normal (gradient) vector field $n$ of the image $d$:
$$n = \nabla d = (d_x, d_y)^T, \qquad \tau = \nabla^\perp d = (-d_y, d_x)^T. \tag{4}$$

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 307–318, 2009. © Springer-Verlag Berlin Heidelberg 2009
Hence $\operatorname{div}\tau = 0$. The first step of the TV-Stokes algorithm computes the tangential vector field $\tau_0 = \nabla^\perp d_0$ of a given noisy image $d_0$ and smooths it by solving the minimization problem
$$\min_\tau \|\nabla\tau\|_{L^1} + \frac{1}{2\delta}\|\tau - \tau_0\|_{L^2}^2 \quad \text{subject to } \operatorname{div}\tau = 0, \tag{5}$$
where $\delta > 0$ is a carefully chosen parameter. Once a smoothed tangential vector field $\tau$ is obtained, the second step reconstructs the image $d$ by fitting it to the normal vector field, that is, by solving the minimization problem
$$\min_d \|\nabla d\|_{L^1} - \left(\frac{n}{|n|}, \nabla d\right)_{L^2} \quad \text{subject to } \|d - d_0\|_{L^2} = \sigma, \tag{6}$$
where $\sigma$ is an estimate of $\|\eta\|_{L^2}$. In [5] the minimization problems (5) and (6) are solved numerically by an explicit time-marching scheme, while existence and uniqueness are proven for the modified TV-Stokes model in [6]. The TV-Stokes approach results in an algorithm that does not suffer from the staircase effect and preserves edges, and the denoised images look visually pleasant. However, the TV-Stokes algorithm from [5] converges extremely slowly and is therefore practically unusable, as demonstrated in the last section of the present paper. We adopt the TV-Stokes denoising model but reduce the primal formulation presented above to the so-called dual formulation, which is then solved numerically by a variant of Chambolle's fast iteration [10]. The reduction uses the orthogonal projector $\Pi_K$ onto the subspace $K = \{\tau : \operatorname{div}\tau = 0\}$ to eliminate the divergence-free constraint.
2 The TV-Stokes Denoising Algorithm in Dual Formulation
To overcome difficulties with non-differentiability in the primal formulation, Carter [11], Chambolle [10] and Chan, Golub and Mulet [12] have proposed dual formulations of the ROF model, where a dual variable $p = (p_1(x, y), p_2(x, y))$ is used to express the total variation:
$$\|\nabla d\|_{L^1} = \max_p \left\{ (d, \operatorname{div} p)_{L^2} : |p_j(x, y)| \le 1 \ \forall (x, y) \in \Omega,\ j = 1, 2 \right\}. \tag{7}$$
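For the scalar ROF model (2), the dual representation (7) leads to $d = d_0 - \lambda^{-1}\operatorname{div} p$, where $p$ minimizes $\|\operatorname{div} p - \lambda d_0\|_{L^2}$ subject to a pointwise bound on $p$. This is the variant of [10] discussed next. The sketch below is a minimal pure-Python illustration that rests on several assumptions: unit grid spacing, forward differences with zero flux at the last row and column, Chambolle's Euclidean constraint $|p(x,y)| \le 1$ (rather than the componentwise bound written in (7)), and the step $\Delta t = 1/8$ for which [10] proves convergence. The function names are hypothetical and not taken from the paper.

```python
import math
import random


def grad(u):
    """Forward differences (unit spacing); zero at the last column (x) and last row (y)."""
    ny, nx = len(u), len(u[0])
    gx = [[u[i][j + 1] - u[i][j] if j < nx - 1 else 0.0 for j in range(nx)] for i in range(ny)]
    gy = [[u[i + 1][j] - u[i][j] if i < ny - 1 else 0.0 for j in range(nx)] for i in range(ny)]
    return gx, gy


def div(px, py):
    """Discrete divergence defined as the negative adjoint of grad."""
    ny, nx = len(px), len(px[0])
    out = [[0.0] * nx for _ in range(ny)]
    for i in range(ny):
        for j in range(nx):
            dx = (px[i][j] if j < nx - 1 else 0.0) - (px[i][j - 1] if j > 0 else 0.0)
            dy = (py[i][j] if i < ny - 1 else 0.0) - (py[i - 1][j] if i > 0 else 0.0)
            out[i][j] = dx + dy
    return out


def total_variation(u):
    """Isotropic discrete TV: sum of |grad u| over all pixels."""
    gx, gy = grad(u)
    return sum(math.hypot(a, b) for ra, rb in zip(gx, gy) for a, b in zip(ra, rb))


def rof_dual(d0, lam, dt=0.125, iters=200):
    """Chambolle-type iteration for min ||div p - lam*d0||, |p| <= 1; returns d = d0 - div(p)/lam."""
    ny, nx = len(d0), len(d0[0])
    px = [[0.0] * nx for _ in range(ny)]
    py = [[0.0] * nx for _ in range(ny)]
    for _ in range(iters):
        dv = div(px, py)
        w = [[dv[i][j] - lam * d0[i][j] for j in range(nx)] for i in range(ny)]
        gx, gy = grad(w)
        for i in range(ny):
            for j in range(nx):
                # Semi-implicit update: p <- (p + dt*g) / (1 + dt*|g|) keeps |p| <= 1.
                scale = 1.0 + dt * math.hypot(gx[i][j], gy[i][j])
                px[i][j] = (px[i][j] + dt * gx[i][j]) / scale
                py[i][j] = (py[i][j] + dt * gy[i][j]) / scale
    dv = div(px, py)
    d = [[d0[i][j] - dv[i][j] / lam for j in range(nx)] for i in range(ny)]
    return d, px, py


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]  # vertical step edge
    noisy = [[v + rng.uniform(-0.2, 0.2) for v in row] for row in clean]
    denoised, _, _ = rof_dual(noisy, lam=8.0)
    print("TV noisy:", total_variation(noisy), "TV denoised:", total_variation(denoised))
```

The update keeps $|p| \le 1$ automatically. Because the discrete divergence is the negative adjoint of the gradient, it sums to zero over the grid, so the mean gray value of $d$ equals that of $d_0$.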
For instance, a variant of the dual formulation from [10] consists in minimizing the distance $\|\operatorname{div} p - \lambda d_0\|_{L^2}$. In [10] Chambolle also proposed a fast iteration for solving this minimization problem that produces a denoised image after only a few steps. Below we show how to reduce the TV-Stokes model to a dual formulation.

2.1 Step 1
To derive a dual formulation of the first step we take advantage of the following analog of (7) for the total variation of the tangential vector field $\tau = (\tau_1, \tau_2)^T$:
$$\|\nabla\tau\|_{L^1} = \max_p \left\{ (\tau, \operatorname{div} p)_{L^2} : |p_i(x, y)| \le 1 \ \forall (x, y) \in \Omega,\ i = 1, 2 \right\}, \tag{8}$$
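where the dual variable $p$ is a pair of two rows, $p_1 = (p_{11}, p_{12})$ and $p_2 = (p_{21}, p_{22})$. The divergence is defined as follows: $\operatorname{div} p = (\operatorname{div} p_1, \operatorname{div} p_2)^T$, where
$$\operatorname{div} p_i = \frac{\partial p_{i1}}{\partial x} + \frac{\partial p_{i2}}{\partial y}, \quad i = 1, 2. \tag{9}$$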
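This definition is similar to the vectorial dual norm from [13] for vectorial images, e.g., color images. Plugging (8) into (5) yields
$$\min_{\operatorname{div}\tau = 0}\ \max_{|p_i| \le 1}\ (\tau, \operatorname{div} p)_{L^2} + \frac{1}{2\delta}(\tau - \tau_0, \tau - \tau_0)_{L^2}. \tag{10}$$
Results from convex analysis, see for instance Theorem 9.3-1 in [14], allow us to exchange the order of max and min in (10) and obtain an equivalent optimization problem
$$\max_{|p_i| \le 1}\ \min_{\operatorname{div}\tau = 0}\ (\tau, \operatorname{div} p)_{L^2} + \frac{1}{2\delta}(\tau - \tau_0, \tau - \tau_0)_{L^2}. \tag{11}$$
Now comes a trick. Let us introduce the orthogonal projection $\Pi_K$ onto the constrained subspace $K = \{\tau : \operatorname{div}\tau = 0\}$. Note that $\tau_0 \in K$. By means of the pseudoinverse $\Delta^+$ of the Laplacian we may write
$$\Pi_K \begin{pmatrix}\tau_1 \\ \tau_2\end{pmatrix} = \begin{pmatrix}\tau_1 \\ \tau_2\end{pmatrix} - \nabla\Delta^+ \operatorname{div}\begin{pmatrix}\tau_1 \\ \tau_2\end{pmatrix}. \tag{12}$$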
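The constraint $\operatorname{div}\tau = 0$ means that $\Pi_K\tau = \tau$, which implies the equalities $(\tau, \operatorname{div} p) = (\Pi_K\tau, \operatorname{div} p) = (\tau, \Pi_K \operatorname{div} p)$. Hence (11) is equivalent to
$$\max_{|p_i| \le 1}\ \min_{\operatorname{div}\tau = 0}\ (\tau, \Pi_K \operatorname{div} p)_{L^2} + \frac{1}{2\delta}(\tau - \tau_0, \tau - \tau_0)_{L^2}. \tag{13}$$
The solution to the minimization problem (without the constraint $\operatorname{div}\tau = 0$!)
$$\min_\tau\ (\tau, \Pi_K \operatorname{div} p)_{L^2} + \frac{1}{2\delta}(\tau - \tau_0, \tau - \tau_0)_{L^2}$$
is
$$\tau = \tau_0 - \delta\,\Pi_K \operatorname{div} p, \tag{14}$$
and it satisfies the constraint $\operatorname{div}\tau = 0$, because both $\tau_0$ and $\Pi_K \operatorname{div} p$ lie in $K$. Owing to (14) we have the equality
$$(\tau, \Pi_K \operatorname{div} p) + \frac{1}{2\delta}(\tau - \tau_0, \tau - \tau_0) = \frac{1}{2\delta}\left[(\tau_0, \tau_0) - (\delta\Pi_K \operatorname{div} p - \tau_0,\ \delta\Pi_K \operatorname{div} p - \tau_0)\right],$$
which together with (13) gives our dual formulation:
$$\min_p \left\{ \|\Pi_K \operatorname{div} p - \delta^{-1}\tau_0\|_{L^2} : |p_i| \le 1,\ i = 1, 2 \right\}. \tag{15}$$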
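The equality behind (15) can be checked numerically. In the sketch below, plain vectors stand in for discretized fields and the Euclidean inner product stands in for $(\cdot,\cdot)_{L^2}$. An arbitrary vector `w` plays the role of $\Pi_K \operatorname{div} p$. These substitutions are assumptions made only for illustration, and the helper names are hypothetical. The check confirms two things. First, $\tau = \tau_0 - \delta w$ minimizes the inner objective. Second, its minimum equals $\frac{1}{2\delta}\left[(\tau_0,\tau_0) - \|\delta w - \tau_0\|^2\right] = \frac{1}{2\delta}\|\tau_0\|^2 - \frac{\delta}{2}\|w - \delta^{-1}\tau_0\|^2$. Maximizing this value over $p$ is therefore the same as minimizing $\|w - \delta^{-1}\tau_0\|$, which is (15).

```python
import random


def inner(a, b):
    """Euclidean inner product, used here as a stand-in for the discrete L2 product."""
    return sum(x * y for x, y in zip(a, b))


def inner_objective(tau, tau0, w, delta):
    """(tau, w) + (1 / (2 delta)) (tau - tau0, tau - tau0); w plays the role of Pi_K div p."""
    diff = [t - t0 for t, t0 in zip(tau, tau0)]
    return inner(tau, w) + inner(diff, diff) / (2.0 * delta)


def minimizer(tau0, w, delta):
    """Closed-form minimizer (14): tau = tau0 - delta * w."""
    return [t0 - delta * wi for t0, wi in zip(tau0, w)]


def reduced_value(tau0, w, delta):
    """[(tau0, tau0) - (delta w - tau0, delta w - tau0)] / (2 delta)."""
    r = [delta * wi - t0 for wi, t0 in zip(w, tau0)]
    return (inner(tau0, tau0) - inner(r, r)) / (2.0 * delta)


if __name__ == "__main__":
    rng = random.Random(1)
    tau0 = [rng.uniform(-1.0, 1.0) for _ in range(10)]
    w = [rng.uniform(-1.0, 1.0) for _ in range(10)]
    tau = minimizer(tau0, w, 0.3)
    print(inner_objective(tau, tau0, w, 0.3), reduced_value(tau0, w, 0.3))
```

Any perturbation $e$ of the minimizer raises the objective by exactly $\|e\|^2/(2\delta)$, which is why no constraint is needed in the inner minimization.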
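The numerical solution of (15) is computed by Chambolle's iteration from [10]:
$$p^0 = 0, \qquad p^{n+1} = \frac{p^n + \Delta t\,\nabla\left(\Pi_K \operatorname{div} p^n - \delta^{-1}\tau_0\right)}{1 + \Delta t\,\left|\nabla\left(\Pi_K \operatorname{div} p^n - \delta^{-1}\tau_0\right)\right|}. \tag{16}$$
The iteration converges rapidly when $\Delta t \le 1/4$. The smoothed tangential field after $n$ iterations is given by $\tau^n = \tau_0 - \delta\,\Pi_K \operatorname{div} p^n$.

2.2 Step 2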
At the second step the image $d$ is reconstructed by fitting it to the normal vector field built from the tangential vector field computed at step 1, $(n_1, n_2) = (\tau_2, -\tau_1)$. Again we introduce a dual variable $r = (r_1(x, y), r_2(x, y))$ and use the formula $\|\nabla d\|_{L^1} = \max_{|r| \le 1}(\nabla d, -r)_{L^2}$. Then, since $(\nabla d, -q)_{L^2} = (d, \operatorname{div} q)_{L^2}$, the minimization problem (6) is equivalent to the problem
$$\min_d\ \max_{|r| \le 1}\ \left(d, \operatorname{div}\left(r + \frac{n}{|n|}\right)\right)_{L^2} + \frac{1}{2\mu}\|d - d_0\|_{L^2}^2, \tag{17}$$
where $\mu > 0$ is a Lagrange multiplier. After interchanging min and max in (17) we find the condition for attaining the minimum:
$$d = d_0 - \mu\,\operatorname{div}\left(r + \frac{n}{|n|}\right). \tag{18}$$
By analogy with (15) we can derive the dual formulation for step 2:
$$\min_r \left\{ \left\|\operatorname{div}\left(r + \frac{n}{|n|}\right) - \frac{d_0}{\mu}\right\|_{L^2} : |r| \le 1 \right\}. \tag{19}$$
Chambolle's iteration for (19) is as follows:
$$r^{n+1} = \frac{r^n + \Delta t\,\nabla\left(\operatorname{div}\left(r^n + \frac{n}{|n|}\right) - \mu^{-1}d_0\right)}{1 + \Delta t\,\left|\nabla\left(\operatorname{div}\left(r^n + \frac{n}{|n|}\right) - \mu^{-1}d_0\right)\right|}. \tag{20}$$

2.3 The Discrete Algorithm
The staggered grid is used for discretization as in [5]. For convenience we introduce the differentiation matrices
$$B = \frac{1}{h}\begin{pmatrix} -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \end{pmatrix}, \qquad -B^T = \frac{1}{h}\begin{pmatrix} 1 & & \\ -1 & \ddots & \\ & \ddots & 1 \\ & & -1 \end{pmatrix}, \tag{21}$$
where $B$ (of size $(N-1) \times N$) is the forward difference operator and $-B^T$ is the backward difference operator. The discrete gradient operator applied to a matrix $d$ is then defined as
$$\nabla^h d = \left(d B_x^T,\ B_y d\right), \tag{22}$$
where $B_x$ ($B_y$) stands for differentiation in the $x$ (resp. $y$) direction. The discrete divergence operator is given by
$$\operatorname{div}^h(p_1, p_2) = -p_1 B_x - B_y^T p_2. \tag{23}$$
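A quick way to check the discrete operators (21)-(23) is to verify two properties with respect to the Frobenius inner product. First, $\operatorname{div}^h$ should be the negative adjoint of $\nabla^h$, i.e. $\langle \nabla^h d, p\rangle = -\langle d, \operatorname{div}^h p\rangle$. Second, constants should have zero gradient. The sketch below is a minimal pure-Python version with dense matrices. It assumes image rows are indexed by $y$ and columns by $x$, and its helper names are hypothetical.

```python
def diff_matrix(n, h=1.0):
    """(n-1) x n forward-difference matrix B of (21)."""
    return [[(-1.0 if j == i else 1.0 if j == i + 1 else 0.0) / h for j in range(n)]
            for i in range(n - 1)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def grad_h(d, bx, by):
    """(22): grad^h d = (d Bx^T, By d); shapes ny x (nx-1) and (ny-1) x nx."""
    return matmul(d, transpose(bx)), matmul(by, d)


def div_h(p1, p2, bx, by):
    """(23): div^h (p1, p2) = -p1 Bx - By^T p2; result has the shape of d."""
    a, b = matmul(p1, bx), matmul(transpose(by), p2)
    return [[-x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def frob(a, b):
    """Frobenius inner product <a, b> = sum_ij a_ij b_ij."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

With these definitions, $\operatorname{div}^h\nabla^h d = -dB_x^TB_x - B_y^TB_yd$, which is the Laplacian used in the pseudoinverse construction that follows.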
The discrete analog of the projection operator $\Pi_K$ has the form
$$\Pi_K^h = I - \nabla^h (\Delta^h)^+ \operatorname{div}^h, \tag{24}$$
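where the gradient and divergence are applied in a slightly different manner:
$$\operatorname{div}^h\begin{pmatrix}\tau_1 \\ \tau_2\end{pmatrix} = -\tau_1 B_x^T - B_y\tau_2, \qquad \nabla^h d = \begin{pmatrix} d B_x \\ B_y^T d \end{pmatrix}. \tag{25}$$
To complete the definition (24) we need a description of the pseudoinverse operator $(\Delta^h)^+$ for the discrete Laplacian
$$\Delta^h d = -d B_x^T B_x - B_y^T B_y d. \tag{26}$$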
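With the staggering of (25), the tangential field used by Algorithm 1, $\tau^0 = (v^0, u^0) = (-B_y d_0,\ d_0 B_x^T)$, is exactly divergence free, since $\operatorname{div}^h\tau^0 = B_y d_0 B_x^T - B_y d_0 B_x^T = 0$. In addition, the (25) gradient is again the negative adjoint of the (25) divergence. The sketch below checks both facts numerically. It assumes $B_x$ and $B_y$ are the forward-difference matrices (21) of sizes $(N_x-1)\times N_x$ and $(N_y-1)\times N_y$, and its helper names are hypothetical.

```python
def diff_matrix(n, h=1.0):
    """(n-1) x n forward-difference matrix B of (21)."""
    return [[(-1.0 if j == i else 1.0 if j == i + 1 else 0.0) / h for j in range(n)]
            for i in range(n - 1)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def frob(a, b):
    """Frobenius inner product."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def tangent_field(d0, bx, by):
    """tau0 = (v0, u0) with v0 = -By d0 ((ny-1) x nx) and u0 = d0 Bx^T (ny x (nx-1))."""
    v0 = [[-x for x in row] for row in matmul(by, d0)]
    u0 = matmul(d0, transpose(bx))
    return v0, u0


def div_h25(t1, t2, bx, by):
    """(25): div^h (tau1, tau2) = -tau1 Bx^T - By tau2; result is (ny-1) x (nx-1)."""
    a, b = matmul(t1, transpose(bx)), matmul(by, t2)
    return [[-x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def grad_h25(phi, bx, by):
    """(25): grad^h phi = (phi Bx, By^T phi), mapping back onto the shapes of (tau1, tau2)."""
    return matmul(phi, bx), matmul(transpose(by), phi)
```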
Let us introduce the orthogonal $N \times N$ matrix of the Discrete Cosine Transform, $C$, which is defined by dct(eye(N)) in MATLAB. The symmetric matrix of the Discrete Sine Transform, $S$, defined in MATLAB by dst(eye(N-1)), satisfies the equation $S^T S = (N/2)\,I$, where $I$ is the identity matrix of order $N - 1$. We prefer to use the orthogonal symmetric matrix $-S/\sqrt{N/2}$, which we again denote by $S$. The singular value decomposition of $B$ has the form
$$B = S\,[0, \Sigma]\,C, \qquad \Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_{N-1}), \tag{27}$$
where the diagonal matrix $\Sigma$ has the diagonal entries
$$\sigma_k = \frac{2}{h}\sin\frac{\pi k}{2N}, \quad k = 1, 2, \ldots, N - 1. \tag{28}$$
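The factorization (27)-(28) can be verified column by column. Let $c_k$ be the $k$-th row of $C$ and $s_k$ the $k$-th column of the normalized sine matrix $-S_{\text{raw}}/\sqrt{N/2}$. Then (27) states that $Bc_0 = 0$ and $Bc_k = \sigma_k s_k$ for $k \ge 1$. The sketch makes two assumptions. It takes $C$ to be the orthonormal DCT-II matrix, and it takes the unnormalized DST matrix to have entries $\sin(\pi k m/N)$, $k, m = 1, \ldots, N-1$. These are the standard definitions behind MATLAB's dct and dst, but the code builds the vectors directly rather than calling MATLAB. Helper names are hypothetical.

```python
import math


def diff_matrix(n, h):
    """(n-1) x n forward-difference matrix B of (21)."""
    return [[(-1.0 if j == i else 1.0 if j == i + 1 else 0.0) / h for j in range(n)]
            for i in range(n - 1)]


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dct_row(n, k):
    """k-th row of the orthonormal DCT-II matrix C."""
    w = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    return [w * math.cos(math.pi * (2 * j + 1) * k / (2 * n)) for j in range(n)]


def dst_col(n, k):
    """k-th column of -S_raw / sqrt(n/2), where S_raw has entries sin(pi*k*m/n), m = 1..n-1."""
    return [-math.sqrt(2.0 / n) * math.sin(math.pi * k * m / n) for m in range(1, n)]


def sigma(n, h, k):
    """Singular value sigma_k from (28)."""
    return 2.0 / h * math.sin(math.pi * k / (2 * n))


def svd_residual(n, h):
    """Largest deviation from B c_0 = 0 and B c_k = sigma_k s_k (k >= 1)."""
    b = diff_matrix(n, h)
    worst = max(abs(v) for v in matvec(b, dct_row(n, 0)))
    for k in range(1, n):
        bc = matvec(b, dct_row(n, k))
        worst = max(worst, max(abs(x - sigma(n, h, k) * y) for x, y in zip(bc, dst_col(n, k))))
    return worst
```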
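By means of (27), equation (26) can be rewritten as
$$f = \Delta^h d = -d\,C^T\begin{pmatrix}0 & \\ & \Sigma_x^2\end{pmatrix}C - C^T\begin{pmatrix}0 & \\ & \Sigma_y^2\end{pmatrix}C\,d. \tag{29}$$
Denoting $\hat f = C f C^T$ and $\hat d = C d C^T$, we arrive at the equation
$$\hat f = -\hat d\begin{pmatrix}0 & \\ & \Sigma_x^2\end{pmatrix} - \begin{pmatrix}0 & \\ & \Sigma_y^2\end{pmatrix}\hat d. \tag{30}$$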
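This equation is easily solved with respect to $\hat d$. Suppose that the matrices $\hat f$ and $\hat d$ have the entries $\hat f_{ij}$ and $\hat d_{ij}$ for $i, j = 0, 1, \ldots$. Note that in our case $\hat f_{00} = 0$. Then the solution $\hat d = G(\hat f)$ is as follows:
$$\begin{aligned} \hat d_{00} &= 0, \\ \hat d_{i0} &= -\hat f_{i0}/\sigma_{i,y}^2, & i &= 1, 2, \ldots, \\ \hat d_{0j} &= -\hat f_{0j}/\sigma_{j,x}^2, & j &= 1, 2, \ldots, \\ \hat d_{ij} &= -\hat f_{ij}/(\sigma_{i,y}^2 + \sigma_{j,x}^2), & i, j &= 1, 2, \ldots. \end{aligned} \tag{31}$$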
Thus the pseudoinverse operator $(\Delta^h)^+$ can be computed efficiently with the help of the Discrete Cosine Transform:
$$(\Delta^h)^+ f = C^T G(C f C^T)\,C, \tag{32}$$
where the function $G$ is defined in (31). Finally, we recall that multiplication of an $N \times N$ matrix by $C$ or $C^T = C^{-1}$ is typically implemented by means of the fast Fourier transform and requires only $O(N^2 \log_2 N)$ arithmetic operations. All other computations have the cost $O(N^2)$.
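Formulas (29)-(32) translate into a few lines of code. The sketch below is a pure-Python illustration that uses dense DCT matrices instead of an FFT-based transform, so it is only meant for tiny images. It computes $(\Delta^h)^+$ in three steps: transform with $C$, divide by $-(\sigma_{i,y}^2 + \sigma_{j,x}^2)$, and zero the constant mode. It then applies $\Pi_K^h = I - \nabla^h(\Delta^h)^+\operatorname{div}^h$. As an assumption, the projection is built from the operators (22)-(23), because their composition $\operatorname{div}^h\nabla^h$ is exactly the Laplacian (26). The sketch also uses a separate DCT matrix for each direction so that non-square images work. All names are hypothetical.

```python
import math


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def subtract(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def diff_matrix(n, h):
    """(n-1) x n forward-difference matrix B of (21)."""
    return [[(-1.0 if j == i else 1.0 if j == i + 1 else 0.0) / h for j in range(n)]
            for i in range(n - 1)]


def dct_matrix(n):
    """Orthonormal DCT-II matrix C; its rows are the cosine basis vectors."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * j + 1) * k / (2 * n)) for j in range(n)]
            for k in range(n)]


def sigma_sq(n, h):
    """sigma_k^2 from (28) for k = 0..n-1; sigma_0 = 0 belongs to the constant mode."""
    return [(2.0 / h * math.sin(math.pi * k / (2 * n))) ** 2 for k in range(n)]


def grad_h(d, bx, by):
    """(22): grad^h d = (d Bx^T, By d)."""
    return matmul(d, transpose(bx)), matmul(by, d)


def div_h(p1, p2, bx, by):
    """(23): div^h (p1, p2) = -p1 Bx - By^T p2."""
    a, b = matmul(p1, bx), matmul(transpose(by), p2)
    return [[-x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def laplacian(d, bx, by):
    """(26): div^h grad^h d = -d Bx^T Bx - By^T By d."""
    return div_h(*grad_h(d, bx, by), bx, by)


def pinv_laplacian(f, h):
    """(31)-(32): (Delta^h)^+ f = Cy^T G(Cy f Cx^T) Cx."""
    ny, nx = len(f), len(f[0])
    cy, cx = dct_matrix(ny), dct_matrix(nx)
    sy, sx = sigma_sq(ny, h), sigma_sq(nx, h)
    fh = matmul(matmul(cy, f), transpose(cx))
    dh = [[0.0 if i == j == 0 else -fh[i][j] / (sy[i] + sx[j]) for j in range(nx)]
          for i in range(ny)]
    return matmul(matmul(transpose(cy), dh), cx)


def project(t1, t2, bx, by, h):
    """Pi_K = I - grad^h (Delta^h)^+ div^h, built from the (22)-(23) operators."""
    phi = pinv_laplacian(div_h(t1, t2, bx, by), h)
    g1, g2 = grad_h(phi, bx, by)
    return subtract(t1, g1), subtract(t2, g2)
```

For a random field, the projected field has zero discrete divergence up to rounding, and projecting twice changes nothing. Applying $(\Delta^h)^+$ to $\Delta^h d$ returns $d$ minus its mean. This is the behavior expected of a pseudoinverse whose null space consists of the constants.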
Fig. 1. Original images: (a) Lena, 200 × 200; (b) Cameraman, 256 × 256; (c) Barbara, 512 × 512.
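Given $d_0$, $k$, $\delta$ and $\mu$;
Step one:
  Let $p^0 = 0$ and $q^0 = 0$;
  Calculate $\tau^0 = (v^0, u^0)$: $v^0 = -B d_0$ and $u^0 = d_0 B^T$;
  Initialize counter: $n = 0$;
  while not converged do
    Calculate projections:
    $$(\pi_p, \pi_q) = \Pi_K^h\left(\operatorname{div}^h p^n, \operatorname{div}^h q^n\right) \tag{33}$$
    $$p^{n+1} = \frac{p^n + k\,\nabla^h\left(\pi_p - \delta^{-1}v^0\right)}{1 + k\left|\nabla^h\left(\pi_p - \delta^{-1}v^0\right)\right|} \tag{34}$$
    $$q^{n+1} = \frac{q^n + k\,\nabla^h\left(\pi_q - \delta^{-1}u^0\right)}{1 + k\left|\nabla^h\left(\pi_q - \delta^{-1}u^0\right)\right|} \tag{35}$$
    Update counter: $n = n + 1$;
  end
  Calculate $\tau$:
  $$\tau = \tau^0 - \Pi_K^h\left(\delta\operatorname{div}^h p^{n+1}, \delta\operatorname{div}^h q^{n+1}\right) \tag{36}$$
Step two:
  Let $r^0 = 0$ and calculate the normal field $n = (n_1, n_2)$ from $\tau = (v, u)$: $n_1 = u(v^2 + u^2)^{-1/2}$ and $n_2 = -v(v^2 + u^2)^{-1/2}$;
  Initialize counter: $n = 0$;
  while not converged do
    $$r^{n+1} = \frac{r^n + k\,\nabla^h\left(\operatorname{div}^h(r^n + n) - \mu^{-1}d_0\right)}{1 + k\left|\nabla^h\left(\operatorname{div}^h(r^n + n) - \mu^{-1}d_0\right)\right|} \tag{37}$$
    Update counter: $n = n + 1$;
  end
  Recover image $d$:
  $$d = d_0 - \mu\operatorname{div}^h\left(r^{n+1} + n\right) \tag{38}$$

Algorithm 1. Dual TV-Stokes algorithm for image denoising

2.4 Numerical Experiments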
In what follows we present several examples to show how the TV-Stokes method works for different images. All the images we have tested are normalized to gray-scale values ranging from 0 (black) to 1 (white). In the experiments we start with a clean image, shown in Fig. 1, and then add random noise with zero mean. This is done with the MATLAB command imnoise, where the variance parameter is set to 0.001 for the Barbara image and 0.005 for the Lena image. The Cameraman image is taken directly from the paper [5], so we compare the results using the same noisy image as input. In [5] this model is further compared to the two-step method LOT and the well-known ROF model. The signal-to-noise ratio of the noisy image is measured in decibels:
$$\mathrm{SNR} = 20\log_{10}\frac{\left(\int_\Omega (d - \bar d)^2\,dx\right)^{1/2}}{\left(\int_\Omega (\eta - \bar\eta)^2\,dx\right)^{1/2}}, \tag{39}$$
where $\bar d = \frac{1}{|\Omega|}\int_\Omega d\,dx$ and $\bar\eta = \frac{1}{|\Omega|}\int_\Omega \eta\,dx$.

The numerical procedures used in [5] were based on explicit finite difference schemes. This process is very slow, as the constraint converges slowly. In the proposed dual method, however, the constraint is satisfied at each step by the orthogonal projection. The energy and the number of iterations required for convergence in step one are shown in Fig. 2. The figure clearly illustrates that the dual TV-Stokes algorithm requires fewer iterations than the primal TV-Stokes algorithm before the energy stabilizes. Although each iteration of the dual TV-Stokes algorithm requires more computational effort, the algorithm is much faster than using sparse linear solvers. Inverting the Laplacian for the orthogonal projection in each iteration is a bottleneck for very large images. In all these examples the projection was applied by means of the fast Fourier transform, which needs $O(N^2 \log N)$ operations per iteration. For very large images, one should consider using a multigrid solver for applying the projection, which would reduce the cost to $O(N^2)$ operations.

Fig. 2. Energy vs. iterations plot for the first step: (a) dual TV-Stokes algorithm; (b) TV-Stokes [5].

All methods were coded in MATLAB. Table 1 gives the CPU time in seconds for each test image and compares the dual TV-Stokes algorithm with the primal TV-Stokes algorithm from [5]. As a stopping criterion we monitor the $L^2$-norm energies in (15) and (19), and stop the iteration when the change in the energy is below $10^{-3}$. For the TV-Stokes algorithm we used the same stopping criteria as in [5], where the tolerance of the $L^2$-norm of the constraint is $5 \times 10^{-3}$ and the tolerance on the change in the energy is $10^{-3}$. The time steps were set to $10^{-3}$ and $5 \times 10^{-3}$ for the first and second step of the TV-Stokes algorithm, respectively.

Table 1. Runtimes (CPU time in seconds) of the dual TV-Stokes algorithm compared to the TV-Stokes algorithm [5]. The test system has two dual-core 64-bit Opteron 270 processors and 8 GB RAM. Both steps of the dual TV-Stokes algorithm are computed with 150 iterations, while the first step of the primal TV-Stokes algorithm is computed with 75000 iterations and the second step with 25000 iterations.

Image          Dual TV-Stokes              TV-Stokes [5]
               First step  Second step     First step  Second step
Lena                9.8         1.12          9083.2      1992.5
Cameraman          17.4         2.2          11189.0      2259.4
Barbara           128.2        20.7          80602.5     14926.3

Our first test is the well-known Lena image, which we recover from heavy added noise. We have cropped the image to show the face, which consists of smooth areas and edges that are important to preserve. The denoised image in Fig. 3 shows that the dual TV-Stokes method has recovered the smooth areas without inducing any staircase effect. The smoothing parameter δ is equal to 0.0835 and μ is equal to 0.17. Since this image is very noisy, the ROF model fails to give a visually pleasant result, because the smooth surfaces become piecewise constant. The TV-Stokes algorithm, however, achieves nearly the same quality as the dual TV-Stokes algorithm. For the TV-Stokes algorithm, δ was equal to 0.045.

The next test is the Cameraman image, which consists of a smooth sky and some low-intensity buildings in the background. The buildings are difficult to recover, as they get smeared out by the denoising. The results are shown in Fig. 4 with δ equal to 0.055 and μ equal to 0.08. The TV-Stokes result is taken from [5], where the SNR is the same as the one we report, $20\log_{10}(8.21) \approx 18.28$. Fig. 4(d) shows the TV-Stokes reconstruction for the same noisy image, where δ is equal to 0.06.

The last example is the Barbara image, which is quite detailed, with high- and low-intensity textures. The high-intensity textures and the smooth areas are preserved quite well, but the low-intensity textures disappear in the same way as for the Cameraman. This image is 512 × 512 in size, which makes the algorithm slower because of the rather large number of matrix operations per iteration. However, it remains feasible to search for the optimal parameters, since the method produces a denoised image after a few steps.
Thus, one can run the method multiple times to find the optimal parameters. For this image we used δ equal to 0.05 and μ equal to 0.15. Because of the page limit and the long running time, we do not report an optimized result of the TV-Stokes algorithm for this particular image. Clearly, using the dual formulation is more effective than solving the model with the explicit gradient descent method. The CPU times were measured for a single run only, since computing an average over many runs is very time consuming for the TV-Stokes method. Although the times shown are for one run, they clearly indicate that our method is much faster and is stable. The comparison with the primal method also shows that the proposed dual method has the same denoising quality.

Fig. 3. Lena image (200 × 200), denoised using the dual TV-Stokes, TV-Stokes and ROF algorithms: (a) noisy image, SNR ≈ 14.0; (b) denoised using the dual TV-Stokes algorithm; (c) contour plot, dual TV-Stokes image; (d) difference image, dual TV-Stokes; (e) denoised using ROF [1]; (f) difference image, ROF; (g) denoised using the TV-Stokes algorithm [5]; (h) difference image, TV-Stokes.

Fig. 4. Cameraman (256 × 256), denoised using the dual and the primal formulation of the TV-Stokes algorithm: (a) noisy image, SNR ≈ 18.28; (b) denoised using the dual TV-Stokes algorithm; (c) difference image, dual TV-Stokes; (d) denoised using the TV-Stokes algorithm [5].

Fig. 5. Barbara (512 × 512), denoised using the dual formulation of the TV-Stokes algorithm: (a) noisy image, SNR ≈ 20.0; (b) denoised image.
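The SNR values quoted for the noisy inputs follow (39). The sketch below assumes images are given as flat lists of pixel values and replaces the integrals by sums (the cell area cancels in the ratio). The helper names are hypothetical.

```python
import math


def centered_energy(values):
    """Discrete version of the integral of (v - mean(v))^2."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def snr_db(clean, noisy):
    """SNR (39) in decibels, with eta = noisy - clean."""
    eta = [y - x for x, y in zip(clean, noisy)]
    return 20.0 * math.log10(math.sqrt(centered_energy(clean) / centered_energy(eta)))
```

For example, `snr_db([0.0, 2.0, 0.0, 2.0], [0.1, 1.9, 0.1, 1.9])` compares a signal whose standard deviation is ten times that of the noise. It therefore gives 20 dB, up to rounding.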
References
1. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D 60(1-4), 259–268 (1992)
2. Chan, T., Marquina, A., Mulet, P.: High-order total variation-based image restoration. SIAM J. Sci. Comput. 22(2), 503–516 (2000)
3. Chambolle, A., Lions, P.L.: Image recovery via total variation minimization and related problems. Numer. Math. 76, 167–188 (1997)
4. Lysaker, O., Lundervold, A., Tai, X.C.: Noise removal using fourth-order partial differential equation with applications to medical magnetic resonance images in space and time. IEEE Trans. Imag. Proc. 12, 1579–1590 (2003)
5. Rahman, T., Tai, X.C., Osher, S.: A TV-Stokes denoising algorithm. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 473–483. Springer, Heidelberg (2007)
6. Litvinov, W., Rahman, T., Tai, X.C.: A modified TV-Stokes model for image processing (submitted) (2008)
7. Lysaker, O.M., Osher, S., Tai, X.C.: Noise removal using smoothed normals and surface fitting. IEEE Transaction on Image Processing 13(10), 1345–1357 (2004)
8. Bertalmio, M., Bertozzi, A., Sapiro, G.: Navier-Stokes, fluid dynamics, and image and video inpainting. In: Proc. IEEE Computer Vision and Pattern Recognition (CVPR) (2001)
9. Tai, X., Osher, S., Holm, R.: Image inpainting using TV-Stokes equation. Image Processing based on partial differential equations (2006)
10. Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging Vis. 20(1-2), 89–97 (2004)
11. Carter, J.: Dual methods for total variation-based image restoration. PhD thesis, UCLA (2001)
12. Chan, T.F., Golub, G.H., Mulet, P.: A nonlinear primal-dual method for total variation-based image restoration. SIAM J. Sci. Comput. 20(6), 1964–1977 (1999)
13. Bresson, X., Chan, T.F.: Fast minimization of the vectorial total variation norm and applications to color image processing. CAM Report 07-25 (2007)
14. Ciarlet, P.G., Thomas, J.-M., Miara, B.: Introduction to numerical linear algebra and optimisation. Cambridge University Press, Cambridge (1989)
Anisotropic Regularization for Inverse Problems with Application to the Wiener Filter with Gaussian and Impulse Noise

Micha Feigin and Nir Sochen

School of Mathematics, Tel Aviv University
[email protected], [email protected]
Abstract. Most inverse problems require a regularization term on the data. The classic approach in the variational formulation is to use the L2 norm of the data gradient as a penalty term. This, however, acts as a low-pass filter and thus does not preserve edges well in the reconstructed data. In this paper we propose a novel approach whereby an anisotropic regularization is used to preserve object edges. This is achieved by calculating the data gradient over a Riemannian manifold instead of the standard Euclidean space, using the Laplace-Beltrami approach. We also employ a modified fidelity term to handle impulse noise. This approach is applicable to both scalar and vector valued images. The results are demonstrated via the Wiener filter with several approaches for minimizing the functional, including a novel GSVD-based spectral approach applicable to functionals containing gradient-based features.
1 Introduction
Handling degraded images, due to both blur and noise, is a practical reality in any imaging field. The common image degradation model is
$$I = I_0 * h + n, \tag{1}$$
where $I$, the observed image, is the result of convolving the input image (or ideal image) $I_0$ with some blurring kernel $h$. The result is then summed with additive noise $n$. This is a common model for any system that contains a lens and a sensor. Both the blur and the noise are a combination of several processes. Some typical causes of image blur are out-of-focus images, motion blur due to an unstable camera and/or object, and a low-pass filter resulting from the finite aperture and the anti-aliasing filter on the sensor. Noise can result from the sensor and amplifier due to low light, heat, dead pixels and background radiation, or from memory and communication corruption. Each of these processes has its own typical blur kernel and noise distribution statistics [1, 2].

A direct naive approach to handling the blur is a spectral (Fourier) manipulation of the degradation model equation. To see the difficulty, though, look at the Fourier transform of this equation, $\hat I = \hat I_0 \cdot \hat h + \hat n$ (where the hat notation denotes the Fourier transform). This transforms the convolution into a multiplication, which allows for an easy rearrangement of the equation. Extracting $\hat I_0$ gives $\hat I_0 = (\hat I - \hat n)/\hat h$. For an integrable kernel $h$, $\hat h$ decays to zero at infinity. This results in a divide-by-zero issue, at least for high frequencies. In addition, the SNR usually drops at these frequencies, which makes this procedure very sensitive to noise. One solution in this case is the Wiener filter [3]. It can be derived from the standard variational formulation for ill-posed inverse problems by adding prior knowledge (or assumptions) to the reconstruction via an additional penalty term, that is, by minimizing an energy functional of the form
$$S(I_0) = \underbrace{\|I_0 * h - I\|}_{\text{fidelity term}} + \mu\,\underbrace{\Phi(I_0)}_{\text{penalty}}. \tag{2}$$

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 319–330, 2009. © Springer-Verlag Berlin Heidelberg 2009
Here $\Phi$ is some function of $I_0$ that imposes the assumptions on the model. A common constraint term is $\Phi(I_0) = \|\nabla I_0\|$, which penalizes high frequencies, as these are often the source of instability. The side effect of this constraint is that while high-frequency noise is reduced in the reconstruction, edge detail is lost as well, as demonstrated in Fig. 1.
Fig. 1. Edge preservation vs. noise suppression with the Wiener filter: (a) original image; (b) degraded input; (c) $\mu = 5 \cdot 10^{-4}$; (d) $\mu = 5 \cdot 10^{-5}$. The input image (a) is degraded using Gaussian white noise (b). The results show the difference between preferring noise suppression (c) and edge preservation (d).
This functional is often minimized under the $L^2$ norm, which is appropriate for Gaussian noise. This is mainly because the resulting Euler-Lagrange equations are linear and are thus (relatively) easy to solve. That is, the classic Wiener filter functional based on the $L^2$ norm,
$$S(I_0) = \|I_0 * h - I\|_{L^2}^2 + \mu\|\nabla I_0\|_{L^2}^2 = \int \left(|I_0 * h - I|^2 + \mu|\nabla I_0|^2\right) dA, \tag{3}$$
results in the following Euler-Lagrange equation (see [4] for the derivation of the Euler-Lagrange formulation of the convolution):
$$h(-\bar x) * \left(h(\bar x) * I_0 - I\right) - \mu\Delta I_0 = 0. \tag{4}$$
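Applying the Fourier transform to (4) gives the closed-form Wiener solution (6) derived next, $\hat I_0 = \hat h^*\hat I/(|\hat h|^2 + \mu|\omega|^2)$. The sketch below is a one-dimensional illustration that rests on several assumptions. The signal is periodic, and a naive discrete Fourier transform replaces the continuous one. The discrete frequency $\omega_k = 2\pi k/n$ is folded into $(-\pi, \pi]$. The kernel is real, so that $\hat h(-\omega) = \hat h^*(\omega)$. The function names are hypothetical.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def circular_convolve(x, h):
    """(x * h)[t] = sum_s x[t - s] h[s] with periodic indexing."""
    n = len(x)
    return [sum(x[(t - s) % n] * h[s] for s in range(n)) for t in range(n)]


def wiener_restore(observed, kernel, mu):
    """Apply (6): I0_hat = conj(h_hat) * I_hat / (|h_hat|^2 + mu * omega^2)."""
    n = len(observed)
    i_hat, h_hat = dft(observed), dft(kernel)
    out = []
    for k in range(n):
        omega = 2.0 * math.pi * (k if k <= n // 2 else k - n) / n  # fold into (-pi, pi]
        out.append(h_hat[k].conjugate() * i_hat[k] / (abs(h_hat[k]) ** 2 + mu * omega ** 2))
    return [z.real for z in idft(out)]
```

As a usage example, take the kernel `[0.6, 0.2, 0, 0, 0, 0, 0, 0.2]`. Its transform $0.6 + 0.4\cos\omega$ never vanishes, so with a tiny $\mu$ and no noise the blurred signal is recovered almost exactly. For any $\mu$, the mean of the observation is left unchanged, because this kernel sums to one and the penalty $\mu|\omega|^2$ vanishes at $\omega = 0$.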
Here $\bar x$ is the coordinate vector $\bar x = (x, y)$ for the two-dimensional case. This can be solved as before by applying the Fourier transform, which results in
$$\hat h(-\bar\omega)\cdot\left(\hat h(\bar\omega)\cdot\hat I_0 - \hat I\right) + \mu|\bar\omega|^2\hat I_0 = 0, \tag{5}$$
where $\bar\omega = (\omega_x, \omega_y)$ is the frequency vector for the resulting frequencies along the $x$ and $y$ axes, respectively. Now, assuming that the convolution kernel is real, we can use the identity $\hat h(-\bar\omega) = \hat h^*(\bar\omega)$ (where $\hat h^*$ is the complex conjugate of $\hat h$) to rewrite the equation as
$$\hat I_0 = \frac{\hat h^*(\bar\omega)}{|\hat h(\bar\omega)|^2 + \mu|\bar\omega|^2}\,\hat I. \tag{6}$$
Despite being easy to solve, the $L^2$ norm approach has two main issues, affecting both the constraint and the fidelity term. The first issue is that it fails to preserve object boundaries (Fig. 1). The main reason is the penalty term, which penalizes high frequencies. As the fidelity term is also $L^2$, it does little to alleviate this problem. The second issue is that the fidelity term is designed to handle Gaussian noise and behaves poorly in the presence of impulse noise. One solution to both these issues is to use the $L^1$ or total variation (TV) norm [5, 6, 7]. When used for the fidelity term, it improves the behavior with impulse noise. For the constraint, it improves edge preservation. For the functional
$$S(I_0) = \|I_0 * h - I\|_{TV} + \mu\|\nabla I_0\|_{TV} = \int \left(|I_0 * h - I| + \mu|\nabla I_0|\right) dA \tag{7}$$
the resulting Euler-Lagrange equation is
$$h(-\bar x) * \frac{h(\bar x) * I_0 - I}{|h(\bar x) * I_0 - I|} - \mu\,\operatorname{div}\left(\frac{\nabla I_0}{|\nabla I_0|}\right) = 0. \tag{8}$$
Unfortunately, the solution of this equation is unstable. One approach to improve on this is to use an augmented TV norm [5],
$$S(I_0) = \int \left(\sqrt{(I_0 * h - I)^2 + \eta} + \mu\sqrt{|\nabla I_0|^2 + \eta}\right) dA, \tag{9}$$
with $0 < \eta \ll 1$. The resulting modified Euler-Lagrange equation is
$$h(-\bar x) * \frac{h(\bar x) * I_0 - I}{\sqrt{(h(\bar x) * I_0 - I)^2 + \eta}} - \mu\,\operatorname{div}\left(\frac{\nabla I_0}{\sqrt{|\nabla I_0|^2 + \eta}}\right) = 0. \tag{10}$$
This greatly improves the response of the fidelity term to impulse noise, but does much less for the edge preservation of the constraint. It also does not explicitly account for the edges in the image. Other approaches include incorporating Mumford-Shah-like edge detection into the functional [4], weighting the Laplacian based on edge detection [8], Perona-Malik-like regularizers [9], maximum likelihood estimators [10], certainty maps [11] and channel pairing on color images [12].
We propose two novelties in this paper. The first is to combine the augmented $L^1$ norm on the fidelity term, for handling impulse noise, with anisotropic regularization based on the Laplace-Beltrami operator, for edge preservation. This is achieved by keeping the $L^2$ norm of the gradient, but calculating it over a Riemannian manifold instead of the standard Euclidean space, using a Laplace-Beltrami approach [13]. When combined with the augmented TV norm (9), this approach also produces exceptional results for impulse noise (Sec. 4). The second is the use of the GSVD (generalized singular value decomposition) for the minimization of functionals that employ a gradient-based penalty term. Its direct contribution is the ability to easily minimize non-local operators and functionals defined on non-square domains, where the Fourier transform is inapplicable. For isotropic operators it can be very efficient, as the decomposition needs to be calculated only once, offline. One interesting aspect of both these approaches is their relation to other frameworks; in particular, they help to better understand the relation to sparse representation and K-SVD [14]. It is important to note that both these ideas are easily applicable to general ill-posed inverse problems over general feature spaces, and specifically for this case, also to color images [15] and textures [16]. The rest of this paper is organized as follows: Sec. 2 discusses the anisotropic approach. Sec. 3 discusses several approaches to minimizing the functional, including a novel approach using the GSVD. Sec. 4 shows some results of the method.
2 Anisotropic Regularization for the Wiener Filter
The problem with edge preservation lies with the gradient-based penalty term. In the Euler-Lagrange equations it manifests as a Laplacian that acts as a low-pass filter. In order to correctly formulate the anisotropic penalty term, we start with the Euler-Lagrange equation for the Wiener filter
$$h(-x) * \left(h(x) * I_0 - I\right) - \mu\Delta I_0 = 0 \tag{11}$$
and replace the Laplacian with an anisotropic operator, namely the Laplace-Beltrami operator [13], resulting in
$$h(-x) * \left(h(x) * I_0 - I\right) - \mu\Delta_g I_0 = 0. \tag{12}$$
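The Laplace-Beltrami operator is defined as
$$\Delta_g I = \frac{1}{\sqrt g}\operatorname{div}\left(\sqrt g\, G^{-1}\nabla I\right), \tag{13}$$
where for the gray-scale case
$$G = \begin{pmatrix} 1 + I_x^2 & I_x I_y \\ I_x I_y & 1 + I_y^2 \end{pmatrix}, \qquad g = \det(G). \tag{14}$$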
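For the gray-scale metric (14), the gradient $\nabla I$ is an eigenvector of $G$: $G\nabla I = (1 + I_x^2 + I_y^2)\nabla I = g\nabla I$, since $g = \det G = 1 + I_x^2 + I_y^2$. Hence $\sqrt g\,G^{-1}\nabla I = \nabla I/\sqrt g$, and (13) reduces to $\Delta_g I = \frac{1}{\sqrt g}\operatorname{div}\left(\frac{\nabla I}{\sqrt g}\right)$. The short check below confirms this pointwise by inverting the $2\times 2$ metric explicitly. The helper names are hypothetical.

```python
import math


def metric(ix, iy):
    """Gray-scale metric G from (14) at a point with image derivatives (ix, iy)."""
    return [[1.0 + ix * ix, ix * iy], [ix * iy, 1.0 + iy * iy]]


def beltrami_flux(ix, iy):
    """Return sqrt(g) * G^{-1} grad I (as fx, fy) and g = det G."""
    (a, b), (c, d) = metric(ix, iy)
    g = a * d - b * c
    fx = (d * ix - b * iy) / g   # first row of G^{-1} applied to grad I
    fy = (-c * ix + a * iy) / g  # second row of G^{-1} applied to grad I
    return math.sqrt(g) * fx, math.sqrt(g) * fy, g
```

The reduced form shows where the edge preservation comes from. Where $|\nabla I|$ is large, $g$ is large and the flux $\nabla I/\sqrt g$ is damped, so diffusion across edges is weak.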
In effect, the Laplacian diffusion operator is applied over the image manifold [13] instead of under the standard Euclidean metric.
This means that we are looking at the image as a two-dimensional manifold in three-dimensional space for gray-scale images, and in five-dimensional space for color images. When applying the diffusion operator, the distance between pixels is measured over this manifold, so the distance takes into account not only the spatial offset but also the intensity offset. As a result, pixels on different sides of an edge are farther apart than pixels in the same homogeneous region, and the edges act as insulators, so that image data does not flow across edges. This can be extended to color images by applying the diffusion on a per-channel basis, that is, for each channel $I^i$ the process is $\Delta_g I^i = \frac{1}{\sqrt g}\operatorname{div}\left(\sqrt g\, G^{-1}\nabla I^i\right)$ with
$$G = \begin{pmatrix} 1 + \sum_i (I_x^i)^2 & \sum_i I_x^i I_y^i \\ \sum_i I_x^i I_y^i & 1 + \sum_i (I_y^i)^2 \end{pmatrix}. \tag{15}$$
The metric itself takes into account all the channels, coupling them in the final process to remove misalignment of the edges across the different channels. Note that the image channels can be color channels such as RGB or CMY, or more general features such as textures [16]. When extending the functional to handle impulse noise using the augmented $L^1$ fidelity term, the Euler-Lagrange equation becomes instead
$$h(-\bar x) * \frac{h(\bar x) * I_0 - I}{\sqrt{(h(\bar x) * I_0 - I)^2 + \eta}} - \mu\Delta_g I_0 = 0. \tag{16}$$

3 Finding the Minimizer
There are several approaches to minimizing the resulting functional. We already have the Euler-Lagrange equations, i.e., Eqs. (12) and (16). The direct Fourier-space approach, even for the $L^2$ fidelity term, is not applicable here, since the Fourier transform does not diagonalize the Laplace-Beltrami operator. A different, relatively simple direct approach is to use the gradient descent equation
$$\frac{\partial I_0}{\partial t} = -h(-\bar x) * \frac{h(\bar x) * I_0 - I}{\sqrt{(h(\bar x) * I_0 - I)^2 + \eta}} + \mu\Delta_g I_0. \tag{17}$$
For the $L^2$ fidelity term there are two other spectral approaches that can be applied here: an eigen-transform and the GSVD. Among other advantages, they provide a direct solution and thus prove the existence of the minimizer, as for the standard Wiener filter. Proving the existence of a minimizer for the proposed Tikhonov functional is much more difficult and beyond the scope of this paper, but can be done along lines similar to those taken in [5].

3.1 The Laplace-Beltrami Eigen-Space
We can use the same approach implemented in [17] to diagonalize the Laplace-Beltrami operator. The problem is that the eigenvectors of the Laplace-Beltrami operator do not convert the convolution into a multiplication, so we need to combine this approach with the Fourier transform. We start with the Euler-Lagrange equation for the anisotropic Wiener filter, Eq. (12). If we linearize the Laplace-Beltrami operator by fixing the metric, it becomes a self-adjoint negative (semi-)definite operator. Its eigenfunctions $\phi_i$ (with $\Delta_g\phi_i = \lambda_i\phi_i$) therefore form a basis of the function space under the $L^2$ norm. We insert into this equation the eigen-decomposition of the images in this eigenspace,
$$I_0 = \sum_i c_i^0\phi_i, \qquad I = \sum_i c_i\phi_i. \tag{18}$$
This produces h (−x) ∗
h (x) ∗
i
c0i φi −
+μ
ci φi
i
λi c0i φi = 0
(19)
i
which after rearrangement gives c0i h (−x) ∗ h (x) ∗ φi + μλi c0i φi = ci h (−x) ∗ φi .
(20)
i
Now, to handle the convolution, apply the Fourier transform ˆ∗ · h ˆ · c0 φˆi + μλi c0 φˆi = − −h ci ˆh∗ φˆi i i i
(21)
i
which can be rewritten as
c0i
i
ˆ 2 ˆ ∗ φˆi . ci h h − μλi φˆi =
(22)
i
Equation (22) is a linear set of equations of the form $A\tilde I_0 = B\tilde I$, where $\tilde I = (c_i)$ and $\tilde I_0 = (c^0_i)$ are the coefficient vectors in the Laplace-Beltrami eigen-space. This system needs to be solved for Ĩ0, and the ideal image I0 can then be reconstructed from these coefficients. A full solution would combine this with fixed-point iterations that update the metric. Because the method is stable with respect to the flow, these iterations are rarely needed in practice. Two things should be noted here. First, the coefficients of I decay rather quickly, so we can truncate Ĩ and compute only the corresponding columns of B. Second, the same assumption can be made for Ĩ0 and thus for A.
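In the simplest special case the system decouples completely. Suppose the metric is flat (g = 1) and the domain periodic. Then the eigenvectors of the discrete Laplacian are Fourier modes, with eigenvalues λ_k = −(2 − 2cos(2πk/N)). Eq. (22) reduces, mode by mode, to c⁰_k(|ĥ_k|² − μλ_k) = c_k ĥ*_k, which is a classical Tikhonov/Wiener-type filter. The 1D sketch below solves this with a pure-Python DFT under those assumptions; the function names are hypothetical. It then checks the spatial Euler-Lagrange residual h(−x) ∗ (h ∗ I0 − I) − μΔI0.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n)) for k in range(n)]


def idft(coeffs):
    n = len(coeffs)
    return [(sum(coeffs[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n)) / n).real
            for j in range(n)]


def solve_flat_case(observed, h, mu):
    """Solve Eq. (22) when the eigenvectors are Fourier modes (flat, periodic case)."""
    n = len(observed)
    h_hat, i_hat = dft(h), dft(observed)
    out = []
    for k in range(n):
        lam = -(2.0 - 2.0 * math.cos(2.0 * math.pi * k / n))  # periodic Laplacian eigenvalue
        out.append(h_hat[k].conjugate() * i_hat[k] / (abs(h_hat[k]) ** 2 - mu * lam))
    return idft(out)


def euler_lagrange_residual(i0, observed, h, mu):
    """h(-x) * (h * I0 - I) - mu * Lap(I0), evaluated directly in the spatial domain."""
    n = len(i0)
    blurred = [sum(h[m] * i0[(i - m) % n] for m in range(n)) for i in range(n)]
    diff = [b - o for b, o in zip(blurred, observed)]
    adj = [sum(h[m] * diff[(i + m) % n] for m in range(n)) for i in range(n)]
    lap = [i0[(i - 1) % n] - 2.0 * i0[i] + i0[(i + 1) % n] for i in range(n)]
    return [a - mu * l for a, l in zip(adj, lap)]


kernel = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]  # 3-tap blur, centre stored at index 0
data = [0.0, 0.1, 0.9, 1.0, 1.1, 0.2, 0.0, -0.1]
estimate = solve_flat_case(data, kernel, mu=0.05)
print(max(abs(v) for v in euler_lagrange_residual(estimate, data, kernel, 0.05)) < 1e-9)  # True
```

With a curved metric the eigenvectors are no longer Fourier modes. The coupling between modes then reappears, and the dense system A Ĩ0 = B Ĩ has to be solved instead.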
3.2
Using the GSVD
Consider an energy functional with two linear operators La and Lb using the L2 norm:

$$S(f) = \int |L_a f|^2 + \mu\,|L_b f|^2\,dA . \qquad (23)$$
Anisotropic Regularization for Inverse Problems
325
Assume that these operators can be discretized as matrices A and B, respectively, and let v be a vector representation of the function f. The functional can then be written as

$$S(v) = \|Av\|^2 + \mu\,\|Bv\|^2 . \qquad (24)$$

The two matrices A and B have a joint diagonalization based on the generalized singular value decomposition (GSVD) of the form [18]

$$A = U\Sigma_1 X^T, \qquad B = V\Sigma_2 X^T . \qquad (25)$$
Here U and V are unitary matrices, and Σ1 and Σ2 are non-negative diagonal matrices (not necessarily square). A and B must have the same number of columns but not necessarily the same number of rows, a property we will need later on. Thus Eq. (24) can be rewritten as

$$S(v) = \|U\Sigma_1 X^T v\|_{L_2}^2 + \mu\,\|V\Sigma_2 X^T v\|_{L_2}^2 . \qquad (26)$$

Now we can substitute ṽ = X^T v to construct a functional in ṽ. Since the L2 norm is invariant under unitary transformations, this functional is equivalent to

$$S(\tilde v) = \|\Sigma_1\tilde v\|_{L_2}^2 + \mu\,\|\Sigma_2\tilde v\|_{L_2}^2 . \qquad (27)$$

Minimizing this new functional with respect to ṽ results in

$$\Sigma_1^T\Sigma_1\tilde v + \mu\,\Sigma_2^T\Sigma_2\tilde v = 0 . \qquad (28)$$
We would like to do something similar with the Wiener filter formulation. The problem is that the gradient operator cannot be discretized as a single matrix operator, since it takes a function and returns a vector field. Luckily, all we need is an operator acting on I0 whose norm equals that of the gradient. For the L2 case this can be achieved as follows:

$$S(I_0) = \int |h * I_0 - I|^2 + \mu\,|\nabla I_0|^2\,dA \;\Rightarrow\; \|HI_0 - I\|_{L_2}^2 + \mu\left\|\begin{pmatrix} D_x \\ D_y \end{pmatrix} I_0\right\|_{L_2}^2 = \|HI_0 - I\|_{L_2}^2 + \mu\,\|DI_0\|_{L_2}^2 . \qquad (29)$$

Here H is the convolution matrix (which is block cyclic but not cyclic in the 2D case). $D = \begin{pmatrix} D_x \\ D_y \end{pmatrix}$ is the matrix obtained by stacking the matrix for the derivative in the x direction on top of the one for the derivative in the y direction. For the L2 norm we get

$$\left\|\begin{pmatrix} D_x \\ D_y \end{pmatrix} I_0\right\|_{L_2}^2 = \|D_x I_0\|_{L_2}^2 + \|D_y I_0\|_{L_2}^2 = \|\nabla I_0\|_{L_2}^2 . \qquad (30)$$

Because the GSVD can be applied to matrices with different numbers of rows, we can now use it to diagonalize this equation:

$$H = U\Sigma_1 X^T, \qquad D = V\Sigma_2 X^T . \qquad (31)$$
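Using this, we can follow the same procedure as before:

$$\|HI_0 - I\|_{L_2}^2 + \mu\,\|DI_0\|_{L_2}^2 = \|U\Sigma_1 X^T I_0 - I\|_{L_2}^2 + \mu\,\|V\Sigma_2 X^T I_0\|_{L_2}^2 . \qquad (32)$$

Again, U and V are unitary. Substituting Ĩ0 = X^T I0 and Ĩ = U^{-1} I = U^T I results in

$$S(\tilde I_0) = \|\Sigma_1\tilde I_0 - \tilde I\|_{L_2}^2 + \mu\,\|\Sigma_2\tilde I_0\|_{L_2}^2 . \qquad (33)$$

This can be minimized with respect to Ĩ0 to produce

$$\Sigma_1^T\bigl(\Sigma_1\tilde I_0 - \tilde I\bigr) + \mu\,\Sigma_2^T\Sigma_2\tilde I_0 = 0, \qquad (34)$$

or, after rearrangement and back-substitution,

$$I_0 = X^{-T}\bigl(\Sigma_1^T\Sigma_1 + \mu\,\Sigma_2^T\Sigma_2\bigr)^{-1}\Sigma_1^T U^T I . \qquad (35)$$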
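As a consistency check (not part of the original derivation), substitute the factorizations of Eq. (31) and use UᵀU = I and VᵀV = I. This shows that Eq. (35) solves the normal equations of the discrete functional ‖HI0 − I‖² + μ‖DI0‖². In other words, it is the usual Tikhonov solution, provided Σ1ᵀΣ1 + μΣ2ᵀΣ2 is invertible:

```latex
\begin{aligned}
H^T H + \mu D^T D &= X\bigl(\Sigma_1^T\Sigma_1 + \mu\,\Sigma_2^T\Sigma_2\bigr)X^T,
\qquad H^T I = X\,\Sigma_1^T U^T I,\\
\bigl(H^T H + \mu D^T D\bigr) I_0
&= X\bigl(\Sigma_1^T\Sigma_1 + \mu\,\Sigma_2^T\Sigma_2\bigr)X^T X^{-T}
   \bigl(\Sigma_1^T\Sigma_1 + \mu\,\Sigma_2^T\Sigma_2\bigr)^{-1}\Sigma_1^T U^T I
 = X\,\Sigma_1^T U^T I = H^T I .
\end{aligned}
```

The GSVD therefore leaves the minimizer unchanged. It only makes the matrix that has to be inverted diagonal.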
Note that Σ1ᵀΣ1 + μΣ2ᵀΣ2 is a diagonal matrix and thus easy to invert; in fact, for μ = 1 it is the identity matrix. To apply the same idea to the anisotropic case, we need to reformulate the prior that leads to the Laplace-Beltrami operator as a gradient over the manifold. With the metric held fixed, the operator arises from minimizing the following functional. Its weight matrix √g G^{-1} is symmetric positive definite and therefore has a symmetric square root A:

$$\int \nabla I^T G^{-1}\nabla I\,\sqrt{g}\,d^m\sigma = \int \|A\nabla I\|^2\,d^m\sigma, \qquad A^2 = \sqrt{g}\,G^{-1} . \qquad (36)$$

The discrete formulation of the anisotropic derivative matrix D_g, which replaces D in Eq. (29), follows from an eigen-decomposition of √g G^{-1}:

$$D_g = A\begin{pmatrix} D_x \\ D_y \end{pmatrix}, \qquad A = \frac{1}{(I_x^2 + I_y^2)\sqrt[4]{1 + I_x^2 + I_y^2}}\begin{pmatrix} I_x^2 + \sqrt{1 + I_x^2 + I_y^2}\,I_y^2 & I_x I_y\bigl(1 - \sqrt{1 + I_x^2 + I_y^2}\bigr) \\ I_x I_y\bigl(1 - \sqrt{1 + I_x^2 + I_y^2}\bigr) & I_y^2 + \sqrt{1 + I_x^2 + I_y^2}\,I_x^2 \end{pmatrix} . \qquad (37)$$

One advantage of this approach is that it applies to non-local operators and to non-square domains, where the Fourier transform as used for the original Wiener filter fails. For the isotropic case, the transform is constant, so it needs to be computed only once, off-line. This can make it very efficient for recurring problems, or when the problem is split into constant-sized patches as described in [17].
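The closed form in Eq. (37) can be checked numerically. At each pixel, A should be the symmetric square root of √g G^{-1} for the gray-scale metric G = Id + ∇I∇Iᵀ, with g = 1 + I_x² + I_y². The sketch below compares A·A with √g G^{-1} computed directly, using plain Python tuples; the helper names are hypothetical. At flat points (I_x = I_y = 0) the expression in (37) is 0/0. The sketch returns the identity there, which is the limit because G is then the identity.

```python
import math


def sqrt_metric(ix, iy):
    """Matrix A of Eq. (37) at one pixel, with g = 1 + ix^2 + iy^2."""
    s = ix * ix + iy * iy
    if s == 0.0:
        return ((1.0, 0.0), (0.0, 1.0))  # flat point: G = Id, hence A = Id
    g = 1.0 + s
    rg, q = math.sqrt(g), g ** 0.25
    a11 = (ix * ix + rg * iy * iy) / (s * q)
    a12 = ix * iy * (1.0 - rg) / (s * q)
    a22 = (iy * iy + rg * ix * ix) / (s * q)
    return ((a11, a12), (a12, a22))


def sqrt_g_times_g_inverse(ix, iy):
    """sqrt(g) G^{-1} computed directly from G = [[1 + ix^2, ix*iy], [ix*iy, 1 + iy^2]]."""
    g = 1.0 + ix * ix + iy * iy  # det G
    f = math.sqrt(g) / g
    return ((f * (1.0 + iy * iy), -f * ix * iy), (-f * ix * iy, f * (1.0 + ix * ix)))


def matmul2(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)) for i in range(2))


def max_mismatch(ix, iy):
    """Largest entry of |A A - sqrt(g) G^{-1}| at one pixel."""
    a = sqrt_metric(ix, iy)
    lhs, rhs = matmul2(a, a), sqrt_g_times_g_inverse(ix, iy)
    return max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))


print(max(max_mismatch(ix, iy) for ix, iy in [(1.0, 2.0), (0.3, -0.7), (5.0, 0.0), (0.0, 0.0)]) < 1e-12)  # True
```

A scales derivatives along the local gradient by g^{-1/4} and across it by g^{1/4}. In the discrete setting each entry of A becomes a diagonal matrix over the pixels, so D_g plays the role of D in Eq. (29) with derivatives rotated and rescaled relative to the local gradient.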
4
Numerical Results
Comparing the reconstruction quality on the basis of standard measures alone, such as SNR and PSNR, does not do justice to the method. Being L2-based measures, these values are poor at assessing edge reconstruction. Despite this, and for lack of a better objective comparison method,
we do see an improvement in the reconstruction according to these measures. It is also important to note the subjective difference when looking at the images themselves. The biggest difference is seen near pronounced edges and textures, which are much better preserved than with the standard Wiener filter. The method also removes the ringing (Gibbs effect) seen around strong edges and the color skews in color images. Because of the limitations of the printed medium, the results are cropped and zoomed to make the differences more visible.
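The remark about L2-based measures can be made concrete with a small sketch. The paper does not state its exact SNR/PSNR definitions. The formulas below (signal energy over error energy, and peak² over mean squared error, both in dB) are common conventions assumed for illustration. Two estimates of a 1D step have identical error energy and hence identical SNR. One blurs the edge, the other adds small alternating noise everywhere, yet they look quite different at the edge.

```python
import math


def snr_db(reference, estimate):
    """10 log10(sum ref^2 / sum (ref - est)^2): an assumed, common SNR definition."""
    signal = sum(r * r for r in reference)
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal / error)


def psnr_db(reference, estimate, peak=1.0):
    """10 log10(peak^2 / MSE), with the peak value supplied by the caller."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    return 10.0 * math.log10(peak * peak / mse)


step = [0, 0, 0, 0, 1, 1, 1, 1]
blurred_edge = [0, 0, 0, 0.25, 0.75, 1, 1, 1]  # error concentrated at the edge
noisy_flat = [0.125, -0.125, 0.125, -0.125, 0.875, 1.125, 0.875, 1.125]  # error spread out
print(snr_db(step, blurred_edge) == snr_db(step, noisy_flat))  # True: both about 15.05 dB
```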
(a) Input
(b) Degraded
(c) Standard W.F.
(d) Anisotropic W.F.
Fig. 2. Reconstruction of a gray-scale image (2(a)) degraded using a Gaussian kernel and Gaussian noise (2(b)) with standard deviation of 10%. The image is reconstructed using the standard (2(c)) and anisotropic Wiener filter (2(d)).
The first example (Fig. 2) shows the results for a gray-scale image degraded by a Gaussian kernel and Gaussian noise with a standard deviation of 10%, resulting in an SNR of 16.34 dB. Both the standard Wiener filter (2(c)) and the anisotropic version (2(d)) use the L2 fidelity term. The SNRs of the reconstructed images are 20.72 dB and 21.08 dB, respectively. The anisotropic reconstruction displays less noise, which is especially visible in homogeneous areas such as the white background and the skin. The isotropic version, on the other hand, displays both blurred edges (for example at the back, hands and hair) and ringing around pronounced edges, neither of which appears in the anisotropic version. This is most pronounced around the dominant edges of the back and the hair. Figure 3 shows the results of applying the Wiener filter to an image with impulse noise (11% density, SNR of 8.47 dB). The first two examples (3(b), 3(e)) display the results of the standard and anisotropic Wiener filters, respectively, both using the L2 fidelity term. Although the SNR values improve (15.9 dB and 16.48 dB), the results are still rather poor. The anisotropic version nevertheless displays more pronounced edges (teeth, wall) and less noise. On the other hand, the versions employing the augmented L1 fidelity term (3(c) and 3(f)) could at first glance be mistaken for the input image. Even so, the anisotropic version displays much sharper results up close, as well as an improved SNR (22.98 dB compared to 22.48 dB for the standard filter). The following examples show that the method extends to color, i.e. vector-valued, images.
(a) Input Image
(b) Std. W.F. L2 fidelity
(c) Std. W.F., L1 fidelity
(d) Degraded Image
(e) AI W.F., L2 fidelity
(f) AI W.F., L1 fidelity
Fig. 3. Restoration of a gray scale image corrupted by impulse noise of density 0.11. Figures 3(b) and 3(e) show the reconstruction using regular and anisotropic Wiener filter with L2 fidelity. Figures 3(c) and 3(f) show the reconstruction using the L1 fidelity term.
Figure 4 shows the results for a color image degraded by a Gaussian kernel and Gaussian noise with a standard deviation of 10% (SNR of 16.7 dB). As can be seen, the anisotropic reconstruction produces sharper edges, without the color shifts and ringing that are visible around sharp edges in the standard reconstruction. Additionally, there is less overall noise and there are fewer color shifts, owing to the smoothing of the noise. The SNR is 21.04 dB for the isotropic case and 21.6 dB for the anisotropic variant. Fig. 5 shows the results of applying the regular and the anisotropic Wiener filter, both based on the L1 fidelity term, to a color image degraded by a Gaussian kernel and impulse noise with 11% density (SNR of 11 dB). The anisotropic variant shows sharper edges, better color restoration and fewer color skews around edge boundaries. As in the previous results, this is most pronounced around bright edges such as the teeth, eyes and wall. The SNR of the reconstruction is 20 dB for the isotropic variant and 23.1 dB for the anisotropic one.
(a) Degraded image
(b) Standard W.F.
(c) Anisotropic W.F.
Fig. 4. Color image degraded by a Gaussian kernel and uncorrelated Gaussian noise (4(a)) with standard deviation of 10%. Figures 4(b) and 4(c) show the results for the standard and the anisotropic reconstruction.
(a) Degraded image
(b) Std. W.F. L1 fidelity
(c) AI W.F. L1 fidelity
Fig. 5. Color image degraded by a Gaussian kernel and uncorrelated impulse noise (5(a)) with density 0.11. Figures 5(b) and 5(c) show the results for the standard and anisotropic restoration based on the L1 fidelity term.
5
Conclusion
In this work we presented an anisotropic regularization term for inverse problems that better preserves object edges while also improving noise suppression. Combined with an augmented L1 fidelity term, it provides remarkable results for images corrupted by impulse noise.
References
1. Goodman, J.: Introduction to Fourier Optics. McGraw-Hill Book Company, New York (1996)
2. Jähne, B.: Digital Image Processing, 5th edn. Springer, Heidelberg (2002)
3. Gonzalez, R.C., Woods, R.E.: Digital Image Processing, 2nd edn. Prentice-Hall, Englewood Cliffs (2002)
4. Bar, L., Sochen, N., Kiryati, N.: Semi-blind image restoration via Mumford-Shah regularization. IEEE Trans. on Image Processing 15(2), 483–493 (2005)
5. Bar, L., Kiryati, N., Sochen, N.: Image deblurring in the presence of impulsive noise. Int. J. Comput. Vision 70(3), 279–298 (2006)
6. Blomgren, P., Chan, T.F.: Color TV: Total variation methods for restoration of vector-valued images. IEEE Trans. Image Processing 7, 304–309 (1998)
7. Chan, T.F., Vese, L.A.: Image segmentation using level sets and the piecewise-constant Mumford-Shah model. Technical Report 00-14, UCLA CAM (2000)
8. Charbonnier, P., Blanc-Féraud, L., Aubert, G., Barlaud, M.: Deterministic edge-preserving regularization in computed imaging. IEEE Trans. Image Processing 6, 298–311 (1997)
9. Welk, M., Theis, D., Weickert, J.: Variational deblurring of images with uncertain and spatially variant blurs. In: Kropatsch, W.G., Sablatnig, R., Hanbury, A. (eds.) DAGM 2005. LNCS, vol. 3663, pp. 485–492. Springer, Heidelberg (2005)
10. Jalobeanu, A., Blanc-Féraud, L., Zerubia, J.: An adaptive Gaussian model for satellite image deblurring. IEEE Transactions on Image Processing (4), 613–621 (2004)
11. Krajsek, K., Mester, R.: The edge preserving Wiener filter for scalar and tensor valued images. In: DAGM-Symposium, pp. 91–100 (2006)
12. Kaftory, R., Sochen, N., Zeevi, Y.Y.: Variational blind deconvolution of multichannel images. Int. J. Imaging Science and Technology 15(1), 56–63 (2005)
13. Sochen, N., Kimmel, R., Malladi, R.: A general framework for low level vision. IEEE Trans. Image Processing, Special Issue on Geometry Driven Diffusion 7, 310–318 (1998)
14. Aharon, M., Elad, M., Bruckstein, A.: The K-SVD: An algorithm for designing of overcomplete dictionaries for sparse representation. IEEE Trans. on Signal Processing 54(11), 4311–4322 (2006)
15. Kimmel, R., Malladi, R., Sochen, N.: Images as embedded maps and minimal surfaces: Movies, color, texture, and volumetric medical images. International Journal of Computer Vision 39, 111–129 (2000)
16. Sagiv, C., Sochen, N., Zeevi, Y.: Gabor features diffusion via the minimal weighted area method. In: EMMCVPR (September 2001)
17. Feigin, M., Sochen, N., Vemuri, B.C.: Efficient anisotropic α-kernels decompositions and flows. In: POCV (2008)
18. Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)
Locally Adaptive Total Variation Regularization
Markus Grasmair
Department of Mathematics, University of Innsbruck, Technikerstr. 21a, A-6020 Innsbruck, Austria
http://infmath.uibk.ac.at
Abstract. We introduce a locally adaptive parameter selection method for total variation regularization applied to image denoising. The algorithm iteratively updates the regularization parameter depending on the local smoothness of the outcome of the previous smoothing step. In addition, we propose an anisotropic total variation regularization step for edge enhancement. Test examples demonstrate the capability of our method to deal with varying, unknown noise levels.
1
Introduction
Because of its ability to generate images with piecewise smooth structures that are well separated by pronounced edges, total variation regularization is one of the most widely used techniques for image denoising and related tasks. Rudin, Osher, and Fatemi [14] first proposed using the total variation, that is, the L1-norm of the gradient, for denoising. Since then the method has been applied to a wide range of applications in imaging and inverse problems. We refer to [1, 2, 3, 5, 12, 13, 15] to name but a few contributions to this field. Given a noisy function f ∈ L2(Ω) on some open and bounded domain Ω ⊂ ℝ^n, n ∈ ℕ, the goal of denoising is to find a new function u close to f. This function should retain the important features of f while the noise, consisting of fast oscillations, is removed. Since edges are among the most prominent features in images, this task can be achieved by minimizing the total variation functional

$$T(u;\alpha) := \frac{1}{2}\int_\Omega \bigl(u(x) - f(x)\bigr)^2\,dx + \alpha\,|Du|(\Omega) \qquad (1)$$

with respect to u ∈ BV(Ω). The regularization parameter α > 0 in (1) controls the desired amount of smoothing: the larger α, the more the regularized function uα tends to consist of well-separated homogeneous regions. Conversely, a small parameter α yields a function lying close to the input data, but possibly also exhibiting a significant number of oscillations. The relation between α and uα, however, exists only on a qualitative level. There is no simple connection between the value of α and the smoothness of uα. Nor is there one between α and the difference f − uα, which is simply the part of the data classified as noise by the functional T. It is a well-established fact in the theory of inverse problems that both the data and the expected noise level must be taken into account (see for instance [8]).
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 331–342, 2009.
© Springer-Verlag Berlin Heidelberg 2009
332
M. Grasmair
For many applications of mathematical imaging, in particular tasks that are to be completely automated, precise knowledge of the noise is not available. In these cases, a-priori parameter choices are not feasible. Instead, one should adapt α until both uα and the perceived noise f − uα are satisfactory. Although better than a fixed a-priori choice, adapting the regularization parameter need not be sufficient for good results either. The noise on the image f may not be identically distributed but may vary locally. It is then difficult to find a compromise between two failure modes. Too large a parameter causes oversmoothing in noise-free regions, while too small a parameter leaves the output noisy. Similar effects can be observed if the structure of the noise-free data itself changes over the image. The regularization parameter should then be larger for homogeneous parts of the image than for parts with small details. The problem of finding a parameter suited to the whole image can be circumvented by passing from a global parameter α > 0 to a parameter function α : Ω → ℝ_{>0}. The regularization functional then reads

$$T(u;\alpha) = \frac{1}{2}\int_\Omega \bigl(u(x) - f(x)\bigr)^2\,dx + \int_\Omega \alpha(x)\,d|Du|(x) . \qquad (2)$$
This functional is well-defined if α is continuous. Using direct methods, it can readily be shown to attain a minimizer if α is bounded away from zero. Total variation regularization with a non-constant regularization parameter has already been studied in several other articles [6, 9, 10, 11, 16, 17]. In [16, 17], the choice of α is based on the scale of the features one wants to recover. In [10], the uniform problem is first solved with an automatically identified optimal regularization parameter α. The result of this first denoising attempt is used to extract the edges in the image, where the regularization parameter is subsequently increased locally. The minimization problem is then solved a second time with the localized parameter α(x). The approach in [11] uses statistical properties of the residual to decide whether the local regularization parameter is suitable. The criterion employed there is based on the local variance of the residual. If this variance is close to the noise level, one can expect that mostly noise has been filtered. If it is higher, then the residual probably contains texture, and the regularization parameter has to be decreased. The estimates in [11] are closely related to the inequalities in [10], though the approaches by which they are reached differ considerably. Note moreover that the same idea has already been employed in [6] for one-dimensional total variation regularization. In this paper, we propose to target some a-priori specified smoothness of the output uα. This smoothness is measured in terms of the oscillations of the direction ∇uα/|∇uα| of the gradient of the image. This direction can be determined by passing to a dual formulation, as it essentially equals the rescaled dual variable. The idea of basing parameter adaptation on the properties of the dual function is taken from [6].
Locally Adaptive Total Variation Regularization
333
The main idea of this paper, using a dual variable to estimate the smoothness of the regularized image, is introduced in Section 2. To further improve this smoothed image by enhancing the edges, we propose to subsequently apply anisotropic total variation regularization (see Section 3). The anisotropy is estimated from the same dual variable that determined the isotropic regularization parameter. A complete description of the algorithm can be found in Section 4. Finally, in Section 5 we apply this method to two test examples that show its suitability for adaptive noise removal.
2
Parameter Adaptation via Dual Variables
Consider the dual formulation of T(·; α), which consists in solving the constrained minimization problem

$$J(V) := \int_\Omega \bigl(\operatorname{div}V(x) + f(x)\bigr)^2\,dx \to \min, \qquad |V(x)| \le \alpha(x) \text{ a.e. on } \Omega, \qquad V(x)\cdot\nu(x) = 0 \text{ a.e. on } \partial\Omega, \qquad (3)$$
over the space L∞(Ω; ℝ^n) of vector-valued essentially bounded functions. In (3), ν denotes the outward normal to the domain Ω, and the equation V · ν = 0 is understood in a distributional sense. The divergence of an essentially bounded function is likewise defined distributionally. To be precise, the functions V and div V satisfy the equation

$$\int_\Omega \nabla\varphi(x)\cdot V(x)\,dx = -\int_\Omega \varphi(x)\,\operatorname{div}V(x)\,dx$$

for every φ ∈ C^1(ℝ^n). Minimization of Tα is equivalent to solving the dual problem (3) in the following sense: a function Vα ∈ L∞(Ω; ℝ^n) solves (3) if and only if uα := f + div Vα minimizes Tα. We refer to [4], which treats the dual formulation of total variation regularization, and to [7] for a detailed introduction to infinite-dimensional convex analysis. We now examine the dual variable V more closely. Formally, the optimality condition for a minimizer uα of the functional T reads

$$u_\alpha(x) - f(x) = \operatorname{div}\Bigl(\alpha(x)\,\frac{\nabla u_\alpha(x)}{|\nabla u_\alpha(x)|}\Bigr) \quad \text{for almost every } x \in \Omega .$$
Since uα − f = div Vα , one sees that the dual minimizer Vα introduced above in fact coincides with the direction of the gradient of uα , multiplied by α(x). In particular, for almost every x ∈ Ω, we either have that |Vα (x)| = α(x) or the gradient of uα at x is zero, that is, uα is approximately constant near x. Even more, the local behaviour of Vα is strongly related to a certain kind of regularity of the regularized function uα : Large variations of Vα /α on the unit
sphere imply equally large variations of the direction of the gradient of uα. In other words, variations of Vα/α indicate small-scale oscillations of uα. The method we propose in the following takes advantage of these properties of Vα and uα and exploits their relation. Let r > 0 be some fixed parameter. We define the r-local mean of a vector-valued, essentially bounded function W ∈ L∞(Ω; ℝ^n) at x ∈ Ω by

$$M_r(x;W) := \frac{1}{\mathcal{L}^n\bigl(B_r(x)\cap\Omega\bigr)}\int_{B_r(x)\cap\Omega} W(y)\,dy .$$

Here, L^n denotes the n-dimensional Lebesgue measure. In addition, we define the r-local variation of W by

$$\Sigma_r(x;W) := \bigl|W(x) - M_r(x;W)\bigr| . \qquad (4)$$

The definition of Σr directly implies that Σr(x; W) ≤ 2 ess sup{|W(y)| : y ∈ Br(x) ∩ Ω} for almost every x ∈ Ω. Applying this inequality to the scaled solution Wα(x) := Vα(x)/α(x) of (3), one immediately sees that

$$0 \le \Sigma_r(x;W_\alpha) \le 2\max\bigl\{|V_\alpha(y)|/\alpha(y) : y \in B_r(x)\cap\Omega\bigr\} \le 2 .$$

Moreover, the actual size of Σr(x; Wα) indicates how strongly uα oscillates near x. If Σr(x; Wα) is close to zero, then the gradient ∇uα points in roughly the same direction on the whole set Br(x). Conversely, a value above one implies that the orientation of ∇uα(x) differs greatly from the majority of directions present in Br(x). See Figure 1 for an example of a smoothed image with the corresponding local variation of the dual variable Vα. In this manner, the function Σr(x; Wα) can serve as a local criterion for the smoothness of the regularized function uα. If the desired final smoothness has not yet been reached, that is, if Σr(x; Wα) is too large, the local regularization parameter α(x) must be increased. Conversely, if the function uα appears too smooth, that is, if Σr(x; Wα) is close to zero, then α(x) is decreased and a new tentative solution uα is computed. This process of computing Σr(x; Wα) and updating α is repeated until the update of uα becomes small enough.
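A discrete version of the r-local mean and of Eq. (4) might look as follows. The sketch assumes a pixel grid. It approximates B_r(x) by the integer offsets with dy² + dx² ≤ r² and clips the ball at the image border to mimic B_r(x) ∩ Ω. The names are hypothetical. The field is a 2D list of 2-vectors such as W_α = V_α/α.

```python
import math


def ball_offsets(r):
    """Integer offsets (dy, dx) inside the discrete ball of radius r (assumed discretisation)."""
    k = int(math.floor(r))
    return [(dy, dx) for dy in range(-k, k + 1) for dx in range(-k, k + 1) if dy * dy + dx * dx <= r * r]


def local_mean(field, y, x, offsets):
    """M_r(x; W): mean of the vectors W over the ball intersected with the grid."""
    h, w = len(field), len(field[0])
    pts = [field[y + dy][x + dx] for dy, dx in offsets if 0 <= y + dy < h and 0 <= x + dx < w]
    return tuple(sum(p[c] for p in pts) / len(pts) for c in range(len(pts[0])))


def local_variation(field, r):
    """Sigma_r(x; W) = |W(x) - M_r(x; W)| at every pixel, as in Eq. (4)."""
    offsets = ball_offsets(r)
    return [[math.dist(field[y][x], local_mean(field, y, x, offsets)) for x in range(len(field[0]))]
            for y in range(len(field))]


# Checkerboard of opposite unit vectors: gradient directions flip between neighbours.
board = [[(1.0, 0.0) if (y + x) % 2 == 0 else (-1.0, 0.0) for x in range(3)] for y in range(3)]
sigma = local_variation(board, 1.0)
print(round(sigma[1][1], 6))  # 1.6
```

Values near 0 indicate locally aligned gradient directions, i.e. a smooth uα. Values approaching the bound 2 indicate directions that flip within the ball.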
To reach a uniform smoothness of the regularized image uα over its whole domain, we propose to prescribe some target smoothness 0 < θ < 1. A suitable update α̃ of α can then be computed by setting

$$\tilde\alpha(x) = \alpha(x)\bigl(\theta + \Sigma_r(x;W_\alpha)/2\bigr)^s \qquad (5)$$

for some parameter s > 0 that determines the size of the update. Iterating this update leads to a uniform smoothness Σr(x; Wα) ≈ 2(1 − θ), because (5) leaves α unchanged exactly when θ + Σr(x; Wα)/2 = 1. The choice
Fig. 1. Smoothed image (left) and corresponding function Σr (right). Bright pixel values indicate a higher value of Σr .
of the target smoothness should reflect the properties of the image one wants to recover. A large parameter (θ ≥ 0.7) means that only structures of about the size of r are of interest. Small values (θ ≈ 0.55) put more emphasis on structures smaller than r (see also Figure 4). To avoid too rapid changes of the parameter α(x), the update α̃ computed by means of (5) must be smoothed. From a theoretical point of view, too, this smoothing is required for obtaining a continuous regularization function α. We propose to simply replace the update α̃(x) by its local mean value Mr(x; α̃). In this way, the average smoothness in the balls Br(x) will be almost independent of x.
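The update of Eq. (5), followed by smoothing with the local mean, can be sketched in 1D. Here B_r(x) ∩ Ω is taken to be the index window [i − r, i + r] clipped to the signal, and r is an integer number of samples. The helper names are hypothetical. The example illustrates the fixed point: where Σr = 2(1 − θ), the factor θ + Σr/2 equals one and α is left unchanged. Larger Σr increases α and smaller Σr decreases it.

```python
def local_mean_1d(values, r):
    """M_r on a 1D grid: mean over the window [i - r, i + r] clipped to the domain."""
    n = len(values)
    out = []
    for i in range(n):
        window = values[max(0, i - r): min(n, i + r + 1)]
        out.append(sum(window) / len(window))
    return out


def update_alpha(alpha, sigma, theta, s, r):
    """Eq. (5) pointwise, then replace the update by its local mean M_r."""
    raw = [a * (theta + sg / 2.0) ** s for a, sg in zip(alpha, sigma)]
    return local_mean_1d(raw, r)


alpha = [0.2] * 6
theta = 0.85
at_target = update_alpha(alpha, [2 * (1 - theta)] * 6, theta, s=1.0, r=1)
too_rough = update_alpha(alpha, [1.0] * 6, theta, s=1.0, r=1)   # Sigma above target: alpha grows
too_smooth = update_alpha(alpha, [0.0] * 6, theta, s=1.0, r=1)  # Sigma below target: alpha shrinks
print([round(a, 6) for a in at_target])  # [0.2, 0.2, 0.2, 0.2, 0.2, 0.2]
```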
3
Edge Enhancement by Anisotropy
Having determined the size of the local regularization parameter α(x) by means of the scaled dual variable Wα, we can also use the distribution of the values of Wα on the unit sphere to sharpen edges. This applies in particular to thin ridges, which usually tend to get oversmoothed. To that end, instead of applying isotropic regularization, we introduce an anisotropy whose direction is determined by the local covariance of Wα. For R > 0 we define the ℝ^{n×n}-valued function CovR(x; W), the covariance of W on BR(x) ∩ Ω, by its (i, j)-th component

$$\mathrm{Cov}_R^{(i,j)}(x;W) := \frac{1}{\mathcal{L}^n\bigl(B_R(x)\cap\Omega\bigr)}\int_{B_R(x)\cap\Omega}\bigl(W^{(i)}(y) - M_R^{(i)}(x;W)\bigr)\bigl(W^{(j)}(y) - M_R^{(j)}(x;W)\bigr)\,dy . \qquad (6)$$

Since Wα is proportional to ∇uα, the principal component of CovR(x; Wα) indicates, up to sign, the prevailing direction of ∇uα near x. This dominant direction can be further pronounced by replacing the isotropic bound |Vα(x)| ≤ α(x) in (3) by an anisotropic one defined by CovR(x; Wα). This is achieved by minimizing J(V) subject to the constraints V · ν = 0 on ∂Ω and
$$c(x)\,V(x)^t\,\mathrm{Cov}_R(x;W_\alpha)\,V(x) \le 1 \quad \text{on } \Omega . \qquad (7)$$
Here, the scalar-valued function c : Ω → ℝ_{>0} has to be chosen so that a similar amount of smoothing is reached as for isotropic regularization with parameter α(x). To determine a suitable size for c, note that the amount of smoothing induced by the bound (7) can be estimated by the determinant of the matrix c(x) CovR(x; Wα). For consistency with the constraint |V(x)| ≤ α(x), which corresponds to the matrix α(x)^{−2} Id, this determinant should equal α(x)^{−2n}. Since det(c(x) CovR(x; Wα)) = c(x)^n det CovR(x; Wα), one obtains for the function c the value

$$c(x) = \alpha(x)^{-2}\bigl(\det\mathrm{Cov}_R(x;W_\alpha)\bigr)^{-1/n} .$$

We therefore propose to enhance edges by solving the minimization problem

$$J(V) = \int_\Omega \bigl(\operatorname{div}V(x) + f(x)\bigr)^2\,dx \to \min, \qquad V(x)^t A(x)\,V(x) \le 1 \text{ a.e. on } \Omega, \qquad V(x)\cdot\nu(x) = 0 \text{ a.e. on } \partial\Omega . \qquad (8)$$

Here

$$A(x) = \alpha(x)^{-2}\bigl(\det\mathrm{Cov}_R(x;W_\alpha)\bigr)^{-1/n}\,\mathrm{Cov}_R(x;W_\alpha)$$

and Wα = Vα/α, where Vα is the solution of (3). Denoting the solution of (8) by VA and defining uA := f + div VA, we obtain an enhanced version of the isotropic total variation minimizer uα.
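For n = 2, the matrix A(x) in (8) can be sketched from a sample of dual-variable directions W(y), y ∈ B_R(x) ∩ Ω. The sketch forms the covariance (6) as a plain average and scales it as in (8). It illustrates the two properties used in the text. First, det A = α^{−2n}. Second, an isotropic spread of directions gives back A = α^{−2} Id, i.e. the isotropic constraint |V| ≤ α. The text does not specify what happens when the covariance is singular (all sampled directions identical); here that case simply raises an error. The names are hypothetical.

```python
def covariance_2d(samples):
    """Cov_R of Eq. (6) for n = 2, with the normalised integral replaced by a sample mean."""
    m = len(samples)
    mx = sum(v[0] for v in samples) / m
    my = sum(v[1] for v in samples) / m
    cxx = sum((v[0] - mx) ** 2 for v in samples) / m
    cxy = sum((v[0] - mx) * (v[1] - my) for v in samples) / m
    cyy = sum((v[1] - my) ** 2 for v in samples) / m
    return ((cxx, cxy), (cxy, cyy))


def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def anisotropy_matrix(samples, alpha):
    """A(x) = alpha^-2 (det Cov)^(-1/n) Cov of Eq. (8), here with n = 2."""
    cov = covariance_2d(samples)
    d = det2(cov)
    if d <= 0.0:
        # assumption: treat a singular covariance as an error (not covered by the text)
        raise ValueError("singular covariance: A(x) of Eq. (8) is undefined here")
    scale = alpha ** -2 * d ** -0.5  # (det Cov)^(-1/n) with n = 2
    return tuple(tuple(scale * c for c in row) for row in cov)


samples = [(1.0, 0.0), (-1.0, 0.0), (0.5, 0.5), (-0.5, -0.5)]  # mostly horizontal directions
print(round(det2(anisotropy_matrix(samples, alpha=2.0)), 9))  # 0.0625, i.e. 2.0 ** -4
```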
4
Summary of the Algorithm
We now summarize the method developed in the previous sections for adaptive denoising of a noisy image f ∈ L2(Ω).
Algorithm 1. Set k = 1 and choose some initial regularization function α1 : Ω → ℝ_{>0}, a smoothness parameter 0 < θ < 1, and parameters r > 0, R > 0, s > 0, and ε > 0.
1. Compute $V_k := \arg\min\{J(V) : |V(x)| \le \alpha_k(x) \text{ on } \Omega,\ V\cdot\nu = 0 \text{ on } \partial\Omega\}$.
2. Define Wk := Vk/αk and compute Σr(x; Wk) (see (4)).
3. If ‖Vk − Vk−1‖ < ε, go to 5.
4. Compute $\hat\alpha_{k+1}(x) := \alpha_k(x)\bigl(\theta + \Sigma_r(x;W_k)/2\bigr)^s$ and $\alpha_{k+1}(x) := M_r(x;\hat\alpha_{k+1})$, increase k by one, and go to 1.
5. Compute CovR(x; Wk) (see (6)) and $A(x) := \alpha_k(x)^{-2}\bigl(\det\mathrm{Cov}_R(x;W_k)\bigr)^{-1/n}\,\mathrm{Cov}_R(x;W_k)$.
6. Compute $V_A := \arg\min\{J(V) : V(x)^t A(x)\,V(x) \le 1 \text{ on } \Omega,\ V\cdot\nu = 0 \text{ on } \partial\Omega\}$. Define the regularized function uA := f + div VA.
Steps 1–4 only determine the regularization function α. For this, it is not necessary to compute the minimizers of J precisely. A reasonable approximation of a minimizer is sufficient to provide a good update of α, at least during the first iterations. In particular, if an iterative method is used to minimize J, the computation time can be reduced by stopping the iteration well before convergence. In the numerical examples below, the functions Vk and VA were computed by alternating between gradient descent steps for the minimization of J and projections of V onto the sets {V : |V(x)| ≤ αk(x)} and {V : V(x)^t A(x) V(x) ≤ 1}, respectively. The function Vk−1 was used as the initial guess for the computation of Vk.
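Steps 1 to 4 of Algorithm 1, together with the projected-gradient scheme just described, can be sketched in one dimension. The sketch makes the following assumptions:

- V lives on the n − 1 interior edges of a 1D grid, so V · ν = 0 holds by setting V to zero at both ends, and div V is a forward difference.
- The step size τ = 0.12 is below 1/8, the inverse Lipschitz constant of ∇J in this discretization.
- ‖·‖ in step 3 is the maximum norm.
- All names are hypothetical.

Steps 5 and 6 are omitted because for n = 1 the matrix A(x) reduces to α(x)^{−2}, which is the isotropic constraint again.

```python
def tv_dual_1d(f, alpha, v0=None, iters=500, tau=0.12):
    """Projected gradient for J(V) = sum (div V + f)^2 subject to |V_j| <= alpha_j."""
    n = len(f)
    v = list(v0) if v0 is not None else [0.0] * (n - 1)  # V on the interior edges

    def primal(vals):
        full = [0.0] + vals + [0.0]  # V . nu = 0 at both ends of the interval
        return [f[i] + full[i + 1] - full[i] for i in range(n)]  # u = f + div V

    for _ in range(iters):
        u = primal(v)
        # dJ/dV_j = 2 (u_j - u_{j+1}): take a descent step, then project onto [-alpha_j, alpha_j]
        v = [max(-a, min(a, vj + 2.0 * tau * (u[j + 1] - u[j]))) for j, (vj, a) in enumerate(zip(v, alpha))]
    return v, primal(v)


def local_variation_1d(w, r):
    """Sigma_r on edges: |w_j - mean of w over the clipped window [j - r, j + r]|."""
    out = []
    for j in range(len(w)):
        win = w[max(0, j - r): j + r + 1]
        out.append(abs(w[j] - sum(win) / len(win)))
    return out


def adaptive_tv_1d(f, alpha0, theta=0.6, s=1.0, r=2, eps=1e-4, max_outer=50):
    """Steps 1-4 of Algorithm 1 on a 1D signal (illustrative sketch, not the author's code)."""
    alpha = [alpha0] * (len(f) - 1)
    v_prev = None
    for _ in range(max_outer):
        v, u = tv_dual_1d(f, alpha, v0=v_prev)  # step 1, warm-started with V_{k-1}
        used_alpha = alpha
        sigma = local_variation_1d([vj / a for vj, a in zip(v, alpha)], r)  # step 2
        if v_prev is not None and max(abs(a - b) for a, b in zip(v, v_prev)) < eps:
            break  # step 3 (max norm assumed)
        raw = [a * (theta + sg / 2.0) ** s for a, sg in zip(alpha, sigma)]  # step 4, Eq. (5)
        alpha = [sum(raw[max(0, j - r): j + r + 1]) / len(raw[max(0, j - r): j + r + 1])
                 for j in range(len(raw))]  # alpha_{k+1} = M_r(alpha_hat_{k+1})
        v_prev = v
    return u, used_alpha


noisy = [0.0, 0.1, -0.1, 0.05, 1.0, 0.9, 1.1, 1.0]
u, alpha = adaptive_tv_1d(noisy, alpha0=0.2)
print([round(x, 3) for x in u])
```

The warm start mirrors the statement that Vk−1 serves as the initial guess for Vk. Note that u = f + div V always preserves the sum of f, because the divergence telescopes to zero under V · ν = 0.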
5
Examples
The algorithm presented in Section 4 is tested on two images. The first, synthetic image shows a collection of ellipses and rectangles of different sizes and intensities (see Figure 2, upper left). These clean data were distorted by normally distributed random noise. To illustrate the capability of the algorithm to deal with varying noise levels, the standard deviation of the random noise was chosen to increase towards the bottom right of the image. It rises from about 10% of the maximal intensity to 150% (see Figure 2, lower left). Since the original image consists only of simple geometric shapes without any texture, it should be perfectly suited for total variation regularization. The changing noise level within the distorted data, however, makes a uniform parameter choice almost impossible. If the regularization parameter is chosen too small, then the noise on the right-hand side of the data is barely removed. In particular, the right-hand edges of the lower ellipses can hardly be recovered. On the other hand, too large a regularization parameter makes the small circle at the left-hand side of the image disappear (see Figure 2, middle column). Only a very small range of parameters removes the noise reasonably well while still preserving the small-scale structure, and even then the contrast deteriorates. Figure 2, upper right, shows the smoothed image obtained with Algorithm 1. Since the original image is very smooth, the smoothness parameter was chosen rather large, θ = 0.85. The local variation Σr was evaluated on balls with a radius of 3 pixels, the complete image measuring 256 × 256 pixels. The lower right image in Figure 2 shows the distribution of the finally chosen regularization function α. As expected, it increases towards the bottom right, where more noise is present. Over the whole image, the maxima and minima of α differ by a factor of 12.
Fig. 2. Left column: original and noisy image; the noise level increases to the right bottom of the image. Middle column: denoising without parameter adaptation; either small details are lost or the smoothing effects are partially insufficient. Right column, upper row: denoised image for smoothness parameter θ = 0.85. Right column, lower row: logarithm of the finally chosen regularization function α; the minima and maxima of α differ by a factor of 12.
One can see in the resulting image that the noise is efficiently removed. Also, the shape of the two lower ellipses is reconstructed in a reasonable way, considering that rather more noise than signal is present in these regions. Moreover, the small circle on the left is clearly visible, though some contrast was lost. As a second test example, we consider the photographer image. In a first experiment we add different levels of random noise (see Figure 3). The outcome of the adaptive Algorithm 1 (right column) is compared with the solution of standard total variation regularization with constant parameter choice independent of the noise level (middle column). The smoothness parameter for the adaptive algorithm was chosen as θ = 0.60; the regularization parameter for the standard algorithm was selected in such a way that the results for moderate noise level (third row) are comparable. The results show that, as expected, a constant regularization parameter only yields good results for a very specific noise level. For stronger noise, almost no smoothing is obtained, whereas the image is oversmoothed in case it is already quite clean. In contrast, the adaptive algorithm yields comparable results for different noise levels, and is also able to treat noise-free images (first row). In order to illustrate the effect of the smoothness parameter, we apply Algorithm 1 to the noise-free photographer image and vary θ (see Figure 4). For a value of θ = 0.55 mainly the grass and details of the camera are smoothed. As
Fig. 3. Left column: image with Gaussian noise; the noise level increases with each row (σ = 0, 30, 50, 100). Middle column: total variation denoised image with constant parameter choice. The regularization parameter is kept the same for all images. Right column: denoised images with adaptive parameter choice for a smoothness parameter θ = 0.60.
θ increases, more and more details are lost until only the large scale structures in the image remain. Thus, the smoothness parameter works in some sense like the regularization parameter of standard total variation regularization.
M. Grasmair
Fig. 4. Influence of the smoothness parameter θ. First row: original image and smoothed images with θ = 0.55 and θ = 0.60. Second row: smoothed images with θ = 0.65, θ = 0.70, and θ = 0.80.

Table 1. Comparison between standard TV regularization, the method proposed in [11], and our method for different smoothness parameters. The table provides signal-to-noise ratios for the photographer image with various levels of Gaussian noise added (σ = 20, 30, 40, 50).

σ     original   uniform TV   adaptive ([11])   θ = 0.55   θ = 0.60   θ = 0.65
20    9.47       14.63        15.63             15.30      14.73      13.71
30    6.31       12.13        13.35             13.28      13.46      12.84
40    3.86       10.05        11.59             11.47      12.45      12.18
50    1.93        8.38        10.11             10.00      11.54      11.51
There is, however, a notable difference. In the standard method, the stage at which a structure in the image disappears depends on its scale, which is basically the ratio between its contrast (the difference between the intensities of the structure and the background) and its perimeter. As opposed to this, the model presented here puts much less emphasis on the contrast: low-contrast but distinct parts of the image tend to disappear much later than with uniform regularization. Compare, for instance, the rightmost building in the images of Figure 4 with the outcome of the standard method (Figure 3, first row, middle image).
Finally, Table 1 compares the performance of our algorithm with uniform total variation regularization and the adaptive method from [11]. The regularization parameter for the comparison was chosen such that the norm of the residual equals the norm of the noise. At small noise levels, the texture-enhancing method [11] and even uniform regularization perform better. On the other hand, our algorithm provides good results when much noise is present. Note, moreover, that the method proposed here does not require a guess of the noise level, whereas the other methods do.
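The uniform baseline above picks its single parameter by the discrepancy principle. A minimal NumPy sketch of that choice follows; it is not the author's code. The text does not say which solver was used for uniform TV, so we swap in Chambolle's dual projection algorithm [4] for the ROF functional $\min_u \|u - f\|^2 / (2\lambda) + TV(u)$ (forward differences, Neumann boundary) and find $\lambda$ by bisection on a log scale; bisection is justified because the ROF residual norm is nondecreasing in $\lambda$. The function names, iteration counts, search interval and the synthetic test image are our assumptions. Note that the noise norm must be known, which is precisely the information the adaptive method does without.

```python
import numpy as np


def grad(u):
    """Forward differences with Neumann boundary (zero in the last row/column)."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy


def div(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    d = np.zeros_like(px)
    d[0, :] += px[0, :]
    d[1:-1, :] += px[1:-1, :] - px[:-2, :]
    d[-1, :] -= px[-2, :]
    d[:, 0] += py[:, 0]
    d[:, 1:-1] += py[:, 1:-1] - py[:, :-2]
    d[:, -1] -= py[:, -2]
    return d


def rof_chambolle(f, lam, n_iter=200, tau=0.125):
    """Approximate argmin_u ||u - f||^2 / (2 lam) + TV(u) by Chambolle's dual iteration."""
    px = np.zeros_like(f)
    py = np.zeros_like(f)
    for _ in range(n_iter):
        gx, gy = grad(div(px, py) - f / lam)
        norm = np.sqrt(gx**2 + gy**2)
        px = (px + tau * gx) / (1.0 + tau * norm)
        py = (py + tau * gy) / (1.0 + tau * norm)
    return f - lam * div(px, py)


def discrepancy_lambda(f, noise_norm, lo=1e-3, hi=1e4, n_bisect=30):
    """Log-scale bisection for lam such that ||f - u_lam|| = noise_norm."""
    for _ in range(n_bisect):
        mid = np.sqrt(lo * hi)
        if np.linalg.norm(f - rof_chambolle(f, mid)) < noise_norm:
            lo = mid  # residual too small: regularize more
        else:
            hi = mid  # residual too large: regularize less
    return np.sqrt(lo * hi)


# Synthetic example: a bright square plus Gaussian noise of known norm.
rng = np.random.default_rng(0)
clean = np.zeros((32, 32))
clean[8:24, 8:24] = 100.0
noise = 5.0 * rng.standard_normal(clean.shape)
f = clean + noise
lam = discrepancy_lambda(f, np.linalg.norm(noise))
residual = np.linalg.norm(f - rof_chambolle(f, lam))  # close to ||noise||
```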
6 Conclusion
We have introduced an algorithm for the local adaptation of the regularization parameter in total variation regularization applied to the task of image denoising. The main idea of the method is to base the parameter choice on the smoothness of the output image, which is measured in terms of the variation of the direction of its gradient. This variation can be obtained when employing a dual method for the actual minimization of the total variation regularization functional. Starting from an initial guess of the regularization function, the proposed algorithm consecutively computes the corresponding minimizer of the total variation functional and updates the regularization function depending on the smoothness of the update. The iteration stops when the update is sufficiently small. As a post-processing step, we propose to apply an anisotropic regularization method intended to sharpen edges. Again, the regularization is determined by the dual variable. This anisotropic regularization step reduces the contrast loss due to isotropic smoothing and, in particular, is suited for the enhancement of ridges. The examples presented in Section 5 indicate the suitability of the proposed method for denoising images with unknown, varying noise levels. In particular, they show its ability to provide an estimate for the amount of smoothing required to obtain a certain smoothness of the output.
Acknowledgement This work has been supported by the Austrian Science Fund (FWF) within the national research network Industrial Geometry, project 9203-N12.
References
1. Acar, R., Vogel, C.R.: Analysis of bounded variation penalty methods for ill-posed problems. Inverse Probl. 10(6), 1217–1229 (1994)
2. Aubert, G., Kornprobst, P.: Mathematical problems in image processing. In: Partial differential equations and the calculus of variations, With a foreword by Olivier Faugeras, 2nd edn. Applied Mathematical Sciences, vol. 147. Springer, New York (2006)
3. Burger, M., Osher, S.: Convergence rates of convex variational regularization. Inverse Probl. 20(5), 1411–1421 (2004)
4. Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging Vision 20(1–2), 89–97 (2004)
5. Chambolle, A., Lions, P.-L.: Image recovery via total variation minimization and related problems. Numer. Math. 76(2), 167–188 (1997)
6. Davies, P.L., Kovac, A.: Local extremes, runs, strings and multiresolution. Ann. Statist. 29(1), 1–65 (2001)
7. Ekeland, I., Temam, R.: Convex Analysis and Variational Problems. North-Holland, Amsterdam (1976)
8. Engl, H.W., Hanke, M., Neubauer, A.: Regularization of inverse problems. Mathematics and its Applications, vol. 375. Kluwer Academic Publishers Group, Dordrecht (1996)
9. Frigaard, I.A., Ngwa, G., Scherzer, O.: On effective stopping time selection for visco-plastic nonlinear BV diffusion filters used in image denoising. SIAM J. Appl. Math. 63(6), 1911–1934 (electronic) (2003)
10. Frigaard, I.A., Scherzer, O.: Herschel–Bulkley diffusion filtering: non-Newtonian fluid mechanics in image processing. Z. Angew. Math. Mech. 86(6), 474–494 (2006)
11. Gilboa, G., Sochen, N., Zeevi, Y.Y.: Variational denoising of partly-textured images by spatially varying constraints. IEEE Trans. Image Process. 15(8), 2281–2289 (2006)
12. Ito, K., Kunisch, K.: Augmented Lagrangian methods for nonsmooth, convex optimization in Hilbert spaces. Nonlinear Anal. 41A, 591–616 (2000)
13. Nashed, M.Z., Scherzer, O.: Least squares and bounded variation regularization with nondifferentiable functional. Numer. Funct. Anal. Optim. 19(7-8), 873–901 (1998)
14. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D 60(1–4), 259–268 (1992)
15. Scherzer, O., Grasmair, M., Grossauer, H., Haltmeier, M., Lenzen, F.: Variational Methods in Imaging. Applied Mathematical Sciences, vol. 167. Springer, New York (2008)
16. Strong, D.M.: Adaptive Total Variation Minimizing Image Restoration. CAM Report 97-38, University of California, Los Angeles (1997)
17. Strong, D.M., Aujol, J.-F., Chan, T.F.: Scale recognition, regularization parameter selection, and Meyer’s G norm in total variation regularization. Multiscale Model. Simul. 5(1), 273–303 (electronic) (2006)
Basic Image Features (BIFs) Arising from Approximate Symmetry Type
Lewis D. Griffin¹, Martin Lillholm¹, Mike Crosier¹, and Justus van Sande²
¹ Computer Science, University College London, London WC1E 6BT, UK
² Biomedical Engineering, Eindhoven University of Technology, The Netherlands
Abstract. We consider detection of local image symmetry using linear filters. We prove a simple criterion for determining if a filter is sensitive to a group of symmetries. We show that derivative-of-Gaussian (DtG) filters are excellent at detecting local image symmetry. Building on this, we propose a very simple algorithm that, based on the responses of a bank of six DtG filters, classifies each location of an image into one of seven Basic Image Features (BIFs). This effectively and efficiently realizes Marr’s proposal for an image primal sketch. We summarize results on the use of BIFs for texture classification, object category detection, and pixel classification. Keywords: Gaussian Derivatives, Hermite Transform, Group Theory.
1 Introduction
Previous schemes for detection of image symmetry are fairly complex [1-6], requiring, for example, comparison of the outputs of filters at multiple positions. Herein we show that symmetries may be detected by single linear filters. Building on this, we present a simple algorithm that computes a Marr-type primal sketch [7] by categorizing local image structure according to its approximate symmetry. The paper is organized as follows. In Section 2 we present results on image symmetries. In Section 3 we show how to test whether a linear filter is sensitive to a symmetry. In Section 4 we review image measurement with derivative-of-Gaussian (DtG) filters. In Section 5 we consider the symmetry-sensitivity of these DtG filters. In Section 6 we show how this sensitivity gives rise to a system of Basic Image Features (BIFs). In Section 7 we summarize results on using BIFs for texture categorization, object category detection and pixel classification. Section 8 concludes. Sections 2-5 distil work that is published, in press, or under review in fuller form elsewhere [8-14]; Section 6 is new; parts of Section 7 have been presented or are under review in fuller form elsewhere [9, 11].
2 Image Symmetries
Symmetry of a structure X is always relative to some class of admissible transformations. A structure is said to have a symmetry when there is a non-trivial group of admissible transformations, known as the automorphism group, each of which leaves it indistinguishable from the original. This is denoted by $\mathrm{Aut}[X] := \{ t \mid t \circ X = X \}$.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 343–355, 2009.
© Springer-Verlag Berlin Heidelberg 2009

L.D. Griffin et al.
Considering images, an obvious class of transformations is that of the spatial isometries, and the possible symmetries relative to this class have long been catalogued [15-17]. A broader class of transformations, in which each spatial isometry is combined with a permutation of a finite set of image ‘colour’ values, has also been considered. These allow the symmetries of, for example, Escher’s ‘Reptiles’ to be expressed [18]. The gamut of possible ‘colour symmetries’ has been fully determined [19, 20]. We propose that the class of ‘image isometries’, defined as a spatial isometry combined with an intensity isometry, is appropriate for images. We write an image isometry as $g = (i, s)$, where $i : \mathbb{R} \to \mathbb{R}$ is an intensity isometry and $s : \mathbb{R}^2 \to \mathbb{R}^2$ is a spatial isometry. Such an image isometry is applied to an image $I : \mathbb{R}^2 \to \mathbb{R}$ according to

$$g \circ I = i \circ I(s \circ \cdot) = i\big(I(s(\cdot))\big).$$

Choosing a class of transformations is tantamount to choosing a geometry [21], and the geometry that corresponds to the class of image isometries has previously been considered for images [22] and, much earlier and abstractly, as one of a larger class of possible geometries [23]. We have employed a method for determining the possible automorphism groups of images relative to the class of image isometries. The method relies on two results. First, the projection of a group of image isometries onto either its spatial or its intensity components yields a group. Second, except for a special case, the intensity projection group must be isomorphic to a factor group of the spatial projection group [8]. Using the method, we have determined the possible automorphism groups of 2-D images, except for cases that contain discrete periodic translations. A summary of these possible symmetries, together with our notational system, is shown in Fig. 1. The symmetries include familiar ones, such as reflectional (J2,1), reflect-and-negate (J6,1), and Yin-Yang type (J7,2); simple but often ignored ones, such as variation in one direction only (J3); simple but novel ones, such as continuous translate-and-increment in one direction plus a line of reflection parallel to that direction (J11); and some wholly novel ones, such as continuous translate-and-increment in one direction plus a continuous line of centres of Yin-Yang type symmetry (J12).
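To make the definition of $\mathrm{Aut}[X]$ concrete, the following sketch tests whether a sampled image is left unchanged by an image isometry $g = (i, s)$. It assumes, as a simplification of our own, an odd-sized pixel grid centred on the origin, and restricts $s$ to reflections and rotations that map that grid onto itself. Because a group contains the inverse of each element, it does not matter for membership testing whether NumPy's `rot90` realises $s$ or $s^{-1}$. All helper names are ours.

```python
import numpy as np


def apply_image_isometry(image, intensity, spatial):
    """g ∘ I for g = (i, s): resample the grid with s, then map the values with i."""
    return intensity(spatial(image))


def in_automorphism_group(image, intensity, spatial, atol=1e-12):
    """True if g ∘ I = I, i.e. g belongs to Aut[I]."""
    return np.allclose(apply_image_isometry(image, intensity, spatial), image, atol=atol)


def identity(values):
    return values


def negate(values):
    return -values


def rotate_180(a):
    return np.rot90(a, 2)


# Odd-sized grid centred on the origin, so np.fliplr is the reflection x -> -x.
x = np.arange(-3, 4, dtype=float)
X, Y = np.meshgrid(x, x)
ramp = X                          # varies in the x direction only
blob = np.exp(-(X**2 + Y**2))     # radially symmetric

print(in_automorphism_group(ramp, negate, np.fliplr))    # True: reflect-and-negate
print(in_automorphism_group(ramp, identity, np.fliplr))  # False: not a plain reflection
print(in_automorphism_group(ramp, negate, rotate_180))   # True: rotate-by-pi-and-negate
print(in_automorphism_group(blob, identity, np.rot90))   # True: 90-degree rotation
```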
3 Sensitivity of Linear Filters to Symmetries
Detection of a symmetry seems to require multiple measurements, but this is incorrect. Consider a +1/−1 filter, such as is used in finite-difference schemes. When it is positioned so that it straddles a putative line of reflection, a necessary criterion for the symmetry is that the filter gives a 0 response. We generalize this: a filter $F$ is sensitive to a symmetry $K$ if it gives the same response to all images that have the symmetry (i.e. $\exists f \in \mathbb{R} : \mathrm{Aut}[I] \supseteq K \Rightarrow \langle F \mid I \rangle = f$). This definition is impractical because it requires assessment across all images. However, we have found a necessary and sufficient test that requires only a single integral to be computed. We present it below in Theorem 1, after introducing some notation.
[Figure: group/subgroup lattice with nodes J0; J1,2, J1,3, J1,4, …; J2,1, J2,2, J2,3, …; J3; J4; J5; J6,1, J6,2, J6,3, …; J7,2, J7,4, J7,6, …; J8,2, J8,4, J8,6, …; J9; J10; J11; J12; Jslope; Jconst.]
Fig. 1. The group/subgroup lattice of the possible image symmetries, excluding those with discrete periodic translation.
We use an inner product notation $\langle F \mid I \rangle := \int_{\vec{x} \in \mathbb{R}^2} F(\vec{x})\, I(\vec{x})\, d\vec{x}$ to denote the measurement of an image $I : \mathbb{R}^2 \to \mathbb{R}$ by a filter $F : \mathbb{R}^2 \to \mathbb{R}$, and we define an operator

$$F^{(K)} := \sum_{(i, s) \in K} i\, s \circ F,$$

which, roughly speaking, ‘smears’ a filter by a group.

Theorem 1 (Symmetry-Sensitivity Test for Filters). $F$ is sensitive to $K$ if and only if $\langle F \mid F^{(K)} \rangle = 0$.

Proof. A formal proof will be published elsewhere [14]. Intuitively, the truth of the theorem can be understood as follows. The signal that a filter ‘sees’ best is a copy of itself, and of all the symmetric signals, a symmetrised version of the filter should be the most easily seen. If the filter cannot see a symmetrised version of itself, it cannot see any symmetric signal, so it gives the same response to all of them; this is exactly what it means for the filter to be sensitive to the symmetry.
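Theorem 1 reduces sensitivity to a single inner product, which the following sketch evaluates for discrete filters and finite groups. It assumes, as our own simplification, that each intensity isometry contributes only its linear part (a sign of +1 or −1) when smearing a filter, and that filters are sampled on grids centred on the measurement point. The examples reproduce the +1/−1 filter discussed in Section 3; the function names are ours.

```python
import numpy as np


def smear(F, group):
    """F^(K) = sum over (i, s) in K of i * (s ∘ F), for a finite group K.

    Each group element is a pair (sign, spatial): sign is the linear part
    (+1 or -1) of the intensity isometry, spatial maps a centred array to
    a centred array.
    """
    return sum(sign * spatial(F) for sign, spatial in group)


def is_sensitive(F, group, atol=1e-12):
    """Theorem 1: F is sensitive to K iff <F | F^(K)> = 0."""
    return abs(float(np.sum(F * smear(F, group)))) <= atol


def identity(a):
    return a


def mirror_x(a):
    return np.flip(a, axis=1)  # reflection in the vertical line x = 0


reflection = [(+1, identity), (+1, mirror_x)]      # plain mirror symmetry
reflect_negate = [(+1, identity), (-1, mirror_x)]  # mirror plus intensity negation

dx = np.array([[-1.0, 0.0, 1.0]])   # centred finite-difference x-derivative
blur = np.array([[1.0, 2.0, 1.0]])  # centred smoothing filter

print(is_sensitive(dx, reflection))        # True
print(is_sensitive(dx, reflect_negate))    # False
print(is_sensitive(blur, reflection))      # False
print(is_sensitive(blur, reflect_negate))  # True

# Direct check of the definition: every mirror-symmetric patch gives dx the same response.
rng = np.random.default_rng(0)
patch = rng.normal(size=(1, 3))
symmetric = patch + mirror_x(patch)
response = float(np.sum(dx * symmetric))   # 0.0 up to rounding, whatever the patch
```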
4 Gaussian Derivative Filters
Gaussian Derivative (DtG) filters are defined in 1-D by

$$G_\sigma(x) := (2\pi\sigma^2)^{-1/2}\, e^{-\frac{x^2}{2\sigma^2}}, \qquad G_\sigma^{(n)}(x) := \frac{d^n}{dx^n} G_\sigma(x) = \left( \frac{-1}{\sigma\sqrt{2}} \right)^{n} H_n\!\left( \frac{x}{\sigma\sqrt{2}} \right) G_\sigma(x),$$

where $H_n$ is the $n$th Hermite polynomial, and in 2-D by $G_\sigma^{(m,n)}(x, y) := G_\sigma^{(m)}(x)\, G_\sigma^{(n)}(y)$.
They are used as a general-purpose method to probe an image location (which for simplicity we assume is at the origin $\vec{0}$) by computation of inner products $j_{mn} = \langle G_\sigma^{(m,n)} \mid I \rangle$. Typically, one measures with a family of DtG filters up to some order, e.g. the 2nd order family $\{ G_\sigma^{(m,n)} \mid 0 \le m + n \le 2 \}$. Scale-normalized filter responses $c_{pq} := \sigma^{p+q} j_{pq}$ make later equations simpler. The suitability of DtGs as the front-end of an uncommitted computational vision system arises from the symmetries that individual filters and families possess [24]. First amongst these is a scale symmetry, which manifests as a change of size, but not of shape, when a DtG is rescaled by blurring with a Gaussian kernel. Second, the linear span of a family of DtGs is rotationally symmetric. The responses of a bank of DtG filters entangle intrinsic and extrinsic aspects of image structure. For example, an in-plane rotation of the image about the measurement point causes the DtG responses to change. A representation that disentangles these aspects for measurement up to 2nd order has been developed [13]. The representation works by factoring out of the 6-D 2nd order DtG response space the changes due to the group of image isometries that fix the measurement point and do not invert the intensity axis, which we denote $D_\infty(\vec{0}) \times A^+(1)$.

[Figure: an image patch probed by the 2nd order DtG family gives a point in 6-D jet space (example responses 12.1, −2.3, −0.1, 0.8, 4.1, −3.7); factoring the 6-D jet space by $D_\infty(\vec{0}) \times A^+(1)$, the group of centred rotations and reflections and positive affine intensity re-scalings, gives the 2nd order local-image-structure orbifold.]
Fig. 2. The top part illustrates schematically the probing of an image patch by a bank of DtG filters, resulting in a point in jet space; the bottom, the factoring of the jet space by a group of transformations, resulting in the local-image-structure orbifold.
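The measurement of the scale-normalized 2nd order jet $c_{pq}$ at a single location can be sketched in NumPy, sampling $G_\sigma^{(m,n)}$ on the pixel grid through the Hermite form of the DtG definition. The assumptions are ours: unit pixel spacing, x along columns and y along rows, and kernels truncated to the patch. `numpy.polynomial.hermite.hermval` evaluates physicists' Hermite series, the convention under which the Hermite form agrees with direct differentiation. Note that with $j_{mn} = \langle G_\sigma^{(m,n)} \mid I \rangle$ as defined, odd-order responses carry the opposite sign to the blurred derivative (integration by parts), so the ramp $I = x$ gives $j_{10} \approx -1$.

```python
import numpy as np
from numpy.polynomial.hermite import hermval  # physicists' Hermite series


def dtg_1d(n, sigma, x):
    """G_sigma^(n)(x) via the Hermite form of the n-th Gaussian derivative."""
    s = sigma * np.sqrt(2.0)
    gauss = np.exp(-x**2 / (2.0 * sigma**2)) / np.sqrt(2.0 * np.pi * sigma**2)
    h_n = hermval(x / s, [0.0] * n + [1.0])  # coefficient vector selecting H_n
    return (-1.0 / s) ** n * h_n * gauss


def jet(patch, sigma):
    """Scale-normalised 2nd order jet {(m, n): c_mn} at the centre of a patch.

    Implements j_mn = <G^(m,n) | I> as a pixel sum and c_mn = sigma^(m+n) j_mn.
    Assumes a square patch of odd side whose centre pixel is the origin,
    with x along columns and y along rows.
    """
    r = patch.shape[0] // 2
    x = np.arange(-r, r + 1, dtype=float)
    c = {}
    for m in range(3):
        for n in range(3 - m):
            # kernel[row, col] = G^(n)(y) * G^(m)(x)
            kernel = np.outer(dtg_1d(n, sigma, x), dtg_1d(m, sigma, x))
            c[(m, n)] = sigma ** (m + n) * float(np.sum(kernel * patch))
    return c


sigma = 2.0
x = np.arange(-10, 11, dtype=float)  # patch radius 5 * sigma
X, Y = np.meshgrid(x, x)
ramp_jet = jet(X, sigma)        # ramp_jet[(1, 0)] is approximately -2 (sigma * -1)
quad_jet = jet(X**2, sigma)     # quad_jet[(2, 0)] is approximately 8 (sigma^2 * 2)
```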
The result is an orbifold (a type of manifold with boundaries, creases and corners allowed) consisting of a 3-D and a 0-D component (Fig. 2). The intrinsic aspect of a 6-tuple of filter responses corresponds to a particular location in the orbifold, and is invariant to rotating the image about the measurement point, reflecting it in a line through the measurement point, or affinely scaling the intensity. When the responses of the 1st and 2nd order DtG filters are all zero, the intrinsic aspect is the 0-D part of the orbifold; all other responses map to the 3-D component. A coordinate system $(l, b, a) \in [-\frac{\pi}{2}, \frac{\pi}{2}] \times [0, \frac{\pi}{2}] \times [0, \frac{\pi}{2}]$ for the 3-D component is given by [13]:

$$l = \arctan\!\left( \sqrt{4\left(c_{10}^2 + c_{01}^2\right) + (c_{20} - c_{02})^2 + 4c_{11}^2},\; c_{20} + c_{02} \right)$$

$$b = \arctan\!\left( 2\sqrt{c_{10}^2 + c_{01}^2},\; \sqrt{(c_{20} - c_{02})^2 + 4c_{11}^2} \right)$$

$$a = \tfrac{1}{2} \arctan\!\left( \left(c_{01}^2 - c_{10}^2\right)(c_{02} - c_{20}) + 4c_{10}c_{01}c_{11},\; \left| 2\left( \left(c_{01}^2 - c_{10}^2\right) c_{11} + c_{10}c_{01}(c_{20} - c_{02}) \right) \right| \right)$$

where the two-argument $\arctan(x, y)$ denotes the polar angle of the point $(x, y)$, the reading consistent with the stated coordinate ranges.
The orbifold has been equipped with a metric, induced by one on the filter response space, which, expressed as a line element in the lba-system, is

$$ds^2 = dl^2 + \cos^2 l \left( db^2 + \frac{2 \sin^2 2b}{5 - 3\cos 2b}\, da^2 \right).$$

The orbifold is intrinsically curved, but it can be embedded into Euclidean 3-space with only mild distortion.
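The lba coordinates and the line element can be transcribed directly into NumPy. In this sketch the two-argument arctan(x, y) is read as the polar angle of (x, y), i.e. `np.arctan2(y, x)`, which is the reading consistent with the stated coordinate ranges, and the second argument in $a$ is taken in absolute value, as the range $[0, \pi/2]$ and invariance to reflections require. The function names are ours; the invariance checks in the tests exercise a reflection and a quarter-turn rotation of the jet.

```python
import numpy as np


def lba(c10, c01, c20, c11, c02):
    """Orbifold coordinates (l, b, a) of a 2nd order jet (c00 is not needed)."""
    grad = 2.0 * np.hypot(c10, c01)           # 2 * sqrt(c10^2 + c01^2)
    dev = np.hypot(c20 - c02, 2.0 * c11)      # sqrt((c20 - c02)^2 + 4 c11^2)
    l = np.arctan2(c20 + c02, np.hypot(grad, dev))  # in [-pi/2, pi/2]
    b = np.arctan2(dev, grad)                       # in [0, pi/2]
    x = (c01**2 - c10**2) * (c02 - c20) + 4.0 * c10 * c01 * c11
    y = 2.0 * ((c01**2 - c10**2) * c11 + c10 * c01 * (c20 - c02))
    a = 0.5 * np.arctan2(np.abs(y), x)              # in [0, pi/2]
    return l, b, a


def line_element_coeffs(l, b):
    """Coefficients of dl^2, db^2 and da^2 in ds^2 at the point (l, b, .)."""
    cos2l = np.cos(l) ** 2
    return 1.0, cos2l, cos2l * 2.0 * np.sin(2.0 * b) ** 2 / (5.0 - 3.0 * np.cos(2.0 * b))


print(lba(0.0, 0.0, 1.0, 0.0, 1.0)[0])   # pi/2: a pure blob sits at the top of l
print(lba(0.0, 0.0, 1.0, 0.0, -1.0)[1])  # pi/2: a pure saddle has b = pi/2
```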
5 Symmetry-Sensitivity of DtG Filters
Using the elements of Sections 2-4, we can determine which DtG filters are sensitive to which symmetries. We consider not just canonical filter forms (e.g. an x-derivative) but any linear combination of filters in the 2nd order filter family. This allows us to determine the symmetry-sensitivity of the entire filter family, independent of the particular basis filters used. For example, while the x-derivative filter is sensitive to a reflectional symmetry with a vertical mirror line through the measurement point, the x- and y-derivatives together are sensitive to any reflectional symmetry in a line through the measurement point.

[Figure: lattice of sensitivity-submanifolds, each node labelled with the symmetry types that share it; panels include ‘SS is the exterior only’ and ‘SS is the entire volume’.]
Fig. 3. The sensitivity-submanifolds (SS) of different symmetry types are shown in red. The different possible SS are arranged in a lattice induced by inclusion relations. The symmetry type labels correspond to those used in Fig. 1. Superscripts indicate the spatial relationship between the symmetry and the origin: a superscript c indicates origin-centred rotation; a+ indicates that the origin is contained in a line of reflection but is not a centre of rotation; a− is used similarly for anti-reflections; g indicates general position, neither centred nor aligned. All symmetries labelled in a box have the indicated SS; those on the left are minimal.
The filter family sensitivities can be projected into the orbifold to determine where the intrinsic component of the jet responses must lie whenever the image has any of a class of symmetries equivalent by conjugation with an element of $D_\infty(\vec{0}) \times A^+(1)$. We call the restricted set of possible responses the sensitivity-submanifold (SS). For example, the SS is the orbifold exterior ($a = 0$ or $a = \frac{\pi}{2}$) for reflectional symmetry in a mirror through the measurement point. The results are summarized in Fig. 3.
6 Symmetry-Based Basic Image Features (BIFs)
We have used the symmetry sensitivities of the DtG filters as a starting point in defining a set of Basic Image Features (BIFs) that realize, in a computationally simple scheme, Marr’s idea of a primal sketch of image structure. We do not claim that the scheme is derived as rigorously as the results on symmetry sensitivity. Our scheme works by considering the orbifold projection of jets and classifying them according to the SS that they are closest to, i.e. we define a Voronoi cell partitioning of the orbifold with the SS as cell centres. We find that this works best when only seven 0-D SS (the first and second rows of Fig. 3) are used, though we cannot justify this beyond the fact that it produces good results. The resulting orbifold partitioning is shown in the top-left of Fig. 4.
Fig. 4. Top left: the partitioning of the orbifold into BIF categories. Bottom left: BIFs calculated across a range of scales for a simple image of a figure ‘8’; in each cube scale increases right-to-left. Lower cubes sectioned for visualisation. Right: an example complex greyscale image, with BIFs calculated at one particular scale.
The orbifold distances to the six of these SS that lie in the 3-D component of the orbifold are simple to compute; for example, the distance to the $J^{c}_{7,2}$ SS is

$$\tan^{-1} \sqrt{ \frac{ \tfrac{1}{2} c_{20}^2 + c_{11}^2 + \tfrac{1}{2} c_{02}^2 }{ c_{10}^2 + c_{01}^2 } }.$$

To find which distance is shortest, it is computationally equivalent but simpler to find which of six quantities is maximum. The distance to the seventh SS, which corresponds to the origin of jet space where all the 1st and 2nd derivative filters have zero response, is not well-defined. We incorporate it into our scheme by using a multiple of the 0th order jet response. The full resulting scheme for computing BIFs is as follows.
i) Compute scale-normalized DtG filter responses as described in Section 4.
ii) Compute $\lambda := \frac{1}{2}(c_{20} + c_{02})$ and $\gamma := \sqrt{\frac{1}{4}(c_{20} - c_{02})^2 + c_{11}^2}$.
iii) Classify according to which is the largest of

$$M = \left\{ \varepsilon c_{00},\; \sqrt{c_{10}^2 + c_{01}^2},\; \lambda,\; -\lambda,\; \frac{\gamma + \lambda}{\sqrt{2}},\; \frac{\gamma - \lambda}{\sqrt{2}},\; \gamma \right\}.$$
In our scheme, the only free parameters that have to be tuned to the application are the filter scale σ and ε, which controls the amount of image classified as flat; a setting of ε = 0.05 is an effective default. For display purposes we find the following colour scheme effective: if $\varepsilon c_{00}$ is the largest element of $M$, colour the pixel pink; if $\sqrt{c_{10}^2 + c_{01}^2}$ is largest, colour it grey; for the remaining elements, in order, use black, white, blue, yellow and green.
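The three-step scheme translates into a few lines of NumPy. This is a minimal sketch under assumptions of our own: Gaussian derivative images are obtained by separable convolution with sampled kernels truncated at 4σ, with zero padding at the borders (so the image must be larger than the kernel); convolution flips the sign of odd-order responses relative to $\langle G_\sigma^{(m,n)} \mid I \rangle$, which does not matter here because odd-order responses enter $M$ only through squares; and the labels 0-6 follow the colour order pink, grey, black, white, blue, yellow, green, taking the last five to correspond to the remaining elements of $M$ in order.

```python
import numpy as np

COLOURS = ["pink", "grey", "black", "white", "blue", "yellow", "green"]


def dtg_kernel(order, sigma):
    """Sampled 1-D Gaussian derivative of order 0, 1 or 2, truncated at 4 sigma."""
    r = int(np.ceil(4 * sigma))
    x = np.arange(-r, r + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    if order == 0:
        return g
    if order == 1:
        return -x / sigma**2 * g
    return (x**2 / sigma**4 - 1 / sigma**2) * g


def dtg_filter(image, ox, oy, sigma):
    """Separable convolution: derivative order ox along x (columns), oy along y (rows)."""
    rows = np.apply_along_axis(np.convolve, 1, image, dtg_kernel(ox, sigma), mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, dtg_kernel(oy, sigma), mode="same")


def bifs(image, sigma=1.0, eps=0.05):
    """Per-pixel BIF label, an index into COLOURS."""
    image = np.asarray(image, dtype=float)
    # i) scale-normalised responses c_pq = sigma^(p+q) * filter response
    c = {(p, q): sigma ** (p + q) * dtg_filter(image, p, q, sigma)
         for p, q in [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]}
    # ii) lambda and gamma
    lam = 0.5 * (c[2, 0] + c[0, 2])
    gam = np.sqrt(0.25 * (c[2, 0] - c[0, 2]) ** 2 + c[1, 1] ** 2)
    # iii) index of the largest element of M, pixel by pixel
    scores = np.stack([
        eps * c[0, 0],
        np.hypot(c[1, 0], c[0, 1]),
        lam,
        -lam,
        (gam + lam) / np.sqrt(2),
        (gam - lam) / np.sqrt(2),
        gam,
    ])
    return np.argmax(scores, axis=0)
```

For example, on a 33×33 image, a constant bright region is labelled pink, a linear ramp grey, and the centre of an intensity maximum such as $-(x^2 + y^2)$ white.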
7 Example Applications Using BIFs
We summarize results on using BIFs for texture, object and pixel classification.
7.1 Texture Classification
Textures are often classified based on a representation of them by a histogram over a texton vocabulary [25-29]. Textons are categorical patch classifications [25, 30]. To define the texton vocabulary, a space of patch descriptions is typically Voronoi partitioned into on-the-order-of 1000 texton categories, usually around centres found by k-means clustering of the responses from many images. Textures are then classified by nearest-neighbour matching of histograms. We have investigated the classification performance of an approach in which images are labelled using spatial complexes of BIFs instead of Voronoi cells in a local description vector space. Our approach is (i) simpler, because we have eliminated the clustering step needed to produce a dictionary of features, and (ii) faster, because we assign image patches to histogram bins without having to perform a high-dimensional nearest-neighbour computation. By analogy with textons, we call the spatial complexes of BIFs that we use Basic Image Patterns (BIPs). The type of BIP that we have found effective for texture description is a scale-template of the BIFs at the same location but at four octave-separated scales. Unlike spatial-template BIPs, these scale-templates retain the rotation invariance of BIFs, which has been shown [30] to be advantageous in texture classification tasks. For textures we do not use the pink/flat BIF category, so four scales produce a $6^4 = 1296$ bin histogram representation, which seems to capture the right trade-off between specificity and generality (see Fig. 5).
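The column-BIP construction can be sketched as follows: with the flat class dropped, each pixel's four BIF labels (one per octave scale) form a base-6 number, which gives its histogram bin among the 1296 directly, with no nearest-neighbour search; textures are then compared by nearest-neighbour matching of normalised histograms. This minimal sketch assumes the six non-flat BIF labels are already coded 0-5 at each scale; all names are ours. We rank neighbours by the Bhattacharyya coefficient $\sum_k \sqrt{p_k q_k}$; the Bhattacharyya distance used in the experiments reported below is a monotone decreasing function of this coefficient, so the nearest neighbour is the same.

```python
import numpy as np

N_CLASSES = 6  # BIF classes other than flat, assumed coded 0..5
N_SCALES = 4   # octave-separated scales per column-BIP


def column_bip_histogram(labels_per_scale):
    """Normalised 6^4-bin histogram of column-BIPs from per-scale BIF label images."""
    code = np.zeros(np.shape(labels_per_scale[0]), dtype=np.int64)
    for labels in labels_per_scale:  # base-6 digit for each scale
        code = code * N_CLASSES + np.asarray(labels, dtype=np.int64)
    hist = np.bincount(code.ravel(), minlength=N_CLASSES**N_SCALES).astype(float)
    return hist / hist.sum()


def bhattacharyya_coefficient(p, q):
    """Overlap of two normalised histograms; 1 for identical histograms."""
    return float(np.sum(np.sqrt(p * q)))


def nearest_neighbour(query_hist, train_hists, train_classes):
    """Class of the training histogram with the largest Bhattacharyya coefficient."""
    scores = [bhattacharyya_coefficient(query_hist, h) for h in train_hists]
    return train_classes[int(np.argmax(scores))]
```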
Fig. 5. Left: An image from the CUReT ‘polyester’ texture class. Centre: BIFs computed at four octave-separated scales, stacked to form an array of ‘column-BIPs’. Right: Occurrence histogram of column-BIPs from every position in the image.
Our method has been tested on the CUReT texture dataset [31]. As reported in [9], the simple column-BIP representation with nearest-neighbour matching using the Bhattacharyya distance correctly classifies 98.2±0.1% of the remaining 49 images per class, which is at least as good as other methods using nearest-neighbour classifiers. Extending this method with a multi-scale histogram comparison [9] improves the result to 98.6±0.1% on CUReT, which is comparable to methods [27] that use SVMs for classification, and produces what are, to the best of our knowledge, the best reported results [9] on the more challenging UIUCTex [32] and KTH-TIPS [27] datasets, which include variations in scale.
7.2 Object Categorization
Texton approaches have also been shown to be useful for object categorization [28, 33]. As for texture, the ‘standard’ approach is to partition a patch descriptor space, such as that used by SIFT [34, 35], into on-the-order-of 1000 categories (visual words), then to describe each image to be analyzed by the visual words it contains, and to use machine learning techniques to determine a classifier that can predict the category of object based on such descriptions. We have conducted preliminary experiments to assess whether visual words built from BIFs could be used instead of SIFT-space categories. As with texture, this would be simpler and faster. For our initial experiments, we have labelled pixels according to their BIF type and, inspired by SIFT, with an orientation quantized in steps of π/4. The orientation depends on the BIF type: grey BIFs have one of eight possible orientations based on 1st order structure; yellow, green and blue BIFs have one of four possible orientations based on 2nd order structure; black, white and pink BIFs are unoriented. Thus we have twenty-three possible orientation-augmented BIF (oBIF) labels. oBIFs are a natural 2nd order generalization of the gradient orientation alphabet typically used in SIFT [34, 35]. See Fig. 6 for an example image and calculated oBIFs.
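The count of twenty-three labels is 8 (grey) + 3 × 4 (yellow, green, blue) + 3 (black, white, pink). The sketch below assigns oBIF indices under assumptions of our own, since the text does not give the orientation formulas: BIF labels 0-6 in the colour order pink, grey, black, white, blue, yellow, green; the grey orientation taken as the gradient direction atan2(c01, c10); and the 2nd order orientation taken as the Hessian principal-axis angle ½ atan2(2c11, c20 − c02), both rounded to the nearest multiple of π/4. The names are hypothetical.

```python
import numpy as np

# BIF labels in the colour order of the text (our coding):
# 0 pink, 1 grey, 2 black, 3 white, 4 blue, 5 yellow, 6 green
N_ORIENT = [1, 8, 1, 1, 4, 4, 4]                # orientations per BIF class
OFFSET = [sum(N_ORIENT[:k]) for k in range(7)]  # first oBIF index of each class
N_OBIF = sum(N_ORIENT)                          # 1 + 8 + 1 + 1 + 4 + 4 + 4 = 23


def obif_label(bif, c10, c01, c20, c11, c02):
    """Orientation-augmented BIF index in 0..22 (orientation rule assumed)."""
    step = np.pi / 4
    if N_ORIENT[bif] == 8:
        # grey: gradient direction, 8 bins covering 2*pi
        angle = np.arctan2(c01, c10) % (2 * np.pi)
        b = int(np.round(angle / step)) % 8
    elif N_ORIENT[bif] == 4:
        # blue/yellow/green: Hessian principal axis, 4 bins covering pi
        angle = (0.5 * np.arctan2(2 * c11, c20 - c02)) % np.pi
        b = int(np.round(angle / step)) % 4
    else:
        b = 0  # pink, black and white are unoriented
    return OFFSET[bif] + b
```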
Fig. 6. The top row (left) shows an image from the PASCAL challenge, labeled at right with direction-augmented BIFs. On the bottom are shown the 4×4 template BIPs whose occurrence in an image most informatively signals the presence of a car.
We have tested three different types of visual word, which, when built from BIFs or oBIFs, we call Basic Image Patterns (BIPs): two based on geometrical partitioning of patch space and one based on more standard data-driven quantization. Each BIP system has been used with simple, un-optimised, off-the-shelf classifiers and applied to the 20-class PASCAL VOC 2008 [33] object recognition challenge dataset. Our score in Fig. 7 is based on a late fusion of the three schemes and is mid-field: above other first-time entrants and below well-optimised veteran entries. Examples of 4×4 template BIPs from the PASCAL VOC 2008 [33] dataset whose presence in images is approximately independent, and which are maximally informative for the ‘car’ category, are shown in Fig. 6.

[Bar chart: one bar per challenge entry, value axis 0 to 50; entries SurreyUvA_SRKDA, UvA_TreeSFS, LEAR_shotgun, UvA_FullSFS, UvA_Soft5ColorSift, LEAR_flat, XRCE, TKK_ALL_SFBS, TKK_MAXVAL, BerlinFIRSTNikon, UCL, ECPLIAMA, CASIA_LinSVM, INRIASaclay_CMA, CASIA_NonLinSVM, INRIASaclay_MEVO, FIRST_SCST, FIRST_SC1C, CASIA_NeuralNet.]
Fig. 7. Results for the PASCAL VOC 2008 challenge. Each bar in the chart is a challenge entry; our result is highlighted.
7.3 Pixel Classification
Many image problems involve inferring one of a small class of labels for each pixel of an image. For complex images with unpredictable global structure, most approaches balance the likelihood of the labels, given the local image structure, against the likelihood of the local arrangement of inferred labels. In both cases the likelihoods are computed on the basis of statistics learnt from groundtruth-labelled training data. We have experimented with the use of BIFs in the computation of label likelihoods given the image, i.e. ignoring the likelihood of arrangements of inferred labels. For our experiments we have used 2-D Electron Microscopy images of neuronal grey matter tissue, stained to enhance neuronal membranes. We trained on four images with hand-drawn groundtruth data indicating the position of membranes, and evaluated on a further four images.
We use a k-Nearest Neighbour (k-NN) approach to classification. NN classification starts by compiling a list of descriptors of all the patches in the training data, together with the groundtruth label of the pixel in the patch centre. To classify a pixel, a patch around it is extracted and described, the description is compared to each of the compiled descriptions, the k most similar are found, and the pixel is assigned the label held by the majority of the k.
We evaluated a baseline solution based on pixel values. The distance between two patch descriptions is simply the Euclidean distance between their blurred pixel values, minimized over the eight configurations obtained by rotating or reflecting one of the patches. We jointly optimize blur, patch size and k. The best settings that we find are: no blur, 7×7 patches, and k = 14. At these settings, membrane-labelled pixels overlap (intersection divided by union) with the groundtruth by 48%.
Our solution uses a patch of BIF labels as a patch descriptor.
The distance between two descriptors is simply the number of pixels where the labels do not agree. As in the baseline, we minimize the distance over one of the patches being rotated or reflected. We jointly optimize the scale (σ) at which the BIFs are computed, the parameter ε, which controls the amount of the flat BIF class, the patch size and k. The best settings that we find are σ = 1.2, ε = 0.15, 9×9 patches, and k = 10. At these settings we achieve an overlap of 55%. See Fig. 8.

[Figure panels: image; groundtruth; greylevel-based classification; BIF-based classification; BIFs.]
Fig. 8. Typical results of our pixel-classification system
Using BIF-based rather than greylevel-based descriptions thus raises the score from 48% to 55%. Computation is also faster, because the k-NN lookup dominates the cost of computing patch descriptions, and with BIFs the distances that need to be computed are of a Hamming rather than a Euclidean type.
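The BIF-patch classifier can be sketched as follows: the descriptor distance is the number of disagreeing labels (a Hamming distance), minimised over the eight rotations and reflections of one patch, followed by a majority vote among the k nearest training patches. This is a minimal brute-force sketch with names of our choosing; the text does not say how the neighbour search or tied votes were handled.

```python
from collections import Counter

import numpy as np


def dihedral_views(patch):
    """The eight rotations and reflections of a square patch."""
    views = []
    for k in range(4):
        rotated = np.rot90(patch, k)
        views.append(rotated)
        views.append(np.fliplr(rotated))
    return views


def hamming_d4(p, q):
    """Number of positions with different labels, minimised over the 8 views of q."""
    return min(int(np.count_nonzero(p != v)) for v in dihedral_views(q))


def knn_label(query, train_patches, train_labels, k=10):
    """Majority label among the k training patches nearest to `query`."""
    dists = np.array([hamming_d4(query, t) for t in train_patches])
    nearest = np.argsort(dists, kind="stable")[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

Because the minimisation runs over all eight views, a training patch that is a rotated or mirrored copy of the query is at distance zero, which is the intended invariance.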
8 Conclusions
We have derived a scheme for classifying image structure into one of seven BIF types based on the outputs of a bank of six DtG filters. Applied to an entire image, the output realizes Marr’s notion of an image primal sketch. The presented results show that BIF description is simple, fast and effective for texture, object and pixel classification. The BIF system was derived by considering the sensitivity of DtG filters to image symmetry. Although the final algorithm is pleasingly simple, there are some weak points in the derivation of the BIFs from symmetry sensitivities. Specifically: why are only the 0-D SS considered? How exactly does orbifold distance correspond to the degree of failure of symmetry? Why should the least-approximate local symmetry be an effective feature label? We hope that the foundation of symmetry-sensitivity of DtGs can eventually answer all of these questions in a scheme where arbitrary choice has been eliminated. Such a scheme would be extendable to higher-order filter families (where appeal to visual evidence and past practice is less effective), for which a richer alphabet of feature labels is to be expected. We predict that such a richer alphabet will give more effective solutions in the application areas that we have reviewed.
Basic Image Features (BIFs) Arising from Approximate Symmetry Type
An Anisotropic Fourth-Order Partial Differential Equation for Noise Removal

Mohammad Reza Hajiaboli
Department of Electrical and Computer Engineering
Concordia University, Montreal, Canada
Abstract. Fourth-order nonlinear diffusion filters are isotropic filters in which the strength of diffusion is reduced in regions with strong image features, such as edges or texture, so that these features are preserved. However, the parameter choice in the numerical solver of these filters that minimizes the distortion of image features results in a very slow convergence rate and in the formation of speckle noise on the denoised image, especially when the noise level is moderately high. In this paper, a new fourth-order nonlinear diffusion filter is introduced that behaves anisotropically on the image features. It is shown that a suitable design of a set of diffusivity functions, which control the diffusion unevenly in the level-set and gradient directions, leads to a fast-converging filter with good edge preservation. A comparison of the results of the proposed filter with those of classical and recently developed techniques shows that the proposed method noticeably improves the quality of the denoised images, evaluated both subjectively and quantitatively, and substantially increases the convergence rate compared with the classical filter.
1 Introduction
Nonlinear diffusion denoising filters are known for their good edge preservation. In these techniques, the denoised image is the solution of a partial differential equation (PDE). The first method of this kind was introduced by Perona and Malik [1] in 1990, based on solving a nonlinear second-order PDE (the so-called Perona-Malik equation). Since then, a great deal of research in this field has led to a variety of nonlinear diffusion denoising techniques (see [2], [3] for a few examples). In spite of their good edge preservation, these methods tend to produce blocky effects in the images [4]. In fact, the solution of the Perona-Malik equation is piecewise constant, so these filters create blocky effects in the smooth regions of the image. A spatially regularized version of the nonlinear diffusion filter was introduced by Catte et al. [2] to reduce the formation of these artifacts on the denoised image. You and Kaveh [4] proposed a more effective solution to this problem by using a fourth-order PDE for noise removal, in which a planar approximation of the noisy image is supported in the solution of the PDE, resulting in a significant improvement of ramp edge preservation and a dramatic reduction of blocky effects. Based on this idea, a variety of fourth-order PDE-based denoising techniques have been developed, such as the filters given in [5], [6] and [7]. However, fourth-order diffusion filters damp high spatial frequency components (i.e. noise and step edges) much faster than second-order ones [5]. This can distort step edges during the evolution of the denoising process, especially when the smoothing strength of the filter on the detected edges is not effectively reduced by a diffusivity function. Setting a small threshold value in the diffusivity function substantially reduces the diffusivity on the edges, at the expense of a very slow convergence rate, as reported in [4] and [5]. All of the previously mentioned techniques belong to a class of diffusion-based denoising filters known as isotropic nonlinear diffusion methods: the amount of diffusion set by the diffusivity function is applied in the different regions of the image regardless of the direction of the image features. To improve edge preservation, another class of diffusion-based denoising techniques has emerged, in which the diffusion is adapted to the direction of the local image features [8], [9], [10]. In these filters, the diffusion strength is minimized in the direction perpendicular to the local features and maximized along them. However, these techniques have been developed in the context of second-order diffusion filters. In this paper, an anisotropic fourth-order diffusion filter is proposed in which the diffusion strength is adjusted according to the direction of the local features.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 356–367, 2009. © Springer-Verlag Berlin Heidelberg 2009

An Anisotropic Fourth-Order Diffusion Filter
Two different diffusivity functions are designed to strongly reduce the diffusion perpendicular to the feature orientation, while allowing the diffusion parallel to the edge orientation and in smooth regions to proceed with normal strength. A comparison of the results of the proposed filter with those of classical and newly developed filters reveals a noticeable improvement in the quality of the denoised images, evaluated subjectively and quantitatively, as well as a substantial increase in the convergence rate compared with the classical filter.
2 A Brief Review

2.1 From Second to Fourth-Order Filters
Nonlinear diffusion filters are evolutionary processes. The fundamental PDE of the nonlinear diffusion filter introduced by Perona and Malik [1] is

$$\frac{\partial u}{\partial t} = \operatorname{div}\big(c(\|\nabla u\|)\,\nabla u\big), \tag{1}$$

where u is the image intensity function, c(·) is a diffusivity function that determines the diffusion coefficient, and t is the evolution time. The symbols div(·) and ‖·‖ denote the divergence and the Euclidean norm, respectively. The diffusivity function is a positive, non-increasing function of ‖∇u‖.

M.R. Hajiaboli

One of the diffusivity functions defined by Perona and Malik is

$$c(\|\nabla u\|) = \frac{k^2}{k^2 + \|\nabla u\|^2}, \tag{2}$$
where k is the so-called contrast parameter. You et al. [11] carried out a detailed analysis showing that the solution of (1) corresponds to the minimization of an energy functional. If the diffusivity function (2) is used, the energy functional is

$$R(u) = \int_\Omega \frac{k^2}{2}\,\ln\!\big(k^2 + \|\nabla u\|^2\big)\,dx\,dy, \tag{3}$$
where Ω is the region of support of u. R(u) is minimized when ‖∇u‖² is minimal, which leads to a piecewise constant approximation of u. Therefore, the formation of staircase artifacts on ramp edges is unavoidable. To resolve this problem, You and Kaveh [4] introduced a fourth-order PDE-based denoising method in which the denoised image is obtained by minimizing the potential function

$$E(u) = \int_\Omega f\big(|\nabla^2 u|\big)\,dx\,dy, \tag{4}$$

where $f'(s) = s\,c(s)$ and $|\nabla^2 u|$ is the absolute value of the Laplacian of u. Therefore, for the diffusivity function (2), E(u) takes the form

$$E(u) = \int_\Omega \frac{k^2}{2}\,\ln\!\big(k^2 + |\nabla^2 u|^2\big)\,dx\,dy, \tag{5}$$
meaning that E(u) is minimized when |∇²u| is minimal. Therefore, the ramp regions of u (i.e. the regions where ∇²u = 0) are preserved in the solution of the associated fourth-order PDE. Deriving the Euler-Lagrange equation of (4) and applying gradient descent gives

$$\frac{\partial u}{\partial t} = -\nabla^2\big[c(|\nabla^2 u|)\,\nabla^2 u\big]. \tag{6}$$

Approximating ∂u/∂t with the forward Euler scheme, the numerical solver of (6) is

$$u^{n+1} = u^n - dt\,\nabla^2\big[c(|\nabla^2 u^n|)\,\nabla^2 u^n\big], \qquad u^0 = u_0, \quad n = 0, 1, \dots, N, \tag{7}$$
where n is the iteration index, dt is the time step and u_0 is the noisy image. To protect the edges from over-smoothing, this iterative process must be stopped after a certain number of iterations, denoted by N. Besides these nonlinear diffusion filters, another class of techniques, known as regularization techniques and also based on solving nonlinear PDEs, has been widely used for image restoration. The classical paper of Rudin, Osher and Fatemi [12] introduced one of the first filters of this kind, in which the PDE to be solved is of second order. The same problem of staircase formation in the ramp regions of the image has therefore motivated researchers to introduce new regularization techniques based on higher-order PDEs, such as [13], [14]. However, the focus of this paper is on diffusion-based techniques, as reviewed above.

2.2 Edge Preservation and Convergence Rate
Despite the significant reduction of blocky effects on the denoised image achieved by (6), the parameter setting of the numerical solver (7) that best preserves edges leads to a very slow convergence rate, especially when the level of contaminating noise is moderately high. A recently developed technique known as the fourth-order hybrid model [6] uses a relaxed median filter [15] to improve the quality of the denoised image when the observed image is heavily contaminated by noise. The numerical model of this filter is

$$u^{n+1} = RM_{\alpha}^{\omega}\Big(u^n - dt\,\nabla^2\big[c(|\nabla^2 u^n|)\,\nabla^2 u^n\big]\Big), \tag{8}$$

where RM denotes the relaxed median filter with lower bound α and upper bound ω. This filtering process needs fewer iterations to produce a denoised image. However, the denoised image is strongly affected by the relaxed median filter, and the main advantage of fourth-order diffusion filters (i.e. ramp edge preservation) is hindered, as shown later. Moreover, the computational burden per iteration is much higher than that of the You and Kaveh filter (about five times higher in Table 1). Another recently introduced technique [7] demonstrates a significant improvement in the convergence rate along with good ramp edge preservation. In this technique, the diffusivity function of the You and Kaveh filter, c(|∇²u|), is replaced by c(‖∇u‖), and the PDE of the filter is

$$\frac{\partial u}{\partial t} = -\nabla^2\big[c(\|\nabla u\|)\,\nabla^2 u\big]. \tag{9}$$

Although the energy functional of (9) does not have a closed form, the filter still supports the planar approximation of the image: its ramp edge preservation comes from the fact that ∂u/∂t → 0 when ∇²u → 0. However, since |∇²u| ≥ ‖∇u‖ on step edges, the diffusivity c(|∇²u|) gives a smaller diffusion coefficient on step edges than c(‖∇u‖). Therefore, in spite of the good convergence rate of (9), step edges still suffer more distortion than with the classical method.
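The explicit updates (7) and (8) and a forward-Euler discretization of (9) differ only in the argument of the diffusivity and in the optional median step. The NumPy sketch below illustrates them under our own assumptions, not the author's implementation: a 5-point Laplacian with edge-replicated borders, central differences for ∇u (via `np.gradient`), and a 3×3 relaxed median that keeps a pixel if it lies between the α-th and ω-th order statistics of its window and otherwise replaces it with the median, which is our reading of [15]. All function names are hypothetical.

```python
import numpy as np


def laplacian(u):
    """5-point discrete Laplacian; borders handled by edge replication (assumption)."""
    p = np.pad(u, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u


def c_pm(s, k):
    """Perona-Malik diffusivity (2): k^2 / (k^2 + s^2), bounded in (0, 1]."""
    return k**2 / (k**2 + s**2)


def fourth_order_step(u, dt, k, variant="you_kaveh"):
    """One explicit step of (7) (s = |lap u|) or of the Euler-discretized (9) (s = ||grad u||)."""
    lap = laplacian(u)
    if variant == "you_kaveh":
        s = np.abs(lap)
    else:
        gy, gx = np.gradient(u)  # central differences along rows (y) and columns (x)
        s = np.hypot(gx, gy)
    return u - dt * laplacian(c_pm(s, k) * lap)


def relaxed_median(u, alpha=3, omega=5):
    """3x3 relaxed median: keep u if it lies in [x_(alpha), x_(omega)], else use the median."""
    h, w = u.shape
    p = np.pad(u, 1, mode="edge")
    # Stack the 9 neighbours of every pixel along a new first axis, then sort them.
    win = np.sort(np.stack([p[i:i + h, j:j + w] for i in range(3) for j in range(3)]), axis=0)
    lo, hi, med = win[alpha - 1], win[omega - 1], win[4]  # order statistics are 1-based
    return np.where((u >= lo) & (u <= hi), u, med)


def hybrid_step(u, dt, k, alpha=3, omega=5):
    """One step of (8): the relaxed median applied to an explicit step of (7)."""
    return relaxed_median(fourth_order_step(u, dt, k, "you_kaveh"), alpha, omega)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.add.outer(np.arange(64.0), np.arange(64.0))  # a planar ramp
    noisy = clean + rng.normal(0.0, 15.0, clean.shape)
    u = noisy.copy()
    for _ in range(100):
        u = fourth_order_step(u, dt=0.031, k=7.0, variant="gradient")
    print("noisy RMSE:", np.sqrt(np.mean((noisy - clean) ** 2)))
    print("denoised RMSE:", np.sqrt(np.mean((u - clean) ** 2)))
```

Because both fourth-order updates act through ∇²u, a planar ramp is left unchanged away from the border, which is the planar-approximation property discussed above.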
2.3 Anisotropic Diffusion Filters
So-called anisotropic diffusion filters are schemes in which the diffusion rate is controlled according to the direction of local features, such as those introduced in [8], [9] and [10]. The coherence-enhancing diffusion filter [9] is one of this kind: the scalar diffusion coefficient in (1) is replaced by a diffusion tensor that reduces the diffusivity of the filter perpendicular to the orientation of the local features, while letting strong diffusion proceed in the direction of the level set. Another anisotropic filter, introduced by Carmona and Zhong [10], uses scalar diffusivity functions to perform anisotropic diffusion. The PDE of this filter is

$$\frac{\partial u}{\partial t} = c_1\,(c_2\,u_{\eta\eta} + c_3\,u_{\xi\xi}), \tag{10}$$

where c1, c2 and c3 are different diffusivity functions, and uηη and uξξ are second-order directional derivatives: η denotes the direction perpendicular to the orientation of the feature (the gradient direction), and ξ denotes the direction of the contour, or level set. All of these techniques belong to the class of second-order diffusion filters. Some techniques, such as [16] for surface smoothing by anisotropic diffusion of the surface normals, or its variant for image denoising [17], can be considered fourth-order anisotropic filters; however, they are two-phase filters: in the first phase, an anisotropic filter is applied to the normal map of the surface or image, and in the second phase, a surface is fitted to the processed normals. In Section 3, a new fourth-order anisotropic diffusion filter is proposed, which is a single-phase filter and can be seen as a generalization of the classical fourth-order nonlinear diffusion filter of You and Kaveh.
3 The Proposed Model

3.1 Diffusion Equation
The previously mentioned fourth-order diffusion filters are isotropic: the extent of diffusion is controlled by the diffusivity function regardless of the orientation of the edges. Their only anisotropic behavior comes from the anisotropic response of the discrete Laplacian operator; most discrete Laplacian operators respond anisotropically to edges with respect to the x and y axes (i.e. the Cartesian coordinates) [18]. To obtain a genuinely anisotropic fourth-order diffusion filter, one should instead consider the second-order directional derivatives of the image. The two orthonormal vectors η and ξ, pointing in the direction of the gradient and of the level set respectively, are

$$\eta = \frac{[u_x \;\; u_y]}{\sqrt{u_x^2 + u_y^2}} \quad\text{and}\quad \xi = \frac{[-u_y \;\; u_x]}{\sqrt{u_x^2 + u_y^2}}. \tag{11}$$
Based on the definition (11), the second-order derivatives of the image in the gradient and level-set directions are

$$u_{\eta\eta} = \frac{u_{xx}u_x^2 + 2u_x u_y u_{xy} + u_{yy}u_y^2}{u_x^2 + u_y^2} \tag{12}$$

and

$$u_{\xi\xi} = \frac{u_{xx}u_y^2 - 2u_x u_y u_{xy} + u_{yy}u_x^2}{u_x^2 + u_y^2}. \tag{13}$$
Adding (12) and (13) shows that the sum of these two directional derivatives equals the Laplacian of the image:

$$\nabla^2 u = u_{xx} + u_{yy} = u_{\xi\xi} + u_{\eta\eta}. \tag{14}$$
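Formulas (12) to (14) can be checked numerically. The sketch below is our own illustration, not the paper's discretization: it estimates derivatives with NumPy central differences (`np.gradient`), symmetrizes the mixed derivative, and adds a small `eps` (an assumption) to the denominator so that flat regions, where (11) is undefined, give zero instead of 0/0. The function name is hypothetical.

```python
import numpy as np


def directional_second_derivatives(u, h=1.0, eps=1e-12):
    """Return (u_eta_eta, u_xi_xi, u_xx + u_yy); axis 0 is y, axis 1 is x."""
    uy, ux = np.gradient(u, h)
    uyy, uyx = np.gradient(uy, h)
    uxy, uxx = np.gradient(ux, h)
    uxy = 0.5 * (uxy + uyx)  # symmetrize the mixed derivative
    g2 = ux**2 + uy**2 + eps  # squared gradient norm; eps avoids 0/0 in flat regions
    u_eta = (uxx * ux**2 + 2.0 * ux * uy * uxy + uyy * uy**2) / g2  # along eta, (12)
    u_xi = (uxx * uy**2 - 2.0 * ux * uy * uxy + uyy * ux**2) / g2  # along xi, (13)
    return u_eta, u_xi, uxx + uyy


if __name__ == "__main__":
    h = 0.05
    x = np.arange(1.0, 2.0 + h / 2, h)
    X, Y = np.meshgrid(x, x)  # X varies along axis 1, Y along axis 0
    u = X**2 + 3.0 * X * Y + Y**3 + 5.0 * X
    u_eta, u_xi, lap = directional_second_derivatives(u, h)
    # Identity (14): u_eta_eta + u_xi_xi equals the discrete Laplacian u_xx + u_yy.
    print(np.max(np.abs(u_eta + u_xi - lap)))
```

For u = x², the gradient points along x, so uηη reduces to uxx = 2 and uξξ to uyy = 0, as expected from (12) and (13).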
Therefore, the proposed fourth-order diffusion equation, which is a generalization of (6), can be written as

$$\frac{\partial u}{\partial t} = -\nabla^2\big[c_1\,(c_2\,u_{\eta\eta} + c_3\,u_{\xi\xi})\big]. \tag{15}$$
In the proposed model, c1, c2 and c3 are diffusivity functions: c1 controls the total amount of diffusion, while c2 and c3 control the uneven diffusion in the directions η and ξ. Choosing c2 = c3 and c1 c2 = c recovers the nonlinear diffusion filter (6) or (9), depending on the definition of c, since by (14) the bracket in (15) then reduces to c ∇²u. The criteria for a suitable choice of these diffusivity functions are discussed next.

3.2 Diffusivity Functions
Various diffusivity functions have been introduced in the context of nonlinear diffusion denoising, and the behavior of the filter varies with this choice. The most commonly used diffusivity function in fourth-order diffusion filters is (2), written as c(s), where s is the modulus of a derivative of the image (s = |∇²u| in (6) and s = ‖∇u‖ in (9)). Regardless of the choice of s, this diffusivity function is bounded in (0, 1]. A suitable, computationally cheap choice of the diffusivity functions in the proposed model is

$$c_1(s) = c_2(s) = c(\|\nabla u\|) \quad\text{and}\quad c_3 = 1. \tag{16}$$
As in (9), s in the function c1 is the modulus of the gradient of u, which leads to a fast convergence rate, and setting c2 = c1 keeps the overall computational cost of the filter low, since only one diffusivity has to be evaluated. The proposed model (15) can therefore be rewritten as

$$\frac{\partial u}{\partial t} = -\nabla^2\big[c(\|\nabla u\|)^2\,u_{\eta\eta} + c(\|\nabla u\|)\,u_{\xi\xi}\big]. \tag{17}$$

Since c is bounded in (0, 1], c² ≤ c, so the overall diffusivity in the η direction is smaller than in the ξ direction wherever c < 1. Before the comparative results of the next section are presented, Fig. 1 compares the proposed filter with the second-order Perona-Malik filter and illustrates its ability to preserve ramp edges. In fact, like (6) and (9), the proposed filter supports the planar approximation of the image, since in planar regions uηη → 0 and uξξ → 0, which leads to ∂u/∂t → 0.
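A minimal forward-Euler sketch of (17), in the spirit of (7), is given below. It is our own illustration under stated assumptions: central differences (`np.gradient`) for the derivatives in (12) and (13), a 5-point Laplacian with edge-replicated borders for the outer ∇², c from (2), the parameters k = 7 and dt = 0.031 quoted in Section 4 as defaults, and a small `eps` in the denominators. `proposed_step` and `denoise` are hypothetical names.

```python
import numpy as np


def laplacian(u):
    """5-point Laplacian with edge-replicated borders (assumed discretization)."""
    p = np.pad(u, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u


def proposed_step(u, dt=0.031, k=7.0, eps=1e-12):
    """One explicit step of (17): u <- u - dt * lap(c^2 u_eta_eta + c u_xi_xi)."""
    uy, ux = np.gradient(u)
    uyy, uyx = np.gradient(uy)
    uxy, uxx = np.gradient(ux)
    uxy = 0.5 * (uxy + uyx)  # symmetrized mixed derivative
    g2 = ux**2 + uy**2
    c = k**2 / (k**2 + g2)  # c(||grad u||) from (2)
    u_eta = (uxx * ux**2 + 2.0 * ux * uy * uxy + uyy * uy**2) / (g2 + eps)
    u_xi = (uxx * uy**2 - 2.0 * ux * uy * uxy + uyy * ux**2) / (g2 + eps)
    # c^2 <= c, so diffusion across edges (eta) is weaker than along them (xi).
    return u - dt * laplacian(c**2 * u_eta + c * u_xi)


def denoise(u0, n_iter, **kw):
    """Run n_iter explicit steps; the stopping index N must be chosen by the user."""
    u = np.asarray(u0, dtype=float).copy()
    for _ in range(n_iter):
        u = proposed_step(u, **kw)
    return u


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    clean = np.zeros((64, 64))
    clean[:, 32:] = 100.0  # a vertical step edge
    noisy = clean + rng.normal(0.0, 15.0, clean.shape)
    out = denoise(noisy, 80)
    print("RMSE noisy:", np.sqrt(np.mean((noisy - clean) ** 2)))
    print("RMSE denoised:", np.sqrt(np.mean((out - clean) ** 2)))
```

As with (7), the number of iterations acts as the stopping time N; in practice it would be tuned, for instance by monitoring SNR as in Section 4.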
[Figure: three image panels (a), (b), (c)]
Fig. 1. Comparison of the results obtained by a second-order filter and the proposed filter: (a) noisy image, (b) image denoised by the Perona and Malik filter, (c) image denoised by the proposed filter
3.3 Inverse Diffusion
The classical fourth-order filter of You and Kaveh [4] in (6) is a well-posed process because its potential function (5) is positive with a global minimum. Deriving the potential function of the proposed filter (17), on the other hand, is not as simple as for (6). However, to demonstrate that the unevenly weighted sum of uηη and uξξ may lead to inverse diffusion, it is sufficient to show that, at least on a sub-region of u,

$$\operatorname{sign}\big(c(\|\nabla u\|)^2\,u_{\eta\eta} + c(\|\nabla u\|)\,u_{\xi\xi}\big) \neq \operatorname{sign}\big(\nabla^2 u\big). \tag{18}$$

In this case, the flow (17) performs an inverse diffusion, which results in edge enhancement. The imbalance between the coefficients of uηη and uξξ is largest when c(‖∇u‖) = 1/2. In this case, the linear version of (17) can be written as

$$\frac{\partial u}{\partial t} = -\nabla^2\Big(\frac{u_{\eta\eta}}{4} + \frac{u_{\xi\xi}}{2}\Big) = -\nabla^2\Big(\frac{u_{\eta\eta}}{4} + \frac{u_{\xi\xi}}{4} + \frac{u_{\xi\xi}}{4}\Big) = -\nabla^2\Big(\frac{\nabla^2 u}{4} + \frac{u_{\xi\xi}}{4}\Big). \tag{19}$$

Since (6) has a positive potential function, if sign(∇²u/4 + uξξ/4) = sign(∇²u), the filter (19) also has a positive potential function. This is guaranteed wherever |∇²u| > |uξξ|, but that condition does not hold throughout the whole image. The example in Fig. 2 demonstrates that the linear diffusion equation (19) performs an inverse diffusion on the edges. The signal in Fig. 2(b) is the intensity profile extracted from the middle row of the standard test image "disk" in Fig. 2(a). The signal in Fig. 2(c) is the same intensity profile after the image has been filtered by (19). In this case, the inverse diffusion leads to edge enhancement. However, when the filter is run in the nonlinear form proposed in (17), the profile in Fig. 2(d) shows that this uplifting of the edges is dramatically reduced. In other words, in general image-denoising applications, the inverse diffusion in the proposed filter does not lead to instability of the filter or to the formation of ringing artifacts around the edges.

Fig. 2. Inverse diffusion as a result of the uneven diffusion in the directions of η and ξ: (a) original image "disk"; (b), (c) and (d) intensity profiles at the middle row of the original image, of the image diffused by (19), and of the image diffused by the proposed filter (17)
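The statement that the imbalance between the coefficients of uηη and uξξ in (17) is largest at c = 1/2 can be made explicit if, as we assume here, the imbalance is measured by the difference c − c² between the coefficient c of uξξ and the coefficient c² of uηη:

$$\frac{d}{dc}\big(c - c^2\big) = 1 - 2c = 0 \;\Longrightarrow\; c = \tfrac{1}{2}, \qquad \max_{0 < c \le 1}\big(c - c^2\big) = \tfrac{1}{4},$$

and it is a maximum because the second derivative is −2 < 0. Substituting c = 1/2 into (17) gives the coefficients c² = 1/4 for uηη and c = 1/2 for uξξ, which are exactly those in the first expression of (19).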
4 Comparative Results
In this section, the proposed method is compared with other fourth-order nonlinear diffusion filters. The following filters are compared:

1. The proposed filter (17) with k = 7 and dt = 0.031 (the time step that provides data-independent stability of the numerical solver [7]).
2. The filter (7) introduced by You and Kaveh [4], with the suggested parameter setting dt = 0.25 and k = 1.
3. The filter (8) introduced in [6], with the suggested parameter setting dt = 0.1, k = 3, α = 3 and ω = 5.
4. The self-governing filter (9) introduced in [7]. In this filter, the Perona and Malik diffusivity function c(s) is used with s = ‖∇u‖, the contrast parameter k is estimated by the histogram-based mechanism used in [1], and dt = 0.031.

Three test images, "Pepper", "Cameraman" and "House", were corrupted by additive white Gaussian noise with a standard deviation of 15. Table 1 gives an objective comparison of these filters in terms of the signal-to-noise ratio (SNR) of the denoised image and their computational cost.
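Table 1. Quantitative comparison of the results

| Test image (noisy SNR) | Method | Denoised SNR (dB) | Num. of iter. | CPU/iter. (s) | Convergence (s) |
|---|---|---|---|---|---|
| Pepper (10.98 dB) | Proposed | 17.84 | 80 | 0.038 | 3.04 |
| Pepper (10.98 dB) | (9) | 17.32 | 14 | 0.080 | 1.12 |
| Pepper (10.98 dB) | (7) | 15.83 | 3133 | 0.031 | 97.12 |
| Pepper (10.98 dB) | (8) | 15.21 | 2 | 0.155 | 0.31 |
| Cameraman (12.38 dB) | Proposed | 17.08 | 35 | 0.038 | 1.33 |
| Cameraman (12.38 dB) | (9) | 16.83 | 6 | 0.082 | 0.492 |
| Cameraman (12.38 dB) | (7) | 16.59 | 3015 | 0.031 | 93.46 |
| Cameraman (12.38 dB) | (8) | 13.59 | 1 | 0.160 | 0.16 |
| House (9.68 dB) | Proposed | 17.44 | 89 | 0.038 | 3.382 |
| House (9.68 dB) | (9) | 17.08 | 36 | 0.081 | 2.916 |
| House (9.68 dB) | (7) | 15.80 | 3907 | 0.031 | 121.12 |
| House (9.68 dB) | (8) | 15.39 | 2 | 0.160 | 0.32 |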
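Because the total convergence time is defined in the text as the product of the CPU time per iteration and the number of iterations, the last column of Table 1 can be recomputed from the other two. The short Python check below uses only values transcribed from the table (the names `TABLE_1`, `convergence_time` and `recompute` are ours); the recomputed times agree with the reported ones to within rounding.

```python
# Values transcribed from Table 1:
# (method, iterations, CPU time per iteration [s], reported convergence time [s]).
TABLE_1 = {
    "Pepper": [("Proposed", 80, 0.038, 3.04), ("(9)", 14, 0.080, 1.12),
               ("(7)", 3133, 0.031, 97.12), ("(8)", 2, 0.155, 0.31)],
    "Cameraman": [("Proposed", 35, 0.038, 1.33), ("(9)", 6, 0.082, 0.492),
                  ("(7)", 3015, 0.031, 93.46), ("(8)", 1, 0.160, 0.16)],
    "House": [("Proposed", 89, 0.038, 3.382), ("(9)", 36, 0.081, 2.916),
              ("(7)", 3907, 0.031, 121.12), ("(8)", 2, 0.160, 0.32)],
}


def convergence_time(iterations, cpu_per_iteration):
    """Total convergence time = number of iterations x CPU time per iteration."""
    return iterations * cpu_per_iteration


def recompute(table):
    """Yield (image, method, recomputed time, reported time) for every row."""
    for image, rows in table.items():
        for method, n_iter, cpu, reported in rows:
            yield image, method, convergence_time(n_iter, cpu), reported


if __name__ == "__main__":
    for image, method, t, reported in recompute(TABLE_1):
        print(f"{image:<10} {method:<9} {t:8.3f} s  (Table 1: {reported} s)")
    # How many times longer the classical filter (7) takes than the proposed filter.
    for image, rows in TABLE_1.items():
        t = {m: convergence_time(n, c) for m, n, c, _ in rows}
        print(f"{image}: (7) takes {t['(7)'] / t['Proposed']:.1f}x longer than the proposed filter")
```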
[Plot: SNR (dB) versus number of iterations (logarithmic axis) for filter (7), filter (8), filter (9) and the proposed filter (17)]
Fig. 3. Comparing the convergence rate of the filters for denoising of test image "House"
The results show that the proposed method consistently produces the denoised image with the highest SNR. Note that the results are obtained at the optimal number of iterations, i.e. the iteration at which each filter reaches its maximum SNR during its evolution. If the iterative filtering continues beyond this point, the SNR of the denoised image decreases because the edges are over-smoothed. Another important feature of the proposed method is its fast convergence. As shown in Fig. 3 for the test image "House", the proposed method converges much faster than the filter of You and Kaveh. The computational burden of each filter is measured as the CPU time per iteration, with all filters processing the same image on the same computer; the total convergence time is therefore the product of the CPU time per iteration and the number of iterations. The relaxed median regularized filter converges faster than the proposed method; however, its maximum SNR is significantly lower than that of the other methods, and its SNR also decays very quickly as the edges become over-smoothed. The total computation time of the proposed filter is higher than that of (9), because it needs more iterations even though each iteration is cheaper (Table 1); however, the higher SNR obtained by the proposed filter justifies this additional cost. Fig. 4 compares the perceptual quality of the image denoised by the proposed method with that of the other methods. The first row shows the whole images and the second row a magnified portion; the image pairs are labeled (a) to (f). Images (a) and (b) are the noiseless and noisy images. Fig. 4(c) shows the image denoised by the You and Kaveh filter, in which some speckle noise is visible. This drawback is known and discussed in [4]; it results from choosing a small value of k in the diffusivity function, a setting that is nevertheless necessary to protect the edges from over-smoothing. Fig. 4(d) shows the image denoised by the relaxed median regularized filter (8); it is blurred, and some staircase artifacts have formed in smooth regions of the image. The next image, Fig. 4(e), is the result of the filter (9), whose denoising and edge preservation are noticeably better than those of the filters (7) and (8). However, comparing this result with the one obtained by the proposed filter in Fig. 4(f) shows that the proposed filter preserves edges noticeably better.

[Figure: two rows of image panels (a) to (f), showing whole images and magnified portions]
Fig. 4. Comparing the perceptual quality of the results. The image pairs labeled (a) to (f) are: (a) noiseless image, (b) noisy image, (c) image denoised using (7), (d) image denoised using (8), (e) image denoised using (9), (f) image denoised using the proposed filter (17).
5 Conclusion
An anisotropic fourth-order PDE for noise removal has been proposed. A brief theoretical review of second- and fourth-order diffusion denoising filters has been presented, highlighting the fact that previously developed fourth-order filters are isotropic: the extent of edge preservation is controlled by reducing the diffusivity near edges regardless of their orientation. A major challenge in these filters is that the choice of model parameters that gives good edge preservation leads to a dramatically slow convergence rate. In the proposed filter, by contrast, the diffusion strength is adjusted with respect to the direction of the local features. Two different diffusivity functions have been designed to strongly reduce the diffusion perpendicular to the feature orientation (i.e. in the gradient direction), while letting the diffusion parallel to the edge orientation (i.e. in the level-set direction) proceed at normal speed. The proposed filter therefore reduces uncorrelated noise faster and converges faster overall, while preserving edges well thanks to the reduced diffusivity in the gradient direction. The comparison with classical and newly developed filters has shown that the proposed method noticeably improves the quality of the denoised images, evaluated subjectively and quantitatively, and substantially increases the convergence rate compared with the classical filter.
References

1. Perona, P., Malik, J.: Scale-space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 12(7), 629–639 (1990)
2. Catte, F., et al.: Image selective smoothing and edge detection by nonlinear diffusion. SIAM J. Numer. Anal. 29(1), 182–193 (1992)
3. Black, M.J., et al.: Robust anisotropic diffusion. IEEE Transactions on Image Processing 7(3), 421–432 (1998)
4. You, Y.L., Kaveh, M.: Fourth-order partial differential equations for noise removal. IEEE Transactions on Image Processing 9(10), 1723–1730 (2000)
5. Lysaker, M., Lundervold, A., Tai, X.-C.: Noise removal using fourth-order partial differential equation with applications to medical magnetic resonance images in space and time. IEEE Transactions on Image Processing 12(12), 1579–1590 (2003)
6. Rajan, J., Kannan, K., Kaimal, M.R.: An improved hybrid model for molecular image denoising. Journal of Mathematical Imaging and Vision 31, 73–79 (2008)
7. Hajiaboli, M.R.: A self-governing hybrid model for noise removal. In: Wada, T., Huang, F., Lin, S. (eds.) PSIVT 2009. LNCS, vol. 5414, pp. 295–305. Springer, Heidelberg (2009)
8. Weickert, J.: Anisotropic Diffusion in Image Processing. B. G. Teubner (1998)
9. Weickert, J.: Coherence-enhancing diffusion filtering. International Journal of Computer Vision 31(2-3), 111–127 (1998)
10. Carmona, R.A., Zhong, S.: Adaptive smoothing respecting feature directions. IEEE Transactions on Image Processing 7(3), 353–358 (1998)
11. You, Y.-L., et al.: Behavioral analysis of anisotropic diffusion in image processing. IEEE Transactions on Image Processing 5, 1539–1553 (1996)
12. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
13. Chan, T., Marquina, A., Mulet, R.: High order total variation-based image restoration. SIAM J. on Scientific Computing 22(2), 503–516 (2000)
14. Fang, L., et al.: Image restoration combining a total variational filter and a fourth-order filter. Journal of Visual Communication and Image Representation 18(4), 322–330 (2007)
15. Hamza, A.B., et al.: Removing noise and preserving details with relaxed median filters. Journal of Mathematical Imaging and Vision 11(2), 161–177 (1999)
16. Tasdizen, T., et al.: Geometric surface smoothing via anisotropic diffusion of normals. IEEE Visualization 1(1), 125–132 (2002)
17. Lysaker, M., Osher, S., Tai, X.-C.: Noise removal using smoothed normals and surface fitting. IEEE Transactions on Image Processing 13(10), 1345–1357 (2004)
18. Kamgar-Parsi, B., Rosenfeld, A.: Optimally isotropic Laplacian operator. IEEE Transactions on Image Processing 8(10), 1467–1472 (1999)
Enhancement of Blurred and Noisy Images Based on an Original Variant of the Total Variation

Khalid Jalalzai and Antonin Chambolle
Centre de Mathématiques Appliquées (CMAP), École Polytechnique, 91128 Palaiseau Cedex, France
Abstract. In this paper, we introduce a new variant of the total variation (TV). Its purpose is to simplify TV-based restoration when the image is degraded by a kernel that is easily computed in the Fourier domain (blur, Radon transform, ...). We replace the TV term by a mere L1 norm of some field, for which the optimization is much easier. This approach allows us to use a recent, fast algorithm to enhance, in particular, blurred and noisy images. We also compare our approach with standard TV-based denoising and show that it avoids the well-known staircasing effect.
1 Introduction
In 1992, Rudin, Osher and Fatemi (ROF) introduced the total variation in their founding article [13] as a regularizing criterion for inverse problems in imaging. This has been fruitful in image restoration, since it can regularize images without smoothing the edges. One approach to minimizing the ROF problem is the generic forward-backward splitting method, studied for instance by Combettes and Wajs [3]. It minimizes ϕ + ψ, where ϕ and ψ are both convex functions with certain regularity properties. Usually in signal restoration, given a signal u, ϕ(u) is the so-called data fidelity term, equal to $\frac{1}{2}\|Au - g\|^2$, where g is a noisy signal that has also undergone a linear perturbation A. The second term ψ usually reflects a priori knowledge about the signal to be recovered. When ψ = TV, as in the ROF problem, the main drawback is that its proximal operator prox_TV is usually difficult to compute, even with a small error (see Moreau [9] or Combettes and Wajs [3] for more on this). What we propose in this article is therefore a variant of the ROF problem in which ψ is simply the L1 norm of some field p. This new term preserves the nice properties of the original total variation. Its relevance comes from the fact that its proximal operator is easy to compute, which opens the way to compressed sensing-type algorithms (see, for instance, Nesterov [12] or Beck and Teboulle [2]).

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 368–376, 2009. © Springer-Verlag Berlin Heidelberg 2009
Enhancement of Blurred and Noisy Images
However, our idea differs from the augmented Lagrangian method (see Tai and Wu [15]) and the split Bregman method (see Goldstein and Osher [6]), in which the field $p$ must satisfy, at convergence (sometimes only approximately), the constraint $p = \nabla u$; in our approach, $p$ may be quite far from being a gradient.
2
Few Notations
From now on, an image $u$ is represented by an $n\times n$ matrix with real entries, i.e. an element of $X = \mathbb{R}^{n\times n}$. To simplify matters in the sequel, especially when we consider the discrete Fourier transform of $u$, we assume that the image $u$ is periodic, i.e. defined for all $k, l \in \mathbb{Z}$ by $u_{i+kn,\,j+ln} = u_{i,j}$ with $i, j \in \{1, \dots, n\}$. To define the total variation of $u$, we first introduce a discretized version of the gradient. For $u \in X$, it is the vector $\nabla u$ of $Y = X\times X$ given by
$$(\nabla u)_{i,j} = \begin{pmatrix} u_{i+1,j} - u_{i,j} \\ u_{i,j+1} - u_{i,j} \end{pmatrix}, \qquad i, j = 1, \dots, n.$$
Finally, the simplest approximation of the total variation of $u \in X$ is defined by
$$TV(u) = \sum_{i,j} |(\nabla u)_{i,j}|,$$
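where $|\cdot|$ is simply the Euclidean norm of $\mathbb{R}^2$. Let us also introduce two important operators: the divergence $\operatorname{div} p$ of an element $p \in Y$ and the Laplacian $\Delta v$ of an image $v$. By analogy with the continuous setting, we want them to satisfy
$$\langle \operatorname{div} p, u\rangle_X = -\langle p, \nabla u\rangle_Y \quad\text{and}\quad \Delta v = \operatorname{div}\nabla v, \qquad (1)$$
for all $u \in X$.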
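The discrete operators above can be sketched in NumPy as follows. This is a minimal illustration under our own assumptions: images are stored as $n\times n$ arrays with periodic boundaries (handled by `np.roll`), fields $p \in Y$ as arrays of shape `(2, n, n)`, and the names `grad`, `div`, `laplacian` and `tv` are ours. Defining `div` as minus the adjoint of `grad` makes relation (1) hold by construction.

```python
import numpy as np

def grad(u):
    """Forward differences with periodic boundary conditions; returns shape (2, n, n)."""
    gx = np.roll(u, -1, axis=0) - u  # u[i+1, j] - u[i, j]
    gy = np.roll(u, -1, axis=1) - u  # u[i, j+1] - u[i, j]
    return np.stack([gx, gy])

def div(p):
    """Backward differences, i.e. minus the adjoint of grad: <div p, u> = -<p, grad u>."""
    return (p[0] - np.roll(p[0], 1, axis=0)) + (p[1] - np.roll(p[1], 1, axis=1))

def laplacian(v):
    """Periodic five-point Laplacian, obtained as div(grad v) as in (1)."""
    return div(grad(v))

def tv(u):
    """Discrete total variation: sum over pixels of the Euclidean norm of the gradient."""
    g = grad(u)
    return np.sum(np.sqrt(g[0] ** 2 + g[1] ** 2))

rng = np.random.default_rng(0)
u = rng.standard_normal((8, 8))
p = rng.standard_normal((2, 8, 8))
print(np.isclose(np.sum(div(p) * u), -np.sum(p * grad(u))))  # checks the adjoint relation (1)
```

Building `div` from `grad` rather than writing an independent stencil is what guarantees the discrete integration-by-parts identity, which the later projection and duality arguments rely on.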
3
The TV-Based Classical Approach
Given a noisy image $g$ which has also been exposed to a linear perturbation $A$, the Rudin, Osher and Fatemi method suggests minimizing the functional
$$F(u) = \frac12\|Au - g\|^2 + \lambda\, TV(u) \qquad (2)$$
to restore the image $g$. The positive parameter $\lambda$ controls the regularization level. Since the $TV$ term is not differentiable, in practice it is often replaced by another approximation of the total variation:
$$TV_\varepsilon(u) = \sum_{i,j}\sqrt{\varepsilon^2 + |(\nabla u)_{i,j}|^2},$$
K. Jalalzai and A. Chambolle
where $\varepsilon$ is a positive real number. We are therefore led to find the unique $u_\varepsilon$ that minimizes
$$F_\varepsilon(u) = \frac12\|Au - g\|^2 + \lambda\, TV_\varepsilon(u).$$
This is a smooth convex optimization problem, which can be solved easily with the gradient method. To do so, it is enough to consider a sequence $(u_n)$ of images and a small enough gradient step $h > 0$ satisfying
$$u_{n+1} = u_n - h\left(A^T(Au_n - g) + \lambda\nabla TV_\varepsilon(u_n)\right),$$
with
$$\left(\nabla TV_\varepsilon(u_n)\right)_{i,j} = -\left(\operatorname{div}\frac{\nabla u_n}{\sqrt{\varepsilon^2 + |\nabla u_n|^2}}\right)_{i,j}$$
for any $i, j = 1, \dots, n$. It remains to choose $u_0$: it is wise to take it as close as possible to the minimizer, so $u_0 = g$ seems a good choice. Unfortunately, the simple scheme suggested above is fairly slow, since it converges as $O(1/n)$, which means that there exists a positive real $C$ such that
$$F_\varepsilon(u_n) - F_\varepsilon(u_\varepsilon) \le \frac{C}{n}.$$
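The plain gradient scheme can be sketched as follows, assuming the pure denoising case $A = \mathrm{Id}$ (an assumption of this illustration, not of the method). The step $h = 1/L$ uses the bound $L \le 1 + 8\lambda/\varepsilon$, which follows from $\|\nabla\|^2 \le 8$ for periodic forward differences and from the fact that the Hessian of $\sqrt{\varepsilon^2 + |x|^2}$ has norm at most $1/\varepsilon$; with this step the energy decreases at every iteration. Function names are hypothetical.

```python
import numpy as np

def grad(u):
    """Periodic forward differences, shape (2, n, n)."""
    return np.stack([np.roll(u, -1, axis=0) - u, np.roll(u, -1, axis=1) - u])

def div(p):
    """Minus the adjoint of grad (periodic backward differences)."""
    return (p[0] - np.roll(p[0], 1, axis=0)) + (p[1] - np.roll(p[1], 1, axis=1))

def energy_eps(u, g, lam, eps):
    """F_eps(u) with A = identity (pure denoising, an assumption of this sketch)."""
    gu = grad(u)
    return 0.5 * np.sum((u - g) ** 2) + lam * np.sum(np.sqrt(eps ** 2 + gu[0] ** 2 + gu[1] ** 2))

def gradient_energy_eps(u, g, lam, eps):
    """A^T(Au - g) + lam * grad TV_eps(u), with A = identity."""
    gu = grad(u)
    norm = np.sqrt(eps ** 2 + gu[0] ** 2 + gu[1] ** 2)
    return (u - g) - lam * div(gu / norm)

def gradient_descent(g, lam=1.0, eps=0.1, n_iter=200):
    # 1 + 8 lam / eps bounds the Lipschitz constant of grad F_eps, so h = 1/L is a safe step.
    h = 1.0 / (1.0 + 8.0 * lam / eps)
    u = g.copy()                                   # u_0 = g, as suggested in the text
    history = [energy_eps(u, g, lam, eps)]
    for _ in range(n_iter):
        u = u - h * gradient_energy_eps(u, g, lam, eps)
        history.append(energy_eps(u, g, lam, eps))
    return u, history

rng = np.random.default_rng(1)
g = np.zeros((32, 32))
g[8:24, 8:24] = 1.0
g += 0.1 * rng.standard_normal(g.shape)
u, history = gradient_descent(g, lam=0.2, eps=0.05, n_iter=100)
```

The small step forced by $\varepsilon$ (here $h = 1/33$) is one practical reason why this scheme is slow.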
A proof of this classical $O(1/n)$ bound can be found in [10], [11] or [2]. In [11], Nesterov proposes a variant of the gradient algorithm with convergence rate $O(1/n^2)$, which remedies this slowness. It reads as follows:
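$$v_n = \operatorname{argmin}\left\{ F_\varepsilon(u_n) + \langle v - u_n, \nabla F_\varepsilon(u_n)\rangle_X + \frac{L}{2}\|v - u_n\|^2,\ v \in X \right\},$$
$$w_n = \operatorname{argmin}\left\{ \sum_{k=0}^{n}\frac{k+1}{2}\left[F_\varepsilon(u_k) + \langle w - u_k, \nabla F_\varepsilon(u_k)\rangle_X\right] + \frac{L}{2}\|w - u_0\|^2,\ w \in X \right\},$$
$$u_{n+1} = \frac{2}{n+3}\,w_n + \frac{n+1}{n+3}\,v_n,$$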
where $L$ is the Lipschitz constant of $\nabla F_\varepsilon$. This algorithm efficiently combines the classical gradient method (for the computation of $v_n$) and the conjugate gradient method (for the computation of $w_n$). We refer to Nesterov [10] for further explanations on these two techniques. See also Beck and Teboulle [2] for a recent, simpler variant.
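With the quadratic prox-function $\frac12\|w - u_0\|^2$ used above, both argmin problems have closed forms: $v_n = u_n - \nabla F_\varepsilon(u_n)/L$ and $w_n = u_0 - \frac1L\sum_{k\le n}\frac{k+1}{2}\nabla F_\varepsilon(u_k)$. The sketch below implements the scheme under that reading for any smooth convex function given through its gradient; the name `nesterov_smooth` and the small quadratic test problem are our own illustrative choices, not the authors' code.

```python
import numpy as np

def nesterov_smooth(grad_f, u0, L, n_iter):
    """Two-sequence accelerated scheme with prox-function ||w - u0||^2 / 2.

    v_n     = u_n - grad_f(u_n) / L                            (gradient step)
    w_n     = u0 - (1/L) * sum_{k<=n} (k+1)/2 * grad_f(u_k)    (closed-form argmin)
    u_{n+1} = 2/(n+3) * w_n + (n+1)/(n+3) * v_n
    Returns the iterates v_n, whose objective gap decreases as O(1/n^2).
    """
    u = np.array(u0, dtype=float)
    u0 = u.copy()
    acc = np.zeros_like(u)            # running sum of (k+1)/2 * grad_f(u_k)
    vs = []
    for n in range(n_iter):
        gu = grad_f(u)
        v = u - gu / L
        acc += (n + 1) / 2.0 * gu
        w = u0 - acc / L
        u = 2.0 / (n + 3) * w + (n + 1.0) / (n + 3) * v
        vs.append(v)
    return vs

# Example: F(u) = 0.5 u^T Q u - b^T u, whose gradient Q u - b is 100-Lipschitz.
Q = np.diag([1.0, 10.0, 100.0])
b = np.ones(3)
F = lambda x: 0.5 * x @ Q @ x - b @ x
u_star = np.linalg.solve(Q, b)
vs = nesterov_smooth(lambda x: Q @ x - b, np.zeros(3), L=100.0, n_iter=200)
print(F(vs[-1]) - F(u_star))
```

For $F_\varepsilon$ one would pass the gradient used in the plain gradient scheme and the corresponding bound on $L$.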
4
A Variant of TV
Let $u \in X$ be an image. The main idea is to replace the $TV$ term in (2) by
$$J(u) = \min_{\substack{p\in Y\\ \Pi p = \nabla u}} \|p\|_1,$$
where, on the one hand, $\|p\|_1 = \sum_{i,j}\sqrt{(p^1_{i,j})^2 + (p^2_{i,j})^2}$ when $p = (p^1, p^2) \in X\times X$ and, on the other hand, $\Pi$ is the projection on the gradients defined by $\Pi p = \nabla\bar v$, where $\bar v$ realizes the minimum
$$\min_{v\in X}\|\nabla v - p\|. \qquad (3)$$
Here $\|\cdot\|$ is the Euclidean norm of $Y$. Note that $J(u) \le TV(u) \le TV_\varepsilon(u)$ for any $u \in X$; this is a straightforward consequence of the definitions, since $p = \nabla u$ is admissible in the definition of $J$. In the sequel, we detail some other properties of this functional which lead us to believe that it behaves in the same way as $TV$. Returning to (3): its solution $\bar v$ is characterized by the Euler-Lagrange equation $\nabla^*(\nabla\bar v - p) = 0$ or, using the notation introduced in (1), $\Delta\bar v = \operatorname{div} p$ (recall that our operators $\nabla$, $\operatorname{div}$ and $\Delta$ are discrete operators with periodic boundary conditions). Hence the constraint $\Pi p = \nabla u$ is equivalent to $\Delta u = \operatorname{div} p$, and therefore
$$J(u) = \min_{\substack{p\in Y\\ \operatorname{div} p = \Delta u}} \|p\|_1.$$
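Since the operators are periodic, the projection $\Pi$ can be computed with the FFT: solve $\Delta\bar v = \operatorname{div} p$ by dividing by the eigenvalues $2\cos(2\pi k/n) + 2\cos(2\pi l/n) - 4$ of the discrete Laplacian, then take $\nabla\bar v$. The following is one natural way to do this, not necessarily the authors' implementation; the helper names are hypothetical. The zero frequency is dropped because $\operatorname{div} p$ always has zero mean and $\bar v$ is only defined up to a constant.

```python
import numpy as np

def grad(u):
    return np.stack([np.roll(u, -1, axis=0) - u, np.roll(u, -1, axis=1) - u])

def div(p):
    return (p[0] - np.roll(p[0], 1, axis=0)) + (p[1] - np.roll(p[1], 1, axis=1))

def laplacian_symbol(shape):
    """Eigenvalues of the periodic five-point Laplacian on the DFT grid (all <= 0)."""
    fx = np.fft.fftfreq(shape[0])[:, None]
    fy = np.fft.fftfreq(shape[1])[None, :]
    return 2 * np.cos(2 * np.pi * fx) + 2 * np.cos(2 * np.pi * fy) - 4

def solve_poisson(f):
    """Zero-mean solution v of Laplacian(v) = f; the zero frequency (constants) is dropped."""
    sym = laplacian_symbol(f.shape)
    f_hat = np.fft.fft2(f)
    v_hat = np.zeros_like(f_hat)
    mask = sym != 0
    v_hat[mask] = f_hat[mask] / sym[mask]
    return np.real(np.fft.ifft2(v_hat))

def project_on_gradients(p):
    """Pi p = grad(v_bar), where v_bar solves Laplacian(v_bar) = div p (optimality of (3))."""
    return grad(solve_poisson(div(p)))

def is_admissible(p, u):
    """Constraint in the definition of J: div p = Laplacian(u), i.e. Pi p = grad u."""
    return np.allclose(div(p), div(grad(u)))
```

Adding any divergence-free field $p - \Pi p$ to $\nabla u$ keeps the pair admissible, which is exactly the extra freedom that lets $J(u)$ be smaller than $TV(u)$.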
Hence, Rudin, Osher and Fatemi's problem expressed in terms of this new functional consists in minimizing
$$G(p) = \frac12\|Au - g\|^2 + \lambda\|p\|_1$$
over the pairs $(p, u)$ satisfying the constraint $\Delta u = \operatorname{div} p$. Recently, the minimization of such functionals has attracted much attention, in data compression in particular, and has been the subject of many papers. Among those, two recent articles by Nesterov [12] and by Beck and Teboulle [2] focus on the minimization of objective functions which can be decomposed as a sum $\varphi + \psi$, where $\varphi$ is a continuously differentiable convex function whose gradient is Lipschitz continuous and $\psi$ is a continuous convex function which is possibly nonsmooth but simple, in the sense that its proximal operator is easy to compute (see Combettes and Wajs [3] for the definition). These characteristics perfectly suit the two terms composing $G$, and we henceforth denote
$$\varphi(p) = \frac12\|A\Delta^{-1}\operatorname{div} p - g\|^2 \quad\text{and}\quad \psi(p) = \|p\|_1,$$
so that $G = \varphi + \lambda\psi$.
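The reason $\psi$ is "simple" is that its proximal operator has a closed form: each 2-vector $p_{i,j}$ is shrunk toward zero by $\tau$ in Euclidean norm (group soft-thresholding). A minimal sketch, with a function name of our choosing:

```python
import numpy as np

def prox_l1_field(p, tau):
    """Proximal operator of tau * ||p||_1, where ||p||_1 sums the Euclidean norms |p_ij|.

    Each 2-vector p_ij is shrunk towards zero: p_ij * max(0, 1 - tau / |p_ij|);
    vectors with |p_ij| <= tau are set to zero.
    """
    norm = np.sqrt(p[0] ** 2 + p[1] ** 2)
    safe = np.where(norm > 0, norm, 1.0)              # avoid dividing by zero
    scale = np.where(norm > tau, 1.0 - tau / safe, 0.0)
    return p * scale

p = np.zeros((2, 2, 2))
p[:, 0, 0] = [3.0, 4.0]   # norm 5 > tau: shrunk to norm 4, direction kept
p[:, 1, 1] = [0.3, 0.4]   # norm 0.5 <= tau: set to zero
print(prox_l1_field(p, 1.0)[:, 0, 0])
```

By contrast, $\mathrm{prox}_{TV}$ has no such pixelwise formula, because the gradient couples neighbouring pixels.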
In their article, Beck and Teboulle describe the following scheme to construct a minimizing sequence $(p_n)$ for $G$:
$$q_1 = p_0 \in Y, \qquad t_1 = 1,$$
$$p_n = \operatorname{argmin}\left\{ \varphi(q_n) + \langle p - q_n, \nabla\varphi(q_n)\rangle_Y + \frac{L}{2}\|p - q_n\|^2 + \lambda\|p\|_1,\ p \in Y \right\},$$
$$t_{n+1} = \frac{1 + \sqrt{1 + 4t_n^2}}{2},$$
$$q_{n+1} = p_n + \frac{t_n - 1}{t_{n+1}}\,(p_n - p_{n-1}),$$
where $L = \frac12\left(1 - \cos\frac{2\pi}{n}\right)^{-1}$ is the Lipschitz constant of $\nabla\varphi$ (here $n$ denotes the image size, and $A$ is assumed to satisfy $\|A\| \le 1$). Note that each iteration of this algorithm needs only one computation of the gradient, if things are done properly. Nesterov's algorithm also converges as $O(1/n^2)$, like Beck and Teboulle's, and is again a clever combination of the gradient and conjugate gradient methods, but it requires two computations of the gradient, which noticeably slows down each iteration.
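Putting the pieces together, the scheme above can be sketched for $G$ as follows. Assumptions of this sketch (not stated in the text): $A$ is a periodic convolution with a real Fourier symbol `a_hat` satisfying $|\hat a| \le 1$ and $\hat a(0) = 1$, so that $A$ is self-adjoint and preserves constants; the constant left undetermined by $\Delta^{-1}\operatorname{div} p$ is fixed to the mean of $g$; and $p_0 = \nabla g$. The gradient is $\nabla\varphi(p) = -\nabla\Delta^{-1}A^T(Au - g)$, and every operator is a pointwise product in the Fourier domain.

```python
import numpy as np

def grad(u):
    return np.stack([np.roll(u, -1, axis=0) - u, np.roll(u, -1, axis=1) - u])

def div(p):
    return (p[0] - np.roll(p[0], 1, axis=0)) + (p[1] - np.roll(p[1], 1, axis=1))

def inv_laplacian(f):
    """Zero-mean pseudo-inverse of the periodic five-point Laplacian, via the FFT."""
    fx = np.fft.fftfreq(f.shape[0])[:, None]
    fy = np.fft.fftfreq(f.shape[1])[None, :]
    sym = 2 * np.cos(2 * np.pi * fx) + 2 * np.cos(2 * np.pi * fy) - 4
    f_hat, out = np.fft.fft2(f), np.zeros(f.shape, dtype=complex)
    out[sym != 0] = f_hat[sym != 0] / sym[sym != 0]
    return np.real(np.fft.ifft2(out))

def apply_A(u, a_hat):
    """Periodic convolution given by a real Fourier symbol a_hat (symmetric kernel)."""
    return np.real(np.fft.ifft2(a_hat * np.fft.fft2(u)))

def prox_l1(p, tau):
    norm = np.sqrt(p[0] ** 2 + p[1] ** 2)
    return p * np.where(norm > tau, 1.0 - tau / np.where(norm > 0, norm, 1.0), 0.0)

def fista_J(g, a_hat, lam=1.0, n_iter=100):
    n = g.shape[0]
    L = 0.5 / (1.0 - np.cos(2 * np.pi / n))     # Lipschitz bound, assuming |a_hat| <= 1
    # u = Delta^{-1} div p is only defined up to a constant; fix it with mean(g).
    to_image = lambda p: inv_laplacian(div(p)) + g.mean()

    def grad_phi(p):                   # -grad Delta^{-1} A^T (A u - g)
        r = apply_A(to_image(p), a_hat) - g
        return -grad(inv_laplacian(apply_A(r, a_hat)))

    def G(p):                          # phi(p) + lam * ||p||_1
        r = apply_A(to_image(p), a_hat) - g
        return 0.5 * np.sum(r ** 2) + lam * np.sum(np.sqrt(p[0] ** 2 + p[1] ** 2))

    p_prev = grad(g)                   # p_0 = grad g, so the initial image is g itself
    q, t = p_prev.copy(), 1.0
    history = [G(p_prev)]
    for _ in range(n_iter):
        p = prox_l1(q - grad_phi(q) / L, lam / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        q = p + (t - 1.0) / t_next * (p - p_prev)
        p_prev, t = p, t_next
        history.append(G(p))
    return to_image(p_prev), history

rng = np.random.default_rng(4)
clean = np.zeros((32, 32))
clean[8:24, 8:24] = 10.0
g = clean + rng.standard_normal(clean.shape)
u, history = fista_J(g, np.ones((32, 32)), lam=1.0, n_iter=100)   # A = identity here
```

Each iteration costs one gradient of $\varphi$ plus one evaluation of $G$ for monitoring; dropping the monitoring leaves only FFTs and pointwise operations.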
5
The Continuous Setting
Let us mention in this section some properties of the functional $J$ in the continuous setting. We refer to Jalalzai [7] for proofs and further results. First, let us fix some notation specific to this section. Henceforth, $\Omega$ designates an open set of $\mathbb{R}^n$ with a smooth enough boundary and, to simplify matters, we first consider functions $u \in L^2(\Omega)$ whose distributional derivatives, denoted $Du$, are square-integrable functions, i.e. $u$ lies in the Sobolev space $H^1(\Omega)$. The functional we previously introduced is a discretization of
$$J(u) = \inf\left\{ \int_\Omega |\phi|,\ \phi \in L^2(\Omega)^n \text{ and } \Pi\phi = Du \right\},$$
where $\Pi$ is the orthogonal projection on gradients, as in Section 4. Formally speaking, given a function $\phi \in L^2(\Omega)^n$, we set $\Pi\phi = D\bar v$, where $\bar v$ realizes
$$\min_{v\in H^1(\Omega)} \|Dv - \phi\|_{L^2(\Omega)^n}.$$
It is actually easy to see that there exists a unique $\psi \in L^2(\Omega)^n$ such that we have the so-called Helmholtz decomposition
$$\phi = \Pi\phi + \psi, \quad\text{where}\quad \int_\Omega \nabla v\cdot\psi = 0 \quad\text{for any } v \in C^1(\bar\Omega)$$
(we refer to Dautray and Lions [4] or Temam [14] for more about this topic). Putting things together, we have proved that
$$J(u) = \inf\left\{ \int_\Omega |Du + \psi|,\ \psi \in L^2(\Omega)^n \text{ and } \int_\Omega \nabla v\cdot\psi = 0\ \ \forall v \in C^1(\bar\Omega) \right\}.$$
However, this new formulation of $J$ remains meaningful even when $u$ is simply a function of bounded variation in $\Omega$ (denoted $u \in BV(\Omega)$), which means that its distributional derivative $Du$ lies in $M_b(\Omega)^n$, the space of $\mathbb{R}^n$-valued finite Radon measures on $\Omega$. Henceforth, we also let $\psi$ range in the space $M_b(\Omega)^n$. We refer to Ambrosio, Fusco and Pallara [1] or Giusti [5] for properties of functions of bounded variation and for other measure-theoretic considerations. All this motivates a new definition of $J$ when $u \in BV(\Omega)$, namely:
$$J(u) = \inf\left\{ \int_\Omega |Du + \psi|,\ \psi \in M_b(\Omega)^n \text{ and } \int_\Omega \nabla v\, d\psi = 0\ \ \forall v \in C^1(\bar\Omega) \right\}.$$
Note that $J(u)$ is obviously well defined for any $u \in BV(\Omega)$, since
$$J(u) \le \int_\Omega |Du|. \qquad (4)$$
Thanks to a classical convex duality argument, it is possible to show that, under some additional assumptions on $\Omega$,
$$J(u) = \sup\left\{ \int_\Omega \nabla w\cdot Du,\ w \in C^1(\Omega) \text{ and } \|\nabla w\|_\infty \le 1 \right\}.$$
Using this dual formulation, one can prove the following result.
Theorem 1. Let $\Omega$ be an open set in $\mathbb{R}^n$ and let $u = \chi_E$ be the characteristic function of a finite-perimeter set $E \subset \Omega$ or, more generally, let $u \in BV(\Omega)$ have a derivative $Du$ concentrated on the jump set. Then
$$J(u) = \int_\Omega |Du|.$$
The proof is mostly based on the fact that rectifiable sets admit approximate tangent hyperplanes. Note that when $Du$ has a diffuse part, inequality (4) may be strict. This theorem legitimates the use of the functional $J$ in image processing, since it shows that $J$ behaves in the same way as $TV$ on such functions.
6
Preliminary Numerical Simulations
In this last section, we compare the two approaches based on the functionals $TV$ and $J$. For this purpose, we use the two algorithms presented above. In our implementation, Beck and Teboulle's algorithm performs half as many iterations, since it needs to compute four Fourier transforms per iteration whereas Nesterov's algorithm needs only two. We think that one can do much better, especially with $J$, since it makes extensive use of Fourier transform methods and is therefore easily parallelizable. Moreover, the functional $J$ seems to avoid the well-known staircasing effect (see Louchet and Moisan [8]) produced by $TV$ minimization: the latter yields images with peculiar local configurations, whereas $J$ does not. Throughout these tests, the regularization parameter $\lambda$ is set to 1. For all the simulations, we used a personal computer with a 2 GHz Core 2 Duo processor and let the two Matlab programs run for exactly 20 seconds.
6.1
First Example
We consider the 256 × 256 Lenna image. This photo underwent a Gaussian blur of standard deviation $\sigma_{blur} = 1.5$ followed by additive zero-mean Gaussian noise with standard deviation $\sigma_{noise} = 4$. The original image is shown in Fig. 1 and the degraded one in Fig. 2. We then implemented and ran Beck and Teboulle's and Nesterov's algorithms to restore the image. The results of these two experiments are shown in Figs. 3 and 4.
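The degradation used in these experiments (Gaussian blur, then additive zero-mean Gaussian noise) can be simulated as follows. This is a sketch under our own assumptions, since the text does not say how the blur was implemented: the blur is taken periodic and applied in the Fourier domain, consistent with the periodic setting of Section 2, and gray values are assumed to lie in $[0, 255]$. The returned symbol `a_hat` is real, equals 1 at frequency 0 and has modulus at most 1, which is the kind of operator $A$ the Lipschitz bound above requires.

```python
import numpy as np

def gaussian_blur_symbol(shape, sigma):
    """Fourier symbol of a periodic Gaussian blur of standard deviation sigma (in pixels).

    The kernel is centred at pixel (0, 0) with wrap-around distances and normalised to
    sum 1, so the symbol is real, equals 1 at frequency 0 and has modulus <= 1.
    """
    dx = np.minimum(np.arange(shape[0]), shape[0] - np.arange(shape[0]))[:, None]
    dy = np.minimum(np.arange(shape[1]), shape[1] - np.arange(shape[1]))[None, :]
    kernel = np.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    return np.real(np.fft.fft2(kernel))

def degrade(image, sigma_blur, sigma_noise, seed=0):
    """Periodic Gaussian blur followed by additive zero-mean Gaussian noise."""
    a_hat = gaussian_blur_symbol(image.shape, sigma_blur)
    blurred = np.real(np.fft.ifft2(a_hat * np.fft.fft2(image)))
    rng = np.random.default_rng(seed)
    return blurred + sigma_noise * rng.standard_normal(image.shape), a_hat

image = np.random.default_rng(5).uniform(0, 255, (64, 64))   # stand-in for a real photo
g, a_hat = degrade(image, sigma_blur=1.5, sigma_noise=4.0)
```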
Fig. 1. Original Lenna photo
Fig. 2. $\sigma_{blur} = 1.5$, $\sigma_{noise} = 4$
Fig. 3. $J$-processed Lenna, 600 iterations
Fig. 4. $TV$-processed Lenna, 1000 iterations
6.2
Second Example
The second example compares deblurring results on a text scan. The first figure (Fig. 5) is the original image. In Fig. 6, the 256 × 256 poem image underwent the same degradation process, with parameters $\sigma_{blur} = 1$ and $\sigma_{noise} = 4$.
Fig. 5. The first verse of a famous poem by Verlaine
Fig. 6. $\sigma_{blur} = 1$, $\sigma_{noise} = 4$
Fig. 7. $J$-processed poem, 600 iterations
Fig. 8. $TV$-processed poem, 1200 iterations
References
1. Ambrosio, L., Fusco, N., Pallara, D.: Functions of Bounded Variation and Free Discontinuity Problems. Oxford University Press, Oxford (2000)
2. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences (accepted) (2008)
3. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. SIAM Journal on Multiscale Modeling and Simulation 4(4), 1168–1200 (2005)
4. Dautray, R., Lions, J.-L.: Mathematical Analysis VI and Numerical Methods for Science and Technology. Evolution Problems II, vol. 6. Springer, Heidelberg (1993)
5. Giusti, E.: Minimal Surfaces and Functions of Bounded Variation. Birkhäuser, Basel (1984)
6. Goldstein, T., Osher, S.: The Split Bregman Method for L1 Regularized Problems. UCLA CAAM Report 08-29 (2008)
7. Jalalzai, K.: Étude des propriétés d’une variante de la variation totale. Master thesis (2008)
8. Louchet, C., Moisan, L.: Total variation denoising using posterior expectation (2008), http://hal.archives-ouvertes.fr
9. Moreau, J.-J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. C. R. Acad. Sci. Paris Sér. A Math. 255, 2897–2899 (1962)
10. Nesterov, Y.: Introductory lectures on convex optimization. Kluwer Academic Publishers, Dordrecht (2004)
11. Nesterov, Y.: Smooth minimization of non-smooth functions. Mathematical Programming (A), pp. 127–152 (2005)
12. Nesterov, Y.: Gradient methods for minimizing composite objective function. CORE Report (2007)
13. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
14. Temam, R.: Navier-Stokes Equations Theory and Numerical Analysis. AMS Bookstore (2001)
15. Tai, X.-C., Wu, C.: Augmented Lagrangian Method, Dual Methods and Split Bregman Iteration for ROF Model. UCLA CAAM Report 09-05 (2009)
Coarse-to-Fine Image Reconstruction Based on Weighted Differential Features and Background Gauge Fields
Bart Janssen, Remco Duits, and Luc Florack
Eindhoven University of Technology, Dept. of Biomedical Engineering & Dept. of Mathematics and Computer Science
{B.J.Janssen,R.Duits,L.M.J.Florack}@tue.nl
Abstract. We propose an iterative approximate reconstruction method in which we minimize the difference between reconstructions from subsets of multiscale measurements. To this end we interpret images not as scalar-valued functions but as sections through a fibered space. Information from previous reconstructions, which can be obtained at a coarser scale than the current one, is propagated by means of covariant derivatives on a vector bundle. The gauge field that is used to define the covariant derivatives is defined by the previously reconstructed image. An advantage of using covariant derivatives in the variational formulation of the reconstruction method is that the accuracy of the approximation increases with the number of iterations. The presented method allows for a reconstruction at a resolution of choice, which can also be used to speed up the approximation at a finer level. An application of our method to reconstruction from a sparse set of differential features of a scale space representation of an image allows for a weighting of the features based on their sensitivity to noise. To demonstrate the method, we apply it to the reconstruction from singular points of a scale space representation of an image.
1
Introduction
Reconstruction from signal samples is a long-standing problem in signal and image analysis [20]. We present a method for the approximation of a signal or image from its generalized samples, i.e. samples given on a non-equidistant grid and obtained by means of spatially varying filters. Variational reconstruction of non-equidistant image samples has recently become of interest to the image compression community [9], where significant gains in reconstruction quality have been obtained by introducing anisotropic nonlinear regularization strategies. The scale space community has long been interested in reconstruction from generalized samples [19, 18, 14, 12, 13]. We propose a method that produces an image that approximately satisfies all features. Features that are more robust to perturbations of the source image are given a higher weight, which steers the reconstruction method such that those features are better approximated than those that are more sensitive to noise. This leads to a method that is more robust than interpolating methods. A gauge field is introduced by means of covariant derivatives on a vector bundle. This way, a model of the image to be reconstructed can be incorporated in the energy functional that is minimized to find a suitable reconstruction. Using this gauge field we can construct a coarse-to-fine image reconstruction method. A coarse-to-fine approach naturally leads to a more efficient algorithm in terms of memory consumption and computation time.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 377–388, 2009. © Springer-Verlag Berlin Heidelberg 2009
2
Image Reconstruction
In the reconstruction problem we aim for a reconstruction from a set of linear functionals on an image. These functionals represent measurements on the image and are henceforth called features. More rigorously, a feature $d_i \in \mathbb{R}$ of an image $f \in L^2(\mathbb{R}^2)$ measured with a filter $\psi_i \in L^2(\mathbb{R}^2)$ is given by $d_i = (\psi_i, f)_{L^2}$, $i = 1, \dots, P$, in which $(\cdot,\cdot)_{L^2}$ denotes the $L^2$ inner product. In general, the set of features does not describe the input image $f$ unambiguously (the filters do not constitute a frame), and a model is needed to which the reconstruction should adhere. When such a model can be described by a (semi-)norm, the reconstruction can be obtained directly by means of an orthogonal projection onto the features [14]. Nielsen and Lillholm [19, 18] proposed to find a reconstruction from its features using a nonlinear regularization term (model). Their so-called observation-constrained evolution ensures that the features are interpolated. When measurements are contaminated by noise, approximation is often favored over interpolation. In the following we discuss not the interpolation but the approximation of a set of $P$ features $\{d_i\}_{i=1}^P$ obtained by means of the filters $\{\psi_i\}_{i=1}^P$.
3
Approximation
Instead of searching for a signal that interpolates the given features, one can try to find a signal that approximates them. In the case of noisy measurements the latter approach is often preferred. We now aim for the function $g \in H^1(\mathbb{R}^2)$ that minimizes
$$E(g) = \underbrace{\sum_{i=1}^{P}\left((g, \psi_i)_{L^2} - d_i\right)^2}_{\text{data term}} + \underbrace{\frac{\lambda}{2}\int_{\mathbb{R}^2}\|\nabla g\|^2\, dV}_{\text{regularization term}}, \qquad (1)$$
where $\lambda \in \mathbb{R}^+$ is a parameter that controls the quality of the approximation. As $\lambda$ tends to 0, the approximation approaches the interpolation of the features. The minimizer of this quadratic functional can be found by finding the unique $g$ that solves the following Euler equation:
$$\sum_{i=1}^{P}\psi_i\left((\psi_i, g)_{L^2} - d_i\right) - \frac{\lambda}{2}\Delta g = 0. \qquad (2)$$
The single parameter $\lambda$ weights every feature equally. This is not desirable when the features are not normalized, and even after normalization one can improve on the selection of the weights. We allow for these improvements by introducing $P$ extra parameters (which we call feature weights) $\alpha_i \in \mathbb{R}^+$, $i = 1, \dots, P$, that are set to fixed values based on the properties of the features. In the case of reconstruction from differential features of a scale space representation of an image, which is the main motivation for our method, we can select the newly introduced parameters based on noise propagation in the scale space representation of the image. The global parameter $\lambda$ could be absorbed into these parameters but is kept in our formulation for the sake of clarity. For fixed $\alpha_i$ we now search for
$$\operatorname*{arg\,min}_{g} E(g) = \operatorname*{arg\,min}_{g\in L^2(\mathbb{R}^2)} \sum_{i=1}^{P}\alpha_i\left((g, \psi_i)_{L^2} - d_i\right)^2 + \frac{\lambda}{2}\int_{\mathbb{R}^2}\|\nabla g\|^2\, dV. \qquad (3)$$
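To make the role of the weights concrete, the following sketch minimizes a one-dimensional, pixel-grid analogue of (3). Everything here is a simplifying assumption of ours rather than the discretization used in this paper (which relies on β-splines, see Section 3.2): the signal lives on $N$ samples, the gradient is a forward difference, the filters are sampled, normalized Gaussians of order 0, and the weights are uniform. Setting the gradient of the discrete energy to zero gives the linear system $(2\Psi^T W\Psi + \lambda D^T D)\,g = 2\Psi^T W d$ with $W = \operatorname{diag}(\alpha)$.

```python
import numpy as np

def gaussian_filter(N, center, sigma):
    """Sampled Gaussian filter psi_i of order 0 on N pixels, normalised to sum 1."""
    x = np.arange(N)
    w = np.exp(-(x - center) ** 2 / (2.0 * sigma ** 2))
    return w / w.sum()

def weighted_approximation(Psi, d, alpha, lam):
    """Minimise sum_i alpha_i (<g, psi_i> - d_i)^2 + lam/2 * sum_k (g[k+1] - g[k])^2.

    Psi holds one filter per row (shape P x N).  The normal equations read
    (2 Psi^T W Psi + lam D^T D) g = 2 Psi^T W d, with W = diag(alpha).
    """
    N = Psi.shape[1]
    D = np.diff(np.eye(N), axis=0)          # forward differences, shape (N-1, N)
    W = np.diag(alpha)
    M = 2.0 * Psi.T @ W @ Psi + lam * D.T @ D
    return np.linalg.solve(M, 2.0 * Psi.T @ W @ d)

def energy(g, Psi, d, alpha, lam):
    """Discrete analogue of the energy in (3)."""
    return np.sum(alpha * (Psi @ g - d) ** 2) + 0.5 * lam * np.sum(np.diff(g) ** 2)

N, P = 64, 10
f = np.sin(2 * np.pi * np.arange(N) / N)                 # hypothetical source signal
Psi = np.array([gaussian_filter(N, c, 3.0) for c in np.linspace(3, 60, P)])
d = Psi @ f                                              # noise-free features (psi_i, f)
alpha = np.full(P, 1.0 / P)                              # uniform weights summing to 1
g = weighted_approximation(Psi, d, alpha, lam=1e-4)
```

With a small $\lambda$ the reconstruction nearly interpolates the features, as stated after (1); raising the weight $\alpha_i$ of a feature pulls the solution closer to that particular measurement.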
In the next subsection we discuss how the feature weights can be selected.
3.1
Noise Propagation
In order to select sensible values for the parameters $\alpha_i$ that appear in eq. (3), we need to make some assumptions on the noise and on the set of filters $\{\psi_i\}_{i=1}^P$ that are used to extract the measurements. With regard to the noise, we assume additive zero-mean white Gaussian noise with a correlation distance of $\tau$ pixels. In recent work on the stability of toppoints [2] (which are singular points of a Gaussian scale space representation of an image), this was found to be a sensible assumption. In our application we reconstruct from differential structure taken from the Gaussian scale space representation of the input image $f$; therefore we assume that the set of filters $\{\psi_i\}_{i=1}^P$ consists of Gaussian kernels or derivatives thereof. The idea is now to construct the weights $\alpha_i$ according to the sensitivity of their associated differential features to noise. In order to estimate the sensitivity of a feature $d_i$ of the image $f$ contaminated by additive noise, we can adopt work on noise propagation in scale space by Blom [4]. He proposes to compute, at a certain scale $t > 0$, the moments $M^2_{m_x,m_y,n_x,n_y} = \langle N_{m_x,m_y}, N_{n_x,n_y}\rangle$ of derivatives of orders $m_x$, $m_y$, $n_x$ and $n_y$ of the fiducial noise function $N$. He assumes only the covariance $\langle N^2\rangle$ of the noise to be given. In case the correlation distance $\tau$ is much smaller than the scale $t$,
$$M^2_{m_x,m_y,n_x,n_y} \approx \langle N^2\rangle\,\frac{\tau}{4t}\left(\frac{-1}{2t}\right)^{\frac12(m_x+m_y+n_x+n_y)} Q_{m_x+n_x}\,Q_{m_y+n_y}, \qquad (4)$$
with $Q_n = (n+1)!!$ for $n$ even and $Q_n = 0$ otherwise. Features that are sensitive to perturbations of the source image $f$ should influence the final result less than features that are relatively insensitive to these perturbations. Therefore we compute $\alpha_i$ from eq. (4) such that $\alpha_i \propto M^{-2}_{n^i_x,n^i_y,n^i_x,n^i_y}$ at scale $t^i$. The parameters $n^i_x$, $n^i_y$ and $t^i$ are the derivative order in the $x$ direction, the derivative order in the $y$ direction and the scale of the $i$th filter $\psi_i$. We stress that these estimates are based on the assumption that the filters are partial derivatives of a Gaussian. We furthermore ensure that $\sum_{i=1}^P \alpha_i = 1$, which essentially makes $\alpha_i$ independent of the values of $\langle N^2\rangle$ and $\tau$.
3.2
Discretization
We can approximate $g$ either by discretizing eq. (2) (augmented with the feature weights) or by discretizing the energy functional in eq. (3) and thereafter finding a discrete minimizer of the discretized energy. These two approaches can be equivalent for a suitable choice of the so-called test functions involved in the former method. We proceed by directly discretizing the energy. To solve eq. (3) for $g$, we approximate $g$ by a β-spline of order $n$:
$$\beta^n(x) = \mathcal{F}^{-1}\left[\omega \mapsto \frac{\left(e^{i\omega/2} - e^{-i\omega/2}\right)^{n+1}}{(i\omega)^{n+1}}\right](x), \qquad (5)$$
where $\mathcal{F}^{-1}$ denotes the inverse Fourier transform. Equality (5) states that $\beta^n$ is the $(n+1)$-fold convolution of the β-spline of order 0,
$$\beta^0(x) = \begin{cases} 1 & -\frac12 < x < \frac12 \\ \frac12 & |x| = \frac12 \\ 0 & \text{otherwise}, \end{cases} \qquad (6)$$
a rectangle. Further details concerning β-splines can be found in e.g. [22]. It was shown in the context of optic flow [17] and registration [21] that such an approach has computational advantages over a finite difference approach. Arigovindan et al. [1] obtained good results when applying this approach to (a multigrid scheme for) image and vector field interpolation. Moreover, it allows for a coarse-to-fine implementation in an elegant way because of the 2-scale relation
$$\beta^n\!\left(\frac{x}{2^j}\right) = 2^{-n}\sum_{k\in\mathbb{Z}}\binom{n+1}{k}\,\beta^n\!\left(\frac{x}{2^{j-1}} - k\right). \qquad (7)$$
The $n$th-order β-spline approximation of $g$ in two spatial dimensions at resolution $a > 0$ is given by
$$\tilde g_a(x, y) = \sum_{l=0}^{M-1}\sum_{k=0}^{N-1} c_{k,l}\,\beta^n\!\left(\frac{x}{a} - k\right)\beta^n\!\left(\frac{y}{a} - l\right), \qquad (8)$$
with $c_{k,l}, x, y \in \mathbb{R}$, $\beta^n(\cdot)$ the central β-spline of order $n \in \mathbb{N}$, resolution parameter $a$, and $N, M \in \mathbb{N}$ the width and height of the image in pixels. Notice that this is a representation of the image in the continuous domain and that $\tilde g_a \in C^{n-1}(\mathbb{R}^2)$, i.e. $\tilde g_a$ is $(n-1)$ times continuously differentiable.
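The central β-spline of (5)–(6) and the tensor-product model (8) can be evaluated with the truncated-power formula $\beta^n(x) = \frac{1}{n!}\sum_{k=0}^{n+1}(-1)^k\binom{n+1}{k}\left(x + \frac{n+1}{2} - k\right)_+^n$, a standard closed form of the $(n+1)$-fold convolution of the box $\beta^0$. The sketch below is ours (function names and the coefficient layout `c[l, k]` are assumptions); its usage checks the partition of unity and the derivative identity that the discretization relies on.

```python
import numpy as np
from math import comb, factorial

def bspline(x, n):
    """Central B-spline beta^n: the (n+1)-fold convolution of the box beta^0 of eq. (6)."""
    x = np.asarray(x, dtype=float)
    if n == 0:
        return np.where(np.abs(x) < 0.5, 1.0, np.where(np.abs(x) == 0.5, 0.5, 0.0))
    out = np.zeros_like(x)
    for k in range(n + 2):
        t = np.maximum(x + (n + 1) / 2 - k, 0.0)     # truncated power (.)_+
        out += (-1) ** k * comb(n + 1, k) * t ** n
    out /= factorial(n)
    return np.where(np.abs(x) < (n + 1) / 2, out, 0.0)   # exact zero outside the support

def spline_image(c, x, y, n=3, a=1.0):
    """Evaluate eq. (8) at (x, y); c has shape (M, N) and c[l, k] multiplies the (k, l) term."""
    bx = bspline(x / a - np.arange(c.shape[1]), n)   # beta^n(x/a - k), k = 0..N-1
    by = bspline(y / a - np.arange(c.shape[0]), n)   # beta^n(y/a - l), l = 0..M-1
    return by @ c @ bx

print(bspline(np.array([0.0, 1.0, 2.0]), 3))
xs = np.linspace(-0.5, 0.5, 11)
print(np.allclose(sum(bspline(xs - k, 3) for k in range(-3, 4)), 1.0))  # partition of unity
```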
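The regularization term in eq. (3), $\int_{\mathbb{R}^2}\|\nabla g\|^2\,dx\,dy$, can be approximated with the help of eq. (8) by
$$\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty}\|\nabla g_a(x,y)\|^2\,dx\,dy = \sum_{i=0}^{1}\sum_{l,n=0}^{M-1}\sum_{k,m=0}^{N-1} c_{k,l}\,c_{m,n}\int_{-\infty}^{\infty}\frac{\partial\beta^n}{\partial x_i}\!\left(\frac{x_i}{a} - k\right)\frac{\partial\beta^n}{\partial x_i}\!\left(\frac{x_i}{a} - m\right)dx_i\int_{-\infty}^{\infty}\beta^n\!\left(\frac{x_{1-i}}{a} - l\right)\beta^n\!\left(\frac{x_{1-i}}{a} - n\right)dx_{1-i}, \qquad (9)$$
where $(x_0, x_1)$ correspond to $(x, y)$ in eq. (8). The integrals in the previous equation can be expressed as convolutions:
$$\int_{-\infty}^{\infty}\frac{\partial\beta^n}{\partial x}\!\left(\frac{x}{a} - k\right)\frac{\partial\beta^n}{\partial x}\!\left(\frac{x}{a} - m\right)dx = -a\int_{-\infty}^{\infty}\frac{\partial\beta^n}{\partial u}(u)\,\frac{\partial\beta^n}{\partial u}\big((m-k) - u\big)\,du. \qquad (10)$$
This is easily verified by substituting the integration variable ($u = \frac{x}{a} - k$) and noting that $\beta^n(x) = \beta^n(-x)$ for all $x \in \mathbb{R}$. We furthermore note that the derivative of a central β-spline of degree $n$ is again a linear combination of β-splines, at the expense of lowering its degree to $n-1$:
$$\frac{\partial}{\partial x}\beta^n(x) = \beta^{n-1}\!\left(x + \tfrac12\right) - \beta^{n-1}\!\left(x - \tfrac12\right). \qquad (11)$$
As a result we can write eq. (9) in matrix-vector notation as
$$\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty}\|\nabla g_a(x,y)\|^2\,dx\,dy = c^T R c, \qquad (12)$$
with $c = \{c_i\}_{i=0}^{MN-1}$ and
$$R = \left[a\,\beta^{2n}(n-l)\right]_{n,l=0}^{M-1} \otimes \left[-a\,\frac{\partial\beta^{2n}}{\partial x}(m-k)\right]_{m,k=0}^{N-1} + \left[-a\,\frac{\partial\beta^{2n}}{\partial y}(n-l)\right]_{n,l=0}^{M-1} \otimes \left[a\,\beta^{2n}(m-k)\right]_{m,k=0}^{N-1}. \qquad (13)$$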
We express the inner product in the data term of eq. (3) in terms of β-splines as well. This leads to an expression similar to eq. (10):
$$(g_a, \psi_i)_{L^2(\mathbb{R}^2)} = (-1)^{n_i + m_i}\sum_{k,l=0}^{N-1,\,M-1} c_{k,l}\,(\beta^n * \psi_i)(k - x_i,\, l - y_i), \qquad (14)$$
where $(x_i, y_i)$ and $(n_i, m_i)$ are the location and the differential order of the $i$th filter $\psi_i$, respectively. In contrast to the discretization of the regularization term, we do not derive a closed-form expression for this convolution; instead we approximate the β-spline in eq. (14) by a Gaussian, using the observation in [23] that
$$\beta^n(x) \approx \sqrt{\frac{6}{\pi(n+1)}}\,e^{-\frac{6x^2}{n+1}}. \qquad (15)$$
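The Gaussian in (15) is the normal density with the same variance as $\beta^n$, namely $(n+1)/12$, since each of the $n+1$ box factors contributes $1/12$. A quick numerical comparison (our own sketch, with hypothetical function names) shows how close the approximation is already for cubic splines:

```python
import numpy as np
from math import comb, factorial

def bspline(x, n):
    """Central B-spline of degree n >= 1 (truncated-power formula, zero outside the support)."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for k in range(n + 2):
        out += (-1) ** k * comb(n + 1, k) * np.maximum(x + (n + 1) / 2 - k, 0.0) ** n
    return np.where(np.abs(x) < (n + 1) / 2, out / factorial(n), 0.0)

def gaussian_approx(x, n):
    """Right-hand side of (15): the Gaussian with the same variance (n + 1) / 12 as beta^n."""
    return np.sqrt(6.0 / (np.pi * (n + 1))) * np.exp(-6.0 * np.asarray(x) ** 2 / (n + 1))

xs = np.linspace(-4.0, 4.0, 801)
for n in (1, 3, 5):
    err = np.max(np.abs(bspline(xs, n) - gaussian_approx(xs, n)))
    print(f"n = {n}: max |beta^n - Gaussian| = {err:.4f}")
```

Both functions integrate to 1, so replacing $\beta^n$ by the Gaussian in (14) preserves the total mass of each basis function while making the convolution with Gaussian-derivative filters $\psi_i$ available in closed form.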
The data term can be expressed in matrix-vector notation as
$$E_{data}(c) = \|Sc - d\|^2, \qquad (16)$$
where $S = \{(\beta^n * \psi_i)(k - x_i, l - y_i)\}$, with $k = 0, \dots, N-1$, $l = 0, \dots, M-1$ and $i = 1, \dots, P$, and $d = \{d_i\}_{i=1}^P$. Now we can write the minimizer of equation (3) in matrix-vector notation as the solution of
$$\left(S^T S - \lambda R\right)c = S^T d. \qquad (17)$$
This linear system of equations can be solved using a conjugate gradient (CG) method [3]. If the matrix $S$ were sparse, it would be beneficial to apply a multigrid method [5]; mainly because $S$ is not sparse, the conjugate gradient method is preferred. Notice that, in this specific case, $R$ can be expressed as a convolution. For large images it is infeasible to compute $S^T S$ explicitly; therefore we compute the matrix-vector product $\hat c = S^T S c$ that appears in a conjugate gradient iteration by first evaluating $\tilde c = Sc$ and thereafter evaluating $\hat c = S^T\tilde c$.
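The matrix-free strategy just described can be sketched with a plain conjugate gradient solver that only sees a function `apply_M`. For the illustration we assume a system of the form $(S^TS + \mu R_+)\,c = S^Td$, where `R_plus` is a symmetric positive semidefinite regularization matrix such as the gradient Gram matrix of (12); the sign with which $R$ enters (17) depends on how $R$ is defined, so this form is our assumption. The dense random `S` is only a stand-in for the feature matrix.

```python
import numpy as np

def conjugate_gradient(apply_M, b, tol=1e-10, max_iter=500):
    """Conjugate gradient for M x = b, with M symmetric positive definite and given only
    through the function apply_M (the matrix itself is never formed)."""
    x = np.zeros_like(b)
    b_norm = np.linalg.norm(b)
    if b_norm == 0:
        return x
    r = b - apply_M(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Mp = apply_M(p)
        step = rs / (p @ Mp)
        x = x + step * p
        r = r - step * Mp
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * b_norm:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def normal_operator(S, R_plus, mu):
    """c -> S^T (S c) + mu * R_plus c, evaluated without forming S^T S."""
    def apply_M(c):
        c_tilde = S @ c                                # first c~ = S c
        return S.T @ c_tilde + mu * (R_plus @ c)       # then S^T c~, plus regularization
    return apply_M

rng = np.random.default_rng(6)
S = rng.standard_normal((30, 20))      # stand-in for the (dense) feature matrix
D = np.diff(np.eye(20), axis=0)
R_plus = D.T @ D                       # positive semidefinite 1D gradient Gram matrix
d = rng.standard_normal(30)
c = conjugate_gradient(normal_operator(S, R_plus, 0.1), S.T @ d)
```

In the image setting, `S @ c` and `S.T @ c_tilde` would be replaced by the corresponding filter evaluations, and `R_plus @ c` by a convolution, as noted in the text.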
4
Adaptation to a Gauge Field
In the previous sections we used a very simple model as regularization term. For several applications it would be beneficial to introduce a more sophisticated model of the image we want to reconstruct. Feature-based image editing [16] and optic flow estimation [13, 8] are applications that could benefit greatly from such a refinement. Recently, an image inpainting method was introduced that achieves a model refinement by means of covariant derivatives on a vector bundle guided by a user-selectable gauge field [10]. We adopt a similar approach. The basic idea is to replace the gradient that appears in the regularization term of eq. (3) by a covariant derivative $D^{A_h}$ that is biased by a gauge field $h \in H^2(\mathbb{R}^2)$. To this covariant derivative the gauge field $h$ should be "invisible", i.e. $D^{A_h}h = 0$. If we could set $h$ equal to the original image $f$, the approximation would reproduce $f$ exactly. To this end we interpret $f$ not as a scalar function but as a section through a fibered space $E = \mathbb{R}^2\times\mathbb{R}^+$. Heuristically, this means that we rescale intensity by a spatially varying factor, the unit section $\sigma$. Thus we consider $f\sigma$ instead of $f$ to model intensity values in the image (the latter is the special case in which $\sigma(x) = (x, 1)$ for all $x \in \mathbb{R}^2$). This implies that, when we consider derivatives, we need to account for the spatial variability of $\sigma$. To this end we introduce a connection on a vector bundle in the next subsection, where we also make the heuristic description above more rigorous. Readers who are not familiar with the concept of vector bundles may find it useful to look at Fig. 1 before reading the next subsection, since it helps in developing the right geometrical interpretation of the material presented there.
4.1
Connections on Vector Bundles
Consider a vector bundle $(E, \pi, M)$, with total space $E = \mathbb{R}^2\times\mathbb{R}^+$, base space $M = \mathbb{R}^2$, and projection $\pi: E \to M$. The projection $\pi$ maps a point in $E$ (a point in $M$ augmented with an intensity $L \in \mathbb{R}^+$) to $M$ in the following manner:
$$\pi(x, y, L) = (x, y). \qquad (18)$$
$L$ represents a physical quantity such as luminous intensity, which is expressed in candela (cd). Next we define a section $s: M \to E$ such that $\pi\circ s = \mathrm{id}_M$, where $\mathrm{id}_M$ denotes the identity map on $M$. We define the association of a section $\sigma_f$ with a unique image $f \in L^2(\mathbb{R}^2)$ as
$$f \leftrightarrow \sigma_f \iff \forall (x,y) \in \mathbb{R}^2:\ \sigma_f(x, y) = (x, y, f(x, y)). \qquad (19)$$
The multiplication of such a section $\sigma_f$ by an image $g$ is given by
$$g\,\sigma_f = \sigma_{fg}. \qquad (20)$$
Let $\tilde\sigma$ denote the unit section $\tilde\sigma(x, y) = (x, y, L_0)$, with $L_0$ a fixed luminous intensity unit (e.g. 1 cd). We want to define a connection $D$ over the space of sections $\Gamma(E)$ on $E$. Let $\mathcal{L}(\Gamma(TM), \Gamma(E))$ denote the space of linear operators that map a section of the tangent bundle on $M$ to a section of the vector bundle. We stress that a section of the tangent bundle, $V \in \Gamma(TM)$, is just a vector field on $M$. A map
$$D: \Gamma(E) \to \mathcal{L}(\Gamma(TM), \Gamma(E)) \qquad (21)$$
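is a connection on a vector bundle iff it possesses the following properties, cf. [15], p. 106. In the following we use the standard notation $D_V\sigma = (D\sigma)(V)$.
1. $D$ is tensorial in $V$:
$$D_{V+W}\sigma = D_V\sigma + D_W\sigma \quad\text{for } V, W \in \Gamma(TM),\ \sigma \in \Gamma(E), \qquad (22)$$
$$D_{fV}\sigma = f\,D_V\sigma \quad\text{for } f \in C^\infty(M, \mathbb{R}),\ V \in \Gamma(TM). \qquad (23)$$
2. $D$ is $\mathbb{R}$-linear in $\sigma$:
$$D_V(\sigma + \tau) = D_V\sigma + D_V\tau \quad\text{for } V \in \Gamma(TM),\ \sigma, \tau \in \Gamma(E), \qquad (24)$$
and it satisfies the Leibniz product rule:
$$D_V(f\sigma) = V(f)\,\sigma + f\,D_V\sigma \quad\text{for } f \in C^\infty(M, \mathbb{R}). \qquad (25)$$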
Suppose we have a connection $D$ on a vector bundle. Then it must satisfy the four properties (eqs. (22) to (25)) mentioned above. Therefore we must have the identity
$$D\sigma(X)(c(t)) = D(z\tilde\sigma)(X)(c(t)) = X|_{c(t)}(z)\,\tilde\sigma + \sum_{i=1}^{2} z(c(t))\,\dot c^i(t)\,D_{\partial_{x^i}}\tilde\sigma \qquad (26)$$
for all sections $\sigma = z\tilde\sigma$ and vector fields $X = \sum_{i=1}^{2}\dot c^i\,\partial_{x^i}$. Here $c: (0,1) \to M$ is a smooth curve on $M$, $\dot c^i(t) = \langle dx^i, \dot c(t)\rangle$, $i = 1, 2$, with $\dot c(t) = \frac{d}{dt}c(t)$, and $z \in C^\infty(M, \mathbb{R})$ is an arbitrary image. By $\{dx^i\}_{i=1}^2 = \{dx, dy\}$ we denote the dual frame in the cotangent bundle $T^*M$.
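For each $i = 1, 2$, $D_{\partial_{x^i}}\tilde\sigma$ should be a section of the vector bundle. Such a section can be identified with a function
$$A_i: M \to \mathbb{R} \qquad (27)$$
by eq. (19), i.e.
$$D_{\partial_{x^i}}\tilde\sigma = \sigma_{A_i} = A_i\,\tilde\sigma. \qquad (28)$$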
Substituting eq. (28) into eq. (26) yields
$$D\sigma(X)(c(t)) = \sum_{i=1}^{2}\left(\dot c^i(t)\,\partial_{x^i}(z)(c(t)) + z(c(t))\,\dot c^i(t)\,A_i(c(t))\right)\tilde\sigma. \qquad (29)$$
So each connection is parameterized by the covector field $A = \sum_{i=1}^{2} A_i\,dx^i$. At this point we still have a degree of freedom, namely the choice of a specific covector field. In our application we want a certain image $h$ to be "invisible", so for a fixed $h$ we select $A = A_h$ such that
$$D^{A_h}(\sigma_h) = 0, \qquad (30)$$
i.e. $D^{A_h}_{\dot c}(\sigma_h) = 0$ for all curves $c$. Here we made the dependence of $D$ on $A_h$ explicit in the superscript notation (in the previous equations we left it out to facilitate readability). Given the requirement of eq. (30) we explicitly calculate
$$\sum_{i=1}^{2}\left(\dot c^i(t)(\partial_{x^i}h)(c(t)) + h(c(t))\,\dot c^i(t)\,A_i(c(t))\right)\tilde\sigma = 0\,\tilde\sigma \quad\text{for all curves } c: (0,1) \to M$$
$$\iff (\nabla h)(c(t)) + h(c(t))\,A(c(t)) = 0 \qquad (31)$$
$$\iff A_h(c(t)) = -\sum_{i=1}^{2}\partial_{x^i}\log h(c(t))\,dx^i, \quad \forall h > 0. \qquad (32)$$
This gives an expression for $A_h$, provided $h$ is strictly positive. This is a limitation of our method; however, for a system that observes physical quantities it is a realistic assumption. From the previous derivations we conclude that applying a covariant derivative gauged by an image $h$ to an image $f$ amounts to
$$D^{A_h}(\sigma_f)(\dot c) = \left(\dot c(f) + \sum_{i=1}^{2}A_i\,\dot c^i f\right)\tilde\sigma = \left(\dot c(f) - \sum_{i=1}^{2}(\partial_{x^i}\log h)\,\dot c^i f\right)\tilde\sigma = \left(\dot c(f) - \frac{f}{h}\,\dot c(h)\right)\tilde\sigma, \qquad (33)$$
where we used the short notation
$$\dot c(f) = \sum_{i=1}^{2}\dot c^i\,\partial_{x^i}(f) = \left(\dot c(t)\cdot\nabla f\right)(c(t)) = \frac{d}{dt}f(c(t)). \qquad (34)$$
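On a sampled image, (32) and (33) amount to $A_h = -\nabla h/h$ and $D^{A_h}f = \nabla f - \frac{f}{h}\nabla h$, which also equals $h\,\nabla(f/h)$; hence $h$, and any constant multiple of it, is invisible. The finite-difference sketch below is our own illustration (it assumes a regular grid with spacing `spacing` and uses `np.gradient` with second-order differences); it is not the discretization used in this paper.

```python
import numpy as np

def gauge_field(h, spacing=1.0):
    """A_h = -d(log h) = -grad(h) / h, eq. (32); h must be strictly positive."""
    if np.any(h <= 0):
        raise ValueError("the gauge image h must be strictly positive")
    return -np.stack(np.gradient(h, spacing, edge_order=2)) / h

def covariant_gradient(f, h, spacing=1.0):
    """Components of D^{A_h} f = (d + A_h) f = grad f - (f / h) grad h, cf. (33) and (36)."""
    return np.stack(np.gradient(f, spacing, edge_order=2)) + f * gauge_field(h, spacing)

y, x = np.mgrid[0:1:65j, 0:1:65j]      # axis 0 is y, axis 1 is x
sp = 1.0 / 64
h = 1.0 + x ** 2 + y                   # hypothetical, strictly positive gauge image
f = np.sin(3 * x) + 2.0                # hypothetical image to differentiate
Df = covariant_gradient(f, h, sp)      # Df[0] ~ d/dy component, Df[1] ~ d/dx component
```

Replacing $\nabla g$ by such a covariant gradient in the regularization term penalizes deviations of $g$ from multiples of $h$ rather than deviations from a constant image.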
[Figure: sections $\sigma_f$ and $\sigma_h$ of the total space $E$ over a curve $c$ in the base space $M$ (axes $x$, $y$), with the projection $\pi$ and the lines labelled A to E used in the construction.]
Fig. 1. A visualization of the calculation of a covariant derivative as described in eq. (33). The base space $M$ corresponds to $\mathbb{R}^2$ and the total space $E$ to $\mathbb{R}^2\times\mathbb{R}^+$. We refer to the text after eq. (33) for an explanation of this figure.
Note that eq. (33) can be rewritten as
$$\forall c: (0,1) \to M:\quad D^{A_h}\sigma_f(\dot c(t)) = \tilde\sigma\,(df + f A_h)(\dot c(t)), \qquad (35)$$
i.e. $D^{A_h}(f\tilde\sigma) = \tilde\sigma\,(df + f A_h)$. When we identify $\sigma_f = f\tilde\sigma \leftrightarrow f$, this simplifies to
$$D^{A_h}f = (d + A_h)f, \qquad (36)$$
in which $A^h f$ is a multiplication. The calculation of a covariant derivative as described in eq. (33) allows for a geometrical interpretation. A visualization thereof, depicted in Fig. 1, is described next. We stress that this is a specially crafted example, since structure is present in only one direction. Therefore we only have to construct a visualization of the covariant derivative in the direction labelled x in the figure; the derivative in the direction labelled y simply vanishes. On the base space M a curve $c:(0,1)\to M$ is drawn. We want to calculate the covariant derivative of the section $\sigma_f$ at the point that corresponds to $c(t)$ on the base space. The covariant derivative is gauged by the gauge field $h$, therefore another section, $\sigma_h$, is depicted in the figure. The gradient of $\sigma_h$ at the point $\sigma_h(c(t))$ in the total space E is depicted by a line, labelled A, through $\sigma_h(c(t))$ and $\sigma_h(c(t+))$. The line labelled D visualizes in a similar manner the gradient of $\sigma_f$ at $\sigma_f(c(t))$. On the left side it is shown how the gradient along A is scaled by the ratio $f(c(t))/h(c(t))$ of the values of $\sigma_f(c(t))$ and $\sigma_h(c(t))$. This scaled directional derivative is then subtracted from the directional derivative of $\sigma_f$ at $\sigma_f(c(t))$ in the upper left of the figure to produce the result of eq. (33). To clarify this scaling, Fig. 2 shows the relevant lines, labelled the same as their corresponding lines in Fig. 1.

In essence, the energy functional for which we search a minimizer stays the same as the one in eq. (3). We merely change the notion of a gradient, which is now adapted to a gauge field $h$; the resulting energy functional reads

$$E(g) = \sum_{i=1}^{P} \alpha_i \left( (g,\psi_i)_{L_2} - d_i \right)^2 + \frac{\lambda}{2} \int_{\mathbb{R}^2} \|D^{A^h} g\|^2 \, dV, \qquad (37)$$

where $D^{A^h}$ is the covariant derivative, or equivalently a linear connection, acting on an image as in eq. (36).

B. Janssen, R. Duits, and L. Florack

[Figure 2: detail of Fig. 1 with the lines A, B, C and a segment of unit length]

Fig. 2. Visualization of the amplification of $\dot c(h)$ by $f(c(t))/h(c(t))$. This image clarifies the congruence relations that are used in Fig. 1.
5 Multi-scale Approximate Reconstruction from Singular Points

We will apply the gauged reconstruction of eq. (37) to the reconstruction from singular points of a Gaussian scale space representation $u_f$ of an image $f$, with $u_f$ the unique solution to $\partial_s u_f = \Delta u_f$ with initial condition $u_f(\cdot,0) = f$. Singular points $(x,y,s) \in \mathbb{R}^2 \times \mathbb{R}^+$ of $u_f$ are those points satisfying

$$\nabla u_f(x,y,s) = 0, \qquad \det \nabla\nabla^T u_f(x,y,s) = 0. \qquad (38)$$

For more information about catastrophe theory in general, its application in scale space theory and the calculation of the locations of singular points we refer to [11, 6, 7]. A filter $\psi_i$ corresponding to a derivative at a certain position in the scale space of an image is given by

$$\psi_i(x,y) = (2 s_i)^{\frac{n_i+m_i}{2}}\, \frac{\partial^{n_i+m_i}}{\partial x^{n_i}\,\partial y^{m_i}} \left( \frac{1}{4\pi s_i}\, e^{-\frac{(x-x_i)^2 + (y-y_i)^2}{4 s_i}} \right). \qquad (39)$$
Here we used multi-index notation $i = (x_i, y_i, m_i, n_i, s_i)$. A singular point is encoded by storing the second order derivative jet at each singular point location. The discretization proposed in Section 3.2 allows for a reconstruction at a certain resolution $a > 0$. We select scales $\{2^j\}_{j=0}^{J}$. First we find all features which can be approximated well at the coarsest resolution $J$, i.e. those features for which $\|\psi_i - P_{V_a}\psi_i\| < \epsilon$, where $P_{V_a}$ denotes the $L_2$-projection onto $V_a = \mathrm{span}\{\beta^n(\frac{x}{a} - k)\,\beta^n(\frac{y}{a} - l) \mid k,l \in \mathbb{Z}\}$ and $\epsilon > 0$ is a small constant. Next we compute a reconstruction at resolution $J$ using a constant gauge field $h$. Then, for each scale $j = J-1, \ldots, 0$, we find the gauge field by applying the two-scale relation (see eq. (7)) to the reconstructed image at scale $j+1$. To reduce memory consumption and gain computational efficiency, all features that were used in a coarser scale reconstruction are left out, so that those features are only implicitly encoded (via the gauge field) in the reconstruction algorithm. See the caption of Fig. 3 for a description of the experiments we conducted. Comparing the fourth and fifth images shows that features which are not directly encoded are passed on by the gauge field (from the lower resolution images). In fact, the difference between the two reconstructions is quite striking. We furthermore note that the memory requirements and the computational complexity of the algorithms that produce these two images are equivalent. When the features of all 1070 singular points are used directly (rightmost image in Fig. 3), the visual quality is more appealing; the memory requirements, however, are much larger. We also note that the method of feature selection for the next level is quite crude and could be improved by incorporating, e.g., a feedback loop. These are possibilities for future exploration allowed by the presented framework.

Fig. 3. From left to right: (1) the source image "trui", (2) reconstruction at resolution 65 × 65 pixels from 84 feature points, (3) reconstruction from 226 feature points at 129 × 129 pixels, gauged by the image on its left, (4) reconstruction from 727 feature points at 257 × 257 pixels, gauged by the image on its left, (5) the same reconstruction as the image on its left but not gauged, and (6) reconstruction from all 1070 feature points, without gauge field. The features are up to second order differential structure obtained from the scale space representation of the source image at its singular point positions.
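The filters of eq. (39) can be evaluated in closed form. The sketch below, written in Python under our own naming (`hermite`, `psi` and `jet_orders` are hypothetical helpers), uses the identity $\frac{d^n}{du^n}e^{-u^2} = (-1)^n H_n(u)\,e^{-u^2}$ with physicists' Hermite polynomials $H_n$ and $u = (x-x_i)/(2\sqrt{s_i})$; combined with the factor $(2s_i)^{(n_i+m_i)/2}$ this leaves a constant $2^{-(n_i+m_i)/2}$. The list `jet_orders` enumerates the six derivative orders with $n_i + m_i \le 2$ that make up the second order jet stored per singular point.

```python
import math

def hermite(k, u):
    """Physicists' Hermite polynomial H_k(u) via H_{j+1} = 2u H_j - 2j H_{j-1}."""
    h_prev, h = 1.0, 2.0 * u
    if k == 0:
        return h_prev
    for j in range(1, k):
        h_prev, h = h, 2.0 * u * h - 2.0 * j * h_prev
    return h

def psi(x, y, xi, yi, n, m, s):
    """Scale-normalized Gaussian derivative filter of eq. (39):
    (2s)^((n+m)/2) * d^{n+m}/dx^n dy^m [ exp(-r^2 / (4s)) / (4 pi s) ]."""
    gauss = math.exp(-((x - xi) ** 2 + (y - yi) ** 2) / (4.0 * s)) / (4.0 * math.pi * s)
    u = (x - xi) / (2.0 * math.sqrt(s))
    v = (y - yi) / (2.0 * math.sqrt(s))
    # Each x-derivative contributes -(1/(2 sqrt s)) H_n(u); with (2s)^((n+m)/2)
    # the scale factors combine into 2^(-(n+m)/2).
    return ((-1) ** (n + m) * 2.0 ** (-(n + m) / 2.0)
            * hermite(n, u) * hermite(m, v) * gauss)

# Derivative orders (n, m) of the second order jet at one singular point
jet_orders = [(n, m) for n in range(3) for m in range(3) if n + m <= 2]
```

Evaluating `psi` for every entry of `jet_orders` at a given $(x_i, y_i, s_i)$ yields the six filters whose inner products with the image form the stored feature values.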
6 Conclusions
We introduced a coarse-to-fine image reconstruction method that approximates a set of generalized samples that are weighted according to their noise robustness. Information from a coarse resolution reconstruction is passed to a finer resolution level by means of a gauge field. To this end we considered the image not as a scalar function but as a section through a fibered space. Application of the newly proposed method to the reconstruction from singular points of a scale space representation of an image shows the feasibility of the method.
References
1. Arigovindan, M.: Variational Reconstruction of Vector and Scalar Images from Non-Uniform Samples. PhD thesis, EPFL, Lausanne, Switzerland (2005)
2. Balmashnova, E.: Scale-Euclidean invariant object retrieval. PhD thesis, Eindhoven University of Technology, Eindhoven, The Netherlands (2007)
3. Barrett, R., Berry, M., Chan, T.F., et al.: Templates for the solution of linear systems: Building blocks for iterative methods. SIAM, Philadelphia (1994)
4. Blom, J.: Topological and Geometrical Aspects of Image Structure. PhD thesis, University of Utrecht, Utrecht, The Netherlands (1992)
5. Briggs, W.L., Henson, V.E., McCormick, S.F.: A Multigrid Tutorial. SIAM, Philadelphia (2000)
6. Damon, J.: Local Morse theory for solutions to the heat equation and Gaussian blurring. Journal of Differential Equations 115(2), 368–401 (1995)
7. Florack, L.M.J., Kuijper, A.: The topological structure of scale-space images. JMIV 12(1), 65–79 (2000)
8. Florack, L.M.J., Janssen, B.J., Kanters, F.M.W., Duits, R.: Towards a new paradigm for motion extraction. In: Campilho, A., Kamel, M.S. (eds.) ICIAR 2006. LNCS, vol. 4141, pp. 743–754. Springer, Heidelberg (2006)
9. Galic, I., Weickert, J., Welk, M., Bruhn, A., Belyaev, A., Seidel, H.: Image compression with anisotropic diffusion. JMIV 31(2-3), 255–269 (2008)
10. Georgiev, T.: Relighting, retinex theory, and perceived gradients. In: Proceedings of Mirage 2005 (March 2005)
11. Gilmore, R.: Catastrophe Theory for Scientists and Engineers. Dover Publications, New York (1993); Originally published by John Wiley & Sons, New York (1981)
12. Janssen, B.J., Duits, R., ter Haar Romeny, B.M.: Linear image reconstruction by Sobolev norms on the bounded domain. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 55–67. Springer, Heidelberg (2007)
13. Janssen, B.J., Florack, L.M.J., Duits, R., ter Haar Romeny, B.M.: Optic flow from multi-scale dynamic anchor point attributes. In: Campilho, A., Kamel, M.S. (eds.) ICIAR 2006. LNCS, vol. 4141, pp. 767–779. Springer, Heidelberg (2006)
14. Janssen, B.J., Kanters, F.M.W., Duits, R., Florack, L.M.J., ter Haar Romeny, B.M.: A linear image reconstruction framework based on Sobolev type inner products. IJCV 70(3), 231–240 (2006)
15. Jost, J.: Riemannian Geometry and Geometric Analysis, 4th edn. Springer, Berlin (2005)
16. Kanters, F.M.W.: Towards Object-based Image Editing. PhD thesis, Eindhoven University of Technology, Eindhoven, The Netherlands (February 2007)
17. Le Besnerais, G., Champagnat, F.: B-Spline image model for energy minimization-based optical flow estimation. IEEE-TIP 15(10), 3201–3206 (2006)
18. Lillholm, M., Nielsen, M., Griffin, L.D.: Feature-based image analysis. International Journal of Computer Vision 52(2/3), 73–95 (2003)
19. Nielsen, M., Lillholm, M.: What do features tell about images? In: Proceedings on Scale Space 2001, pp. 39–50. Springer, Heidelberg (2001)
20. Shannon, C.E.: Communication in the presence of noise. In: Proc. IRE, vol. 37, pp. 10–21 (January 1949)
21. Thevenaz, P., Ruttimann, U.E., Unser, M.: A pyramid approach to subpixel registration based on intensity. IEEE-TIP 7(1), 27–41 (1998)
22. Unser, M.: Splines: A perfect fit for signal and image processing. IEEE Signal Processing Magazine 16(6), 22–38 (1999)
23. Unser, M., Aldroubi, A., Eden, M.: On the asymptotic convergence of B-Spline wavelets to Gabor functions. IEEE-TIT 38(2), 864–872 (1992)
Edge-Enhanced Image Reconstruction Using (TV) Total Variation and Bregman Refinement

Shantanu H. Joshi¹, Antonio Marquina²,³, Stanley J. Osher³, Ivo Dinov¹, John D. Van Horn¹, and Arthur W. Toga¹

¹ Laboratory of Neuroimaging, University of California, Los Angeles, CA 90095, USA
² Departamento de Matematica Aplicada, Universidad de Valencia, C/ Dr Moliner, 50, 46100 Burjassot, Spain
³ Department of Mathematics, University of California, Los Angeles, CA 90095, USA
Abstract. We propose a novel image resolution enhancement method for multidimensional images based on a variational approach. Given an appropriate downsampling operator, the reconstruction problem is posed using a deconvolution model under the assumption of Gaussian noise. In order to preserve edges in the image, we regularize the optimization problem by the total variation norm of the image. Additionally, we propose a new edge-preserving operator that emphasizes and even enhances edges during the up-sampling and decimation of the image. Furthermore, we propose the use of the Bregman iterative refinement procedure for the recovery of higher order information from the image. This is a coarse-to-fine approach that recovers the finer scales in the image first, followed by the noise. The method is demonstrated on a variety of low-resolution natural images as well as 3D anisotropic brain MRI images. The edge-enhanced reconstruction is shown to yield a significant improvement in resolution, especially preserving important edges containing anatomical information.
Keywords: Edge-preserving operators, total variation regularization, deconvolution, Gaussian blur, Bregman iteration, up/down sampling.
1 Introduction
With the recent advances in low-cost imaging solutions and increasing storage capacities, there is an increased demand for better image quality in a wide variety of applications involving both image and video processing. Often, owing to sensor shortcomings, low-power requirements, or environmental limitations, one is only able to acquire a low-resolution observation of the scene. The low-resolution data can exist in the form of still images, a sequence of image frames devoid of inter-frame motion, a single video sequence, or a collection of video sequences. Furthermore, the observations can be corrupted by motion-induced artifacts, in the case of both still images and videos. The collective approach that tackles the problem of reconstructing a high-resolution image from one or more of the above low-resolution observations is termed super-resolution. There are several prominent approaches to this problem, all of them largely employing cues such as sub-pixel shifts between successive frames, camera blur, defocus, and zoom. These approaches can be divided into two types: ones that use motion information between successive frames (e.g., video super-resolution), and others that use a motion-free approach. Most of these approaches expect multiple low-resolution observations as input.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 389–400, 2009.
© Springer-Verlag Berlin Heidelberg 2009

Super-resolution image reconstruction can be mathematically modeled as a nonlinear process consisting of a convolution operator acting on the image, followed by a down sampling operation and the addition of noise. Most of the earlier research in this area was developed in the frequency domain, using (discrete) Fourier transform and wavelet-transform based methods. For example, Tsai and Huang [13] first outlined the idea of super-resolution in their seminal paper. Irani and Peleg [8] used an iterative back-projection scheme to achieve image reconstruction. Yet another approach [12] uses projections onto convex sets (POCS) of images to restrict the solution domain for reconstruction. A hybrid approach by Elad and Feuer [5] combines POCS and maximum likelihood approaches for both motion-based and motion-free super-resolution. A very different set of methods uses a learning-based approach for super-resolution. The general idea here is to learn a set of image features from exemplar images and use them for the reconstruction of a high-resolution image. Capel and Zisserman [2] use PCA on face image databases to learn the image model and use it to reconstruct images from multiple views. Freeman et al. [6] learn a feature set of image patches that encode the relationships among different spatial frequencies from a large training set, and use it as prior information for reconstructing higher frequencies for resolution enhancement. The reader is referred to the excellent monograph by Chaudhuri and Joshi [4] for a comprehensive bibliography of the field.
Along with a wide range of applications of super-resolution methods in tasks such as satellite image processing, surveillance, computer vision, and video processing, there has been considerable effort by researchers to apply these methods to medical imaging. In particular, MRI acquisitions usually have a low resolution in the inter-slice direction, and it is of considerable interest to "fill in" the intermediate slices. Carmi et al. [3] use sub-pixel shifted MR (magnetic resonance) images for high resolution reconstruction. Greenspan et al. [7] combine several low resolution images in the slice-select direction to achieve super-resolution (SR) reconstruction. Kornprobst et al. [9] also achieve higher resolution in the slice-select direction for fMRI sequences. While super-resolution methods attempt to exploit the information redundancy in several low-resolution observations of an image, at times only a single low-resolution instance of the image is available. This is sometimes the case for MRI images, where for economic or health reasons a patient is scanned only once over a period of time, or the time elapsed between successive scans is too large to preserve any temporal coherence to take advantage of. We therefore focus mainly on the problem of single frame high resolution reconstruction of images. Our approach is based upon a variational model that uses the TV norm [11] as a regularizing functional. Recently, Marquina et al. [14] proposed a new variational model based on the TV norm [11] for super-resolution of multidimensional images, using a new multi-scale approach (Bregman iterations) for iterative refinement and recovery of finer details in images. We follow this approach to solve the more general super-resolution problem using the TV norm as regularizing functional. In addition, we propose an iterative refinement procedure based on an original idea by Bregman [1] to improve spatial resolution.
The proposed super-resolution method improves upon the behavior of any interpolation method (including high order and sinc interpolation), because it preserves edges satisfactorily while avoiding the Gibbs phenomenon, and the iterative refinement procedure allows us to recover fine scales of the image. The main contributions of this paper are as follows:
– a three-dimensional variational model based on the TV norm [11] regularizer;
– a new multi-scale approach (Bregman iterations) for iterative refinement and recovery of finer details in images;
– a new piecewise-linear up (down) sampling operator that preserves edges;
– application of this method to super-resolution of anisotropic 3D MRI images.
This paper is organized as follows. Section 2 outlines the super-resolution model using TV regularization. In particular, it explains the variational model as well as a new scale-space approach that utilizes the Bregman iterative procedure for recovering finer details from images. Additionally, Section 2.2 proposes a new edge-preserving up (down) sampling operator used in the model. Section 3 presents details of the numerical implementation of the model. Section 4 demonstrates experimental results for a few 2D natural images as well as 2D slices and 3D volumes of MRI images, followed by the conclusion in Section 5.
2 Image Observation and Synthesis Model
The low resolution image observation model can be formulated in a standard fashion as a down-sampled, degraded version of the original high resolution image. We assume that the low resolution image f is defined on a domain $\Omega \subset \mathbb{R}^k$. For the purpose of this paper, k is either 2 or 3. From here on, all notation is specified for 3D images; the restriction to 2D images is straightforward. For a discrete representation, we assume $f \in \mathbb{R}^{n \times m \times p}$. Let the unknown high resolution image to be estimated be $u \in \mathbb{R}^{2n \times 2m \times 2p}$. Then, given a linear down sampling operator D, we can write the observation model as

$$f = D(h * u) + n, \qquad (1)$$

where n is additive Gaussian white noise with zero mean and variance $\sigma^2$, and h is a translation invariant convolution kernel corresponding to the point spread function of the imaging device. A related problem in the above formulation is the estimation of the kernel h, which we skip in this paper. Throughout this paper, we assume that the kernel is the Gaussian

$$h(x,y,z) = K \exp\left( -\frac{1}{2}\left( \frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} + \frac{z^2}{\sigma_z^2} \right) \right), \qquad (2)$$

where K is a normalization constant and $\sigma_x^2, \sigma_y^2, \sigma_z^2$ are the variances along the X, Y, and Z directions, respectively. The problem in Eqn. 1 is usually solved as a constrained optimization problem that seeks to minimize the regularizer $\int_\Omega \|\nabla u\|^2\,dx\,dy$ while constraining the noise to be $\|D(h * u) - f\|_{L^2}^2 = \sigma^2$. This quadratic regularizer yields a smooth reconstruction u, free of discontinuities, and therefore tends to blur edges. An alternative is the total variation norm proposed by Rudin, Osher, and Fatemi [11], which is known to recover edges in images satisfactorily. The total variation norm is given as

$$TV(u) = \int_\Omega |\nabla u|\,dx\,dy. \qquad (3)$$
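The difference between the two regularizers can be illustrated with a small 1D computation. The sketch below uses our own discretization with unit grid spacing, and the function names are illustrative. It compares a sharp step and a gradual ramp of the same height: the total variation of eq. (3) assigns both the same cost, whereas the quadratic regularizer charges the sharp step ten times more than the ramp, which is why minimizing it smooths edges away.

```python
def total_variation(u):
    """Discrete 1D analogue of eq. (3): sum_i |u[i+1] - u[i]| (unit grid spacing)."""
    return sum(abs(b - a) for a, b in zip(u, u[1:]))

def quadratic_energy(u):
    """Discrete 1D analogue of the quadratic regularizer: sum_i (u[i+1] - u[i])**2."""
    return sum((b - a) ** 2 for a, b in zip(u, u[1:]))

step = [0.0] * 10 + [1.0] * 10                                # sharp edge of height 1
ramp = [0.0] * 5 + [k / 10 for k in range(11)] + [1.0] * 4    # same height over 10 samples

# TV: both cost 1.0.  Quadratic: the step costs 1.0, the ramp about 0.1.
print(total_variation(step), total_variation(ramp))
print(quadratic_energy(step), quadratic_energy(ramp))
```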
Using the regularizer in Eqn. 3, we can state the single frame image reconstruction model as follows:

$$\hat u = \arg\min_u \left\{ TV(u) + \frac{\lambda}{2}\left[ \|f - D(h * u)\|_{L^2}^2 - \sigma^2 \right] \right\} \qquad (4)$$
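As a rough 1D sketch of the observation model (1) and the objective (4), the code below uses a sampled, normalized Gaussian for h, pair averaging as a stand-in for D, and sample repetition as a stand-in for S, so that $D \circ S = \mathrm{Id}$ holds exactly. These operator choices, the boundary handling and all function names are assumptions for illustration; in the paper's model the edge-preserving operators of Section 2.2 play the roles of D and S.

```python
import math

def gaussian_kernel(sigma, radius):
    """Sampled 1D Gaussian of eq. (2); dividing by the sum plays the role of K."""
    w = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]

def convolve(u, kernel):
    """(h * u)[i] = sum_k h[k] u[i - k]; samples outside the signal are replaced
    by the nearest boundary sample (an assumption of this sketch)."""
    r = len(kernel) // 2
    n = len(u)
    return [sum(kernel[k + r] * u[min(max(i - k, 0), n - 1)] for k in range(-r, r + 1))
            for i in range(n)]

def downsample(u):
    """Stand-in for D: average each pair of samples (factor 2)."""
    return [(u[2 * i] + u[2 * i + 1]) / 2.0 for i in range(len(u) // 2)]

def upsample(f):
    """Stand-in for S: repeat each sample twice, so that D(S(f)) == f."""
    return [v for v in f for _ in range(2)]

def energy(u, f, kernel, lam, sigma2):
    """Objective of eq. (4) for a 1D signal, with the plain (non-smoothed) TV term."""
    tv = sum(abs(b - a) for a, b in zip(u, u[1:]))
    blurred_low = downsample(convolve(u, kernel))
    data = sum((fi - gi) ** 2 for fi, gi in zip(f, blurred_low))
    return tv + 0.5 * lam * (data - sigma2)

# Hypothetical noise-free instance of the forward model (1): f = D(h * u)
kernel = gaussian_kernel(sigma=1.0, radius=3)
u_true = [0.0] * 8 + [1.0] * 8
f = downsample(convolve(u_true, kernel))
```

For noise-free data generated by the same operators, the data term vanishes and, with $\sigma^2 = 0$, the energy of the true image reduces to its total variation.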
The Euler-Lagrange equation for Eqn. 4 can be written as

$$\nabla \cdot \frac{\nabla u}{|\nabla u|} + \lambda\left( \tilde h * S(f) - \tilde h * \big(S \circ D(h * u)\big) \right) = 0 \qquad (5)$$

$$\Longrightarrow\; \nabla \cdot \frac{\nabla u}{|\nabla u|} + \lambda\, \tilde h * \big(\bar g - T(h * u)\big) = 0, \qquad (6)$$

where S is an upsampling operator, $\tilde h(x) = h(-x)$ is the adjoint (flipped) kernel of h, which coincides with h for the symmetric Gaussian of Eqn. 2, $\bar g = S(f)$, and the operator T is defined as $T = S \circ D$. Furthermore, $D \circ S = \mathrm{Id}$. The Euler-Lagrange equation (6) can be solved as the time-dependent equation

$$u_t = \nabla \cdot \frac{\nabla u}{|\nabla u|} + \lambda\, \tilde h * \big(\bar g - T(h * u)\big) \qquad (7)$$

with homogeneous Neumann boundary conditions and initial condition $u_0 = S(f)$.

2.1 Bregman Iterative Method
The convergence of Eqn. 7 to the steady state yields a reconstructed high resolution image. However, if one wishes to recover even finer scales from the reconstructed image, one can use the Bregman iterative refinement procedure [1]. If $u_0$ is the solution of the Euler-Lagrange equation (6), then

$$\nabla \cdot \frac{\nabla u_0}{|\nabla u_0|} + \lambda\, \tilde h * \big(\bar g - T(h * u_0)\big) = 0. \qquad (8)$$

We denote the image residual at the high resolution scale by $v_0$:

$$v_0 = \bar g - T(h * u_0). \qquad (9)$$

We now solve the Euler-Lagrange equation for the new image $\bar g + v_0$ to obtain a new solution, which we denote by $u_1$. Again, the solution $u_1$ satisfies

$$\nabla \cdot \frac{\nabla u_1}{|\nabla u_1|} + \lambda\, \tilde h * \big(\bar g + v_0 - T(h * u_1)\big) = 0, \qquad (10)$$

where the new residual is defined as

$$v_1 = \bar g + v_0 - T(h * u_1) \qquad (11)$$
and so on. The sequence of images $u_0, u_1, \ldots, u_j, \ldots$ is referred to as the Bregman iterates. It is advisable to terminate this procedure once a satisfactory image quality is obtained; otherwise it tends to recover noise after all the finer scales in the image have been recovered. This iterative procedure was introduced for image restoration in [10].

2.2 Edge-Preserving Up (Down)-Sampling Operator
There are various choices for the down (D) sampling operator used in the observation model in Eqn. 1 and the up (S) sampling operator used in the synthesis model in Eqn. 7. The simplest down sampling operator is an averaging operator that averages eight neighboring voxels using either a Gaussian kernel or an arithmetic mean. Correspondingly, the simplest up sampling operation repeats voxel values for each row, column, and slice. Alternatively, one can use bilinear interpolation for up sampling and down sampling images. The problem with these approaches is the unnecessary blurring (averaging) caused at each step of the iteration while solving the Euler-Lagrange equation (6). To overcome this problem, one can use better signal preserving operators that involve sinc or Fourier interpolation for up and down sampling. However, these methods can introduce ringing artifacts in images with sharp edges or boundaries. Especially for images with prominent edges and interfaces, we need an appropriate interpolation operator that preserves these features. Accordingly, we propose a new piecewise-linear up (down) sampling operator that preserves such edges and boundaries, described in detail below.

We set up the grid $x_j = (j-1)\Delta x$, $y_k = (k-1)\Delta y$ and $z_l = (l-1)\Delta z$, where $\Delta x, \Delta y, \Delta z > 0$ and $j = 1,\ldots,n$, $k = 1,\ldots,m$, $l = 1,\ldots,p$. We define the domain $E = [0,A] \times [0,B] \times [0,C]$, where $A = (n-1)\Delta x$, $B = (m-1)\Delta y$, and $C = (p-1)\Delta z$. We consider the grid function u with values $u_{j,k,l} \in \mathbb{R}$. We define the edge-preserving piecewise linear approximation of the grid function u as the function L with $L(x,y,z)|_{E_{jkl}} = L_{jkl}(x,y,z)$, where the computational voxel $E_{jkl}$ is given by

$$E_{jkl} = \left[x_j - \tfrac{\Delta x}{2}, x_j + \tfrac{\Delta x}{2}\right] \times \left[y_k - \tfrac{\Delta y}{2}, y_k + \tfrac{\Delta y}{2}\right] \times \left[z_l - \tfrac{\Delta z}{2}, z_l + \tfrac{\Delta z}{2}\right]$$

and

$$L_{jkl}(x,y,z) = u_{j,k,l} + a(x - x_j) + b(y - y_k) + c(z - z_l),$$

where a, b, and c are determined from

$$a = \mathrm{minmod}\left( \frac{\Delta^x_- u_{j,k,l}}{\Delta x}, \frac{\Delta^x_+ u_{j,k,l}}{\Delta x} \right), \quad b = \mathrm{minmod}\left( \frac{\Delta^y_- u_{j,k,l}}{\Delta y}, \frac{\Delta^y_+ u_{j,k,l}}{\Delta y} \right), \quad c = \mathrm{minmod}\left( \frac{\Delta^z_- u_{j,k,l}}{\Delta z}, \frac{\Delta^z_+ u_{j,k,l}}{\Delta z} \right),$$

where the operations in the difference terms are understood component-wise and are given by $\Delta^x_\pm u^n_{i,j,k} = \pm(u^n_{i\pm1,j,k} - u^n_{i,j,k})$, $\Delta^y_\pm u^n_{i,j,k} = \pm(u^n_{i,j\pm1,k} - u^n_{i,j,k})$, and $\Delta^z_\pm u^n_{i,j,k} = \pm(u^n_{i,j,k\pm1} - u^n_{i,j,k})$, where i, j, k are the indices of the 3D grid.
The minmod(d, e) function is defined as

$$\mathrm{minmod}(d,e) = \frac{\mathrm{sgn}(d) + \mathrm{sgn}(e)}{2}\, \min(|d|, |e|), \qquad (12)$$

where sgn(d) = 1 if d ≥ 0 and sgn(d) = −1 otherwise. The function $L_{jkl}(x,y,z)$ is defined on the computational voxel $E_{jkl}$. We want to up (down) sample the grid function u with a spatial resolution of $h_x, h_y, h_z > 0$. The up (down) sampled grid function v is then defined on a new grid $v(q,r,s)$ for $q = 1,\ldots,nh$, $r = 1,\ldots,mh$, and $s = 1,\ldots,ph$, where

$$nh = \mathrm{floor}\left(\frac{A}{h_x}\right), \quad mh = \mathrm{floor}\left(\frac{B}{h_y}\right), \quad ph = \mathrm{floor}\left(\frac{C}{h_z}\right),$$

and floor(d) is the largest integer i such that i ≤ d. The new grid is then defined as $x^h_q = (q-1)h_x$, $y^h_r = (r-1)h_y$, and $z^h_s = (s-1)h_z$. Based on this grid, the function v is defined as $v(q,r,s) = L(x^h_q, y^h_r, z^h_s)$.

We demonstrate the edge-preserving property of the above operator by applying it to a checkerboard pattern. Figure 1 shows a low-resolution image, as well as its up sampled versions using a bilinear, a sinc, and the edge-preserving operator, for two different types of checkerboard patterns. It also shows a magnified portion from the center of the image. The bilinear and sinc interpolation operators introduce significant spurious levels of gray between the black squares in the pattern. Furthermore, they tend to smooth out the boundaries of the flat black squares in the image. In contrast, the edge-preserving operator retains, and in some cases even enhances, the boundaries and edges compared to the low-resolution image. Figure 3 shows similar results for a 280 × 200 scene image. The first image in the top row shows the 560 × 400 pixel-replicated image, whereas the last image is the super-resolved image. The bottom row shows a small portion of the image magnified to show detail. One can immediately observe the blocking effects due to pixel replication in the first image, and blurring of the edge boundaries in the bilinearly interpolated version.
The edges get somewhat better using the sinc interpolation, but the best quality is given by the super-resolved image, that resolves and even enhances sharp edges and interfaces in the image. In both the above cases, we used an isotropic Gaussian kernel with kernel widths σx = σy = 1.
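The following Python sketch implements the minmod function (12) and a 1D version of the edge-preserving resampling operator. It follows the text's choice $nh = \mathrm{floor}(A/h_x)$, indexes samples from 0 instead of 1, and assumes a zero one-sided slope at the two ends of the signal, which the text leaves unspecified. The function names are our own.

```python
import math

def minmod(d, e):
    """Eq. (12): (sgn d + sgn e) / 2 * min(|d|, |e|), with sgn(0) = 1 as in the text."""
    sgn_d = 1.0 if d >= 0 else -1.0
    sgn_e = 1.0 if e >= 0 else -1.0
    return 0.5 * (sgn_d + sgn_e) * min(abs(d), abs(e))

def edge_preserving_resample(u, dx, hx):
    """1D sketch of the operator of Sect. 2.2.

    u[j] is the value at x_j = j * dx (0-based).  On each cell
    [x_j - dx/2, x_j + dx/2] we use L_j(x) = u[j] + a_j (x - x_j) with the
    minmod-limited slope a_j, then sample L at x_q = q * hx for
    q = 0 .. nh - 1, with nh = floor(A / hx) and A = (n - 1) * dx as in the text.
    """
    n = len(u)
    slopes = []
    for j in range(n):
        # Zero one-sided slope at the two ends: our assumption, since the
        # text does not specify the boundary treatment.
        back = (u[j] - u[j - 1]) / dx if j > 0 else 0.0
        fwd = (u[j + 1] - u[j]) / dx if j < n - 1 else 0.0
        slopes.append(minmod(back, fwd))
    nh = math.floor((n - 1) * dx / hx)
    out = []
    for q in range(nh):
        x = q * hx
        j = min(math.floor(x / dx + 0.5), n - 1)   # cell whose centre is nearest to x
        out.append(u[j] + slopes[j] * (x - j * dx))
    return out
```

Upsampling the step [0, 0, 0, 1, 1, 1] with dx = 1 and hx = 0.5 returns only the values 0.0 and 1.0: the minmod slopes vanish next to the jump, so no intermediate gray level is created, whereas linear interpolation would insert the value 0.5 at x = 2.5. On a linear ramp, the interior samples are reproduced exactly.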
3 Numerical Implementation
This section discusses the numerical implementation of the solution to the Euler-Lagrange equation. The Euler-Lagrange derivative of the TV norm is not well defined at points where ∇u = 0, due to the presence of the term 1/|∇u|. Hence we modify the TV regularization functional as follows:

$$\int_\Omega \sqrt{|\nabla u|^2 + \epsilon}\;dx\,dy, \qquad (13)$$

where ε is a small positive parameter. We express the 3D model (7) in terms of explicit partial derivatives,

$$u_t = \lambda\, \tilde h * \big(\bar g - T(h*u)\big) + \frac{u_{xx}(u_y^2 + u_z^2 + \epsilon) + u_{yy}(u_x^2 + u_z^2 + \epsilon) + u_{zz}(u_x^2 + u_y^2 + \epsilon) - 2u_{xy}u_xu_y - 2u_{xz}u_xu_z - 2u_{yz}u_yu_z}{(u_x^2 + u_y^2 + u_z^2 + \epsilon)^{3/2}}, \qquad (14)$$

using $u_0 = S(f)$ as the initial guess and homogeneous Neumann boundary conditions (i.e., reflecting, zero-flux boundaries). With an explicit time step Δt, the above expression is discretized as

$$\frac{u^{n+1}_{i,j,k} - u^n_{i,j,k}}{\Delta t} = \lambda\left[\tilde h * \big(\bar g - T(h * u^n)\big)\right]_{i,j,k} \qquad (15)$$

$$+ \frac{u^n_{xx}\big((u^n_y)^2 + (u^n_z)^2 + \epsilon\big) + u^n_{yy}\big((u^n_x)^2 + (u^n_z)^2 + \epsilon\big) + u^n_{zz}\big((u^n_x)^2 + (u^n_y)^2 + \epsilon\big)}{\big[(u^n_x)^2 + (u^n_y)^2 + (u^n_z)^2 + \epsilon\big]^{3/2}} \qquad (16)$$

$$+ \frac{-2u^n_{xy}u^n_xu^n_y - 2u^n_{xz}u^n_xu^n_z - 2u^n_{yz}u^n_yu^n_z}{\big[(u^n_x)^2 + (u^n_y)^2 + (u^n_z)^2 + \epsilon\big]^{3/2}} \qquad (17)$$

The derivatives in Eqns. 16 and 17 are approximated as
$[u^n_{xx}]_{i,j,k} = \Delta^x_+\Delta^x_- u^n_{i,j,k}/h_x^2$, $[u^n_{yy}]_{i,j,k} = \Delta^y_+\Delta^y_- u^n_{i,j,k}/h_y^2$, $[u^n_{zz}]_{i,j,k} = \Delta^z_+\Delta^z_- u^n_{i,j,k}/h_z^2$,
$[u^n_{xy}]_{i,j,k} = (\Delta^x_- + \Delta^x_+)(\Delta^y_- + \Delta^y_+)u^n_{i,j,k}/(4h_xh_y)$, $[u^n_{xz}]_{i,j,k} = (\Delta^x_- + \Delta^x_+)(\Delta^z_- + \Delta^z_+)u^n_{i,j,k}/(4h_xh_z)$, $[u^n_{yz}]_{i,j,k} = (\Delta^y_- + \Delta^y_+)(\Delta^z_- + \Delta^z_+)u^n_{i,j,k}/(4h_yh_z)$,
$[u^n_x]_{i,j,k} = (\Delta^x_- + \Delta^x_+)u^n_{i,j,k}/(2h_x)$, $[u^n_y]_{i,j,k} = (\Delta^y_- + \Delta^y_+)u^n_{i,j,k}/(2h_y)$, $[u^n_z]_{i,j,k} = (\Delta^z_- + \Delta^z_+)u^n_{i,j,k}/(2h_z)$.

The Lagrange multiplier λ was chosen to be the maximum value for which the algorithm was stable. It was empirically determined to be λ = 10, and was not changed thereafter.

[Figure 1 panels: Low-resolution Image | Bilinear Interpolation | Sinc Interpolation | Edge-preserved Upsampling]
Fig. 1. The first and third rows show, from the left, a low-resolution image and its up sampled versions using a bilinear interpolation operator, a sinc operator, and the new edge-preserving operator, for two different checkerboard patterns. The second and fourth rows show a magnified area from the center of the image.

[Figure 2 panels: low-resolution image | sinc interpolation | Super-resolved reconstruction | 1st Bregman refinement]
Fig. 2. Clockwise from top: a 380 × 285 low-resolution image, the image upsampled to twice the size by sinc interpolation, the super-resolved reconstruction, and the first Bregman iterated image.

[Figure 3 panels: Low-resolution Image | Bilinear Interpolation | Sinc Interpolation | Super-resolved reconstruction]
Fig. 3. The top row shows the low-resolution image and its upsampled versions using bilinear interpolation, sinc interpolation, and the super-resolved reconstruction. The bottom row shows a magnified detail of a portion of the image.
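To show how the pieces fit together, the sketch below runs the explicit update (15)–(17) on a 1D signal inside the Bregman loop of Eqns. 9–11. Several simplifications are our own assumptions: the blur h is taken as the identity, D and S are pair averaging and sample repetition, the grid spacing is 1, and Δt = 0.01, ε = 0.01 are chosen so that the explicit diffusion coefficient, at most $1/\sqrt{\epsilon} = 10$, keeps Δt times that coefficient well below the usual explicit limit of 1/2; λ = 10 matches the value reported above. In 1D the curvature terms (16)–(17) reduce to $\epsilon\,u_{xx}/(u_x^2+\epsilon)^{3/2}$. The function names are hypothetical.

```python
def downsample(u):
    """Stand-in for D: average each pair of samples (factor 2)."""
    return [(u[2 * i] + u[2 * i + 1]) / 2.0 for i in range(len(u) // 2)]

def upsample(f):
    """Stand-in for S: repeat each sample twice (so D(S(f)) == f)."""
    return [v for v in f for _ in range(2)]

def tv_flow_step(u, target, lam, dt, eps, dx=1.0):
    """One explicit step of eqs. (15)-(17) reduced to 1D with h = identity:
    u <- u + dt * (lam * (target - T(u)) + eps * u_xx / (u_x^2 + eps)^(3/2)),
    where T = S o D.  Replicated ghost values give a zero-flux (Neumann) boundary."""
    n = len(u)
    at = lambda i: u[min(max(i, 0), n - 1)]
    tu = upsample(downsample(u))
    new = []
    for i in range(n):
        ux = (at(i + 1) - at(i - 1)) / (2.0 * dx)                # central difference
        uxx = (at(i + 1) - 2.0 * at(i) + at(i - 1)) / dx ** 2
        curvature = eps * uxx / (ux * ux + eps) ** 1.5           # 1D form of (16)-(17)
        new.append(u[i] + dt * (lam * (target[i] - tu[i]) + curvature))
    return new

def bregman_super_resolve(f, lam=10.0, dt=0.01, eps=0.01, steps=2000, iterations=2):
    """Bregman refinement of eqs. (9)-(11): solve for g_bar + v, update the
    residual v, repeat.  Returns the list of iterates u_0, u_1, ..."""
    g_bar = upsample(f)                  # g_bar = S(f)
    v = [0.0] * len(g_bar)               # no residual before the first solve
    u = list(g_bar)                      # initial guess u = S(f)
    iterates = []
    for _ in range(iterations):
        target = [g + r for g, r in zip(g_bar, v)]
        for _ in range(steps):           # march eq. (15) towards steady state
            u = tv_flow_step(u, target, lam, dt, eps)
        tu = upsample(downsample(u))
        v = [t - x for t, x in zip(target, tu)]    # v_k = g_bar + v_{k-1} - T(u_k)
        iterates.append(u)
    return iterates
```

As the text advises, the number of Bregman iterations should stay small (one or two), since later iterates start to recover noise.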
4 Experimental Results
Lastly, we demonstrate the algorithm by performing experiments with 2D natural images, 2D slices of 3D volumetric images, and finally the full 3D volumetric MRI images themselves.

4.1 Results for Natural Images
Figure 2 shows the results of the super-resolution reconstruction algorithm applied to a 380 × 285 map image. This image has been scaled to 760 × 570 by pixel replication for display purposes; pixel replication inherently adds blocking artifacts to the image. The low-resolution image is up sampled by a factor of two using bilinear interpolation, sinc interpolation, and finally the super-resolution reconstruction method. Bilinear interpolation grossly smooths out the image and sinc interpolation preserves some high frequency information, whereas the super-resolved reconstruction yields a sharp, crisp image, even resolving the small text at finer scales. One can further enhance this image by performing the first Bregman iteration, as shown in Fig. 2. However, this process should be terminated after one or two iterations.

4.2 Results for 2D Slices of 3D MRI Image
In this experiment, we look at enhancing the in-plane resolution of individual transverse slices of a 3D MRI image. From the left, all rows of Fig. 4 show an isotropic 180 × 216 original image, the subsampled image, a Fourier interpolated image, and a super-resolved reconstructed image. For display purposes, the subsampled image is shown at twice the resolution using pixel replication. The super-resolved reconstruction has sharper edge features and more detail, and visually resembles the original image more closely than the Fourier interpolated result.

[Figure 4 columns: Original Image | Subsampled Image | Fourier Interpolation | SR reconstruction]
Fig. 4. Examples of super-resolved reconstruction for 2D slices of 3D MRI images

4.3 Results for Full 3D MRI Images
The proposed super-resolution algorithm can be applied to arbitrary 2D images or even to 3D volumes with anisotropic voxel dimensions. In this experiment, we apply the reconstruction algorithm to the full 3D MRI image volume. Figure 5 shows a volume rendering of an original image of dimensions 256 × 256 × 160, with voxel dimensions 1 × 1 × 1.25 mm³. This image is first subsampled to half the resolution at 128 × 128 × 80 (2 × 2 × 2.5 mm³) and then super-resolved to a full isotropic 256 × 256 × 160 image with 1 × 1 × 1 mm³ resolution. As expected, we see an improvement in resolution together with an increase in detail across all X, Y, and Z dimensions. In this experiment, we used an anisotropic Gaussian kernel with variances proportional to the voxel dimensions. Furthermore, the grid spacings for the edge-preserving up and down sampling operators were taken to be $\Delta x = h_x/2$, $\Delta y = h_y/2$, $\Delta z = h_z/2$, where $h_x, h_y, h_z$ are the voxel dimensions of the corresponding up sampled or down sampled image.

[Figure 5 columns: Original Image | Subsampled Image | Fourier Interpolation | SR reconstruction]
Fig. 5. Examples of super-resolved reconstruction for full 3D MRI images (volume rendered)
5 Conclusion and Future Directions

We have presented a method for enhancing the resolution of images. The strengths of this approach lie in i) the TV norm as a regularizing functional in the variational model, and ii) a new piecewise-linear up(down)sampling operator that preserves edges. Although the proposed method operates in physical (image) space rather than in the frequency domain (k-space) of the data, we emphasize that the TV prior is nonlinear and does modify the amplitudes of the k-space data. In other words, our algorithm works on the processed physical image, yet it implicitly modifies the spectral information in the data. This point is important, especially when comparing with other MRI processing methods that operate on the k-space representation of the data. We have demonstrated the improvement in spatial resolution for 2D as well as 3D anatomical MRI images. In the future, we intend to investigate high-resolution reconstruction of DT-MRI images using the proposed method.
Acknowledgments
This research was partially supported by the National Institutes of Health through the NIH Roadmap for Medical Research, Grant U54 RR021813. Additionally, Dr. Antonio Marquina gratefully acknowledges support from the NSF grants DMS-0312222 and ACI-0321917, the NIH grant G54 RR021813, and DGICYT MTM2008-03597 from the Spanish Government Agency.
Nonlocal Variational Image Deblurring Models in the Presence of Gaussian or Impulse Noise

Miyoun Jung and Luminita A. Vese

University of California, Los Angeles, Department of Mathematics, Los Angeles, CA 90095-1555, USA
Abstract. We wish to recover an image corrupted by blur and Gaussian or impulse noise, in a variational framework. We use two data-fidelity terms depending on the noise, and several local and nonlocal regularizers. Inspired by Buades-Coll-Morel, Gilboa-Osher, and other nonlocal models, we propose nonlocal versions of the Ambrosio-Tortorelli and Shah approximations to Mumford-Shah-like regularizing functionals, with applications to image deblurring in the presence of noise. For the impulse noise model, we propose a necessary preprocessing step for the computation of the weight function. Experimental results show that these nonlocal MS regularizers yield better results than the corresponding local ones (proposed for deblurring by Bar et al.) for both noise models; moreover, they perform better than the nonlocal total variation in the presence of impulse noise. A characterization of minimizers is also given.
1 Introduction
We consider the problem of restoring an image blurred and then contaminated by Gaussian or impulse noise. Let f, u : Ω → ℝ be image intensity functions, where Ω ⊂ ℝ² is open and bounded. The standard linear degradation model is f = k ∗ u + n, where f is the observed blurry-noisy image, k is a (known) space-invariant blurring kernel, u is the ideal image we want to recover, and n is additive random noise independent of u. We approach the restoration problem within the variational framework

$\inf_u \{\Phi(f - k*u) + \Psi(|\nabla u|)\},$

where Φ defines a data-fidelity term and Ψ defines a regularization term that enforces a smoothness constraint on u through its gradient ∇u. First, two different fidelity terms can be considered depending on the noise. For the Gaussian noise model, the L²-fidelity term derived from maximum likelihood estimation is commonly used: $\Phi(f - k*u) = \int_\Omega (f - k*u)^2\,dx$. Impulse noise, however, which may be caused by bit errors in transmission or faulty pixels, acts as outliers, to which the quadratic fidelity term is very sensitive. For the impulse noise model, the L¹-fidelity term is therefore more appropriate, owing to its robustness to outliers [2], [17]: $\Phi(f - k*u) = \int_\Omega |f - k*u|\,dx$. Image deblurring-denoising is an inverse problem, which is known to be ill-posed due to either the non-uniqueness of the solution or the numerical instability of the inversion of the blurring operator. The regularization term Ψ alleviates this problem by encoding a-priori properties of the solution. Several regularization terms have been suggested in the literature, including [23], [9], [19], [20], [16]. Here, we consider the total variation regularization [19], [20] and two approximations of the Mumford-Shah regularizer [16], denoted $MS_{H^1}$ and $MS_{TV}$, proposed by Ambrosio-Tortorelli [3] and Shah [21], [1], respectively, and recently used by Bar et al. [4], [5] for image deblurring in the presence of Gaussian and impulse noise. These traditional regularization terms are based on local image operators; they denoise and preserve edges very well, but may lose fine structures such as texture during restoration.

Recently, Buades et al. [8] introduced the nonlocal means filter, which produces excellent denoising results. Kindermann et al. [13] and Gilboa-Osher [10,11] formulated NL-means in a variational framework by proposing nonlocal regularizing functionals. Lou et al. [14] used the nonlocal total variation (NL/TV) of Gilboa-Osher for image deblurring in the presence of Gaussian noise, with a preprocessing step for the computation of the weight function. Here we propose nonlocal versions of the Ambrosio-Tortorelli and Shah approximations of the Mumford-Shah regularizing functional, called $NL/MS_{H^1}$ and $NL/MS_{TV}$, obtained by applying the nonlocal operators of Gilboa-Osher to $MS_{H^1}$ and $MS_{TV}$, respectively, for image restoration in the presence of blur and Gaussian or impulse noise. In addition, for the impulse noise model, we propose to compute the weights w from a preprocessed image (the weights w defined in the NL-means filter are better suited to additive Gaussian noise). We note that the interesting parallel work [7] also proposed the $NL/MS_{H^1}$ regularizer, for segmentation and denoising in the presence of Gaussian noise, but neither for deblurring nor for the impulse noise case.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 401–412, 2009. © Springer-Verlag Berlin Heidelberg 2009
More details about the proposed methods are given in [12].

Local Regularizers. In this section, we recall several regularization terms. The first one is the Mumford-Shah regularizing functional [16], which favors piecewise smooth images. The MS regularizer, which depends on the image u and on its edge set K ⊂ Ω, is given by

$\Psi^{MS}(u, K) = \beta \int_{\Omega\setminus K} |\nabla u|^2\,dx + \alpha \int_K d\mathcal{H}^1,$

where $\mathcal{H}^1$ is the 1D Hausdorff measure. The first term enforces smoothness of u everywhere except on the edge set K, and the second one penalizes the total length of the edges. In practice, however, the non-convex MS functional is difficult to minimize. Ambrosio and Tortorelli [3] approximated it by a sequence of regular functionals $\Psi_\epsilon$ in the sense of Γ-convergence, representing the edge set K by a smooth auxiliary function v. This gives the approximation [3]

$\Psi_\epsilon^{MS_{H^1}}(u, v) = \beta \int_\Omega v^2 |\nabla u|^2\,dx + \alpha \int_\Omega \left( \epsilon |\nabla v|^2 + \frac{(v-1)^2}{4\epsilon} \right) dx,$

where 0 ≤ v(x) ≤ 1 represents the edges (v(x) ≈ 0 if x ∈ K and v(x) ≈ 1 otherwise), ε > 0 is a parameter, and α, β > 0. A minimizer $u_\epsilon$ of $\Psi_\epsilon^{MS_{H^1}}$ approaches a minimizer u of $\Psi^{MS}$ as ε → 0.
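As a minimal illustration of how the auxiliary function v acts as an edge indicator, the following sketch evaluates a 1D discretization of $\Psi_\epsilon^{MS_{H^1}}$ with unit grid spacing and forward differences. The discretization, the function name `at_energy`, and the parameter values are assumptions made for illustration, not taken from the paper. Switching v off at a jump removes the β-weighted gradient cost and leaves only the edge-length terms.

```python
import numpy as np


def at_energy(u, v, alpha, beta, eps):
    """Discrete 1D Ambrosio-Tortorelli energy Psi_eps^{MS_H1}(u, v), unit spacing.

    Forward differences approximate u' and v'; v[i] weights the jump between
    samples i and i+1 (an illustrative discretisation choice).
    """
    du = np.diff(u)                                   # u'
    dv = np.diff(v)                                   # v'
    smooth = beta * np.sum(v[:-1] ** 2 * du ** 2)     # beta * int v^2 |u'|^2
    edge = alpha * (eps * np.sum(dv ** 2) + np.sum((v - 1) ** 2) / (4 * eps))
    return smooth + edge


u = np.r_[np.zeros(10), np.ones(10)]     # unit step between samples 9 and 10
v_flat = np.ones(20)                      # no edge detected anywhere
v_edge = v_flat.copy()
v_edge[9] = 0.0                           # edge indicator switched off at the jump
params = dict(alpha=1.0, beta=10.0, eps=0.5)
print(at_energy(u, v_flat, **params))     # 10.0: the jump is paid for in full via beta
print(at_energy(u, v_edge, **params))     # 1.5: only the edge-length terms remain
```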
An alternative approach is the total variation (TV) regularizer [19, 20], proposed by Rudin, Osher, and Fatemi: $\Psi^{TV}(u) = \int_\Omega |Du| \approx \int_\Omega |\nabla u|\,dx$. Because it preserves edges (where the gradient magnitude is large) and is convex, TV has been widely used in image restoration. Shah [21] suggested a modified version of the AT approximation to the MS functional, replacing $|\nabla u|^2$ by $|\nabla u|$ in the first term:
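$\Psi_\epsilon^{MS_{TV}}(u, v) = \beta \int_\Omega v^2 |\nabla u|\,dx + \alpha \int_\Omega \left( \epsilon |\nabla v|^2 + \frac{(v-1)^2}{4\epsilon} \right) dx.$

This functional Γ-converges, as ε → 0, to the functional [1]

$\Psi^{MS_{TV}}(u) = \beta \int_{\Omega\setminus K} |\nabla u|\,dx + \alpha \int_K \frac{|u^+ - u^-|}{1 + |u^+ - u^-|}\,d\mathcal{H}^1 + |D^c u|(\Omega),$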
where u⁺ and u⁻ denote the image values on the two sides of the jump set K = K_u of u, and $D^c u$ is the Cantor part of the measure-valued derivative Du. Note that the non-convex term $\frac{|u^+ - u^-|}{1 + |u^+ - u^-|}$ is similar to the prior of Geman-Reynolds [9]. This regularizing functional resembles the total variation of u ∈ BV(Ω), which can be written as

$\int_\Omega |Du| = \int_{\Omega\setminus K_u} |\nabla u|\,dx + \int_{K_u} |u^+ - u^-|\,d\mathcal{H}^1 + |D^c u|(\Omega).$

Comparing the second terms, and since s/(1 + s) < s for s > 0, we see that the $MS_{TV}$ regularizer penalizes jumps less than the TV regularizer does. In this paper, we consider the TV regularizer $\Psi^{TV}$, the $MS_{H^1}$ regularizer $\Psi_\epsilon^{MS_{H^1}}$, and the $MS_{TV}$ regularizer $\Psi_\epsilon^{MS_{TV}}$.

Nonlocal Regularizers. Nonlocal methods in image processing have been explored in many papers because they are well adapted to texture denoising, whereas standard denoising models based on local image information tend to treat texture as noise and thus lose details. Nonlocal methods generalize neighborhood filters (e.g. the Yaroslavsky filter [24]) and patch-based methods. The idea of a neighborhood filter is to restore a pixel by averaging the values of neighboring pixels with similar grey levels. Buades et al. [8] generalized this idea with a patch-based approach and proposed the nonlocal-means (NL-means) filter for denoising,

$NLu(x) = \frac{1}{C(x)} \int_\Omega e^{-d_a(u(x),u(y))/h^2}\, u(y)\,dy,$

where $d_a(u(x),u(y)) = \int_{\mathbb{R}^2} G_a(t)\,|u(x+t) - u(y+t)|^2\,dt$ is the patch distance, $G_a$ is the Gaussian kernel with standard deviation a, which determines the patch size, $C(x) = \int_\Omega e^{-d_a(u(x),u(y))/h^2}\,dy$ is a normalization factor, and h is the filtering parameter corresponding to the noise level (usually the standard deviation of the noise). NL-means thus compares not only the grey levels at single points but the geometric configuration of whole neighborhoods (patches).
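The NL-means formula can be sketched directly in discrete form. This version rests on stated assumptions: sums replace integrals, the sum over y is restricted to a square search window (the semi-local variant used in practice), the image is reflect-padded so every pixel has a full patch, and the function and parameter names are hypothetical. It is written for clarity, not speed.

```python
import numpy as np


def gaussian_patch_weights(patch_radius, a):
    """Discrete G_a on a (2r+1) x (2r+1) patch, normalised to sum to 1."""
    t = np.arange(-patch_radius, patch_radius + 1)
    g = np.exp(-(t[:, None] ** 2 + t[None, :] ** 2) / (2.0 * a ** 2))
    return g / g.sum()


def nl_means(u, h, a=1.0, patch_radius=2, search_radius=5):
    """Discrete NL-means: NLu(x) = (1/C(x)) * sum_y exp(-d_a(x, y) / h^2) * u(y)."""
    G = gaussian_patch_weights(patch_radius, a)
    p = patch_radius
    padded = np.pad(u.astype(float), p, mode="reflect")
    H, W = u.shape
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            patch_x = padded[i:i + 2 * p + 1, j:j + 2 * p + 1]   # patch centred at x
            num, C = 0.0, 0.0
            for k in range(max(0, i - search_radius), min(H, i + search_radius + 1)):
                for l in range(max(0, j - search_radius), min(W, j + search_radius + 1)):
                    patch_y = padded[k:k + 2 * p + 1, l:l + 2 * p + 1]
                    d = np.sum(G * (patch_x - patch_y) ** 2)      # patch distance d_a
                    weight = np.exp(-d / h ** 2)
                    num += weight * u[k, l]
                    C += weight                                   # normalisation C(x)
            out[i, j] = num / C
    return out


rng = np.random.default_rng(0)
clean = np.zeros((16, 16))
clean[:, 8:] = 100.0
noisy = clean + rng.normal(0.0, 10.0, clean.shape)
denoised = nl_means(noisy, h=10.0)   # h chosen near the noise standard deviation
```

Since each output value is a normalized weighted average, it always lies between the minimum and maximum of the input.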
In the variational framework, Kindermann et al. [13] formulated neighborhood filters and NL-means filters as nonlocal regularizing functionals, which are generally not convex. Gilboa-Osher [10] then formalized a convex nonlocal functional inspired by graph theory and, based on the definitions of gradient and divergence on graphs used in machine learning,
they [11] derived the corresponding nonlocal operators. Let u : Ω → ℝ be a function, and let w : Ω × Ω → ℝ be a nonnegative, symmetric weight function. The nonlocal gradient vector ∇_w u : Ω × Ω → ℝ is

$(\nabla_w u)(x, y) := (u(y) - u(x))\sqrt{w(x, y)}.$

The nonlocal divergence $\mathrm{div}_w \vec v : \Omega \to \mathbb{R}$ of a vector $\vec v : \Omega\times\Omega \to \mathbb{R}$ is defined as the adjoint of the nonlocal gradient, $\langle \nabla_w u, \vec v\rangle = -\langle u, \mathrm{div}_w \vec v\rangle$, which gives

$(\mathrm{div}_w \vec v)(x) := \int_\Omega (v(x, y) - v(y, x))\sqrt{w(x, y)}\,dy,$

and the norm of the nonlocal gradient of u at x ∈ Ω is

$|\nabla_w u|(x) = \sqrt{\int_\Omega (u(y) - u(x))^2\, w(x, y)\,dy}.$

Based on these nonlocal operators, they introduced nonlocal regularizing functionals of the general form $\Psi(u) = \int_\Omega \phi(|\nabla_w u|^2)\,dx$, where φ(s) is a positive function, convex in √s, with φ(0) = 0. Taking φ(s) = √s, they proposed the nonlocal TV regularizer (NL/TV), which corresponds in the local case to $\Psi^{TV}(u) = \int_\Omega |\nabla u|\,dx$. Inspired by these ideas, we propose in the next section nonlocal versions of the Ambrosio-Tortorelli and Shah approximations to the MS regularizer for image denoising-deblurring. This also continues the work of Bar et al. [4], [5], who first proposed using Mumford-Shah-like approximations for image restoration. In practice, we use the search window $\Omega_w = \{y \in \Omega : |y - x| \le r\}$ instead of Ω (semi-local), and the weight function w at (x, y) ∈ Ω × Ω depends on a function f : Ω → ℝ,

$w(x, y) = \exp\left(-\frac{d_a(f(x), f(y))}{h^2}\right).$

The weight function w(x, y) measures the similarity of image features between the two pixels x and y, and is normally computed from the blurry-noisy image f. Recently, for image deblurring in the presence of Gaussian noise, Lou et al. [14] computed w from a preprocessed image, obtained by applying the Wiener filter to f, instead of from f itself. In our work, only for the impulse noise model, we propose a different preprocessing step and evaluate w using the preprocessed image.
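A discrete version of these operators on a 1D signal with a dense n × n weight matrix makes the adjoint relation easy to check numerically. Sums replace the integrals, and the function names are illustrative; the check confirms ⟨∇_w u, v⟩ = −⟨u, div_w v⟩ for a symmetric, nonnegative w.

```python
import numpy as np


def nl_grad(u, w):
    """(grad_w u)(x, y) = (u(y) - u(x)) * sqrt(w(x, y)); entry [x, y] of an n x n array."""
    return (u[None, :] - u[:, None]) * np.sqrt(w)


def nl_div(v, w):
    """(div_w v)(x) = sum_y (v(x, y) - v(y, x)) * sqrt(w(x, y))."""
    return np.sum((v - v.T) * np.sqrt(w), axis=1)


def nl_grad_norm(u, w):
    """|grad_w u|(x) = sqrt(sum_y (u(y) - u(x))^2 * w(x, y))."""
    return np.sqrt(np.sum((u[None, :] - u[:, None]) ** 2 * w, axis=1))


def nl_tv(u, w):
    """Discrete NL/TV: phi(s) = sqrt(s), so Psi(u) = sum_x |grad_w u|(x)."""
    return float(np.sum(nl_grad_norm(u, w)))


rng = np.random.default_rng(0)
n = 6
A = rng.random((n, n))
w = (A + A.T) / 2                  # nonnegative, symmetric weights
u = rng.standard_normal(n)
v = rng.standard_normal((n, n))
lhs = np.sum(nl_grad(u, w) * v)    # <grad_w u, v>
rhs = -np.dot(u, nl_div(v, w))     # -<u, div_w v>
print(np.isclose(lhs, rhs))        # True: div_w is the negative adjoint of grad_w
```

The symmetry of w is what makes the relation exact: swapping x and y in the second half of the divergence sum reproduces the gradient pairing.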
2 Description of the Proposed Models
We propose the following nonlocal Mumford-Shah regularizers (NL/MS), obtained by applying the nonlocal operators to the approximations of the MS regularizer:

$\Psi_\epsilon^{NL/MS}(u, v) = \beta \int_\Omega v^2\,\phi(|\nabla_w u|^2)\,dx + \alpha \int_\Omega \left( \epsilon |\nabla v|^2 + \frac{(v-1)^2}{4\epsilon} \right) dx,$

where φ(s) = s and φ(s) = √s give the nonlocal versions of the $MS_{H^1}$ and $MS_{TV}$ regularizers, called $NL/MS_{H^1}$ and $NL/MS_{TV}$, respectively. We use these nonlocal regularizers to deblur images in the presence of Gaussian or impulse noise. By incorporating the fidelity term appropriate to the noise model, we obtain two total energies:

Gaussian noise model: $E^G(u, v) = \int_\Omega (f - k*u)^2\,dx + \Psi_\epsilon^{NL/MS}(u, v),$
Impulse noise model: $E^{Im}(u, v) = \int_\Omega |f - k*u|\,dx + \Psi_\epsilon^{NL/MS}(u, v).$
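To see that the two total energies differ only in their fidelity term, the following sketch evaluates a 1D discretization of $E^G$ and $E^{Im}$ with the NL/MS regularizer. The discretization, the function name `nl_ms_energy`, and the parameter values are assumptions for illustration. With a single impulse of height 100 in the residual, the quadratic fidelity charges 100² while the L¹ fidelity charges only 100, which is why the L¹ term is preferred for impulse noise.

```python
import numpy as np


def nl_ms_energy(u, v, f, k, w, alpha, beta, eps, noise="gaussian", variant="H1"):
    """Discrete 1D E^G (noise="gaussian") or E^Im (otherwise) with the NL/MS regularizer.

    u, v, f: length-n arrays; k: odd-length blur kernel; w: n x n symmetric weights.
    variant="H1" uses phi(s) = s (NL/MS_H1); variant="TV" uses phi(s) = sqrt(s) (NL/MS_TV).
    """
    residual = f - np.convolve(u, k, mode="same")              # f - k * u
    if noise == "gaussian":
        fidelity = np.sum(residual ** 2)                        # L2 fidelity of E^G
    else:
        fidelity = np.sum(np.abs(residual))                     # L1 fidelity of E^Im
    s = np.sum((u[None, :] - u[:, None]) ** 2 * w, axis=1)     # |grad_w u|^2
    phi = s if variant == "H1" else np.sqrt(s)
    dv = np.diff(v)                                             # local derivative of v
    reg = beta * np.sum(v ** 2 * phi) + alpha * (
        eps * np.sum(dv ** 2) + np.sum((v - 1) ** 2) / (4 * eps))
    return float(fidelity + reg)


n = 8
u = np.zeros(n)
f = np.zeros(n)
f[3] = 100.0                                  # a single impulse ("salt") outlier
v, w, k = np.ones(n), np.zeros((n, n)), np.array([1.0])
args = dict(alpha=1.0, beta=1.0, eps=0.1)
print(nl_ms_energy(u, v, f, k, w, noise="gaussian", **args))   # 10000.0
print(nl_ms_energy(u, v, f, k, w, noise="impulse", **args))    # 100.0
```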
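Minimizing these functionals in u and v, we obtain the Euler-Lagrange equations

$\frac{\partial E^G}{\partial v} = \frac{\partial E^{Im}}{\partial v} = 2\beta v\,\phi(|\nabla_w u|^2) - 2\alpha\epsilon\,\Delta v + \alpha\,\frac{v-1}{2\epsilon} = 0,$

Gaussian noise model: $\frac{\partial E^G}{\partial u} = \tilde k * (k*u - f) + L^{NL/MS}u = 0,$
Impulse noise model: $\frac{\partial E^{Im}}{\partial u} = \tilde k * \mathrm{sign}(k*u - f) + L^{NL/MS}u = 0,$

where $\tilde k(x) = k(-x)$ and

$L^{NL/MS}u = -2\int_\Omega (u(y) - u(x))\,w(x, y)\,\big(v^2(y)\,\phi'(|\nabla_w u|^2(y)) + v^2(x)\,\phi'(|\nabla_w u|^2(x))\big)\,dy.$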
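For fixed u, both energies depend on v only through the regularizer, which is quadratic in v, so the v-equation above is linear. The sketch below solves a 1D discretization of it directly (unit grid spacing, Neumann boundary conditions, with φ(|∇_w u|²) supplied as an array). This direct solve is a swapped-in illustration; the paper itself updates v with a semi-implicit scheme. The example shows v dipping where the nonlocal gradient is large.

```python
import numpy as np


def solve_v(phi_vals, alpha, beta, eps):
    """Solve 2*beta*phi*v - 2*alpha*eps*v'' + alpha*(v - 1)/(2*eps) = 0 (1D, unit spacing).

    phi_vals holds phi(|grad_w u|^2) at each sample for the current u. With
    Neumann boundary conditions, -v'' is discretised as D^T D v, where D is the
    forward-difference matrix, giving a symmetric positive-definite system.
    """
    n = len(phi_vals)
    D = np.diff(np.eye(n), axis=0)                     # (n-1) x n forward differences
    A = np.diag(2 * beta * phi_vals + alpha / (2 * eps)) + 2 * alpha * eps * D.T @ D
    b = np.full(n, alpha / (2 * eps))
    return np.linalg.solve(A, b)


phi_vals = np.zeros(11)
phi_vals[5] = 50.0                                     # large nonlocal gradient at sample 5
v = solve_v(phi_vals, alpha=1.0, beta=1.0, eps=0.5)
print(v.argmin())                                      # 5: v dips where the gradient is large
```

Where φ vanishes, the solution is v ≡ 1 (no edge); a large φ pulls v towards 0, and the discrete solution stays in (0, 1].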
The energy functionals E^G(u, v) and E^{Im}(u, v) are convex in each variable and bounded from below. Therefore, to solve the two Euler-Lagrange equations simultaneously, we apply alternate minimization. Since neither energy functional is convex in the joint variable (u, v), we may only compute a local minimizer. This is not a drawback in practice, however, since the initial guess for u in our algorithm is the data f. To extend the nonlocal methods to the impulse noise case, we need a preprocessing step for the weight function w, since the data f cannot be used directly to compute w: in the presence of impulse noise, noisy pixels tend to have larger weights than their neighboring points, so the noise value at such a pixel is likely to be kept. We therefore propose a simple algorithm to obtain a preprocessed image g that removes the impulse noise (outliers) while preserving texture as much as possible. It is based on the median filter, which is well known for removing impulse noise. However, a single application of the median filter may smooth the output too much. To preserve fine structures while removing the noise properly, we adopt the idea of Bregman iteration [6], [18] and propose the following algorithm to obtain a preprocessed image g, used only in the computation of the weight function w:

    Initialize: r_0 = 0, g_0 = 0.
    do (iterate n = 0, 1, 2, ..., m)
        g_{n+1} = median(f + r_n, [a a])
        r_{n+1} = r_n + f − k ∗ g_{n+1}
    while ‖f − k ∗ g_n‖_1 > ‖f − k ∗ g_{n+1}‖_1
    [Optional] g_m = median(g_m, [b b])

Here f is the given noisy-blurry data and median(f, [a a]) denotes the median filter of size a × a applied to f; the optional step is needed when the final g_m still contains some salt-and-pepper-like noise. The algorithm is simple, requires only a few iterations, and takes less than 1 second for a 256 × 256 image.
Moreover, the preprocessed image g_m is a deblurred and denoised version of f; it is used only to compute the weights w, while f itself is kept in the data-fidelity term, so the median filter does not introduce artifacts into the restoration.
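The preprocessing loop can be sketched in NumPy as follows. The `median` and `blur` helpers are hypothetical stand-ins (reflect padding, odd square windows), the test image and 3 × 3 Gaussian blur are made up for illustration, and `max_iter` is an added safeguard. Like the do ... while loop above, the sketch stops as soon as ‖f − k ∗ g‖₁ no longer decreases and returns the last iterate g_m.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _windows(img, size):
    """All size x size neighbourhoods of img (reflect-padded): shape (H, W, size, size)."""
    pad = size // 2
    return sliding_window_view(np.pad(img, pad, mode="reflect"), (size, size))


def median(img, size):
    """median(img, [size size]) for an odd window size."""
    return np.median(_windows(img, size), axis=(-2, -1))


def blur(img, kernel):
    """k * img for an odd square kernel (convolution = correlation with the flipped kernel)."""
    return np.einsum("ijkl,kl->ij", _windows(img, kernel.shape[0]), kernel[::-1, ::-1])


def preprocess(f, kernel, a=3, b=None, max_iter=20):
    """Iterated median filtering with residual add-back; returns g_m."""
    r = np.zeros_like(f)
    g = np.zeros_like(f)
    err = np.abs(f - blur(g, kernel)).sum()      # ||f - k*g_0||_1
    for _ in range(max_iter):
        g_next = median(f + r, a)                # g_{n+1} = median(f + r_n, [a a])
        kg = blur(g_next, kernel)
        r = r + f - kg                           # r_{n+1} = r_n + f - k*g_{n+1}
        err_next = np.abs(f - kg).sum()
        g = g_next
        if not err > err_next:                   # loop condition fails: stop
            break
        err = err_next
    if b is not None:
        g = median(g, b)                         # optional [b b] median filter
    return g


rng = np.random.default_rng(0)
clean = np.full((32, 32), 50.0)
clean[:, 16:] = 200.0
t = np.arange(-1, 2)
k = np.exp(-(t[:, None] ** 2 + t[None, :] ** 2) / 2.0)
k /= k.sum()                                     # 3 x 3 Gaussian blur, sigma_b = 1
f = blur(clean, k)
mask = rng.random(f.shape)
f[mask < 0.15] = 0.0                             # pepper
f[mask > 0.85] = 255.0                           # salt: total density d = 0.3
g = preprocess(f, k, a=3)                        # used only to compute the weights w
```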
Characterization of Minimizers. In this section we characterize the minimizers of the functionals formulated with the nonlocal regularizers, using [15, 22]. Assuming that a functional ‖·‖ on a subspace of L²(Ω) is a semi-norm, we can define the dual norm of f ∈ L²(Ω) ⊂ L¹(Ω) as

$\|f\|_* := \sup_{\|\varphi\| \neq 0} \frac{\langle f, \varphi\rangle}{\|\varphi\|} \le +\infty,$

where ⟨·,·⟩ denotes the L²(Ω) inner product, so that the usual duality $\langle f, \varphi\rangle \le \|\varphi\|\,\|f\|_*$ holds for $\|\varphi\| \neq 0$. We define two functionals (here Ku := k ∗ u):

$F(u) = \lambda \int_\Omega |f - Ku|^2\,dx + |u|_{NL/TV},$
$G(u, v) = \int_\Omega \sqrt{|f - Ku|^2 + \eta^2}\,dx + \beta |u|_{NL/MS} + \alpha \int_\Omega \left( \epsilon |\nabla v|^2 + \frac{|v-1|^2}{4\epsilon} \right) dx,$

where λ > 0 and $|u|_{NL/MS} \in \{|u|_{NL/MS_{TV},v},\ |u|_{NL/MS_{H^1},v}\}$. Here we use the notation $|u|_{NL/TV} = \int_\Omega |\nabla_w u|(x)\,dx$, $|u|_{NL/MS_{TV},v} = \int_\Omega v^2(x)\,|\nabla_w u|(x)\,dx$, and $|u|_{NL/MS_{H^1},v} = \sqrt{\int_\Omega v^2(x)\,|\nabla_w u|^2(x)\,dx}$, which are semi-norms. Note that we modified the regularizing functional $|u|_{NL/MS_{H^1},v}$: the square root replaces the original term of our model, $\int_\Omega v^2(x)\,|\nabla_w u|^2(x)\,dx$. It is introduced here to enable the characterization of minimizers below, but the numerical computations use the original formulation. For the proofs we refer to [12].

Proposition 1. Let K : L²(Ω) → L²(Ω) be a linear bounded blurring operator with adjoint K* and let F be the associated functional. Then
(1) $\|K^* f\|_* \le \frac{1}{2\lambda}$ if and only if u ≡ 0 is a minimizer of F.
(2) Assume that $\frac{1}{2\lambda} < \|K^* f\|_* < \infty$. Then u is a minimizer of F if and only if $\|K^*(f - Ku)\|_* = \frac{1}{2\lambda}$ and $\langle u, K^*(f - Ku)\rangle = \frac{1}{2\lambda}|u|_{NL/TV}$,
where ‖·‖_* is the dual norm corresponding to $|\cdot|_{NL/TV}$.

Proposition 2. Let K : L²(Ω) → L²(Ω) be a linear bounded blurring operator with adjoint K* and let G be the associated functional. If (u, v) is a minimizer of G with v ∈ [0, 1], then
$\left\| K^* \frac{f - Ku}{\sqrt{(f - Ku)^2 + \eta^2}} \right\|_* = \beta$ and $\left\langle u, K^* \frac{f - Ku}{\sqrt{(f - Ku)^2 + \eta^2}} \right\rangle = \beta\,|u|_{NL/MS}$,
where ‖·‖_* is the dual norm corresponding to $|\cdot|_{NL/MS}$.
3 Experimental Results and Comparisons
The nonlocal MS regularizers proposed here, $NL/MS_{TV}$ and $NL/MS_{H^1}$, are tested on several images with different blur kernels and noise types. We compare them with their traditional (local) versions, $MS_{TV}$ and $MS_{H^1}$, and with the local and nonlocal total variation (TV [20], NL/TV [11]). In addition, we test the nonlocal regularizers under the impulse noise model, using the preprocessing step for the weight function.
Fig. 1. Image recovery with cross sections: Gaussian blur kernel with σ_b = 1 and Gaussian noise with σ_n = 5. Top: original image and its cross section, noisy blurry image and its cross section. Middle and bottom rows: recovered images (middle) and recovered cross sections (bottom) using TV, $MS_{TV}$, NL/TV, $NL/MS_{TV}$. SNR of the results: TV = 32.9485, $MS_{TV}$ = 33.5629, NL/TV = 45.1943, $NL/MS_{TV}$ = 50.6618. β = 0.0045 ($MS_{TV}$), 0.001 ($NL/MS_{TV}$), α = 0.00000015, ε = 0.000001.
Fig. 2. Top: (1st, 3rd) original images; (2nd, 4th) noisy blurry images, blurred with a Gaussian kernel with σ_b = 1 (2nd) or with the pill-box kernel of radius 2 (4th), and then contaminated by Gaussian noise with σ_n = 5. Bottom: recovered images with SNR values: TV (14.4240), $MS_{TV}$ (14.4693), NL/TV (17.4165), $NL/MS_{TV}$ (16.5776). β = 0.007, α = 0.00000015 ($MS_{TV}$); β = 0.0025, α = 0.00000025 ($NL/MS_{TV}$); ε = 0.0000005.
Fig. 3. Recovery of the noisy blurry image from Fig. 2. Top: recovered image u using TV (SNR=25.0230), $MS_{TV}$ (SNR=25.1968), $MS_{H^1}$ (SNR=23.1324). Third row: recovered image u using NL/TV (SNR=26.4554), $NL/MS_{TV}$ (SNR=26.4696), $NL/MS_{H^1}$ (SNR=24.7164). Second and bottom rows: corresponding residuals f − k ∗ u. β = 0.0045 ($MS_{TV}$), 0.001 ($NL/MS_{TV}$), 0.06 ($MS_{H^1}$), 0.006 ($NL/MS_{H^1}$), α = 0.00000001, ε = 0.00002.
First, we test the Gaussian noise model in Figs. 1-3. As expected, $NL/MS_{TV}$ and $NL/MS_{H^1}$ perform better than $MS_{TV}$ and $MS_{H^1}$, respectively: not only do they recover fine scales such as texture better, but $NL/MS_{TV}$ also avoids the staircase effect that appears with $MS_{TV}$. Furthermore, comparing the nonlocal MS regularizers with NL/TV, $NL/MS_{TV}$ and NL/TV seem to give similar results, both visually and in SNR, while $NL/MS_{H^1}$ gives a smoother image and a lower SNR. Specifically, in Fig. 1 we use a simple image and its 1D cross section. In this example, an 11 × 11 search window suffices for $NL/MS_{TV}$ to obtain the best result, while NL/TV needs a 31 × 31 window. Moreover, $NL/MS_{TV}$ recovers the signal much better than NL/TV, which might be because the local $MS_{TV}$ regularizer does not penalize the jump part as much as TV does. On the other hand, in Fig. 2, NL/TV produces clearer edges and a higher SNR, while $NL/MS_{TV}$ shows some artifacts near the edges, especially around the small black boxes. For the other, real boat image, however, there is no significant difference between them, either visually or in SNR (see Fig. 3). Fig. 3 also supports the claim that the nonlocal regularizers preserve edges and details better than the traditional local ones, since less texture is visible in the residuals f − k ∗ u.

Fig. 4. Recovery of a noisy blurry image with a Gaussian kernel with σ = 1 and salt-and-pepper noise with d = 0.3. Top: original image, blurry image, noisy-blurry image. Middle: recovered images using TV (SNR=26.9251), $MS_{TV}$ (SNR=27.8336), $MS_{H^1}$ (SNR=23.2052). Bottom: recovered images using NL/TV (SNR=29.2403), $NL/MS_{TV}$ (SNR=29.3503), $NL/MS_{H^1}$ (SNR=27.1477). Second column: β = 0.25 ($MS_{TV}$), 0.1 ($NL/MS_{TV}$), α = 0.01, ε = 0.002. Third column: β = 2 ($MS_{H^1}$), 0.55 ($NL/MS_{H^1}$), α = 0.001, ε = 0.0001.

Fig. 5. Comparison between $MS_{H^1}$ and $NL/MS_{H^1}$ for images blurred and contaminated by a high density (d = 0.4) of impulse noise. Top: noisy blurry images (left) using a motion blur kernel of length 10, oriented at angle θ = 25° with respect to the horizontal, and salt-and-pepper noise with d = 0.4; (middle) using a Gaussian kernel with σ_b = 1 and salt-and-pepper noise with d = 0.4; (right) using a Gaussian kernel with σ_b = 1 and random-valued impulse noise with d = 0.4. Middle: recovered images using $MS_{H^1}$, (left) SNR=17.1106, (middle) SNR=15.2017, (right) SNR=16.6960. Bottom: recovered images using $NL/MS_{H^1}$, (left) SNR=21.2464, (middle) SNR=23.1998, (right) SNR=24.2500. First column: β = 2 ($MS_{H^1}$), 0.4 ($NL/MS_{H^1}$); second column: β = 2 ($MS_{H^1}$), 1 ($NL/MS_{H^1}$), α = 0.001, ε = 0.0002. Third column: β = 2.5 ($MS_{H^1}$), 0.65 ($NL/MS_{H^1}$), α = 0.000001, ε = 0.002.
Next, we recover blurred images contaminated by impulse noise (salt-and-pepper noise or random-valued impulse noise). First, we test all the nonlocal regularizers and the corresponding local ones on the Lenna image (Fig. 4) with a Gaussian blur kernel and salt-and-pepper noise of density d = 0.3; we then test $MS_{H^1}$ and $NL/MS_{H^1}$ on the Einstein image (Fig. 5) with different blur kernels and both impulse noise models, salt-and-pepper and random-valued impulse noise, with the same noise density d = 0.4. When a preprocessed image is used for the weight function, all the nonlocal regularizers outperform the traditional local ones, reducing the staircase effect and recovering details better. Among the nonlocal regularizers, NL/TV and $NL/MS_{TV}$ seem to give better results than $NL/MS_{H^1}$ in terms of SNR, but $NL/MS_{H^1}$ looks visually more natural, preserving texture and details better, especially at high noise density (see Fig. 4). Moreover, at high noise density, $MS_{H^1}$ has difficulty restoring images blurred with a Gaussian kernel, while it works satisfactorily with other blur kernels such as motion blur. In contrast, $NL/MS_{H^1}$ performs very well with Gaussian blur and also produces better results with the other blur kernels, as can be seen in Figs. 4 and 5. In Fig. 4, with Gaussian blur and high noise density d = 0.3, $MS_{H^1}$ suffers from some noise-induced artifacts, while $MS_{TV}$ and TV give cleaner results. On the other hand, $NL/MS_{H^1}$ provides a visually better result than the other nonlocal regularizers by preserving fine structures. Even though $NL/MS_{TV}$ gives the highest SNR, its result looks more cartoon-like, since texture is suppressed, especially in the hat. In this case we therefore visually prefer $NL/MS_{H^1}$. Based on these results, in Fig. 5 we only compare $MS_{H^1}$ and $NL/MS_{H^1}$, with different blur kernels, both impulse noise models, and the higher noise density d = 0.4.
As expected, $NL/MS_{H^1}$ produces better results than $MS_{H^1}$ for both blur kernels; in particular, with Gaussian blur its results show no artifacts, unlike those of $MS_{H^1}$. Finally, we note that for the MS regularizers the parameters α, β and ε were selected manually to give the best SNR. The smoothness parameter β increases with the noise level, while the other parameters α and ε are kept approximately fixed. Regarding computational time, constructing the weight function for a 256 × 256 image with an 11 × 11 search window and 5 × 5 patches takes about 5 minutes in MATLAB on a dual-core laptop with a 2 GHz processor and 2 GB of memory. Minimizing the (local or nonlocal) MS functionals takes around 60 seconds, computing u with an explicit gradient-descent scheme and v with a semi-implicit scheme, for a total of 5 × (100 + 5) iterations, while the (local or nonlocal) TV model, minimized by explicit gradient descent, takes less than 55 seconds for 500 iterations.
Acknowledgments
This work has been supported by the National Science Foundation Grants DMS-0714945 and DMS-0312222.
References
1. Alicandro, R., Braides, A., Shah, J.: Free-discontinuity problems via functionals involving the L1-norm of the gradient and their approximation. Interfaces Free Bound. 1, 17–37 (1999)
2. Alliney, S.: Digital Filters as Absolute Norm Regularizers. IEEE TSP 40(6), 1548–1562 (1992)
3. Ambrosio, L., Tortorelli, V.M.: On the approximation of free discontinuity problems. Boll. Un. Mat. Ital. 6-B, 105–123 (1992)
4. Bar, L., Sochen, N., Kiryati, N.: Semi-Blind Image Restoration via Mumford-Shah Regularization. IEEE TIP 15(2), 483–493 (2006)
5. Bar, L., Sochen, N., Kiryati, N.: Image deblurring in the presence of impulsive noise. IJCV 70, 279–298 (2006)
6. Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics 7, 200–217 (1967)
7. Bresson, X., Chan, T.F.: Non-local unsupervised variational image segmentation models. UCLA C.A.M. Report 08-67 (2008)
8. Buades, A., Coll, B., Morel, J.M.: A review of image denoising algorithms, with a new one. SIAM MMS 4(2), 490–530 (2005)
9. Geman, D., Reynolds, G.: Constrained Restoration and the Recovery of Discontinuities. IEEE TPAMI 14(3), 367–383 (1992)
10. Gilboa, G., Osher, S.: Nonlocal linear image regularization and supervised segmentation. SIAM MMS 6(2), 595–630 (2007)
11. Gilboa, G., Osher, S.: Nonlocal operators with applications to image processing. SIAM MMS 7(3), 1005–1028 (2008)
12. Jung, M., Vese, L.A.: Image restoration via nonlocal Mumford-Shah regularizers. UCLA C.A.M. Report 09-09 (2009)
13. Kindermann, S., Osher, S., Jones, P.W.: Deblurring and denoising of images by nonlocal functionals. SIAM MMS 4(4), 1091–1115 (2005)
14. Lou, Y., Zhang, X., Osher, S., Bertozzi, A.: Image recovery via nonlocal operators. UCLA C.A.M. Report 08-35 (2008)
15. Meyer, Y.: Oscillating Patterns in Image Processing and Nonlinear Evolution Equations. Univ. Lecture Ser. 22 (2002)
16. Mumford, D., Shah, J.: Optimal approximations by piecewise smooth functions and associated variational problems. Comm. Pure Appl. Math. 42, 577–685 (1989)
17. Nikolova, M.: Minimizers of cost-functions involving non-smooth data-fidelity terms. Application to the processing of outliers. SIAM Num. Anal. 40(3), 965–994 (2002)
18. Osher, S., Burger, M., Goldfarb, D., Xu, J., Yin, W.: An iterative regularization method for total variation based image restoration. SIAM MMS 4, 460–489 (2005)
19. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D 60, 259–268 (1992)
20. Rudin, L., Osher, S.: Total variation based image restoration with free local constraints. IEEE ICIP 1, 31–35 (1994)
21. Shah, J.: A common framework for curve evolution, segmentation and anisotropic diffusion. In: IEEE CVPR, pp. 136–142 (1996)
22. Tadmor, E., Nezzar, S., Vese, L.: Multiscale hierarchical decomposition of images with applications to deblurring, denoising and segmentation. CMS 6(2), 281–307 (2008)
23. Tikhonov, A., Arsenin, V.: Solutions of ill-posed problems. Wiley, New York (1977)
24. Yaroslavsky, L.P.: Digital image processing: An Introduction. Springer, Heidelberg (1985)
A Geometric PDE for Interpolation of M-Channel Data

Frank Lenzen¹ and Otmar Scherzer¹,²

¹ Department of Mathematics, University of Innsbruck, Technikerstrasse 21a, A-6020 Innsbruck, Austria
{Frank.Lenzen,Otmar.Scherzer}@uibk.ac.at
http://infmath.uibk.ac.at
² Johann Radon Institute for Computational and Applied Mathematics (RICAM), Austrian Academy of Sciences, Altenbergerstrasse 69, A-4040 Linz, Austria
Abstract. We propose a partial differential equation to be used for interpolating M -channel data, such as digital color images. This equation is derived via a semi-group from a variational regularization method for minimizing displacement errors. For actual image interpolation, the solution of the PDE is projected onto a space of functions satisfying interpolation constraints. A comparison of the test results with standard and state-of-the-art interpolation algorithms shows the competitiveness of this approach.
1 Introduction
A frequent task in image processing is interpolation, which we understand as the process of assigning an interpolating function to a discrete set of pixel positions and the corresponding discrete M-channel image data (e.g. RGB color data). Interpolation is frequently used for zooming into or scaling digital images. A special kind of image interpolation problem is inpainting, i.e. the problem of reconstructing lost or corrupted parts of images. Linear interpolation (that is, convolution methods) [18], such as nearest neighbor, spline, and Whittaker-Shannon interpolation [14, 4], is computationally efficient but produces unpleasant artifacts. On the other hand, nonlinear methods that adapt to geometric structures can produce visually more attractive results but are computationally more demanding. Nowadays, most of these nonlinear methods are motivated by energy minimization or by scale spaces of partial differential equations; see for example [1, 22, 21, 18]. Nonlinear methods are particularly widely used for inpainting; see for example [2, 5, 6, 23]. In this paper we derive a partial differential equation that is designed to correct and filter displacement errors in M-channel data. Combined with the interpolation ideas of [11, 16], this method is suited for interpolation.

The paper is organized as follows: In Section 2 we consider a variational ansatz for correcting displacement errors. Applying semi-group concepts yields a PDE, which can be considered the gradient flow of the variational problem. A relationship between our PDE and the Mean Curvature Flow (MCF) equation is established. Our approach is combined with interpolation constraints in Section 3. For comparison, Section 4 shows results of the proposed method and of interpolation methods from the scale-space literature; in particular, we consider the GREYCstoration software of Tschumperlé [21] and the interpolation method proposed by Roussos and Maragos [18,19]. The paper ends with a conclusion in Section 5.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 413–425, 2009. © Springer-Verlag Berlin Heidelberg 2009
2 Displacement Regularization
Let $u : \Omega \to \mathbb{R}^M$ be an M-channel function representing continuous M-channel data on a bounded open domain $\Omega \subseteq \mathbb{R}^2$. We presume the following image acquisition model: data $u^{(0)}$ of $u$ are given, which satisfy
$$u^{(0)} = u \circ \Phi, \tag{1}$$
where $\Phi : \Omega \to \Omega$ is a displacement vector field. In the following we consider the problem of finding $(u, \Phi)$ satisfying (1) such that the displacement $\Phi - \mathrm{Id}$ is small and $u$ has minimal total variation. A variational method corresponding to this problem consists in the minimization of
$$\frac{1}{2}\int_\Omega |\Phi(x) - x|^2\,dx + \alpha\int_\Omega |\nabla u(x)|\,dx \tag{2}$$
for small $\alpha > 0$ over the set of functions satisfying $u^{(0)} = u \circ \Phi$. Here
$$\nabla u = \begin{pmatrix} \partial_1 u_1 & \partial_1 u_2 & \cdots & \partial_1 u_M \\ \partial_2 u_1 & \partial_2 u_2 & \cdots & \partial_2 u_M \end{pmatrix} \quad\text{and}\quad |\nabla u(x)| = \Big(\sum_{j=1}^{M}\sum_{i=1}^{2}\big(\partial_i u_j(x)\big)^2\Big)^{1/2}.$$
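The channel-coupled norm $|\nabla u(x)|$ in (2) can be illustrated with a short discrete sketch. It assumes (our choice, not the paper's) an image stored as a NumPy array of shape (M, rows, columns), unit grid spacing, and forward differences with a zero difference at the last row and column; `vector_tv` is a hypothetical helper name.

```python
import numpy as np

def vector_tv(u, h=1.0):
    """Discrete analogue of the TV term  int |grad u(x)| dx  in (2).

    u has shape (M, rows, cols). Forward differences are used, with a zero
    difference at the last row/column (an assumption of this sketch).
    """
    u = np.asarray(u, dtype=float)
    # forward differences along x (axis 2) and y (axis 1)
    dx = np.diff(u, axis=2, append=u[:, :, -1:]) / h
    dy = np.diff(u, axis=1, append=u[:, -1:, :]) / h
    # |grad u| = sqrt(sum_j sum_i (d_i u_j)^2): one norm over all channels and both directions
    norm = np.sqrt(np.sum(dx**2 + dy**2, axis=0))
    return norm.sum() * h * h
```

For a vertical step edge of height 1 spanning four rows, the sketch returns 4.0 for one channel and $4\sqrt{2}$ (not 8) for two identical channels: the Frobenius norm couples the channels, so edges at the same location in several channels are penalized jointly.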
A Geometric PDE for Interpolation of M-Channel Data

We want to avoid solving a coupled system for $(u, \Phi)$, and therefore we assume that $u$ is a smooth function, so that we can make a first order Taylor series expansion. Then it follows from our modeling assumptions that
$$u^{(0)}(x) = (u \circ \Phi)(x) = u\big(x + (\Phi(x) - x)\big) \approx u(x) + \nabla u^T(x)\,(\Phi(x) - x). \tag{3}$$
Here, $\approx$ indicates that the left hand side approximates the right hand side for small displacements $\Phi - \mathrm{Id}$. In the following, we assume that equality holds instead of $\approx$, which amounts to assuming that only small displacements occur. Note that the equation $\nabla u^T(x)(\Phi(x) - x) = u^{(0)}(x) - u(x)$, which consists of $M$ equations for the two unknown components of $\Phi(x) - x$, is overdetermined for $M > 2$. If the difference $u^{(0)}(x) - u(x)$ is not caused solely by a distortion $\Phi$, this equation may have no solution. To overcome this problem, we consider the minimization of
$$\big|\nabla u^T(x)(\Phi(x) - x) - u^{(0)}(x) + u(x)\big|^2, \tag{4}$$
that is, we search for the displacement vector $\Phi(x) - x$ which best fits the data $(u^{(0)}(x), u(x))$. The minimizer of (4) is given by
$$\Phi(x) - x = \big(\nabla u^T(x)\big)^\dagger\big(u^{(0)}(x) - u(x)\big), \tag{5}$$
where $(\nabla u^T(x))^\dagger$ denotes the pseudo-inverse (see [17]) of $\nabla u^T(x)$. For notational convenience, we omit the dependence of $u$ on $x$ in the following. Inserting (5) into (2) gives the functional
$$F^{0}_{u^{(0)}}(u) := \int_\Omega \frac{1}{2}(u - u^{(0)})^T(\nabla u^T\nabla u)^\dagger(u - u^{(0)}) + \alpha|\nabla u|\,dx. \tag{6}$$
In order to avoid computation of the pseudo-inverse, we additionally regularize the possibly singular matrix $\nabla u^T\nabla u$ by the regular, symmetric, and strictly positive definite matrix $(\varepsilon I + \nabla u^T\nabla u)$ with some $\varepsilon > 0$. To summarize, in the sequel we consider the variational problem of minimizing
$$F^{\varepsilon}_{u^{(0)}}(u) := \int_\Omega \frac{1}{2}(u - u^{(0)})^T(\varepsilon I + \nabla u^T\nabla u)^{-1}(u - u^{(0)}) + \alpha|\nabla u|\,dx. \tag{7}$$
For this functional, the existence theory within the classical framework of the Calculus of Variations [7, 8] is not applicable. Moreover, for a theoretical analysis, minimization has in fact to be considered over the space of M-channel functions with components of finite total variation. In order to implement the minimization of $F^{\varepsilon}_{v}$ numerically, quasi-convexification techniques would be most efficient. This approach requires the analytical calculation of the quasi-convex envelope of the function
$$(x, \xi, \nu) \mapsto \frac{1}{2}(\xi - v(x))^T(\varepsilon I + \nu^T\nu)^{-1}(\xi - v(x)) + \alpha|\nu|$$
with respect to $\nu$. However, this quasi-convex envelope is not known so far, and thus efficient numerical minimization based on this approach is not at hand.

In the following we recall the convex semi-group solution concept [3]: Let $R : H \to \mathbb{R} \cup \{\infty\}$ be a convex functional on a Hilbert space $H$, and let $u_\alpha$ be a minimizer of the variational regularization functional
$$G_{u^{(0)}}(u) := \frac{1}{2}\big\|u - u^{(0)}\big\|_H^2 + \alpha R(u).$$
Then, for $u^{(0)}$ sufficiently smooth, $(u_\alpha - u^{(0)})/\alpha$ converges for $\alpha \to 0$ to an element of $-\partial R(u^{(0)})$, where $\partial R$ denotes the subdifferential of $R$. Choosing $u^{(k)} \in \operatorname{argmin} G_{u^{(k-1)}}$, iterative minimization yields an approximation of the solution of the flow
$$\frac{\partial u}{\partial t} \in -\partial R(u)$$
at time $t = k\alpha$. In other words, variational regularization approximates a diffusion filtering scale space, which is given by the associated gradient flow equation. For convex semi-groups, the solutions of diffusion filtering and variational methods are comparable and look rather similar [20]. We expect a similar behavior for the non-convex functional $F^{\varepsilon}_{u^{(0)}}$ and derive the corresponding flow equation, which is the gradient flow associated with (7). We use the abbreviations
$$A^{\varepsilon}(u) := \big(\varepsilon I + \nabla u^T\nabla u\big)^{-1} \quad\text{and}\quad S^{\varepsilon}_{u^{(k-1)}}(u) := \frac{1}{2}\int_\Omega (u - u^{(k-1)})^T A^{\varepsilon}(u)\,(u - u^{(k-1)})\,dx.$$
The directional derivative of $S^{\varepsilon}_{u^{(k-1)}}$ at $u$ in direction $\phi$ (provided it exists) satisfies
$$\partial_\tau S^{\varepsilon}_{u^{(k-1)}}(u + \tau\phi)\Big|_{\tau=0} = \int_\Omega \phi^T A^{\varepsilon}(u)(u - u^{(k-1)})\,dx + \frac{1}{2}\int_\Omega (u - u^{(k-1)})^T\,\partial_{u,\phi}A^{\varepsilon}(u)\,(u - u^{(k-1)})\,dx, \tag{8}$$
where
$$\partial_{u,\phi}A^{\varepsilon}(u) := \lim_{\tau \to 0}\frac{A^{\varepsilon}(u + \tau\phi) - A^{\varepsilon}(u)}{\tau}.$$
In a similar way, the directional derivative of $R_\alpha(u) := \alpha\int_\Omega |\nabla u|$ at $u$ in direction $\phi$ can be derived formally:
$$\partial_\tau R_\alpha(u + \tau\phi)\Big|_{\tau=0} = \alpha\int_\Omega \nabla\phi^T\frac{\nabla u}{|\nabla u|}\,dx. \tag{9}$$
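The pointwise algebra behind (5)-(7) can be checked numerically. The sketch below (all helper names are ours) treats a single pixel: `grad_u` is the 2 x M Jacobian of (2), `r` stands for $u^{(0)} - u$, the least-squares displacement (5) is computed with `numpy.linalg.lstsq`, and the pseudo-inverse of the symmetric matrix $\nabla u^T\nabla u$ is formed from its eigendecomposition with a relative cut-off that we chose arbitrarily.

```python
import numpy as np

def sym_pinv(B, tol=1e-10):
    """Pseudo-inverse of a symmetric positive semi-definite matrix via eigh;
    eigenvalues below tol * (largest eigenvalue) are treated as zero."""
    w, V = np.linalg.eigh(B)
    winv = np.zeros_like(w)
    keep = w > tol * max(w.max(), 0.0)
    winv[keep] = 1.0 / w[keep]
    return (V * winv) @ V.T

def displacement(grad_u, r):
    """Minimizer (5) of (4): least-squares solution of grad_u^T d = r.
    grad_u has shape (2, M); r = u0 - u has shape (M,)."""
    d, *_ = np.linalg.lstsq(grad_u.T, r, rcond=None)
    return d

def data_term(grad_u, r, eps=0.0):
    """Pointwise data term: 0.5 r^T (grad_u^T grad_u)^+ r as in (6) for eps = 0,
    or 0.5 r^T A^eps r with A^eps = (eps I + grad_u^T grad_u)^{-1} as in (7)."""
    B = grad_u.T @ grad_u                      # M x M matrix of rank <= 2
    if eps == 0.0:
        return 0.5 * r @ sym_pinv(B) @ r
    return 0.5 * r @ np.linalg.solve(eps * np.eye(B.shape[0]) + B, r)
```

With random data, $\frac{1}{2}|\Phi(x) - x|^2$ computed from (5) agrees with the data term of (6), and for $u - u^{(0)}$ in the range of $\nabla u^T\nabla u$ the regularized term of (7) approaches it as $\varepsilon \to 0$. Components of $u - u^{(0)}$ in the null space of $\nabla u^T\nabla u$, which always exists for $M > 2$, are instead weighted by $1/\varepsilon$ in (7).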
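Note that the right hand side of (9) is meant as the subdifferential of the TV semi-norm evaluated in the direction of $\phi$. Using (8) and (9), the optimality condition for the minimizer $u^{(k)}$ of $F^{\varepsilon}_{u^{(k-1)}}$ reads as
$$\int_\Omega \phi^T A^{\varepsilon}(u^{(k)})\,\frac{u^{(k)} - u^{(k-1)}}{\alpha}\,dx + \frac{1}{2\alpha}\int_\Omega (u^{(k)} - u^{(k-1)})^T\,\partial_{u^{(k)},\phi}A^{\varepsilon}(u^{(k)})\,(u^{(k)} - u^{(k-1)})\,dx = -\int_\Omega \nabla\phi^T\frac{\nabla u^{(k)}}{|\nabla u^{(k)}|}\,dx. \tag{10}$$
Let $t > 0$ be fixed and $k = t/\alpha$. Then, as in the convex case, we can expect that $(u^{(k)} - u^{(k-1)})/\alpha$ converges to $\partial_t u(t)$ for $\alpha \to 0$. It follows that $u^{(k)} - u^{(k-1)} \to 0$, so the second term on the left of (10) vanishes in the limit, and from (10) we obtain
$$\int_\Omega \phi^T A^{\varepsilon}(u(t))\,\partial_t u(t)\,dx = -\int_\Omega \nabla\phi^T\frac{\nabla u(t)}{|\nabla u(t)|}\,dx. \tag{11}$$
Using Green's formula and the fundamental lemma of the calculus of variations, from (11) the strong formulation
$$A^{\varepsilon}(u(t))\,\partial_t u(t) = \big(\varepsilon I + \nabla u^T(t)\nabla u(t)\big)^{-1}\partial_t u(t) = \nabla\cdot\frac{\nabla u(t)}{|\nabla u(t)|} \tag{12}$$
follows, where $u(t)$ satisfies natural (Neumann) boundary conditions.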
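The semi-group heuristic used above, namely that $(u^{(k)} - u^{(k-1)})/\alpha$ approximates $\partial_t u$, can be illustrated in the convex setting it is borrowed from. The sketch assumes a quadratic functional $R(u) = \frac{1}{2}u^T L u$ with a 1-D discrete Neumann Laplacian $L$ (our choice); each minimization of $G_{u^{(k-1)}}$ is then one implicit Euler step $(I + \alpha L)u^{(k)} = u^{(k-1)}$, and the exact gradient flow $\partial_t u = -Lu$ is evaluated through the spectral decomposition of $L$. It says nothing about the non-convex functional (7).

```python
import numpy as np

def neumann_laplacian(n):
    """Matrix L of the convex quadratic R(u) = 0.5 * u^T L u (discrete Dirichlet energy)."""
    L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    L[0, 0] = L[-1, -1] = 1.0          # Neumann boundary: rows sum to zero
    return L

def iterated_minimization(u0, L, t, alpha):
    """k = t/alpha minimizations of 0.5*|u - u_prev|^2 + alpha*R(u),
    i.e. repeated solves of (I + alpha L) u_k = u_{k-1}."""
    k = int(round(t / alpha))
    A = np.eye(len(u0)) + alpha * L
    u = u0.copy()
    for _ in range(k):
        u = np.linalg.solve(A, u)
    return u

def gradient_flow(u0, L, t):
    """Exact solution of du/dt = -L u, the gradient flow of R, at time t."""
    w, V = np.linalg.eigh(L)
    return V @ (np.exp(-t * w) * (V.T @ u0))
```

Reducing $\alpha$ from 0.1 to 0.01 at fixed $t = 1$ reduces the distance between the iterated minimizers and the flow roughly proportionally, as expected for a first-order time discretization; both also preserve the mean value because of the Neumann boundary.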
In the following, we omit the dependence of $u$ on $t$ for notational convenience. Multiplying both sides of (12) by $(\varepsilon I + \nabla u^T\nabla u)$, we get
$$\partial_t u = (\varepsilon I + \nabla u^T\nabla u)\,\nabla\cdot\frac{\nabla u}{|\nabla u|}. \tag{13}$$
Moreover, the initial condition associated with the flow is $u|_{t=0} = u^{(0)}$. Now, letting $\varepsilon \to 0$, which only seems to make sense mathematically if $M \le 2$, we obtain the evolutionary partial differential equation
$$\partial_t u = (\nabla u^T\nabla u)\,\nabla\cdot\frac{\nabla u}{|\nabla u|}. \tag{14}$$

Remark 1. For scalar data ($M = 1$), equation (14) reads as
$$\partial_t u = |\nabla u|^2\,\nabla\cdot\frac{\nabla u}{|\nabla u|}. \tag{15}$$
One recognizes that (15) differs from the Mean Curvature Flow equation by the leading factor $|\nabla u|^2$ instead of $|\nabla u|$.

We generalize the functional in (6) to
$$\int_\Omega \frac{1}{2}(u - u^{(0)})^T\big((\nabla u^T\nabla u)^p\big)^\dagger(u - u^{(0)}) + \alpha|\nabla u|\,dx \tag{16}$$
with $p \ge 0$, where the power of a matrix is defined via its spectral decomposition. The case $p = 1/2$ is of particular interest, because
- the functional (16) becomes invariant (up to a constant factor) under affine rescaling of the image brightness;
- the semi-group approach (see also [10] for the scalar case) gives the gradient flow
$$\partial_t u = (\nabla u^T\nabla u)^{1/2}\,\nabla\cdot\frac{\nabla u}{|\nabla u|},$$
which, in the scalar case, is the Mean Curvature Flow equation.

For an analytical comparison of the solution of (16) for scalar, radially symmetric, monotone data with the MCF solution, we refer to [9].
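A discrete right-hand side for the flows (13), (14) and the $p = 1/2$ flow can be sketched as follows. Assumptions of this sketch, not of the paper: arrays of shape (M, rows, columns), central differences via `numpy.gradient`, a small constant `beta` added under the square root of $|\nabla u|$ to avoid division by zero, and the matrix power taken pixelwise through `numpy.linalg.eigh`. For $\varepsilon > 0$ together with $p \ne 1$ the function computes $(\varepsilon I + \nabla u^T\nabla u)^p$, a combination not discussed in the text.

```python
import numpy as np

def curvature_flow_rhs(u, p=1.0, eps=0.0, beta=1e-8):
    """Sketch of (eps*I + grad u^T grad u)^p div(grad u / |grad u|) for u of shape (M, rows, cols).

    p = 1 with eps > 0 mimics (13), p = 1 with eps = 0 mimics (14), and p = 0.5 with
    eps = 0 mimics the p = 1/2 flow. beta > 0 keeps |grad u| away from zero (our assumption).
    """
    u = np.asarray(u, dtype=float)
    M = u.shape[0]
    uy, ux = np.gradient(u, axis=(1, 2))                  # central differences per channel
    norm = np.sqrt(np.sum(ux**2 + uy**2, axis=0) + beta)  # channel-coupled |grad u|
    # div(grad u_j / |grad u|) for every channel j, shape (M, rows, cols)
    div = np.gradient(ux / norm, axis=2) + np.gradient(uy / norm, axis=1)
    # pixelwise M x M matrix grad u^T grad u, shape (rows, cols, M, M)
    J = np.einsum('jyx,lyx->yxjl', ux, ux) + np.einsum('jyx,lyx->yxjl', uy, uy)
    J = J + eps * np.eye(M)
    # matrix power through the spectral decomposition J = V diag(w) V^T
    w, V = np.linalg.eigh(J)
    Jp = np.einsum('yxjk,yxk,yxlk->yxjl', V, np.clip(w, 0.0, None) ** p, V)
    # apply the M x M matrix at each pixel to the vector of curvature terms
    return np.einsum('yxjl,lyx->jyx', Jp, div)
```

An explicit Euler step is then `u + dt * curvature_flow_rhs(u, p, eps)` with a small `dt`. For $M = 1$ and $p = 1/2$ the result reduces, up to the regularization `beta`, to $|\nabla u|\,\nabla\cdot(\nabla u/|\nabla u|)$, the level-set form of Mean Curvature Flow.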
3 Interpolation of M-Channel Data
The evolution equation (14) can be used for interpolating discrete M-channel data by restricting $u$ to satisfy interpolation constraints. The problem of interpolating M-channel data has already been studied in the literature, see for example [1, 21, 18, 19]. The approaches of [21, 18, 19] differ from ours in the PDE used for filtering: [21, 18, 19] use anisotropic diffusion, whereas the PDE (14) generalizes the Mean Curvature Flow equation.
To begin with, we recall the interpolation constraints proposed in [11, 16]. For simplicity of notation we restrict ourselves to M-channel data defined on a two-dimensional rectangular domain
$$\Omega := \Big(\frac{1}{2}, N_x + \frac{1}{2}\Big) \times \Big(\frac{1}{2}, N_y + \frac{1}{2}\Big),$$
where $N_x, N_y \in \mathbb{N}$. The domain is partitioned into cells ('pixels')
$$Q_{i,j} := \Big[i - \frac{1}{2}, i + \frac{1}{2}\Big] \times \Big[j - \frac{1}{2}, j + \frac{1}{2}\Big], \qquad (i, j) = (1, 1), (1, 2), \ldots, (N_x, N_y).$$
Let $G$ be a kernel function defined on $\mathbb{R}^2$ and compactly supported in $[-\frac{1}{2}, \frac{1}{2}]^2$. Let $Z := (z_{m,i,j})$ be a tensor denoting sampled data of a function $G * u : \mathbb{R}^2 \to \mathbb{R}^M$ at the positions $(i, j)$, where $*$ denotes the convolution operator. In particular,
$$z_{m,i,j} := (G * u_m)(i, j), \qquad (m, i, j) = (1, 1, 1), (1, 1, 2), \ldots, (M, N_x, N_y). \tag{17}$$
Examples of kernel functions typically used in the literature are listed in [18]. For a point-symmetric kernel $G$, we can rewrite (17) as follows: let $G_{i,j} := G(\cdot - (i, j))$; then
$$z_{m,i,j} = \langle G_{i,j}, u_m\rangle_{L^2(\Omega)}, \qquad (m, i, j) = (1, 1, 1), \ldots, (M, N_x, N_y).$$
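On a computer, (17) is typically evaluated on a fine grid. The sketch below assumes, as a hypothetical discretization, that each cell $Q_{i,j}$ is resolved by $s \times s$ subpixels of width $1/s$, that the kernel is given by its samples at the subpixel centres, and that the $L^2$ inner product is replaced by the midpoint rule; array rows correspond to $j$ ($y$) and columns to $i$ ($x$). The function name is ours.

```python
import numpy as np

def sample_constraints(u, g, s):
    """Discrete analogue of (17): z[m, a, b] = <G_{i,j}, u_m>_{L2} for the cell in
    coarse row a (j = a + 1) and coarse column b (i = b + 1).

    u has shape (M, s*Ny, s*Nx): every cell is resolved by s x s subpixels of width 1/s.
    g has shape (s, s): the kernel G sampled at the subpixel centres of [-1/2, 1/2]^2.
    The inner product uses the midpoint rule with weight 1/s^2 per subpixel.
    """
    M, H, W = u.shape
    blocks = u.reshape(M, H // s, s, W // s, s)  # (channel, cell row, subrow, cell col, subcol)
    return np.einsum('mapbq,pq->mab', blocks, g) / s**2
```

With a box kernel of unit integral (all samples equal to 1), the sampled values are simply the cell averages of each channel.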
We say that an M-channel function $u = (u_1, \ldots, u_M)$ satisfies the interpolation constraints for some discrete data $Z = (z_{m,i,j})$ if $\langle G_{i,j}, u_m\rangle_{L^2(\Omega)} = z_{m,i,j}$ for all $(m, i, j)$. The set of functions satisfying the interpolation constraints for data $Z$ is denoted by $U_{Z,G}$.

Example 1. We consider for $G$ the two-dimensional $\delta$ distribution, i.e., $G(x, y) = \delta(x)\delta(y)$. Then $z_{m,i,j} = u_m((i, j))$. The nearest neighbor (componentwise, piecewise constant) interpolation reads as
$$u^{(0)}_m\big|_{Q_{i,j}} = z_{m,i,j}, \qquad (m, i, j) = (1, 1, 1), \ldots, (M, N_x, N_y).$$
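On a subpixel grid with $s \times s$ subpixels per cell (an assumption of this sketch), the nearest neighbor interpolation of Example 1 is a block replication of the coarse data. The helper name is hypothetical.

```python
import numpy as np

def nearest_neighbor(z, s):
    """Piecewise-constant interpolation of Example 1: all s x s subpixels of
    cell Q_{i,j} receive the value z[m, j-1, i-1] (array rows index y)."""
    return np.repeat(np.repeat(z, s, axis=1), s, axis=2)
```

Because such a $u^{(0)}$ is constant on every cell, $\langle G_{i,j}, u^{(0)}_m\rangle = z_{m,i,j}\int G$; hence for any kernel supported in $[-\frac{1}{2}, \frac{1}{2}]^2$ with unit integral it satisfies the interpolation constraints, which makes it a natural element of $U_{Z,G}$ to start from.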
Here, $u^{(0)} = u \circ \Phi$, where $\Phi(x, y)|_{Q_{i,j}} = (i, j)$. In particular, $u^{(0)}$ can be interpreted as a distortion of $u$ by a local sampling displacement $\Phi$. Now let $u^{(0)} \in U_{Z,G}$ be arbitrary. The nearest neighbor interpolation in Example 1 motivates the assumption that, for a sampled function $u$, there exists $\Phi$ such that $u^{(0)} = u \circ \Phi$. Recalling the concepts presented in Section 2, we consider the functional defined in (7) restricted to the set $U_{Z,G}$ in order to reconstruct $u$ from a given $u^{(0)}$. Accordingly, we restrict the flow equation (13) to $U_{Z,G}$:
$$\partial_t u = P_{U_{0,G}}\Big((\varepsilon I + \nabla u^T\nabla u)\,\nabla\cdot\frac{\nabla u}{|\nabla u|}\Big), \tag{18}$$
where
$$P_{U_{0,G}}(v) = v - \|G\|_{L^2(\mathbb{R}^2)}^{-2}\sum_{i=1}^{N_x}\sum_{j=1}^{N_y}\langle G_{i,j}, v\rangle_{L^2(\Omega)}\,G_{i,j}$$
is applied to each component separately. Note that the assumption $u^{(0)} \in U_{Z,G}$ together with $\partial_t u \in U_{0,G}$ ensures that the solution $u(t)$, $t \ge 0$, stays in $U_{Z,G}$. At this point we remark that there is no analytical theory guaranteeing the well-posedness of (18). Since (18) involves a projection, a time-explicit scheme with a sufficiently small step size $\Delta t$ is required to solve (18) numerically.
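Once discretized, the projection $P_{U_{0,G}}$ becomes a per-cell operation, because the functions $G_{i,j}$ have pairwise disjoint supports and are therefore mutually orthogonal. The sketch assumes the same hypothetical subpixel grid as above ($s \times s$ subpixels per cell, midpoint rule) and, for the kernel, a truncated and normalized Gaussian of the form used in Section 4; we interpret $\sigma^2$ in units of the cell width, which the text does not specify. Helper names are ours.

```python
import numpy as np

def gaussian_cell_kernel(s, sigma2):
    """Truncated Gaussian on [-1/2, 1/2]^2, sampled at s x s subpixel centres and
    normalized to unit midpoint-rule integral (cell width 1, subpixel width 1/s)."""
    c = (np.arange(s) + 0.5) / s - 0.5          # subpixel centres in [-1/2, 1/2]
    X, Y = np.meshgrid(c, c)
    g = np.exp(-(X**2 + Y**2) / (2.0 * sigma2))
    return g / (g.sum() / s**2)

def project(v, g, s):
    """Discrete P_{U_{0,G}}: subtract from each channel its component along every G_{i,j}.

    Disjoint supports decouple the double sum in (18) into independent per-cell
    projections; the factor 1/s^2 of the midpoint rule cancels in <G,v>/||G||^2.
    """
    M, H, W = v.shape
    blocks = v.reshape(M, H // s, s, W // s, s)
    coeff = np.einsum('mapbq,pq->mab', blocks, g) / np.sum(g * g)   # <G_ij, v> / ||G||^2
    blocks = blocks - coeff[:, :, None, :, None] * g[None, None, :, None, :]
    return blocks.reshape(M, H, W)
```

A time-explicit scheme for (18) then repeats $u \leftarrow u + \Delta t\,P_{U_{0,G}}(w(u))$, where $w(u)$ is some discretization of $(\varepsilon I + \nabla u^T\nabla u)\,\nabla\cdot(\nabla u/|\nabla u|)$. Starting from $u^{(0)} \in U_{Z,G}$, every iterate keeps the samples $Z$ up to rounding, and applying the projection twice gives the same result as applying it once.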
4 Numerical Results
We compare our method, which consists in numerically solving (18), with two standard interpolation methods, namely nearest neighbor and cubic interpolation, as well as with established, sophisticated interpolation methods proposed by Tschumperlé & Deriche [21] and by Roussos & Maragos [19]. The method of Tschumperlé & Deriche is implemented in the GREYCstoration software (see http://cimg.sourceforge.net/greycstoration/); for the method of Roussos & Maragos, test results are available from the site http://cvsp.cs.ntua.gr/~tassos/PDEinterp/ssvm07res/. In our method, the kernel function has to be chosen appropriately. We use
$$G(x, y) := \frac{\chi_{[-\frac{1}{2}, \frac{1}{2}]^2}(x, y)\,g_\sigma(x, y)}{\int_{[-\frac{1}{2}, \frac{1}{2}]^2} g_\sigma(x, y)\,dx\,dy},$$
where $g_\sigma$ is the two-dimensional isotropic Gaussian with standard deviation $\sigma$. In our method, a value of 20 is used for the variance $\sigma^2$. For evaluating the methods, we use the two test images shown in Fig. 1. For both images, a low and a high resolution version are available, where the low resolution image is obtained from the high resolution image via low-pass filtering (convolution with a bicubic spline) and downsampling by a factor of four, see [19]. The test images were provided by Roussos & Maragos. The methods mentioned above are used to upsample the low resolution image by a factor of four. Our method is applied with 100 time steps, $\Delta t = 0.03$, $\varepsilon = 0.05$ and $\sigma^2 = 20$ for the first test image, and with 100 time steps, $\Delta t = 0.05$, $\varepsilon = 0.01$ and $\sigma^2 = 20$ for the second. For GREYCstoration (version 2.9) we use the option '-resize' together with the target size of the high resolution image and the parameters '-anchor true', '-iter 3' and '-dt 10'. For the remaining parameters the default values are used. The results of Roussos' method were obtained from the web site mentioned above.

Fig. 1. Two test images. Each test image is available in a low and a high resolution version, with a factor of four between the two resolutions.

Let us consider the results of upsampling the first test image. In order to highlight the differences between the methods, we compare only details of the resulting images, see Fig. 2. The results of nearest neighbor and cubic interpolation are shown in Fig. 2, top right and middle left, respectively. Both results are unsatisfactory and confirm what is well known from the literature: nearest neighbor interpolation makes the upsampled images look blocky, and cubic interpolation produces blurry images. The result of GREYCstoration with interpolation constraints (Fig. 2, middle row, right) also appears blurry, but reconstructs the edges better than cubic interpolation. The method proposed by Roussos & Maragos as well as our method (see Fig. 2, bottom row) produce sharp and well reconstructed edges. In order to further investigate the differences between the PDE based methods, we zoom into two regions of the second test image, one region containing an edge (see Fig. 3) and one region with texture (see Fig. 4). Fig. 3 shows the edge region after applying the methods proposed by Tschumperlé with interpolation constraints (top row, second from left), Roussos (top row, second from right) and our method (top row, right). For comparison we have plotted the detail of the original image (top row, left). One can see that with Tschumperlé's method the edges appear blurry and irregular. This seems to be an effect of the interpolation constraints, because when Tschumperlé's method is applied without constraints, strong anisotropic diffusion along the edge occurs, so that the edge becomes more regular. The method of Roussos reconstructs the edge sharply, but overshoots appear. Our method is also able to reconstruct the edge sharply, with only slight overshoots.
Concerning the gray mark at the parrot's beak, we observe that Tschumperlé's method reconstructs the shape of the mark better than the other methods do. The differences in the behavior of the methods can also be recognized when applying the Sobel operator to the interpolated images: the thickness of the edges in the result of the Sobel operator indicates the blurriness of the reconstructed edge. We see that the proposed method produces sharper edges than the method by Roussos and more regular edges than the method by Tschumperlé. The overshoots introduced by Roussos' method can also be observed in the output of the Sobel operator; they are far stronger than the overshoots produced by our method.

Fig. 2. Upsampling by a factor of four, detail of the first test image. Top left: original high resolution image; top right: nearest neighbor interpolation; middle left: cubic interpolation; middle right: interpolation using GREYCstoration; bottom left: interpolation method proposed by Roussos et al.; bottom right: proposed interpolation method.

Fig. 3. Detail of an edge in the original and interpolated images (top row: GREYCstoration with interpolation constraints, Roussos' method, and the proposed method) and the result of subsequently applying the Sobel operator (bottom row).

Now we investigate the effect of the interpolation methods on textures. Fig. 4, top left, shows a textured region of the original image. The results of the methods proposed by Tschumperlé (with interpolation constraints) and Roussos are given in Fig. 4, top right and bottom left, respectively. The result of the proposed method is shown in Fig. 4, bottom right. One observes a certain blurriness in the results of Tschumperlé's method. As for the previous result, we point out that incorporating the interpolation constraints seems to have a strong effect on the result: when GREYCstoration is applied without imposing constraints, the results are much more influenced by the anisotropic diffusion, and the edges and the texture are accentuated. In the result of the interpolation method proposed by Roussos, we see a strong effect of the anisotropic diffusion on the texture, so that the result is more visually appealing than the other results. Nevertheless, a comparison with the original image shows that the original and the reconstructed texture differ significantly; in particular, the orientations of the short stripes in the face of the parrot are different. Note that the anisotropic diffusion induced by the direction of the texture also affects the pupil of the parrot. Regarding the result of our method, we remark that the reconstruction of the texture is quite conservative, i.e., the result stays near the initial guess. The blockiness is slightly reduced by the evolution process. Looking at the eye of the parrot, the relation of our method to Mean Curvature Flow can be observed: the pupil is reconstructed as a perfectly circular shape.

Fig. 4. A texture detail of the original (top left) and interpolated images using GREYCstoration (top right), Roussos' method (bottom left) and the proposed method (bottom right).
5 Conclusion
We have proposed a new PDE-based method for the interpolation of color images. The method differs from other state-of-the-art methods by the underlying evolution process: we use a PDE which is a generalized Mean Curvature Flow, whereas other methods are based on anisotropic diffusion. Interpolation constraints are satisfied by projecting the evolution process onto an adequate function space. Numerical tests show that our method is competitive with state-of-the-art interpolation methods. Due to the Mean Curvature Flow nature of the method, edges are well reconstructed. Textures are treated in a conservative manner.
Acknowledgments
We want to thank Gerhard Dziuk (Univ. Freiburg), Peter Elbau (RICAM, Linz) and Markus Grasmair (University Innsbruck) for inspiring discussions. We thank David Tschumperlé for providing GREYCstoration, and Anastasios Roussos and Petros Maragos for providing the test images as well as the results of their algorithm. The work of O.S. is partially funded by the project FSP S 92 (subproject 9203-N12).
References
1. Belahmidi, A., Guichard, F.: A partial differential equation approach to image zoom. In: Proc. of the 2004 Int. Conf. on Image Processing, pp. 649–652 (2004)
2. Bertalmio, M., Sapiro, G., Caselles, V., Ballester, C.: Image inpainting. In: [13], pp. 417–424 (2000)
3. Brézis, H.: Opérateurs maximaux monotones et semi-groupes de contractions dans les espaces de Hilbert. North-Holland Publishing Co., Amsterdam (1973); North-Holland Mathematics Studies, No. 5. Notas de Matemática (50)
4. Burger, W., Burge, M.J.: Digitale Bildverarbeitung. Springer, Heidelberg (2005)
5. Chan, R., Setzer, S., Steidl, G.: Inpainting by flexible Haar wavelet shrinkage. Preprint, University of Mannheim (2008)
6. Chan, T., Kang, S., Shen, J.: Euler's elastica and curvature based inpaintings. SIAM J. Appl. Math. 63(2), 564–592 (2002)
7. Dacorogna, B.: Weak Continuity and Weak Lower Semicontinuity of Non-Linear Functionals. Lecture Notes in Mathematics, vol. 922. Springer, Heidelberg (1982)
8. Dacorogna, B.: Direct Methods in the Calculus of Variations. Applied Mathematical Sciences, vol. 78. Springer, Berlin (1989)
9. Elbau, P., Grasmair, M., Lenzen, F., Scherzer, O.: Evolution by non-convex energy functionals. Reports of FSP S092 - Industrial Geometry 75, University of Innsbruck, Austria (submitted) (2008)
10. Grasmair, M., Lenzen, F., Obereder, A., Scherzer, O., Fuchs, M.: A non-convex PDE scale space. In: [15], pp. 303–315 (2005)
11. Guichard, F., Malgouyres, F.: Total variation based interpolation. In: Proceedings of the European Signal Processing Conference, vol. 3, pp. 1741–1744 (1998)
12. Hagen, H., Weickert, J. (eds.): Visualization and Processing of Tensor Fields. Mathematics and Visualization. Springer, Heidelberg (2006)
13. Hoffmeyer, S. (ed.): Proceedings of the Computer Graphics Conference 2000 (SIGGRAPH 2000). ACM Press, New York (2000)
14. Jähne, B.: Digitale Bildverarbeitung, 5th edn. Springer, Heidelberg (2002)
15. Kimmel, R., Sochen, N.A., Weickert, J. (eds.): Scale-Space 2005. LNCS, vol. 3459. Springer, Heidelberg (2005)
16. Malgouyres, F., Guichard, F.: Edge direction preserving image zooming: a mathematical and numerical analysis. SIAM J. Numer. Anal. 39, 1–37 (2001)
17. Nashed, M.Z. (ed.): Generalized inverses and applications. Academic Press/Harcourt Brace Jovanovich Publishers, New York (1976)
18. Roussos, A., Maragos, P.: Vector-valued image interpolation by an anisotropic diffusion-projection pde. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 104–115. Springer, Heidelberg (2007)
19. Roussos, A., Maragos, P.: Reversible interpolation of vectorial images by an anisotropic diffusion-projection pde. In: Special Issue for the SSVM 2007 conference. Springer, Heidelberg (2007) (accepted for publication)
20. Scherzer, O., Weickert, J.: Relations between regularization and diffusion filtering. J. Math. Imaging Vision 12(1), 43–63 (2000)
21. Tschumperlé, D.: Fast anisotropic smoothing of multi-valued images using curvature-preserving pde's. International Journal of Computer Vision (IJCV) 68, 65–82 (2006)
22. Tschumperlé, D., Deriche, R.: Vector valued image regularization with pdes: A common framework for different applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (2005)
23. Weickert, J., Welk, M.: Tensor field interpolation with pdes. In: [12], pp. 315–325 (2006)
An Edge-Preserving Multilevel Method for Deblurring, Denoising, and Segmentation
Serena Morigi¹, Lothar Reichel², and Fiorella Sgallari¹
¹ Dept. of Mathematics-CIRAM, University of Bologna, Bologna, Italy, {morigi,sgallari}@dm.unibo.it
² Dept. of Mathematical Sciences, Kent State University, Kent, OH 44242, USA
Abstract. We present a fast edge-preserving cascadic multilevel image restoration method for reducing blur and noise in contaminated images. The method also can be applied to segmentation. Our multilevel method blends linear algebra and partial differential equation techniques. Regularization is achieved by truncated iteration on each level. Prolongation is carried out by nonlinear edge-preserving and noise-reducing operators. A thresholding updating technique is shown to reduce “ringing” artifacts. Our algorithm combines deblurring, denoising, and segmentation within a single framework.
1 Introduction
Digital image restoration, reconstruction, and segmentation are important in medical and astronomical imaging, film restoration, as well as in image and video coding. This paper introduces a cascadic multilevel method for simultaneous restoration and segmentation of blurred and noisy images. Blur arises for many reasons, including out-of-focus cameras and camera or object motion during exposure. Blur is often modeled by a point-spread function (PSF). Noise is random, unwanted variation in the brightness of an image. It may originate, e.g., from film grain or from electronic noise in a digital camera or scanner. We consider additive noise in this work. It is well known that linear deblurring methods tend to introduce oscillatory artifacts. Variational deblurring methods are able to reduce these artifacts; however, they typically are much more computationally intensive than linear methods; see, e.g., Welk et al. [15] for a discussion.

Many segmentation methods apply curve evolution techniques. These methods seek to detect object boundaries, represented by closed curves in an image. The contours are represented as the zero level set of an implicit function defined in one dimension higher. The active contours evolve in time according to a Partial Differential Equation (PDE) model, which takes into account intrinsic geometric measures of the image. We will use a variant, proposed by Li et al. [7], of the well-known Geodesic Active Contours (GAC) model [2].

This paper discusses a cascadic multilevel image restoration method that allows both spatially variant and spatially invariant PSFs. The method requires the solution of a linear system of equations on each level. These systems are solved by an iterative method, the choice of which depends on properties of the PSF. We introduce a thresholding updating strategy in order to suppress "ringing." The restriction operators are defined by solving local weighted least-squares problems, and the prolongation operators are determined by piecewise linear prolongation followed by integrating a discretized nonlinear Perona-Malik diffusion equation for a few time-steps. The purpose of the integration is to reduce noise. The cascadic multilevel method so obtained combines the computational efficiency and simplicity of truncated iteration for the solution of linear discrete ill-posed problems with the edge-preserving property of nonlinear models. The multilevel method proceeds from coarser to finer levels, and regularizes by truncated iteration on each level. For many image restoration problems, the multilevel method demands fewer matrix-vector product evaluations on the finest level than the corresponding 1-level truncated iterative method, and often determines restorations of higher quality. A benefit of our multilevel approach to image restoration is that it can easily be combined with image segmentation, as is illustrated in the present paper. We remark that our multilevel method differs significantly from multilevel methods for the solution of well-posed boundary value problems for elliptic partial differential equations, in that the prolongation and restriction operators, as well as the number of iterations on each level, are chosen in a different manner.

This paper is organized as follows. Section 2 introduces the variational deblurring and denoising model, Section 3 discusses the cascadic multilevel framework, and Section 4 presents a few computed examples. Concluding remarks can be found in Section 5.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 426–438, 2009. © Springer-Verlag Berlin Heidelberg 2009
Multilevel Method for Deblurring, Denoising, and Segmentation
2 Deblurring, Denoising, and Segmentation of Images
S. Morigi, L. Reichel, and F. Sgallari

We consider the restoration of two-dimensional gray-scale images that have been contaminated by blur and noise. The available observed blur- and noise-contaminated image $f^\delta$ is related to the unavailable blur- and noise-free image $\hat{u}$ by the degradation model
$$f^\delta(x) = \int_\Omega h(x, y)\,\hat{u}(y)\,dy + \eta^\delta(x), \qquad x \in \Omega, \tag{1}$$
where $\Omega \subset \mathbb{R}^2$ is the image domain, $\eta^\delta$ represents noise in the data, and the kernel $h(x, y)$ models the PSF. If the blur is spatially invariant, then $h$ is of the form $h(x, y) = \tilde{h}(x - y)$ for some function $\tilde{h}$. The kernel is smooth or piecewise smooth and, therefore, the integral operator is compact. It follows that the solution of (1) is an ill-posed problem; see, e.g., Engl et al. [3] and Hansen [5] for discussions of ill-posed problems and their numerical solution. We would like to determine an accurate approximation of $\hat{u}$ when the observed image $f^\delta$ and the kernel $h$, but not the noise $\eta^\delta$, are known. A popular approach to achieving this is to minimize the functional
$$E(u) = \frac{1}{2}\int_\Omega\Big(\int_\Omega h(x, y)\,u(y)\,dy - f^\delta(x)\Big)^2 dx + \rho\int_\Omega R(u(x))\,dx, \tag{2}$$
where $\rho > 0$ is a regularization parameter and
$$R(u) = \psi(|\nabla u|^2) \tag{3}$$
is a regularization operator. Here $\psi$ is a differentiable, monotonically increasing function and $\nabla u$ denotes the gradient of $u$; see, e.g., Rudin et al. [11] and Welk et al. [15] for discussions of this kind of regularization operator. The Euler-Lagrange equation associated with (2), supplied with a gradient descent which yields a minimizer as "time" $t \to \infty$, is given by
$$\frac{\partial u}{\partial t}(t, z) = -\int_\Omega h(x, z)\Big(\int_\Omega h(x, y)\,u(t, y)\,dy - f^\delta(x)\Big)dx + \rho\,D(u(t, z)), \tag{4}$$
for $z \in \Omega$ and $t \ge 0$. The initial function $u(0, z) = f^\delta(z)$, $z \in \Omega$, and suitable boundary conditions are used. We also refer to $D$ as a regularization operator. Image restoration methods based on the Euler-Lagrange equation require that the regularization operator $D$, as well as values of the regularization parameter $\rho$ and a suitable finite time-interval of integration $[0, T]$, be chosen. The determination of suitable values of $\rho$ and $T$ generally is not straightforward. We get from (3) that
$$D(u) = \operatorname{div}\big(g(|\nabla u|^2)\nabla u\big), \qquad g(t) = \frac{d\psi(t)}{dt}. \tag{5}$$
The function $g$ is referred to as the diffusivity. Perona-Malik regularization is obtained by choosing the diffusivity
$$g(s) = \frac{1}{1 + s/\sigma}, \tag{6}$$
where $\sigma$ is a positive constant; see [14]. Alternatively, one can use a regularization operator of total variation type. Nonlinear models based on (4)-(6) can provide denoising and deblurring of good quality; however, their time-integration is computationally demanding: explicit methods require many tiny time-steps and therefore are expensive, while each time-step of an implicit or semi-implicit method is, in general, expensive, even if it can be accelerated by multigrid techniques. A much cheaper and simpler approach to determining an approximation of the desired image $\hat{u}$ is to apply a few steps of an iterative method to the linear system of equations obtained by a discrete approximation of (1),
$$Au = b^\delta, \qquad A \in \mathbb{R}^{n \times n}, \quad u, b^\delta \in \mathbb{R}^n. \tag{7}$$
Here $A$ is a discrete blurring operator and $b^\delta$ represents the available blur- and noise-contaminated image. In applications, typically $b^\delta$, rather than $f^\delta$, is available; see [5] for details. Approximate solutions of (7) can conveniently be computed by Krylov subspace iterative methods, where the choice of method depends on the properties of the matrix. For instance, spatially variant blur often gives rise to a nonsymmetric matrix $A$, and we may use the LSQR Krylov subspace method [13] to solve (7). This method is an implementation of the conjugate gradient method applied to the normal equations. When the matrix is symmetric, but possibly indefinite, the MR-II [4] Krylov subspace method is an attractive alternative to LSQR. The iteration number may be considered a discrete regularization parameter. It is important not to carry out too many iterations, in order to avoid severe error propagation. This approach to determining a restored image is referred to as regularization by truncated iteration; see, e.g., [3, 4, 8] for discussions. Due to the cut-off of high frequencies, these iterative methods may introduce artifacts, such as ringing, and fail to recover edges accurately.

Many image analysis applications require image segmentation. The level to which segmentation is carried out depends on the problem being solved; segmentation should be terminated when the regions of interest in the application have been isolated. This problem-dependence makes autonomous segmentation one of the most difficult computational tasks in image analysis. The presence of noise and blur makes this task even more complicated. In this paper we carry out segmentation by computing Geodesic Active Contours (GAC). Segmentation methods of this kind are based on curve evolution theory, see [2] and references therein, and on level sets [12]. The basic idea is to start with initial boundary shapes represented by closed curves, i.e., contours, and to iteratively modify these contours by applying shrink/expansion operations determined by image constraints. The shrink/expansion operations, referred to as contour evolution, are performed by minimizing an energy functional, similarly to traditional region-based segmentation methods; however, the level set framework provides more flexibility.
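Regularization by truncated iteration for (7) can be sketched with CGLS, i.e., the conjugate gradient method applied to the normal equations, which is mathematically equivalent to LSQR (LSQR is the numerically preferable implementation). The optional stopping rule is the discrepancy principle, which stops once $\|b^\delta - Au\| \le \tau\delta$ for a noise bound $\delta$ such as (10); this rule, the Euclidean norm used in it, and the value $\tau = 1.01$ are assumptions of this sketch, not a description of the authors' code.

```python
import numpy as np

def cgls(A, b, maxiter, delta=None, tau=1.01):
    """Truncated CGLS for A u = b, started from u = 0.

    Returns (u, k), where k is the number of iterations carried out. The iteration
    count acts as the regularization parameter; if delta is given, the iteration
    stops as soon as ||b - A u||_2 <= tau * delta (discrepancy principle).
    """
    u = np.zeros(A.shape[1])
    r = np.array(b, dtype=float)        # residual b - A u (u = 0 initially)
    s = A.T @ r                         # residual of the normal equations
    p = s.copy()
    gamma = s @ s
    for k in range(maxiter):
        if delta is not None and np.linalg.norm(r) <= tau * delta:
            return u, k                 # discrepancy principle satisfied
        if gamma == 0.0:
            return u, k                 # exact least-squares solution reached
        q = A @ p
        alpha = gamma / (q @ q)
        u = u + alpha * p
        r = r - alpha * q
        s = A.T @ r
        gamma_new = s @ s
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return u, maxiter
```

To try it, one can build, e.g., a row-normalized Gaussian blur matrix, add noise of known norm to a blurred signal, and pass that norm as `delta`; the returned iteration count then plays the role of the discrete regularization parameter.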
The GAC PDE model proposed in [2] is given by
$$\frac{\partial \phi}{\partial t} = |\nabla \phi|\, \mathrm{div}\!\left( g(|\nabla b^\delta|^2)\, \frac{\nabla \phi}{|\nabla \phi|} \right), \tag{8}$$
where the edge-detector function $g$ is defined by (6) and the initial condition $\phi_0$ is the signed distance function to an arbitrary initial curve enclosing the objects to be segmented. The solution to the segmentation problem is the zero level set of the steady state $\phi_t = 0$ of the flow. In our multilevel method we apply a fast curve evolution method recently suggested by Li et al. [7], which eliminates the need for costly re-initialization; however, other GAC methods can also be used.
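A minimal explicit finite-difference step for the flow (8) is sketched below. It is not the scheme used in the paper, which relies on the re-initialization-free formulation of Li et al. [7]. The sketch makes several assumptions of our own:

- central differences from `np.gradient` with unit grid spacing;
- a precomputed array `edge_g` standing in for $g(|\nabla b^\delta|^2)$;
- a hypothetical regularization `eps` for points where $\nabla\phi$ vanishes.

The helper name `gac_step` is also hypothetical.

```python
import numpy as np

# Hypothetical helper (not from the paper): one explicit Euler step of (8).
def gac_step(phi, edge_g, dt=0.1, eps=1e-8):
    """phi_t = |grad phi| * div( g * grad phi / |grad phi| ).

    phi    : level set function on a 2-D grid; its zero level set is the contour
    edge_g : edge-detector values g(|grad b|^2) on the same grid (assumed precomputed)
    eps    : hypothetical regularization of |grad phi|
    """
    phi_y, phi_x = np.gradient(phi)             # central differences, unit spacing
    grad_norm = np.sqrt(phi_x**2 + phi_y**2) + eps
    flux_x = edge_g * phi_x / grad_norm         # g times the unit normal
    flux_y = edge_g * phi_y / grad_norm
    div = np.gradient(flux_x, axis=1) + np.gradient(flux_y, axis=0)
    return phi + dt * grad_norm * div


# With g = 1 the flow is mean-curvature motion, so a circle shrinks.
x = np.arange(64) - 31.5
X, Y = np.meshgrid(x, x)
phi0 = np.sqrt(X**2 + Y**2) - 20.0              # signed distance to a circle of radius 20
phi = phi0.copy()
for _ in range(50):
    phi = gac_step(phi, np.ones_like(phi))
print((phi0 < 0).sum(), (phi < 0).sum())        # the enclosed area decreases
```

Where `edge_g` is close to 0, which happens at strong image edges, the update vanishes. This is what stops the contour at object boundaries.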
3 The Cascadic Multilevel Framework
We first review the cascadic multilevel method proposed in [8] for the removal of blur and noise. In [8] only symmetric blurring matrices are considered. For $v = [v^{(1)}, v^{(2)}, \dots, v^{(n)}]^T \in \mathbb{R}^n$, introduce the weighted least-squares norm
$$\|v\| = \left( \frac{1}{n} \sum_{i=1}^{n} \big(v^{(i)}\big)^2 \right)^{1/2}. \tag{9}$$
S. Morigi, L. Reichel, and F. Sgallari
Let $\hat b \in \mathbb{R}^n$ denote the unknown noise-free right-hand side associated with the right-hand side $b^\delta$ of (7). We assume that $\hat b \in \mathrm{Range}(A)$ and that a bound $\delta$ for the noise $e = b^\delta - \hat b$ is available, i.e.,
$$\|e\| \le \delta. \tag{10}$$
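The norm (9) is the root-mean-square of the entries, so $\delta$ in (10) can be read as a bound on the RMS noise level. The short sketch below illustrates this. The helper name `weighted_norm` and the Gaussian-noise check are our own, not from the paper. For white Gaussian noise with standard deviation σ, the norm (9) of a long noise vector is close to σ.

```python
import numpy as np

# Hypothetical helper (not from the paper) implementing norm (9).
def weighted_norm(v):
    """Weighted least-squares norm (9): sqrt((1/n) * sum_i v_i^2)."""
    v = np.asarray(v, dtype=float).ravel()
    return np.sqrt(np.mean(v ** 2))

print(weighted_norm([3.0, 4.0]))            # sqrt((9 + 16) / 2) = 3.5355...

rng = np.random.default_rng(0)
e = rng.normal(scale=2.0, size=200_000)     # white Gaussian noise, sigma = 2
print(weighted_norm(e))                     # close to 2
```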
Let $W_1 \subset W_2 \subset \cdots \subset W_\ell$ be a sequence of nested subspaces of $\mathbb{R}^n$ of dimension $\dim(W_i) = n_i$ with $n_1 < n_2 < \dots < n_\ell = n$. We refer to the subspaces $W_i$ as levels, with $W_1$ being the coarsest and $W_\ell = \mathbb{R}^n$ the finest level. Each level is furnished with a weighted least-squares norm; level $W_i$ has a norm of the form (9) with $n$ replaced by $n_i$. We choose $n_{i-1} = n_i/4$, $1 < i \le \ell$. Let $A_i \in \mathbb{R}^{n_i \times n_i}$ be the representation of the blurring operator $A$ on level $W_i$. The matrix $A_i$ is determined by discretization of the integral operator (1) in the same way as $A$. This implicitly defines the restriction operator $R_i : \mathbb{R}^n \to W_i$ such that
$$A_i = R_i A R_i^*. \tag{11}$$
We define $R_\ell = I$. In our experience, the choice of the restriction operators $R_i$ is less crucial for achieving high-quality restorations than the choice of the restriction operators $R_i^{(\omega)} : \mathbb{R}^n \to W_i$. The latter reduce the available blur- and noise-contaminated image, which is represented by the right-hand side $b^\delta$ in (7). We let
$$b_i^\delta = R_i^{(\omega)} b^\delta, \qquad 1 \le i < \ell, \tag{12}$$
where the $R_i^{(\omega)}$ are determined by repeated local weighted least-squares approximation, inspired by a "staircasing"-reducing scheme recently proposed by Buades et al. [1]. The choice of prolongation operators from level $i-1$ to level $i$ is also important for the performance of the multilevel method. We apply nonlinear prolongation operators $P_i : W_{i-1} \to W_i$, $1 < i \le \ell$, defined by piecewise linear interpolation followed by integration of the Perona-Malik equation over a short time interval; see below. The $P_i$ are designed to be noise-reducing and edge-preserving.

The multilevel methods of the present paper are cascadic. They first determine an approximate solution of $A_1 u = b_1^\delta$ in $W_1$ using the LSQR or the MR-II iterative method. We refer to the iterative method as IM in Algorithm 1 below. The iterations with this method are terminated by the discrepancy principle; see below.

The approximate solution so determined in $W_1$ is mapped into $W_2$ by the prolongation $P_2$. A correction of this mapped iterate in $W_2$ is computed by the IM. Again, the iterations are terminated by the discrepancy principle, and the approximate solution so obtained in $W_2$ is mapped into $W_3$ by $P_3$. The computations are continued in this fashion until an approximation of $\hat u$ has been determined in $W_\ell = \mathbb{R}^n$.

In the algorithm, $\Delta u_{i,m_i} := \mathrm{IM}(A_i, b_i^\delta - A_i u_{i,0})$ denotes the computation of the approximate solution $\Delta u_{i,m_i}$ of $A_i z_i = b_i^\delta - A_i u_{i,0}$. It is obtained by $m_i$ iterations with one of the iterative methods MR-II or LSQR, using the initial iterate $\Delta u_{i,0} = 0$.
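The IM step, truncated by the discrepancy principle stated below in (14), can be sketched as follows. This is a minimal sketch under our own assumptions, not the paper's code:

- It uses CGLS, which is mathematically equivalent to LSQR in exact arithmetic but is not the LSQR implementation of [13].
- MR-II is not shown.
- The helper names (`cgls_discrepancy`, `level_constants`) and the diagonal test problem are hypothetical.
- `level_constants` evaluates the constants $c_i$ of (13).

```python
import numpy as np

# Hypothetical helpers (not from the paper).
def weighted_norm(v):
    """Norm (9): root-mean-square of the entries."""
    return np.sqrt(np.mean(np.asarray(v, dtype=float) ** 2))

def cgls_discrepancy(A, b, delta, c=1.4, max_iter=500):
    """Truncated CGLS for min ||A z - b||, started from z = 0.

    Stops at the first iterate whose residual satisfies the discrepancy
    principle weighted_norm(b - A z) <= c * delta; returns (z, iterations).
    """
    z = np.zeros(A.shape[1])
    r = np.asarray(b, dtype=float).copy()   # residual b - A z
    s = A.T @ r                              # normal-equations residual
    p = s.copy()
    gamma = s @ s
    for k in range(max_iter):
        if weighted_norm(r) <= c * delta or gamma == 0.0:
            return z, k
        q = A @ p
        alpha = gamma / (q @ q)
        z += alpha * p
        r -= alpha * q
        s = A.T @ r
        gamma_new = s @ s
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return z, max_iter

def level_constants(levels, gamma=1.4):
    """Constants of (13), coarsest level first: c_l = gamma, c_i = c_{i+1} / 3."""
    return [gamma / 3 ** (levels - i) for i in range(1, levels + 1)]


# Usage on a small ill-conditioned diagonal test problem (hypothetical data).
rng = np.random.default_rng(1)
A = np.diag(np.linspace(1.0, 1e-3, 50))
x_true = np.ones(50)
noise = rng.normal(size=50)
delta = 1e-2
b = A @ x_true + noise * (delta / weighted_norm(noise))   # ||e|| = delta exactly
z, its = cgls_discrepancy(A, b, delta, c=1.4)
print(its, weighted_norm(b - A @ z) <= 1.4 * delta)
print(level_constants(3))                                  # [0.1555..., 0.4666..., 1.4]
```

Each level only needs matrix-vector products with $A_i$ and $A_i^T$, so `A` could equally be a structured operator.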
Multilevel Algorithm 1
Input: $A$, $b^\delta$, $\delta$, $\ell \ge 1$ (number of levels);
Output: approximate solution $u_\ell \in W_\ell$ of (7); segmented result $\phi_\ell$;
Determine $A_i$ and $b_i^\delta$ from (11) and (12), respectively, $1 \le i \le \ell$;
$u_0 := 0$; $\phi_0 :=$ initial contour;
for $i := 1, 2, \dots, \ell$ do
    $u_{i,0} := P_i u_{i-1}$;
    $\phi_{i,0} := S_i \phi_{i-1}$;
    $\Delta u_{i,m_i} := \mathrm{IM}(A_i, b_i^\delta - A_i u_{i,0})$;
    Correction step: $u_i := u_{i,0} + \beta \Delta u_{i,m_i}$;
    Segmentation step: $\phi_i := \mathrm{GAC}(\phi_{i,0}, u_i)$;
endfor
The number of iterations on each level is determined by the discrepancy principle as follows. We assume that there are constants $c_i$, independent of $\delta$, such that
$$\|b_i^\delta - \hat b_i\| \le c_i\,\delta, \qquad 1 \le i \le \ell,$$
where $\delta$ satisfies (10). By using the noise-reducing property of the restriction operators $R_i^{(\omega)}$, it can be seen that a suitable choice is
$$c_i = \tfrac{1}{3}\, c_{i+1}, \quad 1 \le i < \ell, \qquad c_\ell = \gamma, \tag{13}$$
for some constant $\gamma > 1$. In the computed examples of Section 4, we use $\gamma = 1.4$. The discrepancy principle prescribes that the iterations on level $i$ be terminated as soon as
$$\|b_i^\delta - A_i u_{i,0} - A_i \Delta u_{i,m_i}\| \le c_i\,\delta. \tag{14}$$
When many iterations are carried out, the computed approximate solution $\Delta u_{i,m_i}$ is, generally, severely contaminated by noise propagated from $b_i^\delta - A_i u_{i,0}$. The stopping criterion (14) has two purposes:

(i) to allow enough iterations to be carried out to determine as accurate a restoration on level $i$ as possible;
(ii) to avoid carrying out so many iterations that $\Delta u_{i,m_i}$ is severely contaminated by propagated noise.

Properties of the stopping rule (14) are discussed in [8, 10]. A general discussion of the use of the discrepancy principle to determine approximate solutions of ill-posed problems is provided in [3].

The nonlinear edge-preserving prolongation operators $P_i$ have previously been applied in [8], where further details on their implementation are provided; see also [16]. The prolongation operator $P_i$ first maps the approximate solution determined by the algorithm on level $W_{i-1}$ into $W_i$ by piecewise linear interpolation. It then uses the result as the initial function for a discretized initial-boundary value problem for the Perona-Malik nonlinear diffusion equation
$$\frac{\partial u}{\partial t} = \mathrm{div}\big(g(|\nabla u|^2)\,\nabla u\big), \tag{15}$$
where $g$ is the Perona-Malik diffusivity (6). Integration over a short time interval removes noise while preserving rapid spatial transitions, such as edges. Integration is performed by carrying out about 10 time steps of size about 0.2 with an explicit finite difference method. The small number of time steps avoids difficulties due to numerical instability and keeps the computational work required for integration negligible. We found it beneficial to apply more time steps the more noise-contaminated the available image is. However, in our experience the exact choices of the number of time steps and their sizes are not crucial for the good performance of the multilevel method.

In the algorithm, $\phi_0$ denotes the initial contour for the GAC segmentation method, which is implemented by solving (8); see [7]. The prolongation of the level set function from $W_{i-1}$ to $W_i$ is carried out by spline interpolation and is denoted by $S_i$. The statement $\phi_i := \mathrm{GAC}(\phi_{i,0}, u_i)$ updates the contour on level $i$.

Ringing in restored images stems from the Gibbs phenomenon at discontinuities. These discontinuities may be image borders or boundaries inside the image, or they may be introduced by inadequate spatial sampling of the image or kernel. The larger the support of the kernel in (1), the more pronounced the ringing. High-contrast edges cause strong ringing, and the magnitude of the ringing is proportional to the norm of the image gradient. Based on these observations, we propose a deringing correction obtained by multiplying the computed correction $\Delta u_{i,m_i}$ by the spatially variant function
$$\beta(x, y) = \alpha + (1 - \alpha)\big(1 - g(|\nabla u_{i,0}(x, y)|^2)\big), \tag{16}$$
where $g$ is the diffusivity (6) and the parameter $0 \le \alpha \le 1$ controls the suppression of the computed correction. We would like to suppress ringing in smooth regions but avoid suppressing edges. The correction function $\beta$ should therefore be small in smooth regions and large elsewhere.
We use α = 0.05 in the computed examples of this paper, but this value can be tuned depending on the presence of large homogeneous regions in the image.
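Two ingredients of this section can be sketched as below: the Perona-Malik smoothing inside the prolongation $P_i$, and the deringing weight (16). The following choices are our assumptions:

- The diffusivity (6) is not reproduced in this section, so the sketch uses the common Perona-Malik form $g(s^2) = 1/(1 + s^2/\lambda^2)$ with a hypothetical contrast parameter `lam`.
- The explicit four-neighbour scheme with reflecting boundaries is our choice.
- All function names are hypothetical.

With τ = 0.2 ≤ 0.25 and 0 < g ≤ 1, each update is a convex combination of neighbouring values. This is one way to see why a few such steps stay stable.

```python
import numpy as np

# Hypothetical helpers (not from the paper); g below is an ASSUMED form of (6).
def pm_diffusivity(s2, lam=10.0):
    """Perona-Malik diffusivity g(s^2) = 1 / (1 + s^2 / lam^2)."""
    return 1.0 / (1.0 + s2 / lam ** 2)

def perona_malik(u, steps=10, tau=0.2, lam=10.0):
    """Explicit 4-neighbour scheme for u_t = div(g(|grad u|^2) grad u), reflecting boundaries."""
    u = np.asarray(u, dtype=float).copy()
    for _ in range(steps):
        p = np.pad(u, 1, mode="edge")            # replicate border: zero flux across it
        diffs = (p[:-2, 1:-1] - u, p[2:, 1:-1] - u, p[1:-1, :-2] - u, p[1:-1, 2:] - u)
        u = u + tau * sum(pm_diffusivity(d ** 2, lam) * d for d in diffs)
    return u

def deringing_weight(u0, alpha=0.05, lam=10.0):
    """Eq. (16): beta = alpha + (1 - alpha) * (1 - g(|grad u0|^2))."""
    gy, gx = np.gradient(np.asarray(u0, dtype=float))
    return alpha + (1 - alpha) * (1 - pm_diffusivity(gx ** 2 + gy ** 2, lam))


rng = np.random.default_rng(0)
step = np.zeros((16, 16)); step[:, 8:] = 100.0       # vertical edge of height 100
noisy = step + rng.normal(scale=5.0, size=step.shape)
smoothed = perona_malik(noisy, steps=10, tau=0.2, lam=10.0)
print(noisy[:, :8].std(), smoothed[:, :8].std())     # noise in the flat part is reduced
print(smoothed[:, 8].mean() - smoothed[:, 7].mean()) # the edge is largely preserved
beta = deringing_weight(step, alpha=0.05, lam=10.0)
print(beta[0, 0], beta[0, 8])                        # alpha in flat regions, close to 1 at the edge
```

In Algorithm 1 the weight multiplies the correction, $u_i := u_{i,0} + \beta \Delta u_{i,m_i}$, so corrections are damped to a fraction α in flat regions.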
4 Numerical Results
We illustrate the performance of Algorithm 1. The computations are carried out in MATLAB with about 16 significant decimal digits. We assume that a fairly accurate estimate of the norm of the noise is available. If this is not the case, such an estimate can be computed by integrating $b^\delta$ for a few time steps of the Perona-Malik differential equation; details will be described in a forthcoming paper. Note that the matrices $A_i$, defined by (11), do not have to be stored explicitly. It suffices to define functions for the evaluation of matrix-vector products with the $A_i$ and, if $A_i$ is nonsymmetric, also with the $A_i^T$. For the examples of this section, the matrix-vector products can be computed efficiently by exploiting the structure of the $A_i$; see, e.g., [9] for a discussion. The matrices corresponding to the finest level are numerically singular in all examples. The displayed restored images provide a qualitative comparison of the performance of the proposed cascadic multilevel method. A quantitative comparison is given by the Peak Signal-to-Noise Ratio,
$$\mathrm{PSNR}(u_\ell, \hat u) = 20 \log_{10} \frac{255}{\|u_\ell - \hat u\|}\ \mathrm{dB}, \tag{17}$$
where $\hat u$ denotes the blur- and noise-free image and $u_\ell$ the restored image determined by Algorithm 1. Each pixel is stored with 8 bits; the numerator 255 is the largest pixel value that can be represented with 8 bits. A high PSNR value indicates that the restoration is accurate; however, PSNR values are not always in agreement with visual perception. We also measure the variation in the error image $u_{\mathrm{err}} = u_\ell - \hat u$, defined by
$$\mathrm{EV}(u_\ell, \hat u) = \sum_{\text{pixels}} \|\nabla u_{\mathrm{err}}\|_2^2, \tag{18}$$
where the sum is over all pixels of the image. The more accurately the edges are restored, the smaller this sum.
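Both quality measures are easy to compute. The sketch below makes two assumptions of our own:

- The norm in (17) is the weighted norm (9), which makes (17) the familiar RMS-based PSNR.
- $\nabla u_{\mathrm{err}}$ in (18) is approximated by forward differences, with a zero difference at the last row and column. The paper does not state its discretization.

The function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers (not from the paper).
def psnr(u, u_hat):
    """Eq. (17): 20 log10(255 / ||u - u_hat||) in dB, with ||.|| the RMS norm (9)."""
    err = np.asarray(u, dtype=float) - np.asarray(u_hat, dtype=float)
    rms = np.sqrt(np.mean(err ** 2))
    return np.inf if rms == 0 else 20.0 * np.log10(255.0 / rms)

def edge_variation(u, u_hat):
    """Eq. (18): sum over all pixels of |grad u_err|_2^2 (forward differences)."""
    err = np.asarray(u, dtype=float) - np.asarray(u_hat, dtype=float)
    dx = np.diff(err, axis=1, append=err[:, -1:])   # zero at the last column
    dy = np.diff(err, axis=0, append=err[-1:, :])   # zero at the last row
    return float(np.sum(dx ** 2 + dy ** 2))


u_hat = np.full((5, 5), 100.0)
print(psnr(u_hat + 1.0, u_hat))             # 20*log10(255) = 48.13... dB
spike = u_hat.copy(); spike[2, 2] += 1.0    # one wrong pixel
print(edge_variation(spike, u_hat))         # 4.0
print(edge_variation(u_hat + 1.0, u_hat))   # 0.0: a constant error has no variation
```

The last line shows why EV complements PSNR. A constant offset lowers the PSNR but leaves EV at zero, whereas errors concentrated at edges increase EV.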
Fig. 1. Blur- and noise-free images used in the numerical experiments. Left: butterfly, 400 × 400 pixels. Right: corner, 512 × 512 pixels.
We apply Algorithm 1 to blur- and noise-contaminated versions of the images shown in Figure 1. The corner image is representative of images with well-defined edges, while the butterfly image is a gray-scale photographic image with smoothed edges.

Example 4.1. We consider the restoration of a contaminated version of the left-hand side image of Figure 1. Contamination is by space-invariant Gaussian blur, as generated by the MATLAB function blur.m from Regularization Tools [6] with parameters sigma = 3 and band = 9. This function generates a block Toeplitz matrix with Toeplitz blocks. The parameter band specifies the half-bandwidth of the Toeplitz blocks, and the parameter sigma determines the variance of the Gaussian PSF. The image is also contaminated by 5% Gaussian noise. The blurring operator is symmetric; we therefore use the MR-II iterative method.
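The structure that makes the matrix-vector products cheap can be sketched as follows. To our understanding, blur.m works as follows:

- It builds a symmetric banded Toeplitz matrix T from the first `band` samples of $\exp(-k^2/(2\sigma^2))$.
- It sets $A = \frac{1}{2\pi\sigma^2} T \otimes T$.

The sketch below assumes exactly this construction; it is not a transcription of blur.m, and the function names are hypothetical. The Kronecker structure lets A act on an N × N image X as T X T without forming the N² × N² matrix.

```python
import numpy as np

# Hypothetical helpers; the construction is an ASSUMED mirror of blur.m.
def toeplitz_gaussian(N, band, sigma):
    """Symmetric banded Toeplitz T with T[i, j] = exp(-(i-j)^2 / (2 sigma^2)) for |i-j| < band."""
    k = np.arange(N)
    z = np.where(k < band, np.exp(-k.astype(float) ** 2 / (2.0 * sigma ** 2)), 0.0)
    return z[np.abs(np.subtract.outer(k, k))]

def blur_apply(X, band=9, sigma=3.0):
    """A @ vec(X) for A = (1/(2 pi sigma^2)) kron(T, T), computed as T X T (T is symmetric)."""
    T = toeplitz_gaussian(X.shape[0], band, sigma)
    return (T @ X @ T) / (2.0 * np.pi * sigma ** 2)


# Check the Kronecker identity on a small random image (row-major vec).
rng = np.random.default_rng(0)
N, band, sigma = 12, 9, 3.0
X = rng.random((N, N))
T = toeplitz_gaussian(N, band, sigma)
A = np.kron(T, T) / (2.0 * np.pi * sigma ** 2)        # explicit only because N is tiny
print(np.allclose(A @ X.ravel(), blur_apply(X, band, sigma).ravel()))   # True
```

Because A is symmetric, the same routine also supplies the products with $A^T$ required by LSQR.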
Fig. 2. Example 4.1. Top-left: image contaminated by Gaussian blur and 5% Gaussian noise. Top-right: image restored by the 1-level method. Bottom-left: image restored by the 3-level method. Bottom-right: deringing function β defined by (16).
Figure 2 provides a qualitative comparison of images restored by the basic 1-level MR-II method and by the 3-level method defined by Algorithm 1. The restoration obtained with the latter method is of higher quality, with sharper edges. The deringing function β (16) is shown in Figure 2 (bottom-right); it is small in smooth image regions and large elsewhere. Table 1(a) gives a quantitative comparison, for different amounts of noise, of three methods: Algorithm 1 with $\ell = 2$ levels, Algorithm 1 with $\ell = 3$ levels, and the basic 1-level MR-II method. The columns marked "PSNR" and "EV" display (17) and (18), respectively. They show that Algorithm 1 with $\ell = 3$ yields the images with the highest PSNR values and the smallest EV values. The column marked "iter" shows the number of iterations required on each level. For instance, the triplet 4−1−2 indicates that Algorithm 1 carried out 4 MR-II iterations on the coarsest level, 1 iteration on the intermediate level, and 2 iterations on the finest level. The dominating computational effort is the evaluation of matrix-vector products on the finest level. The 2- and 3-level methods require fewer iterations on the finest level than the basic 1-level MR-II method. □

Table 1. PSNR, number of iterations (iter), and edge variation (EV) as functions of the number of levels ℓ and the noise level (% noise). The restorations are of (a) the image of Example 4.1, contaminated by Gaussian blur determined by band = 9 and sigma = 3, and (b) the image of Example 4.2, contaminated by motion blur defined by r = 15 and θ = 10.

ℓ   % noise   (a) PSNR   iter    EV     (b) PSNR   iter     EV
1      1       26.05     11      5043    30.93     12       4629
2      1       26.73     8−9     4179    31.69     11−9     2294
3      1       26.86     9−7−9   4060    32.02     17−8−8   2251
1      5       24.30     4       5279    27.13     5        6519
2      5       24.38     3−3     4682    28.56     4−3      3553
3      5       24.63     5−3−3   4555    28.69     7−2−3    3140
1     10       23.25     3       5477    25.15     3        5368
2     10       23.42     2−2     4949    26.77     2−2      3692
3     10       23.60     4−1−2   4853    26.93     3−1−2    3466

Fig. 3. Example 4.2. Left: restoration determined by the 3-level LSQR-based multilevel method. Right: restoration obtained by the basic 1-level LSQR method.

Example 4.2. Consider the restoration of a version of the right-hand side image of Figure 1 that has been contaminated by motion blur and 5% Gaussian noise. The PSF is represented by a line segment of length r pixels in the direction of the motion. The angle θ (in degrees) specifies the direction; it is measured counter-clockwise from the positive x-axis. The PSF takes on the value $r^{-1}$ on this segment and vanishes elsewhere. We refer to the parameter r as the width.
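One way to discretize this PSF is sketched below. The sampling scheme is our assumption, since the paper does not say how the segment is rasterized:

- take r equispaced samples along a segment centred at the kernel centre;
- round each sample to the nearest pixel;
- give each sample the weight $r^{-1}$, so the kernel sums to 1.

Image rows grow downwards, so a counter-clockwise angle from the positive x-axis moves towards smaller row indices. The function name is hypothetical.

```python
import numpy as np

# Hypothetical helper (not from the paper); the rasterization is an assumption.
def motion_psf(r, theta_deg):
    """Motion-blur PSF: r samples of weight 1/r along a centred segment of length r pixels,
    at angle theta (degrees, counter-clockwise from the positive x-axis)."""
    half = (r - 1) / 2.0
    size = 2 * int(np.ceil(half)) + 1
    psf = np.zeros((size, size))
    centre = size // 2
    t = np.deg2rad(theta_deg)
    for s in np.arange(r) - half:                 # signed positions along the segment
        col = centre + int(round(s * np.cos(t)))
        row = centre - int(round(s * np.sin(t)))  # rows grow downwards
        psf[row, col] += 1.0 / r
    return psf


psf = motion_psf(15, 10)          # the parameters of Example 4.2
print(psf.shape, psf.sum())       # (15, 15) and a sum of 1 up to rounding
print(np.count_nonzero(motion_psf(15, 0)[7]))   # 15: a horizontal segment on the middle row
```

Unlike the Gaussian PSF, this kernel is not symmetric for a general θ. This is why the blurring matrix of Example 4.2 is nonsymmetric and LSQR is used.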
Fig. 4. Example 4.3. Top-left: segmentation of the blur- and noise-free image. Top-right: segmentation of the blurred and noisy image by the 1-level method. Bottom-left: segmentation of the blurred and noisy image by the 3-level method on level 2. Bottom-right: segmentation by the 3-level method on the finest level.
The motion blur for this example is defined by r = 15 and θ = 10. The blurring matrix A is nonsymmetric; we therefore use the LSQR iterative method in Algorithm 1. Figure 3 (left) shows the restoration determined by Algorithm 1 with 3 levels. The restored image obtained by the basic 1-level LSQR method is shown in Figure 3 (right). Visual comparison shows that Algorithm 1 gives the more pleasing restoration. This is in agreement with the PSNR and EV values reported in Table 1(b). □

Example 4.3. We apply Algorithm 1 to the segmentation of a contaminated version of the image of Figure 1 (right). The contamination is caused by Gaussian blur, determined by band = 9 and sigma = 3, and 10% Gaussian noise. Segmentation is carried out using the variational formulation for geodesic active contours (GAC) without re-initialization described by Li et al. [7]. The initial curve is close to the boundary of the image.

Figure 4 (top-left) shows the segmentation obtained when the method is applied to the noise- and blur-free image in Figure 1 (right). The curve evolution requires 900 iterations. Segmentation of the contaminated image is more difficult. We first deblur the contaminated image by the basic 1-level MR-II iterative method and then apply GAC segmentation to the restored image. The resulting segmentation is shown in Figure 4 (top-right); the curve evolution required 1200 iterations.

Finally, we apply Algorithm 1 with 3 levels and the segmentation step. No segmentation is carried out on the coarsest level. On level 2, we apply GAC segmentation with 400 curve evolution iterations; the resulting segmentation is shown in Figure 4 (bottom-left). The evolved contour is prolonged by spline interpolation, and only 100 curve evolution iterations are required on the finest level. The resulting segmentation is displayed in Figure 4 (bottom-right). The figure shows that Algorithm 1 extracts object boundaries with less computational effort and higher accuracy than the corresponding 1-level method. □
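These counts can be read roughly as follows. We assume, as a simplification of our own that the paper does not state, that one curve-evolution iteration costs in proportion to the number of pixels of the level on which it runs. Recalling $n_{i-1} = n_i/4$ from Section 3, we express all iterations in finest-level equivalents:

```latex
\underbrace{400 \cdot \tfrac{1}{4}}_{\text{level } 2}
\;+\; \underbrace{100}_{\text{level } 3}
\;=\; 200
\qquad\text{vs.}\qquad
1200 \ \text{(1-level method)}.
```

This estimate ignores the deblurring iterations and the prolongations $S_i$, so it only indicates the order of the savings.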
5 Conclusions and Extension
Visual inspection of the images shown in Section 4, together with the computed PSNR and EV values, shows that the cascadic multilevel method gives more accurate restorations than 1-level methods applied on the finest level only. A multilevel approach to the segmentation of contaminated images also yields better results and requires less computational effort than the corresponding 1-level method. The aim of ongoing work is to gain increased understanding of the interplay between image restoration and segmentation.
Acknowledgments This research has been supported by PRIN-MIUR-Cofin 2006 project, by University of Bologna "Funds for selected research topics", and in part by an OBR Research Challenge Grant.
References
1. Buades, A., Coll, B., Morel, J.M.: The staircasing effect in neighborhood filters and its solution. IEEE Trans. Image Processing 15, 1499–1505 (2006)
2. Caselles, V., Kimmel, R., Sapiro, G.: Geodesic active contours. Int. J. Comput. Vis. 22, 61–79 (1997)
3. Engl, H.W., Hanke, M., Neubauer, A.: Regularization of Inverse Problems. Kluwer, Dordrecht (1996)
4. Hanke, M.: Conjugate Gradient Type Methods for Ill-Posed Problems. Longman, Essex (1995)
5. Hansen, P.C.: Rank-Deficient and Discrete Ill-Posed Problems. SIAM, Philadelphia (1997)
6. Hansen, P.C.: Regularization tools, version 4.0 for MATLAB 7.3. Numer. Algorithms 46, 189–294 (2007)
7. Li, C., Xu, C., Gui, C., Fox, M.D.: Level set evolution without re-initialization: a new variational formulation. In: Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, pp. 430–436 (2005)
8. Morigi, S., Reichel, L., Sgallari, F., Shyshkov, A.: Cascadic multiresolution methods for image deblurring. SIAM J. Imaging Sci. 1, 51–74 (2008)
9. Ng, M.K., Chan, R.H., Tang, W.-C.: A fast algorithm for deblurring models with Neumann boundary conditions. SIAM J. Sci. Comput. 21, 851–866 (1999)
10. Reichel, L., Shyshkov, A.: Cascadic multilevel methods for ill-posed problems. J. Comput. Appl. Math. (in press)
11. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
12. Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: algorithms based on Hamilton-Jacobi formulations. J. Comput. Phys. 79, 12–49 (1988)
13. Paige, C.C., Saunders, M.A.: LSQR: An algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Software 8, 43–71 (1982)
14. Perona, P., Malik, J.: Scale-space and edge detection using anisotropic diffusion. IEEE Trans. Pattern Anal. Mach. Intell. 12, 629–639 (1990)
15. Welk, M., Theis, D., Brox, T., Weickert, J.: PDE-based deconvolution with forward-backward diffusivities and diffusion tensors. In: Kimmel, R., Sochen, N.A., Weickert, J. (eds.) Scale-Space 2005. LNCS, vol. 3459, pp. 585–597. Springer, Heidelberg (2005)
16. Weickert, J., Romeny, B.M.H., Viergever, M.A.: Efficient and reliable schemes for nonlinear diffusion filtering. IEEE Trans. Image Process. 7, 398–410 (1998)
Fast Dejittering for Digital Video Frames
Mila Nikolova
CMLA, ENS Cachan, CNRS, PRES UniverSud, France
http://www.cmla.ens-cachan.fr/~nikolova/
Abstract. We propose several very fast algorithms that restore jittered digital video frames (frames whose rows are shifted) in one iteration. The restored row shifts minimize non-smooth and possibly non-convex local criteria applied to the second-order differences between consecutive rows. We introduce specific error measures to assess the quality of dejittering. Our algorithms are designed for gray-value, color and noisy images. Some of them can be considered parameter-free. They far outperform the existing algorithms in both quality and speed. They are a crucial step towards real-time dejittering of digital video.
1 Intrinsic Dejittering
Image jitter consists of a random horizontal shift of each row of a video frame. It occurs when the row synchronization pulses are corrupted, e.g., by noise or degradation of the storage medium, or in wireless transmission. The visual effect is disturbing since all shapes are jagged; cf., e.g., Fig. 4. Structured jitter can be provoked by acoustic or electrical interference [7]; cf., e.g., Fig. 8. Time base corrector machines recover the row synchronization pulses with some success, but this operation is often unsuccessful or impossible [6]. An alternative is to restore the video frames directly from the jittered data, which is called intrinsic dejittering [5]. This approach is much more flexible and widely applicable.

State of the Art. Intrinsic dejittering was introduced in [5]. The method is based on a 2D auto-regressive (AR) image model. The unknown AR coefficients and row starts are estimated iteratively and jointly by blocks; a drift compensation is applied afterwards [6]. In [7], the ℓ1 norm of the differences between 2 or 3 consecutive shifted rows is compared in the framework of dynamic programming. A fully Bayesian iterative method using a TV-based prior for joint dejittering and denoising is derived in [12]. The Bake and Shake method in [3] uses a good PDE image model (e.g., Perona-Malik) to recover the row positions. In [4], the same authors analyze the vertical slicing moments of images of bounded variation. From this analysis they derive a variational method, faster than [3] but less effective for difficult data.

Our Approach. We exhibit a pertinent model that makes it possible to discriminate natural images from their jittered versions. Each row is restored based on the previously restored rows using a simple non-smooth and possibly non-convex local criterion. We thus construct effective and fast one-iteration dejittering algorithms. Noisy jittered images are restored in two stages: (a) dejittering of the raw data; (b) denoising of the obtained dejittered image.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 439–451, 2009. © Springer-Verlag Berlin Heidelberg 2009
2 The Main Points of Our Approach
Notations. For any positive integers $m$ and $n$, the rows of a matrix $h \in \mathbb{R}^{m\times n}$ are denoted by $h_i$, $1 \le i \le m$, and the components of a row $h_i$ by $h_i(j)$, $1 \le j \le n$. The components of any $n$-length vector $u$ are denoted by $u_i$, $1 \le i \le n$. Given an original image $f \in \mathbb{R}^{r\times c}$, a jittered image $g$ is produced according to
$$g_i(j) = \begin{cases} f_i(j + d_i) & \text{if } 1 \le j + d_i \le c,\\ 0 & \text{otherwise,} \end{cases} \qquad 1 \le i \le r,\quad 1 \le j \le c,\quad d_i \in \mathbb{Z}. \tag{1}$$
In practice, the row shifts $d_i$ are bounded, $|d_i| \le M$, for $M \le 6$ or more [6]. The restored image and row shifts are denoted by $\hat f$ and $\hat d$, respectively.

2.1 Choice of a Local Criterion on Consecutive Rows
First of all, we need a good model for the columns of natural images.
Fig. 1. 50 × 50 zoom of Lena: (a) original; (c) jittered; (b) gray value of column 15 in (a) and in (c).
Remark 1. The gray values along the columns of natural images can be seen as pieces of 2nd- or 3rd-order polynomials; see Fig. 1(b), left, or Fig. 3 in [9]. Such a claim is false for jittered images; see Fig. 1(b), right. This observation provides a sound basis for discriminating a natural image from its jittered versions. Suppose that $\hat f_1, \dots, \hat f_{i-1}$ are already dejittered. By Remark 1, we estimate the next $\hat d_i$ using a criterion that compares $\hat f_{i-1}, \hat f_{i-2}, \dots$ with all possible shifts of the $i$th data row, $g_i(j - d_i)$, $d_i \in \{-N, \dots, N\}$, for $N \ge M$.
Fig. 2. Uniform jitter on {−M, ..., M} with M = 6; original of size 116 × 200. Restorations using (2)-(3) and (4): arg min J with α = 1 and with α = 0.5.
Remark 2. Because of the jitter, each row of $g$ has no more than $N$ zero-valued pixels at each extremity; see, e.g., Fig. 2. Involving them in our criterion can seriously distort its meaning. So for any row $i$, we use only the data samples $g_i(j)$ for $j \in \{N + 1, \dots, c - N\}$, which certainly belong to the original image.

Guided by Remarks 1 and 2, as well as by a series of preliminary experiments (see, e.g., Fig. 3), our main focus is on
$$\hat d_i = \arg\min \big\{ \mathcal{J}(d_i) : d_i \in \{-N, \dots, N\} \big\}, \qquad N \ge M, \tag{2}$$
$$\mathcal{J}(d_i) = \sum_{j=N+1}^{c-N} \big| g_i(j - d_i) - 2\hat f_{i-1}(j) + \hat f_{i-2}(j) \big|^\alpha, \qquad \alpha \in \{0.5, 1\}. \tag{3}$$
Since $\hat d_i$ belongs to a small finite set, it is easily found by exhaustive search. Then
$$\forall j \in \{1, \dots, c\}: \quad \hat f_i(j) = \begin{cases} g_i(j - \hat d_i) & \text{if } 1 \le j - \hat d_i \le c,\\ 0 & \text{otherwise.} \end{cases} \tag{4}$$
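A single step of (2)-(4) can be sketched as follows, with 0-based indices. The inner range j = N+1, …, c−N of Remark 2 therefore becomes `N:c-N`. The helper names and the test image are our own illustrations. The test image has exactly linear columns, so the second-order differences vanish at the true shift. By (1) and (4), jittering a row with shift $d_i$ is the same operation as `shift_row` with shift $-d_i$.

```python
import numpy as np

# Hypothetical helpers (not from the paper), 0-based indices.
def shift_row(row, d):
    """Eq. (4): out[j] = row[j - d] if 0 <= j - d < c, else 0."""
    c = row.size
    j = np.arange(c)
    valid = (j - d >= 0) & (j - d < c)
    out = np.zeros(c)
    out[valid] = row[j[valid] - d]
    return out

def best_shift(g_row, prev1, prev2, N, alpha=1.0):
    """Eqs. (2)-(3): exhaustive search of d in {-N, ..., N}, inner samples only."""
    inner = slice(N, g_row.size - N)
    def J(d):
        second_diff = shift_row(g_row, d)[inner] - 2.0 * prev1[inner] + prev2[inner]
        return np.sum(np.abs(second_diff) ** alpha)
    return min(range(-N, N + 1), key=J)


c, N, d_true = 40, 3, 2
f = 10.0 * np.sin(0.3 * np.arange(c))[None, :] + 2.0 * np.arange(3)[:, None]
g2 = shift_row(f[2], -d_true)                      # jitter row 2 according to (1)
for alpha in (1.0, 0.5):
    print(best_shift(g2, f[1], f[0], N, alpha))    # 2 for both values of alpha
print(np.allclose(shift_row(g2, d_true)[N:c - N], f[2][N:c - N]))   # True
```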
Criterion $\mathcal{J}$ with $\alpha \in (0, 1]$ is minimized by a $\hat d_i$ such that, for a maximum number of components $j$, we have $\hat f_i(j) \approx 2\hat f_{i-1}(j) - \hat f_{i-2}(j)$, while breakpoints are preserved. In other words, $\hat f_i(j)$, $\hat f_{i-1}(j)$ and $\hat f_{i-2}(j)$ form a nearly linear segment. For a mathematical flavor, see [10, 11, 13]. Then the gray value of each column of $\hat f$ varies nearly piecewise linearly. More details are given in [9].

Remark 3. Dejittering a single frame yields a translated estimate $\hat p$ of the row shifts, say $\hat p = \hat d + C$. Given the original $d$, the integer $C$ is such that
$$C = \arg\max_{n \in \mathbb{Z}} \#\big\{ i \in \{1, \dots, r\} : \hat p_i - n = d_i \big\}, \tag{5}$$
where $\#$ denotes cardinality.

Alternative criteria (see Fig. 3). Minimizing $\mathcal{J}_1(d_i) = \sum_{j=N+1}^{c-N} |g_i(j - d_i) - \hat f_{i-1}(j)|^\alpha$ yields (c)-(d). Criteria $\mathcal{J}_1$ work poorly: they tend to recover vertical pieces of constant gray value. Solving (2)-(3) yields the original image in (e)-(f). Criteria $\mathcal{J}_3(d_i) = \sum_{j=N+1}^{c-N} |g_i(j - d_i) - 3\hat f_{i-1}(j) + 3\hat f_{i-2}(j) - \hat f_{i-3}(j)|^\alpha$ cannot discriminate well enough between a natural image and its slightly shifted versions; see (g)-(h).

2.2 Error Measures for Dejittering
Recall that $\hat f$ is translated with respect to (w.r.t.) $f$ and that the extremities of its rows are null. In order to apply standard error measures, we shrink $\hat f$ to $\hat f^s$,
$$\hat f^s_i(j) = \hat f_i(j + N), \qquad 1 \le j \le c - 2N,\quad \forall i \in \{1, \dots, r\},$$
so that $\hat f^s$ contains only proper image information. Then we select the $r \times (c - 2N)$ inner submatrix $f^s$ of the original $f$ that best matches $\hat f^s$. Note that any error measure on $\hat f^s - f^s$ is sensitive to the choice of $f^s$. We select $f^s$ using the $\ell_1$ norm:
$$\|f^s - \hat f^s\|_1 = \min_{0 \le k \le 2N} \sum_{i=1}^{r} \sum_{j=1}^{c-2N} \big| f_i(j + k) - \hat f^s_i(j) \big|.$$

Fig. 3. (a) Original; (b) independent uniform jitter; restorations for N = M + 1 using (c) J1, α = 1; (d) J1, α = 0.5; (e) J, α = 1; (f) J, α = 0.5; (g) J3, α = 1; (h) J3, α = 0.5.

Then we consider the mean absolute error $\mathrm{mae}(\hat f, f) = \|f^s - \hat f^s\|_1 / \big(r(c - 2N)\big)$ and the peak signal-to-noise ratio $\mathrm{psnr}(\hat f, f) = 10 \log_{10}\big(\delta^2\, r(c - 2N) / \|f^s - \hat f^s\|_2^2\big)$. Here $\|\cdot\|_2$ is the $\ell_2$-norm and $\delta$ is the dynamic range of $(\hat f^s, f^s)$.

The quality of dejittering can also be evaluated using $\hat d - d$. The error measure $e_1(\hat d, d) \stackrel{\text{def}}{=} (1/r)\|d - \hat d\|_1$ gives the average displacement of the pixels along any column. The following two measures are quite interesting:
$$e_\infty(\hat d, d) \stackrel{\text{def}}{=} (100/c)\,\|d - \hat d\|_\infty\ \%, \tag{6}$$
$$e_0^\Delta(\hat d, d) \stackrel{\text{def}}{=} \big(100/(r-1)\big)\, \#\big\{ i : (\hat d_i - d_i) - (\hat d_{i+1} - d_{i+1}) \ne 0,\ 1 \le i \le r - 1 \big\}\ \%. \tag{7}$$
$e_\infty$ measures the maximum horizontal error w.r.t. the width $c$ of the image, while $e_0^\Delta$ measures the number of changes in $d - \hat d$ w.r.t. the height $r$ of the image.

Remark 4. When both $e_\infty$ and $e_0^\Delta$ are small (e.g., $e_\infty \le 0.4\%$ and $e_0^\Delta \le 0.8\%$), dejittering is guaranteed to be nearly perfect, independently of any other error measure (see Figs. 6, 7, 10 and 12). Indeed, for a 512 × 512 image, these error bounds mean that no more than 4 rows have an erroneous horizontal shift, and that this shift is at most 2 pixels. For a natural image, such an error is invisible to the naked eye. However, if one of these values is larger, no conclusion can be drawn; cf. Fig. 9 and the relevant comments.
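The image- and shift-based error measures of this section can be sketched as below, using 0-based indices. The following are our assumptions:

- "The dynamic range of $(\hat f^s, f^s)$" is read as the difference between the largest and smallest values over both arrays.
- The synthetic shift vectors and the helper names are our own.

The usage lines reproduce the arithmetic of Remark 4: with c = 512, a shift error of at most 2 pixels gives $e_\infty = 100 \cdot 2/512 \approx 0.39\%$.

```python
import numpy as np

# Hypothetical helpers (not from the paper), 0-based indices.
def match_inner(f, f_hat, N):
    """Shrink f_hat to f_hat^s and select the inner submatrix f^s of f with the smallest l1 error."""
    w = f.shape[1] - 2 * N
    fs_hat = f_hat[:, N:N + w]                      # f_hat^s_i(j) = f_hat_i(j + N)
    k = min(range(2 * N + 1), key=lambda k: np.abs(f[:, k:k + w] - fs_hat).sum())
    return f[:, k:k + w], fs_hat

def image_errors(f, f_hat, N):
    """mae and psnr of Section 2.2 (dynamic range = max - min over both arrays, assumed)."""
    fs, fs_hat = match_inner(np.asarray(f, float), np.asarray(f_hat, float), N)
    diff = fs - fs_hat
    mae = np.abs(diff).sum() / diff.size
    delta = max(fs.max(), fs_hat.max()) - min(fs.min(), fs_hat.min())
    sse = np.sum(diff ** 2)
    psnr = np.inf if sse == 0 else 10.0 * np.log10(delta ** 2 * diff.size / sse)
    return mae, psnr

def shift_errors(d_hat, d, c):
    """e_1, e_inf (6) and e_0^Delta (7); the last two in percent."""
    e = np.asarray(d, dtype=float) - np.asarray(d_hat, dtype=float)
    r = e.size
    e1 = np.abs(e).sum() / r
    e_inf = 100.0 / c * np.abs(e).max()
    e0 = 100.0 / (r - 1) * np.count_nonzero(np.diff(e))
    return e1, e_inf, e0


d = np.zeros(512, dtype=int)
d_hat = d.copy(); d_hat[100:102] = 2              # two rows off by 2 pixels
print(shift_errors(d_hat, d, c=512))              # e_inf = 0.390625 %, e0 = 200/511 %

rng = np.random.default_rng(0)
f = rng.uniform(0, 255, size=(20, 40))
print(image_errors(f, f, N=4))                    # mae = 0 and psnr = inf: exact restoration
```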
3 Algorithms for Gray-Value Natural Images
We construct an $r \times (c + 2N)$ matrix $f^*$ for $N > M$. The middle of its first row $f_1^*$ is $g_1$, so $\hat p_1 = N + 1$. Then we restore the relative row shifts $\hat p_i \in \{1, \dots, 2N + 1\}$, $\forall i \in \{2, \dots, r\}$, based on (2)-(3) and (4). Finally, $\hat f$ is an inner submatrix of $f^*$.
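Notations. $[a \,\vdots\, b \,\vdots\, c]$ means that $a$, $b$ and $c$ are concatenated horizontally; $a \leftarrow b$ means that we replace $a$ by $b$. For any $n \in \mathbb{N}$, $\theta(n)$ is the $n$-length zero-valued row:
$$\theta(n) \stackrel{\text{def}}{=} [0 \,\vdots\, \cdots \,\vdots\, 0], \qquad \#\theta(n) = n. \tag{8}$$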
Algorithm 1 (Gray value images)
– Fix $N > M$, e.g., $N = M + 1$.
– Choose $\alpha = 1$ or $\alpha = 0.5$.
1. Define $f^* \in \mathbb{R}^{r \times (c+2N)}$ and set $f_1^* = [\theta(N) \,\vdots\, g_1 \,\vdots\, \theta(N)]$.
2. Split $g = [g^L \,\vdots\, \gamma \,\vdots\, g^R]$, where $g^L \in \mathbb{R}^{r\times N}$, $\gamma \in \mathbb{R}^{r\times(c-2N)}$ and $g^R \in \mathbb{R}^{r\times N}$.
3. Put $\hat p_0 = \hat p_1 = N + 1$ and $u = v = [\theta(N) \,\vdots\, \gamma_1 \,\vdots\, \theta(N)]$.
4. For any $i = 2, \dots, r$, do:
   (a) for all $k = 1, \dots, 2N + 1$, do:
       (i) put $h^k = [\theta(k - 1) \,\vdots\, \gamma_i \,\vdots\, \theta(2N - k + 1)]$;
       (ii) find $m = \max\{k, \hat p_{i-1}, \hat p_{i-2}\}$ and $n = \min\{k, \hat p_{i-1}, \hat p_{i-2}\} + c - 2N - 1$;
       (iii) $\mathcal{J}(k) = \frac{1}{n - m + 1} \sum_{j=m}^{n} |h^k_j - 2u_j + v_j|^\alpha$;
   (b) find $\hat p_i = \arg\min\{\mathcal{J}(k) : 1 \le k \le 2N + 1\}$;
   (c) replace $v \leftarrow u$ and $u \leftarrow h^{\hat p_i} = [\theta(\hat p_i - 1) \,\vdots\, \gamma_i \,\vdots\, \theta(2N + 1 - \hat p_i)]$;
   (d) set $f_i^* = [\theta(\hat p_i - 1) \,\vdots\, g_i \,\vdots\, \theta(2N - \hat p_i + 1)]$.
5. Extract $\hat f \in \mathbb{R}^{r\times c}$ from $f^* \in \mathbb{R}^{r\times(c+2N)}$: cancel the $2N$ columns at the left and right ends of $f^*$ that have the largest number of zeros.

Explanations. $u$, $v$ and $h^k$ are $c$-length rows. At step $i$, $u$ and $v$ correspond to the restored rows $i-1$ and $i-2$, respectively, while $h^k$ in 4a(i) realizes all possible shifts for row $i$. In 4a(ii), $m$ and $n$ help to satisfy Remark 2. In 4b, $\hat p_i$ is the estimate of the relative shift of row $i$.

Computation time. We used Matlab 7.2 on a PC with a Pentium 4 CPU (2.8 GHz) and 1 GB RAM, under Windows XP Professional Service Pack 2. For a 512 × 512 image and N = 7, we obtained the solution in 0.62 s for α = 1 and in 1 s for α = 0.5.

Translation Recovery. In order to compute the errors defined in § 2.2, we need the translation constant $C$ given in (5). Note that $1 - N \le C \le 3N + 1$.
Algorithm (Translation Recovery)
1. Define $I = \{-N + 1, \dots, 3N + 1\}$.
2. Compute the histogram $H(n) = \#\{ j \in \{1, \dots, r\} : \hat p_j - d_j = n \}$, $\forall n \in I$.
3. Obtain $C = \arg\max_{n \in I} H(n)$. Then $\hat d_i = \hat p_i - C$, $1 \le i \le r$.
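For readers who want to experiment, here is a compact sketch of Algorithm 1, step 5 and the translation recovery. It uses 0-based array indices but keeps the 1-based values $\hat p_i \in \{1, \dots, 2N+1\}$ of the text. It is our transcription under stated assumptions, not the author's Matlab code:

- The optional weight `beta` implements the compound criterion of Algorithm 1(a) below, with the exponent α applied to the whole bracket (our reading of that formula).
- The synthetic test image has exactly linear columns.
- All function names are hypothetical.

Because the first row serves as reference, $\hat p_i = N + 1 + d_i - d_1$ must stay in $\{1, \dots, 2N+1\}$ for exact recovery. The usage example sets $d_1 = 0$ to guarantee this.

```python
import numpy as np

# Hypothetical helpers (not the author's code).
def jitter(f, d):
    """Model (1), 0-based: g[i, j] = f[i, j + d[i]] when the index is valid, else 0."""
    r, c = f.shape
    g = np.zeros_like(f, dtype=float)
    j = np.arange(c)
    for i in range(r):
        ok = (j + d[i] >= 0) & (j + d[i] < c)
        g[i, j[ok]] = f[i, j[ok] + d[i]]
    return g

def dejitter_gray(g, N, alpha=1.0, beta=0.0):
    """Algorithm 1 (beta > 0: Algorithm 1(a)). Returns f_star and the 1-based shifts p_hat."""
    r, c = g.shape
    w = c - 2 * N
    gamma = g[:, N:c - N]                          # step 2: inner part of each row

    def h(k, i):                                   # h^k = [theta(k-1) : gamma_i : theta(2N-k+1)]
        out = np.zeros(c)
        out[k - 1:k - 1 + w] = gamma[i]
        return out

    f_star = np.zeros((r, c + 2 * N))
    f_star[0, N:N + c] = g[0]                      # step 1
    p_hat = np.full(r, N + 1)                      # step 3: p_1 = N + 1 (and p_0 = N + 1)
    u = v = h(N + 1, 0)
    for i in range(1, r):                          # step 4
        p1, p2 = p_hat[i - 1], (p_hat[i - 2] if i >= 2 else N + 1)
        cost = np.empty(2 * N + 1)
        for k in range(1, 2 * N + 2):
            hk = h(k, i)
            m = max(k, p1, p2)
            n = min(k, p1, p2) + w - 1             # last index covered by all three rows
            seg = slice(m - 1, n)                  # 1-based m..n inclusive
            term = np.abs(hk[seg] - 2 * u[seg] + v[seg]) + beta * np.abs(hk[seg] - u[seg])
            cost[k - 1] = np.mean(term ** alpha)
        p_hat[i] = int(np.argmin(cost)) + 1        # step 4(b)
        v, u = u, h(p_hat[i], i)                   # step 4(c)
        f_star[i, p_hat[i] - 1:p_hat[i] - 1 + c] = g[i]   # step 4(d)
    return f_star, p_hat

def extract(f_star, N):
    """Step 5: drop 2N end columns (k on the left, 2N - k on the right) holding the most zeros."""
    c = f_star.shape[1] - 2 * N
    zeros = (f_star == 0).sum(axis=0)
    k = max(range(2 * N + 1), key=lambda k: zeros[:k].sum() + zeros[k + c:].sum())
    return f_star[:, k:k + c]

def recover_translation(p_hat, d, N):
    """Translation Recovery: C = most frequent value of p_hat - d on I = {-N+1, ..., 3N+1}."""
    I = np.arange(-N + 1, 3 * N + 2)
    H = np.array([np.count_nonzero(p_hat - d == n) for n in I])
    C = int(I[np.argmax(H)])
    return C, p_hat - C


rng = np.random.default_rng(0)
r, c, M = 30, 60, 3
N = M + 1
f = 10.0 * np.sin(0.3 * np.arange(c))[None, :] + 0.2 * np.arange(r)[:, None]
d = rng.integers(-M, M + 1, size=r)
d[0] = 0                                           # keeps every p_hat_i inside {1, ..., 2N+1}
g = jitter(f, d)
f_star, p_hat = dejitter_gray(g, N, alpha=0.5)
C, d_hat = recover_translation(p_hat, d, N)
print(C, np.array_equal(d_hat, d), extract(f_star, N).shape)
```

The inner loop costs O(c) per candidate shift, so one frame costs O(r · N · c). This is consistent with the one-iteration, sub-second timings reported above.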
Compound models. If the gray values of the columns of an image are nearly constant on large pieces, a first-order difference term should be included in $\mathcal{J}$.

Algorithm 1(a). In Algorithm 1, step 4a(iii), use the criterion below, where $\beta$ is a weight for the first-order differences:
$$\mathcal{J}(k) = \frac{1}{n - m + 1} \sum_{j=m}^{n} \Big( |h^k_j - 2u_j + v_j| + \beta\, |h^k_j - u_j| \Big)^\alpha, \qquad \beta \ge 0.$$

Illustrations. In all experiments, Algorithm 1 is applied with N = M + 1. The jitter in Fig. 4 is significant; we kept this first trial since our method found the original for α ∈ {0.5, 1}. In Fig. 5 (Peppers), the dejittered image is hard to distinguish from the original. However, the error image $f^s - \hat f^s$ shows a slight displacement of several pixels. The dejittered image in Fig. 6 is nearly perfect since $e_0^\Delta = 0.6\%$ and $e_\infty = 0.39\%$. We observe that $d - \hat d$ has a 1-pixel error at rows 83, 84 and 401; the first two lie within the zooms in the same figure. The restored Boat in Fig. 7 is quasi-perfect since $e_0^\Delta = 0.25\%$ and $e_\infty = 0.39\%$. The original Boat can be seen in Fig. 8, where the restorations are exact (all errors are null). For the results concerning [12] and [3], cf. Section 6.
Fig. 4. Uniform jitter, M = 6. Bayesian TV [12]: mae = 11.7, psnr = 22; Bake & Shake [3]: mae = 7.4, psnr = 23; Algorithm 1 ≡ original: mae = 0, psnr = ∞. Algorithm 1 for α = 1 and α = 0.5 yields the original image.
Fig. 5. Peppers (512 × 512), uniform jitter, M = 10. Panels: jittered image, original, Algorithm 1 with α = 0.5, and error $f^s - \hat f^s$. Algorithm 1 with α = 0.5 yields mae = 1.35, psnr = 31.51 and $e_1$ = 0.4.
Large-Scale Experiment. We tested all proposed algorithms using 1000 independent experiments, in which 4 images were degraded with 2 different types of random jitter and restorations were computed for α = 1 and α = 0.5. The main conclusion is that α = 0.5 is better for images with texture or curvatures (Lena, Barbara, Peppers), whereas α = 1 is better for images with many straight lines (Boat). α = 1 yields good results in all cases, but α = 0.5 usually works better. The details are reported in [9]. Globally, the obtained mean results are very encouraging.

Fig. 6. (512 × 512) Uniform jitter, M = 6. Panels: jittered image, Algorithm 1 with α = 0.5, zoom of the dejittered image, zoom of the original. Algorithm 1: mae = 4.16, psnr = 25.53, $e_0^\Delta$ = 0.6% and $e_\infty$ = 0.39%.

Fig. 7. Boat (400 × 512), uniform jitter, M = 10. Bayesian TV [12]: mae = 13.4, psnr = 20.8; Bake & Shake [3]: mae = 12.5, psnr = 20.3; Algorithm 1, α ∈ {0.5, 1}: mae = 0.6, psnr = 42.9. Algorithm 1 is nearly perfect: $e_0^\Delta$ = 0.25% and $e_\infty$ = 0.39%.

Fig. 8. Boat (400 × 512) with structured jitter $d_n = \lfloor 6 \sin(n/20) \rceil$ and $d_n = \lfloor 6 \sin(n/4) \rceil$; in both cases Algorithm 1 ≡ original. Here $\lfloor \cdot \rceil$ denotes approximation to the nearest integer.
4 Algorithms for Color Natural Images
We extend Algorithm 1 to RGB color images in which all channels incur the same jitter. RGB images are represented by vector-valued matrices $f$, where each pixel $f_i(j)$ has 3 components $f_i(j;\kappa)$, $1 \le \kappa \le 3$. The jittering model now reads:
$$g_i(j;\kappa) = \begin{cases} f_i(j + d_i;\kappa) & \text{if } 1 \le j + d_i \le c,\\ 0 & \text{otherwise,}\end{cases} \qquad 1 \le j \le c,\ 1 \le i \le r,\ |d_i| \le M,\ 1 \le \kappa \le 3.$$
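To make the color jittering model concrete, the following NumPy sketch simulates it. It is an illustration only: it assumes the image is stored as an array of shape (r, c, 3) with 0-based indices, and the function name `jitter_color` is hypothetical.

```python
import numpy as np

def jitter_color(f, d):
    """Apply the color jittering model to an image f of shape (r, c, 3).

    Row i is shifted by d[i]: g_i(j; kappa) = f_i(j + d_i; kappa) when
    1 <= j + d_i <= c (1-based), and 0 otherwise. The same shift is used
    for all three channels.
    """
    f = np.asarray(f, dtype=float)
    r, c, _ = f.shape
    g = np.zeros_like(f)
    for i in range(r):
        for j in range(c):            # 0-based j is column j + 1 in the text
            src = j + int(d[i])       # 0-based source column
            if 0 <= src < c:
                g[i, j, :] = f[i, src, :]
    return g
```

Positions that would read outside the row are filled with zeros. These are the pixels that the dejittering criterion must exclude.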
The main algorithm is again based on (2)–(3) and (4). Since the jitter is the same for all color channels, we obtain from $g$ a gray-value image $\gamma$ and estimate the relative row shifts $\hat p_i$ from $\gamma$ as in Algorithm 1. The dejittered color image $\hat f$ is obtained by inserting $\hat p$ into $g$. Similarly to (8), for any positive integer $n$ we denote by $\theta(n \times 3)$ the $n$-length vector-valued row whose components are $(0, 0, 0)$ for all $i = 1, \ldots, n$.

Algorithm 2 (Color images)
– Fix N > M, e.g., N = M + 1.
– Choose α = 1 or α = 0.5.
1. Define $f^* \in \mathbb{R}^{r\times(c+2N)\times 3}$ and set $f^*_1 = [\,\theta(N\times 3) \,\vdots\, g_1 \,\vdots\, \theta(N\times 3)\,]$.
2. Split $g = [\,g^L \,\vdots\, \bar g \,\vdots\, g^R\,]$, where $g^L \in \mathbb{R}^{r\times N}$, $\bar g \in \mathbb{R}^{r\times(c-2N)}$ and $g^R \in \mathbb{R}^{r\times N}$.
3. Calculate $\gamma_1(j) = |\bar g_1(j;1)| + |\bar g_1(j;2)| + |\bar g_1(j;3)|$ for $1 \le j \le c - 2N$.
4. Put $\hat p_0 = \hat p_1 = N + 1$ and $u = v = [\,\theta(N), \gamma_1, \theta(N)\,]$.
5. For any i = 2, ..., r do:
(a) ∀ k = 1, ..., 2N + 1 do:
i. $\gamma_i(j) = \bar g_i(j;1) + \bar g_i(j;2) + \bar g_i(j;3)$;
ii. do step 4a as in Algorithm 1;
(b) do steps 4b and 4c as in Algorithm 1;
(c) set $f^*_i = [\,\theta((\hat p_i - 1)\times 3) \,\vdots\, g_i \,\vdots\, \theta((2N - \hat p_i + 1)\times 3)\,]$.
6. Find $\hat f \in \mathbb{R}^{r\times c}$ as in step 5 of Algorithm 1.

Computation time. Under the conditions of Remark 3, p. 443, for a 512×512 RGB image and N = 7, we obtained the solution in 1 s for α = 1 and in 1.4 s for α = 0.5.

Algorithm 2(a) (Compound models). In step 5a of Algorithm 2, replace J as done in Algorithm 1(a).

Illustrations. In all examples, Algorithms 2 and 2(a) are used with N = M + 1. In Fig. 9, most of the error in $\hat d$ corresponds to the sky and the ground. These regions are quite homogeneous, so the error is invisible to the naked eye. Part of the error reaches the boat, so we display a zoom of the latter. Fig. 10 shows a zoom of a 707 × 579 image. The dejittering of the full image is nearly perfect, since $e_\infty = 0.17\%$ and $e_0^\Delta = 0.28\%$. The jitter in Fig. 11 is a centered Gaussian with standard deviation σ = 6, truncated and quantized on {−12, ..., 12}. Algorithm 2(a) with α = 0.5 and β ∈ {2, 3} gives better visual results than Algorithm 2. Fig. 12 shows a nearly perfect restoration, since $e_\infty = 0.18\%$ and $e_0^\Delta = 0.37\%$.

Fig. 9. Man (478 × 532), uniform jitter, M = 8. Panels: original; restored with Algorithm 2, α = 1; zooms. Dejittering yields mae = 1.45, psnr = 33.82, $e_1 = 0.76$ and $e_\infty = 3.76\%$.
Fig. 10. Zooms of a 707 × 579 image. Panels: jittered, jitter $\mathcal{N}(0, 5^2)$ truncated on {−15, ..., 15}; Algorithm 2, α = 0.5; original. The restoration of the whole image is quasi-perfect: $e_\infty = 0.17\%$ and $e_0^\Delta = 0.28\%$.
Fig. 11. Gaussian jitter, M = 12, restored with Algorithm 2(a). Zooms: (a) jittered, (b) original, (c) dejittered.
Fig. 12. Original (542 × 410), uniform jitter M = 8, restored with Algorithm 2, α = 0.5. The result is quasi-perfect: mae = 0.14, psnr = 45.15, $e_0^\Delta = 0.37\%$ and $e_\infty = 0.18\%$.
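Steps 3 and 5(c) of Algorithm 2 reduce to simple array operations, as the sketch below shows. It assumes that color rows are NumPy arrays of shape (c, 3), or (c − 2N, 3) for $\bar g_i$, and that $\hat p_i$ is 1-based. The helper names are hypothetical. The row-shift estimation itself (steps 4a to 4c of Algorithm 1) is not reproduced.

```python
import numpy as np

def gray_from_color(row_bar):
    """Step 3 of Algorithm 2: gamma(j) = |g(j;1)| + |g(j;2)| + |g(j;3)|."""
    return np.abs(np.asarray(row_bar, dtype=float)).sum(axis=1)

def padded_row(g_row, p_hat, N):
    """Step 5(c): [theta((p_hat-1) x 3), g_i, theta((2N-p_hat+1) x 3)].

    g_row has shape (c, 3) and p_hat is 1-based in {1, ..., 2N+1};
    the result always has shape (c + 2N, 3).
    """
    g_row = np.asarray(g_row, dtype=float)
    if not 1 <= p_hat <= 2 * N + 1:
        raise ValueError("p_hat must lie in {1, ..., 2N+1}")
    left = np.zeros((p_hat - 1, 3))          # theta((p_hat - 1) x 3)
    right = np.zeros((2 * N - p_hat + 1, 3))  # theta((2N - p_hat + 1) x 3)
    return np.concatenate([left, g_row, right], axis=0)
```

The padded rows all have length c + 2N, whatever the estimated shift. The value $\hat p = N + 1$ centers the row, which matches the initialization in step 4.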
5 Restoration of Noisy Jittered Images
Our approach is to first dejitter the raw data using the ideas of Algorithms 1–2 and then to denoise the dejittered image. In the second stage, we use fast shrinkage estimators, see e.g. [8]. More elaborate denoising methods would improve the final result.

5.1 Moderate Noise
For noise with an SNR of 15–20 dB or more, Algorithms 1 and 2 perform well.
Experiment. The image in Fig. 13(a) is corrupted with white zero-mean normal noise at 15 dB SNR and with independent uniform jitter on {−6, ..., 6}. Since the columns of the image are nearly constant on large segments, dejittering in (b) is done using Algorithm 1(a) with β = 3. In (c), image (b) is denoised by hard thresholding its 2D Daubechies wavelet transform with 4 vanishing moments, using T = 30. The restoration is fast, and the result is clean compared to Fig. 5.
Fig. 13. Peppers (512 × 512). Panels: (a) 15 dB SNR + jitter; (b) dejittered, Alg. 1; (c) denoised. For the restored image in (c), psnr = 29.34.
5.2 Strong Noise
When the noise is strong, we propose a slightly different scheme with a comparable computational cost. The idea is to partially denoise each row of the image using hard thresholding. In addition, the function $|\cdot|^\alpha$ in step 4a(iii) of Algorithm 1 is replaced by a better adapted edge-preserving function ψ. Let $W : \mathbb{R}^{1\times n} \to \mathbb{R}^{1\times n}$ denote a 1D wavelet transform and $W^*$ its inverse. Given a threshold $T > 0$, we introduce the hard thresholding operator $H_T : \mathbb{R}^{1\times n} \to \mathbb{R}^{1\times n}$ by
$$H_T(w)(j) = \begin{cases} 0 & \text{if } |w(j)| \le T,\\ w(j) & \text{otherwise,}\end{cases} \qquad 1 \le j \le n,\ \forall w \in \mathbb{R}^{1\times n}. \qquad (9)$$
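The operator $H_T$ in (9) and the row under-denoising $W^* H_T(W\,\cdot)$ used below can be sketched as follows. This sketch swaps in an orthonormal multilevel Haar transform (the Daubechies wavelet with one vanishing moment) as a self-contained stand-in for the Daubechies transforms used in the experiments. The function names and the number of levels are assumptions for illustration.

```python
import numpy as np

def hard_threshold(w, T):
    """H_T from (9): set every coefficient with |w(j)| <= T to zero, keep the rest."""
    w = np.asarray(w, dtype=float)
    return np.where(np.abs(w) <= T, 0.0, w)

def haar_forward(x, levels):
    """Orthonormal multilevel 1D Haar transform W (stand-in for Daubechies)."""
    a = np.asarray(x, dtype=float)
    if a.size % (2 ** levels) != 0:
        raise ValueError("row length must be divisible by 2**levels")
    details = []
    for _ in range(levels):
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # detail coefficients
        a = (even + odd) / np.sqrt(2.0)              # approximation coefficients
    return a, details

def haar_inverse(a, details):
    """W^*: exact inverse of haar_forward (the transform is orthonormal)."""
    for d in reversed(details):
        out = np.empty(2 * a.size)
        out[0::2] = (a + d) / np.sqrt(2.0)
        out[1::2] = (a - d) / np.sqrt(2.0)
        a = out
    return a

def underdenoise_row(row, T, levels=3):
    """gamma = W^* H_T(W row): hard-threshold all coefficients, then invert."""
    a, details = haar_forward(row, levels)
    return haar_inverse(hard_threshold(a, T), [hard_threshold(d, T) for d in details])
```

For example, a step row [0, 0, 0, 0, 10, 10, 10, 10] perturbed by ±0.1 in its first two entries is returned as the clean step for T = 5. The perturbation lives only in one small detail coefficient, while the large coefficients describing the edge are kept unchanged.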
Knowing that the asymptotically optimal T, cf. [2], oversmooths the rows, we use an under-optimal T. To simplify the presentation, we give the algorithm for gray-value images. The extension to color images is straightforward, cf. [9].

Algorithm 3 (Quite noisy images)
– Fix N > M, e.g., N = M + 1.
– Choose a 1D wavelet transform W (e.g., Daubechies).
– Fix an under-optimal threshold T.
– Choose $\psi : \mathbb{R}\times\mathbb{R} \to \mathbb{R}_+$, e.g., $\psi(s,t) = (|s| + \beta|t|)^\alpha$, and fix α > 0 and β ≥ 0.
1. Define $f^* \in \mathbb{R}^{r\times(c+2N)}$ and set $f^*_1 = [\,\theta(N) \,\vdots\, g_1 \,\vdots\, \theta(N)\,]$.
2. Split $g = [\,g^L \,\vdots\, \bar g \,\vdots\, g^R\,]$, where $g^L \in \mathbb{R}^{r\times N}$, $\bar g \in \mathbb{R}^{r\times(c-2N)}$ and $g^R \in \mathbb{R}^{r\times N}$.
3. Compute $\gamma_1 = W^*\big(H_T(W\bar g_1)\big)$.
4. Do steps 3 to 5 of Algorithm 1 with the following changes:
(a) in step 4a(i), insert $\gamma_i = W^*\big(H_T(W\bar g_i)\big)$;
(b) in step 4a(iii), use $J(k) = \frac{1}{n-m+1}\sum_{j=m}^{n} \psi\big(|h^k_j - 2u_j + v_j|,\ |h^k_j - u_j|\big)$.
Fig. 14. Boat (512 × 512). Restoration of (a) using different methods. Panels: (a) 10 dB SNR + jitter; (b) Bayesian TV [12], mae = 19.36, psnr = 20.24; (c) Bake & Shake [3], mae = 20.62, psnr = 19.37; (d) Algorithm 3, dejittering; (e) our 2-stage method, mae = 7, psnr = 28.31; original.
Comments. Hard thresholding in steps 3 and 4a works better than other shrinkage rules because it keeps the important coefficients unchanged. The 1D row under-denoising (step 4a) helps to approach the model of Remark 1. Denoising of a dejittered image can be done by various methods.
Experiment. Boat in Fig. 14 is corrupted with white zero-mean normal noise at 10 dB SNR and with independent jitter, uniform on {−8, ..., 8}. The restoration using Bake and Shake [3] is visually better than that using Bayesian TV [12]. For these results, cf. Section 6, p. 450. We used Algorithm 3(a) with β = 0 and ψ(t) = |t|^α for α = 0.5 in step 4b. In steps 3 and 4a, we use hard thresholding of the Daubechies wavelet coefficients with 2 vanishing moments, with T = 30. The dejittered image in (d) is denoised in (e) by hard thresholding its curvelet transform, using the enhanced-denoising program of the CurveLab 2.1.2 toolbox associated with [1].
6 Conclusions
The results obtained are of remarkable quality, while the algorithms run in nearly real time. More details and examples are presented in [9]. The crux of our approach is twofold: (a) to minimize a nonsmooth and possibly nonconvex local criterion on the magnitude of the second-order differences between consecutive rows; (b) to exclude from J all pixels that stem from the jitter. In the presence of strong noise, a critical step is to (under-)denoise the rows successively so that the prior mentioned in Remark 1 remains relevant, and to adapt the criterion J if necessary. The natural evolution of this work is to apply it to the restoration of video sequences and to take advantage of the correlation between consecutive frames.
Acknowledgements. This work has been supported by grant Freedom, anr07-jcjc-0048-01. The author thanks Louis Laborelli (Institut National de l'Audiovisuel, France) for his discussion on practical questions relevant to jittering. The author is thankful to Dr. Sung-Ha Kang, Georgia Institute of Technology, Atlanta, who carried out all experiments with the methods [3] and [12]. Thanks also go to Dr. Jackie Shen (Barclays Capital, Wall Street), who provided his Matlab codes for [12].
References
1. Candès, E.J., Demanet, L., Donoho, D.L., Ying, L.: Fast discrete curvelet transforms. SIAM J. on Multiscale Modeling and Simulation 5(3), 861–899 (2006)
2. Donoho, D.L., Johnstone, I.M.: Ideal Spatial Adaptation by Wavelet Shrinkage. Biometrika 81(3), 425–455 (1994)
3. Kang, S.-H., Shen, J.: Video dejittering by bake and shake. Image and Vision Computing 24(2), 143–152 (2006)
4. Kang, S.-H., Shen, J.: Image Dejittering Based on Slicing Moments. Springer Series on Mathematics and Visualization, pp. 35–55 (2007)
5. Kokaram, A., Roosmalen, P.M.B., Rayner, P., Biemond, J.: Line registration of jittered video. In: Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. 2553–2556 (1997)
6. Kokaram, A.: Motion picture restoration. Springer, Heidelberg (1998)
7. Laborelli, L.: Removal of video line jitter using a dynamic programming approach. In: Proc. of the IEEE ICASSP, pp. 331–334 (2003)
8. Mallat, S.: A Wavelet Tour of Signal Processing. Academic Press, London (1999)
9. Nikolova, M.: One-iteration dejittering of digital video images. Report CMLA n.2008-20, http://www.cmla.ens-cachan.fr/fileadmin/Membres/nikolova/RT-DJ.pdf
10. Nikolova, M.: Local strong homogeneity of a regularized estimator. SIAM J. on Appl. Mathematics 61(2), 633–658 (2000)
11. Nikolova, M.: Analysis of the recovery of edges in images and signals by minimizing nonconvex regularized least-squares. SIAM J. on Multiscale Modeling and Simulation 4(3), 960–991 (2005)
12. Shen, J.: Bayesian video dejittering by BV image model. SIAM J. on Appl. Mathematics 64(5), 1691–1708 (2004)
13. Welk, M., Weickert, J., Becker, F., Schnörr, C., Feddern, C., Burgeth, B.: Median and related local filters for tensor-valued images. Signal Processing (special issue Tensor Signal Processing) 7, 291–308 (2007)
Sparsity Regularization for Radon Measures
Otmar Scherzer (1,2) and Birgit Walch (1)
(1) Department of Mathematics, University of Innsbruck, Technikerstr. 21a, A-6020 Innsbruck, Austria; http://infmath.uibk.ac.at
(2) Radon Institute of Computational and Applied Mathematics, Altenberger Str. 69, A-4040 Linz, Austria
Abstract. In this paper we establish a regularization method for Radon measures. Motivated by sparse L1 regularization, we introduce a new regularization functional for the Radon norm and analyze its properties. Furthermore, we show well-posedness of Radon-measure-based sparsity regularization. Finally, we present numerical examples along with the underlying algorithmic and implementation details. We shall see that the number of iterations is of utmost importance for obtaining reliable reconstructions of sparse data with varying intensities.
1 Introduction
In this paper we consider the solution of the abstract equation
$$F u' = v \quad\text{subject to}\quad u' \in \mathrm{dom}\,F. \qquad (1)$$
The operator $F$ is linear and bounded between the Hilbert spaces $W'$ and $V$. We assume that $\mathrm{dom}\,F$ is a subset of the Radon measures on a bounded domain $\Omega \subseteq \mathbb{R}^n$. We consider solving the operator equation (1) approximately by a variational regularization method, which consists in minimizing the functional
$$\hat T_{\alpha,v^\delta}(u') := \|F u' - v^\delta\|_V^2 + \alpha\,\|u'\|_{RM} \qquad (2)$$
on $\mathrm{dom}\,F \subseteq W'$. Here $\|u'\|_{RM}$ is the norm of the Radon measure $u'$. To see the relation to sparsity, note that if $u'$ is absolutely continuous with density $U$, i.e., $U\,dx = du'$, then
$$\|u'\|_{RM} = \sup\Big\{\int_\Omega U v\,dx : v \in C_0(\Omega),\ \|v\|_{L^\infty} \le 1\Big\} = \|U\|_{L^1}.$$
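The identity $\|u'\|_{RM} = \|U\|_{L^1}$ can be checked numerically in a discretized setting, as in the sketch below. It assumes Ω = (0, 1), a midpoint rule on 1000 cells and an arbitrarily chosen smooth density $U$; all names are illustrative. In the discrete setting the supremum is attained at $v = \mathrm{sgn}(U)$. In the continuous setting, sgn(U) is discontinuous but can be approximated by continuous functions with $|v| \le 1$, which is why the supremum over $C_0(\Omega)$ still equals $\|U\|_{L^1}$.

```python
import numpy as np

# Discretize Omega = (0, 1) into cells of width h (an illustrative choice).
h = 1.0 / 1000
x = (np.arange(1000) + 0.5) * h              # cell midpoints
U = np.sin(6 * np.pi * x) * np.exp(-x)       # example density (assumption)

def pairing(U, v, h):
    """Midpoint-rule approximation of the integral of U * v over Omega."""
    return float(np.sum(U * v) * h)

l1_norm = float(np.sum(np.abs(U)) * h)       # ||U||_{L^1}
best = pairing(U, np.sign(U), h)             # v = sgn(U) attains the supremum
rng = np.random.default_rng(0)
trial = pairing(U, rng.uniform(-1.0, 1.0, U.size), h)  # another v with |v| <= 1
```

Any other admissible $v$ gives a smaller pairing, since $\int U v \le \int |U|\,|v| \le \int |U|$.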
The regularization method with $\hat T_{\alpha,v^\delta}$, where the Radon norm is replaced by the $L^1$-norm, has been analyzed in [13]. There, however, different assumptions have been made that guarantee existence of a minimizer in $L^1(\Omega)$. In this work, by contrast, we consider minimizers which are Radon measures.

The notion of sparsity appears in a variety of settings. In the context of regularization it is mostly used in connection with regularization terms $R_S(u') := \sum_i \omega_i |\langle u', \phi_i\rangle|$. Here $(\phi_i)$ is a set of appropriate functions, typically forming a basis or frame, the inner product is that of a Hilbert space, and the $\omega_i$ are positive coefficients. We refer to a few papers related to this topic [7, 2, 3, 4, 5, 8, 9, 10, 11, 12, 14]. Some researchers even call total variation minimization sparsity regularization. We study the reconstruction of sparse functions and measures. In contrast to total variation regularization, we focus on reconstructing sparse measures and not gradient measures. There is a fundamental difference between regularization with $R_S$ and $L^1$ (respectively, Radon measure) regularization. To see this, take $(\phi_i)$ an orthonormal basis and $\omega_i = 1$ in the definition of $R_S(u')$, and note that standard convex analysis in the Hilbert space $l^2$ is applicable. Since $l^1 \subseteq l^2$, we can consider minimization of $u' \mapsto \|Fu' - v^\delta\|^2 + \alpha R_S(u')$ over $l^2 \cong L^2(\Omega)$. That is, there is a proper extension of the functional from $l^1$ to $l^2$ if the operator $F$ can be extended to $l^2$. However, convex analysis in the Hilbert space $L^2$ is not applicable to $\|\cdot\|_{L^1}$ regularization. On domains with finite measure, $L^2(\Omega) \subset L^1(\Omega)$, so minimizing $u' \mapsto \|Fu' - v^\delta\|^2 + \alpha\|u'\|_{L^1}$ over $L^2(\Omega)$ is a real restriction of the proper domain of the regularization functional, which is $L^1(\Omega)$. Curiously, after discretization of the latter with piecewise constant functions, a truncated expansion of the former is revealed.

The outline of this paper is as follows. In Section 2 we review the analysis of variational regularization methods. In Section 3 we review some basic facts on Radon measures and duals of Sobolev spaces. Having specified the ingredients, in Section 4 we apply the general results of the review sections to $\hat T_{\alpha,v^\delta}$ and show well-posedness and regularizing properties. Section 5 shows the analogy of the analysis to total variation minimization. Section 6 presents an example of sparse recovery and shows some reconstructions.

Birgit Walch is a recipient of a DOC-fFORTE fellowship of the Austrian Academy of Sciences at the Department of Mathematics of the University of Innsbruck.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 452–463, 2009. © Springer-Verlag Berlin Heidelberg 2009
2 Review on Convergence Properties of Variational Regularization Methods
In this section we make the following general assumptions, sticking to the notation of [13]. Afterwards, we apply the results to the setting already used in the introduction.
Assumption 1.
1. Let $U$ and $V$ be Hilbert spaces.
2. $L : U \to V$ is a bounded linear operator.
3. $F := L|_{\mathrm{dom}\,F}$, where $\emptyset \neq \mathrm{dom}\,F$ is closed and convex in $U$.
4. $\tau_U$ and $\tau_V$ are the weak topologies on $U$ and $V$, respectively.
We now consider the solution of the abstract equation
$$F u = v \quad\text{subject to}\quad u \in \mathrm{dom}\,F. \qquad (3)$$
We consider solving this operator equation by variational regularization methods, which consist in minimizing the functional
$$T_{\alpha,v^\delta}(u) := \|Fu - v^\delta\|_V^2 + \alpha R(u),$$
where $v^\delta \in V$. In most applications, $v^\delta$ is a noisy approximation of the data $v$ in (3). For the family $(T_{\alpha,v^\delta})$ to have regularization properties, $R$, $\|\cdot\|_V$, and $L$ are required to satisfy:
Assumption 2.
1. The norm $\|\cdot\|_V$ is sequentially lower semi-continuous with respect to $\tau_V$.
2. The functional $R : U \to [0,\infty]$ is convex and sequentially lower semi-continuous with respect to $\tau_U$; $\mathrm{dom}\,R = \{u : R(u) \neq \infty\}$ is the domain of $R$.
3. $D := \mathrm{dom}\,F \cap \mathrm{dom}\,R \neq \emptyset$ (which, in particular, implies that $R$ is proper).
4. For every $\alpha > 0$ and $M > 0$, the level sets $M_\alpha(M) := \mathrm{level}_M(T_{\alpha,v}) := \{u \in U : T_{\alpha,v}(u) \le M\}$ are sequentially pre-compact with respect to $\tau_U$.
5. For every $M > 0$, the set $M_\alpha(M)$ is sequentially closed with respect to $\tau_U$, and the restriction of $F$ to $M_\alpha(M)$ is sequentially continuous with respect to the topologies $\tau_U$ and $\tau_V$.
We stress that the sets $M_\alpha(M)$ are defined via the Tikhonov functional for unperturbed data $v$, and we do not a priori exclude the case $M_\alpha(M) = \emptyset$. We refer to the following theorems from [13], which guarantee the existence of a minimizer, stability of the regularized solutions, and convergence:
Theorem 3 (Existence). Let $F$, $R$, $D$, $U$, and $V$ satisfy Assumption 2. Assume that $\alpha > 0$ and $v^\delta \in V$. Then there exists a minimizer of $T_{\alpha,v^\delta}$.
It has been shown by several authors that information on the noise level
$$\|v^\delta - v\| \le \delta \qquad (4)$$
is essential for an analysis of regularization methods. In fact, without this information the regularization parameter cannot be chosen such that convergence of $u_\alpha^\delta$ to a solution of (1) is guaranteed.
Theorem 4 (Stability). Let $F$, $\mathrm{dom}\,F$, $U$, and $V$ satisfy Assumption 2. Assume that $\alpha > 0$ and $v_k \to v^\delta$. Moreover, let
$$u_k \in \arg\min T_{\alpha,v_k}, \qquad k \in \mathbb{N}.$$
Then $(u_k)$ has a convergent subsequence, and every convergent subsequence converges to a minimizer of $T_{\alpha,v^\delta}$.
The following theorem clarifies the role of the regularization parameter α: it has to be chosen depending on the noise level to guarantee approximation of the solution of (3).
Theorem 5 (Convergence). Let $F$, $\mathrm{dom}\,F$, $U$, and $V$ satisfy Assumption 2. Assume that (3) has a solution in $\mathrm{dom}\,F$ and that $\alpha : (0,\infty) \to (0,\infty)$ satisfies
$$\alpha(\delta) \to 0 \quad\text{and}\quad \frac{\delta^2}{\alpha(\delta)} \to 0, \qquad \text{as } \delta \to 0.$$
Moreover, let the sequence $(\delta_k)$ of positive numbers converge to 0, and assume that the data $v_k := v^{\delta_k}$ satisfy $\|v - v_k\| \le \delta_k$. Let $u_k \in \arg\min T_{\alpha(\delta_k),v_k}$. Then $(u_k)$ has a subsequence converging to a solution of (1).
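As a simple illustration of the parameter choice in Theorem 5 (our example, not taken from the source), the a-priori rule α(δ) = δ is admissible whereas α(δ) = δ² is not:
$$\alpha(\delta) = \delta:\quad \alpha(\delta) \to 0 \ \text{ and } \ \frac{\delta^2}{\alpha(\delta)} = \delta \to 0; \qquad \alpha(\delta) = \delta^2:\quad \frac{\delta^2}{\alpha(\delta)} = 1 \not\to 0.$$
In other words, α must tend to zero, but more slowly than δ².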
3 Regularization on the Space of Radon Measures
We assume that $\Omega \subseteq \mathbb{R}^n$ and $\Omega' \subseteq \mathbb{R}^m$ are bounded, open, and connected with Lipschitz boundary. For simplicity of presentation we take $V = L^2(\Omega')$. Other spaces can be considered, but then the notation becomes less transparent. We study minimization of the functional
$$\hat T_{\alpha,v^\delta}(u') := \int_{\Omega'} (Fu' - v^\delta)^2 + \alpha\,\|u'\|_{RM} \qquad (5)$$
over the set of Radon measures on Ω. Here, $\|u'\|_{RM}$ denotes the Radon norm of $u'$.
Radon Measures. Below we briefly review some facts about Radon measures and specify the relevant properties. The set of Radon measures is the dual of $C_0(\Omega)$, the space of continuous functions from Ω into ℝ with compact support in Ω. We always consider $C_0(\Omega)$ equipped with the supremum norm. We denote the dual by $\mathcal{M} := (C_0(\Omega))'$, and for $u' \in \mathcal{M}$ the Radon norm is defined by
$$\|u'\|_{RM} := \sup\Big\{\int_\Omega v\,du' : v \in C_0(\Omega),\ \|v\|_{L^\infty} \le 1\Big\}.$$
We recall the definition of weak* convergence in $\mathcal{M}$: a bounded sequence $(u'_k)_k$ in $\mathcal{M}$ is weakly* convergent to $u' \in \mathcal{M}$ if
$$\lim_{k\to\infty}\int_\Omega f\,du'_k = \int_\Omega f\,du' \quad\text{for all } f \in C_0(\Omega).$$
Below we show that $\|\cdot\|_{RM}$ is lower semi-continuous with respect to weak* convergence on $\mathcal{M}$.
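Lemma 1. $\|\cdot\|_{RM}$ is lower semi-continuous with respect to weak* convergence on $\mathcal{M}$.
Proof. Let a sequence of Radon measures $(u'_k)_k$ be weakly* convergent to some measure $u'$. For every $v \in C_0(\Omega)$ with $\|v\|_{L^\infty} \le 1$ we have $\int_\Omega v\,du'_k \le \|u'_k\|_{RM}$. Hence
$$\|u'\|_{RM} = \sup\Big\{\int_\Omega v\,du' : v \in C_0(\Omega),\ \|v\|_{L^\infty} \le 1\Big\} = \sup\Big\{\lim_{k\to\infty}\int_\Omega v\,du'_k : v \in C_0(\Omega),\ \|v\|_{L^\infty} \le 1\Big\} \le \liminf_{k\to\infty}\|u'_k\|_{RM}.$$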
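The inequality in Lemma 1 can be strict, so weak* lower semi-continuity cannot be upgraded to continuity. A standard example (our illustration, not from the source) takes Ω = (−1, 1) and the bounded sequence of differences of Dirac measures $u'_k = \delta_{1/k} - \delta_0$, $k \ge 2$:
$$\int_\Omega f\,du'_k = f(1/k) - f(0) \xrightarrow[k\to\infty]{} 0 \quad\text{for all } f \in C_0(\Omega), \qquad\text{so } u'_k \overset{*}{\rightharpoonup} 0, \quad\text{while}\quad \|u'_k\|_{RM} = 2 \ \text{ for all } k.$$
The norm equals 2 because a continuous, compactly supported $v$ with $|v| \le 1$, $v(1/k) = 1$ and $v(0) = -1$ attains this bound. Hence $\|0\|_{RM} = 0 < 2 = \liminf_{k}\|u'_k\|_{RM}$.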
Dual of a Sobolev Space. Let $s \in \mathbb{N}$ be fixed. In the following we investigate the dual of the Sobolev space $W := W_0^{s,2}(\Omega)$, which is a Hilbert space with the inner product
$$\langle w_1, w_2\rangle_s := \int_\Omega \nabla^s w_1 \cdot \nabla^s w_2,$$
where $\nabla^s$ is the tensor containing all $s$-th derivatives. The associated norm is denoted by $\|w\|_s$. For $w' \in W'$, the dual of $W_0^{s,2}(\Omega)$, we have
$$\|w'\|_{-s} := \sup\{w'\tilde w : \tilde w \in W,\ \|\tilde w\|_s \le 1\}.$$
$W'$ satisfies the following properties:
1. From the Riesz representation theorem (see e.g. [6, Theorem 3.4]) it follows that for every $w' \in W'$ there exists $w \in W$ such that $w'\tilde w = \langle w, \tilde w\rangle_s$ for all $\tilde w \in W$. We define the Riesz mapping
$$I w' = w, \qquad (6)$$
and note that $I$ is an isometric isomorphism between $W'$ and $W$, i.e., $\|Iw'\|_s = \|w'\|_{-s}$. In particular, $(w'_k)_k \to w'$ with respect to the topology $\tau_{W'}$ if and only if $(w_k)_k = (Iw'_k)_k \to Iw' = w$ with respect to the topology $\tau_W$.
2. The inner product on the dual space $W'$ can be defined by $\langle w'_1, w'_2\rangle_{-s} = \langle w_1, w_2\rangle_s$, where $w'_1, w_1$ and $w'_2, w_2$ are related by the Riesz representation theorem, respectively.
Now we state a lemma which is central for our further considerations:
Lemma 2. Let $2s > n$, where $s$ is the order of differentiation in the definition of $W$ and $n$ is the dimension of Ω. Then
1. $\|\cdot\|_{RM}$ is convex and lower semi-continuous on $W'$.
2. $\mathcal{M}$ is closed in $W'$.
3. There exists a constant $C$ such that $\|w'\|_{-s} \le C\|w'\|_{RM}$ for all $w' \in \mathcal{M}$.
Proof. We make some general statements first. Since by assumption $2s > n$, the Sobolev embedding theorem (see [1, Thm. 5.4]) guarantees that the embedding from $W$ into $C_0(\Omega)$ is bounded, i.e., there exists a constant $C$ such that
$$\|u\|_{L^\infty} \le C\|u\|_s \quad\text{for all } u \in W. \qquad (7)$$
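Since $C_0^\infty(\Omega)$ is dense in $W$ and in $C_0(\Omega)$ (with respect to the topologies of $W$ and $C_0(\Omega)$, respectively), we have
$$\begin{aligned}
\|u'\|_{RM} &= \sup\{u'v : v \in C_0(\Omega),\ \|v\|_{L^\infty} \le 1\} = \sup\{u'v : v \in C_0^\infty(\Omega),\ \|v\|_{L^\infty} \le 1\}\\
&= \tfrac{1}{C}\sup\{u'v : v \in C_0^\infty(\Omega),\ \|v\|_{L^\infty} \le C\} \ge \tfrac{1}{C}\sup\{u'v : v \in C_0^\infty(\Omega),\ \|v\|_s \le 1\}\\
&= \tfrac{1}{C}\sup\{u'v : v \in W_0^{s,2}(\Omega),\ \|v\|_s \le 1\} = \tfrac{1}{C}\|u'\|_{-s}.
\end{aligned}$$
The inequality holds because, by (7), $\|v\|_s \le 1$ implies $\|v\|_{L^\infty} \le C$. Thus $\mathcal{M} \subseteq W'$.
1. Let $(u'_k)_k$ be a sequence of Radon measures that converges to $u'$ in $W'$ (i.e., with respect to $\tau_{W'}$). It remains to prove that $u'$ is a Radon measure. Since $(u'_k)_k$ is bounded in $W'$, it is also weakly* convergent in $W'$, meaning that $u'_k v \to u'v$ for all $v \in W$. In particular, $u'_k v \to u'v$ for all $v \in C_0^\infty(\Omega)$. Now let $v \in C_0^\infty(\Omega)$ satisfy $\|v\|_{L^\infty} \le 1$. Then
$$u'v = \lim_{k\to\infty} u'_k v \le \liminf_{k\to\infty} \sup\{u'_k\tilde v : \tilde v \in C_0(\Omega),\ \|\tilde v\|_{L^\infty} \le 1\} = \liminf_{k\to\infty}\|u'_k\|_{RM}. \qquad (8)$$
Since $C_0^\infty(\Omega)$ is dense in $C_0(\Omega)$, the last inequality shows that $\|u'\|_{RM} \le \liminf_{k\to\infty}\|u'_k\|_{RM}$, and thus $u'$ is a Radon measure.
2. From (8) it also follows that $\|\cdot\|_{RM}$ is lower semi-continuous on $W'$. The convexity is trivial.
3. Using (7) it follows that
$$\|w'\|_{-s} = \sup\{w'\tilde w : \tilde w \in W,\ \|\tilde w\|_s \le 1\} \le \sup\{w'\tilde w : \tilde w \in C_0(\Omega),\ \|\tilde w\|_{L^\infty} \le C\} = C\|w'\|_{RM}.$$
This gives the third assertion.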
4 Application to Variational Regularization on Radon Measures
We consider minimization of $\hat T_{\alpha,v^\delta}$ on $W'$, the dual of the Sobolev space $W_0^{s,2}(\Omega)$. Here $\mathrm{dom}\,F := \mathcal{M}$ is the space of Radon measures, and $L : W' \to L^2(\Omega')$ is bounded as in Assumption 1. The spaces $W'$ and $L^2(\Omega')$ play the role of $U$ and $V$ in Assumption 1, i.e., we consider the weak topologies on $W'$ and on $L^2(\Omega')$. Since $W'$ is a Hilbert space, weak and weak* convergence on $W'$ can be identified. In the notation of Assumption 1 we use $F := L|_{\mathrm{dom}\,F}$. In order to apply the general results stated in Section 2, we have to verify Assumption 2. The requirement in Assumption 1 that $\mathrm{dom}\,F = \mathcal{M}$ is closed in $W'$ has already been shown in Lemma 2.
1. Every norm on a Hilbert space is convex and continuous, and hence sequentially weakly lower semi-continuous. Therefore, the norm of $L^2(\Omega')$ is sequentially lower semi-continuous with respect to $\tau_V$.
2. The functional $R(\cdot) := \|\cdot\|_{RM}$ is convex and lower semi-continuous, as has already been shown in Lemma 2.
3. The set of Radon measures, which equals the domain $D$, is not empty.
4. Let $\alpha > 0$, $M > 0$, and let $(u'_k)_k$ be a sequence in $M_\alpha(M)$. We show that $(u'_k)_k$ has a convergent subsequence with respect to $\tau_{W'}$. From the definition of $\hat T_{\alpha,v^\delta}$ it follows that $(\|u'_k\|_{RM})_k$ is bounded. Therefore, by Lemma 2, $(u'_k)_k$ is bounded with respect to $\|\cdot\|_{-s}$. Thus $(u'_k)_k$ has a subsequence that converges weakly in $W'$. This shows that $M_\alpha(M)$ is sequentially precompact with respect to $\tau_{W'}$.
5. We follow up on the proof of the previous item.
– Denote the weak limit of $(u'_k)_k$ in $W'$ by $u'$. We show that $u' \in M_\alpha(M)$. We use that $\|\cdot\|_{RM}$ is lower semi-continuous on $W'$. Moreover, since $L : W' \to L^2(\Omega')$ is bounded, the functional $w' \mapsto \|Lw' - v^\delta\|^2$ is weakly lower semi-continuous on $W'$. Thus the sum of both terms is lower semi-continuous, so $u' \in M_\alpha(M)$ and $M_\alpha(M)$ is sequentially closed.
– The operator $L|_{\mathrm{dom}\,F}$ is weakly continuous and $\mathrm{dom}\,F$ is weakly sequentially closed. This follows from Lemma 2, which shows that $\mathrm{dom}\,F = \mathcal{M}$ is closed (it is also convex, being a linear space), and from the boundedness of $L$ on $W'$.
Therefore, Assumption 2 is satisfied and the assertions of Theorems 3 to 5 follow. Theorem 5 requires the existence of a solution of (3) in $D$. Thus, applying this result requires the existence of a solution with finite Radon norm.
5 Methodological Comparison with Finite Total Variation Regularization
The method we propose is methodologically related to total variation minimization. Total variation minimization can be viewed as the relaxation of $W^{1,1}$-regularization, which in turn consists in minimization of the functional
$$u \mapsto \int_{\Omega'} (Fu - v^\delta)^2 + \alpha\int_\Omega |\nabla u|.$$
Total variation minimization consists in minimization of $u \mapsto \int_{\Omega'} (Fu - v^\delta)^2 + \alpha\int_\Omega |Du|$, where $\int_\Omega |Du|$ is the total variation of $u$, i.e., the norm of the finite, vector-valued Radon measure $Du$. In our context the regularization is with respect to Radon measures, which is a relaxation of $L^1$-regularization. Thus, total variation regularization can be considered as a regularization method on Radon measures for the first derivatives of the function. According to our theory, in contrast, $L^1$-regularization is for distributions in $W^{-2,2}(\Omega)$. The derived analogy is not completely satisfactory and certainly subject to further research. The analogy to total variation minimization suggests that the smallest Sobolev space which is a Hilbert space and contains the Radon measures is $W^{-1,2}(\Omega)$. However, based on our analysis so far, this space is slightly too small to perform analytical studies. Our analysis relies on the standard Sobolev embedding theorem. As a consequence, slightly stronger regularity properties must be imposed on the linear operator $F$ than the comparison with the total variation analysis would suggest.
6 Application in Nuclear Medicine
Apart from its theoretical interest, the concept of sparse data is also relevant to a variety of real-world applications. From the imaging point of view, we consider nuclear medicine a major area of interest. In principle, however, any type of peaky (clustered) data on an otherwise relatively homogeneous background appears suitable for sparsity reconstruction. In the following we give a short description of this research topic as an introduction to the practical part of sparsity regularization. The two most popular techniques in nuclear medicine, PET (Positron Emission Tomography) and SPECT (Single Photon Emission Computed Tomography), both rely on nuclear decay. Here, a tomographic scanner measures the decay of a radioactive tracer substance that has previously been injected into the patient's body. Such a procedure is often used, e.g., in cancer diagnosis. From the imaging point of view, we consider the related isotopes our sparse data. The measurements yield a so-called sinogram, which plots the number of radioactive disintegrations against the different scanner angles. The actual image is then reconstructed from the given sinogram. In the medical imaging context, sparse variational reconstructions have already been used for MRI RF excitation pulse design in [15].

6.1 Algorithm Characteristics
The current section focuses on the most important implementation characteristics of the main reconstruction algorithms involved in sparsity reconstruction.
First, we apply the following Daubechies–Defrise–De Mol [7] (DDD)-type iteration to our sample data (see Paragraph 6.2):
$$u^{k+1} := u^k - \lambda F^*(F u^k - v^\delta) - \alpha\,\mathrm{sgn}(u^{k+1}), \qquad (9)$$
where the last term is the sign operator (sgn) applied to the next-step reconstruction; it may also be expressed as $u^{k+1}/|u^{k+1}|$. We thus obtain the alternative formulation
$$S^{-1}(u^{k+1}) := \Big(1 + \frac{\alpha}{|u^{k+1}|}\Big)u^{k+1} = u^k - \lambda F^*(F u^k - v^\delta). \qquad (10)$$
As indicated by the notation, the set-valued operator $S^{-1}$ has a single-valued inverse. Applying this inverse yields the implementable scheme
$$u^{k+1} = S\big(u^k - \lambda F^*(F u^k - v^\delta)\big), \qquad (11)$$
where
$$S(t) := \begin{cases} t + \alpha & \text{if } t \le -\alpha,\\ t - \alpha & \text{if } t \ge +\alpha,\\ 0 & \text{else.}\end{cases} \qquad (12)$$
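The iteration (11)–(12) can be sketched in a few lines of NumPy. The sketch is written for a generic real matrix F, so that F* = Fᵀ, rather than for the Radon transform used in the experiments. The function names, the zero initial guess and the toy operator in the test are assumptions. With the shrinkage threshold α as in (12), a fixed point of the iteration minimizes ½‖Fu − v^δ‖² + (α/λ)‖u‖₁. For convergence, the step size should satisfy 0 < λ < 2/‖F‖²; this is the standard condition for such forward-backward schemes and is not stated in the text.

```python
import numpy as np

def soft_threshold(t, alpha):
    """Operator S of (12): t + alpha if t <= -alpha, t - alpha if t >= alpha, else 0."""
    return np.sign(t) * np.maximum(np.abs(t) - alpha, 0.0)

def ddd_iteration(F, v_delta, lam, alpha, n_iter, u0=None):
    """Run n_iter steps of u <- S(u - lam * F^T (F u - v_delta)), i.e. (11)."""
    F = np.asarray(F, dtype=float)
    v_delta = np.asarray(v_delta, dtype=float)
    u = np.zeros(F.shape[1]) if u0 is None else np.asarray(u0, dtype=float).copy()
    for _ in range(n_iter):
        # gradient step on the data term, followed by the shrinkage S
        u = soft_threshold(u - lam * F.T @ (F @ u - v_delta), alpha)
    return u
```

For example, with F = 2I, λ = 0.1 and α = 0.1, each component solves a scalar problem. The iteration then converges to u = v/2 − sgn(u)/4 wherever |v| > 1/2 and to 0 elsewhere, so small entries are removed exactly. As in the experiments below, the number of iterations `n_iter` is an explicit input.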
We call this implementation DDD-type because it is formulated for functions (actually measures), whereas the original sparsity approach is devoted to basis coefficients. Apart from this difference, it is the algorithm suggested in [7]. The numerical implementation uses piecewise constant functions to approximate Radon measures. The situation is analogous to total variation regularization with finite elements, where derivatives (which are Radon measures) are approximated by derivatives of finite elements.

6.2 Experimental Results
To test the practical relevance of the above method, we created test data consisting of (clusters of) peaks on a constant background, which we consider the most realistic scenario. Practical acquisition devices, however, rarely yield noise-free data, so we added different types of noise to our sample data $v$. To achieve a realistic scenario, we therefore restrict the input of our reconstruction algorithms (see Paragraph 6.1) to noisy sinograms $v^\delta$ only. Since the tested algorithms are mainly intended for medical use, we adapted the sample framework to the nature of nuclear medical data acquisition. Most underlying processes in this field exhibit a clear Poisson nature, which motivated us to overlay the clean sinogram data with typical Poisson noise. From a programming point of view, we allow the specification of four different parameters, each of which may influence the outcome of the reconstruction process. The weighting parameters λ and α from Equations (9) to (12) are an obvious choice in this case. Furthermore, we added one algorithm-independent input parameter, namely the number of iteration cycles. With these implementation details specified, we applied the DDD-type algorithm from Paragraph 6.1 to different test cases.

Number of Iterations: As Equations (9) to (12) show, the final reconstruction is created by iteratively updating the current reconstruction image. In most cases the starting image will be random. The number of iterations may thus have a certain impact on the outcome of the reconstruction process. For our algorithm we ran test cycles in the range [25, 1600], with the remaining parameters fixed. We found 50 cycles to be the minimum for obtaining a relatively reliable result. Note, however, that at this point object boundaries still appear blurred on an otherwise constant background. With an increasing number of iterations the different objects become sharper, but the background becomes ever more inhomogeneous.

Weighting Parameters: As described in Paragraph 6.1, the implementation includes two closely related weighting parameters λ and α. Since we consider the role of the first one to be more important, we chose a ratio-based test environment. With λ in the range [0.016, 0.16], we evaluated the quality of the reconstructions for $\alpha = \lambda/10^n$, where $1 \le n \le 4$. This test framework also helped to keep the computational effort within reasonable limits. Interestingly, our experiments have shown that the ratio between λ and α is less important, provided the first parameter is selected 'correctly'. There were no obvious differences between images with $\alpha = \lambda/10^2$ and $\alpha = \lambda/10^3$. On the other hand, we noticed that lower values of λ produce a more homogeneous background, while higher ones result in sharper object boundaries. In this respect the effects appear similar to those described for varying numbers of iterations. Finally, we conclude that there is a certain relation between the number of iterations and the choice of λ. The higher we set the weighting parameter, the sooner we have to stop the iterative cycle in order to limit the background inhomogeneities.

Fig. 1. Left: l¹ regularization (reconstruction). Right: l¹ residuals (×10⁴) versus number of iterations. The figure illustrates the convergence behavior of our proposed regularization scheme from a practical point of view. The right-hand plot shows the residuals declining during the computation that yields the reconstruction on the left.
Fig. 2. Reconstructions for α = 0.0036, α = 0.00036, α = 0.000036 and α = 0.0000036. Decreasing values of α tend to sharpen even smaller object boundaries but at the same time produce more background noise. Increasing the parameter, however, results in a quite homogeneous background while blurring and sometimes even removing smaller objects.
Fig. 3. Panels: l¹ regularization, L² regularization, W^{1,2} regularization. The figures compare our benchmark results with those of other popular methods, e.g., L² and W^{1,2} regularization. Here, l¹ regularization, depicted on the left, tends to yield the clearest approximations of the original objects. We have, however, noticed that in some cases small peaks may not be preserved during the regularization process. L², on the other hand, appears slightly more blurred. It also fails to remove the circle artifact caused by the Radon transform, which we consider a major drawback. Finally, W^{1,2} regularization tends to produce strong object blur. This may be a problem not only for small and dense peaks but also deteriorates the overall reconstruction quality.
Acknowledgement This work has been supported by the Austrian Science Fund (FWF) within the national research networks Industrial Geometry, project 9203-N12, and Photoacoustic Imaging in Biology and Medicine, project S10505-N20.
Sparsity Regularization for Radon Measures
References
1. Adams, R.A.: Sobolev Spaces. Academic Press, New York (1975)
2. Bredies, K., Lorenz, D.: Iterated hard shrinkage for minimization problems with sparsity constraints. SIAM J. Sci. Comput. 30(2), 657–683 (2008)
3. Candès, E.J., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006)
4. Combettes, P.L., Pesquet, J.-C.: Proximal thresholding algorithm for minimization over orthonormal bases. SIAM J. Optim. 18(4), 1351–1376 (2007)
5. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul. 4(4), 1168–1200 (electronic) (2005)
6. Conway, J.B.: A Course in Functional Analysis, 2nd edn. Graduate Texts in Mathematics, vol. 96. Springer, Heidelberg (1990)
7. Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm. Pure Appl. Math. 57(11), 1413–1457 (2004)
8. Daubechies, I., Fornasier, M., Loris, I.: Accelerated projected gradient methods for linear inverse problems with sparsity constraints. J. Fourier Anal. Appl. (to appear) (2008)
9. Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006)
10. Figueiredo, M., Nowak, R., Wright, S.: Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems. IEEE J. Sel. Topics Signal Process. 1(4), 586–598 (2007)
11. Griesse, R., Lorenz, D.: A semismooth Newton method for Tikhonov functionals with sparsity constraints. Inverse Probl. 24(3), 035007, 19 (2008)
12. Ramlau, R., Teschke, G.: A Tikhonov-based projection iteration for nonlinear ill-posed problems with sparsity constraints. Numer. Math. 104(2), 177–203 (2006)
13. Scherzer, O., Grasmair, M., Grossauer, H., Haltmeier, M., Lenzen, F.: Variational Methods in Imaging. Applied Mathematical Sciences, vol. 167. Springer, New York (2008)
14. Tropp, J.A.: Just relax: convex programming methods for identifying sparse signals in noise. IEEE Trans. Inf. Theory 52(3), 1030–1051 (2006)
15. Zelinski, A.C., Wald, L.L., Setsompop, K., Goyal, V.K., Adalsteinsson, E.: Sparsity-enforced slice-selective MRI RF excitation pulse design. IEEE Trans. Med. Imag. 27, 1213–1229 (2008)
Split Bregman Algorithm, Douglas-Rachford Splitting and Frame Shrinkage
Simon Setzer
University of Mannheim, A5, 68131 Mannheim, Germany
http://kiwi.math.uni-mannheim.de
Abstract. We examine relations between popular variational methods in image processing and classical operator splitting methods in convex analysis. We focus on a gradient descent reprojection algorithm for image denoising and the recently proposed Split Bregman and alternating Split Bregman methods. By identifying the latter with the so-called Douglas-Rachford splitting algorithm, we can guarantee its convergence. We show that for a special setting based on Parseval frames the gradient descent reprojection and the alternating Split Bregman algorithm are equivalent and turn out to be a frame shrinkage method.
1 Introduction
In recent years, variational models have been successfully applied in image restoration. These methods came along with various computational algorithms. Interestingly, the roots of many restoration algorithms can be found in classical algorithms from convex analysis dating back more than 40 years. Discovering these relations is useful from different points of view: classical convergence results carry over to the restoration algorithms at hand and ensure their convergence; on the other hand, earlier mathematical results find new applications and should be acknowledged. The present paper fits into this context. Our aim is twofold. First, we show that the Alternating Split Bregman Algorithm proposed by Goldstein and Osher for image restoration and compressed sensing can be interpreted as a Douglas-Rachford Splitting Algorithm. In particular, this clarifies the convergence of the algorithm. Second, we consider the following denoising problem, which uses an $L^2$ data-fitting term and a Besov-norm regularization term [1]:
$$\operatorname*{argmin}_{u \in B^1_{1,1}(\Omega)} \Big\{ \frac{1}{2}\|u - f\|^2_{L^2(\Omega)} + \lambda \|u\|_{B^1_{1,1}(\Omega)} \Big\}. \qquad (1)$$
We show that for discrete versions of this problem involving Parseval frames, the corresponding alternating Split Bregman Algorithm can be seen as an application of a Forward-Backward Splitting Algorithm. The latter is also related to the Gradient Descent Reprojection Algorithm, see Chambolle [2]. Since our methods are based on soft (coupled) frame shrinkage, we also establish the relation to the classical wavelet shrinkage scheme. Finally, we consider the Rudin-Osher-Fatemi model [3]
$$\operatorname*{argmin}_{u \in BV(\Omega)} \frac{1}{2}\|u - f\|^2_{L^2(\Omega)} + \lambda \int_\Omega |\nabla u(x)|\,dx, \qquad (2)$$
which is a successful edge-preserving image restoration method. We apply our findings to create an efficient frame-based minimization algorithm for the discrete version of this problem.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 464–476, 2009. © Springer-Verlag Berlin Heidelberg 2009
2 Operator Splitting Methods
Proximation and Soft Shrinkage. We start by considering the proximity operator
$$\operatorname{prox}_{\gamma\Phi}(f) := \operatorname*{argmin}_{u \in H} \Big\{ \frac{1}{2\gamma}\|u - f\|^2 + \Phi(u) \Big\} \qquad (3)$$
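on a Hilbert space $H$. If $\Phi: H \to \mathbb{R} \cup \{+\infty\}$ is proper, convex and lower semi-continuous (lsc), then for any $f \in H$ there exists a unique minimizer $\hat u := \operatorname{prox}_{\gamma\Phi}(f)$ of (3). By Fermat's rule, this minimizer is determined by the inclusion
$$0 \in \frac{1}{\gamma}(\hat u - f) + \partial\Phi(\hat u) \;\Leftrightarrow\; f \in \hat u + \gamma\,\partial\Phi(\hat u) \;\Leftrightarrow\; \hat u = (I + \gamma\,\partial\Phi)^{-1} f,$$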
where the set-valued function $\partial\Phi: H \to 2^H$ is the subdifferential of $\Phi$. If $\Phi$ is proper, convex and lsc, then $\partial\Phi$ is a maximal monotone operator. For a set-valued function $F: H \to 2^H$, the operator $J_F := (I + F)^{-1}$ is called the resolvent of $F$. If $F$ is maximal monotone, then $J_F$ is single-valued and firmly nonexpansive. In this paper, we are mainly interested in the following two functions $\Phi_i$, $i = 1, 2$, on $H := \mathbb{R}^M$:
i) $\Phi_1(u) := \|\Lambda u\|_1$ with $\Lambda := \operatorname{diag}(\lambda_j)_{j=1}^M$, $\lambda_j \ge 0$,
ii) $\Phi_2(u) := \|\tilde\Lambda\,|u|\,\|_1$ with $\tilde\Lambda := \operatorname{diag}(\tilde\lambda_j)_{j=1}^N$, $\tilde\lambda_j \ge 0$, and $|u| := (\|u_j\|_2)_{j=1}^N$ for $u_j := (u_{j+kN})_{k=0}^{p-1}$ and $M = pN$.
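The corresponding Fenchel conjugate functions are given by
i) $\Phi_1^*(u) := \iota_C(u)$ with $C := \{u \in \mathbb{R}^M : |u_j| \le \lambda_j,\ j = 1, \dots, M\}$,
ii) $\Phi_2^*(u) := \iota_{\tilde C}(u)$ with $\tilde C := \{u \in \mathbb{R}^M : \|u_j\|_2 \le \tilde\lambda_j,\ j = 1, \dots, N\}$,
where $\iota_C$ is the indicator function of the set $C$ (or $\tilde C$), i.e., $\iota_C(u) := 0$ for $u \in C$ and $\iota_C(u) := +\infty$ otherwise. A short calculation shows that for any $f \in \mathbb{R}^M$ we have
$$\operatorname{prox}_{\Phi_1}(f) = T_\Lambda(f), \qquad \operatorname{prox}_{\Phi_2}(f) = \tilde T_{\tilde\Lambda}(f),$$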
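where $T_\Lambda$ denotes the soft shrinkage function given componentwise by
$$T_{\lambda_j}(f_j) := \begin{cases} 0 & \text{if } |f_j| \le \lambda_j, \\ f_j - \lambda_j \operatorname{sgn}(f_j) & \text{if } |f_j| > \lambda_j, \end{cases} \qquad (4)$$
and $\tilde T_{\tilde\Lambda}$ denotes the coupled shrinkage function, compare [2, 4, 5],
$$\tilde T_{\tilde\lambda_j}(f_j) := \begin{cases} 0 & \text{if } \|f_j\|_2 \le \tilde\lambda_j, \\ f_j - \tilde\lambda_j f_j / \|f_j\|_2 & \text{if } \|f_j\|_2 > \tilde\lambda_j. \end{cases}$$
Similarly, we obtain
$$\operatorname{prox}_{\Phi_1^*}(f) = f - T_\Lambda(f), \qquad \operatorname{prox}_{\Phi_2^*}(f) = f - \tilde T_{\tilde\Lambda}(f). \qquad (5)$$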
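The shrinkage operators in (4) and the relation (5) are easy to check numerically. The following Python sketch is our own illustration, not code from the paper: vectors are plain lists, the function names are hypothetical, and for the coupled case the entries are grouped as $u_j = (u_{j+kN})_{k=0}^{p-1}$, as in the definition of $\Phi_2$. The helper `project_box` is the projection onto $C$, i.e., the proximity operator of $\Phi_1^* = \iota_C$, so by (5) it should agree with $f - T_\Lambda(f)$.

```python
import math


def soft_shrink(f, lam):
    """Soft shrinkage T_Lambda from (4), applied componentwise (lam[j] >= 0)."""
    return [0.0 if abs(fj) <= lj else fj - lj * math.copysign(1.0, fj)
            for fj, lj in zip(f, lam)]


def coupled_shrink(f, lam_tilde, p):
    """Coupled shrinkage: f has length M = p*N and group j is (f[j + k*N])_{k<p}."""
    n = len(f) // p
    out = [0.0] * len(f)
    for j in range(n):
        idx = [j + k * n for k in range(p)]
        norm = math.sqrt(sum(f[i] ** 2 for i in idx))
        if norm > lam_tilde[j]:
            scale = 1.0 - lam_tilde[j] / norm  # reduce the group's length by lambda~_j
            for i in idx:
                out[i] = scale * f[i]
    return out


def project_box(f, lam):
    """Projection onto C = {|u_j| <= lambda_j}, i.e. the prox of Phi_1^* = iota_C."""
    return [max(-lj, min(lj, fj)) for fj, lj in zip(f, lam)]


f = [3.0, -0.5, -2.0, 1.0]
lam = [1.0, 1.0, 1.0, 1.0]
print(soft_shrink(f, lam))                              # [2.0, 0.0, -1.0, 0.0]
print([a - b for a, b in zip(f, soft_shrink(f, lam))])  # [1.0, -0.5, -1.0, 1.0]
print(project_box(f, lam))                              # [1.0, -0.5, -1.0, 1.0]
print(coupled_shrink([3.0, 0.0, 4.0, 0.0], [1.0, 1.0], p=2))  # group 0 (norm 5) scaled by 0.8
```

The equality of the second and third lines is exactly the first identity in (5) for this example.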
Operator Splittings. Now we consider more general minimization problems of the form
$$(P) \qquad \min_{u \in H_1} \big\{ \underbrace{g(u) + \Phi(Du)}_{=:F_P(u)} \big\},$$
where $D: H_1 \to H_2$ is a bounded linear operator and both functions $g: H_1 \to \mathbb{R} \cup \{+\infty\}$ and $\Phi: H_2 \to \mathbb{R} \cup \{+\infty\}$ are proper, convex and lsc. Furthermore, we assume that $0 \in \operatorname{int}(D \operatorname{dom}(g) - \operatorname{dom}(\Phi))$. For $g(u) := \frac{1}{2\gamma}\|u - f\|^2$ and $D = I$ this is again our proximation problem. The corresponding dual problem has the form
$$(D) \qquad -\min_{b \in H_2} \big\{ \underbrace{g^*(-D^*b) + \Phi^*(b)}_{=:F_D(b)} \big\}.$$
We assume that solutions $\hat u$ and $\hat b$ of the primal and dual problems, respectively, exist and that the duality gap is zero. In other words, we suppose that there is a pair $(\hat u, \hat b)$ which satisfies the Karush-Kuhn-Tucker conditions
$$0 \in \partial g(\hat u) + D^*\hat b, \qquad 0 \in -D\hat u + \partial\Phi^*(\hat b).$$
Then $\hat u$ is a solution of (P) if and only if
$$0 \in \partial F_P(\hat u) = \partial g(\hat u) + \partial(\Phi \circ D)(\hat u).$$
Similarly, a solution $\hat b$ of the dual problem is characterized by
$$0 \in \partial F_D(\hat b) = \partial(g^* \circ (-D^*))(\hat b) + \partial\Phi^*(\hat b).$$
In both the primal and the dual problem, one finally has to solve an inclusion of the form
$$0 \in A(\hat p) + B(\hat p). \qquad (6)$$
Various splitting techniques make use of this additive structure. In this paper, we restrict our attention to the forward-backward splitting (FBS) and the Douglas-Rachford splitting (DRS). The inclusion (6) can be rewritten as the fixed point equation
$$\hat p - \eta B(\hat p) \in \hat p + \eta A(\hat p) \;\Leftrightarrow\; \hat p \in J_{\eta A}(I - \eta B)\hat p, \qquad \eta > 0, \qquad (7)$$
and the FBS algorithm is just the corresponding iteration. For the following convergence result and generalizations of the algorithm we refer to [6, 7, 8, 9].
Theorem 1 (FBS). Let $A: H \to 2^H$ be maximal monotone and let $\beta B: H \to H$ be firmly nonexpansive for some $\beta > 0$. Furthermore, assume that a solution of (6) exists. Then, for any $p^{(0)}$ and any $\eta \in (0, 2\beta)$, the following FBS algorithm converges weakly to such a solution of (6):
$$p^{(k+1)} = J_{\eta A}(I - \eta B)\,p^{(k)}. \qquad (8)$$
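To illustrate Theorem 1, the following sketch (our own example, not taken from the paper) applies the FBS iteration (8) to $\min_u \lambda\|u\|_1 + \frac{1}{2}\|Ku - f\|_2^2$ with $A = \partial(\lambda\|\cdot\|_1)$ and $B(u) = K^T(Ku - f)$. Since $B$ is the gradient of a convex function with Lipschitz constant $L = \|K^TK\|_2$, $\beta B$ is firmly nonexpansive for $\beta = 1/L$ (Baillon-Haddad theorem), so the theorem allows $\eta \in (0, 2/L)$; the resolvent $J_{\eta A}$ is soft shrinkage with threshold $\eta\lambda$. The matrix $K$, the data and all helper names are hypothetical.

```python
def matvec(K, x):
    """K x for a matrix given as a list of rows."""
    return [sum(k * xj for k, xj in zip(row, x)) for row in K]


def matTvec(K, y):
    """K^T y."""
    return [sum(K[i][j] * y[i] for i in range(len(K))) for j in range(len(K[0]))]


def soft(x, t):
    """Soft shrinkage with common threshold t = resolvent of t * d||.||_1."""
    return [0.0 if abs(v) <= t else v - t if v > 0 else v + t for v in x]


def fbs_l1(K, f, lam, eta, iters=500):
    """FBS (8) with A = d(lam*||.||_1) and B(p) = K^T (K p - f)."""
    p = [0.0] * len(K[0])
    for _ in range(iters):
        grad = matTvec(K, [a - b for a, b in zip(matvec(K, p), f)])       # forward step with B
        p = soft([pi - eta * gi for pi, gi in zip(p, grad)], eta * lam)  # backward step J_{eta A}
    return p


K = [[2.0, 0.0], [1.0, 1.0]]  # ||K^T K||_2 = 3 + sqrt(5) ~ 5.24, so eta must lie in (0, 0.38)
f = [1.0, 2.0]
lam, eta = 0.5, 0.3
p = fbs_l1(K, f, lam, eta)
# Fixed-point residual of (7): p - J_{eta A}(p - eta B p) vanishes at a solution of (6).
grad = matTvec(K, [a - b for a, b in zip(matvec(K, p), f)])
step = soft([pi - eta * gi for pi, gi in zip(p, grad)], eta * lam)
residual = max(abs(a - b) for a, b in zip(p, step))
print(p, residual)
```

Measuring convergence through the fixed-point form (7), rather than through the objective value, keeps the check independent of the particular problem.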
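To introduce the DRS, we rewrite the right-hand side of (7) as
$$\hat p + \eta B\hat p \in J_{\eta A}(I - \eta B)\hat p + \eta B\hat p \;\Leftrightarrow\; \hat p \in J_{\eta B}\big(\underbrace{J_{\eta A}(I - \eta B)\hat p + \eta B\hat p}_{=:\hat t}\big).$$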
The DRS algorithm [10] is the corresponding iteration, where we use $t^{(k)} := p^{(k)} + \eta B p^{(k)}$. For the following convergence result, which, in contrast to the FBS algorithm, also holds for set-valued operators $B$, see [6, 8].

Theorem 2 (DRS). Let $A, B: H \to 2^H$ be maximal monotone operators and assume that a solution of (6) exists. Then, for any initial elements $t^{(0)}$ and $p^{(0)}$ and any $\eta > 0$, the following DRS algorithm converges weakly to an element $\hat t$:
$$t^{(k+1)} = J_{\eta A}(2p^{(k)} - t^{(k)}) + t^{(k)} - p^{(k)}, \qquad p^{(k+1)} = J_{\eta B}(t^{(k+1)}).$$
Furthermore, $\hat p := J_{\eta B}(\hat t)$ satisfies $0 \in A(\hat p) + B(\hat p)$. If $H$ is finite-dimensional, then the sequence $(p^{(k)})_{k\in\mathbb{N}}$ converges to $\hat p$.
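As a small sanity check of Theorem 2 (again our own example), take $A = \partial(\lambda\|\cdot\|_1)$ and $B(p) = p - f$, so that the unique solution of (6) is $\hat p = T_\lambda(f)$. Then $J_{\eta A}$ is soft shrinkage with threshold $\eta\lambda$ and $J_{\eta B}(t) = (t + \eta f)/(1 + \eta)$. The sketch below runs the DRS iteration for several values of $\eta > 0$; the names and data are hypothetical.

```python
def soft(x, t):
    """Soft shrinkage with threshold t (resolvent of t * d||.||_1)."""
    return [0.0 if abs(v) <= t else v - t if v > 0 else v + t for v in x]


def drs(f, lam, eta, iters=2000):
    """DRS for 0 in A(p) + B(p) with A = d(lam*||.||_1) and B(p) = p - f."""
    t = [0.0] * len(f)
    p = [0.0] * len(f)
    for _ in range(iters):
        ja = soft([2.0 * pk - tk for pk, tk in zip(p, t)], eta * lam)  # J_{eta A}(2p - t)
        t = [a + tk - pk for a, tk, pk in zip(ja, t, p)]                # t^{(k+1)}
        p = [(tk + eta * fk) / (1.0 + eta) for tk, fk in zip(t, f)]      # J_{eta B}(t^{(k+1)})
    return p


f, lam = [3.0, -0.5, -2.0], 1.0
for eta in (0.1, 1.0, 10.0):
    print(eta, drs(f, lam, eta))  # each result is close to T_lam(f) = [2.0, 0.0, -1.0]
```

In line with the theorem, no restriction on $\eta$ is needed here, whereas the FBS step size in (8) is bounded by $2\beta$.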
3 Bregman Methods
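For a function $\varphi: H \to \mathbb{R} \cup \{+\infty\}$, the Bregman distance $D_\varphi^{(p)}$ is defined as
$$D_\varphi^{(p)}(u, v) = \varphi(u) - \varphi(v) - \langle p, u - v\rangle,$$
with $p \in \partial\varphi(v)$, cp. [11]. Given an arbitrary initial value $u^{(0)}$ and a parameter $\gamma > 0$, the Bregman proximal point algorithm (BPP) applied to (P) has the form [12, 13, 14]
$$u^{(k+1)} = \operatorname*{argmin}_{u \in H_1} \Big\{ \frac{1}{\gamma} D_\varphi^{(p^{(k)})}(u, u^{(k)}) + F_P(u) \Big\}, \qquad p^{(k+1)} \in \partial\varphi(u^{(k+1)}). \qquad (9)$$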
For conditions on $\varphi$ such that $(u^{(k)})_{k\in\mathbb{N}}$ converges to a minimizer of (P), see [13] and the references therein. For $\varphi := \frac{1}{2}\|\cdot\|_2^2$, we recover the classical proximal point algorithm (PP) for (P), which can be written as follows, compare [15]:
$$u^{(k+1)} = \operatorname{prox}_{\gamma F_P}(u^{(k)}) = \operatorname*{argmin}_{u \in H_1} \Big\{ \frac{1}{2\gamma}\|u - u^{(k)}\|_2^2 + F_P(u) \Big\} = J_{\gamma\partial F_P}(u^{(k)}).$$
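For $\varphi = \frac{1}{2}\|\cdot\|_2^2$ we have $p = v$ and $D_\varphi^{(p)}(u, v) = \frac{1}{2}\|u - v\|_2^2$, which is why (9) reduces to the PP algorithm. The Python sketch below (our own illustration with hypothetical helper names) evaluates the Bregman distance for this $\varphi$ and runs the PP algorithm for the special case $g = 0$, $D = I$, $\Phi = \lambda\|\cdot\|_1$, which we assume only for illustration. In that case each step $\operatorname{prox}_{\gamma F_P}$ is soft shrinkage with threshold $\gamma\lambda$.

```python
def bregman_distance(phi, grad_phi, u, v):
    """D_phi^{(p)}(u, v) = phi(u) - phi(v) - <p, u - v> with p = grad_phi(v) in d phi(v)."""
    p = grad_phi(v)
    return phi(u) - phi(v) - sum(pi * (ui - vi) for pi, ui, vi in zip(p, u, v))


def half_sq(x):
    """phi = 0.5 * ||x||_2^2; its gradient is the identity."""
    return 0.5 * sum(xi * xi for xi in x)


def proximal_point_l1(u0, lam, gamma, iters):
    """PP iterates u^{(k+1)} = prox_{gamma F}(u^{(k)}) for F = lam*||.||_1 (soft shrinkage)."""
    t = gamma * lam
    history = [list(u0)]
    for _ in range(iters):
        u = history[-1]
        history.append([0.0 if abs(x) <= t else x - t if x > 0 else x + t for x in u])
    return history


u, v = [1.0, 2.0], [0.5, -1.0]
print(bregman_distance(half_sq, list, u, v))  # 4.625 = 0.5 * ||u - v||^2
print(proximal_point_l1([3.0, -0.5], lam=1.0, gamma=1.0, iters=3))
# [[3.0, -0.5], [2.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
```

The iterates reach the minimizer $0$ of $F_P$ after finitely many steps here, which is a peculiarity of this simple $F_P$ and not a general property of the PP algorithm.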
Under our assumptions on $g$, $\Phi$ and $D$, the weak convergence of the PP algorithm is guaranteed for any initial point $u^{(0)}$, see [16]. In the same way, we can define the PP algorithm for (D),
$$b^{(k+1)} = \operatorname{prox}_{\gamma F_D}(b^{(k)}) = \operatorname*{argmin}_{b \in H_2} \Big\{ \frac{1}{2\gamma}\|b - b^{(k)}\|_2^2 + F_D(b) \Big\} = J_{\gamma\partial F_D}(b^{(k)}),$$
and the same convergence result holds true. It is well known that the PP algorithm applied to (D) is equivalent to the augmented Lagrangian method (AL) for the primal problem, see, e.g., [15, 14]. To define this algorithm, we first transform (P) into the constrained minimization problem
$$\min_{u \in H_1,\, d \in H_2} E(u, d) \quad \text{s.t.} \quad Du = d, \qquad (10)$$
where $E(u, d) := g(u) + \Phi(d)$. This problem was introduced in [29]. The corresponding AL algorithm for (P) is then defined as
$$(u^{(k+1)}, d^{(k+1)}) = \operatorname*{argmin}_{u \in H_1,\, d \in H_2} \Big\{ E(u, d) + \langle b^{(k)}, Du - d\rangle + \frac{1}{2\gamma}\|Du - d\|_2^2 \Big\},$$
$$b^{(k+1)} = b^{(k)} + \frac{1}{\gamma}\big(Du^{(k+1)} - d^{(k+1)}\big). \qquad (11)$$
Indeed, it has been shown that for the same initial value $b^{(0)}$ the sequence $(b^{(k)})_{k\in\mathbb{N}}$ coincides with the one produced by the PP algorithm applied to (D), see [15]. Moreover, if $(b^{(k)})_{k\in\mathbb{N}}$ converges strongly, then every strong cluster point of $(u^{(k)})_{k\in\mathbb{N}}$ is a solution of (P), cf. [17]. To solve the constrained optimization problem (10), Goldstein and Osher [18] proposed to use the Bregman distance
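$$D_E^{(p^{(k)})}(u, d, u^{(k)}, d^{(k)}) = E(u, d) - E(u^{(k)}, d^{(k)}) - \langle p_u^{(k)}, u - u^{(k)}\rangle - \langle p_d^{(k)}, d - d^{(k)}\rangle$$
and the term $\frac{1}{2\gamma}\|Du - d\|_2^2$ instead of $F_P$ in (9). This results in the algorithm
$$(u^{(k+1)}, d^{(k+1)}) = \operatorname*{argmin}_{u \in H_1,\, d \in H_2} \Big\{ D_E^{(p^{(k)})}(u, d, u^{(k)}, d^{(k)}) + \frac{1}{2\gamma}\|Du - d\|_2^2 \Big\}, \qquad (12)$$
$$p_u^{(k+1)} = p_u^{(k)} - \frac{1}{\gamma} D^*\big(Du^{(k+1)} - d^{(k+1)}\big), \qquad p_d^{(k+1)} = p_d^{(k)} + \frac{1}{\gamma}\big(Du^{(k+1)} - d^{(k+1)}\big),$$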
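where we have used that (12) implies
$$0 \in \partial E(u^{(k+1)}, d^{(k+1)}) - \big(p_u^{(k)}, p_d^{(k)}\big) + \Big(\frac{1}{\gamma} D^*(Du^{(k+1)} - d^{(k+1)}),\ -\frac{1}{\gamma}(Du^{(k+1)} - d^{(k+1)})\Big) = \partial E(u^{(k+1)}, d^{(k+1)}) - \big(p_u^{(k+1)}, p_d^{(k+1)}\big),$$
so that $(p_u^{(k)}, p_d^{(k)}) \in \partial E(u^{(k)}, d^{(k)})$ for all $k \ge 0$. Setting $p_u^{(k)} = -\frac{1}{\gamma} D^* b^{(k)}$ and $p_d^{(k)} = \frac{1}{\gamma} b^{(k)}$ and noting that, for a bounded linear operator $D$,
$$D_E^{(p^{(k)})}(u, d, u^{(k)}, d^{(k)}) + \frac{1}{2\gamma}\|Du - d\|_2^2 = E(u, d) - E(u^{(k)}, d^{(k)}) + \frac{1}{\gamma}\langle b^{(k)}, Du - Du^{(k)}\rangle - \frac{1}{\gamma}\langle b^{(k)}, d - d^{(k)}\rangle + \frac{1}{2\gamma}\|Du - d\|_2^2,$$
which differs from $E(u, d) + \frac{1}{2\gamma}\|b^{(k)} + Du - d\|_2^2$ only by terms independent of $(u, d)$,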
Goldstein and Osher obtained the Split Bregman method [18]
$$(u^{(k+1)}, d^{(k+1)}) = \operatorname*{argmin}_{u \in H_1,\, d \in H_2} \Big\{ E(u, d) + \frac{1}{2\gamma}\|b^{(k)} + Du - d\|_2^2 \Big\},$$
$$b^{(k+1)} = b^{(k)} + Du^{(k+1)} - d^{(k+1)}. \qquad (13)$$
As already discovered in [19], the Split Bregman algorithm (13) is just the AL algorithm (11), with the only difference that in (13) the iterates $b^{(k)}$ are scaled by $\gamma$. Hence, we can conclude that the sequence $(\frac{1}{\gamma} b^{(k)})_{k\in\mathbb{N}}$ generated by the Split Bregman method (13) converges (weakly) to a solution of the dual problem. The same holds true for the sequence $(p_d^{(k)})_{k\in\mathbb{N}}$ we get from (12). To summarize:
PP for (D) = AL for (P) = Split Bregman Alg.
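Since the minimization problem in (13) is hard to solve, Goldstein and Osher [18] proposed the following alternating Split Bregman algorithm without a convergence proof:
$$u^{(k+1)} = \operatorname*{argmin}_{u \in H_1} \Big\{ g(u) + \frac{1}{2\gamma}\|b^{(k)} + Du - d^{(k)}\|_2^2 \Big\}, \qquad (14)$$
$$d^{(k+1)} = \operatorname*{argmin}_{d \in H_2} \Big\{ \Phi(d) + \frac{1}{2\gamma}\|b^{(k)} + Du^{(k+1)} - d\|_2^2 \Big\}, \qquad (15)$$
$$b^{(k+1)} = b^{(k)} + Du^{(k+1)} - d^{(k+1)}. \qquad (16)$$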
The next theorem identifies this alternating Split Bregman method as a special case of a DRS:
DRS for (D) = Alternating Split Bregman Alg.
If $H_1$ and $H_2$ are finite-dimensional, it therefore provides us with a convergence result for the sequence $(b^{(k)})_{k\in\mathbb{N}}$ of this algorithm.

Theorem 3. The alternating Split Bregman algorithm coincides with the DRS algorithm applied to (D) with $A := \partial(g^* \circ (-D^*))$ and $B := \partial\Phi^*$, where $\eta = 1/\gamma$ and
$$t^{(k)} = \eta(b^{(k)} + d^{(k)}), \qquad p^{(k)} = \eta b^{(k)}, \qquad k \ge 0. \qquad (17)$$
Proof: 1. First, we show that for a proper, convex, lsc function $h: H_1 \to \mathbb{R} \cup \{+\infty\}$ and a bounded linear operator $K: H_1 \to H_2$ the following relation holds true:
$$\hat p = \operatorname*{argmin}_{p \in H_1} \Big\{ \frac{\eta}{2}\|Kp - q\|^2 + h(p) \Big\} \;\Rightarrow\; \eta(K\hat p - q) = J_{\eta\,\partial(h^* \circ (-K^*))}(-\eta q). \qquad (18)$$
The first equality is equivalent to
$$0 \in \eta K^*(K\hat p - q) + \partial h(\hat p) \;\Leftrightarrow\; \hat p \in \partial h^*\big(-\eta K^*(K\hat p - q)\big).$$
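Applying $-\eta K$ on both sides gives
$$-\eta K\hat p \in -\eta K\,\partial h^*\big(-\eta K^*(K\hat p - q)\big) = \eta\,\partial\big(h^* \circ (-K^*)\big)\big(\eta(K\hat p - q)\big),$$
and adding $\eta(K\hat p - q)$ on both sides yields
$$-\eta q \in \big(I + \eta\,\partial(h^* \circ (-K^*))\big)\big(\eta(K\hat p - q)\big),$$
which, by the definition of the resolvent, is equivalent to the right equality in (18).
2. Applying (18) to (14) with $h := g$, $K := D$ and $q := d^{(k)} - b^{(k)}$, we get
$$\eta\big(b^{(k)} + Du^{(k+1)} - d^{(k)}\big) = J_{\eta A}\big(\eta(b^{(k)} - d^{(k)})\big).$$
Assume that the alternating Split Bregman iterates and the DRS iterates coincide with the identification (17) up to some $k \in \mathbb{N}$. Using this induction hypothesis, it follows that
$$\eta\big(b^{(k)} + Du^{(k+1)}\big) = J_{\eta A}\big(\underbrace{\eta(b^{(k)} - d^{(k)})}_{2p^{(k)} - t^{(k)}}\big) + \underbrace{\eta d^{(k)}}_{t^{(k)} - p^{(k)}} = t^{(k+1)}. \qquad (19)$$
By definition of $b^{(k+1)}$ in (16), we see that $\eta(b^{(k+1)} + d^{(k+1)}) = t^{(k+1)}$. Next we apply (18) to (15) with $h := \Phi$, $K := I$ and $q := b^{(k)} + Du^{(k+1)}$, which gives, together with (19) and the identity $J_{\eta\,\partial(\Phi^* \circ (-I))}(-x) = -J_{\eta\,\partial\Phi^*}(x)$,
$$\eta\big(b^{(k)} + Du^{(k+1)} - d^{(k+1)}\big) = J_{\eta B}\big(\underbrace{\eta(b^{(k)} + Du^{(k+1)})}_{t^{(k+1)}}\big) = p^{(k+1)}.$$
Again by the formula (16) for $b^{(k+1)}$, we obtain $\eta b^{(k+1)} = p^{(k+1)}$, which completes the proof. □

A similar result was shown in [20, 21].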
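Theorem 3 can also be checked numerically. In the following sketch, which is our own illustration under simplifying assumptions, $g(u) = \frac{1}{2}\|u - f\|_2^2$, $\Phi = \lambda\|\cdot\|_1$ and $D$ is a diagonal matrix, so that all updates act componentwise. For this $g$ one has $g^*(v) = \frac{1}{2}\|v\|_2^2 + \langle v, f\rangle$, hence $J_{\eta A}(z) = (I + \eta DD^T)^{-1}(z + \eta Df)$, while $J_{\eta B}$ is the projection onto $C = \{|b_j| \le \lambda\}$. The script runs (14)–(16) and the DRS iteration side by side and verifies the identification (17) up to rounding. The data and function names are hypothetical.

```python
def soft(x, t):
    """Soft shrinkage with threshold t."""
    return [0.0 if abs(v) <= t else v - t if v > 0 else v + t for v in x]


def alternating_split_bregman(f, dg, lam, gamma, b0, d0, iters):
    """(14)-(16) with g = 0.5*||u - f||^2, D = diag(dg), Phi = lam*||.||_1."""
    b, d = list(b0), list(d0)
    bs, ds = [b], [d]
    for _ in range(iters):
        # (14): (gamma I + D^T D) u = gamma f + D^T (d - b), componentwise for diagonal D
        u = [(gamma * fi + di * (dk - bk)) / (gamma + di * di)
             for fi, di, dk, bk in zip(f, dg, d, b)]
        v = [bk + di * ui for bk, di, ui in zip(b, dg, u)]
        d = soft(v, gamma * lam)               # (15): prox of gamma*Phi
        b = [vk - dk for vk, dk in zip(v, d)]  # (16)
        bs.append(b)
        ds.append(d)
    return bs, ds


def drs_dual(f, dg, lam, eta, t0, p0, iters):
    """DRS for (D) with A = d(g* o (-D^*)) and B = d(iota_C), C = {|b_j| <= lam}."""
    t, p = list(t0), list(p0)
    ts, ps = [t], [p]
    for _ in range(iters):
        z = [2.0 * pk - tk for pk, tk in zip(p, t)]
        ja = [(zk + eta * di * fi) / (1.0 + eta * di * di)  # J_{eta A}(z)
              for zk, di, fi in zip(z, dg, f)]
        t = [a + tk - pk for a, tk, pk in zip(ja, t, p)]
        p = [max(-lam, min(lam, tk)) for tk in t]           # J_{eta B}: projection onto C
        ts.append(t)
        ps.append(p)
    return ts, ps


gamma, lam = 2.0, 0.7
eta = 1.0 / gamma
f, dg = [1.5, -0.3, 2.0], [1.0, 2.0, 0.5]
b0, d0 = [0.3, -0.2, 0.0], [0.1, 0.5, -0.4]
bs, ds = alternating_split_bregman(f, dg, lam, gamma, b0, d0, 30)
ts, ps = drs_dual(f, dg, lam, eta, [eta * (b + d) for b, d in zip(b0, d0)],
                  [eta * b for b in b0], 30)
gap = max(abs(p - eta * b) for k in range(31) for p, b in zip(ps[k], bs[k]))
print(gap)  # of the order of machine precision
```

The initial values are deliberately not consistent with $p^{(0)} = J_{\eta B}(t^{(0)})$, since Theorem 2 allows arbitrary starting elements.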
4 Application to Image Denoising
In the following, we restrict our attention to a discrete setting. We consider digital images defined on $\{1, \dots, n\} \times \{1, \dots, n\}$ and reshape them columnwise into vectors $f \in \mathbb{R}^N$ with $N = n^2$. If not stated otherwise, the multiplication of vectors, their square root, etc. are meant componentwise. We will now apply the algorithms defined in Sections 2 and 3 to the discrete denoising problem of the form
$$\operatorname*{argmin}_{u \in \mathbb{R}^N} \Big\{ \frac{1}{2}\|u - f\|_2^2 + \Phi(Du) \Big\}, \qquad D \in \mathbb{R}^{M,N}, \quad M \ge N, \qquad (20)$$
where $\Phi$ is defined as in Section 2. Consider the alternating Split Bregman algorithm (14)–(16) with $g(u) := \frac{1}{2}\|u - f\|_2^2$. Theorem 3 implies the convergence of $(b^{(k)})_{k\in\mathbb{N}}$, and it is not hard to show that for this special choice of $g$ the sequence $(u^{(k)})_{k\in\mathbb{N}}$ converges to a solution of the primal problem. The quadratic functional in (14) with the above choice of $g$ can simply be minimized by setting its gradient to zero, which results in
$$u^{(k+1)} = (\gamma I + D^TD)^{-1}\big(\gamma f + D^T(d^{(k)} - b^{(k)})\big).$$
Goldstein and Osher proposed to calculate the inverse $(\gamma I + D^TD)^{-1}$ by Gauß-Seidel iterations. Applying (4), we see that for $\Phi = \Phi_1$ the solution of the proximation problem in (15) is given by $d^{(k+1)} = T_{\gamma\Lambda}(b^{(k)} + Du^{(k+1)})$. The following algorithm shows the case $\Phi = \Phi_1$. Observe that, in order to better compare this method to the other algorithms in this section, we have changed the order in which we compute $u^{(k+1)}$. This is allowed because there are no restrictions on the choice of the starting values.

Algorithm (Alternating Split Bregman Shrinkage)
Initialization: $u^{(0)} := f$, $b^{(0)} := 0$.
For $k = 0, 1, \dots$ repeat until a stopping criterion is reached:
$d^{(k+1)} := T_{\gamma\Lambda}(b^{(k)} + Du^{(k)})$,
$b^{(k+1)} := b^{(k)} + Du^{(k)} - d^{(k+1)}$,
$u^{(k+1)} := (\gamma I + D^TD)^{-1}\big(\gamma f + D^T(d^{(k+1)} - b^{(k+1)})\big)$.

For $\Phi = \Phi_2$ we have to replace the soft shrinkage $T_{\gamma\Lambda}$ by the coupled shrinkage $\tilde T_{\gamma\tilde\Lambda}$. Note that this algorithm can also be used for the deblurring problem, which differs from (20) in having a more general data-fitting term $g(u) := \frac{1}{2}\|Ku - f\|_2^2$ with some linear operator $K$. In this case one has to invert the matrix $\gamma K^TK + D^TD$, which can be diagonalized in many applications by FFT or DCT techniques, e.g., if it is circulant. The problem (20) can also be solved via its dual problem by $\hat u = f - D^T\hat b$, where
$$\hat b = \operatorname*{argmin}_{b \in \mathbb{R}^M} \Big\{ \frac{1}{2}\|f - D^Tb\|_2^2 + \Phi_i^*(b) \Big\}, \qquad i = 1, 2, \qquad (21)$$
see, e.g., [22]. Applying the FBS algorithm (8) to the dual problem (21) gives
$$b^{(k+1)} = \operatorname{prox}_{\gamma\Phi_i^*}\big(b^{(k)} + \gamma D(f - D^Tb^{(k)})\big), \qquad i = 1, 2,$$
where $0 < \gamma < 2/\|D^TD\|_2$. Using the relation (5), we obtain for $\Phi = \Phi_1$
$$b^{(k+1)} = b^{(k)} + \gamma D(f - D^Tb^{(k)}) - T_\Lambda\big(b^{(k)} + \gamma D(f - D^Tb^{(k)})\big).$$
This yields the following algorithm to compute the minimizer of (20) for $\Phi = \Phi_1$:

Algorithm (FBS Shrinkage)
Initialization: $u^{(0)} := f$, $b^{(0)} := 0$.
For $k = 0, 1, \dots$ repeat until a stopping criterion is reached:
$d^{(k+1)} := T_\Lambda(b^{(k)} + \gamma Du^{(k)})$,
$b^{(k+1)} := b^{(k)} + \gamma Du^{(k)} - d^{(k+1)}$,
$u^{(k+1)} := f - D^Tb^{(k+1)}$.
For the functional $\Phi_2$ we have to replace the shrinkage function by $\tilde T_{\tilde\Lambda}$. This algorithm can also be deduced as a simple gradient descent reprojection algorithm, as was done, e.g., by Chambolle [2]. Note that this is not the often cited Chambolle algorithm in [22]. A relation of this method to the Bermúdez-Moreno algorithm, which also turns out to be an FBS algorithm, was shown in [23]. A connection to min-max duality was established in [24].

4.1 Besov-Norm Regularization
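For a sufficiently smooth orthogonal wavelet basis $\{\psi_i\}_{i\in I}$ of $L^2(\Omega)$ with wavelets of more than one vanishing moment, problem (1) can be rewritten as the minimization of
$$\frac{1}{2}\|d - c\|_{\ell_2}^2 + \lambda\|d\|_{\ell_1},$$
where $c := (\langle f, \psi_i\rangle)_i$ and $d := (\langle u, \psi_i\rangle)_i$. In the discrete setting, consider the orthogonal matrix $W \in \mathbb{R}^{N,N}$ having as rows the filters of orthogonal wavelets (and scaling functions) up to a certain level. Then the minimization problem corresponding to (1) is given by
$$\hat u = \operatorname*{argmin}_{u \in \mathbb{R}^N} \Big\{ \frac{1}{2}\|u - f\|_2^2 + \|\Lambda W u\|_1 \Big\} = \operatorname*{argmin}_{u \in \mathbb{R}^N} \Big\{ \frac{1}{2}\|Wu - Wf\|_2^2 + \|\Lambda W u\|_1 \Big\}. \qquad (22)$$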
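The orthogonality of $W$ further yields $\hat u = W^T\hat d$, where
$$\hat d = \operatorname*{argmin}_{d \in \mathbb{R}^N} \Big\{ \frac{1}{2}\|d - c\|_2^2 + \|\Lambda d\|_1 \Big\}, \qquad c := Wf, \quad \Lambda := \lambda I_N, \qquad (23)$$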
and by (4) we obtain the known wavelet shrinkage procedure $\hat u = W^T T_\Lambda(Wf)$, consisting of a wavelet transform $W$, followed by soft shrinkage $T_\Lambda$ of the wavelet coefficients and the inverse wavelet transform $W^T$. However, for image processing tasks like denoising or segmentation, ordinary orthogonal wavelets are not well suited due to their lack of translational invariance, which leads to visible artefacts. Without the usual subsampling, the method becomes translationally invariant and the results can be improved. But then $W \in \mathbb{R}^{M,N}$, $M = pN$, where $p$ is three times the decomposition level plus one for the rows belonging to the scaling function filters on the coarsest scale. We still have $W^TW = I_N$, but of course $WW^T \neq I_M$, i.e., the rows of $W$ form a discrete Parseval frame on $\mathbb{R}^N$ but not a basis. For the design of such frames see, e.g., [25, 26]. Equality (22) is still true for Parseval frames, but the problem is no longer equivalent to (23). Instead, we can apply FBS shrinkage or alternating Split Bregman shrinkage with $D = W$ and $\Phi = \Phi_1$. Note that in order to use the FBS algorithm, $\gamma$ has to fulfill $0 < \gamma < 2/\|W^TW\|_2$. Since $W^TW = I_N$, we have to choose $\gamma$ in $(0, 2)$, and $\gamma = 1$ is an admissible choice. It was shown in [27] that both algorithms coincide for $D = W$ with $W^TW = I_N$ and $\gamma = 1$:
Alternating Split Bregman Shrinkage = FBS Shrinkage
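The coincidence of the two shrinkage algorithms for a Parseval frame with $\gamma = 1$, and the frame synthesis form $u^{(k+1)} = W^T d^{(k+1)}$ of the third step (stated as (24) just below), can be observed on a toy example. The sketch below is our own and uses assumptions that are not in the text: a 1D signal, periodic boundary conditions, and the undecimated one-level Haar frame $W = (H_0; H_1)$ with $(H_0u)_i = (u_i + u_{i+1})/2$ and $(H_1u)_i = (u_i - u_{i+1})/2$, which satisfies $W^TW = I$. Since $(I + W^TW)^{-1} = \frac{1}{2}I$, the third step of the alternating Split Bregman shrinkage needs no linear solver here. All names are hypothetical, and `iters` is assumed to be at least 1.

```python
def soft(x, t):
    """Soft shrinkage T_Lambda with Lambda = t * I."""
    return [0.0 if abs(v) <= t else v - t if v > 0 else v + t for v in x]


def analysis(u):
    """W u for the undecimated one-level Haar frame W = [H0; H1], periodic boundary."""
    n = len(u)
    c0 = [0.5 * (u[i] + u[(i + 1) % n]) for i in range(n)]
    c1 = [0.5 * (u[i] - u[(i + 1) % n]) for i in range(n)]
    return c0 + c1


def synthesis(c):
    """W^T c; index i - 1 wraps around through Python's negative indexing."""
    n = len(c) // 2
    c0, c1 = c[:n], c[n:]
    return [0.5 * (c0[i] + c0[i - 1]) + 0.5 * (c1[i] - c1[i - 1]) for i in range(n)]


def asb_shrinkage(f, lam, iters):
    """Alternating Split Bregman shrinkage with D = W and gamma = 1."""
    u, b = list(f), [0.0] * (2 * len(f))
    for _ in range(iters):
        wu = analysis(u)
        d = soft([bk + wk for bk, wk in zip(b, wu)], lam)
        b = [bk + wk - dk for bk, wk, dk in zip(b, wu, d)]
        r = synthesis([dk - bk for dk, bk in zip(d, b)])
        u = [0.5 * (fk + rk) for fk, rk in zip(f, r)]  # (I + W^T W)^{-1} = I / 2
    return u, d


def fbs_shrinkage(f, lam, iters):
    """FBS shrinkage with D = W and gamma = 1."""
    u, b = list(f), [0.0] * (2 * len(f))
    for _ in range(iters):
        wu = analysis(u)
        d = soft([bk + wk for bk, wk in zip(b, wu)], lam)
        b = [bk + wk - dk for bk, wk, dk in zip(b, wu, d)]
        u = [fk - sk for fk, sk in zip(f, synthesis(b))]
    return u, d


f = [0.0, 0.1, 1.0, 1.2, 0.9, 1.1, -0.1, 0.05]
u1, d1 = asb_shrinkage(f, 0.1, 50)
u2, d2 = fbs_shrinkage(f, 0.1, 50)
print(max(abs(a - b) for a, b in zip(u1, u2)))             # ~ machine precision
print(max(abs(a - b) for a, b in zip(u2, synthesis(d2))))  # u = W^T d
```

Both printed deviations stay at rounding level, in line with the claim from [27].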
Moreover, the third step of both algorithms can be simplified to the frame synthesis step
$$u^{(k+1)} = W^T d^{(k+1)}. \qquad (24)$$

4.2 ROF Regularization
In this section, we apply the algorithms presented so far to the discrete ROF denoising method. We use an appropriate discretization of the absolute value of the gradient. Let $h_0 := \frac{1}{2}[1\ 1]$ and $h_1 := \frac{1}{2}[1\ {-1}]$ be the filters of the Haar wavelet. For convenience of notation, we use periodic boundary conditions, and the corresponding circulant matrices are denoted by $H_0 \in \mathbb{R}^{n,n}$ and $H_1 \in \mathbb{R}^{n,n}$. Then the following matrix fulfills $W^TW = I_N$ but $WW^T \neq I_{4N}$:
$$W := \begin{pmatrix} H_0 \otimes H_0 \\ H_0 \otimes H_1 \\ H_1 \otimes H_0 \\ H_1 \otimes H_1 \end{pmatrix} = \begin{pmatrix} \mathbf{H}_0 \\ \mathbf{H}_1 \end{pmatrix}.$$
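In [4, 5] it was shown that
$$\Big( \big((H_0 \otimes H_1)u\big)^2 + \big((H_1 \otimes H_0)u\big)^2 + \big((H_1 \otimes H_1)u\big)^2 \Big)^{1/2}$$
is a consistent finite difference discretization of $|\nabla u|$. Using this gradient discretization, the discrete version of the ROF functional in (2) reads
$$\operatorname*{argmin}_{u \in \mathbb{R}^N} \Big\{ \frac{1}{2}\|u - f\|_2^2 + \big\|\tilde\Lambda\,|\mathbf{H}_1 u|\big\|_1 \Big\}, \qquad \tilde\Lambda := \lambda I_N. \qquad (25)$$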
Observe that if we use the alternating Split Bregman algorithm with $D = \mathbf{H}_1$ for this problem, we have to solve a linear system of equations in the third step of each iteration. This can be avoided by using that $\mathbf{H}_1$ is part of a Parseval frame, cp. [27]. To this end, we define the proper, convex and lsc functional $\tilde\Phi_2$, which differs from $\Phi_2$ in that the first part of the input vector is neglected, i.e.,
$$\tilde\Phi_2(c) = \big\|\tilde\Lambda\,|c_1|\big\|_1 \qquad \text{for } c = (c_0, c_1) \in \mathbb{R}^N \times \mathbb{R}^{3N}.$$
Now we can rewrite (25) as
$$\operatorname*{argmin}_{u \in \mathbb{R}^N} \Big\{ \frac{1}{2}\|u - f\|_2^2 + \tilde\Phi_2(Wu) \Big\}.$$
Applying the alternating Split Bregman algorithm, or equivalently the FBS method, with $\gamma = 1$ and (24), we obtain the following algorithm.
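Initialization: $u^{(0)} := f$, $b^{(0)} := 0$.
For $k = 0, 1, \dots$ repeat until a stopping criterion is reached:
$$\begin{aligned} d_0^{(k+1)} &:= (Wu^{(k)})_0, \\ d_1^{(k+1)} &:= \tilde T_{\tilde\Lambda}\big(b^{(k)} + (Wu^{(k)})_1\big), \\ b^{(k+1)} &:= b^{(k)} + (Wu^{(k)})_1 - d_1^{(k+1)}, \\ u^{(k+1)} &:= W^T \begin{pmatrix} d_0^{(k+1)} \\ d_1^{(k+1)} \end{pmatrix}, \end{aligned} \qquad (26)$$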
where $(Wu)_0 := \mathbf{H}_0 u$ and $(Wu)_1 := \mathbf{H}_1 u$. Note that, starting with $b_0^{(0)} := 0$, all iterates $b_0^{(k)}$ remain zero vectors. We also obtain algorithm (26) if we apply FBS shrinkage directly to (25) with $D = \mathbf{H}_1$ and $\gamma = 1$. We now give a numerical example for these two algorithms. The computations were performed in MATLAB. In Fig. 1 we see the result of applying the two algorithms to a noisy image. Note that we only show the resulting image for algorithm (26) here, since the difference to the alternating Split Bregman method with $D = \mathbf{H}_1$ is marginal. We also found that the two algorithms need nearly the same number of iterations. However, algorithm (26) is very fast, since it does not require solving a linear system of equations in each iteration, as the alternating Split Bregman shrinkage does. Moreover, $\gamma = 1$ seems to be a very good parameter choice. For the above numerical experiment we used periodic boundary conditions; for Neumann boundary conditions, see, e.g., [28].

[Fig. 1: four image panels; color bar with ticks from −0.2 to 0.3]

Fig. 1. Comparison of algorithm (26) and the alternating Split Bregman method with $D = \mathbf{H}_1$. Stopping criterion: $\|u^{(k+1)} - u^{(k)}\|_\infty < 0.5$. Top left: Original image. Top right: Noisy image (white Gaussian noise with standard deviation 25). Bottom left: Algorithm (26), $\lambda = 70$ (53 iterations). Bottom right: Difference to alternating Split Bregman shrinkage with $D = \mathbf{H}_1$ (53 iterations).
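A compact pure-Python sketch of algorithm (26) follows. It is not the authors' MATLAB implementation; the index convention $(H_0x)_i = (x_i + x_{i+1})/2$, $(H_1x)_i = (x_i - x_{i+1})/2$ with periodic wrap-around, the nested-list image format, the default stopping rule $\|u^{(k+1)} - u^{(k)}\|_\infty < 0.5$ (borrowed from Fig. 1), the synthetic test image and all names are our assumptions. Channel 0 of $W$ is $H_0 \otimes H_0$; the three detail channels forming $\mathbf{H}_1$ are shrunk jointly by the coupled shrinkage $\tilde T_{\tilde\Lambda}$ with $\tilde\Lambda = \lambda I_N$.

```python
import random


def haar1d(x, kind):
    """(H0 x)_i = (x_i + x_{i+1})/2 for kind 0, (H1 x)_i = (x_i - x_{i+1})/2 for kind 1."""
    n, s = len(x), (1.0 if kind == 0 else -1.0)
    return [0.5 * (x[i] + s * x[(i + 1) % n]) for i in range(n)]


def haar1d_t(y, kind):
    """Transpose of haar1d; y[i - 1] wraps around for i = 0."""
    s = 1.0 if kind == 0 else -1.0
    return [0.5 * (y[i] + s * y[i - 1]) for i in range(len(y))]


def sep(img, k_row, k_col, op):
    """Apply op with kind k_row along every row, then with kind k_col along every column."""
    rows = [op(r, k_row) for r in img]
    cols = [op(list(c), k_col) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


PAIRS = [(0, 0), (0, 1), (1, 0), (1, 1)]  # channel 0: scaling part; channels 1-3: bold H_1


def rof_frame_shrinkage(f, lam, tol=0.5, max_iter=500):
    """Algorithm (26) (gamma = 1); returns the last iterate and the iteration count."""
    n = len(f)
    u = [row[:] for row in f]
    b = [[[0.0] * n for _ in range(n)] for _ in range(3)]  # detail part of b (b_0 stays 0)
    for it in range(1, max_iter + 1):
        wu = [sep(u, kr, kc, haar1d) for kr, kc in PAIRS]
        v = [[[b[c][i][j] + wu[c + 1][i][j] for j in range(n)] for i in range(n)] for c in range(3)]
        d1 = [[[0.0] * n for _ in range(n)] for _ in range(3)]
        for i in range(n):
            for j in range(n):
                norm = sum(v[c][i][j] ** 2 for c in range(3)) ** 0.5
                if norm > lam:  # coupled shrinkage over the three detail channels
                    for c in range(3):
                        d1[c][i][j] = v[c][i][j] * (1.0 - lam / norm)
        b = [[[v[c][i][j] - d1[c][i][j] for j in range(n)] for i in range(n)] for c in range(3)]
        new_u = [[0.0] * n for _ in range(n)]
        for (kr, kc), ch in zip(PAIRS, [wu[0]] + d1):  # u = W^T (d_0, d_1) with d_0 = (W u)_0
            back = sep(ch, kr, kc, haar1d_t)
            for i in range(n):
                for j in range(n):
                    new_u[i][j] += back[i][j]
        diff = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(n))
        u = new_u
        if diff < tol:
            break
    return u, it


rng = random.Random(0)  # synthetic 8x8 step image plus Gaussian noise (sigma = 25)
f = [[(100.0 if j >= 4 else 0.0) + rng.gauss(0.0, 25.0) for j in range(8)] for i in range(8)]
u, iters = rof_frame_shrinkage(f, lam=70.0)
print(iters, sum(map(sum, u)) / 64, sum(map(sum, f)) / 64)  # mean gray value is preserved
```

Because every detail filter has zero sum, each iteration preserves the mean gray value, and with $\lambda = 0$ the routine returns $f$ after a single step, since then $b$ stays zero and $u^{(1)} = W^TWf = f$.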
References
1. DeVore, R.A., Lucier, B.J.: Fast wavelet techniques for near-optimal image processing. In: IEEE MILCOM 1992 Conf. Rec., vol. 3, pp. 1129–1135. IEEE Press, San Diego (1992)
2. Chambolle, A.: Total variation minimization and a class of binary MRF models. In: Rangarajan, A., Vemuri, B.C., Yuille, A.L. (eds.) EMMCVPR 2005. LNCS, vol. 3757, pp. 136–152. Springer, Heidelberg (2005)
3. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
4. Mrázek, P., Weickert, J.: Rotationally invariant wavelet shrinkage. In: Michaelis, B., Krell, G. (eds.) DAGM 2003. LNCS, vol. 2781, pp. 156–163. Springer, Heidelberg (2003)
5. Welk, M., Steidl, G., Weickert, J.: Locally analytic schemes: A link between diffusion filtering and wavelet shrinkage. Applied and Computational Harmonic Analysis 24, 195–224 (2008)
6. Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM Journal on Numerical Analysis 16(6), 964–979 (1979)
7. Tseng, P.: Applications of a splitting algorithm to decomposition in convex programming and variational inequalities. SIAM Journal on Control and Optimization 29, 119–138 (1991)
8. Combettes, P.L.: Solving monotone inclusions via compositions of nonexpansive averaged operators. Optimization 53(5–6), 475–504 (2004)
9. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. Multiscale Modeling and Simulation 4, 1168–1200 (2005)
10. Douglas, J., Rachford, H.H.: On the numerical solution of heat conduction problems in two and three space variables. Transactions of the American Mathematical Society 82(2), 421–439 (1956)
11. Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics 7(3), 200–217 (1967)
12. Eckstein, J.: Nonlinear proximal point algorithms using Bregman functions, with applications to convex programming. Mathematics of Operations Research 18(1), 202–226 (1993)
13. Kiwiel, K.C.: Proximal minimization methods with generalized Bregman functions. SIAM Journal on Control and Optimization 35(4), 1142–1168 (1997)
14. Frick, K.: The Augmented Lagrangian Method and Associated Evolution Equations. Dissertation, University of Innsbruck (2008)
15. Rockafellar, R.T.: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Mathematics of Operations Research 1(2), 97–116 (1976)
16. Browder, F.E., Petryshyn, W.V.: The solution by iteration of nonlinear functional equations in Banach spaces. Bulletin of the American Mathematical Society 72, 571–575 (1966)
17. Iusem, A.N.: Augmented Lagrangian methods and proximal point methods for convex optimization. Investigación Operativa 8, 11–49 (1999)
18. Goldstein, T., Osher, S.: The Split Bregman method for l1 regularized problems. UCLA CAM Report (2008)
19. Yin, W., Osher, S., Goldfarb, D., Darbon, J.: Bregman iterative algorithms for ℓ1-minimization with applications to compressed sensing. SIAM Journal on Imaging Sciences 1(1), 143–168 (2008)
20. Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming 55, 293–318 (1992)
21. Gabay, D.: Applications of the method of multipliers to variational inequalities. In: Fortin, M., Glowinski, R. (eds.) Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary–Value Problems. Studies in Mathematics and its Applications, vol. 15, pp. 299–331. North–Holland, Amsterdam (1983)
22. Chambolle, A.: An algorithm for total variation minimization and applications. Journal of Mathematical Imaging and Vision 20, 89–97 (2004)
23. Aujol, J.F.: Some first-order algorithms for total variation based image restoration. Preprint ENS Cachan (2008)
24. Zhu, M., Chan, T.: An efficient primal-dual hybrid gradient algorithm for total variation image restoration. UCLA CAM Report (2008)
25. Daubechies, I., Han, B., Ron, A., Shen, Z.: Framelets: MRA-based construction of wavelet frames. Applied and Computational Harmonic Analysis 14, 1–46 (2003)
26. Dong, B., Shen, Z.: Pseudo-splines, wavelets and framelets. Applied and Computational Harmonic Analysis 22, 78–104 (2007)
27. Setzer, S., Steidl, G.: Split Bregman method, gradient descent reprojection method and Parseval frames. Preprint Univ. Mannheim (2008)
28. Chan, R.H., Setzer, S., Steidl, G.: Inpainting by flexible Haar-wavelet shrinkage. SIAM Journal on Imaging Science 1, 273–293 (2008)
29. Wang, Y., Yang, J., Yin, W., Zhang, Y.: A new alternating minimization algorithm for total variation image reconstruction. SIAM Journal on Imaging Sciences 1(3), 248–272 (2008)
Anisotropic Smoothing Using Double Orientations
Gabriele Steidl and Tanja Teuber
University of Mannheim, A5, 68131 Mannheim, Germany
http://kiwi.math.uni-mannheim.de

Abstract. To improve the quality of image restoration methods, directional information has recently been incorporated into the restoration process. In this paper, we propose a two-step procedure for denoising images that is particularly suited to recover sharp vertices and X-junctions in the presence of heavy noise. In the first step, we estimate the (smoothed) orientations of the image structures, where we find the double orientations at vertices and X-junctions using a model of Aach et al. Based on shape preservation considerations, this directional information is then used to establish an energy functional which is minimized in the second step. We discuss the behavior of our new method in comparison with single-direction approaches arising, e.g., when using the classical structure tensor of Förstner and Gülch, and demonstrate the very good performance of our method by numerical examples.
1 Introduction
Recently, much effort has been put into improving image restoration processes by involving directional information. Our paper contributes to this topic. We restrict our attention to the denoising of images $f \in L^2(\mathbb{R}^2)$ corrupted by heavy white Gaussian noise and the minimization of energy functionals
$$\operatorname*{argmin}_{u \in L^2} \Big\{ \frac{1}{2}\|f - u\|_{L^2}^2 + \lambda J(u) \Big\}, \qquad (1)$$
where $J: L^2 \to \mathbb{R}_{\ge 0} \cup \{+\infty\}$ denotes a proper, convex, closed functional which is, in addition, positively homogeneous. Frequently applied examples of such functionals are
$$J(u) := \begin{cases} \int_{\mathbb{R}^2} \varphi(\nabla u)\,dx, & u \in BV_\varphi, \\ \infty, & u \in L^2 \setminus BV_\varphi, \end{cases} \qquad (2)$$
where $\varphi(x) = \varphi_1(x) := |x_1| + |x_2|$ as in [1, 2] or $\varphi(x) = \varphi_2(x) := \sqrt{x_1^2 + x_2^2}$ as in the Rudin-Osher-Fatemi (ROF) model [3]. Here $BV_\varphi(\mathbb{R}^2) := \{u \in L^2(\mathbb{R}^2) : \int_{\mathbb{R}^2} \varphi(\nabla u)\,dx < \infty\}$ denotes the (anisotropic) space of functions of bounded variation equipped with the norm
$$\int_{\mathbb{R}^2} \varphi(\nabla u)\,dx := \sup_{\substack{V \in C_c^1(\mathbb{R}^2, \mathbb{R}^2) \\ V \in W_\varphi \text{ a.e.}}} -\int_{\mathbb{R}^2} u(x)\,\operatorname{div} V(x)\,dx, \qquad (3)$$
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 477–489, 2009. © Springer-Verlag Berlin Heidelberg 2009
where the Wulff shape $W_\varphi := \{x \in \mathbb{R}^2 : \langle x, y \rangle \le \varphi(y)\ \forall y \in \mathbb{R}^2\}$ of $\varphi$ is the unit square with horizontal and vertical edges in case $\varphi = \varphi_1$ and the closed unit disk for $\varphi = \varphi_2$. Note that other positively homogeneous, finite, convex, even functions $\varphi$ with $\varphi(0) = 0$ and $\varphi(x) > 0$ for $x \neq 0$ can be used in (2) and that the spaces $BV_\varphi$ are equivalent for all these functions [4]. Besides (2) we will also apply inf convolution functionals
$$J(u) := (J_1 \,\square\, J_2)(u) := \inf_{u = u_1 + u_2} \{J_1(u_1) + J_2(u_2)\}, \tag{4}$$
where $J_1, J_2$ are nonnegative, proper, convex, closed and positively homogeneous. A possible choice for $J_1$ and $J_2$, suggested, e.g., in [5], is $\int_{\mathbb{R}^2} |\partial_x u_1|\,dx$ and $\int_{\mathbb{R}^2} |\partial_y u_2|\,dx$. It is well known that, for large regularization parameters $\lambda$, model (2) with $\varphi_1$ and, similarly, the above inf convolution model tend to cut off vertices vertically and horizontally, while the ROF approach rounds them. Therefore we propose to introduce local directional information obtained from the double orientation tensors of Aach et al. [6] into these functionals.

Outline of our paper. In Sec. 2 we recall the single orientation estimation provided by the structure tensor in [7]. Then we turn to the double orientation estimation proposed in [6], where we gain some additional insight into the nullspaces of these tensors. In Sec. 3 we start with shape preservation facts as motivation for the subsequent introduction of our new directional denoising model. Furthermore, we discuss our orientation choice in comparison to the classical structure tensor. The good performance of our method is demonstrated by numerical examples in Sec. 4. Conclusions are given in Sec. 5. More details including proofs are contained in the accompanying preprint [8].

Related work. Image restoration by first approximating the local geometry and then involving it in the restoration process was suggested in various papers. A group of methods retrieves the local geometry by computing the Förstner-Gülch structure tensor and then uses its eigenvalues and orthogonal eigenvectors to define a diffusion tensor which steers the direction of the flux in PDEs. Tschumperlé [9] divided these methods into divergence-based [10], trace-based [11] and his own curvature-based methods. The first approach is also related to the minimization of specific energy functionals, see, e.g., [12, 13].
The curvature-based method [14, 9], which is related to line integral convolution [15], is better suited for the restoration of sharp edges than the other two methods, but our method is superior in the presence of heavy noise. Note that, as in [16], the curvature-based method can include multiple directions. Various papers deal with the smoothing of normal vectors by minimizing certain energy functionals [17, 18, 19, 20, 21, 22] and use this information for subsequent denoising. In general, these minimization procedures are much more expensive than our double orientation approach. Kimmel, Sochen et al. suggested restoration techniques within the Beltrami framework [23]. The corresponding smoothing with the so-called 'short-time Beltrami kernel' differs from bilateral filters [24] in that it uses geodesic distances on the image manifold, while the bilateral kernel applies Euclidean distances. In [25], the authors considered special images containing rotated rectangles and established a single functional both for finding the rotation angles and for denoising. However, the resulting algorithm is again a two-step procedure. For a simpler two-step approach we refer to [26]. So far, apart from our new method, the best results we have obtained were achieved with nonlocal means [27, 28]. An example is reported in Sec. 4.
2 Orientation Estimations
2.1 Single Orientation Estimations
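Let $\Omega \subset \mathbb{R}^2$ be the image part of interest. For simplicity, we assume that $\Omega := B_\varepsilon(0)$ is the ball around 0 with radius $\varepsilon$. Our ideal assumption is that this part of the image corresponds to a function $f: \Omega \to \mathbb{R}$ which has constant values along a single direction $r$ with $\|r\|_2 = 1$, i.e., $f = \varphi(s^T \cdot)$ with $s := r^\perp = (r_2, -r_1)^T$ and $\varphi: [-\varepsilon, \varepsilon] \to \mathbb{R}$. Then
$$0 = \frac{\partial}{\partial r} f(x) = r^T \nabla f(x) = r^T \varphi'(s^T x)\, s \qquad \forall x \in \Omega$$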
holds true and we also have for a nonnegative weight function $w: \Omega \to \mathbb{R}$ that
$$0 = \int_\Omega w(x) \big(r^T \nabla f(x)\big)^2 dx = r^T \Big(\int_\Omega w(x)\, \nabla f(x) \nabla f(x)^T dx\Big)\, r. \tag{5}$$
If $\varphi$ is not constant, then the symmetric, positive semidefinite matrix
$$\mathcal{J} := \int_\Omega w(x)\, \nabla f(x) \nabla f(x)^T dx = \int_\Omega w(x) \big(\varphi'(s^T x)\big)^2 dx \; s s^T$$
has rank one and $r$ is an eigenvector of the eigenvalue 0. So far we have considered image parts with an ideal directional behavior. Since in applications we deal with noisy images, a pre-smoothing step with the 2D Gaussian $K_\sigma$ of standard deviation $\sigma$ is performed before computing the gradient in $\mathcal{J}$. Thus, (5) holds at least approximately and $r$ is the minimizer of the weighted least squares expression $r^T \mathcal{J} r$ subject to $\|r\|_2 = 1$, i.e., the eigenvector belonging to the smallest eigenvalue of $\mathcal{J}$. Moreover, in natural images the significant directions vary in different image parts. To detect the direction in the neighborhood of every image point $x$, we use the shifted Gaussian $w = K_\rho(\cdot - x)$ (truncated outside $B_{3\rho}(x)$). In this way, we can attach to each image point a $2 \times 2$ matrix, the so-called structure tensor
$$\mathcal{J}_\rho := K_\rho * (\nabla f_\sigma \nabla f_\sigma^T), \qquad \nabla f_\sigma := \nabla(K_\sigma * f).$$
If the eigenvalues of $\mathcal{J}_\rho(x)$ fulfill $\lambda_1 \ll \lambda_2$, then we are in the neighborhood of an edge and the orthogonal eigenvectors $r_1 = r$ and $r_2 = r^\perp$ approximate the isophote direction and the gradient direction at $x$. In the neighborhood of vertices, where $\lambda_2 \ge \lambda_1 \gg 0$, we obtain smoothed eigenvectors between neighboring edges. This causes artefacts in restoration models involving these directions. Therefore we are interested in double orientations.
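The structure tensor construction described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: it uses NumPy/SciPy, central differences (`np.gradient`) instead of the Scharr filters used in Sec. 4, images indexed as `f[x, y]`, a hypothetical function name `structure_tensor_directions`, and a $K_\rho$ window truncated at $3\rho$ along each axis (a square window rather than the disk $B_{3\rho}$).

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def structure_tensor_directions(f, sigma=1.0, rho=2.0):
    """Per-pixel structure tensor J_rho = K_rho * (grad f_sigma grad f_sigma^T).

    Returns the eigenvalues (ascending) and the unit eigenvector r of the
    smallest eigenvalue, which approximates the isophote direction.
    Arrays are indexed as f[x, y], so axis 0 is x and axis 1 is y.
    """
    f_sigma = gaussian_filter(np.asarray(f, dtype=float), sigma)  # K_sigma * f
    fx = np.gradient(f_sigma, axis=0)
    fy = np.gradient(f_sigma, axis=1)
    # Entrywise smoothing with K_rho, truncated at 3*rho along each axis.
    jxx = gaussian_filter(fx * fx, rho, truncate=3.0)
    jxy = gaussian_filter(fx * fy, rho, truncate=3.0)
    jyy = gaussian_filter(fy * fy, rho, truncate=3.0)
    J = np.stack([np.stack([jxx, jxy], axis=-1),
                  np.stack([jxy, jyy], axis=-1)], axis=-2)  # shape (nx, ny, 2, 2)
    evals, evecs = np.linalg.eigh(J)  # eigenvalues in ascending order
    r = evecs[..., :, 0]              # eigenvector of the smallest eigenvalue
    return evals, r


# Hypothetical test image: vertical stripes f(x, y) = sin(0.3 x), constant along y.
x = np.arange(64)
f = np.sin(0.3 * x)[:, None] * np.ones((1, 64))
evals, r = structure_tensor_directions(f, sigma=1.0, rho=2.0)
print(r[32, 32], evals[32, 32])
```

For this stripe pattern the smallest eigenvalue vanishes and $r$ points along the $y$-axis, i.e., along the isophotes, which is the single-orientation situation of (5).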
2.2 Double Orientation Estimations
Assume that $f$ can be decomposed into two functions $f_i = \varphi_i(s_i^T \cdot)$ with $s_i := r_i^\perp$, $i = 1, 2$, where $r_1 \nparallel r_2$. As in Fig. 1, we consider two decompositions of $f$, the transparent model
$$f(x) = f_1(x) + f_2(x) \qquad \forall x \in \Omega \tag{6}$$
and the occlusion model with $\Omega = \Omega_1 \cup \Omega_2$, $\Omega_1 \cap \Omega_2 = \emptyset$ and
$$f(x) = \begin{cases} f_1(x) & \text{for } x \in \Omega_1, \\ f_2(x) & \text{for } x \in \Omega_2. \end{cases} \tag{7}$$
Fig. 1. Illustration of the transparent model (left) and the occlusion model (right)
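Transparent model. By the definition of $f_1$ and $f_2$ we conclude for all $x \in \Omega$ that
$$0 = \frac{\partial^2}{\partial r_1 \partial r_2} f_1(x) + \frac{\partial^2}{\partial r_1 \partial r_2} f_2(x) = \frac{\partial^2}{\partial r_1 \partial r_2} f(x) = r_2^T H(x)\, r_1 = r_1^T H(x)\, r_2 \tag{8}$$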
with the Hessian $H(x)$ of $f$ at $x$. Applying tensor products $\otimes$ of matrices, (8) becomes
$$0 = (r_1 \otimes r_2)^T h(x) = (r_2 \otimes r_1)^T h(x) \qquad \text{with } h := (\partial_{xx} f, \partial_{xy} f, \partial_{xy} f, \partial_{yy} f)^T \tag{9}$$
and since this holds true for all $x \in \Omega$ we also get
$$0 = \int_\Omega w(x)\, (r_1 \otimes r_2)^T h(x) h(x)^T (r_1 \otimes r_2)\, dx = (r_1 \otimes r_2)^T \mathcal{T} (r_1 \otimes r_2) \tag{10}$$
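with the symmetric, positive semidefinite matrix $\mathcal{T} := \int_\Omega w(x)\, h(x) h(x)^T dx \in \mathbb{R}^{4,4}$. By (10) and since $r_1 \nparallel r_2$, the vectors $r_1 \otimes r_2$ and $r_2 \otimes r_1$ are two linearly independent eigenvectors of the eigenvalue 0 of $\mathcal{T}$. Instead of determining the directions $r_1$ and $r_2$ via (10), Aach et al. [6] proposed to rewrite (9) by skipping the double entry $\partial_{xy} f$ in $h$ as
$$0 = r^T \tilde h(x) \quad \text{with} \quad \tilde h := (\partial_{xx} f, \partial_{xy} f, \partial_{yy} f)^T, \quad r := (r_{11} r_{21},\ r_{11} r_{22} + r_{12} r_{21},\ r_{12} r_{22})^T. \tag{11}$$
Then our determining equation (10) becomes
$$0 = r^T \mathcal{T} r \quad \text{with} \quad \mathcal{T} := \int_\Omega w(x)\, \tilde h(x) \tilde h(x)^T dx \in \mathbb{R}^{3,3} \tag{12}$$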
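and $r$ is an eigenvector of the eigenvalue 0 of the symmetric, positive semidefinite matrix $\mathcal{T}$. More precisely, we can prove that $\mathcal{T} = S \Phi S^T$ with $S := (s_1 \tilde\otimes s_1,\ s_2 \tilde\otimes s_2)$, $v \tilde\otimes v := (v_1^2, v_1 v_2, v_2^2)^T$ and
$$\Phi := \int_\Omega w(x) \begin{pmatrix} \big(\varphi_1''(s_1^T x)\big)^2 & \varphi_1''(s_1^T x)\, \varphi_2''(s_2^T x) \\ \varphi_1''(s_1^T x)\, \varphi_2''(s_2^T x) & \big(\varphi_2''(s_2^T x)\big)^2 \end{pmatrix} dx,$$
so that $\operatorname{rank} \mathcal{T} = 0$ if $\varphi_i \in \Pi_1$, $i = 1, 2$; $\operatorname{rank} \mathcal{T} = 1$ if $\varphi_i \in \Pi_1$ for exactly one $i$ or $\varphi_i \in \Pi_2 \setminus \Pi_1$ for $i = 1, 2$; and $\operatorname{rank} \mathcal{T} = 2$ otherwise, where $\Pi_n$ denotes the space of polynomials on $[-\varepsilon, \varepsilon]$ of degree $\le n$. If $\operatorname{rank} \mathcal{T} = 2$ (vertex case), then the nullspace of $\mathcal{T}$ is $\mathcal{N}(\mathcal{T}) = \{c\, r : c \in \mathbb{R}\}$. If $\operatorname{rank} \mathcal{T} = 1$ (edge case) and $\varphi_2$ is linear but $\varphi_1$ is not, then $\mathcal{N}(\mathcal{T}) = \{(r_{11} c_1,\ r_{11} c_2 + r_{12} c_1,\ r_{12} c_2)^T : c = (c_1, c_2)^T \in \mathbb{R}^2\}$, i.e., $c$ plays the role of $r_2$ in (11). There exist several possibilities to detect the directions $r_i$, $i = 1, 2$, from an eigenvector $u = (u_1, u_2, u_3)^T \in \mathcal{N}(\mathcal{T})$. For example, it is not hard to check that the following setting from [6] does the job: for $u_1 \neq 0$ set
$$r_1 := \frac{1}{\sqrt{u_1^2 + y_1^2}}\, (u_1, y_1)^T, \qquad r_2 := \frac{1}{\sqrt{u_1^2 + y_2^2}}\, (u_1, y_2)^T,$$
where $y_i$, $i = 1, 2$, are the solutions of the quadratic equation $y^2 - u_2 y + u_1 u_3 = 0$. If $u_1 = 0$, then $y_i = 0$ for one $i$ and we set $r_i := \frac{1}{\sqrt{u_2^2 + u_3^2}}\, (u_2, u_3)^T$ and $r_{3-i} := (0, 1)^T$.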
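To make the recovery rule concrete, the sketch below builds the $3 \times 3$ matrix of (12) for a hypothetical transparent pattern $f = \varphi_1(s_1^T x) + \varphi_2(s_2^T x)$ with $\varphi_1 = \sin$ and $\varphi_2 = \cos(2\,\cdot)$, using analytic second derivatives at random sample points with uniform weights, takes the eigenvector of the smallest eigenvalue and applies the rule quoted from [6]. The function names, angles and test pattern are illustrative assumptions; the recovered vectors agree with $r_1, r_2$ only up to sign and order.

```python
import numpy as np


def sym_tensor(v):
    """v (x)~ v := (v1^2, v1 v2, v2^2)^T."""
    return np.array([v[0] ** 2, v[0] * v[1], v[1] ** 2])


def reduced_vector(r1, r2):
    """r := (r11 r21, r11 r22 + r12 r21, r12 r22)^T as in (11)."""
    return np.array([r1[0] * r2[0], r1[0] * r2[1] + r1[1] * r2[0], r1[1] * r2[1]])


def directions_from_nullvector(u, tol=1e-12):
    """Recover unit vectors r1, r2 (up to sign and order) from a multiple u of r."""
    u1, u2, u3 = u
    if abs(u1) > tol:
        # Roots of y^2 - u2*y + u1*u3 = 0. In exact arithmetic the discriminant
        # equals (r11 r22 - r12 r21)^2 >= 0, so only rounding noise is clipped.
        root = np.sqrt(max(u2 ** 2 - 4.0 * u1 * u3, 0.0))
        ys = ((u2 + root) / 2.0, (u2 - root) / 2.0)
        return [np.array([u1, y]) / np.hypot(u1, y) for y in ys]
    return [np.array([u2, u3]) / np.hypot(u2, u3), np.array([0.0, 1.0])]


r1 = np.array([np.cos(0.3), np.sin(0.3)])
r2 = np.array([np.cos(1.2), np.sin(1.2)])
s1 = np.array([r1[1], -r1[0]])  # s := r^perp = (r2, -r1)^T
s2 = np.array([r2[1], -r2[0]])

rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(500, 2))
d2phi1 = -np.sin(pts @ s1)                 # phi1''(s1^T x) for phi1 = sin
d2phi2 = -4.0 * np.cos(2.0 * (pts @ s2))   # phi2''(s2^T x) for phi2 = cos(2 .)
# h~ = (f_xx, f_xy, f_yy)^T = phi1'' (s1 (x)~ s1) + phi2'' (s2 (x)~ s2)
h = d2phi1[:, None] * sym_tensor(s1) + d2phi2[:, None] * sym_tensor(s2)
T = h.T @ h / len(h)                       # discrete version of (12), uniform w

evals, evecs = np.linalg.eigh(T)
u = evecs[:, 0]                            # spans N(T) in the vertex case
est1, est2 = directions_from_nullvector(u)
r = reduced_vector(r1, r2)
print(est1, est2, r @ T @ r)
```

Both $\varphi_i$ are non-polynomial here, so $\operatorname{rank} \mathcal{T} = 2$ and the null vector is unique up to scaling.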
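In the following, we choose as direction $r_1$ the one fulfilling $|\langle r_1, \nabla f_{\tilde\sigma} \rangle| \le |\langle r_2, \nabla f_{\tilde\sigma} \rangle|$. In particular, $r_1$ is the isophote direction at edges, where some vector $c$ plays the role of $r_2$.

Occlusion model. By the definition of $f_1$ and $f_2$ we conclude for all $x \in \Omega$ that
$$0 = \frac{\partial}{\partial r_1} f(x)\, \frac{\partial}{\partial r_2} f(x) = \big(r_1^T \nabla f(x)\big)\big(r_2^T \nabla f(x)\big) = r_1^T \nabla f(x) \nabla f(x)^T r_2 \tag{13}$$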
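and, by rewriting this equation using tensor products, that
$$0 = (r_2 \otimes r_1)^T g(x) = (r_1 \otimes r_2)^T g(x) \qquad \text{with } g := \big((\partial_x f)^2,\ \partial_x f\, \partial_y f,\ \partial_x f\, \partial_y f,\ (\partial_y f)^2\big)^T.$$
With $r$ defined by (11), this reads in reduced form as
$$0 = r^T \tilde g(x) \qquad \text{with } \tilde g := \big((\partial_x f)^2,\ \partial_x f\, \partial_y f,\ (\partial_y f)^2\big)^T.$$
Since this relation is true for all $x \in \Omega$, we also have that
$$0 = r^T \mathcal{C} r \qquad \text{with } \mathcal{C} := \int_\Omega w(x)\, \tilde g(x) \tilde g(x)^T dx. \tag{14}$$
Thus, $r$ is an eigenvector of the eigenvalue 0 of the symmetric, positive semidefinite matrix $\mathcal{C}$. More precisely, we can prove that
$$\mathcal{C} = \alpha_1 (s_1 \tilde\otimes s_1)(s_1 \tilde\otimes s_1)^T + \alpha_2 (s_2 \tilde\otimes s_2)(s_2 \tilde\otimes s_2)^T \quad \text{with} \quad \alpha_i := \int_{\Omega_i} w(x) \big(\varphi_i'(s_i^T x)\big)^4 dx, \quad i = 1, 2,$$
so that the rank of $\mathcal{C}$ is $\nu \in \{0, 1, 2\}$ if exactly $2 - \nu$ of the functions $\varphi_i$ are constant on $\Omega_i$, $i = 1, 2$. The directions $r_i$, $i \in \{1, 2\}$, can be obtained from an eigenvector in $\mathcal{N}(\mathcal{C})$ as in the transparent model.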
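The closed form for $\mathcal{C}$ and the null vector $r$ can be checked numerically. The sketch below assumes a hypothetical occlusion configuration ($\varphi_1 = \tanh(3\,\cdot)$ on $\Omega_1 = \{x_1 < 0\}$, $\varphi_2 = \sin$ on the rest of the square $[-1,1]^2$), random sample points with uniform weight $w$, and analytic gradients $\nabla f = \varphi_i'(s_i^T x)\, s_i$ instead of image derivatives; all names and parameters are illustrative.

```python
import numpy as np


def sym_tensor(v):
    """v (x)~ v := (v1^2, v1 v2, v2^2)^T."""
    return np.array([v[0] ** 2, v[0] * v[1], v[1] ** 2])


r1 = np.array([np.cos(0.4), np.sin(0.4)])
r2 = np.array([np.cos(2.0), np.sin(2.0)])
s1 = np.array([r1[1], -r1[0]])  # s_i := r_i^perp
s2 = np.array([r2[1], -r2[0]])

rng = np.random.default_rng(1)
pts = rng.uniform(-1.0, 1.0, size=(1000, 2))
in1 = pts[:, 0] < 0.0                                   # Omega_1
dphi1 = 3.0 * (1.0 - np.tanh(3.0 * (pts @ s1)) ** 2)    # phi1'(s1^T x)
dphi2 = np.cos(pts @ s2)                                # phi2'(s2^T x)
grad = np.where(in1[:, None], dphi1[:, None] * s1, dphi2[:, None] * s2)

g = np.stack([grad[:, 0] ** 2, grad[:, 0] * grad[:, 1], grad[:, 1] ** 2], axis=1)  # g~
w = 1.0 / len(pts)                                      # uniform weights
C = w * (g.T @ g)                                       # discrete version of (14)

# Closed form with alpha_i = sum over Omega_i of w * phi_i'^4
alpha1 = w * np.sum(dphi1[in1] ** 4)
alpha2 = w * np.sum(dphi2[~in1] ** 4)
C_closed = (alpha1 * np.outer(sym_tensor(s1), sym_tensor(s1))
            + alpha2 * np.outer(sym_tensor(s2), sym_tensor(s2)))

r = np.array([r1[0] * r2[0], r1[0] * r2[1] + r1[1] * r2[0], r1[1] * r2[1]])  # (11)
print(np.allclose(C, C_closed), r @ C @ r)
```

Since neither $\varphi_i$ is constant, $\operatorname{rank} \mathcal{C} = 2$ and the eigenvector of the smallest eigenvalue is a multiple of $r$.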
Fig. 2. Noisy images and their double orientation estimations by the occlusion model (left) and by the transparent model (right)
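Double orientation tensors. In practice, we deal with noisy images having image parts with various significant directions. As for the classical structure tensor, the double orientation tensors are defined as
$$\mathcal{T}_\rho := K_\rho * (\tilde h_\sigma \tilde h_\sigma^T), \qquad \mathcal{C}_\rho := K_\rho * (\tilde g_\sigma \tilde g_\sigma^T),$$
where $\tilde h_\sigma := (\partial_{xx} f_\sigma, \partial_{xy} f_\sigma, \partial_{yy} f_\sigma)^T$ and $\tilde g_\sigma := \big((\partial_x f_\sigma)^2,\ \partial_x f_\sigma\, \partial_y f_\sigma,\ (\partial_y f_\sigma)^2\big)^T$, and the directions $r_1, r_2$ can be derived from an eigenvector of the smallest eigenvalue of $\mathcal{T}_\rho(x)$ resp. $\mathcal{C}_\rho(x)$. For an example of estimated double orientations see Fig. 2.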
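Both tensors can be evaluated on a whole image in the same way as the structure tensor. The following sketch assumes NumPy/SciPy, central differences via `np.gradient` (rather than the Scharr filters of Sec. 4), images indexed as `f[x, y]`, a $K_\rho$ window truncated at $3\rho$ per axis, and a hypothetical function name. For a corner with one horizontal and one vertical edge we have $r_1 = (1, 0)^T$, $r_2 = (0, 1)^T$ and hence, by (11), $r = (0, 1, 0)^T$, so the occlusion-model vector at the vertex should be close to $(0, \pm 1, 0)^T$.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def double_orientation_nullvectors(f, sigma=1.0, rho=3.0, model="occlusion"):
    """Per-pixel eigenvector of the smallest eigenvalue of C_rho or T_rho.

    f is indexed as f[x, y]; derivatives are central differences.
    """
    f_sigma = gaussian_filter(np.asarray(f, dtype=float), sigma)
    fx = np.gradient(f_sigma, axis=0)
    fy = np.gradient(f_sigma, axis=1)
    if model == "occlusion":
        v = np.stack([fx * fx, fx * fy, fy * fy], axis=-1)     # g~_sigma
    elif model == "transparent":
        fxx = np.gradient(fx, axis=0)
        fxy = np.gradient(fx, axis=1)
        fyy = np.gradient(fy, axis=1)
        v = np.stack([fxx, fxy, fyy], axis=-1)                  # h~_sigma
    else:
        raise ValueError("model must be 'occlusion' or 'transparent'")
    outer = v[..., :, None] * v[..., None, :]                   # shape (nx, ny, 3, 3)
    # K_rho * (.) entrywise: smooth over the two spatial axes only.
    tensor = gaussian_filter(outer, sigma=(rho, rho, 0, 0), truncate=3.0)
    evals, evecs = np.linalg.eigh(tensor)
    return evecs[..., :, 0]


# Hypothetical test image: a bright quadrant, i.e., a vertex at about (15.5, 15.5).
f = np.zeros((32, 32))
f[16:, 16:] = 1.0
u = double_orientation_nullvectors(f, sigma=1.0, rho=3.0)
print(np.round(u[16, 16], 3))
```

The directions $r_1, r_2$ would then follow from `u` with the quadratic-equation rule of [6] given for the transparent model.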
3 Image Restoration and Shape Preservation
We start with a proposition which characterizes the solution of (1).

Proposition 1. The function $\hat u \in L_2$ is the solution of the minimization problem (1) iff
i) $\hat u = f - \lambda \hat v$,
ii) $\hat v \in C_J := \{v \in L_2 : \langle v, w \rangle \le J(w)\ \forall w \in L_2\}$,
iii) $\langle \hat u, \hat v \rangle = J(\hat u)$.
For the special functional (2) we have that $\hat v \in C_J$ if there exists a vector field $\hat V \in L_\infty(\mathbb{R}^2, \mathbb{R}^2)$ such that $\hat v := -\operatorname{div} \hat V \in L_2(\mathbb{R}^2)$ and $\hat V \in W_\varphi$ a.e. on $\mathbb{R}^2$.

Using this proposition, one can prove that rectangles with horizontal and vertical edges [4] and + junctions [8] are preserved by the solution of (1) with (2) and $\varphi = \varphi_1$.

Corollary 1. The solution $\hat u$ of (1) with (2) and $\varphi = \varphi_1$ reads
i) for $f := c\, 1_\Omega$ with the characteristic function $1_\Omega$ of $\Omega := (-a, a) \times (-b, b)$ as
$$\hat u = \Big(c - \lambda \frac{a + b}{ab}\Big) 1_\Omega, \qquad \lambda \le \frac{c\, ab}{a + b}, \quad a, b > 0,$$
ii) for $f := c_1 1_{\Omega_1} + c_2 1_{\Omega_2}$ with $\Omega_1 := (-l, l) \times (-a, a)$, $\Omega_2 := (-b, b) \times (-l, l)$ as
$$\hat u = \Big(c_1 - \lambda \frac{l + a}{la}\Big) 1_{\Omega_1} + \Big(c_2 - \lambda \frac{l + b}{lb}\Big) 1_{\Omega_2}, \qquad \lambda \le \min\Big\{\frac{c_1\, la}{l + a}, \frac{c_2\, lb}{l + b}\Big\}, \quad l > a, b > 0.$$

In this paper, we propose to modify (2) (and similarly (4)) by locally including directions. The basic idea is that the minimizer of the modified functional also preserves shapes such as the one shown in Fig. 3, as well as arbitrary X junctions.

Fig. 3. Original and noisy trapezoid image (standard deviation 150)

This modification can be motivated by the following considerations for a globally fixed transform matrix $R$: substituting $x := R^{-1} t$, $f_R := f(R^{-1} \cdot)$, we obtain
$$\begin{aligned} \int_{\mathbb{R}^2} \tfrac{1}{2} (f - u)^2 + \lambda \varphi(\nabla u)\, dx &= \frac{1}{|\det R|} \int_{\mathbb{R}^2} \tfrac{1}{2} \big(f(R^{-1} t) - u(R^{-1} t)\big)^2 + \lambda\, \varphi\big(\nabla_x u(R^{-1} t)\big)\, dt \\ &= \frac{1}{|\det R|} \int_{\mathbb{R}^2} \tfrac{1}{2} \big(f_R(t) - u_R(t)\big)^2 + \lambda\, \varphi\big(R^T \nabla_t u_R(t)\big)\, dt. \end{aligned}$$
Hence, if $\hat u$ minimizes the left-hand side, then the transformed image $\hat u_R := \hat u(R^{-1} \cdot)$ is a minimizer of
$$\frac{1}{2} \int_{\mathbb{R}^2} (f_R - u)^2\, dx + \lambda \int_{\mathbb{R}^2} \varphi(R^T \nabla u)\, dx. \tag{15}$$
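In the following, we consider discrete square images $f := (f(x, y))_{x,y=0}^{n-1} \in \mathbb{R}^{n,n}$ in their columnwise reshaped form $f \in \mathbb{R}^N$, $N := n^2$. Instead of partial derivatives we use forward differences so that the discrete version of the gradient reads
$$D := \begin{pmatrix} D_x \\ D_y \end{pmatrix} := \begin{pmatrix} H_0 \otimes H_1 \\ H_1 \otimes H_0 \end{pmatrix}, \qquad H_0 := \frac{1}{2} \begin{pmatrix} 1 & 1 & & \\ & \ddots & \ddots & \\ & & 1 & 1 \\ & & & 2 \end{pmatrix}, \qquad H_1 := \begin{pmatrix} -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \\ & & & 0 \end{pmatrix}.$$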
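The Kronecker structure of $D$ translates directly into sparse matrices. The sketch below assumes SciPy, last rows $\frac{1}{2}(0, \dots, 0, 2)$ for $H_0$ and $(0, \dots, 0)$ for $H_1$ (our reading of the matrices), column-major (`order="F"`) reshaping as in the text, and a hypothetical function name. With this ordering, $D_x f$ is the vectorization of $H_1 F H_0^T$, i.e., a forward difference in $x$ averaged over neighboring $y$.

```python
import numpy as np
import scipy.sparse as sp


def discrete_gradient(n):
    """Sparse D = (D_x; D_y) = (H0 (x) H1; H1 (x) H0) for columnwise reshaped n x n images."""
    # H0: average of neighbouring pixels, last row (1/2)*(0, ..., 0, 2)
    H0 = 0.5 * sp.diags([np.r_[np.ones(n - 1), 2.0], np.ones(n - 1)], [0, 1], shape=(n, n))
    # H1: forward difference, last row zero
    H1 = sp.diags([np.r_[-np.ones(n - 1), 0.0], np.ones(n - 1)], [0, 1], shape=(n, n))
    Dx = sp.kron(H0, H1)
    Dy = sp.kron(H1, H0)
    return sp.vstack([Dx, Dy]).tocsr()


n = 5
xs = np.arange(n, dtype=float)
F = 3.0 * xs[:, None] * np.ones((1, n))   # F[x, y] = 3x, a ramp in x
f = F.ravel(order="F")                    # columnwise reshaping
g = discrete_gradient(n) @ f
gx = g[: n * n].reshape(n, n, order="F")
gy = g[n * n :].reshape(n, n, order="F")
print(gx)
print(gy)
```

For the ramp, $D_x f$ equals the slope 3 except in the last row, where the zero last row of $H_1$ acts as a boundary condition, and $D_y f$ vanishes.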
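Then problem (1) becomes
$$\operatorname*{arg\,min}_{u \in \mathbb{R}^N} \; \frac{1}{2} \|f - u\|_2^2 + \lambda J(u) \tag{16}$$
and (2) with $\varphi = \varphi_1$ resp. (4) with $J_1(u) := \int_{\mathbb{R}^2} |\partial_x u|\,dx$ and $J_2(u) := \int_{\mathbb{R}^2} |\partial_y u|\,dx$ read as
$$J(u) := \|Du\|_1 \tag{17}$$
resp.
$$J(u) := \min_{u = u_1 + u_2} \{\|D_x u_1\|_1 + \|D_y u_2\|_1\}. \tag{18}$$
The solution of (16) can be characterized as in the continuous setting: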
Fig. 4. Denoising with the directions $r, r^\perp$ from the classical structure tensor. Left: Angle of $r$ mod 180° ($\sigma = 2.5$, $\rho = 5$). The directions are smoothed near vertices, following the shortest way between neighboring edge directions. Middle: Denoising result using only one direction $R := (r)$ ($\lambda = 2500$). Following this direction, obtuse vertices are rounded, while the acute one is prolonged. Right: Denoising result using both directions $R = (r_1, r_2) = (r, r^\perp)$ ($\lambda = 1000$). The edges of the minimizer $\hat u$ tend to be aligned with one of the directions $r_i$, i.e., one of the summands $|\langle r_i, \nabla \hat u \rangle|$, $i = 1, 2$, becomes very small. Hence, rounding artefacts are visible at obtuse vertices, while the model decides for the wrong direction at the acute vertex, which leads to a cut-off artefact.
Proposition 2. The vector $\hat u \in \mathbb{R}^N$ is the solution of the minimization problem (16) if and only if i) to iii) of Proposition 1 hold true, where $L_2$ has to be replaced by $\mathbb{R}^N$ with the Euclidean inner product. For the special functionals (17) and (18) we have that $\hat v \in C_J$ if and only if there exists a vector $\hat V = \big((\hat V^{(1)})^T, (\hat V^{(2)})^T\big)^T \in \mathbb{R}^{2N}$ such that
$$\hat v := D^T \hat V \quad \text{and} \quad \|\hat V\|_\infty \le 1, \qquad \text{resp.} \qquad \hat v := D_x^T \hat V^{(1)} = D_y^T \hat V^{(2)} \quad \text{and} \quad \|\hat V\|_\infty \le 1.$$
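Proposition 2 suggests a simple way to compute $\hat u$ for (16) with (17): search for $\hat V$ with $\|\hat V\|_\infty \le 1$ and set $\hat u = f - \lambda D^T \hat V$. The authors solve the dual problem by second-order cone programming with MOSEK (Sec. 4); the sketch below instead uses plain projected gradient steps on the dual $\min_{\|V\|_\infty \le 1} \frac{1}{2}\|f - \lambda D^T V\|_2^2$, a swapped-in and generally slower technique. It assumes SciPy, $H_0, H_1$ with last rows $\frac{1}{2}(0, \dots, 0, 2)$ and $(0, \dots, 0)$, a step size from the bound $\|D\|_2^2 \le \|D\|_1 \|D\|_\infty$, and hypothetical function names and test data.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla


def discrete_gradient(n):
    """D = (H0 (x) H1; H1 (x) H0) with the last rows of H0, H1 as stated above."""
    H0 = 0.5 * sp.diags([np.r_[np.ones(n - 1), 2.0], np.ones(n - 1)], [0, 1], shape=(n, n))
    H1 = sp.diags([np.r_[-np.ones(n - 1), 0.0], np.ones(n - 1)], [0, 1], shape=(n, n))
    return sp.vstack([sp.kron(H0, H1), sp.kron(H1, H0)]).tocsr()


def denoise_dual(f, D, lam, iters=2000):
    """Projected gradient on the dual of min_u 1/2||f - u||^2 + lam*||D u||_1.

    Minimizes 1/2||f - lam*D^T V||^2 over ||V||_inf <= 1 and returns
    u = f - lam*D^T V together with V (cf. i) and the characterization of C_J).
    """
    L = spla.norm(D, 1) * spla.norm(D, np.inf)  # upper bound for ||D||_2^2
    tau = 1.0 / (lam ** 2 * L)                  # 1 / Lipschitz constant of the dual gradient
    V = np.zeros(D.shape[0])
    for _ in range(iters):
        u = f - lam * (D.T @ V)
        V = np.clip(V + tau * lam * (D @ u), -1.0, 1.0)  # gradient step, then projection
    u = f - lam * (D.T @ V)
    return u, V


def primal_energy(u, f, D, lam):
    return 0.5 * np.sum((f - u) ** 2) + lam * np.abs(D @ u).sum()


n = 16
rng = np.random.default_rng(0)
F = np.zeros((n, n))
F[4:12, 4:12] = 100.0                       # square with horizontal and vertical edges
f = F.ravel(order="F") + 10.0 * rng.standard_normal(n * n)
D = discrete_gradient(n)
u, V = denoise_dual(f, D, lam=20.0)
print(primal_energy(f, f, D, 20.0), primal_energy(u, f, D, 20.0))
```

The projection onto $\|V\|_\infty \le 1$ is a componentwise clip, which is what makes this dual formulation convenient; the same loop applies to any other linear operator in place of $D$.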
As in the continuous case, rectangles and + junctions are preserved by the solution of (16) with (17). However, due to image boundaries one has to be careful with the discretization.

Corollary 2. Let $x_0, y_0 \ge 0$ and $x_0 + a, y_0 + b \le n - 2$. The solution $\hat u$ of the minimization problem (16) with $J$ defined by (17) reads
i) for $f := c\, 1_\Omega$ with $\Omega := \{x_0 + 1, \dots, x_0 + a\} \times \{y_0 + 1, \dots, y_0 + b\}$ as
$$\hat u = \Big(c - \lambda \frac{2(a + b)}{ab}\Big) 1_\Omega, \qquad \lambda \le \frac{c\, ab}{2(a + b)},$$
where the $H_i$ are modified by $H_i(0, 0) = 0$, $H_i(n - 1, n - 1) = (-1)^i$, $i = 0, 1$;
ii) for $f := c_1 1_{\Omega_1} + c_2 1_{\Omega_2}$ with $\Omega_1 := \{x_0 + 1, \dots, x_0 + a\} \times \{0, \dots, n - 1\}$, $\Omega_2 := \{0, \dots, n - 1\} \times \{y_0 + 1, \dots, y_0 + b\}$ as
$$\hat u = \Big(c_1 - \frac{2\lambda}{a}\Big) 1_{\Omega_1} + \Big(c_2 - \frac{2\lambda}{b}\Big) 1_{\Omega_2}, \qquad \lambda \le \min\Big\{\frac{a c_1}{2}, \frac{b c_2}{2}\Big\},$$
where the $H_i$ are modified by $H_0(n - 1, 0) = 1$, $H_1(0, 0) = 0$, $H_i(n - 1, n - 1) = (-1)^i$, $i = 0, 1$.

Similarly, it can be shown that the inf convolution approach preserves + junctions [8].
Fig. 5. Denoising with double orientations from the occlusion model. Left/Middle: Energies $|\langle r_i, \nabla \hat u \rangle|$, $i = 1, 2$. Except at isolated vertex points, the model aligns the edges of the minimizer $\hat u$ with the direction $r_1$ ($\sigma = 2$, $\rho = 9.5$). Right: Denoised image ($\lambda = 2500$). Although not perfect, this result is the best we have obtained so far with various denoising methods.
Fig. 6. Denoising with the single direction $r_1$ from the occlusion model. Left: Angle corresponding to the chosen direction ($\sigma = 2$, $\rho = 9.5$, $\tilde\sigma = 5\sigma$). Middle: Denoising with the regularization term $|\langle r_1, \nabla u \rangle|$ introduces texture in flat regions ($\lambda = 2500$). Right: Denoising with the regularization term $|\nabla u| - \langle r_1^\perp, \nabla u \rangle$ avoids these artefacts ($\lambda = 4500$).
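Having (15) in mind, we introduce our double orientations $r_1, r_2$ from Subsection 2.2 into (17) resp. (18) and consider, for $\tilde r_i^T := (\operatorname{diag}(r_{i1}), \operatorname{diag}(r_{i2}))$, $i = 1, 2$, and $\tilde R := (\tilde r_1, \tilde r_2)$, the minimizers of our new functionals
$$\frac{1}{2} \|f - u\|_2^2 + \lambda \|\tilde R^T D u\|_1 = \frac{1}{2} \|f - u\|_2^2 + \lambda \big(\|\tilde r_1^T D u\|_1 + \|\tilde r_2^T D u\|_1\big), \tag{19}$$
$$\frac{1}{2} \|f - u\|_2^2 + \lambda \min_{u = u_1 + u_2} \big\{\|\tilde r_1^T D u_1\|_1 + \|\tilde r_2^T D u_2\|_1\big\}. \tag{20}$$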
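In matrix form, $\tilde R^T D u$ collects the pixelwise directional derivatives $\langle r_i, \nabla u \rangle$. The following sketch assembles $\tilde R^T$ with SciPy from per-pixel unit directions and evaluates the regularizer of (19) for a given discrete gradient $Du = (D_x u; D_y u)$; array layouts, function names and the tiny test data are assumptions for illustration. To minimize (19), one may replace $D$ by $\tilde R^T D$ in a solver for (16) with (17), e.g., one based on the dual characterization of Proposition 2.

```python
import numpy as np
import scipy.sparse as sp


def direction_operator(r1, r2):
    """Sparse R~^T with block rows r~1^T = (diag(r11), diag(r12)), r~2^T = (diag(r21), diag(r22)).

    r1, r2: arrays of shape (N, 2) holding one unit direction per pixel, in the
    same (columnwise) pixel order as the vectorized image.
    """
    return sp.bmat([[sp.diags(r1[:, 0]), sp.diags(r1[:, 1])],
                    [sp.diags(r2[:, 0]), sp.diags(r2[:, 1])]]).tocsr()


def directional_tv(Du, r1, r2):
    """||r~1^T D u||_1 + ||r~2^T D u||_1 for a precomputed Du = (D_x u; D_y u)."""
    return np.abs(direction_operator(r1, r2) @ Du).sum()


# A gradient (1, 1) at each of N = 3 pixels.
N = 3
Du = np.concatenate([np.ones(N), np.ones(N)])
aligned = (np.tile([1.0, 0.0], (N, 1)), np.tile([0.0, 1.0], (N, 1)))
diagonal = (np.tile([1.0, 1.0], (N, 1)) / np.sqrt(2.0),
            np.tile([1.0, -1.0], (N, 1)) / np.sqrt(2.0))
print(directional_tv(Du, *aligned), directional_tv(Du, *diagonal))
```

With $r_1 = (1, 0)^T$ and $r_2 = (0, 1)^T$ the regularizer reduces to $\|Du\|_1$ from (17); with directions rotated by 45° the same diagonal gradient is charged only along $r_1$.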
We want to examine the behavior of (19) by the simple denoising example in Fig. 3. First, we computed the minimizers using the directions $r$ and $r^\perp$ from the classical structure tensor. The resulting artefacts are commented on in the caption of Fig. 4. Then, Fig. 5 shows the good denoising result with the proposed occlusion model for double orientations. Finally, Fig. 6 presents the denoising results obtained by using only the direction $r_1$ from this model. This leads to artefacts in flat regions, where the process introduces texture due to directional smoothing of heavy noise. This effect can be avoided by replacing $|\langle r_1, \nabla u \rangle|$ by $|\nabla u| - \langle r_1^\perp, \nabla u \rangle$. Note that here we have to adapt the sign of $r_1^\perp$ such that $\langle r_1^\perp, \nabla f_{\tilde\sigma} \rangle \ge 0$. This functional was also proposed in [19], but with a more expensive procedure to find appropriate directions $r_1^\perp$.
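The sign convention for $r_1^\perp$ can be made explicit in a small pointwise sketch (NumPy; array layout, function name and test values are assumptions). By the Cauchy-Schwarz inequality, $|\nabla u| - \langle r_1^\perp, \nabla u \rangle$ is nonnegative and vanishes only where $\nabla u$ is a nonnegative multiple of $r_1^\perp$, whereas $|\langle r_1, \nabla u \rangle|$ vanishes for every gradient parallel to $\pm r_1^\perp$.

```python
import numpy as np


def oriented_penalty(gu, gf, r1):
    """Pointwise |grad u| - <r1perp, grad u>, with r1perp oriented so that <r1perp, grad f> >= 0.

    gu: gradients of u, gf: gradients of the smoothed image f_sigma~,
    r1: unit directions; all arrays have shape (..., 2).
    """
    r1perp = np.stack([r1[..., 1], -r1[..., 0]], axis=-1)  # (r2, -r1), as for s := r^perp
    sign = np.where(np.sum(r1perp * gf, axis=-1) >= 0.0, 1.0, -1.0)
    r1perp = r1perp * sign[..., None]                     # enforce <r1perp, grad f> >= 0
    return np.linalg.norm(gu, axis=-1) - np.sum(r1perp * gu, axis=-1)


gu = np.array([[-3.0, 0.0], [3.0, 0.0], [0.0, 4.0]])  # three gradients of u
gf = np.tile([-2.0, 0.0], (3, 1))                      # grad f points in the -x direction
r1 = np.tile([0.0, 1.0], (3, 1))                       # isophote direction along y
print(oriented_penalty(gu, gf, r1))
```

Only the first gradient, which points in the same direction as $\nabla f_{\tilde\sigma}$, is not penalized; the opposite gradient and the gradient along the isophote are.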
4 Numerical Examples
In the following, we present further numerical examples. All programs were written in MATLAB, where we solved the minimization problems via their dual problem using second-order cone programming as implemented in the software package MOSEK [29]. To discretize the derivatives occurring in the orientation estimation tensors, we applied the filters suggested by Scharr in [30]. The gray values of the original images are in [0, 255], and for visualization we used the MATLAB routine 'imagesc', which incorporates an affine gray value scaling. Moreover, the parameters were chosen with respect to the best visual result.

To start with, we took a noisy image with different shapes and restored it by nonlocal means, by ROF and by (19) with occluding directions. The results are presented in Fig. 7. As already observed in [25], the result by ROF suffers from rounding artefacts at corners, since the regularization parameter $\lambda$ has to be chosen rather large to remove all noise. This is avoided by (19) using occluding directions, as visible at bottom right. Nonlocal means gives slightly worse results at corners. To demonstrate the performance on a real-world image, we included Fig. 8. Here, the example shows that the shape of the building is much better preserved by (19) than by ROF, since the local directions in the image are treated much more accurately. In contrast to nonlocal means, both our method and ROF suffer from staircasing effects. However, for a large smoothing parameter related to the noise level, nonlocal means creates small blur artefacts where our result has sharp structures. Besides, our method is computationally much faster. Finally, to point out the benefits of inf convolution, Fig. 9 shows restored images of an oriented texture by (19) and (20), respectively, using the transparent model. For such images inf convolution is better suited than (19), since (19), like ROF, aims for a piecewise constant solution, which means that too many details are removed.

Fig. 7. Top: noisy image (standard deviation 100) and image restored by applying the nonlocal means filter [28] twice. Bottom left: denoised image by ROF ($\lambda = 500$). Bottom right: restored image by (19) and occluding directions ($\lambda = 900$, $\sigma = 2$, $\rho = 6$).

Fig. 8. Top: noisy image (standard deviation 30) and result by the nonlocal means filter [28]. Bottom left: denoised image by ROF ($\lambda = 50$). Bottom right: result by (19) and occluding directions ($\lambda = 50$, $\sigma = 0.5$, $\rho = 8$).

Fig. 9. Left to right: original image [30], noisy image (standard deviation 10), denoised image by (19) ($\lambda = 15$, $\sigma = 2$, $\rho = 12$), denoised image by (20) ($\lambda = 40$, $\sigma = 2$, $\rho = 12$). The directions are estimated by the transparent model.
5 Conclusions
We have demonstrated how directional information estimated by the transparent or the occlusion model [6] can be integrated into certain minimization problems to improve the restoration results, especially at sharp corners and X junctions. For simplicity we have restricted our attention to double orientations, but a generalization to more than two directions is possible with the results presented in [31]. One option to further improve the restoration results would be to also use higher-order derivatives as done in [32]. In this way it is, for example, possible to overcome the staircasing effects observed for (19).
References
1. Chambolle, A.: Total variation minimization and a class of binary MRF models. In: Rangarajan, A., Vemuri, B.C., Yuille, A.L. (eds.) EMMCVPR 2005. LNCS, vol. 3757, pp. 136–152. Springer, Heidelberg (2005)
2. Hintermüller, M., Kunisch, K.: Total bounded variation regularization as a bilaterally constrained optimization problem. SIAM J. Appl. Math. 64(4), 1311–1333 (2004)
3. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
4. Esedoglu, S., Osher, S.: Decomposition of images by the anisotropic Rudin-Osher-Fatemi model. Comm. Pure and Applied Mathematics 57(12), 1609–1626 (2004)
5. Chambolle, A., Lions, P.L.: Image recovery via total variation minimization and related problems. Numerische Mathematik 76, 167–188 (1997)
6. Aach, T., Mota, C., Stuke, I., Mühlich, M., Barth, E.: Analysis of superimposed oriented patterns. IEEE Trans. on Image Processing 15(12), 3690–3700 (2006)
7. Förstner, W., Gülch, E.: A fast operator for detection and precise location of distinct points, corners and centres of circular features. In: Proc. ISPRS Intercommission Conf. on Fast Processing of Photogrammetric Data, pp. 281–305 (1987)
8. Teuber, T.: Anisotropic smoothing using double orientations. Preprint, University of Mannheim (2009)
9. Tschumperlé, D.: Fast anisotropic smoothing of multivalued images using curvature preserving PDEs. International Journal of Computer Vision 68(1), 65–82 (2006)
10. Weickert, J.: Anisotropic Diffusion in Image Processing. Teubner, Stuttgart (1998)
11. Tschumperlé, D., Deriche, R.: Vector-valued image regularization with PDEs: A common framework for different applications. IEEE Trans. on Pattern Analysis and Machine Intelligence 27(4) (2005)
12. Aubert, G., Kornprobst, P.: Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations. Applied Mathematical Sciences, vol. 147. Springer, New York (2002)
13. Steidl, G., Teuber, T.: Diffusion tensors for denoising sheared and rotated rectangles (submitted) (2008)
14. Tschumperlé, D.: The CImg library. C++ Template Image Processing Library, http://cimg.sourceforge.net
15. Cabral, B., Leedom, L.C.: Imaging vector fields using line integral convolution. In: SIGGRAPH 1993, Computer Graphics, vol. 27, pp. 263–272 (1993)
16. Weickert, J.: Anisotropic diffusion filters for image processing based quality control. In: Fasano, A., Primicerio, M. (eds.) Proc. Seventh European Conference on Mathematics in Industry, pp. 355–362. Teubner, Stuttgart (1994)
17. Goldfarb, D., Wen, Z., Yin, W.: A curvilinear search method for p-harmonic flows on spheres. SIAM Journal on Imaging Sciences 2(1), 84–109 (2009)
18. Kimmel, R., Sochen, N.: Orientation diffusion or how to comb a porcupine? Journal of Visual Communication and Image Representation 13(1-2), 238–248 (2002)
19. Lysaker, O., Osher, S., Tai, X.C.: Noise removal using smoothed normals and surface fitting. IEEE Trans. on Image Processing 13(10), 1345–1357 (2004)
20. Vese, L., Osher, S.: Numerical methods for p-harmonic flows and applications to image processing. SIAM Journal on Numerical Analysis 40(6), 2085–2104 (2002)
21. Yuan, J., Schnörr, C., Steidl, G.: Convex Hodge decomposition and regularization of image flows. Journal of Mathematical Imaging and Vision 33(2), 169–177 (2009)
22. Rahman, T., Tai, X.C., Osher, S.: A TV-Stokes denoising algorithm. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 473–483. Springer, Heidelberg (2007)
23. Spira, A., Kimmel, R., Sochen, N.: A short-time Beltrami kernel for smoothing images and manifolds. IEEE Trans. on Image Processing 16(6), 1628–1636 (2007)
24. Tomasi, C., Manduchi, R.: Bilateral filtering for gray and color images. In: Proc. Sixth Intern. Conf. on Computer Vision, pp. 839–846. Narosa Publishing House (1998)
25. Berkels, B., Burger, M., Droske, M., Nemitz, O., Rumpf, M.: Cartoon extraction based on anisotropic image classification. In: Vision, Modeling, and Visualization Proceedings, pp. 293–300 (2006)
26. Setzer, S., Steidl, G., Teuber, T.: Restoration of images with rotated shapes. Numerical Algorithms 48(1-3), 49–66 (2008)
27. Buades, A., Coll, B., Morel, J.M.: A non-local algorithm for image denoising. In: IEEE Int. Conf. on Comp. Vision and Pattern Recognition, vol. 2, pp. 60–65 (2005)
28. Manjón, J.V., Buades, A.: NL means. MATLAB Software, http://dmi.uib.es/~abuades/software.html
29. The MOSEK Optimization Toolbox, http://www.mosek.com
30. Scharr, H.: Diffusion-like reconstruction schemes from linear data models. In: Franke, K., Müller, K.-R., Nickolay, B., Schäfer, R. (eds.) DAGM 2006. LNCS, vol. 4174, pp. 51–60. Springer, Heidelberg (2006)
31. Mühlich, M., Aach, T.: A theory for multiple orientation estimation. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3952, pp. 69–82. Springer, Heidelberg (2006)
32. Setzer, S., Steidl, G.: Variational methods with higher-order derivatives in image processing. In: Neamtu, M., Schumaker, L.L. (eds.) Approximation Theory XII: San Antonio 2007, pp. 360–385. Nashboro Press (2008)
Image Denoising Using TV-Stokes Equation with an Orientation-Matching Minimization
Xue-Cheng Tai¹,², Sofia Borok¹, and Jooyoung Hahn¹
¹ Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore
² Department of Mathematics, University of Bergen, Norway
Abstract. In this paper, we propose an orientation-matching minimization for denoising digital images corrupted by additive noise. Inspired by the two-step algorithm of the TV-Stokes denoising process [1, 2, 3], the regularized tangential vector field with the zero divergence condition is used in the first step. The present work suggests a different approach to reconstruct a denoised image in the second step. Namely, instead of finding an image that fits the regularized normal direction from the first step, we minimize the orientation mismatch between the image gradient and the regularized normal direction. This yields a nonlinear partial differential equation (PDE) for reconstructing denoised images, with a diffusivity depending on the orientation of the regularized normal vector field and a weighted, self-adaptive force term depending on the angle between the image gradient and the vector field. This makes it possible to obtain a denoised image with sharp edges and smooth regions, even when the original image has smoothly changing pixel values near sharp edges. The additive operator splitting scheme is used to discretize the Euler-Lagrange equations. Various numerical experiments show the improved quality of the results.
1 Introduction
Digital image denoising processes based on partial differential equations (PDEs) and energy minimization have been studied extensively over the last 20 years, both theoretically and practically. From Gaussian filtering to anisotropic diffusion [4, 5, 6] and total variation (TV) minimization [7, 8], these methods denoise an image using derivative information that can only be poorly estimated from the noisy data. TV filtering is very effective for piecewise constant images, and anisotropic diffusion can be adapted to flow-like images. However, neither approach is suitable for an image which has smoothly changing pixel values near sharp edges. Since the quality of denoised images depends heavily on the estimated derivative information, regularizing the derivatives of an image [9], that is, its orientational information [10, 11, 12, 1], has become a crucial topic. Inspired by [1, 2, 3], we also use a regularization of the tangent vector field of an image with the zero divergence condition. The present work proposes a different approach to reconstruct a denoised image from the regularized normal vector field, which we call an orientation-matching minimization. That is, we minimize the orientation mismatch between the image gradient and the regularized normal direction. This yields a nonlinear PDE for reconstructing denoised images, with a diffusivity depending on the orientation of the regularized normal vector field and a weighted, self-adaptive force term depending on the angle between the image gradient and the vector field. This makes it possible to obtain a denoised image with sharp edges and smooth regions, even when the original image has smoothly changing pixel values near sharp edges.

The paper is organized as follows. In Section 2, we introduce the proposed model together with a review of the TV-Stokes (TVS) denoising algorithm [1, 2]. Some numerical aspects are explained in Section 3. Several numerical examples are shown and different models are compared in Section 4. The paper is concluded in Section 5.

The research is supported by MOE (Ministry of Education) Tier II project T207N2202 and IDM project NRF2007IDMIDM002-010. In addition, the support from SUG 20/07 is also gratefully acknowledged.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 490–501, 2009. © Springer-Verlag Berlin Heidelberg 2009
2 Two-Step Denoising Model
2.1 Review of TV-Stokes Denoising Algorithm
Let us consider a true gray-scale image $d: \Omega \subset \mathbb{R}^2 \to [0, 1]$. We assume that a given noisy image $d_0$ is corrupted by additive Gaussian white noise $\eta$ according to
$$d_0(p) = d(p) + \eta(p), \qquad p = (x, y) \in \Omega.$$
The normal and tangential vectors of the level curves of an image $d$ are given by
$$n = \nabla d(p) = \Big(\frac{\partial d}{\partial x}, \frac{\partial d}{\partial y}\Big)^T \quad \text{and} \quad t = \nabla^\perp d(p) = \Big(-\frac{\partial d}{\partial y}, \frac{\partial d}{\partial x}\Big)^T, \tag{1}$$
where the superscript $T$ denotes transposition. These vector fields satisfy $\nabla \times n = 0$ and $\nabla \cdot t = 0$, i.e., $n$ is an irrotational and $t$ an incompressible vector field. This property is crucial when an image is reconstructed from the information of $n$ or $t$. The TVS denoising model [1, 2] consists of two steps to obtain a denoised image and uses the same process in the second step as the Lysaker-Osher-Tai (LOT) model [10]. However, in the first step, instead of regularizing the normal vector field as in the LOT model, a tangential vector field is regularized under the constraint of incompressibility. The regularized tangential vector field $t$ is obtained by minimizing the functional
$$\min_{\nabla \cdot t = 0} \int_\Omega |\nabla t| + \frac{\delta}{2} |t - t_0|^2\, dp, \tag{2}$$
where $t_0 = \nabla^\perp d_0$, $\delta$ is a positive parameter, and $|\nabla t|$ is defined by
$$|\nabla t| = \sqrt{\Big(\frac{\partial u}{\partial x}\Big)^2 + \Big(\frac{\partial u}{\partial y}\Big)^2 + \Big(\frac{\partial v}{\partial x}\Big)^2 + \Big(\frac{\partial v}{\partial y}\Big)^2}, \qquad \nabla t = \begin{pmatrix} \nabla u \\ \nabla v \end{pmatrix}, \qquad t = \begin{pmatrix} u \\ v \end{pmatrix}.$$
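The two identities $\nabla \times n = 0$ and $\nabla \cdot t = 0$ also hold in a simple discrete setting. The sketch below (NumPy, `d` indexed as `d[x, y]`, central differences via `np.gradient`; the function names and the random test image are hypothetical) forms $n$ and $t$ as in (1). Because the difference operators along different axes commute, the discrete divergence of $t$ and the discrete curl of $n$ vanish up to rounding.

```python
import numpy as np


def normal_and_tangent(d):
    """n = grad d and t = grad^perp d = (-d_y, d_x) as in (1); d is indexed d[x, y]."""
    dx = np.gradient(d, axis=0)
    dy = np.gradient(d, axis=1)
    return np.stack([dx, dy]), np.stack([-dy, dx])


def divergence(w):
    """Discrete div of w = (w1, w2)."""
    return np.gradient(w[0], axis=0) + np.gradient(w[1], axis=1)


def curl(w):
    """Scalar 2D curl dw2/dx - dw1/dy."""
    return np.gradient(w[1], axis=0) - np.gradient(w[0], axis=1)


rng = np.random.default_rng(0)
d = rng.random((40, 50))
n, t = normal_and_tangent(d)
print(np.abs(divergence(t)).max(), np.abs(curl(n)).max())
```

Writing $t = (u, v)^T$, the normal is recovered as $n = (v, -u)^T$, which is the relation used below to pass from the regularized tangent field to the regularized normal field.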
The minimization problem was originally introduced in [2, 1]. The optimality condition for the saddle point is obtained by the gradient descent flow, which gives the PDE
$$\begin{cases} \dfrac{\partial t}{\partial \tau} - \nabla \cdot \Big(\dfrac{\nabla t}{|\nabla t|}\Big) + \delta (t - t_0) - \nabla \lambda = 0, \\[2mm] \nabla \cdot t = 0, \end{cases} \tag{3}$$
with the boundary and initial conditions
$$\Big(\frac{\nabla t}{|\nabla t|} + \lambda I\Big) \cdot \nu = 0, \qquad t(p, 0) = t_0,$$
where $I$ is the identity matrix. Note that it is not straightforward to use the Perona-Malik (PM) model [4] or the Rudin-Osher-Fatemi (ROF) model [7] directly for regularizing derivative information of an image [9]. One reason for regularizing the tangential vector field is that the incompressibility condition $\nabla \cdot t = 0$ can be handled numerically with Chorin-type projection methods, which are well developed in fluid dynamics; see details in Section 3. Moreover, the condition guarantees the existence of an image $d$ which satisfies the relation (1). Once the regularized tangent vector field $t = (u, v)^T$ is obtained in the first step, the regularized normal vector field $n$ is defined by $(v, -u)^T$. In two-step algorithms for image denoising [10, 2, 1] and image inpainting [2], it is suggested to solve the following minimization problem in the second step to reconstruct an image from $n$:
$$\min_{\|d - d_0\|_2 = \sigma} \int_\Omega |\nabla d| - \nabla d \cdot \frac{n}{|n|}\, dp, \tag{4}$$
where $\|\cdot\|_2$ is the $L^2(\Omega)$ norm and $\sigma$ is the standard deviation of the Gaussian white noise. From the Euler-Lagrange equation and the gradient descent method along the fictitious time $\tau$, we obtain a PDE for reconstructing an image with the free flux boundary condition and the initial condition $d(p, 0) = d_0(p)$:
$$\frac{\partial d}{\partial \tau}(p, \tau) = \nabla \cdot \Big(\frac{\nabla d}{|\nabla d|} - \frac{n}{|n|}\Big) - \mu (d - d_0), \tag{5}$$
where $\mu$ is a positive parameter. Note that the ROF model is the case $n = 0$; its TV-norm filter is very suitable for denoising piecewise constant images, but it suffers from the staircase effect in regions whose pixel values change smoothly.
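Chorin-type methods enforce $\nabla \cdot t = 0$ by a projection: solve a Poisson equation $\Delta q = \nabla \cdot t$ for a scalar potential $q$ and replace $t$ by $t - \nabla q$. The sketch below shows only such a projection step, under assumptions that differ from the paper: periodic boundary conditions, unit grid spacing, and an FFT-based Poisson solve with spectral derivatives (Nyquist modes are given wavenumber 0 so that the result stays real). It is not the authors' discretization, which is described in their Section 3; function names and test data are hypothetical.

```python
import numpy as np


def _wavenumbers(shape):
    """Angular wavenumbers on a periodic grid with unit spacing (Nyquist modes set to 0)."""
    kx = 2.0 * np.pi * np.fft.fftfreq(shape[0])
    ky = 2.0 * np.pi * np.fft.fftfreq(shape[1])
    if shape[0] % 2 == 0:
        kx[shape[0] // 2] = 0.0
    if shape[1] % 2 == 0:
        ky[shape[1] // 2] = 0.0
    return np.meshgrid(kx, ky, indexing="ij")


def spectral_divergence(u, v):
    """div t for t = (u, v), using the same spectral derivatives as the projection."""
    KX, KY = _wavenumbers(u.shape)
    return np.real(np.fft.ifft2(1j * KX * np.fft.fft2(u) + 1j * KY * np.fft.fft2(v)))


def project_div_free(u, v):
    """Projection t <- t - grad q with Laplace(q) = div t (periodic, spectral)."""
    KX, KY = _wavenumbers(u.shape)
    U, V = np.fft.fft2(u), np.fft.fft2(v)
    div_hat = 1j * KX * U + 1j * KY * V
    k2 = KX ** 2 + KY ** 2
    q_hat = np.zeros_like(div_hat)
    nonzero = k2 > 0
    q_hat[nonzero] = -div_hat[nonzero] / k2[nonzero]  # -|k|^2 q_hat = div_hat
    U = U - 1j * KX * q_hat
    V = V - 1j * KY * q_hat
    return np.real(np.fft.ifft2(U)), np.real(np.fft.ifft2(V))


rng = np.random.default_rng(0)
u0, v0 = rng.random((64, 48)), rng.random((64, 48))
u1, v1 = project_div_free(u0, v0)
print(np.abs(spectral_divergence(u0, v0)).max(), np.abs(spectral_divergence(u1, v1)).max())
```

The projection is idempotent: applying it to a field that is already divergence-free leaves the field unchanged.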
Since the TVS denoising model and the LOT model find an image that fits the regularized normal vector field via the PDE (5), they naturally perform better than the ROF model. However, they still have problems when the original image has smoothly changing pixel values near sharp edges and when the regularized normal vector field is almost parallel in some regions or contains numerical errors; see Figures 2 and 4.
The TV-Stokes Equation with an Orientation-Matching Minimization
2.2 Orientation-Matching Minimization
Inspired by the two-step algorithm in the TVS denoising model, we also use the regularized tangential vector field with the zero divergence condition in the first step. In this paper, we propose a new approach for reconstructing a denoised image in the second step. Namely, instead of finding an image that fits the regularized normal direction as in (4), we minimize the orientation difference between the image gradient and the regularized normal direction:

$$\min_{\|d - d_0\|_2 = \sigma} \int_\Omega -\frac{|\nabla d\cdot n|}{|\nabla d|\,|n|}\, dp, \tag{6}$$

where ‖·‖₂ and σ are the same as in (4). From the Euler-Lagrange equation and the gradient descent method along the fictitious time τ, we obtain a new PDE for computing a denoised image with the free flux boundary condition and the initial condition d(p, 0) = d₀(p):

$$\frac{\partial d}{\partial \tau}(p, \tau) = \nabla\cdot\left(\frac{|\nabla d\cdot n|}{|\nabla d|^2\,|n|}\,\frac{\nabla d}{|\nabla d|} - \frac{\operatorname{sgn}(\nabla d\cdot n)}{|\nabla d|}\,\frac{n}{|n|}\right) - \mu(d - d_0), \tag{7}$$

where sgn(·) is the sign function and μ is a positive parameter. Unlike the diffusivity 1/|∇d| and the fixed force term ∇·(n/|n|) in (5), the PDE from the proposed minimization has a diffusivity that depends on the orientation of the regularized normal vector field n and a weighted self-adaptive force term that depends on the angle between ∇d and n. We expect two differences between the proposed model (6) and the previous one (4) for reconstructing a denoised image. First, the orientation difference between the gradient of the original image and the gradient of the denoised image is smaller. Second, our model gives sharper edges in the denoised image, especially when the original image has smoothly changing pixel values near sharp edges. Both effects are readily observed in numerical experiments, and there are plausible reasons for them. To see the first difference, let θ be the angle between ∇d/|∇d| and n/|n|. Then the functional in the proposed model can be written as

$$\int_\Omega (-|\cos\theta|)\, dp, \tag{8}$$
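and the functional in the previous model is given by

$$\int_\Omega \left(|\nabla d| - \nabla d\cdot\frac{n}{|n|}\right) dp = \int_\Omega |\nabla d|\left(1 - \frac{\nabla d\cdot n}{|\nabla d|\,|n|}\right) dp = \int_\Omega |\nabla d|(1 - \cos\theta)\, dp. \tag{9}$$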
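To make the flux in (7) concrete, the following NumPy sketch performs one explicit gradient-descent step of (7) on a pixel grid. It is an illustration under stated assumptions, not the scheme used in the paper: explicit Euler in τ replaces the AOS method, `np.gradient` central differences replace the staggered grid, the smooth stand-in s/√(s² + ε) replaces sgn, and the function name `orientation_matching_step` and its arguments are hypothetical.

```python
import numpy as np


def orientation_matching_step(d, d0, nx, ny, mu, dtau, eps=1e-3):
    """One explicit Euler step of PDE (7) (hypothetical helper, sketch only).

    d, d0  : current and noisy images (2-D arrays)
    nx, ny : components of the regularized normal field n
    """
    dy, dx = np.gradient(d)                    # derivatives along rows (y) and columns (x)
    gnorm = np.sqrt(eps + dx**2 + dy**2)       # regularized |grad d|
    nnorm = np.sqrt(eps + nx**2 + ny**2)       # regularized |n|
    dot = dx * nx + dy * ny                    # grad d . n
    sgn = dot / np.sqrt(dot**2 + eps)          # smooth stand-in for sgn(grad d . n)
    # Flux of (7): |grad d.n| grad d / (|grad d|^3 |n|) - sgn(grad d.n) n / (|grad d| |n|)
    fx = np.abs(dot) * dx / (gnorm**3 * nnorm) - sgn * nx / (gnorm * nnorm)
    fy = np.abs(dot) * dy / (gnorm**3 * nnorm) - sgn * ny / (gnorm * nnorm)
    div = np.gradient(fx, axis=1) + np.gradient(fy, axis=0)
    return d + dtau * (div - mu * (d - d0))    # add the fidelity term -mu (d - d0)
```

If ∇d is parallel to n everywhere (for example, a linear ramp with a constant n), the flux is spatially constant, its divergence vanishes, and only the fidelity term changes d.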
The previous energy functional minimizes both |∇d| and the angle θ. If an image d has regions where |∇d| is large enough, minimizing the angle difference between ∇d/|∇d| and n/|n| has quite a weak effect. In the case of very small |∇d|, any angle fits n/|n|. Even a small angle difference can easily make the graph of the denoised image take a shape different from that of the original image. Since the proposed energy functional minimizes only the orientation difference, the shape of the denoised result adapts more sensitively to fit the original image, regardless of the magnitude of |∇d|. Table 1 shows the orientation difference numerically for different methods. The second difference is expected when ∇d is approximately parallel to n, because the proposed PDE can then be approximated by

$$\nabla\cdot\left(\frac{|\nabla d\cdot n|}{|\nabla d|^2\,|n|}\,\frac{\nabla d}{|\nabla d|} - \frac{\operatorname{sgn}(\nabla d\cdot n)}{|\nabla d|}\,\frac{n}{|n|}\right) \approx \nabla\cdot\left(\frac{1}{|\nabla d|}\left(\frac{\nabla d}{|\nabla d|} - (\pm)\frac{n}{|n|}\right)\right).$$

From this approximation, if |∇d| is large, the proposed model (7) is dominated by the data fidelity term and only slightly affected by the regularization term. In contrast, the previous model (5) is still affected by an additional force term from the regularized normal vector field. Since the vector field computed from (2) may contain numerical errors, it is difficult to know whether this additional force will produce a good result. Even though the extra force reduces the staircase effect in smooth regions compared with the TV-filtering method, it may cause erroneous effects near edges where |∇d| is large. We show numerically the quality of denoised images when the original image has smoothly changing pixel values near sharp edges; see Figures 2, 3, and 4.
3 Numerical Aspects
For the discretization, we use the standard staggered grid suggested in [2]. In this section, we briefly note some discretization issues in the first and second steps.

3.1 A Regularization of the Tangent Vector Field
The minimization problem (2) for regularizing the tangent vector field under the incompressibility constraint is solved by the method of Lagrange multipliers and the Chorin projection type method. We apply the Chorin projection type method and the AOS method [13, 14] to solve the PDE (3):

1. Compute an intermediate tangent field t*, which is not necessarily incompressible:
$$\frac{t^* - t^n}{\Delta\tau} = \nabla\cdot\left(\frac{\nabla t^*}{|\nabla t^n|_\epsilon}\right) - \delta(t^* - t_0),$$
with the boundary condition ∇t*·ν = 0, where |∇tⁿ|_ε ≡ √(ε + |∇tⁿ|²) and tⁿ is the tangent vector field at the nth time step. The AOS method is applied to the linearized equation for the components u and v, and the spatial derivatives with respect to x and y are approximated by standard one-sided finite differences.
2. Solve for λ such that
$$\frac{t^{n+1} - t^*}{\Delta\tau} = \nabla\lambda, \qquad \nabla\cdot t^{n+1} = 0.$$
This gives a Poisson equation for λ with the zero Neumann boundary condition:
$$\nabla\cdot\nabla\lambda = -\frac{1}{\Delta\tau}\,\nabla\cdot t^*.$$
3. Update the tangent vector field by tⁿ⁺¹ = t* + Δτ∇λ. The boundary values are updated using the incompressibility condition.

More details are given in [2, 1]. For the stopping criterion, we use the steady-state condition for the flow t = (u, v)^T:
$$\max\left(\frac{\|u^{n+1} - u^n\|_\infty}{\|u^n\|_\infty}, \frac{\|v^{n+1} - v^n\|_\infty}{\|v^n\|_\infty}\right) \le \alpha,$$
where n and n + 1 are consecutive time steps and ‖·‖_∞ is the L^∞(Ω) norm. The value α = 10⁻⁴ is fixed for all examples in the paper.
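Steps 2 and 3 form a Chorin-type projection onto divergence-free fields. The NumPy sketch below illustrates them under simplifying assumptions that differ from the paper: a periodic (not Neumann) boundary, a collocated grid with a forward-difference gradient and a backward-difference divergence instead of the staggered grid of [2], and an FFT Poisson solver. All function names are hypothetical.

```python
import numpy as np


def grad_fwd(f):
    """Forward differences with periodic wrap-around: returns (df/dx, df/dy)."""
    return np.roll(f, -1, axis=1) - f, np.roll(f, -1, axis=0) - f


def div_bwd(u, v):
    """Backward-difference divergence, the negative adjoint of grad_fwd."""
    return (u - np.roll(u, 1, axis=1)) + (v - np.roll(v, 1, axis=0))


def solve_poisson_periodic(rhs):
    """Solve div_bwd(grad_fwd(lam)) = rhs with FFTs (zero-mean solution)."""
    ny, nx = rhs.shape
    ty = 2 * np.pi * np.fft.fftfreq(ny)
    tx = 2 * np.pi * np.fft.fftfreq(nx)
    # Fourier symbol of the 5-point Laplacian: -4 (sin^2(ty/2) + sin^2(tx/2))
    sym = -4.0 * (np.sin(ty / 2)[:, None] ** 2 + np.sin(tx / 2)[None, :] ** 2)
    rhs_hat = np.fft.fft2(rhs)
    lam_hat = np.zeros_like(rhs_hat)
    mask = sym != 0                      # skip the constant mode
    lam_hat[mask] = rhs_hat[mask] / sym[mask]
    return np.real(np.fft.ifft2(lam_hat))


def chorin_projection(u_star, v_star, dtau):
    """Steps 2-3: project t* = (u*, v*) onto divergence-free fields."""
    lam = solve_poisson_periodic(-div_bwd(u_star, v_star) / dtau)
    lx, ly = grad_fwd(lam)
    return u_star + dtau * lx, v_star + dtau * ly   # t^{n+1} = t* + dtau * grad(lambda)
```

Pairing a forward gradient with a backward divergence makes their composition exactly the 5-point Laplacian, so the projected field is discretely divergence-free up to rounding, and a field that is already divergence-free is left unchanged.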
3.2 A Reconstruction of a Denoised Image
After the regularized tangent vector field t = (u, v)^T is computed in the first step, we use the orientation-matching minimization (6) to reconstruct a denoised image from the regularized normal vector field n = (v, −u)^T. The optimality condition for the saddle point is obtained by the gradient descent flow, which gives the PDE (7). We also apply the AOS method to solve the proposed PDE. Note that we use a regularized sign function [15]:
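$$\operatorname{sgn}_\epsilon(s) \equiv 2H_\epsilon(s) - 1, \qquad H_\epsilon(s) \equiv \begin{cases} 1, & s > \epsilon, \\ 0, & s < -\epsilon, \\ \frac{1}{2}\left(1 + \frac{s}{\epsilon} + \frac{1}{\pi}\sin\frac{\pi s}{\epsilon}\right), & \text{otherwise}, \end{cases}$$

and a parameter ε is used to avoid division by zero in the numerical experiments: |∇dⁿ|_ε ≡ √(ε + |∇dⁿ|²) and |n|_ε ≡ √(ε + |n|²), where the superscript n denotes the nth time step. More details are given in [1, 2].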
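The regularized sign function and the ε-regularized norms translate directly into NumPy. The following sketch uses hypothetical helper names.

```python
import numpy as np


def heaviside_eps(s, eps):
    """Regularized Heaviside function H_eps from [15]."""
    s = np.asarray(s, dtype=float)
    smooth = 0.5 * (1.0 + s / eps + np.sin(np.pi * s / eps) / np.pi)
    return np.where(s > eps, 1.0, np.where(s < -eps, 0.0, smooth))


def sgn_eps(s, eps):
    """Regularized sign function sgn_eps(s) = 2 H_eps(s) - 1."""
    return 2.0 * heaviside_eps(s, eps) - 1.0


def reg_norm(px, py, eps):
    """Regularized Euclidean norm sqrt(eps + px^2 + py^2)."""
    px = np.asarray(px, dtype=float)
    py = np.asarray(py, dtype=float)
    return np.sqrt(eps + px**2 + py**2)
```

H_ε is continuous at s = ±ε, so sgn_ε moves smoothly from −1 to 1 across the band |s| ≤ ε instead of jumping at s = 0.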
For the stopping criterion, we use the steady-state condition on the relative change of the energy (6), that is,
$$\frac{|E^{n+1} - E^n|}{E^n} \le \beta,$$
where Eⁿ is the energy at time step n, approximated by
$$E^n \approx \sum_{i,j} -\frac{|\nabla d^n\cdot n|}{|\nabla d^n|\,|n|}.$$
The value of β may differ between images; we use 10⁻⁴ ≤ β ≤ 10⁻². The energy (4) is computed similarly and is used for the stopping criterion of the second step in the previous model.

Remark 1. The right choice of parameters is crucial for the quality of the denoised image. The parameters δ and μ control the balance between data smoothing and the fidelity term. The parameter ε is used to avoid division by zero, and it also controls the diffusivity that smooths the data. The AOS scheme allows a wide range of time steps; however, if Δτ is too large, the visual quality of the denoised image deteriorates.
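The discrete energies and the relative-change test can be sketched as follows. The sketch makes three assumptions that are not from the paper: it uses `np.gradient` central differences, it sums over all pixels, and it puts an absolute value in the denominator. The last is needed because the orientation energy (6) is negative, and dividing by a negative Eⁿ would make the test trivially true. Function names are hypothetical.

```python
import numpy as np


def _grad(d):
    dy, dx = np.gradient(d)        # np.gradient returns (d/drow, d/dcol)
    return dx, dy


def energy_orientation(d, nx, ny, eps=1e-6):
    """Discrete version of (6): sum of -|grad d . n| / (|grad d| |n|)."""
    dx, dy = _grad(d)
    g = np.sqrt(eps + dx**2 + dy**2)
    nn = np.sqrt(eps + nx**2 + ny**2)
    return float(np.sum(-np.abs(dx * nx + dy * ny) / (g * nn)))


def energy_tvs(d, nx, ny, eps=1e-6):
    """Discrete version of (4): sum of |grad d| - grad d . n / |n|."""
    dx, dy = _grad(d)
    g = np.sqrt(eps + dx**2 + dy**2)
    nn = np.sqrt(eps + nx**2 + ny**2)
    return float(np.sum(g - (dx * nx + dy * ny) / nn))


def converged(e_new, e_old, beta=1e-3):
    """Relative-change stopping test |E^{n+1} - E^n| / |E^n| <= beta."""
    return abs(e_new - e_old) <= beta * abs(e_old)
```

For an image whose gradient is aligned with n at every pixel, the orientation energy approaches minus the number of pixels and the TVS energy approaches zero.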
4 Examples
In this section, we present numerical experiments for image denoising based on the proposed method. Using synthetic and real images, we discuss the strengths of the proposed orientation-matching minimization and compare it with other methods. For simplicity, the following notation is used to indicate the parameters of the different methods:
– V(Δτ, δ, ε): regularization of the tangent vector field (3).
– M1(Δτ, μ, ε): reconstruction of a denoised image from (7).
– M2(Δτ, μ, ε): reconstruction of a denoised image from (5).
– M3(λ): the TV-filtering method in [8].
– M4(μ, ρ, ε): reconstruction of a denoised image from (10).
We also include an interesting numerical experiment that combines anisotropic nonlinear diffusion [6, 5] with the regularized tangent vector field t = (u, v)^T from the first step (2). That is, the diffusivity tensor is constructed from n = (v, −u)^T, and we solve a PDE with the free flux boundary condition:
$$\frac{\partial d}{\partial \tau}(p, \tau) = \nabla\cdot\left(g\left(G_\rho * nn^T\right)\nabla d\right) - \mu(d - d_0), \tag{10}$$
where (G_ρ * M)_{ij} = G_ρ * m_{ij} for a matrix M = (m_{ij}), and G_ρ * f is the convolution of f with the two-dimensional Gaussian kernel of standard deviation ρ. The function g is defined on the set S of real symmetric positive semi-definite 2×2 matrices:
$$g(M) \equiv \frac{1}{\sqrt{\epsilon + \Lambda_1}}\, v_{\Lambda_1} v_{\Lambda_1}^T + \frac{1}{\sqrt{\epsilon + \Lambda_2}}\, v_{\Lambda_2} v_{\Lambda_2}^T,$$
where (Λ₁, v_Λ₁) and (Λ₂, v_Λ₂) are the eigenpairs of M ∈ S with Λ₁ ≥ Λ₂.
[Figure 1: panels (test 1)–(test 5)]
Fig. 1. Results from the proposed method: the first row shows the original images; the second row shows the images with added Gaussian white noise of zero mean and standard deviation 10; the last row shows the results of the proposed method.
Table 1. Comparison of the orientation difference γ in (11): (A) is the result of the proposed method, (B) is the result of the TVS denoising method, and (C) is the result of the TV-filter method. The denoised images from the proposed method are shown in the third row of Figure 1.

images   test 1   test 2   test 3   test 4   test 5
(A)      0.9706   0.8693   0.7668   0.5681   0.4936
(B)      0.9316   0.8478   0.6304   0.4983   0.4051
(C)      0.7466   0.6825   0.6218   0.3891   0.3228

[Figure 2: panels (a)–(f)]
Fig. 2. Comparison with other methods: (a), (b), and (c) are the graphs of the images in the test 5 column of Figure 1, from top to bottom. (d) is the result of the TVS denoising model, (e) is the result of the TV-filtering model, and (f) is the result from (10). Note that (c) is the result of the proposed model.
[Figure 3: panels (a)–(f)]
Fig. 3. (a) is an original image. (b) is (a) with added Gaussian white noise of zero mean and standard deviation 20, which is larger than the noise in test 4 of Figure 1. (c) is the result of the proposed model, (d) is the result of the TVS denoising model, (e) is the result of the TV-filtering model, and (f) is the result from (10).
[Figure 4: panels (a), (a1)–(a3), (b), (b1)–(b3)]
Fig. 4. (a) is part of a tangent vector field from (2). (a1), (a2), and (a3) in the first row are parts of the images (c), (d), and (f) in Figure 3, respectively. In the second row, we compute a less smooth tangent vector field (b) in the first step and use the same methods for the second step as in the first row.
[Figure 5: panels (a)–(d)]
Fig. 5. (a) is an image from [16] with added Gaussian white noise of zero mean and standard deviation 10. (b) is the result of the proposed model, (c) is the result of the TVS denoising model, and (d) is the result of the TV-filtering model. The image size is 240 × 124.
[Figure 6: panels (a)–(d)]
Fig. 6. (a) is an image from [16] with added Gaussian white noise of zero mean and standard deviation 10. (b) is the result of the proposed model, (c) is the result of the TVS denoising model, and (d) is the result of the TV-filtering model. The image size is 181 × 274.
Example 1. We numerically check how well the orientation of the gradient of a denoised image fits the gradient of the original image. In Table 1, we measure the orientation difference for different test images:
$$\gamma = \frac{1}{|\Omega|}\int_\Omega \frac{\nabla d_e}{|\nabla d_e|}\cdot\frac{\nabla d_c}{|\nabla d_c|}\, dp, \tag{11}$$
where d_e is the original image, d_c is the computed denoised image, and |Ω| is the area of the domain. In the first step of (A) and (B), V(10⁻¹, 1, 10⁴) is fixed for all test images. In the second step of (A) and (B), we use M1(10⁻³, 1, 10⁻³) and M2(10⁻³, 1, 10⁻⁶) for test 1, M1(10⁻³, 1, 5·10⁻³) and M2(10⁻³, 1, 2.5×10⁻⁵) for test 2, M1(10⁻³, 1, 2.5·10⁻⁵) and M2(10³, 1, 5·10⁻³) for test 3, M1(10⁻³, 1, 10⁻³) and M2(10⁻³, 5, 10⁻³) for test 4, and M1(10⁻³, 2, 3×10⁻³) and M2(10³, 3, 3×10⁻³) for test 5. In (C), all results are obtained with M3(60). As explained in Section 2.2, the proposed model performs better at fitting the orientation. In Figure 2, graphs of the computed results are presented to show the visual differences. The result (f) is obtained from (10) with M4(0.4, 0.1, 10⁻³). The denoised image from the proposed method has a very clean shape, even though the original image has smoothly changing pixel values near edges. We observe that the results from the other methods do not have very sharp edges, and the result (e) from the TV-filtering model has a staircase effect in smooth regions. These results are as expected from Section 2.2.

Example 2. In Figure 3, we compare the results of different methods with larger noise than in Figure 1. For the regularization of the tangent vector field in (c) and (d), V(5×10⁻², 1, 10⁻⁴) is used. The result of the proposed method in (c) is obtained with M1(10⁻³, 2, 10⁻³). (d), (e), and (f) are obtained with M2(10⁻³, 4, 10⁻⁴), M3(80), and M4(0.5, 1, 10⁻³), respectively. Next, we numerically examine the effect of the first step (2) on the second step in (7), (5), and (10). The first row in Figure 4 shows parts of the images in Figure 3. In the second row, we obtain a relatively less smooth vector field with V(10⁻¹, 3, 10⁻⁴). (b2) is obtained with M1(10⁻³, 2, 10⁻³), and we use the same parameters for (b1) and (b3) as for (a1) and (a3). Note that the result (b2) does not have very clean edges even though we use a smaller μ in the second step for the previous model (5).
The other methods, (5) and (10), are sensitive to small changes in the vector field because the field is used directly in their formulations, without considering any relation to the image data.

Example 3. For real images, we compare the denoised images from different methods. In Figure 5, the image (b) is obtained by the proposed method using V(10⁻¹, 5, 10⁻⁴) and M1(5×10⁻⁴, 5, 5×10⁻⁴), (c) is obtained with V(5×10⁻², 5, 10⁻⁴) and M2(10⁻³, 1, 5×10⁻³), and (d) with M3(60). In Figure 6, the image (b) is obtained by the proposed method with V(10⁻¹, 2, 10⁻⁴) and M1(10⁻⁴, 30, 10⁻³), (c) with V(10⁻¹, 2, 10⁻⁴) and M2(10⁻³, 2, 10⁻³), and (d) with M3(60). For these images, the two models (4) and (6) give similar results, which are better than those of the TV-filtering model.
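The orientation difference γ in (11), used for Table 1, can be approximated on a pixel grid as the mean cosine between the two gradient fields. The sketch below makes three assumptions. It takes unit pixel area, so the integral divided by |Ω| becomes a mean. It uses central differences via `np.gradient`. It adds a tiny ε to guard against zero gradients. The function name is hypothetical.

```python
import numpy as np


def orientation_difference(d_exact, d_computed, eps=1e-12):
    """Approximate gamma in (11): mean cosine between grad d_e and grad d_c."""
    ey, ex = np.gradient(d_exact)
    cy, cx = np.gradient(d_computed)
    ne = np.sqrt(eps + ex**2 + ey**2)
    nc = np.sqrt(eps + cx**2 + cy**2)
    return float(np.mean((ex * cx + ey * cy) / (ne * nc)))
```

γ = 1 means the gradients are aligned at every pixel and γ = −1 means they point in opposite directions. Values near 0 indicate no consistent alignment. This is why larger values in Table 1 indicate a better orientation fit.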
5 Conclusions
We proposed an orientation-matching minimization for denoising digital images. Our algorithm consists of two steps. In the first step, we use the regularized tangent vector field with the incompressibility condition suggested in [2]; this condition is crucial for reconstructing an image from the vector field. In the second step, the present work proposes minimizing the orientation difference between the image gradient and the regularized normal direction. This gives a nonlinear PDE for reconstructing a denoised image whose diffusivity depends on the orientation of the regularized normal vector field and whose weighted self-adaptive force term depends on the angle between the image gradient and the vector field. This allows us to obtain a denoised image with sharp edges and smooth regions, even when the original image has smoothly changing pixel values near sharp edges. Various numerical experiments show the improved quality of the results.
References
1. Rahman, T., Tai, X.C., Osher, S.: A TV-Stokes denoising algorithm. In: Sgallari, F., Murli, A., Paragios, N. (eds.) Scale Space and Variational Methods in Computer Vision, pp. 473–482. Springer, Heidelberg (2007)
2. Tai, X.C., Osher, S., Holm, R.: Image inpainting using TV-Stokes equation. In: Image Processing Based on Partial Differential Equations, pp. 3–22. Springer, Heidelberg (2006)
3. Bertalmio, M., Sapiro, G., Bertozzi, A.L.: Navier-Stokes, fluid dynamics, and image and video inpainting. In: Proc. Conf. Comp. Vision Pattern Rec., pp. 355–362 (2001)
4. Perona, P., Malik, J.: Scale space and edge detection using anisotropic diffusion. IEEE Trans. Pattern Anal. Machine Intell. 12(7), 629–639 (1990)
5. Weickert, J.: Coherence-enhancing diffusion filtering. Int. J. Comput. Vis. 31, 111–127 (1999)
6. Brox, T., Weickert, J., Burgeth, B., Mrázek, P.: Nonlinear structure tensors. Image Vis. Comput. 24, 41–55 (2006)
7. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
8. Bresson, X., Chan, T.: Fast dual minimization of the vectorial total variation norm and applications to color image processing. Inverse Problems and Imaging 2(4), 455–484 (2008)
9. Hahn, J., Lee, C.O.: A nonlinear structure tensor with the diffusivity matrix composed of the image gradient. J. Math. Imag. Vis. (accepted)
10. Lysaker, M., Osher, S., Tai, X.C.: Noise removal using smoothed normals and surface fitting. IEEE Trans. Image Processing 13(10), 1345–1357 (2004)
11. Vese, L., Osher, S.: Numerical methods for p-harmonic flows and applications to image processing. SIAM J. Numer. Anal. 40(6), 2085–2104 (2002)
12. Sochen, N., Sagiv, C., Kimmel, R.: Stereographic combing a porcupine or studies on direction diffusion in image processing. SIAM J. Appl. Math. 64(5), 1477–1508 (2004)
13. Lu, T., Neittaanmäki, P., Tai, X.C.: A parallel splitting up method for partial differential equations and its application to Navier-Stokes equations. RAIRO Math. Model. and Numer. Anal. 26(6), 673–708 (1992)
14. Weickert, J., ter Haar Romeny, B.M., Viergever, M.A.: Efficient and reliable schemes for nonlinear diffusion filtering. IEEE Trans. Image Processing 7, 398–410 (1998)
15. Chan, T., Vese, L.: Active contours without edges. IEEE Trans. Image Processing 10, 266–277 (2001)
16. Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: Proc. 8th Int'l Conf. Computer Vision, vol. 2, pp. 416–423 (July 2001)
Augmented Lagrangian Method, Dual Methods and Split Bregman Iteration for ROF Model

Xue-Cheng Tai¹ and Chunlin Wu²

¹ Division of Mathematical Science, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, and Department of Mathematics, University of Bergen, Johannes Brunsgate 12, N-5008 Bergen, Norway
² Division of Mathematical Science, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore
Abstract. In recent decades, the ROF model (total variation (TV) minimization) has been very successful in image restoration due to its good edge-preserving property. However, the non-differentiability of the minimization problem causes computational difficulties, and different techniques have been proposed to overcome them. Among the methods regarded as particularly efficient are the dual methods of CGM (Chan, Golub, and Mulet) [7] and Chambolle [6], split Bregman iteration [14], and the splitting-and-penalty based method [28] [29]. In this paper, we show that most of these methods can be classified under the same framework. The dual methods and split Bregman iteration are just different iterative procedures for solving the same system resulting from a Lagrangian and penalty approach. We show this relationship only for the ROF model; however, it provides a uniform framework for understanding these methods for other models. In addition, we provide some examples to illustrate the accuracy and efficiency of the proposed algorithm.
1 Introduction
Image restoration, such as denoising and deblurring, is one of the most fundamental tasks in image processing and is in general based on regularization. Preserving image edges and features during regularization is difficult but highly desirable. The ROF model [23] has recently been demonstrated to be very successful in edge-preserving image restoration; see [9] [11] and the references therein. Consequently, the model has attracted much attention and has been extended to high-order models [8] [31] [18] [19] [16] [25] and to vectorial models [24] [2] [10] for color image restoration [17] [27]. However, computing the ROF model suffers from serious nonlinearity and non-differentiability. In [23], the authors proposed an artificial time marching strategy for the associated Euler-Lagrange equation. This method is slow due to strict stability constraints on the time step size. Moreover, the artificial time marching method computes solutions not of the exact ROF model but of an approximation, namely a regularized ROF model. Different techniques have been proposed to overcome this difficulty.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 502–513, 2009. © Springer-Verlag Berlin Heidelberg 2009
Augmented Lagrangian Method, Dual Methods and Split Bregman Iteration
Several methods are regarded as particularly efficient. One approach is the family of dual methods [7] [5] [6], which are based on various dual formulations of the model. Another is split Bregman iteration [14], which uses functional splitting and Bregman iteration for constrained optimization [20] [30]. Similar to split Bregman iteration, another approach based on splitting and alternating minimization of the penalized cost function was proposed in [28] [29]. In this paper, we present the augmented Lagrangian method for solving the model and show that the dual methods and split Bregman iteration can actually be either deduced from, or shown to be equivalent to, our method.
2 ROF Model and Related Numerical Solvers
Assume Ω ⊂ R² is a bounded open subset (usually a rectangle in image processing) and f : Ω → R is an observed image. f often contains various degradations and can be noisy and blurred, which is usually modelled as
$$f = Ku + n, \tag{1}$$
where u is the true image, and K and n are the linear operator and the noise, respectively. The operator K may stand for the identity operator or for various blur operations such as Gaussian blur and motion blur. The noise n may denote Gaussian noise, salt-and-pepper noise, or other types. Image restoration aims to recover u from f with some information about K and n. In this paper, we assume that n is Gaussian white noise and K is a general blur operator. Since the variance of n and the blur kernel of K can usually be estimated, we further assume that we know K and the variance of n exactly. Even with this knowledge, it is still difficult to recover u from f. Even in the pure denoising case (K = I), obtaining u is not an easy task, since we only know the variance of the random noise n. In the pure deblurring case, in which K ≠ I and n = 0, we cannot directly solve f = Ku to get u because of the compactness of K: the problem f = Ku is ill-posed, and the solution would be highly oscillatory. Regularization of the solution should therefore be considered. The restoration problem is thus posed using some regularity term R(u) as
$$\min_u R(u) \quad \text{s.t.} \quad \|f - Ku\|^2 = \sigma^2, \tag{2}$$
where σ² is determined by the variance of n. The constrained minimization problem is often solved approximately using Tikhonov regularization as follows:
$$\min_u F(u) = R(u) + \frac{\lambda}{2}\|Ku - f\|^2, \tag{3}$$
for some parameter λ. There are many choices for the regularity term R(u). One of the most basic and successful choices is due to Rudin, Osher, and Fatemi [23], in which R(u) is the total variation of u. The so-called ROF model reads
X.-C. Tai and C. Wu
$$u = \arg\min_u F_{rof}(u) = \int_\Omega |\nabla u| + \frac{\lambda}{2}\|Ku - f\|^2. \tag{4}$$
In [23], the authors considered the image denoising problem (K = I) and presented a gradient descent method to solve (4). (Here the method is described for general K.) Artificial time marching is introduced into the associated Euler-Lagrange equation as follows:
$$\begin{cases} u_t = \nabla\cdot\left(\dfrac{\nabla u}{\sqrt{|\nabla u|^2 + \beta}}\right) + \lambda K^*(f - Ku), \\ u(0) = f, \end{cases} \tag{5}$$
where β is a small positive number to avoid division by zero and K* is the L² adjoint of K. The gradient descent method (5) has two main drawbacks. First, the method computes the solution of (4) only approximately, not exactly. Second, the method is slow due to strict constraints on the time step size. The choice of β affects both aspects: the larger β is, the more efficient the scheme, but the worse the approximation. There is thus a tradeoff between accuracy and efficiency in choosing β. Many algorithms have been proposed to improve on this method; those regarded as particularly efficient include the dual methods and split Bregman iteration, as well as the splitting-and-penalty based method, as mentioned before. Before we go on, we present an obviously equivalent formulation of the restoration problem (4), which will play an important role in our derivation. The difficulty in solving the ROF restoration model (4) is due to the non-differentiability of the total variation norm. We introduce an auxiliary variable q for ∇u to separate the computation of the non-differentiable term from the fidelity term. The model (4) is thus equivalent to the constrained optimization problem
$$\min_{u,q} G_{rof}(u, q) = \int_\Omega |q| + \frac{\lambda}{2}\|Ku - f\|^2 \quad \text{s.t.} \quad q = \begin{pmatrix} q_1 \\ q_2 \end{pmatrix} = \begin{pmatrix} \partial_x u \\ \partial_y u \end{pmatrix} = \nabla u. \tag{6}$$

2.1 CGM Dual Method
In [7], Chan et al. presented a primal-dual method for TV minimization. They introduced a new variable
$$\omega = \frac{\nabla u}{|\nabla u|} \tag{7}$$
into the Euler-Lagrange equation of the model (4), yielding
$$-\nabla\cdot\omega + \lambda K^*(Ku - f) = 0, \qquad \nabla u - \omega|\nabla u| = 0, \tag{8}$$
to remove some of the singularity caused by the non-differentiability of the objective functional.
Unlike the original Euler-Lagrange equation for u, this system contains both the u and ω variables. In [7], u and ω are called the primal and dual variables, respectively. Again, the authors approximate this primal-dual system using a regularized TV norm for the actual computation. Newton's linearization for both the primal and dual variables is used to solve the discrete version.

2.2 Chambolle's Dual Method
Another work based on a dual formulation, with a slightly different derivation, is due to Chambolle. In [6], Chambolle used the Legendre-Fenchel transform and a key result from optimization theory to obtain an original and efficient algorithm for total variation minimization. The primal variable (the image) is expressed explicitly in terms of the dual variable, and only the dual variable is computed iteratively; the primal variable u is then obtained from the final dual variable. However, the algorithm does not handle general operators K. Specifically, Chambolle adopted the following definition of the total variation for a general (not necessarily smooth) function u:
$$\mathrm{TV}(u) = \sup\left\{\int_\Omega u(x)\,\nabla\cdot\xi(x) : \xi \in C_c^1(\Omega; \mathbb{R}^2),\ |\xi(x)| \le 1,\ \forall x \in \Omega\right\}. \tag{9}$$
Denoting
$$S = \mathrm{Closure}\left\{\nabla\cdot\xi : \xi \in C_c^1(\Omega; \mathbb{R}^2),\ |\xi(x)| \le 1,\ \forall x \in \Omega\right\}, \tag{10}$$
Chambolle showed that the ROF restoration model (4) with K = I (note the slight difference between (4) and the model in [6] regarding the parameter λ) yields
$$u = f - \frac{1}{\lambda}\pi_S(\lambda f) = f - \pi_{\frac{1}{\lambda}S}(f), \tag{11}$$
where π_S(·) is the L² projection operator onto S, which reads
$$\pi_S(\cdot) = \arg\min_{\operatorname{div}\xi}\left\{\|\operatorname{div}\xi - \cdot\|^2 : |\xi(x)| \le 1,\ \forall x \in \Omega\right\}. \tag{12}$$
Since S is not a linear space, this projection is nonlinear. From the KKT conditions and a careful observation, it was shown in [6] that the ξ(x) giving π_S(λf) satisfies
$$-\nabla(\operatorname{div}\xi - \lambda f) + |\nabla(\operatorname{div}\xi - \lambda f)|\,\xi = 0, \tag{13}$$
which can be solved by a semi-implicit gradient descent algorithm. Note that we present the continuous case here instead of the discrete version used in [6].
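Chambolle's semi-implicit iteration for (13) can be sketched in NumPy for K = I. The sketch follows the fixed-point form of (13) with the discrete gradient/divergence pair commonly used with [6]: forward differences with a zero last row and column, and the divergence defined as the negative adjoint of the gradient. The step size τ = 1/8, the iteration count, and the function names are assumptions of this sketch.

```python
import numpy as np


def grad(u):
    """Forward differences; zero on the last column/row (Neumann boundary)."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy


def div(px, py):
    """Discrete divergence, defined as the negative adjoint of grad."""
    dx = np.zeros_like(px)
    dy = np.zeros_like(py)
    dx[:, 0] = px[:, 0]
    dx[:, 1:-1] = px[:, 1:-1] - px[:, :-2]
    dx[:, -1] = -px[:, -2]
    dy[0, :] = py[0, :]
    dy[1:-1, :] = py[1:-1, :] - py[:-2, :]
    dy[-1, :] = -py[-2, :]
    return dx + dy


def chambolle_rof(f, lam, tau=0.125, n_iter=200):
    """Sketch of (11)-(13): min TV(u) + lam/2 ||u - f||^2 with K = I."""
    px = np.zeros_like(f)
    py = np.zeros_like(f)
    for _ in range(n_iter):
        gx, gy = grad(div(px, py) - lam * f)
        norm = np.sqrt(gx**2 + gy**2)
        px = (px + tau * gx) / (1.0 + tau * norm)   # semi-implicit update of xi
        py = (py + tau * gy) / (1.0 + tau * norm)
    return f - div(px, py) / lam                      # u = f - (1/lam) pi_S(lam f)


def total_variation(u):
    gx, gy = grad(u)
    return float(np.sum(np.sqrt(gx**2 + gy**2)))
```

Only the dual variable ξ = (px, py) is iterated, and the image is recovered once at the end through (11). Because the discrete divergence sums to zero, the mean intensity of f is preserved.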
2.3 Split Bregman Iteration
Recently, (split) Bregman iteration has attracted much attention in the signal recovery and image processing communities. The basic idea is to transform a constrained optimization problem into a series of unconstrained problems. In each unconstrained problem, the objective function is defined by the Bregman distance [3] of a convex functional. The Bregman distance of a convex functional J(u) is defined as the following (nonnegative) quantity:
$$D_J^p(u, v) \equiv J(u) - J(v) - \langle p, u - v\rangle, \tag{14}$$
where p ∈ ∂J(v). When J(u) is a continuously differentiable functional, its sub-differential ∂J(v) has a single element for each v, and consequently the Bregman distance is unique. In this case, the distance is just the difference at the point u between J(·) and its first-order approximation at the point v. For non-differentiable functionals, the sub-differential may be empty or contain multiple elements; therefore, the Bregman distance between u and v can be ill-defined or multivalued. However, this poses no difficulty for the iterative algorithms, as they automatically choose a unique sub-gradient in each iteration as long as the fidelity term for the constraints is differentiable (this condition usually holds). We also want to emphasize that the Bregman distance of a functional is not a distance in the usual sense since, in general, D_J^p(u, v) ≠ D_J^p(v, u), and the triangle inequality does not hold. See [20] [30] for more details. To find the solution of the ROF model (4), or equivalently of the constrained problem (6), split Bregman iteration (algorithms for K = I, i.e., TV denoising, are presented in [14]) solves a sequence of unconstrained problems of the form
$$(u^{k+1}, q^{k+1}) = \arg\min_{u,q} D_{G_{rof}}^{(p_u^k, p_q^k)}\big((u, q), (u^k, q^k)\big) + \frac{r}{2}\int_\Omega |q - \nabla u|^2, \tag{15}$$
where p_u^k and p_q^k, sometimes written together as (p_u^k, p_q^k), are the sub-gradients of G_rof at (u^k, q^k) with respect to u and q, respectively. Taking the update of the sub-gradients into account, the iteration can be formulated as Algorithm 1. For the computation of (u^{k+1}, q^{k+1}), we refer to Algorithm 3 for more details.

Algorithm 1. Split Bregman iteration for the ROF model
1. Initialization: q⁰ = 0, u⁰ = 0, p_q⁰ = 0, p_u⁰ = 0;
2. For k = 0, 1, 2, ...: compute (u^{k+1}, q^{k+1}) using Eqn. (15), and update
$$p_u^{k+1} = p_u^k - r\,\operatorname{div}(q^{k+1} - \nabla u^{k+1}), \qquad p_q^{k+1} = p_q^k - r(q^{k+1} - \nabla u^{k+1}). \tag{16}$$
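A small numerical example of (14) illustrates two points: for a smooth J, the Bregman distance is asymmetric, D_J^p(u, v) ≠ D_J^p(v, u); for a non-differentiable J, its value depends on the chosen sub-gradient. The functionals J(x) = Σ exp(x_i) and J(x) = Σ |x_i| and the helper names are illustrative choices, not taken from the paper.

```python
import math

import numpy as np


def bregman_distance(J, p, u, v):
    """D_J^p(u, v) = J(u) - J(v) - <p, u - v>, with p a sub-gradient of J at v."""
    u = np.atleast_1d(np.asarray(u, dtype=float))
    v = np.atleast_1d(np.asarray(v, dtype=float))
    p = np.atleast_1d(np.asarray(p, dtype=float))
    return J(u) - J(v) - float(np.dot(p, u - v))


def J_exp(x):
    """Smooth convex functional; its gradient is exp(x)."""
    return float(np.sum(np.exp(x)))


def J_l1(x):
    """Non-differentiable at 0, where the sub-differential is [-1, 1]."""
    return float(np.sum(np.abs(x)))


# Asymmetry for a smooth functional: D(1, 0) = e - 2, while D(0, 1) = 1.
d_uv = bregman_distance(J_exp, math.exp(0.0), 1.0, 0.0)
d_vu = bregman_distance(J_exp, math.exp(1.0), 0.0, 1.0)

# Multivalued for |x| at v = 0: each p in [-1, 1] gives a different value.
d_half = bregman_distance(J_l1, 0.5, 2.0, 0.0)   # 1.0
d_one = bregman_distance(J_l1, 1.0, 2.0, 0.0)    # 0.0
```

In split Bregman iteration, this ambiguity is resolved by the update (16), which fixes one sub-gradient per iteration.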
3 Augmented Lagrangian Method, and Relations to Dual Methods and Split Bregman Iteration
In this section, we present the augmented Lagrangian method [15] [21] [22] for the ROF model, or equivalently for the constrained problem (6). The augmented Lagrangian method has many advantages over other methods such as the penalty method [1], and it has been successfully applied to nonlinear PDEs and mechanics [13]. We also show that the dual methods and split Bregman iteration can be either deduced from, or shown to be equivalent to, the augmented Lagrangian method.

3.1 Augmented Lagrangian Method
In the augmented Lagrangian method, one solves the constrained optimization problem (6) by
$$\min_{u,q}\max_\mu L_{rof}(u, q, \mu) = \int_\Omega |q| + \frac{\lambda}{2}\|Ku - f\|^2 + \int_\Omega \mu\cdot(q - \nabla u) + \frac{r}{2}\int_\Omega |q - \nabla u|^2, \tag{17}$$
where μ = (μ₁, μ₂)^T is the Lagrange multiplier and r is a positive constant. That is, the method seeks a saddle point of the augmented Lagrangian functional L_rof(u, q, μ). The system of optimality conditions is thus
$$\frac{\partial L_{rof}}{\partial u} = \lambda K^*(Ku - f) + \nabla\cdot\mu + r\,\nabla\cdot(q - \nabla u) = 0, \tag{18}$$
$$\frac{\partial L_{rof}}{\partial q} = \frac{q}{|q|} + \mu + r(q - \nabla u) = 0, \tag{19}$$
$$\frac{\partial L_{rof}}{\partial \mu} = q - \nabla u = 0. \tag{20}$$
We now have two ways to solve the problem (17). One is to use optimization techniques to directly minimize/maximize the corresponding functionals; the other is to solve the associated system of optimality conditions. The augmented Lagrangian method uses an iterative procedure to solve (17); see Algorithm 2. The iteration runs until some stopping condition is satisfied.

Algorithm 2. Augmented Lagrangian method for the ROF model
1. Initialization: u⁰ = 0, q⁰ = 0, μ⁰ = 0;
2. For k = 0, 1, 2, ...: compute (u^{k+1}, q^{k+1}) as a minimizer of the augmented Lagrangian functional for the Lagrange multiplier μ^k, i.e.,
$$(u^{k+1}, q^{k+1}) = \arg\min_{u,q} L_{rof}(u, q, \mu^k), \tag{21}$$
where L_rof(u, q, μ^k) is defined in Eqn. (17); and update
$$\mu^{k+1} = \mu^k + r(q^{k+1} - \nabla u^{k+1}). \tag{22}$$
To solve problem (21), we split it into the following two sub-problems ([28] [29]):
$$\arg\min_u \frac{\lambda}{2}\|Ku - f\|^2 - \int_\Omega \mu^k\cdot\nabla u + \frac{r}{2}\int_\Omega |q - \nabla u|^2, \qquad (23)$$
X.-C. Tai and C. Wu
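for given $q$, and
$$\arg\min_q \int_\Omega |q| + \int_\Omega \mu^k\cdot q + \frac{r}{2}\int_\Omega |q - \nabla u|^2, \qquad (24)$$
for given $u$. Sub-problems (23) and (24) can be solved efficiently. For (23), the optimality condition gives the linear equation
$$\lambda K^*(Ku - f) + \mathrm{div}\,\mu^k + r\,\mathrm{div}\,q - r\Delta u = 0$$
for $u$, which allows us to use fast Fourier transforms. Denoting by $\mathcal F(u)$ the Fourier transform of $u$, we get $u$ from
$$u = \mathcal F^{-1}\left(\frac{\lambda\mathcal F(K^*)\mathcal F(f) - \mathcal F(\mathrm{div})\cdot\mathcal F(\mu^k) - r\,\mathcal F(\mathrm{div})\cdot\mathcal F(q)}{\lambda\mathcal F(K^*)\mathcal F(K) - r\,\mathcal F(\Delta)}\right), \qquad (25)$$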
where applying the Fourier transform to a vector such as div and $\mu^k$ means applying it to each component; and Fourier transforms of operators such as $K$, $\partial_x$, $\partial_y$ are regarded as the transforms of their corresponding convolution kernels (for differential operators, the kernels are approximated by kernels of difference operators). For (24), we have the following closed-form solution:
$$q = \begin{cases}\dfrac{1}{r}\Big(1 - \dfrac{1}{|w(x,y)|}\Big)w(x,y), & |w(x,y)| > 1,\\[4pt] 0, & |w(x,y)| \le 1,\end{cases} \qquad (26)$$
where $w = r\nabla u - \mu^k$, since we can reformulate the problem as
$$\arg\min_q \int_\Omega |rq| + \frac{1}{2}\int_\Omega |rq - (r\nabla u - \mu^k)|^2.$$
Based on these observations, we can use Algorithm 3 to solve (21). Here $N$ can be chosen using a convergence test. In the common augmented Lagrangian method, one usually sets $N = 1$.
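Because (24) decouples pixel by pixel, the closed form (26) can be checked at a single pixel. The pure-Python sketch below (function names are hypothetical) evaluates (26) for given values of $\nabla u$, $\mu^k$ and $r$, together with the pointwise integrand of (24), so that a brute-force comparison can confirm that no nearby $q$ gives a smaller value.

```python
import math

def q_update(grad_u, mu, r):
    """Closed form (26) at one pixel: q = (1/r)(1 - 1/|w|) w if |w| > 1,
    else q = 0, where w = r * grad_u - mu (2-vectors as tuples)."""
    w = (r * grad_u[0] - mu[0], r * grad_u[1] - mu[1])
    norm_w = math.hypot(w[0], w[1])
    if norm_w <= 1.0:
        return (0.0, 0.0)
    scale = (1.0 - 1.0 / norm_w) / r
    return (scale * w[0], scale * w[1])

def pointwise_objective(q, grad_u, mu, r):
    """Integrand of (24): |q| + mu . q + (r/2) |q - grad_u|^2."""
    return (math.hypot(q[0], q[1]) + mu[0] * q[0] + mu[1] * q[1]
            + 0.5 * r * ((q[0] - grad_u[0]) ** 2 + (q[1] - grad_u[1]) ** 2))

q_star = q_update((2.0, -1.0), (0.5, 0.3), 1.5)   # |w| > 1: shrunk vector
q_zero = q_update((0.1, 0.0), (0.0, 0.0), 1.0)    # |w| <= 1: q = 0
print(q_star, q_zero)
```

The threshold $|w| \le 1$ is exactly where the non-smooth term $|q|$ wins over the quadratic penalty and forces $q = 0$.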
Algorithm 3. Augmented Lagrangian method for the ROF model: solving the sub-problem (21)
1. Initialization: $u^{k+1,0} = u^k$, $q^{k+1,0} = q^k$;
2. For $n = 0, 1, \ldots, N-1$: compute $u^{k+1,n+1}$ from Eqn. (25) for $q = q^{k+1,n}$, and then compute $q^{k+1,n+1}$ from Eqn. (26) for $u = u^{k+1,n+1}$;
3. $u^{k+1} = u^{k+1,N}$, $q^{k+1} = q^{k+1,N}$.
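Algorithms 2 and 3 can be sketched for the denoising case $K = I$. This NumPy sketch is not the authors' code; it assumes periodic boundary conditions (so the FFT in (25) diagonalizes the discrete Laplacian exactly), forward differences for $\nabla$ with $\mathrm{div} = -\nabla^T$, and unit grid spacing. The function names and the synthetic test image are hypothetical.

```python
import numpy as np

def grad(u):
    """Forward differences with periodic boundary conditions (assumed)."""
    return np.roll(u, -1, axis=0) - u, np.roll(u, -1, axis=1) - u

def div(p1, p2):
    """Backward-difference divergence, the negative adjoint of grad."""
    return (p1 - np.roll(p1, 1, axis=0)) + (p2 - np.roll(p2, 1, axis=1))

def shrink(w1, w2, r):
    """Closed form (26): q = (1/r) max(|w| - 1, 0) w / |w|."""
    norm = np.sqrt(w1 ** 2 + w2 ** 2)
    scale = np.maximum(norm - 1.0, 0.0) / (r * np.maximum(norm, 1e-12))
    return scale * w1, scale * w2

def alm_rof_denoise(f, lam, r, outer=200, inner=1):
    """Algorithm 2 with K = I; each outer step runs `inner` sweeps (N in
    Algorithm 3) of the u-update (25) and the q-update (26), then (22)."""
    n1, n2 = f.shape
    # Fourier symbol of the periodic 5-point Laplacian div(grad(.)), all <= 0.
    a = 2.0 * np.cos(2.0 * np.pi * np.arange(n1) / n1) - 2.0
    b = 2.0 * np.cos(2.0 * np.pi * np.arange(n2) / n2) - 2.0
    denom = lam - r * (a[:, None] + b[None, :])  # lam F(K*)F(K) - r F(Laplacian)
    u = np.zeros_like(f)
    q1, q2 = np.zeros_like(f), np.zeros_like(f)
    m1, m2 = np.zeros_like(f), np.zeros_like(f)   # Lagrange multiplier mu
    for _outer in range(outer):
        for _inner in range(inner):
            rhs = lam * f - div(m1, m2) - r * div(q1, q2)
            u = np.real(np.fft.ifft2(np.fft.fft2(rhs) / denom))
            g1, g2 = grad(u)
            q1, q2 = shrink(r * g1 - m1, r * g2 - m2, r)
        g1, g2 = grad(u)
        m1 = m1 + r * (q1 - g1)                    # multiplier update (22)
        m2 = m2 + r * (q2 - g2)
    return u

rng = np.random.default_rng(0)
clean = np.zeros((32, 32))
clean[8:24, 8:24] = 1.0
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
u = alm_rof_denoise(noisy, lam=10.0, r=5.0)
print(np.linalg.norm(noisy - clean), np.linalg.norm(u - clean))
```

With `inner=1` this is the usual choice $N = 1$; a larger `inner` performs more alternating sweeps of (25) and (26) per multiplier update. Since $\mathcal F(\Delta) \le 0$, the denominator is bounded below by $\lambda > 0$ and the division is always safe.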
As for the second approach to solving problem (17), one can use other iterative procedures to solve the corresponding optimality system. In fact, the optimality system naturally leads to CGM and Chambolle's dual method, as shown in the following.
3.2
Relations between Augmented Lagrangian Method and Dual Methods as Well as Split Bregman Iteration
In this subsection we show that CGM and Chambolle's dual methods for the ROF model can be deduced naturally from the augmented Lagrangian method, which gives a much simpler derivation of the dual methods. We also demonstrate that split Bregman iteration is equivalent to Algorithm 2. Connection to CGM Dual Method. We first show that the CGM dual method can be deduced from the augmented Lagrangian method. The optimality conditions for the augmented Lagrangian approach are given in (18)–(20). From Eqn. (20), we get $q = \nabla u$. Combining this with (19), we see that
$$\mu = -\frac{\nabla u}{|\nabla u|}. \qquad (27)$$
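Therefore, the dual variable in the CGM dual method is nothing but the Lagrange multiplier $\mu$ with the opposite sign. Hence, the system of optimality conditions (18)–(20) is equivalent to
$$\nabla\cdot\mu + \lambda K^*(Ku - f) = 0, \qquad \nabla u + \mu|\nabla u| = 0, \qquad (28)$$
which is just the primal-dual system of the CGM dual method if one replaces $-\mu$ with $\omega$. Connection to Chambolle's Dual Method. We now further derive Chambolle's dual method. From the first equation of (28), we get $u$ as
$$u = (\lambda K^*K)^{-1}(\lambda K^*f - \mathrm{div}\,\mu), \qquad (29)$$
yielding the equation for the dual variable (the factor $1/\lambda$ cancels because the second equation of (28) is positively homogeneous in $u$)
$$\nabla\big((K^*K)^{-1}(\lambda K^*f - \mathrm{div}\,\mu)\big) + \big|\nabla\big((K^*K)^{-1}(\lambda K^*f - \mathrm{div}\,\mu)\big)\big|\,\mu = 0. \qquad (30)$$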
For image denoising problems where $K = I$, (30) and (29) are exactly the equations used by Chambolle in [6] to solve for the dual variable and to recover the primal variable $u$, respectively. The equation (30) for the dual variable in [6] was obtained through the less well-known KKT conditions for inequality-constrained optimization problems, whereas here we deduce this equation very naturally from the augmented Lagrangian method. This is a generic formulation that is not discussed in [6]. We also point out that some connections between CGM and Chambolle's dual methods have been noticed in [32]. Connection to Split Bregman Iteration. Split Bregman iteration is in fact equivalent to the augmented Lagrangian method. Considering zero initialization for the sub-gradients and the Lagrange multiplier, and letting
$$(p_u^k, p_q^k) = -(\mathrm{div}\,\mu^k, \mu^k) \qquad (31)$$
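for each $k$, we have
$$\begin{aligned}(u^{k+1}, q^{k+1}) &= \arg\min_{u,q} D_{G_{rof}}^{(p_u^k, p_q^k)}\big((u, q), (u^k, q^k)\big) + \frac{r}{2}\int_\Omega |q - \nabla u|^2\\ &= \arg\min_{u,q} \int_\Omega |q| + \frac{\lambda}{2}\|Ku - f\|^2 + \int_\Omega u\,\mathrm{div}\,\mu^k + \int_\Omega \mu^k\cdot q + \frac{r}{2}\int_\Omega |q - \nabla u|^2\\ &= \arg\min_{u,q} \int_\Omega |q| + \frac{\lambda}{2}\|Ku - f\|^2 - \int_\Omega \mu^k\cdot\nabla u + \int_\Omega \mu^k\cdot q + \frac{r}{2}\int_\Omega |q - \nabla u|^2\\ &= \arg\min_{u,q} L_{rof}(u, q, \mu^k),\end{aligned}$$
indicating the equivalence between split Bregman iteration and the iterative procedure of the augmented Lagrangian method. In the context of compressive sensing, this equivalence has been pointed out in [30].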
[Figure 1 panels: Original (SNR: Inf dB); Blurry&Noisy (SNR: 6.30 dB); deconvwnr (SNR: 11.29 dB, t = 0.08 s); deconvreg (SNR: 11.17 dB, t = 0.36 s); ALM (r = 10) (SNR: 12.99 dB, t = 0.86 s); deconvlucy (SNR: 9.29 dB, t = 1.31 s)]
Fig. 1. Augmented Lagrangian method for ROF restoration, and comparisons to built-in Matlab functions. In the sub-figures, SNR and t denote the signal-to-noise ratio and the CPU time, respectively.
[Figure 2 panels: FTVd (r0 = 1, SF = 2, r = 256) (SNR: 12.62 dB, t = 1.09 s); ALM (r0 = 1, SF = 2, r = 128) (SNR: 12.52 dB, t = 0.75 s); ALM (r0 = 1, SF = 1.70, r = 69.758) (SNR: 12.71 dB, t = 0.80 s)]
Fig. 2. Comparisons between the FTVd package (splitting-and-penalty) and the augmented Lagrangian method with increasing penalty parameters for ROF restoration. In the sub-figures, r0, SF and r stand for the initial value, the scaling factor and the final value of the penalty parameter, respectively. Here, SNR and t denote the signal-to-noise ratio and the CPU time, respectively.
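The three parameter triples in Fig. 2 fit a geometric continuation rule in which the penalty is multiplied by SF after each stage. That rule is our assumption (the caption only names r0, SF and the final r), and the helper name below is hypothetical. Under it, the number of rescalings n with r = r0 · SF^n can be recovered:

```python
import math

def continuation_stages(r0, sf, r_final):
    """Number n of rescalings with r_final = r0 * sf**n, under the assumed
    (hypothetical) geometric rule r <- sf * r between outer stages."""
    return round(math.log(r_final / r0) / math.log(sf))

runs = [("FTVd", 1, 2, 256), ("ALM", 1, 2, 128), ("ALM", 1, 1.70, 69.758)]
for name, r0, sf, r in runs:
    n = continuation_stages(r0, sf, r)
    print(f"{name}: r0={r0}, SF={sf}, r={r} -> n={n}, r0*SF**n={r0 * sf ** n:.3f}")
```

Read this way, the two ALM runs differ mainly in how fast the penalty grows; the value 69.758 is consistent with 1.70^8 rounded to three decimals.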
3.3
Remark
We want to emphasize that our observations can be extended to many other models, including anisotropic TV, high-order nonlinear PDE filters (e.g., fourth-order models), vectorial TV, and even general models. Similarly, we can use FFT-based fast solvers and closed-form solutions to solve the sub-problems of the corresponding algorithms. In addition, one can also derive the dual methods [12] [26] [4] naturally from the system of optimality conditions of the augmented Lagrangian functionals for these models. Furthermore, the equivalence between split Bregman iteration and the augmented Lagrangian method is also valid for these models. More details will be given in a forthcoming paper.
4
Examples
Two numerical examples are provided in Fig. 1 and Fig. 2 to illustrate the accuracy and efficiency of our method. In Fig. 1 we compare our method with some built-in Matlab functions, i.e., deconvwnr.m, deconvreg.m and deconvlucy.m. As one can see, our method produces a much better restoration than these built-in functions (SNR 12.99 dB versus at most 11.29 dB) in comparable (or even less) CPU time. In Fig. 2 we also compare our method (with increasing parameter r) with the recently developed FTVd package based on pure splitting-and-penalty, which is one of the most efficient approaches compared to other existing methods, as discussed in [29]. From Figs. 1 and 2, one can also compare FTVd with our method using a fixed parameter r.
5
Conclusion
In this paper we use an approach based on the augmented Lagrangian method for the ROF model. The algorithm benefits from FFT-based fast solvers and closed-form solutions. We also show that our method gives a unified framework for understanding the approaches currently regarded as particularly efficient for the ROF model, such as dual methods and split Bregman iteration. The CGM and Chambolle's dual methods are different iterative schemes for solving the augmented Lagrangian system, and the dual variables in these methods are nothing but the Lagrange multiplier. Split Bregman iteration is in fact equivalent to the augmented Lagrangian method. Numerical examples demonstrate the accuracy and efficiency of our approach. The method can be extended to many other restoration models.
Acknowledgements This research has been supported by MOE (Ministry of Education) Tier II project T207N2202 and IDM project NRF2007IDM-IDM002-010. Support from SUG 20/07 is also gratefully acknowledged.
References
1. Bertsekas, D.P.: Multiplier methods: a survey. Automatica 12, 133–145 (1976)
2. Blomgren, P., Chan, T.F.: Color TV: total variation methods for restoration of vector-valued images. IEEE Trans. Image Process. 7, 304–309 (1998)
3. Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics 7, 200–217 (1967)
4. Bresson, X., Chan, T.F.: Fast minimization of the vectorial total variation norm and applications to color image processing. UCLA CAM Report 07-25 (2007)
5. Carter, J.L.: Dual methods for total variation-based image restoration. Ph.D. thesis, UCLA (2001)
6. Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging Vis. 20, 89–97 (2004)
7. Chan, T.F., Golub, G.H., Mulet, P.: A nonlinear primal-dual method for total variation-based image restoration. SIAM J. Sci. Comput. 20, 1964–1977 (1999)
8. Chan, T., Marquina, A., Mulet, P.: High-order total variation-based image restoration. SIAM J. Sci. Comput. 22, 503–516 (2000)
9. Chan, T.F., Osher, S., Shen, J.: The digital TV filter and nonlinear denoising. IEEE Trans. Image Process. 10, 231–241 (2001)
10. Chan, T.F., Kang, S.H., Shen, J.H.: Total variation denoising and enhancement of color images based on the CB and HSV color models. J. Visual Commun. Image Repres. 12, 422–435 (2001)
11. Chan, T., Esedoglu, S., Park, F.E., Yip, A.: Recent developments in total variation image restoration. UCLA CAM Report 05-01 (2005)
12. Chan, T.F., Esedoglu, S., Park, F.E.: A fourth order dual method for staircase reduction in texture extraction and image restoration problems. UCLA CAM Report 05-28 (2005)
13. Glowinski, R., Le Tallec, P.: Augmented Lagrangians and operator-splitting methods in nonlinear mechanics. SIAM, Philadelphia (1989)
14. Goldstein, T., Osher, S.: The split Bregman method for L1 regularized problems. UCLA CAM Report 08-29 (2008)
15. Hestenes, M.R.: Multiplier and gradient methods. Journal of Optimization Theory and Applications 4, 303–320 (1969)
16. Hinterberger, W., Scherzer, O.: Variational methods on the space of functions of bounded Hessian for convexification and denoising. Computing 76, 109–133 (2006)
17. Kimmel, R., Malladi, R., Sochen, N.: Images as embedded maps and minimal surfaces: movies, color, texture, and volumetric medical images. Int'l J. Computer Vision 39, 111–129 (2000)
18. Lysaker, M., Lundervold, A., Tai, X.-C.: Noise removal using fourth-order partial differential equation with applications to medical Magnetic Resonance Images in space and time. IEEE Trans. Image Process. 12, 1579–1590 (2003)
19. Lysaker, M., Tai, X.-C.: Iterative image restoration combining total variation minimization and a second order functional. Int'l J. Computer Vision 66, 5–18 (2006)
20. Osher, S., Burger, M., Goldfarb, D., Xu, J.J., Yin, W.T.: An iterative regularization method for total variation-based image restoration. SIAM Multiscale Model. Simul. 4, 460–489 (2005)
21. Powell, M.J.D.: A method for nonlinear constraints in minimization problems. In: Fletcher, R. (ed.) Optimization, pp. 283–298. Academic Press, New York (1972)
22. Rockafellar, R.T.: A dual approach to solving nonlinear programming problems by unconstrained optimization. Mathematical Programming 5, 354–373 (1973)
23. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
24. Sapiro, G., Ringach, D.L.: Anisotropic diffusion of multivalued images with applications to color filtering. IEEE Trans. Image Process. 5, 1582–1586 (1996)
25. Scherzer, O.: Denoising with higher order derivatives of bounded variation and an application to parameter estimation. Computing 60, 1–27 (1998)
26. Steidl, G.: A note on the dual treatment of higher-order regularization functionals. Computing 76, 135–148 (2006)
27. Tschumperlé, D., Deriche, R.: Vector-valued image regularization with PDEs: a common framework for different applications. IEEE Trans. Pattern Anal. Machine Intell. 27, 506–517 (2005)
28. Wang, Y.L., Yin, W.T., Zhang, Y.: A fast algorithm for image deblurring with total variation regularization. UCLA CAM Report 07-22 (2007)
29. Wang, Y., Yang, J., Yin, W., Zhang, Y.: A new alternating minimization algorithm for total variation image reconstruction. SIAM Journal on Imaging Sciences (to appear)
30. Yin, W.T., Osher, S., Goldfarb, D., Darbon, J.: Bregman iterative algorithms for compressed sensing and related problems. SIAM J. Imaging Sciences 1, 143–168 (2008)
31. You, Y.-L., Kaveh, M.: Fourth-order partial differential equation for noise removal. IEEE Trans. Image Process. 9, 1723–1730 (2000)
32. Zhu, M., Wright, S.J., Chan, T.F.: Duality-based algorithms for total variation image restoration. UCLA CAM Report 08-33 (2008)
The Convergence of a Central-Difference Discretization of Rudin-Osher-Fatemi Model for Image Denoising
Ming-Jun Lai¹, Bradley Lucier², and Jingyue Wang³
¹ University of Georgia, Athens GA 30602, USA [email protected]
² Purdue University, West Lafayette IN 47907, USA [email protected]
³ University of Georgia, Athens GA 30602, USA [email protected]
Abstract. We study the connection between minimizers of the discrete and the continuous Rudin-Osher-Fatemi models. We use a central-difference total variation term in the discrete ROF model and treat the discrete input data as a projection of the continuous input data into the discrete space. We employ a method developed in [13], with slight adaptation to the setting of the central-difference total variation ROF model. We obtain an error bound between the discrete and the continuous minimizers in the $L^2$ norm under the assumption that the continuous input data are in $W^{1,2}$.
1
Introduction
One of the most influential variational models for image denoising is the total variation-based model proposed by Rudin, Osher and Fatemi (ROF) [10]. This model studies the following constrained minimization problem:
$$\arg\min_u |u|_{BV} \quad\text{with}\quad \int_\Omega u = \int_\Omega g \quad\text{and}\quad \int_\Omega |u - g|^2 = \sigma^2, \qquad (1)$$
where $g$ is the input data, $\sigma$ is the standard deviation of the noise, $\Omega$ is the unit square $[0, 1]^2$, and $|u|_{BV}$ is the total variation (TV) of $u$, defined as follows. We consider functions $\phi$ in the space of $C^1$ functions from $\Omega$ to $\mathbb R^2$ with compact support, i.e., $[C_0^1(\Omega)]^2$. The variation of a function $u \in L^1(\Omega)$ is then defined to be
$$|u|_{BV} := \int_\Omega |Du| := \sup_{\phi\in[C_0^1(\Omega)]^2,\ |\phi|\le 1 \text{ pointwise}}\ \int_\Omega u\,\nabla\cdot\phi.$$
For more details on functions of bounded variation, we refer the reader to [9].
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 514–526, 2009.
© Springer-Verlag Berlin Heidelberg 2009
The Convergence of a Central-Difference Discretization of ROF Model
The existence and uniqueness of the minimizer of (1) have been studied by Lions, Osher and Rudin [11] and more completely by Acar and Vogel [1]. Chambolle and Lions [4] proved that the constrained problem (1) is equivalent, for a suitable choice of $\lambda$, to the following unconstrained problem:
$$\arg\min_u |u|_{BV} + \frac{1}{2\lambda}\int_\Omega |u - g|^2. \qquad (2)$$
They also proved more general results on existence and uniqueness for (1). We call
$$E(u) = |u|_{BV} + \frac{1}{2\lambda}\int_\Omega |u - g|^2 \qquad (3)$$
the ROF energy functional. On the computing side, the most commonly used discrete variational model is based on the discrete energy
$$E_k(u) = \sum_{i,j=0}^{k-1}\mu_{i,j}|(\nabla u)_{i,j}| + \frac{1}{2\lambda}\sum_{i,j=0}^{k-1}\mu_{i,j}(u_{i,j} - g_{i,j})^2, \qquad (4)$$
where $u$ is a 2-dimensional array of size $k \times k$ and $\mu_{i,j}$ is a weight related to the scale $k$. A simple choice is $\mu_{i,j} = 1/k^2$. There are several possible choices for the discrete gradient operator $\nabla u$ [3], [5], [13]. A common choice is $(\nabla u)_{i,j} = \big((\nabla_x u)_{i,j}, (\nabla_y u)_{i,j}\big)$, with
$$(\nabla_x u)_{i,j} = \frac{u_{i+1,j} - u_{i,j}}{h}, \qquad (\nabla_y u)_{i,j} = \frac{u_{i,j+1} - u_{i,j}}{h},$$
where $h = 1/k$. On the boundary, $u$ is assumed to satisfy the discrete Neumann boundary conditions
$$u_{-1,j} = u_{0,j}, \qquad u_{k,j} = u_{k-1,j}, \qquad (5)$$
$$u_{i,-1} = u_{i,0}, \qquad u_{i,k} = u_{i,k-1}. \qquad (6)$$
The discrete function $g_{i,j}$ is the input image. Many efficient algorithms have been developed to find the numerical minimizer of (4) [6], [2], [3]. It is not hard to show that $E_k$ $\Gamma$-converges to $E$ (for the definition of $\Gamma$-convergence, we refer the reader to [7]); therefore, the sequence $\{u^k\}$ of minimizers of $E_k$ converges in $L^1(\Omega)$ to $u$, the minimizer of $E$, and $E_k(u^k)$ converges to $E(u)$ as $k$ tends to $\infty$ (cf. [7]). It is interesting to know the rate of convergence and whether convergence holds in other norms, e.g., in the $L^2$ norm. It is also interesting to see the difference between the continuous minimizer and the discrete minimizer. In [13] it was proved that if the discrete energy $E_k$ is equipped with the symmetrical discrete total variation defined in (7) and the discrete input data $g^k$ is the projection of the
M.-J. Lai, B. Lucier, and J. Wang
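continuous input data $g$ obtained by averaging $g$ over each pixel, then one can bound the error between the discrete minimizer $u^k$ and the continuous minimizer $u$ in the $L^2$ norm by the Lipschitz norm of $g$, provided that $g$ is in some Lipschitz space. The symmetrical discrete total variation is
$$\|u^k\|_{TV} = \frac{h^2}{4}\sum_{i,j=0}^{k-1}\left[\left(\Big(\frac{u^k_{i+1,j}-u^k_{i,j}}{h}\Big)^2 + \Big(\frac{u^k_{i,j+1}-u^k_{i,j}}{h}\Big)^2\right)^{1/2} + \left(\Big(\frac{u^k_{i+1,j}-u^k_{i,j}}{h}\Big)^2 + \Big(\frac{u^k_{i,j}-u^k_{i,j-1}}{h}\Big)^2\right)^{1/2} + \left(\Big(\frac{u^k_{i,j}-u^k_{i-1,j}}{h}\Big)^2 + \Big(\frac{u^k_{i,j+1}-u^k_{i,j}}{h}\Big)^2\right)^{1/2} + \left(\Big(\frac{u^k_{i,j}-u^k_{i-1,j}}{h}\Big)^2 + \Big(\frac{u^k_{i,j}-u^k_{i,j-1}}{h}\Big)^2\right)^{1/2}\right]. \qquad (7)$$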
In this paper, we extend the study in [13], [12] to the discrete ROF model equipped with a central-difference TV term, which is much simpler than the symmetrical discrete TV term. The ideas of this paper are exactly the same as those in [13]. However, a problem of the central-difference model is that it does not deal well with some non-smooth data, for example, a chessboard image. Thus we have to adapt the study in [13] slightly to this situation and put a stronger assumption on the input data $g$ in order to establish convergence. We can still get a similar error bound if the input data $g \in W^{1,2}$. More precisely, our main results are the following.
Theorem 1. If $g \in W^{1,2}$, $u$ is the minimizer of $E$ in (3), and $u^k$ is the minimizer of $E_k$ in (4) equipped with the central-difference TV operator (defined in (10)), then
$$|E(u) - E_k(u^k)| \le C\Big(1 + \frac{1}{\lambda}\Big)\big(\|g\|_{W^{1,2}} + \|g\|^2_{W^{1,2}}\big)h^{1/2}.$$
Theorem 2. If $g \in W^{1,2}$, $u$ is the minimizer of the functional $E$ in (3), and $u^k$ is the minimizer of the functional $E_k$ in (10), then
$$\|I_hu^k - u\|^2 \le C(\lambda + 1)\big(\|g\|_{W^{1,2}} + \|g\|^2_{W^{1,2}}\big)h^{1/2},$$
where $I_hu^k$ is the piecewise constant injection of $u^k$ into $L^2$. The definition of $I_hu^k$ will be given in (14) in the next section.
2
Preliminaries
A continuous image $u$ is defined as an $L^2$ function on $\Omega \subset \mathbb R^2$. In practice, we always assume $\Omega$ to be the unit square $[0, 1] \times [0, 1]$.
We assume the denoised output image to be in the space of functions of bounded variation. In the discrete setting, we consider the discrete set $\Omega^k$ to be the set of all pairs $i = (i_1, i_2) \in \mathbb Z^2$ with $0 \le i_1, i_2 \le k$. A discrete image $u^k$ is defined as a function on $\Omega^k$. Throughout this paper we use superscripts to indicate that a function is discrete. For discrete functions, we define the discrete $\ell^p(\Omega^k)$ norms
$$\|u^k\|_{\ell^p(\Omega^k)} := \Big(\sum_{i\in\Omega^k}|u^k_i|^p\mu_i\Big)^{1/p} \qquad\text{for } 1 \le p < \infty$$
(for $p = \infty$, the sum is replaced by $\max_i |u^k_i|$), where $\mu_i$ is the measure of the discrete space at each index $i$. The simplest choice is $\mu_i = 1$ for $i \in \Omega^k$. By analogy with Sobolev norms, we define discrete Sobolev norms as follows. The first-order forward finite differences of $u^k$ at the point $i = (i_1, i_2)$ are
$$\Delta_x^+u^k_i = \frac{u^k_{i_1+1,i_2} - u^k_{i_1,i_2}}{h}, \qquad \Delta_y^+u^k_i = \frac{u^k_{i_1,i_2+1} - u^k_{i_1,i_2}}{h},$$
where $h = 1/k$ is the step size. We can also define backward finite differences as
$$\Delta_x^-u^k_i = \frac{u^k_{i_1,i_2} - u^k_{i_1-1,i_2}}{h}, \qquad \Delta_y^-u^k_i = \frac{u^k_{i_1,i_2} - u^k_{i_1,i_2-1}}{h}.$$
One can define the second-order finite difference as
$$\Delta_{xx}u^k_i = \frac{\Delta_x^+u^k_i - \Delta_x^-u^k_i}{h},$$
and $\Delta_{yy}u^k_i$ is defined similarly. We define $\|\nabla u^k\|_1$, $\|\Delta_{xx}u^k\|_1$ and $\|\Delta_{yy}u^k\|_1$ as
$$\|\nabla u^k\|_1 := \sum_i\big(|\Delta_x^+u^k_i| + |\Delta_y^+u^k_i|\big)\mu_i; \qquad (8)$$
$$\|\Delta_{xx}u^k\|_1 := \sum_i|\Delta_{xx}u^k_i|\mu_i, \qquad \|\Delta_{yy}u^k\|_1 := \sum_i|\Delta_{yy}u^k_i|\mu_i. \qquad (9)$$
In this paper, we study the error bound for the central-difference discrete ROF model, whose energy functional is defined as
$$E_c(u^k) = J_c(u^k) + \frac{1}{2\lambda}\|u^k - g^k\|_c^2, \qquad (10)$$
where the BV term $J_c$ is defined by
$$J_c(u^k) := \sum_{i\in\Omega^k}\sqrt{|\Delta^c_xu^k_i|^2 + |\Delta^c_yu^k_i|^2}\;\mu_i, \qquad (11)$$
and $\Delta^c_xu^k_i$ and $\Delta^c_yu^k_i$ at $i := (i_1, i_2)$ are defined by
$$\Delta^c_xu^k_i = \frac{u^k_{i_1+1,i_2} - u^k_{i_1-1,i_2}}{2h}, \qquad \Delta^c_yu^k_i = \frac{u^k_{i_1,i_2+1} - u^k_{i_1,i_2-1}}{2h}.$$
Here $u^k$ satisfies the discrete Neumann boundary condition
$$u^k_{-1,j} = u^k_{1,j}, \qquad u^k_{k+1,j} = u^k_{k-1,j}, \qquad u^k_{i,-1} = u^k_{i,1}, \qquad u^k_{i,k+1} = u^k_{i,k-1}.$$
The discrete space measure is $\mu_i = |\Omega_i|$, where $\Omega_i$ is the intersection of $\Omega$ and the square with center $ih$ and side length $h$:
$$\Omega_i := \Omega \cap \Big[i_1h - \frac{h}{2}, i_1h + \frac{h}{2}\Big]\times\Big[i_2h - \frac{h}{2}, i_2h + \frac{h}{2}\Big]. \qquad (12)$$
It is straightforward to calculate
$$\mu_i = \begin{cases} h^2/4, & (i_1, i_2)\in\{(0,0), (0,k), (k,0), (k,k)\},\\ h^2/2, & i_1\in\{0,k\},\ 0 < i_2 < k \ \text{ or }\ i_2\in\{0,k\},\ 0 < i_1 < k,\\ h^2, & 0 < i_1, i_2 < k.\end{cases} \qquad (13)$$
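The weights (13) and the central-difference term (11) translate directly into array code. In this NumPy sketch (function names hypothetical), the reflecting Neumann condition above is realized with `np.pad(..., mode="reflect")`, which mirrors about the edge node so that $u_{-1} = u_1$. The usage lines also make the chessboard remark in the introduction concrete: central differences skip the neighbouring pixel, so an alternating $\pm 1$ pattern has $J_c = 0$.

```python
import numpy as np

def cell_measures(k):
    """mu_i = |Omega_i| of Eq. (13) on the (k+1) x (k+1) grid Omega^k."""
    h = 1.0 / k
    w = np.full(k + 1, h)       # 1D cell length: h at interior nodes,
    w[0] = w[-1] = h / 2        # h/2 at the two boundary nodes
    return np.outer(w, w)       # products give h^2, h^2/2 and h^2/4

def central_tv(u):
    """J_c(u^k) of Eq. (11), using the reflecting Neumann condition
    u_{-1,j} = u_{1,j}, u_{k+1,j} = u_{k-1,j} (and the same in y)."""
    k = u.shape[0] - 1
    h = 1.0 / k
    p = np.pad(u, 1, mode="reflect")  # 'reflect' mirrors about the edge node
    dx = (p[2:, 1:-1] - p[:-2, 1:-1]) / (2 * h)
    dy = (p[1:-1, 2:] - p[1:-1, :-2]) / (2 * h)
    return float(np.sum(np.sqrt(dx ** 2 + dy ** 2) * cell_measures(k)))

k = 4
ii, jj = np.indices((k + 1, k + 1))
ramp = ii / k                                   # u_i = i_1 h
chess = np.where((ii + jj) % 2 == 0, 1.0, -1.0)
print(cell_measures(k).sum())   # 1.0, the area of the unit square
print(central_tv(ramp))         # 0.75 = (k - 1)/k: boundary rows get zero slope
print(central_tv(chess))        # 0.0: the chessboard is invisible to J_c
```

The zero value for the chessboard is why an assumption such as $g \in W^{1,2}$ is needed: $J_c$ alone cannot penalize this highest-frequency oscillation.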
The $\ell^2$ term is defined by
$$\|u^k - g^k\|_c^2 = \sum_{i,j=0}^{k}|u^k_{i,j} - g^k_{i,j}|^2\mu_{i,j}.$$
We often need to extend $u \in L^p(\Omega)$ and $u^k \in \ell^p(\Omega^k)$ to all of $\mathbb R^2$ and $\mathbb Z^2$, respectively; we denote the extensions by $\mathrm{Ext}\,u$ and $\mathrm{Ext}_ku^k$. For $u \in L^p(\Omega)$, we use the following procedure. First,
$$\mathrm{Ext}\,u(x) = u(x), \qquad x\in\Omega.$$
We then reflect horizontally across the line $x_1 = 1$,
$$\mathrm{Ext}\,u(x_1, x_2) = \mathrm{Ext}\,u(2 - x_1, x_2), \qquad 1 \le x_1 \le 2,\ 0 \le x_2 \le 1,$$
and reflect again vertically across the line $x_2 = 1$,
$$\mathrm{Ext}\,u(x_1, x_2) = \mathrm{Ext}\,u(x_1, 2 - x_2), \qquad 0 \le x_1 \le 2,\ 1 \le x_2 \le 2.$$
Having defined $\mathrm{Ext}\,u$ on $2\Omega$, we then extend it periodically with period $(2, 2)$ to all of $\mathbb R^2$. We use a similar construction for discrete functions $u^k$. First we extend $u^k$ to $2\Omega^k := \{i = (i_1, i_2)\in\mathbb Z^2 \mid 0 \le i_1, i_2 \le 2k\}$ as follows:
$$\mathrm{Ext}_ku^k_i = u^k_i, \qquad i\in\Omega^k;$$
then we reflect horizontally,
$$\mathrm{Ext}_ku^k_{(i_1,i_2)} = \mathrm{Ext}_ku^k_{(2k-i_1,i_2)}, \qquad k+1 \le i_1 \le 2k,\ 0 \le i_2 \le k,$$
and then vertically,
$$\mathrm{Ext}_ku^k_{(i_1,i_2)} = \mathrm{Ext}_ku^k_{(i_1,2k-i_2)}, \qquad 0 \le i_1 \le 2k,\ k+1 \le i_2 \le 2k.$$
Now that $\mathrm{Ext}_ku^k$ is defined on $2\Omega^k$, we extend it periodically with period $(2k, 2k)$ to all of $\mathbb Z^2$. Note that with this definition, the value of $\mathrm{Ext}_ku^k$ at any point immediately "outside" $\Omega^k$ equals the value of $u^k$ at its mirror image across the boundary, e.g., $\mathrm{Ext}_ku^k_{(k+1,i_2)} = u^k_{(k-1,i_2)}$ and $\mathrm{Ext}_ku^k_{(-1,i_2)} = u^k_{(1,i_2)}$, in agreement with the discrete Neumann condition used for $\Delta^c_x$ and $\Delta^c_y$. We sometimes need to inject functions into $L^2(\Omega)$ or project them into the discrete space $\ell^2(\Omega^k)$. We use the piecewise constant injector to inject a discrete function $u^k$ into $L^p(\Omega)$:
$$(I_hu^k)(x) = u^k_i \qquad\text{for } x\in\Omega_i. \qquad (14)$$
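The discrete extension $\mathrm{Ext}_k$ described above amounts to an index map: reduce modulo $2k$, then fold the upper half back by $i \mapsto 2k - i$. The pure-Python sketch below (helper names hypothetical) implements that map for a $(k+1)\times(k+1)$ nested list.

```python
def ext_index(i, k):
    """Index map of Ext_k: periodic extension with period 2k (Python's %
    is non-negative), then the even reflection i -> 2k - i on [k+1, 2k]."""
    i %= 2 * k
    return i if i <= k else 2 * k - i

def ext_k(u, i1, i2):
    """Value of Ext_k u^k at (i1, i2) in Z^2 for a (k+1) x (k+1) nested list u."""
    k = len(u) - 1
    return u[ext_index(i1, k)][ext_index(i2, k)]

k = 4
u = [[10 * a + b for b in range(k + 1)] for a in range(k + 1)]
print(ext_k(u, k + 1, 2), ext_k(u, -1, 2), ext_k(u, 2 * k + 3, -1))  # 32 12 31
```

The first two values show the mirror-image property across the boundaries $i_1 = k$ and $i_1 = 0$; the third combines periodicity with reflection in both directions.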
We also define an injector $L_h$ into a space of continuous, piecewise linear functions. In fact, $L_hu^k$ is the linear interpolation of the discrete values $\{u^k_i\}$ on a triangulation with vertices $h\mathbb Z^2$:
$$L_hu^k = \sum_{i\in\Omega^k}u^k_i\phi^k_i. \qquad (15)$$
Here $\phi^k_i$ is a dilated and translated tent function,
$$\phi^k_i(x) := \phi^k_{i_1,i_2}(x_1, x_2) := \phi(x_1/h - i_1, x_2/h - i_2), \qquad (16)$$
where $\phi$ is the tent function that is continuous on $\mathbb R^2$, supported in the hexagon shown in Fig. 1, linear on each triangle shown in Fig. 1, and satisfies
$$\phi(i_1, i_2) = \begin{cases}0, & (i_1, i_2)\in\mathbb Z^2\setminus\{(0, 0)\},\\ 1, & (i_1, i_2) = (0, 0).\end{cases}$$
We also consider the piecewise constant projector of $u \in L^1(\Omega)$ onto the space of discrete functions, defined by
$$(P_ku)_i = \frac{1}{|\Omega_i|}\int_{\Omega_i}u, \qquad i\in\Omega^k,$$
where $|\Omega_i| = \mu_i$ is the measure of $\Omega_i$ defined in (12). We need both continuous and discrete smoothing operators, which we define as follows. Assume that $\eta(x)$ is a fixed non-negative, rotationally symmetric mollifier with support in the unit disk that is $C^\infty$ and has integral 1. For $\epsilon > 0$ we define the scaled function
$$\eta_\epsilon(x) := \frac{1}{\epsilon^2}\,\eta\Big(\frac{x}{\epsilon}\Big), \qquad x\in\mathbb R^2.$$
[Figure: hexagonal support of $\phi$ with vertices $(1, 0)$, $(0, 1)$, $(-1, 1)$, $(-1, 0)$, $(0, -1)$, $(1, -1)$ around the center $(0, 0)$]
Fig. 1. The support of φ
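The hexagonal support in Fig. 1 corresponds to splitting each lattice square along its anti-diagonal. One closed form consistent with that picture (our own derivation, not given in the text) is $\phi(x) = \max\big(0, 1 - \max(|x_1|, |x_2|, |x_1 + x_2|)\big)$. The pure-Python sketch below uses it to evaluate $L_hu^k$ from (15)-(16); the function names are hypothetical. Because piecewise linear interpolation is exact for linear data, $L_h$ should reproduce a linear function anywhere in $\Omega$.

```python
def tent(x1, x2):
    """Hat function phi on the triangulation of Fig. 1: 1 at the origin,
    0 at all other lattice points, linear on each of the six triangles."""
    return max(0.0, 1.0 - max(abs(x1), abs(x2), abs(x1 + x2)))

def L_h(u, x1, x2):
    """Evaluate L_h u^k = sum_i u_i * phi(x1/h - i1, x2/h - i2) at (x1, x2)
    for a (k+1) x (k+1) nested list u of nodal values on the unit square."""
    k = len(u) - 1
    h = 1.0 / k
    total = 0.0
    for i1 in range(k + 1):
        for i2 in range(k + 1):
            total += u[i1][i2] * tent(x1 / h - i1, x2 / h - i2)
    return total

k = 4
# Nodal values of the linear function f(x) = 2 x1 + 3 x2.
u = [[2 * i1 / k + 3 * i2 / k for i2 in range(k + 1)] for i1 in range(k + 1)]
print(L_h(u, 0.3, 0.55))  # approximately f(0.3, 0.55) = 2.25
```

At a node the sum collapses to the single nodal value, since every other tent function vanishes there.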
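We smooth a function $u \in L^p(\Omega)$, $1 \le p \le \infty$, by computing
$$(\eta_\epsilon * \mathrm{Ext}\,u)(x) = \int_{\mathbb R^2}\eta_\epsilon(x - y)\,\mathrm{Ext}\,u(y)\,dy, \qquad x\in 2\Omega.$$
The discrete smoothing operator $S_L$ is defined by
$$(S_Lu^k)_i = \frac{1}{(2L+1)^2}\sum_{j_1,j_2=-L}^{L}u^k_{i+(j_1,j_2)} \qquad\text{for } i\in\Omega^k.$$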
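A pure-Python sketch of the box average $S_L$. The text does not say how indices outside $\Omega^k$ are handled; here we assume they are read from the reflected extension $\mathrm{Ext}_ku^k$, which is our assumption. The function name is hypothetical.

```python
def smooth_SL(u, L):
    """(S_L u^k)_i = (2L+1)^{-2} * sum over |j1|, |j2| <= L of u^k_{i+(j1,j2)}.
    Values outside Omega^k are read from the reflected extension Ext_k
    (an assumed boundary treatment)."""
    k = len(u) - 1

    def ext(i):  # Ext_k index map: period 2k, then reflection i -> 2k - i
        i %= 2 * k
        return i if i <= k else 2 * k - i

    n = (2 * L + 1) ** 2
    return [[sum(u[ext(i1 + j1)][ext(i2 + j2)]
                 for j1 in range(-L, L + 1) for j2 in range(-L, L + 1)) / n
             for i2 in range(k + 1)]
            for i1 in range(k + 1)]

k = 8
ramp = [[i1 for _ in range(k + 1)] for i1 in range(k + 1)]
s = smooth_SL(ramp, 2)
print(s[4][0], s[0][0])  # 4.0 in the interior, 1.2 at the reflected edge
```

Away from the boundary the symmetric average leaves a linear ramp unchanged; at the edge, the mirrored values raise the average, which is the kind of local error Lemma 6 controls through the discrete modulus of smoothness.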
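For $u \in L^p(\Omega)$ we define the (first-order) $L^p(\Omega)$ modulus of smoothness by
$$\omega(u, t)_{L^p(\Omega)} = \sup_{\tau\in\mathbb R^2,\ |\tau|\le t}\Big(\int_{x,\,x+\tau\in\Omega}|u(x + \tau) - u(x)|^p\,dx\Big)^{1/p}.$$
We also define
$$\omega(\mathrm{Ext}\,u, t)_{L^p(2\Omega)} := \sup_{\tau\in\mathbb R^2,\ |\tau|\le t}\|\mathrm{Ext}\,u(\cdot + \tau) - \mathrm{Ext}\,u\|_{L^p(2\Omega)}.$$
We also need a discrete modulus of smoothness. To begin, we define the translation operator
$$(T_\ell(u^k))_i := u^k_{i+\ell} \qquad (17)$$
for any $\ell = (\ell_1, \ell_2)\in\mathbb Z^2$. We define the norm $|\ell| = |\ell_1| + |\ell_2|$ on $\mathbb Z^2$; the discrete modulus of smoothness is then
$$\omega(u^k, m)_p := \sup_{\ell\in\mathbb Z^2,\ |\ell|\le m}\Big(\sum_{i,\,i+\ell\in\Omega^k}|u^k_{i+\ell} - u^k_i|^p\mu_i\Big)^{1/p}.$$
For $\mathrm{Ext}_ku^k$ we define similarly
$$\omega(u^k, m)_{\ell^p(2\Omega^k)} = \sup_{\ell\in\mathbb Z^2,\ |\ell|\le m}\|T_\ell u^k - u^k\|_{\ell^p(2\Omega^k)}.$$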
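The discrete modulus of smoothness $\omega(u^k, m)_p$ is a finite maximum over shifts $\ell$ with $|\ell_1| + |\ell_2| \le m$, so it can be evaluated by brute force. This pure-Python sketch (names hypothetical) uses the weights $\mu_i$ of (13) and sums only over indices with both $i$ and $i + \ell$ in $\Omega^k$.

```python
def cell_weight(i, k):
    """1D factor of mu_i in Eq. (13): h/2 at the two end nodes, h inside."""
    h = 1.0 / k
    return h / 2 if i in (0, k) else h

def discrete_modulus(u, m, p=2):
    """omega(u^k, m)_p: max over shifts l with |l1| + |l2| <= m of
    (sum over i, i + l in Omega^k of |u_{i+l} - u_i|^p mu_i)^(1/p)."""
    k = len(u) - 1
    best = 0.0
    for l1 in range(-m, m + 1):
        r = m - abs(l1)
        for l2 in range(-r, r + 1):
            s = 0.0
            # Keep only i with 0 <= i <= k and 0 <= i + l <= k.
            for i1 in range(max(0, -l1), min(k, k - l1) + 1):
                for i2 in range(max(0, -l2), min(k, k - l2) + 1):
                    mu = cell_weight(i1, k) * cell_weight(i2, k)
                    s += abs(u[i1 + l1][i2 + l2] - u[i1][i2]) ** p * mu
            best = max(best, s ** (1.0 / p))
    return best

k = 4
ramp = [[i1 / k for _ in range(k + 1)] for i1 in range(k + 1)]
print(discrete_modulus(ramp, 1), discrete_modulus(ramp, 2))
```

For the ramp $u_i = i_1h$ the maximum is attained by the purely horizontal shift $\ell = (m, 0)$, and the value grows roughly like $mh$, mirroring the continuous bound $\omega(u, t) \le t\,\|\nabla u\|$ for smooth $u$.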
3
Basic Properties
We begin with the following properties.
Lemma 1. (Contraction) Let $u$, $v$ be the minimizers of problem (2) for input data $f$ and $g$, respectively. Then $\|u - v\|_{L^2} \le \|f - g\|_{L^2}$.
See [13] or [12] for a proof. Using this property, one obtains the following.
Lemma 2. (Continuity of translation) Assume $u$ is the minimizer of $E$ in problem (2) for input data $g$, and extend $u$ to $\mathrm{Ext}\,u$ over $\mathbb R^2$ by the symmetric extension defined before. Then
$$\|\mathrm{Ext}\,u(x + h) - \mathrm{Ext}\,u(x)\|_{L^2(\Omega)} \le \omega(g, |h|)_{L^2(\Omega)}.$$
Remark 1. One can conclude from Lemma 2 that
$$\omega(u, |h|)_{L^2(\Omega)} \le \omega(g, |h|)_{L^2(\Omega)}. \qquad (18)$$
Remark 2. Similar techniques show that this result also holds in the discrete case for $u^k$ and $g^k$, where $u^k$ is the minimizer of the discrete energy $E_k$ with the symmetric discrete TV operator $J_c$ and $u^k$ is extended to $\mathbb Z^2$ as before. In fact, the corresponding discrete version is
$$\|T_\ell(u^k) - u^k\|_{\ell^2(A)} \le C\,\omega(g^k, |\ell|)_{\ell^2(A)}, \qquad (19)$$
where $A$ is the index set $\{i := (i_1, i_2) : 0 \le i_1, i_2 \le k\}$. For any discrete image $v^k$, the discrete modulus of continuity is
$$\omega_1(v^k, m)_{\ell^2(A)} := \sup_{0<|\ell|\le m}\|T_\ell(v^k) - v^k\|_{\ell^2(A_{n_1,n_2})}, \qquad (20)$$
with $T_\ell$ the translation operator defined in (17), $\ell = (n_1, n_2)$, and $A_{n_1,n_2} := \{(i, j) : (i, j)\in A,\ (i + n_1, j + n_2)\in A\}$.
Lemma 3. (Maximum principle) Suppose $u^k$ is the minimizer of $E_k$ and $g^k \in \ell^\infty$. Then $\|u^k\|_\infty \le \|g^k\|_\infty$.
The following lemmas bound the errors introduced by the injectors and projectors defined before.
Lemma 4. Let $u \in L^2(\Omega)$ and $u^k \in \ell^2(\Omega^k)$. Then there exists a constant $C$ such that the following properties hold:
a) $\|P_ku\|_{\ell^2} \le \|u\|_{L^2}$;
b) $\omega(P_ku, m)_{\ell^2} \le C\,\omega(u, mh)_{L^2}$;
c) $\|u^k\|_{\ell^2} = \|I_hu^k\|_{L^2}$;
d) $\omega(I_hu^k, mh)_{L^2} \le C\,\omega(u^k, m)_{\ell^2}$;
e) $\|u - I_hP_ku\|_{L^2} \le C\,\omega(u, h)_{L^2}$.
The following lemma bounds the difference between the two injectors defined in (14) and (15).
Lemma 5. $\|L_hu^k - I_hu^k\|_{L^2} \le C\,\omega(u^k, 1)_{\ell^2}$.
The following lemmas show the properties of the smoothing operators.
Lemma 6.
$$\|S_Lu^k - u^k\|_{\ell^2} \le \omega(u^k, L)_{\ell^2}, \qquad (21)$$
$$J_c(S_Lu^k) \le J_c(u^k), \qquad (22)$$
and
$$\|\Delta_{xx}S_Lu^k\|_1 + \|\Delta_{yy}S_Lu^k\|_1 \le \frac{C}{Lh}\|\nabla u^k\|_1. \qquad (23)$$
The first inequality in Lemma 6 shows that the error between $u^k$ and the smoothed $u^k$ can be bounded by its discrete modulus of continuity. The second inequality shows that smoothing does not increase the discrete total variation. The last inequality shows that the second-order differences of the smoothed function can be bounded by its first-order finite differences. Lemma 7 is the continuous analogue of Lemma 6.
Lemma 7.
$$\|\eta_\epsilon * u - u\|_{L^2} \le \omega(u, \epsilon)_{L^2}, \qquad (24)$$
$$|\eta_\epsilon * u|_{BV} \le |u|_{BV}, \qquad (25)$$
and
$$\|D_{xx}(\eta_\epsilon * u)\|_{L^1} + \|D_{yy}(\eta_\epsilon * u)\|_{L^1} \le \frac{C}{\epsilon}|u|_{BV}. \qquad (26)$$
4
Proof of the Main Result
4.1
Main Idea
Recall that the continuous and discrete ROF energy functionals are defined by
$$E(v) = |v|_{BV} + \frac{1}{2\lambda}\|v - g\|^2, \qquad (27)$$
$$E_k(v^k) = J_c(v^k) + \frac{1}{2\lambda}\|v^k - g^k\|_c^2 \qquad (28)$$
with input image $g^k = P_kg$. To study the difference between $E_k(u^k)$ and $E(u)$, first note that $E_k$ and $E$ are two different functionals defined on different spaces: $E$ is defined on the continuous space $BV(\Omega)$, while $E_k$ is defined on a discrete function space. Therefore, some connection between these two functionals must be built. We use two energy bounds to bridge them. First, given a discrete minimizer $u^k$ of $E_k$, we inject $u^k$ into $L^2$ as the function $L_hS_Lu^k$, with $E(L_hS_Lu^k)$ bounded by $E_k(u^k)$ plus some error. $L_hS_Lu^k$ is constructed by first smoothing $u^k$ to $S_Lu^k$ and then linearly interpolating $S_Lu^k$. Assuming $u$ is the minimizer of $E$, we have
$$E(u) \le E(L_hS_Lu^k) \le E_k(u^k) + e_{g,h}, \qquad (29)$$
where $e_{g,h}$ is the error to be bounded in the next subsection; it depends on the input $g$ and the mesh size $h$, and tends to zero as $h$ tends to zero. The second energy bound is similar but taken in the opposite direction. Based on $u$, we construct a smoothed discrete function $P_k(\eta_\epsilon * u)$ by first smoothing $u$ and then projecting it into the discrete function space, with $E_k(P_k(\eta_\epsilon * u))$ bounded by $E(u)$ plus an error term $e'_{g,h}$ similar to $e_{g,h}$. By the definition of $u^k$, we have
$$E_k(u^k) \le E_k(P_k(\eta_\epsilon * u)) \le E(u) + e'_{g,h}. \qquad (30)$$
From (29) we see $E(u) - E_k(u^k) \le e_{g,h}$; from (30), $E_k(u^k) - E(u) \le e'_{g,h}$; then we conclude that
$$|E_k(u^k) - E(u)| \le \max\{e_{g,h}, e'_{g,h}\}.$$
This completes our error bound.
4.2
Sketch of the Proof
Proposition 1. If $g \in W^{1,2}$, and $u^k$, $u$ are the minimizers of $E_k$, $E$ in (28), (27), respectively, then
$$E(u) \le E_k(u^k) + C\Big(1 + \frac{1}{\lambda}\Big)\big(\|g\|_{W^{1,2}} + \|g\|^2_{W^{1,2}}\big)h^{1/2}.$$
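Proof. We shall bound the energy $E(L_hS_Lu^k)$. It is straightforward (albeit tedious) to calculate that its TV term satisfies
$$|L_hS_Lu^k|_{BV} \le J_c(S_Lu^k) + Ch\big(\|\Delta_{xx}S_Lu^k\|_1 + \|\Delta_{yy}S_Lu^k\|_1\big).$$
By the properties (22) and (23) of the discrete smoothing operator in Lemma 6,
$$|L_hS_Lu^k|_{BV} \le J_c(u^k) + \frac{C}{L}\|\nabla u^k\|_1.$$
By Hölder's inequality and Lemma 2 (in its discrete form (19)), $\|\nabla u^k\|_1$ is bounded by
$$\begin{aligned}\|\nabla u^k\|_1 &= \sum_i\big(|\Delta_x^+u^k_i| + |\Delta_y^+u^k_i|\big)\mu_i\\ &\le C\Big[\Big(\sum_i|\Delta_x^+u^k_i|^2\mu_i\Big)^{1/2} + \Big(\sum_i|\Delta_y^+u^k_i|^2\mu_i\Big)^{1/2}\Big]\\ &\le \frac{C}{h}\big(\|T_{(1,0)}u^k - u^k\| + \|T_{(0,1)}u^k - u^k\|\big)\\ &\le \frac{C}{h}\,\omega(g^k, 1)_2 \qquad\text{by (19)}\\ &\le C\|g\|_{W^{1,2}}.\end{aligned}$$
Hence
$$|L_hS_Lu^k|_{BV} \le J_c(u^k) + \frac{C}{L}\|g\|_{W^{1,2}}.$$
The $L^2$ term of $E(L_hS_Lu^k)$ can be written as
$$\begin{aligned}\|L_hS_Lu^k - g\|_{L^2} &= \|(L_hS_Lu^k - I_hS_Lu^k) + (I_hS_Lu^k - I_hu^k) + (I_hu^k - I_hg^k) + (I_hg^k - g)\|_{L^2}\\ &\le \|u^k - g^k\|_c + C(Lh)\|g\|_{W^{1,2}}.\end{aligned}$$
Applying the properties of the injectors and projectors (Lemmas 4 and 5), and noting the assumption $Lh \le 1$ and the fact that $\|u^k - g^k\|_c \le \|g^k\|_c \le \|g\|$, we obtain
$$\|L_hS_Lu^k - g\|^2_{L^2} \le \|u^k - g^k\|_c^2 + C(Lh)\|g\|^2_{W^{1,2}}.$$
Thus
$$\begin{aligned}E(L_hS_Lu^k) &= |L_hS_Lu^k|_{BV} + \frac{1}{2\lambda}\|L_hS_Lu^k - g\|^2_{L^2}\\ &\le J_c(u^k) + \frac{C}{L}\|g\|_{W^{1,2}} + \frac{1}{2\lambda}\|u^k - g^k\|_c^2 + \frac{C}{\lambda}(Lh)\|g\|^2_{W^{1,2}}\\ &= E_k(u^k) + \frac{C}{L}\|g\|_{W^{1,2}} + \frac{C}{\lambda}(Lh)\|g\|^2_{W^{1,2}}.\end{aligned}$$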
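The choice $L = h^{-1/2}$ made next can be motivated by balancing the two error terms in the last bound. The worked derivation below is our addition, not part of the original text; $a$ and $b$ are shorthand labels we introduce for the two coefficients. Minimizing over $L$ gives the rate $h^{1/2}$, and the choice $L = h^{-1/2}$ attains it while keeping $Lh = h^{1/2} \le 1$. If $L$ must be an integer, taking $\lfloor h^{-1/2}\rfloor \ge h^{-1/2}/2$ (valid for $h \le 1$) changes only the constant.

```latex
f(L) = \frac{a}{L} + b\,L h, \qquad a = C\|g\|_{W^{1,2}}, \quad b = \frac{C}{\lambda}\|g\|^2_{W^{1,2}},
\\
f'(L) = -\frac{a}{L^2} + b h = 0 \;\Longrightarrow\; L_* = \sqrt{\frac{a}{b h}} \propto h^{-1/2}, \qquad f(L_*) = 2\sqrt{ab}\,h^{1/2},
\\
f\big(h^{-1/2}\big) = (a + b)\,h^{1/2} \le C\Big(1 + \frac{1}{\lambda}\Big)\big(\|g\|_{W^{1,2}} + \|g\|^2_{W^{1,2}}\big)h^{1/2}.
```

Any other power $L = h^{-s}$ with $s \ne 1/2$ leaves one of the two terms decaying more slowly than $h^{1/2}$.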
Setting $L = h^{-1/2}$, we obtain the result of this proposition. Using a similar method, we prove the following.
Proposition 2. If $g \in W^{1,2}$, and $u$, $u^k$ are the minimizers of $E$, $E_k$ in (27), (28), respectively, then
$$E_k(u^k) \le E(u) + C\Big(1 + \frac{1}{\lambda}\Big)\big(\|g\|_{W^{1,2}} + \|g\|^2_{W^{1,2}}\big)h^{1/2}.$$
Combining Propositions 1 and 2 immediately yields the following.
Theorem 1. If $g \in W^{1,2}$, and $u$, $u^k$ are the minimizers of $E$, $E_k$ in (27), (28), respectively, then
$$|E(u) - E_k(u^k)| \le C\Big(1 + \frac{1}{\lambda}\Big)\big(\|g\|_{W^{1,2}} + \|g\|^2_{W^{1,2}}\big)h^{1/2}.$$
Next we need the following lemma.
Lemma 8. If $u$ is the minimizer of $E$ in (27), then for any $v \in BV$,
$$\|v - u\|^2 \le 2\lambda\big(E(v) - E(u)\big). \qquad (31)$$
A proof of this lemma can be found in [13] or [12]. It then follows that:
Theorem 2. If $g \in W^{1,2}$, and $u$, $u^k$ are the minimizers of $E$, $E_k$ in (27), (28), respectively, then
$$\|I_hu^k - u\|^2 \le C(\lambda + 1)\big(\|g\|_{W^{1,2}} + \|g\|^2_{W^{1,2}}\big)h^{1/2}.$$
Remark 3. In this paper, we have proved an error bound for the discrete ROF model equipped with a central-difference TV term, using the method suggested in [13]. This model is simpler in form than the model studied in [13], where a symmetrical TV term is used, and it is also slightly easier to compute with Chambolle's method (cf. [3]). However, the central-difference model fails to handle a class of data, for example a chessboard image. Thus we have to put a stronger assumption on the input data ($g \in W^{1,2}$) to obtain the error bound, and this assumption may not be satisfied by all real images. Nevertheless, this result still shows that the method in [13] can be extended to other symmetric discrete TV operators. It would also be interesting to study whether a similar error bound for this model can be obtained without this assumption.
References
1. Acar, R., Vogel, C.R.: Analysis of bounded variation penalty methods for ill-posed problems. Inverse Problems 10, 1217–1229 (1994)
2. Carter, J.L.: Dual Methods for Total Variation-Based Image Restoration. Ph.D. thesis, U.C.L.A. (2001)
M.-J. Lai, B. Lucier, and J. Wang
3. Chambolle, A.: An algorithm for total variation minimization and applications. Journal of Mathematical Imaging and Vision 20(1-2), 89–97 (2004)
4. Chambolle, A., Lions, P.-L.: Image recovery via total variation minimization and related problems. Numer. Math. 76(2), 167–188 (1997)
5. Chambolle, A., Levine, S., Lucier, B.: ROF image smoothing: some computational comments, draft (2008)
6. Chan, T.F., Golub, G.H., Mulet, P.: A nonlinear primal-dual method for total variation-based image restoration. SIAM J. Sci. Comput. 20(6), 1964–1977 (1999)
7. Dal Maso, G.: An Introduction to Γ-Convergence. Birkhäuser, Boston (1993)
8. DeVore, R., Lorentz, G.: Constructive Approximation. Springer, Heidelberg (1993)
9. Evans, L., Gariepy, R.: Measure theory and fine properties of functions. CRC Press, Boca Raton (1992)
10. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
11. Lions, P.-L., Osher, S.J., Rudin, L.: Denoising and deblurring using constrained nonlinear partial differential equations. Tech. Rep., Cognitech Inc., Santa Monica, CA, submitted to SINUM
12. Wang, J., Lucier, B.: Error bounds for numerical methods for the ROF image smoothing model (2008) (in preparation)
13. Wang, J.: Error Bounds for Numerical Methods for the ROF Image Smoothing Model. Ph.D. thesis, Purdue (2008)
Theoretical Foundations for Discrete Forward-and-Backward Diffusion Filtering
Martin Welk¹, Guy Gilboa², and Joachim Weickert¹
¹ Mathematical Image Analysis Group, Faculty of Mathematics and Computer Science, Campus E1.1, Saarland University, 66041 Saarbrücken, Germany
{welk,weickert}@mia.uni-saarland.de
http://www.mia.uni-saarland.de
² 3DV Systems, 2nd Carmel St., Industrial Park Building 1, P.O. Box 249, Yokneam, 20692, Israel
Abstract. Forward-and-backward (FAB) diffusion is a method for sharpening blurry images (Gilboa et al. 2002). It combines forward diffusion with a positive diffusivity and backward diffusion where negative diffusivities are used. The well-posedness properties of FAB diffusion are unknown, and it has been observed that standard discretisations can violate a maximum–minimum principle. We show that for a novel nonstandard space discretisation which pays specific attention to image extrema, one can apply a modification of the space-discrete well-posedness and scale-space framework of Weickert (1998). This allows us to establish well-posedness and a maximum–minimum principle for the resulting dynamical system. In the fully discrete 1-D case with an explicit time discretisation, a maximum–minimum principle and total variation reduction are proven in spite of the fact that negative diffusivities may appear. This provides a theoretical justification for applying FAB diffusion to digital images.
1 Introduction
In the last two decades, many partial differential equations (PDEs) and variational approaches have been proposed for enhancing digital images; see e.g. [1, 13] for an overview. The continuous framework behind these models offers advantages such as transparent and compact formulations where rotationally invariant approaches are easy to model. However, some of the most interesting models are difficult to analyse in the continuous setting due to well-posedness problems. Often these filters work well in practice, but lack a sound continuous theory. This has prompted researchers to investigate well-posedness properties for space-discrete and fully discrete formulations. Let us mention a few examples. For the Perona–Malik filter, Weickert [13] has proposed a space-discrete and fully discrete theory for smooth nonnegative diffusivities. Moreover, in [14] it is proven that the corresponding explicit scheme preserves monotonicity in the 1-D case. This explains why staircasing is the worst phenomenon that can happen. Pollak et al. [12] have
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 527–538, 2009. © Springer-Verlag Berlin Heidelberg 2009
M. Welk, G. Gilboa, and J. Weickert
extended this analysis to singular nonnegative diffusivities by showing well-posedness for dynamical systems with discontinuous right-hand sides that result from a space-discrete Perona-Malik model. For the stabilised inverse linear diffusion process introduced by Osher and Rudin, it was not possible to establish a continuous well-posedness theory, but a stable minmod discretisation proved to work well in practice [9]. Later on, Breuß and Welk [2] showed that staircasing cannot be avoided by suitable space discretisations. Shock filtering [5, 10] constitutes another example of a PDE that is difficult to analyse in the continuous setting, while for a 1-D space discretisation, Welk et al. [15] have shown that this process is well-posed and satisfies a maximum–minimum principle. It was even possible to find an analytic solution of the corresponding dynamical system. On the variational side, Nikolova has published a number of impressive papers that provide deep insights into the behaviour of minimisers of space-discrete energies, even if they are highly nonconvex or nondifferentiable; see e.g. [7, 8]. It would have been extremely difficult, if not impossible, to obtain similar results in the continuous setting. One PDE that has been proposed for sharpening images and for which no well-posedness results are known so far is the so-called forward-and-backward (FAB) diffusion model of Gilboa et al. [3]. Essentially this is a filter of Perona-Malik type, but its diffusivities are positive in certain areas and negative in others. Since pure inverse diffusion with a negative diffusivity is a prototype of an ill-posed problem, it is not surprising that no well-posedness results exist in the continuous setting. Experimentally it has been observed that straightforward explicit discretisations can violate a maximum–minimum principle. The goal of our paper is to address this problem.
We show that space-discrete FAB diffusion is well-posed and satisfies a maximum–minimum principle if a specific nonstandard discretisation is applied at extrema. This is achieved by modifying the space-discrete diffusion framework of Weickert [13]. Moreover, for the fully discrete 1-D case with an explicit time discretisation, a maximum–minimum principle and a total variation reduction property are established. Our paper is organised as follows. In Section 2 we discuss the FAB diffusion model, while Section 3 reviews the space-discrete diffusion framework from [13]. In the fourth section we present our nonstandard space discretisation for FAB diffusion, and we modify the space-discrete diffusion framework such that it becomes applicable to this model. The fully discrete 1-D case is discussed in detail in Section 5. Our paper is concluded with a summary in Section 6.
2 Forward-and-Backward Diffusion Filtering
Forward-and-backward (FAB) diffusion filtering was introduced by Gilboa, Sochen and Zeevi in 2002 [3]. Let $\Omega \subset \mathbb{R}^2$ be a rectangular image domain and consider a greyscale image $f: \Omega \to \mathbb{R}$ that is to be sharpened. Then FAB diffusion filtering creates filtered versions $u(\mathbf{x}, t)$ of $f(\mathbf{x})$ by solving a Perona-Malik type [11] equation
$\partial_t u = \operatorname{div}\left(g(|\nabla u|^2)\, \nabla u\right)$ (1)
with $f$ as initial condition,
$u(\mathbf{x}, 0) = f(\mathbf{x}),$ (2)
and homogeneous Neumann boundary conditions,
$\partial_n u = 0,$ (3)
where $n$ denotes a normal vector to the image boundary $\partial\Omega$. Here $\mathbf{x} := (x, y)^\top$, subscripts denote partial derivatives, $\nabla := (\partial_x, \partial_y)^\top$ is the spatial gradient, and div its corresponding divergence operator. The diffusivity $g$ may have different formulations, for example [4]:
$g(s^2) = \frac{1}{1 + (s/k_f)^2} - \frac{\alpha}{1 + (s/k_b)^2},$ (4)
where $k_f$ and $k_b$ control the gradient magnitudes for forward and backward diffusion, respectively, and $\alpha$ is the weight between these terms. Note that for small image gradients, this diffusivity is positive, while it becomes negative for larger ones, and finally becomes positive again. Our theory relies on the essential assumption $g(0) > 0$, which ensures that extrema undergo forward diffusion. FAB diffusion has also been interpreted as an energy minimisation process of a nonmonotone potential in the shape of a triple well [4]. In the variational formulation of [4] two additional terms have been introduced: a fidelity term to the input image and a fourth-order term (hyper-diffusion) which increases the regularisation, strongly suppressing highly oscillating regions. Here we keep the notion of a sharpening flow without these terms. Connections between FAB diffusion and wavelet methods for image enhancement have been described in [6]. Apart from these results, few theoretical properties of the FAB process have been proven. In particular, existence, uniqueness and stability results are not available. Moreover, it was conjectured that such a process violates a maximum–minimum principle, as it may have a negative diffusivity [3]. This was shown to happen in numerical experiments using standard numerical methods. In this paper we will prove that with a more sophisticated space discretisation, the process satisfies the maximum–minimum principle and useful theoretical results can be established.
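3 A Space-Discrete Diffusion Framework
Let us now review the space-discrete diffusion framework of Weickert [13], since parts of it can be extended to the FAB setting. A standard discretisation of a Perona-Malik type diffusion equation
$\partial_t u = \partial_x\left(g(|\nabla u|^2)\, \partial_x u\right) + \partial_y\left(g(|\nabla u|^2)\, \partial_y u\right)$ (5)
in some inner pixel $(i, j)$ yields the ordinary differential equation
$\frac{du_{i,j}}{dt} = \frac{1}{h_1}\left(\frac{g_{i+1,j} + g_{i,j}}{2} \cdot \frac{u_{i+1,j} - u_{i,j}}{h_1} - \frac{g_{i,j} + g_{i-1,j}}{2} \cdot \frac{u_{i,j} - u_{i-1,j}}{h_1}\right) + \frac{1}{h_2}\left(\frac{g_{i,j+1} + g_{i,j}}{2} \cdot \frac{u_{i,j+1} - u_{i,j}}{h_2} - \frac{g_{i,j} + g_{i,j-1}}{2} \cdot \frac{u_{i,j} - u_{i,j-1}}{h_2}\right).$ (6)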
Here $u_{i,j}$ denotes an approximation to $u$ in pixel $(i, j)$. It is centred in the location $((i - \frac{1}{2})h_1, (j - \frac{1}{2})h_2)$, where $h_1$ and $h_2$ denote the grid size (pixel width) in $x$- and $y$-direction, respectively. This formula even holds for boundary pixels, provided that the homogeneous Neumann boundary conditions (3) are implemented by mirroring boundary pixels into dummy pixels. A suitable discretisation for the diffusivity $g$ will be discussed later. In a more compact notation, one can represent a pixel $(i, j)$ by a single index $k(i, j)$. This leads to
$\frac{du_k}{dt} = \sum_{n=1}^{2} \sum_{l \in \mathcal{N}_n(k)} \frac{g_l + g_k}{2h_n^2}\,(u_l - u_k),$ (7)
where $\mathcal{N}_n(k)$ are the neighbours of pixel $k$ in $n$-direction (boundary pixels may have fewer neighbours). This can be written as a system of ordinary differential equations (ODEs):
$\frac{d\mathbf{u}}{dt} = A(\mathbf{u})\,\mathbf{u},$ (8)
where $\mathbf{u} = (u_1, \dots, u_N)^\top$, and the $N \times N$ matrix $A(\mathbf{u}) = (a_{k,l}(\mathbf{u}))$ satisfies
$a_{k,l} := \begin{cases} \frac{g_k + g_l}{2h_n^2} & (l \in \mathcal{N}_n(k)), \\ -\sum_{n=1}^{2} \sum_{l \in \mathcal{N}_n(k)} \frac{g_k + g_l}{2h_n^2} & (l = k), \\ 0 & (\text{else}). \end{cases}$ (9)
Denoting the index set $\{1, \dots, N\}$ by $J$, a space-discrete problem class $(P_s)$ is defined in the following way.
$(P_s)$: Let $\mathbf{f} \in \mathbb{R}^N$. Find a function $\mathbf{u} \in C^1([0, \infty), \mathbb{R}^N)$ that satisfies the initial value problem
$\frac{d\mathbf{u}}{dt} = A(\mathbf{u})\,\mathbf{u}, \quad \mathbf{u}(0) = \mathbf{f},$
where $A = (a_{ij})$ has the following properties:
(S1) Lipschitz-continuity of $A \in C(\mathbb{R}^N, \mathbb{R}^{N \times N})$ for every bounded subset of $\mathbb{R}^N$,
(S2) symmetry: $a_{ij}(\mathbf{u}) = a_{ji}(\mathbf{u})$ for all $i, j \in J$ and all $\mathbf{u} \in \mathbb{R}^N$,
(S3) vanishing row sums: $\sum_{j \in J} a_{ij}(\mathbf{u}) = 0$ for all $i \in J$ and all $\mathbf{u} \in \mathbb{R}^N$,
(S4) nonnegative off-diagonals: $a_{ij}(\mathbf{u}) \ge 0$ for all $i \ne j$ and all $\mathbf{u} \in \mathbb{R}^N$,
(S5) irreducibility for all $\mathbf{u} \in \mathbb{R}^N$.
One should remember that a matrix $A \in \mathbb{R}^{N \times N}$ is called irreducible if for any $i, j \in J$ there exist $k_0, \dots, k_r \in J$ with $k_0 = i$ and $k_r = j$ such that $a_{k_p k_{p+1}} \ne 0$ for $p = 0, \dots, r-1$. In other words: there is a path from pixel $i$ to pixel $j$ along which the diffusivities do not vanish. Under these requirements the following theorem is proven in [13]:
Theorem 1 (Properties of Space-Discrete Diffusion Filtering). For the space-discrete filter class $(P_s)$ the following statements are valid:
(a) (Well-Posedness) For each $T > 0$ the problem $(P_s)$ has a unique solution $\mathbf{u}(t) \in C^1([0, T], \mathbb{R}^N)$. This solution depends continuously on the initial value and the right-hand side of the ODE system.
(b) (Maximum–Minimum Principle) Let $a := \min_{j \in J} f_j$ and $b := \max_{j \in J} f_j$. Then $a \le u_i(t) \le b$ for all $i \in J$ and $t \in [0, T]$.
(c) (Average Grey Level Invariance) The average grey level $\mu := \frac{1}{N}\sum_{j \in J} f_j$ is not affected by the space-discrete diffusion filter: $\frac{1}{N}\sum_{j \in J} u_j(t) = \mu$ for all $t > 0$.
(d) (Lyapunov Functionals) $V(t) := \Phi(\mathbf{u}(t)) := \sum_{i \in J} r(u_i(t))$ is a Lyapunov function for all $r \in C^1[a, b]$ with increasing $r'$ on $[a, b]$: $V(t)$ is decreasing and bounded from below by $\Phi(\mathbf{c})$, where $\mathbf{c} := (\mu, \dots, \mu)^\top \in \mathbb{R}^N$.
(e) (Convergence to a Constant Steady State) $\lim_{t \to \infty} \mathbf{u}(t) = \mathbf{c}$.
The proof shows that not all of the requirements (S1)–(S5) are necessary for each of the theoretical results above: Requirement (S1) is needed for local well-posedness, while proving a maximum–minimum principle requires (S3) and (S4). Local well-posedness together with the maximum–minimum principle implies global well-posedness. The average grey value invariance is based on (S2) and (S3). The existence of Lyapunov functionals can be established by means of (S2)–(S4), and convergence to a constant steady state requires (S5) in addition to (S2)–(S4).
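To make (9) and the requirements (S2)–(S4) concrete, here is a minimal 1-D sketch in Python/NumPy. The helper names, the unit grid size and the particular positive diffusivity are illustrative assumptions; boundary pixels simply omit their mirrored dummy neighbour, which is equivalent because the flux towards a mirrored pixel is zero.

```python
import numpy as np

def diffusion_matrix_1d(g, h=1.0):
    """Assemble A(u) of (9) for a 1-D signal with pixel diffusivities g[k].

    Mirrored dummy pixels carry zero flux, so they are left out and the
    two boundary pixels get a single neighbour each.
    """
    n = len(g)
    A = np.zeros((n, n))
    for k in range(n - 1):
        w = (g[k] + g[k + 1]) / (2.0 * h**2)   # (g_k + g_l) / (2 h^2)
        A[k, k + 1] = A[k + 1, k] = w
    np.fill_diagonal(A, -A.sum(axis=1))        # diagonal entry of (9)
    return A

def check_s2_s3_s4(A, tol=1e-12):
    """Return (S2 symmetry, S3 vanishing row sums, S4 nonnegative off-diagonals)."""
    off_diag = A - np.diag(np.diag(A))
    return (bool(np.allclose(A, A.T, atol=tol)),
            bool(np.allclose(A.sum(axis=1), 0.0, atol=tol)),
            bool(np.all(off_diag >= -tol)))

u = np.array([0.0, 4.0, 1.0, 3.0])
g_pos = 1.0 / (1.0 + np.gradient(u)**2)        # positive, Perona-Malik-like
A = diffusion_matrix_1d(g_pos)
print(check_s2_s3_s4(A))                       # (True, True, True)

u_next = u + 0.1 * (A @ u)                     # one explicit Euler step of (8)
print(u.mean(), u_next.mean())                 # average grey level is preserved

g_fab = np.array([1.0, -2.0, 1.0, 1.0])        # a negative diffusivity, as in FAB
print(check_s2_s3_s4(diffusion_matrix_1d(g_fab)))  # (True, True, False)
```

The last line shows the situation discussed next: a negative diffusivity keeps (S2) and (S3) intact but breaks (S4).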
4 Application to Space-Discrete FAB Diffusion
It is straightforward to verify the prerequisites (S1)–(S5) for the popular positive diffusivity functions, such that Theorem 1 is applicable. However, for FAB diffusion negative diffusivities are possible and the situation becomes much more complicated. One immediately sees that space-discrete FAB diffusion with $g \in C^1[0, \infty)$ satisfies (S1: smoothness), (S2: symmetry), and (S3: vanishing row sums). However, this only implies local well-posedness and average grey level invariance. By inspecting (9) it becomes clear that (S4: nonnegative off-diagonals) and (S5: irreducibility) cannot be satisfied for typical FAB diffusivities: these diffusivities may vanish (which violates (S5)) and they may even become negative (violating (S4)). As a consequence, global well-posedness, a maximum–minimum principle, Lyapunov functions and convergence to a constant steady state cannot be proven in this way. For the practical applicability of FAB diffusion it would be highly desirable to have at least global well-posedness and a maximum–minimum principle. Can these properties be recovered? Fortunately the answer is affirmative, since (S4: nonnegative off-diagonals) can be replaced by a less restrictive condition that only has to hold at extrema:
Theorem 2 (Space-Discrete Diffusion Filtering under Weaker Conditions). Assume that a space-discrete filter satisfies only the properties (S1)–(S3) of the framework $(P_s)$, and
(S4a) nonnegative off-diagonals at extrema: $a_{i,j}(\mathbf{u}) \ge 0$ for all $j \in J$ with $j \ne i$ if $\mathbf{u}$ has an extremum in $i$.
Then the well-posedness result (a), the maximum–minimum principle (b), and the average grey level invariance (c) of Theorem 1 are still satisfied.
Proof. Following [13], one observes that in some pixel $k$ that is a discrete global maximum (i.e. $u_k \ge u_j$ for all $j \in J$), condition (S4a) implies that
$\frac{du_k}{dt} = \sum_{j \in J} a_{kj}(\mathbf{u})\, u_j = a_{kk}(\mathbf{u})\, u_k + \sum_{j \in J \setminus \{k\}} \underbrace{a_{kj}(\mathbf{u})}_{\ge 0}\, \underbrace{u_j}_{\le u_k} \le u_k \cdot \sum_{j \in J} a_{kj}(\mathbf{u}) \overset{(S3)}{=} 0.$ (10)
In the same way one can prove that if $k$ is a minimum, one has $\frac{du_k}{dt} \ge 0$. This nonenhancement behaviour in extrema is the only place where nonnegativity is required in the entire proof of the maximum–minimum principle in [13]. As a consequence, the maximum–minimum principle still holds if (S4) is replaced by the weaker condition (S4a). Moreover, together with local well-posedness, global well-posedness is obtained. This completes the proof.
While the preceding results are encouraging, we have not yet shown that a suitable space discretisation satisfies the nonnegativity requirement (S4a) at extrema. Unfortunately, this issue is a bit more delicate than one might assume: a standard discretisation of the diffusivity $g(|\nabla u|^2)$ in some pixel $(i, j)$ is given by the central difference approximation
$g_{i,j} := g\left(\left(\frac{u_{i+1,j} - u_{i-1,j}}{2h_1}\right)^2 + \left(\frac{u_{i,j+1} - u_{i,j-1}}{2h_2}\right)^2\right).$ (11)
Note that even if $u$ has an extremum in $(i, j)$, the preceding central difference approximation of $|\nabla u|^2$ may become positive, and not 0 as one would expect from the continuous theory. Since the FAB diffusivities only guarantee that $g(0) > 0$, it can happen that this finite difference approximation creates negative diffusivities in extrema and (S4a) is violated. Fortunately there is an interesting alternative to the standard discretisation of the diffusivity that solves these problems immediately:
Theorem 3 (Properties of Space-Discrete FAB Diffusion). The space discretisation (6) of FAB diffusion with $g(0) > 0$ and $g \in C^1[0, \infty)$ is well-posed and satisfies a maximum–minimum principle and average grey level invariance if the diffusivity is evaluated by the nonstandard finite difference approximation
$g_{i,j} := g\left(\max\left(\frac{u_{i+1,j} - u_{i,j}}{h_1} \cdot \frac{u_{i,j} - u_{i-1,j}}{h_1},\, 0\right) + \max\left(\frac{u_{i,j+1} - u_{i,j}}{h_2} \cdot \frac{u_{i,j} - u_{i,j-1}}{h_2},\, 0\right)\right).$ (12)
It should be noted that this approximation has the same quadratic order of consistency as the previous one. However, it guarantees a vanishing discrete gradient approximation in extrema. As a consequence, (S4a) is guaranteed, since FAB diffusivities satisfy $g(0) > 0$. Interestingly, the positivity property $g(0) > 0$ together with the smoothness assumption $g \in C^1[0, \infty)$ are the only requirements that are necessary to establish well-posedness and a maximum–minimum principle for space-discrete FAB diffusion. Last but not least, these results are not restricted to the two-dimensional case: with a similar nonstandard approximation, it is straightforward to verify that space-discrete FAB diffusion is well-posed and satisfies an extremum principle in any dimension.
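The difference between (11) and (12) can be seen on a 3×3 patch. The sketch below (Python/NumPy) evaluates both diffusivity arguments at a strict maximum and at a linear ramp; the helper names and the diffusivity $g(z) = 1 - 8z$ are hypothetical choices made only so that $g(0) > 0$ while $g$ turns negative for $z > 1/8$.

```python
import numpy as np

def central_arg(u, i, j, h1=1.0, h2=1.0):
    """Argument of g in (11): central-difference estimate of |grad u|^2."""
    ux = (u[i + 1, j] - u[i - 1, j]) / (2.0 * h1)
    uy = (u[i, j + 1] - u[i, j - 1]) / (2.0 * h2)
    return ux**2 + uy**2

def nonstandard_arg(u, i, j, h1=1.0, h2=1.0):
    """Argument of g in (12): clipped products of one-sided differences."""
    px = (u[i + 1, j] - u[i, j]) / h1 * (u[i, j] - u[i - 1, j]) / h1
    py = (u[i, j + 1] - u[i, j]) / h2 * (u[i, j] - u[i, j - 1]) / h2
    return max(px, 0.0) + max(py, 0.0)

def g(z):
    # Hypothetical diffusivity: g(0) = 1 > 0, negative for z > 1/8.
    return 1.0 - 8.0 * z

peak = np.array([[0.0, 0.0, 0.0],
                 [0.0, 3.0, 1.0],
                 [0.0, 0.0, 0.0]])              # strict maximum at pixel (1, 1)
print(g(central_arg(peak, 1, 1)))               # -1.0: negative diffusivity at the maximum
print(g(nonstandard_arg(peak, 1, 1)))           # 1.0 = g(0)

ramp = np.add.outer(np.arange(3.0), 2.0 * np.arange(3.0))  # u_{i,j} = i + 2j
print(central_arg(ramp, 1, 1), nonstandard_arg(ramp, 1, 1))  # 5.0 5.0
```

At the maximum the one-sided differences have opposite signs, so both products are clipped to zero; on the linear ramp the two approximations coincide.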
5 Fully Discrete FAB Diffusion
In order to establish useful properties for FAB diffusion in the fully discrete case, we restrict ourselves to the 1-D setting and use a simple explicit time discretisation with step size $\tau$. Then the corresponding scheme to $\partial_t u = \partial_x\left(g((\partial_x u)^2)\, \partial_x u\right)$ is given by
$\frac{u_i^{k+1} - u_i^k}{\tau} = \frac{g_{i-1}^k + g_i^k}{2} \cdot \frac{u_{i-1}^k - u_i^k}{h^2} + \frac{g_{i+1}^k + g_i^k}{2} \cdot \frac{u_{i+1}^k - u_i^k}{h^2}$ (13)
with the nonstandard approximation $g_i^k = g\left(\max\left(\frac{u_i^k - u_{i-1}^k}{h} \cdot \frac{u_{i+1}^k - u_i^k}{h},\, 0\right)\right)$. The upper index denotes the time level, i.e. $u_i^k$ approximates $u$ at location $(i - \frac{1}{2})h$ and time $k\tau$. This approximation also holds at the boundary pixels $u_1$ and $u_N$ when one uses the aforementioned dummy pixels. For our analysis, two additional assumptions are essential. While the first one refers to the range of grey values, the second one requires a diffusivity $g$ that still takes sufficiently large positive values for small positive arguments. We get the following result.
Theorem 4 (Properties of Fully Discrete FAB Diffusion). Let an initial 1-D image $f = (f_i)$ be given and let the sequence of images $u^k = (u_i^k)$ evolve according to (13) with the initial condition $u^0 = f$. Let the grey values $f_i$ be restricted to a finite interval of length $R$. Assume further that two constants $c_1 > c_2 > 0$ exist such that the diffusivity $g$ fulfils $g(0) = c_1$, and $g(z) > -c_2$ for all $z > 0$. Moreover, assume that a positive $\omega$ exists such that $g(s^2) > c_2$ holds for all $s$ with $0 < s < \omega R$. If the time step satisfies
$\tau < \frac{\omega^2 h^4}{c_1 + c_2 + 2c_1\omega^2 h^2},$ (14)
the following results are true for the evolution of $(u^k)$.
(a) (Maximum–Minimum Principle) If the initial signal is bounded by $a \le f_i \le b$ for all $i \in J$, then $a \le u_i^k \le b$ holds for all $i \in J$ and all $k \ge 0$.
(b) (Total Variation Reduction) For each time step $k \ge 0$, the total variation of the image $u^{k+1}$ is less than or equal to the total variation of $u^k$:
$\sum_{i=1}^{N-1} \left|u_{i+1}^{k+1} - u_i^{k+1}\right| \le \sum_{i=1}^{N-1} \left|u_{i+1}^k - u_i^k\right|.$ (15)
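A minimal Python/NumPy sketch of one step of scheme (13) with the nonstandard diffusivity, together with the step-size bound (14). The names `fab_step`, `tau_bound` and the diffusivity `g` are hypothetical; `g(z) = (1 - 0.5 z)/(1 + z)` was chosen so that the hypotheses of Theorem 4 hold with $c_1 = 1$, $c_2 = 0.5$ and $\omega R = 1/\sqrt{2}$ (indeed $g(s^2) > 0.5$ exactly when $s^2 < 0.5$, and $g > -0.5$ everywhere).

```python
import numpy as np

def fab_step(u, tau, h, g):
    """One explicit step of scheme (13) with the nonstandard diffusivity g_i^k."""
    up = np.concatenate(([u[0]], u, [u[-1]]))    # mirrored dummy pixels (Neumann)
    bwd = (up[1:-1] - up[:-2]) / h                # (u_i - u_{i-1}) / h
    fwd = (up[2:] - up[1:-1]) / h                 # (u_{i+1} - u_i) / h
    gi = g(np.maximum(bwd * fwd, 0.0))            # argument is 0 at every extremum
    # flux between pixels i and i+1: (g_i + g_{i+1})/2 * (u_{i+1} - u_i) / h^2
    flux = 0.5 * (gi[:-1] + gi[1:]) * (u[1:] - u[:-1]) / h**2
    du = np.zeros_like(u)
    du[:-1] += flux    # right-neighbour term of pixel i
    du[1:] -= flux     # left-neighbour term of pixel i + 1
    return u + tau * du

def tau_bound(c1, c2, omega, h):
    """Right-hand side of the time step restriction (14)."""
    return omega**2 * h**4 / (c1 + c2 + 2.0 * c1 * omega**2 * h**2)

def g(z):
    """Hypothetical FAB-type diffusivity with g(0) = 1 and g > -0.5."""
    return (1.0 - 0.5 * z) / (1.0 + z)

u0 = np.array([0.0, 2.0, 4.0, 6.0])     # grey values in an interval of length R = 6
R, h = 6.0, 1.0
omega = 1.0 / (np.sqrt(2.0) * R)        # so that omega * R = 1/sqrt(2)
tau = 0.009                             # tau_bound(1.0, 0.5, omega, h) is 1/110 here
assert tau < tau_bound(1.0, 0.5, omega, h)
u1 = fab_step(u0, tau, h, g)
print(u1)   # inner slope steepens (backward diffusion), values stay in [0, 6]
```

In this example the inner pixels have negative diffusivity $g(4) = -0.2$, so the middle slope grows, yet the new values stay in the initial range and the total variation does not increase, as Theorem 4 predicts for one step.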
Proof. The global statements of the theorem follow from local properties which will be proven in four steps.
Step 1: A local maximal pixel does not increase. Assume that $u_i^k$ is a local maximum of the 1-D image in time step $k$, i.e. we have $u_i^k \ge u_{i+1}^k$ and $u_i^k \ge u_{i-1}^k$. Since in this case $g_i^k = g(0) = c_1$ and $g > -c_2 > -c_1$, the sums $g_{i-1}^k + g_i^k$ and $g_i^k + g_{i+1}^k$ are certainly nonnegative, and $u_i^{k+1}$ is a convex combination of $u_{i-1}^k$, $u_i^k$ and $u_{i+1}^k$ provided that
$1 - \frac{\tau}{2h^2}\left(g_{i-1}^k + 2g_i^k + g_{i+1}^k\right) \ge 0$ (16)
holds. Because of $g_{i-1}^k + 2g_i^k + g_{i+1}^k \le 4c_1$, this is certainly the case if
$\tau \le \frac{h^2}{2c_1}.$ (17)
Step 2: A neighbour pixel of a local maximum remains below this maximum. Assume that $u_i^k$ is a maximum and $u_{i+1}^k$ is not a local minimum. Then the inequality $u_{i+1}^{k+1} \le u_i^k$ holds if
$\tau \le \frac{\omega^2 h^4}{2c_1\omega^2 h^2 + c_2}.$ (18)
To see this, we use the equation
$u_{i+1}^{k+1} = u_{i+1}^k + \tau \cdot \left(\frac{g_i^k + g_{i+1}^k}{2} \cdot \frac{u_i^k - u_{i+1}^k}{h^2} + \frac{g_{i+1}^k + g_{i+2}^k}{2} \cdot \frac{u_{i+2}^k - u_{i+1}^k}{h^2}\right)$ (19)
and distinguish two cases.
Case 1: $(u_{i+1}^k - u_i^k)(u_{i+2}^k - u_{i+1}^k) \le \omega^2 h^2 R^2$. Then $g_{i+1}^k + g_{i+2}^k$ is certainly nonnegative. The right-hand side of (19) is therefore a convex combination of $u_i^k$, $u_{i+1}^k$ and $u_{i+2}^k$ if the analogue of (16) for pixel $i+1$ holds. Analogous to our above reasoning, this is true if (17) is satisfied.
Case 2: $(u_{i+1}^k - u_i^k)(u_{i+2}^k - u_{i+1}^k) > \omega^2 h^2 R^2$. Here we conclude from $u_{i+1}^k - u_{i+2}^k \le R$ that
$u_i^k - u_{i+1}^k > \omega^2 h^2 R.$ (20)
Using $\frac{1}{2}(g_i^k + g_{i+1}^k) < c_1$ and $\frac{1}{2}(g_{i+1}^k + g_{i+2}^k) > -c_2$ we obtain from (19) the estimate
$u_{i+1}^{k+1} \le u_{i+1}^k + \frac{\tau}{h^2} c_1 (u_i^k - u_{i+1}^k) + \frac{\tau}{h^2} c_2 R,$ (21)
which ensures $u_{i+1}^{k+1} \le u_i^k$, provided that
$\tau \le \frac{\omega^2 h^4}{c_1\omega^2 h^2 + c_2}$ (22)
holds. Condition (18) ensures the bounds of both cases, i.e. (17) and (22).
Step 3: No new extrema are generated around existing extrema. Assume that $u_i^k$ is a local maximum, and none of its neighbours is a local minimum. Assume first that
$(u_{i+1}^k - u_i^k)(u_{i+2}^k - u_{i+1}^k) > \omega^2 h^2 R^2,$ (23)
and thus again (20) and (21) hold. Similar considerations for $u_i^{k+1}$ yield
$u_i^{k+1} \ge u_i^k + \frac{\tau}{h^2} c_1 (u_{i+1}^k - u_i^k) - \frac{\tau}{h^2} c_1 R,$ (24)
which together with (21) implies
$u_i^{k+1} - u_{i+1}^{k+1} \ge \left(1 - \frac{2\tau}{h^2} c_1\right)(u_i^k - u_{i+1}^k) - \frac{\tau}{h^2}(c_1 + c_2) R.$ (25)
By the hypothesis of the theorem, (14), and (20) we have that
$\tau < \frac{h^2}{(c_1 + c_2)R/(u_i^k - u_{i+1}^k) + 2c_1},$ (26)
such that the expression on the right-hand side of (25) is nonnegative. Therefore $u_{i+1}^{k+1}$ can become a maximum in $(u^{k+1})$ only if
$(u_{i+1}^k - u_i^k)(u_{i+2}^k - u_{i+1}^k) \le \omega^2 h^2 R^2.$ (27)
Analogous reasoning applies to the left neighbour $u_{i-1}^{k+1}$. This means that the maximum property of pixel $i$ can be shifted to one of its neighbours. Our assertion that no new extrema are generated remains true except if both neighbours $u_{i-1}^{k+1}$ and $u_{i+1}^{k+1}$ would simultaneously turn into maxima. Let us therefore discuss this case. This would require the two inequalities
$(u_{i+1}^k - u_i^k)(u_{i+2}^k - u_{i+1}^k) \le \omega^2 h^2 R^2 \quad \text{and} \quad (u_{i-1}^k - u_{i-2}^k)(u_i^k - u_{i-1}^k) \le \omega^2 h^2 R^2$ (28)
to hold at the same time. In this situation, however, $g_{i+1}^k + g_{i+2}^k$ and $g_{i-1}^k + g_{i-2}^k$ are nonnegative, implying
$u_{i+1}^{k+1} \le u_{i+1}^k + \tau c_1 \frac{u_i^k - u_{i+1}^k}{h^2} \quad \text{and} \quad u_{i-1}^{k+1} \le u_{i-1}^k + \tau c_1 \frac{u_i^k - u_{i-1}^k}{h^2},$ (29)
while for the central pixel
$u_i^{k+1} \ge u_i^k + \tau c_1 \frac{u_{i-1}^k - 2u_i^k + u_{i+1}^k}{h^2}$ (30)
holds. Hence,
$-u_{i-1}^{k+1} + 2u_i^{k+1} - u_{i+1}^{k+1} \ge \left(1 - \frac{2\tau}{h^2} c_1\right)(-u_{i-1}^k + 2u_i^k - u_{i+1}^k).$ (31)
For $\tau \le \frac{h^2}{2c_1}$, the right-hand side is clearly nonnegative, which ensures that $u_{i-1}^{k+1}$ and $u_{i+1}^{k+1}$ cannot both become maxima.
Step 4: Monotonicity is preserved in image segments without extrema. Assume that $u_i^k > u_{i+1}^k > u_{i+2}^k > u_{i+3}^k$. We show that then also $u_{i+1}^{k+1} \ge u_{i+2}^{k+1}$ holds. In the proof we distinguish three cases.
Case 1: $g_i^k + g_{i+1}^k \ge 0$ and $g_{i+2}^k + g_{i+3}^k \ge 0$. Then
$u_{i+1}^{k+1} - u_{i+2}^{k+1} \ge \left(1 - \frac{2\tau}{h^2} c_1\right)(u_{i+1}^k - u_{i+2}^k),$ (32)
such that the right-hand side is again nonnegative if (17) holds.
Case 2: $g_i^k + g_{i+1}^k \ge 0$ and $g_{i+2}^k + g_{i+3}^k < 0$. (The case $g_i^k + g_{i+1}^k < 0$ and $g_{i+2}^k + g_{i+3}^k \ge 0$ is treated in a symmetric way.) From $u_{i+2}^k - u_{i+3}^k \le R$ and $(u_{i+1}^k - u_{i+2}^k)(u_{i+2}^k - u_{i+3}^k) > \omega^2 h^2 R^2$ we obtain
$u_{i+1}^k - u_{i+2}^k > \omega^2 h^2 R.$ (33)
Consequently,
$u_{i+1}^{k+1} - u_{i+2}^{k+1} \ge u_{i+1}^k - u_{i+2}^k - \frac{2\tau}{h^2} c_1 (u_{i+1}^k - u_{i+2}^k) - \frac{\tau}{h^2} c_2 (u_{i+2}^k - u_{i+3}^k) \ge u_{i+1}^k - u_{i+2}^k - \frac{2\tau}{h^2} c_1 (u_{i+1}^k - u_{i+2}^k) - \frac{\tau}{h^2} c_2 R.$ (34)
Due to (33) the right-hand side is certainly nonnegative if
$\tau \le \frac{\omega^2 h^4}{2c_1\omega^2 h^2 + c_2}.$ (35)
Case 3: $g_i^k + g_{i+1}^k < 0$ and $g_{i+2}^k + g_{i+3}^k < 0$. Since in this case we have
$(u_i^k - u_{i+1}^k) + (u_{i+2}^k - u_{i+3}^k) \le R$ (36)
and
$(u_{i+1}^k - u_{i+2}^k)\, \min(u_i^k - u_{i+1}^k,\, u_{i+2}^k - u_{i+3}^k) > \omega^2 h^2 R^2,$ (37)
it follows that the minimum in (37) is at most $R/2$, and thus
$u_{i+1}^k - u_{i+2}^k > 2\omega^2 h^2 R.$ (38)
A similar reasoning as in Case 2 gives that $u_{i+1}^{k+1} \ge u_{i+2}^{k+1}$ is ensured if
$\tau \le \frac{\omega^2 h^4}{2c_1\omega^2 h^2 + c_2/2}.$ (39)
Comparing the bounds derived for the different statements yields (14) as the most restrictive one. If this condition is imposed, extrema cannot be created but only shifted to neighbouring pixels, and monotone segments preserve their monotonicity. Both the maximum–minimum principle and the reduction of total variation follow immediately. This completes the proof. We are convinced that Theorem 4 also possesses a 2-D analogue. The preceding proof, however, does not transfer in a straightforward way to this case: The dependency of g on nonstandard discretisations of ux and uy (cf. (12)) makes it highly cumbersome to control the sign of g.
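The comparison of bounds in the last paragraph is simple arithmetic and can be checked numerically. The sketch below collects the step-size restrictions (17), (18), (22), (35) and (39) next to (14) for sample constants; the function name and the parameter values are illustrative assumptions. Since every bound except (17) has numerator $\omega^2 h^4$ and the denominator of (14) exceeds each of theirs by at least $c_1 > 0$ (and (17) equals $\omega^2 h^4/(2c_1\omega^2 h^2)$), (14) is always the smallest.

```python
def step_size_bounds(c1, c2, omega, h):
    """Time step restrictions collected in the proof of Theorem 4."""
    w = omega**2 * h**2
    num = omega**2 * h**4
    return {
        "(14)": num / (c1 + c2 + 2 * c1 * w),
        "(17)": h**2 / (2 * c1),          # also the bound used after (31)
        "(18)": num / (2 * c1 * w + c2),
        "(22)": num / (c1 * w + c2),
        "(35)": num / (2 * c1 * w + c2),
        "(39)": num / (2 * c1 * w + c2 / 2),
    }

bounds = step_size_bounds(c1=1.0, c2=0.5, omega=0.5, h=1.0)
for label, value in bounds.items():
    print(label, round(value, 4))
print(min(bounds, key=bounds.get))   # (14)
```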
6 Summary and Conclusions
In spite of its negative diffusivity, FAB diffusion becomes well-posed if a nonstandard space discretisation is used. It guarantees a positive diffusivity in discrete extrema. This result is fundamental for justifying FAB diffusion in a practical setting with digital images. Our ongoing work includes research on the multidimensional fully discrete case as well as extensions of our results to (semi-)implicit time discretisations.
Acknowledgement
This work was initiated during a visit of Guy Gilboa to Saarland University. His visit was financially supported by the Minerva Foundation.
References
1. Aubert, G., Kornprobst, P.: Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations, 2nd edn. Applied Mathematical Sciences, vol. 147. Springer, New York (2006)
2. Breuß, M., Welk, M.: Staircasing in semidiscrete stabilised inverse diffusion algorithms. Journal of Computational and Applied Mathematics 206, 520–533 (2007)
3. Gilboa, G., Sochen, N.A., Zeevi, Y.Y.: Forward-and-backward diffusion processes for adaptive image enhancement and denoising. IEEE Transactions on Image Processing 11(7), 689–703 (2002)
4. Gilboa, G., Sochen, N.A., Zeevi, Y.Y.: Image sharpening by flows based on triple well potentials. Journal of Mathematical Imaging and Vision 20, 121–131 (2004)
5. Kramer, H.P., Bruckner, J.B.: Iterations of a non-linear transformation for enhancement of digital images. Pattern Recognition 7, 53–58 (1975)
6. Mrázek, P., Weickert, J., Steidl, G.: Diffusion-inspired shrinkage functions and stability results for wavelet denoising. International Journal of Computer Vision 64(2/3), 171–186 (2005)
7. Nikolova, M.: Local strong homogeneity of a regularized estimator. SIAM Journal on Applied Mathematics 61(2), 633–658 (2000)
8. Nikolova, M.: Minimizers of cost-functions involving nonsmooth data fidelity terms. Application to the processing of outliers. SIAM Journal on Numerical Analysis 40(3), 965–994 (2002)
9. Osher, S., Rudin, L.: Shocks and other nonlinear filtering applied to image processing. In: Tescher, A.G. (ed.) Applications of Digital Image Processing XIV. Proceedings of SPIE, vol. 1567, pp. 414–431. SPIE Press, Bellingham (1991)
10. Osher, S., Rudin, L.I.: Feature-oriented image enhancement using shock filters. SIAM Journal on Numerical Analysis 27, 919–940 (1990)
11. Perona, P., Malik, J.: Scale space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 12, 629–639 (1990)
12. Pollak, I., Willsky, A.S., Krim, H.: Image segmentation and edge enhancement with stabilized inverse diffusion equations. IEEE Transactions on Image Processing 9(2), 256–266 (2000)
13. Weickert, J.: Anisotropic Diffusion in Image Processing. Teubner, Stuttgart (1998)
14. Weickert, J., Benhamouda, B.: A semidiscrete nonlinear scale-space theory and its relation to the Perona–Malik paradox. In: Solina, F., Kropatsch, W.G., Klette, R., Bajcsy, R. (eds.) Advances in Computer Vision, pp. 1–10. Springer, Wien (1997)
15. Welk, M., Weickert, J., Galić, I.: Theoretical foundations for spatially discrete 1-D shock filtering. Image and Vision Computing 25(4), 455–463 (2007)
L0-Norm and Total Variation for Wavelet Inpainting
Andy C. Yau¹, Xue-Cheng Tai¹˒², and Michael K. Ng³
¹ Division of Mathematical Science, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore
² Mathematics Institute, University of Bergen, Norway
³ Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong
Abstract. In this paper, we suggest an algorithm to recover an image whose wavelet coefficients are partially lost. We propose a wavelet inpainting model using the L0-norm and total variation (TV) minimization. Traditionally, the L0-norm is replaced by the L1-norm or the L2-norm because of numerical difficulties. We use an alternating minimization technique to overcome these difficulties. In order to improve the numerical efficiency, we also apply a graph cut algorithm to solve the subproblem related to TV minimization. Numerical results are given to demonstrate the advantages of the proposed algorithm.
1 Introduction
Inpainting refers to filling in missing information in an image. This missing information need not lie in the pixel domain; it can also lie in another domain. As wavelets play an important role in image compression, some information may be lost when the image is compressed and transmitted, either in terms of pixels or wavelet coefficients. In this work, we consider how to "inpaint" the missing wavelet information. Inpainting in digital image processing has been studied for several years. Masnou and Morel [17] solved the inpainting problem by propagating level curves. Bertalmio et al. [2] suggested solving a third-order PDE. Chan and Shen [6] proposed a total variation (TV) inpainting model based on variational methods. Tai et al. [18] suggested an inpainting algorithm that propagates the information into the inpainting domain along the isophote direction by solving a TV-Stokes equation. Chan et al. [5] suggested a unified TV model for inpainting and superresolution. However, all these methods work in the pixel domain only. In the wavelet domain, the situation is totally different: damage in the wavelet domain produces correlated damage patterns in the pixel domain. Therefore, we cannot recover the damaged image directly in the pixel domain. Chan et al. [7] suggested a wavelet-based TV
The research is supported by MOE (Ministry of Education) Tier II project T207N2202 and IDM project NRF2007IDMIDM002-010. In addition, the support from SUG 20/07 is also gratefully acknowledged.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 539–551, 2009. © Springer-Verlag Berlin Heidelberg 2009
A.C. Yau, X.-C. Tai, and M.K. Ng
algorithm to recover an image whose wavelet coefficients are lost. They applied TV regularization in the pixel domain to control and restore wavelet coefficients in the wavelet domain. Cai et al. [3] suggested a tight-frame-based inpainting algorithm. They found a sparse representation of the image by using a smoothed function with the L1-norm and solved the inpainting problem by projecting a vector onto a convex set. In this paper, we restore the image by finding the best sparse representation of the image in the wavelet domain together with TV minimization. There are some earlier works on image restoration in the wavelet domain with TV minimization. Chan and Zhou [8] discussed image denoising and compression by combining wavelets and TV minimization. Durand and Froment [11] used TV minimization in conjunction with wavelets to eliminate pseudo-oscillations in image restoration. Wang and Zhou [20] suggested a wavelet-based TV minimization to denoise medical images. To find the sparse representation, we apply the L0-norm, which counts the number of nonzero entries of a vector. It is commonly used for finding sparse representations. However, minimizing it is a combinatorial problem, which is NP-hard [10]. It is hard to solve directly and has therefore not been used much in real applications. It is usually replaced by the L1-norm, which makes the objective functional convex so that the problem can be solved more easily. However, the computational cost of solving such an optimization problem is still very high. Mancera and Portilla [16] suggested a method to find sparse representations using the L0-norm directly. They presented a sub-optimal method, that is, looking for the vector with K nonzero coefficients that minimizes the Euclidean distance to the input signal.
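For coefficients in an orthonormal basis, the subproblem just described has a closed-form answer: the vector with at most K nonzero entries that is closest to a given vector in Euclidean distance keeps the K entries of largest magnitude and zeroes the rest, because the squared error is exactly the sum of squares of the discarded entries. The Python/NumPy sketch below illustrates only this simple orthonormal case; the helper name `best_k_term` is hypothetical and this is not necessarily the method of [16].

```python
import numpy as np

def best_k_term(x, k):
    """Best approximation of x with at most k nonzeros (Euclidean distance)."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    if k > 0:
        idx = np.argsort(np.abs(x))[-k:]   # indices of the k largest |x_i|
        out[idx] = x[idx]
    return out

x = np.array([3.0, -1.0, 0.5, -4.0, 2.0])
x2 = best_k_term(x, 2)
print(x2)                     # keeps 3.0 and -4.0
print(np.count_nonzero(x2))   # the "L0-norm" of the result: 2
```

The guard `k > 0` matters because `argsort(...)[-0:]` would select every index.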
In this paper, we consider an observed image that is damaged in the wavelet domain and propose a fast algorithm for image restoration using the L0-norm and TV minimization. We first find the best sparse representation by means of the L0-norm in the wavelet domain and then fill in the missing information by TV minimization, which we solve with a graph cut algorithm [1] [9]. We present a method to solve the L0-norm minimization directly and efficiently. The paper is organized as follows. In Section 2, we introduce our mathematical model and discuss how to solve it. Numerical results demonstrating our algorithm are given in Section 3. We conclude the paper in Section 4.
2 Mathematical Model
Consider the image domain $\Omega \subset \mathbb{R}^2$. Let g be the original image of size $n_1 \times n_2$. Then the noisy image $\tilde g$ is given by
$$\tilde g = g + \eta, \qquad (1)$$
where η is a noise vector. Let W be the wavelet transform that maps an image from the pixel domain to the wavelet domain. Then the wavelet decomposition of $\tilde g$ is given by
$$\hat g = Wg + W\eta. \qquad (2)$$
L0 -Norm and Total Variation for Wavelet Inpainting
We assume that some of the wavelet coefficients are lost. Let J be the index set of the positions at which the wavelet coefficients are known; the remaining coefficients are lost. We define the mapping Π by
$$\Pi_{i,j} = \begin{cases} 1, & \text{if } (i,j) \in J,\\ 0, & \text{if } (i,j) \notin J. \end{cases} \qquad (3)$$
Therefore, the observed image, which is damaged in the wavelet domain, can be written as
$$\hat g_{ob} = \Pi(Wg + W\eta). \qquad (4)$$
To restore the image, we consider the following minimization:
$$\min_u \|\Pi(\hat u - \hat g)\|_0 + \beta\, TV(u), \qquad (5)$$
where β is a regularization parameter and TV(u) denotes the TV norm of u, which is
$$TV(u) = \int_\Omega |\nabla u|\,dx. \qquad (6)$$
Here and later, for any function u defined in the image domain, we use $\hat u$ to denote its wavelet representation, i.e. $\hat u = Wu$. Due to the orthogonality of the wavelet transform, we always have
$$\|\hat u\|_2 = \|u\|_2. \qquad (7)$$
For simplicity, we use $\|\cdot\|_2$ to denote both the matrix L2 norm and the pixel-domain L2(Ω) norm. The L0-norm measures the differences of the wavelet coefficients between the observed image and the resulting image, while the TV norm fills in the missing information of the image in the pixel domain. Minimization problem (5) seeks an image u whose wavelet coefficients are close to $\hat g$ while its TV norm in the pixel domain is minimized. This problem is not easy to solve. To solve (5), we introduce an auxiliary function f and an additional fitting term [13] [21]. The minimization then becomes
$$\min_{u,f} \|\Pi(\hat u - \hat g)\|_0 + \frac{1}{\alpha}\|u - f\|_2^2 + \beta\, TV(f) = \min_{\hat u}\min_f \|\Pi(\hat u - \hat g)\|_0 + \frac{1}{\alpha}\|\hat u - \hat f\|_2^2 + \beta\, TV(f). \qquad (8)$$
We have used property (7) in the above formulation. The new auxiliary image f is an approximation of u, and the new fitting term $\frac{1}{\alpha}\|u - f\|_2^2$ controls the difference between u and f in the pixel domain. As α goes to zero, f tends to u. The advantage of this formulation is that (5) can be solved through two sub-minimization problems:
$$\min_{\hat u} \|\Pi(\hat u - \hat g)\|_0 + \frac{1}{\alpha}\|\hat u - \hat f\|_2^2, \qquad (9)$$
$$\min_f \frac{1}{\alpha}\|u - f\|_2^2 + \beta\, TV(f). \qquad (10)$$
These two minimization problems are coupled: the first seeks u for a given f, and the second seeks f for a given u. We use an iterative scheme that alternately minimizes the two sub-problems. For the first problem (9), we show that the solution is given by a simple explicit formula when α approaches zero, which is computationally cheap. The second problem (10) is essentially the ROF model, which we solve with a new fast graph cut algorithm. In (9), the functional is separable, so we can minimize with respect to each $\hat u_{i,j}$ separately. This is a very important property. For (i,j) ∉ J, the L0 term does not depend on $\hat u_{i,j}$ and the minimizer is simply $\hat u_{i,j} = \hat f_{i,j}$. For (i,j) ∈ J, there are two possible cases for $\hat u_{i,j}$, and we solve the problem by considering both.

Case 1: $\hat u_{i,j} = \hat g_{i,j}$. The value of the objective functional of (9) related to the (i,j)-th coefficient is
$$0 + \frac{1}{\alpha}|\hat g_{i,j} - \hat f_{i,j}|^2. \qquad (11)$$
Case 2: $\hat u_{i,j} \neq \hat g_{i,j}$. The value of the objective functional of (9) related to the (i,j)-th coefficient is
$$1 + \frac{1}{\alpha}|\hat u_{i,j} - \hat f_{i,j}|^2. \qquad (12)$$
This value is smallest for $\hat u_{i,j} = \hat f_{i,j}$, so $\hat u_{i,j}$ has two possible choices, either $\hat g_{i,j}$ or $\hat f_{i,j}$. Substituting $\hat u_{i,j} = \hat f_{i,j}$ into (12), we obtain
$$1 + \frac{1}{\alpha}|\hat f_{i,j} - \hat f_{i,j}|^2 = 1. \qquad (13)$$
If $\hat u_{i,j} = \hat g_{i,j}$, the following inequality must hold:
$$\frac{1}{\alpha}|\hat g_{i,j} - \hat f_{i,j}|^2 \le 1. \qquad (14)$$
Therefore, the update scheme for $\hat u_{i,j}$ is
$$\hat u_{i,j} = \begin{cases} \hat g_{i,j}, & \text{if } \frac{1}{\alpha}(\hat g_{i,j} - \hat f_{i,j})^2 \le 1 \text{ and } (i,j) \in J,\\ \hat f_{i,j}, & \text{otherwise}. \end{cases} \qquad (15)$$
If α is small enough, the update scheme (15) becomes
$$\hat u_{i,j} = \begin{cases} \hat g_{i,j}, & \text{if } (i,j) \in J,\\ \hat f_{i,j}, & \text{otherwise}. \end{cases} \qquad (16)$$
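The case analysis above reduces (9) to a coefficient-wise choice between $\hat g_{i,j}$ and $\hat f_{i,j}$. A minimal NumPy sketch of the update schemes (15) and (16) follows, assuming the coefficients are stored in flat arrays and J is given as a boolean mask; the function names are illustrative and not from the paper. A brute-force check over all choices confirms that the selected coefficients minimize (9).

```python
import numpy as np


def update_u_hat(g_hat, f_hat, known, alpha=None):
    """Coefficient-wise minimiser of sub-problem (9).

    g_hat : observed wavelet coefficients (entries outside J are ignored)
    f_hat : wavelet coefficients of the current auxiliary image f
    known : boolean mask of the index set J
    alpha : fitting parameter; None applies the simplified scheme (16)
    """
    g_hat = np.asarray(g_hat, dtype=float)
    f_hat = np.asarray(f_hat, dtype=float)
    keep = np.asarray(known, dtype=bool)
    if alpha is not None:
        # scheme (15): keep g only if (1/alpha) * (g - f)^2 <= 1, i.e. if the
        # fitting cost of keeping it does not exceed the L0 penalty 1 of changing it
        keep = keep & ((g_hat - f_hat) ** 2 <= alpha)
    return np.where(keep, g_hat, f_hat)


def objective_9(u_hat, g_hat, f_hat, known, alpha):
    """Objective of (9): number of changed known coefficients + (1/alpha) * ||u_hat - f_hat||^2."""
    u_hat, g_hat, f_hat = (np.asarray(a, dtype=float) for a in (u_hat, g_hat, f_hat))
    l0 = np.count_nonzero(np.asarray(known, dtype=bool) & (u_hat != g_hat))
    return l0 + np.sum((u_hat - f_hat) ** 2) / alpha


g = np.array([1.0, 2.0, 3.0, 4.0])
f = np.array([1.1, 0.0, 3.05, 9.0])
known = np.array([True, True, False, True])
print(update_u_hat(g, f, known, alpha=0.1))  # scheme (15)
print(update_u_hat(g, f, known))             # scheme (16)
```

Rule (15) is a hard threshold at $\sqrt{\alpha}$ on $|\hat g_{i,j} - \hat f_{i,j}|$, so (16) is recovered whenever every known coefficient passes the test.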
In the presence of noise, we choose a small α and use (15) to update $\hat u_{i,j}$. Once $\hat u$ is found, we find f by applying the graph cut algorithm to the minimization (10). To use the graph cut algorithm, we need to discretize the TV norm in the minimization. Let $M = \{(i,j) \mid i \in \{1,\dots,n_1\},\ j \in \{1,\dots,n_2\}\}$ be the set of grid points and let δ denote the mesh size. As in [1] [4] [19], we use the following special form of the discrete TV norm:
$$TV^k(f) = \sum_{p \in M}\sum_{q \in N_k(p)} \frac{1}{2}\,\omega_{pq}\,|f_p - f_q|, \qquad (17)$$
where $N_k(p)$ is the set of neighboring grid points of $p = (i,j) \in M$, defined as
$$N_4(p) = \{(i \pm 1, j), (i, j \pm 1)\} \cap M, \qquad N_8(p) = \{(i \pm 1, j), (i, j \pm 1), (i \pm 1, j \pm 1)\} \cap M,$$
and $\omega_{pq} = \frac{4\delta^2}{k\,\|p - q\|_2}$. Finally, the minimization (10) can be rewritten in discrete form as
$$\min_f \sum_{p \in M} |u_p - f_p|^2 + \beta \sum_{p \in M}\sum_{q \in N_k(p)} \frac{1}{2}\,\omega_{pq}\,|f_p - f_q|. \qquad (18)$$
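A direct NumPy evaluation of the discrete TV (17) and the objective (18) helps pin down the conventions. This is a sketch under our own assumptions: grid distances $\|p - q\|_2$ are measured in the same units as δ (so axis neighbours are δ apart), and neighbours falling outside M are simply omitted. With these choices and k = 4, the weight reduces to $\omega_{pq} = \delta$, and $TV^4$ equals δ times the sum of absolute forward differences.

```python
import numpy as np


def discrete_tv(f, k=4, delta=1.0):
    """TV^k(f) of (17) with w_pq = 4 delta^2 / (k ||p - q||_2), ||p - q||_2 in units of delta."""
    if k == 4:
        offsets = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    elif k == 8:
        offsets = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]
    else:
        raise ValueError("k must be 4 or 8")
    f = np.asarray(f, dtype=float)
    n1, n2 = f.shape
    total = 0.0
    for di, dj in offsets:
        w = 4.0 * delta**2 / (k * delta * np.hypot(di, dj))
        # all pairs p = (i, j), q = (i + di, j + dj) with both points inside the grid M
        fp = f[max(0, -di):n1 - max(0, di), max(0, -dj):n2 - max(0, dj)]
        fq = f[max(0, di):n1 - max(0, -di), max(0, dj):n2 - max(0, -dj)]
        total += 0.5 * w * np.abs(fp - fq).sum()
    return total


def objective_18(f, u, beta, k=4, delta=1.0):
    """Discrete objective (18): sum_p |u_p - f_p|^2 + beta * TV^k(f)."""
    f, u = np.asarray(f, dtype=float), np.asarray(u, dtype=float)
    return np.sum((u - f) ** 2) + beta * discrete_tv(f, k, delta)


f = np.array([[0.0, 1.0, 3.0], [2.0, 2.0, 0.0]])
print(discrete_tv(f, k=4), discrete_tv(f, k=8))
```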
We assume that the image is in n-bit grayscale format, so that f can only take values in $\{0, 1, \dots, 2^n - 1\}$. Because of this restriction, the minimization (10) can be solved by a graph cut algorithm.
2.1 Graph Construction
In this subsection, we use the graph cut method to solve the minimization (18). The method consists of two parts: graph construction and finding the minimal cut. Our graph construction is based on the method of Bae and Tai [1], which builds a 3-dimensional graph. Suppose the observed image is an n-bit grayscale image, so that its intensity levels range from 0 to $2^n - 1$. The set of vertices is defined as
$$V = \{v_{p,l} \mid p \in M,\ l \in \{1, \dots, 2^n - 1\}\} \cup \{s\} \cup \{t\},$$
where s and t are the two terminal nodes; we refer to [1] for more details. The edges are divided into two groups, $E = E_d \cup E_r$, where
$$E_d = \bigcup_{l=1}^{2^n - 2} \{(v_{p,l}, v_{p,l+1}) \mid p \in M\} \cup \{(s, v_{p,1}) \mid p \in M\} \cup \{(v_{p,2^n-1}, t) \mid p \in M\},$$
$$E_r = \{(v_{p,l}, v_{q,l}) \mid p \in M,\ q \in N_k(p),\ l \in \{1, \dots, 2^n - 1\}\}.$$
The costs of the edges in $E_d$ are defined by the data fitting term:
$$c(s, v_{p,1}) = \delta^2 |u_p|^2, \qquad c(v_{p,l}, v_{p,l+1}) = \delta^2 |u_p - l|^2 \ \text{ for } l \in \{1, \dots, 2^n - 2\}, \qquad c(v_{p,2^n-1}, t) = \delta^2 |u_p - (2^n - 1)|^2.$$
We say that a cut is admissible if it severs exactly one edge of $E_d$ for each p ∈ M, in which case exactly $n_1 n_2$ edges of $E_d$ are severed. The costs of the edges in $E_r$ are defined by the TV norm:
$$c(v_{p,l}, v_{q,l}) = \beta\,\omega_{pq}, \qquad p \in M,\ q \in N_k(p).$$
This construction gives the 3-dimensional graph G = (V, E), on which we find the minimal cut. A cut on G is a partition of the vertices V into two disjoint sets $(V_s, V_t)$ such that $s \in V_s$ and $t \in V_t$. For a given cut, the set of severed edges C is defined as
$$C = \{(a,b) \mid a \in V_s,\ b \in V_t,\ (a,b) \in E\}. \qquad (19)$$
The cut severs the edge e if e is contained in C. The cost of the cut is defined as
$$|C| = \sum_{e \in C} c(e). \qquad (20)$$
The minimal cut is the cut whose total cost |C| is minimal. As in [1], we associate an image f with every admissible cut C through the definition
$$f_p = \begin{cases} 0, & \text{if } (s, v_{p,1}) \in C,\\ l, & \text{if } (v_{p,l}, v_{p,l+1}) \in C,\\ 2^n - 1, & \text{if } (v_{p,2^n-1}, t) \in C. \end{cases} \qquad (21)$$
It is easy to see the following relation between a cut C and the image function f:
$$|C| = \sum_{p \in M} |u_p - f_p|^2 + \beta \sum_{p \in M}\sum_{q \in N_k(p)} \frac{1}{2}\,\omega_{pq}\,|f_p - f_q|, \qquad (22)$$
which is the objective function of the minimization (18). Hence, if C is the minimal cut, the corresponding f minimizes this objective, so finding the minimal cut on this graph is equivalent to solving (18). Besides the construction above, there are other ways to construct the graph. Darbon and Sigelle [9] suggested constructing a graph with only one layer and solving the problem level by level. Ishikawa and Geiger [15] and Ishikawa [14] proposed a similar graph construction with more vertices and edges. Using the construction described above, we build the graph corresponding to the TV minimization (18) and find the minimal cut with the push-relabel algorithm [12]. As a result, we can solve the minimization (10).
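To make the construction concrete, the following pure-Python sketch builds the layered graph for $N_4$ on a tiny image, computes a minimum cut with Edmonds–Karp (a simple stand-in for the push-relabel algorithm [12] used in the paper) and reads f off the source side as in (21). Two simplifications are our own assumptions: a single constant weight replaces $\omega_{pq}$, and infinite-capacity reverse edges $(v_{p,l+1}, v_{p,l})$, the standard device of Ishikawa-type graphs, are added so that every finite cut is admissible. A brute-force search over all labelings checks that the cut value equals the minimum energy.

```python
from collections import deque
from itertools import product

import numpy as np


def max_flow_source_side(n_nodes, edges, s, t):
    """Edmonds-Karp max flow; returns (flow value, source side of a minimum cut)."""
    res = [dict() for _ in range(n_nodes)]          # residual capacities res[a][b]
    for a, b, c in edges:
        res[a][b] = res[a].get(b, 0.0) + c
        res[b].setdefault(a, 0.0)
    value = 0.0
    while True:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:            # BFS for a shortest augmenting path
            a = queue.popleft()
            for b, c in res[a].items():
                if c > 1e-12 and b not in parent:
                    parent[b] = a
                    queue.append(b)
        if t not in parent:                         # nodes still reachable from s form V_s
            return value, set(parent)
        path, b = [], t
        while parent[b] is not None:
            path.append((parent[b], b))
            b = parent[b]
        push = min(res[a][b] for a, b in path)
        for a, b in path:
            res[a][b] -= push
            res[b][a] += push
        value += push


def tv_graph_cut(u, levels, beta, weight=1.0, delta=1.0):
    """Minimise delta^2 * sum (u_p - f_p)^2 + beta * sum_p sum_{q in N4(p)} 0.5 * weight * |f_p - f_q|
    over f with values in {0, ..., levels - 1}, using the layered graph of Section 2.1."""
    n1, n2 = u.shape
    L, inf = levels, float("inf")
    s, t = 0, 1

    def node(i, j, l):                              # vertex v_{p,l} for p = (i, j), l = 1..L-1
        return 2 + (i * n2 + j) * (L - 1) + (l - 1)

    edges = []
    for i in range(n1):
        for j in range(n2):
            up = float(u[i, j])
            edges.append((s, node(i, j, 1), delta**2 * up**2))                       # f_p = 0
            for l in range(1, L - 1):
                edges.append((node(i, j, l), node(i, j, l + 1), delta**2 * (up - l) ** 2))  # f_p = l
                edges.append((node(i, j, l + 1), node(i, j, l), inf))               # admissibility
            edges.append((node(i, j, L - 1), t, delta**2 * (up - (L - 1)) ** 2))    # f_p = L - 1
            for qi, qj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):       # E_r edges
                if 0 <= qi < n1 and 0 <= qj < n2:
                    for l in range(1, L):
                        edges.append((node(i, j, l), node(qi, qj, l), beta * weight))
    value, source = max_flow_source_side(2 + n1 * n2 * (L - 1), edges, s, t)
    # f_p = number of vertices of column p on the source side, cf. (21)
    f = np.array([[sum(node(i, j, l) in source for l in range(1, L)) for j in range(n2)]
                  for i in range(n1)])
    return f, value


def energy(f, u, beta, weight=1.0, delta=1.0):
    """Energy of (18) with constant weights; each unordered neighbour pair counted once."""
    tv = np.abs(np.diff(f, axis=0)).sum() + np.abs(np.diff(f, axis=1)).sum()
    return delta**2 * np.sum((u - f) ** 2) + beta * weight * tv


def brute_force_minimum(u, levels, beta):
    """Exhaustive search over all labelings (only feasible for tiny images)."""
    return min(energy(np.array(c).reshape(u.shape), u, beta)
               for c in product(range(levels), repeat=u.size))


u = np.array([[0.2, 2.7, 3.1], [1.4, 0.1, 2.2]])
f, cut = tv_graph_cut(u, levels=4, beta=0.8)
print(f, cut, brute_force_minimum(u, levels=4, beta=0.8))
```

The graph has one vertex per pixel and grey level, so this sketch is only meant for checking the construction on toy inputs.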
3 Numerical Results
In the numerical experiments, three images of size 96 × 96, namely the ‘lena’ image, the ‘bush’ image and a synthetic image, are used for testing; they are shown in Figure 1. We measure image quality by the peak signal-to-noise ratio (PSNR), given by
$$PSNR = 10 \log_{10} \frac{255^2}{\|u - u_0\|_2^2}, \qquad (23)$$
where $u_0$ is the original image and u is the reconstructed image.
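For reference, (23) can be evaluated as below. This sketch assumes, as is common practice, that $\|u - u_0\|_2^2$ is normalized by the number of pixels (the mean squared error) and that intensities lie in [0, 255]; the function name is ours.

```python
import numpy as np


def psnr(u, u0, peak=255.0):
    """PSNR of (23) in dB, with ||u - u0||_2^2 taken as the mean squared error per pixel."""
    mse = np.mean((np.asarray(u, dtype=float) - np.asarray(u0, dtype=float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)


u0 = np.zeros((96, 96))
print(psnr(u0 + 1.0, u0))   # one grey level of error at every pixel
print(psnr(u0 + 2.0, u0))   # doubling the error lowers the PSNR by about 6.02 dB
```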
In the experiments, we assume that α is small enough that the update scheme (16) can be applied directly. We initialize β with some value and decrease it by 1 in each iteration, because a large β recovers the geometric information while a small β recovers the details of the image. Figure 2 shows the positions of the missing wavelet coefficients of the observed images. We use the ‘db7’ wavelet in our experiments, which are run in Matlab on a laptop computer with an Intel Core 2 CPU T7200 (2 GHz) and 2 GB of memory. We compare our method with the algorithms of Chan et al. [7], referring to their model 1 and model 2 as Model 1 and Model 2.
Fig. 1. Original images: (a) ‘lena’ image, (b) ‘bush’ image and (c) synthetic image
Fig. 2. Positions of the missing coefficients of the observed images: (a) 10% of the wavelet coefficients are missing and (b) 50% of the wavelet coefficients are missing
3.1 Noise-Free Images
In the first experiment, we test our algorithm on noise-free images. Figure 3 shows the ‘lena’ image with 10% of the wavelet coefficients lost and its restored image. The starting β for this case is 60 and the PSNR of the resulting image is 27.80 dB. The results obtained by the algorithms of Chan et al. [7] are also shown in the figure: Figure 3(c), obtained by their Model 1, has PSNR = 28.26 dB and Figure 3(d), obtained by their Model 2, has PSNR = 26.70 dB. Figure 4 shows the ‘bush’ image with 10% of the wavelet coefficients lost and its restored image. The starting β for this case is 55 and the PSNR of the resulting image is 29.54 dB. Figure 4(c), obtained by Model 1, has PSNR = 26.89 dB and Figure 4(d), obtained by Model 2, has PSNR = 25.65 dB. Figure 5 shows the ‘bush’ image with 50% of the wavelet coefficients lost and its restored image. The starting β for this case is 60 and the PSNR of the resulting image is 19.48 dB. Figure 5(c), obtained by Model 1, has PSNR = 17.92 dB and Figure 5(d), obtained by Model 2, has PSNR = 18.22 dB. Figure 6 shows the synthetic image with 50% of the wavelet coefficients lost and its restored image. The starting β for this case is 60 and the PSNR of the resulting image is 23.79 dB. Figure 6(c), obtained by Model 1, has PSNR = 26.04 dB and Figure 6(d), obtained by Model 2, has PSNR = 22.38 dB. Table 1 summarizes the results of this experiment.
Fig. 3. The ‘lena’ image with 10% of the wavelet coefficients lost: (a) observed image (PSNR = 11.84 dB) and (b) restored image (PSNR = 27.80 dB). (c) and (d) are obtained by the algorithms of Chan et al. [7].
Fig. 4. The ‘bush’ image with 10% of the wavelet coefficients lost: (a) observed image (PSNR = 16.08 dB) and (b) restored image (PSNR = 29.54 dB). (c) and (d) are obtained by the algorithms of Chan et al. [7].
Fig. 5. The ‘bush’ image with 50% of the wavelet coefficients lost: (a) observed image (PSNR = 11.00 dB) and (b) restored image (PSNR = 19.48 dB). (c) and (d) are obtained by the algorithms of Chan et al. [7].
Fig. 6. The synthetic image with 50% of the wavelet coefficients lost: (a) observed image (PSNR = 10.02 dB) and (b) restored image (PSNR = 23.79 dB). (c) and (d) are obtained by the algorithms of Chan et al. [7].
Table 1. Comparison for the noise-free cases (PSNR in dB)
Image             Missing coef.   Obs. image   Our alg.   Model 1   Model 2
Lena image        10%             11.84        27.80      28.26     26.70
Bush image        10%             16.08        29.54      26.89     25.65
Bush image        50%             11.00        19.48      17.92     18.22
Synthetic image   50%             10.02        23.79      26.04     22.38
Fig. 7. Noisy ‘lena’ image, ‘bush’ image and synthetic image
Fig. 8. The noisy ‘lena’ image with 10% of the wavelet coefficients lost: (a) observed image (PSNR = 11.28 dB) and (b) restored image (PSNR = 22.37 dB). (c) and (d) are obtained by the algorithms of Chan et al. [7].
The experimental results show that our method obtains better results than Model 2 in all cases. For the ‘bush’ image, our results are also better than those of Model 1 in PSNR. The restored images show that our method keeps more details than the other methods. In Figure 6, the small circle in the upper right corner is clearer than in the other images.
Fig. 9. The noisy ‘bush’ image with 10% of the wavelet coefficients lost: (a) observed image (PSNR = 15.93 dB) and (b) restored image (PSNR = 26.22 dB). (c) and (d) are obtained by the algorithms of Chan et al. [7].
Fig. 10. The noisy ‘bush’ image with 50% of the wavelet coefficients lost: (a) observed image (PSNR = 10.76 dB) and (b) restored image (PSNR = 17.54 dB). (c) and (d) are obtained by the algorithms of Chan et al. [7].
Fig. 11. The noisy synthetic image with 50% of the wavelet coefficients lost: (a) observed image (PSNR = 9.85 dB) and (b) restored image (PSNR = 19.06 dB). (c) and (d) are obtained by the algorithms of Chan et al. [7].
Table 2. Comparison for the noisy cases (PSNR in dB)
Image             Missing coef.   Obs. image   Our alg.   Model 1   Model 2
Lena image        10%             11.28        22.37      17.79     20.67
Bush image        10%             15.93        26.22      24.47     22.48
Bush image        50%             10.76        17.54      15.36     15.90
Synthetic image   50%             9.85         19.06      15.84     18.95
3.2 Noisy Images
In the second experiment, we test our algorithm on noisy images. We add white Gaussian noise with σ = 0.01 to the original images; the resulting noisy images are shown in Figure 7. Figure 8 shows the ‘lena’ image with 10% of the wavelet coefficients lost and its restored image. The starting β for this case is 65; we obtained the best image at β = 23, with a PSNR of 22.37 dB. The PSNR of Figure 8(c), obtained by Model 1, is 17.79 dB and that of Figure 8(d), obtained by Model 2, is 20.67 dB. Figure 9 shows the ‘bush’ image with 10% of the wavelet coefficients lost and its restored image. The starting β for this case is 50 and the PSNR of the resulting image is 26.22 dB. The PSNR of Figure 9(c), obtained by Model 1, is 24.47 dB and that of Figure 9(d), obtained by Model 2, is 22.48 dB. Figure 10 shows the ‘bush’ image with 50% of the wavelet coefficients lost and its restored image. The starting β for this case is 60 and the PSNR of the resulting image is 17.54 dB. The PSNRs of the images restored by Model 1 (Figure 10(c)) and Model 2 (Figure 10(d)) are 15.36 dB and 15.90 dB, respectively. Figure 11 shows the synthetic image with 50% of the wavelet coefficients lost and its restored image. The starting β for this case is 60 and the PSNR of the resulting image is 19.06 dB. The PSNRs of the images restored by Model 1 (Figure 11(c)) and Model 2 (Figure 11(d)) are 15.84 dB and 18.95 dB, respectively. Table 2 summarizes the results of this experiment. This experiment is more difficult because noise is present in the observed images. Table 2 shows that our restored images are better than those of the other methods in PSNR. The restored images show that our method removes the noise and keeps more details of the image. In Figure 10, our result keeps more details than the other results; in particular, the shape of Bush's head is better preserved in our result than in the other images.
4 Conclusion
In this paper, we introduce an algorithm for the wavelet inpainting problem. We apply the L0-norm to optimize the wavelet coefficients and TV minimization to fill in the missing information, and we suggest a method to solve the L0-norm problem directly. We solve the minimization (5) by introducing an additional fitting term and splitting it into the two minimizations (9) and (10): (9) minimizes the L0-norm term and (10) minimizes the TV norm, for which we apply a graph cut algorithm. In the experiments, our algorithm obtains a higher PSNR than both models of [7] on all noisy test images and than Model 2 on all noise-free test images.
References
1. Bae, E., Tai, X.C.: Graph Cuts for the Multiphase Mumford-Shah Model Using Piecewise Constant Level Set Methods. UCLA, Applied Mathematics, CAM Report 08-36 (2008)
2. Bertalmio, M., Sapiro, G., Caselles, V., Balleste, C.: Image inpainting. Technical report, ECE-University of Minnesota 60, 259–268 (1999)
3. Cai, J.F., Chan, R.H., Shen, Z.: A Framelet-Based Image Inpainting Algorithm. Appl. Comput. Harmon. Anal. 24, 131–149 (2008)
4. Chambolle, A.: Total variation minimization and a class of binary MRF models. In: Rangarajan, A., Vemuri, B.C., Yuille, A.L. (eds.) EMMCVPR 2005. LNCS, vol. 3757, pp. 136–152. Springer, Heidelberg (2005)
5. Chan, T.F., Ng, M.K., Yau, A.C., Yip, A.M.: Superresolution image reconstruction using fast inpainting algorithms. Applied and Computational Harmonic Analysis 23(1), 3–24 (2007)
6. Chan, T., Shen, J.: Mathematical models for local non-texture inpainting. SIAM Journal on Applied Mathematics 62, 1019–1043 (2001)
7. Chan, T., Shen, J., Zhou, H.M.: Total Variation Wavelet Inpainting. Journal of Mathematical Imaging and Vision 25(1), 107–125 (2006)
8. Chan, T., Zhou, H.M.: Optimal Constructions of Wavelet Coefficients Using Total Variation Regularization in Image Compression. UCLA, Applied Mathematics, CAM Report 00-27 (2000)
9. Darbon, J., Sigelle, M.: Image restoration with discrete constrained total variation part I: Fast and exact optimization. J. Math. Imaging Vis. 26(3), 261–276 (2006)
10. Donoho, D.L.: For Most Large Underdetermined Systems of Linear Equations the Minimal l1-norm Solution is also the Sparsest Solution. Communications on Pure and Applied Mathematics 59(7), 903–934 (2006)
11. Durand, S., Froment, J.: Artifact Free Signal Denoising with Wavelets. In: Proceedings of ICASSP 2001, vol. 6, pp. 3685–3688 (2001)
12. Goldberg, A.V., Tarjan, R.E.: A new approach to the maximum-flow problem. J. ACM 35(4), 921–940 (1988)
13. Huang, Y., Ng, M.K., Wen, Y.: A Fast Total Variation Minimization Method for Image Restoration. Multiscale Modeling & Simulation 7(2), 774–795 (2008)
14. Ishikawa, H.: Exact optimization for Markov random fields with convex priors. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(10), 1333–1336 (2003)
15. Ishikawa, H., Geiger, D.: Segmentation by grouping junctions. In: CVPR 1998: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, DC, USA. IEEE Computer Society, Los Alamitos (1998)
16. Mancera, L., Portilla, J.: L0-Norm-Based Sparse Representation Through Alternate Projections. In: IEEE International Conference on Image Processing, Atlanta, pp. 2089–2092 (2006)
17. Masnou, S., Morel, J.: Level-lines based disocclusion. In: Proc. 5th IEEE Int. Conf. on Image Process., Chicago, pp. 259–263 (1998)
18. Tai, X.C., Osher, S., Holm, R.: Image Inpainting Using a TV-Stokes Equation. In: Image Processing based on partial differential equations, pp. 3–22. Springer, Heidelberg (2006)
19. Ranchin, F., Chambolle, A., Dibos, F.: Total Variation Minimization and Graph Cuts for Moving Objects Segmentation. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 743–753. Springer, Heidelberg (2007)
20. Wang, Y., Zhou, H.: Total Variation Wavelet-Based Medical Image Denoising. International Journal of Biomedical Imaging 2006, 1–6 (2006)
21. Wang, Y., Yang, J., Yin, W., Zhang, Y.: A New Alternating Minimization Algorithm for Total Variation Image Reconstruction. SIAM J. Imaging Science 1(3), 248–272 (2008)
Total-Variation Based Piecewise Affine Regularization
Jing Yuan¹, Christoph Schnörr¹, and Gabriele Steidl²
¹ Image and Pattern Analysis Group, Dept. Mathematics and Computer Science, University of Heidelberg, Germany
{yuanjing,schnoerr}@math.uni-heidelberg.de
² Appl. Math. Comp. Sci. Group, Dept. Mathematics and Computer Science, University of Mannheim
Abstract. In this paper, we introduce a novel second-order regularizer, the Affine Total-Variation term, to capture the geometry of piecewise affine functions. The approach can be characterized by two convex decompositions of a given image into piecewise affine structure and texture and noise, respectively. A convergent multiplier-based method is presented for computing a global optimum by computationally cheap iterative steps. Experiments with images and vector fields validate our approach and illustrate the differences from classical TV denoising and decomposition.
1 Introduction
1.1 Overview and Motivation
In this paper, we suggest and investigate a novel second-order regularization term,
$$TV_a(u) := \int_\Omega \sqrt{u_{xx}^2 + u_{yx}^2} + \sqrt{u_{xy}^2 + u_{yy}^2}\,dx, \qquad (1)$$
called Affine Total Variation, for denoising and decomposing functions into piecewise affine structures. Our work has been motivated by the basic total variation approach [15] to the piecewise constant regularization of functions, henceforth called the ROF-model, and by a recent extension of this approach suggested in [23] to the piecewise harmonic regularization of vector fields. The latter approach demonstrates that by modifying the usual total variation term
$$TV(u) = \int_\Omega |\nabla u|\,dx, \qquad (2)$$
flows can be restored and decomposed into richer structures than merely piecewise constant functions, which model only a narrow subclass of real signals sufficiently accurately. At the same time, the basic structure of the ROF-model from the viewpoint of convex optimization is preserved, so that standard methods from convex programming lead to efficient algorithms.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 552–564, 2009. © Springer-Verlag Berlin Heidelberg 2009
While the work [23] was motivated by flows related to image sequences from experimental fluid dynamics, our present work investigates piecewise affine regularization as an alternative to the piecewise harmonic case studied in [23]. Figure 1 shows the result of applying the novel regularizer (1) to a noisy image function. Our approach returns a denoised version of the input data with the piecewise affine structures preserved well. From the viewpoint of optimization, our approach has the same simple structure as the ROF-model. From the viewpoint of algorithm design, however, a bit more work is required to be able to resort to standard algorithms, due to the second-order partial derivatives appearing in (1).
Fig. 1. From left to right. Noisy input image f , denoised image u using the regularizer (1), and the difference between the original noise-free image and the denoised image. Up to local errors at discontinuities, this latter image is almost constant which means that the piecewise affine structure underlying the noisy input data has been successfully restored.
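A small NumPy illustration of why (1) suits piecewise affine data follows, under discretization choices that are ours rather than the paper's: forward differences on the interior grid, with (1) evaluated as TV(u_x) + TV(u_y), the form used in the splitting (14). An affine ramp has a large TV but zero $TV_a$, while a crease in the slope is penalized by $TV_a$.

```python
import numpy as np


def iso_tv(w):
    """Isotropic TV of a 2-D array with forward differences on the common interior grid."""
    dx = np.diff(w, axis=0)[:, :-1]
    dy = np.diff(w, axis=1)[:-1, :]
    return np.sqrt(dx ** 2 + dy ** 2).sum()


def tv_affine(u):
    """Discrete TV_a(u) = TV(u_x) + TV(u_y); axis 0 plays the role of x, axis 1 of y."""
    ux = np.diff(u, axis=0)
    uy = np.diff(u, axis=1)
    return iso_tv(ux) + iso_tv(uy)


i, j = np.mgrid[0:6, 0:7].astype(float)
ramp = 2.0 * i + 3.0 * j + 1.0        # globally affine
kink = np.abs(i - 3.0) + j            # piecewise affine with a crease at i = 3
print(iso_tv(ramp), tv_affine(ramp))  # large TV, zero TV_a
print(iso_tv(kink), tv_affine(kink))  # TV_a only responds to the crease
```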
1.2 Related Work and Contribution
Related work. Applying the standard TV-term (2) to general, not necessarily piecewise constant signals and images leads to the well-known staircasing effect, that is, to many jumps of the minimizing function, which make the decomposition of the input data useless for signal interpretation. In this connection, higher-order regularization has been studied in the literature. In [1], Chambolle and Lions propose an inf-convolution of the total-variation term and a functional based on second-order derivatives:
$$R(u) = \min_{u = v + w} \int_\Omega |\nabla v|\,dx + \alpha \int_\Omega \sqrt{w_{xx}^2 + w_{yx}^2 + w_{xy}^2 + w_{yy}^2}\,dx.$$
A corresponding asymptotic case was studied in [16]. To avoid staircasing, Chan et al. adaptively add the Laplacian as a regularizing term [3] or replace the second summand in the inf-convolution by the Laplacian [2]. After mollifying the TV-measure, $TV(u) \approx \int_\Omega \sqrt{|\nabla u|^2 + \varepsilon}\,dx$ with $\varepsilon \ll 1$, the corresponding Euler-Lagrange equation is solved iteratively by the lagged-diffusivity fixed-point method (cf. [19]). Likewise, You and Kaveh [20] and Didas et al. [5] investigate the Laplacian Δu and variations thereof as the argument of a convex functional. In [11], Lysaker and Tai provide two regularizers
$$R_1(u) = \int_\Omega |u_{xx}| + |u_{yy}|\,dx, \qquad (3)$$
$$R_2(u) = \int_\Omega \sqrt{u_{xx}^2 + u_{yx}^2 + u_{xy}^2 + u_{yy}^2}\,dx, \qquad (4)$$
which are used in a PDE-based image diffusion process to avoid the staircase effect in smooth regions; a fourth-order numerical scheme is also given. In [12], Lysaker and Tai further introduce a convex combination of a higher-order regularizer and the classical total-variation term. The functional (3) was also considered in [8]. In [13], Rahman, Tai and Osher suggested a two-step higher-order image denoising method. It first computes a denoised tangential field $\tau = (\tau_1, \tau_2)^T$ with div τ = 0 by applying the regularizer $\int |\nabla \tau|\,dx$, which is equivalent to (4) for the scalar image field, and then reconstructs the image gray values by fitting the resulting normal field $n = (\tau_2, -\tau_1)$ through
$$\min_u \int_\Omega |\nabla u| - \nabla u \cdot \frac{n}{|n|}\,dx \quad \text{s.t.} \quad \int_\Omega (u - f)^2\,dx = \sigma^2.$$
The energy functionals used in our approaches have the same structure as those of [11], except for the nonsmooth higher-order regularizer applied, and the functional optimized in (13b) is similar to the tangential-smoothing step suggested in [13], except that our approach smooths the curl-free gradient field rather than the divergence-free field. In connection with optical flow estimation, Trobin et al. [18] adopt from [4] the second-order term
$$t(u) := \frac{1}{\sqrt 3}\left(\Delta u,\ \sqrt 2\,(u_{xx} - u_{yy}),\ \sqrt 8\,u_{xy}\right)^T,$$
and use the corresponding TV-term $\int_\Omega \sqrt{t(u) \cdot t(u)}\,dx$ for flow estimation. The derivation of t(u) in [4] is based on Fourier transforms and motivated by the design of local detectors for ridges and valleys of image functions. As a consequence, the corresponding TV-term does not appear to be a proper basis for piecewise affine decomposition, and boundaries are not treated adequately (as is clearly visible, e.g., in Fig. 2f of [18]). Contribution. Our contribution consists in devising a novel regularization term (1) that provides a mathematically precise solution to the problem of denoising piecewise affine signals.
Staircasing is suppressed as well, and an augmented-Lagrangian-based problem decomposition is derived that makes it possible to compute a global optimum by computationally simple iterative steps. Numerical experiments are presented mainly to illustrate and validate properties of the approach.
2 Subspaces and Orthogonal Decompositions
We let $\Omega \subset \mathbb{R}^2$ denote an open, bounded and simply-connected domain with Lipschitz-continuous boundary ∂Ω. For scalar-valued functions, we denote by $|\cdot|_p$, $1 \le p < \infty$, the usual $L^p(\Omega)$ norm and by $\langle \cdot, \cdot \rangle$ the $L^2(\Omega)$ inner product. For vector-valued functions $g = (g_1, g_2)^T$, we set $\|g\|_p := \big|\sqrt{g_1^2 + g_2^2}\big|_p$ and $\langle g, h \rangle_\Omega := \langle g_1, h_1 \rangle + \langle g_2, h_2 \rangle$. Further, we use the notation $\bar u := |\Omega|^{-1} \int_\Omega u\,dx$ for the average of u, and $\nabla u := (u_x, u_y)^T$, $\nabla^\perp u := (u_y, -u_x)^T$, $\operatorname{div} g := g_{1x} + g_{2y}$ and $\operatorname{curl} g := g_{1y} - g_{2x}$. Let $H^1(\Omega)$ denote the Sobolev space with the inner product
$$\langle u, v \rangle_{H^1} := \langle \nabla u, \nabla v \rangle_\Omega + \bar u\,\bar v \qquad (5)$$
and let $H_0^1(\Omega) := \{u \in H^1(\Omega) : u|_{\partial\Omega} = 0\}$. We are interested in the space
$$H(\Omega) := \{u \in H^1(\Omega) : \partial_n u|_{\partial\Omega} = (\bar u_x, \bar u_y)^T \cdot n\},$$
where n denotes the outer unit normal vector on the boundary ∂Ω. By the following proposition, we can decompose functions u ∈ H(Ω) into a globally affine component $u_a$ and an oscillating part $u_o$.
Proposition 1. The space H(Ω) admits the orthogonal decomposition
$$H(\Omega) = H_a(\Omega) \oplus_{H^1} H_o(\Omega), \qquad (6a)$$
$$H_a(\Omega) := \{u \in H(\Omega) : \nabla u = (\bar u_x, \bar u_y)^T\}, \qquad (6b)$$
$$H_o(\Omega) := \{u \in H(\Omega) : \bar u = \bar u_x = \bar u_y = 0,\ \partial_n u|_{\partial\Omega} = 0\}. \qquad (6c)$$
Proof. For any u ∈ H(Ω), let $u_{ox} := u_x - \bar u_x$ and $u_{oy} := u_y - \bar u_y$. Then $u_a := \bar u_x x + \bar u_y y + \bar u \in H_a(\Omega)$, and the function $u_o$ defined by its partial derivatives $u_{ox}, u_{oy}$ and by $\bar u_o = 0$ belongs to $H_o(\Omega)$. Moreover, we have $u = u_a + u_o$. The orthogonality of the decomposition follows from
$$\langle u_a, u_o \rangle_{H^1} = \langle \nabla u_a, \nabla u_o \rangle_\Omega + \bar u_a \bar u_o = \bar u_x \int_\Omega u_{ox}\,dx + \bar u_y \int_\Omega u_{oy}\,dx = 0. \qquad \square$$
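In a discrete setting, the splitting of Proposition 1 can be reproduced in a few NumPy lines. This is an illustrative sketch with our own choices: forward differences stand in for $u_x$ and $u_y$, and the coordinates are centred ($\bar x = \bar y = 0$), which is what makes the mean of $u_a$ equal to $\bar u$. The checks confirm $u = u_a + u_o$ and the vanishing discrete counterpart of the inner product (5).

```python
import numpy as np


def affine_split(u):
    """Discrete analogue of Proposition 1: u = u_a + u_o with u_a affine."""
    u = np.asarray(u, dtype=float)
    ux_bar = np.diff(u, axis=0).mean()                # mean of u_x
    uy_bar = np.diff(u, axis=1).mean()                # mean of u_y
    i, j = np.mgrid[0:u.shape[0], 0:u.shape[1]].astype(float)
    x, y = i - i.mean(), j - j.mean()                 # centred coordinates (our assumption)
    u_a = ux_bar * x + uy_bar * y + u.mean()
    return u_a, u - u_a


def h1_inner(a, b):
    """Discrete counterpart of (5): <grad a, grad b> + mean(a) * mean(b)."""
    return (np.sum(np.diff(a, axis=0) * np.diff(b, axis=0))
            + np.sum(np.diff(a, axis=1) * np.diff(b, axis=1))
            + a.mean() * b.mean())


rng = np.random.default_rng(0)
u = rng.standard_normal((6, 5)) + 0.5 * np.arange(6)[:, None]
u_a, u_o = affine_split(u)
print(h1_inner(u_a, u_o))   # zero up to rounding
```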
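The Helmholtz decomposition of vector fields (see [6, 22, 21], also for the discrete setting) is given by $L^2(\Omega)^2 = \nabla H^1(\Omega) \oplus \nabla^\perp H_0^1(\Omega)$, where the spaces can also be characterized by $\nabla H^1(\Omega) = \{v \in L^2(\Omega)^2 : \operatorname{curl} v = 0\}$ and $\nabla^\perp H_0^1(\Omega) = \{v \in L^2(\Omega)^2 : \operatorname{div} v = 0,\ v \cdot n|_{\partial\Omega} = 0\}$. We will need the space
$$V(\Omega) := \{v \in L^2(\Omega)^2 : v \cdot n|_{\partial\Omega} = 0,\ \bar v_1 = \bar v_2 = 0\}. \qquad (7)$$
By the Helmholtz decomposition, this space admits the orthogonal decomposition
$$V(\Omega) = V_\nabla(\Omega) \oplus V_{\nabla^\perp}(\Omega), \qquad (8)$$
where $V_\nabla(\Omega) := \{v \in \nabla H^1(\Omega) : v \cdot n|_{\partial\Omega} = 0,\ \bar v_1 = \bar v_2 = 0\}$ and $V_{\nabla^\perp}(\Omega) := \{v \in \nabla^\perp H_0^1(\Omega) : \bar v_1 = \bar v_2 = 0\}$.
Proposition 2. For every vector field $v \in V_\nabla(\Omega)$, there is a unique function $u_o \in H_o(\Omega)$ with $v = \nabla u_o$.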
Proof. By definition, for any $v \in V_\nabla(\Omega)$ there exists $u \in H^1(\Omega)$ such that $v = \nabla u$. Then $v \cdot n|_{\partial\Omega} = \partial_n u|_{\partial\Omega} = 0$ and $\bar v_1 = \bar u_x = 0$, $\bar v_2 = \bar u_y = 0$. On the other hand, $u_o \in H_o$ is uniquely determined by the Neumann problem
$$\Delta u_o = \operatorname{div} v, \qquad \partial_n u_o|_{\partial\Omega} = 0, \qquad \bar u_o = 0. \qquad (9)$$
3 Variational Approaches
In the rest of this paper, we follow the first-discretize-then-optimize paradigm, yet adopt the usual (continuous) notation, which is easier to read. Accordingly, all operators such as ∇ and div denote linear mappings between finite-dimensional spaces, $|\cdot|_p$ are the usual $\ell_p$ norms, and for $g := (g_i)_{i=1}^n$, $g_i \in \mathbb{R}^2$, we set $\|g\|_p := |(|g_i|_2)_{i=1}^n|_p$. In the following, we denote by $\delta_C$ the indicator function of a convex set C, i.e. $\delta_C(x) := 0$ if x ∈ C and $\delta_C(x) := \infty$ otherwise, and by $P_C$ the orthogonal projector onto C. We exhibit the effect of the regularizer (1) by computing dual representations of the optimization problems (13), in accordance with the dual formulation of the ROF-model. In general, if $g : \mathbb{R}^n \to \mathbb{R}$ and $\Phi : \mathbb{R}^m \to \mathbb{R}$ are proper, closed convex functions and $D : \mathbb{R}^n \to \mathbb{R}^m$ is a linear operator, then the primal problem (P) has the dual problem (D):
$$(P)\quad \inf_{u \in \mathbb{R}^n} \{g(u) + \Phi(Du)\}, \qquad (D)\quad -\inf_{p \in \mathbb{R}^m} \{g^*(-D^*p) + \Phi^*(p)\},$$
where $g^*$ denotes the conjugate function of g. For the problems considered in the following, it can be shown that solutions of the primal and dual problems exist and that the duality gap is zero.
Rudin-Osher-Fatemi (ROF) Model. We recall some basic formulas as a reference for our approach presented below. The ROF-model reads
$$\inf_u \frac{1}{2}|f - u|_2^2 + \alpha\, TV(u), \qquad \alpha\, TV(u) := \sigma_{C_\alpha}(u), \qquad (10)$$
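where $\sigma_{C_\alpha}$ denotes the support function of $C_\alpha := \operatorname{div} B_\alpha$, $B_\alpha := \{p : \|p\|_\infty \le \alpha\}$. Let $\hat u$ denote the minimizer of (10). Setting $g(u) := \frac12|f - u|_2^2$, $D := I$ and $\Phi(u) := \alpha\, TV(u)$, and noting that $g^*(v) = \frac12|f + v|_2^2 - \frac12|f|_2^2$ and $\Phi^*(v) = \delta_{C_\alpha}(v)$, the dual problem reads
$$-\inf_{v \in C_\alpha} \frac12|f - v|_2^2 - \frac12|f|_2^2, \qquad (11)$$
where we have replaced v by −v using the symmetry of $C_\alpha$. Consequently, if
$$\hat p := \arg\min_{p \in B_\alpha} \frac12|f - \operatorname{div} p|_2^2, \qquad (12)$$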
then $\hat v := \operatorname{div}\hat p = P_{C_\alpha}(f)$ is the minimizer of the dual problem. Primal and dual solutions are related by the optimality condition $f - \operatorname{div}\hat p = \hat u$, which in turn yields the image decomposition $f = \hat u + \hat v$.
Affine Variational Models. Based on the regularizer (1), we consider two variational approaches:
$$\inf_u \frac12|u - f|_2^2 + \alpha\, TV_a(u), \qquad (13a)$$
$$\inf_u \frac12|u - f|_{H^1}^2 + \alpha\, TV_a(u). \qquad (13b)$$
These approaches differ in the data term: (13a) uses the usual one, whereas the data term in (13b) is induced by the discrete counterpart of the inner product (5).
3.1
Variational Approach (13a)
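We introduce an auxiliary vector field $v$ in order to express the regularizer (1) in terms of the ordinary TV measure defined in (2). Then approach (13a) reads

$$\inf_{u,v} \Big\{ \frac12 |f-u|_2^2 + \alpha\,\mathrm{TV}(v_1) + \alpha\,\mathrm{TV}(v_2) \Big\} \quad \text{subject to} \quad v = \nabla u, \tag{14}$$

i.e., $\inf_{u,v} \{ g(u,v) + \Phi(D(u,v)^T) \}$ with $g(u,v) := \frac12 |f-u|_2^2 + \alpha\,\mathrm{TV}(v_1) + \alpha\,\mathrm{TV}(v_2)$, $\Phi := \delta_0$ and $D := (\nabla \;\; -I)$. Since $g^*(r,s) = \frac12 |f+r|_2^2 - \frac12 |f|_2^2 + \delta_{C_\alpha}(s_1) + \delta_{C_\alpha}(s_2)$, $\Phi^* \equiv 0$ and $-D^* = \begin{pmatrix} \operatorname{div} \\ I \end{pmatrix}$, the dual problem becomes (again using the symmetry of $C_\alpha$)

$$-\inf_{q \in C_\alpha^2} g^*\Big( \begin{pmatrix} \operatorname{div} \\ I \end{pmatrix} q \Big) = -\inf_{q \in C_\alpha^2} \Big\{ \frac12 |f - \operatorname{div} q|_2^2 - \frac12 |f|_2^2 \Big\}.$$

This formulation parallels the dual formulation (11) of the ROF model. Let

$$\hat q := \operatorname*{argmin}_{q \in C_\alpha^2} \frac12 |f - \operatorname{div} q|_2^2. \tag{15}$$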
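In one space dimension $v = u'$ is scalar, and (13a) reduces to $\inf_u \frac12 |f-u|_2^2 + \alpha |D_e D u|_1$, where $D$ and $D_e$ are forward-difference matrices; the dual (15) then asks for $\operatorname{div}\hat q = (D_e D)^T \hat p$ with $|\hat p|_\infty \le \alpha$. The Python sketch below illustrates this 1D analogue under our own assumptions (forward differences, a plain projected-gradient solver with step $1/16 \le 1/\|D_e D\|^2$); it is not the authors' implementation, and the function names are hypothetical.

```python
import numpy as np


def forward_diff(n):
    """Forward-difference matrix of shape (n-1, n): (D u)_i = u_{i+1} - u_i."""
    return np.diff(np.eye(n), axis=0)


def affine_tv_dual_1d(f, alpha, n_iter=5000):
    """1D analogue of (15): return (u_hat, v_hat) with v_hat = div q_hat.

    Solves min_{|p|_inf <= alpha} 0.5 * |f - K p|^2 with K = (De @ D).T by
    projected gradient; K p plays the role of div q for q = div p.
    """
    n = f.size
    D = forward_diff(n)        # nodes -> edges (the gradient of u)
    De = forward_diff(n - 1)   # edges -> interior nodes (the gradient of v = u')
    K = (De @ D).T             # shape (n, n-2)
    step = 1.0 / 16.0          # ||K||^2 <= ||De||^2 * ||D||^2 <= 16
    p = np.zeros(n - 2)
    for _ in range(n_iter):
        u = f - K @ p          # current primal (cartoon) estimate
        # gradient step on 0.5*|f - K p|^2, then projection onto |p| <= alpha
        p = np.clip(p + step * (K.T @ u), -alpha, alpha)
    v_hat = K @ p              # texture part
    return f - v_hat, v_hat
```

Because $\hat u - f$ lies in the range of $(D_e D)^T$, it is orthogonal to every affine signal: affine inputs are returned unchanged, and the mean and first moment of $f$ are preserved regardless of the number of iterations.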
The higher-order TV regularization becomes apparent through the texture part of $f$, which is defined by the orthogonal projection $\hat v = \operatorname{div}\hat q = P_{\operatorname{div} C_\alpha^2}(f)$ onto a different convex set. An alternative, more explicit characterization of the regularization effect of (1) in terms of the auxiliary field $v = \nabla u$ is obtained by reformulating (13a) as

$$\inf_v \Big\{ G(v) + \alpha\big(\mathrm{TV}(v_1) + \mathrm{TV}(v_2)\big) \Big\}, \quad \text{where} \quad G(v) := \inf_{u,\, \nabla u = v} \frac12 |u-f|_2^2. \tag{16}$$
558
J. Yuan, C. Schnörr, and G. Steidl
Exploiting strong duality again we obtain

$$G(v) = -\inf_p \Big\{ \frac12 |f - \operatorname{div} p|_2^2 - \frac12 |f|_2^2 - \langle v, p \rangle_\Omega \Big\}. \tag{17}$$
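Fermat's rule yields that the minimizer $\hat p$ has to fulfill $\nabla \operatorname{div}\hat p = \nabla f - v$ and, in turn, $\Delta(\operatorname{div}\hat p) = \Delta f - \operatorname{div} v$. Inserting this into $G(v)$ in (17) yields for (16) (omitting the constant)

$$\inf_v \Big\{ \frac12 \big|\Delta^{-1}(\Delta f - \operatorname{div} v)\big|_2^2 + \alpha\big(\mathrm{TV}(v_1) + \mathrm{TV}(v_2)\big) \Big\}. \tag{18}$$

This representation of (13a) and (15), respectively, shows that the edge image $\Delta f$ is approximated by the divergence of a piecewise smooth vector field $v$ in terms of the $|\cdot|_{\Delta^{-2}}$-norm. Clearly, inserting $v = \nabla u$ into $\Delta^{-1}(\Delta f - \operatorname{div} v)$ yields $\frac12 |f-u|_2^2$ from (13a).
3.2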
Variational Approach (13b)
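The data term of problem (13b) decomposes according to the orthogonal decomposition (6a). By construction, the affine component $u_a$ of $u = u_a + u_o$ is not affected by the regularizer. Thus, $\hat u_a = f_a$, where $f_a$ can be computed in a preprocessing step. It remains to minimize

$$\inf_{u_o, v} \Big\{ \frac12 \|\nabla f_o - v\|_2^2 + \alpha\big(\mathrm{TV}(v_1) + \mathrm{TV}(v_2)\big) \Big\} \quad \text{subject to} \quad v = \nabla u_o.$$

Due to Prop. 2 the linear constraint can be expressed as $\delta_{V_\nabla}(v)$. Reasoning similar to the previous section yields

$$\sup_w \Big\{ \langle w, \nabla f_o - v \rangle_\Omega - \frac12 \|w\|_2^2 \Big\} + \sup_{q \in C_\alpha^2} \langle q, -v \rangle_\Omega + \delta_{V_\nabla}(v) = \sup_{w,\, q \in C_\alpha^2} \Big\{ \langle w + q, -v \rangle_\Omega + \delta_{V_\nabla}(v) + \langle w, \nabla f_o \rangle_\Omega - \frac12 \|w\|_2^2 \Big\}.$$

Interchanging inf and sup and taking the infimum over $v$ (ignoring constants), we obtain

$$\inf_{w,\, q \in C_\alpha^2} \frac12 \|\nabla f_o - w\|_2^2 \quad \text{subject to} \quad w + q \in V_\nabla^\perp. \tag{19}$$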
The minimizer $\bar w$ is obviously an element of $V_\nabla$, which together with the constraints $q \in C_\alpha^2$, $w + q \in V_\nabla^\perp$ leads to the reformulation of (19)

$$\inf_{q \in P_\nabla(C_\alpha^2)} \frac12 \|\nabla f_o - q\|_2^2. \tag{20}$$
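Here $P_\nabla$ denotes the orthogonal projector onto the subspace $V_\nabla$. To compare this approach with (15), we rewrite (20) as

$$\inf_{q \in P_\nabla(C_\alpha^2)} \frac12 \big\|\nabla(f_o - \Delta^{-1}\operatorname{div} q)\big\|_2^2 = \inf_{q \in P_\nabla(C_\alpha^2)} \frac12 \big\|\nabla \Delta^{-1}(\Delta f_o - \operatorname{div} q)\big\|_2^2, \tag{21}$$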
where $\Delta^{-1}$ stands for the solution operator of problem (9). Approach (15), on the other hand, is given by

$$\inf_{q \in C_\alpha^2} |f_a + (f_o - \operatorname{div} q)|_2^2. \tag{22}$$
Taking into account the representation of vector fields $q \in V_\nabla$ by potential functions $\phi_q$ in terms of $q = \nabla\phi_q$, viz. $\operatorname{div} q = \Delta\phi_q$ (Prop. 2), we see that (21) focuses on the decomposition of the edge set $\Delta f$, whereas (22) decomposes $f$ and does not discriminate between the two components $f_a$ and $f_o$. Comparing (21), on the other hand, with (18) indicates how regularization of the large-scale structural components of $f$ is accomplished by (21) in terms of the small-scale texture component $\phi_q$: by taking the gradient (after smoothing with $\Delta^{-1}$) and projecting onto the suitable set $P_\nabla(C_\alpha^2)$.
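The projector $P_\nabla$ and the relation $\operatorname{div} q = \Delta\phi_q$ can be checked numerically on a small grid. The Python sketch below assumes plain forward differences with no flux across the image border (not the mimetic discretization of [9, 10]), so that $\operatorname{div} = -G^T$ and $\Delta = -G^T G$ for a gradient matrix $G$; it computes $P_\nabla q = G\phi_q$ by least squares. The function names are hypothetical.

```python
import numpy as np


def gradient_matrix(ny, nx):
    """Forward-difference gradient G for an ny-by-nx image (row-major order),
    stacking x- and y-differences; no flux across the border, so div = -G.T."""
    def d(n):
        return np.diff(np.eye(n), axis=0)   # shape (n-1, n)
    gx = np.kron(np.eye(ny), d(nx))         # differences along each row
    gy = np.kron(d(ny), np.eye(nx))         # differences along each column
    return np.vstack([gx, gy])


def project_onto_gradients(q, G):
    """Orthogonal projection onto V_grad = range(G): P q = G phi_q, where the
    potential phi_q solves the normal equations G.T G phi = G.T q, i.e.
    Laplacian(phi_q) = div(q); lstsq returns the zero-mean potential."""
    phi, *_ = np.linalg.lstsq(G, q, rcond=None)
    return G @ phi, phi
```

For any field `q`, the residual `q - P q` satisfies `G.T @ (q - P q) = 0`, i.e. it is divergence-free and lies in $V_\nabla^\perp$, which is the complement used in the constraint of (19).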
4
Optimization
In this section we specify algorithms for computing a global minimum of (13a) and (13b), respectively. We apply an alternating version of the split Bregman algorithm [7]. Note that the split Bregman algorithm coincides with the augmented Lagrangian method applied to the primal problem [14] and that its alternating version is just a Douglas-Rachford splitting for the dual problem [17]. The convergence properties of this technique are well known.
4.1
Algorithm Minimizing (13a)
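The split Bregman algorithm for (14) reads

$$(u^{(k+1)}, v^{(k+1)}) = \operatorname*{argmin}_{u,v} \Big\{ \frac12 |f-u|_2^2 + \alpha\big(\mathrm{TV}(v_1) + \mathrm{TV}(v_2)\big) + \frac{1}{2\tau}\|\nabla u - v\|_2^2 + \langle b^{(k)}, \nabla u - v \rangle_\Omega \Big\},$$
$$b^{(k+1)} = b^{(k)} + \frac1\tau \big(\nabla u^{(k+1)} - v^{(k+1)}\big).$$

Alternating the minimization with respect to $u$ and $v$, we obtain

$$v^{(k+1)} = \operatorname*{argmin}_v \Big\{ \alpha\big(\mathrm{TV}(v_1) + \mathrm{TV}(v_2)\big) + \frac{1}{2\tau}\|\nabla u^{(k)} + \tau b^{(k)} - v\|_2^2 \Big\},$$
$$u^{(k+1)} = \operatorname*{argmin}_u \Big\{ \frac12 |f-u|_2^2 + \frac{1}{2\tau}\|\nabla u + \tau b^{(k)} - v^{(k+1)}\|_2^2 \Big\}.$$

Then $v^{(k+1)}$ follows as in the ROF approach by

$$v^{(k+1)} = \nabla u^{(k)} + \tau b^{(k)} - P_{C_{\alpha\tau}^2}\big(\nabla u^{(k)} + \tau b^{(k)}\big),$$

and $u^{(k+1)}$ can be computed by setting the gradient to zero:

$$u^{(k+1)} = \Big(I - \frac1\tau \Delta\Big)^{-1} \Big( f - \operatorname{div}\Big(\frac1\tau v^{(k+1)} - b^{(k)}\Big) \Big).$$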
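To make the alternating iteration concrete, here is a minimal 1D sketch in Python. Assumptions, none taken from the paper: forward differences $D$ with $\operatorname{div} = -D^T$ and $\Delta = -D^T D$, so the $u$-update becomes the linear solve $(I + D^TD/\tau)\,u = f + D^T(v/\tau - b)$; in 1D, $\mathrm{TV}(v) = |D_e v|_1$; and $P_{C_{\alpha\tau}}$ is approximated by a fixed number of projected-gradient steps on the ROF dual. All function names are hypothetical.

```python
import numpy as np


def forward_diff(n):
    """Forward differences of shape (n-1, n); div = -D.T, Laplacian = -D.T @ D."""
    return np.diff(np.eye(n), axis=0)


def rof_shrink(w, lam, n_iter=300):
    """Approximate w - P_{C_lam}(w), the minimizer of lam*TV(v) + 0.5*|w - v|^2,
    by projected gradient on the ROF dual (step 1/4 <= 1/||D||^2)."""
    D = forward_diff(w.size)
    p = np.zeros(w.size - 1)
    for _ in range(n_iter):
        p = np.clip(p - 0.25 * (D @ (w + D.T @ p)), -lam, lam)
    return w + D.T @ p          # div p = -D.T p, so this equals w - div p


def split_bregman_13a_1d(f, alpha, tau=1.0, n_iter=300):
    """Alternating split Bregman for 0.5*|f - u|^2 + alpha*TV(u') in 1D."""
    n = f.size
    D = forward_diff(n)
    A = np.eye(n) + D.T @ D / tau    # I - (1/tau) * Laplacian
    u, b = f.copy(), np.zeros(n - 1)
    for _ in range(n_iter):
        w = D @ u + tau * b                               # w^(k+1)
        v = rof_shrink(w, alpha * tau)                    # v^(k+1)
        u = np.linalg.solve(A, f + D.T @ (v / tau - b))   # f - div(v/tau - b)
        b = b + (D @ u - v) / tau                         # b^(k+1)
    return u, v


def energy_13a_1d(u, f, alpha):
    """0.5*|f - u|^2 + alpha * |second differences of u|_1."""
    return 0.5 * np.sum((f - u) ** 2) + alpha * np.abs(np.diff(u, 2)).sum()
```

An affine input is a fixed point of this iteration (its gradient is constant, so the shrinkage leaves it untouched), which is exactly the property that distinguishes the affine regularizer from ROF.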
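Algorithm. Initialization: $b^{(0)} = 0$ and $u^{(0)} = f$. For $k = 0, 1, \dots$ iterate until a convergence criterion is reached:
$w^{(k+1)} := \nabla u^{(k)} + \tau b^{(k)}$
$v^{(k+1)} := w^{(k+1)} - P_{C_{\alpha\tau}^2}(w^{(k+1)})$
$u^{(k+1)} := \big(I - \frac{1}{\tau}\Delta\big)^{-1}\big(f - \operatorname{div}(\frac{1}{\tau} v^{(k+1)} - b^{(k)})\big)$
$b^{(k+1)} := b^{(k)} + \frac{1}{\tau}\big(\nabla u^{(k+1)} - v^{(k+1)}\big)$
4.2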
Algorithm Minimizing (13b)
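Based on the derivation in Section 3.2, we consider

$$\inf_{u_o, v} \Big\{ \frac12 \|\nabla f_o - \nabla u_o\|_2^2 + \alpha\big(\mathrm{TV}(v_1) + \mathrm{TV}(v_2)\big) \Big\} \quad \text{subject to} \quad v = \nabla u_o,$$

and have to iterate

$$(u^{(k+1)}, v^{(k+1)}) = \operatorname*{argmin}_{u,v} \Big\{ \frac12 \|\nabla f_o - \nabla u\|_2^2 + \alpha\big(\mathrm{TV}(v_1) + \mathrm{TV}(v_2)\big) + \frac{1}{2\tau}\|\nabla u - v\|_2^2 + \langle b^{(k)}, \nabla u - v \rangle_\Omega \Big\},$$
$$b^{(k+1)} = b^{(k)} + \frac1\tau \big(\nabla u^{(k+1)} - v^{(k+1)}\big).$$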
Alternating the first minimization process, we obtain the following algorithm.
Algorithm. Initialization: $b^{(0)} = 0$ and $u^{(0)} = f$. For $k = 0, 1, \dots$ iterate until a convergence criterion is reached:
$w^{(k+1)} := \nabla u^{(k)} + \tau b^{(k)}$
$v^{(k+1)} := w^{(k+1)} - P_{C_{\alpha\tau}^2}(w^{(k+1)})$
$u^{(k+1)} := \frac{\tau}{1+\tau}\,\Delta^{-1}\operatorname{div}\big(\nabla f_o + \frac{1}{\tau} v^{(k+1)} - b^{(k)}\big)$
$b^{(k+1)} := b^{(k)} + \frac{1}{\tau}\big(\nabla u^{(k+1)} - v^{(k+1)}\big)$
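The iteration for (13b) differs from the one for (13a) only in the $u$-update. The 1D Python sketch below uses the same assumptions as a forward-difference illustration of (13a) (forward differences, approximate ROF shrinkage) and realizes $\Delta^{-1}\operatorname{div}$ as the pseudoinverse of $D$, which returns the zero-mean solution. Working directly with $f$ instead of $f_o$, and fixing the undetermined constant by the mean of $f$, are our own simplifications, not the paper's affine preprocessing. The function names are hypothetical.

```python
import numpy as np


def forward_diff(n):
    """Forward differences of shape (n-1, n); div = -D.T, Laplacian = -D.T @ D."""
    return np.diff(np.eye(n), axis=0)


def rof_shrink(w, lam, n_iter=300):
    """Approximate w - P_{C_lam}(w) by projected gradient on the ROF dual."""
    D = forward_diff(w.size)
    p = np.zeros(w.size - 1)
    for _ in range(n_iter):
        p = np.clip(p - 0.25 * (D @ (w + D.T @ p)), -lam, lam)
    return w + D.T @ p


def split_bregman_13b_1d(f, alpha, tau=1.0, n_iter=300):
    """Alternating split Bregman for 0.5*|D(f - u)|^2 + alpha*TV(u') in 1D."""
    n = f.size
    D = forward_diff(n)
    inv_lap_div = np.linalg.pinv(D)   # (D.T D)^+ D.T, i.e. Laplacian^{-1} div
    u, b = f.copy(), np.zeros(n - 1)
    for _ in range(n_iter):
        w = D @ u + tau * b
        v = rof_shrink(w, alpha * tau)
        u = tau / (1.0 + tau) * (inv_lap_div @ (D @ f + v / tau - b))
        u += f.mean()     # assumption: fix the free constant by the mean of f
        b = b + (D @ u - v) / tau
    return u, v


def energy_13b_1d(u, f, alpha):
    """0.5*|D(f - u)|^2 + alpha * |second differences of u|_1."""
    return 0.5 * np.sum(np.diff(f - u) ** 2) + alpha * np.abs(np.diff(u, 2)).sum()
```

Because the data term only sees differences of $f - u$, noise enters through $D f$; this is the noise amplification by partial derivatives that the experiments attribute to the $H^1$-based data term.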
5
Numerical Experiments
In this section we illustrate the properties of our approach with a few numerical experiments. The mimetic finite difference method [9, 10] is used for discretizing the relevant scalar and vector fields, and a detailed implementation of the nonlinear functionals is given in [21]. With this numerical scheme, the relevant boundary conditions are preserved and turn out to be compatible with the corresponding integral identities.

Signals. Figure 2 shows that our approach (13b) effectively removes noise without the staircasing effect, in contrast to the ROF model. We also point out that boundaries are treated without introducing artifacts.
Fig. 2. Ground truth and noisy input data are shown in the first two graphs, respectively. Standard TV regularization (ROF model) leads to the well-known staircasing effect (see 3rd picture). Piecewise affine TV regularization effectively removes noise and recovers the piecewise affine signal structure (see 4th picture).
Variational approach (13a) versus (13b). Figure 3 compares the minimizers of the two variational approaches (13a) and (13b) for an arbitrary image section. The last two pictures of Figure 3 depict 3D plots of the minimizers subtracted from the original image section. The plot in the 5th graph, corresponding to approach (13b), clearly indicates an approximation "error" that is not noticeable in the plot in the 4th graph, corresponding to (13a). This result confirms the above discussion of the formal differences between (21) and (22), and shows that the $|\cdot|_{H^1}^2$-based data term is more sensitive to strong noise, since partial derivatives amplify noise.
Fig. 3. From left to right: original image section, minimizer of (13a), minimizer of (13b); 3D plots of the minimizers subtracted from the original data illustrate a major difference between the variational approaches (13a) and (13b). While the 4th plot shows almost pure noise, the rightmost plot indicates an estimation error due to the $|\cdot|_{H^1}^2$ data term, which is sensitive to large noise levels.
Denoising of vector fields. Figure 4 compares standard TV regularization (ROF model) with piecewise affine TV regularization for the denoising of vector fields. The input data simulate estimates obtained for a moving camera in a scene with moving objects. This scenario is roughly represented by a piecewise planar layout of the scene. The numerical results again confirm that our approach returns useful estimates of both the denoised vector fields and their discontinuities, while the ROF model only returns discontinuities but no useful vector field estimates.
Fig. 4. Top: color-coded motion field corresponding to a moving camera and static as well as moving objects represented by sections of planes; ground truth (1st fig.), input data (2nd fig.), the ROF-based result (3rd fig.) and the result based on affine regularization (13a) (4th fig.). Last two rows: components of $\nabla u_1$ and $\nabla u_2$ for the ROF model (2nd row) and for piecewise affine regularization (3rd row). The result on the right illustrates that piecewise affine regularization causes no staircasing effect, thus enabling both discontinuity detection and motion estimation, whereas the latter is not feasible for such scenarios with the standard ROF model.
6
Conclusion
We presented a novel convex variational approach to the denoising and decomposition of signals, images and vector fields. Based on a suitable orthogonal decomposition of the underlying vector space, a TV measure comprising second-order derivatives was introduced that enables denoising of noisy input data while preserving piecewise affine signal structure, using standard algorithms of convex programming. The latter are computationally simple due to a problem decomposition employing the augmented Lagrangian and primal and dual variables. By deriving dual variational formulations akin to the ROF model, we worked out the differences between first- and second-order regularization and between the two alternative data terms. Numerical experiments confirm these findings.
References
1. Chambolle, A., Lions, P.L.: Image recovery via total variation minimization and related problems. Numer. Math. 76(2), 167–188 (1997)
2. Chan, T., Esedoglu, S., Park, F.E.: Image decomposition combining staircase reduction and texture extraction. J. Visual Communication and Image Representation 18(6), 468–486 (2007)
3. Chan, T., Marquina, A., Mulet, P.: Higher-order total variation-based image restoration. SIAM J. Sci. Comput. 22(2), 503–516 (2000)
4. Danielsson, P.E., Lin, Q.: Efficient detection of second-degree variations in 2D and 3D images. J. Vis. Comm. Image Repr. 12, 255–305 (2001)
5. Didas, S., Setzer, S., Steidl, G.: Combined ℓ2 data and gradient fitting in conjunction with ℓ1 regularization. Advances in Computational Mathematics 30(1), 79–99 (2009)
6. Girault, V., Raviart, P.-A.: Finite Element Methods for Navier-Stokes Equations. Springer, Heidelberg (1986)
7. Goldstein, T., Osher, S.: The Split Bregman method for L1 regularized problems. UCLA CAM Report (2008)
8. Hintermüller, W., Kunisch, K.: Total bounded variation regularization as a bilaterally constrained optimization problem. SIAM J. Appl. Math. 64(4), 1311–1333 (2004)
9. Hyman, J.M., Shashkov, M.: Natural discretizations for the divergence, gradient, and curl on logically rectangular grids. Comput. Math. Appl. 33(4), 81–104 (1997)
10. Hyman, J.M., Shashkov, M.: Adjoint operators for the natural discretizations of the divergence, gradient and curl on logically rectangular grids. Appl. Numer. Math. 25(4), 413–442 (1997)
11. Lysaker, M., Lundervold, A., Tai, X.C.: Noise removal using fourth-order partial differential equation with applications to medical magnetic resonance images in space and time. IEEE Trans. Image Processing 12(12), 1579–1590 (2003)
12. Lysaker, M., Tai, X.C.: Iterative image restoration combining total variation minimization and a second-order functional. International Journal of Computer Vision 66(1), 5–18 (2006)
13. Rahman, T., Tai, X.C., Osher, S.J.: A TV-Stokes denoising algorithm. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 473–483. Springer, Heidelberg (2007)
14. Rockafellar, R.T.: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2), 97–116 (1976)
15. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
16. Scherzer, O.: Denoising with higher order derivatives of bounded variation and an application to parameter estimation. Computing 60, 1–27 (1998)
17. Setzer, S.: Split Bregman algorithm, Douglas-Rachford splitting and frame shrinkage. In: Lie, K.A., Lysaker, M., Morken, K., Tai, X.C. (eds.) Scale Space and Variational Methods. LNCS. Springer, Heidelberg (2009)
18. Trobin, W., Pock, T., Cremers, D., Bischof, H.: An unbiased second-order prior for high-accuracy motion estimation. In: Rigoll, G. (ed.) DAGM 2008. LNCS, vol. 5096, pp. 296–405. Springer, Heidelberg (2008)
19. Vogel, C.R.: Computational Methods for Inverse Problems. SIAM, Philadelphia (2002)
20. You, Y.L., Kaveh, M.: Fourth-order partial differential equations for noise removal. IEEE Trans. Image Processing 9(10), 1723–1730 (2000)
21. Yuan, J., Schnörr, C., Steidl, G.: Simultaneous optical flow estimation and decomposition. SIAM J. Scientific Computing 29(6), 2283–2304 (2007)
22. Yuan, J., Schnörr, C., Memin, E.: Discrete orthogonal decomposition and variational fluid flow estimation. J. Math. Imaging and Vision 28(1), 67–80 (2007)
23. Yuan, J., Schnörr, C., Steidl, G.: Convex Hodge decomposition and regularization of image flows. J. Math. Imaging and Vision 33(2), 169–177 (2009)
Image Denoising by Harmonic Mean Curvature Flow
Mourad Zéraï
Laboratory for Mathematical and Numerical Modeling in Engineering Science, National Engineering School at Tunis, ENIT-LAMSIN, B.P. 37, 1002 Tunis Belvédère, Tunisia
Abstract. We propose a noise-removal method for vector-valued images based on the negative gradient flow (the biharmonic map heat flow) of the intrinsic bi-energy on Riemannian manifolds of non-positive curvature. This method represents a natural generalization of both harmonic maps and minimal immersions. It is derived by finding the critical points of the variational problem associated with the integral of the squared norm of the tension field (biharmonic maps) or of the mean curvature vector field (biminimal immersions). In local coordinates, this method yields a fourth-order nonlinear system of PDEs, which we solve numerically by an explicit finite difference method. Experiments on real color images endowed with the Helmholtz and Stiles metrics show that the proposed method is effective, accurate and highly robust.
1
Introduction
Let $(D, g)$ be a flat 2D image domain endowed with the metric $g$ and mapped into a coordinate manifold $(V, h)$, which can be, for instance, a color RGB space endowed with the color metric $h$. Consider the energy

$$E_2(u) = \frac12 \int_D |\tau(u)|^2 \, d\mu_g, \tag{1}$$

where $\mu_g$ is the area measure on $D$ endowed with the metric $g$ and $\tau(u) = \operatorname{tr}_g \nabla du$ is the tension vector field, which vanishes at critical points of the Dirichlet energy (i.e. harmonic maps),

$$E(u) = \frac12 \int_D |du|^2 \, d\mu_g. \tag{2}$$

In local coordinates, it takes the form

$$\tau^\alpha(u) = g^{ij}\Big( \frac{\partial^2 u^\alpha}{\partial x^i \partial x^j} - {}^D\Gamma^k_{ij} \frac{\partial u^\alpha}{\partial x^k} + {}^V\Gamma^\alpha_{\beta\gamma} \frac{\partial u^\beta}{\partial x^i} \frac{\partial u^\gamma}{\partial x^j} \Big), \tag{3}$$

where ${}^D\Gamma^k_{ij}$ and ${}^V\Gamma^\alpha_{\beta\gamma}$ are the Christoffel symbols of the Levi-Civita connections on $(D, g)$ and $(V, h)$.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 565–575, 2009. © Springer-Verlag Berlin Heidelberg 2009
566
M. Zéraï
Critical points of $E_2(u)$ are called biharmonic maps. The Euler-Lagrange operator attached to the bi-energy (1), called the bitension field and derived by Jiang [16], is

$$\tau_2(u) = -\Delta_g \tau(u) - \operatorname{tr} R^V(du, \tau(u))\,du. \tag{4}$$

The corresponding gradient flow is given by the geometric evolution problem

$$\partial_t u = -\tau_2(u), \tag{5}$$
where $R^V$ is the curvature tensor of $V$. Jiang also proved that, in the case of a target manifold $V$ with non-positive curvature, every biharmonic map is harmonic; this is the case for almost all coordinate manifolds (excluding the directional ones) that we are concerned with in image processing. In the same way, if we denote by $\mathrm{Imm}(D, V)$ the space of Riemannian immersions in $(V, h)$, then a Riemannian immersion $u: (D, g) \to (V, h)$ is called minimal if it is a critical point of the volume functional

$$V: \mathrm{Imm}(D, V) \to \mathbb{R}, \qquad V(u) = \frac12 \int_D d\mu_{u^*h},$$

where $u^*h$ is the pull-back metric and $\mu_{u^*h}$ is the induced area measure on $D$. The corresponding Euler-Lagrange equation is $H = 0$, where $H$ is the mean curvature vector field. We recall a fact, established by Eells and Sampson [7], that will be of great importance in the sequel: if $u$ is an immersion, then its mean curvature is, up to a constant, the trace of the second fundamental form $\operatorname{tr}_g \nabla du$. As suggested by Eells and Sampson in their seminal paper [7], a natural generalization of harmonic maps is given by the critical points of the functional obtained by integrating the square of the norm of the tension field, i.e.

$$E_2(u) = \frac12 \int_D |\tau(u)|^2 \, d\mu_g.$$

Critical points of the functional obtained by integrating the square of the norm of the mean curvature vector field, known in the literature as the (generalized) Willmore functional, represent a possible generalization of minimal immersions. More precisely, biminimal immersions (or Willmore immersions) are critical points of the Willmore functional (see [21] for a short survey on these topics):

$$W(u) = \int_D (|H|^2 + K) \, d\mu_{u^*h}, \tag{6}$$
where $K$ is the sectional curvature of $(V, h)$ restricted to the image of $D$. Historically, this functional appeared in the context of an embedding of a surface $\Sigma$ in the three-dimensional Euclidean space $\mathbb{R}^3$, and consequently with a vanishing sectional curvature $K$. As noted by Willmore in [34], it was Weiner (1978) who added the curvature term $K$ to the integrand when he considered immersions of orientable surfaces into a Riemannian manifold of constant sectional curvature [32].
The Willmore functional already appears in earlier work of Germain [11]. It was then considered in the early twentieth century in various works by Thomsen [27], and subsequently by Blaschke [2]. It was reintroduced, and studied more systematically within the framework of the conformal geometry of surfaces, by Willmore in 1965 [33]. The Willmore functional also plays an important role in various areas of science. In molecular biology, it is known as the Helfrich model [13], where it appears as a surface energy for lipid bilayers. In solid mechanics, the Willmore functional arises as the limit energy in thin-plate theory (see [9]). In general relativity, this functional appears as the main term in the expression of the Hawking quasilocal mass (see [12] and [15]). In image processing, the Willmore functional has been used since 2004 by Droske and Rumpf in a level-set formalism (see [6]) for the restoration of damaged regions of a surface. Recently, Clarenz et al. [5] covered a hole and reconstructed a surface by minimizing a Willmore energy functional with a finite element implementation, leading to smooth surface patches with guaranteed continuity properties. In this paper, we use the generalized Willmore functional (6) in the context of manifold-valued image processing. We are interested in a noise-removal method for multi-channel images, taking color images as a typical representative of this class. More precisely, we use flat color metrics, i.e., metrics with vanishing curvature tensor. Namely, we use the Helmholtz and Stiles (flat) metrics; consequently the curvature term disappears from (6), which simplifies the expression of the derived flow, a nonlinear parabolic PDE system

$$\partial_t u = \Delta_g H, \tag{7}$$
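where $g = u^*h$ and $\Delta_g$ stands for the rough Laplacian, i.e.

$$\Delta_g H^\alpha = \frac{1}{\sqrt{\det g}} \frac{\partial}{\partial x^i}\Big( \sqrt{\det g}\, g^{ij} \frac{\partial H^\alpha}{\partial x^j} \Big) + {}^V\Gamma^\alpha_{\beta\gamma} \frac{\partial H^\beta}{\partial x^i} \frac{\partial H^\gamma}{\partial x^j} g^{ij}. \tag{8}$$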
In this flat setting, the Euler-Lagrange equation associated with (6) reduces to $\Delta_g H = 0$. Following the terminology used by Chen in [4], we call the flow (7) harmonic mean curvature flow (HMCF); it extends the notion of harmonic mean curvature from the Euclidean setting to the Riemannian one. We refer to [1] for more details about this topic in the Euclidean setting. We will tackle the problem with non-flat color metrics, such as the Schrödinger or Vos-Walraven metrics, in future work. We note that Sochen and Zeevi [25] have already used the Vos-Walraven line element in a Riemannian setting for processing color images with the Beltrami flow, which yields a second-order nonlinear parabolic PDE system. Finally, one can remark, formally, the apparent similarity between the two HMCFs in the Euclidean and (flat) Riemannian settings, since it is well known that every flat $n$-dimensional Riemannian manifold is locally isometric to $\mathbb{R}^n$ (see, for instance, [10], p. 109).
1.1
Related Works
In local coordinates, the gradient descent for the minimization problem related to (6) yields a fourth-order nonlinear system of PDEs, and thus our method belongs to the family of fourth-order parabolic equations for image denoising. This family of methods has gained importance in recent years. Indeed, many nonlinear PDEs have been proposed to deal with the trade-off between noise removal and edge preservation, and among them fourth-order parabolic PDEs have drawn great interest. In general, the forms of fourth-order PDEs are analogous to the second-order ones. For example, the fourth-order equation proposed by You and Kaveh [35],

$$u_t = -\Delta\big(g(\Delta u)\,\Delta u\big), \tag{9}$$
or the equation proposed by Wei [30] ut = −∇(g(|∇u|)∇u),
(10)
are Perona-Malik analogue; the equation proposed by Tumblin and Turk’s in [28] ut = ∇(g(Dij u)∇u),
(11)
where g is a function of the second derivatives of the image intensity function u, is a fourth order possible analogue to the anisotropic diffusion equation of Weickert [31], and finally the equation proposed by Lysaker et al [19] u ut = , (12) |u| is similar to Total Variation model [23].
2
Color-Image as Typical Example of Vector-Valued Image Processing
Since the beginning of quantized color vision theories in the 19th century, two approaches have appeared. On one side is the Young-Helmholtz trichromatic approach, which is physically oriented and compatible with the science of colorimetry. On the other side is the opponent approach, which is mainly based on color sensation. In this paper, we are mainly interested in the geometrical trichromatic approach of Young-Helmholtz. This approach has many good computational characteristics but also many physiological limitations (for a nice discussion of this topic see, for instance, [3]). In this vein, many approaches were proposed through different line-element theories. The notion of a line element is nothing but the metric associated with the color manifold. Among these line-element theories we mention Helmholtz [14], Schrödinger [24], Stiles [26] and Vos-Walraven [29]. In the geometrical trichromatic approach, a color image is considered as 3 images, Red, Green and Blue (or one of their many transformations), that are
composed into one. These three channels represent a limited domain in the three-dimensional Euclidean space $\mathbb{R}^3$, which we endow with a metric derived from the expression of the considered line element. With this construction we can consider the color space as a Riemannian (sub)manifold. The different line elements (or metrics) proposed in the literature are derived from two main considerations:
– The first consideration is what Ron Kimmel has called inductive [18]. In this case, the line elements are established by simple assumptions on the visual response mechanisms. All models of this category assume that the color space can be simplified and represented as a Riemannian space of non-positive curvature. Some of the proposed metrics have the effect of flattening the color space, like Helmholtz's or Stiles' metric. The others have the effect of warping (negatively) the color space, like the Schrödinger or Vos-Walraven metrics. Roughly speaking, we can see a manifold of negative curvature as a generalization of Euclidean space (which is flat) in the sense that if two geodesics start from the same point in different directions, they never cross again (which is not true in manifolds with positive curvature, like the sphere). This characteristic ensures some nice properties, like uniqueness results in minimization problems [17].
– The second class of line elements is derived from empirical considerations. In this category, the metric coefficients are determined to fit empirical data. Among them, some describe a Euclidean space, like CIELAB (CIE 1976 (L*a*b*)) [18]; others, like MacAdam's [20], are based on an effective arclength.
2.1
Vector-Valued Image as Isometric Immersions
Let $(D, g)$ be a flat 2D image domain embedded in a coordinate manifold $(V, h)$, which can be, for instance, a color RGB space endowed with the color metric $h$, and let $(\tilde V, \tilde h) = (\mathbb{R}^2 \oplus V, \mathrm{can} \oplus h)$, where
– can is the canonical metric of $\mathbb{R}^2$,
– $V$ is an $n$-dimensional manifold equipped with the metric $h$ and modeling the coordinate space, i.e. the (vector) channels of an image $u$ (the three-dimensional RGB space for a color image, for instance),
– $\tilde V$ is the direct sum of $\mathbb{R}^2$ and $V$,
– and $\tilde h = \mathrm{can} \oplus h$.
A vector-valued image can be described mathematically as an isometric immersion $(x^1, x^2) \mapsto u = (x^1, x^2; v^1(x^1, x^2), \dots, v^n(x^1, x^2))$ of a two-dimensional domain $D$ in the trivial fiber bundle $\mathbb{R}^2 \times (V, h)$, which is a $(2+n)$-dimensional manifold. The image manifold and its metric $(D, g)$ are called the space of parameters in the dynamical systems community; the target manifold and its metric $(\tilde V, \tilde h)$ are called the space of coordinates. The metric $\tilde h$ of $\tilde V$ is then given by

$$d\tilde s^2 = ds^2_{\text{spatial}} + \beta^2 ds^2_{\text{vector}}, \tag{13}$$
where $\beta$ is the relative scale between the spatial coordinates and the intensity components, which we set equal to one for the sake of simplicity. We can rewrite the metric defined by (13) as the quadratic form

$$d\tilde s^2 = (dx^1)^2 + (dx^2)^2 + (dv)^T h \,(dv),$$

where $v = (v^i)$. The corresponding metric tensor is

$$\tilde h = \begin{pmatrix} I_2 & 0_{2,n} \\ 0_{n,2} & h \end{pmatrix},$$

where $h$ is the metric tensor of $V$. Since the image is an isometric immersion, we have $g = u^*\tilde h$. Therefore

$$g_{\alpha\beta} = \delta_{\alpha\beta} + h_{ij}\, \partial_\alpha v^i\, \partial_\beta v^j, \qquad \alpha, \beta = 1, 2, \quad i, j = 1, \dots, n, \tag{14}$$

where $n = 3$ if we deal with color images.
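Formula (14) is straightforward to evaluate on a discrete image. The Python sketch below assumes central differences with unit pixel spacing (via `numpy.gradient`), $\beta = 1$, and a diagonal channel metric such as Helmholtz's metric (16) introduced in Section 3.1; it also returns the inverse $(g^{ij})$, computed with `numpy.linalg.inv`. It is an illustration, not the author's implementation, and the function names are hypothetical.

```python
import numpy as np


def induced_metric(v, h_diag):
    """Pull-back metric (14) of an image v with shape (H, W, n) and beta = 1:
    g_ab = delta_ab + sum_i h_ii(v) * d_a v^i * d_b v^i, for a diagonal metric h.

    Returns g and its inverse (g^{ab}), both with shape (H, W, 2, 2).
    Index a = 0 is the column direction x^1, a = 1 the row direction x^2.
    """
    d_row, d_col = np.gradient(v, axis=(0, 1))   # central differences
    h = h_diag(v)                               # diagonal entries, shape (H, W, n)
    g = np.empty(v.shape[:2] + (2, 2))
    g[..., 0, 0] = 1.0 + np.sum(h * d_col * d_col, axis=-1)
    g[..., 1, 1] = 1.0 + np.sum(h * d_row * d_row, axis=-1)
    g[..., 0, 1] = g[..., 1, 0] = np.sum(h * d_col * d_row, axis=-1)
    return g, np.linalg.inv(g)


def helmholtz_diag(v):
    """Diagonal of Helmholtz's metric (16): h_ii = 1 / x_i^2, for x_i > 0."""
    return 1.0 / v**2
```

In flat regions $g$ is the identity, while across a strong color edge $g_{11}$ or $g_{22}$ grows and the corresponding entry of $(g^{ij})$ shrinks; this is how the inverse metric can act as a matrix-valued, anisotropic edge stopping function.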
3
Main Examples of Flat Color Metrics
3.1
Helmholtz's Color Metric
Hermann von Helmholtz (1821-1894) was the first to attempt a mathematical formulation of the distance between colors through the concept of a line element. He defined the following line element:

$$ds^2 = \Big(\frac{dR}{R}\Big)^2 + \Big(\frac{dG}{G}\Big)^2 + \Big(\frac{dB}{B}\Big)^2, \tag{15}$$

where $R$, $G$ and $B$ are the three color channels: red, green and blue. In local coordinates, this can be expressed as a positive definite symmetric matrix:

$$(h_{ij})_{i,j=1,2,3} = \begin{pmatrix} \frac{1}{x_1^2} & 0 & 0 \\ 0 & \frac{1}{x_2^2} & 0 \\ 0 & 0 & \frac{1}{x_3^2} \end{pmatrix}, \tag{16}$$

where we use the coordinate notation $x_1 = R$, $x_2 = G$ and $x_3 = B$. The color space is defined as a domain $D$ in the positive orthant $\mathbb{R}^3_+$ defined by

$$\mathbb{R}^3_+ = \{ x \in \mathbb{R}^3 \mid x_i > 0,\ i = 1, 2, 3 \}. \tag{17}$$

Having the expression of the metric, we can now compute the Christoffel symbols using the formula (with summation over $l$)

$$\Gamma^i_{jk} = \frac12 h^{il} \Big( \frac{\partial h_{lk}}{\partial x_j} + \frac{\partial h_{lj}}{\partial x_k} - \frac{\partial h_{jk}}{\partial x_l} \Big), \tag{18}$$

and hence the non-vanishing Christoffel symbols are

$$\Gamma^1_{11} = -\frac1R, \qquad \Gamma^2_{22} = -\frac1G, \qquad \Gamma^3_{33} = -\frac1B.$$

A simple computation shows that the color manifold endowed with Helmholtz's metric is flat.
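Formula (18) can be checked numerically. The Python sketch below approximates the derivatives of the metric by central differences (the step `eps` is our choice) and recovers the Christoffel symbols of Helmholtz's metric (16); it works for any metric supplied as a function returning a 3×3 matrix. The function names are hypothetical.

```python
import numpy as np


def helmholtz_metric(x):
    """Helmholtz's metric tensor (16): diag(1/x1^2, 1/x2^2, 1/x3^2)."""
    return np.diag(1.0 / np.asarray(x, dtype=float) ** 2)


def christoffel(metric, x, eps=1e-5):
    """Gamma[i, j, k] = 1/2 h^{il} (d_j h_lk + d_k h_lj - d_l h_jk), formula (18),
    with the derivatives of the metric taken by central differences."""
    x = np.asarray(x, dtype=float)
    n = x.size
    dh = np.empty((n, n, n))                # dh[l, a, b] = d h_ab / d x_l
    for l in range(n):
        e = np.zeros(n)
        e[l] = eps
        dh[l] = (metric(x + e) - metric(x - e)) / (2.0 * eps)
    # bracket[l, j, k] = d_j h_lk + d_k h_lj - d_l h_jk
    bracket = np.einsum('jlk->ljk', dh) + np.einsum('klj->ljk', dh) - dh
    return 0.5 * np.einsum('il,ljk->ijk', np.linalg.inv(metric(x)), bracket)
```

At $x = (0.4, 0.6, 0.8)$ the result agrees with $\Gamma^i_{ii} = -1/x_i$, all other symbols vanishing, up to the finite-difference error.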
3.2
Stiles' Color Metric
Walter W. Stiles modified Helmholtz's proposal in order to better account for observations of threshold values (see [26], p. 660). He proposed the following color metric:

$$(ds)^2 = \Big(\frac{\zeta(R)}{\rho}\, dR\Big)^2 + \Big(\frac{\zeta(G)}{\gamma}\, dG\Big)^2 + \Big(\frac{\zeta(B)}{\beta}\, dB\Big)^2,$$

where

$$\zeta(R) = \frac{9}{1+9R}, \qquad \zeta(G) = \frac{9}{1+9G}, \qquad \zeta(B) = \frac{9}{1+9B}.$$

The functions $\zeta(R)$, $\zeta(G)$ and $\zeta(B)$ are determined experimentally. The constants $\rho$, $\gamma$ and $\beta$ are proportional to the limiting Weber fractions of the three cone responses at high luminances, and Stiles obtained the following values:

$$\rho = 1.28, \qquad \gamma = 1.65, \qquad \beta = 7.25.$$
At high luminances, Stiles' metric reduces to

$$(ds)^2 = \Big(\frac{dR}{\rho R}\Big)^2 + \Big(\frac{dG}{\gamma G}\Big)^2 + \Big(\frac{dB}{\beta B}\Big)^2,$$

and in this form its relationship with Helmholtz's metric is obvious. With the same notation as in the previous section and using equation (18), we have

$$\Gamma^1_{11} = -\frac{9}{1+9R}, \qquad \Gamma^2_{22} = -\frac{9}{1+9G}, \qquad \Gamma^3_{33} = -\frac{9}{1+9B}.$$

Another simple computation shows that the color manifold endowed with Stiles' metric is flat.
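Flatness of Stiles' metric can also be seen directly: each $h_{ii}$ depends only on $x_i$, so the substitution $y_i = \ln(1 + 9x_i)/c_i$, with $(c_1, c_2, c_3) = (\rho, \gamma, \beta)$, gives $dy_i = \zeta(x_i)\,dx_i/c_i$, and the metric becomes Euclidean in $y$; geodesics are straight lines in $y$. The Python sketch below (our illustration, hypothetical names) checks this by integrating the line element along two paths between the same pair of colors.

```python
import numpy as np

C = np.array([1.28, 1.65, 7.25])   # Stiles' constants (rho, gamma, beta)


def stiles_sqrt_h(x):
    """Square roots of the diagonal metric entries, zeta(x_i) / c_i."""
    return 9.0 / (1.0 + 9.0 * x) / C


def to_flat(x):
    """Coordinates y_i = ln(1 + 9 x_i) / c_i, in which the metric is Euclidean."""
    return np.log1p(9.0 * x) / C


def from_flat(y):
    """Inverse of to_flat."""
    return np.expm1(C * y) / 9.0


def path_length(path):
    """Riemannian length of a polyline (rows = colors) under Stiles' metric,
    using the metric at each segment midpoint."""
    mid = 0.5 * (path[1:] + path[:-1])
    seg = np.diff(path, axis=0)
    return np.sum(np.linalg.norm(stiles_sqrt_h(mid) * seg, axis=1))
```

The path that is straight in $y$ attains the Euclidean distance $\|y(b) - y(a)\|$, whereas the straight segment in RGB coordinates is strictly longer, as expected for a non-geodesic curve.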
4
Harmonic Mean Curvature Flow
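We consider the flow

$$\partial_t u = \Delta_g H, \tag{19}$$

where $H$, in local coordinates, takes the form (up to a multiplicative constant)

$$H^\alpha(u) = \Delta_g u^\alpha + {}^V\Gamma^\alpha_{\beta\gamma} \frac{\partial u^\beta}{\partial x^i} \frac{\partial u^\gamma}{\partial x^j} g^{ij}, \tag{20}$$

$g = u^*h$ is the pull-back induced metric and $\Delta_g$ is the Laplace-Beltrami operator. Suppose that the color space is Euclidean; then all the Christoffel symbols ${}^V\Gamma^\alpha_{\beta\gamma}$ vanish and (19) becomes

$$\partial_t u = \Delta_g(\Delta_g u), \tag{21}$$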
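where

$$\Delta_g = \frac{1}{\sqrt{\det g}} \frac{\partial}{\partial x^\alpha}\Big( \sqrt{\det g}\, g^{\alpha\beta} \frac{\partial}{\partial x^\beta} \Big),$$

and we recover a You-Kaveh type equation (9) when we consider the intrinsic formulation (21), and a Wei type equation (10) if we consider (21) in local coordinates. The effect of the term ${}^V\Gamma^\alpha_{\beta\gamma} \frac{\partial u^\beta}{\partial x^i} \frac{\partial u^\gamma}{\partial x^j} g^{ij}$ is to constrain the flow to live on the color manifold, and thus to take into account the physiological aspects of the different luminances.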
5
Numerical Issues
The gradient descent for minimizing the functional (6) leads to a system of fourth-order partial differential equations. We have used an explicit finite difference discretization for this PDE system, which requires the evaluation of higher-order derivatives and comes with strong restrictions on the time step. This is not the best method for this problem; a better strategy to overcome these difficulties is, for instance, discretization by the finite element method, as used by Clarenz et al. [5]. Nevertheless, the explicit finite difference scheme we used appears to be robust and effective in numerical experiments.
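The time-step restriction of explicit schemes for fourth-order flows is already visible on the simplest linear model problem, $u_t = -\Delta^2 u$ on a periodic grid with unit spacing. This model is our assumption for illustration and is not the HMCF (19) itself. With the 5-point Laplacian, the eigenvalues of $\Delta^2$ lie in $[0, 64]$, so forward Euler is stable only for $dt \le 1/32$; for grid spacing $h$ the bound scales like $h^4$. The Python sketch below (hypothetical names) contrasts a step size below and above this bound.

```python
import numpy as np


def laplacian(u):
    """5-point Laplacian with unit spacing and periodic boundary conditions."""
    return (np.roll(u, 1, axis=0) + np.roll(u, -1, axis=0)
            + np.roll(u, 1, axis=1) + np.roll(u, -1, axis=1) - 4.0 * u)


def explicit_biharmonic(u0, dt, n_steps):
    """Forward Euler for the linear model problem u_t = -Laplacian(Laplacian(u))."""
    u = u0.copy()
    for _ in range(n_steps):
        u = u - dt * laplacian(laplacian(u))
    return u
```

A checkerboard component is multiplied by $1 - 64\,dt$ in each step: with $dt = 0.03$ it is damped, whereas with $dt = 0.04$ it is amplified by a factor of $1.56$ per step and quickly dominates the solution.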
6
Experiments
To check that our model is effective, we ran tests on different color images. Figure 1 presents the effects of HMCF, with the Helmholtz and Stiles metrics, on a detail of the peppers color image. To assess the efficiency of our method, we must compare it with other fourth-order (and even second-order) methods; this will be done in future work. It is interesting to analyse Figure 2, which presents as grey-level images the intensities of the four entries of the inverse image metric tensor, namely $(g^{ij})$ with the above notation. It clearly shows that $(g^{ij})$ captures the morphological structure of the image and acts like an anisotropic edge stopping
Fig. 1. From left to right: 1- original image (a small detail of peppers), 2- highly degraded image, 3- HMCF with Helmholtz metric, 100 iterations at dt = 0.05, 4- HMCF with Stiles metric, 100 iterations at dt = 0.05
Fig. 2. Grey-level image representation of the four entries of the (symmetric) tensor $g^{ij}$, which is the inverse of the induced metric $g_{ij}$ and acts like an anisotropic edge stopping function
function. This observation indicates, empirically, that our method preserves the contours of an image while smoothing homogeneous regions. Moreover, since the edge stopping function is matrix-valued, and hence anisotropic, a comparison with the fourth-order equation (11) proposed by Tumblin and Turk [28] is natural.
Acknowledgment
I am indebted to Professor Maher Moakher for encouragement, insightful comments and assistance throughout my work.
References
1. Barros, M., Garay, O.J.: On Submanifolds with Harmonic Mean Curvature. Proceedings of the American Mathematical Society 123(8) (August 1995)
2. Blaschke, W.: Vorlesungen über Differentialgeometrie III. Springer, Berlin (1929)
3. Buchsbaum, G., Gottschalk, A.: Trichromacy, opponent colours coding and optimum colour information transmission in the retina. Proc. R. Soc. Lond. B 220, 89–113 (1983)
4. Chen, B.-Y.: Some open problems and conjectures on submanifolds of finite type. Soochow J. Math. 17, 169–188 (1991)
M. Zéraï
5. Clarenz, U., Diewald, U., Dziuk, G., Rumpf, M., Rusu, R.: A finite element method for surface restoration with smooth boundary conditions. Computer Aided Geometric Design 21(5), 427–445 (2004)
6. Droske, M., Rumpf, M.: A level set formulation for Willmore flow. Interfaces and Free Boundaries 6(3), 361–378 (2004)
7. Eells, J., Sampson, J.H.: Harmonic mappings of Riemannian manifolds. Amer. J. Math. 86, 109–160 (1964)
8. Eliasson, H.I.: Introduction to global calculus of variations. In: Global Analysis and its Applications, IAEA, Vienna, vol. II, pp. 113–135 (1974)
9. Friesecke, G., James, R.D., Müller, S.: A theorem on geometric rigidity and the derivation of nonlinear plate theory from three-dimensional elasticity. Commun. Pure Appl. Math. 55(11), 1461–1506 (2002)
10. Gallot, S., Hulin, D., Lafontaine, J.: Riemannian Geometry, 2nd edn. Springer, Heidelberg (1990)
11. Germain, S.: Recherches sur la théorie des surfaces élastiques. Courcier, Paris (1821)
12. Hawking, S.W.: Gravitational radiation in an expanding universe. J. Math. Phys. 9, 598–604 (1968)
13. Helfrich, W.: Elastic properties of lipid bilayers: theory and possible experiments. Z. Naturforsch. C 28, 693–703 (1973)
14. von Helmholtz, H.: Handbuch der Physiologischen Optik. Voss, Hamburg (1896)
15. Huisken, G., Ilmanen, T.: The Riemannian Penrose inequality. Int. Math. Res. Not. 1997(20), 1045–1058 (1997)
16. Jiang, G.Y.: 2-harmonic maps and their first and second variational formulas. Chin. Annals Math. A 7, 389–402 (1986)
17. Jost, J.: Riemannian Geometry and Geometric Analysis, 2nd edn. Springer, Heidelberg (1998)
18. Kimmel, R.: A natural norm for color processing. In: Chin, R., Pong, T.-C. (eds.) ACCV 1998. LNCS, vol. 1351, pp. 88–95. Springer, Heidelberg (1997)
19. Lysaker, M., Lundervold, A., Tai, X.C.: Noise removal using fourth-order partial differential equation with applications to medical magnetic resonance images in space and time. IEEE Transactions on Image Processing 12, 1579–1590 (2003)
20. MacAdam, D.L.: Visual sensitivity to color differences in daylight. J. Opt. Soc. Am. 32, 247 (1942)
21. Montaldo, S., Oniciuc, C.: A short survey on biharmonic maps between Riemannian manifolds. Revista de la Unión Matemática Argentina 47(2) (2006)
22. Olischläger, N., Rumpf, M.: A two step time discretization of Willmore flow. In: 21st Chemnitz FEM Symposium (2008)
23. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
24. Schrödinger, E.: Grundlinien einer Theorie der Farbenmetrik im Tagessehen. Ann. Physik 63, 481 (1920)
25. Sochen, N., Zeevi, Y.: Using Vos-Walraven line element for Beltrami flow in color images. EE-Technion and TAU HEP report, Technion and Tel-Aviv University (1992)
26. Stiles, W.S., Wyszecki, G.: Color Science, Concepts and Methods, Quantitative Data and Formulae. John Wiley & Sons, Inc., Chichester (2000)
27. Thomsen, G.: Über konforme Geometrie, I. Grundlagen der konformen Flächentheorie. Abh. Math. Semin. Univ. Hamburg, 31–56 (1923)
28. Tumblin, J., Turk, G.: LCIS: A boundary hierarchy for detail-preserving contrast reduction. In: Proceedings of the SIGGRAPH Annual Conference on Computer Graphics, Los Angeles, CA, USA, August 1999, pp. 83–90 (1999)
29. Vos, J.J., Walraven, P.L.: An analytical description of the line element in the zone-fluctuation model of colour vision II. The derivative of the line element. Vision Research 12, 1345–1365 (1972)
30. Wei, G.: Generalized Perona-Malik equation for image processing. IEEE Signal Processing Letters 6(7), 165–167 (1999)
31. Weickert, J.: Anisotropic Diffusion in Image Processing. Teubner (1998)
32. Weiner, J.L.: On a problem of Chen, Willmore, et al. Indiana University Math. J. 27, 19–35 (1978)
33. Willmore, T.J.: Note on embedded surfaces. An. Stiint. Univ. Al. I. Cuza Iasi., Ser. Noua, Mat. 11B, 493–496 (1965)
34. Willmore, T.J.: Riemannian Geometry. Oxford Science Publications (1993)
35. You, Y.L., Kaveh, M.: Fourth-order partial differential equations for noise removal. IEEE Transactions on Image Processing 10(9), 1723–1730 (2000)
Tracking Closed Curves with Non-linear Stochastic Filters
Christophe Avenel¹, Etienne Mémin², and Patrick Pérez²
¹ ENS Cachan / IRISA
² INRIA, Vista Project, Center of Rennes
{Christophe.Avenel,Etienne.Memin,Patrick.Perez}@irisa.fr
Abstract. The joint analysis of motions and deformations is crucial in a number of computer vision applications. In this paper, we introduce a non-linear stochastic filtering technique to track the state of a free curve. The approach we propose is implemented through a particle filter that includes color measurements characterizing the target and the background respectively. We design continuous-time dynamics that allow us to infer inter-frame deformations. The curve is defined by an implicit level-set representation, and the stochastic dynamics is expressed on the level-set function. It takes the form of a stochastic differential equation driven by a low-dimensional Brownian motion. Specific noise models lead to traditional evolution laws based on mean curvature motion, while other forms lead to new evolution laws with different smoothing behaviors. In these evolution models, we propose to combine local motion information extracted from the images with a model of the uncertainty on the dynamics. The associated filter we propose for curve tracking thus belongs to the family of conditional particle filters. Its capabilities are demonstrated on various sequences with highly deformable objects.
1 Introduction
Tracking deformable structures delineated by free curves, with no prior on their possible shapes, is a very challenging problem. Indeed, the shape of a deformable object, or even of a rigid body, may change drastically over an image sequence. These deformations are due to the object's apparent motion, to perspective effects and to 3D shape evolution. The difficulty is amplified when the object becomes partially or totally occluded, even for a very short time. Cluttered backgrounds and ambiguities are further sources of difficulty for tracking. For curve tracking, numerous approaches based on the level-set representation have been proposed [1, 2, 3, 4, 5, 6, 7]. These techniques mainly address the problem as a succession of instantaneous detection or segmentation problems. At best, only discrete snapshots of the location of the object of interest are provided, and no dynamical or morphological consistency can really be enforced. Implausible growth/shrinkage or merging/splitting cannot be avoided without resorting to shape priors [8, 9, 10]. This considerably reduces the generality of the tracker and restricts its use to very specific applications [8, 10].
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 576–587, 2009.
© Springer-Verlag Berlin Heidelberg 2009
Such deterministic approaches also have great difficulty coping with ambiguities and noise. The explicit introduction of dynamics into the curve evolution law has been considered in [4]. However, the proposed technique, although much more satisfying from the point of view of curve forecasting, is not embedded into a tracking framework. In [11], an approach based on a group-action mean shape and a moving average has been proposed; this tracking is restricted to simple motions. Recently, an optimal control strategy has been defined for curve tracking [12]. This technique can cope with non-linear differential evolution laws. It is nevertheless a deterministic technique that only involves Gaussian uncertainty on the dynamical system. It is also a batch technique that relies on the entire image sequence, so it can hardly be used for on-line tracking. Extracting state trajectories from past measurements and a dynamical model, as done in stochastic filtering, naturally handles partial occlusions, clutter and ambiguities. It also makes it possible to rely on only approximate knowledge of the underlying dynamics. However, the state dimension is the Achilles’ heel of recursive Bayesian filters such as the particle filter. Because of this so-called curse of dimensionality, only a few works have attempted to combine stochastic filtering and level-set representations for curve tracking [13, 14]. These works face a high-dimensional sampling problem and, as a consequence, rely on a crude discretization of the non-linear curve dynamics, which may be problematic in some situations. The approach we propose for curve tracking is also implemented through a particle filter and a level-set representation. It includes color measurements characterizing the target and the background respectively [15]. The dynamics involved is formulated as a stochastic differential equation.
This gives us a continuous-time representation of the curve trajectory and thus allows us to infer inter-frame deformations. It gives access to richer dynamics on curves, and would also allow the use of continuous-time physical evolution laws in specific contexts. The stochastic dynamics is expressed on the level-set function and takes the form of a stochastic differential equation driven by a low-dimensional Brownian motion. Although a similar attempt to build stochastic dynamics for image segmentation was made in [16], our approach differs in that it naturally integrates the contribution of noise into the derivation of the dynamics. It also allows us to interpret additional smoothing terms on the curve as a consequence of the uncertainty we have on the curve dynamics. Conceptually, this yields a rigorous derivation of the curve dynamics, which can handle topological changes occurring between two frame instants and cope with the propagation of possibly irregular curves driven by noisy motion fields. No ad hoc additional filters are needed to propagate the curve: such smoothing is handled explicitly within the stochastic expression of the level-set dynamics. The evolution models we propose combine local motion information extracted from the image and a model of the uncertainty on the dynamics. The associated filter thus belongs to the family of conditional particle filters [17].
2 Stochastic Filtering and Particle Filter
Before introducing in detail the stochastic evolution laws on which we rely in this work, we present in this section the generic problem of continuous-time stochastic filtering in the presence of discrete-time measurements. Stochastic filters are well-known procedures for estimating the posterior pdf $p(x_k \mid z_{1:k})$ (called the filtering distribution) of a state variable of interest at any measurement instant k, given the series of discrete measurements $z_{1:k} = (z_1, \dots, z_k)$ up to instant k and an initial distribution $p(x_0)$. In the following, we consider a continuous-time state $x_t$ and denote by $x_{t=k}$ or $x_k$ its value at measurement instant k. At each instant k, the measurement equation relates the observation $z_k$ to the state $x_k$. In this work, the general system we deal with is described by:
$$\begin{cases} dx_t = f(x_t)\,dt + \sigma(t)\,dB_t, \\ z_k = g(x_k) + v_k, \end{cases} \qquad (1)$$
where $B_t$ is a Brownian motion and $v_k$ is a noise variable. The functions f and g are non-linear in the general case. Assuming there exists a transition distribution p(xt |xr
2.1 Particle Filter
Particle filtering is a sequential Monte Carlo framework that yields an approximate solution to the general stochastic filtering problem (non-linear likelihood, non-additive and non-Gaussian noises). The filtering distribution $p(x_k \mid z_{1:k})$ is recursively approximated by a finite weighted sum of N Diracs centered on hypothesized locations in the state space, called particles, starting from initial particles $x_0^{(i)}$. Each particle $x_k^{(i)}$ ($i = 1{:}N$) is assigned a weight $\gamma_k^{(i)}$ describing its relevance. This approximation reads:
$$p(x_k \mid z_{1:k}) \approx \sum_{i=1:N} \gamma_k^{(i)}\, \delta_{x_k^{(i)}}(x_k). \qquad (2)$$
Assuming that the approximation of $p(x_{k-1} \mid z_{1:k-1})$ is known, the filtering distribution is computed recursively by propagating the swarm of weighted particles $\{x_{k-1}^{(i)}, \gamma_{k-1}^{(i)}\}_{i=1:N}$. At each time instant (or iteration), the set of new particles $\{x_k^{(i)}\}_{i=1:N}$ is drawn from an approximation of the true distribution $p(x_{t \ge k-1} \mid z_{1:k})$, called the importance function and denoted here $\pi(x_t \mid x_{0:k-1}^{(i)}, z_{1:k})$. The closer this approximation is to the true distribution, the more efficient the filter. The importance weights $\gamma_k^{(i)}$ account for the deviation with respect to the unknown true distribution. To maintain a consistent sample, the importance weights are updated recursively as each new measurement $z_k$ becomes available:
$$\gamma_k^{(i)} \propto \gamma_{k-1}^{(i)}\, \frac{p(z_k \mid x_k^{(i)})\, p(x_k^{(i)} \mid x_{k-1}^{(i)})}{\pi(x_k^{(i)} \mid x_{0:k-1}^{(i)}, z_{1:k})}, \qquad \sum_{i=1:N} \gamma_k^{(i)} = 1. \qquad (3)$$
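The weight recursion (3) can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the authors' implementation: the function name `update_importance_weights` is hypothetical, and the four densities are assumed to be already evaluated at the newly drawn particles.

```python
import numpy as np

def update_importance_weights(prev_weights, likelihood, transition, proposal):
    """Recursive importance-weight update of Eq. (3) (illustrative sketch).

    All arguments are length-N arrays evaluated at the new particles x_k^(i):
      prev_weights -- gamma_{k-1}^(i)
      likelihood   -- p(z_k | x_k^(i))
      transition   -- p(x_k^(i) | x_{k-1}^(i))
      proposal     -- pi(x_k^(i) | x_{0:k-1}^(i), z_{1:k})
    """
    w = (np.asarray(prev_weights, float) * np.asarray(likelihood, float)
         * np.asarray(transition, float) / np.asarray(proposal, float))
    total = w.sum()
    if total <= 0:
        raise ValueError("all importance weights vanished")
    return w / total  # enforce sum_i gamma_k^(i) = 1
```

When the proposal equals the transition density, the last two factors cancel and the update reduces to multiplying the previous weights by the likelihood before normalizing.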
Different choices are possible for this proposal density [17]. The most common one consists in setting the proposal distribution to the dynamics:
$$\pi(x_t \mid x_{0:k-1}^{(i)}, z_{1:k}) = p(x_t \mid x_{k-1}^{(i)}). \qquad (4)$$
In this case the weight update in (3) is greatly simplified: it amounts to multiplying by the data likelihood $p(z_k \mid x_k^{(i)})$. This version of the particle filter is known as the bootstrap filter, and it is the one we use here. In our case the two steps of the filter read:
– Prediction step: $x_k^{(i)} \sim p(x_k \mid x_{k-1}^{(i)})$;
– Correction step: $\gamma_k^{(i)} \propto \gamma_{k-1}^{(i)}\, p(z_k \mid x_k^{(i)})$.
The prediction step consists in sampling trajectories $\{x_t^{(i)} : k-1 \le t \le k\}_{i=1:N}$ from the stochastic differential equation describing the continuous evolution of the state:
$$dx_t^{(i)} = f(x_t^{(i)})\,dt + \sigma(t, x_t^{(i)})\,dB_t^{(i)}, \qquad (5)$$
from the initial condition $\{x_{k-1}^{(i)}\}_{i=1:N}$, where $\{B_t^{(i)}\}_{i=1:N}$ are independent Brownian motions. The SDE (5) can be simulated with the Euler scheme:
$$x_{t+\Delta t}^{(i)} = x_t^{(i)} + f(x_t^{(i)})\,\Delta t + \sigma(t, x_t^{(i)})\,\big(B_{t+\Delta t}^{(i)} - B_t^{(i)}\big), \qquad (6)$$
where the increments $B_{t+\Delta t}^{(i)} - B_t^{(i)}$ are independent Gaussian noises with zero mean and covariance $\Delta t\, I$. Note that the discretization step is much smaller than the inter-measurement time interval ($\Delta t \ll 1$). To avoid degeneracy of the particle swarm, a resampling step must be applied sufficiently often [19]. This process consists in drawing, with replacement, a new set of particles from the current one according to a probability distribution that depends on the importance weights. Particles with low weights tend to disappear, whereas those with larger weights are likely to be duplicated. In this work the state variables are closed curves represented by implicit surfaces; their dynamics is defined in Section 3. Before that, let us define the likelihood on which we rely.
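A bootstrap-filter iteration, with the prediction step simulated by the Euler scheme (6) and followed by resampling, might look as follows. This sketch rests on assumptions not made in the text: a toy finite-dimensional state stored as an (N, d) array, an inter-measurement interval of length 1, multinomial resampling at every iteration (the text only requires resampling "sufficiently often" [19]), and hypothetical function names.

```python
import numpy as np

def euler_maruyama(x, f, sigma, t0, t1, dt, rng):
    """Simulate dx = f(x) dt + sigma(t, x) dB from t0 to t1 with Eq. (6).

    x has shape (N, d): one row per particle, each driven by its own
    independent Brownian motion.
    """
    t = t0
    for _ in range(int(round((t1 - t0) / dt))):
        dB = rng.normal(0.0, np.sqrt(dt), size=x.shape)  # increments ~ N(0, dt I)
        x = x + f(x) * dt + sigma(t, x) * dB
        t += dt
    return x

def multinomial_resample(particles, weights, rng):
    """Draw N particles with replacement, with probabilities given by the weights."""
    n = len(weights)
    idx = rng.choice(n, size=n, replace=True, p=weights)
    return particles[idx], np.full(n, 1.0 / n)

def bootstrap_step(particles, weights, likelihood, f, sigma, k, dt, rng):
    """One bootstrap-filter iteration from measurement time k-1 to k."""
    # Prediction: x_k^(i) ~ p(x_k | x_{k-1}^(i)), sampled by integrating the SDE (5).
    particles = euler_maruyama(particles, f, sigma, k - 1.0, float(k), dt, rng)
    # Correction: gamma_k^(i) propto gamma_{k-1}^(i) * p(z_k | x_k^(i)), then normalize.
    weights = weights * likelihood(particles)
    weights = weights / weights.sum()
    # Resampling against weight degeneracy (here applied at every iteration).
    return multinomial_resample(particles, weights, rng)
```

Resampling at every step is the simplest choice; a common refinement is to resample only when the effective sample size drops below a threshold.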
2.2 Likelihood Definition
In bootstrap filters, the likelihood associated with each particle directly determines its weight. It is therefore crucial for the likelihood to be sufficiently discriminative to discard curves that are too far from the intended result. To this end, we define a likelihood that depends on the similarity between the color distributions inside the curve at times t = 0 and t = k respectively. For each particle, it reads:
$$p(z_k \mid x_k^{(i)}) \propto \exp\big(-\lambda\, d(h_0, h_k^{(i)})\big), \qquad (7)$$
where d is related to the Bhattacharyya distance between $h_0$, the reference interior color histogram instantiated at time 0, and $h_k^{(i)}$, the interior color histogram associated with the i-th level-set sample at time k, and λ is a positive parameter. For discrete probability distributions p and q defined over the same domain X, the Bhattacharyya distance is defined as:
$$d(p, q) = \Big(1 - \sum_{x \in X} \sqrt{p(x)\, q(x)}\Big)^{1/2}. \qquad (8)$$
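Equations (7) and (8) translate directly into code. In this sketch the function names are hypothetical, d in (7) is taken to be exactly the distance (8), histograms are normalized before the distance is computed, and λ is a free parameter whose value the text does not specify.

```python
import numpy as np

def bhattacharyya_distance(p, q):
    """Eq. (8): d(p, q) = (1 - sum_x sqrt(p(x) q(x)))^(1/2) for two histograms."""
    p = np.asarray(p, float)
    q = np.asarray(q, float)
    p, q = p / p.sum(), q / q.sum()          # make them probability distributions
    bc = np.sum(np.sqrt(p * q))              # Bhattacharyya coefficient in [0, 1]
    return np.sqrt(max(0.0, 1.0 - bc))       # clip tiny negative round-off

def particle_likelihood(h0, hk, lam):
    """Unnormalized likelihood of Eq. (7): exp(-lambda * d(h0, hk))."""
    return np.exp(-lam * bhattacharyya_distance(h0, hk))
```

Identical histograms give a distance of 0 (likelihood 1), disjoint ones a distance of 1, so larger λ makes the filter more selective.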
3 A Stochastic Evolution Law for Level Sets
As mentioned in the introduction, the curve we want to track is defined by an implicit level-set representation. The stochastic dynamics thus has to be defined on this level-set function, which is infinite-dimensional (or at least very high-dimensional in its discrete form). To cope with the curse of dimensionality, which makes any high-dimensional sampling inefficient, the model we consider relies on a low-dimensional Brownian motion. To this end, we next introduce three different evolution laws and explain how they are related to evolution laws of level sets. Let $C_t$ denote a closed Jordan curve $C_t : [0, 1] \to \mathbb{R}^2$ at time $t \in [t_0, \tau]$ of the image sequence. Let us first assume that this curve evolves in time according to the following evolution law:
$$dC_t = w_n\, n\, dt + \sigma_1\, n\, dB_t^{(1)} + \sigma_2\, n^{\perp}\, dB_t^{(2)}, \qquad (9)$$
where $B_t^{(1)}$ and $B_t^{(2)}$ are two independent Brownian motions, n is the unit vector normal to the curve, and $w_n = w \cdot n$ is the projection of some velocity field w onto this normal. In this model, a deterministic drift associated with the velocity field w is mitigated by an isotropic Gaussian uncertainty that grows linearly in time. Indeed, recall that the quadratic variation of Brownian motion (on the real line, for simplicity) is:
$$\langle \sigma dB_t, \sigma dB_t \rangle_t = \int_0^t (\sigma\, dB_s)^2 = \lim_{\Delta t \to 0} \sum_{s=0}^{t} \sigma^2\, |B_{s+\Delta t} - B_s|^2 = \sigma^2 t, \qquad (10)$$
where the sum runs over a partition of [0, t] with step Δt. Unlike differentiable deterministic functions, Brownian motion is not of bounded variation (i.e., its total variation on [0, t] is infinite).
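The quadratic variation (10), and by contrast the unbounded total variation of Brownian paths, can be checked numerically. The sketch below is illustrative only; the helper names are hypothetical, and the increments are sampled on a uniform grid of [0, t].

```python
import numpy as np

def brownian_increments(sigma, t, n_steps, rng):
    """Increments of sigma * B on a uniform grid of [0, t] with n_steps cells."""
    dt = t / n_steps
    return sigma * rng.normal(0.0, np.sqrt(dt), size=n_steps)

def quadratic_variation(sigma, t, n_steps, rng):
    """Sum of squared increments; tends to sigma^2 * t as n_steps grows (Eq. 10)."""
    return np.sum(brownian_increments(sigma, t, n_steps, rng) ** 2)

def total_variation(sigma, t, n_steps, rng):
    """Sum of absolute increments; grows like sqrt(n_steps), i.e. without bound."""
    return np.sum(np.abs(brownian_increments(sigma, t, n_steps, rng)))
```

With σ = 2 and t = 3, the quadratic variation on a fine grid is close to σ²t = 12, while the total variation keeps growing as the grid is refined.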
Level Set Representation. Since we focus in this work on tracking non-parametric closed curves that may undergo topology changes over the analyzed image sequence, we rely on an implicit level-set representation of the curve of interest [5, 7]. Within this framework, the curve $C_t$ enclosing a region D we wish to track is described at time t by a higher-dimensional surface $\Phi : \mathbb{R}^2 \to \mathbb{R}$ and the implicit equation:
$$C_t = \{\, x_t(p) : \Phi(x_t(p)) = 0 \,\}, \qquad (11)$$
where p is a parameter of the curve and $x \in \Omega$ denotes image positions. This is an Eulerian representation of a curve, which naturally handles topological changes. The implicit representation is defined from an initial surface, such as a signed distance function to the contours of interest, and evolves according to the curve evolution law. By construction, the curve at time t is defined through its implicit representation at time t:
$$\Phi_t = \Phi_0 + \int_0^t d\Phi_s. \qquad (12)$$
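As an illustration of (11) with a signed-distance initialization $\Phi_0$, the following sketch builds $\Phi_0$ for a circle on a pixel grid and recovers the enclosed region and the pixels near the curve. The circle, the grid size, and the convention that the interior is $\Phi < 0$ (consistent with the indicator functions of Eq. (24)) are assumptions of this example; `circle_sdf` is a hypothetical helper.

```python
import numpy as np

def circle_sdf(shape, center, radius):
    """Signed distance to a circle: negative inside, zero on the curve, positive outside."""
    ys, xs = np.indices(shape, dtype=float)   # axis 0 is y, axis 1 is x
    return np.hypot(xs - center[0], ys - center[1]) - radius

# Hypothetical example: a circle of radius 20 in a 101 x 101 image.
phi0 = circle_sdf((101, 101), center=(50.0, 50.0), radius=20.0)
inside = phi0 < 0                 # region D enclosed by the curve
near_curve = np.abs(phi0) < 0.5   # pixels lying close to the zero level set (11)
```

Because only the values of $\Phi$ on a fixed grid are stored, a later split or merge of the zero level set requires no re-parameterization of the curve.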
Assuming the level-set representation is uniquely defined from an initial surface and the curve evolution (9), the surface Φ is a function of the stochastic process $C_t$. Its differential must therefore be computed with the Itô formula from stochastic calculus [20, 21].
Stochastic Level Set Evolution Law. Let us apply the Itô formula to the implicit representation of the curve $\Phi(X_t)$, where $X_t = (X_t^x, X_t^y)^T \in \Omega$ is driven by an Itô diffusion defined as an extension of the curve velocity:
$$dX_t = w_n^*\, n\, dt + \sigma_1\, n\, dB_t^{(1)} + \sigma_2\, n^{\perp}\, dB_t^{(2)}. \qquad (13)$$
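In this equation, the drift term $w_n^*$ is an extension to the whole image domain of the curve's deterministic drift along the curve normal $n = \nabla\Phi / \|\nabla\Phi\|$. Following the Itô formula, the process $\varphi_t = \Phi(X_t)$ is an Itô process defined as
$$d\varphi_t = w_n^* \|\nabla\varphi\|\, dt + \sigma_1 \|\nabla\varphi\|\, dB_t^{(1)} + \frac{1}{2} \sum_{i,j = x,y} \frac{\partial^2 \varphi}{\partial x_i\, \partial x_j}\, \langle dX_t^i, dX_t^j \rangle. \qquad (14)$$
The tangential noise does not appear in the first-order part, since $n^{\perp} \cdot \nabla\varphi = 0$; it contributes only through the quadratic variation terms.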
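The associated quadratic variations read:
$$\begin{aligned}
\langle dX_t^x, dX_t^x \rangle &= \frac{\sigma_1^2 \varphi_x^2 + \sigma_2^2 \varphi_y^2}{\|\nabla\varphi\|^2}\, dt, \\
\langle dX_t^y, dX_t^y \rangle &= \frac{\sigma_1^2 \varphi_y^2 + \sigma_2^2 \varphi_x^2}{\|\nabla\varphi\|^2}\, dt, \\
\langle dX_t^x, dX_t^y \rangle &= \frac{(\sigma_1^2 - \sigma_2^2)\, \varphi_x \varphi_y}{\|\nabla\varphi\|^2}\, dt.
\end{aligned} \qquad (15)$$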
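The matrix in (15) is simply $\sigma_1^2\, n n^T + \sigma_2^2\, n^{\perp} (n^{\perp})^T$ written in terms of the gradient components, with $n = \nabla\varphi/\|\nabla\varphi\|$ and $n^{\perp} = (-\varphi_y, \varphi_x)/\|\nabla\varphi\|$. The sketch below (hypothetical function name) evaluates (15) at a single point so that this identity can be checked.

```python
import numpy as np

def diffusion_covariance(phi_x, phi_y, sigma1, sigma2):
    """d<X^i, X^j>/dt from Eq. (15) at a point where grad(phi) = (phi_x, phi_y) != 0."""
    g2 = phi_x**2 + phi_y**2
    off = (sigma1**2 - sigma2**2) * phi_x * phi_y      # cross term <dX^x, dX^y>
    return np.array([[sigma1**2 * phi_x**2 + sigma2**2 * phi_y**2, off],
                     [off, sigma1**2 * phi_y**2 + sigma2**2 * phi_x**2]]) / g2
```

Its trace is always $\sigma_1^2 + \sigma_2^2$, and the off-diagonal term vanishes when $\sigma_1 = \sigma_2$, which is the isotropic case.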
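Introducing the surface normal expression, the Itô diffusion [21] driving the implicit surface evolution finally reads:
$$d\varphi_t = \Big( w_n^* \|\nabla\varphi\| + \frac{1}{2\|\nabla\varphi\|^2} \big( \varphi_{xx} (\sigma_1^2 \varphi_x^2 + \sigma_2^2 \varphi_y^2) + \varphi_{yy} (\sigma_1^2 \varphi_y^2 + \sigma_2^2 \varphi_x^2) + 2 (\sigma_1^2 - \sigma_2^2)\, \varphi_x \varphi_y \varphi_{xy} \big) \Big)\, dt + \sigma_1 \|\nabla\varphi\|\, dB_t^{(1)}. \qquad (16)$$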
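The second-order Itô term of (16) can be evaluated on a pixel grid with central finite differences. This is an illustrative discretization, not necessarily the authors' scheme; `ito_correction` and the small regularizer `eps` (guarding against vanishing gradients) are assumptions of this sketch.

```python
import numpy as np

def ito_correction(phi, sigma1, sigma2, h=1.0, eps=1e-8):
    """Second-order (Ito) drift term of Eq. (16), central differences.

    Axis 0 of `phi` is y and axis 1 is x; `eps` avoids division by zero
    where the gradient vanishes.
    """
    phi_y, phi_x = np.gradient(phi, h)
    phi_xy, phi_xx = np.gradient(phi_x, h)   # d/dy and d/dx of phi_x
    phi_yy = np.gradient(phi_y, h, axis=0)
    g2 = phi_x**2 + phi_y**2 + eps
    s1, s2 = sigma1**2, sigma2**2
    num = (phi_xx * (s1 * phi_x**2 + s2 * phi_y**2)
           + phi_yy * (s1 * phi_y**2 + s2 * phi_x**2)
           + 2.0 * (s1 - s2) * phi_x * phi_y * phi_xy)
    return num / (2.0 * g2)
```

When σ1 = σ2 = σ the mixed term cancels and the expression collapses to $\frac{\sigma^2}{2}\Delta\varphi$, which is the simplification used in (19).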
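Recalling that the mean curvature can be expressed as:
$$\kappa = \operatorname{curv}(\varphi) = \frac{1}{\|\nabla\varphi\|} \Big( \Delta\varphi - \frac{\nabla\varphi^T \nabla^2\varphi\, \nabla\varphi}{\|\nabla\varphi\|^2} \Big), \qquad (17)$$
where $\nabla^2\varphi$ denotes the Hessian matrix and $\Delta\varphi$ the Laplacian, the surface evolution law can be written more compactly as:
$$d\varphi = \Big( w_n^* \|\nabla\varphi\| + \frac{\sigma_2^2}{2}\, \kappa\, \|\nabla\varphi\| + \frac{\sigma_1^2}{2\|\nabla\varphi\|^2}\, \nabla\varphi^T \nabla^2\varphi\, \nabla\varphi \Big)\, dt + \sigma_1 \|\nabla\varphi\|\, dB_t^{(1)}. \qquad (18)$$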
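The curvature (17) and the deterministic part of (18) can be computed with the same finite differences. This is again a sketch under stated assumptions (central differences via `np.gradient`, unit grid spacing by default, a small `eps` regularizer); the function names are hypothetical.

```python
import numpy as np

def _derivatives(phi, h=1.0):
    """Central differences; axis 0 is y and axis 1 is x."""
    phi_y, phi_x = np.gradient(phi, h)
    phi_xy, phi_xx = np.gradient(phi_x, h)   # d/dy and d/dx of phi_x
    phi_yy = np.gradient(phi_y, h, axis=0)
    return phi_x, phi_y, phi_xx, phi_yy, phi_xy

def mean_curvature(phi, h=1.0, eps=1e-12):
    """Eq. (17): kappa = (Delta phi - grad^T H grad / |grad|^2) / |grad|."""
    phi_x, phi_y, phi_xx, phi_yy, phi_xy = _derivatives(phi, h)
    g2 = phi_x**2 + phi_y**2 + eps
    quad = phi_xx * phi_x**2 + 2.0 * phi_xy * phi_x * phi_y + phi_yy * phi_y**2
    return (phi_xx + phi_yy - quad / g2) / np.sqrt(g2)

def drift_eq18(phi, w_n, sigma1, sigma2, h=1.0, eps=1e-12):
    """dt-coefficient of Eq. (18): w|grad| + s2^2/2 kappa|grad| + s1^2/(2|grad|^2) grad^T H grad."""
    phi_x, phi_y, phi_xx, phi_yy, phi_xy = _derivatives(phi, h)
    g2 = phi_x**2 + phi_y**2 + eps
    grad = np.sqrt(g2)
    quad = phi_xx * phi_x**2 + 2.0 * phi_xy * phi_x * phi_y + phi_yy * phi_y**2
    kappa = (phi_xx + phi_yy - quad / g2) / grad
    return w_n * grad + 0.5 * sigma2**2 * kappa * grad + 0.5 * sigma1**2 * quad / g2
```

For a signed distance function to a circle of radius r, the curvature returned near the zero level set is close to 1/r, and the drift coincides with the bracketed term of (16).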
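It can be observed from (18) that if both uncertainties have the same strength (i.e., σ1 = σ2), this model takes a particularly simple form:
$$d\varphi_t = \Big( w_n^* \|\nabla\varphi\| + \frac{1}{2} \sigma_1^2\, \Delta\varphi \Big)\, dt + \sigma_1 \|\nabla\varphi\|\, dB_t^{(1)}. \qquad (19)$$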
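One Euler–Maruyama step of (19) on a grid could be written as below. A key assumption of this sketch is that $B_t^{(1)}$ is a single scalar Brownian motion shared by all grid points, which is how we read the "low-dimensional Brownian motion" of the text; the drift speed `w_n` is passed in as a given scalar or array, and `level_set_sde_step` is a hypothetical name.

```python
import numpy as np

def level_set_sde_step(phi, w_n, sigma, dt, rng, h=1.0):
    """One Euler-Maruyama step of Eq. (19) (sigma1 = sigma2 = sigma).

    A single scalar Brownian increment is shared by the whole grid, so the
    noise is one-dimensional (an assumption of this sketch).
    """
    phi_y, phi_x = np.gradient(phi, h)                 # axis 0 is y, axis 1 is x
    grad_norm = np.hypot(phi_x, phi_y)
    lap = np.gradient(phi_x, h, axis=1) + np.gradient(phi_y, h, axis=0)
    dB = rng.normal(0.0, np.sqrt(dt))                  # scalar B_{t+dt} - B_t
    drift = w_n * grad_norm + 0.5 * sigma**2 * lap
    return phi + drift * dt + sigma * grad_norm * dB
```

Because the same increment multiplies $\|\nabla\varphi\|$ everywhere, a single random draw moves the whole level set coherently, instead of adding independent noise at every pixel.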
The dynamical model (18) constitutes a general stochastic process for guiding a curve through an implicit surface. This stochastic process enables us to draw samples of curves in our tracking process. Before turning to the experiments, it is interesting to see what the expectation of these stochastic processes corresponds to. It can be shown, through the Kolmogorov backward equation (the adjoint of the Fokker–Planck equation), that the expectation $u(x, t) = E_x(\Phi(X_t))$ evolves as:
$$\frac{\partial u}{\partial t} = \Big( w_n^* + \frac{\sigma_2^2}{2}\, \kappa \Big) \|\nabla u\| + \frac{\sigma_1^2}{2\|\nabla u\|^2}\, \nabla u^T \nabla^2 u\, \nabla u, \qquad u(x, 0) = \Phi_0(x), \qquad (20)$$
where $\Phi_0$ denotes the initial surface, built from an initial value of the contour. This equation gives the evolution law, on a fixed grid, of the expectation of an implicit surface driven by a stochastic dynamical model of the form (9). This dynamical model includes two independent Brownian uncertainties on the curve motion, directed along the curve's normal and tangent respectively. The first term corresponds to the traditional deterministic evolution law of a level-set function. The curvature term arises here from the motion uncertainty along the curve's tangent. The second term is less usual and corresponds to an uncertainty directed along the surface normal. If both uncertainties are set to the same amplitude σ, the previous equation simplifies to:
$$\frac{\partial u}{\partial t} = w_n^* \|\nabla u\| + \frac{\sigma^2}{2}\, \Delta u, \qquad u(x, 0) = \Phi_0(x). \qquad (21)$$

4 Experiments and Results
Motion Information Extracted from the Images. The evolution laws introduced in the previous section are based on a stochastic force w computed from the image. We now introduce the force used in our experiments. It is a linear combination of two main components:
$$w^{*(i)} = \beta(t)\, v^T n + (1 - \beta(t))\, \partial_\varphi F(\varphi^{(i)})\, n \qquad (22)$$
with proportions $\beta(t) \in [0, 1]$ and $1 - \beta(t)$ respectively. The first component is a motion component obtained from an optical-flow computation, while the second is a photometric edge component obtained from a generalized Chan–Vese operator [12].
Optical-Flow Component. The motion component $v = (v^x, v^y)^T$ is provided by a robust and fast optical-flow estimator. It is defined as the minimizer of the objective function:
$$\int_\Omega f\big(\nabla I^T v + I(t + dt) - I(t)\big)\, \mathbb{1}_{p(z_t \mid \nu(x)) < 1 - \dots}\, dx + \lambda \int_\Omega \big( \|\nabla v^x\|^2 + \|\nabla v^y\|^2 \big)\, dx. \qquad (23)$$
The function f is a robust function whose role is to discard data that deviate significantly from the brightness constancy assumption. Together with the characteristic function, defined from a local likelihood computed over a neighborhood ν(x) of x ∈ Ω (Eq. (7)), it yields a smooth motion field over the whole image plane that represents only the motion of data points likely to belong to the object of interest. This motion component is only a rough description of the curve's motion, so it is reasonable to combine it with a photometric edge force.
Fig. 1. Tracking of a skier; first row: drift term with only the photometric edge component; second row: drift term defined as a combination of a photometric edge component and a motion component
Photometric Edge Component. The second component is derived from an operator [12] that corresponds to the Chan and Vese operator [22] applied to histograms. It is thus defined from the derivative, with respect to the unknown level set, of the following objective function:
$$F(\varphi, I)(x, t) = d\big(h(\nu(x)), h_0\big)^2\, \mathbb{1}_{\varphi(x) < 0} + d\big(h(\nu(x)), h_b\big)^2\, \mathbb{1}_{\varphi(x) \ge 0}, \qquad (24)$$
Fig. 2. Tracking cyclone Vince in the infrared channel of the Meteosat satellite
where d is the Bhattacharyya distance, $h_0$ and $h_b$ denote respectively the reference interior and exterior color histograms instantiated at time 0, and $h(\nu(x))$ is the local color histogram at point x. The gradient of this objective function reads:
$$\partial_\varphi F = \Big( d\big(h(\nu(x)), h_0\big)^2 - d\big(h(\nu(x)), h_b\big)^2 \Big)\, \delta(\varphi), \qquad (25)$$
where δ(·) is the Dirac function. Each component has its own advantages in the time interval between measurement instants k and k + 1. For our tracking purpose, the photometric component is especially helpful in the temporal vicinity of the second image, whereas the optical-flow component is likely to be meaningful, as a rough description of the motion, only in the temporal vicinity of the first image. We therefore change the proportion of each component gradually according to: β(t) =
2t − 1, t ∈ [0, Δk]. Δk
(26)
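The photometric force (25) and the blended normal speed of (22) can be sketched as follows. The text does not say how the Dirac function δ is discretized; this sketch assumes the regularized Dirac $\epsilon / (\pi(\epsilon^2 + \varphi^2))$ often used with Chan–Vese models, and it takes the local Bhattacharyya distances to $h_0$ and $h_b$, as well as $v^T n$, as precomputed per-pixel arrays. β is passed in as a plain parameter, and all function names are hypothetical.

```python
import numpy as np

def smoothed_dirac(phi, eps=1.0):
    """Regularized Dirac delta eps / (pi * (eps^2 + phi^2)) (assumed discretization)."""
    return eps / (np.pi * (eps**2 + phi**2))

def photometric_force(d_obj, d_bg, phi, eps=1.0):
    """Eq. (25): (d(h(nu(x)), h_0)^2 - d(h(nu(x)), h_b)^2) * delta(phi), per pixel."""
    return (d_obj**2 - d_bg**2) * smoothed_dirac(phi, eps)

def blended_normal_speed(beta, flow_normal, edge_force):
    """Normal speed from Eq. (22): beta * v^T n + (1 - beta) * dF/dphi."""
    return beta * flow_normal + (1.0 - beta) * edge_force
```

Where the local histogram is closer to the object model than to the background model, the force is negative, and its magnitude is largest on the zero level set, where the regularized Dirac peaks.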
In order to illustrate the role of each component, we first show results on a sequence of 21 frames depicting a skier in action. In Fig. 1, the first row shows the results obtained when considering only the photometric component with a constant weight. The second row shows the results obtained by combining the optical-flow and photometric components. Between t = 13 and t = 15, the skier moves rapidly to the right of the image. In the first case, the tracker quickly locks onto the skier's shadow only. In the second case, the optical-flow term allows us to cope with this large displacement and improves the result. Note that, for visualization purposes, all the images have been centered on the skier. The second sequence is composed of 100 meteorological images (Meteosat infrared images) showing the evolution of cyclone Vince over the North Atlantic. In Fig. 2 we show in red the level set associated with the mean of all implicit-function particles (after resampling), together with the standard deviation of the estimate. As can be observed from these pictures or from the companion video, the results are of good quality. The method allows robust tracking of the regions of interest. When the cyclone collapses at the end of the sequence, the tracking becomes less certain and the variance of the estimate grows. Such an assessment of estimate confidence is another great advantage of probabilistic techniques. We finally present results on 30 frames of a video showing a lion running in the savanna. The results obtained are shown in Fig. 3.
Fig. 3. Tracking of a lion running in the savanna with our particle filter on the space of implicit functions
We can observe on this sequence that in regions where the background color is a source of strong ambiguity (e.g., around the legs), the uncertainty is high. The top of the lion is clearly distinct from the background; it is therefore segmented with better accuracy and confidence. Besides the quality of the results, local confidence assessment via variance visualization (or analysis) is an interesting feature of our approach, which could probably be of practical interest in medical imaging applications. To show the advantage of our method, Fig. 4 presents the same sequence with successive segmentations obtained using the Chan–Vese operator only. We can observe the lack of continuity in the tracking and the selection of several portions of the background due to color ambiguities. Our method avoids these problems by favoring a continuous evolution of the implicit surface. Our method involves two main parameters, which are related to the uncertainty we have on the curve dynamics. The estimation of these parameters is not addressed in this paper and will be investigated in future research. We have observed that better results were obtained when the noise along the curve tangent is slightly larger than the noise directed along the normal. For the sequences shown in this paper we chose σ1 = 3 and σ2 = 4.
Fig. 4. Successive segmentations of a lion running in the savanna
5 Conclusions and Future Work
In this paper we have described a probabilistic filtering method for tracking level sets. The technique we propose is implemented through a particle filter and combines discrete-time image measurements with continuous-time stochastic dynamics. These continuous dynamics rely on two different uncertainties on the curve motion, directed respectively along the curve normal and along the curve tangent. The curve dynamics are built from the image data through a drift term that combines, in varying proportions, a motion component and a photometric component. The measurements used in this filter are built from color histograms of the object delineated by the user at the initial time. A first perspective concerns the automatic estimation of the two noise variances. The first is related to the uncertainty on the motion, whereas the second corresponds to the level of noise in the image. Another perspective concerns the management of occlusions. To that end, one idea would be to modify the coefficient of the normal noise according to the average likelihood of all particles. In case of loss of the object, the uncertainty would then grow, spreading and expanding the level sets and, as a consequence, making recovery of the tracker more likely when the object reappears. Finally, it could be interesting to investigate the use of a higher-dimensional Brownian motion to capture a larger set of deformations between two consecutive frames.
References
1. Cremers, D., Soatto, S.: Motion competition: A variational framework for piecewise parametric motion segmentation. IJCV 62(3), 249–265 (2005)
2. Goldenberg, R., Kimmel, R., Rivlin, E., Rudzsky, M.: Fast geodesic active contours. IEEE Trans. on Image Processing 10(10), 1467–1475 (2001)
3. Kimmel, R., Bruckstein, A.M.: Tracking level sets by level sets: a method for solving the shape from shading problem. Comput. Vis. Image Underst. 62(1), 47–58 (1995)
4. Niethammer, M., Tannenbaum, A.: Dynamic geodesic snakes for visual tracking. In: CVPR (1), pp. 660–667 (2004)
5. Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations. Journal of Computational Physics 79, 12–49 (1988)
6. Paragios, N., Deriche, R.: Geodesic active regions: a new framework to deal with frame partition problems in computer vision. J. of Visual Communication and Image Representation 13, 249–268 (2002)
7. Sethian, J.: Level set methods: An act of violence - evolving interfaces in geometry, fluid mechanics, computer vision and materials sciences (1996)
8. Cremers, D.: Dynamical statistical shape priors for level set based tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(8), 1262–1273 (2006)
9. Leventon, M., Grimson, E., Faugeras, O.: Statistical shape influence in geodesic active contours. In: CVPR (2000)
10. Paragios, N.: A level set approach for shape-driven segmentation and tracking of the left ventricle. IEEE Trans. on Med. Imaging 22(6) (2003)
11. Cremers, D., Soatto, S.: Variational space-time motion segmentation. In: ICCV 2003: Proceedings of the Ninth IEEE International Conference on Computer Vision, Washington, DC, USA, p. 886. IEEE Computer Society, Los Alamitos (2003)
12. Papadakis, N., Mémin, E.: A variational technique for time consistent tracking of curves and motion. Journal of Mathematical Imaging and Vision 31(1), 81–103 (2008)
13. Jiang, T., Tomasi, C.: Level-set curve particles. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3953, pp. 633–644. Springer, Heidelberg (2006)
14. Rathi, Y., Vaswani, N., Tannenbaum, A., Yezzi, A.: Tracking deforming objects using particle filtering for geometric active contours. IEEE Trans. Pattern Analysis and Machine Intelligence 29(8), 1470–1475 (2007)
15. Pérez, P., Hue, C., Vermaak, J., Gangnet, M.: Color-based probabilistic tracking. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2350, pp. 661–675. Springer, Heidelberg (2002)
16. Juan, O., Keriven, R., Postelnicu, G.: Stochastic motion and the level set method in computer vision: Stochastic active contours. International Journal of Computer Vision 69(1), 7–25 (2006)
17. Arnaud, E., Mémin, E.: Partial linear Gaussian model for tracking in image sequences using sequential Monte Carlo methods. IJCV 74(1), 75–102 (2007)
18. Jazwinski, A.H.: Stochastic Processes and Filtering Theory. Academic Press, London (1970)
19. Liu, J.S., Chen, R.: Sequential Monte Carlo methods for dynamic systems. Journal of the American Statistical Association 93(443), 1032–1044 (1998)
20. Karatzas, I., Shreve, S.E.: Brownian Motion and Stochastic Calculus. Graduate Texts in Mathematics. Springer, Heidelberg (2004)
21. Øksendal, B.: Stochastic Differential Equations: An Introduction with Applications (Universitext). Springer, Heidelberg (2005)
22. Chan, T., Vese, L.: An active contour model without edges. In: Nielsen, M., Johansen, P., Fogh Olsen, O., Weickert, J. (eds.) Scale-Space 1999. LNCS, vol. 1682, pp. 141–151. Springer, Heidelberg (1999)
A Multi-scale Feature Based Optic Flow Method for 3D Cardiac Motion Estimation

Alessandro Becciu¹, Hans van Assen¹, Luc Florack¹,², Sebastian Kozerke³, Vivian Roode¹, and Bart M. ter Haar Romeny¹

¹ Eindhoven University of Technology, Biomedical Engineering, Eindhoven 5600 MB, The Netherlands
² Eindhoven University of Technology, Mathematics and Computer Science, Eindhoven 5600 MB, The Netherlands
³ Institute for Biomedical Engineering, University of Zurich and Swiss Federal Institute of Technology, Zurich, Switzerland
Abstract. The dynamic behavior of the cardiac muscle is strongly affected by heart disease. Optic flow techniques are essential tools to assess and quantify the contraction of the cardiac walls. However, most current methods are restricted to the analysis of 2D MR-tagging image sequences: because of the complex twisting motion combined with longitudinal shortening, a 2D approach will always suffer from through-plane motion. In this paper we investigate a new 3D optic flow method, free of the aperture problem, that studies cardiac motion by tracking stable multi-scale features such as maxima and minima in 3D tagged MR and sine-phase image volumes. We applied harmonic filtering in the Fourier domain to measure the phase. This removes the dependence on intensity changes of the tagging pattern over time caused by T1 relaxation. The regular geometry, the size-changing patterns of the MR tags stretching and compressing along with the tissue, and the phase and sine-phase plots provide a suitable framework to robustly extract multi-scale landmark features. Experiments were performed on real and phantom data, and the results showed the reliability of the extracted vector field. Our new 3D multi-scale optic flow method is a promising technique for analyzing true 3D cardiac motion at voxel precision, free of the through-plane artifacts present in multiple-2D data sets.
1 Introduction
Cardiac diseases are the number one cause of death and disability in Western countries [1]. Symptoms of cardiac illness can sometimes be traced back to adolescence [2, 3], making prevention in childhood a necessity. Cardiac illnesses may affect the deformation of the cardiac walls. Visualization and quantification of cardiac motion may therefore become an important step in diagnosis, indicating the progress of the disease and/or therapy, and perhaps even serving as a precursor of cardiac symptoms.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 588–599, 2009. © Springer-Verlag Berlin Heidelberg 2009
Optic flow is one of the traditional techniques for motion analysis. It measures the apparent velocity pattern of moving structures in an image sequence. Several optic flow approaches have been described in the computer vision literature, ranging from gradient-based to feature-based methods. Differential techniques compute the velocity from spatiotemporal derivatives of the image intensity, or of versions of the image filtered with low-pass or band-pass filters. Most of these techniques assume that brightness does not change under small displacements, and estimate the motion by solving the so-called Optic Flow Constraint Equation (OFCE):

$$L_x u + L_y v + L_t = 0 \qquad (1)$$
where $L(x, y, t): \mathbb{R}^3 \to \mathbb{R}$ is an image sequence, $L_x, L_y, L_t$ are its spatiotemporal derivatives, $u(x, y, t), v(x, y, t): \mathbb{R}^3 \to \mathbb{R}$ are the unknown velocity components, and x, y and t are the spatial and temporal coordinates, respectively. Since there is one equation for two unknowns (u and v), a unique solution cannot be found. This is referred to as the "aperture problem"; it can be resolved by generating as many equations as there are unknown velocity components. To find a plausible solution of equation (1), Horn and Schunck [4] combined the gradient constraint with a global smoothness term and found the solution by minimizing an energy functional. Lucas and Kanade [5] proposed a local differential technique in which the flow field is assumed constant in a small spatial neighborhood. The results of these early methods were substantially improved by Brox et al. and Bruhn et al. [6, 7], who investigated a continuous, rotationally invariant energy functional and a multi-grid approach to variational optic flow methods. Feature tracking techniques are also well studied for motion estimation. For instance, Thirion [8] proposed an optic flow method based on an analogy with Maxwell's demons: the constant brightness assumption is preserved, and the feature points are pushed by forces toward their most likely successive positions. One of the first applications of optic flow methods to tagged MRI was introduced by Dougherty et al. [9]. Florack et al. [10] developed a robust differential technique in a multi-scale framework, whose application to cardiac MR images was presented by Niessen et al. [11, 12] and Suinesiaputra et al. [13]. Van Assen et al. and Florack and Van Assen [14, 15] developed a method based on multiple independent MR tagging acquisitions that removes the aperture problem altogether by generating as many equations as unknowns.
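To make the aperture problem and the Lucas–Kanade remedy concrete, the sketch below solves equation (1) in the least-squares sense over a window of pixels, assuming (as in [5]) that the flow is constant within the window. It is a minimal NumPy illustration; the function name `lucas_kanade_window` and the tolerance `rel_tol` are hypothetical. If all gradients in the window are parallel, the system is rank-deficient and only the flow component along the gradient is determined, which is exactly the aperture problem.

```python
import numpy as np

def lucas_kanade_window(Lx, Ly, Lt, rel_tol=1e-8):
    """Least-squares solution of Lx*u + Ly*v + Lt = 0 over a window (constant flow assumed).

    Lx, Ly, Lt: arrays of the same shape holding the derivatives at the window pixels.
    Returns the estimated (u, v).
    """
    A = np.column_stack([np.ravel(Lx), np.ravel(Ly)])  # one OFCE row per pixel
    b = -np.ravel(Lt)
    s = np.linalg.svd(A, compute_uv=False)
    if A.shape[0] < 2 or s[-1] <= rel_tol * s[0]:
        # gradients (nearly) collinear: only the normal flow is determined
        raise ValueError("aperture problem: flow not determined in this window")
    uv, *_ = np.linalg.lstsq(A, b, rcond=None)
    return uv

# Usage: synthetic derivatives consistent with a flow (u, v) = (0.3, -0.7)
rng = np.random.default_rng(0)
Lx, Ly = rng.normal(size=(5, 5)), rng.normal(size=(5, 5))
Lt = -(0.3 * Lx - 0.7 * Ly)
uv = lucas_kanade_window(Lx, Ly, Lt)
```

A single pixel, or a window containing only one tag direction, triggers the error branch; in the paper's approach this is avoided by tracking isolated critical points instead.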
In recent years computational power has increased greatly, and it is becoming more feasible to compute 3-dimensional optic flow fields from MRI data. However, most current methods for flow estimation are restricted to the analysis of 2-dimensional MR images, even though the extension to three dimensions would be straightforward. In cardiac motion estimation, 2-dimensional optic flow techniques capture only expansion, contraction and rotation of the cardiac tissue and miss the twisting motion. A 3-dimensional optic flow technique takes into account all components of the cardiac motion and therefore provides a more realistic estimate of heart behaviour. The 3-dimensional version of equation (1) is:
$$L_x u + L_y v + L_z w + L_t = 0 \qquad (2)$$
where $u(x, y, z, t)$, $v(x, y, z, t)$ and $w(x, y, z, t): \mathbb{R}^4 \to \mathbb{R}$ are now the unknown velocity components. An example of 3-dimensional gradient-based optic flow estimation was proposed in 2004 by Barron [16], who explored 3-dimensional motion in gated MRI cardiac datasets by extending the Horn and Schunck and the Lucas and Kanade approaches to three dimensions. This method, however, imposes a constant intensity assumption, which does not hold in MRI tagging images because of T1 relaxation. Pan et al. [17] instead tracked a cardiac mesh consisting of a collection of material points extracted from HARP images. The estimation, however, is done on a sparse set of HARP planes, so tracking cannot be performed for every point within the heart volume. A similar approach, based on so-called "slice-following", was proposed by Sampath and Prince [18]. In this paper we investigate cardiac motion from image volumes by exploiting point features in Gaussian scale space. These features are interesting candidates for motion analysis: at these points the aperture problem does not arise, and they are detected in a robust framework inspired by findings on the multi-scale structure of the visual system. In the experiments, maxima and minima are chosen as feature points, and the approach is tested on artificial and real image sequences. The results indicate that the extracted vector field is reliable. In Section 2 a preprocessing approach is presented; Sections 2.1 and 2.2 discuss the image structure of the data and the dataset. Section 3 describes the multi-scale framework used in the experiments and a convenient technique for extracting multi-scale features, together with the calculation of a sparse velocity vector field, its extension to a dense flow field, and the angular error measure. Finally, in Sections 4 and 5 we describe the experiments and results, and discuss future directions.
2 Materials

2.1 Image Structure
In 1988, Zerhouni et al. [19] introduced a tagging method for the noninvasive assessment of myocardial motion. The method introduces a structure of dark stripes into the image (Figure 1, top) to improve the visualization of intramyocardial motion. The approach was later improved by Axel and Dougherty [20] and by Fischer et al. [21], who introduced spatial modulation of magnetization (SPAMM) and complementary SPAMM (CSPAMM), respectively. The images, however, suffer from tag fading, which makes the frames unsuitable for optic flow methods based on conservation of brightness. In the harmonic phase (HARP) method [22, 23], MR images are filtered in the spectral domain; this technique overcomes the fading problem by using the spatial phase information of the inverse transform of the filtered images. In our experiments a similar technique was employed using Gabor filters [24]. Three tagged image series with mutually perpendicular tag lines were acquired
(Figure 1, top), and all but the first harmonic peak were suppressed using a band-pass filter in the Fourier domain (Figure 1, row 3). After the inverse Fourier transform, the phase of the filtered images varies periodically from 0 to 2π, creating a sawtooth pattern (Figure 1, row 4, columns 1 to 3). A sine function was applied to the phase images to avoid the spatial discontinuities caused by the sawtooth pattern. The sine-phase frames were then combined to produce a grid, from which the feature points (maxima and minima) were retrieved (Figure 1, bottom).
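The phase-extraction step above can be sketched with NumPy as follows. This is a hedged illustration, not the authors' pipeline: it replaces the Gabor filters [24] with a hard disc-shaped band-pass mask around one harmonic peak, and `sine_phase`, `center` and `radius` are hypothetical names. The usage example shows why the sine-phase image is insensitive to tag fading: scaling the tag pattern, as a crude stand-in for T1 fading, leaves the output unchanged.

```python
import numpy as np

def sine_phase(image, center, radius):
    """Band-pass one harmonic peak, invert, and return (sin(phase), phase).

    center: peak position in integer frequency units, one entry per image axis,
    with zero frequency at 0; radius: radius of the disc-shaped pass band.
    """
    spectrum = np.fft.fftshift(np.fft.fftn(image))
    # integer frequency coordinates, zero at the centre after fftshift
    grids = np.meshgrid(*[np.arange(n) - n // 2 for n in image.shape], indexing="ij")
    dist2 = sum((g - c) ** 2 for g, c in zip(grids, center))
    mask = dist2 <= radius ** 2
    filtered = np.fft.ifftn(np.fft.ifftshift(spectrum * mask))
    # np.angle wraps to (-pi, pi]; the sawtooth is shifted w.r.t. [0, 2pi) but its sine is the same
    phase = np.angle(filtered)
    return np.sin(phase), phase

# Usage: horizontal tags along axis 0, 8 periods per 64 pixels
rows = np.arange(64)[:, None] * np.ones((1, 64))
tags = 1.0 + np.cos(2 * np.pi * 8 * rows / 64)
s_full, _ = sine_phase(tags, center=(8, 0), radius=2)
s_faded, _ = sine_phase(0.3 * tags, center=(8, 0), radius=2)  # "faded" tags
```

Both calls return the same sine-phase image, sin(2π·8·row/64), because only the phase of the retained harmonic survives the filtering.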
Fig. 1. Top: cross sections of the cardiac MR tagged image volumes of a patient. From left to right: short axis view (frames present horizontal tags), 2 long axis views (frames present vertical and horizontal tags). Second row: Fourier spectrum of the MR tagged images. Middle: Fourier spectrum with the band-pass filter. Fourth row: phase plots; the phase varies periodically from 0 to 2π, creating a sawtooth pattern. Bottom: sine-phase images and the volume grid obtained by combining three sine-phase volumes.
2.2 Dataset
The experiments were carried out on a 3-dimensional tagged MR image volume sequence of a patient's heart. The data were acquired using a 3D CSPAMM sequence [25] developed at ETH Zurich, Switzerland, and consisted of 23 frames
with a temporal resolution of 30 ms. Each frame contained 14 image slices for each of three different views (one short axis and two long axis views). The views were mutually perpendicular (Figure 2), and combining them yields a grid from which the critical points were retrieved. The images have an in-plane resolution of 112 × 112 pixels; to obtain an image volume of 112 × 112 × 112 voxels, linear interpolation across the 14 slices was applied.
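The linear slice interpolation can be sketched as below. The paper does not give the slice positions, so this assumes the slices are evenly spaced, with the first and last slices mapped to the two ends of the output axis; `interpolate_slices` is a hypothetical name.

```python
import numpy as np

def interpolate_slices(stack, n_out):
    """Linearly resample a slice stack (axis 0) from n_in to n_out evenly spaced slices.

    Assumes the first and last input slices sit at the two ends of the output volume.
    """
    n_in = stack.shape[0]
    src = np.linspace(0.0, 1.0, n_in)    # normalised input slice positions
    dst = np.linspace(0.0, 1.0, n_out)   # normalised output slice positions
    # index of the input slice just below each output position
    lo = np.clip(np.searchsorted(src, dst, side="right") - 1, 0, n_in - 2)
    t = (dst - src[lo]) / (src[lo + 1] - src[lo])
    t = t.reshape((n_out,) + (1,) * (stack.ndim - 1))  # broadcast over in-plane axes
    return (1.0 - t) * stack[lo] + t * stack[lo + 1]

# Usage: a stack whose k-th slice has constant value k
stack = np.arange(14.0)[:, None, None] * np.ones((14, 4, 4))
vol = interpolate_slices(stack, 112)
```

For the dataset described above, `stack` would have shape (14, 112, 112) and the call `interpolate_slices(stack, 112)` would produce a 112 × 112 × 112 volume.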
Fig. 2. Ninth cardiac MR tagged frame. From left to right: short axis view (frames present horizontal tags), 2 long axis views (frames present vertical and horizontal tags) and a combination of the image planes.
3 Method

3.1 Scale Space
Scale is one of the most important concepts in human vision: when we look at a scene, we instantaneously view its contents at multiple scale levels. The Gaussian scale-space representation $L: \mathbb{R}^3 \times \mathbb{R}^+ \to \mathbb{R}$ of a raw 3-dimensional image $f: \mathbb{R}^3 \to \mathbb{R}$ is defined by the convolution of f with a Gaussian kernel $\phi: \mathbb{R}^3 \times \mathbb{R}^+ \to \mathbb{R}$:

$$L(x, y, z, s) = (f * \phi)(x, y, z, s), \qquad \phi(x, y, z, s) = \frac{1}{(\sqrt{2\pi s})^3} \exp\left(-\frac{x^2 + y^2 + z^2}{2s}\right) \qquad (3)$$

In equation (3), x, y and z are the spatial coordinates, whereas $s \in \mathbb{R}^+$ denotes the variance of the Gaussian kernel (scale). Equation (3) yields a blurred version of the image, where the strength of blurring depends on the choice of scale. For an extensive review of scale space, see [26, 27, 28, 29].

3.2 Critical Point Detection
Singularities (critical points) induced by the MR tagging pattern are interesting candidates for structural descriptions. Critical points can be computed efficiently in scale space by detecting the locations where the gradient of the blurred image vanishes. The detected points can then be classified by the signs of the eigenvalues of the Hessian matrix: locations where all eigenvalues are positive are local minima, locations where all eigenvalues are negative are local maxima, and eigenvalues of mixed signs indicate saddle points.
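Sections 3.1 and 3.2 can be sketched together in Python/NumPy. Assumptions not taken from the paper: the kernel of equation (3) (variance s) is sampled, truncated at four standard deviations and renormalized; borders are zero-padded; derivatives are central finite differences with unit voxel spacing; and a critical point is accepted only where the discrete gradient norm falls below a tolerance on the voxel grid (a practical implementation would localize critical points with sub-voxel accuracy). All function names are hypothetical.

```python
import numpy as np

def gaussian_kernel_1d(s, truncate=4.0):
    """Sampled 1D Gaussian of variance s (eq. (3) factorises per axis), unit sum."""
    radius = int(np.ceil(truncate * np.sqrt(s)))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2.0 * s))
    return k / k.sum()

def scale_space(f, s):
    """L(., s) = f * phi(., s) by separable convolution (zero padding at the borders).

    Each axis of f must be longer than the kernel for mode="same" to keep the shape.
    """
    k = gaussian_kernel_1d(s)
    L = np.asarray(f, dtype=float)
    for axis in range(L.ndim):
        L = np.apply_along_axis(np.convolve, axis, L, k, mode="same")
    return L

def classify(H, tol=1e-12):
    """Classify a critical point by the signs of the Hessian eigenvalues."""
    ev = np.linalg.eigvalsh(H)
    if np.any(np.abs(ev) <= tol):
        return "degenerate"
    if np.all(ev > 0):
        return "minimum"
    if np.all(ev < 0):
        return "maximum"
    return "saddle"

def critical_points(L, grad_tol):
    """Voxels where the finite-difference gradient (nearly) vanishes, with their type."""
    grads = np.gradient(L)
    gnorm = np.sqrt(sum(g**2 for g in grads))
    hess = [[np.gradient(gi, axis=j) for j in range(L.ndim)] for gi in grads]
    found = []
    for p in np.argwhere(gnorm < grad_tol):
        p = tuple(int(i) for i in p)
        H = np.array([[hess[i][j][p] for j in range(L.ndim)] for i in range(L.ndim)])
        found.append((p, classify(0.5 * (H + H.T))))
    return found

# Usage: blurring a point source gives a single scale-space maximum at its location
f = np.zeros((21, 21, 21))
f[10, 10, 10] = 1.0
L = scale_space(f, s=2.0)
pts = critical_points(L, grad_tol=1e-9)
```

Flat zero regions far from the blob also have a vanishing gradient; they are reported as "degenerate" and would be discarded in practice.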
3.3 Sparse Velocities of Feature Points and Dense Flow Field
In our experiments, given a sequence of frames, we assume that the singularity (feature) points move along with the tissue (this holds by construction of the tagging pattern, provided the feature points correctly correspond to the tag crossings). In general, given a sequence of frames L(x(t), y(t), z(t), t), where t denotes time, the critical points are defined implicitly by a vanishing spatial gradient:

$$\nabla L(x(t), y(t), z(t), t) = 0 \qquad (4)$$
To track the feature points, we differentiate equation (4) with respect to time and apply the chain rule for implicit functions, yielding

$$\frac{d}{dt}\left[\nabla L(x(t), y(t), z(t), t)\right] = \begin{pmatrix} L_{xx} u + L_{xy} v + L_{xz} w + L_{xt} \\ L_{yx} u + L_{yy} v + L_{yz} w + L_{yt} \\ L_{zx} u + L_{zy} v + L_{zz} w + L_{zt} \end{pmatrix} = 0 \qquad (5)$$

where $d/dt$ is the total time derivative and where we have dropped the space-time arguments on the right-hand side for simplicity. Equation (5) holds only at the locations of critical points and can also be written as

$$\begin{pmatrix} u \\ v \\ w \end{pmatrix} = -H^{-1} \frac{\partial \nabla L}{\partial t} \qquad (6)$$
where H denotes the spatial Hessian matrix of L(x(t), y(t), z(t), t), assumed nonsingular (i.e., the critical point is non-degenerate). The velocities computed by equation (6) represent the flow field at a sparse set of positions. To retrieve a dense velocity field, the sparse velocities were interpolated using homogeneous diffusion interpolation. Given a spatial domain $\Omega \subset \mathbb{R}^3$, the scalar functions u(x, y, z), v(x, y, z) and w(x, y, z) are the components of a velocity vector field $V: \Omega \to \mathbb{R}^3$. The velocity vectors are known only at certain positions; we denote them by $\tilde V = (\tilde u, \tilde v, \tilde w)$, with $\tilde V: \Omega_s \to \mathbb{R}^3$, where $\Omega_s$ is a finite subset of Ω. We wish to retrieve a dense vector field V on all of Ω. To do so, we minimize the energy functional

$$E(u, v, w) = \int_\Omega \left( \|\nabla u\|^2 + \|\nabla v\|^2 + \|\nabla w\|^2 \right) dx\,dy\,dz \qquad (7)$$

under the constraint $V = \tilde V$ on $\Omega_s$. The minimization of equation (7) is carried out via the Euler–Lagrange equations, which here reduce to the Laplace equations $\Delta u = \Delta v = \Delta w = 0$ on $\Omega \setminus \Omega_s$; these can be solved with standard numerical schemes.

3.4 Angular Error
The flow vector at certain positions in the image can deviate from the true flow vector at that position in direction and in length. In our assessment we
are interested in the movement from one frame to the next. Therefore, we set the time component of the flow vector to 1, yielding a 4-dimensional vector V = (u, v, w, 1). The computed vector field was compared with the ground truth of the two artificial sequences described in Section 4, using the average angular error (AAE) introduced by Barron et al. [30], where the angular error at a position is

$$\text{Angular Error} = \arccos\left( \frac{V_t \cdot V_e}{\sqrt{u_t^2 + v_t^2 + w_t^2 + 1}\,\sqrt{u_e^2 + v_e^2 + w_e^2 + 1}} \right) \qquad (8)$$
where $V_t$ is the true vector with spatial components $u_t, v_t, w_t$ and time component 1, and $V_e$ is the estimated velocity vector with spatial components $u_e, v_e, w_e$ and time component 1. The AAE is the mean of (8) over the evaluated positions.
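The three computational steps of Sections 3.3 and 3.4 (sparse velocity by equation (6), dense interpolation by equation (7), and the angular error of equation (8)) can be sketched in Python/NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: derivatives are plain finite differences with unit voxel spacing, the tracked feature is assumed to sit exactly on a voxel, the Laplace equations arising from (7) are solved with a simple Jacobi iteration with zero-flux borders, and the names `feature_velocity`, `diffusion_interpolate`, `angular_error_deg` and `average_angular_error` are hypothetical.

```python
import numpy as np

def feature_velocity(frames, t, point, dt=1.0, h=1.0):
    """Eq. (6): (u, v, w) = -H^{-1} d(grad L)/dt at a critical point of frame t.

    frames has shape (n_t, nx, ny, nz); point is the voxel index of the feature
    (assumed to lie exactly on a voxel in this sketch).
    """
    def grad_at(vol):
        return np.array([g[point] for g in np.gradient(vol, h)])

    # time derivative of the spatial gradient (central difference in t)
    dgrad_dt = (grad_at(frames[t + 1]) - grad_at(frames[t - 1])) / (2.0 * dt)
    # spatial Hessian of frame t at the feature point
    grads = np.gradient(frames[t], h)
    H = np.array([[np.gradient(gi, h, axis=j)[point] for j in range(3)] for gi in grads])
    return -np.linalg.solve(H, dgrad_dt)

def diffusion_interpolate(values, known, n_iter=5000, tol=1e-10):
    """Homogeneous diffusion interpolation of one velocity component (eq. (7)).

    Jacobi iteration for the Laplace equation on the unknown voxels; voxels where
    `known` is True keep their prescribed values; edge padding gives zero flux.
    """
    u = np.where(known, values, values[known].mean())
    for _ in range(n_iter):
        p = np.pad(u, 1, mode="edge")
        avg = (p[2:, 1:-1, 1:-1] + p[:-2, 1:-1, 1:-1] +
               p[1:-1, 2:, 1:-1] + p[1:-1, :-2, 1:-1] +
               p[1:-1, 1:-1, 2:] + p[1:-1, 1:-1, :-2]) / 6.0
        new = np.where(known, values, avg)
        if np.max(np.abs(new - u)) < tol:
            return new
        u = new
    return u

def angular_error_deg(v_true, v_est):
    """Eq. (8): angle between (u_t, v_t, w_t, 1) and (u_e, v_e, w_e, 1), in degrees."""
    a = np.append(np.asarray(v_true, dtype=float), 1.0)
    b = np.append(np.asarray(v_est, dtype=float), 1.0)
    c = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))

def average_angular_error(true_vectors, est_vectors):
    """AAE: mean of eq. (8) over the evaluated positions."""
    return float(np.mean([angular_error_deg(a, b) for a, b in zip(true_vectors, est_vectors)]))

# Usage: a paraboloid whose maximum translates with velocity vel
ax = np.arange(11.0)
X, Y, Z = np.meshgrid(ax, ax, ax, indexing="ij")
vel = np.array([0.5, -0.25, 1.0])
frames = np.stack([-((X - 5 - vel[0] * t) ** 2 + (Y - 5 - vel[1] * t) ** 2
                     + (Z - 5 - vel[2] * t) ** 2) for t in (-1, 0, 1)])
estimated = feature_velocity(frames, 1, (5, 5, 5))
```

In a full pipeline each of the three velocity components would be passed to `diffusion_interpolate` separately, with `known` marking the voxels of the tracked features.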
4 Results
The proposed optic flow method was applied to a real sequence of 23 MR image volumes (Figure 1) representing the beating heart of a patient. The images had a resolution of 112 × 112 × 112 voxels and contained tags 8 voxels wide. The spatial scale is defined as $\sigma = \sqrt{2s}$, and the experiments were performed from spatial scale σ = 1 to σ = 3 at time scale 1. To assess the quality of the extracted vector field, an artificial translating sequence of 19 frames was built from the first frame of the sine-phase grid image (Figure 1, row 5, column 4). The algorithm was also tested on a more realistic sine-phase grid phantom with the same number of frames and with non-rigid motion (contraction and expansion). The computed vector fields of the translating sequence and of the expanding and contracting phantom are depicted in Figure 3. The flow field was computed from frame 8 to frame 11 to avoid outliers due to temporal boundary conditions. Table 1 reports the performance of the proposed method using multi-scale maxima and minima. The error was measured only at the locations of the retrieved features, in order to assess the reliability of the corresponding velocities. In both sequences, the evaluation revealed high accuracy of the extracted
Fig. 3. Vector fields in the artificial sequences. Vector field of the translating sequence (left) and two frames of the vector field of the contracting and expanding phantom (middle and right).
Table 1. Performance of the proposed optic flow method with different multi-scale feature points. The average angular error (AAE) and its standard deviation (Std) are used as error measures, expressed in degrees. Scales used in the experiment: spatial scale σ = {1, 1.3, 1.7, 2.3, 3}, time scale 1.

| Feature | Translating sequence: maxima | Translating sequence: minima | Nonrigid motion: maxima | Nonrigid motion: minima |
|---|---|---|---|---|
| AAE (°) | 5.4 × 10⁻⁵ | 2.4 × 10⁻⁵ | 1.0 | 0.2 |
| Std (°) | 2.1 × 10⁻⁵ | 1.3 × 10⁻⁵ | 1.4 | 0.1 |
vector fields for both maxima and minima, suggesting that the velocities retrieved from both features could be combined in the interpolation process. The error measure is expressed in degrees. The accuracy of the dense vector field depends on the reconstruction method used; as a preliminary study, homogeneous diffusion interpolation was applied in this optic flow algorithm. Figure 4 plots the average angular error for both phantoms as a function of the scale σ. The plots show that the smallest average angular error was obtained at different scales for different features, highlighting the importance of a multi-scale approach. In particular, for the translating sequence, maxima and minima (Figure 4, row 1) performed best at σ = 1 and σ = 1.3, respectively; for the contracting and expanding phantom, maxima and minima performed best at σ = 2.3 and σ = 1.7, respectively (Figure 4, row 2). Figure 5 displays the 3-dimensional sparse vector fields on 2-dimensional cross sections of the tenth frame of the real cardiac image volume, with the heart in the contraction phase. In the short axis view (row 1, column 1), the yellow velocity vectors point not only toward the center
Fig. 4. Average angular error as a function of scale σ. Row 1: average angular error of the vector field extracted from the translating sequence (column 1: maxima; column 2: minima). Row 2: average angular error of the motion field computed from the contracting and expanding phantom (column 1: maxima; column 2: minima).
Fig. 5. Three-dimensional velocity flow field on two-dimensional cross sections of the cardiac image volume: short axis (row 1) and two long axis views (row 2). The 3-dimensional vectors accurately describe the cardiac motion and overcome problems typical of 2-dimensional optic flow methods, such as the inability to detect through-plane motion.
of the ventricle but also downward. To the right, the same image is displayed from another perspective, showing that the method is able to find through-plane components of the velocity vectors. This is also confirmed by the velocity vectors in the long axis views in row 2, which point downward as well.
5 Discussion
In this paper we investigate a new method to track cardiac motion in 3-dimensional volume images by following the movement of multi-scale singularity points. The computed 3-dimensional vector field exhibits expansion, contraction and twisting of the cardiac tissue (Figure 5), making the results more realistic than velocity fields obtained with a 2-dimensional approach. In the latter case, the results would capture only contraction, expansion and rotation of the cardiac muscle, and through-plane motion would not be detected. The proposed method is not based on conservation of brightness, but on the assumption
that an extremum located at a crossing between the tags remains an extremum after a displacement, even under T1 relaxation (which shows up as tag fading). Therefore, equation (4) holds in this case. Moreover, to improve the localization of critical points, the images were filtered in the Fourier domain, a phase image sequence was reconstructed, and a sine-phase grid sequence was generated. Thanks to the Fourier-domain filtering, the new images adhere to the brightness conservation principle, the fading problem is avoided completely, and equation (4) holds for these filtered images as well. The method was assessed using two phantoms with known ground truth: a translating sequence and an expanding and contracting phantom. In both cases, qualitative and quantitative analysis of the results supports the reliability of the vector field. The experiments were carried out using only multi-scale maxima and multi-scale minima; in future tests the algorithm will also be assessed with other multi-scale feature points and combinations thereof. In the tests, the velocity field was extracted at fixed scales, the most suitable scale being chosen according to the performance of the method with respect to the ground truth. In real data, the continuous deformation of the cardiac walls makes the structure change scale over time, so the results obtained in the assessment may not be optimal. It may therefore be interesting to repeat the experiments using a more sophisticated scale selection method. Furthermore, since the cardiac muscle is characterized by twisting and contraction, an interpolation term that accounts for the rotation and expansion of the vector field may improve the results. Finally, the retrieved motion field may also be used to validate mathematical models describing heart deformation. Ubbink et al.
[31], for instance, compared three simulations of the cardiac muscle, illustrating how the orientation of modeled myofibers plays an important role in the computation of the final strain. A validation of these methods might be carried out by comparing the simulated strain with a ground truth strain calculated from the extracted optic flow field using real data.
Acknowledgements We would like to thank Dr Markus van Almsick for his help in Mathematica implementation. This work is supported by the ENN 06760 project grant through the Stichting voor de Technische Wetenschappen (STW).
References

1. Rosamond, W., Flegal, K., Furie, K., Go, A., Greenlund, K., Haase, N., Hailpern, S.M., Ho, M., Howard, V., Kissela, B., Kittner, S., Lloyd-Jones, D., McDermott, M., Meigs, J., Moy, C., Nichol, G., O'Donnell, C., Roger, V., Sorlie, P., Steinberger, J., Thom, T., Wilson, M., Hong, Y.: American heart association statistics committee and stroke statistics subcommittee: heart disease and stroke statistics 2008 update. A report from the american heart association statistics committee and stroke statistics subcommittee. Circulation 117, 2–122 (2008)
2. Rainwater, D.L., McMahan, C.A., Malcom, G.T., Scheer, W.D., Roheim, P.S., McGill, H.C., Strong, J.: Lipid and apolipoprotein predictors of atherosclerosis in youth. Arteriosclerosis, Thrombosis, and Vascular Biology 19, 753–761 (1999)
3. McGill, H.C., McMahan, C.A., Zieske, A.W., Sloop, G.D., Walcott, J.V., Troxclair, D., Malcom, G.T., Tracy, R.E., Oalmann, M.C., Strong, J.P.: Associations of coronary heart disease risk factors with the intermediate lesion of atherosclerosis in youth. Arteriosclerosis, Thrombosis, and Vascular Biology 20 (2000)
4. Horn, B.K.P., Schunck, B.G.: Determining optical flow. Artificial Intelligence 17, 185–203 (1981)
5. Lucas, B., Kanade, T.: An iterative image registration technique with application to stereo vision. In: DARPA, Image Process., vol. 21, pp. 85–117 (1981)
6. Brox, T., Bruhn, A., Papenberg, N., Weickert, J.: High accuracy optical flow estimation based on a theory for warping. In: Pajdla, T., Matas, J.G. (eds.) ECCV 2004. LNCS, vol. 3024, pp. 25–36. Springer, Heidelberg (2004)
7. Bruhn, A., Weickert, J., Kohlberger, T., Schnoerr, C.: A multigrid platform for real-time motion computation with discontinuity-preserving variational methods. International Journal of Computer Vision 70(3), 257–277 (2006)
8. Thirion, J.P.: Image matching as a diffusion process: an analogy with Maxwell's demons. Medical Image Analysis 2(3), 243–260 (1998)
9. Dougherty, L., Asmuth, J., Blom, A., Axel, L., Kumar, R.: Validation of an optical flow method for tag displacement estimation. IEEE Transactions on Medical Imaging 18(4), 359–363 (1999)
10. Florack, L., Niessen, W., Nielsen, M.: The intrinsic structure of optic flow incorporating measurements of duality. International Journal of Computer Vision 27(3), 263–286 (1998)
11. Niessen, W., Duncan, J., ter Haar Romeny, B., Viergever, M.: Spatiotemporal analysis of left ventricular motion. In: Medical Imaging 1995, San Diego, pp. 192–203. SPIE (1995)
12. Niessen, W., Duncan, J., Nielsen, M.L.F., ter Haar Romeny, B., Viergever, M.: A multiscale approach to image sequence analysis. Computer Vision and Image Understanding 65(2), 259–268 (1997)
13. Suinesiaputra, A., Florack, L., Westenberg, J., ter Haar Romeny, B., Reiber, J., Lelieveldt, B.: Optic flow computation from cardiac MR tagging using a multiscale differential method: a comparative study with velocity encoded MRI. In: Ellis, R.E., Peters, T.M. (eds.) MICCAI 2003. LNCS, vol. 2878, pp. 483–490. Springer, Heidelberg (2003)
14. van Assen, H.C., Florack, L., Suinesiaputra, A., ter Haar Romeny, B.M., Westenberg, J.J.M.: Purely evidence based multi-scale cardiac tracking using optic flow. In: MICCAI 2007 workshop on Computational Biomechanics for Medicine II, pp. 84–93 (2007)
15. Florack, L., van Assen, H.C.: Dense multiscale motion extraction from cardiac cine MR tagging using HARP technology. In: Mathematical Methods in Biomedical Image Analysis. Workshop of the ICCV (2007)
16. Barron, J.: Experience with 3D optical flow on gated MRI cardiac datasets. In: Proceedings of the 1st Canadian Conference on Computer and Robot Vision, pp. 370–377. IEEE Computer Society, Los Alamitos (2004)
17. Pan, L., Prince, J., Lima, J., Arts, N.: Fast tracking of cardiac motion using 3D-HARP. IEEE Transactions on Biomedical Engineering 52(8), 1425–1435 (2005)
18. Sampath, S., Prince, J.: Automatic 3D tracking of cardiac material markers using slice-following and harmonic-phase MRI. Magnetic Resonance Imaging 25, 197–208 (2007)
19. Zerhouni, E.A., Parish, D.M., Rogers, W.J., Yang, A., Sapiro, E.P.: Human heart: tagging with MR imaging, a method for noninvasive assessment of myocardial motion. Radiology 169(1), 59–63 (1988)
20. Axel, L., Dougherty, L.: MR imaging of motion with spatial modulation of magnetization. Radiology 171(3), 841–845 (1989)
21. Fischer, S.E., McKinnon, G., Maier, S., Boesiger, P.: Improved myocardial tagging contrast. Magnetic Resonance in Medicine 30(2), 191–200 (1993)
22. Osman, N.F., McVeigh, W.S., Prince, J.L.: Cardiac motion tracking using cine harmonic phase (HARP) magnetic resonance imaging. Magnetic Resonance in Medicine 42(6), 1048–1060 (1999)
23. Sampath, S., Derbyshire, J., Atalar, E., Osman, N., Prince, J.: Real-time imaging of two-dimensional cardiac strain using a harmonic phase magnetic resonance imaging (HARP MRI) pulse sequence. Magnetic Resonance in Medicine 50(1), 154–163 (2003)
24. Gabor, D.: Theory of communication. J. IEE 93(26), 429–457 (1946)
25. Rutz, A., Ryf, S., Plein, S., Boesiger, P., Kozerke, S.: Accelerated whole-heart 3D CSPAMM for myocardial motion quantification. Magnetic Resonance in Medicine 59(4), 755–763 (2008)
26. Koenderink, J.J.: The structure of images. Biological Cybernetics 50, 363–370 (1984)
27. ter Haar Romeny, B.M.: Front-End Vision and Multi-Scale Image Analysis: Multiscale Computer Vision Theory and Applications, written in Mathematica. Computational Imaging and Vision. Kluwer Academic Publishers, Dordrecht (2003)
28. Florack, L.: Image Structure. Computational Imaging and Vision. Kluwer Academic Publishers, Dordrecht (1997)
29. Lindeberg, T.: Scale-Space Theory in Computer Vision, 1st edn. The Springer Intern. Series in Engineering and Computer Science. Kluwer Academic Publishers, Dordrecht (1994)
30. Barron, J.L., Fleet, D.J., Beauchemin, S.: Performance of optical flow techniques. International Journal of Computer Vision 12(1), 43–77 (1994)
31. Ubbink, S., Bovendeerd, P., Delhaas, T., Arts, T., van de Vosse, F.: Towards model-based analysis of cardiac MR tagging data: relation between left ventricular shear strain and myofiber orientation. Medical Image Analysis 10, 632–641 (2006)
A Combined Segmentation and Registration Framework with a Nonlinear Elasticity Smoother

Carole Le Guyader¹ and Luminita A. Vese²

¹ IRMAR, UMR CNRS 6625, Institut National des Sciences Appliquées de Rennes, 20, Avenue des Buttes de Coësmes, CS 14315, 35043 Rennes Cedex, France
² Department of Mathematics, University of California, Los Angeles, Los Angeles, CA 90095-1555, USA
Abstract. In this paper, we present a new non-parametric combined segmentation and registration method. The problem is cast as an optimization problem combining a matching criterion based on the active contour without edges [4] for segmentation, and a nonlinear-elasticity-based smoother on the displacement vector field. This modeling has two aspects. First, registration is performed jointly with segmentation, since it is guided by the segmentation process: the algorithm produces both a smooth mapping between the two shapes and the segmentation of the object contained in the reference image. Second, the use of a nonlinear-elasticity-type regularizer allows large deformations to occur, which makes the model comparable in this respect to the viscous fluid registration method [7]. Several applications demonstrate the potential of this method both for the segmentation of a single image and for registration between two images.
1 Introduction
Image registration and image segmentation are challenging problems encountered in a wide range of fields, such as medical imaging (shape tracking, comparison of images taken at different instants, fusion of data from images not necessarily acquired with the same modality, comparison of data to a common reference frame), pattern recognition and geophysics. In this paper we propose a segmentation model based on the active contour model without edges [4] that is no longer solved in terms of level set functions, but with registration techniques: a displacement field models the deformation of the initial curve into the final segmented boundary. Thus, the binary segmentation functional [4]

$$F(c_1, c_2, \phi) = \int_\Omega \left[ \nu_1 |R - c_1|^2 H(\phi) + \nu_2 |R - c_2|^2 (1 - H(\phi)) + \mu |\nabla H(\phi)| \right] dx$$

(R is the given image, φ is a level set function describing the unknown contour, and H is the Heaviside function) can be reformulated as a warping problem

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 600–611, 2009. © Springer-Verlag Berlin Heidelberg 2009
A Combined Segmentation and Registration Framework
601
between the binary image defining the initial contour, and the (unknown) binary segmented image. Or the proposed model can also be used for registration between two images: having a segmentation of one of the images defined via a displacement field, this is used as initial guess in the “registration-segmentation” model, to segment/register the second image. The main ingredients of our proposed minimization model are thus the active contour model without edges [4], and registration via a non-linear elasticity smoother, solved in a particular simplified way. The unknown level set function φ is substituted by the unknown transformation, with an appropriate regularization as a substitute for the length term. Topology-preserving segmentation results can be obtained. An extensive overview of registration techniques can be found in [24]. These can be partitioned into two classes: parametric and non-parametric. In the nonparametric methods (our framework) the problem is phrased as a functional minimization whose unknown is the displacement vector field u. Denoting by T the template, by R the reference, the introduced functional combines a distance measure component D[R, T, u] and a smoother on the displacement vector field S = S[u] to remove the ill-posed character of the problem. Usually, the distance measure is intensity-driven and is chosen to be the L2 −norm of the difference between the deformed template and the reference (suitable when the images have been acquired through similar sensors), i.e. D[R, T, u] =
1 2
2
(T (x + u) − R(x)) dx, Ω
but one could also use correlation-based or mutual-information-based techniques [24]. Several methods to regularize the displacement vector field have been investigated. One is the elastic registration introduced by Broit [3], in which the objects to be registered are considered to be observations of the same elastic body before and after being subjected to a deformation. The smoother S = S[u] is chosen to be the linearized elastic potential of the displacement vector field u, and its expression involves the Lamé coefficients λ, μ, which reflect material properties. A drawback of this smoother is that it is not suitable for problems involving large deformations. To circumvent this problem, Christensen et al. [7] proposed a viscous fluid registration model in which objects are viewed as fluids evolving according to the Navier-Stokes equations of fluid dynamics. However, this is a computationally expensive procedure. In the diffusion registration model introduced by Fischer and Modersitzki [11], the smoother is based on the $H^1(\Omega, \mathbb{R}^n)$ semi-norm of $u = (u_1, \dots, u_n)^T$, Ω being an open bounded subset of $\mathbb{R}^n$. This choice is motivated by regularizing properties (it minimizes oscillations of all components of u) rather than by physical ones, but here again only small deformations can be expected. In the curvature-based registration model introduced by Fischer and Modersitzki [12], [13], the $\dot H^2$ (biharmonic) regularization is explored. Affine linear transformations belong to the kernel of the regularizer S[u], which is not the case in elastic, viscous fluid, or diffusion registration. But here again, transformations are restricted to small deformations. To circumvent this drawback, we propose in this paper a nonlinear elasticity-based smoother that allows larger deformations.
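To make the L² distance measure concrete, here is a hedged 1-D sketch. It assumes pixel-centred samples with unit spacing, linear interpolation of the template at the displaced position x + u(x), and clamping to the boundary value outside the image. The helper names `linear_interp` and `ssd_distance` are hypothetical and do not come from the paper.

```python
def linear_interp(signal, pos):
    """Sample a 1-D signal at a real position by linear interpolation.
    Positions outside [0, len - 1] are clamped to the boundary value (an assumption)."""
    n = len(signal)
    if pos <= 0:
        return signal[0]
    if pos >= n - 1:
        return signal[-1]
    i = int(pos)
    w = pos - i
    return (1 - w) * signal[i] + w * signal[i + 1]


def ssd_distance(template, reference, u, h=1.0):
    """D[R, T, u] = 1/2 * integral of (T(x + u(x)) - R(x))^2, midpoint rule with spacing h."""
    total = 0.0
    for x, (r, ux) in enumerate(zip(reference, u)):
        diff = linear_interp(template, x + ux) - r
        total += diff * diff
    return 0.5 * h * total


T = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]   # template: a bump at x = 2, 3
R = [0.0, 1.0, 1.0, 0.0, 0.0, 0.0]   # reference: the same bump one pixel to the left
print(ssd_distance(T, R, [0.0] * 6))  # 1.0: two mismatched pixels
print(ssd_distance(T, R, [1.0] * 6))  # 0.0: u = 1 aligns T(x + u) with R
```

The measure is purely pointwise: each pixel compares T(x + u(x)) with R(x) independently, and the smoother S[u] is what couples neighbouring displacements.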
Many improvements of, or alternatives to, these non-parametric methods have been proposed, including [14], [15], [37], [21], [20], [19]. Compared with some of these methods, the only input required by our method is a fixed level set function representing the template image, that is, partitioning the image into two regions. Also, we treat segmentation and registration jointly: the distance measure is devised using the segmentation criterion [4], while registration is performed simultaneously, guided by the segmentation process. Because the binary criterion is used, our method applies to a particular class of images. Before describing our approach, we mention previous work on joint segmentation and registration and stress the main differences with our model. In [38], Yezzi et al. also propose to treat segmentation and registration jointly. The authors couple them as follows: denoting by $R: \Omega \subset \mathbb{R}^2 \to \mathbb{R}$ and $T = \hat R: \Omega \subset \mathbb{R}^2 \to \mathbb{R}$ the two images containing a common object to be registered and segmented, find a closed curve $C \subset \Omega$ and a closed curve $\hat C \subset \Omega$ related by $\hat C = g(C)$, where $g: \mathbb{R}^2 \to \mathbb{R}^2$ is an element of a finite-dimensional group G (for instance, the group of rigid motions), such that C and $\hat C$ correctly delineate the object contained in R and the one contained in $\hat R$, respectively. Consequently, there are two unknowns: the closed curve $C \subset \Omega$ and the mapping g. The authors exploit region-based active contour models [4] and minimize the energy
$$E(g, C) = E_1(C) + E_2(g(C)) = \int_{C_{in}} |R - c_1|^2\, dx + \int_{C_{out}} |R - c_2|^2\, dx + \int_{\hat C_{in}} |\hat R - \hat c_1|^2\, dx + \int_{\hat C_{out}} |\hat R - \hat c_2|^2\, dx,$$
with $C_{in}$ and $C_{out}$ the regions inside and outside C, $c_1$ and $c_2$ the mean values of R on $C_{in}$ and $C_{out}$, and with $\hat C_{in}$ and $\hat C_{out}$ the regions inside and outside $\hat C$, $\hat c_1$ and $\hat c_2$ the mean values of $\hat R$ on $\hat C_{in}$ and $\hat C_{out}$. The main differences with our model are the following. In [38], the contours C and $\hat C$ are jointly deformed through a combination of segmentation and registration methods, while in our model we assume that the object in the template image has already been detected (we could have considered a problem with two unknowns as well), so the energy-minimization problem is written only in terms of the unknown contour C. Segmentation is performed using a registration approach, as in [38]. The model is cast in the level set setting, which allows a straightforward modeling of the evolving curve. Contrary to [38], the class of admissible deformations (rigid, etc.) is not an input of our model. Their model, first presented for rigid deformations, was later extended to non-rigid motions [35], [34], [29]. We also mention the interesting work by Lord et al. [22], which uses a matching criterion based on metric structure comparison. The authors propose a unified method that simultaneously treats segmentation and registration by introducing two unknowns: the deformation map and the segmenting curve. The segmentation process is guided by the registration map. Unlike classical registration methods, the matching criterion rests on minimizing the deviation from isometry. It is based on a comparison of the metric structure of the surfaces, more precisely of their first fundamental form, and on a homogeneity constraint as in [4]. Thus, contrary to our model, in which the expected curve (implicitly represented as the zero level set of a Lipschitz function) delineates two regions of homogeneous intensity, their criterion relies on metric structure comparisons to separate normal regions from abnormal ones. We also mention the related work by Vemuri et al. [31], [32]. The authors propose a coupled PDE model to perform both segmentation and registration. In the first PDE, the level sets of the source image are evolved along their normals with a speed defined as the difference between the target and the evolving source image. The second PDE allows one to explicitly retrieve the displacement vector field. In particular, in the work of Vemuri and Chen [30] on joint registration and segmentation, the piecewise-smooth level set segmentation model from [33] is combined with prior shape information through global alignment. As will be seen below, our model differs from the one in [30]. We also refer the reader to [5], in which a geodesic-active-contour-based model including a shape prior is presented, and to [6], in which a shape prior is incorporated into the Mumford-Shah model. Related work on atlas-based segmentation of medical images locally constrained by level sets is presented in [10]. Rouchdy et al. [27] propose a segmentation method, different from ours, that also uses nonlinear elasticity to define the deformation of the evolving contour or surface. Their segmentation criterion is based on the gradient vector flow [36], and the deformation field is computed via nonlinear elasticity using the finite element method.
For completeness, we also refer the reader to [2], [23] for variational registration methods for large deformations, to [26] for closely related work that also uses nonlinear elasticity regularization but is implemented with the finite element method, and to [9] for related work that uses nonlinear elasticity principles differently from our approach. More details of the proposed method are given in [18].
2 Description of the Proposed Model
As mentioned in Sect. 1, the goal of the proposed method is twofold:
– devise a model in which segmentation and registration are jointly performed;
– allow large and smooth deformations while keeping the deformation map topology-preserving.
We show below how these criteria are fulfilled.
Distance Measure Criterion. Let Ω be a bounded open subset of $\mathbb{R}^n$. For the purpose of illustration, we consider the case n = 2. Let us denote by $R: \bar\Omega \to \mathbb{R}$ the "reference" image to be segmented (later we discuss how the proposed method can be used for registration between a template image $T: \bar\Omega \to \mathbb{R}$ and the reference image R; initially, our method is defined as a segmentation method based on [4]). Let $\Phi_0$ be a given Lipschitz level set function. Denoting by C the zero level set of $\Phi_0$ and by $w \subset \Omega$ the open set it delineates, $\Phi_0$ is such that
$$C = \{x \in \Omega \mid \Phi_0(x) = 0\}, \quad w = \{x \in \Omega \mid \Phi_0(x) > 0\}, \quad \Omega \setminus \bar w = \{x \in \Omega \mid \Phi_0(x) < 0\}.$$
The evolving curve is deformed so as to satisfy a segmentation criterion: the distance measure we introduce is related to the fitting term of the active contours without edges model [4]. In this way, registration and segmentation are coupled, and we expect to obtain, at the end of the process, both the segmentation of the reference image and a smooth deformation map. This yields a region-based intensity approach rather than the usual pointwise one. The idea is to find a smooth displacement vector field $u = (u_1, u_2): \Omega \to \mathbb{R}^2$, $x \mapsto (u_1(x), u_2(x))$, with $x + u(x) \in \Omega$ for each $x \in \Omega$, such that the zero level line of Φ defined by $\Phi(x) = \Phi_0(x + u(x))$ fits the boundary of the object to be warped in the given "reference" image. Denoting by H the one-dimensional Heaviside function, by $\nu_1, \nu_2 > 0$ two fixed parameters, and by $c_1$ and $c_2$ two unknown constants depending on $\Phi_0$, R and u, the distance measure functional $F_d$ (the segmentation criterion) is defined by
$$F_d(c_1, c_2, u) = \nu_1 \int_\Omega |R(x) - c_1|^2 H\big(\Phi_0(x + u(x))\big)\, dx + \nu_2 \int_\Omega |R(x) - c_2|^2 \Big(1 - H\big(\Phi_0(x + u(x))\big)\Big)\, dx. \qquad (1)$$
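As a hedged illustration of (1), the sketch below evaluates a discrete $F_d$ on a pixel grid. It assumes the C∞ regularized Heaviside that Sect. 3 uses in place of H, a midpoint-rule quadrature, pixel coordinates x = (i, j), and $\Phi_0$ given as a Python callable so that it can be evaluated at the non-integer points x + u(x). The function names and the square-shaped $\Phi_0$ are illustrative, not from the paper.

```python
import math


def heaviside_eps(z, eps=1.0):
    """C-infinity regularized Heaviside H_eps(z) = 1/2 * (1 + (2/pi) * arctan(z / eps))."""
    return 0.5 * (1.0 + (2.0 / math.pi) * math.atan(z / eps))


def fitting_energy(R, phi0, u, c1, c2, nu1=1.0, nu2=1.0, eps=1.0, h=1.0):
    """Discrete F_d(c1, c2, u) from (1), midpoint rule on a pixel grid of spacing h.

    R    : 2-D list of intensities, R[i][j] sampled at x = (i, j) (assumed convention)
    phi0 : callable (x1, x2) -> Phi0 value, evaluated at the displaced point x + u(x)
    u    : 2-D list of displacement pairs (u1, u2), in pixel units
    """
    total = 0.0
    for i, row in enumerate(R):
        for j, r in enumerate(row):
            u1, u2 = u[i][j]
            H = heaviside_eps(phi0(i + u1, j + u2), eps)
            total += nu1 * (r - c1) ** 2 * H + nu2 * (r - c2) ** 2 * (1.0 - H)
    return total * h * h


def phi0(x1, x2):
    # Square-shaped level set: positive inside the square of half-width 1.5 centred at (2, 2).
    return 1.5 - max(abs(x1 - 2), abs(x2 - 2))


# A 3x3 bright square in a 5x5 image.
R = [[1.0 if 1 <= i <= 3 and 1 <= j <= 3 else 0.0 for j in range(5)] for i in range(5)]
aligned = [[(0.0, 0.0)] * 5 for _ in range(5)]
shifted = [[(1.0, 0.0)] * 5 for _ in range(5)]
print(fitting_energy(R, phi0, aligned, 1.0, 0.0, eps=1e-6))  # close to 0
print(fitting_energy(R, phi0, shifted, 1.0, 0.0, eps=1e-6))  # close to 6: one row gained, one lost
```

With a small ε the regularized Heaviside is almost binary, so each misclassified pixel contributes about $\nu_1$ or $\nu_2$ times its squared intensity residual.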
We need to add a regularization term of the form $F_{reg}(u)$ to (1), which substitutes for the length term of the evolving curve in [4]. The unknown Φ(x) from [4] is therefore replaced by $\Phi_0(x + u(x))$, with $\Phi_0$ now fixed. Thus, we obtain a binary segmentation method that can also be used for registration.
Introduction of a Nonlinear Elasticity-Based Regularizer. A regularizing term $F_{reg}$ is now introduced to ensure the smoothness of the displacement vector field u. To allow large displacements, we introduce a nonlinear-elasticity-based smoother. We propose to view the deformation of the initial contour into the final segmented contour as the deformation undergone by St. Venant-Kirchhoff materials. These materials are homogeneous, isotropic, and hyperelastic, and they satisfy the axiom of frame indifference (see [8] for further details). Let us denote by ε the Green-St. Venant strain tensor, defined by $\varepsilon = \frac{1}{2}(C - I)$ with $C = \nabla\varphi^T \nabla\varphi$, where φ = Id + u is the deformation, ∇φ its Jacobian matrix, and I the identity matrix. Equivalently, $\varepsilon = \varepsilon(u) = \frac{1}{2}(\nabla u^T + \nabla u + \nabla u^T \nabla u)$. The strain tensor measures the deviation between a given deformation and a rigid deformation, for which C = I. As stressed by Ciarlet [8], St. Venant-Kirchhoff materials are the simplest among nonlinear models (large strains are also possible when the stress is small; however, a linear relation implies that the stress is small if and only if the strain is small). The stored energy of St. Venant-Kirchhoff materials [8] is given by $W(\varepsilon) = \frac{\lambda}{2}(\operatorname{tr}\varepsilon)^2 + \mu \operatorname{tr}\varepsilon^2$. Thus, the nonlinear elasticity regularizer coupled with the distance measure functional $F_d$ is defined by
$$F_{reg}(u) = \int_\Omega W(\varepsilon(u))\, dx = \int_\Omega \Big( \frac{\lambda}{2} \big(\operatorname{tr}\varepsilon(u)\big)^2 + \mu \operatorname{tr}\varepsilon^2(u) \Big)\, dx. \qquad (2)$$
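The strain formula explains why this smoother tolerates large deformations. For a rigid rotation φ(x) = Qx, one has ∇u = Q − I and ∇φ^T∇φ = I, so ε(u) = 0 and W vanishes, whereas the linearized strain ½(∇u + ∇u^T) of linear elasticity does not vanish. The minimal sketch below checks this numerically for a 2 × 2 displacement gradient. The function names are illustrative, and λ = μ = 1 is an arbitrary choice.

```python
import math


def green_strain(G):
    """Green-St Venant strain eps = 1/2 (G^T + G + G^T G) for a 2x2 displacement
    gradient G = grad u, with G[i][j] = d u_i / d x_j."""
    eps = [[0.0, 0.0], [0.0, 0.0]]
    for a in range(2):
        for b in range(2):
            gtg = sum(G[k][a] * G[k][b] for k in range(2))  # (G^T G)_ab
            eps[a][b] = 0.5 * (G[b][a] + G[a][b] + gtg)      # G[b][a] is (G^T)_ab
    return eps


def linearized_strain(G):
    """Linear elasticity strain 1/2 (G + G^T), shown only for comparison."""
    return [[0.5 * (G[a][b] + G[b][a]) for b in range(2)] for a in range(2)]


def stvk_energy(eps, lam, mu):
    """St Venant-Kirchhoff stored energy W(eps) = lam/2 (tr eps)^2 + mu tr(eps^2)."""
    tr = eps[0][0] + eps[1][1]
    tr_eps2 = sum(eps[a][b] * eps[b][a] for a in range(2) for b in range(2))
    return 0.5 * lam * tr ** 2 + mu * tr_eps2


# A rigid rotation by theta: phi(x) = Q x, so grad u = Q - I.
theta = 0.5
G_rot = [[math.cos(theta) - 1.0, -math.sin(theta)],
         [math.sin(theta), math.cos(theta) - 1.0]]
print(stvk_energy(green_strain(G_rot), lam=1.0, mu=1.0))       # ~0 (rounding only)
print(stvk_energy(linearized_strain(G_rot), lam=1.0, mu=1.0))  # clearly positive
```

A linearized smoother would therefore penalize large rotations as if they were genuine strains, which is one reason it is restricted to small deformations.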
Although this functional does not satisfy the known theoretical assumptions that ensure the existence of minimizers (the stored energy function is not polyconvex; it is not even rank-1 convex and consequently not quasiconvex, a theoretical drawback since the functional is then not lower semi-continuous on $W^{1,4}$), we can expect, in practice, better results than those obtained with linearized models, as demonstrated next. The computation of the Euler-Lagrange equation satisfied by u is cumbersome. Following the idea of the more theoretical work [25], we circumvent this issue by introducing a second unknown, a matrix-valued auxiliary variable V that approximates the Jacobian matrix of u. The nonlinear elasticity regularizer is then applied to V instead of ∇u, so that the nonlinearity no longer involves the derivatives of the unknown u. Since V is introduced to mimic the Jacobian matrix of u, an additional term based on the Frobenius norm $\|\cdot\|_F$ of $\nabla u - V$ is incorporated into the model. More precisely, letting $\tilde V = \frac{V + V^T + V^T V}{2}$ and letting α > 0 be a tuning parameter, we redefine the smoothing functional $F_{reg} = F_{reg}(u, V)$ by
$$F_{reg}(u, V) = \int_\Omega W(\tilde V)\, dx + \frac{\alpha}{2} \int_\Omega \|\nabla u - V\|_F^2\, dx. \qquad (3)$$
In the limit α → +∞, we obtain $\nabla u \approx V$ in the $L^2$ topology.
Total Energy Functional. The total energy $E_{total}$ considered in the remainder of this work is given by
$$E_{total}(c_1, c_2, u, V) = F_d(c_1, c_2, u) + F_{reg}(u, V). \qquad (4)$$
Evolution Problem. We give the form of the associated Euler-Lagrange equations in the two-dimensional case. In the calculations, the Heaviside function is replaced by a smooth version $H_\epsilon$, and $H_\epsilon' = \delta_\epsilon$ is the corresponding regularization of the Dirac measure. Fixing u and V and minimizing $E_{total}(c_1, c_2, u, V)$ with respect to $c_1$ and $c_2$ yields, as in [4],
$$c_1 = \frac{\int_\Omega R(x)\, H_\epsilon\big(\Phi_0(x + u(x))\big)\, dx}{\int_\Omega H_\epsilon\big(\Phi_0(x + u(x))\big)\, dx}, \qquad c_2 = \frac{\int_\Omega R(x) \Big(1 - H_\epsilon\big(\Phi_0(x + u(x))\big)\Big)\, dx}{\int_\Omega \Big(1 - H_\epsilon\big(\Phi_0(x + u(x))\big)\Big)\, dx}.$$
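Both constants are $H_\epsilon$-weighted averages of R, so they can be computed in a single pass over the image. The minimal sketch below assumes that $\Phi_0(x + u(x))$ has already been sampled at every pixel (for example by interpolation, which is not shown) and uses the arctan-regularized Heaviside of Sect. 3. The pixel area appears in both numerator and denominator and cancels. `region_means` is a hypothetical helper name.

```python
import math


def heaviside_eps(z, eps=1.0):
    """Regularized Heaviside H_eps(z) = 1/2 * (1 + (2/pi) * arctan(z / eps))."""
    return 0.5 * (1.0 + (2.0 / math.pi) * math.atan(z / eps))


def region_means(R, phi_warped, eps=1.0):
    """Optimal c1, c2 for fixed u and V: H_eps-weighted means of R inside/outside
    the warped contour.

    R          : 2-D list of intensities
    phi_warped : 2-D list with phi_warped[i][j] = Phi0(x + u(x)) at pixel (i, j)
    """
    num1 = den1 = num2 = den2 = 0.0
    for r_row, p_row in zip(R, phi_warped):
        for r, p in zip(r_row, p_row):
            H = heaviside_eps(p, eps)
            num1 += r * H
            den1 += H
            num2 += r * (1.0 - H)
            den2 += 1.0 - H
    return num1 / den1, num2 / den2


R = [[0.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 0.0]]
phi = [[-1.0, -1.0, -1.0], [-1.0, 1.0, -1.0], [-1.0, -1.0, -1.0]]
c1, c2 = region_means(R, phi, eps=1e-6)
print(c1, c2)  # approximately 10.0 and 0.0
```

For a large ε, $H_\epsilon$ tends to 1/2 everywhere, and both constants tend to the global mean of R. This is one reason ε is kept small relative to the range of $\Phi_0$.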
Computing the first variation of the functional $F_d(c_1, c_2, u)$ in (1) with respect to u gives the gradient
$$\partial_u F_d(c_1, c_2, u) = \Big( \nu_1 (R - c_1)^2 - \nu_2 (R - c_2)^2 \Big)\, \delta_\epsilon\big(\Phi_0(x + u(x))\big)\, \nabla\Phi_0\big(x + u(x)\big).$$
Similarly, computing the first variation of the functional $F_{reg}(u, V)$ in (3) with respect to u gives only linear differential equations in each $u_k$:
$$\partial_{u_k} F_{reg}(u, V) = -\alpha \Big( \Delta u_k - \frac{\partial v_{k1}}{\partial x_1} - \frac{\partial v_{k2}}{\partial x_2} \Big), \quad k = 1, 2. \qquad (5)$$
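A pointwise sketch of the $F_d$ gradient is given below. It assumes that $\delta_\epsilon$ is taken as the exact derivative of the arctan-regularized Heaviside of Sect. 3, and that $\Phi_0$ and $\nabla\Phi_0$ have already been evaluated at x + u(x). The helper names are hypothetical. The coefficient $\nu_1(R - c_1)^2 - \nu_2(R - c_2)^2$ is negative where a pixel's intensity fits $c_1$ better than $c_2$. In that case the descent direction $-\partial_u F_d$ points along $+\nabla\Phi_0$, so $\Phi_0(x + u(x))$ increases and the pixel is pulled inside the zero level line.

```python
import math


def heaviside_eps(z, eps=1.0):
    """H_eps(z) = 1/2 * (1 + (2/pi) * arctan(z / eps))."""
    return 0.5 * (1.0 + (2.0 / math.pi) * math.atan(z / eps))


def dirac_eps(z, eps=1.0):
    """delta_eps = H_eps' = eps / (pi * (eps^2 + z^2))."""
    return eps / (math.pi * (eps * eps + z * z))


def fitting_force(r, c1, c2, phi_val, grad_phi, nu1=1.0, nu2=1.0, eps=1.0):
    """Pointwise gradient of F_d w.r.t. u at one pixel:
    (nu1 (R - c1)^2 - nu2 (R - c2)^2) * delta_eps(Phi0(x+u)) * grad Phi0(x+u).
    phi_val and grad_phi must already be evaluated at the displaced point x + u(x)."""
    coeff = (nu1 * (r - c1) ** 2 - nu2 * (r - c2) ** 2) * dirac_eps(phi_val, eps)
    return (coeff * grad_phi[0], coeff * grad_phi[1])


# A pixel whose intensity equals c1, sitting on the zero level line of Phi0:
g = fitting_force(r=1.0, c1=1.0, c2=0.0, phi_val=0.0, grad_phi=(2.0, 0.0))
step = (-g[0], -g[1])  # descent direction -dF_d/du, as used in the evolution equations
print(g, step)
```

Because $\delta_\epsilon$ decays like $1/z^2$ away from the zero level line, pixels far from the contour receive only a small force.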
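To finish, setting $V = (v_{ij})_{1 \le i,j \le 2}$ and letting
$$c_{01} = v_{11} + v_{22} + \frac{1}{2}\big(v_{11}^2 + v_{12}^2 + v_{21}^2 + v_{22}^2\big), \qquad c_{02} = 2v_{11} + v_{11}^2 + v_{21}^2,$$
$$c_{03} = 2v_{22} + v_{12}^2 + v_{22}^2, \qquad c_{04} = v_{12} + v_{21} + v_{11}v_{12} + v_{21}v_{22},$$
we obtain
$$\begin{aligned}
\partial_{v_{11}} F_{reg}(u, V) &= \alpha\Big(v_{11} - \frac{\partial u_1}{\partial x_1}\Big) + (\lambda c_{01} + \mu c_{02})(1 + v_{11}) + \mu c_{04} v_{12},\\
\partial_{v_{12}} F_{reg}(u, V) &= \alpha\Big(v_{12} - \frac{\partial u_1}{\partial x_2}\Big) + (\lambda c_{01} + \mu c_{03}) v_{12} + \mu c_{04}(1 + v_{11}),\\
\partial_{v_{21}} F_{reg}(u, V) &= \alpha\Big(v_{21} - \frac{\partial u_2}{\partial x_1}\Big) + (\lambda c_{01} + \mu c_{02}) v_{21} + \mu c_{04}(1 + v_{22}),\\
\partial_{v_{22}} F_{reg}(u, V) &= \alpha\Big(v_{22} - \frac{\partial u_2}{\partial x_2}\Big) + (\lambda c_{01} + \mu c_{03})(1 + v_{22}) + \mu c_{04} v_{21}.
\end{aligned} \qquad (6)$$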
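Formulas (6) are easy to mistype, so a finite-difference check is a useful safeguard. The sketch below assumes the pointwise energy density $W(\tilde V) + \frac{\alpha}{2}\|\nabla u - V\|_F^2$ with ∇u frozen at a given 2 × 2 matrix G. It implements (6) and compares the result with central differences of that density. The names are illustrative only. Reading $c_{01}$ as $\operatorname{tr}\tilde V$ and $c_{02}, c_{03}, c_{04}$ as $2\tilde V_{11}, 2\tilde V_{22}, 2\tilde V_{12}$ shows that the elastic part of (6) is $(I + V)S$ with $S = \lambda(\operatorname{tr}\tilde V)I + 2\mu\tilde V$, i.e. the first Piola-Kirchhoff stress of a St. Venant-Kirchhoff material.

```python
def stvk_grad_V(V, G, lam, mu, alpha):
    """Pointwise derivative of W(tilde V) + alpha/2 ||G - V||_F^2 w.r.t. V, written as in (6).

    V, G : 2x2 nested lists; G[i][j] stands for du_i/dx_j at the current pixel.
    """
    v11, v12 = V[0]
    v21, v22 = V[1]
    c01 = v11 + v22 + 0.5 * (v11**2 + v12**2 + v21**2 + v22**2)  # trace of tilde V
    c02 = 2 * v11 + v11**2 + v21**2                               # 2 * tilde V_11
    c03 = 2 * v22 + v12**2 + v22**2                               # 2 * tilde V_22
    c04 = v12 + v21 + v11 * v12 + v21 * v22                       # 2 * tilde V_12
    elastic = [
        [(lam * c01 + mu * c02) * (1 + v11) + mu * c04 * v12,
         (lam * c01 + mu * c03) * v12 + mu * c04 * (1 + v11)],
        [(lam * c01 + mu * c02) * v21 + mu * c04 * (1 + v22),
         (lam * c01 + mu * c03) * (1 + v22) + mu * c04 * v21],
    ]
    return [[alpha * (V[i][j] - G[i][j]) + elastic[i][j] for j in range(2)] for i in range(2)]


def split_density(V, G, lam, mu, alpha):
    """Integrand of F_reg(u, V) in (3): W(tilde V) + alpha/2 * ||G - V||_F^2."""
    # tilde V = (V + V^T + V^T V) / 2
    t = [[0.5 * (V[a][b] + V[b][a] + sum(V[k][a] * V[k][b] for k in range(2)))
          for b in range(2)] for a in range(2)]
    tr = t[0][0] + t[1][1]
    w = 0.5 * lam * tr**2 + mu * sum(t[a][b] * t[b][a] for a in range(2) for b in range(2))
    mismatch = sum((G[i][j] - V[i][j]) ** 2 for i in range(2) for j in range(2))
    return w + 0.5 * alpha * mismatch


def numeric_grad(V, G, lam, mu, alpha, h=1e-6):
    """Central finite differences of split_density w.r.t. each entry of V."""
    grad = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            vp = [row[:] for row in V]
            vm = [row[:] for row in V]
            vp[i][j] += h
            vm[i][j] -= h
            grad[i][j] = (split_density(vp, G, lam, mu, alpha)
                          - split_density(vm, G, lam, mu, alpha)) / (2 * h)
    return grad


V = [[0.1, -0.2], [0.05, 0.3]]
G = [[0.2, 0.1], [-0.1, 0.0]]
analytic = stvk_grad_V(V, G, lam=1.5, mu=0.7, alpha=2.0)
numeric = numeric_grad(V, G, lam=1.5, mu=0.7, alpha=2.0)
print(max(abs(a - b) for ra, rb in zip(analytic, numeric) for a, b in zip(ra, rb)))  # should be tiny
```

Because the nonlinearity sits entirely in V, each V-update is a purely local computation per pixel. Only the u-equations (5) involve spatial derivatives.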
We solve the Euler-Lagrange equations in u and V by gradient descent, parameterizing the descent direction by an artificial time t ≥ 0. Systems of 4 and 2 equations are obtained (solved by semi-implicit finite difference schemes):
$$\frac{\partial V}{\partial t} = -\partial_V F_{reg}(u, V), \qquad \frac{\partial u}{\partial t} = -\partial_u F_d(c_1, c_2, u) - \partial_u F_{reg}(u, V), \qquad (7)$$
equipped with the boundary condition $u = 0_{\mathbb{R}^2}$ on ∂Ω and the initial conditions $u(x, 0) = 0_{\mathbb{R}^2}$ and $V(x, 0) = 0_{M_2(\mathbb{R})}$. In most cases, no regridding is necessary. Nevertheless, the algorithm includes a regridding technique quite similar to the one proposed by Christensen et al. [7]: the Jacobian det(∇(Id + u)) is monitored, and if it drops below a given threshold in some part of the image, the process is reinitialized. The only change is that, instead of reinitializing with the last deformed template as in [7], we use the last deformed level set function $\Phi_0(\cdot + u(\cdot))$. The overall displacement u is reconstructed similarly to [7].
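The regridding test can be sketched as follows. The sketch assumes central differences at interior pixels, x1 along the first array index, and the admissibility threshold 0.01 used in the experiments of Sect. 3. `jacobian_determinant` and `needs_regridding` are hypothetical names. The reinitialization itself (restarting from $\Phi_0(\cdot + u(\cdot))$) is not shown.

```python
def jacobian_determinant(u1, u2, h=1.0):
    """det(grad(Id + u)) at interior pixels via central differences.

    u1, u2 : 2-D lists indexed [i][j], with x1 along i and x2 along j (assumed convention).
    Returns a dict {(i, j): determinant}; boundary pixels are skipped.
    """
    n, m = len(u1), len(u1[0])
    det = {}
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            d11 = 1.0 + (u1[i + 1][j] - u1[i - 1][j]) / (2 * h)
            d12 = (u1[i][j + 1] - u1[i][j - 1]) / (2 * h)
            d21 = (u2[i + 1][j] - u2[i - 1][j]) / (2 * h)
            d22 = 1.0 + (u2[i][j + 1] - u2[i][j - 1]) / (2 * h)
            det[(i, j)] = d11 * d22 - d12 * d21
    return det


def needs_regridding(u1, u2, threshold=0.01, h=1.0):
    """True if det(grad(Id + u)) drops below the threshold at some interior pixel."""
    return min(jacobian_determinant(u1, u2, h).values()) < threshold


n = 5
zero = [[0.0] * n for _ in range(n)]
fold = [[-1.5 * i for _ in range(n)] for i in range(n)]  # compresses x1 so strongly that the map folds
print(needs_regridding(zero, zero), needs_regridding(fold, zero))  # False True
```

A non-positive determinant means the discrete map has folded, which breaks topology preservation. This is why the threshold is kept slightly above zero.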
3 Numerical Experiments
We conclude the paper by presenting several results on both synthetic and real two-dimensional images. In most experiments, ν1 = ν2 = 1, but when dealing with complex topologies involving long and thin concavities, these parameters have been increased up to 2.5. The $C^\infty$ regularization of the Heaviside function [4] is $H_\epsilon(z) = \frac{1}{2}\Big(1 + \frac{2}{\pi}\arctan\frac{z}{\epsilon}\Big)$. Our first experimental test (Fig. 1) is an academic one, similar to those performed by Modersitzki in [24] (see pages 114–115, 129–130, 150–153, 168–170 for comparisons using linear elasticity, diffusion, curvature, or the viscous fluid method). Its goal is to illustrate that the model easily handles large displacements while segmenting the reference object. The problem is to warp a black disk to the letter C, both defined on the same image domain. The given data are the template and reference images as well as the curve delineating the disk boundary. We wish to demonstrate that our method performs qualitatively like the fluid model without requiring the expensive Navier-Stokes solver employed for the latter's numerical discretization, and that it provides two results: the segmentation of the reference image and a smooth displacement vector field u. The implementation is simple, based on finite difference schemes, and removes the nonlinearity in the derivatives of the unknown u. The method allows large deformations, unlike the linear elasticity, diffusion, and curvature-based models, for which the registration cannot be accomplished because the images differ too much (see pages 114–115, 150–153, 168–171 of [24]). In this example, three regridding steps were necessary: the transformation was considered admissible if the Jacobian exceeded 0.01. Note that regridding steps were also necessary with the fluid registration model.
Fig. 1. Top: left, the reference image; right, the template. Bottom: left, the boundary of the disk (zero level set of Φ0) superimposed on the reference image; middle, the segmentation of the letter C; right, deformed grid using nonlinear elasticity regularization.
Fig. 2. Left, boundary of the ellipse (zero level set of Φ0) superimposed on the reference image; middle, the topology-preserving segmentation of the two disks; right, deformed grid using nonlinear elasticity regularization.
Fig. 3. Topology-preserving segmentation of three complex slices of the brain. Left, the boundary of the disk (zero level set of Φ0 ) superimposed on the reference image; middle, the segmentation of the slice of the brain; right, deformed grid using nonlinear elasticity regularization.
The second example (Fig. 2) illustrates how the method can be used for topology-preserving segmentation (see [16], [1], [28], [17] on this topic). This synthetic reference image represents two disks (similar to tests performed in prior related works [16], [28], [17]). The template image, defined on the same image domain, is made of a black ellipse whose boundary, when superimposed on the reference image, encloses the two disks. We aim at segmenting these two disks while maintaining the same topology throughout the process (one path-connected component), and at obtaining a smooth displacement vector field u. In this example, two regridding steps were necessary: the transformation was considered admissible if the Jacobian exceeded 0.01. The method has also been tested on complex slices of brain data. The goal is to register a disk to the outer boundary of the brain with topology preservation. In Fig. 3, the template image, defined on the same image domain, is made of a disk (shown superimposed on the reference). Two regridding steps were necessary for the first row, and 3–4 regridding steps for the second and third rows: the transformation was considered admissible if the Jacobian exceeded 0.01.
Fig. 4. Top: left, reference R; right, template T (mouse atlas and gene data). Bottom, left to right: contour obtained by the proposed algorithm segmenting template T (starting with Φ0 defining a disk), superimposed over the reference R; segmented reference, using as Φ0 the output contour detected at the previous step; final deformed grid using nonlinear elasticity smoother.
Fig. 5. Same experiment as in Fig. 4.
Another medical application, shown in Figs. 4 and 5, is the mapping of mouse gene data to an atlas. First, the proposed method is applied to the gene data, with Φ0 defining a disk, to segment it and extract a contour. Then the method is applied again, with the new contour as Φ0, to segment the atlas data. In the process, we obtain a smooth deformation between the gene data and the atlas data. No regridding step was necessary for Fig. 4.
Acknowledgments
This work was supported in part by the National Institutes of Health (NIH) through the NIH Roadmap for Medical Research Grant U54 RR021813 entitled Center for Computational Biology, and by the National Science Foundation Grant DMS 0312222.
References
1. Alexandrov, O., Santosa, F.: A topology-preserving level set method for shape optimization. J. Comput. Phys. 204(1), 121–130 (2005)
2. Beg, F., Miller, M., Trouvé, A., Younes, L.: Computing large deformation metric mappings via geodesic flows of diffeomorphisms. IJCV 61(2), 139–157 (2005)
3. Broit, C.: Optimal Registration of Deformed Images. PhD thesis, Computer and Information Science, University of Pennsylvania (1981)
4. Chan, T., Vese, L.: Active Contours Without Edges. IEEE Trans. Image Process. 10(2), 266–277 (2001)
5. Chen, Y., Thiruvenkadam, H., Tagare, H., Huang, F., Wilson, D.: On the Incorporation of Shape Priors in Geometric Active Contours. In: IEEE Workshop on VLSM, pp. 145–152 (2001)
6. Chen, Y., Thiruvenkadam, H., Gopinath, K., Brigg, R.: Image Registration Using the Mumford-Shah Functional and Shape Information. In: World Multiconference on Systems, Cybernetics and Informatics, pp. 580–583 (2002)
7. Christensen, G.E., Rabbitt, R.D., Miller, M.I.: Deformable Templates Using Large Deformation Kinematics. IEEE Trans. Image Process. 5(10), 1435–1447 (1996)
8. Ciarlet, P.G.: Elasticité Tridimensionnelle. Masson (1985)
9. Droske, M., Rumpf, M.: A variational approach to non-rigid morphological registration. SIAM J. Appl. Math. 64(2), 668–687 (2004)
10. Duay, V., Houhou, N., Thiran, J.-P.: Atlas-based segmentation of medical images locally constrained by level sets. In: ICIP, vol. 2 (2005)
11. Fischer, B., Modersitzki, J.: Fast Diffusion Registration. AMS Contemporary Mathematics. Inverse Problems, Image Analysis, and Medical Imaging 313, 117–129 (2002)
12. Fischer, B., Modersitzki, J.: Curvature based image registration. JMIV 18(1), 81–85 (2003)
13. Fischer, B., Modersitzki, J.: A Unified Approach to Fast Image Registration and a New Curvature Based Registration Technique. Linear Algebra and its Applications 380, 107–124 (2004)
14. Haber, E., Modersitzki, J.: Numerical methods for volume preserving image registration. Inverse Problems 20(5), 1621–1638 (2004)
15. Haber, E., Modersitzki, J.: Image Registration with Guaranteed Displacement Regularity. Int. J. Comput. Vision 71(3), 361–372 (2007)
16. Han, X., Xu, C., Prince, J.L.: A Topology Preserving Level Set Method for Geometric Deformable Models. IEEE Trans. Pattern Anal. Mach. Intell. 25(6), 755–768 (2003)
17. Le Guyader, C., Vese, L.: Self-repelling snakes for topology-preserving segmentation models. IEEE Trans. Image Process. 17(5), 767–779 (2008)
18. Le Guyader, C., Vese, L.: A Combined Segmentation and Registration Framework with a nonlinear Elasticity Smoother. UCLA C.A.M. Report 08-16 (2008)
19. Leow, A., Chiang, M.-C., Protas, H., Thompson, P., Vese, L., Huang, H.S.C.: Linear and Non-Linear Geometric Object Matching with Implicit Representation. In: Proc. 17th ICPR, vol. 3, pp. 710–713 (2004)
20. Liao, W.-H., Khuu, A., Bergsneider, M., Vese, L., Huang, S.-C., Osher, S.: From Landmark Matching to Shape and Open Curve Matching: A Level Set Approach. UCLA CAM Report 02-59 (2002)
21. Liao, W.-H., Yu, C.-L., Bergsneider, M., Vese, L., Huang, S.-C.: A New Framework of Quantifying Differences Between Images by Matching Gradient Fields and Its Application to Image Blending. In: Nuclear Science Symposium Conference Record, vol. 2, pp. 1092–1096. IEEE, Los Alamitos (2002)
22. Lord, N.A., Ho, J., Vemuri, B.C., Eisenschenk, S.: Simultaneous Registration and Parcellation of Bilateral Hippocampal Surface Pairs for Local Asymmetry Quantification. IEEE Trans. Med. Imaging 26(4), 417–478 (2007)
23. Miller, M., Trouvé, A., Younes, L.: On the metrics and Euler-Lagrange equations of computational anatomy. Annu. Rev. Biomed. Eng. 4, 375–405 (2002)
24. Modersitzki, J.: Numerical Methods for Image Registration. Oxford University Press, Oxford (2004)
25. Negrón Marrero, P.V.: A numerical method for detecting singular minimizers of multidimensional problems in nonlinear elasticity. Numerische Mathematik 58, 135–144 (1990)
26. Rabbitt, R.D., Weiss, J.A., Christensen, G.E., Miller, M.I.: Mapping of hyperelastic deformable templates using the finite element method. In: Proceedings SPIE, vol. 2573, pp. 252–265 (1995)
27. Rouchdy, Y., Pousin, J., Schaerer, J., Clarysse, P.: A nonlinear elastic deformable template for soft structure segmentation: application to the heart segmentation in MRI. IP 23, 1017–1035 (2007)
28. Sundaramoorthi, G., Yezzi, A.: Global regularizing flows with topology preservation for active contours and polygons. IEEE Trans. Image Process. 16(3), 803–812 (2007)
29. Unal, G.B., Slabaugh, G.G.: Coupled PDE's for non-rigid registration and segmentation. In: CVPR, pp. 168–175 (2004)
30. Vemuri, B., Chen, Y.: Joint image registration and segmentation. In: Osher, S., Paragios, N. (eds.) Geometric Level Set Methods, pp. 251–269 (2003)
31. Vemuri, B., Ye, J., Chen, Y., Leonard, C.: A level-set based approach to image registration. In: IEEE Workshop on Mathematical Methods in Biomedical Image Analysis, pp. 86–93 (2000)
32. Vemuri, B., Ye, J., Chen, Y., Leonard, C.: Image Registration via level-set motion: Applications to atlas-based segmentation. Medical Image Analysis 7(1), 1–20 (2003)
33. Vese, L., Chan, T.: A Multiphase Level Set Framework for Image Segmentation Using the Mumford and Shah Model. IJCV 50(3), 271–293 (2002)
34. Wang, F., Vemuri, B.C.: Simultaneous registration and segmentation of anatomical structures from brain MRI. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3749, pp. 17–25. Springer, Heidelberg (2005)
35. Xiaohua, C., Brady, J.M., Rueckert, D.: Simultaneous segmentation and registration of medical images. In: Barillot, C., Haynor, D.R., Hellier, P. (eds.) MICCAI 2004. LNCS, vol. 3216, pp. 663–670. Springer, Heidelberg (2004)
36. Xu, C., Prince, J.L.: Snakes, shapes, and gradient vector flow. IEEE Trans. Image Process. 7, 359–369 (1998)
37. Yanovsky, I., Thompson, P.M., Osher, S., Leow, A.D.: Topology Preserving Log-Unbiased Nonlinear Image Registration: Theory and Implementation. In: IEEE Conf. on CVPR (2007)
38. Yezzi, A., Zollei, L., Kapur, T.: A variational framework for joint segmentation and registration. IEEE-MMBIA, 44–51 (2001)
A Scale-Space Approach to Landmark Constrained Image Registration
Eldad Haber¹, Stefan Heldmann², and Jan Modersitzki³
1 Dept. of Math. and Computer Science, Emory University, Atlanta, USA
2 Inst. of Mathematics, University of Lübeck, Lübeck, Germany
3 Dept. of Computing and Software, McMaster University, Hamilton, Canada
Abstract. Adding external knowledge improves the results for ill-posed problems. In this paper, we present a new multi-level optimization framework for image registration with landmark constraints on the transformation. Previous approaches are based on a fixed discretization and do not allow for continuous landmark positions that are not on grid points. Our novel approach overcomes these problems, so that we can apply multi-level methods, which have proven crucial for avoiding local minima in the course of optimization. Furthermore, our numerical method uses constraint elimination, which reduces the landmark constrained problem to an unconstrained optimization problem and leads to an efficient algorithm.
1 Introduction
Image registration is a challenging problem in digital imaging. Roughly speaking, the problem can be described as follows: given a reference image R and a template image T, find a reasonable spatial transformation y such that the transformed image T[y] is similar to the reference. Image registration is required whenever images resulting from different times, devices, and/or perspectives need to be compared or integrated. In medical applications alone, registration is used in radiation therapy, surgery planning, treatment evaluation, motion correction and estimation, and many more; see, e.g., [1, 2, 3, 4, 5, 6] and references therein. See also [7, 8, 9] for related work. However, although the registration problem is easily stated, it is hard to solve. A key difficulty is its ill-posedness: for a particular point x, the scalar intensity values R(x) and T(x) are given, but a transformation vector y(x) is to be computed. A common approach is to phrase image registration as an optimization problem involving a distance measure D reflecting the similarity of images and a regularization term S reflecting the reasonability of the transformation. Although an appropriate regularization results in a well-posed problem in the sense of Hadamard [10] (see, e.g., [11, 12, 13]), it is sometimes difficult or even impossible to find a regularization suited to the application.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 612–623, 2009. © Springer-Verlag Berlin Heidelberg 2009
Fig. 1. Reference (left) and template (right) images
A simple example is shown in Fig. 1, where the reference and template images cover the full intensity range and share some obvious symmetries. Considering only rigid transformations, there are four different solutions for any reasonable distance measure. Regularization can be used to favor one of these (for example, by penalizing rotations). However, any regularization is somewhat artificial and may favor a meaningless solution. One way to obtain better results and to guide the model towards a more realistic solution is to use landmarks. In the above example, just adding the information that the top-left corner of the square in the reference image corresponds to the bottom-right corner of the square in the template image eliminates three of the four solutions. Adding landmark information to image registration is far from new; see, e.g., [14, 15, 16, 17, 18, 4] and references therein. Although landmarks have been used extensively in the past, the numerical implementation of image registration with landmarks remains unsatisfactory. For example, no landmark registration scheme known to us allows for the incorporation of scale-space or multi-level techniques, which are frequently used to avoid local minima. Typically, landmark constraints are described in a discrete sense, where the ith pixel in a fixed discretization is constrained. This causes trouble if the discretization varies, that is, if discretizations on different scales are used. The goal of this paper is to develop a multilevel technique for incorporating landmarks into a registration process. We stress that, ignoring issues such as local minima, algorithms other than the one proposed here (for example [17]) should give similar results. Thus, the focus of this work is on the numerical implementation of multilevel algorithms with landmark constraints. Starting with a variational formulation of the landmark constrained registration problem, this paper provides a consistent numerical approach.
The new approach is based on a discretize-then-optimize strategy and takes advantage of a multilevel discretization. It automatically resolves the problem of a fixed number of constraints versus a varying number of unknowns, and the related inconsistency of the constraints. A numerically stable and computationally feasible basis of the constraint manifold is derived. Using a reduced formulation leads to an elegant algorithm in which indefinite Karush-Kuhn-Tucker systems [19] are avoided.
E. Haber, S. Heldmann, and J. Modersitzki
This paper is organized as follows. Sect. 2 introduces the basic notation and states the problem in a variational framework. A discretize-then-optimize approach is used to solve the constrained registration problem numerically. Details are outlined in Sect. 3, where the discretization, the construction of a basis for the constraint manifold, the numerical optimization, and a multilevel strategy are described. Sect. 4 presents some numerical results. Conclusions are given in Sect. 5.
2 Variational Formulation
In this section we formulate the constrained registration problem. Let $d\in\mathbb{N}$ denote the spatial dimension (typically $d = 2, 3$), let $\Omega\subset\mathbb{R}^d$ be the region of interest, and let $\mathcal{T},\mathcal{R}\in L^2(\mathbb{R}^d,\mathbb{R})$ denote the template and reference image, respectively. The objective is to find a transformation $y:\mathbb{R}^d\to\mathbb{R}^d$ such that the transformed image $\mathcal{T}[y]$ is similar to $\mathcal{R}$ and the transformation $y$ is regular, where similarity and regularity are measured by $\mathcal{D}$ and $\mathcal{S}$, respectively. More precisely,
$$\mathcal{T}[y](x) := \mathcal{T}(y(x))\ \text{for all } x\in\Omega,\qquad \mathcal{D}[\mathcal{T},\mathcal{R}] := \frac12\int_\Omega(\mathcal{T}-\mathcal{R})^2\,dx,\qquad \mathcal{S}[y] := \frac{\alpha}{2}\int_\Omega|\mathcal{B}y|^2\,dx,\qquad \mathcal{B} := I_d\otimes\Delta.$$
Here, for ease of presentation, it is assumed that similarity is quantified by the energy in the difference image. However, other distance measures like mutual information [20, 21] or normalized gradient fields [22, 23] can be handled similarly. Regularity is measured using the curvature regularizer [24, 25], where the partial differential operator $\mathcal{B}$ is the vector-valued Laplacian, $|\cdot|$ denotes the Euclidean norm in $\mathbb{R}^d$, and $\alpha$ is a regularization parameter. Note that the order of the regularizer has to be sufficiently high to cover the landmark constraints [26, 4], i.e., point evaluations of $y$ must be well defined. It is assumed that $L$ landmarks $r_1,\dots,r_L\in\mathbb{R}^d$ in the reference and corresponding landmarks $t_1,\dots,t_L\in\mathbb{R}^d$ in the template image are given. The automatic detection of landmarks is beyond the scope of this paper; see [16] for an overview. The point evaluation functional is denoted by $\delta_x$. With $(I_d\otimes\delta_{r_\ell})[y] = (y^1(r_\ell),\dots,y^d(r_\ell)) = y(r_\ell)\in\mathbb{R}^d$, the landmark constraints can be phrased as $\mathcal{C}[y] := (y(r_1),\dots,y(r_L))^\top = t := (t_1,\dots,t_L)^\top\in\mathbb{R}^{L,d}$, and the constrained registration problem reads
$$\text{minimize}\quad \mathcal{J}[y] = \mathcal{D}[\mathcal{T}[y],\mathcal{R}] + \mathcal{S}[y - y_{\mathrm{ref}}]\quad\text{subject to}\quad \mathcal{C}[y] = t, \tag{1}$$
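where $y_{\mathrm{ref}}$ allows for a bias towards a particular solution. The above problem is strongly related to plain landmark-based registration, where $\mathcal{D} = 0$ and $\mathcal{S} = \mathcal{S}^{\mathrm{TPS}}$ is the bending energy of a thin-plate spline; see, e.g., [26, 4] for an extended discussion. The solution $y_{\mathrm{TPS}}$ is explicitly known: it is a linear combination of shifts of a radial basis function $\rho$ associated with $\mathcal{S}^{\mathrm{TPS}}$, plus a polynomial correction. Following [4], the $k$th component of $y_{\mathrm{TPS}}$ reads
$$y^k_{\mathrm{TPS}}(x) = \sum_{\ell=1}^{L}\theta^k_\ell\,\rho(|x - r_\ell|) + (1, x_1,\dots,x_d)\,(\theta^k_{L+1},\dots,\theta^k_{L+d+1})^\top, \tag{2}$$
where the coefficients are given by $A\theta^k = (t^k_1,\dots,t^k_L,0,\dots,0)^\top$ with
$$A = \begin{pmatrix}[\rho(|r_i - r_j|)]_{i,j=1}^{L} & P^\top\\ P & 0\end{pmatrix},\qquad P = \begin{pmatrix}1 & \cdots & 1\\ r_1 & \cdots & r_L\end{pmatrix}\in\mathbb{R}^{d+1,L},\qquad \rho(t) = \begin{cases}t^2\log t & (d = 2),\\ t & (d = 3).\end{cases}$$
In our final formulation of the continuous problem, we use this function as a reference for regularization, i.e., $y_{\mathrm{ref}} = y_{\mathrm{TPS}}$, and it is thus convenient to rephrase the problem in terms of the update $u = y - y_{\mathrm{ref}}$. Since $\mathcal{C}$ is linear and $\mathcal{C}[y_{\mathrm{TPS}}] = t$, the constraints become homogeneous:
$$\text{minimize}\quad \mathcal{J}[u] = \mathcal{D}[\mathcal{T}[y_{\mathrm{ref}} + u],\mathcal{R}] + \mathcal{S}[u]\quad\text{subject to}\quad \mathcal{C}[u] = 0. \tag{3}$$
A Scale-Space Approach to Landmark Constrained Image Registration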
The role of the plain landmark solution as a reference is manifold. It can be seen as a good starting guess for the optimization, minimizing the risk of being trapped in a local minimum. Moreover, it injects boundary values into the region of interest. In fact, these boundary conditions make $y_{\mathrm{TPS}}$ linear for $|x|\to\infty$ and thus invertible, which is preferable for most applications. Finally, it yields homogeneous constraints. As pointed out later, this is crucial for the discretization, since the feasible set is then always non-empty.
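As an illustration of (2), the following minimal NumPy sketch fits and evaluates the plain thin-plate-spline solution for $d = 2$, where $\rho(t) = t^2\log t$. The function names `rho`, `tps_fit` and `tps_eval` are hypothetical, and the sketch assumes distinct landmarks that are not all collinear, so that $A$ is nonsingular; it is not the authors' implementation.

```python
import numpy as np

def rho(t):
    """Radial basis function for d = 2: rho(t) = t^2 log t, with rho(0) = 0 by continuity."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = t[pos] ** 2 * np.log(t[pos])
    return out

def tps_fit(r, t):
    """Solve A theta^k = (t^k_1, ..., t^k_L, 0, ..., 0) for all components k at once.

    r, t: (L, 2) arrays of reference and template landmarks.
    Returns theta with shape (L + 3, 2); column k holds theta^k.
    """
    L, d = r.shape
    K = rho(np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1))  # [rho(|r_i - r_j|)]
    P = np.vstack([np.ones(L), r.T])                                 # (d+1) x L
    A = np.block([[K, P.T], [P, np.zeros((d + 1, d + 1))]])
    rhs = np.vstack([t, np.zeros((d + 1, d))])
    return np.linalg.solve(A, rhs)

def tps_eval(theta, r, x):
    """Evaluate y_TPS at the points x (N x 2) according to (2)."""
    L = r.shape[0]
    K = rho(np.linalg.norm(x[:, None, :] - r[None, :, :], axis=-1))  # N x L
    poly = np.hstack([np.ones((x.shape[0], 1)), x])                  # rows (1, x_1, x_2)
    return K @ theta[:L] + poly @ theta[L:]
```

Because the block $P$ enforces the side conditions on the radial coefficients, landmark pairs related by an affine map are reproduced exactly by the polynomial part of (2), with vanishing radial coefficients.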
3 Numerical Treatment
A discretize-then-optimize approach is used to compute a numerical solution of (3). The discretization is briefly outlined for dimension $d = 2$; see [27] for a detailed and general description. Note that the discretization changes during the course of the optimization and all quantities introduced in this section depend on the discretization width $h$. However, in this section a fixed discretization level is assumed and, to keep the presentation clear, the dependence on $h$ is suppressed.
3.1 Discretization
Fig. 2.a shows the discretization of a domain $\Omega$ into $m = (3, 4)$ cells with cell centers $x_j$, $j = 1,\dots,n = m_1m_2$. Note that all discrete quantities depend on the discretization width $h$, with $h_i = \omega_i/m_i$ and $\hat h = h_1\cdots h_d$. The following equations describe how the discrete quantities are assembled:
$$X = (x^1_1,\dots,x^1_n,\dots,x^d_n)^\top\in\mathbb{R}^{dn},\qquad U = (u^1_1,\dots,u^1_n,\dots,u^d_n)^\top\in\mathbb{R}^{dn},$$
$$R = (\mathcal{R}(x_1),\dots,\mathcal{R}(x_n))^\top\in\mathbb{R}^n,\qquad T(U) = (\mathcal{T}(u_1),\dots,\mathcal{T}(u_n))^\top\in\mathbb{R}^n.$$
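The cell-centered grid and the ordering of $X$ for $d = 2$ can be sketched as follows, assuming NumPy. The helper `cell_centered_grid` is hypothetical; it lets the first coordinate run fastest, as in the definition of $X$ above.

```python
import numpy as np

def cell_centered_grid(omega, m):
    """Cell centers of Omega = (0, omega_1) x (0, omega_2) split into m = (m_1, m_2) cells.

    Returns X = (x^1_1, ..., x^1_n, x^2_1, ..., x^2_n) with x_1 running fastest,
    the widths h = (h_1, h_2) and the cell volume hhat = h_1 * h_2.
    """
    h = np.asarray(omega, dtype=float) / np.asarray(m)
    c1 = (np.arange(m[0]) + 0.5) * h[0]
    c2 = (np.arange(m[1]) + 0.5) * h[1]
    X1, X2 = np.meshgrid(c1, c2)   # shape (m_2, m_1); ravel() lets x_1 vary fastest
    return np.concatenate([X1.ravel(), X2.ravel()]), h, float(np.prod(h))
```

For $m = (3, 4)$ this yields $n = 12$ cells and $X\in\mathbb{R}^{24}$.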
Fig. 2. Discretization of a 2D domain $\Omega = (0,\omega_1)\times(0,\omega_2)\subset\mathbb{R}^2$ with cell centers $x_j$ and widths $h_1, h_2$ (left); discrete second derivative $\partial_i^{2,h} = h_i^{-2}\,\mathrm{tridiag}(1,-2,1)\in\mathbb{R}^{m_i,m_i}$ (middle); linear interpolation at a landmark $r_\ell$ from the neighboring grid points $x_a,\dots,x_d$ with local coordinates $\xi_1,\xi_2$ (right)
The discretization of the curvature operator can be expressed in terms of Kronecker products [25] of identity matrices $I_q\in\mathbb{R}^{q,q}$ and discrete second derivatives $\partial_i^{2,h}$ (see Fig. 2.b):
$$\mathcal{B}\approx B = I_d\otimes\big(I_{m_2}\otimes\partial_1^{2,h} + \partial_2^{2,h}\otimes I_{m_1}\big).$$
Finally, the integrals are approximated using a midpoint quadrature rule. Thus
$$\mathcal{J}[u]\approx J(U) = \tfrac12\hat h\,|T(y_{\mathrm{TPS}}(X) + U) - R|^2 + \tfrac12\alpha\hat h\,|BU|^2.$$
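The Kronecker construction of $B$ and the midpoint-rule regularizer can be sketched with dense NumPy matrices for small grids; a realistic implementation would use sparse matrices (e.g. `scipy.sparse.kron`). The helper names are hypothetical, and the sketch assumes the tridiagonal stencil $h_i^{-2}(1,-2,1)$ of Fig. 2 (middle) for $\partial_i^{2,h}$.

```python
import numpy as np

def second_derivative(m, h):
    """Discrete 2nd derivative (1/h^2) * tridiag(1, -2, 1) in R^{m x m}."""
    return (np.diag(-2.0 * np.ones(m)) + np.eye(m, k=1) + np.eye(m, k=-1)) / h**2

def laplacian_2d(m, h):
    """I_{m2} kron d_11 + d_22 kron I_{m1}, for x_1 running fastest in the grid ordering."""
    m1, m2 = m
    return (np.kron(np.eye(m2), second_derivative(m1, h[0]))
            + np.kron(second_derivative(m2, h[1]), np.eye(m1)))

def curvature_operator(m, h, d=2):
    """B = I_d kron (discrete Laplacian), acting on U = (u^1; ...; u^d)."""
    return np.kron(np.eye(d), laplacian_2d(m, h))

def regularizer(U, m, h, alpha):
    """Midpoint rule for (alpha/2) * int |B u|^2 dx, i.e. 0.5 * alpha * hhat * |B U|^2."""
    B = curvature_operator(m, h)
    hhat = h[0] * h[1]
    return 0.5 * alpha * hhat * float(np.sum((B @ U) ** 2))
```

In the interior of the grid, the discrete Laplacian reproduces $\Delta(x_1^2) = 2$ exactly; only the boundary rows depend on the chosen stencil.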
The final step is the discretization of the point evaluation functional $\delta_x$. For an arbitrary location $r_\ell$, a $d$-linear interpolation of discrete point evaluation functionals located at the $2^d$ closest grid points is used. For example, let $d = 2$ and let the four neighboring grid points of $r_\ell$ be denoted by $x_a,\dots,x_d$, and let $\xi_1,\xi_2\in[0,1]$ be the local coordinates of $r_\ell$ in the cell they span; see Fig. 2.c. Then
$$\delta_{r_\ell}[u]\approx\delta^h_{r_\ell}u = C_\ell\,u(X) = (1-\xi_1)(1-\xi_2)\,u(x_a) + \xi_1(1-\xi_2)\,u(x_b) + (1-\xi_1)\xi_2\,u(x_c) + \xi_1\xi_2\,u(x_d),$$
where $C_\ell$ is a sparse row vector with non-zero entries only at the positions related to $x_a,\dots,x_d$. If, for a certain discretization, a landmark $r_\ell$ is located precisely on a grid point $x_j$, then $C_\ell$ has only one non-zero entry, at position $j$. Assembling these rows for $\ell = 1,\dots,L$ results in a sparse $L$-by-$n$ matrix $C$ with at most $2^d$ non-zero entries per row; see Fig. 3.b. The Kronecker product $I_d\otimes C$ enables a simultaneous treatment of all components of the discretized vector field $U$. Note that even for a very coarse discretization ($n < L$) there exists a feasible solution fulfilling the constraints, namely $U = 0$. Thus the feasible set is non-empty. The discrete formulation of the constrained registration problem thus reads
$$\text{minimize}\quad J(U) = \tfrac12\hat h\,|T(y_{\mathrm{ref}}(X) + U) - R|^2 + \tfrac12\hat h\alpha\,|BU|^2\quad\text{subject to}\quad (I_d\otimes C)U = 0,\quad U\in\mathbb{R}^{dn}. \tag{4}$$
3.2 An Efficient Basis for the Feasible Set
The objective is to derive a numerically feasible basis for the nullspace of the matrix $C$. Note that $C$ is $L$-by-$n$ and can be large (e.g., $n = 128^3$ and $L = 100$), and the rank of this matrix is generally unknown. For a coarse discretization, $C$ has more rows than columns, and for a fine discretization it has more columns than rows. The basic idea is to reorder the columns of $C$ such that the non-zero columns are placed first. Let $\Pi$ denote the corresponding $n\times n$ permutation matrix and $C^*$ the matrix consisting of the non-zero columns of $C$, such that $C\Pi = (\,C^*\mid 0\,)$. The size of $C^*$ is $L$-by-$p$, where $p\le 2^dL$, since each row of $C$ has at most $2^d$ non-zero entries. The matrix $C^*$ is thus not only relatively small but also very sparse. Assuming the number of landmarks to be less than 1000, it is therefore feasible to compute a singular value decomposition (SVD) of $C^*$ [28], i.e.,
$$C^* = W\Sigma V^\top,\quad\text{where}\quad W^\top W = I_L,\quad V^\top V = I_p,\quad \Sigma = \mathrm{diag}(\sigma_1,\dots,\sigma_{\min\{L,p\}})\in\mathbb{R}^{L,p},\quad \sigma_1\ge\dots\ge\sigma_{\min\{L,p\}}\ge 0.$$
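The whole construction of this subsection, from the bilinear rows $C_\ell$ of Sect. 3.1 through the permutation $\Pi$, the SVD of $C^*$, the numerical rank $k$ and the basis $Z$ (defined in the next paragraph), can be sketched in NumPy as follows. This is a dense, small-scale illustration under stated assumptions: a 2D cell-centered grid with $x_1$ running fastest, landmarks lying between the outermost cell centers, and a default tolerance of $10^{-12}$; `landmark_row` and `constraint_nullspace` are hypothetical names.

```python
import numpy as np

def landmark_row(r, m, h):
    """Row C_l: bilinear weights of delta_r on a 2D cell-centered grid (x_1 running fastest).

    Assumes r lies between the outermost cell centers (no extrapolation handling).
    """
    s = np.asarray(r, dtype=float) / h - 0.5                 # position in cell-index units
    i = np.clip(np.floor(s).astype(int), 0, np.asarray(m) - 2)
    xi = s - i                                               # local coordinates xi_1, xi_2
    row = np.zeros(m[0] * m[1])
    for di, dj, w in [(0, 0, (1 - xi[0]) * (1 - xi[1])),     # x_a
                      (1, 0, xi[0] * (1 - xi[1])),           # x_b
                      (0, 1, (1 - xi[0]) * xi[1]),           # x_c
                      (1, 1, xi[0] * xi[1])]:                # x_d
        row[(i[0] + di) + m[0] * (i[1] + dj)] += w
    return row

def constraint_nullspace(C, tol=1e-12):
    """Orthonormal basis Z of null(C) from the SVD of the non-zero columns C*."""
    L, n = C.shape
    nz = np.flatnonzero(np.any(C != 0, axis=0))                  # columns forming C*
    perm = np.concatenate([nz, np.setdiff1d(np.arange(n), nz)])  # C Pi = (C* | 0)
    p = nz.size
    _, sigma, Vt = np.linalg.svd(C[:, nz])                       # full SVD, Vt is p x p
    k = int(np.sum(sigma > tol))                                 # numerical rank
    Zp = np.zeros((n, n - k))
    Zp[:p, :p - k] = Vt[k:].T                                    # V(:, k+1:p)
    Zp[p:, p - k:] = np.eye(n - p)                               # identity block
    Z = np.empty_like(Zp)
    Z[perm] = Zp                                                 # Z = Pi * Zp
    return Z, k
```

Only the small matrix $C^*$ is ever factorized; for realistic grid sizes the identity block of $Z$ would be kept implicit rather than stored densely.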
The above SVD enables the computation of the numerical rank of the matrix $C^*$ and hence of $C$. To this end, let tol be a user-prescribed tolerance (e.g., tol $= 0$ or tol $= 10^{-16}$) and let $k$ be the largest integer such that $\sigma_k > \mathrm{tol}$. The last $p-k$ columns of $V$, i.e., $V(:,k+1:p)\in\mathbb{R}^{p,p-k}$, are a basis of the (numerical) nullspace of $C^*$, and thus the columns of
$$Z = \Pi\begin{pmatrix}V(:,k+1:p) & 0\\ 0 & I_{n-p}\end{pmatrix}\in\mathbb{R}^{n,n-k}$$
form a basis for the nullspace of $C$, where the final multiplication by $\Pi$ undoes the permutation. The important points are summarized as follows. The matrix $C^*$ is relatively small, such that its SVD is numerically feasible. The SVD enables a uniform treatment independent of the rank of $C^*$ and thus handles a coarse discretization ($L > n$) as well as a fine discretization ($L < n$). Note that if $C$ has full column rank $n$ (which requires $L\ge n$), no degrees of freedom are left and the solution is the thin-plate-spline solution. The columns of $Z$ form a sparse, orthonormal, and numerically stable basis for the feasible set. For very fine discretizations, $Z$ is essentially a (permuted) identity matrix and can be stored efficiently. Any feasible vector is given by $U = (I_d\otimes Z)w$ with $w\in\mathbb{R}^{d(n-k)}$, and $w = 0$ is always a feasible point.
3.3 Numerical Optimization
The final version of the discrete constrained registration problem is given in terms of the reduced basis and reads
$$\text{minimize}\quad J(w) = \tfrac12\hat h\,|T(y_{\mathrm{ref}}(X) + Z_dw) - R|^2 + \tfrac12\hat h\alpha\,|BZ_dw|^2, \tag{5}$$
where $Z_d = I_d\otimes Z$. In order to find a numerical solution to (5), standard optimization techniques can be applied; see, e.g., [19] for an overview. Here, we use a Gauss-Newton-type algorithm with an Armijo line search as outlined in [27]. The quasi-Newton system is given by $Z_d^\top HZ_d\,\delta w = -\nabla J(w)$, where $\delta w$ is the new search direction and $H = \nabla T^\top\nabla T + \alpha B^\top B$ is an approximation to the Hessian. Note that since the regularization is quadratic, the term $\alpha B^\top B$ is exactly the Hessian of the regularization part, and only the data-fitting term is approximated. A generalized Gauss-Newton strategy can be used to handle other distance measures, as mentioned before. For the numerical solution of the Gauss-Newton systems, a conjugate gradient solver preconditioned with symmetric Gauss-Seidel is used; see [29] for details.
3.4 The Multilevel Strategy
It remains to describe the multilevel framework. To this end, a multilevel representation $\{T_D^\ell, R_D^\ell, m^\ell\}$ of the given discrete data is initialized, where for ease of presentation it is assumed that $m_i^\ell = 2^\ell$, $i = 1,\dots,d$, $\ell = \ell_{\min},\dots,\ell_{\max}$. Note that $h_i^\ell = \omega_i/m_i^\ell$ depends on the level. More precisely,
$$T_D^{\ell_{\max}} = \text{original data},\qquad T_D^{\ell-1} = \mathrm{downsample}\big(\mathrm{conv}(G, T_D^\ell)\big),$$
where $G$ is a smoothing kernel (in our numerical experiments we used the block smoother $G = (1,1,1)^\top(1,1,1)/9$). In general, we compute updates to the thin-plate-spline solution on different grids. Similar to many other multilevel algorithms, the solution on a finer grid is initialized by the coarser-grid solution. To be more specific, running from coarse to fine, the continuous representations $\mathcal{T}^\ell,\mathcal{R}^\ell$ of $T_D^\ell, R_D^\ell$ are computed (in our numerical experiments, spline interpolation is used). Moreover, the discretized thin-plate-spline solution $Y^\ell_{\mathrm{ref}} = y_{\mathrm{TPS}}(X^\ell)$ (cf. (2)) for a cell-centered grid $X^\ell$ of size $m^\ell$ and the matrix $Z^\ell$ (cf. Sect. 3.2) are initialized. A numerical solution $w^\ell_{\mathrm{opt}}$ of the discretized registration problem (5) is computed, and the current grid solution is given by $Y^\ell_{\mathrm{opt}} = Y^\ell_0 + Z^\ell_dw^\ell_{\mathrm{opt}}$. On the coarsest grid, we choose $w_0^{\ell_{\min}} = 0$ as starting guess, such that $Y_0^{\ell_{\min}} = Y_{\mathrm{ref}}^{\ell_{\min}}$. The starting guess $w_0^\ell$ for a finer grid is chosen as the best least-squares approximation of the prolongated coarser-grid solution, i.e., $w_0^\ell = \arg\min_w|Z^\ell_dw - P^\ell_{\ell-1}Z^{\ell-1}_dw^{\ell-1}_{\mathrm{opt}}|$, where $P^\ell_{\ell-1}$ denotes the linear prolongation operator. Since the constraint basis is orthogonal, the computation simplifies to $w_0^\ell := (Z^\ell_d)^\top P^\ell_{\ell-1}Z^{\ell-1}_dw^{\ell-1}_{\mathrm{opt}}$. When designing a multilevel strategy, the number of levels has to be set. Unfortunately, setting the number of levels is non-trivial. In general, similar to other problems, one requires that the coarsest level actually represents the problem [30].
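A minimal NumPy sketch of the two ingredients of the multilevel scheme: the restriction $T_D^{\ell-1} = \mathrm{downsample}(\mathrm{conv}(G, T_D^\ell))$ with the $3\times 3$ block smoother, and the initial guess $w_0^\ell = (Z^\ell_d)^\top P^\ell_{\ell-1}Z^{\ell-1}_dw^{\ell-1}_{\mathrm{opt}}$. The edge-replicating padding and the 2-by-2 averaging used for `downsample` are assumptions (the text does not specify them), the 1D prolongation uses the standard cell-centered weights 3/4 and 1/4, and all function names are hypothetical.

```python
import numpy as np

def smooth_and_restrict(T):
    """T^{l-1} = downsample(conv(G, T^l)) with G = (1,1,1)^T (1,1,1) / 9.

    Assumptions: edge-replicating padding, even image sizes, and downsampling
    by averaging 2 x 2 blocks of cells.
    """
    n1, n2 = T.shape
    Tp = np.pad(T, 1, mode="edge")
    S = sum(Tp[i:i + n1, j:j + n2] for i in range(3) for j in range(3)) / 9.0
    return 0.25 * (S[0::2, 0::2] + S[1::2, 0::2] + S[0::2, 1::2] + S[1::2, 1::2])

def prolongation_1d(m):
    """Linear prolongation from m coarse to 2m fine cell centers (constant at the ends)."""
    P = np.zeros((2 * m, m))
    for i in range(m):
        P[2 * i, i] += 0.75                     # fine center left of coarse center i
        P[2 * i, max(i - 1, 0)] += 0.25
        P[2 * i + 1, i] += 0.75                 # fine center right of coarse center i
        P[2 * i + 1, min(i + 1, m - 1)] += 0.25
    return P

def initial_guess(Z_fine, P, Z_coarse, w_coarse):
    """w_0 = Z_fine^T P Z_coarse w_coarse, the least-squares fit for orthonormal Z_fine."""
    return Z_fine.T @ (P @ (Z_coarse @ w_coarse))
```

In 2D, the prolongation can be assembled as a Kronecker product of 1D operators, analogous to the construction of $B$; the transpose in `initial_guess` replaces a least-squares solve only because the columns of $Z$ are orthonormal.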
4 Results
4.1 Artificial 2D Data
We use the hand data shown in Fig. 4. In this example, a synthetic transformation $y_{\mathrm{true}}$ has been specified, and the reference is a transformed copy of the template image, $\mathcal{R} = \mathcal{T}[y_{\mathrm{true}}]$; see Fig. 4. This construction allows a comparison with a ground truth. Here, 47 manually detected landmarks $\tilde t_j$ have been chosen in the template image. Using a numerical approximation to the inverse of the transformation, the landmarks for the reference are defined by $r_j\approx y_{\mathrm{true}}^{-1}(\tilde t_j)$, and the corresponding landmarks in the template image are defined by $t_j := y_{\mathrm{true}}(r_j)$. Note that since $y_{\mathrm{true}}$ is explicitly known, there are no errors in the landmark pairing. The original data is 128-by-128, and the level ranges from $\ell_{\min} = 3$ to $\ell_{\max} = 7$. Fig. 3.a shows the coarse grid representation of the data. Here, several landmarks fall into the same cells, so the problem is locally over-constrained: the 47-by-64 matrix $C^{\ell_{\min}}$ is rank deficient (its rank is 27). The non-zero pattern of this matrix is shown in Fig. 3.b.
Fig. 3. Coarse grid representation of the data with 47 landmarks (circles), $\ell_{\min} = 3$ (left); non-zero pattern of the matrix $C^{\ell_{\min}}$ (right)
Fig. 4 shows the original data (a,b,c) and the results based on the thin-plate-spline solution $y_{\mathrm{TPS}}$ (d,g), an unconstrained solution $y_{\mathrm{un}}$ (e,h), and the constrained solution $y_{\mathrm{con}}$ (f,i). The distance measure and landmark error are given by
$$\mathrm{err}(Y) := 100\,D(Y)/D(X)\ [\%],\qquad D(Y) = |T(Y) - R|^2,\qquad \mathrm{LM}(Y) := \|(I_d\otimes C)Y - t\|_{\mathrm{Frobenius}}.$$
All three registration approaches (TPS, unconstrained, constrained) perform well for this example. The TPS approach gives perfect results for the landmarks but a large difference for the trapezoid. The unconstrained approach results in a very small difference, but the landmark error is relatively large. Finally, the constrained approach is perfect on the landmarks and results in the smallest difference. The latter is due to the fact that the stopping criterion is relative to the initial guess, which results in a smaller distance for the constrained approach.
4.2 3D Example
For our 3D experiment we use real data from CT and 3D power Doppler ultrasound (US) of a human liver. The goal of this application is the alignment of vessels that have been segmented from the original data. Consequently, we have binary images, which allow for a direct comparison by the SSD distance measure. The size of the data in our experiment is 171 × 165 × 186 voxels. Additionally, we have 11 corresponding landmarks that were manually picked by an expert; see Fig. 5(a,b). For the registration we used four levels, starting from 22 × 21 × 24 voxels and ranging to the original resolution of 171 × 165 × 186 voxels. Results for a plain landmark-based registration using only the thin-plate-spline solution $y_{\mathrm{TPS}}$ and for the constrained solution $y_{\mathrm{con}} = y_{\mathrm{TPS}} + u$ are shown in Fig. 5(c,d). As it turns out, the landmark solution provides a reasonable alignment but is far from perfect. In contrast, the constrained approach improves the quality of the results considerably and leads to an almost perfect alignment of large parts of the vessel system.
Fig. 4. Original template image with landmarks (crosses) and visualization of an artificial transformation $y_{\mathrm{true}}$ (top left); reference (top middle), a transformed template $\mathcal{R}(x) = \mathcal{T}(y_{\mathrm{true}}(x))$, with visualization of the transformed landmarks and the initial grid; initial difference $|\mathcal{T}-\mathcal{R}|$ (top right); transformed template $\mathcal{T}[y]$ based on the thin-plate-spline solution $y_{\mathrm{TPS}}$ (center left), the unconstrained solution $y_{\mathrm{un}}$ (center middle), and the constrained solution $y_{\mathrm{con}}$ (center right); differences $|\mathcal{T}[y]-\mathcal{R}|$ for $y = y_{\mathrm{TPS}}$ (bottom left), $y = y_{\mathrm{un}}$ (bottom middle), and $y = y_{\mathrm{con}}$ (bottom right)
Panel annotations in Fig. 4 (distance error err and landmark error LM): initial configuration err ≈ 100%, LM ≈ 8.1; $y_{\mathrm{TPS}}$: err ≈ 9%, LM ≈ $10^{-14}$; $y_{\mathrm{un}}$: err ≈ 0.6%, LM ≈ 0.68; $y_{\mathrm{con}}$: err ≈ 0.4%, LM ≈ $10^{-14}$.
Fig. 5. 3D registration of CT and US. (a) Reference $\mathcal{R}$ with landmarks (black balls) (top left); (b) template $\mathcal{T}$ with landmarks (top right); (c) reference $\mathcal{R}$ and deformed template $\mathcal{T}[y_{\mathrm{TPS}}]$ after landmark registration (bottom left); (d) reference $\mathcal{R}$ and deformed template $\mathcal{T}[y_{\mathrm{con}}]$ after constrained registration (bottom right)
5 Conclusions
The paper presents a variational framework for the landmark-constrained registration problem and a discretize-then-optimize approach for computing a numerical solution. A difficulty for the multilevel discretization is that the number of constraints is constant while the number of degrees of freedom varies. In particular, for a coarse discretization, inconsistent constraints are to be expected. This paper provides a technique to overcome this problem by combining a landmark component and an update component, which results in compatible constraints. Moreover, it is shown how to efficiently compute a stable, orthogonal, and sparse basis for the constraint manifold, thus enabling a reduced-space optimization that avoids saddle-point problems.
References
1. Glasbey, C.: A review of image warping methods. Journal of Applied Statistics 25, 155–171 (1998)
2. Pluim, J., Maintz, J., Viergever, M.: Mutual-information-based registration of medical images: a survey. IEEE Transactions on Medical Imaging 22, 986–1004 (1999)
3. Hajnal, J., Hawkes, D., Hill, D.: Medical Image Registration. CRC Press, Boca Raton (2001)
4. Modersitzki, J.: Numerical Methods for Image Registration. Oxford University Press, Oxford (2004)
5. Goshtasby, A.A.: 2-D and 3-D Image Registration. Wiley Press, New York (2005)
6. Joshi, A., Shattuck, D., Thompson, P.: Brain image registration using cortically constrained harmonic mappings. In: Karssemeijer, N., Lelieveldt, B. (eds.) IPMI 2007. LNCS, vol. 4584, pp. 359–371. Springer, Heidelberg (2007)
7. Grady, L.: A lattice-preserving multigrid method for solving the inhomogeneous Poisson equations used in image analysis. In: Forsyth, D.A., Torr, P.H.S., Zisserman, A. (eds.) Scale Space and Variational Methods in Computer Vision, SSVM, ECCV (2008)
8. Koestler, H.: A Multigrid Framework for Variational Approaches in Medical Image Processing and Computer Vision. Ph.D. dissertation, University of Erlangen, Netherland (2008)
9. Keller, S., Lauze, F., Nielsen, M.: Motion compensated video super resolution. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 801–812. Springer, Heidelberg (2007)
10. Hadamard, J.: Sur les problèmes aux dérivées partielles et leur signification physique, pp. 49–52. Princeton University Bulletin, Princeton (1902)
11. Weickert, J., Schnörr, C.: A theoretical framework for convex regularizers in PDE-based computation of image motion. Int. J. Computer Vision 45(3), 245–264 (2001)
12. Hinterberger, W., Scherzer, O., Schnörr, C., Weickert, J.: Analysis of optical flow models in the framework of calculus of variations. Num. Funct. Anal. Opt. 23, 69–82 (2002)
13. Droske, M., Rumpf, M.: A variational approach to non-rigid morphological registration. SIAM Appl. Math. 64(2), 668–687 (2004)
14. Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
15. Maurer, C.R., Fitzpatrick, J.M.: A Review of Medical Image Registration. In: Interactive Image-Guided Neurosurgery, pp. 17–44. American Association of Neurological Surgeons, Park Ridge, IL (1993)
16. Rohr, K.: Landmark-based Image Analysis. Computational Imaging and Vision. Kluwer Academic Publishers, Dordrecht (2001)
17. Fischer, B., Modersitzki, J.: Combining landmark and intensity driven registrations. PAMM 3, 32–35 (2003)
18. Ashburner, J., Friston, K.: Spatial normalization using basis functions. In: Frackowiak, R., Friston, K., Frith, C., Dolan, R., Friston, K., Price, C., Zeki, S., Ashburner, J., Penny, W. (eds.) Human Brain Function, 2nd edn. Academic Press, London (2003)
19. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, New York (1999)
20. Collignon, A., Vandermeulen, A., Suetens, P., Marchal, G.: 3D multi-modality medical image registration based on information theory. Computational Imaging and Vision 3, 263–274 (1995)
21. Viola, P.A.: Alignment by Maximization of Mutual Information. PhD thesis, Massachusetts Institute of Technology (1995)
22. Clarenz, U., Droske, M., Rumpf, M.: Towards fast non-rigid registration. In: Inverse Problems, Image Analysis and Medical Imaging, AMS Special Session Interaction of Inverse Problems and Image Analysis, vol. 313, pp. 67–84. AMS (2002)
23. Haber, E., Modersitzki, J.: Intensity gradient based registration and fusion of multi-modal images. Methods of Information in Medicine 46(3), 292–299 (2007)
24. Fischer, B., Modersitzki, J.: Fast curvature based registration of MR-mammography images. In: Meiler, M., et al. (eds.) Bildverarbeitung für die Medizin, pp. 139–143. Springer, Heidelberg (2002)
25. Fischer, B., Modersitzki, J.: A unified approach to fast image registration and a new curvature based registration technique. Linear Algebra and its Applications 380, 107–124 (2004)
26. Light, W.A.: Variational methods for interpolation, particularly by radial basis functions. In: Griffiths, D., Watson, G. (eds.) Numerical Analysis 1995, pp. 94–106. Longmans, London (1996)
27. Haber, E., Modersitzki, J.: A multilevel method for image registration. SIAM J. Sci. Comput. 27(5), 1594–1607 (2006)
28. Golub, G.H., van Loan, C.F.: Matrix Computations, 3rd edn. The Johns Hopkins University Press, Baltimore (2000)
29. Barrett, R., Berry, M., Chan, T.F., Demmel, J.W., Donato, J., Dongarra, J., Eijkhout, V., Pozo, R., Romine, C., van der Vorst, H.: Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd edn. SIAM, Philadelphia (1994)
30. Trottenberg, U., Oosterlee, C., Schuller, A.: Multigrid. Academic Press, London (2001)
A Variational Approach for Volume-to-Slice Registration
Stefan Heldmann and Nils Papenberg
Institute of Mathematics, University of Lübeck, Germany
{heldmann,papenber}@math.uni-luebeck.de
Abstract. In this work we present a new variational approach for image registration where part of the data is only known on a low-dimensional manifold. Our work is motivated by navigated liver surgery, where 3D volumetric CT data has to be registered to tracked 2D ultrasound (US) slices. The particular problem is that the set of all US slices does not cover a full 3D domain. Other approaches use so-called compounding techniques to interpolate a 3D volume from the scattered slices. Instead of inventing new data by interpolation, we only use the given data. Our variational formulation of the problem is based on a standard approach: we minimize a joint functional made up of a distance term and a regularizer with respect to a 3D spatial deformation field. In contrast to existing methods, we evaluate the distance of the images only on the two-dimensional manifold where the data is known. A crucial point here is regularization. To avoid kinks and to achieve a smooth deformation, it turns out that at least second-order regularization is needed. Our numerical method is based on Newton-type optimization. We present a detailed discretization and give some examples demonstrating the influence of regularization. Finally, we show results for clinical data.
1 Introduction
In this paper we describe a new method for the registration of volumetric images to data that is given only on a low-dimensional submanifold. The work is motivated by a clinical problem: improved resection of tissue by pre-operative intervention planning in liver surgery [1, 2]. Before an intervention, extensive planning, including the definition of surgical paths and risk analysis, is carried out. The planning is based on abdominal CT scans of the patient and subsequent segmentation of the liver, liver segments, and vessels, cf. Fig. 1(a). During the intervention, the surgeon is guided by tracked ultrasound (US) images of the liver. Consequently, the pre-operative CT planning data has to be aligned to the actual deformation of the liver given by the US data. A challenge in laparoscopic liver surgery is that the US data is recorded as a sequence of two-dimensional slices in 3-space. Although the spatial ordering of the slices follows the scan path, they are not aligned, and in general each slice can have an arbitrary position, cf. Fig. 1(b).
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 624–635, 2009.
© Springer-Verlag Berlin Heidelberg 2009
Fig. 1. Clinical image data; (a) pre-operative CT planning data (a few slices out of the volume and segmentation of the liver); (b) a few US slices from a single scan
One approach to the registration of a CT volume and US slices is to use so-called compounding techniques. In a first step, the US slice data is compounded into a volume by interpolation, and subsequently standard volumetric image registration is applied. However, compounding has several drawbacks [3, 4, 5], and practical experiments showed that this approach performs poorly for registration and did not produce reasonable results. Besides the poor performance, matching volumetric CT data to artificially generated volumetric US data does not give the surgeon confidence in the registration results. Here, we take a different approach and compare the volumetric data directly to the given slice data. We use a variational setting for image registration, i.e., we minimize a cost functional consisting of a so-called distance measure and a regularizer with respect to a volumetric deformation. Here, the regularizer is an integral over a $d$-dimensional domain, while the distance is an integral over a $(d-1)$-dimensional manifold. Although this seems to be a slight modification, it turns out that higher-order regularization is necessary to ensure smooth and differentiable deformations. In this work we provide a proof of concept for our new approach. Therefore, we consider a simplified monomodal setting, i.e., we assume that the volumetric and the slice data stem from the same type of imaging device. Without loss of generality, this allows us to describe our method using the easy-to-present sum-of-squared-differences (SSD) distance measure. The paper is organized as follows. First, we present our variational approach to image registration and the novel distance measure. Next, we discuss the need for higher-order regularization. In Sect. 4 we present a numerical scheme, and subsequently we discuss our specific discretization of the distance measure and the regularizer in detail. Finally, in Sect. 5 we demonstrate the method with a synthesized clinical example.
2 Approach
In general, we are given two images, a so-called reference $R:\mathbb{R}^d\to\mathbb{R}$ and a so-called template $T:\mathbb{R}^d\to\mathbb{R}$. The goal of image registration is to find a smooth deformation $y:\Omega\to\mathbb{R}^d$ that spatially aligns the images best on a domain of interest $\Omega\subset\mathbb{R}^d$. Typically, $\Omega$ is a rectangular domain. Mathematically, we formulate image registration as an optimization problem [6]. That is, we want to compute a solution $y$ of
$$\min_y\ \mathcal{J}(y) := \mathcal{D}(R, T(y)) + \alpha\,\mathcal{S}(y), \tag{1}$$
where $T(y)$ denotes the composition $T\circ y$. The first term $\mathcal{D}$ of the objective function is a so-called distance measure that quantifies the similarity between the reference $R$ and the deformed template $T(y)$. The second building block $\mathcal{S}$ is a regularizer enforcing smoothness of the solution, where $\alpha > 0$ is a fixed parameter. Typically, $\mathcal{S}$ has the form [7]
$$\mathcal{S}(y) := \frac12\|\mathcal{B}y\|^2_{L^2(\Omega)}, \tag{2}$$
where $\mathcal{B}$ is a linear differential operator. The particular difficulty in our case is that the template is a volumetric image, while the reference is only known on a few scattered slices. As mentioned in the introduction, one can use compounding techniques to generate an artificial volume and subsequently use a standard distance measure that relies on comparing two images of the same dimension. We propose a different method. The idea of our new approach is to use only the given data rather than guessing the missing parts of the reference. To make the idea clear, in the following we assume that the distance measure is the so-called sum of squared differences (SSD) [8], i.e., $\mathcal{D}$ is the squared $L^2$ norm of the difference of the images. This is no loss of generality: the proposed modification applies to other distance measures such as mutual information [9, 10], too, which is more suitable for multimodal registration of CT and US data. Since the goal of this paper is a proof of concept and an outline of the general method, we use the SSD distance measure for ease of presentation. The standard SSD for $d$-dimensional images is given by
$$\mathrm{SSD}(R,T) = \frac12\int_\Omega\big(T(x) - R(x)\big)^2\,dx. \tag{3}$$
In our approach, we assume that the reference is given only on a few planes in $\Omega$. More generally, we assume that $R$ is known only on a set of smooth and bounded $(d-1)$-dimensional submanifolds $\mathcal{M}_j\subset\Omega$, $j = 1,\dots,m$. Therefore, we modify (3) and define our distance measure by
$$\mathcal{D}(R,T) := \frac12\sum_{j=1}^{m}\int_{\mathcal{M}_j}\big(T(x) - R(x)\big)^2\,dS(x), \tag{4}$$
where $dS$ is the $(d-1)$-dimensional surface measure. Note that in the particular case when the $\mathcal{M}_j$ are slices, our modified distance reduces to a sum of SSD distances of $(d-1)$-dimensional images, similar to serial registration. In this particular case, we can parametrize $\mathcal{M}_j$ by linear maps $\tau_j$ with Gram determinant $\det(D\tau_j^\top D\tau_j) = 1$, where $D\tau_j$ denotes the Jacobian matrix of $\tau_j$, such that
$$\mathcal{D}(R,T) = \sum_{j=1}^{m}\mathrm{SSD}(R_j, T_j)\quad\text{with}\quad R_j := R\circ\tau_j,\quad T_j := T\circ\tau_j.$$
Although changing the domain of integration in the distance measure seems a slight modification of problem (1), it turns out that regularization becomes crucial and needs to be chosen carefully. Since the data is now only given on a low-dimensional manifold, the solution is strongly influenced by the full-space regularization. It turns out that first-order regularization, e.g., choosing $\mathcal{B} = \nabla$ in (2), produces non-differentiable solutions with kinks at the boundary of the manifold, cf. Fig. 2(e) and (h). In contrast, second-order regularization, e.g., setting $\mathcal{B} = \Delta$, where $\Delta$ denotes the vector Laplacian, produces smooth results, cf. Fig. 2(f) and (i). In Sect. 3 we analyze this behavior by considering a simplified quadratic functional. In general, the order of regularization needed to ensure differentiability depends on the space dimension. However, the analysis in Sect. 3 shows that second-order regularization is sufficient for space dimensions $d = 2$ and $d = 3$. As a result, we propose using the curvature regularizer, i.e., setting $\mathcal{B} = \Delta$. Summarizing, for volume-to-slice registration we consider problem (1) with the distance measure (4) and the regularizer (2) with $\mathcal{B} = \Delta$. Thus, our approach is
$$\min_y\ \frac12\sum_{j=1}^{m}\int_{\mathcal{M}_j}\big(T(y(x)) - R(x)\big)^2\,dS(x) + \frac{\alpha}{2}\int_\Omega|\Delta y|^2\,dx. \tag{5}$$
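To make the distance term of (5) concrete for planar slices, the following NumPy sketch samples each slice on a midpoint grid of a parametrization $\tau_j(s) = o_j + s_1e_1 + s_2e_2$ with orthonormal $e_1, e_2$ (so the Gram determinant is 1) and applies the midpoint rule. The images and the deformation are passed as Python callables, interpolation of discrete image data is left out, and the names `plane_points` and `slice_distance` are hypothetical.

```python
import numpy as np

def plane_points(origin, e1, e2, n1, n2, hs):
    """Midpoints of an n1 x n2 grid (spacing hs) on the plane tau(s) = origin + s1*e1 + s2*e2."""
    e1, e2 = np.asarray(e1, dtype=float), np.asarray(e2, dtype=float)
    D = np.column_stack([e1, e2])                           # Jacobian of tau (3 x 2)
    assert np.isclose(np.linalg.det(D.T @ D), 1.0), "Gram determinant must be 1"
    s1 = (np.arange(n1) + 0.5) * hs
    s2 = (np.arange(n2) + 0.5) * hs
    S1, S2 = np.meshgrid(s1, s2, indexing="ij")
    return np.asarray(origin, dtype=float) + S1[..., None] * e1 + S2[..., None] * e2

def slice_distance(T, R, y, slices, hs):
    """Midpoint rule for 0.5 * sum_j int_{M_j} (T(y(x)) - R(x))^2 dS(x).

    T, R, y are callables acting on arrays of points with shape (..., 3);
    each entry of `slices` is an array of slice points from plane_points.
    """
    total = 0.0
    for X in slices:
        diff = T(y(X)) - R(X)
        total += 0.5 * hs**2 * float(np.sum(diff**2))
    return total
```

Only points on the slices enter the distance; the deformation $y$ is still defined on all of $\Omega$, which is why the full-space regularizer in (5) determines its behavior away from the slices.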
3 Regularization
In the following, we motivate second-order, so-called curvature regularization [11, 12], i.e., the choice $\mathcal{B} = \Delta$. The resulting functional for the registration (cf. (5)) is highly nonlinear and in general nonconvex, which makes an analysis difficult and involved. To illustrate the main point on regularization, we consider the simplified quadratic problem
$$\min_y\ \frac12\|\mathcal{B}y\|^2_{L^2(\Omega)} + \int_{\mathcal{M}}g\,y\,dS, \tag{6}$$
where $\Omega\subset\mathbb{R}^d$ is a domain with smooth (Lipschitz) boundary, $\mathcal{M}\subset\Omega$ is a smooth $(d-1)$-dimensional manifold, and $g\in L^2(\mathcal{M})$. Without loss of generality, we assume that locally coordinates can be chosen such that $\mathcal{M} = \{x\in\Omega : x_d = 0\}$. Then we can define a distribution $f$ as the product of $g$ and a Dirac delta distribution, i.e., $f = g\,\delta_{x_d}$, such that
$$\int_\Omega f\,y\,dx = \int_\Omega g\,\delta_{x_d}\,y\,dx = \int_{\mathcal{M}}g\,y\,dS. \tag{7}$$
628
S. Heldmann and N. Papenberg
Fig. 2. Volume-to-slice registration results for academic 2D (a)–(f) and 3D (g)–(i) experiments. (a) Template image and 1D manifold (vertical line); (d) original reference that is compared to the template on the 1D manifold (vertical line); (b)+(e) deformed template (a) and deformation for first-order regularization (B = ∇); (c)+(f) deformed template (a) and deformation for second-order regularization (B = Δ); (g) surface of the 3D template (elongated bar) and three orthogonal 2D manifolds with reference data taken from a big cuboid; (h) deformed template for first-order regularization (B = ∇); (i) deformed template for second-order regularization (B = Δ).
Furthermore, we assume that $g \neq 0$, i.e., $\|g\|_{L^2(M)} \neq 0$. Computing the Euler-Lagrange equations in weak form shows that a necessary condition for a minimizer is
$$Ay = f, \tag{8}$$
where $A := -B^*B$ and $B^*$ denotes the adjoint of B.
A Variational Approach for Volume-to-Slice Registration
629
The right-hand side f belongs to the space $H^{-1}(\Omega)$, but clearly $f \notin L^2(\Omega) = H^0(\Omega)$, where $H^{-1}(\Omega)$ denotes the dual space of $H^1(\Omega)$ and $H^m(\Omega)$ is the Sobolev space of all m-times weakly differentiable functions [13, §3]. Now we discuss two different choices for the regularizer B: first, the first-order so-called diffusive regularization [14] with B = ∇, and second, the second-order curvature regularization with B = Δ. In the first case, B = ∇ yields $B^* = -\nabla\cdot$ and hence A = Δ is a second-order differential operator. Since the right-hand side f belongs to $H^{-1}(\Omega)\setminus H^0(\Omega)$, a solution of (8) must lie in $H^{-1+2}(\Omega)\setminus H^{0+2}(\Omega) = H^1(\Omega)\setminus H^2(\Omega)$ (cf. [15, §8]). By the embedding $H^k(\Omega) \subset C^m(\Omega)$ for $m < k - d/2$, this shows that for d > 1 a solution cannot be differentiable [13, §6]. Applying the same logic in the second case, B = Δ, we find $B^* = \Delta$ (the Laplacian is self-adjoint), yielding the fourth-order differential operator $A = -\Delta^2$. Therefore, a weak solution y of (8) has to satisfy $y \in H^3(\Omega)\setminus H^4(\Omega)$. Hence, if d < 4 then $y \in C^1(\Omega)$, so that a solution is continuously differentiable for d = 2, 3.
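The regularity argument can be illustrated in one dimension, where the manifold reduces to the single point x = 0 and f = g δ. The following Python sketch is our own illustration, not part of the method, and is only an analogy to the d = 2, 3 statements above: it minimizes a finite-difference version of (6) on (−1, 1) with u(±1) = 0, once with B = d/dx and once with B = d²/dx², and measures the jump of the one-sided slopes at x = 0. The function names `solve_point_load` and `slope_jump`, the grid size and the load g = 1 are hypothetical choices.

```python
import numpy as np

def solve_point_load(order, n=201, g=1.0):
    """Discrete minimiser of 0.5*int_{-1}^{1} |B u|^2 dx + g*u(0) with u(-1) = u(1) = 0.

    order=1: B = d/dx (diffusive); order=2: B = d^2/dx^2 (curvature).
    """
    x = np.linspace(-1.0, 1.0, n)
    h = x[1] - x[0]
    m = n - 2                                  # unknowns u_1 .. u_{n-2}
    if order == 1:
        # forward differences (u_{i+1} - u_i) / h, columns of the fixed end values dropped
        D = (np.eye(n - 1, n, k=1) - np.eye(n - 1, n))[:, 1:-1] / h
    else:
        # second differences (u_{i-1} - 2 u_i + u_{i+1}) / h^2 at the interior nodes
        D = (np.eye(m, n) - 2.0 * np.eye(m, n, k=1) + np.eye(m, n, k=2))[:, 1:-1] / h**2
    A = h * D.T @ D                            # Hessian of the Riemann-sum regulariser
    rhs = np.zeros(m)
    rhs[(n - 1) // 2 - 1] = -g                 # point load g * u(0) acts at x = 0
    u = np.zeros(n)
    u[1:-1] = np.linalg.solve(A, rhs)
    return x, u, h

def slope_jump(x, u, h):
    """Right minus left one-sided slope at x = 0; a kink shows up as an O(1) value."""
    c = (len(x) - 1) // 2
    return (u[c + 1] - u[c]) / h - (u[c] - u[c - 1]) / h

for order in (1, 2):
    print(order, slope_jump(*solve_point_load(order)))
```

For B = d/dx the slope jump equals the load g (the tent-shaped Green's function has a kink), whereas for B = d²/dx² it is of size O(h) and vanishes under grid refinement, in line with the smooth results obtained with curvature regularization.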
4 Numerical Method
In this section we describe our approach to computing a numerical solution of the volume-to-slice registration problem (5). We follow the first-discretize-then-optimize paradigm, i.e., we discretize the functional and subsequently apply Gauss-Newton optimization. We start by explaining our discretization. In the following we describe the discretization for the three-dimensional case, d = 3. That is, the domain of interest Ω is a subset of ℝ³ and the $M_j$ are two-dimensional manifolds. We assume that the domain of interest is rectangular, i.e.,
$$\Omega = (a_1, b_1)\times(a_2, b_2)\times(a_3, b_3) \quad\text{with } -\infty < a_i < b_i < \infty,\ i = 1, 2, 3,$$
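and the $M_j$ are rectangular slices. For simplicity we assume that all slices $M_j$ are parametrized over the same parameter space Θ such that
$$M_j = \{x = \tau_j(t) : t \in \Theta\} \quad\text{and}\quad \Theta := (0, \theta_1)\times(0, \theta_2),$$
with parametrizations $\tau_j : \Theta \subset \mathbb{R}^2 \to M_j \subset \mathbb{R}^3$ given by
$$\tau_j(t) := Q_j t + b_j, \qquad Q_j \in \mathbb{R}^{3\times 2} \text{ such that } Q_j^\top Q_j = I \text{ and } b_j \in \mathbb{R}^3. \tag{9}$$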
Note that the condition $Q_j^\top Q_j = I$ implies $\det(D\tau_j^\top D\tau_j) = 1$, where $D\tau_j$ denotes the Jacobian matrix of $\tau_j$. This property simplifies computing the integrals on the manifolds and will be used later. We start with the discretization of the deformation and the distance measure. Subsequently, we describe the discretization of the regularizer.
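As a small check of the property behind (9), the following Python sketch (an illustration only; `make_slice` and `tau` are hypothetical helper names, and the spanning vectors are arbitrary) builds $Q_j$ by orthonormalizing two spanning vectors with a reduced QR factorization and confirms that the Gram determinant $\det(D\tau_j^\top D\tau_j)$ equals 1, so the parametrization preserves lengths and areas.

```python
import numpy as np

def make_slice(u, v, b):
    """Hypothetical helper: build tau(t) = Q t + b from two spanning vectors u, v.

    np.linalg.qr (reduced mode) orthonormalises the columns, so the 3x2 Q has Q^T Q = I.
    """
    Q, _ = np.linalg.qr(np.column_stack([u, v]))
    return Q, np.asarray(b, dtype=float)

def tau(Q, b, t):
    """Map parameter points t of shape (K, 2) to slice points of shape (K, 3)."""
    return np.asarray(t, dtype=float) @ Q.T + b

Q, b = make_slice([1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [0.5, 0.2, 0.1])
# D tau_j = Q_j is constant; the square root of this determinant is the area element dS/dt
gram_det = np.linalg.det(Q.T @ Q)
print(gram_det)
```

Any slice orientation can be obtained this way, since only the span of the two input vectors matters; the sign conventions of the QR factors do not affect the determinant.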
Discretization of the Deformation. We use a nodal discretization for the deformation y on Ω. Therefore, we introduce a uniform grid composed of $n_1 \times n_2 \times n_3$ cells with grid spacing $h = \bigl(\frac{b_1-a_1}{n_1}, \frac{b_2-a_2}{n_2}, \frac{b_3-a_3}{n_3}\bigr)$ and nodal grid points
$$\Omega^h := \bigl\{x_k = x_0 + k \odot h : k \in \{0,\dots,n_1\}\times\{0,\dots,n_2\}\times\{0,\dots,n_3\}\bigr\},$$
where $x_0 = (a_1, a_2, a_3)$ and ⊙ denotes the Hadamard (point-wise) product of two vectors. Then we collect the values $y(x_k) \in \mathbb{R}^3$ of the deformation at all $N = (n_1+1)(n_2+1)(n_3+1)$ nodal grid points $x_k \in \Omega^h$ in a grid function, i.e., a vector $y^h \in \mathbb{R}^{3N}$.
Discretization of the Distance Measure. Now we turn to the discretization of the distance measure. Recall that it was defined as
$$D(R, T(y)) = \frac12\sum_{j=1}^m \int_{M_j}\bigl(T(y(x)) - R(x)\bigr)^2\,dS(x).$$
For an approximation of the integrals on $M_j$ we start by discretizing the parameter space Θ. Therefore, we define
$$\Theta^h := \Bigl\{t_k = k \odot \bar h - \tfrac{\bar h}{2} : k \in \{1,\dots,p_1\}\times\{1,\dots,p_2\}\Bigr\} \quad\text{with}\quad \bar h = \Bigl(\frac{\theta_1}{p_1}, \frac{\theta_2}{p_2}\Bigr),$$
such that $\Theta^h$ contains the cell centers of a regular discretization by $p_1 \times p_2$ cells. Consequently, we discretize $M_j$ by $M_j^h := \{m_k = \tau_j(t_k) : t_k \in \Theta^h\}$. Note that we have two different grid spacings, h and $\bar h$, for the discretization of the deformation y on Ω and the discretization of the manifolds $M_j$, respectively.
t2
τj x
t1 cell-centered discretization Θh of the parameter-space
z nodal discretization Ω h of the deformation (gray) with cell-centered discretiza tion Mhj of the manifold (black)
Fig. 3. Schematic overview of the discretization of the parameter space Θ (left) and of a manifold M_j and the domain Ω (right)
A schematic overview of the different discretizations $\Theta^h$, $M_j^h$, and $\Omega^h$ is shown in Fig. 3. Using the common mid-point rule for the discretization of an integral over $M_j$ we obtain
$$\begin{aligned}
\int_{M_j}\bigl(T(y(x)) - R(x)\bigr)^2\,dS(x) &= \int_\Theta \bigl(T(y(\tau_j(t))) - R(\tau_j(t))\bigr)^2 \sqrt{\det(D\tau_j^\top D\tau_j)}\,dt\\
&= \int_\Theta \bigl(T(y(\tau_j(t))) - R(\tau_j(t))\bigr)^2\,dt\\
&\approx \bar h_1 \bar h_2 \sum_{t_k\in\Theta^h} \bigl(T(y(\tau_j(t_k))) - R(\tau_j(t_k))\bigr)^2\\
&= \bar h_1 \bar h_2 \sum_{m_k\in M_j^h} \bigl(T(y(m_k)) - R(m_k)\bigr)^2,
\end{aligned}$$
where we used the orthonormality of the columns of the Jacobian matrix $D\tau_j$, cf. (9). For short notation, analogous to the deformation, we collect the $M = p_1 p_2$ grid points of $M_j^h$ in a vector $m_j^h \in \mathbb{R}^{3M}$. With some abuse of notation, let $R_j^h := R(m_j^h) \in \mathbb{R}^M$ be the values of the reference R on $M_j^h$ and, analogously, let $T(y(m_j^h))$ be the values of T(y), such that
$$\|T(y(m_j^h)) - R_j^h\|_2^2 = \sum_{m_k\in M_j^h}\bigl(T(y(m_k)) - R(m_k)\bigr)^2.$$
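The mid-point rule on a slice can be sketched as follows in Python. This is a simplified illustration under assumptions: the deformation is taken as the identity, so T(y(m_k)) becomes a plain callable evaluated at the points m_k, and `cell_centers` and `slice_ssd` are hypothetical helper names. Because $Q_j$ has orthonormal columns, the quadrature weight is just $\bar h_1 \bar h_2$.

```python
import numpy as np

def cell_centers(theta, p):
    """Theta^h: cell centres t_k = k*hbar - hbar/2, k = 1..p, on (0, theta1) x (0, theta2)."""
    hbar = np.asarray(theta, dtype=float) / np.asarray(p)
    t1 = (np.arange(1, p[0] + 1) - 0.5) * hbar[0]
    t2 = (np.arange(1, p[1] + 1) - 0.5) * hbar[1]
    T1, T2 = np.meshgrid(t1, t2, indexing="ij")
    return np.stack([T1.ravel(), T2.ravel()], axis=1), hbar

def slice_ssd(template, reference, Q, b, theta, p):
    """Mid-point rule for 0.5 * int_{M_j} (T(x) - R(x))^2 dS, using Q^T Q = I (no area factor)."""
    t, hbar = cell_centers(theta, p)
    m = t @ Q.T + b                   # M_j^h = tau_j(Theta^h), shape (p1*p2, 3)
    r = template(m) - reference(m)
    return 0.5 * hbar[0] * hbar[1] * np.sum(r**2)
```

For a constant unit residual on the slice (0, 2) × (0, 3) the sum returns, up to rounding, half the slice area, independently of $p_1$ and $p_2$.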
As we can see, this approximation involves values of the deformation y at points $m_k \in M_j^h$, which in general are not grid points of our nodal discretization $\Omega^h$. To this end, we approximate the values $y(m_k)$ for $m_k \in M_j^h$ by interpolation of the nodal grid function $y^h$, i.e.,
$$y(m_k) \approx \sum_{i=1}^{3N} \xi_i\, y_i^h \quad\text{for } m_k \in M_j^h.$$
In particular, we use linear interpolation, such that in fact only 8 coefficients per point are involved. Collecting all interpolation weights $\xi_i$ for each point $m_k \in M_j^h$ in a $3M \times 3N$ matrix $P_j$, we have $T(P_j y^h) \approx T(y(m_j^h))$. Summarizing, we approximate the distance measure by
$$D(R, T(y)) = \frac12\sum_{j=1}^m\int_{M_j}\bigl(T(y(x)) - R(x)\bigr)^2\,dS(x) \approx \frac12\sum_{j=1}^m \bar h_1 \bar h_2\,\|T(P_j y^h) - R_j^h\|_2^2.$$
Setting $R^h = (R_1^h, \dots, R_m^h) \in \mathbb{R}^{Mm}$ and stacking the interpolation matrices, $P = (P_1^\top, \dots, P_m^\top)^\top \in \mathbb{R}^{3Mm\times 3N}$, we obtain a concise formulation of a discrete version of D(R, T(y)) given by
$$D(y^h) := \frac{\bar h_1 \bar h_2}{2}\,\|T(Py^h) - R^h\|_2^2. \tag{10}$$
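Linear interpolation on the nodal grid can be sketched as follows in Python, with dense matrices for readability; a practical implementation would store $P_j$ sparsely. The helper names are hypothetical, the nodes are assumed to be ordered with the first index running fastest, and the matrix is built for one component only. For $y^h \in \mathbb{R}^{3N}$ the same weights would be applied to each of the three components, in an arrangement that depends on how $y^h$ is ordered. Trilinear weights are nonnegative, sum to one, and reproduce linear functions exactly.

```python
import numpy as np

def trilinear_row(m, x0, h, n):
    """Node indices and weights for trilinear interpolation at a point m inside the grid.

    Nodes are x_k = x0 + k * h, k in {0..n1} x {0..n2} x {0..n3}, ordered with k1 fastest.
    """
    n = np.asarray(n)
    s = (np.asarray(m, dtype=float) - x0) / h           # position in cell units
    k = np.clip(np.floor(s).astype(int), 0, n - 1)      # lower corner of the containing cell
    w = s - k                                           # local coordinates in [0, 1]
    idx, wts = [], []
    for corner in np.ndindex(2, 2, 2):                  # the 8 corners of the cell
        c = np.array(corner)
        kk = k + c
        idx.append(kk[0] + (n[0] + 1) * (kk[1] + (n[1] + 1) * kk[2]))
        wts.append(np.prod(np.where(c == 1, w, 1.0 - w)))
    return np.array(idx), np.array(wts)

def interpolation_matrix(points, x0, h, n):
    """Dense (M x N) interpolation matrix for ONE component of the deformation."""
    N = int(np.prod(np.asarray(n) + 1))
    P = np.zeros((len(points), N))
    for row, m in enumerate(points):
        idx, wts = trilinear_row(m, x0, h, n)
        P[row, idx] = wts                               # the 8 corner indices are distinct
    return P
```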
Discretization of the Regularizer. For a discrete version of the curvature regularizer we use standard finite differences for approximating derivatives and the mid-point rule for approximating integrals. Recall that the curvature regularizer was defined as
$$S(y) = \frac12\|\Delta y\|_{L^2(\Omega)}^2 = \frac12\int_\Omega |\Delta y|^2\,dx.$$
In a first step we approximate the Laplacian based on the standard second-order seven-point formula, i.e., we define
$$\Delta^h y(x) := \sum_{\ell=1}^3 \frac{1}{h_\ell^2}\bigl(y(x - h_\ell e_\ell) - 2y(x) + y(x + h_\ell e_\ell)\bigr),$$
where $e_1, e_2, e_3$ are the unit vectors of ℝ³. Furthermore, let $B^h \in \mathbb{R}^{3N\times 3N}$ be its matrix representation, such that $B^h y^h$ is a second-order approximation to Δy at the nodal grid points in $\Omega^h$, and hence $(B^h y^h)\odot(B^h y^h)$ is a second-order approximation to $(\Delta y)^2$. Now, let $A_n^c \in \mathbb{R}^{n_1n_2n_3\times N}$ be a matrix that averages values from the nodes to the cell centers (applied to each of the three components), such that $A_n^c\bigl((B^h y^h)\odot(B^h y^h)\bigr)$ is a second-order approximation to $(\Delta y)^2$ at the cell centers. Thus, applying the mid-point rule for mesh size $h = (h_1, h_2, h_3)$, we obtain
$$h_1h_2h_3\, e^\top A_n^c\bigl((B^hy^h)\odot(B^hy^h)\bigr) \approx \int_\Omega |\Delta y|^2\,dx$$
with $e = (1, 1, \dots, 1)^\top \in \mathbb{R}^{n_1n_2n_3}$ the one-vector. Moreover, applying some linear algebra we find
$$e^\top A_n^c\bigl((B^hy^h)\odot(B^hy^h)\bigr) = e^\top A_n^c \operatorname{diag}(B^hy^h)\,B^hy^h = (y^h)^\top (B^h)^\top \operatorname{diag}(e^\top A_n^c)\,B^hy^h.$$
As a result, we define the discrete version of the curvature regularizer by
$$S(y^h) := \frac12\,(y^h)^\top A^h y^h$$
with the matrix $A^h := h_1h_2h_3\,(B^h)^\top \operatorname{diag}(e^\top A_n^c)\,B^h \in \mathbb{R}^{3N\times 3N}$.
Gauss-Newton Optimization. Having established discrete versions of the distance measure and the smoother, we now aim to solve
$$\min_{y^h}\; D(y^h) + \alpha S(y^h). \tag{11}$$
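The construction of $A^h$ described above can be sketched in Python with dense Kronecker products on a tiny grid. This is an illustration under assumptions: it treats one displacement component, orders nodes with $x_1$ fastest, and closes the 1D second differences at the boundary with a zero-flux rule, since the text does not specify the boundary treatment of $B^h$; all function names are hypothetical. Each 1D factor of $A_n^c$ averages the two end nodes of a cell, so $e^\top A_n^c$ assigns weight 1 to interior nodes and 1/2, 1/4, 1/8 to face, edge and corner nodes.

```python
import numpy as np

def second_diff(n_cells, h):
    """1D second-difference matrix on n_cells + 1 nodes.

    Boundary rows use a zero-flux closure; this is an assumption, not taken from the text.
    """
    n = n_cells + 1
    D = (np.eye(n, k=-1) - 2.0 * np.eye(n) + np.eye(n, k=1)) / h**2
    D[0, 0] = D[-1, -1] = -1.0 / h**2
    return D

def node_to_cell_1d(n_cells):
    """Average the two end nodes of each 1D cell: shape (n_cells, n_cells + 1)."""
    return 0.5 * (np.eye(n_cells, n_cells + 1) + np.eye(n_cells, n_cells + 1, k=1))

def curvature_matrix(ncells, h):
    """Return (A^h, B^h, A_n^c) for ONE component: A^h = h1 h2 h3 B^T diag(e^T A_n^c) B."""
    I = [np.eye(n + 1) for n in ncells]
    D = [second_diff(n, hl) for n, hl in zip(ncells, h)]
    # seven-point Laplacian as a Kronecker sum; the innermost factor acts on x1
    B = (np.kron(I[2], np.kron(I[1], D[0]))
         + np.kron(I[2], np.kron(D[1], I[0]))
         + np.kron(D[2], np.kron(I[1], I[0])))
    Acn = np.kron(node_to_cell_1d(ncells[2]),
                  np.kron(node_to_cell_1d(ncells[1]), node_to_cell_1d(ncells[0])))
    w = np.ones(Acn.shape[0]) @ Acn            # e^T A_n^c: per-node quadrature weights
    A = np.prod(h) * B.T @ np.diag(w) @ B
    return A, B, Acn
```

Evaluating $\frac12 (y^h)^\top A^h y^h$ and $\frac12 h_1h_2h_3\, e^\top A_n^c\bigl((B^hy^h)\odot(B^hy^h)\bigr)$ for a random $y^h$ gives the same number, which is exactly the linear-algebra identity used above.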
Clearly, (11) is not quadratic due to the non-linearity in the distance D. Therefore, we cannot compute a solution directly and have to rely on an iterative method. Here, we use a standard Gauss-Newton method [16]. In each iteration we solve a linear system of the type
$$Hs = -g \tag{12}$$
to compute an update s for the current iterate. Here g is the gradient $\nabla D + \alpha\nabla S$ of the objective function, given by
$$g = \bar h_1\bar h_2\, P^\top \nabla T^\top\bigl(T(Py^h) - R^h\bigr) + \alpha A^h y^h,$$
and H is an approximation to the Hessian $\nabla^2 D + \alpha\nabla^2 S$. Neglecting second-order terms in $\nabla^2 D$, we set
$$H := \bar h_1\bar h_2\, P^\top \nabla T^\top \nabla T\, P + \alpha A^h.$$
Thus, the approximate Hessian is a sparse symmetric positive definite matrix, such that we can apply a conjugate gradient (CG) method for solving the linear system (12). In our implementation we use CG with symmetric Gauss-Seidel relaxation as a preconditioner. In summary, this leads to an efficient numerical algorithm for computing a solution to the discrete volume-to-slice registration problem (11).
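A compact Python sketch of a Gauss-Newton loop with a CG solver preconditioned by symmetric Gauss-Seidel follows. It is not the authors' implementation: dense NumPy matrices stand in for the sparse ones, the residual r(y) and its Jacobian are generic callables (in the registration setting they would be $T(Py^h) - R^h$ and $\nabla T\,P$, both scaled by $\sqrt{\bar h_1\bar h_2}$), full steps are taken without a line search, and all function names are hypothetical.

```python
import numpy as np

def sgs_preconditioner(H):
    """Return r -> M^{-1} r for symmetric Gauss-Seidel, M = (D + L) D^{-1} (D + L)^T."""
    DL = np.tril(H)                          # D + L: lower triangle including the diagonal
    d = np.diag(H).copy()
    def apply(r):
        z = np.linalg.solve(DL, r)           # forward sweep:  (D + L) z = r
        return np.linalg.solve(DL.T, d * z)  # backward sweep: (D + U) x = D z
    return apply

def pcg(H, b, precond, tol=1e-10, maxiter=500):
    """Preconditioned conjugate gradients for a symmetric positive definite H."""
    x = np.zeros_like(b)
    if np.linalg.norm(b) == 0.0:
        return x
    r = b.copy()
    z = precond(r)
    p = z.copy()
    rz = r @ z
    for _ in range(maxiter):
        Hp = H @ p
        step = rz / (p @ Hp)
        x += step * p
        r -= step * Hp
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = precond(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

def gauss_newton(residual, jacobian, A, alpha, y, iters=20):
    """Minimise 0.5*||r(y)||^2 + 0.5*alpha*y^T A y with full Gauss-Newton steps."""
    for _ in range(iters):
        r, J = residual(y), jacobian(y)
        g = J.T @ r + alpha * (A @ y)        # gradient of the objective
        H = J.T @ J + alpha * A              # Hessian without second-order terms of r
        s = pcg(H, -g, sgs_preconditioner(H))
        y = y + s
    return y
```

On a small toy problem with the hypothetical residual r(y) = y + 0.1 y³ − R and a first-difference regularizer, this iteration drives the gradient to round-off level.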
5 Experiments
We demonstrate our method with an academic example on real liver data. To this end, we use a 238 × 155 × 156 ultrasound (US) volume captured by a 3D US scanner.
Fig. 4. 3D volume-to-slice registration results for clinical data. (a) 3D data (black) with five 2D reference slices; (b) 3D template (gray) with five reference slices; (c)+(d) 3D template (gray) and original data (black) before and after registration.
We simulate a typical ultrasound sweep by extracting a few 2D slices from the volume. Fig. 4(a) shows the setting for five slices, where we visualize the volumetric data by a surface rendering of the contained vessels. This slice data is used as reference. Subsequently, we apply an artificial non-linear deformation to the volume, which is then used as the template. Fig. 4(b) displays a surface rendering of the template together with the reference slice data. Based on the five reference slices and the volumetric template, we then performed a volume-to-slice registration. Fig. 4(c) and 4(d) show the 3D template vessels before and after registration together with the original vessels. Note that the original vessels served only to generate the reference slices and were not taken into account during registration. As we can see, we obtain an almost perfect alignment based on very little reference data (see Fig. 4(d)).
6 Conclusions
We described a new method for the registration of a d-dimensional template to (d − 1)-dimensional reference data, motivated by CT/US registration. A key observation is that high-order regularization is required to avoid unwanted, non-differentiable deformations. Furthermore, we described an efficient algorithm based on a Gauss-Newton optimization method. In a first experiment we successfully demonstrated our method for the registration of artificially deformed data, where we were able to almost recover the original deformation based only on very little reference data. These promising first results show that our approach works in principle. Clearly, the chosen SSD distance measure is not suitable for the target application of CT and US registration. However, our overall method is independent of a particular choice of the distance measure. An extension to other distance measures that can handle multi-modality, such as mutual information, is straightforward. Concluding, we have presented a novel scheme and proof of concept for a clinically relevant problem based on sound theory and efficient numerics. Future work includes the extension to a multi-modal setting for the registration of CT and US.
Acknowledgments We thank Dirk Langemann from the Institute of Mathematics at the University of Lübeck for his support on functional analysis. We also thank Thomas Lange from the Department of Surgery and Surgical Oncology at Charité - Universitätsmedizin Berlin for providing image data.
References
1. Fong, Y., Fortner, J., Sun, R., et al.: Clinical score for predicting recurrence after hepatic resection for metastatic colorectal cancer: analysis of 1001 consecutive cases. Ann. Surg. 230, 309–318 (1999)
2. Lang, H.: Technik der Leberresektion - Teil I. Chirurg 78(8), 761–774 (2007)
3. Barry, C., Allott, C., John, N., Mellor, P., Arundel, P., Thomson, D., Waterton, J.: Three-dimensional freehand ultrasound: Image reconstruction and volume analysis. Ultrasound in Medicine & Biology 23, 1209–1224 (1997)
4. Coupe, P., Azzabou, P.H.N., Barillot, C.: 3D freehand ultrasound reconstruction based on probe trajectory. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3749, pp. 597–604. Springer, Heidelberg (2005)
5. Rohling, R.: 3D Freehand Ultrasound: Reconstruction and Spatial Compounding. PhD thesis, Department of Engineering, University of Cambridge (1998)
6. Broit, C.: Optimal registration of deformed images. PhD thesis, Department of Computer and Information Science, University of Pennsylvania (1981)
7. Modersitzki, J.: Numerical Methods for Image Registration. Numerical Mathematics and Scientific Computation. Oxford University Press, Oxford (2003)
8. Brown, L.G.: A survey of image registration techniques. ACM Computing Surveys 24(4), 325–376 (1992)
9. Viola, P.A., Wells, W.M.I.: Alignment by maximization of mutual information. In: 5th International Conference on Computer Vision (1995)
10. Collignon, A., Maes, F., Vandermeulen, P., Suetens, P., Marchal, G.: Automated multi-modality image registration based on information theory. Information Processing in Medical Imaging (1995)
11. Fischer, B., Modersitzki, J.: Curvature based image registration. JMIV 18(1) (2003)
12. Fischer, B., Modersitzki, J.: Combining landmark and intensity driven registrations. PAMM 3, 32–35 (2003)
13. Wloka, J.: Partial Differential Equations. Cambridge University Press, Cambridge (1987)
14. Fischer, B., Modersitzki, J.: Fast diffusion registration. In: Nashed, M., Scherzer, O. (eds.) Inverse Problems, Image Analysis, and Medical Imaging. Contemporary Mathematics, vol. 313. AMS (2002)
15. Rudin, W.: Functional Analysis. McGraw-Hill, New York (1991)
16. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer Series in Operations Research. Springer, Heidelberg (1999)
Hyperbolic Numerics for Variational Approaches to Correspondence Problems
Henning Zimmer (1, 2), Michael Breuß (1), Joachim Weickert (1), and Hans-Peter Seidel (2)
1 Mathematical Image Analysis Group, Faculty of Mathematics and Computer Science, Building E1.1, Saarland University, 66041 Saarbrücken, Germany, {zimmer,breuss,weickert}@mia.uni-saarland.de
2 Max-Planck Institute for Informatics, Stuhlsatzenhausweg 85, 66123 Saarbrücken, Germany
Abstract. Variational approaches to correspondence problems such as stereo or optic flow have now been studied for more than 20 years. Nevertheless, little attention has been paid to a subtle numerical approximation of derivatives. In the area of numerics for hyperbolic partial differential equations (HDEs) it is, however, well known that such issues can be crucial for obtaining favourable results. In this paper we show that the use of hyperbolic numerics for variational approaches can lead to a significant quality gain in computational results. This improvement can be of the same order as that obtained by introducing better models. Applying our novel scheme within existing variational models for stereo reconstruction and optic flow, we show that this approach can be beneficial for all variational approaches to correspondence problems.
1 Introduction
Numerous tasks in the field of computer vision belong to the class of correspondence problems, where one has to match pixels of two or more images. Popular examples are stereo reconstruction and optic flow, which both amount to computing a displacement field between two images. In the stereo context, the absolute value of this field is called disparity and is needed to recover the depth information of a static scene. For optic flow, the displacement field is called the optic flow field and gives information about the dynamics of a moving scene. A successful class of techniques for solving correspondence problems like stereo or optic flow are variational approaches, which find the displacement field as the minimiser of a continuous energy functional. These methods have been studied for more than two decades, starting from the optic flow approach of Horn and Schunck [1]. During this period, much effort has been spent on improving the quality of the models [2, 3, 4, 5, 6, 7]. In order to apply these continuous models to sampled digital images and to solve the minimisation problem on a computer, one certainly has to discretise
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 636–647, 2009. © Springer-Verlag Berlin Heidelberg 2009
Hyperbolic Numerics
637
occurring image derivatives. This task obviously offers a certain degree of freedom in choosing a well-suited derivative approximation. Surprisingly, this issue has hardly been studied for variational approaches to correspondence problems. If the discretisation is discussed at all, most approaches use “standard” central finite difference approximations [3, 4, 5]. For variational approaches to image restoration, sophisticated approximation schemes have already been considered for a long time [8, 9]. Such schemes have also been thoroughly studied in the field of hyperbolic partial differential equations (HDEs) [10, 11], where one simulates the transport of liquids or gases. This results in a problem setting related to correspondence problems: given an initial density distribution (first image) and the velocity of transport (displacement), compute the density distribution at later times (second image). Note that the roles of known and unknown are switched compared to correspondence problems. In this paper we make use of this relation between HDEs and correspondence problems for the first time in the literature. In the style of numerical schemes for HDEs, we develop an adaptive discretisation scheme that decides, based on a smoothness measure, on a suitable approximation of image derivatives at each point. This scheme is then used within variational frameworks for stereo reconstruction and optic flow. Experiments show that this approach improves the quality of results by the same order of magnitude as can be achieved with model refinements. This paper is organised as follows: In Sect. 2 we investigate the importance of an appropriate approximation of image derivatives using the example of simple 1-D correspondence problems. Based on this, we develop the adaptive discretisation scheme, which is applied to stereo reconstruction and optic flow in Sect. 3 and Sect. 4, respectively. There we also show corresponding experiments.
The paper is then concluded by a summary and an outlook on future work in Sect. 5.
2 Hyperbolic Numerics for 1-D Variational Approaches
2.1 A Variational Approach for 1-D Correspondence Problems
For simplicity, let us consider a 1-D signal sequence f(x, t), where x ∈ Ω denotes the position in the signal domain Ω ⊂ ℝ and t ≥ 0 denotes time. In order to compute the unknown displacement function u(x) that gives the displacements from time t to t + 1, we minimise the energy functional
$$E(u) = \int_\Omega \bigl((f_x u + f_t)^2 + \alpha\,u_x^2\bigr)\,dx, \tag{1}$$
where subscripts denote partial derivatives. The term $(f_x u + f_t)^2$ is called the data term and models how well the displacement u matches the signal sequence f. We impose that the signal values are invariant under their displacement, i.e., f(x + u, t + 1) = f(x, t). Assuming that u is small and f sufficiently smooth, we can perform a linearisation that leads to the presented data term. Note that in the 1-D setting, the data term alone allows one to compute a solution $u = -f_t/f_x$ if $f_x \neq 0$. However, in 2-D this will no longer be the case.
638
H. Zimmer et al.
There, and also to obtain a solution in flat signal regions, the smoothness term $u_x^2$ is needed. By penalising large derivatives of u, it allows the displacement function to be filled in smoothly where the data term is not sufficient. Its contribution to the energy is steered by a smoothness weight α > 0. In order to actually compute a minimiser u of the energy (1), the calculus of variations states that u necessarily has to fulfil the Euler-Lagrange equation
$$f_x(f_x u + f_t) - \alpha\,u_{xx} = 0, \tag{2}$$
with homogeneous Neumann boundary conditions.
2.2 A Closer Look into Discretisation Issues
For solving the Euler-Lagrange equation (2) on a computer, we have to discretise the signal f, the displacement u and their derivatives $f_x$, $f_t$ and $u_{xx}$. Note that the image derivatives that occur in the Euler-Lagrange equation (2) are in general the same as in the linearised data term of the energy (1). Thus, the data term suffices to find out which derivatives have to be approximated. Let us start with the discretisation of the signals f and u. To this end we sample them on a spatio-temporal discrete grid, which yields the approximations $f(x_i, t_k) \approx f_i^k$ and $u(x_i) \approx u_i$, where $x_i := (i - \frac12)h$ and $t_k := k\tau$ for a spatial grid size h and a time step size τ. In this paper we only consider the two frames $f_i^k$ and $f_i^{k+1}$, assuming a temporal sampling of τ = 1.
Derivative Approximations. The discretisation of the occurring derivatives can be done in different ways. We use the popular concept of finite differences, as presented for example in [12]. As notation for the approximation of partial derivatives we use $f_d(x_i, t_k) \approx (f_d)_i^k$ to denote the corresponding finite difference discretisation.
I. Temporal Discretisation. For the time derivative we use the forward difference
$$(f_t)_i^k := \frac{1}{\tau}\bigl(f_i^{k+1} - f_i^k\bigr), \tag{3}$$
as this is the only reasonable choice, given $f_i^k$ and $f_i^{k+1}$.
II. Spatial Discretisation of First Order. The approximation of $f_x$ offers different possibilities for $(f_x)_i^k$. Basic choices are forward, backward and central differences:
$$\begin{aligned}
D_x^+ f_i^k &:= \tfrac{1}{h}\bigl(f_{i+1}^k - f_i^k\bigr), & D_x^+ f_i^{k+1} &:= \tfrac{1}{h}\bigl(f_{i+1}^{k+1} - f_i^{k+1}\bigr),\\
D_x^- f_i^k &:= \tfrac{1}{h}\bigl(f_i^k - f_{i-1}^k\bigr), & D_x^- f_i^{k+1} &:= \tfrac{1}{h}\bigl(f_i^{k+1} - f_{i-1}^{k+1}\bigr),\\
D_x^0 f_i^k &:= \tfrac{1}{2h}\bigl(f_{i+1}^k - f_{i-1}^k\bigr), & D_x^0 f_i^{k+1} &:= \tfrac{1}{2h}\bigl(f_{i+1}^{k+1} - f_{i-1}^{k+1}\bigr),
\end{aligned} \tag{4}$$
where $D^+$ denotes forward, $D^-$ backward and $D^0$ central differences, respectively, which can be computed at the time level k or k + 1.
Note that the approximation error of the one-sided differences (forward and backward) is in O(h), whereas their central counterparts only involve an error of O(h²). This, together with the unbiased stencil orientation, explains why central differences are a popular “standard” choice in image processing applications. To further reduce the approximation error, one may consider averaged differences, taking into account the time levels k and k + 1. In the remainder of this paper these will be referred to as the “standard” derivative approximation. They are given by
$$D_x^0 f_i^{k+\frac12} := \frac12\bigl(D_x^0 f_i^k + D_x^0 f_i^{k+1}\bigr) = \frac{1}{4h}\bigl(f_{i+1}^k - f_{i-1}^k + f_{i+1}^{k+1} - f_{i-1}^{k+1}\bigr). \tag{5}$$
III. Spatial Discretisation of Second Order. Finally, we have to approximate the second-order spatial derivative of the displacement function. As this choice is not crucial, we propose a simple central approximation
$$(u_{xx})_i := D_x^- D_x^+ u_i = \frac{1}{h^2}\bigl(u_{i+1} - 2u_i + u_{i-1}\bigr). \tag{6}$$
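The stated error orders can be checked numerically. The following Python sketch (the test function sin, the point x = 0.7 and the step sizes are our own choices) halves h and reports the error ratio, which should approach 2 for the one-sided differences in (4) and 4 for the central difference and for (6).

```python
import numpy as np

def d_plus(f, x, h):
    return (f(x + h) - f(x)) / h                     # forward difference, O(h)

def d_minus(f, x, h):
    return (f(x) - f(x - h)) / h                     # backward difference, O(h)

def d_central(f, x, h):
    return (f(x + h) - f(x - h)) / (2 * h)           # central difference, O(h^2)

def d2_central(f, x, h):
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2   # eq. (6), O(h^2)

x0 = 0.7
cases = {"forward": (d_plus, np.cos(x0)), "backward": (d_minus, np.cos(x0)),
         "central": (d_central, np.cos(x0)), "second": (d2_central, -np.sin(x0))}
ratios = {}
for name, (op, exact) in cases.items():
    e1 = abs(op(np.sin, x0, 1e-2) - exact)           # error with step h
    e2 = abs(op(np.sin, x0, 5e-3) - exact)           # error with step h/2
    ratios[name] = e1 / e2
    print(f"{name:8s} error ratio when halving h: {ratios[name]:.2f}")
```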
Why the Discretisation of $f_x$ Matters. To show that an appropriate choice of $(f_x)_i^k$ is crucial for computing reasonable displacements u, we conduct a small experiment: consider the two frames of a signal sequence in Fig. 1 (a). Here, the signal is displaced by one position to the right in its middle part and stays unchanged otherwise, which is also indicated by the ground truth displacement in Fig. 1 (b). Note that this example comprises smooth as well as discontinuous signal and displacement regions, which makes it rather representative. In Fig. 1 (c)–(e) we depict displacements computed using different discretisations for $f_x$. The displacements were obtained as the solution of a linear system of equations that arises from the discretised Euler-Lagrange equation (2). As the system matrix is tridiagonal, the system can be solved directly via the Thomas algorithm [13]. Further note that we set the smoothness weight to $\alpha = 10^{-4}$ to clearly see the influence of the data term, where $f_x$ occurs. When comparing the displacements in Fig. 1 (c)–(e), the large influence of the choice of $(f_x)_i^k$ becomes obvious: averaged central differences only perform well in the smooth signal regions at the left and right boundaries. At discontinuities they suffer from over- and undershoots. One-sided differences either perform favourably or fail totally; obviously, the correct orientation matters here. When using the “correct” one-sided differences, the displacement almost coincides with the ground truth, except at one point. This is, however, not due to the numerics but is caused by the occlusion at the jump in the displacement: the considered point at time level k does not possess a matching point at time level k + 1, so its displacement is undefined. In the ground truth, we assign to this point the displacement of its right neighbour. The observed behaviour in our experiment can be explained by the theory of HDEs [10, 11].
There, so-called upwind schemes are a widely used concept, where the signal derivatives are approximated by “correctly oriented” one-sided differences. In our case, the correct orientation is opposite to the displacement direction, as our experiment shows.
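The essence of this experiment can be mimicked in a few lines of Python. This sketch uses its own smooth test signal ($f_i^k = i^2$, shifted one sample to the right, h = τ = 1), not the signal of Fig. 1, and realises the homogeneous Neumann conditions through replicated ghost values; the helper names are hypothetical. It discretises (2) with (3) and (6), solves the tridiagonal system with the Thomas algorithm, and compares backward differences (upwind for u > 0) with forward differences for $f_x$.

```python
import numpy as np

def thomas(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] (a[0] and c[-1] are ignored)."""
    n = len(d)
    cp, dp = np.zeros(n), np.zeros(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                  # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.zeros(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):         # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def d_minus(f):
    """Backward difference with h = 1 and a replicated left boundary value."""
    return f - np.concatenate(([f[0]], f[:-1]))

def d_plus(f):
    """Forward difference with h = 1 and a replicated right boundary value."""
    return np.concatenate((f[1:], [f[-1]])) - f

def solve_1d_displacement(fx, ft, alpha, h=1.0):
    """Discretised (2): fx*(fx*u + ft) - alpha*u_xx = 0 with homogeneous Neumann BCs."""
    n = len(fx)
    a = np.full(n, -alpha / h**2)
    c = np.full(n, -alpha / h**2)
    b = fx**2 + 2.0 * alpha / h**2
    b[0] -= alpha / h**2                   # ghost value u_{-1} = u_0
    b[-1] -= alpha / h**2                  # ghost value u_n = u_{n-1}
    return thomas(a, b, c, -fx * ft)

x = np.arange(12, dtype=float)
f0 = x**2                                  # smooth, strictly increasing signal
f1 = np.concatenate(([f0[0]], f0[:-1]))    # frame k+1: shifted one sample to the right
ft = f1 - f0                               # eq. (3) with tau = 1
u_upwind = solve_1d_displacement(d_minus(f0), ft, 1e-4)    # backward = upwind for u > 0
u_downwind = solve_1d_displacement(d_plus(f0), ft, 1e-4)   # wrong orientation
```

With backward differences the computed displacement equals the true shift u ≡ 1, because for this signal $f_t = -D_x^- f_i^k$ holds exactly; with forward differences the data term yields $u_i \approx (2i - 1)/(2i + 1)$ instead, e.g. about 1/3 at i = 1.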
Fig. 1. Top row: (a) Signal at time k (solid) and k + 1 (dotted). (b) Ground truth displacement. Bottom row: (c) Displacement computed using standard averaged central differences (solid), compared to the ground truth (dotted). (d) Same for one-sided forward differences. (e) Same for one-sided backward differences.
2.3 An Adaptive Discretisation Scheme
After explaining the outcome of our experiment with the help of hyperbolic numerics, we now adapt a successful concept from this area for our purpose. Recall that one-sided upwind differences, which are low-order approximations, perform well at signal discontinuities. However, they involve a higher discretisation error than central differences, which are high-order approximations and perform favourably in smooth signal regions. Hence a natural idea is to combine the two strategies by using high-order approximations in smooth signal parts and low-order ones at discontinuities. Slightly more involved techniques utilising this idea are the high-resolution methods [11] developed in the context of HDEs. They use a nonlinear blend of low- and high-order approximations, steered by a smoothness measure. Adapting this methodology to the variational framework results in an adaptive high-resolution-type (HRT) discretisation scheme for correspondence problems, which we present now.
Measuring smoothness. First we discuss how to determine the smooth and discontinuous regions of a signal. To this end we introduce the smoothness measure
$$\Theta_i := \Theta\bigl(f_i^k, f_i^{k+1}\bigr) := \bigl|D_x^- f_i^k - D_x^+ f_i^k\bigr| + \bigl|D_x^- f_i^{k+1} - D_x^+ f_i^{k+1}\bigr|, \tag{7}$$
which is close to 0 in smooth regions, where backward and forward differences of $f_i^k$ and $f_i^{k+1}$ are almost identical, and large at discontinuities of $f_i$.
Determining the Upwind Directions. Next we need to determine the appropriate upwind directions for the one-sided differences. Note that our experiment
from Fig. 1 has shown that this is crucial. We propose to compute a predictor solution ũ whose sign determines the upwind direction. The predictor is computed using standard averaged central differences and a comparatively large smoothness weight, e.g., α̃ = 1, to cope with outliers caused by the possibly less appropriate high-order discretisation. With its help, the low-order upwind approximation $f_x^L$ of $f_x$ is defined as
$$\bigl(f_x^L\bigr)_i := \begin{cases} D_x^- f_i^k, & \text{if } \tilde u_i > 0,\\ D_x^+ f_i^k, & \text{if } \tilde u_i < 0,\\ \bigl(f_x^H\bigr)_i, & \text{if } \tilde u_i = 0,\end{cases} \tag{8}$$
where
$$\bigl(f_x^H\bigr)_i := D_x^0 f_i^{k+\frac12} \tag{9}$$
denotes the high-order standard approximation of $f_x$ using averaged central differences. Revisiting the experiment from Fig. 1, we realise that this definition agrees with the results obtained there.
The High-Resolution-Type (HRT) Discretisation Scheme. Now we have everything at hand to define the adaptive HRT discretisation scheme as
$$(f_x)_i^k := \bigl(f_x^L\bigr)_i + \Phi(\Theta_i)\Bigl(\bigl(f_x^H\bigr)_i - \bigl(f_x^L\bigr)_i\Bigr), \tag{10}$$
using a blending function $\Phi(\Theta_i)$. It is close to 1 in smooth signal regions (indicated by $\Theta_i$), yielding a high-order approximation there. At discontinuities it is close to 0, which leads to a low-order approximation that is better suited there. For the actual choice of $\Phi(\Theta_i)$ we propose
$$\Phi(\Theta_i) := \begin{cases} 1 - \dfrac{\Theta_i}{T}, & \text{if } 0 \le \Theta_i < T,\\ 0, & \text{else},\end{cases} \tag{11}$$
using a threshold parameter T > 0. Note that for T → 0 we obtain the upwind scheme, and for T → ∞ we fall back to a standard scheme. Applying the HRT scheme to the signal sequence from Fig. 1 gives the same result as the appropriate upwind scheme, hence we omit an additional figure. However, for the more challenging stereo and optic flow problems that we discuss in Sect. 3 and 4, the blending of the HRT scheme gives results superior to a pure upwind scheme.
3 Integration into Variational Stereo Approaches
In this section we integrate our adaptive HRT discretisation scheme into a recent variational stereo approach by Slesareva et al. [6]. We restrict ourselves to the rectified scenario where displacements can only occur in horizontal direction and thus one has to solve a 1-D correspondence problem for each image row. However, it makes sense to couple those via a 2-D smoothness assumption, as will be described now.
3.1 Variational Stereo
We consider the image pair $f_l(\mathbf x) \equiv f(\mathbf x, t)$ and $f_r(\mathbf x) \equiv f(\mathbf x, t+1)$, denoting the left and right view of a static scene, respectively. Here, $\mathbf x := (x, y)^\top$ denotes the location within a rectangular image domain $\Omega_2 \subset \mathbb{R}^2$. Further, we assume that the images are presmoothed by a Gaussian convolution of standard deviation σ. The unknown scalar-valued disparity is given by the absolute value of u, where the displacement field can be written as $\mathbf u := (u, 0)^\top$ in the rectified case. In accordance with [6], the disparity is found by minimising the energy
$$E(u) = \int_{\Omega_2}\bigl[M(u) + \alpha\,V(u)\bigr]\,d\mathbf x. \tag{12}$$
The data term
$$M(u) = \Psi_M\Bigl(|f_r(\mathbf x + \mathbf u) - f_l(\mathbf x)|^2 + \gamma\,|\nabla f_r(\mathbf x + \mathbf u) - \nabla f_l(\mathbf x)|^2\Bigr), \tag{13}$$
where $\nabla := (\partial_x, \partial_y)^\top$ denotes the spatial gradient operator, combines the brightness and gradient constancy assumptions, weighted by γ > 0. The latter makes the method more robust under illumination changes. To cope with outliers caused by noise or occlusions, a robust penaliser function $\Psi_M(s^2) := \sqrt{s^2 + \varepsilon^2}$ with a small regularisation parameter ε > 0 is employed, which results in a modified L1 penalisation. As described below, the linearisation of the data term is postponed to the minimisation phase to allow for a correct handling of large displacements. The smoothness term
$$V(u) = \Psi_V\bigl(|\nabla u|^2\bigr) \tag{14}$$
uses the same robust non-quadratic penaliser function as the data term, i.e., $\Psi_V = \Psi_M$, resulting in total variation regularisation [8]. Concerning the minimisation of the energy (12), we refer to [6] for the corresponding Euler-Lagrange equation. To solve it, we employ a coarse-to-fine multiscale warping approach [4] and compute on each warping level small flow increments du using the linearised data term
$$\Psi_M\Bigl((f_x\,du + f_t)^2 + \gamma\bigl((f_{xx}\,du + f_{xt})^2 + (f_{xy}\,du + f_{yt})^2\bigr)\Bigr). \tag{15}$$
Note that the discretised Euler-Lagrange equation now leads to a nonlinear system of equations. After linearisation, we obtain a large but sparse linear system, which can be solved efficiently by an iterative solver of Gauß-Seidel type [14].
3.2 The HRT Discretisation Scheme for Variational Stereo
We now adapt the HRT scheme from Sect. 2.3 to the stereo setting. First, we extend the discrete grid to a 2-D version with grid sizes $h_x$ and $h_y$ in the x- and y-direction, respectively. The images and the disparity are then approximated by $f_l(x_i, y_j) \approx f^{k+1}_{i,j}$, $f_r(x_i, y_j) \approx f^{k}_{i,j}$ and $u(x_i, y_j) \approx u_{i,j}$.
I. Smoothness Measures. In the 2-D stereo case, we first need distinct smoothness measures $\Theta_x$, $\Theta_y$ and $\Theta_{xy}$ for the x-, y- and xy-direction, respectively. For $\Theta_x$ we use the corresponding expression (7) from the 1-D case, and $\Theta_y$ is obtained by using y- instead of x-differences. With their help, the mixed expression is defined as $\Theta_{xy} = \Theta_x + \Theta_y$.
II. Derivative Approximations. Inspecting the linearised data term (15), we see that the second-order derivatives $f_{xx}$, $f_{xt}$, $f_{xy}$ and $f_{yt}$ now also need to be discretised. Due to space limitations, we exemplify our approach for $f_{xy}$; the other derivatives are then approximated accordingly. Note that, given the two signals $f^{k+1}_{i,j}$ and $f^{k}_{i,j}$, the time derivative $f_t$ is always approximated as in (3). We start with the high-order approximation of $f_{xy} = \partial_x f_y$. This translates to the finite difference case as
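$$\begin{align}
(f_{xy})^H_{i,j} &= D^0_x D^0_y f^{k+\frac{1}{2}}_{i,j} = \frac{1}{2}\left( D^0_x D^0_y f^{k}_{i,j} + D^0_x D^0_y f^{k+1}_{i,j} \right) \tag{16}\\
&= \frac{1}{4 h_y}\left[ D^0_x\left( f^{k}_{i,j+1} - f^{k}_{i,j-1} \right) + D^0_x\left( f^{k+1}_{i,j+1} - f^{k+1}_{i,j-1} \right) \right] \tag{17}\\
&= \frac{1}{8 h_x h_y}\Big[ \left( f^{k}_{i+1,j+1} - f^{k}_{i+1,j-1} \right) - \left( f^{k}_{i-1,j+1} - f^{k}_{i-1,j-1} \right) \notag\\
&\qquad\qquad + \left( f^{k+1}_{i+1,j+1} - f^{k+1}_{i+1,j-1} \right) - \left( f^{k+1}_{i-1,j+1} - f^{k+1}_{i-1,j-1} \right) \Big] . \tag{18}
\end{align}$$
Note that for $f_{xx}$ we employ the central discretisation in accordance with (6). In the low-order case, we use the upwind discretisation of $(f_x)^k_{i,j}$, steered by the predictor $\tilde{u}$. For the y-derivative, we employ the averaged central difference approximation since, in the rectified scenario, the displacement in the y-direction is always zero. Thus we obtain for $\tilde{u} > 0$: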
$$(f_{xy})^L_{i,j} = D^-_x D^0_y f^k_{i,j} = \frac{1}{2 h_x h_y} \left[ \left( f^k_{i,j+1} - f^k_{i,j-1} \right) - \left( f^k_{i-1,j+1} - f^k_{i-1,j-1} \right) \right], \tag{19}$$
and a corresponding expression for $\tilde{u} < 0$. Note that in this case we do not need a larger smoothness weight $\tilde{\alpha}$ to compute $\tilde{u}$, since an appropriate α for usual stereo pairs is already large enough.
3.3
Experiments for Variational Stereo
We now show results for disparity computations using the approach of Slesareva et al. [6] with different derivative approximations. We use greyscale versions of the stereo image data from Middlebury College [15]¹. To measure the quality of the estimated disparities with respect to the given ground truth disparities, we employ the bad pixel error (BPE) measure [15]. As fixed parameters we set ε = 10⁻³ and T = 1. In the stereo case we set σ = 0.5, and for the optic flow experiments in Sect. 4 we set σ = 0.8.
¹ Available under http://vision.middlebury.edu/stereo
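For readers who want to reproduce such numbers, the bad pixel error can be sketched as below. This is a minimal illustration, not the evaluation code behind Table 1: the function name `bad_pixel_error`, the optional `valid` mask and the default threshold of 1 pixel (the usual Middlebury setting from [15]) are our assumptions, since the text does not state which threshold was used.

```python
def bad_pixel_error(disparity, ground_truth, threshold=1.0, valid=None):
    """Return the percentage of pixels whose disparity error exceeds `threshold`.

    disparity, ground_truth: 2-D lists (rows) of floats with the same shape.
    valid: optional 2-D list of booleans; pixels marked False (e.g. where no
    ground truth is available) are excluded from the count.
    """
    bad = 0
    counted = 0
    for r, (row_d, row_g) in enumerate(zip(disparity, ground_truth)):
        for c, (d, g) in enumerate(zip(row_d, row_g)):
            if valid is not None and not valid[r][c]:
                continue  # pixel excluded from the evaluation
            counted += 1
            if abs(d - g) > threshold:
                bad += 1
    return 100.0 * bad / counted if counted else 0.0


# Usage: one of four pixels is off by more than one pixel
print(bad_pixel_error([[1.0, 2.0], [3.0, 4.0]], [[1.0, 3.5], [3.2, 4.0]]))  # 25.0
```

Returning a percentage matches how BPE values such as those in Table 1 are usually reported.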
Fig. 2 depicts the results for the Plastic pair. The bad pixel maps in Fig. 2 (b)–(c) show that the HRT scheme improves the results in the vicinity of image discontinuities and at the boundaries; these areas are marked grey in the error maps. Note that the artefacts in Fig. 2 (f) are again caused by occlusions. The improvement is also reflected in the BPE values summarised in Table 1, which additionally lists other Middlebury pairs, the parameter settings, and the error measures for a pure upwind scheme. Comparing the latter with the HRT scheme shows that the blending performed by the HRT scheme also pays off in terms of quality measures.
Fig. 2. Top row: (a) Left image of the Plastic pair. (b) Bad pixels for the approach with a standard derivative approximation (bad pixels are coloured black). (c) Same for the HRT scheme. Bottom row: (d) Ground truth disparity. (e) Disparity for the approach with a standard derivative approximation. (f) Same for the HRT scheme.
4
Extension to Variational Optic Flow
Having shown how to employ the adaptive HRT discretisation scheme for stereo, we now extend it to optic flow, which is more or less straightforward. For optic flow, we consider a presmoothed image sequence $f(\mathbf{x}, t)$ and want to compute a flow field $\mathbf{w} := (u, v)^\top$, where u and v give the displacements in the x- and y-direction, respectively. Using the method of Brox et al. [4], which was the basis for the stereo approach of Slesareva et al. [6], we compute $\mathbf{w}$ as the minimiser of an energy functional similar to (12). One difference concerning the HRT scheme is that we now also have to approximate $f_y$ and $f_{yy}$; this, however, works analogously to the stereo case. More problematic are the low-order upwind approximations of $f_{xy}$, as they now depend on a predictor $\tilde{\mathbf{w}} = (\tilde{u}, \tilde{v})^\top$. Hence we need an extensive case distinction that takes into account all possible combinations of the signs of $\tilde{u}$ and $\tilde{v}$. For example, for $\tilde{u} > 0$ and $\tilde{v} < 0$ we obtain
$$(f_{xy})^L_{i,j} = D^-_x D^+_y f^k_{i,j} = \frac{1}{h_x h_y} \left[ \left( f^k_{i,j+1} - f^k_{i,j} \right) - \left( f^k_{i-1,j+1} - f^k_{i-1,j} \right) \right]. \tag{20}$$
Table 1. BPE measures and parameters for stereo experiments
Image Pair   Derivative Approximation   Parameters             BPE
Plastic      standard                   α = 5.5, γ = 190.0     25.85
Plastic      upwind                     α = 5.5, γ = 190.0     21.35
Plastic      HRT scheme                 α = 5.5, γ = 190.0     18.85
Teddy        standard                   α = 8.0, γ = 9.5       17.45
Teddy        upwind                     α = 8.0, γ = 9.5       16.94
Teddy        HRT scheme                 α = 8.0, γ = 9.5       16.75
Venus        standard                   α = 4.5, γ = 0.5        3.06
Venus        upwind                     α = 4.5, γ = 0.5        2.78
Venus        HRT scheme                 α = 4.5, γ = 0.5        2.77
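The stencils in (16)–(20) can be checked numerically with a small sketch. It is an illustration under our own assumptions, not the authors' implementation: the helper names are hypothetical, an image is a nested list indexed `f[i][j]` with i along x and j along y, and a predictor value of exactly zero is treated like the negative case (forward difference), a tie-breaking rule the text does not specify. The blending of high- and low-order approximations via the smoothness measures Θ is not shown.

```python
# Finite-difference stencils as (plus offset, minus offset, divisor factor)
STENCILS = {
    "backward": (0, -1, 1.0),  # D^-: (g[n] - g[n-1]) / h
    "forward": (1, 0, 1.0),    # D^+: (g[n+1] - g[n]) / h
    "central": (1, -1, 2.0),   # D^0: (g[n+1] - g[n-1]) / (2h)
}


def mixed_difference(f, i, j, hx, hy, xmode, ymode):
    """Apply D_y^{ymode} and then D_x^{xmode} to f at grid point (i, j)."""
    xp, xm, xs = STENCILS[xmode]
    yp, ym, ys = STENCILS[ymode]

    def dy(ii):
        return (f[ii][j + yp] - f[ii][j + ym]) / (ys * hy)

    return (dy(i + xp) - dy(i + xm)) / (xs * hx)


def fxy_high_order(fk, fk1, i, j, hx, hy):
    """Eqs. (16)-(18): central differences averaged over frames k and k+1."""
    return 0.5 * (mixed_difference(fk, i, j, hx, hy, "central", "central")
                  + mixed_difference(fk1, i, j, hx, hy, "central", "central"))


def fxy_low_order(fk, i, j, hx, hy, u_pred, v_pred=None):
    """Upwind approximation at frame k steered by the predictor signs.

    v_pred=None: rectified stereo, central y-difference as in eq. (19).
    Otherwise (optic flow) y is upwinded too; u_pred > 0, v_pred < 0 gives eq. (20).
    """
    xmode = "backward" if u_pred > 0 else "forward"
    if v_pred is None:
        ymode = "central"
    else:
        ymode = "backward" if v_pred > 0 else "forward"
    return mixed_difference(fk, i, j, hx, hy, xmode, ymode)


# Usage on f(x, y) = x^2 * y at unit spacing (exact f_xy = 2x = 4 at i = 2)
f = [[float(i * i * j) for j in range(5)] for i in range(5)]
print(fxy_high_order(f, f, 2, 2, 1.0, 1.0))          # 4.0
print(fxy_low_order(f, 2, 2, 1.0, 1.0, u_pred=1.0))  # 3.0 (backward in x)
```

The example makes the trade-off visible: the central stencil is exact for this quadratic, while the one-sided upwind stencils are only first-order accurate in x.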
To show that the HRT scheme also performs favourably for optic flow, we performed experiments using the recent optic flow data sets from Middlebury College [16]². Fig. 3 shows the results obtained for the Urban3 sequence. Note that the error maps now show the angular error underlying the average angular error (AAE) measure [17]. They reveal the favourable performance of the HRT scheme in the marked regions, which is also reflected in the AAE values shown in Table 2. As before, this table also covers other Middlebury sequences, the parameter settings and the results for the upwind scheme. The comparison with the latter shows that the HRT scheme also performs better for optic flow.
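As a reminder of what the AAE measures, the angular error of Barron et al. [17] is the angle between the spatio-temporal vectors (u, v, 1) and (u_gt, v_gt, 1), and the AAE is its mean over the evaluated pixels. The sketch below, in degrees, is our own illustration; the function names and the flat list-of-pairs layout are assumptions, and no masking of invalid ground-truth pixels is performed.

```python
import math


def angular_error_deg(u, v, u_gt, v_gt):
    """Angle in degrees between the 3-D vectors (u, v, 1) and (u_gt, v_gt, 1)."""
    dot = u * u_gt + v * v_gt + 1.0
    norms = math.sqrt((u * u + v * v + 1.0) * (u_gt * u_gt + v_gt * v_gt + 1.0))
    # Clamp against rounding slightly outside the domain of acos
    cos_angle = max(-1.0, min(1.0, dot / norms))
    return math.degrees(math.acos(cos_angle))


def average_angular_error(flow, flow_gt):
    """Mean angular error over two equally long lists of (u, v) vectors."""
    errors = [angular_error_deg(u, v, ug, vg)
              for (u, v), (ug, vg) in zip(flow, flow_gt)]
    return sum(errors) / len(errors)


# Usage: a unit displacement where the ground truth is zero gives about 45 degrees
print(average_angular_error([(1.0, 0.0)], [(0.0, 0.0)]))
```

The appended third component 1 is what keeps the measure finite for zero flow, which is why the angle between (1, 0, 1) and (0, 0, 1) is 45 degrees.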
Fig. 3. Top row: (a) Frame 10 of the Urban3 sequence. (b) AAE map for the approach with a standard derivative approximation. (c) Same for the HRT scheme. Bottom row: (d) Flow magnitude of the ground truth. (e) Flow magnitude for the approach with a standard derivative approximation. (f) Same for the HRT scheme.
² Available under http://vision.middlebury.edu/flow
Table 2. AAE measures and parameters for optic flow experiments
Image Sequence   Derivative Approximation   Parameters              AAE
Urban3           standard                   α = 4.5, γ = 4.0        5.71
Urban3           upwind                     α = 4.5, γ = 4.0        4.58
Urban3           HRT scheme                 α = 4.5, γ = 4.0        4.11
RubberWhale      standard                   α = 50.0, γ = 50.0      4.72
RubberWhale      upwind                     α = 50.0, γ = 50.0      4.73
RubberWhale      HRT scheme                 α = 50.0, γ = 50.0      4.34
Dimetrodon       standard                   α = 7.0, γ = 10.0       1.94
Dimetrodon       upwind                     α = 7.0, γ = 10.0       3.06
Dimetrodon       HRT scheme                 α = 7.0, γ = 10.0       1.88
5
Conclusions and Outlook
In this paper we have presented a sophisticated numerical scheme for the approximation of spatial image derivatives in variational approaches to correspondence problems. Our experiments demonstrated that such a scheme tangibly improves the quality of the results, an improvement that in more than 20 years of research in this field had only been achieved through model refinements. We hence conjecture that numerics can be a fruitful alternative starting point for further advances. This finding is no surprise to those acquainted with the theory of HDEs, where sophisticated numerical schemes have been thoroughly investigated. In this paper we have seen that HDEs and variational approaches share some structural similarities, and we are the first to exploit this similarity for developing a well-engineered numerical scheme for variational approaches. We want to stress that the adaptive discretisation scheme developed in this paper is certainly not the only promising technique that can be adapted from the field of HDEs. Our current research is thus concerned with exploring further directions that may lead to better numerical schemes for variational approaches to correspondence problems.
Acknowledgement
Henning Zimmer gratefully acknowledges funding by the International Max Planck Research School (IMPRS).
References
1. Horn, B., Schunck, B.: Determining optical flow. Artificial Intelligence 17, 185–203 (1981)
2. Alvarez, L., Deriche, R., Papadopoulo, T., Sanchez, J.: Symmetrical dense optical flow estimation with occlusions detection. International Journal of Computer Vision 75(3), 371–385 (2007)
3. Ben-Ari, R., Sochen, N.: Variational stereo vision with sharp discontinuities and occlusion handling. In: Proc. 2007 IEEE International Conference on Computer Vision, Rio de Janeiro, Brazil, pp. 1–7. IEEE Computer Society Press, Los Alamitos (2007)
4. Brox, T., Bruhn, A., Papenberg, N., Weickert, J.: High accuracy optical flow estimation based on a theory for warping. In: Pajdla, T., Matas, J. (eds.) ECCV 2004. LNCS, vol. 3024, pp. 25–36. Springer, Heidelberg (2004)
5. Nir, T., Bruckstein, A.M., Kimmel, R.: Over-parameterized variational optical flow. International Journal of Computer Vision 76(2), 205–216 (2008)
6. Slesareva, N., Bruhn, A., Weickert, J.: Optic flow goes stereo: A variational method for estimating discontinuity-preserving dense disparity maps. In: Kropatsch, W., Sablatnig, R., Hanbury, A. (eds.) DAGM 2005. LNCS, vol. 3663, pp. 33–40. Springer, Heidelberg (2005)
7. Zimmer, H., Bruhn, A., Valgaerts, L., Breuß, M., Weickert, J., Rosenhahn, B., Seidel, H.P.: PDE-based anisotropic disparity-driven stereo vision. In: Deussen, O., Keim, D., Saupe, D. (eds.) Proceedings of Vision, Modeling, and Visualization (VMV) 2008, pp. 263–272. AKA, Heidelberg (2008)
8. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
9. Marquina, A., Osher, S.: Explicit algorithms for a new time dependent model based on level set motion for nonlinear deblurring and noise removal. SIAM Journal on Scientific Computing 22(2), 387–405 (2000)
10. LeVeque, R.J.: Numerical Methods for Conservation Laws. Birkhäuser, Basel (1992)
11. LeVeque, R.J.: Finite Volume Methods for Hyperbolic Problems. Cambridge University Press, Cambridge (2002)
12. Morton, K.W., Mayers, D.F.: Numerical Solution of Partial Differential Equations. Cambridge University Press, Cambridge (1994)
13. Thomas, L.H.: Elliptic problems in linear difference equations over a network. Technical report, Watson Scientific Computing Laboratory, Columbia University, New York (1949)
14. Saad, Y.: Iterative Methods for Sparse Linear Systems, 2nd edn. SIAM, Philadelphia (2003)
15. Scharstein, D., Szeliski, R.: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision 47(1-3), 7–42 (2002)
16. Baker, S., Roth, S., Scharstein, D., Black, M., Lewis, J., Szeliski, R.: A database and evaluation methodology for optical flow. In: Proc. 2007 IEEE International Conference on Computer Vision, Rio de Janeiro, Brazil, pp. 1–8. IEEE Computer Society Press, Los Alamitos (2007)
17. Barron, J.L., Fleet, D.J., Beauchemin, S.S.: Performance of optical flow techniques. International Journal of Computer Vision 12(1), 43–77 (1994)
From a Single Point to a Surface Patch by Growing Minimal Paths
Fethallah Benmansour and Laurent D. Cohen
CEREMADE, UMR CNRS 7534, Université Paris Dauphine, Place du Maréchal De Lattre De Tassigny, 75775 PARIS CEDEX 16, France
{benmansour,cohen}@ceremade.dauphine.fr
Abstract. We introduce a novel implicit approach for surface patch segmentation in 3D images starting from a single point. Since the boundary surface of an object is locally homeomorphic to a disc, the boundary of a small neighboring domain intersects the surface of interest on a single closed curve. As with active surfaces, we use a cost potential which penalizes image regions of low interest. First, using a front propagation approach from the source point chosen by the user, one can see that the closed curve corresponds to a valley line of the arrival time from the source point. Next, we use an implicit 3D segmentation method, which assumes that the object boundary contains two known constraining curves. In our case, the first curve is reduced to a point and the other one is automatically detected by our approach. A partial differential equation is introduced, and its solution is used for segmentation. The zero level set of this solution contains the valley line and the source point, as well as the set of minimal paths joining them. We present a fast implementation which has been successfully applied to 3D biomedical and synthetic images.
1
Introduction
In this paper we are interested in the interactive segmentation of a surface in a 3D image by clicking a single point on the boundary of an object and obtaining a patch of the desired surface around the given point. For this we use energy minimizing techniques and partial differential equations. Energy minimization techniques have been applied to a broad variety of problems in image processing and computer vision. Since the original work on snakes [1], they have notably been used for boundary detection. An active contour model, or snake, is a curve that deforms its shape in order to minimize an energy combining an internal part, which smooths the curve, and an external part, which guides the curve toward particular image features. One of the main drawbacks of this approach is that it suffers from local minima 'traps'. Consequently, results strongly depend on the model initialization. Since the publication of [1], much work has been done to free active models from the problem of local minima. Cohen and Kimmel [2] introduced an approach to globally minimize the geodesic active contour energy, provided that the two endpoints of the curve are initially supplied by the user. This energy is of the form $\int_\gamma \tilde{P}$, where the incremental cost P̃ is chosen to take lower values on the contours of the image, and γ is a path joining the two points. The solution of this minimization problem is obtained through the computation of the minimal action map associated with a source point. The minimal action map can be regarded as the arrival times of a front propagating from the source point with velocity $1/\tilde{P}$, and it satisfies the Eikonal equation. Therefore, the minimal action map can be computed efficiently with the Fast Marching Method, as detailed in Section 2. However, their approach [2] cannot be directly extended to find the global minimum for an active surface in a 3D image. Nevertheless, it has been extended to surfaces in a 3D image by extracting a minimal surface lying on two given curves [3]. The advantage of this method is that it does not suffer from local minima problems, as other active surface methods such as [4, 5] do.
In this work, we focus on a novel approach for 3D object segmentation. Our aim is to generate a local surface patch from a single point. The method presented herein can be seen as an extension of the Eulerian approach presented by Ardon et al. [3] for surface extraction from a pair of 'constraining' closed curves. In our case, however, one of the curves is reduced to a single point and the other one is unknown. Let $\tilde{P} : \Omega \to \mathbb{R}^+$ be a potential, where $\Omega \subset \mathbb{R}^3$, such that P̃ takes lower values on the surface of the object to be extracted, denoted S, which is unknown. Given a single point p on S and a neighborhood Σ ⊂ Ω of p, the required conditions are (see Fig. 1):
• the boundary ∂Σ is a connected closed surface;
• ∂Σ ∩ S is a simple closed curve;
• p ∈ S ∩ Σ.
The volume Σ might be a ball or any topologically equivalent volume. Our objective is to find the surface patch S ∩ Σ from the source point p and the potential P̃. We proceed in two stages. First, we look for the boundary of the surface patch, S ∩ ∂Σ, and compute a good estimate Γ of it: running the Fast Marching algorithm (detailed in Section 2) from the source point p, one can see that the valley line Γ of the arrival time on the boundary ∂Σ is a good approximation of S ∩ ∂Σ. A detailed definition of the valley line and of the way it is extracted is given in Section 3. Next, the surface of interest can be represented as a dense network of minimal paths joining points of the valley line Γ to the source point p (Section 4). The surface generated by this algorithm is composed entirely of globally minimal paths. Indeed, by solving a stationary transport equation of the form ∇Ψ · ∇U = 0, where U is the action map (defined in Section 2) and Ψ is the unknown, we show that any minimal path between the valley line Γ and the source point p is contained in the zero level set $\Psi^{-1}(\{0\})$ of Ψ. Important advantages of this approach are that it needs minimal interaction and that it is computationally efficient, as explained later. This approach can also be used as a building block for a complete segmentation from one single point (see Section 5). Segmentation results on synthetic and medical images are presented in Section 5. Finally, conclusions, advantages and drawbacks of our method, and perspectives follow in Section 6.
Fig. 1. Left: the required conditions for the surface patch extraction. The point p must be initialized on the surface S in the volume Σ; ∂Σ, the boundary of Σ, is a closed surface, and ∂Σ ∩ S is a simple closed curve. Right: the information available in practice; the surface S is unknown, but the potential P takes lower values along S and higher values elsewhere.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 648–659, 2009.
© Springer-Verlag Berlin Heidelberg 2009
2
Background on Minimal Paths
Given a 3D image $I : \Omega \to \mathbb{R}^+$ and two points $p_1$ and $p_2$, the underlying idea introduced by Cohen and Kimmel [2] is to build a potential $P : \Omega \to \mathbb{R}^*_+$ which takes lower values near desired features of the image I. The choice of the potential P depends on the application. For example, one can define P as a decreasing function of $\|\nabla I\|$ to extract image edges, by finding a curve that globally minimizes the energy functional $E : \mathcal{A}_{p_1,p_2} \to \mathbb{R}^+$,
$$E(\gamma) = \int_\gamma \big( P(\gamma(s)) + w \big)\, ds = \int_\gamma \tilde{P}(\gamma(s))\, ds, \tag{1}$$
where $\mathcal{A}_{p_1,p_2}$ is the set of all paths connecting $p_1$ to $p_2$, s is the arc-length parameter, w > 0 is a regularization term and $\tilde{P} = P + w$. A curve connecting $p_1$ to $p_2$ that globally minimizes the energy (1) is a minimal path between $p_1$ and $p_2$, denoted $C_{p_1,p_2}$. The solution of this minimization problem is obtained through the computation of the minimal action map $U_1 : \Omega \to \mathbb{R}^+$ associated with $p_1$. The minimal action is the minimal energy integrated along a path between $p_1$ and any point x of the domain Ω:
$$\forall x \in \Omega, \quad U_1(x) = \min_{\gamma \in \mathcal{A}_{p_1,x}} \int_\gamma \tilde{P}(\gamma(s))\, ds. \tag{2}$$
The values of $U_1$ may be regarded as the arrival times of a front propagating from the source $p_1$ with velocity $1/\tilde{P}$. $U_1$ satisfies the Eikonal equation
$$\|\nabla U_1(x)\| = \tilde{P}(x) \ \text{ for } x \in \Omega, \quad \text{and} \quad U_1(p_1) = 0. \tag{3}$$
Fig. 2. Minimal action map U from the source p using the potential P of figure 1 computed using the Fast Marching algorithm. Left: slices through the volume. Right: some equi-distant surfaces (level sets) of U.
The map $U_1$ has only one local minimum, the point $p_1$, and its flow lines satisfy the Euler-Lagrange equation of functional (1). Thus, the minimal path $C_{p_1,p_2}$ can be retrieved with a simple gradient descent on $U_1$ from $p_2$ to $p_1$, by solving the following ordinary differential equation with standard numerical methods such as Heun's or Runge-Kutta methods:
$$\frac{dC_{p_1,p_2}}{ds}(s) = -\nabla U_1\big(C_{p_1,p_2}(s)\big), \quad C_{p_1,p_2}(0) = p_2. \tag{4}$$
2.1
Fast Marching Method
The Fast Marching Method (FMM) is a numerical method introduced by Sethian [6] and Tsitsiklis [7] for efficiently solving the isotropic Eikonal equation on a Cartesian grid. In equation (3), the values of U may be regarded as the arrival times of wavefronts propagating from the source point(s) with velocity $1/\tilde{P}$. The central idea behind the FMM is to visit grid points in an order consistent with the way wavefronts propagate. This leads to a single-pass algorithm for solving equation (3) and computing the minimal action map U. The FMM is a front propagation approach that computes the values of U in increasing order, and the structure of the algorithm is almost identical to Dijkstra's algorithm for computing shortest paths on graphs [8]. In the course of the algorithm, each grid point is tagged as either Alive (U has been computed and frozen), Trial (U has been estimated but not frozen) or Far (U is unknown). The set of Trial points forms an interface between the set of grid points for which U has been frozen (the Alive points) and the set of other grid points (the Far points). This interface may be regarded as a set of fronts expanding from each source until every grid point has been reached. The key to the speed of the FMM is the use of a priority queue to quickly find the Trial point with the smallest U value. If Trial points are ordered in a min-heap data structure, the computational complexity of the FMM is $O(N \log_2 N)$, where N is the total number of grid points.
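The Alive/Trial/Far bookkeeping described above can be sketched in 2-D as follows. This is a simplified illustration, not the authors' implementation: it assumes a square grid with a single spacing h, 4-neighbour updates that use only Alive values, and a binary heap with lazy deletion (stale entries are skipped when popped) instead of a decrease-key operation; all names are our own.

```python
import heapq
import math

FAR, TRIAL, ALIVE = 0, 1, 2


def fast_marching_2d(cost, source, h=1.0):
    """Minimal action map U with |grad U| = cost and U(source) = 0.

    cost: 2-D list of positive floats (the potential P~); source: (i, j).
    """
    n, m = len(cost), len(cost[0])
    U = [[math.inf] * m for _ in range(n)]
    state = [[FAR] * m for _ in range(n)]

    def alive_value(i, j):
        # Only frozen (Alive) values enter the upwind update
        if 0 <= i < n and 0 <= j < m and state[i][j] == ALIVE:
            return U[i][j]
        return math.inf

    def local_update(i, j):
        # Solve max(U - a, 0)^2 + max(U - b, 0)^2 = (h * cost)^2
        a = min(alive_value(i - 1, j), alive_value(i + 1, j))
        b = min(alive_value(i, j - 1), alive_value(i, j + 1))
        a, b = min(a, b), max(a, b)
        rhs = h * cost[i][j]
        if math.isinf(b) or b - a >= rhs:
            return a + rhs  # only one direction is upwind
        return 0.5 * (a + b + math.sqrt(2.0 * rhs * rhs - (a - b) ** 2))

    si, sj = source
    U[si][sj] = 0.0
    state[si][sj] = TRIAL
    heap = [(0.0, si, sj)]
    while heap:
        _, i, j = heapq.heappop(heap)
        if state[i][j] == ALIVE:
            continue  # stale entry left by an earlier, larger estimate
        state[i][j] = ALIVE
        for a, b in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if 0 <= a < n and 0 <= b < m and state[a][b] != ALIVE:
                new_u = local_update(a, b)
                if new_u < U[a][b]:
                    U[a][b] = new_u
                    state[a][b] = TRIAL
                    heapq.heappush(heap, (new_u, a, b))
    return U


U = fast_marching_2d([[1.0] * 5 for _ in range(5)], (2, 2))
print(U[2][4], round(U[3][3], 4))  # 2.0 1.7071
```

Lazy deletion keeps the code short with Python's `heapq`, at the cost of a few duplicate heap entries; the overall cost remains O(N log N).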
We now describe how U is estimated at a grid point $x_n$, limiting ourselves to the 3D case. Adopting standard notation, we denote by $U_{i,j,k}$ the value of U at the grid vertex (i, j, k) associated with the point $x_n$ with coordinates $(i h_x, j h_y, k h_z)$, where $h_x$, $h_y$ and $h_z$ are the grid spacings in the x, y and z directions. A discretized version of (3) is solved in order to compute $U_{i,j,k}$. For the Eikonal equation, classic finite difference schemes tend to overshoot and are unstable. Rouy and Tourin [9] showed that the correct viscosity solution for $U_{i,j,k}$ is given by the following first-order accurate scheme:
$$\left( \frac{\max\{U_{i,j,k} - U_{i-1,j,k},\ U_{i,j,k} - U_{i+1,j,k},\ 0\}}{h_x} \right)^2 + \left( \frac{\max\{U_{i,j,k} - U_{i,j-1,k},\ U_{i,j,k} - U_{i,j+1,k},\ 0\}}{h_y} \right)^2 + \left( \frac{\max\{U_{i,j,k} - U_{i,j,k-1},\ U_{i,j,k} - U_{i,j,k+1},\ 0\}}{h_z} \right)^2 = \left( \tilde{P}_{i,j,k} \right)^2. \tag{5}$$
This is an upwind scheme: the forward and backward differences are chosen to follow the direction of the flow of information.
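Since $\max\{U - U_{i-1}, U - U_{i+1}, 0\} = \max\{U - \min(U_{i-1}, U_{i+1}), 0\}$, each axis of (5) only needs the smaller of its two neighbour values. A common way to solve (5) for $U_{i,j,k}$, sketched below as an illustration rather than the authors' code, sorts these per-axis values and adds axes one at a time until the next axis is no longer upwind. The function names and argument layout are our assumptions; unknown (Far) neighbours are passed as `math.inf`.

```python
import math


def rouy_tourin_update(neighbour_min, spacings, p):
    """Solve eq. (5) for U_{i,j,k}.

    neighbour_min: per axis, min of the two neighbour values (math.inf if unknown)
    spacings: grid spacings (hx, hy, hz); p: potential value P~_{i,j,k}
    """
    axes = sorted((u, h) for u, h in zip(neighbour_min, spacings)
                  if math.isfinite(u))
    if not axes:
        return math.inf
    # Accumulate the coefficients of sum_d ((U - u_d) / h_d)^2 - p^2 = 0
    a = b = c = 0.0
    U = math.inf
    for idx, (u_d, h_d) in enumerate(axes):
        w = 1.0 / (h_d * h_d)
        a += w
        b -= 2.0 * u_d * w
        c += u_d * u_d * w
        disc = max(b * b - 4.0 * a * (c - p * p), 0.0)
        U = (-b + math.sqrt(disc)) / (2.0 * a)
        # Stop if the next axis is not upwind, i.e. max{U - u_next, 0} = 0
        if idx + 1 < len(axes) and U <= axes[idx + 1][0]:
            break
    return U


def residual(U, neighbour_min, spacings, p):
    """Left-hand side minus right-hand side of eq. (5)."""
    return sum((max(U - u, 0.0) / h) ** 2
               for u, h in zip(neighbour_min, spacings)) - p * p


u = rouy_tourin_update((0.2, 0.5, 3.0), (1.0, 1.0, 1.0), 1.0)
print(u, residual(u, (0.2, 0.5, 3.0), (1.0, 1.0, 1.0), 1.0))
```

In the usage example the third axis (value 3.0) is not upwind, so only two axes enter the quadratic and the residual of (5) is zero up to rounding.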
3
Valley Line Detection
In this section, we present a method to extract the intersection between the sub-domain boundary and the unknown surface of interest. We propose to use the minimal action map to extract the desired curve since, as can be observed (although we give no formal proof), this curve corresponds to a valley line of the minimal action map. Ridge and valley lines are concepts used in geomorphology and computer vision [10, 11]. According to Koenderink [12], valley lines are the locus of points on a surface at which the normal curvature assumes a local minimum in the principal direction associated with the largest negative curvature. The main drawback of the existing criteria [10, 11] is that thresholding is needed. Hence, the detection is not precise enough and needs more interaction for real, noisy images. Moreover, these approaches are not adapted to our case, where we want to extract the valley line of a scalar function defined on a surface topologically equivalent to a sphere. Our approach is heuristic, based on the fact that the fast marching front propagates faster along the desired surface, so that the minimal action map takes lower values along the curve of intersection between the domain boundary and the surface.
Discrete definition of Σ and ∂Σ and minimal action map on ∂Σ. In practice, we assume that the volume Σ is defined as a boolean array. We can then partition Σ into two subsets, int(Σ) and ∂Σ, its interior and its boundary. A voxel x ∈ Σ is in the interior of the volume if all its 6 neighbors are in Σ, and it is a point of the boundary ∂Σ if x ∈ Σ \ int(Σ). Then ∂Σ is also represented by a boolean array (see Fig. 3).
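The interior/boundary split described above can be sketched in a few lines. For brevity, Σ is stored here as a Python set of integer voxel coordinates rather than a boolean array; this representation and the function name are our assumptions.

```python
SIX_NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                  (0, -1, 0), (0, 0, 1), (0, 0, -1))


def split_interior_boundary(sigma):
    """Split the voxel set sigma into (interior, boundary) by the 6-neighbour rule."""
    interior = {
        (i, j, k) for (i, j, k) in sigma
        if all((i + di, j + dj, k + dk) in sigma
               for di, dj, dk in SIX_NEIGHBOURS)
    }
    return interior, sigma - interior


# Usage: a digital ball of radius 3 around the origin
ball = {(i, j, k) for i in range(-3, 4) for j in range(-3, 4)
        for k in range(-3, 4) if i * i + j * j + k * k <= 9}
interior, boundary = split_interior_boundary(ball)
print(len(interior) + len(boundary) == len(ball))  # True
```

A set gives O(1) membership tests, which is all the 6-neighbour rule needs; a dense boolean array, as in the text, is the natural choice for large volumes.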
Fig. 3. Discrete representation of the volume Σ and its boundary ∂Σ. (a) The volume Σ is described by a boolean array. (b) Σ is partitioned into two subsets int(Σ) and ∂Σ such that ∂Σ is connected with respect to 26-connectivity.
Fig. 4. Minimal action map associated with the source point p and the potential of Fig. 1. (a) Cut views of the minimal action map U on the volume Σ. (b) View of U on ∂Σ and its valley line Γ, the frontier of the surface patch. (c) Unfolded U|∂Σ and valley line; the marked points on Γ correspond to local minima.
Let $U_{|\partial\Sigma} : \partial\Sigma \to \mathbb{R}^*_+$ denote the restriction of U to ∂Σ (see Fig. 4). The value U(x) for a point x in ∂Σ is the arrival time at x of the wavefront propagating from the source point p with velocity $1/\tilde{P}$. Since the potential P̃ takes lower values along the surface S, the front propagates faster along it. We can therefore reasonably assume that the first point of ∂Σ reached by the front belongs to ∂Σ ∩ S. This point is easy to detect, because it is the global minimum of $U_{|\partial\Sigma}$; we denote it $x_{\min}$. More generally, each local minimum $x_m$ of $U_{|\partial\Sigma}$ has been reached by the front before all points in a small neighborhood of $x_m$. Since the wavefront propagates faster along S, one can expect the curve ∂Σ ∩ S to correspond to valley lines of $U_{|\partial\Sigma}$. Our approach to valley line detection is simple and fast: using the function $U_{|\partial\Sigma}$, and without parametrizing the surface ∂Σ, we find the frontier Γ of the surface patch, S ∩ ∂Σ, by looking for cyclic sequences of valley lines of $U_{|\partial\Sigma}$ containing $x_{\min}$.
Finding Valley Lines of $U_{|\partial\Sigma}$. As explained above, valley lines of $U_{|\partial\Sigma}$ contain the local minima $x_m$ as well as the saddle points. A robust way to link two local minima is to detect the saddle point between them and to perform a double gradient descent, one towards each minimum. The difficulty here is that some local minima and saddle points of $U_{|\partial\Sigma}$ do not belong to the curve of interest. To avoid this, saddle points of $U_{|\partial\Sigma}$ are detected in increasing order of U. During this step, we store the information in a graph G whose vertices correspond to local minima of $U_{|\partial\Sigma}$ and whose edges each correspond to a pair of valley lines joining two local minima via a saddle point. The valley line detection algorithm stops when a cycle (in the sense of a simple closed path) is detected in G. However, the closed curve Γ found in this way tends to be short, linking nearby local minima. In practice, we therefore add two ad hoc constraints, which make it possible to extract the border of the surface patch more robustly. The algorithm stops as soon as (i) the global minimum $x_{\min}$ of $U_{|\partial\Sigma}$ belongs to the closed sequence Γ, and (ii) the subset of int(Σ) defined by
$$U_{|\mathrm{int}(\Sigma)}^{-1}\Big( \big] \max_{x' \in \Gamma} U(x'), +\infty \big[ \Big) = \Big\{ x \in \mathrm{int}(\Sigma) \,;\ U(x) > \max_{x' \in \Gamma} U(x') \Big\}$$
includes exactly two connected components for the 26-connectivity, which means that the sequence cuts the boundary ∂Σ into exactly two connected components (see Fig. 4).
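The graph bookkeeping above can be sketched as follows. This is a simplified illustration under our own assumptions: saddle points arrive as `(saddle_value, minimum_a, minimum_b)` triples with hashable, non-`None` labels for the minima, a candidate cycle is taken as the shortest (fewest-edge) path between the two minima of each new edge, and only constraint (i) is checked; the two-component test on int(Σ) is omitted. All names are hypothetical.

```python
from collections import deque


def _shortest_path(adjacency, start, goal):
    """Fewest-edge path from start to goal in the current graph, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adjacency.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def first_cycle_through(saddles, x_min):
    """Insert saddle edges in increasing saddle value; return the first cycle
    (as a list of local minima) that contains x_min, or None."""
    adjacency = {}
    for _, a, b in sorted(saddles):
        path = _shortest_path(adjacency, a, b)
        if path is not None and x_min in path:
            return path  # the new edge (b, a) closes this path into a cycle
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return None


# Usage: minima 0..4; the first cycle 1-2-3 does not contain x_min = 0
saddles = [(1.0, 1, 2), (1.5, 2, 3), (2.0, 3, 1),
           (2.5, 0, 1), (3.0, 3, 4), (3.5, 0, 3)]
print(first_cycle_through(saddles, 0))  # [0, 1, 3]
```

The usage example reproduces the situation described in the text: a short cycle among nearby minima appears first and is rejected because it does not pass through $x_{\min}$.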
4
Dense Network of Minimal Paths: An Implicit Approach
Once the boundary curve Γ is obtained, it is easy to construct explicitly a network of minimal paths linking points of Γ to the source point p by simple gradient descents, as in [13]. The network linking Γ to p is denoted $N_p^\Gamma = \bigcup_{x_\Gamma \in \Gamma} C_{x_\Gamma, p}$.
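Each path $C_{x_\Gamma,p}$ of this network can be traced by integrating (4) from $x_\Gamma$. The sketch below uses Heun's method with a fixed step and is illustrative only. As assumptions, ∇U is supplied as a Python callable (in practice it would come from finite differences or interpolation on the discrete minimal action map), and in the usage example U is the Euclidean distance to p, i.e. the minimal action for a constant potential P̃ = 1, so the traced path should be a straight segment.

```python
import math


def backtrack_heun(grad_u, start, source, step=0.05, max_steps=100000):
    """Integrate dC/ds = -grad U(C(s)), C(0) = start, with Heun's method.

    Stops once the current point is within one step of `source` and
    returns the list of visited points, ending at `source`.
    """
    x = tuple(float(c) for c in start)
    path = [x]
    for _ in range(max_steps):
        if math.dist(x, source) <= step:
            path.append(tuple(float(c) for c in source))
            break
        g1 = grad_u(x)
        predictor = tuple(xi - step * gi for xi, gi in zip(x, g1))
        g2 = grad_u(predictor)
        x = tuple(xi - 0.5 * step * (a + b) for xi, a, b in zip(x, g1, g2))
        path.append(x)
    return path


def grad_distance(p):
    """Gradient of U(x) = |x - p|, the minimal action for P~ = 1."""
    def grad(x):
        r = math.dist(x, p)
        if r == 0.0:
            return tuple(0.0 for _ in x)  # U is not differentiable at p
        return tuple((xi - pi) / r for xi, pi in zip(x, p))
    return grad


p = (0.0, 0.0, 0.0)
path = backtrack_heun(grad_distance(p), (3.0, 4.0, 0.0), p)
print(path[-1])  # (0.0, 0.0, 0.0)
```

Repeating the call for every voxel $x_\Gamma$ of the valley line yields a discrete version of $N_p^\Gamma$, whose gaps between neighbouring paths are what motivates the implicit formulation that follows.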
Since this network may have holes, our objective is to find a smooth function Ψ : Σ → ℝ such that the network $N_p^\Gamma$ is included in the zero level set of Ψ, i.e., $N_p^\Gamma \subset \Psi^{-1}(\{0\})$, where $\Psi^{-1}(\{0\}) = \{x \in \Sigma \,;\ \Psi(x) = 0\}$. A necessary condition on the function Ψ is
$$\nabla\Psi(x) \cdot \nabla U(x) = 0 \tag{6}$$
for each point x belonging to a path $C_{x_\Gamma,p}$. Thus, the vector ∇Ψ is perpendicular to ∇U along the minimal paths of the network $N_p^\Gamma$. Extending the constraint given by equation (6) to the whole domain Σ gives a sufficient condition on Ψ. Moreover, adding a linear term in Ψ smooths the solution without changing the zero level set of Ψ. Hence, if Ψ is a smooth function satisfying the following conditions:
$$\begin{cases} (C_1)\quad \forall x \in \Sigma, & \nabla\Psi(x) \cdot \nabla U(x) - \alpha\, \Psi(x) = 0, \\ (C_2)\quad \forall x \in \Gamma, & \Psi(x) = 0, \end{cases} \tag{7}$$
where α ≥ 0, then $N_p^\Gamma \subset \Psi^{-1}(\{0\})$. Finally, $\Psi^{-1}(\{0\})$ is a dense network of minimal paths. Indeed, if Ψ satisfies conditions (C1) and (C2), then for all $x \in \Psi^{-1}(\{0\})$, the minimal path $C_{x,p}$ linking x to the source p is included in $\Psi^{-1}(\{0\})$. Detailed proofs of these results can be found in [3]. Using conditions (C1) and (C2), we look for a solution Ψ of the following Dirichlet problem:
$$\begin{cases} \nabla\Psi(x) \cdot \nabla U(x) - \alpha\, \Psi(x) = 0 & \text{if } x \in \mathrm{int}(\Sigma), \\ \Psi(x) = d_{|\partial\Sigma}(x) & \text{if } x \in \partial\Sigma, \end{cases} \tag{8}$$
where $d_{|\partial\Sigma}$ is a signed Euclidean distance to Γ on ∂Σ. This choice ensures that Ψ satisfies the second condition (C2). Other boundary conditions satisfying (C2) could be proposed, but we found empirically that the signed distance is an adequate choice. Since Γ is a simple closed curve on ∂Σ and ∂Σ is topologically equivalent to a sphere, Γ partitions ∂Σ into two distinct open surfaces, which makes the sign choice for $d_{|\partial\Sigma}$ obvious. First, the unsigned distance from Γ on ∂Σ is computed using the Fast Marching algorithm (this time using 26-connectivity); then opposite signs are assigned to the distance on the two connected components of ∂Σ \ Γ (see Fig. 5).
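The boundary initialization can be sketched as follows, with ∂Σ and Γ given as sets of voxel coordinates (our assumption). As a swapped-in simplification, the unsigned distance is computed with Dijkstra's algorithm on the 26-neighbour voxel graph, using the Euclidean distance between voxel centres as edge length, instead of the Fast Marching algorithm used in the text; it can therefore slightly overestimate Euclidean distances in some directions. Which component receives the positive sign is arbitrary; here it is the one containing the lexicographically smallest voxel. Function names are hypothetical.

```python
import heapq
import math
from collections import deque
from itertools import product

OFFSETS_26 = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def neighbours_26(v, voxels):
    """Yield the 26-neighbours of voxel v that belong to `voxels`."""
    for dx, dy, dz in OFFSETS_26:
        w = (v[0] + dx, v[1] + dy, v[2] + dz)
        if w in voxels:
            yield w


def signed_distance_on_boundary(boundary, gamma):
    """Signed distance to gamma on the boundary voxels (0 on gamma)."""
    # 1) Unsigned distance from gamma (multi-source Dijkstra)
    dist = {v: math.inf for v in boundary}
    heap = []
    for g in gamma:
        dist[g] = 0.0
        heap.append((0.0, g))
    heapq.heapify(heap)
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale entry
        for w in neighbours_26(v, boundary):
            nd = d + math.dist(v, w)
            if nd < dist[w]:
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    # 2) Connected components of the boundary minus gamma (26-connectivity)
    rest = boundary - set(gamma)
    components, seen = [], set()
    for start in sorted(rest):
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            for w in neighbours_26(queue.popleft(), rest):
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    if len(components) != 2:
        raise ValueError(f"gamma splits the boundary into {len(components)} parts, not 2")
    # 3) Opposite signs on the two components
    signed = {g: 0.0 for g in gamma}
    for sign, component in zip((1.0, -1.0), components):
        for v in component:
            signed[v] = sign * dist[v]
    return signed


# Usage: a line of voxels cut in two by a single "valley" voxel
line = {(i, 0, 0) for i in range(10)}
d = signed_distance_on_boundary(line, {(5, 0, 0)})
print(d[(0, 0, 0)], d[(9, 0, 0)])  # 5.0 -4.0
```

The explicit check for exactly two components mirrors the topological argument above: if Γ does not separate ∂Σ under the chosen connectivity, the sign assignment is undefined, so the sketch fails loudly instead of guessing.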
Fig. 5. Transport initialization. First, the distance map from the curve Γ is computed. Then using Γ , ∂Σ \ Γ is partitioned into exactly two parts. Finally, different signs are attributed to d|∂Σ on each connected component.
Equation (8) is a stationary transport equation. The associated non-stationary PDE models the transport in time and space of material along the vector field ∇U. The stationary transport equation has been used for surface segmentation [3], for computing tissue thickness [14] and for inpainting [15]. Like most PDEs whose characteristics intersect, the stationary transport equation (8) is numerically hard to solve. Nevertheless, the direction along which information propagates (−∇U) is known, so one can design a single-pass algorithm based on an ordered sweeping of the grid points [3, 14, 15]. We propose to find the values of Ψ by exploring points of Σ in decreasing order of |Ψ|. The resulting algorithm, called Fast Transport, is similar to the Fast Marching algorithm; only the ordering and the local update scheme differ. The complexity of the Fast Transport algorithm is O(N log N). The information propagates from ∂Σ to the source point p following the direction −∇U. Thus, it is important to use an upwind scheme that takes the direction −∇U into account when approximating the derivatives of Ψ. Let $\Psi_{i,j,k}$ denote the value of Ψ at the point x with coordinates $(i h_x, j h_y, k h_z)$, $\partial_d \Psi_{i,j,k}$ the derivative of Ψ along direction d (d being the x-, y- or z-direction), and $\partial_d U_{i,j,k}$ the derivative of U along direction d. If $\partial_d U_{i,j,k} < 0$, the information is transported in the increasing d direction. Thus, along the x direction, we have:
Fig. 6. Left and middle: the function Ψ and its sign on Σ, respectively, shown on a cut view. Right: the extracted surface patch, i.e., the isosurface $\Psi^{-1}(\{0\})$, together with the network of minimal paths $N_p^\Gamma$.
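$$\partial_x \Psi_{i,j,k} = \begin{cases} \dfrac{\Psi_{i+1,j,k} - \Psi_{i,j,k}}{h_x} & \text{if } \partial_x U_{i,j,k} \ge 0, \\[2ex] \dfrac{\Psi_{i,j,k} - \Psi_{i-1,j,k}}{h_x} & \text{if } \partial_x U_{i,j,k} < 0. \end{cases}$$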
The derivatives along y and z direction are similar. The update scheme of the Fast Transport algorithm is based on the previous equation, by injecting it in equation (8), see [3] for more details. Lastly, although this scheme is of relatively low precision and dissipative, it gives satisfactory results in our experiments with an acceptable convergence speed. In our implementation α is a parameter that can be fixed through the maximum discontinuity jump of Ψ around the source p. Indeed, by considering the minimal path Cx,p , linking a point x ∈ ∂Σ to p, parametrized on the interval J = [0, L(x)], where L(x) is the Euclidean length of the path, one can prove using equation (8) that ∀ s ∈ J, Ψ Cx,p (s) = d|∂Σ (x) e−αs . Thus the discontinuity jump occurs around the source point p and is as high as |d|∂Σ (x)|e−αL(x) . Fixing a maximum discontinuity jump ε and
setting
$$
\alpha = \frac{\log \max_{x \in \partial\Sigma} |d_{|\partial\Sigma}(x)| - \log \varepsilon}{\min_{x \in \partial\Sigma} L(x)}
$$
guarantees that the discontinuity jump around the source point p is less than or equal to ε. Imposing this constraint requires computing the Euclidean length L of the minimal paths, which can easily be done during the Fast Marching propagation, as explained in [16, 17]. Fig. 6 shows the function Ψ solution of equation (8), the final segmentation result Ψ^{-1}({0}) and the network of minimal paths.
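The bound behind this choice of α is short: for every boundary point, $|d(x)|\,e^{-\alpha L(x)} \le \max|d|\, e^{-\alpha \min L} = \varepsilon$. The sketch below checks it numerically. It assumes the boundary values $d_{|\partial\Sigma}(x)$ and the path lengths L(x) are already available as arrays (for instance from the Fast Marching propagation) and that max|d| > ε, so that α > 0; the function names are hypothetical.

```python
import numpy as np

def choose_alpha(d_boundary, path_lengths, eps):
    """alpha = (log max|d| - log eps) / min L over the boundary points of Sigma."""
    d_max = np.max(np.abs(d_boundary))
    L_min = np.min(path_lengths)
    return (np.log(d_max) - np.log(eps)) / L_min

def max_jump(d_boundary, path_lengths, alpha):
    """Largest discontinuity jump |d(x)| * exp(-alpha * L(x)) over the boundary points."""
    return np.max(np.abs(d_boundary) * np.exp(-alpha * np.asarray(path_lengths)))
```

For example, with d = [−2, 1.5, 0.5], L = [10, 12, 8] and ε = 10⁻³, `choose_alpha` returns log(2000)/8, and `max_jump` with that α stays below ε.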
From a Single Point to a Surface Patch by Growing Minimal Paths
Fig. 7. We select a sub-volume from a CT cardiac image. An edge-detector potential, inversely proportional to the gradient magnitude |∇I| of the image, is then shown. The Fast Marching algorithm is launched from the selected source point to compute the minimal action map U, and the valley line of U is extracted. Finally, the information is transported from the values initialized on the boundary of the sub-volume using the Fast Transport algorithm, and the segmented surface patch is obtained by applying the marching cubes algorithm to the solution of the transport equation.
Fig. 8. On the left: segmentation of a synthetic torus. On the right: segmentation of a closed cell from an electron microscopy image. (a) Potential P, taking lower values on the features of interest, on which a single source point is selected; the other points are found automatically using the approach presented in [17]. (b) A cut view of the visited domain Ω* showing the values of the minimal action map U. (c) A cut view of the domain Ω* showing the Voronoi partition. (d) The set of sources and the valley lines detected in each Voronoi cell. (e) A cut view of the domain Ω* showing the values of the function Ψ, solution of the transport equation (8). (f) Isosurface Ψ^{-1}({0}) on which the detected keypoints, the valley lines and the geodesic meshing are superimposed. On the right: (g–i) some slices of the original image with the final segmentation Ψ^{-1}({0}) superimposed.
5 Experimental Results
Using our method, one can extract a surface patch from a single point; see Fig. 7. The main advantages of our method are that it is minimally interactive
and fast. The main constraint is that the boundary of the selected sub-volume must intersect the surface in a single closed curve. By subdividing the whole domain and selecting a few points in the sub-domains that contain the surface of interest, one could extract a full segmentation of the desired object. Recently, we presented [17] a new method for segmenting closed contours and surfaces, which builds on a variant of the minimal path approach. First, the user chooses an initial point on the desired contour. Next, new keypoints are detected automatically using a front propagation approach. We assume that the desired object has a closed boundary; this a priori knowledge of the topology is used to devise a relevant criterion for stopping the keypoint detection and the front propagation. The final domain visited by the front yields a band surrounding the object of interest. Applying this method to 3D closed objects, we can extract a network of minimal paths from a 3D image, called Geodesic Meshing. However, this network alone is not a full segmentation. The Voronoi partition of the visited domain provides a good subdivision of it, and applying the algorithm presented in this paper to each Voronoi cell yields a full segmentation of the object of interest; see Fig. 8.
6 Conclusion
In this paper we have proposed a new method to segment a surface patch from a single source point. Our method requires minimal interaction: a single source point. An important condition is that the boundary of the sub-volume containing the surface patch of interest should intersect the surface in a single closed curve. Observing that this closed curve corresponds to the valley line of the arrival time from the source point, we have proposed a heuristic to extract it automatically. Finally, we adapted an existing implicit surface segmentation method to find a complete surface that contains the valley line and the network of minimal paths linking this valley line to the source point. Our approach can be extended to segment a complete surface by subdividing the domain into several sub-domains containing the desired surface patches; a few points may then be enough to generate a coherent segmentation of the object boundary.
Acknowledgements We would like to thank Stéphane Bonneau for his contributions, and Professor Anthony J. Yezzi for interesting discussions. This work was partially supported by ANR grant SURF -NT05-2_45825.
References
1. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: active contour models. International Journal of Computer Vision 1, 321–331 (1988)
2. Cohen, L.D., Kimmel, R.: Global minimum for active contour models: a minimal path approach. International Journal of Computer Vision 24, 57–78 (1997)
3. Ardon, R., Cohen, L.D., Yezzi, A.: Fast surface segmentation guided by user input implicit extension of minimal paths. Journal of Mathematical Imaging and Vision 25, 289–305 (2006)
4. Caselles, V., Kimmel, R., Sapiro, G., Sbert, C.: Minimal surfaces based object segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 19, 394–398 (1997)
5. Cohen, L.D., Cohen, I.: Finite element methods for active contour models and balloons for 2D and 3D images. IEEE Transactions on Pattern Analysis and Machine Intelligence 15, 1131–1147 (1993)
6. Sethian, J.A.: Level Set Methods and Fast Marching Methods. Cambridge University Press, Cambridge (1999)
7. Tsitsiklis, J.N.: Efficient algorithms for globally optimal trajectories. IEEE Transactions on Automatic Control 40, 1528–1538 (1995)
8. Dijkstra, E.W.: A note on two problems in connection with graphs. Numerische Mathematik 1, 269–271 (1959)
9. Rouy, E., Tourin, A.: A viscosity solution approach to shape from shading. SIAM Journal on Numerical Analysis 29, 867–884 (1992)
10. López, A., Lloret, D.: On ridges and valleys. In: ICPR 2000: Proceedings of the International Conference on Pattern Recognition, Washington, DC, USA, p. 4059. IEEE Computer Society, Los Alamitos (2000)
11. Tang, C.K., Medioni, G.G.: Extremal feature extraction from 3-D vector and noisy scalar fields. In: IEEE Visualization 1998, October 1998, pp. 95–102 (1998)
12. Koenderink, J.: Solid Shape. MIT Press, Cambridge (1990)
13. Ardon, R., Cohen, L.D.: Fast constrained surface extraction by minimal paths. Int. J. Comput. Vision 69(1), 127–136 (2006)
14. Yezzi, A., Prince, J.L.: An Eulerian PDE Approach for Computing Tissue Thickness. IEEE Transactions on Medical Imaging 22, 1332–1339 (2003)
15. Bornemann, F., Marz, T.: Fast image inpainting based on coherence transport. JMIV 28(3), 259–278 (2007)
16. Cohen, L.D., Deschamps, T.: Segmentation of 3D tubular objects with adaptive front propagation and minimal tree extraction for 3D medical imaging. Computer Methods in Biomechanics and Biomedical Engineering 10, 289–305 (2007)
17. Benmansour, F., Cohen, L.D.: Fast object segmentation by growing minimal paths from a single point on 2D or 3D images. Journal of Mathematical Imaging and Vision 33(2), 209–221 (2009)
Optimization of Convex Shapes: An Approach to Crystal Shape Identification

Timo Eirola and Toni Lassila

Helsinki University of Technology, Institute of Mathematics, P.O. Box 1100, FI-02015 TKK, Finland
Abstract. We consider a shape identification problem for growing crystals. The shape of the crystal is to be reconstructed from a single interferometer measurement. This is an ill-posed inverse problem. The forward map from shape to interferogram is injective if we restrict the problem to convex shapes with known boundary. The problem is formulated as a shape optimization problem, which we aim to solve numerically with the gradient descent method. In the numerical computations of this paper we study the behavior of the approach in simplified cases. Using H¹-gradients (inner products) acts as a regularization method. Methods for enforcing the convexity of shapes are discussed.
1 Introduction
Shape optimization is a field of mathematical optimization concerned with finding the shape (a bounded open set with Lipschitz boundary) that minimizes a given cost functional. Boundary variational techniques can be used to compute the sensitivities of functionals with respect to shape. Comprehensive texts on shape analysis include [1] and [2]. We consider the shape identification problem of finding the shape of a growing ³He crystal that best fits the interferogram produced in a Fabry-Pérot interferometer. Based on physical principles, the crystal shape is assumed to be convex at all times. For an overview of the growth process of ³He crystals and of the interferometer setup, see [3]. The restriction to convex shapes can be used as a simplification tool in shape optimization problems. In [4] the authors showed the existence of solutions to very general shape optimization problems under the constraint that the shapes be convex. In our problem of determining shape from an interferogram, the operator solving the forward problem is generally not injective if the shapes are allowed to be nonconvex. We prove that if the convexity assumption holds and the height of the shape on the boundary of the computational domain is known, then the shape identification problem has a unique solution.
This work has been supported by the Academy of Finland (decision number 107290/04). We would like to thank Heikki Junes from the Low Temperature Laboratory at TKK for his input and introducing us to this problem.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 660–671, 2009. © Springer-Verlag Berlin Heidelberg 2009
Optimization of Convex Shapes
It has been noted previously that the convexity constraint can be difficult to handle in numerical computations, especially in higher dimensions. It is known that pointwise conditions, such as curvature conditions, can fail to guarantee convexity for functions sampled at discrete points; for further discussion of this point, see [5]. Methods for optimization over the family of convex functions have been studied in [5, 6, 7, 8, 9]. In contrast to most of these approaches, we do not impose a strict convexity constraint system, but instead use a penalization method that allows convexity to be temporarily broken when this benefits the convergence of the iteration. The shape identification problem is solved using level set methods and gradient descent for shapes. Methods for convexification by evolution equations, such as the level set method, have been considered in [10, 11]. As is typical for ill-posed inverse problems, the presence of experimental noise in the measurements requires some type of regularization. We demonstrate that using H¹-gradients (inner products) for the shape gradients acts as a form of regularization.
2 Shapes and Shape Evolution

2.1 Representing Shapes
We first define the notation. The computational domain $D \subset \mathbb{R}^d$, $d \in \{1, 2\}$, is a convex bounded open set. We consider convex shapes (open sets with Lipschitz boundary) $\Omega \subset D \times \mathbb{R}_+$ which are supported by D from below, that is,
$$
\langle n(x), e_3 \rangle < 0 \implies x_3 = 0, \qquad (1)
$$
where n is the outward normal vector field on the surface ∂Ω. A convex shape Ω supported by D can be represented in many ways. One is to give a Lipschitz function $\varphi : D \times \mathbb{R}_+ \to \mathbb{R}$ such that
$$
\Omega = \{x : \varphi(x) < 0\}, \qquad \Omega^c = \{x : \varphi(x) \ge 0\} \qquad (2)
$$
and |∇φ| is nonvanishing on ∂Ω. Then φ is called an implicit function or a level set function for Ω. An alternative representation of Ω is given by the function $u : D \to \mathbb{R}_+$ defined as
$$
u(x_1, x_2) = \sup\{x_3 \ge 0 : \varphi(x_1, x_2, x_3) \le 0\}, \qquad (3)
$$
where φ is an implicit function for Ω. We call u the height function of Ω. Note that if Ω is convex, then u is concave. Denote by $\mathcal{C} \subset H^1(D) \cap C(\overline{D})$ the set of concave functions on D that are continuous on $\overline{D}$. We also define $\mathcal{C}_h \subset \mathcal{C}$ as the subset of concave functions that are equal to h on the boundary ∂D, for a given function $h : \partial D \to \mathbb{R}_+$.
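Definition (3) translates directly into a column-wise search on a sampled level set function. The sketch below treats the case d = 1 used later in the experiments (grid axes x₁ and vertical x₃) and adds a discrete concavity check based on second differences. Returning height 0 for columns that contain no point of Ω, and the tolerance of the concavity test, are our assumptions; the sampled u is only as accurate as the vertical grid spacing.

```python
import numpy as np

def height_function(phi, x3):
    """Discrete version of (3): u = sup{x3 >= 0 : phi(., x3) <= 0} along the last axis.

    phi: level set values on a grid whose last axis is the vertical coordinate x3
    x3:  increasing vertical grid coordinates (length phi.shape[-1])
    Columns with no grid point inside the shape get height 0 (assumption).
    """
    inside = np.asarray(phi) <= 0
    # Index of the topmost grid point with phi <= 0 in each column
    top = inside.shape[-1] - 1 - np.argmax(inside[..., ::-1], axis=-1)
    return np.where(inside.any(axis=-1), np.asarray(x3)[top], 0.0)

def is_concave_1d(u, tol=1e-12):
    """True if every second difference of the 1-d profile u is <= tol."""
    return bool(np.all(np.diff(u, 2) <= tol))
```

For φ(x₁, x₃) = x₃ − g(x₁) with a concave g, `height_function` recovers g up to one vertical grid step, rounded down.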
T. Eirola and T. Lassila

2.2 Level Set Methods
Consider an initial shape Ω₀ and the evolution of its boundary ∂Ω₀ under a smooth velocity field v(x, t). When the shape Ω(t) at time t is represented by an implicit function φ(·, t), the evolution of the implicit function in time has the Eulerian representation
$$
\varphi_t(x, t) + v_n(x, t)\,|\nabla\varphi(x, t)| = 0, \qquad (4)
$$
where v_n is the component of v in the outward normal direction of ∂Ω. This is called the level set equation. Level set methods are a generic framework of nonlinear hyperbolic-parabolic PDEs for implicit functions that can be used to model the evolution of shapes under certain types of flows. For a general introduction to level set methods, see [12]; for a survey of level set methods specifically in inverse problems, see [13].
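Equation (4) is usually discretized with an upwind scheme. The following 1-d sketch uses the first-order Osher-Sethian upwind approximation of |∇φ| for a nonnegative normal speed; it illustrates the level set equation and is not the discretization of the toolbox [20] used in the experiments. Starting from φ(x) = |x| − 0.3, whose negative set is the interval (−0.3, 0.3), a constant outward speed v_n = 1 should move the right end point to 0.3 + t.

```python
import numpy as np

def level_set_step(phi, vn, dx, dt):
    """One explicit step of phi_t + vn * |phi_x| = 0 in 1-d, valid for vn >= 0.

    Uses the first-order Osher-Sethian upwind gradient
    |phi_x| ~ sqrt(max(D-, 0)^2 + min(D+, 0)^2); stable for dt * vn <= dx.
    """
    p = np.pad(phi, 1, mode="edge")      # replicate end values (zero-slope ends)
    d_minus = (p[1:-1] - p[:-2]) / dx    # backward differences
    d_plus = (p[2:] - p[1:-1]) / dx      # forward differences
    grad = np.sqrt(np.maximum(d_minus, 0.0) ** 2 + np.minimum(d_plus, 0.0) ** 2)
    return phi - dt * vn * grad

def right_zero_crossing(x, phi):
    """Position of the last sign change phi < 0 -> phi >= 0, by linear interpolation."""
    i = np.nonzero(phi < 0)[0][-1]
    return x[i] + (x[i + 1] - x[i]) * (-phi[i]) / (phi[i + 1] - phi[i])

x = np.linspace(-1.0, 1.0, 201)
phi = np.abs(x) - 0.3
for _ in range(40):                       # total time t = 40 * 0.005 = 0.2
    phi = level_set_step(phi, vn=1.0, dx=x[1] - x[0], dt=0.005)
print(right_zero_crossing(x, phi))        # close to 0.5
```

The choice of upwind differences depends on the sign of the speed; a negative v_n would require swapping the roles of the forward and backward differences.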
3 Shape Optimization

3.1 Shape Derivatives
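Let $J : \Sigma \to \mathbb{R}$ be a shape functional defined on some family Σ of admissible shapes, and let Ω_t denote the shape obtained by transporting Ω₀ along the smooth velocity field v up to time t. The derivative with respect to shape at Ω₀ in the direction v is defined as the limit
$$
dJ(\Omega_0; v) = \lim_{t \to 0^+} \frac{J(\Omega_t) - J(\Omega_0)}{t} \qquad (5)
$$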
when it exists. Under some general assumptions (see Chap. 8 of [1] for details), this expression is bounded and linear with respect to v, and is supported only on the boundary of Ω₀:
$$
dJ(\Omega_0; v) = \int_{\partial\Omega_0} D \cdot v_n \, dS. \qquad (6)
$$
Using the shape derivative (6), the shape functional can be expanded as
$$
J(\Omega_t) = J(\Omega_0) + t\, dJ(\Omega_0; v) + o(t). \qquad (7)
$$
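For a given Hilbert space H(∂Ω₀), we look for the unique function $\nabla_S J \in H(\partial\Omega_0)$ such that
$$
dJ(\Omega_0; v) = \langle \nabla_S J, v_n \rangle_H. \qquad (8)
$$
(Such a function exists and is unique by the Riesz representation theorem whenever dJ(Ω₀; ·) is bounded on H(∂Ω₀).) Then ∇_S J is the shape gradient of J with respect to the chosen inner product. If the normal velocity field is chosen as the negative shape gradient, $v_n = -\nabla_S J(\Omega_0)$, we have
$$
J(\Omega_t) = J(\Omega_0) - t\, \|\nabla_S J(\Omega_0)\|^2_{H(\partial\Omega_0)} + o(t) < J(\Omega_0) \qquad (9)
$$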
for sufficiently small t > 0, provided ∇_S J(Ω₀) ≠ 0. This is the gradient descent method for shape optimization. The negative gradient flow can be implemented efficiently with numerical level set methods.
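The phrase "for sufficiently small t > 0" in (9) is what a backtracking line search makes operational; the experiments in Section 5 choose the step with the Armijo rule [21]. Below is a minimal sketch of Armijo backtracking for a finite-dimensional stand-in of the shape functional (a parameter vector with a Euclidean gradient); the constants t0, c and the halving factor are common defaults we chose, not values from the paper.

```python
import numpy as np

def armijo_step(J, grad, x, t0=1.0, c=1e-4, shrink=0.5, max_halvings=50):
    """One gradient descent step with Armijo backtracking.

    Accepts the first t = t0 * shrink**m such that
        J(x - t*g) <= J(x) - c * t * ||g||^2,   with g = grad(x),
    a guaranteed decrease that mirrors (9).
    Returns the new point and the accepted step (0.0 if none was found).
    """
    g = np.asarray(grad(x), dtype=float)
    J0 = J(x)
    gg = float(g @ g)
    t = t0
    for _ in range(max_halvings):
        x_new = x - t * g
        if J(x_new) <= J0 - c * t * gg:
            return x_new, t
        t *= shrink
    return x, 0.0
```

For a shape functional, x would be replaced by the shape (e.g. its level set function) and the update x − t g by the level set evolution over time t with v_n = −∇_S J.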
3.2 Convexity Constraints
To obtain level set methods that preserve the convexity of the shape, we follow the basic idea of constrained gradient descent. Let G(Ω) be a shape constraint functional. We consider the constrained shape optimization problem
$$
\min_{\Omega} J(\Omega) \quad \text{subject to } G(\Omega) = 0. \qquad (10)
$$
If J and G are shape differentiable with shape gradients ∇_S J and ∇_S G, we introduce a Lagrange multiplier μ and obtain the necessary conditions for a constrained minimum
$$
\nabla_S J(\Omega) + \mu \nabla_S G(\Omega) = 0, \qquad (11)
$$
$$
G(\Omega) = 0. \qquad (12)
$$
A C² shape in the plane is convex if the curvature of its boundary is nonnegative. In three dimensions, a sufficient condition for convexity is that both principal curvatures of the surface are nonnegative. Let Ω be a convex shape with height function u. Then the minimum curvature k₁ of the surface is given by
$$
k_1 = -\frac{u_{x_1x_1} + u_{x_2x_2} + \sqrt{(u_{x_1x_1} - u_{x_2x_2})^2 + (2u_{x_1x_2})^2}}{\sqrt{1 + u_{x_1}^2 + u_{x_2}^2}}. \qquad (13)
$$
This follows from taking the smaller eigenvalue of the matrix representation of the second fundamental form. We extend k₁ to all of D × ℝ₊ by setting
$$
k_1(x_1, x_2, x_3) = k_1(x_1, x_2, u(x_1, x_2)) \quad \text{for all } x_3 \ge 0. \qquad (14)
$$
Let Ω be supported by D and define $k := k_1 \sqrt{1 + |\nabla u|^2}$. We use the constraint functional
$$
G(\Omega) = \int_{\partial\Omega} u(x) \max\{0, -k_1(x)\} \, dS. \qquad (15)
$$
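This functional vanishes if and only if k₁ is everywhere nonnegative. The usefulness of the scaling by u is shown by the following computation, in which we change the integral over ∂Ω into one over D (using $dS = \sqrt{1 + |\nabla u|^2}\, dx_1 dx_2$ and the fact that, by (14), k does not depend on x₃):
$$
G(\Omega) = \int_{\partial\Omega} u \max\Big\{0, -\frac{k}{\sqrt{1+|\nabla u|^2}}\Big\} dS
= \int_D \max\{0, -u k\} \, dx_1 dx_2
= \int_D \int_0^{u(x_1,x_2)} \max\{0, -k\} \, dx_3 \, dx_2 \, dx_1
= \int_\Omega \max\{0, -k\} \, dx.
$$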
According to Theorem 4.2 of Chap. 8 in [1], this functional has the L² shape gradient
$$
\nabla_S G = \max\{0, -k\}. \qquad (16)
$$
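The constraint functional can be evaluated directly from a sampled height function via the D-form of the computation above, $G = \int_D u \max\{0, -k\}\, dx_1 dx_2$ (equal to the second expression since u ≥ 0), with $k = k_1\sqrt{1+|\nabla u|^2}$ and k₁ as in (13). The sketch below is a minimal NumPy version; the uniform grid, the use of `np.gradient` for all derivatives, and the rectangle-rule quadrature are our choices.

```python
import numpy as np

def convexity_penalty(u, h):
    """Discrete G(Omega) = sum over D of u * max(0, -k) * h^2.

    u: height function on a uniform grid over D (axis 0 = x1, axis 1 = x2)
    h: grid spacing in both directions
    k = k1 * sqrt(1 + |grad u|^2) = -(u11 + u22 + sqrt((u11 - u22)^2 + (2 u12)^2)).
    """
    u1 = np.gradient(u, h, axis=0, edge_order=2)
    u2 = np.gradient(u, h, axis=1, edge_order=2)
    u11 = np.gradient(u1, h, axis=0, edge_order=2)
    u22 = np.gradient(u2, h, axis=1, edge_order=2)
    u12 = np.gradient(u1, h, axis=1, edge_order=2)
    k = -(u11 + u22 + np.sqrt((u11 - u22) ** 2 + (2.0 * u12) ** 2))
    return float(np.sum(u * np.maximum(0.0, -k)) * h * h)
```

A concave height function such as 1 − x₁² − x₂² gives G = 0, while a convex one such as 1 + x₁² + x₂² is penalized.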
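We obtain the penalty function formulation of the level set equation (4) with a convexity constraint,
$$
\varphi_t + \big(v_n + \mu \max\{0, -k\}\big)|\nabla\varphi| = 0, \qquad (17)
$$
with a penalty parameter μ > 0. This method is a version of the min/max curvature flows studied in [14], since (17) can be rewritten as
$$
\varphi_t + v_n |\nabla\varphi| = \mu \min\{0, k\} \, |\nabla\varphi|, \qquad (18)
$$
and k has the same sign as the minimum curvature k₁.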
Furthermore, the minimum curvature flow convexifies the initial shape, which justifies our choice of the constraint functional (15). The following theorem was proven in [11]:

Theorem 1. In the case v_n ≡ 0, the viscosity solution of equation (17) converges to the convex hull of the initial shape Ω₀ as t → ∞.
4 A Problem in ³He Crystal Imaging

4.1 Fabry-Pérot Interferometer Measurement of a Crystal
The formation of faceted crystals in low-temperature ³He has been a subject of study in the low-temperature physics community. It is known that at temperatures below 200 mK smooth facets appear that correspond to orientations of the lattice planes. The problem of predicting which facets appear at which temperature is still open. It is known that as the temperature is increased past the so-called roughening limit, the facets become rounded and no longer appear. The theoretical roughening limit is much higher than what has been observed in practical experiments. We consider an experimental setup in which liquid ³He at a temperature below 200 mK is placed between the two plates of a Fabry-Pérot interferometer. Overpressure is then exerted to allow crystals to form. As light passes through the crystals, a diffraction pattern is observed on a CCD imaging array. By relating the intensity of the interferogram to the phase delay through the crystal at each point, we can determine the shape of the crystal and the orientation of all the facets.

4.2 Convexity of Crystals and the Growth Process
The growth of crystals is governed by three principal forces: the external work done on the system by the driving overpressure, the surface tension between the liquid and solid helium, and gravity. When the crystal growth process is sufficiently slow, we can assume that at each measurement the crystal has reached thermal equilibrium. The crystal shape is then determined by minimizing a surface energy, which leads to an anisotropic mean curvature flow that models the growth process of crystals [15]. Such flows are known to preserve the convexity of shapes [16]. We therefore assume that, apart from small irregularities, the thermal equilibrium shape is also convex. This assumption has been verified in experimental measurements.
4.3 Inverse Problem of Shape from Interferogram
Let D = [0, 1]² be the domain of the interferogram and $f : D \to \mathbb{R}$ a function that gives the intensity of the interference pattern at each point of the CCD. The physical parameters are Δn_sl, the difference between the refractive indices of solid and liquid ³He; λ, the wavelength of the interferometer laser; and a(x), the amplitude. The intensity of the interference pattern at each point is given approximately by
$$
F(u)(x) = a(x)\, \phi\Big(\frac{\Delta n_{sl}}{\lambda}\, u(x)\Big) = f(x), \qquad (19)
$$
where $\phi : \mathbb{R} \to [-1, 1]$ is a continuously differentiable, piecewise strictly monotone waveform function. Note that this definition excludes square or sawtooth waveforms. To simplify matters, we assume the laser amplitude to be almost constant and known, a(x) ≈ a. The inverse problem to be solved is: given an interferogram $f \in L^2(D)$ of measured intensities (with noise), deduce the shape of the crystal Ω. This problem can be posed as a shape optimization problem. Let Ω be a convex trial shape supported by D, and denote the bottom part of its surface by Γ_b := ∂Ω ∩ D. We consider the shape functional with the L²-norm
$$
J(\Omega) = \frac{1}{2} \int_{\partial\Omega \setminus \Gamma_b} |\phi(x_3) - Sf(x_1, x_2)|^2 \, dS, \qquad (20)
$$
where ϕ is a continuously differentiable and piecewise strictly monotone function and $S : L^2(D) \to H^1(D)$ is a smoothing operator. The corresponding shape optimization problem is then
$$
\min_{\Omega \in \Sigma^{\mathrm{convex}}_{\Gamma_b}} J(\Omega), \qquad (21)
$$
where $\Sigma^{\mathrm{convex}}_{\Gamma_b}$ is a family of convex shapes with Γ_b fixed. The choice of this family will be discussed later. We have the following existence theorem from [4]:

Theorem 2. Let f be such that Sf is continuous. Then the shape optimization problem (21) has at least one solution.

4.4 Is the Inverse Problem Uniquely Solvable?
It is possible to construct examples showing that, in the absence of a convexity constraint, the inverse problem of finding the shape Ω from its interference pattern f is not uniquely solvable, even when we impose a perimeter constraint such as requiring Γ_b to be fixed. But if we require convexity and fix u on the boundary ∂Γ_b, we have the following result:

Theorem 3. Let $D \subset \mathbb{R}^d$ be a bounded convex open set and Γ its boundary. Fix a function $h \in C(\Gamma)$ on the boundary. Let $\mathcal{C}_h$ be the family of concave functions $u : D \to \mathbb{R}$ in $C(\overline{D})$ such that $u|_\Gamma = h$. Let the operator $F : H^1(D) \to H^1(D)$ be defined as
$$
(Fu)(x) = \phi(u(x)), \qquad (22)
$$
where ϕ is a continuously differentiable and piecewise strictly monotone function. Then the restriction of F to $\mathcal{C}_h$ is injective.

Proof. Case d = 1. Let $u, v : [a, b] \to \mathbb{R}$ be distinct concave functions such that u(a) = v(a), u(b) = v(b), and ϕ(u) ≡ ϕ(v). Let $(\xi, \eta) \subset [a, b]$ be any open interval on which u ≠ v but u(ξ) = v(ξ) and u(η) = v(η). Without loss of generality we assume u > v on (ξ, η). Since ϕ is continuously differentiable and ϕ(u) = ϕ(v) near ξ while u ≠ v on (ξ, η), ϕ is not locally injective at u(ξ), so the inverse function theorem gives ϕ′(u(ξ)) = 0. Since ϕ is piecewise strictly monotone, ϕ′ has only isolated zeros. Thus the local behavior of ϕ near u(ξ) can be of only two types, a) or b), as shown in Fig. 1.
b)
ϕ
u, v
u(ξ) = v(ξ)
u(ξ) = v(ξ)
Fig. 1. The different kinds of possible local behavior of the function ϕ(u) near a bifurcation point ξ
Since u is concave, there exists an interval (ξ, ξ + ε) on which it is either constant, increasing, or decreasing:
1. If u were constant on some interval (ξ, ξ + ε), then so would be ϕ(v). But since ϕ′ cannot vanish on any neighborhood of u(ξ), v would then also be constant on (ξ, ξ + ε), hence equal to u there, a contradiction. So neither u nor v can be locally constant past the bifurcation point ξ.
2. Assume that u is increasing on some interval (ξ, ξ + ε) and the local behavior of ϕ is as in a). Then v must be decreasing on (ξ, ξ + ε).
3. Assume that u is decreasing on some interval (ξ, ξ + ε) and the local behavior of ϕ is as in b). But since u > v, case b) is impossible.
Thus immediately after the bifurcation point ξ we must have u increasing and v decreasing. Using the same argument at η, we get that u must be decreasing and v increasing on some interval (η − ε, η). But v is concave and cannot be first decreasing and later increasing, a contradiction.
Case d ≥ 2. For every pair of points x, y ∈ Γ, take the line segment L connecting x to y and consider the restrictions u|_L and v|_L. These are concave functions of one variable with the same end values h(x), h(y) and with ϕ(u|_L) ≡ ϕ(v|_L), so by the case d = 1 they coincide on L. Since D is convex, every point of D lies on such a segment, and therefore u and v are equal everywhere.

We remark that when the measurement is noisy, the uniqueness of the solution can be lost. This is because the range of the forward operator F is nonconvex, so if the measurement f lies outside the range of F, the minimization problem (21) can have multiple solutions.

4.5 Formulation for the H¹-Variation of a Shape Functional
To solve the optimization problem (21) with the gradient descent method, we need the shape gradient (8) of the functional. While the gradient could simply be computed in the L² inner product, we prefer the H¹ inner product, since the resulting gradients are smoother and hopefully also lead to a numerically more robust algorithm. The need for regularizing the shape variations is well established in the literature, but the relation with the regularization of ill-posed inverse problems is perhaps less so. The effect of different inner products on the convergence of the gradient descent iteration was studied in more detail in [17].

Lemma 1. Consider the shape functional for d-dimensional convex shapes $\Omega \subset D \times \mathbb{R}_+$,
$$
J(\Omega) = \int_{\partial\Omega \setminus \Gamma_b} g(x, n) \, dS, \qquad (23)
$$
where g(x, n) is H¹ with respect to both arguments. Then J is shape differentiable, and the shape derivative dJ(Ω; v) with respect to a normal variation $v_n \in H^1_0(D)$ is given by
$$
dJ(\Omega; v) = \int_D \Big( -\nabla_n g \cdot \nabla v_n + (\nabla_x g \cdot n + \kappa g)\, v_n \Big) |F| \, d\xi \qquad (24)
$$
for all $v_n \in H^1_0(D)$, where $|F| := \sqrt{1 + |\nabla u|^2}$ is the change-of-integrals factor, u being the height function of the convex shape.

Proof. The details are given, for example, in [18]; here we reproduce only the general procedure. Let Ω be given and φ be its implicit function. By the coarea formula [19],
$$
J(\Omega) = \int_{\partial\Omega \setminus \Gamma_b} g(x, n) \, dS = \int_{\mathbb{R}^d} g\Big(x, \frac{\nabla\varphi}{|\nabla\varphi|}\Big) |\nabla\varphi| \, \delta(\varphi) \, \mathbb{1}_{\Gamma_b^c} \, dx.
$$
The variation can now be performed in terms of φ. Let $v_n = -\psi/|\nabla\varphi|$ be an extension velocity field to the whole of $\mathbb{R}^d$ such that $\psi|_{\Gamma_b} \equiv 0$, i.e. the base remains fixed. After some computations, the Gâteaux derivative is given by
$$
dJ(\Omega; v) = \frac{d}{d\tau} J(\varphi + \tau\psi)\Big|_{\tau=0} = -\int_{\mathbb{R}^d} \frac{\psi}{|\nabla\varphi|} \, \nabla \cdot \Big( \nabla_n g + g \frac{\nabla\varphi}{|\nabla\varphi|} \Big) |\nabla\varphi| \, \delta(\varphi) \, dx.
$$
Integration by parts gives
$$
\int_{\partial\Omega} \nabla \cdot (\nabla_n g) \, v_n \, dS = -\int_{\partial\Omega} \nabla_n g \cdot \nabla v_n \, dS,
$$
and the result follows by using the coarea formula in the other direction and noting that n = ∇φ/|∇φ| and that κ = ∇ · n is the mean curvature of ∂Ω.

We can thus compute the negative shape gradient of J with respect to the H¹ inner product as the solution $w \in H^1_0(D)$ of the elliptic problem
$$
\int_D (\nabla v_n \cdot \nabla w + v_n w) \, d\xi + \int_D (\alpha v_n + \beta \cdot \nabla v_n) \, d\xi = 0 \quad \text{for all } v_n \in H^1_0(D), \qquad (25)
$$
where α = |F|(∇_x g · n + κg) and β = −|F|∇_n g as in Lemma 1, together with homogeneous Dirichlet boundary conditions. For the convex constrained iteration it is also beneficial to use the H¹-gradient of the constraint functional (15), which can be obtained by the same procedure from (16).
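In one dimension (D = [0, 1], the setting of the experiments below), (25) is a two-point boundary value problem that can be solved with piecewise-linear finite elements. The sketch assumes α is sampled at the mesh nodes and β at the element midpoints, and it lumps the mass matrix; these discretization choices are ours. For α ≡ 1 and β ≡ 0 the strong form is −w″ + w = −1 with w(0) = w(1) = 0, whose exact solution w(x) = −1 + cosh(x − 1/2)/cosh(1/2) gives a simple check.

```python
import numpy as np

def h1_negative_gradient(alpha, beta, length=1.0):
    """Solve the 1-d version of (25) for w in H_0^1(0, length) with P1 elements.

    alpha: values at the n+1 mesh nodes (end points included)
    beta:  values at the n element midpoints
    Returns w at the n+1 nodes, with w = 0 at both ends.
    """
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    n = beta.size                        # number of elements (n >= 2)
    if alpha.size != n + 1:
        raise ValueError("alpha needs one value per node, beta one per element")
    h = length / n
    m = n - 1                            # interior unknowns
    # Stiffness (1/h) * tridiag(-1, 2, -1) plus lumped mass h * I
    A = (np.diag(np.full(m, 2.0 / h + h))
         + np.diag(np.full(m - 1, -1.0 / h), 1)
         + np.diag(np.full(m - 1, -1.0 / h), -1))
    # Load: -(int alpha*phi_i + int beta*phi_i') = -(h*alpha_i + beta_left - beta_right)
    b = -(h * alpha[1:-1] + beta[:-1] - beta[1:])
    w = np.zeros(n + 1)
    w[1:-1] = np.linalg.solve(A, b)
    return w
```

Feeding a rough α through this solve returns a smoother w; this is the regularizing effect of the H¹ inner product discussed above.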
5 Numerical Experiments

5.1 Methodology
As a first approach to the optimization of convex shapes, we limit the numerical experiments to 1-d and choose D = [0, 1]. The questions to be answered are:
– Does the convexity constraint penalty term improve the quality of the recovered shapes?
– We would like to estimate the anisotropy tensor of the mean curvature flow that drives the crystal formation process. Can reasonable estimates of the curvatures be obtained from the recovered shapes?
The quality of the recovered shapes was studied with two different crystal profiles (shown in Fig. 2). Case A represents a faceted crystal, while Case B is a smooth profile. For the forward model we used a sinusoidal waveform, f(x) = sin(γu(x)). To measure the error of the recovered shapes, we generated a test sample of 100 noisy realizations of the data f, each with 10% standard deviation, and took the mean L²-error over this sample set. At each descent step the shape derivative (24) was computed, and the H¹-gradient was solved from equation (25). The normal velocity field was extended to the entire computational domain, and the resulting level set evolution was solved with the level set toolbox [20] for Matlab. The gradient descent step size was chosen according to the Armijo rule [21] to obtain decreasing steps in the functional (20). The iteration was stopped when the recovered height function u changed by less than 0.1% in the L²-norm during the previous step. For the convex constrained iteration (17) we used a penalty parameter value of μ = 10⁵.
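The synthetic data can be mimicked as follows: a height profile u on D = [0, 1] is mapped through f = sin(γu), Gaussian noise is added, and the misfit (20) is evaluated on the top boundary using dS = √(1 + u′²) dx. The value of γ, the test profile, and reading "10% standard deviation" as additive noise with σ = 0.1 (the waveform has amplitude 1) are our assumptions. The relative L²-error is the accuracy measure reported in Table 1.

```python
import numpy as np

def forward(u, gamma):
    """Forward model of the experiments: f(x) = sin(gamma * u(x))."""
    return np.sin(gamma * np.asarray(u, dtype=float))

def noisy_data(u, gamma, sigma=0.1, rng=None):
    """Forward data plus additive Gaussian noise of standard deviation sigma (assumed model)."""
    rng = np.random.default_rng() if rng is None else rng
    f = forward(u, gamma)
    return f + sigma * rng.standard_normal(f.shape)

def misfit(u, sf, gamma, dx):
    """Rectangle-rule version of (20) on the top boundary of a 1-d height profile."""
    du = np.gradient(u, dx)
    integrand = (forward(u, gamma) - sf) ** 2 * np.sqrt(1.0 + du ** 2)
    return 0.5 * float(np.sum(integrand) * dx)

def relative_l2_error(u, u_true):
    """Relative L2 error ||u - u_true|| / ||u_true|| on a uniform grid."""
    return float(np.linalg.norm(u - u_true) / np.linalg.norm(u_true))

x = np.linspace(0.0, 1.0, 1001)
u_true = 4e-4 * (1.0 - (2.0 * x - 1.0) ** 2)   # hypothetical smooth profile
gamma = 2.0e4                                    # hypothetical scaling
f = noisy_data(u_true, gamma, rng=np.random.default_rng(0))
```

The smoothing operator S of (20) would be applied to f before evaluating `misfit`; with S the identity, the misfit of the true profile is nonzero only because of the noise.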
[Figure: two panels, Case A (left) and Case B (right); horizontal axis from 0 to 1, vertical axis in units of 10⁻⁴.]
Fig. 2. Left: True crystal shape (solid line) and initial guess (dashed line) for the test Case A. Right: Same for Case B.
5.2 Choosing the Smoothing Operator S
To construct the smoothing operator S in (20), we considered linear diffusion operators of the form
$$
(Sf)(x_i) = (I - \delta D_{xx})^{-K} f(x_i), \qquad K \in \mathbb{N}, \qquad (26)
$$
where D_xx is an operator giving a discrete approximation of the second derivative of f at x_i. The simplest choice (in the 1-d case) is the symmetric difference approximation of the second derivative,
$$
D_{xx} f(x_i) = \frac{f(x_{i+1}) - 2f(x_i) + f(x_{i-1})}{\Delta x^2}. \qquad (27)
$$
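The smoothing operator (26) with the difference approximation (27) amounts to K implicit diffusion steps, each a tridiagonal linear solve. In the sketch below, the treatment of the end points (mirrored ghost values, i.e. a discrete zero-derivative condition) and the assumption that the samples span D = [0, 1] uniformly are ours; the paper does not state them. With these choices constant data are left unchanged, and K = 0 is the identity.

```python
import numpy as np

def smoothing_operator(f, delta=0.01, K=100, dx=None):
    """Apply (I - delta * Dxx)^(-K) to the samples f, with Dxx the 3-point stencil (27)."""
    g = np.asarray(f, dtype=float).copy()
    n = g.size
    if dx is None:
        dx = 1.0 / (n - 1)                       # samples assumed to span D = [0, 1]
    D = (np.diag(np.full(n, -2.0))
         + np.diag(np.ones(n - 1), 1)
         + np.diag(np.ones(n - 1), -1))
    D[0, 1] = 2.0                                # mirrored ghost node at the left end
    D[-1, -2] = 2.0                              # mirrored ghost node at the right end
    A = np.eye(n) - delta * D / dx ** 2
    for _ in range(K):                           # K implicit diffusion steps
        g = np.linalg.solve(A, g)
    return g
```

For long signals, the dense solve could be replaced by a banded solver such as `scipy.linalg.solve_banded`.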
The operator (26) built on this difference approximation tends to smooth out especially the corners of f, so for faceted profiles K should be chosen moderately small. We chose δ = 0.01 and considered the cases K = 0 (no smoothing) and K = 100 (with smoothing).

5.3 Comparison of Convergence with and without the Convexity Constraint
Our first observation was that the L²-gradient descent iteration in general does not work at all: the computed boundary variations were too oscillatory. With an H¹-gradient, the regularization was enough to provide local convergence from an initial guess with 15%–20% relative L²-error. In Table 1 we list the accuracy of the obtained shapes, measured by the relative L²-error from the true crystal shape. In both cases the recovered solutions were roughly within 3% relative error. This remained the case with convexity constraints and with smoothing of the data. The sharp corner of Case A also produced more error than the smooth profile of Case B.

5.4 Estimating the Curvature(s) of the Crystal Surface
One way of evaluating the quality of the recovered crystal shapes is to see whether useful estimates of the curvature(s) of the crystal surface can be obtained. We ran both the unconstrained and the convex constrained iterations for Case A. We also tested the effect of increasing K in the smoothing operator (26). The obtained curvatures are plotted in Fig. 3. In this case the curvature should be zero almost everywhere, with a singularity at one point. None of the curvature estimates is free of numerical artifacts. The convex constrained solution gives curvatures that are nearly nonnegative everywhere. The effect of the added smoothing is to dampen the oscillations of the recovered curvatures.

Table 1. Relative L²-error from the true profile u obtained by the unconstrained (μ = 0) and convex constrained (μ = 10⁵) iterations, with and without smoothing

Case | No smoothing, μ = 0 | No smoothing, μ = 10⁵ | With smoothing, μ = 0 | With smoothing, μ = 10⁵
A | 1.71 % | 2.61 % | 1.98 % | 2.63 %
B | 0.47 % | 0.51 % | 0.47 % | 0.61 %

[Figure: four panels, μ = 0 without smoothing, μ = 0 with smoothing, μ = 10⁵ without smoothing, μ = 10⁵ with smoothing; horizontal axis from 0 to 1.]
Fig. 3. Estimated curvatures for the Case A obtained with the unconstrained and convex constrained iterations, with and without smoothing of the data. The true curvature is denoted by a dashed line.
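For a 1-d height profile, the curvature of the crystal surface can be estimated with the standard formula for the graph of a function, κ = −u″/(1 + u′²)^{3/2}, signed here so that concave profiles (convex crystals) give κ ≥ 0. The paper does not state how the curves in Fig. 3 were computed; this finite-difference sketch is only one plausible way, and it shows why curvature estimates are sensitive: they use second derivatives of the recovered profile, which amplify small oscillations.

```python
import numpy as np

def profile_curvature(u, dx):
    """Curvature of the graph x -> u(x), signed so that concave profiles give kappa >= 0."""
    du = np.gradient(u, dx, edge_order=2)
    d2u = np.gradient(du, dx, edge_order=2)
    return -d2u / (1.0 + du ** 2) ** 1.5
```

On an arc of a circle of radius 2 the estimate is close to 0.5 away from the end points, and on a straight line it is zero up to rounding.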
6 Conclusions
The inverse problem of crystal shape identification from a single interferogram is uniquely solvable if the shape is required to be convex and boundary data are available. Numerical level set methods can be used to solve such problems with the gradient descent method. We added a penalty term to enforce the convexity of the shapes. By choosing H¹ shape gradients, we introduced regularization into the problem, which allowed us to recover solutions of the otherwise ill-posed problem. We demonstrated that local convergence is obtained even when relatively large amounts of noise are present in the interferogram. The convex penalty term improved the quality of the recovered surface curvatures.
References
1. Delfour, M., Zolésio, J.P.: Shapes and geometries - analysis, differential calculus, and optimization. SIAM, Philadelphia (2001)
2. Sokolowski, J., Zolésio, J.P.: Introduction to shape optimization: shape sensitivity analysis. Springer, Heidelberg (2003)
3. Tsepelin, V., Alles, H., Babkin, A., Jochemsen, R., Parshin, A., Todoshchenko, I., Tvalashvili, G.: Morphology and growth kinetics of ³He crystals below 1 mK. J. Low Temp. Phys. 129(5-6), 489–530 (2002)
4. Buttazzo, G., Guasoni, P.: Shape optimization problems over classes of convex domains. J. Convex Anal. 4(2), 343–351 (1997)
5. Aguilera, N., Morin, P.: Approximating optimization problems over convex functions. Numer. Math. 111(1), 1–34 (2008)
6. Carlier, G., Lachand-Robert, T.: Convex bodies of optimal shape. J. Convex Anal. 10, 265–273 (2003)
7. Carlier, G., Lachand-Robert, T., Maury, B.: A numerical approach to variational problems subject to convexity constraint. Numer. Math. 88, 299–318 (2001)
8. Carlier, G., Lachand-Robert, T., Maury, B.: H¹-projection into set of convex functions: A saddle point formulation. In: ESAIM: Proc., vol. 10, pp. 277–290 (2001)
9. Lachand-Robert, T., Oudet, É.: Minimizing within convex bodies using a convex hull method. SIAM J. Optim. 16(2), 368–379 (2005)
10. Hinterberger, W., Scherzer, O.: Variational methods on the space of functions of bounded Hessian for convexification and denoising. Comput. 76, 109–133 (2006)
11. Vese, L.: A method to convexify functions via curve evolution. Commun. Partial Differential Equations 24(9), 1573–1591 (1999)
12. Osher, S., Fedkiw, R.: Level set methods and dynamic implicit surfaces. Applied Mathematical Sciences, vol. 153. Springer, Heidelberg (2002)
13. Burger, M., Osher, S.: A survey on level set methods for inverse problems and optimal design. European Journal of Applied Mathematics 16(2), 263–301 (2005)
14. Malladi, R., Sethian, J.: Image processing: flows under min/max curvature and mean curvature. Graph. Models Image Process. 58(2), 127–141 (1996)
15. Wettlaufer, J., Jackson, M., Elbaum, M.: A geometric model for anisotropic crystal growth. J. Phys. A 27, 5957–5967 (1994)
16. Bellettini, G., Caselles, V., Chambolle, A., Novaga, M.: Crystalline mean curvature flow of convex sets. Arch. Ration. Mech. Anal. 179, 109–152 (2005)
17. Burger, M.: A framework for the construction of level set methods for shape optimization and reconstruction. Interfaces Free Bound 5, 301–329 (2003)
18. Solem, J.: Variational problems and level set methods in computer vision - theory and applications. PhD thesis, Lund University (2006)
19. Federer, H.: Geometric measure theory. Springer, New York (1969)
20. Mitchell, I.: The flexible, extensible and efficient toolbox of level set methods. J. Sci. Comput. (2007) (online first)
21. Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific J. Math. 16(1) (1966)
An Implicit Method for Interpolating Two Digital Closed Curves on Parallel Planes
Nikolaos Gabrielides and Laurent Cohen
Centre de Recherche en Mathématique de la Décision, Université Paris IX, Dauphine, Place du Maréchal de Lattre de Tassigny, 75775 Paris Cedex 16, France
Abstract. Ardon et al. [2] presented an implicit method for surface segmentation in 3D images, in which the boundary of the surface is assumed to be constrained by two given curves in the image. In this work we adapt this approach to interpolate two given digital curves lying on parallel planes, by introducing an artificial image potential based on a triangular facet surface interpolation technique.
1 Introduction
Suppose we are given two digital contours Γ and Δ, i.e., two closed ordered sets of black voxels on a white background, lying on the planes $z = r_\Gamma$ and $z = r_\Delta$ of a 3D image $\Omega_{pqr}$, which discretizes the volume $\Omega \subset \mathbb{R}^3$, with p, q and r being the number of voxels distributed equidistantly along the x, y and z axes, respectively. We wish to construct a surface that interpolates the data sets Γ and Δ. A formulation similar to this digital contour interpolation problem arises in the construction of a gradual transformation from the closed polygon $P_\Gamma$ to the closed polygon $P_\Delta$, widely known as the morphing problem. Following Efrat et al. [11], this transformation can be expressed as a mapping
$$M(P_\Gamma, P_\Delta) = \{\mu(t),\ t \in [0,1] : \mu(0) = P_\Gamma,\ \mu(1) = P_\Delta\},$$
which can be computed by solving the following two problems: (a) the correspondence problem, where an explicit mapping between $P_\Gamma$ and $P_\Delta$ is established by specifying two functions $c_\gamma(u): [0,1] \to P_\Gamma$ and $c_\delta(u): [0,1] \to P_\Delta$; (b) the vertex path problem, where we seek the trajectory that connects $c_\gamma(u)$ with $c_\delta(u)$ (see also [15]). If this path is a straight line, it is easy to find examples with self-intersections. The authors of [11] assert that if one moves $c_\gamma(u)$ to $c_\delta(u)$ along the Euclidean shortest path from $c_\gamma(u)$ to $c_\delta(u)$ that avoids $P_\Gamma$ and $P_\Delta$, then all intermediate morphs are guaranteed to be simple, since the shortest paths do not cross each other, although two such paths may share a common sub-path. Hence, in order to solve the digital contour interpolation problem without self-intersections, we seek a method that constructs surfaces from 3D images that contain geodesic paths connecting the digital contours Γ and Δ. The method presented in [2] may give us the opportunity to solve the problem with implicitly defined surfaces, as it possesses this property.

This work was partially supported by ANR grant SURF-NT05-2_45825.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 672–683, 2009. © Springer-Verlag Berlin Heidelberg 2009
2 Preliminaries
In order to segment a given 2D or 3D image $I: \Omega \to \mathbb{R}$, a common approach is to define a Riemannian manifold by means of a potential function $P = P(I): \Omega \to \mathbb{R}$, such that the features of I are captured by P. This, of course, requires an “appropriate” definition of the function P, one that takes into account the nature of the features we aim to follow. More specifically, following the classic work of Kass et al. [16] on 2D image segmentation, the objective is to compute an active contour C(s), s ∈ [0, L], located on the surface P, that minimizes the energy functional
$$E(C) = \int_0^L P(C(s))\,ds. \qquad (1)$$
Towards this aim, Cohen and Kimmel [8] presented a segmentation method that computes the active contour connecting two given points $\mathbf{P}_1$, $\mathbf{P}_2$ on P. The authors show that a globally minimal curve for (1) is obtained by following the opposite gradient direction on the minimal action map $U_{\mathbf{P}_1}(\mathbf{Q})$ (see [18]), which is defined by
$$U_{\mathbf{P}_1}(\mathbf{Q}) = \inf_{C(0)=\mathbf{P}_1,\ C(L)=\mathbf{Q}} \int_0^L P(C(s))\,ds, \qquad \mathbf{Q} \text{ on } P. \qquad (2)$$
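The minimal path C(s) from $\mathbf{P}_1$ to $\mathbf{P}_2$ is then obtained by solving
$$\frac{d\tilde{C}(\sigma)}{d\sigma} = -\nabla U_{\mathbf{P}_1}(\tilde{C}(\sigma)), \qquad \tilde{C}(0) = \mathbf{P}_2, \qquad (3)$$
and setting $C(s) = \tilde{C}(L - s)$.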
According to the analysis in [19], the minimal action map $U_{\mathbf{P}_1}$ is the solution of the eikonal equation
$$\|\nabla U_{\mathbf{P}_1}\| = P, \qquad U_{\mathbf{P}_1}(\mathbf{P}_1) = 0. \qquad (4)$$
An extension of the above results to 3D images is presented in [1]. Given a 3D image I and the corresponding potential P, the Euler-Lagrange equations of the energy functional E in 3D space are
$$\nabla P(C)\cdot\hat{\mathbf{n}} = P(C)\,\kappa \qquad\text{and}\qquad \nabla P(C)\cdot\hat{\mathbf{b}} = 0, \qquad (5)$$
where the vectors $\hat{\mathbf{n}}$, $\hat{\mathbf{b}}$ and the scalar κ denote the normal, the binormal and the curvature of C, respectively. It was proved that if $U_{\mathbf{P}_1}$ is the solution of the eikonal equation (4), then every curve C(s) that is a solution of the ordinary differential equation (3) is also a solution of the Euler-Lagrange equations (5).
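This result paved the way for defining and computing the globally minimal path between a point $\mathbf{P}$ and a curve Γ on the Riemannian manifold P. The minimal action map with respect to Γ and P is defined as the function
$$U_\Gamma(\mathbf{P}) = \min_C E(C) = \min_C \int_0^1 P(C(t))\,\|C'(t)\|\,dt, \qquad (6)$$
where C(t), t ∈ [0, 1], is any curve from the point $\mathbf{P}$ to the curve Γ. Note that, since C ranges over all curves joining $\mathbf{P}$ to Γ, we have $U_\Gamma(\mathbf{P}) = \min_{\mathbf{Q}\in\Gamma} U_{\mathbf{Q}}(\mathbf{P})$, i.e., $U_\Gamma(\mathbf{P})$ equals $U_{\mathbf{Q}}(\mathbf{P})$ for some $\mathbf{Q} \in \Gamma$. Thus, $U_\Gamma$ satisfies the eikonal equation
$$\|\nabla U_\Gamma\| = P, \qquad U_\Gamma(\mathbf{Q}) = 0 \quad \forall \mathbf{Q} \in \Gamma. \qquad (7)$$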
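As an illustration of (6)-(7) and of the back-tracking in (3), the following is a minimal sketch on a 2D pixel grid. It deliberately swaps fast marching for Dijkstra's algorithm on the 4-connected grid graph (edge cost: spacing times the mean potential of the two nodes), which only approximates the eikonal solution and is biased toward axis-aligned paths. The function names, the uniform spacing `h`, and the assumption P > 0 are ours. Seeding every node of Γ with the value zero yields $U_\Gamma = \min_{\mathbf{Q}\in\Gamma} U_{\mathbf{Q}}$ directly, which the test checks.

```python
import heapq
import math

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def minimal_action_map(P, seeds, h=1.0):
    """Dijkstra approximation of U_Gamma on a 2D grid (not fast marching).

    P     -- 2D list of positive potential values P[i][j]
    seeds -- grid nodes of Gamma, where U = 0
    h     -- uniform grid spacing (an assumption of this sketch)
    """
    rows, cols = len(P), len(P[0])
    U = [[math.inf] * cols for _ in range(rows)]
    heap = []
    for i, j in seeds:
        U[i][j] = 0.0
        heap.append((0.0, i, j))
    heapq.heapify(heap)
    while heap:
        u, i, j = heapq.heappop(heap)
        if u > U[i][j]:
            continue  # stale heap entry
        for di, dj in NEIGHBORS:
            a, b = i + di, j + dj
            if 0 <= a < rows and 0 <= b < cols:
                # Edge cost: trapezoidal approximation of the integral of P.
                cand = u + h * 0.5 * (P[i][j] + P[a][b])
                if cand < U[a][b]:
                    U[a][b] = cand
                    heapq.heappush(heap, (cand, a, b))
    return U


def backtrack(U, start):
    """Discrete analogue of (3): step to the neighbor with the smallest U until U = 0."""
    rows, cols = len(U), len(U[0])
    path = [start]
    i, j = start
    while U[i][j] > 0.0:
        nbrs = [(i + di, j + dj) for di, dj in NEIGHBORS
                if 0 <= i + di < rows and 0 <= j + dj < cols]
        i, j = min(nbrs, key=lambda q: U[q[0]][q[1]])
        path.append((i, j))
    return path


if __name__ == "__main__":
    P = [[1.0] * 5 for _ in range(5)]          # constant potential
    U = minimal_action_map(P, seeds=[(0, 0)])
    print(U[4][4])                              # 8.0 (Manhattan length on this graph)
    print(backtrack(U, (4, 4))[-1])             # (0, 0)
```

With a constant potential the graph distance is the Manhattan length, which makes the axis bias of this simplification visible; the fast marching scheme of Section 4 is the consistent alternative.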
Going one step further, let us assume that the point $\mathbf{P}$ belongs to a set Δ. Having solved (7), all the minimal paths from each point in Δ to the curve Γ can be computed using (3). Let us denote this set of paths by $S_\Gamma^\Delta$. If the points of Δ form a curve, then $S_\Gamma^\Delta$ consists of all the minimal paths $C_\Gamma^\Delta(s)$ between the points of the two curves Γ and Δ. In [2], a function Ψ was then defined on the image domain such that its zero level set contains all the paths in $S_\Gamma^\Delta$, i.e., $\Psi(C_\Gamma^\Delta(s)) = 0$. Assuming that Ψ is continuously differentiable, the following necessary condition was obtained:
$$\Psi(C_\Gamma^\Delta(s)) = 0 \;\Longrightarrow\; \nabla\Psi(C_\Gamma^\Delta(s))\cdot\frac{dC_\Gamma^\Delta(s)}{ds} = 0 \;\Longrightarrow\; \nabla\Psi(\mathbf{P})\cdot\nabla U_\Gamma(\mathbf{P}) = 0, \qquad (8)$$
for every point $\mathbf{P} \in S_\Gamma^\Delta$; the last implication holds because, by (3), the tangent of each minimal path is parallel to $\nabla U_\Gamma$. Requiring Ψ to satisfy a relation similar to (8) everywhere in Ω, a sufficient condition for the minimal paths to be contained in {Ψ = 0} is given by the transport equation
$$\nabla\Psi(\mathbf{P})\cdot\nabla U_\Gamma(\mathbf{P}) + G(\Psi(\mathbf{P})) = 0, \qquad \Psi(\mathbf{Q}) = 0 \quad \forall \mathbf{Q}\in\Delta, \qquad (9)$$
where the function G satisfies G(0) = 0 (e.g., $G(\Psi) = \alpha\Psi(\mathbf{P})$). In fact, it was proved that if Ψ satisfies (9), then for all points $\mathbf{P}$ of its zero level set, the minimal path joining $\mathbf{P}$ with the curve Γ is contained in the zero level set of Ψ. This suggests solving equation (7) first and then computing Ψ through (9). Note that equations (7) and (9) can be solved over the nodes of $\Omega_{pqr}$, which discretizes Ω. In this setting the point sets Γ and Δ form two digital contours, which in turn implies that the method essentially establishes an interpolation between the two given digital contours. This allows us to employ it in the digital contour interpolation problem, in which Γ and Δ lie on the parallel planes $z = r_\Gamma$ and $z = r_\Delta$ and no potential function is given. Since the surface Ψ = 0 contains all the minimal paths from the digital contour Γ to the digital contour Δ, we can argue that solving problems (7) and (9) yields an interpolating surface free of self-intersections.
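To see why (9) with the linear choice G(Ψ) = αΨ (the choice used in Section 4) keeps the minimal paths inside {Ψ = 0}, one can follow Ψ along an integral curve $x(\sigma)$ of $dx/d\sigma = \nabla U_\Gamma(x)$, i.e., along a minimal path traversed in the direction opposite to (3). This short characteristic computation is our own illustration of the claim, not a derivation taken from [2]:

```latex
\frac{d}{d\sigma}\,\Psi\big(x(\sigma)\big)
  = \nabla\Psi\big(x(\sigma)\big)\cdot\frac{dx}{d\sigma}
  = \nabla\Psi\cdot\nabla U_\Gamma
  = -\alpha\,\Psi\big(x(\sigma)\big)
\quad\Longrightarrow\quad
\Psi\big(x(\sigma)\big) = \Psi\big(x(0)\big)\,e^{-\alpha\sigma}.
```

Hence Ψ vanishes at one point of such a curve if and only if it vanishes along all of it, so imposing Ψ = 0 on Δ places every path of $S_\Gamma^\Delta$ in the zero level set.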
3 An Artificial Image Potential
The need for an artificial image potential, other than a constant one, can be explained as follows: if P is constant, the induced Riemannian manifold is a hyperplane in $\mathbb{R}^4$. Thus, the minimization of the energy functional (1) leads to a set
of straight lines in $\mathbb{R}^3$ of minimum length, which start from the point set (contour) Δ and end at points of the contour Γ. Suppose now that the contour Γ is translated within the plane $z = r_\Gamma$ until one of its points, $\mathbf{P}$, is closer to every point of the set Δ than any other point of Γ. Then the surface that contains all minimal paths is a cone with apex $\mathbf{P}$ and base Δ, and none of the points of Γ other than $\mathbf{P}$ is interpolated by the surface Ψ = 0. Thus, the problem is to introduce an artificial potential function using only the given information, Γ and Δ. Let us suppose that we are given a matching between the two given point (pixel) sets Γ and Δ. Then we can easily define the set of minimal paths $S_\Gamma^\Delta$ through equation (3) for any potential function P. If P is constant, the minimal paths are the straight lines that connect the points of the two point sets (the pixel centers) according to the assumed matching, so a $C^0$ surface containing all the minimal paths can be a triangular facet surface that interpolates Γ and Δ. The main disadvantage of such a construction is that self-intersections cannot be avoided in general (see [14]). However, since there are interpolation techniques that easily construct triangular facet surfaces interpolating the given point sets, the above remarks suggest that it is preferable to compute the potential P through the construction of such a surface, say S. Since S consists of triangles, it can easily be implicitized on the grid $\Omega_{pqr}$, for example by computing the Euclidean distance function D of S at the grid nodes $\mathbf{P}$, i.e.,
$$D(\mathbf{P}, S) = \min_{\mathbf{S}\in S}\|\mathbf{P} - \mathbf{S}\|, \qquad \forall\,\mathbf{P}\in\Omega. \qquad (10)$$
Then, regardless of the matching chosen between the points of Γ and Δ, if one traverses the minimal path from a point on Δ to some point on Γ and the surface intersects itself, the minimal path is chosen so as to have a common sub-path after the intersection point, thus avoiding self-intersections. This suggests that the surface S could be the Riemannian manifold on which the minimal paths lie, i.e., the unsigned distance D can play the role of the discrete potential P on the discretized image domain $\Omega_{pqr}$.

3.1 Interpolating Two Polygons with $C^0$ Triangular Facet Surfaces
Previous Work. The construction of the surface S can be formulated as follows:

Problem 1. Given the ordered closed planar point sets $P_\Gamma = \{\mathbf{P}_{\Gamma,j} \in \mathbb{E}^3,\ j = 0,\dots,n-1\}$ and $P_\Delta = \{\mathbf{P}_{\Delta,k} \in \mathbb{E}^3,\ k = 0,\dots,m-1\}$, which lie on the planes $z = r_\Gamma$ and $z = r_\Delta$, respectively, construct a $C^0$ surface that interpolates them and consists of triangles with vertices in $P_\Gamma$ and $P_\Delta$.

The total number of such triangulations is $\frac{(n+m)!}{(n-1)!\,(m-1)!}$. Among them, one has to compute the optimal one according to some objective function that quantifies the quality of these triangulations. Evidently, the quality of such a surface depends mainly on the relative twist between the points of the two contours, which is why we refer to the objective function as a twist minimization criterion.
Keppel [17] introduced a representation of all continuous solutions by means of a toroidal graph, i.e., a binary matrix $K_{n\times m}$, whose indices j, k are taken modulo n and m, respectively. If $K_{jk} = 1$, the points $\mathbf{P}_{\Gamma,j}$ and $\mathbf{P}_{\Delta,k}$ are connected. If $K_{jk} = 1$ and $K_{j+1,k} = 1$, the points $\mathbf{P}_{\Gamma,j}$, $\mathbf{P}_{\Gamma,j+1}$ and $\mathbf{P}_{\Delta,k}$ form a triangle (analogously, if $K_{jk} = 1$ and $K_{j,k+1} = 1$, the points $\mathbf{P}_{\Gamma,j}$, $\mathbf{P}_{\Delta,k}$ and $\mathbf{P}_{\Delta,k+1}$ form a triangle). Each triangle arrangement is thus represented by a set of unitary elements in this matrix. Keppel proved that for an acceptable triangulation these elements form a monotone path in the graph, so the optimal surface can be obtained by searching among all monotone paths in the toroidal graph $K_{n\times m}$. The methods for computing such paths fall into two categories: exhaustive search methods (e.g., [17, 13]), which evaluate the final surface according to some global criterion, and methods based on weighted graphs (e.g., [6, 4, 12]), in which a weight is assigned to each graph node and then, starting from the node of least weight, the whole path is computed by choosing at each step, among the neighboring nodes, the one with minimum weight. The methods based on weighted graphs effectively reduce the computational cost, but since they depend on the choice of the nodal weights, they may yield surfaces that do not interpolate all points in $P_\Gamma$ and $P_\Delta$. Our intention is to propose a nodal weight definition that resolves such ambiguities.

Our Method. To introduce our method, let us further restrict ourselves to convex contour data sets. In [6, 12] the weight at the node $K_{jk}$ of the toroidal graph is the length $\|\mathbf{P}_{\Gamma,j} - \mathbf{P}_{\Delta,k}\|$, so by definition the final result depends on the relative position of the sets $P_\Gamma$ and $P_\Delta$. The method of [4] translates the polygons so that their centers, $A_\Gamma$ and $A_\Delta$, coincide. Then the square of the above distance for the translated polygons, expressed in terms of the initial points, is
$$\|(\mathbf{P}_{\Gamma,j}-A_\Gamma)-(\mathbf{P}_{\Delta,k}-A_\Delta)\|^2 = \|\mathbf{P}_{\Gamma,j}-A_\Gamma\|^2 + \|\mathbf{P}_{\Delta,k}-A_\Delta\|^2 - 2\,(\mathbf{P}_{\Gamma,j}-A_\Gamma)\cdot(\mathbf{P}_{\Delta,k}-A_\Delta).$$
Setting $-(\mathbf{P}_{\Gamma,j}-A_\Gamma)\cdot(\mathbf{P}_{\Delta,k}-A_\Delta)$ as the nodal weight, the path is computed by choosing the minimum weight at each step. We propose as weight function the dimensionless quantity
$$-\frac{(\mathbf{P}_{\Gamma,j}-A_\Gamma)\cdot(\mathbf{P}_{\Delta,k}-A_\Delta)}{\|\mathbf{P}_{\Gamma,j}-A_\Gamma\|\,\|\mathbf{P}_{\Delta,k}-A_\Delta\|}, \qquad (11)$$
which is equal to the negative cosine of the angle in [0, π] formed by the vectors $\mathbf{P}_{\Gamma,j}-A_\Gamma$, j = 0, …, n − 1, and $\mathbf{P}_{\Delta,k}-A_\Delta$, k = 0, …, m − 1. Since the cosine is decreasing on [0, π], minimizing the proposed weight is equivalent to minimizing the angle $\phi(\theta_{\Gamma,j}, \theta_{\Delta,k}) \in [0, \pi]$ between the two rays with directions $\mathbf{P}_{\Gamma,j}-A_\Gamma$ and $\mathbf{P}_{\Delta,k}-A_\Delta$, where $\theta_{\Gamma,j}$ denotes the polar angle of the point $\mathbf{P}_{\Gamma,j}$ with respect to a coordinate system whose origin is $A_\Gamma$ (and analogously for $\theta_{\Delta,k}$). We connect the point $\mathbf{P}_{\Gamma,j}$ with the point $\mathbf{P}_{\Delta,k}$ (and, analogously, $\mathbf{P}_{\Delta,k}$ with $\mathbf{P}_{\Gamma,j}$) when the index k (index j) solves the corresponding problem
$$\min_{k=0,\dots,m-1}\phi(\theta_{\Gamma,j},\theta_{\Delta,k}) \qquad\text{and}\qquad \min_{j=0,\dots,n-1}\phi(\theta_{\Gamma,j},\theta_{\Delta,k}). \qquad (12)$$
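The weight (11) and the two one-sided minimizations (12) can be sketched as follows, working in 2D coordinates within each contour plane. This is a minimal sketch under stated assumptions: the text does not say how the centers $A_\Gamma$, $A_\Delta$ are computed, so we take the vertex centroid, and the helper names (`phi`, `weight11`, `match`) are hypothetical. Because −cos is increasing on [0, π], the index minimizing (11) coincides with the index minimizing φ, which the test checks on a square against a pentagon.

```python
import math


def centroid(pts):
    # Assumption: the "center" A is taken to be the vertex centroid.
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)


def polar_angles(pts):
    ax, ay = centroid(pts)
    return [math.atan2(y - ay, x - ax) for x, y in pts]


def phi(tg, td):
    """Angle in [0, pi] between the rays with polar angles tg and td."""
    d = abs(tg - td) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)


def weight11(pg, ag, pd, ad):
    """Dimensionless weight (11): minus the cosine of the angle between the rays."""
    u = (pg[0] - ag[0], pg[1] - ag[1])
    v = (pd[0] - ad[0], pd[1] - ad[1])
    return -(u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))


def match(gamma, delta):
    """Unitary nodes (j, k) of K obtained from the two minimizations in (12)."""
    tg, td = polar_angles(gamma), polar_angles(delta)
    ones = set()
    for j, t in enumerate(tg):                       # best k for every j
        ones.add((j, min(range(len(td)), key=lambda k: phi(t, td[k]))))
    for k, t in enumerate(td):                       # best j for every k
        ones.add((min(range(len(tg)), key=lambda j: phi(tg[j], t)), k))
    return sorted(ones)


if __name__ == "__main__":
    sq = [(1, 1), (-1, 1), (-1, -1), (1, -1)]        # counterclockwise square
    big = [(2 * x + 5, 2 * y + 5) for x, y in sq]    # scaled and translated copy
    print(match(sq, big))                            # [(0, 0), (1, 1), (2, 2), (3, 3)]
```

The scaled and translated copy illustrates the invariance the text claims later for the matching: only polar angles about each center enter φ.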
We set the weight at every node $K_{jk}$ equal to the angle $\phi(\theta_{\Gamma,j}, \theta_{\Delta,k})$. Then $K_{jk} = 1$ for all pairs of points that belong to the set of solutions of problems (12). It is easy to establish that the solution has the following properties (see, e.g., Fig. 1):
i. Every row and every column of the toroidal graph contains at least one unitary node, since for every j we have computed the corresponding index k, and for every k the corresponding index j.
ii. The unitary nodes of the graph are ordered monotonically. To see this, note that for any two connections $\mathbf{P}_{\Gamma,p_1}\mathbf{P}_{\Delta,q_1}$ and $\mathbf{P}_{\Gamma,p_2}\mathbf{P}_{\Delta,q_2}$, every point $\mathbf{P}_{\Gamma,j}$ lying between $\mathbf{P}_{\Gamma,p_1}$ and $\mathbf{P}_{\Gamma,p_2}$ must be connected with a point lying between $\mathbf{P}_{\Delta,q_1}$ and $\mathbf{P}_{\Delta,q_2}$, since both polygons are convex and share the same orientation.
iii. Solving problems (12) does not imply that all the nodes of the monotone path in the graph have been computed: it is possible to be left with the pairs $(p_1, q_1)$ and $(p_1+1, q_1+1)$ but with neither $(p_1, q_1+1)$ nor $(p_1+1, q_1)$.
Fig. 1. Left: The connections between the points of two convex polygons, as obtained by solving the problems (12). Right: The toroidal graph of the connections. The unitary nodes are illustrated by spheres, the computed triangle edges by blue lines and the possible triangle edges by red lines.
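The O(n + m) traversal of the toroidal graph described in the next paragraph can be sketched as follows. This is a minimal sketch, assuming both circular lists have already been rotated so that $K_{00}$ is a solution node and that `weight(j, k)` returns $\phi(\theta_{\Gamma, j \bmod n}, \theta_{\Delta, k \bmod m})$; the names `march` and `triangles` are hypothetical. The second function turns each unit step of the path into a triangle, following Keppel's representation.

```python
import math


def march(n, m, weight):
    """Monotone path from K_00 to K_nm (= K_00 on the torus) in n + m unit steps."""
    j, k = 0, 0
    path = [(0, 0)]
    while (j, k) != (n, m):
        cands = []
        if j < n:
            cands.append((j + 1, k))
        if k < m:
            cands.append((j, k + 1))
        if j < n and k < m:
            cands.append((j + 1, k + 1))
        best = min(cands, key=lambda c: weight(*c))
        if best == (j + 1, k + 1):
            # Property (iii): fill the quadrilateral with the cheaper of the two corners.
            mid = min(((j + 1, k), (j, k + 1)), key=lambda c: weight(*c))
            path.append(mid)
        path.append(best)
        j, k = best
    return path


def triangles(path, n, m):
    """A step in j gives (G_j, G_j+1, D_k); a step in k gives (G_j, D_k, D_k+1).

    "G" labels vertices of the Gamma polygon, "D" those of the Delta polygon.
    """
    tris = []
    for (j0, k0), (j1, k1) in zip(path, path[1:]):
        if j1 == j0 + 1:
            tris.append((("G", j0 % n), ("G", j1 % n), ("D", k0 % m)))
        else:
            tris.append((("G", j0 % n), ("D", k0 % m), ("D", k1 % m)))
    return tris


if __name__ == "__main__":
    tg = [math.pi / 2 * j for j in range(4)]          # square, polar angles
    td = [math.pi / 2 * k + 0.1 for k in range(4)]    # slightly rotated square

    def w(j, k):
        d = abs(tg[j % 4] - td[k % 4]) % (2 * math.pi)
        return min(d, 2 * math.pi - d)

    p = march(4, 4, w)
    print(len(p), len(triangles(p, 4, 4)))            # 9 8
```

Each loop iteration advances j + k by one or two unit steps, so the path always has n + m steps and the surface n + m triangles, which is where the linear bound comes from.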
Interpreted geometrically, these properties mean that, up to this point, we have constructed a surface that interpolates the point sets $P_\Gamma$ and $P_\Delta$ and consists of triangular and quadrilateral patches. The final triangulation is obtained by tracking all the quadrilateral patches (i.e., where property (iii) holds) and triangulating them based on the least nodal weight. Constructing the surface in this way requires O(nm) operations, but this cost can be reduced considerably. To this end, we define the circular lists $L_\Gamma = \{\theta_{\Gamma,j}\}_{j=0}^{n-1}$ and $L_\Delta = \{\theta_{\Delta,k}\}_{k=0}^{m-1}$ of the polar angles of the points of the two initial point sets with respect to their centers. Note that the elements of these lists are in circularly increasing order. We find the element of the list $L_\Gamma$ with the least value and we
set the head of $L_\Gamma$ at its position. Then we compute the index that solves (12) for j = 0 and set the head of $L_\Delta$ at that index (we also reorder the elements of the point sets $P_\Gamma$ and $P_\Delta$ accordingly). Now we know that the element $K_{00}$ of the graph belongs to the set of solutions of the problem. Note that up to this point the operations performed are O(n + m). Now suppose that the node $K_{jk}$ belongs to the solution set of problems (12), i.e., $K_{jk} = 1$. We consider only the possible connections of this node to the nodes $K_{j+1,k}$, $K_{j,k+1}$ and $K_{j+1,k+1}$, knowing that, due to properties (i)-(iii), at least one of them belongs to the solution nodes. Thus, we begin from the node $K_{00}$, which is already computed, and at each step we compare the weights given by the function φ(·, ·) only for these three neighboring nodes. If the node of least weight is $K_{j+1,k+1}$, we also insert into the path the one of the other two nodes that has the least weight. The path computed in this way traverses the nodes of the solution of problems (12), and since exactly (n + m) nodes have to be computed, the complexity of the algorithm is O(n + m). We can now state the following result:

Lemma 1. A $C^0$ triangular facet surface that interpolates any two convex planar polygons with n and m points and satisfies the criteria (12) can be computed in O(n + m) operations. Moreover, the space needed for the whole process is O(n + m).

If one or both polygons are not convex, we can map them onto their convex hulls and apply the algorithm to the transformed polygons. The output of the algorithm is actually a point matching, so the final surface can be constructed by adopting this matching. This technique was first proposed and implemented in [12], but that method increases the computational cost. Alternatively, to eliminate the cost of this mapping, we project all the points of each non-convex segment, $\mathbf{P}_j$, j = S + 1, …, E − 1, onto the corresponding convex hull segment $\mathbf{P}_S\mathbf{P}_E$, according to the rule
$$\mathbf{P}_j' = (1-t_j)\,\mathbf{P}_S + t_j\,\mathbf{P}_E, \qquad t_j = \frac{\sum_{k=S}^{j-1}\|\mathbf{P}_{k+1}-\mathbf{P}_k\|}{\sum_{k=S}^{E-1}\|\mathbf{P}_{k+1}-\mathbf{P}_k\|}.$$
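The projection rule above can be sketched together with the convex hull it relies on. We assume the hull is computed with Melkman's on-line algorithm (the O(n) method cited as [21] in the next paragraph), that the polygon is simple and in general position, and that its hull vertices therefore appear in the same cyclic order along the polygon; the function names are ours. Hull membership is collected with a boolean mask instead of a sort, so the whole pass stays linear.

```python
import math
from collections import deque


def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive when o, a, b turn left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def melkman_hull_indices(poly):
    """Melkman's O(n) hull of a simple polygon; returns hull vertex indices."""
    n = len(poly)
    if n < 3:
        return list(range(n))
    # The deque holds the current hull counterclockwise, with d[0] == d[-1].
    d = deque([2, 0, 1, 2] if cross(poly[0], poly[1], poly[2]) > 0 else [2, 1, 0, 2])
    for i in range(3, n):
        v = poly[i]
        if cross(poly[d[-2]], poly[d[-1]], v) > 0 and cross(poly[d[0]], poly[d[1]], v) > 0:
            continue                                  # v is strictly inside the hull
        while cross(poly[d[-2]], poly[d[-1]], v) <= 0:
            d.pop()
        d.append(i)
        while cross(v, poly[d[0]], poly[d[1]]) <= 0:
            d.popleft()
        d.appendleft(i)
    return list(d)[:-1]


def map_to_hull(poly):
    """Move every non-hull vertex onto its hull chord P_S P_E using chord-length t_j."""
    n = len(poly)
    on_hull = [False] * n
    for i in melkman_hull_indices(poly):
        on_hull[i] = True
    hull = [i for i in range(n) if on_hull[i]]       # polygon order, no sorting needed
    out = list(poly)
    for a, S in enumerate(hull):
        E = hull[(a + 1) % len(hull)]
        if E <= S:
            E += n                                    # this run wraps past index n - 1
        run = [poly[k % n] for k in range(S, E + 1)]
        seg = [math.dist(run[k], run[k + 1]) for k in range(len(run) - 1)]
        total, acc = sum(seg), 0.0
        for j in range(1, len(run) - 1):
            acc += seg[j - 1]
            t = acc / total
            out[(S + j) % n] = ((1 - t) * run[0][0] + t * run[-1][0],
                                (1 - t) * run[0][1] + t * run[-1][1])
    return out


if __name__ == "__main__":
    notched = [(0, 0), (2, 0), (2, 2), (1, 1), (0, 2)]
    print(map_to_hull(notched))   # [(0, 0), (2, 0), (2, 2), (1.0, 2.0), (0, 2)]
```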
Computing the convex hull of a polygon with the algorithm of [21] needs O(n) operations, hence the results of Lemma 1 still hold in the general case of non-convex polygons. It is worth remarking that, although this algorithm has linear complexity, its criterion is not local (in the sense that the same result is obtained by following the exhaustive search procedure), in contrast to all algorithms published to date except the one given in [24], also for convex polygons. Finally, the result, i.e., the point matching, is independent of any translation of the initial data and, moreover, of any isotropic scaling of the initial data sets, so it satisfies the criteria given in [24]. Note also that the whole method emulates the algorithmic procedure proposed in [5].

The Discrete Potential Function. Since the surface S consists of (n + m) triangles, the minimum Euclidean distance (10) from every point of the grid $\Omega_{pqr}$ to S can be found in O(n + m) operations, so the total cost of computing the discrete image potential is O(pqr(n + m)).
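A brute-force sketch of this discrete potential: for each grid node, the unsigned distance (10) to S is the minimum over its triangles of the point-triangle distance, which gives the O(pqr(n + m)) cost stated above. The closest-point routine uses the standard vertex/edge/face region case analysis (as presented, for instance, in Ericson's Real-Time Collision Detection); that choice, the grid spacing `h`, and the function names are our assumptions, not details given in the text.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _lerp(a, d, t):
    return (a[0] + t * d[0], a[1] + t * d[1], a[2] + t * d[2])


def closest_point_on_triangle(p, a, b, c):
    """Closest point to p on triangle abc (vertex, edge, or face region)."""
    ab, ac, ap = _sub(b, a), _sub(c, a), _sub(p, a)
    d1, d2 = _dot(ab, ap), _dot(ac, ap)
    if d1 <= 0 and d2 <= 0:
        return a
    bp = _sub(p, b)
    d3, d4 = _dot(ab, bp), _dot(ac, bp)
    if d3 >= 0 and d4 <= d3:
        return b
    vc = d1 * d4 - d3 * d2
    if vc <= 0 and d1 >= 0 and d3 <= 0:
        return _lerp(a, ab, d1 / (d1 - d3))          # edge ab
    cp = _sub(p, c)
    d5, d6 = _dot(ab, cp), _dot(ac, cp)
    if d6 >= 0 and d5 <= d6:
        return c
    vb = d5 * d2 - d1 * d6
    if vb <= 0 and d2 >= 0 and d6 <= 0:
        return _lerp(a, ac, d2 / (d2 - d6))          # edge ac
    va = d3 * d6 - d5 * d4
    if va <= 0 and (d4 - d3) >= 0 and (d5 - d6) >= 0:
        return _lerp(b, _sub(c, b), (d4 - d3) / ((d4 - d3) + (d5 - d6)))  # edge bc
    denom = 1.0 / (va + vb + vc)                     # interior: barycentric coordinates
    v, w = vb * denom, vc * denom
    return (a[0] + ab[0] * v + ac[0] * w,
            a[1] + ab[1] * v + ac[1] * w,
            a[2] + ab[2] * v + ac[2] * w)


def discrete_potential(shape, triangles, h=1.0):
    """D at every node (i, j, k) of a p x q x r grid: O(pqr * #triangles) work."""
    p, q, r = shape
    D = {}
    for i in range(p):
        for j in range(q):
            for k in range(r):
                x = (i * h, j * h, k * h)
                D[(i, j, k)] = min(math.dist(x, closest_point_on_triangle(x, *t))
                                   for t in triangles)
    return D


if __name__ == "__main__":
    tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
    D = discrete_potential((2, 2, 2), [tri])
    print(round(D[(1, 1, 0)], 4))   # 0.7071
```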
4 Numerical Solution of the Eikonal and the Transport Equation
Both equations (7) and (9) belong to the class of stationary Hamilton-Jacobi equations and will be treated together. The conditions under which the solution of a numerical approximation of a Hamilton-Jacobi equation converges to the so-called viscosity solution can be found in [9] and [10]. In [25, 2], a first-order upwind scheme was employed to solve equation (9). Following them, for G(Ψ) = αΨ the numerical Hamiltonian of (9) can be written as
$$\big(\Psi_x^{i,j,k},\,\Psi_y^{i,j,k},\,\Psi_z^{i,j,k}\big)\cdot\big((U_\Gamma)_x^{i,j,k},\,(U_\Gamma)_y^{i,j,k},\,(U_\Gamma)_z^{i,j,k}\big) + \alpha\,\Psi^{i,j,k} = 0, \qquad (13)$$
where the subscripts x, y and z denote partial differentiation with respect to x, y and z. Approximating the derivatives by biasing the finite-difference stencil toward the direction from which the characteristic information comes, we write the product $\Psi_x^{i,j,k}(U_\Gamma)_x^{i,j,k}$ as
$$\Psi_x^{i,j,k}(U_\Gamma)_x^{i,j,k} = \begin{cases} -(U_\Gamma)_x^{i,j,k}\,\dfrac{\Psi^{i,j,k}-\Psi^{i+1,j,k}}{\Delta x}, & \text{if } (U_\Gamma)_x^{i,j,k} < 0,\\[6pt] (U_\Gamma)_x^{i,j,k}\,\dfrac{\Psi^{i,j,k}-\Psi^{i-1,j,k}}{\Delta x}, & \text{if } (U_\Gamma)_x^{i,j,k} > 0,\end{cases}$$
or, equivalently,
$$\Psi_x^{i,j,k}(U_\Gamma)_x^{i,j,k} = \big|(U_\Gamma)_x^{i,j,k}\big|\,\frac{\Psi^{i,j,k}-\Psi^{I,j,k}}{\Delta x}, \qquad I = \begin{cases} i+1, & \text{if } (U_\Gamma)_x^{i,j,k} < 0,\\ i-1, & \text{if } (U_\Gamma)_x^{i,j,k} > 0.\end{cases} \qquad (14)$$
Applying the above to (13) and solving for $\Psi^{i,j,k}$, we obtain
$$\Psi^{i,j,k} = \frac{\Psi^{I,j,k}\,\dfrac{|(U_\Gamma)_x^{i,j,k}|}{\Delta x} + \Psi^{i,J,k}\,\dfrac{|(U_\Gamma)_y^{i,j,k}|}{\Delta y} + \Psi^{i,j,K}\,\dfrac{|(U_\Gamma)_z^{i,j,k}|}{\Delta z}}{\dfrac{|(U_\Gamma)_x^{i,j,k}|}{\Delta x} + \dfrac{|(U_\Gamma)_y^{i,j,k}|}{\Delta y} + \dfrac{|(U_\Gamma)_z^{i,j,k}|}{\Delta z} + \alpha} \qquad (15)$$
for i = 0, …, p − 1, j = 0, …, q − 1 and k = 0, …, r − 1, with I, J and K defined as in (14) according to the signs of the nodal derivatives of $U_\Gamma$ with respect to x, y and z, respectively. For the eikonal equation (7), the scheme proposed by Rouy & Tourin [22],
$$\sum_{X\in\{x,y,z\}}\max\Big(\max\big((U_\Gamma)_{-X}^{i,j,k},\,0\big),\;-\min\big((U_\Gamma)_{+X}^{i,j,k},\,0\big)\Big)^2 = \big(P^{i,j,k}\big)^2, \qquad (16)$$
leads to a quadratic equation in $(U_\Gamma)^{i,j,k}$, where $(U_\Gamma)_{-X}^{i,j,k}$ and $(U_\Gamma)_{+X}^{i,j,k}$ denote the backward and forward difference quotients in the direction X. Both equations (15) and (16) can be solved iteratively, by updating their grid values until they converge to some predefined accuracy. A far more efficient approach is the so-called fast marching method, introduced by Sethian [23] for the eikonal equation (16). Realizing that the solution of the eikonal equation represents the distance
map on the (hyper)surface P from the boundary curve Γ (see [19] and [7]), it is to be expected that the information propagates from the smaller values, near the boundary Γ, to the larger ones as we move away from it. In other words, since the characteristics of the eikonal equation emanate from the boundary Γ (for a constant potential they are straight lines, see [20]), the numerical solution can be built “outwards” from the smallest values, as Sethian pointed out. The idea is to sweep the front ahead by considering a set of points in a narrow band around the existing front, and to march this narrow band forward, freezing the values of existing points and bringing new ones into the narrow band structure. The key is the choice of which grid point in the narrow band to update: the point with the smallest value (i.e., the closest to the already calculated points) cannot be affected by the other points next to it, so its value must be correct. Returning to the discrete transport equation (15), an extremely fast convergence can be achieved by visiting the points in the order in which they are reached by the characteristic curves, analogously to the fast marching method for the eikonal equation (see [25, 2]). Considering the characteristics of equation (9), the absolute values of $\Psi^{i,j,k}$ decrease as we move from the boundary to the zero level set of Ψ, provided that the coefficient α is greater than zero. Thus, in each step we update the values of $\Psi^{i,j,k}$ on a narrow band of nodes via equation (15), using the values that have already been calculated (solved), starting from the boundary of the domain. Then we consider as solved the point whose value is closest to the solved points, i.e., the one with the maximum absolute value in the narrow band.
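A compact 2D sketch of the two building blocks discussed above: a fast marching solver for the eikonal equation, whose node update solves the Rouy-Tourin quadratic (16) using only already frozen neighbors, and the single-node transport update (15), with the upwind neighbors chosen as in (14). The restriction to 2D, the uniform spacing `h`, and all function names are our assumptions; the narrow-band ordering of Ψ by decreasing |Ψ| is not implemented here.

```python
import heapq
import math


def eikonal_update(a, f, h):
    """Solve sum_X max(U - a_X, 0)^2 = (f h)^2, i.e. (16) with equal spacing h.

    a -- per-axis minimum of the frozen neighbor values (math.inf if none)
    """
    a = sorted(x for x in a if math.isfinite(x))
    u = a[0] + f * h                          # only one axis active
    for m in range(2, len(a) + 1):
        if u <= a[m - 1]:
            break                             # the next axis is not upwind
        s1, s2 = sum(a[:m]), sum(x * x for x in a[:m])
        u = (s1 + math.sqrt(s1 * s1 - m * (s2 - (f * h) ** 2))) / m
    return u


def fast_marching(P, seeds, h=1.0):
    rows, cols = len(P), len(P[0])
    U = [[math.inf] * cols for _ in range(rows)]
    frozen = [[False] * cols for _ in range(rows)]

    def known(i, j):
        inside = 0 <= i < rows and 0 <= j < cols
        return U[i][j] if inside and frozen[i][j] else math.inf

    heap = [(0.0, i, j) for i, j in seeds]
    for _, i, j in heap:
        U[i][j] = 0.0
    heapq.heapify(heap)
    while heap:
        _, i, j = heapq.heappop(heap)
        if frozen[i][j]:
            continue
        frozen[i][j] = True                   # smallest narrow-band value is final
        for a, b in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if 0 <= a < rows and 0 <= b < cols and not frozen[a][b]:
                axes = [min(known(a - 1, b), known(a + 1, b)),
                        min(known(a, b - 1), known(a, b + 1))]
                cand = eikonal_update(axes, P[a][b], h)
                if cand < U[a][b]:
                    U[a][b] = cand
                    heapq.heappush(heap, (cand, a, b))
    return U


def transport_update(grad_u, upwind_psi, spacing, alpha):
    """Node update (15): grad_u[X] = (U_Gamma)_X at the node, upwind_psi[X] = Psi
    at the neighbor chosen by (14) (i+1 if grad_u[X] < 0, i-1 if > 0)."""
    w = [abs(g) / d for g, d in zip(grad_u, spacing)]
    return sum(wi * psi for wi, psi in zip(w, upwind_psi)) / (sum(w) + alpha)


if __name__ == "__main__":
    P = [[1.0] * 11 for _ in range(11)]
    U = fast_marching(P, [(5, 5)])
    print(U[5][10], round(U[4][4], 4))        # 5.0 1.7071
```

The diagonal value 1.7071 (against the exact 1.4142) shows the first-order overestimate typical of this upwind scheme on a coarse grid.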
Regarding the boundary conditions, since we are concerned only with the zero level set of Ψ and the condition Ψ = 0 on Γ, following [2] we define the closed set
$$V_\Gamma^\eta = \{\mathbf{P}\in\Omega_{pqr} : D(\mathbf{P},\Gamma)\le\eta\},$$
where η is a positive real value. We set Ψ equal to the signed distance between $\mathbf{P}$ and Γ on the nodes of $V_\Gamma^\eta\cap\Omega_{pqr}$, and equal to $\pm\min D(\mathbf{P},\Gamma)$, with the minimum taken over the nodes $\mathbf{P}\in\Omega_{pqr}$ outside $V_\Gamma^\eta$, on the rest of the boundary nodes of $\Omega_{pqr}$, choosing the negative sign for the nodes exterior to Γ and the positive sign for those interior to Γ. Note also that Γ can be on the boundary of Ω, while Δ must lie entirely inside Ω. Numerical experiments have shown that visually acceptable results can be achieved if we extend the grid $\Omega_{pqr}$ in the z direction so that Δ lies in the middle z-plane. Finally, we remark that the algorithm yields a different solution if we compute the surface containing the minimal paths $S_\Delta^\Gamma$ instead of the one containing $S_\Gamma^\Delta$. The authors of [3] remove this asymmetry by exploiting both minimal action maps $U_\Gamma$ and $U_\Delta$, the latter defined analogously to $U_\Gamma$.
5 Examples
In what follows, the digital contours Γ and Δ lie on the planes $z = r_\Gamma$ and $z = r_\Delta$, with $r_\Gamma < r_\Delta$, and the coefficient α in equation (15) is set to 0.1. The grids are relatively coarse, ranging from 50 to 70 nodes in the x and y directions and 20 nodes in the z direction. The first example (see Fig. 2) can be characterized as a “simple case”, in which the triangular facet surface has no self-intersections. In the example shown in Figs. 3-4, the triangular surface has a widely spread self-intersection region, because the interpolated contours are far from convex; the method nevertheless yields a surface with no self-intersections. The third example (see Fig. 5) interpolates two contours with U- and S-like shapes. It shows that “morphing” cannot always be achieved: in some cases the resulting surface, although free of self-intersections, appears to have holes, i.e., disconnected cross sections in the area where the triangular surface self-intersects. This means that this particular image potential function forces the minimal paths to go around the self-intersection area, thus generating a hole in the surface.

Fig. 2. Ex. 1: The $C^0$ triangular facet surface and the implicit surface Ψ = 0
Fig. 3. Ex. 2: The $C^0$ triangular facet surface and the implicit surface Ψ = 0
Fig. 4. Ex. 2: Intermediate slices of the implicit surface, from the contour Γ to Δ
Fig. 5. Ex. 3: The $C^0$ triangular facet surface and the implicit surface Ψ = 0
6 Conclusions
We presented an implicit method to interpolate two digital contours on parallel planes, employing the 3D image segmentation technique of [2]. To guarantee that the voxels of both contours are always interpolated, we introduced an artificial potential function. To this end, we developed an interpolation method that matches all the pixel centers through a $C^0$ triangular facet surface, and set the potential function to be the Euclidean distance to this surface. The method produces surfaces without self-intersections. However, when the polygons connecting the contour voxel centers are far from convex, it cannot always produce morphs that preserve the connectedness of the given contours in each intermediate slice. This raises the question of how the potential function can be improved so as to reliably achieve an acceptable morphing between Γ and Δ, which remains open. The idea behind this work was to set up processes for interpolating sets of pixels/voxels by following minimal paths on appropriately defined manifolds. This idea appears fruitful and may pave the way to solving even more difficult interpolation problems in the future.
References
1. Ardon, R., Cohen, L.D.: Fast constrained surface extraction by minimal paths. Inter. J. of Computer Vision 69, 127–136 (2006)
2. Ardon, R., Cohen, L.D., Yezzi, A.: A new implicit method for surface segmentation by minimal paths in 3D images. Appl. Math. Optim. 55, 127–144 (2007)
3. Ardon, R., Cohen, L.D., Yezzi, A.: Fast surface segmentation guided by user input using implicit extension of minimal paths. J. of Math. Imaging & Vision 25, 289–305 (2006)
4. Batnitzki, S., et al.: Three-dimensional computer reconstruction from surface contours for head CT examinations. J. of Comp. Assist. Tomogr. 5, 60–67 (1981)
5. Choi, Y.-K., Park, K.-H.: A heuristic triangulation algorithm for multiple planar contours using an extended double branching procedure. Visual Computer 10, 372–387 (1994)
6. Christiansen, H.N., Sederberg, T.W.: Conversion of complex contour lines into polygonal element mosaics. In: Phillips, R.L. (ed.) Computer Graphics (SIGGRAPH 1978), vol. 12, pp. 187–192 (1978)
7. Cohen, L.D.: Minimal paths and fast marching methods for Image Analysis. In: Paragios, N., Chen, Y., Faugeras, O. (eds.) Mathematical Models in Computer Vision: The Handbook, pp. 97–111. Springer, Heidelberg (2005)
8. Cohen, L.D., Kimmel, R.: Global minimum for active contour models: A minimal path approach. Inter. J. of Computer Vision 24, 57–78 (1997)
9. Crandall, M., Lions, P.L.: Viscosity solutions of Hamilton-Jacobi equations. Trans. Amer. Math. Soc. 277, 1–42 (1983)
10. Crandall, M., Lions, P.L.: Two approximations of solutions of Hamilton-Jacobi equations. Math. of Comp. 43, 1–19 (1984)
11. Efrat, A., Har-Peled, S., Guibas, L., Murali, T.: Morphing between Polylines. In: Proc. 12th Ann. ACM-SIAM Symp. on Discr. Alg. (SODA 2001), pp. 680–689 (2001)
12. Ekoule, A.B., Peyrin, F.C., Odet, C.L.: A triangulation algorithm from arbitrary shaped multiple planar contours. ACM Trans. on Graph. 10, 182–199 (1991)
13. Fuchs, H., Kedem, Z.M., Uselton, S.P.: Optimal surface reconstruction from planar contours. Commun. ACM 20, 693–702 (1977)
14. Gitlin, G., O’Rourke, J., Subramanian, V.: On reconstructing polyhedra from parallel slices. Intern. J. of Comp. Geom. & Appl. 6, 103–122 (1996)
15. Hahmann, S., Bonneau, G.-P., Caramiaux, B., Cornillac, M.: Multiresolution morphing of planar curves. Computing 79, 197–209 (2007)
16. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active Contour Models. Intern. J. of Computer Vision 1, 321–331 (1988)
17. Keppel, E.: Approximating complex surfaces by triangulation of contour lines. IBM J. Res. Devel. 19, 2–11 (1975)
18. Kimmel, R., Amir, A., Bruckstein, A.: Finding shortest paths on surfaces using level sets propagation. IEEE Trans. Pat. Anal. Mach. Int. 17, 635–640 (1995)
19. Kimmel, R., Kiryati, N., Bruckstein, A.: Sub-pixel distance map and weighted distance transforms. J. of Math. Imaging & Vision 6, 223–233 (1996)
20. Mauch, S.: Efficient Algorithms for Solving Static Hamilton-Jacobi Equations. Doctoral Thesis, California Institute of Technology, Pasadena, California (2003)
21. Melkman, A.: On-line construction of the convex hull of a simple polygon. Inform. Proc. Letters 25, 11–12 (1987)
22. Rouy, E., Tourin, A.: A viscosity solutions approach to shape-from-shading. SIAM J. Numer. Anal. 29, 867–884 (1992)
23. Sethian, J.: A fast marching level set method for monotonically advancing fronts. Proc. Natl. Acad. Sci. USA 93, 1591–1595 (1996)
24. Welzl, E., Wolfers, B.: Surface reconstruction between simple polygons. In: Lengauer, T. (ed.) ESA 1993. LNCS, vol. 726, pp. 397–408. Springer, Heidelberg (1993)
25. Yezzi, A., Prince, J.: An Eulerian PDE approach for computing tissue thickness. IEEE Trans. on Medical Imaging 22, 1332–1339 (2003)
Pose Invariant Shape Prior Segmentation Using Continuous Cuts and Gradient Descent on Lie Groups

Niels Chr. Overgaard, Ketut Fundana, and Anders Heyden

Applied Mathematics Group, Malmö University, Sweden
{nco,ketut.fundana,anders.heyden}@mah.se
Abstract. This paper proposes a novel formulation of the Chan-Vese model for pose invariant shape prior segmentation as a continuous cut problem. The model is based on the classic L² shape dissimilarity measure, with pose invariance under the full (Lie) group of similarity transforms in the plane. To overcome the common numerical problems associated with step size control for translation, rotation and scaling in the discretization of the pose model, a new gradient descent procedure for the pose estimation is introduced. This procedure is based on the construction of a Riemannian structure on the group of transformations and a derivation of the corresponding pose energy gradient. Numerically, this amounts to an adaptive step size selection in the discretization of the gradient descent equations. Together with efficient numerics for TV-minimization, we get a fast and reliable implementation of the model. Moreover, the theory introduced is generic and reliable enough for application to more general segmentation and shape models.
1 Introduction
The celebrated model of T. Chan and L. Vese [1] for piecewise constant, two-phase segmentation of a gray scale image $I : \Omega \to \mathbb{R}_+$ can be formulated as follows: among all characteristic functions $u = 1_\Sigma$ of measurable sets Σ contained in the bounded (image) domain $\Omega \subset \mathbb{R}^2$, and all pairs of real numbers $c = (c_0, c_1)$, find $u^* = 1_{\Sigma^*}$, $c^* = (c_0^*, c_1^*)$ which minimize the energy

$$E_{CV}(u, c) = J(u) + \frac{\lambda}{2}\Big(\langle 1 - u, (I - c_0)^2\rangle + \langle u, (I - c_1)^2\rangle\Big), \qquad (1)$$

where λ > 0 is a fixed weight, $J(u) = \int_\Omega |\nabla u|\,dx$ is the total variation of u, and $\langle u, v\rangle = \int_\Omega uv\,dx$ is the L² inner product between u and v. Recall that for $u = 1_\Sigma$, $J(u) = \mathrm{Per}(\Sigma)$, the perimeter (in Ω) of Σ, i.e. the length of the boundary $\Gamma = \partial\Sigma$ in Ω. Traditionally, and originally [1], minimization of (1) was formulated in the level set framework of Osher and Sethian [2, 3, 4] by setting $u = H(\phi)$, where H denotes the Heaviside function and $\phi : \Omega \to \mathbb{R}$ is an embedding function used to represent the image object implicitly as $\Sigma = \{x \in \Omega \,;\, \phi(x) > 0\}$. This highly non-linear optimization problem is solved using gradient descent, which, in the level set framework, corresponds to the following evolution PDE for the active contour $\Gamma(t) := \partial\Sigma(t) = \{x \in \Omega \,;\, \phi(x, t) = 0\}$:

$$\frac{\partial\phi}{\partial t} = \Big[\operatorname{div}\Big(\frac{\nabla\phi}{|\nabla\phi|}\Big) + \frac{\lambda}{2}\big((I - c_0)^2 - (I - c_1)^2\big)\Big]|\nabla\phi|,$$

where t is an artificial time parameter and $\phi = \phi(x, t)$ a time-dependent level set function. At every instant in this evolution, the gray value estimates $c_0, c_1$ are updated according to

$$c_0 = c_0(u) = \frac{\langle 1 - u, I\rangle}{\langle 1 - u, 1\rangle} \quad\text{and}\quad c_1 = c_1(u) = \frac{\langle u, I\rangle}{\langle u, 1\rangle}. \qquad (2)$$

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 684–695, 2009. © Springer-Verlag Berlin Heidelberg 2009
One of the most inspiring discoveries in recent years, due to Chan, Esedoḡlu and Nikolova [5], is that, for any fixed c, the minimization of (1) with respect to binary label functions u may be solved exactly by considering a convex relaxation of the problem, where the set of admissible u's is enlarged to

$$K := \{u \in BV(\Omega) \,;\, 0 \le u(x) \le 1 \text{ for all } x \in \Omega\}. \qquad (3)$$

In fact, it was shown in [5] that if u ∈ K minimizes (1), then for almost all thresholds t ∈ (0, 1) the function

$$u^t(x) = \begin{cases} 1 & \text{if } u(x) > t, \\ 0 & \text{otherwise} \end{cases} \qquad (x \in \Omega), \qquad (4)$$

is a global minimizer of the original problem. The proof is recalled in Section 2.1. Thus, global minimizers of the Chan-Vese model can be found by truncation of the solution to an easier, unilaterally constrained, convex variational problem. The use of this truncation property is referred to as the continuous (graph) cut method, and problems formulated in this manner can be solved efficiently using fast algorithms for TV-minimization; see, e.g., Chambolle [6].

The problem of including a priori shape information into the segmentation process has been studied extensively within the level set framework for the last decade or so [7, 8, 9, 10, 11]. The common approach is to include an interaction energy between the object Σ and a prior shape Σ′ into the segmentation functional. If f denotes the characteristic function of the prior shape Σ′, then a typical shape prior segmentation functional looks like

$$E(u, c, f) = E_{CV}(u, c) + \frac{\gamma}{2}\|u - f\|^2, \qquad (5)$$

where γ > 0 is a fixed coupling constant for the interaction, and $\|u\| = \langle u, u\rangle^{1/2}$ is the L² norm. The shape interaction in (5) may be interpreted geometrically as $\|u - f\|^2 = \operatorname{area}(\Sigma \,\triangle\, \Sigma')$, i.e. the area of the symmetric set difference between the sets Σ and Σ′; cf. [10] and [11]. The segmentation is now obtained by minimization of the functional (5) with respect to the (binary) label functions
$u$, gray values $c$, and $f \in \mathcal{F}$, where $\mathcal{F}$ denotes a class of prescribed shape priors. This formulation is quite general. A specific example, considered in this paper, is segmentation with pose invariant priors. In this case $\mathcal{F} = \{f = f_0 \circ T\}_{T \in G}$, where the binary function $f_0$ is a shape template and T ranges over a group of transformations G, e.g. the group of similarity transformations. Since continuous cuts have emerged as an alternative to level sets for minimization of the CV and other segmentation models, it is natural to ask whether known shape prior segmentation models can be reformulated as variational problems possessing the important truncation property, which allows them to be solved using TV-minimization algorithms. One such attempt has been made in [12] (see Section 2.2), but it does not go all the way. The purpose of the present paper is to formulate the shape prior segmentation model (5) as a continuous cut problem. This is achieved by reformulating the problem as a CV model (see Section 3.1). We specifically consider shape priors which are pose invariant under the group of similarity transforms, which involves optimization over a Lie group. In order to solve this problem efficiently and reliably, we develop a theory for gradient descent on Lie groups (Section 3.3). The problem here is, essentially, to construct a Riemannian structure on the Lie group. The new theory eliminates the problems associated with step-size selection in discretizations of the gradient descent ODEs usually encountered in segmentation models with pose estimation.
2 Background

2.1 Relaxation in the CV Model
In this section we briefly describe the theory behind the continuous cut solution for the CV model and its connection to the ROF denoising model and TV-minimization. Let us consider the minimization of (1) over the set of label functions u ∈ K defined in (3), and gray values $c \in \mathbb{R}^2$. In this setting $E_{CV}$ is a bi-convex functional, that is, convex in each of its arguments u and c separately when the other is kept fixed. However, $E_{CV}$ is not jointly convex. One therefore uses a method, referred to in this paper as the CV-algorithm, which alternates between optimization in u and c: if an initial state $(u^0, c^0)$ is given, then a minimizing sequence $(u^k, c^k)$ is constructed by

$$u^{k+1} = \arg\min_{u \in K} E_{CV}(u, c^k), \qquad (6)$$

$$c^{k+1} = \arg\min_{c \in \mathbb{R}^2} E_{CV}(u^{k+1}, c). \qquad (7)$$
The sub-problem (7) is a simple quadratic optimization whose solution is readily given by the formulas in (2) with $u = u^{k+1}$. We therefore proceed to describe the theory and algorithms for the continuous cut solution of the sub-problem (6). If c is fixed, then the minimization of (1) over K is equivalent to minimization over K of the energy
$$\hat E(u) = J(u) + \frac{\lambda}{2}\big\langle (I - c_1)^2 - (I - c_0)^2,\, u\big\rangle := J(u) + \langle g, u\rangle, \qquad (8)$$
where $g = (\lambda/2)[(I - c_1)^2 - (I - c_0)^2]$ is the data term. We now prove the result of Chan et al. [5] referred to in the Introduction, that minimization of $\hat E$ over binary u's can be obtained from the solution of the convex variational problem $\inf_{u \in K}\hat E(u)$ by truncation. For $u \in BV(\Omega)$, let $u^t$ denote the function defined in (4). We recall:

The Truncation Lemma. If u ∈ K solves $\inf_K \hat E$, then so does $u^t$ for almost all t ∈ [0, 1].

Proof. The coarea formula, $J(u) = \int_0^1 J(u^t)\,dt$, and the layer cake representation, $\langle g, u\rangle = \int_0^1 \langle g, u^t\rangle\,dt$, together yield $\hat E(u) = \int_0^1 \hat E(u^t)\,dt$. Since $u^t \in K$, it is admissible, and $\hat E(u^t) \ge \hat E(u)$ for all t by assumption. Hence the integrand on the left-hand side of $\int_0^1 \big(\hat E(u^t) - \hat E(u)\big)\,dt = 0$ is nonnegative, and it must be zero for almost all t ∈ [0, 1].
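The identity at the heart of the proof, $\hat E(u) = \int_0^1 \hat E(u^t)\,dt$, can be checked numerically. The following Python/NumPy sketch is illustrative only, not the authors' code. It assumes a discrete anisotropic total variation (a sum of absolute forward differences), for which the discrete coarea formula holds exactly, and uses a random array as a hypothetical stand-in for the data term g of (8).

```python
import numpy as np

def tv_aniso(u):
    # Discrete anisotropic total variation (an assumption; the paper works with the continuous J).
    return np.abs(np.diff(u, axis=0)).sum() + np.abs(np.diff(u, axis=1)).sum()

def energy(u, g):
    # Discrete counterpart of E_hat(u) = J(u) + <g, u> from (8).
    return tv_aniso(u) + (g * u).sum()

def integrated_truncation_energy(u, g):
    """Exact value of int_0^1 E_hat(u^t) dt for u with values in [0, 1].

    u^t = 1{u > t} only changes when t crosses a value taken by u, so the
    integral reduces to a weighted sum over the gaps between those values.
    """
    levels = np.unique(np.concatenate(([0.0, 1.0], u.ravel())))
    total = 0.0
    for lo, hi in zip(levels[:-1], levels[1:]):
        u_t = (u >= hi).astype(float)  # equals 1{u > t} for every t in (lo, hi)
        total += (hi - lo) * energy(u_t, g)
    return total

rng = np.random.default_rng(0)
u = rng.random((16, 16))            # a relaxed label function in K
g = rng.standard_normal((16, 16))   # hypothetical data term
print(np.isclose(energy(u, g), integrated_truncation_energy(u, g)))  # True
```

For a binary u, there is a single gap (0, 1) and the two sides coincide trivially. For a relaxed u, the equality is exactly what forces almost every truncation of a minimizer to be a minimizer as well.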
In Chan et al. [5], the minimum was approximated by solving a degenerate parabolic PDE for u (the gradient descent PDE) with an exact penalty term to ensure that the constraint 0 ≤ u ≤ 1 is satisfied at all times. This PDE was implemented with an explicit finite difference scheme, and is therefore rather slow. We have chosen another method, introduced by Aujol and Chambolle [13] and used successfully by Bresson et al. [14, Sect. 3.2]. It consists of minimizing a variant of (8) which has been regularized slightly by infimal convolution with a quadratic function:

$$\inf_{v \in BV,\ u \in K}\; J(v) + \frac{1}{2\theta}\|v - u\|^2 + \langle g, u\rangle, \qquad (9)$$
where θ > 0 is a parameter that is sent to 0. For θ fixed, the problem is solved iteratively using what we call the ABC-algorithm: if $(v^0, u^0)$ denotes an initial guess, then a minimizing sequence is given by the pairs $(v^n, u^n)$, where

$$v^{n+1} = \arg\min_{v \in BV}\; J(v) + \frac{1}{2\theta}\|v - u^n\|^2 = u^n - \theta\,\mathrm{Pr}_C(u^n/\theta), \qquad (10)$$

$$u^{n+1} = \arg\min_{u \in K}\; \frac{1}{2\theta}\|v^{n+1} - u\|^2 + \langle g, u\rangle = \mathrm{Pr}_K(v^{n+1} - \theta g). \qquad (11)$$
The first of these problems is the classical Rudin-Osher-Fatemi (ROF) image denoising model [15] with $u^n$ as input image. The second one is a simple L² optimization. Both problems are strictly convex and thus admit unique solutions. As indicated, their optima can be expressed in terms of L²-projections onto closed convex sets. The first projection is onto C, the L²-closure of the set $\{\operatorname{div}\xi \,;\, \xi \in C^1(\Omega; \mathbb{R}^2),\ |\xi(x)| \le 1\ \forall x \in \Omega\}$; cf. Chambolle [6]. The second projection is onto K, defined above. The latter is easy to compute: $\mathrm{Pr}_K f(x) = \min(1, \max(0, f(x)))$ for x ∈ Ω, for any square-integrable function $f : \Omega \to \mathbb{R}$.
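One way to organize the steps (10), (11) and the gray-value update (2) is sketched below in Python/NumPy. This is not the authors' MATLAB code, and the function names are illustrative. The sketch rests on the following assumptions:

- periodic boundary conditions, as in the experiments of Section 4;
- Chambolle's projection iteration for (10), with step τ = 1/8 and a warm-started dual field;
- a simplified schedule that interleaves one ABC iteration with one update (2), rather than solving (6) to convergence before each c-update.

```python
import numpy as np

def grad(u):
    # Forward differences with periodic boundary conditions.
    return np.roll(u, -1, axis=0) - u, np.roll(u, -1, axis=1) - u

def div(p1, p2):
    # Backward differences: div = -grad^T on the periodic grid.
    return (p1 - np.roll(p1, 1, axis=0)) + (p2 - np.roll(p2, 1, axis=1))

def rof_step(f, theta, p1, p2, n_iter=20, tau=0.125):
    """Approximate (10): argmin_v J(v) + ||v - f||^2 / (2 theta).

    Chambolle's fixed-point iteration on the dual field p gives
    v = f - theta * div(p), i.e. f - theta * Pr_C(f / theta).
    """
    for _ in range(n_iter):
        h1, h2 = grad(div(p1, p2) - f / theta)
        scale = 1.0 + tau * np.sqrt(h1 ** 2 + h2 ** 2)
        p1, p2 = (p1 + tau * h1) / scale, (p2 + tau * h2) / scale
    return f - theta * div(p1, p2), p1, p2

def gray_values(u, I):
    # Formula (2), i.e. the c-update (7) of the CV-algorithm.
    return ((1 - u) * I).sum() / (1 - u).sum(), (u * I).sum() / u.sum()

def cv_continuous_cut(I, lam, c, theta=0.5, n_outer=50):
    u = np.full(I.shape, 0.5)
    p1, p2 = np.zeros_like(I), np.zeros_like(I)
    for _ in range(n_outer):
        g = 0.5 * lam * ((I - c[1]) ** 2 - (I - c[0]) ** 2)  # data term of (8)
        v, p1, p2 = rof_step(u, theta, p1, p2)               # ABC step (10)
        u = np.clip(v - theta * g, 0.0, 1.0)                 # ABC step (11): Pr_K
        c = gray_values(u, I)                                # CV step (7)
    return u, c

I = np.zeros((64, 64))
I[20:40, 24:44] = 1.0                         # synthetic two-phase image
u, c = cv_continuous_cut(I, lam=8.0, c=(0.2, 0.8))
segmentation = u > 0.5                        # truncation (4) at t = 0.5
```

On this noise-free test image, the sketch recovers the square and gray values c = (0, 1). Because u is kept in K by the clipping step, any threshold in (4) yields a binary label function.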
To minimize the ROF functional (10) we use a variant of the fast and reliable algorithm for TV-minimization proposed by Chambolle [6, 16].

2.2 The Algorithm of Fundana and Co-workers
A recent paper by Fundana et al. [12] contains what is probably the first attempt to include shape priors into continuous cut segmentation. The authors consider the model (5), where $f = f_0 \circ T$ is pose invariant under the group of similarity transformations T of the plane, i.e. the variational problem

$$\inf_{u, c, T}\; E(u, c, T) := E_{CV}(u, c) + \frac{\gamma}{2}\|u - f_0 \circ T\|^2. \qquad (12)$$
This problem cannot be solved by continuous cuts (for c and T fixed) simply by enlarging the admissible label functions from the binary u's to u ∈ K. The problem, of course, lies in the quadratic interaction term, which seems to "spoil" the Truncation Lemma. In [12] this problem is cleverly circumvented by the following construction: if $(u^0, c^0, T^0)$ denotes an initial guess, then a minimizing sequence $(u^k, c^k, T^k)$ is (essentially) constructed by the following procedure:

$$c^{k+1} = c(u^k) \quad \text{using formula (2)}, \qquad (13)$$

$$T^{k+1} = T^k - \Delta t\,\frac{\partial E}{\partial T}(u^k, c^{k+1}, T^k), \quad \text{time step } \Delta t > 0, \qquad (14)$$

$$u^{k+1} = \arg\min_{u \in K}\; E_{CV}(u, c^{k+1}) + \frac{\gamma}{2}\big\langle u - f_0 \circ T^{k+1},\, u^k - f_0 \circ T^{k+1}\big\rangle. \qquad (15)$$
Here we observe that by "freezing" one occurrence of $u = u^k$ in the quadratic interaction term, the interaction term in the update step (15) becomes linear in u. The update therefore has the form (8), which makes it solvable by continuous cut methods. In [12] this minimization was performed using the gradient descent PDE from [5]. Our aim is to improve the above method by formulating the problem in such a way that the model itself, not only the algorithm, satisfies the truncation property.
3 The Shape Prior Segmentation Model

3.1 The Basic Energy Functional
Our reformulation of the functional (5) is based on the following observation. Suppose the label function $u : \Omega \to \{0, 1\}$ is binary, and define an image model by $I_{model} = I_{model}(u, c) = c_0(1 - u) + c_1 u$. Then $(I - I_{model})^2 = (1 - u)(I - c_0)^2 + u(I - c_1)^2$ pointwise, so the CV-functional (1) may be rewritten as

$$E_{CV}(u, c) = J(u) + \frac{\lambda}{2}\|I - I_{model}\|^2. \qquad (16)$$
This suggests the following model for shape prior segmentation: If f : Ω → R denotes a (possibly fuzzy) shape prior, that is 0 ≤ f (x) ≤ 1 on Ω, then we
associate an image model to f given by $I_{prior} = I_{prior}(f, b) = b_0(1 - f) + b_1 f$. We now pose shape prior segmentation as the minimization over all binary label functions u of the following functional:

$$E(u, c, f, b) = E_{CV} + E_{prior} = J(u) + \frac{\lambda}{2}\|I - I_{model}\|^2 + \frac{\mu}{2}\|I_{model} - I_{prior}\|^2. \qquad (17)$$

Notice that close to convergence, it is reasonable to expect that $b_0 \approx c_0$ and $b_1 \approx c_1$. Assuming that exact equality holds here, we find that

$$\frac{\mu}{2}\|I_{model} - I_{prior}\|^2 = \frac{\mu}{2}(c_1 - c_0)^2\|u - f\|^2, \qquad (18)$$

which corresponds to the interaction term in (5) if we set $\gamma = \mu(c_1 - c_0)^2$. We will use this simplification in Section 3.2. Let us consider the minimization of (17) with respect to u and c when the prior data b and f are kept fixed. After completion of squares in (17) we find that

$$E(u, c, f, b) = J(u) + \frac{\lambda + \mu}{2}\Big\|I_{model} - \Big(\frac{\lambda}{\lambda+\mu}I + \frac{\mu}{\lambda+\mu}I_{prior}\Big)\Big\|^2 + \frac{\lambda}{2}\|I\|^2 + \frac{\mu}{2}\|I_{prior}\|^2 - \frac{\lambda+\mu}{2}\Big\|\frac{\lambda}{\lambda+\mu}I + \frac{\mu}{\lambda+\mu}I_{prior}\Big\|^2. \qquad (19)$$

Only the first squared norm depends on the (binary) u and c. So updating u and c is equivalent to solving the following CV-problem:

$$\inf_{u, c}\; J(u) + \frac{\lambda + \mu}{2}\Big(\langle 1 - u, (I_{eff} - c_0)^2\rangle + \langle u, (I_{eff} - c_1)^2\rangle\Big). \qquad (20)$$

Here $I_{eff} = \frac{\lambda}{\lambda+\mu}I + \frac{\mu}{\lambda+\mu}I_{prior}$ is an effective image, obtained as a convex combination of the observed image I and the prior image $I_{prior}$. The problem (20) has the truncation property, and may be solved by the CV-algorithm (6), (7), using continuous cuts. This solution is a minimizer of (17). Suppose that c and u have been updated and are now held fixed. Returning to the energy E, written in the original form (17), we optimize with respect to the prior image model $I_{prior} = b_0(1 - f) + b_1 f$. An easy calculation shows that the optimal gray scales $b = (b_0, b_1)$ are given by the formulas
$$b_0 = \frac{\langle 1 - f, I_{model}\rangle}{\|1 - f\|^2} \quad\text{and}\quad b_1 = \frac{\langle f, I_{model}\rangle}{\|f\|^2}.$$
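The completion of squares behind (19) and (20) can be verified numerically on random data. The Python/NumPy sketch below is illustrative and not taken from the paper. It assumes a binary u and a binary f, and writes the u-independent remainder of (19) in the equivalent compact form $\frac{\lambda\mu}{2(\lambda+\mu)}\|I - I_{prior}\|^2$. The parameter values are arbitrary.

```python
import numpy as np

def effective_image(I, I_prior, lam, mu):
    # I_eff = lam/(lam+mu) * I + mu/(lam+mu) * I_prior, the image entering (20).
    return (lam * I + mu * I_prior) / (lam + mu)

def prior_gray_values(f, I_model):
    # b0 and b1 from the formulas above.
    b0 = ((1 - f) * I_model).sum() / ((1 - f) ** 2).sum()
    b1 = (f * I_model).sum() / (f ** 2).sum()
    return b0, b1

rng = np.random.default_rng(1)
I = rng.random((32, 32))                          # observed image
u = (rng.random((32, 32)) > 0.5).astype(float)    # binary label function
f = (rng.random((32, 32)) > 0.5).astype(float)    # binary shape prior
c0, c1, lam, mu = 0.2, 0.8, 0.1, 0.4

I_model = c0 * (1 - u) + c1 * u
b0, b1 = prior_gray_values(f, I_model)
I_prior = b0 * (1 - f) + b1 * f
I_eff = effective_image(I, I_prior, lam, mu)

# Data and prior terms of (17) versus one squared norm plus a u-independent constant, as in (19).
lhs = 0.5 * lam * ((I - I_model) ** 2).sum() + 0.5 * mu * ((I_model - I_prior) ** 2).sum()
const = 0.5 * lam * mu / (lam + mu) * ((I - I_prior) ** 2).sum()
rhs = 0.5 * (lam + mu) * ((I_model - I_eff) ** 2).sum() + const

# For binary u, the remaining squared norm is the CV data term of (20) evaluated on I_eff.
cv_term = 0.5 * (lam + mu) * (((1 - u) * (I_eff - c0) ** 2).sum() + (u * (I_eff - c1) ** 2).sum())
print(np.isclose(lhs, rhs), np.isclose(rhs - const, cv_term))  # True True
```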
With these values fixed, we proceed to update the pose of the shape prior f, which is the subject of the next few sections.

3.2 Pose Invariant Prior Interaction Energy
Let $f_0 : \Omega \to \mathbb{R}$ denote a shape template of class $C_0^1(\Omega)$, and $T : \mathbb{R}^2 \to \mathbb{R}^2$ a similarity transformation, that is, a mapping of the form $y = T(x) = \mu^{-1}R^{-1}(x - a)$, $x \in \mathbb{R}^2$, where R ∈ SO(2) denotes rotation, μ > 0 a scaling
factor, and $a \in \mathbb{R}^2$ translation. We define the shape prior f as the transformed template $T^*f_0 : \mathbb{R}^2 \to \mathbb{R}$ by the formula $f(x) = T^*f_0(x) = (f_0 \circ T)(x) = f_0(T(x))$ for all $x \in \mathbb{R}^2$. If T is sufficiently close to the identity map then, clearly, $T^*f_0 \in C_0^1(\Omega)$, so that the support of the prior remains inside the image domain Ω. In the present paper we use the simplification (18) of (17) and consider a pose invariant prior interaction defined by the energy

$$E_{prior}(u) = \inf_T \|u - T^*f_0\|^2 = \inf_T \int_\Omega \big(u(x) - f_0(T(x))\big)^2\,dx, \qquad (21)$$
where the infimum is taken over the group of similarity transforms T in the plane. The following (natural) parametrization is used throughout:

$$a \in \mathbb{R}^2, \quad \mu = e^\sigma\ (\sigma \in \mathbb{R}), \quad\text{and}\quad R(\theta) = \begin{pmatrix}\cos\theta & -\sin\theta \\ \sin\theta & \cos\theta\end{pmatrix}\ (\theta \in \mathbb{R}). \qquad (22)$$

The pose parameters θ, σ and a are collected in a vector $p = (p_1, p_2, p_3, p_4) := (\theta, \sigma, a) \in \mathbb{R}^4$, the corresponding map is occasionally denoted T = T(p), and the shape prior becomes $f(x) = T^*f_0(x) = T(p)^*f_0(x) = f_0(e^{-\sigma}R(-\theta)(x - a))$. Now, the infimum in (21) is usually computed by applying a gradient descent procedure to the function $\mathbb{R}^4 \ni p \mapsto E(p) := \|u - T(p)^*f_0\|^2/2$. That is, one solves a system of ODEs given by $p'(t) = -\nabla E(p(t))$ with respect to an artificial time parameter t, and then obtains the optimal pose as $p^* = \lim_{t\to\infty} p(t)$. This method requires the computation of the partial derivatives $\partial E(p)/\partial p_i$ for every component $p_i$ of p. A simple calculation shows that $\partial E(p)/\partial p_i = \langle T(p)^*f_0 - u,\, \partial T(p)^*f_0/\partial p_i\rangle$, so we begin with the partials $\partial T(p)^*f_0(x)/\partial p_i$. By the chain rule,

$$\frac{\partial}{\partial a}T^*f_0(x) = -\nabla_x T^*f_0(x) = -\nabla_x f(x) \quad \text{(two components!)},$$

$$\frac{\partial}{\partial \theta}T^*f_0(x) = -\nabla_x T^*f_0(x)^T J(x - a) = -\nabla_x f(x)^T J(x - a),$$

$$\frac{\partial}{\partial \sigma}T^*f_0(x) = -\nabla_x T^*f_0(x)^T (x - a) = -\nabla_x f(x)^T (x - a),$$

where $J = R(-\theta)^T R'(-\theta) = \begin{pmatrix}0 & -1 \\ 1 & 0\end{pmatrix}$ is the counterclockwise rotation by π/2 radians. Notice that $-\nabla_x f$ appears in all the formulas, with the x-derivative computed after transformation of the template. It follows from the above formulas that the partial derivatives of E(p) are given by (the first equation being interpreted componentwise)

$$\frac{\partial E}{\partial a}(\theta, \sigma, a) = \langle f - u, -\nabla_x f\rangle, \quad \frac{\partial E}{\partial \theta}(\theta, \sigma, a) = \langle f - u, -\nabla_x f^T J(\cdot - a)\rangle, \quad\text{and}\quad \frac{\partial E}{\partial \sigma}(\theta, \sigma, a) = \langle f - u, -\nabla_x f^T(\cdot - a)\rangle. \qquad (23)$$
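As a sanity check on (23), the analytic partial derivatives can be compared with central finite differences of a discretized E(p). The Python/NumPy sketch below is illustrative only. It assumes a hypothetical smooth template $f_0$ (an elliptical Gaussian, so that $\nabla f_0$ is available in closed form), a uniform grid on $[-1, 1]^2$, and $J = \begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix}$, which is what $R(\phi)^T R'(\phi)$ evaluates to for every φ. Because the derivative of the discrete sum is exactly the pointwise chain rule, the two computations should agree up to finite-difference error.

```python
import numpy as np

J = np.array([[0.0, -1.0], [1.0, 0.0]])   # R(phi)^T R'(phi), the same for every phi

def rot(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s], [s, c]])

def posed_template(x, p):
    """f(x) = f0(e^{-sigma} R(-theta)(x - a)) and grad_x f, for p = (theta, sigma, a1, a2).

    f0 is a hypothetical smooth template (an elliptical Gaussian); x has shape (..., 2).
    """
    A = np.exp(-p[1]) * rot(-p[0])          # y = A (x - a)
    y = (x - p[2:]) @ A.T
    w = np.array([1 / 0.08, 1 / 0.02])
    f = np.exp(-(y ** 2 * w).sum(axis=-1))
    grad_f0 = -2 * y * w * f[..., None]
    return f, grad_f0 @ A                   # grad_x f = A^T grad f0(y), stored as row vectors

def energy(p, u, x, dA):
    f, _ = posed_template(x, p)
    return 0.5 * ((f - u) ** 2).sum() * dA

def partials(p, u, x, dA):
    """The partial derivatives (23), ordered as p = (theta, sigma, a1, a2)."""
    f, gf = posed_template(x, p)
    r = f - u
    d = x - p[2:]
    dE_dtheta = -(r * (gf * (d @ J.T)).sum(axis=-1)).sum() * dA
    dE_dsigma = -(r * (gf * d).sum(axis=-1)).sum() * dA
    dE_da = -(r[..., None] * gf).sum(axis=(0, 1)) * dA
    return np.array([dE_dtheta, dE_dsigma, dE_da[0], dE_da[1]])

s = np.linspace(-1.0, 1.0, 101)
x = np.stack(np.meshgrid(s, s, indexing="ij"), axis=-1)
dA = (s[1] - s[0]) ** 2
u, _ = posed_template(x, np.array([0.4, 0.1, 0.1, -0.05]))   # synthetic label function
p = np.zeros(4)

h = 1e-5
central = np.array([(energy(p + h * e, u, x, dA) - energy(p - h * e, u, x, dA)) / (2 * h)
                    for e in np.eye(4)])
print(np.allclose(partials(p, u, x, dA), central, rtol=1e-4, atol=1e-9))  # True
```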
These integrals are effectively computed on the support of $\nabla_x f$, that is, over a neighbourhood of the boundary of the shape prior. The traditional way to proceed is to iteratively update the pose parameters a, θ and σ using (essentially) the schemes $a(t + \Delta t_a) = a(t) - \Delta t_a\,\partial E/\partial a$, $\theta(t + \Delta t_\theta) = \theta(t) - \Delta t_\theta\,\partial E/\partial\theta$, and $\sigma(t + \Delta t_\sigma) = \sigma(t) - \Delta t_\sigma\,\partial E/\partial\sigma$. This is problematic. The partials in (23) scale differently, since those for θ and σ carry the lever arm x − a. Consequently, for this method to work properly, the time steps $\Delta t_a$, $\Delta t_\theta$ and $\Delta t_\sigma$ have to be chosen differently, and with great care. This is not only unsatisfying from a theoretical viewpoint; it also limits the practical applicability of the method, not least because the delicate choice of time steps tends to be time-consuming. We address this problem in the next section.

3.3 The Gradient Construction
The group of similarity transformations constitutes a four-dimensional manifold that we denote M (i.e., M is a Lie group). Any point p ∈ M may be represented by the coordinates p = (θ, σ, a) using (22), which may be regarded as an (almost global) parametrization of a neighbourhood of the identity map in M. If $E : M \to \mathbb{R}$ is a differentiable function, then $dE(p) : T_pM \to \mathbb{R}$ denotes the differential of E at p ∈ M, where $T_pM$ is the tangent space of M at p. In local coordinates the differential may be expressed as $dE = \frac{\partial E}{\partial a}da + \frac{\partial E}{\partial\theta}d\theta + \frac{\partial E}{\partial\sigma}d\sigma$. Suppose that $T_pM$ is equipped with a scalar product $(\cdot, \cdot)_p$; then we may define the gradient of E at p as the unique vector $\nabla E(p) \in T_pM$ which satisfies the relation

$$(\nabla E(p), v)_p = dE(p)v \quad \forall v \in T_pM. \qquad (24)$$

The metric $ds^2 = |da|^2 + d\theta^2 + d\sigma^2$ defines a scalar product which, as already noted, is insufficient for the construction of a reliable gradient descent scheme for $E(p) = \|u - T(p)^*f_0\|^2/2$. Our goal is to define a Riemannian structure on M which is better suited for this task. Let a function $f : M \times \mathbb{R}^2 \to \mathbb{R}$ be defined by $f(p, x) = T(p)^*f_0(x) = f_0(T(p)x)$. Since the shape template $f_0 \in L^2(\mathbb{R}^2)$, the mapping $p \mapsto f(p, \cdot)$ is a function $f : M \to L^2(\mathbb{R}^2)$. Now, $L^2(\mathbb{R}^2)$ comes with an inner product $\langle\cdot, \cdot\rangle$, so it is natural to define the scalar product on $T_pM$ as the pullback by f of the L²-inner product to the tangent space $T_pM$:

$$(v, w)_p = \langle df(p)v, df(p)w\rangle \quad (v, w \in T_pM), \qquad (25)$$

where $df(p) : T_pM \to T_{f(p)}L^2(\mathbb{R}^2) \equiv L^2(\mathbb{R}^2)$ denotes the differential of f. By the chain rule, $df = -(D_x f_0 \circ T)\,dT$, so in view of the identity $dT = DT\,dp = D_xT(p)\,DT(0)\,dp$ (which uses the group structure of M) we see that $df = -\nabla_x f^T DT(0)\,dp$, where DT(0) is the linear map given by the block matrix

$$DT(0) = \begin{pmatrix} I_{2\times 2} & J(x - a) & (x - a) \end{pmatrix}.$$
Here J is the rotation by π/2 introduced in Section 3.2. With this calculation we find that

$$\langle df(p)v, df(p)w\rangle = \langle -\nabla_x f^T DT(0)\,dp(v),\, -\nabla_x f^T DT(0)\,dp(w)\rangle = dp(v)^T\langle 1, DT(0)^T\nabla_x f\,\nabla_x f^T DT(0)\rangle\,dp(w) := dp(v)^T G(p)\,dp(w),$$

where G(p) denotes the metric tensor on $T_pM$ expressed in the coordinates p. If we define the structure tensor $M = \nabla_x f\,\nabla_x f^T$, then $G(p) = \langle 1, g(p, \cdot)\rangle$, where $g(p, \cdot) : \mathbb{R}^2 \to \mathbb{R}^{4\times 4}$ is given by $g(p, x) = DT(0)^T M\,DT(0)$, which equals

$$\begin{pmatrix} M & MJ(x-a) & M(x-a) \\ (x-a)^T J^T M & (x-a)^T J^T M J(x-a) & (x-a)^T J^T M (x-a) \\ (x-a)^T M & (x-a)^T M J(x-a) & (x-a)^T M (x-a) \end{pmatrix}.$$
This expression is, unfortunately, too complicated for our present purpose, so we need to make some simplification. This is achieved by approximating the structure tensor M by the simpler tensor $\frac{1}{2}|\nabla_x f|^2 I_{2\times 2}$, which has the same trace. (There are some compelling reasons for doing so! For instance, $g_{3,3} + g_{4,4} = |\nabla_x f|^2|x - a|^2$ holds exactly, and the approximation preserves this sum.) With this simplification we get

$$g(p, x) = \frac{1}{2}|\nabla_x f|^2\begin{pmatrix} I_{2\times 2} & J(x-a) & (x-a) \\ (x-a)^T J^T & |x-a|^2 & (x-a)^T J^T (x-a) \\ (x-a)^T & (x-a)^T J(x-a) & |x-a|^2 \end{pmatrix},$$

where we notice that, in fact, the matrix elements $g_{4,3} = g_{3,4} = 0$ because J is skew-symmetric. Finally, choose a (the center of rotation and scaling) such that $\langle |\nabla_x f|^2, x - a\rangle = 0$, that is, as the barycenter of the mass distribution $dm = |\nabla_x f|^2\,dx$. Then, after dropping the constant factor ½ (which only rescales the artificial time), the metric tensor $G = \langle 1, g\rangle$ has the following diagonal form:

$$G(p) = \begin{pmatrix} \|\nabla_x f\|^2 I_{2\times 2} & 0 & 0 \\ 0 & \||x-a|\nabla_x f\|^2 & 0 \\ 0 & 0 & \||x-a|\nabla_x f\|^2 \end{pmatrix}. \qquad (26)$$

Equivalently, $(dp, dp)_p = \|\nabla_x f\|^2|da|^2 + \||x - a|\nabla_x f\|^2(d\theta^2 + d\sigma^2)$. It follows from (24)–(26) and the formulas (23) that the corresponding gradient of E has the components

$$\nabla_a E = \frac{\langle f - u, -\nabla_x f\rangle}{\|\nabla_x f\|^2}, \quad \nabla_\theta E = \frac{\langle f - u, -\nabla_x f^T J(\cdot - a)\rangle}{\||x - a|\nabla_x f\|^2}, \quad\text{and}\quad \nabla_\sigma E = \frac{\langle f - u, -\nabla_x f^T(\cdot - a)\rangle}{\||x - a|\nabla_x f\|^2}. \qquad (27)$$

This is the gradient used in our implementation of gradient descent search for the optimal pose parameters. Its use amounts to an adaptive step-size control in the numerical discretization of the associated system of ODEs.
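The gradient (27) is cheap to evaluate once $\nabla_x f$ is known. The following Python/NumPy sketch (not the authors' MATLAB implementation) runs pose descent with a single step size Δt shared by all four parameters. It assumes a hypothetical smooth template $f_0$, an elliptical Gaussian centred at the origin. For this template, the barycenter condition $\langle |\nabla_x f|^2, x - a\rangle = 0$ holds, up to discretization, with a equal to the translation parameter. The sketch also assumes a uniform grid on $[-1, 1]^2$ and a synthetic label function u rendered from a known pose `p_true`.

```python
import numpy as np

J = np.array([[0.0, -1.0], [1.0, 0.0]])   # rotation by +pi/2

def rot(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s], [s, c]])

def posed_template(x, p):
    # Hypothetical elliptical-Gaussian template f0 posed by p = (theta, sigma, a1, a2); returns f and grad_x f.
    A = np.exp(-p[1]) * rot(-p[0])
    y = (x - p[2:]) @ A.T
    w = np.array([1 / 0.08, 1 / 0.02])
    f = np.exp(-(y ** 2 * w).sum(axis=-1))
    return f, (-2 * y * w * f[..., None]) @ A

def pose_gradient(p, u, x, dA):
    """Gradient (27): the partials (23) divided by the diagonal entries of the metric (26)."""
    f, gf = posed_template(x, p)
    r, d = f - u, x - p[2:]
    gf2 = (gf ** 2).sum(axis=-1)                        # |grad_x f|^2
    g_a = gf2.sum() * dA                                # ||grad_x f||^2
    g_rs = (gf2 * (d ** 2).sum(axis=-1)).sum() * dA     # || |x - a| grad_x f ||^2
    dE_da = -(r[..., None] * gf).sum(axis=(0, 1)) * dA
    dE_dtheta = -(r * (gf * (d @ J.T)).sum(axis=-1)).sum() * dA
    dE_dsigma = -(r * (gf * d).sum(axis=-1)).sum() * dA
    return np.array([dE_dtheta / g_rs, dE_dsigma / g_rs, dE_da[0] / g_a, dE_da[1] / g_a])

s = np.linspace(-1.0, 1.0, 101)
x = np.stack(np.meshgrid(s, s, indexing="ij"), axis=-1)
dA = (s[1] - s[0]) ** 2
p_true = np.array([0.2, 0.05, 0.05, -0.03])
u, _ = posed_template(x, p_true)          # label function to be matched

p = np.zeros(4)                           # start at the identity pose
dt = 0.75                                 # one step size for theta, sigma and a
for _ in range(200):
    p = p - dt * pose_gradient(p, u, x, dA)
print(np.round(p, 4))
```

The value Δt = 0.75 is the step size reported in Fig. 1. Here it is applied uniformly to θ, σ and a. The division by the entries of (26) supplies the per-parameter scaling that would otherwise have to be tuned by hand through separate steps Δt_a, Δt_θ and Δt_σ.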
Fig. 1. Experiment 1: First row: the original image, 212×320 pixels (left), the active contour Γ = {x ; u(x) = 0.5} in CV-segmentation without priors after 100 iterations (middle), and the corresponding segmentation (right). Second row: the shape template, the active contour and the shape prior after 150 iterations, and the final segmentation. Final row: segmentation of the image contaminated with 15% Gaussian noise using 200 iterations. Parameters: μ = 0.4, λ = 0.1, θ = 0.5 and step-size Δt = 0.75.
4 Experiments
The method presented in Section 3 was implemented in MATLAB with the following specifics. For the minimization of (17) (in the form (19)) we used the ABC-algorithm (10) and (11) with the parameter θ = 0.5, together with a variant of Chambolle's algorithm [16, Eq. (12)], implemented with periodic boundary conditions, for the TV-minimization in (10). This was alternated with an update of the pose of the prior, using gradient descent with the new gradient (27). The experiments presented here are limited to a proof-of-concept level. The first experiment (Figure 1) shows the CV segmentation with and without the shape prior, and with added noise. The segmentation result is displayed as a cutout from the original image, obtained by multiplication with the optimal label function u; this makes the binary character of u visible. The second experiment (Figure 2) shows how the search evolves for three different initializations. As shown, the method does not always converge to the desired solution. In fact, the prior contour may sometimes even shrink and disappear. These cases correspond, however, to quite plausible local minima of the pose energy, and this behavior is not unexpected in a local optimization method. More details are given in the figure captions.
Fig. 2. Experiment 2: Shape prior segmentation with three different initial poses (top row). Evolution after (approximately) 12, 25, 50, 100 and 200 iterations (rows 2–6). The run-time for 100 iterations is about 25 CPU-seconds. In the final phase of the segmentation, objects previously detected outside the prior disappear. With the third initialization the shape prior gets stuck in a local minimum. Such behavior cannot be ruled out when we work with local optimization methods. Image size and parameter settings are as in Experiment 1.
5 Conclusion
This paper contains two central contributions. The first is the reformulation (17) of the shape prior segmentation model (5), which leads to a minimization problem that can be solved using continuous cut methods. The second is the derivation of the gradient expressions (27), which form the basis for a stable and efficient gradient descent scheme for prior pose optimization. We believe that the ideas introduced here can be extended to cover more general and complex shape prior segmentation models. In particular, it would be interesting to see whether the ideas can be applied to pose problems in three dimensions.
References

1. Chan, T., Vese, L.: Active contours without edges. IEEE Transactions on Image Processing 10(2), 266–277 (2001)
2. Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations. Journal of Computational Physics 79, 12–49 (1988)
3. Sethian, J.: Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Materials Science. Cambridge University Press, Cambridge (1999)
4. Osher, S.J., Fedkiw, R.P.: Level Set Methods and Dynamic Implicit Surfaces. Springer, Heidelberg (2002)
5. Chan, T.F., Esedoḡlu, S., Nikolova, M.: Algorithms for finding global minimizers of image segmentation and denoising models. SIAM J. Appl. Math. 66(5), 1632–1648 (2006)
6. Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging and Vision 20(1–2), 89–97 (2004)
7. Leventon, M., Grimson, W., Faugeras, O.: Statistical shape influence in geodesic active contours. In: CVPR (2000)
8. Rousson, M., Paragios, N.: Shape priors for level set representations. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2351, pp. 78–92. Springer, Heidelberg (2002)
9. Cremers, D., Soatto, S.: A pseudo-distance for shape priors in level set segmentation. In: Faugeras, O., Paragios, N. (eds.) 2nd IEEE Workshop on Variational, Geometric and Level Set Methods in Computer Vision (2003)
10. Chan, T., Zhu, W.: Level set based prior segmentation. Technical Report UCLA CAM 03-66, Department of Mathematics, UCLA (2003)
11. Riklin-Raviv, T., Kiryati, N., Sochen, N.: Unlevel-sets: Geometry and prior-based segmentation. In: Pajdla, T., Matas, J(G.) (eds.) ECCV 2004. LNCS, vol. 3024, pp. 50–61. Springer, Heidelberg (2004)
12. Fundana, K., Heyden, A., Gosch, C., Schnörr, C.: Continuous graph cuts for prior-based object segmentation. In: Proc. ICPR (2008)
13. Aujol, J.-F., Chambolle, A.: Dual Norms and Image Decomposition Models. Int. J. Comput. Vis. 63(1), 85–104 (2005)
14. Bresson, X., Esedoḡlu, S., Vandergheynst, P., Thiran, J.-P., Osher, S.: Fast global minimization of the active contour/snake model. J. Math. Imaging Vis. 28(2), 151–167 (2007)
15. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
16. Chambolle, A.: Total variation minimization and a class of binary MRF models. UMR CNRS 7641, Ecole Polytechnique, Centre de mathématiques appliquées (June 2005)
A Non-local Approach to Shape from Ambient Shading

Emmanuel Prados¹, Nitin Jindal¹, and Stefano Soatto²

¹ Perception Lab., INRIA Grenoble – Rhône-Alpes, France
² Computer Science Department, University of California, Los Angeles, USA
Abstract. We study the mathematical and numerical aspects of the estimation of the 3-D shape of a Lambertian scene seen under diffuse illumination. This problem is known as "shape from ambient shading" (SFAS), and its solution consists of integrating a strongly non-local and non-linear Integro-Partial Differential Equation (I-PDE). We provide a first analysis of this global I-PDE, whereas previous work had focused on a local version that ignored effects such as occlusion of the light field. We also design an original approximation scheme which, following Barles and Souganidis' theory, ensures the correctness of the numerical approximations, and we discuss some numerical issues.
1 Introduction
Shape From Shading (SFS) refers to the problem of computing the three-dimensional shape of a surface from a single grayscale image, under certain assumptions on its reflectance and on the illumination. By necessity, to render the problem tractable, these assumptions are rather coarse: most restrict the illumination to a single point-light source at infinity [20, 4, 13, 7]. Only recently has it been shown [14] that the problem actually simplifies when the attenuation of the light source at finite distance is taken into account. Nevertheless, due to inter-reflections and other complex phenomena, modeling illumination as a point source is very unrealistic, even on a bright sunny day. Indeed, in most realistic conditions, including indoor and outdoor overcast conditions, a uniform hemispherical illumination source is a more realistic model. The study of SFS under such illumination conditions was pioneered by Langer et al. [10, 16, 9], and followed by others that we discuss shortly. In this work, we focus on the mathematical properties of the problem of "Shape From Ambient Shading" (SFAS), and seek conditions that render the problem well-posed.

1.1 Relation to Prior Work
Langer et al. [10, 16, 9] were the first to consider the case of ambient lighting, and to note that vignetting effects, far from being a nuisance, enable the inference of object shape similar to more traditional SFS, except for the added complication of the distributed source. In [17], Tian, Tsui and Yeung proposed a numerical SFS algorithm for dealing with some non-punctual and multiple light sources (any combination of spherical, rectangular and cylindrical light sources). Following a more elaborate and physically motivated model of illumination, [12, 18, 9, 19] introduced methods to deal with interreflection. However, in none of these works [17, 10, 16, 9, 12, 18] are the mathematical properties of the SFAS problem elucidated analytically. In particular, there are no results on the existence and uniqueness of solutions for the ensuing global PDE. At the opposite end of the spectrum, Lions, Rouy and Tourin [11] performed a theoretical analysis of the SFS problem for multiple and continuous distributed light sources. Like Tian, Tsui and Yeung [17], Lions, Rouy and Tourin neglect shadows (i.e. occlusions of the light sources by the surface itself). More specifically, they assume that for any fixed point x on the surface, all the light sources located on the hemisphere normal to the surface at x are visible from this point. This allows them to neglect the global nature of the equation, which in turn significantly simplifies the analysis. Like Langer et al. [10, 16, 9], we focus on ambient lighting. In their work, Langer et al. do not neglect the "shadow effect", and they model interreflections. They also underline the importance of ambient lighting in psychophysics. In this context, light comes from all directions, and the assumption of Lions, Rouy and Tourin [11] is equivalent to assuming that the solution is concave. Here, we do not want to limit ourselves to concave objects; therefore, the constraints of Lions et al. are far too restrictive.¹ The necessity to consider these phenomena takes us into mathematically uncharted territory. To the best of our knowledge, we are the first to provide theoretical results for the SFAS problem. We also introduce numerical algorithms satisfying the properties of monotonicity, consistency and stability, which typically ensure their convergence (see [1]).

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 696–708, 2009. © Springer-Verlag Berlin Heidelberg 2009
2 Modeling Shape from Ambient Shading
Shape From Shading exploits assumptions on the illumination and reflectance properties of the scene (or of an object of interest within the scene) to relate its three-dimensional (3-D) shape to the measured grayscale image. The most typical assumptions are that the scene is Lambertian with constant diffuse albedo. This model fits materials such as chalk and rough stone, and it neglects specularities, translucency and other complex phenomena in the interaction of light with matter. While this assumption is clearly violated in most natural and man-made scenes, there are significant portions of scenes where it is reasonable. Even objects that are far from Lambertian, such as human faces, have been successfully approximated as such for the purpose of analysis and inference (but not for synthesis, as humans are evolutionarily attuned to discriminate subtle features in human faces). Clearly, since SFS is an ill-posed problem, there is no way to validate the assumptions on the data themselves. Applying SFS to a scene that is not Lambertian, or that does not have constant diffuse albedo, will therefore result in gross errors, even if the SFS algorithm used is provably correct and optimal. The
For simplicity, however, we also neglect interreflections, as Lions et al. [11] did, and we lump their contribution into the ambient illumination term, up to additive errors.
698
E. Prados, N. Jindal, and S. Soatto
second class of assumptions commonly made concern illumination. The most common assumption, that of a point light source, is made more for mathematical convenience than for realism. Under this model, anything hidden from direct line-of-sight to the sun would be invisible, clearly a far cry from reality. Modeling the entire sky as a constant-radiance hemisphere seems to be equally crude, but indeed it has been shown to be a better approximation that a single pointlight source [8]. Clearly, both phenomena are important and we auspicate their eventual integration. In the next subsection we formalize these assumptions and introduce our notation. 2.1
Reflectance Assumptions
Let $S$ be a surface with a Lambertian bidirectional reflectance distribution function (BRDF) $\beta$ and constant diffuse albedo $\rho$. In other words, following [6], the BRDF at a point $p \in S$ does not depend on the viewing direction $\nu_{px}$. It can depend at most on the light source direction $\nu$ and on the position $p \in S$ itself, and here it is constant: $\beta(p; \nu_{px}, \nu) = \rho$. Because the intensity of the light source is not known, we can assume without loss of generality that the albedo is equal to 1, and attribute its actual value to the light source.
2.2 Lighting Assumptions
We assume the dominating sky principle [10]. We therefore neglect interreflections and, for any point of the surface, consider only the radiant energy coming from the sky, which is assumed to be a whole sphere of infinite radius. We also assume that the illumination is homogeneous, that is to say, that its power density distribution is constant. This assumption is required if we want to avoid other constraints while still keeping the problem manageable. Now, unlike most previous work, we want to model the effect of self-occlusions, whereby the light source is only partly visible at each point. Let $q$ be a point in $\mathbb{R}^3$. The visibility function, denoted $\chi_S(q;\nu)$, is the indicator function of the directions $\nu \in S^2$ from $q$ that are not occluded by $S$: $\chi_S(q;\nu) = 1$ if $\{q + \lambda\nu,\ \lambda > 0\} \cap S = \emptyset$, and $\chi_S(q;\nu) = 0$ otherwise. The visibility function specifies whether a point $q$ is reached by the light ray of direction $\nu$. The visibility cone gathers all the visible rays from a point $q \in \mathbb{R}^3$: $C_{S,q} = \{\nu \in S^2 : \chi_S(q;\nu) = 1\}$.
2.3 Resulting Radiance
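Given the assumptions above, the radiance of the surface at a point $p$ is
$$R_S(p) = \int_{S^2} \chi_S(p;\nu)\,\langle \nu, \nu_p\rangle\, d\nu = \int_{C_{S,p}} \langle \nu, \nu_p\rangle\, d\nu = \int_{C_{S,p}} \langle \nu, \nu_p\rangle^{+}\, d\nu, \tag{1}$$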
where $\nu_p$ is the unit normal vector to the surface $S$ at $p$ (see [6]) and where, for all $a \in \mathbb{R}$, $a^+ = a$ if $a \ge 0$ and $a^+ = 0$ otherwise. Here, the surface is implicitly assumed to be smooth. This ensures that all the light rays visible from a point come from above its tangent plane (the tangent plane would not be defined otherwise). So, for all points $p$ on $S$, all the light rays visible from that point lie in the hemisphere defined by the normal $\nu_p$ to the surface at that point, that is to say, $C_{S,p} \subset \mathrm{Hemi}_{\nu_p}$. Therefore $\langle \nu, \nu_p\rangle \ge 0$ for all $\nu \in C_{S,p}$.
A Non-local Approach to Shape from Ambient Shading
Already at this point, one can immediately see the difficulty introduced by self-occlusions: the integration domain of (1) is restricted to the visibility cone $C_{S,p}$, which directly depends on the global geometry of the scene $S$. This is unlike traditional SFS, where the radiance depends only on local properties of the scene, for instance the direction of the normal $\nu_p$ to the surface at a given point. This requires a different arsenal of tools than those traditionally used in SFS.² Unlike most prior work, we consider full ambient illumination. In such a case, the assumptions of [11] are equivalent to assuming that the surface is convex, which is too restrictive an assumption. In the next section we relate the measurements, i.e. the image grayscale, to the unknown, the 3-D shape of the scene, via the model above.
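The Python sketch below approximates the visibility function $\chi_S$ and the radiance (1) numerically. It is an illustration only. It assumes that $S$ is the graph of a height function $u(x, y)$ (the representation adopted in Section 3). Visibility is decided by marching along each ray with a fixed step up to a finite distance, and the integral over directions uses a midpoint rule. The helper names `is_visible` and `radiance`, and the parameters `step`, `max_dist`, `n_theta` and `n_phi`, are hypothetical and not part of the method described here.

```python
import math

def is_visible(u, q, nu, step=0.05, max_dist=3.0):
    """Approximate chi_S(q; nu) for S = graph of the height function u(x, y).

    The ray q + lam * nu, 0 < lam <= max_dist, is sampled every `step`; it is
    reported as occluded as soon as one sample lies strictly below the surface.
    Rays that stay above the surface up to max_dist are treated as reaching the sky.
    """
    n_steps = int(round(max_dist / step))
    for k in range(1, n_steps + 1):
        lam = k * step
        x, y, z = (q[0] + lam * nu[0], q[1] + lam * nu[1], q[2] + lam * nu[2])
        if z < u(x, y):
            return False
    return True

def radiance(u, grad_u, x, y, n_theta=32, n_phi=32):
    """Midpoint-rule approximation of R_S(p) = integral over C_{S,p} of <nu, nu_p>^+,
    at p = (x, y, u(x, y)), with directions covering the whole sphere."""
    gx, gy = grad_u(x, y)
    norm = math.sqrt(1.0 + gx * gx + gy * gy)
    normal = (-gx / norm, -gy / norm, 1.0 / norm)  # outward unit normal of the graph
    p = (x, y, u(x, y))
    d_theta, d_phi = math.pi / n_theta, 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta  # polar angle measured from +z
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            nu = (math.sin(theta) * math.cos(phi),
                  math.sin(theta) * math.sin(phi),
                  math.cos(theta))
            cos_term = sum(a * b for a, b in zip(nu, normal))
            # <nu, nu_p>^+ vanishes outside the normal hemisphere, so those rays are skipped.
            if cos_term > 0.0 and is_visible(u, p, nu):
                total += cos_term * math.sin(theta) * d_theta * d_phi
    return total

# A flat plane sees its whole normal hemisphere; the bottom of a bowl sees much less sky.
flat = radiance(lambda x, y: 0.0, lambda x, y: (0.0, 0.0), 0.0, 0.0)
bowl = radiance(lambda x, y: x * x + y * y, lambda x, y: (2 * x, 2 * y), 0.0, 0.0)
```

On the flat plane the value is close to π, the maximal radiance. At the bottom of the bowl $u = x^2 + y^2$, self-occlusion shrinks the visibility cone and the radiance drops well below π. Because the bowl is unbounded, this second value depends on `max_dist`: truncating rays amounts to assuming that the surface ends beyond that distance.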
3 Shape from Ambient Shading
In this section we formalize the SFAS problem as the solution of a global integro-partial differential equation, which we analyze in the next section.
3.1 Imaging Equation
We assume that we measure a greyscale image $I : D \subset \mathbb{R}^2 \to \mathbb{R}^+,\ x \mapsto I(x)$, on a closed domain $D$. Our goal is to characterize the surfaces $S$ that generate it. Note that in general there is no guarantee that the surface is unique. We now need to link the measurements ($I$) with the unknowns ($S$). To do so, we use the assumptions developed in the previous section together with the so-called radiance equation [6]. This equation approximates the brightness of a pixel $x$ of the image by the radiance of the point $\pi_S^{-1}(x)$ of the surface viewed at $x$: $I(x) = R_S(\pi_S^{-1}(x))$. Using the results from the previous section, we have
$$I(x) = \int_{C_{S,p}} \langle \nu, \nu_p\rangle^{+}\, d\nu, \tag{2}$$
where $\nu_p$ is the outward-pointing normal vector to the surface $S$ at the point $p = \pi_S^{-1}(x)$. In what follows, we assume that the data $I$ correspond to an image of a scene satisfying our modeling assumptions.
² To simplify the problem and remove this global dependency, Lions, Rouy and Tourin [11] assume that, for all the points of the surface, all the light sources located on the normal hemisphere are visible. In other words, they assume that there are no self-shadows. Such an assumption strongly simplifies the problem, because we then have $\int_{C_{S,p}} \langle \nu, \nu_p\rangle R_L(\nu)\, d\nu = \int_{S^2} \langle \nu, \nu_p\rangle^{+} R_L(\nu)\, d\nu$, where $R_L(\nu)$ is the power density distribution of the lighting. This completely removes the global dependency of the radiance on the whole shape.
In particular, for convenience, we rescale the range so that $0 \le I(x) \le \pi$. Also for simplicity, we assume that the camera performs an orthographic projection of the scene. This is a reasonable hypothesis provided that the domain of interest in the scene is small compared to its distance from the camera. Under these conditions, we can represent the surface as the graph of a function $u$ and write the outward unit normal vector explicitly:
$$S = \{(x, u(x));\ x \in D\}, \qquad \nu_{(x,u(x))} = \frac{1}{\sqrt{1 + |\nabla u(x)|^2}}\,(-\nabla u(x), 1).$$
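Here is a small sketch of the graph normal above, with $\nabla u$ approximated by centered differences. The function name `unit_normal` and the spacing `h` are illustrative choices, not taken from the text.

```python
import math

def unit_normal(u, x, y, h=1e-4):
    """Outward unit normal (-grad u, 1) / sqrt(1 + |grad u|^2) of the graph z = u(x, y),
    with grad u approximated by centered finite differences of spacing h."""
    ux = (u(x + h, y) - u(x - h, y)) / (2.0 * h)
    uy = (u(x, y + h) - u(x, y - h)) / (2.0 * h)
    norm = math.sqrt(1.0 + ux * ux + uy * uy)
    return (-ux / norm, -uy / norm, 1.0 / norm)

# For the plane u(x, y) = 2x - y, the normal is (-2, 1, 1) / sqrt(6) everywhere.
n = unit_normal(lambda x, y: 2.0 * x - y, 0.3, -0.7)
```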
Finally, following [13], one could instead model the camera as a pinhole (perspective projection). This would only complicate the notation, and the core of the analysis in this paper would still hold.
3.2 Formulation as an Integro-Differential Equation
With the orthographic camera model, the image formation model above can be interpreted as a partial differential equation (PDE) in the unknown function $u$:
$$I(x) = \int_{C_{u,(x,u(x))}} \Big\langle \frac{(-\nabla u(x), 1)}{\sqrt{1 + |\nabla u(x)|^2}},\ \nu \Big\rangle^{+} d\nu, \tag{3}$$
where $C_{u,p}$ denotes $C_{S,p}$ (the surface $S$ being represented by the function $u$). Solving the SFAS problem then amounts to integrating the PDE (3) given an image $I$. Clearly, the result is meaningful only if a solution exists and is unique. Failing that, one must at least be able to characterize the set of functions $u$ that are indistinguishable, in the sense that they all solve (3) for a given measured image $I$. Note that this equation is a first-order stationary global integro-partial differential equation of the general form $H(x, u(x), \nabla u(x), u(\cdot)) = 0$ for all $x \in \mathrm{Int}(D)$. The numerical and theoretical study of the solutions of this kind of equation is carried out via the Hamiltonian
$$H(x, t, p, u) = \int_{C_{u,(x,t)}} \Big\langle \frac{(-p, 1)}{\sqrt{1 + |p|^2}},\ \nu \Big\rangle^{+} d\nu - I(x).$$
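The sketch below evaluates a flatland (one-dimensional profile) analogue of $H(x, t, p, u)$. It shows concretely how the Hamiltonian depends on $t$ and on the whole function $u(\cdot)$ through the visibility cone. The reduction to flatland is an assumption made for brevity: directions live on the unit circle, so a fully visible half-circle contributes 2 instead of π. The surface is a profile on $D = [-1, 1]$, which returns $-\infty$ outside $D$ to mean "no surface". Visibility is tested by ray marching, and all names and numerical parameters are hypothetical.

```python
import math

def flatland_hamiltonian(u, x, t, p, image_value, n_dirs=720, step=0.02, max_dist=3.0):
    """Flatland analogue of H(x, t, p, u): sum over the visibility cone of the point
    (x, t) of <(-p, 1)/sqrt(1 + p^2), nu>^+ d(angle), minus the image value I(x).

    Directions are nu = (sin a, cos a); a ray counts as visible if every sample
    t + lam * nu_z stays at or above the profile u(x + lam * nu_x).
    """
    norm = math.sqrt(1.0 + p * p)
    n = (-p / norm, 1.0 / norm)
    da = 2.0 * math.pi / n_dirs
    n_steps = int(round(max_dist / step))
    total = 0.0
    for k in range(n_dirs):
        a = -math.pi + (k + 0.5) * da
        nu = (math.sin(a), math.cos(a))
        cos_term = nu[0] * n[0] + nu[1] * n[1]
        if cos_term <= 0.0:
            continue  # outside the normal half-plane: no contribution
        if all(t + j * step * nu[1] >= u(x + j * step * nu[0]) for j in range(1, n_steps + 1)):
            total += cos_term * da
    return total - image_value

def valley(s):
    """Profile u(s) = s^2 on D = [-1, 1]; there is no surface outside D."""
    return s * s if -1.0 <= s <= 1.0 else -math.inf

low = flatland_hamiltonian(valley, 0.0, 0.0, 0.0, image_value=0.0)   # point on the surface
high = flatland_hamiltonian(valley, 0.0, 0.5, 0.0, image_value=0.0)  # same x and p, raised by 0.5
```

Keeping $x$ and $p$ fixed while raising the point from $t = u(0)$ to $t = u(0) + 0.5$ enlarges the visibility cone, so the value increases. This matches the monotonicity in $t$ used in Section 5.2.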
4 Analysis of the Shape from Ambient Shading Equation
We now consider the uniqueness of the solution of (3). We show that the solution is, in general, not unique. We then give an analytical characterization of all the different scenes that, under the given assumptions, yield the same measured image. This analysis is important both for implementing viable numerical integration schemes and for making SFAS a useful tool in Computer Vision. This is akin to what is done in Structure From Motion [5], where the 3-D structure of a scene is in general not unique. There, one can easily characterize the solutions as equivalence classes under the similarity, affine or projective groups, depending on what is known about the camera calibration.
4.1 An Intrinsic Ambiguity
First, recall that $0 \le R_S(p) \le \pi$ for $p \in S$ and $C_{S,p} \subset \mathrm{Hemi}_{\nu_p}$. One can therefore easily show that $R_S(p) = \pi$ iff $C_{S,p} = \mathrm{Hemi}_{\nu_p}$. Now, consider a completely white image with maximal intensity: $I(x) = \pi$ for all $x \in D$. With such an image, the solutions of equation (2) satisfy $C_{S,p} = \mathrm{Hemi}_{\nu_p}$ for all the points $p$ on the surface. Therefore, if we represent the surface as the graph of the function $u$, it is easy to see that the surface lies below its tangent plane at every point $(x, u(x))$. So the solutions $u$ of (3) are concave, and so is the surface $S$. Conversely, every concave function generates such a white image. We can therefore conclude that the set of solutions consists exactly of all concave functions. In this case, the problem is clearly ill-posed: the image can be generated by many different surfaces, and therefore the solution cannot be unique. This problem is not confined to this pathological case. It appears as soon as the image contains a subset of pixels with maximal intensity, as illustrated in Figure 1. Pixels with maximal intensity are shown in red. The green curve corresponds to the maximal solution, while the blue one gives the minimal solution. Any curve between these two that is concave on the set of points with maximal intensity generates the same image as the black curve. In the following sections, we show that this condition is minimal, in the sense that the solution is unique if and only if there is no subset of pixels with maximal intensity. Moreover, when there are multiple solutions, they are characterized in terms of their values on these subsets.
Fig. 1. Example of multiple solutions in dimension 1 when the image contains a subset of pixels with maximal intensity (axes: x, u(x)). Any curve between the blue and the green curves that is concave on the set of points with maximal intensity generates the same image as the one generated by the initial black curve.
4.2 Uniqueness Result and Characterization of the Solutions
In this section we show that the solutions of the SFAS problem are characterized by their values on the subset $\{x \mid I(x) = \pi\} \subset D$. To this end, let us define $\Omega = \{x \mid I(x) < \pi\}$ and complete the equation
$$H(x, u(x), \nabla u(x), u) = 0, \quad \forall x \in D, \tag{4}$$
with Dirichlet boundary conditions on $C\Omega = D \setminus \Omega = \{x \in D \mid I(x) = \pi\}$. In other words, we assume that we know the height of the solution on this subset. The equation then becomes
$$\begin{cases} H(x, u(x), \nabla u(x), u) = 0, & \forall x \in \Omega, \\ u(x) = \varphi(x), & \forall x \in C\Omega. \end{cases} \tag{5}$$
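The ambiguity that motivates these Dirichlet conditions (Section 4.1, Fig. 1) can be reproduced in a flatland analogue. This is an illustration with several assumptions, not the authors' code. Profiles live on $D = [-1, 1]$, and directions live on the unit circle, so the maximal intensity is 2 rather than π. Visibility is approximated by ray marching. `flatland_image` and `on_domain` are hypothetical helpers.

```python
import math

def on_domain(f):
    """Restrict a profile to D = [-1, 1]; outside D there is no surface (-inf)."""
    return lambda s: f(s) if -1.0 <= s <= 1.0 else -math.inf

def flatland_image(u, du, x, n_dirs=720, step=0.02, max_dist=3.0):
    """Flatland analogue of equation (2): radiance of the point (x, u(x)).

    Sums <nu, n>^+ d(angle) over the directions nu = (sin a, cos a) whose rays stay
    above the profile; a fully visible half-circle of directions gives the maximum 2.
    """
    p = du(x)
    norm = math.sqrt(1.0 + p * p)
    n = (-p / norm, 1.0 / norm)  # outward unit normal of the profile
    z0 = u(x)
    da = 2.0 * math.pi / n_dirs
    n_steps = int(round(max_dist / step))
    total = 0.0
    for k in range(n_dirs):
        a = -math.pi + (k + 0.5) * da
        nu = (math.sin(a), math.cos(a))
        cos_term = nu[0] * n[0] + nu[1] * n[1]
        if cos_term > 0.0 and all(
            z0 + j * step * nu[1] >= u(x + j * step * nu[0])
            for j in range(1, n_steps + 1)
        ):
            total += cos_term * da
    return total

pixels = [-0.9, -0.4, 0.0, 0.3, 0.8]
u_dome, du_dome = on_domain(lambda s: -s * s), (lambda s: -2.0 * s)
u_tilted, du_tilted = on_domain(lambda s: -0.3 * s * s + 0.5 * s), (lambda s: -0.6 * s + 0.5)
u_valley, du_valley = on_domain(lambda s: s * s), (lambda s: 2.0 * s)

dome_image = [flatland_image(u_dome, du_dome, x) for x in pixels]
tilted_image = [flatland_image(u_tilted, du_tilted, x) for x in pixels]
valley_center = flatland_image(u_valley, du_valley, 0.0)
```

Both concave profiles produce the maximal value at every sampled pixel, so the image alone cannot distinguish them. The convex valley $u(s) = s^2$, by contrast, is darker at its bottom.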
For mathematical convenience, we also assume that the brightness image $I$ is continuous (so that $\Omega$ is an open subset of $D$). We further assume that the intensity is maximal on the boundary of the image, in other words that $\bar{\Omega} \subset \mathrm{Int}\, D$. We can now state the uniqueness theorem:
Theorem 1. If $u$ and $v$ are two $C^1$ solutions to equation (5), then $u = v$ on $D$.
This theorem ensures that there exists at most one $C^1$ solution to equation (5). It also provides a characterization of the set of solutions of equation (4): each solution is determined by its values on the subset $C\Omega$ (the region where $I(x) = \pi$). If the image never saturates ($C\Omega$ is empty), then the solution is unique when complemented by a Dirichlet boundary condition. Equivalently, all solutions are parameterized by their boundary conditions. Because of space constraints, we cannot report the complete proof of Theorem 1 here, and we refer the reader to our technical report [15] for details. From the standpoint of Computer Vision, the relevance of this result is the following: if we know the depth of the scene on the subset where the image is saturated, then the solution to the Shape From Ambient Shading problem is unique. This means that, elsewhere in the image, ambient shading is sufficient to recover the original surface that generated the image. In the next section we develop an approximation scheme for numerically integrating (5).
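In a discrete setting, the sets entering (5) and the boundary assumption above can be computed directly from the image. Below is a minimal sketch. It assumes the image is given as a list of rows and uses a hypothetical tolerance `eps` to decide saturation ($I = \pi$). The function name `split_domain` is illustrative.

```python
import math

def split_domain(image, eps=1e-9):
    """Partition pixel indices into Omega = {I < pi} and C_Omega = {I = pi}.

    Also returns whether every border pixel is saturated, a discrete stand-in for
    the assumption that the closure of Omega lies in the interior of D.
    """
    rows, cols = len(image), len(image[0])
    omega, c_omega = set(), set()
    for i in range(rows):
        for j in range(cols):
            target = c_omega if image[i][j] >= math.pi - eps else omega
            target.add((i, j))
    border = [(i, j) for i in range(rows) for j in range(cols)
              if i in (0, rows - 1) or j in (0, cols - 1)]
    return omega, c_omega, all(ij in c_omega for ij in border)

# 4 x 4 image: saturated everywhere except a 2 x 2 block in the middle.
img = [[math.pi] * 4 for _ in range(4)]
img[1][1] = img[1][2] = img[2][1] = img[2][2] = 2.0
omega, c_omega, border_ok = split_domain(img)
```

The Dirichlet data $\varphi$ would then be prescribed on `c_omega`, and a numerical scheme would update only the pixels in `omega`.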
5 Approximation Scheme and Numerical Algorithm
In Section 3 we formalized the SFAS problem as the solution of a partial differential equation of the form $H(x, u(x), \nabla u(x), u) = 0$. We then added Dirichlet boundary conditions on $C\Omega = D \setminus \Omega$ to arrive at a unique solution when the image is not saturated. To compute a reliable numerical solution to this equation, we use the machinery available for Hamilton-Jacobi equations. The key point is then to design approximation schemes that are monotone [2, 1].
5.1 A Monotonic Scheme
Following [1], we consider schemes of the form $S(h, x, u^\rho(x), u^\rho) = 0$, where
$$S : \mathbb{R}^+ \times \bar{\Omega} \times \mathbb{R} \times B(\bar{\Omega}) \to \mathbb{R}, \qquad (h, x, t, u) \mapsto S(h, x, t, u).$$
Here, $h \in \mathbb{R}^+$ defines the size of the grid used in the corresponding numerical algorithms (a 2D Cartesian grid), $B(\bar{\Omega})$ is the space of bounded functions defined on $\bar{\Omega}$, and $u^\rho$ is the unknown (a function). We are interested in the solution $u^\rho$ of the scheme $S$. We say that the scheme $S$ is monotone if, for all $h \in \mathbb{R}^+$, $x \in \bar{\Omega}$ and $t \in \mathbb{R}$, the function $S(h, x, t, \cdot) : B(\bar{\Omega}) \to \mathbb{R}$ is monotone. That is, $u(y) \ge v(y)$ for all $y \in \bar{\Omega}$ implies $S(h, x, t, u) \ge S(h, x, t, v)$. An iterative algorithm for computing a numerical approximation of the solution follows directly. Given $u^n$ (the approximation of $u^\rho$ at step $n$) and a point $x$ of $\bar{\Omega}$, the associated algorithm solves the equation
$$S(h, x, t, u^n) = 0 \tag{6}$$
with respect to $t$. A solution of (6) is the updated value of $u^n$ at $x$. Here, we use the definition of monotonicity given by Barles and Souganidis in [1]:
Definition 1 (monotonicity). The scheme $S(h, x, u^\rho(x), u^\rho) = 0$ defined in $\bar{\Omega}$ is monotone if, for all $h \in \mathbb{R}^+$, all $x \in \bar{\Omega}$, all $t \in \mathbb{R}$ and all $u, v \in B(\bar{\Omega})$,
$$u \le v \implies S(h, x, t, u) \ge S(h, x, t, v)$$
(the scheme is non-increasing with respect to $u$). The interest of monotonicity is twofold.
(i) Together with other basic assumptions (monotonicity with respect to $t$, existence of a subsolution, a bound on the subsolutions), this property is the key to ensuring two things: the scheme is stable (existence of a solution and of an upper bound), and the computed approximations converge towards the solution of the scheme; see [13].
(ii) Combined with some stability and consistency properties, monotonicity ensures that the solutions of the scheme converge towards the continuous solution of the considered PDE when the grid size vanishes; see [1].
In what follows, we design a monotonic approximation scheme for the SFAS problem in order to take advantage of all these benefits.
5.2 Monotonic Scheme for the SFAS Problem
For readability, we denote $H_{u,t}(x, p) = H(x, t, p, u)$. Recall that the Hamiltonian of interest in SFAS is
$$H_{u,t}(x, p) = \int_{C_{u,(x,t)}} \Big\langle \frac{(-p, 1)}{\sqrt{1 + |p|^2}},\ \nu \Big\rangle^{+} d\nu - I(x).$$
One can easily verify that $C_{u,(x,t)}$ is decreasing (in the sense of inclusion) with respect to $u$ and increasing with respect to $t$. It follows that $H_{u,t}$ has exactly the same monotonicity properties. On the other hand, to get a consistent approximation scheme, we have to replace $\nabla u$ (represented by the variable $p$ in the above Hamiltonian) in the PDE by one of its numerical approximations (finite differences). The difficulty is then to find such a discretization while maintaining monotonicity. To obtain a monotonic scheme, we take inspiration from the Lax-Friedrichs scheme for conservation laws [3, 2]. We choose
$$S(h, x, t, u) = H_{u,t}(x, Du(x)) - \theta\, L^u_t(x), \tag{7}$$
where $Du(x)$ is the vector obtained by a centered discretization of $\nabla u(x)$. More precisely, the $i$-th component of $Du(x)$ is
$$[Du(x)]_i = \frac{u(x + h\vec{e}_i) - u(x - h\vec{e}_i)}{2h},$$
and $L^u_t(x)$ is the classical discretization of the Laplacian $\Delta u(x)$, in which $u(x)$ is replaced by $t$, i.e.
$$L^u_t(x) = \sum_{i=1}^{N} \frac{u(x + h\vec{e}_i) + u(x - h\vec{e}_i) - 2t}{h^2}.$$
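The two finite-difference operators above can be sketched as follows, for a function $u$ given as a Python callable on $\mathbb{R}^N$ (an assumption for illustration). The names `centered_gradient`, `laplacian_t` and `_shift` are hypothetical. The quadratic $u(x) = x_1^2 + x_2^2$ is used as a check because centered differences are exact for it up to rounding.

```python
def _shift(x, i, d):
    """Return the point x + d * e_i, where x is a tuple of coordinates."""
    y = list(x)
    y[i] += d
    return tuple(y)

def centered_gradient(u, x, h):
    """Du(x), whose i-th component is (u(x + h e_i) - u(x - h e_i)) / (2h)."""
    return [(u(_shift(x, i, h)) - u(_shift(x, i, -h))) / (2.0 * h) for i in range(len(x))]

def laplacian_t(u, x, t, h):
    """L_t^u(x) = sum_i (u(x + h e_i) + u(x - h e_i) - 2t) / h^2, i.e. the standard
    Laplacian stencil with the centre value u(x) replaced by the unknown t."""
    return sum((u(_shift(x, i, h)) + u(_shift(x, i, -h)) - 2.0 * t) / (h * h)
               for i in range(len(x)))

u = lambda x: x[0] ** 2 + x[1] ** 2             # quadratic test surface
g = centered_gradient(u, (1.0, 0.0), 0.1)        # close to [2.0, 0.0]
lap = laplacian_t(u, (1.0, 0.0), u((1.0, 0.0)), 0.1)  # close to 4.0 when t = u(x)
```

Replacing $u(x)$ by $t$ is what makes the scheme depend on the unknown centre value only through the $-2t$ terms. Each of them contributes $-2/h^2$ per dimension to $\partial S / \partial t$ after multiplication by $-\theta$.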
This scheme, however, is still not necessarily monotonic. To ensure this property, we need to find an adequate value for $\theta$. By differential calculus, one can verify that $\max_{i=1..N} h\, |\partial_{p_i} H_{u,t}(x, Du(x))| \le 2\theta$ is a sufficient condition for monotonicity; see [15] for a detailed proof. With the same tools, one can also easily prove that $|\partial_{p_i} H_{u,t}(x, p)| \le 2\sqrt{2}\pi$. Scheme (7) is therefore monotonic as soon as $\theta \ge \sqrt{2}\pi h$. Moreover, the Laplacian term introduced in the scheme can be interpreted as a regularization, and to limit the smoothing it causes, $\theta$ must be as small as possible. On the other hand, under the assumptions of Section 4.2, one can verify that any sufficiently deep function is a subsolution of scheme (7), because the visibility cone becomes arbitrarily small. Moreover, the subsolutions are necessarily bounded by the function corresponding to the convex hull defined by the Dirichlet boundary constraints. The scheme is also increasing with respect to $t$ and satisfies $\lim_{t \to +\infty} S(h, x, t, u) \ge 0$. Theorems 3.1 and 3.5 of [13] therefore ensure that scheme (7) is stable and that the iterative approximations converge towards the solution of the scheme. In practice, we can start from any subsolution and simply update the surface with scheme (7) until convergence. Finally, our scheme is also consistent with the SFAS integro-PDE (I-PDE). Relying on the Barles and Souganidis theorem [1], we therefore conjecture that the computed approximations converge towards the continuous solution of the I-PDE. If this holds, it guarantees the reliability of our numerical approximations with respect to the theoretical solution of our problem.
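The arithmetic behind the choice $\theta \ge \sqrt{2}\pi h$ can be checked with the sketch below. In scheme (7), the neighbour value $u(x \pm h\vec{e}_i)$ enters through $[Du(x)]_i$ with weight $\pm 1/(2h)$ and through $L^u_t(x)$ with weight $1/h^2$. Hence $\partial S / \partial u(x \pm h\vec{e}_i) = \pm \partial_{p_i} H / (2h) - \theta / h^2$. The sketch only checks this finite-difference coupling. The additional dependence of $H_{u,t}$ on $u$ through the visibility cone is non-increasing, as stated above, and only helps. The names `BOUND_DH`, `theta_min` and `neighbour_coefficients` are hypothetical.

```python
import math

BOUND_DH = 2.0 * math.sqrt(2.0) * math.pi  # bound on |dH/dp_i| quoted in the text

def theta_min(h):
    """Smallest theta with h * |dH/dp_i| <= 2 * theta whenever |dH/dp_i| <= BOUND_DH."""
    return h * BOUND_DH / 2.0  # equals sqrt(2) * pi * h

def neighbour_coefficients(dh_dpi, h, theta):
    """Derivatives of S = H(x, Du(x)) - theta * L_t^u(x) with respect to u(x + h e_i)
    and u(x - h e_i) through the finite differences (dh_dpi stands for dH/dp_i)."""
    plus = dh_dpi / (2.0 * h) - theta / (h * h)
    minus = -dh_dpi / (2.0 * h) - theta / (h * h)
    return plus, minus

h = 0.05
theta = theta_min(h)  # about 0.2221
samples = [BOUND_DH * k / 10.0 for k in range(-10, 11)]
monotone = all(c <= 1e-9 for g in samples for c in neighbour_coefficients(g, h, theta))
```

With $\theta = \sqrt{2}\pi h$, every neighbour coefficient is non-positive over the admissible range of $\partial_{p_i} H$, which is exactly the non-increasing property of Definition 1. Halving $\theta$ makes some coefficient positive.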
6 Numerical Experiments
We focus here on the numerical results obtained by the algorithm associated with scheme (7). As described in Section 5.1, the approximation scheme suggests an iterative numerical algorithm. Its updating step (at point $x$) consists in solving the equation $S(h, x, t, u) = 0$ in $t$, where $u$ is the approximation of the whole solution at the previous step. Here, to solve the equation $H_{u,t}(x, Du(x)) - \theta L^u_t(x) = 0$, we rewrite it as a fixed-point equation $t = g(t)$, where
$$g(t) = \frac{1}{4} \sum_{i=1,2} \big(u(x + h\vec{e}_i) + u(x - h\vec{e}_i)\big) - \frac{h^2}{4\theta}\, H_{u,t}(x, Du(x)),$$
and then process the iterations tn+1 = g(tn ). In practice this process systematically converges after less than 5 iterations (we assign t0 to the previous value of u(x)). The numerical algorithm starts with a subsolution as a very steep valley such that visibility is closed to 0 for all points in the domain of the image. We refer the reader to [15] for further implemention details. To test our algorithm, we consider some scenarios for which the problem is well-posed. In other words, we limit the computation domain to a subset of Ω = {x | I(x) < π}. This computation domain is delimited by the red box in the corresponding figures. On the other part of the image domain, we enforce Dirichlet boundary conditions. In our tests, we use the sin(x) sin(y) surface. For the first test, we restrict the computation domain to a subset on which the surface is convex. As shown in
A Non-local Approach to Shape from Ambient Shading
705
Fig. 2. Left: image generated by the sin x ∗ sin y surface with h = 0.05 and region of interest where we run the algorithm; middle: original surface (groundtruth) on the region interest; right: surface reconstructed by our algorithm (result)
Fig. 3. Left: image generated by the sin x ∗ sin y surface with h = 0.05 inside a cubical box and region of interest where we run the algorithm; middle: original surface (groundtruth) on the region of interest; right: surface reconstructed by our algorithm (result)
Fig. 4. sinx ∗ siny image with regularization and region of interest where we run the numerical scheme. Results of the numerical scheme with (right) and without (left) regularization in input image.
Fig. 5. Reconstruction with different grid sizes h
706
E. Prados, N. Jindal, and S. Soatto Table 1. Errors for the first two tests min value max value L1 errors L2 errors L∞ errors sin x sin y, Fig. 2 -0.999707 0.066750 0.006191 0.009792 0.033867 sin x sin y in box, Fig. 3 -0.999707 0.999568 0.188896 0.240712 0.372564 Table 2. Errors by adding the regularization term in the input image Min Value Max Value L1 Error L2 Error L∞ Error without regularization -0.999707 0.999568 0.186037 0.189434 0.207331 with regularization -0.999707 0.999568 0.065627 0.067900 0.078941 Table 3. Errors with respect to h
grid sizes (h) L1 error L2 error L∞ error
h = 0.2 0.504147 0.526644 0.658852
h = 0.1 0.358676 0.371685 0.424875
h = 0.08 0.270054 0.276862 0.308127
h = 0.05 0.186037 0.189434 0.207331
h = 0.04 0.151427 0.153691 0.166671
Figure 2, the computed iterative solution converges accurately towards the original surface. In the second test, we want to extend the computation domain to both concave and convex areas. To remove the ambiguity due to points with maximal intensity, we reduce the intensity of the image by placing the sin(x) sin(y) surface in a box, i.e. surrounded by four walls of a cube with the roof open. In this test, the algorithm converges towards the solution in both concave and convex regions. Nevertheless, as shown Figure 3, when the reconstruction is very accurate in the convex region, there is a significant error in the concave region. Table 1 shows the minimum and maximum values of the original surfaces in the regions of interest (where the algorithm is applied). It also shows the L1 , L2 and L∞ errors. The top row shows the errors for the first test (sin(x) sin(y) surface) illustrated in Figure 2. The second row shows the errors for sin(x) sin(y) surface inside a box; it corresponds with the result of Figure 3. In our experiments, we have used the L1 error to test for convergence. In the second test, one can understand the error on the concave region as a result of the introduction of the regularization term (which was needed to make the scheme monotonic). To further analyze this effect, we focus on the concave part and we perform the following two experiments. 1) We run our algorithm with an input image containing the regularization term. More precisely, we use 1 ˜ I(x) = (−Du(x), 1), ν+ dν − θ Lu(x) 1 + |Du(x)|2 Cu,(x,u(x)) as input to our algorithm. So, in practice, the algorithm computes the solution of equation 1 ˜ − θ Lu(x) = 0 (−Du(x), 1), ν+ dν − I(x) 2 1 + |Du(x)| Cu,(x,u(x))
and the computed solution should then coincide better with the original surface. We run this third test with the $\sin x \sin y$ surface inside the box, with the computation domain reduced to the concave part. As shown in Table 2 and Figure 4, the algorithm is now able to recover the surface accurately.
2) Finally, since the regularization parameter $\theta$ depends linearly on the grid size $h$, the regularization effect should decrease as the grid size vanishes. We therefore redo the second test with smaller and smaller grid sizes: $h = 0.2, 0.1, 0.08, 0.05, 0.04$. As before, this uses the $\sin x \sin y$ surface inside a box, the original image $I$ and the same reduced computation domain. As we can see in Figure 5 and Table 3, the computed approximations do converge towards the original surface as the grid size is reduced. In addition to confirming the above assertion, this also supports our methodology and our theory, which predict a well-posed algorithm whose output converges towards the continuous solution when the grid vanishes.
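The per-pixel update used in these experiments can be sketched as below. It is the fixed-point iteration $t_{n+1} = g(t_n)$ described at the beginning of this section, on a 2-D grid. The toy Hamiltonian `toy_H` stands in for $H_{u,t}(x, Du(x))$ with $u$ and $x$ frozen, and it is purely hypothetical. The tolerance, the iteration cap and the name `local_update` are also illustrative. The real $H_{u,t}$ requires the visibility computations of equation (3).

```python
import math

def local_update(neighbour_sum, h, theta, hamiltonian, t0, tol=1e-12, max_iter=50):
    """Solve H(t) - theta * (neighbour_sum - 4 t) / h^2 = 0 for t by iterating
    t <- g(t) = neighbour_sum / 4 - h^2 / (4 theta) * H(t), starting from t0.

    neighbour_sum is u(x+h e1) + u(x-h e1) + u(x+h e2) + u(x-h e2);
    returns (t, number_of_iterations)."""
    t = t0
    for it in range(1, max_iter + 1):
        t_new = neighbour_sum / 4.0 - h * h / (4.0 * theta) * hamiltonian(t)
        if abs(t_new - t) < tol:
            return t_new, it
        t = t_new
    return t, max_iter

# Toy stand-in for H (hypothetical), increasing in t as stated in Section 5.2.
toy_H = lambda t: 0.5 + 0.1 * t
h = 0.05
theta = math.sqrt(2.0) * math.pi * h
t_star, iters = local_update(1.0, h, theta, toy_H, t0=0.0)
residual = toy_H(t_star) - theta * (1.0 - 4.0 * t_star) / (h * h)
```

The factor $h^2/(4\theta)$ multiplying $H$ is small, so $g$ is a strong contraction for this toy example and the iteration stops after a handful of steps. A full solver would sweep this update over all pixels of the computation domain until the surface stops changing.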
7 Conclusion and Future Work
In 3-D reconstruction approaches to Computer Vision, illumination is rarely modeled explicitly. With few notable exceptions, most work in Structure From Motion assumes that illumination is constant. It therefore ascribes all photometric effects to the radiance of the scene, regardless of how they come about. In Shape From Shading, where illumination is key, most existing work models it as an ideal point light source. In this paper we focus on the opposite abstraction, where the illumination is diffuse and, indeed, constant. Outdoor scenes on a cloudy day, or indoor scenes in modern offices, are reasonably well approximated by these conditions. Clearly, one would like to account for arbitrary unknown radiant distributions, and possibly also illumination, but this would render the analysis prohibitive. Even under the restrictive assumptions we have chosen, recovering the 3-D shape of the scene translates into a global integro-differential equation that, to the best of our knowledge, has never been analyzed. Algorithms have been explored in the past to exploit diffuse shading for recovering properties of the scene. However, a thorough theoretical study of the mathematical properties of this problem has been lacking. We believe we are the first to study the uniqueness of the SFAS solution and to show that, in general, it is not unique. We are also the first to characterize the set of indistinguishable scenes, that is, scenes that satisfy the assumptions of SFAS and generate the same image. While we believe that the main contribution of this paper is analytical, we do validate our results empirically in simulation. To that end, we propose a monotonic scheme for numerically integrating the SFAS equation, and show experimental results that highlight the features, and challenges, of this method.
Acknowledgement ANR-06-MDCA-007 and ONR N00014-08-1-0414.
References
1. Barles, G., Souganidis, P.E.: Convergence of approximation schemes for fully nonlinear second order equations. Asymptotic Analysis 4, 271–283 (1991)
2. Crandall, M.G., Lions, P.L.: Two approximations of solutions of Hamilton-Jacobi equations. Mathematics of Computation 43(167), 1–19 (1984)
3. Crandall, M.G., Majda, A.: Monotone difference approximations for scalar conservation laws. Mathematics of Computation 34(149), 1–21 (1980)
4. Durou, J.-D., Falcone, M., Sagona, M.: Numerical methods for shape-from-shading: A new survey with benchmarks. CVIU 109(1), 22–43 (2008)
5. Faugeras, O.: Three-Dimensional Computer Vision: A Geometric Viewpoint. MIT Press, Cambridge (1993)
6. Horn, B.K.: Robot Vision. MIT Press, Cambridge (1986)
7. Horn, B.K., Brooks, M.J. (eds.): Shape from Shading. MIT Press, Cambridge (1989)
8. Koenderink, J.J., Pont, S.C., van Doorn, A.J., Kappers, A.M.L., Todd, J.T.: The visual light field. Perception 36, 1595–1610 (2007)
9. Langer, M.S., Bulthoff, H.H.: Depth discrimination from shading under diffuse lighting. Perception 29(6), 649–660 (2000)
10. Langer, M.S., Zucker, S.W.: Shape from shading on a cloudy day. Journal of the Optical Society of America 11, 467–478 (1994)
11. Lions, P.-L., Rouy, E., Tourin, A.: Shape-from-shading, viscosity solutions and edges. Numer. Math. 64, 323–353 (1993)
12. Nayar, S., Ikeuchi, K., Kanade, T.: Shape from interreflections. IJCV 6(3), 173–195 (1991)
13. Prados, E.: Application of the theory of the viscosity solutions to the Shape From Shading problem. PhD thesis, Univ. of Nice-Sophia Antipolis (2004)
14. Prados, E., Faugeras, O.: Shape from shading: a well-posed problem? In: Proceedings of CVPR 2005, vol. II, pp. 870–877. IEEE, Los Alamitos (2005)
15. Prados, E., Jindal, N., Soatto, S.: A non-local approach to shape from ambient shading. Technical report, INRIA (2009)
16. Stewart, A.J., Langer, M.S.: Towards accurate recovery of shape from shading under diffuse lighting. IEEE Trans. on PAMI 19(9), 1020–1025 (1997)
17. Tian, Y.L., Tsui, H.T., Yeung, S.Y., Ma, S.: Shape from shading for multiple light sources. Journal of the Optical Society of America 16(1), 36–52 (1999)
18. Wada, T., Ukida, H., Matsuyama, T.: Shape from shading with interreflections under proximal light source: 3D shape reconstruction of unfolded book surface from a scanner image. In: ICCV (1995)
19. Yang, J., Zhang, D., Ohnishi, N., Sugie, N.: Determining a polyhedral shape using interreflections. In: CVPR 1997, p. 110 (1997)
20. Zhang, R., Tsai, P.-S., Cryer, J.-E., Shah, M.: Shape from Shading: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 21(8), 690–706 (1999)
An Elasticity Approach to Principal Modes of Shape Variation
Martin Rumpf and Benedikt Wirth
Bonn University, 53113 Bonn, Germany
{martin.rumpf,benedikt.wirth}@ins.uni-bonn.de
http://www.ins.uni-bonn.de
Abstract. Concepts from elasticity are applied to analyze modes of variation on shapes in two and three dimensions. This approach represents a physically motivated alternative to shape statistics on a Riemannian shape space, and it robustly treats strong nonlinear geometric variations of the input shapes. To compute a shape average, all input shapes are elastically deformed into the same configuration. The configuration that minimizes the total elastic deformation energy is defined as the average shape. Each of the deformations from one of the shapes onto the shape average induces a boundary stress. Small-amplitude stimulation of these stresses leads to displacements which reflect the impact of every single input shape on the average. To extract the dominant modes of variation, a PCA is performed on this set of displacements. To make the approach computationally tractable, a relaxed formulation is proposed, and sharp contours are approximated via phase fields. For the spatial discretization of the resulting model, piecewise multilinear finite elements are applied. Applications in 2D and in 3D demonstrate the qualitative properties of the presented approach.
1 Introduction
This paper is concerned with the notion of shape averages and principal modes of shape variation based on concepts from continuum mechanics, namely nonlinear and linearized elasticity. As shapes, we consider object contours, encoded as edge sets in images. In a classical principal component analysis in a vector space, an average and a covariance tensor can be computed directly on the linear space itself. In the case of shapes, by contrast, we are dealing with highly nonlinear geometric variations. Hence, for the zero moment analysis, i.e. the definition of a suitable shape average, we minimize the total elastic energy stored in a set of deformations from the input shapes onto a single image shape. At the energy minimum, the corresponding image shape is defined as the shape average. For the first moment analysis, we propose a physically sound linearization of shape variations, which allows us to define a covariance tensor. Each deformation from an input shape onto the average shape induces stresses on the shape average, which can be regarded as the imprint of the input shape. Modulating these stresses leads to displacements on the shape average, where the mapping from stresses to displacements is linear and well-defined. Each of these displacements can be regarded as a linearization of the usually nonlinear elastic deformation from one of the image shapes onto the shape average. Thus, a covariance tensor can be computed based on these displacements of the shape average. It linearly encodes the modes of variation of the shape average induced by the set of input shapes, even though the underlying deformations are usually large and nonlinear. Finally, we perform a principal component analysis based on this covariance tensor, which allows us to identify the dominant modes of variation of the input shapes.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 709–720, 2009. © Springer-Verlag Berlin Heidelberg 2009
M. Rumpf and B. Wirth
Our model is related to the physical interpretation of the arithmetic mean and the covariance tensor for $n$ points $x_1, \dots, x_n$ in $\mathbb{R}^d$. Indeed, the arithmetic mean $\bar{x} \in \mathbb{R}^d$ minimizes $\sum_{i=1,\dots,n} \alpha\, d(x, x_i)^2$, where $d(x, x_i)$ is the distance between $x$ and $x_i$. By Hooke's law, the elastic energy $\alpha\, d(x, x_i)^2$ stored in the spring connecting $x_i$ and $x$ is proportional to the squared distance. Hence, the arithmetic mean minimizes the total elastic energy of the system of connected springs. Likewise, the covariance tensor $\big((x_i - \bar{x}) \cdot (x_j - \bar{x})\big)_{ij}$ can, up to the spring constant, be identified with the covariance tensor $(\sigma_i \cdot \sigma_j)_{ij}$ of the forces $\sigma_i$ pulling at the mean $\bar{x}$.
Early shape analysis was mainly based on correspondences between landmark positions on different shapes, as in the influential work by Cootes et al. [1]. Principal component analysis (PCA) is a classical and, by definition, linear statistical tool. Chalmond and Girard [2] have proposed a PCA that also incorporates truly nonlinear geometric transformations. A survey on the potential of shape analysis in brain imaging is given by Faugeras and coworkers in [3]. Another important application concerns ready-made clothing. Here, it would be valuable to know the shape of the average human body and its principal modes of variation, in order to design clothes that fit as many people as possible.
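The spring analogy above can be checked numerically on a few points in $\mathbb{R}^2$. This minimal sketch uses arbitrary example points and a hypothetical spring constant `alpha`. The force exerted on the mean by spring $i$ is $\sigma_i = 2\alpha(x_i - \bar{x})$, the negative gradient of $\alpha |x - x_i|^2$ at $x = \bar{x}$. The matrix $(\sigma_i \cdot \sigma_j)$ is therefore $(2\alpha)^2$ times $((x_i - \bar{x}) \cdot (x_j - \bar{x}))$, which is the "up to the spring constant" statement.

```python
def mean(points):
    """Arithmetic mean of a list of points (lists of coordinates)."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]

def spring_energy(x, points, alpha):
    """Total energy sum_i alpha * d(x, x_i)^2 of springs joining x to each x_i (Hooke's law)."""
    return sum(alpha * sum((a - b) ** 2 for a, b in zip(x, p)) for p in points)

def gram(vectors):
    """Matrix of pairwise dot products (v_i . v_j)."""
    return [[sum(a * b for a, b in zip(v, w)) for w in vectors] for v in vectors]

points = [[0.0, 0.0], [2.0, 1.0], [1.0, 3.0]]
alpha = 1.5
xbar = mean(points)  # [1.0, 4/3]
disp = [[a - b for a, b in zip(p, xbar)] for p in points]
forces = [[2.0 * alpha * d for d in v] for v in disp]  # sigma_i = 2 alpha (x_i - xbar)

# The mean minimizes the spring energy: small perturbations increase it.
e0 = spring_energy(xbar, points, alpha)
perturbed = [spring_energy([xbar[0] + dx, xbar[1] + dy], points, alpha)
             for dx, dy in [(1e-3, 0.0), (-1e-3, 0.0), (0.0, 1e-3), (0.0, -1e-3)]]
G_f, G_d = gram(forces), gram(disp)
```

The forces also sum to zero, i.e. the mean is an equilibrium of the spring system.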
Conceptually, correlations of shapes have been studied on the basis of a general framework of a space of shapes and its intrinsic structure. The notion of shape space was introduced by Kendall [4] as early as 1984. Charpiat et al. [5] discuss shape averaging and shape statistics based on the Hausdorff distance of sets. Statistics on signed distance functions were also studied by Leventon et al. [6], whereas Dambreville et al. [7] used shape statistics based on characteristic functions to define a robust shape prior in image segmentation. Kernel density estimation in feature space was introduced by Cremers et al. [8] to incorporate the probability of 2D silhouettes of 3D objects in image segmentation. An overview of related kernel density methods is given by Rathi et al. [9]. Mémoli and Sapiro [10] have investigated the Gromov–Hausdorff distance as a global measure for the lack of isometry in shape analysis. In contrast to such a global measure of the deviation from an isometry, the nonlinear elastic energy functional involved in our approach measures this deviation locally, and locally isometric deformations indeed minimize the corresponding local functional. Shape space as an infinite-dimensional Riemannian manifold has been studied extensively by Miller et al. [11, 12]. Fuchs et al. [13] proposed a viscoelastic notion of the distance between shapes S given as boundaries of physical objects O. The elasticity paradigm for shape analysis on which our
An Elasticity Approach to Principal Modes of Shape Variation
approach is founded differs significantly from these metric approaches to shape space (cf. Sect. 4 for a detailed discussion of the conceptual difference). In this paper, shapes are represented implicitly via a diffused phase field description. This in particular enables a robust and flexible application in two and three dimensions.
2 Zero Moment Analysis
In this section we briefly recall an elastic approach to shape averaging already presented in [14]. We consider shapes $S_i$ as the boundaries $\partial O_i$ of sufficiently regular objects $O_i$. Given n shapes $S_1, \dots, S_n$, we seek an average shape $S$ that reflects the geometric characteristics of the given shapes in a physical manner. For that purpose we assume that the average shape $S$ can be described as a deformed configuration of the input shapes, i.e. there are deformations $\phi_i : O_i \to \mathbb{R}^d$, $i = 1, \dots, n$, with $S = \phi_i(S_i)$ (see Fig. 1). A natural choice for the shape average $S$ is the shape that minimizes the total accumulated deformation energy of all deformations, $\mathbf{E}[S, (\phi_i)_{i=1,\dots,n}] = \frac{1}{n}\sum_{i=1}^n \mathcal{W}[O_i, \phi_i]$, where $\mathcal{W}[O_i, \phi_i]$ represents the stored deformation energy of the deformation $\phi_i$. To ensure existence of a minimizing shape $S$, we add a regularizing prior $\mathcal{L}[S]$ to the energy. Here, we consider the $\mathcal{H}^{d-1}$-measure of $S$, i.e. $\mathcal{L}[S] = \int_S da$, and the shape average $S$ is defined as a minimizer of the energy $\mathbf{E}[S, (\phi_i)_{i=1,\dots,n}] + \mu\mathcal{L}[S]$. As deformation energy $\mathcal{W}[O_i, \phi_i]$ we will employ a nonlinear, hyperelastic energy $\mathcal{W}[O, \phi] = \int_O W(D\phi)\,dx$, whose integrand can be rewritten as a function of only the three invariants, $W(D\phi) = \hat W(D\phi, \operatorname{cof} D\phi, \det D\phi) = \bar W(I_1, I_2, I_3)$ with $(I_1, I_2, I_3) := (|D\phi|_2^2, |\operatorname{cof} D\phi|_2^2, \det D\phi)$. Here $|D\phi|_2^2 := \operatorname{tr}(D\phi^T D\phi)$, $|\operatorname{cof} D\phi|_2^2$, and $\det D\phi$ describe the averaged local change of length, area, and volume, respectively. We consider polyconvex energy functionals [15], where $\hat W$ is convex and isometries, i.e. deformations with $D\phi^T D\phi = \mathbb{1}$, are local minimizers (cf. Fig. 2). Typical energy densities are of the form $\bar W(I_1, I_2, I_3) = \alpha_1 I_1^{p/2} + \alpha_2 I_2^{q/2} + \alpha_3 I_3^{-s} + \alpha_4 I_3^{r}$ with $\alpha_1, \dots, \alpha_4 > 0$. Here the penalization of volume shrinkage, i.e. $\bar W \to \infty$ as $I_3 \to 0$, enables us to control local injectivity (cf. [16]).
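To make the invariants concrete, here is a small NumPy sketch. It evaluates $(I_1, I_2, I_3)$ for a deformation gradient $F = D\phi$ and the density $\bar W$ in the form given above. The weights and exponents are hypothetical placeholders, since the text does not fix their values. The cofactor is computed as $\operatorname{cof}F = \det(F)\,F^{-T}$, which assumes $F$ is invertible.

```python
import numpy as np

def invariants(F):
    """(I1, I2, I3) = (|F|_2^2, |cof F|_2^2, det F) for a deformation gradient F = D(phi)."""
    J = np.linalg.det(F)
    cof = J * np.linalg.inv(F).T          # cofactor matrix; assumes F is invertible
    return np.sum(F * F), np.sum(cof * cof), J

def energy_density(F, alphas=(1.0, 1.0, 1.0, 1.0), p=2.0, q=2.0, s=1.0, r=2.0):
    """W_bar = a1 I1^(p/2) + a2 I2^(q/2) + a3 I3^(-s) + a4 I3^r (placeholder parameters)."""
    I1, I2, I3 = invariants(F)
    if I3 <= 0.0:
        return np.inf                      # exclude orientation reversal and total volume collapse
    a1, a2, a3, a4 = alphas
    return a1 * I1 ** (p / 2) + a2 * I2 ** (q / 2) + a3 * I3 ** (-s) + a4 * I3 ** r

# A rotation is an isometry: in 3D it gives (I1, I2, I3) = (3, 3, 1) up to rounding.
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
print(invariants(R))
```

Shrinking the volume makes the $I_3^{-s}$ term dominate. For example, `energy_density(0.1 * np.eye(3))` is far larger than `energy_density(np.eye(3))`. Whether the identity is actually a minimizer depends on the choice of weights and exponents, which this sketch does not tune.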
[Figure: input shapes $S_1, \dots, S_5$ mapped by the deformations $\phi_1, \dots, \phi_5$ onto the average shape $S$]
Fig. 1. Sketch of elastic shape averaging. The input shapes $S_i$ ($i = 1, \dots, 5$) are mapped onto a shape $S$ via elastic deformations $\phi_i$. The shape $S$ which minimizes the elastic deformation energy is denoted the shape average.
M. Rumpf and B. Wirth
Fig. 2. For two input shapes from Fig. 1, the deformation (via a deformed checkerboard), the averaged local change of length $\frac{1}{\sqrt{2}}|D\phi_i|_2$, and the local change of area $\det(D\phi_i)$ are depicted (colors encode the range [0.95, 1.05]).
This type of energy has two major advantages. First, it allows us to incorporate large deformations with strong material and geometric nonlinearities. Second, its form follows from first principles and allows us to distinguish the physical effects of length, area, and volume distortion, which reflect the local distance from an isometry. The first Piola–Kirchhoff stress tensor, which describes the force per unit area in the reference configuration $O$, is then recovered as $\sigma^{ref}[\phi] = W_{,A}(D\phi)$, where $W_{,A}(A) := \frac{\partial W}{\partial A}(A)$. The Cauchy (real) stress, describing the force per unit area in the deformed configuration $\phi(O)$, reads $\sigma[\phi] = \sigma^{ref}[\phi]\,(\operatorname{cof} D\phi)^{-1}$. To simplify the numerical treatment and to allow for slight topological differences between the shapes $S_i$, we relax the constraint $\phi_i(S_i) = S$, $i = 1, \dots, n$. Instead, we introduce a penalty functional $\mathcal{F}[S_i, \phi_i, S] = \mathcal{H}^{d-1}\big((S_i \setminus \phi_i^{-1}(S)) \cup (\phi_i^{-1}(S) \setminus S_i)\big)$, which measures the symmetric difference of the input shape $S_i$ and the pull-back $\phi_i^{-1}(S)$ of $S$. Our shape averaging model is thus based on the energy

$$\mathbf{E}^\gamma[S, (\phi_i)_{i=1,\dots,n}] = \frac{1}{n}\sum_{i=1}^n \left( \int_{O_i} W(D\phi_i)\,dx + \gamma\,\mathcal{F}[S_i, \phi_i, S] \right) + \mu\,\mathcal{L}[S].$$
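The relation between the energy density and the two stress tensors can be sketched numerically. The NumPy example below uses the simple frame-indifferent density $W(A) = |A|_2^2 = \operatorname{tr}(A^T A)$ purely as a stand-in, since its derivative $2A$ is known. It approximates $\sigma^{ref} = W_{,A}$ by central finite differences. It then converts the result to the Cauchy stress via $\sigma = \sigma^{ref}(\operatorname{cof}A)^{-1}$ with $\operatorname{cof}A = \det(A)\,A^{-T}$. For this $W$ the result is $2AA^T/\det A$, which is symmetric, as the Cauchy stress of a frame-indifferent material should be.

```python
import numpy as np

def W(A):
    """Stand-in density W(A) = |A|_2^2 = tr(A^T A); not the density used in the paper."""
    return np.sum(A * A)

def piola_stress(W, A, h=1e-6):
    """First Piola-Kirchhoff stress sigma_ref = dW/dA via central finite differences."""
    P = np.zeros_like(A)
    for idx in np.ndindex(A.shape):
        E = np.zeros_like(A)
        E[idx] = h
        P[idx] = (W(A + E) - W(A - E)) / (2.0 * h)
    return P

def cauchy_stress(P, A):
    """Cauchy stress sigma = sigma_ref (cof A)^{-1}, with cof A = det(A) A^{-T}."""
    cof = np.linalg.det(A) * np.linalg.inv(A).T
    return P @ np.linalg.inv(cof)

A = np.array([[1.2, 0.3],
              [0.1, 0.9]])                # sample deformation gradient D(phi) with det > 0
P = piola_stress(W, A)
sigma = cauchy_stress(P, A)
print(np.allclose(P, 2 * A, atol=1e-6))   # True: dW/dA = 2A for this W
print(np.allclose(sigma, sigma.T))        # True: the Cauchy stress is symmetric
```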
3 First Moment Analysis
As outlined in the introduction, our first moment analysis on shapes is based on an analysis of the stresses induced on the shape average by each individual input shape. Modulating each of these stresses results in a certain displacement, and the proposed principal component analysis on shapes is performed on these displacements. To derive this model comprehensively, we proceed in several steps.
Encoding nonlinear deformations via stresses on a linear vector space. Let us first review the underlying physical concept of stress. By the Cauchy stress principle, each deformation $\phi_i : O_i \to O$ is characterized by pointwise boundary stresses on $S$ in the deformed configuration, which try to restore the undeformed configuration $O_i$. The stress at some point $x$ on $S$ is given by the application of the Cauchy stress tensor $\sigma_i = \sigma[\phi_i]$ to the outer normal $\nu$ on $S$. The resulting stress $\sigma_i\nu$ is a force density acting on a local surface element of $S$. Let us assume that the above relation between the energetically favorable deformation and its induced stresses is one-to-one. Hence, the average shape can be described in terms of the input shape $S_i$ and the boundary stress $\sigma_i\nu$, and
we write $S = S_i[\sigma_i\nu]$. If we now scale the stress with a weight $t \in [0, 1]$, we obtain a one-parameter family of shapes $S(t) = S_i[t\sigma_i\nu]$ connecting $S_i = S(0)$ with $S = S(1)$. Thus, we can regard $\sigma_i\nu$ as a representative of the shape $S_i$ in the linear space of vector fields on $S$.
Modeling the impact of an input shape on the average shape. Let us now study how the average shape $S$ varies if we increase the impact of a particular input shape $S_k$ for some $k \in \{1, \dots, n\}$. In fact, we intend to associate to every surface load $\sigma_k\nu$ a displacement on the averaged object domain $O$ via the solution operator of a suitable linearized elasticity problem. Here, the object $O$ actually is a deformed configuration of different original objects $O_i$. Hence, we have to choose a proper elasticity tensor which reflects the compound stress configuration of the averaged domain $O$. A simple isotropic linearized elasticity model would not take into account the nonlinear geometric nature of our zero order analysis. To achieve this, we apply the Cauchy stress $\sigma_k\nu$ to the average shape $S$, scaled with a small constant $\delta$. Based on our above discussion of stresses and due to the sketched equilibrium condition (cf. Fig. 3), this additional boundary stress $\delta\sigma_k\nu$ acts as a first Piola–Kirchhoff stress on the (reference) configuration $S$. The elastic response is given by a correspondingly scaled displacement $u_k : O \to \mathbb{R}^d$. To properly model the loaded configurations, we concatenate this displacement with every nonlinear deformation $\phi_i$. We then take into account the sum of the resulting elastic energies plus a term involving the given Cauchy stress in the following energy,

$$\mathbf{E}_k[\delta, u] = \frac{1}{n}\sum_{i=1,\dots,n} \mathcal{W}[O_i, (\mathbb{1} + \delta u)\circ\phi_i] - \delta^2 \int_S \sigma_k\nu \cdot u\, da.$$

Now, the displacement $u_k$ is obtained as a minimizer of this modulated energy for a fixed set of deformations $(\phi_i)_{i=1,\dots,n}$ under the constraints $\int_O u_k\,dx = 0$ and $\int_O x \times u_k\,dx = 0$, which encode zero average translation and rotation. Let us remark that the boundary integral can be replaced by the volume integral $\int_O \sigma_k : Du\,dx$, which is more convenient for numerical discretization. To verify this, we use integration by parts and the fact that $\operatorname{div}\sigma_k = 0$ holds on $O$. After a tedious but straightforward computation, we obtain as Euler–Lagrange condition for $u_k$ that $\operatorname{div}\sigma[\delta u_k] = 0$ on $O$ and $\sigma[\delta u_k]\nu = \delta\sigma_k\nu$ on $S$. Here,

$$\sigma[\delta u_k] := \frac{1}{n}\sum_{i=1,\dots,n} W_{,A}\big((\mathbb{1} + \delta Du_k)\, D\phi_i \circ \phi_i^{-1}\big)\, \operatorname{cof} D(\phi_i^{-1})$$
is the first Piola–Kirchhoff stress tensor on the compound object $O$, which effectively reflects an average of all stresses in the n deformed configurations $\phi_i(O_i)$ for $i = 1, \dots, n$. As long as $A \mapsto W(A)$ is not quadratic in $A$, $u_k$ still solves a nonlinear elastic problem. The advantage of this nonlinear variational formulation is that it is of the same type as the one for the zero moment analysis, and it encodes in a natural way the compound elasticity configuration of the
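[Figure: input shapes $S_1, S_2, S_3$ with reference stresses $\sigma_i^{ref}\nu^{ref}$ at the points $\phi_i^{-1}(x)$, deformed by $\phi_1, \phi_2, \phi_3$ onto the average shape $S$, where the stresses $\sigma_1\nu, \sigma_2\nu, \sigma_3\nu$ act at $x$]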
Fig. 3. Sketch of the pointwise stress balance relation on the averaged shape
averaged shape domain $O$. An obvious drawback is that we have to consider the sum of n nonlinear elastic energies for the computation of every displacement $u_k$, $k = 1, \dots, n$. In the limit $\delta \to 0$, we would obtain $u_k$ as the solution of the actually linear elasticity problem

$$\operatorname{div}\big(C\,\epsilon[u]\big) = 0 \ \text{ in } O, \qquad C\,\epsilon[u]\,\nu = \sigma_k\nu \ \text{ on } S,$$

for the symmetric displacement gradient $\epsilon[u] = (Du + Du^T)/2$ under the constraint $\int_O u\,dx = 0$. Here, the in general inhomogeneous and anisotropic elasticity tensor $C$ is defined by

$$C = \frac{1}{n}\sum_{i=1,\dots,n} \left(\frac{1}{\det D\phi_i}\, D\phi_i\, W_{,AA}[D\phi_i]\, D\phi_i^T\right)\circ\phi_i^{-1},$$

based on an appropriate transformation of the Hessian of the energy density $W$. This elasticity tensor takes into account the loads of the compound configuration based on the combination of all deformations $\phi_i$ on the input objects $O_i$ for $i = 1, \dots, n$. In our current implementation, we avoid the evaluation of $C$ and consider the above nonlinear approximation, which is simpler to implement but computationally more expensive.
The actual covariance analysis based on the derived displacements. Now, we have a set of displacements $u_k : O \to \mathbb{R}^d$ at hand. They represent the variations of the average shape induced by a modulation of the stresses $\sigma_k$ from the deformations $\phi_k$ of the input shapes $S_k$ into the average shape $S$. On this space of displacements, we consider the standard $L^2$-product $(u, \tilde u)_2 := \int_O u \cdot \tilde u\,dx$ and define the covariance operator

$$\mathrm{Cov} : L^2(O) \to L^2(O); \quad u \mapsto \mathrm{Cov}\,u := \frac{1}{n}\sum_{k=1,\dots,n} (u, u_k)_2\, u_k.$$
By construction, Cov is symmetric and positive definite on $\operatorname{span}(u_1, \dots, u_n)$. Hence, we can diagonalize Cov on this finite-dimensional space and obtain a set of $L^2$-orthogonal eigenfunctions $w_k : O \to \mathbb{R}^d$ (actually displacements) and eigenvalues $\lambda_k > 0$ with $\mathrm{Cov}\,w_k = \lambda_k w_k$.
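Because Cov acts on the finite-dimensional span of $u_1, \dots, u_n$, its eigenpairs can be obtained from the $n \times n$ Gram matrix $G_{jk} = (u_j, u_k)_2$. Writing $w = \sum_j c_j u_j$, the equation $\mathrm{Cov}\,w = \lambda w$ becomes $\frac{1}{n} G c = \lambda c$, assuming the $u_k$ are linearly independent. The NumPy sketch below assumes each displacement is given by nodal values flattened into one row. It approximates the $L^2$-product by a weighted sum with quadrature weights. These discretization choices are assumptions, not the paper's finite element implementation. As in the text, the displacements are not re-centered.

```python
import numpy as np

def l2_product(u, v, weights):
    """Discrete L2 product (u, v)_2 ~ sum(weights * u * v)."""
    return np.sum(weights * u * v)

def apply_cov(u, U, weights):
    """Cov u = (1/n) sum_k (u, u_k)_2 u_k, with the displacements u_k stored as rows of U."""
    coeffs = (U * weights) @ u              # (u_k, u)_2 for k = 1..n
    return coeffs @ U / U.shape[0]

def displacement_pca(U, weights, rel_tol=1e-12):
    """Eigenvalues (descending) and L2-orthonormal eigenfunctions (rows) of Cov."""
    n = U.shape[0]
    G = (U * weights) @ U.T                 # Gram matrix G_jk = (u_j, u_k)_2
    mu, C = np.linalg.eigh(G)               # G c = mu c, with mu = n * lambda
    order = np.argsort(mu)[::-1]
    mu, C = mu[order], C[:, order]
    keep = mu > rel_tol * mu[0]             # discard numerically null directions
    mu, C = mu[keep], C[:, keep]
    modes = (C / np.sqrt(mu)).T @ U         # w = sum_j c_j u_j, scaled so that (w, w)_2 = 1
    return mu / n, modes

rng = np.random.default_rng(1)
U = rng.normal(size=(5, 200))               # 5 hypothetical displacements, 200 nodal values each
weights = np.full(200, 1.0 / 200)           # uniform quadrature weights
lam, modes = displacement_pca(U, weights)
print(lam / lam[0])                         # ratios lambda_i / lambda_1 (cf. Figs. 6, 7 and 9)
print(l2_product(modes[0], modes[0], weights))  # ~1.0: the modes are L2-normalized
```

Working with the small Gram matrix instead of the discretized operator keeps the eigenproblem at size $n \times n$, independent of the mesh resolution.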
Fig. 4. The two dominant modes (right) for four different shapes (left) demonstrate that our principal component analysis properly captures strong geometric nonlinearities
These eigenfunctions can be considered as principal modes of variation of the average object $O$ and hence of the average shape $S$, given the n input shapes. The eigenvalues encode the actual strength of these variations. Let us underline that this covariance analysis properly takes into account the usually strong geometric nonlinearity in shape analysis. It does so via the transfer of geometric shape variation to elastic stresses on the average shape, based on paradigms from nonlinear elasticity (cf. Fig. 4). These stresses lie in a linear vector space and thus allow for a covariance analysis, which is by definition linear. The interpretation of stresses in terms of displacements can be regarded as a proper choice of a scalar metric $g(\cdot, \cdot)$ on the space of stresses, interpreted as a tangent space of the shape space at the average shape. We define $g(\sigma\nu, \tilde\sigma\nu) := (u, \tilde u)_2$, given the above identification of stresses $\sigma\nu$, $\tilde\sigma\nu$ with induced displacements $u$, $\tilde u$ via the proper compound elasticity problem. Finally, this identification provides a suitable physical interpretation of stresses as modes of shape variation.
4 Elastic versus Riemannian Shape Analysis
The elasticity paradigm, on which our zero and first order shape analysis are based, differs significantly from a Riemannian approach to shape space as proposed, for instance, by Srivastava et al. [17]. Due to the axiom of elasticity, the energy at the deformed configuration $S$ is independent of the path from a shape $\tilde S$ to the shape $S$ along which the deformation is generated in time. Hence, there is no notion of shortest paths if we consider a purely elastic shape model. The visco-plastic model by Fuchs et al. [13] and the related model by Younes [18] define energies based on an integration of dissipation along transformation paths, where dissipation is understood as a Riemannian metric. This approach is not elastic in the classical axiomatic sense we consider here, and in particular it requires that, at rest, the intermediate configurations are all stress-free. The above-mentioned conceptual differences are reflected in a different behavior. If we regard shapes from a flow-oriented perspective, then a visco-elastic approach would be more appropriate. However, the elastic approach is favorable for rather rigid, more stable shapes, since it prevents strong local violations of isometry. An example is provided in Fig. 5: the input shapes are regarded as two versions of an object that may have none, one, or two pins at more or less stable positions. The two pins are not interpreted as shifted versions of each other, since a shifting deformation would cost too much energy. However, if the material were visco-plastic, a horizontal shift of each pin would be easier and would result in an average shape with just one centered pin and its variation being a
Fig. 5. Average and variation (right) for two shapes with pins at different positions (left). The pins are not interpreted as shifted versions of each other.
sideways movement. This corresponds to a completely different perception of the input shapes. The strong local rigidity and isometry preservation of the elasticity concept becomes particularly evident in Fig. 4 and Fig. 6, where non-isometric deformations are concentrated only at joints. On a Riemannian manifold, the exponential map allows one to describe geodesics from an averaged shape $S$ (in the sense of Karcher [19]) to the input shapes $S_k$ via $S_k = \exp_S(v_k)$ for some tangent vector $v_k$ at the shape $S$ in shape space. Hence, a covariance analysis is performed on the tangent vectors $v_1, \dots, v_n$ with respect to the Riemannian metric $g(\cdot, \cdot)$. In the strictly elastic setup, the shape space is in general not metrizable. Instead, the stresses $\sigma_k$ play the role of the $v_k$, imprinting the impact of $S_k$ on the average shape $S$ in terms of an induced displacement $u_k$.
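For comparison, the Riemannian construction can be sketched on the simplest curved example, the unit sphere, where the exponential and logarithm maps are known in closed form. The NumPy code below computes a Karcher mean by the common fixed-point iteration $p \leftarrow \exp_p\big(\frac{1}{n}\sum_k \log_p(q_k)\big)$. It then computes the tangent vectors $v_k = \log_p(q_k)$ on which a Riemannian covariance analysis would act. The sphere, the iteration, and the sample points are illustrative assumptions; the shape spaces discussed here are not spheres.

```python
import numpy as np

def sphere_log(p, q):
    """Log map on the unit sphere: tangent vector at p pointing towards q, of length d(p, q)."""
    cos_t = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(cos_t)
    if theta < 1e-12:
        return np.zeros_like(p)
    return theta / np.sin(theta) * (q - cos_t * p)

def sphere_exp(p, v):
    """Exp map on the unit sphere: follow the geodesic from p with initial velocity v."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return p.copy()
    return np.cos(norm_v) * p + np.sin(norm_v) * v / norm_v

def karcher_mean(points, iters=100, tol=1e-12):
    """Fixed-point (Riemannian gradient) iteration for the Karcher mean of points on the sphere."""
    p = points[0] / np.linalg.norm(points[0])
    for _ in range(iters):
        v = np.mean([sphere_log(p, q) for q in points], axis=0)
        p = sphere_exp(p, v)
        if np.linalg.norm(v) < tol:
            break
    return p

t = 0.4
pts = np.array([[np.sin(t), 0.0, np.cos(t)],
                [-np.sin(t), 0.0, np.cos(t)],
                [0.0, np.sin(t), np.cos(t)],
                [0.0, -np.sin(t), np.cos(t)]])   # four points placed symmetrically around the north pole
mean = karcher_mean(pts)                          # converges to (0, 0, 1) by symmetry
vs = [sphere_log(mean, q) for q in pts]           # tangent vectors v_k with q_k = exp_mean(v_k)
```

At the Karcher mean the tangent vectors $v_k$ sum to zero, which is the analogue of the force balance at the arithmetic mean.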
5 Finite Element Phase Field Approximation
Since explicit treatment of an edge set is difficult in a variational setting, we consider a phase field model, picking up the approach by Ambrosio and Tortorelli [20] for the discretization of the Mumford–Shah model [21]. Hence, a shape $S$ is encoded by a smooth phase field function $v : \Omega \to \mathbb{R}$, which is close to zero on $S$ and close to one away from $S$. In our approach we construct such phase field functions $v_i$ for the input shapes $S_i$ in advance. Usually, $v_i$ can be computed based on the model in [20] applied to the input images $u_i$. The specific form of the phase field function $v$ for the averaged shape $S$ is then directly determined via a phase field approximation of our variational model. Given a phase field parameter $\epsilon$, which determines the width of the phase field, we first define an approximate mismatch penalty $\mathcal{F}_\epsilon[v_i, \phi_i, v] = \frac{1}{\epsilon}\int_\Omega (v\circ\phi_i)^2 (1 - v_i)^2 + v_i^2 (1 - v\circ\phi_i)^2\,dx$. Here, we suppose $v$ to be extended by 1 outside the computational domain $\Omega$. Next, we consider the energy $\mathcal{L}_\epsilon[v] = \int_\Omega \epsilon|\nabla v|^2 + \frac{1}{4\epsilon}(v - 1)^2\,dx$, which acts as an approximation of the prior $\mathcal{L}[S]$. Furthermore, we simplify the later numerical implementation by assuming that the whole computational domain behaves elastically. The elasticity is several orders of magnitude softer outside the object domains $O_i$, on the complement set $\Omega \setminus O_i$. Thus, given a smooth approximation $\chi_{O_i}$ of the characteristic function of the object domain $O_i$, we define an approximate elastic energy $\mathcal{W}_\epsilon[O_i, \phi_i] = \int_\Omega \big((1 - \eta)\chi_{O_i} + \eta\big) W(D\phi_i)\,dx$, where in our applications $\eta = 10^{-4}$. Finally, the resulting approximation of the total energy functional for the variational description of the average shape reads
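$$\mathbf{E}^{\gamma,\epsilon}[v, (\phi_i)_{i=1,\dots,n}] = \frac{1}{n}\sum_{i=1}^n \big(\mathcal{W}_\epsilon[O_i, \phi_i] + \gamma\,\mathcal{F}_\epsilon[v_i, \phi_i, v]\big) + \mu\,\mathcal{L}_\epsilon[v].$$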
In analogy, a phase field approximation $\mathbf{E}_k^{\gamma,\epsilon}$ of the energy $\mathbf{E}_k$ can be constructed. In these approximations, $\mathcal{F}_\epsilon$ acts as a penalty with $\gamma \gg 1$, and $\mathcal{L}_\epsilon$ ensures a mild regularization of the averaged shape with $\mu \ll 1$. Integration is performed only in regions where all integrands are defined. The actual spatial discretization is based on finite elements. We represent the phase fields $v$, $v_i$ and the deformations $\phi_i$ by continuous, piecewise multilinear (trilinear in 3D and bilinear in 2D) finite element functions on an image domain $\Omega = [0, 1]^d$. A cascadic multiscale approach is applied for the relaxation of the energy. For details on both the phase field approximation and the numerical discretization we refer to [14].
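The two phase field terms can be evaluated on a uniform pixel grid as a rough illustration. Several simplifying assumptions apply. The sketch uses midpoint-rule quadrature and forward differences instead of the bilinear finite elements described above, and it takes the deformation to be the identity (so $v\circ\phi_i = v$). It builds a test phase field from the one-dimensional Ambrosio–Tortorelli profile $v = 1 - e^{-\operatorname{dist}(x,S)/(2\epsilon)}$. For this profile, $\mathcal{L}_\epsilon[v]$ should be close to the length of $S$, here a circle of radius 0.25.

```python
import numpy as np

def mismatch_penalty(v_warped, v_i, eps, h):
    """F_eps = (1/eps) * int (v o phi_i)^2 (1 - v_i)^2 + v_i^2 (1 - v o phi_i)^2 dx (midpoint rule)."""
    integrand = v_warped**2 * (1.0 - v_i)**2 + v_i**2 * (1.0 - v_warped)**2
    return integrand.sum() * h * h / eps

def perimeter_prior(v, eps, h):
    """L_eps = int eps |grad v|^2 + (v - 1)^2 / (4 eps) dx, forward differences on a 2D grid."""
    gx = np.diff(v, axis=0) / h
    gy = np.diff(v, axis=1) / h
    gradient_term = eps * ((gx**2).sum() + (gy**2).sum())
    potential_term = ((v - 1.0)**2).sum() / (4.0 * eps)
    return (gradient_term + potential_term) * h * h

N, eps = 256, 0.01
h = 1.0 / N
x = (np.arange(N) + 0.5) * h                        # pixel centres in [0, 1]
X, Y = np.meshgrid(x, x, indexing="ij")
r = np.hypot(X - 0.5, Y - 0.5)
dist = np.abs(r - 0.25)                             # distance to a circle S of radius 0.25
v = 1.0 - np.exp(-dist / (2.0 * eps))               # ~0 on S, ~1 away from S
print(perimeter_prior(v, eps, h), 2 * np.pi * 0.25)  # approximately equal
```

For binary fields the mismatch integrand reduces to the indicator of the symmetric difference. This makes the link to the sharp-interface penalty $\mathcal{F}$ easy to check.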
6 2D and 3D Applications
We have applied our shape analysis approach to various collections of 2D and 3D shapes. As first illustrative examples, Figs. 1 to 7 depict the computed average and dominant variations for sets of 2D shapes. Figure 1 shows the average of five human silhouettes. The corresponding deformations $\phi_i$ and local deformation invariants are displayed in Fig. 2 for two of the input shapes. In particular, the deformed checkerboard patterns show that, due to the invariance properties of the energy, isometries are locally preserved. Also, the indicators of length and area variation only peak locally at the person's joints. The corresponding principal components are given in Fig. 6. The average shape is represented by the dark line, whereas the light red lines signify deformations of the shape along the principal components. Here, we see the bending of the arm and the leg basically decoupled as the first two dominant modes of variation. The silhouette variations of raising the arm or the leg can only be obtained as linear combinations of the first and fourth or of the second and third mode of variation, respectively. A larger set of shapes is treated in Fig. 7, where 20 binary images "device7" from the MPEG7 shape database serve as input shapes. The first principal component appears to be a thickening or thinning of the leaves, accompanied by a change of indentation depth between them. The second mode corresponds to bending the leaves, and the third mode represents local changes at the tips: a sharpening and orientation of neighboring
Fig. 6. A set of input shapes (cf. Fig. 1) and their modes of variation with ratios $\lambda_i/\lambda_1$ of 1, 0.22, 0.15, and 0.06
Fig. 7. Original shapes and their first three modes of variation with ratios $\lambda_i/\lambda_1$ of 1, 0.20, and 0.05
Fig. 8. 24 given foot shapes, textured with the distance to the surface of the average foot (bottom right). The range [−6 mm, 6 mm] is color-coded.
Fig. 9. The first six dominant modes of variation for the feet from Fig. 8, with ratios $\lambda_i/\lambda_1$ of 1, 0.010, 0.010, 0.003, 0.001, and 0.0008
tips towards each other, originating e.g. from the sixth or the second-to-last input shape. The final example uses 24 foot shapes as input (which were originally provided as triangulated surfaces and then converted to characteristic functions
on the unit cube). The average shape is shown along with the original shapes in Fig. 8, where the input feet are color-coded according to their local distance to the surface of the average foot. It is clearly difficult to analyze the shape variation on this basis: we see modest variation at the toes and the heel as well as on the instep, but any correlation between these variations is difficult to determine. The corresponding modes of variation in Fig. 9, however, are quite intuitive. For all modes we show the average in the middle and its configurations after deformation according to the principal components. The first mode apparently represents changing foot lengths. The second and third modes belong to different variants of combined width and length variation. The fourth to sixth modes correspond to variations in relative heel position, ankle thickness, and instep height.
7 Conclusion
We have developed an elasticity-based notion of shape variation. The shape space of elastically deformable objects inherently does not possess a Riemannian structure. We therefore utilized an alternative shape space structure, in which distance is replaced by elastic deformation energy and boundary stresses play the role of linear representations of shapes. Such an approach imposes a physically and mathematically sound structure on spaces of elastic objects. Its computational feasibility has been demonstrated by application to sets of 2D and 3D shapes.
Acknowledgments
The authors thank Guillermo Sapiro for pointing them to the issue of an elastic principal component analysis. We are grateful to Heiko Schlarb from adidas, Herzogenaurach, Germany, for providing 3D scans of feet. Furthermore, we acknowledge support by the Hausdorff Center for Mathematics. Benedikt Wirth has been supported by the Bonn International Graduate School.
References
1. Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J.: Active shape models—their training and application. Computer Vision and Image Understanding 61(1), 38–59 (1995)
2. Chalmond, B., Girard, S.C.: Nonlinear modeling of scattered multivariate data and its application to shape change. IEEE Transactions on Pattern Analysis and Machine Intelligence 21(5), 422–432 (1999)
3. Faugeras, O., Adde, G., Charpiat, G., Chefd’Hotel, C., Clerc, M., Deneux, T., Deriche, R., Hermosillo, G., Keriven, R., Kornprobst, P., Kybic, J., Lenglet, C., Lopez-Perez, L., Papadopoulo, T., Pons, J.P., Segonne, F., Thirion, B., Tschumperlé, D., Viéville, T., Wotawa, N.: Variational, geometric, and statistical methods for modeling brain anatomy and function. NeuroImage 23, S46–S55 (2004)
4. Kendall, D.G.: Shape manifolds, procrustean metrics, and complex projective spaces. Bull. London Math. Soc. 16, 81–121 (1984)
5. Charpiat, G., Faugeras, O., Keriven, R.: Approximations of shape metrics and application to shape warping and empirical shape statistics. Foundations of Computational Mathematics 5(1), 1–58 (2005)
6. Leventon, M., Grimson, W., Faugeras, O.: Statistical shape influence in geodesic active contours. In: 5th IEEE EMBS International Summer School on Biomedical Imaging, 2002 (2002)
7. Dambreville, S., Rathi, Y., Tannenbaum, A.: A shape-based approach to robust image segmentation. In: Campilho, A., Kamel, M. (eds.) ICIAR 2006. LNCS, vol. 4141, pp. 173–183. Springer, Heidelberg (2006)
8. Cremers, D., Kohlberger, T., Schnörr, C.: Shape statistics in kernel space for variational image segmentation. Pattern Recognition 36, 1929–1943 (2003)
9. Rathi, Y., Dambreville, S., Tannenbaum, A.: Comparative analysis of kernel methods for statistical shape learning. In: Beichel, R., Sonka, M. (eds.) CVAMIA 2006. LNCS, vol. 4241, pp. 96–107. Springer, Heidelberg (2006)
10. Mémoli, F., Sapiro, G.: A theoretical and computational framework for isometry invariant recognition of point cloud data. Foundations of Computational Mathematics 5, 313–347 (2005)
11. Miller, M.I., Younes, L.: Group actions, homeomorphisms and matching: a general framework. International Journal of Computer Vision 41(1-2), 61–84 (2001)
12. Miller, M., Trouvé, A., Younes, L.: On the metrics and Euler-Lagrange equations of computational anatomy. Annual Review of Biomedical Engineering 4, 375–405 (2002)
13. Fuchs, M., Jüttler, B., Scherzer, O., Yang, H.: Shape metrics based on elastic deformations. Forschungsschwerpunkt S92, Industrial Geometry 71, Universität Innsbruck (2008)
14. Rumpf, M., Wirth, B.: A nonlinear elastic shape averaging approach. SIAM Journal on Imaging Sciences (2008) (submitted)
15. Ciarlet, P.G.: Three-dimensional elasticity. Elsevier Science Publishers B.V., Amsterdam (1988)
16. Baker, T.: Three dimensional mesh generation by triangulation of arbitrary point sets. In: Computational Fluid Dynamics Conference, 8th, Honolulu, HI, June 9-11, 1987, vol. 1124-CP, pp. 255–271 (1987)
17. Srivastava, A., Jain, A., Joshi, S., Kaziska, D.: Statistical shape models using elastic-string representations. In: Narayanan, P. (ed.) ACCV 2006. LNCS, vol. 3851, pp. 612–621. Springer, Heidelberg (2006)
18. Younes, L.: Computable elastic distances between shapes. SIAM J. Appl. Math. 58, 565–586 (1998)
19. Karcher, H.: Riemannian center of mass and mollifier smoothing. Communications on Pure and Applied Mathematics 30(5), 509–541 (1977)
20. Ambrosio, L., Tortorelli, V.M.: On the approximation of free discontinuity problems. Bollettino dell’Unione Matematica Italiana, Sezione B 6(7), 105–123 (1992)
21. Mumford, D., Shah, J.: Optimal approximation by piecewise smooth functions and associated variational problems. Communications on Pure and Applied Mathematics 42, 577–685 (1989)
Pre-image as Karcher Mean Using Diffusion Maps: Application to Shape and Image Denoising
Nicolas Thorstensen, Florent Segonne, and Renaud Keriven
Universite Paris-Est, Ecole des Ponts ParisTech, Certis
http://certis.enpc.fr/~thorsten
Abstract. In the context of shape and image modeling by manifold learning, we focus on the problem of denoising. A set of shapes or images being known through given samples, we capture its structure using the Diffusion Maps method. Denoising a new element classically boils down to the key problem of pre-image determination, i.e. recovering a point given its embedding. We propose to model the underlying manifold as the set of Karcher means of close sample points. This nonlinear interpolation is particularly well adapted to the case of shapes and images. We define the pre-image as such an interpolation having the targeted embedding. We present results on synthetic 2D shapes and on real 2D images and 3D shapes. They demonstrate the superiority of our pre-image method over several state-of-the-art techniques in shape and image denoising based on statistical learning.
1 Introduction
Manifold learning, the process of extracting the meaningful structure and correct geometric description present in a set of training points $\Gamma = \{s_1, \dots, s_p\} \subset \mathcal{S}$, has seen renewed interest over the past years. These techniques are closely related to the notion of dimensionality reduction, i.e. the process of recovering the underlying low-dimensional structure of a manifold $M$ that is embedded in a higher-dimensional space $\mathcal{S}$. Among the most recent and popular techniques are Locally Linear Embedding (LLE) [5], Isomap [6], Laplacian eigenmaps [7] and Diffusion Maps [8, 9, 10]. In this paper we focus on Diffusion Maps. Their nonlinearity, their locality-preserving property and their stable behavior under noise are generally viewed as major advantages over classical methods like principal component analysis (PCA) and classical multidimensional scaling [8]. This method considers an adjacency graph on the set $\Gamma$ of training samples. The graph's weight matrix $(W_{i,j})_{i,j \in 1,\dots,p}$ captures the local geometry of $\Gamma$ (its local connectivity) through the use of a kernel function $w$. $W_{i,j} = w(s_i, s_j)$ measures the strength of the edge between $s_i$ and $s_j$. Typically, $w(s_i, s_j)$ is a decreasing function of the distance $d_{\mathcal{S}}(s_i, s_j)$ between the training points $s_i$ and $s_j$. In this work, we use the Gaussian kernel $w(s_i, s_j) = \exp\big(-d_{\mathcal{S}}^2(s_i, s_j)/(2\sigma^2)\big)$, with $\sigma$ estimated as the median of the distances between all the training points [2, 10]. The kernel function has the property of implicitly mapping data points into a high-dimensional space, called the feature space. This space is better suited for the study of nonlinear data. Computing the Diffusion Maps amounts to embedding the data into the
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 721–732, 2009. © Springer-Verlag Berlin Heidelberg 2009
N. Thorstensen, F. Segonne, and R. Keriven
feature space through a mapping $\Psi$. While the mapping from input space to feature space is of primary importance, the reverse mapping from feature space back to input space (the pre-image problem) is also useful. Consider, for example, the use of kernel PCA for pattern denoising. Given some noisy patterns, kernel PCA first applies linear PCA to the mapped patterns in the feature space. It then performs denoising by projecting them onto the subspace defined by the leading eigenvectors. These projections, however, are still in the feature space and have to be mapped back to the input space in order to recover the denoised patterns.
1.1 Related Work
Statistical methods for shape processing are very common in computer vision. A seminal work in this direction was published by Leventon et al. [11], adding statistical knowledge to energy-based segmentation methods. Their method captures the main modes of variation by performing a PCA on the set of shapes. This was extended to nonlinear statistics by Cremers et al. in [12]. The authors introduce nonlinear shape priors by using a probabilistic version of Kernel PCA (KPCA). Dambreville et al. [1] and Arias et al. [2] developed a method for shape denoising based on Kernel PCA, as did Kwok et al. [3] in the context of image denoising. Both methods compute a projection of the noisy datum onto a low-dimensional space. In [13, 4] the authors propose another kernel method for data denoising, the so-called Laplacian Eigenmaps Latent Variable Model (LELVM), a probabilistic method. This model provides a dimensionality reduction and reconstruction mapping based on linear combinations of input samples. LELVM performs well on motion capture data but fails on complex shapes (see Fig. 1). We would also like to mention the work of Pennec [14] and Fletcher [15], who model the manifold of shapes as a Riemannian manifold and the mean of such shapes as a Karcher mean [16].
Their methodology is used in the context of computational anatomy to solve the average template matching problem. Closer to our work is the algorithm proposed by Etyngier et al. [17]. They use Diffusion Maps as a statistical framework for nonlinear shape priors in segmentation, augmenting an energy functional by a shape prior term. Contrary to us, they do not compute a denoised shape but propose an additional force toward a rough estimate of it.
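The kernel choice described in the introduction can be sketched as follows. The function takes a precomputed matrix of pairwise distances $d_{\mathcal{S}}(s_i, s_j)$, since the input space is only assumed to carry a distance. Taking the median over distinct pairs $i < j$ is one reading of "the median of the distances between all the training points". The Euclidean test points are purely illustrative.

```python
import numpy as np

def gaussian_kernel(dist):
    """W_ij = exp(-d(s_i, s_j)^2 / (2 sigma^2)), with sigma = median distance over pairs i < j."""
    iu = np.triu_indices(dist.shape[0], k=1)
    sigma = np.median(dist[iu])
    return np.exp(-dist**2 / (2.0 * sigma**2)), sigma

rng = np.random.default_rng(2)
pts = rng.normal(size=(10, 2))                                      # illustrative samples in R^2
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)   # stand-in for d_S
W, sigma = gaussian_kernel(dist)
```

Because $\sigma$ adapts to the data scale, a pair of samples at exactly the median distance always receives the weight $e^{-1/2}$, regardless of how the data are scaled.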
Fig. 1. Digit images corrupted by additive Gaussian noise (from left to right, σ² = 0.25, 0.45, 0.65, 0.85). The rows represent, from top to bottom: the original digits; the corrupted digits; denoising with [1]; with [1]+[2]; with [3]; with [3]+[2]; with [4]; with our Karcher means based method. See Table 2 for quantified results.
Pre-image as Karcher Mean Using Diffusion Maps
723
1.2 Our Contributions
In this paper, we propose a new method to solve the pre-image problem (see Section 3) in the context of Diffusion Maps for shape and image denoising. We adopt a manifold interpretation and learn the intrinsic structure of a given training set. Our method relies on a geometric interpretation of the problem, which naturally leads to the definition of the pre-image as a Karcher mean [16] that interpolates between neighboring samples according to the diffusion distance. Previous pre-image methods were designed for Kernel PCA. Our motivation for using Diffusion Maps comes from the fact that the computed mapping captures the intrinsic geometry of the underlying manifold independently of the sampling. Therefore, the resulting Nyström extension (see Section 2.2) proves to be more “meaningful” far from the manifold and leads to quantitatively better pre-image estimates, even for very noisy input data. In the case of shape denoising, we compare our results to the method proposed by Dambreville et al. [1], and for image denoising, to several denoising algorithms using Kernel PCA: [3], [2], [4]. Results on 3D shapes and 2D images are presented and demonstrate the superiority of our method. The rest of the paper is organized as follows. Section 2 presents the Diffusion Maps framework and the out-of-sample extension. Section 3 introduces our pre-image methodology. Numerical experiments on real data are reported in Section 4, and Section 5 concludes.
2 Learning a Set of Shapes
Let $\Gamma = \{s_1, \dots, s_p\}$ be $p$ independent random points of an $m$-dimensional manifold M, locally sampled under some density $q_M(s)$ ($m \ll p$). The manifold M is assumed to be a smooth finite-dimensional sub-manifold embedded in a (potentially infinite-dimensional) space S. The density $q_M(s)$ is unknown and might not be uniform. In this work, we consider more general spaces than the traditional Euclidean space $\mathbb{R}^n$ and only assume that the input space S is equipped with a distance $d_S$.

2.1 Diffusion Maps
To extract the meaningful structure present in the training set Γ, classical manifold learning techniques minimize a quadratic distortion measure of the desired coordinates on the data, naturally leading to the eigenfunctions of Laplace-type operators as minimizers [8, 9]. Unfortunately, most unsupervised learning methods generate coordinates (the embedding) that combine the information of both the density $q_M$ and the geometry [9, 10, 18]. Diffusion Maps construct a discrete, density-independent approximation of the Laplace-Beltrami operator $\Delta_M$ defined on M and provide an embedding that captures the intrinsic geometry independently of the sampling density. We briefly review the construction of Diffusion Maps [8]. In a first step, we build a fully connected graph on the set Γ, where each node corresponds to a sample of Γ. Based on the distance $d_S$ between samples, nodes are connected if their mutual distance is less than or equal to σ, with σ being the median distance between all shapes. In order to build the normalized Laplacian matrix, we use the diffusion kernel $w(\cdot, \cdot)$:

$$P_{i,j} = p(s_i, s_j) = \frac{w(s_i, s_j)}{g(s_i)}. \qquad (1)$$
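A minimal sketch of the graph construction behind (1), in Python with NumPy. It assumes the simplest reading of the text: a binary kernel, $w(s_i, s_j) = 1$ if $d_S(s_i, s_j) \le \sigma$ and 0 otherwise, so that $g(s_i)$ is exactly the number of incident edges (the self-loop included). The function name `markov_matrix` is hypothetical, the input is assumed to be a precomputed matrix of pairwise distances $d_S$, and a Gaussian weighting restricted to the same neighborhoods would be a common alternative.

```python
import numpy as np


def markov_matrix(dist):
    """Row-stochastic matrix P of Eq. (1) from a (p, p) matrix of distances d_S.

    Assumption: binary kernel w(s_i, s_j) = 1 if d_S(s_i, s_j) <= sigma, else 0.
    """
    dist = np.asarray(dist, dtype=float)
    # sigma: median distance over all distinct pairs of shapes
    sigma = np.median(dist[np.triu_indices_from(dist, k=1)])
    w = (dist <= sigma).astype(float)   # adjacency matrix, self-loops included
    g = w.sum(axis=1)                   # g(s_i): number of incident edges
    P = w / g[:, None]                  # p(s_i, s_j) = w(s_i, s_j) / g(s_i)
    return P, g, sigma


# Usage: four samples on a line, with d_S(s_i, s_j) = |x_i - x_j|
x = np.array([0.0, 1.0, 2.0, 3.0])
P, g, sigma = markov_matrix(np.abs(x[:, None] - x[None, :]))
print(sigma, g)   # 1.5 [2. 3. 3. 2.]
```

Nothing in this sketch guarantees that the resulting graph is connected; for a disconnected graph the Markov chain would not have a unique stationary distribution.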
724
N. Thorstensen, F. Segonne, and R. Keriven
The diffusion kernel $w(s_i, s_j)$ encodes the probability of transition between $s_i$ and $s_j$, and $g(s_i)$ normalizes the quantity in (1) such that $\sum_j p(s_i, s_j) = 1$. Therefore, $p(s_i, s_j)$ can be seen as the probability for a random walker to jump from $s_i$ to $s_j$, and $P$ encodes a Markov chain on Γ. The function $g(s_i)$ measures the number of edges incident to the node corresponding to the shape $s_i$. If we introduce a time $t$ and denote by $p_t$ the elements of $P^t$ (the $t$-th power of $P$), then $p_t(s_i, s_j)$ corresponds to the probability of transition after $t$ time steps. When $t \to \infty$, the random walk converges to a unique stationary distribution $\phi_0$, which satisfies $\phi_0^T P = \phi_0^T$. Using a well-known fact from spectral theory, Coifman et al. [8] introduce the following eigen-decomposition of the kernel $p_t$:

$$p_t(s_i, s_j) = \sum_l \lambda_l^t \, \psi_l^t(s_i) \, \phi_l^t(s_j), \qquad (2)$$

where $\{\lambda_l^t\}$ is the decreasing eigenspectrum of $P^t$, and $\{\phi_l^t\}$ and $\{\psi_l^t\}$ are, respectively, the corresponding biorthogonal left and right eigenvectors. They verify

$$\phi_0(x)\,\psi_l(x) = \phi_l(x). \qquad (3)$$
Note that, because of the fast-decaying sequence of eigenvalues, only a few terms need to be retained to approximate the probability $p_t(\cdot, \cdot)$ within a certain relative accuracy. The diffusion distance $D_t(s_i, s_j)$ between two points $s_i$ and $s_j$ can then be written as

$$D_t^2(s_i, s_j) = \sum_l \frac{\left(p_t(s_i, s_l) - p_t(s_j, s_l)\right)^2}{\phi_0(s_l)}. \qquad (4)$$
This simple weighted $L^2$ distance between the two conditional probabilities $p_t(s_i, \cdot)$ and $p_t(s_j, \cdot)$ defines a metric on the data that measures the amount of connectivity of the points $s_i$ and $s_j$ along paths of length $t$. To relate the diffusion distance to the eigenvectors, we combine (2) and (4) and use the biorthogonality relation between left and right eigenvectors (cf. [10]), which gives

$$D_t^2(s_i, s_j) = \sum_{l \geq 1} \left(\lambda_l^t \psi_l^t(s_i) - \lambda_l^t \psi_l^t(s_j)\right)^2 \qquad (5)$$
(since $\psi_0$ is a constant vector, it is left out of the sum). Equation (5) shows that the right eigenvectors of $P^t$ can be used to express the diffusion distance. To this end, we introduce the family of Diffusion Maps indexed by a time parameter $t$:

$$\Psi_t(s) = \begin{pmatrix} \lambda_0^t \psi_0^t(s) \\ \lambda_1^t \psi_1^t(s) \\ \vdots \end{pmatrix}.$$

In the sequel, we omit the parameter $t$ and assume it is set to a fixed value [10]. From Equation (5), we can see that Diffusion Maps generate a quasi-isometric mapping, since the diffusion distance is approximately equal to the $L^2$ metric in the new coordinate system when retaining the first $m$ eigenvectors. Also note that methods like LLE or Laplacian Eigenmaps do not provide an explicit metric, which is crucial for the contribution of this paper.
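The following sketch checks numerically that the weighted distance (4) and the Euclidean distance between Diffusion Map coordinates in (5) agree when all nontrivial eigenvectors are kept (with fewer eigenvectors, (5) becomes the truncated approximation mentioned above). Several choices are assumptions not taken from the text: a Gaussian kernel whose bandwidth is the median squared distance, the use of the symmetric matrix $G^{-1/2} W G^{-1/2}$ (with $G$ the diagonal matrix of the degrees $g(s_i)$) to obtain the spectrum of $P$, and the scaling of the right eigenvectors so that $\psi_0$ is constant and $\sum_k \phi_0(s_k)\psi_l(s_k)\psi_{l'}(s_k) = \delta_{ll'}$, the normalization under which (5) follows from (2) and (4). The array `phi0` is the stationary distribution $\phi_0$; all function names are illustrative.

```python
import numpy as np


def gaussian_affinity(X):
    """Assumed kernel: w(s_i, s_j) = exp(-|x_i - x_j|^2 / median squared distance)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    scale = np.median(d2[np.triu_indices(len(X), k=1)])
    return np.exp(-d2 / scale)


def diffusion_eigen(W):
    """Eigenvalues (decreasing) and right eigenvectors psi_l of P = W / g."""
    g = W.sum(axis=1)
    phi0 = g / g.sum()                           # stationary distribution
    dinv = 1.0 / np.sqrt(g)
    A = dinv[:, None] * W * dinv[None, :]        # symmetric, same spectrum as P
    lam, V = np.linalg.eigh(A)
    order = np.argsort(lam)[::-1]                # decreasing eigenspectrum
    lam, V = lam[order], V[:, order]
    psi = np.sqrt(g.sum()) * dinv[:, None] * V   # psi_0 constant, phi0-orthonormal
    return lam, psi, phi0


def diffusion_map(lam, psi, t, m):
    """Coordinates lambda_l^t psi_l(s) for l = 1..m (psi_0 dropped as in Eq. (5))."""
    return psi[:, 1:m + 1] * lam[1:m + 1] ** t


def diffusion_distance_sq(P, phi0, t):
    """Squared diffusion distance of Eq. (4) for all pairs of samples."""
    Pt = np.linalg.matrix_power(P, t)
    diff = Pt[:, None, :] - Pt[None, :, :]
    return (diff ** 2 / phi0).sum(axis=-1)


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
W = gaussian_affinity(X)
P = W / W.sum(axis=1, keepdims=True)
lam, psi, phi0 = diffusion_eigen(W)
emb = diffusion_map(lam, psi, t=2, m=len(X) - 1)
D5 = ((emb[:, None, :] - emb[None, :, :]) ** 2).sum(axis=-1)   # Eq. (5)
D4 = diffusion_distance_sq(P, phi0, t=2)                        # Eq. (4)
```

Working with the symmetric conjugate rather than $P$ itself lets `np.linalg.eigh` return real, orthonormal eigenvectors, from which the biorthogonal left eigenvectors follow through (3).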
2.2 Out-of-Sample Extension
In general, the mapping Ψ, also referred to as an embedding, is only known over the training set. The extension of the mapping to new input points is of primary importance for kernel-based methods, whose success depends crucially on the “accuracy” of the extension. This problem, referred to as the out-of-sample problem, is often solved using the popular Nyström extension method [2, 19, 18]. Instead of recomputing the whole embedding, which can be costly for very large datasets because it involves a spectral decomposition, the problem is solved through a method borrowed from numerical analysis [20]. With this technique in hand, and considering that every training sample verifies

$$\forall s_j \in \Gamma, \; \forall l \in \{1, \dots, p\}: \quad \sum_{s_i \in \Gamma} p(s_j, s_i)\, \psi_l(s_i) = \lambda_l \psi_l(s_j),$$

the embedding of new data points located outside the set Γ can similarly be computed by a smooth extension $\hat{\Psi}$ of Ψ:

$$\hat{\Psi}: S \to \mathbb{R}^p, \quad s \mapsto \left(\hat{\psi}_1(s), \dots, \hat{\psi}_p(s)\right), \qquad \hat{\psi}_l(s) = \frac{1}{\lambda_l} \sum_{y \in \Gamma} p(s, y)\, \psi_l(y), \quad \forall l \in \{1, \dots, p\}. \qquad (6)$$
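A small sketch of the Nyström extension (6). As before, the Gaussian kernel, its bandwidth `sigma`, and the helper names are assumptions made for illustration. The property worth checking is that, for a training sample, the extension reproduces the eigenvector values exactly, since $P\psi_l = \lambda_l \psi_l$ on Γ.

```python
import numpy as np


def kernel(A, B, sigma):
    """Assumed Gaussian kernel between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / sigma ** 2)


def fit(X, sigma, m):
    """Top m + 1 eigenpairs (lambda_l, psi_l) of P = W / g on the training set."""
    W = kernel(X, X, sigma)
    dinv = 1.0 / np.sqrt(W.sum(axis=1))
    lam, V = np.linalg.eigh(dinv[:, None] * W * dinv[None, :])
    top = np.argsort(lam)[::-1][: m + 1]
    return lam[top], dinv[:, None] * V[:, top]   # right eigenvectors of P


def nystrom(S, X, sigma, lam, psi):
    """Eq. (6): psi_hat_l(s) = (1 / lambda_l) * sum_y p(s, y) psi_l(y)."""
    K = kernel(S, X, sigma)
    p = K / K.sum(axis=1, keepdims=True)         # transition probabilities p(s, y)
    return (p @ psi) / lam


rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
lam, psi = fit(X, sigma=1.0, m=3)
new_point = nystrom(np.array([[0.1, -0.2]]), X, 1.0, lam, psi)   # shape (1, 4)
```

Only the leading eigenpairs are kept, since dividing by the small eigenvalues at the tail of the spectrum would amplify noise in the extension.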
Clearly, the extension depends on the data, and recomputing the whole embedding with the new datum would yield a different embedding. In general, however, the approximation works well and is used throughout the literature. In addition, the reverse mapping from the feature space back to the input space is often required: after operations are performed in feature space (these operations necessitate the extension of the mapping), the corresponding data points in input space often need to be estimated. This problem, known as the pre-image problem, is the one addressed in this paper. We now tackle the computation of pre-images using Diffusion Maps.
3 Pre-image as Karcher Means
We push the manifold interpretation further and define the pre-image of $\varphi \in \mathbb{R}^p$ as the point $s = \Psi_{|M}^{-1}(\varphi)$ in the manifold M such that $\Psi(s) = \varphi$. Although Diffusion Maps extract the global geometry of the training set and define a robust notion of proximity, they do not allow estimating the manifold between training samples, i.e., the local geometry of the manifold is not provided. Following [21], we propose to approximate the manifold as the set of Karcher means [16] interpolating between correctly chosen subsets of m + 1 sample points, m being the fixed dimension reduction parameter. It is usually chosen by inspecting the eigenvalues: as mentioned in Section 2.1, only a few eigenvectors are needed to approximate the diffusion distance well, and m is exactly the number of eigenvectors retained. From a dimensionality reduction point of view, this parameter corresponds to the number of degrees of freedom in the data set; it cannot be computed automatically and must therefore be guessed. In [21], these subsets are the Delaunay simplices of an m-dimensional Delaunay triangulation of
the sample points. In practice, this limits m to small values. Here, we simply exploit the Euclidean nature of the feature space: for a given $\varphi$, we choose the interpolating subset as its m + 1 nearest neighbors with respect to the diffusion distance D. We then define the pre-image $s = \Psi_{|M}^{-1}(\varphi)$ as a Karcher mean that minimizes the mean-squared criterion

$$s = \arg\min_{z \in S} \|\Psi(z) - \varphi\|^2. \qquad (7)$$
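Because the Diffusion Map coordinates make the diffusion distance approximately Euclidean, selecting the interpolating subset reduces to a nearest-neighbor query in the feature space. A minimal sketch, assuming the training embeddings are stored row-wise in an array; the function name `interpolating_subset` is hypothetical.

```python
import numpy as np


def interpolating_subset(embedding, phi, m):
    """Indices of the m + 1 training samples whose embeddings are closest to phi.

    embedding: (p, k) array of diffusion coordinates of the training set.
    """
    d2 = ((embedding - phi) ** 2).sum(axis=1)   # approx. squared diffusion distance
    return np.argsort(d2)[: m + 1]


emb = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
idx = interpolating_subset(emb, np.array([0.1, 0.1]), m=2)   # samples 0, 1 and 2
```

The first index returned is the nearest neighbor of φ, which is also a natural starting point for the gradient descent of Section 3.3.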
3.1 Shape Interpolation Using Karcher Means
Given a set of neighboring points $N = \{s_1, \dots, s_{m+1}\}$ (i.e., neighboring for the diffusion distance D), we assume that the manifold M can be locally described (i.e., between neighboring samples) by a set of weighted-mean samples $\{s_\Theta\}$ that verify

$$s_\Theta = \arg\min_{z \in S} \sum_{1 \le i \le m+1} \theta_i \, d_S(z, s_i)^2, \qquad (8)$$

where $d_S$ is the distance in the input space, $\theta_i \ge 0$ and $\sum_{i=1}^{m+1} \theta_i = 1$. The coefficients $\Theta = \{\theta_1, \dots, \theta_{m+1}\}$ are the barycentric coefficients of the point $s_\Theta$ with respect to its neighbors N in S. Proposed by Charpiat et al. [22], this model was shown to give natural shape interpolations, compared to linear approximations. One classical choice of distance is the area of the symmetric difference between the regions bounded by the two shapes:

$$d_{SD}(s_1, s_2) = \frac{1}{2} \int |\chi_{\Omega_1} - \chi_{\Omega_2}|, \qquad (9)$$

where $\chi_{\Omega_i}$ is the characteristic function of the interior of shape $s_i$. This distance was recently advocated by Solem [23] to build geodesic paths between shapes. Its drawback, however, is that it does not yield unique geodesics. We proved this behavior analytically in the context of our method, but in our simulations we did not encounter any problems with the symmetric difference distance. Another definition has been proposed [11, 24, 22], based on the representation of a curve in the plane, or of a surface in 3D space, by its signed distance function. In this context, the distance between two shapes can be defined as the $L^2$-norm or the Sobolev $W^{1,2}$-norm of the difference between their signed distance functions. Recall that $W^{1,2}(\Omega)$ is the space of square-integrable functions over Ω with square-integrable derivatives:

$$d_{L^2}(s_1, s_2)^2 = \|D_{s_1} - D_{s_2}\|^2_{L^2(\Omega, \mathbb{R})}, \qquad (10)$$

$$d_{W^{1,2}}(s_1, s_2)^2 = \|D_{s_1} - D_{s_2}\|^2_{L^2(\Omega, \mathbb{R})} + \|\nabla D_{s_1} - \nabla D_{s_2}\|^2_{L^2(\Omega, \mathbb{R}^n)}, \qquad (11)$$
where $D_{s_i}$ denotes the signed distance function of shape $s_i$ ($i = 1, 2$) and $\nabla D_{s_i}$ its gradient.

3.2 Pre-image and Manifold Interpolation
We propose to define the pre-image of a target point $\varphi$ in the feature space as the point $s_\Theta$ that minimizes the energy $E_\Psi(s_\Theta) = \|\Psi(s_\Theta) - \varphi\|^2$, $s_\Theta$ being expressed as a Karcher
mean for the neighborhood N made of the m + 1 samples of Γ whose embeddings are the m + 1 nearest neighbors of $\varphi$ in the feature space equipped with D:

$$\Psi_{|M}^{-1}(\varphi) = \arg\min_{s_\Theta} \|\Psi(s_\Theta) - \varphi\|^2, \quad \text{where} \quad s_\Theta = \arg\min_{z \in S} \sum_{1 \le i \le m+1} \theta_i \, d_S(z, s_i)^2. \qquad (12)$$
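As an illustration of (8) under the $L^2$ distance between signed distance functions (10): the space of grid functions is then Euclidean, so the minimizer over all functions $z$ is the weighted average $\sum_i \theta_i D_{s_i}$. The sketch below computes it for two concentric discs, where the average happens to be again the signed distance function of a disc with the interpolated radius. This is a toy under stated assumptions: in general the weighted average of signed distance functions is not itself a signed distance function, and the helper name `karcher_mean_l2` is hypothetical.

```python
import numpy as np


def karcher_mean_l2(sdfs, theta):
    """Minimizer of sum_i theta_i ||z - D_i||_{L2}^2 over all grid functions z."""
    theta = np.asarray(theta, dtype=float)
    if np.any(theta < 0) or not np.isclose(theta.sum(), 1.0):
        raise ValueError("theta must be barycentric weights")
    return np.tensordot(theta, np.asarray(sdfs), axes=1)   # sum_i theta_i D_i


x = np.linspace(-2.0, 2.0, 201)
X, Y = np.meshgrid(x, x)
rho = np.hypot(X, Y)
sdf_small, sdf_large = rho - 0.5, rho - 1.0   # discs of radius 0.5 and 1.0
mean = karcher_mean_l2([sdf_small, sdf_large], [0.25, 0.75])
# mean equals rho - 0.875: a disc of radius 0.25 * 0.5 + 0.75 * 1.0
```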
When the input space is some Euclidean space $\mathbb{R}^n$ with its traditional $L^2$-norm, this indeed amounts to assuming that the manifold M is piecewise-linear (i.e., linearly interpolated between neighboring training samples). For shapes, we will see that this yields natural pre-images. By simple extension, we define the projection of any new test sample s onto the manifold M by $\Pi_M(s) = \Psi_{|M}^{-1}(\Psi(s))$.

3.3 Implementation Issues
The pre-image $\Psi_{|M}^{-1}(\varphi)$ is computed by gradient descent. Instead of optimizing over Θ, we use a descent over $s_\Theta$ itself (Equation 13), constraining it to remain a Karcher mean (Equation 8). This boils down to projecting the deformation field $\nabla_s E_\Psi$ onto the tangent space $T_{s_\Theta}M$ of M at point $s_\Theta$. Note that to compute this tangent space, we implicitly assume that the space S has a manifold structure, in particular that the tangent space $T_{s_\Theta}S$ of S at location $s_\Theta$ (i.e., the space of local deformations around $s_\Theta$) is equipped with an inner product that we denote $\langle \cdot | \cdot \rangle_S$. The optimality condition of Equation 8 is:

$$\forall \beta \in T_{s_\Theta}S, \quad \left\langle \sum_{i=1}^{m+1} \theta_i \, d_i \nabla_s d_i \,\middle|\, \beta \right\rangle_S = 0,$$
where we denote $N = \{s_1, \dots, s_{m+1}\}$ and $d_i = d_S(s_\Theta, s_i)$. In order to recover the tangent space $T_{s_\Theta}M$ at $s_\Theta$, one needs to relate the m independent modes of variation of the coefficients Θ (recall that $\sum_{i=1}^{m+1} \theta_i = 1$) to local deformation fields $ds_\Theta \in T_{s_\Theta}S$. A small variation of the barycentric coefficients $\Theta \to \Theta + d\Theta$ corresponds to a small deformation of the sample $s_\Theta \to s_\Theta + ds_\Theta$. Differentiating the optimality condition with respect to Θ and $s_\Theta$ provides the relation between $d\Theta$ and $ds_\Theta$. For example, when the input space is taken to be the Euclidean space, i.e., $S = \mathbb{R}^n$, we obviously obtain $ds_\Theta = \sum_{i=1}^{m+1} d\theta_i \, s_i$. Recalling that $\sum_{i=1}^{m+1} d\theta_i = 0$ and fixing the $d\theta_i$ appropriately, we can recover $T_{s_\Theta}M$. Therefore, we optimize for $s_\Theta$ without explicitly computing Θ. The gradient descent generates a family of samples $s: \tau \in \mathbb{R}^+ \to s(\tau) \in M$ such that

$$s(0) = s_0, \qquad \frac{ds}{d\tau} = -v^M(s_\tau), \qquad (13)$$

with $s_0 \in N$ (in practice, the nearest neighbor of $\varphi$). The velocity field $v^M(s_\tau)$ is the orthogonal projection of the deformation field $\nabla_{s_\tau} E_\Psi = (\Psi(s_\tau) - \varphi)^T \Lambda \Psi^T \nabla_{s_\tau} p_{s_\tau}$ onto the tangent space $T_{s_\tau}M$. Here Λ is a diagonal matrix of eigenvalues and Ψ is the matrix of the corresponding eigenvectors. Note that before projecting onto $T_{s_\tau}M$, we first orthogonalize the basis of the tangent space using Gram-Schmidt. In the case of the $L^2$-norm, the Θ's can be
Fig. 2. Interpolation using Karcher means for 39 three-dimensional sample shapes. From left to right: a) a new shape not in the given sample b) the same shape with an occlusion c) the 3 nearest neighbors of the corrupted shape according to the diffusion distance (in red, green and blue) d) the original shape (in yellow) and our interpolation (in red). See text for quantitative results.
easily recovered. When using a different distance function, such as the symmetric difference or the Sobolev $W^{1,2}$-norm, one additionally needs to solve a system of linear equations at each step of the gradient descent.
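For the Euclidean case $S = \mathbb{R}^n$ discussed above, the tangent space of the local Karcher-mean manifold is spanned by the differences $s_i - s_1$, so the constrained descent (13) can be sketched in a few lines. This is a sketch under stated assumptions, not the authors' implementation: the embedding Ψ is replaced by a hypothetical linear map `A` so that $\nabla E$ is explicit, Gram-Schmidt is carried out through a QR factorization, and the constraint $\theta_i \ge 0$ is not enforced. The helper names are hypothetical.

```python
import numpy as np


def tangent_basis(neighbors):
    """Orthonormal basis of span{s_i - s_1}: the tangent space of the affine hull."""
    diffs = (neighbors[1:] - neighbors[0]).T    # (n, m) columns s_i - s_1
    Q, _ = np.linalg.qr(diffs)                  # Gram-Schmidt via QR
    return Q


def projected_descent(neighbors, A, phi, iters=2000):
    """Minimize E(s) = ||A s - phi||^2 over the affine hull of the neighbors.

    A is a hypothetical linear stand-in for the embedding Psi.
    """
    B = tangent_basis(neighbors)
    step = 0.5 / np.linalg.norm(A, 2) ** 2      # stable step for this quadratic E
    s = neighbors[0].astype(float)              # s_0: nearest neighbor of phi
    for _ in range(iters):
        grad = 2.0 * A.T @ (A @ s - phi)        # deformation field grad_s E
        v = B @ (B.T @ grad)                    # orthogonal projection onto tangent space
        s = s - step * v
    return s


def barycentric(neighbors, s):
    """Recover Theta (summing to 1) such that sum_i theta_i s_i = s."""
    M = np.vstack([neighbors.T, np.ones(len(neighbors))])
    theta, *_ = np.linalg.lstsq(M, np.append(s, 1.0), rcond=None)
    return theta


rng = np.random.default_rng(0)
nb = rng.normal(size=(3, 5))                    # m + 1 = 3 neighbors in R^5
A = np.eye(5) + 0.1 * rng.normal(size=(5, 5))
phi = rng.normal(size=5)
s_star = projected_descent(nb, A, phi)
theta = barycentric(nb, s_star)
```

Because every update lies in the span of $B$, the iterate never leaves the affine hull of the neighbors, which is what the projection is meant to guarantee; the barycentric coefficients are only recovered at the end, mirroring the remark that Θ need not be computed during the descent.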
4 Results
In order to validate the proposed method, we run several experiments on real and synthetic data. First, we test the Karcher mean interpolation on the reconstruction problem of occluded 3D medical shapes [1]. In a second experiment, we validate the purpose of projecting the gradient onto the tangent space. Finally, a third experiment demonstrates the superiority of our method on a standard image denoising problem.

4.1 Remaining on the Manifold
To validate both the Karcher means modeling of the manifold and our projection constraint (Section 3.3), we generate a set of 200 synthetic shapes parameterized by an articulation angle and a scaling parameter (Fig. 3a). The corresponding embeddings are shown in Fig. 3b. Choosing two distant shapes A and B, we compute a path $s(\tau)$ from A to B by means of a gradient descent starting from $s(0) = A$ and minimizing $d_S(s(\tau), B)$. Figs. 3c and 3b show in red the intermediate shapes and the corresponding embeddings. The same path, computed while projecting the gradient in order to remain on the manifold, is shown in purple. Observe how, in that case, the intermediate shapes look more like the original samples. Note also that when remaining on M, the interpolating path is almost a straight line with respect to the diffusion distance.

4.2 Projection and Manifold as Karcher Means
Here we test the validity of using Karcher means as a manifold interpolation model. We consider the space of two-dimensional surfaces embedded in $\mathbb{R}^3$. For such a general space, many different definitions of the distance between two shapes have been proposed in the computer vision literature, but there is no agreement on the correct way to measure shape similarity. In this work, we represent a surface $s_i$ in the Euclidean embedding space $\mathbb{R}^3$ by its signed distance function $D_{s_i}$. In this context, we define the distance between two shapes to be the $L^2$-norm of the difference between their signed distance functions [11]:

$$d_S(s_1, s_2)^2 = \|D_{s_1} - D_{s_2}\|^2_{L^2}$$
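The shape distances (9) to (11), including the choice $d_S = d_{L^2}$ made here, can be approximated on a pixel grid. The sketch below is a 2D illustration rather than the 3D setting of the experiment: it assumes shapes given by signed distance functions sampled on a uniform grid with spacing `h` (negative inside), approximates integrals by Riemann sums and gradients by central differences via `np.gradient`. The function name `shape_distances` is hypothetical. For two concentric discs the signed distance functions differ by a constant, which gives an easy sanity check.

```python
import numpy as np


def shape_distances(D1, D2, h):
    """Grid approximations of d_SD (9), d_L2 (10) and d_W12 (11) for 2D shapes.

    D1, D2: signed distance functions on a uniform grid with spacing h,
    negative inside the shape (assumed sign convention).
    """
    cell = h * h                                             # area of one pixel
    chi1, chi2 = (D1 < 0).astype(float), (D2 < 0).astype(float)
    d_sd = 0.5 * np.abs(chi1 - chi2).sum() * cell            # Eq. (9)
    diff = D1 - D2
    l2_sq = (diff ** 2).sum() * cell                         # Eq. (10), squared
    gy, gx = np.gradient(diff, h)                            # d/d(rows), d/d(columns)
    w12_sq = l2_sq + (gx ** 2 + gy ** 2).sum() * cell        # Eq. (11), squared
    return d_sd, np.sqrt(l2_sq), np.sqrt(w12_sq)


x = np.linspace(-2.0, 2.0, 401)
h = x[1] - x[0]
X, Y = np.meshgrid(x, x)
rho = np.hypot(X, Y)
d_sd, d_l2, d_w12 = shape_distances(rho - 0.5, rho - 1.0, h)
```

Here $d_{L^2}$ equals the constant offset 0.5 times the square root of the domain area, the gradient term of $d_{W^{1,2}}$ vanishes, and $d_{SD}$ approaches half the annulus area $\pi(1 - 0.25)/2$ as the grid is refined.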
Table 1. Average reconstruction error for a set of 9 noisy shapes

Method                      Avg. error
Shapes with occlusion       4.67
Nearest neighbors (NN)      1.81
Mean of NN                  1.96
[1]                         1.1
Our method                  0.58
Fig. 3. Synthetic sample of 200 articulated and elongated shapes. From left to right: (a) a subset of the sample; (b) triangulated 2-dimensional embedding computed using Diffusion Maps, and a gradient descent from an initial shape to a target one, without (red dots) and with (purple dots) remaining on the interpolated manifold; (c) some shapes of the resulting evolution (left column: without projection; right column: with projection).
Note that, in order to define a distance between shapes that is invariant to rigid displacements (e.g., rotations and translations), we first align the shapes using their principal moments before computing distances. Note also that the proposed method is obviously not limited to a specific choice of distance [22, 17]. We use a dataset of 39 ventricle nuclei extracted from magnetic resonance images (MRI). We learn a random subset of 30 shapes and corrupt the nine remaining shapes by an occlusion (Fig. 2a,b). In order to recover the original shapes, we project the corrupted shapes onto the shape manifold with our method. We then compare the reconstruction results with the nearest neighbor, the mean of the m + 1 nearest neighbors, and the method of Dambreville et al. [1]. The parameter for this experiment is m = 2. Fig. 2d shows one example of a reconstructed shape (red), obtained from the m + 1 nearest neighbors of the corrupted shape $s_\bullet$ (Fig. 2c). In order to quantitatively evaluate the projection, we define the reconstruction error as $e(s) = d_S(s_\circ, s)/\sigma$, where $s_\circ$ is the original shape and s is the reconstructed shape. The occluded shape has an error of $e(s_\bullet) = 4.35$, while the nearest neighbor has an error of 1.81. Table 1 shows that our method is superior to the one proposed by Dambreville et al. [1].

4.3 Application: Denoising of Digits
To test the performance of our approach on the task of image denoising, we apply the algorithm to the USPS dataset of handwritten digits¹. In a first experiment, we compare
¹ The USPS dataset is available from http://www.kernel-machines.org.
Table 2. Average PSNR (in dB) of the denoised images corrupted by different noise levels σ². Training sets consist of 60 samples (first four rows) and 200 samples (last four rows).

Training set   σ²     [1]    [3]     [2]+[1]   [2]+[3]   [4]     Our method
60             0.25   8.50   15.71   10.17     16.18     14.01   17.71
60             0.45   9.05   13.87    9.98     15.42     13.91   17.52
60             0.65   9.78   13.10    9.58     13.60     13.89   17.38
60             0.85   9.06   12.58    8.61     13.91     13.87   17.32
200            0.25   9.35   16.08   11.97     16.21     15.27   17.95
200            0.45   9.64   15.70   10.18     15.98     14.85   17.85
200            0.65   9.41   13.97   10.26     15.85     14.13   17.79
200            0.85   9.24   13.06   10.25     15.07     14.07   17.75
our method to five state-of-the-art algorithms: [1], [1]+[2], [3], [3]+[2], and [4]. For each of the ten digits, we form two training sets composed of randomly selected samples (60 and 200, respectively). The test set is composed of 40 randomly selected images corrupted by additive Gaussian noise at different noise levels. Denoising then simply amounts to estimating the pre-images of the feature vectors given by the Nyström extension of the noisy samples. For all methods, we take m = 8 for the reduced dimension (the number of eigenvectors for the kernel-PCA-based methods). Table 2 shows a quantitative comparison based on the peak signal-to-noise ratio (PSNR). Our method outperforms the other approaches both visually (Fig. 1) and quantitatively. Interestingly, it is less sensitive to noise than the other methods and yields good results even under heavy noise.
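The noise model and the evaluation metric of this experiment can be sketched as follows. The peak value in the PSNR is an assumption (images normalized to a maximum intensity of 1), since the text does not state the intensity range, and the helper names are hypothetical.

```python
import numpy as np


def add_gaussian_noise(img, variance, rng):
    """Additive Gaussian noise with the given variance (sigma^2 in Table 2)."""
    return img + rng.normal(0.0, np.sqrt(variance), size=img.shape)


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB; `peak` is the assumed maximum intensity."""
    mse = np.mean((np.asarray(reference, float) - np.asarray(estimate, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)


rng = np.random.default_rng(0)
clean = np.zeros((200, 200))
noisy = add_gaussian_noise(clean, 0.25, rng)
print(round(psnr(np.zeros(16), np.full(16, 0.1)), 6))   # 20.0
```

With a uniform error of 0.1 and a peak of 1, the mean squared error is 0.01 and the PSNR is exactly 20 dB, which is a convenient check of the formula.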
5 Conclusions and Future Work
In this paper, we focused on the pre-image problem and provided a solution to it using Diffusion Maps. Following a manifold interpretation of the training set, we define the pre-image as a Karcher mean interpolation between neighboring samples with respect to the diffusion distance. Results on real-world data, such as 3D shapes and noisy 2D images, demonstrate the superiority of our approach. Several ideas may be explored in the continuation of this work. In the perspective of working on complex shape spaces, our projection operator, defined from a manifold point of view, could be used in different tasks, such as segmentation with shape priors, interpolation and reconstruction of shapes, and manifold denoising. Interestingly, our approach is able to deal with manifolds of complex topology, a property that can be useful in the context of manifold denoising. So far, none of the pre-image methods have been tested when the training data itself contains heavy noise. We are currently investigating these directions.
References
1. Dambreville, S., Rathi, Y., Tannenbaum, A.: Statistical shape analysis using kernel PCA. In: IS&T/SPIE Symposium on Electronic Imaging (2006)
2. Arias, P., Randall, G., Sapiro, G.: Connecting the out-of-sample and pre-image problems in kernel methods. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 18-23 (2007)
3. Kwok, J.T., Tsang, I.W.: The pre-image problem in kernel methods. IEEE Transactions on Neural Networks 15(6), 1517–1525 (2004)
4. Carreira-Perpiñán, M.A., Lu, Z.: The Laplacian Eigenmaps Latent Variable Model. JMLR W&CP 2, 59–66 (2007)
5. Roweis, S., Saul, L.: Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323–2326 (2000)
6. Tenenbaum, J.B., de Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290(5500), 2319–2323 (2000)
7. Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation 15(6), 1373–1396 (2003)
8. Coifman, R., Lafon, S., Lee, A., Maggioni, M., Nadler, B., Warner, F., Zucker, S.: Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. PNAS 102(21), 7426–7431 (2005)
9. Hein, M., Audibert, J.Y., von Luxburg, U.: From graphs to manifolds - weak and strong pointwise consistency of graph Laplacians. Journal of Machine Learning Research, ArXiv Preprint (forthcoming) (2006)
10. Lafon, S., Keller, Y., Coifman, R.R.: Data fusion and multicue data matching by diffusion maps. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(11), 1784–1797 (2006)
11. Leventon, M., Grimson, E., Faugeras, O.: Statistical shape influence in geodesic active contours. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 316–323 (2000)
12. Cremers, D., Kohlberger, T., Schnörr, C.: Nonlinear shape statistics in Mumford-Shah based segmentation. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2351, pp. 93–108. Springer, Heidelberg (2002)
13. Lu, Z., Carreira-Perpiñán, M., Sminchisescu, C.: People tracking with the Laplacian Eigenmaps Latent Variable Model. In: Platt, J., Koller, D., Singer, Y., Roweis, S. (eds.) Advances in Neural Information Processing Systems, vol. 20, pp. 1705–1712. MIT Press, Cambridge (2008)
14. Pennec, X.: Intrinsic statistics on Riemannian manifolds: Basic tools for geometric measurements. Journal of Mathematical Imaging and Vision 25(1), 127–154 (2006); a preliminary version appeared as INRIA RR-5093 (January 2004)
15. Davis, B., Fletcher, P., Bullitt, E., Joshi, S.: Population shape regression from random design data. In: ICCV, vol. 1 (2007)
16. Karcher, H.: Riemannian center of mass and mollifier smoothing. Comm. Pure Appl. Math. 30, 509–541 (1977)
17. Etyngier, P., Segonne, F., Keriven, R.: Shape priors using manifold learning techniques. In: 11th IEEE International Conference on Computer Vision, Rio de Janeiro, Brazil (October 2007)
18. Lafon, S., Lee, A.B.: Diffusion maps and coarse-graining: a unified framework for dimensionality reduction, graph partitioning, and data set parameterization. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(9), 1393–1403 (2006)
19. Bengio, Y., Paiement, J.F., Vincent, P., Delalleau, O., Le Roux, N., Ouimet, M.: Out-of-sample extensions for LLE, Isomap, MDS, Eigenmaps, and spectral clustering. In: Thrun, S., Saul, L.K., Schölkopf, B. (eds.) Advances in Neural Information Processing Systems, vol. 16. MIT Press, Cambridge (2004)
20. Baker, C.T.H.: Numerical analysis of Volterra functional and integral equations. In: Duff, I.S., Watson, G.A. (eds.) The state of the art in numerical analysis, pp. 193–222. University Press (1996)
21. Etyngier, P., Keriven, R., Segonne, F.: Projection onto a shape manifold for image segmentation with prior. In: 14th IEEE International Conference on Image Processing, San Antonio, Texas, US (September 2007)
22. Charpiat, G., Faugeras, O., Keriven, R.: Approximations of shape metrics and application to shape warping and empirical shape statistics. Foundations of Computational Mathematics 5(1), 1–58 (2005)
23. Solem, J.: Geodesic curves for analysis of continuous implicit shapes. In: International Conference on Pattern Recognition, vol. 1, pp. 43–46 (2006)
24. Rousson, M., Paragios, N.: Shape priors for level set representations. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2351, pp. 78–92. Springer, Heidelberg (2002)
Fast Shape from Shading for Phong-Type Surfaces
Oliver Vogel, Michael Breuß, Thomas Leichtweis, and Joachim Weickert
Mathematical Image Analysis Group, Faculty of Mathematics and Computer Science, Building E1.1, Saarland University, 66041 Saarbrücken, Germany
{vogel,breuss,leichtweis,weickert}@mia.uni-saarland.de

Abstract. Shape from Shading (SfS) is one of the oldest problems in image analysis that is modelled by partial differential equations (PDEs). The goal of SfS is to compute, from a single 2-D image, a reconstruction of the depicted 3-D scene. To this end, the brightness variation in the image and the knowledge of illumination conditions are used. While the quality of models has reached maturity, there is still a need for efficient numerical methods that make it possible to compute sophisticated SfS processes for large images in reasonable time. In this paper we address this problem. We consider a so-called Fast Marching (FM) scheme, which is one of the most efficient numerical approaches available. However, the FM scheme is not trivial to use for modern non-linear SfS models. We show how this is done for a recent SfS model incorporating the non-Lambertian reflectance model of Phong. Numerical experiments demonstrate that, without compromising quality, our FM scheme is two orders of magnitude faster than standard methods.
1 Introduction
Given a single 2-D image, the aim of Shape from Shading (SfS) is to infer the 3-D depth of the surface of depicted objects. For this, SfS uses the brightness variation in the image together with information on the intensity and position of the light source. Much progress has been achieved in recent years in modelling SfS. As proper model components have been identified, SfS is now considered to be a well-posed problem. In recent model extensions, non-Lambertian surfaces are also taken into account within this well-posed framework. Thus, SfS has reached a reasonable level of maturity. However, these advances on the modelling side also lead to new challenges for numerical methods in this field. In order to obtain 3-D reconstructions of good quality, it is recommended to use modern, highly non-linear SfS models together with large, high-resolution input images. Thus, a proper algorithm must be able to deal with the arising large non-linear problems in reasonable computing time. In this paper, we show how to use a Fast Marching (FM) scheme for this purpose. It turns out that this is not trivial because of the involved non-linearities.

Brief history of SfS models. The SfS problem is a classic problem in computer vision. It was introduced in the works of Horn [1]. In particular, his model
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 733–744, 2009. © Springer-Verlag Berlin Heidelberg 2009
734
O. Vogel et al.
assumptions of an orthographic camera and Lambertian surface reflectance became a standard for early SfS research; see the review article [2]. However, the authors of [2] also concluded that orthographic SfS models do not perform well on synthetic data, and even worse on real-world images. In recent years, sophisticated models employing a more realistic perspective projection have been developed [3,4,5]. In [5], it was shown that the perspective camera model, together with a point light source at the optical centre of the camera and a non-linear light attenuation term, leads to the well-posedness of the SfS task. Recently, this class of perspective SfS models has been extended to also cover non-Lambertian surface reflectance. In [6], the Lambertian diffuse reflection has been substituted by the model of Oren and Nayar [7] for the purpose of facial recognition. Another approach has been introduced in [8], where the reflectance model of Phong [9], well known from computer graphics, is used.

The Fast Marching method. The SfS models of interest lead to boundary value problems for a class of non-linear hyperbolic partial differential equations (PDEs) called Hamilton-Jacobi equations. The Fast Marching (FM) method is an efficient technique for solving such problems. It was introduced by Tsitsiklis [10] and further developed by Sethian [11].

Our contribution. We show how to use the FM technique for the highly non-linear, perspective SfS model given in [8], which in particular incorporates light attenuation and the non-Lambertian reflectance model of Phong. Specifically, we address the following issues. We consider the problem of computing an initial guess of the depth at the surface points with minimal distance to the camera. This estimation is non-trivial for highly non-linear models such as the one we use. For this estimate, a suitable set of corresponding image points needs to be identified in advance.
In order to realise the scheme, one also needs to perform a fixed-point iteration at each discretisation point, for which we give a well-working scheme here. Having solved these problems, we compare the method with other schemes in the Lambertian case, confirming that our FM scheme is two orders of magnitude faster without a trade-off in accuracy. We then apply the FM scheme directly to Phong-type non-Lambertian SfS of objects in real-world images taken with a standard digital camera. We show that our FM scheme delivers high-quality results in just a few seconds of computing time, while the method from [8] that we compare with takes hours to compute comparable results.

Relation to previous work. It is well known that FM schemes may outperform other discretisation methods for the class of problems we are interested in, provided it is possible to construct such a scheme. The potential usefulness of FM schemes has also been noticed by other authors in the field of SfS. The first to apply FM to the SfS problem was Sethian [11]. The model he considered was the classic orthographic Lambertian model with a single far light source. Later, Kimmel and Sethian used FM for the same set-up but with an oblique light source [12]. In [13], Yuen et al. apply FM to a Lambertian model incorporating a perspective projection. Let us note that this model is formulated
Fast Shape from Shading for Phong-Type Surfaces
in terms of unknown surface normals (in contrast to the unknown depth as in [5,3]), and it does not include a light attenuation term. Tankus et al. apply FM to a perspective Lambertian model that also does not incorporate light attenuation [14]. In [15], Prados and Soatto develop an FM approach based on ideas from optimal control theory. However, while they claim that their approach holds for perspective SfS with Lambertian reflectance, they only show computational results of their scheme for the classic orthographic model also considered in [11]. The paper [16] extends the work [13], addressing problems of the authors' previous method with strong gradients arising from occluded regions. As it is important in the context of this paper, let us stress that up to now light attenuation has not been consistently taken into account in FM schemes, and that non-Lambertian reflectance models have not been considered within an FM scheme at all. Note that it is precisely the terms corresponding to these model assumptions that yield strongly non-linear contributions.

Paper organisation. After briefly introducing the Phong-type SfS model in Section 2, we describe its discretisation in detail in Section 3, making use of the FM method. Section 4 elaborates on the choice of points at which the initial guess of the 3-D depth is computed. Sections 5 and 6 are devoted to the experimental evaluation and a conclusion, respectively.
2 The Perspective SfS Model with Phong-Type Reflectance
The SfS model considered in this paper was introduced in [8]; we briefly review it here.

The Phong reflection model. It is natural to begin the presentation of the SfS model with the brightness equation due to Phong [9]. Assuming a single light source, it reads

$$ I = k_a I_a + \frac{1}{r^2}\left( k_d I_d \cos\phi + k_s I_s (\cos\theta)^{\alpha} \right) \qquad (1) $$
where I := I(x) is the normalised grey value of the image pixel located at $x = (x_1, x_2)^T \in \mathbb{R}^2$, and r = uf is the distance of the surface point from the light source. In (1), the intensities of the ambient, diffuse, and specular components of light are denoted by Ia, Id and Is, respectively. Analogously, the constants ka, kd and ks with ka + kd + ks ≤ 1 denote the ratios of ambient, diffuse, and specular reflection. Regarding the individual contributions, the ambient light models a base intensity in the depicted scene, i.e. a basic illumination present everywhere. The light reflected diffusely in each direction is proportional to the cosine of the angle φ between the surface normal and the light source direction; in our scenario, the latter is identical to the direction of the optical centre. The amount of specular light reflected in this direction is proportional to (cos θ)^α, where θ is the angle between the ideal (mirror) reflection direction of the incoming light and
O. Vogel et al.
the optical centre. The number α is a constant depending on the roughness of the material; an ideal mirror reflection corresponds to α → ∞. Note also that the cosine in the specular term is set to zero if it is negative.

The SfS model. Plugging appropriate expressions into the brightness equation (1) yields a non-linear Hamilton-Jacobi equation; for details of the derivation see [8]. One (usual) important model assumption not mentioned so far is the visibility of the surface: the surface lies in front of the optical centre, so that the unknown 3-D depth u is strictly positive. Employing the change of variables v := v(x) = ln(u(x)), the resulting model is given by

$$ J(x)\,M(x) - k_d I_d \exp(-2v) - \frac{M(x)}{Q(x)}\, k_s I_s \exp(-2v) \left( \frac{2\,Q(x)^2}{M(x)^2} - 1 \right)^{\alpha} = 0, \qquad (2) $$

where $J(x) = (I - k_a I_a)\, f^2 / Q(x)$, $M(x) = \sqrt{f^2 |\nabla v|^2 + (\nabla v \cdot x)^2 + Q(x)^2}$, and $Q(x) = f / \sqrt{|x|^2 + f^2}$. In this description, |·| is the Euclidean vector norm and f is the focal length, i.e. the distance between the optical centre of the camera and the retinal plane. The terms in (2) appear in the same order as the corresponding terms of the brightness equation (1). Note that $\nabla v = (\partial v/\partial x_1, \partial v/\partial x_2)^T =: (v_{x_1}, v_{x_2})^T$ contains first-order spatial derivatives, so the given model is a first-order PDE. It needs to be supplemented by boundary conditions; for details see the section on experiments. The expressions in (2) are also the basis for our numerical implementation of the FM scheme.
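The following Python sketch evaluates (1) and the left-hand side of (2) at a single point and checks that the two agree. It rests on our reading of (2): M carries a square root, so that cos φ = Q/M, and the bracket 2Q²/M² − 1 equals cos θ = cos 2φ, because light source and camera coincide and the mirror direction therefore deviates by 2φ from the viewing direction. The function names and sample values are hypothetical, not the authors' code.

```python
import math

def q_and_m(grad_v, x, f):
    """Q(x) and M(x) as defined below equation (2)."""
    (vx1, vx2), (x1, x2) = grad_v, x
    Q = f / math.sqrt(x1 ** 2 + x2 ** 2 + f ** 2)
    M = math.sqrt(f ** 2 * (vx1 ** 2 + vx2 ** 2) + (vx1 * x1 + vx2 * x2) ** 2 + Q ** 2)
    return Q, M

def phong_brightness(u, f, cos_phi, cos_theta, alpha, ka, kd, ks, Ia, Id, Is):
    """Brightness equation (1) with r = u * f; a negative specular cosine is set to 0."""
    r = u * f
    return ka * Ia + (kd * Id * cos_phi + ks * Is * max(cos_theta, 0.0) ** alpha) / r ** 2

def sfs_residual(v, grad_v, x, I, f, alpha, ka, kd, ks, Ia, Id, Is):
    """Left-hand side of equation (2) at one image point, with v = ln(u)."""
    Q, M = q_and_m(grad_v, x, f)
    J = (I - ka * Ia) * f ** 2 / Q
    e = math.exp(-2.0 * v)
    spec = max(2.0 * Q ** 2 / M ** 2 - 1.0, 0.0) ** alpha  # (cos theta)^alpha
    return J * M - kd * Id * e - (M / Q) * ks * Is * e * spec

# Consistency check with hypothetical values: render I with (1), then (2) ~ 0.
f, u, x, grad_v = 1.5, 2.0, (0.3, -0.2), (0.1, 0.05)
params = dict(alpha=5, ka=0.1, kd=0.6, ks=0.3, Ia=1.0, Id=1.0, Is=1.0)
Q, M = q_and_m(grad_v, x, f)
cos_phi = Q / M                      # light source at the optical centre
I = phong_brightness(u, f, cos_phi, 2 * cos_phi ** 2 - 1, **params)
print(sfs_residual(math.log(u), grad_v, x, I, f, **params))  # close to 0
```

The residual vanishes up to rounding, which is a quick way to check a reconstruction of (2) against (1).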
3 Discretisation and Fast Marching Implementation
It is important to discretise the spatial derivatives correctly: for hyperbolic PDEs such as the present Hamilton-Jacobi equation, it is well known that simply using central differences leads to a blow-up of numerical solutions. To ensure the stability of our algorithm as well as reasonable theoretical properties, we therefore employ an upwind method as in [5,4,8].

Spatial Discretisation. We use the following conventions:
– $v_{i,j}$ denotes the approximation of $v(ih_1, jh_2)$, where
– i and j are the coordinates of the pixel (i, j) in $x_1$- and $x_2$-direction, respectively, and
– $h_1$ and $h_2$ are the corresponding mesh widths in our pixel grid.

Then the spatial derivatives are discretised as

$$ v_{x_1}(ih_1, jh_2) \approx h_1^{-1} \min\left(0,\; v_{i+1,j} - v_{i,j},\; v_{i-1,j} - v_{i,j}\right), \qquad (3) $$

$$ v_{x_2}(ih_1, jh_2) \approx h_2^{-1} \min\left(0,\; v_{i,j+1} - v_{i,j},\; v_{i,j-1} - v_{i,j}\right). \qquad (4) $$
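A direct transcription of (3) and (4) for an interior pixel might look as follows; boundary handling is omitted, and the name upwind_derivatives is hypothetical. Each min returns 0 unless a neighbour is smaller than the centre value, in which case it selects the difference towards the smaller neighbour, i.e. the upwind side.

```python
def upwind_derivatives(v, i, j, h1=1.0, h2=1.0):
    """Upwind approximations (3) and (4) of v_x1 and v_x2 at interior pixel (i, j).

    v is a list of rows with v[i][j] approximating v(i*h1, j*h2).
    """
    vx1 = min(0.0, v[i + 1][j] - v[i][j], v[i - 1][j] - v[i][j]) / h1
    vx2 = min(0.0, v[i][j + 1] - v[i][j], v[i][j - 1] - v[i][j]) / h2
    return vx1, vx2

grid = [[0.0, 1.0, 2.0],
        [3.0, 4.0, 5.0],
        [6.0, 7.0, 8.0]]
print(upwind_derivatives(grid, 1, 1))          # (-3.0, -1.0)
print(upwind_derivatives(grid, 1, 1, h1=0.5))  # (-6.0, -1.0)
```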
Terms like Q, I and exp(−2v) can be evaluated pointwise at (i, j), which completes the spatial discretisation of (2). We refrain from writing down the complete discrete scheme, as it is cumbersome and gives no further insight.

Fast Marching. Let us now turn to the FM method. We only sketch the idea behind it here, as extensive descriptions are available in the literature; see especially [11]. The basic principle of the FM scheme in the SfS setting is to advance a front monotonically from the foreground of the depicted object to the background. Each pixel carries one of the labels 'known', 'trial' and 'far', which describe the status of its 3-D depth. In the beginning, all pixels are labelled 'far' and their depth values are set to infinity. Since the FM method propagates information from the foreground to the background, it relies on correct depth values being supplied at the pixel that lies furthest in the foreground, i.e. the pixel with minimum depth. For complex images consisting of multiple segments, the correct depth at the point of minimum depth must be supplied for each segment. These points are called singular points. They are marked as 'trial', which concludes the initialisation of the method. FM methods for SfS commonly just require these data to be provided, whereas other methods like [5] do not require any given initial depth data. We therefore aim at estimating the locations of singular points, and their depths, as precisely as possible, so as to obtain an FM-based SfS method that does not rely on any provided depth information. This estimation is the subject of the next section. During the marching, the 'trial' candidate with the smallest computed depth is marked as 'known', and the 3-D depth computed there is taken as the final estimate.
The neighbours of the newly known pixel (with respect to the stencil) are then labelled 'trial', and their depths are updated. This process is repeated until all image pixels are labelled 'known'.

Fixed-Point Iteration. Updating the depth at a 'trial' point consists of solving the discrete form of (2) for v at this point. In contrast to other SfS techniques using FM, we need to solve a non-linear equation. This is not trivial in our case, since near the solution the derivative of the left-hand side of (2) with respect to v is very small, which makes standard solvers like Newton's method diverge in most cases. To avoid this, we employ the Regula Falsi: starting with two values $v_1 < v_2$ such that the left-hand side L of (2) is negative at $v_1$ and positive at $v_2$, one chooses

$$ v_3 := \frac{v_1\, L(v_2) - v_2\, L(v_1)}{L(v_2) - L(v_1)}, \qquad (5) $$

which lies between $v_1$ and $v_2$. If $L(v_3)$ is negative, set $v_1 := v_3$, otherwise set $v_2 := v_3$. Repeating this until $v_1$ and $v_2$ are very close together yields an estimate for the solution of (2) at this pixel. Note that computing the derivatives involves computing a minimum (cf. (3) and (4)). Depending on $v_1$, $v_2$ and $v_3$, these minima might change
within the estimation process. Thus, it is necessary to update the values of v1 , v2 , v3 during the process.
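The FM loop and the Regula Falsi update described in this section can be sketched as follows. To keep the sketch short and checkable, it solves the eikonal equation |∇T| = 1 instead of (2): the upwind residual eikonal_residual plays the role of L, and its max(·, 0) terms play the role of the minima in (3)–(4), which change as the iterate moves. Besides the bracket-width criterion from the text, the loop also stops on a small residual, because plain Regula Falsi can keep one endpoint fixed. All names are hypothetical; this is not the authors' implementation.

```python
import heapq
import math

def regula_falsi(L, v1, v2, tol=1e-10, max_iter=200):
    """Root of L in [v1, v2] by Regula Falsi, assuming L(v1) < 0 < L(v2)."""
    L1, L2 = L(v1), L(v2)
    if not (L1 < 0.0 < L2):
        raise ValueError("need L(v1) < 0 < L(v2)")
    v3 = v1
    for _ in range(max_iter):
        v3 = (v1 * L2 - v2 * L1) / (L2 - L1)  # equation (5); lies in [v1, v2]
        L3 = L(v3)
        if L3 < 0.0:
            v1, L1 = v3, L3
        else:
            v2, L2 = v3, L3
        # One endpoint may stay fixed, so also stop on a small residual.
        if abs(L3) < tol or v2 - v1 < tol:
            break
    return v3

def eikonal_residual(t, a, b, h):
    """Upwind residual of |grad T| = 1; a, b: smallest known neighbour values."""
    return max(t - a, 0.0) ** 2 + max(t - b, 0.0) ** 2 - h * h

def fast_march(shape, seeds, h=1.0):
    """Fast marching with 'known'/'trial'/'far' labels on an n1 x n2 grid."""
    n1, n2 = shape
    T = [[math.inf] * n2 for _ in range(n1)]
    label = [["far"] * n2 for _ in range(n1)]
    heap = []
    for (i, j), value in seeds.items():  # singular points start as 'trial'
        T[i][j], label[i][j] = value, "trial"
        heapq.heappush(heap, (value, i, j))

    def known_min(i, j, di, dj):
        # smallest 'known' value among the two neighbours along one axis
        best = math.inf
        for s in (-1, 1):
            p, q = i + s * di, j + s * dj
            if 0 <= p < n1 and 0 <= q < n2 and label[p][q] == "known":
                best = min(best, T[p][q])
        return best

    while heap:
        t, i, j = heapq.heappop(heap)
        if label[i][j] == "known" or t > T[i][j]:
            continue                      # outdated heap entry
        label[i][j] = "known"             # smallest trial value becomes final
        for p, q in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if not (0 <= p < n1 and 0 <= q < n2) or label[p][q] == "known":
                continue
            a, b = known_min(p, q, 1, 0), known_min(p, q, 0, 1)
            lo = min(a, b)
            new = regula_falsi(lambda t_: eikonal_residual(t_, a, b, h), lo, lo + 2.0 * h)
            if new < T[p][q]:
                T[p][q], label[p][q] = new, "trial"
                heapq.heappush(heap, (new, p, q))
    return T
```

For a single seed of value 0 in the centre of a 3 × 3 grid, the edge neighbours receive T ≈ 1 and the corners T ≈ 1 + 1/√2 ≈ 1.7071, the standard first-order upwind values.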
4 Estimating the Initial Depth
The FM methods for SfS rely on the knowledge of ground truth data at singular points, i.e. at points with locally minimal depth. In general, however, such data are not given, so these depth values need to be estimated. In the experimental section, we will show that a good estimate is crucial for the reconstruction quality. In most other works, this issue is neglected. In [4], the problem is solved by obtaining an initial estimate for the depth using an orthographic SfS method. That perspective method, however, is not comparable with the one used in this paper, since it neglects the light attenuation term. As a consequence, its solution is invariant to multiplicative scalings of the depth, which is not true in our case. To obtain a working method, we therefore need either to know the correct depth at singular points or to estimate both the singular points and their depths. In this section, we introduce ways to estimate the locations of singular points and their depths as accurately as possible.

Lambertian Case. For simplicity, we first focus on the Lambertian case, i.e. ka = ks = 0, kd = 1. In this simplified model, the brightness of a pixel is determined by two main factors: (i) the angle φ between surface normal and light source direction, and (ii) the light attenuation due to the distance of the surface point from the light source. Directly from the model (1), with r = uf, we obtain the simple equation

$$ I = I_d\, \frac{\cos\phi}{u^2 f^2}. \qquad (6) $$

Assuming the surface to be continuously differentiable, the points of minimal depth are the points where the derivatives of the depth vanish, which means that the surface normal points directly towards the viewer. This results in φ = 0, which by cos 0 = 1 and re-arranging (6) leads to
$$ u = \frac{1}{f}\sqrt{\frac{I_d}{I}}. \qquad (7) $$

Hence, knowing the coordinates of singular points, we can compute their depth. It remains to determine these coordinates. Singular points are local minima of the depth. Since a depth minimum means both less attenuation and maximal Lambertian reflectance (φ = 0), local maxima of the image brightness are natural candidates for singular points. At the image boundary, brightness maxima may occur that do not satisfy φ = 0, which can lead to errors; in most cases, however, this does not affect the reconstruction quality significantly. Due to sampling and quantisation artifacts, the estimate may also be slightly off, both in the location of singular points and in the estimated depth. This effect is usually rather small.
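A minimal sketch of this Lambertian initialisation, assuming an 8-neighbourhood for the maximum test (the text does not fix the neighbourhood). Using >= keeps all pixels of a plateau of equal grey values, and boundary pixels are skipped because φ = 0 need not hold there. The function name is hypothetical.

```python
import math

def lambertian_singular_points(I, Id, f):
    """Interior local brightness maxima and their depth estimates via (7).

    I is a list of rows of grey values. Returns {(i, j): depth}, which can
    serve as the 'trial' seeds of the fast marching method.
    """
    n1, n2 = len(I), len(I[0])
    seeds = {}
    for i in range(1, n1 - 1):          # boundary pixels are not considered
        for j in range(1, n2 - 1):
            centre = I[i][j]
            neighbours = [I[i + di][j + dj]
                          for di in (-1, 0, 1) for dj in (-1, 0, 1)
                          if (di, dj) != (0, 0)]
            if centre > 0 and all(centre >= n for n in neighbours):
                seeds[(i, j)] = math.sqrt(Id / centre) / f   # equation (7)
    return seeds

print(lambertian_singular_points([[1.0, 1.0, 1.0],
                                  [1.0, 4.0, 1.0],
                                  [1.0, 1.0, 1.0]], Id=100.0, f=5.0))  # {(1, 1): 1.0}
```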
In conclusion, we propose to search for local maxima in the image and estimate their depth according to equation (7). Boundary pixels should not be considered, since the estimate might be incorrect there because φ need not be zero. The points obtained in this way are marked as 'trial' points for the subsequent FM method. In the Phong case, which follows, we use the same approach.

Phong Case. To obtain a good estimate for singular points in the general case, we review the model equation again. With r = uf, (1) reads

$$ I = k_a I_a + \frac{k_d I_d \cos\phi + k_s I_s (\cos\theta)^{\alpha}}{u^2 f^2}. \qquad (8) $$
At singular points, we have φ = θ = 0, which simplifies equation (8) to

$$ I = k_a I_a + \frac{k_d I_d + k_s I_s}{u^2 f^2}. \qquad (9) $$
After shifting the grey values down by the ambient brightness to I − ka Ia, we can separate diffuse and specular light and compute the diffuse brightness Ĩ by

$$ \tilde I = \frac{k_d I_d}{k_d I_d + k_s I_s}\,(I - k_a I_a). \qquad (10) $$

Since Ĩ = kd Id / (u² f²) at a singular point, we can now apply equation (7) with Ĩ in place of I and kd Id in place of Id.
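For the Phong case, (10) followed by (7) amounts to u = (1/f) √((kd Id + ks Is)/(I − ka Ia)). The sketch below (hypothetical function name) implements the two steps literally; rendering a grey value with (9) and passing it to this function should return the original depth.

```python
import math

def phong_initial_depth(I, f, ka, kd, ks, Ia, Id, Is):
    """Depth estimate at a singular point (phi = theta = 0) via (10) and (7)."""
    shifted = I - ka * Ia                      # remove the ambient part
    if shifted <= 0:
        raise ValueError("grey value must exceed the ambient level")
    I_diffuse = kd * Id / (kd * Id + ks * Is) * shifted   # equation (10)
    return math.sqrt(kd * Id / I_diffuse) / f             # (7) with kd*Id for Id

# Round trip with hypothetical values: render with (9), then recover u = 3.
u, f = 3.0, 500.0
I9 = 0.1 * 50.0 + (0.7 * 1e5 + 0.3 * 1e5) / (u ** 2 * f ** 2)
print(phong_initial_depth(I9, f, ka=0.1, kd=0.7, ks=0.3, Ia=50.0, Id=1e5, Is=1e5))
```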
5 Experiments
In this section, we evaluate the presented method on both synthetic and real-world images. We discuss the accuracy and importance of the initial estimates at singular points, and we compare the accuracy and the performance of our method with other methods in the field. Note that no a-priori depth information is used in any of the experiments: wherever depth initialisation at singular points is needed, we use the estimation method introduced in Section 4.

Lambertian case. First, we restrict the method to diffuse reflection only. We compare the reconstruction quality and performance with the methods of Prados et al. [5], Cristiani et al. [17], and Vogel et al. [18], which all use the same Lambertian model but different numerical schemes. Visually, the reconstructions of these methods are almost identical; their performance, however, differs. Figure 1 shows the vase surface [2], a classic test surface for SfS algorithms, and a version of this surface rendered with a Lambertian model. The rendering parameters are f = 492 and Id = 100000, with an image size of 128 × 128 pixels. When detecting the local maxima, we notice that around each maximum there is more than one point with the same maximal grey value, a result of the quantisation of the image. Since all these points are marked as 'trial', only one of them is actually used as a depth estimate. This point might not be the exact position of the singular point, but it is close.
Fig. 1. Vase surface and Lambertian rendered image
Fig. 2. Reconstruction of the vase using a Lambertian model. Left: Reference methods. Middle: Our method. Right: Wrong depth estimate.

Table 1. Results of the Lambertian vase experiment

Method                 Depth Error   Initialisation Time   Computation Time
Prados et al.          0.39%         ≈ 0 s                 36.99 s
Cristiani et al.       0.31%         ≈ 0 s                 28.89 s
Vogel et al.           0.32%         ≈ 0 s                 2.96 s
Our method             0.39%         0.02 s                0.39 s
Wrong initialisation   8.15%         0.02 s                0.39 s
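Table 1 reports relative average depth errors, but the text does not state the exact formula. The sketch below assumes the per-pixel relative error |u − u*|/u*, averaged over all pixels, where u* is the ground-truth depth; other normalisations are possible, and the function name is hypothetical.

```python
def relative_average_depth_error(u, u_true):
    """Mean of |u - u_true| / u_true over all pixels (an assumed definition)."""
    if len(u) != len(u_true) or not u:
        raise ValueError("need two non-empty depth lists of equal length")
    return sum(abs(a - b) / b for a, b in zip(u, u_true)) / len(u)

truth = [10.0, 12.0, 15.0]
print(relative_average_depth_error(truth, truth))                     # 0.0
print(relative_average_depth_error([0.9 * t for t in truth], truth))  # approximately 0.1
```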
Figure 2 shows the reconstructions of the vase surface obtained with the presented method and with the reference methods; the results are visually very similar. Figure 2 also contains a reconstruction for which we manually chose a wrong depth at the singular points, i.e. we multiplied the estimates by a factor of 0.9. This visibly distorts the reconstruction. Table 1, which lists the relative average depth errors of the reconstructions, supports this visual impression. With the correct depth estimates, our method is about as accurate as the three reference methods; in fact, all these reconstructions are nearly perfect. With the wrong initialisation, however, the error rises from 0.39% to 8.15%. Hence, the correctness of our initial guess is crucial for the reconstruction quality.

Phong case. Now we evaluate the method on a synthetic image rendered using the Phong reflectance model. We compare with the same methods as before; since these reference methods only consider a Lambertian model, we additionally compare with the method of Vogel et al. [8], which uses the Phong model.
Fig. 3. Mozart surface and rendered image using the Phong model
Fig. 4. Reconstructions of the Mozart surface. Left: Lambertian methods. Middle: Vogel et al. using the Phong model. Right: Our method.
Table 2. Results of the Phong Mozart experiment. Methods marked with (L) use a Lambertian model for the reconstruction.

Method                 Depth Error   Initialisation Time   Computation Time
Prados et al. (L)      12.58%        0.02 s                158.62 s
Cristiani et al. (L)   12.17%        0.02 s                170.37 s
Vogel et al. (L)       12.56%        0.02 s                16.33 s
Vogel et al.           5.39%         0.02 s                68.76 s
Our method             5.07%         0.03 s                1.85 s
Figure 3 shows the ground truth and a rendered version of the Mozart face [2], a classic test image. This time, we rendered the image using the Phong reflectance model, with parameters f = 500, Id = Is = 100000, kd = 0.7, ks = 0.3 and α = 5. Note that the Mozart face is well suited for testing images with multiple segments, each of which has its own singular point. In Figure 4, we show reconstructions of the Mozart face obtained with our method, the method of Vogel et al., and the three Lambertian reference methods. The Phong reconstructions are clearly more accurate than the Lambertian ones. Table 2 shows the reconstruction errors and computation times of the Mozart experiment. The error of our method (5.07%) is about equal to that of the method of Vogel et al. (5.39%), and it clearly outperforms the Lambertian methods (about 12–13%) w.r.t. quality. Again, our method is up to two orders of magnitude faster than the other methods. Another important observation is that the performance gain is
Fig. 5. Real input image: Rook, knight, and pawn
Fig. 6. Reconstructions of chess figures. Left: [8] with Phong. Right: Our method.
much larger on the Mozart test image than on the vase. The reason for this is the larger size of the Mozart image. This suggests that our method might have a clear advantage over other methods in the field on high-resolution images, which is particularly interesting for real-world images. Many authors apply their methods to relatively small test images of at most 256 × 256 pixels, often much smaller. We now apply our method to a full-size real-world image with 8 megapixels, on which the reference methods take very long to converge.

A Real-World Experiment. Figure 5 shows a photograph of three chess figures: a rook, a knight, and a pawn. The original image has size 3264 × 2448 and was taken with a digital camera with flash in a darkened room. For the reconstruction, we used the known square pixel size of 1.61 μm and the known focal length of 70.2 mm. Expressed in units of the pixel size, this gives a focal length of f = 43478, which we used for the reconstruction. Since with ka = 0 scaling Ia, Id and Is by a common factor only stretches the reconstructed surface by the square root of that factor (cf. (7)), their magnitude is not important for the reconstruction process. For simplicity, we chose them all equal to 100000. We manually estimated the other parameters as kd = 0.6, ks = 0.4 and α = 10, and we neglected ambient light, i.e. we set ka = 0.
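The scaling argument can be made precise with (2): for ka = 0, J = I f²/Q does not depend on Id or Is, so multiplying Id and Is by s is compensated exactly by shifting v by ½ ln s, i.e. by multiplying u by √s, while ∇v is unchanged. The sketch below checks this at a singular point via (9), using the parameters of this experiment and a hypothetical grey value of 0.5; the function name is also hypothetical.

```python
import math

def singular_point_depth(I, f, kd, ks, Id, Is):
    """Depth at a singular point from (9) with ka = 0."""
    return math.sqrt((kd * Id + ks * Is) / I) / f

f, kd, ks = 43478.0, 0.6, 0.4
base = singular_point_depth(0.5, f, kd, ks, Id=1e5, Is=1e5)
s = 4.0  # common scaling factor for Id and Is
scaled = singular_point_depth(0.5, f, kd, ks, Id=s * 1e5, Is=s * 1e5)
print(scaled / base)  # approximately sqrt(s) = 2
```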
Figure 6 shows reconstructions of the high-resolution version of the image obtained with our method and with the method of Vogel et al., both with the same parameters. The reconstruction obtained with our method looks much smoother than the one obtained with the method of Vogel et al. This can be explained by the different numerics of the two methods: our method starts at singular points and reconstructs the surface from near to far, while the other method treats all depths equally. For images like this one, i.e. images with light objects in front of a dark background, our method has the clear advantage of recovering the object of interest first, so that this part is not distorted by artifacts caused by the background.

Table 3. Run times of the chess experiment. (S) marks experiments on a downsampled image of size 408 × 306, (F) denotes the full, high-resolution image.

Method             Iterations   Initialisation Time   Computation Time
Vogel et al. (S)   296          0.03 s                139.8 s
Our method (S)     1            0.07 s                2.8 s
Vogel et al. (F)   1207         1.98 s                38645 s
Our method (F)     1            2.9 s                 263.2 s
Table 3 compares the computation times on the full image with those on a downsampled version. While the computation times of our method are very low, those of the iterative reference method are extremely high, especially for the large input image. Our method thus remains practical even for large images and clearly outperforms the reference method with respect to computation time. The results also show that the performance gain of using FM for SfS increases with the size of the input image.
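The growth of the speed-up with image size can be read off Table 3 directly; a few lines of Python make the arithmetic explicit (computation times only, initialisation excluded).

```python
# Computation times from Table 3 in seconds: (Vogel et al., our method).
table3 = {
    "downsampled (S)": (139.8, 2.8),
    "full (F)": (38645.0, 263.2),
}

# Speed-up factor = reference time / FM time.
speedup = {name: ref / ours for name, (ref, ours) in table3.items()}
for name, factor in speedup.items():
    print(f"{name}: {factor:.1f}x")
```

This prints speed-ups of about 49.9 for the downsampled image and 146.8 for the full image.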
6 Conclusion
We have shown that the FM scheme is the method of choice for modern SfS models that incorporate light attenuation and non-Lambertian reflectance. Without compromising quality, it is up to two orders of magnitude faster than other approaches. We demonstrated that it is possible to estimate initial depths, which yields a method that does not rely on the knowledge of initial data. By combining state-of-the-art SfS models and proper numerical methods, it becomes possible to tackle real-world data with image sizes of many megapixels. This is far beyond the size of the model problems considered in many SfS papers.
References
1. Horn, B.K.P., Brooks, M.J.: Shape from Shading. Artificial Intelligence Series. MIT Press, Cambridge (1989)
2. Zhang, R., Tsai, P.S., Cryer, J.E., Shah, M.: Shape from shading: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 21(8), 690–706 (1999)
3. Tankus, A., Sochen, N., Yeshurun, Y.: A new perspective [on] shape-from-shading. In: Proc. Ninth International Conference on Computer Vision, vol. 2, pp. 862–869. IEEE Computer Society Press, Nice (2003)
4. Tankus, A., Sochen, N., Yeshurun, Y.: Perspective shape-from-shading by fast marching. In: Proc. 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 43–49. IEEE Computer Society Press, Washington (2004)
5. Prados, E., Faugeras, O.: Shape from shading: A well-posed problem? In: Proc. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 870–877. IEEE Computer Society Press, San Diego (2005)
6. Ahmed, A., Farag, A.: A new formulation for shape from shading for non-Lambertian surfaces. In: Proc. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 17–22. IEEE Computer Society Press, New York (2006)
7. Oren, M., Nayar, S.: Generalization of the Lambertian model and implications for machine vision. International Journal of Computer Vision 14(3), 227–251 (1995)
8. Vogel, O., Breuß, M., Weickert, J.: Perspective shape from shading with non-Lambertian reflectance. In: Rigoll, G. (ed.) DAGM 2008. LNCS, vol. 5096, pp. 517–526. Springer, Heidelberg (2008)
9. Phong, B.T.: Illumination for computer-generated pictures. Communications of the ACM 18(6), 311–317 (1975)
10. Tsitsiklis, J.N.: Efficient algorithms for globally optimal trajectories. IEEE Transactions on Automatic Control 40(9), 1528–1538 (1995)
11. Sethian, J.A.: Level Set Methods and Fast Marching Methods, 2nd edn. Cambridge University Press, Cambridge (1999)
12. Kimmel, R., Sethian, J.A.: Optimal algorithm for shape from shading and path planning. Journal of Mathematical Imaging and Vision 14(3), 237–244 (2001)
13. Yuen, S.Y., Tsui, Y.Y., Chow, C.K.: Fast marching method for shape from shading under perspective projection. In: Proceedings of the 2nd IASTED International Conference on Visualization, Imaging and Image Processing, Malaga, Spain, September 2002, pp. 584–589 (2002)
14. Tankus, A., Sochen, N., Yeshurun, Y.: Shape-from-shading under perspective projection. International Journal of Computer Vision 63(1), 21–43 (2005)
15. Prados, E., Soatto, S.: Fast marching method for generic shape from shading. In: Paragios, N., Faugeras, O., Chan, T., Schnörr, C. (eds.) VLSM 2005. LNCS, vol. 3752, pp. 320–331. Springer, Heidelberg (2005)
16. Yuen, S.Y., Tsui, Y.Y., Chow, C.K.: A fast marching formulation of perspective shape from shading under frontal illumination. Pattern Recognition Letters 28, 806–824 (2007)
17. Cristiani, E., Falcone, M., Seghini, A.: Some remarks on perspective shape-from-shading models. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 276–287. Springer, Heidelberg (2007)
18. Vogel, O., Breuß, M., Weickert, J.: A direct numerical approach to perspective shape-from-shading. In: Lensch, H., Rosenhahn, B., Seidel, H.P., Slusallek, P., Weickert, J. (eds.) Vision, Modeling, and Visualization, pp. 91–100. AKA, Berlin (2007)
Generic Scene Recovery Using Multiple Images
Kuk-Jin Yoon¹, Emmanuel Prados², and Peter Sturm²
¹ Computer Vision Lab., Dept. Information and Communications, GIST, Korea
² Perception Lab., INRIA Grenoble - Rhône-Alpes, France
Abstract. We present a generative model based method for recovering both the shape and the reflectance of the surface(s) of a scene from multiple images, assuming that the illumination conditions are known in advance. Based on a variational framework and via gradient descent, the algorithm minimizes a global cost functional simultaneously and consistently with respect to both shape and reflectance. Contrary to previous works, which consider specific individual scenarios, our method applies to a number of scenarios: multiview stereovision, multiview photometric stereo, and multiview shape from shading. In addition, our approach naturally combines stereo, silhouette and shading cues in a single framework and, unlike most previous methods, which deal only with Lambertian surfaces, it considers general dichromatic surfaces.
1 Introduction and Related Work
Many methods for recovering the three-dimensional surface shape from multiple images have been proposed over the last two decades [1]. For a long time, however, the estimation of surface radiance/reflectance was of secondary concern; even some recent works [2,3,4,5] compute the 3D shape without considering radiance estimation. In the last decade, radiance/reflectance estimation has nevertheless become a matter of concern in multiview reconstruction scenarios [6,7,8]. In particular, recovering reflectance is required for realistic relighting, which is fundamental in virtual reality as well as augmented reality. In addition, perfectly Lambertian surfaces are rare in real-life applications, so multiview stereo algorithms have to be robust to specular reflection. Widespread ideas are to use appropriate similarity measures [2,9,10] and/or to modify the input images in order to remove specular highlights [11,12]. However, these similarity measures are not valid in general under arbitrary lighting conditions, and these methods are strongly limited by the specific lighting configuration. Concerning robustness to non-Lambertian effects, it is also worth citing [6], which considers the radiance tensor. However, the radiance tensor presented in [6] is not appropriate when the images of the scene are taken under several (different) lighting conditions. In this paper, our goal is to provide a model-based method that simultaneously estimates shape and reflectance by combining stereo, silhouette, and
This work was supported by the Flamenco project (ANR-06-MDCA-007) and by the GIST Dasan project.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 745–757, 2009. © Springer-Verlag Berlin Heidelberg 2009
K.-J. Yoon, E. Prados, and P. Sturm
shading cues in a single framework. The proposed method is robust to non-Lambertian effects because it directly incorporates a specular reflectance model into the mathematical formulation of the problem. By incorporating a complete photometric image formation model, it also fully exploits all photometric phenomena, as is done explicitly in photometric stereo methods. It also deals naturally with images taken under several lighting conditions. Note that recent works already provide solutions in this direction. [13] proposes a model-based method for recovering the 3D shape and the reflectance of a non-Lambertian object. In that paper, however, the authors constrain the object to be made of a single textureless material: the parameters of the reflectance (in particular the albedo) are the same for all points of the object surface. The method in [13] is thus a “multiview shape from shading” method, similar to those proposed in [8,14], which focus on the Lambertian case. To our knowledge, with the exception of [15,16], all works going in the same direction as ours are limited to surfaces made of a single (textureless) material. In particular, this is the case for the photometric stereo methods proposed in [17,18] and for the multiview photometric stereo work of [19]. Only the similar works [15,16] are able to recover scenes with varying albedo. However, in [15,16], the authors filter out specular highlights by simple thresholding and use only the diffuse components to estimate the shape. [15] also uses thresholding to detect shadowed pixels that are not visible from the light sources, which, however, does not work under multiple light sources. In our work, we do not restrict ourselves to a single textureless material (in return, we assume that the lighting conditions are known in advance).
More generally, one of the goals of this paper is to show that the joint computation of shape and reflectance is beneficial from several points of view. In addition to providing the reflectance of the scene, it allows specular models to be introduced naturally into the mathematical formulation of the multiview reconstruction problem, which makes the method robust to highlights. Without any additional effort, it is also possible to deal with a set of images lit under several different conditions (which is not possible with radiance alone). Moreover, in such a case, the method fully exploits the variations of the radiance with the changes of illumination, as in photometric stereo. Finally, it makes it easy to incorporate constraints on the reflectance and, in particular, to naturally exploit shading effects in textureless regions. Let us emphasize that, contrary to previous works considering specific scenarios, our method can be applied indiscriminately to a number of scenarios: multiview stereovision, multiview photometric stereo, and multiview shape from shading.
2 Modeling Assumptions and Notations
We assume here that the scene can be decomposed into two entities: the foreground, which corresponds to the objects of interest, and the background. The foreground is composed of a set of (bounded and closed) 2D manifolds of $\mathbb{R}^3$ and is represented by S.
Images are generated by $n_c$ pinhole cameras. The perspective projection performed by the i-th camera is represented by $\Pi_i : \mathbb{R}^3 \to \mathbb{R}^2$, and $\pi_i \subset \mathbb{R}^2$ is the image domain of the i-th camera. It is split into two parts: the pixels corresponding to the foreground, $\pi_i^F = \pi_i \cap \Pi_i(S)$, and the other points, $\pi_i^B = \pi_i \setminus \pi_i^F$. $I_i : \pi_i \to \mathbb{R}^c$ is the image of the true scene captured by the i-th camera¹. I is the set of input images, and $I_i^F$ and $I_i^B$ are the restrictions of the function $I_i$ to $\pi_i^F$ and $\pi_i^B$, respectively. Furthermore, the visibility function $v_S^i : \mathbb{R}^3 \to \mathbb{R}$ is defined by $v_S^i(X) = 1$ if X is visible from the i-th camera and $v_S^i(X) = 0$ otherwise. $S_i$ denotes the part of S that is visible from the i-th camera, and $\Pi_{i,S}^{-1}$ is the back-projection from the i-th camera onto $S_i$.

We model the scene as being illuminated by a finite number of distant point light sources and an ambient light. $n_i^l$ is the number of illuminants corresponding to the i-th camera, and $l_{ij} \in \mathbb{S}^2$ and $L_{ij} \in \mathbb{R}^c$ are the direction and the intensity¹ of the j-th illuminant of the i-th camera, respectively. Similarly, $L_i^a \in \mathbb{R}^c$ is the intensity¹ of the ambient illumination of the i-th camera. $v_{L_{ij}} : \mathbb{R}^3 \to \mathbb{R}$ is the light visibility function: $v_{L_{ij}}(X) = 1$ if the j-th illuminant of the i-th camera is visible from X, and $v_{L_{ij}}(X) = 0$ otherwise.

We model the foreground object(s) by its shape S and its reflectance R, and denote Ω = (S, R). Contrary to most previous stereovision methods, we want to go beyond the Lambertian model. In order to obtain a solvable minimization problem without too many unknowns, we represent the reflectance by a parametric model; in this work, we consider the popular Blinn-Phong shading model. In this context, and assuming that $I_i(x)$ equals the radiance of the surface S at the point $X = \Pi_{i,S}^{-1}(x)$ in the direction of the i-th camera, the images are decomposed as $I_i = I_i^d + I_i^s + I_i^a$, where $I_i^d$, $I_i^s$ and $I_i^a$ are images with the diffuse, specular, and ambient reflection components of $I_i$, respectively.
Here, the diffuse reflection can be expressed using the cosine law as $I_i^d(x) = \sum_{j=1}^{n_i^l} v_L^{ij}(X)\,\rho_d(X)\,L_{ij}\,\big(n(X)\cdot l_{ij}\big)$, where $\rho_d(X) \in \mathbb{R}^c$ is the diffuse albedo¹ at the point $X \in S$ and $n(X)$ is the normal vector to the surface $S$ at $X$. On the other hand, the specular reflection is expressed as $I_i^s(x) = \sum_{j=1}^{n_i^l} v_L^{ij}(X)\,\rho_s(X)\,L_{ij}\,\big(n(X)\cdot h_{ij}(X)\big)^{\alpha_s(X)}$, where $h_{ij}(X)$ is the bisector of the angle between the viewing direction of the $i$th camera and the direction of the $j$th illuminant at $X$, and $\rho_s(X) \in \mathbb{R}^c$ and $\alpha_s(X) \in \mathbb{R}^+$ are the specular albedo and the shininess parameter at $X$. The ambient illumination is assumed to be uniform and is modeled as $I_i^a(x) = \rho_d(X)\,L_i^a$, where $L_i^a$ is defined above. Finally, the image formation equation is

$$I_i(x) = \sum_{j=1}^{n_i^l} v_L^{ij}(X)\,L_{ij}\big(X, n(X)\big) + \rho_d(X)\,L_i^a, \tag{1}$$

where $L_{ij}(X, n(X)) = L_{ij}\,\rho_d(X)\,\big(n(X)\cdot l_{ij}\big) + L_{ij}\,\rho_s(X)\,\big(n(X)\cdot h_{ij}(X)\big)^{\alpha_s(X)}$. We denote $R = (R_d, R_s)$, where $R_d = \rho_d$ and $R_s = (\rho_s, \alpha_s)$. As suggested by [20, 21], to make sure that the estimated foreground surface does not shrink to an empty set, it is crucial to define and characterize the background.
¹ Non-normalized color vector, if $c = 3$.
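To make Eq. (1) concrete, here is a minimal NumPy sketch that evaluates the predicted colour at a single surface point. It is an illustration under stated assumptions, not the authors' renderer: direction vectors are normalised inside the function, $\rho_d$, $\rho_s$, $L_{ij}$ and $L_i^a$ are per-channel arrays multiplied elementwise, and negative cosines are clamped to zero, a common rendering convention that the text above does not state. The name `render_point` and its argument layout are hypothetical.

```python
import numpy as np


def _unit(vec):
    """Return vec scaled to unit length."""
    vec = np.asarray(vec, dtype=float)
    return vec / np.linalg.norm(vec)


def render_point(rho_d, rho_s, alpha_s, normal, view_dir, lights, L_ambient):
    """Predicted colour at one surface point X, following Eq. (1).

    lights is a list of (direction l_ij, intensity L_ij, visible) tuples, where
    `visible` plays the role of the light visibility v_L^{ij}(X) in {0, 1}.
    view_dir points from X towards the camera.
    """
    n = _unit(normal)
    v = _unit(view_dir)
    rho_d = np.asarray(rho_d, dtype=float)
    rho_s = np.asarray(rho_s, dtype=float)
    # Ambient term: rho_d(X) * L_i^a
    radiance = rho_d * np.asarray(L_ambient, dtype=float)
    for l_dir, L, visible in lights:
        if not visible:  # v_L^{ij}(X) = 0: this light is occluded
            continue
        l = _unit(l_dir)
        L = np.asarray(L, dtype=float)
        h = _unit(l + v)  # bisector of light and view directions (undefined if l = -v)
        diffuse = max(float(n @ l), 0.0)  # clamped cosine (assumption)
        specular = max(float(n @ h), 0.0) ** alpha_s
        radiance = radiance + L * rho_d * diffuse + L * rho_s * specular
    return radiance
```

For a head-on light and camera, the three terms simply add up: with $\rho_d = 0.5$, $\rho_s = 0.2$, unit light intensity and ambient intensity $0.1$, every channel evaluates to $0.05 + 0.5 + 0.2 = 0.75$.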
K.-J. Yoon, E. Prados, and P. Sturm
In this work, we assume that the background images $\tilde I = \{\tilde I_1, \dots, \tilde I_{n_c}\}$ are available, and we define $(\tilde I_i^F, \tilde I_i^B)$ analogously to $(I_i^F, I_i^B)$.
3 Bayesian Formulation of the Problem
The goal of this work is to estimate the shape $S$ and the reflectance $R$ of a scene surface $\Omega$ that maximize $P(\Omega \mid I)$ for the given images $I$. By Bayes' rule,

$$P(\Omega \mid I) = \frac{P(I \mid \Omega)\,P(\Omega)}{P(I)} \propto P(I \mid \Omega)\,P(\Omega) = P(I \mid S, R)\,P(S)\,P(R) \tag{2}$$

under the assumption that $S$ and $R$ are independent. Here, $P(I \mid \Omega) = P(I \mid S, R)$ is the likelihood, and $P(S)$ and $P(R)$ are priors on the shape and the reflectance, respectively. When $\Pi_i$ is given, we can produce a synthetic image $\bar I_i(\Omega)$ corresponding to $I_i$ from the current estimate of $\Omega$. This allows us to measure the validity of the current estimate by comparing the input images with the generated ones. Assuming independent, identically distributed observations, the likelihood can be expressed as $P(I \mid \Omega) \propto \prod_{i=1}^{n_c} \exp\big(-\xi_i(\Omega)\big) = \prod_{i=1}^{n_c} \exp\big(-\xi(I_i, \bar I_i(\Omega))\big)$, where $\xi_i(\Omega) = \xi(I_i, \bar I_i(\Omega))$ is a function of $\Omega$ measuring image dissimilarity. A typical and reasonable prior on the surface shape $S$ is based on its area: $P(S) \propto \exp\big(-\psi(S)\big)$, where $\psi(S)$ is a monotonically increasing function of the surface area $\int_S d\sigma$, $d\sigma$ being the Euclidean surface measure. A prior on the reflectance is also required, because there are not enough observations exhibiting specular reflection at every surface point. To overcome this lack of observations, we assume that the specular reflectance varies smoothly within each homogeneous material surface patch. This prior is reasonable in real-life applications and in common scenes. Thus, in this work, we use the diffuse reflectance of a surface as a soft constraint to partition $\Omega$, and we define the prior on the surface reflectance as $P(R) \propto \exp\big(-\omega(R)\big)$, where $\omega(R)$ is defined later.
4 Description of the Cost Functions
Based on Section 3, the problem can be expressed in terms of cost functions as $E_{\mathrm{total}}(\Omega) = E_{\mathrm{data}}(\Omega) + E_{\mathrm{shape}}(S) + E_{\mathrm{refl}}(R) = \sum_{i=1}^{n_c} \xi_i(\Omega) + \psi(S) + \omega(R)$. Since $-\log P(\Omega \mid I)$ equals $E_{\mathrm{total}}(\Omega)$ up to an additive constant, maximizing the probability $P(\Omega \mid I)$ is equivalent to minimizing the total cost.

Data Cost Function. The current estimate of $\Omega$ segments each input image $I_i$ into a foreground $I_i^F$ and a background $I_i^B$, and we can synthesize $\bar I_i^F$ according to the above image formation model. $\bar I_i^B$ is generated according to the available background model; in this paper, we use actual background images, i.e. $\bar I_i^B = \tilde I_i^B$. As suggested by [20], $\xi_i(\Omega) = \xi(I_i, \bar I_i)$ is then rewritten as

$$\xi(I_i, \bar I_i) = \xi_F(I_i^F, \bar I_i^F) + \xi_B(I_i^B, \bar I_i^B) = \hat\xi_F(I_i^F, \bar I_i^F) + \xi(I_i, \tilde I_i), \tag{3}$$
where $\hat\xi_F(I_i^F, \bar I_i^F) = \xi_F(I_i^F, \bar I_i^F) - \xi_F(I_i^F, \tilde I_i^F)$. Since $\xi(I_i, \tilde I_i)$ is independent of $\Omega$, the data cost function is written as $E_{\mathrm{data}}(\Omega) = \sum_{i=1}^{n_c} \hat\xi_F(I_i^F, \bar I_i^F) + C$, where $C = \sum_{i=1}^{n_c} C_i = \sum_{i=1}^{n_c} \xi(I_i, \tilde I_i)$ is constant. To compute $\xi$, any statistical similarity measure between color or intensity patterns, such as the sum of squared differences (SSD), cross correlation (CC), or mutual information (MI), can be used. In any case, $\xi$ can be expressed as an integral over the image plane, $\xi(I_i, \bar I_i) = \int_{\pi_i} e_i(x)\,d\sigma_i$, where $d\sigma_i$ is the surface measure of the image plane and $e_i(x)$ is the contribution at $x$ to $\xi_i$. The data cost function is then given as

$$E_{\mathrm{data}}(\Omega) = \sum_{i=1}^{n_c} \int_{\pi_i^F} \hat e_i(x)\,d\sigma_i + C, \tag{4}$$

where $\hat e_i(x) = e_i\big(I_i(x), \bar I_i(x)\big) - e_i\big(I_i(x), \tilde I_i(x)\big)$.

Decoupling Appearance from Surface Normal. As shown in Eq. (1), surface appearance depends on both the surface normal and the position, which makes the problem hard to solve and unstable. To resolve this, we introduce, as in [14], a photometric unit vector field $v$ satisfying $\|v\| = 1$, which is used to compute the surface appearance. To penalize the deviation between the actual normal vector $n$ and the photometric normal vector $v$, we add a new term

$$E_{\mathrm{dev}}(\Omega) = \tau \int_S \chi(X)\,d\sigma = \tau \int_S \big(1 - n(X)\cdot v(X)\big)\,d\sigma \tag{5}$$
to the cost function, where $\tau$ is a control constant.

Shape Area and Reflectance Discontinuity Cost Functions. Using the surface area as the prior, $E_{\mathrm{shape}}(S)$ is simply defined as $E_{\mathrm{shape}}(S) = \psi(S) = \lambda \int_S d\sigma$, where $\lambda$ is a control constant. Based on the assumption in Section 3, we define a discontinuity cost function of surface reflectance, which encourages the discontinuities of the specular reflectance to coincide with those of the diffuse reflectance, as

$$E_{\mathrm{refl}}(R) = \omega(R) = \beta \int_S f(X)\,d\sigma = \beta \int_S \zeta\big(R_d(X)\big) \times \eta\big(R_s(X)\big)\,d\sigma, \tag{6}$$

where $\beta$ is a control constant, and $\zeta(R_d(X))$ and $\eta(R_s(X))$ are defined as

$$\zeta\big(R_d(X)\big) = 1 - \frac{\|\nabla_S R_d(X)\|^2}{M}, \qquad \eta\big(R_s(X)\big) = \|\nabla_S \rho_s(X)\|^2 + \gamma\,\|\nabla_S \alpha_s(X)\|^2 \tag{7}$$

with a predefined constant $M$.² $\nabla_S$ denotes the intrinsic gradient on $S$. With this discontinuity cost function, surface points that lack sufficient specular observations are assigned a specular reflectance inferred from that of neighboring surface points.
² Make sure that $M \ge 3$ for gray-level images and $M \ge 9$ for color images.
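The intended effect of Eqs. (6)-(7) can be checked numerically. The sketch below evaluates the integrand $f = \zeta(R_d)\,\eta(R_s)$ on a flat 2-D grid; using NumPy central differences on a planar grid as a stand-in for the intrinsic gradient $\nabla_S$ on a curved surface is an assumption of this illustration, as are the grid spacing `h` and the example values. Function names are hypothetical.

```python
import numpy as np


def grad_sq(field, h=1.0):
    """Squared gradient magnitude ||grad F||^2 on a 2-D grid.

    Vector-valued fields of shape (H, W, C) sum the squared gradients over channels.
    """
    f = np.atleast_3d(np.asarray(field, dtype=float))  # (H, W) -> (H, W, 1)
    gy, gx = np.gradient(f, h, axis=(0, 1))  # central differences inside the grid
    return np.sum(gx ** 2 + gy ** 2, axis=-1)


def discontinuity_integrand(rho_d, rho_s, alpha_s, M, gamma, h=1.0):
    """f(X) = zeta(R_d) * eta(R_s) from Eqs. (6)-(7), evaluated on a planar grid."""
    zeta = 1.0 - grad_sq(rho_d, h) / M
    eta = grad_sq(rho_s, h) + gamma * grad_sq(alpha_s, h)
    return zeta * eta
```

With a uniform material, $f$ vanishes everywhere. A step in $\rho_s$ is penalised less when $\rho_d$ has a step at the same place, because $\zeta < 1$ there, which is how the prior pushes specular discontinuities onto diffuse ones.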
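Total Cost Function. Finally, the total cost function is given by

$$E_{\mathrm{total}}(\Omega) = C + \sum_{i=1}^{n_c} \int_{\pi_i^F} \hat e_i(x)\,d\sigma_i + \tau \int_S \chi(X)\,d\sigma + \lambda \int_S d\sigma + \beta \int_S f(X)\,d\sigma. \tag{8}$$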
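As a sanity check on Eqs. (3) and (4) and on the data term of Eq. (8), here is a minimal discrete sketch. It assumes SSD as the dissimilarity $e$ and pixel sums in place of the integrals over $\pi_i^F$; rendering itself is not shown, and `renders` stands for hypothetical foreground images $\bar I_i$ produced by the image formation model. It illustrates that only foreground pixels need to be visited, because the background contribution collapses into the constant $C$.

```python
import numpy as np


def pixel_ssd(a, b):
    """Per-pixel squared colour difference e(I(x), J(x)) (SSD integrand)."""
    return np.sum((np.asarray(a, float) - np.asarray(b, float)) ** 2, axis=-1)


def data_cost(inputs, renders, backgrounds, fg_masks):
    """Discrete version of E_data in Eq. (4); returns (E_data, C).

    inputs, renders, backgrounds: lists of (H, W, c) arrays (I_i, rendered Ibar_i, I~_i).
    fg_masks: list of boolean (H, W) arrays marking the projected foreground pi_i^F.
    """
    e_total, const = 0.0, 0.0
    for I, I_fg, I_bg, mask in zip(inputs, renders, backgrounds, fg_masks):
        e_hat = pixel_ssd(I, I_fg) - pixel_ssd(I, I_bg)  # \hat e_i(x)
        e_total += e_hat[mask].sum()                      # sum over pi_i^F only
        const += pixel_ssd(I, I_bg).sum()                 # C_i = xi(I_i, I~_i)
    return e_total + const, const
```

Summing the SSD between each input and the composite synthesized image (rendered foreground, background image elsewhere) gives the same value as `data_cost`, which is exactly the identity behind Eq. (3).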
Note that $E_{\mathrm{dev}}(\Omega)$, $E_{\mathrm{shape}}(S)$, and $E_{\mathrm{refl}}(R)$ are defined over the scene surface, while $E_{\mathrm{data}}(\Omega)$ is defined as an integral over the image plane. By the change of variable $d\sigma_i = -\frac{d_i(X)\cdot n(X)}{z_i(X)^3}\,d\sigma$, where $d_i(X)$ is the vector connecting the center of the $i$th camera and $X$, and $z_i(X)$ is the depth of $X$ relative to the $i$th camera, we can replace the integral over the image plane by an integral over the surface [7]. Denoting $g(X, n(X)) : \mathbb{R}^3 \times \Omega \to \mathbb{R}$ as

$$g(X, n(X)) = -\sum_{i=1}^{n_c} v_S^i\,\hat e_i\,\frac{d_i \cdot n}{z_i^3} + \tau\chi + \lambda + \beta f, \tag{9}$$

Eq. (8) is simply rewritten as $E_{\mathrm{total}}(\Omega) = C + \int_S g(X, n(X))\,d\sigma$.
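The change of variable can be checked on simple cases. The sketch assumes a pinhole camera with unit focal length and a known unit optical axis (the text does not fix the camera parametrisation); the function name is hypothetical. For a fronto-parallel patch facing the camera the factor reduces to $1/z_i^2$, and it becomes negative for back-facing points, which the visibility $v_S^i$ in Eq. (9) excludes anyway.

```python
import numpy as np


def image_area_weight(X, n, cam_center, optical_axis):
    """Factor -(d_i . n) / z_i^3 converting surface area dsigma into image area dsigma_i.

    Assumes a pinhole camera with unit focal length; optical_axis is a unit vector.
    """
    d = np.asarray(X, float) - np.asarray(cam_center, float)  # d_i(X)
    z = float(d @ np.asarray(optical_axis, float))             # depth z_i(X)
    return -float(d @ np.asarray(n, float)) / z ** 3
```

For example, a patch at depth 2 facing the camera gets weight $1/4$, and tilting it by 60° halves the projected area to $1/8$.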
5 Scene Recovery
Recently, some authors have proposed global optimization methods based on graph cuts or convexity for the classical multiview stereovision problem [5, 22, 23]. However, because the cost function depends on the normal and also on the visibility, the state of the art in optimization does not allow computing the global minimum of the energy we have designed. Hence, scene recovery is achieved here by minimizing $E_{\mathrm{total}}$ via gradient descent. Moreover, since $S$ and $R$ are highly coupled, estimating all unknowns simultaneously is very complicated. We therefore adopt an alternating scheme, updating $S$ for fixed $R$ and then $R$ for fixed $S$.
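The alternating strategy can be illustrated on a toy problem. In the sketch below, two scalars `s` and `r` are hypothetical stand-ins for the shape and reflectance unknowns, coupled through a quadratic energy chosen only for illustration; each block is updated by a few gradient steps while the other is held fixed, mirroring the scheme described above. For this convex toy energy the iteration reaches the joint minimiser $(s, r) = (1.25, 1.75)$; for the non-convex energy of the paper, alternation is only expected to reach a local minimum.

```python
def grad_E(s, r):
    """Partial derivatives of the toy energy E(s, r) = (s-1)^2 + (r-2)^2 + 0.5 (s-r)^2."""
    return 2.0 * (s - 1.0) + (s - r), 2.0 * (r - 2.0) - (s - r)


def alternate(s, r, outer=200, inner=20, step=0.1):
    """Alternating gradient descent: update s with r fixed, then r with s fixed."""
    for _ in range(outer):
        for _ in range(inner):  # "shape" block: r frozen
            s -= step * grad_E(s, r)[0]
        for _ in range(inner):  # "reflectance" block: s frozen
            r -= step * grad_E(s, r)[1]
    return s, r
```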
5.1 Shape Estimation – Surface Evolution
When $R$ is given, $E_{\mathrm{total}}$ is a function of $S$ only. In this work, we derive the gradient descent flow corresponding to each cost function separately. The final gradient descent flow is then given by

$$S_t = S_t^{\mathrm{data}} + S_t^{\mathrm{dev}} + S_t^{\mathrm{shape}} + S_t^{\mathrm{refl}}, \tag{10}$$

where $S_t^{\mathrm{data}}$, $S_t^{\mathrm{dev}}$, $S_t^{\mathrm{shape}}$ and $S_t^{\mathrm{refl}}$ are described below. The data cost depends on the visibility of surface points, which in turn depends on the whole surface shape. Following [20, 21], which deal correctly with the visibility of non-convex objects, $S_t^{\mathrm{data}}$ is given by

$$S_t^{\mathrm{data}} = \sum_{i=1}^{n_c} \left[ -\frac{v_S^i\,(\hat e_i - \check e_i)}{z_i^3}\,\big(d_i^{t}\,\nabla n\,d_i\big)\,\delta(d_i \cdot n) + \frac{v_S^i}{z_i^3}\,\partial_2 \hat e_i\,\nabla \bar I_i \cdot d_i \right], \tag{11}$$
where $\delta(\cdot)$ is the Dirac delta function and $\check e_i$ is an error computed using the radiance at the point $\check X$ in the direction of the $i$th camera, $\check X$ being the terminator of a horizon point $X$ [21]. When a horizon point has no terminator point on the surface, $\check e_i = 0$, because the terminator point then belongs to the background. Using Eq. (1), $\nabla \bar I_i$ is expressed as $\nabla \bar I_i = \sum_{j=1}^{n_i^l} \big\{ (\nabla v_L^{ij})\,L_{ij}(X, n(X)) + v_L^{ij}\,\nabla L_{ij}(X, n(X)) \big\} + (\nabla \rho_d)\,L_i^a$. This gradient descent flow includes both the variation related to camera visibility changes (the first term) and the variation related to image changes (the second term), which also includes the variation due to light visibility changes. In addition, similarly to [8, 14], the gradient descent flow for the normal deviation cost (originating from $E_{\mathrm{dev}}(\Omega)$) is $S_t^{\mathrm{dev}} = -2\tau H + \tau\,(\nabla \cdot v)$, and $S_t^{\mathrm{shape}}$ (from $E_{\mathrm{shape}}(S)$) is the mean curvature flow $S_t^{\mathrm{shape}} = -2\lambda H$. Because the discontinuity cost function of surface reflectance is more complex, deriving its gradient descent flow requires more care. Using the derivation in [24], we obtain the following equation for surface evolution:

$$S_t^{\mathrm{refl}} = -2\beta \left( \frac{1}{M}\,m(\rho_d)\,\eta(R_s) - \big(m(\rho_s) + \gamma\,m(\alpha_s)\big)\,\zeta(R_d) \right). \tag{12}$$

Here, $m(y) = \mathrm{II}(\nabla_S y \times n) + \|\nabla_S y\|^2 H$, where $\mathrm{II}(t)$ is the second fundamental form for a tangent vector $t$ with respect to $n$.
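Since the surface is evolved in a level set framework (see the experiments), the area term $S_t^{\mathrm{shape}} = -2\lambda H$ can be sketched on a signed distance function $\phi$ that is negative inside the object. The following conventions are assumptions of this sketch, not statements from the text: $H = \tfrac{1}{2}\nabla\cdot(\nabla\phi/|\nabla\phi|)$ with outward normal $\nabla\phi/|\nabla\phi|$, so that a sphere of radius $r$ has $H = 1/r$, and the normal velocity $-2\lambda H$ turns into the level set update $\phi_t = 2\lambda H\,|\nabla\phi|$. Sign conventions for $H$ differ between references.

```python
import numpy as np


def mean_curvature(phi, h=1.0, eps=1e-12):
    """H = 0.5 * div(grad(phi) / |grad(phi)|) on a 3-D grid, plus |grad(phi)|.

    With phi a signed distance (negative inside), a sphere of radius r gives H = 1/r.
    """
    grads = np.gradient(phi, h)                       # [d/dx0, d/dx1, d/dx2]
    grad_norm = np.sqrt(sum(g * g for g in grads))
    normals = [g / (grad_norm + eps) for g in grads]  # outward unit normal field
    div = sum(np.gradient(nk, h, axis=k) for k, nk in enumerate(normals))
    return 0.5 * div, grad_norm


def area_flow_step(phi, lam, dt, h=1.0):
    """One explicit level set step for the normal velocity S_t = -2 * lam * H."""
    H, grad_norm = mean_curvature(phi, h)
    return phi + dt * 2.0 * lam * H * grad_norm
```

On a sphere of radius 10 sampled on a unit grid, the discrete $H$ at the surface comes out close to $0.1$, and one step of `area_flow_step` makes $\phi$ positive at the former surface point, i.e. the sphere shrinks, as an area-minimising flow should.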
5.2 Photometric Unit Vector Field Update
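The computed gradient descent flows minimize the total cost with respect to the shape for given reflectance and $v$. We then update the photometric unit vector field $v$ to minimize the total cost for given shape and reflectance. The $v$ that minimizes the total cost satisfies the equation $\frac{\partial g}{\partial v} = -\sum_{i=1}^{n_c} v_S^i\,\partial_2 \hat e_i\,\frac{\partial \bar I_i}{\partial v}\,\frac{d_i \cdot n}{z_i^3} + (-\tau n) = 0$. Here, we have to keep $\|v\| = 1$. Since $v \in \mathbb{S}^2$, it can be expressed in spherical coordinates as $v = [\cos\theta_v \sin\phi_v,\ \sin\theta_v \sin\phi_v,\ \cos\phi_v]^T$, where $\theta_v$ and $\phi_v$ are the angular coordinates of $v$. Therefore, we update $\theta_v$ and $\phi_v$ to update $v$. By the chain rule, the $\theta_v$ and $\phi_v$ that minimize the total cost satisfy the two equations

$$\frac{\partial g}{\partial \theta_v} = \frac{\partial g}{\partial v} \cdot \frac{\partial v}{\partial \theta_v} = 0, \qquad \frac{\partial g}{\partial \phi_v} = \frac{\partial g}{\partial v} \cdot \frac{\partial v}{\partial \phi_v} = 0. \tag{13}$$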
We therefore update $v$ by performing gradient descent using the two equations (13).
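The update of Eq. (13) can be sketched as follows. Parametrising $v$ by $(\theta_v, \phi_v)$ keeps $\|v\| = 1$ automatically, and each angle is moved against $\frac{\partial g}{\partial v}\cdot\frac{\partial v}{\partial\theta_v}$ or $\frac{\partial g}{\partial v}\cdot\frac{\partial v}{\partial\phi_v}$. The caller supplies $\partial g/\partial v$; the step size, iteration count and function names are assumptions of this sketch, and the parametrisation degenerates at the poles ($\sin\phi_v = 0$).

```python
import numpy as np


def sph_to_vec(theta, phi):
    """v = [cos(theta) sin(phi), sin(theta) sin(phi), cos(phi)]^T (always unit length)."""
    return np.array([np.cos(theta) * np.sin(phi), np.sin(theta) * np.sin(phi), np.cos(phi)])


def update_v(theta, phi, dg_dv, step=0.5, iters=500):
    """Gradient descent on (theta_v, phi_v) using the chain rule of Eq. (13).

    dg_dv(v) must return the gradient dg/dv as a 3-vector.
    """
    for _ in range(iters):
        g = dg_dv(sph_to_vec(theta, phi))
        dv_dtheta = np.array([-np.sin(theta) * np.sin(phi), np.cos(theta) * np.sin(phi), 0.0])
        dv_dphi = np.array([np.cos(theta) * np.cos(phi), np.sin(theta) * np.cos(phi), -np.sin(phi)])
        # Both derivatives use the current angles, so the two updates are simultaneous.
        theta -= step * float(g @ dv_dtheta)
        phi -= step * float(g @ dv_dphi)
    return theta, phi
```

If only the deviation term is kept, $\partial g/\partial v = -\tau n$ and the iteration should rotate $v$ onto the geometric normal $n$; with the data term added, $v$ settles where image evidence and the deviation penalty balance.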
5.3 Reflectance Estimation
Here, we estimate $R$ for fixed $S$ and $v$, still minimizing the total cost function. Since $E_{\mathrm{dev}}$ and $E_{\mathrm{shape}}$ do not depend on $R$, we seek an optimal $R$ by minimizing $E_{\mathrm{data}}(\Omega) + E_{\mathrm{refl}}(R)$. Because diffuse and specular reflectance are also highly coupled, estimating them at the same time is complex; we therefore estimate the reflectance components one at a time, assuming the others are given and fixed, and repeat the procedure until they no longer change.
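Diffuse Reflectance Estimation. For given $S$ and $R_s$, we estimate the $\rho_d$ that minimizes the cost $E_{\mathrm{data}} + E_{\mathrm{refl}}$. This $\rho_d$ satisfies the Euler-Lagrange equation $-\sum_{i=1}^{n_c} v_S^i\,\partial_2 \hat e_i\,\frac{\partial \bar I_i}{\partial \rho_d}\,\frac{d_i \cdot n}{z_i^3} + \frac{2\beta}{M}\,\eta(R_s)\,\Delta_S \rho_d = 0$, where $\Delta_S$ denotes the Laplace-Beltrami operator on the surface $S$. We solve this PDE by performing gradient descent using the following PDE:

$$\frac{\partial \rho_d}{\partial t} = -\sum_{i=1}^{n_c} v_S^i\,\partial_2 \hat e_i\,\frac{\partial \bar I_i}{\partial \rho_d}\,\frac{d_i \cdot n}{z_i^3} + \frac{2\beta}{M}\,\eta(R_s)\,\Delta_S \rho_d. \tag{14}$$

Specular Reflectance Estimation. We then estimate $R_s = (\rho_s, \alpha_s)$ for given $S$ and $R_d$ in the same manner. The $\rho_s$ that minimizes the total cost function satisfies the Euler-Lagrange equation $-\sum_{i=1}^{n_c} v_S^i\,\partial_2 \hat e_i\,\frac{\partial \bar I_i}{\partial \rho_s}\,\frac{d_i \cdot n}{z_i^3} - 2\beta\,\Delta_S \rho_s\,\zeta(\rho_d) = 0$. We again solve this PDE by performing gradient descent using the following PDE:

$$\frac{\partial \rho_s}{\partial t} = -\sum_{i=1}^{n_c} v_S^i\,\partial_2 \hat e_i\,\frac{\partial \bar I_i}{\partial \rho_s}\,\frac{d_i \cdot n}{z_i^3} - 2\beta\,\Delta_S \rho_s\,\zeta(\rho_d). \tag{15}$$

$\alpha_s$ is estimated in the same manner by solving the PDE

$$\frac{\partial \alpha_s}{\partial t} = -\sum_{i=1}^{n_c} v_S^i\,\partial_2 \hat e_i\,\frac{\partial \bar I_i}{\partial \alpha_s}\,\frac{d_i \cdot n}{z_i^3} - 2\beta\gamma\,\Delta_S \alpha_s\,\zeta(\rho_d). \tag{16}$$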
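Single-Material Surface Case. When dealing with a single-material surface that has a single specular reflectance $R_s$, the discontinuity cost function of surface reflectance, $E_{\mathrm{refl}}(R)$, can be excluded because $f(X)$ is zero everywhere on the surface. The PDE used for estimating $\rho_d$, Eq. (14), then simplifies to $\frac{\partial \rho_d}{\partial t} = -\sum_{i=1}^{n_c} v_S^i\,\partial_2 \hat e_i\,\frac{\partial \bar I_i}{\partial \rho_d}\,\frac{d_i \cdot n}{z_i^3}$. Here, $\rho_s$ and $\alpha_s$ are also computed by gradient descent, using the equations

$$\frac{\partial \rho_s}{\partial t} = -\int_S \sum_{i=1}^{n_c} v_S^i\,\partial_2 \hat e_i\,\frac{\partial \bar I_i}{\partial \rho_s}\,\frac{d_i \cdot n}{z_i^3}\,d\sigma, \qquad \frac{\partial \alpha_s}{\partial t} = -\int_S \sum_{i=1}^{n_c} v_S^i\,\partial_2 \hat e_i\,\frac{\partial \bar I_i}{\partial \alpha_s}\,\frac{d_i \cdot n}{z_i^3}\,d\sigma. \tag{17}$$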
6 Experiments
We have implemented the gradient descent surface evolution in the level set framework. The proposed method starts from the visual hull obtained from rough silhouette images, to reduce computation time and to avoid local minima. We also adopt a multi-scale strategy. Images of 640×480 or 800×600 pixels were used as inputs, and the simple $L_2$-norm was used to compute the image dissimilarity $e$. For synthetic data sets, the estimated shape is evaluated quantitatively in terms of accuracy and completeness as in [1]. We used 95% for accuracy and an error threshold of 1.0 mm for completeness. For easier comprehension, the size of each target object is normalized so that it fits within [100 mm × 100 mm × 100 mm]. Besides the shape, we also evaluated the estimated reflectance in the same manner. In addition, we computed the average difference between the input images and the synthesized images as $e_{\mathrm{image}} = \frac{1}{n_c} \sum_{i=1}^{n_c} \frac{1}{A} \int_{\pi_i} \|I_i(x) - \bar I_i(x)\|\,d\sigma_i$, where $A = \int_{\pi_i} d\sigma_i$.

Fig. 1. “dino” image set (16 images): Lambertian surface case (static illumination). Panels: (a) input images, (b) synthesized images, (c) results.

Fig. 2. “bimba” image set (18 images): textureless Lambertian surface case (varying illumination and viewpoint). Panels: (a) ground-truth model, (b) estimated model, (c) input vs. synthesized image. 95% accuracy (shape, $\rho_{dr}$, $\rho_{dg}$, $\rho_{db}$) = (2.16 mm, 0.093, 0.093, 0.093); 1.0 mm completeness (shape, $\rho_{dr}$, $\rho_{dg}$, $\rho_{db}$) = (82.63%, 0.104, 0.104, 0.104); $e_{\mathrm{image}}$ = 1.44.

Owing to its generality, the proposed method can be applied to various types of image sets with different camera/light configurations. Knowledge of the illumination makes it possible to factorize radiance into reflectance and geometry. In practice, depending on the scenario, that knowledge may not be required, e.g. for recovering the shape and radiance of Lambertian surfaces under static illumination. In this case, the proposed method can be applied even without lighting information, assuming only ambient illumination, and it then works much like conventional multiview stereo methods. Figure 1 shows the result for the “dino” image set [1], for which no lighting information is required. The proposed method successfully recovers the shape as well as the radiance.

The proposed method can also be applied to images taken under varying illumination. Results using images of textureless/textured Lambertian surfaces are shown in Figs. 2 and 3. In the case of Fig. 2, the proposed method works as a multiview photometric stereo method and recovers the shape and the diffuse reflectance of each surface point. From these, we can synthesize images of the scene under different lighting conditions.

Fig. 3. “dragon” image set (32 images): textured Lambertian surface case (static illumination and varying viewpoint). Panels: (a) input image, (b) true reflectance, (c) true shading, (d) estimated reflectance, (e) estimated shading. 95% accuracy (shape, $\rho_{dr}$, $\rho_{dg}$, $\rho_{db}$) = (1.28 mm, 0.090, 0.073, 0.066); 1.0 mm completeness (shape, $\rho_{dr}$, $\rho_{dg}$, $\rho_{db}$) = (97.11%, 0.064, 0.056, 0.052); $e_{\mathrm{image}}$ = 1.25.

Fig. 4. Smoothed “bimba” image set (36 images): textureless non-Lambertian surface case (uniform specular reflectance, varying illumination and viewpoint). Panels: (a) true model, (b) estimated shape, (c) diffuse and specular images, (d) synthesized image. 95% accuracy (shape, $\rho_{dr}$, $\rho_{dg}$, $\rho_{db}$, $\rho_s$, $\alpha_s$) = (0.33 mm, 0.047, 0.040, 0.032, 0.095, 8.248); 1.0 mm completeness (shape, $\rho_{dr}$, $\rho_{dg}$, $\rho_{db}$, $\rho_s$, $\alpha_s$) = (100%, 0.048, 0.041, 0.032, 0.095, 8.248); $e_{\mathrm{image}}$ = 1.63.

We then applied our method to images of textureless/textured non-Lambertian surfaces showing specular reflection. Note that, unlike [15, 16], we do not use any thresholding to filter out specular highlight pixels. The result for the smoothed “bimba” data set is shown in Fig. 4. In this case, the surface has uniform diffuse/specular reflectance and each image was taken under a different illumination. Here, we used the method with Eq. (17) to estimate the specular reflectance. Although there is high-frequency noise in the estimated shape, the proposed method estimates the specular reflectance well: the ground-truth specular reflectance is ($\rho_s$ = 0.7, $\alpha_s$ = 50), while the estimated one is ($\rho_s$ = 0.61, $\alpha_s$ = 41.8). Note that small errors in the estimated surface normals can cause large errors in the specular reflectance, because of its sensitivity to the surface normal. For instance, $0.7 \times 0.98^{50}$ (= 0.255) ≈ $0.61 \times 0.979^{41.8}$ (= 0.251). Note also that most previous methods do not work for image sets taken under varying illumination and, moreover, have difficulty dealing with specular reflection even when the images are taken under static illumination. For example, Fig. 5 shows the results obtained by the method of [2] together with our result for comparison. We ran the original code provided by the authors many times while changing parameters, using mutual information (MI) and cross correlation (CCL) as similarity measures, to get the best results under specular reflection. As shown, the method of [2] fails to obtain a good shape even though the shape is very simple, while our method estimates it accurately. Also, with such images, given the large proportion of overbright surface parts, it seems plausible that the strategy chosen by [16] and [15] (which treat bright pixels as outliers) might return less accurate results, because it removes too much information.
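The specular sensitivity example in the text, $0.7 \times 0.98^{50} \approx 0.61 \times 0.979^{41.8}$, can be reproduced directly. The sketch evaluates the Blinn-Phong specular factor $\rho_s (n\cdot h)^{\alpha_s}$ for the ground-truth and estimated parameters with the cosines quoted above, and also reports the angular gap between the two cosines, an extra quantity not given in the text, which is well under one degree.

```python
import math


def blinn_phong_lobe(rho_s, cos_nh, alpha_s):
    """Specular factor rho_s * (n . h)^alpha_s for a single light."""
    return rho_s * cos_nh ** alpha_s


truth = blinn_phong_lobe(0.7, 0.98, 50)          # ground-truth parameters
estimate = blinn_phong_lobe(0.61, 0.979, 41.8)   # estimated parameters
# Angle between n and h implied by each cosine; their difference in degrees.
angle_gap = math.degrees(math.acos(0.979) - math.acos(0.98))
print(f"truth={truth:.3f} estimate={estimate:.3f} angle gap={angle_gap:.2f} deg")
```

The two parameter sets give almost the same specular value even though $\alpha_s$ differs by more than 8, which is why a tiny normal error can be absorbed by a sizeable shininess error.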
Fig. 5. Comparison using the “ellipse” image set (16 images): textureless non-Lambertian surface case (uniform specular reflectance, static illumination and varying viewpoint). Panels: (a) two input images, (b) results using [2] (MI, CCL), (c) our result.

Fig. 6. Result for real image sets. Panels: (a) input image, (b) initial shape, (c) estimated shape, (d) diffuse image, (e) specular image, (f) synthesized image.
We also used real image sets of textured glossy objects, which were taken with fixed cameras/light sources while rotating the objects, as in [15, 16]. Here, we simply assumed a single-material surface. Grids of 72 × 72 × 72 were used for the “saddog” (59 images) and “duck” (26 images) image sets. Figure 6 shows that, although sparse grid volumes were used, the proposed method successfully estimated the shape of the glossy objects under specular reflection while also estimating the specular reflectance. In addition, although the estimated specular reflectance may not be very accurate because of inaccurate lighting calibration, saturation, and unmodelled photometric phenomena such as interreflection, it helps considerably in recovering the shape. Finally, we applied our method to the most general case: textured non-Lambertian surfaces with spatially varying diffuse and specular reflectance and shininess (Fig. 7). A 64 × 125 × 64 grid was used in this case. The proposed method yields plausible specular/diffuse images and shape. However, there is high-frequency noise in the estimated shape, and the error in reflectance estimation is larger than in the previous cases. This result shows that, although the proposed discontinuity cost function of surface reflectance helps to infer the specular reflectance of all points from sparse specular observations, reliably estimating the specular reflectance of all surface points is still difficult unless there are enough observations of specular reflection for every surface point.

Fig. 7. Result for the “amphora” image set (36 images). Panels: (a) input image, (b) true shading, (c) shape (init.), (d) estimated shading, (e) synthesized image. 95% accuracy (shape, $\rho_{dr}$, $\rho_{dg}$, $\rho_{db}$, $\rho_s$, $\alpha_s$) = (0.59 mm, 0.041, 0.047, 0.042, 0.226, 13.59); 1.0 mm completeness (shape, $\rho_{dr}$, $\rho_{dg}$, $\rho_{db}$, $\rho_s$, $\alpha_s$) = (89.73%, 0.042, 0.047, 0.042, 0.226, 13.55); $e_{\mathrm{image}}$ = 1.99.
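The evaluation quantities reported in the figure captions of this section can be sketched as follows. The accuracy and completeness definitions follow the usual reading of [1] (accuracy: the distance within which 95% of the reconstructed points lie from the ground truth; completeness: the percentage of ground-truth points within 1.0 mm of the reconstruction). The brute-force nearest-neighbour search, the representation of surfaces as sampled points, and the Euclidean colour norm in $e_{\mathrm{image}}$ are assumptions of this sketch rather than details given in the text.

```python
import numpy as np


def nearest_distances(src, dst):
    """For each point in src (N, 3), the distance to its nearest neighbour in dst (M, 3)."""
    diff = np.asarray(src, float)[:, None, :] - np.asarray(dst, float)[None, :, :]
    return np.sqrt((diff ** 2).sum(-1)).min(axis=1)


def accuracy(recon, truth, fraction=95.0):
    """Distance d such that `fraction` percent of reconstructed points lie within d of the truth."""
    return float(np.percentile(nearest_distances(recon, truth), fraction))


def completeness(recon, truth, threshold=1.0):
    """Percentage of ground-truth points within `threshold` of the reconstruction."""
    return 100.0 * float(np.mean(nearest_distances(truth, recon) <= threshold))


def e_image(inputs, synthesized):
    """Mean over cameras of the average per-pixel colour error ||I_i(x) - Ibar_i(x)||."""
    per_cam = [np.linalg.norm(np.asarray(a, float) - np.asarray(b, float), axis=-1).mean()
               for a, b in zip(inputs, synthesized)]
    return float(np.mean(per_cam))
```

The pairwise-distance approach is quadratic in the number of points and is only meant for small examples; a spatial index would be the natural replacement for real meshes.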
7 Conclusion
In this paper, we have presented a variational method that recovers both the shape and the reflectance of surfaces from multiple images. Scene recovery is achieved by minimizing a global cost functional by alternation. As a result, the proposed method produces a complete description of scene surfaces. Unlike previous works that consider specific scenarios, our method can be applied to a number of classical scenarios: it naturally fuses and exploits several important cues (silhouettes, stereo, and shading) and handles most of the classical 3D reconstruction scenarios, such as stereo vision, (multi-view) photometric stereo, and multiview shape from shading. In addition, our method can deal with strong specular reflection, which remains difficult even for some state-of-the-art methods that use complex similarity measures.
References
1. Seitz, S.M., Curless, B., Diebel, J., Scharstein, D., Szeliski, R.: A comparison and evaluation of multi-view stereo reconstruction algorithms. In: IEEE CVPR, pp. 519–528 (2006)
2. Pons, J.P., Keriven, R., Faugeras, O.: Multi-view stereo reconstruction and scene flow estimation with a global image-based matching score. IJCV 72(2), 179–193 (2007)
3. Goesele, M., Curless, B., Seitz, S.M.: Multi-view stereo revisited. In: IEEE CVPR, vol. 2, pp. 2402–2409 (2006)
4. Tran, S., Davis, L.: 3D surface reconstruction using graph cuts with surface constraints. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3952, pp. 219–231. Springer, Heidelberg (2006)
5. Kolev, K., Klodt, M., Brox, T., Esedoglu, S., Cremers, D.: Continuous global optimization in multiview 3D reconstruction. In: Yuille, A.L., Zhu, S.-C., Cremers, D., Wang, Y. (eds.) EMMCVPR 2007. LNCS, vol. 4679, pp. 441–452. Springer, Heidelberg (2007)
6. Jin, H., Soatto, S., Yezzi, A.J.: Multi-view stereo reconstruction of dense shape and complex appearance. IJCV 63(3), 175–189 (2005)
7. Soatto, S., Yezzi, A.J., Jin, H.: Tales of shape and radiance in multi-view stereo. In: IEEE ICCV, pp. 974–981 (2003)
8. Jin, H., Cremers, D., Wang, D., Prados, E., Yezzi, A., Soatto, S.: 3-D reconstruction of shaded objects from multiple images under unknown illumination. IJCV 76(3), 245–256 (2008)
9. Jin, H., Yezzi, A., Soatto, S.: Variational multiframe stereo in the presence of specular reflections. In: 3DPVT, pp. 626–630 (2002)
10. Yang, R., Pollefeys, M., Welch, G.: Dealing with textureless regions and specular highlights: a progressive space carving scheme using a novel photo-consistency measure. In: IEEE ICCV, pp. 576–583 (2003)
11. Yoon, K.J., Kweon, I.S.: Correspondence search in the presence of specular highlights using specular-free two-band images. In: Narayanan, P.J., Nayar, S.K., Shum, H.-Y. (eds.) ACCV 2006. LNCS, vol. 3852, pp. 761–770. Springer, Heidelberg (2006)
12. Zickler, T., Mallick, S.P., Kriegman, D.J., Belhumeur, P.: Color subspaces as photometric invariants. To appear in IJCV (2008)
13. Yu, T., Xu, N., Ahuja, N.: Shape and view independent reflectance map from multiple views. IJCV 73(2), 123–138 (2007)
14. Jin, H., Cremers, D., Yezzi, A.J., Soatto, S.: Shedding light on stereoscopic segmentation. In: IEEE CVPR, vol. 1, pp. 36–42 (2004)
15. Esteban, C.H., Vogiatzis, G., Cipolla, R.: Multiview photometric stereo. IEEE TPAMI 30(3), 548–554 (2008)
16. Birkbeck, N., Cobzas, D., Sturm, P., Jägersand, M.: Variational shape and reflectance estimation under changing light and viewpoints. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3951, pp. 536–549. Springer, Heidelberg (2006)
17. Georghiades, A.S.: Incorporating the Torrance and Sparrow model of reflectance in uncalibrated photometric stereo. In: IEEE ICCV, vol. 2, pp. 816–823 (2003)
18. Vogiatzis, G., Favaro, P., Cipolla, R.: Using frontier points to recover shape, reflectance and illumination. In: IEEE ICCV, vol. 1, pp. 228–235 (2005)
19. Lu, J., Little, J.: Reflectance function estimation and shape recovery from image sequence of a rotating object. In: IEEE ICCV, pp. 80–86 (1995)
20. Yezzi, A., Soatto, S.: Stereoscopic segmentation. IJCV 53(1), 31–43 (2003)
21. Gargallo, P., Prados, E., Sturm, P.: Minimizing the reprojection error in surface reconstruction from images. In: IEEE ICCV (2007)
22. Paris, S., Sillion, F.X., Quan, L.: A surface reconstruction method using global graph cut optimization. IJCV 66(2), 141–161 (2006)
23. Vogiatzis, G., Esteban, C.H., Torr, P.H.S., Cipolla, R.: Multiview stereo via volumetric graph-cuts and occlusion robust photo-consistency. IEEE TPAMI 29(12), 2241–2246 (2007)
24. Jin, H., Yezzi, A.J., Tsai, Y.H., Cheng, L.T., Soatto, S.: Estimation of 3D surface shape and smooth radiance from 2D images: A level set approach. J. Sci. Comput. 19(1-3), 267–292 (2003)
Highly Accurate PDE-Based Morphology for General Structuring Elements

Michael Breuß and Joachim Weickert

Mathematical Image Analysis Group, Faculty of Mathematics and Computer Science, Building E1.1, Saarland University, 66041 Saarbrücken, Germany
{breuss,weickert}@mia.uni-saarland.de
Abstract. Modelling the morphological processes of dilation and erosion with convex structuring elements by partial differential equations (PDEs) allows for digital scalability and subpixel accuracy. However, numerical schemes suffer from blurring caused by dissipative artifacts. In our paper, we present a family of so-called flux-corrected transport (FCT) schemes that addresses this problem for arbitrary convex structuring elements. The main characteristics of the FCT-schemes are: (i) they keep edges very sharp during the morphological evolution process, and (ii) they feature a high rotational invariance. Numerical experiments with diamonds and ellipses as structuring elements show that FCT-schemes are superior to standard schemes in the field of PDE-based morphology.
1 Introduction
Mathematical morphology is concerned with the analysis of shapes. Beginning with the works of Serra and Matheron [1, 2], it has evolved into a highly successful field in image processing. Many monographs and conference proceedings document this development, see e.g. [4, 6, 8, 18] and [17, 21, 22, 25], respectively. In mathematical morphology, two fundamental operations are employed: dilation and erosion. Many other morphological processes, such as openings, closings, top hats and morphological derivative operators, can be derived from them. While dilation/erosion are frequently realised in a set-theoretical framework, an alternative formulation is available via partial differential equations (PDEs) [10, 11, 13, 14, 15]. Compared to the set-theoretical approach, the latter offers the conceptual advantages of digital scalability and subpixel accuracy. However, a common drawback of PDE-based algorithms is that they introduce blurring artefacts, especially at the edges of dilated/eroded objects. In this paper, we address this problem by dealing with the proper numerical realisation of PDE-based dilation and erosion for general structuring elements. We show how a flux-corrected transport (FCT) scheme can be used that gives a sharp resolution of dilated/eroded object edges combined with a high rotational invariance. It is not only easy to implement; we also show in numerical experiments that it outperforms other schemes for PDE-based morphology.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 758–769, 2009. © Springer-Verlag Berlin Heidelberg 2009

Mathematical Formulation of Dilation and Erosion. Let us consider a grey-value image $f : \Omega \subset \mathbb{R}^2 \to \mathbb{R}$ and a so-called structuring element $B \subset \mathbb{R}^2$. The building blocks of morphological filters, dilation and erosion, are then defined by

$$\text{dilation:} \quad (f \oplus B)(x) := \sup\{ f(x - z) : z \in B \}, \tag{1}$$

$$\text{erosion:} \quad (f \ominus B)(x) := \inf\{ f(x + z) : z \in B \}. \tag{2}$$
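For reference, here is a minimal set-theoretical sketch of Eqs. (1)-(2) on a pixel grid, with the structuring element given as a list of integer offsets (a discrete mask). Replicating border values outside the image is an assumption of this sketch, chosen as a discrete counterpart of the homogeneous Neumann boundary conditions used for the PDE approach; the function names are illustrative only.

```python
import numpy as np


def diamond_offsets(radius):
    """Integer offsets z with ||z||_1 <= radius: a discrete diamond mask."""
    r = int(radius)
    return [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)
            if abs(dy) + abs(dx) <= r]


def _shifted(f, dy, dx, pad):
    """Return g with g[y, x] = f[y + dy, x + dx], replicating border values outside."""
    p = np.pad(f, pad, mode="edge")
    h, w = f.shape
    return p[pad + dy: pad + dy + h, pad + dx: pad + dx + w]


def dilate(f, offsets):
    """(f (+) B)(x) = sup { f(x - z) : z in B }, Eq. (1)."""
    pad = max(max(abs(dy), abs(dx)) for dy, dx in offsets)
    return np.max([_shifted(f, -dy, -dx, pad) for dy, dx in offsets], axis=0)


def erode(f, offsets):
    """(f (-) B)(x) = inf { f(x + z) : z in B }, Eq. (2)."""
    pad = max(max(abs(dy), abs(dx)) for dy, dx in offsets)
    return np.min([_shifted(f, dy, dx, pad) for dy, dx in offsets], axis=0)
```

Dilating a single bright pixel with `diamond_offsets(1)` produces a five-pixel "plus" shape, and eroding that result with the same mask returns the original pixel. Such masks only exist for discrete radii, which is the lack of digital scalability mentioned in the text.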
Dilation/erosion are often realised in a set-theoretical framework. To this end, the structuring elements are given by masks defined in accordance with the discrete pixel grid of an image. For convex structuring elements, there exists an alternative formulation of dilation/erosion in terms of PDEs that guarantees the validity of the semigroup property of dilation/erosion operations [14, 15, 10, 11]. Here, a scaling parameter $t > 0$ is introduced within the structuring element, which is then given as $tB$, achieving digital scalability. In particular, Paragraph 4.2 of [14] shows that dilation/erosion can be realised by solving the PDEs

$$\text{dilation:} \quad \partial_t u(x, t) = \sup_{z \in B} \langle z, \nabla u(x, t) \rangle, \tag{3}$$

$$\text{erosion:} \quad \partial_t u(x, t) = \inf_{z \in B} \langle z, \nabla u(x, t) \rangle, \tag{4}$$
respectively. In (3)-(4), $\nabla = (\partial_x, \partial_y)^\top$ is the spatial nabla operator, and $\langle a, b \rangle$ denotes the Euclidean inner product of the vectors $a$ and $b$. Interpreting the scaling parameter $t$ as an artificial time, the given image $f$ serves as the initial condition for the temporal evolution described by the PDEs (3)-(4). Since we deal with rectangular images of finite size, we also need to define boundary conditions; we employ homogeneous Neumann boundary conditions at the image boundary $\partial\Omega$, completing the PDE-based problem description.

Set-Theoretical vs. PDE-Based Approach. As already mentioned, the PDE-based approach offers the advantages of digital scalability and subpixel accuracy compared to the set-theoretical formulation, while PDE-based algorithms usually blur edges. Note in addition that round structuring elements such as circles or ellipses cannot be represented conveniently in the set-theoretical approach, and they typically do not define a granulometric family [18]. Thus, conceptually, the PDE-based approach is favourable.

Numerical Schemes. Let us first briefly comment on the nature of the evolutionary PDEs (3)-(4). Since they contain only first-order spatial derivatives, these PDEs are hyperbolic, describing wave propagation or transport, in analogy to Huygens' principle. The shape of the evolving wavefront is determined by the shape of the scalable structuring element. Given this hyperbolic character, it is natural that techniques from hyperbolic conservation laws are important for this work; see e.g. [23] for a general discussion of numerical methods for hyperbolic PDEs. In the context of dilation/erosion, popular schemes are the Osher-Sethian (OS) schemes [24, 9, 20] and the Rouy-Tourin (RT) scheme [12, 19]. In particular, one of the OS-schemes mentioned is a second-order high-resolution method. The use of a comparable high-resolution ansatz, specifically an essentially non-oscillatory (ENO) approach, was reported in [16]. In [26], Breuß and Weickert constructed an FCT-scheme for performing dilation/erosion with a disc of radius $t$ as structuring element.

Our Contribution. We extend the applicability of the FCT-scheme introduced in [26] from discs to general structuring elements. As it turns out, this is feasible but involves technical difficulties, especially for general ellipses as structuring elements, which we discuss here in detail. We validate experimentally that the attractive features discussed in [26], namely a sharp resolution of edges and high rotational invariance, carry over to the general case. To compare the performance of the FCT-scheme with set-theoretical algorithms, we use a diamond-shaped structuring element. For a comparison relying entirely on digitally scalable structuring elements, we use an ellipse as structuring element. We show experimentally that the FCT-scheme gives much more accurate results than other PDE-based schemes.

Paper Organisation. In Section 2, we briefly introduce the classic numerical schemes relevant to this paper for the case of a diamond as structuring element, and we construct the FCT-scheme for the same structuring element. In Section 3, we elaborate on the FCT construction for ellipses as structuring elements. In Section 4, we present numerical results. The paper concludes with a summary and outlook in Section 5.
2
PDE-Based Algorithms for Diamonds
For the sake of brevity, we discuss only dilation in detail, as the corresponding scheme for erosion is easily obtained. Employing the structuring element
$$B := \{ z \in \mathbb{R}^2 : \|z\|_1 \le 1 \}, \tag{5}$$
the sought PDE (3) describing specifically dilation with a diamond is based on the dual norm to the norm used in (5). It reads as
$$\partial_t u = \|\nabla u\|_\infty, \tag{6}$$
where $\|\nabla u\|_\infty = \max(|\partial_x u|, |\partial_y u|)$. Now we need to discretise the PDE (6). For this, we define a spatio-temporal grid with uniform mesh widths $h_x$, $h_y$ and $\tau$, respectively. For the formulae of numerical schemes, let us introduce the notation $U^n_{i,j}$ via
$$U^n_{i,j} \approx u(i h_x, j h_y, n\tau). \tag{7}$$
Also, for writing down our schemes, let us define the following finite difference operators:
Highly Accurate PDE-Based Morphology for General Structuring Elements
right-sided: $$D^x_+ U^n_{i,j} := U^n_{i+1,j} - U^n_{i,j}, \tag{8}$$
left-sided: $$D^x_- U^n_{i,j} := U^n_{i,j} - U^n_{i-1,j}, \tag{9}$$
central: $$D^x_c U^n_{i,j} := U^n_{i+1,j} - U^n_{i-1,j}. \tag{10}$$
In an analogous fashion, we use corresponding finite difference operators $D^y_+$, $D^y_-$ and $D^y_c$ for the y-direction.
2.1
The High-Resolution Osher-Sethian-Scheme
In what follows, we will refer to this method simply as the OS-scheme, since its simpler, first-order variant will not be considered here. For its definition, we employ the minmod-function (so named because it returns the argument of minimal modulus whenever both arguments have the same sign), given as
$$\mathrm{mm}(a, b) := \begin{cases} \min(a, b) & \text{if } a > 0 \text{ and } b > 0, \\ \max(a, b) & \text{if } a < 0 \text{ and } b < 0, \\ 0 & \text{else}. \end{cases} \tag{11}$$
To keep the presentation of the OS-scheme short, let us define the following discrete derivative operators:
$$\delta^{OS-}_x U^n_{i,j} := \frac{1}{h_x} \min\left( D^x_- U^n_{i,j} + \frac{1}{2}\,\mathrm{mm}\left( D^x_- D^x_+ U^n_{i,j},\; D^x_- D^x_- U^n_{i,j} \right),\; 0 \right), \tag{12}$$
$$\delta^{OS+}_x U^n_{i,j} := \frac{1}{h_x} \max\left( D^x_+ U^n_{i,j} - \frac{1}{2}\,\mathrm{mm}\left( D^x_+ D^x_+ U^n_{i,j},\; D^x_- D^x_+ U^n_{i,j} \right),\; 0 \right), \tag{13}$$
and we set $\delta^{OS-}_y U^n_{i,j}$ and $\delta^{OS+}_y U^n_{i,j}$ analogously. The basic idea behind the construction (12)-(13) is to augment the first-order differences $D^x_- U^n_{i,j}$ and $D^x_+ U^n_{i,j}$ by a higher-order correction given in terms of discrete second-order derivatives. For a compact notation, let us then set
$$L(U^n, i, j) := \max\left( \delta^{OS-}_x U^n_{i,j} + \delta^{OS+}_x U^n_{i,j},\; \delta^{OS-}_y U^n_{i,j} + \delta^{OS+}_y U^n_{i,j} \right), \tag{14}$$
which realises the maximum norm on the discrete level. Let us briefly comment on the pairs of discretised derivatives in (14), for instance on the contribution $\delta^{OS-}_x U^n_{i,j} + \delta^{OS+}_x U^n_{i,j}$ in x-direction. For a strictly monotone grey-value profile at the points with indices $i-1, i, i+1$, only one of the two summands is non-zero, while the other one is zero; which one is determined by the sign of the slope. Only at a local minimum $U^n_{i,j}$ can both summands be non-zero. The OS-scheme is a second-order high-resolution scheme. As such, we need to employ a second-order time stepping scheme, for which we choose the well-known method of Heun, a two-stage Runge-Kutta method [7]:
$$\bar U^{n+1}_{i,j} = U^n_{i,j} + \tau L(U^n, i, j),$$
$$U^{n+1}_{i,j} = \frac{1}{2} U^n_{i,j} + \frac{1}{2} \bar U^{n+1}_{i,j} + \frac{\tau}{2} L\left(\bar U^{n+1}, i, j\right). \tag{15}$$
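As an illustration of (11)-(13) and (15), the following sketch evaluates the one-sided corrected differences on a 1D grid line and performs one Heun step for a generic right-hand side. It assumes $h_x = 1$ by default and leaves boundary handling (e.g. Neumann padding) and the combination (14) to the caller; the function names `mm`, `os_minus`, `os_plus` and `heun_step` are hypothetical.

```python
def mm(a, b):
    """Two-argument minmod function (11)."""
    if a > 0 and b > 0:
        return min(a, b)
    if a < 0 and b < 0:
        return max(a, b)
    return 0.0


def os_minus(u, i, h=1.0):
    """delta_x^{OS-} u_i as in (12); reads u[i-2], ..., u[i+1]."""
    d_minus = u[i] - u[i - 1]                        # D_- u_i
    d_minus_plus = u[i + 1] - 2 * u[i] + u[i - 1]    # D_- D_+ u_i
    d_minus_minus = u[i] - 2 * u[i - 1] + u[i - 2]   # D_- D_- u_i
    return min(d_minus + 0.5 * mm(d_minus_plus, d_minus_minus), 0.0) / h


def os_plus(u, i, h=1.0):
    """delta_x^{OS+} u_i as in (13); reads u[i-1], ..., u[i+2]."""
    d_plus = u[i + 1] - u[i]                         # D_+ u_i
    d_plus_plus = u[i + 2] - 2 * u[i + 1] + u[i]     # D_+ D_+ u_i
    d_minus_plus = u[i + 1] - 2 * u[i] + u[i - 1]    # D_- D_+ u_i
    return max(d_plus - 0.5 * mm(d_plus_plus, d_minus_plus), 0.0) / h


def heun_step(u, L, tau):
    """Heun's method (15) for du/dt = L(u); L maps a list to a list."""
    u_bar = [ui + tau * li for ui, li in zip(u, L(u))]          # predictor
    return [0.5 * ui + 0.5 * bi + 0.5 * tau * li                # averaged corrector
            for ui, bi, li in zip(u, u_bar, L(u_bar))]
```

On the local minimum `[2, 1, 0, 1, 2]` at `i = 2`, both `os_minus` and `os_plus` are non-zero, matching the remark on (14); on a strictly monotone ramp only one of them is.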
2.2
The FCT-Scheme
Like the OS-scheme, the FCT-scheme is a predictor-corrector method. However, while for the OS-scheme this format arises from the Runge-Kutta time integration, the FCT-construction works differently. As a predictor step, a first-order scheme for wave propagation is used. Being only first-order accurate, the predictor features desirable theoretical properties but also introduces much artificial dissipation. By taking into account the so-called viscosity form of the predictor scheme, this dissipation can be quantified on the discrete level and is then negated in a second step using stabilised inverse diffusion [27]. For details we refer to [26]. The basic idea of negating dissipation by a corrector step is due to Boris and Book [3,5]. However, their original works realised the corrector step technically quite differently, and following their procedure would lead to a different (and less attractive) scheme than the approach followed here. As a predictor step we use the dissipative scheme proposed by Rouy and Tourin [12]. In order to write this down, we use the abbreviation
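$$\delta^{RT}_x U^n_{i,j} := \max\left( \max\left( -\frac{1}{h_x} D^x_- U^n_{i,j},\; 0 \right),\; \max\left( \frac{1}{h_x} D^x_+ U^n_{i,j},\; 0 \right) \right), \tag{16}$$
and $\delta^{RT}_y U^n_{i,j}$ is defined accordingly. Then the RT-scheme is in our case defined as
$$\bar U^{n+1}_{i,j} = U^n_{i,j} + \tau \max\left( \delta^{RT}_x U^n_{i,j},\; \delta^{RT}_y U^n_{i,j} \right). \tag{17}$$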
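The predictor (16)-(17) can be sketched for a whole image as follows. This is a minimal sketch, assuming that `U[i][j]` stores $U^n_{i,j}$ and that homogeneous Neumann boundary conditions are mimicked by clamping indices at the image border; the name `rt_dilation_step` is hypothetical.

```python
def rt_dilation_step(U, tau, hx=1.0, hy=1.0):
    """One Rouy-Tourin step (16)-(17) for dilation with a diamond, PDE (6).

    U[i][j] approximates u(i*hx, j*hy). Clamping indices at the border copies
    the boundary value, which mimics homogeneous Neumann conditions.
    """
    nx, ny = len(U), len(U[0])

    def val(i, j):
        return U[min(max(i, 0), nx - 1)][min(max(j, 0), ny - 1)]

    out = [[0.0] * ny for _ in range(nx)]
    for i in range(nx):
        for j in range(ny):
            c = val(i, j)
            # (16): upwind differences; only differences to brighter neighbours contribute
            dx = max(max(-(c - val(i - 1, j)) / hx, 0.0), max((val(i + 1, j) - c) / hx, 0.0))
            dy = max(max(-(c - val(i, j - 1)) / hy, 0.0), max((val(i, j + 1) - c) / hy, 0.0))
            out[i][j] = c + tau * max(dx, dy)  # (17)
    return out
```

Applied to a single bright pixel with $\tau = 0.5$, one step raises exactly its four direct neighbours to $0.5$, so the support grows as a diamond, as expected for the structuring element (5).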
The FCT-scheme then consists of an application of (17) followed by a corrector step that negates the artificial dissipation of the RT-scheme, reading in total as
$$\bar U^{n+1}_{i,j} = U^n_{i,j} + \tau \max\left( \delta^{RT}_x U^n_{i,j},\; \delta^{RT}_y U^n_{i,j} \right),$$
$$U^{n+1}_{i,j} = \bar U^{n+1}_{i,j} + C_h\left(\bar U^{n+1}\right) - C_d\left(\bar U^{n+1}\right). \tag{18}$$
Let us consider the corrector step, and especially the functions $C_h$ ('h' for high-order part) and $C_d$ ('d' for dissipative part), in some detail. As indicated, the first step of the FCT procedure is to split the dissipative part of the scheme from the non-dissipative second-order part. The latter part can be described via central differences as in (10), since central differences do not feature dissipation in the leading-order part of the truncation error. Thus, the discretisation of the dilation PDE (6) using central differences only,
$$\bar U^{n+1}_{i,j} = U^n_{i,j} + \max\left( \frac{\tau}{2h_x} D^x_c U^n_{i,j},\; \frac{\tau}{2h_y} D^y_c U^n_{i,j} \right), \tag{19}$$
incorporates no numerical dissipation in the approximation of spatial derivatives. Evaluating the formulae of the corrector step on the predicted data, we identify the high-order part of the predictor formula as
$$C_h\left(\bar U^{n+1}\right) := +\max\left( \frac{\tau}{2h_x} D^x_c \bar U^{n+1}_{i,j},\; \frac{\tau}{2h_y} D^y_c \bar U^{n+1}_{i,j} \right). \tag{20}$$
This term does not appear explicitly in the predictor formula; it becomes visible once we add zero to the predictor, i.e. add and subtract
$$\max\left( \frac{\tau}{2h_x} D^x_c \bar U^{n+1}_{i,j},\; \frac{\tau}{2h_y} D^y_c \bar U^{n+1}_{i,j} \right). \tag{21}$$
Let us now stress that the remaining terms of the predictor formula plus the term due to (20)-(21), with data $\bar U^{n+1}$ as arguments, define the discrete dissipation contribution $C_d$. However, since we want to subtract $C_d$ in (18), we aim at a backward dissipation. Thus, we need to stabilise this contribution with the help of a straightforward extension of the minmod-function from (11) to three arguments:
$$G_{i+1/2,j} := \mathrm{mm}\left( D^x_- \bar U^{n+1}_{i,j},\; \frac{\tau}{2h_x} D^x_+ \bar U^{n+1}_{i,j},\; D^x_+ \bar U^{n+1}_{i+1,j} \right), \tag{22}$$
$$G_{i,j+1/2} := \mathrm{mm}\left( D^y_- \bar U^{n+1}_{i,j},\; \frac{\tau}{2h_y} D^y_+ \bar U^{n+1}_{i,j},\; D^y_+ \bar U^{n+1}_{i,j+1} \right). \tag{23}$$
Note that the left and right arguments in (22)-(23) are meant to prevent overshoots, while the middle argument is determined in accordance with (16). For details concerning this procedure and an analysis of stabilised inverse diffusion, see [26] and [27], respectively. For the dissipative correction term $C_d(\bar U^{n+1})$ we employ the stabilised fluxes from (22)-(23), yielding
$$\delta^{bd}_x \bar U^{n+1}_{i,j} := \frac{\tau}{2h_x} D^x_c \bar U^{n+1}_{i,j} + G_{i+1/2,j} - G_{i-1/2,j}, \tag{24}$$
$$\delta^{bd}_y \bar U^{n+1}_{i,j} := \frac{\tau}{2h_y} D^y_c \bar U^{n+1}_{i,j} + G_{i,j+1/2} - G_{i,j-1/2}, \tag{25}$$
and finally:
$$C_d\left(\bar U^{n+1}\right) = \max\left( \delta^{bd}_x \bar U^{n+1}_{i,j},\; \delta^{bd}_y \bar U^{n+1}_{i,j} \right). \tag{26}$$
3
The FCT-Scheme for General Ellipses
The key to obtaining dilation with a general ellipse is to consider the normal form of an ellipse in the x-y-plane, which can be written for our purpose as
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1. \tag{27}$$
This equation describes the location of the front of the solution of the evolutionary PDE
$$\partial_t u = \sqrt{a^2 (\partial_x u)^2 + b^2 (\partial_y u)^2} \tag{28}$$
at time $t = 1$, starting from the center $(x, y)^\top = (0, 0)^\top$. For $a = b = 1$, one obtains a circle, retrieving a disc as structuring element. Note that a PDE like (28) can be handled easily, whereas directly implementing an algorithm for ellipses with a general orientation of the principal axis poses difficulties. The General Idea. Let us briefly outline the procedure. In order to finally solve the PDE (28), we collect, for each pixel individually, grey values from positions
corresponding to a rotated grid. As these positions will in general not be located exactly at pixel centers, their grey values are not given directly and need to be interpolated. With these interpolated data we solve the PDE (28) pointwise. Having thus described the general procedure, we begin its realisation for general ellipses as structuring elements by implementing a rotation of the coordinate system. For a more detailed explanation, we need to fix some geometric properties of the ellipse defining the structuring element. In order to simplify the presentation, we set $h_x := h_y := 1$. First, let us calibrate the length of the major semi-axis to 1, i.e. the final ellipse is a subset of the unit disc. In order to use a PDE of the form (28), we have to rotate the grid. Note that for $h_x = h_y = 1$, all neighbouring points within the stencil of the Rouy-Tourin scheme (16)-(17) lie on the unit circle if we center it at $(ih_x, jh_y)^\top$. We then rotate the local Euclidean coordinate system centered at $(ih_x, jh_y)^\top$ by an angle $\alpha$ with $0 \le \alpha \le \pi/2$. By elementary trigonometry, the values rotated onto the knots of our finite difference stencil are the grey values at the points $(\cos\alpha_k, \sin\alpha_k)^\top$ relative to $(ih_x, jh_y)^\top$, with $\alpha_k := \alpha + k\pi/2$, $k = 0, 1, 2, 3$. Note that with this procedure, we effectively consider an ellipse whose principal axis makes the angle $-\alpha$ with the x-axis. Let us stress that the restriction $0 \le \alpha \le \pi/2$ still yields all possible ellipses, as we can always switch the roles of $a$ and $b$ in (28), which define the principal axes. Imposing $0 \le \alpha \le \pi/2$ is simply practical, since it helps to give a suitable interpolation formula, which is the next step. We need the rotated grey values at each pixel to define our finite difference scheme. We wish to achieve second-order accuracy because the second-order high-resolution OS-scheme will serve as the comparison scheme for the procedure.
Thus, we use standard bilinear interpolation for this purpose, as its error is formally of the same order. In order to show how the computation works, we now clarify the details for the values in the first quadrant. As $0 \le \alpha \le \pi/2$, the grey value we need at the knot $((i+1)h_x, jh_y)^\top$ is located at $(\cos\alpha_0, \sin\alpha_0)^\top = (\cos\alpha, \sin\alpha)^\top$ relative to $(ih_x, jh_y)^\top$. Because of $h_x = h_y = 1$, we can use the general formula for bilinear interpolation of some function $g$ over the rectangle $[0,1] \times [0,1]$, reading as
$$g(x, y) \approx g(0,0)(1-x)(1-y) + g(1,0)\,x(1-y) + g(0,1)(1-x)\,y + g(1,1)\,xy, \qquad x, y \in [0,1]. \tag{29}$$
Plugging in our values within the first quadrant, we obtain the rotated grey value $\tilde U^n_{i+1,j}$ as
$$\tilde U^n_{i+1,j} := U^n_{i,j}(1-\cos\alpha)(1-\sin\alpha) + U^n_{i+1,j}\cos\alpha\,(1-\sin\alpha) + U^n_{i,j+1}(1-\cos\alpha)\sin\alpha + U^n_{i+1,j+1}\cos\alpha\,\sin\alpha. \tag{30}$$
Analogously, we can compute the other members of our stencil after rotation of our local coordinate system.
The resulting formulae are:
$$\tilde U^n_{i,j} := U^n_{i,j}, \tag{31}$$
$$\tilde U^n_{i,j+1} := U^n_{i,j}(1-\cos\alpha)(1-\sin\alpha) + U^n_{i,j+1}\cos\alpha\,(1-\sin\alpha) + U^n_{i-1,j}(1-\cos\alpha)\sin\alpha + U^n_{i-1,j+1}\cos\alpha\,\sin\alpha, \tag{32}$$
$$\tilde U^n_{i-1,j} := U^n_{i,j}(1-\cos\alpha)(1-\sin\alpha) + U^n_{i-1,j}\cos\alpha\,(1-\sin\alpha) + U^n_{i,j-1}(1-\cos\alpha)\sin\alpha + U^n_{i-1,j-1}\cos\alpha\,\sin\alpha, \tag{33}$$
$$\tilde U^n_{i,j-1} := U^n_{i,j}(1-\cos\alpha)(1-\sin\alpha) + U^n_{i,j-1}\cos\alpha\,(1-\sin\alpha) + U^n_{i+1,j}(1-\cos\alpha)\sin\alpha + U^n_{i+1,j-1}\cos\alpha\,\sin\alpha. \tag{34}$$
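The bilinear formulae (30)-(34) translate directly into code. The sketch below assumes $h_x = h_y = 1$, $0 \le \alpha \le \pi/2$ and an interior pixel, so that all eight neighbours exist; the name `rotated_stencil` is hypothetical.

```python
import math


def rotated_stencil(U, i, j, alpha):
    """Rotated neighbour values of (30)-(34) at an interior pixel (i, j).

    Returns tilde U at (i+1,j), (i,j+1), (i-1,j), (i,j-1), in that order;
    tilde U_{i,j} = U_{i,j} by (31). Assumes h_x = h_y = 1, 0 <= alpha <= pi/2.
    """
    c, s = math.cos(alpha), math.sin(alpha)
    # Bilinear weights of (29) with x = cos(alpha), y = sin(alpha)
    w00 = (1 - c) * (1 - s)
    w10 = c * (1 - s)
    w01 = (1 - c) * s
    w11 = c * s
    u = U[i][j]
    ip = w00 * u + w10 * U[i + 1][j] + w01 * U[i][j + 1] + w11 * U[i + 1][j + 1]  # (30)
    jp = w00 * u + w10 * U[i][j + 1] + w01 * U[i - 1][j] + w11 * U[i - 1][j + 1]  # (32)
    im = w00 * u + w10 * U[i - 1][j] + w01 * U[i][j - 1] + w11 * U[i - 1][j - 1]  # (33)
    jm = w00 * u + w10 * U[i][j - 1] + w01 * U[i + 1][j] + w11 * U[i + 1][j - 1]  # (34)
    return ip, jp, im, jm
```

Two quick sanity checks: for $\alpha = 0$ the original 4-neighbours are returned unchanged, and since the four weights sum to one, bilinear interpolation reproduces any grey-value profile that is linear in $i$ and $j$ exactly.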
In terms of these values, we now give the main formulae for the FCT-scheme. A comparison with (18), (20) and (26) from Section 2.2 shows what needs to be done:
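$$\bar U^{n+1}_{i,j} = U^n_{i,j} + \tau \sqrt{ a^2 \left(\delta^{RT}_x \tilde U^n_{i,j}\right)^2 + b^2 \left(\delta^{RT}_y \tilde U^n_{i,j}\right)^2 },$$
$$U^{n+1}_{i,j} = \bar U^{n+1}_{i,j} + C_h\left(\bar U^{n+1}\right) - C_d\left(\bar U^{n+1}\right), \tag{35}$$
with
$$C_h\left(\bar U^{n+1}\right) = +\frac{\tau}{2h_x} \sqrt{ a^2 \left(D^x_c \bar U^{n+1}_{i,j}\right)^2 + b^2 \left(D^y_c \bar U^{n+1}_{i,j}\right)^2 }, \tag{36}$$
$$C_d\left(\bar U^{n+1}\right) = \sqrt{ a^2 \left(\delta^{bd}_x \bar U^{n+1}_{i,j}\right)^2 + b^2 \left(\delta^{bd}_y \bar U^{n+1}_{i,j}\right)^2 }. \tag{37}$$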
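The predictor line of (35) differs from (17) only in how the two directional RT differences are combined. The following pointwise sketch assumes that the rotated neighbour values from (30)-(34) have already been computed and that $h_x = h_y = h$; the name `rt_ellipse_predictor` and its argument order are hypothetical.

```python
import math


def rt_ellipse_predictor(u, east, north, west, south, a, b, tau, h=1.0):
    """Predictor of (35) at one pixel.

    u is U^n_{i,j}; east, north, west, south are the rotated values
    tilde U^n at (i+1,j), (i,j+1), (i-1,j), (i,j-1) from (30)-(34).
    """
    # Rouy-Tourin upwind differences (16) on the rotated stencil
    dx = max(max(-(u - west) / h, 0.0), max((east - u) / h, 0.0))
    dy = max(max(-(u - south) / h, 0.0), max((north - u) / h, 0.0))
    # Weighted Euclidean norm from the ellipse PDE (28)
    return u + tau * math.sqrt(a * a * dx * dx + b * b * dy * dy)
```

With $a = 1$, $b = 0.25$ and $\tau = 0.5$, a unit jump along the first rotated axis raises the value by $0.5$, whereas the same jump along the second axis raises it only by $0.125$, reflecting the slower front in that direction.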
Note that for the arguments of the minmod-function used within $C_d(\bar U^{n+1})$, we also need to compute rotated grey values $\tilde U^{n+1}_{i\pm2,j\pm2}$ from the data set $\bar U^{n+1}$. This can be done in the same fashion as in (30)-(34). Also note that, because of the calibration of the ellipses, we always have $a = 1$ and $b \in [0, 1]$ in our experiments.
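The minmod arguments mentioned above enter through the stabilised fluxes (22)-(23) and the terms (24)-(25). The sketch below evaluates them along one grid line of (possibly rotated) predicted data and combines two directional terms as in (37). It assumes $h_x = h_y = h$ and that indices $i-2, \ldots, i+2$ are valid; all names are hypothetical.

```python
import math


def mm3(a, b, c):
    """Three-argument minmod, the extension of (11) used in (22)-(23)."""
    if a > 0 and b > 0 and c > 0:
        return min(a, b, c)
    if a < 0 and b < 0 and c < 0:
        return max(a, b, c)
    return 0.0


def flux(v, i, tau, h=1.0):
    """Stabilised flux G_{i+1/2} of (22) along one grid line v of predicted data."""
    return mm3(v[i] - v[i - 1],                      # D_- v_i
               tau / (2.0 * h) * (v[i + 1] - v[i]),  # (tau / 2h) D_+ v_i
               v[i + 2] - v[i + 1])                  # D_+ v_{i+1}


def delta_bd(v, i, tau, h=1.0):
    """delta^bd of (24): central part plus difference of stabilised fluxes."""
    return (tau / (2.0 * h) * (v[i + 1] - v[i - 1])
            + flux(v, i, tau, h) - flux(v, i - 1, tau, h))


def c_d_ellipse(dx_bd, dy_bd, a, b):
    """Dissipative correction (37); the diamond case (26) uses max(dx_bd, dy_bd)."""
    return math.sqrt(a * a * dx_bd ** 2 + b * b * dy_bd ** 2)
```

On a linear ramp the two fluxes in (24) cancel, so only the central part remains; near overshoots, the outer minmod arguments switch the flux off.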
4
Numerical Experiments
The main disadvantage of PDE-based algorithms is the occurrence of dissipative discretisation artefacts. The resulting blurring is especially noticeable at the edges of dilated/eroded objects. The Diamond Experiment. In this experiment, we solve the dilation PDE (6), comparing the FCT-scheme with the set-theoretical approach. For convenience, we always employ $h_x = h_y = 1$. For the fully discrete, set-theoretical approach, we employ the usual 5-point structuring element centered at $(0, 0)^\top$ with vertices
$$(1, 0)^\top,\; (0, 1)^\top,\; (-1, 0)^\top,\; (0, -1)^\top. \tag{38}$$
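For reference, the set-based counterpart can be sketched as repeated maximum filtering over the 5-point element (38): $k$ iterations dilate with a diamond of radius $k$. This minimal sketch simply ignores neighbours outside the image; the name `diamond_dilation` is hypothetical.

```python
def diamond_dilation(U, iterations):
    """Set-theoretical dilation: each iteration takes the maximum over the
    5-point structuring element (38), i.e. the centre and its 4-neighbours."""
    nx, ny = len(U), len(U[0])
    for _ in range(iterations):
        V = [row[:] for row in U]
        for i in range(nx):
            for j in range(ny):
                best = U[i][j]
                for di, dj in ((1, 0), (0, 1), (-1, 0), (0, -1)):
                    k, l = i + di, j + dj
                    if 0 <= k < nx and 0 <= l < ny:  # skip neighbours outside the image
                        best = max(best, U[k][l])
                V[i][j] = best
        U = V
    return U
```

In the experiment below, 50 such iterations are compared with 100 FCT steps of size $\tau = 0.5$, so both correspond to the scale $t = 50$.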
In Figure 1, we observe the outcome of this experiment, where we have inverted the grey values. As input image, we use an image of size 129 × 129, where we
Fig. 1. Comparison of dilation with a diamond using inverted grey values. Top. Left: Initial image (a). Right: The set-based result (b). Bottom. Left: The FCT-result (c). Right: Scaled difference (d). The average difference visualised in (d) is of the grey value 1.502.
have exactly one pixel in the center of the image that is dilated. We perform 100 time steps with $\tau = 0.5$ for dilation with FCT, and 50 iterations with the set-based algorithm, so that both correspond to the same final scale $t = 50$. We observe that the FCT-result (c) is visually convincing, with sharp diamond edges. Compared with the set-based result (b), there is some difference at the edges, which can be seen in the scaled difference map (d). The average (unscaled) difference amounts to a grey value of 1.502. However, note that the solution of the PDE is digitally scalable, so the set-based result is not meant to be the true solution of the dilation PDE. The Ellipse Experiment. We now show computational results for ellipses as structuring elements. To first give an impression of the quality one may expect, we consider an ellipse whose principal axis is aligned with the grid. For this experiment, we define the structuring element via $a = 1$, $b = 0.25$, cf. (27). For the numerical experiment, we use $\tau = 0.5$, and we perform 100 time steps. The results of the OS-scheme together with the result of the
Fig. 2. Dilation comparison with inverted grey values. Left: OS-result (a). Right: FCT-result (b).
Fig. 3. Dilation comparison with inverted grey values. Rotated ellipses with (top) α = 0.6 and (bottom) α = 0.9. Bilinear interpolation was used to rotate the grid in each time step. Left column: OS-results. Right column: FCT-results.
FCT-scheme are displayed in Figure 2. While the result of the OS-scheme is quite blurry, the FCT-scheme shows a mixed behaviour: the left and right fronts, which travel with the largest signal velocity in this example, are well resolved, whereas there is some blurring at the slowly moving upper and lower parts of the edge of the ellipse. Let us now consider the rotated case employing bilinear interpolation. In a setting analogous to the non-rotated case, but with $a = 1$, $b = 0.25$ and $\alpha = 0.6$ (top row) or $\alpha = 0.9$ (bottom row), we obtain the results displayed in Figure 3. Due to the interpolation, both schemes show some additional blurring; however, the qualitative relationship between the results of the two schemes is the same as in the non-rotated case.
5
Conclusion and Outlook
The main message of this paper is twofold:
– The FCT-methodology is readily applicable in the context of general structuring elements. Apart from suitable interpolation formulae, it requires no more technical effort than in the case of a disc-shaped structuring element.
– With respect to edge resolution, the FCT-results are of better quality than the results of the other schemes considered here.
The current paper represents one of the most advanced numerical approaches to continuous-scale morphology. In future work, we aim to improve the quality of numerical schemes in this field even further.
References
1. Serra, J.: Echantillonnage et estimation des phénomènes de transition minier. PhD thesis, University of Nancy, France (1967)
2. Matheron, G.: Eléments pour une théorie des milieux poreux. Masson, Paris (1967)
3. Boris, J.P., Book, D.L.: Flux corrected transport. I. SHASTA, a fluid transport algorithm that works. Journal of Computational Physics 11(1), 38–69 (1973)
4. Matheron, G.: Random Sets and Integral Geometry. Wiley, New York (1975)
5. Boris, J.P., Book, D.L.: Flux corrected transport. III. Minimal error FCT algorithms. Journal of Computational Physics 20, 397–431 (1976)
6. Serra, J.: Image Analysis and Mathematical Morphology, vol. 1. Academic Press, London (1982)
7. Hairer, E., Norsett, S., Wanner, G.: Solving Ordinary Differential Equations. I: Nonstiff Problems. Springer Series in Computational Mathematics, vol. 8. Springer, New York (1987)
8. Serra, J.: Image Analysis and Mathematical Morphology, vol. 2. Academic Press, London (1988)
9. Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton–Jacobi formulations. Journal of Computational Physics 79, 12–49 (1988)
10. Brockett, R.W., Maragos, P.: Evolution equations for continuous-scale morphology. In: Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, San Francisco, CA, March 1992, vol. 3, pp. 125–128 (1992)
11. van den Boomgaard, R.: Mathematical Morphology: Extensions Towards Computer Vision. PhD thesis, University of Amsterdam, The Netherlands (1992)
12. Rouy, E., Tourin, A.: A viscosity solutions approach to shape-from-shading. SIAM Journal on Numerical Analysis 29, 867–884 (1992)
13. Sapiro, G., Kimmel, R., Shaked, D., Kimia, B.B., Bruckstein, A.M.: Implementing continuous-scale morphology via curve evolution. Pattern Recognition 26(9), 1363–1372 (1993)
14. Alvarez, L., Guichard, F., Lions, P.L., Morel, J.M.: Axioms and fundamental equations in image processing. Archive for Rational Mechanics and Analysis 123, 199–257 (1993)
15. Arehart, A.B., Vincent, L., Kimia, B.B.: Mathematical morphology: The Hamilton–Jacobi connection. In: Proc. Fourth International Conference on Computer Vision, Berlin, pp. 215–219. IEEE Computer Society Press, Los Alamitos (1993)
16. Siddiqi, K., Kimia, B.B., Shu, C.W.: Geometric shock-capturing ENO schemes for subpixel interpolation, computation and curve evolution. Graphical Models and Image Processing 59, 278–301 (1997)
17. Heijmans, H.J.A.M., Roerdink, J.B.T.M. (eds.): Mathematical Morphology and its Applications to Image and Signal Processing. Computational Imaging and Vision, vol. 12. Kluwer, Dordrecht (1998)
18. Soille, P.: Morphological Image Analysis, 2nd edn. Springer, Berlin (2003)
19. van den Boomgaard, R.: Numerical solution schemes for continuous-scale morphology. In: Nielsen, M., Johansen, P., Olsen, O.F., Weickert, J. (eds.) Scale-Space 1999. LNCS, vol. 1682, pp. 199–210. Springer, Heidelberg (1999)
20. Sethian, J.A.: Level Set Methods and Fast Marching Methods, 2nd edn. Cambridge University Press, Cambridge (1999)
21. Goutsias, J., Vincent, L., Bloomberg, D.S. (eds.): Mathematical Morphology and its Applications to Image and Signal Processing. Computational Imaging and Vision, vol. 18. Kluwer, Dordrecht (2000)
22. Talbot, H., Beare, R. (eds.): Proc. Sixth International Symposium on Mathematical Morphology and its Applications, Sydney, Australia (April 2002), http://www.cmis.csiro.au/ismm2002/proceedings/
23. LeVeque, R.J.: Finite Volume Methods for Hyperbolic Problems. Cambridge University Press, Cambridge (2002)
24. Osher, S., Fedkiw, R.P.: Level Set Methods and Dynamic Implicit Surfaces. Applied Mathematical Sciences, vol. 153. Springer, New York (2002)
25. Ronse, C., Najman, L., Decencière, E. (eds.): Mathematical Morphology: 40 Years On. Computational Imaging and Vision, vol. 30. Springer, Dordrecht (2005)
26. Breuß, M., Weickert, J.: A shock-capturing algorithm for the differential equations of dilation and erosion. Journal of Mathematical Imaging and Vision 25, 187–201 (2006)
27. Breuß, M., Welk, M.: Analysis of staircasing in semidiscrete stabilised inverse linear diffusion algorithms. Journal of Computational and Applied Mathematics 206, 520–533 (2007)
Computational Geometry-Based Scale-Space and Modal Image Decomposition: Application to Light Video-Microscopy Imaging
Anatole Chessel$^{1}$, Bertrand Cinquin$^{3,4}$, Sabine Bardin$^{3}$, Jean Salamero$^{3,5}$, and Charles Kervrann$^{1,2}$
$^{1}$ INRIA Rennes
$^{2}$ INRA-MIA
$^{3}$ UMR 144 CNRS-Institut Curie
$^{4}$ Soleil Synchrotron
$^{5}$ PICT-IBiSA Institut Curie
Abstract. In this paper, a framework for defining scale-spaces based on the computational geometry concept of α-shapes is proposed. In this approach, objects (curves or surfaces) of increasing convexity are computed by selective sub-sampling, from the original shape to its convex hull. The relationships with the Empirical Mode Decomposition (EMD), the curvature motion-based scale-space and some operators from mathematical morphology are studied. Finally, we address the problem of additive image/signal decomposition in fluorescence video-microscopy. An image sequence is mainly considered as a collection of 1D temporal signals, each pixel being associated with its temporal intensity variation.
1
Introduction
Vision is a complex and hierarchical process of aggregation and reconstruction, going from pointwise data to global information. The scale-space approach, which builds several versions of the original signal at increasingly coarser scales, is a general framework for investigating those hierarchies. In this paper, we explore how to derive such a scale-space from a computational geometry point of view, based on the convexity of signals/images in space or time. To our knowledge, the tools from computational geometry [7], commonly used in image synthesis and 3D modeling, are rarely applied to raster images and computer vision. Typically, a 2D image may be viewed as a collection of sampled points on a surface in $\mathbb{R}^3$, and may be represented by computational geometry objects using projection and interpolation operators, as we shall see in Section 2. The notion of α-shape [8], which roughly corresponds to the "convex hull up to √α-sized details", was introduced to represent 3D objects from given unorganized point clouds in $\mathbb{R}^3$. In our approach, it is applied to known objects (signals or images) to define the so-called α-scale-space. In some way, this modeling can be thought of as a continuous variant of the Empirical Mode Decomposition (EMD) introduced earlier for signal decomposition in [10].
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 770–781, 2009.
© Springer-Verlag Berlin Heidelberg 2009
Computational Geometry-Based Scale-Space
The relationships between the proposed scale-space and previous mathematical frameworks are given in Section 3. Motion by curvature is examined in particular, since it constitutes a possible interpolation operator needed in our approach. An equivalence with the usual opening operators from mathematical morphology is also presented. In Section 4, the methodology is demonstrated on a common decomposition problem in biological imaging. In some circumstances, the studied fluorescence video-microscopy sequences can be viewed as the sum of a diffusing component (slowly varying in space and time) and a very localized, faster moving component. The proposed algorithm is mainly used to analyze 1D temporal signals extracted from a temporal series of images. The implementation is now routinely used by biologists and collaborators because of its speed of execution and its simplicity of control.
2
Computational Geometry-Based Scale-Space for Curves
This section is devoted to the description of a new morphological scale-space based on computational geometry and the α-shape concept [8]. The theory is laid out for $\mathbb{R}^d$, but is mainly applied to signals and images, that is, to α-shapes in $\mathbb{R}^2$ and $\mathbb{R}^3$. An analogy with EMD is also discussed in this section.
2.1
α-Shapes and Convexity
The key ingredients of computational geometry (see Fig. 1) are simplices, that is, points, segments, triangles and tetrahedra. Sets of simplices form complexes.
Definition 1
– A k-simplex $\sigma_T$ is the set of convex combinations of a set $T$ of $k+1$ points in $\mathbb{R}^d$ (k-simplices are points ($k = 0$), segments ($k = 1$), triangles ($k = 2$) or tetrahedra ($k = 3$)).
– A complex $K$ is a set of simplices satisfying: i) if $\sigma_T \in K$, then $\sigma_U \in K$ for all $U \subseteq T$; ii) if $\sigma_U, \sigma_V \in K$, then $\sigma_{U \cap V} = \sigma_U \cap \sigma_V$.
– A triangulation of a set $P$ of points in $\mathbb{R}^d$ is a connected complex in which every k-simplex, $0 \le k < d$, is included in a (k+1)-simplex.
– A Delaunay triangulation of a set $P$ of points in $\mathbb{R}^d$ is a triangulation in which the circumscribed sphere of every d-simplex contains no point of $P$ in its interior. The triangulation is unique if the points are in general position. The triangulation is a tessellation of the convex hull of $P$.
– A filtration of a complex $K$ is a nested sequence of complexes included in $K$: $\emptyset = K^0 \subseteq K^1 \subseteq \ldots \subseteq K^m = K$.
In Fig. 1, the Delaunay triangulation of a point set is shown in blue and the original points are shown in red.
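The empty-circumsphere property in Definition 1 can be checked directly for $d = 2$. The following brute-force sketch enumerates all triangles whose circumcircle contains no other point strictly inside; it assumes points in general position and costs $O(n^4)$, so it is only meant for tiny illustrative sets (the implementation described in Section 2.4 relies on CGAL [2] instead). The names `circumcircle` and `delaunay_brute_force` are hypothetical.

```python
from itertools import combinations


def circumcircle(a, b, c, eps=1e-12):
    """Centre and squared radius of the circle through a, b, c (None if collinear)."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < eps:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay_brute_force(points, eps=1e-12):
    """All triangles (index triples) whose circumcircle has no other point strictly inside."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        cc = circumcircle(points[i], points[j], points[k], eps)
        if cc is None:
            continue
        (ux, uy), r2 = cc
        empty = all((px - ux) ** 2 + (py - uy) ** 2 >= r2 - eps
                    for m, (px, py) in enumerate(points) if m not in (i, j, k))
        if empty:
            triangles.append((i, j, k))
    return triangles
```

For the four points $(0,0)$, $(2,0)$, $(0,2)$, $(3,3)$ this returns the two triangles sharing the edge between $(2,0)$ and $(0,2)$; the alternative diagonal is rejected because each of its triangles has another point inside its circumcircle.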
A. Chessel et al.
The notion of α-shape is based on Delaunay triangulations and amounts to selecting points "with enough empty space around them". More formally:
Definition 2 (α-shapes). Let $P$ be a set of points of $\mathbb{R}^d$ and $T_P$ its Delaunay triangulation. Let $\sigma_T$ be a simplex of $T_P$, $c_T$ the smallest sphere going through $T$ and $\rho_T$ its radius. For $\alpha \ge 0$, $\sigma_T$ belongs to the α-complex $K_\alpha$ of $P$ if and only if either of these propositions is true:
1. $\rho_T^2 < \alpha$ and $c_T$ does not include any other point of $P$;
2. $\sigma_T \subset \sigma_U$ with $\sigma_U \in K_\alpha$.
The subset of $\mathbb{R}^d$ actually covered by the α-complex (the underlying space) is called the α-shape $S_\alpha$ of $P$. In general, the α-complex is not pure, i.e. it may contain isolated points or segments not included in a triangle. Therefore, regularized α-shapes are also defined:
Definition 3 (regularized α-shapes). Let $P$ be a set of points of $\mathbb{R}^d$, $T_P$ its Delaunay triangulation and $K_\alpha$ its α-complex, $\alpha \ge 0$. The regularized α-complex $\tilde K_\alpha$ of $P$ is the largest complex (w.r.t. inclusion) included in $K_\alpha$ for which every k-simplex, $0 \le k < d$, is included in a (k+1)-simplex. The regularized α-shape $\tilde S_\alpha$ is the underlying space of the regularized α-complex.
The following basic properties hold for both the α-shapes and the regularized α-shapes defined above:
Property 1
– There exists a finite number of values of α leading to distinct α-complexes and α-shapes.
– If $\alpha = 0$, $K_\alpha = S_\alpha = P$.
– There exists a finite value $\alpha_M$ such that $K_\alpha = T_P$ for all $\alpha \ge \alpha_M$.
– The sequence $\{K_\alpha\}$, $0 \le \alpha \le \alpha_M$, is a filtration of $T_P$.
Intuitively, the idea is to define the "convex-up-to-√α" hull of a set of points. For a given α, the α-shape consists of the points "with enough empty space around them", and the resulting shape is made of √α-sized concavities. In Fig. 1, the α-shape (for a given α) is shown in green; it is a subset (according to a size criterion) of the original Delaunay triangulation (in blue) computed from the initial set of points (in red).
2.2
α-Scale-Space Framework
In this section, we use the α-shape concept to define original scale-spaces. Let $u : \mathbb{R}^n \to \mathbb{R}$ be a continuous function, $n = 1$ or $2$. Let $d = n + 1$, and let $P = \{(x_1, \ldots, x_n, u(x_1, \ldots, x_n)) \in \mathbb{R}^d \mid (x_1, \ldots, x_n) \in \mathbb{N}^n\}$ be the finite set of points of $\mathbb{R}^{n+1} = \mathbb{R}^d$ corresponding to samples of $u$ at integer coordinates. Let $\tilde P_\alpha$, $\alpha \ge 0$, be the regularized α-shapes of $P$ in $\mathbb{R}^d$. Considering sampled points in $\mathbb{R}^{n+1}$ allows us to pass from the continuous to the computational geometry setting. To go back from a computational geometry object in (n+1)D to a continuous function on $\mathbb{R}^n$, two tools are needed: a projection operator to obtain a single-valued object in nD, and an interpolation operator for continuity.
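For a 1D signal ($n = 1$, so $d = 2$), the triangles of the regularized α-complex can be selected with very little code. Every Delaunay triangle already has an empty circumcircle, so for triangles condition 1 of Definition 2 reduces to $\rho_T^2 < \alpha$, and the regularized α-shape is the union of the retained triangles. The sketch assumes the Delaunay triangles are already available as index triples (the paper uses CGAL [2] for this step); the names `circumradius_sq` and `alpha_triangles` are hypothetical.

```python
def circumradius_sq(a, b, c):
    """Squared circumradius rho_T^2 of the triangle abc (infinite if degenerate)."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0.0:
        return float("inf")
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ax - ux) ** 2 + (ay - uy) ** 2


def alpha_triangles(points, delaunay_triangles, alpha):
    """Triangles of the regularized alpha-complex for d = 2: the Delaunay
    triangles with rho_T^2 < alpha (their circumcircles are empty by construction)."""
    return [t for t in delaunay_triangles
            if circumradius_sq(*(points[k] for k in t)) < alpha]
```

Increasing α only ever adds triangles, which is the filtration property of Property 1.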
Definition 4 (lower envelope). The lower envelope of a set of functions $H = \{H_1, \ldots, H_n\}$ defined on $R_1, \ldots, R_n \subset \mathbb{R}^n$ is the pointwise minimum over $H$: $L_H = \min_{1 \le i \le n} H_i$.
Thus, $P^+_\alpha$ and $P^-_\alpha$ are selective sub-samplings of $u$ based on a local convexity analysis. By interpolating these two sets of points, we can define sets of functions corresponding to the original function analyzed at increasing scales. Let $\Omega$ be an open subset of $\mathbb{R}^n$ and $f : \partial\Omega \to \mathbb{R}$ values defined on its boundary. Let $I$ be an interpolation operator that associates a unique function on $\Omega$ with $f$ defined on $\partial\Omega$. In what follows, $I$ is assumed to satisfy the maximum principle: $\max_{\partial\Omega} f \ge \max_\Omega I(f) \ge \min_\Omega I(f) \ge \min_{\partial\Omega} f$. We can now define:
Definition 6. Let $I$ be an interpolation operator satisfying the maximum principle. Considering a point $P = (x_1, x_2, x_3)$ as a function on $\mathbb{R}^2$, $f(x_1, x_2) = x_3$, we define the upper and lower α-scale-spaces as $u^+_\alpha = I(P^+_\alpha)$ and $u^-_\alpha = I(P^-_\alpha)$. The α-scale-space of $u$ is then defined as $u_\alpha = \frac{u^+_\alpha + u^-_\alpha}{2}$.
In practice, various interpolation operators can be used, depending on the application. In 1D, linear interpolation is the most common choice. The question is more difficult in 2D, and more details are given in the following sections. From the basic properties of α-shapes explained above, we obtain the following basic properties of α-scale-spaces:
Property 2
– For $\alpha = 0$, $u^+_\alpha = u^-_\alpha = u_\alpha = u$.
– For $\alpha > \alpha_M$, $u^+_\alpha$ and $u^-_\alpha$ are respectively the upper and lower convex hulls of $u$.
– For all α, $u^+_\alpha \ge u \ge u^-_\alpha$.
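The limit case $\alpha > \alpha_M$ of Property 2 can be sketched for a 1D signal with linear interpolation as $I$: the lower and upper convex hulls are computed with Andrew's monotone-chain algorithm and then interpolated back onto the sample grid. The finite-α case requires the α-complex itself and is not covered here; all function names are hypothetical.

```python
from bisect import bisect_right


def _cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_hull(points):
    """Lower convex hull of 2D points (Andrew's monotone chain), sorted by x."""
    hull = []
    for p in sorted(points):
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def interp(knots, xs):
    """Piecewise-linear interpolation through knots [(x, y), ...] sorted by x."""
    kx = [x for x, _ in knots]
    out = []
    for x in xs:
        k = min(max(bisect_right(kx, x) - 1, 0), len(knots) - 2)
        (x0, y0), (x1, y1) = knots[k], knots[k + 1]
        out.append(y0 + (x - x0) / (x1 - x0) * (y1 - y0))
    return out


def convex_hull_scale_space(u):
    """Limit alpha > alpha_M of Property 2 for a 1D signal u (at least 2 samples).
    Returns (u_plus, u_minus, u_alpha) with linear interpolation as I."""
    xs = list(range(len(u)))
    pts = list(zip(xs, u))
    lo = lower_hull(pts)
    # Upper hull = mirrored lower hull of the mirrored signal
    up = [(x, -y) for x, y in lower_hull([(x, -y) for x, y in pts])]
    u_minus = interp(lo, xs)
    u_plus = interp(up, xs)
    u_alpha = [(p + m) / 2 for p, m in zip(u_plus, u_minus)]
    return u_plus, u_minus, u_alpha
```

For $u = (0, 3, 1, 4, 0)$ this gives $u^+ = (0, 3, 3.5, 4, 0)$ and $u^- = (0, 0, 0, 0, 0)$, so the ordering $u^+_\alpha \ge u \ge u^-_\alpha$ of Property 2 can be checked directly.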
Properties of invariance (or covariance) are also of particular interest for scale-spaces. Because the α-shape depends only on the distances between points, it is invariant to isometric transformations and defined up to a scale factor.
Property 3 (Invariance). $u^+_\alpha$, $u^-_\alpha$ and $u_\alpha$ are invariant to similarities in $\mathbb{R}^d$.
This includes rotation and translation (for images), gray-level shift and contrast inversion. On the other hand, invariance to generic or even increasing contrast changes does not hold: such a transformation would treat one direction independently of the others and would thus break the isotropy of the underlying algorithms. More invariance properties might be obtained by formulating the computational geometry framework with other metrics, but this is beyond the scope of this paper.
2.3
Relationships with the Empirical Mode Decomposition
The EMD algorithm was introduced by Huang [10] for highly non-stationary and non-linear 1D physical signals, and has since been extended to 2D images [11]. A modal decomposition of signals is obtained by applying an iterative and intuitive algorithm, which has recently received a more formal justification within the wavelet framework [9]. Briefly, the EMD algorithm is composed of two loops. In the inner loop, the upper and lower envelopes are computed by interpolating the local maxima and minima, respectively, and are then used to build a "mean envelope". The difference between the signal and the "mean envelope" yields a component, called the IMF (Intrinsic Mode Function), and a residual signal without the highest frequencies. The computation of the "mean envelope" and the subtraction are iterated in the outer loop until the residual is a monotonic trend. Ultimately, we have $u = \sum_{i=1}^{n} c_i + r$, where the modes $\{c_i\}$ capture successively lower frequencies and $r$ represents the residual component. In the terminology of scale-spaces, we rather write $u_k = u - \sum_{i=1}^{k} c_i$, $k \in \{1, \ldots, n\}$, with $u_0 = u$ and $u_{n+1} = r$ by convention, to yield an increasingly coarse description of the original signal. Thus, EMD can be seen as a discrete scale-space or, as the number of modes is relatively small, as a scale-space with a limited number of automatically selected scales. Our proposed α-scale-space is consistent with this framework. The common ideas are modes corresponding to variations around a local mean, and local maxima/minima used to compute upper and lower envelopes. Nevertheless, the two approaches differ in two respects: i) the more continuous nature of the α-shape decomposition, which considers a larger number of modes and scales; ii) the possibility of computing non-symmetric modes with α-shapes. The IMFs computed by EMD are symmetric components with respect to the computed mean, which was desired in [10]. In image analysis, since images are bounded from below, this constraint can be relaxed.
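One inner-loop (sifting) iteration of EMD, as described above, can be sketched as follows. Huang's original EMD interpolates the extrema with cubic splines; this sketch deliberately uses linear envelopes instead, to match the linear interpolation used for the 1D α-scale-space, and it simply adds both signal endpoints as knots of both envelopes. Both choices are simplifying assumptions, and the names `interp` and `sift_once` are hypothetical.

```python
from bisect import bisect_right


def interp(knots, xs):
    """Piecewise-linear interpolation through knots [(x, y), ...] sorted by x."""
    kx = [x for x, _ in knots]
    out = []
    for x in xs:
        k = min(max(bisect_right(kx, x) - 1, 0), len(knots) - 2)
        (x0, y0), (x1, y1) = knots[k], knots[k + 1]
        out.append(y0 + (x - x0) / (x1 - x0) * (y1 - y0))
    return out


def sift_once(u):
    """One EMD sifting iteration with linear (not cubic-spline) envelopes.

    Endpoints are added as knots of both envelopes (a simplifying assumption).
    Returns (candidate IMF, mean envelope)."""
    n = len(u)
    xs = list(range(n))
    maxima = [i for i in range(1, n - 1) if u[i - 1] < u[i] > u[i + 1]]
    minima = [i for i in range(1, n - 1) if u[i - 1] > u[i] < u[i + 1]]
    upper = interp([(i, u[i]) for i in [0] + maxima + [n - 1]], xs)
    lower = interp([(i, u[i]) for i in [0] + minima + [n - 1]], xs)
    mean = [(p + q) / 2 for p, q in zip(upper, lower)]
    return [ui - mi for ui, mi in zip(u, mean)], mean
```

Unlike the lower α-scale-space, the mean envelope here always lies halfway between the two envelopes, which is why the resulting IMF is symmetric around zero.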
The relationship between our scale-space formulation and the modal decomposition provided by EMD is nevertheless worth keeping in mind. Section 4 illustrates these connections, especially when lower α-scale-spaces are considered.

2.4 Implementation and First Examples
In what follows, the CGAL implementation of the computational geometry algorithms is used [2]. In the 1D case, a simple linear interpolation is performed. For 2D images, interpolation is more problematic and a scattered-data interpolation framework must be considered (see [1,6] for reviews). Natural neighbor interpolation [3] is used, as it is defined within the same computational geometry framework available in CGAL. Other possible choices are discussed in the next section.
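In 1D, the selection-plus-interpolation pipeline can be sketched in a few lines of Python. The point test below is a simplifying assumption, not the CGAL α-shape computation used here: a sample is kept when the disk of radius $\sqrt{\alpha}$ centred directly below it contains no other sample, which only approximates the "empty space below" criterion well when the local slope of the signal is small compared with $1/(2\sqrt{\alpha})$. The kept samples are then joined by linear interpolation, as in the 1D case described above. The function names are hypothetical.

```python
import numpy as np


def lower_alpha_points(x, u, alpha):
    """Indices of samples with an empty disk of radius sqrt(alpha) right below them.

    Simplifying assumption: only the disk centred at (x_i, u_i - sqrt(alpha)) is
    tested, not every disk that touches the sample from below.
    """
    r = np.sqrt(alpha)
    keep = []
    for i in range(len(x)):
        d2 = (x - x[i]) ** 2 + (u - (u[i] - r)) ** 2
        d2[i] = np.inf                  # ignore the sample itself
        if np.all(d2 >= r * r):         # no other sample strictly inside the disk
            keep.append(i)
    return np.array(keep, dtype=int)


def lower_alpha_scale_space(x, u, alpha):
    """Linear interpolation of the selected samples (the 1D case)."""
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    idx = lower_alpha_points(x, u, alpha)   # never empty: the global minimum is kept
    return np.interp(x, x[idx], u[idx])
```

On a linear trend with one isolated spike, for instance, the spike is discarded by the selection, so `u - lower_alpha_scale_space(x, u, alpha)` isolates the peak, in the spirit of Fig. 1.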
Computational Geometry-Based Scale-Space
Fig. 1. Intensity variation w.r.t. time (for a given pixel) in video-microscopy (see Section 4). Red: original points (see also Fig. 4); blue: Delaunay triangulation; green: α-shapes.
Fig. 2. Example of α-scale-space applied to a 2D otolith image (see text). Left to right: original, α-scale-space for α = 100, image difference.
Figure 1 shows a 1D example in video-microscopy, i.e. the intensity variations of a given pixel w.r.t. time (see Section 4 for details). The two peaks, corresponding to two objects passing through that pixel, are of interest. The figure illustrates the Delaunay triangulation of a set of points and its relationship with the α-shape concept. The lower α-scale-space is the green curve below the red points, so the difference with the original curve recovers the peaks. Figure 2 shows a 2D example of the α-scale-space decomposition applied to an otolith image. Otoliths are biological hard tissues (a few mm in size [12]) widely used in marine biology and ecology. The α-scale-space corresponds to the intensity trend, while the difference between the trend and the original image corresponds to the variations around that trend.
3 Interpolation and α-Shapes: Mean Curvature Motion and Mathematical Morphology
As defined, the α-scale-space is based on selective sub-sampling and interpolation. In this section, we investigate several interpolation operators, mainly the so-called mean curvature motion and some tools from mathematical morphology.

3.1 α-Shape and Curvature Motion
Motion by curvature is a partial differential equation (PDE) commonly used in image analysis: curves evolve with a speed and a direction that depend on the local curvature. If an image is represented as a stack of non-intersecting level
A. Chessel et al.
lines, each level line evolves according to its own curvature. Evolving an image according to the mean curvature motion is known to decrease the total variation of the image. Finally, an image is also a 3D surface and may be deformed according to its local curvatures in $\mathbb{R}^3$. In [6], an axiomatic approach to image interpolation was studied, singling out three second-order interpolation operators: i) the Laplacian operator, which is known not to handle isolated data points and is thus not suitable for scattered point interpolation; ii) the Absolutely Minimizing Lipschitz Extension (AMLE), which assumes a Lipschitz initialization and is thus not suitable for scattered point interpolation either; iii) the curvature operator, for which, when used as an interpolation operator, neither uniqueness nor existence of solutions holds. In [16], L. Vese proposed a mean curvature motion based on PDEs for the computation of convex hulls of signals and images, and proved existence and uniqueness of viscosity solutions:

$$\frac{\partial u}{\partial s} = \sqrt{1 + |Du|^2}\,\min\big(0, \lambda_{\min}(D^2 u)\big), \qquad u(x, 0) = u(x),$$

where $\lambda_{\min}(D^2 u)$ is the smallest eigenvalue of the Hessian matrix. This is an alternative way of computing a convexity-based scale-space, yielding a family of functions of increasing convexity up to the convex hull. Two flows, converging respectively toward the convex and concave hulls, are then defined. These flows can be used in a similar way to the upper and lower α-scale-spaces defined earlier, that is, as two envelopes enclosing the original function. However, the relationship between the two frameworks cannot easily be made explicit. In the particular case of a convex function u, the PDE can be used as an interpolation operator of $P_\alpha^-$, which amounts to using the convex hull of $P_\alpha^-$. In the general case, a tessellation of u into convex components would be needed.
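In 1D, $\lambda_{\min}(D^2u)$ reduces to $u''$, and the flow reads $u_s = \sqrt{1 + u'^2}\,\min(0, u'')$. The following explicit finite-difference sketch is our own discretization, not taken from [16]: it keeps the two endpoints fixed and uses a heuristic time step. Since the update only lowers the signal where it is locally concave, convex parts are left untouched and the rest is pushed toward the convex hull.

```python
import numpy as np


def vese_convexify_1d(u, h=1.0, n_iter=2000, cfl=0.4):
    """Explicit scheme for u_s = sqrt(1 + u_x^2) * min(0, u_xx), endpoints fixed."""
    u = np.asarray(u, dtype=float).copy()
    for _ in range(n_iter):
        ux = (u[2:] - u[:-2]) / (2.0 * h)                 # centred first derivative
        uxx = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / h ** 2   # second derivative
        speed = np.sqrt(1.0 + ux ** 2)
        dt = cfl * h ** 2 / speed.max()                   # heuristic stability bound
        # Only concave parts (uxx < 0) move, and they can only move down.
        u[1:-1] += dt * speed * np.minimum(0.0, uxx)
    return u
```

For example, a sampled parabola $x^2$ is returned unchanged, while $-x^2$ on $[-1, 1]$ sinks toward the straight line joining its endpoints.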
3.2 α-Shape and Mathematical Morphology
Pioneered by Serra and Matheron, mathematical morphology [15] is one of the most classical theories in image analysis. In scale-space theory, it has been shown to be related to the mean curvature motion and related filters [5]. Typically, the gray-scale opening of an image is defined as a gray-scale erosion followed by a gray-scale dilation. Let u be identified with the part of $\mathbb{R}^d$ below its graph (in $\mathbb{R}^3$ for images: $u = \{x \in \mathbb{R}^3 \mid x_3 < u(x_1, x_2)\}$). Its opening can be written as $O(u) = \sup\{B \in \mathcal{B},\ B \subset u\}$, with $\mathcal{B}$ the class, closed under union, generated by a structuring element $B_0$. Thus, if the structuring element $B_0$ is a ball of radius r, $O_r(u)$ is the best approximation of u from below by unions of balls in $\mathbb{R}^3$. It corresponds to the so-called "rolling ball" algorithm commonly used to remove backgrounds and trends in biological imaging. It is intuitively related to the α-shape, as both are obtained through an approximation by "sweeping balls". Indeed we have
Fig. 3. Lower α-scale-space and opening on an artificial example (see text). Blue crosses: original points; red line: opening by a disk of radius 1; green: α-scale-space for α = 1.

Proposition 1. There exists an interpolation operator I for which $u_\alpha^-$ and $u_\alpha^+$ are respectively the morphological opening and closing of u by a sphere of radius $\sqrt{\alpha}$.
Proof. Let $P_\alpha^- = \{(x^i_1, \dots, x^i_d),\ i = 1, \dots, |P_\alpha^-|\}$. By definition, points of $P_\alpha^-$ have an empty space of radius $\sqrt{\alpha}$ below them, so we can write $\forall i \in \{1, \dots, |P_\alpha^-|\},\ O_{\sqrt{\alpha}}(u)(x^i_1, \dots, x^i_{d-1}) = x^i_d$, that is, all points of $P_\alpha^-$ are in $O_{\sqrt{\alpha}}(u)$. Thus $u_\alpha^- = O_{\sqrt{\alpha}}(u)$ is a valid definition of a lower α-scale-space. The same ideas apply to closing and upper α-scale-spaces.
However, the resulting interpolation operator does not satisfy the maximality principle, as a simple counter-example shows: let $b : [-1, 1] \to \mathbb{R}$ be the half-sphere of radius 1 centered at O, and let $u = b + \varepsilon b$, with ε small. Then the opening with a ball of radius 1 is $O_1(u) = b$, while $P_\alpha^-$ for $\alpha = 1^2$ consists of the two endpoints (−1, 0) and (1, 0). As an interpolation of $P_\alpha^-$ on [−1, 1], $O_1(u)$ does not satisfy the maximality principle. This example is presented in Fig. 3, where u is shown in blue, and the opening and the interpolated α-scale-space are shown in red and green, respectively. Thus the connection with mathematical morphology gives an intuitive idea of the α-scale-space, but the opening is not a suitable choice in practice since the maximality principle does not hold.
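The rolling-ball opening discussed above is easy to reproduce on a 1D signal: an erosion by a half-disk profile followed by a dilation with the same profile. The sketch below assumes unit sample spacing and simply ignores samples outside the signal; with $r = \sqrt{\alpha}$ it gives the opening that Proposition 1 relates to $u_\alpha^-$, which, as the counter-example shows, is not the interpolation one would want in practice. The function name is hypothetical.

```python
import numpy as np


def opening_by_disk_1d(u, r):
    """Grey-level opening of a 1D signal by a disk of radius r ("rolling ball").

    Sketch assumptions: unit sample spacing; samples outside the signal are
    ignored (treated as +inf in the erosion and -inf in the dilation).
    """
    u = np.asarray(u, dtype=float)
    n = len(u)
    k = min(int(np.floor(r)), n - 1)
    offsets = np.arange(-k, k + 1)
    b = np.sqrt(r * r - offsets.astype(float) ** 2)   # upper half-disk profile

    # Erosion: e[c] is the highest centre height of a disk above x = c
    # whose upper boundary stays below u.
    e = np.full(n, np.inf)
    for y, by in zip(offsets, b):
        shifted = np.full(n, np.inf)                   # shifted[c] = u[c + y]
        if y >= 0:
            shifted[:n - y] = u[y:]
        else:
            shifted[-y:] = u[:n + y]
        e = np.minimum(e, shifted - by)

    # Dilation: upper envelope of the tops of all those disks.
    out = np.full(n, -np.inf)
    for y, by in zip(offsets, b):
        shifted = np.full(n, -np.inf)                  # shifted[x] = e[x - y]
        if y >= 0:
            shifted[y:] = e[:n - y]
        else:
            shifted[:n + y] = e[-y:]
        out = np.maximum(out, shifted + by)
    return out
```

The opening never exceeds the signal, and a spike narrower than the disk is flattened (up to discretization), which is the "rolling ball" background-removal behavior.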
4 Application to Additive Signal Decomposition in Fluorescence Video-Microscopy
In this section, the proposed framework is applied to the additive signal decomposition problem in fluorescence video-microscopy. We shall see that it is particularly well adapted to image sequences depicting two components whose different spatio-temporal characteristics translate into signals with different convexities.
Biological context and motivation. The discovery of the xFPs (naturally fluorescent proteins), for which the 2008 Nobel Prize in Chemistry was awarded, along with advances in genetic engineering, allows any protein of interest to be coupled with a fluorescent tag in cells, tissues and organisms. Dynamic visualization of in vivo protein behavior is then possible in many biological models. In this section, we study the Rab GTPase family, a family of proteins involved in intra-cellular transport and membrane maintenance. Rab proteins are known to exist in two main states: a slowly diffusing cytosolic state, and a membrane state in which they may move as vesicles with directed movements. To a first approximation, the fluorescence depends on the concentration of fluorescent proteins, and the membrane state consists of vesicles of much higher concentration than the cytosol in which they are embedded. Thus the vesicles appear as dots of higher intensity moving rather quickly on a slowly varying background. The separation of these two components is a necessary step for several studies [13]. Owing to the biophysical properties of the image, this separation can be regarded as an additive signal decomposition problem: the intensity of a given pixel is proportional to the total concentration of proteins in the corresponding volume, i.e. the sum of the concentrations of the proteins in each state. Thus, for an acquired image sequence u, we can write $u = u_{cyt} + u_{ves}$, with $u_{cyt}$ the cytosolic component, slowly varying in both space and time, and $u_{ves}$ the vesicular component, highly localized in space and time and moving rather quickly. In our experiments, we performed pairwise comparisons of the behavior of specific Rab complexes; in this paper, we focus on Rabin8 against Rab8wt (wild type) and against Rab11Awt.
Rabin8 was previously shown to interact with either or both of the two Rab GTPases Rab8 and Rab11A, whose functions in membrane traffic are closely related, although separated in time and location. These proteins have clearly different properties and, in particular, show different partitions between membrane and cytosolic states. The images shown were acquired with fast Total Internal Reflection Fluorescence microscopy (10 to 20 frames/s, 120 s sequences), which images a very thin layer of the cell near the plasma membrane.

Experiments and results. Previous related works include spatial detection using wavelets [14] and background extraction using a temporal model [4]; the vesicles appear as bright blobs in space or as spikes in time. Figure 4 shows the intensity variation w.r.t. time for a given pixel (the two spikes correspond to vesicles passing through that pixel). In general, the decreasing trend corresponding to the loss of fluorescence over time (known as observational photobleaching) needs to be estimated and possibly removed. Our scale-space framework is then particularly well suited if we consider the lower envelope as the key feature for analyzing such additive signals. When applied to individual 2D images, the decomposition is not reliable (not shown in the paper): vesicles are relatively small and may be confused with noise. It turns out that the temporal characteristics (movements, appearance and disappearance) are actually more salient than the spatial ones, and they are analyzed further.
Fig. 4. Intensity variation w.r.t. time for a given pixel (the two spikes correspond to two vesicles passing through that pixel).
Figure 5 shows the application of the proposed lower α-scale-space to 1D temporal signals, each processed individually. Two typical sequences depicting several Rab proteins are presented. We chose to display the maximum intensity projection (MIP) along the time axis, since these maps summarize the sequence contents (the videos are more demonstrative but cannot be embedded in the paper). In Fig. 5, the bright lines correspond to the main trajectories of moving vesicles. The MIP maps of the original image sequences are shown in the first row of Fig. 5. The second and third rows show, respectively, the MIPs of the vesicular components and the MIPs of the computed trends, corresponding to the lower α-scale-spaces for α = 1000. The vesicular components are computed as the difference between the original image sequence and its computational geometry-based scale-space representation, so summing the two decomposed image sequences gives back the original image sequences with no loss of information. Clearly, the bright lines are enhanced and the slowly varying background contains no blobs (moving vesicles), as desired. A few structures in the vesicular component are only hinted at in the original sequence. In this biological experiment, we focus on the pairwise co-localization of the membrane (or vesicular) signals of two pairs of Rab proteins: (Rab8 and Rabin8) and (Rab11a and Rabin8). In Fig. 5 (left), co-localization of (Rab8 and Rabin8) is measured near the plasma membrane of the cell; (Rab11a and Rabin8) appear more co-localized in the same area, for the time scale considered in this experiment. Our signal decomposition thus makes it easier to assess the co-localization of moving and static structures in the image sequences. Each temporal signal is processed individually, yet the resulting image sequences appear surprisingly regular.
While the results are preliminary and need to be carefully inspected by experts, we are now able to perform quantitative spatio-temporal co-localization analysis; this validation is currently underway. It is worth noting that the proposed algorithm is very fast (a few seconds to process several hundred images) and easy to control. It is routinely used by biologists (via the ImageJ software) to separate the membrane (or vesicular) and cytosolic states of Rab proteins. This method is relevant for elucidating the roles of different molecular partners and can be used in many other contexts. It should further allow a more reliable description and classification of the complex behaviors of various proteins than is possible from the original image sequences alone.
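The per-pixel processing and the MIP maps of Fig. 5 can be sketched as follows, assuming the sequence is stored as a NumPy array of shape (T, H, W) with time in frame units. As a simplification, and not the CGAL-based implementation, the temporal lower envelope keeps a sample when the disk of radius $\sqrt{\alpha}$ centred just below it is empty (reasonable only for slowly varying trends) and joins the kept samples linearly. Function names are illustrative.

```python
import numpy as np


def lower_envelope_1d(u, alpha):
    """Temporal lower alpha-scale-space of one pixel (simplified disk test)."""
    t = np.arange(len(u), dtype=float)
    r = np.sqrt(alpha)
    keep = []
    for i in range(len(u)):
        # squared distances from the other samples to the disk centre (t_i, u_i - r)
        d2 = np.delete((t - t[i]) ** 2 + (u - u[i] + r) ** 2, i)
        if np.all(d2 >= r * r):
            keep.append(i)
    return np.interp(t, t[keep], u[keep])


def decompose_sequence(seq, alpha):
    """Split a (T, H, W) sequence into cytosolic trend and vesicular part, pixel by pixel."""
    seq = np.asarray(seq, dtype=float)
    cyt = np.apply_along_axis(lower_envelope_1d, 0, seq, alpha)
    ves = seq - cyt          # seq == cyt + ves, so no information is lost
    return cyt, ves


def mip(seq):
    """Maximum intensity projection along the time axis."""
    return np.asarray(seq).max(axis=0)
```

Because `ves` is defined as `seq - cyt`, the two components add up exactly to the input, matching the no-loss property stated above; `mip(seq)`, `mip(ves)` and `mip(cyt)` correspond to the three rows of Fig. 5.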
Fig. 5. MIP maps of image sequences (left: Rab8 and Rabin8; right: Rab11a and Rabin8). Top to bottom: original images, vesicular components, cytosolic components (see text).
5 Conclusion
Computational geometry concepts, originally introduced for 3D image synthesis, were exploited to derive an original scale-space-based signal/image representation. Sub-samplings of the original function are performed, keeping only the points with enough space around them. The selected points are then used to compute a continuous approximation of the original function by interpolation. The relationship with the Empirical Mode Decomposition was established, since both share algorithmic similarities and represent the original signal with a limited number of modes. The relationships between the proposed framework and several classical scale-space frameworks were also studied. In particular, it was shown that, for a well-chosen interpolation operator, the lower α-scale-space is equivalent to a gray-scale opening in mathematical morphology. Finally, the particular case of additive decomposition in video sequences acquired by fluorescence microscopy was addressed and served as a demonstration. The convexity-based decomposition is particularly well suited to the biophysical properties of the studied proteins in fluorescence microscopy.
A number of theoretical questions remain open. In particular, going beyond spheres in the definition of α-shapes, toward α-shapes in generic image-induced metrics, may lead to interesting results both in theory and in practice. Finding an appropriate interpolation operator also remains an open question to be addressed in future work.
References

1. Amidror, I.: Scattered data interpolation methods for electronic imaging systems: a survey. Journal of Electronic Imaging 11, 157–176 (2002)
2. CGAL Editorial Board: CGAL-3.2 User and Reference Manual (2006)
3. Bobach, T., Hering-Bertram, M., Umlauf, G.: Comparison of Voronoi based scattered data interpolation schemes. Palma de Majorque (2006)
4. Boulanger, J., Kervrann, C., Bouthemy, P.: Estimation of dynamic background for fluorescence video-microscopy. In: 2006 IEEE International Conference on Image Processing, pp. 2509–2512 (2006)
5. Cao, F.: Geometric curve evolution and image processing. Lecture Notes in Mathematics (2003)
6. Caselles, V., Morel, J.M., Sbert, C.: An axiomatic approach to image interpolation. IEEE Trans. Image Processing 7, 376–386 (1998)
7. de Berg, M., van Kreveld, M., Overmars, M., Schwarzkopf, O.: Computational geometry: algorithms and applications. Springer, New York (1997)
8. Edelsbrunner, H., Mücke, E.P.: Three-dimensional alpha shapes. ACM Transactions on Graphics 13, 43–72 (1994)
9. Flandrin, P., Rilling, G., Goncalves, P.: Empirical mode decomposition as a filter bank. IEEE Signal Processing Letters 11, 112–114 (2004)
10. Huang, N.E., Shen, Z., Long, S.R., Wu, M.C., Shih, H.H., Zheng, Q., Yen, N.C., Tung, C.C., Liu, H.H.: The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Royal Society of London Proceedings Series A 454, 903 (1998)
11. Nunes, J.C., Bouaoune, Y., Delechelle, E., Niang, O., Bunel, P.: Image analysis by bi-dimensional empirical mode decomposition. Image and Vision Computing 21, 1019–1026 (2003)
12. Panfili, J., de Pontual, H., Troadec, H., Wright, P.J. (eds.): Manual of fish sclerochronology. Ifremer-IRD coedition (2002)
13. Pecot, T., Kervrann, C., Bouthemy, P.: Minimal paths and probabilistic models for origin-destination traffic estimation in live cell imaging. In: ISBI 2008, pp. 843–846 (2008)
14. Racine, V., Sachse, M., Salamero, J., Frasier, V., Trubuil, A., Sibarita, J.-B.: Visualization and quantification of vesicle trafficking on a three-dimensional cytoskeleton network in living cells. Journal of Microscopy 225, 214–228 (2007)
15. Serra, J.: Image analysis and mathematical morphology, vol. 1. Academic Press, London (1982)
16. Vese, L.: A method to convexify functions via curve evolution. Commun. Partial Differential Equations 24, 1573 (1999)
Highlight on a Feature Extracted at Fine Scales: The Pointwise Lipschitz Regularity

Christophe Damerval¹ and Sylvain Meignen²

¹ Dept. of Computer Science, Katholieke Universiteit Leuven, Belgium
² Laboratoire Jean Kuntzmann (LJK), University of Grenoble, France
Abstract. The aim of this paper is to study the robustness of the pointwise Lipschitz regularity in 2D, which is a measure of the local regularity of the intensity function associated with an image. This regularity can be efficiently computed by an approach based on fine scales. We assess its robustness when the image undergoes various transformations, especially geometric ones. The results we obtain show that the pointwise Lipschitz regularity is a suitable feature for applications in computer vision.

Keywords: Lipschitz regularity, invariance properties, wavelet decompositions, multiscale edge detection, extraction of characteristic values, robustness to transformations applied to the image.
1 Introduction
The extraction of invariant or robust features from an image is a central issue in computer vision. The difficulty of this problem lies in the fact that natural scenes are often viewed under various conditions, corresponding to a wide class of transformations (geometric deformations or illumination changes, for instance). To achieve invariance to transformations such as scale changes, multiscale approaches were put forward. In particular, methods based on scale-space theory [1,2,3] proved successful in computer vision. They can reveal regions of interest that are stable under local geometric deformations [4]. These regions are identified by their location and characteristic scale, and their content can be quantified by a suitable descriptor [5]. Recent works compared state-of-the-art interest region detectors [6] and region descriptors [7]. Existing methods proved efficient for one type of scene or transformation [8, 9, 10]; however, no method outperforms the others in all cases, so combining different kinds of features seems relevant. In this paper, we study a feature related to the local regularity of the intensity function: the pointwise Lipschitz regularity $\alpha \in \mathbb{R}$ (denoted regularity α). It has been widely studied in 1D, especially in the case of multifractal signals [11, 12], and also applied to the characterization of singularities [13, 14] and to landmark registration [15]. In 2D, methods based on regularity measures were also put forward, with applications to textured images [16, 17], the regularity being used from a global point of view. Besides, recent advances in edge detection using multiscale approaches [18] were used for object detection [19]. New developments using the multiscale SIFT descriptor were also recently proposed [20].

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 782–794, 2009. © Springer-Verlag Berlin Heidelberg 2009

Here we are concerned with the pointwise regularity in 2D. More precisely, a multiscale approach focused on fine scales makes it possible to compute numerical values of the regularity α. As we will see, this regularity appears to be a relevant feature for detecting points of interest and could profitably be used as an image descriptor: on the one hand, it has invariance properties (especially regarding geometric transformations), and on the other hand, values of the pointwise Lipschitz regularity α can be computed efficiently.

This paper is organized as follows. We first present the notion of regularity α in 2D and its invariance properties. To estimate α numerically in 2D, we recall an algorithm based on a multiscale edge detector [21], which gives pointwise estimates of α at the edge points of the image. We also present a methodology to compare values of α between two images related by a geometric deformation. Finally, an evaluation procedure assesses the robustness of the regularity α for natural scenes viewed under various imaging conditions. The results obtained show that the regularity α is a robust feature.
2 Regularity α in the Context of Image Analysis
We consider a gray-level image given by its intensity function $f : \mathbb{R}^2 \to \mathbb{R}$. We first briefly present the definition of Lipschitz regularity in 2D, derived from the 1D definition [13], which leads to the notion of regularity α. Then we recall a known algorithm for computing the values of the regularity α. We also investigate its invariance properties, especially from a practical point of view.

2.1 Notion of Regularity α in 2D – Invariance Properties
The Lipschitz regularity generalizes the usual notion of regularity.

Definition 1 (1D Lipschitz regularity). Given $\alpha \in\, ]0, 1[$, a function $f : \mathbb{R} \to \mathbb{R}$ is α-Lipschitz at $x_0 \in \mathbb{R}$ if there exist a neighborhood V of $x_0$ and $A > 0$ such that
$$\forall x \in V, \quad |f(x) - f(x_0)| \le A\,|x - x_0|^\alpha.$$
This can be extended to $\alpha \in \mathbb{R}$. In particular, for $\alpha = n \in \mathbb{N}^*$, this corresponds to a locally $C^n$ function. Besides, for $\alpha < 0$, this definition can be generalized thanks to the theory of distributions (see details in [14]).

Definition 2 (2D Lipschitz regularity). Let $f : \mathbb{R}^2 \to \mathbb{R}$ and $x_0 \in \mathbb{R}^2$. For $\theta \in [0, 2\pi[$, we define $f_\theta : \mathbb{R}_+ \to \mathbb{R}$ as $f_\theta(h) = f(x_0 + h u_\theta)$, where $u_\theta = (\cos\theta, \sin\theta)$. For $\alpha \in \mathbb{R}$, f is α-Lipschitz at $x_0 \in \mathbb{R}^2$ if, $\forall \theta \in [0, 2\pi[$, $f_\theta$ is α-Lipschitz at 0.

Note that this definition agrees with the usual definition of Lipschitz regularity. Indeed, when $\alpha \in\, ]0, 1[$, provided f is α-Lipschitz at $x_0$ (with a constant A that does not depend on θ), writing $x = x_0 + h u_\theta$ with $h = \|x - x_0\|$, we have
$$|f(x) - f(x_0)| = |f(x_0 + (h\cos\theta, h\sin\theta)) - f(x_0)| = |f_\theta(h) - f_\theta(0)| \le A h^\alpha,$$
hence
$$|f(x) - f(x_0)| \le A\,\|x - x_0\|^\alpha$$
with $A > 0$, for x in a neighborhood of $x_0$.
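Definitions 1 and 2 suggest a naive numerical check: for $\alpha \in\, ]0,1[$, $\log|f_\theta(h) - f_\theta(0)|$ should grow roughly like $\alpha \log h$ as $h \to 0$. The Python sketch below fits that slope over a few small step sizes (an arbitrary choice) in a set of sampled directions θ and keeps the smallest one. It only illustrates the definitions: it captures exponents up to 1 (first-order differences), needs f in closed form, and is not the wavelet-based estimator used later in this paper. Function names are hypothetical.

```python
import numpy as np


def directional_exponent(f, x0, theta, hs=None):
    """Slope of log|f_theta(h) - f_theta(0)| against log h, as in Definition 2."""
    if hs is None:
        hs = np.logspace(-6, -2, 9)        # assumed range of small step sizes
    x0 = np.asarray(x0, dtype=float)
    u = np.array([np.cos(theta), np.sin(theta)])
    diffs = np.array([abs(f(x0 + h * u) - f(x0)) for h in hs])
    if np.any(diffs == 0.0):
        return np.inf                      # locally constant along this direction
    slope, _ = np.polyfit(np.log(hs), np.log(diffs), 1)
    return slope


def pointwise_exponent(f, x0, n_dirs=16):
    """Smallest directional exponent over n_dirs directions in [0, 2*pi)."""
    thetas = np.linspace(0.0, 2.0 * np.pi, n_dirs, endpoint=False)
    return min(directional_exponent(f, x0, th) for th in thetas)
```

For example, $f(x_1, x_2) = \sqrt{|x_1|}$ gives approximately 0.5 at the origin, while a linear function gives approximately 1 (the ceiling of this first-order test).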
C. Damerval and S. Meignen
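Definition 3 (Regularity α). Let $f : \mathbb{R}^2 \to \mathbb{R}$ and $x_0 \in \mathbb{R}^2$. The regularity α of f at $x_0$ is defined as
$$\alpha = \alpha(f, x_0) = \sup\{\alpha_0 \in \mathbb{R},\ f \text{ is } \alpha_0\text{-Lipschitz at } x_0\}. \tag{1}$$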
The relevance of α arises from its invariance properties: the regularity α appears as a characteristic value. In particular, let us study the case of a constant affine deformation (which includes rotations and scale changes), widely studied in scale-space theory [2].

Proposition 1 (Influence of an affine deformation on the regularity α). Let $f : \mathbb{R}^2 \to \mathbb{R}$, and let g be defined by
$$\forall x \in \mathbb{R}^2, \quad g(x) = f(Bx), \quad \text{with } B \text{ a } 2 \times 2 \text{ invertible matrix}. \tag{2}$$
Then, for $\alpha \in\, ]0, 1[$, $\alpha(f, x_0) = \alpha(g, y_0)$ with $y_0 = B^{-1}x_0$.

Proof. According to Definition 3, there exists $A > 0$ such that
$$\forall \theta \in [0, 2\pi[, \quad |f(x_0) - f(x_0 + h u_\theta)| \le A h^\alpha. \tag{3}$$
Let us study the regularity of g at $y_0 = B^{-1}x_0$. For $\theta \in [0, 2\pi[$ we have
$$|g(y_0) - g(y_0 + h u_\theta)| = |g(B^{-1}x_0) - g(B^{-1}x_0 + h u_\theta)| \tag{4}$$
$$= |f(x_0) - f(x_0 + h B u_\theta)|. \tag{5}$$
Moreover, since $B u_\theta = \lambda u_{\theta'}$ with $\lambda = \|B u_\theta\| \in \mathbb{R}^*$ and $\theta' \in [0, 2\pi[$,
$$|g(y_0) - g(y_0 + h u_\theta)| = |f(x_0) - f(x_0 + h\lambda u_{\theta'})| \tag{6}$$
$$\le (A|\lambda|^\alpha)\, h^\alpha. \tag{7}$$
Since $|\lambda| \le \|B\|$, there exists $A' = A\|B\|^\alpha > 0$ such that
$$\forall \theta \in [0, 2\pi[, \quad |g(y_0) - g(y_0 + h u_\theta)| \le A' h^\alpha, \tag{8}$$
and g is α-Lipschitz at $y_0$. Then, let us assume that the regularity α of f corresponds to a minimum $\alpha_0$ attained in a certain direction $\theta_0$. Since we consider a constant affine deformation, there exists $\theta_1$ such that $B u_{\theta_1}$ and $u_{\theta_0}$ are collinear. Hence, the regularity α of g at $B^{-1}x_0$ corresponds to a minimum $\alpha_0$ in the direction $\theta_1$. So the regularity α is preserved when a constant affine deformation is applied to the image.

Note that this invariance property may not always hold in practice. Indeed, when B becomes nearly singular (extreme deformations), λ can be very small when $u_\theta$ is close to the direction that B contracts the most (that of its smallest singular value). So there may be numerical instabilities for extreme deformations. However, as we will see, the regularity α shows significant robustness for wide-ranging transformations (and not only small deformations). Now, let us discuss more general transformations, given by
$$\forall x \in \mathbb{R}^2, \quad g(x) = f(v(x)) \quad \text{with } v : \mathbb{R}^2 \to \mathbb{R}^2. \tag{9}$$
In this general context, note that the regularity α is not necessarily preserved: depending on the regularity of v, g may be more regular than f, resulting in a higher regularity α for g than for f. Nevertheless, we point out that it can be preserved in many practical cases, especially when considering image edges [22]. In the case of an image representing an edge, f is regular along the tangent to the edge and irregular along the normal direction (see Fig. 1(a)). Thus, to estimate the regularity α of f precisely at a given point, it is important to determine the direction of maximum irregularity; we explain below how to compute this direction and estimate α (Section 2.2). More generally, since transformations such as local affine deformations do not alter the topology of the edges, the regularity α on these edges should be preserved (see Fig. 1(b)).

2.2 Numerical Computation through a Multiscale Approach
To compute numerical values of α, we use a known approach based on a multiscale edge detector [21]. Let us briefly recall some aspects of this detector. According to Canny [22], edge points correspond to locations where the magnitude of the gradient attains a local maximum in the direction of the gradient, which is the direction of maximum irregularity. A generalization of Canny's detector was put forward by Mallat, using wavelet decompositions [21]. It detects edge points and also yields an accurate estimate of the regularity α at these edge points. This estimate of α is obtained by a linear regression at the finest scales. Moreover, denoting by N the size of the data ($N = n^2$ for an $n \times n$ image), this formulation can be computed in O(N), which allows fast computation. In summary, this detector is an efficient method for computing numerical values of α. We emphasize that this method focuses on the finest scales and that it gives pointwise estimates of the regularity α. Given an image f, the output of this detector can be expressed as a set
$$(x_i, y_i, \alpha_i) \in \mathbb{R}^3, \quad 1 \le i \le n_f, \tag{10}$$
where $n_f$ is the number of detected edge points $(x_i, y_i)$, each associated with a value $\alpha_i$. Since edge points correspond to singularities (where f may not be differentiable), the obtained values can be negative: typically, a boundary leads to α = 0, a line to α = −1, and an isolated point to α = −2. For natural images, we obtain various values; for instance, for the image shown in Fig. 2(a), the density of the regularity α of the detected edge points is represented in Fig. 2(c). In this case, 95% of the computed values lie within [−1.4, 0.8]. Besides, some parameters of the detector (such as thresholding) make it possible to tune the number of edge points. We use a light thresholding in our numerical experiments, so as to obtain a large number of values of regularity α. This is consistent with our aim of evaluating the robustness of the regularity α from a practical point of view.
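The fine-scale regression can be illustrated with a simplified Python sketch. The assumptions are ours: the wavelet is the gradient of a Gaussian, normalized as $s\,|\nabla(G_s * f)|$ so that a step edge has a scale-independent modulus; maxima chaining across scales is replaced by taking the largest modulus in a small window along the row; and the scales (2, 4, 8) are arbitrary. It is not the detector of [21], but it should approximately recover the reference values quoted above (about 0 for a boundary, −1 for a line, −2 for an isolated point).

```python
import numpy as np


def _kernels(s):
    """Sampled Gaussian of scale s (unit mass) and its derivative."""
    radius = int(np.ceil(4 * s))
    t = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-t ** 2 / (2.0 * s * s))
    g /= g.sum()
    return g, -t / (s * s) * g


def _conv(img, k, axis):
    """1D convolution along `axis`, symmetric padding, output shaped like img."""
    pad = [(0, 0), (0, 0)]
    pad[axis] = (len(k) // 2, len(k) // 2)
    padded = np.pad(img, pad, mode="symmetric")
    return np.apply_along_axis(np.convolve, axis, padded, k, mode="valid")


def wavelet_modulus(img, s):
    """Modulus s * |grad(G_s * img)| of a Gaussian-derivative wavelet transform."""
    g, dg = _kernels(s)
    gx = _conv(_conv(img, dg, axis=1), g, axis=0)
    gy = _conv(_conv(img, g, axis=1), dg, axis=0)
    return s * np.hypot(gx, gy)


def estimate_alpha(img, row, col, scales=(2.0, 4.0, 8.0)):
    """Slope of log2(local max of the modulus) against log2(scale) near (row, col)."""
    img = np.asarray(img, dtype=float)
    log_s, log_m = [], []
    for s in scales:
        m = wavelet_modulus(img, s)
        w = int(2 * s)   # crude stand-in for chaining modulus maxima across scales
        log_s.append(np.log2(s))
        log_m.append(np.log2(m[row, max(col - w, 0):col + w + 1].max()))
    slope, _ = np.polyfit(log_s, log_m, 1)
    return slope
```

On a synthetic 128×128 image containing a vertical step edge, a one-pixel-wide line or a single bright pixel, `estimate_alpha` at the singularity returns values close to 0, −1 and −2, respectively.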
2.3 Empirical Study in the Case of an Affine Deformation
Fig. 1. (a) At a point belonging to an edge line, the Lipschitz regularity is minimal along the normal direction; along this direction, the regularity α can be accurately computed. (b) If an edge undergoes a deformation that does not change its topology, the regularity α is preserved ($D_1$, $D_2$: directions of maximum irregularity).

Fig. 2. (a,b) Two images related by a known affine deformation; (a′,b′) detected edge points; (c) density of the regularity α (horizontal axis: regularity α) for the edge points shown in (a′); (d) errors on the estimation of α: proportion of correct matches versus tolerated error on regularity α, for exact matches (EM) and approximate matches (AM) between the edge points of (a′) and (b′).

We study here the robustness of the estimation of α for natural images, for which values of the regularity α are computed at detected edge points. For that purpose, we consider an original image $X_0$ (see Fig. 2(a)) and a deformed image $X_1$ (see Fig. 2(b)) related by a known affine deformation. This homography makes it possible to establish point-to-point correspondences and thus to compare the values of regularity between the two images (see Fig. 2(a′,b′)). Given $X_0$ and $X_1$, the detector yields two sets of edge points with associated values of regularity α:
$$S_0 = \{(x^0_i, y^0_i, \alpha^0_i)\}_{i \in I_0} \quad \text{and} \quad S_1 = \{(x^1_j, y^1_j, \alpha^1_j)\}_{j \in I_1}.$$
Afterwards, we project the points $(x^1_j, y^1_j)_{j \in I_1}$ into the coordinates of $X_0$, and we establish correspondences between the sets $S_1$ and $S_0$. At this step, we have two possible choices: either exact matches (EM), for which a projected point of $S_1$ corresponds exactly to a point of $S_0$, or approximate matches (AM), tolerating an error of 1 pixel. This makes it possible to compare the computed values of α: given a correspondence between $(x^0_i, y^0_i, \alpha^0_i)$ and $(x^1_j, y^1_j, \alpha^1_j)$ (either EM or AM), we define the error on α (for one matched pair) as
$$d_\alpha = d_\alpha(i, j) = |\alpha^0_i - \alpha^1_j|. \tag{11}$$
Finally, a match is said to be correct if $d_\alpha < \varepsilon$, where the parameter ε is a tolerated error on α. We point out that there is a certain freedom in the choice of ε, which should depend on the application. In this regard, there is a trade-off between too low values of ε (refusing any numerical error on α) and too high values (effectively ignoring α). Let us now study the effect of this parameter ε > 0 by comparing the sets $S_0$ and $S_1$: Fig. 2(d) shows the proportion of correct matches as a function of ε, for both EM and AM. As expected, it increases with ε. More precisely, it increases rapidly for small values of ε, which is a good result: when ε exceeds 0.3, this proportion reaches almost 80%. Moreover, we observe that the results are only slightly better for EM than for AM, so it can be relevant to consider AM to define a descriptor, since the number of extracted points is then significantly larger. These first results therefore show that the regularity α estimated at edge points is a feature robust to affine deformations. Let us now evaluate the robustness of this feature in a more general context, when various transformations are applied to natural images.
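The matching step can be sketched as follows, assuming edge points are stored as (n, 3) NumPy arrays of (x, y, α) and the known transformation is given as a 3×3 homography `H_10` mapping $X_1$ coordinates to $X_0$. Exact matches are taken on the rounded pixel grid, approximate matches within one pixel (Chebyshev distance), and each projected point is paired with its nearest candidate; these conventions and the function names are our assumptions, not necessarily those used for Fig. 2(d).

```python
import numpy as np


def project(points, H):
    """Apply a 3x3 homography H to an (n, 2) array of (x, y) coordinates."""
    ph = np.column_stack([points, np.ones(len(points))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]


def match_edge_points(S0, S1, H_10, tol=0):
    """Pair points of S1 (projected into X0 by H_10) with points of S0.

    S0, S1: (n, 3) arrays of (x, y, alpha). tol=0 gives exact matches (EM) on the
    pixel grid, tol=1 approximate matches (AM) within one pixel.
    Returns a list of (i, j) index pairs.
    """
    p1 = np.rint(project(S1[:, :2], H_10))
    p0 = S0[:, :2]
    pairs = []
    for j, q in enumerate(p1):
        dist = np.abs(p0 - q).max(axis=1)      # Chebyshev distance to every S0 point
        i = int(np.argmin(dist))
        if dist[i] <= tol:
            pairs.append((i, j))
    return pairs


def proportion_correct(S0, S1, pairs, eps):
    """Fraction of matched pairs with d_alpha = |alpha0_i - alpha1_j| < eps."""
    if not pairs:
        return 0.0
    d = np.array([abs(S0[i, 2] - S1[j, 2]) for i, j in pairs])
    return float(np.mean(d < eps))
```

Sweeping `eps` in `proportion_correct` for `tol=0` and `tol=1` gives the two kinds of curves (EM and AM) discussed above.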
3
Quantifying the Robustness of the Regularity α
3.1
Evaluation Procedure
We consider 8 sequences, each consisting of 6 images $(X_k)_{0 \le k \le 5}$: ZoomRotation1, ZoomRotation2, Viewpoint1, Viewpoint2, Blur1, Blur2, Jpeg and Light (see Fig.3). In each sequence, the 6 images show a given scene under a certain imaging condition. For instance, in the sequence Viewpoint1 (see Fig.4), each image $X_k$ (1 ≤ k ≤ 5) corresponds to a change of viewpoint applied to the reference image $X_0$. These sequences are relevant in several respects. First, they represent various objects: textured scenes (repeated textures, see Fig.3(a,d)) and structured ones (homogeneous regions with edge boundaries, see Fig.3(b,c)). Secondly, the imaging conditions are wide-ranging: geometric deformations and specific transformations such as JPEG compression. Thirdly, the degree of these transformations can be significant (scale change up to 4, viewpoint angle up to 60°, JPEG compression rate up to 98%). Finally, we note that the sequences ZoomRotation1, ZoomRotation2, Viewpoint1 and
C. Damerval and S. Meignen
Viewpoint2 correspond to actual camera displacements; the sequences Blur1, Blur2 and Light correspond to camera operations (varying the camera focus or shutter speed); for the sequence Jpeg, the different levels of JPEG compression were obtained with software. For illustration purposes, Fig.3 shows the images $X_0$ and $X_5$ of every sequence. For more details, see http://www.robots.ox.ac.uk/~vgg/research/affine
For a given set of images $(X_k)_{0 \le k \le 5}$ associated with a sequence (a viewpoint change, for instance; see Fig.4), we carry out the following procedure:
1. For each image $X_k$, $0 \le k \le 5$, detect edge points and compute the associated values of regularity α: $p^k_i = (x^k_i, y^k_i, \alpha^k_i)$, $1 \le i \le n_k$.
2. For fixed k (1 ≤ k ≤ 5), determine the set $\mathcal{C}^k$ of point-to-point correspondences between edge points of $X_0$ and $X_k$ (using the known homography between these images):
$$\mathcal{C}^k = \{(p^0_i, p^k_j) \text{ matched according to a geometric criterion}\} \qquad (12)$$
This yields a certain number of correspondences (NC), $\#\mathcal{C}^k$.
3. Select the subset $C_k$ of correspondences whose regularities are sufficiently close (according to a parameter ε > 0),
$$C_k = \{(p^0_i, p^k_j) \in \mathcal{C}^k : d_\alpha = |\alpha^0_i - \alpha^k_j| < \varepsilon\} \qquad (13)$$
and compute the matching score, i.e. the proportion of correct matches:
$$S_k = \frac{\#C_k}{\#\mathcal{C}^k} \qquad (14)$$
This score reflects the robustness of the regularity α. We study its evolution with respect to ε for the sequence Viewpoint1 (Fig.5), and with respect to k for all sequences (Fig.6, for fixed ε). Note that in step 2, the matches based on a geometric criterion can be either exact or approximate (as described in section 2.3); we study both EM and AM. In step 3, one has to choose the parameter ε, which represents a tolerated error on α (as seen in section 2.3). In our experiments we use ε = 0.3, for which a high proportion of the matches (almost 80%) are deemed correct in the case of the affine deformation studied in section 2.3 (see Fig.2(d)).
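Once the geometric correspondences of step 2 are available, steps 2 and 3 and the score (14) reduce to counting. The sketch below assumes the $d_\alpha$ values of the correspondences $\mathcal{C}^k$ have already been computed for each k (for instance with an EM or AM matching as in section 2.3); `matching_scores` is a hypothetical name, and the default ε = 0.3 is the value used in the experiments.

```python
import numpy as np

def matching_scores(d_alpha_per_k, eps=0.3):
    """Matching score S_k = #C_k / #C^k of eq. (14) for each transformed image X_k.

    d_alpha_per_k maps k to the d_alpha values of the geometric correspondences
    C^k between X_0 and X_k (step 2). Returns {k: (NC, S_k)}, where NC = #C^k and
    C_k keeps the pairs with d_alpha < eps (step 3).
    """
    scores = {}
    for k, d in sorted(d_alpha_per_k.items()):
        d = np.asarray(d, dtype=float)
        nc = d.size
        n_correct = int(np.count_nonzero(d < eps))
        scores[k] = (nc, n_correct / nc if nc else float("nan"))
    return scores
```

Plotting $S_k$ against the degree of transformation k, for EM and AM separately, gives curves of the kind shown in Fig.6.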
Note also that this choice makes it possible to clearly distinguish boundaries (α = 0), lines (α = −1) and isolated points (α = −2), since the tolerance 0.3 is smaller than half the spacing between these values.
3.2
Analysis of the Results
Study of the sequence Viewpoint1 (Fig.5). First, we consider the sequence Viewpoint1, associated with increasing viewpoint angles (k = 1: 20°, k = 2: 30°, k = 3: 40°, k = 4: 50°, k = 5: 60°). For each k (1 ≤ k ≤ 5), Fig.5($a_k$) shows the evolution of $S_k$ with respect to the tolerated error on α (parameter ε), for both EM and AM. The results are similar to those obtained for the affine deformation studied in section 2.3; indeed, EM
Fig. 3. Sample of the data set, showing $X_0$ (reference image) and $X_5$ (highest degree of transformation) for: (a, b) scale change and rotation; (c, d) viewpoint change; (e, f) blur; (g) JPEG compression; (h) illumination change
Fig. 4. (Top) Complete sequence Viewpoint1 (viewpoint change): 6 images $X_0, \dots, X_5$; (Bottom) associated edge points
[Figure 5 plots: proportion of correct matches (0 to 1) against the tolerated error on regularity α (0 to 0.5), for exact and approximate matches; panels (a1)–(a5) and (b).]
Fig. 5. Proportion of correct matches with respect to the tolerated error on α, between the images $X_0$ and $X_k$ (1 ≤ k ≤ 5) of the sequence Viewpoint1 (see Fig.4). ($a_k$) Comparison between $X_0$ and $X_k$, for both EM and AM; (b) comparison between $X_0$ and all $X_k$, for AM (curves 1–5 correspond to the AM curves of Fig.5($a_1$–$a_5$)).
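[Figure 6 plots: proportion of correct matches for exact and approximate matches against the factor of scale change (a, b), the angle of viewpoint change in degrees (c, d), increasing blur (e, f), the rate of JPEG compression in % (g) and decreasing light (h).]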
Fig. 6. (a–h) Matching scores (ε = 0.3) associated with the sequences of Fig.3(a–h)
yield slightly better results than AM in all cases, and the proportion of correct matches increases with the tolerated error on α (parameter ε). In particular, for ε = 0.3, the proportion of correct matches (for EM and AM) exceeds 80% for the angles 20°, 30° and 40° (see Fig.5($a_1$–$a_3$)) and 60% for the angles 50° and 60°,
see Fig.5($a_4$–$a_5$). Besides, Fig.5(b) gathers all the preceding curves (only for EM), showing how the estimation of the regularity α degrades as the viewpoint change increases. Note that as soon as ε is larger than 0.2, the proportion of correct matches exceeds 50%, even for significant changes of viewpoint. This shows that the regularity α is a robust feature.
Study of all sequences (Fig.6). Considering now the 8 sequences, Fig.6 shows the curves $(k, S_k)$ for both approximate and exact matches. Each graph of Fig.6 describes the performance for one particular sequence, associated with a certain image transformation. For instance, Fig.6(c) refers to the sequence Viewpoint1 (viewpoint change), shown partially on Fig.3(c) and completely on Fig.4. This allows us to assess the robustness of the estimation of the regularity α in general situations, which is the main objective of this paper. We emphasize that a method performs better when it leads to higher scores and when these scores are stable, i.e., remain high as the degree of deformation increases. On the basis of the matching scores (see section 3.1), we can evaluate how robust the estimation of α is under various imaging conditions (Fig.6). The matching score gives the proportion of correspondences (between two images) for which the computed values of α are close. Globally, the score tends to decrease as the degree of the transformation increases. Note also that the scores for EM and AM are close, with EM yielding better results than AM (as pointed out in section 2.3, see Fig.2). We do not observe a significant difference between textured (Fig.6(a,d)) and structured scenes (Fig.6(b,c)); however, the structured scene of Fig.6(c) performs better, owing to the presence of clear edges. Let us now detail the analysis of the results for each transformation.
Scale change and rotation, Fig.6(a,b): for both textured and structured scenes, the performance decreases overall (from 0.8 to 0.5). Since the sequences ZoomRotation1 and ZoomRotation2 involve significant scale changes and rotations, these results are satisfactory.
Viewpoint change, Fig.6(c,d): the performance decreases moderately, remaining high for the structured scene (between 0.9 and 0.7, Fig.6(c)) and good for the textured scene (between 0.8 and 0.6, Fig.6(d)). In this regard, note that both sequences Viewpoint1 and Viewpoint2 contain distinct edges, which are only moderately affected by a viewpoint change.
Blur, Fig.6(e,f): the performance decreases rapidly (from 0.7 to 0.3) for both structured and textured scenes. These average results are not surprising, since blur modifies the regularity α (edges are smoothed). It is well known that a smoothing operation alters the regularity α; yet, the regularity α can be retrieved when the smoothing kernel is known [14].
JPEG compression, Fig.6(g): the performance remains high (between 0.9 and 0.6), decreasing steadily. So JPEG artifacts have little impact on the regularity α, even though JPEG compression tends to blur sharp edges.
Light change, Fig.6(h): very high performance (stable, close to 0.9). Since an illumination change does not alter the structure of the edges, the regularity α is not affected by such changes. In conclusion, the regularity α appears to be very robust to light change and JPEG compression, and less so to image blur. Concerning geometric deformations, we obtain very good results for viewpoint change (especially for structured scenes) and satisfactory ones for scale change and rotation. In addition, for all transformations except blur, the performance does not collapse, even for a high degree of transformation. This emphasizes the fact that the regularity α is characteristic of the kind of edge. In a blurry context, the computation of α seems less reliable; nevertheless, this can be improved by using higher scales in the detector (provided the edges are sufficiently far apart). Finally, note that there is a trade-off between quantity (a larger number of AM than EM, i.e. better repeatability for AM) and quality (a better estimation of α for EM than for AM). This trade-off is in favor of AM: the scores are only slightly lower for AM than for EM, while AM yields a greater number of points (which is important for image description). Note also that in an application such as image matching, decreasing the parameter ε leads to a smaller number of matched pairs, for which the computed values are closer (see section 2.3).
3.3
Discussion
Let us now discuss some aspects of the regularity α. First, concerning the way of computing values of α, note that there exist methods based on wavelet transforms that make it possible to estimate the regularity α at any point of the image [23]. Here we focused on edges, since they appear to be robust to various transformations of the image. Moreover, to improve the matching performance, the number of detected edge points can be limited, either by selecting only the highest responses (threshold on the modulus of the gradient) or by considering higher scales. This retains only the most salient edges. Here, we considered all the values of the regularity α, showing that a significant proportion of the computed values of α correspond to values for which the estimation is robust. In addition, we can compare certain aspects of our approach with works related to interest regions [6, 7], insofar as they make it possible to characterize some objects present in the image. In this regard, it is important to note that we focus on position and pointwise regularity α, whereas these methods are based on position and characteristic scale. On the one hand, such methods use this characteristic scale to define interest regions, and then compute associated descriptors which characterize their content. On the other hand, our method gives pointwise features, so there is no region of interest associated with the Lipschitz regularity (defining such regions seems possible, but is not straightforward). Besides, we can compare the performance measures of these two methods. Since the criteria used are different, we can only draw conclusions from the shape of the curves. We observe that our method is more stable than those based on interest regions: as the level of transformation increases, the performance declines more slowly overall. More precisely, our method appears:
more stable for JPEG, Viewpoint1, Viewpoint2 and ZoomRotation2; less stable for ZoomRotation1 and Blur1; equivalent for Light and Blur2. So, compared to the best state-of-the-art methods, the regularity α (computed at edge points) exhibits significant robustness to various image transformations.
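The wavelet-based estimation of α mentioned above, and the observation of section 3.2 that blur alters α, can be illustrated in 1D with the classical modulus-maxima argument of Mallat and Hwang [14]: with an $L_1$-normalized derivative-of-Gaussian wavelet, $\max_x |Wf(s, x)|$ behaves like $s^\alpha$ near an isolated singularity, so α is the slope of $\log_2 \max_x |Wf(s, x)|$ against $\log_2 s$. This is a sketch under stated assumptions (dyadic scales 2 to 16, a synthetic step edge, Gaussian blur of width 8 samples), not the 2D detector used in this paper; all function names are hypothetical.

```python
import numpy as np

def dgauss_kernel(s):
    """Sampled wavelet psi_s(x) = psi(x / s) / s, with psi the derivative of a unit Gaussian."""
    h = int(np.ceil(5 * s))
    x = np.arange(-h, h + 1, dtype=float)
    gauss = np.exp(-0.5 * (x / s) ** 2) / np.sqrt(2 * np.pi)
    return -(x / s) * gauss / s, h

def wavelet_max(f, s):
    """max over x of |W f(s, x)|; edge padding avoids a spurious edge at the borders."""
    k, h = dgauss_kernel(s)
    w = np.convolve(np.pad(f, h, mode="edge"), k, mode="valid")
    return np.abs(w).max()

def estimate_alpha(f, scales=(2, 4, 8, 16)):
    """Least-squares slope of log2 max|Wf(s, .)| against log2 s."""
    logm = np.log2([wavelet_max(f, s) for s in scales])
    return np.polyfit(np.log2(scales), logm, 1)[0]

def gaussian_blur(f, sigma):
    """Discrete Gaussian smoothing (normalized kernel, edge padding)."""
    h = int(np.ceil(5 * sigma))
    x = np.arange(-h, h + 1, dtype=float)
    g = np.exp(-0.5 * (x / sigma) ** 2)
    return np.convolve(np.pad(f, h, mode="edge"), g / g.sum(), mode="valid")

n = np.arange(-512, 512)
step = (n >= 0).astype(float)  # sharp boundary: alpha = 0
alpha_sharp = estimate_alpha(step)
alpha_blurred = estimate_alpha(gaussian_blur(step, 8.0))
```

For the sharp step the fitted slope is close to 0 (a boundary), whereas the blurred step gives a clearly positive slope at these fine scales, which is consistent with the loss of performance observed for the Blur sequences.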
4
Conclusions and Perspectives
In this paper, we studied the regularity α in the context of interest point detection, focusing on edge points. This approach is based on fine scales (pointwise features), which distinguishes it from other methods based on coarser scales (local features). We explained why certain transformations of the image do not change the pointwise regularity α at such edge points. Hence the regularity α stands out as a characteristic value. The main contribution of our work lies in quantifying the robustness of the estimated value of α. For that purpose we proposed an evaluation procedure which compares the values of the regularity α between two images related by a known homography. It shows good robustness to geometric deformations, such as viewpoint change, scale change and rotation, and also to JPEG compression and illumination change. So the regularity α (computed at edges) appears to be a relevant feature for various tasks in computer vision. In terms of perspectives, let us point out potential applications of the regularity α. A first application may consist in clustering edge points into edges (1D curves), since these have connectivity properties. Instead of relying only on a distance measure, such a clustering would use a criterion based on both distance and regularity α. Secondly, the regularity α could be used as a complement to interest region descriptors: the estimated regularity α at all edge points within a given region may help to characterize the content of the region. For that purpose, it is useful that our method can extract a large number of features. Besides, the regularity α has potential applications to image registration, in particular to feature-based methods (thanks to the identification of lines, curves, points and corners).
Finally, the regularity α appears to be an interesting feature, complementary to other existing local features: integrating the pointwise Lipschitz regularity into existing detectors should improve their performance.
Acknowledgement We would like to thank the referees for relevant suggestions, and also Prof. M. Jansen from the K.U. Leuven for useful comments.
References
1. Iijima, T.: Basic theory on normalization of pattern (in case of typical one-dimensional pattern). Bull. of the Electrotechnical Laboratory 26, 368–388 (1962)
2. Lindeberg, T.: Scale Space Theory in Computer Vision. Kluwer, Dordrecht (1994)
3. Witkin, A.: Scale-space filtering. In: Proceedings of the 8th International Joint Conference on Artificial Intelligence, pp. 1019–1021 (1983)
4. Lindeberg, T.: Feature detection with automatic scale selection. International Journal of Computer Vision 30(2), 77–116 (1998)
5. Lowe, D.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60(2), 91–110 (2004)
6. Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Gool, L.V.: A comparison of affine region detectors. International Journal of Computer Vision 62(1), 43–72 (2005)
7. Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. IEEE Transactions on PAMI 27(10), 1615–1630 (2005)
8. Kadir, T., Zisserman, A., Brady, M.: An affine invariant salient region detector. In: Pajdla, T., Matas, J. (eds.) ECCV 2004. LNCS, vol. 3021, pp. 228–241. Springer, Heidelberg (2004)
9. Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from maximally stable extremal regions. In: British Machine Vision Conf., pp. 384–393 (2002)
10. Tuytelaars, T., Van Gool, L.: Matching widely separated views based on affine invariant regions. International Journal of Computer Vision 59(1), 61–85 (2004)
11. Arneodo, A., Bacry, E., Jaffard, S., Muzy, J.F.: Singularity spectrum of multifractal functions involving oscillating singularities. Journal of Fourier Analysis and Applications 4(2), 159–174 (1998)
12. Benassi, A., Cohen, S., Istas, J., Jaffard, S.: Identification of filtered white noises. Stochastic Processes and their Applications 75(1), 31–49 (1998)
13. Jaffard, S., Meyer, Y.: Wavelet methods for pointwise regularity and local oscillations of functions. American Mathematical Society (1996)
14. Mallat, S., Hwang, W.L.: Singularity detection and processing with wavelets. IEEE Transactions on Information Theory 38(2), 617–643 (1992)
15. Bigot, J.: Automatic landmark registration of 1d curves. In: Recent advances and trends in nonparametric statistics, pp. 479–496. Elsevier, Amsterdam (2003)
16. Deguy, S., Debain, C., Benassi, A.: Classification of texture images using multi-scale statistical estimators of fractal parameters. In: British Machine Vision Conference, pp. 192–201 (2000)
17. Kaplan, L.M., Kuo, C.C.: Texture roughness analysis and synthesis via extended self-similar model. IEEE Transactions on PAMI 17(11), 1043–1056 (1995)
18. Martin, D., Fowlkes, C., Malik, J.: Learning to detect natural image boundaries using local brightness, color and texture cues. IEEE Transactions on PAMI 26(5), 530–549 (2004)
19. Ferrari, V., Fevrier, L., Jurie, F., Schmid, C.: Groups of adjacent contour segments for object detection. IEEE Transactions on PAMI 30(1), 36–51 (2008)
20. Brown, M., Lowe, D.: Automatic panoramic image stitching using invariant features. International Journal of Computer Vision 74(1), 59–73 (2007)
21. Mallat, S., Zhong, S.: Characterization of signals from multiscale edges. IEEE Transactions on PAMI 14(7), 710–732 (1992)
22. Canny, J.: A computational approach to edge detection. IEEE Transactions on PAMI 8(6), 679–698 (1986)
23. Mallat, S.: A wavelet tour of signal processing. Academic Press, London (1998)
Line Enhancement and Completion via Linear Left Invariant Scale Spaces on SE(2)
Remco Duits$^{1,2}$ and Erik Franken$^{2}$
$^1$ Dept. of Mathematics and Computer Science, $^2$ Dept. of Biomedical Engineering, Eindhoven University of Technology, Den Dolech 2, P.O.Box 513, 5600 MB Eindhoven, The Netherlands
Abstract. From an image we construct an invertible orientation score, which provides an overview of local orientations in an image. This orientation score is a function on the group SE(2) of both positions and orientations. It allows us to diffuse along multiple local line segments in an image. The transformation from image to orientation score amounts to convolutions with an oriented kernel rotated at multiple angles. Under conditions on the oriented kernel, the transform between image and orientation score is unitary. This allows us to relate operators on images to operators on orientation scores in a robust way, such that we can deal with crossing lines and orientation uncertainty. To obtain reasonable Euclidean invariant image processing, the operator on the orientation score must be both left-invariant and non-linear. Therefore we consider non-linear operators on orientation scores which amount to direct products of linear left-invariant scale spaces on SE(2). These linear left-invariant scale spaces correspond to well-known stochastic processes on SE(2) for line completion and line enhancement and are given by group convolution with the corresponding Green’s functions. We provide the exact Green’s functions and approximations, which we use together with invertible orientation scores for automatic line enhancement and completion.
1
Introduction
In many medical imaging applications, elongated structures (such as catheters, blood vessels and collagen fibres) appear only partially and vaguely in noisy medical image data [9]. It is often desirable to process these images such that crossing elongated structures become more visible before actual detection takes place. Due to occlusions, small parts of these line- or edge-like structures may not be clearly visible, requiring line completion [15, 19, 1, 18, 7]. Furthermore, since the acquisition of, for example, X-ray images is harmful to a patient, the radiation dose is reduced as much as possible, leading to very noisy images. Such images typically require line enhancement [9, 3], where the aim is to make the elongated structures more visible while reducing the noise. In this article we consider operators for line enhancement, using diffusion equations on the non-commutative group SE(2) of planar translations
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 795–807, 2009.
© Springer-Verlag Berlin Heidelberg 2009
R. Duits and E. Franken
and rotations. This group SE(2) is a semi-direct product of $\mathbb{R}^2$ and the circle $\mathbb{T} = \{e^{i\theta} \mid \theta \in [0, 2\pi)\} \equiv SO(2)$, and is equipped with the product
$$g g' = (\mathbf{x}, e^{i\theta})(\mathbf{x}', e^{i\theta'}) = (\mathbf{x} + R_\theta \mathbf{x}', e^{i(\theta + \theta')}), \qquad g = (\mathbf{x}, e^{i\theta}),\ g' = (\mathbf{x}', e^{i\theta'}) \in SE(2),$$
with $\mathbf{x} = (x, y) \in \mathbb{R}^2$ and $R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \in SO(2)$.
Before we can apply line completion and enhancement to images, we need a map $U_f : SE(2) \to \mathbb{C}$ which provides an overview of all local orientations in the image $f : \mathbb{R}^2 \to \mathbb{R}$. There exist several approaches to construct such a map, see for example [11], [4], [19], [1], but only a few methods put emphasis on the stability of the inverse transformation $U_f \mapsto f$. However, well-posed image enhancement on the basis of local orientations in an image f can be done via the map $U_f$ if and only if there exists a stable transformation between the image f and the map $U_f$. In this article we restrict ourselves to the case where $U_f = W_\psi f$ is given by
$$W_\psi f(g) = \int_{\mathbb{R}^2} \overline{\psi(R_\theta^{-1}(\mathbf{y} - \mathbf{x}))}\, f(\mathbf{y})\, d\mathbf{y}, \qquad g = (\mathbf{x}, e^{i\theta}) \in SE(2),\ R_\theta \in SO(2), \qquad (1)$$
i.e. the orientation score $W_\psi f$ is obtained from the image f by convolution with a directed anisotropic kernel $\psi \in L_2(\mathbb{R}^2)$ rotated at multiple angles. In section 2 we show that for a certain class of directional kernels ψ we obtain quadratic norm preservation and thereby a stable reconstruction formula. This allows us to relate operators on images to operators on orientation scores via a robust commuting diagram, see Figure 1 (the precise details follow later). Note that an invertible orientation score has useful properties: it carries per position a whole distribution of orientations and, by invertibility, it automatically unwraps crossing lines [9, 8]. So instead of applying a diffusion directly to the image f, we apply anisotropic diffusion to the corresponding orientation score $W_\psi f$ in order to take advantage of these properties. Now, in order to obtain Euclidean invariant smoothing of the image, the diffusion on the orientation score must be left-invariant; therefore, in section 3 we consider left-invariant diffusions on orientation scores. These diffusions are Fokker–Planck PDEs of well-known stochastic processes for line completion and enhancement. We provide their exact solutions as SE(2)-convolutions with the explicit Green’s functions, which were sought by Mumford [15], Citti [3] and many others [19], [1], but were hitherto unknown. Since our exact derivation of the Green’s functions (which are scale space kernels on SE(2)) is rather technical, we omit it here and focus only on the results; for details see our recent works [7], [6]. Instead, we consider the highly simplified case of scale space kernels on the circle $\mathbb{T}$, which is often used in image analysis and is quite analogous to the SE(2) case. This helps the reader to get a better grasp of the scale space kernels on SE(2). Finally, we include an experiment for both line enhancement and line completion.
The advantage of our approach over our previous work [8] on non-linear diffusion on SE(2) is that it involves fewer parameters, is easier to grasp from a stochastic point of view and is easier to implement (in parallel). The drawback, however, is that this scheme is less adaptive. For various biomedical engineering applications we refer to our theses [9, 18, 4].
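The group law of SE(2) given above, and the representation $\mathcal{U}_g f(\mathbf{y}) = f(R_\theta^{-1}(\mathbf{y} - \mathbf{x}))$ used in section 2, can be checked numerically. The minimal sketch below uses hypothetical helper names and stores angles as plain floats modulo 2π. It implements the product, the inverse and the action $\mathbf{y} \mapsto R_\theta \mathbf{y} + \mathbf{x}$. Since $\mathcal{U}_g f = f \circ (\text{action of } g^{-1})$, the composition rule of the action gives $\mathcal{U}_g \mathcal{U}_h = \mathcal{U}_{gh}$.

```python
import numpy as np

def rot(theta):
    """Rotation matrix R_theta in SO(2)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def se2_mul(g, h):
    """Product (x, theta)(x', theta') = (x + R_theta x', theta + theta' mod 2 pi)."""
    (x, t), (xp, tp) = g, h
    return (x + rot(t) @ xp, (t + tp) % (2 * np.pi))

def se2_inv(g):
    """Inverse (x, theta)^(-1) = (-R_theta^T x, -theta mod 2 pi)."""
    x, t = g
    return (-rot(t).T @ x, (-t) % (2 * np.pi))

def act(g, y):
    """Action of g = (x, theta) on a point y of the plane: y -> R_theta y + x."""
    x, t = g
    return rot(t) @ y + x

def represent(g, f):
    """U_g f, i.e. the function y -> f(R_theta^{-1}(y - x)) = f(act(g^{-1}, y))."""
    g_inv = se2_inv(g)
    return lambda y: f(act(g_inv, y))
```

Because the translation part of $gh$ involves $R_\theta \mathbf{x}'$, the products `se2_mul(g, h)` and `se2_mul(h, g)` generally differ, which is the non-commutativity mentioned in the introduction.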
Line Enhancement and Completion
2
Invertible Orientation Scores
The transformation between an image $f : \mathbb{R}^2 \to \mathbb{R}$ and an orientation score $W_\psi f : \mathbb{R}^2 \rtimes \mathbb{T} \to \mathbb{R}$ given by (1) is a wavelet transformation generated by a reducible representation $\mathcal{U} : SE(2) \to B(L_2(\mathbb{R}^2))$ of the Euclidean motion group $SE(2) = \mathbb{R}^2 \rtimes \mathbb{T}$ into the space of bounded operators on $L_2(\mathbb{R}^2)$. This important observation needs some explanation. By definition, a representation of the group SE(2) (with unit element $e = (\mathbf{0}, e^{i0})$) is a homomorphism from SE(2) into the space of bounded operators on $L_2(\mathbb{R}^2)$, which means that $\mathcal{U}_g \circ \mathcal{U}_h = \mathcal{U}_{gh}$ for all $g, h \in SE(2)$ and $\mathcal{U}_e = I$. In our case we have $\mathcal{U}_g f(\mathbf{y}) = f(R_\theta^{-1}(\mathbf{y} - \mathbf{x}))$ for all $f \in L_2(\mathbb{R}^2)$, $g = (\mathbf{x}, e^{i\theta})$. The transform which maps an image f to the orientation score $W_\psi f$ given by (1) can now be rewritten in $L_2$-inner product form:
$$W_\psi[f](g) = (\mathcal{U}_g \psi, f)_{L_2(\mathbb{R}^2)}, \qquad g \in SE(2), \qquad (2)$$
which is the standard group-theoretical structure of a continuous wavelet transform. However, we initially restrict ourselves to a single scale, as in [11]. The issue of scale comes into play later on, through the diffusions on the orientation scores. Note that in standard continuous wavelet theory on the group of translations, rotations and scalings, it is not possible to obtain a stable reconstruction from a single scale layer, as this conflicts with the admissibility condition [12], see [6, ch. 2]. The same holds for edgelets, curvelets and ridgelets [2]. Moreover, the admissibility condition in standard wavelet theory [12] requires the wavelet to oscillate in the radial direction, which is undesirable for the diffusions we consider later on. However, in contrast to the standard approach [12], our representation $\mathcal{U}$ is reducible, which means that there exists a nontrivial closed subspace of $L_2(\mathbb{R}^2)$ which is invariant under $\mathcal{U}_g$ for all $g \in SE(2)$. Consider for example the closed subspace
$$L_2^{\varrho}(\mathbb{R}^2) = \{ f \in L_2(\mathbb{R}^2) \mid \operatorname{support}\{\mathcal{F}f\} \subset B_{\mathbf{0},\varrho} \}, \qquad (3)$$
where $B_{\mathbf{0},\varrho}$ denotes a ball around $\mathbf{0} \in \mathbb{R}^2$ with radius $\varrho > 0$ and where the Fourier transform $\mathcal{F} : L_2(\mathbb{R}^2) \to L_2(\mathbb{R}^2)$ is given by $\mathcal{F}f(\boldsymbol{\omega}) = \frac{1}{2\pi}\int_{\mathbb{R}^2} e^{-i\boldsymbol{\omega}\cdot\mathbf{x}} f(\mathbf{x})\, d\mathbf{x}$. Consequently, the celebrated result of Grossmann et al. [10] on stable reconstruction does not apply. Therefore, in previous work [4] we showed that under minor conditions on ψ the wavelet transform $W_\psi$ is a unitary map from $L_2(\mathbb{R}^2)$ onto some reproducing kernel space of $L_2$-functions on SE(2). Here we avoid technicalities and just provide the essential formula which describes the stability:
$$\int_{\mathbb{R}^2}\int_{\mathbb{T}} |(\mathcal{F}W_\psi f)(\boldsymbol{\omega}, e^{i\theta})|^2\, d\theta\, \frac{1}{M_\psi(\boldsymbol{\omega})}\, d\boldsymbol{\omega} = \int_{\mathbb{R}^2}\int_{\mathbb{T}} |(\mathcal{F}f)(\boldsymbol{\omega})|^2\, |\mathcal{F}\psi(R_\theta^T\boldsymbol{\omega})|^2\, d\theta\, \frac{1}{M_\psi(\boldsymbol{\omega})}\, d\boldsymbol{\omega} = \int_{\mathbb{R}^2} |(\mathcal{F}f)(\boldsymbol{\omega})|^2\, d\boldsymbol{\omega} = \|f\|^2_{L_2(\mathbb{R}^2)}, \qquad (4)$$
where $M_\psi \in C(\mathbb{R}^2, \mathbb{R})$ is defined by $M_\psi(\boldsymbol{\omega}) := \int_0^{2\pi} |\mathcal{F}\psi(R_\theta^T\boldsymbol{\omega})|^2\, d\theta$. If ψ is chosen such that $M_\psi = 1$, then we get $L_2$-norm preservation. However, this is not possible, as $\psi \in L_2 \cap L_1(\mathbb{R}^2)$ implies that $M_\psi$ is a continuous function
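vanishing at infinity. This can be taken into account using distributional kernels [4]. In practice, however, because of finite grid sampling, we can just restrict $W_\psi$ to the space of bandlimited images $L_2^{\varrho}(\mathbb{R}^2)$ given by (3) and use localized wavelets ψ with the property that $M_\psi(\boldsymbol{\omega}) = M(\rho)$, $\rho = \|\boldsymbol{\omega}\|$, where $M : [0, \varrho] \to \mathbb{R}^+$ is a smooth approximation of $1_{[0,\varrho)}$. We call these wavelets proper wavelets. Exact reconstruction is obtained by the adjoint wavelet transform $W_\psi^*$:
$$f = W_\psi^* W_\psi[f] = \mathcal{F}^{-1}\left[\boldsymbol{\omega} \mapsto \int_0^{2\pi} \mathcal{F}[U_f(\cdot, e^{i\theta})](\boldsymbol{\omega})\, \mathcal{F}[\mathcal{R}_{e^{i\theta}}\psi](\boldsymbol{\omega})\, d\theta\, M_\psi^{-1}(\boldsymbol{\omega})\right], \qquad (5)$$
where the rotated kernel is given by $\mathcal{R}_{e^{i\theta}}\psi(\mathbf{x}) = \psi(R_\theta^{-1}\mathbf{x})$. Now, for proper wavelets one may as well use the approximate reconstruction
$$f \approx \mathcal{F}^{-1}\left[\boldsymbol{\omega} \mapsto \int_0^{2\pi} \mathcal{F}[U_f(\cdot, e^{i\theta})](\boldsymbol{\omega})\, \mathcal{F}[\mathcal{R}_{e^{i\theta}}\psi](\boldsymbol{\omega})\, d\theta\right]. \qquad (6)$$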
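A discrete analogue of (5) is easy to test on a periodic pixel grid. The sketch below is an illustration under explicit assumptions, not the authors' implementation. It samples $n_\theta$ orientations and computes the score by FFT-based (circular) correlation. The radial factor is taken equal to 1 on the discrete frequency grid, and the angular profile is a cubic B-spline in the spirit of the example given in the next paragraph of the text. Reconstruction divides by the discrete $M_\psi = \sum_j |\hat\psi_j|^2$. Because shifted B-splines form a partition of unity, simply summing the score over θ also returns f. All function names are hypothetical.

```python
import numpy as np

def cubic_bspline(x):
    """Centred cubic B-spline B^3 = B^0 * B^0 * B^0 * B^0, supported on [-2, 2]."""
    a = np.abs(x)
    return np.where(a < 1, 2 / 3 - a**2 + a**3 / 2,
                    np.where(a < 2, (2 - a) ** 3 / 6, 0.0))

def wavelet_hats(shape, n_theta):
    """Fourier transforms of the n_theta rotated kernels on an FFT grid (angular part only).

    Assumption: the radial factor M(rho) is taken as 1, so
    psi_hat_j(omega) = B^3(n_theta * wrap(phi - theta_j - pi/2) / (2 pi)).
    """
    wy = np.fft.fftfreq(shape[0])[:, None]
    wx = np.fft.fftfreq(shape[1])[None, :]
    phi = np.arctan2(wy, wx)
    hats = []
    for j in range(n_theta):
        theta = 2 * np.pi * j / n_theta
        d = np.angle(np.exp(1j * (phi - theta - np.pi / 2)))  # wrapped to (-pi, pi]
        hats.append(cubic_bspline(n_theta * d / (2 * np.pi)))
    return np.array(hats)

def orientation_score(f, hats):
    """U_f(., theta_j) = F^{-1}[conj(psi_hat_j) F f]: circular correlation with rotated kernels."""
    return np.fft.ifft2(np.conj(hats) * np.fft.fft2(f)[None], axes=(-2, -1))

def reconstruct_exact(U, hats):
    """Discrete analogue of (5): weight by psi_hat_j, sum over j, divide by M_psi."""
    m_psi = np.sum(np.abs(hats) ** 2, axis=0)  # >= 1/4 everywhere for this profile
    num = np.sum(hats * np.fft.fft2(U, axes=(-2, -1)), axis=0)
    return np.fft.ifft2(num / m_psi).real

def reconstruct_sum(U):
    """Reconstruction by summing over orientations only (valid because sum_j psi_hat_j = 1)."""
    return np.sum(U, axis=0).real
```

With a non-trivial radial factor M(ρ), summing over θ would instead return the image filtered by M(ρ), i.e. an approximation that is accurate on the band where M(ρ) is close to 1.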
In [4] we construct two different classes of proper wavelets. Here we briefly mention a typical example of one particular class (for the other class see [18, 4]) that even allows a reconstruction by integration over θ only, which is practical, fast and intuitive.
Example. Let $B^k$ be the k-th order B-spline, i.e. $B^k = B^{k-1} * B^0$, with $B^0(x) = 1_{[-\frac{1}{2},\frac{1}{2}]}(x)$; then we set (with $\boldsymbol{\omega} = (\rho\cos\varphi, \rho\sin\varphi)$)
$$\psi(\mathbf{x}) = \mathcal{F}^{-1}\left[\boldsymbol{\omega} \mapsto B^k\left(\frac{n_\theta\,(\varphi \bmod 2\pi - \frac{\pi}{2})}{2\pi}\right) M(\rho)\right](\mathbf{x}),$$
where $n_\theta$ equals the number of orientation samples in our orientation score, k controls the “kernel width”, and $M(\rho) = e^{-\frac{\rho^2}{2\sigma^2}} \sum_{k=0}^{4} (k!)^{-1}\left(2^{-1}\sigma^{-2}\rho^2\right)^k$, $\sigma = \frac{\varrho}{2}$. Now that we have constructed a stable transformation between images f and the corresponding orientation scores $U_f$, we can relate operators Υ on images to operators Φ on orientation scores in a robust manner, see Figure 1. This relation is one-to-one if we ensure that the operator on the orientation score again yields the orientation score of an image. However, the operators Φ that we propose in the remainder of this article do not leave the space of orientation scores (which we from now on denote by $\mathbb{C}_K^{SE(2)}$) invariant, i.e. the processed orientation score will not be the orientation score of an image but just some enhanced square-integrable element $\Phi(W_\psi f)$ of $L_2(SE(2))$. In practice, however, this does not matter, since we naturally extend the reconstruction formula to $L_2(SE(2))$:
$$(W_\psi^*)^{ext} U(g) = \mathcal{F}^{-1}\left[\boldsymbol{\omega} \mapsto \int_0^{2\pi} \mathcal{F}[U(\cdot, e^{i\theta})](\boldsymbol{\omega})\, \mathcal{F}[\mathcal{R}_{e^{i\theta}}\psi](\boldsymbol{\omega})\, d\theta\, M_\psi^{-1}(\boldsymbol{\omega})\right](\mathbf{x}), \qquad (7)$$
for all $U \in L_2(SE(2))$, where $g = (\mathbf{x}, e^{i\theta}) \in SE(2)$. So no practical problems arise; however, one should be aware that the effective part of an operator Φ on an orientation score is in fact $P_\psi \Phi$, where $P_\psi = W_\psi (W_\psi^*)^{ext}$ is the orthogonal projection of $L_2(SE(2))$ onto the space of orientation scores $\mathbb{C}_K^{SE(2)}$.
Next we briefly motivate why we must restrict ourselves to left-invariant operators on orientation scores. It can be verified that $W_\psi \circ \mathcal{U}_g = \mathcal{L}_g \circ W_\psi$ for all $g \in SE(2)$, where the left-regular representation $\mathcal{L} : SE(2) \to B(L_2(SE(2)))$ is
given by $\mathcal{L}_g U(h) = U(g^{-1}h)$. Consequently, the effective operator Υ on images is Euclidean invariant if and only if the operator Φ on orientation scores is left-invariant:
$$\Upsilon \circ \mathcal{U}_g = \mathcal{U}_g \circ \Upsilon \text{ for all } g \in SE(2) \iff \Phi \circ \mathcal{L}_g = \mathcal{L}_g \circ \Phi \text{ for all } g \in SE(2), \qquad (8)$$
for further details see [4, Thm. 21, p.153]. It is well known that the only left-invariant kernel operators are convolutions. On SE(2) they are given by
$$(K *_{SE(2)} U)(g) = \int_{SE(2)} K(h^{-1}g)\, U(h)\, d\mu(h) = \int_{\mathbb{R}^2}\int_0^{2\pi} K(R_{\theta'}^T(\mathbf{x} - \mathbf{x}'), \theta - \theta')\, U(\mathbf{x}', \theta')\, d\theta'\, d\mathbf{x}', \qquad (9)$$
with $g = (\mathbf{x}, e^{i\theta})$ and μ the left-invariant Haar measure on SE(2). For a detailed overview of alternative algorithms (including complexity, steerability, performance, relation to the Fourier transform on SE(2), relation to tensor voting methods [9], [13] and extension to 3D), see the most complete and recent work [9, ch. 3, p.53, p.72], which contains new, faster algorithms for steerable SE(2) convolutions [9, ch. 3.5.1], [18, ch. 6.5.1, 6.5.2], [4, ch. 7.8, 5.4, 5.3.2]. However, the operators on orientation scores should not be linear, since this would imply that the effective operator Υ is a rotation- and translation-invariant kernel operator and thereby [16] Υ would be an $\mathbb{R}^2$-convolution with an isotropic kernel. Clearly, in this case one does not require invertible orientation scores. Therefore, based on the works [19], [1], [17] on “completion fields”, we consider so-called “collision distribution operators”, which are given by
$$(\tilde\Phi(U, V))(g) = (R_\gamma *_{SE(2)} \chi(U))(g) \cdot (R_\gamma *_{SE(2)} \chi(V))(g), \qquad (10)$$
where $R_\gamma(g) = \gamma \int_0^\infty e^{-\gamma t} K_t(g)\, dt$, $g \in SE(2)$, is a time-integrated probability kernel obtained from a scale space kernel $K_t : SE(2) \to \mathbb{R}^+$ (satisfying $K_{t_1} *_{SE(2)} K_{t_2} = K_{t_1 + t_2}$), which we shall derive in section 3, and where U and V denote two initial distributions on SE(2). Finally, χ in (10) is a monotonic, homogeneous greyvalue transformation on orientation scores such as $\chi(U)(x, y, \theta) = F(\mathrm{Re}\{U(x, y, \theta)\})$, with $F : \mathbb{R} \to \mathbb{R}$ given by $F(I) = |I|^p \operatorname{sign}(I)$, for some p > 1. Here we do not put sources and sinks by hand as delta distributions on SE(2) [19]; instead, we use invertible orientation scores. So in (10) we set $U = V = W_\psi f$ and consider the operators $W_\psi f \mapsto \Phi(W_\psi f) := \tilde\Phi(W_\psi f, W_\psi f)$. The motivation for our choice (10) comes from basic probability theory, which we explain next.
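Left invariance of SE(2)-convolutions, which underlies (8) and (9), can be verified exactly on a toy discretization in which rotations map the pixel grid onto itself. The sketch assumes a periodic N×N grid and only four orientations (multiples of 90°), so it is a consistency check rather than a practical implementation. For $g = (\mathbf{b}, e^{i\beta})$ with integer $\mathbf{b}$ and β a multiple of 90°, it checks that $K *_{SE(2)} \mathcal{L}_g U = \mathcal{L}_g (K *_{SE(2)} U)$, where $(\mathcal{L}_g U)(\mathbf{x}, \theta) = U(R_\beta^{-1}(\mathbf{x} - \mathbf{b}), \theta - \beta)$. The grid sizes and function names are hypothetical.

```python
import numpy as np

N, NT = 8, 4  # periodic N x N pixel grid; orientations theta = q * 90 degrees, q = 0..3

def rot_coords(i, j, q):
    """Apply R_{-90 deg} q times, i.e. R_theta^T for theta = q * 90 deg, to grid coordinates (mod N)."""
    for _ in range(q % 4):
        i, j = j, -i
    return i % N, j % N

def rotate_field(F, q):
    """The N x N array z -> F(R_theta^T z) for theta = q * 90 deg."""
    I, J = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return F[rot_coords(I, J, q)]

def conv2_periodic(A, B):
    """Circular 2D convolution: sum over x' of A(x - x') B(x')."""
    return np.fft.ifft2(np.fft.fft2(A) * np.fft.fft2(B)).real

def se2_conv(K, U):
    """Discrete (9): (K * U)(x, t) = sum_{x', t'} K(R_{t'}^T (x - x'), t - t') U(x', t')."""
    out = np.zeros_like(U)
    for t in range(NT):
        for tp in range(NT):
            out[t] += conv2_periodic(rotate_field(K[(t - tp) % NT], tp), U[tp])
    return out

def left_shift(U, b, q):
    """(L_g U)(x, t) = U(R_q^T (x - b), t - q) for g = (b, q * 90 deg)."""
    I, J = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    src = rot_coords(I - b[0], J - b[1], q)
    return np.stack([U[(t - q) % NT][src] for t in range(NT)])
```

The check works for any kernel K, which reflects the statement in the text that convolutions are left-invariant without any symmetry assumption on the kernel.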
3
Scale Spaces on SE(2) Based on Stochastic Processes
By the results of the previous section, an operator on orientation scores must be left-invariant. Therefore we consider left-invariant scale spaces. The PDEs of these scale spaces are the Fokker-Planck equations of left-invariant stochastic processes for line enhancement/completion. Just as an image can be interpreted as a distribution of greyvalue particles over space, the absolute value of an orientation score can be interpreted as a
R. Duits and E. Franken
[Figure 1 diagram: Image $f\in L^2(\mathbb{R}^2)$ →($W_\psi$)→ Orientation score $U_f\in\mathbb{C}_K^{SE(2)}\subset L^2(SE(2))$ →(Φ)→ Processed score $\Phi[U_f]\in L^2(SE(2))$ →($(W_\psi^*)^{ext} = W_\psi^* P_\psi$)→ Processed image $\Upsilon[f] = W_\psi^*[\Phi[U_f]]$; Υ denotes the composite map.]
Fig. 1. The complete scheme. For admissible vectors ψ, the linear map $W_\psi$ is unitary from $L^2(\mathbb{R}^2)$ onto the closed subspace of orientation scores $\mathbb{C}_K^{SE(2)}$ within $L^2(SE(2))$. So we can uniquely relate an operator $\Phi:\mathbb{C}_K^{SE(2)}\to\mathbb{C}_K^{SE(2)}$ on an orientation score to an operator on an image, $\Upsilon = (W_\psi^*)^{ext}\circ\Phi\circ W_\psi\in B(L^2(\mathbb{R}^2))$. Here $(W_\psi^*)^{ext}$ is given by (7), and $\Phi(W_\psi f) = \tilde\Phi(W_\psi f, W_\psi f)$ is given by (10). It uses the Green's functions/probability kernels $K_s := G_s^{D,a}: SE(2)\to\mathbb{R}^+$ of the scale spaces on SE(2) (for line enhancement and completion), which we derive in Section 3.
distribution of oriented greyvalue particles over space and orientation. Next we derive suitable stochastic processes on this distribution of oriented greyvalue particles. We first consider a single oriented greyvalue particle with initial position X(0) and orientation $e^{i\Theta(0)}$ in SE(2), and apply superposition afterwards. For line completion, this oriented greyvalue particle is sent in the spatial plane along its preferred direction $\mathbf{e}_\xi = \cos\theta\,\mathbf{e}_x + \sin\theta\,\mathbf{e}_y$, $\xi = x\cos\theta + y\sin\theta$. Its orientation is allowed to behave randomly over time, with variance σ² > 0:
$$(X_{n+1},\Theta_{n+1}) := (X_n,\Theta_n) + \Delta s\,(\cos\Theta_n\,\mathbf{e}_x + \sin\Theta_n\,\mathbf{e}_y,\ \kappa_0) + \sqrt{\Delta s}\,\sigma\,\varepsilon_{n+1}\,(0,0,1),\quad (X_0,\Theta_0) = (\mathbf{0},0),\quad \varepsilon_{n+1}\sim\mathcal{N}(0,1)\ \text{independently normally distributed}, \qquad (11)$$
with steps Δs = L/N, total length L of the trajectory, n = 0, 1, ..., N−1, N ∈ ℕ, and κ0 an a priori curvature. This stochastic process is known in computer vision as the direction process [15], see Fig. 2. By infinite repetition of this process one obtains a limiting distribution $G: SE(2)\times\mathbb{R}^+\to\mathbb{R}^+$ of greyvalue particles. By Itô's formula, it satisfies the following Fokker-Planck equation:
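$$\begin{cases}\partial_s G(x,y,\theta,s) = \left(-\partial_\xi - \kappa_0\partial_\theta + D_{11}(\partial_\theta)^2\right)G(x,y,\theta,s),\\ G(\cdot, s=0) = \delta_{g_0} = \delta_{x_0}\otimes\delta_{y_0}\otimes\delta_{\theta_0}.\end{cases} \qquad (12)$$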
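The direction process (11) is easy to simulate directly. The sketch below assumes NumPy, and `direction_process` is a hypothetical helper. It runs many independent walks; a histogram of the end points (X_N, Θ_N) approximates G(·, s = L) of (12).

Section 3.1 identifies D11 = σ²/2. With κ0 = 0, the orientation variance after length L should therefore be 2·D11·L = σ²L, which gives a quick consistency check. With κ0 ≠ 0, the mean orientation should drift to κ0·L.

```python
import numpy as np


def direction_process(n_walks, L, N, sigma, kappa0=0.0, seed=0):
    """Simulate n_walks independent realisations of the direction process (11)."""
    rng = np.random.default_rng(seed)
    ds = L / N
    x = np.zeros(n_walks)
    y = np.zeros(n_walks)
    theta = np.zeros(n_walks)
    for _ in range(N):
        # The position moves along the current orientation Theta_n ...
        x += ds * np.cos(theta)
        y += ds * np.sin(theta)
        # ... then the orientation drifts by kappa0 * ds and diffuses.
        theta += ds * kappa0 + np.sqrt(ds) * sigma * rng.standard_normal(n_walks)
    return x, y, theta


x, y, theta = direction_process(n_walks=20000, L=1.0, N=100, sigma=0.5)
print(np.var(theta))  # close to sigma**2 * L = 0.25
```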
In a Markov process the traveling time s is memoryless. Therefore s must be exponentially distributed, i.e. $P(S=s) = \gamma e^{-\gamma s}$ with expectation $E(S) = \gamma^{-1}$. By superposition, the probability density of finding an oriented greyvalue particle at time s > 0, at position (x, y) with orientation θ, starting from the distribution U ∈ L¹(SE(2)) at s = 0, equals
Line Enhancement and Completion
$$P(x,y,\theta\mid U, S=s) = (G_s^{D_{11}} *_{SE(2)} U)(x,y,\theta)\quad\text{with}\quad G_s^{D_{11}}(x,y,\theta) = G(x,y,\theta,s),$$
$$P(x,y,\theta\mid U) = \int_{\mathbb{R}^+}P(x,y,\theta\mid U,S=s)\,P(S=s)\,ds = (R_\gamma^{D_{11}} *_{SE(2)} U)(x,y,\theta), \qquad (13)$$
where $P(S=s) = \gamma e^{-\gamma s}$, so that $R_\gamma^{D_{11}} = \gamma\int_0^\infty G_s^{D_{11}}e^{-\gamma s}\,ds$. For line enhancement, we consider a different stochastic process on SE(2):
$$(X_{n+1},\Theta_{n+1}) := (X_n,\Theta_n) + \sqrt{\Delta s}\,\left(\sigma_2\,\varepsilon^2_{n+1}(\mathbf{e}_x\cos\Theta_n + \mathbf{e}_y\sin\Theta_n),\ \sigma_1\,\varepsilon^1_{n+1}\right),\quad (X_0,\Theta_0) = (\mathbf{0},0),\quad \varepsilon^i_{n+1}\sim\mathcal{N}(0,1)\ \text{independently normally distributed}, \qquad (14)$$
where i = 1, 2. Again, by infinite concatenation of this process one obtains a limiting distribution, which satisfies the following Fokker-Planck equation:
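$$\begin{cases}\partial_s G_{s,g_0}^{D_{11},D_{22}}(x,y,\theta) = \left(D_{11}(\partial_\theta)^2 + D_{22}(\partial_\xi)^2\right)G_{s,g_0}^{D_{11},D_{22}}(x,y,\theta),\\ G_{0,g_0}^{D_{11},D_{22}}(\cdot) = \delta_{g_0} = \delta_{x_0}\otimes\delta_{y_0}\otimes\delta_{\theta_0},\end{cases} \qquad (15)$$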
with ξ = x cos θ + y sin θ, which coincides with Citti's model for perceptual enhancement in SE(2) [3]. Next we consider all linear left-invariant second-order scale spaces on the Euclidean motion group SE(2), whose solutions are SE(2)-convolutions with the corresponding Green's functions. In two particular cases we arrive at the Green's functions of (12) and (15). In contrast to previous work [15] and [3], we provide the exact Green's functions.
3.1
Left-Invariant Scale Spaces on SE(2)
A vector field X on SE(2) is called left-invariant if, for all g ∈ SE(2), the pushforward $(L_g)_*X_e$ of $X_e$ under left multiplication $L_g h = gh$ equals $X_g$, that is
$$X_g = (L_g)_*(X_e)\ \Leftrightarrow\ X_g f = X_e(f\circ L_g)\quad\text{for all } f\in C^\infty(\Omega_g,\mathbb{R}), \qquad (16)$$
where $\Omega_g$ is some open set around g ∈ SE(2). Recall that the tangent space at the unit element $e = (0, 0, e^{i0})$ is spanned by $\{\mathbf{e}_x,\mathbf{e}_y,\mathbf{e}_\theta\} = \{(1,0,0),(0,1,0),(0,0,1)\}$. By the general recipe explained in [5], we get the following basis for the space of left-invariant vector fields, $\mathcal{L}(SE(2))$:
$$\{\mathcal{A}_1,\mathcal{A}_2,\mathcal{A}_3\} := \{\partial_\theta,\partial_\xi,\partial_\eta\} = \{\partial_\theta,\ \cos\theta\,\partial_x + \sin\theta\,\partial_y,\ -\sin\theta\,\partial_x + \cos\theta\,\partial_y\}, \qquad (17)$$
with ξ = x cos θ + y sin θ and η = −x sin θ + y cos θ.
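The commutation relations of the vector fields (17), stated in the next paragraph, can be checked numerically. This sketch applies $\mathcal{A}_1,\mathcal{A}_2,\mathcal{A}_3$ as nested central differences to an arbitrary smooth test function at an arbitrary point. The step H = 1e-4 is an assumption, and the result is a sanity check only, not a proof.

```python
import math

H = 1e-4  # finite-difference step (assumption)


def partial(f, i):
    """Central-difference partial derivative of f(x, y, theta) w.r.t. argument i."""
    def df(x, y, t):
        p, m = [x, y, t], [x, y, t]
        p[i] += H
        m[i] -= H
        return (f(*p) - f(*m)) / (2 * H)
    return df


def A1(f):  # d/dtheta
    return partial(f, 2)


def A2(f):  # cos(theta) d/dx + sin(theta) d/dy
    fx, fy = partial(f, 0), partial(f, 1)
    return lambda x, y, t: math.cos(t) * fx(x, y, t) + math.sin(t) * fy(x, y, t)


def A3(f):  # -sin(theta) d/dx + cos(theta) d/dy
    fx, fy = partial(f, 0), partial(f, 1)
    return lambda x, y, t: -math.sin(t) * fx(x, y, t) + math.cos(t) * fy(x, y, t)


def bracket(A, B, f):
    """Commutator [A, B] f = A(B f) - B(A f)."""
    ab, ba = A(B(f)), B(A(f))
    return lambda x, y, t: ab(x, y, t) - ba(x, y, t)


def f(x, y, t):  # arbitrary smooth test function
    return math.exp(0.3 * x) * math.sin(2 * y + t) + x * y * math.cos(t)


g = (0.4, -0.7, 1.1)
print(bracket(A1, A2, f)(*g), A3(f)(*g))  # the two values should agree
```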
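Note that the non-commutative behavior of the group is intuitively reflected in a non-commutative Lie algebra: $[\mathcal{A}_1,\mathcal{A}_2] = \mathcal{A}_1\mathcal{A}_2 - \mathcal{A}_2\mathcal{A}_1 = \mathcal{A}_3$, $[\mathcal{A}_1,\mathcal{A}_3] = -\mathcal{A}_2$, $[\mathcal{A}_2,\mathcal{A}_3] = 0$. Next we follow our general theory for left-invariant scale spaces on Lie groups [5] and set the following quadratic form on $\mathcal{L}(SE(2))$, with $a = (a_1,a_2,a_3)\in\mathbb{R}^3$:
$$Q^{D,a}(\mathcal{A}_1,\mathcal{A}_2,\mathcal{A}_3) = \sum_{i=1}^{3}\left(-a_i\mathcal{A}_i + \sum_{j=1}^{3}D_{ij}\mathcal{A}_i\mathcal{A}_j\right),\qquad D := [D_{ij}]\in\mathbb{R}^{3\times3}, \qquad (18)$$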
with $D^T = D \ge 0$, and consider the linear left-invariant scale spaces on SE(2):
$$\begin{cases}\partial_s W(g,s) = Q^{D,a}(\mathcal{A}_1,\mathcal{A}_2,\mathcal{A}_3)\,W(g,s), & g\in SE(2),\ s>0,\\ W(g,s=0) = U(g), & g\in SE(2),\end{cases} \qquad (19)$$
with the corresponding resolvent equations (obtained by Laplace transform over s):
$$P_\gamma(g) = \gamma\left(\gamma I - Q^{D,a}(\mathcal{A}_1,\mathcal{A}_2,\mathcal{A}_3)\right)^{-1}U(g), \qquad (20)$$
which (for the cases a = 0) correspond to first-order Tikhonov regularization on SE(2) [5]. By our results in [5], the solutions of these left-invariant evolution equations are SE(2)-convolutions with the corresponding Green's functions:
$$W(\mathbf{x},\theta,s) = (G_s^{D,a} *_{SE(2)} U)(\mathbf{x},\theta),\qquad P_\gamma(\mathbf{x},\theta) = (R_\gamma^{D,a} *_{SE(2)} U)(\mathbf{x},\theta),$$
where we recall (9). In the special case $D_{ij} = \frac12\sigma^2\delta_{i1}\delta_{j1}$, i, j = 1, 2, 3, and a = (κ0, 1, 0), our scale space equation (19) is the Fokker-Planck equation (12) of Mumford's direction process for line completion. In the case $D_{ij} = D_{ii}\delta_{ij}$, $D_{11} = \frac12\sigma_1^2$, $D_{22} = \frac12\sigma_2^2$, $D_{33} = 0$, a = 0, our scale space equation is the Fokker-Planck equation (15) of the stochastic process for line enhancement. Next we provide the exact Green's functions with suitable approximations, but first we give a simple, intuitive, yet analogous example.
3.2
A Simple Introductory Example: Scale Spaces on the Circle
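The Gaussian scale space equation and the corresponding resolvent equation on the circle $\mathbb{T} = \{e^{i\theta}\mid\theta\in[0,2\pi)\}$, with group product $e^{i\theta}e^{i\theta'} = e^{i(\theta+\theta')}$, read
$$\begin{cases}\partial_s u(\theta,s) = D_{11}\partial_\theta^2 u(\theta,s),\\ u(0,s) = u(2\pi,s),\quad u(\theta,0) = f(\theta),\end{cases}\qquad\text{and}\qquad p_\gamma(\theta) = \gamma\left(\gamma I - D_{11}\partial_\theta^2\right)^{-1}f(\theta), \qquad (21)$$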
with θ ∈ [0, 2π) and D11 > 0 fixed. Recall that the function $\theta\mapsto p_\gamma(\theta) = \gamma\int_0^\infty u(\theta,s)e^{-\gamma s}\,ds$ is the minimizer of the energy
$$E(p_\gamma) := \int_0^{2\pi}\gamma|p_\gamma(\theta) - f(\theta)|^2 + D_{11}|p_\gamma'(\theta)|^2\,d\theta$$
under the periodicity condition $p_\gamma(0) = p_\gamma(2\pi)$. By left-invariance, the solutions are given by $\mathbb{T}$-convolution with their Green's functions (or "impulse responses"), say $G_s^{D_{11}}:\mathbb{T}\to\mathbb{R}^+$ and $R_\gamma^{D_{11}}:\mathbb{T}\to\mathbb{R}^+$. Note that $R_\gamma^{D_{11}} = \gamma\int_0^\infty G_s^{D_{11}}e^{-\gamma s}\,ds$. The orthogonal eigenfunctions of the diffusion process are the eigenfunctions of the generator $D_{11}(\partial_\theta)^2$. They are given by $\eta_n(\theta) = \frac{e^{in\theta}}{\sqrt{2\pi}}$, with eigenvalues $-D_{11}n^2$, so that
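$$u(\theta,s) = \sum_{n\in\mathbb{Z}}(\eta_n,f)_{L^2(\mathbb{T})}\,\eta_n(\theta)\,e^{-n^2sD_{11}},\qquad G_s^{D_{11}}(\theta) = \sum_{n\in\mathbb{Z}}\eta_n(\theta)\,\eta_n(0)\,e^{-n^2sD_{11}},$$
$$p_\gamma(\theta) = \sum_{n\in\mathbb{Z}}(\eta_n,f)_{L^2(\mathbb{T})}\,\eta_n(\theta)\,\frac{\gamma}{D_{11}n^2+\gamma},\qquad R_\gamma^{D_{11}}(\theta) = \sum_{n\in\mathbb{Z}}\eta_n(\theta)\,\eta_n(0)\,\frac{\gamma}{D_{11}n^2+\gamma}. \qquad (22)$$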
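The trade-off between the Fourier series (22) and a spatial implementation can be checked numerically. The spatial implementation unfolds the circle and sums 2π-shifted kernels on ℝ. On ℝ, the sketch uses the heat kernel $(4\pi sD_{11})^{-1/2}e^{-\theta^2/(4sD_{11})}$ and its Laplace transform $\frac12\sqrt{\gamma/D_{11}}\,e^{-\sqrt{\gamma/D_{11}}|\theta|}$. NumPy is assumed, and the function names are illustrative.

Note how many Fourier terms the resolvent needs: its coefficients decay only like 1/n². A handful of spatial shifts suffices instead.

```python
import numpy as np


def heat_fourier(theta, s, D11, n_max=200):
    """G_s^{D11} on the circle via the eigenfunction series (22)."""
    n = np.arange(-n_max, n_max + 1)
    # eta_n(theta) * eta_n(0) = exp(i n theta) / (2 pi)
    series = np.exp(1j * np.outer(theta, n)) * np.exp(-n**2 * s * D11) / (2 * np.pi)
    return series.sum(axis=1).real


def resolvent_fourier(theta, gamma, D11, n_max=20000):
    """R_gamma^{D11} via (22); terms decay only like 1/n^2, so many are needed."""
    n = np.arange(-n_max, n_max + 1)
    series = np.exp(1j * np.outer(theta, n)) * gamma / (D11 * n**2 + gamma) / (2 * np.pi)
    return series.sum(axis=1).real


def heat_unfolded(theta, s, D11, k_max=3):
    """Sum of 2*pi-shifted Gaussians on R (the circle unfolded)."""
    k = np.arange(-k_max, k_max + 1)
    d = theta[:, None] - 2 * np.pi * k[None, :]
    return (np.exp(-d**2 / (4 * s * D11)) / np.sqrt(4 * np.pi * s * D11)).sum(axis=1)


def resolvent_unfolded(theta, gamma, D11, k_max=3):
    """Sum of 2*pi-shifted kernels (1/2) sqrt(gamma/D11) exp(-sqrt(gamma/D11) |theta|)."""
    a = np.sqrt(gamma / D11)
    k = np.arange(-k_max, k_max + 1)
    d = np.abs(theta[:, None] - 2 * np.pi * k[None, :])
    return (0.5 * a * np.exp(-a * d)).sum(axis=1)
```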
A well-known drawback of such an approach is that the series converge slowly if s > 0 is small, resp. if γ > 0 is large (i.e. if the expected life time $\gamma^{-1}$ is small). In such a case one prefers a spatial implementation over a Fourier implementation: one unfolds the circle and computes the modulo-2π shifts afterwards, i.e.
$$u(\theta,s) = (G_s^{D_{11}} * f)(\theta),\quad\text{where}\quad G_s^{D_{11}}(\theta) = \sum_{n\in\mathbb{Z}}G_s^{D_{11},\infty}(\theta - 2\pi n), \qquad (23)$$
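where the Green's functions for diffusion and Tikhonov regularization on ℝ are
$$G_s^{D_{11},\infty}(\theta) = \frac{1}{\sqrt{4\pi sD_{11}}}\,e^{-\frac{\theta^2}{4sD_{11}}}\quad\text{and}\quad R_\gamma^{D_{11},\infty}(\theta) = \gamma\int_{\mathbb{R}^+}G_s^{D_{11},\infty}(\theta)\,e^{-\gamma s}\,ds = \frac12\sqrt{\frac{\gamma}{D_{11}}}\,e^{-\sqrt{\gamma/D_{11}}\,|\theta|}.$$
Again, the latter formula follows by Laplace transform of the first. A better derivation is a continuous (but not differentiable) fit at θ = 0 of two solutions in the null space of the operator $-D_{11}\partial_\theta^2 + \gamma I$, which vanish at +∞ and −∞, respectively. The sums in (23) can be computed explicitly, yielding $G_s^{D_{11}}(\theta) = \frac{1}{2\pi}\,\vartheta_3\left(\frac{\theta}{2},\,e^{-sD_{11}}\right)$, where $\vartheta_3$ is the third Jacobi theta function.
3.3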
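The Green’s Functions of the Line-Completion Process
Let us consider the case $D_{ij} = \frac{\sigma^2}{2}\delta_{i1}\delta_{j1}$, a = (κ0, 1, 0), where our scale space equation (19) equals the Fokker-Planck equation (12) of Mumford's direction process. The next theorems provide formulas (like (22)) for the Green's functions in terms of Mathieu functions. We use the conventions of [14]: $\mathrm{me}_\nu(z,q)$, $\mathrm{ce}_\nu(z,q)$ with Floquet exponent ν such that Im(ν) ≥ 0.
Theorem 1. Consider the Green's functions $R_\gamma^{D_{11}}, G_s^{D_{11}}\in C^\infty(SE(2)\setminus\{e\})$ of the direction process with κ0 = 0, i.e. the unique smooth solutions of
$$\begin{cases}\left(\partial_\xi - D_{11}\partial_\theta^2 + \gamma\right)R_\gamma^{D_{11}} = \gamma\,\delta_e,\\ R_\gamma^{D_{11}}(\cdot,0) = R_\gamma^{D_{11}}(\cdot,2\pi),\end{cases}\qquad \begin{cases}\partial_s G_s^{D_{11}} = \left(-\partial_\xi + D_{11}\partial_\theta^2\right)G_s^{D_{11}},\\ G_s^{D_{11}}(\cdot,0) = G_s^{D_{11}}(\cdot,2\pi),\quad \lim_{s\downarrow0}G_s^{D_{11}} = \delta_e.\end{cases} \qquad (24)$$
They are given by
$$R_\gamma^{D_{11}}(x,y,\theta) = \mathcal{F}^{-1}\left(\omega\mapsto\sum_{n=0}^{\infty}\frac{\gamma}{\frac{\pi}{2}\lambda_n(\rho)}\,\mathrm{ce}_n\!\left(\frac{-\varphi}{2},\frac{2i\rho}{D_{11}}\right)\mathrm{ce}_n\!\left(\frac{\theta-\varphi}{2},\frac{2i\rho}{D_{11}}\right)\right)(x,y),$$
$$G_s^{D_{11}}(x,y,\theta) = \mathcal{F}^{-1}\left(\omega\mapsto\sum_{n=0}^{\infty}\frac{e^{-a_n\left(\frac{2i\rho}{D_{11}}\right)s}}{\frac{\pi}{2}}\,\mathrm{ce}_n\!\left(\frac{-\varphi}{2},\frac{2i\rho}{D_{11}}\right)\mathrm{ce}_n\!\left(\frac{\theta-\varphi}{2},\frac{2i\rho}{D_{11}}\right)\right)(x,y), \qquad (25)$$
with ω = (ρ cos ϕ, ρ sin ϕ) and $-\lambda_n(\rho) = -a_n\left(\frac{2i\rho}{D_{11}}\right) - \gamma < 0$. Here $a_n(h^2)$ denote the positive eigenvalues of Mathieu's equation, cf. [14]. The Green's function $R_\gamma^{D_{11}}$ is indeed a probability kernel, i.e. $R_\gamma^{D_{11}} > 0$ and $\int_{SE(2)}R_\gamma^{D_{11}}(g)\,dg = 1$.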
For a detailed proof we refer to our latest work [7]. The most relevant observation there is that the generator $B = -\partial_\xi + D_{11}\partial_\theta^2$ of the line-completion process, in the Fourier domain (with respect to (x, y) only), reads
$$\hat B = \mathcal{F}B\mathcal{F}^{-1} = -i\omega_x\cos\theta - i\omega_y\sin\theta + D_{11}\partial_\theta^2 = -i\rho\cos(\theta-\varphi) + D_{11}\partial_\theta^2, \qquad (26)$$
so (25) is a bi-orthogonal expansion in eigenfunctions of the restriction of $-\hat B + \gamma I$ to the circle $\mathbb{T}$. These eigenfunctions are directly related to the Mathieu functions, which are eigenfunctions of $\partial_z^2 - 2h^2\cos(2z)$. The formulae (25) are the exact solutions of the problem that the numerical algorithm by August [1] solves approximately. The drawback, however, of this bi-orthogonal
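expansion is its slow convergence near e. This inspired us to find a much better series representation. The idea is to make a continuous (but, at θ = 0, not differentiable) fit of elements on each side of the singularity at θ = 0. These elements lie within the null space of $-\hat B + \gamma I$ and vanish as θ → ±∞. Here we unfold the circle, which provides a series of 2π-shifts of the solutions with infinite boundary conditions. For relevant parameter settings this series can be truncated at N = 0, 1, or at most N = 2 if D11/γ is small. This yields the following analogue of (23):
Theorem 2. Consider the Green's function $R_\gamma^{D_{11}}$ of the direction process with a priori curvature κ0 ≥ 0, i.e. the unique smooth solution of the resolvent equation in (24) with the drift term κ0∂θ added to ∂ξ. It is given by $R_\gamma^{D_{11}}(\mathbf{x},\theta) = \mathcal{F}^{-1}\left(\omega\mapsto\hat R_\gamma^{D_{11}}(\omega,\theta)\right)(\mathbf{x})$, with
$$\hat R_\gamma^{D_{11}}(\omega,\theta) = \lim_{N\to\infty}\sum_{k=-N}^{N}\hat R_{\gamma,\infty}^{D_{11}}(\omega,\theta - 2k\pi).$$
Here $\hat R_{\gamma,\infty}^{D_{11}} = \mathcal{F}R_{\gamma,\infty}^{D_{11}}$ is the Fourier transform of the solution $R_{\gamma,\infty}^{D_{11}}$ of
$$\begin{cases}\left(\partial_\xi + \kappa_0\partial_\theta - D_{11}(\partial_\theta)^2 + \gamma\right)R_{\gamma,\infty}^{D_{11}} = \gamma\,\delta_e,\\ R_{\gamma,\infty}^{D_{11}}(\cdot,\theta)\to0\ \text{uniformly on compacta as } |\theta|\to\infty.\end{cases}$$
It is given by
$$\hat R_{\gamma,\infty}^{D_{11}}(\omega_x,\omega_y,\theta) = \frac{-\gamma\,e^{\frac{\kappa_0\theta}{2D_{11}}}}{\pi D_{11}W(\rho)}\left[\mathrm{me}_\nu\!\left(\frac{\varphi}{2},\frac{2i\rho}{D_{11}}\right)\mathrm{me}_{-\nu}\!\left(\frac{\varphi-\theta}{2},\frac{2i\rho}{D_{11}}\right)u(\theta) + \mathrm{me}_{-\nu}\!\left(\frac{\varphi}{2},\frac{2i\rho}{D_{11}}\right)\mathrm{me}_{\nu}\!\left(\frac{\varphi-\theta}{2},\frac{2i\rho}{D_{11}}\right)u(-\theta)\right],$$
with Floquet exponent $\nu = \nu\left(-\frac{4\gamma}{D_{11}} - \frac{\kappa_0^2}{D_{11}^2},\ \frac{2i\rho}{D_{11}}\right)$ and unit step function $u(\theta) = \frac12(1+\mathrm{sign}(\theta))$. The function W(ρ) denotes the Wronskian of $\mathrm{me}_\nu(\cdot,\frac{2i\rho}{D_{11}})$ and $\mathrm{me}_{-\nu}(\cdot,\frac{2i\rho}{D_{11}})$.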
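Theorem 1 expresses $\hat R$ in Mathieu functions. An independent numerical cross-check, which is not the authors' method, is to solve $(\gamma I - \hat B)\,r = \gamma\,\delta_{\theta=0}$ with $\hat B$ from (26) directly. In a truncated Fourier basis $e^{ik\theta}$, |k| ≤ K, the term $i\rho\cos(\theta-\varphi)$ only couples neighbouring modes. This sketch covers κ0 = 0 only; the truncation K = 64 and the function name are assumptions.

At ρ = 0 the result reduces to the circle resolvent of (22). Its θ-integral is 2π times the k = 0 coefficient. That integral is the Fourier transform of a probability kernel evaluated at ω, so its modulus should not exceed 1.

```python
import numpy as np


def resolvent_hat(rho, phi, gamma, D11, theta, K=64):
    """Approximate hat R_gamma^{D11}(omega, theta), kappa0 = 0, omega = rho (cos phi, sin phi).

    Spectral (Fourier-Galerkin) discretisation of (gamma I - hat B) r = gamma * delta_0
    with hat B = -i rho cos(theta - phi) + D11 d^2/dtheta^2, modes |k| <= K.
    """
    k = np.arange(-K, K + 1)
    M = np.diag((gamma + D11 * k**2).astype(complex))
    off = 0.5j * rho
    # i rho cos(theta - phi) e^{ik theta} feeds modes k + 1 and k - 1
    M += np.diag(np.full(2 * K, off * np.exp(-1j * phi)), -1)  # coefficient of c_{k-1} in row k
    M += np.diag(np.full(2 * K, off * np.exp(1j * phi)), 1)    # coefficient of c_{k+1} in row k
    rhs = np.full(2 * K + 1, gamma / (2 * np.pi), dtype=complex)  # gamma * delta_0 -> 1/(2 pi) per mode
    c = np.linalg.solve(M, rhs)
    return np.exp(1j * np.outer(theta, k)) @ c
```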
Remark. The sum in Theorem 2 can be computed explicitly (by Floquet's theorem). This yields a single exact formula on SE(2) consisting of only 4 Mathieu functions [18, p. 127], [7, ch. 4.2.1], analogous to the ϑ-function of Subsection 3.2. However, it still requires sampling of Mathieu functions. Therefore we derive a parametrix (see [3], [17]) by replacing the true left-invariant vector fields $\{\mathcal{A}_1,\mathcal{A}_2,\mathcal{A}_3\}$ on SE(2) by
$$\{\hat{\mathcal{A}}_1,\hat{\mathcal{A}}_2,\hat{\mathcal{A}}_3\} = \{\partial_\theta,\ \partial_x + \theta\partial_y,\ -\theta\partial_x + \partial_y\}. \qquad (27)$$
Essentially, this locally replaces the group of positions and orientations by the (nilpotent) group of positions and velocities (normalized in the x-direction), see Fig. 2. By some theory on nilpotent Lie groups it follows, see [4, p. 166], that
$$\tilde G_s^{D_{11}}(x,y,\theta) = \delta(x-s)\,\frac{\sqrt3}{2D_{11}\pi x^2}\,e^{-\frac{3(x\theta-2y)^2 + x^2(\theta-\kappa_0x)^2}{4x^3D_{11}}},$$
yielding, by Laplace transform,
$$\tilde R_\gamma^{D_{11}}(x,y,\theta) = \gamma\,\frac{\sqrt3\,e^{-\gamma x}}{2D_{11}\pi x^2}\,e^{-\frac{3(x\theta-2y)^2 + x^2(\theta-\kappa_0x)^2}{4x^3D_{11}}}\,u(x). \qquad (28)$$
3.4
The Green’s Functions of the Line-Enhancement Process
In this paragraph we will derive the Green’s functions of the line-enhancement process (14). These kernels are the exact heat-kernels for a Gaussian scale space
[Figure 2: left, random walks g(t) of the direction process in the image plane; middle, isoline plots in the (x, y) plane with a 3D inset; right, comparison of level curves in the (x, y) plane.]
Fig. 2. Left: Random walks of the direction process (11). Middle: isoline plots of the marginals of $R_\gamma^{D_{11}}$, which is the time-integrated limiting distribution (using Itô calculus) of all random walks, γ = 1/10, D11 = 1/32, κ0 = 0. Left corner of the middle image: 3D plot of 2D isolines of $R_\gamma^{D_{11}}$. Right: a comparison of the level curves of the marginals of $\tilde R_\gamma^{D_{11}}$ and $R_\gamma^{D_{11}}$. Dashed lines denote the level sets of the approximation $\tilde R_\gamma^{D_{11}}$, see (28).
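on the group of positions and orientations. Here we even allow D33 ≥ 0; set D33 = 0 to get the Green's functions of (14).
Theorem 3. Let D11, D22, D33 > 0. Consider the Green's functions (or rather Gaussian kernels) $G_s^{D_{11},D_{22},D_{33}}$ on the Euclidean motion group SE(2) of the scale space equation (19), generated by (18) with a = 0 and $D_{ij} = D_{ii}\delta_{ij}$. They are given by $G_s^{D_{11},D_{22},D_{33}}(b_1,b_2,e^{i\theta}) = \mathcal{F}^{-1}\left[\omega\mapsto\hat G_s^{D_{11},D_{22},D_{33}}(\omega,e^{i\theta})\right](b_1,b_2)$, with
$$\hat G_s^{D_{11},D_{22},D_{33}}(\omega,e^{i\theta}) = \frac{e^{-\frac{s}{2}(D_{22}+D_{33})\rho^2}}{\pi}\sum_{n=0}^{\infty}\mathrm{ce}_n(\varphi,q)\,\mathrm{ce}_n(\varphi-\theta,q)\,e^{-s\,a_n(q)D_{11}},$$
with $q = q(\rho) = \frac{\rho^2(D_{22}-D_{33})}{4D_{11}}$ and $a_n(q)$ the Mathieu characteristic value.
For relatively simple formulas for the corresponding resolvent Green's functions $R_{\gamma,\infty}^{D_{11},D_{22},D_{33}}$, analogous to the formulas in Theorem 2, see [6, part I, Thm. 5.2, 5.3]. For D33 < D22, the resolvent (or Tikhonov regularization) kernel $R_\gamma^{D}$ on SE(2), with $D = \mathrm{diag}\{D_{11},D_{22},D_{33}\}$, is given by
$$[\mathcal{F}R_\gamma^{D}(\cdot,\theta)](\omega) = \frac{\gamma}{4\pi D_{11}\,\mathrm{ce}_\nu(0,q)\,\mathrm{se}_\nu'(0,q)}\Big\{\big(-\cot(\nu\pi)\,(\mathrm{ce}_\nu(\varphi,q)\,\mathrm{se}_\nu(\varphi-\theta,q) + \mathrm{se}_\nu(\varphi,q)\,\mathrm{se}_\nu(\varphi-\theta,q)) + \mathrm{ce}_\nu(\varphi,q)\,\mathrm{se}_\nu(\varphi-\theta,q) - \mathrm{se}_\nu(\varphi,q)\,\mathrm{ce}_\nu(\varphi-\theta,q)\big)\,u(\theta)$$
$$+\big(-\cot(\nu\pi)\,(\mathrm{ce}_\nu(\varphi,q)\,\mathrm{ce}_\nu(\varphi-\theta,q) - \mathrm{se}_\nu(\varphi,q)\,\mathrm{se}_\nu(\varphi-\theta,q)) + \mathrm{ce}_\nu(\varphi,q)\,\mathrm{se}_\nu(\varphi-\theta,q) + \mathrm{se}_\nu(\varphi,q)\,\mathrm{ce}_\nu(\varphi-\theta,q)\big)\,u(-\theta)\Big\}, \qquad (29)$$
with $q = \frac{(D_{22}-D_{33})\rho^2}{4D_{11}}$, ω = (ρ cos ϕ, ρ sin ϕ), Floquet exponent ν = ν(a, q), and $a = -\frac{\gamma + \frac12(D_{22}+D_{33})\rho^2}{D_{11}}$.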
For the corresponding Green's functions on the group of positions and velocities, recall (27), see [5]. Finally, in [6, ch. 5.4] we derive the useful approximation
$$K_s^{D_{11},D_{22}}(x,y,e^{i\theta}) \approx \frac{1}{4\pi s^2D_{11}D_{22}}\,e^{-\frac{1}{4s}\sqrt{\left(\frac{\theta^2}{D_{11}} + \frac{\theta^2(y-\eta)^2}{4(1-\cos\theta)^2D_{22}}\right)^2 + \frac{1}{D_{11}D_{22}}\,\frac{\theta^2(\xi-x)^2}{4(1-\cos\theta)^2}}}, \qquad (30)$$
where we recall (17).
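The closed-form approximations (28) and (30) are cheap to evaluate on a grid. A minimal NumPy sketch follows; the function names are illustrative.

For (30), the sketch uses two identities that follow from (17):

- θ(y − η)/(2(1 − cos θ)) = (θ/2)(x cot(θ/2) + y);
- θ(ξ − x)/(2(1 − cos θ)) = (θ/2)(y cot(θ/2) − x).

These remove the removable singularity at θ = 0; θ is assumed to lie in (−π, π]. As a check on (28), a Riemann sum of $\tilde R_\gamma^{D_{11}}$ over (y, θ) at fixed x > 0 should return γe^{−γx}, the density of the exponential travel time at s = x.

```python
import numpy as np


def r_tilde(x, y, theta, gamma, D11, kappa0=0.0):
    """Parametrix (28) of the line-completion resolvent kernel (zero for x <= 0)."""
    x, y, theta = np.broadcast_arrays(*(np.asarray(v, dtype=float) for v in (x, y, theta)))
    out = np.zeros(x.shape)
    pos = x > 0
    xp, yp, tp = x[pos], y[pos], theta[pos]
    q = 3 * (xp * tp - 2 * yp) ** 2 + xp**2 * (tp - kappa0 * xp) ** 2
    out[pos] = (gamma * np.sqrt(3) * np.exp(-gamma * xp) / (2 * D11 * np.pi * xp**2)
                * np.exp(-q / (4 * xp**3 * D11)))
    return out


def k_enhance(x, y, theta, s, D11, D22):
    """Approximation (30) of the line-enhancement heat kernel, theta in (-pi, pi]."""
    x, y, theta = (np.asarray(v, dtype=float) for v in (x, y, theta))
    # w = (theta/2) cot(theta/2), replaced by its Taylor expansion near theta = 0
    small = np.abs(theta) < 1e-4
    half = np.where(small, 1.0, theta / 2)
    w = np.where(small, 1 - theta**2 / 12, half / np.tan(half))
    c1 = w * x + theta * y / 2   # = theta (y - eta) / (2 (1 - cos theta))
    c2 = w * y - theta * x / 2   # = theta (xi - x) / (2 (1 - cos theta))
    r = np.sqrt((theta**2 / D11 + c1**2 / D22) ** 2 + c2**2 / (D11 * D22))
    return np.exp(-r / (4 * s)) / (4 * np.pi * s**2 * D11 * D22)
```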
[Figure 3: top row, panel a (input) and panels b–f (outputs); middle row, the wavelet, the xy-marginal Green's functions and the corresponding orientation scores; bottom row, orientation-score slices.]
Fig. 3. Top row: a: noisy input image f; b: $|f|^p\,\mathrm{sign}f$; c: $(W_\psi^*)^{ext}(\chi_p(W_\psi f))$ with p = 1.5; d, e: line enhancement using the time-dependent diffusion kernel depicted below; f: line completion using the resolvent completion kernel depicted below. All involved orientation scores are sampled on a 100×100×64 grid. Circles mark regions where a clear difference arises between line completion and enhancement. Middle row, from left to right:
- the proper wavelet ψ of the example in Section 2 (parameters k = 2, n_θ = 64);
- the Green's function of the line enhancement process with D11 = 0.00015, D22 = 1, s = 15, using the asymptotic formula (30);
- the Green's function of the line enhancement process with D11 = 0.00015, D33 = 1, γ = 1/64;
- the Green's function of the line completion process with D11 = 0.0024, γ = 1/64.

Bottom row: slices $W_\psi f(\cdot,\cdot,e^{i\theta_k})$ for $\theta_k = (2k+1)\frac{\pi}{32}$, k = 0, ..., 5.
Now that we have derived the Green's functions, we return to our scheme in Fig. 1. First, we construct an invertible orientation score (1). Then we convolve the orientation score with two Green's functions to compute the product in (10). Finally, we apply the inverse transform (6). For experiments, see Fig. 3.
4
Conclusion
Since the transformations between image and orientation score are stable, we can apply image processing via orientation scores. To ensure Euclidean invariance of the operator on an image the corresponding operator on its orientation score must be left-invariant. Therefore we consider left-invariant scale spaces on these orientation scores based on stochastic processes on the group SE(2), the solutions of which are given by SE(2)-convolution with the corresponding Green’s functions, which we derived explicitly.
References
1. August, J.: The Curve Indicator Random Field. PhD thesis, Yale University (2001)
2. Candes, F.: New ties between computational harmonic analysis and approximation theory. Approximation Theory X, Innov. Appl. Math. (6), 87–153 (2000)
3. Citti, G., Sarti, A.: A cortical based model of perceptual completion in the rototranslation space. JMIV 24(3), 307–326 (2006)
4. Duits, R.: Perceptual Organization in Image Analysis. PhD thesis, Eindhoven University of Technology, Dep. of Biomedical Engineering, The Netherlands (2005)
5. Duits, R., Burgeth, B.: Scale spaces on Lie groups. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 300–312. Springer, Heidelberg (2007)
6. Duits, R., Franken, E.M.: Left-invariant parabolic evolutions on SE(2) and contour enhancement via invertible orientation scores, part I: Linear left-invariant diffusion equations on SE(2). Quarterly of Appl. Math. (to appear, 2009)
7. Duits, R., van Almsick, M.A.: The explicit solutions of linear left-invariant second order stochastic evolution equations on the 2D-Euclidean motion group. Quarterly of Applied Mathematics 66, 27–67 (2008)
8. Franken, E., Duits, R., ter Haar Romeny, B.M.: Nonlinear diffusion on the 2D Euclidean motion group. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 461–472. Springer, Heidelberg (2007)
9. Franken, E.M.: Enhancement of Crossing Elongated Structures in Images. PhD thesis, Dep. of Biomedical Engineering, Eindhoven University of Technology, The Netherlands, Eindhoven (October 2008)
10. Grossmann, A., Morlet, J., Paul, T.: Integral transforms associated to square integrable representations. J. Math. Phys. 26, 2473–2479 (1985)
11. Kalitzin, S.N., ter Haar Romeny, B.M., Viergever, M.A.: Invertible apertured orientation filters in image analysis. IJCV 31(2/3), 145–158 (1999)
12. Louis, A.K., Maass, P., Rieder, P.: Wavelets, Theory and Applications. Wiley, New York (1997)
13. Medioni, G., Lee, M.S., Tang, C.K.: A Computational Framework for Segmentation and Grouping. Elsevier, Amsterdam
14. Meixner, J., Schaefke, F.W.: Mathieusche Funktionen und Sphaeroidfunktionen. Springer, Heidelberg (1954)
15. Mumford, D.: Elastica and computer vision. Algebraic Geometry and Its Applications, pp. 491–506. Springer, Heidelberg (1994)
16. Sporring, J., Nielsen, M., Florack, L.M.J., Johansen, P.: Gaussian Scale-Space Theory. KAP, Dordrecht (1997)
17. Thornber, K.K., Williams, L.R.: Analytic solution of stochastic completion fields. Biological Cybernetics 75, 141–151 (1996)
18. van Almsick, M.A.: Context Models of Lines and Contours. PhD thesis, Eindhoven University of Technology, Department of Biomedical Engineering, Eindhoven, The Netherlands (2007) ISBN:978-90-386-1117-4
19. Zweck, J., Williams, L.R.: Euclidean group invariant computation of stochastic completion fields using shiftable-twistable functions. JMIV 21(2), 135–154 (2004)
Spatio-Featural Scale-Space
Michael Felsberg
Computer Vision Laboratory, Linköping University, S-58183 Linköping, Sweden
Abstract. Linear scale-space theory is the fundamental building block for many approaches to image processing, such as pyramids or scale selection. However, linear smoothing does not preserve image structures very well, and thus non-linear techniques are mostly applied for image enhancement. A different perspective is given in the framework of channel smoothing, where the feature domain is not considered as a linear space but is decomposed into local basis functions. One major drawback is the larger memory requirement for this type of representation, which is avoided if the channel representation is subsampled in the spatial domain. This general type of feature representation is called a channel-coded feature map (CCFM) in the literature, and a special case using linear channels is the SIFT descriptor. For computing CCFMs, the spatial resolution and the feature resolution need to be selected. In this paper, we focus on the spatio-featural scale-space from a scale-selection perspective. We propose a coupled scheme for selecting the spatial and the featural scales. The scheme is based on an analysis of lower bounds for the product of uncertainties, which is summarized in a theorem about a spatio-featural uncertainty relation. As a practical application of the derived theory, we reconstruct images from CCFMs with resolutions according to our theory. The results are very similar to the results of non-linear evolution schemes, but our algorithm has the fundamental advantage of being non-iterative. Any level of smoothing can be achieved with about the same computational effort.
1
Introduction
The concept of scale is a central ingredient of many image analysis and computer vision algorithms. Scale was first introduced systematically in terms of the concept of linear scale-space [1, 2, 3], establishing a 3D space of spatial coordinates and a scale coordinate. Linear scale-space is often identified with Gaussian low-pass filtering, but a rigorous analysis of the underlying scale-space axioms [4] has led to the discovery of the Poisson scale-space [5] and of more general α scale-spaces [6]. In practice, discrete scale-spaces are mostly subsampled with increasing scale parameter, leading to the concept of scale pyramids [7, 8], multi-scale analysis
The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 215078 (DIPLECS).
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 808–819, 2009. © Springer-Verlag Berlin Heidelberg 2009
and wavelet theory [9, 10]. While pyramids and wavelets speed up the computation of linear operators and transforms, non-linear scale-space methods are widely used, e.g. for image enhancement. Non-linear scale-space is based on a non-stationary or anisotropic diffusivity function [11, 12]. More recently, non-linear methods have been introduced which are less directly connected to linear scale-space and diffusion, but allow for faster processing and in part superior results [13, 14]. The former method is based on wavelets, whereas the latter is based on the channel representation [15] and is called channel smoothing. Combining the channel representation with a systematic decimation of spatial resolution, similar to the pyramid approach, has been applied in blob detection [16] and in channel-coded feature maps (CCFM) [17, 18], a density representation in the spatio-featural domain, see also [19]. In this paper, we propose a new spatio-featural scale-space approach, including an image reconstruction algorithm that generates images from CCFMs. The CCFM scale-space is generated by applying the principles of linear scale-space to the spatial resolution of CCFMs while simultaneously increasing the resolution of the feature space. By subsampling this space and subsequent reconstruction, image evolutions are generated which are very similar to those generated by iterative methods. We show some examples and propose a scale-selection scheme based on a new uncertainty relation: the spatio-featural uncertainty relation. In Section 2, we introduce the less well-known techniques we rely on: the channel representation, channel smoothing, and CCFMs. In Section 3 we propose the novel reconstruction algorithm, define the linear scale-space of CCFMs, and formulate a scale-selection scheme based on a spatio-featural uncertainty relation. In Section 4 we present experimental results, and in Section 5 we give some concluding remarks.
2
Required Methods
2.1
The Channel Representation
Channel coding, also called population coding [20, 21], is a biologically inspired data representation in which features are represented by weights assigned to ranges of feature values [22, 15], see Fig. 1. Similar feature representations can also be found in the visual cortex of the human brain, e.g. in the cortical columns. The closer the current feature value f is to the respective feature interval center n, the higher the channel weight $c_n$:
$$c_n(f) = k(f-n), \qquad (1)$$
where k(·) is a suitable kernel function and f has been scaled such that it has a suitable range (note that we chose to place the channel centers at integers). By introducing z as a continuous feature coordinate, $k_n(z) = k(z-n)$, and $\delta_f(z) = \delta(z-f)$ denoting the Dirac delta at f, the encoding can be written as a scalar product
$$c_n(f) = \langle\delta_f\,|\,k_n\rangle = \int\delta_f(z)\,k_n(z)\,dz \qquad (2)$$
[Figure 1: channel encoding of a distribution along the orientation axis.]
Fig. 1. Orientation distribution is encoded into channels, resulting in a (low-pass filtered) reconstruction. Figure courtesy Erik Jonsson.
or as a sampled correlation in the feature domain:
$$c_n = (\delta_f\star k)(n) = \left.\int\delta_f(z')\,k(z'-z)\,dz'\right|_{z=n}. \qquad (3)$$
From the weights of all channels, the feature value can be decoded unambiguously by finding the mode, where the decoding depends on the kernel function. In some theoretical considerations we will use Gaussian functions as kernels, but in the practical implementation we have been using quadratic B-splines:
$$B_2(z) = \begin{cases}(z+3/2)^2/2 & -3/2 < z \le -1/2\\ 3/4 - z^2 & -1/2 < z \le 1/2\\ (z-3/2)^2/2 & 1/2 < z < 3/2\\ 0 & \text{otherwise}\end{cases} \qquad (4)$$
The features can be scalar-valued or vector-valued, e.g. grey-scales, color vectors, or orientations. For scalar features, the decoding from quadratic B-splines has been considered in detail in [14], and we do not repeat it here. For the case of non-interfering channel weights, a simplified scheme based on the quotient of linear combinations can be used:
$$M_n = c_{n-1} + c_n + c_{n+1},\qquad n_0 = \arg\max_n M_n,\qquad \hat f = \frac{c_{n_0+1} - c_{n_0-1}}{M_{n_0}} + n_0, \qquad (5)$$
where $\hat f$ is our estimate of the feature f that had been encoded in $c_n$. Channel representations obviously need more memory than directly storing features, but this investment pays off in several ways, which we will show in the subsequent sections.
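Encoding (1) with the B-spline (4) and decoding with (5) take only a few lines of NumPy; the helper names are illustrative. Take a single value f = n0 + δ with δ in (−1/2, 1/2] in the interior of the channel range. Its three active weights satisfy $c_{n_0+1} - c_{n_0-1} = \delta$ and $M_{n_0} = 1$, so the decoding (5) is exact, and the test checks this.

```python
import numpy as np


def b2(z):
    """Quadratic B-spline kernel (4)."""
    z = np.asarray(z, dtype=float)
    out = np.zeros_like(z)
    m1 = (z > -1.5) & (z <= -0.5)
    m2 = (z > -0.5) & (z <= 0.5)
    m3 = (z > 0.5) & (z < 1.5)
    out[m1] = (z[m1] + 1.5) ** 2 / 2
    out[m2] = 0.75 - z[m2] ** 2
    out[m3] = (z[m3] - 1.5) ** 2 / 2
    return out


def encode(f, n_channels):
    """Channel vector c_n(f) = B2(f - n), n = 0, ..., n_channels - 1, cf. (1)."""
    return b2(f - np.arange(n_channels))


def decode(c):
    """Decoding (5): M_n = c_{n-1} + c_n + c_{n+1}, quotient around the maximum of M."""
    cp = np.pad(c, 1)                  # channels outside the range are zero
    M = cp[:-2] + cp[1:-1] + cp[2:]    # M[n] = c_{n-1} + c_n + c_{n+1}
    n0 = int(np.argmax(M))
    return (cp[n0 + 2] - cp[n0]) / M[n0] + n0   # cp[n0 + 2] = c_{n0+1}, cp[n0] = c_{n0-1}


c = encode(3.3, 10)
print(decode(c))  # recovers 3.3 up to rounding
```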
2.2
Channel Smoothing and Channel-Coded Feature Maps
The idea of channel smoothing is based on considering the feature f in the encoding (1) as a stochastic variable. It has been shown in [14] that the channel weights $c_n$ approximate the distribution $p_f$ in the expectation sense:
$$E\{c_n(f)\} = (p_f\star k)(n), \qquad (6)$$
such that $\hat f$ becomes a maximum-likelihood estimate of f. If we assume that $p_f$ is locally ergodic, we can estimate $\hat f$ from a local image region, which corresponds to a local averaging of the channel weights within a spatial neighborhood. The algorithm consisting of the three steps channel encoding, channel averaging, and channel decoding is called channel smoothing. It has been shown to be superior to many other robust smoothing methods [14]. Due to the assumption of (piecewise) constant distributions, the positioning of region boundaries might violate the sampling theorem, resulting in unstable edge pixels. To avoid this effect, a modification of the channel decoding has been proposed in [23], called α-synthesis, which creates smooth transitions between neighborhoods with different feature levels. Instead of extracting only the global maximum in (5), all local maxima of $M_n$ are extracted, located at channels $n_r$. The decoding is then obtained according to
$$\hat f_r = \frac{c_{n_r+1} - c_{n_r-1}}{M_{n_r}} + n_r,\qquad \hat f = \frac{\sum_r\hat f_r\,M_{n_r}^\alpha}{\sum_l M_{n_l}^\alpha}. \qquad (7)$$
For the choice of α see [23]; we used α = 2 throughout this paper. One major drawback of channel smoothing is its extensive use of memory if many feature channels are required. A high density of channels is only reasonable if the spatial support is large, which implies that the individual feature channels are heavily low-pass filtered along the spatial dimension. Therefore, the feature channels have a lower band limit and can be subsampled in the spatial domain without losing information. If the three steps of channel encoding, channel averaging, and subsampling are integrated into a single step, channel-coded feature maps (CCFMs) are generated. The advantage of CCFMs is a much higher number of channels (e.g. by combining several features as in Fig. 2) without a significant increase in memory requirements. The CCFM encoding of a single feature point can be written as (cf. (1))
$$c_{l,m,n}(f(x,y),x,y) = k_f(f(x,y)-n)\,k_x(x-l)\,k_y(y-m), \qquad (8)$$
where $k_f$, $k_x$, $k_y$ are the 1D kernels in the feature domain and the spatial domain. Note that x and y are scaled such that they suit the integer spatial channel centers l, m. Note further that the previous definition of CCFMs assumes separable kernels, but we could easily use non-separable kernels, e.g. in the case of orientation data. Similar to (2) and (3), the encoding (8) of a set of feature points can be written as a scalar product in 3D function space or as a 3D correlation, where we use
$$\delta_f(x,y,z) = \delta(z - f(x,y)) \qquad (9)$$
Fig. 2. Simultaneous encoding of orientation and color in a local image region. Figure taken from [17] courtesy Erik Jonsson.
and $k_{f,n}(z) = k_f(z-n)$, $k_{x,l}(x) = k_x(x-l)$, $k_{y,m}(y) = k_y(y-m)$:
$$c_{l,m,n}(f) = \langle\delta_f\,|\,k_{f,n}k_{x,l}k_{y,m}\rangle = \iiint\delta_f(x,y,z)\,k_{f,n}(z)\,k_{x,l}(x)\,k_{y,m}(y)\,dz\,dy\,dx = (\delta_f\star(k_fk_xk_y))(n,m,l). \qquad (10)$$
The final formulation is the starting point of the CCFM scale-space.
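The sketch below covers two pipelines. The first is channel smoothing of a 1D signal: encoding, spatial averaging, and α-synthesis decoding (7). The second is the CCFM encoding (8)/(10) of a grey-scale image.

It rests on these assumptions:

- B-spline kernels (4) in all dimensions;
- a box filter for the channel averaging;
- zero padding at the borders;
- spatial channels spaced `step` pixels apart.

The function names are illustrative. Note that the CCFM is a single `einsum`, i.e. encoding, averaging and subsampling happen in one step.

```python
import numpy as np


def b2(z):
    """Quadratic B-spline kernel (4)."""
    z = np.asarray(z, dtype=float)
    out = np.zeros_like(z)
    m1 = (z > -1.5) & (z <= -0.5)
    m2 = (z > -0.5) & (z <= 0.5)
    m3 = (z > 0.5) & (z < 1.5)
    out[m1] = (z[m1] + 1.5) ** 2 / 2
    out[m2] = 0.75 - z[m2] ** 2
    out[m3] = (z[m3] - 1.5) ** 2 / 2
    return out


def decode_alpha(c, alpha=2.0):
    """alpha-synthesis (7): combine all local maxima of M_n with weights M_n^alpha."""
    cp = np.pad(c, 1)
    M = cp[:-2] + cp[1:-1] + cp[2:]
    Mp = np.pad(M, 1)
    r = np.nonzero((M > 0) & (M >= Mp[:-2]) & (M > Mp[2:]))[0]  # local maxima n_r
    f_r = (cp[r + 2] - cp[r]) / M[r] + r
    w = M[r] ** alpha
    return np.sum(w * f_r) / np.sum(w)


def channel_smooth(signal, n_channels, width=9, alpha=2.0):
    """Channel smoothing of a 1D signal: encode, box-average every channel, decode."""
    C = b2(signal[:, None] - np.arange(n_channels)[None, :])  # (samples, channels)
    box = np.ones(width) / width
    C = np.stack([np.convolve(C[:, n], box, mode="same") for n in range(n_channels)], axis=1)
    return np.array([decode_alpha(c, alpha) for c in C])


def ccfm_encode(img, n_feat, step):
    """CCFM (8)/(10) of a grey-scale image with B-spline kernels in all three dimensions."""
    H, W = img.shape
    n = np.arange(n_feat)
    l = np.arange(-1, int(np.ceil((W - 1) / step)) + 2)  # spatial channel centres along x
    m = np.arange(-1, int(np.ceil((H - 1) / step)) + 2)  # spatial channel centres along y
    Kf = b2(img[:, :, None] - n)                          # k_f(f(x, y) - n)
    Kx = b2(np.arange(W)[:, None] / step - l[None, :])    # k_x(x - l), x in channel units
    Ky = b2(np.arange(H)[:, None] / step - m[None, :])    # k_y(y - m)
    return np.einsum("hwn,wl,hm->lmn", Kf, Kx, Ky)        # summed over all pixels
```

Because the B-splines form a partition of unity, the CCFM entries sum to the number of pixels whenever the channels cover the whole image and feature range. On a noise-free step edge, α-synthesis yields a monotone transition between the two plateau values.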
3
The CCFM Scale-Space
In this section, we introduce the concept of CCFM scale-space. Our considerations are based on CCFMs computed from grey-scale images, i.e., we consider $f:\mathbb{R}^2\to\mathbb{R}^+$ instead of a more general feature function.
3.1
Linear Scale-Space Theory in the Spatio-Featural Domain
The starting point is to embed the image f(x, y) as a 3D surface according to (9). One might try to generate a 3D α scale-space [6]. The Gaussian is the special case α = 1, and all α-kernels are symmetric, i.e., correlation and convolution coincide:
$$F_s(x,y,z) = (k_s^{(\alpha)}\star\delta_f)(x,y,z). \qquad (11)$$
However, the semi-group property of scale-space implies that all dimensions (the spatial dimensions and the feature dimension) become increasingly blurred. This implies a rapidly growing loss of information with increasing scale and a singular zero scale. Beyond that, the procedure is not sensible from a statistical perspective and does not comply with the notion of scale selection [24, 25]. Since the latter argument is not straightforward, we explain our rationale in more detail. Scale selection requires that the dimensionless derivative attains its maximum at a position proportional to the wavelength of the signal [24, Sect. 13.1]. From this requirement we conclude that the scale of a structure is proportional to its spatial scale (a trivial fact) and anti-proportional to its feature scale. The latter can be shown by looking at the Taylor expansion of a harmonic oscillation
$A\sin(\omega x)$ at the origin: $A\sin(\omega x)\approx A\omega x$. The steepness Aω of the sinusoid at the origin grows linearly with the amplitude and the frequency, i.e., it is anti-proportional to the wavelength $\lambda = \frac{2\pi}{\omega}$. Alternatively, one can consider the energy of a harmonic oscillation. The energy is proportional to the square of the amplitude times the square of the frequency: $E\propto A^2\omega^2\propto\frac{A^2}{\lambda^2}$. This means that if we apply a 3D low-pass filter to the spatio-featural domain, the energy decays with the fourth power of the scale. Hence, scale selection would favor the highest possible frequencies in nearly all cases. If we scale the amplitude anti-proportionally to the spatial domain, the change of energy is balanced and reflects intrinsic properties of the signal.
3.2
The Spatio-Featural Uncertainty: The Linear Case
In what follows, we analyze a linear 1D signal resulting from, e.g., the cross section of a locally planar image. Images are observations of a stochastic process, i.e., each measurement of the signal at each position follows a certain distribution in the feature domain. Furthermore, our measurements are subject to stochastic position errors and deterministic distortions (e.g. point-spread function), resulting in a distribution in the spatial domain. If we assume stationarity and independence of the two distributions, we can model the densities by a separable function that is shift-invariant in (f, x)-space, see Fig. 3, top left.
[Fig. 3 graphic: spatio-featural coordinates (x, f), rotated coordinates (s, t), kernel widths k_x and k_f; panel labels "linear gradient" and "spatio-featural Gaussian kernel".]
Fig. 3. Illustration for the derivation of uncertainties of a linear gradient. Top left: spatio-featural distribution moving along a linear signal. Top right: 2D distribution obtained by marginalizing along s. Bottom: projections of the 2D distribution onto the original spatio-featural coordinates.
M. Felsberg
The overall distribution is obtained as the marginal at angle φ. That is, it is a function of $t = \cos\phi\, f - \sin\phi\, x$, obtained by integrating along $s = \cos\phi\, x + \sin\phi\, f$ (see Fig. 3, top right). For Gaussian distributions, this integral can be computed analytically:
$$p(t;\phi) \propto \int \exp\left(-\frac{x^2}{2\sigma_x^2} - \frac{f^2}{2\sigma_f^2}\right) ds \quad (12)$$
$$= \int \exp\left(-\frac{(s\cos\phi - t\sin\phi)^2}{2\sigma_x^2} - \frac{(s\sin\phi + t\cos\phi)^2}{2\sigma_f^2}\right) ds \quad (13)$$
$$\propto \exp\left(-\frac{t^2}{2(\sin^2\phi\,\sigma_x^2 + \cos^2\phi\,\sigma_f^2)}\right), \quad (14)$$
where $\sigma_f^2$ is the variance of the distribution in the feature domain and $\sigma_x^2$ is the variance of the spatial distribution. To compute suitable kernels for a scale-space representation, the 2D distribution $p(t;\phi)$ is projected onto the spatial domain and onto the feature domain (see Fig. 3, bottom). For Gaussian distributions, the projections can again be computed analytically:
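$$p_x(x;\phi) = p(t;\phi)\big|_{f=0} \propto \exp\left(-\frac{(f\cos\phi - x\sin\phi)^2}{2(\sin^2\phi\,\sigma_x^2 + \cos^2\phi\,\sigma_f^2)}\right)\bigg|_{f=0} = \exp\left(-\frac{x^2}{2(\sigma_x^2 + \cot^2\phi\,\sigma_f^2)}\right) \quad (15)$$
and
$$p_f(f;\phi) \propto \exp\left(-\frac{f^2}{2(\tan^2\phi\,\sigma_x^2 + \sigma_f^2)}\right), \quad (16)$$
resulting in the variances
$$\sigma_{k_f}^2(\phi) = \sigma_f^2 + \tan^2\phi\,\sigma_x^2, \quad (17)$$
$$\sigma_{k_x}^2(\phi) = \sigma_x^2 + \cot^2\phi\,\sigma_f^2. \quad (18)$$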
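The following Python sketch is a quick Monte Carlo check of (14), (17) and (18). It draws independent Gaussian samples for the spatial and feature errors (the separable model of Fig. 3) and forms $t = \cos\phi\, f - \sin\phi\, x$. It then compares the sample variance of t with the closed form in (14). The projected variances follow by setting x = 0 or f = 0 in t. All parameter values are illustrative assumptions, not taken from the paper.

```python
import math
import random

def marginal_variance(phi, sigma_f, sigma_x):
    """Variance of t = cos(phi) f - sin(phi) x, as in the exponent of (14)."""
    return math.sin(phi) ** 2 * sigma_x ** 2 + math.cos(phi) ** 2 * sigma_f ** 2

def projected_variances(phi, sigma_f, sigma_x):
    """Variances of p_f and p_x, i.e. (17) and (18).

    With x = 0 we have t = cos(phi) f, so var_f = v / cos^2(phi);
    with f = 0 we have t = -sin(phi) x, so var_x = v / sin^2(phi).
    """
    v = marginal_variance(phi, sigma_f, sigma_x)
    return v / math.cos(phi) ** 2, v / math.sin(phi) ** 2

sigma_f, sigma_x, phi = 0.5, 1.5, 0.4      # illustrative values only
rng = random.Random(0)
n = 200_000
acc = 0.0
for _ in range(n):
    x = rng.gauss(0.0, sigma_x)            # spatial error
    f = rng.gauss(0.0, sigma_f)            # feature noise, independent of x
    t = math.cos(phi) * f - math.sin(phi) * x
    acc += t * t
mc_var = acc / n                           # zero-mean sample variance of t

var_f, var_x = projected_variances(phi, sigma_f, sigma_x)
print("Monte Carlo var(t):", mc_var)
print("closed form (14)  :", marginal_variance(phi, sigma_f, sigma_x))
print("(17):", var_f, "vs", sigma_f ** 2 + math.tan(phi) ** 2 * sigma_x ** 2)
print("(18):", var_x, "vs", sigma_x ** 2 + sigma_f ** 2 / math.tan(phi) ** 2)
```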
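Hence, we obtain the spatial uncertainty $(\Delta x)^2 = \frac{1}{2}\sigma_{k_x}^2(\phi)$ and the feature uncertainty $(\Delta f)^2 = \frac{1}{2}\sigma_{k_f}^2(\phi)$. We then minimize the product of uncertainties with respect to φ:
$$\phi_0 = \arg\min_\phi\,(\Delta x)^2(\Delta f)^2 = \arg\min_\phi\,\frac{1}{4}\sigma_{k_x}^2(\phi)\,\sigma_{k_f}^2(\phi). \quad (19)$$
Expanding the product gives $\sigma_{k_x}^2\sigma_{k_f}^2 = 2\sigma_f^2\sigma_x^2 + \sigma_x^4\tan^2\phi + \sigma_f^4\cot^2\phi$. By the inequality of arithmetic and geometric means, this is at least $4\sigma_f^2\sigma_x^2$. The global minimum is therefore at $\phi_0 = \tan^{-1}(\sigma_f/\sigma_x)$, giving $\sigma_{k_x}^2(\phi_0)\,\sigma_{k_f}^2(\phi_0) = 4\sigma_f^2\sigma_x^2$.
3.3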
The Spatio-Featural Uncertainty Relation
In order to generalize the result from the previous section, we have to define the group structure of spatio-featural transformations. Readers not familiar with
group theory may regard the previous example as a proof of concept and continue with the subsequent section. We choose a methodology based on the isotropic model used in [26], although restricted to the 1D case; the higher-dimensional case generalizes straightforwardly. The group that we consider contains the shearing group and the translation group, given as
$$x' = x + t_x, \quad (20)$$
$$f' = f + \tan(\phi)\,x + t_f. \quad (21)$$
The shearing transformation corresponds to a rotation in Euclidean space. It arises because the f-coordinate is a null vector [26], i.e., $f \cdot f = 0$. The parameterization is chosen to be consistent with the angle φ in the previous section. It reflects the fact that points move along the surface/curve with angle φ, i.e., that we cannot determine whether measurement noise comes from spatial uncertainty or from feature noise. Using this definition, we state the following theorem.

Theorem 1. Let the spatio-featural domain be described by the isotropic model. The uncertainty product in the spatio-featural domain has a lower bound
$$\exists k > 0: \quad (\Delta x)(\Delta f) \ge k, \quad (22)$$
and the lower bound is given as
$$k = \frac{1}{2}\sigma_f\sigma_x, \quad (23)$$
where $\sigma_f^2$ is the variance of the feature-domain marginal distribution and $\sigma_x^2$ is the variance of the spatial-domain distribution.
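The proof of this theorem is as follows. The generators of (20) and (21) are
$$s_x = \partial_x, \qquad o_f = x\,\partial_f, \qquad s_f = \partial_f, \quad (24)$$
and the commutator of $s_x$ and $o_f$ is given by
$$[s_x, o_f] = s_x o_f - o_f s_x = \partial_x x\,\partial_f - x\,\partial_f\partial_x = \partial_f = s_f, \quad (25)$$
where the product rule $\partial_x(x\,\partial_f g) = \partial_f g + x\,\partial_x\partial_f g$ has been used.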
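The commutator (25) can also be checked numerically without symbolic algebra. The sketch below is an illustration, not part of the proof. It applies both sides of (25) to an arbitrary smooth test function g(x, f) (a hypothetical choice) using nested central differences. The two sides should agree up to discretization error.

```python
import math

H = 1e-4  # finite-difference step

def d_x(g):
    """Central difference in x, returned as a new function of (x, f)."""
    return lambda x, f: (g(x + H, f) - g(x - H, f)) / (2 * H)

def d_f(g):
    """Central difference in f, returned as a new function of (x, f)."""
    return lambda x, f: (g(x, f + H) - g(x, f - H)) / (2 * H)

def s_x(g):
    """Translation generator s_x = d/dx, cf. (24)."""
    return d_x(g)

def o_f(g):
    """Shear generator o_f = x d/df, cf. (24)."""
    dg = d_f(g)
    return lambda x, f: x * dg(x, f)

def s_f(g):
    """Feature translation generator s_f = d/df, cf. (24)."""
    return d_f(g)

def commutator(g):
    """[s_x, o_f] g = s_x(o_f g) - o_f(s_x g)."""
    a, b = s_x(o_f(g)), o_f(s_x(g))
    return lambda x, f: a(x, f) - b(x, f)

def g(x, f):
    """Arbitrary smooth test function (hypothetical choice)."""
    return math.sin(x + 2.0 * f) + x ** 2 * f

for x, f in [(0.3, -0.7), (1.2, 0.5), (-2.0, 0.1)]:
    print(f"x={x:+.1f}, f={f:+.1f}: [s_x, o_f]g = {commutator(g)(x, f):.6f}, "
          f"s_f g = {s_f(g)(x, f):.6f}")
```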
Hence, using the Robertson–Schrödinger relation [27] (note that the considered shearing transformations form a spinor group in the considered space), we obtain
$$E\{s_x^2\}\,E\{o_f^2\} \ge \frac{1}{4}E\{[s_x, o_f]\}^2 = \frac{1}{4}. \quad (26)$$
Taking the square root and scaling x and f by $\sigma_x$ and $\sigma_f$, respectively, we obtain Theorem 1. Note that (23) and the example of the previous section differ by a factor of 2. This might mean either that other types of noise distributions would lead to smaller uncertainties, or that the lower bound cannot be reached. Regardless of the actual uncertainty product, Theorem 1 implies that we should not use iterative filtering with a 3D low-pass kernel as in (11). Instead, the scales in the different domains must behave reciprocally. The optimal choice of scales is the topic of the subsequent section.
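The factor of 2 mentioned above can be reproduced with a brute-force search. The sketch below assumes the Gaussian model of Section 3.2 and uses arbitrary values of σf and σx. It evaluates (Δx)(Δf) from (17) and (18) on a dense grid of angles, locates the minimum, and compares it with the bound k of (23).

```python
import math

def kernel_variances(phi, sigma_f, sigma_x):
    """sigma_kf^2(phi) and sigma_kx^2(phi) from (17) and (18)."""
    t2 = math.tan(phi) ** 2
    return sigma_f ** 2 + t2 * sigma_x ** 2, sigma_x ** 2 + sigma_f ** 2 / t2

def uncertainty_product(phi, sigma_f, sigma_x):
    """(Delta x)(Delta f) with (Delta x)^2 = sigma_kx^2 / 2 and (Delta f)^2 = sigma_kf^2 / 2."""
    var_f, var_x = kernel_variances(phi, sigma_f, sigma_x)
    return math.sqrt(0.25 * var_x * var_f)

sigma_f, sigma_x = 0.3, 2.0                  # illustrative values only
n = 200_000                                  # midpoint grid on the open interval (0, pi/2)
phis = [(i + 0.5) * (math.pi / 2) / n for i in range(n)]
phi_best = min(phis, key=lambda p: uncertainty_product(p, sigma_f, sigma_x))
prod_best = uncertainty_product(phi_best, sigma_f, sigma_x)
k = 0.5 * sigma_f * sigma_x                  # lower bound (23)

print("grid minimiser phi_0  :", phi_best)
print("atan(sigma_f/sigma_x) :", math.atan(sigma_f / sigma_x))
print("min (dx)(df)          :", prod_best)
print("ratio to bound k      :", prod_best / k)
```

On this model the minimum equals σf σx, i.e., twice the bound of Theorem 1. This matches the discussion above.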
3.4
Scale-Selection for CCFMs
The derivation from the previous section can be used to determine the proper change of scale when constructing a CCFM scale-space. The main difficulty, however, is that we normally do not have access to the effective $\sigma_f^2$ and $\sigma_x^2$, so we have to find a way to estimate these unknown parameters. When considering the 3D embedding (9) in the context of channel representations, we make the following observation. Encoding f(x, y) with three channels of arbitrarily large but finite width always results in the same image after decoding (assuming infinite accuracy of real numbers):
$$\big(c_1(f/R) - c_{-1}(f/R)\big)R = f, \qquad 0 \ll R < \infty. \quad (27)$$
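Identity (27) can be verified directly once a channel basis function is fixed. Definitions (1) and (5) are not part of this excerpt, so the sketch below assumes quadratic B-spline channels $c_n(u) = B_2(u - n)$ centred at the integers. This is consistent with the mention of quadratic B-spline channels below. The names `b2`, `channel` and `decode_three` are hypothetical. The sketch checks the decoding formula for several grey values and several choices of R.

```python
def b2(u):
    """Quadratic B-spline (assumed channel kernel), supported on |u| < 1.5."""
    a = abs(u)
    if a <= 0.5:
        return 0.75 - a * a
    if a < 1.5:
        return 0.5 * (1.5 - a) ** 2
    return 0.0

def channel(n, u):
    """Hypothetical channel function c_n(u) = B2(u - n), centred at integer n."""
    return b2(u - n)

def decode_three(f, R):
    """Left-hand side of (27): (c_1(f/R) - c_{-1}(f/R)) * R."""
    u = f / R
    return (channel(1, u) - channel(-1, u)) * R

for R in (1.0, 4.0):
    for f in (-0.5, -0.2, 0.0, 0.37, 0.5):
        print(f"R={R}, f={f:+.2f}: decoded {decode_three(f, R):+.12f}")
```

With this assumed kernel the identity holds exactly as long as |f/R| ≤ 1/2, which is one way to read the requirement 0 ≪ R in (27). For example, `decode_three(1.0, 1.0)` returns 0.75 instead of 1.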
This identity follows directly from (1) and (5) for the case of three channels (implying $n_0 = 0$ and $M_{n_0} = 1$), or directly from (4). This means that we can always select the relative feature resolution R such that the feature resolution $\sigma_{k_f}^{-2}(\phi)$ is minimal. Furthermore, we know the original spatial resolution, which can be considered as a maximal spatial resolution [28]. These extremal resolutions can be considered as initial conditions for the CCFM scale-space. Consider, for example, an image of size X × X with values in [−0.5, 0.5]. We simply select R = 1 and obtain
$$1 = \sigma_{k_f}^2(\phi_{\min}) = \sigma_f^2 + \tan^2\phi_{\min}\,\sigma_x^2. \quad (28)$$
From the image size, we know that
$$\frac{1}{X^2} = \sigma_{k_x}^2(\phi_{\min}) = \sigma_x^2 + \cot^2\phi_{\min}\,\sigma_f^2. \quad (29)$$
This gives two equations in three unknowns, $\sigma_x^2$, $\sigma_f^2$ and $\phi_{\min}$, so we cannot compute the proper scale-selection scheme from the initial image alone. However, we also know the final state of the scale-space. There is an angle $\phi_{\max}$ for which the spatial resolution $\sigma_{k_x}^{-2}(\phi)$ becomes minimal, i.e., equal to one. This corresponds to three channels, the minimum number for quadratic B-spline channels:
$$1 = \sigma_{k_x}^2(\phi_{\max}) = \sigma_x^2 + \cot^2\phi_{\max}\,\sigma_f^2. \quad (30)$$
However, this introduces another unknown, $\phi_{\max}$. It can finally be eliminated by selecting a maximal resolution in feature space. The choice of the latter is, however, heuristic, as there is no natural upper bound to feature resolution. Alternatively, it can be chosen according to the requirements of further processing. In any case, by selecting some constant F (we used F = 20), we obtain
$$\frac{1}{F^2} = \sigma_{k_f}^2(\phi_{\max}) = \sigma_f^2 + \tan^2\phi_{\max}\,\sigma_x^2. \quad (31)$$
The four equations (28)–(31) are solved by
$$\sigma_f^2 = \frac{X^2 - 1}{F^2X^2 - 1}, \qquad \phi_{\min} = \tan^{-1}X, \quad (32)$$
$$\sigma_x^2 = \frac{F^2 - 1}{F^2X^2 - 1}, \qquad \phi_{\max} = \cot^{-1}F. \quad (33)$$
Inserting these parameters into (17) and (18) determines good choices of scales in the spatio-featural domain; these have been used in the subsequent experiments.
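The closed-form solution (32)–(33) is easy to tabulate. The following sketch assumes that φ is interpolated linearly between φmin and φmax, as the caption of Fig. 4 suggests for the frame index. It computes the model parameters, checks the four conditions (28)–(31), and prints the inverse kernel widths $1/\sigma_{k_x}$ and $1/\sigma_{k_f}$ along the path. The image size X = 256 is a hypothetical example; F = 20 is the value quoted in the text. Converting these widths into integer channel counts (the quantization used in the experiments) is not shown.

```python
import math

def ccfm_scale_parameters(X, F):
    """Closed-form solution (32)-(33) of the four conditions (28)-(31)."""
    den = F ** 2 * X ** 2 - 1.0
    sigma_f2 = (X ** 2 - 1.0) / den
    sigma_x2 = (F ** 2 - 1.0) / den
    phi_min = math.atan(X)              # tan^{-1} X
    phi_max = math.atan(1.0 / F)        # cot^{-1} F
    return sigma_f2, sigma_x2, phi_min, phi_max

def kernel_variances(phi, sigma_f2, sigma_x2):
    """sigma_kf^2(phi) from (17) and sigma_kx^2(phi) from (18)."""
    t2 = math.tan(phi) ** 2
    return sigma_f2 + t2 * sigma_x2, sigma_x2 + sigma_f2 / t2

X, F = 256, 20                          # X: hypothetical image size; F = 20 as in the text
sf2, sx2, pmin, pmax = ccfm_scale_parameters(X, F)

# Boundary conditions (28)-(31).
vf0, vx0 = kernel_variances(pmin, sf2, sx2)
vf1, vx1 = kernel_variances(pmax, sf2, sx2)
print("(28):", vf0, "should be 1")
print("(29):", vx0, "should be", 1 / X ** 2)
print("(30):", vx1, "should be 1")
print("(31):", vf1, "should be", 1 / F ** 2)

# Assumed schedule: phi interpolated linearly between phi_min and phi_max.
steps = 6
for k in range(steps):
    phi = pmin + (pmax - pmin) * k / (steps - 1)
    vf, vx = kernel_variances(phi, sf2, sx2)
    print(f"phi = {phi:.4f}: 1/sigma_kx = {1 / math.sqrt(vx):7.2f}, "
          f"1/sigma_kf = {1 / math.sqrt(vf):6.2f}")
```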
4
Experiments
In the experiments shown in Fig. 4, we have quantized the spatio-featural resolutions resulting from (17) and (18). The image has been encoded in a CCFM with the corresponding number of channels and then decoded. For decoding pixels that do not lie at a spatial channel center, we have linearly interpolated between the nearest channels. The processed images maintain a similar perceptual quality over a wide range of channel numbers; the image degrades only at a resolution of 32 × 32. Note that any resolution can be computed in a single step, since the CCFM scale-space is linear and need not be computed iteratively.
Fig. 4. Examples for CCFM-smoothing at different scales. The spatial and featural (denoted as channels) resolutions are given as (quantized) functions (18) and (17), where φ is linearly increasing with the frame number. The feature considered here is the grey-scale.
5
Conclusion
This paper presents a new theorem for a spatio-featural uncertainty relation. The theoretical result is directly applicable to scale selection for non-linear image filtering using CCFMs. The main claim of this paper is that one should always consider the spatial domain and the feature domain in conjunction, since they are inherently connected. Still, the presented results are only a first step. They need to be studied in more detail for various applications, the theoretical results need to be generalized to non-flat manifolds (non-trivial fibre bundles), and the effects of higher dimensionality need to be considered. Classical scale-space properties, such as preservation of the average grey-scale or the max-min principle, have to be considered in future work, presumably in a modified formulation.
Acknowledgement. The author would like to thank P.-E. Forssén for various discussions about the paper, in particular on alpha-synthesis.
References
1. Iijima, T.: Basic theory of pattern observation. Papers of Technical Group on Automata and Automatic Control, IECE, Japan (December 1959)
2. Witkin, A.P.: Scale-space filtering. In: Int. Joint Conf. Art. Intell., pp. 1019–1022 (1983)
3. Koenderink, J.J.: The structure of images. Biolog. Cybernetics 50, 363–370 (1984)
4. Weickert, J., Ishikawa, S., Imiya, A.: Linear scale-space has first been proposed in Japan. Mathematical Imaging and Vision 10, 237–252 (1999)
5. Felsberg, M., Sommer, G.: The monogenic scale-space: A unifying approach to phase-based image processing in scale-space. J. Math. Imag. Vis. 21, 5–26 (2004)
6. Duits, R., Florack, L.M.J., de Graaf, J., ter Haar Romeny, B.M.: On the axioms of scale space theory. Journal of Mathematical Imaging and Vision 20, 267–298 (2004)
7. Granlund, G.H.: In search of a general picture processing operator. Computer Graphics and Image Processing 8, 155–173 (1978)
8. Burt, P.J., Adelson, E.H.: The Laplacian pyramid as a compact image code. IEEE Trans. Communications 31(4), 532–540 (1983)
9. Mallat, S.G.: A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Anal. Machine Intelligence 11, 674–693 (1989)
10. Daubechies, I.: Orthonormal bases of compactly supported wavelets. Communications on Pure and Applied Mathematics 41(7), 909–996 (1988)
11. Perona, P., Malik, J.: Scale-space and edge detection using anisotropic diffusion. IEEE Trans. Pattern Analysis and Machine Intelligence 12(7), 629–639 (1990)
12. Weickert, J.: Theoretical foundations of anisotropic diffusion in image processing. Computing suppl. 11, 221–236 (1996)
13. Portilla, J., Strela, V., Wainwright, J., Simoncelli, E.P.: Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Trans. Image Processing 12(11), 1338–1351 (2003)
14. Felsberg, M., Forssén, P.E., Scharr, H.: Channel smoothing: Efficient robust smoothing of low-level signal features. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(2), 209–222 (2006)
15. Granlund, G.H.: An associative perception-action structure using a localized space variant information representation. In: Sommer, G., Zeevi, Y.Y. (eds.) AFPAC 2000. LNCS, vol. 1888, pp. 48–68. Springer, Heidelberg (2000)
16. Forssén, P.E., Granlund, G.: Robust multi-scale extraction of blob features. In: Bigun, J., Gustavsson, T. (eds.) SCIA 2003. LNCS, vol. 2749, pp. 11–18. Springer, Heidelberg (2003)
17. Jonsson, E.: Channel-Coded Feature Maps for Computer Vision and Machine Learning. PhD thesis, Linköping University, Sweden (2008)
18. Jonsson, E., Felsberg, M.: Efficient computation of channel-coded feature maps through piecewise polynomials. Image and Vision Computing (in press)
19. Felsberg, M., Granlund, G.: P-channels: Robust multivariate m-estimation of large datasets. In: International Conference on Pattern Recognition, Hong Kong (2006)
20. Zemel, R.S., Dayan, P., Pouget, A.: Probabilistic interpretation of population codes. Neural Computation 10(2), 403–430 (1998)
21. Pouget, A., Dayan, P., Zemel, R.: Information processing with population codes. Nature Reviews – Neuroscience 1, 125–132 (2000)
22. Howard, I.P., Rogers, B.J.: Binocular Vision and Stereopsis. Oxford University Press, Oxford (1995)
23. Forssén, P.E.: Low and Medium Level Vision using Channel Representations. PhD thesis, Linköping University, Sweden (2004)
24. Lindeberg, T.: Scale-Space Theory in Computer Vision. Kluwer Academic Publishers, Boston (1994)
25. Elder, J.H., Zucker, S.W.: Local scale control for edge detection and blur estimation. IEEE Trans. Pattern Analysis and Machine Intell. 20(7), 699–716 (1998)
26. Koenderink, J.J., van Doorn, A.J.: Image processing done right. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2350, pp. 158–172. Springer, Heidelberg (2002)
27. Santhanam, T.S.: Higher-order uncertainty relations. Journal of Physics A: Mathematical and General 33(8), 83–85 (2000)
28. Florack, L.M.J., ter Haar Romeny, B.M., Koenderink, J.J., Viergever, M.A.: Scale and the differential structure of images. Image Vision Comp. 10(6), 376–388 (1992)
Scale Spaces on the 3D Euclidean Motion Group for Enhancement of HARDI Data
Erik Franken¹ and Remco Duits¹,²
¹ Department of Biomedical Engineering, ² Department of Mathematics and Computer Science, Eindhoven University of Technology, The Netherlands
{E.M.Franken,R.Duits}@tue.nl
Abstract. In previous work we studied left-invariant diffusion on the 2D Euclidean motion group for crossing-preserving coherence-enhancing diffusion on 2D images. In this paper we study the equivalent three-dimensional case. This is particularly useful for processing High Angular Resolution Diffusion Imaging (HARDI) data, which can be considered as 3D orientation scores directly. A complicating factor in 3D is that all practical 3D orientation scores are functions on a coset space of the 3D Euclidean motion group instead of on the entire group. We show that, conceptually, we can still apply operations on the entire group by requiring the operations to be α-right-invariant. Subsequently, we propose to describe the local structure of the 3D orientation score using left-invariant derivatives, and we smooth 3D orientation scores using left-invariant diffusion. Finally, we show a number of results for linear diffusion on artificial HARDI data.
1
Introduction
A common approach for enhancing elongated structures in noisy images is nonlinear anisotropic diffusion on the image [1]. This can be regarded as calculating a nonlinear scale space on the additive group $(\mathbb{R}^n, +)$, i.e. the translation group. In our earlier work [2, 3, 5], we proposed to enhance elongated structures via the orientation score of a 2D image, which has the practical advantage that crossing structures can be handled appropriately. An orientation score of a 2D image is a function on the 2D Euclidean motion group SE(2), constructed from the image by an invertible transformation. In our previous work, the enhancement is accomplished by a nonlinear diffusion process in the orientation score of the image (a 3D dataset: 2 spatial dimensions and 1 orientation dimension). It is followed by an inverse orientation score transformation to obtain an enhanced image. In this paper we go one step further and investigate how we can apply the same techniques to 3D orientation scores. Such an orientation score is a 5D dataset, i.e. 3 spatial dimensions and 2 orientation dimensions. The 3D case is very relevant for many (bio)medical problems, since many (bio)medical images are intrinsically 3D. Our main application of interest is high angular resolution diffusion imaging (HARDI).
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 820–831, 2009. © Springer-Verlag Berlin Heidelberg 2009
Scale Spaces on the 3D Euclidean Motion Group
With the term HARDI we refer to all diffusion MRI techniques in which the diffusion profile at each spatial position is modeled by a function on the sphere. Such a model provides richer information, especially in regions where different fibrous structures cross or bifurcate [4, 6, 7, 8]. Roughly speaking, the MRI scanner measures the probability of finding a water molecule at each position for a certain direction, where the number of acquired directions can be varied. Clearly, all data obtained using any HARDI technique can be considered as 3D orientation scores directly. Remarkably, HARDI processing algorithms proposed in the literature process the data as a function on the sphere for each spatial position separately, see e.g. [4, 7, 9]. In our approach, we consider both the spatial and the orientational part to be included in the domain, so a HARDI dataset is considered as a function $\mathbb{R}^3 \rtimes S^2 \to \mathbb{R}$. Furthermore, we explicitly employ the proper underlying group structure. The advantage is that we can enhance the data using both orientational and spatial neighborhood information, which potentially leads to improved enhancement and detection algorithms. 3D orientation scores are defined as functions $u: \mathbb{R}^3 \rtimes S^2 \to \mathbb{R}$ or $\mathbb{C}$, where $\mathbb{R}^3$ is the spatial domain and $S^2 = \{\mathbf{n} \in \mathbb{R}^3 \mid \|\mathbf{n}\| = 1\}$ is the unit sphere. In this paper, the domain of u is parameterized by $(\mathbf{x}, \mathbf{n})$, where $\mathbf{x} = (x, y, z) \in \mathbb{R}^3$ and $\mathbf{n} \in S^2$. Figure 1(a) shows an example clarifying the structure of a 3D orientation score. This paper will start with the introduction of the group structure of the 3D orientation score domain, i.e. the 3D Euclidean motion group SE(3). Subsequently, we will introduce the important differential geometry on SE(3), needed
[Figure 1 graphic: (a) Example 3D orientation score; (b) Euler angles (α, β, γ) on the sphere, with axes x, y, z.]
Fig. 1. (a) Visualization of a simple 3D orientation score u(x, n) containing two crossing straight lines, visualized using Q-ball glyphs in the DTI tool (see http://www.bmia.bmt.tue.nl/software/dtitool/) from two different viewpoints. At each (relevant) spatial position x, the function on the sphere u(x, ·) is displayed by a so-called glyph, given by $\mathbf{n} \mapsto \mathbf{x} + q\,u(\mathbf{x},\mathbf{n})\,\mathbf{n}$, where q is a scaling factor. (b) Intuition of the coset space SO(3)/SO(2). The Euler angles (α, β, γ) are needed to parametrize rotations in SO(3), while the two angles (β, γ) suffice to describe positions on the unit sphere, represented by a unit vector. The third Euler angle α is in fact a rotation of this vector around its own axis, leaving the vector invariant. Thus, each position on the sphere is identified with a coset in SO(3)/SO(2), cf. (4).
E. Franken and R. Duits
to estimate tangent vectors that locally best fit the elongated structures in the 3D orientation score. The next topic will be diffusion on 3D orientation scores, which yields a scale space representation on the SE(3) group. The paper will end with results of linear SE(3)-diffusion on artificial HARDI datasets.
2
Group Structure of the Domain of 3D Orientation Scores
2.1
The Rotation Group SO(3) and Coset Space SO(3)/SO(2)
The noncommutative group of 3D rotations is defined as the matrix group
$$SO(3) = \{\mathbf{R} \mid \mathbf{R} \in \mathbb{R}^{3\times 3},\ \mathbf{R}^T = \mathbf{R}^{-1},\ \det(\mathbf{R}) = 1\}. \quad (1)$$
In this section, we will first consider different parameterizations of SO(3). Then, we will describe the coset space SO(3)/SO(2), which is an essential prerequisite to relate functions on the sphere (i.e. two angles) to functions on SO(3) (i.e. three angles). The relation between positions on the sphere $S^2$ and 3D rotations in SO(3) is established by rotating the vector $\mathbf{e}_z$, i.e.
$$\mathbf{n} = \mathbf{R}\cdot\mathbf{e}_z. \quad (2)$$
This relation shows that the resulting position $\mathbf{n}$ on the sphere is independent of an arbitrary rotation around the z-axis. That is, $\mathbf{R}\mathbf{R}^{\mathbf{e}_z}_\alpha\cdot\mathbf{e}_z = \mathbf{R}\cdot\mathbf{e}_z$ for all α, where $\mathbf{R}^{\mathbf{n}}_\alpha$ denotes a rotation over α around the axis defined by the vector $\mathbf{n}$. Consequently, a function on the sphere is not equivalent to a function on the complete rotation group SO(3). It is rather a function on the set of left cosets SO(3)/stab($\mathbf{e}_z$) that partitions SO(3), where stab($\mathbf{e}_z$) denotes the subgroup of SO(3) of all rotations around the z-axis, as is made intuitive in Figure 1(b). A left coset $[g]H$ of a group G with subgroup H is defined as the set
$$[g]H = gH = \{gh \mid h \in H\}, \quad (3)$$
for any $g \in G$. The left cosets form a partition of the group, i.e. the group is divided into disjoint cosets, and the set of all of these cosets is denoted by G/H. Two group elements $g_1 \in G$ and $g_2 \in G$ are equivalent, $g_1 \sim g_2$, if they belong to the same left coset, i.e. $g_1H = g_2H$. In the case SO(3)/stab($\mathbf{e}_z$), we have $\mathbf{R}_1 \sim \mathbf{R}_2$ iff there is an α such that $\mathbf{R}_1\mathbf{R}^{\mathbf{e}_z}_\alpha = \mathbf{R}_2$. From now on we will write SO(3)/SO(2) rather than SO(3)/stab($\mathbf{e}_z$), since stab($\mathbf{e}_z$) and SO(2) are isomorphic. The cosets SO(3)/SO(2) are isomorphic to the space of unit vectors of (2), i.e.
$$SO(3)/SO(2) \cong S^2 = \{\mathbf{n} \in \mathbb{R}^3 \mid \|\mathbf{n}\| = 1\}. \quad (4)$$
The isomorphism is given by means of (2). The set of all cosets SO(3)/SO(2) can be parameterized using only two angles rather than three. For instance, we can write the cosets as $[\mathbf{R}^{\mathbf{e}_z}_\gamma\mathbf{R}^{\mathbf{e}_y}_\beta]SO(2) \in SO(3)/SO(2)$, and therefore $\mathbf{n}(\beta, \gamma) = \mathbf{R}^{\mathbf{e}_z}_\gamma\mathbf{R}^{\mathbf{e}_y}_\beta\mathbf{e}_z \in S^2$. Note that the set of all disjoint cosets SO(3)/SO(2) does not form a group, since SO(2) is not a normal subgroup of SO(3); in general $[g_1]SO(2)\,[g_2]SO(2) \neq [g_1g_2]SO(2)$.
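The coset construction can be made concrete in a few lines. The sketch below uses Python with NumPy, and the helper names are our own. It checks the defining conditions of (1) and evaluates n(β, γ) via (2). It then confirms that every member $\mathbf{R}\mathbf{R}^{\mathbf{e}_z}_\alpha$ of a coset maps $\mathbf{e}_z$ to the same point. Finally, it illustrates why multiplying cosets through their representatives is not well defined.

```python
import numpy as np

def rot(axis, angle):
    """Rotation R^{e_axis}_angle about a coordinate axis ('x', 'y' or 'z')."""
    c, s = np.cos(angle), np.sin(angle)
    if axis == "x":
        return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
    if axis == "y":
        return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    if axis == "z":
        return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    raise ValueError(f"unknown axis {axis!r}")

E_Z = np.array([0.0, 0.0, 1.0])

def in_so3(R, tol=1e-12):
    """Defining conditions of (1): R^T = R^{-1} and det(R) = 1."""
    return np.allclose(R.T @ R, np.eye(3), atol=tol) and abs(np.linalg.det(R) - 1.0) < tol

def n_of(beta, gamma):
    """Point on S^2 from the coset representative R^{e_z}_gamma R^{e_y}_beta, via (2)."""
    return rot("z", gamma) @ rot("y", beta) @ E_Z

beta, gamma = 0.9, 2.1
R = rot("z", gamma) @ rot("y", beta)
n = R @ E_Z
print("R in SO(3):", in_so3(R))
print("n(beta, gamma):", n)

# Every member R R^{e_z}_alpha of the coset [R]SO(2) maps e_z to the same point n.
same = all(np.allclose(R @ rot("z", a) @ E_Z, n) for a in np.linspace(0.0, 2 * np.pi, 7))
print("whole coset maps to n:", same)

# Products of representatives depend on the representative chosen, so the
# cosets do not inherit a group product.
R1, R2 = rot("x", 0.4), rot("y", 0.7)
R1_alt = R1 @ rot("z", 1.3)                   # R1_alt ~ R1 (same left coset)
print("same coset after product:", np.allclose(R1 @ R2 @ E_Z, R1_alt @ R2 @ E_Z))
```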
2.2
The 3D Euclidean Motion Group SE(3)
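The 3D Euclidean motion group is the group of 3D translations and 3D rotations, i.e. $SE(3) = \mathbb{R}^3 \rtimes SO(3)$. An element of SE(3) can be parameterized by $(\mathbf{x}, \mathbf{R})$, where $\mathbf{x} \in \mathbb{R}^3$ is the translation vector and $\mathbf{R} \in SO(3)$ is the rotation matrix. The group product and inverse of SE(3) are given by
$$gg' = (\mathbf{x}, \mathbf{R})(\mathbf{x}', \mathbf{R}') = (\mathbf{x} + \mathbf{R}\cdot\mathbf{x}', \mathbf{R}\cdot\mathbf{R}'), \quad (5)$$
$$g^{-1} = (\mathbf{x}, \mathbf{R})^{-1} = (-\mathbf{R}^{-1}\mathbf{x}, \mathbf{R}^{-1}).$$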
To map the structure of a group to operators on orientation scores, we need a representation. A representation is a mapping of the form $\mathcal{R}: G \to B(H)$, where H is the linear space of orientation scores and B(H) is the space of bounded linear invertible operators H → H. It maps each group element to an operator such that the group structure is preserved, i.e. $\mathcal{R}_g\mathcal{R}_h = \mathcal{R}_{gh}$ and $\mathcal{R}_e = I$. On SE(3) we define the left- and right-regular representations on a function $U \in \mathbb{L}_2(SE(3))$ as
$$(L_g \circ U)(h) = U(g^{-1}h), \qquad g, h \in SE(3), \quad (6)$$
$$(Q_g \circ U)(h) = U(hg), \qquad g, h \in SE(3). \quad (7)$$
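For readers who prefer to see (5)–(7) executed, the following NumPy sketch implements the SE(3) product and inverse on pairs (x, R). It also implements the left- and right-regular representations acting on a scalar test function U, which is a hypothetical choice. At one sample point it checks $gg^{-1} = e$ and the homomorphism properties $L_{g_1}L_{g_2} = L_{g_1g_2}$ and $Q_{g_1}Q_{g_2} = Q_{g_1g_2}$.

```python
import numpy as np

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_y(b):
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def mul(g, h):
    """Group product (5): (x, R)(x', R') = (x + R x', R R')."""
    (x, R), (xp, Rp) = g, h
    return (x + R @ xp, R @ Rp)

def inv(g):
    """Inverse: (x, R)^{-1} = (-R^{-1} x, R^{-1}), using R^{-1} = R^T."""
    x, R = g
    return (-R.T @ x, R.T)

def L(g, U):
    """Left-regular representation (6): (L_g U)(h) = U(g^{-1} h)."""
    return lambda h: U(mul(inv(g), h))

def Q(g, U):
    """Right-regular representation (7): (Q_g U)(h) = U(h g)."""
    return lambda h: U(mul(h, g))

M = np.array([[1.0, 2.0, 0.0], [0.0, -1.0, 3.0], [0.5, 0.0, 1.0]])

def U(h):
    """Hypothetical scalar test function on SE(3)."""
    x, R = h
    return np.sin(x @ np.array([1.0, -2.0, 0.5])) + np.trace(R @ M)

g1 = (np.array([1.0, 0.0, -2.0]), rot_z(0.3) @ rot_y(1.1))
g2 = (np.array([0.5, 2.0, 1.0]), rot_y(-0.4) @ rot_z(2.0))
h = (np.array([-1.0, 0.7, 0.2]), rot_z(-1.2) @ rot_y(0.6))

e = mul(g1, inv(g1))
print("g g^-1 = e          :", np.allclose(e[0], 0.0) and np.allclose(e[1], np.eye(3)))
print("L_g1 L_g2 = L_(g1g2):", np.isclose(L(g1, L(g2, U))(h), L(mul(g1, g2), U)(h)))
print("Q_g1 Q_g2 = Q_(g1g2):", np.isclose(Q(g1, Q(g2, U))(h), Q(mul(g1, g2), U)(h)))
```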
The matrix Lie algebra [10] $T_e(SE(3))$ is spanned by the following basis
$$X_1 = \begin{pmatrix}0&0&0&1\\0&0&0&0\\0&0&0&0\\0&0&0&0\end{pmatrix},\quad X_2 = \begin{pmatrix}0&0&0&0\\0&0&0&1\\0&0&0&0\\0&0&0&0\end{pmatrix},\quad X_3 = \begin{pmatrix}0&0&0&0\\0&0&0&0\\0&0&0&1\\0&0&0&0\end{pmatrix},$$
$$X_4 = \begin{pmatrix}0&0&0&0\\0&0&-1&0\\0&1&0&0\\0&0&0&0\end{pmatrix},\quad X_5 = \begin{pmatrix}0&0&1&0\\0&0&0&0\\-1&0&0&0\\0&0&0&0\end{pmatrix},\quad X_6 = \begin{pmatrix}0&-1&0&0\\1&0&0&0\\0&0&0&0\\0&0&0&0\end{pmatrix}. \quad (8)$$
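The basis (8) and the exponential map used in (9) can be explored numerically. The sketch below builds $X_1, \ldots, X_6$ as NumPy arrays and expresses every nonzero commutator $[X_i, X_j]$ in the same basis. It also checks that $\exp(\theta X_6)$ is a rotation about the z-axis. The matrix exponential is a plain truncated Taylor series, a simplification adequate for small arguments; `scipy.linalg.expm` would be the usual library choice.

```python
import numpy as np

def se3_basis():
    """The six 4x4 matrices X_1, ..., X_6 of (8)."""
    X = [np.zeros((4, 4)) for _ in range(6)]
    X[0][0, 3] = 1.0                          # translation along x
    X[1][1, 3] = 1.0                          # translation along y
    X[2][2, 3] = 1.0                          # translation along z
    X[3][1, 2], X[3][2, 1] = -1.0, 1.0        # rotation about the x-axis
    X[4][0, 2], X[4][2, 0] = 1.0, -1.0        # rotation about the y-axis
    X[5][0, 1], X[5][1, 0] = -1.0, 1.0        # rotation about the z-axis
    return X

def expm_series(A, terms=40):
    """Matrix exponential by a truncated Taylor series (fine for moderate ||A||)."""
    result, term = np.eye(len(A)), np.eye(len(A))
    for k in range(1, terms):
        term = term @ A / k
        result = result + term
    return result

X = se3_basis()
basis_matrix = np.stack([Xi.ravel() for Xi in X], axis=1)   # 16 x 6

def bracket_in_basis(i, j):
    """Coefficients c with [X_i, X_j] = sum_k c_k X_k (0-based i, j)."""
    C = X[i] @ X[j] - X[j] @ X[i]
    coeff, *_ = np.linalg.lstsq(basis_matrix, C.ravel(), rcond=None)
    return np.round(coeff, 12)

for i in range(6):
    for j in range(i + 1, 6):
        c = bracket_in_basis(i, j)
        if np.any(c):
            rhs = " ".join(f"{v:+g} X{k + 1}" for k, v in enumerate(c) if v)
            print(f"[X{i + 1}, X{j + 1}] = {rhs}")

# exp(theta X_6) is a rotation about z embedded in a 4x4 matrix, cf. (9).
theta = 0.7
E = expm_series(theta * X[5])
print(np.round(E, 6))
```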
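The nonzero commutators can be found by $[X_i, X_j] = X_iX_j - X_jX_i$. By calculating the matrix exponentials we find the following matrix representation of the SE(3) group:
$$\mathcal{E}_{(\mathbf{x},\mathbf{R})} = \exp(xX_1 + yX_2 + zX_3)\exp(\check{\gamma}X_4)\exp(\check{\beta}X_5)\exp(\check{\alpha}X_6) = \begin{pmatrix}\mathbf{R} & \mathbf{x}\\ \mathbf{0}^T & 1\end{pmatrix}, \quad \text{with } \mathbf{R} = \mathbf{R}^{\mathbf{e}_x}_{\check{\gamma}}\mathbf{R}^{\mathbf{e}_y}_{\check{\beta}}\mathbf{R}^{\mathbf{e}_z}_{\check{\alpha}}, \quad (9)$$
where $(\check{\alpha}, \check{\beta}, \check{\gamma})$ is a possible Euler angle parametrization of the rotation group SO(3), see [5, Chapter 7].
2.3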
Left-Invariance and Right-Invariance
An operator $\Phi: \mathbb{L}_2(SE(3)) \to \mathbb{L}_2(SE(3))$ is left-invariant if it commutes with the left-regular representation (6):
$$\forall g \in SE(3): \quad L_g \circ \Phi = \Phi \circ L_g, \quad (10)$$
and similarly an operator Φ is right-invariant if it commutes with the right-regular representation (7):
$$\forall g \in SE(3): \quad Q_g \circ \Phi = \Phi \circ Q_g. \quad (11)$$
In this work we aim at left-invariant operations and consider right-invariant operations not meaningful. The rationale behind this will be clarified below. Define $\mathcal{W}: (SE(3) \to \mathbb{C}) \to (\mathbb{R}^3 \to \mathbb{C})$ to be the operator that calculates the orientation marginal,
$$\mathcal{W}[U](\mathbf{x}) = \int_{SO(3)} U(\mathbf{x}, \mathbf{R})\, d\mu(\mathbf{R}), \quad (12)$$
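where dμ is the Haar measure, which is designed to fulfill the requirement
$$\int_{SO(3)} F(\mathbf{R})\, d\mu(\mathbf{R}) = \int_{SO(3)} F(\mathbf{R}\cdot\mathbf{R}')\, d\mu(\mathbf{R}), \quad \forall\, \mathbf{R}' \in SO(3). \quad (13)$$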
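It is easy to derive that for the left-regular representation
$$\mathcal{U}_g \circ \mathcal{W} \circ U = \mathcal{W} \circ L_g \circ U, \quad \forall g \in SE(3), \quad (14)$$
where $\mathcal{U}$ is a representation of SE(3) on $\mathbb{L}_2(\mathbb{R}^3)$ defined by $(\mathcal{U}_{(\mathbf{x}',\mathbf{R}')}f)(\mathbf{x}) = f((\mathbf{R}')^{-1}(\mathbf{x} - \mathbf{x}'))$. On the other hand, we note that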
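$$(\mathcal{W} \circ Q_{(\mathbf{x},\mathbf{R})} \circ U)(\mathbf{x}') = \int_{SO(3)} U(\mathbf{x}' + \mathbf{R}'\mathbf{x}, \mathbf{R}'\mathbf{R})\, d\mu(\mathbf{R}'), \quad (15)$$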
which shows that the integration variable $\mathbf{R}'$ enters the spatial part, making it impossible to find a relation equivalent to (14) for the right-regular representation. In words, the left-regular representation "commutes" with $\mathcal{W}$, where $L_g$ changes into $\mathcal{U}_g$ since the function space changes from SE(3) to $\mathbb{R}^3$. No such relation exists for the right-regular representation. This observation makes it sensible to require the operators Φ to be left-invariant. Then $\mathcal{W}\circ\Phi\circ L_g\circ U = \mathcal{W}\circ L_g\circ\Phi\circ U = \mathcal{U}_g\circ\mathcal{W}\circ\Phi\circ U$. In words, applying a group transformation ($L_g$) to the input U renders the same result as applying the same group transformation ($\mathcal{U}_g$) to the orientation marginal of the output.
2.4
Functions on SE(3) and $\mathbb{R}^3 \rtimes S^2$
At the beginning of this paper we defined a 3D orientation score u as a function of three spatial variables and only two angular variables, which describe a position on the sphere. However, the sphere $S^2$ is isomorphic to the coset space SO(3)/SO(2) rather than to the entire rotation group SO(3). Such an orientation score is therefore not a function on the entire Euclidean motion group SE(3), but rather a function on the coset space $SE(3)/(\mathbf{0} \times \text{stab}(\mathbf{e}_z))$. Here, $(\mathbf{0} \times \text{stab}(\mathbf{e}_z))$ denotes the subgroup of SE(3) of rotations around the z-axis with zero translation, which is isomorphic to SO(2). Analogously to the isomorphism $SO(3)/SO(2) \cong S^2$, we have the isomorphism $SE(3)/(\mathbf{0} \times \text{stab}(\mathbf{e}_z)) \cong \mathbb{R}^3 \rtimes S^2$.
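For the analysis it is more convenient to consider functions on $\mathbb{R}^3 \rtimes S^2$ as functions on the entire group SE(3) with the extra property of α-right-invariance. A function $\tilde U: SE(3) \to \mathbb{C}$ is defined to be α-right-invariant if $Q_{(\mathbf{0},\mathbf{R}^{\mathbf{e}_z}_\alpha)} \circ \tilde U = \tilde U$ for all α, that is,
$$\tilde U(\mathbf{x}, \mathbf{R}\mathbf{R}^{\mathbf{e}_z}_\alpha) = \tilde U(\mathbf{x}, \mathbf{R}), \quad \forall \alpha, \quad (16)$$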
where we write $\tilde U$ rather than U to make explicit in the notation that the function is α-right-invariant. We observe that the value of $\tilde U(\mathbf{x}, \mathbf{R})$ is independent of a rotation around the z-axis applied on the right side. Hence $\tilde U$ can be identified one-to-one with an orientation score $u: \mathbb{R}^3 \rtimes S^2 \to \mathbb{C}$ via
$$\tilde U(\mathbf{x}, \mathbf{R}) = u(\mathbf{x}, \mathbf{R}\cdot\mathbf{e}_z), \quad (17)$$
where $\tilde U$ is α-right-invariant.
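The identification (17) can be illustrated with a toy orientation score. In the sketch below, u is a hypothetical score describing one blurred fibre along a fixed direction. Its lift $\tilde U(\mathbf{x}, \mathbf{R}) = u(\mathbf{x}, \mathbf{R}\mathbf{e}_z)$ is unchanged when $\mathbf{R}$ is replaced by $\mathbf{R}\mathbf{R}^{\mathbf{e}_z}_\alpha$. By contrast, a function that depends on $\mathbf{R}\mathbf{e}_x$ (also hypothetical) is not α-right-invariant.

```python
import numpy as np

def rot(axis, angle):
    """Rotation about a coordinate axis ('x', 'y' or 'z')."""
    c, s = np.cos(angle), np.sin(angle)
    if axis == "x":
        return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
    if axis == "y":
        return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

E_Z = np.array([0.0, 0.0, 1.0])
E_X = np.array([1.0, 0.0, 0.0])
A_DIR = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)   # hypothetical fibre direction

def u(x, n):
    """Hypothetical orientation score on R^3 x S^2: one blurred fibre along A_DIR."""
    return np.exp(-x @ x) * (n @ A_DIR) ** 2

def u_tilde(x, R):
    """Lift (17): U~(x, R) = u(x, R e_z)."""
    return u(x, R @ E_Z)

def v(x, R):
    """A function on SE(3) that is NOT alpha-right-invariant (depends on R e_x)."""
    return np.exp(-x @ x) * ((R @ E_X) @ A_DIR)

x = np.array([0.2, -0.1, 0.4])
R = rot("x", 0.3) @ rot("y", 1.0) @ rot("z", -0.5)
for alpha in (0.0, 0.8, 2.5):
    Ra = R @ rot("z", alpha)
    print(f"alpha={alpha}: U~ = {u_tilde(x, Ra):.12f}, v = {v(x, Ra):+.6f}")
```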
In this paper we will mostly work with the α-right-invariant function $\tilde U$, because it is more convenient to work with functions on the group.
2.5
SE(3)-Convolutions
It can be shown that all operations on orientation scores that are linear and left-invariant can be expressed as an SE(3)-convolution, which is defined by
$$(\Psi *_{SE(3)} U)(g) = \int_{SE(3)} \Psi(h^{-1}g)\, U(h)\, dh. \quad (18)$$
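More explicitly this yields
$$(\Psi *_{SE(3)} U)(\mathbf{x}, \mathbf{R}) = \int_{\mathbb{R}^3}\int_{SO(3)} \Psi\big((\mathbf{R}')^{-1}(\mathbf{x} - \mathbf{x}'), (\mathbf{R}')^{-1}\mathbf{R}\big)\, U(\mathbf{x}', \mathbf{R}')\, d\mathbf{x}'\, d\mu(\mathbf{R}'), \quad (19)$$
where $d\mu(\mathbf{R}')$ is defined in (13). For an α-right-invariant $\tilde U$ (cf. (16)), we need to put additional requirements on the kernel Ψ. We require the result $\Psi *_{SE(3)} \tilde U$ to be α-right-invariant as well, leading to the following requirement:
$$Q_{(\mathbf{0},\mathbf{R}^{\mathbf{e}_z}_{\alpha'})} \circ \big(\tilde\Psi *_{SE(3)} (Q_{(\mathbf{0},\mathbf{R}^{\mathbf{e}_z}_{\alpha})} \circ \tilde U)\big) = \tilde\Psi *_{SE(3)} \tilde U, \quad \forall \alpha, \alpha'. \quad (20)$$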
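This imposes requirements on the kernel $\tilde\Psi$. One can easily verify that the following properties hold for the SE(3)-convolution of (18):
$$Q_g(\Psi *_{SE(3)} U) = (Q_g\Psi) *_{SE(3)} U, \quad \forall g \in SE(3), \quad (21)$$
$$(L_g\Psi) *_{SE(3)} U = \Psi *_{SE(3)} (Q_{g^{-1}}U), \quad \forall g \in SE(3). \quad (22)$$
Using the latter two equations, the left-hand side of (20) can now be rewritten as
$$Q_{(\mathbf{0},\mathbf{R}^{\mathbf{e}_z}_{\alpha'})} \circ \big(\tilde\Psi *_{SE(3)} (Q_{(\mathbf{0},\mathbf{R}^{\mathbf{e}_z}_{\alpha})} \circ \tilde U)\big) = (Q_{(\mathbf{0},\mathbf{R}^{\mathbf{e}_z}_{\alpha'})} \circ \tilde\Psi) *_{SE(3)} (Q_{(\mathbf{0},\mathbf{R}^{\mathbf{e}_z}_{\alpha})} \circ \tilde U) = (L_{(\mathbf{0},\mathbf{R}^{\mathbf{e}_z}_{-\alpha})} \circ Q_{(\mathbf{0},\mathbf{R}^{\mathbf{e}_z}_{\alpha'})} \circ \tilde\Psi) *_{SE(3)} \tilde U. \quad (23)$$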
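Therefore
$$\tilde\Psi = L_{(\mathbf{0},\mathbf{R}^{\mathbf{e}_z}_{-\alpha})} \circ Q_{(\mathbf{0},\mathbf{R}^{\mathbf{e}_z}_{\alpha'})} \circ \tilde\Psi, \quad \text{for all } \alpha, \alpha', \quad (24)$$
so $\tilde\Psi$ is required to be both α-right-invariant and α-left-invariant (i.e. $L_{(\mathbf{0},\mathbf{R}^{\mathbf{e}_z}_{\alpha})} \circ \tilde U = \tilde U$ for all α). More explicitly this yields
$$\tilde\Psi(\mathbf{x}, \mathbf{R}) = \tilde\Psi\big((\mathbf{R}^{\mathbf{e}_z}_{\alpha})^{-1}\mathbf{x}, (\mathbf{R}^{\mathbf{e}_z}_{\alpha})^{-1}\mathbf{R}\mathbf{R}^{\mathbf{e}_z}_{\alpha'}\big), \quad \text{for all } \alpha, \alpha'. \quad (25)$$
3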
Differential Geometry on SE(3)
In [3] we introduced the basic differential geometry on SE(2). In this section we establish the same concepts for SE(3). We will introduce the left-invariant vector fields and left-invariant derivatives, and a procedure to estimate tangent vectors that locally best fit elongated structures in 3D orientation scores. A more extensive description, including explicit expressions for, e.g., curvature and torsion, can be found in [5, Chapter 7].
3.1
Left-Invariant Derivatives in SE(3)
Using the matrix representation cf. (9), left-invariant derivatives are given by
$$(A_i U)(\mathcal{E}_g) = \lim_{h\to 0} \frac{U(\mathcal{E}_g \cdot \exp(hX_i)) - U(\mathcal{E}_g)}{h}. \quad (26)$$
The tangent space at $g \in SE(3)$ is spanned by these vector fields, i.e. $T_g(SE(3)) = \text{span}\{A_1|_g, A_2|_g, A_3|_g, A_4|_g, A_5|_g, A_6|_g\}$, where we define $A_i|_g(U) = (A_iU)(\mathcal{E}_g)$. Left-invariant derivatives $A_1$, $A_2$ and $A_3$ can be implemented simply by approximating (26) using finite differences. On an α-right-invariant function $\tilde U$, $A_3\tilde U(g)$ is always α-right-invariant, since $\exp(hX_3)$ commutes with $\mathcal{E}_{(\mathbf{0},\mathbf{R}^{\mathbf{e}_z}_\alpha)}$: $\exp(hX_3)\,\mathcal{E}_{(\mathbf{0},\mathbf{R}^{\mathbf{e}_z}_\alpha)} = \mathcal{E}_{(h\mathbf{e}_z,\mathbf{I})}\mathcal{E}_{(\mathbf{0},\mathbf{R}^{\mathbf{e}_z}_\alpha)} = \mathcal{E}_{(\mathbf{0},\mathbf{R}^{\mathbf{e}_z}_\alpha)}\mathcal{E}_{(h\mathbf{e}_z,\mathbf{I})}$. Furthermore, we always have $A_6\tilde U(g) = 0$ for all $g \in SE(3)$. The remaining left-invariant derivatives $A_i\tilde U$, with $i \in \{1, 2, 4, 5\}$, do not render α-right-invariant functions, since they depend on the value of α resp. $\check\alpha$. This implies that, when taking higher-order derivatives, one still needs to take all six left-invariant derivatives into account. As an example, we derive the left-invariant Hessian $\mathbf{H}U = \nabla(\nabla U)$ for α-right-invariant functions, where the gradient operator is $\nabla = (A_1, A_2, \ldots, A_6)^T$. To this end, we first use the commutator relations to reorder the left-invariant derivatives by increasing index, so that $A_1$ always appears on the left side and $A_6$ always appears on the right side. Subsequently we use $A_6\tilde U(g) = 0$ (which implies that $A_iA_6\tilde U = 0$ for all i). This yields the following 5 × 6 Hessian matrix
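$$\mathbf{H}\tilde U = \nabla(\nabla\tilde U) = \begin{pmatrix} A_1^2 & A_1A_2 & A_1A_3 & A_1A_4 & A_1A_5 - A_3 & A_2\\ A_1A_2 & A_2^2 & A_2A_3 & A_2A_4 + A_3 & A_2A_5 & -A_1\\ A_1A_3 & A_2A_3 & A_3^2 & A_3A_4 - A_2 & A_3A_5 + A_1 & 0\\ A_1A_4 & A_2A_4 & A_3A_4 & A_4^2 & A_4A_5 & A_5\\ A_1A_5 & A_2A_5 & A_3A_5 & A_4A_5 & A_5^2 & -A_4 \end{pmatrix}\tilde U. \quad (27)$$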
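The correction terms in (27) follow mechanically from the commutators of the basis (8), using the standard property of left-invariant vector fields $[A_i, A_j] = A_{[X_i, X_j]}$. The following sketch rebuilds the 5 × 6 table of (27) as strings; it is our own bookkeeping, not the authors' code. Entry (i, j) is $A_jA_i\tilde U$, reordered so that indices increase from left to right. Every term ending in $A_6$ is dropped because $A_6\tilde U = 0$. The sketch assumes all commutator coefficients are ±1 or 0, which holds for this basis.

```python
import numpy as np

def se3_basis():
    """The six 4x4 matrices X_1, ..., X_6 of (8)."""
    X = [np.zeros((4, 4)) for _ in range(6)]
    X[0][0, 3] = X[1][1, 3] = X[2][2, 3] = 1.0      # translations
    X[3][1, 2], X[3][2, 1] = -1.0, 1.0              # rotation about x
    X[4][0, 2], X[4][2, 0] = 1.0, -1.0              # rotation about y
    X[5][0, 1], X[5][1, 0] = -1.0, 1.0              # rotation about z
    return X

X = se3_basis()
BASIS = np.stack([Xi.ravel() for Xi in X], axis=1)  # 16 x 6

def bracket_coeffs(i, j):
    """Integer coefficients c with [X_i, X_j] = sum_k c_k X_k (0-based)."""
    C = X[i] @ X[j] - X[j] @ X[i]
    coeff, *_ = np.linalg.lstsq(BASIS, C.ravel(), rcond=None)
    return np.rint(coeff).astype(int)

def hessian_entry(i, j):
    """Entry (i, j) of (27): A_j A_i U~ with indices increasing left to right,
    using A_j A_i = A_i A_j + A_[X_j, X_i] and A_6 U~ = 0."""
    a, b = min(i, j), max(i, j)
    terms = []
    if b != 5:                                       # A_a A_6 U~ = 0
        terms.append(f"A{a + 1}^2" if a == b else f"A{a + 1}A{b + 1}")
    if j > i:                                        # reordering produces a commutator term
        for k, c in enumerate(bracket_coeffs(j, i)):
            if c != 0 and k != 5:                    # drop A_6 U~ = 0
                terms.append(("+ " if c > 0 else "- ") + f"A{k + 1}")
    s = " ".join(terms) if terms else "0"
    if s.startswith("+ "):
        s = s[2:]
    elif s.startswith("- "):
        s = "-" + s[2:]
    return s

for i in range(5):
    print(" | ".join(f"{hessian_entry(i, j):>10}" for j in range(6)))
```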
We use finite differences to calculate the left-invariant derivatives on orientation scores with a sampled domain $\mathbb{R}^3 \rtimes S^2$. To get a rotation matrix corresponding to an element of $S^2$, one can choose an arbitrary rotation matrix with $\mathbf{R}\cdot\mathbf{e}_z = \mathbf{n}$. For first-order centered finite differences one subsequently calculates
$$(A_iU)(\mathcal{E}_g) \approx \frac{1}{2h}\big(U(\mathcal{E}_g\cdot\exp(hX_i)) - U(\mathcal{E}_g\cdot\exp(-hX_i))\big). \quad (28)$$
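The central difference (28) can be tried on an analytic α-right-invariant function, which avoids the interpolation issue mentioned below. The test score, angles and step size in the following sketch are hypothetical choices. The sketch checks three things:

- $A_6\tilde U$ vanishes.
- $A_3\tilde U$ is the same at two representatives $g$ and $g\,(\mathbf{0},\mathbf{R}^{\mathbf{e}_z}_\alpha)$ of one coset.
- $A_1\tilde U$ and $A_4\tilde U$ at the second representative are rotations of $(A_1\tilde U, A_2\tilde U)$ and $(A_4\tilde U, A_5\tilde U)$ at the first.

The last point is the mechanism behind the 2 × 2 rotation blocks of (29); the sign convention of $\alpha_1 - \alpha_2$ in (29) is not checked here.

```python
import numpy as np

def se3_basis():
    """The six 4x4 matrices X_1, ..., X_6 of (8)."""
    X = [np.zeros((4, 4)) for _ in range(6)]
    X[0][0, 3] = X[1][1, 3] = X[2][2, 3] = 1.0
    X[3][1, 2], X[3][2, 1] = -1.0, 1.0
    X[4][0, 2], X[4][2, 0] = 1.0, -1.0
    X[5][0, 1], X[5][1, 0] = -1.0, 1.0
    return X

def expm_series(A, terms=30):
    """Matrix exponential by truncated Taylor series (A is tiny here)."""
    result, term = np.eye(4), np.eye(4)
    for k in range(1, terms):
        term = term @ A / k
        result = result + term
    return result

def rot(axis, angle):
    c, s = np.cos(angle), np.sin(angle)
    if axis == "x":
        return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
    if axis == "y":
        return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def homogeneous(x, R):
    """4x4 matrix E_(x,R) = [[R, x], [0, 1]] as in (9)."""
    E = np.eye(4)
    E[:3, :3], E[:3, 3] = R, x
    return E

A_DIR = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)    # hypothetical fibre direction

def U_tilde(E):
    """Lifted score U~(x, R) = u(x, R e_z) for a hypothetical u, evaluated at E_g."""
    x, n = E[:3, 3], E[:3, 2]                        # R e_z is the third column of R
    return np.exp(-x @ x) * (n @ A_DIR) ** 2

X = se3_basis()

def A(i, F, E, h=1e-4):
    """Central difference (28) for A_{i+1} (0-based index i) at the group element E."""
    return (F(E @ expm_series(h * X[i])) - F(E @ expm_series(-h * X[i]))) / (2.0 * h)

alpha = 1.1
E_g = homogeneous(np.array([0.3, -0.2, 0.5]), rot("z", 0.4) @ rot("y", 0.9) @ rot("x", -0.2))
E_g2 = E_g @ homogeneous(np.zeros(3), rot("z", alpha))   # g2 = g (0, R^{e_z}_alpha) ~ g
grad1 = np.array([A(i, U_tilde, E_g) for i in range(6)])
grad2 = np.array([A(i, U_tilde, E_g2) for i in range(6)])
c, s = np.cos(alpha), np.sin(alpha)

print("A_6 U~(g):", grad1[5])
print("A_3 U~(g), A_3 U~(g2):", grad1[2], grad2[2])
print("A_1 U~(g2) vs cos*A_1 + sin*A_2 at g:", grad2[0], c * grad1[0] + s * grad1[1])
```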
Note that this will require interpolations, both in the spatial dimensions and on the sphere. One should, however, always ensure that the result of the effective operator is independent of the specific choice of $\mathbf{R}$. To this end, we have the following important relation between the left-invariant derivatives at $g_1$ and $g_2$ whenever $g_1 = (\mathbf{x}, \mathbf{R}_1) \sim g_2 = (\mathbf{x}, \mathbf{R}_2)$:
$$\nabla\tilde U(g_1) = \mathbf{Z}_{\alpha_1-\alpha_2}\nabla\tilde U(g_2), \quad \text{with } \mathbf{Z}_\alpha = \mathbf{R}_\alpha \oplus (1) \oplus \mathbf{R}_\alpha \oplus (1), \quad (29)$$
Here $\mathbf{Z}_{\alpha_1-\alpha_2} \in \mathbb{R}^{6\times 6}$ "converts" the left-invariant gradient at $g_2$ to the left-invariant gradient at $g_1$. The rotation matrix $\mathbf{R}_\alpha$ is given by $\mathbf{R}_\alpha = \begin{pmatrix}\cos\alpha & -\sin\alpha\\ \sin\alpha & \cos\alpha\end{pmatrix}$, and the symbol "⊕" denotes the direct sum of matrices.
3.2
Estimation of Tangent Vectors in $\mathbb{R}^3 \rtimes S^2$
The exponential curves of SE(3) are found by (expressed in matrix form)
$$\gamma_{\mathbf{c}}(t) = \exp\Big(t\sum_{j=1}^{6} c^j X_j\Big), \quad (30)$$
where $\mathbf{c} = (c^1, c^2, \ldots, c^6)$ denotes the SE(3)-tangent vector components. These are elements of the tangent space at the unity element, $\sum_{j=1}^{6} c^j A_j|_e \in T_e(SE(3))$, where we use the isomorphism between the Lie algebra and the left-invariant vector fields at the unity element, i.e. $A_j|_e \leftrightarrow X_j$. We aim to estimate the locally best fitting exponential curve (in the previous subsection) at each position in SE(3). Therefore, we formulate a minimization problem over the "iso-contours" of the left-invariant gradient vector at position g, leading to the optimal tangent vector $\mathbf{c}^*$:
$$\mathbf{c}^*(g) = \arg\min_{\mathbf{c}(g)}\left\{ \left\| \frac{d}{dt}\big(\nabla U(g\,\gamma_{\mathbf{c}(g)}(t))\big)\Big|_{t=0} \right\|_\mu^2 \;\middle|\; \|\mathbf{c}(g)\|_\mu = 1 \right\}, \quad (31)$$
Here $\|\cdot\|_\mu$ denotes the norm on a vector in the tangent space $T_g(SE(3))$ (i.e. the norm on the right side), resp. on a covector in the dual tangent space $T_g^*(SE(3))$. The norm on vectors is defined by $\|\mathbf{c}\|_\mu = \sqrt{(\mathbf{c}, \mathbf{c})_\mu}$ with the inner product $(\mathbf{c}, \mathbf{c})_\mu = \mu^2\sum_{j=1}^{3} c^j c^j + \sum_{j=4}^{6} c^j c^j$, where the parameter μ ensures that all components of the inner product are dimensionless. The value of the parameter determines how distance in the spatial dimensions relates to distance in the
orientation dimensions. After some elementary math, we find that (31) can be expressed as
$$(M_\mu H_U M_\mu)^T(M_\mu H_U M_\mu)\,\tilde{\mathbf c}^* = \lambda\,\tilde{\mathbf c}^*, \qquad (32)$$
where $M_\mu = \mathrm{diag}(1/\mu, 1/\mu, 1/\mu, 1, 1, 1)$ and $\tilde{\mathbf c}^* = M_\mu^{-1}\mathbf c^*$. This amounts to eigensystem analysis of the symmetric 6×6 matrix $(M_\mu H_U M_\mu)^T(M_\mu H_U M_\mu)$, where one of the three eigenvectors gives $\tilde{\mathbf c}^*$. The eigenvector with the smallest corresponding eigenvalue is selected as tangent vector $\tilde{\mathbf c}^*$, and the desired tangent vector $\mathbf c^*$ is then given by $\mathbf c^* = M_\mu\tilde{\mathbf c}^*$. Once the local tangent vector is found, it is of interest to obtain a measure of orientation confidence, which can be used for controlling the anisotropy factor of an adaptive diffusion process, as described for 2D in [2, 3]. Such a measure can be obtained by calculating the Laplacian in the five-dimensional (considering the full SE(3)) hyperplane orthogonal to the estimated tangent vector.
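The eigenvector selection in (32) can be sketched with NumPy as below. The sketch assumes that the 6×6 left-invariant Hessian $H_U$ at a point g is already available as an array `H` (how it is computed is outside this illustration); the function names are hypothetical. Because $\tilde{\mathbf c}^*$ is a unit Euclidean eigenvector, the returned $\mathbf c^* = M_\mu\tilde{\mathbf c}^*$ automatically satisfies the constraint $\|\mathbf c^*\|_\mu = 1$ of (31).

```python
import numpy as np


def mu_norm(c, mu):
    """||c||_mu, with (c, c)_mu = mu^2 (c1^2 + c2^2 + c3^2) + c4^2 + c5^2 + c6^2."""
    c = np.asarray(c, dtype=float)
    return float(np.sqrt(mu**2 * np.sum(c[:3] ** 2) + np.sum(c[3:] ** 2)))


def best_tangent_vector(H, mu):
    """Eq. (32): eigenvector of (M H M)^T (M H M) with the smallest eigenvalue, mapped back by M."""
    M = np.diag([1.0 / mu] * 3 + [1.0] * 3)
    B = M @ np.asarray(H, dtype=float) @ M
    S = B.T @ B                          # symmetric positive semi-definite 6x6 matrix
    evals, evecs = np.linalg.eigh(S)     # eigenvalues in ascending order
    c_tilde = evecs[:, 0]                # unit eigenvector of the smallest eigenvalue
    return M @ c_tilde, float(evals[0])  # c* = M_mu c~*, which has unit mu-norm
```

Using `eigh` rather than a general eigensolver exploits the symmetry of the matrix and returns the eigenvalues sorted, so the smallest one is simply the first column.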
4 Diffusion on 3D Orientation Scores
The general left-invariant diffusion equation on SE(3) is given by
$$\begin{cases}\partial_t W(g,t) = \nabla\cdot D\nabla W(g,t) = \displaystyle\sum_{i=1}^{6}\sum_{j=1}^{6} A_i D_{ij} A_j W(g,t),\\[4pt] W(g,0) = U(g),\end{cases} \qquad (33)$$
where $W(\cdot, t)$ represents the diffused orientation score at time t. This equation generates the diffusion scale space on the 3D Euclidean motion group SE(3). Next, we derive which types of diffusion on SE(3) preserve the α-right-invariance of an α-right-invariant input function $\tilde W(g, 0) = \tilde U(g)$. In that case, using (29), the right-hand side of (33) becomes
$$\nabla\cdot D(g_1)\nabla\tilde W(g_1) = \nabla\cdot Z^T_{\alpha_1-\alpha_2}D(g_1)Z_{\alpha_1-\alpha_2}\nabla\tilde W(g_2) = \nabla\cdot D(g_2)\nabla\tilde W(g_2), \qquad (34)$$
which shows that the diffusion is only valid (i.e., α-right-invariance-preserving) if
$$D(g_1) = Z_{\alpha_1-\alpha_2}D(g_2)Z^T_{\alpha_1-\alpha_2}, \qquad\text{for all } g_1\sim g_2. \qquad (35)$$
Next, we separately consider constant diffusion (the diffusion tensor D is constant for all $g\in SE(3)$) and adaptive diffusion (the diffusion tensor D varies).
Linear and Constant Diffusion: Suppose D is an arbitrary diffusion tensor, which is not necessarily valid. One can always make it valid by taking the α-marginal to remove the dependency on α, i.e.
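$$\begin{aligned}\frac{1}{2\pi}\int_0^{2\pi}\nabla\cdot D\nabla\tilde W(g,t)\,d\alpha &= \frac{1}{2\pi}\int_0^{2\pi}\nabla\cdot Z^T_{\alpha-\alpha_0}DZ_{\alpha-\alpha_0}\nabla\tilde W(g_0,t)\,d\alpha\\ &= \nabla\cdot\Bigl(\frac{1}{2\pi}\int_0^{2\pi}Z^T_{\alpha-\alpha_0}DZ_{\alpha-\alpha_0}\,d\alpha\Bigr)\nabla\tilde W(g_0,t) = \nabla\cdot\tilde D\nabla\tilde W(g_0,t),\end{aligned} \qquad (36)$$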
with $\tilde D = \frac{1}{2\pi}\int_0^{2\pi}Z_\alpha^T D Z_\alpha\,d\alpha$, and where $g = (\mathbf x, R_{(\alpha,\beta,\gamma)})$ and $g_0 = (\mathbf x, R_{(\alpha_0,\beta,\gamma)})$. So by considering only diffusion tensors $\tilde D$, α-right-invariance is preserved. All diffusion tensors $\tilde D$ have the form $\tilde D = \mathrm{diag}(A, A, B, C, C, 0)$ (where the sixth value is irrelevant since $A_6\tilde U = 0$). This corresponds to horizontal, zero-curvature and zero-torsion linear diffusion.
Adaptive Diffusion: In the case of adaptive diffusion, both linear and nonlinear, the diffusion above with adaptive A, B, and C is valid as well, since the derivation in (36) can also be applied to an adaptive D. Furthermore, adaptive diffusion with diffusion tensor $D(g) = \mathbf c(g)\mathbf c(g)^T$, which can be interpreted as a diffusion process that only diffuses tangent to an exponential curve at each position $g\in SE(3)$ with tangent vector $\mathbf c(g)$, is a valid diffusion as well. This can easily be seen by observing that $\mathbf c(g_1) = Z_{\alpha_1-\alpha_2}\mathbf c(g_2)$ for $g_1\sim g_2$. This yields for the diffusion tensor D
$$D(g_1) = \bigl(Z_{\alpha_1-\alpha_2}\mathbf c(g_2)\bigr)\bigl(Z_{\alpha_1-\alpha_2}\mathbf c(g_2)\bigr)^T = Z_{\alpha_1-\alpha_2}\mathbf c(g_2)\mathbf c(g_2)^TZ^T_{\alpha_1-\alpha_2}, \qquad (37)$$
which matches requirement (35). Furthermore, the sum $D_1 + D_2$ of two valid diffusion tensors is again a valid diffusion tensor, since
$$D_1(g_1) + D_2(g_1) = Z_{\alpha_1-\alpha_2}D_1(g_2)Z^T_{\alpha_1-\alpha_2} + Z_{\alpha_1-\alpha_2}D_2(g_2)Z^T_{\alpha_1-\alpha_2} = Z_{\alpha_1-\alpha_2}\bigl(D_1(g_2)+D_2(g_2)\bigr)Z^T_{\alpha_1-\alpha_2}. \qquad (38)$$
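The α-marginal of (36) and the validity conditions (35), (37) and (38) can be checked numerically. The sketch below builds $Z_\alpha$ from (29) and averages $Z_\alpha^T D Z_\alpha$ with a uniform quadrature rule, which is exact here because the integrand is a trigonometric polynomial of degree 2 in α. For simplicity we restrict the illustration to a diagonal input D (our choice). The result then has the form diag(A, A, B, C, C, ·) with $A = (d_1+d_2)/2$ and $C = (d_4+d_5)/2$, and it satisfies (35) for every α.

```python
import numpy as np


def Z(alpha):
    """Z_alpha = R_alpha ⊕ (1) ⊕ R_alpha ⊕ (1), cf. (29)."""
    c, s = np.cos(alpha), np.sin(alpha)
    R = np.array([[c, -s], [s, c]])
    out = np.eye(6)
    out[0:2, 0:2] = R
    out[3:5, 3:5] = R
    return out


def alpha_marginal(D, n_quad=16):
    """D_tilde = (1/2pi) int_0^{2pi} Z_a^T D Z_a da, via a uniform quadrature rule."""
    alphas = 2.0 * np.pi * np.arange(n_quad) / n_quad
    return sum(Z(a).T @ D @ Z(a) for a in alphas) / n_quad


def satisfies_35(D1, D2, alpha, tol=1e-12):
    """Check the validity condition D(g1) = Z D(g2) Z^T for g1 ~ g2 differing by alpha."""
    return np.allclose(D1, Z(alpha) @ D2 @ Z(alpha).T, atol=tol)
```

For instance, `alpha_marginal(np.diag([1, 3, 5, 0.2, 0.6, 7]))` gives `diag(2, 2, 5, 0.4, 0.4, 7)`, which passes `satisfies_35` for any α, whereas the original diagonal tensor does not.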
Therefore, in an adaptive setting one can also use a mixture of spatially isotropic diffusion and diffusion along estimated exponential curves, i.e.
$$D(\mathbf c, D_a) = (1-D_a)\frac{\mu^2}{\|\mathbf c\|_\mu^2}\,\mathbf c\,\mathbf c^T + D_a\,\mathrm{diag}(1, 1, 1, \mu^2, \mu^2, \mu^2), \qquad (39)$$
where Da is the anisotropy factor. Both Da and c are made dependent on the local structure in the orientation score. This diffusion process is analogous to the nonlinear curvature-adaptive diffusion process on 2D orientation scores that we have proposed in [2, 3].
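A direct transcription of the mixture tensor (39) is sketched below, assuming $\mathbf c$ is given as a length-6 NumPy array of left-invariant components and $0\le D_a\le 1$; the function names are hypothetical. The check at the end illustrates why the mixture is valid in the sense of (35): rotating $\mathbf c$ by $Z_\alpha$, as in (37), rotates $D(\mathbf c, D_a)$ accordingly, because $Z_\alpha$ preserves the μ-norm and commutes with $\mathrm{diag}(1,1,1,\mu^2,\mu^2,\mu^2)$.

```python
import numpy as np


def Z(alpha):
    """Z_alpha = R_alpha ⊕ (1) ⊕ R_alpha ⊕ (1), cf. (29)."""
    c, s = np.cos(alpha), np.sin(alpha)
    R = np.array([[c, -s], [s, c]])
    Zm = np.eye(6)
    Zm[0:2, 0:2] = R
    Zm[3:5, 3:5] = R
    return Zm


def mu_norm_sq(c, mu):
    """||c||_mu^2 = mu^2 (c1^2 + c2^2 + c3^2) + c4^2 + c5^2 + c6^2."""
    return mu**2 * np.sum(c[:3] ** 2) + np.sum(c[3:] ** 2)


def mixture_tensor(c, Da, mu):
    """Eq. (39): (1 - Da) mu^2 / ||c||_mu^2 c c^T + Da diag(1, 1, 1, mu^2, mu^2, mu^2)."""
    c = np.asarray(c, dtype=float)
    iso = np.diag([1.0, 1.0, 1.0, mu**2, mu**2, mu**2])
    return (1.0 - Da) * mu**2 / mu_norm_sq(c, mu) * np.outer(c, c) + Da * iso


c = np.array([0.3, -1.2, 0.8, 0.5, 0.1, -0.4])
D = mixture_tensor(c, 0.25, 0.7)
print(np.allclose(mixture_tensor(Z(0.9) @ c, 0.25, 0.7), Z(0.9) @ D @ Z(0.9).T))
```

At $D_a = 1$ the tensor reduces to the isotropic part, and at $D_a = 0$ it is the rank-one tensor of diffusion purely along $\mathbf c$.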
5 Results
We implemented linear, left-invariant and α-right-invariance-preserving diffusion on 3D orientation scores with $\tilde D = \mathrm{diag}(A, A, B, C, C, 0)$ using an explicit numerical scheme. The time derivative is approximated by a first order forward finite difference. Spatially, we use second order centered finite differences for $\partial_x^2$, $\partial_y^2$, and $\partial_z^2$. In the orientation dimensions we calculate the Laplace operator on the sphere $\Delta_{S^2}$ via the spherical harmonic transform, where for stability a small regularization is applied in the spherical harmonic domain as well [11]. In Figure 2 we show a result of the linear SE(3)-diffusion process. In these examples an artificial three-dimensional HARDI dataset is created, to which Rician noise [12] is added. Next, we apply two different SE(3)-diffusions on
(a) t = 0, no noise
(b) t = 0, noisy
(c) t = 1, μ-isotropic, no noise
(d) t = 1, μ-isotropic, noisy
(e) t = 1, anisotropic, no noise
(f) t = 1, anisotropic, noisy
Fig. 2. Result of $\mathbb{R}^3\rtimes S^2$-diffusion on an artificial HARDI dataset of two crossing lines, one of which is curved, with and without added Rician noise with σ = 0.17 (signal amplitude 1). Image size: 10×10×10 spatial and 162 orientations. Parameters of the μ-isotropic diffusion process: A = B = 1, C = 0.01. Parameters of the anisotropic diffusion process: A = 0.01, B = 1, C = $10^{-4}$.
both the noise-free and the noisy dataset. To visualize the result we use an experimental version of the DTI tool, which can visualize HARDI glyphs (recall Figure 1(a)) using the Q-ball visualization method [7]. In the results, all glyphs are scaled equivalently. The μ-isotropic diffusion does not preserve the anisotropy of the glyphs well; especially in the noisy case we obtain almost isotropic glyphs. With anisotropic diffusion, the anisotropy of the HARDI glyphs is preserved much better, and in the noisy case the noise is clearly reduced. The resulting glyphs are, however, less directed than in the noise-free input image. This would improve when using nonlinear diffusion, or when adding some sort of "thinning" step to the method.
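To make the explicit scheme concrete, the sketch below performs one forward-Euler step of the spatial part only, for a single fixed orientation $\mathbf n$. Under the convention that $A_3$ differentiates along $\mathbf n = R\mathbf e_z$, the spatial part of the generator for $\tilde D = \mathrm{diag}(A,A,B,C,C,0)$ is $A(A_1^2+A_2^2)+BA_3^2 = A\Delta + (B-A)\partial_{\mathbf n}^2$ with $\partial_{\mathbf n}^2 W = \mathbf n^T(\nabla^2 W)\mathbf n$, since $A_1^2+A_2^2+A_3^2 = \Delta$. The orientation part $C\Delta_{S^2}$ (computed via spherical harmonics in the paper) is omitted, and periodic boundaries with unit grid spacing are our simplifying assumptions. The time step must be kept small; for A = B the usual 3D bound $dt \le h^2/(6A)$ applies.

```python
import numpy as np


def d2(W, axis):
    """Second-order centered second difference along one axis (periodic, h = 1)."""
    return np.roll(W, -1, axis) - 2.0 * W + np.roll(W, 1, axis)


def dmixed(W, a, b):
    """Centered mixed difference d^2 W / (dx_a dx_b) for a != b (periodic, h = 1)."""
    pp = np.roll(np.roll(W, -1, a), -1, b)
    pm = np.roll(np.roll(W, -1, a), 1, b)
    mp = np.roll(np.roll(W, 1, a), -1, b)
    mm = np.roll(np.roll(W, 1, a), 1, b)
    return (pp - pm - mp + mm) / 4.0


def spatial_step(W, n, A, B, dt):
    """One forward-Euler step of dW/dt = A * Laplacian(W) + (B - A) * n^T Hess(W) n."""
    n = np.asarray(n, dtype=float) / np.linalg.norm(n)
    lap = d2(W, 0) + d2(W, 1) + d2(W, 2)
    dnn = sum(n[a] * n[a] * d2(W, a) for a in range(3))
    dnn = dnn + sum(2.0 * n[a] * n[b] * dmixed(W, a, b)
                    for a in range(3) for b in range(a + 1, 3))
    return W + dt * (A * lap + (B - A) * dnn)
```

With A = 0 and B > 0 the step diffuses only along $\mathbf n$, so a field that varies only perpendicular to $\mathbf n$ is left unchanged, while one that varies along $\mathbf n$ is smoothed.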
6 Conclusions
In this paper we have shown that we can map all techniques of our previous work on 2D orientation scores to the more complicated case of 3D orientation scores. Some issues require special attention. In particular, the fact that we usually have to deal with the coset space $SE(3)/(\{0\}\times\mathrm{stab}(\mathbf e_z)) \cong \mathbb{R}^3\rtimes S^2$ has been emphasized as an important issue. We have shown that we can consider functions $\mathbb{R}^3\rtimes S^2 \to \mathbb{C}$ as functions on SE(3) which are α-right-invariant. The required preservation of α-right-invariance imposes additional constraints on the SE(3)-convolution kernel and on the allowed types of (non)linear diffusion. The results suggest that even anisotropic linear diffusion on SE(3) is a useful way to denoise HARDI data. Future work should include the implementation and evaluation of nonlinear SE(3)-diffusion.
References
1. Weickert, J.A.: Coherence-enhancing diffusion filtering. International Journal of Computer Vision 31(2/3), 111–127 (1999)
2. Franken, E., Duits, R., ter Haar Romeny, B.M.: Nonlinear diffusion on the 2D Euclidean motion group. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 461–472. Springer, Heidelberg (2007)
3. Franken, E., Duits, R.: Crossing-preserving coherence-enhancing diffusion on invertible orientation scores. International Journal of Computer Vision (IJCV) (to appear, 2009)
4. Özarslan, E., Mareci, T.H.: Generalized diffusion tensor imaging and analytical relationships between diffusion tensor imaging and high angular resolution imaging. Magnetic Resonance in Medicine 50, 955–965 (2003)
5. Franken, E.: Enhancement of Crossing Elongated Structures in Images. PhD thesis, Eindhoven University of Technology, Department of Biomedical Engineering, Eindhoven, The Netherlands (2008)
6. Özarslan, E., Shepherd, T.M., Vemuri, B.C., Blackband, S.J., Mareci, T.H.: Resolution of complex tissue microarchitecture using the diffusion orientation transform (DOT). NeuroImage 31, 1086–1103 (2006)
7. Descoteaux, M., Angelino, E., Fitzgibbons, S., Deriche, R.: Regularized, fast, and robust analytical Q-ball imaging. Magnetic Resonance in Medicine 58(3), 497–510 (2007)
8. Jian, B., Vemuri, B.C., Özarslan, E., Carney, P.R., Mareci, T.H.: A novel tensor distribution model for the diffusion-weighted MR signal. NeuroImage 37, 164–176 (2007)
9. Florack, L.: Codomain scale space and regularization for high angular resolution diffusion imaging. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2008. CVPR Workshops 2008, June 2008, pp. 1–6 (2008)
10. Varadarajan, V.: Lie groups, Lie algebras, and their representations. Prentice-Hall, Englewood Cliffs (1974)
11. Kin, G., Sato, M.: Scale space filtering on spherical pattern. In: Proc. 11th International Conference on Pattern Recognition, vol. C, pp. 638–641 (1992)
12. Macovski, A.: Noise in MRI. Magnetic Resonance in Medicine 36(3), 494–497 (1996)
On the Rate of Structural Change in Scale Spaces
David Gustavsson, Kim S. Pedersen, Francois Lauze, and Mads Nielsen
DIKU, University of Copenhagen
Universitetsparken 1, DK-2100 Copenhagen Ø, Denmark
{davidg,kimstp,francois,madsn}@diku.dk
Abstract. We analyze the rate at which image details are suppressed as a function of the regularization parameter, using first order Tikhonov regularization, linear Gaussian scale space and Total Variation image decomposition. The squared L²-norms of the regularized solution and of the residual are studied as functions of the regularization parameter. For first order Tikhonov regularization it is shown that the norm of the regularized solution is a convex function, while the norm of the residual is not a concave function. The same result holds for Gaussian scale space when the parameter is the variance of the Gaussian, but may fail when the parameter is the standard deviation. Essentially, this implies that the norm of the regularized solution cannot be used for global scale selection because it does not contain enough information. An empirical study based on synthetic images as well as a database of natural images confirms that the squared residual norms contain important scale information.
Keywords: Regularization, Tikhonov Regularization, Scale Space, TV, Total Variation, Geometric Structure, Texture.
X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 832–843, 2009.
© Springer-Verlag Berlin Heidelberg 2009

1 Introduction
Images contain a mix of different types of information, from fine-scale stochastic textures to large-scale geometric structures. Image regularization can be viewed as approximating the observed original image with a simpler image, where "simpler" is defined by the regularization (prior) term and the regularization parameter λ. Here an image is considered to be simpler if it is smoother (or piecewise smoother). Regularization can also be viewed as decomposing the observed image into a regularized (smooth) component and a small-scale texture/noise component (called the residual, because it is the difference between the observed image and the regularized solution). By increasing the regularization parameter λ, smoother and smoother approximations are generated. The rate at which image details are suppressed as a function of the regularization parameter depends on the image content and on the regularization method. The image residual contains the details that are suppressed during regularization, and the norm of the residual measures the amount of detail that is suppressed. The norm of the residual as a function of the regularization parameter gives important information about the image content. For images containing small-scale structures, a lot of detail is suppressed even for small λ, and the norm of the residual will be large for small λ. For images containing solely large-scale geometric structures, few details will be suppressed for small λ and
the norm of the residual will be small. The rate at which details are suppressed can be viewed as the derivative of the norm of the residual with respect to the regularization parameter; it reveals the amount of detail that is suppressed when the regularization parameter increases. First order Tikhonov regularization, linear Gaussian scale space (which is equivalent to infinite order Tikhonov regularization [1]) and Total Variation image decomposition are studied. The squared L²-norms of the regularized solution and of the residual are studied as functions of the regularization parameter. Of special interest is the convexity/concavity of these functions, because it determines whether the rate at which details are suppressed can increase or decrease. In Section 2, first order Tikhonov regularization is revisited, and it is shown that the norm of the regularized solution is a convex function, while the norm of the residual is not a concave function. In Section 3, linear Gaussian scale space is revisited, and it is shown that the norm of the regularized solution is convex as a function of the Gaussian variance, or equivalently of the diffusion time, but may fail to be convex when the parameter is the Gaussian standard deviation. The squared norm of the residual is in general not a concave function of its parameter. In Section 4, Total Variation (TV) image decomposition is revisited. In Section 5, experimental results are presented: the residual norms of the sinc function, of a synthetic image containing image structures at different scales, and of natural images are studied. These studies tend to show that the squared residual norm contains scale information, particularly at values where the local convexity/concavity behavior changes.

1.1 Related Work
Characterization of images by analyzing the behavior of the norms of the regularized solution and of the residual as functions of the regularization parameter has not received much research attention.
Sporring and Weickert [2, 3] view images as distributions of light quanta and use information theory to study the structure of images in scale space. The entropy of an image as a function of scale is analyzed and shown to be an increasing function of scale. The result holds both for linear Gaussian scale space and for non-linear scale space. Furthermore, the derivative of the entropy with respect to scale is shown, empirically, to be a good texture descriptor. The derivative of the scale-space entropy function with respect to scale is a global measure of how much the entropy of an image changes at different scales. Whereas Sporring and Weickert study monotone functions of images across scale, we study norms of the scale-space image and of the residual. Buades et al. [4] introduced the concept of method noise in denoising. The method noise consists of the image details that are removed by the denoising, i.e. the residual image, and its content is used for comparing denoising methods. The residual image has often been used to determine the optimal regularization parameter (see Thompson et al. [5] for a classical study). Selection of the optimal stopping time for diffusion filters was studied by Mrazek and Navara [6], which also relates to the Lyapunov functionals studied by Weickert [7].
1.2 Convexity, Fourier Transforms, Power Spectra
Recall that a function f defined on a convex set C is convex if $f(\lambda x + (1-\lambda)y) \le \lambda f(x) + (1-\lambda)f(y)$ for all $0\le\lambda\le1$ and all $x, y\in C$. If f is convex on C, then −f is said to be concave on C. When f is twice differentiable, a necessary and sufficient condition for convexity is
$$\forall x\in C,\quad f''(x)\ge 0 \qquad (1)$$
(in the multidimensional case, the Hessian matrix is positive semi-definite). Two elementary facts will be used in the sequel:
1) Let h(λ) be a function of the form
$$h(\lambda) = \int d(\lambda, x)\,s(x)\,dx, \qquad (2)$$
where d(λ, x) is convex in λ and s(x) ≥ 0; then h(λ) is convex.
2) Assume that f(x) = h(g(x)), where $g:\mathbb{R}^n\to\mathbb{R}^k$ and $h:\mathbb{R}^k\to\mathbb{R}$. Then
– if h is convex and non-decreasing and g is convex, then f is convex,
– if h is convex and non-increasing and g is concave, then f is convex.
The Fourier transform of a function f is denoted by $\hat f$. Parseval's theorem asserts that this is an isometry of $L^2$: $\|f\|_{L^2} = \|\hat f\|_{L^2}$, where
$$\|f\|_2^2 = \int |f(x,y)|^2\,dx\,dy. \qquad (3)$$
The frequency-domain variables are denoted $(\omega_x, \omega_y) =: \omega$. The power spectrum of a function f is the function $\omega\mapsto|\hat f(\omega)|^2$. f is said to follow an (α-)power law if $|\hat f(\omega)|^2 \sim C/|\omega|^\alpha$, where C and α are constants. It is well known that the power spectra computed over a large ensemble of natural images approximate a power law in spatial frequency with α around 1.7, or at least in (0, 2) [8, 9]. We often implicitly use the following classical result from calculus. Let $B := B(0,1)$ be the unit ball of $\mathbb{R}^n$ and $B^c$ its complement. Let g be a positive function defined on $\mathbb{R}^n$, and assume that $g\sim\|x\|^{-\alpha}$ in B (resp. $B^c$). Then $\int_B g\,dx < \infty$ if and only if α < n (resp. $\int_{B^c} g\,dx < \infty$ if and only if α > n). Finally, given a regularization, the functions s(λ) and r(λ) denote the squared $L^2$-norms of, respectively, the regularized solution and the residual, as functions of the regularization parameter λ.
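As an illustration of the power-law model, the sketch below synthesizes a random image whose discrete power spectrum follows $|\hat f(\omega)|^2 \propto |\omega|^{-\alpha}$ exactly (white noise shaped in the discrete Fourier domain) and then recovers α from a log-log fit of the radially averaged power spectrum. The synthesis procedure, the image size and the fitting range are our own illustrative choices, not the protocol of [8, 9].

```python
import numpy as np


def power_law_image(n, alpha, seed=0):
    """Real n x n image whose DFT power is |white noise|^2 * |omega|^(-alpha)."""
    rng = np.random.default_rng(seed)
    noise_hat = np.fft.fft2(rng.standard_normal((n, n)))
    k = np.fft.fftfreq(n)
    kr = np.hypot(k[:, None], k[None, :])
    filt = np.zeros_like(kr)
    filt[kr > 0] = kr[kr > 0] ** (-alpha / 2.0)   # amplitude ~ |omega|^(-alpha/2); DC set to 0
    return np.real(np.fft.ifft2(noise_hat * filt))


def estimate_alpha(img, r_min=4, r_max=None):
    """Fit log(radially averaged power) = const - alpha * log(radius)."""
    n = img.shape[0]
    power = np.abs(np.fft.fft2(img)) ** 2
    k = np.fft.fftfreq(n) * n                      # integer frequency indices
    r = np.rint(np.hypot(k[:, None], k[None, :])).astype(int)
    r_max = r_max or n // 2 - 1
    radii = np.arange(r_min, r_max + 1)
    mean_power = np.array([power[r == rad].mean() for rad in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(mean_power), 1)
    return -slope


print(estimate_alpha(power_law_image(256, 1.7)))
```

Averaging the power within each ring before taking the logarithm reduces the strong per-coefficient fluctuations of a single realization; the slope of the fit is insensitive to the constant C.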
2 Tikhonov Regularization
The first order Tikhonov regularization is defined as the minimizer of the energy functional
$$E_\lambda[f] = \int (f-g)^2 + \lambda|\nabla f|^2\,dx\,dy \qquad (4)$$
where g is the observed data and λ is the regularization parameter. The energy functional is composed of two terms: the data fidelity term $\|f-g\|_2^2$ and the regularization term $\|\nabla f\|_2^2$. Note that the Wiener filter can be regarded as a Tikhonov regularization method applied in the Fourier domain. Thanks to Parseval's theorem, all calculations can be performed in the Fourier domain, where this energy becomes
$$E_\lambda[\hat f] = \int |\hat f-\hat g|^2 + \lambda\,(\omega_x^2+\omega_y^2)\,|\hat f|^2\,d\omega_x\,d\omega_y. \qquad (5)$$
Using the calculus of variations, a necessary condition for a function f to minimize the functional (4) is given by its Euler–Lagrange equation $(f-g) - \lambda\Delta f = 0$. In the Fourier domain, it becomes $\hat f - \hat g + \lambda(\omega_x^2+\omega_y^2)\hat f = 0$, i.e.
$$\hat f = \frac{\hat g}{1+\lambda|\omega|^2}, \qquad (6)$$
that is, the original signal multiplied by the filter function $F(\lambda,\omega) = \frac{1}{1+\lambda|\omega|^2}$, which is a non-increasing convex function of λ (for λ ≥ 0). Set $d(\lambda,\omega) = F(\lambda,\omega)^2$. It is important to remark that defining the regularization in the frequency domain by $\lambda\mapsto F(\lambda,\omega)\hat g(\omega)$ extends Tikhonov regularization beyond the case $g\in W^{1,2}(\mathbb{R}^2)$, the Sobolev space of $L^2$ functions with $L^2$ weak derivatives, which is the natural space for Tikhonov regularization as defined by minimization of (4). Indeed, the corresponding function s(λ) is given by
$$s(\lambda) = \|F(\lambda,\cdot)\hat g\|_2^2 = \int d(\lambda,\omega)\,|\hat g(\omega)|^2\,d\omega. \qquad (7)$$
This is the integral of the squared filter function times the power spectrum of the original signal g, and we have the following result:
Proposition 1. The squared $L^2$-norm s(λ) of the minimizer of the Tikhonov regularization functional, as a function of the regularization parameter λ, is, for non-trivial images, a monotonically decreasing convex function (for λ ∈ (0, ∞)), when it exists. If g follows an α-power law, then, from the calculus fact recalled in the previous section, $g\notin L^2(\mathbb{R}^2)$; however, s(λ), s′(λ) and s″(λ) exist and are finite for λ > 0 if and only if α ∈ (0, 2) (which is the case for natural images). Both s′ and s″ diverge for λ → 0⁺.
The square of a non-negative, non-increasing convex function is a convex function, and from Section 1.2 we have the first part of the proposition. Now
$$d_\lambda(\lambda,\omega) = -\frac{2|\omega|^2}{(1+\lambda|\omega|^2)^3}, \qquad d_{\lambda\lambda}(\lambda,\omega) = \frac{6|\omega|^4}{(1+\lambda|\omega|^2)^4}.$$
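$s'(\lambda) = \int d_\lambda(\lambda,\omega)\,|\hat g|^2\,d\omega$ and $s''(\lambda) = \int d_{\lambda\lambda}(\lambda,\omega)\,|\hat g|^2\,d\omega$, and the rest of the proposition follows by elementary analysis. Set $R(\lambda,\omega) = 1 - F(\lambda,\omega)$ and $e(\lambda,\omega) = R(\lambda,\omega)^2$. The Fourier transform of the residual is $R(\lambda,\cdot)\hat g$ and its squared norm is
$$r(\lambda) = \|R(\lambda,\cdot)\hat g\|_2^2 = \int e(\lambda,\omega)\,|\hat g(\omega)|^2\,d\omega.$$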
An elementary calculation gives $e_\lambda(\lambda,\omega) = 2\lambda|\omega|^4/(1+\lambda|\omega|^2)^3$; for fixed λ this function is bounded in ω, while it satisfies
$$\forall\omega,\qquad \lim_{\lambda\to0^+} e_\lambda(\lambda,\omega) = 0, \qquad \lim_{\lambda\to\infty} e_\lambda(\lambda,\omega) = 0.$$
The same holds for r′(λ) when it is finite; therefore, by the mean value theorem, since r′ is positive, it must have a maximum, and r″(λ) must change sign. We can thus state the following:
Proposition 2. Assume first that $g\in W^{1,2}(\mathbb{R}^2)$ is non-trivial. Then, although s(λ) is convex and decreasing, the squared residual norm r(λ) of Tikhonov regularization, while increasing from 0 to $\|g\|_2^2$, is neither concave nor convex.
Note that when g follows an α-power law with α ∈ (0, 2), $g\notin L^2(\mathbb{R}^2)$ while its regularization $g_\lambda$ is in $L^2(\mathbb{R}^2)$ when λ > 0; thus $g - g_\lambda\notin L^2(\mathbb{R}^2)$ and $r(\lambda) = \|g - g_\lambda\|_2^2 = +\infty$.
3 Linear Scale-Space and Regularization
Linear scale-space theory [10, 11, 12] deals with simplified coarse-scale representations of an image g, generated by solving the diffusion (heat) equation with initial value g:
$$\frac{\partial f}{\partial t} = \Delta f, \qquad f(\cdot, 0) = g(\cdot), \qquad (8)$$
where $\Delta = \partial_{xx} + \partial_{yy}$ is the Laplacian. Equivalently, this coarse-scale representation can be obtained by convolution with a Gaussian kernel:
$$f_\sigma = g * G_\sigma, \qquad G_\sigma(x,y) = \frac{1}{2\pi\sigma^2}e^{-\frac{x^2+y^2}{2\sigma^2}}, \qquad (9)$$
and the link between the two formulations is given by $f_\sigma = f(\cdot, \sigma^2/2)$. A third formulation of linear scale space is obtained as "infinite order" Tikhonov regularization; the 1-dimensional case was introduced by Nielsen et al. in [1]. In dimension 2, one defines, for λ > 0,
$$E[f] = \int (f-g)^2\,dx\,dy + \sum_{k=1}^{\infty}\frac{\lambda^k}{k!}\sum_{\ell=0}^{k}\binom{k}{\ell}\int\Bigl(\frac{\partial^k f}{\partial x^\ell\,\partial y^{k-\ell}}\Bigr)^2dx\,dy, \qquad (10)$$
where $\binom{k}{\ell}$ is the binomial coefficient. By a direct computation, its associated Euler–Lagrange equation is given by
$$f - g + \sum_{k=1}^{\infty}\frac{(-1)^k\lambda^k}{k!}\Delta^k f = 0,$$
where $\Delta^k$ is the k-th iterated Laplacian
$$\Delta^k = \underbrace{\Delta\circ\cdots\circ\Delta}_{k\ \text{times}} = \sum_{\ell=0}^{k}\binom{k}{\ell}\frac{\partial^{2k}}{\partial x^{2\ell}\,\partial y^{2(k-\ell)}}.$$
Via the Fourier transform, the Laplacian operator becomes multiplication by $-|\omega|^2$, and, as in first order Tikhonov regularization, the solution is given by filtering:
$$\hat f = \frac{\hat g}{1+\sum_{k=1}^{\infty}\frac{\lambda^k|\omega|^{2k}}{k!}} = e^{-\lambda|\omega|^2}\hat g. \qquad (11)$$
The solution of the filtering problem for a given λ > 0 is the same as the solution of (8) at t = λ. Setting $\lambda = \sigma^2/2$ and applying the convolution theorem to (9), one recovers the above equation. Using the Fourier formulation, the squared norm s(λ) of the solution of (11) and the squared residual norm r(λ) are given by
$$s(\lambda) = \|e^{-\lambda|\omega|^2}\hat g\|_2^2 = \int e^{-2\lambda|\omega|^2}|\hat g(\omega)|^2\,d\omega, \qquad r(\lambda) = \|(1-e^{-\lambda|\omega|^2})\hat g\|_2^2 = \int\bigl(1-e^{-\lambda|\omega|^2}\bigr)^2|\hat g(\omega)|^2\,d\omega.$$
If one defines $d(\lambda,\omega) = e^{-2\lambda|\omega|^2}$ and $e(\lambda,\omega) = \bigl(1-e^{-\lambda|\omega|^2}\bigr)^2$, they have, with respect to convexity/concavity, the same properties as their Tikhonov counterparts defined in the previous section, and one can state the following, in terms of the heat equation / Gaussian variance.
Proposition 3.
1. The squared $L^2$-norm s(t) of the solution of the heat equation as a function of the diffusion "time" t (or, equivalently, of the convolution with the Gaussian kernel as a function of the kernel variance) is, for non-trivial images, a monotonically decreasing convex function (for t ∈ (0, ∞)), when it exists.
2. The squared residual norm r(t) of the solution of the heat equation at time t, while increasing from 0 to $\|g\|_2^2$, is neither concave nor convex.
If, instead of the diffusion time / variance, one uses the standard deviation σ of the Gaussian kernel as the parameter, the resulting squared norm function s(σ), although decreasing, may fail to be convex, as the function $\sigma\mapsto e^{-\sigma^2|\omega|^2}$ is not convex in σ (it is a half Gaussian bell). A simple example showing the failure of convexity is provided by the band-limited function b whose Fourier transform is $\hat b(\omega) = 1$ if $|\omega|\le1$ and $\hat b(\omega) = 0$ otherwise. A direct calculation gives
$$s(\sigma) = \pi\,\frac{1-e^{-\sigma^2}}{\sigma^2},$$
which is neither convex nor concave. On the other hand, for a function g following an α-power law with α < 2, s(σ) is convex: for $|\hat g(\omega)|^2 = |\omega|^{-\alpha}$ one finds $s(\sigma) = \pi\,\Gamma(1-\alpha/2)\,\sigma^{\alpha-2}$ (for instance, if α = 0, $s(\sigma) = \pi/\sigma^2$; if α = 1, $s(\sigma) = \pi^{3/2}/\sigma$). If, again, the power spectrum of the image g follows a power law in spatial frequency, then while its regularized $L^2$-norm is finite, the residual norm is not, as the initial datum is not square-integrable.
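The band-limited example can be checked numerically. For $|\hat b|^2 = \mathbf 1_{|\omega|\le1}$ one has $s(t) = \pi(1-e^{-2t})/(2t)$ in terms of the heat-equation time t. The sketch assumes the parameterization $t = c\,\sigma^2$ with c = 1/2 for the standard heat equation (8); the qualitative conclusion does not depend on the value of c > 0. Discrete second differences are positive everywhere in t, but change sign in σ.

```python
import numpy as np


def s_of_t(t):
    """s(t) = int_{|w| <= 1} exp(-2 t |w|^2) dw = pi (1 - exp(-2t)) / (2t)."""
    t = np.asarray(t, dtype=float)
    return np.pi * (-np.expm1(-2.0 * t)) / (2.0 * t)


def s_of_sigma(sigma, c=0.5):
    """The same quantity parameterized by the Gaussian standard deviation, t = c * sigma^2."""
    return s_of_t(c * np.asarray(sigma, dtype=float) ** 2)


t = np.linspace(0.01, 10.0, 2000)
sigma = np.linspace(0.01, 5.0, 2000)
curv_t = np.diff(s_of_t(t), 2)              # discrete second differences in t
curv_sigma = np.diff(s_of_sigma(sigma), 2)  # discrete second differences in sigma
print("convex in t:", bool(np.all(curv_t > 0)))
print("convex in sigma:", bool(np.all(curv_sigma > 0)),
      "| concave in sigma:", bool(np.all(curv_sigma < 0)))
```

Near σ = 0 the expansion $s(\sigma)\approx\pi(1-\sigma^2/2)$ is concave, while for large σ, $s(\sigma)\approx\pi/\sigma^2$ is convex, which is exactly the failure of convexity described above.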
4 Total Variation Image Decomposition
Bounded Variation image modeling was introduced in the seminal work of Rudin et al. in [13], where the following variational image denoising problem is considered. Given an image g and λ > 0, find the minimizer of the energy
$$E(f; g, \lambda) = \int (g-f)^2\,dx\,dy + \lambda\int|\nabla f|\,dx\,dy \qquad (12)$$
The regularized image $f_\lambda$ can be interpreted as a denoised version of g, but also as the "geometric" content of g, while the residual $\nu_\lambda = g - f_\lambda$ contains the "noise/fine texture" component. Several methods have been proposed to solve the above problem, for instance by solving a regularized form of the Euler–Lagrange equation of the functional,
$$f - g - \frac{\lambda}{2}\,\nabla\cdot\frac{\nabla f}{|\nabla f|} = 0,$$
where ∇· denotes the divergence operator, or by the nonlinear projection method of Chambolle [14], which we have used in this work. λ is a regularization parameter that determines the level of detail that ends up in the (noise/texture) component $\nu_\lambda$. As λ increases, $\nu_\lambda$ will contain details of larger and larger scale that will not appear in $f_\lambda$. Again, it is interesting to see how the image content changes as λ increases. The component $\nu_\lambda$ is the residual of the regularization and contains the details that are suppressed in the cartoon component $f_\lambda$, and we set
$$r(\lambda; g) = \|\nu_\lambda\|_2^2 = \|g - f_\lambda\|_2^2, \qquad (13)$$
i.e. the squared $L^2$-norm of the residual image as a function of the regularization parameter λ. Related to the norm of the residual is the norm of the cartoon component as a function of λ,
$$s(\lambda; g) = \|f_\lambda\|_2^2, \qquad (14)$$
and s′(λ) encodes the rate at which details are suppressed in the cartoon component $f_\lambda$. Due to the strong nonlinearity of the TV regularization problem, there is no relatively simple expression for s(λ), r(λ) and their respective derivatives. A norm study for the dual norm of the TV norm was done by Meyer in [15]. A more direct behavior for the 2-norm can be computed in a few cases. For instance, Strong and Chan [16] showed that if g is the function with g(x) = 1 if x ∈ B(0, 1), the unit disk, and g(x) = 0 if x ∉ B(0, 1), then its regularization has the form cg, where c ∈ (0, 1) is a constant, thereby attenuating the contrast of the image. In the general situation, we cannot expect simple results of this type. We have instead decided to study the behavior of these functions experimentally on an image database.
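Since the experiments rely on Chambolle's projection method [14], a compact sketch is given below. Chambolle's formulation minimizes $\|f-g\|^2/(2\theta) + \mathrm{TV}(f)$, which has the same minimizer as (12) for θ = λ/2. The discrete gradient/divergence pair follows Chambolle's forward-difference definitions; the step τ = 1/8 and the fixed iteration count are conventional choices, not values reported in this paper, and the function names are ours.

```python
import numpy as np


def grad(u):
    """Forward differences, with zero difference at the last row/column."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy


def div(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    d = np.zeros_like(px)
    d[0, :] += px[0, :]
    d[1:-1, :] += px[1:-1, :] - px[:-2, :]
    d[-1, :] -= px[-2, :]
    d[:, 0] += py[:, 0]
    d[:, 1:-1] += py[:, 1:-1] - py[:, :-2]
    d[:, -1] -= py[:, -2]
    return d


def tv_decompose(g, lam, n_iter=300, tau=0.125):
    """Cartoon f_lam and residual nu_lam = g - f_lam for energy (12), via Chambolle's projection."""
    g = np.asarray(g, dtype=float)
    theta = lam / 2.0                     # (12) rescaled to ||f - g||^2 / (2 theta) + TV(f)
    px = np.zeros_like(g)
    py = np.zeros_like(g)
    for _ in range(n_iter):
        gx, gy = grad(div(px, py) - g / theta)
        mag = np.sqrt(gx**2 + gy**2)
        px = (px + tau * gx) / (1.0 + tau * mag)
        py = (py + tau * gy) / (1.0 + tau * mag)
    f = g - theta * div(px, py)
    return f, g - f
```

Evaluating `np.sum(nu**2)` for a range of λ values gives a discrete version of r(λ; g) in (13). Because the divergence sums to zero, the cartoon component keeps the mean of g.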
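5 Experiments
5.1 Sinc in Scale Space
Let g(x) = sin(x)/x be the sinc function, where x ∈ ℝ. Its Fourier transform is, up to a constant factor, the indicator function of [−1, 1], so the squared $L^2$-norm of the residual as a function of the regularization parameter is, in the Tikhonov case,
$$r(\lambda) = \int_{-1}^{1}\Bigl(\frac{\lambda\omega^2}{1+\lambda\omega^2}\Bigr)^2d\omega, \qquad (15)$$
and in the scale-space case
$$r(\sigma) = \int_{-1}^{1}\Bigl(1-e^{-\frac{\omega^2\sigma^2}{2}}\Bigr)^2d\omega. \qquad (16)$$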
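The integrals (15) and (16) are one-dimensional and easy to evaluate numerically. The sketch below uses a trapezoidal rule on a fine ω-grid (grid sizes and parameter ranges are our choices) and discrete differences to check the behavior shown in Figure 1: in both cases r increases, and its second difference changes sign, so r is not concave.

```python
import numpy as np

OMEGA = np.linspace(-1.0, 1.0, 4001)   # support of the sinc's Fourier transform


def trapezoid(y, x):
    """Composite trapezoidal rule."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def r_tikhonov(lam):
    """Eq. (15): squared residual norm of the sinc under first order Tikhonov."""
    return trapezoid((lam * OMEGA**2 / (1.0 + lam * OMEGA**2)) ** 2, OMEGA)


def r_scale_space(sigma):
    """Eq. (16): squared residual norm of the sinc under Gaussian scale space."""
    return trapezoid((1.0 - np.exp(-OMEGA**2 * sigma**2 / 2.0)) ** 2, OMEGA)


lams = np.linspace(0.0, 14.0, 141)
sigmas = np.linspace(0.0, 8.0, 161)
r_t = np.array([r_tikhonov(l) for l in lams])
r_s = np.array([r_scale_space(s) for s in sigmas])
for name, r in (("Tikhonov", r_t), ("scale space", r_s)):
    d1, d2 = np.diff(r), np.diff(r, 2)
    print(name, "increasing:", bool(np.all(d1 > 0)),
          "| r'' changes sign:", bool(d2.max() > 0 > d2.min()))
```

For small parameters both residuals grow like a power of the parameter ($2\lambda^2/5$ and $\sigma^4/10$, respectively), which is convex, while both saturate towards 2 for large parameters, which forces a concave part.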
(a) Tikhonov regularization: residual norm, first and second order derivative
(b) Scale space: residual norm, first and second order derivative
Fig. 1. The residual norm as a function of the regularization parameter for g(x) = sin(x)/x. The plots clearly indicate that the residual norm functions are, in both cases, increasing but not concave.
The result is presented in Figure 1. The plots clearly indicate that the residual norm is, in both cases, not concave.
5.2 Black Squares with Added Gaussian Noise
The first experiment is done on an artificially generated 100 × 100 image containing four 3 × 3 black squares and one 20 × 20 black square, with added Gaussian white noise of variance σ² = 12. The white background has intensity 125 and the black squares 10; after the noise has been added, the image is normalized to zero mean. In Figure 2 the regularized and residual images are shown for increasing regularization using first order Tikhonov regularization. As the small-scale noise is suppressed, the large-scale geometric structures are also smoothed out. The norm of the residual is an increasing function of the scale and it seems to be concave, and in fact it can be concave for the λ values shown. However, the inflection point may lie at a small λ. In Figure 3 the regularized and residual images are shown for increasing regularization using linear Gaussian scale space. The results for linear Gaussian scale space are similar to those for first order Tikhonov regularization. In Figure 4 the regularized and residual images are shown for increasing regularization using Total Variation image decomposition. The different structures are suppressed at different values of λ, while the large-scale structures are well preserved. At λ = 12 the Gaussian white noise is suppressed, at λ = 210 the small boxes are removed, and finally the large box is suppressed at λ = 550. The residual norm as a function of the regularization parameter is not a concave function of λ.
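A rough reconstruction of the Tikhonov part of this experiment is sketched below. The square positions, the random seed, and the discrete frequency grid (ω in radians per pixel on a periodic grid) are our own assumptions, since they are not specified in the text; the function names are hypothetical. Regularization is applied in the discrete Fourier domain through the filter $1/(1+\lambda|\omega|^2)$ of (6), and r(λ) is computed from the residual spectrum via Parseval's theorem for the DFT.

```python
import numpy as np


def squares_image(seed=0):
    """100x100 test image: background 125, four 3x3 and one 20x20 squares at 10,
    Gaussian noise with variance 12, zero-mean normalized. Square positions are hypothetical."""
    img = np.full((100, 100), 125.0)
    for (y, x) in [(15, 15), (15, 80), (80, 15), (80, 80)]:
        img[y:y + 3, x:x + 3] = 10.0
    img[40:60, 40:60] = 10.0
    img += np.sqrt(12.0) * np.random.default_rng(seed).standard_normal(img.shape)
    return img - img.mean()


def tikhonov_residual_norms(g, lams):
    """r(lambda) = ||g - f_lambda||^2 with f_lambda_hat = g_hat / (1 + lambda |omega|^2)."""
    wy = 2.0 * np.pi * np.fft.fftfreq(g.shape[0])
    wx = 2.0 * np.pi * np.fft.fftfreq(g.shape[1])
    w2 = wy[:, None] ** 2 + wx[None, :] ** 2
    g_hat = np.fft.fft2(g)
    out = []
    for lam in lams:
        resid_hat = g_hat * (lam * w2 / (1.0 + lam * w2))   # (1 - F(lambda, omega)) g_hat
        out.append(np.sum(np.abs(resid_hat) ** 2) / g.size)  # Parseval for the DFT
    return np.array(out)


g = squares_image()
r = tikhonov_residual_norms(g, np.linspace(0.0, 200.0, 101))
```

Plotting `r` against λ gives a curve of the kind shown in Figure 2; the exact values depend on the assumed square positions and noise realization.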
[Plots: Residual Norm and First Order Derivative of the Residual Norm, each versus Regularization Parameter]
Fig. 2. Result for the squares and noise image using first order Tikhonov regularization. On the first row, the regularized and residual images for λ = 3, 10, 20 and 50 are shown. The plots show the L²-norm of the residual as a function of the scale λ, followed by its first order derivative in log-scale.
[Plots: Residual Norm and Derivative of the Residual Norm in Log-Scale, each versus Regularization Parameter σ²]
Fig. 3. Result for the squares and noise image using linear scale space. On the first row, the regularized and residual images for σ² = 1, 7, 13 and 64 are shown. The plots show the L²-norm of the residual as a function of the scale σ², followed by its first order derivative in log-scale.
5.3 DIKU Multi Scale Image Sequence Database I
The newly collected DIKU Multi-Scale image sequence database [17] contains sequences of the same scene captured using varying focal length. The sequences contain both man-made structures and nature, and the distance to the main objects in the scenes also varies widely (from a few meters to a few kilometers).
[Plots: Residual Norm and Derivative of the Residual Norm in Log-Scale, each versus Regularization Parameter]
Fig. 4. Result for the squares and noise image using TV decomposition. On the first row, the regularized and residual images for λ = 12, 38, 100 and 200 are shown. The plots show the L²-norm of the residual as a function of the scale λ, followed by its first order derivative in log-scale. The residual norm seems to be a monotonically increasing, non-concave function. It has three points of "high" curvature: at λ = 12 (the noise is suppressed), at λ = 210 (the small squares are suppressed), and at λ = 580 (the large square is suppressed).
Each image has first been normalized by an affine intensity range change so that that the intensity range becomes [0, 1], followed by subtracting the mean value (i.e. the mean intensity is 0 in each image). The mean residual norm was computed on the normalized images in the database, using fixed scales σ = 2i where i = 0, · · · , 12, using linear gaussian scale space. The result is a feature vector ¯ r (0), · · · , r¯(12) containing 1 r¯(i) = r(i; I) (17) N I∈F
where F is the set of all N normalized images in the database. The (signed) distance function d(I₀) of a normalized image I₀ ∈ F to the mean is defined as

$$d(I_0) = \sum_{i=0}^{12} \left( r(i; I_0) - \bar r(i) \right) \qquad (18)$$
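Eqs. (17) and (18) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' code: r(i; I) is assumed to be the L2-norm of the residual I − G_σ * I after Gaussian smoothing with `scipy.ndimage.gaussian_filter`, the scale list is passed in explicitly, and the helper names and synthetic test images are hypothetical stand-ins for the DIKU data.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def normalize(image):
    """Affinely map the intensity range to [0, 1], then subtract the mean intensity."""
    image = np.asarray(image, dtype=float)
    lo, hi = image.min(), image.max()
    scaled = (image - lo) / (hi - lo) if hi > lo else np.zeros_like(image)
    return scaled - scaled.mean()


def residual_norms(image, sigmas):
    """Assumed r(i; I): L2-norm of the residual I - G_sigma * I for each scale sigma."""
    return np.array([np.linalg.norm(image - gaussian_filter(image, sigma=s)) for s in sigmas])


def signed_distances(images, sigmas):
    """Mean feature vector r_bar (Eq. 17) and signed distances d(I0) (Eq. 18)."""
    r = np.array([residual_norms(normalize(im), sigmas) for im in images])  # shape (N, n_scales)
    r_bar = r.mean(axis=0)          # Eq. (17): average over the N images in F
    d = (r - r_bar).sum(axis=1)     # Eq. (18): signed deviations summed over the scales
    return r_bar, d


# Usage on synthetic stand-ins (hypothetical data, not DIKU images).
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:64, 0:64]
geometry = np.sin(xx / 10.0) + np.cos(yy / 12.0)   # large-scale structure
texture = rng.normal(size=(64, 64))                # small-scale detail
mixed = geometry + 0.3 * texture
r_bar, d = signed_distances([geometry, texture, mixed], sigmas=[1, 2, 4, 8])
print(np.round(d, 2))
```

As in Fig. 5, the texture-like image ends up with a positive distance and the geometry-like image with a negative one; by construction the distances over the whole set sum to zero.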
The (signed) distance to the mean was computed for all images in the DIKU database. Images with large positive values have a larger than average residual, and images with large negative values a smaller than average residual. The first row of Fig. 5 contains the 4 images with the largest positive distance to the mean, and the second row the 4 images with the largest negative distance to the mean. The difference in image content is striking and clearly indicates that the residual norm carries important information about image content. The same experiment was performed using first order Tikhonov regularization, with similar, but not identical, results.
842
D. Gustavsson et al.
Fig. 5. The top row shows images where f(σ) is much larger than the average and the bottom row shows images where f(σ) is much smaller than the average. The difference in content is striking: the images in the first row contain small scale details (texture), while the images in the bottom row contain large scale geometric structures.
6 Conclusions

For square-integrable images, the squared L2-norms of the regularized images in first order Tikhonov regularization and linear Gaussian scale space are, in general, decreasing convex functions of the regularizing parameter. This may fail for linear scale space when the Gaussian standard deviation is used as the parameter. Their squared residual norms, however, are not concave functions. For Total Variation regularization, too, it is shown empirically that the squared norm of the residual is not concave. This indicates that the squared norm of the residual may be an indicator of image structure, for 1st order Tikhonov regularization, Gaussian scale space and Total Variation regularization alike. The behavior of the latter will be studied further in future research.
Acknowledgements

This research was funded by the EU Marie Curie Research Training Network VISIONTRAIN MRTN-CT-2004-005439 and the Danish Natural Science Research Council project Natural Image Sequence Analysis (NISA) 272-05-0256. The authors want to thank Christoph Schnörr (Heidelberg University), Niels-Christian Overgaard (Lund University) and Vladlena Gorbunova (Copenhagen University) for sharing their knowledge.
References
1. Nielsen, M., Florack, L., Deriche, R.: Regularization, scale-space, and edge detection filters. International Journal of Computer Vision 7(4), 291–307 (1997)
2. Sporring, J., Weickert, J.: On generalized entropies and scale-space. In: ter Haar Romeny, B.M., Florack, L.M.J., Viergever, M.A. (eds.) Scale-Space 1997. LNCS, vol. 1252, pp. 53–64. Springer, Heidelberg (1997)
3. Sporring, J., Weickert, J.: Information measures in scale-spaces. IEEE Transactions on Information Theory 45, 1051–1058 (1999)
4. Buades, A., Coll, B., Morel, J.M.: A non-local algorithm for image denoising. In: CVPR 2005: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), Washington, DC, USA, vol. 2, pp. 60–65. IEEE Computer Society, Los Alamitos (2005)
5. Thompson, A.M., Brown, J.C., Kay, J.W., Titterington, D.M.: A study of methods of choosing the smoothing parameter in image restoration by regularization. IEEE Transactions on Pattern Analysis and Machine Intelligence 13(4), 326–339 (1991)
6. Mrázek, P., Navara, M.: Selection of optimal stopping time for nonlinear diffusion filtering. International Journal of Computer Vision 52(2-3), 189–203 (2003)
7. Weickert, J.: Anisotropic Diffusion in Image Processing. ECMI. Teubner-Verlag (1998)
8. Ruderman, D.L., Bialek, W.: Statistics of natural images: Scaling in the woods. Physical Review Letters 73(6), 814–817 (1994)
9. Field, D.J.: Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A 4, 2379–2394 (1987)
10. Koenderink, J.J.: The structure of images. Biological Cybernetics 50, 363–370 (1984)
11. Witkin, A.P.: Scale-space filtering. In: Proceedings 8th International Joint Conference on Artificial Intelligence, Karlsruhe, August 1983, vol. 2, pp. 1019–1022 (1983)
12. Iijima, T.: Basic theory on normalization of a pattern. Bulletin of Electrical Laboratory 26, 368–388 (1962) (in Japanese)
13. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D 60(1-4), 259–268 (1992)
14. Chambolle, A.: An algorithm for total variation minimization and applications. Journal of Mathematical Imaging and Vision 20(1-2), 89–97 (2004)
15. Meyer, Y.: Oscillating Patterns in Image Processing and Nonlinear Evolution Equations: The Fifteenth Dean Jacqueline B. Lewis Memorial Lectures. American Mathematical Society (AMS), Boston (2001)
16. Strong, D., Chan, T.F.: Exact solutions to total variation problems. Technical Report 96-41, UCLA, Ca (1996)
17. Gustavsson, D., Pedersen, K.S., Nielsen, M.: A multi-scale study of the distribution of geometry and texture in natural images (2009) (in preparation)
18. Florack, L., Duits, R., Bierkens, J.: Tikhonov regularization versus scale space: A new result. In: Proceedings of International Conference on Image Processing (ICIP), pp. 271–274 (2004)
Transitions of a Multi-scale Image Hierarchy Tree

Arjan Kuijper
Fraunhofer Institute for Computer Graphics Research & TU Darmstadt, Germany
Abstract. In this work we describe the possible transitions of the hierarchical structure that describes an image in Gaussian scale space. Until now, this tree structure has only been used for topological segmentation. In order to perform image matching and retrieval tasks based on this structure, one needs to know which transitions are allowed when the structure changes under the influence of a single control parameter. We present a list of such transitions, enabling tree edit distance operations.
1
Introduction
In the analysis of images and shapes, descriptors take a prominent place. First, these descriptors should represent the underlying structure in a simple way that is as invariant as possible, for instance with respect to rotations and scaling. Second, they should be robust with respect to (some) noise. Third, they should capture “essential” aspects of the underlying structure, so that comparison tasks can be carried out efficiently and effectively on the descriptors. Essential for the latter is that the way the descriptor is obtained is well understood, since this allows one to define its possible changes. Robustness towards noise can be achieved by considering noise as a local perturbation of the structure. One way to overcome this perturbation is to blur the structure, traditionally with a Gaussian filter. Koenderink [1] pointed out that choosing an a priori width of the kernel amounts to observing the image at only one scale. Taking into account all widths (scales), the image is investigated at all fine (“noisy”) and coarse (“structure containing”) levels, which yields a scale space image. He also pointed out that this is equivalent to observing the image as it evolves under the heat equation, thus linking the kernel based approach to a partial differential equation. It was shown that the scale space image contains a tree-like sub-structure that serves as a rotation and scale invariant image descriptor [2], which can be used for image segmentation based on topological arguments; see for example [3] for full details and related work. In order to use this tree structure for tasks like image matching, retrieval, and reconstruction (like [4,5,6], where points in the scale space image are used), one needs to understand how the tree can change. The focus of this paper is to describe the possible changes of this tree structure.
We restrict ourselves to a one-parameter family of perturbations, that is, the structural changes (called singularities or catastrophes) that occur when one additional parameter changes, for instance due to varying imaging conditions. Such changes can influence the “building blocks” of the tree structure and allow well-defined tree-based edit distance operations [7]. We start with a short introduction to scale space and polynomials in it [1, 8], catastrophe theory [9, 10], and the tree structure [2] in section 2. In section 3 we introduce a special type of points that occur in our analysis, namely degenerated scale space saddles. Together with catastrophe points these form the basis of the possible transitions. They are presented in section 4, while the consequences for the tree structure are given in section 5. We give a simple example illustrating the theory on an MR image in section 6 and give conclusions in section 7.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 844–855, 2009. © Springer-Verlag Berlin Heidelberg 2009
2
Theory
Let $L(x): \mathbb{R}^n \to \mathbb{R}$ be an image, with x an n-dimensional spatial variable (point) and L(x) the intensity measured at the point x. The Gaussian scale space image L(x; t) is defined as the convolution of L with a Gaussian:

$$L(x; t) = \int_{\mathbb{R}^n} \frac{1}{\left(\sqrt{4\pi t}\right)^{n}}\, e^{-\frac{|x-y|^2}{4t}}\, L(y)\, dy \qquad (1)$$

The Gaussian filter is the Green's function of the diffusion, or heat, equation $\partial_t L(x; t) = \Delta L(x; t)$, with $\lim_{t \downarrow 0} L(x; t) = L(x)$. For simplicity we assume that n = 2 and, for notational ease, that $x \in \mathbb{R}^2$, i.e. that the image is embedded in the whole of $\mathbb{R}^2$.

2.1
Scale Space Polynomials, Jets
At each point (x₀, y₀) a Taylor expansion of a function L(x, y) can be made to investigate the local structure: $L(x, y) \approx L + i L_i + \frac{1}{2}\, ij L_{ij} + \frac{1}{6}\, ijk L_{ijk} + \dots$ (with summation over repeated indices), where $L_{(\cdot)}$ denotes the partial derivatives with respect to the variables $i, j, \dots \in \{x, y\}$, evaluated at the point of interest. In Gaussian scale space the same holds for the spatial and scale variables, i.e. $i, j, \dots \in \{x, y, t\}$, with all derivatives of L(x, y, t) evaluated at (x₀, y₀, t₀). This yields a scale space polynomial. Next, due to the heat equation, the scale derivatives in the Taylor expansion can be expressed in terms of spatial derivatives, since $\partial_t^n = \Delta^n$. The nth order scale space jet is defined as the scale space polynomial with spatial derivatives up to order n.

2.2
Critical Curves, Scale Space Germs
Critical curves are curves in scale space that satisfy $\nabla_x L = 0$. Damon [9] proved that these curves do not intersect in scale space unless extra constraints (like symmetry) are added. The curves consist of saddle branches and extremum branches that meet pairwise at catastrophe points. At such points the spatial Hessian matrix

$$H = \begin{pmatrix} L_{xx} & L_{xy} \\ L_{xy} & L_{yy} \end{pmatrix} \qquad (2)$$
846
A. Kuijper
degenerates and has exactly one eigenvalue equal to zero. Tracing critical points over scale, a saddle-extremum pair is created or annihilated at such catastrophe points. These catastrophe points are also called top points [4, 5, 6], since they occur at local extrema with respect to the scale axis: at local maxima for annihilations and at local minima for creations. In Gaussian scale space there is one semi-free parameter: scale. Therefore, an eigenvalue of the Hessian matrix can become zero with multiplicity one. The generic catastrophe is thus described by the terms x³ and y², called A2 or fold [10]. To account for the fact that t can only increase during the evolution, two scale space polynomials are needed to describe an annihilation (Eq. (3)) and a creation (Eq. (4)) in a small neighbourhood of the origin, with local coordinates x, y, and t:

$$L^a = x^3 + 6xt + y^2 + 2t \qquad (3)$$

$$L^c = x^3 - 6xy^2 - 6xt + y^2 + 2t \qquad (4)$$
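The germs (3) and (4), and the critical-curve statements made about them in this subsection, can be checked symbolically. The sketch below is our own check with SymPy (the helper name `satisfies_heat` is hypothetical): it verifies the heat equation, the critical curves on y = 0, and that the second branch x = 1/6 of $L^c_y = 2y(1 - 6x) = 0$ meets the branch y = 0 at t = 1/72.

```python
import sympy as sp

x, y, t = sp.symbols("x y t", real=True)

La = x**3 + 6*x*t + y**2 + 2*t              # Eq. (3): annihilation germ
Lc = x**3 - 6*x*y**2 - 6*x*t + y**2 + 2*t   # Eq. (4): creation polynomial


def satisfies_heat(L):
    """True if L_t equals the spatial Laplacian L_xx + L_yy."""
    return sp.expand(sp.diff(L, t) - sp.diff(L, x, 2) - sp.diff(L, y, 2)) == 0


# Critical curves on y = 0: x = ±sqrt(-2t) for L^a and x = ±sqrt(2t) for L^c.
on_cc_a = sp.simplify(sp.diff(La, x).subs({y: 0, x: sp.sqrt(-2*t)}))
on_cc_c = sp.simplify(sp.diff(Lc, x).subs({y: 0, x: sp.sqrt(2*t)}))

# L^c_y = 2y(1 - 6x): the extra branch x = 1/6 meets y = 0 where L^c_x also vanishes.
print(sp.factor(sp.diff(Lc, y)))
t_meet = sp.solve(sp.diff(Lc, x).subs({x: sp.Rational(1, 6), y: 0}), t)
print(satisfies_heat(La), satisfies_heat(Lc), on_cc_a, on_cc_c, t_meet)
```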
The critical curves cc occur in the (x, t) plane and are parameterised by $cc^a(x, y, t) = (\pm\sqrt{-2t}, 0, t)$ and $cc^c(x, y, t) = (\pm\sqrt{2t}, 0, t)$. This follows directly from the x derivatives of Eqs. (3) and (4): $L^a_x|_{y=0} = 3x^2 + 6t$ and $L^c_x|_{y=0} = 3x^2 - 6t$. As an important consequence of the A2 catastrophe, critical curves do not intersect (as this requires a higher order catastrophe), but they can contain subsequent creation-annihilation points. This allows us to define scale space germs as scale space polynomials that yield critical curves with only generic catastrophes. For instance, $L^a$ in Eq. (3) is a valid scale space germ, but $L^c$ in Eq. (4) is not, as two critical curves intersect at $t = \frac{1}{72}$. However, $L^c + \varepsilon y$ with $\varepsilon \neq 0$ is a scale space germ.

2.3
Saddle Points in Scale Space
In Gaussian scale space the only critical points are saddle points [11]. These scale space saddles lie on critical curves, since the spatial derivatives vanish there. To investigate these points, consider the Hessian matrix in scale space (the extended Hessian):

$$H = \begin{pmatrix} L_{xx} & L_{xy} & L_{xt} \\ L_{xy} & L_{yy} & L_{yt} \\ L_{xt} & L_{yt} & L_{tt} \end{pmatrix} \qquad (5)$$

Since this matrix contains the spatial Hessian (Eq. (2)), at least one eigenvalue is positive and one is negative [2]. At scale space saddles the intensity along a critical curve has a local extremum. To see this, let the intensity along a curve be parametrised by L(x(t), y(t), t); then $\frac{d}{dt} L(x(t), y(t), t) = L_x x_t + L_y y_t + L_t$. Since the parametrisation takes place on a critical curve, the spatial derivatives are zero, so $\frac{d}{dt} L(x(t), y(t), t) = L_t$. Finally, at a scale space saddle $L_t = 0$ (and consequently $\Delta L = 0$ and $\operatorname{tr} H = 0$ for the spatial Hessian).

2.4
A Multi-scale Image Descriptor
In 2+1D scale space images, iso-manifolds through scale space saddles divide the 3D volume into two parts. In the initial image such parts reduce to areas that correspond to topological segments around two extrema. It is possible to discriminate between the two parts connected at the saddle, because the scale space saddle is connected to one of the extrema via a critical curve. A sketch of such a structure is given in Figure 1 (see e.g. [3] for full details, including the role of creations). On the left one sees critical curves and an iso-manifold through a scale space saddle; a sketch in the (x, t) plane is given in the middle. The critical curve on the right (called “C”) contains a saddle branch and an extremum branch. The two branches are connected at the catastrophe (top) point. Via the iso-manifold through the scale space saddle “SSS”, this critical curve is connected to the other one on the left (called “D”). This is schematically visualised in the right image, where the child part “C” is connected via SSS to the curve with nodes “D” (child) and “P” (parent). This is the “building block” of the hierarchical tree. Each inner node represents a scale space saddle, while the leaves are formed by the extrema in the initial image.

Fig. 1. A sketch of the local structure at a scale space saddle in (x, y, t) space, t vertical (left), the same in the y = 0 plane with the critical curves dashed (middle), and its algebraic tree representation with nodes P, SSS, D and C (right)
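The building block above suggests a binary tree whose leaves are extrema and whose inner nodes are scale space saddles. The sketch below is one possible encoding, an assumption of ours rather than the data structure of [2]: each inner node joins the two parts that meet at its saddle, and `rotate` illustrates the parent-child swap of two saddles that the paper later associates with Fig. 7. The names `Node`, `building_block` and `rotate` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Leaf: an extremum of the initial image. Inner node: a scale space saddle."""
    label: str
    children: list[Node] = field(default_factory=list)

    def leaves(self) -> list[str]:
        """Extrema below this node, from left to right."""
        if not self.children:
            return [self.label]
        return [leaf for child in self.children for leaf in child.leaves()]


def building_block(saddle: str, d: Node, c: Node) -> Node:
    """Join part D (whose critical curve continues to the parent) and child part C at a saddle."""
    return Node(saddle, [d, c])


def rotate(parent: Node) -> Node:
    """Swap a saddle with its inner left child: (s2: (s1: a, b), c) -> (s1: a, (s2: b, c))."""
    child, right = parent.children
    left, middle = child.children
    return Node(child.label, [left, Node(parent.label, [middle, right])])


e1, e2, e3 = Node("e1"), Node("e2"), Node("e3")
tree = Node("s2", [building_block("s1", e1, e2), e3])
rotated = rotate(tree)
print(tree.leaves(), rotated.label, rotated.children[1].label)
```

The rotation keeps the set and order of leaves (extrema) and only changes which saddle is the parent, matching the description of a "simple rotation of a parent-child pair of internal nodes".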
3
Degenerated Scale Space Saddles
The matrix in Eq. (5) is degenerate when at least one of its eigenvalues equals zero, i.e. det H = 0. Obviously, this is an extra requirement in scale space and thus non-generic. A degenerate (extended) Hessian implies that the type of the point cannot be resolved. For critical points it means that the point is neither a saddle nor an extremum but a combination of both, precisely because two such points coincide. For scale space saddles, a zero eigenvalue of the extended Hessian implies that, of the other two eigenvalues, one is positive and one negative, for instance when a saddle with two negative eigenvalues and one with two positive eigenvalues coincide. Since the latter denotes a local minimum of L(x(t), y(t), t) and the former a local maximum, such an event appears as a point of inflexion of L(x(t), y(t), t).
Theorem 1. Degenerated scale space saddles coincide with points of inflexion:

$$\frac{d^2}{dt^2} L(x(t), y(t), t) = 0 \iff \det H = 0 \qquad (6)$$
Proof. The points of inflexion of L(x(t), y(t), t) considered here are points where both the first derivative (i.e. scale space saddles) and the second derivative of L with respect to t vanish. Then

$$\frac{d^2}{dt^2} L(x(t), y(t), t) = \frac{d}{dt}\left(L_x x_t + L_y y_t + L_t\right) = (L_{xx} x_t + L_{xy} y_t + L_{xt})\, x_t + (L_{xy} x_t + L_{yy} y_t + L_{yt})\, y_t + (L_{xt} x_t + L_{yt} y_t + L_{tt}), \qquad (7)$$
since we can ignore the derivatives of the spatial parametrisations $x_t$ and $y_t$, as they are multiplied by spatial derivatives that vanish on critical curves. Next, the right hand side of Eq. (7) can be written as $(x_t, y_t, 1) \cdot H \cdot (x_t, y_t, 1)^T$, which equals zero iff det H = 0 since H is symmetric. So when two scale space saddles coincide, the resulting point is degenerate.

Theorem 2. On critical curves, Eq. (7) simplifies to

$$\frac{d^2}{dt^2} L(x(t), y(t), t) = L_{xt} x_t + L_{yt} y_t + L_{tt}. \qquad (8)$$

Proof. On critical curves we have $L_x = 0$ and consequently $\frac{d}{dt} L_x(x(t), y(t), t) = 0$, i.e. $L_{xx} x_t + L_{xy} y_t + L_{xt} = 0$. A similar argument holds for $L_y$, giving $L_{xy} x_t + L_{yy} y_t + L_{yt} = 0$, so the first two terms of Eq. (7) vanish.
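The step "equals zero iff det H = 0" in the proof of Theorem 1 can be made explicit with the two identities from the proof of Theorem 2. A short derivation, assuming the spatial Hessian of Eq. (2), written $H_s$ here (our notation), is non-singular at the point, and writing $b = (L_{xt}, L_{yt})^T$:

$$\begin{pmatrix} x_t \\ y_t \end{pmatrix} = -H_s^{-1} b, \qquad \frac{d^2}{dt^2} L = L_{tt} + b^T \begin{pmatrix} x_t \\ y_t \end{pmatrix} = L_{tt} - b^T H_s^{-1} b = \frac{\det H}{\det H_s},$$

where the last equality is the Schur complement formula $\det H = \det H_s \,(L_{tt} - b^T H_s^{-1} b)$ for the extended Hessian of Eq. (5). Hence, wherever $\det H_s \neq 0$, the second derivative along the critical curve vanishes exactly when $\det H = 0$.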
In a one-parameter family, only one eigenvalue equals zero. So we assume that the special event locally takes place in the (x, t) plane, while y is a regular (Morse) variable. Then we may neglect y derivatives and get $\frac{d^2}{dt^2} L(x(t), 0, t) = L_{xt} x_t + L_{tt}$ and $\det H = L_{xx} L_{tt} - L_{xt}^2$. At critical points we obtain $L_{xx} x_t + L_{xt} = 0$.
4
Transitions
When we allow a change driven by one parameter, we expect to see situations that are non-generic for still images. However, for moving images, e.g. films or a sequence warping one image into another, such situations can become generic. Since the tree structure relies on critical curves, catastrophe points and scale space saddles, we discuss the effect of the simplest combinations of them:
1. two catastrophe points coincide on a critical curve,
2. two critical curves intersect (necessarily at a catastrophe point),
3. a catastrophe point coincides with a scale space saddle,
4. two scale space saddles coincide,
5. two scale space saddles on a critical curve have the same value, either at one critical curve or at different curves, and
6. two scale space saddles on different critical curves but on the same iso-manifold have the same value.
Fig. 2. A critical curve in (x, y, t) space, t vertical. From left to right: when ε in Eq. (9) increases, a pair of top points is removed.
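The behaviour in Fig. 2, and the thresholds quoted for Eq. (9) in Section 4.1, can be checked numerically. The sketch relies on our own elimination, not taken from the paper: for $L = x^3 - 6xy^2 - 6xt + y^2 + 2t + \varepsilon y$ with ε ≠ 0, put u = 2 − 12x; then $L_y = 0$ gives y = −ε/u, $L_x = 0$ fixes t, and the catastrophe condition det H = 0 on the spatial Hessian reduces to $(2 - u)u^3 = 288\varepsilon^2$. The left-hand side peaks at 27/16 for u = 3/2, so two catastrophes exist below ε = √6/32 and none above it. The function names are hypothetical.

```python
import numpy as np


def catastrophes(eps, tol=1e-9):
    """Catastrophe points (x, y, t) of Eq. (9), L = x^3 - 6xy^2 - 6xt + y^2 + 2t + eps*y, eps != 0."""
    # u = 2 - 12x; det H = 0 becomes (2 - u) u^3 - 288 eps^2 = 0, a quartic in u.
    roots = np.roots([-1.0, 2.0, 0.0, 0.0, -288.0 * eps**2])
    points = []
    for u in roots[np.abs(roots.imag) < tol].real:
        x = (2.0 - u) / 12.0
        y = -eps / u                              # from L_y = y (2 - 12x) + eps = 0
        t = (3.0 * x**2 - 6.0 * y**2) / 6.0       # from L_x = 3x^2 - 6y^2 - 6t = 0
        points.append((x, y, t))
    return points


def residuals(eps, x, y, t):
    """L_x, L_y and det H of the spatial Hessian; all three vanish at a catastrophe."""
    lx = 3 * x**2 - 6 * y**2 - 6 * t
    ly = -12 * x * y + 2 * y + eps
    det_h = 6 * x * (2 - 12 * x) - (12 * y) ** 2
    return lx, ly, det_h


threshold = np.sqrt(6) / 32   # about 0.0765
for eps in (0.05, 0.07, 0.1):
    print(f"eps = {eps:.2f}: {len(catastrophes(eps))} catastrophe(s)")
```

Below the threshold a creation and an annihilation are found; above it the quartic has no real roots and the "wiggle" in the critical curve is gone.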
For all situations we describe scale space germs. They are generic in a one-parameter family of perturbations iff exactly one parameter has to be fixed to obtain the described situation. We note that we described cases 1 and 2 (partially) in some detail before [12], but summarize the results for completeness. In section 5 we describe the consequences of these events for the tree structure.

4.1
Two Catastrophe Points Coincide on a Critical Curve
The situation that two catastrophe points coincide on a critical curve describes the creation or annihilation of a pair of creation and annihilation events. Such a pair exists if a critical curve traverses the manifold det H = 0 twice. If the curve is perturbed, it is pulled away from this manifold; the special situation occurs exactly where the curve is tangent to the manifold. In [12], this was modelled with a scale space germ analogous to Eq. (4):

$$L^c = x^3 - 6xy^2 - 6xt + y^2 + 2t + \varepsilon y, \qquad (9)$$

with ε ≠ 0 a free parameter. For $\varepsilon \in (0, \frac{1}{32}\sqrt{6})$ a creation and an annihilation occur, for $\varepsilon > \frac{1}{32}\sqrt{6}$ there are no catastrophes, and for $\varepsilon = \frac{1}{32}\sqrt{6}$ the two catastrophes coincide (they are created or annihilated, depending on whether ε decreases or increases). So with an additional parameter, “wiggles” in a critical curve can be removed, i.e. the critical curve is smoothed, as shown in Figure 2.

4.2
Two Critical Curves Intersect
The intersection of critical curves occurs at catastrophe points. Therefore, this event is described by a higher order catastrophe. For an annihilation (Eq. (3)) this can be modelled [12] by $L_x = 4x(x^2 + 6t)$, arising from the scale space germ

$$L = x^4 + 12x^2 t + 12t^2 + \varepsilon x + y^2 + 2t \qquad (10)$$

For ε = 0 one obtains the so-called A3 catastrophe in scale space; for ε ≠ 0 one has the generic A2 catastrophe, see Figure 3. For a creation we use the model $L_x = 4x(x^2 - 6t)$, arising from the more complicated scale space polynomial

$$L = x^4 - 12x^2 t - 12t^2 - 12x^2 y^2 + 2y^4 \qquad (11)$$
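A quick symbolic check of Eqs. (10) and (11), our own and not part of the paper: both satisfy the heat equation, their x derivatives reduce to the quoted forms $4x(x^2 + 6t)$ (at ε = 0) and $4x(x^2 - 6t)$ (at y = 0), and adding the perturbation terms of Eq. (12), given just below, keeps the creation polynomial a solution. The helper name `heat_residual` is hypothetical.

```python
import sympy as sp

x, y, t, eps = sp.symbols("x y t epsilon", real=True)
alpha, beta, gamma, delta = sp.symbols("alpha beta gamma delta", real=True)


def heat_residual(L):
    """L_t - (L_xx + L_yy); zero for a scale space polynomial."""
    return sp.expand(sp.diff(L, t) - sp.diff(L, x, 2) - sp.diff(L, y, 2))


L10 = x**4 + 12*x**2*t + 12*t**2 + eps*x + y**2 + 2*t       # Eq. (10)
L11 = x**4 - 12*x**2*t - 12*t**2 - 12*x**2*y**2 + 2*y**4    # Eq. (11)
L11_germ = L11 + alpha*(x**2 + 2*t) + beta*(y**2 + 2*t) + gamma*x + delta*y  # plus Eq. (12)

# Differences between the x derivatives and the forms quoted in the text.
gap_10 = sp.expand(sp.diff(L10, x).subs(eps, 0) - 4*x*(x**2 + 6*t))
gap_11 = sp.expand(sp.diff(L11, x).subs(y, 0) - 4*x*(x**2 - 6*t))
print(heat_residual(L10), heat_residual(L11), heat_residual(L11_germ), gap_10, gap_11)
```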
Fig. 3. A critical curve in (x, y, t) space, t vertical. From left to right: when the sign of ε in Eq. (10) changes, the annihilation takes place with the other extremum.
The scale space germ is obtained by adding the perturbation terms

$$\alpha(x^2 + 2t) + \beta(y^2 + 2t) + \gamma x + \delta y. \qquad (12)$$
Choosing non-zero values for α, β, and γ, together with δ = 0 (that is, a one parameter degeneration), yields the desired creation result. 4.3
A Catastrophe Point Coincides with Scale Space Saddle
When a scale space saddle and a catastrophe point coincide, the following requirements hold for the spatial Hessian: det H = 0 and tr H = ΔL = ∂_t L = 0. The latter implies that $L_{yy} = -L_{xx}$, so the former reads $-L_{xx}^2 - L_{xy}^2 = 0$. Hence the complete second order structure has to vanish: $L_{xx} = L_{xy} = L_{yy} = 0$. This is due to the fact that tr H = 0 implies that det H is non-positive. We needed to set two parameters equal to zero, instead of one, so this situation is not generic in a one-parameter family of perturbations. This is in line with intuition, which states that we cannot simply change an extremum into a saddle, or vice versa.

4.4
Two Scale Space Saddles Coincide
For the situation that two scale space saddles coincide (a degenerated scale space saddle), we take the case that the event takes place in the (x, t) plane. Then it follows that $L_{tt}$ is at least O(x): if $L_{tt} = 0$, then $L_{xt} = 0$, i.e. we are left with an ordinary critical point. If $L_{tt} = O(1)$, we have $L = O(x^4, t^2)$ and, similar to the case above, the saddle coincides with the catastrophe point, since $L_{xx}$ has to vanish. Therefore $L_{tt} = O(x)$, and the simplest scale space 5-jet (in x only) reads

$$L = \frac{1}{120}x^5 + \frac{1}{6}t x^3 + \frac{1}{2}t^2 x + x^2 - y^2 + \delta\left(x^2 + 2t\right) + \varepsilon x \qquad (13)$$

Eq. (13) represents an A4 catastrophe in scale space, where the saddle is located at the origin. For such a catastrophe, perturbations are required for the orders x³, x², and x. Note that scale t perturbs x³. From the three requirements $L_x = 0$, $L_t = 0$ and det H = 0 one gets δ = δ(ε), that is, one free parameter remains. In Figure 4 a parameterized critical curve is shown for several values of δ and ε. Each column shows a sequence with varying ε where two scale space saddles (local extrema on the curve) meet and (dis)appear. Constraining the event to the origin yields δ = 0 and ε = 0, i.e. the plot in the middle, where one clearly sees a horizontal tangent. Degenerated scale space saddles are generic in a one-parameter family of perturbations, which implies that pairs of scale space saddles can be created and annihilated on a critical curve.

[Fig. 4 panels: L along the critical curve for several (δ, ε) pairs]
Fig. 4. Plots of L(x(t), y(t), t) along a critical curve of the scale space polynomial of Eq. (13) for several values of δ and ε. Each column (changing ε, δ fixed) shows a sequence where two scale space saddles (local extrema on the curve) meet and (dis)appear.

4.5
Two Scale Space Saddles with the Same Value
The case that two scale space saddles have the same intensity is easily derived from the previous section. In Figure 4, the plot in the middle of the third row shows exactly this phenomenon when, in Eq. (13), ε instead of δ is taken fixed. For ε < 0 the critical curve contains 3 extrema. When δ varies, their intensities vary, and two of them have equal intensity for δ = 0. The effect of this in the (x, t) plane, i.e. on the separation of parts in the scale space image, is shown in Figure 5 (cf. the middle plot in Figure 1). Here the iso-manifold is visible as an isophote, since we do not consider the (Morse) y variable. When δ = 0, the iso-manifolds are connected at two places (middle plot), only one of which remains when δ ≠ 0. As one can see, the region enclosed by the iso-manifold remains stable when going through the transition; only the location of the scale space saddle changes suddenly.
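The equal-intensity event can be reproduced numerically for Eq. (13) restricted to y = 0, with ε = −1 as in Fig. 5. The choice of starting points is our own reasoning: for δ = 0, $L_t = x(x^2/6 + t)$ vanishes on x = 0, where $L_x = t^2/2 + \varepsilon$, giving two scale space saddles at $(x, t) = (0, \pm\sqrt{-2\varepsilon})$, both with L = 0. Since the gradient vanishes there, $dL/d\delta = x^2 + 2t = \pm 2\sqrt{-2\varepsilon}$, so the two intensities move apart in opposite directions as δ leaves 0. We assume these are the two saddles of the middle plot in Fig. 5; the sketch follows them with `scipy.optimize.fsolve`, and the helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import fsolve


def intensity(x, t, delta, eps):
    """Eq. (13) restricted to y = 0."""
    return x**5 / 120 + t * x**3 / 6 + t**2 * x / 2 + x**2 + delta * (x**2 + 2 * t) + eps * x


def gradient(p, delta, eps):
    """(L_x, L_t) of Eq. (13) at y = 0; both vanish at a scale space saddle (L_y = -2y = 0)."""
    x, t = p
    lx = x**4 / 24 + t * x**2 / 2 + t**2 / 2 + 2 * x + 2 * delta * x + eps
    lt = x**3 / 6 + t * x + 2 * delta
    return [lx, lt]


def saddle_intensities(delta, eps=-1.0):
    """Intensities of the saddles that sit at (0, +sqrt(-2 eps)) and (0, -sqrt(-2 eps)) when delta = 0."""
    values = []
    for t0 in (np.sqrt(-2 * eps), -np.sqrt(-2 * eps)):
        x, t = fsolve(gradient, [0.0, t0], args=(delta, eps))
        values.append(intensity(x, t, delta, eps))
    return values


for delta in (-0.01, 0.0, 0.01):
    upper, lower = saddle_intensities(delta)
    print(f"delta = {delta:+.2f}: upper {upper:+.5f}, lower {lower:+.5f}")
```

The two intensities coincide only at δ = 0 and swap order as δ changes sign, which is why the iso-manifold through the relevant saddle switches from one saddle to the other.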
Fig. 5. The critical curves (dashed) and iso-manifolds through the scale space saddles for Eq. (13) for several values of δ and ε = −1. When δ = 0, the saddles with equal intensity occur.
Fig. 6. Two scale-space saddles are located on one manifold (middle); perturbing yields two nested manifolds, each with one scale-space saddle (left and right). Critical curves are represented by the dashed curves, the iso-manifolds by the continuous curves.
4.6
Two Saddle Points on a Manifold with the Same Intensity
Of course, the scale-space saddles do not have to lie on the same critical curve. In the situation that an iso-manifold contains two scale-space saddles, the local description needs two saddle branches and three extremum branches. Consequently, one needs a polynomial expression of order six, $L_6(x, t) = x^6 + 30x^4 t + 180x^2 t^2 + 120t^3$. Perturbations are of orders $L_3(x, t) = x^3 + 6xt$, $L_2(x, t) = x^2 + 2t$, and $L_1(x) = x$. Again, t perturbs the $O(x^4)$ terms. So the simplest description reads

$$L(x, y, t) = x^2 - y^2 + L_6(x, t) + \alpha L_3(x, t) + \beta L_2(x, t) + \gamma L_1(x) \qquad (14)$$
In Fig. 6 one can see the unperturbed situation in the middle, and two perturbed situations on the left and the right. The unstable situation with two scale-space saddles on one iso-manifold is a transition.
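The polynomials L6, L3 and L2 of Eq. (14), and the x⁵ part of Eq. (13) (which equals L5/120), are one-dimensional heat polynomials. The sketch below generates them from the standard closed form $n! \sum_k t^k x^{n-2k} / (k!\,(n-2k)!)$, which solves $L_t = L_{xx}$, and checks Eq. (14) against the 2D heat equation. The function name `heat_polynomial` is ours.

```python
import sympy as sp

x, y, t = sp.symbols("x y t", real=True)
alpha, beta, gamma = sp.symbols("alpha beta gamma", real=True)


def heat_polynomial(n):
    """n! * sum_k t^k x^(n-2k) / (k! (n-2k)!): a polynomial solution of L_t = L_xx."""
    return sp.expand(sum(
        sp.factorial(n) / (sp.factorial(k) * sp.factorial(n - 2 * k)) * t**k * x**(n - 2 * k)
        for k in range(n // 2 + 1)
    ))


L6, L3, L2 = heat_polynomial(6), heat_polynomial(3), heat_polynomial(2)
print(L6)
L14 = x**2 - y**2 + L6 + alpha * L3 + beta * L2 + gamma * x   # Eq. (14)
heat_gap = sp.expand(sp.diff(L14, t) - sp.diff(L14, x, 2) - sp.diff(L14, y, 2))
print(heat_gap)
```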
5
Consequences for the Tree Structure
For the tree structure these transitions imply the following results:
1. Two catastrophe points coincide on a critical curve: This has no direct influence, since the complete critical curve is used in the construction of the tree. This event merely describes a smoothing effect that allows one to reduce the number of catastrophes on a critical curve.
2. Two critical curves intersect: This event describes the change in ordering of the two child nodes “C” and “D” in the hierarchy when one catastrophe describes an annihilation. In the case that one describes a creation, it can be regarded as handing over a local creation-annihilation pair from one curve to another, which has no influence on the tree.
3. A catastrophe point coincides with a scale space saddle: This requires a total disappearance of the second order structure and is a co-dimension two event, i.e. not generic.
4. Two scale space saddles coincide: When going through a degenerated scale space saddle, two scale space saddles are created or annihilated. This does not influence the tree (although it may have some consequences when followed by the next event).
5. Two scale space saddles on a critical curve have the same value: In this case, another scale space saddle connects the two parts of the manifold. Although the intensity of the manifold changes continuously, the location of the scale space saddles changes discontinuously. In the tree structure this influences the information stored at the node.
6. If an iso-manifold under perturbation goes through a situation where it contains two scale space saddles on two different curves, the impact on the tree is a change of parent-child nodes, as shown in Fig. 7.

[Fig. 7: the trees before and after the transition, with internal nodes D2, C2, D3, C3 and leaves e1, e2, e3]
Fig. 7. The tree representations of the transition visualised in Fig. 6. The two scale-space saddles swap in hierarchy, which is a simple rotation of a parent-child pair of internal nodes.

6
Example
We illustrate the theory with the MR image shown in Fig. 8. Noise with a normal [0, 100] distribution is added, and a blurred version of both images is computed. We derived the pre-segmentations and tree structures of both images, shown in Fig. 9, as described in [2]. For visualisation purposes, we used the blurred versions as reference to keep the trees rather simple. The labels in the trees refer to the extrema in the blurred MR images as shown in Fig. 10. It is clear that a mapping based on the pre-segmentations of these images yields the pairs A1, B2, C3, E5, F6, and G7. This is also supported by the spatial locations of these extrema (which deviate by at most one pixel).
Fig. 8. From left to right: An MR image, its noisy version (although intensities increased, the graphical output is rescaled) and their blurred versions
[Fig. 9: tree with leaves R, G, D, E, F, A, C, B (MR image) and tree with leaves R, 7, 5, 6, 1, 3, 2, 4 (noisy MR image)]
Fig. 9. The tree structures of the MR image and the noisy MR image, respectively, with the blurred versions as starting scale
[Fig. 10: labelled pre-segmentations with segments A to H (MR image) and 1 to 8 (noisy MR image)]
Fig. 10. The labelled pre-segmentations, and the segment belonging to the left sub trees in the white matter for both MR images
The differences between the trees are the labels (extrema) D and 4. The operation on D is a simple deletion of a leaf. Leaf (extremum) 4 is added to the subtree spanned by extremum 1; its position is found by comparing the intensity of its scale space saddle with those of the saddles related to extremum 1. Alternatively, it can be considered as replacing leaf 1 by the building block with extrema 1 and 4, and then applying the subsequent rotations of the scale space saddle belonging to extremum 4: first with the node representing the scale space saddle related to extremum 3, then with the one related to extremum 2.
7
Summary and Discussion
We introduced degenerated scale space saddles in section 3 and discussed in section 4 the six possible simple situations that may occur when an extra constraint is imposed on the building blocks of the hierarchical structure in scale space, viz. the critical curves, catastrophe points, and (degenerated) scale space saddles. We showed that one of them (described in section 4.3) is a co-dimension two event, requiring two vanishing control parameters. The other cases are co-dimension one events and generic in a one-parameter setting. These cases describe transitions of structures that are non-generic in scale space but become generic when an additional constraint is allowed. This is useful when we want to change one image into another, i.e. for matching. The list in section 5 indicates that the hierarchical tree structure changes only with respect to the ordering of children, the information stored in the nodes, and the rotation of a parent-child node combination. The consequences of the standard events in scale space, viz. creation and annihilation of pairs of critical points, are the addition or removal of leaf elements. Together they form the possible changes of the tree under a one-parameter family of changes, give the grammar for relevant matching algorithms, cf. [7], and extend point-based methods like [4, 5, 6]. A simple example was shown in section 6.
References
1. Koenderink, J.J.: The structure of images. Biological Cybernetics 50, 363–370 (1984)
2. Kuijper, A., Florack, L.M.J.: Hierarchical pre-segmentation without prior knowledge. In: Proceedings of the 8th ICCV, pp. 487–493 (2001)
3. Kuijper, A.: Exploring and exploiting the structure of saddle points in Gaussian scale space. Computer Vision and Image Understanding 112(3), 337–349 (2008)
4. Kanters, F., Platel, B., Florack, L.M.J., ter Haar Romeny, B.: Content based image retrieval using multiscale top points. In: Griffin, L.D., Lillholm, M. (eds.) Scale-Space 2003. LNCS, vol. 2695, pp. 33–43. Springer, Heidelberg (2003)
5. Kanters, F., Lillholm, M., Duits, R., Janssen, B., Platel, B., Florack, L.M.J., ter Haar Romeny, B.: On image reconstruction from multiscale top points. In: Kimmel, R., Sochen, N.A., Weickert, J. (eds.) Scale-Space 2005. LNCS, vol. 3459, pp. 431–442. Springer, Heidelberg (2005)
6. Platel, B., Balmachnova, E., Florack, L.M.J., Kanters, F., ter Haar Romeny, B.: Using Top-Points as Interest Points for Image Matching. In: Fogh Olsen, O., Florack, L.M.J., Kuijper, A. (eds.) DSSCV 2005. LNCS, vol. 3753, pp. 211–222. Springer, Heidelberg (2005)
7. Fogh Olsen, O.: Tree edit distances from singularity theory. In: Kimmel, R., Sochen, N.A., Weickert, J. (eds.) Scale-Space 2005. LNCS, vol. 3459, pp. 316–326. Springer, Heidelberg (2005)
8. Lindeberg, T.: Scale-Space Theory in Computer Vision. The Kluwer International Series in Engineering and Computer Science. Kluwer Academic Publishers, Dordrecht (1994)
9. Damon, J.: Local Morse theory for solutions to the heat equation and Gaussian blurring. Journal of Differential Equations 115(2), 386–401 (1995)
10. Arnold, V.I.: Catastrophe Theory. Springer, Berlin (1984)
11. Koenderink, J.J.: A hitherto unnoticed singularity of scale-space. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(11), 1222–1224 (1989)
12. Kuijper, A., Florack, L.M.J.: The relevance of non-generic events in scale space models. International Journal of Computer Vision 57(1), 67–84 (2004)
Local Scale Measure for Remote Sensing Images
Bin Luo¹, Jean-François Aujol², and Yann Gousseau³
¹ CNES/DLR/ENST Competence Center and Telecom ParisTech
² CMLA, ENS Cachan, CNRS, UniverSud
³ Telecom ParisTech, LTCI CNRS
Abstract. This paper addresses the problem of defining a scale measure for digital images, that is, the problem of assigning meaningful scale information to each pixel. We propose a method relying on the set of level lines of an image, the so-called topographic map. We make use of the hierarchical structure of level lines to associate a level line with each pixel, enabling the computation of local scales. This computation is made under the assumption that blur is constant over the image, and is therefore adapted to the case of satellite images. We then investigate the link between the proposed definition of local scale and recent methods relying on total variation diffusion. Finally, we perform various experiments illustrating the spatial accuracy of the proposed approach.
1 Introduction
Scale is a fundamental concept in digital image analysis. In particular, computing local scale information is often a preliminary step for structure extraction, enabling the tuning of analysis tools and permitting scale-invariant analysis. In texture or remote sensing image analysis, scale itself is also a useful feature for recognition or classification. In this paper, we propose a new method relying on morphological tools to compute characteristic scales in a digital image. Roughly speaking, a pixel is characterized by the size of the most contrasted object it belongs to. To compute scales in an image, the approach initially proposed in [13] is probably the most widely used. The basic idea is to compute local scales as extrema of various differential operators in the linear scale-space. Based on similar ideas, it is proposed in [11, 23, 28] to estimate scales in an image by considering extrema in the linear scale-space of operators based on information theory. The linear scale-space has also been proposed in the framework of remote sensing image analysis as a convenient way to estimate a resolution-invariant characteristic scale [15]. However, such methods cannot achieve spatial accuracy in the computation of local scales, since Gaussian convolutions are well known to degrade the geometry of the image by diffusing its edges. Recently, it has been proposed to use total variation regularization to compute local scales [26, 27]. The underlying idea is that non-linear partial differential equations yield spatially accurate results. In [4], it is proposed to estimate the local scales of structures by looking at the way they evolve under the total variation flow.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 856–867, 2009. © Springer-Verlag Berlin Heidelberg 2009

In a different context, the mathematical morphology school has proposed to characterize materials through the size distribution of their constituents, using the concept of granulometry [10]. Similarly, it has been proposed in [16] to use the derivative of the granulometry, the pattern spectrum, to index gray-scale images. In the framework of remote sensing imaging, the authors of [7] have proposed, with a view to classifying satellite images, to compute size distributions (called derivative morphological profiles) at each pixel. In this paper, we first introduce a method to compute a local scale measure (a characteristic scale defined at each pixel) relying on the topographic map [5] of the image. The main idea is to associate with each pixel the scale of the most significant structure containing it. Contrary to previous morphological approaches, the proposed method is auto-dual (i.e. dark and bright structures are processed in the same way) and does not require any structuring element. More precisely, we use the Fast Level Set Transform (FLST) [20], an efficient tool to compute the topographic map, which represents an image by a hierarchical structure (an inclusion tree) of shapes. From this tree we search, for a given pixel, the most contrasted shape containing it, and we associate the scale of this shape with the pixel. Since remote sensing images are blurred by the PSF of the optical instrument, the contour of a single structure can be diffused into several level lines. We propose to group the level lines belonging to the same structure.
The criterion used to decide whether level lines should be grouped is based on the assumption that the optical blur is constant over the image, an assumption that makes sense for remote sensing images but would not hold for natural, everyday-life images. The second contribution of the paper is a study of the relationships between morphological definitions of scale and definitions relying on the total variation. We show that approaches relying on the total variation flow or regularization yield local scales that are defined, under some assumptions, as weighted averages of the size of the shapes containing a given pixel. The paper is organized as follows. In Sect. 2, we present our method for local scale computation. In Sect. 3, alternative variational definitions of scale are recalled and the link between these approaches and ours is investigated. In Sect. 4, we illustrate the method with numerical examples on remote sensing images and compare our approach with variational methods.
2 Local Scale Measure Based on the Topographic Map
2.1 Topographic Map
In this section we present the main tool used in this paper to define local scales: the topographic map of an image, as introduced in [5]. The topographic map is made of the set of level lines of an image. A level line is defined as a connected component of the topological boundary of a level set.
Equivalently, the topographic map can be seen as a collection of shapes, as defined below. For an image $u : \Omega \subset \mathbb{R}^2 \to \mathbb{R}$, its upper and lower level sets are respectively defined as

$$\Psi_\lambda = \{x \in \Omega : u(x) \ge \lambda\} \quad \text{and} \quad \Psi^\lambda = \{x \in \Omega : u(x) \le \lambda\},$$

for $\lambda \in \mathbb{R}$. Observe that $u$ can be reconstructed from either its upper level sets or its lower level sets. Moreover, these sets are globally invariant with respect to contrast changes. Each of these families, upper sets on the one hand and lower sets on the other hand, has a tree structure with respect to inclusion. Several authors [22, 5] have proposed the connected components of level sets as an efficient way to represent images. In order to obtain a unique tree structure of connected components, Monasse et al. [19, 20] introduced the FLST. A shape is defined as the union of a connected component of an upper or lower set together with its holes. The holes of a set $A$ are defined as the connected components of the complement of $A$ which do not intersect the boundary of $\Omega$. Under mild regularity assumptions, shapes correspond to the interiors of level lines, see [6]. An important property of the tree of shapes is its invariance to local contrast changes and its auto-duality, that is, its invariance with respect to the operation $u \mapsto M - u$. In practice, this implies that light and dark objects are treated in the same way, a property which enables us to associate a unique contrasted shape with each pixel. Figure 1 shows the result obtained with the FLST algorithm on a synthetic image. For a pixel $x$ of an image $u$, we denote by $\{f_i(x)\}_{i \in A(x)}$ the set of shapes that contain $x$, $A(x)$ being the set of indices such that $f_i(x) \subset f_{i+1}(x)$. For the sake of clarity, we omit the dependency on $x$ when it is not necessary. For each shape, we denote by $S(f_i)$ its area, $P(f_i)$ its perimeter, and $I(f_i)$ the gray level value associated with $f_i$. The contrast of the shape $f_i$ is then defined as the absolute value of the difference between the gray level values associated respectively with $f_{i+1}$ and $f_i$:

$$C(f_i) = |I(f_{i+1}) - I(f_i)| \qquad (1)$$

Fig. 1. Example of FLST: (a) synthetic image; (b) inclusion tree obtained with FLST.

2.2 Scales of an Image
Basically, we want to associate with each pixel a shape (i.e. a node in the FLST) from which its scale can be computed. Such shapes are obtained by filtering the topographic map. Shapes are recursively grouped in order to account for structures present in the image, and the most contrasted groups are kept. Shape grouping is defined by taking advantage of the particular structure of satellite images, for which the blur is constant over the image and depends only on the (usually known) PSF of the acquisition device. We then define the local scale associated with each pixel.

Most Contrasted Shape. In view of (1), the simplest definition of the most contrasted shape at a pixel $x$ would be the shape with the highest contrast among all shapes containing $x$, i.e.

$$\hat f(x) = f_{\arg\max_{i \in A(x)} C(f_i)}. \qquad (2)$$
However, this definition is not usable in practice. Indeed, the definition of contrast by (1) corresponds to the contrast of a given binary structure under the assumption that only one line is associated with the boundary of this structure. In a natural image, however, the contours of objects are always blurred by the acquisition process. As a consequence, since a discrete image is quantized, a contour is in fact associated with a set of level lines, and in practice the contrast of each line is often equal to one. Therefore, the choice of the most contrasted shape using (2) can be ambiguous at best, or even completely meaningless in the presence of strong noise. A possible solution to this problem has been proposed in [8]. It consists of computing the contrast of a line in a neighborhood, and then selecting the most meaningful line along each monotone branch of the tree. In the present work, we choose to group the level lines corresponding to a single structure by using a simple model of blur. To do so, we recursively sum up the contrasts of shapes $f_i$ and $f_{i+1}$ such that

$$S(f_{i+1}) - S(f_i) < \lambda P(f_i), \qquad (3)$$

where $\lambda$ is a constant. This criterion relies on the hypothesis that the level lines corresponding to a blurred contour are regularly separated by a distance $\lambda$. Let us remark that the hypothesis of a constant blurring kernel for the whole image is realistic in the case of satellite images. We thus define the cumulated contrast of a shape $f_i$ as

$$\bar C(f_i) = \sum_{k=a(i)}^{i} C(f_k), \qquad (4)$$

where, for all $i$, $a(i) = \min\{\, j \mid \forall k = j+1, \dots, i,\ S(f_k) - S(f_{k-1}) \le \lambda P(f_{k-1}) \,\}$. If $a(i)$ is not defined (that is, if (3) is not satisfied), then $\bar C(f_i) = C(f_i)$. The cumulated contrast of $f_i$ is therefore obtained by adding the contrasts of close enough level lines, which usually correspond to the same contour in the image. The most contrasted shape associated with $x$ is then defined as

$$\hat f_c(x) = f_{\arg\max_{i \in \mathbb{N}} \bar C(f_i(x))}. \qquad (5)$$
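The grouping rule (3), the cumulated contrast (4) and the selection (5) can be illustrated on a single branch of the tree of shapes. The following is a minimal sketch, not the authors' implementation. It assumes that the shapes containing a pixel are already available as a list ordered from the innermost shape $f_0$ outwards, each described by its area, perimeter and gray level; the function names are hypothetical, and the contrast of the outermost shape (which has no parent) is set to 0 by assumption. It uses the strict inequality of (3) (the definition of $a(i)$ uses $\le$; the two differ only on exact equality), and ties in (5) go to the smallest index, following the rule given in the text.

```python
def cumulated_contrasts(areas, perimeters, levels, lam=1.0):
    """Cumulated contrast (4) for each shape of one branch f_0 ⊂ f_1 ⊂ ... (hypothetical helper)."""
    n = len(areas)
    # Contrast (1): C(f_i) = |I(f_{i+1}) - I(f_i)|; the outermost shape gets 0 (assumption).
    contrast = [abs(levels[i + 1] - levels[i]) if i + 1 < n else 0 for i in range(n)]
    cumulated = []
    for i in range(n):
        total = contrast[i]
        k = i
        # Walk inwards while consecutive shapes satisfy criterion (3), i.e. down to a(i).
        while k > 0 and areas[k] - areas[k - 1] < lam * perimeters[k - 1]:
            k -= 1
            total += contrast[k]
        cumulated.append(total)
    return cumulated


def most_contrasted(cumulated):
    """Index selected by (5); on ties the smallest index (innermost shape) wins."""
    return max(range(len(cumulated)), key=lambda i: (cumulated[i], -i))


# Three close level lines of one blurred contour, then a larger enclosing structure.
areas = [10, 14, 18, 100, 104]
perimeters = [12, 13, 15, 36, 37]
levels = [5, 4, 3, 2, 1]
cum = cumulated_contrasts(areas, perimeters, levels, lam=1.0)
print(cum, most_contrasted(cum))  # [1, 2, 3, 1, 1] 2
```

In this example each level line alone has contrast 1, so (2) cannot distinguish them; the cumulated contrast singles out $f_2$, the outermost line of the blurred contour. The inner loop makes the sketch quadratic in the branch length; a running sum would make it linear.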
If the maximum is reached at more than one index, the smallest one is chosen. We conclude this section by noting that a method to group level lines relying on criteria similar to (3) (but using no perimeter information) was proposed in [19] as an efficient alternative to shock filters, in the framework of image restoration. Notice that Maximally Stable Extremal Regions (MSER) [18], which are popular in the computer vision community, can be seen as an alternative way of selecting the significant shape for a given pixel. MSERs are defined as the shapes for which the quantity $(S(f_{i+1}(x)) - S(f_{i-1}(x)))/S(f_i(x))$ reaches a local minimum. Observe that there may be several MSERs containing a pixel, so a further selection of shapes must be performed to compute the scale. It is shown in [14] that the selection method strongly biases the definition of local scale.

Level Lines, Edges and Blur. We now investigate the validity of using (3) to group lines corresponding to a single edge. Let $f_i$ and $f_{i+1}$ be two consecutive shapes corresponding to the same object, an object being defined as a constant times the indicator function of some set, smoothed by the acquisition kernel. Writing $q$ for the quantization step and neglecting sampling, we have, for some gray level $l$, $f_i = \Psi^l = \{x \in \Omega : u(x) \le l\}$ and $f_{i+1} = \Psi^{l+q}$. Now, as noticed in [9], if $x(s)$ is a parameterization of $\partial\Psi^l$, then $\partial\Psi^{l+q}$ can be approximated, for small $q$, by

$$\tilde x(s) = x(s) + q\,\frac{\nabla u}{|\nabla u|^2}. \qquad (6)$$
If we now assume that $|\nabla u| \ge C$ for some $C > 0$, then $f_{i+1} \subset f_i \oplus D(qC^{-1})$, where $\oplus$ stands for the Minkowski addition¹ and $D(r)$ is a disk of radius $r$ centered at the origin. On the other hand, assuming that $f_i$ is a convex set, the area of $f_i \oplus D(qC^{-1})$ is given by the Steiner formula [25]:

$$S\big(f_i \oplus D(qC^{-1})\big) = S(f_i) + \pi\left(\frac{q}{C}\right)^2 + P(f_i)\,\frac{q}{C}.$$

This suggests that (3) enables one to group level lines corresponding to the same edge as soon as $\lambda > qC^{-1}$.

¹ Let $A$ and $B$ be two sets; $A \oplus B = \{x + y : x \in A,\ y \in B\}$.
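The Steiner-formula argument can be checked numerically. The sketch below (hypothetical function names) assumes, as in the text, a convex shape and a lower bound $C$ on $|\nabla u|$; it compares the bound $\pi (q/C)^2 + P(f_i)\,q/C$ on the area gained between two consecutive level lines with the grouping threshold $\lambda P(f_i)$ of (3). For a disk the bound is exact, since $\pi (R + r)^2 - \pi R^2 = 2\pi R r + \pi r^2$.

```python
import math


def steiner_area_increase(perimeter, q, grad_min):
    """Steiner bound on S(f_{i+1}) - S(f_i) for a convex f_i, with r = q / C."""
    r = q / grad_min  # maximal normal displacement of the level line, from (6)
    return math.pi * r ** 2 + perimeter * r


def satisfies_criterion_3(perimeter, q, grad_min, lam):
    """True when the Steiner bound stays below lam * P(f_i), so (3) groups the two lines."""
    return steiner_area_increase(perimeter, q, grad_min) < lam * perimeter


R = 10.0  # disk of radius 10 pixels (illustrative value)
P = 2 * math.pi * R
print(satisfies_criterion_3(P, q=1.0, grad_min=2.0, lam=1.0))  # True: q/C = 0.5 < lambda
print(satisfies_criterion_3(P, q=1.0, grad_min=0.5, lam=1.0))  # False: q/C = 2 > lambda
```

The second case shows that a weak gradient (a wide blur) spreads consecutive level lines too far apart for $\lambda = 1$. Because of the $\pi r^2$ term, $\lambda > q/C$ is necessary but, for small shapes, not quite sufficient, which is consistent with the text saying only that (3) "suggests" the grouping.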
Scale Definition. We have seen above that the most contrasted shape at each pixel is defined by (5). In order to define the scale at each pixel, we choose to consider as the final shape associated with $x$ the shape $\hat f(x)$ minus the most contrasted shapes embedded inside it. Let us recall indeed that a shape is a connected component of a level set whose holes have been filled in. In Fig. 1, the shape F contains the pixels of the shape H. In satellite images occlusion is not predominant, and contrasted shapes containing other contrasted shapes often correspond to road or river networks. To accurately represent such structures, we define the most contrasted shape associated with a pixel $x$ as

$$\tilde f(x) = \hat f(x) \setminus \bigcup_{\hat f(y) \subsetneq \hat f(x)} \hat f(y), \qquad (7)$$

i.e. the shape $\hat f(x)$ minus the most contrasted shapes strictly embedded in it. Other choices would be possible in the framework of other applications. We choose to define the scale as

$$E(x) = \frac{S(\tilde f(x))}{P(\tilde f(x))} \qquad (8)$$

so that the geometry of $\tilde f(x)$ is taken into account. In particular, long and thin shapes (e.g. roads) correspond to relatively small scales, even though their area can be quite large.
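Definitions (7) and (8) can be illustrated on binary masks. The sketch below is an illustration under stated assumptions rather than the paper's implementation: shapes are given as NumPy boolean arrays, the embedded most contrasted shapes are supplied by the caller, and the perimeter is measured as the number of pixel edges between the shape and its complement (a crude estimator that overestimates the length of oblique contours). The function names are hypothetical, and an empty $\tilde f(x)$ is not handled.

```python
import numpy as np


def pixel_perimeter(mask):
    """Number of pixel edges separating the mask from its complement (4-connectivity)."""
    m = np.pad(np.asarray(mask, dtype=bool), 1)  # pad with False so border pixels count
    vertical = np.count_nonzero(m[1:, :] != m[:-1, :])
    horizontal = np.count_nonzero(m[:, 1:] != m[:, :-1])
    return int(vertical + horizontal)


def local_scale(shape, embedded_shapes=()):
    """E = S(f~)/P(f~), with f~ = shape minus the embedded contrasted shapes, as in (7)-(8)."""
    f_tilde = np.asarray(shape, dtype=bool).copy()
    for inner in embedded_shapes:
        f_tilde &= ~np.asarray(inner, dtype=bool)
    return np.count_nonzero(f_tilde) / pixel_perimeter(f_tilde)


block = np.zeros((20, 20), dtype=bool)
block[3:13, 3:13] = True            # 10 x 10 square: S = 100, P = 40
courtyard = np.zeros_like(block)
courtyard[6:10, 6:10] = True        # 4 x 4 embedded contrasted shape
road = np.zeros((20, 50), dtype=bool)
road[5:7, 5:45] = True              # 2 x 40 strip: S = 80, P = 84

print(local_scale(block))               # 2.5
print(local_scale(block, [courtyard]))  # 1.5
print(round(local_scale(road), 3))      # 0.952
```

The strip has almost the same area as the square but a much smaller scale, which is the behaviour the text describes for roads.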
3 Link with Variational Definitions of Scale
Recently, two definitions of local scale related to the total variation of images have been proposed in the literature. They are presented in Sect. 3.1. In Sect. 3.2, a geometrical interpretation of these definitions is given and the link between them is clarified. They are in fact closely connected with the scale definition of this paper: the approaches relying on the total variation define the scale at each pixel as a weighted average, over many shapes, of the ratio area/perimeter, whereas the approach proposed in this paper defines scale using the same ratio but relies on only one shape per pixel.

3.1 Variational Definitions of Scale
Definition Based on Total Variation Regularization. It is proposed in [27] to define the scales in an image by using the Rudin-Osher-Fatemi (ROF) model [21]. Recall that the ROF model (or total variation regularization) consists, given an image $f$, in finding the solution $u$ of

$$\inf_u \int |Du| + \frac{1}{2T}\,\|f - u\|_{L^2}^2. \qquad (9)$$

It is shown in [27] that if the scale of a set $E$ is defined as $S(E)/P(E)$ (i.e. its area divided by its perimeter, as done in (8)) and if $f$ is a binary image of a disk, then the intensity change $\delta$ between $u$ and $f$ inside this disk is inversely proportional to its scale, i.e. $\delta = T / (S(E)/P(E))$. Therefore, the idea in [27] is to define the scale at each pixel from the gray level difference between $u$ and $f$. The scale at each pixel $x$ is defined as

$$\mathrm{scale}(x) = T\,|u(x) - f(x)|^{-1}. \qquad (10)$$
Observe that, in general, this definition of scale depends on the parameter $T$.

Definition Based on the Total Variation Diffusion. Another definition of scale in images has been introduced in [4], making use of the properties of total variation diffusion. Let us recall that the solution $u$ of the total variation diffusion satisfies

$$\begin{cases} \dfrac{\partial u}{\partial t} = \operatorname{div}\left(\dfrac{Du}{|Du|}\right), \\[4pt] u(\cdot, 0) = f. \end{cases} \qquad (11)$$

In [24], the authors have proved the equivalence of total variation regularization (ROF model) and total variation diffusion for 1-dimensional signals. They have derived the same type of results as in [27] (where the functions considered were 2-dimensional radially symmetric signals). In particular, when using total variation diffusion on an image, a constant region evolves with speed $2/m$, where $m$ is the number of pixels in the considered region. Therefore, in [4] the authors have proposed to define the scale $m$ of a region (in dimension 1) as

$$m = \frac{2T}{\int_0^T |\partial_t u|\,dt},$$

where $T$ is the evolution time of the total variation diffusion. In the same paper, the following definition of scale $m$ is then proposed for 2-dimensional images:

$$m = \frac{T}{\int_0^T |\partial_t u|\,dt}. \qquad (12)$$

3.2 Equivalence and Geometrical Interpretation
As explained above, a geometrical interpretation of the scale definition given by (10) is provided by results from [27]. On the other hand, equivalence results between total variation regularization (see (9)) and total variation flow (see (11)) are provided in [24]. We now summarize some recent mathematical results in order to further investigate the definitions of scale given by (10) and (12), as well as to clarify the link with the definition of scale given in the present paper, (8). These results have been proved by V. Caselles and his collaborators in a series of papers [2, 3, 1]. In particular, it is shown that, if an image $f$ is the characteristic function of a convex set $C$, i.e. $f = \mathbb{1}_C$, then total variation regularization is equivalent to total variation flow. In both cases, the evolution speed of a convex body $C$ is $P(E)/|E|$, where $E$ is the Cheeger set of $C$ (see [1]), that is, $E$ is a solution of $\min_{K \subset C} P(K)/|K|$. The set $C$ is said to be a Cheeger set in itself if it is itself a solution of this minimization problem. In dimension 2, a necessary and sufficient condition for $C$ to be a Cheeger set in itself is that $C$ is convex and the curvature of $\partial C$ is smaller than $P(C)/|C|$. A disk is thus a Cheeger set in itself.

Assume that $f = \mathbb{1}_C$, with $C$ a Cheeger set in itself. Then it is shown in [3] that the solution of the total variation flow (11), or equivalently of the total variation regularization (9), is given by

$$u(x, T) = \max\left(1 - T\,\frac{P(C)}{|C|},\ 0\right) \mathbb{1}_C(x). \qquad (13)$$

The evolution speed of $C$ is thus $P(C)/|C|$ (in the case when $C$ is a disk, this is what was proved by Chan and Strong in [27]). As a consequence, when the image $f$ is the characteristic function of a Cheeger set, the two definitions of scale (10) and (12) are equivalent. Notice that in this particular case these two definitions of scale are also equivalent to the one proposed in this paper, (8): with all three definitions (10), (12), and (8), the scale of $C$ is equal to $|C|/P(C)$. Of course, for more complicated images, the equivalence no longer holds (see [14] for a detailed study).
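The equivalence for a Cheeger set can be verified numerically from the closed form (13). The sketch below (hypothetical helper names) evaluates (10) at time $T$, approximates the time integral in (12) by summing the absolute increments of $u$ on a uniform time grid, and compares both with the ratio (8). It assumes $f = \mathbb{1}_C$ for a disk $C$ and $T < |C|/P(C)$, so that $C$ has not yet vanished.

```python
import math


def u_inside(t, area, perimeter):
    """Value of the solution (13) inside C at time t: max(1 - t P(C)/|C|, 0)."""
    return max(1.0 - t * perimeter / area, 0.0)


def scale_rof(T, area, perimeter):
    """Definition (10) at a pixel of C, where f = 1."""
    return T / abs(u_inside(T, area, perimeter) - 1.0)


def scale_tv_flow(T, area, perimeter, steps=10_000):
    """Definition (12); the integral of |d_t u| is approximated by summed increments."""
    ts = [T * k / steps for k in range(steps + 1)]
    us = [u_inside(t, area, perimeter) for t in ts]
    variation = sum(abs(us[k + 1] - us[k]) for k in range(steps))
    return T / variation


def scale_ratio(area, perimeter):
    """Definition (8): area over perimeter."""
    return area / perimeter


r = 10.0  # disk of radius 10: |C|/P(C) = r/2 = 5
area, perimeter = math.pi * r ** 2, 2 * math.pi * r
T = 2.0   # must stay below |C|/P(C) = 5
values = (
    scale_rof(T, area, perimeter),
    scale_tv_flow(T, area, perimeter),
    scale_ratio(area, perimeter),
)
for value in values:
    print(round(value, 6))  # 5.0 each time
```

If $T$ exceeds $|C|/P(C)$, $u$ reaches 0 inside $C$ and both (10) and (12) return $T$ instead of $|C|/P(C)$, so the equivalence holds only for evolution times during which the set has not disappeared.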
4 Experiments
In this section, we compute scale maps on real satellite images using the approach presented in Sect. 2. We then compare these results with those obtained by variational methods. We first consider SPOT5 HMA images with a spatial resolution of 5 meters. Most contrasted shapes are extracted using (5). We choose a value of λ = 1. Of course, the choice of λ is related to the image acquisition process and to the image quantization. It is shown in Appendix A of [14] that, under reasonable assumptions, it makes sense to use λ = 1 in (3). Figures 2(a) and (d) display a SPOT5 HMA image of Toulouse (half urban area, half rural area) together with its computed scale map. It can be observed that the computed scales are spatially accurate (e.g. at the edges of buildings and warehouses). Moreover, these scales are in good qualitative agreement with the size of the structures in the image (large scales for the fields and the forest, smaller ones for the individual houses on the right). One also observes that computed scales are largely constant inside objects. Notice also that the road network is attributed relatively small scales, in agreement with (8). Figures 2(b) and (e) show the scale map obtained from a SPOT5 THR image [12] of Marseille with a resolution of 2.5 m. In this imaging system, two images captured by two different CCD line arrays are interpolated to generate the high-resolution image. The PSF of SPOT5 THR images is much more complicated than that of HMA images and cannot be modeled in a simple way, for instance using a Gaussian kernel. However, the slope of this PSF is sharp enough that a value of λ = 1 still allows grouping the level lines belonging to the same contour. Again, one can observe that the scales computed for the fields at the bottom are larger than those of the urban area at the top. Finally, we present the scale map for a QuickBird panchromatic image of Ouagadougou with a resolution of 0.6 m (see Fig. 2(c)). Again we use a value of λ = 1. Considering the high resolution of QuickBird images, shapes smaller than 16 pixels are not taken into consideration. This is equivalent to applying a grain filter of size 16 before computing the scale map, see [17]. The scale map is shown in Fig. 2(f). Again, it can be observed that for most structures, such as the big buildings at the top left, computed scales are spatially accurate. However, for the city block in the middle of the image, the scale of the shape corresponding to the whole block has been associated with all pixels of this shape. This shows one of the limitations of the method: each pixel is associated with the scale of exactly one structure. A natural extension of the method would be to compute a scale profile at each pixel, in a way similar to [7].

Fig. 2. (a) Image of Toulouse, SPOT5 (5 m) ©CNES; (b) image of Marseille, SPOT5 (2.5 m, 512 × 512) ©CNES; (c) image of Ouagadougou, QuickBird (0.6 m, 512 × 512) ©CNES; (d)–(f) scale maps corresponding to (a)–(c) obtained by the method proposed in this paper (notice the spatial accuracy of the method); (g)–(i) scale maps corresponding to (a)–(c) obtained by using the total variation diffusion, (12).
In Sect. 3, we compared our scale definition with the scale definitions proposed in [4] and [27], which are based respectively on the total variation flow and on the Rudin-Osher-Fatemi model. We also explained why these two approaches ([4] and [27]) are equivalent under some regularity assumptions. Therefore, in this section, we only use scale maps computed by the method of [4] for experimental comparisons. Figures 2(g)–(i) display the scale maps of the previously shown images of Toulouse, Marseille and Ouagadougou (see Figs. 2(a)–(c)), obtained by using total variation diffusion (see (12)). The total evolution time is T = 60. The first observation is that all these methods yield good spatial accuracy. Second, one can observe that the scale maps displayed in Figs. 2(g)–(i) are noisier than the ones obtained using the method presented in this paper, Figs. 2(d)–(f). Third, regions of the original images with homogeneous scales are more clearly identified in Figs. 2(d)–(f) than with the method using the total variation flow to define scales. It appears quite clearly on these three examples that the approach presented in this paper yields sharper results. This is probably due to the definition of scale given by (8): only one contrasted shape is selected for each pixel.
5 Conclusions and Perspectives
By using the topographic map, we have introduced a definition of local scale. The scale of a pixel is defined as the scale of the most contrasted shape containing this pixel. We have validated our approach on various satellite images (we refer the interested reader to [14] for more numerical examples). These experiments indicate that the method gives robust and spatially accurate results, without complex parameter tuning. However, it should be noted that this approach is dedicated to remote sensing images: indeed, when defining the contrast of level lines, we assume that blur is uniform over the image. Another contribution of this paper is the study of the links between the proposed method and previous variational definitions of scale. This partly bridges the gap between morphological and variational methods for computing scales in an image. We think that the proposed scale measure could be an efficient feature for image classification or segmentation; this should be the subject of further studies. Indeed, this feature can be expected to be complementary to more traditional indexing features obtained through wavelets or pixel statistics.
References
1. Alter, F., Caselles, V.: Uniqueness of the Cheeger set of a convex body (submitted, 2007) (preprint)
2. Bellettini, G., Caselles, V., Novaga, M.: The total variation flow in $\mathbb{R}^n$. Journal of Differential Equations 184, 475–525 (2002)
3. Bellettini, G., Caselles, V., Novaga, M.: Explicit solutions of the eigenvalue problem $-\operatorname{div}\left(\frac{Du}{|Du|}\right) = u$ in $\mathbb{R}^2$. SIAM Journal on Mathematical Analysis 36, 1095–1129 (2005)
4. Brox, T., Weickert, J.: A TV flow based local scale estimate and its application to texture discrimination. Journal of Visual Communication and Image Representation 17, 1053–1073 (2006)
5. Caselles, V., Coll, B., Morel, J.-M.: Topographic maps and local contrast changes in natural images. Int. J. Comp. Vision 33, 5–27 (1999)
6. Caselles, V., Monasse, P.: Geometric Description of Topographic Maps and Applications to Image Processing. Lecture Notes in Mathematics. Springer, Heidelberg (to appear, 2009)
7. Chanussot, J., Benediktsson, J., Fauvel, M.: Classification of remote sensing images from urban areas using a fuzzy possibilistic model. IEEE Geoscience and Remote Sensing Letters 3, 40–44 (2006)
8. Desolneux, A., Moisan, L., Morel, J.: Edge detection by Helmholtz principle. Journal of Mathematical Imaging and Vision 14, 271–284 (2001)
9. Dibos, F., Koepfler, G., Monasse, P.: Total variation minimization for scalar/vector regularization. In: Osher, S., Paragios, N. (eds.) Geometric Level Set Methods in Imaging, Vision, and Graphics, pp. 121–140 (2003)
10. Haas, A., Matheron, G., Serra, J.: Morphologie mathématique et granulométries en place. Annales des mines, 736–753 (1967)
11. Jägersand, M.: Saliency maps and attention selection in scale and spatial coordinates: An information theoretic approach. In: Proc. 5th Int. Conf. on Computer Vision, Cambridge, MA, USA, pp. 195–202 (1995)
12. Latry, C., Rouge, B.: SPOT5 THR mode. In: Proc. SPIE Earth Observing Systems III, October 1998, vol. 3493, pp. 480–491 (1998)
13. Lindeberg, T.: Feature detection with automatic scale selection. Int. J. of Computer Vision 30, 79–116 (1998)
14. Luo, B., Aujol, J.-F., Gousseau, Y.: Local scale measure from the topographic map and application to remote sensing images. SIAM Multiscale Modeling and Simulation (to appear)
15. Luo, B., Aujol, J.-F., Gousseau, Y., Ladjal, S., Maître, H.: Resolution independent characteristic scale dedicated to satellite images. IEEE Trans. on Image Processing 16, 2503–2514 (2007)
16. Maragos, P.: Pattern spectrum and multiscale shape representation. IEEE Trans. Pattern Anal. Mach. Intell. 11, 701–716 (1989)
17. Masnou, S., Morel, J.-M.: Image restoration involving connectedness. In: DIP 1997, pp. 84–95. SPIE (1997)
18. Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from maximally stable extremal regions. In: Proceedings of the British Machine Vision Conference, vol. 1, pp. 384–393 (2002)
19. Monasse, P.: Morphological representation of digital images and application to registration. PhD thesis, University Paris IX (2000)
20. Monasse, P., Guichard, F.: Fast computation of a contrast-invariant image representation. IEEE Trans. on Image Processing 9, 860–872 (2000)
21. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
22. Salembier, P., Serra, J.: Flat zones filtering, connected operators, and filters by reconstruction. IEEE Transactions on Image Processing 4, 1153–1160 (1995)
23. Sporring, J., Weickert, J.: On generalized entropies and scale-space. In: Scale-Space Theories in Computer Vision, pp. 53–64 (1997)
24. Steidl, G., Weickert, J., Brox, T., Mrázek, P., Welk, M.: On the equivalence of soft wavelet shrinkage, total variation diffusion, total variation regularization, and SIDEs. SIAM Journal on Numerical Analysis 42, 686–713 (2004)
25. Stoyan, D., Kendall, W.S., Mecke, J.: Stochastic Geometry and its Applications, 2nd edn. Wiley, Chichester (1995)
26. Strong, D., Aujol, J.-F., Chan, T.: Scale recognition, regularization parameter selection, and Meyer's G norm in total variation regularization. SIAM Journal on Multiscale Modeling and Simulation 5, 273–303 (2006)
27. Strong, D., Chan, T.: Edge-preserving and scale-dependent properties of total variation regularization. Inverse Problems 19, 165–187 (2003)
28. Winter, A., Maître, H., Cambou, N., Legrand, E.: An original multi-sensor approach to scale-based image analysis for aerial and satellite images. In: IEEE ICIP 1997, Santa Barbara, CA, USA, vol. II, pp. 234–237 (1997)
Author Index
Alrefaya, Musa 212; Andersson, Thord 124; Astola, Laura 224; Aubert, Gilles 137; Aujol, Jean-François 295, 856; Avenel, Christophe 576
Bae, Egil 1; Bardin, Sabine 770; Becciu, Alessandro 588; Becker, Florian 150; Benmansour, Fethallah 14, 648; Berkels, Benjamin 26; Bischof, Horst 200; Borga, Magnus 124; Borok, Sofia 490; Bourgine, Paul 38; Bresson, Xavier 112; Breuß, Michael 247, 636, 733, 758; Brune, Christoph 235; Burger, Martin 235; Burgeth, Bernhard 247
Chambolle, Antonin 368; Chan, Tony F. 112; Chessel, Anatole 770; Cinquin, Bertrand 770; Cohen, Laurent D. 14, 163, 648, 672; Crosier, Mike 343
Damerval, Christophe 782; Dascal, Lorina 259; DeCezaro, Adriano 50; Dinov, Ivo 389; Dong, Yiqiu 271; Drblíková, Olga 63; Duits, Remco 377, 795, 820; Durand, Sylvain 282; Duval, Vincent 295
Eirola, Timo 660; Elmoataz, Abderrahim 187; Elo, Christoffer A. 307
Fadili, Jalal 137, 282; Feigin, Micha 319; Felsberg, Michael 808; Florack, Luc 224, 377, 588; Franchini, Elena 75; Franken, Erik 795, 820; Frolkovič, Peter 38; Fundana, Ketut 684
Gabrielides, Nikolaos 672; Gilboa, Guy 527; Gousseau, Yann 856; Grasmair, Markus 331; Griffin, Lewis D. 343; Guillot, Laurence 87; Gurumoorthy, Karthik S. 100; Gustavsson, David 832
Haber, Eldad 612; Hahn, Jooyoung 490; Hajiaboli, Mohammad Reza 356; Hao, Dinh Nho 212; Heldmann, Stefan 612, 624; Heyden, Anders 684; Hintermüller, Michael 271; Houhou, Nawal 112
Imiya, Atsushi 175
Jalalzai, Khalid 368; Janssen, Bart 377; Jehan-Besson, Stephanie 137; Jindal, Nitin 696; Joshi, Shantanu H. 389; Jung, Miyoun 401
Kappes, Jörg 150; Keriven, Renaud 721; Kervrann, Charles 770; Kimmel, Ron 259; Kozerke, Sebastian 588; Kuijper, Arjan 844
Lai, Ming-Jun 514; Lassila, Toni 660; Läthén, Gunnar 124; Lauze, Francois 832; Lecellier, Francois 137; Le Guyader, Carole 87, 600; Leichtweis, Thomas 733; Leitão, Antonio 50; Lellmann, Jan 150; Lenz, Reiner 124; Lenzen, Frank 413; Lézoray, Olivier 187; Lillholm, Martin 343; Lucier, Bradley 514; Luo, Bin 856
Malyshev, Alexander 307; Marquina, Antonio 389; Meignen, Sylvain 782; Mémin, Etienne 576; Mikula, Karol 38, 63; Mille, Julien 163; Modersitzki, Jan 612; Morigi, Serena 75, 426
Ng, Michael K. 539; Nielsen, Mads 832; Nikolova, Mila 282, 439
Osher, Stanley J. 389; Overgaard, Niels Chr. 684
Papenberg, Nils 624; Pedersen, Kim S. 832; Pérez, Patrick 576; Peyriéras, Nadine 38, 63; Pizarro, Luis 247; Pock, Thomas 200; Prados, Emmanuel 696, 745
Rahman, Talal 307; Rangarajan, Anand 100; Reichel, Lothar 426; Remešíková, Mariana 38; Revenu, Marinette 137; Roode, Vivian 588; Rosman, Guy 259; Rumpf, Martin 709
Sahli, Hichem 212; Sakai, Tomoya 175; Salamero, Jean 770; Sawatzky, Alex 235; Scherzer, Otmar 413, 452; Schnörr, Christoph 150, 552; Segonne, Florent 721; Seidel, Hans-Peter 636; Setzer, Simon 464; Sgallari, Fiorella 75, 426; Soatto, Stefano 696; Sochen, Nir 319; Steidl, Gabriele 477, 552; Sturm, Peter 745; Szlam, Arthur 112
Ta, Vinh-Thong 187; Tai, Xue-Cheng 1, 50, 259, 490, 502, 539; ter Haar Romeny, Bart M. 588; Teuber, Tanja 477; Thiran, Jean-Philippe 112; Thorstensen, Nicolas 721; Toga, Arthur W. 389
Unger, Markus 200
van Assen, Hans 588; Vanhamel, Iris 212; Van Horn, John D. 389; van Sande, Justus 343; Vese, Luminita A. 295, 401, 600; Vogel, Oliver 733
Walch, Birgit 452; Wang, Jingyue 514; Weickert, Joachim 247, 527, 636, 733, 758; Welk, Martin 527; Werlberger, Manuel 200; Wirth, Benedikt 709; Wu, Chunlin 502
Yau, Andy C. 539; Yoon, Kuk-Jin 745; Yuan, Jing 150, 552
Zéraï, Mourad 565; Zimmer, Henning 636