Scale Space and Variational Methods in Computer Vision: Second International Conference, SSVM 2009, Voss, Norway, June 1-5, 2009. Proceedings (Lecture Notes in Computer Science)


Author: Xue-Cheng Tai | Knut Morken | Marius Lysaker | Knut-Andreas Lie (eds.)


Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board

David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, University of Dortmund, Germany
Madhu Sudan, Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany

5567

Xue-Cheng Tai Knut Mørken Marius Lysaker Knut-Andreas Lie (Eds.)

Scale Space and Variational Methods in Computer Vision Second International Conference, SSVM 2009 Voss, Norway, June 1-5, 2009 Proceedings


Volume Editors

Xue-Cheng Tai
Department of Mathematics, University of Bergen, Norway
and Division of Mathematical Science, Nanyang Technological University, Singapore
E-mail: [email protected]

Knut Mørken
Department of Informatics and Centre of Mathematics for Applications, University of Oslo, Norway
E-mail: [email protected]

Marius Lysaker
Simula Research Laboratory, Lysaker, Norway
E-mail: [email protected]

Knut-Andreas Lie
Centre of Mathematics for Applications, University of Oslo, Norway
and SINTEF ICT, Oslo, Norway
E-mail: [email protected]

Library of Congress Control Number: Applied for
CR Subject Classification (1998): I.4, I.5, I.3.5, I.2.10, G.1.2, F.2.2
LNCS Sublibrary: SL 6 – Image Processing, Computer Vision, Pattern Recognition, and Graphics
ISSN 0302-9743
ISBN-10 3-642-02255-3 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-02255-5 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. springer.com © Springer-Verlag Berlin Heidelberg 2009 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 12689675 06/3180 543210

Preface

This book contains 71 original, scientific articles that address state-of-the-art research related to scale space and variational methods for image processing and computer vision. Topics covered in the book range from mathematical analysis of both established and new models, fast numerical methods, image analysis, segmentation, registration, surface and shape construction and processing, to real applications in medical imaging and computer vision. The ideas of scale space and variational methods related to partial differential equations are central concepts. The papers reflect the newest developments in these fields and also point to the latest literature.

All the papers were submitted to the Second International Conference on Scale Space and Variational Methods in Computer Vision, which took place in Voss, Norway, during June 1–5, 2009. The papers underwent a peer review process similar to that of high-level journals in the field. We thank the authors, the Scientific Committee, the Program Committee and the reviewers for their hard work and helpful collaboration. Their contribution has been crucial for the efficient processing of this book, and for the success of the conference.

Finally, we wish to thank those who have supported and helped to organize the conference. First and foremost it is a pleasure to acknowledge the generous financial support from the Centre of Mathematics for Applications (CMA) at the University of Oslo and the Research Council of Norway. In addition, partial support was given by the Centre of Integrated Petroleum Research (CIPR) at the University of Bergen and the Simula Research Laboratory (SRL). Moreover, we would like to thank Tiril P. Gurholt and Andrew McMurry for their support, both with technical and administrative matters. Members and students from the Mathematical Imaging and Vision Group at the Nanyang Technological University of Singapore and the University of Bergen, Norway deserve special thanks for their kind help.

March 2009

Xue-Cheng Tai Knut Mørken Marius Lysaker Knut-Andreas Lie

Organization

Organizing Committee and Editors

Xue-Cheng Tai, University of Bergen, Norway, and Nanyang Technological University, Singapore, Conference Chair
Knut-Andreas Lie, Sintef, Norway
Marius Lysaker, Simula Research Laboratory, Norway
Knut Mørken, University of Oslo, Norway

Scientific Committee

Alfred M. Bruckstein, Technion IIT, Israel
Tony F. Chan, University of California at LA, USA
Mads Nielsen, University of Copenhagen, Denmark
Stanley Osher, University of California at LA, USA
Nikos Paragios, Ecole Centrale de Paris, France
Bart M. ter Haar Romeny, Eindhoven University of Technology, The Netherlands
Christoph Schnoerr, University of Heidelberg, Germany
Fiorella Sgallari, University of Bologna, Italy
Joachim Weickert, Saarland University, Germany

Program Committee

Luis Alvarez, Noura Azzabou, Thomas Brox, Bernhard Burgeth, Vicent Caselles, Raymond Chan, Yeowmeng Chee, Yunmei Chen, Daniel Cremers, Francoise Dibos, Michael Felsberg, Luc Florack, Lewis Griffin, Anders Heyden, Charles Kervrann, Ron Kimmel, Arjan Kuijper, Georg Langs, Antonio Leitao, Riccardo March, Antonio Marquina, Etienne Memin, Karol Mikula, Jan Modersitzki, Michael Ng, Mila Nikolova, Martin Rumpf, Otmar Scherzer, Nir Sochen, Gabriele Steidl, Demetri Terzopoulos, David Tschumperle, Baba C. Vemuri, Hongkai Zhao, Haomin Zhou


Invited Speakers

Antonin Chambolle, CMAP - Ecole Polytechnique, France
Raymond Chan, The Chinese University of Hong Kong, China
Amiram Grinvald, Weizmann Institute of Science, Israel

Sponsoring Institutions

Centre of Mathematics for Applications, University of Oslo
Research Council of Norway
Centre of Integrated Petroleum Research, University of Bergen
Simula Research Laboratory

Table of Contents

Segmentation and Detection

Graph Cut Optimization for the Piecewise Constant Level Set Method Applied to Multiphase Image Segmentation
Egil Bae and Xue-Cheng Tai

Tubular Anisotropy Segmentation
Fethallah Benmansour and Laurent D. Cohen

An Unconstrained Multiphase Thresholding Approach for Image Segmentation
Benjamin Berkels

Extraction of the Intercellular Skeleton from 2D Images of Embryogenesis Using Eikonal Equation and Advective Subjective Surface Method
Paul Bourgine, Peter Frolkovič, Karol Mikula, Nadine Peyriéras, and Mariana Remešíková

On Level-Set Type Methods for Recovering Piecewise Constant Solutions of Ill-Posed Problems
Adriano DeCezaro, Antonio Leitão, and Xue-Cheng Tai

The Nonlinear Tensor Diffusion in Segmentation of Meaningful Biological Structures from Image Sequences of Zebrafish Embryogenesis
Olga Drblíková, Karol Mikula, and Nadine Peyriéras

Composed Segmentation of Tubular Structures by an Anisotropic PDE Model
Elena Franchini, Serena Morigi, and Fiorella Sgallari

Extrapolation of Vector Fields Using the Infinity Laplacian and with Applications to Image Segmentation
Laurence Guillot and Carole Le Guyader

A Schrödinger Equation for the Fast Computation of Approximate Euclidean Distance Functions
Karthik S. Gurumoorthy and Anand Rangarajan

Semi-supervised Segmentation Based on Non-local Continuous Min-Cut
Nawal Houhou, Xavier Bresson, Arthur Szlam, Tony F. Chan, and Jean-Philippe Thiran

Momentum Based Optimization Methods for Level Set Segmentation
Gunnar Läthén, Thord Andersson, Reiner Lenz, and Magnus Borga

Optimization of Divergences within the Exponential Family for Image Segmentation
Francois Lecellier, Stephanie Jehan-Besson, Jalal Fadili, Gilles Aubert, and Marinette Revenu

Convex Multi-class Image Labeling by Simplex-Constrained Total Variation
Jan Lellmann, Jörg Kappes, Jing Yuan, Florian Becker, and Christoph Schnörr

Geodesically Linked Active Contours: Evolution Strategy Based on Minimal Paths
Julien Mille and Laurent D. Cohen

Validation of Watershed Regions by Scale-Space Statistics
Tomoya Sakai and Atsushi Imiya

Adaptation of Eikonal Equation over Weighted Graph
Vinh-Thong Ta, Abderrahim Elmoataz, and Olivier Lézoray

A Variational Model for Interactive Shape Prior Segmentation and Real-Time Tracking
Manuel Werlberger, Thomas Pock, Markus Unger, and Horst Bischof

Image Enhancement and Reconstruction

A Nonlinear Probabilistic Curvature Motion Filter for Positron Emission Tomography Images
Musa Alrefaya, Hichem Sahli, Iris Vanhamel, and Dinh Nho Hao

Finsler Geometry on Higher Order Tensor Fields and Applications to High Angular Resolution Diffusion Imaging
Laura Astola and Luc Florack

Bregman-EM-TV Methods with Application to Optical Nanoscopy
Christoph Brune, Alex Sawatzky, and Martin Burger

PDE-Driven Adaptive Morphology for Matrix Fields
Bernhard Burgeth, Michael Breuß, Luis Pizarro, and Joachim Weickert

On Semi-implicit Splitting Schemes for the Beltrami Color Flow
Lorina Dascal, Guy Rosman, Xue-Cheng Tai, and Ron Kimmel

Multi-scale Total Variation with Automated Regularization Parameter Selection for Color Image Restoration
Yiqiu Dong and Michael Hintermüller

Multiplicative Noise Cleaning via a Variational Method Involving Curvelet Coefficients
Sylvain Durand, Jalal Fadili, and Mila Nikolova

Projected Gradient Based Color Image Decomposition
Vincent Duval, Jean-François Aujol, and Luminita Vese

A Dual Formulation of the TV-Stokes Algorithm for Image Denoising
Christoffer A. Elo, Alexander Malyshev, and Talal Rahman

Anisotropic Regularization for Inverse Problems with Application to the Wiener Filter with Gaussian and Impulse Noise
Micha Feigin and Nir Sochen

Locally Adaptive Total Variation Regularization
Markus Grasmair

Basic Image Features (BIFs) Arising from Approximate Symmetry Type
Lewis D. Griffin, Martin Lillholm, Mike Crosier, and Justus van Sande

An Anisotropic Fourth-Order Partial Differential Equation for Noise Removal
Mohammad Reza Hajiaboli

Enhancement of Blurred and Noisy Images Based on an Original Variant of the Total Variation
Khalid Jalalzai and Antonin Chambolle

Coarse-to-Fine Image Reconstruction Based on Weighted Differential Features and Background Gauge Fields
Bart Janssen, Remco Duits, and Luc Florack

Edge-Enhanced Image Reconstruction Using (TV) Total Variation and Bregman Refinement
Shantanu H. Joshi, Antonio Marquina, Stanley J. Osher, Ivo Dinov, John D. Van Horn, and Arthur W. Toga

Nonlocal Variational Image Deblurring Models in the Presence of Gaussian or Impulse Noise
Miyoun Jung and Luminita A. Vese

A Geometric PDE for Interpolation of M-Channel Data
Frank Lenzen and Otmar Scherzer

An Edge-Preserving Multilevel Method for Deblurring, Denoising, and Segmentation
Serena Morigi, Lothar Reichel, and Fiorella Sgallari

Fast Dejittering for Digital Video Frames
Mila Nikolova

Sparsity Regularization for Radon Measures
Otmar Scherzer and Birgit Walch

Split Bregman Algorithm, Douglas-Rachford Splitting and Frame Shrinkage
Simon Setzer

Anisotropic Smoothing Using Double Orientations
Gabriele Steidl and Tanja Teuber

Image Denoising Using TV-Stokes Equation with an Orientation-Matching Minimization
Xue-Cheng Tai, Sofia Borok, and Jooyoung Hahn

Augmented Lagrangian Method, Dual Methods and Split Bregman Iteration for ROF Model
Xue-Cheng Tai and Chunlin Wu

The Convergence of a Central-Difference Discretization of Rudin-Osher-Fatemi Model for Image Denoising
Ming-Jun Lai, Bradley Lucier, and Jingyue Wang

Theoretical Foundations for Discrete Forward-and-Backward Diffusion Filtering
Martin Welk, Guy Gilboa, and Joachim Weickert

L0-Norm and Total Variation for Wavelet Inpainting
Andy C. Yau, Xue-Cheng Tai, and Michael K. Ng

Total-Variation Based Piecewise Affine Regularization
Jing Yuan, Christoph Schnörr, and Gabriele Steidl

Image Denoising by Harmonic Mean Curvature Flow
Mourad Zéraï

Motion Analysis, Optical Flow, Registration and Tracking

Tracking Closed Curves with Non-linear Stochastic Filters
Christophe Avenel, Etienne Mémin, and Patrick Pérez

A Multi-scale Feature Based Optic Flow Method for 3D Cardiac Motion Estimation
Alessandro Becciu, Hans van Assen, Luc Florack, Sebastian Kozerke, Vivian Roode, and Bart M. ter Haar Romeny

A Combined Segmentation and Registration Framework with a Nonlinear Elasticity Smoother
Carole Le Guyader and Luminita A. Vese

A Scale-Space Approach to Landmark Constrained Image Registration
Eldad Haber, Stefan Heldmann, and Jan Modersitzki

A Variational Approach for Volume-to-Slice Registration
Stefan Heldmann and Nils Papenberg

Hyperbolic Numerics for Variational Approaches to Correspondence Problems
Henning Zimmer, Michael Breuß, Joachim Weickert, and Hans-Peter Seidel

Surfaces and Shapes

From a Single Point to a Surface Patch by Growing Minimal Paths
Fethallah Benmansour and Laurent D. Cohen

Optimization of Convex Shapes: An Approach to Crystal Shape Identification
Timo Eirola and Toni Lassila

An Implicit Method for Interpolating Two Digital Closed Curves on Parallel Planes
Nikolaos Gabrielides and Laurent Cohen

Pose Invariant Shape Prior Segmentation Using Continuous Cuts and Gradient Descent on Lie Groups
Niels Chr. Overgaard, Ketut Fundana, and Anders Heyden

A Non-local Approach to Shape from Ambient Shading
Emmanuel Prados, Nitin Jindal, and Stefano Soatto

An Elasticity Approach to Principal Modes of Shape Variation
Martin Rumpf and Benedikt Wirth

Pre-image as Karcher Mean Using Diffusion Maps: Application to Shape and Image Denoising
Nicolas Thorstensen, Florent Segonne, and Renaud Keriven

Fast Shape from Shading for Phong-Type Surfaces
Oliver Vogel, Michael Breuß, Thomas Leichtweis, and Joachim Weickert

Generic Scene Recovery Using Multiple Images
Kuk-Jin Yoon, Emmanuel Prados, and Peter Sturm

Scale Space and Feature Extraction

Highly Accurate PDE-Based Morphology for General Structuring Elements
Michael Breuß and Joachim Weickert

Computational Geometry-Based Scale-Space and Modal Image Decomposition: Application to Light Video-Microscopy Imaging
Anatole Chessel, Bertrand Cinquin, Sabine Bardin, Jean Salamero, and Charles Kervrann

Highlight on a Feature Extracted at Fine Scales: The Pointwise Lipschitz Regularity
Christophe Damerval and Sylvain Meignen

Line Enhancement and Completion via Linear Left Invariant Scale Spaces on SE(2)
Remco Duits and Erik Franken

Spatio-Featural Scale-Space
Michael Felsberg

Scale Spaces on the 3D Euclidean Motion Group for Enhancement of HARDI Data
Erik Franken and Remco Duits

On the Rate of Structural Change in Scale Spaces
David Gustavsson, Kim S. Pedersen, Francois Lauze, and Mads Nielsen

Transitions of a Multi-scale Image Hierarchy Tree
Arjan Kuijper

Local Scale Measure for Remote Sensing Images
Bin Luo, Jean-François Aujol, and Yann Gousseau

Author Index

Graph Cut Optimization for the Piecewise Constant Level Set Method Applied to Multiphase Image Segmentation

Egil Bae (1) and Xue-Cheng Tai (2)

(1) Department of Mathematics, University of Bergen, Norway [email protected]
(2) Department of Mathematics, University of Bergen, Norway and Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore [email protected]

Abstract. The piecewise constant level set method (PCLSM) has recently emerged as a variant of the level set method for variational interphase problems. Traditionally, the Euler-Lagrange equations are solved by an iterative numerical method for PDEs, which is normally slow. In this work, we focus on the PCLSM applied to the multiphase Mumford-Shah model for image segmentation. Instead of solving the Euler-Lagrange equations of the resulting minimization problem, we propose an efficient combinatorial optimization technique based on graph cuts. Because of a simplification of the length term in the energy induced by the PCLSM, the minimization problem is not NP-hard. Numerical experiments on image segmentation demonstrate that the new approach is far more efficient while maintaining the same quality.

1 Introduction

The level set method [1, 2] is a powerful tool for interphase problems. It has numerous applications in computer vision, fluid dynamics and inverse problems. The interphase is implicitly represented by a higher dimensional level set function. Originally, signed distance functions were used as level set functions. Later, the work of [3, 4, 5] introduced piecewise constant level set functions, which represent the interphases by discontinuities. This has certain advantages, such as the ability to represent several interphases by a single level set function. This method will be referred to as the piecewise constant level set method (PCLSM).

In computer vision, the level set method has been applied with great success to image segmentation. Of particular importance is the Mumford-Shah model [6],

Support from the Norwegian Research Council (eVita project 166075), National Science Foundation of Singapore (NRF2007IDM-IDM002-010) and Ministry of Education of Singapore (Moe Tier 2 T207B2202) are gratefully acknowledged.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 1–13, 2009. © Springer-Verlag Berlin Heidelberg 2009


which is an established image segmentation model. In [7, 8], Chan and Vese proposed a numerical realization of this model based on traditional level set functions. In [3, 4, 5], piecewise constant level set functions were proposed. Both approaches lead to a system of nonlinear PDEs that needs to be solved numerically, and both are computationally expensive.

This work aims to significantly reduce the computational cost of the piecewise constant level set method for the multiphase Mumford-Shah model. The length term in the energy is often simplified when this model is represented by piecewise constant level set functions. We will show that this simplification makes it possible to efficiently compute global minimizers via graph cuts, when the mean image intensity value in each phase is known. Graph cuts is a well-known technique in image analysis and computer vision [9, 10, 11, 12]. Usually, NP-hard multilabeling problems are approached by constructing algorithms for finding approximate suboptimal solutions, such as alpha expansion [12]. We instead do the approximation in the model, and then compute the exact solution of the approximate model. The graph used for optimization is constructed as in [13, 14, 15], except for some small modifications. Finally, for unknown mean intensity values, an iterative algorithm is presented, which we believe will have great practical value because of its efficiency.

For two phases, some work on graph cut optimization for the Mumford-Shah model has been done in [16, 17]. A multiphase approach based on graph cuts has also recently been presented in [18]. The process starts by splitting the image into two regions by solving the two-phase Mumford-Shah model to optimality. In the next step, each new region is split in two by solving the two-phase Mumford-Shah model within that region. The process is repeated until the intensity variation within each region falls below a predefined threshold. The limitation of this approach is that it ignores the possibility that a region may evolve. For instance, the optimal interphase for two regions may not be a subset of the optimal interphases for three regions. An experiment in Section 4 will clarify this.

The paper is organized as follows: Section 2 gives a brief overview of the piecewise constant level set method and the Mumford-Shah model. Section 3 presents the new integer optimization approach, while numerical experiments are presented in Section 4.

2 Image Segmentation and the PCLSM

2.1 The Mumford-Shah Model

The Mumford-Shah model [6] is an established image segmentation model with a wide range of applications. Let $u^0$ be the input image. In the most common variant with closed contours, one seeks a partition $\{\Omega_i\}_{i=1}^n$ of the image domain $\Omega$, and an approximation image $u$ which minimizes the functional

$$E(u, \Gamma_i) = \int_\Omega (u - u^0)^2 \, dx + \mu \int_{\Omega \setminus \cup_i \Gamma_i} |\nabla u|^2 \, dx + \nu \sum_{i=1}^n \int_{\Gamma_i} ds, \tag{1}$$


where $\{\Gamma_i\}_{i=1}^n$ denotes the interphases between the regions $\{\Omega_i\}_{i=1}^n$. Often $u$ is assumed to be constant within each phase, in which case the second term disappears and one ends up with the simpler version

$$E(\{c_i\}, \Gamma_i) = \int_\Omega (u - u^0)^2 \, dx + \nu \sum_{i=1}^n \int_{\Gamma_i} ds, \tag{2}$$

where $u = \sum_{i=1}^n c_i \Psi_i$, and $\Psi_i$ is the characteristic function of $\Omega_i$. As a numerical realization, Chan and Vese [7, 8] proposed to represent the above functional with level set functions, and solve the resulting gradient descent equations numerically. In order to represent $n$ phases, $\log_2(n)$ level set functions were required. For any $n > 2$ the length term had to be simplified.

2.2 Piecewise Constant Level Set Functions

In [3, 4, 19], the piecewise constant level set method was proposed instead and applied to the Mumford-Shah model. This approach has certain benefits, such as the ability to represent any number of phases with a single level set function. Let $\{\Omega_i\}_{i=1}^n$ be a partition of the domain $\Omega$ into $n$ regions. Any such partition can be described by a piecewise constant level set function $\phi$ as follows:

$$\phi = i \quad \text{in } \Omega_i \quad \text{for } i = 1, 2, \ldots, n. \tag{3}$$

Note that all interphases are represented by discontinuities in $\phi$. The Mumford-Shah functional can now be written in terms of $\phi$:

$$E(c, \phi) = \int_\Omega (u - u^0)^2 \, dx + \frac{\nu}{2} \sum_{i=1}^n \int_\Omega |\nabla \psi_i| \, dx, \tag{4}$$

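where $u = \sum_{i=1}^n c_i \psi_i$, and $\psi_i$ is the characteristic function of $\Omega_i$. It can be derived from the level set function by

$$\psi_i = \frac{1}{\alpha_i} \prod_{j=1,\, j \neq i}^{n} (\phi - j) \quad \text{with} \quad \alpha_i = \prod_{k=1,\, k \neq i}^{n} (i - k). \tag{5}$$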
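As a quick check of formula (5), the following sketch evaluates the products numerically and confirms that each $\psi_i$ equals 1 on $\Omega_i$ and 0 elsewhere. It is a minimal illustration only: the function name `characteristic_functions`, the small test array and the values in `c` are assumptions made for this example and do not come from the paper.

```python
import numpy as np

def characteristic_functions(phi, n):
    """Evaluate psi_1, ..., psi_n of Eq. (5) for an integer-valued phi.

    Returns an array of shape (n,) + phi.shape; entry i-1 holds psi_i.
    (Hypothetical helper, written only to illustrate the formula.)
    """
    phi = np.asarray(phi, dtype=float)
    psi = np.empty((n,) + phi.shape)
    for i in range(1, n + 1):
        others = [j for j in range(1, n + 1) if j != i]
        alpha_i = np.prod([i - k for k in others])  # alpha_i = prod_{k != i} (i - k)
        numerator = np.ones_like(phi)
        for j in others:
            numerator *= phi - j                    # prod_{j != i} (phi - j)
        psi[i - 1] = numerator / alpha_i
    return psi

# Toy partition of a 2 x 3 grid into n = 3 phases (illustrative values).
phi = np.array([[1, 1, 2],
                [3, 3, 2]])
c = np.array([0.1, 0.5, 0.9])

psi = characteristic_functions(phi, n=3)
u = np.tensordot(c, psi, axes=1)  # u = sum_i c_i psi_i
```

Because the $\psi_i$ are indicator functions, `u` simply takes the value `c[i - 1]` wherever `phi == i`.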

The length term can be approximated by the total variation of the level set function itself, especially when the number of phases is not too large:

$$E(c, \phi) = \int_\Omega (u - u^0)^2 \, dx + \nu \int_\Omega |\nabla \phi| \, dx, \tag{6}$$

see for instance [5] for a justification. Most often this approximation is preferred, since it is computationally easier. Such a simplification of the length term has also been made in [20, 21] among others for multiphase image segmentation. In this work we will consider (6).


There are some variants of the total variation regularization term. The commonly used version is the isotropic total variation:

$$TV_2(\phi) = \int_\Omega |\nabla \phi|_2 \, dx = \int_\Omega \sqrt{|\phi_{x_1}|^2 + |\phi_{x_2}|^2} \, dx.$$

In order to simplify computation, often a simpler version is used:

$$TV_1(\phi) = \int_\Omega |\nabla \phi|_1 \, dx = \int_\Omega |\phi_{x_1}| + |\phi_{x_2}| \, dx.$$

However, since $TV_1$ is not isotropic, regularization will be stronger in certain directions. A more isotropic version based on the 1-norm can be obtained by splitting $TV_1$ using the original gradient operator, and one rotated counterclockwise $\pi/4$ radians:

$$TV_{1,\frac{\pi}{4}}(\phi) = \frac{1}{2} \int_\Omega |\nabla \phi(x)|_1 + |R_{\frac{\pi}{4}} \nabla \phi(x)|_1 \, dx,$$

where $R_{\frac{\pi}{4}} \nabla$ is the gradient in the rotated coordinate system. It is also possible to create even more isotropic versions by considering more such rotations.

Previous attempts to minimize (6) have been made by continuous optimization. In order to force a solution taking only integer values, the following constraint was imposed:

$$K(\phi) = \prod_{i=1}^n (\phi - i) = 0. \tag{7}$$
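The directional bias of the total variation variants discussed above can be made concrete by evaluating the three integrands for a unit gradient pointing in different directions. The sketch below does only that; it is not a discretization of the integrals. The function names and the one-degree sampling of directions are assumptions made for this illustration.

```python
import numpy as np

def tv2_integrand(g1, g2):
    """|grad phi|_2: isotropic, independent of direction."""
    return np.hypot(g1, g2)

def tv1_integrand(g1, g2):
    """|grad phi|_1: depends on the direction of the gradient."""
    return np.abs(g1) + np.abs(g2)

def tv1_rotated_integrand(g1, g2):
    """0.5 * (|g|_1 + |R g|_1), with R g the gradient expressed in
    coordinates rotated by pi/4 (axes (1,1)/sqrt(2) and (-1,1)/sqrt(2))."""
    r1 = (g1 + g2) / np.sqrt(2.0)
    r2 = (g2 - g1) / np.sqrt(2.0)
    return 0.5 * (tv1_integrand(g1, g2) + tv1_integrand(r1, r2))

def spread(values):
    """Ratio of largest to smallest value; 1 means perfectly isotropic."""
    return values.max() / values.min()

# Unit gradients sampled every degree (illustrative choice).
angles = np.deg2rad(np.arange(0, 360))
g1, g2 = np.cos(angles), np.sin(angles)

spread_tv2 = spread(tv2_integrand(g1, g2))          # 1 up to rounding
spread_tv1 = spread(tv1_integrand(g1, g2))          # about 1.414 (sqrt(2))
spread_rot = spread(tv1_rotated_integrand(g1, g2))  # about 1.08
```

Under these assumptions the rotated variant reduces the worst-case directional bias of $TV_1$ from a factor of about 1.41 to about 1.08, which is the sense in which it is "more isotropic".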

The constrained optimization problem (6) and (7) could be solved by the augmented Lagrangian method as in [3, 4, 19]. Some attempts to speed up the computation can be found in [5]. In the next section we propose to solve the minimization problem by graph cuts.

We start by discretizing the variational problem (6) on a grid $P$ of mesh size $\delta = 1$. For each $p = (i, j) \in P$, define the neighborhood systems $N_4(p) = \{(i \pm 1, j), (i, j \pm 1)\}$ and $N_8(p) = \{(i \pm 1, j), (i, j \pm 1), (i \pm 1, j \pm 1)\}$. The modification of the definition for boundary points is clear. The discrete energy function can now be written compactly as

$$E_d(c, \phi) = \sum_{p \in P} \delta^2 (u_p - u_p^0)^2 + \nu \sum_{p \in P} \sum_{q \in N_k(p)} \frac{1}{2} w_{pq} |\phi_p - \phi_q|, \tag{8}$$

where $k = 4$ for $TV_1$ and $k = 8$ for $TV_{1,\frac{\pi}{4}}$. The weights $w_{pq}$ are given by $w_{pq} = \frac{4\delta^2}{k \|p - q\|_2}$. In case of two phases, similar weights can also be derived by using the Cauchy-Crofton formula [22]. Note that each term is counted twice in the last summation. This is compensated by the factor $\frac{1}{2}$.
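To make the discrete energy (8) concrete, the sketch below evaluates it directly with NumPy for a given labeling. It rests on several assumptions that are not spelled out in the text: the weight is read as $w_{pq} = 4\delta^2 / (k \|p - q\|_2)$ with the distance measured in physical units, neighbors that fall outside the grid are simply dropped at the boundary, and the function name `discrete_energy` and the test data are hypothetical.

```python
import numpy as np

def discrete_energy(phi, c, u0, nu, k=4, delta=1.0):
    """Evaluate E_d(c, phi) of Eq. (8) on a 2D grid (illustrative sketch).

    phi : integer labels in {1, ..., n};  c : the n phase values;
    u0  : input image;  k : 4 (TV_1) or 8 (TV_{1,pi/4}).
    """
    phi = np.asarray(phi, dtype=int)
    u0 = np.asarray(u0, dtype=float)
    c = np.asarray(c, dtype=float)

    u = c[phi - 1]                                   # u_p = c_{phi_p}
    data_term = delta**2 * np.sum((u - u0) ** 2)

    offsets = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    if k == 8:
        offsets += [(1, 1), (1, -1), (-1, 1), (-1, -1)]
    elif k != 4:
        raise ValueError("k must be 4 or 8")

    rows, cols = phi.shape
    length_term = 0.0
    for di, dj in offsets:
        # Assumed reading of the weight: 4 delta^2 / (k * ||p - q||_2).
        w_pq = 4 * delta**2 / (k * delta * np.hypot(di, dj))
        # All pairs (p, q = p + offset) with both points inside the grid.
        a = phi[max(0, -di): rows - max(0, di), max(0, -dj): cols - max(0, dj)]
        b = phi[max(0, di): rows - max(0, -di), max(0, dj): cols - max(0, -dj)]
        # Every unordered pair is visited twice, hence the factor 1/2.
        length_term += 0.5 * w_pq * np.sum(np.abs(a - b))
    return data_term + nu * length_term

# Two phases separated by a straight vertical edge of length 2.
phi = np.array([[1, 1, 2],
                [1, 1, 2]])
c = np.array([0.2, 0.8])
u0 = c[phi - 1]                                      # data term vanishes
energy_n4 = discrete_energy(phi, c, u0, nu=1.0, k=4)
energy_n8 = discrete_energy(phi, c, u0, nu=1.0, k=8)
```

With the data term equal to zero, `energy_n4` is exactly the edge length (2), while `energy_n8` also counts the two diagonal neighbor pairs that cross the edge.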

3 Integer Optimization for PCLSM and Mumford-Shah

Rather than imposing constraints to force an integer solution in a continuous optimization framework, we propose the more natural approach of using integer optimization to minimize (6). In Section 3.2 we show that the discretized functional (8) can be minimized by graph cuts when the values c are known. Finally, in Section 3.3 an algorithm is designed to minimize with respect to both c and φ.
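The interplay between c and φ can be sketched with a simple alternating scheme. This is not the algorithm of Section 3.3, which is not reproduced here, and it deliberately replaces the graph cut step with a simplification: the length term is switched off (ν = 0), so the φ-update reduces to picking, at each pixel, the phase whose value is closest to the image. For fixed φ, the data term is minimized by setting each c_i to the mean of the image over phase i. All function names and the test image are hypothetical.

```python
import numpy as np

def update_means(phi, u0, c_prev):
    """For fixed phi, the data term is minimized by the mean of u0 in each phase.
    Empty phases keep their previous value (an assumption of this sketch)."""
    c = np.array(c_prev, dtype=float)
    for i in range(1, len(c) + 1):
        mask = phi == i
        if mask.any():
            c[i - 1] = u0[mask].mean()
    return c

def update_phi_nu_zero(c, u0):
    """For fixed c and nu = 0, each pixel takes the label of the nearest c_i.
    With nu > 0 this step would be replaced by the graph cut minimization."""
    squared_error = (u0[..., np.newaxis] - c) ** 2
    return np.argmin(squared_error, axis=-1) + 1    # labels start at 1

def alternate(u0, c_init, max_iterations=20):
    """Alternate the two updates until the labeling stops changing."""
    c = np.asarray(c_init, dtype=float)
    phi = update_phi_nu_zero(c, u0)
    for _ in range(max_iterations):
        c = update_means(phi, u0, c)
        new_phi = update_phi_nu_zero(c, u0)
        if np.array_equal(new_phi, phi):
            break
        phi = new_phi
    return c, phi

u0 = np.array([[0.0, 0.1, 0.9],
               [0.1, 1.0, 1.1]])
c, phi = alternate(u0, c_init=[0.0, 1.0])
```

In this simplified setting the scheme coincides with a k-means style clustering of the intensities; the point of the graph cut formulation is to perform the φ-update exactly when the length term is present.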

3.1 Background on Graph Cuts and Terminology

Min-cut is a well-known optimization problem. Due to a duality theorem by Ford and Fulkerson [23], there are several fast algorithms for this problem. The term graph cuts refers to such algorithms; they were introduced as a computer vision tool by Greig et al. [9] in connection with Markov random fields [24]. A graph $G = (V, E)$ is a set of vertices $V$ and a set of directed edges $E$. We let $(a, b)$ denote the directed edge going from vertex $a$ to vertex $b$, and let $c(a, b)$ denote the cost (weight) on this edge. In the graph cut scenario there are two distinguished vertices in $V$, called the source $\{s\}$ and the sink $\{t\}$. A cut on $G$ is a partitioning of the vertices $V$ into two disjoint and connected (through edges) sets $(V_s, V_t)$ such that $s \in V_s$ and $t \in V_t$. For each cut, the set of severed edges $C$ is uniquely defined as

$$C = \{(a, b) \mid a \in V_s,\ b \in V_t \text{ and } (a, b) \in E\}. \tag{9}$$

We say that the cut severs the edge $e$ if $e$ is contained in $C$. From now on, we refer to the cut as the set of severed edges $C$. The cost of the cut is defined as

$$|C| = \sum_{e \in C} c(e). \tag{10}$$
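Definitions (9) and (10) can be made concrete on a toy graph. The sketch below is only an illustration: the graph, its weights and the helper names are hypothetical, and the cheapest cut is found by exhaustive enumeration of all partitions, which is feasible only for a handful of vertices. For simplicity the connectivity requirement on $V_s$ and $V_t$ is not checked.

```python
from itertools import combinations

def severed_edges(edges, Vs):
    """Set C of (9): directed edges leaving the source side Vs."""
    return {(a, b) for (a, b) in edges if a in Vs and b not in Vs}

def cut_cost(edges, Vs):
    """Cost |C| of (10): sum of the weights of the severed edges."""
    return sum(edges[e] for e in severed_edges(edges, Vs))

def brute_force_min_cut(edges, inner, s="s", t="t"):
    """Try every partition with s in Vs and t in Vt; return (cost, Vs) of the cheapest."""
    best = None
    for size in range(len(inner) + 1):
        for subset in combinations(inner, size):
            Vs = {s, *subset}
            cost = cut_cost(edges, Vs)
            if best is None or cost < best[0]:
                best = (cost, Vs)
    return best

# Hypothetical toy graph: edge (a, b) -> cost c(a, b).
edges = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 1, ("b", "t"): 4}
best_cost, best_Vs = brute_force_min_cut(edges, ["a", "b"])
```

Note that the edge ("a", "b") contributes to the cost only when "a" lies on the source side and "b" on the sink side, since the edges are directed.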

We are interested in finding the cut of minimum cost on G, from now on called the minimum cut. The duality theorem by Ford and Fulkerson [23] states that this is equivalent to finding the maximum flow from {s} to {t}, where the edge weights are bounds on the maximum amount of flow that can be pushed through the edges. Cuts of minimum cost can thus be computed very efficiently by max-flow algorithms such as Ford-Fulkerson [23]. See [10] for a detailed discussion about implementation.

3.2 Graph Cuts for the Multiphase PCLSM

For fixed values $c$, we will show that the minimizer of (8) can be obtained by finding the minimum cut over an appropriate graph, i.e. we will construct a graph $G$ such that

$$\min_{C \text{ cut on } G} |C| = \min_{\phi} E_d(c, \phi) + \sigma, \tag{11}$$

where σ is a constant that will be specified later. Note that the minimizer φ is not influenced by this constant. Some work on graph cuts for the two-phase Mumford-Shah model can be found in [17, 16]. Unfortunately, the extension to more than two phases is NP-hard [25]. The usual graph cut approach to optimization problems with several labels is to use some sort of approximation method, such as the alpha expansion [12]. Since we have already made an approximation in the model (6), we will show that graph cuts can be used to find the exact minimum. The idea is to introduce an extra dimension to take care of several phases. We construct the graph in a similar way as Ishikawa [13, 14], except for some small technical differences: our graph consists of one less layer of vertices and edges, and is a generalization of the binary construction of Greig et al. [9]. We also avoid edges of infinite capacity. When the number of phases is small, this will have a small effect on the efficiency.

Fig. 1. (a) The graph corresponding to a 1D signal of 6 grid points used for 4-phase segmentation. Edges in $E_D$ are depicted as vertical arrows and edges in $E_R$ are depicted as horizontal arrows. The gray curve is used to visualize the cut: vertices in the interior of the curve belong to $V_s$, vertices in the exterior of the curve belong to $V_t$. Edges in $C$ are depicted as dotted arrows. (b) The values of φ at each grid point corresponding to the cut in (a); they are determined from Definition 1.

For each grid point $p \in P$, we associate $(n-1)$ vertices in the graph $G$, denoted $v_{p,\ell}$, $\ell = 1, \ldots, n-1$. The set of vertices $V$ is formally defined as

$$V = \{v_{p,\ell} \mid p \in P,\ \ell \in \{1, \ldots, n-1\}\} \cup \{s\} \cup \{t\}. \tag{12}$$

An illustration in case of a 1D image where $P = \{1, 2, \ldots, 6\}$ is shown in Figure 1. For ease of visualization, no 2D cases are shown. The edges are arranged in two groups, $E_D$ and $E_R$. The first group $E_D$ corresponds to the data term in (8). It is defined as

$$E_D = \bigcup_{p \in P} E_p, \tag{13}$$

where for each $p \in P$ the edge set $E_p$ is defined as

$$E_p = (s, v_{p,1}) \cup \bigcup_{\ell=1}^{n-2} (v_{p,\ell}, v_{p,\ell+1}) \cup (v_{p,n-1}, t). \tag{14}$$

The edges in $E_D$ are illustrated as the vertical arrows in Figure 1. The second group of edges $E_R$ corresponds to the regularization term in (8). These are illustrated as the horizontal arrows in Figure 1, i.e.

$$E_R = \{(v_{p,\ell}, v_{q,\ell}) \mid p \in P,\ q \in N_k(p),\ \ell \in \{1, \ldots, n-1\}\}. \tag{15}$$

We say that a cut is admissible if it severs exactly one edge in $E_p$ for each $p \in P$. We can now establish the relationship between a cut on $G$ and a level set function φ.


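Definition 1. Let $C \subset E$ be an admissible cut on $G$. For any grid point $p \in P$, the corresponding level set function φ is defined as

$$\phi_p = \begin{cases} 1 & \text{if } (s, v_{p,1}) \in C, \\ \ell + 1 & \text{if } (v_{p,\ell}, v_{p,\ell+1}) \in C, \\ n & \text{if } (v_{p,n-1}, t) \in C. \end{cases} \tag{16}$$

Note that φ is single valued by the admissible cut requirement. We can now define the edge costs (weights) such that the relationship (11) is satisfied. We start with the edges in $E_D$, i.e. the data edges:

$$\begin{aligned} c(s, v_{p,1}) &= \delta^2 |u_p^0 - c_1|^2 + \frac{\sigma}{|P|} && \forall p \in P, \\ c(v_{p,\ell}, v_{p,\ell+1}) &= \delta^2 |u_p^0 - c_{\ell+1}|^2 + \frac{\sigma}{|P|} && \forall p \in P,\ \forall \ell \in \{1, \ldots, n-2\}, \\ c(v_{p,n-1}, t) &= \delta^2 |u_p^0 - c_n|^2 + \frac{\sigma}{|P|} && \forall p \in P. \end{aligned} \tag{17}$$

The costs (weights) for the regularization edges $E_R$ are defined by

$$c(v_{p,\ell}, v_{q,\ell}) = \nu w_{pq}, \quad \forall p \in P,\ \forall q \in N_k(p),\ \forall \ell \in \{1, \ldots, n-1\}. \tag{18}$$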

By choosing σ as any positive value, the cut of minimum cost will be admissible, which implies that its corresponding level set function is single valued.

Theorem 1. Let $C$ be a minimum cut on $G$. Then $C$ is admissible if $\sigma > 0$.

Proof. Suppose $C$ is a minimum cut on $G$ and that for some $p \in P$ several edges in $E_p$ belong to $C$. That is, there exists a set of indices $L_p$ such that $(v_{p,\ell}, v_{p,\ell+1}) \in C$ $\forall \ell \in L_p$. Define the cut $C^*$ such that for each $p \in P$, $v_{p,\ell} \in V_s^*$ if $\ell \le \max L_p$, else $v_{p,\ell} \in V_t^*$. Then $C^* \cap E_D \subset C \cap E_D$. Since $\sigma > 0$, no edges have zero weight, therefore $|C^* \cap E_D| < |C \cap E_D|$. Furthermore, the number of edges in $C^* \cap E_R$ is at most the number of edges in $C \cap E_R$. For $TV_1$, the weights on all edges in $E_R$ are equal. Therefore $|C^*| < |C|$, which is a contradiction. The same contradiction can also be derived for $TV_{1,\frac{\pi}{4}}$.

To summarize, for any piecewise constant level set function φ taking values in $\{1, 2, \ldots, n\}$, there exists a unique admissible cut on $G$. Moreover, since an admissible cut severs exactly $|P|$ data edges, each carrying the offset $\sigma / |P|$ of (17), the function φ and its corresponding cut satisfy

$$|C| = E_d(c, \phi) + \sigma. \tag{19}$$

Thus, a function φ corresponding to a minimum cut is a minimizer of the functional (8), i.e. it solves the segmentation problem. Note that in case n = 2, the extra dimension breaks down, and the graph becomes identical to that of Greig et al. [9] for binary problems. It is also possible to exactly minimize (8) as in [26], by solving a sequence of binary optimization problems via graph cuts. This approach is likely to be faster when n is very large and is a power of 2. However, for image segmentation n is relatively small, and we expect the presented approach to be faster.
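The construction of this section can be sketched for a 1D signal as follows. This is a minimal illustration under stated assumptions, not the authors' C++ implementation: the helper names are hypothetical, δ = 1 and $w_{pq} = 1$ are assumed for the 1D chain, the approximation is taken as $u_p = c_{\phi_p}$, and a plain Edmonds-Karp routine stands in for the faster max-flow algorithms discussed in [10]. The result is compared with an exhaustive search over all labelings, which is only feasible for tiny examples.

```python
from collections import defaultdict, deque
from itertools import product

def max_flow_min_cut(cap, s="s", t="t"):
    """Edmonds-Karp max-flow; returns (flow value, source side V_s of a minimum cut)."""
    res = defaultdict(lambda: defaultdict(float))   # residual capacities
    for (a, b), c in cap.items():
        res[a][b] += c
        res[b][a] += 0.0
    flow = 0.0
    while True:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:            # breadth-first search
            a = queue.popleft()
            for b, c in res[a].items():
                if c > 1e-12 and b not in parent:
                    parent[b] = a
                    queue.append(b)
        if t not in parent:                         # no augmenting path is left
            return flow, set(parent)
        path, b = [], t
        while parent[b] is not None:
            path.append((parent[b], b))
            b = parent[b]
        push = min(res[a][b] for a, b in path)
        for a, b in path:
            res[a][b] -= push
            res[b][a] += push
        flow += push

def build_graph(u0, c, nu, sigma=1.0):
    """Edge costs (17)-(18) for a 1D signal, assuming delta = 1 and w_pq = 1."""
    n, P = len(c), len(u0)
    cap = {}
    for p in range(P):
        column = ["s"] + [(p, l) for l in range(1, n)] + ["t"]
        for l in range(n):                          # severing this edge means phi_p = l + 1
            cap[(column[l], column[l + 1])] = (u0[p] - c[l]) ** 2 + sigma / P
        for q in (p - 1, p + 1):                    # regularization edges to grid neighbours
            if 0 <= q < P:
                for l in range(1, n):
                    cap[((p, l), (q, l))] = nu
    return cap

def decode(Vs, P, n):
    """Definition 1 for an admissible cut: one plus the number of column vertices in V_s."""
    return [1 + sum((p, l) in Vs for l in range(1, n)) for p in range(P)]

def energy(u0, c, phi, nu):
    """Energy (8) on the 1D chain; each neighbour pair is counted once."""
    data = sum((u0[p] - c[phi[p] - 1]) ** 2 for p in range(len(u0)))
    return data + nu * sum(abs(a - b) for a, b in zip(phi, phi[1:]))

u0, c, nu, sigma = [0.1, 0.0, 0.9, 1.0, 0.6, 2.1], [0.0, 1.0, 2.0], 0.1, 1.0
flow, Vs = max_flow_min_cut(build_graph(u0, c, nu, sigma))
phi = decode(Vs, len(u0), len(c))
# Exhaustive check over all 3^6 labelings (illustration only).
best = min(energy(u0, c, list(f), nu)
           for f in product(range(1, len(c) + 1), repeat=len(u0)))
```

In this sketch the value of the maximum flow should equal the smallest energy plus σ, in line with relation (11), and the decoded labeling should attain that smallest energy.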

3.3 Algorithm for Minimizing the Mumford-Shah Functional

The algorithm presented in the last section minimizes $E_d(c, \phi)$ with respect to φ for a fixed $c$. Vice versa, for a fixed φ the values $c$ minimizing $E_d(c, \phi)$ are given by the average intensity in each region,

$$c_i = \frac{\int_\Omega u^0(x)\, \psi_i(x)\, dx}{\int_\Omega \psi_i(x)\, dx}, \quad i = 1, 2, \ldots, n, \tag{20}$$

or in discrete form

$$c_i = \frac{\sum_{p \in P} u_p^0\, \psi_{i,p}}{\sum_{p \in P} \psi_{i,p}}, \quad i = 1, 2, \ldots, n. \tag{21}$$

We want an algorithm to minimize with respect to both φ and c. This is achieved by combining the two results above in the following iterative descent algorithm.

Algorithm 1. Estimate initial values $c^0$, set $l = 0$.
while ( $\|c^l - c^{l-1}\| > tol$ )
  1. Use graph cuts to estimate φ from
     $$\phi = \arg\min_{\tilde{\phi}} E_d(c^l, \tilde{\phi}). \tag{22}$$
  2. Update $c^{l+1}$ according to equation (21).
  3. Update $l \leftarrow l + 1$.
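The alternating structure of Algorithm 1 can be sketched on a 1D signal. To keep the example short, the φ-step below is solved exactly by dynamic programming along the chain, which is a swapped-in stand-in for the graph cut of Section 3.2 and is valid only in 1D; the function names, the 0-based labels, the choice $w_{pq} = 1$ and the rule that an empty phase keeps its previous value are all assumptions of this sketch.

```python
def segment_exact_1d(u0, c, nu):
    """Exact minimiser of sum_p (u0_p - c[phi_p])^2 + nu * sum_p |phi_p - phi_{p+1}| (labels from 0)."""
    n = len(c)
    cost = [(u0[0] - ci) ** 2 for ci in c]
    back = []
    for p in range(1, len(u0)):
        new, arg = [], []
        for j in range(n):
            k = min(range(n), key=lambda i: cost[i] + nu * abs(i - j))
            arg.append(k)
            new.append(cost[k] + nu * abs(k - j) + (u0[p] - c[j]) ** 2)
        cost = new
        back.append(arg)
    j = min(range(n), key=lambda i: cost[i])
    phi = [j]
    for arg in reversed(back):                  # backtrack the optimal labels
        j = arg[j]
        phi.append(j)
    return phi[::-1]

def update_means(u0, phi, c):
    """Equation (21): average intensity per phase; empty phases keep their old value."""
    new_c = list(c)
    for i in range(len(c)):
        members = [u for u, lab in zip(u0, phi) if lab == i]
        if members:
            new_c[i] = sum(members) / len(members)
    return new_c

def alternate(u0, c0, nu, tol=0.0, max_iter=100):
    """Alternate the phi-step and the c-step until c changes by at most tol."""
    c = list(c0)
    for _ in range(max_iter):
        phi = segment_exact_1d(u0, c, nu)
        new_c = update_means(u0, phi, c)
        change = max(abs(a - b) for a, b in zip(new_c, c))
        c = new_c
        if change <= tol:
            break
    return c, phi

c_final, phi_final = alternate([0.0, 0.1, 0.0, 1.0, 0.9, 1.1], [0.2, 0.8], nu=0.05)
```

With `tol=0.0` the loop stops as soon as the phase values no longer change, which mirrors the exact termination criterion of Algorithm 1.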

Note that no initialization of the level set function is required. Only the values $c^0$ need to be initialized, which can be achieved very efficiently by the ISODATA algorithm [27]. Note that Algorithm 1 has an exact termination criterion, as tol can be set to zero. In all our experiments, convergence was reached in 4-12 iterations. It must be noted that this algorithm is no longer guaranteed to find the global minimum. Theoretically it may get trapped in a local minimum close to the initial values $c^0$. However, in practice it is usually rather insensitive to initialization.

4 Numerical Experiments

In this section we validate our new optimization method by numerical experiments. The results are compared with the original gradient descent approach [3] for minimizing (4) (note: not the variant with simplified length term). Both methods are implemented in C++. Comparisons are made with respect to both quality and computation time on an Intel 2.19 GHz laptop. The computation times are listed in Table 1. The test images are shown in Figure 2. In all results, the estimated phases are depicted as a bright region. The results of experiments 1 and 2 are depicted in Figure 3 (a)-(d). We observe that graph cut optimization solves the multiphase problem with at least as good quality as gradient descent.

Fig. 2. Test images

Table 1. Computation times in seconds for gradient descent vs graph cut optimization

               Size      Number of phases   Gradient descent   Graph cut
Experiment 1   100x100   4                  50.3               0.120
Experiment 2   100x100   5                  70.0               0.179
Experiment 3   92x98     5                  55.4               0.165
Brain          933x736   4                  5401               25.22

In experiment 3, Figure 3 (e)-(f), the number of regions is assumed to be unknown. The optimal number of regions can be estimated by using more phases than necessary in the minimization problem. As we can see, this results in some empty phases, while the remaining phases capture the correct regions.

We have also tested the method on a synthetic brain MRI image. The noise level is 7%, and the non-uniformity of the RF pulse is 20% (see http://www.bic.mni.mcgill.ca/brainweb/ for details). We want to extract four tissue classes from the image: region 1, background; region 2, cerebrospinal fluid; region 3, gray matter; and region 4, white matter. This is achieved by minimizing the Mumford-Shah model with 4 phases. In Figure 4 we compare the results of graph cuts, gradient descent and the exact results. The background phase is not shown. Again, we observe that the graph cut results are very good, while the computation time is dramatically reduced compared to gradient descent (cf. Table 1, Brain).

Finally, in Figure 5 we show an example which demonstrates the limitation of the multiphase approach presented in [18], described in the related work section. For the chosen parameter ν, the global minimum consists of three phases, which we are able to detect by applying our multiphase algorithm with 4 phases, Figure 5(b) top. The result of the first step of the algorithm presented in [18] is shown in Figure 5(b) bottom, which is the global minimum of the two-phase Mumford-Shah functional. Clearly, no further splitting of these regions can result in the correct three phases, since the interface from the first step is not allowed to evolve.
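The timings of Table 1 can be turned into speed-up factors with a few lines of arithmetic. The snippet below is only a convenience sketch: the dictionary layout and variable names are illustrative, and the numbers are copied from Table 1.

```python
# (gradient descent, graph cut) computation times in seconds, from Table 1
timings = {
    "Experiment 1": (50.3, 0.120),
    "Experiment 2": (70.0, 0.179),
    "Experiment 3": (55.4, 0.165),
    "Brain": (5401.0, 25.22),
}

# Ratio of gradient descent time to graph cut time for each experiment
speedup = {name: gd / gc for name, (gd, gc) in timings.items()}

for name, factor in speedup.items():
    print(f"{name}: graph cut is about {factor:.0f} times faster")
```

In every row the ratio is larger than 200.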

Fig. 3. Graph cut results are shown in panels (a), (c), (e) and gradient descent results in panels (b), (d), (f). (a) and (b) Experiment 1, from left to right: phase 1 - phase 4. (c) and (d) Experiment 2, from left to right: phase 1 - phase 5. (e) and (f) Experiment 3, from left to right: phase 1 - phase 5.

Fig. 4. From left to right: phase 1 - phase 3. (a) Graph cut, (b) gradient descent, (c) exact phases.

Fig. 5. (a) Input image. (b) Top: Our approach, from left to right phase 1 - phase 4. Bottom: First step of approach reported in [18], from left to right phase 1 - phase 2.

5 Summary

We have presented an algorithm for efficiently minimizing the energy induced by the piecewise constant level set representation of the multiphase Mumford-Shah functional. This minimization method is based on graph cuts. Numerical experiments demonstrated that the method is far more efficient than the previous PDE-based approach, while maintaining the same quality of results.

References

1. Dervieux, A., Thomasset, F.: A finite element method for the simulation of a Rayleigh-Taylor instability. In: Approximation methods for Navier-Stokes problems, Proc. Sympos., Univ. Paderborn, Paderborn, 1979. Lecture Notes in Math., vol. 771, pp. 145–158. Springer, Berlin (1980)
2. Osher, S., Sethian, J.: Fronts propagating with curvature dependent speed: algorithms based on Hamilton-Jacobi formulations. J. Comput. Phys. 79(1), 12–49 (1988)
3. Lie, J., Lysaker, M., Tai, X.: A variant of the level set method and applications to image segmentation. Math. Comp. 75(255), 1155–1174 (2006) (electronic)
4. Lie, J., Lysaker, M., Tai, X.: A binary level set model and some applications to Mumford-Shah image segmentation. IEEE Transactions on Image Processing 15(5), 1171–1181 (2006)
5. Tai, X., Christiansen, O., Lin, P., Skjaelaaen, I.: Image segmentation using some piecewise constant level set methods with MBO type of projection. International Journal of Computer Vision 73, 61–76 (2007)
6. Mumford, D., Shah, J.: Optimal approximation by piecewise smooth functions and associated variational problems. Comm. Pure Appl. Math. 42, 577–685 (1989)
7. Chan, T., Vese, L.: Active contours without edges. IEEE Image Proc. 10, 266–277 (2001)
8. Vese, L.A., Chan, T.F.: A new multiphase level set framework for image segmentation via the Mumford and Shah model. International Journal of Computer Vision 50, 271–293 (2002)
9. Greig, D.M., Porteous, B.T., Seheult, A.H.: Exact maximum a posteriori estimation for binary images. Journal of the Royal Statistical Society, Series B, 271–279 (1989)
10. Boykov, Y., Kolmogorov, V.: An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. In: Figueiredo, M., Zerubia, J., Jain, A.K. (eds.) EMMCVPR 2001. LNCS, vol. 2134, pp. 359–374. Springer, Heidelberg (2001)
11. Kolmogorov, V., Zabih, R.: What energy functions can be minimized via graph cuts? IEEE Transactions on Pattern Analysis and Machine Intelligence 26(2), 147–159 (2004)
12. Boykov, Y., Veksler, O., Zabih, R.: Fast approximate energy minimization via graph cuts. In: ICCV, vol. (1), pp. 377–384 (1999)
13. Ishikawa, H.: Exact optimization for Markov random fields with convex priors. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(10), 1333–1336 (2003)
14. Ishikawa, H., Geiger, D.: Segmentation by grouping junctions. In: CVPR 1998: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, pp. 125–131. IEEE Computer Society, Los Alamitos (1998)
15. Darbon, J., Sigelle, M.: Image restoration with discrete constrained total variation part II: Levelable functions, convex priors and non-convex cases. J. Math. Imaging Vis. 26(3), 277–291 (2006)
16. Darbon, J.: A note on the discrete binary Mumford-Shah model. In: Gagalowicz, A., Philips, W. (eds.) MIRAGE 2007. LNCS, vol. 4418, pp. 283–294. Springer, Heidelberg (2007)
17. Zehiry, N.E., Xu, S., Sahoo, P., Elmaghraby, A.: Graph cut optimization for the Mumford-Shah model. In: Proceedings of the Seventh IASTED International Conference visualization, imaging and image processing, pp. 182–187. Springer, Heidelberg (2007)
18. El-Zehiry, N.Y., Elmaghraby, A.: A graph cut based active contour for multiphase image segmentation. In: IEEE International Conference on Image Processing, pp. 3188–3191 (2008)
19. Lie, J., Lysaker, M., Tai, X.: Piecewise constant level set methods and image segmentation. In: Kimmel, R., Sochen, N.A., Weickert, J. (eds.) Scale-Space 2005. LNCS, vol. 3459, pp. 573–584. Springer, Heidelberg (2005)
20. Chung, G., Vese, L.A.: Energy minimization based segmentation and denoising using a multilayer level set approach. In: Rangarajan, A., Vemuri, B.C., Yuille, A.L. (eds.) EMMCVPR 2005. LNCS, vol. 3757, pp. 439–455. Springer, Heidelberg (2005)
21. Jung, Y.M., Kang, S.H., Shen, J.: Multiphase image segmentation via Modica-Mortola phase transition. SIAM J. Appl. Math. 67, 1213–1232 (2007)
22. Boykov, Y., Kolmogorov, V.: Computing geodesics and minimal surfaces via graph cuts. In: ICCV 2003: Proceedings of the Ninth IEEE International Conference on Computer Vision, Washington, DC, USA, pp. 26–33. IEEE Computer Society, Los Alamitos (2003)
23. Ford, L., Fulkerson, D.: Flows in networks. Princeton University Press, Princeton (1962)
24. Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. In: Readings in uncertain reasoning, pp. 452–472. Morgan Kaufmann Publishers Inc., San Francisco (1990)
25. Dahlhaus, E., Johnson, D.S., Papadimitriou, C.H., Seymour, P.D., Yannakakis, M.: The complexity of multiway cuts (extended abstract). In: STOC 1992: Proceedings of the twenty-fourth annual ACM symposium on Theory of computing, pp. 241–251. ACM, New York (1992)
26. Darbon, J., Sigelle, M.: Image restoration with discrete constrained total variation part I: Fast and exact optimization. J. Math. Imaging Vis. 26(3), 261–276 (2006)
27. Velasco, F.R.D.: Thresholding using the ISODATA clustering algorithm. IEEE Trans. Systems Man Cybernet. 10(11), 771–774 (1980)

Tubular Anisotropy Segmentation

Fethallah Benmansour and Laurent D. Cohen

CEREMADE, UMR CNRS 7534, Université Paris Dauphine, Place du Maréchal De Lattre De Tassigny, 75775 PARIS CEDEX 16, France
{benmansour,cohen}@ceremade.dauphine.fr

Abstract. In this paper we present a new interactive method for tubular structure extraction. The main application and motivation for this work is vessel tracking in 2D and 3D images. The basic tools are minimal paths computed using the fast marching algorithm. This provides the physician with interactive tools: clicking on a small number of points yields a minimal path between two points, or a set of paths in the case of a tree structure. Our method is based on a variant of the minimal path method that models the vessel as a centerline and surface. This is done by adding one dimension for the local radius around the centerline. The crucial step of our method is the definition of the local metrics to minimize. We have chosen to exploit the tubular structure of the vessels one wants to extract to build an anisotropic metric giving higher speed on the center of the vessels and also when the minimal path tangent is coherent with the vessel's direction. This measure is required to be robust against the disturbance introduced by noise or adjacent structures with intensity similar to the target vessel. We obtain promising results on noisy synthetic and real 2D and 3D images.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 14–25, 2009. © Springer-Verlag Berlin Heidelberg 2009

1 Introduction

In this paper we deal with the problem of finding a complete segmentation of tubular structures like vessels. The main objective is to extract at the same time the centerline of the tubular structure and its boundary. During the last two decades, the extraction of vascular objects such as blood vessels, coronary arteries, or other tube-like structures has attracted the attention of more and more researchers. Various methods, such as vascular image enhancement methods [1, 2, 3], have been proposed; see [4] for a complete survey. Some of these methods extract the vessel boundary directly, and then use thinning methods to find its centerline. Other methods extract only the centerline and then estimate the vessel width to extract its boundary. Deschamps and Cohen [5] proposed to use the minimal path method to find the centerline. The minimal path technique introduced by Cohen and Kimmel [6] captures the global minimum curve between two points given by the user. This leads to the global minimum of an active contour energy. Since then, the minimal path method has been improved by many researchers, and adapted to anisotropic media as done by Jbabdi et al. for tractography [7].

Unfortunately, despite their numerous advantages, classical minimal path techniques exhibit some disadvantages. First, vessel boundary extraction can be very difficult, even in 2D where the vessel's boundary can be completely described by two curves. Second, the path given by the minimal path technique does not always follow the centerline of the vessel. A readjustment step is required to obtain a central trajectory. Third, the minimal path technique provides only a trajectory and does not give information about the vessel boundary and local width.

Li and Yezzi [8] proposed a new variant of the classical, purely spatial, minimal path technique by incorporating an extra non-spatial dimension into the search space. Each point of the 4D path (after adding the extra dimension for the 3D image) consists of three spatial coordinates plus a fourth coordinate which describes the vessel thickness at that corresponding 3D point. Thus, each 4D point represents a sphere in 3D space, and the vessel is obtained by taking the envelope of these spheres as we move along the 4D curve. A crucial step of this method is to build an adequate potential that drives the propagation. Li and Yezzi [8] proposed different isotropic potentials. As they state in the conclusion of their paper, the proposed potentials are very parameter dependent, and they hoped to find a more appropriate choice of potential. In particular, one can see in their paper that the potential used does not lead to a correct detection of the radius when it is not constant (see figure 6 in [8]). Another drawback of the method of Li and Yezzi is that it does not take the vessel orientation into account. Our first contribution is to take into account the vessel orientation by defining a suitable anisotropic metric that makes the propagation faster along the centerlines and for the adequate radius.

Law et al. [9] proposed a new scalar descriptor called Optimally Oriented Flux (OOF) for the detection of curvilinear structures, but they did not exploit the orientation given by their descriptor. The major advantage of the OOF technique is that it does not consider the regions in the vicinity of target objects, where background noise or adjacent structures with intensity similar to the target vessels are possibly present. Therefore, the disturbance introduced by closely located nearby structures is avoided. The second contribution of this paper is to build an anisotropic metric based on the OOF descriptor, using its scalar function as well as its orientation. That makes the propagation faster along the vessel's centerline and for the exact associated scale. This means that the path location, orientation and scale (radius) have to be coherent with the local geometry of the image extracted by the OOF.

In section 2, we give some background on the minimal path method and Anisotropic Fast Marching. In section 3 the Optimally Oriented Flux descriptor is presented as well as the metric construction. In section 4, results on synthetic and real data are shown. Finally, conclusions and perspectives follow in section 5.

2 Background on Minimal Path Method

A minimal path, first introduced in the isotropic case (where $P$ does not depend on the orientation of the path) [6], is a pathway minimizing the energy functional

$$E(\gamma) = \int_\gamma P\big(\gamma(s), \gamma'(s)\big)\, ds, \tag{1}$$

where $P(\gamma(\cdot), \gamma'(\cdot)) = \sqrt{\gamma'(\cdot)^T M(\gamma(\cdot))\, \gamma'(\cdot)}$ describes an infinitesimal distance along a pathway γ relative to a metric tensor $M$ (symmetric positive definite). Thus, we are considering only the case of an elliptic medium. In the isotropic case $M(\cdot) = P^2(\cdot) I$, where $I$ is the identity matrix. A curve connecting $p_1$ to $p_2$ that globally minimizes the above energy (1) is a minimal path between $p_1$ and $p_2$, denoted $C_{p_1,p_2}$. The solution of this minimization problem is obtained through the computation of the minimal action map $U : \Omega \to \mathbb{R}^+$ associated to $p_1$ on the domain Ω, which can be a 2D, 3D or 4D domain. The minimal action is the minimal energy integrated along a path between $p_1$ and any point $x$ of the domain Ω:

$$\forall x \in \Omega, \quad U(x) = \min_{\gamma \in A_{p_1,x}} \left\{ \int_\gamma P\big(\gamma(s), \gamma'(s)\big)\, ds \right\}, \tag{2}$$

where $A_{p_1,x}$ is the set of paths linking $x$ to $p_1$. The values of $U$ may be regarded as the arrival times of a front propagating from the source $p_1$ with oriented velocity related to the metric tensor $M^{-1}$. $U$ satisfies the Eikonal equation

$$\|\nabla U(x)\|_{M^{-1}(x)} = 1 \ \text{ for } x \in \Omega, \quad \text{and } U(p_1) = 0, \tag{3}$$

where $\|v\|_M = \sqrt{v^T M v}$. The map $U$ has only one local minimum, the point $p_1$, and its flow lines satisfy the Euler-Lagrange equation of functional (1). Thus, the minimal path $C_{p_1,p_2}$ can be retrieved with a simple gradient descent on $U$ from $p_2$ to $p_1$ (see Fig. 1), solving the following ordinary differential equation with standard numerical methods like Heun's or Runge-Kutta's:

$$\frac{dC_{p_1,p_2}}{ds}(s) \propto -M^{-1}\big(C_{p_1,p_2}(s)\big)\, \nabla U\big(C_{p_1,p_2}(s)\big), \quad \text{with } C_{p_1,p_2}(0) = p_2. \tag{4}$$

Proofs of (3) and (4) can be found in [10, 7]. Figure 1 shows examples of the minimal path method in an isotropic case and in an anisotropic one. In the first image of Figure 1 the metric is isotropic and the potential P in the grey region is half that in the white region. Isolevel sets of the minimal action map associated to the source point p1 are displayed together with the minimal path Cp1,p2. The second image represents a metric M: we took two constant metrics, one in each half of the image, with different orientations. The last image shows the minimal action map U associated to the metric M and to the source point p1. The minimal path Cp1,p2 is found by solving equation (4).

Fig. 1. Minimal path examples. Left: an isotropic case. Middle: visualization by small ellipses of the eigenvalues of a metric that is constant on each half of the image. Right: the minimal action map associated to the source point p1, with the minimal path Cp1,p2.
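The role of the metric tensor in the infinitesimal cost $\sqrt{\gamma'^T M \gamma'}$ can be illustrated with a few lines of code. This is a sketch with hypothetical helper names: the tensor is assembled in 2D from a chosen direction and two eigenvalues, an assumption made for illustration rather than the metric constructed later in the paper.

```python
import math

def metric_from_direction(theta, lam_along, lam_across):
    """2x2 symmetric positive definite tensor with eigenvalue lam_along in the
    direction of angle theta and lam_across in the orthogonal direction."""
    e = (math.cos(theta), math.sin(theta))
    ep = (-e[1], e[0])
    return [[lam_along * e[i] * e[j] + lam_across * ep[i] * ep[j] for j in range(2)]
            for i in range(2)]

def path_cost(v, M):
    """Infinitesimal cost sqrt(v^T M v) of moving with velocity v under the metric M."""
    return math.sqrt(sum(v[i] * M[i][j] * v[j] for i in range(2) for j in range(2)))

# Anisotropic case: moving along the x axis is cheaper than moving across it.
M_aniso = metric_from_direction(0.0, 1.0, 9.0)
cost_along = path_cost((1.0, 0.0), M_aniso)
cost_across = path_cost((0.0, 1.0), M_aniso)

# Isotropic case M = P^2 I with P = 2: the cost does not depend on the direction.
M_iso = metric_from_direction(0.3, 4.0, 4.0)
cost_iso = path_cost((math.cos(1.0), math.sin(1.0)), M_iso)
```

A minimal path therefore tends to align with the eigenvector of $M$ associated with the smallest eigenvalue, since displacements in that direction are the cheapest.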

The Fast Marching Method (FMM) is a numerical method introduced by Sethian in [11] and Tsitsiklis in [12] for efficiently solving the isotropic Eikonal equation on a Cartesian grid. The central idea behind the FMM is to visit grid points in an order consistent with the way wavefronts of constant action propagate. It leads to a single-pass algorithm for solving equation (3) and computing the minimal action map U. Tsitsiklis's method relies on directly minimizing the energy functional of equation (1), while Sethian's method uses the Eikonal equation. Both methods are suitable for isotropic metrics, but they fail for anisotropic metrics [13]. To deal with anisotropy, Sethian and Vladimirsky [10] proposed an update scheme that converges to the viscosity solution of the anisotropic Eikonal equation. A simplified scheme, based on the original method of Tsitsiklis [12], was proposed by Lin in [14] to approximate the solution of the anisotropic Eikonal equation. Contrary to Sethian and Vladimirsky's ordered upwind method (OUM) [10], Lin's algorithm does not converge to the viscosity solution of the Eikonal equation. In this paper we use Lin's scheme to solve the anisotropic Eikonal equation, since it is much faster than OUM and the errors it introduces have little effect on the extracted geodesics.

The FMM is a front propagation approach that computes the values of U in increasing order, and the structure of the algorithm is almost identical to Dijkstra's algorithm for computing shortest paths on graphs [15]. In the course of the algorithm, each grid point is tagged as either Alive (point for which U has been computed and frozen), Trial (point for which U has been estimated but not frozen) or Far (point for which U is unknown). The set of Trial points forms an interface between the set of grid points for which U has been frozen (the Alive points) and the set of other grid points (the Far points).
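As a point of reference, the isotropic FMM on a 2D grid can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the anisotropic scheme used in this paper: it uses the standard first-order upwind update on the 4-neighbourhood, the function name and the list-of-lists layout are hypothetical, and Far points are simply those whose value is still infinite.

```python
import heapq
import math

def fast_marching(P, source, h=1.0):
    """Isotropic fast marching: approximates |grad U| = P with U(source) = 0 on a 2D grid."""
    rows, cols = len(P), len(P[0])
    U = [[math.inf] * cols for _ in range(rows)]     # Far points carry the value inf
    alive = [[False] * cols for _ in range(rows)]
    U[source[0]][source[1]] = 0.0
    heap = [(0.0, source)]                           # min-heap of Trial points

    def known(i, j):
        inside = 0 <= i < rows and 0 <= j < cols
        return U[i][j] if inside and alive[i][j] else math.inf

    def upwind(i, j):
        a = min(known(i - 1, j), known(i + 1, j))    # best Alive value along each axis
        b = min(known(i, j - 1), known(i, j + 1))
        a, b = min(a, b), max(a, b)
        f = P[i][j] * h
        if b - a >= f:                               # only one axis contributes
            return a + f
        return 0.5 * (a + b + math.sqrt(2.0 * f * f - (a - b) ** 2))

    while heap:
        _, (i, j) = heapq.heappop(heap)
        if alive[i][j]:
            continue                                 # outdated heap entry
        alive[i][j] = True                           # freeze the smallest Trial value
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < rows and 0 <= nj < cols and not alive[ni][nj]:
                new = upwind(ni, nj)
                if new < U[ni][nj]:
                    U[ni][nj] = new
                    heapq.heappush(heap, (new, (ni, nj)))
    return U

# Constant potential on an 11 x 11 grid, source at the centre.
U = fast_marching([[1.0] * 11 for _ in range(11)], (5, 5))
```

Instead of decreasing a key inside the heap, this sketch pushes a new entry and skips outdated ones when they are popped, which keeps the code short at the price of a slightly larger heap.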
The Trial interface may be regarded as a front expanding from the source until every grid point has been reached. Let us denote by $N_M(x)$ the set of $M$ neighbors of a grid point $x$, where $M = 2 \times d$ if the dimension of Ω is equal to $d$. Initially, all grid points are tagged as Far, except the source point $p_1$ that is tagged as Trial. At each iteration of the FMM one chooses the Trial point with the smallest $U$ value, denoted by $x_{min}$. Then, $x_{min}$ is tagged as Alive and the value of $U$ is updated for each point of the set $N_M(x_{min})$ which is either Trial or Far. In order to satisfy a causality condition, the way $U$ is updated in the vicinity of $x_{min}$ requires special care. The iteration ends by tagging every Far point of the set $N_M(x_{min})$ as Trial. The algorithm automatically stops when all grid points are Alive. The key to the speed of the FMM is the use of a priority queue to quickly find the Trial point with the smallest $U$ value. If Trial points are ordered in a min-heap data structure, the computational complexity of the FMM is $O(N \log N)$, where $N$ is the total number of grid points.

A crucial step of the Fast Marching algorithm is the computation of the weighted distance between the front and the neighbouring voxels in the Trial set. Here, we present a way to estimate this weighted distance in the anisotropic case and only in 3D. It is straightforward to extend it to 4D. Since the distance is anisotropic, we cannot use the standard methods, because they rely on the fact that the geodesics are perpendicular to the level sets of $U$. To take into account the anisotropy, Jbabdi et al. [7] and Lin [14] considered a set of simplexes that cover the whole neighbourhood around a voxel of the narrow band. A simplex neighbouring a point $x$ is simply a set of three points $(x_1, x_2, x_3)$ that are among the 26 neighbours of $x$, defining a triangle that we denote $x_1 x_2 x_3$. There are 48 such triangles around $x$ for the 26-connectivity. To make the update procedure faster, we propose to consider only the simplexes defined by a triple of points among the 6-neighbors of $x$. There are 8 such triangles (see Fig. 2), and by making this modification, the precision of the algorithm is lower but the algorithm is six times faster.

Fig. 2. On the left: position of the optimal point on a simplex such as to minimize the geodesic distance to x. On the right: the considered simplexes.

To estimate $U(x_m)$, where $x_m$ is a neighbor of $x_{min}$, the Trial point most recently tagged as Alive, we make two approximations. If the geodesic passing by $x_m$ comes from a triangle $x_1 x_2 x_3$, then the time of arrival is given by

$$U(x_m) = \min_{x \in x_1 x_2 x_3} \left\{ U(x) + \int_x^{x_m} P(\gamma, \gamma') \right\}. \tag{5}$$

The term one wants to minimize is approximated by

$$f(\alpha) = \sum_{i=1}^{3} \alpha_i U(x_i) + \Big\| x_m - \sum_{i=1}^{3} \alpha_i x_i \Big\|_{M(x_m)}, \tag{6}$$

where $\alpha = (\alpha_1, \alpha_2, \alpha_3)$, with $\sum_{i=1}^{3} \alpha_i = 1$ since the point $x$ is in the triangle (see Figure 2). This equation follows Tsitsiklis's approximation [12]. The first term approximates the value of the minimal action map at the point $x = \sum_{i=1}^{3} \alpha_i x_i$ by a simple linear interpolation. The second term approximates the remaining distance by considering the metric constant along the segment $[x, x_m]$, equal to its value at point $x_m$. The function $f$ is convex and the constraints on α, i.e. $\sum_{i=1}^{3} \alpha_i = 1$ and $\alpha_i \ge 0$, define a convex subset. Thus the minimization of $f$ can be done using classical optimization tools. See [7] for more details. For each of the eight triangles, we get a value $u$. Finally, we choose the triangle giving the smallest value of $u$.

Note that in order to approximate $\nabla U$, computing the derivatives of $U$ in the triangle using the estimate $U(x_n)$ gives a consistent approximation of $\nabla U(x_n)$ by the following:

$$\nabla U(x_n) = \big(U(x_n) - U(x^*)\big)\, \frac{x_n - x^*}{\|x_n - x^*\|},$$

where $x^*$ is the point of the triangle given by the minimizer of the function $f$ (see Figure 2, left), and $\|\cdot\|$ is the Euclidean norm. The computation of the gradient is very useful since it is used to solve the gradient descent described by equation (4).

Tubular Anisotropy Segmentation

19

3 Optimally Oriented Flux: An Anisotropy Descriptor

We are interested in the construction of a metric that extracts from the image the geometric information leading to the reconstruction of vessels. This means that we wish to find an estimate for the local orientation and scale, and a criterion on the local geometry to distinguish the presence of vessels from the background. At the position x on an image I, the amount of the image gradient projected along the axis v flowing out from a 3D sphere (or a 2D circle) Sr is measured as in [9],

\[ f(x, v; r) = \int_{\partial S_r} \big( (\nabla (G * I(x + h)) \cdot v)\, v \big) \cdot \frac{h}{|h|}\, da, \tag{7} \]

where G is a Gaussian function with a scale factor of 1 pixel, r is the sphere (or circle) radius, h is the position vector along ∂Sr and da is the infinitesimal area (or length) on ∂Sr. To detect vessels having higher intensity than the background region, one would be interested in finding the vessel direction which minimizes f(x, v; r), i.e. we are looking for $\arg\min_{v} f(x, v; r)$. Using the divergence theorem, it can be shown that f(x, v; r) can be calculated using a simple convolution,

\[ f(x, v; r) = v^T \{ (\partial_{i,j} G) * I * 1_{S_r} \}\, v, \tag{8} \]

where (∂i,j G) is the Hessian matrix of the function G and 1Sr is the indicator function of the sphere (or circle) Sr. Differentiating the above equation with respect to v shows that minimizing f amounts to solving a generalized eigenvalue decomposition problem. Solving this problem gives d eigenvalues (where d = 2 or 3 is the dimension of the image), λ1(·) ≤ · · · ≤ λd(·), and d eigenvectors vi(·), i.e. λi(x; r) = f(x, vi(x; r); r) for i = 1, . . . , d. To handle vessels having various radii, a multi-scale approach should be used along with the OOF method. In [9], Law and Chung proposed to normalize the OOF eigenvalues by the sphere surface area when the OOF method is incorporated in a multi-scale approach for 3D image volumes. In the 2D case the eigenvalues are normalized by the circle perimeter 2πr; in the 3D case they are normalized by the sphere area 4πr². In the 2D case (see figure 3), for a point on the centerline and r equal to the radius of the vessel, the first eigenvector v1 represents the direction orthogonal to the vessel and v2 represents the direction along the vessel. In the 3D case, if the point is on the centerline, the two eigenvectors associated with the first two eigenvalues (λ1, λ2) represent the directions orthogonal to the vessel, and v3 represents the direction along the vessel, see figure 4. On the same figure, one can see that if the point x is on the centerline, the minimal response of the function f is obtained when the radius r is equal to the exact radius of the tube. If the point is inside the tube but not on the centerline, v3 is parallel to the tube orientation, and the other eigenvectors depend on the scale r. If the point is outside the tube (last line), then the vector v3, corresponding to the red area, is oriented toward the centerline.
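To illustrate the eigen-decomposition step, the following minimal sketch assumes that the d × d matrix inside the braces of equation (8) has already been computed at one point and one radius; the input `Q` and the function name `oof_eigen` are hypothetical and not part of the original method description. It also assumes v is restricted to unit vectors, so that minimizing v^T Q v reduces to an ordinary symmetric eigenproblem.

```python
import numpy as np

def oof_eigen(Q, r):
    """Eigen-decompose a symmetric OOF matrix Q (d x d) computed at radius r.

    Returns the eigenvalues in ascending order, normalised by the circle
    perimeter (d = 2) or the sphere area (d = 3), and the eigenvectors as
    columns, so that vec[:, i] belongs to lam[i].
    """
    d = Q.shape[0]
    lam, vec = np.linalg.eigh(Q)  # ascending eigenvalues, orthonormal columns
    norm = 2.0 * np.pi * r if d == 2 else 4.0 * np.pi * r**2
    return lam / norm, vec

# Hypothetical 2D response: strongly negative across the vessel (first axis).
Q = np.array([[-3.0, 0.0], [0.0, -0.5]])
lam, vec = oof_eigen(Q, r=2.0)
v1 = vec[:, 0]  # direction of minimal flux, orthogonal to the vessel
v2 = vec[:, 1]  # direction along the vessel
```

For a unit vector v, the quadratic form v^T Q v is smallest along `v1`, which is why the first eigenvector is read as the direction orthogonal to a bright vessel.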
Li and Yezzi [8] proposed a new variant of the classical, purely spatial, minimal path technique by incorporating an extra non-spatial dimension into the search space. The crucial step of this method is to build an adequate metric that drives the propagation.

F. Benmansour and L.D. Cohen

Fig. 3. The plots of the values of f(x, v; r) obtained from the synthetic image shown on the left, at four different positions with various radii and projection axes. (a) Four positions of interest, denoted x1, x2, x3 and x4, are shown along with the original synthetic image. (b) An illustration of the polar coordinate system used in (c)-(f). (c)-(f) The plots of the values of f(·) and the corresponding eigenvectors, computed at the four different positions shown in (a), using various values of r and different projection axes (cos θ, sin θ)^T.

Fig. 4. Plot of f(x, v; r) superimposed on the original 3D synthetic image for three different points (one on each line) and different values of the radius, r = 3, . . . , 7 from left to right. The radius of the tube in the top half of the image is equal to 4, and equal to 6 in the bottom half. Similarly to figure 3, the visualization of the normalized flux function is done using a spherical coordinate system (instead of the polar one used in 2D). The first point is on the centerline of the tube. The second point is inside the tube but not on the centerline. The third point is outside the tube. The reader should zoom in on each image. Notice that the colormaps are different.

Li and Yezzi [8] proposed different isotropic potentials. The main drawback, as they mention, is that these potentials are very parameter dependent and do not exploit the vessel orientation. Our main contribution is to improve the method of Li and Yezzi by adding an anisotropic formulation to it; the anisotropic metric is constructed by extending the OOF descriptor presented by Law et al. [9].


Fig. 5. The constructed metric for different scales r = 1, 5, 10, 15, 20 from left to right. The original image is shown in figure 3 (a); the radius of the structure is equal to 10. We used the same color range for all images, so one can see that the optimal anisotropy is obtained along the centerline of the tubular structure when the scale r is equal to the exact radius of the tube. On the top, we show a display of $\tilde{M}(x, r)^{-1}$. On the bottom, responses of $P_{radii}$ are shown.

The (d + 1)D minimal path is found by minimizing the following energy:

\[ \int_{\gamma} \sqrt{\gamma'(s)^T\, M(\gamma(s))\, \gamma'(s)}\; ds, \]

where M is the (d + 1)D anisotropic metric we want to construct. It is not natural to consider orientations on the (d + 1)th dimension, i.e. the radii dimension. Thus one can decompose the metric M by blocks as follows:

\[ M(x, r) = \begin{pmatrix} \tilde{M}(x, r) & 0 \\ 0 & P_{radii}(x, r) \end{pmatrix}, \]

where $\tilde{M}(x, r)$ is a d × d symmetric positive definite matrix giving the spatial anisotropy and $P_{radii}(x, r)$ is the radii potential (also strictly positive). Since the result given by the anisotropic minimal path method is very dependent on the metric, the results inherit the advantages and drawbacks of the constructed metric, so we should be very careful with its construction. First, let us fix conditions on the desired metric. The spatial metric $\tilde{M}$ has to be well oriented along the vessel centerline, and the radii potential $P_{radii}$ has to be small for the adequate scale at any point of the image. $P_{radii}$ corresponds to the inverse speed for the radii dimension. Since $\tilde{M}$ is symmetric positive definite, we can decompose it as follows:

\[ \tilde{M}(\cdot) = \sum_{i=1}^{d} m_i(\cdot)\, u_i(\cdot)\, u_i(\cdot)^T, \]

where 0 < m1 ≤ · · · ≤ md are the eigenvalues and ui are the associated eigenvectors. The velocity of the propagating front along the direction ui is equal to $1/\sqrt{m_i}$. We used the OOF descriptor to construct the metric as follows:

\[ \tilde{M}(\cdot) = \sum_{i=1}^{d} \exp\!\left( \alpha\, \frac{\sum_{j \neq i} \lambda_j(\cdot)}{d-1} \right) v_i(\cdot)\, v_i(\cdot)^T, \qquad P_{radii}(\cdot) = \beta \exp\!\left( \alpha\, \frac{\sum_{i=1}^{d} \lambda_i(\cdot)}{d} \right). \tag{9} \]

The constant α is controlled by an intuitive parameter, the maximal spatial anisotropy ratio μ:

\[ \mu = \max_{x,r} \frac{\exp(\alpha \lambda_2(x, r))}{\exp(\alpha \lambda_1(x, r))} \quad \text{in the 2D case, and} \quad \mu = \max_{x,r} \frac{\exp\!\left( \alpha\, \frac{\lambda_2(x,r) + \lambda_3(x,r)}{2} \right)}{\exp\!\left( \alpha\, \frac{\lambda_1(x,r) + \lambda_2(x,r)}{2} \right)} \quad \text{in the 3D case.} \]

By choosing the maximal spatial anisotropy ratio μ, the constant α is fixed. By doing so, the anisotropy descriptor M becomes contrast invariant because the OOF is linear in the image. The parameter β controls the radii speed. In 2D (it is very similar in 3D), if $P_{radii} \le \exp(\alpha\lambda_1)$ then the Fast Marching propagation is faster for the radii than for the spatial dimensions. If $P_{radii} \ge \exp(\alpha\lambda_2)$ then the propagation is slower. One can tune the parameter β depending on the tubular structure one wants to extract. If its radius changes a lot, then β should be chosen such that the propagation in the radii dimension is faster. If not, β is chosen such that the propagation is less sensitive in the radii dimension. In figure 5 the constructed metric of the image of figure 3 is shown at several scales. Since we chose the same color range for the visualization, we can see that the directions are well detected, and that the optimal values are obtained along the centerline of the tube

Fig. 6. The red crosses are source points given by the user, and the blue ones are end points. In each case the segmented centerlines are displayed as well as the envelope of the moving discs. In the middle, the associated minimal action map U as well as the 3D minimal path between the two selected points are shown (transparent visualization).


when the scale is equal to the tube radius. For our experiments, we took μ = 10 and β such that $\max \frac{\exp(\alpha\lambda_1)}{P_{radii}} = 5$; this means that in the worst case, the speed along the radii dimension is 5 times faster than along the spatial dimensions. We did so because we wanted our algorithm to be sensitive to the radii dimension.
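The construction of the metric from the OOF eigen-decomposition, with α fixed by the anisotropy ratio μ and β fixed by the radii speed ratio, can be sketched as follows. This is only an illustrative reading of the formulas of this section, not the authors' implementation: the function name `build_metric`, the array layout and the parameter `radii_ratio` are assumptions, and the eigenvalues are assumed to be already normalized and sorted in ascending order for every sample (x, r).

```python
import numpy as np

def build_metric(eigvals, eigvecs, mu=10.0, radii_ratio=5.0):
    """Build the spatial metric and the radii potential from OOF output.

    eigvals: (N, d) array, ascending OOF eigenvalues for N samples (x, r).
    eigvecs: (N, d, d) array, eigvecs[n, :, i] is the eigenvector v_i.
    Returns (M, P, alpha, beta) with M of shape (N, d, d) and P of shape (N,).
    """
    eigvals = np.asarray(eigvals, dtype=float)
    d = eigvals.shape[1]
    # Exponent attached to v_i: mean of the other d - 1 eigenvalues.
    others = (eigvals.sum(axis=1, keepdims=True) - eigvals) / (d - 1)
    # Largest exponent gap over all samples; alpha makes the largest
    # ratio of spatial eigenvalues equal to mu.
    gap = (others[:, 0] - others[:, -1]).max()
    alpha = np.log(mu) / gap
    weights = np.exp(alpha * others)
    # M[n] = sum_i weights[n, i] * v_i v_i^T
    M = np.einsum("ni,nki,nli->nkl", weights, eigvecs, eigvecs)
    # Radii potential up to beta, then beta such that
    # max exp(alpha * lambda_1) / P_radii equals radii_ratio.
    mean_lam = eigvals.mean(axis=1)
    beta = np.exp(alpha * (eigvals[:, 0] - mean_lam)).max() / radii_ratio
    P = beta * np.exp(alpha * mean_lam)
    return M, P, alpha, beta

# Two hypothetical 2D samples with axis-aligned eigenvectors.
lam = np.array([[-2.0, 0.0], [-1.0, -0.5]])
vecs = np.stack([np.eye(2), np.eye(2)])
M, P, alpha, beta = build_metric(lam, vecs)
```

In 2D the exponent gap is λ2 − λ1, and in 3D it is (λ3 − λ1)/2, which matches the two definitions of μ; the sample with the largest gap attains the anisotropy ratio μ exactly.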

4 Experimental Results

Our method is minimally interactive. First, the user has to specify whether the desired vessels are darker or brighter than the background, so that we can consider different criteria on the signs of the eigenvalues. Then the scale range [rmin, rmax], which corresponds to the range of radii of the vessel one wants to extract, is given by the user. Finally, a few points are required as source points or end points of the Fast Marching algorithm. We used the metric described in the previous section to find the minimal anisotropic path (as described in section 2) between two or more selected points (see figures 6 and 7). For any selected point, the associated radius is equal to the minimal radius rmin given by the user. In figure 6, segmentation results on synthetic and real noisy 2D images are shown. On the first synthetic image, the source point and the destination are selected on the centerline. The obtained tube is perfectly detected as well as the centerline. On the second

Fig. 7. First line : RCA segmentation using the tubular anisotropy approach shown on the whole image and on the selected sub-volume. Second line : LAD segmentation shown on the whole image and on the selected sub-volume. Only few points are required (the extremities of the paths). The tubular anisotropy method provides the centerline as well as vessels boundaries.


image, the initial points are not centered, but the centerline given by our algorithm quickly returns to the real centerline. This makes our algorithm robust to initialization. The third synthetic image shows that our approach is robust to scale changes. On the last line of figure 6, segmentation results are shown on real noisy images. In figure 7, segmentation results are shown on real medical images. First, right coronary arteries (RCA) are segmented. Second, left anterior descending (LAD) arteries are segmented. One can see that the obtained radii on the principal coronary branches are larger than those of the secondary branches. Thus, our approach is robust to scale changes and bifurcations. Nevertheless, our current implementation requires huge memory allocations due to the 4D and anisotropic aspects. To overcome this issue, we added an interactive preprocessing tool to select a sub-volume containing the desired vessels (see figure 7). Moreover, we are working on a new implementation of the tubular anisotropy approach to make the memory allocation dynamic and hence to benefit from the front propagation aspect of the fast marching algorithm. Besides the reduction of the computation time (which has actually been achieved), we will save on memory allocation and will have a new version of our algorithm that extracts the whole coronary arteries using a regular PC.

5 Conclusion

In this paper we have proposed a new general method for tubular structure extraction in 2D and 3D images. Our method exploits the orientation of the vessels by using the optimally oriented flux to construct a multi-resolution anisotropic metric that extracts the local geometry from the image and describes the orientation and scale of the vessels. Combining this metric with the anisotropic minimal path technique, we were able to find a complete description of the tubular structure, i.e. the centerline as well as the boundary. To summarize, our method is minimally interactive and robust to initialization, scale variations and bifurcations.

Acknowledgements

We would like to thank Professor Anthony J. Yezzi and Max Wai-Kong Law for interesting discussions. We also thank Eduardo Davila for his precious help with the implementation of the interface. This work was partially supported by ANR grant SURF -NT05-2_45825.

References

1. Sato, Y., Nakajima, S., Shiraga, N., Atsumi, H., Yoshida, S., Koller, T., Gerig, G., Kikinis, R.: Three-dimensional multi-scale line filter for segmentation and visualization of curvilinear structures in medical images. Med. Image Anal. 2(2), 143–168 (1998)
2. Krissian, K.: Flux-based anisotropic diffusion applied to enhancement of 3D angiogram. TMI 21(11), 1440–1442 (2002)
3. Frangi, A., Niessen, W.J., Vincken, K.L., Viergever, M.A.: Multiscale vessel enhancement filtering. In: Wells, W.M., Colchester, A.C.F., Delp, S.L. (eds.) MICCAI 1998. LNCS, vol. 1496, pp. 130–137. Springer, Heidelberg (1998)
4. Kirbas, C., Quek, F.K.H.: A review of vessel extraction techniques and algorithms. ACM Computing Surveys 36, 81–121 (2004)
5. Deschamps, T., Cohen, L.: Fast extraction of minimal paths in 3D images and applications to virtual endoscopy. MIA 5(4) (December 2001)
6. Cohen, L.D., Kimmel, R.: Global minimum for active contour models: a minimal path approach. International Journal of Computer Vision 24, 57–78 (1997)
7. Jbabdi, S., Bellec, P., Toro, R., Daunizeau, J., Pélégrini-Issac, M., Benali, H.: Accurate anisotropic fast marching for diffusion-based geodesic tractography. Journal of Biomedical Imaging 2008(1), 1–12 (2008)
8. Li, H., Yezzi, A.: Vessels as 4D curves: Global minimal 4D paths to extract 3D tubular surfaces and centerlines. IEEE Transactions on Medical Imaging 26(9), 1213–1223 (2007)
9. Law, M.W.K., Chung, A.C.S.: Three dimensional curvilinear structure detection using optimally oriented flux. In: ECCV, vol. 4, pp. 368–382 (2008)
10. Sethian, J.A., Vladimirsky, A.: Fast methods for the eikonal and related Hamilton-Jacobi equations on unstructured meshes. Proceedings of the National Academy of Sciences 97(11), 5699–5703 (2000)
11. Sethian, J.A.: A fast marching level set for monotonically advancing fronts. Proceedings of the National Academy of Sciences 93, 1591–1595 (1996)
12. Tsitsiklis, J.N.: Efficient algorithms for globally optimal trajectories. IEEE Transactions on Automatic Control 40, 1528–1538 (1995)
13. Chopp, D.L.: Replacing iterative algorithms with single-pass algorithms. Proc. Nat. Acad. Sc. USA 98(20), 10992–10993 (2001)
14. Lin, Q.: Enhancement, extraction, and visualization of 3D volume data. PhD thesis, Linköpings Universitet (2003)
15. Dijkstra, E.W.: A note on two problems in connection with graphs. Numerische Mathematik 1, 269–271 (1959)

An Unconstrained Multiphase Thresholding Approach for Image Segmentation

Benjamin Berkels

Institut für Numerische Simulation, Rheinische Friedrich-Wilhelms-Universität Bonn, Nussallee 15, 53115 Bonn, Germany
[email protected]
http://numod.ins.uni-bonn.de/

Abstract. In this paper we provide a method to find global minimizers of certain non-convex 2-phase image segmentation problems. This is achieved by formulating a convex minimization problem whose minimizers are also minimizers of the initial non-convex segmentation problem, similar to the approach proposed by Nikolova, Esedoğlu and Chan. The key difference to the latter model is that the new model does not involve any constraint in the convex formulation that needs to be respected when minimizing the convex functional, neither explicitly nor by an artificial penalty term. This approach is related to recent results by Chambolle. Eliminating the constraint considerably reduces the computational difficulties, and even a straightforward gradient descent scheme leads to a reliable computation of the global minimizer. Furthermore, the model is extended to multiphase segmentation along the lines of Vese and Chan. Numerical results of the model applied to the classical piecewise constant Mumford-Shah functional for two, four and eight phase segmentation are shown.

1 Introduction

Image segmentation is one of the fundamental research topics in the field of image processing. In particular, the Mumford-Shah model [1] is widely used in this context. One of the difficulties of this and many other variational image processing models is that the underlying energy functional has local, non-global minima. This is not only a theoretical problem, since the commonly used numerical minimization techniques often get stuck in local minima that differ considerably from a global minimum, hence possibly producing useless results. The goal of this paper is to introduce a method to obtain a global minimizer of the Mumford-Shah functional for 2-phase segmentation that only involves solving an unconstrained convex minimization problem. This method can be extended to multiphase segmentation by the ideas of Vese and Chan [2] in a canonical way.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 26–37, 2009. © Springer-Verlag Berlin Heidelberg 2009

1.1 Related Work

The problem of minimizing the Mumford-Shah segmentation functional has been extensively studied in the last decade, leading to a wide range of existing methods, each with its own shortcomings. One of the first numerically feasible methods to obtain (local) minimizers of the functional was proposed by Chan and Vese [3]. They build on the levelset methods of Osher and Sethian [4] and parameterize the unknown set by a levelset function. Shen [5] developed a Γ-convergence formulation along with a simple implementation by the iterated integration of a linear Poisson equation. The unknown set is represented in a diffuse way by a phase field. In [6], Esedoğlu and Tsai tackle the minimization problem based on the threshold dynamics of Merriman, Bence and Osher [7] for evolving an interface by its mean curvature. Here the minimization is achieved by alternating the solution of a linear parabolic partial differential equation and simple thresholding. Alvino and Yezzi [8] approximate Mumford-Shah segmentation using reduced image bases. According to them, the majority of the robustness of Mumford-Shah segmentation can be obtained without allowing each pixel to vary independently. Their approximative model has comparable performance to Mumford-Shah segmentations where each pixel is allowed to vary freely. A way to obtain global minimizers was introduced by Nikolova, Esedoğlu and Chan [9]. Here, a convex constrained minimization problem has to be solved, followed by a simple thresholding of the latter minimizer. This method is closely related to the method we propose in this paper; the key difference is that [9] requires a constraint in the convex minimization while the model proposed in this paper does not involve any constraint in the convex formulation. On the other hand, there are methods to solve a certain class of minimal surface problems by unconstrained convex optimization, cf. the work of Chambolle and Darbon [10,11].
The 2-phase Mumford-Shah functional belongs to this class, yet to the best of our knowledge nobody seems to have tapped the potential offered by these general insights for Mumford-Shah based image segmentation so far.

2 Constrained Global 2-Phase Minimization

First let us describe the general framework and review the work of Nikolova et al. [9], the starting point for our model. In the following, Ω denotes our computational domain, an arbitrary but fixed subset of $\mathbb{R}^n$. For given indicator functions $f_1, f_2 \in L^1(\Omega)$ such that $f_1, f_2 \ge 0$ a.e. we consider the prototype Mumford-Shah energy

\[ E_{MS}[\Sigma] := \int_{\Sigma} f_1\, dx + \int_{\Omega \setminus \Sigma} f_2\, dx + \nu\, \mathrm{Per}(\Sigma), \tag{1} \]

where Per(Σ) denotes the perimeter of the set Σ ⊂ Ω in Ω. If $u_0$ is an image, $c_1, c_2 \in \mathbb{R}$ are two grey values and $f_i(x) := (u_0(x) - c_i)^2$, this is the well known piecewise constant Mumford-Shah functional for 2-phase segmentation, i.e.

\[ E[\Sigma, c_1, c_2] = \int_{\Sigma} (u_0 - c_1)^2\, dx + \int_{\Omega \setminus \Sigma} (u_0 - c_2)^2\, dx + \nu\, \mathrm{Per}(\Sigma). \tag{2} \]


Remark 1. Because of

\[ E_{MS}[\Sigma] = \underbrace{\int_{\Sigma} (f_1 - f_2)\, dx + \nu\, \mathrm{Per}(\Sigma)}_{=:\hat{E}_{MS}[\Sigma]} + \int_{\Omega} f_2\, dx, \]

$E_{MS}$ and $\hat{E}_{MS}$ share the same minimizers.

Remark 2. For $h(x) := e^{-|x|^2}$, we have

\[ E_{MS}[\Sigma] = \int_{\Sigma} (f_1 + h)\, dx + \int_{\Omega \setminus \Sigma} (f_2 + h)\, dx + \nu\, \mathrm{Per}(\Sigma) - \underbrace{\int_{\Omega} h\, dx}_{=C<\infty}, \]

i.e. replacing $f_1$ and $f_2$ by $f_1 + h$ and $f_2 + h$ does not affect the minimizers of $E_{MS}$. This, combined with $f_1, f_2 \ge 0$ a.e., means that we can assume $f_1, f_2 > 0$ a.e. in Ω without loss of generality.

To obtain (local) minimizers of the functional above, Chan and Vese [3] proposed to parametrize the unknown set Σ by a levelset function φ and get the energy

\[ E_{CV}[\phi] := \int_{\Omega} H(\phi) f_1 + (1 - H(\phi)) f_2 + \nu\, |\nabla (H(\phi))|\, dx. \]

Here, H(·) denotes the Heaviside function, i.e. H(s) = 1 for s > 0 and H(s) = 0 else. A gradient descent will be used for minimization, therefore H is replaced by a smeared out Heaviside function, e.g. $H_\delta(x) := \frac{1}{2} + \frac{1}{\pi} \arctan\left(\frac{x}{\delta}\right)$, where δ > 0. While the specific choice is not important, it is important to use a function whose derivative does not have compact support (cf. [3]). This gives the regularized energy

\[ E_{CV,\delta}[\phi] := \int_{\Omega} H_\delta(\phi) f_1 + (1 - H_\delta(\phi)) f_2 + \nu\, |\nabla (H_\delta(\phi))|\, dx \tag{3} \]

and yields the gradient descent

\[ \partial_t \phi = H_\delta'(\phi) \left[ (f_2 - f_1) + \nu\, \mathrm{div}\left( \frac{\nabla \phi}{|\nabla \phi|} \right) \right]. \tag{4} \]

One of the major drawbacks of the energy (3) is its non-convexity in φ. In [9], Nikolova et al. noted that the gradient descent (4) and

\[ \partial_t \phi = (f_2 - f_1) + \nu\, \mathrm{div}\left( \frac{\nabla \phi}{|\nabla \phi|} \right) \]

have the same stationary points, because $H_\delta'(\phi) > 0$. Obviously the latter is the gradient descent of the energy

\[ E_{CE}[\phi] := \int_{\Omega} (f_1 - f_2)\, \phi + \nu\, |\nabla \phi|\, dx. \]
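The argument that both flows share their stationary points rests on the derivative of the smeared out Heaviside function being strictly positive everywhere. As a small numerical illustration (a sketch only; the function names are hypothetical), the arctan regularization and its closed-form derivative δ / (π (δ² + x²)) can be written as:

```python
import numpy as np

def heaviside_smooth(x, delta=1.0):
    """Smeared out Heaviside function 1/2 + arctan(x / delta) / pi."""
    return 0.5 + np.arctan(x / delta) / np.pi

def heaviside_smooth_prime(x, delta=1.0):
    """Derivative of heaviside_smooth; strictly positive for every real x."""
    return delta / (np.pi * (delta**2 + x**2))

x = np.linspace(-50.0, 50.0, 1001)
dH = heaviside_smooth_prime(x, delta=0.5)
# dH decays like 1 / x**2 but never reaches zero (no compact support),
# so multiplying a flow by it cannot create or remove stationary points.
```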


In general, $f_1 - f_2$ takes positive and negative values, therefore the energy is not bounded (neither from below nor from above). In other words, it does not necessarily have a minimizer. However, this is easily fixed by restricting the minimization to 0 ≤ φ(x) ≤ 1 for all x ∈ Ω. Based on this, the following theorem holds:

Theorem 1. For given indicator functions $f_1, f_2 \in L^1(\Omega)$ such that $f_1, f_2 \ge 0$ a.e., let

\[ u := \operatorname*{argmin}_{0 \le \tilde{u} \le 1} \int_{\Omega} (f_1 - f_2)\, \tilde{u} + \nu\, |\nabla \tilde{u}|\, dx = \operatorname*{argmin}_{0 \le \tilde{u} \le 1} E_{CE}[\tilde{u}] \]

and $\Sigma_c := \{x \in \Omega \,|\, u(x) > c\}$. Then $\Sigma_c$ is a minimizer of the Mumford-Shah energy (1) for all c ∈ [0, 1).

Proof. Nikolova et al. proved this theorem in [9] for a.e. c ∈ [0, 1]; we extend it here to hold not only for almost every, but for every c ∈ [0, 1). First, we briefly sketch the proof given by Nikolova et al. for a.e. c ∈ [0, 1]. Using 0 ≤ u ≤ 1 and the coarea formula, one can show

\[ E_{CE}[u] = \int_0^1 E_{MS}[\Sigma_c]\, dc - C, \]

where C is a constant independent of u. Let $\Sigma^* \subset \Omega$ be a minimizer of $E_{MS}$ (the existence of such minimizers using convergence in measure follows from standard arguments) and let $M := \{c \in [0, 1] \,|\, E_{MS}[\Sigma_c] > E_{MS}[\Sigma^*]\}$. Assuming μ(M) > 0 leads to the contradiction $E_{CE}[\chi_{\Sigma^*}] < E_{CE}[u]$; therefore μ(M) = 0 holds and the statement is proven for a.e. c ∈ [0, 1]. Here, $\chi_A$ denotes the characteristic function of the set A.

Now we extend the statement to all c ∈ [0, 1), inspired by the proof of Lemma 4 (iii) in [12]: Again let u be a minimizer of $E_{CE}$ under the constraint 0 ≤ u ≤ 1 and denote its superlevelsets by $\Sigma_c$. Choose an arbitrary but fixed $\hat{c} \in [0, 1)$. The statement holds for a.e. c ∈ [0, 1], so by Remark 1, there exists a sequence $(c_n) \in [0, 1]^{\mathbb{N}}$ with $c_n \downarrow \hat{c}$ such that

\[ \Sigma_{c_n} \in \operatorname*{argmin}_{\Sigma \subset \Omega} \hat{E}_{MS}[\Sigma]. \]

Since the superlevelsets of a function are contained in each other, we have $\chi_{\Sigma_{c_n}} = \chi_{\bigcup_{k=1}^{n} \Sigma_{c_k}} \to \chi_{\Sigma^{\cup}}$ pointwise a.e., where $\Sigma^{\cup} := \bigcup_{n=1}^{\infty} \Sigma_{c_n}$. Setting $g := f_1 - f_2$ and using Lebesgue's dominated convergence theorem, we obtain

\[ \int_{\Sigma^{\cup}} g\, dx = \int_{\Omega} g\, \chi_{\Sigma^{\cup}}\, dx = \lim_{n \to \infty} \int_{\Omega} g\, \chi_{\Sigma_{c_n}}\, dx = \lim_{n \to \infty} \int_{\Sigma_{c_n}} g\, dx. \]

Here we used $|g\, \chi_{\Sigma_{c_n}}| \le |g| \le |f_1| + |f_2|$ to provide the integrable upper bound. For each n and Σ ⊂ Ω, we have

\[ \int_{\Sigma_{c_n}} g\, dx + \nu\, \mathrm{Per}(\Sigma_{c_n}) \le \int_{\Sigma} g\, dx + \nu\, \mathrm{Per}(\Sigma). \]

Using the continuity argument from above and the lower semicontinuity of the perimeter (cf. [13]), we get

\[ \int_{\Sigma^{\cup}} g\, dx + \nu\, \mathrm{Per}(\Sigma^{\cup}) \le \int_{\Sigma} g\, dx + \nu\, \mathrm{Per}(\Sigma), \]

i.e. $\Sigma^{\cup}$ is a minimizer of $\hat{E}_{MS}$ (and, by Remark 1, of $E_{MS}$). Combining this with

\[ \Sigma_{\hat{c}} = \{x \in \Omega \,|\, u(x) > \hat{c}\} = \bigcup_{n=1}^{\infty} \{x \in \Omega \,|\, u(x) > c_n\} = \bigcup_{n=1}^{\infty} \Sigma_{c_n} \]

concludes the proof.

Knowing that Theorem 1 holds true for all c ∈ [0, 1) also remedies the last bit of "uncertainty" left in [9].

Remark 3. For any function u that fulfills the constraint, obviously {u > 1} = ∅. Therefore we cannot expect Theorem 1 to hold for c = 1.

To solve the constrained optimization problem, Nikolova et al. show that the constrained problem has the same minimizers as the unconstrained problem if a penalty term of the form α p(u(x)) is added with a sufficiently large coefficient α (cf. [9], Claim 1). Here p denotes $p(s) = \max\{0, 2\,|s - \tfrac{1}{2}| - 1\}$. While this result already gives a method to find global minimizers of $E_{MS}$ by solving a convex, unconstrained minimization problem, its practical relevance is limited. Most numerical minimization methods rely on the gradient of the functional, but the proposed penalty term is not differentiable, making a regularization necessary. However, any smooth regularization of the penalty term prevents the minimizers of the convex, constrained functional from coinciding with those of the convex functional with penalty term. The stronger the regularization, the more the minimizers deviate. Furthermore, the regularization imposes numerical difficulties. If an explicit gradient descent is used for the minimization (as proposed in [9]), a suitable timestep size control is needed to ensure convergence. The step sizes allowed by such methods, e.g. the Armijo rule [14], typically correspond to the size of the region in which the linearization of the functional properly approximates the functional. Due to the nature of the penalty term p, the linearization at 0 and 1 of a regularized version of it only approximates the regularization properly in a region that is of the size of the regularization parameter. So, as soon as the current iterate of the gradient descent takes values near 0 or 1, the timestep control only allows timestep sizes of the order of the regularization parameter, which, as mentioned above, cannot be chosen too big.
Instead of using a penalty term one could of course also approach the constrained convex optimization problem directly. This is done for example by Bresson et al. [15]. Their approach does not need a penalty term and gives an efficient algorithm to minimize ECE, but has to introduce an additional unknown v and a regularization parameter θ and needs to minimize for u and v alternatingly. Furthermore, the key idea to apply Chambolle's TV minimization algorithm [16] can


also be directly applied to our model to obtain a simpler and faster minimization algorithm: there is no need to introduce v, θ and the alternating minimization. Therefore it is worth investigating whether it is possible to simplify the problem by getting rid of the constraint altogether.

3 Unconstrained Global 2-Phase Minimization

Another alternative to Chan-Vese is a phase field approach [6, 5] with a typical double well term:

\[ E_{PH,\epsilon}[u] := \int_{\Omega} u^2 f_1 + (1 - u)^2 f_2 + \nu \left( \frac{1}{\epsilon}\, u^2 (1 - u)^2 + \epsilon\, |\nabla u|^2 \right) dx. \]

A minimizer u of this energy is a diffuse representation of the segmentation, i.e. {u = 0} and {u = 1} represent the two segments respectively, with a smooth transition in between. $E_{PH,\epsilon}[u]$ is known to Γ-converge to $E_{MS}$ [5], but unfortunately it is not convex and does not permit jumps in u for ε > 0. Knowing both $E_{CE}$ and $E_{PH,\epsilon}$, the question arises whether it is possible to combine the advantages of both models while eliminating some of the disadvantages. Heuristically looking at both energies served as motivation to investigate the following energy:

\[ E[u] := \int_{\Omega} u^2 f_1 + (1 - u)^2 f_2 + \nu\, |\nabla u|\, dx. \tag{5} \]

This energy is convex because it does not involve the non-convex double well term of $E_{PH,\epsilon}$, and can be minimized without imposing constraints because it does not have the indicator term from $E_{CE}$ that is not bounded from below. Furthermore, it permits jumps in u.

Remark 4. Given a function u, obviously we have E[min{max{0, u}, 1}] ≤ E[u]. Therefore, a minimizer $u_{min}$ fulfills $0 \le u_{min} \le 1$.

While the proposed functional has some nice obvious properties, it is far from obvious whether there is a relation between its minimizer and minimizers of $E_{MS}$. Before we tackle this question, let us remark a link between $E_{CE}$ and E:

Remark 5. There is a direct relationship between $E_{CE}$ and E: A straightforward calculation shows

\[ u^2 f_1 + (1 - u)^2 f_2 = (f_1 - f_2)\, u + \left(u - \tfrac{1}{2}\right)^2 (f_1 + f_2) - \tfrac{1}{4} (f_1 + f_2) + f_2. \]

Therefore

\[ E[u] = \int_{\Omega} (f_1 - f_2)\, u + \left(u - \tfrac{1}{2}\right)^2 (f_1 + f_2) - \tfrac{1}{4} (f_1 + f_2) + f_2 + \nu\, |\nabla u|\, dx = E_{CE}[u] + \int_{\Omega} (f_1 + f_2) \left(u - \tfrac{1}{2}\right)^2 dx + C. \]

In other words, E essentially equals $E_{CE}$ plus an additional quadratic penalty energy. The constant C is clearly irrelevant for the minimizers.
32

B. Berkels

To investigate the relation between the minimizers of E and minimizers of $E_{MS}$ we can make use of the theory derived in the context of the connection between minimal surface problems and total variation minimization. The following general statement has been made by Chambolle [17] and by Chambolle and Darbon [11] in the continuous setting; its discrete counterpart is well known:

Theorem 2. Let $\Psi : \Omega \times \mathbb{R} \to \mathbb{R}$, $(x, s) \mapsto \Psi(x, s)$, such that Ψ(x, ·) is $C^1$ and uniformly convex for all x ∈ Ω, and

\[ u := \operatorname*{argmin}_{\tilde{u}} \int_{\Omega} \Psi(x, \tilde{u}(x)) + \nu\, |\nabla \tilde{u}|\, dx. \]

Then $\Sigma_c := \{x \in \Omega \,|\, u(x) > c\}$ for all $c \in \mathbb{R}$ is a minimizer of

\[ \int_{\Sigma} \partial_s \Psi(x, c)\, dx + \nu\, \mathrm{Per}(\Sigma). \]

Note that this general statement cannot be directly applied to the model of Nikolova et al. discussed in Section 2 because the integrand is neither uniformly (not even strictly) convex nor does the general statement incorporate the constraint. As remarked in [11], the proof for a more specific statement given in [10] still applies to Theorem 2.

Theorem 3. If u is a minimizer of (5), then $\{u > \tfrac{1}{2}\}$ minimizes

\[ E_{MS}[\Sigma] = \int_{\Sigma} f_1\, dx + \int_{\Omega \setminus \Sigma} f_2\, dx + \nu\, \mathrm{Per}(\Sigma). \]

Proof. Let $\Psi(x, s) := s^2 f_1(x) + (1 - s)^2 f_2(x)$. Obviously Ψ(x, ·) is $C^2$ for all x ∈ Ω and we have $\partial_s \Psi(x, s) = 2 s f_1(x) + 2 (s - 1) f_2(x)$ and $\partial_s^2 \Psi(x, s) = 2 (f_1(x) + f_2(x))$. From Remark 2, we know that $f_1, f_2 > 0$ a.e., therefore Ψ(x, ·) is uniformly convex for a.e. x ∈ Ω. Now just apply Theorem 2, noting $\partial_s \Psi(x, \tfrac{1}{2}) = f_1(x) - f_2(x)$ and Remark 1.

In this sense, our theorem is a corollary of Theorem 2. The preceding theorem finally tells us how to find a global minimizer of $E_{MS}[\cdot]$ given in (1): Minimize the convex energy (5) and threshold the minimizer at 1/2. In the case of the piecewise constant Mumford-Shah functional for 2-phase segmentation, we obtain a global minimizer of the Mumford-Shah energy (2) with respect to Σ for fixed gray values c1, c2. We do not necessarily find a global minimizer with respect to Σ, c1 and c2. Another link between $E_{CE}$ and $E_{PH,\epsilon}$ is the so-called piecewise constant levelset method [18] for 2-phase segmentation that constrains the levelset function to be piecewise constant. If this constraint is approximated with a penalty energy, the method equals the phase field approach. If the constraint is relaxed to a certain boundedness constraint, the method equals [9]. In both cases the fidelity term has to be altered accordingly, making use of the fact that this term is the same in $E_{CE}$ and $E_{PH,\epsilon}$ if u only takes the values 0 and 1.
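The recipe "minimize the convex energy (5), then threshold at 1/2" can be sketched with a plain explicit gradient descent. The code below is a minimal illustration under stated assumptions, not the author's implementation: the total variation is smoothed as sqrt(|∇u|² + eps²), gradients use forward differences on a pixel grid, the fixed step size is the reciprocal of a Lipschitz bound of the smoothed energy gradient, and all function names and parameter values are hypothetical.

```python
import numpy as np

def grad(u):
    """Forward differences with zero gradient on the last row and column."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    d = np.zeros_like(px)
    d[0, :] += px[0, :]
    d[1:-1, :] += px[1:-1, :] - px[:-2, :]
    d[-1, :] += -px[-2, :]
    d[:, 0] += py[:, 0]
    d[:, 1:-1] += py[:, 1:-1] - py[:, :-2]
    d[:, -1] += -py[:, -2]
    return d

def energy(u, f1, f2, nu, eps):
    """Discrete version of energy (5) with smoothed total variation."""
    gx, gy = grad(u)
    tv = np.sqrt(gx**2 + gy**2 + eps**2)
    return float(np.sum(u**2 * f1 + (1.0 - u)**2 * f2 + nu * tv))

def minimize_unconstrained(f1, f2, nu=0.2, eps=0.1, n_iter=500):
    """Explicit gradient descent on the smoothed energy; no constraint on u."""
    u = np.full(f1.shape, 0.5)
    # Step size 1 / L, with L bounding the Lipschitz constant of the gradient.
    tau = 1.0 / (2.0 * np.max(f1 + f2) + 8.0 * nu / eps)
    for _ in range(n_iter):
        gx, gy = grad(u)
        norm = np.sqrt(gx**2 + gy**2 + eps**2)
        g = 2.0 * u * f1 - 2.0 * (1.0 - u) * f2 - nu * div(gx / norm, gy / norm)
        u = u - tau * g
    return u

# Synthetic image: bright square on dark background, fixed grey values 1 and 0.
u0 = np.zeros((16, 16))
u0[4:12, 4:12] = 1.0
f1 = (u0 - 1.0)**2
f2 = (u0 - 0.0)**2
u = minimize_unconstrained(f1, f2)
segment = u > 0.5  # thresholding at 1/2 yields the set Sigma
```

The smoothing parameter `eps` trades accuracy of the total variation against the admissible step size; the dual methods mentioned in the text avoid this trade-off.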


Since (5) is similar to the Rudin-Osher-Fatemi energy [19], there is a wide variety of established minimization schemes to choose from, ranging from a straightforward gradient descent scheme with a differentiable approximation of the BV term over primal thresholding methods [20] to sophisticated methods based on the dual formulation of the BV norm, e.g. [16, 11]. With $\Psi(x, s) = \tfrac{1}{2} \left(s - (f_2(x) - f_1(x))\right)^2$, another immediate consequence of Theorem 2 is that the zero superlevelset of a minimizer of the ROF energy

\[ E_{ROF}[u] := \int_{\Omega} \tfrac{1}{2} \left(u - (f_2 - f_1)\right)^2 + \nu\, |\nabla u|\, dx \tag{6} \]

is a global minimizer of $\hat{E}_{MS}$ and therefore of $E_{MS}$. This is another way to obtain a global minimizer of $E_{MS}$ by unconstrained convex optimization, but compared to (5) this method has a few shortcomings, cf. Sections 4 and 5. Furthermore, the boundedness mentioned in Remark 4 does not hold for minimizers of the ROF energy. Perhaps this is one of the reasons why nobody seems to have used the classical ROF functional for Mumford-Shah based image segmentation so far.

4

Multiphase Segmentation

Our functional can be extended to multiphase segmentation in a straightforward manner by using the idea of Vese and Chan [2]. To keep the notation simple, we restrict the discussion to segmentation in 4 phases. The segmentation in $2^n$ phases works analogously. Let $f_1, f_2, f_3, f_4 \in L^1(\Omega)$ such that $f_i \ge 0$ a.e.; then the multiphase functional is given by

$$E[u_1, u_2] := \int_\Omega u_1^2 u_2^2 f_1 + (1 - u_1)^2 u_2^2 f_2 + u_1^2 (1 - u_2)^2 f_3 + (1 - u_1)^2 (1 - u_2)^2 f_4 + \nu\left(|\nabla u_1| + |\nabla u_2|\right) dx. \qquad (7)$$

If we fix $u_2$, the reduced functional $E[\cdot, u_2]$ is the same as the 2-phase functional (5) with the indicator functions $\tilde{f}_1 = u_2^2 f_1 + (1 - u_2)^2 f_3$ and $\tilde{f}_2 = u_2^2 f_2 + (1 - u_2)^2 f_4$, up to the term $\nu \int_\Omega |\nabla u_2| \, dx$, which does not depend on $u_1$. As in the 2-phase case, we can assume $f_i > 0$ a.e. without loss of generality, and because either $u_2^2 > 0$ or $(1 - u_2)^2 > 0$ holds, we have $\tilde{f}_1, \tilde{f}_2 > 0$. Therefore, all statements proven for the 2-phase functional can be applied to $E[\cdot, u_2]$, i.e. we can compute the global minimum (for fixed $u_2$). The same applies for fixed $u_1$, so as an optimization strategy, we propose to minimize with respect to $u_1$ and $u_2$ alternatingly. Even though it is easy to extend (5) to multiphase segmentation, the same does not apply to the ROF energy (6). There is no apparent extension in the sense of [2] to formulate the multiphase segmentation in a single functional.
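The reduction to the 2-phase case can be checked numerically. The following is a minimal sketch, not part of the original method: it evaluates the data term of (7) on random arrays and compares it with the 2-phase data term $u_1^2 \tilde{f}_1 + (1 - u_1)^2 \tilde{f}_2$. The function names and the random test data are illustrative assumptions, and integrals are approximated by plain pixel sums.

```python
import numpy as np

def data_term_4phase(u1, u2, f1, f2, f3, f4):
    """Data term of the 4-phase functional (7), summed over all pixels."""
    return np.sum(u1**2 * u2**2 * f1 + (1 - u1)**2 * u2**2 * f2
                  + u1**2 * (1 - u2)**2 * f3 + (1 - u1)**2 * (1 - u2)**2 * f4)

def reduced_indicators(u2, f1, f2, f3, f4):
    """Indicator functions of the reduced 2-phase problem for fixed u2."""
    f1_tilde = u2**2 * f1 + (1 - u2)**2 * f3
    f2_tilde = u2**2 * f2 + (1 - u2)**2 * f4
    return f1_tilde, f2_tilde

def data_term_2phase(u, f1, f2):
    """2-phase data term, assuming the form u^2 f1 + (1 - u)^2 f2."""
    return np.sum(u**2 * f1 + (1 - u)**2 * f2)

# Hypothetical test data: random segmentation functions and indicators.
rng = np.random.default_rng(0)
shape = (8, 8)
u1, u2 = rng.random(shape), rng.random(shape)
f1, f2, f3, f4 = (rng.random(shape) for _ in range(4))

f1_t, f2_t = reduced_indicators(u2, f1, f2, f3, f4)
lhs = data_term_4phase(u1, u2, f1, f2, f3, f4)
rhs = data_term_2phase(u1, f1_t, f2_t)
```

Both values agree up to rounding, which is what allows any 2-phase solver to be reused for the $u_1$ update while $u_2$ is frozen.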

5 Indicator Parameters

In typical segmentation tasks, the indicator functions depend on unknown parameters, e.g. the grey values for each segment in case of the piecewise constant


B. Berkels

Mumford-Shah model. For the sake of simplicity, we discuss the latter model in its 2-phase formulation here, i.e. $f_i(x) := (u_0(x) - c_i)^2$, $i = 1, 2$, but this discussion applies to other indicator functions and multiphase segmentation as well. During the minimization of (5) we have to minimize for $c_1$ and $c_2$ as well. This is typically done in an alternating fashion, but there are two apparent possibilities to update the grey values: minimize (5) with respect to $c_1$ and $c_2$, or do so for the energy in the set formulation (1). The two resulting update formulae for $c_1$ are

$$c_1 = \frac{\int_\Omega u^2 u_0 \, dx}{\int_\Omega u^2 \, dx} \quad \text{or} \quad c_1 = \frac{\int_{\{u > \frac{1}{2}\}} u_0 \, dx}{\int_{\{u > \frac{1}{2}\}} dx}.$$

The two possibilities only coincide if $u$ is binary. The first formula does not merely average $u_0$ in $\{u > \frac{1}{2}\}$; instead it takes into account the values of $u_0$ everywhere, but weights them according to $u^2$. To a certain degree this is similar to the effect of the regularization of the Heaviside function in the model of Chan and Vese. In our experiments, this reduced the chance of getting stuck in local minima, which can still occur when minimizing over $u$ and the indicator parameters. It turned out to be beneficial particularly in the case of multiphase segmentation. Due to the different way $f_1$ and $f_2$ are used in the ROF energy (6), it is not quadratic in $c_1$ and $c_2$. So this functional does not give a natural formula to update the grey values.
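The two update rules for $c_1$ can be compared on a small example. The sketch below is illustrative only: the function names and the sample arrays are assumptions, integrals are replaced by pixel sums on a uniform grid, and both functions assume their denominator is nonzero.

```python
import numpy as np

def update_c1_weighted(u, u0):
    """Minimizer of (5) with respect to c1: average of u0 weighted by u^2."""
    w = u**2
    return np.sum(w * u0) / np.sum(w)

def update_c1_thresholded(u, u0, threshold=0.5):
    """Minimizer of the set formulation (1): mean of u0 on {u > 1/2}."""
    mask = u > threshold
    return u0[mask].mean()

# Hypothetical image and two segmentation functions with the same 0.5-superlevelset.
u0 = np.array([[0.1, 0.2, 0.9],
               [0.0, 0.8, 1.0]])
u_binary = np.array([[1.0, 1.0, 0.0],
                     [1.0, 0.0, 0.0]])
u_soft = np.array([[0.9, 0.7, 0.2],
                   [0.8, 0.3, 0.1]])

c1_weighted_binary = update_c1_weighted(u_binary, u0)
c1_thresholded_binary = update_c1_thresholded(u_binary, u0)
c1_weighted_soft = update_c1_weighted(u_soft, u0)
c1_thresholded_soft = update_c1_thresholded(u_soft, u0)
```

For the binary `u_binary` both rules return the same grey value, while for `u_soft` the weighted rule also picks up contributions from pixels below the threshold.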

6 Numerical Examples

To conclude, we show the practical usability of the proposed model by applying it to the classical piecewise constant Mumford-Shah functional, see equation (2). As minimization method we use an explicit gradient descent scheme with the Armijo rule [14] as timestep size control. The absolute value is regularized by $|z|_\epsilon = \sqrt{z^2 + \epsilon^2}$ (in all examples presented here, $\epsilon = 0.1$ is used). For the spatial discretization, we use bilinear finite elements on a regular quadrilateral grid, i.e. each pixel of the input image $u_0$ corresponds to a node of the finite element mesh. The grey values $c_1$ and $c_2$ are initialized with 0 and 1 respectively and updated occasionally during the gradient descent. Figure 1 shows results of our method and of the one proposed by Nikolova et al. [9] on one artificial image and one digital photo. In both examples, the minimizer $u$ from our model is far from being binary, but a binary minimizer is not to be expected from the theory presented in this paper. The 0.5-superlevelset gives an accurate segmentation that is not influenced by the presence of heavy noise (top row) and works on non-binary input images (bottom row). The minimizers $u$ of the Nikolova et al. model look very different, but the segmentation obtained from the 0.5-superlevelsets is almost identical. Upon closer inspection, the minimizer $u$ of our model from the top row of Figure 1 looks very much like the one obtained by minimizing the ROF energy with


Fig. 1. Segmentation of an artificial noisy structure ($\nu = 2 \cdot 10^{-3}$, top row) and the well-known Matlab cameraman image ($\nu = 4 \cdot 10^{-3}$, bottom row): Input image $u_0$ (left), segmentation function $u$ and 0.5-superlevelset of $u$ colored with the average grey values $c_1$, $c_2$ obtained by our model (middle) and by using $E_{CE}$ (right). The slight difference of the grey values is attributed to the employed update formula, cf. Section 5.

Fig. 2. 4-phase segmentation of an artificial noisy image (top row) and an MRI image (bottom row) ($\nu = 6 \cdot 10^{-4}$): Input image $u_0$ (left), segmentation functions $u_1$ and $u_2$ (middle), segmentation colored with the average grey values $c_1, \dots, c_4$ (right)

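$u_0$ as input image. This is not surprising due to the following observation: if $u_0$ is binary, i.e. $u_0 = \chi_A$ for a set $A \subset \Omega$, and $c_1 = 0$, $c_2 = 1$, we have $f_1 = (\chi_A - 0)^2 = \chi_A$ and $f_2 = (\chi_A - 1)^2 = \chi_{\Omega \setminus A}$ and therefore

$$E[u] = \int_\Omega \left(u - \chi_{\Omega \setminus A}\right)^2 + \nu|\nabla u| \, dx,$$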


Fig. 3. Segmentation of a digital photo ($\nu = 2 \cdot 10^{-5}$). Input image $u_0$ (left), segmentation in four (middle) and eight (right) segments colored with the average grey values of the segments. Original image © bigmama / PIXELIO.

Fig. 4. Intermediate results of the segmentation in eight segments shown in Figure 3 after 50 (left), 250 (middle) and 700 (right) gradient descent steps

i.e. $E$ equals the ROF energy in this special case. This is not the case if $u_0$ is non-binary, which can be seen from the bottom row of Figure 1. Figure 2 shows 4-phase segmentation results. These indicate the tendency of the segmentation functions to become binary for small values of $\nu$. Finally, Figure 3 illustrates the behavior of the method for different numbers of segments and Figure 4 shows three timesteps of the 8-phase segmentation.
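The pointwise identity behind the binary special case is easy to verify. The sketch below assumes that the fidelity term of the 2-phase functional has the form $u^2 f_1 + (1 - u)^2 f_2$, as the reduction of (7) to the 2-phase case suggests; the array names and the random test data are illustrative.

```python
import numpy as np

# Hypothetical binary input image u0 = chi_A and an arbitrary non-binary u.
rng = np.random.default_rng(1)
chi_A = (rng.random((16, 16)) > 0.5).astype(float)
u = rng.random((16, 16))

c1, c2 = 0.0, 1.0
f1 = (chi_A - c1)**2   # equals chi_A, since chi_A only takes the values 0 and 1
f2 = (chi_A - c2)**2   # equals the indicator of the complement of A

# Fidelity term of the 2-phase model versus the ROF-type term (u - chi_{Omega\A})^2.
fidelity_model = u**2 * f1 + (1 - u)**2 * f2
fidelity_rof = (u - (1 - chi_A))**2
```

The two arrays coincide pixel by pixel, so for binary input with $c_1 = 0$, $c_2 = 1$ the two energies differ in nothing but notation.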

References

1. Mumford, D., Shah, J.: Optimal approximation by piecewise smooth functions and associated variational problems. Communications on Pure and Applied Mathematics 42, 577–685 (1989)
2. Vese, L.A., Chan, T.F.: A multiphase level set framework for image segmentation using the Mumford and Shah model. International Journal of Computer Vision 50(3), 271–293 (2002)
3. Chan, T.F., Vese, L.A.: Active contours without edges. IEEE Transactions on Image Processing 10(2), 266–277 (2001)
4. Osher, S.J., Sethian, J.A.: Fronts propagating with curvature dependent speed: Algorithms based on Hamilton–Jacobi formulations. Journal of Computational Physics 79, 12–49 (1988)
5. Shen, J.: Γ-convergence approximation to piecewise constant Mumford-Shah segmentation. In: Blanc-Talon, J., Philips, W., Popescu, D.C., Scheunders, P. (eds.) ACIVS 2005. LNCS, vol. 3708, pp. 499–506. Springer, Heidelberg (2005)
6. Esedoḡlu, S., Tsai, Y.H.R.: Threshold dynamics for the piecewise constant Mumford-Shah functional. Journal of Computational Physics 211(1), 367–384 (2006)
7. Merriman, B., Bence, J.K., Osher, S.J.: Diffusion generated motion by mean curvature. CAM Report 92-18, UCLA (1992)
8. Alvino, C.V., Yezzi, A.J.: Fast Mumford-Shah segmentation using image scale space bases. In: Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, vol. 6498 (2007)
9. Nikolova, M., Esedoḡlu, S., Chan, T.F.: Algorithms for finding global minimizers of image segmentation and denoising models. SIAM Journal on Applied Mathematics 66(5), 1632–1648 (2006)
10. Chambolle, A.: An algorithm for mean curvature motion. Interfaces and Free Boundaries 6, 195–218 (2004)
11. Chambolle, A., Darbon, J.: On total variation minimization and surface evolution using parametric maximum flows. CAM Report 08-19, UCLA (2008)
12. Alter, F., Caselles, V., Chambolle, A.: A characterization of convex calibrable sets in $\mathbb{R}^N$. Mathematische Annalen 332(2), 329–366 (2005)
13. Ambrosio, L., Fusco, N., Pallara, D.: Functions of bounded variation and free discontinuity problems. Oxford Mathematical Monographs. Oxford University Press, New York (2000)
14. Kosmol, P.: Methoden zur numerischen Behandlung nichtlinearer Gleichungen und Optimierungsaufgaben, 2nd edn. Teubner, Stuttgart (1993)
15. Bresson, X., Esedoḡlu, S., Vandergheynst, P., Thiran, J., Osher, S.: Fast global minimization of the active contour/snake model. Journal of Mathematical Imaging and Vision 28(2), 151–167 (2007)
16. Chambolle, A.: An algorithm for total variation minimization and applications. Journal of Mathematical Imaging and Vision 20(1-2), 89–97 (2004)
17. Chambolle, A.: Total variation minimization and a class of binary MRF models. In: Rangarajan, A., Vemuri, B.C., Yuille, A.L. (eds.) EMMCVPR 2005. LNCS, vol. 3757, pp. 136–152. Springer, Heidelberg (2005)
18. Lie, J., Lysaker, M., Tai, X.C.: A binary level set model and some applications to Mumford-Shah image segmentation. IEEE Transactions on Image Processing 15(5), 1171–1181 (2006)
19. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
20. Daubechies, I., Defrise, M., de Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics 57(11), 1413–1457 (2004)

Extraction of the Intercellular Skeleton from 2D Images of Embryogenesis Using Eikonal Equation and Advective Subjective Surface Method

Paul Bourgine (1), Peter Frolkovič (2), Karol Mikula (2), Nadine Peyriéras (3), and Mariana Remešíková (2)

(1) CREA, Ecole Polytechnique-CNRS, 1 rue Descartes, 75005, Paris, France [email protected]
(2) Department of Mathematics, Slovak University of Technology, Radlinského 11, 81368 Bratislava [email protected], [email protected], [email protected]
(3) CNRS-DEPSN, Avenue de la Terasse, 91198, Gif-sur-Yvette, France [email protected]

Abstract. We suggest an efficient method for automatic detection of the intercellular skeleton in microscope images of early embryogenesis. The method is based on the solution of two advective PDEs. First, we numerically solve the time relaxed eikonal equation in order to obtain the signed distance function to a given set, namely a set of points representing cell centers or a set of closed curves representing segmented inner borders of cells. The second step is a segmentation process driven by the advective version of the subjective surface equation, where the velocity field is given by the gradient of the computed distance function. The first equation is discretized by the Rouy-Tourin scheme and we suggest a fixing strategy that significantly improves the speed of the computation. The second equation is solved using a classical upwind strategy. We present several test examples and we show a practical application - the intercellular skeleton extracted from a 2D image of a zebrafish embryo.

1 Introduction

The measure of the cell contact surface (intercellular skeleton) is an important quantitative characteristic of a living organism, especially during its embryonic development [6]. Together with other characteristics, e.g. the volume of the embryo, the global and local density of cells, the density of cell divisions etc., it provides an insight into the process of the evolution of the organism and allows one to detect abnormalities or to compare individuals evolving in different conditions. The intercellular skeleton can be extracted from the microscope images of the evolving embryo. Fig. 1 shows an example of suitable image data. These images display significant cell structures (cell nuclei and cell membranes) of a zebrafish embryo at an early stage of its development and they were obtained by a two-photon confocal microscope.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 38–49, 2009. © Springer-Verlag Berlin Heidelberg 2009

Extraction of the Intercellular Skeleton


The main goal of our paper is to introduce an efficient and easily implementable method for detecting the intercellular skeleton. Our technique is based on the numerical solution of a pair of advective partial differential equations.

The first step is the solution of the time relaxed eikonal equation with a special Dirichlet type condition. The solution of such an equation is the distance function to a given set. This can be a set of points representing cell centers or a set of closed curves representing inner borders of cells obtained by segmentation. When dealing with curves, we construct the signed distance function with negative values in the interior part. We discretize the problem using the explicit Rouy-Tourin scheme and we suggest extending the original scheme by a fixing technique. The idea of fixing is based on the fact that the Rouy-Tourin scheme applied to the time relaxed eikonal equation produces at every point monotonically increasing values approaching the value of the distance function. At some moment, the value reaches a steady state and the point can be excluded from the calculations. This strategy provides a significant improvement of the efficiency of the method and it brings a natural stopping criterion for the computation. We compare the performance of our algorithm with the computation of the exact distance function and we provide some examples of situations when the numerical solution can be obtained faster.

The second step of our procedure is the segmentation using the advective version of the subjective surface equation. For each cell, we construct an initial segmentation function. Afterwards, all level sets of this function evolve according to the velocity field given by the gradient of the computed signed distance function. By taking one of the level sets of the final form of the evolving function, we obtain the part of the required intercellular skeleton corresponding to one particular cell. The complete skeleton is constructed as the union of the results corresponding to individual cells.

Using the distance function corresponding to cell centers, we get a Voronoi type cell skeleton that is already a good approximation of the real one, as the cell formations are naturally similar to Voronoi tiling. A very realistic skeleton localization can be obtained if we consider the distance function to the segmented inner boundaries of the cells, assuming that we have a good quality cell segmentation. For practical purposes, it is even sufficient to perform only a few time steps of the Rouy-Tourin scheme and use a rough estimate of the distance function in order to obtain a correctly oriented velocity field for the segmentation. This makes the method very efficient without loss of the quality of the resulting skeleton. The advective subjective surface equation is discretized by an explicit upwind approach.

The paper is organized as follows. In Sec. 2, we describe the mathematical models for the two substeps of the procedure. Sec. 3 presents the numerical schemes and explains the idea of the fixing algorithm intended to reduce the CPU time needed to compute the numerical solution of the eikonal equation. In Sec. 4, we provide a series of numerical experiments as well as an example of a skeleton extracted from a 2D microscope image of a zebrafish embryo. Let us note that we present a method for solving two-dimensional problems, but the extension to three dimensions is rather straightforward.


P. Bourgine et al.

Fig. 1. 2D slices of a 3D image of a zebrafish embryo. Left, the cell nuclei. Right, the cell membranes.

2 Mathematical Models

The first equation involved in our skeleton extraction strategy is the eikonal equation with time relaxation

$$d_t + |\nabla d| = 1 \qquad (1)$$

solved in the domain $\Omega \times [0, T_D]$, where $\Omega$ is the image domain, and coupled with a Dirichlet type condition

$$d(x, t) = 0, \quad x \in \Omega_0 \subset \Omega. \qquad (2)$$

With the problem formulated in this way, the solution $d$ approaches, as time evolves, the distance function to the set $\Omega_0$. In our case, as we have already mentioned, $\Omega_0$ can be a set of points corresponding to approximate cell centers or a set of closed curves representing the segmented inner boundaries of the cells. The signed distance function can be constructed straightforwardly. The result of the cell shape segmentation is a level set function. Choosing one of the level sets to represent the inner boundary of the cell, we are able to recognize the inner and outer parts of the cell [1, 2] and assign the corresponding sign to the distance function. The distance function corresponding to a set of points is always positive. In the second step, we use the computed signed distance function in the advective part of the subjective surface model [4, 2] and we solve the equation

$$u_t + \nabla g \cdot \nabla u = 0 \qquad (3)$$

where $(x, t) \in \Omega \times [0, T_S]$ and $g(x) = d(x, T_D)$ according to [8] or $g(x) = -1/(1 + K d^p(x, T_D))$ with $K > 0$, $p > 0$ as in [4, 2]. The unknown function $u$ is initialized by a piecewise constant profile localized around the approximate cell center. Then it is evolved by (3). The intercellular borders are represented by a


chosen level set of the function $u(x, T_S)$. Due to the properties of the function $d$ (see Figs. 3 and 6), the border lines of the neighboring cells correspond to the ridges of the distance function and are attached to each other and thus form the intercellular skeleton.

3 Numerical Schemes

3.1 Time Relaxation Method with Fixing for Computing the Distance Function

In order to solve equation (1) with condition (2) numerically, we use an explicit time discretization with time step $\tau_D$. Afterwards, the equation is discretized in space by the Rouy-Tourin scheme [3], cf. also [5, 7]. As is natural for image processing applications, the space grid elements correspond to the pixels of the image. Let us consider a rectangular space domain with dimensions $L_x \times L_y$. The space grid is then uniform and consists of square elements $V_{ij}$, $i = 1, \dots, n_x$, $j = 1, \dots, n_y$, $n_x = L_x/h_D$, $n_y = L_y/h_D$, where $h_D$ is the length of the side of the pixel. For each volume $V_{ij}$, let $d^n_{ij}$ represent the approximate value of the solution $d$ in the center of $V_{ij}$ in time step $n$. Let us define $M^{pq}_{ij}$, $p, q \in \{-1, 0, 1\}$, $|p| + |q| = 1$, as

$$M^{pq}_{ij} = \left(\min\left(d^n_{i+p,j+q} - d^n_{ij},\, 0\right)\right)^2.$$

The Rouy-Tourin scheme for problem (1) then reads as follows:

$$d^{n+1}_{ij} = d^n_{ij} + \tau_D - \frac{\tau_D}{h_D}\sqrt{\max\left(M^{-1,0}_{ij}, M^{1,0}_{ij}\right) + \max\left(M^{0,-1}_{ij}, M^{0,1}_{ij}\right)}. \qquad (4)$$

This scheme is stable for $\tau_D \le h_D/2$, and we take advantage of the fact that it produces monotonically increasing updates that gradually approach a steady state. This property allows us to implement (4) in a computationally efficient way. Let us consider the index set $F^n$ that contains the indices $(i, j)$ of the volumes where the steady state has already been reached, i.e. there exists $n_0 \in \mathbb{N}$, $n_0 \le n$, such that $d^{n_0}_{ij} = d^{n_0 - 1}_{ij}$. The set $F^0$ is given as follows. At the beginning, we compute exact distances to the set $\Omega_0$ (which is a set of points corresponding to cell centers or a set of curves representing the inner boundaries of cells) in a local (one pixel) neighborhood. Then $F^0$ consists of the indices of all volumes with these exact values, including the set $\Omega_0$. Then the method is given by Algorithm 1.
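Scheme (4) combined with the fixing strategy can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `rouy_tourin_fixing` is hypothetical, values outside $F^0$ are initialized with zero, boundary pixels use replicated edge values (the source does not specify the boundary treatment), and the steady state test uses a tolerance `eps` in place of exact equality.

```python
import numpy as np

def rouy_tourin_fixing(d0, fixed0, h=1.0, tau=0.5, eps=1e-5, max_steps=10_000):
    """Iterate scheme (4); pixels whose update is at most eps are fixed and skipped."""
    d = np.asarray(d0, dtype=float).copy()
    fixed = np.asarray(fixed0, dtype=bool).copy()
    steps = 0
    while steps < max_steps and not fixed.all():
        p = np.pad(d, 1, mode="edge")      # assumption: replicated boundary values
        c = p[1:-1, 1:-1]
        m_im = np.minimum(p[:-2, 1:-1] - c, 0.0) ** 2   # M^{-1,0}
        m_ip = np.minimum(p[2:, 1:-1] - c, 0.0) ** 2    # M^{1,0}
        m_jm = np.minimum(p[1:-1, :-2] - c, 0.0) ** 2   # M^{0,-1}
        m_jp = np.minimum(p[1:-1, 2:] - c, 0.0) ** 2    # M^{0,1}
        d_new = d + tau - (tau / h) * np.sqrt(np.maximum(m_im, m_ip)
                                              + np.maximum(m_jm, m_jp))
        d_new[fixed] = d[fixed]                 # fixed pixels are never updated
        fixed |= np.abs(d_new - d) <= eps       # fix pixels that reached steady state
        d = d_new
        steps += 1
    return d, steps

# Hypothetical test case: one source pixel in the middle of a 21 x 21 grid,
# with exact distances prescribed in its one-pixel neighborhood.
n = 21
ii, jj = np.mgrid[0:n, 0:n]
exact = np.hypot(ii - 10, jj - 10)
near = exact < 1.5
d0 = np.where(near, exact, 0.0)
d, steps = rouy_tourin_fixing(d0, near)
```

For clarity the sketch still evaluates the update on the whole grid and only masks the fixed pixels; an implementation aimed at the CPU savings reported below would restrict the computation to the pixels that are not yet fixed.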

3.2 Advective Subjective Surface Method for Detecting the Intercellular Skeleton

Algorithm 1. Fixing method for distance function
• if $(i, j) \in F^n$ then continue
• else
  • $d^{n+1}_{ij} = d^n_{ij} + \tau_D - \frac{\tau_D}{h_D}\sqrt{\max\left(M^{-1,0}_{ij}, M^{1,0}_{ij}\right) + \max\left(M^{0,-1}_{ij}, M^{0,1}_{ij}\right)}$
  • if $d^{n+1}_{ij} = d^n_{ij}$ then $F^{n+1} = F^n \cup \{(i, j)\}$

Now we discretize equation (3). Again, we consider an explicit time discretization with time step $\tau_S$, and the space grid elements are identified with the pixels of the image. The space discretization is based on the upwind principle. If the length of the side of the pixel is denoted by $h_S$ and we define the central differences $D^x_{ij} g = (g_{i+1,j} - g_{i-1,j})/(2h_S)$, $D^y_{ij} g = (g_{i,j+1} - g_{i,j-1})/(2h_S)$, we get the following approximation of (3):

$$u^{n+1}_{ij} = u^n_{ij} - \frac{\tau_S}{h_S}\Big[\max\left(D^x_{ij} g, 0\right)\left(u^n_{ij} - u^n_{i-1,j}\right) + \min\left(D^x_{ij} g, 0\right)\left(u^n_{i+1,j} - u^n_{ij}\right) + \max\left(D^y_{ij} g, 0\right)\left(u^n_{ij} - u^n_{i,j-1}\right) + \min\left(D^y_{ij} g, 0\right)\left(u^n_{i,j+1} - u^n_{ij}\right)\Big]. \qquad (5)$$

As the initial condition we take a shock-like profile localized around the cell center. Due to the properties of the signed distance function computed by the method described in Sec. 3.1, we can see that the advective velocity $\nabla g$ drives all level lines of the initial segmentation function to the ridges of the distance function, cf. Fig. 6. These ridges represent the intercellular skeleton.
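One explicit step of the upwind scheme (5) can be written compactly with array slicing. The sketch below is illustrative: the function name `upwind_step`, the replicated boundary values and the test data are assumptions that are not taken from the source.

```python
import numpy as np

def upwind_step(u, g, h=1.0, tau=0.1):
    """One explicit upwind step for u_t + grad(g) . grad(u) = 0, cf. scheme (5)."""
    gp = np.pad(g, 1, mode="edge")   # assumption: replicated boundary values
    up = np.pad(u, 1, mode="edge")
    # Central differences of g give the advective velocity.
    dxg = (gp[2:, 1:-1] - gp[:-2, 1:-1]) / (2 * h)
    dyg = (gp[1:-1, 2:] - gp[1:-1, :-2]) / (2 * h)
    # One-sided differences of u; the sign of the velocity selects the upwind one.
    back_x, fwd_x = u - up[:-2, 1:-1], up[2:, 1:-1] - u
    back_y, fwd_y = u - up[1:-1, :-2], up[1:-1, 2:] - u
    flux = (np.maximum(dxg, 0) * back_x + np.minimum(dxg, 0) * fwd_x
            + np.maximum(dyg, 0) * back_y + np.minimum(dyg, 0) * fwd_y)
    return u - (tau / h) * flux

# Hypothetical test: g grows linearly along the first axis (unit velocity),
# and u is a step profile that should be transported in that direction.
g = np.tile(np.arange(10.0)[:, None], (1, 10))
u = np.zeros((10, 10))
u[:5, :] = 1.0
u_new = upwind_step(u, g, h=1.0, tau=0.1)
u_const = upwind_step(np.ones((10, 10)), g)
```

With unit velocity and $\tau_S/h_S = 0.1$ each new value is a convex combination of neighboring old values, so the step profile stays within $[0, 1]$ while moving along the gradient of $g$.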

4 Numerical Experiments

4.1 Computation of the Signed Distance Function

Now let us present some computational results obtained by Algorithm 1. We inspected the experimental order of convergence of the suggested method, the CPU time needed for the computation and the effect of the fixing strategy. The results were also compared with the distance function computed analytically. First, let us make a note about the stopping criteria for the methods. If the fixing technique is not applied, the computation is stopped either when $\|d^{n+1} - d^n\|_{L^1(\Omega)} \le \varepsilon_1 \ll 1$ or when a prescribed number of time steps has been performed. If we use the fixing strategy, we stop when $(i, j) \in F^n$ for all $i = 1, \dots, n_x$, $j = 1, \dots, n_y$, i.e. when all values are already fixed. In practice, the condition $d^{n+1}_{ij} = d^n_{ij}$ is replaced by $|d^{n+1}_{ij} - d^n_{ij}| \le \varepsilon_2 \ll 1$. If we want to compare the two methods, we first run the computation with fixing until all points are fixed and then the method with no fixing is prescribed to stop at exactly the same time. Now let us assume that the error of the numerical method in the $L^2(\Omega)$-norm is of the form $E(h) = Ch^\alpha$, where $h$ is the space discretization step. Obtaining experimentally $E(h)$ and $E(h/2)$ (see e.g. Table 1, column 4), we can express $\alpha = \log_2\left(E(h)/E(h/2)\right)$, which is called the experimental order of convergence (EOC). In the first experiment, we computed the distance function to seven given points situated in a square domain $\Omega = [-1, 1] \times [-1, 1]$ and we measured the


$L^2(\Omega)$ error with respect to the exact distance function and the EOC. The results and some details of the computations are displayed in Tables 1, 2 and 3. Table 1 shows the comparison of the method with fixing with the original scheme without any fixing. The value in the seven points was set to 0, and the values in a one pixel neighborhood of the points were set to the values of the exact solution. In the stopping criterion for the fixing method, we set $\varepsilon_2 = 10^{-5}$. Fig. 2 displays the distance function computed by the fixing method using $n_x = n_y = 320$. Comparing the results in the table, we can see the effect of the fixing strategy on the CPU time as well as on the $L^2$ error and EOC. We can observe that in this case the computation is approximately two times faster and that we also get a smaller error when we fix the solution by our algorithm. The fact that the $L^2$ error apparently depends on the value of $\varepsilon_2$ led us to perform the experiment presented in Table 2. We were looking for the optimal choice of $\varepsilon_2$, i.e. the value that would provide the smallest $L^2$ error. We can see that this value is different for different discretization parameters. Finally, in Table 3, we present the EOC for a slightly different implementation of the fixing method. In this case, the solution was initialized with the exact values not only in a one pixel neighborhood, but in a neighborhood whose size was independent of the space step. Again, we were looking for the optimal value of $\varepsilon_2$. We can observe that the EOC is approximately equal to 1.

Table 1. Comparison of results of computational tests for the method without fixing and the method with fixing. Computation of the distance function to seven given points.

| nx | τD | time steps | L2(Ω)-error | CPU | EOC | L2(Ω)-error fixing | CPU fixing | EOC fixing |
|---|---|---|---|---|---|---|---|---|
| 40 | 0.025 | 52 | 3.447525e-2 | 0.01 | | 3.440187e-2 | 0.0 | |
| 80 | 0.0125 | 90 | 2.373985e-2 | 0.05 | 0.53825 | 2.359906e-2 | 0.03 | 0.54376 |
| 160 | 0.00625 | 163 | 1.533921e-2 | 0.4 | 0.63009 | 1.508116e-2 | 0.21 | 0.64598 |
| 320 | 0.003125 | 305 | 9.498908e-3 | 3.1 | 0.69139 | 9.016883e-3 | 1.73 | 0.74205 |
| 640 | 0.0015625 | 583 | 5.704456e-3 | 23.18 | 0.73567 | 4.844598e-3 | 13.08 | 0.89625 |
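The EOC columns of Table 1 follow directly from the error columns through $\alpha = \log_2\left(E(h)/E(h/2)\right)$. The short sketch below recomputes them from the tabulated $L^2(\Omega)$ errors; the helper name `eoc` is an illustrative choice.

```python
import math

def eoc(errors):
    """Experimental order of convergence for errors on successively halved grids."""
    return [math.log2(e_h / e_h2) for e_h, e_h2 in zip(errors, errors[1:])]

# L2(Omega) errors from Table 1 for nx = 40, 80, 160, 320, 640.
errors_no_fixing = [3.447525e-2, 2.373985e-2, 1.533921e-2, 9.498908e-3, 5.704456e-3]
errors_fixing = [3.440187e-2, 2.359906e-2, 1.508116e-2, 9.016883e-3, 4.844598e-3]

eoc_no_fixing = eoc(errors_no_fixing)
eoc_fixing = eoc(errors_fixing)
```

The resulting lists have one entry fewer than the error lists, which is why the first row of each EOC column in the table is empty.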

Table 2. Results of computational tests for the fixing method. Computation of the distance function to seven given points with determination of optimal $\varepsilon_2$.

| nx | τD | time steps | L2(Ω)-error | ε2 | CPU | EOC |
|---|---|---|---|---|---|---|
| 40 | 0.025 | 37 | 1.976552e-2 | 3.4e-3 | 0.0 | |
| 80 | 0.0125 | 73 | 1.373675e-2 | 1.3e-3 | 1.02 | 0.52494 |
| 160 | 0.00625 | 145 | 9.133363e-3 | 4.0e-4 | 0.19 | 0.58882 |
| 320 | 0.003125 | 287 | 5.857149e-3 | 1.3e-4 | 1.62 | 0.64095 |
| 640 | 0.0015625 | 569 | 3.645805e-3 | 4.0e-5 | 12.77 | 0.68396 |
| 1280 | 0.00078125 | 1132 | 2.164545e-3 | 1.1e-5 | 100.09 | 0.75217 |

The next experiment is similar. Instead of seven points, we computed the distance function to four polygons. Table 4 and Fig. 3 show the results.


Table 3. Experimental order of convergence for the fixing method with exact values prescribed in a fixed neighborhood of seven given points and with determination of optimal $\varepsilon_2$

| nx | τD | time steps | L2(Ω)-error | ε2 | CPU | EOC |
|---|---|---|---|---|---|---|
| 40 | 0.025 | 35 | 2.648673e-2 | 6.2e-3 | 0.0 | |
| 80 | 0.0125 | 71 | 1.352430e-2 | 1.5e-3 | 0.02 | 0.96972 |
| 160 | 0.00625 | 140 | 6.928000e-3 | 3.5e-4 | 0.19 | 0.96504 |
| 320 | 0.003125 | 276 | 3.598868e-3 | 9.0e-5 | 1.58 | 0.94490 |
| 640 | 0.0015625 | 545 | 1.795268e-3 | 2.2e-5 | 12.29 | 1.00033 |


Fig. 2. Distance function to seven given points computed by the fast ﬁxing method. Left, the contours of the function. Right, the 3D plot.

Summarizing the results presented in Tables 1, 2 and 3, we can determine the experimental complexity of our algorithm. By careful checking of the CPU times needed for the computations, we can see that the complexity is not higher than $O(N^{3/2})$, where $N$ is the number of unknowns. The next two tests were carried out in order to compare the fixing method with computation of the exact distance function. Again, we used $\varepsilon_2 = 10^{-5}$ for the fixing method. In the first case, a certain number of points was randomly generated in a given 2D domain. Afterwards, the distance function to this set of points was computed both numerically by our fixing algorithm and analytically by finding the nearest point to every pixel. The CPU time was measured for different numbers of points and plotted in the graphs presented in Fig. 4. We can observe that the computational cost of the analytical computation increases with an increasing number of points, while the cost of the numerical computation decreases. For a certain number of points, depending on the size of the image domain, the numerical method becomes more efficient. We can see that in all cases displayed in the figure, this number is at most 1% of the number of image pixels, so it is practically meaningful. Another experiment was performed in order to test the numerical method on data qualitatively similar to segmented cell structures. The result of the cell segmentation is a level set function and the inner borders of the cells are


Table 4. Comparison of computational tests for the method without fixing and the method with fixing. Computation of the distance function to four polygons.

| nx | τD | time steps | L2(Ω)-error | CPU | EOC | L2(Ω)-error fixing | CPU fixing | EOC fixing |
|---|---|---|---|---|---|---|---|---|
| 40 | 0.025 | 36 | 1.122800e-2 | 0.01 | | 1.121228e-2 | 0.01 | |
| 80 | 0.0125 | 61 | 8.266972e-3 | 0.03 | 0.44167 | 8.231755e-3 | 0.01 | 0.44581 |
| 160 | 0.00625 | 108 | 5.606750e-3 | 0.26 | 0.56020 | 5.539007e-3 | 0.11 | 0.57157 |
| 320 | 0.003125 | 203 | 3.599271e-3 | 2.04 | 0.63946 | 3.475747e-3 | 0.96 | 0.67230 |
| 640 | 0.0015625 | 390 | 2.228162e-3 | 15.46 | 0.69185 | 2.041495e-3 | 7.24 | 0.76770 |


Fig. 3. Distance function to four polygons computed by the RMF. Left, the contours of the function with the shapes indicated. Right, the 3D plot.

represented by a chosen isoline of this function. For our test purposes, we used images with isolines in the form of either randomly placed circles or randomly placed rectangles of a random size (Fig. 5). The maximum image intensity (255) was in the centers of these shapes and then it gradually decreased to 0 with increasing distance from the center. In order to compute the distance function to a certain level set numerically, the location of the level set was detected and the solution was initialized by exact values in its one pixel neighborhood. Afterwards, the fixing method was applied. For the exact computation, we used a procedure that constructs a set of points corresponding to the chosen isoline by finding its intersections with the pixel structure. Then, the nearest of these points was found for each pixel. The CPU time for such a procedure is documented in Table 5, column 5. By construction, the points of the isoline are sorted by their coordinates and therefore the computation of the exact distance function can be optimized: we do not have to go through the whole list of points, but the nearest point to a pixel can always be found in a certain neighborhood of the nearest point to the previous pixel. The CPU times for this optimized approach are listed in Table 5, column 6. According to Table 5, we can observe that in all cases considered here, the numerical solution was faster than the optimized analytical computation. We considered the isoline for the value $I = 128$, $h_D = 1.0$, $\tau_D = 0.5$, $\varepsilon_2 = 10^{-5}$, and the dimensions of the image domain were $512 \times 512$ pixels.
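For reference, the unoptimized exact computation (nearest point search for every pixel) can be sketched as follows. This is a brute-force illustration with a hypothetical function name and test points; it is not the sorted-neighborhood optimization described above, and it stores all pixel-to-point distances at once, so it is only suitable for small inputs.

```python
import numpy as np

def exact_distance(points, shape, h=1.0):
    """Distance from every pixel center to the nearest point, by exhaustive search."""
    ii, jj = np.mgrid[0:shape[0], 0:shape[1]]
    pixels = np.stack([ii.ravel(), jj.ravel()], axis=1) * h      # (N, 2) coordinates
    pts = np.asarray(points, dtype=float)                        # (m, 2) coordinates
    diff = pixels[:, None, :] - pts[None, :, :]                  # (N, m, 2) offsets
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1).reshape(shape)

# Hypothetical example: two points on a 5 x 5 grid.
dist = exact_distance([(0, 0), (3, 4)], (5, 5))
```

The work grows like the number of pixels times the number of points, which matches the observation that the cost of the analytical computation increases with the number of points.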



Fig. 4. Comparison of the ﬁxing method (solid line) with computation of exact solution (dashed line)–plot of CPU time depending on the number of randomly distributed points. Left, image domain with 256 × 256 pixels. Middle, 512 × 512 pixels. Right, 1024 × 1024 pixels.

Fig. 5. Isolines (I = 128) of test level set functions simulating cell structures

Table 5. Comparison of the numerical and analytical computation of the distance function to the cell-like structures

| type of level set | time steps | L2(Ω) rel. error | CPU num. | CPU exact | CPU exact opt. |
|---|---|---|---|---|---|
| 256 rectangles | 52 | 3.509519e-2 | 0.91 | 35.56 | 2.87 |
| 1024 rectangles | 33 | 5.813329e-2 | 0.63 | 75.82 | 2.88 |
| 256 circles | 54 | 2.441320e-2 | 0.91 | 34.55 | 3.00 |
| 1024 circles | 34 | 4.153412e-2 | 0.64 | 68.41 | 3.07 |

In the last experiment, we computed numerically the distance function to a set of segmented cells of a zebrafish embryo. In Fig. 6, we show the computed distance function as well as the vector field given by the gradient of this function. We used $h_D = 1.0$, $\tau_D = 0.5$ and $\varepsilon_2 = 10^{-5}$.

4.2 Extraction of the Intercellular Skeleton

In the ﬁrst experiment, we computed the skeleton using the distance function corresponding to approximate cell centers. The result is a Voronoi type skeleton.


Fig. 6. Signed distance function to segmented cells. Top left, segmentation of inner boundaries of cells. Top right, contour plot of the signed distance function. Bottom left, detail of the 3D plot of this distance function. Bottom right, detail of the vector ﬁeld given by ∇g, g(x) = d(x), with recognizable position of the intercellular skeleton.

As we can see in Fig. 7, this skeleton represents a very good approximation of the cell structure in the sense that it can provide a reliable estimate of some quantities, like the area of the cell contact surface. It can be used in cases where no membrane images or only bad quality membrane images are provided and therefore a correct membrane segmentation is not possible. If we have a good quality segmentation of the inner boundaries of the cells, we can detect the intercellular skeleton more precisely using the signed distance function to the segmented objects. An example is shown in Fig. 8. In both experiments, we set $h_D = 1.0$, $\tau_D = 0.5$, $\varepsilon_2 = 10^{-5}$, $h_S = 1.0$, $\tau_S = 0.1$. The automatic segmentation of inner cell borders displayed in Fig. 8 was obtained by the generalized subjective surface method [2] applied to a microscope image of zebrafish cell membranes, similar to Fig. 1.


Fig. 7. Voronoi type skeleton. On the left, the approximate cell centers. On the right, the corresponding Voronoi type skeleton.

Fig. 8. Real skeleton detection. On the left, the cell segmentation. On the right, the corresponding intercellular skeleton.


Fig. 9. Left, the result after 10 steps of computation of the distance function. Right, the skeleton obtained using the corresponding vector field.


Remark 1. In practice, the main factor determining the quality of the detected skeleton is the correct orientation of the vector field generated by the gradient of the distance function. As follows from the character of the problem and from the numerical procedure, the correct orientation of the vector field is obtained as soon as the values in all image pixels have been nontrivially updated by the numerical scheme (4), i.e. before they are definitively fixed. This allows a significant reduction of the computational time without loss of quality in the final result. Fig. 9 (left) shows the result obtained after 10 steps of the numerical computation of the distance function. At this point, the values have been nontrivially updated in a sufficiently large neighborhood of the segmented cells. We can construct the vector field for equation (3) and find the corresponding intercellular skeleton (Fig. 9, right). Comparing Figs. 8 and 9, we can see that the skeleton found in this way is of the same quality as in the case of complete fixing of the values in the whole domain. Note that complete fixing would require 123 time steps.

Acknowledgments The work was supported by the European projects Embryomics and BioEmergences, the grants APVV-RPEU-0004-06, APVV-0351-07, APVV-LPP-0020-07 and the grant of VEGA 1/0269/09.

References

1. Frolkovič, P., Mikula, K., Peyriéras, N., Sarti, A.: A counting number of cells and cell segmentation using advection-diffusion equations. Kybernetika 43(6), 817–829 (2007)
2. Mikula, K., Peyriéras, N., Remešíková, M., Sarti, A.: 3D embryogenesis image segmentation by the generalized subjective surface method using the finite volume technique. In: Proceedings of FVCA5 – 5th International Symposium on Finite Volumes for Complex Applications. Hermes Publ., Paris (2008)
3. Rouy, E., Tourin, A.: Viscosity solutions approach to shape-from-shading. SIAM Journal on Numerical Analysis 29(3), 867–884 (1992)
4. Sarti, A., Malladi, R., Sethian, J.A.: Subjective Surfaces: A Method for Completing Missing Boundaries. Proc. Nat. Acad. Sci. 97(12), 6258–6263 (2000)
5. Sethian, J.A.: Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Materials Science. Cambridge University Press, New York (1999)
6. Tassy, O., Daian, F., Hudson, C., Bertrand, V., Lemaire, P.: A quantitative approach to the study of the cell shapes and interactions during early chordate embryogenesis. Current Biology 16, 345–358 (2006)
7. Zhao, H.-K.: Fast sweeping method for eikonal equations. Mathematics of Computation 74, 603–627 (2005)
8. Zhao, H.-K., Osher, S., Fedkiw, R.: Fast surface reconstruction using the level set method. In: Proc. IEEE Workshop on Variational and Level Set Methods (VLSM 2001), Vancouver, pp. 194–201 (2001)

On Level-Set Type Methods for Recovering Piecewise Constant Solutions of Ill-Posed Problems

Adriano DeCezaro¹, Antonio Leitão², and Xue-Cheng Tai³,⁴

¹ Institute of Mathematics, Statistics and Physics, Federal University of Rio Grande, Av. Italia km 8, 96201-900 Rio Grande, Brazil
[email protected]
² Department of Mathematics, Federal University of St. Catarina, P.O. Box 476, 88040-900 Florianópolis, Brazil
[email protected]
³ Division of Mathematical Science, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore
[email protected]
⁴ Department of Mathematics, University of Bergen, Johannes Brunsgate 12, N-5008 Bergen, Norway

Abstract. We propose a regularization method for solving ill-posed problems, under the assumption that the solutions are piecewise constant functions with unknown level sets and unknown level values. A level set framework is established for the inverse problem and a Tikhonov regularization approach is proposed. Existence of generalized minimizers for the Tikhonov functional is proven. Moreover, we establish convergence and stability results, characterizing our Tikhonov approach as a regularization method. Based on the necessary conditions of optimality for the Tikhonov functional, a level-set type method is derived and implemented numerically for solving an inverse source problem. This allows us to test the quality of the proposed algorithm.

1 Introduction

Several inverse problems of interest consist of identifying an unknown physical quantity $u \in X$, which can be represented by a piecewise constant function over a given bounded domain $\Omega$, from a set of data $y \in Y$, where $X$ and $Y$ are Hilbert spaces. This process is described by the model

$$ F(u) = y, \tag{1} $$

where $F : D(F) \subset X \to Y$ and the data are obtained by indirect measurements of the parameter. Because of this, in practical applications the exact data $y \in Y$ are, in general, not known. Only approximate measured data $y^\delta \in Y$ are given, corrupted by noise of level $\delta > 0$ and satisfying

$$ \| y^\delta - y \|_Y \le \delta. \tag{2} $$

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 50–62, 2009. © Springer-Verlag Berlin Heidelberg 2009


In the case where the unknown function $u$ is a piecewise constant function distinguishing between two given values (i.e., up to an affine transformation, $u$ is a characteristic function), level set approaches were considered in [1, 2, 3, 4, 5, 6]. In this case, since the level values of $u$ are known, one only needs to identify the level sets of $u$, i.e. the inverse problem reduces to a shape identification problem. In the case where $u$ is a piecewise constant function distinguishing between several given values, multiple level set approaches were considered in [6, 7, 8].

If the level values of $u$ are also unknown, the inverse problem becomes harder, since one has to identify both the level sets and the level values of the unknown parameter $u$. In this case, the dimension of the parameter space increases by the number of unknown level values.

Our starting point in this article is the assumption that the parameter $u$ in (1) is a piecewise constant function assuming two unknown values, i.e. $u(x) \in \{c^1, c^2\}$ a.e. in $\Omega \subset \mathbb{R}^d$, $d = 2, 3$. In this case one can assume the existence of an open measurable set $D \subset\subset \Omega$ such that $u(x) = c^1$ for $x \in D$ and $u(x) = c^2$ for $x \in \Omega \setminus D$. We propose a level set approach to represent the unknown parameter $u$. First we introduce the $H^1$-function $\phi$, which acts as a regularization on the parameter space. Then, using the Heaviside projector $H : H^1(\Omega) \to L^\infty(\Omega)$, a solution of (1) can be represented in the form

$$ u = c^1 H(\phi) + c^2 (1 - H(\phi)) =: P(\phi, c^1, c^2). \tag{3} $$

With this notation we have $D = \{x \in \Omega ;\ \phi(x) > 0\}$ and $\Omega \setminus D = \{x \in \Omega ;\ \phi(x) < 0\}$. The level values $c^1, c^2 \in \mathbb{R}$ are unknown and have to be determined as well. As already observed in [3], the Heaviside operator $H$ maps $H^1(\Omega)$ into the set $V := \{\chi_D ;\ D \subset \Omega \text{ measurable},\ \mathcal{H}^{n-1}(\partial D) < \infty\}$, where $\mathcal{H}^{n-1}(S)$ denotes the $(n-1)$-dimensional Hausdorff measure of the set $S$. Therefore, the operator $P$ in (3) maps $H^1(\Omega) \times \mathbb{R}^2$ into the admissible parameter set $U := \{u = q(v, c^1, c^2) ;\ v \in V \text{ and } c^1, c^2 \in \mathbb{R}\}$, where $q : V \times \mathbb{R}^2 \ni (v, c^1, c^2) \mapsto c^1 v + c^2 (1 - v) \in L^\infty(\Omega)$. Using the level set framework introduced above, the inverse problem in (1), with data given as in (2), can be written in the form of the operator equation

$$ F(P(\phi, c^1, c^2)) = y^\delta. \tag{4} $$
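The representation (3) can be illustrated on grid samples. The sketch below evaluates $P(\phi, c^1, c^2)$ pointwise; the one-dimensional sample values, the chosen level values and the function names `heaviside` and `project` are assumptions made only for this illustration.

```python
def heaviside(t):
    """H(t) = 1 for t > 0 and 0 otherwise (the value at t = 0 is a convention)."""
    return 1.0 if t > 0 else 0.0

def project(phi, c1, c2):
    """Pointwise P(phi, c1, c2) = c1*H(phi) + c2*(1 - H(phi)) on grid samples."""
    return [c1 * heaviside(p) + c2 * (1.0 - heaviside(p)) for p in phi]

# Hypothetical samples of a level set function on 7 grid points.
phi = [-0.8, -0.3, 0.2, 0.6, 0.1, -0.4, -0.9]
u = project(phi, c1=2.0, c2=-1.0)
```

The result takes the value `c1` exactly where `phi` is positive and `c2` elsewhere, so the sign pattern of `phi` encodes the set $D$ while the two scalars carry the level values.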

Once an approximate solution $(\phi, c^1, c^2)$ of (4) is obtained, a corresponding solution of (1) can be computed using equation (3). In this article, approximate solutions to (4) are obtained by minimizing the Tikhonov functional

$$ G_\alpha(\phi, c^1, c^2) := \| F(P(\phi, c^1, c^2)) - y^\delta \|_Y^2 + \alpha \Big\{ \beta_1 |H(\phi)|_{BV} + \beta_2 \|\phi - \phi_0\|_{H^1}^2 + \beta_3 \sum_{j=1}^{2} |c^j - c^j_0|^2 \Big\}, \tag{5} $$

based on TV-$H^1$ penalization. Here $\phi_0$ and $c^j_0$ are known reference parameters. This Tikhonov functional extends the ones proposed in [5, 6, 9] (based on TV penalization) and [3, 8] (based on TV-$H^1$ penalization). To motivate the regularization terms in (5), notice that they ensure: i) the boundedness of the level lines of $\phi$ as well as of its $H^1$-norm; ii) the boundedness of $c^j$. These two facts allow us to guarantee existence of (generalized) minimizers of $G_\alpha$ in $L^\infty \cap BV$.
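As a rough illustration of the three penalty terms inside the braces of (5), the following sketch evaluates a discrete analogue on a uniform one-dimensional grid. The restriction to one dimension, the finite-difference approximation of the $H^1$-norm, the sample data and the function name `penalty` are all assumptions of this sketch; in one dimension the BV seminorm of $H(\phi)$ reduces to the number of jumps between 0 and 1.

```python
def penalty(phi, phi0, c, c0, h, beta):
    """Discrete 1-D analogue of the bracket in (5):
    beta1*|H(phi)|_BV + beta2*||phi - phi0||_{H^1}^2 + beta3*sum_j |c^j - c^j_0|^2."""
    b1, b2, b3 = beta
    H = [1.0 if p > 0 else 0.0 for p in phi]
    # Total variation of a 0/1 step function: the number of jumps.
    tv = sum(abs(H[i + 1] - H[i]) for i in range(len(H) - 1))
    d = [p - q for p, q in zip(phi, phi0)]
    # Squared L2 part and squared gradient part of the H^1 norm.
    l2 = h * sum(v * v for v in d)
    grad = sum((d[i + 1] - d[i]) ** 2 for i in range(len(d) - 1)) / h
    levels = sum((a - b) ** 2 for a, b in zip(c, c0))
    return b1 * tv + b2 * (l2 + grad) + b3 * levels

# Hypothetical data: five grid points with spacing h = 0.5.
value = penalty(phi=[-1, -1, 1, 1, -1], phi0=[0, 0, 0, 0, 0],
                c=(2.0, 0.0), c0=(1.0, 0.0), h=0.5, beta=(1.0, 1.0, 1.0))
```

In this example the level set function changes sign twice, so the first term contributes 2, while steep transitions of `phi` are penalized through the gradient part of the second term.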


This article is outlined as follows: In Section 2 we introduce the concept of generalized minimizers for the functional Gα in (5). In Section 3 we derive a convergence analysis for this Tikhonov approach. In Section 4 we introduce stabilized functionals and prove that the corresponding minimizers approximate a minimizer of Gα . Section 5 is devoted to numerical experiments. A level set type method is implemented for solving a two-dimensional inverse potential problem.

2 The Concept of Generalized Minimizers

We shall consider the model problem described in the introduction under the following general assumptions:

(A1) $\Omega \subseteq \mathbb{R}^n$ is bounded and connected, with piecewise $C^1$ boundary $\partial\Omega$.
(A2) The operator $F : D(F) \subset L^1(\Omega) \to Y$ is continuous and Fréchet-differentiable on $D(F)$ with respect to the $L^1(\Omega)$-topology.
(A3) $\varepsilon$, $\alpha$ and $\beta_j$, $j = 1, 2, 3$, denote positive parameters.
(A4) Equation (1) has a solution, i.e. there exist $u \in U$ satisfying $F(u) = y$ and a function $\phi \in H^1(\Omega)$ satisfying $|\nabla\phi| \neq 0$ in a neighborhood of $\{\phi = 0\}$ such that $H(\phi) = z$ for some $z \in V$. Moreover, there exist constant values $c^1, c^2 \in \mathbb{R}$ such that $q(z, c^1, c^2) = u$.

For each $\varepsilon > 0$, we define the operator

$$ P_\varepsilon(\phi, c^1, c^2) := c^1 H_\varepsilon(\phi) + c^2 (1 - H_\varepsilon(\phi)), \tag{6} $$

where $H_\varepsilon$ is the continuous approximation to $H$ given by

$$ H_\varepsilon(t) := \begin{cases} 1 + t/\varepsilon & \text{for } t \in [-\varepsilon, 0], \\ H(t) & \text{for } t \in \mathbb{R} \setminus [-\varepsilon, 0]. \end{cases} $$

In order to guarantee existence of a minimizer of $G_\alpha$ in (5), we adapt the concept of generalized minimizers formulated in [3] to the level-set framework described above.

Definition 1. Let the operators $H$, $P$, $H_\varepsilon$ and $P_\varepsilon$ be defined as above.

a) A vector $(z, \phi, c^1, c^2) \in L^\infty(\Omega) \times H^1(\Omega) \times \mathbb{R}^2$ is called admissible when there exists a sequence $\{\phi_k\}$ of $H^1(\Omega)$-functions satisfying $\lim_{k} \|\phi_k - \phi\|_{L^2} = 0$, and there also exists a sequence $\{\varepsilon_k\} \subset \mathbb{R}^+$ converging to zero such that $\lim_{k} \|H_{\varepsilon_k}(\phi_k) - z\|_{L^1} = 0$.

b) A minimizer of $G_\alpha$ is considered to be any admissible vector $(z, \phi, c^1, c^2)$ minimizing

$$ G_\alpha(z, \phi, c^1, c^2) := \| F(q(z, c^1, c^2)) - y^\delta \|_Y^2 + \alpha R(z, \phi, c^1, c^2) \tag{7} $$

over the set of admissible vectors, where

$$ R(z, \phi, c^1, c^2) = \rho(z, \phi) + \beta_3 \sum_{j=1}^{2} |c^j - c^j_0|^2, \tag{8} $$

$$ \rho(z, \phi) := \inf \liminf_{k \to \infty} \Big\{ \beta_1 |H_{\varepsilon_k}(\phi_k)|_{BV} + \beta_2 \|\phi_k - \phi_0\|_{H^1}^2 \Big\}. \tag{9} $$

The infimum in (9) is taken over all sequences $\{\varepsilon_k\}$ and $\{\phi_k\}$ characterizing $(z, \phi, c^1, c^2)$ as an admissible vector.


c) A generalized minimizer of $G_\alpha(\phi, c^1, c^2)$ is an admissible vector $(z, \phi, c^1, c^2)$ minimizing the functional $G_\alpha$ in (7) on the set of admissible vectors.

2.1 Relevant Properties of Admissible Vectors

First we verify some basic properties of the operators $P_\varepsilon$, $H_\varepsilon$ and $q$ that will be necessary in the subsequent analysis.

Lemma 1. Let $\Omega$ be given as above and $j = 1, 2$.
(i) Let $\{z_k\}_{k\in\mathbb{N}}$ be a bounded sequence in $L^\infty(\Omega)$ converging to some element $z$ in $L^1(\Omega)$ and $\{c^j_k\}_{k\in\mathbb{N}}$ be a sequence of real numbers converging to $c^j$. Then $q(z_k, c^1_k, c^2_k)$ converges to $q(z, c^1, c^2)$ in $L^1(\Omega)$.
(ii) Let $(z, \phi) \in L^1(\Omega) \times H^1(\Omega)$ be such that $H_\varepsilon(\phi) \to z$ in $L^1(\Omega)$ as $\varepsilon \to 0$, and let $c^1, c^2 \in \mathbb{R}$. Then $P_\varepsilon(\phi, c^1, c^2) \to q(z, c^1, c^2)$ in $L^1(\Omega)$ as $\varepsilon \to 0$.
(iii) Given $\varepsilon > 0$, let $\{\phi_k\}_{k\in\mathbb{N}}$ be a sequence in $H^1(\Omega)$ converging to $\phi \in H^1(\Omega)$ in the $L^2$-norm. Then $H_\varepsilon(\phi_k) \to H_\varepsilon(\phi)$ in $L^1(\Omega)$ as $k \to \infty$. Moreover, if $\{c^j_k\}_{k\in\mathbb{N}}$ are sequences of real numbers converging to some $c^j$, then $q(H_\varepsilon(\phi_k), c^1_k, c^2_k) \to q(H_\varepsilon(\phi), c^1, c^2)$ in $L^1(\Omega)$ as $k \to \infty$.

Proof. Since $\Omega$ is assumed to be bounded, we have $L^\infty(\Omega) \subset L^1(\Omega)$. To prove (i), notice that

$$ \begin{aligned} \| q(z_k, c^1_k, c^2_k) - q(z, c^1, c^2) \|_{L^1} &= \int_\Omega | c^1_k z_k + c^2_k (1 - z_k) - c^1 z - c^2 (1 - z) | \, dx \\ &\le \int_\Omega |z_k| \big( |c^1_k - c^1| + |c^2_k - c^2| \big) \, dx + \int_\Omega \big( |c^1| + |c^2| \big) |z_k - z| + |c^2_k - c^2| \, dx \\ &\le |\Omega| \, \|z_k\|_{L^\infty} \big( |c^1_k - c^1| + |c^2_k - c^2| \big) + \big( |c^1| + |c^2| \big) \|z_k - z\|_{L^1} + |\Omega| \, |c^2_k - c^2|, \end{aligned} $$

which converges to zero as $k \to \infty$. Assertion (ii) follows with similar arguments. The first part of assertion (iii) is a direct consequence of the inequality

$$ \| H_\varepsilon(\phi_k) - H_\varepsilon(\phi) \|_{L^1(\Omega)} \le \varepsilon^{-1} \sqrt{\mathrm{meas}(\Omega)} \, \|\phi_k - \phi\|_{L^2(\Omega)}. $$

The second part of assertion (iii) follows by a combination of the inequality above and assertion (i).

Lemma 2. Let $(z_k, \phi_k, c^1_k, c^2_k)$ be a sequence of admissible vectors converging in $L^1(\Omega) \times L^2(\Omega) \times \mathbb{R}^2$ to some $(z, \phi, c^1, c^2)$. Then $(z, \phi, c^1, c^2)$ is also an admissible vector.

Sketch of the proof. In order to prove that $(z, \phi, c^1, c^2)$ is also an admissible vector, one uses an argument of extraction of diagonal subsequences, analogously to [8, Lemma 2].

2.2 Relevant Properties of the Penalization Functional

In the next lemmas we verify two properties of the functional R which are fundamental for the convergence analysis in Section 3. Lemma 3. The functional R in (8) is coercive on the set of admissible vectors.


Sketch of the proof. Let $(z, \phi, c^1, c^2)$ be an admissible vector. From [8, Lemma 4] it follows that

$$ \rho(z, \phi) \ge \beta_1 |z|_{BV} + \beta_2 \|\phi - \phi_0\|_{H^1}^2. \tag{10} $$

Now, from (10) and the definition of $R$ in (8) it follows that

$$ \beta_1 |z|_{BV} + \beta_2 \|\phi - \phi_0\|_{H^1}^2 + \beta_3 \sum_{j=1}^{2} |c^j - c^j_0|^2 \le \rho(z, \phi) + \beta_3 \sum_{j=1}^{2} |c^j - c^j_0|^2 = R(z, \phi, c^1, c^2), $$

concluding the proof.

Lemma 4. The functional $R$ in (8) is weakly lower semi-continuous on the set of admissible vectors, i.e. given a sequence $\{(z_k, \phi_k, c^1_k, c^2_k)\}$ of admissible vectors such that $z_k \to z$ in $L^1(\Omega)$, $\phi_k \rightharpoonup \phi$ in $H^1(\Omega)$ and $c^j_k \to c^j$ in $\mathbb{R}$, for some admissible vector $(z, \phi, c^1, c^2)$, it follows that

$$ R(z, \phi, c^1, c^2) \le \liminf_{k \in \mathbb{N}} R(z_k, \phi_k, c^1_k, c^2_k). $$

Proof. The functional $\rho(z, \phi)$ is weakly lower semi-continuous, cf. [8, Lemma 5]. Moreover, the Euclidean norm in $\mathbb{R}^2$ is also lower semi-continuous. The lemma follows from the fact that the functional $R$ in (8) is a linear combination of lower semi-continuous functionals.

3 Convergence Analysis

First we prove that for any positive parameters $\alpha$, $\beta_j$ the functional $G_\alpha$ in (5) is well posed.

Theorem 1 (Well-Posedness). The functional $G_\alpha$ in (5) attains minimizers on the set of admissible vectors.

Proof. Notice that the set of admissible vectors is not empty, since $(0, 0, 0, 0)$ is admissible. Let $\{(z_k, \phi_k, c^1_k, c^2_k)\}$ be a minimizing sequence for $G_\alpha$, i.e. a sequence of admissible vectors satisfying $G_\alpha(z_k, \phi_k, c^1_k, c^2_k) \to \inf G_\alpha \le G_\alpha(0, 0, 0, 0) < \infty$. Then $\{G_\alpha(z_k, \phi_k, c^1_k, c^2_k)\}$ is a bounded sequence of real numbers. Therefore, $\{(z_k, \phi_k, c^1_k, c^2_k)\}$ is uniformly bounded in $BV \times H^1(\Omega) \times \mathbb{R}^2$. Thus, the Sobolev compact embedding theorem [10] and the Bolzano-Weierstrass theorem guarantee the existence of a subsequence (denoted again by $\{(z_k, \phi_k, c^1_k, c^2_k)\}$) and the existence of $(z, \phi, c^1, c^2) \in L^1(\Omega) \times H^1(\Omega) \times \mathbb{R}^2$ such that $\phi_k \to \phi$ in $L^2(\Omega)$, $\phi_k \rightharpoonup \phi$ in $H^1(\Omega)$, $z_k \to z$ in $L^1(\Omega)$ and $c^j_k \to c^j$ in $\mathbb{R}$. From Lemma 2 we conclude that $(z, \phi, c^1, c^2)$ is an admissible vector. Moreover, from the weak lower semi-continuity of $R$ together with the continuity of $F$ and $q$ we obtain

$$ \begin{aligned} \lim_{k \to \infty} G_\alpha(z_k, \phi_k, c^1_k, c^2_k) &= \lim_{k \to \infty} \Big\{ \| F(q(z_k, c^1_k, c^2_k)) - y^\delta \|_Y^2 + \alpha R(z_k, \phi_k, c^1_k, c^2_k) \Big\} \\ &\ge \| F(q(z, c^1, c^2)) - y^\delta \|_Y^2 + \alpha R(z, \phi, c^1, c^2) = G_\alpha(z, \phi, c^1, c^2), \end{aligned} \tag{11} $$

proving that $(z, \phi, c^1, c^2)$ minimizes $G_\alpha$.


In the next theorems we present the main convergence and stability results. The proofs use classical techniques from the analysis of Tikhonov type regularization methods (see, e.g., [11, 12]).

Theorem 2 (Convergence for exact data). Assume that we have exact data, i.e. $y^\delta = y$, and $\beta_j > 0$, $j = 1, 2, 3$. For every $\alpha > 0$ let $(z_\alpha, \phi_\alpha, c^1_\alpha, c^2_\alpha)$ denote a minimizer of $G_\alpha$ on the set of admissible vectors. Then, for every sequence of positive numbers $\{\alpha_k\}_{k\in\mathbb{N}}$ converging to zero there exists a subsequence, denoted again by $\{\alpha_k\}_{k\in\mathbb{N}}$, such that $(z_{\alpha_k}, \phi_{\alpha_k}, c^1_{\alpha_k}, c^2_{\alpha_k})$ is strongly convergent in $L^1(\Omega) \times L^2(\Omega) \times \mathbb{R}^2$. Moreover, the limit is a solution of (1).

Proof. Let $(z^\dagger, \phi^\dagger, c^{1,\dagger}, c^{2,\dagger})$ be a solution of (1) (its existence is guaranteed by assumption (A4)). Let $\{\alpha_k\}_{k\in\mathbb{N}}$ be a sequence of positive numbers converging to zero. For each $k \in \mathbb{N}$, let $(z_k, \phi_k, c^1_k, c^2_k) := (z_{\alpha_k}, \phi_{\alpha_k}, c^1_{\alpha_k}, c^2_{\alpha_k})$ be a minimizer of $G_{\alpha_k}$. Then, for each $k \in \mathbb{N}$ we have

$$ G_{\alpha_k}(z_k, \phi_k, c^1_k, c^2_k) \le \| F(q(z^\dagger, c^{1,\dagger}, c^{2,\dagger})) - y \|_Y^2 + \alpha_k R(z^\dagger, \phi^\dagger, c^{1,\dagger}, c^{2,\dagger}) = \alpha_k R(z^\dagger, \phi^\dagger, c^{1,\dagger}, c^{2,\dagger}). \tag{12} $$

Since $\alpha_k R(z_k, \phi_k, c^1_k, c^2_k) \le G_{\alpha_k}(z_k, \phi_k, c^1_k, c^2_k)$, it follows from (12) that

$$ R(z_k, \phi_k, c^1_k, c^2_k) \le R(z^\dagger, \phi^\dagger, c^{1,\dagger}, c^{2,\dagger}) < \infty. \tag{13} $$

Moreover, from the assumption on the sequence $\{\alpha_k\}$, it follows that

$$ \lim_{k \to \infty} \alpha_k R(z^\dagger, \phi^\dagger, c^{1,\dagger}, c^{2,\dagger}) = 0. \tag{14} $$

From (13) and Lemma 3 we conclude that the sequences $\{\phi_k\}$, $\{z_k\}$ and $\{c^j_k\}_{j=1,2}$ are bounded in $H^1(\Omega)$, $BV$ and $\mathbb{R}^2$ respectively. Using an argument of extraction of diagonal subsequences (see the proof of Lemma 2) we can guarantee the existence of an admissible vector $(\tilde z, \tilde\phi, \tilde c^1, \tilde c^2)$ such that

$$ (z_k, \phi_k, c^1_k, c^2_k) \to (\tilde z, \tilde\phi, \tilde c^1, \tilde c^2) \quad \text{in } L^1(\Omega) \times L^2(\Omega) \times \mathbb{R}^2. $$

From Lemma 1 (i) it follows that $q(\tilde z, \tilde c^1, \tilde c^2) = \lim_{k \to \infty} q(z_k, c^1_k, c^2_k)$ in $L^1(\Omega)$. Using the continuity of the operator $F$ together with (12) and (14) we conclude that

$$ y = \lim_{k \to \infty} F(q(z_k, c^1_k, c^2_k)) = F(q(\tilde z, \tilde c^1, \tilde c^2)). $$

On the other hand, from the lower semi-continuity of $R$ and (13) it follows that

$$ R(\tilde z, \tilde\phi, \tilde c^1, \tilde c^2) \le \liminf_{k \to \infty} R(z_k, \phi_k, c^1_k, c^2_k) \le \limsup_{k \to \infty} R(z_k, \phi_k, c^1_k, c^2_k) \le R(z^\dagger, \phi^\dagger, c^{1,\dagger}, c^{2,\dagger}), $$

concluding the proof.


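Theorem 3 (Convergence for noisy data). Let $\alpha = \alpha(\delta)$ be a function satisfying $\lim_{\delta \to 0} \alpha(\delta) = 0$ and $\lim_{\delta \to 0} \delta^2 \alpha(\delta)^{-1} = 0$. Moreover, let $\{\delta_k\}_{k\in\mathbb{N}}$ be a sequence of positive numbers converging to zero and $y^{\delta_k} \in Y$ be corresponding noisy data satisfying (2). Then, there exist a subsequence, denoted again by $\{\delta_k\}$, and a sequence $\{\alpha_k := \alpha(\delta_k)\}$ such that $(z_{\alpha_k}, \phi_{\alpha_k}, c^1_{\alpha_k}, c^2_{\alpha_k})$ converges in $L^1(\Omega) \times L^2(\Omega) \times \mathbb{R}^2$ to a solution of (1).

Proof. Let $(z^\dagger, \phi^\dagger, c^{1,\dagger}, c^{2,\dagger})$ be a solution of (1).¹ For each $k \in \mathbb{N}$, denote by $(z_k, \phi_k, c^1_k, c^2_k) := (z_{\alpha(\delta_k)}, \phi_{\alpha(\delta_k)}, c^1_{\alpha(\delta_k)}, c^2_{\alpha(\delta_k)})$ a minimizer of $G_{\alpha(\delta_k)}$. Then, for each $k \in \mathbb{N}$ we have

$$ G_{\alpha_k}(z_k, \phi_k, c^1_k, c^2_k) \le \| F(q(z^\dagger, c^{1,\dagger}, c^{2,\dagger})) - y^{\delta_k} \|_Y^2 + \alpha(\delta_k) R(z^\dagger, \phi^\dagger, c^{1,\dagger}, c^{2,\dagger}) \le \delta_k^2 + \alpha(\delta_k) R(z^\dagger, \phi^\dagger, c^{1,\dagger}, c^{2,\dagger}). \tag{15} $$

Taking the limit $k \to \infty$ in (15), it follows from the assumptions of the theorem that

$$ \lim_{k \to \infty} \| F(q(z_k, c^1_k, c^2_k)) - y^{\delta_k} \|_Y^2 \le \lim_{k \to \infty} G_{\alpha_k}(z_k, \phi_k, c^1_k, c^2_k) = 0. $$

Therefore, $\lim_{k \to \infty} F(q(z_k, c^1_k, c^2_k)) = y$. Moreover, from (15) and the definition of $G_{\alpha_k}$, it follows that

$$ R(z_k, \phi_k, c^1_k, c^2_k) \le \delta_k^2 \alpha(\delta_k)^{-1} + R(z^\dagger, \phi^\dagger, c^{1,\dagger}, c^{2,\dagger}). $$

Thus, from the assumptions on the function $\alpha(\delta_k)$, we conclude that

$$ \limsup_{k \to \infty} R(z_k, \phi_k, c^1_k, c^2_k) \le R(z^\dagger, \phi^\dagger, c^{1,\dagger}, c^{2,\dagger}). $$

The proof follows arguing as in the proof of Lemma 2.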
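The two conditions on the parameter choice in Theorem 3 can be checked on a simple candidate rule. The sketch below uses the power rule $\alpha(\delta) = \delta^p$ with $0 < p < 2$, which is only one hypothetical choice and is not prescribed by the text; the function name `alpha_rule` and the sample noise levels are likewise assumptions.

```python
def alpha_rule(delta, p=1.0):
    """Hypothetical a priori rule alpha(delta) = delta**p with 0 < p < 2."""
    return delta ** p

# Decreasing noise levels and the two quantities required to vanish.
deltas = [1e-1, 1e-2, 1e-3, 1e-4]
alphas = [alpha_rule(d) for d in deltas]
ratios = [d ** 2 / alpha_rule(d) for d in deltas]
```

For this rule both `alphas` and `ratios` decrease towards zero as the noise level decreases, since $\delta^2 / \alpha(\delta) = \delta^{2-p}$. A choice with $p \ge 2$ would violate the second condition.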

4 Numerical Solution

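In the sequel we introduce a functional which can be handled numerically, and whose minimizers are 'close' to the minimizers of $G_\alpha$. Let $G_{\varepsilon,\alpha}$ be the stabilized functional defined by

$$ G_{\varepsilon,\alpha}(\phi, c^1, c^2) := \| F(P_\varepsilon(\phi, c^1, c^2)) - y^\delta \|_Y^2 + \alpha \Big\{ \beta_1 |H_\varepsilon(\phi)|_{BV} + \beta_2 \|\phi - \phi_0\|_{H^1}^2 + \beta_3 \sum_{j=1}^{2} |c^j - c^j_0|^2 \Big\}, \tag{16} $$

where $P_\varepsilon(\phi, c^1, c^2) := q(H_\varepsilon(\phi), c^1, c^2)$ is the operator defined in (6). The functional $G_{\varepsilon,\alpha}$ is well-posed, as the following lemma shows.

Lemma 5. Given positive constants $\alpha$, $\varepsilon$, $\beta_j$, $j = 1, 2, 3$, as above, a function $\phi_0 \in H^1(\Omega)$ and $c^j_0 \in \mathbb{R}$, $j = 1, 2$, the functional $G_{\varepsilon,\alpha}$ in (16) attains a minimizer on $H^1(\Omega) \times \mathbb{R}^2$.

Proof. Since $\inf\{G_{\varepsilon,\alpha}(\phi, c^1, c^2) : (\phi, c^1, c^2) \in H^1(\Omega) \times \mathbb{R}^2\} \le G_{\varepsilon,\alpha}(0, 0, 0) < \infty$, there exists a minimizing sequence $\{(\phi_k, c^1_k, c^2_k)\}$ in $H^1(\Omega) \times \mathbb{R}^2$ satisfying

$$ \lim_{k \to \infty} G_{\varepsilon,\alpha}(\phi_k, c^1_k, c^2_k) = \inf\{G_{\varepsilon,\alpha}(\phi, c^1, c^2) : (\phi, c^1, c^2) \in H^1(\Omega) \times \mathbb{R}^2\}. $$

¹ The existence of solutions is guaranteed by (A4).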


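Then, for fixed $\alpha > 0$, the sequences $\{\phi_k\}$ and $\{c^j_k\}_{j=1,2}$ are bounded in $H^1(\Omega)$ and $\mathbb{R}^2$ respectively. Therefore, passing to subsequences if necessary, $\phi_k \rightharpoonup \phi$ in $H^1(\Omega)$ and $c^j_k \to c^j$ in $\mathbb{R}$, $j = 1, 2$. Moreover, by the weak lower semi-continuity of the $H^1$-norm and the continuity of the Euclidean norm in $\mathbb{R}$, it follows that

$$ \|\phi - \phi_0\|_{H^1}^2 \le \liminf_{k \to \infty} \|\phi_k - \phi_0\|_{H^1}^2 \quad \text{and} \quad |c^j - c^j_0| \le \liminf_{k \to \infty} |c^j_k - c^j_0|. $$

From the Sobolev compact embedding theorem [13] we have $\phi_k \to \phi$ in $L^2(\Omega)$. Therefore, Lemma 1 implies

$$ \| H_\varepsilon(\phi_k) - H_\varepsilon(\phi) \|_{L^1} \le \varepsilon^{-1} \sqrt{\mathrm{meas}(\Omega)} \, \|\phi_k - \phi\|_{L^2} \to 0, $$

$$ \| P_\varepsilon(\phi_k, c^1_k, c^2_k) - P_\varepsilon(\phi, c^1, c^2) \|_{L^1} = \| q(H_\varepsilon(\phi_k), c^1_k, c^2_k) - q(H_\varepsilon(\phi), c^1, c^2) \|_{L^1} \to 0. $$

Thus, it follows from [10, Theorem 1, pg 172] that $|H_\varepsilon(\phi)|_{BV} \le \liminf_{k \to \infty} |H_\varepsilon(\phi_k)|_{BV}$. Now, from the continuity of $F$ and $q$, together with the estimates above, we obtain

$$ \begin{aligned} G_{\varepsilon,\alpha}(\phi, c^1, c^2) &\le \lim_{k \to \infty} \| F(P_\varepsilon(\phi_k, c^1_k, c^2_k)) - y^\delta \|_Y^2 + \alpha \Big\{ \beta_1 \liminf_{k \to \infty} |H_\varepsilon(\phi_k)|_{BV} + \beta_2 \liminf_{k \to \infty} \|\phi_k - \phi_0\|_{H^1}^2 + \beta_3 \liminf_{k \to \infty} \sum_{j=1}^{2} |c^j_k - c^j_0|^2 \Big\} \\ &\le \liminf_{k \to \infty} G_{\varepsilon,\alpha}(\phi_k, c^1_k, c^2_k) = \inf G_{\varepsilon,\alpha}, \end{aligned} $$

concluding the proof.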
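The estimate from Lemma 1 used in the proof above can be checked numerically on grid functions. The following sketch implements $H_\varepsilon$ and compares the discrete $L^1$ distance of $H_\varepsilon(\phi_k)$ and $H_\varepsilon(\phi)$ with the bound $\varepsilon^{-1}\sqrt{\mathrm{meas}(\Omega)}\,\|\phi_k - \phi\|_{L^2}$, which follows from the Lipschitz constant $1/\varepsilon$ of $H_\varepsilon$ and the Cauchy-Schwarz inequality. The grid size, the random test functions and the helper names are assumptions of this sketch.

```python
import math
import random

def heaviside_eps(t, eps):
    """H_eps(t): 0 for t < -eps, 1 + t/eps on [-eps, 0], 1 for t > 0."""
    if t < -eps:
        return 0.0
    if t <= 0.0:
        return 1.0 + t / eps
    return 1.0

def l1_norm(values, cell):
    """Discrete L1 norm: cell area times the sum of absolute values."""
    return cell * sum(abs(v) for v in values)

def l2_norm(values, cell):
    """Discrete L2 norm: square root of cell area times the sum of squares."""
    return math.sqrt(cell * sum(v * v for v in values))

random.seed(0)
n = 32                      # hypothetical grid: n x n cells on Omega = (0,1)^2
cell = 1.0 / (n * n)        # area of one cell, so meas(Omega) = 1
eps = 0.1
phi = [random.uniform(-1.0, 1.0) for _ in range(n * n)]
phi_k = [p + random.uniform(-0.05, 0.05) for p in phi]

lhs = l1_norm([heaviside_eps(a, eps) - heaviside_eps(b, eps)
               for a, b in zip(phi_k, phi)], cell)
rhs = (1.0 / eps) * math.sqrt(1.0) * l2_norm(
    [a - b for a, b in zip(phi_k, phi)], cell)
```

The bound degenerates as `eps` tends to zero, which reflects the fact that the limit operator $H$ is discontinuous and motivates working with the stabilized functional for fixed $\varepsilon > 0$.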

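In the sequel we prove that, when $\varepsilon \to 0$, the minimizers of $G_{\varepsilon,\alpha}$ approximate a minimizer of the functional $G_\alpha$.

Theorem 4. Let $\alpha$ and $\beta_j$ be given as above. For each $\varepsilon > 0$, denote by $(\phi_{\varepsilon,\alpha}, c^1_{\varepsilon,\alpha}, c^2_{\varepsilon,\alpha})$ a minimizer of $G_{\varepsilon,\alpha}$. There exists a sequence of positive numbers $\varepsilon_k \to 0$ such that $(H_{\varepsilon_k}(\phi_{\varepsilon_k,\alpha}), \phi_{\varepsilon_k,\alpha}, c^1_{\varepsilon_k,\alpha}, c^2_{\varepsilon_k,\alpha})$ converges strongly in $L^1(\Omega) \times L^2(\Omega) \times \mathbb{R}^2$ and the limit minimizes $G_\alpha$ on the set of admissible vectors.

Proof. The functional $G_\alpha$ attains a generalized minimizer $(z_\alpha, \phi_\alpha, c^1_\alpha, c^2_\alpha)$ on the set of admissible vectors (cf. Theorem 1). From Definition 1, there exist a sequence $\{\varepsilon_k\}$ of positive numbers converging to zero and a corresponding sequence $\{\phi_k\}$ in $H^1(\Omega)$ satisfying $\phi_k \to \phi_\alpha$ in $L^2(\Omega)$ and $H_{\varepsilon_k}(\phi_k) \to z_\alpha$ in $L^1(\Omega)$. Moreover, we can further assume [8, Lemma 3] that

$$ R(z_\alpha, \phi_\alpha, c^1_\alpha, c^2_\alpha) = \lim_{k \to \infty} \Big\{ \beta_1 |H_{\varepsilon_k}(\phi_k)|_{BV} + \beta_2 \|\phi_k - \phi_0\|_{H^1}^2 + \beta_3 \sum_{j=1}^{2} |c^j_k - c^j_0|^2 \Big\}. $$

Let $(\phi_{\varepsilon_k}, c^1_{\varepsilon_k}, c^2_{\varepsilon_k})$ be a minimizer of $G_{\varepsilon_k,\alpha}$. The sequences $\{\phi_{\varepsilon_k}\}$, $\{H_{\varepsilon_k}(\phi_{\varepsilon_k})\}$ and $\{c^j_k\}_{j=1,2}$ are uniformly bounded in $H^1(\Omega)$, $BV(\Omega)$ and $\mathbb{R}^2$ respectively. By the compact Sobolev embedding theorem [13], the compact embedding of $BV$ into $L^1$ [10] and the Bolzano-Weierstrass theorem, there exist convergent subsequences whose limits are denoted by $\tilde\phi$, $\tilde z$ and $\tilde c^j$. Summarizing, we have $\phi_{\varepsilon_k} \to \tilde\phi$ in $L^2(\Omega)$, $H_{\varepsilon_k}(\phi_{\varepsilon_k}) \to \tilde z$ in $L^1(\Omega)$, and $c^j_k \to \tilde c^j$ in $\mathbb{R}$, $j = 1, 2$. Thus, $(\tilde z, \tilde\phi, \tilde c^1, \tilde c^2) \in L^1(\Omega) \times H^1(\Omega) \times \mathbb{R}^2$ is an admissible vector (cf. Lemma 2).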


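From the definition of $R$, Lemma 1 and the continuity of $F$, it follows that

$$ \| F(q(\tilde z, \tilde c^1, \tilde c^2)) - y^\delta \|_Y^2 = \lim_{k \to \infty} \| F(P_{\varepsilon_k}(\phi_{\varepsilon_k}, c^1_{\varepsilon_k}, c^2_{\varepsilon_k})) - y^\delta \|_Y^2, $$

$$ R(\tilde z, \tilde\phi, \tilde c^1, \tilde c^2) \le \liminf_{k \to \infty} \Big\{ \beta_1 |H_{\varepsilon_k}(\phi_{\varepsilon_k})|_{BV} + \beta_2 \|\phi_{\varepsilon_k} - \phi_0\|_{H^1}^2 + \beta_3 \sum_{j=1}^{2} |c^j_{\varepsilon_k} - c^j_0|^2 \Big\}. $$

Therefore,

$$ \begin{aligned} G_\alpha(\tilde z, \tilde\phi, \tilde c^1, \tilde c^2) &= \| F(q(\tilde z, \tilde c^1, \tilde c^2)) - y^\delta \|_Y^2 + \alpha R(\tilde z, \tilde\phi, \tilde c^1, \tilde c^2) \\ &\le \liminf_{k \to \infty} G_{\varepsilon_k,\alpha}(\phi_{\varepsilon_k}, c^1_{\varepsilon_k}, c^2_{\varepsilon_k}) \le \liminf_{k \to \infty} G_{\varepsilon_k,\alpha}(\phi_k, c^1_k, c^2_k) \\ &\le \limsup_{k \to \infty} \| F(P_{\varepsilon_k}(\phi_k, c^1_k, c^2_k)) - y^\delta \|_Y^2 + \alpha \limsup_{k \to \infty} \Big\{ \beta_1 |H_{\varepsilon_k}(\phi_k)|_{BV} + \beta_2 \|\phi_k - \phi_0\|_{H^1}^2 + \beta_3 \sum_{j=1}^{2} |c^j_k - c^j_0|^2 \Big\} \\ &= \| F(q(z_\alpha, c^1_\alpha, c^2_\alpha)) - y^\delta \|_Y^2 + \alpha R(z_\alpha, \phi_\alpha, c^1_\alpha, c^2_\alpha) = G_\alpha(z_\alpha, \phi_\alpha, c^1_\alpha, c^2_\alpha), \end{aligned} $$

characterizing $(\tilde z, \tilde\phi, \tilde c^1, \tilde c^2)$ as a minimizer of $G_\alpha$.

4.1 Optimality Conditions for the Stabilized Functional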

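For numerical purposes it is convenient to derive first order optimality conditions for minimizers of the stabilized functionals $G_{\varepsilon,\alpha}$. Therefore, we consider $G_{\varepsilon,\alpha}$ in (16) with $Y = L^2(\Omega)$ and we look for the Gâteaux directional derivatives with respect to $\phi$ and the unknown constants $c^j$ for $j = 1, 2$. Since $H'_\varepsilon(\phi)$ is self-adjoint, we can write the optimality conditions for the functional $G_{\varepsilon,\alpha}$ in the form of the system

$$ \alpha (\Delta - I)(\phi - \phi_0) = L_{\varepsilon,\alpha,\beta}(\phi, c^1, c^2) \ \text{in } \Omega; \qquad \nabla(\phi - \phi_0) \cdot \nu = 0 \ \text{at } \partial\Omega, \tag{17a} $$

$$ \alpha (c^j - c^j_0) = L^j_{\varepsilon,\alpha,\beta}(\phi, c^1, c^2), \quad j = 1, 2. \tag{17b} $$

Here $\nu(x)$ is the external unit normal vector at $x \in \partial\Omega$, $\bar\beta := (2\beta_3)^{-1}$, and

$$ L_{\varepsilon,\alpha,\beta}(\phi, c^1, c^2) = (c^1 - c^2) \beta_2^{-1} H'_\varepsilon(\phi)^* F'(P_\varepsilon(\phi, c^1, c^2))^* \big( F(P_\varepsilon(\phi, c^1, c^2)) - y^\delta \big) - \beta_1 (2\beta_2)^{-1} H'_\varepsilon(\phi) \, \nabla \cdot \Big[ \frac{\nabla H_\varepsilon(\phi)}{|\nabla H_\varepsilon(\phi)|} \Big], \tag{18a} $$

$$ L^1_{\varepsilon,\alpha,\beta}(\phi, c^1, c^2) = \bar\beta \, \big( F'(P_\varepsilon(\phi, c^1, c^2)) H_\varepsilon(\phi) \big)^* \big( F(P_\varepsilon(\phi, c^1, c^2)) - y^\delta \big), \tag{18b} $$

$$ L^2_{\varepsilon,\alpha,\beta}(\phi, c^1, c^2) = \bar\beta \, \big( F'(P_\varepsilon(\phi, c^1, c^2)) (1 - H_\varepsilon(\phi)) \big)^* \big( F(P_\varepsilon(\phi, c^1, c^2)) - y^\delta \big). \tag{18c} $$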
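One way to read the system (17) as an iteration, consistent with the explicit fixed point procedure listed in Table 1, is sketched below. It rests on an assumption that is not stated explicitly in the text: at step $k$ the reference parameters $(\phi_0, c^j_0)$ are replaced by the current iterate $(\phi_k, c^j_k)$, and the right-hand sides are evaluated at that iterate.

```latex
\begin{align*}
  % (17a) with \phi_0 := \phi_k and the unknown \phi := \phi_{k+1}:
  \alpha(\Delta - I)(\phi_{k+1} - \phi_k)
    &= L_{\varepsilon,\alpha,\beta}(\phi_k, c^1_k, c^2_k) \ \text{in } \Omega,
    & \nabla(\phi_{k+1} - \phi_k)\cdot\nu &= 0 \ \text{at } \partial\Omega, \\
  % (17b) with c^j_0 := c^j_k and the unknown c^j := c^j_{k+1}:
  \alpha\,(c^j_{k+1} - c^j_k)
    &= L^j_{\varepsilon,\alpha,\beta}(\phi_k, c^1_k, c^2_k),
    & j &= 1, 2.
\end{align*}
```

Writing $v_k := \alpha(\phi_{k+1} - \phi_k)$, the first line becomes the Neumann problem $(\Delta - I) v_k = L_{\varepsilon,\alpha,\beta}(\phi_k, c^1_k, c^2_k)$ with $(v_k)_\nu = 0$, and the updates read $\phi_{k+1} = \phi_k + \alpha^{-1} v_k$ and $c^j_{k+1} = c^j_k + \alpha^{-1} L^j_{\varepsilon,\alpha,\beta}(\phi_k, c^1_k, c^2_k)$, which are steps 4 and 5 of Table 1.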

5 Numerical Results

In this section a level-set type method based on the system of optimality conditions (17) is used for solving an inverse potential problem of recovering a piecewise constant function u : Ω → {c1 , c2 }, from measurements of the Cauchy data of its corresponding potential on the boundary of the domain Ω = (0, 1) × (0, 1). Notice that no knowledge of the image of u (values c1 , c2 ∈ R) is assumed.

5.1 The Inverse Potential Problem

To describe the direct problem, we define the operator $F : L^2(\Omega) \to L^2(\partial\Omega)$ by $F : u(x) \mapsto F(u) := w_\nu|_{\partial\Omega}$, where $u$ is a piecewise constant function in $L^2(\Omega)$ with $u(x) \in \{c^1, c^2\}$ a.e. in $\Omega$, and $w \in H^1(\Omega)$ solves the elliptic boundary value problem

$$ \Delta w = u \ \text{in } \Omega; \qquad w = 0 \ \text{at } \partial\Omega. \tag{19} $$

Since $u \in L^2(\Omega)$, the Dirichlet boundary value problem in (19) has a unique solution, namely the potential $w \in H^2(\Omega) \cap H^1_0(\Omega)$. The inverse problem we are concerned with consists in determining the piecewise constant source function $u$ from measurements of the Neumann trace of $w$ at $\partial\Omega$, i.e. from $w_\nu|_{\partial\Omega}$. Using the above notation, the inverse potential problem can be written in the abbreviated form $F(u) = y^\delta$, where the data $y^\delta$ have the same meaning as in (2).

Other inverse problems for the operator $F$ were considered in [3, 8]. In [3] a level set method was used for recovering the indicator function $u = \chi_D$ of a star-shaped domain $D \subset \mathbb{R}^2$. In [8] a multiple level set method was used for recovering a simple function $u : \Omega \to \{c^1, \ldots, c^4\}$. In both cases, knowledge of the (finite) image of $u$ was assumed.

5.2 A Level-Set Algorithm for the Inverse Potential Problem

In the sequel we describe the level set regularization algorithm. This method is comparable to the level set method proposed in [8]. The complexity of our algorithm is as follows: at each iteration of the level set method, four elliptic boundary value problems (BVPs) are solved (two of Dirichlet type and two of Neumann type). In Table 1 an explicit fixed point procedure for solving the optimality conditions (17)–(18) is outlined. In the first step the residual $r_k \in L^2(\partial\Omega)$ of the iterate $(\phi_k, c^1_k, c^2_k)$ is evaluated. This corresponds to solving one elliptic BVP of Dirichlet type. In the second step the solution $h_k \in H^1(\Omega)$ of the adjoint problem for the residual is evaluated. This corresponds to solving one elliptic BVP of Dirichlet type. In the fourth step, the velocity function $v_k \in H^1(\Omega)$ for the level-set function is evaluated. This corresponds to solving an elliptic BVP of Neumann type. In the subsequent numerical experiments this algorithm was implemented using a finite element method for the solution of partial differential equations.

5.3 Numerical Experiment

In our experiment we consider the inverse problem of reconstructing the right-hand side $u$ in (19) from the knowledge of a single pair of Cauchy data $(0, y^\delta)$ at $\partial\Omega$. We further assume that the level value $c^2 = 0$ is given, and that we have to identify only the support of $u$ and the level value $c^1 \in \mathbb{R}^+$. The data $y^\delta = y = F(u)$ for solving the inverse problem are known exactly, i.e. $\delta = 0$, and are obtained by numerically solving the elliptic boundary value problem in (19) (the word 'exactly' here means: up to the precision of the numerical method used for solving the direct problem). For the direct problem we use the values $c^1 = 1$, $c^2 = 0$ to compute the exact solution. In the computation of the inverse problem, the exact solution is known a priori to assume the values $\{c^1, 0\}$ (with unknown $c^1$). Moreover, when the data are given exactly, the iterative level-set method is implemented without the additional regularization term $|H_\varepsilon(\phi)|_{BV}$, i.e. $\beta_1 = 0$.

Table 1. Level set algorithm for the inverse potential problem

1. Evaluate the residual $r_k := F(P_\varepsilon(\phi_k, c^1_k, c^2_k)) - y^\delta = (w_k)_\nu|_{\partial\Omega} - y^\delta$, where $w_k$ solves
   $\Delta w_k = P_\varepsilon(\phi_k, c^1_k, c^2_k)$ in $\Omega$; $w_k = 0$ at $\partial\Omega$.
2. Evaluate $h_k := F'(P_\varepsilon(\phi_k, c^1_k, c^2_k))^*(r_k) \in L^2(\Omega)$, solving
   $\Delta h_k = 0$ in $\Omega$; $h_k = r_k$ at $\partial\Omega$.
3. Calculate $L_{\varepsilon,\alpha,\beta}(\phi_k, c^1_k, c^2_k)$ and $L^j_{\varepsilon,\alpha,\beta}(\phi_k, c^1_k, c^2_k)$, $j = 1, 2$, as in (18).
4. Evaluate the velocity $v_k \in H^1(\Omega)$, solving
   $(\Delta - I) v_k = L_{\varepsilon,\alpha,\beta}(\phi_k, c^1_k, c^2_k)$ in $\Omega$; $(v_k)_\nu = 0$ at $\partial\Omega$.
5. Update the level set function $\phi_k$ and the level values $c^j_k$, $j = 1, 2$:
   $\phi_{k+1} = \phi_k + \frac{1}{\alpha} v_k$, $\quad c^j_{k+1} = c^j_k + \frac{1}{\alpha} L^j_{\varepsilon,\alpha,\beta}(\phi_k, c^1_k, c^2_k)$.

The solution $u$ of the inverse problem as well as the initial guess $P_\varepsilon(\phi_0, c^1_0)$ for the level-set method are shown in Figure 1. Notice that the support of $u$ corresponds to a non-connected proper subset of $\Omega$. The initial guess $c^1_0 = 1.5$ is used for the unknown level value. In Figure 2 the evolution of the level set method for the first 1500 iterative steps is presented. As one can see in this figure, the shapes of both inclusions are reasonably reconstructed, and the level value $c^1$ is accurately reconstructed as well. The iteration is stopped when the residual drops below the predefined precision $\| F(P_\varepsilon(\phi_k, c^1_k)) - y \|_{L^2} < 10^{-2}$.
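Step 1 of Table 1 requires one evaluation of the forward operator $F$, i.e. one solve of the Dirichlet problem (19) followed by the computation of the Neumann trace. The sketch below illustrates this on a uniform grid with a five-point finite-difference stencil and Gauss-Seidel sweeps. This is a substitute chosen for brevity; the experiments in the text use a finite element method. The grid size, the number of sweeps, the location of the inclusion and the function names are assumptions, and only the trace on the edge $y = 0$ is computed (the other three edges are analogous).

```python
import math

def solve_poisson(u, n, sweeps=1000):
    """Gauss-Seidel sweeps for the 5-point discretisation of
    Laplace(w) = u in (0,1)^2 with w = 0 on the boundary."""
    h = 1.0 / n
    w = [[0.0] * (n + 1) for _ in range(n + 1)]   # w[j][i] ~ w(i*h, j*h)
    for _ in range(sweeps):
        for j in range(1, n):
            for i in range(1, n):
                w[j][i] = 0.25 * (w[j][i - 1] + w[j][i + 1]
                                  + w[j - 1][i] + w[j + 1][i]
                                  - h * h * u(i * h, j * h))
    return w

def neumann_trace_bottom(w, n):
    """One-sided approximation of the outward normal derivative on y = 0,
    where the outward normal is (0, -1)."""
    h = 1.0 / n
    return [-(w[1][i] - w[0][i]) / h for i in range(n + 1)]

def source(x, y, c1=1.0, c2=0.0):
    """Hypothetical piecewise constant source: c1 on a square inclusion, c2 elsewhere."""
    inside = 0.25 < x < 0.5 and 0.25 < y < 0.5
    return c1 if inside else c2

n = 16
w = solve_poisson(source, n)
y_bottom = neumann_trace_bottom(w, n)
```

Since the source is non-negative, the discrete potential is non-positive and the computed trace on the bottom edge is non-negative. The one-sided difference is only first-order accurate, so a finer grid or a higher-order formula would be needed for quantitative comparisons.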

Fig. 1. Numerical experiment: The picture on the left hand side shows the coeﬃcient to be reconstructed. On the right hand side, the initial condition for the level-set method.


Fig. 2. Numerical experiment: On the left hand side a plot of P (φk , c1k ) for k = 1500. The picture on the right hand side shows the corresponding iteration error.

We performed other numerical simulations with different choices of the initial guess $(\phi_0, c^1_0)$, and observed that the number of iterative steps required to obtain a reasonable approximation (up to the predefined precision of $10^{-2}$ in the $L^2$-norm) strongly depends on the choice of the initial guess $c^1_0$. On the other hand, the final result is not sensitive to the choice of the initial guess $\phi_0$.

Acknowledgments. A.DC. acknowledges the support from CNPq, grant 474593/2007-0. The work of A.L. is supported by the Brazilian National Research Council CNPq, grants 306020/2006-8, 474593/2007-0, and by the Alexander von Humboldt Foundation (AvH). This article was written during a visit of the author to NTU (Singapore). X.-C.T. acknowledges the support from NTU SUG 20/07 and MOE Tier II project T207B2202 (ARC 29/07).

References

1. Santosa, F.: A level-set approach for inverse problems involving obstacles. ESAIM Contrôle Optim. Calc. Var. 1, 17–33 (1995/1996)
2. Leitão, A., Scherzer, O.: On the relation between constraint regularization, level sets, and shape optimization. Inverse Problems 19, L1–L11 (2003)
3. Frühauf, F., Scherzer, O., Leitão, A.: Analysis of regularization methods for the solution of ill-posed problems involving discontinuous operators. SIAM J. Numer. Anal. 43, 767–786 (2005)
4. Chung, E., Chan, T., Tai, X.C.: Electrical impedance tomography using level set representation and total variational regularization. J. Comput. Phys. 205(1), 357–372 (2005)
5. Chan, T., Tai, X.C.: Identification of discontinuous coefficients in elliptic problems using total variation regularization. SIAM J. Sci. Comput. 25(3), 881–904 (2003)
6. Chan, T., Tai, X.C.: Level set and total variation regularization for elliptic inverse problems with discontinuous coefficients. J. Comput. Phys. 193(1), 40–66 (2004)
7. Chung, J., Vese, L.: Image segmentation using a multilayer level-sets approach. UCLA C.A.M. Report 193(03-53), 1–28 (2003)
8. DeCezaro, A., Leitão, A., Tai, X.C.: On multiple level-set regularization methods for inverse problems. Inverse Problems 25 (to appear, 2009)
9. Tai, X.C., Chan, T.: A survey on multiple level set methods with applications for identifying piecewise constant functions. Int. J. Num. Anal. Model 1(1), 25–47 (2004)
10. Evans, L., Gariepy, R.: Measure theory and fine properties of functions. Studies in Advanced Mathematics. CRC Press, Boca Raton (1992)
11. Engl, H., Kunisch, K., Neubauer, A.: Convergence rates for Tikhonov regularisation of nonlinear ill-posed problems. Inverse Problems 5(4), 523–540 (1989)
12. Engl, H.W., Hanke, M., Neubauer, A.: Regularization of inverse problems. Mathematics and its Applications, vol. 375. Kluwer Academic Publishers Group, Dordrecht (1996)
13. Adams, R.: Sobolev Spaces. Academic Press, New York (1975)

The Nonlinear Tensor Diffusion in Segmentation of Meaningful Biological Structures from Image Sequences of Zebrafish Embryogenesis

Olga Drblíková¹, Karol Mikula¹, and Nadine Peyriéras²

¹ Slovak University of Technology, Radlinského 11, 813 68 Bratislava, Slovakia
[email protected], [email protected]
http://www.math.sk/drblikov, http://www.math.sk/mikula
² CNRS-DEPSN, Institut de Neurobiologie Alfred Fessard, Batiment 32-33, Avenue de la Terrasse, 91198 Gif sur Yvette, France
[email protected]

Abstract. In this contribution we develop a strategy for the segmentation of evolving biological structures in image sequences. Our approach is based on a combination of nonlinear tensor diffusion image smoothing and subjective surface based image segmentation. Since the fine cell structure would prevent the evolving segmentation function from attaining the shape of meaningful biological structures, the images in the sequence have to be smoothed properly. To that end we apply nonlinear tensor diffusion, which enhances the connectivity of bordering structure lines and smooths their inner parts. For the numerical implementation we use semi-implicit diamond-cell finite volume methods for both filtering and segmentation. We show an application of the method to image segmentation of early stages of zebrafish embryogenesis.

1 Introduction

The subjective surface based segmentation is an efficient tool for the extraction of 2D or 3D image objects, cf. [10, 9, 1]. This is also the case for two-photon laser scanning microscopy images, when detecting and segmenting structures at the cellular and subcellular level, cf. [6, 8]. However, the use of such algorithms for segmenting supercellular structures is not straightforward. Using an original (unfiltered) image leads to entirely useless results due to the presence of small cell structures. A useful tool is then filtering by nonlinear tensor diffusion, which enhances the coherence of structure boundaries and smooths the inner cell structures and noise. The model, cf. [11, 7, 4], has the following form

$$\partial_t u - \nabla \cdot (D \nabla u) = 0 \quad \text{in } Q_T \equiv I \times \Omega, \tag{1}$$

$$u(x, 0) = u_0(x) \quad \text{in } \Omega, \tag{2}$$

$$(D \nabla u) \cdot n = 0 \quad \text{on } I \times \partial\Omega, \tag{3}$$

where $u$ represents a greylevel 3D image intensity, $u_0 \in L^2(\Omega)$, $I = [0, T]$ denotes a time interval, $\Omega$ is an image domain, $D = D(u(x, t))$ is a diffusion tensor and $n$ is the outer unit normal vector to $\partial\Omega$. The model is useful when strong filtering is desirable in a preferred direction, e.g. along 2D edge surfaces in 3D images, and low smoothing is expected in the perpendicular direction.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 63–74, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 Design of the Diffusion Tensor

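The matrix $D$ depends on a smoothed intensity gradient, which is given as $\nabla u_{\tilde t} = (u_{x_1}, u_{x_2}, u_{x_3})^T$, where

$$u_{\tilde t}(x, t) = (G_{\tilde t} * u(\cdot, t))(x), \quad (\tilde t > 0) \tag{4}$$

and $G_{\tilde t}$ is a Gaussian kernel. Provided $\mu = \|\nabla u_{\tilde t}\|^2 > 0$ we choose a triplet of vectors $(v_1, v_2, v_3)$ as follows

$$v_1 \parallel \nabla u_{\tilde t}, \quad v_2 \perp \nabla u_{\tilde t}, \quad v_3 \perp \nabla u_{\tilde t}, \quad v_2 \perp v_3. \tag{5}$$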

The direction of the vector $v_1$ corresponds to the direction of the largest intensity change. The other two vectors span a tangential plane to a level set of the image intensity, which may represent a 2D surface edge in a 3D image, provided that $\mu$ is large. It is called a coherence plane $\mathcal{P}$, cf. [4, 7], and is the eigenspace corresponding to the eigenvalue 0 of the outer product $\nabla u_{\tilde t} \otimes \nabla u_{\tilde t}$. In order to improve the coherence, the diffusion tensor $D$ must steer the filtering process such that the diffusion is strong, and increasing with $\mu$, along the coherence plane and small in the perpendicular direction. We achieve this by choosing the eigenvalues of the diffusion tensor, which determine the diffusivities in the directions $v_1$, $v_2$ and $v_3$, as

$$\kappa_1 = \alpha, \quad \alpha \in (0, 1), \ \alpha \ll 1, \qquad \kappa_2 = \begin{cases} \alpha, & \text{if } \mu = 0, \\ \alpha + (1 - \alpha) \exp\left(\dfrac{-C}{\mu}\right), \ C > 0, & \text{otherwise.} \end{cases} \tag{6}$$

Further, we apply another convolution, with a smoothing kernel $G_\rho$, to get the diffusion matrix $D$ in the form

$$D = G_\rho * D_0, \quad \text{where } D_0 = \begin{cases} B, & \text{if } \mu = 0, \\ P B P^{-1}, & \text{otherwise,} \end{cases} \qquad B = \begin{pmatrix} \kappa_1 & 0 & 0 \\ 0 & \kappa_2 & 0 \\ 0 & 0 & \kappa_2 \end{pmatrix}, \tag{7}$$

and $P$ represents a transition matrix from the basis $(v_1, v_2, v_3)$ to $(e_1, e_2, e_3)$. The exponential function in (6) is used to ensure that $\kappa_2$ does not exceed 1. The process never stops owing to the positive parameter $\alpha$: even if $\mu$ tends to zero, a small linear diffusion with diffusivity $\alpha > 0$ remains. $C$ has the role of a threshold parameter. If $\mu \gg C$ then $\kappa_2 \approx 1$ and, conversely, if $\mu \ll C$ then $\kappa_2 \approx \alpha$. After some manipulations we get that at any point where $\mu > 0$, the matrix $D_0$ has the following form

$$\frac{1}{\mu} \begin{pmatrix} u_{x_1}^2 \kappa_1 + (u_{x_2}^2 + u_{x_3}^2) \kappa_2 & u_{x_1} u_{x_2} (\kappa_1 - \kappa_2) & u_{x_1} u_{x_3} (\kappa_1 - \kappa_2) \\ u_{x_1} u_{x_2} (\kappa_1 - \kappa_2) & u_{x_2}^2 \kappa_1 + (u_{x_1}^2 + u_{x_3}^2) \kappa_2 & u_{x_2} u_{x_3} (\kappa_1 - \kappa_2) \\ u_{x_1} u_{x_3} (\kappa_1 - \kappa_2) & u_{x_2} u_{x_3} (\kappa_1 - \kappa_2) & u_{x_3}^2 \kappa_1 + (u_{x_1}^2 + u_{x_2}^2) \kappa_2 \end{pmatrix} \tag{8}$$

in the standard basis $(e_1, e_2, e_3)$. This choice of the matrix $D_0$ was given in [4]; it is independent of the particular choice of $v_2$ and $v_3$ and can be evaluated directly and quickly using the diamond-cell finite volume technique (see also the next section). The matrices are then spatially averaged using Gaussian smoothing with variance $\rho$ to get the elements of the final matrix $D$. The diffusion tensor possesses the smoothness, symmetry and positive definiteness properties, cf. [4].
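As a reading aid, the pointwise construction in (6) and (8) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `diffusion_tensor_d0` and `matvec` are hypothetical, the gradient is assumed to be already smoothed as in (4), and the final averaging $D = G_\rho * D_0$ from (7) is not included.

```python
import math

def diffusion_tensor_d0(grad, alpha=0.001, C=1.0):
    """Return the 3x3 matrix D0 for one smoothed gradient vector `grad`.

    Hypothetical helper: follows Eq. (6) for the eigenvalues and Eq. (8)
    for the matrix entries; the Gaussian averaging of Eq. (7) is omitted.
    """
    ux, uy, uz = grad
    mu = ux * ux + uy * uy + uz * uz              # mu = |grad u|^2
    if mu == 0.0:
        # kappa1 = kappa2 = alpha, so D0 = B = alpha * identity
        return [[alpha if i == j else 0.0 for j in range(3)] for i in range(3)]
    kappa1 = alpha
    kappa2 = alpha + (1.0 - alpha) * math.exp(-C / mu)
    g = (ux, uy, uz)
    # Eq. (8) rewritten as kappa2 * I + (kappa1 - kappa2) * g g^T / mu
    return [[(kappa2 if i == j else 0.0) + (kappa1 - kappa2) * g[i] * g[j] / mu
             for j in range(3)] for i in range(3)]

def matvec(M, v):
    """Product of a 3x3 matrix (list of rows) with a 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

# Usage: the gradient direction is damped by kappa1, the coherence plane by kappa2.
D0 = diffusion_tensor_d0((3.0, 4.0, 0.0))
along_gradient = matvec(D0, (3.0, 4.0, 0.0))
in_plane = matvec(D0, (0.0, 0.0, 1.0))
```

Writing (8) as $\kappa_2 I + (\kappa_1 - \kappa_2)\, \nabla u_{\tilde t} \otimes \nabla u_{\tilde t} / \mu$ makes it visible that no explicit choice of $v_2$ and $v_3$ is needed.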

3 The Finite Volume Scheme for 3D Nonlinear Tensor Diffusion

Let the image $u(x)$ be represented by a bounded mapping $u : \Omega \to \mathbb{R}$ and given by $n_1 \times n_2 \times n_3$ voxels (finite volumes), so that it forms a mesh with $n_1$ rows, $n_2$ columns and $n_3$ layers. Let us consider an image domain $\Omega = (0, n_1 h) \times (0, n_2 h) \times (0, n_3 h)$ with voxel size $h$. We consider the diffusion process in a time interval $I = [0, T]$. Let the time discretization be given by $0 = t_0 < t_1 < \dots < t_{N_{max}} = T$ with $t_n = t_{n-1} + k$, where $k$ is the length of a discrete time step. We look for an approximation of the solution at time $t_n$ for every $n = 1, \dots, N_{max}$. We start the derivation of the scheme by integrating equation (1) over a finite volume $K$; we then apply a semi-implicit time discretization and use the divergence theorem to obtain

$$\frac{u_K^n - u_K^{n-1}}{k}\, m(K) - \sum_{\sigma \in E_K \cap E_{int}} \int_\sigma (D^{n-1} \nabla u^n) \cdot n_{K,\sigma}\, ds = 0, \tag{9}$$

where $u_K^n$, $K \in T_h$, denotes the mean value of $u^n$ on $K$ and $T_h$ is a cubic finite volume mesh. Further quantities and notation are as follows: $m(K)$ is the 3D measure of the finite volume $K$ with boundary $\partial K$; $\sigma_{KL} = K \cap L$ is a side of the finite volume $K$, where $L \in T_h$ is a neighboring finite volume of $K$, i.e. the volumes $K$ and $L$ share a 2D surface element with nonzero area. In several places we write $\sigma$ instead of $\sigma_{KL}$ to simplify the notation. $E_K$ represents the set of sides such that $\partial K = \bigcup_{\sigma \in E_K} \sigma$, and $E = \bigcup_{K \in T_h} E_K$. The set of boundary sides is denoted by $E_{ext}$, that is $E_{ext} = \{\sigma \in E,\ \sigma \subset \partial\Omega\}$, and $E_{int} = E \setminus E_{ext}$. $\Upsilon$ is the set of pairs of neighboring finite volumes, defined by $\Upsilon = \{(K, L) \in T_h^2,\ K \neq L,\ m(K \cap L) \neq 0\}$, and $n_{K,\sigma}$ is the unit normal vector to $\sigma$ outward to $K$. Our discrete approximate solution is defined as

$$u_{h,k}(x, t) = \sum_{n=0}^{N_{max}} \sum_{K \in T_h} u_K^n\, \chi\{x \in K\}\, \chi\{t_{n-1} < t \le t_n\},$$

where the function $\chi\{A\}$ is given by

$$\chi\{A\} = \begin{cases} 1, & \text{if } A \text{ is true}, \\ 0, & \text{elsewhere.} \end{cases} \tag{10}$$


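The finite volume approximation at the $n$-th time step is given by $u_{h,k}^n(x) = \sum_{K \in T_h} u_K^n\, \chi\{x \in K\}$ and the initial values by $u_K^0 = \frac{1}{m(K)} \int_K u_0(x)\, dx$, $K \in T_h$. We can define an auxiliary unknown $\phi_\sigma^n(u_{h,k}^n)$ representing an approximation of the exact averaged flux $\frac{1}{m(\sigma)} \int_\sigma (D^{n-1} \nabla u^n) \cdot n_{K,\sigma}\, ds$ for any $K$ and $\sigma \in E_K$, in order to rewrite (9) in the form

$$\frac{u_K^n - u_K^{n-1}}{k} - \frac{1}{m(K)} \sum_{\sigma \in E_K \cap E_{int}} \phi_\sigma^n(u_{h,k}^n)\, m(\sigma) = 0,$$

where $m(\sigma)$ is the measure of the side $\sigma$.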

Fig. 1. The co-volumes associated with the side σ = σWE (left) and σ = σEW (right)

$\phi_\sigma^n(u_{h,k}^n)$ is built with the help of a co-volume mesh, cf. e.g. [2, 3] for the 2D case. We create a co-volume $\chi_\sigma$ associated with $\sigma$ around each finite volume side by joining the four vertices of this side and the midpoints of the finite volumes which are common to this side, cf. Fig. 1. The co-volume boundary consists of triangles $\bar\sigma \subset \partial\chi_\sigma$ (their vertices are denoted by $N_1(\bar\sigma)$, $N_2(\bar\sigma)$ and $N_3(\bar\sigma)$) and $n_{\chi_\sigma,\bar\sigma}$ is the unit normal vector to $\bar\sigma$ outward to $\chi_\sigma$. First, we approximate the gradient averaged on $\chi_\sigma$. Applying the divergence theorem we obtain

$$\frac{1}{m(\chi_\sigma)} \int_{\chi_\sigma} \nabla u^n\, dx = \frac{1}{m(\chi_\sigma)} \int_{\partial\chi_\sigma} u^n\, n_{\chi_\sigma,\bar\sigma}\, ds,$$

which can be approximated as follows:

$$p_\sigma^n(u) = \frac{1}{m(\chi_\sigma)} \sum_{\bar\sigma \in \partial\chi_\sigma} \frac{1}{3} \left( u^n_{N_1(\bar\sigma)} + u^n_{N_2(\bar\sigma)} + u^n_{N_3(\bar\sigma)} \right) m(\bar\sigma)\, n_{\chi_\sigma,\bar\sigma}.$$

The values at $x_E$ and $x_W$ are denoted by $u_E$ and $u_W$. Further, we evaluate the values $u_{TN}$, $u_{TS}$, $u_{BN}$ and $u_{BS}$ at the vertices $x_{TN}$, $x_{TS}$, $x_{BN}$ and $x_{BS}$, cf. Fig. 1, as the arithmetic mean of $u_K$, where $K$ runs over the finite volumes which are common to the vertex. Since the mesh is uniform and squared, we can simplify our discrete scheme by applying the following relations: $m(\chi_\sigma) = \frac{h^3}{3}$, $m(\bar\sigma) = \frac{\sqrt{2}}{4} h^2$. After a short calculation we can state

$$p_\sigma^n(u) = \frac{u_E^n - u_W^n}{h}\, n_{K,\sigma} + \frac{u_{TN}^n + u_{BN}^n - u_{TS}^n - u_{BS}^n}{2h}\, t^1_{K,\sigma} + \frac{u_{TN}^n + u_{TS}^n - u_{BN}^n - u_{BS}^n}{2h}\, t^2_{K,\sigma}, \tag{11}$$

Fig. 2. A finite volume $K$, its boundaries $\sigma_i$, $i = E, W, N, S, T, B$, and the fluxes outward to the finite volume $K$

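where $t^1_{K,\sigma}$ is a unit vector parallel to $x_{TN} - x_{TS}$ such that $(x_{TN} - x_{TS}) \cdot t^1_{K,\sigma} > 0$ and $t^2_{K,\sigma}$ is a unit vector parallel to $x_{TN} - x_{BN}$ such that $(x_{TN} - x_{BN}) \cdot t^2_{K,\sigma} > 0$. We replace the exact gradient $\nabla u^n$ by the discrete gradient $p_\sigma^n(u)$ to get the numerical flux in the form

$$\phi_\sigma^n(u_{h,k}^n) = (D_\sigma\, p_\sigma^n(u)) \cdot n_{K,\sigma}. \tag{12}$$

The matrix

$$D_\sigma = D_\sigma^{n-1} = \begin{pmatrix} \bar D^\sigma_{11} & \bar D^\sigma_{12} & \bar D^\sigma_{13} \\ \bar D^\sigma_{12} & \bar D^\sigma_{22} & \bar D^\sigma_{23} \\ \bar D^\sigma_{13} & \bar D^\sigma_{23} & \bar D^\sigma_{33} \end{pmatrix}$$

denotes an approximation of the mean value of the matrix $D$ along $\sigma$, which is evaluated at the previous time step using $u_{h,k}^{n-1}$. The elements of the matrix $D_\sigma$ are $C^\infty$ functions due to the convolutions in (4) and (7). Let us emphasize that in (12) we always consider the matrix $D_\sigma$ written in the basis $(n_{K,\sigma}, t^1_{K,\sigma}, t^2_{K,\sigma})$, cf. [2, 3] for an analogy with the 2D model. In practice this means, cf. Fig. 2, that the matrix $D^\sigma$ given in the standard basis on a side $\sigma$ by

$$\begin{pmatrix} D^\sigma_{11} & D^\sigma_{12} & D^\sigma_{13} \\ D^\sigma_{12} & D^\sigma_{22} & D^\sigma_{23} \\ D^\sigma_{13} & D^\sigma_{23} & D^\sigma_{33} \end{pmatrix}$$

is the same in the new basis on the two sides $\sigma_W$ and $\sigma_E$. It has the form

$$\begin{pmatrix} D^\sigma_{22} & D^\sigma_{12} & D^\sigma_{23} \\ D^\sigma_{12} & D^\sigma_{11} & D^\sigma_{13} \\ D^\sigma_{23} & D^\sigma_{13} & D^\sigma_{33} \end{pmatrix}$$

in the new basis for the two other sides $\sigma_S$ and $\sigma_N$, and it becomes

$$\begin{pmatrix} D^\sigma_{33} & D^\sigma_{23} & D^\sigma_{13} \\ D^\sigma_{23} & D^\sigma_{22} & D^\sigma_{12} \\ D^\sigma_{13} & D^\sigma_{12} & D^\sigma_{11} \end{pmatrix}$$

for the last two sides $\sigma_B$ and $\sigma_T$. Using these matrix representations, definition (12) can be written in the form

$$\phi_\sigma^n(u_{h,k}^n) = \left[ \begin{pmatrix} \bar D^\sigma_{11} & \bar D^\sigma_{12} & \bar D^\sigma_{13} \\ \bar D^\sigma_{12} & \bar D^\sigma_{22} & \bar D^\sigma_{23} \\ \bar D^\sigma_{13} & \bar D^\sigma_{23} & \bar D^\sigma_{33} \end{pmatrix} \begin{pmatrix} \frac{u_E^n - u_W^n}{h} \\ \frac{u_{TN}^n + u_{BN}^n - u_{TS}^n - u_{BS}^n}{2h} \\ \frac{u_{TN}^n + u_{TS}^n - u_{BN}^n - u_{BS}^n}{2h} \end{pmatrix} \right] \cdot \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$$

$$= \bar D^\sigma_{11}\, \frac{u_E^n - u_W^n}{h} + \bar D^\sigma_{12}\, \frac{u_{TN}^n + u_{BN}^n - u_{TS}^n - u_{BS}^n}{2h} + \bar D^\sigma_{13}\, \frac{u_{TN}^n + u_{TS}^n - u_{BN}^n - u_{BS}^n}{2h}.$$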


Finally, let us summarize our semi-implicit finite volume scheme:

$$\frac{u_K^n - u_K^{n-1}}{k} - \frac{1}{m(K)} \sum_{\sigma \in E_K \cap E_{int}} \phi_\sigma^n(u_{h,k}^n)\, m(\sigma) = 0, \tag{13}$$

where

$$\phi_\sigma^n(u_{h,k}^n) = \bar D^\sigma_{11}\, \frac{u_E^n - u_W^n}{h} + \bar D^\sigma_{12}\, \frac{u_{TN}^n + u_{BN}^n - u_{TS}^n - u_{BS}^n}{2h} + \bar D^\sigma_{13}\, \frac{u_{TN}^n + u_{TS}^n - u_{BN}^n - u_{BS}^n}{2h}. \tag{14}$$

Due to the computation of the values $u_{TN}$, $u_{TS}$, $u_{BN}$ and $u_{BS}$ in (14) as the arithmetic mean of neighboring voxel values, we end up with a 27-point finite volume scheme.
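To make the flux formula (14) concrete, the following Python sketch evaluates it on the east side of one interior voxel. It is an illustration only and rests on assumptions not fixed by the text: the image is stored as nested lists `u[i][j][k]`, the east side has normal $+x_1$ with $t^1 = +x_2$ (north) and $t^2 = +x_3$ (top), and the names `vertex_mean` and `east_flux` are hypothetical.

```python
def vertex_mean(u, p, q, r):
    """Arithmetic mean of the eight voxels sharing the grid vertex (p, q, r).

    Vertex (p, q, r) is assumed to sit at (p*h, q*h, r*h); only interior
    vertices are handled, so all indices p-1..p etc. must be valid.
    """
    total = 0.0
    for i in (p - 1, p):
        for j in (q - 1, q):
            for k in (r - 1, r):
                total += u[i][j][k]
    return total / 8.0

def east_flux(u, D_row, i, j, k, h):
    """Numerical flux of Eq. (14) through the east side of voxel (i, j, k).

    D_row = (D11, D12, D13) is the first row of the averaged tensor on the
    side, written in the assumed local basis (east, north, top).
    """
    u_W, u_E = u[i][j][k], u[i + 1][j][k]
    u_TN = vertex_mean(u, i + 1, j + 1, k + 1)
    u_TS = vertex_mean(u, i + 1, j, k + 1)
    u_BN = vertex_mean(u, i + 1, j + 1, k)
    u_BS = vertex_mean(u, i + 1, j, k)
    d_n = (u_E - u_W) / h                              # normal derivative
    d_t1 = (u_TN + u_BN - u_TS - u_BS) / (2.0 * h)     # north-south derivative
    d_t2 = (u_TN + u_TS - u_BN - u_BS) / (2.0 * h)     # top-bottom derivative
    return D_row[0] * d_n + D_row[1] * d_t1 + D_row[2] * d_t2

# Usage: a linear intensity, for which the discrete gradient is exact.
h, n = 0.5, 4
u = [[[2.0 * (i + 0.5) * h + 3.0 * (j + 0.5) * h - (k + 0.5) * h + 5.0
       for k in range(n)] for j in range(n)] for i in range(n)]
flux = east_flux(u, (1.0, 0.5, 0.25), 1, 1, 1, h)
```

Each vertex value draws on eight voxels, so the fluxes through all six sides of a voxel together involve its full 3 × 3 × 3 neighborhood, which is where the 27-point stencil comes from.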

4 Segmentation

Our segmentation approach is based on the subjective surface method [10] and its finite volume implementation from [9]. The mathematical model has the following form

$$\partial_t u = \sqrt{\varepsilon^2 + |\nabla u|^2}\; \nabla \cdot \left( g(|\nabla G_\sigma * I^0|)\, \frac{\nabla u}{\sqrt{\varepsilon^2 + |\nabla u|^2}} \right) \quad \text{in } Q_T \equiv I \times \Omega, \tag{15}$$

$$u(x, 0) = u_0(x) \quad \text{in } \Omega, \tag{16}$$

$$u = 0 \quad \text{on } I \times \partial\Omega, \tag{17}$$

where $I^0$ is the image which is segmented and $\varepsilon$ is the regularization parameter. The solution $u$ represents here the evolving segmentation function. The function $g = g(|\nabla G_\sigma * I^0|)$ has the role of the edge detector, which requires a suitable choice of $g$ in practice, e.g. $g(s) = \frac{1}{1 + K s^2}$, $K > 0$. In the subjective surface method we start the segmentation by constructing an initial segmentation function located in an approximate object center. The segmentation function is driven by equation (15) and evolves to a numerical steady state. Its shock profile gives the segmentation result and the shape of the object. To that end, we choose a suitable isoline of the shock profile which represents the boundary of the segmented object. This isoline is most naturally taken at the average of the maximal and minimal values of the final segmentation function.
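The two ingredients named above, the edge detector $g$ and the choice of the medium isoline, are simple enough to sketch directly. The Python below is illustrative only; the function names are hypothetical, the default `K = 100.0` is the value reported in the experiments section, and `segment` merely thresholds a list of final segmentation-function values at the medium level, assuming the object corresponds to the larger values.

```python
def edge_detector(s, K=100.0):
    """Edge detector g(s) = 1 / (1 + K s^2), K > 0."""
    return 1.0 / (1.0 + K * s * s)

def medium_isoline_level(values):
    """Average of the maximal and minimal value of the segmentation function."""
    values = list(values)
    return 0.5 * (max(values) + min(values))

def segment(values):
    """Mark the samples lying above the medium isoline level."""
    level = medium_isoline_level(values)
    return [v > level for v in values]

# Usage: g is close to 1 in flat regions and small on strong edges.
flat_region = edge_detector(0.0)
strong_edge = edge_detector(1.0)
mask = segment([0.0, 0.2, 1.0, 0.6])
```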

5 Numerical Experiments

The goal of this section is to discuss our computational results and the influence of nonlinear tensor diffusion filtering on the segmentation of time-evolving biological structures. We perform our experiments on 3D image sequences of cell nuclei, cf. Fig. 3, and cell membranes, cf. Figs. 4-6. The images represent early stages of zebrafish embryogenesis and were acquired by a two-photon laser scanning microscope. We apply the 3D numerical scheme to filter the images; the segmentation is then performed on 2D image slices (512×512 pixels) in order to first test the performance and capabilities of the method.

The first experiment illustrates the behaviour of nonlinear tensor diffusion in filtering this type of image, cf. Fig. 3. One can observe that this type of diffusion improves the connectivity of structure bordering lines while it smooths the structure interiors. One can compare the original image, showing separate nuclei but with observable structure borders, with the filtered one, where the structure border lines are connected.

Fig. 3. 2D slices of a 3D zebrafish embryo image. Left: the original image. Right: the filtered image after 50 time steps.

Our next experiments are devoted to the segmentation of the eye retina structure in several subsequent image slices. First, the initial segmentation function is given by two cones placed inside the structure such that their partially overlapping bases sufficiently cover the eye structure area. Then we evolve it in the original as well as the filtered images. Using the original image we obtain a final state of the segmentation function represented by a variety of different level lines, cf. Fig. 4 (top, right). The question is which isoline represents the structure shape most precisely. The natural choice is the medium isoline, which is depicted in the original image in Fig. 4 (top, left). One can clearly see the large difference between the segmented and the real structure shape, caused by inner cell structures restraining the evolving segmentation function.

Fig. 4. The eye retina segmentation using the 2D original image (top) and the image filtered by 20 time steps of the nonlinear tensor diffusion (middle). Left: the averaged isoline of the final state of the segmentation function is superimposed on the original and filtered image, respectively. Right: the graph of the final state of the segmentation function is plotted after 2000 time steps using the original image and after 200 time steps using the filtered image. At the bottom we display the original (left) and the filtered image (right).

In order to compare our method with other filtering techniques we performed several tests. The segmentation results obtained on images filtered by geodesic mean curvature flow (GMCF) filtering, mean curvature flow (MCF) filtering and Perona-Malik (PM) filtering are shown in Fig. 5. In Fig. 5 (right), we can see that after filtering the profiles of the final segmentation functions are not well suited for our purposes, although the MCF result is rather close to the real one. They are again given by several different isolines, and the medium one, cf. Fig. 5 (left), represents the segmented structure only partially. This is a consequence of the edge-preserving smoothing by GMCF and PM, which cannot remove inner cell structures. On the contrary, the final steady state of the segmentation function evolving in the image filtered by nonlinear tensor diffusion consists of isolines accumulated along the real structure boundary, cf. Fig. 4 (middle, right). The formation of the correct shock profile is enabled by the smoothing of cell structure barriers, the removal of noise and the emphasizing of structure boundaries. Embedding the medium isoline into the image, cf. Fig. 4 (middle, left), we obtain the precise structure shape.

Fig. 5. The eye retina segmentation using the image filtered by 100 steps of the GMCF filtering (top), 25 steps of the MCF filtering (middle) and 20 steps of the PM filtering (bottom). Left: the averaged isoline of the final state of the segmentation function is superimposed on the filtered image. Right: the graph of the final state of the segmentation function is plotted after 3000 segmentation steps using the GMCF filtering, after 500 segmentation steps using the MCF filtering and after 5000 segmentation steps using the PM filtering.

Then the segmentation procedure was successively applied to a part of the image sequence consisting of 11 images, cf. Fig. 6. We use a backward-in-time strategy, starting from the last image of the sequence, segmented as explained above. The initial segmentation function for each further slice is taken as the final result of the segmentation of the previous image. Fig. 6 shows the segmentation results displayed on the original membrane images, from the last, 150th image slice (top), through the 145th slice (middle), to the 140th slice of the processed image sequence (bottom).

Fig. 6. The segmentation results for the image sequence, superimposed on the original slices

In the experiments with the nonlinear tensor anisotropic diffusion we used the spatial step $h = 0.01$, time step $k = 0.0001$, $C = 1$, $\alpha = 0.001$, together with $\tilde t = 10^{-5}$, $\rho = 0.002$ and 20 time steps for the filtering of membrane images, and $\tilde t = 10^{-10}$, $\rho = 0.1$ and 50 time steps for the filtering of nuclei images. The arising linear systems were solved by the Gauss-Seidel iterative method. For the segmentation experiments we used the following parameters: $\varepsilon = 10^{-4}$, the spatial step $h = 0.01$, time step $k = 0.01$, $\delta = 10^{-6}$ for the stopping criterion and $K = 100$ (the constant in the function $g(s) = \frac{1}{1 + K s^2}$), cf. [9]. The resulting linear systems were solved by the SOR method.
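The interplay of a semi-implicit time step and the Gauss-Seidel solver can be illustrated on a deliberately simplified problem. The sketch below is not the 3D tensor scheme of Section 3: it assumes a 1D grid, a constant scalar diffusivity and zero-flux boundaries, and the function name is hypothetical. It only shows how one implicit step leads to a diagonally dominant linear system that Gauss-Seidel sweeps solve in place.

```python
def semi_implicit_step_gauss_seidel(u_old, diffusivity, k, h,
                                    tol=1e-12, max_sweeps=10000):
    """One implicit step of 1D linear diffusion with zero-flux boundaries.

    Solves (1 + c * nb_i) * u_i - c * sum(neighbors of i) = u_old_i with
    c = diffusivity * k / h^2 and nb_i the number of neighbors of cell i.
    """
    n = len(u_old)
    c = diffusivity * k / (h * h)
    u = list(u_old)                       # initial guess: previous time step
    for _ in range(max_sweeps):
        change = 0.0
        for i in range(n):
            # Gauss-Seidel: already updated neighbor values are used at once
            nbrs = [u[j] for j in (i - 1, i + 1) if 0 <= j < n]
            new = (u_old[i] + c * sum(nbrs)) / (1.0 + c * len(nbrs))
            change = max(change, abs(new - u[i]))
            u[i] = new
        if change < tol:                  # stop when a sweep changes nothing
            break
    return u

# Usage with the step sizes reported above (h = 0.01, k = 0.0001).
u_new = semi_implicit_step_gauss_seidel([0.0, 0.0, 1.0, 0.0, 0.0], 1.0, 1e-4, 0.01)
```

In this toy setting the step conserves the total intensity and keeps the values between the old minimum and maximum, which mirrors the role of the zero-flux condition (3).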

6 Conclusions

In this article we are concerned with a technique for embryo structure segmentation in image sequences. Since noise and cell structures restrain the correct evolution of the segmentation, we smooth the image sequence as the first step. We choose nonlinear tensor diffusion because this filtering not only smooths image objects but also emphasizes the connections of their boundaries. Then the segmentation process starts from an artificial initial function centered inside the biological structure of the first image in the sequence. The segmentation result given by the subjective surface method for this image is used as the initial condition for the next image of the processed sequence, and so on. Our experiments confirm the usefulness of nonlinear tensor diffusion for this type of segmentation.

Acknowledgment

The work was supported by the European projects Embryomics and BioEmergences, the grants APVV-RPEU-0004-06, APVV-0351-07, APVV-LPP-0020-07 and the grant of VEGA 1/0269/09.


References

1. Corsaro, S., Mikula, K., Sarti, A., Sgallari, F.: Semi-implicit co-volume method in 3D image segmentation. SIAM J. Sci. Comput. 28(6), 2248–2265 (2006)
2. Coudiere, Y., Vila, J.P., Villedieu, P.: Convergence rate of a finite volume scheme for a two-dimensional convection-diffusion problem. M2AN Math. Model. Numer. Anal. 33, 493–516 (1999)
3. Drblíková, O., Mikula, K.: Convergence Analysis of Finite Volume Scheme for Nonlinear Tensor Anisotropic Diffusion in Image Processing. SIAM J. Numer. Anal. 46(1), 37–60 (2007)
4. Drblíková, O., Mikula, K.: Semi-implicit Diamond-cell Finite Volume Scheme for 3D Nonlinear Tensor Diffusion in Coherence Enhancing Image Filtering. In: Eymard, R., Herard, J.M. (eds.) Finite Volumes for Complex Applications V: Problems and Perspectives, ISTE and WILEY, London, pp. 343–350 (2008)
5. Eymard, R., Gallouët, T., Herbin, R.: Finite Volume Methods. In: Ciarlet, P., Lions, J.L. (eds.) Handbook for Numerical Analysis, vol. 7. Elsevier, Amsterdam (2000)
6. Frolkovič, P., Mikula, K., Peyriéras, N., Sarti, A.: A counting number of cells and cell segmentation using advection-diffusion equations. Kybernetika 43(6), 817–829 (2007)
7. Meijering, E., Niessen, W., Weickert, J., Viergever, M.: Diffusion-Enhanced Visualization and Quantification of Vascular Anomalies in Three-Dimensional Rotational Angiography: Results of an In-Vitro Evaluation. Medical Image Analysis 6(3), 217–235 (2002)
8. Mikula, K., Peyriéras, N., Remešíková, M., Sarti, A.: 3D embryogenesis image segmentation by the generalized subjective surface method using the finite volume technique. In: Eymard, R., Herard, J.M. (eds.) Finite Volumes for Complex Applications V: Problems and Perspectives, ISTE and WILEY, London, pp. 585–592 (2008)
9. Mikula, K., Sarti, A., Sgallari, F.: Co-volume level set method in subjective surface based medical image segmentation. In: Suri, J., et al. (eds.) Handbook of Medical Image Analysis: Segmentation and Registration Models, pp. 583–626. Springer, New York (2005)
10. Sarti, A., Malladi, R., Sethian, J.A.: Subjective Surfaces: A Method for Completing Missing Boundaries. Proceedings of the National Academy of Sciences of the United States of America 97(12), 6258–6263 (2000)
11. Weickert, J.: Coherence-enhancing diffusion filtering. Int. J. Comput. Vision 31, 111–127 (1999)

Composed Segmentation of Tubular Structures by an Anisotropic PDE Model

Elena Franchini, Serena Morigi, and Fiorella Sgallari

Department of Mathematics-CIRAM, University of Bologna, Bologna, Italy
{franchini,morigi,sgallari}@dm.unibo.it

Abstract. In this work we introduce the composed segmentation (C-segmentation), that is, an a priori composition of sources to obtain a single segmentation result according to specific logic combinations. The approach and the segmentation model are general, but we apply the C-segmentation technique to the challenging problem of segmenting tubular-like structures. The reconstruction is obtained by continuously deforming an initial distance function following a Partial Differential Equation (PDE)-based diffusion model derived from a minimal volume-like variational formulation. The gradient flow for this functional leads to a nonlinear curvature motion model. An anisotropic variant is provided which includes a diffusion tensor designed to follow the tube geometry. Numerical examples demonstrate the ability of the proposed method to produce high quality 2D/3D segmentations of complex and possibly incomplete synthetic and real data.

1 Introduction

Segmentation of three-dimensional (3D) images can be a very useful computer-aided diagnosis tool for clinical routine or surgical planning. We use the term composed segmentation for systems that extract structures from several images by combining them according to specific Boolean operations. Traditionally, segmentations performed independently on single images have to be combined by cumbersome algorithms. The goal of C-segmentation is to combine complementary multispatial, multisensor, multitemporal and/or multiview information into one new domain containing only the information to be segmented. The term composed refers to composition by Boolean operations, which depend on the application requirements. The individual images entering the C-segmentation process need to be registered to a common frame of reference; this is a nontrivial task which could affect the robustness of the segmentation approach, but it is not addressed in this work. We assume that the input images have been registered beforehand.

Let us illustrate the role of C-segmentation in different applications. Multimodal fusion deals with images that capture different physical properties of the original scene. In this case, C-segmentation identifies and segments the union of regions of interest. Multispatial fusion is related to several images which cover a single scene, for example several aerial photographs representing an entire territorial region, or multiple CT scans used to reconstruct a human organ. The C-segmentation unifies all the information, possibly replicated in the multiple sources, into a single segmented structure. Multitemporal composition requires the comparison between images representing the same structures acquired at different times. For example, in medical analysis, the growth of a tumor region is monitored by subsequent images of the region of interest. The C-segmentation can identify and segment the difference between structures in two images, reconstructing the grown area.

While our segmentation methodology is quite general, we focus our attention on the most challenging problem of tubular-like segmentation, which is particularly difficult in the case of multiple sources due to the huge number of connected structures that should be reconstructed. In particular, we will consider applications in medical image analysis which are interested in the extraction of anatomical surfaces of tubular structures like blood vessels. Indeed, problems like aneurysm or stenosis can occur in a vessel, and clinicians need tools to help them in interpreting and quantifying the images for evaluating the pathology, for proposing a therapy or a surgical operation, and for planning minimally invasive treatment.

A number of deformable model-based approaches for the segmentation of vessels or, more generally, tube-like structures have received considerable attention and success. We refer the reader to [4] for an extended review of vessel segmentation algorithms. Since an explicit deformable model representation is usually impractical, level set techniques to evolve a deformable model have recently been introduced, which provide an implicit representation of a deformable model. A curve in 2D or a surface in 3D evolves in such a way as to cover a complex shape or structure. Its initialization can be either manual or automatic and it need not be close to the desired solution. A disadvantage of the level set segmentation approach is the computational effort required to cover the entire domain of interest, which is, in general, one dimension higher than the original one. Interested readers are referred to recent literature on the level set segmentation strategy for tubular structures [5], [6], [7], [8], [10]. A generalization of the single-channel active contour without edges model is proposed in [9] for object detection using logic operations. This logic framework suffers from the limits of the active contour model, and is not suitable for detecting tubular structures.

In this work, we modify a geometric deformable model segmentation procedure based on level sets [2] to obtain a fast and accurate method for solving the C-segmentation problem of extracting tubular structures from multiple 2D/3D images, and we apply the proposed segmentation method to blood vessels, neurovascular structures and medical images with similar characteristics. The main contributions of this work concern the design of a strategy to deal with directionality in the vessels based on a diffusion tensor, and the capability to compose segmentations of multiple images according to Boolean operations. The former makes the segmentation algorithm able to follow tubular structures and connect possibly disconnected parts, while the latter allows different information to be combined simultaneously into a robust segmentation method. The proposed method is able to segment twisted, convoluted and occluded structures without user interaction, following branching of different layers, from thinner to larger structures. One of the major disadvantages of geometric deformable models, namely the computational cost, is strongly reduced by the proposed numerical approach, which limits the dimensions of the linear systems involved in the solution.

The paper is organized as follows. The nonlinear PDE model for tubular structure segmentation is introduced in Section 2, and numerical aspects related to the discretization of the PDE model are discussed in Section 3. The C-segmentation algorithm is discussed in detail in Section 4, and the anisotropic variant of the segmentation model is introduced in Section 5. Synthetic as well as real tests are provided in Section 6. Some selected 3D examples are also presented in Section 6 to demonstrate the effectiveness of this technique for the automatic segmentation of blood vessels in volumetric MRA/CTA images. Section 7 contains concluding remarks.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 75–86, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 A Segmentation Model for Tubular Structures

Several recently proposed 3D segmentation methods are based on deformable models, which can naturally capture the physics and geometry of shapes varying in space and time. In this section we formulate the segmentation problem as a special deformation of a 3D manifold driven by the structures we want to recover. Classical segmentation problems show oversmoothed structures and possibly incomplete boundaries, and the evolving surface usually flows over the boundaries of longer and thinner objects when propagating. A common choice to detect structure boundaries and to drive the diffusion or segmentation process is the Perona-Malik diffusivity

$$g(s) = \frac{1}{1 + (s/\rho)^2}, \tag{1}$$

where $\rho > 0$ is a small positive constant. For the implicit representation of the segmented surface, we consider a special 3D manifold which is the graph of a trivariate function $\phi$ mapping an open set $\Omega \subset \mathbb{R}^3$ into $\mathbb{R}$. The problem of determining the surface that best fits the object boundary represented in a 3D image $I$ can be posed as a volume minimization problem with objective function

$$V_g := \int_\Omega g(\|\nabla I\|)\, dV, \qquad dV = \sqrt{1 + \|\nabla\phi\|^2}\; dx\, dy\, dz, \tag{2}$$

where the metric $g$ is defined by (1) in $\Omega$ and $V_g$ represents the weighted volume of a 3D manifold on $\Omega$. Minimizing the volume functional (2) by steepest descent, and reading $\varepsilon = 1$, we have

$$\frac{\partial\phi}{\partial t} = \sqrt{\varepsilon^2 + \|\nabla\phi\|^2}\; \nabla \cdot \left( g(\|\nabla I\|)\, \frac{\nabla\phi}{\sqrt{\varepsilon^2 + \|\nabla\phi\|^2}} \right), \tag{3}$$

or, equivalently, in advection-diffusion form

$$\frac{\partial\phi}{\partial t} = \sqrt{\varepsilon^2 + \|\nabla\phi\|^2}\; g(\|\nabla I\|)\, \nabla \cdot \left( \frac{\nabla\phi}{\sqrt{\varepsilon^2 + \|\nabla\phi\|^2}} \right) + \nabla g \cdot \nabla\phi. \tag{4}$$
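The equivalence of (3) and (4) follows from the product rule for the divergence. The short derivation below is added as a reading aid; the abbreviation $w$ is introduced here only for brevity and does not appear in the original text.

```latex
% Abbreviations (assumed here): w = \sqrt{\varepsilon^2 + \|\nabla\phi\|^2}, g = g(\|\nabla I\|)
\begin{align*}
\nabla\cdot\Big(g\,\frac{\nabla\phi}{w}\Big)
  &= g\,\nabla\cdot\Big(\frac{\nabla\phi}{w}\Big)
     + \nabla g\cdot\frac{\nabla\phi}{w}
  && \text{(product rule)}\\
w\,\nabla\cdot\Big(g\,\frac{\nabla\phi}{w}\Big)
  &= w\,g\,\nabla\cdot\Big(\frac{\nabla\phi}{w}\Big)
     + \nabla g\cdot\nabla\phi
  && \text{(multiply by } w\text{)}
\end{align*}
```

The left-hand side of the second line is the right-hand side of (3), and its right-hand side is that of (4): a curvature (diffusion) term weighted by $g$ plus an advection term along $\nabla g$.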

The PDE model (4), ε = 1, represents the mean curvature motion of the 3D manifold in 4D space with metric g. The metric g in (4) is the edge function appropriately chosen so that the object boundaries act as attractors under a particular ﬂow. This term allows us to extract sharp features, such as edges, corners, spikes, and to accelerate the deformation of the initial function. In the evolution of φ according (4) the 3D manifold assumes constant values for most regions far from the boundaries. The ﬁrst term in (4) corresponds to a minimal volume regularization weighted by the function g, while the second term corresponds to the attraction to the image edges. The advection term in equation (4) introduces a driving force which moves the level surfaces towards the object boundaries. Equation (3) in case ε ∈ (0, 1] is proposed in [2] for dealing with the boundary completion problem. The variability in the parameter ε, ε ∈ (0, 1], provides both a regularization eﬀect and a hole ﬁlling strategy. The eﬀect of the parameter ε is to segment boundaries which are eventually uncompleted due to, for example, noise or corruptions in the acquisition phase. However, this does not help in the reconstruction of slightly disconnected tubular structures. The latter problem is solved by the introduction of a suitable diﬀusion tensor, which is discussed in Section 5. The starting initial function φ0 is usually a problem, since it involves user interaction for locating some starting points at one particular recognizable part of the structure to be segmented inside the 3D image. This is overcome by our method which automatically initialize the surface evolution using a suitably designed distance function, as described in Section 4. We can adapt the PDE model (3) to compute Boolean operations between implicit surfaces M1 and M2 . This can be carried out quite easily, using the min, max tools on the related signed distance functions dist1 (x) and dist2 (x). 
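In fact, the union, intersection and differences of two surfaces can be obtained by applying the evolving PDE (3) initialized by (9) in Section 4, where dist(x) is defined respectively by

$$\begin{aligned}
\mathrm{dist}(x) &= \min\{\mathrm{dist}_1(x), \mathrm{dist}_2(x)\} && \text{union},\\
\mathrm{dist}(x) &= \max\{\mathrm{dist}_1(x), \mathrm{dist}_2(x)\} && \text{intersection},\\
\mathrm{dist}(x) &= \max\{\mathrm{dist}_1(x), -\mathrm{dist}_2(x)\} && \text{difference } (M_1 - M_2),\\
\mathrm{dist}(x) &= \max\{-\mathrm{dist}_1(x), \mathrm{dist}_2(x)\} && \text{difference } (M_2 - M_1).
\end{aligned} \qquad (5)$$

3 Solving the PDE Model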

The computational method for solving (3) is based on an efficient semi-implicit co-volume scheme as suggested in [2]. The semi-implicit time discretization is obtained by treating the nonlinear terms of the equation at the previous time step, while the linear ones are considered at the current time level. Time discretization of (3) by Euler's method yields the following semi-discrete scheme. Let τ be a uniform discrete time step and $\phi^0$ be a given initial function. Then, for every discrete time step $t_n = n\tau$, n = 1, ..., N, we look for a function $\phi^n$, solution of the equation

$$\frac{1}{\sqrt{\varepsilon^2 + |\nabla\phi^{n-1}|^2}}\,\frac{\phi^n - \phi^{n-1}}{\tau} = \nabla\cdot\left(g^0\,\frac{\nabla\phi^n}{\sqrt{\varepsilon^2+|\nabla\phi^{n-1}|^2}}\right), \qquad (6)$$

where $g^0 := g(|\nabla I|)$. The computational domain Ω is decomposed into cubic cells, and a co-volume mesh is constructed using a complementary 3D tetrahedral grid. Following the classical finite volume methodology, we integrate (6) over every co-volume p, p = 1, ..., M, and, according to the details explained in [2], we get at time step n a system of linear equations which can be written in matrix-vector form as

$$A\Phi^n = b, \qquad (7)$$

where $A \in \mathbb{R}^{M\times M}$ is the coefficient matrix, which is a symmetric and diagonally dominant M-matrix, and $\Phi^n = (\phi_1, \dots, \phi_M)$ is the solution vector. Since the unknown function φ(x, t) evolves only on nodes sufficiently close to the structure boundary, we speed up the computation by updating the values of φ(x, t) only at the nodes identified by the initial function through φ0(x) > η, for a given small positive threshold η. In the case of vessel structures, for example, this significantly reduces the computational effort, since the number of nodes representing the vessels is small compared with the size of the entire 3D image which contains them. In practice, at each time step, the number of unknowns of the linear system (7) is significantly reduced, and thus both the storage and the computational cost are much lower.

Since each row of A corresponds to a node in Ω, if we consider a limited number of nodes $M_1 \ll M$, we get a linear system with a sparse coefficient matrix of rank $M_1$, which contains $M - M_1$ zero rows and columns. It is easy to verify that if we first apply a suitable permutation to the rows and columns of A, we get a linear system with a coefficient matrix $\tilde{A} \in \mathbb{R}^{M_1\times M_1}$ of full rank which has the same properties as A, that is, it is symmetric and positive definite. In a similar way, the same permutation applied to the components of the right-hand side vector b leads to a vector $\tilde{b} \in \mathbb{R}^{M_1}$. Therefore, instead of solving the linear system (7), which involves an M × M coefficient matrix, we can apply the preconditioned conjugate gradient iterative method, using diagonal scaling as preconditioner, to the linear system

$$\tilde{A}\tilde{\Phi} = \tilde{b} \qquad (8)$$

to compute the solution with a negligible approximation error.
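The reduction from (7) to (8) and the diagonally preconditioned conjugate gradient solve can be sketched in a few lines of plain Python. This is a minimal illustration on a toy 5-node system, not the authors' implementation: the helper names `restrict` and `pcg_jacobi`, the dense list-of-lists storage, and all numerical values are assumptions made here for readability.

```python
import math

def restrict(A, b, active):
    """Keep only the rows/columns of A and the entries of b listed in `active`."""
    A_red = [[A[i][j] for j in active] for i in active]
    b_red = [b[i] for i in active]
    return A_red, b_red

def pcg_jacobi(A, b, tol=1e-10, max_iter=200):
    """Conjugate gradients with diagonal (Jacobi) preconditioning.

    A is a dense symmetric positive definite matrix given as a list of rows.
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)
    rz = sum(r[i] * z[i] for i in range(n))
    for _ in range(max_iter):
        if math.sqrt(sum(ri * ri for ri in r)) <= tol:
            break
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x

# Toy example: 5 nodes, of which only those with phi0 > eta are active.
phi0 = [0.0, 0.6, 0.9, 0.5, 0.0]
eta = 0.1
active = [i for i, v in enumerate(phi0) if v > eta]

# Full-size matrix with zero rows/columns at the inactive nodes 0 and 4.
A = [[0.0,  0.0,  0.0,  0.0, 0.0],
     [0.0,  3.0, -1.0,  0.0, 0.0],
     [0.0, -1.0,  3.0, -1.0, 0.0],
     [0.0,  0.0, -1.0,  3.0, 0.0],
     [0.0,  0.0,  0.0,  0.0, 0.0]]
b = [0.0, 2.0, 1.0, 2.0, 0.0]

A_red, b_red = restrict(A, b, active)
phi_red = pcg_jacobi(A_red, b_red)   # solution on the active nodes only
```

For a real 3D image a sparse storage format would replace the dense lists; the structure of the computation stays the same.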

4 The C-Segmentation Algorithm

The C-segmentation procedure consists of two steps: the partitioning phase, driven by the Boolean operations to obtain an initial function, and the segmentation phase. The segmentation of tubular structures from a single input image is a trivial simplification of the proposed C-segmentation method. In this section, we discuss the implementation of the PDE model (3) for the segmentation of multiple 3D images.

Let Ω be a common support. Each gray-level 3D image i containing tubular structures of interest is thresholded to obtain a rough segmentation estimate, which is used to generate an initial distance function. The thresholding process produces a binary image $I_i$ from the input image i based on a user threshold. Since the images can be represented by a real-valued function defined on a region $\Omega \subset \mathbb{R}^3$, or by the discretization of such a function, the thresholding lets us define in a natural way the sub-domain $\Omega_i \subset \Omega$, $\Omega_i := \{x \in \Omega : I_i(x) = 1\}$. The choice of the threshold value is not critical, because in practical cases the structures to be segmented are characterized by a particular intensity value represented by a given gray level. For example, a vascular system containing a contrast fluid can be identified as the brightest formations in the volume. As illustrated by the examples in Section 6, the 3D mask $I_i$ resulting from the pre-computed binarization in general preserves the largest tubular structures, while it breaks the thinnest ones into small pieces. These structures will be perfectly recovered and reconnected by the surface evolution process.

Each image i has an associated set (region) $\Omega_i$. Depending on the type of fusion task, the sets $\{\Omega_i\}_{i=1,\dots,S}$, where S is the number of input images, need not be exactly disjoint or cover the whole region Ω. The composed signed distance function dist(x) is then obtained from the signed distance functions $\mathrm{dist}_i(x)$, i = 1, ..., S, with respect to $\Omega_i$, following the rules (5).
We define the initial function φ0 as follows:

$$\phi_0(x) = \begin{cases} 1 & \text{if } \mathrm{dist}(x) < 0,\\[4pt] 1 - \dfrac{\mathrm{dist}(x)}{\max\{\mathrm{dist}(x)\}} & \text{else.} \end{cases} \qquad (9)$$

The implicit function φ0 represents the initial surface, which evolves continuously, following equation (3), towards the boundaries of the tubular structures. The evolution is stopped when the change of the solution in time (in L2-norm) is less than a prescribed tolerance.

A critical issue is the definition of the composed diffusion function g(·) in (3) when the C-segmentation considers several input images. For example, in the case of the difference between two images, which are characterized by $g^1$ and $g^2$, the composed diffusion function $g^0$ used in (3) is given by

$$g^0 = \begin{cases} 1 & \text{if } (g^1 \le \epsilon \text{ and } g^2 \le \epsilon) \text{ or } (g^1 \ge \epsilon \text{ and } g^2 \ge \epsilon),\\ \min(g^1, g^2) & \text{else,} \end{cases} \qquad (10)$$

with $\epsilon > 0$; we used $\epsilon = 1\cdot 10^{-3}$ in the computational examples. We can proceed similarly for other Boolean operations.
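The partitioning phase (rules (5), the initial function (9), and the composed diffusion function (10) for a difference) can be sketched as follows. This is a minimal 1D illustration in plain Python under stated assumptions: the function names, the operation labels, and the two toy intervals are hypothetical, and the threshold in `composed_g_difference` is taken to be the same small constant in both comparisons, as (10) is read here.

```python
def compose_dist(d1, d2, op):
    """Combine two signed distance values (negative inside) following rules (5)."""
    if op == "union":
        return min(d1, d2)
    if op == "intersection":
        return max(d1, d2)
    if op == "difference_1_minus_2":
        return max(d1, -d2)
    if op == "difference_2_minus_1":
        return max(-d1, d2)
    raise ValueError("unknown Boolean operation: " + op)

def initial_function(dist):
    """Initial function (9): 1 inside, decreasing linearly to 0 at the farthest sample."""
    d_max = max(dist)
    if d_max <= 0:            # every sample is inside; nothing to normalise
        return [1.0 for _ in dist]
    return [1.0 if d < 0 else 1.0 - d / d_max for d in dist]

def composed_g_difference(g1, g2, eps=1e-3):
    """Composed diffusion function (10) for the difference of two images."""
    both_low = g1 <= eps and g2 <= eps
    both_high = g1 >= eps and g2 >= eps
    return 1.0 if (both_low or both_high) else min(g1, g2)

# 1D toy data: two intervals on the line, sampled at x = 0, 1, ..., 8.
xs = list(range(9))
dist1 = [abs(x - 2) - 1.5 for x in xs]   # interval centred at 2, half-width 1.5
dist2 = [abs(x - 4) - 1.5 for x in xs]   # interval centred at 4, half-width 1.5

union = [compose_dist(a, b, "union") for a, b in zip(dist1, dist2)]
phi0 = initial_function(union)
```

In this toy case `phi0` equals 1 on the samples covered by either interval and decays towards 0 at the sample farthest from both.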

Finally, the reconstructed surface is obtained from the implicit surface φ as the zero level set of the function φ(x) − s, that is, the s-level set of φ, {x ∈ Ω : φ(x) = s}, where s = (max(Φ) + min(Φ))/2. This is motivated by the fact that the flow driven by (3) forms a sharp step in the proximity of the object boundaries, while it approaches constant values inside and outside the object.
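As a small illustration of the automatic level choice, the sketch below computes s and locates the s-level points of a sampled 1D profile by linear interpolation. The 1D setting, the function names and the sample values are assumptions made for illustration; in 2D/3D the same level s would be handed to a contouring or isosurface routine.

```python
def mid_level(phi):
    """Automatic level s = (max(phi) + min(phi)) / 2."""
    return 0.5 * (max(phi) + min(phi))

def level_crossings(xs, phi, s):
    """Points where the piecewise-linear interpolant of phi equals s."""
    points = []
    for k in range(len(phi) - 1):
        a, b = phi[k] - s, phi[k + 1] - s
        if a == 0.0:
            points.append(xs[k])
        elif a * b < 0.0:
            t = a / (a - b)                      # linear interpolation weight
            points.append(xs[k] + t * (xs[k + 1] - xs[k]))
    if phi[-1] == s:
        points.append(xs[-1])
    return points

# A profile with a sharp step near the object boundary and plateaus elsewhere.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
phi = [0.0, 0.1, 0.9, 1.0, 0.8, 0.0]
s = mid_level(phi)
boundary = level_crossings(xs, phi, s)
```

Because the evolved function is nearly constant away from the boundary, the located points are not very sensitive to small changes of s when the step is sharp.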

5 An Anisotropic Variant of the Segmentation Algorithm

In this section we provide a variant of the isotropic model (3) designed to significantly improve the connectivity of the coherent structures in the segmentation. The idea is to incorporate the local orientation of the tubular structures into the dynamic segmentation process in such a way that, at each time step, the surface evolves by isotropic mean curvature motion in homogeneous regions, while in the presence of tubular structures it is driven by the directional field representing the orientation of the tube. We aim to capture the vessel structure and the vessel directions locally by a local spatial coherence descriptor. Coherence enhancing image smoothing was introduced in [3] and successfully applied in image filtering by anisotropic diffusion. This type of nonlinear diffusion relies on a diffusion tensor which is built as follows. Given an image I and the gradient $\nabla I_\sigma$ of its Gaussian-smoothed version, a regularized shape descriptor is provided by

$$J_\delta(\nabla I_\sigma) := K_\delta * (\nabla I_\sigma \otimes \nabla I_\sigma), \qquad (11)$$

where $K_\delta$ is a Gaussian kernel with δ ≥ 0. The matrix $J_\delta$ is symmetric positive semi-definite and its eigenvalues $\mu_1 \ge \mu_2$ integrate the variation of the gray values within a neighborhood of size O(δ). They describe the average contrast in the corresponding eigendirections $v_1$ and $v_2$. The orientation of the eigenvector $v_2$, corresponding to the smaller eigenvalue, represents the direction of lowest fluctuations, the so-called coherence orientation. In this way, constant areas are characterized by $\mu_1 = \mu_2 = 0$, while straight edges give $\mu_1 \gg \mu_2 = 0$. The normalized coherence value, which measures the anisotropic structures within a window of scale δ, is thus defined as

$$c = \frac{(\mu_1 - \mu_2)^2}{\max\{(\mu_1-\mu_2)^2\}}, \qquad c \in [0,1]. \qquad (12)$$

Thus c approaches 1 for anisotropic structures and tends to zero for isotropic structures. The diffusion tensor D is a matrix with the same eigenvectors as the (regularized) structure tensor $J_\delta$, and its eigenvalues are given by

$$\lambda_1 = g(|\nabla I|), \qquad \lambda_2 = \begin{cases} g(|\nabla I|) & \text{if } \mu_1 = \mu_2,\\ g(|\nabla I|) + (1 - g(|\nabla I|))\,e^{-\kappa/c} & \text{else,} \end{cases} \qquad \kappa > 0, \qquad (13)$$

where g(·) is the composed diffusion function, defined for example by (10), which suitably adapts its values to the anisotropy. The parameter κ has the role of a threshold, and c is the coherence defined in (12). Therefore, the matrix D has the following form:

$$D = [\,v_1\ \ v_2\,] \begin{bmatrix} \lambda_1 & 0\\ 0 & \lambda_2 \end{bmatrix} \begin{bmatrix} v_1^T\\ v_2^T \end{bmatrix}. \qquad (14)$$

In locally homogeneous areas of an image the diffusion reduces to the isotropic mean curvature motion driven by (3); in fact, there D reduces to $g(|\nabla I|)$ times the identity matrix. Areas near elongated structures are characterized by values of g(·) approaching zero, which gives $\lambda_2 \gg \lambda_1$. The effect of the diffusion on the segmentation function is thus stronger along the coherence directions. In all the experiments reported in Section 6 we set $\kappa = 1\cdot 10^{-5}$. Since $\lambda_2 \approx 1$ if $c \gg \kappa$, while $\lambda_2 \approx g(|\nabla I|)$ if $c \ll \kappa$, the segmentation function flows along the coherence direction when it approaches the edges and stops when the object boundary is reached.

Incorporating the diffusion tensor D defined by (14) into the segmentation model (3) leads to the following nonlinear anisotropic segmentation model:

$$\frac{\partial\phi}{\partial t} = \sqrt{\varepsilon^2+|\nabla\phi|^2}\;\nabla\cdot\left(D\,\frac{\nabla\phi}{\sqrt{\varepsilon^2+|\nabla\phi|^2}}\right), \qquad \phi(x,0) = \phi_0(x), \quad x \in \Omega. \qquad (15)$$

We will refer to the models (3) and (15) as the isotropic and anisotropic segmentation models, respectively. The PDE model (15) can be easily extended to 3D or 4D segmentation problems; we refer the reader to [1] for the numerical aspects involved in the discretization of the diffusion tensor.
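The construction of D from a 2 × 2 structure tensor, following (12), (13) and (14), can be sketched in plain Python. This is an illustrative pointwise computation only: the function names, the argument `c_max` (standing for the maximum of (μ1 − μ2)² over the image) and the sample values are assumptions of this sketch, and the Gaussian smoothing in (11) is taken as already applied.

```python
import math

def eig_sym2(j11, j12, j22):
    """Eigenvalues mu1 >= mu2 and unit eigenvectors v1, v2 of [[j11, j12], [j12, j22]]."""
    mean = 0.5 * (j11 + j22)
    radius = math.hypot(0.5 * (j11 - j22), j12)
    mu1, mu2 = mean + radius, mean - radius
    if radius == 0.0:                 # isotropic: any orthonormal basis works
        return mu1, mu2, (1.0, 0.0), (0.0, 1.0)
    theta = 0.5 * math.atan2(2.0 * j12, j11 - j22)   # direction of v1
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-v1[1], v1[0])              # coherence orientation, orthogonal to v1
    return mu1, mu2, v1, v2

def diffusion_tensor(j11, j12, j22, g, c_max, kappa=1e-5):
    """Entries (d11, d12, d22) of D in (14) and its eigenvalues (13)."""
    mu1, mu2, v1, v2 = eig_sym2(j11, j12, j22)
    c = 0.0 if c_max == 0.0 else (mu1 - mu2) ** 2 / c_max   # coherence (12)
    lam1 = g
    lam2 = g if c == 0.0 else g + (1.0 - g) * math.exp(-kappa / c)
    # D = lam1 * v1 v1^T + lam2 * v2 v2^T
    d11 = lam1 * v1[0] * v1[0] + lam2 * v2[0] * v2[0]
    d12 = lam1 * v1[0] * v1[1] + lam2 * v2[0] * v2[1]
    d22 = lam1 * v1[1] * v1[1] + lam2 * v2[1] * v2[1]
    return (d11, d12, d22), (lam1, lam2)

# Near a horizontal tube: the gradient points in y, so the coherence direction is x.
D_edge, lams_edge = diffusion_tensor(0.0, 0.0, 4.0, g=0.01, c_max=16.0)
# Constant area: the tensor reduces to g times the identity.
D_flat, lams_flat = diffusion_tensor(0.0, 0.0, 0.0, g=0.5, c_max=16.0)
```

In the first case the diffusion along the tube (entry d11) is close to 1 while the diffusion across it stays at g, which is the behaviour described above for λ2 ≫ λ1.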

6 Results and Experiments

We tested the performance of the proposed C-segmentation method on 2D/3D synthetic and real examples. Examples 1 and 2 compare the anisotropic C-segmentation model with the isotropic one, to verify the important role of the diffusion tensor in the segmentation of tubular structures. Examples 3, 4 and 5 illustrate results from the application of composed segmentation to 2D/3D images. For all the experiments, we apply the C-segmentation algorithm as described in Section 4, with or without the diffusion tensor, and we set the time step to $\tau = 1\cdot 10^{-3}$ for the 3D images and $\tau = 1\cdot 10^{-2}$ for the 2D images, and $\varepsilon = 1\cdot 10^{-3}$ in (3). We stop the iterations when the change of the solution in time is less than $1\cdot 10^{-4}$. In our experiments, the segmentation of images with high contrast (see Examples 2 and 3) provides good results with the automatic choice s = (max(Φ) + min(Φ))/2, while for images like those in Examples 1, 4 and 5 the results are more sensitive to the choice of s, and visual inspection may be required to tune the automatic choice.

Example 1. In the first 2D example we consider the segmentation of a carotid vascular system represented in a Magnetic Resonance Angiography (MRA) image of 182 × 182 pixels shown in Fig. 1 (left). MRA is based on the detection of signals from flowing blood and the suppression of signals from other, static tissues.


Fig. 1. Carotid vascular system segmentation; the original image (left); results of the isotropic model (center), and anisotropic model (right)

Fig. 2. First row: original synthetic image (left); results obtained by the isotropic (center) and anisotropic (right) segmentation models. Second row: associated segmentation function φ at the final time step of the isotropic (left) and anisotropic (right) segmentation.

The blood vessels appear as high intensity regions in the image. The structures to be segmented are several vessels of variable diameter which are close to each other; partial occlusions and intersections make the segmentation very challenging. Fig. 1 displays the segmented structures obtained by 10 iterations of the isotropic (center) and anisotropic (right) segmentation models. The boundary curves, shown in pink, are extracted using contour values s = 0.90 and s = 0.98, respectively. Visual comparison shows that the anisotropic segmentation method gives the more accurate restoration.

Example 2. The second example illustrates the ability to reconstruct structures which present small occlusions along the coherence direction. The synthetic image to be segmented, of 200 × 200 pixels, is shown in Fig. 2 (first row, left). Applying 10 time steps of the anisotropic segmentation model, the structure is well reconstructed while maintaining the narrowing, as shown in Fig. 2 (first row, right). The propagation driven by the isotropic segmentation model instead enhances the disconnections, as shown in Fig. 2 (first row, center). The boundary curves, determined as iso-contours of the segmentation functions shown in Fig. 2 (second row) using s = 0.95, are superimposed on the original image and shown in Fig. 2 (first row).

Example 3. The proposed C-segmentation method is verified on a real echo image, shown in Fig. 3 (a), in which we generated one synthetic bump and two synthetic holes, simulating aneurysm and stenosis effects, as illustrated in Fig. 3 (b). Applying C-segmentation to the Boolean difference (b) \ (a), we are able to enhance the aneurysm, as shown in Fig. 3 (c), while to detect the stenosis effects we apply C-segmentation to the Boolean difference (a) \ (b), obtaining the results illustrated in Fig. 3 (d).

Fig. 3. C-segmentation of images (a) and (b). Result in (c) of (b)\(a) and in (d) of (a)\(b).

Example 4. Volumetric segmentation is applied to a Computed Tomographic Angiography (CTA) data set shown in Fig. 4 (left). The volumetric data set kidney, of dimension 201 × 201 × 201, has been extracted from a 436 × 436 × 540 CTA image of the kidney vasculature system and presents vessel patterns with different curvatures, diameters and bifurcations. The final segmentation obtained after 10 time steps of the segmentation algorithm is shown in Fig. 4 (right).

Fig. 4. Segmentation of the kidney volume data set: (left) original CTA image, (right) segmentation result.

Example 5. The last experiment demonstrates the performance of the proposed technique in segmenting a complex structure extracted from two overlapping 3D MRA images representing the airway tree of a human lung. The original MRA image is illustrated in Fig. 5 (left) using volume rendering. In particular, the ellipsoidal area outlined in the image includes the region of interest for segmentation inside the lung image. The data set lung is in DICOM format, and sample images are available on the web site (http://www.osirix-viewer.com/Downloads.html). We simulated the acquisition of two partially overlapping 3D MRA images covering the area of interest, consisting of 156 × 156 × 156 voxels each, and we applied the C-segmentation algorithm using the union Boolean composition. Fig. 5 (right) shows the resulting single segmentation of the human airway tree, where several branching generations were detected. These preliminary results demonstrate the applicability of the developed method to the C-segmentation of quite different and topologically complex tubular structures.

Fig. 5. Segmentation of the lung volume data set: (left) original MRA image, (right) C-segmentation result.

7 Conclusions

In this paper we introduced the C-segmentation method based on the distance function which is successfully applied to 2D/3D images containing tubular structures. The algorithm is automatic, accurate and fast. The latter is due to a speed up strategy in the iterative method for linear systems. The algorithm is provided with a diﬀusion tensor to move the evolving surface toward elongated tubular structures, connecting gaps in the underlying raw data, while keeping the structures distinguished along the coherence direction. This model deﬁnes a general segmentation framework which combines object information from diﬀerent images into any logical combination, rather than following the diﬃcult a posteriori process to compose segmentation results obtained separately.

Acknowledgments This work has been supported by PRIN-MIUR-Coﬁn 2006, project and by University of Bologna "Funds for selected research topics".

References

1. Drblikova, O., Mikula, K.: Semi-implicit Diamond-cell Finite Volume Scheme for 3D Nonlinear Tensor Diffusion in Coherence Enhancing Image Filtering. In: Eymard, R., Herard, J.M. (eds.) Finite Volumes for Complex Applications V: Problems and Perspectives, pp. 343–350. ISTE and Wiley, London (2008)
2. Corsaro, S., Mikula, K., Sarti, A., Sgallari, F.: Semi-implicit covolume method in 3D image segmentation. SIAM J. Sci. Comput. 28(6), 2248–2265 (2006)
3. Weickert, J., Scharr, H.: A scheme for coherence enhancing diffusion filtering with optimized rotation invariance. Journal of Visual Communication and Image Representation 13(1/2), 103–118 (2002)
4. Kirbas, C., Quek, F.: A review of vessel extraction techniques and algorithms. ACM Computing Surveys 36(2), 81–121 (2004)
5. Hassan, H., Farag, A.A.: Cerebrovascular segmentation for MRA data using levels set. International Congress Series, vol. 1256, pp. 246–252 (2003)
6. Scherl, H., et al.: Semi automatic level set segmentation and stenosis quantification of internal carotid artery in 3D CTA data sets. Medical Image Analysis 11, 21–34 (2007)
7. Cohen, L.D., Deschamps, T.: Segmentation of 3D tubular objects with adaptive front propagation and minimal tree extraction for 3D medical imaging. Computer Methods in Biomechanics and Biomedical Engineering 10(4), 289–305 (2007)
8. Gooya, A., Liao, H., et al.: A variational method for geometric regularization of vascular segmentation in medical images. IEEE Transactions on Image Processing 17, 1295–1312 (2008)
9. Sandberg, B., Chan, T.F.: A logic framework for active contours on multi-channel images. J. Vis. Commun. Image R. 16, 333–358 (2005)
10. Westin, C.-F., Lorigo, L.M., Faugeras, O.D., Grimson, W.E.L., Dawson, S., Norbash, A., Kikinis, R.: Segmentation by Adaptive Geodesic Active Contours. In: Delp, S.L., DiGoia, A.M., Jaramaz, B. (eds.) MICCAI 2000. LNCS, vol. 1935, pp. 266–275. Springer, Heidelberg (2000)

Extrapolation of Vector Fields Using the Infinity Laplacian and with Applications to Image Segmentation

Laurence Guillot (1) and Carole Le Guyader (2)

(1) Laboratoire Jean-Alexandre Dieudonné, UMR 6621, Université de Nice Sophia Antipolis – CNRS, Faculté des Sciences Parc Valrose, 06108 – Nice Cedex 02, France, [email protected]
(2) IRMAR, UMR CNRS 6625, Institut des Sciences Appliquées de Rennes, 20, Avenue des Buttes de Coësmes, CS 14315, 35043 Rennes Cedex, France, [email protected]

Abstract. In this paper, we investigate a new Gradient-Vector-Flow (GVF)( [38])-inspired static external force ﬁeld for active contour models, deriving from the edge map of a given image and allowing to increase the capture range. Contrary to prior related works, we reduce the number of unknowns to a single one v by assuming that the expected vector ﬁeld is the gradient ﬁeld of a scalar function. The model is phrased in terms of a functional minimization problem comprising a data ﬁdelity term and a regularizer based on the super norm of Dv. The minimization is achieved by solving a second order singular degenerate parabolic equation. A comparison principle as well as the existence/uniqueness of a viscosity solution together with regularity results are established. Experimental results for image segmentation with details of the algorithm are also presented.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 87–99, 2009. © Springer-Verlag Berlin Heidelberg 2009

1 Introduction

1.1 Motivations

Many of the well-known variational segmentation methods require a careful choice of the initial condition. One of the most famous variational methods for partitioning an image is the active contour model introduced by Kass, Witkin, and Terzopoulos ([30]). It consists in evolving a parameterized curve so that it matches the object boundary. The shape taken by the curve through the process is related to an energy minimization; this energy comprises a data fitting term and a regularizer, and is non-convex. Therefore, we can only expect local minimizers, which, in practice, means that the contour to be deformed must be initialized near the object boundary. Cohen ([18]) has proposed a way to alleviate this constraint by adding to the model an inflating/deflating force defined by kn, where n denotes the unit inward normal to the curve and k a constant. According to the sign of the constant k, the curve inflates or deflates. Thereby, in practice, the contour to be deformed is either initialized inside the object, or it encloses the object of interest.

In [38], Xu and Prince address both the problems of initialization and slow and/or poor convergence near boundaries with strong concavities by introducing a new static external force called Gradient Vector Flow (GVF). The initialization constraint is removed, that is, initialization can be made inside, outside or across the object boundaries, and the front evolution is easily handled even in boundary concavities. The main idea behind this model is to increase the capture range of the external edge-map-related force field, and to make the contour evolve toward the desired boundaries where classical methods would fail. Unlike classical active contours, the introduced external force does not derive from a potential function and cannot be computed straightforwardly from the image edge map. More precisely, the model cannot be phrased in terms of a unique functional minimization problem but is defined in two steps. In a first step, the external force (GVF) $w = (u, v)^T$ is obtained by minimizing an energy functional in a variational framework. The corresponding Euler-Lagrange equations are computed and lead to a decoupled system of linear partial differential equations, which is solved by a gradient descent method. The second step then consists in replacing, in the dynamic snake equation, the classical potential force by the newly computed external force w.

This method motivated the following works. In [34], Paragios et al. propose to integrate this boundary spatial diffusion technique into the geodesic active contours ([16]). In [29], Jifeng et al. propose to improve the diffusion properties of the GVF force field. They obtain a new force by replacing the Laplacian operator used in the GVF model by its diffusion term in the normal direction, that is, the normalized infinity Laplacian operator.
Unlike the GVF model, their new field (called NGVF, for GVF in the normal direction) is anisotropic. Furthermore, the NGVF is stable for bigger time steps, slightly improves segmentation results, and detects long and thin concavities more quickly.

Our work is largely motivated by [38] and [29]. We wanted to provide, in a rigorous mathematical framework, a new method to generate this external force field. Contrary to these prior works, we propose to reduce the number of unknowns to a single one, by assuming that the sought vector field is the gradient field of a scalar function. Also, the introduced minimization problem contains a data-fitting term related to the original GVF model and a regularizer that penalizes the super norm of the unknown gradient. Thus the problem becomes related to the absolutely minimizing Lipschitz extensions and to the infinity Laplacian.

The absolute minimal Lipschitz extension model was introduced by Aronsson in [2] (see also [1, 3, 4]) in the following way. Given $\Omega \subset \mathbb{R}^n$ a bounded, open and connected domain with sufficiently smooth boundary, and $b \in C(\partial\Omega)$, solve

$$\inf_{u \in W^{1,\infty}(\Omega),\ u = b \text{ on } \partial\Omega} \|Du\|_{L^\infty(\Omega)}. \qquad (1)$$

A minimizer of (1) is called an absolutely minimizing Lipschitz interpolant of $b|_{\partial\Omega}$ inside Ω. Aronsson proved the existence of an absolute minimal Lipschitz extension and Jensen proved the uniqueness. Aronsson also derived the Euler-Lagrange equation governing the absolute minimizer in the sense of viscosity solutions:

$$\Delta_\infty u = D^2u(Du, Du) = 0 \quad \text{in } \Omega. \qquad (2)$$

We refer to [5, 10, 20] for more details. The operator $\Delta_\infty$ is called the infinity Laplacian and solutions of (2) are said to be ∞-harmonic. Jensen proved a comparison principle and an existence/uniqueness result for (2) with Lipschitz continuous boundary data (see [28]). Before describing our model, we briefly give a non-exhaustive review of some prior works related to AMLE and the infinity Laplacian in the field of image processing.

1.2 Prior Related Works

As stressed by Caselles et al., the equation Δ∞ u = D2 u(Du, Du) = 0 was introduced in the ﬁeld of computer vision as edge detector (see [37], [39]). It earlier appeared in the domain of edge enhancement (see [35]) and served as the basis of Canny edge detection [12]. In [17], Caselles et al. investigate the AMLE and the inﬁnity Laplacian in the ﬁeld of image processing with applications to the restoration of images. Motivated by prior applications devoted to coding ( [13], [14]), they address the issue of interpolating data given on a set of points and/or curves in the plane. Another application, dedicated to shape metamorphism (the process which consists in evolving a source shape into a target shape by intermediate steps) is proposed by Cong et al. in [19] and makes use of the inﬁnity Laplacian. Also, in [31], Mémoli et al. propose a new framework for brain warping using Minimizing Lipschitz Extensions. To ﬁnish, in [24], Elion and Vese aim at solving the (BV, G) decomposition model introduced by Meyer in [32]. In that purpose, Elion and Vese focus on an isotropic decomposition of the image f ≈ u + v with v = ΔP = div(DP ) and DP ∈ (L∞ (Ω))2 . The outline of the paper is as follows: Sect. 2 is devoted to the depiction of the model and the derivation of the associated evolution problem. Section 3 is dedicated to the theoretical study of the obtained parabolic problem. We ﬁrst prove a comparison principle, then prove existence and uniqueness of a viscosity solution. Regularity results of this solution are also given. We conclude the paper with experimental results and integrate this new external force ﬁeld in a segmentation problem. Details of the algorithm are also provided.

2 Depiction of the Model

Let Ω be a bounded open subset of $\mathbb{R}^n$, ∂Ω its boundary, and let I be a given bounded image function defined by $I : \bar\Omega \to \mathbb{R}$. For the purpose of illustration we consider n = 2. Let g be an edge-detector map. The function g is applied to the norm of the image gradient, and satisfies the following properties: $g : [0, \infty[ \to [0, \infty[$, g(0) = 1, g strictly decreasing, and $\lim_{r\to+\infty} g(r) = 0$. An example of such a function is $g : r \mapsto \frac{1}{1+r^2}$. We denote by $W = (w_1, w_2) = -Dg(\|DI\|)$ the associated gradient vector field. In homogeneous regions, $\|DI\| \approx 0$, so $g(\|DI\|)$ is almost equal to 1. On boundaries, $\|DI\|$ is large, so $g(\|DI\|)$ is almost zero. Also, in homogeneous regions, W is almost the null vector. Along the boundaries, the vector field W points toward the middle of the edges (see such an example in Fig. 1).

We plan to extrapolate the vector field to the whole image domain in a variational framework. A majority of existing regularization functionals aim at minimizing the global variation of the unknown and thus provide little local control. In this work, we propose to minimize the super norm of the unknown gradient. This choice is also motivated by the fact that the Laplacian operator (that naturally appears in the GVF model) can be decomposed into the sum of the second derivative in the normal direction and the second derivative in the tangent direction. The former component, which is kept in the NGVF model, weighs heavily in the extrapolation process and has good properties, unlike the latter component, which proves to be parasitic, particularly when dealing with thin and long concavities. Also, unlike prior related works, we reduce the number of unknowns to a single one by assuming that the expected vector field is the gradient vector field of a scalar function. We thus propose to minimize the following functional:

$$\inf_{v \in W^{1,\infty}(\Omega)} \int_\Omega \|Dv - W\|^2\,\|W\|^2\,dx + \mu\,\|Dv\|_{L^\infty(\Omega)}, \qquad (3)$$

where μ > 0 is a tuning parameter.

Remark 1. Functional (3) is defined on $W^{1,\infty}(\Omega)$. The domain Ω being bounded, the inclusion $L^\infty(\Omega) \subset L^2(\Omega)$ holds, so $Dv \in L^2(\Omega)$.

Remark 2. If v is a minimizer of (3), so is v + C where C denotes any real constant. This is not a problem since we are interested in the associated gradient vector field.

If $v \in W^{1,\infty}(\Omega)$, v is Lipschitz continuous and thus, by Rademacher's theorem, differentiable almost everywhere. To minimize the above energy, we make use of the absolutely minimizing Lipschitz extensions. Following the results on AMLE recalled in Sect. 1, we obtain the Euler-Lagrange equation satisfied by v if it minimizes (3) and solve it by gradient descent. More precisely, in image processing, the equation is classically defined on a domain R of $\mathbb{R}^2$ (e.g., on the square [0, 1] × [0, 1]). In this case, boundary conditions must be defined: Neumann boundary conditions on ∂R are well suited to the image processing framework, since they correspond to the reflection of the data through the edges. Thus it is no longer necessary to define boundary values. Following [6] and [15], we propose to simplify the problem by working with periodic solutions. The function v, primarily defined on [0, 1] × [0, 1], is extended to $\mathbb{R}^2$: first, by symmetry, we extend it to [−1, 1] × [−1, 1], and then to all of $\mathbb{R}^2$ by periodicity (see Sect. 3.3.1 from [6]). We thus obtain that $\forall h \in \mathbb{Z}^2$, $\forall x \in \mathbb{R}^2$, v(x + 2h) = v(x). Also, we assume that the initial condition $v_0$ and the functions $x \mapsto w_k(x)$, k = 1, 2, are extended to $\mathbb{R}^2$ with the same periodicity. Given T > 0, we then obtain the following problem:

$$\left\{\begin{aligned}
\frac{\partial v}{\partial t} &= 2\|W\|^2\Delta v + 2\langle D\|W\|^2, Dv\rangle - 2\,\mathrm{div}\big(\|W\|^2 W\big) + \mu\, D^2v\Big(\frac{Dv}{|Dv|},\frac{Dv}{|Dv|}\Big)\\
&= b(x)\Delta v - \langle d(x), Dv\rangle - h(x) + \mu\, D^2 v\Big(\frac{Dv}{|Dv|},\frac{Dv}{|Dv|}\Big) \quad \text{on } \mathbb{R}^2\times(0,T),\\
v(x, t=0) &= v_0(x) \quad \text{in } \mathbb{R}^2,
\end{aligned}\right. \qquad (4)$$

with $b : x \mapsto 2\|W(x)\|^2$, $d : x \mapsto -2\,D\|W\|^2(x)$, $h : x \mapsto 2\,\mathrm{div}\big(\|W\|^2 W\big)(x)$, and with the assumptions $v_0 \in C(\mathbb{R}^2) \cap W^{1,\infty}(\mathbb{R}^2)$; $b \in C(\mathbb{R}^2)$, bounded by $\xi_b$; $d \in C(\mathbb{R}^2) \cap W^{1,\infty}(\mathbb{R}^2)$, bounded by $\xi_d$ and with Lipschitz constant $\kappa_d$; $h \in C(\mathbb{R}^2) \cap W^{1,\infty}(\mathbb{R}^2)$, bounded by $\xi_h$ and with Lipschitz constant $\kappa_h$; and with $\langle\cdot,\cdot\rangle$ denoting the Euclidean scalar product in $\mathbb{R}^2$. We also assume that the mapping $\mathbb{R}^2 \ni x \mapsto b^{1/2}(x)$ is Lipschitz continuous on $\mathbb{R}^2$ with Lipschitz constant $\kappa_{b^{1/2}}$.
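To make the origin of the first three terms in (4) explicit, the following sketch computes the formal first variation of the data term of (3). It is a formal computation which assumes that v and W are smooth enough and that boundary terms vanish (for instance by periodicity); the symbol $E_{\mathrm{data}}$ and the test function φ are introduced here for illustration only. The last term of (4), the normalized infinity Laplacian, is the contribution associated with the super-norm regularizer through the AMLE results recalled in Sect. 1 and is not derived here.

```latex
\begin{align*}
E_{\mathrm{data}}(v) &= \int_\Omega \|Dv - W\|^2\,\|W\|^2\,dx,\\
\frac{d}{ds}E_{\mathrm{data}}(v+s\varphi)\Big|_{s=0}
 &= 2\int_\Omega \|W\|^2\,\langle Dv - W, D\varphi\rangle\,dx
  = -2\int_\Omega \mathrm{div}\big(\|W\|^2 (Dv - W)\big)\,\varphi\,dx,\\
\frac{\partial v}{\partial t}
 &= 2\,\mathrm{div}\big(\|W\|^2 (Dv - W)\big)
  = 2\|W\|^2\Delta v + 2\langle D\|W\|^2, Dv\rangle
    - 2\,\mathrm{div}\big(\|W\|^2 W\big).
\end{align*}
```

Comparing the last line with the second line of (4) recovers $b = 2\|W\|^2$, $d = -2\,D\|W\|^2$ and $h = 2\,\mathrm{div}(\|W\|^2 W)$.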

3 Theoretical Results

This problem falls within the framework of the theory of viscosity solutions. Indeed, we obtain a second-order singular degenerate parabolic equation. The concept of viscosity solutions was introduced in 1981 by Crandall and Lions [22]. This theory was developed to study first-order partial differential equations of nondivergence form, typically Hamilton-Jacobi equations. Later, the study of viscosity solutions was extended to second-order elliptic and parabolic equations (for a good introduction to the theory of viscosity solutions, we refer to Barles [8, 7], the article of Crandall, Ishii and Lions [21], Crandall, Lions [23], Ishii [26], and Ishii, Lions [27]). We also refer to the related work [9]. In our problem, the evolution equation in (4) can be rewritten in the form

\[
\frac{\partial v}{\partial t} + G\left(x, Dv, D^2 v\right) = 0,
\]

with G defined on $\mathbb{R}^2 \times \left(\mathbb{R}^2 \setminus \{0_{\mathbb{R}^2}\}\right) \times S^2$ ($S^2$ being the set of symmetric 2 × 2 matrices equipped with its natural partial order) by

\[
\begin{aligned}
G(x, p, X) &= \langle d(x), p \rangle + h(x) - b(x)\,\mathrm{trace}(X) - \mu\, \frac{p^T}{|p|}\, X\, \frac{p}{|p|} \\
&= \langle d(x), p \rangle + h(x) - b(x)\,\mathrm{trace}(X) - \mu\,\mathrm{trace}\left(\frac{p \otimes p}{|p|^2}\, X\right) \\
&= c(x, p) + E(x, X) + F(p, X),
\end{aligned}
\]

with the following properties:


L. Guillot and C. Le Guyader

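1. The operators G, $F : (p, X) \mapsto -\mu\,\mathrm{trace}\left(\frac{p \otimes p}{|p|^2}\, X\right)$ and $E : (x, X) \mapsto -b(x)\,\mathrm{trace}(X)$ are independent of v and are elliptic, i.e., $\forall X, Y \in S^2$, $\forall p \in \mathbb{R}^2$,

\[
\text{if } X \le Y \text{ then } F(p, X) \ge F(p, Y). \tag{5}
\]

The operators G, E, and F are therefore proper.

2. F is locally bounded on $\mathbb{R}^2 \times S^2$, continuous on $\left(\mathbb{R}^2 \setminus \{0_{\mathbb{R}^2}\}\right) \times S^2$, and

\[
F^*(0, 0) = F_*(0, 0) = 0, \tag{6}
\]

where $F^*$ (resp. $F_*$) is the upper semicontinuous (usc) envelope (resp. lower semicontinuous (lsc) envelope) of F.

3. $c : \mathbb{R}^2 \times \mathbb{R}^2 \ni (x, p) \mapsto \langle d(x), p \rangle + h(x)$ is locally Lipschitz continuous in space and, $\forall (x, y) \in \mathbb{R}^2 \times \mathbb{R}^2$,

\[
|c(x, p) - c(y, p)| \le \left(\kappa_d |p| + \kappa_h\right) |x - y|. \tag{7}
\]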

We start by proving a comparison principle that will be used to prove the uniqueness of the viscosity solution of the considered problem.

Theorem 1 (Comparison principle). Let $u \in USC(\mathbb{R}^2 \times [0, T))$, bounded and periodic (with the same periodicity as the initial condition of (4)), be a subsolution, and let $v \in LSC(\mathbb{R}^2 \times [0, T))$, bounded and periodic (with the same periodicity as the initial condition of (4)), be a supersolution of (4). Assume that $u_0(x) = u(x, 0) \le v_0(x) = v(x, 0)$ in $\mathbb{R}^2$. Then $u \le v$ in $\mathbb{R}^2 \times [0, T)$.

Proof. This proof is rather classical; we follow the arguments of [21]. We first observe that for λ > 0, $\tilde{u} = u - \frac{\lambda}{T - t}$ is also a subsolution of (4) and

\[
\tilde{u}_t + G_*\left(x, D\tilde{u}, D^2\tilde{u}\right) \le -\frac{\lambda}{(T - t)^2} \le -\frac{\lambda}{T^2}.
\]

Since $u \le v$ follows from $\tilde{u} \le v$ in the limit λ → 0, it suffices to prove the comparison under the additional assumptions

\[
\left\{
\begin{aligned}
&\text{(i)}\ \ u_t + G_*\left(x, Du, D^2 u\right) \le -\frac{\lambda}{T^2}, \\
&\text{(ii)}\ \ \lim_{t \to T} u(x, t) = -\infty.
\end{aligned}
\right.
\tag{8}
\]

Let us set $M = \sup_{\mathbb{R}^2 \times [0, T)} \left(u(x, t) - v(x, t)\right)$. We aim to show that M ≤ 0. To this end, we argue by contradiction and assume that M > 0. We introduce the duplication function $f(x, y, t) = u(x, t) - v(y, t) - (4\varepsilon)^{-1}|x - y|^4$ and consider $M_0 = \sup_{\mathbb{R}^2 \times \mathbb{R}^2 \times [0, T)} \left(u(x, t) - v(y, t) - (4\varepsilon)^{-1}|x - y|^4\right)$, ε > 0. Obviously, $M_0 \ge M > 0$. Moreover, this supremum is attained owing to the upper bounds on u and −v, the fact that f satisfies $f(x + 2h, y + 2h, t) = f(x, y, t)$ for all $h \in \mathbb{Z}^2$, and (8)(ii). We denote by $(x_0, y_0, t_0) \in \mathbb{R}^2 \times \mathbb{R}^2 \times [0, T)$ a point of maximum. We first prove that $t_0 > 0$ for ε small enough and then derive a contradiction using Th. 8.3 from [21], which allows us to conclude that M ≤ 0. Consequently, $u \le v$ in $\mathbb{R}^2 \times [0, T)$.


We now give an existence result using the classical Perron's method (see Sect. 4 of [21]). We start by constructing a subsolution $U^-$. Let us set $U^- = \inf_{\mathbb{R}^2}(v_0) - Ct$ with $C = \xi_h$. $U^-$ is twice differentiable in space, once differentiable in time, bounded, and periodic with the same periodicity as $v_0$, and $U^-$ is a subsolution of (4). Similarly, $U^+ = \sup_{\mathbb{R}^2}(v_0) + Ct$ is a supersolution of (4). Obviously, $U^-(x, 0) \le U^+(x, 0)$. We can define

\[
v = \sup\left\{ w;\ w \text{ periodic with the same periodicity as } v_0,\ \text{subsolution such that } U^- \le w \le U^+ \right\}.
\]

In that case, Perron's method states that v is a periodic discontinuous solution of (4) with the same periodicity as $v_0$. Clearly, the solution is bounded since $U^+$ is bounded. Also, as v is a solution, $v^*$ is a subsolution and $v_*$ a supersolution, so from the comparison principle $v^* \le v_*$. But $v_* \le v^*$, so $v^* = v_* = v$, which gives that v is continuous on $\mathbb{R}^2 \times [0, T)$.

Conclusion 1. We have proved the existence and uniqueness of a bounded, periodic viscosity solution of (4) that is continuous on $\mathbb{R}^2 \times [0, T)$.

We now prove that a solution of (4) is Lipschitz continuous in space and uniformly continuous in time.

Theorem 2 (Regularity results). Let us assume that $\|Dv_0\|_{L^\infty(\mathbb{R}^2)} \le B_0$ with $B_0 > 0$. Then the solution of (4) satisfies

\[
\|Dv(\cdot, t)\|_{L^\infty(\mathbb{R}^2)} \le B(t),
\]

with $B(t) = \kappa_h\, \dfrac{e^{\alpha t} - 1}{\alpha} + B_0\, e^{\alpha t}$, and with $\alpha = 8\kappa_{b^{1/2}}^2 + \kappa_d$.

Proof. The function v is bounded, continuous on $\mathbb{R}^2 \times [0, T)$, and periodic with the same periodicity as $v_0$. We set $\Phi_\varepsilon(x, y, t) = B(t)\left(|x - y|^2 + \varepsilon^2\right)^{1/2}$ and aim at proving that $v(x, t) - v(y, t) \le \Phi_\varepsilon(x, y, t)$. Let us set $M = \sup_{(x, y) \in \mathbb{R}^2 \times \mathbb{R}^2,\ t \in [0, T)} \left(v(x, t) - v(y, t) - \Phi_\varepsilon(x, y, t)\right)$. We thus aim to show that M ≤ 0. Once again, we argue by contradiction and assume that M > 0. So we conclude that $v(x, t) - v(y, t) \le \Phi_\varepsilon(x, y, t)$ and, letting ε tend to 0, one obtains $v(x, t) - v(y, t) \le B(t)|x - y|$. Exchanging x and y yields $|v(x, t) - v(y, t)| \le B(t)|x - y|$.

Theorem 3 (Regularity results). The solution v is uniformly continuous in time.


Proof. We proceed as in [25]. First, we assume that $v_0$ is bounded, periodic, $C^2$, and such that there exists C with $\|Dv_0\|_{L^\infty(\mathbb{R}^2)}, \|D^2 v_0\|_{L^\infty(\mathbb{R}^2)} \le C$. Let us set

\[
C_1 = \sup_{x \in \mathbb{R}^2} \left\{ \zeta + E(x, D^2 v_0) + F_*(Dv_0, D^2 v_0),\ \ \zeta - E(x, D^2 v_0) - F^*(Dv_0, D^2 v_0) \right\}
\]

with $\zeta = \xi_d \|Dv_0\|_{L^\infty(\mathbb{R}^2)} + \xi_h$. Let us also set $v^- = v_0 - C_1 t$ and $v^+ = v_0 + C_1 t$. It can be checked that $v^-$ is a subsolution of (4) and $v^+$ is a supersolution. Then, there exists a unique solution v of (4) and the comparison principle yields: $\forall x \in \mathbb{R}^2$, $\forall t \in [0, T)$, $|v(x, t) - v_0(x)| \le C_1 t$. Letting $u(x, t) = v(x, t + h)$, we obtain that u is the solution of

\[
\left\{
\begin{aligned}
&\frac{\partial u}{\partial t} + G(x, Du, D^2 u) = 0, \\
&u(x, t = 0) = v(x, h).
\end{aligned}
\right.
\]

Classical arguments (comparison principle) allow us to conclude that $|u(x, t) - v(x, t)| \le C_1 h$, that is, $|v(x, t + h) - v(x, t)| \le C_1 h$. So v is uniformly continuous in time. Then we assume that $v_0$ is only bounded, periodic and Lipschitz continuous, and use mollification (see Chap. IV of [11] and Sect. 2.5 of [6]). Using the first step of the proof, we obtain the result and the modulus of continuity of v, which depends on $B_0$.

Conclusion 2. We have proved the existence and uniqueness of a viscosity solution of problem (4) that is bounded, periodic, continuous on $\mathbb{R}^2 \times [0, T)$, Lipschitz continuous in space (hence differentiable almost everywhere), and uniformly continuous in time.

We now discretize the evolution equation. In the sequel, we set $\Omega \ni x = (x_1, x_2)$.

4 Experimental Results

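Let $\Delta x_1$ and $\Delta x_2$ be the spatial steps, $\Delta t$ be the time step, and $(x_{1i}, x_{2j}) = (i\Delta x_1, j\Delta x_2)$ be the grid points, 1 ≤ i ≤ M and 1 ≤ j ≤ N. For a function $\Psi : \Omega \to \mathbb{R}$, let $\Psi^n_{ij} = \Psi(i\Delta x_1, j\Delta x_2, n\Delta t)$. To discretize (4), we use the explicit finite difference scheme (9). The problem is complemented by Neumann boundary conditions. For the discretization of the convection component, we refer to [36] (we use the usual notations for the finite difference operators and the notation $d = (d_1, d_2)$).

\[
\begin{aligned}
v^{n+1}_{i,j} = v^n_{i,j} &+ \Delta t\, b_{i,j} \left( D_{x_1 x_1} v^n_{i,j} + D_{x_2 x_2} v^n_{i,j} \right) \\
&- \Delta t \Big[ \max\left((d_1)_{i,j}, 0\right) D^{x_1}_{-} v^n_{i,j} + \min\left((d_1)_{i,j}, 0\right) D^{x_1}_{+} v^n_{i,j} \\
&\qquad\ + \max\left((d_2)_{i,j}, 0\right) D^{x_2}_{-} v^n_{i,j} + \min\left((d_2)_{i,j}, 0\right) D^{x_2}_{+} v^n_{i,j} \Big] - \Delta t\, h_{i,j} \\
&+ \Delta t\, \mu\, \frac{D_{x_1 x_1} v^n_{i,j} \left(D_{x_1} v^n_{i,j}\right)^2 + 2 D_{x_1} v^n_{i,j}\, D_{x_2} v^n_{i,j}\, D_{x_1 x_2} v^n_{i,j} + D_{x_2 x_2} v^n_{i,j} \left(D_{x_2} v^n_{i,j}\right)^2}{\left(D_{x_1} v^n_{i,j}\right)^2 + \left(D_{x_2} v^n_{i,j}\right)^2 + \varepsilon}.
\end{aligned}
\tag{9}
\]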
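As an illustration only, one time step of scheme (9) might be sketched as follows. This is not the authors' implementation: the function name `explicit_step`, the use of centred differences for $D_{x_1}$, $D_{x_2}$ and $D_{x_1 x_2}$, the index clamping used for the Neumann condition, and the default value of `eps` are all assumptions made for this sketch, with Δx1 = Δx2 = 1 as in the experiments.

```python
def explicit_step(v, b, d1, d2, h, mu, dt, eps=1e-8):
    """One explicit update of scheme (9) on a grid with unit spacing.

    v, b, d1, d2, h are lists of rows of equal shape; index [i][j]
    follows the (x1, x2) ordering of the text. All names are hypothetical.
    """
    m, n = len(v), len(v[0])

    def at(i, j):
        # Assumed Neumann treatment: clamp indices so that border values
        # are replicated outside the grid.
        return v[min(max(i, 0), m - 1)][min(max(j, 0), n - 1)]

    new = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            c = at(i, j)
            # One-sided differences for the upwind convection term.
            dm1, dp1 = c - at(i - 1, j), at(i + 1, j) - c
            dm2, dp2 = c - at(i, j - 1), at(i, j + 1) - c
            # Centred first and second differences (an assumption).
            dx1 = 0.5 * (at(i + 1, j) - at(i - 1, j))
            dx2 = 0.5 * (at(i, j + 1) - at(i, j - 1))
            d11 = at(i + 1, j) - 2.0 * c + at(i - 1, j)
            d22 = at(i, j + 1) - 2.0 * c + at(i, j - 1)
            d12 = 0.25 * (at(i + 1, j + 1) - at(i + 1, j - 1)
                          - at(i - 1, j + 1) + at(i - 1, j - 1))
            diffusion = b[i][j] * (d11 + d22)
            convection = (max(d1[i][j], 0.0) * dm1 + min(d1[i][j], 0.0) * dp1
                          + max(d2[i][j], 0.0) * dm2 + min(d2[i][j], 0.0) * dp2)
            # Regularised infinity-Laplacian term; eps avoids division by zero.
            inf_laplacian = ((d11 * dx1 ** 2 + 2.0 * dx1 * dx2 * d12
                              + d22 * dx2 ** 2)
                             / (dx1 ** 2 + dx2 ** 2 + eps))
            new[i][j] = c + dt * (diffusion - convection - h[i][j]
                                  + mu * inf_laplacian)
    return new
```

Iterating `v = explicit_step(v, b, d1, d2, h, mu, dt)` from the chosen initial condition would then approximate the gradient descent; since the scheme is explicit, the time step has to be kept small enough for stability.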


Fig. 1. On the left, depiction of the initial gradient vector ﬁeld W = −Dg(||DI||), on the right, the obtained vector ﬁeld with our proposed approach (μ = 0.05, Δt = 0.1)

Fig. 2. On the left, depiction of the initial gradient vector ﬁeld W = −Dg(||DI||), on the right, the obtained vector ﬁeld with our proposed approach (μ = 0.1, Δt = 0.1)

4.1 Numerical Experimentations of Extrapolation

The experiments were performed on a 2.21 GHz Athlon with 1.00 GB of RAM. In all our experiments, Δx1 = Δx2 = 1. We apply our model to real data and, for each test, we provide a view of the initial gradient vector field −Dg(||DI||) and a view of the extrapolated vector field. The initialization was made either by setting v0 ≡ 0 or by setting v0 ≡ −g(||DI||). In all the tests we performed, this choice does not seem to influence the obtained result. The number of iterations as well as the computational time (on the order of a second) are similar for the three methods (GVF, NGVF and our proposed approach). Our method qualitatively performs in a way similar to the GVF and the NGVF: we increase the capture range of the vector field and we obtain downward components within the boundary concavity. Nevertheless, contrary to the GVF and NGVF models, the method requires only one unknown. We start with an image taken from the Image Toolbox of Matlab (Fig. 1), and with an image showing a slice of Tuffeau


Fig. 3. Steps of the segmentation of the synthetic image taken from [34]

Fig. 4. Steps of the segmentation of the image of the brain


(Fig. 2, Courtesy of ISTO/ESRF). Our proposed approach performs well but seems to be sensitive to the textures of the objects contained in the image.

4.2 Application to Segmentation

This part is dedicated to segmentation and more precisely to the integration of this extrapolated vector field in the geodesic active contour model, in order to alleviate the constraint on the choice of the initial condition. The geodesic active contour model, introduced by Caselles et al. in [16], is cast in the level set setting developed by Osher and Sethian in [33]. We propose, as done in [34], to replace W = −Dg(||DI||) of the geodesic active contour model by the extrapolated vector field obtained with our proposed approach. To illustrate this, we use an example taken from [34]. It demonstrates that the initial condition can be made of several contours selected inside, outside or across the boundaries of interest, provided the initial curves contain part of the skeleton of the extrapolated vector field. The classical geodesic active contour model does not allow this flexibility in the initialization step and therefore, used alone, would fail to detect all the shapes. Of course, the proposed method cannot automatically detect interior contours, but this drawback is overcome, again thanks to the flexibility in the initialization step. We illustrate this remark with Fig. 4, which represents a slice of the brain (Courtesy of the Laboratory Of Neuro Imaging, UCLA).

5 Conclusion

This paper was devoted to the theoretical study of a new method to extrapolate vector fields using the infinity Laplacian, with applications to image processing. Contrary to prior related works, the number of unknowns is reduced to a single one. The problem is phrased in a variational framework and the Euler-Lagrange equation is then derived. It is solved using a gradient descent method, which leads to a parabolic problem that falls within the viscosity solution theory framework. The existence and uniqueness of a viscosity solution that is continuous in space and time, Lipschitz continuous in space and uniformly continuous in time is established. The theoretical study is complemented by several numerical experiments, first dedicated to the extrapolation problem and then extended to the segmentation problem. The experiments show that the proposed approach performs well, even if in strong concavities the results are slightly less accurate than with the NGVF. The model is sensitive to the geometry of the boundaries and to the textures present in the images. In the segmentation framework, the introduction of this new force field widens the choice of the initial condition.

References

1. Aronsson, G.: Minimization problems for the functional sup_x F(x, f(x), f'(x)). Arkiv für Mate. 6, 33–53 (1965)
2. Aronsson, G.: Minimization problems for the functional sup_x F(x, f(x), f'(x)). II. Arkiv für Mate. 6, 409–431 (1966)
3. Aronsson, G.: Extension of functions satisfying Lipschitz conditions. Arkiv für Mate. 6(6), 551–561 (1967)
4. Aronsson, G.: On the partial differential equation u_x^2 u_xx + 2 u_x u_y u_xy + u_y^2 u_yy = 0. Arkiv für Mate. 7, 395–425 (1968)
5. Aronsson, G., Crandall, M., Juutinen, P.: A tour of the theory of absolutely minimizing functions. Bull. Amer. Math. Soc. (N.S.) 41, 439–505 (2004)
6. Aubert, G., Kornprobst, P.: Mathematical Problems in Image Processing: Partial Differential Equations and the calculus of Variations. Springer, Heidelberg (2002)
7. Barles, G.: Solutions de viscosité des équations de Hamilton-Jacobi. Springer, Heidelberg (1994)
8. Barles, G.: Solutions de viscosité et équations elliptiques du deuxième ordre. Cours de DEA (1997)
9. Barles, G., Busca, J.: Existence and comparaison results for fully nonlinear degenerate elliptic equations without zeroth-order term. Comm. Partial Differential Equations 26, 2323–2337 (2001)
10. Barron, E.N., Evans, L.C., Jensen, R.: The infinity Laplacian, Aronsson's equation and their generalizations. Trans. Amer. Math. Soc. 360(1), 77–101 (2008)
11. Brézis, H.: Analyse fonctionnelle. Dunod (1999)
12. Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 8, 679–698 (1986)
13. Carlsson, S.: Sketch based coding of grey level images. Signal Process. 15, 57–83 (1988)
14. Casas, J.R.: Image compression based on perceptual coding techniques. Ph.D. dissertation, Dept. Signal Theory Commun., UPC, Barcelona, Spain (1996)
15. Caselles, V., Catté, F., Coll, C., Dibos, F.: A geometric model for active contours in image processing. Numer. Math. 66, 1–31 (1993)
16. Caselles, V., Kimmel, R., Sapiro, G.: Geodesic Active Contours. Int. J. Comput. Vision 22(1), 61–87 (1997)
17. Caselles, V., Morel, J.M., Sbert, C.: An Axiomatic Approach to Image Interpolation. IEEE Trans. Image Process. 7(3), 376–386 (1998)
18. Cohen, L.D.: On Active Contour Models and Balloons. CVGIP: Image Understanding 53(2), 211–218 (1989)
19. Cong, G., Esser, M., Parvin, B., Bebis, G.: Shape Metamorphism Using p-Laplacian Equation. In: ICPR, vol. 4, pp. 15–18 (2004)
20. Crandall, M.G.: A visit with the ∞-Laplace equation. Preprint, Notes from a CIME course (2005)
21. Crandall, M.G., Ishii, H., Lions, P.L.: User's guide to viscosity solutions of second order partial differential equations. Bull. Amer. Math. Soc. 27, 1–67 (1992)
22. Crandall, M.G., Lions, P.L.: Viscosity solutions of Hamilton-Jacobi Equations. Trans. Amer. Math. Soc. 277, 1–42 (1983)
23. Crandall, M.G., Lions, P.L.: On existence and uniqueness of solutions of Hamilton-Jacobi equations. Non-Linear Anal. 10, 353–370 (1986)
24. Elion, C., Vese, L.A.: An image decomposition model using the total variation and the infinity Laplacian. In: Proceedings SPIE, vol. 6498, pp. 64980W-1–64980W-10 (2007)
25. Forcadel, N.: Dislocations dynamics with a mean curvature term: short time existence and uniqueness. Differential and Integral Equations 21(3-4), 285–304 (2008)
26. Ishii, H.: Existence and uniqueness of solutions of Hamilton-Jacobi equations. Funkcial. Ekvac. 29, 167–188 (1986)
27. Ishii, H., Lions, P.L.: Viscosity solutions of fully nonlinear second-order elliptic partial differential equations. J. Differ. Equations 83, 26–78 (1990)
28. Jensen, R.: Uniqueness of Lipschitz extensions minimizing the sup-norm of the gradient. Arch. Rat. Mech. Anal. 123(1), 51–74 (1993)
29. Jifeng, N., Chengke, W., Shigang, L., Shuqin, Y.: NGVF: An improved external force field for active contour model. Pattern Recogn. Lett. 28, 58–63 (2007)
30. Kass, M., Terzopoulos, D., Witkin, A.: Snakes: Active contour models. Int. J. Comput. Vision 1, 321–331 (1988)
31. Mémoli, F., Sapiro, G., Thompson, P.: Brain and surface warping via minimizing Lipschitz extensions. In: MFCA, International Workshop on Mathematical Foundations of Computational Anatomy (2006)
32. Meyer, Y.: Oscillating Patterns in Image Processing and Nonlinear Evolution Equations. AMS 22 (2001)
33. Osher, S., Sethian, J.A.: Fronts propagation with curvature dependent speed: Algorithms based on Hamilton-Jacobi formulations. J. Comput. Phys. 79, 12–49 (1988)
34. Paragios, N., Mellina-Gottardo, O., Ramesh, V.: Gradient Vector Flow Fast Geodesic Active Contours. In: Proc. IEEE Intl. Conf. Computer Vision, vol. 1, pp. 67–73 (2001)
35. Prewitt, J.M.S.: Object enhancement and extraction. In: Lipkin, B., Rosenfeld, A. (eds.) Picture Processing and Psychopictorics, pp. 75–149. Academic Press, New York (1970)
36. Sethian, J.A.: Level Set Methods and Fast Marching Methods: Evolving interfaces in Computational Geometry. In: Fluid Mechanics, Computer Vision and Material Science. Cambridge University Press, Londres (1999)
37. Torre, V., Poggio, T.A.: On edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 8, 147–163 (1986)
38. Xu, C., Prince, J.L.: Snakes, shapes, and gradient vector flow. IEEE Trans. Image Process. 7(3), 359–369 (1998)
39. Yuille, A.L., Poggio, T.A.: Scaling theorems for zero-crossings. IEEE Trans. Pattern Anal. Mach. Intell. 8, 15–25 (1986)

A Schrödinger Equation for the Fast Computation of Approximate Euclidean Distance Functions Karthik S. Gurumoorthy and Anand Rangarajan Dept. of CISE, University of Florida, Gainesville, FL, USA

Abstract. Computational techniques adapted from classical mechanics and used in image analysis run the gamut from Lagrangian action principles to Hamilton-Jacobi field equations: witness the popularity of the fast marching and fast sweeping methods, which are essentially fast Hamilton-Jacobi solvers. In sharp contrast, there are very few applications of quantum mechanics inspired computational methods. Given the fact that most of classical mechanics can be obtained as a limiting case of quantum mechanics (as Planck's constant ħ tends to zero), this paucity of quantum mechanics inspired methods is surprising. In this work, we derive relationships between nonlinear Hamilton-Jacobi and linear Schrödinger equations for the Euclidean distance function problem (in 1D, 2D and 3D). We then solve the Schrödinger wave equation instead of the corresponding Hamilton-Jacobi equation. We show that the Schrödinger equation has a closed form solution and that this solution can be efficiently computed in O(N log N), N being the number of grid points. The Euclidean distance can then be recovered from the wave function. Since the wave function is computed for a small but non-zero ħ, the obtained Euclidean distance function is an approximation. We derive analytic bounds for the error of the approximation and experimentally compare the results of our approach with the exact Euclidean distance function on real and synthetic data.

1 Introduction

Image analysis [1,2] has a tradition of importing and adapting a host of classical physics based approaches including Lagrangian based variational principles and their associated Euler-Lagrange equations [3], Hamiltonian dynamics [4] and more recently Hamilton-Jacobi based methods [5]. Approaches in image analysis do not strictly adhere to the classical mechanics sequence [6] of i) first specifying a Lagrangian action principle, ii) deriving the corresponding Euler-Lagrange equation, iii) employing a Legendre transformation to convert the Lagrangian dynamics to a first-order Hamiltonian dynamics, and finally, iv) employing a canonical transformation to derive the Hamilton-Jacobi equation whose solution also yields a solution to the original variational problem. Instead, most research in image analysis uses a combination of one or more of these four approaches depending on the problem at hand. For example, in surface reconstruction [3], a popular approach consists of writing a variational form and then finding a solution using preconditioned conjugate gradient or quasi-Newton type iterative methods. While we notice a plethora of classical mechanics inspired techniques in image analysis, the same cannot be said about quantum mechanics. Despite the well known fact

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 100–111, 2009. © Springer-Verlag Berlin Heidelberg 2009

A Schrödinger Equation for the Fast Computation


that most of classical mechanics is a limiting case of quantum mechanics as Planck's constant ħ → 0 [7], there is very little application of quantum mechanical principles in image analysis problems. Rather than speculate on the reasons for this dearth of applications, we wish to point out that in this paper, we are primarily interested in exploiting a concrete relationship between the classical, non-linear Hamilton-Jacobi equation and the quantum, linear Schrödinger equation. We feel that focusing more narrowly on this relationship (which will become more obvious as we proceed) is more productive than dwelling on the more mysterious and specifically quantum mechanical issues of i) interpretation of the wave function, ii) role of probabilities and, iii) the problem of measurement. While these issues are certainly important, they do not play any role in this paper. In summary, we are mainly interested in exploiting the relationship between the Schrödinger and Hamilton-Jacobi equations in order to derive computationally efficient algorithms which are applicable in image analysis problems where Hamilton-Jacobi theory is used.

In the theoretical physics literature, the solution of a Schrödinger wave equation at the energy state E has the form $\psi(X, t) = \phi(X) \exp\left(\frac{iEt}{\hbar}\right)$ ($\hbar \equiv \frac{h}{2\pi}$) [8], where φ(X), the stationary state wave function, is the eigenstate of the Hamiltonian operator H corresponding to the eigenvalue E. When the Hamilton-Jacobi scalar field $S^*$ appears as the exponent of the stationary state wave function, specifically $\phi(X) = \exp\left(\frac{-S^*(X)}{\hbar}\right)$, and if φ(X) satisfies the linear Schrödinger equation, namely Hφ = Eφ, we show that as ħ → 0, $S^*$ satisfies the Hamilton-Jacobi equation for a carefully chosen problem. The novel aspect is that a nonlinear Hamilton-Jacobi equation is obtained in the limit as ħ → 0 of a linear Schrödinger equation. Consequently, instead of solving the Hamilton-Jacobi equation, one can solve the Schrödinger wave equation (taking advantage of its linearity), and then compute an approximate $S^*$ for small values of ħ. This computational procedure would be approximately equivalent to solving the original Hamilton-Jacobi equation.

With the basic setup in place, we now turn our attention to an actual application. Our goal is to apply the Schrödinger formalism to a well known problem that has been successfully attacked by Hamilton-Jacobi theory. To this end, we choose the problem of computing Euclidean distance functions on a grid for a given set of points where the task is to assign at each grid point a value corresponding to the Euclidean distance to its nearest neighbor from the given point-set. The literature is replete with elegant and pioneering works which have successfully solved this problem. To name a few, the well known fast marching [9] and fast sweeping [10] methods are essentially O(N log N) Hamilton-Jacobi based algorithms where N is the number of grid points. [The fast sweeping method has an added advantage in that the algorithm appears (empirically)¹ to be O(N).] These techniques focus on directly solving the non-linear eikonal equation [4], whereas our approach shows that one can instead solve a linear equation and obtain the solution to the non-linear eikonal equation in the limit as ħ → 0. The important connection between the Schrödinger wave equation and the Hamilton-Jacobi equation [7] is illuminated during the process. In the more traditional (computer science) algorithms literature, computational geometry inspired techniques like Voronoi diagrams and KD-Trees [11] have also solved this problem. However, computing Voronoi

¹ After a careful reading of [10], it is still unclear to us if the fast sweeping method is formally rather than empirically O(N).


K.S. Gurumoorthy and A. Rangarajan

diagrams or building data structures for KD-Trees in 3D and higher dimensions is expensive and the O(N log N ) complexity is not retained at higher dimensions [11]. Our technique (which as we shall see is based on application of the fast Fourier transform (FFT) [12]) is very simple and elegant and remains O(N log N ) irrespective of the spatial dimension. The basic question asked and answered in this paper is: Can we design a Schrödinger equation that computes an exponentiated Euclidean distance function on a grid? The distance function is obtained from the exponent of the Schrödinger wave function. A naïve approach to solve this Euclidean distance problem would be to visit every grid point and compute the Euclidean distance to all members of the point-set and pick the smallest distance. The complexity of this naïve approach is obviously related to the product of the number of grid points and the cardinality of the point-set. The fast sweeping and fast marching methods avoid this naïve complexity and as we shall see, so does the Schrödinger wave function approach.
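The naïve approach described above can be sketched in a few lines. This is only an illustration under assumptions not made in the paper: the grid points are taken to be integer coordinates, and the function name `naive_distance_function` and the dictionary return type are hypothetical choices.

```python
import math
from itertools import product


def naive_distance_function(grid_shape, points):
    """Brute-force Euclidean distance from every integer grid point
    to its nearest neighbour in `points` (hypothetical helper).

    The cost is proportional to (number of grid points) x (number of points).
    """
    return {
        x: min(math.dist(x, y) for y in points)
        for x in product(*(range(s) for s in grid_shape))
    }
```

For instance, `naive_distance_function((3, 3), [(0, 0), (2, 2)])` visits all nine grid points and, for each, both members of the point-set. This product of the two set sizes is the cost that the methods discussed in the text avoid.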

2 Euclidean Distance Functions

We now describe the Euclidean distance function problem. Given a point-set $Y = \{Y_k \in \mathbb{R}^D, k \in \{1, \ldots, K\}\}$, where D is the dimensionality of the point-set, and a set of equally spaced Cartesian grid points X, the Euclidean distance function problem requires us to assign

\[
S^*(X) = \min_k \|X - Y_k\| \tag{1}
\]

with the Euclidean norm used in (1). If efficient computation is a non-issue, this is a simple problem. We visit each grid point in the set X, compute the distance to every point $Y_k$, $\forall k \in \{1, \ldots, K\}$, and assign $S^*(X)$ the minimum distance. If the number of grid points is N, the naïve complexity is O(NKD).

2.1 Hamilton-Jacobi Formulation

The Hamilton-Jacobi equation approach to the Euclidean distance function problem stems from considering the following variational problem, which in 2D is

\[
I[q] = \int_{t_0}^{t_1} L(q_1, q_2, \dot{q}_1, \dot{q}_2, t)\, dt \tag{2}
\]

where the Lagrangian L is defined as

\[
L(q_1, q_2, \dot{q}_1, \dot{q}_2, t) \equiv \frac{1}{2}\left(\dot{q}_1^2 + \dot{q}_2^2\right). \tag{3}
\]

Defining $p_i \equiv \frac{\partial L}{\partial \dot{q}_i}$ and applying a Legendre transformation [6] [by inverting $p_i = \frac{\partial L}{\partial \dot{q}_i}$ to get the function $\dot{q}_i(q_1, q_2, p_1, p_2, t)$], we define the Hamiltonian of the system as

\[
H(q_1, q_2, p_1, p_2, t) \equiv p_1 \dot{q}_1 + p_2 \dot{q}_2 - L\left(q_1, q_2, \dot{q}_1(q_1, q_2, p_1, p_2, t), \dot{q}_2(q_1, q_2, p_1, p_2, t)\right) \tag{4}
\]

which for the Euclidean distance function problem turns out to be

\[
H(q_1, q_2, p_1, p_2, t) = \frac{1}{2}\left(p_1^2 + p_2^2\right). \tag{5}
\]

The Hamilton-Jacobi equation is obtained via a canonical transformation [6] of the Hamiltonian. In the 2D case, it is

\[
\frac{\partial S}{\partial t} + H\left(q_1, q_2, \frac{\partial S}{\partial q_1}, \frac{\partial S}{\partial q_2}, t\right) = 0 \tag{6}
\]

where we have replaced the generalized momentum variables $p_i$ with

\[
\frac{\partial S}{\partial q_i} = p_i = \frac{\partial L}{\partial \dot{q}_i}. \tag{7}
\]

Since the Hamiltonian in (5) is a constant independent of time, equation (6) can be simplified to the static Hamilton-Jacobi equation. By separation of variables, we get

\[
S(q_1, q_2, t) = S^*(q_1, q_2) - Et \tag{8}
\]

where E is the total energy of the system and $S^*(q_1, q_2)$ is called Hamilton's characteristic function [13]. Using the definition of $p_i$ from (7) in (5) and observing that $\frac{\partial S}{\partial q_i} = \frac{\partial S^*}{\partial q_i}$, we get

\[
\frac{1}{2}\left[\left(\frac{\partial S^*}{\partial q_1}\right)^2 + \left(\frac{\partial S^*}{\partial q_2}\right)^2\right] = E. \tag{9}
\]

Choosing the energy E to be $\frac{1}{2}$, we obtain

\[
\|\nabla S^*\| = 1 \tag{10}
\]

which is the well known eikonal equation [4] where the forcing term is 1. $S^*$ is the required Hamilton-Jacobi scalar field which is efficiently obtained by the fast sweeping and fast marching methods.

2.2 Schrödinger Wave Equation for Euclidean Distance Functions

We now derive and solve a Schrödinger wave equation for this Euclidean distance function problem instead of solving the non-linear eikonal equation. The Schrödinger wave equation is written as [8]

\[
i\hbar \frac{\partial \psi}{\partial t} = H\psi \tag{11}
\]

where ψ(X, t) is the wave function and H is the Hamiltonian operator obtained by first quantization², in which the momentum variables $p_i$ are replaced with the operator $\frac{\hbar}{i}\frac{\partial}{\partial x_i}$. Using the definition of our Hamiltonian from (5), ψ satisfies (in 2D)

\[
\frac{\hbar}{i}\frac{\partial \psi}{\partial t} = \frac{\hbar^2}{2}\left(\frac{\partial^2 \psi}{\partial x_1^2} + \frac{\partial^2 \psi}{\partial x_2^2}\right). \tag{12}
\]

² First quantization is still mysterious. For an informal but illuminating treatment, please see http://math.ucr.edu/home/baez/categories.html

Using separation of variables ψ(X, t) = φ(X) f(t), we get

\[
\frac{\hbar}{i}\frac{\dot{f}}{f} = \frac{\hbar^2}{2}\frac{\nabla^2 \phi}{\phi} = E \tag{13}
\]

where E is the energy state of the system. By choosing the energy of the system to be $\frac{1}{2}$ as before, we get

\[
f(t) = \exp\left(\frac{it}{2\hbar}\right) \tag{14}
\]

and hence

\[
\psi(X, t) = \phi(X) \exp\left(\frac{it}{2\hbar}\right) \tag{15}
\]

where φ(X) satisfies the equation

\[
\hbar^2 \nabla^2 \phi = \phi. \tag{16}
\]

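2.3 Eikonal Equation for Euclidean Distance Functions

We now show that when $\phi = \exp\left(\frac{-S^*}{\hbar}\right)$ and satisfies (16), $S^*$ asymptotically satisfies the eikonal equation (10) as ħ → 0. We show this for the 2D case but the generalization to higher dimensions is straightforward. When $\phi(x_1, x_2) = \exp\left(\frac{-S^*(x_1, x_2)}{\hbar}\right)$, the first partials of φ are

\[
\frac{\partial \phi}{\partial x_1} = -\frac{1}{\hbar} \exp\left(\frac{-S^*}{\hbar}\right) \frac{\partial S^*}{\partial x_1}, \qquad
\frac{\partial \phi}{\partial x_2} = -\frac{1}{\hbar} \exp\left(\frac{-S^*}{\hbar}\right) \frac{\partial S^*}{\partial x_2}. \tag{17}
\]

The second partials needed for the Laplacian are

\[
\begin{aligned}
\frac{\partial^2 \phi}{\partial x_1^2} &= \frac{1}{\hbar^2} \exp\left(\frac{-S^*}{\hbar}\right) \left(\frac{\partial S^*}{\partial x_1}\right)^2 - \frac{1}{\hbar} \exp\left(\frac{-S^*}{\hbar}\right) \frac{\partial^2 S^*}{\partial x_1^2}, \\
\frac{\partial^2 \phi}{\partial x_2^2} &= \frac{1}{\hbar^2} \exp\left(\frac{-S^*}{\hbar}\right) \left(\frac{\partial S^*}{\partial x_2}\right)^2 - \frac{1}{\hbar} \exp\left(\frac{-S^*}{\hbar}\right) \frac{\partial^2 S^*}{\partial x_2^2}.
\end{aligned}
\tag{18}
\]

From this, equation (16) can be rewritten as

\[
\left(\frac{\partial S^*}{\partial x_1}\right)^2 + \left(\frac{\partial S^*}{\partial x_2}\right)^2 - \hbar \left(\frac{\partial^2 S^*}{\partial x_1^2} + \frac{\partial^2 S^*}{\partial x_2^2}\right) = 1 \tag{19}
\]

which in simplified form is

\[
\|\nabla S^*\|^2 - \hbar \nabla^2 S^* = 1. \tag{20}
\]

The additional $\hbar \nabla^2 S^*$ term [relative to (10)] is referred to as the viscosity term [9]. (Note that this term emerges naturally from the Schrödinger equation derivation, which is an intriguing result.) Since $|\nabla^2 S^*|$ is bounded, as ħ → 0, (20) tends to

\[
\|\nabla S^*\| = 1 \tag{21}
\]

which is the original eikonal equation (10) for the Euclidean distance function. This relationship motivates us to solve the linear Schrödinger equation (16) instead of the non-linear eikonal equation and then compute the distance function via

\[
S^*(X) = -\hbar \log \phi(X). \tag{22}
\]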
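A small numerical sanity check of this relationship in 1D may help. It is a sketch under assumptions that are not part of the paper: a single source point y, the exact distance S*(x) = |x − y|, a central finite difference for the second derivative, and the hypothetical names `phi`, `residual` and `recovered_distance`. Away from the source, φ = exp(−S*/ħ) should satisfy ħ²φ'' = φ, and (22) should return the distance.

```python
import math


def phi(x, y=0.0, hbar=0.5):
    """exp(-S*/hbar) for the 1D distance S*(x) = |x - y| (illustrative only)."""
    return math.exp(-abs(x - y) / hbar)


def residual(x, hbar=0.5, delta=1e-3):
    """Relative residual of hbar^2 * phi'' = phi, with phi'' approximated
    by a central difference of step `delta` (x must stay away from y)."""
    second = (phi(x + delta, hbar=hbar) - 2.0 * phi(x, hbar=hbar)
              + phi(x - delta, hbar=hbar)) / delta ** 2
    return abs(hbar ** 2 * second - phi(x, hbar=hbar)) / phi(x, hbar=hbar)


def recovered_distance(x, hbar=0.5):
    """Equation (22): S*(x) = -hbar * log(phi(x))."""
    return -hbar * math.log(phi(x, hbar=hbar))
```

In this 1D setting the second derivative of S* vanishes away from the source, so the viscosity term in (20) plays no role. The check therefore illustrates only the exponent relationship, not the size of the viscosity correction.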


3 Closed Form Solutions for the Approximate Euclidean Distance Function and Proofs of Convergence

We now derive the closed form solution for φ(X) (in 1D, 2D and 3D) satisfying equation (16), and hence for $S^*(X)$ by (22), and observe that we get the actual Euclidean distance function in the limit as ħ → 0. In order to satisfy the condition that $S^*(Y_k) = 0$, $\forall Y_k$, $k \in \{1, \ldots, K\}$, we consider the forced version of equation (16), which is

\[
-\hbar^2 \nabla^2 \phi + \phi = \sum_{k=1}^{K} \delta(X - Y_k). \tag{23}
\]

Using a Green's function approach [14] (where the form of the solution depends on the number of spatial dimensions), we can write expressions for the solution φ. Below, let $r = \min_k \|X - Y_k\|$ be the actual Euclidean distance function at the grid point X.

1D: In 1D, the solution [14] for φ is

\[
\phi(X) = \frac{1}{2\hbar} \sum_{k=1}^{K} \exp\left(\frac{-|X - Y_k|}{\hbar}\right). \tag{24}
\]

Using the relationship in (22), we get

\[
S^*(X) = -\hbar \log \sum_{k=1}^{K} \exp\left(\frac{-|X - Y_k|}{\hbar}\right) + \hbar \log(2\hbar). \tag{25}
\]

Observe that

\[
S^*(X) \le -\hbar \log \exp\left(\frac{-r}{\hbar}\right) + \hbar \log(2\hbar) = r + \hbar \log(2\hbar). \tag{26}
\]

Also,

\[
S^*(X) \ge -\hbar \log\left(K \exp\left(\frac{-r}{\hbar}\right)\right) + \hbar \log(2\hbar) = -\hbar \log K + r + \hbar \log(2\hbar). \tag{27}
\]

As ħ → 0, ħ log K → 0 and ħ log ħ → 0. Furthermore, we see from (26) and (27) that

\[
\lim_{\hbar \to 0} S^*(X) = r. \tag{28}
\]

2D: In 2D, the solution [14] for φ is

\[
\phi(X) = \frac{1}{2\pi\hbar^2} \sum_{k=1}^{K} K_0\left(\frac{\|X - Y_k\|}{\hbar}\right) \tag{29}
\]


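where $K_0$ is the modified Bessel function of the second kind. Using (22), we get

\[
S^*(X) = -\hbar \log \sum_{k=1}^{K} K_0\left(\frac{\|X - Y_k\|}{\hbar}\right) + \hbar \log(2\pi\hbar^2). \tag{30}
\]

Then,

\[
S^*(X) \le -\hbar \log K_0\left(\frac{r}{\hbar}\right) + \hbar \log(2\pi\hbar^2). \tag{31}
\]

Using the relation $K_0\left(\frac{r}{\hbar}\right) \ge \dfrac{\exp\left(-\frac{r}{\hbar}\right)}{\sqrt{r/\hbar}}$ when $\frac{r}{\hbar} \ge 0.5$, we get

\[
\begin{aligned}
S^*(X) &\le -\hbar \log\left(\frac{\exp\left(\frac{-r}{\hbar}\right)}{\sqrt{r/\hbar}}\right) + \hbar \log(2\pi\hbar^2) \\
&= -\hbar \log \sqrt{\frac{\hbar}{r}} + r + \hbar \log(2\pi\hbar^2).
\end{aligned}
\tag{32}
\]

Moreover

\[
S^*(X) \ge -\hbar \log\left(K\, K_0\left(\frac{r}{\hbar}\right)\right) + \hbar \log(2\pi\hbar^2). \tag{33}
\]

Using the relation $K_0\left(\frac{r}{\hbar}\right) \le \exp\left(\frac{-r}{\hbar}\right)$ when $\frac{r}{\hbar} \ge 1.5$, we get

\[
\begin{aligned}
S^*(X) &\ge -\hbar \log\left(K \exp\left(\frac{-r}{\hbar}\right)\right) + \hbar \log(2\pi\hbar^2) \\
&= -\hbar \log K + r + \hbar \log(2\pi\hbar^2).
\end{aligned}
\tag{34}
\]

As ħ → 0, ħ log K → 0, ħ log r → 0 and ħ log ħ → 0. Furthermore, we see from (32) and (34) that

\[
\lim_{\hbar \to 0} S^*(X) = r. \tag{35}
\]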

3D: In 3D, the solution [14] for φ is based on the modified spherical Bessel function of the second kind:

K exp −X−Yk 1 φ(X) = . (36) 4π2 X − Yk k=1

Using (22), S ∗ (X) = − log

K exp k=1

−X−Yk

X − Yk

+ log 4π2 .

(37)

Then, ∗

exp

−r

+ log(4π2 ) r

= r + log r + log 4π2 .

S (X) ≤ − log

(38)

A Schrödinger Equation for the Fast Computation

Also,

S ∗ (X) ≥ − log K

exp

−r

r

107

+ log(4π2 )

= − log K + r + log r + log(4π2 ).

(39)

As → 0, log K → 0, log r → 0 and log → 0. Furthermore, we see from (38) and (39) that lim S ∗ (X) = r. (40) →0

Hence, we have shown that (in 1D, 2D and 3D) the closed-form solution for $\phi$ guarantees that $S^*$ approaches the true Euclidean distance function in the limit $\hbar \to 0$.

4 Error Bound between the Obtained and True Euclidean Distance Function

The solution for $\phi$ in 1D, 2D and 3D motivates us to compute the function

$$\tilde{\phi}(X) = \sum_{k=1}^{K} \exp\left(\frac{-\|X - Y_k\|}{\hbar}\right) \tag{41}$$

(instead of computing $\phi$) and then to compute the approximate distance function

$$\tilde{S}^*(X) = -\hbar \log \tilde{\phi}(X) \tag{42}$$

(instead of computing $S^*$). The reasons are two-fold. Firstly, $\tilde{\phi}$ can be computed efficiently in $O(N \log N)$ time using the fast Fourier transform (FFT) [12] as explained in the subsequent section. Secondly, $\lim_{\hbar \to 0} \tilde{S}^*(X) = r$, as shown below, where $r$ is the true Euclidean distance function value at the grid point $X$ ($r = \min_k \|X - Y_k\|$), and this is also $\lim_{\hbar \to 0} S^*(X)$, as seen from the previous section. Hence, for small values of $\hbar$, $\tilde{S}^*(X)$ is a very good approximation to $S^*(X)$.

As $\hbar \to 0$, $\sum_{k=1}^{K} \exp\left(\frac{-\|X - Y_k\|}{\hbar}\right)$ can be approximated as $\exp\left(\frac{-r}{\hbar}\right)$. Hence $\tilde{S}^*(X) \approx -\hbar \log \exp\left(\frac{-r}{\hbar}\right) = r$. The bound derived below between $\tilde{S}^*(X)$ and $r$ also unveils the proximity between the computed and the actual Euclidean distance function. Note from (41) that

$$\tilde{S}^*(X) \le -\hbar \log \exp\left(\frac{-r}{\hbar}\right) = r. \tag{43}$$

Also, observe that

$$\tilde{S}^*(X) \ge -\hbar \log\left(K \exp\left(\frac{-r}{\hbar}\right)\right) = -\hbar \log K + r \tag{44}$$

and hence,

$$r - \tilde{S}^*(X) \le \hbar \log K. \tag{45}$$

From (43) and (45),

$$|r - \tilde{S}^*(X)| \le \hbar \log K. \tag{46}$$

Equation (46) shows that as $\hbar \to 0$, $\tilde{S}^*(X) \to r$. It is worth commenting that the bound $\hbar \log K$ is actually very tight as (i) it scales only as the logarithm of the cardinality of the point-set ($K$) and (ii) it can be made arbitrarily small by choosing a small but non-zero value of $\hbar$.
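The bound (46) can be checked numerically by brute force. The following is a minimal sketch, not the authors' code: it assumes a 1D setting, a randomly generated point-set, and a hypothetical helper `approx_distance` that evaluates (41) and (42) directly after factoring out the dominant term $\exp(-r/\hbar)$ so that the exponentials stay representable.

```python
import math
import random

def approx_distance(x, points, hbar):
    """Evaluate -hbar * log(sum_k exp(-|x - y_k| / hbar)), i.e. (41)-(42) in 1D."""
    # Factor out the largest term exp(-r / hbar) so that the sum is at least 1.
    r = min(abs(x - y) for y in points)
    tail = sum(math.exp(-(abs(x - y) - r) / hbar) for y in points)
    return r - hbar * math.log(tail)

random.seed(0)
points = [random.uniform(-30.0, 30.0) for _ in range(50)]  # illustrative point-set, K = 50
K = len(points)
for hbar in (0.5, 0.1, 0.01):
    worst = 0.0
    for i in range(-300, 301):
        x = i / 10.0
        r = min(abs(x - y) for y in points)
        worst = max(worst, abs(r - approx_distance(x, points, hbar)))
    # The observed worst-case error should not exceed hbar * log(K).
    print(f"hbar={hbar}: max error={worst:.3e}, bound={hbar * math.log(K):.3e}")
```

Because the rescaled sum lies between 1 and $K$, the error equals $\hbar$ times the logarithm of that sum, which is exactly the argument behind (43) to (46).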

5 Efficient Computation of the Approximate Euclidean Distance Function

The motivation for computing $\tilde{\phi}$ instead of $\phi$ is the fact that the direct computation of $\phi$ at the $N$ grid locations is $O(NK)$, which is $O(N^2)$ when the cardinality of the point-set is $O(N)$, whereas computing $\tilde{\phi}$ at the $N$ grid locations can be done in $O(N \log N)$ using an FFT implementation [12]. The realization that the FFT can be employed to compute $\tilde{\phi}$ stems from the insight that the summation term, namely $\sum_{k=1}^{K} \exp\left(\frac{-\|X - Y_k\|}{\hbar}\right)$, is actually the discrete convolution between the function $f(X) = \exp\left(\frac{-\|X\|}{\hbar}\right)$ computed at the grid locations and the function $g(X)$, which takes the value 1 at the point-set locations and 0 at other grid locations. By the convolution theorem [15], a discrete convolution can be obtained as the inverse Fourier transform of the product of two individual transforms, which for two $O(N)$ sequences can be computed in $O(N \log N)$ time, and hence $\tilde{\phi}$ can be determined efficiently at the $N$ grid locations. One just needs to compute the discrete Fourier transform (DFT) of the sampled versions of the functions $f(X)$ and $g(X)$, compute their point-wise product and then compute the inverse discrete Fourier transform. Taking the logarithm of the inverse discrete Fourier transform and multiplying it by $(-\hbar)$ gives the approximate Euclidean distance function. The algorithm is outlined in Table 1.

Table 1. Approximate Euclidean distance function algorithm

1. Compute the function $f(X) = \exp\left(\frac{-\|X\|}{\hbar}\right)$ at the grid locations.
2. Define the function $g(X)$ which takes the value 1 at the point-set locations and 0 at other grid locations.
3. Compute the FFT of $f$ and $g$, namely $F(u)$ and $G(u)$ respectively.
4. Compute the point-wise product $H(u) = F(u)\,G(u)$.
5. Compute the inverse FFT of $H$, which gives $\tilde{\phi}(X)$ at the grid locations.
6. Take the logarithm of $\tilde{\phi}(X)$ and multiply it by $(-\hbar)$ to get the approximate Euclidean distance function at the grid locations.
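The steps of Table 1 can be sketched on a 1D integer grid as follows. This is only an illustration under stated assumptions, not the authors' implementation: the helper names `fft` and `approx_distance_1d` are hypothetical, the transform is a plain recursive radix-2 FFT written with the standard library, and the arrays are zero-padded to a power of two of at least $2N - 1$ samples so that the cyclic convolution produced by the FFT agrees with the linear convolution on the grid (the padding is our assumption; the text does not discuss it).

```python
import cmath
import math

def fft(a, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def approx_distance_1d(point_indices, n_grid, hbar):
    """Steps 1-6 of Table 1 on the 1D integer grid 0, 1, ..., n_grid - 1."""
    m = 1
    while m < 2 * n_grid - 1:  # pad so that cyclic convolution equals linear convolution
        m *= 2
    # Step 1: kernel f(d) = exp(-|d| / hbar); offset d is stored at index d mod m.
    f = [0.0] * m
    for d in range(-(n_grid - 1), n_grid):
        f[d % m] = math.exp(-abs(d) / hbar)
    # Step 2: indicator g of the point-set.
    g = [0.0] * m
    for i in point_indices:
        g[i] = 1.0
    # Steps 3-5: forward transforms, point-wise product, inverse transform.
    F, G = fft(f), fft(g)
    H = [F[k] * G[k] for k in range(m)]
    phi = [value.real / m for value in fft(H, inverse=True)[:n_grid]]
    # Step 6: S = -hbar * log(phi); clamp phi to guard against round-off below zero.
    return [-hbar * math.log(max(p, 1e-300)) for p in phi]

points = [3, 10, 11, 25]
approx = approx_distance_1d(points, n_grid=32, hbar=0.5)
exact = [min(abs(x - y) for y in points) for x in range(32)]
print(max(abs(a - e) for a, e in zip(approx, exact)))
```

In two or three dimensions the same six steps apply with a multi-dimensional FFT; a library routine would normally replace the hand-written transform used here.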

5.1 Computation of the Approximate Euclidean Distance Function in Higher Dimensions

Our technique has a straightforward generalization to higher dimensions. Regardless of the spatial dimension, the approximate Euclidean distance function $\tilde{S}^*$ can be computed by exactly following the steps delineated in the table above. It is worthwhile mentioning that computing the discrete Fourier transform using the FFT is always $O(N \log N)$ irrespective of the spatial dimension. Hence, for all dimensions, $\tilde{S}^*$ can be computed at the given $N$ grid points in $O(N \log N)$. This speaks for the scalability of our technique, which is generally not the case with other methods, for example KD-Trees [11].

6 Experiments

In this section, we show the efficacy of our technique by computing the approximate Euclidean distance function $\tilde{S}^*$ and comparing it to the actual Euclidean distance function $S$, first on randomly generated 2D point-sets and then on a set of bounded 3D grid points. We began with a 2D grid consisting of points between $(-30, -30)$ and $(30, 30)$. Hence, the total number of grid points is $N = 61 \times 61 = 3721$. We randomly chose around 1000 grid locations as data points (point-set). Then 50,000 experiments were run for values of $\hbar$ ranging from 0.1 to 0.5 in steps of 0.01. The errorbar plot in Figure 1 shows the mean and standard deviation of the percentage error at each value of $\hbar$. The error is less than 0.5% at $\hbar = 0.1$, demonstrating the algorithm's ability to compute accurate Euclidean distances.

Next, we took the Stanford bunny dataset (see footnote 3) and used the coordinates of the data points on the model as our point-set locations. Since the input data locations need not be at integer locations, we scaled the space uniformly in all dimensions and rounded off the data so that the data lies at integer locations. The input data was also shifted so that it was approximately symmetrically located with respect to the x, y and z axes. We should comment that shifting the data does not affect the Euclidean distance function value, and uniform scaling of all dimensions is also not an issue, as the distances can be rescaled once they are computed. After these basic data manipulations, the cardinality of the point set was $K = 3019$, with the data confined to the box-shaped region $-16 \le x \le 16$, $-15 \le y \le 15$ and $-12 \le z \le 12$. Our grid consisted of the set of all integer locations within this region. The number of grid locations was $N = 33 \times 31 \times 25 = 25575$. We computed the Euclidean distance function value at each of these grid locations using our technique for different values of $\hbar$ and compared it to the true Euclidean distance function value.

At small values of $\hbar$, $\exp\left(\frac{-\|X\|}{\hbar}\right)$ drops off very quickly, and hence for grid locations which are far away from the point-set, the convolution done using the FFT needs to be precise (without round-off error) for the computed distance to be meaningful. Such high precision support may not be available, and hence our technique may produce erroneous results at these grid locations for very small values of $\hbar$. But at those grid locations which are close to the point-set, the accuracy of the computed distance improves as $\hbar$ is decreased. Hence, to circumvent this problem of choosing $\hbar$, we ran our technique for different values of $\hbar$ and chose the distance function values obtained at large values of $\hbar$ at those grid locations whose average computed distance is larger, and vice versa.
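One way to picture the trade-off described above is sketched below. It is a heuristic of our own, not the selection rule used in the experiments: it assumes that an FFT-based convolution in double precision carries a relative round-off error of about machine epsilon, so that the kernel value $\exp(-r/\hbar)$ is only trustworthy while it stays above that level. The names `reliable_radius` and `merge_distances` are hypothetical.

```python
import math
import sys

def reliable_radius(hbar, rel_error=sys.float_info.epsilon):
    """Largest distance r with exp(-r / hbar) >= rel_error (heuristic assumption)."""
    return -hbar * math.log(rel_error)

def merge_distances(candidates):
    """Merge distance estimates computed with several values of hbar.

    candidates: list of (hbar, distances) pairs sorted by increasing hbar.
    For each grid location, keep the estimate from the smallest hbar whose
    value lies within that hbar's reliable radius; otherwise fall back to
    the estimate obtained with the largest hbar.
    """
    n = len(candidates[0][1])
    merged = []
    for i in range(n):
        chosen = candidates[-1][1][i]
        for hbar, distances in candidates:
            if distances[i] <= reliable_radius(hbar):
                chosen = distances[i]
                break
        merged.append(chosen)
    return merged

for hbar in (0.1, 0.2, 0.3, 0.4):
    print(hbar, round(reliable_radius(hbar), 2))
```

Under this assumption the usable radius grows linearly with $\hbar$, which matches the qualitative behaviour reported in the text: small values of $\hbar$ are accurate near the point-set, while larger values are needed far from it.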

3 Go to http://www.cc.gatech.edu/projects/large_models/bunny.html to obtain this dataset.

Fig. 1. Percentage error versus $\hbar$ in 50,000 2D experiments (horizontal axis: $\hbar$ values from 0.1 to 0.5; vertical axis: percentage error from 0 to 20)

Fig. 2. Isosurfaces: (i) Left: Actual Euclidean distance function and (ii) Right: Our approach

When we ran our technique for the set of $\hbar \in \{0.1, 0.2, 0.3, 0.4\}$ and used $\{3, 6, 10\}$ respectively as the thresholds on the average computed distance for choosing the appropriate distance function, it gave the following set of results. The maximum absolute difference between the actual and the computed Euclidean distance value over all the grid locations is 0.9066 and the average absolute difference is 0.1322. The accuracy is fairly high since the furthest grid point from the point-set is at a distance of 15.5242 and the average overall distance computed at the grid locations from the point-set is 3.5449. The average error is $\frac{0.1322}{3.5449} \times 100 = 3.72\%$. This error can be lowered by using higher precision numerical methods for convolution [16]. We plotted the isosurface obtained by connecting the grid points which are at a distance of 0.5 from the point set, determined both by the true Euclidean distance function and our technique. Figure 2 shows the two surfaces. Notice the similarity between the two plots. It provides anecdotal visual evidence for the usefulness of our approach.

7 Discussion

In this paper, we have introduced a new approach to solving the non-linear eikonal equation (with a constant forcing term equal to 1). We have proved that the solution of the eikonal equation can be obtained as a limiting case of the solution to the corresponding linear Schrödinger wave equation. The key here is the embedding of the non-linear Hamilton-Jacobi equation in a linear Schrödinger equation. Our Schrödinger wave equation formalism for solving the Euclidean distance function problem (which has been successfully attacked by pioneering Hamilton-Jacobi solvers such as the fast sweeping [10] and fast marching [9] methods) leverages this deep relationship between the two regimes of modern physics. In the future, we would like to solve the more general, static Hamilton-Jacobi equation using techniques inspired by quantum mechanics as a counterpart to classical mechanics based techniques. In all likelihood, this will involve direct discretization of the Schrödinger wave equation, which was not required for the Euclidean distance function problem. We expect that the linearity of the Schrödinger equation will result in fast algorithms even in this more general setting.

References

1. Horn, B.K.P.: Robot vision. MIT Press, Cambridge (1986)
2. Kimmel, R.: Numerical geometry of images: Theory, algorithms, and applications. Springer, Heidelberg (2003)
3. Grimson, W.E.L.: An implementation of a computational theory of visual surface interpolation. Computer Vision, Graphics, and Image Processing 22(1), 39–69 (1983)
4. Siddiqi, K., Tannenbaum, A., Zucker, S.W.: A Hamiltonian approach to the eikonal equation. In: Hancock, E.R., Pelillo, M. (eds.) EMMCVPR 1999. LNCS, vol. 1654, pp. 1–13. Springer, Heidelberg (1999)
5. Kao, C.Y., Osher, S.J., Tsai, Y.H.: Fast sweeping methods for static Hamilton-Jacobi equations. SIAM Journal on Numerical Analysis 42(6), 2612–2632 (2004)
6. Goldstein, H., Poole, C.P., Safko, J.L.: Classical mechanics. Addison-Wesley, Reading (2002)
7. Butterfield, J.: On Hamilton-Jacobi theory as a classical root of quantum theory. In: Elitzur, A., Dolev, S., Kolenda, N. (eds.) Quo-Vadis Quantum Mechanics, pp. 239–274. Springer, Heidelberg (2005)
8. Griffiths, D.J.: Introduction to quantum mechanics. Addison-Wesley, Reading (2004)
9. Osher, S.J., Sethian, J.A.: Fronts propagating with curvature dependent speed: algorithms based on Hamilton-Jacobi formulations. Journal of Computational Physics 79(1), 12–49 (1988)
10. Zhao, H.K.: A fast sweeping method for eikonal equations. Mathematics of Computation 74, 603–627 (2005)
11. de Berg, M., Cheong, O., van Kreveld, M., Overmars, M.: Computational geometry: Algorithms and applications. Springer, Heidelberg (2008)
12. Cooley, J.W., Tukey, J.W.: An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation 19(90), 297–301 (1965)
13. Arnold, V.I.: Mathematical methods of classical mechanics. Springer, Heidelberg (1989)
14. Abramowitz, M., Stegun, I.A.: Handbook of mathematical functions with formulas, graphs, and mathematical tables. Government Printing Office, USA (1964)
15. Bracewell, R.N.: The Fourier transform and its applications. McGraw-Hill Science and Engineering, New York (1999)
16. Hida, Y., Li, H.S., Bailey, D.H.: Quad-double arithmetic: Algorithms, implementation, and application. Technical Report LBNL-46996, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 (2000)

Semi-supervised Segmentation Based on Non-local Continuous Min-Cut

Nawal Houhou¹, Xavier Bresson², Arthur Szlam², Tony F. Chan², and Jean-Philippe Thiran¹

¹ Signal Processing Laboratory 5, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
² Department of Mathematics, University of California Los Angeles, CA 90095-1555, USA

Abstract. We propose a semi-supervised image segmentation method that relies on a non-local continuous version of the min-cut algorithm and labels or seeds provided by a user. The segmentation process is performed via energy minimization. The proposed energy is composed of three terms. The first term defines labels or seed points assigned to objects that the user wants to identify and the background. The second term carries out the diffusion of object and background labels and stops the diffusion when the interface between the object and the background is reached. The diffusion process is performed on a graph defined from image intensity patches. The graph of intensity patches is known to better deal with textures because this graph uses semi-local and non-local image information. The last term is the standard TV term that regularizes the geometry of the interface. We introduce an iterative scheme that provides a unique minimizer. Promising results are presented on synthetic textures and real-world images.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 112–123, 2009. © Springer-Verlag Berlin Heidelberg 2009

1 Introduction

Image segmentation is an important problem in image processing. The objective of segmentation algorithms is to partition an image into a finite number of semantically important regions such as anatomical or functional structures in medical images or objects in natural images. Energy minimization methods are well-posed approaches to the image segmentation problem. This paper introduces an energy minimization algorithm to solve the semi-supervised segmentation problem based on the continuous min-cut/max-flow model originally defined by Strang in [1]. Semi-supervised segmentation models defined in a continuous setting have already been proposed in the literature. Among them, Protiere and Sapiro proposed in [2] an interactive algorithm for segmentation. Cremers et al. introduced in [3] an algorithm based on the level set method to perform interactive image segmentation. Appleton and Talbot introduced in [4] a semi-supervised segmentation model based on the continuous min-cut of Strang. Unger et al. defined in [5] a segmentation method also based on the min-cut model of Strang in [1]. The semi-supervised segmentation models using the continuous min-cut are based on local image information. These models, such as [4, 5], perform very well for the segmentation of smooth regions, but they are less efficient with textures.

In this paper, we extend the continuous min-cut to a non-local formulation along the same lines as the non-local means defined by Buades, Coll and Morel in [6] and the variational non-local means model of Gilboa and Osher [7]. This non-local extension of the continuous min-cut can be obtained in different ways. We used the original discrete min-cut model [8] to define the non-local continuous min-cut, which turns out to be the $H^1$ norm defined on a graph plus a term that constrains the labels. The $H^1$ norm carries out the diffusion of object and background labels on the graph of image patches [9, 6], which holds semi-local and non-local image information that can better segment textures and real-world objects. Besides, the continuous formulation of the min-cut algorithm allows us to introduce other regularization processes such as the TV energy. The TV energy is indeed useful to regularize the boundary between the object and the background. In addition, the TV energy can smooth out the segmentation of small sets favored by the min-cut algorithm, as noticed by Shi and Malik in [10] (see Figure 1).

2 Graph, Min-Cut and Diffusion

Graph representation. Let $G = (V, E)$ be a weighted undirected graph, where $V$ is the set of graph nodes and $E$ the set of edges connecting the nodes. In this paper, each node $V_i$ represents a pixel $i$ in an image $I$ with support $\Omega \subset \mathbb{R}^n$, where typically $n \in \{2, 3\}$. The similarity between two pixels/nodes $i$ and $j$ in $\Omega$ is measured by the edge function on the graph, namely $w_{ij}$. In the case of image segmentation, two pixels $i$ and $j$ that belong to the same object/class are said to be connected and define a measure $w_{ij}$ close to unity. Conversely, two pixels $i$ and $j$ that do not belong to the same class are said to be not connected and define a measure $w_{ij}$ close to zero. A standard construction approach for the weight matrix $w_{ij}$ is as follows. Let $h(i, j)$ be some general non-negative distance measure between nodes $i$ and $j$; then the weight $w_{ij}$ is computed with a zero-mean Gaussian kernel such that:

$$w_{ij} = \frac{1}{Z} \exp\left(-\frac{h(i, j)}{\sigma^2}\right),$$

where $\sigma$ is the scaling parameter and $Z$ is the normalization factor.

Image Feature. The distance $h(i, j)$ depends on the image feature. The choice of features is difficult and critical to get an optimal segmentation result. For piecewise smooth and constant images, the gray-level value can be enough. For texture images, a feature vector computed at each pixel from a filter bank (as suggested in e.g. [11]) can be efficient. A recent promising image feature to represent and process textures is the image intensity patch around the current pixel. The patch idea as feature vector was first introduced for texture synthesis [9, 12, 13], then for image denoising. Buades et al. in [6] proposed to compute the weight matrix with patch differences and denoise the image with a non-local averaging. Gilboa and Osher in [7] proposed a variational model for non-local denoising based on patch differences. Finally, Bresson and Chan in [14] proposed a variational unsupervised segmentation method also based on patch differences. In this paper, we will use the graph of image patches of Buades et al.

Min-Cut. By definition a binary cut partitions a graph into two subsets. This partition process for graphs can be used for image segmentation when we want to find an object and the background. In the optimization theory of maximum flows in flow networks [8], the optimal partition of the graph $V$ into two sets $A$ and $B$ such that $A \cup B = V$ and $A \cap B = \emptyset$ can be computed by finding the minimal cut (min-cut), i.e. by minimizing the inter-similarity between the two sets $A$ and $B$ of $V$. In other words, given two particular nodes $s \in A$ and $t \in B$ in the graph, the min-cut partition can be written as:

$$\text{min-cut}(A, B) = \min_x \sum_{x_i > 0,\, x_j < 0} -w_{ij} x_i x_j, \tag{1}$$

where $x$ is an $N$-dimensional indicator vector, with $N = \mathrm{card}(V)$, such that $x_i = 1$ if node $i \in A$, and $x_i = -1$ otherwise. The min-cut partition gives the minimal capacity, defined as the total weight of the edges between the two subsets $A$ and $B$. The min-cut approach has been applied to several computer vision problems, see [15, 16] for image restoration, [17, 18, 4, 5] for image segmentation, [19] for stereo and motion, and [20] for texture synthesis.

Diffusion. In the case of binary partition of a graph, min-cut partition and diffusion are equivalent. Indeed, let us denote by $W$ the symmetric matrix such that $W(i, j) = w(i, j)$. Then, the graph partition energy defined in (1) is equivalent to a graph Laplacian operator:

$$\mathrm{cut}(A, B) = \frac{1}{8} \sum_{i,j} w_{ij} (x_i - x_j)^2, \tag{2}$$

where $D$ is an $N \times N$ diagonal matrix with $d_i = \sum_j w(i, j)$ on its diagonal; the matrix $D - W$ is called the graph Laplacian.

Proof. See e.g. [10]. Let $\frac{1 + x}{2}$ (resp. $\frac{1 - x}{2}$) be the indicator function for $x_i > 0$ (resp. $x_i < 0$). Then (1) can be written in matrix form as follows:

$$\mathrm{cut}(A, B) = \frac{1}{4} x^T (D - W) x,$$

which implies (2), since

$$x^T (D - W) x = \frac{1}{2} \sum_{i,j} w_{ij} (x_i - x_j)^2.$$
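The equivalence between (1), the matrix form and (2) can be verified on a toy graph. The sketch below is illustrative only: the gray levels, the value of $\sigma$, the choice $Z = 1$ and the helper names are assumptions, with $h(i, j)$ taken to be the squared gray-level difference.

```python
import math

def cut_value(w, x):
    """Total weight of the edges from A = {x_i > 0} to B = {x_j < 0}, as in (1)."""
    n = len(x)
    return sum(-w[i][j] * x[i] * x[j]
               for i in range(n) for j in range(n)
               if x[i] > 0 and x[j] < 0)

def laplacian_quadratic_form(w, x):
    """x^T (D - W) x with D = diag(sum_j w_ij)."""
    n = len(x)
    degree = [sum(w[i]) for i in range(n)]
    diagonal = sum(degree[i] * x[i] * x[i] for i in range(n))
    coupling = sum(w[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return diagonal - coupling

def pairwise_energy(w, x):
    """(1/8) * sum_{i,j} w_ij (x_i - x_j)^2, the right-hand side of (2)."""
    n = len(x)
    return sum(w[i][j] * (x[i] - x[j]) ** 2
               for i in range(n) for j in range(n)) / 8.0

gray = [0.10, 0.15, 0.12, 0.80, 0.85]  # illustrative gray levels, one per node
sigma = 0.2                            # illustrative scaling parameter, Z = 1 assumed
w = [[math.exp(-(a - b) ** 2 / sigma ** 2) for b in gray] for a in gray]
x = [1, 1, 1, -1, -1]                  # indicator vector: first three nodes in A
print(cut_value(w, x), laplacian_quadratic_form(w, x) / 4.0, pairwise_energy(w, x))
```

The three printed numbers agree up to floating-point round-off, because every edge crossing the cut contributes $4 w_{ij}$ to $(x_i - x_j)^2 w_{ij}$ and is counted twice in the double sum.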

The weighted graph Laplacian corresponds to a finite difference approximation of the continuous Laplacian operator. The graph Laplacian can also be a non-local operator.

Semi-supervised segmentation. We observe that min-cut partitioning algorithms are defined as semi-supervised segmentation techniques. The min-cut seeks the optimal partition of the graph given particular nodes called the source "s" and the sink "t". Hence, it is easy to assign some pixels as source and some as sink depending on whether the pixel belongs to the object or the background. Several graph-based partitioning methods have been proposed in the literature, such as [17, 21, 22, 23, 18]. These papers are based on discrete minimization methods to compute the min-cut given the labels. In this paper, we propose a continuous minimization method to solve the min-cut problem with labels and non-local image information.

3 Proposed Segmentation Method

3.1 Continuous Min-Cut

Energy minimization problem. In this section a new non-local semi-supervised segmentation algorithm is introduced. The algorithm relies on the continuous formulation of the discrete min-cut problem defined as:

$$\text{min-cut}(A, B) = \min_x \sum_{i,j} w_{ij} (x_i - x_j)^2 \quad \text{s.t.} \quad x_k = +1,\ \forall k \in S, \quad x_k = -1,\ \forall k \in T, \tag{3}$$

where $S$ are the labels selected for the object and $T$ are the labels assigned to the background. We propose the continuous min-cut (CMC) problem as follows (which is a constrained minimization problem w.r.t. a real-valued function $u$):

$$CMC(u) = \min_u \frac{1}{2} \int_{\Omega \times \Omega} w(x, y)(u(x) - u(y))^2 \, dx\, dy \quad \text{s.t.} \quad u(x) = 1,\ \forall x \in S, \quad u(x) = 0,\ \forall x \in T,$$

which is equivalent to this unconstrained minimization problem for $u$:

$$CMC(u) = \min_u \frac{1}{2} \int_{\Omega \times \Omega} w(x, y)(u(x) - u(y))^2 \, dx\, dy + \int_\Omega \lambda(x)(u - u_0)^2 \, dx, \tag{4}$$

where

$$u_0(x) = \begin{cases} 1 & \text{if } x \in S \\ 0 & \text{if } x \in T \end{cases} \quad \text{and} \quad \lambda(x) = \begin{cases} \infty & \text{if } x \in S \cup T \\ 0 & \text{otherwise,} \end{cases}$$

and where the function $\lambda$ provides the degree of confidence with respect to the labels.

Non-local $H^1$ energy. The first term of (4) is deduced from (3) using the change of variable $u_i = \frac{x_i + 1}{2} \in \{0, 1\}$, then relaxing $u_i$ to $[0, 1]$. This term is also known as the non-local $H^1$ energy ([24, 7]), defined as:

$$H_G^1(u) = \frac{1}{2} \int_{\Omega \times \Omega} w(x, y)(u(x) - u(y))^2 \, dx\, dy = \frac{1}{2} \int_\Omega |\nabla_G u|^2 \, dx = \|u\|_{H_G^1}, \tag{5}$$

where $|\nabla_G u|^2 := \int_\Omega w(x, y)(u(x) - u(y))^2 \, dy$ is the square norm of the continuous graph gradient of $u$. The optimality condition for (5) is:

$$\int_\Omega w(x, y)(u(x) - u(y)) \, dy = \Delta_G u = 0,$$

where $\Delta_G u$ is the continuous graph Laplacian of $u$.

Labels. The second term of (4) introduces the hard constraint of labels in the energy minimization approach. This term comes from Unger et al. in [25], who incorporate seed points (assigned either to the object or to the background) in the geodesic active contour/snake model [26]. This term constrains the function $u$ to be equal to $u_0$ for $x \in S \cup T$ and leaves it free for $x \notin S \cup T$. The function $\lambda$ is highly discontinuous, which requires some regularization process to handle it. Unger et al. proposed a splitting operation to solve this problem. A new function $v$ is introduced s.t.:

$$\min_u \int_\Omega \lambda(x)(v - u_0)^2 \, dx + \frac{1}{2\theta} \|u - v\|_2^2,$$

where the term $\|u - v\|_2^2$ forces $v \approx u$ as $\theta \to 0$. The optimality condition w.r.t. $v$, namely $2\lambda(x)(v - u_0) + \frac{1}{\theta}(v - u) = 0$, leads to:

$$v = \frac{2\lambda(x)\theta u_0 + u}{2\lambda(x)\theta + 1} = \begin{cases} u & \text{if } \lambda \to 0 \\ u_0 & \text{if } \lambda \to \infty. \end{cases}$$

3.2 Proposed Semi-supervised Segmentation Algorithm: Continuous Min-Cut + TV

Final model. The previous section introduced the continuous formulation of the min-cut problem. In this section, we propose to merge the continuous min-cut with the Total Variation (TV) energy. The TV term offers two advantages. First, TV regularizes the geometry of the contour between classes (object and background). Experiments showed that the continuous min-cut can produce irregularities along the contour. Second, Shi and Malik in [10] observed that the min-cut algorithm tends to favor misclassification of small sets, which are smoothed out by the TV regularization process. Finally, we propose the following energy minimization model for semi-supervised segmentation:

$$E(u) = \|u\|_{H_G^1} + \int_\Omega \lambda(x)(u - u_0)^2 \, dx + \beta \|u\|_{TV}, \tag{6}$$

where $\|u\|_{TV} = \int_\Omega |\nabla u| \, dx$.

Minimization process. Applying the calculus of variations directly to (6) produces a very slow minimization process. We propose to use a splitting operation to minimize $E$ more efficiently. We introduce two new functions $v$, $s$ s.t.:

$$E(u, v, s) = \|u\|_{H_G^1} + \int_\Omega \lambda(x)(v - u_0)^2 \, dx + \beta \|s\|_{TV} + \frac{1}{2\theta_v} \|u - v\|_2^2 + \frac{1}{2\theta_s} \|s - v\|_2^2. \tag{7}$$

Then, $v$, $s$ being fixed, we search for $u$ as the solution of $\min_u \|u\|_{H_G^1} + \frac{1}{2\theta_v} \|u - v\|_2^2$, which is given by a fixed point method as

$$u = \frac{\theta_v \int_\Omega w(x, y) u(y) \, dy + v(x)}{\theta_v \int_\Omega w(x, y) \, dy + 1}.$$

Functions $u$, $s$ being fixed, we search for $v$ as the solution of $\min_v \int_\Omega \lambda(x)(v - u_0)^2 \, dx + \frac{1}{2\theta_v} \|u - v\|_2^2 + \frac{1}{2\theta_s} \|s - v\|_2^2$, which is given by $v = \frac{\theta_s u + \theta_v s}{\theta_s + \theta_v}$ if $\lambda = 0$ and $v = u_0$ if $\lambda = \infty$. Functions $u$, $v$ being fixed, we search for $s$ as the solution of $\min_s \beta \|s\|_{TV} + \frac{1}{2\theta_s} \|s - v\|_2^2$, whose solution is given e.g. by the projection algorithm of Chambolle [27]. We propose the following iterative scheme for minimizing energy (7):

$$\begin{cases} u^{n+1} = \dfrac{\theta_v \int_\Omega w(x, y) u^n(y) \, dy + v^n(x)}{\theta_v \int_\Omega w(x, y) \, dy + 1} \\[2ex] v^{n+1} = \begin{cases} \dfrac{\theta_s u^{n+1} + \theta_v s^n}{\theta_s + \theta_v} & \text{if } \lambda = 0 \\ u_0 & \text{if } \lambda = \infty \end{cases} \\[2ex] s^{n+1} = v^{n+1} - \theta \, \mathrm{div}\, p^n \\[1ex] p^{n+1} = \dfrac{\beta \left( p^n + \frac{1}{8} \nabla (\mathrm{div}\, p^n - v^{n+1}/\theta_s) \right)}{\beta + \frac{1}{8} \left| \nabla (\mathrm{div}\, p^n - v^{n+1}/\theta_s) \right|} \end{cases} \qquad n \ge 0. \tag{8}$$

3.3 Some Properties of the Models (6) and (7)

Convexity. Both energy minimization models (6) and (7) are strictly convex (since the $H_G^1$ term is strictly convex), which implies the existence of a unique minimizing solution independently of the initial condition. Hence, even using gradient descent approaches, the algorithm does not get stuck in a local minimum. Thus, as long as the labels are correctly defined, the results will be independent of the initialization.

Relation with the original min-cut problem (1) or equivalently (3). The continuous min-cut problem (4) has the same solution as the discrete min-cut problem when considering characteristic/indicator functions of sets, i.e.:

$$\min_{1_A} \{ E_{CMC}(u = 1_A) \} = \min_A \{ \mathrm{cut}(A, \Omega \setminus A) \}. \tag{9}$$

Recall that we relax the function $u$ to take values in $[0, 1]$ to define a continuous version of the min-cut algorithm, which can be minimized with continuous minimization tools. Then, the segmentation result is given by thresholding the minimizer $u$ of (6) with any value in $(0, 1)$.

Non-trivial steady state solution of (6). The final steady state solution of (6) is not the mean value of the initial function. Call $\bar{u}_{t=0}$ the mean value function of the initial function $u_{t=0}$. It is easy to show by contradiction that $\bar{u}_{t=0}$ is not a solution to (6). If $u_{t=\infty} = \bar{u}_{t=0}$, then $E(u_{t=\infty}) = \int_\Omega \lambda(x)(\bar{u}_{t=0} - u_0)^2 \, dx > 0$, and the minimizer is thus given by $\bar{u}_{t=0} = u_0$. However, $u_0(x) = 1\ \forall x \in S$ and $0\ \forall x \in T$. Thus, $\bar{u}_{t=0} \ne u_0$.

We notice that Gilboa and Osher in [7] also use the energy $\|u\|_{H_G^1}$ to perform semi-supervised segmentation. However, they did not use a term to constrain the labels as in this work. They minimized the energy $\|u\|_{H_G^1}$ starting with a trinary initial function $u_{t=0} \in \{-1, 0, 1\}$ (labelled pixels for the object are assigned the value 1 and those for the background the value $-1$). However, the minimizing solution is the mean value function $u_{t=\infty} = \bar{u}_{t=0}$. Hence, this algorithm requires the diffusion process to be stopped.
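The alternating updates of scheme (8) and the final thresholding step can be illustrated on a small discrete graph. The sketch below is a simplification under explicit assumptions and is not the authors' Matlab code: integrals are replaced by sums over graph nodes, $\beta$ is set to 0 so that the TV sub-problem reduces to $s = v$ (Chambolle's projection step is omitted), and the function name `segment`, the weights and the seed positions are hypothetical.

```python
import math

def segment(w, seeds, theta_v=1.0, theta_s=1.0, iterations=200):
    """Alternate the u- and v-updates of scheme (8) on a graph, assuming beta = 0.

    w: symmetric weight matrix (list of lists); seeds: dict node -> label in {0.0, 1.0}.
    """
    n = len(w)
    # Initial condition: 1 on object seeds, 0 elsewhere, for u, v and s.
    u = [seeds.get(i, 0.0) for i in range(n)]
    v = list(u)
    s = list(u)
    degree = [sum(row) for row in w]
    for _ in range(iterations):
        # u-update: fixed-point step for the non-local H^1 term.
        u = [(theta_v * sum(w[i][j] * u[j] for j in range(n)) + v[i])
             / (theta_v * degree[i] + 1.0) for i in range(n)]
        # v-update: the label is imposed where lambda is infinite (seed nodes).
        v = [seeds[i] if i in seeds
             else (theta_s * u[i] + theta_v * s[i]) / (theta_s + theta_v)
             for i in range(n)]
        # s-update: with beta = 0 the TV sub-problem has the trivial solution s = v.
        s = list(v)
    return u

gray = [0.10, 0.15, 0.12, 0.80, 0.85]  # illustrative gray levels, one per node
w = [[math.exp(-(a - b) ** 2 / 0.2 ** 2) for b in gray] for a in gray]
u = segment(w, seeds={0: 1.0, 4: 0.0})  # node 0: object seed, node 4: background seed
labels = [1 if value > 0.5 else 0 for value in u]
print(labels)
```

In this toy setting the labels diffuse quickly inside each group of similar nodes and hardly at all across the weak edges between the groups, which is the behaviour the non-local $H^1$ term is meant to provide.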

4 Results

This section presents some results of the proposed semi-supervised segmentation algorithm. The graph is defined from local and non-local image information:

$$w(i, j) = \begin{cases} \dfrac{|i - j|^2}{\sigma_1^2} + \dfrac{|F(i) - F(j)|^2}{\sigma_2^2} & \text{if } j \in N_{a \times a}(i) \\ 0 & \text{otherwise,} \end{cases} \tag{10}$$

where $N_{a \times a}(i)$ is a square window of size $a \times a$ around $i$. Computing the similarity between all pairs of pixels in the whole image is very expensive, so we chose to consider only points in a close neighborhood. This amounts to assuming that two points that are far apart are not connected. From the $a^2$ neighbors, only the $cl = 8$ closest points are selected. The feature vector $F$ is a square patch of size $f \times f$ centered on each pixel. The segmentation is driven by (8). The initial conditions for $u$, $v$ and $s$ are given by the label $S$, i.e. $u = v = s = 1$ if $x \in S$ and $u = v = s = 0$ otherwise. With an unoptimized Matlab implementation, the graph computation takes approximately 15 seconds and the segmentation is performed in approximately 1 minute. The image size is $128 \times 128$.

TV Regularization Effect. The importance of the TV regularization is emphasized in this paragraph. Salt-and-pepper noise is added to a two-phase image whose phases have different means (Figure 1(a)). The inside and outside labels are presented in Figure 1(a). The results show that if the TV regularization is not performed, then the segmentation fails (Figure 1(b)). When the TV regularization is used, the segmentation succeeds (Figure 1(c)).

Fig. 1. Application of our algorithm on an image with salt-and-pepper noise. (a) Initialization. (b) The segmentation result without TV regularization. (c) The segmentation result with TV regularization.


Texture Images. We apply our algorithm to a synthetic texture image composed of five different patterns. Figures 2(a) and 2(c) show the initializations and Figures 2(b) and 2(d) the corresponding results. The patch size is chosen to be $9 \times 9$, which corresponds to the pattern size for the two selected textures.

Fig. 2. Results on synthetic textures. (a) and (c) Initializations. (b) and (d) Results.

Fig. 3. Results on real-world images from the Berkeley dataset. Left column: Initial labels. Right column: Segmentation result.

Natural Images. We now apply our algorithm to a set of natural images taken from the Berkeley segmentation dataset [28]. The first column of Figure 3 shows the inside and outside labels, and the second column shows the segmentation results.

Fig. 4. Results on real-world color images from the Berkeley dataset. Left column: Initial labels. Right column: Segmentation result.

Fig. 5. First row: Segmentation of the liver. (a) Initial labels. (b) Segmentation result. Second row: Segmentation of the lateral muscles of the neck. (c) Initial labels. (d) Segmentation result. (e) Zoom on the segmentation of the muscles.

Color Images. We consider the simple case of Red-Green-Blue (RGB) channels. The first step consists of computing the graph by taking into account each channel, i.e. $F = (F_r, F_g, F_b)$, where $F_r$, $F_g$ and $F_b$ are respectively the red, green and blue feature channels. Images are also taken from the Berkeley segmentation dataset [28]. The first column of Figure 4 shows the inside and outside labels, and the second column shows the segmentation results.

Medical Images. We apply our segmentation algorithm to 2-D medical images from CT scans of the abdomen and of the head and neck. Figures 5(a) and 5(c) present the inside and outside initial labels. Figures 5(b) and 5(d) show the segmentation results. For the liver segmentation, the label on the background (black) prevents the diffusion from also capturing the heart. The segmentation of the structures in the neck is challenging, and the results that we obtain are promising.

5 Discussion and Conclusion

In this paper, a non-local semi-supervised segmentation method has been proposed. The success of graph partitioning algorithms for image segmentation has motivated this work. Our objective was to translate the discrete min-cut algorithm into a non-local continuous min-cut algorithm. The addition of hard constraints with the source and sink labels has been done naturally in the proposed continuous framework. Besides, it has also been easy to introduce new terms such as the TV term that regularizes the geometry of the boundary between the object and the background. The non-local continuous min-cut is also equivalent to a diffusion process. The diffusion is done on the graph of image intensity patches, which holds semi-local and non-local image information useful to segment textures and complex patterns. Our semi-supervised segmentation has provided promising segmentation results for textures and real-world objects. Future work will focus on comparing the efficiency of our segmentation algorithm with other related semi-supervised segmentation algorithms. We would also like to extend our method to 3-D medical images.

Acknowledgements. Nawal Houhou was supported by Swiss National Science Foundation #205320101621, Xavier Bresson was supported by ONR N00014-03-1-0071 and ONR MURI subcontract from Stanford University, and Arthur Szlam was supported by NSF DMS-0811203. The authors would also like to thank the referees for their constructive comments.

References 1. Strang, G.: Maximal Flow Through A Domain. Mathematical Programming 26(2), 123–143 (1983) 2. Protiere, A., Sapiro, G.: Interactive image segmentation via adaptive weighted distances. IEEE Transactions on Image Processing 16(4), 1046–1057 (2007) 3. Cremers, D., Fluck, O., Rousson, M., Aharon, S.: A Probabilistic Level Set Formulation for Interactive Organ Segmentation. In: SPIE (2007) 4. Appleton, B., Talbot, H.: Globally minimal surfaces by continuous maximal ﬂows. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(1), 106–118 (2006) 5. Unger, M., Pock, T., Cremers, D., Bischof, H.: Tvseg - interactive total variation based image segmentation. In: British Machine Vision Conference (BMVC), Leeds, UK (September 2008)


6. Buades, A., Coll, B., Morel, J.: A review of image denoising algorithms, with a new one. Multiscale Modeling & Simulation 4(2), 490–530 (2005) 7. Gilboa, G., Osher, S.: Nonlocal linear image regularization and supervised segmentation. Multiscale Modeling and Simulation 6(2), 595–630 (2007) 8. Elias, P., Feinstein, A., Shannon, C.E.: Note on Maximum Flow Through a Network. IRE Transactions on Information Theory 2, 117–119 (1956) 9. Efros, A., Leung, T.: Texture Synthesis by Non-Parametric Sampling. In: IEEE International Conference on Computer Vision, vol. 2, pp. 10–33 (1999) 10. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 888–905 (2000) 11. Malik, J., Belongie, S., Leung, T., Shi, J.: Contour and texture analysis for image segmentation. International Journal of Computer Vision 43(1), 7–27 (2001) 12. Efros, A., Freeman, W.: Image quilting for texture synthesis and transfer. In: Proceedings of the Conference on Computer graphics and interactive techniques, SIGGRAPH, pp. 341–346. ACM, New York (2001) 13. Liang, L., Liu, C., Xu, Y., Guo, B., Shum, H.: Real-time texture synthesis by patch-based sampling. ACM Trans. Graph. 20(3), 127–150 (2001) 14. Bresson, X., Chan, T.: Non-local Unsupervised Variational Image Segmentation Models, UCLA CAM Report 08-67 (2008) 15. Wu, Z., Leahy, R.: An Optimal graph theoretic approach to data clustering: Theory and its application to image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 15(11), 1101–1113 (1993) 16. Ishikawa, H., Geiger, D.: Segmentation by grouping junctions. In: Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 125–131 (1998) 17. Boykov, Y., Jolly, M.P.: Interactive graph cuts for optimal boundary and region segmentation of objects in n-d images. In: Proceedings of Eighth IEEE International Conference on Computer Vision, vol. 1, pp. 105–112 (2001) 18. 
Boykov, Y., Funka-Lea, G.: Graph cuts and eﬃcient n-d image segmentation. Int. J. Comput. Vision 70(2), 109–131 (2006) 19. Boykov, Y., Veksler, O., Zabih, R.: Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence 23, 2001 (2001) 20. Kwatra, V., Schödl, A., Essa, I., Turk, G., Bobick, A.: Graphcut textures: Image and video synthesis using graph cuts. In: Proceedings of the Conference on Computer graphics and interactive techniques, SIGGRAPH, vol. 22(3), pp. 277–286 (July 2003) 21. Blum, A., Chawla, S.: Learning from labeled and unlabeled data using graph mincuts. In: Proceedings of the Eighteenth International Conference on Machine Learning, pp. 19–26. Morgan Kaufmann Publishers Inc., San Francisco (2001) 22. Yu, S., Shi, J.: Segmentation given partial grouping constraints. IEEE Transactions on Pattern Analysis and Machine Intelligence 26(2), 173–183 (2004) 23. Grady, L., Funka-lea, G.: Multi-label image segmentation for medical applications based on graph-theoretic electrical potentials. In: Proceedings of the European Conference on Computer Vision, pp. 230–245. Springer, Heidelberg (2004) 24. Zhou, D., Scholkopf, B.: A Regularization Framework for Learning from Graph Data. In: Workshop on Statistical Relational Learning and Its Connections to Other Fields (2004) 25. Unger, M., Pock, T., Bischof, H.: Continuous globally optimal image segmentation with local constraints. In: Computer Vision Winter Workshop 2008 (2008)


26. Caselles, V., Kimmel, R., Sapiro, G.: Geodesic Active Contours. International Journal of Computer Vision 22(1), 61–79 (1997) 27. Chambolle, A.: An Algorithm for Total Variation Minimization and Applications. Journal of Mathematical Imaging and Vision 20(1–2), 89–97 (2004) 28. Martin, D., Fowlkes, C., Tal, D., Malik, J.: A Database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, July 2001, vol. 2, pp. 416–423 (2001)

Momentum Based Optimization Methods for Level Set Segmentation

Gunnar Läthén (1,3), Thord Andersson (2,3), Reiner Lenz (1,3), and Magnus Borga (2,3)

(1) Department of Science and Technology, Linköping University
(2) Department of Biomedical Engineering, Linköping University
(3) Center for Medical Image Science and Visualization, Linköping University

Abstract. Segmentation of images is often posed as a variational problem. As such, it is solved by formulating an energy functional depending on a contour and other image derived terms. The solution of the segmentation problem is the contour which extremizes this functional. The standard way of solving this optimization problem is by gradient descent search in the solution space, which typically suﬀers from many unwanted local optima and poor convergence. Classically, these problems have been circumvented by modifying the energy functional. In contrast, the focus of this paper is on alternative methods for optimization. Inspired by ideas from the machine learning community, we propose segmentation based on gradient descent with momentum. Our results show that typical models hampered by local optima solutions can be further improved by this approach. We illustrate the performance improvements using the level set framework.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 124–136, 2009. © Springer-Verlag Berlin Heidelberg 2009

1 Introduction

A very popular and powerful approach for solving image segmentation problems is through the calculus of variations. In this setting the solution is represented by a contour, which parameterizes an energy functional depending on various image based quantities such as intensities or gradients. In general, the set of possible contours constitutes the solution space, where the goal is to find the contour which extremizes the energy in this space. As an optimization problem, there are many possible strategies to find this solution. One approach is to use the method of graph cuts to find a global optimum [1]. However, this can only be applied to a small class of energy functionals. For more general problems, the standard method has been to deform an initial contour in the steepest (gradient) descent of the energy. Equations of motion for the contour are derived using the Euler-Lagrange equation and the condition that the first variation of the energy functional should vanish at a (local) optimum. Then, the contour is evolved to steady-state given the resulting equations.

A standard implementation of this strategy is usually hampered by two common problems. The first problem is sensitivity to local optima, which are manifested due to noisy data. To avoid this, the usual approach has been to modify the energy functional by adding regularizing terms. The second common problem is poor convergence due to difficulties in choosing good initial conditions. To improve convergence, very effective solvers based on multi-grid [2, 3] and AOS schemes [4, 5, 6] have been developed. However, these methods all search for a solution in the gradient descent direction and little focus has been given to the underlying optimization problem. This has been identified in recent work [7, 8], where the metric defining the notion of steepest descent (gradient) has been studied. By changing the metric in the solution space, local optima due to noise are avoided in the search path.

Along the same direction, this paper presents an alternative search strategy for the optimization solver. Our idea stems from the machine learning community, where an optimization problem is solved to update the system to adapt to a given stimulus. A simple, but effective, modification to gradient descent was proposed in [9], which basically adds a momentum to the motion in solution space. This simulates the physical properties of inertia and momentum and effectively allows the search to avoid local optima and accelerate in favorable directions. In this paper, we show how this idea can be used for image segmentation in a variational framework using level set methods. The results show faster convergence and less sensitivity to local optima.

The paper will proceed as follows. In Section 2, we describe the idea of gradient descent with momentum in a general setting and give examples highlighting the benefits. Then, Section 3 presents how this idea can be used to solve segmentation problems in a level set framework. This is exemplified in Section 4 and Section 5 where we give implementation details and compute segmentations given a common energy functional. Finally, Section 6 concludes the paper and presents ideas for future work.

2 Gradient Descent with Momentum

Considering general optimization problems, gradient descent is a very simple approach which can handle many types of cost functions. It is intuitive, since it always moves in the direction of steepest descent, which locally gives the largest amount of decrease in the cost function. In addition, it only requires ﬁrst order derivatives of the function, providing simple and fast computations. On the other hand, it is well known that gradient descent suﬀers from poor convergence and high sensitivity to local optima for many practical problems. Therefore, other descent directions (Newton, Quasi-Newton, etc.) have been studied and proved superior, see e.g. [10] for a rigorous reference. A simple alternative to these, more theoretically sophisticated methods, is often applied in the machine learning community. A typical problem here is the construction of adaptive systems that can classify unknown inputs. This can be formulated as an optimization problem and one of the goals of machine learning is to construct fast learning or adaptation rules that can be implemented in very simple hardware or software devices. To improve the convergence and robustness of a simple gradient descent solution, while avoiding the complexity of more sophisticated optimization methods, gradient descent with momentum


was proposed [9]. The starting point of our derivation of the proposed method is the following description of a standard line search optimization method:

$$x_{k+1} = x_k + s_k \tag{1}$$

$$s_k = \alpha_k p_k \tag{2}$$

where $x_k$ is the current iterate, $s_k$ is the next step consisting of length $\alpha_k$ in direction $p_k$. To guarantee convergence, it is often required that $p_k$ be a descent direction while $\alpha_k$ gives a sufficient decrease in the cost function. A simple realization of this is gradient descent which moves in the steepest descent direction according to $p_k = -\nabla f_k$, where $f$ is the cost function, while $\alpha_k$ satisfies the Wolfe conditions [10]. Turning to gradient descent with momentum, we will adopt some terminology from the machine learning community and choose a search direction according to:

$$s_k = -\eta(1 - \omega)\nabla f_k + \omega s_{k-1} \tag{3}$$

where $\eta$ is the learning rate and $\omega \in [0, 1]$ is the momentum. Note that $\omega = 0$ gives standard gradient descent $s_k = -\eta\nabla f_k$, while $\omega = 1$ gives "infinite inertia" $s_k = s_{k-1}$. The intuition behind this strategy is that the current iterate has an inertia, which prohibits sudden changes in the velocity. This will effectively filter out high frequency changes in the cost function and allow for greater steps in favourable directions. Selecting appropriate parameters, our hope is that the rate of convergence is increased while eventual local optima will be overstepped. The effect of the momentum term is illustrated in Figure 1. The iterates with momentum $\omega = 0$ show the behaviour of standard gradient descent when varying the learning rate (step length) $\eta$. In comparison, for an appropriate choice of momentum $\omega = 0.1$, the solution approaches the optimum more rapidly. It can be seen, however, that too high a momentum of $\omega = 0.4$ leads to oscillations.

Fig. 1. Gradient descent search with/without momentum on a quadratic cost function. (a) Iterates and cost function. (b) Convergence rate. Curves shown: η = 0.04, ω = 0; η = 0.4, ω = 0; η = 0.4, ω = 0.1; η = 0.4, ω = 0.4.
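The recursion in Eq. (3) is short enough to try out directly. The following Python sketch iterates Eqs. (1) and (3) on a hypothetical two-dimensional quadratic cost. The matrix `A`, the starting point and the parameter values are assumptions made for illustration only; they are not the settings used to produce Figure 1.

```python
import numpy as np

def momentum_descent(grad, x0, eta, omega, n_iter):
    """Iterate x_{k+1} = x_k + s_k with
    s_k = -eta * (1 - omega) * grad(x_k) + omega * s_{k-1}  (Eqs. (1) and (3))."""
    x = np.asarray(x0, dtype=float)
    s = np.zeros_like(x)              # s_{-1} = 0: the search starts without inertia
    path = [x.copy()]
    for _ in range(n_iter):
        s = -eta * (1.0 - omega) * grad(x) + omega * s
        x = x + s
        path.append(x.copy())
    return np.array(path)

# Hypothetical quadratic cost f(x) = 0.5 * x^T A x with its minimum at the origin.
A = np.diag([1.0, 4.0])

def grad_f(x):
    return A @ x

x_start = [10.0, 10.0]
plain = momentum_descent(grad_f, x_start, eta=0.1, omega=0.0, n_iter=50)
heavy = momentum_descent(grad_f, x_start, eta=0.4, omega=0.4, n_iter=50)

print("distance to optimum without momentum:", np.linalg.norm(plain[-1]))
print("distance to optimum with momentum:   ", np.linalg.norm(heavy[-1]))
```

With ω = 0 the loop reduces to plain gradient descent with a fixed step η, and with ω = 1 the step never changes, which mirrors the two limiting cases noted after Eq. (3).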

3 Minimizing Level Set Flows

As was previously outlined, segmentation problems in image analysis are often described as optimization problems with solutions derived using the calculus of variations. The standard procedure is to formulate an energy functional by means of a contour and various image derived terms. Extremals of this functional are then identified by an Euler-Lagrange equation, which is used to derive equations of motion for the contour [11]. This typical procedure yields a gradient descent search in a high dimensional solution space, in which each possible contour is represented by a point. For example [11] presents, among others, the derivation of weighted region described by the following functional:

$$E(C) = \iint_{\Omega_C} f(x, y)\,dx\,dy \tag{4}$$

where $C$ is a 1D curve embedded in a 2D domain, $\Omega_C$ is the inside region of $C$, and $f(x, y)$ is a scalar function. This functional is used to maximize some quantity given by $f(x, y)$ inside $C$. A simple example is $f(x, y) = 1$ which measures, and maximizes, the area. Calculating the first variation of Eq. (4) yields the evolution equation:

$$\frac{\partial C}{\partial t} = -f(x, y)\,n \tag{5}$$

where $n$ is the curve normal. Again, setting $f(x, y) = 1$ gives a constant flow in the normal direction, typically referred to as the "balloon force". The representation, or parameterization, of the contour $C$ can in general be chosen arbitrarily. However, it is often convenient to use the implicit level set method by Osher and Sethian [12], since this allows for arbitrary topological changes. To summarize the basic ideas, a contour is represented implicitly as a zero level set of a time dependent scalar function (referred to as the level set function). Formally, a contour $C$ is described by $C = \{x : \phi(x, t) = 0\}$. To deform $C$, the level set function is evolved in time according to a set of partial differential equations (PDEs). The transition from the equations of motion for a parametrized curve (Eq. (5)) to a level set PDE is accomplished by a simple procedure. In general, the motion $\partial C/\partial t = \gamma n$ translates to the level set equation $\partial\phi/\partial t = \gamma\,|\nabla\phi|$ [11]. Thus, Eq. (5) gives the familiar level set equation:

$$\frac{\partial \phi}{\partial t} = -f(x, y)\,|\nabla\phi| \tag{6}$$
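To make Eq. (6) concrete, the sketch below advances a level set function by one explicit Euler step with the "balloon" choice f = 1. It is a minimal illustration under stated assumptions: the grid, the circle used as test shape, the convention that φ is negative inside the contour, and the use of central differences via `numpy.gradient` are all our own choices. Practical level set codes normally use upwind differencing for this term, which is omitted here for brevity.

```python
import numpy as np

def euler_step(phi, f, dt, h=1.0):
    """One explicit Euler step of d(phi)/dt = -f * |grad(phi)|  (Eq. (6)).

    Central differences are used for |grad(phi)|; this is a simplification."""
    d_axis0, d_axis1 = np.gradient(phi, h)      # derivatives along rows and columns
    grad_norm = np.sqrt(d_axis0**2 + d_axis1**2)
    return phi - dt * f * grad_norm

# Hypothetical test shape: signed distance to a circle of radius 3 (negative inside).
h = 0.1
coords = np.linspace(-5.0, 5.0, 101)
x, y = np.meshgrid(coords, coords)
phi0 = np.sqrt(x**2 + y**2) - 3.0

dt = 0.05
phi1 = euler_step(phi0, f=1.0, dt=dt, h=h)

# For a signed distance function |grad(phi)| = 1, so phi is lowered by dt near
# the contour and the zero level set moves outwards by roughly dt.
near_contour = np.abs(phi0) < 0.5
print(np.max(np.abs(phi1[near_contour] - (phi0[near_contour] - dt))))
```

Under the assumed sign convention the zero level set of `phi1` is a circle of radius of about 3 + dt, which is the expanding behaviour the text associates with the balloon force.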

The remainder of this section will describe how we modify the typical level set method update scheme to incorporate a momentum term as presented in Section 2.

3.1 Momentum for Minimizing Level Set Flows

We have noted that the contour evolving according to the Euler-Lagrange equation yields a gradient descent search. Recall that each contour can be represented as a point in the solution space (the structure of the space will depend on parameterization). Thus, we can approximate the direction of the gradient by computing the vector between two subsequent points. In the level set framework we achieve this by taking the difference between two subsequent time instances of the level set function, representing the entire level set function as one vector:

$$\nabla f(t_n) \approx \frac{\phi(t_n) - \phi(t_{n-1})}{\Delta t} \tag{7}$$

where $f$ is a cost function in compliance with the terminology used in Section 2. Note that this is indeed an approximation, depending on the time difference $\Delta t = t_n - t_{n-1}$. Following the ideas from Section 2, we update the level set function to incorporate a momentum term:

$$s(t_n) = -\eta(1 - \omega)\,\frac{\tilde{\phi}(t_n) - \phi(t_{n-1})}{\Delta t} + \omega s(t_{n-1}) \tag{8}$$

$$\phi(t_n) = \phi(t_{n-1}) + \Delta t\, s(t_n) \tag{9}$$

The complete procedure works as follows:

Procedure UpdateLevelset

1 Given the level set function $\phi(t_{n-1})$, compute the next (intermediate) time step $\tilde{\phi}(t_n)$. This is performed by evolving $\phi$ according to a PDE (such as Eq. (6)) using standard techniques (e.g. Euler integration).

2 Compute the approximate gradient by Eq. (7).

3 Compute a step $s(t_n)$ according to Eq. (8). This step effectively modifies the gradient direction by incorporating the momentum term as a fraction of the previous step $s(t_{n-1})$.

4 Compute the next time step $\phi(t_n)$ by Eq. (9). Note that this replaces the intermediate level set function computed in Step 1.

The procedure is very simple and is directly compatible with any type of level set implementation.
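The four steps can be sketched in a few lines of Python. This is a hedged illustration, not the authors' Matlab implementation: the function name `update_levelset`, the callable `pde_step` and the toy solver `shift_step` are hypothetical. One assumption deserves emphasis: the sketch treats the difference quotient of Step 2 as the search direction itself, so it enters the step with a positive sign and the choice ω = 0, η = 1 reproduces the plain PDE step. How the sign in front of the quotient is to be read depends on the convention chosen for ∇f, and this reading is ours. Reinitialization and step damping are left out.

```python
import numpy as np

def update_levelset(phi_prev, s_prev, pde_step, dt, eta, omega):
    """One momentum update of the level set function (sketch of Steps 1-4).

    pde_step(phi, dt) is a hypothetical callable that evolves phi by dt time
    units, e.g. by explicit Euler integration of the level set PDE."""
    # Step 1: intermediate level set function from the conventional solver.
    phi_tilde = pde_step(phi_prev, dt)
    # Step 2: difference quotient between the two time instances.
    # Assumption: it is used directly as the search direction.
    direction = (phi_tilde - phi_prev) / dt
    # Step 3: blend the new direction with the previous step (momentum).
    s = eta * (1.0 - omega) * direction + omega * s_prev
    # Step 4: the momentum step replaces the intermediate result of Step 1.
    phi = phi_prev + dt * s
    return phi, s

def shift_step(phi, dt):
    """Toy stand-in for a PDE solver: lowers phi at unit speed, which is what
    the balloon flow does to a signed distance function."""
    return phi - dt

phi = np.zeros(4)
s = np.zeros(4)
for _ in range(3):
    phi, s = update_levelset(phi, s, shift_step, dt=0.5, eta=1.0, omega=0.5)
print(phi, s)
```

Because the function only needs the previous level set function, the previous step and a black-box solver, it can wrap an existing implementation without changing it, which is the compatibility property stated in the text.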

4 Experiments

We now describe some details of the implementation and illustrate properties of the suggested method using two examples. Here we study 1D curves embedded in a 2D domain, but the approach readily generalizes to 2D surfaces in 3D given the level set framework.

4.1 Implementation Details

We have implemented the proposed ideas in Matlab using standard level set techniques based on [13, 14]. Reference code can be found online at the site http://dmforge.itn.liu.se/ssvm09/. Some details of our implementation are the following:


– The level set function is reinitialized (reset to a signed distance function) after Step 1 and Step 4. This is typically performed using the fast marching [15] or fast sweeping algorithms [16]. There are two reasons for this: Firstly it is required for stable evolution in time due to the use of explicit Euler integration. Secondly we want a momentum induced by the zero level set of $\phi$ (the contour), rather than all level sets of $\phi$. Reinitialization could be omitted, with the effect of introducing a momentum on all individual level sets. Interpreting each sample of $\phi$ as a parameter of the contour, this is equivalent to applying momentum on each parameter. While feasible, we have not experimented with momentum without incorporating reinitialization.

– We avoid instabilities by dampening $s(t_n)$ in Step 3 using a sigmoidal function:

$$\hat{s}(s(t_n), s_{max}) = \frac{2 s_{max}}{1 + e^{-2 s(t_n)/s_{max}}} - s_{max} \tag{10}$$

  where $s_{max}$ is the maximum step length allowed.

– Any explicit or implicit time integration scheme can be used in Step 1. Due to its simplicity, we have used explicit Euler integration which might require several inner iterations in Step 1 to advance the level set function by $\Delta t$ time units.

4.2 Weighted Region Based Flow

To verify our idea, we have used a simple energy functional based on a weighted region term (Eq. (4)) combined with a penalty on curve length for regularization. The goal is to maximize:

$$E(C) = \iint_{\Omega_C} f(x, y)\,dx\,dy - \alpha \oint_C ds \tag{11}$$

where $\alpha$ is a regularization weight parameter. The target function $f(x, y)$ is image based, computed using the approach in [17]. This method uses quadrature filters [18] across multiple scales to detect line structures. Taking the real part of the complex filter response, $f(x, y)$ gives positive values on the inside of linear structures, negative on the outside, and zero on the edges. Translating Eq. (11) to a level set PDE following [11] gives:

$$\frac{\partial \phi}{\partial t} = -f(x, y)\,|\nabla\phi| + \alpha\kappa\,|\nabla\phi| \tag{12}$$

where $\kappa$ is the curvature of the contour. First we illustrate some properties of the method with a synthetic test image depicted in Figure 2(a), which mimics the common problem of intensity variation in medical imaging. The intensity of the object ranges from 0.3 to 1, while the noise level is 0.1. This image yields the target function $f(x, y)$ in Figure 2(b) where bright and dark colors indicate positive and negative values respectively. As exemplified in our first experiment (Figure 3) the dip in contrast results in a local optimum in the solution space.
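The right-hand side of Eq. (12) can be evaluated on a grid as sketched below. The code is an illustration under our own assumptions: curvature is computed as the divergence of the normalized gradient of φ with central differences, φ is taken to be negative inside the contour (so a circle has positive curvature), and the circular test shape and constant f stand in for the image-derived target function, which is not reproduced here. The names `curvature` and `region_flow_step` are hypothetical.

```python
import numpy as np

def curvature(phi, h=1.0, eps=1e-12):
    """kappa = div(grad(phi) / |grad(phi)|), central differences (assumed scheme)."""
    d_y, d_x = np.gradient(phi, h)                 # axis 0 is y, axis 1 is x
    norm = np.sqrt(d_x**2 + d_y**2) + eps          # eps avoids division by zero
    d_ny_dy = np.gradient(d_y / norm, h)[0]
    d_nx_dx = np.gradient(d_x / norm, h)[1]
    return d_nx_dx + d_ny_dy

def region_flow_step(phi, f, alpha, dt, h=1.0):
    """One explicit Euler step of Eq. (12):
    d(phi)/dt = -f * |grad(phi)| + alpha * kappa * |grad(phi)|."""
    d_y, d_x = np.gradient(phi, h)
    grad_norm = np.sqrt(d_x**2 + d_y**2)
    return phi + dt * (-f + alpha * curvature(phi, h)) * grad_norm

# Hypothetical test shape: circle of radius 3, phi negative inside.
h = 0.1
coords = np.linspace(-5.0, 5.0, 101)
x, y = np.meshgrid(coords, coords)
phi0 = np.sqrt(x**2 + y**2) - 3.0

kappa = curvature(phi0, h)
phi1 = region_flow_step(phi0, f=1.0, alpha=0.7, dt=0.05, h=h)

# Grid point (x, y) = (3, 0) lies on the contour: row 50, column 80.
print("curvature on the contour:", kappa[50, 80])   # close to 1 / radius
print("change of phi on the contour:", phi1[50, 80] - phi0[50, 80])
```

On the circle the two terms compete as the functional suggests: the region term with f > 0 pushes the contour outwards, while the length penalty scaled by α pulls it inwards in proportion to the curvature.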

Fig. 2. Synthetic test image illustrating the presence of a local optimum in the solution space. (a) Input image. (b) Target function f(x, y).

Fig. 3. Iterations without momentum (conventional gradient descent). Panels: (a) time = 0, (b) time = 40, (c) time = 100, (d) time = 170, (e) time = 300, (f) time = 870.

Figure 3 shows the results after evolving the level set function by Eq. (12) until convergence without momentum, using conventional methods. We define convergence as |∇f|∞ < 0.03 (using the infinity/maximum norm), with ∇f given in Eq. (7). For this experiment we used parameters α = 0.7 and we reinitialized the level set function every fifth time unit. For comparison, Figure 4 shows the results after running our method using parameters α = 0.7, ω = 0.8, η = 10, s_max = 100, Δt = 5. Plots of the energy functional for both experiments are shown in Figure 5. Here, we plot the weighted area term and the length penalty term separately, to illustrate the balance between the two. Note that the functional without momentum in Figure 5(a) is monotonically increasing, due to the nature of gradient descent, while the functional with momentum visits a number of local maxima during the search.

Fig. 4. Iterations using momentum. Panels: (a) time = 0, (b) time = 20, (c) time = 40, (d) time = 60, (e) time = 150, (f) time = 200, (g) time = 245, (h) time = 320, (i) time = 460.

Fig. 5. Plots of energy functionals for synthetic test image in Figure 2(a). (a) Without momentum. (b) With momentum. Curves shown against time: energy functional, length penalty integral, target function integral.

To further exemplify the behaviour of our method, we created a slightly modified version of Figure 2(a), shown in Figure 6(a). In contrast to Figure 2(a), the shape in Figure 6(a) is disconnected, so the global optimum is expected to contain two separated regions. Not surprisingly, conventional gradient descent captures only a local minimum as displayed in Figure 7, while gradient descent with momentum succeeds in capturing the global solution as two separated regions (Figure 8). For this experiment, we used the same parameters as in Figure 3 and Figure 4.

Fig. 6. Synthetic test image illustrating the presence of a local optimum in the solution space. (a) Input image. (b) Target function f(x, y).

Fig. 7. Iterations without momentum (conventional gradient descent). Panels: (a) time = 0, (b) time = 200, (c) time = 515.

Fig. 8. Iterations using momentum. Panels: (a) time = 0, (b) time = 40, (c) time = 70, (d) time = 180, (e) time = 240, (f) time = 485.

As a third test image we used a 458 × 265 retinal image from the DRIVE database [19], shown in Figure 9(a). The target function f(x, y) is illustrated in Figure 9(b). As in the previous experiment, bright and dark colors indicate positive and negative values for f(x, y). The convergent result without momentum using parameters α = 0.07 and reinitialization every tenth time unit is shown in Figure 10, given the initial condition in Figure 10(a). Applying the idea of momentum yields the result in Figure 11, using the parameters α = 0.07, ω = 0.5, η = 1.3, s_max = 40, Δt = 10. The energy functionals are plotted in Figure 12 to display the convergence of both methods.

5 Results

The synthetic test image in Figure 2(a) illustrates a local optimum in the solution space when applying the parameters in our first experiment. As expected, the conventional gradient descent approach converges to this local optimum as depicted in Figure 3. In contrast, our proposed method gains enough momentum in order to overstep the optimum, while at the same time the global solution is reached more rapidly. The process (illustrated in Figure 4) intuitively expands the curve beyond a local optimum, followed by a retraction if the search does not provide any increase in that direction. Using a slightly modified input image, our second example shows that our method is capable of capturing global optima, even when the solution consists of separated regions (Figure 8). Our third example illustrates our method on real data using a retinal image. In Figure 10 we see that conventional gradient descent fails to capture many weak signal blood vessels. This is a typical case of local optima solutions introduced by noise and poor image contrast. Under the same conditions, gradient descent with momentum captures practically all visible vessels as shown in Figure 11. Note that this example does not include any verification of the accuracy of the segmented vessels. The primary purpose is to illustrate that our method reaches a stronger optimum value for the energy functional, as shown in Figure 12.

Fig. 9. Retinal image. (a) Input image. (b) Target f(x, y).

Fig. 10. Iterations without momentum (conventional gradient descent). Panels: (a) time = 0, (b) time = 20, (c) time = 40, (d) time = 100, (e) time = 200, (f) time = 400, (g) time = 600, (h) time = 1210.

Fig. 11. Iterations using momentum. Panels: (a) time = 0, (b) time = 20, (c) time = 40, (d) time = 100, (e) time = 200, (f) time = 400, (g) time = 600, (h) time = 820.

Fig. 12. Plots of energy functionals for the retinal image in Figure 9(a). (a) Without momentum. (b) With momentum. Curves shown against time: energy functional, length penalty integral, target function integral.

6 Conclusions and Future Work

In this paper we have presented the idea of gradient descent with momentum in the context of segmentation using the level set method. We have illustrated the drawbacks of conventional gradient descent and shown examples of how the solution is improved by adding momentum. In contrast to much of the previous work, we have improved the solution by changing the method of solving the optimization problem rather than changing the parameters of the energy functional. In the future, we will further study the general optimization problem of image segmentation to propose more efficient solutions. Regarding the particular idea of momentum, we will apply it to real applications and verify the quality of the results.

References 1. Boykov, Y., Kolmogorov, V.: Computing geodesics and minimal surfaces via graph cuts. In: Proc. ICCV 2003, October 2003, vol. 1, pp. 26–33 (2003) 2. Papandreou, G., Maragos, P.: Multigrid geometric active contour models. IEEE Transactions on Image Processing 16(1), 229–240 (2007) 3. Kenigsberg, A., Kimmel, R., Yavneh, I.: A multigrid approach for fast geodesic active contours. Technical Report CIS-2004-06, Technion–Israel Inst. Technol., Haifa (2004) 4. Paragios, N., Mellina-Gottardo, O., Ramesh, V.: Gradient vector ﬂow fast geometric active contours. IEEE Transactions on Pattern Analysis and Machine Intelligence 26(3), 402–407 (2004) 5. Goldenberg, R., Kimmel, R., Rivlin, E., Rudzsky, M.: Fast geodesic active contours. IEEE Transactions on Image Processing 10(10), 1467–1475 (2001)


6. Weickert, J., Kühne, G.: Fast methods for implicit active contour models. In: Geometric Level Set Methods in Imaging, Vision and Graphics. Springer, Heidelberg (2003) 7. Charpiat, G., Keriven, R., Pons, J.P., Faugeras, O.: Designing spatially coherent minimizing ﬂows for variational problems based on active contours. In: Proc. ICCV 2005, October 2005, vol. 2, pp. 1403–1408 (2005) 8. Sundaramoorthi, G., Yezzi, A., Mennucci, A.: Sobolev active contours. International Journal of Computer Vision 73(3), 345–366 (2007) 9. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning internal representations by error propagation, pp. 318–362. MIT Press, Cambridge (1986) 10. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, Heidelberg (2006) 11. Kimmel, R.: Fast edge integration. In: Geometric Level Set Methods in Imaging, Vision and Graphics. Springer, Heidelberg (2003) 12. Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations. Journal of Computational Physics 79, 12–49 (1988) 13. Osher, S., Fedkiw, R.: Level Set and Dynamic Implicit Surfaces. Springer, New York (2003) 14. Peng, D., Merriman, B., Osher, S., Zhao, H.K., Kang, M.: A pde-based fast local level set method. Journal of Computational Physics 155(2), 410–438 (1999) 15. Sethian, J.: A fast marching level set method for monotonically advancing fronts. Proceedings of the National Academy of Science 93, 1591–1595 (1996) 16. Zhao, H.K.: A fast sweeping method for eikonal equations. Mathematics of Computation (74), 603–627 (2005) 17. Läthén, G., Jonasson, J., Borga, M.: Phase based level set segmentation of blood vessels. In: Proc. ICPR 2008, Tampa, FL, USA, IAPR (December 2008) 18. Granlund, G.H., Knutsson, H.: Signal Processing for Computer Vision. Kluwer Academic Publishers, Netherlands (1995) 19. 
Staal, J., Abramoﬀ, M., Niemeijer, M., Viergever, M., van Ginneken, B.: Ridge based vessel segmentation in color images of the retina. IEEE Transactions on Medical Imaging 23(4), 501–509 (2004)

Optimization of Divergences within the Exponential Family for Image Segmentation

Francois Lecellier (1), Stephanie Jehan-Besson (2), Jalal Fadili (1), Gilles Aubert (3), and Marinette Revenu (1)

(1) Laboratoire GREYC, University of Caen, France
(2) Laboratoire LIMOS, University of Clermont-Ferrand, France
(3) Laboratoire J.A. Dieudonné, University of Nice Sophia-Antipolis, France

Abstract. In this work, we propose novel results for the optimization of divergences within the framework of region-based active contours. We focus on parametric statistical models where the region descriptor is chosen as the probability density function (pdf) of an image feature (e.g. intensity) inside the region and the pdf belongs to the exponential family. The optimization of divergences appears as a flexible tool for segmentation with and without intensity prior. As far as segmentation without reference is concerned, we aim at maximizing the discrepancy between the pdf of the inside region and the pdf of the outside region. Moreover, since the optimization framework is performed within the exponential family, we can cope with difficult segmentation problems including various noise models (Gaussian, Rayleigh, Poisson, Bernoulli ...). We also experimentally show that the maximisation of the KL divergence offers interesting properties compared to some other data terms (e.g. minimization of the anti-log-likelihood). Experimental results on medical images (brain MRI, contrast echocardiography) confirm the applicability of this general setting.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 137–149, 2009. © Springer-Verlag Berlin Heidelberg 2009

1 Introduction

We propose here to focus on the segmentation of homogeneous regions in noisy images using statistical region-based active contour models (RBAC). In RBAC, region-based terms can be advantageously combined with boundary-based ones [1, 2]. The evolution equation is generally deduced from a general criterion to minimize that includes both region integrals and boundary integrals. The combination of those two terms in the energy functional allows the use of photometric image properties, such as texture [3] and noise [4], as well as geometric properties such as the shape prior of the object to be segmented. In statistical region-based active contours, see [5] for a review, image features (e.g. intensity) are considered as random variables whose distribution may be parametric (e.g. Gaussian) or non parametric [6]. Classically, the authors consider the minimization of the anti-log-likelihood for segmentation [7, 8, 4]. In this paper, we rather focus on the optimization of distance between pdfs. Such distances or more generally divergences can be used in two different manners. On the one hand, they can be used for segmentation with distribution intensity prior and in this case, we aim at minimizing the distance between the pdf of the evolving region and a reference one. On the other hand, they can be used for segmentation without reference and in this second case, we aim


F. Lecellier et al.

at maximizing the distance between the pdf of the inside region and the pdf of the outside region. In the literature, the minimization of divergences between non-parametric pdfs was first proposed in [6] for video sequences. It was then developed for the tracking of cardiac structures in perfusion MRI (p-MRI) sequences in [9]. As far as segmentation using the maximization of divergences is concerned, some authors [10] have also proposed to exploit the maximization of the Bhattacharyya distance between non-parametric pdfs. On the other hand, divergences between Gaussian distributions have been developed for DTI segmentation in [11]. In this paper, we propose to set a general framework for the optimization of divergences between parametric pdfs within the exponential family. To the best of our knowledge, such a framework has never been studied for region-based active contour segmentation. The rationale behind using the exponential family is that it includes, among others, the Gaussian, Rayleigh, Poisson and Bernoulli distributions, which have proven useful to model the noise structure [4] in many real image acquisition devices (e.g. Poisson for photon counting devices such as X-ray or CCD cameras, Rayleigh for ultrasound images, etc.). Using shape derivative tools as in [12, 6], our effort focuses on constructing a general expression for the derivative of the energy (with respect to a domain), and on deriving the corresponding evolution speed. Our general framework is also specialized to some particular cases, such as the optimization of the Kullback-Leibler (KL) divergence [13], which gives a simple expression of the derivative. This theoretical framework is then detailed more explicitly and illustrated for the case of segmentation without reference.
In this case, we aim at maximizing the dissimilarity between the pdf of the intensity within the region inside the evolving contour and the pdf of the intensity within the region outside the contour. In other words, we perform a competition between the pdfs of these two regions through the maximization of divergences. Experimental results are given for the particular case of the KL divergence. We experimentally compare this data term to the classical minimization of the anti-log-likelihood [7, 14] for the segmentation of the White Matter in brain MRI, and we show that KL maximization is able to extract a single Gaussian from a mixture of Gaussians. We also show the applicability of our data term for the segmentation of the left ventricle in contrast echocardiography, where the noise is modelled using a Rayleigh distribution. In this paper, we first introduce our general setting and shape gradients in Section 2. In Section 3, we give some general results for the exponential family and then for the shape derivative of divergences between pdfs. These results are then specialized to the KL divergence using Maximum Likelihood Estimation (MLE) for the parameters. Experimental results for the maximization of the KL divergence are given in Section 4.

2 Optimization of Divergences between Pdfs: General Setting

In this section, we introduce our general setting for segmentation through the optimization of distances between pdfs, or more generally divergences.

2.1 General Setting

Consider a function $y : \mathbb{R}^n \to \chi \subset \mathbb{R}$ which describes the feature of interest. The term $y(x)$ then represents the value of the feature $y$ at location $x$, where $x \in \mathbb{R}^n$. Let $q(y, \Omega)$

Optimization of Divergences within the Exponential Family for Image Segmentation


be the probability density function (pdf) of the feature $y$ within the image region of interest. We now assume that we have a function $\Psi : \mathbb{R}^+ \times \mathbb{R}^+ \to \mathbb{R}^+$ which allows us to compare two pdfs. This function is small if the pdfs are similar and large otherwise. It allows us to introduce the following functional, which represents the distance, or more generally the divergence, between the current pdf estimate $q(y, \Omega)$ and another one $p(y)$, which may also depend on another domain:

$$D(\Omega) = \int_{\chi} \Psi\big(q(y, \Omega), p(y)\big)\, dy. \qquad (1)$$

The distance can be, for example, the symmetrized Kullback-Leibler divergence when

$$\Psi(q, p) = \frac{1}{2}\left( p(y) \log\frac{p(y)}{q(y,\Omega)} + q(y,\Omega) \log\frac{q(y,\Omega)}{p(y)} \right).$$

Such divergences represent a general setting for segmentation both with and without reference. Indeed, in segmentation problems, we generally search for regions that are homogeneous with respect to a given feature. We may then model the segmentation problem as the maximization of the distance between the pdf of the feature within the inside region and the pdf of the feature within the outside region. To fix ideas, let us consider a partition of an image into two regions, where $\Omega$ is the inside region and $\Omega^c$ the complementary outside region. The segmentation may then be formulated as the maximization of the following criterion:

$$D(\Omega, \Omega^c) = \int_{\chi} \Psi\big(q(y, \Omega), p(y, \Omega^c)\big)\, dy. \qquad (2)$$
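As a rough illustration of criterion (2), the following sketch evaluates the symmetrized Kullback-Leibler divergence between the empirical distributions of a feature inside and outside a region. It is only a sketch under stated assumptions: normalized histograms stand in for the pdfs (the paper itself works with parametric pdfs), and the helper names `symmetrized_kl`, `histogram` and `region_divergence`, the bin count and the flooring constant `eps` are all hypothetical choices made for this example.

```python
import math

def symmetrized_kl(q, p, eps=1e-12):
    """0.5 * sum_k [ p_k log(p_k/q_k) + q_k log(q_k/p_k) ] for two discrete pdfs."""
    sq, sp = float(sum(q)), float(sum(p))
    total = 0.0
    for qk, pk in zip(q, p):
        qk = max(qk / sq, eps)  # normalise, then floor to avoid log(0)
        pk = max(pk / sp, eps)
        total += pk * math.log(pk / qk) + qk * math.log(qk / pk)
    return 0.5 * total

def histogram(values, bins, lo, hi):
    """Counts of values per bin; values are assumed to lie in [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        k = min(max(int((v - lo) / width), 0), bins - 1)
        counts[k] += 1
    return counts

def region_divergence(pixels, inside, bins=8, lo=0.0, hi=1.0):
    """Criterion (2) on a flat list of pixel values; `inside` marks the region Omega."""
    h_in = histogram([v for v, m in zip(pixels, inside) if m], bins, lo, hi)
    h_out = histogram([v for v, m in zip(pixels, inside) if not m], bins, lo, hi)
    return symmetrized_kl(h_in, h_out)

pixels = [0.1, 0.1, 0.15, 0.9, 0.85, 0.9]
good = [True, True, True, False, False, False]   # separates dark from bright pixels
bad = [True, False, True, False, True, False]    # mixes both populations
print(region_divergence(pixels, good) > region_divergence(pixels, bad))
```

A partition that separates the two intensity populations yields a larger divergence than one that mixes them, which is the behaviour the maximization of (2) relies on.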

On the other hand, we can also consider that we have a reference histogram $p_{ref}$ and that we search for the domain that minimizes the divergence between $q$ and $p_{ref}$. This last framework may be applied to tracking or to supervised segmentation, where a reference pdf is learned on the region of interest. The theoretical results given in this paper can be used for both applications.

2.2 Shape Gradient Descent

In order to find an optimum, we perform a shape gradient descent using region-based active contours. We then have to compute the derivative of the criterion with respect to the domain using shape derivation tools [15]. Shape derivative tools applied to region-based active contours are described in [12, 6], and we do not recall all the definitions in this paper. Let us just recall that, from the shape derivative, we can derive the evolution equation that will drive the active contour towards a (local) minimum of the criterion. Let us suppose that the shape derivative of the criterion $D(\Omega)$ in the direction $V$ may be written as follows:

$$\langle D'(\Omega), V \rangle = -\int_{\partial\Omega} \mathrm{speed}(x, \Omega)\,\big(V(x) \cdot N(x)\big)\, da(x), \qquad (3)$$

where $N$ is the unit inward normal to $\partial\Omega$ and $da$ its area element. Interpreting equation (3) as the $L^2$ inner product on the space of velocities, the straightforward choice when minimizing the distance $D(\Omega)$ is to take $V = \mathrm{speed}(x, \Omega)\,N$, which makes the derivative (3) non-positive. We can then deduce the following evolution equation:

$$\frac{\partial \Gamma}{\partial \tau} = \mathrm{speed}(x, \Omega)\, N(x). \qquad (4)$$


On the contrary, when maximizing the criterion, we take the opposite sign for the velocity.

3 General Results for Shape Derivative of Divergences within the Exponential Family

In this paper, we consider that the pdfs belong to the exponential family. In this case, the current pdf estimate $q(y, \Omega)$ is indexed by a set of parameters $\theta \in \Theta \subset \mathbb{R}^\kappa$ (e.g. for the Normal family we have $\kappa = 2$ and $\theta = (\mu, \sigma^2)^T$, where $\mu$ is the mean and $\sigma^2$ the variance). When using the exponential family, we rather index the pdf by $\eta$, which is the natural parameter, as explained below. In order to differentiate the criterion, we must take into account the dependence of the natural parameter on the domain. We then restrict our study to the full rank $\kappa$-parameter canonical exponential family [16]. For this family, we can establish a 1-1 correspondence between $\eta$ and the expectation of the natural statistic estimated over $\Omega$, and so compute directly the shape derivative of $D(\Omega)$. In the sequel, we first introduce the exponential family and some of its properties, and then explain the computation of the shape derivative. We then specialize our result to the case where the parameters are estimated using the Maximum Likelihood Estimation (MLE) method. We also give some results for the optimization of the Kullback-Leibler (KL) divergence. In this case, the shape derivative reduces to a very simple general expression.

3.1 The Exponential Family: Definition and Properties

The multi-parameter exponential family [17] is naturally indexed by a $\kappa$-dimensional real parameter vector $\eta$ and a $\kappa$-dimensional natural statistic vector $T(Y)$. We draw the reader's attention to the fact that $\eta$ is a function of $\theta \in \Theta$, which is the parameter of interest in most applications (for the Gaussian distribution, we have $\theta = (\mu, \sigma^2)^T$).

Definition 1. The family of distributions of a random variable (RV) $Y$, $\{q_\theta : \theta \in \Theta \subseteq \mathbb{R}^\kappa\}$, is said to be a $\kappa$-parameter canonical exponential family if there exist real-valued functions:

• $\eta(\theta) = [\eta_1, \ldots, \eta_\kappa]^T$ with $\eta_i : \Theta \subseteq \mathbb{R}^\kappa \to \mathbb{R}$
• $h : \mathbb{R} \to \mathbb{R}$
• $B : \Theta \to \mathbb{R}$
• $T = [T_1, \ldots, T_\kappa]^T : \mathbb{R} \to \mathbb{R}^\kappa$

such that the pdf $q_\theta(y)$ may be written as:

$$q_\theta(y) = h(y) \exp\big[\langle \eta(\theta), T(y) \rangle - B(\theta)\big] \quad \text{with} \quad y \in \chi \subset \mathbb{R}. \qquad (5)$$

The term $T$ is called the natural sufficient statistic and $\eta$ the natural parameter vector. The term $\langle \eta, T \rangle$ denotes the scalar product. Letting the model be indexed by the natural parameter $\eta$ rather than $\theta$, the canonical $\kappa$-parameter exponential family generated by $T$ and $h$ is defined as follows:

$$q_\eta(y) = h(y) \exp\big[\langle \eta(\theta), T(y) \rangle - A(\eta)\big], \qquad (6)$$

with $A(\eta) = \log \int_{-\infty}^{+\infty} h(y) \exp\big[\langle \eta(\theta), T(y) \rangle\big]\, dy$. The natural parameter space is defined as $\mathcal{E} = \{\eta \in \mathbb{R}^\kappa ;\ -\infty < A(\eta) < +\infty\}$.


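Some Common Distributions. Table 1 provides a synthetic description of some common distributions of the exponential family.

Table 1. Some common canonical exponential families. $B(\alpha, \beta)$ is the Euler Beta function.

| Distribution | $\theta^T$ | $\eta(\theta)^T$ | $T(y)^T$ | $A(\eta)$ |
| Normal | $(\mu, \sigma^2)$ | $\left(\frac{\mu}{\sigma^2}, \frac{-1}{2\sigma^2}\right)$ | $(y, y^2)$ | $\frac{1}{2}\left(-\frac{\eta_1^2}{2\eta_2} - \log\frac{-\eta_2}{\pi}\right)$ |
| Gamma | $(\lambda, p)$ | $(-\lambda, p - 1)$ | $(y, \log y)$ | $-(\eta_2 + 1)\log(-\eta_1) + \log\Gamma(\eta_2 + 1)$ |
| Beta | $(r, s)$ | $(r - 1, s - 1)$ | $(\log y, \log(1 - y))$ | $\log B(\eta_1 + 1, \eta_2 + 1)$ |
| Poisson | $\mu$ | $\log\mu$ | $y$ | $e^{\eta}$ |
| Exponential | $\lambda$ | $-\lambda$ | $y$ | $-\log(-\eta)$ |
| Rayleigh | $\theta^2$ | $-1/(2\theta^2)$ | $y^2$ | $-\log(-2\eta)$ |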

Properties. The following results will be useful for our RBAC scheme based on the exponential family. Their proofs may be found in [16]. These properties give us a relation between the parameters $\eta$ and the domain $\Omega$ through the expectation of the natural statistics $T(Y)$. The first theorem provides general relations between the gradient of $A$ and the expectation of $T(Y)$, while the second theorem allows us to establish a 1-1 correspondence between $\eta$ and $E[T(Y)]$ (for the full rank exponential family). Such a relation may then be used to express the parameter $\eta$ and to differentiate it with respect to the domain.

Theorem 1. Let $\{q_\eta : \eta \in \mathcal{E}\}$ be a $\kappa$-parameter canonical exponential family with natural sufficient statistic $T(Y)$ and open natural parameter space $\mathcal{E}$. We then have the following properties:

1. $\mathcal{E}$ is convex.
2. $A : \mathcal{E} \to S \subseteq \mathbb{R}$ is convex.
3. $E[T(Y)] = \nabla A(\eta)$.
4. $\mathrm{Cov}[T(Y)] = \ddot{A}(\eta)$.

where $\nabla A = \left(\frac{\partial A}{\partial \eta_1}, \frac{\partial A}{\partial \eta_2}, \ldots, \frac{\partial A}{\partial \eta_\kappa}\right)^T$ represents the gradient of $A$, and $\ddot{A}$ is the Hessian matrix of $A$ with $\ddot{A}_{ij} = \frac{\partial^2 A}{\partial \eta_i \partial \eta_j}$.

The following theorem establishes the conditions of strict convexity of $A$, and then those for $\nabla A$ to be 1-1 on $\mathcal{E}$. This is a very useful result for optimization (derivation) purposes:

Theorem 2. Let $\{q_\eta : \eta \in \mathcal{E}\}$ be a full rank (i.e. $\mathrm{Cov}[T(Y)]$ is a positive-definite matrix) $\kappa$-parameter canonical exponential family with natural sufficient statistic $T(Y)$ and open natural parameter space $\mathcal{E}$. We have [16]:

1. $\eta \mapsto \nabla A(\eta)$ is 1-1 on $\mathcal{E}$.
2. The family may be uniquely parameterized by $\mu(\eta) \equiv E[T(Y)] = \nabla A(\eta)$.
3. The anti-log-likelihood function is a strictly convex function of $\eta$ on $\mathcal{E}$.

These results establish a 1-1 correspondence between $\eta$ and $E[T(Y)]$ such that

$$\mu = \nabla A(\eta) = E[T(Y)] \iff \eta = \phi\big(E[T(Y)]\big) \qquad (7)$$

holds uniquely, with $\nabla A$ and $\phi$ continuous.
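The identity $E[T(Y)] = \nabla A(\eta)$ of Theorem 1 can be checked numerically on the one-parameter rows of Table 1. The sketch below is only an illustration: the dictionary `FAMILIES`, the helper names and the finite-difference step are hypothetical choices, and the expectations $E[Y] = \mu$ (Poisson), $E[Y] = 1/\lambda$ (Exponential) and $E[Y^2] = 2\theta^2$ (Rayleigh) are the standard moments of these laws.

```python
import math

# For each family: theta -> eta(theta), eta -> A(eta), theta -> E[T(Y)].
FAMILIES = {
    "poisson": (lambda mu: math.log(mu),
                lambda eta: math.exp(eta),
                lambda mu: mu),
    "exponential": (lambda lam: -lam,
                    lambda eta: -math.log(-eta),
                    lambda lam: 1.0 / lam),
    "rayleigh": (lambda th2: -1.0 / (2.0 * th2),
                 lambda eta: -math.log(-2.0 * eta),
                 lambda th2: 2.0 * th2),
}

def grad_A(A, eta, step=1e-6):
    """Central finite-difference approximation of A'(eta)."""
    return (A(eta + step) - A(eta - step)) / (2.0 * step)

def check(name, theta):
    """Return (numerical A'(eta), closed-form E[T(Y)]) for the given family."""
    to_eta, A, mean_T = FAMILIES[name]
    return grad_A(A, to_eta(theta)), mean_T(theta)

for name, theta in [("poisson", 3.0), ("exponential", 2.0), ("rayleigh", 1.5)]:
    numeric, expected = check(name, theta)
    print(name, round(numeric, 6), expected)
```

Both values agree up to the finite-difference error, which is the 1-1 link between $\eta$ and $E[T(Y)]$ that relation (7) exploits.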


Estimation of the Hyperparameters. Relation (7) allows us to express the parameter $\eta$ as a function of $E[T(Y)]$. In order to estimate the parameters, we replace $E[T(Y)]$ by the empirical estimate of the mean, $\overline{T(Y)}$. This corresponds to the MLE of the parameter. Indeed, the MLE of $\eta$ corresponds to minimizing the anti-log-likelihood score (for independent and identically distributed (iid) data). By differentiating the anti-log-likelihood with respect to $\eta$, we find $\nabla A(\eta_{MLE}) = \overline{T(Y)}$. Note however that, in this case, this is the discrete sample mean. The following example illustrates this statement:

Example 1. When dealing with the Rayleigh distribution, we have $\eta = \frac{-1}{2\theta^2}$, $A(\eta) = -\log(-2\eta)$ and $T(y) = y^2$. By computing $A'(\eta) = \overline{T(Y)}$, we find that $-\frac{1}{\eta} = \frac{1}{|\Omega|} \int_\Omega y(x)^2\, dx$, which corresponds to the MLE of the parameter $\theta^2$, given by $\theta^2_{ML} = \frac{1}{2|\Omega|} \int_\Omega y(x)^2\, dx$.
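Example 1 can be reproduced on synthetic data. The sketch below is a minimal illustration, assuming the region is given as a flat list of samples (so that the integral over $\Omega$ becomes a sum) and that the samples are drawn by inverse-CDF sampling; the function names, the seed and the sample size are hypothetical.

```python
import math
import random

def rayleigh_mle_theta2(samples):
    """MLE of theta^2: (1 / (2 |Omega|)) * sum of y(x)^2 over the region."""
    return sum(y * y for y in samples) / (2.0 * len(samples))

def natural_parameter(theta2):
    """Natural parameter of the Rayleigh law: eta = -1 / (2 theta^2)."""
    return -1.0 / (2.0 * theta2)

def sample_rayleigh(theta2, n, rng):
    # Inverse-CDF sampling: y = sqrt(-2 theta^2 log(1 - u)), u uniform on [0, 1).
    return [math.sqrt(-2.0 * theta2 * math.log(1.0 - rng.random())) for _ in range(n)]

rng = random.Random(0)
ys = sample_rayleigh(theta2=4.0, n=20000, rng=rng)
theta2_hat = rayleigh_mle_theta2(ys)
eta_hat = natural_parameter(theta2_hat)
print(theta2_hat, -1.0 / eta_hat)  # -1/eta equals the sample mean of y^2
```

The estimate `theta2_hat` should be close to the true value 4.0, and `-1 / eta_hat` equals the empirical mean of $y^2$, as stated in the example.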

3.2 Shape Derivative of the Criterion

In this section, we differentiate the functional (1) with respect to the domain. The dependence of the functional on the domain is due to the estimation of the parameter $\eta$ detailed above. In the sequel, for the sake of simplicity, we use the same symbol $\eta$ for the natural parameter and for its finite sample estimate over the domain (this is a slight abuse of notation, since the estimate should be written $\hat{\eta}$). We are now ready to state our main result:

Theorem 3. The Gâteaux derivative, in the direction of $V$, of the functional (1) is:

$$\langle D'(\Omega), V \rangle = \langle \nabla_V \eta, C \rangle, \qquad (8)$$

where $\nabla_V \eta = [\langle \nabla\eta_1(\Omega), V \rangle, \ldots, \langle \nabla\eta_\kappa(\Omega), V \rangle]^T$ is the Gâteaux derivative of $\eta$ in the direction of $V$, $\langle \cdot, \cdot \rangle$ is the usual scalar product of two vectors, and:

$$C = E\big[\partial_1 \Psi\big(q(Y, \eta(\Omega)), p(Y)\big)\,\big(T(Y) - E[T(Y)]\big)\big].$$

The term $\partial_1 \Psi$ denotes the partial derivative of $\Psi$ with respect to its first variable. The proof is detailed in Appendix A.2. We then have to compute the shape derivative $\nabla_V \eta$. Such a computation requires an estimation of the expectation $E[T(Y)]$, as explained in the next section.

3.3 Computing the Shape Derivative for the MLE Estimator

As mentioned in Section 3.1 (Estimation of the Hyperparameters), the expectation $E[T(Y)]$ can be replaced with the empirical estimate of the mean $\overline{T(Y)}$, which is computed over the considered domain $\Omega$. Using such an estimation for the hyperparameter, we can state the following lemma:

Lemma 1. Within the full rank exponential family, and using the MLE estimator for the hyperparameters, the shape derivative $\nabla_V \eta$ can be expressed as:

$$\nabla_V \eta = \ddot{A}(\eta)^{-1}\, \nabla_V(\overline{T}), \qquad (9)$$

where $\ddot{A}(\eta)^{-1} = I(\eta)^{-1}$ is the inverse of the Hessian matrix $\ddot{A}$ of $A$, which is also the Fisher information matrix $I$. The derivative $\nabla_V(\overline{T})$ is given by:

$$\nabla_V(\overline{T}) = \frac{1}{|\Omega|} \int_{\partial\Omega} \big(\overline{T}(y) - T(y(a))\big)\,(V \cdot N)\, da(x). \qquad (10)$$

The proof is given in Appendix A.3. We can then substitute the shape derivative of the natural parameters given in Lemma 1 into the general Theorem 3. The corollary that gives the shape derivative then follows:

Corollary 1. The Gâteaux derivative, in the direction of $V$, of the functional (1) is:

$$\langle D'(\Omega), V \rangle = \frac{1}{|\Omega|} \int_{\partial\Omega} \sum_{i=1}^{\kappa} C_i \sum_{j=1}^{\kappa} \big[\ddot{A}(\eta)^{-1}\big]_{ij}\, \big(\overline{T_j}(y) - T_j(y(a))\big)\,(V \cdot N)\, da,$$

where the $\kappa$ components of the vector $C$ are defined as follows:

$$C_i = E\big[\partial_1 \Psi\big(q(Y, \eta(\Omega)), p(Y)\big)\,\big(T_i(Y) - \overline{T_i}(Y)\big)\big], \quad i \in [1, \kappa].$$

The term $\partial_1 \Psi$ denotes the partial derivative of $\Psi$ with respect to its first variable. To fix ideas, the functional $D(\Omega)$ can be chosen as the Kullback-Leibler divergence; in this case $\partial_1 \Psi(q, p) = \log q + 1 - \log p - \frac{p}{q}$. In order to compute the vector $C$ in Corollary 1, we can assume that the pdf $p$ belongs to the exponential family and to the same parametric law as the pdf $q$. Let us denote by $\eta_1$ the parameter of the pdf $p$. This parameter is supposed to be already computed, or to depend on another domain, and so does not depend on the domain $\Omega$. We then state the following lemma:

Lemma 2. When $p(y, \eta_1)$ and $q(y, \eta(\Omega))$ are two members of the exponential family that belong to the same parametric law, with respective parameters $\eta_1$ and $\eta$, and when the functional $D(\Omega)$ is chosen as the KL divergence, we find for the vector $C$ defined in Theorem 3:

$$C = \ddot{A}(\eta)(\eta - \eta_1) + \nabla A(\eta) - \nabla A(\eta_1).$$

A proof is given in Appendix A.4. This expression demonstrates that the derivative can be computed very simply using the natural parameters and the sufficient statistics of the law. Let us give two examples of computation, for the Rayleigh and the Gaussian laws.

Example 2. When dealing with the Rayleigh distribution, following Example 1, with $\theta^2 = \frac{1}{2}\overline{y^2}$, the term $C$ is equal to $C = 2\theta^2\left(\frac{\theta^2}{\theta_1^2} - \frac{\theta_1^2}{\theta^2}\right)$. We then find for the derivative of the KL divergence:

$$\langle KL'(\Omega), V \rangle = \frac{1}{|\Omega|} \int_{\partial\Omega} \frac{C}{2\theta^2}\left(1 - \frac{y(a)^2}{2\theta^2}\right)(V \cdot N)\, da(x). \qquad (11)$$

Example 3. When dealing with the Gaussian distribution, the term $C$ is equal to

$$C = \begin{pmatrix} \left(\frac{\sigma^2}{\sigma_1^2} + 1\right)(\mu - \mu_1) \\[4pt] \mu^2 + \sigma^2 - \mu_1^2 - \sigma_1^2 + 2\frac{\sigma^2}{\sigma_1^2}\left(\mu^2 - \mu\mu_1\right) + \sigma^4\left(\frac{1}{\sigma_1^2} - \frac{1}{\sigma^2}\right) \end{pmatrix}. \qquad (12)$$

We then find for the derivative of the KL divergence:

$$\langle KL'(\Omega), V \rangle = \frac{1}{\sigma^2 |\Omega|} \int_{\partial\Omega} \left[ -(y - \mu)\left( C_1\left(1 + \frac{2\mu^2}{\sigma^2}\right) - C_2 \frac{\mu}{\sigma^2} \right) + (y^2 - \sigma^2 - \mu^2)\left( C_1 \frac{\mu}{\sigma^2} - \frac{C_2}{2\sigma^2} \right) \right] (V \cdot N)\, da(x).$$
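The Gaussian vector $C$ of Example 3 can be cross-checked against the general expression of Lemma 2. The following sketch is an illustration only: it assumes the standard moments of the Normal law, namely $\nabla A(\eta) = (\mu, \mu^2 + \sigma^2)$ and $\ddot{A}(\eta) = \mathrm{Cov}[(Y, Y^2)]$, and the function names and test values are hypothetical.

```python
def gaussian_C_lemma2(mu, var, mu1, var1):
    """C = Hess A(eta) (eta - eta1) + grad A(eta) - grad A(eta1), Normal family."""
    d_eta = (mu / var - mu1 / var1, -0.5 / var + 0.5 / var1)
    # Hessian of A, i.e. the covariance of (Y, Y^2) for Y ~ N(mu, var).
    hess = ((var, 2.0 * mu * var),
            (2.0 * mu * var, 4.0 * mu * mu * var + 2.0 * var * var))
    grad = (mu, mu * mu + var)        # E[(Y, Y^2)] under q
    grad1 = (mu1, mu1 * mu1 + var1)   # E[(Y, Y^2)] under p
    return tuple(hess[i][0] * d_eta[0] + hess[i][1] * d_eta[1] + grad[i] - grad1[i]
                 for i in range(2))

def gaussian_C_closed_form(mu, var, mu1, var1):
    """The two components of C as written in equation (12)."""
    r = var / var1
    c1 = (r + 1.0) * (mu - mu1)
    c2 = (mu * mu + var - mu1 * mu1 - var1
          + 2.0 * r * (mu * mu - mu * mu1)
          + var * var * (1.0 / var1 - 1.0 / var))
    return c1, c2

print(gaussian_C_lemma2(1.2, 0.7, -0.4, 2.0))
print(gaussian_C_closed_form(1.2, 0.7, -0.4, 2.0))
```

Both functions return the same pair up to rounding, and both return $(0, 0)$ when the two laws coincide, so the speed vanishes when $q = p$.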

4 Maximisation of Divergences

In this section, we concentrate on the segmentation of an image into two regions (namely $\Omega$ and its complement $\Omega^c$) by maximizing the criterion (2).

4.1 Evolution Equation

When using the MLE estimator for the parameters, and noting that $\Omega$ and $\Omega^c$ share the same boundary with opposite normals, we take $\overline{T}(y) = \frac{1}{|\Omega|} \int_\Omega T(y(x))\, dx$ and $\overline{T}^c(y) = \frac{1}{|\Omega^c|} \int_{\Omega^c} T(y(x))\, dx$. Using Corollary 1 and the fact that $\langle D'(\Omega, \Omega^c), V \rangle = \langle \nabla_V \eta, C \rangle + \langle \nabla_V \eta^c, C^c \rangle$, we find for the evolution equation:

$$\frac{\partial \Gamma}{\partial \tau} = \left[ \frac{1}{|\Omega|} \sum_{i=1}^{\kappa} \sum_{j=1}^{\kappa} C_i(\Omega)\, \big[\ddot{A}(\eta)^{-1}\big]_{ij}\, \big(\overline{T_j}(y) - T_j(y(x))\big) - \frac{1}{|\Omega^c|} \sum_{i=1}^{\kappa} \sum_{j=1}^{\kappa} C_i^c(\Omega^c)\, \big[\ddot{A}(\eta^c)^{-1}\big]_{ij}\, \big(\overline{T_j}^c(y) - T_j(y(x))\big) \right] N.$$

For the KL divergence, the term $C$ is evaluated as explained in Section 3.3. A classical regularization term $\lambda\kappa$ is added, where $\lambda$ is a positive constant and $\kappa$ the curvature. As far as the numerical implementation is concerned, we use the level set method first proposed by Osher and Sethian [18].

4.2 Comparison with Other Methods in the Gaussian Case

In this section, we compare the behavior of our data term, based on the maximization of the symmetrized Kullback-Leibler divergence between parametric pdfs, to two other well-known region-based methods [7, 14]. The first method is the well-known Chan & Vese method [14]. Such a criterion implies a Gaussian distribution for the feature $y$ with a fixed variance. The corresponding evolution equation can be found in [14]. The second method was first proposed in [7] and aims at minimizing the anti-log-likelihood for a Gaussian distribution. The evolution equation can be found in [7]. In order to compare these terms, let us express the non-symmetrized KL divergence using the expectation under the pdf $q$, denoted by $E_q$, as follows:

$$D(q \| p) = E_q\big[\log q(Y, \eta_\Omega)\big] - E_q\big[\log p(Y, \eta_{\Omega^c})\big]. \qquad (13)$$


To get the gist of using the KL divergence as a criterion in an RBAC functional, consider the data $y_i = \{y(x) \mid x \in \Omega\}$ as an iid sequence from the statistical model $q(y, \eta_\Omega)$. Using the weak law of large numbers for a very large domain $\Omega$, the first term (which corresponds to the negative entropy) can then be expressed as $\frac{1}{|\Omega|} \int_\Omega \log q(y(x), \eta(\Omega))\, dx$. Maximizing the first term in the KL divergence can then be seen as equivalent to minimizing the anti-log-likelihood score [19] divided by the size of the sample (which corresponds to the entropy under the law of large numbers). Using the same assumptions, the second term of the KL divergence can be seen as the minimization of the plausibility of the data provided by $\Omega^c$ in the inside region $\Omega$. When using the symmetrized version, we act on both $\Omega$ and $\Omega^c$.

Let us now compare experimentally the behavior of these criteria for the extraction of a homogeneous region corrupted by Gaussian noise in an image. We take the example of the segmentation of the White Matter (WM) in T1-weighted brain MRI images. We run the three evolution equations using the Gaussian assumption for the pdf of the feature $y$ within each region. The feature $y$ is chosen as the intensity of the image. The initial contour is given in Figure 1(a), and we also show the two initial pdfs in Figure 1(b), namely $q_\eta(I, \Omega)$, which corresponds to the distribution of the intensity $I$ inside the region $\Omega$, and $q_{\eta^c}(I, \Omega^c)$, which corresponds to the distribution of $I$ inside the region $\Omega^c$ (i.e. outside the region $\Omega$). In Figure 1, we can observe the final active contour obtained using our criterion (2) and the two other criteria mentioned above. We can remark that our criterion acts as an extractor of the most important Gaussian in the initial mixture of Gaussians (see Figure 1(e)). The two other criteria separate the mixture without extracting a single Gaussian. So, with our method, we can directly obtain the White Matter of the brain without a multiphase scheme.

[Figure 1 panels: (a) initial contour, (b) associated pdfs, (c) Chan & Vese, (d) log-likelihood, (e) KL maximization. Each plot shows the curves hist_in and hist_out against Intensity (0 to 200).]

Fig. 1. T1-weighted brain MRI segmentation results (extraction of the White Matter). The pdf of the intensity inside the contour is drawn as a solid line, the pdf of the intensity outside the contour as a dotted line. (a): initial contour and (b): associated pdfs; column (c): final contour and pdfs for the Chan & Vese method [14]; column (d): for the log-likelihood method [7]; column (e): for the maximization of the KL divergence.

4.3 Examples of Applications

In this part, we consider two examples of application (brain MRI images and contrast echocardiography) using two different noise models (Gaussian and Rayleigh). Concerning 3D T1-weighted MRI images of the brain, the noise model is assumed to be represented by a Rician distribution [20]. For large signal intensities, the noise distribution can be considered as a Gaussian distribution (this is the case for the White Matter (WM) or the Gray Matter (GM)). We show in Figure 2 an example of WM segmentation by maximizing the KL divergence between Gaussian distributions. When evaluating quantitatively our results of WM segmentation on the simulated brain T1-weighted MRI images provided by the Montreal Neurological Institute (BrainWeb), we find a Dice coefficient of 0.91, a very low False Positive Fraction (FPF) of 0.8% and a True Positive Fraction (TPF) of 84%.
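For reference, overlap scores of this kind can be computed from a binary segmentation and a binary ground truth as sketched below. This is not the authors' evaluation code: the function name is hypothetical, and the normalization of the FPF by the number of background voxels is an assumption, since the text does not state which denominator was used.

```python
def overlap_scores(segmentation, ground_truth):
    """Return (Dice, TPF, FPF) for two equally long sequences of 0/1 labels."""
    tp = fp = fn = tn = 0
    for s, g in zip(segmentation, ground_truth):
        if s and g:
            tp += 1
        elif s and not g:
            fp += 1
        elif g:
            fn += 1
        else:
            tn += 1
    dice = 2.0 * tp / (2 * tp + fp + fn)
    tpf = tp / (tp + fn)
    # Assumption: false positives are normalised by the number of background voxels.
    fpf = fp / (fp + tn)
    return dice, tpf, fpf

seg = [1, 1, 1, 0, 0, 0, 0, 0]
ref = [1, 1, 0, 1, 0, 0, 0, 0]
print(overlap_scores(seg, ref))
```

The sketch does not guard against empty masks, which would lead to a division by zero.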

(a) 3D rendering of the WM

(b) slice 72

(c) slice 75

(d) slice 84

Fig. 2. 3D Segmentation of WM in a T1 brain MRI using KL maximization

As the Rayleigh distribution is well suited to model noise in echography [20], this noise model was applied to the segmentation of the left ventricle in contrast echocardiography. Final contours for several images of the sequence are shown in Figure 3. The segmentation remains accurate throughout the sequence. Note that experimental results reported in [21, 4] indicate that, when the appropriate noise model is used, segmentation results are more accurate and less sensitive to the choice of the regularization parameters.

frame 1

frame 31

frame 40

Fig. 3. Segmentation of the LV in a contrast echocardiographic sequence


References

1. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active contour models. International Journal of Computer Vision 1, 321–332 (1988)
2. Caselles, V., Kimmel, R., Sapiro, G.: Geodesic active contours. International Journal of Computer Vision 22(1), 61–79 (1997)
3. Aujol, J.F., Aubert, G., Blanc-Féraud, L.: Wavelet-based level set evolution for classification of textured images. IEEE Transactions on Image Processing 12(12), 1634–1641 (2003)
4. Martin, P., Réfrégier, P., Goudail, F., Guérault, F.: Influence of the noise model on level set active contour segmentation. IEEE PAMI 26, 799–803 (2004)
5. Cremers, D., Rousson, M., Deriche, R.: A review of statistical approaches to level set segmentation: integrating color, texture, motion and shape. International Journal of Computer Vision 72(2), 195–215 (2007)
6. Aubert, G., Barlaud, M., Faugeras, O., Jehan-Besson, S.: Image segmentation using active contours: Calculus of variations or shape gradients? SIAM Applied Mathematics 63(6), 2128–2154 (2003)
7. Zhu, S., Yuille, A.: Region competition: unifying snakes, region growing, and Bayes/MDL for multiband image segmentation. IEEE PAMI 18, 884–900 (1996)
8. Paragios, N., Deriche, R.: Geodesic active regions: A new paradigm to deal with frame partition problems in computer vision. JVCIR 13, 249–268 (2002)
9. Rougon, N., Discher, A., Prêteux, F.: Region-based statistical segmentation using informational active contours. In: SPIE Conf. on Mathematics of Data/Image Pattern Recognition, San Diego, CA (August 2006)
10. Michailovich, O., Rathi, Y., Tannenbaum, A.: Image segmentation using active contours driven by the Bhattacharyya gradient flow. IEEE Transactions on Image Processing 16, 2787–2801 (2007)
11. Wang, Z., Vemuri, B.: DTI segmentation using an information theoretic tensor dissimilarity measure. IEEE Transactions on Medical Imaging 24(10), 1267–1277 (2005)
12. Jehan-Besson, S., Barlaud, M., Aubert, G.: DREAM²S: Deformable regions driven by an Eulerian accurate minimization method for image and video segmentation. International Journal of Computer Vision 53, 45–70 (2003)
13. Kullback, S.: Information Theory and Statistics. Wiley, New York (1959)
14. Chan, T.F., Vese, L.A.: Active contours without edges. IEEE Transactions on Image Processing 10, 266–277 (2001)
15. Delfour, M., Zolésio, J.: Shapes and geometries. Advances in Design and Control. SIAM, Philadelphia (2001)
16. Bickel, P., Doksum, K.: Mathematical statistics: basic ideas and selected topics, 2nd edn., vol. I. Prentice-Hall, London (2001)
17. Koopman, P.: On distributions admitting a sufficient statistic. Trans. Am. Math. Soc. 39, 399–409 (1936)
18. Osher, S., Sethian, J.: Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations. Journal of Computational Physics 79, 12–49 (1988)
19. Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, New York (1991)
20. Goodman, J.: Some fundamental properties of speckle. J. of Optical Society of America 66, 1145–1150 (1976)
21. Lecellier, F., Jehan-Besson, S., Fadili, J., Aubert, G., Revenu, M.: Statistical region-based active contours with exponential family observations. In: ICASSP, vol. 2, pp. 113–116 (2006)


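A Appendix

A.1 Shape Derivation Tools

Let us recall this useful theorem [15], which will be used in the following proofs.

Theorem 4. The Gâteaux derivative of the functional $J(\Omega) = \int_\Omega f(x, \Omega)\, dx$ in the direction of $V$ is:

$$\langle J'(\Omega), V \rangle = \int_\Omega f_s(x, \Omega, V)\, dx - \int_{\partial\Omega} f(x, \Omega)\,(V \cdot N)\, da(x),$$

where $N$ is the unit inward normal to $\partial\Omega$, $da$ its area element and $f_s$ the shape derivative of $f$ [15].

A.2 Proof of Theorem 3

To compute $\langle D'(\Omega), V \rangle$, we must first get the derivative of $q(y(x), \eta)$ with respect to the domain, and apply the chain rule to $\Psi(q(y(x), \eta), p(y))$. To simplify the notation, we write the Eulerian derivative of $\eta$ as $\langle \eta'(\Omega), V \rangle = \nabla_V \eta = [\langle \eta_1'(\Omega), V \rangle, \ldots, \langle \eta_\kappa'(\Omega), V \rangle]^T$. Using the definition of $q(y, \eta)$ given in (6) and the chain rule applied to $A(\eta(\Omega))$, we obtain:

$$\langle q'(y, \eta), V \rangle = h(y)\,\big(\langle \nabla_V \eta, T(y) \rangle - \langle \nabla_V \eta, \nabla A(\eta) \rangle\big)\, e^{\langle \eta(\Omega), T(y) \rangle - A(\eta(\Omega))} = q(y, \eta)\, \langle \nabla_V \eta, T(y) - \nabla A(\eta) \rangle. \qquad (14)$$

By the chain rule applied to $\Psi(q(y(x), \eta), p(y))$, we get $\langle \Psi'(q(y, \eta), p(y)), V \rangle = \langle q'(y, \eta), V \rangle\, \partial_1 \Psi(q, p)$, which gives $\langle D'(\Omega), V \rangle = \int_\chi q(y, \eta)\, \partial_1 \Psi(q, p)\, \langle \nabla_V \eta, T(y) - \nabla A(\eta) \rangle\, dy$. We introduce

$$C = \int_\chi q(y, \eta)\, \partial_1 \Psi(q, p)\, \big(T(y) - \nabla A(\eta)\big)\, dy = E\big[\partial_1 \Psi(q, p)\, \big(T(Y) - E[T(Y)]\big)\big],$$

which completes the proof.

A.3 Proof of Lemma 1

When using the MLE, the term $E[T(Y)]$ can be empirically estimated by $\overline{T(Y)}$ and so differentiated easily with respect to the domain $\Omega$. We differentiate directly the expression $\nabla A(\eta) = \overline{T(Y)}$, which gives:

$$\sum_{j=1}^{\kappa} \langle \eta_j', V \rangle\, \frac{\partial^2 A}{\partial \eta_i \partial \eta_j}(\eta) = \langle \overline{T_i(Y)}', V \rangle \quad \forall i \in [1, \kappa], \qquad (15)$$

which can be written in the compact form $\nabla_V(\overline{T}) = \ddot{A}(\eta)\, \nabla_V \eta$. Restricting our study to the full rank exponential family, where $\ddot{A}(\eta)$ is a symmetric positive-definite, hence invertible, matrix (Theorem 2), the domain derivative of the parameters $\eta$ is uniquely determined by $\ddot{A}(\eta)^{-1}\, \nabla_V(\overline{T}) = \nabla_V \eta$, where $\nabla_V(\overline{T})$ is given by $\nabla_V(\overline{T}) = \frac{1}{|\Omega|} \int_{\partial\Omega} \big(\overline{T}(y) - T(y(a))\big)\,(V \cdot N)\, da(x)$ (using Theorem 4), and the lemma follows.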


A.4 Proof of Lemma 2

Since $p$ and $q$ belong to the same parametric law, they share the same $h(y)$, $T(y)$ and $A(\cdot)$, and then $\log(q) - \log(p) = \langle \eta - \eta_1, T(y) \rangle - A(\eta) + A(\eta_1)$. The value of $C$ is then $C = s_1 - s_2$, with:

$$s_1 = E\big[\big(\langle \eta - \eta_1, T(Y) \rangle - A(\eta) + A(\eta_1) + 1\big)\,\big(T_i(Y) - E[T_i(Y)]\big)\big],$$

$$s_2 = E\Big[\frac{p}{q}\,\big(T_i(Y) - E[T_i(Y)]\big)\Big] = E_p\big[T_i(Y) - E[T_i(Y)]\big].$$

Developing the expression of the expectation in the second term, we find $s_2 = E_p\big[T_i(Y) - E[T_i(Y)]\big] = \nabla A(\eta_1) - \nabla A(\eta)$ (componentwise). Using the linearity of the expectation and the fact that $E[T_j(Y)\, T_i(Y)] - E[T_i(Y)]\, E[T_j(Y)]$ designates the covariance matrix of the sufficient statistics $T$, and can then be replaced by $\ddot{A}(\eta)_{ij} = \mathrm{Cov}[T(Y)]_{ij} = \ddot{A}(\eta)_{ji}$, we find $s_1 = \sum_{j=1}^{\kappa} (\eta_j - \eta_{1j})\, \ddot{A}(\eta)_{ij}$, and then $C = \ddot{A}(\eta)(\eta - \eta_1) + \nabla A(\eta) - \nabla A(\eta_1)$.
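Taking the expectation under $q$ of the identity $\log(q) - \log(p) = \langle \eta - \eta_1, T(y) \rangle - A(\eta) + A(\eta_1)$ and using $E[T(Y)] = \nabla A(\eta)$ gives the non-symmetrized KL divergence in closed form, $D(q \| p) = \langle \eta - \eta_1, \nabla A(\eta) \rangle - A(\eta) + A(\eta_1)$. The sketch below checks this on the Poisson row of Table 1 ($\eta = \log\mu$, $A(\eta) = e^\eta$) against a truncated direct summation; the function names and the truncation level `kmax` are hypothetical choices for this illustration.

```python
import math

def kl_poisson_natural(mu, mu1):
    """<eta - eta1, A'(eta)> - A(eta) + A(eta1) with eta = log(mu), A = exp."""
    eta, eta1 = math.log(mu), math.log(mu1)
    return (eta - eta1) * math.exp(eta) - math.exp(eta) + math.exp(eta1)

def kl_poisson_direct(mu, mu1, kmax=200):
    """Truncated sum of q(k) * log(q(k) / p(k)) over k = 0..kmax."""
    total = 0.0
    for k in range(kmax + 1):
        log_q = k * math.log(mu) - mu - math.lgamma(k + 1)
        log_p = k * math.log(mu1) - mu1 - math.lgamma(k + 1)
        total += math.exp(log_q) * (log_q - log_p)
    return total

print(kl_poisson_natural(3.0, 5.0), kl_poisson_direct(3.0, 5.0))
```

Both values agree with the familiar expression $\mu \log(\mu/\mu_1) - \mu + \mu_1$, which suggests that only the natural parameters and $A$ are needed to evaluate the criterion.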

Convex Multi-class Image Labeling by Simplex-Constrained Total Variation

Jan Lellmann, Jörg Kappes, Jing Yuan, Florian Becker, and Christoph Schnörr

Image and Pattern Analysis Group (IPA), HCI, Dept. of Mathematics and Computer Science, University of Heidelberg
{lellmann,kappes,yuanjing,becker,schnoerr}@math.uni-heidelberg.de

Abstract. Multi-class labeling is one of the core problems in image analysis. We show how this combinatorial problem can be approximately solved using tools from convex optimization. We suggest a novel functional based on a multidimensional total variation formulation, allowing for a broad range of data terms. Optimization is carried out in the operator splitting framework using Douglas-Rachford Splitting. In this connection, we compare two methods to solve the Rudin-Osher-Fatemi type subproblems and demonstrate the performance of our approach on single- and multichannel images.

1 Introduction

In this paper, we study the variational approach

$$\inf_{u \in C} f(u), \qquad f(u) = -\int_\Omega \langle u(x), s(x) \rangle\, dx + \lambda\, \mathrm{TV}(u), \qquad \lambda > 0, \qquad (1)$$

for determining a labeling $u : \Omega \to \mathbb{R}^L$, that is, a contextual classification of each pixel $x \in \Omega$ into one out of $L$ classes, based on an arbitrary vector-valued similarity function $s(x) \in \mathbb{R}^L$ as input data that has been computed from image data beforehand. The objective function (1) comprises the common form of a data term plus a regularization term. The data term is given by the $L^2$ inner product of the assignment variables $u$ and the similarity function $s$, and the regularizer is a total variation (TV) formulation for vector-valued data,

$$\mathrm{TV}(u) = \int_\Omega \sqrt{\|\nabla u_1\|^2 + \cdots + \|\nabla u_L\|^2}\; dx. \qquad (2)$$
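To make the regularizer (2) concrete, the sketch below evaluates a discrete counterpart on a small 2D grid. It is only one possible discretization, chosen here as an assumption (forward differences, with a zero difference at the last row and column), and the function name `vector_tv` and the nested-list layout `u[l][i][j]` are hypothetical.

```python
import math

def vector_tv(u):
    """Discrete version of (2): sum over pixels of sqrt(sum_l |grad u_l|^2).

    u[l][i][j] is the l-th label channel at pixel (i, j). Gradients use forward
    differences, with a zero difference at the last row and column.
    """
    n_labels, rows, cols = len(u), len(u[0]), len(u[0][0])
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            sq = 0.0
            for l in range(n_labels):
                dx = u[l][i + 1][j] - u[l][i][j] if i + 1 < rows else 0.0
                dy = u[l][i][j + 1] - u[l][i][j] if j + 1 < cols else 0.0
                sq += dx * dx + dy * dy
            total += math.sqrt(sq)
    return total

# One-hot labeling of a 4x4 grid: label 0 on the left half, label 1 on the right half.
left = [[1.0, 1.0, 0.0, 0.0] for _ in range(4)]
right = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
print(vector_tv([left, right]))  # interface of length 4, each pixel contributes sqrt(2)
```

For this one-hot labeling, exactly two channels change across the interface, so each of the four interface pixels contributes $\sqrt{2}$ and a constant labeling contributes nothing.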

Furthermore, the constraint $u \in C$ restricts the vector field $u(x)$ at each location $x \in \Omega$ to lie in the standard probability simplex, that is, $u(x) \in \mathbb{R}^L_+$ and $\sum_{i=1}^{L} u_i(x) = 1$ for all $x \in \Omega$. Our work is motivated by the following observation. Suppose that at each pixel $x \in \Omega$, there is an unambiguous assignment (labeling) of the data $s(x)$ to some class $l \in \{1, \ldots, L\}$, represented by the corresponding $l$-th unit vector, $u(x) = e_l$. Then, an interface with area $A$ between two image regions labeled with $l$ and $l'$, respectively, adds $A\sqrt{2}$ to the regularization term iff $l \neq l'$, as all but two gradients under the square root vanish. As a result, under these assumptions and up to the immaterial constant $\sqrt{2}$, the TV term corresponds to the well-known Potts model, which assigns constant penalties to local changes of the labeling. A significant difference between the Potts model and our approach (1), however, is that the former amounts to solving a discrete combinatorial problem, whereas the latter is a continuous convex optimization problem. Experiments show that our approach (1) approximates discrete decisions fairly well (Fig. 1 and 2) by computing a global optimum of a single convex optimization problem. By contrast, the state-of-the-art discrete approach [1] approximates the combinatorial solution by solving a non-uniquely defined sequence of binary problems via graph cuts. This fact, along with the potential of continuous convex optimization for parallel implementations and its more robust dependency on (hyper-)parameters, motivated us to investigate the approach (1) as a promising model for a general "labeling submodule" within computer vision systems. To this end,

– We have a closer look at the data and regularization terms (section 2).
– We apply an operator splitting approach to (1) in order to decompose the computation of a globally optimal labeling into two independent computational steps: TV denoising for vector-valued data, and projection of the labeling vectors $u(x)$ on the canonical simplex (section 3).
– We evaluate two different algorithms for the TV denoising subroutine (section 4) and compare the performance of our convex method to a range of established graph cut-based approaches (section 5).

Fig. 1. Left: Noisy input image. Right: The labeled image based on the non-binary assignment u as global minimizer of the convex approach (1). The discrete problem is accurately solved by a continuous approach.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 150–162, 2009. © Springer-Verlag Berlin Heidelberg 2009

Related work. In contrast to the binary case with anisotropic discretization [2], multi-class energies are generally not submodular and thus cannot be optimized globally using graph cuts [3].
Some extensions exist, which ﬁnd a local minimum by solving a sequence of binary graph cuts [1]. The continuous formulation – optimization on the set of characteristic functions – is known as continuous cut [5]. Chan et al. [6] showed that this problem can be relaxed and solved on


Fig. 2. Output of the standard TV approach [4] for scalar-valued images applied to the noisy input image depicted in Fig. 1, for diﬀerent values of the regularization parameter λ. Irrespective of this value, the performance is worse than with the approach (1) (cf. Fig. 1, right), because the latter approximates the Potts model that does not depend on the size (contrast) of discontinuities. Consequently, the former approach cannot remove noise without degrading weak discontinuities, as is apparent above for the horizontal discontinuities.

a convex set, without losing global optimality. In contrast, our work is aimed at the multi-class case. In [7], a comparable approach based on [8] was presented, which relies on a natural ordering of the labels, as given in e.g. stereo reconstruction. An approach very similar to ours was recently presented in [9], where the authors use a different formulation of the total variation on vector fields, and an alternating optimization method. The (discrete) Potts model was studied in [10], where approximate solutions were computed by an LP relaxation with explicit constraints. In contrast, our approach considers the general TV term and a problem decomposition into efficiently solvable subproblems, without the need to introduce additional variables.

Notation. We consider the discretized version of our approach (1). Let $\Omega = \{1, \dots, n_1\} \times \cdots \times \{1, \dots, n_d\} \subseteq \mathbb{R}^d$, $d \in \mathbb{N}$, denote a regular image grid of $n := |\Omega|$ pixels. The (multidimensional) image space $X := \mathbb{R}^{n \times L}$ is equipped with the Euclidean inner product $\langle \cdot, \cdot \rangle_\Omega$ over the vectorized elements. We naturally identify $v = (v^1, \dots, v^L) \in \mathbb{R}^{n \times L}$ with $((v^1)^\top \cdots (v^L)^\top)^\top \in \mathbb{R}^{nL}$. Superscripts $v^i$ denote a collection of vectors, while subscripts $v_k$ denote vector components. Using the notation $e = (1, 1, \dots, 1)^\top$, the standard simplex on $\mathbb{R}^L$ and its extension $C$ on $\mathbb{R}^{n \times L}$ are given by

\[ \Delta_L := \{ v \in \mathbb{R}^L \mid v \geq 0 ,\ \langle e, v \rangle = 1 \} \quad \text{and} \quad C := \prod_{x \in \Omega} \Delta_L . \]

Define $\delta_C(x)$ to be $0$ iff $x \in C$, and $+\infty$ otherwise. Let $\mathrm{grad} := (\mathrm{grad}_1^\top, \dots, \mathrm{grad}_d^\top)^\top$ be the $d$-dimensional forward difference gradient operator for Neumann boundary conditions. Accordingly, $\mathrm{div} := -\mathrm{grad}^\top$ is the backward difference divergence operator for Dirichlet boundary conditions. These operators extend to $\mathbb{R}^{n \times L}$ via $\mathrm{Grad} := (I_L \otimes \mathrm{grad})$, $\mathrm{Div} := (I_L \otimes \mathrm{div})$, where $I_L$ is the $L \times L$ identity matrix. We will also need the convex sets

\[ B_\lambda := \Big\{ (p^1, \dots, p^L) \in \mathbb{R}^{d \times L} \ \Big|\ \Big( \sum_{i=1}^{L} \|p^i\|_2^2 \Big)^{1/2} \leq \lambda \Big\} , \tag{3} \]

\[ D_\lambda := \prod_{x \in \Omega} B_\lambda \subseteq \mathbb{R}^{n \times d \times L} , \qquad E_\lambda := \{ u \in \mathbb{R}^{n \times L} \mid u = \mathrm{Div}\, p ,\ p \in D_\lambda \} . \tag{4} \]

The discrete total variation on vector-valued data is then defined as

\[ \mathrm{TV}(u) := \sigma_{E_1}(u) = \sum_{x \in \Omega} \|G_x u\|_2 , \tag{5} \]

where $\sigma_M(u) := \sup_{p \in M} \langle u, p \rangle$ is the support function from convex analysis, and $G_x$ is an $(Ld) \times n$ matrix composed of rows of $\mathrm{Grad}$ such that $G_x u$ gives the gradients of all $u^i$ in $x$ stacked one above the other.
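As an illustration of definition (5), the following minimal sketch evaluates the discrete vector-valued TV on a small 2-D grid and checks the Potts interpretation from the introduction: a straight interface of length 4 between two hard labels contributes $4\sqrt{2}$. The function names and the nested-list layout `u[l][i][j]` are assumptions made for this example only, not part of the original method description.

```python
import math

def vector_tv(u):
    """Discrete isotropic TV of a multi-channel 2-D field u[l][i][j].

    Forward differences with Neumann boundary conditions: the difference
    across the last row/column is set to zero.
    """
    L, rows, cols = len(u), len(u[0]), len(u[0][0])
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            sq = 0.0
            for l in range(L):
                dx = u[l][i + 1][j] - u[l][i][j] if i + 1 < rows else 0.0
                dy = u[l][i][j + 1] - u[l][i][j] if j + 1 < cols else 0.0
                sq += dx * dx + dy * dy
            # ||G_x u||_2: all channel gradients at pixel x stacked together
            total += math.sqrt(sq)
    return total

def one_hot_field(labels, L):
    """Turn a 2-D label image into L indicator channels (u(x) = e_l)."""
    return [[[1.0 if lab == l else 0.0 for lab in row] for row in labels]
            for l in range(L)]

# 4x4 image: left half labeled 0, right half labeled 1 (interface length 4)
labels = [[0, 0, 1, 1] for _ in range(4)]
tv_two_regions = vector_tv(one_hot_field(labels, 2))
tv_constant = vector_tv(one_hot_field([[0] * 4 for _ in range(4)], 2))
```

At each of the four pixels adjacent to the interface exactly two channel differences are nonzero ($-1$ and $+1$), which yields the factor $\sqrt{2}$ per unit of interface length.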

2 Variational Approach

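Based on the introduced notation, our novel approach (1) reads

\[ \inf_{u \in C} f(u) , \qquad f(u) = \underbrace{-\langle u, s \rangle_\Omega}_{\text{data term}} + \underbrace{\lambda\, \mathrm{TV}(u)}_{\text{regularization term}} , \qquad \lambda > 0 . \tag{6} \]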

As the objective function $f$ and the constraint set $C$ are convex, the overall problem is convex as well. We will now define and motivate each term.

Data Term. The data term in (6) is fairly general. Any vector-valued similarity function $s$ can be used whose components $s(x)_i$ indicate the affinity of the data point at $x$ with class $i$. As an example, suppose we have image features $g(x)$, $x \in \Omega$, prototypical feature vectors $G = (G^1, \dots, G^L)$, as well as a distance measure $d$ on the features. We might think of $g$ as a grayscale image, of $G$ as some prototypical gray values, and of $d$ as a quadratic distance measure, possibly derived from a statistical noise model. The hard assignment of the pixel $x \in \Omega$ to a label (or class) $l(x) \in \{1, \dots, L\}$ should then be penalized by the distance $d(g(x), G^{l(x)})$ of the corresponding feature to the prototype of the assigned class. Denoting the negative distance by $s$, and summing up over the image domain, we see that

\[ \sum_{x \in \Omega} d\big(g(x), G^{l(x)}\big) = - \sum_{x \in \Omega} \langle s(x), u(x) \rangle \quad \text{for } u(x) = e_{l(x)} . \tag{7} \]
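Identity (7) can be checked numerically. The sketch below builds the similarity vector $s$ from a toy grayscale signal using the absolute ($\ell_1$) distance and compares the hard-assignment cost with the linear data term $-\langle u, s \rangle$. The helper names, the toy values, and the 0-based label indices are illustrative assumptions.

```python
def similarity(g, prototypes):
    """s(x)_i = -d(g(x), G^i) with d the absolute (l1) distance."""
    return [[-abs(gx - G) for G in prototypes] for gx in g]

def one_hot(labels, L):
    """Hard assignment u(x) = e_{l(x)} for every pixel x."""
    return [[1.0 if i == l else 0.0 for i in range(L)] for l in labels]

def data_term(u, s):
    """Linear data term -<u, s> summed over all pixels and classes."""
    return -sum(ui * si for ux, sx in zip(u, s) for ui, si in zip(ux, sx))

g = [0.1, 0.4, 0.9]            # toy gray values, one per pixel
prototypes = [0.0, 0.5, 1.0]   # prototypical gray values G^1..G^L
labels = [0, 1, 2]             # a hard labeling l(x), 0-based here

s = similarity(g, prototypes)
u = one_hot(labels, len(prototypes))
hard_cost = sum(abs(g[x] - prototypes[labels[x]]) for x in range(len(g)))
linear_cost = data_term(u, s)
```

Because `data_term` is linear in `u`, the same function can be evaluated for soft assignments in $C$ without modification.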

Thus, instead of looking for $l \in \{1, \dots, L\}^n$, we may equivalently look for $u \in \{e_1, \dots, e_L\}^n$. However, the right-hand side formulation has the advantage that it extends naturally to the soft assignment $u \in C$: we may now solve the easier problem of optimizing for $u$ on the convex set $C$. In our experiments, we chose $d(x, y) = \|x - y\|_1$, as the $\ell_1$-norm is still convex but known to be more robust against noise and outliers. However, $s$ is not restricted to representing distances. In fact, it may be arbitrarily nonlinear and nonconvex in $x$ and $g$, and involve nonlocal operations on $g$. The complexity is completely hidden within the precomputed vector $s$.

Regularization Term. Recall that the regularizer of (6) is defined in (5) as

\[ \mathrm{TV}(u) = \sup_{p \in D_1} \langle u, \mathrm{Div}\, p \rangle = \sum_{x \in \Omega} \|G_x u\|_2 . \tag{8} \]


This definition for vector-valued $u$ parallels the definition of the "isotropic" total variation measure in the scalar-valued case [11, 4, 12]. It is also known as MTV [13, 14, 15], and was recently studied in [16] in its continuous formulation. Contrary to the anisotropic discretization, where one would substitute the sum of 1-norms in (3), it is less biased towards edges parallel to the axes. See also [17] for an overview of TV-based research and applications.

Optimality. After solving the relaxed problem, it remains to show that a binary solution can be recovered. For the continuous, binary case, Chan et al. [6] showed that an exact solution can be obtained by thresholding at almost any threshold. However, their results do not immediately transfer to the discrete multi-class case. In particular, the crucial "layer cake" formula holds for $\ell_1$-, but not $\ell_2$-discretizations of the TV. Contrary to the binary case, it is not clear which rounding scheme to use for vector-valued $u$. For our experiments, we chose the final class label for each pixel $x$ as the index $l$ of the maximal $u^*_l(x)$ of the global optimum $u^*$ of (6). This defines a suboptimal discrete solution $u^*_t$. Bounding the error $f(u^*_t) - f(u^*_d)$ with respect to the unknown discrete optimum $u^*_d$ will be the subject of our future work.
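The rounding step described above amounts to a per-pixel argmax. A minimal sketch follows; the function name is hypothetical, labels are 0-based here whereas the text counts from 1, and ties are broken by the lowest index, which the text does not specify.

```python
def round_labeling(u):
    """Per-pixel argmax of a relaxed labeling u[x][l].

    Returns the hard labels and the corresponding one-hot vectors.
    Ties go to the lowest index (an assumption for this sketch).
    """
    labels = [max(range(len(ux)), key=lambda l: ux[l]) for ux in u]
    hard = [[1.0 if l == lab else 0.0 for l in range(len(ux))]
            for ux, lab in zip(u, labels)]
    return labels, hard

u_star = [[0.2, 0.5, 0.3],   # relaxed solution at pixel 0
          [0.7, 0.2, 0.1]]   # relaxed solution at pixel 1
labels, u_t = round_labeling(u_star)
```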

3 Optimization

Two basic problems arise concerning the optimization of (6): nondifferentiability of the objective function due to the TV term, and handling of the simplex constraint $u \in C$. We cope with the latter using the tight Douglas-Rachford splitting method as presented in the following section. We refer to [18] for the full derivations.

Douglas-Rachford Splitting. Minimization of a proper, convex, lower-semicontinuous (lsc) function $f : X \to \mathbb{R} \cup \{+\infty\}$ can be regarded as finding a zero of its (necessarily maximal monotone [19, Chap. 12]) subgradient operator $T := \partial f : X \rightrightarrows X$. In the operator splitting framework, $\partial f$ is assumed to be decomposable into the sum of two "simple" operators, $T = A + B$, of which forward and backward steps can practically be computed. Here, we consider the (tight) Douglas-Rachford splitting iteration [20, 21],

\[ z^{k+1} \in \big( J_{\tau A} (2 J_{\tau B} - I) + (I - J_{\tau B}) \big)(z^k) , \tag{9} \]

where $J_{\tau T} := (I + \tau T)^{-1}$ is the resolvent of $T$. Under the very general constraint that $A$ and $B$ are maximal monotone and $A + B$ has at least one zero, the sequence $(z^k)$ will converge to a point $z$, with the additional property that $x := J_{\tau B}(z)$ is a zero of $T$ ([22, Thm. 3.15], [22, Prop. 3.20], [22, Prop. 3.19], [23]). In particular, for $f = f_1 + f_2$, $f_i$ proper, convex, lsc with $\mathrm{ri}(\mathrm{dom}\, f_1) \cap \mathrm{ri}(\mathrm{dom}\, f_2) \neq \emptyset$ ($\mathrm{ri}(S)$ denoting the relative interior of a set $S$), it can be shown [19, Cor. 10.9] that $\partial f = \partial f_1 + \partial f_2$, and the $\partial f_i$ are maximal monotone. As $x \in J_{\tau \partial f_i}(y) \Leftrightarrow x = \arg\min_{x'} \{ (2\tau)^{-1} \|x' - y\|_2^2 + f_i(x') \}$, the computation of the resolvents reduces to proximal point optimization problems involving only the $f_i$.


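Application. For our specific problem, we split

\[ \inf_{u \in C} \big( f_1(u) + f_2(u) \big) , \qquad f_1(u) = -\langle u, s \rangle_\Omega + \lambda\, \mathrm{TV}(u) , \qquad f_2(u) = \delta_C(u) , \tag{10} \]

and get the following Douglas-Rachford scheme:

Algorithm 1. Outer loop (Douglas-Rachford)
1: choose some $z^0$ and a fixed step size $\tau > 0$
2: repeat
3: solve $u^k \leftarrow \arg\min_u \{ \frac{1}{2\tau} \|u - z^k\|^2 - \langle u, s \rangle + \sigma_{E_\lambda}(u) \}$
4: solve $w^k \leftarrow \arg\min_w \{ \frac{1}{2\tau} \|w - (2u^k - z^k)\|^2 + \delta_C(w) \}$
5: $z^{k+1} \leftarrow z^k + w^k - u^k$
6: until $\|u^k - u^{k-1}\|_\infty \leq \delta_{\mathrm{outer}}$.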

As $f$ is continuous and $C$ is compact, $f$ is bounded from below on $C$ and attains its minimum there. From the remarks in the last section, we get convergence of the scheme for the discrete case: $\delta_C(w)$ and $\sigma_{E_\lambda}$ are both proper, convex, lsc with $\mathrm{dom}\, \sigma_{E_\lambda} = \mathbb{R}^{n \times L}$ and $\mathrm{ri}(C) \neq \emptyset$. In practice, one has to deal with solutions of the subproblems with limited accuracy. While there are extensions of the convergence result that take these inexact solutions into account [22, Prop. 4.50], they require the subproblems to be solved with increasing accuracy. However, we found that the method generally converged even though these requirements were not met.
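To make the structure of the outer loop concrete, the following sketch runs steps 3 to 5 of the Douglas-Rachford scheme on a deliberately simplified problem: the regularizer is switched off ($\lambda = 0$), so that step 3 has the closed form $u^k = z^k + \tau s$ and no inner TV solver is needed. This simplification, the function names, and the sort-based simplex projection (a common alternative, not necessarily the finite algorithm of [24]) are assumptions of this example. With $\lambda = 0$ the minimizer of $-\langle u, s \rangle$ over the simplex is the vertex of the largest $s(x)_i$, which the iteration should reproduce.

```python
def project_simplex(v):
    """Euclidean projection of v onto {w >= 0, sum(w) = 1} (sort-based)."""
    cumsum, theta = 0.0, 0.0
    for k, val in enumerate(sorted(v, reverse=True), start=1):
        cumsum += val
        t = (cumsum - 1.0) / k
        if val - t > 0:          # holds for k = 1..rho, fails afterwards
            theta = t
    return [max(x - theta, 0.0) for x in v]

def douglas_rachford_no_tv(s, tau=1.0, tol=1e-9, max_iter=1000):
    """Outer loop for the lambda = 0 case; s[x][i] is the similarity."""
    n, L = len(s), len(s[0])
    z = [[0.0] * L for _ in range(n)]
    u_prev = None
    for _ in range(max_iter):
        # step 3 (lambda = 0): prox of the linear term is a plain shift
        u = [[z[x][i] + tau * s[x][i] for i in range(L)] for x in range(n)]
        # step 4: per-pixel projection of 2u - z onto the simplex
        w = [project_simplex([2 * u[x][i] - z[x][i] for i in range(L)])
             for x in range(n)]
        # step 5: update of the auxiliary variable
        z = [[z[x][i] + w[x][i] - u[x][i] for i in range(L)] for x in range(n)]
        if u_prev is not None and max(
                abs(a - b) for ux, px in zip(u, u_prev)
                for a, b in zip(ux, px)) <= tol:
            break
        u_prev = u
    return u

s = [[0.1, 0.5, 0.2],   # pixel 0: class 1 has the largest affinity
     [0.9, 0.3, 0.0]]   # pixel 1: class 0 has the largest affinity
u_opt = douglas_rachford_no_tv(s)
```

For $\lambda > 0$, only step 3 changes: the shift is replaced by a vector-valued ROF solve, as discussed in the next section.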

4 Inner Loop Optimization

The second subproblem (Alg. 1, step 4) is a projection on the constraint set, $w^k = \Pi_C(2u^k - z^k)$, which requires one projection on the low-dimensional unit simplex $\Delta_L$ per $x \in \Omega$. These projections can be computed in a finite number of steps [24]. The first subproblem (step 3) is equivalent to

\[ u^k = \arg\min_u \ \frac{1}{2} \|u - (z^k + \tau s)\|^2 + (\tau\lambda)\, \mathrm{TV}(u) , \tag{11} \]

i.e. an extension to vector-valued $u$ of the classical Rudin-Osher-Fatemi (ROF, TV-$L^2$) problem with regularization parameter $\tau\lambda$. Many methods have been suggested to solve the ROF problem, e.g. PDE, fixpoint, or interior point methods for primal [4, 25], dual [26, 27, 28], or mixed [29] formulations. We evaluate two approaches: First, we formulate a particularly simple gradient projection method in the operator splitting framework, cf. [30]. This scheme was introduced in [27] and extended to the multidimensional case in [31] (see also [16]). The second approach is based on the fast half-quadratic method of Yang et al. [15].

Forward-backward approach. The optimality condition of step 3, $\tau^{-1}(z^k - u) + s \in \partial\sigma_{E_\lambda}(u)$, can be rewritten as $u = \tau \big( (z^k/\tau + s) - \Pi_{E_\lambda}(z^k/\tau + s) \big)$. To compute the projection $\Pi_{E_\lambda}$, we use the dual representation,

\[ \Pi_{E_\lambda}(x) = \arg\min_{q \in E_\lambda} \frac{1}{2} \|q - x\|_\Omega^2 = \mathrm{Div}\ \arg\min_p \Big\{ \frac{1}{2} \|\mathrm{Div}\, p - x\|_\Omega^2 + \delta_{D_\lambda}(p) \Big\} . \tag{12} \]


Using a simple forward-backward splitting for the inner problem results in the (gradient projection) update rule $p^{j+1} = \Pi_{D_\lambda}\big( p^j - \nu\, \mathrm{Div}^\top (\mathrm{Div}\, p^j - x) \big)$. The projection $\Pi_{D_\lambda}$ can be computed explicitly and is separable in $x$, while the inner part can be computed for all models independently. This opens up the method to parallelization. Convergence is guaranteed for $\nu < 2 / \|\mathrm{Div}^\top \mathrm{Div}\|$ (see e.g. [22, Thm. 3.12]). Extending the argument in [26, Thm. 3.1], we find that $\|\mathrm{div}\| \leq \sqrt{4d}$. Accordingly, we may set $\nu < \frac{1}{2d}$. In our experiments, we set $\nu = 0.95 \cdot \frac{1}{2d}$ to avoid numerical problems close to the theoretical maximum. Wrapping up, we have

Algorithm 2. Inner loop, forward-backward approach
1: $x \leftarrow z^k/\tau + s$, choose arbitrary $p^0 \in \mathbb{R}^{n \times d \times L}$
2: repeat
3: $p^{j+1} = \Pi_{D_\lambda}\big( p^j - \nu\, \mathrm{Div}^\top (\mathrm{Div}\, p^j - x) \big)$
4: until $\|p^{j+1} - p^j\|_\infty \leq \delta_{\mathrm{inner}}$
5: $u^k \leftarrow \tau (x - \mathrm{Div}\, p^{j+1})$.
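The gradient projection iteration can be sketched for the simplest setting, a scalar 1-D signal ($d = 1$, $L = 1$), where $\Pi_{D_\lambda}$ reduces to clipping each dual component to $[-\lambda, \lambda]$. The code solves $\min_u \frac{1}{2}\|u - f\|^2 + \lambda\,\mathrm{TV}(u)$ directly, i.e. the form (11) with $f = z^k + \tau s$ and $\tau\lambda$ written as `lam`; this rescaling, the function names, and the 1-D restriction are assumptions of this example. Since $\mathrm{Div}^\top = -\mathrm{grad}$, the update reads $p \leftarrow \Pi(p + \nu\, \mathrm{grad}(\mathrm{div}\, p - f))$.

```python
def grad(u):
    """Forward differences, Neumann boundary: last difference is zero."""
    return [u[i + 1] - u[i] for i in range(len(u) - 1)] + [0.0]

def div(p):
    """Backward-difference divergence, the negative adjoint of grad."""
    n = len(p)
    out = []
    for i in range(n):
        right = p[i] if i < n - 1 else 0.0
        left = p[i - 1] if i > 0 else 0.0
        out.append(right - left)
    return out

def rof_1d(f, lam, tol=1e-12, max_iter=10000):
    """Gradient projection on the dual of the scalar 1-D ROF problem."""
    nu = 0.95 * 0.5                      # 0.95 * 1/(2d) with d = 1
    p = [0.0] * len(f)
    for _ in range(max_iter):
        residual = [a - b for a, b in zip(div(p), f)]
        # p - nu * Div^T(Div p - f), using Div^T = -grad, then clip
        p_new = [max(-lam, min(lam, pi + nu * gi))
                 for pi, gi in zip(p, grad(residual))]
        change = max(abs(a - b) for a, b in zip(p_new, p))
        p = p_new
        if change <= tol:
            break
    # u = f - Pi_{E_lam}(f) = f - Div p
    return [fi - di for fi, di in zip(f, div(p))]

step = [0.0, 0.0, 1.0, 1.0]
u_strong = rof_1d(step, lam=10.0)   # heavy smoothing: constant mean value
u_weak = rof_1d(step, lam=0.25)     # mild smoothing: step contrast reduced
```

For this step signal the exact ROF solutions are known: strong regularization flattens the signal to its mean 0.5, while `lam = 0.25` moves each two-pixel plateau by `lam / 2` towards the other.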

Half-quadratic approach. While the forward-backward method is simple and easy to implement, its convergence speed is in practice not satisfactory. As an alternative, we tested an ROF specialization of the general multichannel image restoration method by Yang et al. [15]. Starting from (11), the problem is to find

\[ u^k = \arg\min_u g(u) , \qquad g(u) := \frac{\mu}{2} \|u - f\|^2 + \mathrm{TV}(u) , \tag{13} \]

where $\mu := \frac{1}{\tau\lambda}$ and $f := z^k + \tau s$. Using a half-quadratic approach [32, 33], Yang et al. derive the splitting/penalty formulation

\[ (u, y) = \arg\min_{y_x \in \mathbb{R}^{Ld},\, x \in \Omega,\ u \in \mathbb{R}^{nL}} \ \sum_{x \in \Omega} \Big( \|y_x\| + \frac{\beta}{2} \|y_x - G_x u\|^2 \Big) + \frac{\mu}{2} \|u - f\|_\Omega^2 . \tag{14} \]

The parameter $\beta$ controls smoothing of the total variation; setting $\beta \geq n/(2\varepsilon)$ guarantees $\varepsilon$-suboptimality of the solution of the smoothed problem with respect to the original problem (for a derivation see [18]). Equation (14) can be solved using alternating minimization w.r.t. $u$ and the auxiliary variables $y_x$. The latter is highly parallelizable, as it boils down to $n$ separate explicit operations:

\[ y_x^{j+1} = \max\big\{ \|G_x u\| - \beta^{-1} ,\ 0 \big\} \, \frac{G_x u}{\|G_x u\|} . \tag{15} \]

On the other hand, minimizing (14) for $u$ amounts to solving

\[ \Big( \mathrm{Grad}^\top \mathrm{Grad} + \frac{\mu}{\beta} I_{nL} \Big) u^{j+1} = \mathrm{Grad}^\top y^{j+1} + \frac{\mu}{\beta} f \tag{16} \]

for $u^{j+1}$, where $y^{j+1}$ is a proper rearrangement of the $y_x$.
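The explicit update (15) is a vector soft-thresholding ("shrinkage") of the stacked gradient $G_x u$ at each pixel. A minimal sketch for a single pixel is given below; the function name is hypothetical, and the convention that a zero gradient maps to zero is an assumption that resolves the $0/0$ case.

```python
import math

def shrink(gx_u, beta):
    """Update (15) for one pixel: shrink the vector gx_u by 1/beta."""
    norm = math.sqrt(sum(c * c for c in gx_u))
    if norm == 0.0:
        return [0.0] * len(gx_u)     # convention for the 0/0 case
    scale = max(norm - 1.0 / beta, 0.0) / norm
    return [scale * c for c in gx_u]

y_large = shrink([3.0, 4.0], beta=1.0)   # norm 5 -> shrunk to norm 4
y_small = shrink([0.3, 0.4], beta=1.0)   # norm 0.5 < 1/beta -> zero
```

Since each pixel is processed independently, the loop over $x \in \Omega$ parallelizes trivially, which is the property the text refers to.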


Fig. 3. Results of the speed comparison between the forward-backward (FB) and half-quadratic (HQ) methods for the inner problem, applied to data from the first iteration of the outer problem (cf. Table 1). Left to right: Original input, FB with τλ = 5, HQ with τλ = 5, FB with τλ = 20, HQ with τλ = 20. Iteration counts were fixed at 80 resp. 300 to equalize the runtime for both approaches. For larger regularization parameters, the half-quadratic method outperforms the forward-backward approach as smoothness increases.

For periodic boundary conditions, Yang et al. solved (16) rapidly using FFT. In our case, Neumann boundary conditions and thus the Discrete Cosine Transform (DCT-2) [34] are appropriate. This requires 2L independent (parallelizable) individual DCTs, each of which can be computed efficiently in $O(n \log n)$. By alternating the above two steps, we can solve (14) for fixed $\beta$ large enough for any required suboptimality bound. In practice, convergence can be sped up by starting with a small $\beta$ and solving a sequence of problems for increasing $\beta$, warm-starting each with the solution of the previous problem. Given an arbitrary $u^0 \in \mathbb{R}^{nL}$, the complete algorithm reads

Algorithm 3. Inner loop, half-quadratic approach
1: while stopping criterion not satisfied do
2: compute $y^{j+1}$ from (15)
3: compute $u^{j+1}$ from $y^{j+1}$ and (16)
4: possibly increase $\beta$
5: end while

The stopping criterion can be based on the residual [15]. For our experiments, we set a fixed iteration count, as increasing $\beta$ at each step turned out to lead to the fastest convergence, and residuals for different $\beta$ are not comparable.

5 Experiments, Performance Evaluation

Inner Problem. We compared the half-quadratic approach to the conventional forward-backward method. The difficulty with the former lies in the choice of the update strategy for $\beta$. We chose a generalization of the exponential strategy outlined in [15]: Set $\beta = \beta_{\min}$ and update by multiplying with $c := (\beta_{\max}/\beta_{\min})^{1/K}$ for some $K$ until $\beta = \beta_{\max}$. We made the following observations:

– In order to rapidly minimize the objective function, it is best to use a continuation strategy, i.e. to increase $\beta$ at each step, rather than spending time on solving (14) exactly for each $\beta$.


– Increasing $K$ generally improves the quality of the result.
– For fixed $\beta_{\max}$ and $K$, there seems to be a unique optimal $\beta_{\min}$ that minimizes the final objective function value.

With the continuation strategy and fixed $\beta_{\max}$, we found the optimal $\beta_{\min}$ to usually lie in the range of $10^{-5}\beta_{\max}$ to $10^{-3}\beta_{\max}$. Unfortunately, there seems to be a strong dependency on the choice of $\lambda$ as well as the scale and complexity of $s$. We set $\beta_{\min} = 0.2 \cdot 10^{-4}\beta_{\max}$, which worked well for our data. $\beta_{\max}$ was set to $n/0.2$ according to a suboptimality bound of $\varepsilon = 0.1$ (section 4). We compared the performance of the two methods in terms of the objective function value for fixed runtime of the optimized Matlab implementations (Fig. 3, Table 1). For larger $\tau\lambda$, the half-quadratic method gives better results. For $\tau\lambda = 20$, fewer than 10 iterations are required to reach the quality of 300 iterations of the forward-backward method, giving a speedup of about 4-5. However, finding the optimal parameter set is more involved than for the forward-backward method.

Table 1. Run times t (in seconds), objective function values r and relative differences (r_HQ − r_FB)/r_HQ for the experiment in Fig. 3. For larger τλ, the half-quadratic method gives more accurate results in the same time.

τλ          0.1        1          2          5          10         20         50
t_HQ        1.14       1.23       1.20       1.31       0.98       0.95       1.08
t_FB        1.03       1.02       1.06       1.03       1.22       1.25       1.19
r_HQ        3901.9     27660.7    36778.5    40038.8    42262.8    44377.1    44752.5
r_FB        3901.9     27660.4    36760.6    40104.3    42924.3    46988.6    57504.9
rel. diff.  1.17e-16   1.24e-5    4.85e-4    -1.64e-3   -0.0156    -0.0588    -0.285
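The exponential update strategy for β described at the beginning of this section, together with the parameter choices quoted above, can be written down in a few lines. The function name and the example image size are assumptions of this sketch; only the formulas $c = (\beta_{\max}/\beta_{\min})^{1/K}$, $\beta_{\max} = n/(2\varepsilon)$ and $\beta_{\min} = 0.2 \cdot 10^{-4}\beta_{\max}$ are taken from the text.

```python
def beta_schedule(beta_min, beta_max, K):
    """Geometric continuation: beta_min * c**k for k = 0..K,
    with c = (beta_max / beta_min) ** (1 / K)."""
    c = (beta_max / beta_min) ** (1.0 / K)
    return [beta_min * c ** k for k in range(K + 1)]

# Small check with round numbers: factor c = 2, five values from 1 to 16
toy = beta_schedule(1.0, 16.0, 4)

# Choices quoted in the text, for a hypothetical 32 x 32 image
n = 32 * 32
eps = 0.1
beta_max = n / (2 * eps)          # equals n / 0.2
beta_min = 0.2e-4 * beta_max
betas = beta_schedule(beta_min, beta_max, K=80)
```

One value of the schedule would be consumed per iteration of the half-quadratic inner loop, which corresponds to the continuation strategy of increasing β at each step.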

Overall Problem. We evaluated the performance of our algorithm against five different methods in their publicly available implementations from the Middlebury MRF benchmark [35]: Belief Propagation (BP), Sequential Belief Propagation (BPS), Graph Cuts with alpha-expansion (GCE) and alpha-beta swap (GCS), and Sequential Tree Reweighted Belief Propagation (TRBPS). For each of the grayscale 32 × 32 images, 20 noisy copies were generated and segmented into four gray levels with fixed intensities. In view of the last section and in order not to mix up speed with accuracy issues, we used the forward-backward approach for the inner loop. We set $\delta_{\mathrm{inner}} = 1 \cdot 10^{-3}$, $\delta_{\mathrm{outer}} = 2 \cdot 10^{-2}$, and $\tau = 1$. For small $\lambda$, our method shows results comparable to the other approaches with respect to the number of bad labels. We point out again that this solution to the non-binary labeling problem is achieved by solving the convex optimization problem (6) followed by local rounding as explained in section 2. In contrast to our method, the MRF benchmark algorithms optimize the anisotropic energy. To compensate, their $\lambda$ was scaled by a common factor of $\approx \sqrt{2}$ that was found empirically. Nevertheless, the discretization gives them a small advantage on images with axis-parallel edges (experiments 1 and 2).

Fig. 4. Exemplary grayscale segmentation results for the benchmarked methods for four labels. Left to right: Noisy input data, final results for BP, BPS, GCE, GCS, TRWS, and the proposed method (TV). λ was manually chosen for each method. Axis-parallel edges are better recovered by the anisotropic methods, while our isotropic discretization has an advantage on diagonal edges.

[Plots for Fig. 5: incorrect labels (mean %) and standard deviation, each plotted against λ from 0 to 1.4, for BP, BPS, GCE, GCS, TRWS, and TV.]

Fig. 5. Error rates for the ﬁrst experiment in Fig. 4. For each λ, all experiments were repeated 20 times with random noise (zero-mean Gaussian with σ = 0.45, 0.35, 0.25 resp. 0.35 for experiments 1-4 and image intensities in [0, 1]), and the percentage of incorrectly assigned labels compared to ground truth was recorded. Sequential Belief Propagation (BPS) generally performed worst, while our method (TV) was on par with the others, in particular for lower λ. The ﬁgure also reveals that belief propagation (BP) gets stuck in a good, but often inferior local optimum, and does not respond to larger values of λ, i.e. stronger regularization requested by the user.

Figure 6 demonstrates the performance of our algorithm for color segmentation. Only few outer iterations (20 in our case) are necessary for accurate optimization.


Fig. 6. Performance of our method for four-class segmentation based on the $\ell_1$ color distance. Left to right: Ground truth, inspired by [29, 36]; ground truth overlaid with Gaussian noise, σ = 1; local nearest-neighbor labeling; our approach with λ = 0.7 after 20 outer iterations. The energy of the result is about 1% lower than the energy of the ground truth, suggesting that at this noise level, further improvements are limited by the model.

6 Conclusion and Future Work

In this paper, we presented a convex variational approach to solve the combinatorial multi-labeling problem for energies involving a general data term, total-variation-like regularizers, and simplex constraints. To enforce the simplex constraint, we based our approach on the globally convergent Douglas-Rachford operator splitting scheme. We evaluated two methods for efficiently solving the ROF-type subproblems, and showed that the half-quadratic approach allows faster convergence at the price of more involved parameter tuning. Experiments showed that the quality of the generated labelings is comparable to state-of-the-art discrete optimization methods, and can be achieved by just solving a convex optimization problem. Due to the generality of the data term, our method allows for a wide range of features or distance measures. Fully evaluating these possibilities in connection with variations of the TV measure is a subject of our future research.

Acknowledgements. Jing Yuan gratefully acknowledges support by the German National Science Foundation (DFG) under grant SCHN 457/9-1.

References

1. Boykov, Y., Veksler, O., Zabih, R.: Fast approximate energy minimization via graph cuts. PAMI 23(11), 1222–1239 (2001)
2. Boykov, Y., Kolmogorov, V.: An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. PAMI 26(9), 1124–1137 (2004)
3. Kolmogorov, V., Zabih, R.: What energy functions can be minimized via graph cuts? PAMI 26(2), 147–159 (2004)
4. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
5. Strang, G.: Maximal flow through a domain. Math. Prog. 26, 123–143 (1983)
6. Chan, T.F., Esedoğlu, S., Nikolova, M.: Algorithms for finding global minimizers of image segmentation and denoising models. J. Appl. Math. 66(5), 1632–1648 (2006)
7. Pock, T., Schönemann, T., Graber, G., Bischof, H., Cremers, D.: A convex formulation of continuous multi-label problems. In: ECCV, vol. 3, pp. 792–805 (2008)
8. Ishikawa, H.: Exact optimization for Markov random fields with convex priors. PAMI 25(10), 1333–1336 (2003)
9. Zach, C., Gallup, D., Frahm, J.M., Niethammer, M.: Fast global labeling for realtime stereo using multiple plane sweeps. In: VMV (2008)
10. Kleinberg, J., Tardos, E.: Approximation algorithms for classification problems with pairwise relationships: Metric labeling and MRFs. In: FOCS, pp. 14–23 (1999)
11. Ziemer, W.: Weakly Differentiable Functions. Springer, Heidelberg (1989)
12. Meyer, Y.: Oscillating Patterns in Image Processing and Nonlinear Evolution Equations. Univ. Lect. Series, vol. 22. AMS (2001)
13. Sapiro, G., Ringach, D.L.: Anisotropic diffusion of multi-valued images with applications to color filtering. Trans. Image Process. 5, 1582–1586 (1996)
14. Chan, T.F., Shen, J.: Image processing and analysis. SIAM, Philadelphia (2005)
15. Yang, J., Yin, W., Zhang, Y., Wang, Y.: A fast algorithm for edge-preserving variational multichannel image restoration. Tech. Rep. 08-09, Rice Univ. (2008)
16. Duval, V., Aujol, J.F., Vese, L.: A projected gradient algorithm for color image decomposition. CMLA Preprint (2008-21) (2008)
17. Chan, T., Esedoglu, S., Park, F., Yip, A.: Total variation image restoration: Overview and recent developments. In: The Handbook of Mathematical Models in Computer Vision. Springer, Heidelberg (2005)
18. Lellmann, J., Kappes, J., Yuan, J., Becker, F., Schnörr, C.: Convex multi-class image labeling by simplex-constrained total variation. TR, U. of Heidelberg (2008)
19. Rockafellar, R., Wets, R.J.B.: Variational Analysis, 2nd edn. Springer, Heidelberg (2004)
20. Douglas, J., Rachford, H.H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. of the AMS 82(2), 421–439 (1956)
21. Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM Journal on Numerical Analysis 16(6), 964–979 (1979)
22. Eckstein, J.: Splitting Methods for Monotone Operators with Application to Parallel Optimization. PhD thesis, MIT (1989)
23. Eckstein, J., Bertsekas, D.P.: On the Douglas-Rachford splitting method and the proximal point algorithm for max. mon. operators. M. Prog. 55, 293–318 (1992)
24. Michelot, C.: A finite algorithm for finding the projection of a point onto the canonical simplex of $\mathbb{R}^n$. J. Optim. Theory and Appl. 50(1), 195–200 (1986)
25. Dobson, D.C., Curtis, Vogel, R.: Iterative methods for total variation denoising. J. Sci. Comput 17, 227–238 (1996)
26. Chambolle, A.: An algorithm for total variation minimization and applications. JMIV 20, 89–97 (2004)
27. Chambolle, A.: Total variation minimization and a class of binary MRF models. In: Rangarajan, A., Vemuri, B.C., Yuille, A.L. (eds.) EMMCVPR 2005. LNCS, vol. 3757, pp. 136–152. Springer, Heidelberg (2005)
28. Aujol, J.F.: Some algorithms for total variation based image restoration. CMLA Preprint (2008-05) (2008)
29. Chan, T.F., Golub, G.H., Mulet, P.: A nonlinear primal-dual method for total variation-based image restoration. J. Sci. Comput. 20, 1964–1977 (1999)
30. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. SIAM J. Multisc. Model. Sim. 4(4), 1168–1200 (2005)
31. Bresson, X., Chan, T.: Fast minimization of the vectorial total variation norm and applications to color image processing. Tech. Rep. 07-25, UCLA (2007)
32. Geman, D., Yang, C.: Nonlinear image recovery with halfquadratic regularization. IEEE Trans. Image Proc. 4(7), 932–946 (1995)
33. Cohen, L.: Auxiliary variables and two-step iterative algorithms in computer vision problems. JMIV 6(1), 59–83 (1996)
34. Strang, G.: The discrete cosine transform. SIAM Review 41(1), 135–147 (1999)
35. Szeliski, R., Zabih, R., Scharstein, D., Veksler, O., Kolmogorov, V., Agarwala, A., Tappen, M., Rother, C.: A comparative study of energy minimization methods for Markov random fields. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3952, pp. 16–29. Springer, Heidelberg (2006)
36. Hintermüller, M., Stadler, G.: An infeasible primal-dual algorithm for total bounded variation-based inf-convolution-type image restoration. J. Sci. Comput. 28(1), 1–23 (2006)

Geodesically Linked Active Contours: Evolution Strategy Based on Minimal Paths

Julien Mille and Laurent D. Cohen

CEREMADE, UMR CNRS 7534, Université Paris IX-Dauphine
Place du Maréchal de Lattre de Tassigny, 75016 Paris, France
{mille,cohen}@ceremade.dauphine.fr

Abstract. The proposed method is related to parametric and geodesic active contours as well as minimal paths, in the context of image segmentation¹. Our geodesically linked active contour model consists of a set of vertices connected by paths of minimal cost. This makes up a closed piecewise-defined curve, over which an edge or region energy functional is formulated. The greedy algorithm is used to move vertices towards a configuration minimizing the energy functional. This evolution technique ensures lower sensitivity to erroneous local minima than the usual gradient descent of the energy. Our method intends to take advantage of explicit active contours, minimal paths and greedy evolution techniques.

1 Introduction

Among well-known variational models for image segmentation, active contours have drawn lively interest since their introduction by Kass et al. [1]. Their key principle is the search for a curve minimizing an energy functional, which mainly depends on the adequacy of the curve to the target object. Active contours are implemented either with a parametric curve, in which case they are often referred to as 'snakes', or in an implicit fashion based on the level set framework [2] [3].

Early active contour models are mainly parametric and boundary-based, as the data term of the energy functional is an edge indicator function integrated along the curve. The Euler-Lagrange equation, determined by calculus of variations, indicates the minimizing flow to be followed by a gradient descent scheme. These models are dependent on curve parameterization and unable to adapt their topology. Moreover, gradient descent is sensitive to local minima of the energy functional. Parameterization invariance is achieved by the geodesic active contour model [4], which introduces a geometrically intrinsic functional, whereas topology adaptiveness is provided by the level set implementation.

Significant attempts have been made to decrease the sensitivity to local minima, based either on the gradient descent direction or on the minimization method itself. The balloon force [5] falls into the first category, as it adds a normal-oriented inflation or retraction component in order to increase the capture range of the snake. As regards the evolution process, several heuristics based on local searches have been proposed as alternatives to gradient descent, including dynamic programming [6] [7] and the greedy algorithm [8] [9]. The latter, which is subsequently addressed in the paper, considers the energy as a sum of curve point energies. It basically consists in iteratively moving curve points to locations minimizing their own energies, these locations belonging to a search window.

On the other hand, the minimal path approach by Cohen and Kimmel [10], which seeks a curve of minimal cost between two end-points, can be used to recover open and closed boundaries. It is closely related to the geodesic active contour with respect to the functional to be optimized, but has in addition the main benefit of finding a global minimum efficiently thanks to the Fast Marching technique [11].

In this paper, we deal with an explicit implementation of active contours, i.e. a discrete curve defined by control points, or vertices. The described method is related to both minimal paths and greedy search. Our geodesically linked active contour model is made up of a set of vertices connected by paths of minimal cost with respect to a boundary-based metric. We define search windows centered at each vertex and evolve vertices in a greedy fashion. Making a given vertex movable and keeping the other ones still, we consider every geodesically linked contour passing through the points in the window of the moving vertex. The vertex is finally moved to the location leading to the contour of smallest energy.

The motivation for this work resides in several points. Firstly, the minimal path approach alone can only find a minimizer of an edge functional, with one or several fixed input end-points. Conversely, our model is suitable for any energy functional, which we demonstrate by endowing it with edge-based or different region-based energies, including the minimal variance of the Chan and Vese model [12]. We believe that describing the curve with geodesics is pertinent whatever the energy functional is. Indeed, whether the functional holds edge, region and/or even shape prior terms, the major part of the final curve will be located on more or less salient edges. In comparison to snakes driven by gradient descent, the use of search windows significantly reduces sensitivity to erroneous local minima and to the tuning of energy weights.

¹ This work was partially supported by ANR grant NanoGPSCellulaire ANR-05NANO- 045-06.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 163–174, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 Background

2.1 Parametric and Geodesic Active Contours

The active contour model is represented as a plane curve $\Gamma$ with $C^2$ position vector $c(u) = [x(u)\ y(u)]$. Segmentation of an object of interest is performed by finding the curve minimizing an energy functional $E$, which has the general form

\[ E(\Gamma) = \int_0^1 L(c, c', c'')\, du \tag{1} \]

where $L$ is usually made up of internal terms regularizing the curve and external terms attaching the curve to image data. According to calculus of variations, the following variational derivative vanishes if the curve is a local minimizer of $E$:

\[ \frac{\delta E}{\delta \Gamma} = \frac{\partial L}{\partial c} - \frac{d}{du} \frac{\partial L}{\partial c'} + \frac{d^2}{du^2} \frac{\partial L}{\partial c''} \tag{2} \]

Curve evolution is usually performed by gradient descent, taking the opposite of the variational derivative as the descent direction. Given an image $I$ defined over $D \subset \mathbb{R}^2$, boundary-based models use an edge indicator $g$, which is a decreasing function of the gradient magnitude (the image is usually convolved with the derivative of a Gaussian). The original parametric snake [1] has the following energy and variational derivative:

\[ L_{\mathrm{snake}}(c, c', c'') = \frac{1}{2} \left( \alpha \|c'\|^2 + \beta \|c''\|^2 \right) + g(c) , \qquad \frac{\delta E_{\mathrm{snake}}}{\delta \Gamma} = -\alpha c'' + \beta c'''' + \nabla g \tag{3} \]

The energy functional of the snake is dependent on parameterization. This has an impact on discretization, since the energy varies with the sampling when the contour is implemented as a polygonal curve. The geodesic active contour (GAC) [4] solves the parameterization issue by introducing an intrinsic energy functional, weighting the edge indicator by the length element $\|c'\|$:

\[ L_{\mathrm{GAC}}(c, c', c'') = g(c)\, \|c'\| , \qquad \frac{\delta E_{\mathrm{GAC}}}{\delta \Gamma} = \left( \langle \nabla g, n \rangle - \kappa g \right) n \tag{4} \]

where $n$ and $\kappa$ are the unit inward normal vector and curvature, respectively. Hence, the flow resulting from the geometric energy also holds a regularization term. This model lends itself to level set implementation, allowing topology changes.

Boundary-based models driven by gradient descent, whether parametric or geodesic, are relatively blind to neighboring structures and may get trapped in local minima induced by noise. To increase the capture range, the balloon force was introduced in [5] for parametric contours, whereas an advection term is used in [3] for level sets. Despite such techniques, gradient descent may still cause the contour to miss or pass through significant boundaries. The minimal path method addresses this issue by finding a global minimum of the energy.

2.2 Minimal Paths

The minimal path approach by Cohen and Kimmel [10] aims at finding curves of minimal length in a Riemannian space endowed with a heterogeneous isotropic metric. The length of a path $C$ is:

\[ L(C) = \int_0^{L_{euclidean}(C)} \tilde{P}(C(s))\,ds = \int_0^1 \tilde{P}(C(u))\,\|C'(u)\|\,du \tag{5} \]

where $s$ is the arc length. The potential $\tilde{P}$, which defines the isotropic metric, should be chosen according to the application. Curves located on image boundaries are detected by using an edge-dependent potential $\tilde{P}(x) = w + g(x)$, where $w$ is a regularizing constant. Hence the cost of $C$ may be rewritten using the Euclidean length:

\[ L(C) = \int_0^{L_{euclidean}(C)} \left(w + g(C(s))\right) ds = w\,L_{euclidean}(C) + \int_0^{L_{euclidean}(C)} g(C(s))\,ds \tag{6} \]


J. Mille and L.D. Cohen

With respect to the energy functional to be minimized, the minimal path approach is similar to the geodesic active contour model, as can be seen in the term $L_{GAC}$ of eq. 4. However, the minimal path has the advantage of reaching the global minimum of the energy, given two fixed end-points $x_0$ and $x_1$. Starting from point $x_0$, the minimal action map $U_0$ is first calculated. It corresponds to the minimal cost integrated along a path starting at $x_0$ and ending at $x$:

\[ U_0(x) = \inf_{C} L(C) \quad \text{s.t.} \quad C(0) = x_0,\ C(1) = x \]

The action map $U_0$ is the viscosity solution of the Eikonal equation $\|\nabla U\| = \tilde{P}$ with initial condition $U(x_0) = 0$. This allows $U_0$ to be computed by the Fast Marching method [11], which is similar in principle to Dijkstra's graph search algorithm. Once the action map has been computed, the geodesic $\gamma$, i.e. the path of minimal action linking a point $x_1$ to $x_0$, is found by back-propagation starting from $x_1$ until $x_0$ is reached: $\gamma' = -\nabla U_0(\gamma)$. In its initial formulation, the minimal path method determines an open curve between two fixed end-points. It is also able to find closed contours by providing only one point on the final contour and detecting a saddle point on the minimal action map [10].

2.3 Greedy Algorithm

Along with dynamic programming [6] [7], greedy methods deal with discrete energy functionals. The greedy algorithm for active contours, as developed in [8], seeks a minimizer of the energy by means of a set of local optimizations. It is only applicable to explicit implementations, where the contour is represented as a polygon with $n$ vertices $\{v_i\}_{1 \le i \le n}$. The total energy is considered as a sum of vertex energies:

\[ E(\Gamma) = \sum_{i=1}^{n} E_{vertex}(v_i) \]

where $E_{vertex}$ is the discretization of the energy at a given vertex, using finite differences. Considering the snake term $L_{snake}$ of eq. 3, this gives:

\[ E_{vertex}(v_i) = \frac{1}{2}\left(\alpha \|v_i - v_{i-1}\|^2 + \beta \|v_{i+1} - 2v_i + v_{i-1}\|^2\right) + g(v_i) \tag{7} \]

Vertices are successively moved in order to minimize their own energies. At each iteration, a square window of width $m$ is considered around the current vertex. The energy of the latter is computed at each tested position $\tilde{v}_i$ in the window. The vertex is then moved to the position leading to the lowest energy, which is summarized by the evolution scheme, at iteration $t$:

\[ v_i^{(t+1)} = \arg\min_{\tilde{v}_i \in W(v_i^{(t)})} E_{vertex}(\tilde{v}_i) \]

where $W(x)$ is the window centered at point $x$. The initial greedy algorithm [8] requires $O(nm^2)$ operations. The window size has an obvious impact on computational cost, but also on convergence abilities. Indeed, the contour can capture farther structures as the window gets larger. The greedy algorithm is in essence a discrete optimization heuristic. The formulation of the variational derivative is not used and continuous calculus of variations is thus not necessary.
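The greedy scheme above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of [8]: the names `vertex_energy` and `greedy_iteration` are hypothetical, vertices are integer pixel tuples, the edge indicator `g` is any callable on a pixel, the window width `m` is assumed odd, and vertices are updated in place one after another.

```python
def vertex_energy(prev, cand, nxt, g, alpha, beta):
    """Discrete snake energy of eq. 7 evaluated at the candidate position `cand`."""
    dx, dy = cand[0] - prev[0], cand[1] - prev[1]      # first finite difference
    ddx = nxt[0] - 2 * cand[0] + prev[0]               # second finite difference (x)
    ddy = nxt[1] - 2 * cand[1] + prev[1]               # second finite difference (y)
    internal = alpha * (dx * dx + dy * dy) + beta * (ddx * ddx + ddy * ddy)
    return 0.5 * internal + g(cand)


def greedy_iteration(vertices, g, alpha=0.1, beta=0.1, m=3):
    """One sweep over a closed polygon: each vertex moves to the
    lowest-energy pixel of its m x m window (m assumed odd)."""
    n = len(vertices)
    half = m // 2
    out = list(vertices)
    for i in range(n):
        prev, nxt = out[(i - 1) % n], out[(i + 1) % n]
        x, y = out[i]
        window = [(x + a, y + b)
                  for a in range(-half, half + 1)
                  for b in range(-half, half + 1)]
        # m * m energy evaluations per vertex, hence O(n m^2) per sweep
        out[i] = min(window, key=lambda c: vertex_energy(prev, c, nxt, g, alpha, beta))
    return out
```

With `alpha = beta = 0` the sweep reduces to a pure window search on `g`, which makes the dependence of the capture range on `m` easy to observe: a vertex can only reach a structure that lies inside its window.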

3 The Geodesically Linked Active Contour

We develop an approach taking advantage of the methods described above. Our geodesically linked active contour is based simultaneously on an explicit implementation of active contours, minimal paths and the greedy algorithm. Basically, we deal with an evolving explicit closed curve, allowing initialization inside or around the target object without providing fixed points. Minimal paths coupled with a geometric energy functional allow a parameterization-free handling of the contour. The use of the greedy algorithm, as opposed to gradient descent, guarantees better robustness to local minima.

3.1 Minimal Paths to Connect Vertices

Let us consider a set of $n$ linked vertices $S = \{v_i\}_{1 \le i \le n}$. We denote as $\gamma_i(u) = [x_i(u)\ y_i(u)]$ the geodesic path connecting $v_i$ to $v_{i+1}$:

\[ \gamma_i = \arg\min_{C} L(C) \quad \text{s.t.} \quad C(0) = v_i,\ C(1) = v_{i+1} \tag{8} \]

where the cost functional $L$ is defined in eq. 5. At every step of the evolution algorithm, the set of geodesics $\{\gamma_i\}_{1 \le i \le n}$ describes a closed piecewise differentiable contour $\Gamma$, whose Euclidean length is:

\[ L_{euclidean}(\Gamma) = \sum_{i=1}^{n} \int_0^1 \|\gamma_i'(u)\|\,du \]
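A discrete stand-in for eq. 8 can be sketched with Dijkstra's algorithm, which the text describes as similar in principle to Fast Marching. The sketch below is an approximation under assumptions, not the Fast Marching solver used in the paper: the function name `minimal_path`, the 8-connected pixel graph, and the trapezoidal edge cost (step length times the mean of `w + g` at both ends) are all choices made here for illustration.

```python
import heapq
import math


def minimal_path(g, start, end, w=0.25):
    """Dijkstra approximation of the geodesic between two pixels.

    g is a 2-D list of edge-indicator values, start and end are (row, col)
    tuples. Returns (cost, path), where cost approximates L(C) of eq. 5
    with potential w + g.
    """
    rows, cols = len(g), len(g[0])
    dist = {start: 0.0}
    parent = {}
    heap = [(0.0, start)]
    done = set()
    while heap:
        d, p = heapq.heappop(heap)
        if p in done:
            continue
        done.add(p)
        if p == end:
            break  # stop propagation as soon as the end point is reached
        r, c = p
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                q = (r + dr, c + dc)
                if (dr == 0 and dc == 0) or not (0 <= q[0] < rows and 0 <= q[1] < cols):
                    continue
                step = math.hypot(dr, dc)
                nd = d + step * (w + 0.5 * (g[r][c] + g[q[0]][q[1]]))
                if nd < dist.get(q, math.inf):
                    dist[q] = nd
                    parent[q] = p
                    heapq.heappush(heap, (nd, q))
    # back-propagation from the end point to the start point
    path = [end]
    while path[-1] != start:
        path.append(parent[path[-1]])
    path.reverse()
    return dist[end], path
```

On a grid where `g` is zero everywhere, the returned path between two pixels of the same row is the straight segment and its cost is `w` times its length, in line with the uniform-potential case. Note that a graph search measures length along grid directions only, which is one reason Fast Marching is preferred in practice.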

One may note that a concatenation of geodesics $\gamma_i$ is not a geodesic itself, since it is forced to pass through given points. To some extent, curve $\Gamma$ may be considered as a piecewise minimizer of an edge-based functional. If a uniform potential $\tilde{P}(x) = 1$ was chosen, the geodesics would become straight lines of equation $\gamma_i(u) = (1 - u)v_i + u v_{i+1}$, $u \in [0, 1]$, in which case $\Gamma$ would represent a polygon. Fig. 1 depicts geodesically linked contours with uniform potential and edge-based potential (dark smooth lines represent high image gradient areas). As described in section 2.2, path $\gamma_i$ is determined by gradient descent on the minimal action map $U_{i+1}$ of origin $v_{i+1}$. Given start point $v_{i+1}$, the Fast Marching algorithm [11] allows one or several end points (in our case $v_i$) to be specified, so that propagation can be stopped when $v_i$ is reached. This prevents the whole image from being visited by the Fast Marching and saves computational time.

Fig. 1. Vertices linked by geodesics with uniform potential (a) and edge-based potential (b)

In the case of edge-based segmentation, the interest of describing the evolving contour with geodesics is obvious. Indeed, at the end of deformation, the geodesics fit the actual boundaries of the sought object. On the other hand, in the case of region-based segmentation, image edges are not explicitly searched. However, we believe that linking vertices with geodesics is relevant for any usual segmentation criterion. We may assume that the final contour should be partially located on more or less salient boundaries, whatever energy functional is optimized. In subsequent sections, we formulate three energies independently implemented on the geodesically linked active contour, namely the edge, region and narrow band region energies.

Beforehand, we recall Green's theorem, which we use to convert domain integrals into boundary integrals. For every region $R$ and real-valued function $f$ over $\mathbb{R}^2$, we have:

\[ \iint_{R} f(x)\,dx = \oint_{\partial R} P\,dx + Q\,dy \tag{9} \]

where $[P(x)\ Q(x)]$ is a continuously differentiable vector field such that:

\[ P(x, y) = -\frac{1}{2} \int_{-\infty}^{y} f(x, t)\,dt, \qquad Q(x, y) = \frac{1}{2} \int_{-\infty}^{x} f(t, y)\,dt \tag{10} \]

The theorem requires $\partial R$ to be at least piecewise smooth; it is thus applicable to the geodesically linked active contour. For instance, to express the area of region $R_{in}$ enclosed by $\Gamma$, we consider eq. 9 with $f(x) = 1$:

\[ |R_{in}| = \frac{1}{2} \sum_{i=1}^{n} \int_0^1 x_i(u)\,y_i'(u) - x_i'(u)\,y_i(u)\,du \]

3.2 Edge Energy

Boundary-based segmentation is performed by minimizing an edge energy. The edge indicator function $g$ is integrated along geodesics. In order not to penalize lengthy contours, the edge energy is normalized by the Euclidean length:

\[ E_{edge}(\Gamma) = \frac{1}{L_{euclidean}(\Gamma)} \sum_{i=1}^{n} \int_0^1 g(\gamma_i(u))\,\|\gamma_i'(u)\|\,du \]

Note that according to eq. 6, the integral of $g$ along $\gamma_i$ equals $U_{i+1}(v_i)$ minus $w$ times the Euclidean length $L_{euclidean}(\gamma_i)$. Hence, once the action maps have been computed, the edge indicator does not need to be summed over geodesics again. With the edge energy alone, if the search space of vertex coordinates is too small, the contour fails to capture actual boundaries when initialized far from them. To increase the capture range, we add an area-dependent term, whose minimization acts like a balloon force [5]:

\[ E_{balloon}(\Gamma) = \frac{|D| - |R_{in}|}{|D|} \]

where $|D|$ is the image area. In that case, the total energy is a weighted sum of edge and balloon energies.

3.3 Region Energy

The increasing use of region terms has proven to overcome limitations of edge-based only models, especially when dealing with data sets suffering from noise and lack of contrast between neighboring structures. Classical region-based deformable models segment images according to statistical data computed over the object of interest and the background. Image partitions should be uniform in terms of pixel intensities or higher level features like texture descriptors. We rely on the intensity variance, which is close to the two-phase Mumford-Shah segmentation model by Chan and Vese [12]. The average intensity in the inner region is expressed using Green's theorem:

\[ \mu(R_{in}) = \frac{1}{|R_{in}|} \iint_{R_{in}} I(x)\,dx = \frac{1}{|R_{in}|} \sum_{i=1}^{n} \int_0^1 x_i'\,P(\gamma_i) + y_i'\,Q(\gamma_i)\,du \]

where $P$ and $Q$ are the summed intensities (see template formulas in eq. 10). Then, the inner intensity variance is:

\[ \sigma^2(R_{in}) = \frac{1}{|R_{in}|} \iint_{R_{in}} \left(I(x) - \mu(R_{in})\right)^2 dx = \frac{1}{|R_{in}|} \iint_{R_{in}} I^2(x)\,dx - \mu(R_{in})^2 \]

where the integral of squared intensities may also be expanded according to Green's theorem. Corresponding quantities on the outer region may be expressed using the relation

\[ \iint_{R_{out}} f(x)\,dx = \iint_{D} f(x)\,dx - \iint_{R_{in}} f(x)\,dx \]

3.4 Narrow Band Region Energy

The ideal case of uniform regions is rarely encountered in real applications, as the background usually contains structures of various intensities. Hence, strict homogeneity is not necessarily a desirable property. In order to account for spatially varying intensity, local statistics in region-based segmentation have emerged recently [13] [14]. The narrow band principle, which has proven its efficiency in the evolution of level sets [3], is used in our approach to formulate a local region term [15].

Fig. 2. Inner and outer bands for narrow band region energy

Instead of dealing with whole domains $R_{in}$ and $R_{out}$, we consider an inner band $B_{in}$ and an outer band $B_{out}$ in the vicinity of the contour, as depicted in fig. 2. The narrow band region energy is the intensity variance over the bands:

\[ E_{band}(\Gamma) = \sigma^2(B_{in}) + \sigma^2(B_{out}) \]

Our narrow band region energy is based on parallel curves [16]. We define curve $\gamma_{[B]i}$ as a parallel curve of $\gamma_i$:

\[ \gamma_{[B]i}(u) = \gamma_i(u) + B\,n_i(u) \tag{11} \]

where $B$ is the user-defined band thickness, constant along the curve, and $n_i$ is the inward unit normal to geodesic $\gamma_i$. Hereafter, we will use the index $[B]$ to denote all quantities related to the parallel curve. Bands $B_{in}$ and $B_{out}$ are bounded by parallel curves of the $n$ geodesics $\gamma_i$, respectively $\gamma_{[B]i}$ and $\gamma_{[-B]i}$. We assume that geodesics are smooth enough so that their parallel curves neither self-intersect nor exhibit singularities. An important property resulting from the definition in eq. 11 is that the velocity vector of the parallel curve can be expressed as a function of the velocity vector of the initial curve, as well as its curvature and normal. Using the identity $n_i' = -\kappa_i \gamma_i'$, we have:

\[ \gamma_{[B]i}' = \gamma_i' + B\,n_i' = (1 - B\kappa_i)\,\gamma_i' \tag{12} \]

By a change of variable, an integral over inner band $B_{in}$ may be expressed explicitly in terms of the curve and band thickness:

\[ \iint_{B_{in}} f(x)\,dx = \sum_{i=1}^{n} \int_0^1 \int_0^B f(\gamma_i + b\,n_i)\,\|\gamma_i'\|\,(1 - b\kappa_i)\,db\,du \tag{13} \]

We use the template formula in eq. 13 to express the mean and variance of intensities in the inner band:

\[ \mu(B_{in}) = \frac{1}{|B_{in}|} \sum_{i=1}^{n} \int_0^1 \int_0^B I(\gamma_i + b\,n_i)\,\|\gamma_i'\|\,(1 - b\kappa_i)\,db\,du \]

\[ \sigma^2(B_{in}) = \frac{1}{|B_{in}|} \sum_{i=1}^{n} \int_0^1 \int_0^B \left(I(\gamma_i + b\,n_i) - \mu(B_{in})\right)^2 \|\gamma_i'\|\,(1 - b\kappa_i)\,db\,du \]

and similarly for the outer band, replacing $b$ with $-b$.
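Eq. 13 can be checked numerically on a case with a closed-form answer. The sketch below is an illustration only and makes several assumptions: the contour is a single circle of radius `r` centred at the origin (so $\kappa = 1/r$ and $\|\gamma'\| = 2\pi r$ for $u \in [0, 1]$), the integral is evaluated with a midpoint rule, and the function name `inner_band_integral` and the sample counts are hypothetical.

```python
import math


def inner_band_integral(f, r, B, nu=400, nb=50):
    """Midpoint-rule evaluation of eq. 13 for a circle of radius r.

    f is a callable f(x, y); B is the band thickness (B < r assumed).
    """
    kappa = 1.0 / r                      # curvature of the circle
    speed = 2.0 * math.pi * r            # norm of gamma'(u) for u in [0, 1]
    du, db = 1.0 / nu, B / nb
    total = 0.0
    for iu in range(nu):
        theta = 2.0 * math.pi * (iu + 0.5) * du
        px, py = r * math.cos(theta), r * math.sin(theta)   # point on the curve
        nx, ny = -math.cos(theta), -math.sin(theta)         # inward unit normal
        for ib in range(nb):
            b = (ib + 0.5) * db
            total += f(px + b * nx, py + b * ny) * speed * (1.0 - b * kappa) * du * db
    return total
```

With `f = 1` the result is the band area, which for a circle must equal $\pi (r^2 - (r - B)^2)$; the factor $(1 - b\kappa)$ is what accounts for the parallel curves getting shorter towards the inside.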

3.5 Evolution with Greedy Algorithm

Vertices should be moved in order to minimize the selected energy. This is usually performed by gradient descent based on the Euler-Lagrange equation. In our case, it is difficult to differentiate $E_{edge}$, $E_{region}$ or $E_{band}$ with respect to a given vertex $v_i$, since these energies depend on the geodesics linked to $v_i$ (see eq. 8). The greedy algorithm presented in section 2.3 provides a way to evolve vertices without differentiating the energy.

Motion of curve points can always be decomposed into normal and tangential components. While the geometry of the curve is modified by normal displacements, tangential motion only affects curve parameterization [4]. Since the distribution of vertices along the contour can be updated with a resampling technique, we only consider normal displacement in the greedy evolution. We define a normal-oriented window $W_N$ of length $m$ centered at vertex $v_i$:

\[ W_N(v_i) = \left\{ v_i + k\,n_{v_i} \;\middle|\; k \in \left[-\frac{m}{2}, \frac{m}{2}\right] \right\} \]

where $n_{v_i}$ is the inward unit normal vector, estimated by finite difference using the second point of geodesic $\gamma_i$ and the next-to-last point of geodesic $\gamma_{i-1}$. Since steps between successive points in the window are integers, the window may be computed using a Bresenham-like algorithm.
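The normal-oriented window can be sketched as follows. This is a simplified illustration, not the Bresenham-like procedure mentioned in the text: positions are obtained by rounding `v + k * n` for integer `k`, the function name `normal_window` is hypothetical, and the normal is taken as the tangent rotated by +90 degrees, which points inward only under the assumption of a counter-clockwise contour in a y-up coordinate system.

```python
import math


def normal_window(v, prev_pt, next_pt, m):
    """Pixels v + k * n for integer k in [-m/2, m/2].

    prev_pt and next_pt are the curve points just before and just after v;
    the tangent is estimated by a central finite difference between them.
    """
    tx, ty = next_pt[0] - prev_pt[0], next_pt[1] - prev_pt[1]
    norm = math.hypot(tx, ty)
    if norm == 0.0:
        raise ValueError("neighbouring points coincide; tangent is undefined")
    nx, ny = -ty / norm, tx / norm       # tangent rotated by +90 degrees
    half = m // 2
    return [(round(v[0] + k * nx), round(v[1] + k * ny))
            for k in range(-half, half + 1)]
```

Compared with the square window of section 2.3, the number of tested positions per vertex drops from about $m^2$ to about $m$, which matters here because each tested position requires two new geodesics.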

Fig. 3. Geodesics linking neighboring vertices to points in search window

Greedy evolution is performed by moving vertex $v_i$ to the position in the window whose corresponding geodesically linked contour has the smallest energy. Let us consider a test position $\tilde{v}_i$ belonging to the window. The associated geodesics $\tilde{\gamma}_{i-1}$ and $\tilde{\gamma}_i$ link it to the neighbors of $v_i$, as depicted in fig. 3. The energy of the corresponding geodesically linked contour $\tilde{\Gamma} = \{\gamma_1, ..., \gamma_{i-2}, \tilde{\gamma}_{i-1}, \tilde{\gamma}_i, \gamma_{i+1}, ..., \gamma_n\}$ is computed and compared to the energy of the initial contour $\Gamma$. All window points are tested in this way, so that the evolution scheme at iteration $t$ is:

\[ v_i^{(t+1)} = \arg\min_{\tilde{v}_i \in W_N(v_i^{(t)})} E(\tilde{\Gamma}) \]

where $E$ is one of the previously described energies. If we consider the set $H = \{1, ..., i-2\} \cup \{i+1, ..., n\}$ holding the indices of geodesics not influenced by a modification of $v_i$, all quantities involved in the energies are written with constant and variable parts with respect to $\tilde{v}_i$. For instance, the area of the tested inner region is decomposed as:

\[ |\tilde{R}_{in}| = \frac{1}{2} \left[ \sum_{j \in H} \int_0^1 x_j(u)\,y_j'(u) - x_j'(u)\,y_j(u)\,du + \int_0^1 \tilde{x}_{i-1}\,\tilde{y}_{i-1}' - \tilde{x}_{i-1}'\,\tilde{y}_{i-1}\,du + \int_0^1 \tilde{x}_i\,\tilde{y}_i' - \tilde{x}_i'\,\tilde{y}_i\,du \right] \]

This implies that the part of the energies invariant with respect to $\tilde{v}_i$ needs to be computed only once, before moving $v_i$. Finally, once all vertices have been treated, resampling may be performed to maintain a consistent distribution of vertices along the curve.
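The split into constant and variable parts can be illustrated on the area term. The sketch below assumes the uniform-potential case, in which geodesics are straight segments and the boundary integral of each segment from `a` to `b` reduces to `a.x * b.y - b.x * a.y`; the function names are hypothetical and the general case would integrate along the computed geodesics instead.

```python
def segment_term(a, b):
    """Integral of x*y' - x'*y along the straight segment from a to b."""
    return a[0] * b[1] - b[0] * a[1]


def polygon_area(vertices):
    """Signed area of the closed polygon, from the boundary integral with f = 1."""
    n = len(vertices)
    return 0.5 * sum(segment_term(vertices[j], vertices[(j + 1) % n]) for j in range(n))


def candidate_areas(vertices, i, candidates):
    """Areas obtained when vertex i is replaced by each candidate position.

    The contribution of the segments indexed by H (all but the two segments
    adjacent to vertex i) is computed once and reused for every candidate.
    """
    n = len(vertices)
    prev, nxt = vertices[(i - 1) % n], vertices[(i + 1) % n]
    constant = sum(segment_term(vertices[j], vertices[(j + 1) % n])
                   for j in range(n) if j != i and j != (i - 1) % n)
    return [0.5 * (constant + segment_term(prev, c) + segment_term(c, nxt))
            for c in candidates]
```

For a window of `m` candidates, this replaces `m` full sums over `n` segments by one sum over `n - 2` segments plus two terms per candidate.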

4 Experiments

We tested the geodesically linked active contour with the three different energy configurations (edge+balloon, region and narrow band region). A comparison with a parametric snake endowed with the same energies is provided. The snake was initialized as a small circle inside the area of interest, far from the target boundaries. Similarly, the initial vertices of our model are sampled on a circle. Results are shown in fig. 4. For each row, columns (a) and (b) represent the initial and final states of the geodesically linked active contour, respectively. Columns (c) and (d) represent the same states for the snake. For all experiments, the regularization weight $w$ was set to 0.25, which achieved sufficient regularization for all tested images. The size of the window was $m = 50$ and the maximal inter-vertex distance for resampling was set to 20.

Fig. 4. Segmentation of left ventricle: initialization (a) and final location (b) of the geodesically linked active contour, initialization (c) and final location (d) of the parametric contour

The image in row 1, which was segmented using the edge energy, depicts the gap-closing ability of the model. The geodesically linked active contour managed to pass through false edges and reach actual boundaries. Thanks to the large search window, it turned out to be rather insensitive to balloon strength, as values for coefficient $\alpha$ in the range [0.1, 4] were suitable. On the other hand, the balloon coefficient has a strong influence on the gradient descent-driven parametric snake, which makes parameter tuning difficult. Actually, it was not possible to find a balloon weight allowing the snake to jump false edges while stabilizing on real ones.

The image in row 2, which was segmented using the region energy, depicts a similar phenomenon. The geodesically linked contour does not get trapped in small gaps in the region, which could be of interest for the segmentation of partially occluded objects.

Row 3 depicts an MRI of the left ventricle of the heart, which was used to put the narrow band region energy into application. The band thickness $B$ is an important parameter. Apart from its impact on the algorithmic complexity (computing intensity means and variances on the bands takes at least $O(nB)$ operations), it controls the trade-off between local and global features around the object. If $B = 1$, the region energy is as local as an edge term. The main image property having an effect on the minimal band thickness is edge sharpness. Indeed, the deformable curve needs a larger band when the boundaries of the target object are fuzzy. However, $B = 10$ was a suitable value in our experiments. Note that we depict the state of the parametric snake before self-collision. One may note that an unconstrained region-based level set method would also properly segment the images in rows 2 and 3. However, this remark should be tempered by the fact that our model is dedicated to applications where topology preservation is needed.

5 Conclusion and Perspectives

We proposed the geodesically linked active contour model for image segmentation. The model relies on an explicitly implemented curve moved by an evolution method based on minimal paths and a greedy algorithm. Linking curve points with geodesics solves parameterization issues and allows the contour to fit the most salient boundaries at every step of deformation. Displacing vertices according to a greedy search ensured lower sensitivity to erroneous local minima than the usual gradient descent of the energy. The model was endowed with edge and region energies and was validated on a few datasets. Further work may focus on developing an adaptive search window for greedy evolution. Currently, the window length is constant whatever the values of the energies or the previous positions of the vertices. We believe the algorithm could be improved by adapting the window length with respect to these properties, in order to avoid visiting positions that seem unlikely to minimize the energy.

References

1. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: active contour models. International Journal of Computer Vision 1(4), 321–331 (1988)
2. Osher, S., Sethian, J.: Fronts propagating with curvature-dependent speed: algorithms based on Hamilton-Jacobi formulations. Journal of Computational Physics 79, 12–49 (1988)
3. Malladi, R., Sethian, J., Vemuri, B.: Shape modeling with front propagation: a level set approach. IEEE Transactions on Pattern Analysis and Machine Intelligence 17(2), 158–175 (1995)
4. Caselles, V., Kimmel, R., Sapiro, G.: Geodesic active contours. International Journal of Computer Vision 22(1), 61–79 (1997)
5. Cohen, L.: On active contour models and balloons. Computer Vision, Graphics, and Image Processing: Image Understanding 53(2), 211–218 (1991)
6. Amini, A., Weymouth, T., Jain, R.: Using dynamic programming for solving variational problems in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence 12(9), 855–867 (1990)
7. Geiger, D., Gupta, A., Luiz, A., Vlontzos, J.: Dynamic programming for detecting, tracking, and matching deformable contours. IEEE Transactions on Pattern Analysis and Machine Intelligence 17(3), 294–302 (1995)
8. Williams, D., Shah, M.: A fast algorithm for active contours and curvature estimation. Computer Vision, Graphics, and Image Processing: Image Understanding 55(1), 14–26 (1992)
9. Sakalli, M., Lam, K.M., Yan, H.: A faster converging snake algorithm to locate object boundaries. IEEE Transactions on Image Processing 15(5), 1182–1191 (2006)
10. Cohen, L., Kimmel, R.: Global minimum for active contour models: a minimal path approach. International Journal of Computer Vision 24(1), 57–78 (1997)
11. Sethian, J.: A fast marching level set method for monotonically advancing fronts. Proceedings of the National Academy of Sciences 93(4), 1591–1595 (1996)
12. Chan, T., Vese, L.: Active contours without edges. IEEE Transactions on Image Processing 10(2), 266–277 (2001)
13. Piovano, J., Rousson, M., Papadopoulo, T.: Efficient segmentation of piecewise smooth images. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 709–720. Springer, Heidelberg (2007)
14. Lankton, S., Tannenbaum, A.: Localizing region-based active contours. IEEE Transactions on Image Processing 17(11), 2029–2039 (2008)
15. Mille, J., Boné, R., Cohen, L.: Region-based 2D deformable generalized cylinder for narrow structures segmentation. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part II. LNCS, vol. 5303, pp. 392–404. Springer, Heidelberg (2008)
16. Farouki, R., Neff, C.: Analytic properties of plane offset curves. Computer Aided Geometric Design 7(1-4), 83–99 (1990)

Validation of Watershed Regions by Scale-Space Statistics

Tomoya Sakai and Atsushi Imiya

Institute of Media and Information Technology, Chiba University, Japan
{tsakai,imiya}@faculty.chiba-u.jp

Abstract. This paper shows a potential use of scale space for statistical validation of watershed regions of a greyscale image. The watershed segmentation has difficulty in distinguishing valid watershed regions associated with real structures of the image from invalid random regions due to background noise. In this paper, a hierarchy of watershed regions is established by following the merging process of the regions in a Gaussian scale space. The distribution of annihilation scales (lives) of the regional minima is investigated to statistically judge the regions as being valid or not. Recursive validation using the hierarchy prevents oversegmentation due to the randomness.

1 Introduction

The aim of this study is to develop a statistical validation scheme for segmentation of a greyscale image. If we do not have a priori knowledge on the shapes or structures of objects in the image, topographic features of the greyscale image, and the watersheds in particular, are useful for unsupervised image segmentation. A well-known phenomenon in the watershed segmentation is oversegmentation, that is, producing a large number of undesired tiny regions. Since the undesired watershed regions are mainly caused by noise in the image, it is desirable to settle the oversegmentation problem by taking account of statistical properties of the randomness.

There is a body of literature dealing with the oversegmentation problem of watersheds [1,2,3,4,5,6,7,8]. In previous work, most schemes for preventing the oversegmentation attempt to hierarchically merge the oversegmented regions on the basis of similarity between adjacent regions measured by the MDL [3], colour distance [8], and so on. Diffusion-based multiscale image representations are also used for merging the regions [5, 6, 8], since the scale space theory [9,10,11,12,13,14,15] mathematically underpins topological relationships among the topographic features without a priori knowledge about them. The oversegmentation can be reduced by selecting levels in the hierarchy of regions, or by setting lower bounds to the scale above and below which the watersheds are valid and invalid, respectively.

In this paper, we show that the scale-space treatment of the image is also useful for the statistical analysis of the random watershed regions. The validity of a watershed region can be quantified in terms of the statistical confidence of distinguishing it from invalid watershed regions due to randomness. We present a fully unsupervised watershed segmentation algorithm, in which the watershed regions are recursively validated according to their hierarchical relationships in the scale space.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 175–186, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 Watershed Segmentation with Variable Scale

2.1 Gaussian Scale Space

In the Gaussian scale-space theory [9,10,11,12,14,15,16], a one-parameter family of nonnegative functions is derived from a $d$-dimensional greyscale image $f(x)$, $x \in \mathbb{R}^d$:

\[ f(x, \sigma) = G(x, \sigma) * f(x) \tag{1} \]

Here, "$*$" expresses $d$-dimensional convolution, and $G(x, \sigma)$ is an isotropic Gaussian function with the scale $\sigma$:

\[ G(x, \sigma) = \frac{1}{\sqrt{2\pi}^{\,d}\,\sigma^d} \exp\left(-\frac{|x|^2}{2\sigma^2}\right) \tag{2} \]

We redefine the $d$-dimensional greyscale image and its scale-space representation in the extended real scale and space as follows.

Definition 1. A $d$-dimensional greyscale image is defined as a nonnegative scalar function $f(x)$, $x \in \bar{\mathbb{R}}^d$, with a finite net image intensity $\int_{x \in \bar{\mathbb{R}}^d} f(x)\,dx$.

Definition 2. The scale-space image $f(x, \sigma)$, $(x, \sigma) \in (\bar{\mathbb{R}}^d, \bar{\mathbb{R}}_+)$, is the convolution of the greyscale image $f(x)$ with the isotropic Gaussian kernel $G(x, \sigma)$.

Here, $\bar{\mathbb{R}}^d$ and $\bar{\mathbb{R}}_+$ denote the $d$-dimensional extended real space including a point at infinity and the extended real scale including an infinite scale, respectively. Although the domain of a greyscale image in practice is bounded within a limited area or volume, we embed such an image in the extended real scale space. The point at infinity will be theoretically used as a representative point of the background of the image in the watershed segmentation later.

2.2 Watershed Segmentation and Hierarchy of Regions

The watershed segmentation was derived from spatial partitioning on the basis of the drainage patterns of rainfall. As the topographic height map defines the boundaries of the catchment basins draining to the same lowest points, a two-dimensional greyscale image defines the watershed boundary curves enclosing regions with local minima when we regard the image intensity as the topographic height. For a $d$-dimensional image, the entire space is partitioned by $(d-1)$-dimensional hypersurfaces into $d$-dimensional watershed regions. Each watershed region defined by a smooth function $f(x)$ contains a unique local minimum, to which any point in the watershed region is connected by a gradient curve of $f(x)$. In practice, the watershed segmentation of the gradient image $|\nabla f(x)|$ is known to provide better intuitive partitions than that of the image $f(x)$ itself [2, 5, 6, 8] because object boundaries in a scene may cause large spatial changes in the image intensity.

Simple computation of the watersheds of the images results in oversegmentation caused by tiny and insignificant catchment basins. As suggested in previous work [3, 5, 6, 8], hierarchical relationships among the watershed regions are of great help for merging the oversegmented regions. We employ the scale-space framework to derive the hierarchy because the scale-space axioms are acceptable in general cases where no prior information about the similarities among the unexpected watershed regions is given. If we apply the gradient watershed segmentation to the image $f(x, \sigma)$ with the variable scale $\sigma$, we can observe the evolution of the watersheds with respect to scale. The catastrophe theory applied to the gradient watershed segmentation in the Gaussian scale space [5] shows that the gradient watershed regions of $f(x, \sigma)$ may be generically annihilated, merged, created and split with increasing scale $\sigma$. Therefore, hierarchical watershed segmentation using multiscale representation of the image [2, 6, 8] is essentially the extraction of the hierarchical relationships among the watershed regions in the scale space through the generic events.

Since every watershed region is represented by its local minimum, the trajectories of the regional minima in scale space describe the relationships among the regions. For the purpose of validation of the regions, we derive the hierarchy from all the traceable regional minima from the finest scale along their trajectories in scale space. We trace the trajectories by local minimisation at every level of scale [16]. In an annihilation or merging event, two regional minima and a saddle between them are involved. We regard one of these two regional minima as a child of the resulting regional minimum after the event. We trace only one of two local minima after a creation or splitting event because we are interested in the hierarchy of the regions at the finest scale. Note that the point at infinity is a local minimum which exists at any scale. The local minimum at infinity is the regional minimum of the image background because the rainfall in the background region is drained to this ideal point.

The following algorithm RegionHierarchy traces every trajectory of the regional minimum from every pixel $p \in P$ at $\sigma = 0$ until the regional minimum disappears or goes outside the image boundary toward the local minimum at infinity with increasing scale.

RegionHierarchy(set of pixel centres P, image f(p ∈ P))
 1  let G be a graph with card(P) + 1 nodes with the labels l = 0, ..., N, where l = 0 represents the point at infinity;
 2  store σ^t_l = ∞ in all nodes of G;
 3  set σ_max to be the size of the convex hull of P;
 4  σ := 0;
 5  Q := P;
 6  while card(Q) ≠ 1 or σ < σ_max do
 7      Q' := Q;
 8      σ := σ + Δσ, where Δσ is a small value compared with the space intervals of the points in Q;
 9      for each q^l ∈ Q do
10          update q^l by minimising |∇f(x, σ)|² with q^l as the initial position (see footnote 1);
11          if q^l is outside the convex hull of P then
12              connect the two nodes of G labelled 0 and l;
13          end if
14      end for
15      let L be a list of labels corresponding to the points in Q;
16      while card(L) ≠ 1 do
17          pop a label l from L;
18          n := NearestNeighbour(L, l);
19          if |q^l − q^n| < εσ, where q^l, q^n ∈ Q, and ε is the tolerance of minimisation then
20              if |q^l − q'^l| > |q^n − q'^n|, where q'^l, q'^n ∈ Q' then
21                  child := l and parent := n;
22              else
23                  child := n and parent := l;
24              end if
25              connect the two nodes of G labelled parent and child;
26              set σ^t_child := σ;
27              remove q^child from Q;
28          end if
29      end while
30  end while
31  return G.
The resulting graph $G$ is a set of trees representing the hierarchy of the watershed regions of the gradient image. Any node in $G$ represents a watershed region consisting of the pixels indicated by its subtree nodes. The annihilation or merging scale $\sigma^t$ is stored at the node in $G$ corresponding to $p$. We utilise bicubic spline interpolation [17] to search for the local minimum with subpixel precision in Step 10. The function NearestNeighbour in Step 18 searches for the point nearest to $p^l$ in the set of points listed in $L$ and returns its label. The annihilation or merging event is detected in Step 19, and the one of the two regional minima with the larger displacement is identified as the child in Step 20. Figure 1 shows an example of the trajectories of regional minima and the region hierarchy obtained by RegionHierarchy. Since the set of trees, $G$, expresses hierarchical relationships among the image pixels, any tree node with a scale $\sigma > 0$ represents a set of pixels constituting a watershed region.
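The annihilation of regional minima with increasing scale, which RegionHierarchy records as $\sigma^t$, can be observed on a toy example. The sketch below is a 1-D analogue under several assumptions and is not the algorithm above: it smooths a periodic signal itself rather than the squared gradient magnitude of an image, uses a sampled and truncated Gaussian kernel, and only counts strict local minima without linking them into trajectories. All function names are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """Sampled Gaussian truncated at 4 sigma and normalised to unit sum."""
    radius = int(math.ceil(4 * sigma))
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [wk / total for wk in weights]


def smooth_periodic(signal, sigma):
    """Discrete counterpart of eq. 1 for a periodic 1-D signal."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    return [sum(kernel[k + radius] * signal[(i - k) % n]
                for k in range(-radius, radius + 1))
            for i in range(n)]


def local_minima(signal):
    """Indices of strict local minima, with periodic neighbours."""
    n = len(signal)
    return [i for i in range(n)
            if signal[i] < signal[i - 1] and signal[i] < signal[(i + 1) % n]]


# Four broad wells (period 50) perturbed by a fast ripple (period 5)
signal = [math.cos(2 * math.pi * i / 50) + 0.3 * math.cos(2 * math.pi * i / 5)
          for i in range(200)]
fine_minima = local_minima(signal)
coarse_minima = local_minima(smooth_periodic(signal, 3.0))
```

At the finest scale every ripple contributes a minimum, whereas at $\sigma = 3$ only the minima of the four broad wells survive. In the terms of the paper, the ripple minima have short lives and the well minima have long ones.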

1 It is trivial that the watershed regions of the gradient magnitude squared $|\nabla f|^2$ are identical to those of the gradient magnitude $|\nabla f|$.

Validation of Watershed Regions by Scale-Space Statistics


Fig. 1. Trajectories of regional minima and region hierarchy. (a) A noisy 96 × 96 image f(x) embedded in a dark background. (b) Gradient magnitude squared |∇f(x, σ = 20)|². Brighter pixels indicate larger magnitude. (c) Trajectories of the regional minima in scale space (axis: σ). The thick curves (blue) are the parts of the trajectories for σ > 5. The thin straight lines (red) are the edges of G between the nodes with σ > 5.

2.3 Scale Selection Problem

We need a criterion to select the scales or the tree levels in the hierarchy. One may expect that the watersheds of the image f(x, σ) at a small scale σ approximate the boundaries of the true image regions well. However, if noise spoils the fine structure of the image, the estimated watersheds at small scales are stochastic and experimentally less reproducible. The noise is suppressed at a large scale, but the watershed segmentation is then poor in terms of detection ability and localisation: the edges of small watershed regions are smoothed out, and the boundary shapes of large regions are simplified. Since randomness is the major cause of the oversegmentation problem in the watershed methods [1, 4, 5], the oversegmentation problem should be resolved in a statistical manner.

3 Validation of Watershed Regions

3.1 Valid Watershed Regions

Generally, a greyscale image expresses the spatial distribution of a measured physical quantity. The true image f^true(x), which we want to measure and apply the watershed segmentation to, is inevitably spoiled by random noise through the measurement. Therefore, the actual image f(x) presents valid watersheds related to those of the true image f^true(x) and invalid watersheds due to the randomness.

Assertion 1. A valid watershed region of an observed gradient image |∇f(x)| is related to one of the watershed regions of the true gradient image |∇f^true(x)|.

Since the watershed regions are represented by the regional minima, the image f(x) has valid watershed regions of the gradient image |∇f(x)| iff the true gradient image |∇f^true(x)| has corresponding local minima. Contrapositively,


iff |∇f^true(x)| is a featureless image without any local minimum, then no valid watershed exists for any observation f(x), which should be considered as an image of the background only. This condition means that f^true(x) = 0 everywhere in R̄^d because of Definition 1. Therefore, f(x) for f^true(x) = 0, i.e., the noise image, produces only invalid watershed regions. A valid watershed region must be statistically distinguishable from such invalid regions. From this viewpoint, the validity of a watershed region is interpreted as the statistical confidence in rejecting the following null hypothesis.

Null hypothesis H0: The watershed region is that of the noise image.
Alternative hypothesis H1: The watershed region is not that of the noise image.

The null hypothesis H0 is rejected if the regional minimum is distinguishable from that of the noise image using test statistics.

3.2 Life Distribution

An important fact is that the randomness of the image f(x, σ) is filtered out as the scale σ increases, and deterministic features of the image f(x) emerge at large scales. In other words, deterministic features such as the valid watershed regions are established from coarse to fine. There presumably exists a critical lower bound of scale, above and below which the watersheds of f(x, σ) are valid and invalid, respectively. In order to observe how the valid regions survive until large scales against the scale-space filtering, we define the life of the watershed region.

Definition 3. The life of the watershed region is defined as the annihilation scale σ^t of the regional minimum.

Let W be the distribution of the lives of the watershed regions of |∇f(x, σ)| for an image of random noise. If W can be parametrically modelled, a goodness-of-fit test can be performed under the null hypothesis H0. That is, if an image f(x) is an observation of a true uniform image with noise, then the model of W fits the distribution of lives {σ^t} of its watershed regions, and H0 is accepted for any watershed region of f(x). We investigate experimentally the life distribution W for the gradient watershed regions of a Gaussian white noise image as shown in Fig. 2(a). We averaged the frequencies of lives over one hundred noise images. We discard the lives of pixel points whose annihilations are detected in 0 < σ ≤ Δσ by RegionHierarchy because not all the pixel centres are local minima. Figure 2(b) is the averaged histogram of life. The obtained life histogram shows a unimodal shape. This implies that there exists a scale at which the merging of the regions occurs most frequently. The regional minima of the noise image are uniformly distributed random points, and the regions tend to merge with the nearest regions. Therefore, we deduce that this unimodal property is associated with the distribution of the nearest neighbour distances of random points.
In fact, the nearest neighbour distance distribution has a unimodal shape (see Appendix A). The scale of

Fig. 2. Noise image and the averaged life histogram for its gradient watershed regions. (a) The noise image has uncorrelated random pixel values. (b) The life histogram (axes: σ and relative frequency) shows the relative frequency of the scale at which the regional minima of the gradient image are annihilated as Gaussian blurring of the noise image proceeds.

the mode can be used as a gauge of the density of invalid regions. The regional minima with significantly large values of life out of the unimodal distribution W can be identified to be valid, because such regional minima are distinguishable from the invalid regional minima of the noise image.

3.3 Recursive Validation

We can set a critical value of the scale to judge the watershed regions valid or invalid. Although the computation of such a critical scale requires the parametric model of the life distribution in the strict sense of statistics, the critical scale can be roughly evaluated from the peak and the decaying form of the life histogram. If the image contains valid regions, the life histogram may be multimodal or may have a peak at a small scale relative to the outlying lives representative of the valid regions. According to our experimental result in Section 3.2, a regional minimum with a life that is more than six times greater than the peak can be considered to be valid with the statistical confidence level α > 99% under the assumption of uncorrelated Gaussian random pixel values of a two-dimensional image as the noise. We present an algorithm RegionDiscovery for the discovery of the valid watershed regions. This algorithm recursively validates the regions in a top-down fashion using each tree T in G obtained by RegionHierarchy. According to the hierarchy, any discovered region is split into subregions as long as they are valid. Each subregion is validated using the life histograms constructed from the lives stored in the subtrees of T corresponding to the subregion.
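The rough evaluation of the critical scale from the histogram peak can be illustrated with a short Python sketch. It is a simplified stand-in for a proper parametric test: the factor of six is taken from the statement above and holds only under its noise assumption, while the fixed bin width, the use of the bin centre as the peak location and the function names are assumptions of this sketch.

```python
from collections import Counter

def histogram_peak(lives, bin_width=1.0, delta_sigma=0.0):
    """Centre of the most populated bin of the life histogram.

    Lives in 0 < sigma <= delta_sigma are discarded, as in the text.
    """
    counts = Counter(int(s // bin_width) for s in lives if s > delta_sigma)
    if not counts:
        raise ValueError("no lives above delta_sigma")
    # Ties are resolved in favour of the smaller scale.
    peak_bin = max(counts, key=lambda b: (counts[b], -b))
    return (peak_bin + 0.5) * bin_width

def is_outlier(life, lives, bin_width=1.0, delta_sigma=0.0, factor=6.0):
    """True if `life` exceeds `factor` times the histogram peak."""
    return life > factor * histogram_peak(lives, bin_width, delta_sigma)
```

With lives clustered around σ ≈ 1.5 and one life at σ = 30, only the latter is flagged, which matches the intuition that valid regions show up as outlying lives.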


RegionDiscovery(tree T, set of valid regions V, significance level α)
1 let Σ be a set of life values stored in T except the root;
2 let s be the subroot node of T with the largest life value σ_max ∈ Σ;
3 if IsMultimodal(Σ) or IsOutlier(σ_max, Σ, α) then
4   RegionDiscovery(Subtree(T, s), V, α);
5   RegionDiscovery(T \ Subtree(T, s), V, α);
6 else
7   push the region R := Pixels(T) into V;
8 end if.

Here, the function IsMultimodal returns true if the histogram of Σ is not unimodal. IsOutlier returns true if the life σ^t is greater than the critical α-level of scale computed from the given set of lives Σ. Note that these functions discard the lives in 0 < σ ≤ Δσ. Subtree extracts the subtree with subroot node s from the tree T. Pixels returns a set of pixels whose labels are recorded at the nodes in the given tree. The following function, Watershed, executes our watershed segmentation algorithm for a given image f with a set of pixels P and a significance level α. It returns the set of valid watershed regions consisting of subsets of P.

Watershed(set of pixel centres P, image f, significance level α)
1 set V := ∅;
2 G := RegionHierarchy(P, f);
3 for each tree T in G do
4   RegionDiscovery(T, V, α);
5 end for
6 return V.
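The recursion of RegionDiscovery can be sketched in Python as below. The sketch is illustrative only: the `Node` class, the helper `without` (standing in for T \ Subtree(T, s)) and the callable `is_split` are assumptions. In particular, `is_split(life, lives)` is a hypothetical placeholder for the combined test "IsMultimodal(Σ) or IsOutlier(σ_max, Σ, α)", whose statistical details are not reproduced here.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: int                      # pixel label recorded at the node
    life: float = 0.0               # annihilation scale stored at the node
    children: list = field(default_factory=list)

def nodes(t):
    """Yield all nodes of the tree rooted at t (pre-order)."""
    yield t
    for c in t.children:
        yield from nodes(c)

def without(t, s):
    """Copy of tree t with the subtree rooted at node s removed."""
    return Node(t.label, t.life, [without(c, s) for c in t.children if c is not s])

def region_discovery(t, valid, is_split):
    """Recursively split tree t; append pixel-label lists of valid regions to `valid`."""
    below = [n for n in nodes(t) if n is not t]   # life values except the root
    if below:
        s = max(below, key=lambda n: n.life)      # subroot with the largest life
        if is_split(s.life, [n.life for n in below]):
            region_discovery(s, valid, is_split)
            region_discovery(without(t, s), valid, is_split)
            return
    valid.append(sorted(n.label for n in nodes(t)))  # Pixels(T)
```

The recursion terminates because both recursive calls operate on strictly smaller trees. A Watershed-style driver would simply call `region_discovery` once per tree of the hierarchy.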

Fig. 3. An example of our watershed segmentation of a noisy image. (a) Original image with 20% noise. (b) Trajectories of local minima of the gradient magnitude of (a) in scale space (scale axis from 5 to 30). The trajectories reaching out of the spatial domain are subordinate to a local minimum at infinity. (c) Watershed regions of the gradient magnitude by the algorithm Watershed. The brightness indicates the order of lives.

4 Test Example

We demonstrate our gradient watershed segmentation Watershed for a noisy greyscale image. The purpose of this section is not to test the performance of the algorithm, but to show that the statistics in scale space have the potential to discover the valid watershed regions without any prior information about them. Figure 3(a) shows a 128 × 128 test image f(x) with 20% additive noise [18]. The trajectories of local minima of |∇f(x, σ)| traced from σ = 0 in scale space are shown in Fig. 3(b). We see a large number of local minima created by the noise at small scales. As the scale increases, the local minima are hierarchically grouped and representative local minima survive at larger scales. Figure 3(c) shows the segmentation result with a confidence level α = 99% for f(x). There are nine

Fig. 4. Watershed segmentation of Fig. 3(a) at different scales. First row: the scale-space image f(x, σ). Second row: the gradient magnitude |∇f(x, σ)|. Third row: the watersheds of |∇f(x, σ)|. Each column corresponds to the same scale indicated below it (σ = 2, σ = 6, σ = 12).

discovered regions clearly corresponding to the major regions of the original image. The tiny faults in the regions were caused by failures in the minimisation. They were wrongly assigned to the image background, which should be fixed in future work. For comparison, we show in Fig. 4 the simple watershed segmentation results at a few levels of scale without using the region hierarchy or statistics in scale space. We see invalid small regions at small scales, while the shapes of valid regions at large scales are distorted. It is remarkable that structural and statistical analyses using scale space can reconstruct the precise edges of statistically valid watershed regions despite the significant noise.

5 Concluding Remarks

The scale-space treatment of the image clarifies not only the hierarchical relationships among the watershed regions but also their statistical properties. We can observe in the Gaussian scale space how the random features are suppressed and deterministic features emerge as the scale grows. A valid watershed region must be statistically distinguishable from unreproducible regions caused by the random features. Reproducibility is a desirable property of image recognition techniques. On the basis of this simple requirement we described the null hypothesis H0, which is to be rejected if the watershed region is valid. A watershed region is recognised as valid at a statistical confidence level in rejecting H0. We presented a validation scheme for watershed segmentation using statistics in scale space. We defined the life of a watershed region, whose distribution is useful for testing H0. We showed that the life distribution for the noise image is unimodal, and that the valid regions can be identified by the regional minima with significantly large values of life out of the unimodal distribution. The statistical properties of the life and the region hierarchy enable the recursive validation of the watershed regions. A distinctive feature of our scheme is that it does not require any definition of similarity or dissimilarity measures between watershed regions, which are used in many methods for preventing oversegmentation. Instead, we focused on the statistical differences between the valid and invalid regions in scale space. In order to take advantage of the potential of scale-space statistics, our scheme requires further investigation, especially in relation to the model of the life distribution, and improvement and acceleration of the algorithms to obtain feasible segmentation results for larger real images.

References
1. Vincent, L., Soille, P.: Watersheds in digital spaces: An efficient algorithm based on immersion simulations. IEEE Trans. on Pattern Analysis and Machine Intelligence 13(6), 583–598 (1991)
2. Beucher, S.: Watershed, hierarchical segmentation and waterfall algorithm. In: Proc. Math. Morphology and Its Appl. to Image Processing, pp. 69–76 (1994)
3. Maes, F., Vandermeulen, D., Suetens, P., Marchal, G.: Computer-aided interactive object delineation using an intelligent paintbrush technique. In: Ayache, N. (ed.) CVRMed 1995. LNCS, vol. 905, pp. 77–83. Springer, Heidelberg (1995)
4. Hagyard, D., Razaz, M., Atkin, P.: Analysis of watershed algorithms for greyscale images. In: Proc. of IEEE Intl. Conf. Image Processing, vol. 3, pp. 41–44 (1996)
5. Olsen, O.F., Nielsen, M.: Generic events for the gradient squared with application to multi-scale segmentation. In: ter Haar Romeny, B.M., Florack, L.M.J., Viergever, M.A. (eds.) Scale-Space 1997. LNCS, vol. 1252, pp. 101–112. Springer, Heidelberg (1997)
6. Gauch, J.M.: Image segmentation and analysis via multiscale gradient watershed hierarchies. IEEE Trans. on Image Processing 8(1), 69–79 (1999)
7. Roerdink, J.B.T.M., Meijster, A.: The watershed transform: definitions, algorithms, and parallelization strategies. Fundamenta Informaticae 41, 187–228 (2001)
8. Vanhamel, I., Pratikakis, I., Sahli, H.: Multiscale gradient watersheds of color images. IEEE Trans. on Image Processing 12(6), 617–626 (2003)
9. Witkin, A.P.: Scale space filtering. In: Proc. of 8th IJCAI, pp. 1019–1022 (1986)
10. Koenderink, J.J.: The structure of images. Biological Cybernetics 50, 363–370 (1984)
11. Lindeberg, T.: Scale-Space Theory in Computer Vision. Kluwer, Boston (1994)
12. Weickert, J., Ishikawa, S., Imiya, A.: Linear Scale-Space has First been Proposed in Japan. Journal of Mathematical Imaging and Vision 10, 237–252 (1999)
13. Lifshitz, L.M., Pizer, S.M.: A multiresolution hierarchical approach to image segmentation based on intensity extrema. IEEE Trans. on Pattern Analysis and Machine Intelligence 12(6), 529–540 (1990)
14. Florack, L.M.J., Kuijper, A.: The topological structure of scale-space images. Journal of Mathematical Imaging and Vision 12(1), 65–79 (2000)
15. Kuijper, A.: The deep structure of Gaussian scale-space images. PhD thesis, Utrecht University (2002)
16. Sakai, T., Imiya, A.: Gradient structure of image in scale space. Journal of Mathematical Imaging and Vision 28(3), 243–257 (2007)
17. Keys, R.: Cubic convolution interpolation for digital image processing. IEEE Trans. on Acoustics, Speech, and Signal Processing 29(6), 1153–1160 (1981)
18. SAMPL database, http://sampl.ece.ohio-state.edu/database.htm
19. Suwa, N.: Quantitative morphology: stereology for biologists. Iwanami Shoten (1977) (in Japanese)

A Distribution of Nearest Neighbour Distances

We present a proof that the nearest neighbour distances obey the Weibull distribution if the points in R^d are uniformly distributed in a Poisson arrangement [19]. The Poisson arrangement is defined as the uniformly random distribution of points with constant density ρ such that the number of points x in a fixed volume V follows the Poisson distribution

$$\mathrm{Po}(x; \lambda) = \frac{\lambda^{x} \exp(-\lambda)}{x!}. \tag{3}$$

Here, λ = ρV is the expected number of points in the volume V. Let r be the distance from an arbitrary point. The distribution of the nearest neighbour distances, p(r), can be regarded as the probability that the nearest neighbour is found in an infinitesimal gap between r and r + δr. This is the case in which no points are found within the distance r, and at least one point is found between r and r + δr. Since the volume V_d of a unit d-ball and its surface area S_{d−1} satisfy the relationship V_d d = S_{d−1}, we have

$$\begin{aligned}
p(r)\,\delta r &= \mathrm{Po}(0; \rho V_d r^{d})\,\bigl(1 - \mathrm{Po}(0; \rho S_{d-1} r^{d-1}\delta r)\bigr) \\
&= \exp(-\rho V_d r^{d})\,\bigl(1 - \exp(-\rho S_{d-1} r^{d-1}\delta r)\bigr) \\
&\approx \exp(-\rho V_d r^{d}) \cdot \rho S_{d-1} r^{d-1}\delta r \\
&= \exp(-\rho V_d r^{d}) \cdot \rho V_d\, d\, r^{d-1}\delta r .
\end{aligned}$$

Letting $s = 1/\sqrt[d]{\rho V_d}$ be the scale of the average volume of the d-dimensional hypercube per point, we obtain the Weibull distribution

$$p(r; s, d) = \frac{d}{s}\left(\frac{r}{s}\right)^{d-1} \exp\left(-\left(\frac{r}{s}\right)^{d}\right), \tag{4}$$

where s and d correspond to the so-called scale and shape parameters of the Weibull distribution, respectively. This distribution p(r; s, d) has a mode at $r = s\,\sqrt[d]{(d-1)/d}$. For a fixed dimensionality, the mode depends only on the scale parameter s, which enables us to calculate the point density ρ from the mode.
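The mode formula and the recovery of ρ from the mode can be checked numerically with the following Python sketch. The function names are illustrative, and the closed form used for the volume of the unit d-ball, V_d = π^{d/2}/Γ(d/2 + 1), is the standard expression rather than something stated in the text. The check assumes d ≥ 2, since for d = 1 the mode degenerates to r = 0.

```python
import math

def unit_ball_volume(d):
    """Volume V_d of the unit d-ball."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def scale_parameter(rho, d):
    """s = 1 / (rho * V_d)^(1/d)."""
    return (rho * unit_ball_volume(d)) ** (-1.0 / d)

def nn_pdf(r, rho, d):
    """Weibull density (4) of the nearest neighbour distance."""
    s = scale_parameter(rho, d)
    return (d / s) * (r / s) ** (d - 1) * math.exp(-((r / s) ** d))

def nn_mode(rho, d):
    """Mode r = s * ((d - 1) / d)^(1/d)."""
    return scale_parameter(rho, d) * ((d - 1) / d) ** (1.0 / d)

def density_from_mode(r_mode, d):
    """Invert the mode formula to recover the point density rho."""
    s = r_mode / ((d - 1) / d) ** (1.0 / d)
    return 1.0 / (unit_ball_volume(d) * s ** d)
```

For instance, with ρ = 0.05 points per unit area in the plane (d = 2), the numerical maximum of `nn_pdf` on a fine grid agrees with `nn_mode`, and `density_from_mode` returns the original density.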

Adaptation of Eikonal Equation over Weighted Graph

Vinh-Thong Ta, Abderrahim Elmoataz, and Olivier Lézoray
Université de Caen Basse-Normandie, GREYC CNRS UMR 6072, Image Team
{vinhthong.ta,abderrahim.elmoataz-billah,olivier.lezoray}@unicaen.fr
http://www.info.unicaen.fr/~vta

Abstract. In this paper, an adaptation of the eikonal equation is proposed by considering the latter on weighted graphs of arbitrary structure. This novel approach is based on a family of discrete morphological local and nonlocal gradients expressed by partial diﬀerence equations (PdEs). Our formulation of the eikonal equation on weighted graphs generalizes local and nonlocal conﬁgurations in the context of image processing and extends this equation for the processing of any unorganized high dimensional discrete data that can be represented by a graph. Our approach leads to a uniﬁed formulation for image segmentation and high dimensional irregular data processing.

1 Introduction

Solutions of the nonlinear eikonal equation have found numerous applications. One can quote, for instance, geometric optics, image analysis or computer vision, including shape from shading [1, 2], medial axis or skeleton extraction [3], topographic segmentation (watershed) [4] or geodesic distance computation on discrete and parametric surfaces [5, 6, 7, 8, 9]. The latter works consider both structured and unstructured meshes on cartesian or non-cartesian domains. The eikonal equation is a special case of the following general continuous Hamilton-Jacobi equation:

$$\begin{cases} H(x, f, \nabla f) = 0, & x \in \Omega \subset \mathbb{R}^n, \\ f(x) = \phi(x), & x \in \Gamma \subset \Omega, \end{cases} \tag{1}$$

where φ in the boundary condition is a positive speed function defined on Ω and f(x) is the traveling time or distance from the source Γ. Then, the eikonal equation can be expressed by using the following Hamiltonian:

$$H(x, f, \nabla f) = \|\nabla f(x)\| - P(x), \tag{2}$$

where P(x) is a given potential function. The solution of (1) represents the shortest distance from x to the zero distance curve given by Γ (where φ(x)=0). Solutions of (2) are usually based on a discretization of the Hamiltonian where the approximation of the derivatives is performed by the Godunov [10] or the Lax-Friedrich [11] schemes.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 187–199, 2009. © Springer-Verlag Berlin Heidelberg 2009

Many numerical methods have been proposed and investigated to solve the nonlinear system described by (2). For instance, one can quote the following schemes. (i) An iterative scheme [1] relying on fixed point methods that solves a quadratic equation. (ii) The fast sweeping methods [12], which use Gauss-Seidel type iterations to update the distance function field. The key point of fast sweeping is to update the points in a certain order. (iii) Tsitsiklis [13] was the first to develop a Dijkstra-like method and proposed an optimal algorithm for solving the eikonal equation. Based on this idea, [14, 11] produced the fast marching methods. Another approach to solve (2) is to consider a time dependent version of the equation and to evolve it to the steady state. Then, (2) can be rewritten as

$$\begin{cases} \partial f(x, t)/\partial t = -\|\nabla f(x)\| + P(x), & x \in \Omega \subset \mathbb{R}^n, \\ f(x, t) = \phi(x), & x \in \Gamma \subset \mathbb{R}^n, \\ f(x, 0) = \phi_0(x), & x \in \Omega. \end{cases} \tag{3}$$

This paper only considers the discrete analogue of the time dependent formulation of the eikonal equation but, in future works, the stationary case (time independent) will also be considered.

Contributions. In this work, we propose an adaptation of (3) over weighted graphs of arbitrary structure. The goal here is to provide a simple and common formulation that solves the eikonal equation for any discrete data that can be represented by a weighted graph, such as images or high dimensional data defined on irregular domains. This alternative formulation for solving the eikonal equation is based on partial difference equations (PdEs) and discrete gradients over weighted graphs. Our formulation has several advantages. Any discrete domain that can be described by a graph can be considered without any spatial discretization. In the context of image processing, local and nonlocal configurations are directly enabled within the same formulation.
Finally, the aim of this paper is not to solve a particular application with the eikonal equation but to show the potential of our proposal to address image segmentation, data clustering or distance computation.

Paper Organization. The paper is organized as follows. Section 2 recalls basics, definitions and operators on weighted graphs. Section 3 introduces our formulation for solving the eikonal equation. Section 4 shows the potential of our proposal for the segmentation of images and unorganized data processing. Finally, the last Section concludes.

2 Discrete Derivatives on Weighted Graphs

This Section recalls basics, deﬁnitions, operators and processes on weighted graphs.

2.1 Definitions and Weighted Graphs Construction

Notations and Definitions. We consider the general situation where any discrete domain can be viewed as a weighted graph. Let G=(V, E, w) be a weighted graph composed of two finite sets: vertices V and weighted edges E⊆V×V. An edge (u, v)∈E connects two adjacent (neighbor) vertices u and v. The neighborhood of a vertex u is denoted N(u)={v∈V\{u} : (u, v)∈E}. The weight ω_uv of an edge (u, v) can be defined with a function w:V×V→IR+ such that w(u, v)=ω_uv if (u, v)∈E and w(u, v)=0 otherwise. Graphs are assumed to be simple, connected and undirected, implying that the function w is symmetric. Let f:V→IR be a discrete real-valued function that assigns a real value f(u) to each vertex u∈V. We denote by H(V) the Hilbert space of such functions defined on V.

Weighted Graphs Construction. Any discrete domain can be represented by a weighted graph where functions of H(V) represent the data to process. In the general case, an unorganized set of points V⊂IR^n can be seen as a function f^0:V⊂IR^n→IR^n. Then, constructing a graph from this data consists in defining the set of edges E by modeling the neighborhood. It is based on a similarity relationship between data with a pairwise distance measure μ:V×V→IR+. There exist several methods to transform a set of vertices V into a neighborhood (similarity) graph (see [15] for a survey on proximity and neighborhood graphs). In this paper, we focus on two particular graphs: the τ-neighborhood graphs and a modified version of k-nearest neighbors graphs. The k-nearest neighbors graph, denoted k-NNG, is a weighted graph where each vertex u∈V is connected to its k nearest neighbors, i.e., those with the smallest distance measure towards u according to the function μ. Since this graph is directed, a modified version of this graph is used to make it undirected. The τ-neighborhood graph, denoted G_τ, is a weighted graph where the τ-neighborhood N_τ for a given vertex u∈V is defined as N_τ(u)={v∈V\{u} : μ(u, v)≤τ} with τ>0 a threshold parameter.
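The two graph constructions described above can be sketched in Python as follows. This is only an illustration with hypothetical function names. The text does not specify how the k-NNG is made undirected, so the symmetrisation used here (an edge is kept if either endpoint lists the other among its k nearest neighbors) is an assumption of the sketch.

```python
def city_block(u, v):
    """City block (L1) distance between two coordinate tuples."""
    return sum(abs(a - b) for a, b in zip(u, v))

def chebyshev(u, v):
    """Chebyshev (L-infinity) distance between two coordinate tuples."""
    return max(abs(a - b) for a, b in zip(u, v))

def tau_neighborhood_graph(points, mu, tau):
    """N_tau(u) = {v != u : mu(u, v) <= tau} for every vertex u."""
    return {u: [v for v in points if v != u and mu(u, v) <= tau] for u in points}

def symmetric_knn_graph(points, mu, k):
    """k-NN graph made undirected by taking the union of both edge directions."""
    nbrs = {u: set() for u in points}
    for u in points:
        nearest = sorted((v for v in points if v != u), key=lambda v: mu(u, v))[:k]
        for v in nearest:
            nbrs[u].add(v)
            nbrs[v].add(u)   # assumed symmetrisation rule
    return nbrs
```

As a usage example, on a 3 × 3 grid of integer coordinates with τ = 1, the city block distance gives each interior vertex four neighbors, whereas the Chebyshev distance gives it eight.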
2D images can be viewed as functions f^0:V⊂ZZ^2→IR^n. In this case, the associated distance μ used to construct the neighborhood graph is usually the city block or the Chebychev distance computed with the spatial coordinates of each vertex representing an image pixel. With these distances and the τ-neighborhood graphs, one recovers the two usual graphs used in image processing, the 4-adjacency grid graph (denoted G0, with the city block distance) and the 8-adjacency grid graph (denoted G1, with the Chebychev distance) with τ≤1. Another useful graph structure in image processing is the region adjacency graph (RAG), where vertices correspond to image regions and the set of edges is obtained by considering an adjacency distance. With the τ-neighborhood (τ=1), the RAG is the Delaunay graph of an image partition.

Weights Computation. Similarities between data can be incorporated within edge weights according to a measure of similarity g:E→IR+ that satisfies w(u, v)=g(u, v) for (u, v)∈E. Then, the distance computation between data is performed by comparing their features, which generally depend on a given initial function f^0∈H(V). To this aim, each vertex u∈V is assigned a feature vector F(f^0, u)∈IR^m. With F, the following weight functions can be considered. For a given edge (u, v)∈E and a distance measure ρ:V×V→IR+ associated to F, we can have

g0(u, v) = 1 (constant weight case),
g1(u, v) = (ρ(F(f^0, u), F(f^0, v)) + ε)^(−1) with ε>0, ε→0,
g2(u, v) = exp(−ρ(F(f^0, u), F(f^0, v))^2/σ^2) with σ>0,

where σ controls the similarity and ρ is usually the Euclidean distance. Several choices for the expression of F can be considered depending on the features to preserve. The simplest one is F(f^0, .)=f^0. In the context of image processing, an important feature vector F is provided by image patches, i.e., F(f^0, u)=F_τ(f^0, u)={f^0(v) : v∈N_τ(u) ∪ {u}}. In the case of a grayscale image, F_τ(f^0, .) is a vector of size (2τ+1)^2 corresponding to the values of f^0 in a square window of size (2τ+1)×(2τ+1) centered at vertex u (a pixel). Color images can be handled using features of dimension 3×(2τ+1)^2. Then, the resultant weight function directly incorporates local or nonlocal features [16]. This feature vector has been proposed in the context of texture synthesis [17], and further used in the context of image processing [18,19].

2.2 Graph Based Discrete Gradients

Let G=(V, E, w) be a weighted graph. The discrete weighted gradient of a function f∈H(V) at a vertex u∈V is defined by

$$(\nabla_w f)(u) = \bigl(\partial_v f(u)\bigr)_{(u,v)\in E},$$

where $\partial_v f(u) = \sqrt{w(u, v)}\,(f(v) - f(u))$ corresponds to the discrete (partial) derivative of f with respect to the edge (u, v). These definitions have been used by [20] for image and mesh regularization. Based on the latter works, two discrete formulations of weighted morphological gradients on graphs have been proposed by [21]: namely, the weighted external $\nabla_w^+$ and internal $\nabla_w^-$ gradient operators. For u∈V,

$$(\nabla_w^+ f)(u) = \bigl(\partial_v^+ f(u)\bigr)_{(u,v)\in E} \quad\text{and}\quad (\nabla_w^- f)(u) = \bigl(\partial_v^- f(u)\bigr)_{(u,v)\in E}, \tag{4}$$

where the external $\partial_v^+ f(u)$ and the internal $\partial_v^- f(u)$ discrete partial derivatives are $\partial_v^+ f(u) = \max(0, \partial_v f(u))$ and $\partial_v^- f(u) = -\min(0, \partial_v f(u))$, with $\partial_v^- f(u) = \partial_u^+ f(v)$. When the weight is constant (w=g0), these definitions recover the classical directional derivative operators. The L_p-norm (with 0<p<∞) and the L_∞-norm of these gradients are

$$\|(\nabla_w^\pm f)(u)\|_p = \Bigl[\sum_{v\sim u} w(u, v)^{p/2}\,\bigl|(f(v) - f(u))^\pm\bigr|^p\Bigr]^{1/p} \quad\text{and} \tag{5}$$

$$\|(\nabla_w^\pm f)(u)\|_\infty = \max_{v\sim u}\Bigl(w(u, v)^{1/2}\,\bigl|(f(v) - f(u))^\pm\bigr|\Bigr). \tag{6}$$

The notation v∼u means that vertex v is adjacent to u. $\nabla_w^\pm$ refers to both the external and the internal gradient (with respect to the sign), and $(a)^+ = \max(0, a)$ and $(a)^- = \min(0, a)$. These gradients have the following property:

$$\|(\nabla_w f)(u)\|_p^p = \|(\nabla_w^+ f)(u)\|_p^p + \|(\nabla_w^- f)(u)\|_p^p$$

with 0<p<∞.
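The gradient norms (5) and (6) and the splitting property above can be sketched in Python as follows. The data layout is an assumption of the sketch: `f` is a dictionary of vertex values, `neighbors` maps each vertex to its adjacent vertices, and `w` is a callable edge weight. The partial derivative is taken with the square root of the weight, which is what the exponents p/2 and 1/2 in (5) and (6) imply.

```python
import math

def grad_norm(f, u, neighbors, w, p=2.0, sign=None):
    """Norm of the weighted gradient of f at vertex u.

    sign=None : full gradient
    sign='+'  : external gradient, uses (a)+ = max(0, a)
    sign='-'  : internal gradient, uses (a)- = min(0, a)
    p may be a positive float or math.inf.
    """
    def part(a):
        if sign == '+':
            return max(0.0, a)
        if sign == '-':
            return min(0.0, a)
        return a

    # sqrt(w(u,v)) * |(f(v) - f(u))^(+/-)| for each neighbour v ~ u
    terms = [math.sqrt(w(u, v)) * abs(part(f[v] - f[u])) for v in neighbors[u]]
    if p == math.inf:
        return max(terms, default=0.0)
    return sum(t ** p for t in terms) ** (1.0 / p)
```

On a small example the p-th power of the full norm equals the sum of the p-th powers of the external and internal norms, because each neighbor contributes to exactly one of the two signed parts.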
3 Eikonal Equation on Weighted Graphs

In this Section, we present our formulation to approximate the eikonal equation (3) over weighted graphs by considering PdEs and the morphological gradients presented in the previous Section. With morphological processes described by (8), the time dependent eikonal formulation (3) can be viewed as an erosion process regarding the minus sign and a null potential function P. With the corresponding internal gradient ($\nabla_w^-$) involved in the discrete PdEs based erosion process, (3) can be directly rewritten with weighted graphs. Given a graph G=(V, E, w) and a function f∈H(V), we obtain a discrete PdEs based version of the system (3):

$$\begin{cases} \partial f(u, t)/\partial t = -\|(\nabla_w^- f)(u)\|_p + P(u), & u \in V, \\ f(u, t) = \phi(u), & u \in V_0 \subset V, \\ f(u, 0) = \phi_0(u), & u \in V, \end{cases}$$

where V_0 corresponds to the initial seed vertices. With f^n(u) ≈ f(u, nΔt), the following iterative numerical scheme is obtained for all u∈V:

$$f^{n+1}(u) = f^{n}(u) - \Delta t\,\bigl(\|(\nabla_w^- f^{n})(u)\|_p - P(u)\bigr). \tag{9}$$

The steady state of this process (i.e., after a fixed number n of iterations or when ‖f^{n+1} − f^n‖ < ε) is the solution of the eikonal equation (2). Injecting the corresponding internal gradient norm in (9), we obtain for the L_p-norm (5) and the L_∞-norm (6)

$$f^{n+1}(u) = f^{n}(u) - \Delta t\,\Bigl(\Bigl[\sum_{v\sim u} w(u, v)^{p/2}\,\bigl|\min(0, f^{n}(v) - f^{n}(u))\bigr|^p\Bigr]^{1/p} - P(u)\Bigr), \tag{10}$$

$$f^{n+1}(u) = f^{n}(u) - \Delta t\,\Bigl(\max_{v\sim u}\bigl(w(u, v)^{1/2}\,\bigl|\min(0, f^{n}(v) - f^{n}(u))\bigr|\bigr) - P(u)\Bigr). \tag{11}$$

The proposed methodology leads to a simple and common formulation that constitutes an adaptive framework for the eikonal equation. Indeed, our approach only depends on the p value and the weight function w. In Sect. 4, experiments show how the framework can be adapted to address image segmentation or data clustering.

Relations with other schemes. Scheme (9) has the advantage of working on any graph structure. Then, with an adapted graph topology and an appropriate weight function, the proposed formulation is linked to well-known schemes such as the Osher-Sethian Hamiltonian discretization scheme or the graph based Dijkstra algorithm.

Osher-Sethian scheme. Let G0=(V, E, g0) be an unweighted 4-adjacency grid graph associated with an image. Then, (10) recovers the exact Osher-Sethian upwind first order Hamiltonian discretization scheme [14] when p=2 and using G0:

$$f^{n+1}(u) = f^{n}(u) - \Delta t\,\Bigl(\Bigl[\sum_{v\sim u} \bigl|\min(0, f^{n}(v) - f^{n}(u))\bigr|^2\Bigr]^{1/2} - P(u)\Bigr).$$

Replacing vertices u∈V and their neighborhood by their spatial coordinates (x, y), the latter expression can be rewritten as

$$\begin{aligned}
f^{n+1}((x, y)) = f^{n}((x, y)) - \Delta t\,\Bigl(\Bigl[\,&\bigl|\max\bigl(0, f^{n}((x, y)) - f^{n}((x-1, y))\bigr)\bigr|^2 \\
+\,&\bigl|\min\bigl(0, f^{n}((x+1, y)) - f^{n}((x, y))\bigr)\bigr|^2 \\
+\,&\bigl|\max\bigl(0, f^{n}((x, y)) - f^{n}((x, y-1))\bigr)\bigr|^2 \\
+\,&\bigl|\min\bigl(0, f^{n}((x, y+1)) - f^{n}((x, y))\bigr)\bigr|^2\,\Bigr]^{1/2} - P((x, y))\Bigr),
\end{aligned}$$

since min(0, a−b)^2 = max(0, b−a)^2. This equation corresponds to the discretization scheme of the Hamilton-Jacobi equations proposed by [14].

Dijkstra scheme. Let G=(V, E, g0) be an unweighted graph. Then, (11) corresponds to an iterative version of the Dijkstra shortest path algorithm defined on graphs of arbitrary structure. Indeed, in the case where p=∞, Δt=1 and with G, (11) becomes, for all u∈V,

$$f^{n+1}(u) = f^{n}(u) - \max_{v\sim u}\bigl|\min(0, f^{n}(v) - f^{n}(u))\bigr| + P(u) = \min_{v\sim u}\bigl(f^{n}(v)\bigr) + P(u),$$

by considering the neighborhood of u as the set N(u)∪{u} and with the properties that max(0, a−b) = −min(0, b−a) and min(0, a−b) = min(a, b)−b. This equation corresponds to a shortest path algorithm for a given graph where, at each step, the distance f(u) at vertex u corresponds to the minimal distance in its neighborhood.
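The iterative scheme (9), with the L_p form (10) and the L_∞ form (11), can be sketched in Python as below. This is a minimal illustration rather than the authors' implementation: the dictionary-based graph, the function names, the stopping tolerance and the zero initialisation used in the example are assumptions. Seed vertices are simply kept fixed to enforce the boundary condition.

```python
import math

def eikonal_step(f, neighbors, weight, potential, seeds, p=math.inf, dt=1.0):
    """One explicit update of scheme (9) using the internal gradient norm."""
    new_f = dict(f)
    for u in f:
        if u in seeds:            # boundary condition: seed values stay fixed
            continue
        # sqrt(w(u,v)) * |min(0, f(v) - f(u))| for every neighbour v ~ u
        terms = [math.sqrt(weight(u, v)) * abs(min(0.0, f[v] - f[u]))
                 for v in neighbors[u]]
        if p == math.inf:
            norm = max(terms, default=0.0)                   # L-infinity norm, (11)
        else:
            norm = sum(t ** p for t in terms) ** (1.0 / p)   # L-p norm, (10)
        new_f[u] = f[u] - dt * (norm - potential(u))
    return new_f

def solve_eikonal(f0, neighbors, weight, potential, seeds,
                  p=math.inf, dt=1.0, tol=1e-9, max_iter=1000):
    """Iterate until the largest change falls below tol or max_iter is reached."""
    f = dict(f0)
    for _ in range(max_iter):
        new_f = eikonal_step(f, neighbors, weight, potential, seeds, p, dt)
        if max(abs(new_f[u] - f[u]) for u in f) < tol:
            return new_f
        f = new_f
    return f
```

On an unweighted path graph with P=1, p=∞, Δt=1 and a single seed, the iteration settles on the hop distance to the seed, which is the Dijkstra-like behaviour described above. For finite p, a smaller Δt may be needed for the explicit update to remain stable.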

4 Experiments

The proposed formulation of the eikonal equation can be used to process any function defined on the vertices of a graph or on any arbitrary discrete domain. This Section illustrates the potential of our formulation through examples of weighted distance computation, image segmentation and unorganized high dimensional data processing. Different graph structures and weight functions are also used to show the flexibility of our approach. In the sequel, all experiments are obtained with a constant potential function P=1. Clearly, a different potential function can be adapted for a particular application. The objective of the following experiments is not to solve a particular application. They only illustrate the potential and the behavior of our eikonal equation formulation.

Adaptive Front Propagation and Weighted Distances. Figure 1 shows the adaptivity of our formulation for computing weighted distances. Indeed, this example shows results for different p values, graph topologies, weight functions and features F. The initial seed is located at the top left corner of the original grayscale image f^0:H(V)→IR. First, second and third rows of Fig. 1 show results for p=2, 1 and ∞ respectively, where (10) and (11) are used. All the results correspond to color distance maps (red for small and blue for large distances) where iso-level sets are superimposed in white. First and second columns of Fig. 1 show results obtained with unweighted (w=g0) graphs. The first column uses a 4-adjacency grid graph (G0) and corresponds to the classical case. The second column uses a 25-adjacency grid graph (G2) and shows the effect of a larger neighborhood. Third and fourth columns show results obtained with weighted graphs. The third column considers graph G0 weighted by function g2 with F=f^0. By using non-constant weights, image information is automatically integrated in the distance computation, which modifies the front evolution speed, particularly in the textured sub-image.
The fourth column shows the nonlocal case where graph G2 is constructed and weighted with function g2 associated with patches of size 11×11. In that case, repetitive information is clearly captured by the weights, which stop the front propagation around the textured sub-image. Finally, segmentation of the textured sub-image can be simply obtained by thresholding the computed distances.

Image Segmentation with Region Based Graphs. The goal of the following two examples is not to show a perfect segmentation but to show how we can take advantage of graph topologies in image segmentation. The basic idea is to consider that image pixels are not the only relevant components in image


[Figure 1: panels labeled Original $f^0 : H(V) \to \mathbb{R}$; $G_0$, $w=g_0$; $G_2$, $w=g_0$; $G_0$, $w=g_2$, $F=f^0$; $G_2$, $w=g_2$, $F=F_5(f^0, .)$; row labels $p=1$, $p=2$, $p=\infty$.]

Fig. 1. Front propagation and weighted distances with different p values, graph configurations G, weights w and features F. Figures represent color distance maps with iso-level sets obtained by thresholding the distances. The seed is located at the top left corner (see text for more details).

and more abstract elements such as image regions can be used. Hence, we suggest working directly with a reduced version of images: image partitions. Image partitions can be obtained by image pre-processing methods such as the watershed. Figures 2(b) and 3(b) show such partitions computed from Figs. 2(a) and 3(a). Figures 2(c) and 3(c) are images reconstructed from the partitions with the mean color value of each region. Figure 2 presents an example of image segmentation based on a RAG and also shows that this graph structure can accelerate segmentation. This example compares the segmentation obtained with a 4-adjacency grid graph $G_0$ weighted by function $g_2$ with pixel grayscale values (Fig. 2(d)) and the segmentation obtained with a RAG constructed from the partition (Fig. 2(b)) and weighted by function $g_2$ with mean values (Fig. 2(e)). Color distance maps are obtained with the initial seeds (white points) in Fig. 2(a). Segmentations are performed by thresholding the obtained distances. The results show similar behavior for both the distance maps and the segmentations, while the RAG drastically speeds up the segmentation process. Indeed, the RAG has approximately 3% as many vertices as the pixel based graph (999 regions versus 35 250 pixels). The direct consequence is a decrease in computational cost thanks to the reduced amount of data to consider. On a standard computer the computing time can be decreased by a factor of 10. Figure 3 shows another benefit of using a RAG structure: nonlocal (non spatially connected) object segmentation. This experiment compares segmentation results with a RAG (Fig. 3(d)) and a nonlocal RAG (Fig. 3(e)). Both graphs are


[Figure 2: panels (a) Original and seeds (white); (b) Partition (97% of reduction); (c) Reconstructed image; (d) Grid graph $G_0$, $w=g_2$; (e) RAG, $w=g_2$.]

Fig. 2. Acceleration of image segmentation process. (a) original image (150×235) with 35 250 pixels. (b) partition with 999 regions (97% of reduction in terms of image components). (c) reconstructed image with mean color value. (d) and (e): at left, distance color maps (red for small and blue for large distances) and at right, final segmentations. Images (d) are obtained with a pixel based graph computed from (a). Images (e) are obtained with a RAG constructed with (b) and (c) (see text for more details).

computed from partition 3(b) weighted by function $g_2$ with mean color values. In the nonlocal RAG case, each vertex neighborhood is extended with its 5 nearest neighbors based on the mean value feature. The obtained graph is a RAG∪5NNG graph. Figures 3(d) and 3(e) show color distance maps computed from the initial seeds (white stroke) in Fig. 3(a) and the final segmentations. In the local case (Fig. 3(d)), the object marked by the seeds is well segmented since its distances to the seeds are small (red color). The other objects are far (blue color) and the final segmentation only extracts the marked one. In the nonlocal case (Fig. 3(e)), the points within the marked object are again close to the initial seeds. In addition, the other triangles in the scene are also computed as close to the seeds (red color). The consequence is that, with a minimal number of initial seeds, all the objects in the image are extracted by thresholding even though they are not spatially close.

Unorganized High Dimensional Data Processing. The following experiments show applications of our formulation of the eikonal equation to the processing of high dimensional data on irregular domains. Figure 4 shows applications of the eikonal equation to data clustering and shortest path problems. The initial data set (Fig. 4(a)) consists of 133 images of head poses. Each image is of size 29×29. From this data set, two possible applications can be performed: clustering and head pose transition estimation. The goal here is not to solve machine learning problems, but to show that these problems can be addressed by our formulation of the eikonal equation. In order to process such data, a graph ($|V|=133$) is constructed where each vertex represents an image and is described by a feature of size 29×29 (i.e., a vector in $\mathbb{R}^{841}$). In the following results, initial seeds (images) are represented with white boundaries. Points that are close and


[Figure 3: panels (a) Original and seeds (white); (b) Partition (98% of reduction); (c) Reconstructed image; (d) RAG, $w=g_2$; (e) RAG∪5-KNNG, $w=g_2$.]

Fig. 3. Nonlocal region based image segmentation. (a) original image (256×256) with 65 536 pixels. (b) partition with 1 324 regions (98% of reduction as compared to original one). (d) and (e) at left, distance color maps (red for small and blue for large distances) and at right, final segmentations. Graphs used in (d) and (e) are computed from (b) and (c) (see text for more details).

far from the seeds are respectively represented with blue and red colors in the distance maps (Fig. 4(b) and 4(c)). Figures 4(b) and 4(d) show the application of the eikonal equation to data clustering. Such an application can be used for data set exploration or semi-supervised learning: given an input seed (query), one wants to obtain the closest points with respect to the initial input. Figure 4(b) shows the distance map obtained from a single initial seed. Figure 4(d) shows clustering results. The initial input has a white boundary. The 10 closest images are located at the top and the 10 farthest at the bottom of Fig. 4(d). Figures 4(c) and 4(e) show another application of the eikonal equation to this data set. Given two initial images, one wants to recover a sequence of intermediate images that leads from one to the other. This problem can be viewed as a shortest path problem solved by the eikonal equation. Figure 4(c) shows the distance map obtained from the initial seeds. Figure 4(e) shows the obtained path from the seed at top left to the seed at bottom right. These experiments show satisfying results and the ability of our approach to address machine learning problems even if a simple Euclidean distance is used to compare data points. Clearly, the results can be improved by using well adapted distances or feature estimation.
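The workflow described above (seed, distance map, then nearest points or a path between two seeds) can be sketched on any feature set. The following Python sketch is only an illustration under stated assumptions and is not the PdE-based eikonal solver of this paper: it substitutes Dijkstra's algorithm for the front propagation, builds a symmetric k-nearest-neighbor graph with Euclidean edge costs, and uses hypothetical helper names (`knn_graph`, `distance_map`, `shortest_path`) and a toy one-dimensional data set.

```python
import heapq
import numpy as np

def knn_graph(X, k):
    """Symmetric k-NN graph, adj[i] = {j: cost}. Euclidean costs are an
    assumption; the sketch also assumes that X has no duplicate points."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    adj = {i: {} for i in range(len(X))}
    for i in range(len(X)):
        for j in np.argsort(D[i])[1:k + 1]:   # index 0 is the point itself
            adj[i][int(j)] = float(D[i, j])
            adj[int(j)][i] = float(D[i, j])   # symmetrize
    return adj

def distance_map(adj, seeds):
    """Dijkstra from a set of seeds (stand-in for the eikonal solver)."""
    dist = {v: float("inf") for v in adj}
    pred = {v: None for v in adj}
    heap = []
    for s in seeds:
        dist[s] = 0.0
        heap.append((0.0, s))
    heapq.heapify(heap)
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue                          # stale queue entry
        for w, cost in adj[v].items():
            if d + cost < dist[w]:
                dist[w] = d + cost
                pred[w] = v
                heapq.heappush(heap, (d + cost, w))
    return dist, pred

def shortest_path(pred, target):
    """Follow predecessors back from target to the seed."""
    path = [target]
    while pred[path[-1]] is not None:
        path.append(pred[path[-1]])
    return path[::-1]

# Toy data: five 1-D feature vectors with increasing gaps.
X = np.array([[0.0], [1.0], [2.1], [3.3], [4.6]])
adj = knn_graph(X, k=1)
dist, pred = distance_map(adj, seeds=[0])
closest = sorted(dist, key=dist.get)[:3]      # "clustering": 3 closest points
path = shortest_path(pred, target=4)          # transition sequence 0 -> 4
```

For the head pose data, `X` would hold the 133 feature vectors of dimension 841; thresholding or sorting `dist` gives the closest and farthest images, and `path` gives a transition sequence.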


[Figure 4: panels (a) Original data, 133 images of size 29×29 ($|V|=133$, $f^0 : V \to \mathbb{R}^{841}$); (b) Color distance map + seed (white boundaries); (c) Color distance map + seeds (white boundaries); (d) Local clustering with (b); (e) Shortest path with (c).]

Fig. 4. High dimensional data clustering and shortest path. (b) and (c): color distance maps (blue for small and red for large distances); images with superimposed white boundaries are the initial seeds. (d) Clustering results: at top the 10 closest and at bottom the 10 farthest images with respect to the seed (white boundary). (e) Shortest path between the two initial seeds (white boundaries).

5 Conclusion

In this paper, a discrete version of the eikonal equation over weighted graphs of arbitrary structure is proposed. A solution of the eikonal equation based on PdEs, discrete gradients and weighted graphs is presented. The proposed formulation constitutes a simple, common and adaptive framework that recovers well-known definitions and unifies local and nonlocal configurations in the context of image processing. This framework can handle any discrete data that can be represented by weighted graphs. Through experiments, we have shown the potential and the flexibility of our approach to address image segmentation


and unorganized high dimensional data processing. Finally, ongoing work addresses the stationary (time independent) version of the eikonal equation and its solution by fast marching like methods on arbitrary graphs within our framework.

References

1. Rouy, E., Tourin, A.: A viscosity solutions approach to shape-from-shading. SIAM J. Num. Anal. 29, 867–884 (1992)
2. Sethian, J.: A fast marching level set method for monotonically advancing fronts. Proc. Nat. Acad. Sci. 41(2), 199–235 (1999)
3. Siddiqi, K., Bouix, S., Tannenbaum, A., Zucker, S.W.: The Hamilton-Jacobi skeleton. In: Proc. ICCV, pp. 828–834 (1999)
4. Maragos, P., Butt, M.: Curve evolution, differential morphology and distance transforms as applied to multiscale and eikonal problems. Fundamenta Informaticae 41, 91–129 (2000)
5. Bronstein, A.M., Bronstein, M.M., Kimmel, R.: Weighted distance maps computation on parametric three-dimensional manifolds. J. Comput. Phys. 225(1), 771–784 (2007)
6. Sethian, J.A., Vladimirsky, A.: Ordered upwind methods for static Hamilton-Jacobi equations: Theory and algorithms. SIAM J. Num. Anal. 41(1), 325–363 (2003)
7. Abgrall, R.: Numerical discretization of the first-order Hamilton-Jacobi equations on triangular meshes. Comm. Pure and Applied Math. 49, 1339–1373 (1996)
8. Shu, C.-W., Zhang, Y.-T.: High order WENO schemes for Hamilton-Jacobi equations on triangular meshes. SIAM J. Scien. Comp. 24, 1005–1030 (2003)
9. Mémoli, F., Sapiro, G.: Fast computation of weighted distance functions and geodesics on implicit hyper-surfaces. J. Comput. Phys. 173, 730–764 (2001)
10. Leveque, R.: Finite Volume Methods for Hyperbolic Problems. Cambridge University Press, Cambridge (2002)
11. Sethian, J.A.: Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Materials Science, 2nd edn. Cambridge University Press, Cambridge (1999)
12. Zhao, H.K.: Fast sweeping method for eikonal equations. Math. Comp. 74, 603–627 (2005)
13. Tsitsiklis, J.: Efficient algorithms for globally optimal trajectories. IEEE Trans. Autom. Control 40(9), 1528–1538
14. Osher, S., Sethian, J.: Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations. J. Comput. Phys. 79, 12–49 (1988)
15. Jaromczyk, J., Toussaint, G.: Relative neighborhood graphs and their relatives. Proc. IEEE 80(9), 1502–1517 (1992)
16. Elmoataz, A., Lézoray, O., Bougleux, S., Ta, V.T.: Unifying local and nonlocal processing with partial difference operators on weighted graphs. In: Proc. LNLA, pp. 11–26 (2008)
17. Efros, A., Leung, T.: Texture synthesis by non-parametric sampling. In: Proc. ICCV, pp. 1033–1038 (1999)
18. Buades, A., Coll, B., Morel, J.: Nonlocal image and movie denoising. IJCV 76(2), 123–139 (2008)
19. Gilboa, G., Osher, S.: Nonlocal operators with applications to image processing. Report 07-23, UCLA (2007)
20. Bougleux, S., Elmoataz, A., Melkemi, M.: Discrete regularization on weighted graphs for image and mesh filtering. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 128–139. Springer, Heidelberg (2007)
21. Ta, V.T., Elmoataz, A., Lézoray, O.: Partial difference equations over graphs: Morphological processing of arbitrary discrete data. In: Proc. ECCV, pp. 668–680 (2008)
22. Brockett, R., Maragos, P.: Evolution equations for continuous-scale morphological filtering. IEEE Trans. Signal Process. 42(12), 3377–3386 (1994)

A Variational Model for Interactive Shape Prior Segmentation and Real-Time Tracking

Manuel Werlberger, Thomas Pock, Markus Unger, and Horst Bischof

Institute for Computer Graphics and Vision, Graz University of Technology
{werlberger,pock,unger,bischof}@icg.tugraz.at
http://www.gpu4vision.org

Abstract. In this paper, we introduce a semi-automated segmentation method based on minimizing the Geodesic Active Contour energy incorporating a shape prior. We increase the robustness of the segmentation result using additional shape information that represents the desired structure. Furthermore, the user can take corrective actions during the segmentation and adapt the shape prior position. Interaction is often desirable when processing difficult data, as in medical applications. To facilitate the user interaction we add a shape deformation that allows the shape position to be changed manually by the user and automatically based on underlying image features. Using a variational formulation, the optimization can be done in a globally optimal manner for a fixed shape representation. To obtain real-time behavior, which is especially important for an interactive tool, the whole method is implemented on the GPU. Experiments are done on medical data as well as on video data and camera streams that are processed in real time. For medical data we compare our method with a segmentation done by an expert. The GPU based binaries will be available online on our homepage.

1 Introduction

Image segmentation is a very common problem in computer vision. Many segmentation methods use low-level features to obtain a division into fore- and background. Because of the need for robustness it has become common practice to incorporate high-level knowledge to obtain reasonable results. The method presented in this paper enhances the robustness of segmentation by imposing shape information about the desired object, which allows a precise result on difficult image data (Fig. 1). Pioneering contributions have been made by Cremers et al. [1] with their variational approach of ‘Diffusion Snakes’, the level set formulation by Leventon et al. [2, 3] as well as the region based approach by Paragios and Rousson [4, 5]. The efficient registration of the shape prior to the desired image structure is a challenging problem. Therefore we developed a semi-automated segmentation tool that allows the object position to be adjusted by hand and by a local optimization routine, both of which are modelled as a shape transformation. For the

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 200–211, 2009. © Springer-Verlag Berlin Heidelberg 2009


Fig. 1. Segmentation of the metacarpal bone of a ring finger. The left image shows a simple intensity thresholding, which clearly fails. The pure GAC segmentation in the middle image does not result in a valid segmentation either. The shape prior segmentation in the right image can also deal with the low-contrast regions and provides an accurate result.

realization of the segmentation method we use a variational formulation of the Geodesic Active Contour (GAC) energy. The minimization is done with a fast primal-dual approach that is implemented on NVIDIA graphics hardware to become real-time capable. Therefore the method can even be used for tracking objects in videos or live camera streams. Approaches based on the calculus of variations have had great success, and recently it has been shown that variational methods parallelize well and benefit from a GPU implementation [6, 7]. The main contribution of our work is the incorporation of a shape representation into a variational segmentation framework. We show that the resulting segmentation is globally optimal for a fixed shape prior and provide a GPU implementation of a fast primal-dual optimization procedure with a well-defined convergence criterion. This makes it possible to interact with the shape position and get a segmentation result in real time. The framework permits a local optimization of the shape position to get a correct segmentation of objects even when the prior is initially misaligned. The remainder of the paper is organized as follows: First we give an overview of related work. Section 3 discusses how to combine a shape prior with a variational formulation of the GAC segmentation model, which leads to a segmentation model utilizing a Mumford-Shah (MS) like data term as shape force. In Section 3.1 we propose a fast numerical algorithm to compute the solution of the segmentation model. In Section 4 we present experiments and a qualitative assessment on reference data. Finally, Section 5 gives a short conclusion.

2 Related Work

2.1 Mumford-Shah Segmentation

In [8], Mumford and Shah (MS) proposed a segmentation model of the form

$$\min_{u,\Gamma} \int_\Omega (u - f)^2 \, dx + \alpha \int_{\Omega \setminus \Gamma} |\nabla u|^2 \, dx + \beta \, \mathrm{length}(\Gamma) \tag{1}$$


where $f$ denotes the observed image, $u$ its piecewise smooth approximation and $\Gamma$ represents the edges in $u$. Equation (1) is based on a piecewise smooth approximation of the intensity function and has been used in the computer vision community for various tasks like denoising, inpainting, stereo matching, segmentation and many more. A special case of the MS model for segmenting an image into fore- and background is the so-called piecewise constant MS segmentation model (2), which was later used by Chan and Vese [9] in combination with a level-set optimization:

$$\min_{\Sigma, c_1, c_2} \mathrm{Per}(\Sigma) + \lambda \int_\Sigma (f - c_1)^2 \, dx + \lambda \int_{\Omega \setminus \Sigma} (f - c_2)^2 \, dx \tag{2}$$

In (2), $f$ denotes the input image and $c_1$, $c_2$ the mean values of the fore- and background intensities separated by the region $\Sigma$. This realization of the MS functional represents the Potts model [10] for two distinct classes. In addition, Chan et al. provided a convex formulation in [11] in the form of a Total Variation (TV) functional for a binary segmentation $u = 1_\Sigma$:

$$\min_{u \in \{0,1\}} \int_\Omega |\nabla u| \, dx + \lambda \int_\Omega u \, s(x) \, dx \tag{3}$$

with

$$s(x) = (c_1 - f(x))^2 + (1 - u)(c_2 - f(x))^2 \tag{4}$$

and with $\int_\Omega |\nabla u| \, dx$ being the TV-norm in the distributional sense, $\int_\Omega |Du|$. Since we are working on image data that can be interpreted as sufficiently smooth functions, the TV-norm is valid for any input $u$ and we stick to the notation $\int_\Omega |\nabla u| \, dx$ in this paper. For this formulation the TV-norm denotes the length of the segmentation boundary: $\int_\Omega |\nabla u| \, dx = \mathrm{Per}(\Sigma)$. Moreover, Chan et al. [11] showed that a global minimizer of (2) can be found by minimizing the relaxed problem (3) under the restriction $0 \le u \le 1$. The minimizing set is then given by

$$\Sigma = \{ x \in \Omega : u(x) > \mu \}, \quad \text{for every } \mu \in (0, 1). \tag{5}$$

2.2 Geodesic Active Contours

Based on the Snake model of Kass et al. [12], Caselles et al. [13] and Kichenassamy et al. [14, 15] proposed an energy that is invariant with respect to new parametrizations of the contour. The Geodesic Active Contour (GAC) model (in 3D the model is called minimal surface) is defined as the variational problem

$$\min_C \int_0^{|C|} g(|\nabla I(C(s))|) \, ds, \tag{6}$$

where $|C|$ describes the Euclidean length of the curve $C$ and the function $g$ models an edge detector. The edge strength has to be restricted to the interval $g \in (0, 1]$. One common choice for computing $g$ is

$$g(|\nabla I|) = e^{-\eta |\nabla I|^\kappa}, \tag{7}$$

for some reasonable parameters $\kappa$ and $\eta$.
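A minimal NumPy sketch of the edge detector (7) is given below. The parameter values ($\eta = 5$, $\kappa = 1$), the central-difference gradient and the small floor `eps` that keeps $g$ strictly positive in floating point are assumptions made for illustration; the function name `edge_detector` is hypothetical.

```python
import numpy as np

def edge_detector(image, eta=5.0, kappa=1.0, eps=1e-6):
    """g(|grad I|) = exp(-eta * |grad I|**kappa), kept inside [eps, 1]."""
    gy, gx = np.gradient(image.astype(float))   # central differences
    mag = np.sqrt(gx ** 2 + gy ** 2)            # gradient magnitude |grad I|
    g = np.exp(-eta * mag ** kappa)
    return np.maximum(g, eps)                   # floor is an assumption

flat = np.full((8, 8), 0.5)                     # no edges: g should be 1
step = np.zeros((8, 8))
step[:, 4:] = 1.0                               # vertical edge between columns 3 and 4
g_flat = edge_detector(flat)
g_step = edge_detector(step)
```

On the flat image $g$ equals 1 everywhere, while on the step image $g$ drops at the two columns next to the edge, so a contour passing there is cheap in the sense of (6).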


The general intention of the Snake model is to locate the curve at points with high edge strength while keeping a certain smoothness of the curve. The main advantage of GACs is the profound mathematical framework that makes the model very versatile for different applications. The main drawbacks of the model are the non-convexity of the GAC energy and the fact that the empty set is always a global minimizer of (6). In [16, 17, 18], several authors proposed the so-called weighted Total Variation (8), which can be used to give an alternative formulation of the GAC energy. They showed that if $u = 1_{\Omega_C}$ is a binary function with $C$ the boundary of $\Omega_C$, the energy (8) equals the GAC energy (6):

$$TV_g(u) = \int_\Omega g \, |\nabla u| \, dx \tag{8}$$
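The relation between the weighted Total Variation of a binary function and the weighted boundary length can be checked numerically. The sketch below is an illustration under assumptions: it uses forward differences on a pixel grid, a constant weight $g$, and a hypothetical function name `weighted_tv`. With these choices the discrete value only approximates the continuous length.

```python
import numpy as np

def weighted_tv(u, g):
    """Discrete TV_g(u) = sum over pixels of g * |grad u| (forward differences)."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return float(np.sum(g * np.sqrt(gx ** 2 + gy ** 2)))

u = np.zeros((40, 40))
u[10:30, 5:25] = 1.0                              # indicator of a 20 x 20 square
tv_plain = weighted_tv(u, np.ones_like(u))        # g = 1: close to the perimeter 80
tv_half = weighted_tv(u, 0.5 * np.ones_like(u))   # g = 0.5: half of that value
```

For the square, the discrete value differs from the perimeter by less than one pixel length, and scaling $g$ scales the energy accordingly, which is the behavior that lets $g$ attract the boundary to image edges.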

Note that the weighted TV-norm is similar to the regularization term of (3). The additional weighting function $g$ is a pointwise constant multiplier and therefore the method of Chan et al. [11, 19] mentioned in Section 2.1 is still valid. Based on these assumptions they showed that by replacing $1_{\Omega_C}$ with $u \in [0, 1]$, (8) becomes convex, which allows a global minimizer to be computed. However, there remains the problem that the empty set is a globally optimal solution. In [6, 7], Unger et al. proposed a variational formulation incorporating user constraints that avoids this drawback of the classical GAC model.

3 Shape Prior Segmentation

Our main contribution is to combine GAC segmentation and an MS-like data term to incorporate shape information. To this end we model the GAC energy with the weighted Total Variation (8) and meet the need for additional constraints by imposing the shape prior. Starting with a formulation of the Mumford-Shah like energy utilized by Chan and Vese in [9] we obtain a variational optimization problem like (3). The multiplicative part of the data term will be used as shape information $s(x)$ to model the shape prior segmentation energy. In addition we use the weighted Total Variation (8) to model the GAC energy and add a parameter $\lambda$ to balance between regularization and shape force:

$$\min_{0 \le u \le 1} \int_\Omega g \, |\nabla u| \, dx + \lambda \int_\Omega s(x) \, u \, dx \tag{9}$$

For a low $\lambda$ the result of the GAC will be preferred, whereas for increasing $\lambda$ the shape prior will be taken more into account. Fig. 2 shows the effects of different parameter settings. Since (9) is homogeneous of degree one, the thresholding theorem of [11] still applies in our case, allowing us to compute the global minimizer of (9) as described in Section 2.1 for the MS segmentation model (3). The optimization method is discussed in more detail in Section 3.1.


Fig. 2. Evaluation of diﬀerent settings of λ: The images show that with increasing λ = {0.01, 0.02, 0.1} (left to right) the segmentation is more attracted to the ﬁxed shape. For a low λ the inﬂuence of the pure GAC energy increases and the segmentation is more attracted to signiﬁcant edges.

The shape prior itself influences the segmentation by setting pixelwise fore- and background constraints, which are modelled in the following way:

$$\begin{aligned} s(x) < 0 \quad &\ldots \quad \text{Foreground} \\ s(x) > 0 \quad &\ldots \quad \text{Background} \end{aligned} \tag{10}$$
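Two shape maps that satisfy the sign convention (10) are a binary map with values $\pm 1$ and a signed distance map. The following sketch constructs both for a small disk. It is an illustration only: the brute-force distance computation is an assumption chosen for brevity (it is quadratic in the number of pixels), and the names `binary_shape` and `signed_distance` are hypothetical.

```python
import numpy as np

def binary_shape(mask):
    """s = -1 inside the shape (foreground), +1 outside (background)."""
    return np.where(mask, -1.0, 1.0)

def signed_distance(mask):
    """Brute-force signed distance map: negative inside, positive outside.
    Needs both inside and outside pixels; only meant for small images."""
    pts = np.argwhere(np.ones(mask.shape, dtype=bool)).astype(float)
    inside = pts[mask.ravel()]
    outside = pts[~mask.ravel()]
    d_in = np.linalg.norm(pts[:, None] - inside[None], axis=2).min(axis=1)
    d_out = np.linalg.norm(pts[:, None] - outside[None], axis=2).min(axis=1)
    # inside pixels have d_in = 0, outside pixels have d_out = 0
    return (d_in - d_out).reshape(mask.shape)

yy, xx = np.mgrid[:16, :16]
mask = (yy - 8) ** 2 + (xx - 8) ** 2 <= 16      # disk of radius 4
s_bin = binary_shape(mask)
s_sdf = signed_distance(mask)
```

Both maps agree in sign, but the magnitude of the signed distance map grows with the distance to the shape boundary.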

Therefore one can use different types of shape representations. As a simple example we defined the shape $s(x)$ as a binary function with $s(x) = -1$ within the shape region and $s(x) = 1$ outside, similar to [20], where Cremers et al. use subspace methods to learn a representative set of shapes. Encoding shapes with binary functions leads to problems when interpolating between two instances. As a consequence, we use a signed-distance map with the constraints (10), which implicitly includes distance information towards the shape boundary. For our algorithm this means that the deeper a pixel lies within the shape boundary, the more likely it belongs to the desired segmentation. In Fig. 5 we show the benefit of using a signed-distance map as shape representation compared to a binary one. This representation, combined with a GAC energy, allows deformations to be handled with a single prior.

3.1 Solving the Shape Prior Segmentation Model

It is well known that functionals like (9) are difficult to optimize due to the $L^1$ norm $|\nabla u|$. Chan et al. [21], Carter [22] and Chambolle [23, 24] proposed a dual formulation for optimizing the classical variational problem of Rudin, Osher and Fatemi [25] for image denoising. Such a primal-dual approach can be applied to our minimization problem (9). The main intention is to remove the singularity by introducing the dual formulation of the weighted TV-norm

$$\int_\Omega g \, |\nabla u| \, dx = \max_{\|p\| \le g} \left\{ - \int_\Omega u \, \mathrm{div} \, p \, dx \right\}, \tag{11}$$

where $p = (p^1, \ldots, p^d)^T : \Omega \to \mathbb{R}^d$ is the dual variable with $d$ being the dimension of the problem. Combining this maximization problem with the initial minimization task (9) leads to

$$\min_{0 \le u \le 1} \max_{\|p\| \le g} \left\{ - \int_\Omega u \, \mathrm{div} \, p \, dx + \lambda \int_\Omega s(x) \, u \, dx \right\}. \tag{12}$$

For a fixed shape prior the outline of the primal-dual optimization algorithm is given as follows:

1. Primal update: The primal update accomplishes the segmentation update and therefore performs the minimization with respect to $u$:

$$\frac{\partial}{\partial u} \left\{ - \int_\Omega u \, \mathrm{div} \, p \, dx + \lambda \int_\Omega s(x) \, u \, dx \right\} = - \mathrm{div} \, p + \lambda s(x) \tag{13}$$

A gradient descent update scheme leads to

$$u^{n+1} = \Pi_{[0,1]} \left( u^n - \tau_P \left( - \mathrm{div} \, p + \lambda s(x) \right) \right), \tag{14}$$

where $\tau_P$ denotes the step length, and the orthogonal projection $\Pi_{[0,1]}$ onto the interval $[0, 1]$ can be done with a simple pointwise truncation.

2. Dual update: The maximization with respect to $p$ can be stated as

$$\frac{\partial}{\partial p} \left\{ \int_\Omega p \cdot \nabla u \, dx + \lambda \int_\Omega s(x) \, u \, dx \right\} = \nabla u \tag{15}$$

with the additional constraint $\|p\| \le g$. This results in a gradient ascent method with an orthogonal reprojection that restricts the length of $p$ to the weight $g$:

$$p^{n+1} = \Pi_{B_0^g} \left( p^n + \tau_D \nabla u \right) \tag{16}$$

Here $B_0^g$ denotes a $d$-dimensional ball centered at the origin with radius $g$. The reprojection onto $B_0^g$ can be formulated as

$$\Pi_{B_0^g}(q) = \frac{q}{\max \left\{ 1, \frac{\|q\|}{g} \right\}} \tag{17}$$

3. Iterate until convergence: Solving the optimization problem (12) results in an alternating update scheme with a gradient descent step (14) and a gradient ascent step (16). Such an iterative algorithm requires a convergence criterion. Therefore we take the energies of the individual steps into account:

Primal energy: The primal energy can be calculated by maximizing (12) with respect to the dual variable $p$. Due to (11), $p$ can be restated as

$$p = \begin{cases} g \, \dfrac{\nabla u}{|\nabla u|} & \text{if } \nabla u \ne 0 \\ p \in B_0^g & \text{else} \end{cases} \tag{18}$$

for the optimization. This results in the energy (19), which is the same as evaluating the energy functional (9):

$$E_{Primal} = \int_\Omega g \, |\nabla u| + \lambda s(x) \, u \, dx. \tag{19}$$


Fig. 3. Relation of primal to dual energy while optimizing with the proposed primal-dual update scheme. The plot shows the iterations in a logarithmic scale over 100 iterations. Note that after 20 iterations the primal-dual gap is small enough to stop iterating.

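Dual energy: The dual energy can be formulated by minimizing (12) with respect to $u$:

$$\min_{0 \le u \le 1} \int_\Omega u \left( - \mathrm{div} \, p + \lambda s(x) \right) dx, \tag{20}$$

from which it follows that the binary segmentation $u \in \{0, 1\}$ is set to $u = 1$ if the term $- \mathrm{div} \, p + \lambda s(x) < 0$ and $u = 0$ otherwise:

$$E_{Dual} = \int_\Omega \min \left\{ - \mathrm{div} \, p + \lambda s(x), 0 \right\} dx \tag{21}$$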
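The update steps (14), (16) and (17) together with the energies (19) and (21) can be sketched in a few lines of NumPy. This is a CPU illustration under stated assumptions, not the GPU implementation described in this paper: the forward-difference gradient with its negative adjoint as divergence, the step sizes `tau_p = tau_d = 0.25`, the stopping tolerance, the initialization and all function names are choices made for this sketch only.

```python
import numpy as np

def grad(u):
    """Forward differences with zero gradient at the last row and column."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    """Negative adjoint of grad, so that <grad u, p> = -<u, div p>."""
    d = np.zeros_like(px)
    d[:-1, :] += px[:-1, :]
    d[1:, :] -= px[:-1, :]
    d[:, :-1] += py[:, :-1]
    d[:, 1:] -= py[:, :-1]
    return d

def primal_energy(u, g, s, lam):                     # discrete version of (19)
    gx, gy = grad(u)
    return float(np.sum(g * np.sqrt(gx ** 2 + gy ** 2) + lam * s * u))

def dual_energy(px, py, s, lam):                     # discrete version of (21)
    return float(np.sum(np.minimum(-div(px, py) + lam * s, 0.0)))

def segment(g, s, lam, tau_p=0.25, tau_d=0.25, max_iter=500, tol=1e-3):
    u = np.full(g.shape, 0.5)
    px = np.zeros_like(g)
    py = np.zeros_like(g)
    gap = np.inf
    for _ in range(max_iter):
        # primal descent step (14), projected onto [0, 1] by clamping
        u = np.clip(u - tau_p * (-div(px, py) + lam * s), 0.0, 1.0)
        # dual ascent step (16), reprojected onto |p| <= g as in (17)
        gx, gy = grad(u)
        px, py = px + tau_d * gx, py + tau_d * gy
        scale = np.maximum(1.0, np.sqrt(px ** 2 + py ** 2) / g)
        px, py = px / scale, py / scale
        # primal-dual gap as convergence criterion
        gap = primal_energy(u, g, s, lam) - dual_energy(px, py, s, lam)
        if gap < tol:
            break
    return u, px, py, gap

yy, xx = np.mgrid[:32, :32]
s = np.where((yy - 16) ** 2 + (xx - 16) ** 2 <= 64, -1.0, 1.0)   # binary disk prior
g = np.ones((32, 32))                                 # no image edges in this toy case
u, px, py, gap = segment(g, s, lam=10.0)
mask = u > 0.5                                        # thresholding as in (5)
```

In this toy setting the shape force dominates ($\lambda = 10$, $g = 1$), so the thresholded result reproduces the prior; with an edge map $g$ computed from an image and a smaller $\lambda$, the boundary is free to move towards image edges.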

In [26, 27], Zhu et al. introduce a measure for the convergence state of primal-dual algorithms in the case of the ROF model. As a criterion they use the gap between the primal and dual energy. Applied to our primal-dual optimization algorithm, the energies evolve as shown in Fig. 3. We use fixed step widths for the primal ($\tau_P$) and dual ($\tau_D$) updates with the constraint that $\tau_P \tau_D \le \frac{1}{2}$. For all the results shown in this paper we used $\tau_P = \tau_D = \frac{1}{\sqrt{2}}$. We also tried adaptive time steps ($\tau_P$, $\tau_D$) for the optimization steps similar to the work of Zhu and Chan in [26] but did not find a reasonable equivalent for our method.

3.2 Shape Alignment

So far, our considerations have assumed a spatially fixed shape prior $s(x)$. In order to adapt the shape prior to different locations in the image we introduce a set of transformation parameters $\phi = \{t, R, S\}$ with $t$ for translation, $R$ for rotation and $S$ for scale. Including this transformation in the segmentation energy (9) leads to an additional optimization parameter:

$$\min_{u, \phi} \int_\Omega g \, |\nabla u| \, dx + \lambda \int_\Omega \left( \phi(t, R, S) \circ s(x) \right) u \, dx \tag{22}$$


Fig. 4. Segmentation of a vertebra in an X-ray image of the spine. Due to very bad contrast the segmentation without prior would fail completely. The definition of the prior was prepared by us and therefore cannot be considered as reference data.

In [20], Cremers et al. show with the help of Lipschitz continuity that with an integrated rigid body motion the energy functional remains convex and can therefore be optimized globally for fixed transformation parameters. They additionally show that for optimizing the shape position itself, the complete subspace $\Omega$ on which the energy (9) is defined has to be sampled on a rather fine grid with all possible shape positions in order to evaluate the minimization task globally. This is of course not feasible for an interactive application, which has to optimize the problem in real time. Therefore we added a semi-automated approach that lets the user influence the shape position and, in addition, performs an automated search for the optimal transformation parameters in a local neighborhood. The optimization scheme of Sect. 3.1 can be retained, extended by the optimization of the transformation $\phi$:

1. Solve shape prior segmentation model.
2. Optimize transformation parameters $\phi$ with a fixed $u$: Here we use a semi-automated position optimization of the shape prior. First, the user can position the shape coarsely; second, an optimization step tries to fit the shape to the desired surrounding structures. To this end we evaluate the energy (19) for different transformations $\phi$. The optimal position is found where (19) has its minimum. The user gets an immediate result while changing the shape position and can therefore interact directly with the segmentation algorithm.
3. Iterate until convergence.

Note that the domain of $\phi$ is restricted for performance reasons. With a complete search over the whole space of transformation parameters, a globally optimal solution of (22) can be calculated as in [20].
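Step 2 of the scheme above can be sketched as an exhaustive search in a small neighborhood. The sketch below is an illustration under assumptions: it covers integer translations only (rotation and scale are left out), it uses `np.roll`, which wraps around at the image border, and the name `align_translation` and the search radius are hypothetical. For a fixed $u$ only the shape term of (19) depends on the transformation, so only that term is evaluated.

```python
import numpy as np

def align_translation(u, s, lam=1.0, radius=6):
    """Local exhaustive search over integer shifts (dy, dx) of the prior s
    for a fixed segmentation u; returns the best shift and its shape energy."""
    best, best_energy = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(s, shift=(dy, dx), axis=(0, 1))   # wraps at borders
            energy = lam * float(np.sum(shifted * u))           # shape term of (19)
            if energy < best_energy:
                best, best_energy = (dy, dx), energy
    return best, best_energy

yy, xx = np.mgrid[:64, :64]
# signed distance prior of a disk of radius 10 centered at (32, 32)
s = np.sqrt((yy - 32) ** 2 + (xx - 32) ** 2) - 10.0
# fixed segmentation: the same disk, displaced by 4 rows and -3 columns
u = ((yy - 36) ** 2 + (xx - 29) ** 2 <= 100).astype(float)
shift, energy = align_translation(u, s)
```

In the full scheme this search alternates with the segmentation update, and restricting `radius` corresponds to the restricted domain of $\phi$ mentioned above.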

4 Experimental Results

To reach real-time performance we have to compute the iterative solution as fast as possible. Because variational algorithms parallelize well, we decided to implement the method using GPGPU programming with the help of NVIDIA's CUDA. The resulting performance gains offer the possibility to combine


Fig. 5. The same dataset as in Fig. 1 was used to evaluate the shape alignment step. Therefore the shape prior is placed nearby the desired bone (left image) and the alignment step searches for the optimal position in the local neighborhood. The results (middle image) show that the segmentations are equivalent to the hand-labeled points. The right image shows the alignment step using a binary shape prior which fails for the optimization step. For the correct result in the middle image a signed distance representation of the shape term was used.

Fig. 6. The position reﬁnement is robust against partial occlusion

user interactivity with the computationally intensive variational method and get a segmentation result at interactive rates. The performance depends mainly on the size of the search region in the parameter space and on the size of the shape prior. As an example we achieve 80 frames per second on an NVIDIA GeForce GTX 280 for pure segmentation and 20 frames per second including the position optimization. In our framework there are two ways to provide a shape prior for the segmentation. The user can define a shape prior directly by segmenting a structure using the pure GAC energy with foreground and background seeds. This is especially useful when recurring structures have to be segmented. Alternatively, for more accurate results, especially on difficult data, we can load a predefined shape structure that is used for the segmentation method. In Fig. 1 a predefined prior is used for segmenting finger bones, and the segmentation result is compared with a simple intensity thresholding and a segmentation with the pure GAC energy. The shape prior segmentation is identical to the reference data labeled by an expert. A more difficult example is shown in Fig. 4, which shows a segmentation of a single vertebra in an X-ray image of the spine. Due to the very bad contrast, simple thresholding would obviously fail completely, and the pure GAC energy would require a very large amount of seed information.


Fig. 7. Real-time tracking of an espresso cup in a live camera stream

Fig. 8. Multiple vertebrae of the cervical spine are tracked separately through a sequence. The intention is to ascertain the movement of the vertebrae relative to each other.

Fig. 9. Segmentation of bottles with a single shape prior

Examples of automated shape alignment, in which the transformation parameters are optimized in a local neighborhood, are shown in Figs. 5–8. Fig. 5 shows the labeled bone dataset again. The overlay with the reference data shows that the position optimization leads to the correct segmentation. Examples for non-medical image data are presented in Figs. 6 and 7. The first one shows that the proposed method is robust against partial occlusion. Furthermore, the method can be used for tracking a structure over a sequence of frames. Fig. 7 shows a sequence of an espresso cup that was processed directly from a camera stream with on-the-fly segmentation and position optimization. For a restricted domain of movement we achieve real-time performance when tracking a sequence. In Fig. 8 we track parts of the cervical spine in a moving X-ray image series to obtain the path of movement of the vertebrae relative to each other during a flexion. The predefined shapes of the four vertebrae are initialized in the first frame and then tracked automatically over the complete flexion. This can be used to determine shapes of implants for intervertebral discs. In Fig. 9 multiple objects are segmented with the help of a single shape prior: a prior in the form of a bottle is defined and placed roughly on each bottle in the image. The fine adjustments are done automatically.

5 Conclusion

In this paper, we proposed a globally optimal shape prior segmentation method with additional user interaction and automated position refinement. With this approach we can handle very different images and obtain robust segmentation results. Segmentation of difficult data, such as the low-contrast spine image, benefits in particular from the additional shape information. A great advantage of variational methods like this one is their capacity for parallelization: they profit from modern graphics hardware, which boosts the performance of such highly parallel algorithms.


A Nonlinear Probabilistic Curvature Motion Filter for Positron Emission Tomography Images

Musa Alrefaya, Hichem Sahli, Iris Vanhamel, and Dinh Nho Hao

Vrije Universiteit Brussel, Dept. Electronics and Informatics ETRO-IRIS, Pleinlaan 2, B-1050 Brussels, Belgium
{malrefay,hsahli,iuvanham}@etro.vub.ac.be
http://www.etro.vub.ac.be

Abstract. Positron Emission Tomography (PET) is an important nuclear medicine imaging technique which enhances the effectiveness of diagnosing many diseases. The raw projection data, i.e. the sinogram, from which the PET image is reconstructed, contains a very high level of Poisson noise. The latter complicates the interpretation of the PET image, which may lead to erroneous diagnoses. Suitable denoising techniques prior to reconstruction can significantly alleviate the problem. In this paper, we propose filtering the sinogram with a constrained curvature motion diffusion for which we compute the edge stopping function in terms of edge probability under the assumption of contamination by Poisson noise. We demonstrate through simulations with images contaminated by Poisson noise that the performance of the proposed method substantially surpasses that of recently published methods, both visually and in terms of statistical measures.

1 Introduction

Positron Emission Tomography (PET) is an in vivo nuclear medicine imaging method that provides functional information about body tissues. The PET image results from reconstructing very noisy, low-resolution raw data, i.e. the sinogram, in which important features appear as curved structures. Enhancing the PET image has spurred a wide range of denoising models and algorithms. Some methodologies focus on enhancing the reconstructed PET image directly, whereas others prefer enhancing the sinogram prior to reconstruction. Existing methods may suffer from drawbacks such as the careful selection of a large number of parameters, smoothing of the boundaries of important features, or prohibitive computation. Recently, nonlinear diffusion techniques have been investigated for PET images. Many researchers have explored the application of the well-known Perona and Malik anisotropic diffusion [15], in combination with diverse diffusivity functions, to PET images [2, 4, 5, 14, 26] as well as to sinograms [6, 25]. The main drawback of this filter with respect to the Poisson noise that characterizes this type of image is that the diffusion produces important oscillations in the gradient, which finally leads to a poorly smoothed image [28, 29]. Moreover, the adopted diffusivity functions do not consider the special properties of the sinogram, in which the preservation of the curve-shaped features is paramount (see Figure 1). In [28], mean curvature motion and Gaussian curvature motion of PET images have been investigated. A total variation (TV) scheme for smoothing PET images was also discussed in [28]. Happonen et al. [9] propose filtering the sinogram in the stackgram domain, where the signal along the sinusoidal trajectories of the sinogram can be filtered separately. They used and compared Gaussian and nonlinear filters.

Filtering the sinogram has the advantage that the noise distribution is known, which is not the case after reconstruction. Consequently, this work proposes filtering the sinogram by means of a curvature constrained filter in which the amount of diffusion is modulated according to a probabilistic diffusivity function that suits images contaminated with Poisson noise. In addition, the proposed method is compared with the TV-based methodologies proposed in [1] and [12]. For this purpose, we use a simulated thorax PET phantom to which varying levels of Poisson noise have been added. To evaluate the filtering approaches, contrast-noise curves (contrast versus background noise at different iteration numbers) were generated for each of them.

The remainder of the paper is organized as follows. Section 2 briefly reviews the notions of curvature motion, edge affected diffusion filtering, and self-snakes. The proposed filtering scheme is introduced in Sect. 3. Section 4.1 introduces the applied validation methods, and the remainder of Sect. 4 discusses the experimental results. Conclusions and future work are given in Sect. 5.

The authors sincerely wish to express their thanks to Prof. M. Defrise, Division of Nuclear Medicine at AZ-VUB, for his discussions and feedback. The comparison with the TV-Nesterov scheme would not have been possible without the help of Dr. Pierre Weiss from INRIA, France, who provided us with the Matlab code.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 212–223, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 Geometry Driven Scale-Space Filtering

This section reviews the formulations for mean curvature motion (MCM), Edge Affected Variable Conductance Diffusion (EA-VCD), and self-snakes. Let $f$ be a scalar image defined on the spatial image domain $\Omega$; then the family of diffused versions of $f$ is given by

\[ U(f) : f(\cdot) \to u(\cdot, t) \quad \text{with} \quad u(\cdot, 0) = f(\cdot) \tag{1} \]

where $U$ is referred to as the scale-space filter, $u$ is called the scale-space image, and $t \in \mathbb{R}^+$ is the scale [23, 26]. The denoised or enhanced version of $f$ is the $u(\cdot, t)$ that is closest to the unknown noise-free version of $f$.

2.1 Curvature Motion

One way of introducing smoothness in a curve is to let it evolve under its Euclidean curvature $k$. Mean curvature motion (MCM) is considered the standard curvature evolution. MCM allows diffusion solely along the level lines. In gauge coordinates the corresponding PDE formulation is

\[ u_t(\cdot, t) = u_{vv} = k\,|\nabla u| = \operatorname{div}\!\left( \frac{\nabla u}{|\nabla u|} \right) |\nabla u| \tag{2} \]

Hence diffusion occurs solely along the v-axis.
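
As an illustration of (2), the following minimal sketch performs one explicit Euler step of mean curvature motion on a small image stored as nested lists. It assumes unit grid spacing, central differences, and the Cartesian expression of $u_{vv}$, namely $(u_{xx} u_y^2 - 2 u_x u_y u_{xy} + u_{yy} u_x^2)/(u_x^2 + u_y^2)$. The function name `mcm_step`, the time step `dt`, the regularizer `eps`, and the choice to leave border pixels untouched are assumptions made for this example, not the discretization used by the authors.

```python
def mcm_step(u, dt=0.1, eps=1e-12):
    """One explicit Euler step of u_t = u_vv on the interior of a 2-D grid.

    u is a list of rows (u[y][x]); border pixels are copied unchanged,
    which is a simplification made for this sketch.
    """
    h, w = len(u), len(u[0])
    out = [row[:] for row in u]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central differences with unit grid spacing.
            ux = (u[y][x + 1] - u[y][x - 1]) / 2.0
            uy = (u[y + 1][x] - u[y - 1][x]) / 2.0
            uxx = u[y][x + 1] - 2.0 * u[y][x] + u[y][x - 1]
            uyy = u[y + 1][x] - 2.0 * u[y][x] + u[y - 1][x]
            uxy = (u[y + 1][x + 1] - u[y + 1][x - 1]
                   - u[y - 1][x + 1] + u[y - 1][x - 1]) / 4.0
            grad2 = ux * ux + uy * uy
            # Second derivative along the level line (v direction).
            uvv = (uxx * uy * uy - 2.0 * ux * uy * uxy + uyy * ux * ux) / (grad2 + eps)
            out[y][x] = u[y][x] + dt * uvv
    return out


# A linear ramp has straight level lines, so MCM leaves it unchanged.
ramp = [[float(x + 2 * y) for x in range(5)] for y in range(5)]
ramp_after = mcm_step(ramp)

# A paraboloid has circular level lines; away from its centre u_vv = 2.
bowl = [[float((x - 2) ** 2 + (y - 2) ** 2) for x in range(5)] for y in range(5)]
bowl_after = mcm_step(bowl, dt=0.1)
```

In this toy setting the ramp is a fixed point of the step, while the paraboloid grows by roughly `2 * dt` at interior pixels off the centre, which is the expected shrinking of circular level lines.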

2.2 Edge Affected Variable Conductance Diffusion

Variable Conductance Diffusion (VCD) filtering is based on diffusion with a variable conduction coefficient that controls the rate of diffusion [23]. In the case of Edge Affected VCD (EA-VCD), the conductance coefficient is inversely proportional to the edgeness. Consequently it is commonly referred to as the edge stopping function $g$, in which the edgeness is typically measured by the gradient magnitude. The EA-VCD is governed by

\[ u_t = \operatorname{div}\left[ g(|\nabla u|)\, \nabla u \right] \tag{3} \]

The above PDE, together with the initial condition given in (1), is completed with a homogeneous Neumann boundary condition on the boundary of the image domain. Note that the anisotropic diffusion of Perona and Malik [15] is an EA-VCD.
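
The behaviour of (3) under a homogeneous Neumann boundary condition can be sketched in one dimension. The example below is a minimal illustration rather than the authors' implementation: it assumes unit grid spacing, an explicit Euler step, and the classical Perona-Malik diffusivity $g(s) = 1/(1 + (s/K)^2)$ as a stand-in edge stopping function. The names `ea_vcd_step_1d`, `K` and `dt`, and the test signal, are hypothetical.

```python
def perona_malik_g(s, K=1.0):
    """Classical Perona-Malik edge stopping function, used here as a stand-in."""
    return 1.0 / (1.0 + (s / K) ** 2)


def ea_vcd_step_1d(u, dt=0.2, K=1.0):
    """One explicit step of u_t = d/dx [ g(|u_x|) u_x ] with zero flux at both ends."""
    n = len(u)
    # flux[i] is the flux between samples i and i + 1.
    flux = []
    for i in range(n - 1):
        du = u[i + 1] - u[i]
        flux.append(perona_malik_g(abs(du), K) * du)
    out = []
    for i in range(n):
        right = flux[i] if i < n - 1 else 0.0   # Neumann: no flux through the border
        left = flux[i - 1] if i > 0 else 0.0
        out.append(u[i] + dt * (right - left))
    return out


signal = [0.0, 0.1, -0.1, 0.0, 5.0, 5.1, 4.9, 5.0]  # noisy step edge (made-up data)
smoothed = signal
for _ in range(20):
    smoothed = ea_vcd_step_1d(smoothed)
```

Because every interior flux is added to one sample and subtracted from its neighbour, the sum of the samples is unchanged by the step. With `dt = 0.2` and $g \le 1$ each new sample is a convex combination of its neighbours, so the values also stay within the range of the input.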

2.3 Self-Snakes

Self-snakes are a variant of MCM in which an edge-stopping function is introduced [19]. The main goal is to prevent further shrinking of the level lines once they have reached the important image edges. For scalar images, self-snakes are governed by

\[ u_t = |\nabla u| \operatorname{div}\!\left( g(|\nabla u|)\, \frac{\nabla u}{|\nabla u|} \right) \tag{4} \]

This equation adopts the same boundary condition as (3). Furthermore, it can be decomposed into two parts [19, 23]:

\[ u_t = g\,|\nabla u| \operatorname{div}\!\left( \frac{\nabla u}{|\nabla u|} \right) + \nabla g \cdot \nabla u = g\,k\,|\nabla u| + \nabla g \cdot \nabla u \tag{5} \]

The first part describes a degenerate forward diffusion along the level lines, i.e. orthogonal to the local gradient; it allows preserving the edges. Additionally, the diffusion is limited in areas with high gradient magnitude and encouraged in smooth areas. In fact, the first term is the constrained curvature motion. The second term can be viewed as a shock filter since it pushes the level lines towards valleys of high gradient, acting as Osher's shock filter [18].
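
The step from (4) to (5) follows from the product rule for the divergence. A short derivation, using only the symbols introduced above and writing $g$ for $g(|\nabla u|)$, reads:

```latex
\begin{align*}
|\nabla u|\,\operatorname{div}\!\left( g\,\frac{\nabla u}{|\nabla u|} \right)
  &= |\nabla u| \left[ g\,\operatorname{div}\!\left( \frac{\nabla u}{|\nabla u|} \right)
     + \nabla g \cdot \frac{\nabla u}{|\nabla u|} \right] \\
  &= g\,|\nabla u|\,\operatorname{div}\!\left( \frac{\nabla u}{|\nabla u|} \right)
     + \nabla g \cdot \nabla u \\
  &= g\,k\,|\nabla u| + \nabla g \cdot \nabla u .
\end{align*}
```

The last line uses $k = \operatorname{div}(\nabla u / |\nabla u|)$, as in (2).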

3 The Probabilistic Curvature Motion Filter

Based on (i) the curvature motion method and (ii) a probabilistic diffusivity function that we presented in an earlier work [16], this section introduces the proposed filtering schemes for PET sinograms, considering the following characteristics:

1. The important features in the sinogram are curved structures with high contrast values. These represent the regions of interest in the reconstructed PET image, e.g. a tumor.
2. The weak edges in the sinogram are the edges that have low contrast values, in other words, edges with small $|u_{ww}|$.
3. The noise in the sinogram is a priori identified as Poisson noise.

The schemes presented above, namely MCM (2), EA-VCD (3) and the self-snakes (5), can be derived from the following general equation:

\[ u_t = g_1(|\nabla u|)\, u_{vv} + g_2(|\nabla u|)\, u_{ww} \tag{6} \]

where the second-order gauge derivatives of the image in the (vv) and (ww) directions are given by

\[ u_{vv} = \frac{u_{xx} u_y^2 - 2 u_x u_y u_{xy} + u_{yy} u_x^2}{u_x^2 + u_y^2}, \qquad u_{ww} = \frac{u_{xx} u_x^2 + 2 u_x u_y u_{xy} + u_{yy} u_y^2}{u_x^2 + u_y^2} \tag{7} \]

Equation (6) comprises a diffusion modulated by $g_1$ along the image edges (vv) (a smoothing term) and a diffusion adjustable by $g_2$ across the image edges (ww) (a sharpening term). Careful modeling of these terms allows efficient denoising of the PET sinograms whilst keeping their interesting features. In the following sections, we propose the use of a probabilistic diffusivity function and derive two diffusion schemes, for which we applied the numerical approximation of the gauge derivatives described in [23].
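
To make the role of $g_1$ and $g_2$ in (6) concrete, the reviewed schemes can be read off as particular choices of the two weights. The summary below is a sketch offered as a reading aid, not a statement taken from the paper. It relies on two standard facts: the gauge derivatives are second derivatives along two orthonormal directions and therefore sum to the Laplacian, and $\nabla g \cdot \nabla u = g'(|\nabla u|)\,|\nabla u|\,u_{ww}$, where $g'$ denotes the derivative of $g$ with respect to its argument.

```latex
\begin{align*}
u_{vv} + u_{ww} &= u_{xx} + u_{yy} = \Delta u, \\
\text{MCM (2):} \quad g_1 &= 1, & g_2 &= 0, \\
\text{self-snakes (5):} \quad g_1 &= g, & g_2 &= g'(|\nabla u|)\,|\nabla u|, \\
\text{EA-VCD (3):} \quad g_1 &= g, & g_2 &= g + g'(|\nabla u|)\,|\nabla u|.
\end{align*}
```

The EA-VCD row follows from $\operatorname{div}(g \nabla u) = g\,\Delta u + \nabla g \cdot \nabla u$.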

3.1 The Probabilistic Diffusivity Function

The main idea of the probabilistic diffusivity function [16] is to express the diffusivity as the probability that the observed gradient presents no edge of interest, under a suitable marginal prior distribution for the noise-free gradient histogram. The diffusivity function was defined as

\[ g_{pr}(x) = A\,\bigl(1 - P(H_1 \mid x)\bigr) \tag{8} \]

where the normalizing constant $A$ is set to $A = 1/(1 - P(H_1 \mid 0))$ to ensure that $g_{pr}(0) = 1$. The hypothesis $H_1$ states that an edge element of interest is present given the considered noise, and $H_0$ that an edge element of interest is absent. Formally,

\[ H_0 : y \le \sigma_n \quad \text{and} \quad H_1 : y > \sigma_n \tag{9} \]

with $y$ being the ideal, noise-free gradient magnitude and $\sigma_n$ the noise standard deviation in the observed gradient image. In [16] it has been demonstrated that

\[ g_{pr}(x) = \bigl(1 + \mu\,\eta(0)\bigr)\, \frac{1}{1 + \mu\,\eta(x)} \tag{10} \]

where $\mu = P(H_1)/P(H_0)$ is the prior odds and $\eta(x) = p(x \mid H_1)/p(x \mid H_0)$ is the likelihood ratio. Considering a Laplacian prior $p(y) = \frac{\lambda}{2} e^{-\lambda |y|}$, we have $\mu = (e^{\lambda \sigma_n} - 1)^{-1}$ [16], and the parameter $\lambda$ can be estimated as

\[ \lambda = \left[ 0.5\,(\sigma^2 - \sigma_n^2) \right]^{-1/2} \tag{11} \]

with $\sigma^2$ denoting the variance of the noisy image and $\sigma_n^2$ as defined above. Due to limited space, the reader is referred to [16] for the detailed expression of $\eta(x)$ in (10). The proposed diffusivity function (10) has no free parameters to optimize, and it fits well in the cluster of the reference backward-forward diffusivities. For the considered PET sinograms, the noise variance $\sigma_n^2$ in (11) is estimated as $\sigma_n^2 = \mathrm{Var}(u_{Ln})$, where the image noise $u_{Ln}$ is reconstructed from the coefficients of the two finest resolution levels of the wavelet decomposition of $u$, using the Daubechies(4) wavelet.
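
Equations (10) and (11) translate almost directly into code. The sketch below is a minimal illustration under stated assumptions: the likelihood ratio $\eta$ is passed in as a function because its expression is only given in [16], and `toy_eta` is a hypothetical placeholder that is NOT the expression from [16]. All function names and the numerical variances are made up for the example.

```python
import math


def estimate_lambda(var_noisy, var_noise):
    """Eq. (11): lambda = [0.5 * (sigma^2 - sigma_n^2)]^(-1/2)."""
    return (0.5 * (var_noisy - var_noise)) ** -0.5


def prior_odds(lam, sigma_n):
    """mu = P(H1)/P(H0) = 1 / (exp(lambda * sigma_n) - 1) for the Laplacian prior."""
    return 1.0 / (math.exp(lam * sigma_n) - 1.0)


def g_pr(x, mu, eta):
    """Eq. (10): probabilistic diffusivity, normalised so that g_pr(0) = 1."""
    return (1.0 + mu * eta(0.0)) / (1.0 + mu * eta(x))


def toy_eta(x):
    """Hypothetical increasing likelihood ratio (placeholder, not the one from [16])."""
    return math.exp(x)


var_noisy, var_noise = 9.0, 1.0               # made-up variances for the example
lam = estimate_lambda(var_noisy, var_noise)   # [0.5 * 8]^(-1/2) = 0.5
mu = prior_odds(lam, math.sqrt(var_noise))
values = [g_pr(x, mu, toy_eta) for x in (0.0, 1.0, 2.0, 4.0)]
```

With any likelihood ratio that increases with the gradient magnitude, `g_pr` starts at 1 for a zero gradient and decreases towards 0, which is the qualitative behaviour expected of an edge stopping function.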

3.2 Probabilistic Constraint Curvature Motion

For the probabilistic constraint curvature motion (PCCM), we start from a constrained version of mean curvature motion: the diffusion across the level lines is prohibited, whilst the diffusion along the level lines is controlled via the probabilistic diffusivity function (10):

\[ u_t = g_{pr}(|\nabla u|)\, u_{vv} = g_{pr}(|\nabla u|)\, k\, |\nabla u| \tag{12} \]

Thus the function $g_1$ in (6) is chosen to be $g_1 \doteq g_{pr}$, for dealing with Poisson noise. This filter effectively smooths the image and preserves the edges of important features such as lines, curves and flow-like structures. By its nature, the PCCM cannot enhance the weak edges and/or features in the sinogram. The second term in (6) allows the sharpening. Consequently, we set $g_2 \doteq g_1$. In this way, weak but important edges are enhanced whilst the noise is removed efficiently. Formally, the enhanced PCCM (ePCCM) is given by

\[ u_t = g_{pr}(|\nabla u|)\, u_{vv} + g_{pr}(|\nabla u|)\, u_{ww} \tag{13} \]

3.3 Probabilistic Self Snakes (PSS)

It can be demonstrated that the diffusion of scalar images via EA-VCD can be decomposed in the same manner as (5); moreover, it can be rewritten as [7, 23]:

\[ u_t = g(|\nabla u|)\, u_{vv} + \left[ g(|\nabla u|) + g'(|\nabla u|)\, |\nabla u| \right] u_{ww} \tag{14} \]

consolidating the properties of both the self-snakes and the EA-VCD into a single diffusion scheme. Considering equation (6) and the proposed probabilistic diffusivity function, we have $g_1(x) \doteq g_{pr}(x)$ and the sharpening term $g_2(x) \doteq g_{pr}(x) + x\, g'_{pr}(x)$. This filter proves to be very effective and flexible for the sinogram image, where the high-contrast regions, which represent a tumor in the reconstructed PET image, should be smoothed carefully without blurring the weak edges. Like EA-VCD, the main advantage of this filter is that the average gray value of the image is not altered during the diffusion process, which is a significant issue in the sinogram.
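
The across-edge weight $g(x) + x\,g'(x)$ is the derivative of the flux $x\,g(x)$, so it changes sign where the flux stops increasing: diffusion across edges is forward for small gradients and backward (edge enhancing) for large ones. The sketch below illustrates this with a central-difference derivative. Since the expression of $\eta(x)$, and hence of $g_{pr}$, is only given in [16], the classical Perona-Malik diffusivity is substituted as a stand-in; `g2_weight`, `K` and the step `h` are assumptions made for this example.

```python
def perona_malik_g(x, K=1.0):
    """Stand-in diffusivity; the paper's g_pr would be plugged in here instead."""
    return 1.0 / (1.0 + (x / K) ** 2)


def g2_weight(g, x, h=1e-5):
    """Across-edge weight g(x) + x * g'(x), with g' from a central difference."""
    dg = (g(x + h) - g(x - h)) / (2.0 * h)
    return g(x) + x * dg


weak_edge = g2_weight(perona_malik_g, 0.5)    # positive: forward diffusion (smoothing)
strong_edge = g2_weight(perona_malik_g, 2.0)  # negative: backward diffusion (sharpening)
```

For this stand-in with `K = 1` the weight has the closed form $(1 - x^2)/(1 + x^2)^2$, so the sign change occurs at $x = K$. Where the change occurs for $g_{pr}$ depends on the likelihood ratio from [16].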

4 Experiments and Discussion

4.1 Introduction

The goal of the conducted experiments is to measure the performance of the proposed filtering methods and to study their influence on two commonly used PET reconstruction methods: filtered back projection (FBP) [11] and iterative ordered subset expectation maximization (OSEM) [8]. The FBP algorithm is computationally efficient, while the OSEM algorithm can easily incorporate prior information on the image to improve image quality. In our experiments, the reconstruction parameters of these algorithms are set as follows: the Hamming parameter in the FBP method is 0.5, while for OSEM we use 16 subsets and run it for 4 iterations. In the following, the reconstructed PET images are denoted by $U_F(t) = \mathrm{FBP}(u_t)$ and $U_O(t) = \mathrm{OSEM}(u_t)$, respectively, for a given enhanced sinogram $u_t$. A simulated thorax PET phantom containing three regions of interest (tumors) was constructed, to which varying levels of Poisson noise were added for the evaluation. 50 realizations (noisy sinograms) with an added noise level of $1 \times 10^6$ coincident events have been generated. Each sinogram has a size of 256×256 pixels with a spacing of 2×2 mm/pixel. Figure 1(a) shows the ideal noise-free sinogram, with the PET images obtained via FBP (Fig. 1(c)) and OSEM (Fig. 1(d)) reconstruction. A corresponding noise-contaminated realization is shown in Fig. 1(b),(e)-(f). The proposed PCCM and PSS diffusion schemes have been assessed and evaluated against recent Total Variation (TV) denoising techniques, namely the approach of Chambolle [1], denoted hereafter as TV-C, and the Nesterov [12] algorithm, denoted as TV-N.

4.2 Quantitative Evaluation Measures

Two types of evaluation measures are adopted. The ﬁrst set stems from measuring the quality of the ﬁltering techniques whilst the second set originates from validating the quality of the PET reconstruction. As ground-truth information, the former uses the noise-free image, whilst the latter needs prior identiﬁcation of the important areas by a medical professional.

Fig. 1. (a) Original simulated sinogram and its reconstructed PET image using (b) FBP and (c) OSEM reconstruction. (d) An example of one realization of a noisy sinogram and (e-f) the corresponding reconstructed PET images. The tumors (ROIs) are the 3 clearly visible white spots.

Denoising Quality. The idea is to verify the quality of the denoised sinogram $u_t$ with respect to the noise-free image $I$. In this work, we adopt the following measures [22]:

DQ1: The Peak Signal to Noise Ratio (PSNR) is a statistical measure of error, used to determine the quality of the filtered images. It represents the ratio of the signal power to the power of the corrupting noise. The higher the PSNR, the better the quality.

\[ \mathrm{PSNR}(t) = 10 \log_{10} \frac{\mathrm{Card}(\Omega)}{\sum_{p \in \Omega} |I(p) - u_t(p)|} \tag{15} \]

DQ2: The correlation ($C_{m\rho}$) between the noise-free and the filtered image. The higher this correlation, the better the quality.

\[ C_{m\rho}(t) = \rho\,[I, u_t] \tag{16} \]

DQ3: The calculated variance of the noise (NV) describes the remaining noise level. Therefore, it should be as small as possible.

\[ \mathrm{NV}(t) = \mathrm{Var}\,(|I - u_t|) \tag{17} \]

In this work, we are interested in comparing the optimum of each measure for the different filtering approaches, which yields the best obtainable result per measure.
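
The three measures can be sketched for images flattened to lists of pixel values. This is a minimal illustration under stated assumptions: DQ1 is implemented exactly as printed in (15), i.e. with the sum of absolute differences and without a peak value (the conventional PSNR uses squared errors), $\rho$ in (16) is taken to be the Pearson correlation coefficient, and Var in (17) is taken to be the population variance. The function names and the toy data are hypothetical.

```python
import math
import statistics


def psnr_as_printed(ref, img):
    """Eq. (15) as printed: 10 * log10( Card(Omega) / sum |I(p) - u_t(p)| )."""
    total_abs_err = sum(abs(a - b) for a, b in zip(ref, img))
    return 10.0 * math.log10(len(ref) / total_abs_err)


def correlation(ref, img):
    """Eq. (16), assuming rho is the Pearson correlation coefficient."""
    mean_ref = sum(ref) / len(ref)
    mean_img = sum(img) / len(img)
    cov = sum((a - mean_ref) * (b - mean_img) for a, b in zip(ref, img))
    norm = math.sqrt(sum((a - mean_ref) ** 2 for a in ref)
                     * sum((b - mean_img) ** 2 for b in img))
    return cov / norm


def noise_variance(ref, img):
    """Eq. (17): variance of the absolute residual |I - u_t|."""
    return statistics.pvariance([abs(a - b) for a, b in zip(ref, img)])


noise_free = [0.0, 1.0, 2.0, 3.0]   # made-up reference values
filtered = [0.5, 1.0, 2.0, 2.5]     # made-up filtered values
scores = (psnr_as_printed(noise_free, filtered),
          correlation(noise_free, filtered),
          noise_variance(noise_free, filtered))
```

Note that `psnr_as_printed` is undefined for a perfect reconstruction (zero total error), so a practical implementation would need to guard that case.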


The Contrast Recovery Curve. To evaluate the filtering on the reconstructed PET images, the filtered data, at discrete scales for the proposed PDE approaches and at different regularization parameter values for the TV approaches, were reconstructed using the FBP and the OSEM approaches. With the reconstructed data sets we determine the contrast recovery curve using a set of regions of interest that were identified by a medical professional; in our case, these are the 3 white spots that represent tumors in Fig. 1. This was accomplished by quantifying a contrast gain, $cg$, and a coefficient of variation, $\mathrm{Var}_{cg}$. We calculate the contrast gain $cg_i$ for each realization $i \in [1, N = 50]$ and its overall variance. Let $R = \{r_1, r_2, \ldots, r_n\}$ be the set of identified ROIs ($n = 3$ in our case) and $B$ a representative background tissue area. Then

\[ cg_i(t) = \frac{1}{n} \sum_{r \in R} \left[ \frac{1}{\mathrm{Card}(r)} \sum_{p \in r} U^{(i)}(p, t) - \frac{1}{\mathrm{Card}(B)} \sum_{p \in B} U^{(i)}(p, t) \right], \qquad \mathrm{Var}_{cg}(t) = \frac{1}{N} \sum_{i=1}^{N} \Bigl( cg_i(t) - \frac{1}{N} \sum_{j=1}^{N} cg_j(t) \Bigr)^{2} \tag{18} \]

where $p$ is a pixel. The $\mathrm{Var}_{cg}$ versus $cg$ plot provides a straightforward evaluation method for the contrast-noise tradeoff [3]. The best quality PET reconstruction is situated in the upper (high contrast gain) left (high stability) area of the plot.
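
The two quantities in (18) can be sketched as follows. This is a minimal illustration, not the authors' code: images are nested lists indexed as `image[y][x]`, each ROI and the background are given as lists of `(y, x)` coordinates, and all names and the toy data are hypothetical.

```python
def region_mean(image, region):
    """Mean of image[y][x] over a list of (y, x) pixel coordinates."""
    pixels = list(region)
    return sum(image[y][x] for y, x in pixels) / len(pixels)


def contrast_gain(image, rois, background):
    """cg_i of Eq. (18): ROI mean minus background mean, averaged over the ROIs."""
    bg = region_mean(image, background)
    return sum(region_mean(image, r) - bg for r in rois) / len(rois)


def contrast_gain_variance(gains):
    """Var_cg of Eq. (18): population variance of cg_i over the N realizations."""
    mean_gain = sum(gains) / len(gains)
    return sum((g - mean_gain) ** 2 for g in gains) / len(gains)


# Two made-up 3x3 "reconstructions" with a single one-pixel ROI each.
recon_a = [[1.0, 1.0, 1.0], [1.0, 3.0, 1.0], [1.0, 1.0, 1.0]]
recon_b = [[1.0, 1.0, 1.0], [1.0, 4.0, 1.0], [1.0, 1.0, 1.0]]
rois = [[(1, 1)]]
background = [(0, 0), (0, 1), (0, 2)]

gains = [contrast_gain(img, rois, background) for img in (recon_a, recon_b)]
variance = contrast_gain_variance(gains)
```

Evaluating both functions for every scale $t$ (or regularization parameter) gives one point of the contrast recovery curve per setting.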

4.3 Evaluation

A fundamental issue with scale-spaces induced by diffusion processes, such as the ones proposed in this paper, is the automatic selection of the most salient scale. For our PET sinogram denoising application, we use a previously proposed optimal scale selection approach [22], in which the maximum correlation method has been adopted:

\[ t_{opt} = \arg\max_t\, C_{m\rho}(t) = \arg\max_t \left( \frac{\sigma[n_o(t_0)]}{\sigma[u_{t_0}]}\, \sigma[u_t] + \sigma[n_o(t)] \right) \tag{19} \]

where $n_o$ is the so-called outlier noise, which we estimated using the proposed wavelet-based noise estimation. Note that $t_0$ is the zeroth scale; thus $u_{t_0} = f$, and $n_o(t_0)$ represents the initial amount of noise. Figures 2(a) and 2(d) illustrate the result at the optimal scale using the PSS approach and the TV-C result, respectively. Table 1 lists the quantitative results comparing the different denoising approaches. The best performing filtering method per measure is displayed in bold. As can be seen, the best filtering performance is achieved with the PSS. Furthermore, we notice that for all the measures used, the proposed diffusion methods outperform the considered TV-based filters. Figure 3 depicts the contrast recovery curves for the investigated filtering methods. Recall that the best enhancement is obtained when the contrast gain is as high as possible whilst its variance remains as small as possible. Furthermore, the degree of smoothness of the curve indicates the stability and biasing level.

Table 1. Denoising quality measures vs. filtering approaches

              f         PCCM      ePCCM     PSS       TV-C      TV-N
PSNR(t_opt)   -17.510   -4.1000   -4.1000   -3.9900   -5.7200   -5.1100
NV(t_opt)     7.5100    1.6000    1.6000    1.5500    1.9300    1.8000
Cmρ(t_opt)    0.8500    0.9915    0.9916    0.9917    0.9876    0.9890

Fig. 2. Enhanced sinogram and reconstructed PET images. PSS-based approach first row, TV-based approach second row.

Fig. 3. The contrast-noise curve (contrast gain versus variance, ×10⁻³) for PCCM, ePCCM, PSS, TV-N and TV-C, each combined with FBP and OSEM reconstruction. The best performance occurs in the area where the contrast gain is high and its variance is low.


We may observe that all investigated methods have a good performance. However, the proposed PSS yields the best performance on the given data set.

5 Conclusions

Experiments show that combining the probabilistic diffusivity function with the curvature motion diffusion produces a powerful nonlinear filtering method that is appropriate for PET sinograms. It preserves the boundaries of the curve-shaped features and carefully smooths the regions of interest as well as the other regions. Our findings show that the PCCM method smooths the PET images and keeps the boundaries of the important features, although weak edges vanish in some cases. The ePCCM method overcomes this problem, and the contrast in the ROIs is recovered better thanks to the enhancing term in the filter. This filter gives a well-smoothed image, preserves the edges, and gains the advantages of both the curvature motion diffusion and the shock filter. The PSS approach deals better with the weak and discontinuous edges that are common in PET images. The probabilistic diffusivity function has proven to be an effective and suitable tool for controlling the diffusion process in the proposed schemes. The results, as shown in the contrast-noise curves, demonstrate that this function has a great capability to detect and enhance the edges of important features in very noisy sinogram images. Moreover, the proposed diffusivity function has no free parameters to optimize: all parameters are image-based and automatically estimated, and they gave the best results in our experiments.

References 1. Chambolle, A.: An Algorithm for Total Variation Minimization and Applications. JMIV 20, 89–97 (2004) 2. Chan, T., Li, H., Lysaker, M., Tai, X.C.: Level Set Method for Positron Emission Tomography. International Journal of Biomedical Imaging 2007 (2007) 3. Comtat, C., Kinahan, P.E., Fessler, J., Beyer, T., Townsend, D.W., Defrise, M., Michel, C.: Clinically feasible reconstruction of 3D whole-body PET/CT data using blurred anatomical labels. Phys. Med. Biol. 47, 1–20 (2002) 4. Demirkaya, O.: Diﬀusion Filtering of Functional Images using the structural information available in Hyprid imaging modalities. In: IEEE Medical Imaging Symposium, Germany (2008) 5. Demirkaya, O.: Post-reconstruction ﬁltering of positron emission tomography whole-body emission images and attenuation maps using nonlinear diﬀusion ﬁltering. Acad. Radiol. 11, 1105–1114 (2004) 6. Demirkaya, O.: Anisotropic diﬀusion ﬁltering of PET attenuation data to improve emission images. Physics in Medicine Biology 47(20), 271–278 (2002) 7. Didas, S., Weickert, J.: Combining Curvature Motion and Edge-Preserving Denoising. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 568–579. Springer, Heidelberg (2007)

222

M. Alrefaya et al.

8. Hudson, M., Larkin, R.: Accelerated image reconstruction using ordered subsets of projection data. IEEE Trans. Med. Imag. 13(4), 601–609 (1994) 9. Happonen, A.P., Koskinen, M.O.: Experimental Investigation of Angular Stackgram Filtering for Noise Reduction of SPECT Projection Data: Study with Linear and Nonlinear Filters. International Journal of Biomedical Imaging 2007 (2007) 10. Jonsson, E., Huang, S.C., Chan, T.: Total Variation Regularization in Positron Emission Tomography. UCLA, Tech. Rep. no. 48 (1998) 11. Kak, C.A., Slaney, M.: Principles of Computerized Tomographic Imaging. IEEE Press, Los Alamitos (1999) 12. Nesterov, Y.: Smooth minimization of non-smooth functions. Mathematic Programming, Series A 103, 127–152 (2005) 13. Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: algorithms based on Hamilton Jacobi formulations. Journal of Computational Physics 79(1), 12–49 (1988) 14. Padﬁeld, D.R., Manjeshwar, R.: Adaptive conductance ﬁltering for spatially varying noise in PET images. Progress in biomedical optics and imaging 7(3) no. 30 (2006) 15. Perona, P., Malik, J.: Scale space and edge detection using anisotropic diﬀusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 12, 629–639 (1990) 16. Pizurica, A., Vanhamel, I., Sahli, H., Philips, W., Katartzis, A.: A Bayesian formulation of edge-stopping functions in non-linear diﬀusion. IEEE Signal Processing Letters 13(8), 501–504 (2006) 17. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992) 18. Rudin, L.I., Osher, S.: Feature-oriented image enhancement with shock ﬁlters. Technical Report, Department of Computer Science, California Institute of Technology (1989) 19. Sapiro, G.: Geometric partial diﬀerential equations and image analysis. University Press, Cambridge (2001) 20. Sumengen, B., Manjunath, B.S.: Edgeﬂow-driven Variational Image Segmentation: Theory and Performance Evaluation. 
Technical Report, Department of Electrical and Computer Engineering University of California, Santa Barbara (2006) 21. Turkheimer, F.E., Boussion, N., Anderson, A.N., Pavese, N., Piccini, P., Visvikis, D.: PET Image Denoising Using a Synergistic Multiresolution Analysis of Structural (MRI/CT) and Functional Datasets. The Journal of nuclear medicine 49, 657–666 (2008) 22. Vanhamel, I., Mihai, C., Sahli, H., Katartzis, A., Pratikakis, I.: Scale Selection for Compact Scale-Space Representation of Vector-Valued Images. International Journal of Computer Vision 4485 (2008) 23. Vanhamel, I.: Vector valued nonlinear diﬀusion and its application to image segmentation Ph.D. Thesis, Vrije Universiteit Brussel, Faculty of Engineering Sciences, Electronics and Informatics (ETRO) (2006) 24. Wang, Y., Zhou, H.: Total Variation Wavelet-Based Medical Image Denoising. International Journal of Biomedical Imaging 2006 (2006) 25. Wang, W.: Anisotropic Diﬀusion Filtering for Reconstruction of Poisson Noisy Sinograms. Journal of Communication and Computer 2(11), 16–23 (2005) 26. Weickert, J.: Anisotropic diﬀusion in image processing. ECMI Series. TeubnerVerlag, Stuttgart (1998)


27. Weiss, P., Aubert, G., Blanc-Féraud, L.: Efficient schemes for total variation minimization under constraints in image processing. Technical Report 6260, INRIA (2007) 28. Zhu, H., Shu, H., Zhou, J., Bao, X., Luo, L.: Bayesian algorithms for PET image reconstruction with mean curvature and Gauss curvature diffusion regularizations. Computers in Biology and Medicine 37(6), 793–804 (2007) 29. Zhu, H., Shu, H., Zhou, J., Toumoulin, C., Luo, L.: Image reconstruction for positron emission tomography using fuzzy nonlinear anisotropic diffusion penalty. Med. Biol. Eng. Comput. 44(11), 983–997 (2006)

Finsler Geometry on Higher Order Tensor Fields and Applications to High Angular Resolution Diffusion Imaging

Laura Astola and Luc Florack

Department of Mathematics and Computer Science, Eindhoven University of Technology, PO Box 513, NL-5600 MB Eindhoven, The Netherlands
[email protected]

Abstract. We study three-dimensional volumes of higher order tensors, using Finsler geometry. The application considered here is in medical image analysis, specifically High Angular Resolution Diffusion Imaging (HARDI) [1] of the brain. We want to find robust ways to reveal the architecture of the neural fibers in brain white matter. In Diffusion Tensor Imaging (DTI), the diffusion of water is modeled with a symmetric positive definite second order tensor, based on the assumption that there exists one dominant direction of fibers restricting the thermal motion of water molecules, leading naturally to a Riemannian framework. HARDI may potentially overcome the shortcomings of DTI by allowing multiple relevant directions, but invalidates the Riemannian approach. Instead, Finsler geometry provides the natural geometric generalization appropriate for multi-fiber analysis. In this paper we provide the exact criterion to determine whether a field of spherical functions has a Finsler structure. We also show a fiber tracking method in the Finsler setting. Our model also incorporates a scale parameter, which is beneficial in view of the noisy nature of the data. We demonstrate our methods on analytic as well as real HARDI data.

1 Introduction

High Angular Resolution Diffusion Imaging (HARDI) is a non-invasive medical imaging modality that measures the attenuation of the directional MRI (Magnetic Resonance Imaging) signal due to the diffusion of water molecules. Diffusion weighted measurements are taken in several directions, typically ranging from 50 to 130 (equidistant) angular directions. It is assumed that this diffusion of water molecules reveals relevant information about the underlying tissue architecture. The so-called apparent diffusion coefficient, D(g), is computed from the Stejskal-Tanner [2] formula

$$\frac{S(g)}{S_0} = \exp(-b\,D(g)), \qquad (1)$$

The Netherlands Organisation for Scientiﬁc Research (NWO) is gratefully acknowledged for ﬁnancial support.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 224–234, 2009. © Springer-Verlag Berlin Heidelberg 2009


where S(g) is the signal associated with gradient direction g, $S_0$ the signal obtained when no diffusion gradient is applied, and b is a parameter associated with the imaging protocol. In the Diffusion Tensor Imaging framework, (1) is interpreted as

$$\frac{S(g)}{S_0} = \exp(-b\,g^T D g), \qquad (2)$$

with the 3 × 3 two-tensor D describing the probability of directional diffusivity at each voxel. A natural way to do geometric analysis on the image is to use the inverse of the diffusion tensor D as the Riemann metric tensor [3]. This approach has been exploited to some extent in the DTI literature [4], [5], [6], [7]. Since HARDI data typically contain more directional measurements than traditional DTI, we study them as a metric space, but using a more refined model for directional information than can be accounted for by a local position-dependent inner product alone, i.e. a Riemannian metric. Higher order tensor representations [8], [9], [10], [11] of HARDI data are well suited to differential geometric methods. We mention that Finsler geometry has already been introduced in the HARDI setting. In the work of Melonakos et al. [12] the homogeneity condition is forced by normalizing the parameter vectors, but we take a different approach, using higher order monomial tensors and an ODE-based fiber tracking method. This paper is organized as follows. In section 2, we give a very short introduction to Finsler geometry, and in section 3 we show that HARDI measurements can indeed be modeled with a Finsler structure and give the specific condition which ensures this. In section 4 we discuss how to switch back and forth between iterative polynomial tensor fitting, which allows Laplace-Beltrami smoothing, and a monomial tensor fitting convenient for constructing a Finsler norm. In section 5 we show some results of fiber tracking based on the local Finsler metric and demonstrate it on an analytical example as well as on real HARDI data of a rat brain scan. In the appendix we give the details of the construction of the strong convexity criterion.
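As a small numerical illustration of Eqs. (1) and (2), the following sketch simulates DTI signals for a given tensor and then inverts the Stejskal-Tanner formula to recover the apparent diffusion coefficient. The function names and the numerical values of D, b and S_0 are hypothetical and chosen only for illustration; they are not taken from the data used in this paper.

```python
import numpy as np

def dti_signal(D, g, S0, b):
    """Forward model of Eq. (2): S(g) = S0 * exp(-b g^T D g), one unit direction per row of g."""
    g = np.asarray(g, dtype=float)
    quad = np.einsum("ni,ij,nj->n", g, D, g)  # g^T D g for every direction
    return S0 * np.exp(-b * quad)

def apparent_diffusion_coefficient(S, S0, b):
    """Invert Eq. (1): S(g)/S0 = exp(-b D(g))  =>  D(g) = -ln(S(g)/S0) / b."""
    S = np.asarray(S, dtype=float)
    return -np.log(S / S0) / b

# Hypothetical example values (units: mm^2/s for D, s/mm^2 for b).
D = np.diag([1.7e-3, 0.3e-3, 0.3e-3])
g = np.eye(3)                      # three orthogonal gradient directions
S0, b = 1000.0, 1000.0

S = dti_signal(D, g, S0, b)
adc = apparent_diffusion_coefficient(S, S0, b)
```

In this toy setting the recovered values in `adc` are the diagonal entries of D, since each gradient direction is a coordinate axis.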

2 Finsler Geometry

In a perfectly homogeneous and isotropic medium, geometry is Euclidean, and shortest paths are straight lines. In an inhomogeneous space, geometry is Riemannian and the shortest paths are geodesics. If a medium is not only inhomogeneous, but also anisotropic¹, i.e. has innate directional structure, the appropriate geometry is Finslerian [13], [14] and the shortest paths are correspondingly Finsler geodesics. As a consequence the metric tensor depends on both position and direction. This is also a natural model for high angular resolution diffusion images.

¹ We will call a medium isotropic if it is endowed with a direction independent inner product, or Riemannian metric. In the literature such a medium is also often referred to as anisotropic due to the directional bias of the metric itself.


Definition 1. We denote the bundle of tangent spaces $T_{(x,y)}M$ ($y \neq 0$) as $TM \setminus \{0\}$. A Finsler norm is a function $F : TM \to [0, \infty)$ that satisfies each of the following criteria:
1. Differentiability: F is $C^\infty$ on the tangent bundle $TM \setminus \{0\}$.
2. Homogeneity: $F(x, \lambda y) = \lambda F(x, y)$ for all $\lambda > 0$.
3. Strong convexity: The Hessian matrix, with components

$$g_{ij}(x, y) = \frac{1}{2} \frac{\partial^2 F^2(x, y)}{\partial y^i \partial y^j}, \qquad (3)$$

is positive definite at every point (x, y) of $TM \setminus \{0\}$.
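The Hessian in Eq. (3) can be approximated numerically, which gives a direct way to test strong convexity at a given direction. The following is a minimal sketch at a fixed position x, using central finite differences; the helper name `finsler_metric` and the quadratic test norm are assumptions made for illustration, not part of the method described in this paper.

```python
import numpy as np

def finsler_metric(F, y, h=1e-3):
    """Approximate g_ij(y) = 1/2 * d^2 F^2 / dy^i dy^j (Eq. (3)) by central differences."""
    y = np.asarray(y, dtype=float)
    n = y.size
    F2 = lambda v: F(v) ** 2
    g = np.empty((n, n))
    for i in range(n):
        ei = np.zeros(n)
        ei[i] = h
        for j in range(n):
            ej = np.zeros(n)
            ej[j] = h
            # Mixed second difference of F^2, divided by 4h^2, times the factor 1/2.
            g[i, j] = (F2(y + ei + ej) - F2(y + ei - ej)
                       - F2(y - ei + ej) + F2(y - ei - ej)) / (8.0 * h * h)
    return g

# Hypothetical Riemannian test case: F(y) = sqrt(y^T A y), for which g_ij = A_ij.
A = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 3.0]])
F = lambda y: np.sqrt(y @ A @ y)

y0 = np.array([0.3, -0.4, 0.8])
g = finsler_metric(F, y0)
is_strongly_convex = bool(np.all(np.linalg.eigvalsh(g) > 0))
```

For a Riemannian norm the result does not depend on y0; for a genuinely Finslerian norm the same call returns a different metric tensor for each direction.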

3 Finsler Norm on HARDI Higher Order Tensor Fields

We want to show that higher order tensors, such as those fitted to HARDI data, do define a Finsler norm, which can be used in the analysis of this data. We take as a point of departure a given orientation distribution function (ODF), which, if normalized, is a probability density function on the sphere and which can be computed from the data by using one of the methods described in the literature [15], [16], [17], [18], [19]. It models the probability that a given direction corresponds to a direction of a fiber. We use the heuristic that a high probability of finding a fiber in direction y corresponds to a larger diffusivity and at the same time to a shorter travel time from the point of view of the diffusing particle. Just as in the Riemannian framework, we can actually take our metric tensor to be the inverse of a local (y-dependent) two-tensor. We use the Einstein summation convention $a_i b^i = \sum_i a_i b^i$, and put

$$\hat{y} = (\hat{y}^1, \hat{y}^2, \hat{y}^3) = (\sin\theta \cos\varphi, \sin\theta \sin\varphi, \cos\theta), \qquad (4)$$

thus $\hat{y}$ denotes a unit vector while $y = \|y\|\,\hat{y}$ is a general vector in $\mathbb{R}^3$. We denote the higher order spherical tensor (a homogeneous polynomial restricted to the sphere) approximating the ODF as D. As an example, we show how a field of sixth order tensors D(x) defines a Finsler norm. This can be extended directly to all even order tensors. We put

$$F(x, y) = \left( D_{ijklmn}(x)\, y^i y^j y^k y^l y^m y^n \right)^{1/6}. \qquad (5)$$

In the following, we verify the defining criteria stated in Definition 1.
1. Differentiability: The tensor field D(x) is constructed by fitting a tensor to the set of angular samples at each voxel, using a least squares method. The data set with fixed angle is continuous in x by linear interpolation between the sample points and differentiable w.r.t. x using Gaussian derivatives. Therefore the tensor field itself is differentiable in x, and because D is always positive, differentiability of F w.r.t. x follows. The differentiability of F in y is obvious from Eq. (5).


Fig. 1. Left: A fourth order spherical harmonic (or tensor), representing the (not convexified) norm function and 3 ellipsoids illustrating the metric tensors corresponding to the 3 vectors with the same color. Right: Similarly a sixth order spherical harmonic function with 3 metric tensors.

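2. Homogeneity: Indeed for any $\alpha \in \mathbb{R}^+$, $x \in M$, $v \in T_xM$:

$$F(x, \alpha v) = \left( D_{ijklmn}(x)\, \alpha v^i\, \alpha v^j\, \alpha v^k\, \alpha v^l\, \alpha v^m\, \alpha v^n \right)^{1/6} = \alpha F(x, v). \qquad (6)$$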

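3. Strong convexity: We now state a strong convexity criterion for a general Finsler norm in $\mathbb{R}^3$, by analogy to the $\mathbb{R}^2$-criterion by Bao et al. [13]. We have put the derivation of the condition into the appendix, and merely state the result here. We consider the so-called indicatrix of the norm function F at any fixed x, which is the set $\{g \mid g : (\theta, \varphi) \to \mathbb{R}^3,\ F(g) = 1\}$. In our case the indicatrix is the ODF, which can be easily seen from the homogeneity condition 2. in Definition 1:

$$F(y(\theta, \varphi)) = \frac{1}{\mathrm{ODF}(\theta, \varphi)} \implies F(\mathrm{ODF}(\theta, \varphi) \cdot y(\theta, \varphi)) = 1.$$

We denote $\dot{g}_\theta := \frac{\partial}{\partial\theta}(g)$, $\ddot{g}_\theta := \frac{\partial^2}{\partial\theta^2}(g)$ and similarly for $\varphi$. We define the following three matrices:

$$m = \begin{pmatrix} g^1 & g^2 & g^3 \\ \dot{g}_\theta^1 & \dot{g}_\theta^2 & \dot{g}_\theta^3 \\ \dot{g}_\varphi^1 & \dot{g}_\varphi^2 & \dot{g}_\varphi^3 \end{pmatrix}, \quad
m_\theta = \begin{pmatrix} \ddot{g}_\theta^1 & \ddot{g}_\theta^2 & \ddot{g}_\theta^3 \\ \dot{g}_\theta^1 & \dot{g}_\theta^2 & \dot{g}_\theta^3 \\ \dot{g}_\varphi^1 & \dot{g}_\varphi^2 & \dot{g}_\varphi^3 \end{pmatrix}, \quad
m_\varphi = \begin{pmatrix} \ddot{g}_\varphi^1 & \ddot{g}_\varphi^2 & \ddot{g}_\varphi^3 \\ \dot{g}_\theta^1 & \dot{g}_\theta^2 & \dot{g}_\theta^3 \\ \dot{g}_\varphi^1 & \dot{g}_\varphi^2 & \dot{g}_\varphi^3 \end{pmatrix}. \qquad (7)$$

Then the strong convexity requires:

$$\frac{\det(m_\theta)}{\det(m)} > 0, \quad \text{and} \quad \frac{\det(m_\varphi)}{\det(m)} > \frac{(g_{ij}\, \dot{y}_\theta^i\, \dot{y}_\varphi^j)^2}{g_{ij}\, \dot{y}_\theta^i\, \dot{y}_\theta^j}. \qquad (8)$$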

Since we use linear interpolation between tensors, we only need to check the condition at original data-points. This condition is always met in our ODF-data, and we expect it to hold quite generally. The goal of this section was to deﬁne a Finsler-structure and in particular a Finsler metric tensor gij (x, y) corresponding to a given tensorial ODF. Indeed


in case the ODF is a symmetric tensor of order two, this metric tensor is equivalent to the Riemann metric tensor. Following our Finsler approach, instead of one metric tensor per voxel we obtain a bundle of metric tensors at any x. For illustration, see Fig.1.
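The norm of Eq. (5) and the homogeneity property of Eq. (6) can be sketched at a single voxel with a tensor contraction. The sixth order tensor below is a hypothetical example built from a positive definite matrix A so that $D(y) = (y^T A y)^3$; it is not symmetrized and is not fitted to any data, and the function name is an assumption.

```python
import numpy as np

def finsler_norm_order6(D, y):
    """F(y) = (D_ijklmn y^i y^j y^k y^l y^m y^n)^(1/6), cf. Eq. (5), at a fixed position x."""
    y = np.asarray(y, dtype=float)
    value = np.einsum("ijklmn,i,j,k,l,m,n->", D, y, y, y, y, y, y)
    return value ** (1.0 / 6.0)

# Hypothetical positive sixth order tensor: D(y) = (y^T A y)^3 with A positive definite.
A = np.diag([1.0, 2.0, 4.0])
D = np.einsum("ij,kl,mn->ijklmn", A, A, A)

v = np.array([0.2, -0.5, 0.7])
alpha = 3.0
lhs = finsler_norm_order6(D, alpha * v)   # F(x, alpha v)
rhs = alpha * finsler_norm_order6(D, v)   # alpha F(x, v)
```

For this particular D the norm reduces to the Riemannian norm $\sqrt{y^T A y}$, which is consistent with the remark above that the second order case recovers the Riemann metric tensor; a general fitted sixth order tensor does not reduce in this way.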

4 Transforming a Polynomial Tensor to a Monomial Tensor

Assume we wish to apply Laplace-Beltrami smoothing to our spherical data, by which we obtain a field of spherical functions at any desired scale, and that we wish to use a tensorial representation of the data instead of spherical harmonics. As is shown in [10], this smoothing is easy to do, using iterative polynomial tensor fitting. The point here is that for Finsler analysis, we would rather work with a tensor representation of monomial form

$$D(y) = D_{i_1 \cdots i_n}\, y^{i_1} \cdots y^{i_n}, \qquad (9)$$

than with the equivalent polynomial expression

$$\tilde{D}(y) = \sum_{k=0}^{n} \tilde{D}_{i_1 \cdots i_k}\, y^{i_1} \cdots y^{i_k}, \qquad (10)$$

but still exploit the convenient (co-domain) scale space representation of the latter:

$$\tilde{D}(y, \tau) = \sum_{k=0}^{n} e^{-\tau k(k+1)}\, \tilde{D}_{i_1 \cdots i_k}\, y^{i_1} \cdots y^{i_k}. \qquad (11)$$

This poses no problem, since we can rather easily transform the polynomial expression to a monomial one, using the fact that our polynomials are restricted to the sphere (eq. (4)); thus we may expand a lower order tensor to a sparse higher order one and symmetrize it. We can also always transform the monomial expression to a polynomial sum of irreducible monomial tensors using Clebsch projection [20].
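The expansion of a lower order tensor to a higher order one can be sketched as follows: on the unit sphere $y \cdot y = 1$, so an order-k term may be multiplied by $(n-k)/2$ copies of the identity tensor and then symmetrized. This is only an illustrative sketch under the assumption that all orders involved are even and that the terms $\tilde{D}_k$ are those of the decomposition in [10]; the function names and the example coefficients are hypothetical.

```python
import itertools
import math
import numpy as np

def symmetrize(T):
    """Average a tensor over all permutations of its indices."""
    perms = list(itertools.permutations(range(T.ndim)))
    return sum(np.transpose(T, p) for p in perms) / len(perms)

def lift_to_order(T, n):
    """Raise an order-k tensor to order n (n - k even), using y.y = 1 on the sphere."""
    T = np.asarray(T, dtype=float)
    assert (n - T.ndim) % 2 == 0, "orders must differ by an even number"
    identity = np.eye(3)
    for _ in range((n - T.ndim) // 2):
        T = np.multiply.outer(T, identity)   # sparse higher order tensor
    return symmetrize(T)

def polynomial_to_monomial(terms, n, tau=0.0):
    """Combine terms {k: order-k tensor} of Eq. (10)/(11) into one order-n tensor as in Eq. (9)."""
    return sum(math.exp(-tau * k * (k + 1)) * lift_to_order(T, n)
               for k, T in terms.items())

def evaluate(T, y):
    """Contract every index of T with y."""
    for _ in range(T.ndim):
        T = T @ y
    return float(T)

# Hypothetical polynomial with an order-0 and an order-2 term, lifted to order 4.
terms = {0: 0.5, 2: np.diag([1.0, 2.0, 3.0])}
D4 = polynomial_to_monomial(terms, n=4)

y = np.array([1.0, 2.0, -2.0]) / 3.0        # a unit vector
monomial_value = evaluate(D4, y)
polynomial_value = 0.5 + y @ terms[2] @ y
```

On unit vectors both representations agree, while the order-4 tensor `D4` is homogeneous of a single degree and can therefore be inserted into a norm of the form of Eq. (5).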

5 Fiber Tracking in HARDI Data Using Finsler Geometry

In the DTI setting the most straightforward way of tracking fibers is to follow the principal eigenvector, corresponding to the largest eigenvalue of the diffusion tensor, until some stopping criterion is met. This method cannot reveal crossings and only provides a single direction (if at all) per voxel. If we instead compute the shortest paths according to the diffusion-induced Riemann metric tensor, we can expect these to be candidates for real fibers [5]. Of course, most of the shortest paths (geodesics) do not represent actual fibers, and therefore we should extract the potential neural fibers from arbitrary geodesics based on their connectivity [6]. We show some results of computing well-connected geodesics in an analytic tensor field as well as in real rat brain data.

5.1 Analytic Tensor Field

We treat an analytic norm field in $\mathbb{R}^2$, but the situation can be directly extended to $\mathbb{R}^3$. Let us take as a convex norm function at each spatial position

$$F(\varphi) = (\cos 4\varphi + 4)^{1/4} = \left( 5\cos^4\varphi + 2\cos^2\varphi \sin^2\varphi + 5\sin^4\varphi \right)^{1/4}. \qquad (12)$$

This is an example of a fourth order tensor on unit vectors. Such a tensor field could represent an infinitely dense field of orthogonally crossing fibers. From the fact that F has no x-dependence we conclude that the geodesic coefficients vanish and that the geodesics coincide with the Euclidean geodesics $\gamma(t) = (t \cos\varphi, t \sin\varphi)$, i.e. straight lines. However, the so-called connectivity of a geodesic [6], [21] is relatively large only in cases where the directional norm function is correspondingly small. In the Finsler setting the connectivity measure m(γ) is

$$m(\gamma) = \frac{\int \sqrt{\eta_{ij}\, \dot\gamma^i \dot\gamma^j}\; dt}{\int \sqrt{g_{ij}(\gamma, \dot\gamma)\, \dot\gamma^i \dot\gamma^j}\; dt}, \qquad (13)$$

where $\eta_{ij}(\gamma)$ represents the covariant Euclidean metric tensor, which in Cartesian coordinates reduces to the constant identity matrix, $\dot\gamma$ the tangent to the curve γ, and $g_{ij}(\gamma, \dot\gamma)$ the Finsler metric tensor (which depends not only on the position on the curve but also on the tangent of the curve). For illustration we compute explicitly the metric tensors, using Cartesian coordinates:

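$$g_{ij} = \frac{1}{\left( 5\cos^4\varphi + 2\cos^2\varphi \sin^2\varphi + 5\sin^4\varphi \right)^{3/2}} \begin{pmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{pmatrix}, \qquad (14)$$

where

$$g_{11} = 5\left( 5\cos^6\varphi + 3\cos^4\varphi \sin^2\varphi + 15\cos^2\varphi \sin^4\varphi + \sin^6\varphi \right),$$
$$g_{12} = g_{21} = -48 \cos^3\varphi \sin^3\varphi,$$
$$g_{22} = 5\left( \cos^6\varphi + 15\cos^4\varphi \sin^2\varphi + 3\cos^2\varphi \sin^4\varphi + 5\sin^6\varphi \right).$$

The strong convexity criterion $\frac{\ddot{g}^1 \dot{g}^2 - \dot{g}^1 \ddot{g}^2}{\dot{g}^1 g^2 - g^1 \dot{g}^2} > 0$ in $\mathbb{R}^2$ [13] on the indicatrix g(ϕ), for metric (14), is satisfied for every ϕ, since

$$\frac{\ddot{g}^1 \dot{g}^2 - \dot{g}^1 \ddot{g}^2}{\dot{g}^1 g^2 - g^1 \dot{g}^2} = \frac{13 - 8\cos 4\varphi}{(4 + \cos 4\varphi)^2} > 0. \qquad (15)$$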

The connectivity measure for a (Euclidean) geodesic γ can be computed analytically:

$$m(\gamma) = \frac{\int dt}{\int (4 + \cos(4\varphi))^{1/4}\, dt}, \qquad (16)$$

which gives the maximal connectivities in directions $\{ \frac{\pi}{4}, \frac{3\pi}{4}, \frac{5\pi}{4}, \frac{7\pi}{4} \}$, as expected. See Fig. 2 for an illustration. We observe that on such a norm field the Riemannian (DTI) framework would result in Euclidean geodesics and constant connectivity over all geodesics, thus revealing no information at all about the angular heterogeneity.
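The analytic example can be checked numerically. The sketch below evaluates the closed-form metric tensor of Eq. (14), the convexity ratio of Eq. (15) and the connectivity of Eq. (16) on a grid of directions; the function names and the grid resolution are assumptions made for illustration.

```python
import numpy as np

def norm_squared(phi):
    """F(phi)^2 for the norm of Eq. (12)."""
    return np.sqrt(4.0 + np.cos(4.0 * phi))

def metric_closed_form(phi):
    """Metric tensor of Eq. (14) for the unit direction (cos phi, sin phi)."""
    c, s = np.cos(phi), np.sin(phi)
    P = 5 * c**4 + 2 * c**2 * s**2 + 5 * s**4
    g11 = 5 * (5 * c**6 + 3 * c**4 * s**2 + 15 * c**2 * s**4 + s**6)
    g12 = -48 * c**3 * s**3
    g22 = 5 * (c**6 + 15 * c**4 * s**2 + 3 * c**2 * s**4 + 5 * s**6)
    return np.array([[g11, g12], [g12, g22]]) / P**1.5

def convexity_ratio(phi):
    """Right-hand side of Eq. (15)."""
    return (13.0 - 8.0 * np.cos(4.0 * phi)) / (4.0 + np.cos(4.0 * phi)) ** 2

def connectivity(phi):
    """Eq. (16) for a straight line traversed with unit Euclidean speed in direction phi."""
    return 1.0 / (4.0 + np.cos(4.0 * phi)) ** 0.25

phis = np.linspace(0.0, 2.0 * np.pi, 720, endpoint=False)
m = connectivity(phis)
best_directions = phis[np.isclose(m, m.max())]
all_positive_definite = all(
    np.all(np.linalg.eigvalsh(metric_closed_form(p)) > 0) for p in phis
)
```

The array `best_directions` contains the grid angles with maximal connectivity, which can be compared with the four diagonal directions stated above.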


Fig. 2. Left: A field of fourth order spherical harmonics as in the norm function eq. (12) representing dense crossings and some well connected geodesics, colored in red. Right: 200 equiangular metric tensors of the same norm function, and an ellipse with light blue color corresponding to the metric in direction ϕ = π/4.

Fig. 3. Left: Finsler geodesics emanating from a voxel, and the most connective ones in red. Right: Fibers through the same neighborhood in the traditional DTI principal eigenvector tracking.

5.2 Real Rat Brain Data

The Subthalamic Nucleus is a small area in the brain that is involved in the physiopathology of Parkinson's disease [22]. We computed the Finsler geodesics and their connectivities, having an initial point in several central voxels in the Subthalamic Nucleus. These voxels were located based on comparison to an atlas of the rat brain [23]. We tracked Finsler geodesics using the standard equation (ODE formulation) [14] (p. 78) and a second order Taylor approximation, with the 49 measurement directions as initial directions, a step size of 0.2 voxel sizes, and 10 steps. Then we selected those 30% of all geodesics that have the best connectivity. Compared to the traditional DTI tracking, we found that one of the main directions with strong connectivity typically coincides with the DTI fibers, but we also found other potential fiber directions. For illustration see Fig. 3.
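The tracking step can be sketched as follows. The code assumes the standard geodesic equation $\ddot{x}^i + 2G^i(x, \dot{x}) = 0$ with spray coefficients $G^i = \frac{1}{4} g^{il} \left( \frac{\partial^2 F^2}{\partial x^k \partial y^l} y^k - \frac{\partial F^2}{\partial x^l} \right)$, approximates all derivatives by central differences, and advances the position with a second order Taylor step. The function names, the finite-difference step and the explicit velocity update are assumptions made for illustration; the implementation used for the experiments is not specified beyond the description above.

```python
import numpy as np

def _unit(n, i, h):
    e = np.zeros(n)
    e[i] = h
    return e

def spray_coefficients(F, x, y, h=1e-4):
    """G^i = 1/4 g^{il} (d^2 F^2/dx^k dy^l * y^k - dF^2/dx^l), by central differences."""
    n = x.size
    F2 = lambda p, v: F(p, v) ** 2
    g = np.empty((n, n))       # g[i, j] = 1/2 d^2 F^2 / dy^i dy^j
    mixed = np.empty((n, n))   # mixed[k, l] = d^2 F^2 / dx^k dy^l
    grad_x = np.empty(n)       # grad_x[l] = dF^2 / dx^l
    for i in range(n):
        ei = _unit(n, i, h)
        grad_x[i] = (F2(x + ei, y) - F2(x - ei, y)) / (2.0 * h)
        for j in range(n):
            ej = _unit(n, j, h)
            g[i, j] = (F2(x, y + ei + ej) - F2(x, y + ei - ej)
                       - F2(x, y - ei + ej) + F2(x, y - ei - ej)) / (8.0 * h * h)
            mixed[i, j] = (F2(x + ei, y + ej) - F2(x + ei, y - ej)
                           - F2(x - ei, y + ej) + F2(x - ei, y - ej)) / (4.0 * h * h)
    rhs = mixed.T @ y - grad_x
    return 0.25 * np.linalg.solve(g, rhs)

def track_geodesic(F, x0, v0, step=0.2, n_steps=10):
    """Integrate the geodesic ODE with a second order Taylor step in the position."""
    x = np.asarray(x0, dtype=float)
    v = np.asarray(v0, dtype=float)
    path = [x.copy()]
    for _ in range(n_steps):
        a = -2.0 * spray_coefficients(F, x, v)   # acceleration from the geodesic equation
        x = x + step * v + 0.5 * step**2 * a
        v = v + step * a
        path.append(x.copy())
    return np.array(path)

# Position independent norm of Eq. (12), extended homogeneously to R^2.
def analytic_norm(x, y):
    return (5 * y[0]**4 + 2 * y[0]**2 * y[1]**2 + 5 * y[1]**4) ** 0.25

start = np.array([0.0, 0.0])
direction = np.array([np.cos(np.pi / 4), np.sin(np.pi / 4)])
path = track_geodesic(analytic_norm, start, direction)
```

For the position independent norm of Section 5.1 the spray coefficients vanish and the tracked path is a straight line, in agreement with the discussion there. The default `step` and `n_steps` mirror the values quoted above.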

6 Conclusions and Future Work

We have seen that it is indeed possible to analyze spherical tensor ﬁelds using Finsler geometry. It gives new methods to work with the data and also has the potential to give new information on the data. Finsler geodesics and Finsler curvatures are examples of geometric measures that can be applied on HARDI ﬁber-analysis, and which will be a subject of extensive future work.

Acknowledgement. The rat brain data, acquired for a study [24], were kindly provided by Ellen Brunenberg.

References
1. Tuch, D., Reese, T., Wiegell, M., Makris, N., Belliveau, J., Wedeen, V.J.: High angular resolution diffusion imaging reveals intravoxel white matter fiber heterogeneity. Magnetic Resonance in Medicine 48(4), 577–582 (2002)
2. Stejskal, E., Tanner, J.: Spin diffusion measurements: Spin echoes in the presence of a time-dependent field gradient. The Journal of Chemical Physics 42(1), 288–292 (1965)
3. Cohen de Lara, M.: Geometric and symmetry properties of a nondegenerate diffusion process. The Annals of Probability 23(4), 1557–1604 (1995)
4. O'Donnell, L., Haker, S., Westin, C.F.: New approaches to estimation of white matter connectivity in diffusion tensor MRI: Elliptic PDEs and geodesics in a tensor-warped space. In: Dohi, T., Kikinis, R. (eds.) MICCAI 2002. LNCS, vol. 2488, pp. 459–466. Springer, Heidelberg (2002)
5. Lenglet, C., Deriche, R., Faugeras, O.: Inferring white matter geometry from diffusion tensor MRI: Application to connectivity mapping. In: Pajdla, T., Matas, J.G. (eds.) ECCV 2004. LNCS, vol. 3024, pp. 127–140. Springer, Heidelberg (2004)
6. Astola, L., Florack, L., ter Haar Romeny, B.: Measures for pathway analysis in brain white matter using diffusion tensor images. In: Karssemeijer, N., Lelieveldt, B. (eds.) IPMI 2007. LNCS, vol. 4584, pp. 642–649. Springer, Heidelberg (2007)
7. Astola, L., Florack, L.: Sticky vector fields and other geometric measures on diffusion tensor images. In: MMBIA 2008, IEEE Computer Society Workshop on Mathematical Methods in Biomedical Image Analysis, held in conjunction with CVPR 2008, Anchorage, Alaska, The United States. CVPR, vol. 20, pp. 1–7. Springer, Heidelberg (2008)
8. Özarslan, E., Mareci, T.: Generalized diffusion tensor imaging and analytical relationships between diffusion tensor imaging and high angular resolution diffusion imaging. Magnetic Resonance in Medicine 50, 955–965 (2003)
9. Barmpoutis, A., Jian, B., Vemuri, B., Shepherd, T.: Symmetric positive 4th order tensors and their estimation from diffusion weighted MRI. In: Karssemeijer, N., Lelieveldt, B. (eds.) IPMI 2007. LNCS, vol. 4584, pp. 308–319. Springer, Heidelberg (2007)
10. Florack, L., Balmashnova, E.: Decomposition of high angular resolution diffusion images into a sum of self-similar polynomials on the sphere. In: Proceedings of the Eighteenth International Conference on Computer Graphics and Vision, GraphiCon 2008, Moscow, Russian Federation, June 2008, pp. 26–31 (2008) (invited paper)
11. Florack, L., Balmashnova, E.: Two canonical representations for regularized high angular resolution diffusion imaging. In: MICCAI Workshop on Computational Diffusion MRI, New York, USA, September 10, 2008, pp. 94–105 (2008)
12. Melonakos, J., Pichon, E., Angenent, S., Tannenbaum, A.: Finsler active contours. IEEE Transactions on Pattern Analysis and Machine Intelligence 30(3), 412–423 (2008)
13. Bao, D., Chern, S.S., Shen, Z.: An Introduction to Riemann-Finsler Geometry. Springer, Heidelberg (2000)
14. Shen, Z.: Lectures on Finsler Geometry. World Scientific, Singapore (2001)
15. Tuch, D.: Q-ball imaging. Magnetic Resonance in Medicine 52(6), 1358–1372 (2004)
16. Jansons, K., Alexander, D.: Persistent angular structure: New insights from diffusion magnetic resonance imaging data. Inverse Problems 19, 1031–1046 (2003)
17. Özarslan, E., Shepherd, T., Vemuri, B., Blackband, S., Mareci, T.: Resolution of complex tissue microarchitecture using the diffusion orientation transform. NeuroImage 31, 1086–1103 (2006)
18. Jian, B., Vemuri, B., Özarslan, E., Carney, P., Mareci, T.: A novel tensor distribution model for the diffusion-weighted MR signal. NeuroImage 37, 164–176 (2007)
19. Descoteaux, M., Angelino, E., Fitzgibbons, S., Deriche, R.: Regularized, fast and robust analytical q-ball imaging. Magnetic Resonance in Medicine 58(3), 497–510 (2007)
20. Müller, C. (ed.): Analysis of Spherical Symmetries in Euclidean Spaces. Applied Mathematical Sciences, vol. 129. Springer, New York (1998)
21. Prados, E., Soatto, S., Lenglet, C., Pons, J.P., Wotawa, N., Deriche, R., Faugeras, O.: Control Theory and Fast Marching Techniques for Brain Connectivity Mapping. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York, USA, vol. 1, pp. 1076–1083. IEEE Computer Society Press, Los Alamitos (2006)
22. Hamani, C., Saint-Cyr, J., Fraser, J., Kaplitt, M., Lozano, A.: The subthalamic nucleus in the context of movement disorders. Brain, a Journal of Neurology 127, 4–20 (2004)
23. Paxinos, G., Watson, C.: The Rat Brain in Stereotaxic Coordinates. Academic Press, San Diego (1998)
24. Brunenberg, E., Prckovska, V., Platel, B., Strijkers, G., ter Haar Romeny, B.M.: Untangling a fiber bundle knot: Preliminary results on STN connectivity using DTI and HARDI on rat brains. In: Proceedings of the 17th Meeting of the International Society for Magnetic Resonance in Medicine (ISMRM), Honolulu, Hawaii (2009)


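Appendix

We seek the general condition for

$$g_{ij}(y)\, v^i v^j > 0, \qquad (17)$$

to be valid in $\mathbb{R}^3$ ($= T_xM$). From the homogeneity of the norm function F, it follows that it is sufficient to have this condition on the unit level set of the norm. We consider this level surface, i.e. the set of vectors y for which F(y) = 1, and a parametrization $y(\theta, \varphi) = (y^1(\theta, \varphi), y^2(\theta, \varphi), y^3(\theta, \varphi))$. In what follows we abbreviate $g_{ij} = g_{ij}(x, y)$. From F(y) = 1 we have

$$g_{ij}\, y^i y^j = 1. \qquad (18)$$

Taking derivatives of both sides and using a consequence of Euler's theorem for homogeneous functions ([13] p. 5) that says

$$\frac{\partial g_{ij}}{\partial y^k}\, y^k = 0, \qquad (19)$$

we obtain

$$g_{ij}\, \dot{y}_\theta^i\, y^j = 0, \qquad g_{ij}\, \dot{y}_\varphi^i\, y^j = 0, \qquad (20)$$

implying $\dot{y}_\theta \perp_g y$ and $\dot{y}_\varphi \perp_g y$. Taking derivatives once more, we get

$$g_{ij}\, \ddot{y}_\theta^i\, y^j = -g_{ij}\, \dot{y}_\theta^i\, \dot{y}_\theta^j, \qquad
g_{ij}\, \ddot{y}_\varphi^i\, y^j = -g_{ij}\, \dot{y}_\varphi^i\, \dot{y}_\varphi^j, \qquad
g_{ij}\, \ddot{y}_{\theta\varphi}^i\, y^j = -g_{ij}\, \dot{y}_\theta^i\, \dot{y}_\varphi^j. \qquad (21)$$

We may express an arbitrary vector v as a linear combination of orthogonal basis vectors:

$$v = \alpha\, y + \beta\, \dot{y}_\theta + \gamma \left( \dot{y}_\varphi - \frac{\langle \dot{y}_\varphi, \dot{y}_\theta \rangle}{\langle \dot{y}_\theta, \dot{y}_\theta \rangle}\, \dot{y}_\theta \right). \qquad (22)$$

We substitute this expression for v into the left hand side of (17) and, using (21), obtain:

$$g_{ij}\, v^i v^j = \alpha^2\, g_{ij}\, y^i y^j - \beta^2\, g_{ij}\, \ddot{y}_\theta^i\, y^j - \gamma^2 \left( g_{ij}\, \ddot{y}_\varphi^i\, y^j + \frac{(g_{ij}\, \dot{y}_\theta^i\, \dot{y}_\varphi^j)^2}{g_{ij}\, \dot{y}_\theta^i\, \dot{y}_\theta^j} \right), \qquad (23)$$

because the mixed terms vanish due to the orthogonality of the basis vectors. On the other hand, for y's on the indicatrix we have as a consequence of Euler's theorem on homogeneous functions (denoting $F_{y^i} = \frac{\partial F}{\partial y^i}$):

$$F_{y^i}\, y^i = F(y) = 1. \qquad (24)$$

Differentiating eq. (24) w.r.t. θ and ϕ, we obtain two equations:

$$F_{y^i}\, \dot{y}_\theta^i = 0, \qquad (25)$$

$$F_{y^i}\, \dot{y}_\varphi^i = 0, \qquad (26)$$

for F is a homogeneous function. The matrices $m$, $m_\theta$, $m_\varphi$ are as defined in eq. (7). Solving the system of equations (24), (25) and (26) we get:

$$F_{y^1} = \frac{\dot{y}_\varphi^2\, \dot{y}_\theta^3 - \dot{y}_\varphi^3\, \dot{y}_\theta^2}{\det(m)}, \quad
F_{y^2} = \frac{\dot{y}_\varphi^3\, \dot{y}_\theta^1 - \dot{y}_\varphi^1\, \dot{y}_\theta^3}{\det(m)}, \quad
F_{y^3} = \frac{\dot{y}_\varphi^1\, \dot{y}_\theta^2 - \dot{y}_\varphi^2\, \dot{y}_\theta^1}{\det(m)}. \qquad (27)$$

Now using the equalities

$$F_{y^i} = g_{ij}\, y^j, \qquad g_{ij}\, \ddot{y}_\theta^i\, y^j = F_{y^k}\, \ddot{y}_\theta^k, \qquad g_{ij}\, \ddot{y}_\varphi^i\, y^j = F_{y^k}\, \ddot{y}_\varphi^k, \qquad (28)$$

and

$$-g_{ij}\, \ddot{y}_\theta^i\, y^j = \frac{\det(m_\theta)}{\det(m)}, \qquad -g_{ij}\, \ddot{y}_\varphi^i\, y^j = \frac{\det(m_\varphi)}{\det(m)}, \qquad (29)$$

we obtain

$$g_{ij}\, v^i v^j = \alpha^2 - \beta^2\, g_{ij}\, \ddot{y}_\theta^i\, y^j - \gamma^2 \left( g_{ij}\, \ddot{y}_\varphi^i\, y^j + \frac{(g_{ij}\, \dot{y}_\theta^i\, \dot{y}_\varphi^j)^2}{g_{ij}\, \dot{y}_\theta^i\, \dot{y}_\theta^j} \right) > 0 \qquad (30)$$

if

$$\frac{\det(m_\theta)}{\det(m)} > 0 \quad \text{and} \quad \frac{\det(m_\varphi)}{\det(m)} > \frac{(g_{ij}\, \dot{y}_\theta^i\, \dot{y}_\varphi^j)^2}{g_{ij}\, \dot{y}_\theta^i\, \dot{y}_\theta^j}. \qquad (31)$$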

Bregman-EM-TV Methods with Application to Optical Nanoscopy

Christoph Brune, Alex Sawatzky, and Martin Burger

Westfälische Wilhelms-Universität Münster, Institut für Numerische und Angewandte Mathematik, Einsteinstr. 62, D-48149 Münster, Germany
{christoph.brune,alex.sawatzky,martin.burger}@wwu.de
http://imaging.uni-muenster.de

Abstract. Measurements in nanoscopic imaging suffer from blurring effects concerning different point spread functions (PSF). Some apparatus even have PSFs that are locally dependent on phase shifts. Additionally, raw data are affected by Poisson noise resulting from laser sampling and "photon counts" in fluorescence microscopy. In these applications standard reconstruction methods (EM, filtered backprojection) deliver unsatisfactory and noisy results. Starting from a statistical modeling in terms of a MAP likelihood estimation we combine the iterative EM algorithm with TV regularization techniques to make efficient use of a-priori information. Typically, TV-based methods deliver reconstructed cartoon images suffering from contrast reduction. We propose an extension to EM-TV, based on Bregman iterations and inverse scale space methods, in order to obtain improved imaging results by simultaneous contrast enhancement. We illustrate our techniques by synthetic and experimental biological data.

1 Introduction

Image reconstruction is a fundamental problem in many fields of applied sciences, e.g. nanoscopic imaging, medical imaging or astronomy. Fluorescence microscopy for example is an important imaging technique for the investigation of biological (live) cells, up to nano-scale. In this case image reconstruction arises in the form of deconvolution problems. Undesirable blurring effects can be ascribed to diffraction of light. Mathematically, image reconstruction in such applications can often be formulated as the solution of a linear inverse and ill-posed problem. The task consists of computing an estimate of an unknown object from given measurements. Typically these problems deal with Fredholm integral equations of the first kind

$$\bar{f} = \bar{K} u, \qquad (1)$$

where $\bar{K}$ is a compact operator, $\bar{f}$ the (exact) data and u the desired image. In the case of nanoscopic imaging $\bar{K}$ is a convolution operator

$$(\bar{K} u)(x) = (k * u)(x) = \int_\Omega k(x - y)\, u(y)\, dy,$$

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 235–246, 2009. © Springer-Verlag Berlin Heidelberg 2009


where k is a convolution kernel, describing the blurring effects created by a nanoscopic apparatus. Determining u by direct inversion of $\bar{K}$ is not suitable, since (1) is ill-posed. In such cases regularization techniques are needed to produce reasonable reconstructions. A frequently used way to realize regularization techniques is the Bayesian model, whose aim is the computation of an estimate u of the unknown object by maximizing the a-posteriori probability density p(u|f) with measurements f. The latter is given according to Bayes' formula

$$p(u|f) \sim p(f|u)\, p(u). \qquad (2)$$

This approach is called maximum a-posteriori probability (MAP) estimation. If the measurements f are given, we describe the density p(u|f) as the a-posteriori likelihood function which depends on u only. The Bayesian approach (2) has the advantage that it allows additional information about u to be incorporated into the reconstruction process via the prior probability density p(u). The most frequently used prior densities are Gibbs functions

$$p(u) \sim e^{-\alpha R(u)}, \qquad (3)$$

where α is a positive parameter and R a convex energy. Usual models for the probability density p(f|u) in (2) are Gaussian- or Poisson-distributed raw data f, i.e.

$$p(f|u) \sim e^{-\|Ku - f\|_2^2 / (2\sigma^2)}, \qquad p(f|u) \sim \prod_i \frac{(Ku)_i^{f_i}}{f_i!}\, e^{-(Ku)_i}, \qquad (4)$$

where K is a semi-discrete operator based on $\bar{K}$. In the canonical case of additive Gaussian noise (see (4), left) the minimization of the negative log likelihood function (2) leads to classical Tikhonov regularization [1] based on minimizing a functional of the form

$$\min_{u \ge 0}\ \frac{1}{2} \|Ku - f\|_2^2 + \alpha R(u). \qquad (5)$$

The first, so-called data-fidelity term, penalizes the deviation from equality in (1), whereas R is a regularization term as in (3). If we choose K = Id and the total variation (TV) regularization technique $R(u) := |u|_{BV}$, we obtain the well-known ROF model [2] for image denoising. The additional positivity constraint is necessary in typical applications as the unknown represents a density image. In nanoscopic imaging measured data are stochastic and pointwise; more precisely, they are called "photon counts". This property refers to laser scanning techniques in fluorescence microscopy. Consequently, the random variables of measured data are not Gaussian- but Poisson-distributed (see (4), right), with expected value given by equation (1). Hence a MAP estimation via the negative log likelihood function (2) leads to the following variational problem [1]

$$\min_{u \ge 0}\ \int_\Omega (Ku - f \log Ku)\, d\mu + \alpha R(u). \qquad (6)$$
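A discrete counterpart of the data term in (6) follows from the Poisson model in (4) by taking the negative logarithm and dropping the $\log f_i!$ terms, which do not depend on u. The sketch below evaluates this term and a discrete Kullback-Leibler divergence and shows that they differ only by a constant in u. The function names and the sample vectors are hypothetical, and Ku is assumed to be strictly positive.

```python
import numpy as np

def poisson_data_fidelity(Ku, f):
    """Discrete data term of (6): sum_i ((Ku)_i - f_i log (Ku)_i)."""
    Ku = np.asarray(Ku, dtype=float)
    f = np.asarray(f, dtype=float)
    return float(np.sum(Ku - f * np.log(Ku)))

def kullback_leibler(f, Ku):
    """KL(f, Ku) = sum_i (f_i log(f_i / (Ku)_i) - f_i + (Ku)_i), with 0 log 0 := 0."""
    Ku = np.asarray(Ku, dtype=float)
    f = np.asarray(f, dtype=float)
    pos = f > 0
    return float(np.sum(f[pos] * np.log(f[pos] / Ku[pos])) - f.sum() + Ku.sum())

# Hypothetical photon counts f and two candidate forward projections Ku.
f = np.array([3.0, 0.0, 5.0, 2.0])
Ku_a = np.array([2.5, 0.4, 4.0, 3.0])
Ku_b = np.array([1.0, 1.0, 1.0, 1.0])

offset_a = kullback_leibler(f, Ku_a) - poisson_data_fidelity(Ku_a, f)
offset_b = kullback_leibler(f, Ku_b) - poisson_data_fidelity(Ku_b, f)
```

The two offsets coincide, so minimizing either functional over u gives the same minimizers.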


Up to additive terms independent of u, the data-fidelity term is the so-called Kullback-Leibler functional (also known as cross entropy or I-divergence) between the two probability measures f and Ku. A particular complication of (6) compared to (5) is the strong nonlinearity in the data fidelity term and the resulting issues in the computation of minimizers. In the case K = Id, i.e. Poisson noise removal with total variation regularization, we refer to [3]. In the absence of regularization (α = 0) the EM algorithm (cf. [4, 5, 6]) has become a standard scheme, which is however difficult to generalize to regularized cases. Robust solution of this problem for appropriate models of R is one of the novelties of this paper. The specific choice of the regularization functional R in (6) is important for how a-priori information about the expected solution is incorporated into the reconstruction process. Smooth, in particular quadratic, regularizations have attracted most attention in the past, mainly due to the simplicity in analysis and computation. However, such regularization approaches always lead to blurring of the reconstructions; in particular they cannot yield reconstructions with sharp edges. Recently, singular regularization energies, in particular those of $\ell^1$- or $L^1$-type, have attracted strong attention. In this work, we introduce an approach which uses total variation (TV) as the regularization functional. TV regularization was derived as a denoising technique in [2] and generalized to various other imaging tasks subsequently. The exact definition of TV [7], used in this paper, is

$$R(u) := |u|_{BV} = \sup_{g \in C_0^\infty(\Omega, \mathbb{R}^d),\ \|g\|_\infty \le 1} \int_\Omega u\, \operatorname{div} g, \qquad (7)$$

which is formally (true if u is sufficiently regular) $|u|_{BV} = \int_\Omega |\nabla u|$. The motivation for using TV is the effective suppression of noise and the realization of almost homogeneous regions with sharp edges. These features are attractive for nanoscopic imaging if the goal is to identify object shapes that are separated by sharp edges and shall be analyzed quantitatively. Unfortunately, images reconstructed by methods using TV regularization suffer from a loss of contrast. In this paper, we suggest extending EM-TV by iterative regularization to Bregman-EM-TV, attaining simultaneous contrast enhancement. More precisely, we apply total variation inverse scale space methods by employing the concept of Bregman distance regularization. The latter has been derived in [8] with a detailed analysis for Gaussian-type problems (5) and generalized to time-continuity [9] and $L^p$-norm data fitting terms [10]. Here, in the case of Poisson-type problems, the method consists in first computing a minimizer $u^1$ of (6) with $R(u) := |u|_{BV}$. Updates are determined successively by computing

$$u^{l+1} = \arg\min_{u \in BV(\Omega)} \left\{ \int_\Omega (Ku - f \log Ku)\, d\mu + \alpha \left( |u|_{BV} - \langle p^l, u \rangle \right) \right\}, \qquad (8)$$

where $p^l$ is an element of the subgradient of the total variation seminorm at $u^l$. Introducing the Bregman distance with respect to $|\cdot|_{BV}$, defined via

$$D_{|\cdot|_{BV}}^{\tilde{p}}(u, \tilde{u}) := |u|_{BV} - |\tilde{u}|_{BV} - \langle \tilde{p}, u - \tilde{u} \rangle, \qquad \tilde{p} \in \partial |\tilde{u}|_{BV} \subseteq BV^*(\Omega), \qquad (9)$$

238

C. Brune, A. Sawatzky, and M. Burger

where ·, · denotes the duality product, allows to characterize ul+1 in (8) as pl l+1 l u = arg min (Ku − f log Ku) dμ + α D|·|BV (u, u ) . (10) u∈BV (Ω)

Ω

We will see that inverse scale space strategies can noticeably improve reconstructions for inverse problems with Poisson statistics like optical nanoscopy.
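The following short worked step is added as a reader's aid and is not taken from the paper. It expands definition (9) with $\tilde u = u^l$ and $\tilde p = p^l$ to show why (8) and (10) have the same minimizers:

```latex
\begin{aligned}
D^{p^l}_{|\cdot|_{BV}}(u, u^l)
  &= |u|_{BV} - |u^l|_{BV} - \langle p^l, u - u^l \rangle \\
  &= \bigl( |u|_{BV} - \langle p^l, u \rangle \bigr)
     + \underbrace{\bigl( \langle p^l, u^l \rangle - |u^l|_{BV} \bigr)}_{\text{independent of } u} .
\end{aligned}
```

The second bracket does not depend on u, so the objective functionals in (8) and (10) differ only by a constant. Since the total variation is positively one-homogeneous, one even has $\langle p^l, u^l \rangle = |u^l|_{BV}$ for every $p^l \in \partial |u^l|_{BV}$, so this constant vanishes.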

2

Reconstruction Methods

In the literature, two types of reconstruction methods are in general use: analytic (direct) and algebraic (iterative) methods. A classical example of a direct method is the Fourier-based filtered backprojection (FBP). Although FBP is well understood and computationally efficient, iterative methods are receiving more and more attention in the applications mentioned above. The major reason is the high noise level (low SNR) and the type of statistics, which cannot be taken into account by direct methods. Hence, we give a short review of the Expectation-Maximization (EM) algorithm [4, 11], which is a popular iterative algorithm for maximizing the likelihood function p(u|f) in problems with incomplete data. We then proceed to the presentation of the proposed EM-TV and Bregman-EM-TV algorithms.

2.1

Reconstruction Method: EM Algorithm

In the absence of prior knowledge any object u has the same relevance, i.e. the Gibbs a-priori density p(u) in (3) is constant. We can then normalize p(u) such that R(u) ≡ 0. Hence (6) reduces to the constrained minimization problem

\[ \min_{u \ge 0} \int_\Omega (Ku - f \log Ku)\, d\mu . \tag{11} \]

A suitable iteration scheme for computing stationary points, which also preserves positivity (assuming K preserves positivity), is the so-called EM algorithm (cf. [12])

\[ u_{k+1} = u_k \, \frac{K^*\!\left( \frac{f}{K u_k} \right)}{K^* 1} , \qquad k = 0, 1, \ldots . \tag{12} \]

For noise-free data f, several proofs of convergence of the EM algorithm to the maximum likelihood estimate, i.e. the solution of (11), can be found in the literature [12,13,14,15]. Besides, it is known that iteration (12) converges slowly. A further property of the iteration is a lack of smoothing, which gives rise to the so-called "checkerboard effect".

For noisy data f it is necessary to distinguish between discrete and continuous modeling. In the discrete case, i.e. if K is a matrix and u is a vector, the existence of a minimum can be guaranteed since the smallest singular value is bounded by a positive value. Hence, the vectors are bounded during the iteration and convergence is ensured. However, if K is a general continuous operator, convergence is not only difficult to prove; the EM algorithm may even diverge. Again the reason is the ill-posedness of the integral equation (1), which carries over to problem (11). This aspect can be seen as a lack of additional a-priori knowledge about the unknown u resulting from R(u) = 0. The EM algorithm converges to a minimizer if one exists. Consequently, in the continuous case it is essential to ensure consistency of the given data to prevent divergence of the EM algorithm. As described in [13], the EM iterates show the following typical behavior for ill-posed problems: the (metric) distance between the iterates and the solution decreases initially before it increases as the noise is amplified during the iteration process. This issue can be controlled by using appropriate stopping rules to obtain reasonable results. In [13] it is shown that certain stopping rules indeed allow stable approximations. TV or Bregman-TV regularization techniques, which we consider in the following sections, are ways to improve the reconstruction results.

2.2

Reconstruction Method: EM-TV Algorithm

The EM or Richardson/Lucy algorithm is currently the standard iterative reconstruction method for deconvolution problems with Poisson noise based on the linear equation (1). However, with the assumption R(u) = 0, no a-priori knowledge about the expected solution is taken into account, i.e. different images have the same a-priori probability. Especially for measurements with low SNR, the multiplicative fixed point iteration (12) delivers unsatisfactory and noisy results even with early termination. Therefore we propose to integrate nonlinear variational methods into the reconstruction process to make efficient use of a-priori information and to obtain improved results. One approach to improving the reconstruction is the EM-TV algorithm. In the classical EM algorithm, the negative log-likelihood functional (11) is minimized. We modify the functional by adding a weighted TV term [2],

\[ \min_{u \in BV(\Omega),\ u \ge 0} \left\{ \int_\Omega (Ku - f \log Ku)\, d\mu + \alpha\, |u|_{BV} \right\} . \tag{13} \]

This is exactly (6) with TV as regularization functional R. That means images with smaller total variation are preferred in the minimization (they have higher prior probability). BV(Ω) is a popular function space in image processing since it can represent discontinuous functions. By minimizing TV the latter are even preferred [16, 17]. Hence, the expected reconstructions are cartoon-like images. Such an approach is not suited to studying very small structures in an object, but it is well suited to segmenting different cell structures and analyzing them quantitatively.

For the solution of (13), we propose a forward-backward splitting algorithm, which can be realized by alternating classical EM steps with almost standard TV minimization steps as encountered in image denoising. The latter are solved by using duality [18], which yields a robust and efficient algorithm. To design the proposed alternating algorithm, we consider the first order optimality condition of (13). Due to the total variation, this variational problem is not differentiable in the usual sense. It is, however, convex, since TV is convex and since we can extend the data-fidelity term to a Kullback-Leibler functional, cf. [19], without affecting the stationary points. For such problems powerful methods from convex analysis are available, e.g. a generalized derivative called the subdifferential [20], denoted by ∂. This generalized notion of gradients and the Karush-Kuhn-Tucker (KKT) conditions [20, Theorem 2.1.4] yield the existence of a Lagrange multiplier λ ≥ 0 such that

\[ \left\{ \begin{aligned} 0 &\in K^* 1 - K^*\!\left( \frac{f}{Ku} \right) + \alpha\, \partial |u|_{BV} - \lambda \\ 0 &= \lambda u \end{aligned} \right\} . \tag{14} \]

By multiplying (14) with u we can eliminate the Lagrange multiplier and derive the following semi-implicit iteration scheme

\[ u_{k+1} - u_k \, \frac{K^*\!\left( \frac{f}{K u_k} \right)}{K^* 1} + \tilde\alpha\, u_k\, p_{k+1} = 0 \tag{15} \]

with $p_{k+1} \in \partial |u_{k+1}|_{BV}$ and $\tilde\alpha := \frac{\alpha}{K^* 1}$. Interestingly, the second term within this iteration scheme is the EM step in (12). Consequently, method (15), which solves the variational problem (13), can be realized as a nested two-step iteration,

\[ \left\{ \begin{aligned} u_{k+\frac12} &= u_k \, \frac{K^*\!\left( \frac{f}{K u_k} \right)}{K^* 1} && \text{(EM step)} \\ u_{k+1} &= u_{k+\frac12} - \tilde\alpha\, u_k\, p_{k+1} && \text{(TV step)} \end{aligned} \right\} . \tag{16} \]

Thus, we alternate an EM step with a TV correction step. The more involved second half step from $u_{k+\frac12}$ to $u_{k+1}$ can be realized by solving the following variational problem,

\[ u_{k+1} = \arg\min_{u \in BV(\Omega)} \left\{ \frac12 \int_\Omega \frac{(u - u_{k+\frac12})^2}{u_k} + \tilde\alpha\, |u|_{BV} \right\} . \tag{17} \]

Inspecting the first order optimality condition confirms the equivalence of this minimization with the TV correction step in (16). Problem (17) is just a modified version of the Rudin-Osher-Fatemi (ROF) model, with weight $\frac{1}{u_k}$ in the fidelity term. This analogy makes it possible to carry over efficient numerical schemes known for the ROF model. For the solution of (17) we use the exact definition of TV (7) with dual variable g and derive an iteration scheme for the quadratic dual problem similar to [18].
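The resulting algorithm reads as follows: We initialize the dual variable $g^0$ with 0 (or with the resulting g from the previous TV correction step) and for any n ≥ 0 we compute the update

\[ g^{n+1} = \frac{g^n + \tau\, \nabla\!\left( \tilde\alpha\, u_k \operatorname{div} g^n - u_{k+\frac12} \right)}{1 + \tau \left| \nabla\!\left( \tilde\alpha\, u_k \operatorname{div} g^n - u_{k+\frac12} \right) \right|} , \qquad 0 < \tau < \frac{1}{4\, \tilde\alpha\, u_k} , \]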

with the constrained damping parameter τ to ensure stability and convergence of the algorithm. For a detailed analytical examination of EM-TV we refer to [21].
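The nested iteration (16) and the dual update above can be sketched for a one-dimensional signal as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the operator K is a small matrix with unit column sums, the gradient is a forward difference, the step size uses the largest entry of the weight, and the final clipping to a small positive value is a safeguard added here. The names `div`, `tv_step`, `em_tv` and the test signal are hypothetical.

```python
import numpy as np


def div(g):
    """Discrete divergence: negative adjoint of the forward difference np.diff."""
    return np.diff(np.concatenate(([0.0], g, [0.0])))


def total_variation(u):
    """Discrete total variation of a 1D signal."""
    return np.abs(np.diff(u)).sum()


def tv_step(u_half, weight, n_inner=200):
    """Approximate TV correction u = u_half - weight * div(g).

    `weight` plays the role of alpha_tilde * u_k in (16) and (17); g is the
    dual variable with |g| <= 1, updated by the fixed-point iteration above.
    """
    g = np.zeros(u_half.size - 1)
    tau = 0.9 / (4.0 * weight.max())  # assumption: step size below 1 / (4 * weight)
    for _ in range(n_inner):
        r = np.diff(weight * div(g) - u_half)
        g = (g + tau * r) / (1.0 + tau * np.abs(r))
    return u_half - weight * div(g)


def em_tv(K, f, alpha, n_outer=50, n_inner=200, eps=1e-12):
    """Alternate the EM and TV steps of (16); alpha = 0 gives plain EM (12)."""
    sens = K.T @ np.ones(K.shape[0])       # K^* 1
    u = np.full(K.shape[1], f.mean())      # positive constant initial guess
    for _ in range(n_outer):
        u_half = u * (K.T @ (f / (K @ u))) / sens              # EM step
        if alpha > 0:
            u = tv_step(u_half, (alpha / sens) * u, n_inner)   # TV step
            u = np.maximum(u, eps)         # positivity safeguard (our assumption)
        else:
            u = u_half
    return u


# Hypothetical test problem: blurred piecewise-constant signal with Poisson noise.
rng = np.random.default_rng(0)
x = np.arange(64)
truth = np.where((x >= 20) & (x < 40), 100.0, 10.0)
K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 2.0) ** 2)
K /= K.sum(axis=0, keepdims=True)          # unit column sums, so K^* 1 = 1
f = rng.poisson(K @ truth).astype(float)

u_em = em_tv(K, f, alpha=0.0)              # plain EM reconstruction
u_tv = em_tv(K, f, alpha=0.05)             # EM-TV reconstruction
```

Restarting the dual variable at zero in every TV step keeps the sketch short; reusing g from the previous TV correction step, as the text suggests, would typically need fewer inner iterations.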

2.3

Extension to Inverse Scale Space: Bregman-EM-TV

The EM-TV algorithm (16) presented above solves problem (13) and delivers cartoon-like reconstructions with sharp edges due to TV regularization. However, the realization of TV steps via the weighted ROF models (17) has the drawback that reconstructed images suffer from a loss of contrast. Thus, we propose to extend (13), and with it EM-TV, by iterative regularization to achieve a simultaneous contrast correction. More precisely, we perform a contrast enhancement by inverse scale space methods using the Bregman iteration. These techniques were derived in [8], with a detailed analysis for Gaussian-type problems (5), and were generalized in [9, 10]. Following these methods, an iterative refinement is realized by a sequence of modified EM-TV problems based on (13).

The inverse scale space methods concerning TV, derived in [8], follow the concept of iterative regularization by the Bregman distance [22]. For the Poisson model the method starts with a simple EM-TV algorithm, i.e. it consists in computing a minimizer $u^1$ of (13). Then, updates are determined successively by considering variational problems with a shifted TV, namely (8), where $p^l$ is an element of the subgradient of the total variation at $u^l$. The Bregman distance concerning TV is defined in (9). This definition allows us to characterize the sequence of modified variational problems (8), by addition of constant terms, as

\[ u^{l+1} = \arg\min_{u \in BV(\Omega)} \left\{ \int_\Omega (Ku - f \log Ku)\, d\mu + \alpha\, D^{p^l}_{|\cdot|_{BV}}(u, u^l) \right\} . \tag{18} \]

Thus, the first iterate $u^1$ can also be realized by the variational problem (18), if $u^0$ is constant and $p^0 := 0 \in \partial |u^0|_{BV}$. The Bregman distance $D^{p}_{|\cdot|_{BV}}$ does not represent a distance in the common (metric) sense, since it is not symmetric in general and the triangle inequality does not hold. Nevertheless, compared to (8), the formulation in (18) offers the advantage that $D^{p}_{|\cdot|_{BV}}$ is a distance measure with

\[ D^{p}_{|\cdot|_{BV}}(u, \tilde u) \ge 0 \quad \text{and} \quad D^{p}_{|\cdot|_{BV}}(u, \tilde u) = 0 \ \text{ for } \ u = \tilde u . \]

Besides, the Bregman distance is convex in the first argument because $|\cdot|_{BV}$ is convex. In general, i.e. for any convex functional J (see e.g. [10]), the Bregman distance can be interpreted as the difference between J at u and the Taylor linearization of J around $\tilde u$ if, in addition, J is continuously differentiable.

Before deriving a two-step iteration corresponding to (16) we motivate the contrast enhancement by iterative regularization in (18). The TV regularization in (13) prefers functions with only few oscillations. The iterative Bregman regularization has the advantage that, with $u^l$ as an approximation to the possible solution, additional information is available. The variational problem (18) can be interpreted as follows: search for a solution that matches the Poisson distributed data after applying K and simultaneously minimizes the residual of the Taylor approximation of $|\cdot|_{BV}$ around $u^l$. In the following we will see that this form of regularization does not change the position of gradients with respect to the last computed EM-TV solution $u^l$ but that an increase of intensities is permitted. This leads to a noticeable contrast enhancement.

For the derivation of a two-step iteration we consider the first order optimality condition of the variational problem (8) resp. (18). Due to convexity of the Bregman distance in the first argument we can determine the subdifferential of (18). Analogous to the derivation of the EM-TV iteration, the subdifferential of the log-likelihood functional can be expressed by the Fréchet derivative in (14). Hence, the optimality condition is given by

\[ 0 \in K^* 1 - K^*\!\left( \frac{f}{K u^{l+1}} \right) + \alpha \left( \partial |u^{l+1}|_{BV} - p^l \right) , \qquad p^l \in \partial |u^l|_{BV} . \tag{19} \]

For $u^0$ constant and $p^0 := 0 \in \partial |u^0|_{BV}$ this delivers a well-defined update of the iterates $p^l$,

\[ p^{l+1} := p^l - \frac{1}{\tilde\alpha} \left( 1 - \frac{K^*\!\left( \frac{f}{K u^{l+1}} \right)}{K^* 1} \right) \in \partial |u^{l+1}|_{BV} , \]

where $\tilde\alpha := \frac{\alpha}{K^* 1}$ results from an operator normalization. Analogous to EM-TV we can apply the idea of the nested iteration (16) in every refinement step, l = 1, 2, .... For the solution of (18), condition (19) yields a strategy consisting of an EM step $u^{l+1}_{k+\frac12}$ followed by solving the adapted weighted ROF problem

\[ u^{l+1}_{k+1} = \arg\min_{u \in BV(\Omega)} \left\{ \frac12 \int_\Omega \frac{(u - u^{l+1}_{k+\frac12})^2}{u^{l+1}_k} + \tilde\alpha \left( |u|_{BV} - \langle p^l, u \rangle \right) \right\} . \tag{20} \]

Following [8,9,10], we provide an opportunity to transfer the shift term $\langle p^l, u \rangle$ to the data-fidelity term. This approach facilitates the implementation of contrast enhancement with the Bregman distance via a slightly modified EM-TV algorithm. With the scaling $v^l := \tilde\alpha\, p^l$ and (19) we obtain the following update formula

\[ v^{l+1} = v^l - \left( 1 - \frac{K^*\!\left( \frac{f}{K u^{l+1}} \right)}{K^* 1} \right) , \qquad v^0 = 0 . \tag{21} \]

Using this scaled update we can rewrite the second step (20) as

\[ u^{l+1}_{k+1} = \arg\min_{u \in BV(\Omega)} \left\{ \frac12 \int_\Omega \frac{(u - u^{l+1}_{k+\frac12})^2 - 2\, u\, u^{l+1}_k v^l}{u^{l+1}_k} + \tilde\alpha\, |u|_{BV} \right\} . \]

Note that

\[ (u - u^{l+1}_{k+\frac12})^2 - 2\, u\, u^{l+1}_k v^l = \left( u - \left( u^{l+1}_{k+\frac12} + u^{l+1}_k v^l \right) \right)^2 - (u^{l+1}_k)^2 (v^l)^2 - 2\, u^{l+1}_{k+\frac12} u^{l+1}_k v^l \]

holds, where the last two terms are independent of u. Hence (20) simplifies to

\[ u^{l+1}_{k+1} = \arg\min_{u \in BV(\Omega)} \left\{ \frac12 \int_\Omega \frac{\left( u - \left( u^{l+1}_{k+\frac12} + u^{l+1}_k v^l \right) \right)^2}{u^{l+1}_k} + \tilde\alpha\, |u|_{BV} \right\} , \tag{22} \]

i.e. the second step (20) can be realized by a slight modification of the TV step introduced in (17). The efficient numerical implementation of the weighted ROF problem in Section 2.2, using the exact definition of TV and duality strategies, can be applied in complete analogy to (22). The update variable v in (21) is an error function with reference to the optimality condition of the unregularized log-likelihood functional (11). In every refinement step of the Bregman iteration, $v^{l+1}$ differs from $v^l$ by the current error in the optimality condition of (11). Within the TV step (22) one observes that an iterative regularization with the Bregman distance leads to contrast enhancement. Instead of fitting to the EM solution $u^{l+1}_{k+\frac12}$ in the weighted norm, we use a function in the fidelity term whose intensities are increased by the error function $v^l$. Resulting from the idea of adaptive regularization, $v^l$ is weighted by $u^{l+1}_k$, too.

As usual for iterative methods, the described reconstruction method by iterative regularization needs a stopping criterion. The iteration should stop at a solution that approximates the true image as well as possible. This is necessary to prevent the inverse scale space strategy from reintroducing too much noise. In the case of Gaussian noise, the discrepancy principle is a reasonable stopping criterion, i.e. the procedure stops when the residual $\|K u^l - f\|_2$ reaches the variance of the noise. In the case of Poisson noise, however, it makes sense to stop the Bregman iteration when the Kullback-Leibler distance between $K u^l$ and the given data f reaches the noise level. For synthetic data the noise level is given by the KL distance between $K u^*$ and f, where $u^*$ denotes the true, noise-free image. For experimental data it is necessary to find a suitable estimate of the noise level from the counts.
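The ingredients that distinguish the Bregman refinement from plain EM-TV, namely the update (21), the shifted fidelity target in (22) and the KL-based stopping rule, can be sketched as below. This is an illustration under assumptions rather than the authors' code: K is taken to be a matrix, the KL distance is written in its generalized form with the convention 0 log 0 = 0 and with the data f as first argument, and all function names are hypothetical.

```python
import numpy as np


def kl_distance(f, g):
    """Generalized KL distance sum(f*log(f/g) - f + g), with 0*log(0) = 0."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    log_term = np.zeros_like(f)
    pos = f > 0
    log_term[pos] = f[pos] * np.log(f[pos] / g[pos])
    return float(np.sum(log_term - f + g))


def update_v(v, K, f, u):
    """Bregman update (21): v <- v - (1 - K^*(f / Ku) / K^* 1)."""
    sens = K.T @ np.ones(K.shape[0])
    return v - (1.0 - (K.T @ (f / (K @ u))) / sens)


def shifted_tv_input(u_half, u_k, v):
    """Fidelity target of the modified TV step (22): u_half + u_k * v."""
    return u_half + u_k * v


def should_stop(K, u, f, noise_level):
    """Stop once the KL distance between Ku and the data f reaches the noise level."""
    return kl_distance(f, K @ u) <= noise_level
```

In a full Bregman-EM-TV loop, `shifted_tv_input` would replace the EM iterate as the input of the weighted ROF solver, `update_v` would be called once per refinement step, and `should_stop` would be checked after each refinement with a noise level estimated as described in the text.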

3

Results

In recent years revolutionary imaging techniques have been developed in light microscopy, with enormous importance for biological and material sciences and medicine. For a couple of decades the technology of light microscopy was considered to be exhausted, as the resolution is basically limited by Abbe's law for diffraction of light. With the development of stimulated emission depletion (STED) and 4Pi microscopy, resolutions are now achieved that are far beyond this diffraction barrier [23,24]. To get an impression of nanoscopic images blurred by different convolution kernels (PSFs), we refer to Figure 1. Figure 2 illustrates our techniques on a simple synthetic object. With EM-TV (see 2(c) and 2(g)) we get rid of noise and oscillations, but we are not able to separate the objects sufficiently. Using Bregman-EM-TV, a considerable improvement resulting from contrast enhancement can be achieved. This is underlined by the values of the KL distance for the different reconstructions. Figure 3(a)-(c) shows the protein Bruchpilot [25] and its EM-TV and Bregman-EM-TV reconstructions. In particular, the latter delivers well separated object segments and a high contrast level. In Figure 3(d)-(f) we illustrate our techniques by reconstructing Syntaxin [26], a membrane-integrated protein participating in exocytosis. Here, the contrast-enhancing property of Bregman-EM-TV is observable as well, compared to EM-TV. It is possible to preserve fine structures in the image.

Fig. 1. Synthetic data concerning different PSFs: (a) true image; (b) Gaussian PSF; (c) true image convolved with the Gaussian PSF, with Poisson noise; (d) PSF appearing in 4Pi microscopy; and (e) true image convolved with the 4Pi PSF, with Poisson noise

Fig. 2. Synthetic data: (a) raw data; (b) EM reconstruction, 20 its, KL-distance: 3.20; (c) EM-TV, α = 0.04, KL-distance: 2.43; (d) Bregman-EM-TV, α = 0.1, after 4 updates, KL-distance: 1.43; (e) true image; (f)-(h) horizontal slices of EM, EM-TV and Bregman-EM-TV compared to the true image slice

Fig. 3. Experimental data: (a) Protein Bruchpilot in active zones of neuromuscular synapses in larval Drosophila; (b) EM-TV; (c) Bregman-EM-TV; (d) Protein Syntaxin in cell membrane, fixed mammalian (PC12) cell; (e) EM-TV; and (f) Bregman-EM-TV

4

Conclusions

We have derived reconstruction methods for inverse problems with Poisson noise. In particular, we concentrated on deblurring problems in nanoscopic imaging, although the proposed methods can easily be adapted to other imaging tasks, e.g. medical imaging (PET, [27]). Motivated by statistical modeling, we developed a robust EM-TV algorithm that incorporates a-priori knowledge into the reconstruction process. By combining EM with simultaneous TV regularization we can reconstruct cartoon-like images with sharp edges, which provide a reasonable basis for quantitative investigations. To overcome the problem of contrast reduction, we extended the reconstruction to Bregman iterations and inverse scale space methods. We applied the proposed methods to optical nanoscopy and pointed out their improvements in comparison to standard reconstruction techniques.

Acknowledgments. This work has been supported by the German Federal Ministry of Education and Research through the project INVERS. C.B. acknowledges further support by the Deutsche Telekom Foundation, and M.B. by the German Science Foundation DFG through the project "Regularisierung mit Singulären Energien". The authors thank Dr. Katrin Willig and Dr. Andreas Schönle (MPI Biophysical Chemistry, Göttingen) for providing experimental data and stimulating discussions.

References

1. Bertero, M., Lantéri, H., Zanni, L.: Iterative image reconstruction: a point of view. In: Mathematical Methods in Biomedical Imaging and Intensity-Modulated Radiation Therapy (IMRT). CRM series, vol. 8 (2008)
2. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
3. Le, T., Chartrand, R., Asaki, T.J.: A variational approach to reconstructing images corrupted by Poisson noise. J. Math. Imaging Vision 27, 257–263 (2007)
4. Shepp, L.A., Vardi, Y.: Maximum likelihood reconstruction for emission tomography. IEEE Transactions on Medical Imaging 1(2), 113–122 (1982)
5. Richardson, W.H.: Bayesian-based iterative method of image restoration. J. Opt. Soc. Am. 62, 55–59 (1972)
6. Lucy, L.B.: An iterative technique for the rectification of observed distributions. The Astronomical Journal 79, 745–754 (1974)
7. Acar, R., Vogel, C.R.: Analysis of bounded variation penalty methods for ill-posed problems. Inverse Problems 10, 1217–1229 (1994)
8. Osher, S., Burger, M., Goldfarb, D., Xu, J., Yin, W.: An iterative regularization method for total variation based image restoration. Multiscale Modelling and Simulation 4, 460–489 (2005)
9. Burger, M., Gilboa, G., Osher, S., Xu, J.: Nonlinear inverse scale space methods. Commun. Math. Sci. 4(1), 179–212 (2006)
10. Burger, M., Frick, K., Osher, S., Scherzer, O.: Inverse total variation flow. SIAM Multiscale Modelling and Simulation 6(2), 366–395 (2007)
11. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum Likelihood from Incomplete Data via the EM Algorithm. J. of the Royal Statistical Society, B 39, 1–38 (1977)
12. Natterer, F., Wübbeling, F.: Mathematical methods in image reconstruction. SIAM Monographs on Mathematical Modeling and Computation (2001)
13. Resmerita, E., et al.: The expectation-maximization algorithm for ill-posed integral equations: a convergence analysis. Inverse Problems 23, 2575–2588 (2007)
14. Vardi, Y., Shepp, L.A., Kaufman, L.: A statistical model for positron emission tomography. J. of the American Statistical Association 80(389), 8–20 (1985)
15. Iusem, A.N.: Convergence analysis for a multiplicatively relaxed EM algorithm. Mathematical Methods in the Applied Sciences 14, 573–593 (1991)
16. Evans, L.C., Gariepy, R.F.: Measure theory and fine properties of functions. Studies in Advanced Mathematics. CRC Press, Boca Raton (1992)
17. Giusti, E.: Minimal surfaces and functions of bounded variation. Birkhäuser, Basel (1984)
18. Chambolle, A.: An algorithm for total variation minimization and applications. J. of Mathematical Imaging and Vision 20, 89–97 (2004)
19. Resmerita, E., Anderssen, S.: Joint additive Kullback-Leibler residual minimization and regularization for linear inverse problems. Math. Meth. Appl. Sci. 30, 1527–1544 (2007)
20. Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I. Grundlehren der mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 305. Springer, Heidelberg (1993)
21. Brune, C., Sawatzky, A., Wübbeling, F., Kösters, T., Burger, M.: EM-TV methods for inverse problems with Poisson noise (in preparation) (2009)
22. Bregman, L.M.: The relaxation method for finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comp. Math. and Math. Phys. 7, 200–217 (1967)
23. Klar, T.A., et al.: Fluorescence microscopy with diffraction resolution barrier broken by stimulated emission. PNAS 97, 8206–8210 (2000)
24. Hell, S., Schönle, A.: Nanoscale resolution in far-field fluorescence microscopy. In: Hawkes, P.W., Spence, J.C.H. (eds.) Science of Microscopy. Springer, Heidelberg (2006)
25. Kittel, J., et al.: Bruchpilot promotes active zone assembly, Ca2+ channel clustering, and vesicle release. Science 312, 1051–1054 (2006)
26. Willig, K.I., Harke, B., Medda, R., Hell, S.W.: STED microscopy with continuous wave beams. Nature Meth. 4(11), 915–918 (2007)
27. Sawatzky, A., Brune, C., Wübbeling, F., Kösters, T., Schäfers, K.: Accurate EM-TV algorithm in PET with low SNR. In: IEEE Nucl. Sci. Symp. (2008)

PDE-Driven Adaptive Morphology for Matrix Fields

Bernhard Burgeth, Michael Breuß, Luis Pizarro, and Joachim Weickert

Mathematical Image Analysis Group, Faculty of Mathematics and Computer Science, Saarland University, 66041 Saarbrücken, Germany
{burgeth,breuss,pizarro,weickert}@mia.uni-saarland.de
http://www.mia.uni-saarland.de

Abstract. Matrix fields are important in many applications since they are the adequate means to describe anisotropic behaviour in image processing models and physical measurements. A prominent example is diffusion tensor magnetic resonance imaging (DT-MRI), which is a medical imaging technique useful for analysing the fibre structure in the brain. Recently, morphological partial differential equations (PDEs) for dilation and erosion known for grey scale images have been extended to three-dimensional fields of symmetric positive definite matrices. In this article we propose a novel method to incorporate adaptivity into the matrix-valued, PDE-driven dilation process. The approach uses a structure tensor concept for matrix data to steer anisotropic morphological evolution in a way that enhances and completes line-like structures in matrix fields. Numerical experiments performed on synthetic and real-world data confirm the gap-closing and line-completing qualities of the proposed method.

1

Introduction

Initiated in the sixties by the pioneering research of Serra and Matheron on binary morphology [23, 31], this branch of image processing has developed into a rich field of research. Numerous monographs, e.g. [17, 24, 32, 33, 34], and proceedings, e.g. [16,18,22], bear witness to the variety in mathematical morphology. The building blocks of morphological operations are dilation and erosion. These are usually realised by algebraic set operations involving a probing set, a so-called structuring element, see e.g. [34] for details. An alternative approach to dilation is given [1] by the nonlinear partial differential equation (PDE)

\[ \partial_t u = \|\nabla u\| = \sqrt{ |\partial_x u|^2 + |\partial_y u|^2 } \tag{1} \]

with initial condition u(x, y, 0) = f(x, y). The equation mimics the dilation of a grey scale image f with respect to a ball-shaped structuring element of growing radius t. PDEs of this type, using a continuous size parameter t for the structuring element, give rise to continuous-scale morphology [1,2,6,29,35]. Equation (1) has been extended in two ways:

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 247–258, 2009. © Springer-Verlag Berlin Heidelberg 2009

Firstly, in [5] adaptivity has been incorporated by introducing a speed function β = β(u) into (1),

\[ \partial_t u = \beta(u) \cdot \|\nabla u\| \tag{2} \]

Earlier attempts towards adaptivity have been made in [20, 26], where a local switch between dilation and erosion with a nonadaptive structuring element leads to a so-called morphological shock filter, and in [21], which introduces morphological amoebae described in a set-theoretic framework.

Secondly, in [8] scalar continuous morphology has been extended to a PDE-driven morphology of matrix-valued images, matrix fields for short. Matrix fields have received increasing attention over recent years since they are the appropriate data type to describe anisotropy in models or measurements of physical quantities. For instance, diffusion tensor magnetic resonance imaging (DT-MRI) has become a valuable tool in medicine for in vivo diagnosis. It results in three-dimensional tensor fields that describe the diffusive properties of water molecules, and as such the structure of the tissue under examination.

The goal of this article is to introduce adaptivity into morphology for matrix fields. As it turns out, it is advantageous to start this generalisation from a scalar adaptive formulation for d-dimensional data u in the form of the PDE

\[ \partial_t u = \| M(u) \cdot \nabla u \| \tag{3} \]

with ∇u as a column vector and a data-dependent, symmetric, positive semidefinite d × d matrix M = M(u), rather than from (2). For example, for grey-value images (d = 2) one has $M = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$ and (3) turns into

\[ \partial_t u = \sqrt{ (a\, \partial_x u + b\, \partial_y u)^2 + (b\, \partial_x u + c\, \partial_y u)^2 } \tag{4} \]

An application of the mapping $(x, y)^\top \mapsto M\, (x, y)^\top$ transforms a sphere centered around the origin into an ellipse. So, in fact, (3) describes a dilation with an ellipsoidal structuring element. The matrix M must contain directional information of the evolving u, and thus it may be derived from the so-called structure tensor. The structure tensor, going back to [14, 27, 4], is a classic tool in image processing to extract directional information from an image. It is given by

\[ S_\rho(u(x)) := G_\rho * \left( \nabla u(x) \cdot (\nabla u(x))^\top \right) = \left( G_\rho * \left( \partial_{x_i} u(x) \cdot \partial_{x_j} u(x) \right) \right)_{i,j=1,\ldots,d} \tag{5} \]

Here $G_\rho *$ indicates a convolution with a Gaussian of standard deviation ρ; however, more general averaging procedures can be used. For more details the reader is referred to [3] and the literature cited there. We will make use of the extended structure tensor concept for matrix fields as proposed in [10]. There it was used to steer a coherence-enhancing diffusion process for matrix fields, an anisotropic filtering process that has been proposed for scalar and colour images in [36, 37].
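For a scalar grey-value image (d = 2), the structure tensor (5) can be sketched in a few lines. The code below is an illustration under assumptions, not the implementation used in the article: derivatives are approximated with `np.gradient`, the Gaussian is truncated at three standard deviations and applied separably with zero padding, and the names `gaussian_kernel`, `smooth` and `structure_tensor` are hypothetical.

```python
import numpy as np


def gaussian_kernel(rho):
    """Normalized 1D Gaussian kernel truncated at three standard deviations."""
    radius = int(np.ceil(3 * rho))
    t = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (t / rho) ** 2)
    return k / k.sum()


def smooth(a, rho):
    """Separable Gaussian smoothing (zero padding at the image border)."""
    k = gaussian_kernel(rho)
    for axis in range(a.ndim):
        a = np.apply_along_axis(np.convolve, axis, a, k, mode="same")
    return a


def structure_tensor(u, rho):
    """Return S_rho(u) as an array of shape (rows, cols, 2, 2), ordered (x, y)."""
    uy, ux = np.gradient(u.astype(float))  # axis 0 is y (rows), axis 1 is x (columns)
    S = np.empty(u.shape + (2, 2))
    S[..., 0, 0] = smooth(ux * ux, rho)
    S[..., 0, 1] = S[..., 1, 0] = smooth(ux * uy, rho)
    S[..., 1, 1] = smooth(uy * uy, rho)
    return S


# Hypothetical example: an image that varies only in the x-direction.
u = np.tile(np.arange(32, dtype=float), (32, 1))
S = structure_tensor(u, rho=2.0)
eigvals, eigvecs = np.linalg.eigh(S[16, 16])
dominant = eigvecs[:, -1]  # eigenvector belonging to the largest eigenvalue
```

Away from the image border, the eigenvector of the largest eigenvalue points in the direction of strongest grey-value variation, which is the kind of directional information a steering matrix M can be built from.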

In [38, 7, 13] Di Zenzo's approach [12] to construct a structure tensor for multi-channel images has been extended to matrix fields, yielding a standard structure tensor (using the notation of the forthcoming Section 2):

\[ J_\rho(U(x)) := \sum_{i,j=1}^{m} S_\rho(U_{i,j}(x)) \]

This construction has been refined to a customisable structure tensor in [30].

The article has the following structure: We briefly convey in Section 2 the basic notions of matrix analysis needed to establish a matrix-valued PDE for an adaptively steered morphological dilation process. This includes a short account of the construction of an extended structure tensor for matrix fields. In Section 3 we introduce the steering tensor that guides the dilation process adaptively. We explain how the numerical scheme of Rouy and Tourin is generalised to the matrix-valued setting in Section 4. In our experiments we compare adaptive and isotropic dilation with coherence-enhancing diffusion (CED) when applied to synthetic matrix fields and real DT-MRI data sets, and report on the results in Section 5. The remarks in Section 6 conclude this article.

2

Matrix Analysis and an Extended Structure Tensor Concept

This section contains the key definitions for the formulation of matrix-valued PDEs. For a more detailed exposition the reader is referred to [9]. A matrix field is considered as a mapping $U : \Omega \subset \mathbb{R}^d \to \mathrm{Sym}_m(\mathbb{R})$ from a d-dimensional image domain into the set of symmetric m × m matrices with real entries, $U(x) = (U_{p,q}(x))_{p,q=1,\ldots,m}$. The set of positive (semi-)definite matrices, denoted by $\mathrm{Sym}_m^{++}(\mathbb{R})$ (resp. $\mathrm{Sym}_m^{+}(\mathbb{R})$), consists of all symmetric matrices A with $\langle v, Av \rangle := v^\top A v > 0$ (resp. ≥ 0) for $v \in \mathbb{R}^m \setminus \{0\}$. This set is of special interest since DT-MRI produces data with this property. Note that at each point x the matrix U(x) of a field of symmetric matrices can be diagonalised, yielding $U(x) = V(x)^\top D(x) V(x)$, where V(x) is an orthogonal matrix and D(x) is a diagonal matrix. In the sequel we denote m × m diagonal matrices with entries $\lambda_1, \ldots, \lambda_m \in \mathbb{R}$ from left to right simply by $\mathrm{diag}(\lambda_i)$. The extension of a function $h : \mathbb{R} \to \mathbb{R}$ to $\mathrm{Sym}_m(\mathbb{R})$ is standard [19]: with a slight abuse of notation we set $h(U) := V^\top \mathrm{diag}(h(\lambda_1), \ldots, h(\lambda_m)) V \in \mathrm{Sym}_m^{+}(\mathbb{R})$, h now denoting a function acting on matrices as well. Specifying h(s) = |s|, s ∈ ℝ, as the absolute value function leads to the absolute value $|A| \in \mathrm{Sym}_m^{+}(\mathbb{R})$ of a matrix A.

It is natural to define the partial derivative for matrix fields componentwise:

\[ \overline{\partial}_\omega U = (\partial_\omega U_{p,q})_{p,q=1,\ldots,m} \tag{6} \]

where $\omega \in \{t, x_1, \ldots, x_d\}$, that is, $\overline{\partial}_\omega$ stands for a spatial or temporal derivative. Viewing a matrix as a tensor (of second order), its gradient would be a third order tensor according to the rules of differential geometry. However, we adopt a more operator-algebraic point of view by defining the generalised gradient ∇U(x) at a voxel $x = (x_1, \ldots, x_d)$ by

\[ \nabla U(x) := \left( \overline{\partial}_{x_1} U(x), \ldots, \overline{\partial}_{x_d} U(x) \right)^\top \tag{7} \]

250

B. Burgeth et al.

which is an element of (Symm (R))d , in close analogy tothe scalar setting where ∇u(x) ∈ Rd . For W ∈ (Symm (R))d we set |W |p := p |W1 |p + · · · + |Wd |p for 0 < p < +∞. It results in a positive semideﬁnite matrix from Sym+ m (R), the direct counterpart of a nonnegative real number as the length of a vector in Rd . There will be the need for a symmetric multiplication of symmetric matrices. We opt for the so-called Jordan product A •J B := 12 (AB + BA) . It produces a symmetric matrix, and it is commutative but neither associative nor distributive. Furthermore, for later use in numerical schemes we have to clarify the notion of maximum and minimum of two symmetric matrices A, B. In direct anaology with relations known to be valid for real numbers one deﬁnes [8]: max(A, B) =

1 1 (A + B + |A − B|) and min(A, B) = (A + B − |A − B|) (8) 2 2

where $|F|$ stands for the absolute value of the matrix $F$. With this at our disposal we formulate the matrix-valued counterpart of (3) as

$$\overline{\partial}_t U = |\overline{M}(U) \bullet \overline{\nabla} U|_2 \tag{9}$$

with an initial matrix field $F(x) = U(x, 0)$. Here $\overline{M}(U)$ denotes a symmetric $md \times md$-block matrix with $d^2$ blocks of size $m \times m$ that is multiplied block-wise with $\overline{\nabla} U$ employing the symmetrised product "$\bullet$". Note that $|\cdot|_2$ stands for the length of $\overline{M}(U) \bullet \overline{\nabla} U$ in the matrix-valued sense. The construction of $\overline{M}(U)$ is detailed in Section 3 and relies on the so-called full structure tensor.

The full structure tensor $S_L$ for matrix fields as defined in [10] reads

$$S_L(U) := G_\rho * \left( \overline{\nabla} U \cdot (\overline{\nabla} U)^\top \right) = \left( G_\rho * \left( \overline{\partial}_{x_i} U \cdot \overline{\partial}_{x_j} U \right) \right)_{i,j=1,\dots,d} \tag{10}$$

with $G_\rho *$ indicating a convolution with a Gaussian of standard deviation $\rho$. $S_L(U(x))$ is a symmetric $md \times md$-block matrix with $d^2$ blocks of size $m \times m$, $S_L(U(x)) \in \mathrm{Sym}_d(\mathrm{Sym}_m(\mathbb{R})) = \mathrm{Sym}_{md}(\mathbb{R})$. Typically for 3D medical DT-MRI data one has $d = 3$ and $m = 3$, yielding a $9 \times 9$-matrix $S_L$. It can be diagonalised as $S_L(U) = \sum_{k=1}^{md} \lambda_k w_k w_k^\top$ with real eigenvalues $\lambda_k$ (w.l.o.g. arranged in decreasing order) and an orthonormal basis $\{w_k\}_{k=1,\dots,md}$ of $\mathbb{R}^{md}$. In order to extract useful $d$-dimensional directional information, $S_L(U) \in \mathrm{Sym}_{md}(\mathbb{R})$ is reduced to a structure tensor $S(U) \in \mathrm{Sym}_d(\mathbb{R})$ in a generalised projection step [10] using the block operator matrix $\mathrm{Tr}_A := \mathrm{diag}(\mathrm{tr}_A, \dots, \mathrm{tr}_A)$ containing the trace operation. We set $\mathrm{Tr} := \mathrm{Tr}_{I_m}$, where $I_m$ denotes the $m \times m$ unit matrix. This operator matrix acts on elements of the space $(\mathrm{Sym}_m(\mathbb{R}))^d$ as well as on block matrices via formal block-wise matrix multiplication,

$$\begin{pmatrix} \mathrm{tr}_A & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \mathrm{tr}_A \end{pmatrix} \begin{pmatrix} M_{11} & \cdots & M_{1d} \\ \vdots & \ddots & \vdots \\ M_{d1} & \cdots & M_{dd} \end{pmatrix} = \begin{pmatrix} \mathrm{tr}_A(M_{11}) & \cdots & \mathrm{tr}_A(M_{1d}) \\ \vdots & \ddots & \vdots \\ \mathrm{tr}_A(M_{d1}) & \cdots & \mathrm{tr}_A(M_{dd}) \end{pmatrix}, \tag{11}$$

provided that the square blocks $M_{ij}$ have the same size as $A$. The projection that is conveyed by the reduction process condenses the directional information contained in $S_L(U)$; for a more detailed reasoning we must refer the reader to [10]


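for the sake of brevity. The reduction operation is accompanied by an extension operation: the $I_m$-extension is the mapping from $\mathrm{Sym}_d(\mathbb{R})$ to $\mathrm{Sym}_{md}(\mathbb{R})$ conveyed by the Kronecker product $\otimes$:

$$\begin{pmatrix} v_{11} & \cdots & v_{1d} \\ \vdots & \ddots & \vdots \\ v_{d1} & \cdots & v_{dd} \end{pmatrix} \longmapsto \begin{pmatrix} v_{11} & \cdots & v_{1d} \\ \vdots & \ddots & \vdots \\ v_{d1} & \cdots & v_{dd} \end{pmatrix} \otimes \begin{pmatrix} I_m & \cdots & I_m \\ \vdots & \ddots & \vdots \\ I_m & \cdots & I_m \end{pmatrix} := \begin{pmatrix} v_{11} I_m & \cdots & v_{1d} I_m \\ \vdots & \ddots & \vdots \\ v_{d1} I_m & \cdots & v_{dd} I_m \end{pmatrix} \tag{12}$$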

This resizing step renders a proper matrix-vector multiplication with the large generalised gradient $\overline{\nabla} U(x)$ possible. By specifying the matrix $A$ in (11) one may incorporate a priori knowledge into the direction estimation [10]. The research on these structure-tensor concepts was initiated by [38, 7]. The approaches to matrix field regularisation suggested in [11] are based on differential geometric considerations. Comprehensive survey articles on the analysis of matrix fields using various techniques can be found in [39].
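The reduction (11) and the extension (12) can be illustrated with a short NumPy sketch. The function names are hypothetical, and the weighted trace is taken to be $\mathrm{tr}_A(M) = \mathrm{trace}(AM)$, which is an assumption made here for illustration; for $A = I_m$ it is the ordinary trace used in $\mathrm{Tr}$.

```python
import numpy as np

def reduce_blocks(S_L, d, m, A=None):
    """Reduce an (m*d x m*d) block matrix to a d x d matrix, block by block.

    Assumption: tr_A(M) = trace(A @ M); A = None means A = I_m (plain trace).
    """
    if A is None:
        A = np.eye(m)
    blocks = S_L.reshape(d, m, d, m)              # blocks[i, :, j, :] is block M_ij
    return np.einsum("ipjq,qp->ij", blocks, A)    # sum_pq M_ij[p, q] * A[q, p]

def extend(V, m):
    """I_m-extension (12): replace each entry v_ij by the block v_ij * I_m."""
    return np.kron(V, np.eye(m))
```

Extending a $d \times d$ matrix and reducing it again with $A = I_m$ returns $m$ times the original matrix, since each block $v_{ij} I_m$ has trace $m\,v_{ij}$.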

3

Steering Matrix $\overline{M}(U)$ for Matrix Fields

With these notions we are in a position to propose the steering matrix $\overline{M}$ in the adaptive dilation process for matrix fields. We proceed in four steps:

1. The matrix field $\mathbb{R}^d \ni x \mapsto U(x)$ provides us with a module field of generalised gradients $\overline{\nabla} U(x)$ from which we construct the generalised structure tensor $S_L(U(x))$, possibly with a certain integration scale $\rho$. This step corresponds exactly to the scalar case.

2. We infer $d$-dimensional directional information by reducing $S_L(U(x))$ with $\mathrm{tr}_A$ by means of the block operator matrix $\mathrm{Tr}_A$, leading to a symmetric $d \times d$-matrix $S$ (for example, $S = J_\rho$ if $A = I_m$):

$$S(x) := \mathrm{Tr}_A\, S_L(U(x)) \tag{13}$$

3. The symmetric $d \times d$-matrix $S$ is spectrally decomposed, and the following mapping is applied:

$$H : \mathbb{R}^d_+ \to \mathbb{R}^d, \qquad (\lambda_1, \dots, \lambda_d) \longmapsto \frac{c}{\lambda_1 + \cdots + \lambda_d} \left( \lambda_d, \lambda_{d-1}, \dots, \frac{K}{c} \cdot \lambda_1 \right) \tag{14}$$

with constants $c, K > 0$. $H$ applied to $S$ yields the steering matrix $M$,

$$M := H(S) \tag{15}$$

Observe that the ellipsoid associated with the matrix $M$ is flipped compared with that of $S$ and, depending on the choice of $K$, more eccentric than the one accompanying $S$.

4. Finally we enlarge the $d \times d$-matrix $M$ to an $md \times md$-matrix $\overline{M}$ by the extension operation:

$$\overline{M} = M \otimes \begin{pmatrix} I_m & \cdots & I_m \\ \vdots & \ddots & \vdots \\ I_m & \cdots & I_m \end{pmatrix} \tag{16}$$
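Steps 3 and 4 can be sketched as follows. This is an illustrative reading rather than the authors' code: the function name is hypothetical, and the last slot of (14) is assumed to be $(K/c)\cdot\lambda_1$, so that after the common prefactor the dominant direction is scaled by $K$.

```python
import numpy as np

def steering_matrix(S, c, K, m=None):
    """Apply the eigenvalue map H of (14) to a symmetric d x d matrix S.

    Assumption: H maps (l1, ..., ld) to c / sum(l) * (ld, ..., l2, (K / c) * l1),
    where l1 >= ... >= ld and the k-th new value is attached to the k-th eigenvector.
    If m is given, the result is extended to an (m*d x m*d) matrix as in (16).
    """
    w, V = np.linalg.eigh(S)        # ascending eigenvalues, eigenvectors as columns
    lam = w[::-1]                   # l1 >= ... >= ld
    V = V[:, ::-1]                  # reorder the eigenvectors accordingly
    new = lam[::-1].copy()          # (ld, ..., l1): the "flip"
    new[-1] *= K / c                # last slot carries the extra factor K / c
    new *= c / lam.sum()            # common prefactor c / (l1 + ... + ld)
    M = (V * new) @ V.T
    return M if m is None else np.kron(M, np.eye(m))
```

For $S = \mathrm{diag}(2, 1)$, $c = 1$ and $K = 3$ this gives $M = \mathrm{diag}(1/3, 2)$: the direction that dominated $S$ now carries the smallest value, which is the flip described in step 3.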

4

Matrix-Valued Numerical Schemes

In the context of PDE-based mathematical morphology, first-order finite difference methods such as the Osher-Sethian scheme [25] and the Rouy-Tourin method [28] are reasonable choices for solving the scalar PDE (4). We choose the latter in our experiments. The variant we present, for the sake of brevity in its two-dimensional form, reads

$$u^{n+1}_{i,j} = u^n_{i,j} + \tau \left( \max\left( \frac{1}{h_x} \max\left(-D^x_- u^n_{i,j}, 0\right), \frac{1}{h_x} \max\left(D^x_+ u^n_{i,j}, 0\right) \right)^2 + \max\left( \frac{1}{h_y} \max\left(-D^y_- u^n_{i,j}, 0\right), \frac{1}{h_y} \max\left(D^y_+ u^n_{i,j}, 0\right) \right)^2 \right)^{1/2} \tag{17}$$

In the latter formulation we employ the notation $u^n_{ij}$ for the grey value of the image $u$ at the pixel centred at $(ih_x, jh_y) \in \mathbb{R}^2$ at the time-level $n\tau$ of the

Fig. 1. (a) Top left: 2D slice of original 3D matrix field. (b) Top right: Adaptive dilation of the original data with K = 25, ρ = 1 after t = 0.3. (c) Bottom left: Standard PDE-based dilation mimicking a ball-shaped structuring element after t = 1. (d) Bottom right: CED-filtering with ρ = 4 after t = 10.


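evolution. Additionally we use standard abbreviations for forward and backward difference operators, i.e., $D^x_+ u^n_{i,j} := u^n_{i+1,j} - u^n_{i,j}$ and $D^x_- u^n_{i,j} := u^n_{i,j} - u^n_{i-1,j}$, and spatial grid sizes $h_x$, $h_y$. This scheme approximates, in the pixel $(ih_x, jh_y)$,

$$u_x \approx \max\left( \frac{1}{h_x} \max\left(-D^x_- u^n_{i,j}, 0\right), \frac{1}{h_x} \max\left(D^x_+ u^n_{i,j}, 0\right) \right) \tag{18}$$

$$u_y \approx \max\left( \frac{1}{h_y} \max\left(-D^y_- u^n_{i,j}, 0\right), \frac{1}{h_y} \max\left(D^y_+ u^n_{i,j}, 0\right) \right) \tag{19}$$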
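One explicit time step of the scalar scheme (17) might look as follows in NumPy. This is a sketch of the isotropic scalar case only; the function name and the boundary handling (replicated border values) are assumptions made for illustration.

```python
import numpy as np

def rouy_tourin_step(u, tau, hx=1.0, hy=1.0):
    """One explicit Rouy-Tourin dilation step (17) on a 2D grey-value image.

    Axis 0 is the x-index i, axis 1 is the y-index j.
    Assumption: border pixels are replicated (reflecting boundary).
    """
    up = np.pad(u, 1, mode="edge")
    dxp = up[2:, 1:-1] - u          # D^x_+ u = u[i+1, j] - u[i, j]
    dxm = u - up[:-2, 1:-1]         # D^x_- u = u[i, j] - u[i-1, j]
    dyp = up[1:-1, 2:] - u          # D^y_+ u
    dym = u - up[1:-1, :-2]         # D^y_- u
    ux = np.maximum(np.maximum(-dxm, 0.0), np.maximum(dxp, 0.0)) / hx   # (18)
    uy = np.maximum(np.maximum(-dym, 0.0), np.maximum(dyp, 0.0)) / hy   # (19)
    return u + tau * np.sqrt(ux**2 + uy**2)
```

Applied to a single bright pixel, the step leaves the maximum untouched and raises its four direct neighbours by $\tau$ times the one-sided slope, as expected for a dilation.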

Using these approximations, we modify the original Rouy-Tourin scheme (17) in an obvious manner to obtain a numerical scheme for the adaptive version of the PDE-based dilation (3). The extension to higher dimensions poses no problem. Since linear combinations and elementary functions such as the square, square root or absolute value function for matrix fields are now at our disposal, it is straightforward to define one-sided differences in $x$-direction for 2D matrix fields of $m \times m$-matrices:

$$D^x_+ U^n(i, j) := U^n((i + 1)h_x, jh_y) - U^n(ih_x, jh_y) \in \mathrm{Sym}_m(\mathbb{R}) \tag{20}$$

$$D^x_- U^n(i, j) := U^n(ih_x, jh_y) - U^n((i - 1)h_x, jh_y) \in \mathrm{Sym}_m(\mathbb{R}) \tag{21}$$

Fig. 2. (a) Left: 2D slice of 3D DT-MRI data set. (b) Right: Adaptive dilation of the original data with K = 10, ρ = 1, t = 0.5.


In order to avoid confusion with the subscript notation for matrix components we used the notation $U(i, j)$ to indicate the (matrix-) value of the matrix field evaluated at the voxel centred at $(ih_x, jh_y) \in \mathbb{R}^2$. The $y$-direction (and $z$-direction in 3D) is treated accordingly. The notion of supremum and infimum of two matrices, as needed in a matrix variant of Rouy-Tourin, has been provided by (8). Having these generalisations at our disposal, a modified, adaptive version of the Rouy-Tourin scheme is now available in the setting of matrix fields simply by replacing grey values $u^n_{ij}$ by matrices $U^n(i, j)$.
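To make the replacement of grey values by matrices concrete, here is a minimal sketch of one matrix-valued Rouy-Tourin step. It covers only the isotropic case (no steering matrix), and the function names, array layout and replicated-border boundary handling are assumptions for illustration. The matrix maximum follows (8); square and square root are taken in the matrix sense.

```python
import numpy as np

def sym_fun(A, h):
    """Apply h to the eigenvalues of a stack of symmetric matrices (..., m, m)."""
    w, V = np.linalg.eigh(A)
    return (V * h(w)[..., None, :]) @ np.swapaxes(V, -1, -2)

def mat_max(A, B):
    """max(A, B) = (A + B + |A - B|) / 2, cf. (8)."""
    return 0.5 * (A + B + sym_fun(A - B, np.abs))

def mat_pos(A):
    """max(A, 0) = (A + |A|) / 2."""
    return 0.5 * (A + sym_fun(A, np.abs))

def matrix_rouy_tourin_step(U, tau, hx=1.0, hy=1.0):
    """One isotropic dilation step for a field U of shape (nx, ny, m, m)."""
    Up = np.pad(U, ((1, 1), (1, 1), (0, 0), (0, 0)), mode="edge")
    dxp = Up[2:, 1:-1] - U          # one-sided differences, cf. (20)
    dxm = U - Up[:-2, 1:-1]         # cf. (21)
    dyp = Up[1:-1, 2:] - U
    dym = U - Up[1:-1, :-2]
    gx = mat_max(mat_pos(-dxm), mat_pos(dxp)) / hx
    gy = mat_max(mat_pos(-dym), mat_pos(dyp)) / hy
    sq = gx @ gx + gy @ gy          # matrix squares
    # Matrix square root; eigenvalues are clipped at 0 to absorb round-off.
    root = sym_fun(sq, lambda w: np.sqrt(np.clip(w, 0.0, None)))
    return U + tau * root
```

For a field of the form $u(x)\,I_m$ the step reproduces the scalar scheme in every diagonal entry, which gives a simple consistency check.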

5

Experiments

The matrix data are visualised as an ellipsoid in each voxel via the level sets of the quadratic form $\{v \in \mathbb{R}^3 : v^\top U^{-2}(i, j)\, v = \mathrm{const.}\}$ associated with the matrix

Fig. 3. (a) Top left: Enlarged section of the original data of Figure 2 showing the genu area. (b) Top right: Adaptive dilation of the original data with K = 10, ρ = 1, t = 0.5. (c) Bottom left: Standard PDE-based dilation mimicking a ball-shaped structuring element with t = 0.5. (d) Bottom right: CED-filtering with ρ = 1 after t = 0.5.


Fig. 4. (a) Top left: Enlarged section of the original data of Figure 2 showing the splenium area. (b) Top right: Adaptive dilation of the original data with K = 10, ρ = 1, t = 0.5. (c) Bottom left: Standard PDE-based dilation mimicking a ball-shaped structuring element with t = 0.5. (d) Bottom right: CED-filtering with ρ = 1 after t = 0.5.

$U(i, j) \in \mathrm{Sym}^{+}_3(\mathbb{R})$ representing the matrix field at voxel $(ih_x, jh_y)$. By using $U^{-2}$ the lengths of the semi-axes of the ellipsoid correspond directly to the three eigenvalues of the matrix. Changing the constant const. amounts to a mere scaling of the ellipsoids. Note that only positive definite matrices produce ellipsoids as level sets of their quadratic forms.

In all our experiments we compare the results of the proposed matrix-valued adaptive dilation with the isotropic dilation [8], and with the matrix-valued coherence-enhancing diffusion from [10]. For the explicit numerical schemes we used a time step size of 0.1, grid size $h_x = h_y = 1$, and $c = 0.01 \cdot K$ in (14).

Figure 1 shows a synthetic data set of size 32 × 32 representing an interrupted diagonal stripe built from cigar-shaped ellipsoids of equal size. All methods succeed to some degree in filling the gaps. In the case of the proposed adaptive dilation the gap is filled almost completely with tensors comparable in size with the original ones, while the width of the stripe is not altered at all. However, the numerical scheme has a slight bias towards the directions of the coordinate system, resulting in mild artefacts. Standard dilation fills the gap basically as a side effect of the isotropic dilation process, which also leads to a considerable widening of the ribbon-like structure. CED for matrix fields does indeed produce small cigar-shaped ellipsoids at the location of the gap. But the process is considerably slower than any of the dilation processes


and the neighbouring ellipsoids become smaller due to the property of mass conservation. Additionally an undesirable widening of the stripe is observed. We also tested the proposed method on a real DT-MRI data set of a human head consisting of a 128 × 128 × 38-field of positive definite matrices. Figure 2 shows the lateral ventricles in a 40 × 55 2D section before and after applying adaptive dilation with speed parameter K = 10, integration scale ρ = 1 and stopping time t = 0.5. For a better comparison we display two enlarged regions of interest in Figures 3 and 4, namely the genu and the splenium areas, respectively. We observe that adaptive dilation preserves the shape of the ventricles better than the isotropic dilation, while slightly enhancing the directional structure of the fibre tracts surrounding the ventricles. Due to measurement errors the fibre tracts are interrupted in the original data, Figures 3(a) and 4(a). These holes in the anisotropic regions (splenium) are quickly filled by the adaptive dilation, while CED-filtering takes much longer to do so.

6

Conclusion

In this article we have presented a novel method for an adaptive, PDE-based dilation process in the setting of matrix fields. The evolution governed by a matrix-valued PDE is guided by a steering tensor, the construction of which relies on an extended structure tensor concept for matrix fields. A matrix-valued extension of the Rouy-Tourin scheme that allows directional information to be included is employed to solve the novel PDE. Experiments on positive semidefinite DT-MRI and synthetic data confirm that the novel adaptive dilation process displays line-enhancing and gap-closing qualities, and as such it is superior to standard isotropic dilation, which extends structures in all directions. It is also a valuable alternative in terms of quality and speed to coherence-enhancing diffusion filtering for matrix fields, an anisotropic process which aims at enhancing flow-like structures as well but may suffer from dissipative effects. Future research will concentrate on improving the numerical realisation of our adaptive dilation.

Acknowledgement The ﬁnancial support of the German Academic Exchange Service (DAAD) for the third author is gratefully acknowledged.

References

1. Alvarez, L., Guichard, F., Lions, P.-L., Morel, J.-M.: Axioms and fundamental equations in image processing. Archive for Rational Mechanics and Analysis 123, 199–257 (1993)
2. Arehart, A.B., Vincent, L., Kimia, B.B.: Mathematical morphology: The Hamilton–Jacobi connection. In: Proc. Fourth International Conference on Computer Vision, Berlin, pp. 215–219. IEEE Computer Society Press, Los Alamitos (1993)


3. Bigün, J.: Vision with Direction. Springer, Berlin (2006)
4. Bigün, J., Granlund, G.H., Wiklund, J.: Multidimensional orientation estimation with applications to texture analysis and optical flow. IEEE Transactions on Pattern Analysis and Machine Intelligence 13(8), 775–790 (1991)
5. Breuß, M., Burgeth, B., Weickert, J.: Anisotropic continuous-scale morphology. In: Martí, J., Benedí, J.M., Mendonça, A.M., Serrat, J. (eds.) IbPRIA 2007. LNCS, vol. 4478, pp. 515–522. Springer, Heidelberg (2007)
6. Brockett, R.W., Maragos, P.: Evolution equations for continuous-scale morphological filtering. IEEE Transactions on Signal Processing 42, 3377–3386 (1994)
7. Brox, T., Weickert, J., Burgeth, B., Mrázek, P.: Nonlinear structure tensors. Image and Vision Computing 24(1), 41–55 (2006)
8. Burgeth, B., Bruhn, A., Didas, S., Weickert, J., Welk, M.: Morphology for tensor data: Ordering versus PDE-based approach. Image and Vision Computing 25(4), 496–511 (2007)
9. Burgeth, B., Didas, S., Florack, L., Weickert, J.: A generic approach to diffusion filtering of matrix-fields. Computing 81, 179–197 (2007)
10. Burgeth, B., Didas, S., Weickert, J.: A general structure tensor concept and coherence-enhancing diffusion filtering for matrix fields. Technical Report 197, Department of Mathematics, Saarland University, Saarbrücken, Germany (July 2007); to appear in: Laidlaw, D., Weickert, J. (eds.): Visualization and Processing of Tensor Fields. Springer, Heidelberg (2009)
11. Chefd’Hotel, C., Tschumperlé, D., Deriche, R., Faugeras, O.: Constrained flows of matrix-valued functions: Application to diffusion tensor regularization. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2350, pp. 251–265. Springer, Heidelberg (2002)
12. Di Zenzo, S.: A note on the gradient of a multi-image. Computer Vision, Graphics and Image Processing 33, 116–125 (1986)
13. Feddern, C., Weickert, J., Burgeth, B., Welk, M.: Curvature-driven PDE methods for matrix-valued images. International Journal of Computer Vision 69(1), 91–103 (2006)
14. Förstner, W., Gülch, E.: A fast operator for detection and precise location of distinct points, corners and centres of circular features. In: Proc. ISPRS Intercommission Conference on Fast Processing of Photogrammetric Data, Interlaken, Switzerland, June 1987, pp. 281–305 (1987)
15. Goutsias, J., Heijmans, H.J.A.M., Sivakumar, K.: Morphological operators for image sequences. Computer Vision and Image Understanding 62, 326–346 (1995)
16. Goutsias, J., Vincent, L., Bloomberg, D.S. (eds.): Mathematical Morphology and its Applications to Image and Signal Processing. Computational Imaging and Vision, vol. 18. Kluwer, Dordrecht (2000)
17. Heijmans, H.J.A.M.: Morphological Image Operators. Academic Press, Boston (1994)
18. Heijmans, H.J.A.M., Roerdink, J.B.T.M. (eds.): Mathematical Morphology and its Applications to Image and Signal Processing. Computational Imaging and Vision, vol. 12. Kluwer, Dordrecht (1998)
19. Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, Cambridge (1990)
20. Kramer, H.P., Bruckner, J.B.: Iterations of a non-linear transformation for enhancement of digital images. Pattern Recognition 7, 53–58 (1975)
21. Lerallut, R., Decencière, E., Meyer, F.: Image filtering using morphological amoebas. Image and Vision Computing 25(4), 395–404 (2007)


22. Louverdis, G., Vardavoulia, M.I., Andreadis, I., Tsalides, P.: A new approach to morphological color image processing. Pattern Recognition 35, 1733–1741 (2002)
23. Matheron, G.: Eléments pour une théorie des milieux poreux. Masson, Paris (1967)
24. Matheron, G.: Random Sets and Integral Geometry. Wiley, New York (1975)
25. Osher, S., Fedkiw, R.P.: Level Set Methods and Dynamic Implicit Surfaces. Applied Mathematical Sciences, vol. 153. Springer, New York (2002)
26. Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton–Jacobi formulations. Journal of Computational Physics 79, 12–49 (1988)
27. Rao, A.R., Schunck, B.G.: Computing oriented texture fields. CVGIP: Graphical Models and Image Processing 53, 157–185 (1991)
28. Rouy, E., Tourin, A.: A viscosity solutions approach to shape-from-shading. SIAM Journal on Numerical Analysis 29, 867–884 (1992)
29. Sapiro, G., Kimmel, R., Shaked, D., Kimia, B.B., Bruckstein, A.M.: Implementing continuous-scale morphology via curve evolution. Pattern Recognition 26, 1363–1372 (1993)
30. Schultz, T., Burgeth, B., Weickert, J.: Flexible segmentation and smoothing of DT-MRI fields through a customizable structure tensor. In: Bebis, G., Boyle, R., Parvin, B., Koracin, D., Remagnino, P., Nefian, A., Meenakshisundaram, G., Pascucci, V., Zara, J., Molineros, J., Theisel, H., Malzbender, T. (eds.) ISVC 2006. LNCS, vol. 4291, pp. 455–464. Springer, Heidelberg (2006)
31. Serra, J.: Echantillonnage et estimation des phénomènes de transition minier. PhD thesis, University of Nancy, France (1967)
32. Serra, J.: Image Analysis and Mathematical Morphology, vol. 1. Academic Press, London (1982)
33. Serra, J.: Image Analysis and Mathematical Morphology, vol. 2. Academic Press, London (1988)
34. Soille, P.: Morphological Image Analysis, 2nd edn. Springer, Berlin (2003)
35. van den Boomgaard, R.: Mathematical Morphology: Extensions Towards Computer Vision. PhD thesis, University of Amsterdam, The Netherlands (1992)
36. Weickert, J.: Coherence-enhancing diffusion of colour images. In: Sanfeliu, A., Villanueva, J.J., Vitrià, J. (eds.) Proc. Seventh National Symposium on Pattern Recognition and Image Analysis, Barcelona, Spain, April 1997, vol. 1, pp. 239–244 (1997)
37. Weickert, J.: Coherence-enhancing diffusion filtering. International Journal of Computer Vision 31(2/3), 111–127 (1999)
38. Weickert, J., Brox, T.: Diffusion and regularization of vector- and matrix-valued images. In: Nashed, M.Z., Scherzer, O. (eds.) Inverse Problems, Image Analysis, and Medical Imaging. Contemporary Mathematics, vol. 313, pp. 251–268. AMS, Providence (2002)
39. Weickert, J., Hagen, H. (eds.): Visualization and Processing of Tensor Fields. Springer, Berlin (2006)

On Semi-implicit Splitting Schemes for the Beltrami Color Flow

Lorina Dascal¹, Guy Rosman¹, Xue-Cheng Tai²,³, and Ron Kimmel¹

¹ Department of Computer Science, Technion – Israel Institute of Technology, 32000, Haifa, Israel {lorina,rosman,ron}@cs.technion.ac.il
² Division of Mathematical Sciences, SPMS, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore [email protected]
³ Department of Mathematics, University of Bergen, Johannes Brunsgate 12, 5007, Bergen, Norway [email protected]

Abstract. The Beltrami flow is an efficient non-linear filter that was shown to be effective for color image processing. The corresponding anisotropic diffusion operator strongly couples the spectral components. Usually, this flow is implemented by explicit schemes that are stable only for small time steps and therefore require many iterations. In this paper we introduce a semi-implicit scheme based on the locally one-dimensional (LOD) and additive operator splitting (AOS) schemes for implementing the anisotropic Beltrami operator. The mixed spatial derivatives are treated explicitly, while the non-mixed derivatives are approximated in a semi-implicit manner. Numerical experiments demonstrate the stability of the proposed scheme. Accuracy and efficiency of the splitting schemes are tested in applications such as scale-space analysis and denoising. In order to further accelerate the convergence of the numerical scheme, the reduced rank extrapolation (RRE) vector extrapolation technique is employed.

1

Introduction

Nonlinear diffusion filters based on partial differential equations (PDEs) have been extensively used in the last decade for different tasks in image processing. Their efficient implementation requires designing numerical schemes in which the issues of accuracy, stability, and computational cost all play important roles. The Beltrami image flow is an example of a non-linear filter that is efficient for color image processing. It treats the image as a 2-D manifold embedded in a hybrid spatial-feature space. Minimization of the image surface area yields the Beltrami flow. The corresponding diffusion operator is anisotropic and strongly couples the spectral components. Due to its anisotropy and non-separability, so far there is neither an efficient implicit nor an operator-splitting-based numerical scheme for the partial differential equation that describes the Beltrami flow in color.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 259–270, 2009. © Springer-Verlag Berlin Heidelberg 2009


Usual discretizations of this filter are based on explicit schemes, which limit the time step and therefore result in a large number of iterations. In [1] an acceleration technique based on the reduced rank extrapolation (RRE) algorithm [2, 3] was proposed in order to speed up the slow convergence of the explicit scheme. As an alternative to the explicit scheme, an approximation using the short time kernel of the Beltrami operator was suggested in [4]. Although unconditionally stable, this method is still computationally demanding, since computing the kernel involves geodesic distance computation around each pixel. The bilateral filter, which can be shown to be a Euclidean approximation of the Beltrami kernel, was studied in different contexts (see [5], [6], [7], [8], [9], [10]). Recently, a related filter, the nonlocal means filter, was proposed in [11] and shown to be useful in denoising gray-scale and color images.

In this paper we propose to approximate the system of nonlinear coupled equations given by the Beltrami flow by a semi-implicit finite difference scheme based on operator splitting. Additive operator splitting (AOS) schemes were first developed for (nonlinear elliptic/parabolic) monotone equations and Navier-Stokes equations [12, 13]. In image processing applications, the AOS scheme was found to be an efficient way of approximating the Perona-Malik filter [14], especially if symmetry in scale-space is required. The AOS scheme is first order in time, semi-implicit, and unconditionally stable with respect to its time step [13, 14]. In the early 1950s (see [15]) the alternating-direction implicit (ADI) method was introduced, and in [16] the LOD (locally one-dimensional) splitting method was proposed. The LOD scheme and other multiplicative splitting methods were employed in the context of nonlinear diffusion image filtering in [17]. We stress that the main characteristic of this class of equations, which allows splitting, is local isotropy.

However, in the case of the anisotropic Beltrami operator, the main difficulty in splitting stems from the presence of the mixed derivatives. To overcome this problem, we suggest constructing the following semi-implicit scheme: the spatial mixed derivatives are discretized explicitly at the current time step $n\Delta t$, while the non-mixed derivatives are approximated using an average of two time levels, $n\Delta t$ and $(n + 1)\Delta t$ (Crank-Nicolson scheme). As our equations are nonlinear, a stability proof of the corresponding finite difference scheme is a non-trivial task. We provide numerical experiments which indicate that the LOD and the AOS splitting schemes for the nonlinear Beltrami color filter are stable for a wide range of time steps. We demonstrate the efficiency and stability of the splitting in applications such as Beltrami-based scale space and Beltrami-based denoising.

In order to further expedite the LOD/AOS splitting schemes, we show how to speed up their convergence by using the RRE (reduced rank extrapolation) technique. The RRE method was introduced by Mešina and Eddy [2, 3] to speed up the convergence of general sequences of vectors without explicit knowledge of the sequence generator. This technique was applied in [1] in order to speed up the slow convergence of the standard explicit scheme for the Beltrami color flow. In this paper we show that in applications such as scale-space and denoising of color images, the semi-implicit LOD/AOS schemes can also be accelerated using the RRE technique.


This paper is organized as follows: In Section 2 we brieﬂy summarize the Beltrami framework. In Section 3 we brieﬂy review general semi-implicit splitting operator schemes. In Section 4 we propose a semi-implicit splitting scheme for the anisotropic Beltrami operator, based on the LOD/AOS schemes. In Section 5 we demonstrate the eﬃciency and stability of the LOD/AOS splitting schemes for Beltrami-based scale-space and Beltrami-based denoising. Furthermore, we propose to accelerate the LOD/AOS schemes using the RRE technique. Section 6 concludes the paper.

2

The Beltrami Framework

Let us briefly review the Beltrami framework for non-linear diffusion in computer vision [18, 19, 20, 21]. We represent images as embedding maps of a Riemannian manifold in a higher dimensional space. We denote the map by $U : \Sigma \to M$, where $\Sigma$ is a two-dimensional surface, with $(\sigma^1, \sigma^2)$ denoting coordinates on it. $M$ is the spatial-feature manifold, embedded in $\mathbb{R}^{d+2}$, where $d$ is the number of image channels. For example, a gray-level image can be represented as a 2D surface embedded in $\mathbb{R}^3$. The map $U$ in this case is $U(\sigma^1, \sigma^2) = (\sigma^1, \sigma^2, I(\sigma^1, \sigma^2))$, where $I$ is the image intensity. For color images, $U$ is given by $U(\sigma^1, \sigma^2) = (\sigma^1, \sigma^2, I^1(\sigma^1, \sigma^2), I^2(\sigma^1, \sigma^2), I^3(\sigma^1, \sigma^2))$, where $I^1, I^2, I^3$ are the three components of the color vector.

Next, we choose a Riemannian metric on this surface, $g$, with elements denoted by $g_{ij}$. The canonical choice of coordinates in image processing is Cartesian (we denote them here by $x_1$ and $x_2$). For such a choice, which we follow in the rest of the paper, we identify $\sigma^1 = x_1$ and $\sigma^2 = x_2$. In this case, $\sigma^1$ and $\sigma^2$ are the image coordinates. We denote the elements of the inverse of the metric by superscripts, $g^{ij}$, and the determinant by $g = \det(g_{ij})$. Once images are defined as embeddings of Riemannian manifolds, it is natural to look for a measure on this space of embedding maps. Denote by $(\Sigma, g)$ the image manifold and its metric, and by $(M, h)$ the space-feature manifold and its metric. Then, the functional $S[U]$ assigns a real number to a map $U : \Sigma \to M$,

$$S[U, g_{ij}, h_{ab}] = \int d^s\sigma \, \sqrt{g}\, \|dU\|^2_{g,h}, \tag{1}$$

where $s$ is the dimension of $\Sigma$, $g$ is the determinant of the image metric, and the range of indices is $i, j = 1, 2, \dots, \dim(\Sigma)$ and $a, b = 1, 2, \dots, \dim(M)$. The integrand $\|dU\|^2_{g,h}$ is expressed in a local coordinate system by $\|dU\|^2_{g,h} = (\partial_{x_i} U^a)\, g^{ij}\, (\partial_{x_j} U^b)\, h_{ab}$. This functional, for $\dim(\Sigma) = 2$ and $h_{ab} = \delta_{ab}$, was first proposed by Polyakov [22] in the context of high energy physics, in the theory known as string theory. The elements of the induced metric for color images with Cartesian color coordinates are

$$G = (g_{ij}) = \begin{pmatrix} 1 + \beta^2 \sum_{a=1}^{3} (U^a_{x_1})^2 & \beta^2 \sum_{a=1}^{3} U^a_{x_1} U^a_{x_2} \\ \beta^2 \sum_{a=1}^{3} U^a_{x_1} U^a_{x_2} & 1 + \beta^2 \sum_{a=1}^{3} (U^a_{x_2})^2 \end{pmatrix}, \tag{2}$$


where a subscript of $U$ denotes a partial derivative and the parameter $\beta > 0$ determines the ratio between the spatial and spectral (color) distances. Using standard methods in calculus of variations, the Euler-Lagrange equations with respect to the embedding (assuming Euclidean embedding space) are

$$0 = -\frac{1}{\sqrt{g}}\, h^{ab} \frac{\delta S}{\delta U^b} = \underbrace{\frac{1}{\sqrt{g}}\, \mathrm{div}\,(D \nabla U^a)}_{\Delta_g U^a}, \tag{3}$$

where the diffusion matrix is $D = \sqrt{g}\, G^{-1}$. Note that we can write

$$\mathrm{div}(D \nabla U) = \sum_{q,r=1}^{2} \partial_{x_q} (d_{qr}\, \partial_{x_r} U).$$

The operator that acts on $U$ is the natural generalization of the Laplacian from flat spaces to manifolds. It is called the Laplace-Beltrami operator, and denoted by $\Delta_g$. The parameter $\beta$, in the elements of the metric $g_{ij}$, determines the nature of the flow. At the limits, where $\beta \to 0$ and $\beta \to \infty$, we obtain respectively a linear diffusion flow and a nonlinear flow, akin to the TV flow [23] for the case of grey-level images (see [20] for details). The Beltrami scale-space emerges as a gradient descent minimization process

$$U^a_t = -\frac{1}{\sqrt{g}} \frac{\delta S}{\delta U^a} = \Delta_g U^a, \qquad a = 1, 2, 3. \tag{4}$$
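As an illustration of the quantities entering (2) to (4), the following NumPy sketch evaluates the induced metric, its determinant $g$, and the entries of the diffusion matrix $D = \sqrt{g}\,G^{-1}$ for a color image. The function name, the array layout (axis 0 is $x_1$, axis 1 is $x_2$, axis 2 is the channel) and the use of `numpy.gradient` with unit grid spacing are assumptions made for this example.

```python
import numpy as np

def beltrami_diffusion_matrix(U, beta):
    """Return g = det(G) and the entries (d11, d12, d22) of D = sqrt(g) * G^{-1}.

    U has shape (n1, n2, 3). Assumption: unit grid spacing and central
    differences (one-sided at the border), as provided by np.gradient.
    """
    Ux1 = np.gradient(U, axis=0)
    Ux2 = np.gradient(U, axis=1)
    g11 = 1.0 + beta**2 * np.sum(Ux1**2, axis=2)
    g12 = beta**2 * np.sum(Ux1 * Ux2, axis=2)
    g22 = 1.0 + beta**2 * np.sum(Ux2**2, axis=2)
    g = g11 * g22 - g12**2                  # determinant of the 2 x 2 metric (2)
    sqrt_g = np.sqrt(g)
    # For a 2 x 2 matrix, G^{-1} = [[g22, -g12], [-g12, g11]] / g.
    return g, (g22 / sqrt_g, -g12 / sqrt_g, g11 / sqrt_g)
```

For a constant image the metric is the identity and $D = I$, so the flow reduces to linear diffusion there; across an edge, $g$ grows and the diffusion normal to the edge is damped.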

For Euclidean embedding, the functional in Eq. (1) reduces to

$$S(U) = \int \sqrt{g}\; dx_1\, dx_2. \tag{5}$$

This geometric measure can be used as a regularization term for color image processing. In the variational framework, the reconstructed image is the minimizer of a cost-functional. This functional can be written in the following general form,

$$\Psi(U) = \lambda \sum_{a=1}^{3} \|U^a - F^a\|^2 + S(U),$$

where the parameter $\lambda$ controls the smoothness of the solution and $F$ is the given image. The modified Euler-Lagrange equations as a gradient descent process are

$$U^a_t = -\frac{1}{\sqrt{g}} \frac{\delta \Psi}{\delta U^a} = -\frac{2\lambda}{\sqrt{g}}\,(U^a - F^a) + \Delta_g U^a, \qquad a = 1, 2, 3. \tag{6}$$

3

Operator Splitting Schemes

In this section we briefly review standard first-order accurate splitting schemes for diffusion equations. One of the main drawbacks of semi-implicit schemes for such equations in multiple dimensions is that the resulting system matrix cannot be inverted efficiently. In order to remedy this shortcoming, splitting techniques are commonly employed in solving time-dependent partial differential equations. They allow one to reduce problems in multiple spatial dimensions to a sequence of problems in one dimension, which are easier to solve. One of the simplest splitting schemes, belonging to the class of multiplicative operator splitting schemes, is the locally one-dimensional (LOD) scheme [16]. The LOD scheme only needs to invert one tridiagonal matrix for each direction. It is simple to implement, unconditionally stable, and first-order accurate. However, the system matrix is not axis symmetric, a property that may be important in some cases. If such a property is required, one could use the additive operator splitting (AOS) scheme [13], which was actually invented for parallel implementation of splitting methods. Even for sequential implementations, the AOS scheme is almost as efficient as the LOD scheme; instead of multiplying the operators, one applies the inverse of each one-dimensional matrix independently and then averages the results. We want to emphasize that the matrices for AOS use $2\Delta t$ instead of $\Delta t$. It is not a trivial matter to apply dimensional splitting schemes to Beltrami-type equations. Our goal is to construct a splitting scheme for the nonlinear anisotropic Beltrami operator which amounts to inverting tridiagonal matrices, is unconditionally stable, and preserves the time discretization accuracy that was obtained without applying splitting techniques.
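The difference between the two splittings can be seen on the simplest possible example, linear homogeneous diffusion $u_t = u_{xx} + u_{yy}$ with reflecting boundaries. The sketch below is a toy illustration under that assumption, not the Beltrami scheme; the function names are hypothetical, and dense solves stand in for the tridiagonal (Thomas) solver one would use in practice.

```python
import numpy as np

def neumann_laplacian(n, h=1.0):
    """1D second-difference matrix with reflecting (Neumann) boundaries."""
    L = np.zeros((n, n))
    for i in range(n - 1):
        L[i, i] -= 1.0
        L[i + 1, i + 1] -= 1.0
        L[i, i + 1] += 1.0
        L[i + 1, i] += 1.0
    return L / h**2

def lod_step(U, dt):
    """LOD: one implicit 1D solve per direction, applied one after the other."""
    n1, n2 = U.shape
    Bx = np.eye(n1) - dt * neumann_laplacian(n1)
    By = np.eye(n2) - dt * neumann_laplacian(n2)
    V = np.linalg.solve(Bx, U)              # solve along axis 0
    return np.linalg.solve(By, V.T).T       # then along axis 1

def aos_step(U, dt):
    """AOS: independent 1D solves with 2*dt, then the average of the results."""
    n1, n2 = U.shape
    Bx = np.eye(n1) - 2.0 * dt * neumann_laplacian(n1)
    By = np.eye(n2) - 2.0 * dt * neumann_laplacian(n2)
    return 0.5 * (np.linalg.solve(Bx, U) + np.linalg.solve(By, U.T).T)
```

Both steps preserve the mean grey value and obey a discrete maximum-minimum principle for any time step. In this linear, constant-coefficient toy the two one-dimensional operators commute, so the order dependence of LOD only becomes visible once the coefficients depend on the image.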

4

The Proposed Splitting Scheme

In this section we present an operator splitting scheme for the Beltrami filter. Before splitting, we first introduce a semi-implicit approximation scheme for our equations. A semi-implicit Crank-Nicolson scheme for an equation involving mixed derivatives can rely on the following discretization of the spatial derivative operators: mixed derivatives are computed at time step $n\Delta t$, while the non-mixed derivatives are computed as the average of the values at time steps $n\Delta t$ and $(n + 1)\Delta t$. This approach for handling mixed derivatives in semi-implicit schemes for approximating linear equations has been considered in several previous works (see [24, 25, 26] for example), including the context of image processing [27], although it was not combined with the Crank-Nicolson method in the latter case. We note that in numerical experiments we have found the introduction of the Crank-Nicolson method into the splitting scheme necessary in order to maintain stability for large time steps. A simpler scheme, similar to the one used in [27], did not seem to be sufficiently stable for this PDE and the applications demonstrated in this paper. We now present the scheme we intend to use.


First, we refine our grid notation. We work on the rectangle Ω = (0, 1) × (0, 1), which we discretize by a uniform grid of m × m pixels, such that $x_i = i\Delta x$, $y_j = j\Delta y$, $t_n = n\Delta t$, where 1 ≤ i ≤ m, 1 ≤ j ≤ m, 1 ≤ n ≤ J and JΔt = T. Let the grid size be $\Delta x = \Delta y = \frac{1}{m-1}$. For each channel $U^a$, a = 1, 2, 3, of the color vector, we define the discrete approximation $(U^a)^n_{ij}$ by $(U^a)^n_{ij} \approx U^a(i\Delta x, j\Delta y, n\Delta t)$. We impose Neumann boundary conditions, and initially set $U^a$ to be our initial data image.

4.1 LOD/AOS Scheme for the Beltrami Scale-Space

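We approximate the Beltrami filter given in Eq. (4) by the following semi-implicit Crank-Nicolson scheme:

\[
\frac{(U^a)^{n+1} - (U^a)^n}{\Delta t} = \frac{1}{\sqrt{g^n}} \left[ \sum_{l=1}^{2} \left( \frac{1}{2} A^n_{ll} (U^a)^{n+1} + \frac{1}{2} A^n_{ll} (U^a)^n \right) + \sum_{q=1}^{2} \sum_{r \neq q} A^n_{qr} (U^a)^n \right],
\]

where $U^a$ is the N-dimensional vector denoting one of the components of the color vector, and $A^n_{qr}$ is a central difference approximation of the operator $\partial_{x_q}(d_{qr}\,\partial_{x_r})$ at time step n. Rearranging terms, we obtain

\[
(U^a)^{n+1} = \left( I - \frac{\Delta t}{2\sqrt{g^n}} \sum_{l=1}^{2} A^n_{ll} \right)^{-1} \left( I + \frac{\Delta t}{\sqrt{g^n}} \sum_{q=1}^{2} \sum_{r \neq q} A^n_{qr} + \frac{\Delta t}{2\sqrt{g^n}} \sum_{l=1}^{2} A^n_{ll} \right) (U^a)^n,
\]

which can also be written as

\[
(U^a)^{n+1} = \left( I - \frac{\Delta t}{2} \sum_{l=1}^{2} \bar{A}^n_{ll} \right)^{-1} \left( I + \Delta t \sum_{q=1}^{2} \sum_{r \neq q} \bar{A}^n_{qr} + \frac{\Delta t}{2} \sum_{l=1}^{2} \bar{A}^n_{ll} \right) (U^a)^n,
\]

where

\[
\bar{A}_{11} = \frac{1}{\sqrt{g}}\,\partial_x (A\,\partial_x), \quad
\bar{A}_{22} = \frac{1}{\sqrt{g}}\,\partial_y (C\,\partial_y), \quad
\bar{A}_{12} = \frac{1}{\sqrt{g}}\,\partial_x (B\,\partial_y), \quad
\bar{A}_{21} = \frac{1}{\sqrt{g}}\,\partial_y (B\,\partial_x),
\]

and the functions A, B, C are the corresponding elements of the diffusion matrix associated with the Beltrami flow.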


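Again, this semi-implicit scheme still has a major drawback. At each iteration one needs to solve a large linear system whose matrix of coefficients is not tridiagonal and thus costly to invert. Instead, we employ the LOD splitting scheme

\[
(U^a)^{n+1} = \left( I - \frac{\Delta t}{2} \bar{A}_{22} \right)^{-1} \left( I - \frac{\Delta t}{2} \bar{A}_{11} \right)^{-1} \left[ \left( I + \frac{\Delta t}{2} \bar{A}_{11} \right) \left( I + \frac{\Delta t}{2} \bar{A}_{22} \right) + \Delta t \sum_{q=1}^{2} \sum_{r \neq q} \bar{A}^n_{qr} \right] (U^a)^n,
\]

or the AOS scheme, which reads

\[
(U^a)^{n+1} = \frac{1}{2} \left[ \left( I - \Delta t\, \bar{A}_{22} \right)^{-1} + \left( I - \Delta t\, \bar{A}_{11} \right)^{-1} \right] \left[ \left( I + \frac{\Delta t}{2} \bar{A}_{11} \right) \left( I + \frac{\Delta t}{2} \bar{A}_{22} \right) + \Delta t \sum_{q=1}^{2} \sum_{r \neq q} \bar{A}^n_{qr} \right] (U^a)^n.
\]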
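The structure of the two splitting formulas can be illustrated on a much simpler problem. The sketch below is an assumption-laden toy, not the authors' implementation: it replaces the Beltrami operators by constant-coefficient 1-D Laplacians with Neumann boundaries and unit grid spacing, drops the mixed-derivative terms, and uses plain Python lists. All function names are hypothetical. Each implicit factor is a single tridiagonal solve (Thomas algorithm) applied along one axis.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are ignored."""
    n = len(rhs)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0] = upper[0] / diag[0]
    dp[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * cp[i - 1]
        cp[i] = upper[i] / denom
        dp[i] = (rhs[i] - lower[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def explicit_1d(v, c):
    """Return (I + c*A) v, where A is the 1-D Neumann Laplacian."""
    n = len(v)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else v[i]        # reflecting ghost value
        right = v[i + 1] if i < n - 1 else v[i]
        out.append(v[i] + c * (left - 2.0 * v[i] + right))
    return out

def implicit_1d(v, c):
    """Return (I - c*A)^{-1} v using one tridiagonal solve."""
    n = len(v)
    diag = [1.0 + 2.0 * c] * n
    diag[0] = diag[-1] = 1.0 + c
    off = [-c] * n
    return thomas(off, diag, off, v)

def along_rows(U, f):
    """Apply the 1-D map f to every row of U."""
    return [f(row) for row in U]

def along_cols(U, f):
    """Apply the 1-D map f to every column of U."""
    cols = [f(list(col)) for col in zip(*U)]
    return [list(row) for row in zip(*cols)]

def explicit_part(U, dt):
    """(I + dt/2 A11)(I + dt/2 A22) U, with the mixed terms omitted."""
    V = along_cols(U, lambda v: explicit_1d(v, dt / 2))
    return along_rows(V, lambda v: explicit_1d(v, dt / 2))

def lod_step(U, dt):
    """Multiplicative splitting: two successive half-step implicit solves."""
    V = along_rows(explicit_part(U, dt), lambda v: implicit_1d(v, dt / 2))
    return along_cols(V, lambda v: implicit_1d(v, dt / 2))

def aos_step(U, dt):
    """Additive splitting: average of two independent full-step implicit solves."""
    V = explicit_part(U, dt)
    X = along_rows(V, lambda v: implicit_1d(v, dt))
    Y = along_cols(V, lambda v: implicit_1d(v, dt))
    return [[0.5 * (a + b) for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]
```

In this toy setting both steps preserve the mean gray value and leave constant images unchanged for any Δt, which is a quick sanity check before moving to spatially varying coefficients. The AOS step uses Δt where the LOD step uses Δt/2, mirroring the doubling of the time step noted in the review of splitting schemes.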

The above splitting schemes are efficient because at each time step a single tridiagonal matrix inversion is performed for each spatial dimension. The system of differential equations we deal with is nonlinear. The question of theoretical stability of the LOD/AOS based nonlinear finite difference scheme is a non-trivial challenge, with theory still lagging behind common practice. Our numerical experiments indicate that the splitting is stable for a wide variety of parameters, suitable for most applications, as will be shown in Section 5.

4.2 LOD/AOS Scheme for the Beltrami-Based Denoising

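The splitting scheme in the presence of a fidelity term requires a slight modification that we detail below. In this case we solve for each channel the equation

\[
U^a_t = -\frac{2\lambda}{\sqrt{g}} (U^a - F^a) + \Delta_g U^a, \tag{7}
\]

with Neumann boundary conditions and the initial condition $U^a(x, 0) = F^a(x)$. The Crank-Nicolson scheme approximating Eq. (7) is

\[
(U^a)^{n+1} = \left( I - \frac{\Delta t}{2} \sum_{l=1}^{2} \bar{A}^n_{ll} + 2\Delta t \frac{\lambda}{\sqrt{g^n}} I \right)^{-1} \left[ \left( \left( I + \frac{\Delta t}{2} \bar{A}^n_{11} \right) \left( I + \frac{\Delta t}{2} \bar{A}^n_{22} \right) + \Delta t \sum_{q=1}^{2} \sum_{r \neq q} \bar{A}^n_{qr} \right) (U^a)^n + 2\Delta t\, F^a \frac{\lambda}{\sqrt{g^n}} \right]. \tag{8}
\]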
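A scalar model helps to see how the implicit fidelity term in Eq. (8) behaves. The following minimal sketch assumes a single spatial direction, no mixed terms, a hypothetical eigenvalue a ≤ 0 of the diffusion operator, and a constant fidelity weight c standing for λ/√g; none of these values come from the paper. It iterates the scalar counterpart of Eq. (8) and compares the limit with the steady state of $u_t = -2c(u - f) + a u$.

```python
def cn_fidelity_step(u, f, a, c, dt):
    """One scalar step of the Crank-Nicolson update with an implicit fidelity term.

    a  : eigenvalue of the (non-mixed) diffusion operator, a <= 0
    c  : fidelity weight, standing for lambda / sqrt(g), c > 0
    f  : data value the solution is attracted to
    """
    numerator = (1.0 + 0.5 * dt * a) * u + 2.0 * dt * c * f
    denominator = 1.0 - 0.5 * dt * a + 2.0 * dt * c
    return numerator / denominator

def steady_state(f, a, c):
    """Fixed point of u_t = -2c(u - f) + a*u."""
    return 2.0 * c * f / (2.0 * c - a)

def amplification(a, c, dt):
    """Factor multiplying the error u - steady_state in each step."""
    return (1.0 + 0.5 * dt * a) / (1.0 - 0.5 * dt * a + 2.0 * dt * c)
```

For a ≤ 0 and c > 0 the amplification factor has modulus below one for every Δt > 0, so in this scalar model the iteration converges to the steady state regardless of the time step; for example, `amplification(-3.0, 0.5, 10.0)` is about -0.54.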


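It is possible to use LOD/AOS approximations for the inverse of the matrix in the above scheme. However, we would like to treat the fidelity term in a special way. When $\lambda/\sqrt{g^n}$ is large, we find that the scheme proposed below possesses better stability properties. We now describe the details for treating the fidelity term for our Crank-Nicolson scheme. Dividing the numerator and the denominator by the matrix $S^n = \left( 1 + 2\Delta t \frac{\lambda}{\sqrt{g^n}} \right) I$, and rearranging terms, we get

\[
(U^a)^{n+1} = \left( I - \frac{\Delta t}{2} (S^n)^{-1} \sum_{l=1}^{2} \bar{A}^n_{ll} \right)^{-1} \left[ (S^n)^{-1} \left( \left( I + \frac{\Delta t}{2} \bar{A}^n_{11} \right) \left( I + \frac{\Delta t}{2} \bar{A}^n_{22} \right) + \Delta t \sum_{q=1}^{2} \sum_{r \neq q} \bar{A}^n_{qr} \right) (U^a)^n + 2 (S^n)^{-1} \Delta t\, F^a \frac{\lambda}{\sqrt{g^n}} \right].
\]

Approximating the semi-implicit scheme based on the LOD splitting, we have

\[
(U^a)^{n+1} = \left( I - \frac{1}{2} \Delta t\, (S^n)^{-1} \bar{A}^n_{22} \right)^{-1} \left( I - \frac{1}{2} \Delta t\, (S^n)^{-1} \bar{A}^n_{11} \right)^{-1} \left[ (S^n)^{-1} \left( \left( I + \frac{\Delta t}{2} \bar{A}^n_{11} \right) \left( I + \frac{\Delta t}{2} \bar{A}^n_{22} \right) + \Delta t \sum_{q=1}^{2} \sum_{r \neq q} \bar{A}^n_{qr} \right) (U^a)^n + 2 (S^n)^{-1} \Delta t\, F^a \frac{\lambda}{\sqrt{g^n}} \right].
\]

A similar splitting scheme can be developed using AOS.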

5 Experimental Results

We proceed to demonstrate experimentally the stability, accuracy, and efficiency of the LOD and AOS splitting schemes for the Beltrami color flow. In Figure 1 we show the results of the Beltrami flow, implemented by employing the LOD splitting scheme for approximating Eq. (4). Next we illustrate the use of the splitting schemes in the case where the functional involves a fidelity term. A noisy image as well as the reference denoising result, based on the explicit scheme, are shown in Figure 3, next to the result of the AOS and LOD splitting schemes. Note that the visual results obtained by the two schemes are similar to the reference image.

5.1 RRE Extrapolation Technique for Acceleration of the LOD Splitting Scheme

In [28, 1] vector extrapolation was applied in order to speed up the slow convergence of the explicit schemes for the Beltrami color flow. In the experiments below we demonstrate how the RRE extrapolation technique can also be used to accelerate the convergence of implicit schemes. Figure 4 shows that the RRE method accelerates the LOD scheme. A comparison is also given to the convergence rate achieved by the method of [28, 1]. Extrapolation techniques also allow us to obtain a more accurate rate, if one takes a smaller time step.

Fig. 1. Top row, left: The original image, which contains JPEG artifacts. Middle: Results of the LOD splitting scheme with Δt = 1, after 1 iteration, β = √(10^3), λ = 0. Right: Results of the LOD splitting scheme after 2 iterations. Bottom row, left: Results of the LOD splitting scheme after 4 iterations. Middle: a close-up of the original image. Right: a close-up of the resulting image after 4 iterations.

Fig. 2. The different image channels of an image patch taken from the images in Figure 1. Left to right: An image patch before denoising, its different color channels, the denoised image, and the denoised color channels. The color arrows indicate the direction of the gradient in the various color channels.
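For readers unfamiliar with reduced rank extrapolation (RRE), the idea can be sketched on a small linear fixed-point iteration. The code below is a textbook-style illustration under stated assumptions, not the implementation used for Figure 4: it works on short Python lists, solves the least-squares problem through its normal equations, and all names (`rre`, `iterate`, `solve_linear`) are hypothetical. Given iterates $x_0, \dots, x_{k+1}$ with first differences $u_i = x_{i+1} - x_i$ and second differences $w_i = u_{i+1} - u_i$, RRE returns $s = x_0 + \sum_i \xi_i u_i$, where ξ minimizes $\| u_0 + \sum_i \xi_i w_i \|_2$.

```python
def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))

def rre(xs):
    """Reduced rank extrapolation from iterates xs = [x_0, ..., x_{k+1}]."""
    k = len(xs) - 2
    u = [[b - a for a, b in zip(xs[i], xs[i + 1])] for i in range(k + 1)]  # first differences
    w = [[b - a for a, b in zip(u[i], u[i + 1])] for i in range(k)]        # second differences
    # Normal equations of  min_xi || u_0 + sum_i xi_i * w_i ||_2
    gram = [[dot(w[i], w[j]) for j in range(k)] for i in range(k)]
    rhs = [-dot(w[i], u[0]) for i in range(k)]
    xi = solve_linear(gram, rhs)
    return [x0 + sum(xi[i] * u[i][d] for i in range(k)) for d, x0 in enumerate(xs[0])]

def iterate(M, b, x, steps):
    """Return [x_0, ..., x_steps] for the linear iteration x <- M x + b."""
    out = [x]
    for _ in range(steps):
        x = [sum(mij * xj for mij, xj in zip(row, x)) + bi for row, bi in zip(M, b)]
        out.append(x)
    return out
```

For a linear iteration in dimension d, using k = d differences recovers the fixed point up to rounding, which is why a handful of iterates of a slowly converging scheme can already give a good limit estimate. In practice the normal equations are usually replaced by a QR-based least-squares solve for better conditioning.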


Fig. 3. Large image at the right: An image with artifacts resulting from lossy compression. Smaller images: a close-up on a section of the image. Top row, left: The image with JPEG artifacts. Right: Beltrami-based denoising by explicit scheme, run with 4000 explicit iterations, Δt = 0.0005. Bottom row, left: Denoising by LOD, Δt = 0.02. Right: Denoising by AOS, Δt = 0.02. λ = 1, β = √2000.

[Plot: residual norm (logarithmic scale) versus CPU time (sec) for the Explicit, Explicit+RRE, LOD, and LOD+RRE schemes.]

Fig. 4. Graph of the residuals (LOD, explicit+RRE and LOD+RRE) versus CPU times. Parameters: Δt = 0.05 for the explicit scheme, Δt = 2.5 for LOD, λ = 0.5, β = √500 ≈ 22.36.

6 Conclusions

Owing to its anisotropic and non-separable nature, neither an implicit scheme nor an operator-splitting-based scheme had so far been introduced for the partial differential equations that describe the Beltrami color flow. In this paper we propose a semi-implicit splitting scheme based on LOD/AOS for the anisotropic Beltrami operator. The spatial mixed derivatives are discretized explicitly at time step nΔt, while the non-mixed derivatives are approximated using the average of the two time levels nΔt and (n + 1)Δt. The stability of the splitting is empirically tested in applications such as Beltrami-based scale-space and Beltrami-based denoising, which display a stable behavior. In order to further accelerate the convergence of the splitting schemes, the RRE vector extrapolation technique is employed.

Acknowledgements

We thank Prof. Avram Sidi for interesting discussions. This research was supported by the United States-Israel Binational Science Foundation grant No. 2004274, by the Israeli Science Foundation grant No. 623/08, by the Ministry of Science grant No. 3-3414, and by the Elias Fund for Medical Research. Xue-Cheng Tai is supported by the MOE (Ministry of Education) Tier II project T207N2202 and IDM project NRF2007IDMIDM002-010.

References

1. Rosman, G., Dascal, L., Kimmel, R., Sidi, A.: Efficient Beltrami image filtering via vector extrapolation methods. SIAM J. Imag. Sci. (2008) (submitted)
2. Mešina, M.: Convergence acceleration for the iterative solution of the equations X = AX + f. Comp. Meth. Appl. Mech. Eng. 10, 165–173 (1977)
3. Eddy, R.: Extrapolating to the limit of a vector sequence. In: Wang, P. (ed.) Information Linkage Between Applied Mathematics and Industry, New York, pp. 387–396. Academic Press, London (1979)
4. Spira, A., Kimmel, R., Sochen, N.A.: A short-time Beltrami kernel for smoothing images and manifolds. IEEE Trans. Image Process. 16(6), 1628–1636 (2007)
5. Smith, S.M., Brady, J.: Susan - a new approach to low level image processing. Intl. J. of Comp. Vision 23, 45–78 (1997)
6. Aurich, V., Weule, J.: Non-linear Gaussian filters performing edge preserving diffusion. In: Mustererkennung 1995, 17. DAGM-Symposium, London, UK, pp. 538–545. Springer, Heidelberg (1995)
7. Tomasi, C., Manduchi, R.: Bilateral filtering for gray and color images. In: Proceedings of IEEE International Conference on Computer Vision, pp. 836–846 (1998)
8. Sochen, N., Kimmel, R., Bruckstein, A.M.: Diffusions and confusions in signal and image processing. J. of Math. Imag. and Vision 14(3), 195–209 (2001)
9. Elad, M.: On the bilateral filter and ways to improve it. IEEE Trans. Image Process. 11(10), 1141–1151 (2002)
10. Barash, D.: A fundamental relationship between bilateral filtering, adaptive smoothing and the nonlinear diffusion equation. IEEE Trans. Image Process. 24(6), 844–847 (2002)
11. Buades, A., Coll, B., Morel, J.M.: A review of image denoising algorithms, with a new one. SIAM Interdisciplinary Journal 4, 490–530 (2005)
12. Lu, T., Neittaanmaki, P., Tai, X.C.: A parallel splitting up method and its application to Navier-Stokes equations. Applied Mathematics Letters 4(2), 25–29 (1991)
13. Lu, T., Neittaanmaki, P., Tai, X.C.: A parallel splitting up method for partial differential equations and its application to Navier-Stokes equations. RAIRO Mathematical Modelling and Numerical Analysis 26(6), 673–708 (1992)
14. Weickert, J., Romeny, B.M.T.H., Viergever, M.A.: Efficient and reliable schemes for nonlinear diffusion filtering. IEEE Trans. Image Process. 7(3), 398–410 (1998)
15. Peaceman, D.W., Rachford, H.H.: The numerical solution of parabolic and elliptic differential equations. Journal Soc. Ind. Appl. Math. 3, 28–41 (1955)
16. Yanenko, N.N.: The method of fractional steps. The solution of problems of mathematical physics in several variables. Springer-Verlag, New York (1971)
17. Barash, D., Schlick, T., Israeli, M., Kimmel, R.: Multiplicative operator splittings in nonlinear diffusion: from spatial splitting to multiple timesteps. J. of Math. Imag. and Vision 19(16), 33–48 (2003)
18. Kimmel, R., Malladi, R., Sochen, N.: Images as embedding maps and minimal surfaces: Movies, color, texture, and volumetric medical images. Intl. J. of Comp. Vision 39(2), 111–129 (2000)
19. Sochen, N., Kimmel, R., Maladi, R.: From high energy physics to low level vision. In: ter Haar Romeny, B.M., Florack, L.M.J., Viergever, M.A. (eds.) Scale-Space 1997. LNCS, vol. 1252, pp. 236–247. Springer, Heidelberg (1997)
20. Sochen, N., Kimmel, R., Maladi, R.: A general framework for low level vision. IEEE Trans. Image Process. 7, 310–318 (1998)
21. Yezzi, A.J.: Modified curvature motion for image smoothing and enhancement. IEEE Trans. Image Process. 7(3), 345–352 (1998)
22. Polyakov, A.M.: Quantum geometry of bosonic strings. Physics Letters 103 B, 207–210 (1981)
23. Rudin, L., Osher, S., Fatemi, E.: Non-linear total variation based noise removal algorithms. Physica D Letters 60, 259–268 (1992)
24. Yanenko, N.N.: About implicit difference methods of the calculation of the multidimensional equation of thermal conductivity. In: Proceedings of VUZ. Series of Mathematics, vol. 23(4), pp. 148–157 (1961)
25. Andreev, V.B.: Alternating direction methods for parabolic equations in two space dimensions with mixed derivatives. Zhurnal Vychislitelnoi Matematiki i Matematicheskoi Fiziki 7(2), 312–321 (1967)
26. Mckee, S., Mitchell, A.R.: Alternating direction methods for parabolic equations in three space dimensions with mixed derivatives. The Computer Journal 14(3), 25–30 (1971)
27. Weickert, J.: Coherence-enhancing diffusion filtering. Intl. J. of Comp. Vision 31(2/3), 111–127 (1999)
28. Dascal, L., Rosman, G., Kimmel, R.: Efficient Beltrami filtering of color images via vector extrapolation. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 92–103. Springer, Heidelberg (2007)

Multi-scale Total Variation with Automated Regularization Parameter Selection for Color Image Restoration

Yiqiu Dong (1) and Michael Hintermüller (2)

(1) START-Project “Interfaces and Free Boundaries” and SFB “Mathematical Optimization and Applications in Biomedical Science”, Institute of Mathematics and Scientific Computing, University of Graz, Heinrichstrasse 36, A-8010 Graz, Austria [email protected]

(2) Department of Mathematics, Humboldt-University of Berlin, Unter den Linden 6, 10099 Berlin, Germany, and START-Project “Interfaces and Free Boundaries” and SFB “Mathematical Optimization and Applications in Biomedical Science”, Institute of Mathematics and Scientific Computing, University of Graz, Heinrichstrasse 36, A-8010 Graz, Austria [email protected]

Abstract. In this paper, a multi-scale vectorial total variation model for color image restoration is introduced. The model utilizes a spatially dependent regularization parameter in order to preserve the details during noise removal. The automated adjustment strategy of the regularization parameter is based on local variance estimators combined with a conﬁdence interval technique. Numerical results on images are presented to demonstrate the eﬃciency of the method.

1 Introduction

We consider the problem of recovering color images degraded by cross-channel blurring and Gaussian noise. Without loss of generality, we assume an image $\hat{u}$ is a vectorial function defined on a bounded and piecewise smooth open subset $\Omega \subset \mathbb{R}^2$, that is, $\hat{u} : \Omega \to \mathbb{R}^M$, where M is the number of channels in the color model. The degraded form z of $\hat{u}$ is given by

\[
z = K\hat{u} + n,
\]

where $K \in \mathcal{L}(L^2(\Omega; \mathbb{R}^M))$ is a cross-channel blurring operator, and n represents white Gaussian noise with zero mean and standard deviation σ. The problem of restoring $\hat{u}$ from z with unknown n is known to be typically ill-posed [1].

In order to preserve significant edges when restoring images, Rudin, Osher and Fatemi proposed total variation regularization [2] for gray-level images. In this approach (which we call the TV-model in what follows), the image $\hat{u}$ is recovered by solving the optimization problem

\[
\min_{u \in BV(\Omega)} \int_\Omega |Du| + \frac{\lambda}{2} \int_\Omega |Ku - z|^2 \, dx, \tag{1}
\]

where BV(Ω) denotes the space of functions of bounded variation and λ > 0. Because of its edge preservation ability, the TV-model is widely accepted as a reliable tool in image restoration. Over the years, various research efforts have been devoted to studying, solving and extending the TV-model; see, e.g., [3, 4, 5, 6, 7, 8, 9] as well as the monograph [1] and the many references therein.

In general, images are comprised of multiple objects at different scales. This suggests that different values of λ localized at image features of different scales are desirable to obtain better restoration results. For this reason, a multi-scale total variation (MTV) model with a spatially varying choice of parameters was proposed [10]. In order to enhance image regions containing details while still sufficiently smoothing homogeneous features, a spatially dependent regularization parameter selection was proposed in [11].

In this paper, we extend the multi-scale total variation model with spatially dependent regularization parameter to the restoration of degraded color images. The automated adjustment strategy of the regularization parameter is based on local variance estimators combined with a confidence interval technique. To speed up the scheme we generalize the multi-scale representation according to [12, 13], and the corresponding subproblems are solved by a superlinearly convergent algorithm based on Fenchel duality and inexact semismooth Newton techniques. The latter extends earlier work in [9].

The outline of the rest of the paper is as follows. In Section 2 we introduce the multi-scale vectorial total variation model and the primal-dual algorithm for solving the associated minimization problem. In Section 3 we extend the LVE-based parameter selection to color images. Section 4 proposes a method for color image restoration combining the multi-scale representation and spatially adaptive parameter selection. Section 5 gives numerical results to demonstrate the performance of the new method. Finally, conclusions are drawn in Section 6.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 271–281, 2009. © Springer-Verlag Berlin Heidelberg 2009

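2 Multi-scale Vectorial Total Variation

Based on the TV-model (1), in [14] the vectorial total variation (VTV) regularization was proposed for restoring color images:

\[
\min_{u \in BV(\Omega; \mathbb{R}^M)} \int_\Omega |Du| + \frac{\lambda}{2} \int_\Omega |Ku - z|^2 \, dx, \tag{2}
\]

where the space $BV(\Omega; \mathbb{R}^M)$ of vector-valued functions is the set of functions $u \in L^1(\Omega; \mathbb{R}^M)$ such that $\int_\Omega |Du| < \infty$, where the vectorial TV norm $\int_\Omega |Du|$ is defined as

\[
\int_\Omega |Du| = \sup \left\{ \int_\Omega u \cdot \operatorname{div} v \, dx \; : \; v \in C^1_c(\Omega; \mathbb{R}^{M \times 2}), \; |v| \le 1 \right\},
\]

and $|v| = \sqrt{\sum_{i=1}^{M} \langle v_i, v_i \rangle}$. The space $BV(\Omega; \mathbb{R}^M)$ endowed with the norm

\[
\|u\|_{BV(\Omega; \mathbb{R}^M)} = \|u\|_{L^1(\Omega; \mathbb{R}^M)} + \int_\Omega |Du|
\]

is a Banach space.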


In the VTV-model (2), the parameter λ controls the trade-off between a good fit of z and a smoothness requirement due to the vectorial total variation regularization. Since images are usually comprised of multiple objects at different scales, a locally varying λ is desirable. Therefore, here we consider multi-scale vectorial total variation (MVTV):

\[
\min_{u \in BV(\Omega; \mathbb{R}^M)} \int_\Omega |Du| + \frac{1}{2} \int_\Omega \lambda(x) |Ku - z|^2 \, dx. \tag{3}
\]

As in Section 2 of [11], we can obtain the same conclusion on the existence and uniqueness of the minimizer for the MVTV-model. Here, we do not repeat the proof details, but rather refer to [11].

2.1 Primal-Dual Approach to Multi-scale Vectorial Total Variation

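In [9] an infeasible primal-dual algorithm of generalized Newton type was proposed for solving (1). In the sequel we extend its key features to the case (3). Rather than operating on the MVTV-model (3), the method is based on

\[
\min_{u \in H^1_0(\Omega; \mathbb{R}^M)} \frac{\mu}{2} \int_\Omega |\nabla u|^2 \, dx + \frac{1}{2} \int_\Omega \lambda |Ku - z|^2 \, dx + \int_\Omega |\nabla u| \, dx, \tag{4}
\]

where $0 < \underline{\lambda} \le \lambda(x) \le \bar{\lambda}$ for almost all x ∈ Ω and $0 < \mu \ll \bar{\lambda}^{-1}$. The μ-term serves the purpose of a function space regularization for a “convenient” dualization in a Hilbert space setting. In our numerics, we typically choose μ = 0. Applying the Fenchel-Legendre calculus [15] analogously as in [9], the Fenchel dual of (4) reads

\[
\sup_{\substack{p \in \mathbf{L}^2(\Omega; \mathbb{R}^M) \\ |p(x)| \le 1 \text{ a.e. in } \Omega}} \; -\frac{1}{2} ||| K^* z - \operatorname{div} p |||^2_{H^{-1}} + \frac{1}{2} \|z\|^2_{L^2}, \tag{$P_0$}
\]

where $|||v|||^2_{H^{-1}} = \langle H_{\mu,K} v, v \rangle_{H^1_0, H^{-1}}$, $v \in H^{-1}(\Omega; \mathbb{R}^M)$, with $H_{\mu,K} = (K^* \lambda K - \mu \Delta)^{-1}$, $\Delta : H^1_0(\Omega; \mathbb{R}^M) \to H^{-1}(\Omega; \mathbb{R}^M)$, and $\langle \cdot, \cdot \rangle_{H^1_0, H^{-1}}$ denotes the duality pairing between $H^1_0(\Omega; \mathbb{R}^M)$ and its dual $H^{-1}(\Omega; \mathbb{R}^M)$. Moreover, $\mathbf{L}^2(\Omega; \mathbb{R}^M) = (L^2(\Omega; \mathbb{R}^M))^2$. In order to avoid the non-uniqueness of the solution of $(P_0)$, following [9] we consider a dual regularization:

\[
\sup_{\substack{p \in \mathbf{L}^2(\Omega; \mathbb{R}^M) \\ |p(x)| \le 1 \text{ a.e. in } \Omega}} \; -\frac{1}{2} ||| K^* z - \operatorname{div} p |||^2_{H^{-1}} + \frac{1}{2} \|z\|^2_{L^2} - \frac{\beta}{2} \|p\|^2_{\mathbf{L}^2}, \tag{$P$}
\]

where β > 0 is the regularization parameter. In order to study the effect of the β-regularization of the Fenchel dual, we apply the Fenchel-Legendre calculus once more and find that the dual of $(P)$ is given by

\[
\min_{u \in H^1_0(\Omega; \mathbb{R}^M)} \frac{\mu}{2} \int_\Omega |\nabla u|^2 \, dx + \frac{1}{2} \int_\Omega \lambda |Ku - z|^2 \, dx + \int_\Omega \Phi_\beta(\nabla u) \, dx, \tag{$P^*$}
\]

where for $w \in \mathbf{L}^2(\Omega; \mathbb{R}^M)$,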


\[
\Phi_\beta(w)(x) =
\begin{cases}
|w(x)| - \dfrac{\beta}{2}, & \text{if } |w(x)| \ge \beta, \\[1ex]
\dfrac{1}{2\beta} |w(x)|^2, & \text{if } |w(x)| < \beta.
\end{cases} \tag{5}
\]

The first-order optimality conditions of $(P^*)$ characterize the solutions $\bar{u}$ and $\bar{p}$ of $(P^*)$ and $(P)$, respectively, by

\[
-\mu \Delta \bar{u} + K^* \lambda K \bar{u} - \operatorname{div} \bar{p} = K^* \lambda z \quad \text{in } H^{-1}(\Omega; \mathbb{R}^M), \tag{6a}
\]
\[
\max(\beta, |\nabla \bar{u}|)\, \bar{p} - \nabla \bar{u} = 0 \quad \text{in } \mathbf{L}^2(\Omega; \mathbb{R}^M). \tag{6b}
\]

Note that the system (6) is non-smooth, i.e., not necessarily Fréchet-differentiable. The discrete version of this system can be solved by a semismooth Newton method [9, 11]. The generalized Newton solver converges globally, that is, regardless of the initialization, and locally at a superlinear rate [9].
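The pointwise structure of (5) and (6b) can be made concrete with a few lines of code. The following is a minimal sketch, assuming the gradient at one pixel is stored as a flat list of its M × 2 entries; the function names are hypothetical and the snippet is not part of the semismooth Newton solver of [9, 11]. It evaluates the Huber-type function $\Phi_\beta$ and solves (6b) for the dual variable at a single pixel.

```python
import math

def norm(w):
    """Euclidean (Frobenius) norm of the gradient entries at one pixel."""
    return math.sqrt(sum(c * c for c in w))

def phi_beta(w, beta):
    """Huber-type function (5): quadratic for |w| < beta, shifted norm otherwise."""
    n = norm(w)
    if n >= beta:
        return n - 0.5 * beta
    return n * n / (2.0 * beta)

def dual_from_gradient(w, beta):
    """Pointwise solution of (6b): p = w / max(beta, |w|), hence |p| <= 1."""
    scale = max(beta, norm(w))
    return [c / scale for c in w]
```

The two branches of `phi_beta` agree at |w| = β (both give β/2), so the regularized functional is continuously differentiable, while `dual_from_gradient` shows why the dual constraint |p| ≤ 1 is satisfied automatically: large gradients are normalized, small ones are scaled by 1/β.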

3 Spatially Dependent Regularization Parameter Selection

Since the capability of multi-scale vectorial total variation is mainly limited by the selection of the parameter λ, in this section we extend the way of choosing λ proposed in [11] to the MVTV-model. Suppose the variance of the Gaussian noise is σ², which can be estimated easily in practice. With a correct choice of λ in the TV-model (1), the restored image u can satisfy the constraint

\[
\int_\Omega |Ku - z|^2 \, dx = \sigma^2 |\Omega| \tag{7}
\]

globally. However, the MVTV-model (3) represents a localized version of the constraint by allowing λ = λ(x). In order to enhance image details while preserving homogeneous regions, the choice of λ must be based on local image features. Hence, we search for a reconstruction where the variance of the residual is closer to the noise variance in both the detail regions and the homogeneous parts. In order to achieve this goal we introduce local variance estimators (LVEs) for an automated adaptive choice of λ.

3.1 Local Variance Estimator

Consider the discrete version of the residual image $r^h = z^h - K^h u^h$, where $u^h$ is the restored image from the minimization problem (2) with λ > 0. If we use a relatively small parameter λ, the residual $r^h$ will include the noise as well as the details. Then, the average of the squared residual in a small window will reflect the distribution of details in the image. Let $\Omega^\omega_{i,j}$ denote the set of pixel coordinates in an ω-by-ω window centered at (i, j) (with obvious modification near the boundary), i.e.,

\[
\Omega^\omega_{i,j} = \left\{ (s + i, t + j) \; : \; -\left\lfloor \frac{\omega}{2} \right\rfloor \le s, t \le \left\lfloor \frac{\omega}{2} \right\rfloor \right\},
\]

where $\lfloor \cdot \rfloor$ means rounding to the nearest integer towards zero. Then we apply the mean filter with window size ω to the residual image $r^h$ as follows:

\[
\mathrm{LVE}^\omega_{i,j} = \frac{1}{M \omega^2} \sum_{k=1}^{M} \sum_{(s,t) \in \Omega^\omega_{i,j}} \left( r^h_{s,t} \right)_k^2 = \frac{1}{M \omega^2} \sum_{k=1}^{M} \sum_{(s,t) \in \Omega^\omega_{i,j}} \left( z^h_{s,t} - (K^h u^h)_{s,t} \right)_k^2.
\]

Here LVE stands for “Local Variance Estimator”. In general, $\mathrm{LVE}^\omega$ has a large value in the detail regions, and it has a small value in the homogeneous regions. But the noise in the residual may also lead to some large LVE values in the homogeneous regions. In order to reduce the effect due to noise, we utilize the confidence interval technique well known in statistics [16, 17] in connection with LVE.

3.2 Upper Bound for the Local Variance

In the discrete setting, all elements of n can be regarded as an array of independent normally distributed random variables with mean 0 and variance σ². Then, the random variable

\[
T^\omega_{i,j} = \frac{1}{\sigma^2} \sum_{k=1}^{M} \sum_{(s,t) \in \Omega^\omega_{i,j}} \left( n^h_{s,t} \right)_k^2
\]

has the χ²-distribution with Mω² degrees of freedom, that is, $T^\omega_{i,j} \sim \chi^2_{M\omega^2}$. Set

\[
S^\omega_{i,j} := \frac{1}{M \omega^2} \sum_{k=1}^{M} \sum_{(s,t) \in \Omega^\omega_{i,j}} \left( z^h_{s,t} - (K^h u^h)_{s,t} \right)_k^2.
\]

If $u^h = \hat{u}^h$, where $\hat{u}^h$ satisfies $n^h = z^h - K^h \hat{u}^h$, then

\[
S^\omega_{i,j} = \frac{1}{M \omega^2} \sum_{k=1}^{M} \sum_{(s,t) \in \Omega^\omega_{i,j}} \left( z^h_{s,t} - (K^h \hat{u}^h)_{s,t} \right)_k^2 = \frac{1}{M \omega^2} \sum_{k=1}^{M} \sum_{(s,t) \in \Omega^\omega_{i,j}} \left( n^h_{s,t} \right)_k^2 = \frac{\sigma^2}{M \omega^2} T^\omega_{i,j}.
\]

On the contrary, if the residual image $z^h - K^h u^h$ contains details, we expect

\[
S^\omega_{i,j} = \frac{1}{M \omega^2} \sum_{k=1}^{M} \sum_{(s,t) \in \Omega^\omega_{i,j}} \left( z^h_{s,t} - (K^h u^h)_{s,t} \right)_k^2 > \frac{1}{M \omega^2} \sum_{k=1}^{M} \sum_{(s,t) \in \Omega^\omega_{i,j}} \left( n^h_{s,t} \right)_k^2 = \frac{\sigma^2}{M \omega^2} T^\omega_{i,j}.
\]

Therefore, we search for a bound B such that $S^\omega_{i,j} > B$ for some pixel (i, j) implies that some details are left in the residual. Given m × m, the total number of pixels in the color image with M channels, we propose to consider the expected maximum of the m² random variables $\frac{\sigma^2}{M\omega^2} T^\omega_s$, s = 1, ..., m², as the bound B:

\[
B^{\omega,m} := \frac{\sigma^2}{M \omega^2} \, E\!\left( \max_{k=1,\dots,m^2} T^\omega_k \right), \tag{8}
\]

where E represents the expected value of a random variable. Similarly to [11], we get

\[
B^{\omega,m} = \frac{\sigma^2}{M \omega^2} \left( E_m(T^\omega) + d_m(T^\omega) \right),
\]

where $E_m(T^\omega) = T_d + \beta_m \kappa$, $d_m(T^\omega) = \beta_m \frac{\pi}{\sqrt{6}}$, $\beta_m = \frac{1}{m^2 f(T_d)}$, κ = 0.577215, and $f(T_d)$ is the distribution of $T_d$, which is the so-called dominant value.

3.3 Selection of the Parameter λ

Now, we use the confidence interval for $S^\omega$ to reduce the effect of noise on the local variance estimators in order to distinguish the detail regions in the images correctly. Recall that $\mathrm{LVE}^\omega$ denotes the mean of the squared residual in a given window. Ideally, there is only noise in the residual. Then $\mathrm{LVE}^\omega$ should behave like $S^\omega$. Hence, whenever

\[
\mathrm{LVE}^\omega_{i,j} \in [0, B^{\omega,m}), \tag{9}
\]

we assume that the window contains noise only. On the other hand, if (9) is not satisfied, we suppose that this is due to image details contained in the residual image in $\Omega^\omega_{i,j}$. This property is useful when updating the parameter λ locally. For adapting λ algorithmically we proceed as follows. Initially we assign a small positive value to λ. Then we restore the image iteratively by increasing λ according to the following rule:

\[
\tilde{\lambda}^{k+1}_{i,j} = \zeta \cdot \min\left( \tilde{\lambda}^{k}_{i,j} + \rho \left( (\mathrm{LVE}^\omega_k)_{i,j} - \sigma \right)^+ , \; L \right), \tag{10a}
\]
\[
\lambda^{k+1}_{i,j} = \frac{1}{\omega^2} \sum_{(s,t) \in \Omega^\omega_{i,j}} \tilde{\lambda}^{k+1}_{s,t}, \tag{10b}
\]

where ζ ≥ 1, ρ > 0, $(x)^+ = \max(x, 0)$, $\mathrm{LVE}^\omega_k$ is obtained from $u_k$, L is a large positive value to ensure $\tilde{\lambda}_k \in L^\infty(\Omega)$, and for each channel of the vectorial data we use the same $\lambda_k$ during restoration. In our numerics we choose ζ = 2, which comes from the method proposed in [12] (TNV-algorithm). Finally, we set the parameter $\rho = \rho_k = \|\tilde{\lambda}_k\|_\infty / \sigma$ in order to keep the new $\tilde{\lambda}_{k+1}$ at the same scale as $\tilde{\lambda}_k$.
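The local variance estimator and the update rule (10a)-(10b) can be sketched in a few lines. The code below is an illustration under several assumptions that are not specified in the text: images are nested Python lists indexed as `residual[k][i][j]`, ω is odd, windows are clipped at the image border and averaged over the clipped pixel count (one possible reading of the "obvious modification near the boundary"), and the cap L is given the hypothetical default value `cap=1e6`. The update follows rule (10a) exactly as printed.

```python
def window(i, j, omega, rows, cols):
    """Pixel coordinates of the omega-by-omega window centred at (i, j), clipped at the border."""
    h = omega // 2
    return [(s, t)
            for s in range(max(0, i - h), min(rows, i + h + 1))
            for t in range(max(0, j - h), min(cols, j + h + 1))]

def local_variance(residual, omega):
    """LVE: mean of the squared residual over the window and over all M channels."""
    M, rows, cols = len(residual), len(residual[0]), len(residual[0][0])
    lve = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            win = window(i, j, omega, rows, cols)
            total = sum(residual[k][s][t] ** 2 for k in range(M) for s, t in win)
            lve[i][j] = total / (M * len(win))
    return lve

def update_lambda(lam_tilde, lve, sigma, omega, zeta=2.0, cap=1e6):
    """One pass of (10a)-(10b); returns (new lam_tilde, window-averaged lambda)."""
    rows, cols = len(lam_tilde), len(lam_tilde[0])
    rho = max(max(row) for row in lam_tilde) / sigma  # rho_k = ||lam_tilde||_inf / sigma
    new_tilde = [[zeta * min(lam_tilde[i][j] + rho * max(lve[i][j] - sigma, 0.0), cap)
                  for j in range(cols)] for i in range(rows)]
    smoothed = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            win = window(i, j, omega, rows, cols)
            smoothed[i][j] = sum(new_tilde[s][t] for s, t in win) / len(win)
    return new_tilde, smoothed
```

A typical use would compute `lve = local_variance(residual, 11)` from the current residual and then call `update_lambda(lam_tilde, lve, sigma, 11)`; pixels whose estimator stays below the threshold only receive the factor ζ, while pixels with a large estimator receive an additional increment before the window average smooths the result.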

4 Our Method

Recently, a multi-scale image decomposition method (TNV-algorithm) was proposed in [12], which uses the TV-model (1) to extract the details in the residual, and which varies the regularization parameter over a sequence of dyadic scales to capture different features in the image. Although this method performs better than a number of existing methods, it satisfies the constraint (7) only globally, and does not consider the local characteristics of the features in the image. Referring to this decomposition method, we intertwine its idea with the MVTV-model (3), and combine it with the spatially dependent regularization parameter selection. This results in the following algorithm:

Algorithm

1: Initialize $u^h_0 = 0 \in \mathbb{R}^{Mm^2}$, $p^h_0 = 0 \in \mathbb{R}^{Mm^2 \times 2}$, $\boldsymbol{\lambda}_0 = [\lambda_0, \cdots, \lambda_0] \in (\mathbb{R}^{m^2})^M$ with $\lambda_0 \in \mathbb{R}^{m^2}$, and k = 0.

2: If k = 0, solve the discrete version of the minimization problem

\[
\tilde{u}_0 = \arg\min_{u \in H^1_0(\Omega; \mathbb{R}^M)} \frac{\mu}{2} \int_\Omega |\nabla u|^2 \, dx + \frac{1}{2} \int_\Omega \lambda_0 |Ku - z|^2 \, dx + \int_\Omega |\nabla u| \, dx,
\]

else compute $v^h_k = z^h - K^h u^h_k$ and solve the discrete version of the minimization problem

\[
\tilde{u}_k = \arg\min_{u \in H^1_0(\Omega; \mathbb{R}^M)} \frac{\mu}{2} \int_\Omega |\nabla u|^2 \, dx + \frac{1}{2} \int_\Omega \lambda_k |Ku - v_k|^2 \, dx + \int_\Omega |\nabla u| \, dx.
\]

3: Update $u^h_{k+1} = u^h_k + \tilde{u}^h_k$.

4: Based on $u^h_{k+1}$, update

\[
\tilde{\lambda}_{k+1} = 2 \cdot \min\left( \tilde{\lambda}_k + \rho \left( \mathrm{LVE}^\omega_k - \sigma \right)^+ , \; L \right), \qquad
(\lambda_{k+1})_{i,j} = \frac{1}{\omega^2} \sum_{(s,t) \in \Omega^\omega_{i,j}} \tilde{\lambda}^{k+1}_{s,t}.
\]

5: Stop; or set k := k + 1 and go to step 2.

A few remarks on the algorithm are in order. We initialize λ by a relatively small positive constant. In our numerical practice an 11-by-11 window turned out to yield reliable results. In Section 5, we study the influence of the window size on the restoration results. Similar to the Bregman iteration proposed in [18], we stop the iterative procedure as soon as the residual $\|z^h - K^h u^h_k\|_2$ drops below ξσ, where ξ > 1 relates to the image size. For m → ∞ we have ξ → 1.

5 Numerical Results

In this section we provide numerical results to study the behavior of our method with respect to its image restoration capabilities. We use two RGB color images (i.e., M = 3), “Barbara” (576-by-720) and “Lena” (512-by-512), as shown in Figure 1. Furthermore, from the experiments conducted on a broad variety of images we found that our method is robust with respect to the initial choice of λ. Thus, in all experiments listed here we use the same initial choice λ = 2.5.

Fig. 1. Original images: (a) “Barbara”, (b) “Lena”

Fig. 2. Results of denoising image “Barbara” (the 1st row) and “Lena” (the 2nd row): (a) Noisy images, (b) Restored images (k = 3), (c) Final values of λ

5.1 Color Image Denoising

Here, we concentrate on image denoising, i.e., $K^h$ is the identity matrix. The degraded images contain Gaussian white noise with noise level σ = 0.1. For a study of our method in the case of texture-like structures we zoom the two images in Figure 1 into certain regions. In all of our experiments the image intensity range is scaled to [0, 1]. The results are shown in Figure 2 together with the number of iterations k. We can see that our method suppresses the noise successfully while preserving the details. In addition, we also show the final values of λ obtained by our choice rule. We find that in detail regions λ is large in order to preserve the details, and it is small in the homogeneous regions to remove noise.

Fig. 3. Restored images by our method with different ω: (a) ω = 5, (b) ω = 11, (c) ω = 17

Fig. 4. Results of restoring blurred noisy image “Barbara” (the 1st row) and “Lena” (the 2nd row): (a) Blurred noisy images, (b) Restored images (k = 5), (c) Final values of λ

In order to test our method for different values of the window size ω, Figure 3 shows the restored images with ω = 5, 11, 17. Except for some slight effects, we observe a remarkable stability with respect to ω.

5.2 Color Image Deblurring and Denoising

In this section, we illustrate the restoration ability of our method for noisy blurred images. The blurring operator K is a cross-channel blurring operator with the kernel

\[
\begin{bmatrix} K_{rr} & K_{rg} & K_{rb} \\ K_{gr} & K_{gg} & K_{gb} \\ K_{br} & K_{bg} & K_{bb} \end{bmatrix}
=
\begin{bmatrix}
0.8 \cdot (M, 7, 135) & 0.1 \cdot (G, 9, 7) & 0.1 \cdot (A, 7) \\
0.1 \cdot (A, 9) & 0.8 \cdot (M, 7, 90) & 0.1 \cdot (G, 5, 1) \\
0.1 \cdot (G, 7, 5) & 0.1 \cdot (M, 7, 45) & 0.8 \cdot (A, 11)
\end{bmatrix},
\]

where (A, r) denotes the average blur with window size r, (G, r, σ) denotes the Gaussian blur with window size r and standard deviation σ, (M, l, θ) denotes the motion blur with length l and angle θ, and (r, g, b) are the three channels in the RGB color model. Further we have Gaussian white noise with σ = 0.02. Figure 4 depicts a part of the noisy blurred “Barbara” and “Lena” images with the restored results and final values of λ. We find that our method can still preserve most of the details; see, e.g., the features on the scarf. Furthermore, for noisy blurred images our method is still able to distinguish most of the detail regions properly.

6 Conclusion

A multi-scale vectorial total variation model with spatially adapted regularization parameter λ for color image restoration is proposed in this paper. The local variance estimator LVE of the residual image is extended to the multi-channel case, and turns out to be an accurate instrument for updating λ within an iterative procedure. Assuming that the noise variance σ² is known, the present algorithm is completely automated, i.e., there is no need for parameter tuning. The numerical results show that the new method can restore the degraded images efficiently while preserving most details.


Multiplicative Noise Cleaning via a Variational Method Involving Curvelet Coefficients

Sylvain Durand¹, Jalal Fadili², and Mila Nikolova³

¹ M.A.P. 5 - CNRS, University Paris Descartes, France, [email protected], http://www.math-info.univ-paris5.fr/∼sdurand/
² GREYC CNRS-ENSICAEN-Université de Caen, France, [email protected], http://www.greyc.ensicaen.fr/∼jfadili/
³ CMLA - CNRS, ENS Cachan, PRES UniverSud, France, [email protected], http://www.cmla.ens-cachan.fr/∼nikolova/

Abstract. Classical ways to denoise images contaminated with multiplicative noise (e.g. speckle noise) are filtering, statistical (Bayesian) methods, variational methods and methods that convert the multiplicative noise into additive noise (using a logarithmic function) in order to apply a shrinkage estimation for the log-image data and transform back the result using an exponential function. We propose a new method that involves several stages: we apply a reasonable under-optimal hard-thresholding on the curvelet transform of the log-image; the latter is restored using a specialized hybrid variational method combining an $\ell^1$ data-fitting to the thresholded coefficients and a Total Variation regularization (TV) in the image domain; the restored image is an exponential of the obtained minimizer, weighted so that the mean of the original image is preserved. The minimization stage is realized using a properly adapted fast Douglas-Rachford splitting. The existence of a minimizer of our specialized criterion and the convergence of the minimization scheme are proved. The obtained numerical results outperform the main alternative methods.

1 Introduction

In many active imaging systems (e.g. synthetic aperture radar, laser or ultrasound imaging), the data for the unknown image $S_0 : \Omega \to \mathbb{R}_+$, $\Omega \subset \mathbb{R}^2$, are severely corrupted with multiplicative noise. Then several independent measurements for the same image are needed:
\[ S_k = S_0\,\eta_k + n_k, \quad \forall k \in \{1, \dots, K\}, \tag{1} \]

where $\eta_k : \Omega \to \mathbb{R}_+$ and $n_k$ represent the multiplicative and a typically zero-mean additive noise, $\forall k$. Commonly (see e.g. [27]) $\eta_k$ is modeled as a one-sided exponential probability density function (pdf) (cf. Fig. 1(a)): $\mathrm{pdf}(\eta_k) = \mu\, e^{-\mu \eta_k}\, \mathbb{1}_{\mathbb{R}_+}(\eta_k)$ for $\mu = 1$. In practice, one takes an average of all measurements, see e.g. Fig. 2(b). Since $\frac{1}{K}\sum_{k=1}^{K} n_k \approx 0$, the data read (cf. e.g. [27, 1, 30]):
\[ S = \frac{1}{K}\sum_{k=1}^{K} S_k = S_0\, \frac{1}{K}\sum_{k=1}^{K} \eta_k = S_0\, \eta . \tag{2} \]

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 282–294, 2009. © Springer-Verlag Berlin Heidelberg 2009

Usually all $\eta_k$ are independent. Denoting by Γ the usual Gamma function, the mean η of the noise in (2) has a Gamma distribution (cf. Fig. 1(b)):
\[ \eta = \frac{1}{K}\sum_{k=1}^{K} \eta_k : \quad \mathrm{pdf}(\eta) = \frac{K^K \eta^{K-1} \exp(-K\eta)}{\Gamma(K)} . \tag{3} \]
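The averaged noise model (2)-(3) can be checked by simulation. The sketch below draws η as the mean of K independent exponential variables with μ = 1 and estimates its first two moments; the Gamma law in (3) has mean 1 and variance 1/K. The function names, the sample size and the seed are choices made for this illustration only.

```python
import random

def averaged_speckle(K, rng):
    """One draw of eta: the mean of K independent exponential(mu = 1) variables."""
    return sum(rng.expovariate(1.0) for _ in range(K)) / K

def sample_moments(K, n_samples=20000, seed=0):
    """Empirical mean and variance of eta over n_samples draws."""
    rng = random.Random(seed)
    samples = [averaged_speckle(K, rng) for _ in range(n_samples)]
    mean = sum(samples) / n_samples
    var = sum((s - mean) ** 2 for s in samples) / n_samples
    return mean, var

# For K = 10 the Gamma law in (3) predicts mean 1 and variance 1/K = 0.1.
mean, var = sample_moments(K=10)
```

A noisy observation in the sense of (2) is then obtained by multiplying each pixel of S0 by an independent draw of η.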

Various adaptive filters have been proposed, see e.g. [31, 17]: they work well when the noise is moderate or weak, i.e. for K large. Bayesian, variational or diffusion-based methods have been proposed as well; see e.g. [28, 24, 18, 2]. Numerous methods convert the multiplicative noise into additive noise by
\[ v = \log S = \log S_0 + \log \eta = u_0 + n, \tag{4} \]
see e.g. [16, 30, 1, 23]. Then the pdf of n reads (cf. Fig. 1(c)):
\[ n = \log \eta : \quad \mathrm{pdf}(n) = \frac{K^K \exp\big(K(n - e^{n})\big)}{\Gamma(K)} . \tag{5} \]

One can prove that $E[n] = \psi_0(K) - \log K$ and $\mathrm{Var}[n] = \psi_1(K)$, where $\psi_k(z) = \left(\frac{d}{dz}\right)^{k+1} \log \Gamma(z)$ is the polygamma function. A common strategy is to decompose the log-data v into a multiscale frame for $L^2(\mathbb{R}^2)$ (an over-complete basis), say $W \equiv \{w_i : i \in I\}$ where I is a set of indexes:
\[ y = Wv = Wu_0 + Wn . \tag{6} \]
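The moments of the log-noise quoted above are easy to evaluate when K is a positive integer, since the polygamma functions then reduce to finite sums: $\psi_0(K) = -\gamma_E + \sum_{k=1}^{K-1} 1/k$ and $\psi_1(K) = \pi^2/6 - \sum_{k=1}^{K-1} 1/k^2$, with $\gamma_E$ the Euler-Mascheroni constant. The following sketch uses these identities; the function names are hypothetical and the restriction to integer K is an assumption of the example.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def digamma_int(K):
    """psi_0(K) for a positive integer K."""
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, K))

def trigamma_int(K):
    """psi_1(K) for a positive integer K."""
    return math.pi ** 2 / 6.0 - sum(1.0 / k ** 2 for k in range(1, K))

def log_noise_moments(K):
    """Return (E[n], Var[n]) for n = log(eta) with eta as in (3)."""
    return digamma_int(K) - math.log(K), trigamma_int(K)

mean_n, var_n = log_noise_moments(10)
```

For K = 10 this gives a slightly negative mean (about -0.0508) and a variance of about 0.1052, so the log-noise is not centered, which is the reason a bias correction is needed after exponentiation.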

By the Central Limit Theorem, the noise Wn in y is nearly Gaussian (cf. Fig. 1(d)). Then coefficients y are denoised using shrinkage estimators $T : \mathbb{R} \to \mathbb{R}$:
\[ y_T[i] = T\big((Wv)[i]\big), \quad \forall i \in I. \tag{7} \]
Shrinkage functions designed for multiplicative noise were proposed e.g. in [30, 1]. Let $\widetilde{W} \equiv \{\widetilde{w}_i : i \in I\}$ be a left inverse of W. Then a denoised log-image $v_T$ reads
\[ v_T = \sum_{i \in I} T\big((Wv)[i]\big)\, \widetilde{w}_i = \sum_{i \in I} T(y[i])\, \widetilde{w}_i . \tag{8} \]

Then the sought-after image is of the form $S_T = \exp v_T$.

Fig. 1. Noise distributions: (a) $\eta_k$; (b) $\eta = \frac{1}{K}\sum_{k=1}^{K} \eta_k$; (c) $n = \log \eta$; (d) $Wn$.

Our approach. We apply (4) and consider a tight-frame transform of the log-data. The restored log-image (section 2) minimizes a criterion composed of an $\ell^1$-fitting to the (suboptimally) hard-thresholded frame coefficients and a Total Variation (TV) regularization in the image domain. The minimization (section 3) uses a specialized Douglas-Rachford splitting. The full algorithm, involving a bias correction, is given in section 4. Experiments are presented in section 5.

Some notations. $(\cdot)^T$ means transposed, $(\cdot)^*$ means convex conjugate and $(\cdot)^\star$ means adjoint.

2 Restoration of the Log-Image

Here we consider how to restore a good log-image given data $v : \Omega \to \mathbb{R}$ obtained using (4). We focus on methods which, for a given preprocessed data set, lead to convex optimization problems. We comment only on variational methods and shrinkage estimators since they underlie our specialized hybrid objective function.

2.1 Drawbacks of Shrinkage Restoration and Variational Methods

Shrinkage restoration. The main problem with these methods, sketched in (7)-(8), is that shrinking large coefficients entails an erosion of the spiky features, while shrinking small coefficients yields Gibbs-like oscillations in the vicinity of edges and a loss of details in the textured areas. On the other hand, if shrinkage is insufficient, some coefficients bearing mainly noise can remain almost unchanged (we call such coefficients outliers), and (8) shows that they yield artifacts with the shape of the functions $\widetilde{w}_i$, see Fig. 2. Even though various improvements were brought, these artifacts remain visible; see the results in Fig. 3(d) and Fig. 4(c) in Section 5 using the very recent Stein-block thresholding [8].

Fig. 2. (a) Noisy Lena obtained according to (1)-(2) for K = 10. (b)-(d) Restorations $\exp v_{T_H}$ where data v are denoised by hard-thresholding of its curvelet coefficients, see (12)-(13), for different choices of T: (b) $T = 2\sqrt{\mathrm{Var}[n]}$, (c) $T = 4\sqrt{\mathrm{Var}[n]}$, (d) $T = 6\sqrt{\mathrm{Var}[n]}$.

Fig. 3. Restoration of (b) using modern methods. (a) Original (256 × 256). (b) Noisy: μ = 1, K = 10, see (2)-(3). (c) $\hat{u}$ by (11) and $\hat{S}$ by (34): PSNR = 26.2 dB, MAE = 8.5. (d) Stein-block [8]: PSNR = 25.5 dB, MAE = 9.4. (e) AA algorithm [2]: PSNR = 25.4 dB, MAE = 9.4. (f) Our method: PSNR = 26.05 dB, MAE = 8.8. Note that (c) is a slightly improved version of [26] and that the restoration in (d) is done in the curvelet domain.

Fig. 4. (a) Original Shepp-Logan phantom (256 × 256). (b) Noisy, K = 10. (c) Denoised with Stein-block thresholding in the curvelet domain [8]: PSNR = 24.73 dB, MAE = 4. (d) Denoised with our algorithm: PSNR = 31.25 dB, MAE = 1.87.

Variational methods. In these methods, the restored function minimizes a criterion $F_v$ of the form
\[ F_v(u) = \rho \int_\Omega \psi\big(u(t), v(t)\big)\, dt + \int_\Omega \varphi(|\nabla u(t)|)\, dt, \tag{9} \]
where $\psi : \mathbb{R}_+ \to \mathbb{R}_+$ measures closeness to data and $\varphi(|\nabla u(\cdot)|)$ introduces priors via a trade-off parameter ρ > 0. A classical choice is $\psi = \big(u(\cdot) - v(\cdot)\big)^2$. It is usually required that the potential function $\varphi : \mathbb{R}_+ \to \mathbb{R}_+$ promotes images involving edges. Analysing the minimizers of $F_v$ as solutions of PDEs on Ω, Rudin, Osher and Fatemi [25] showed that $\varphi(|\nabla u(t)|) = |\nabla u(t)|$ leads to such images, where for any $z(t) = (z_1(t), z_2(t)) \in \mathbb{R}^2$, $t \in \Omega$, one sets $|z(t)| \overset{\mathrm{def}}{=} \sqrt{z_1(t)^2 + z_2(t)^2}$. The resulting regularization term is known as Total Variation (TV) and will be denoted by $\|\cdot\|_{TV}$. However, whatever smooth data-fitting is chosen, this regularization yields images containing numerous constant regions (the so-called staircasing effect), hence textures and fine details are removed, see [22]. The method in [2] is of this kind and operates in the image domain; the fitting term is derived from (3) and the denoised image $\hat{S}$, defined by
\[ \hat{S} = \arg\min_{\Sigma} F_S \quad \text{for} \quad F_S(\Sigma) = \rho(K) \int_\Omega \Big( \log \Sigma(t) + \frac{S(t)}{\Sigma(t)} \Big)\, dt + \|\Sigma\|_{TV}, \tag{10} \]
exhibits constant regions (see section 5). In [26], the regularization $\|\Sigma\|_{TV}$ is changed into $\|\log \Sigma\|_{TV}$ so as to reformulate the model as a convex problem, and not to over-smooth the image parts with higher gray values. To recover the denoised image, we applied $\hat{S} \propto \exp(\hat{u})$ for
\[ \hat{u} = \arg\min_{u} F_v(u) \quad \text{where} \quad F_v(u) = \rho\, \|u - v\|^2 + \|u\|_{TV} . \tag{11} \]

Following [25], various edge-preserving convex functions ϕ have been proposed; see [3] for a recent overview. Even though $\varphi'(0) = 0$ alleviates staircasing, a systematic drawback of the resulting restored images is that the amplitude of edges is underestimated; thus neat edges or spiky areas are subjected to erosion.

2.2 Hybrid Methods

Hybrid methods, see e.g. [9, 19, 5, 14], combine the information contained in the large coefficients y[i] obtained according to (6) with priors directly on the image u. They amount to defining the restored function $\hat{u}$ by
\[ \text{minimize } \Phi(u) \quad \text{subject to} \quad \hat{u} \in \big\{ u : |(W(u - v))[i]| \le \mu_i, \ \forall i \in I \big\} . \]
Using an edge-preserving regularization such as $\Phi = \|\cdot\|_{TV}$ is a pertinent choice. The selection of the parameters $\{\mu_i\}_{i \in I}$ is trickier. This choice must take into account the magnitude of the relevant data coefficient y[i]. However, choosing $\mu_i$ based solely on y[i], as done in these papers, is too rigid since there are either correct data coefficients that incur smoothing ($\mu_i > 0$), or noisy coefficients that are left unchanged ($\mu_i = 0$). A good compromise that we adopt is to determine $(\mu_i)_{i \in I}$ based both on the data and on the prior term.

2.3 A Specialized Hybrid Criterion

Given the log-data v obtained by (4), we apply a frame transform as in (6) to get $y = Wv = Wu_0 + Wn$. The noise contained in the i-th datum reads $\langle n, w_i \rangle$. The low frequency approximation coefficients carry important information on the image. Therefore, a good choice is to keep them intact at this stage. Let $I_* \subset I$ denote the subset of all such elements of the frame. Then we apply a hard-thresholding operator $T_H$ [12] to all coefficients indexed by $I \setminus I_*$:
\[ y_{T_H}[i] \overset{\mathrm{def}}{=} T_H\big(y[i]\big), \ \forall i \in I \setminus I_*, \quad \text{where} \quad T_H(t) = \begin{cases} 0 & \text{if } |t| \le T, \\ t & \text{otherwise,} \end{cases} \tag{12} \]
where T is an underoptimal threshold in order to preserve the information relevant to edges and to some fine details in textured areas, contained in the small coefficients. Let us consider
\[ v_{T_H} = \sum_{i \in I_1} (Wv)[i]\, \widetilde{w}_i, \quad \text{where} \quad I_1 = \{ i \in I : |y[i]| > T \} \cup I_* . \tag{13} \]
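The thresholding step (12) and the index set of (13) can be sketched on a short list of coefficients. In this illustration `keep` plays the role of $I_*$ (the low-frequency coefficients left intact); the function names and the sample values are hypothetical.

```python
def hard_threshold(y, T, keep):
    """Apply T_H of (12): zero every coefficient with |y[i]| <= T,
    except those whose index is in `keep` (the set I_*)."""
    return [v if (i in keep or abs(v) > T) else 0.0 for i, v in enumerate(y)]

def split_indices(y, T, keep):
    """Return I_1 as in (13) together with its complement in I."""
    I1 = {i for i, v in enumerate(y) if abs(v) > T or i in keep}
    I0 = set(range(len(y))) - I1
    return I1, I0

# Index 3 is treated as a low-frequency coefficient and is never thresholded.
y = [5.0, 0.3, -2.5, 0.05, -0.4]
keep = {3}
y_th = hard_threshold(y, 1.0, keep)
I1, I0 = split_indices(y, 1.0, keep)
```

Note that, unlike soft shrinkage, the surviving coefficients are returned unchanged, which is the property the criterion introduced next relies on.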

The image $v_{T_H}$ contains a lot of artifacts with the shape of the $\widetilde{w}_i$ for those y[i] that are noisy but above the threshold T, as well as information on the fine details in the original log-image $u_0$. In all cases, whatever the choice of T, an image of the form $v_{T_H}$ is unsatisfactory, see Fig. 2. The denoised coefficients, denoted by $\hat{x}$, are obtained based on the under-thresholded data $y_{T_H}$. We focus on hybrid methods of the form: $\hat{x} = \arg\min_x F(x)$ for $F(x) = \Psi(x, y_{T_H}) + \Phi(\widetilde{W}x)$, where Ψ is a data-fitting term in the frame domain and Φ is an edge-preserving regularization term in the log-image domain. Let us denote
\[ I_0 = I \setminus I_1 = \{ i \in I \setminus I_* : |y[i]| \le T \} . \tag{14} \]
Coefficients y[i] for $i \in I_0$ can be of two types.
1. Coefficients y[i] bearing mainly noise; then the best choice is $\hat{x}[i] = 0$.
2. Coefficients y[i] relevant to edges and other details in $u_0$. Since y[i] is difficult to distinguish from the noise, the relevant $\hat{x}[i]$ should be restored using the edge-preserving prior Φ. Note that a careful restoration must find a nonzero $\hat{x}[i]$ in order to avoid Gibbs-like oscillations in $\hat{u}$.
Coefficients y[i] for $i \in I_1$ are of the following two types.
1. Large coefficients which carry the main features of the sought-after function. They verify $y[i] \approx \langle w_i, u_0 \rangle$ and can be kept intact.
2. Coefficients highly contaminated by noise, i.e. $|y[i]| \gg |\langle w_i, u_0 \rangle|$. We call them outliers because if we had $\hat{x}[i] = y[i]$, then $\hat{u}$ would contain an artifact with the shape of $\widetilde{w}_i$ since by (13) we get $v_{T_H} = \sum_{j \ne i} \hat{x}[j]\, \widetilde{w}_j + y[i]\, \widetilde{w}_i$. Instead, $\hat{x}[i]$ must be restored according to the prior Φ.
This analysis clearly defines the goals that the minimizer $\hat{x}$ of F is expected to achieve: restored coefficients $\hat{x}[i]$ have to fit $y_{T_H}[i]$ exactly if they are coherent with the prior Φ, otherwise they have to be restored according to Φ. Since [21] it is known that such requirements can be satisfied by criteria F where Ψ is non-smooth at the origin (e.g. $\ell^1$), see also [13]. For these reasons, we focus on
\[ F(x) = \Psi(x) + \Phi(x), \tag{15} \]

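where, for $\Lambda = \mathrm{diag}(\lambda_i)_{i \in I}$,
\[ \Psi(x) = \sum_{i \in I_1 \cup I_*} \lambda_i\, |(x - y)[i]| + \sum_{i \in I_0} \lambda_i\, |x[i]| = \|\Lambda (x - y_{T_H})\|_1 , \tag{16} \]
\[ \Phi(x) = \int_\Omega |\nabla \widetilde{W} x|\, ds = \|\widetilde{W} x\|_{TV} . \tag{17} \]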

In the pre-processing step (12) we do not recommend the use of a shrinkage function other than $T_H$ since it will alter all the data coefficients without restoring them faithfully. Via $T_H$, we base our restoration on data $y_{T_H}$ where all non-thresholded coefficients keep the original information on the sought-after image. The theorem stated next addresses the existence and the uniqueness of a minimizer for F. Given y, let $G_y$ be the (convex) set of all minimizers of F:
\[ G_y \overset{\mathrm{def}}{=} \Big\{ \hat{x} \in \ell^2(I) : F(\hat{x}) = \min_{x \in \ell^2(I)} F(x) \Big\} . \tag{18} \]

Theorem 1. [13] For $y \in \ell^2(I)$ and T > 0 given, consider F as defined in (15), where $\Omega \subset \mathbb{R}^2$ is open, bounded and its boundary ∂Ω is Lipschitz. Suppose that $\{w_i\}_{i \in I}$ is a frame of $L^2(\Omega)$ and the operator $\widetilde{W}$ is the pseudo-inverse of W. Assume also that $\lambda_{\min} = \min_{i \in I} \lambda_i > 0$. Then $G_y$ is nonempty, and for all $\hat{x}_1, \hat{x}_2 \in G_y$, $\nabla \widetilde{W} \hat{x}_1 \propto \nabla \widetilde{W} \hat{x}_2$, a.e. on Ω.

In words, $\hat{S}_1 = \widetilde{W} \hat{x}_1$ and $\hat{S}_2 = \widetilde{W} \hat{x}_2$ have the same level lines, i.e. they differ by a local change of contrast; the latter is usually invisible to the naked eye. The choice of $\lambda_i$ is investigated in [13]. Following this analysis, we use only two values for $\lambda_i$, depending only on the set I the index i belongs to. We focus on curvelet transforms of the log-data because (a) such a transform captures efficiently the main features of the data and (b) it is a tight frame, which is helpful for the subsequent numerical stage.

3 Minimization for the Log-Image

Let $\Gamma_0(\mathcal{H})$ denote the class of proper lower-semicontinuous convex functions on a Hilbert space $\mathcal{H}$. Now we focus on the minimization problem
\[ \text{find } \hat{x} \text{ such that } F(\hat{x}) = \min_x F \quad \text{for} \quad F = \Psi + \Phi, \tag{19} \]
where Ψ and Φ are defined in (16)-(17). Clearly, $\Psi, \Phi \in \Gamma_0(\ell^2(I))$, hence $F \in \Gamma_0(\ell^2(I))$. The set $G_y$ in (18) is non-empty by Theorem 1 and can be rewritten as $G_y = \{ x \in \ell^2(I) : x \in (\partial F)^{-1}(0) \}$, where ∂F stands for the subdifferential. Minimizing F amounts to finding a solution to the fixed point equation
\[ x = (\mathrm{Id} + \gamma \partial F)^{-1}(x), \tag{20} \]
where $(\mathrm{Id} + \gamma \partial F)^{-1}$ is the resolvent operator associated to ∂F, γ > 0 is the proximal stepsize and Id is the identity map on $\ell^2(I)$. Since $(\mathrm{Id} + \gamma(\partial\Psi + \partial\Phi))^{-1}$ cannot be calculated in closed form, we focus on splitting methods that use separately the resolvent operators $(\mathrm{Id} + \gamma \partial\Psi)^{-1}$ and $(\mathrm{Id} + \gamma \partial\Phi)^{-1}$.

3.1 Specialized Douglas-Rachford (D-R) Splitting Algorithm

The D-R family is the most general class of monotone operator splitting methods. Given a sequence $\mu_t \in (0, 2)$, D-R methods can be expressed via the recursion
\[ x^{(t+1)} = \Big[ \Big(1 - \frac{\mu_t}{2}\Big) \mathrm{Id} + \frac{\mu_t}{2} \big(2(\mathrm{Id} + \gamma\partial\Psi)^{-1} - \mathrm{Id}\big) \circ \big(2(\mathrm{Id} + \gamma\partial\Phi)^{-1} - \mathrm{Id}\big) \Big] x^{(t)} . \tag{21} \]
Since problem (19) has solutions, we have the following convergence result:

Theorem 2. Let γ > 0 and $\mu_t \in (0, 2)$ be such that $\sum_{t \in \mathbb{N}} \mu_t (2 - \mu_t) = +\infty$. Take $x^{(0)} \in \ell^2(I)$ and consider the sequence of iterates defined by (21). Then, $(x^{(t)})_{t \in \mathbb{N}}$ converges weakly to some point $\hat{x} \in \ell^2(I)$ and $(\mathrm{Id} + \gamma\partial\Phi)^{-1}(\hat{x}) \in G_y$.

The statement follows from [10, Corollary 5.2]. The sequence $\mu_t = 1$, $\forall t \in \mathbb{N}$, fits.

3.2 Proximal Calculus

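Proximity operators, introduced in [20], generalize convex projection.

Definition 1 (Moreau [20]). Let $\varphi \in \Gamma_0(\mathcal{H})$. Then $\forall x \in \mathcal{H}$ the function $z \mapsto \varphi(z) + \|x - z\|^2 / 2$, for $z \in \mathcal{H}$, achieves its infimum at a unique point denoted by $\mathrm{prox}_\varphi x$. The relevant operator $\mathrm{prox}_\varphi : \mathcal{H} \to \mathcal{H}$ is the proximity operator of ϕ.

By the minimality condition for $\mathrm{prox}_\varphi$, it is easy to see that $\forall x, p \in \mathcal{H}$ we have $p = \mathrm{prox}_\varphi x \iff x - p \in \partial\varphi(p) \iff p = (\mathrm{Id} + \partial\varphi)^{-1} x$, hence $(\mathrm{Id} + \partial\varphi)^{-1} = \mathrm{prox}_\varphi$. By introducing the reflection operator $\mathrm{rprox}_\varphi \overset{\mathrm{def}}{=} 2\,\mathrm{prox}_\varphi - \mathrm{Id}$, the D-R iteration (21) reads
\[ x^{(t+1)} = \Big[ \Big(1 - \frac{\mu_t}{2}\Big) \mathrm{Id} + \frac{\mu_t}{2}\, \mathrm{rprox}_{\gamma\Psi} \circ \mathrm{rprox}_{\gamma\Phi} \Big] x^{(t)} . \tag{22} \]

Proximity operator of Ψ.

Lemma 1. Let $x \in \ell^2(I)$. Then $\mathrm{prox}_{\gamma\Psi}(x) = \big( y_{T_H}[i] + T_S^{\gamma\lambda_i}(x[i] - y_{T_H}[i]) \big)_{i \in I}$, where $T_S^{\gamma\lambda_i}(z[i]) = \max\big(0, |z[i]| - \gamma\lambda_i\big)\, \mathrm{sign}(z[i])$.

The proof is quite standard and can be found in our Report [15]. Note that
\[ \mathrm{rprox}_{\gamma\Psi}(x) = 2 \big( y_{T_H}[i] + T_S^{\gamma\lambda_i}(x[i] - y_{T_H}[i]) \big)_{i \in I} - x . \tag{23} \]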
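Lemma 1 and (23) translate directly into a few lines of code: the proximity operator of γΨ is a soft-thresholding of x centered at the thresholded data. The sketch below works on plain lists of coefficients; the function names and the sample values are illustrative only.

```python
def soft_threshold(z, tau):
    """T_S^tau(z) = max(0, |z| - tau) * sign(z)."""
    if z > tau:
        return z - tau
    if z < -tau:
        return z + tau
    return 0.0

def prox_psi(x, y_th, lam, gamma):
    """prox of gamma * Psi from Lemma 1, applied coefficient by coefficient."""
    return [y + soft_threshold(xi - y, gamma * l)
            for xi, y, l in zip(x, y_th, lam)]

def rprox_psi(x, y_th, lam, gamma):
    """Reflection operator (23): 2 * prox(x) - x."""
    p = prox_psi(x, y_th, lam, gamma)
    return [2.0 * pi - xi for pi, xi in zip(p, x)]

x = [3.0, 0.2, -1.0]
y_th = [1.0, 0.0, 0.0]
lam = [1.0, 1.0, 0.5]
p = prox_psi(x, y_th, lam, gamma=0.5)
r = rprox_psi(x, y_th, lam, gamma=0.5)
```

The first coefficient is pulled toward its datum 1.0 by γλ = 0.5, the second is close enough to its datum 0 to be set exactly to it, which is the exact-fit behavior of the non-smooth $\ell^1$ term discussed in section 2.3.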

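Proximity operator of Φ. Clearly, $\Phi(x) = \|\cdot\|_{TV} \circ \widetilde{W}(x)$. Computing $\mathrm{prox}_{\gamma\Phi}$ for an arbitrary $\widetilde{W}$ may be intractable. We assume that
(w1) $\widetilde{W} : \ell^2(I) \to L^2(\Omega)$ is surjective;
(w2) $\widetilde{W} W = \mathrm{Id}$ and $\widetilde{W} = c^{-1} W^\star$ for $0 < c < \infty$; note that $W^\star W = c\, \mathrm{Id}$;
(w3) $\widetilde{W}$ is bounded.

Let $\mathcal{X} = L^2(\Omega) \times L^2(\Omega)$, $\langle \cdot, \cdot \rangle_{\mathcal{X}}$ be the inner product in $\mathcal{X}$ and $\|\cdot\|_p$, $p \in [1, \infty]$, the $L^p$-norm on $\mathcal{X}$. Define $B_\infty^\gamma(\mathcal{X})$ as the closed $L^\infty$-ball of radius γ in $\mathcal{X}$,
\[ B_\infty^\gamma(\mathcal{X}) \overset{\mathrm{def}}{=} \{ z \in \mathcal{X} : \|z\|_\infty \le \gamma \} = \{ z = (z_1, z_2) \in \mathcal{X} : |z(t)| \le \gamma, \ \forall t \in \Omega \}, \]
and $P_{B_\infty^\gamma(\mathcal{X})} : \mathcal{X} \to B_\infty^\gamma(\mathcal{X})$ the associated projector.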

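Lemma 2. Let $x \in \ell^2(I)$ and let $B_\infty^\gamma(\mathcal{X})$ be as defined above. Then:
\[ \text{(i)} \quad \mathrm{prox}_{\gamma\Phi}(x) = \big( \mathrm{Id} - W \circ (\mathrm{Id} - \mathrm{prox}_{c^{-1}\gamma\|\cdot\|_{TV}}) \circ \widetilde{W} \big)(x) ; \tag{24} \]
\[ \text{(ii)} \quad \mathrm{prox}_{c^{-1}\gamma\|\cdot\|_{TV}}(u) = u - P_C(u), \tag{25} \]
\[ \text{where} \quad C = \big\{ \mathrm{div}(z) \in L^2(\Omega) : z \in C_c^\infty(\Omega \times \Omega), \ z \in B_\infty^{\gamma/c}(\mathcal{X}) \big\} . \tag{26} \]

Sketch of the proof. By (w1), $\mathrm{range}(\widetilde{W}) = L^2(\Omega)$. Using that $\mathrm{domain}(\|\cdot\|_{TV}) = L^2(\Omega)$, we find $\mathrm{cone}\big(\mathrm{dom}\,\|\cdot\|_{TV} - \mathrm{range}\,\widetilde{W}\big) = \{0\}$. Statement (i) follows from applying [11, Proposition 11] whose requirements are satisfied. If $\varphi \in \Gamma_0(L^2(\Omega))$ and $\varphi^*$ is its convex conjugate, the Moreau decomposition [20, Proposition 4.a] asserts
\[ \mathrm{prox}_\varphi + \mathrm{prox}_{\varphi^*} = \mathrm{Id} . \tag{27} \]
Since the conjugate function of a norm is the indicator function of the ball of its dual norm, $\big(c^{-1}\gamma\|\cdot\|_{TV}\big)^*(z) = 0$ if $z \in C$, $+\infty$ if $z \notin C$, where C is given in (26). Using Definition 1, $\mathrm{prox}_{(c^{-1}\gamma\|\cdot\|_{TV})^*} = P_C$. Identifying $c^{-1}\gamma\|\cdot\|_{TV}$ with ϕ and $\big(c^{-1}\gamma\|\cdot\|_{TV}\big)^*$ with $\varphi^*$, equation (27) leads to (ii)¹. From (24)-(25) we easily find that
\[ \mathrm{rprox}_{\gamma\Phi}(x) = \big( \mathrm{Id} - 2\, W \circ P_C \circ \widetilde{W} \big)(x) . \tag{28} \]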

¹ Note that our argument (27) to compute $\mathrm{prox}_{c^{-1}\gamma\|\cdot\|_{TV}}(u)$ is not used in [6], which instead uses conjugates and bi-conjugates of the objective function.

Calculation of the projection $P_C$ in (25) on a discrete grid. In this case, W is an M × N tight frame with M = #I and N = #Ω, and assumption (w2) reads $\widetilde{W} W = \mathrm{Id}$ and $\widetilde{W} = c^{-1} W^T$, $c \in (0, \infty)$ (hence $W^T W = c\, \mathrm{Id}$). The discrete counterpart of $\mathcal{X}$ is $\mathcal{X} = \ell^2(\Omega) \times \ell^2(\Omega)$. We denote the discrete gradient by $\ddot{\nabla}$ (cf. [6] or [29]) and the discrete divergence $\mathrm{Div} : \mathcal{X} \to \ell^2(\Omega)$ is defined as $\mathrm{Div} = -\ddot{\nabla}^\star$. Moreover, C in (26) admits a simpler expression:
\[ C = \big\{ \mathrm{Div}(z) \in \ell^2(\Omega) : z \in B_\infty^{\gamma/c}(\mathcal{X}) \big\}, \tag{29} \]
where $B_\infty^{\gamma/c}(\mathcal{X})$ is defined using the new discrete notations. The projection $P_C$ in (25) does not admit an explicit form so we provide an iterative scheme for its calculation in the next lemma.

Lemma 3. We adapt all assumptions of Lemma 2 to the new discrete setting, as explained above. Consider the forward-backward iteration
\[ z^{(t+1)} = P_{B_\infty^1(\mathcal{X})} \Big( z^{(t)} + \beta_t\, \ddot{\nabla} \big( \mathrm{Div}(z^{(t)}) - c\,u/\gamma \big) \Big) \tag{30} \]
\[ \text{for} \quad 0 < \inf_t \beta_t \le \sup_t \beta_t < 1/4, \tag{31} \]
\[ \text{where } \forall (i, j) \in \Omega, \quad P_{B_\infty^1(\mathcal{X})}(z)[i, j] = \begin{cases} z[i, j] & \text{if } |z[i, j]| \le 1; \\ z[i, j] / |z[i, j]| & \text{otherwise.} \end{cases} \tag{32} \]
Then
(i) $(z^{(t)})_{t \in \mathbb{N}}$ converges to a point $\hat{z} \in B_\infty^1(\mathcal{X})$;
(ii) $\big( c^{-1}\gamma\, \mathrm{Div}(z^{(t)}) \big)_{t \in \mathbb{N}}$ converges to $c^{-1}\gamma\, \mathrm{Div}(\hat{z}) = (\mathrm{Id} - \mathrm{prox}_{c^{-1}\gamma\|\cdot\|_{TV}})(u)$.
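The iteration of Lemma 3 can be tried on a one-dimensional signal, where the dual variable is a scalar per grid edge and the projection (32) reduces to clipping to [-1, 1]. The sketch below is a 1D simplification written for this illustration, not the paper's 2D implementation: `tau` stands for γ/c, the forward-difference gradient and its negative adjoint are assumptions of the example, and a constant step β = 0.24 is used.

```python
def grad(u):
    """Forward differences: (grad u)[i] = u[i+1] - u[i]."""
    return [u[i + 1] - u[i] for i in range(len(u) - 1)]

def div(z):
    """Negative adjoint of grad, with zero values outside the grid."""
    n = len(z) + 1
    out = []
    for i in range(n):
        right = z[i] if i < n - 1 else 0.0
        left = z[i - 1] if i > 0 else 0.0
        out.append(right - left)
    return out

def prox_tv_1d(u, tau, n_iter=300, beta=0.24):
    """prox of tau * TV at u, computed with the dual iteration (30)."""
    z = [0.0] * (len(u) - 1)
    for _ in range(n_iter):
        d = div(z)
        residual = [d[i] - u[i] / tau for i in range(len(u))]
        g = grad(residual)
        # Projection (32): in 1D it is a clip of each component to [-1, 1].
        z = [max(-1.0, min(1.0, z[i] + beta * g[i])) for i in range(len(z))]
    d = div(z)
    # Lemma 3 (ii): tau * Div(z) approximates u - prox(u).
    return [u[i] - tau * d[i] for i in range(len(u))]

step = [0.0, 0.0, 0.0, 4.0, 4.0, 4.0]
denoised = prox_tv_1d(step, tau=1.0)
```

For this step signal the exact TV proximal point keeps the jump and moves each three-sample plateau toward the other by tau/3, i.e. to 1/3 and 11/3, which illustrates the loss of contrast that TV regularization induces.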

The proof of this lemma can be found in our Report [15]. The iteration proposed in (30) to compute the proximity operator of the TV-norm is different from the projection algorithm of [6]. A similar iteration was proposed in [7] and in some other articles. The proof we gave is however simpler as it uses known properties of proximity operators. Note that computing $\mathrm{prox}_{\|\cdot\|_{TV}}$ amounts to solving a discrete ROF-denoising problem. Our iteration to solve this problem is one possibility among others, see e.g. a recent report [4]. A crucial property of the D-R scheme (22) is its robustness to numerical errors that may occur when computing the proximity operators $\mathrm{prox}_\Psi$ and $\mathrm{prox}_\Phi$, see [10]. More precisely, let $a_t \in \ell^2(I)$ be an error term that models the inexact computation of $\mathrm{prox}_{\gamma\Phi}$ in (24), as the latter is obtained through (30). If the sequence of error terms $(a_t)_{t \in \mathbb{N}}$ and stepsizes $(\mu_t)_{t \in \mathbb{N}}$ in Theorem 2 obey $\sum_{t \in \mathbb{N}} \mu_t \|a_t\| < +\infty$, then the D-R algorithm (22) converges [10, Corollary 6.2]. In our experiments, using 200 inner iterations in (30) is sufficient to satisfy this requirement.

3.3 Bias Correction to Recover the Sought-After Image

Recall from (4) that $u_0 = \log S_0$ and set $\hat{u} = \widetilde{W} \hat{x}^{(N_{DR})}$ as the estimator of $u_0$, where $N_{DR}$ is the number of D-R iterations in (22). Unfortunately, the estimator $\hat{u}$ is prone to bias, i.e. $E[\hat{u}] = u_0 - b_{\hat{u}}$. A problem that classically arises in statistical estimation is how to correct such a bias. More important is how this bias affects the estimate after applying the inverse transformation, here the exponential. Our goal is then to ensure that for the estimate $\hat{S}$ of the image, we have $E[\hat{S}] = S_0$. Expanding $e^{\hat{u}}$ in the neighborhood of $E[\hat{u}]$, we have
\[ E\big[e^{\hat{u}}\big] = \exp(E[\hat{u}]) \big(1 + \mathrm{Var}[\hat{u}]/2 + R_2\big) = S_0 \exp(-b_{\hat{u}}) \big(1 + \mathrm{Var}[\hat{u}]/2 + R_2\big), \tag{33} \]
where $R_2$ is the expectation of the Lagrange remainder in the Taylor series. One can observe that the posterior distribution of $\hat{u}$ is nearly symmetric, hence $R_2 \approx 0$. Then $b_{\hat{u}} \approx \log(1 + \mathrm{Var}[\hat{u}]/2)$ ensures unbiasedness. Consequently, finite sample (nearly) unbiased estimates of $u_0$ and $S_0$ are respectively $\hat{u} + \log(1 + \mathrm{Var}[\hat{u}]/2)$ and $\exp(\hat{u})\,(1 + \mathrm{Var}[\hat{u}]/2)$. $\mathrm{Var}[\hat{u}]$ can be reasonably estimated by $\psi_1(K)$, the variance of the noise n in (4) being given in (1). Thus, given the restored log-image $\hat{u}$, our denoised image reads:
\[ \hat{S} = \exp(\hat{u}) \big(1 + \psi_1(K)/2\big) . \tag{34} \]

4 Full Algorithm to Suppress Multiplicative Noise

Piecing together Lemmas 1 and 2, and Theorem 2, we write down the full multiplicative noise removal algorithm:

Task: Denoise an image S corrupted with multiplicative noise according to (2).

Parameters: The observed noisy image S, number of iterations $N_{DR}$ (Douglas-Rachford outer iterations) and $N_{FB}$ (Forward-Backward inner iterations), stepsizes $\mu_t \in (0, 2)$, $0 < \beta_t < 1/4$ and γ > 0, tight-frame transform W and initial threshold T (e.g. $T = 2\sqrt{\psi_1(K)}$), regularization parameters $\lambda_{0,1}$ associated to the sets $I_{0,1}$.

Specific operators:
(a) $T_S^{\gamma\lambda_i}(z) = \big( \max(0, |z[i]| - \gamma\lambda_i)\, \mathrm{sign}(z[i]) \big)_{i \in I}$, $\forall z \in \mathbb{R}^{\#I}$.
(b) $\forall (i, j) \in \Omega$, $P_{B_\infty^1(\mathcal{X})}(z)[i, j] = z[i, j]$ if $|z[i, j]| \le 1$, and $z[i, j] / |z[i, j]|$ else.
(c) $\ddot{\nabla}$ and Div, the discrete versions of the continuous operators ∇ and div.
(d) $\psi_1(\cdot)$ defined according to (1) (built-in Matlab function).

Initialization: Compute $v = \log S$ and transform coefficients $y = Wv$. Hard-threshold y at T to get $y_{T_H}$. Choose $x^{(0)}$.

Main iteration: For t = 0 to $N_{DR} - 1$,
(1) Inverse curvelet transform of $x^{(t)}$ according to $u^{(t)} = \widetilde{W} x^{(t)}$.
(2) Initialize $z^{(0)}$; for s = 0 to $N_{FB} - 1$: $z^{(s+1)} = P_{B_\infty^1(\mathcal{X})} \Big( z^{(s)} + \beta_t\, \ddot{\nabla} \big( \mathrm{Div}(z^{(s)}) - \frac{c}{\gamma} u^{(t)} \big) \Big)$.
(3) Set $z^{(t)} = z^{(N_{FB})}$ and compute $w^{(t)} = c^{-1}\gamma\, \mathrm{Div}(z^{(t)})$.
(4) Forward curvelet transform: $\alpha^{(t)} = W w^{(t)}$.
(5) Compute $r^{(t)} = \mathrm{rprox}_{\gamma\Phi}(x^{(t)}) = x^{(t)} - 2\alpha^{(t)}$.
(6) Find $q^{(t)} = \big(\mathrm{rprox}_{\gamma\Psi} \circ \mathrm{rprox}_{\gamma\Phi}\big)(x^{(t)}) = 2 \big( y_{T_H}[i] + T_S^{\gamma\lambda_i}(r^{(t)}[i] - y_{T_H}[i]) \big)_{i \in I} - r^{(t)}$.
(7) Update $x^{(t+1)}$: $x^{(t+1)} = (1 - \mu_t/2)\, x^{(t)} + (\mu_t/2)\, q^{(t)}$.

Output: Denoised image $\hat{S} = \exp\big(\widetilde{W} x^{(N_{DR})}\big) \big(1 + \psi_1(K)/2\big)$.
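The structure of the outer loop, steps (5) to (7), can be illustrated on a scalar toy problem where both proximity operators are available in closed form. In the sketch below the TV term is replaced by the quadratic $\frac{1}{2}(u - b)^2$; this stand-in, the function names and the parameter values are assumptions made purely for illustration and do not reproduce the paper's curvelet/TV setting.

```python
def soft_threshold(z, tau):
    """max(0, |z| - tau) * sign(z)."""
    return max(0.0, abs(z) - tau) * (1.0 if z > 0 else -1.0)

def rprox_psi(r, y, lam, gamma):
    """Step (6): reflection of the prox of gamma * lam * |. - y|."""
    return 2.0 * (y + soft_threshold(r - y, gamma * lam)) - r

def prox_phi(x, b, gamma):
    """Prox of gamma * 0.5 * (. - b)^2, a smooth stand-in for the TV term."""
    return (x + gamma * b) / (1.0 + gamma)

def douglas_rachford(y, b, lam, gamma=0.5, mu=1.0, n_iter=200):
    """Minimize lam * |u - y| + 0.5 * (u - b)^2 with the D-R recursion (22)."""
    x = 0.0
    for _ in range(n_iter):
        r = 2.0 * prox_phi(x, b, gamma) - x          # step (5): rprox of gamma*Phi
        q = rprox_psi(r, y, lam, gamma)              # step (6)
        x = (1.0 - mu / 2.0) * x + (mu / 2.0) * q    # step (7)
    # As in Theorem 2, the minimizer is the prox of gamma*Phi at the limit point.
    return prox_phi(x, b, gamma)

u_far = douglas_rachford(y=0.0, b=3.0, lam=1.0)
u_near = douglas_rachford(y=0.0, b=0.5, lam=1.0)
```

The exact minimizer of this toy criterion is $y + T_S^{\lambda}(b - y)$, i.e. 2 in the first call and exactly the datum 0 in the second, which the iteration reproduces.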

5 Experiments

In all experiments, our algorithm was run using the second-generation curvelet tight frame along with the following set of parameters: $\forall t$, $\mu_t \equiv 1$, $\beta_t = 0.24$, γ = 10 and $N_{DR} = 50$. The initial threshold T was set to $2\sqrt{\psi_1(K)}$. For comparison purposes, some very recent multiplicative noise removal algorithms from the literature are considered: the AA algorithm [2] minimizing the criterion in (10), and the Stein-block denoising method [8] in the curvelet domain, applied on the log-transformed image. The latter is a sophisticated shrinkage-based denoiser that thresholds the coefficients by blocks rather than individually, and has been shown to be nearly minimax over a large class of images in the presence of various additive bounded noises. We also tried the L2-TV method where the restored log-image $\hat{u}$ minimizes (11) and the denoised image $\hat{S}$ involves the bias correction (34). Thanks to the bias correction, it can be seen as an improved version of the first method proposed in the recent Report [26, § 4.1]. For fair comparison, the hyperparameters for all competitors were tweaked to reach their best level of performance on each noisy realization.


The denoising algorithms were tested on two images: Lena and Boat, both of size 256×256 and gray-scale in the range [1, 256]. For each image, a noisy observation is generated by multiplying the original image by a realization of noise according to (2)-(3) for K = 10. The running time of our denoising method is 1 minute 3 seconds for 50 iterations on an Intel 2.5 GHz Core Duo. The denoising performance of any algorithm is measured in terms of peak signal-to-noise ratio (PSNR) and mean absolute deviation (MAE), namely
\[ \mathrm{PSNR} = 20 \log_{10} \frac{\sqrt{N}\, \|S_0\|_\infty}{\|\hat{S} - S_0\|_2}\ \mathrm{dB} \quad \text{and} \quad \mathrm{MAE} = \frac{\|\hat{S} - S_0\|_1}{N} . \]
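The two quality measures above are straightforward to compute. The sketch below treats an image as a flat list of N pixel values; the function names and the tiny test vectors are illustrative only.

```python
import math

def psnr(restored, original):
    """20 * log10( sqrt(N) * ||S0||_inf / ||S_hat - S0||_2 ), in dB."""
    n = len(original)
    peak = max(abs(v) for v in original)
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(restored, original)))
    return 20.0 * math.log10(math.sqrt(n) * peak / err)

def mae(restored, original):
    """||S_hat - S0||_1 / N."""
    return sum(abs(a - b) for a, b in zip(restored, original)) / len(original)

s0 = [1.0, 2.0, 3.0, 4.0]
s_hat = [1.0, 2.0, 3.0, 5.0]
quality = (psnr(s_hat, s0), mae(s_hat, s0))
```

Here N = 4, the peak is 4 and the error norm is 1, so the PSNR equals 20 log10(8), about 18.06 dB, and the MAE is 0.25. The PSNR is undefined for a perfect restoration, since the error norm is then zero.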

The results are depicted in Figs. 3 and 4. Note that the AA algorithm tends to over-regularize the solution. Our denoiser clearly outperforms its competitors.

References
1. Achim, A., Tsakalides, P., Bezerianos, A.: SAR image denoising via Bayesian wavelet shrinkage based on heavy-tailed modeling. IEEE Trans. Geosci. Remote Sens. 41(8), 1773–1784 (2003)
2. Aubert, G., Aujol, J.-F.: A variational approach to remove multiplicative noise. J. on Applied Mathematics 68(4), 925–946 (2008)
3. Aubert, G., Kornprobst, P.: Mathematical problems in image processing, 2nd edn. Springer, Berlin (2006)
4. Aujol, J.-F.: Some algorithms for total variation based image restoration. Report CMLA 2008-05 (2008)
5. Candès, E.J., Guo, F.: New multiscale transforms, minimum total variation synthesis. Applications to edge-preserving image reconstruction. Signal Processing 82 (2002)
6. Chambolle, A.: An algorithm for total variation minimization and application. J. of Mathematical Imaging and Vision 20(1) (2004)
7. Chambolle, A.: Total variation minimization and a class of binary MRF models. In: Rangarajan, A., Vemuri, B.C., Yuille, A.L. (eds.) EMMCVPR 2005. LNCS, vol. 3757, pp. 136–152. Springer, Heidelberg (2005)
8. Chesneau, C., Fadili, J., Starck, J.-L.: Stein block thresholding for image denoising. Technical report (2008)
9. Coifman, R.R., Sowa, A.: Combining the calculus of variations and wavelets for image enhancement. Applied and Computational Harmonic Analysis 9 (2000)
10. Combettes, P.L.: Solving monotone inclusions via compositions of nonexpansive averaged operators. Optimization 53(5) (2004)
11. Combettes, P.L., Pesquet, J.-C.: A Douglas-Rachford splitting approach to nonsmooth convex variational signal recovery. IEEE J. of Selected Topics in Signal Processing 1(4), 564–574 (2007)
12. Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3), 425–455 (1994)
13. Durand, S., Nikolova, M.: Denoising of frame coefficients using l1 data-fidelity term and edge-preserving regularization. SIAM J. on Multiscale Modeling and Simulation 6(2), 547–576 (2007)
14. Durand, S., Froment, J.: Reconstruction of wavelet coefficients using total variation minimization. SIAM J. on Scientific Computing 24(5), 1754–1767 (2003)
15. Durand, S., Fadili, J., Nikolova, M.: Multiplicative noise removal using L1 fidelity on frame coefficients. Report CMLA n.2008-40 (2008)
16. Fukuda, S., Hirosawa, H.: Suppression of speckle in synthetic aperture radar images using wavelet. Int. J. Remote Sens. 19(3), 507–519 (1998)
17. Krissian, K., Westin, C.-F., Kikinis, R., Vosburgh, K.G.: Oriented speckle reducing anisotropic diffusion. IEEE Trans. on Image Processing 16(5), 1412–1424 (2007)
18. Ma, J., Plonka, G.: Combined Curvelet Shrinkage and Nonlinear Anisotropic Diffusion. IEEE Trans. on Image Processing 16(9), 2198–2206 (2007)
19. Malgouyres, F.: Mathematical analysis of a model which combines total variation and wavelet for image restoration. J. of information processes 2(1), 1–10 (2002)
20. Moreau, J.-J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. CRAS Sér. A Math
21. Nikolova, M.: Minimizers of cost-functions involving nonsmooth data-fidelity terms. Application to the processing of outliers. SIAM J. on Numerical Analysis 40(3), 965–994 (2002)
22. Nikolova, M.: Weakly constrained minimization. Application to the estimation of images and signals involving constant regions. J. of Mathematical Imaging and Vision 21(2), 155–175 (2004)
23. Pizurica, A., Wink, A.M., Vansteenkiste, E., Philips, W., Roerdink, J.B.T.M.: A review of wavelet denoising in MRI and ultrasound brain imaging. Current Medical Imaging Reviews 2(2), 247–260 (2006)
24. Rudin, L., Lions, P.-L., Osher, S.: Multiplicative denoising and deblurring: Theory and algorithms. In: Osher, S., Paragios, N. (eds.), pp. 103–119. Springer, Heidelberg (2003)
25. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithm. Physica D 60, 259–268 (1992)
26. Shi, J., Osher, S.: A nonlinear inverse scale space method for a convex multiplicative noise model. In: UCLA 2007 (2007)
27. Ulaby, F., Dobson, M.C.: Handbook of Radar Scattering Statistics for Terrain. Artech House, Norwood (1989)
28. Walessa, M., Datcu, M.: Model-based despeckling and information extraction from SAR images. IEEE Trans. Geosci. Remote Sens. 38(9), 2258–2269 (2000)
29. Welk, M., Steidl, G., Weickert, J.: Locally analytic schemes: A link between diffusion filtering and wavelets shrinkage. Applied and Computational Harmonic Analysis 24, 195–224 (2008)
30. Xie, H., Pierce, L.E., Ulaby, F.T.: SAR speckle reduction using wavelet denoising and Markov random field modeling. IEEE Trans. Geosci. Remote Sensing 40(10), 2196–2212 (2002)
31. Yu, Y., Acton, S.T.: Speckle reducing anisotropic diffusion. IEEE Trans. on Image Processing 11(11), 1260–1270 (2002)

Projected Gradient Based Color Image Decomposition

Vincent Duval (1), Jean-François Aujol (2), and Luminita Vese (3)

(1) Institut TELECOM, TELECOM ParisTech, CNRS UMR 5141, [email protected]
(2) CMLA, ENS Cachan, CNRS, UniverSud, [email protected]
(3) UCLA, Mathematics Department, [email protected]

Abstract. This work deals with color image processing, with a focus on color image decomposition. The problem of image decomposition consists in splitting an original image f into two components u and v = f − u. The component u contains the geometric information of the original image, while v is made of the oscillating patterns of f, such as textures. We propose a numerical scheme based on a projected gradient algorithm to compute the solution of various decomposition models for color images or vector-valued images. A direct convergence proof of the scheme is provided, and some analysis on color texture modeling is given.

Keywords: Color image decomposition, projected gradient algorithm, color texture modeling.

1 Introduction

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 295–306, 2009. © Springer-Verlag Berlin Heidelberg 2009

Total variation regularization was introduced almost 20 years ago for image restoration in the seminal work by Rudin et al. [1]. It has since become a popular and widely used tool in image processing (see, for instance, [2, 3] and references therein). If we denote by f the original image, the problem we are interested in consists in minimizing energies of the type

$$\int |Du| + \mu \|f - u\|_T^k. \tag{1}$$

Here $\int |Du|$ is the total variation of u; we simply have $\int |Du| = \int |\nabla u|\,dx$ when u is regular. $\|\cdot\|_T$ stands for a norm which favors the noise and/or the textures of the original image f (in the sense that it is small for such features), and k is a positive exponent. The most basic choice for $\|\cdot\|_T$ is the $L^2$ norm, with k = 2. However, inspired by the book by Y. Meyer [4], and also motivated by the work of Mumford and Gidas [5], other spaces have been considered for modeling natural images and oscillating patterns such as textures or noise. [4] inspired many works, to name a few: [6, 7, 8, 9, 10, 11, 12, 13, 14].

Image decomposition consists in splitting an original image f into two components, u and v = f − u. The component u contains the geometrical component of the original image (it can be seen as a sketch of the original image), while v is made of the oscillatory component (when the original image f is noise free, v is the texture component).

In this work, we are concerned with color image processing. While some authors deal with color images using a Riemannian framework, like G. Sapiro and D. L. Ringach [15] or N. Sochen et al. [16], others combine a functional analysis viewpoint with the Chromaticity-Brightness representation [17]. The model we use is more basic: it is the same as the one used in [18] (and related to [19]). Its advantage is that it has a rich functional analysis interpretation. Note that in [20], the authors also propose a cartoon + texture color decomposition and denoising model inspired by Y. Meyer [4], using the vectorial versions of total variation and approximations of the space G(Ω) for textures (to be defined later); unlike the work presented here, they use Euler-Lagrange equations and a gradient descent scheme for the minimization. Here, we give some insight into the definition of a texture space for color images.

In [21], a TV-Hilbert model was proposed for image restoration and/or decomposition:

$$\int |Du| + \mu \|f - u\|_H^2 \tag{2}$$

where $\|\cdot\|_H$ stands for the norm of some Hilbert space H. This is a particular case of problem (1). Thanks to the Hilbert structure of H, different methods can be used to minimize (2), such as a projection algorithm [21]. We extend (2) to the case of color images.

From a numerical point of view, (1) is not straightforward to minimize. Depending on the choice of $\|\cdot\|_T$, the minimization of (1) can be quite challenging. Even in the simplest case, when $\|\cdot\|_T$ is the $L^2$ norm and k = 2, the total variation term $\int |Du|$ needs to be handled with care. The most classical approach consists in writing the Euler-Lagrange equation associated with problem (1). In [1], a fixed step gradient descent scheme is used to compute the solution. This method has the advantage of being very easy to implement, and the disadvantage of being quite slow. To improve the convergence speed, quasi-Newton methods have been proposed [22]. Duality based schemes have also drawn a lot of attention for solving (1): first by Chan, Golub and Mulet in [23], later by A. Chambolle in [24] with a projection algorithm. This projection algorithm has recently been extended to the case of color images in [18]. It has been shown that graph cuts based algorithms could also be used [25, 26]. Let us notice that it is shown in [27, 28] that Nesterov's schemes provide fast algorithms for minimizing (1). Another variant of Chambolle's projection algorithm [24] is to use a projected gradient algorithm [25, 28, 29]. Here we have decided to use this approach, which is both easy to implement and quite efficient.

The plan of the paper is the following. In Sect. 2, we define and analyze the spaces we consider in the paper. In Sect. 3, we extend the TV-Hilbert model originally introduced in [21] to the case of color images. In Sect. 4, we present a projected gradient algorithm to compute a minimizer of problem (2). This projected gradient algorithm was first proposed by A. Chambolle in [25] for total variation regularization. A proof of convergence was given in [28] relying on optimization results by Bermudez and Moreno [30]. We derive here a simple and direct proof of convergence. In Sect. 5, we apply this scheme to solve various classical denoising and decomposition problems. We illustrate our approach with many numerical examples.

2 Definitions and Properties of the Considered Color Spaces

In this section, we introduce some notation and analyze the function spaces we consider to model color textures.

2.1 Introduction

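Let Ω be a Lipschitz convex bounded open set in $\mathbb{R}^2$. We model color images as $\mathbb{R}^M$-valued functions defined on Ω. The inner product in $L^2(\Omega,\mathbb{R}^M)$ is denoted by

$$\langle u, v\rangle_{L^2(\Omega,\mathbb{R}^M)} = \int_\Omega \sum_{i=1}^{M} u_i v_i .$$

For a vector $\xi \in \mathbb{R}^M$, we define the norms

$$|\xi|_1 = \sum_{i=1}^{M} |\xi_i|, \qquad |\xi|_2 = \sqrt{\sum_{i=1}^{M} \xi_i^2}, \qquad |\xi|_\infty = \max_{i=1\ldots M} |\xi_i| .$$

We say that a function $f \in L^1(\Omega,\mathbb{R}^M)$ has bounded variation if the following quantity is finite:

$$|f|_{TV} = \sup_{\xi \in B} \langle f, \operatorname{div}\xi\rangle_{L^2(\Omega,\mathbb{R}^M)}, \quad \text{with } B = \{\xi \in C_c^1(\Omega,\mathbb{R}^{2\times M}) : \forall x \in \Omega,\ |\xi(x)|_2 \le 1\}. \tag{3}$$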

This quantity is called the total variation. For more information on its properties, we refer the reader to [3]. The set of functions with bounded variation is a vector space classically denoted by $BV(\Omega,\mathbb{R}^M)$. For f smooth enough, the total variation of f is $|f|_{TV} = \int_\Omega \sqrt{\sum_{i=1}^{M} |\nabla f_i|^2}\,dx$. Other choices of sets B are possible (see [18] for a discussion), which are mathematically equivalent and define the same BV space. In practice, however, it is crucial in image processing to couple the channels as in (3) in order to avoid visual artifacts.
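To make the role of the channel coupling in (3) concrete, the following sketch compares a coupled discrete total variation with a channel-by-channel sum. It is only an illustration under stated assumptions: the forward-difference gradient, the array layout (M, N, N) and the function names are choices made for this example and are not taken from the paper.

```python
import numpy as np

def grad(u):
    """Forward differences along the last two axes (zero on the far boundary)."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[..., :-1, :] = u[..., 1:, :] - u[..., :-1, :]
    gy[..., :, :-1] = u[..., :, 1:] - u[..., :, :-1]
    return gx, gy

def coupled_tv(u):
    """Discrete analogue of the smooth formula: one Euclidean norm per pixel,
    taken over all channels and both gradient directions."""
    gx, gy = grad(u)
    return np.sqrt((gx**2 + gy**2).sum(axis=0)).sum()

def channelwise_tv(u):
    """Uncoupled alternative: sum of the scalar total variations of each channel."""
    gx, gy = grad(u)
    return np.sqrt(gx**2 + gy**2).sum()

# Hypothetical test image: the same vertical edge in M = 3 channels.
edge = np.zeros((3, 8, 8))
edge[:, :, 4:] = 1.0
```

For an edge shared by all channels, the coupled value grows like the square root of M while the channel-wise sum grows like M, so the coupled functional penalizes aligned color edges less than misaligned ones.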

2.2 The Color G(Ω) Space

The $G(\mathbb{R}^2)$ space was introduced by Y. Meyer in [4] to model textures in grayscale images. For the generalization to color images, we adopt the framework of [8]; the color space G(Ω) is also used in [20], as a generalization of [6] to color image decomposition and color image denoising.

Definition 1. The space G(Ω) is defined by

$$G(\Omega) = \{v \in L^2(\Omega,\mathbb{R}^M) : \exists\, \xi \in L^\infty(\Omega,(\mathbb{R}^2)^M),\ \forall i = 1,\ldots,M,\ v_i = \operatorname{div}\xi_i \text{ and } \xi_i \cdot N = 0 \text{ on } \partial\Omega\}$$

(where $\xi_i \cdot N$ refers to the normal trace of $\xi_i$ over ∂Ω). One can endow it with the norm

$$\|v\|_G = \inf\{\|\xi\|_\infty : \forall i = 1,\ldots,M,\ v_i = \operatorname{div}\xi_i,\ \xi_i \cdot N = 0 \text{ on } \partial\Omega\}, \quad \text{with } \|\xi\|_\infty = \operatorname{ess\,sup} \sqrt{\sum_{i=1}^{M} |\xi_i|^2}.$$

The following result was proved in [9] for grayscale images; it characterizes G(Ω). Working component by component, it is straightforward to extend it to color images (see [31]).

Proposition 1.

$$G(\Omega) = \left\{ v \in L^2(\Omega,\mathbb{R}^M) : \int_\Omega v = 0 \right\}.$$

Remark 1. The topology induced by the G-norm on G(Ω) is coarser than the one induced by the $L^2$ norm. Let us consider, for $m \in \mathbb{N}^*$, the sequence $f_m^{(k)}(x,y) = \cos mx + \cos my$, $k = 1,\ldots,M$, defined on $(-\pi,\pi)^2$. The vector field $\xi^{(k)} = \left(\frac{1}{m}\sin(mx), \frac{1}{m}\sin(my)\right)$ satisfies the boundary condition, and its divergence is equal to $f_m^{(k)}$. As a consequence, $\|f_m\|_G \le \frac{\sqrt{2M}}{m}$ and $\lim_{m\to+\infty} \|f_m\|_G = 0$. Yet, it is easy to see that $\|f_m\|^2_{L^2(\Omega,\mathbb{R}^M)} = 4M\pi^2$. The sequence $f_m$ converges to 0 for the topology induced by the G-norm, but not for the one induced by the $L^2$ norm.

More generally, oscillating patterns with zero mean have a small G norm (see [4] for more details).
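The computation in Remark 1 can be checked numerically. The sketch below evaluates the squared $L^2$ norm of $f_m$ by a midpoint rule on $(-\pi,\pi)^2$ and the bound on the G-norm given by the explicit vector field of the remark. The grid size, the value M = 3 and the function names are assumptions made for this illustration.

```python
import numpy as np

M = 3                      # number of color channels (assumed for the example)
n = 256                    # quadrature points per axis (midpoint rule)
h = 2 * np.pi / n
x = -np.pi + (np.arange(n) + 0.5) * h
X, Y = np.meshgrid(x, x, indexing="ij")

def l2_norm_sq(m):
    """Squared L2 norm of f_m; every channel equals cos(mx) + cos(my)."""
    f = np.cos(m * X) + np.cos(m * Y)
    return M * np.sum(f**2) * h * h

def g_norm_bound(m):
    """Sup norm of the field xi of Remark 1, an upper bound on the G-norm of f_m."""
    xi_sq_per_channel = (np.sin(m * X)**2 + np.sin(m * Y)**2) / m**2
    return np.sqrt(M * xi_sq_per_channel.max())
```

Calling `l2_norm_sq(m)` for increasing m should return a value close to $4M\pi^2$ each time, while `g_norm_bound(m)` decays like 1/m, which is the behaviour the remark describes.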

3 Color TV-Hilbert Model: Presentation and Mathematical Analysis

3.1 Presentation

The TV-Hilbert framework was introduced for grayscale images by J.-F. Aujol and G. Gilboa in [21] as a way to approximate the BV-G model. They proved that one can extend Chambolle's algorithm to this model. In this section we show that this is still true for color images. We are interested in solving the following problem:

$$\inf_u\ |u|_{TV} + \frac{1}{2\lambda}\|f - u\|_H^2 \tag{4}$$

where H is the space of zero-mean functions of $L^2(\Omega,\mathbb{R}^M)$, regarded as a Hilbert space endowed with the norm $\|v\|_H^2 = \langle v, Kv\rangle_{L^2(\Omega,\mathbb{R}^M)}$. Here we assume that $K : H \to L^2(\Omega,\mathbb{R}^M)$ is a symmetric positive definite, bounded linear operator (for the topology induced by the $L^2(\Omega,\mathbb{R}^M)$ norm on H) and that $K^{-1}$ is bounded on Im(K).


Example 1 (The Rudin-Osher-Fatemi model). It was proposed in [1] for grayscale images, then extended to color images using different methods (e.g. [15] or [19]). In [18], the authors use another kind of color total variation, which is the one we use in this paper. The idea is to minimize the functional:

$$|u|_{TV} + \frac{1}{2\lambda}\|f - u\|^2_{L^2(\Omega,\mathbb{R}^M)}. \tag{5}$$

Without loss of generality, we can assume that f has zero mean. Then this model becomes a particular case of the TV-Hilbert model with K = Id.

Example 2 (The OSV model). In [7], S. Osher, A. Solé and L. Vese propose to model textures by the $H^{-1}$ space. In order to generalize this model, we introduce the following functional:

$$\inf_u\ |u|_{TV} + \frac{1}{2\lambda}\int_\Omega |\nabla\Delta^{-1}(f - u)|^2 \tag{6}$$

where $\Delta^{-1}v = (\Delta^{-1}v_1, \ldots, \Delta^{-1}v_M)^T$, $\nabla\rho = (\nabla\rho_1, \ldots, \nabla\rho_M)^T$, $|\nabla\rho|^2 = \sum_{j=1}^{M} |\nabla\rho_j|^2$ and

$$\int_\Omega |\nabla\Delta^{-1}(f - u)|^2 = \int_\Omega \sum_{i=1}^{M} |\nabla\Delta^{-1}(f_i - u_i)|^2 = \langle f - u, -\Delta^{-1}(f - u)\rangle_{L^2(\Omega,\mathbb{R}^M)}.$$

For $K = -\Delta^{-1}$, the Osher-Solé-Vese problem is a particular case of the TV-Hilbert framework. We also refer to L. Lieu and L. Vese [14] for more general $(BV, H^{-s})$ models, as particular cases of the TV-Hilbert formulation.

3.2 Mathematical Study

For $f \in L^2(\Omega,\mathbb{R}^M)$, the existence and uniqueness of the minimizer u of (4) can be proved using standard methods (see [3]). Now, let us introduce the notation v = f − u, when u is a minimizer of (4). Following Y. Meyer's steps, one can extend the result proposed in [4] for grayscale images (see [31] for a detailed proof):

Theorem 1 (Characterization of minimizers). Let $f \in L^2(\Omega,\mathbb{R}^M)$.
(i) If $\|Kf\|_G \le \lambda$, then the solution of the TV-Hilbert problem is given by (u, v) = (0, f).
(ii) If $\|Kf\|_G > \lambda$, then the solution (u, v) is characterized by: $\|Kv\|_G = \lambda$ and $\langle u, Kv\rangle_{L^2(\Omega,\mathbb{R}^M)} = \lambda |u|_{TV}$.

For λ > 0, the set $G_\lambda = \{v \in L^2(\Omega,\mathbb{R}^M),\ \|v\|_G \le \lambda\}$ is a closed convex set, as well as $K^{-1}G_\lambda$. The orthogonal projection onto this set is well-defined, and Theorem 1 can be reformulated as:

$$v = P^H_{K^{-1}G_\lambda}(f), \qquad u = f - v.$$

That is, v is the orthogonal projection of f on the set $K^{-1}G_\lambda$. Therefore, the problem is equivalent to its dual formulation, with $v = \lambda K^{-1}\operatorname{div} p$:

$$\inf_{|p| \le 1} \|\lambda K^{-1}\operatorname{div} p - f\|_H^2. \tag{7}$$

4 Projected Gradient Algorithm

We present here a projection algorithm for solving this dual formulation, inspired by [24, 18], and we provide a complete proof of convergence of this scheme.

4.1 Discrete Setting

From now on, we work in the discrete case, using the following convention. A grayscale image is a matrix of size N × N. We write $X = \mathbb{R}^{N\times N}$ for the space of grayscale images. Their gradients belong to the space Y = X × X. The $L^2$ inner product is $\langle u, v\rangle_X = \sum_{1\le i,j\le N} u_{i,j} v_{i,j}$. For the gradient and divergence operators on grayscale images, we use the same discretizations as in [24]. A color image is an element of $X^M$ and its gradient belongs to $Y^M$. The gradient and the divergence are defined component by component, so that the color divergence is still the opposite of the adjoint of the color gradient. Notice that in this framework, we have $\|\nabla\|^2 = \|\operatorname{div}\|^2 = 8$ (see [24]).
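As an illustration of this discrete setting, the sketch below implements a forward-difference gradient and the divergence defined as minus its adjoint, in the spirit of the discretization of [24], and checks two properties used later: the adjoint relation and the bound of 8 on the squared operator norm. The exact boundary handling, the array layout (M, N, N) and the helper names are assumptions of this example.

```python
import numpy as np

def grad(u):
    """Forward differences on the last two axes; the last row/column is set to 0."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[..., :-1, :] = u[..., 1:, :] - u[..., :-1, :]
    gy[..., :, :-1] = u[..., :, 1:] - u[..., :, :-1]
    return gx, gy

def div(px, py):
    """Discrete divergence, built so that <grad u, p> = -<u, div p>."""
    d = np.zeros_like(px)
    d[..., :-1, :] += px[..., :-1, :]
    d[..., 1:, :] -= px[..., :-1, :]
    d[..., :, :-1] += py[..., :, :-1]
    d[..., :, 1:] -= py[..., :, :-1]
    return d

def adjoint_gap(shape=(3, 16, 16), seed=0):
    """|<grad u, p> + <u, div p>| for random u and p; should vanish up to rounding."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(shape)
    px = rng.standard_normal(shape)
    py = rng.standard_normal(shape)
    gx, gy = grad(u)
    return abs((gx * px + gy * py).sum() + (u * div(px, py)).sum())

def grad_norm_sq(n=16, n_iter=500, seed=0):
    """Power iteration on u -> -div(grad u): estimates the squared norm of grad."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((n, n))
    for _ in range(n_iter):
        u = -div(*grad(u))
        u /= np.linalg.norm(u)
    return float((u * -div(*grad(u))).sum())
```

On a finite grid the estimate returned by `grad_norm_sq` should stay below 8 and move closer to it as N grows, which is compatible with using the constant 8 in the step-size condition of the next section.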

4.2 Projected Gradient

It was recently noticed ([25], [28]) that problem (7) for grayscale images could be solved using a projected gradient descent. This is the algorithm we extend to the case of color images. Let $B = \{v \in Y^M : \forall\, 1 \le i,j \le N,\ |v_{i,j}|_2 \le 1\}$ be the discrete version of our set of test functions. The orthogonal projection on B is easily computed: $P_B(x) = \left(\frac{x_1}{\max\{1,|x|_2\}}, \frac{x_2}{\max\{1,|x|_2\}}\right)$. The projected gradient descent scheme is defined by

$$p^{m+1} = P_B\left(p^m + \tau\nabla(K^{-1}\operatorname{div} p^m - f/\lambda)\right) \tag{8}$$

which amounts to:

$$p^{m+1}_{i,j} = \frac{p^m_{i,j} + \tau\left(\nabla(K^{-1}\operatorname{div} p^m - \frac{f}{\lambda})\right)_{i,j}}{\max\left(1, \left|p^m_{i,j} + \tau\left(\nabla(K^{-1}\operatorname{div} p^m - \frac{f}{\lambda})\right)_{i,j}\right|_2\right)}. \tag{9}$$

Since the functional is not elliptic, the standard proof of convergence of the projected gradient algorithm (see [32] for instance) needs to be adapted to this particular case.

Proposition 2. If $0 < \tau < \frac{1}{4\|K^{-1}\|}$, then algorithm (9) converges. More precisely, there exists $\tilde p \in B$ such that

$$\lim_{m\to\infty} K^{-1}\operatorname{div} p^m = K^{-1}\operatorname{div}\tilde p \quad \text{and} \quad \|\lambda K^{-1}\operatorname{div}\tilde p - f\|_H^2 = \inf_{p \in B} \|\lambda K^{-1}\operatorname{div} p - f\|_H^2.$$

Proof. We only give here a sketch of the proof. Let us first notice that p is a minimizer iff $p \in B$ and, for all $q \in B$ and all τ > 0,

$$\langle q - p,\ p - (p + \tau\nabla(K^{-1}\operatorname{div} p - f/\lambda))\rangle_{L^2} \ge 0,$$

or equivalently $p = P_B\left(p + \tau\nabla(K^{-1}\operatorname{div} p - f/\lambda)\right)$, where $P_B$ is the orthogonal projection on B with respect to the $L^2$ inner product. Let p be such a minimizer.

• Now let us consider a sequence defined by (8), and write $A = -\nabla K^{-1}\operatorname{div}$. We have $\|p^{k+1} - p\|^2 \le \|(I - \tau A)(p - p^k)\|^2$ since $P_B$ is 1-Lipschitz [32]. Provided $\|I - \tau A\| \le 1$, we can deduce

$$\|p^{k+1} - p\| \le \|p^k - p\| \tag{10}$$

and the sequence $(\|p^k - p\|)$ is convergent.

• A is a symmetric positive semi-definite operator. Writing E = ker A and F = Im A, we have $Y^M = E \overset{\perp}{\oplus} F$, and we can decompose any $q \in Y^M$ as the sum of two orthogonal components $q_E \in E$ and $q_F \in F$. Notice that by injectivity of $K^{-1}$, E is actually equal to the kernel of the divergence operator. Let $\mu_1 = 0 < \mu_2 \le \ldots \le \mu_a$ be the ordered eigenvalues of A. Then $\|I - \tau A\| = \max(|1 - \tau\mu_1|, |1 - \tau\mu_a|) = 1$ for $0 \le \tau \le \frac{2}{\mu_a}$. We can restrict $I - \tau A$ to F and then define $g(\tau) = \|(I - \tau A)|_F\| < 1$ for $0 < \tau < \frac{2}{\mu_a}$.

• Now we assume that $0 < \tau < \frac{2}{\mu_a}$. Therefore, inequality (10) is true and the sequence $(p^k)$ is bounded, and so is the sequence $(K^{-1}\operatorname{div} p^k)$. We are going to prove that the sequence $(K^{-1}\operatorname{div} p^k)$ has a unique cluster point. Let $(K^{-1}\operatorname{div} p^{\varphi(k)})$ be a convergent subsequence. By extraction, one can assume that $p^{\varphi(k)}$ is convergent too, and denote by $\tilde p$ its limit. Passing to the limit in (8), the sequence $(p^{\varphi(k)+1})$ converges to $\hat p = P_B\left(\tilde p + \tau\nabla(K^{-1}\operatorname{div}\tilde p - f/\lambda)\right)$. Using (10), we also notice that $\|\tilde p - p\| = \|\hat p - p\|$. As a consequence:

$$\|\tilde p - p\|^2 = \left\|P_B\left(\tilde p + \tau\nabla(K^{-1}\operatorname{div}\tilde p - f/\lambda)\right) - P_B\left(p + \tau\nabla(K^{-1}\operatorname{div} p - f/\lambda)\right)\right\|^2 \le \|(I - \tau A)(\tilde p - p)\|^2 \le \|(\tilde p - p)_E\|^2 + g(\tau)^2\|(\tilde p - p)_F\|^2 < \|\tilde p - p\|^2 \quad \text{if } (\tilde p - p)_F \ne 0.$$

Of course, this last inequality cannot hold, which means that $(\tilde p - p)_F = 0$. Hence $(\tilde p - p) \in E = \ker A$ and $K^{-1}\operatorname{div}\tilde p = K^{-1}\operatorname{div} p$: the sequence $(K^{-1}\operatorname{div} p^k)$ is convergent.

• Since $\|\operatorname{div}\|^2 = \|\nabla\|^2 = 8$ (see [24]), we conclude by noticing that $\mu_a \le 8\|K^{-1}\|$.

Since we are only interested in $v = \lambda K^{-1}\operatorname{div} p$, Proposition 2 justifies the validity of algorithm (8). We can actually prove that the sequence $(p^m)$ defined by (8) converges (see [31], Corollary 4.1).

5 Applications to Color Image Denoising and Decomposition

In this last section, we apply the projected gradient algorithm to solve various color image problems.

5.1 TV-Hilbert Model

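The Color ROF Model. As an application of (9), we use the following scheme for the ROF model (5):

$$p^{m+1}_{i,j} = \frac{p^m_{i,j} + \tau\left(\nabla(\operatorname{div} p^m - \frac{f}{\lambda})\right)_{i,j}}{\max\left(1, \left|p^m_{i,j} + \tau\left(\nabla(\operatorname{div} p^m - \frac{f}{\lambda})\right)_{i,j}\right|_2\right)}. \tag{11}$$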
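Scheme (11) translates almost line by line into array code. The following is a minimal sketch rather than the authors' implementation: the discretization of the gradient and divergence, the array layout (M, N, N), the default step τ = 0.24 (chosen below the bound 1/4 of Proposition 2 for K = Id) and the fixed iteration count are assumptions of this example.

```python
import numpy as np

def grad(u):
    """Forward differences on the last two axes (zero on the far boundary)."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[..., :-1, :] = u[..., 1:, :] - u[..., :-1, :]
    gy[..., :, :-1] = u[..., :, 1:] - u[..., :, :-1]
    return gx, gy

def div(px, py):
    """Minus the adjoint of grad."""
    d = np.zeros_like(px)
    d[..., :-1, :] += px[..., :-1, :]
    d[..., 1:, :] -= px[..., :-1, :]
    d[..., :, :-1] += py[..., :, :-1]
    d[..., :, 1:] -= py[..., :, :-1]
    return d

def color_rof(f, lam, tau=0.24, n_iter=200):
    """Projected gradient iteration (11); f has shape (M, N, N).

    Returns (u, v) with v = lam * div p and u = f - v.
    """
    px = np.zeros_like(f)
    py = np.zeros_like(f)
    for _ in range(n_iter):
        gx, gy = grad(div(px, py) - f / lam)
        px = px + tau * gx
        py = py + tau * gy
        # Projection on B: a single norm per pixel couples all channels
        # and both gradient directions.
        norm = np.sqrt((px**2 + py**2).sum(axis=0, keepdims=True))
        scale = np.maximum(1.0, norm)
        px = px / scale
        py = py / scale
    v = lam * div(px, py)
    return f - v, v
```

In practice one would replace the fixed iteration count by a stopping test on the change of p or of u; that choice is left open here because the paper does not specify one.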

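The Color OSV Model. As for the OSV model (6), we use:

$$p^{m+1}_{i,j} = \frac{p^m_{i,j} - \tau\left(\nabla(\Delta\operatorname{div} p^m + \frac{f}{\lambda})\right)_{i,j}}{\max\left(1, \left|p^m_{i,j} - \tau\left(\nabla(\Delta\operatorname{div} p^m + \frac{f}{\lambda})\right)_{i,j}\right|_2\right)}. \tag{12}$$

5.2 The Color A2BC Algorithm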

Following Y. Meyer [4], one can use the G(Ω) space to model textures, and try to solve the problem $\inf_u \left(|u|_{BV} + \alpha\|f - u\|_G\right)$. In [8], the authors approximate this problem by minimizing the following functional:

$$F_{\mu,\lambda}(u, v) = |u|_{BV} + \frac{1}{2\lambda}\|f - u - v\|^2_{L^2(\Omega)} + \chi_{G_\mu}(v), \quad \text{with } \chi_{G_\mu}(v) = \begin{cases} 0 & \text{if } v \in G_\mu, \\ +\infty & \text{otherwise.} \end{cases} \tag{13}$$

Following [8, 17, 18], it is straightforward to extend the A2BC algorithm using the projection algorithm. We start by initializing with $u^0 = v^0 = 0$, and then compute iteratively until convergence¹:

$$v^{n+1} = P_{G_\mu}(f - u^n) \quad \text{and} \quad u^{n+1} = f - v^{n+1} - P_{G_\lambda}(f - v^{n+1}).$$
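The alternating iteration above only needs a routine for the projections $P_{G_\mu}$ and $P_{G_\lambda}$. The sketch below approximates them with the projected gradient scheme (8) for K = Id, so that $P_{G_\mu}(g) \approx \mu \operatorname{div} p$. It is a hedged illustration: the inner and outer iteration counts, the step size and all function names are assumptions, and no stopping criterion from the paper is implied.

```python
import numpy as np

def grad(u):
    """Forward differences on the last two axes (zero on the far boundary)."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[..., :-1, :] = u[..., 1:, :] - u[..., :-1, :]
    gy[..., :, :-1] = u[..., :, 1:] - u[..., :, :-1]
    return gx, gy

def div(px, py):
    """Minus the adjoint of grad."""
    d = np.zeros_like(px)
    d[..., :-1, :] += px[..., :-1, :]
    d[..., 1:, :] -= px[..., :-1, :]
    d[..., :, :-1] += py[..., :, :-1]
    d[..., :, 1:] -= py[..., :, :-1]
    return d

def project_G(g, mu, tau=0.24, n_iter=100):
    """Approximate projection of g (shape (M, N, N)) on the discrete G_mu ball."""
    px = np.zeros_like(g)
    py = np.zeros_like(g)
    for _ in range(n_iter):
        gx, gy = grad(div(px, py) - g / mu)
        px = px + tau * gx
        py = py + tau * gy
        scale = np.maximum(1.0, np.sqrt((px**2 + py**2).sum(axis=0, keepdims=True)))
        px = px / scale
        py = py / scale
    return mu * div(px, py)

def a2bc(f, lam, mu, n_outer=10):
    """Alternating A2BC iteration started from u = v = 0."""
    u = np.zeros_like(f)
    v = np.zeros_like(f)
    for _ in range(n_outer):
        v = project_G(f - u, mu)                 # texture update
        u = f - v - project_G(f - v, lam)        # cartoon update (a color ROF step)
    return u, v
```

Since every projection is a discrete divergence, both v and the residual f − u − v have zero mean in each channel, so u carries the mean of f.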

5.3 The Color TV-L1 Model

The TV-L1 model is very popular for grayscale images. It benefits from having both good theoretical properties (it is a morphological filter) and fast algorithms (see [26]). In order to extend it to color images, we consider the problem $\inf_u |u|_{TV} + \lambda\|f - u\|_1$, with the notation $\|u\|_1 = \int_\Omega \sqrt{\sum_{l=1}^{M} |u_l|^2}$. As for the A2BC algorithm, we are led to consider the approximation, for α > 0:

$$\inf_{u,v}\ |u|_{BV} + \frac{1}{2\alpha}\|f - u - v\|_2^2 + \lambda\|v\|_1 .$$

¹ The proof of convergence of this algorithm is the same as the one in [8].


Fig. 1. From left to right: original and noisy images (WG, PSNR = 57.3 dB), denoised with color ROF (λ = 25, PSNR= 74.2 dB) and with color OSV (λ = 25, PSNR= 74.1 dB)

Fig. 2. Cartoon-texture decomposition using color A2BC algorithm (upper row) and color TVL1 (lower row). On top, the original image.

In order to generalize the TV-L1 algorithm proposed by Aujol et al. [33], we aim at solving the alternate minimization problem:

$$\inf_u\ |u|_{BV} + \frac{1}{2\alpha}\|f - u - v\|_2^2 \quad \text{and} \quad \inf_v\ \frac{1}{2\alpha}\|f - u - v\|_2^2 + \lambda\|v\|_1 .$$


Fig. 3. From left to right: original and noisy images (using salt and pepper noise, PSNR= 34.6 dB), denoised with color TVL1 (PSNR= 67.5 dB) and noise part

The first problem is a Rudin-Osher-Fatemi problem. Scheme (9) with K = Id is well adapted for solving it. The second one can be solved by a "vectorial soft thresholding":

Proposition 3. The solution of the second problem is given by:

$$v(x) = VT_{\alpha\lambda}(f(x) - u(x)) = \frac{f(x) - u(x)}{|f(x) - u(x)|_2}\max\left(|f(x) - u(x)|_2 - \alpha\lambda,\ 0\right) \quad \text{a.e.}$$

The proof of this last result is given in [31]. Therefore, we propose to generalize the TV-L1 algorithm by initializing with $u^0 = v^0 = 0$, then computing iteratively until convergence (the proof of convergence is the same as the one in [33]):

$$v^{n+1} = VT_{\alpha\lambda}(f - u^n) \quad \text{and} \quad u^{n+1} = f - v^{n+1} - P_{G_\alpha}(f - v^{n+1}).$$
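The vectorial soft thresholding of Proposition 3 acts pixel by pixel on the color vector. A minimal sketch follows, assuming the channel axis comes first; the function name and the guard against division by zero are choices made for this example.

```python
import numpy as np

def vector_soft_threshold(w, t):
    """Shrink the Euclidean norm of each pixel's color vector by t.

    w has shape (M, ...), with the M channels on the first axis.
    Pixels whose norm is at most t are set to zero.
    """
    norm = np.sqrt((w**2).sum(axis=0, keepdims=True))
    # The tiny floor only avoids 0/0; the numerator is already 0 at such pixels.
    scale = np.maximum(norm - t, 0.0) / np.maximum(norm, np.finfo(float).tiny)
    return w * scale
```

With this helper, the update $v^{n+1} = VT_{\alpha\lambda}(f - u^n)$ would read `v = vector_soft_threshold(f - u, alpha * lam)`, where `alpha` and `lam` are the parameters of the approximated TV-L1 energy.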

5.4 Numerical Experiments

Figure 1 displays denoising results using the ROF (5) and OSV (6) models. The images look very similar, but since the OSV model penalizes the highest frequencies much more strongly than the ROF model [33], its denoised image still shows the lowest frequencies of the noise. The convergence speed for the ROF model is roughly the same as with the Bresson-Chan algorithm (see [18], [31]). Figure 2 displays a cartoon-texture decomposition experiment using different kinds of texture. The algorithms used were A2BC and TVL1. Both results look good. In Figure 3, a denoising experiment was performed using salt-and-pepper noise. The denoised picture looks quite good, and surprisingly better than the original image! This is because the picture we used had some compression artifacts that the algorithm removed.

Acknowledgements

This work has been supported by the French "Agence Nationale de la Recherche" (ANR), under grant FREEDOM (ANR07-JCJC-0048-01), "Films, REstauration Et DOnnées Manquantes", and by the National Science Foundation under Grants DMS-0312222 and DMS-0714945. Part of this work was done while the first author was visiting the Department of Mathematics, UCLA.


References

1. Rudin, L., Osher, S., Fatemi, E.: Non linear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
2. Chan, T., Shen, J.: Image processing and analysis - Variational, PDE, wavelet, and stochastic methods. SIAM Publisher, Philadelphia (2005)
3. Aubert, G., Kornprobst, P.: Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations. Applied Mathematical Sciences, vol. 147. Springer, Heidelberg (2001)
4. Meyer, Y.: Oscillating patterns in image processing and nonlinear evolution equations. In: The fifteenth Dean Jacqueline B. Lewis memorial lectures. University Lecture Series, vol. 22. American Mathematical Society, Providence, RI (2001)
5. Mumford, D., Gidas, B.: Stochastic models for generic images. Quarterly of Applied Mathematics LIV(1) (2001)
6. Vese, L., Osher, S.J.: Modeling textures with total variation minimization and oscillating patterns in image processing. Journal of Scientific Computing 19(1-3), 553–572 (2003)
7. Osher, S., Solé, A., Vese, L.: Image decomposition and restoration using total variation minimization and the H^{-1} norm. SIAM Journal on Multiscale Modeling and Simulation 1(3), 349–370 (2003)
8. Aujol, J.F., Aubert, G., Blanc-Féraud, L., Chambolle, A.: Image decomposition into a bounded variation component and an oscillating component. Journal of Mathematical Imaging and Vision 22(1), 71–88 (2005)
9. Aubert, G., Aujol, J.: Modeling very oscillating signals. Application to image processing. Applied Mathematics and Optimization 51(2), 163–182 (2005)
10. Aujol, J.F., Chambolle, A.: Dual norms and image decomposition models. International Journal on Computer Vision 63(1), 85–104 (2005)
11. Yin, W., Goldfarb, D., Osher, S.: A comparison of three total variation based texture extraction models. Journal of Visual Communication and Image Representation 18(3), 240–252 (2007)
12. Garnett, J., Jones, P., Le, T., Vese, L.: Modeling oscillatory components with the homogeneous spaces BMO^{-α} and W^{-α,p}. Pure and Applied Mathematics Quarterly (to appear)
13. Le, T., Vese, L.: Image decomposition using total variation and div(BMO). Multiscale Modeling and Simulation, SIAM Interdisciplinary Journal 4(2), 390–423 (2005)
14. Lieu, L., Vese, L.: Image restoration and decomposition via bounded total variation and negative Hilbert-Sobolev spaces. Applied Mathematics & Optimization 58, 167–193 (2008)
15. Sapiro, G., Ringach, D.L.: Anisotropic diffusion of multivalued images with applications to color filtering. IEEE Transactions on Image Processing 5(11), 1582–1586 (1996)
16. Sochen, N., Kimmel, R., Malladi, R.: A general framework for low level vision. IEEE Transactions on Image Processing 7(3), 310–318 (1998)
17. Aujol, J.F., Kang, S.H.: Color image decomposition and restoration. Journal of Visual Communication and Image Representation 17(4), 916–928 (2006)
18. Bresson, X., Chan, T.: Fast minimization of the vectorial total variation norm and applications to color image processing. Inverse Problems and Imaging (IPI) (accepted) (2007)
19. Blomgren, P., Chan, T.: Color TV: Total variation methods for restoration of vector valued images. IEEE Transactions on Image Processing 7(3), 304–309 (1998)
20. Vese, L., Osher, S.: Color texture modeling and color image decomposition in a variational-PDE approach. In: Proceedings of the Eighth International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC 2006), pp. 103–110. IEEE, Los Alamitos (2006)
21. Aujol, J., Gilboa, G.: Constrained and SNR-based solutions for TV-Hilbert space image denoising. Journal of Mathematical Imaging and Vision 26(1-2), 217–237 (2006)
22. Vogel, C.: Computational Methods for Inverse Problems. Frontiers in Applied Mathematics, vol. 23. SIAM, Philadelphia (2002)
23. Chan, T., Golub, G., Mulet, P.: A nonlinear primal-dual method for total variation-based image restoration. SIAM Journal on Scientific Computing 20(6), 1964–1977 (1999)
24. Chambolle, A.: An algorithm for total variation minimization and its applications. JMIV 20, 89–97 (2004)
25. Chambolle, A.: Total variation minimization and a class of binary MRF models. In: Rangarajan, A., Vemuri, B.C., Yuille, A.L. (eds.) EMMCVPR 2005. LNCS, vol. 3757, pp. 136–152. Springer, Heidelberg (2005)
26. Darbon, J., Sigelle, M.: Image restoration with discrete constrained total variation part I: Fast and exact optimization. Journal of Mathematical Imaging and Vision 26(3), 277–291 (2006)
27. Weiss, P., Aubert, G., Blanc-Féraud, L.: Efficient schemes for total variation minimization under constraints in image processing. SIAM Journal on Scientific Computing (to appear) (2007)
28. Aujol, J.: Some algorithms for total variation based image restoration. CMLA Preprint 2008-05 (2008), http://hal.archives-ouvertes.fr/hal-00260494/en/
29. Zhu, M., Wright, S., Chan, T.: Duality-based algorithms for total variation image restoration, UCLA CAM Report 08-33 (May 2008)
30. Bermudez, A., Moreno, C.: Duality methods for solving variational inequalities. Comp. and Maths. with Appls. 7(1), 43–58 (1981)
31. Duval, V., Aujol, J.F., Vese, L.: A projected gradient algorithm for color image decomposition. Technical report, UCLA, CAM Report 08-40 (2008)
32. Ciarlet, P.G.: Introduction à l'Analyse Numérique Matricielle et à l'Optimisation. Dunod (1998)
33. Aujol, J., Gilboa, G., Chan, T., Osher, S.: Structure-texture image decomposition modeling, algorithms, and parameter selection. International Journal of Computer Vision 67(1), 111–136 (2006)

A Dual Formulation of the TV-Stokes Algorithm for Image Denoising

Christoffer A. Elo (1), Alexander Malyshev (1), and Talal Rahman (2)

(1) Department of Mathematics, University of Bergen, Johannes Bruns gate 12, 5007 Bergen, Norway, [email protected], [email protected]
(2) Bergen University College, Faculty of Engineering, Nygårdsgaten 112, 5020 Bergen, [email protected]

Abstract. We propose a fast algorithm for image denoising, which is based on a dual formulation of a recent denoising model involving the total variation minimization of the tangential vector ﬁeld under the incompressibility condition stating that the tangential vector ﬁeld should be divergence free. The model turns noisy images into smooth and visually pleasant ones and preserves the edges quite well. While the original TV-Stokes algorithm, based on the primal formulation, is extremely slow, our new dual algorithm drastically improves the computational speed and possesses the same quality of denoising. Numerical experiments are provided to demonstrate practical eﬃciency of our algorithm.

1 Introduction

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 307–318, 2009. © Springer-Verlag Berlin Heidelberg 2009

We suppose that the observed image $d_0(x,y)$, $(x,y) \in \Omega \subset \mathbb{R}^2$, is an original image d(x, y) perturbed by an additive noise η,

$$d_0 = d + \eta. \tag{1}$$

The problem of recovering the image d from the noisy image $d_0$ is an inverse problem that is often solved by variational methods using total variation (TV) minimization. The corresponding Euler equation, which is a set of nonlinear partial differential equations, is typically solved by applying a gradient-descent method to a finite difference approximation of these equations. A classical total variation denoising model is the primal formulation due to Rudin, Osher and Fatemi [1] (the ROF model):

$$\min_d\ \|\nabla d\|_{L^1} + \frac{\lambda}{2}\|d - d_0\|^2_{L^2}. \tag{2}$$

The parameter λ > 0 can be chosen, e.g., to approximately fulfill the condition $\|d - d_0\|_{L^2} \le \sigma$, where σ is an estimate of $\|\eta\|_{L^2}$. The Euler equation $-\operatorname{div}(\nabla d/|\nabla d|) + \lambda(d - d_0) = 0$ is usually replaced by a regularized one,

$$-\operatorname{div}\left(\frac{\nabla d}{|\nabla d|_\beta}\right) + \lambda(d - d_0) = 0, \tag{3}$$

where $|\nabla d|_\beta = \sqrt{|\nabla d|^2 + \beta^2}$ is a necessary regularization, since images contain flat areas where $|\nabla d| = \sqrt{d_x^2 + d_y^2} \approx 0$. When solving (3) numerically, an explicit time marching scheme with an artificial time variable, t, is typically used. However, such an algorithm is rather slow because convergence requires severely restricted, small time steps.

It is well known that the ROF model suffers from the so-called staircase effect, which is a disadvantage when denoising images with affine regions. To overcome this defect, we argue for a two-step approach, where the fourth-order model, studied in [2, 3, 4], is decoupled into two second-order problems. Such methods are known to overcome the staircase effect, but tend to have computational difficulties due to very poor conditioning. The authors of [5, 6] used the same two-step approach as in [7], but adopting ideas from [8, 9] they proposed to preserve the divergence-free condition on the tangential vector field. Recall that the tangential vector field τ is orthogonal to the normal (gradient) vector field n of the image d:

$$n = \nabla d = (d_x, d_y)^T, \qquad \tau = \nabla^\perp d = (-d_y, d_x)^T. \tag{4}$$

Hence div τ = 0. The first step of the TV-Stokes algorithm smooths the tangential vector field $\tau_0 = \nabla^\perp d_0$ of a given noisy image $d_0$ by solving the minimization problem

$$\min_\tau\ \|\nabla\tau\|_{L^1} + \frac{1}{2\delta}\|\tau - \tau_0\|^2_{L^2} \quad \text{subject to } \operatorname{div}\tau = 0, \tag{5}$$

where δ > 0 is some carefully chosen parameter. Once a smoothed tangential vector field τ is obtained, the second step reconstructs the image d by fitting it to the normal vector field by solving the minimization problem

$$\min_d\ \|\nabla d\|_{L^1} - \left(\nabla d, \frac{n}{|n|}\right)_{L^2} \quad \text{subject to } \|d - d_0\|_{L^2} = \sigma, \tag{6}$$

where σ is an estimate of $\|\eta\|_{L^2}$. In [5] the minimization problems (5) and (6) are numerically solved by means of an explicit time marching scheme, while existence and uniqueness are proven for the Modified TV-Stokes in [6].

The TV-Stokes approach resulted in an algorithm which does not suffer from the staircase effect, preserves the edges, and produces visually pleasant denoised images. However, the TV-Stokes algorithm from [5] converges extremely slowly and is therefore practically unusable, as demonstrated in the last section of the present paper. We adopt the TV-Stokes denoising model but reduce the primal formulation presented above to the so-called dual formulation, which is then numerically solved by a variant of Chambolle's fast iteration [10]. The reduction exploits the orthogonal projector $\Pi_K$ onto the subspace $K = \{\tau : \operatorname{div}\tau = 0\}$ to eliminate the divergence-free constraint.

2 The TV-Stokes Denoising Algorithm in Dual Formulation

To overcome difficulties with non-differentiability in the primal formulation, Carter [11], Chambolle [10] and Chan, Golub and Mulet [12] have proposed dual formulations of the ROF model, where a dual variable $p = (p_1(x,y), p_2(x,y))$ is used to express the total variation:

$$\|\nabla d\|_{L^1} = \max_p \left\{ (d, \operatorname{div} p)_{L^2} : |p_j(x,y)| \le 1\ \ \forall (x,y) \in \Omega,\ j = 1, 2 \right\}. \tag{7}$$

For instance, a variant of the dual formulation from [10] consists in minimizing the distance $\|\operatorname{div} p - \lambda d_0\|_{L^2}$. In [10] Chambolle also proposed a fast iteration for solving this minimization problem that produces a denoised image after only a few steps. Below we show how to reduce the TV-Stokes model to a dual formulation.

2.1 Step 1

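To derive a dual formulation of the first step we take advantage of the following analog of (7) for the total variation of the tangential vector field $\tau = (\tau_1, \tau_2)^T$:

$$\|\nabla\tau\|_{L^1} = \max_p \left\{ (\tau, \operatorname{div} p)_{L^2} : |p_i(x,y)| \le 1\ \ \forall (x,y) \in \Omega,\ i = 1, 2 \right\}, \tag{8}$$

where the dual variable p is a pair of two rows, $p_1 = (p_{11}, p_{12})$ and $p_2 = (p_{21}, p_{22})$. The divergence is defined as follows:

$$\operatorname{div} p = (\operatorname{div} p_1, \operatorname{div} p_2)^T, \quad \text{where } \operatorname{div} p_i = \frac{\partial p_{i1}}{\partial x} + \frac{\partial p_{i2}}{\partial y},\ i = 1, 2. \tag{9}$$

This definition is similar to the vectorial dual norm from [13] for vectorial images, e.g. color images. Plugging (8) into (5) yields

$$\min_{\operatorname{div}\tau = 0}\ \max_{|p_i| \le 1} \left\{ (\tau, \operatorname{div} p)_{L^2} + \frac{1}{2\delta}(\tau - \tau_0, \tau - \tau_0)_{L^2} \right\}. \tag{10}$$

Results from convex analysis, see for instance Theorem 9.3-1 in [14], allow us to exchange the order of max and min in (10) and obtain an equivalent optimization problem

$$\max_{|p_i| \le 1}\ \min_{\operatorname{div}\tau = 0} \left\{ (\tau, \operatorname{div} p)_{L^2} + \frac{1}{2\delta}(\tau - \tau_0, \tau - \tau_0)_{L^2} \right\}. \tag{11}$$

Now comes a trick. Let us introduce the orthogonal projection $\Pi_K$ onto the constrained subspace $K = \{\tau : \operatorname{div}\tau = 0\}$. Note that $\tau_0 \in K$. By means of the pseudoinverse $\Delta^+$ we may write that

$$\Pi_K \begin{pmatrix} \tau_1 \\ \tau_2 \end{pmatrix} = \begin{pmatrix} \tau_1 \\ \tau_2 \end{pmatrix} - \nabla\Delta^{+}\operatorname{div}\begin{pmatrix} \tau_1 \\ \tau_2 \end{pmatrix}. \tag{12}$$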

310

C.A. Elo, A. Malyshev, and T. Rahman

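The constraint $\operatorname{div}\tau = 0$ means that $\Pi_K \tau = \tau$, and the latter implies the equalities $(\tau, \operatorname{div} p) = (\Pi_K \tau, \operatorname{div} p) = (\tau, \Pi_K \operatorname{div} p)$. Hence (11) is equivalent to

$$\max_{|p_i| \le 1}\ \min_{\operatorname{div}\tau = 0} \left\{ (\tau, \Pi_K \operatorname{div} p)_{L^2} + \frac{1}{2\delta}\,(\tau - \tau_0, \tau - \tau_0)_{L^2} \right\}. \tag{13}$$

The solution to the minimization problem (without the constraint $\operatorname{div}\tau = 0$!)

$$\min_{\tau} \left\{ (\tau, \Pi_K \operatorname{div} p)_{L^2} + \frac{1}{2\delta}\,(\tau - \tau_0, \tau - \tau_0)_{L^2} \right\}$$

is

$$\tau = \tau_0 - \delta\, \Pi_K \operatorname{div} p \tag{14}$$

and satisfies the constraint $\operatorname{div}\tau = 0$. Owing to (14) we have the equality

$$(\tau, \Pi_K \operatorname{div} p) + \frac{1}{2\delta}\,(\tau - \tau_0, \tau - \tau_0) = \frac{1}{2\delta} \left[ (\tau_0, \tau_0) - (\delta\, \Pi_K \operatorname{div} p - \tau_0,\ \delta\, \Pi_K \operatorname{div} p - \tau_0) \right],$$

which together with (13) gives our dual formulation:

$$\min_{p} \left\{ \left\| \Pi_K \operatorname{div} p - \delta^{-1} \tau_0 \right\|_{L^2} : |p_i| \le 1,\ i = 1, 2 \right\}. \tag{15}$$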
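The step from the unconstrained minimization to (14) can be spelled out as a short worked derivation. This is only a sketch, assuming the inner products are the real $L^2$ products used above and that $\Pi_K$ is an orthogonal projector with $\tau_0 \in K$:

```latex
\begin{align*}
  0 &= \frac{\partial}{\partial \tau}
       \Big[ (\tau, \Pi_K \operatorname{div} p)_{L^2}
             + \tfrac{1}{2\delta}\,(\tau - \tau_0, \tau - \tau_0)_{L^2} \Big]
     = \Pi_K \operatorname{div} p + \tfrac{1}{\delta}\,(\tau - \tau_0) \\
  \Longrightarrow\quad
  \tau &= \tau_0 - \delta\, \Pi_K \operatorname{div} p, \\
  \operatorname{div} \tau &= \operatorname{div} \tau_0
       - \delta \operatorname{div}\big( \Pi_K \operatorname{div} p \big) = 0,
\end{align*}
```

The last line holds because both $\tau_0$ and the range of $\Pi_K$ lie in $K$, which is why the constraint can be dropped in the inner minimization.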

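The numerical solution of (15) is computed by Chambolle's iteration from [10]:

$$p^0 = 0, \qquad p^{n+1} = \frac{p^n + \Delta t\, \nabla\!\left( \Pi_K \operatorname{div} p^n - \delta^{-1} \tau_0 \right)}{1 + \Delta t \left| \nabla\!\left( \Pi_K \operatorname{div} p^n - \delta^{-1} \tau_0 \right) \right|}. \tag{16}$$

The iteration converges rapidly when $\Delta t \le \frac{1}{4}$. The smoothed tangential field after $n$ iterations is given by $\tau_n = \tau_0 - \delta\, \Pi_K \operatorname{div} p^n$.

2.2 Step 2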

The image $d$ is reconstructed at the second step by fitting it to the normal vector field built from the tangential vector field computed at step 1, $(n_1, n_2) = (\tau_2, -\tau_1)$. Again we introduce a dual variable $r = (r_1(x,y), r_2(x,y))$ and use the formula $\|\nabla d\|_{L^1} = \max_{|r| \le 1} (\nabla d, -r)_{L^2}$. Then the minimization problem (6) is equivalent to the problem

$$\min_{d}\ \max_{|r| \le 1} \left\{ \left( d,\ \operatorname{div}\!\left( r + \frac{n}{|n|} \right) \right)_{L^2} + \frac{1}{2\mu}\, \| d - d_0 \|_{L^2}^2 \right\}, \tag{17}$$

where $\mu > 0$ is a Lagrangian multiplier. After interchanging min and max in (17) we find the condition for attaining the minimum:

$$d = d_0 - \mu \operatorname{div}\!\left( r + \frac{n}{|n|} \right). \tag{18}$$

By analogy with (15) we can derive the dual formulation for step 2:

$$\min_{r} \left\{ \left\| \operatorname{div}\!\left( r + \frac{n}{|n|} \right) - \frac{d_0}{\mu} \right\|_{L^2} : |r| \le 1 \right\}. \tag{19}$$
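To make the structure of such dual problems concrete, the following is a minimal one-dimensional sketch of a Chambolle-type fixed-point iteration for (19) with the normal-field term dropped (n = 0), which is the scalar ROF dual. It is an illustration under stated assumptions, not the authors' code: the function names, the 1D forward-difference gradient, the zero boundary values of the dual variable and the step size 1/4 for the 1D operator are all choices made here.

```python
import math

def grad(u):
    """Forward differences of a 1D signal (length N-1)."""
    return [u[i + 1] - u[i] for i in range(len(u) - 1)]

def div(p):
    """Negative adjoint of grad, with p taken as 0 outside the domain (length N)."""
    n = len(p) + 1
    return [(p[i] if i < n - 1 else 0.0) - (p[i - 1] if i > 0 else 0.0)
            for i in range(n)]

def dual_energy(p, d0, mu):
    """L2 norm of div p - d0/mu, the quantity minimized in the dual problem."""
    return math.sqrt(sum((dv - x / mu) ** 2 for dv, x in zip(div(p), d0)))

def chambolle_1d(d0, mu, dt=0.25, iters=300):
    """Semi-implicit dual iteration; returns the denoised signal and the dual variable."""
    p = [0.0] * (len(d0) - 1)
    for _ in range(iters):
        w = [dv - x / mu for dv, x in zip(div(p), d0)]
        g = grad(w)
        # The normalization keeps |p_i| <= 1 without an explicit projection.
        p = [(pi + dt * gi) / (1.0 + dt * abs(gi)) for pi, gi in zip(p, g)]
    d = [x - mu * dv for x, dv in zip(d0, div(p))]
    return d, p

if __name__ == "__main__":
    noisy = [0.0, 0.1, -0.1, 0.05, 1.0, 0.9, 1.1, 1.0]
    denoised, p = chambolle_1d(noisy, mu=0.2)
    print([round(x, 3) for x in denoised])
```

The denominator in the update is what distinguishes this scheme from plain projected gradient descent: the constraint on the dual variable is maintained automatically at every step.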

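Chambolle's iteration for (19) is as follows:

$$r^{n+1} = \frac{r^n + \Delta t\, \nabla\!\left( \operatorname{div}\!\left( r^n + \frac{n}{|n|} \right) - \mu^{-1} d_0 \right)}{1 + \Delta t \left| \nabla\!\left( \operatorname{div}\!\left( r^n + \frac{n}{|n|} \right) - \mu^{-1} d_0 \right) \right|}. \tag{20}$$

2.3 The Discrete Algorithm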

The staggered grid is used for discretization as in [5]. For convenience we introduce the differentiation matrices

$$B = \frac{1}{h} \begin{pmatrix} -1 & 1 & & & \\ & -1 & 1 & & \\ & & \ddots & \ddots & \\ & & & -1 & 1 \end{pmatrix}, \qquad -B^T = \frac{1}{h} \begin{pmatrix} 1 & & & \\ -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \\ & & & -1 \end{pmatrix}, \tag{21}$$

where $B$ is the forward difference operator and $-B^T$ is the backward difference operator. The discrete gradient operator applied to a matrix $d$ is then defined as

$$\nabla_h d = \left( d B_x^T,\ B_y d \right), \tag{22}$$

where $B_x$ ($B_y$) stands for differentiation in the $x$ (resp. $y$) direction. The discrete divergence operator is given by

$$\operatorname{div}_h (p_1, p_2) = -p_1 B_x - B_y^T p_2. \tag{23}$$

The discrete analog of the projection operator $\Pi_K$ has the form

$$\Pi_K^h = I - \nabla^h (\Delta^h)^{+} \operatorname{div}^h, \tag{24}$$

where the gradient and divergence are applied in a slightly different manner:

$$\operatorname{div}^h \begin{pmatrix} \tau_1 \\ \tau_2 \end{pmatrix} = -\tau_1 B_x - B_y^T \tau_2, \qquad \nabla^h d = \begin{pmatrix} d B_x^T \\ B_y d \end{pmatrix}. \tag{25}$$

To complete the definition (24) we need a description of the pseudoinverse operator $(\Delta^h)^{+}$ for the discrete Laplacian

$$\Delta^h d = -d B_x^T B_x - B_y^T B_y d. \tag{26}$$

Let us introduce the orthogonal $N \times N$ matrix $C$ of the Discrete Cosine Transform, which is defined by dct(eye(N)) in MATLAB. The symmetric matrix $S$ of the Discrete Sine Transform, defined in MATLAB by dst(eye(N-1)), satisfies the equation $S^T S = (N/2)\, I$, where $I$ is the identity matrix of order $N - 1$. We prefer to use the orthogonal symmetric matrix $\tilde{S} = -S/\sqrt{N/2}$ of order $N - 1$. The singular value decomposition of $B$ has the form

$$B = \tilde{S}\, [0, \Sigma]\, C, \qquad \Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_{N-1}), \tag{27}$$

where the diagonal matrix $\Sigma$ has the diagonal entries

$$\sigma_k = \frac{2}{h} \sin \frac{\pi k}{2N}, \qquad k = 1, 2, \dots, N - 1. \tag{28}$$
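The decomposition (27) and (28) can be checked numerically without any transform library. The sketch below is an illustration only: it assumes the orthonormal DCT-II convention for the rows of $C$ and the sine vectors $\sin(\pi m k / N)$ for the columns of $S$, and it verifies that the forward difference of the k-th cosine vector equals $\sigma_k$ times the k-th column of $-S/\sqrt{N/2}$. The function name is hypothetical.

```python
import math

def svd_mismatch(N, h=1.0):
    """Largest entrywise error in B c_k = sigma_k * s_k for k = 1..N-1."""
    worst = 0.0
    w = math.sqrt(2.0 / N)  # normalization of the non-constant DCT rows
    for k in range(1, N):
        sigma = (2.0 / h) * math.sin(math.pi * k / (2 * N))
        # k-th row of the orthonormal DCT-II matrix C
        c = [w * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N)]
        # forward difference B applied to that row, length N-1
        Bc = [(c[n + 1] - c[n]) / h for n in range(N - 1)]
        # k-th column of -S / sqrt(N/2), with S[m][k] = sin(pi*m*k/N), m = 1..N-1
        s = [-w * math.sin(math.pi * (m + 1) * k / N) for m in range(N - 1)]
        worst = max(worst, max(abs(a - sigma * b) for a, b in zip(Bc, s)))
    return worst

if __name__ == "__main__":
    print(svd_mismatch(8), svd_mismatch(5, h=0.5))
```

The constant vector (k = 0) is annihilated by the forward difference, which corresponds to the zero column in $[0, \Sigma]$.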

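With the aid of (27), equation (26) can be rewritten as

$$f = \Delta^h d = -d\, C^T \begin{pmatrix} 0 & \\ & \Sigma_x^2 \end{pmatrix} C - C^T \begin{pmatrix} 0 & \\ & \Sigma_y^2 \end{pmatrix} C\, d. \tag{29}$$

Denoting $\hat{f} = C f C^T$ and $\hat{d} = C d C^T$ we arrive at the equation

$$\hat{f} = -\hat{d} \begin{pmatrix} 0 & \\ & \Sigma_x^2 \end{pmatrix} - \begin{pmatrix} 0 & \\ & \Sigma_y^2 \end{pmatrix} \hat{d}. \tag{30}$$

This equation is easily solved with respect to $\hat{d}$. Suppose that the matrices $\hat{f}$ and $\hat{d}$ have the entries $\hat{f}_{ij}$ and $\hat{d}_{ij}$ for $i, j = 0, 1, \dots$. Note that in our case $\hat{f}_{00} = 0$. Then the solution $\hat{d} = G(\hat{f})$ is as follows:

$$\begin{aligned}
\hat{d}_{00} &= 0, \\
\hat{d}_{i,0} &= -\hat{f}_{i,0} / \sigma_{i,y}^2, & i &= 1, 2, \dots, \\
\hat{d}_{0,j} &= -\hat{f}_{0,j} / \sigma_{j,x}^2, & j &= 1, 2, \dots, \\
\hat{d}_{ij} &= -\hat{f}_{ij} / (\sigma_{i,y}^2 + \sigma_{j,x}^2), & i, j &= 1, 2, \dots.
\end{aligned} \tag{31}$$

Thus the pseudoinverse operator $(\Delta^h)^{+}$ can be efficiently computed with the help of the Discrete Cosine Transform:

$$(\Delta^h)^{+} f = C^T\, G(C f C^T)\, C, \tag{32}$$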

where the function $G$ is defined in (31). In conclusion we recall that multiplication of an $N \times N$ matrix by $C$ or $C^T = C^{-1}$ is typically implemented with the aid of the fast Fourier transform and requires only $O(N^2 \log_2 N)$ arithmetical operations. All other computations have the cost $O(N^2)$.
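The recipe (29) to (32) can be sketched in a few lines. The code below is a small illustrative implementation under stated assumptions, not the authors' MATLAB code: it uses dense matrix products instead of a fast transform, a square $N \times N$ grid with the same spacing $h$ in both directions, and hypothetical function names. It applies the discrete Laplacian (26) and then inverts it on zero-mean data.

```python
import math

def dct_matrix(N):
    """Orthonormal DCT-II matrix C (rows are cosine vectors)."""
    rows = []
    for k in range(N):
        w = math.sqrt((1.0 if k == 0 else 2.0) / N)
        rows.append([w * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N)])
    return rows

def matmul(A, B):
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def neg_BtB(v, h):
    """Apply -B^T B (second difference with natural boundary rows) to a vector."""
    N = len(v)
    out = []
    for i in range(N):
        left = v[i - 1] - v[i] if i > 0 else 0.0
        right = v[i + 1] - v[i] if i < N - 1 else 0.0
        out.append((left + right) / h ** 2)
    return out

def laplacian(d, h=1.0):
    """Discrete Laplacian (26): -d B^T B - B^T B d."""
    along_rows = [neg_BtB(r, h) for r in d]
    along_cols = transpose([neg_BtB(c, h) for c in transpose(d)])
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(along_rows, along_cols)]

def laplacian_pinv(f, h=1.0):
    """Pseudoinverse (32): C^T G(C f C^T) C with G from (31)."""
    N = len(f)
    C = dct_matrix(N)
    Ct = transpose(C)
    fh = matmul(matmul(C, f), Ct)
    s2 = [((2.0 / h) * math.sin(math.pi * k / (2 * N))) ** 2 for k in range(N)]
    dh = [[0.0 if (i == 0 and j == 0) else -fh[i][j] / (s2[i] + s2[j])
           for j in range(N)] for i in range(N)]
    return matmul(matmul(Ct, dh), C)

def zero_mean(d):
    m = sum(sum(r) for r in d) / (len(d) * len(d[0]))
    return [[x - m for x in r] for r in d]
```

Because the constant mode is mapped to zero, the pseudoinverse recovers only the zero-mean part of an image; in practice the dense products would be replaced by a fast cosine transform.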

Fig. 1. Original images: (a) Lena, 200 × 200; (b) Cameraman, 256 × 256; (c) Barbara, 512 × 512

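Algorithm. Dual TV-Stokes

Given $d_0$, $k$, $\delta$ and $\mu$.

Step one:
  Let $p^0 = 0$ and $q^0 = 0$.
  Calculate $\tau_0 = (v_0, u_0)$: $v_0 = -B d_0$ and $u_0 = d_0 B^T$.
  Initialize counter: $n = 0$.
  while not converged do
    Calculate projections:
      $$(\pi_p, \pi_q) = \Pi_K^h \left( \operatorname{div}_h p^n,\ \operatorname{div}_h q^n \right) \tag{33}$$
    Update the dual variables:
      $$p^{n+1} = \frac{p^n + k\, \nabla_h \left( \pi_p - \delta^{-1} v_0 \right)}{1 + k \left| \nabla_h \left( \pi_p - \delta^{-1} v_0 \right) \right|} \tag{34}$$
      $$q^{n+1} = \frac{q^n + k\, \nabla_h \left( \pi_q - \delta^{-1} u_0 \right)}{1 + k \left| \nabla_h \left( \pi_q - \delta^{-1} u_0 \right) \right|} \tag{35}$$
    Update counter: $n = n + 1$.
  end
  Calculate $\tau$:
      $$\tau = \tau_0 - \Pi_K^h \left( \delta \operatorname{div}_h p^{n+1},\ \delta \operatorname{div}_h q^{n+1} \right) \tag{36}$$

Step two:
  Let $r^0 = 0$ and calculate the normal field: $n = (n_1, n_2)$, $n_1 = u (v^2 + u^2)^{-1/2}$ and $n_2 = -v (v^2 + u^2)^{-1/2}$.
  Initialize counter: $n = 0$.
  while not converged do
    Update the dual variable:
      $$r^{n+1} = \frac{r^n + k\, \nabla_h \left( \operatorname{div}_h (r^n + n) - \mu^{-1} d_0 \right)}{1 + k \left| \nabla_h \left( \operatorname{div}_h (r^n + n) - \mu^{-1} d_0 \right) \right|} \tag{37}$$
    Update counter: $n = n + 1$.
  end
  Recover image $d$:
      $$d = d_0 - \mu \operatorname{div}_h \left( r^{n+1} + n \right) \tag{38}$$

Algorithm 1. Dual TV-Stokes algorithm for image denoising

2.4 Numerical Experiments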

In what follows we present several examples to show how the TV-Stokes method works for different images. All the images we have tested are normalized to gray-scale values ranging from 0 (black) to 1 (white). In the experiments we start with a clean image, shown in Figure 1, and then add random noise with zero mean. This is done with the imnoise MATLAB command, where the variance parameter is set to 0.001 for the Barbara image and 0.005 for the Lena image. The Cameraman image is taken directly from the paper [5], so we compare the results with the same noisy image as input. In [5] this model is further compared to the two-step method LOT and the well-known ROF model.

The signal-to-noise ratio is measured in decibels before denoising:

$$\mathrm{SNR} = 20 \log_{10} \sqrt{ \frac{\int_\Omega (d - \bar{d})^2 \, dx}{\int_\Omega (\eta - \bar{\eta})^2 \, dx} }, \tag{39}$$

where $\bar{d} = \frac{1}{|\Omega|} \int_\Omega d \, dx$ and $\bar{\eta} = \frac{1}{|\Omega|} \int_\Omega \eta \, dx$.

Fig. 2. Energy vs. iterations plot for the first step: (a) dual TV-Stokes, Algorithm 1; (b) TV-Stokes [5]

The numerical procedures used in [5] were based on explicit finite difference schemes. This process is very slow, as the constraint converges slowly. However, in the proposed dual method the constraint is satisfied at each step by the orthogonal projection. The energy and number of iterations required for convergence in step one are shown in Figure 2. The figure clearly illustrates that the dual TV-Stokes algorithm requires fewer iterations before the energy is stable than the primal TV-Stokes algorithm. Although each iteration of the dual TV-Stokes algorithm requires more computational effort, it is much faster than using sparse linear solvers. Inverting the Laplacian for the orthogonal projection in each iteration is a bottleneck for very large images. In all these examples the projection was applied with the aid of the Fast Fourier Transform, which needs $O(N^2 \log N)$ operations in each iteration. For very large images, one should consider using a multigrid solver for applying the projection. This will reduce the operation count to $O(N^2)$. All methods were coded in MATLAB, and in Table 1 the CPU time is given in seconds for each test image. The figure shows the dual TV-Stokes algorithm vs. the primal TV-Stokes algorithm from [5]. We measure the $L^2$-norm of the energy in (15) and (19) for the stopping criterion, and stop the iteration when the difference of the energy is below $10^{-3}$. For the TV-Stokes algorithm we used the same stopping criteria as in [5], where the tolerance of the $L^2$-norm of the

Table 1. Runtimes (CPU time in seconds) of the dual TV-Stokes algorithm compared to the TV-Stokes algorithm [5]. The test system is a 2 Opteron 270 dual-core 64-bit processor and 8GB RAM. Both steps in the dual TV-Stokes algorithm are computed with 150 iterations, while the first step in the primal TV-Stokes algorithm is calculated with 75000 iterations and the second step with 25000 iterations.

Image        Dual TV-Stokes algorithm       TV-Stokes algorithm [5]
             First step    Second step      First step    Second step
Lena               9.8           1.12           9083.2         1992.5
Cameraman         17.4           2.2           11189.0         2259.4
Barbara          128.2          20.7           80602.5        14926.3

constraint is equal to $5 \times 10^{-3}$ and the tolerance for the difference in the energy is equal to $10^{-3}$. The time steps were set to $10^{-3}$ and $5 \times 10^{-3}$ for the first and second step of the TV-Stokes algorithm, respectively.

Our first test is the well-known Lena image, which we recover from heavy added noise. We have cropped the image to show the face, which consists of smooth areas and edges that are important to preserve. The denoised image in Figure 3 shows that the dual TV-Stokes method has recovered the smooth areas without inducing any staircase effect. The smoothing parameter δ is equal to 0.0835 and μ is equal to 0.17. Since this is a highly noisy image, the ROF model fails to give a visually pleasant image, because the smooth surfaces are piecewise continuous. The TV-Stokes algorithm, however, has nearly the same quality as the dual TV-Stokes algorithm. For the TV-Stokes algorithm, δ was equal to 0.045.

The next test is the Cameraman image, which consists of a smooth skyline and some low-intensity buildings in the background. The buildings are difficult to recover, as they get smeared out by the denoising. The results are shown in Figure 4 with δ equal to 0.055 and μ equal to 0.08. The TV-Stokes result is taken from [5], where the SNR is the same as the one we report, $20 \log_{10}(8.21) \approx 18.28$. Figure 4(d) shows the TV-Stokes reconstruction for the same noisy image, where the parameter δ is equal to 0.06.

The last example is the Barbara image, which is quite detailed, with high- and low-intensity textures. The high-intensity textures and the smooth areas are preserved quite well, but the low-intensity textures disappear in the same way as for the Cameraman. This image is 512 × 512 in size, which makes the algorithm slower because of the rather large number of matrix operations per iteration. However, finding a result for the optimal parameters is still feasible, since the method produces a denoised image after a few steps. Thus, one can run the method multiple times to find the optimal parameters. For this image we used δ equal to 0.05 and μ equal to 0.15. We do not report an optimal result of the TV-Stokes algorithm for this particular case, due to the page limitation and the amount of running time.

Clearly, using the dual formulation is more effective than solving the model with the explicit gradient descent method. The CPU time is measured for a single run only, since computing an average over many runs is very time consuming

Fig. 3. Lena image (200 × 200), denoised using the dual TV-Stokes, TV-Stokes and the ROF algorithm: (a) noisy image, SNR ≈ 14.0; (b) denoised using the dual TV-Stokes algorithm; (c) contour plot, dual TV-Stokes image; (d) difference image, dual TV-Stokes; (e) denoised using ROF [1]; (f) difference image, ROF; (g) denoised using the TV-Stokes algorithm [5]; (h) difference image, TV-Stokes

Fig. 4. Cameraman (256 × 256), denoised using the dual and the primal formulation of the TV-Stokes algorithm: (a) noisy image, SNR ≈ 18.28; (b) denoised using the dual TV-Stokes algorithm; (c) difference image, dual TV-Stokes; (d) denoised using the TV-Stokes algorithm [5]

Fig. 5. Barbara (512 × 512), denoised using the dual formulation of the TV-Stokes algorithm: (a) noisy image, SNR ≈ 20.0; (b) denoised image

for the TV-Stokes method. Although the times shown are for a single run, they clearly indicate that our method is much faster and more stable. The comparison with the primal method also shows that the proposed dual method has the same denoising quality.
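The SNR values quoted in the figure captions follow the definition (39). A minimal sketch of that computation is given below; it assumes the amplitude (root-mean-square) form of the ratio inside the logarithm, treats images as flat lists of gray values, and uses a hypothetical function name.

```python
import math

def snr_db(signal, noise):
    """SNR in decibels: 20*log10 of the ratio of RMS deviations from the mean."""
    mean_s = sum(signal) / len(signal)
    mean_n = sum(noise) / len(noise)
    num = sum((s - mean_s) ** 2 for s in signal)
    den = sum((e - mean_n) ** 2 for e in noise)
    return 20.0 * math.log10(math.sqrt(num / den))

if __name__ == "__main__":
    clean = [0.0, 2.0, 0.0, 2.0]
    eta = [0.1, -0.1, 0.1, -0.1]
    print(snr_db(clean, eta))
```

In this form an amplitude ratio of 10 between signal and noise corresponds to 20 dB.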


References

1. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D 60(1-4), 259–268 (1992)
2. Chan, T., Marquina, A., Mulet, P.: High-order total variation-based image restoration. SIAM J. Sci. Comput. 22(2), 503–516 (2000)
3. Chambolle, A., Lions, P.L.: Image recovery via total variation minimization and related problems. Numer. Math. 76, 167–188 (1997)
4. Lysaker, O., Lundervold, A., Tai, X.C.: Noise removal using fourth-order partial differential equation with applications to medical magnetic resonance images in space and time. IEEE Trans. Imag. Proc. 12, 1579–1590 (2003)
5. Rahman, T., Tai, X.C., Osher, S.: A TV-Stokes denoising algorithm. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 473–483. Springer, Heidelberg (2007)
6. Litvinov, W., Rahman, T., Tai, X.C.: A modified TV-Stokes model for image processing (submitted) (2008)
7. Lysaker, O.M., Osher, S., Tai, X.C.: Noise removal using smoothed normals and surface fitting. IEEE Transaction on Image Processing 13(10), 1345–1357 (2004)
8. Bertalmio, M., Bertozzi, A., Sapiro, G.: Navier-Stokes, fluid dynamics, and image and video inpainting. In: Proc. IEEE Computer Vision and Pattern Recognition (CVPR) (2001)
9. Tai, X., Osher, S., Holm, R.: Image inpainting using TV-Stokes equation. Image Processing based on partial differential equations (2006)
10. Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging Vis. 20(1-2), 89–97 (2004)
11. Carter, J.: Dual methods for total variation-based image restoration. PhD thesis, UCLA (2001)
12. Chan, T.F., Golub, G.H., Mulet, P.: A nonlinear primal-dual method for total variation-based image restoration. SIAM J. Sci. Comput. 20(6), 1964–1977 (1999)
13. Bresson, X., Chan, T.F.: Fast minimization of the vectorial total variation norm and applications to color image processing. CAM Report 07-25 (2007)
14. Ciarlet, P.G., Jean-Marie, T., Bernadette, M.: Introduction to numerical linear algebra and optimisation. Cambridge University Press, Cambridge (1989)

Anisotropic Regularization for Inverse Problems with Application to the Wiener Filter with Gaussian and Impulse Noise

Micha Feigin and Nir Sochen

School of Mathematics, Tel Aviv University

Abstract. Most inverse problems require a regularization term on the data. The classic approach for the variational formulation is to use the L2 norm on the data gradient as a penalty term. This, however, acts as a low-pass filter and thus is not good at preserving edges in the reconstructed data. In this paper we propose a novel approach whereby an anisotropic regularization is used to preserve object edges. This is achieved by calculating the data gradient over a Riemannian manifold instead of the standard Euclidean space using the Laplace-Beltrami approach. We also employ a modified fidelity term to handle impulse noise. This approach is applicable to both scalar and vector-valued images. The results are demonstrated via the Wiener filter with several approaches for minimizing the functional, including a novel GSVD-based spectral approach applicable to functionals containing gradient-based features.

1 Introduction

Handling degraded images, both due to blur and noise, is a practical reality in any imaging field. The common image degradation model is

$$I = I_0 * h + n, \tag{1}$$

where $I$, the observed image, is the result of convolving the input image (or ideal image) $I_0$ with some blurring kernel $h$. The result is then summed with additive noise $n$. This is a common model for any system that contains a lens and a sensor. Both the blur and the noise are a combination of several processes. Some typical causes of image blur are out-of-focus images, motion blur due to an unstable camera and/or object, and a low-pass filter resulting from the finite aperture and the anti-aliasing filter on the sensor. Noise can result from the sensor and amplifier due to low light, heat, dead pixels and background radiation, or from memory and communication corruption. Each of these processes has its own typical blur kernel and noise distribution statistics [1, 2].

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 319–330, 2009. © Springer-Verlag Berlin Heidelberg 2009

A direct naive approach to handling the blur is a spectral (Fourier) manipulation of the degradation model equation. To see the difficulty, though, look at the Fourier transform of this equation, $\hat{I} = \hat{I}_0 \cdot \hat{h} + \hat{n}$ (where the hat notation denotes the Fourier transform). This transforms the convolution into a multiplication, which allows for an easy rearrangement of the equation. Extracting $\hat{I}_0$ gives us $\hat{I}_0 = (\hat{I} - \hat{n}) / \hat{h}$. The transform of any $L^2$ kernel $h$ will decay to zero at infinity. This results in a divide-by-zero issue, at least for high frequencies. Add to that the fact that the SNR usually drops at these frequencies, which makes this procedure very sensitive to noise.

One solution in this case is the Wiener filter [3], which can be derived from the standard variational formulation for ill-posed inverse problems by adding prior knowledge (or assumptions) via an additional penalty term to the reconstruction. That is, one minimizes an energy functional of the form

$$S(I_0) = \underbrace{\| I_0 * h - I \|}_{\text{fidelity term}} + \mu \underbrace{\Phi(I_0)}_{\text{penalty}}. \tag{2}$$

Here Φ is some function of the parameter $I_0$ that imposes the assumptions on the model. A common constraint term is $\Phi(I_0) = \|\nabla I_0\|$, which penalizes high frequencies, as these are often the source of instability. The side effect of this constraint is that while high-frequency noise is reduced in the reconstruction, edge detail is lost as well, as is demonstrated in Fig. 1.

Fig. 1. Edge preservation vs. noise suppression with the Wiener filter: (a) original image; (b) degraded input; (c) $\mu = 5 \cdot 10^{-4}$; (d) $\mu = 5 \cdot 10^{-5}$. The input image 1(a) is degraded using Gaussian white noise 1(b). The results show the difference between preferring noise suppression 1(c) and edge preservation 1(d).

This functional is often minimized under the $L^2$ norm, which is appropriate for Gaussian noise. This is mainly due to the fact that the resulting Euler-Lagrange equations are linear and thus (relatively) easy to solve. That is, the classic Wiener filter functional based on the $L^2$ norm,

$$S(I_0) = \| I_0 * h - I \|_{L^2}^2 + \mu \| \nabla I_0 \|_{L^2}^2 = \int |I_0 * h - I|^2 + \mu |\nabla I_0|^2 \, dA, \tag{3}$$

results in the following Euler-Lagrange equation (see [4] for the derivation of the Euler-Lagrange formulation of the convolution):

$$h(-\bar{x}) * \left( h(\bar{x}) * I_0 - I \right) - \mu \Delta I_0 = 0. \tag{4}$$

Here $\bar{x}$ is the coordinate vector, $\bar{x} = (x, y)$ for the two-dimensional case. This can be solved as before by applying the Fourier transform, which results in

$$\hat{h}(-\bar{\omega}) \cdot \left( \hat{h}(\bar{\omega}) \cdot \hat{I}_0 - \hat{I} \right) + \mu |\bar{\omega}|^2 \hat{I}_0 = 0, \tag{5}$$

where $\bar{\omega} = (\omega_x, \omega_y)$ is the frequency vector for the resulting frequencies along the $x$ and $y$ axes respectively. Now, assuming that the convolution kernel is real, we can use the identity $\hat{h}(-\bar{\omega}) = \hat{h}^*(\bar{\omega})$ (where $\hat{h}^*$ is the conjugate of $\hat{h}$) to rewrite the equation as

$$\hat{I}_0 = \frac{\hat{h}^*(\bar{\omega})}{|\hat{h}(\bar{\omega})|^2 + \mu |\bar{\omega}|^2}\, \hat{I}. \tag{6}$$

Despite being easy to solve, there are two main issues with the $L^2$ norm approach, both for the constraint and the fidelity term. The first issue is that it fails to preserve object boundaries (Fig. 1). The main reason is the penalty term, which penalizes high frequencies. As the fidelity term is also $L^2$, it does little to alleviate this problem. The second issue is that the fidelity term is designed to handle Gaussian noise and behaves poorly in the presence of impulse noise.

One solution to both these issues is to use the $L^1$ or total variation (TV) norm [5,6,7]. When used for the fidelity term it improves behavior with impulse noise. For the constraint it improves edge preservation. For the functional

$$S(I_0) = \| I_0 * h - I \|_{TV} + \mu \| \nabla I_0 \|_{TV} = \int |I_0 * h - I| + \mu |\nabla I_0| \, dA \tag{7}$$

the resulting Euler-Lagrange equation is

$$h(-\bar{x}) * \frac{h(\bar{x}) * I_0 - I}{|h(\bar{x}) * I_0 - I|} - \mu \operatorname{div}\!\left( \frac{\nabla I_0}{|\nabla I_0|} \right) = 0. \tag{8}$$

Unfortunately, the solution of this equation is unstable. One approach to improve on this is to use an augmented TV norm [5],

$$S(I_0) = \int \sqrt{(I_0 * h - I)^2 + \eta} + \mu \sqrt{|\nabla I_0|^2 + \eta} \, dA, \tag{9}$$

with $0 < \eta \ll 1$. The resulting modified Euler-Lagrange equation is

$$h(-\bar{x}) * \frac{h(\bar{x}) * I_0 - I}{\sqrt{(h(\bar{x}) * I_0 - I)^2 + \eta}} - \mu \operatorname{div}\!\left( \frac{\nabla I_0}{\sqrt{|\nabla I_0|^2 + \eta}} \right) = 0. \tag{10}$$

This greatly improves the response of the ﬁdelity term to impulsive noise, but not so much for the edge preservation of the constraint. It also doesn’t account explicitly for the edges in the image. Other approaches include using Mumford-Shah like techniques of edge detection into the functional [4], weighing the Laplacian based on edge detection [8], Perona-Malik like regularizers [9], maximal likelihood estimators [10], certainty maps [11] and channel pairing on color images [12].
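To see why an augmented $L^1$ fidelity term of the form $\sqrt{r^2 + \eta}$ tolerates impulse noise better than the quadratic one, it helps to compare how strongly a single residual $r$ pulls on the solution, i.e. the derivative of each term with respect to $r$. The snippet below is a small illustration; the function names and the value of η are assumptions made here.

```python
import math

def l2_influence(r):
    """Derivative of the quadratic fidelity r^2: grows without bound."""
    return 2.0 * r

def augmented_l1_influence(r, eta=1e-4):
    """Derivative of sqrt(r^2 + eta): saturates at magnitude 1."""
    return r / math.sqrt(r * r + eta)

if __name__ == "__main__":
    for r in (0.01, 0.1, 1.0, 100.0):
        print(r, l2_influence(r), round(augmented_l1_influence(r), 6))
```

An outlier pixel with a very large residual therefore contributes a force proportional to its size under the quadratic term, but at most a unit force under the augmented term.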

We propose two novelties in this paper. The first is to combine the augmented $L^1$ norm on the fidelity term, for handling impulse noise, with anisotropic regularization based on the Laplace-Beltrami operator for edge preservation. This is achieved by keeping the $L^2$ norm of the gradient; however, it is calculated over a Riemannian manifold instead of the standard Euclidean space using a Laplace-Beltrami approach [13]. When combined with the augmented TV norm (9), this approach also produces exceptional results for impulsive noise (Sec. 4).

The second is the use of the GSVD (generalized singular value decomposition) for the minimization of functionals that employ a gradient-based penalty term. Its direct contribution is the ability to easily minimize non-local operators and functionals defined on non-square domains where the Fourier transform is inapplicable. For isotropic operators it can be very efficient, as the decomposition needs to be calculated only once, offline.

One interesting point of both these approaches is the relation to other frameworks. In particular, it helps to better understand the relation to sparse representation and K-SVD [14]. It is important to note that both these ideas are easily applicable to general ill-posed inverse problems over general feature spaces, and specifically for this case, also to color images [15] and textures [16].

The rest of this paper is organized as follows: Sec. 2 discusses the anisotropic approach. Sec. 3 discusses several approaches to minimizing the functional, including a novel approach using the GSVD. Sec. 4 shows some results of the method.

2 Anisotropic Regularization for the Wiener Filter

The problem with edge preservation lies with the gradient-based penalty term. In the Euler-Lagrange equations it manifests as a Laplacian that acts as a low-pass filter. In order to correctly formulate the anisotropic penalty term, we start with the Euler-Lagrange equation for the Wiener filter,

$$h(-x) * \left( h(x) * I_0 - I \right) - \mu \Delta I_0 = 0, \tag{11}$$

and replace the Laplacian with an anisotropic operator, namely the Laplace-Beltrami operator [13], resulting in

$$h(-x) * \left( h(x) * I_0 - I \right) - \mu \Delta_g I_0 = 0. \tag{12}$$

The Laplace-Beltrami operator is defined as

$$\Delta_g I = \frac{1}{\sqrt{g}} \operatorname{div}\!\left( \sqrt{g}\, G^{-1} \nabla I \right), \tag{13}$$

where for the gray-scale case

$$G = \begin{pmatrix} 1 + I_x^2 & I_x I_y \\ I_x I_y & 1 + I_y^2 \end{pmatrix}, \qquad g = \det(G). \tag{14}$$

What this does is apply the Laplacian diﬀusion operator, but instead of applying it under the standard Euclidean norm, it is applied over the image manifold [13].
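The effect of the metric can be made concrete at a single pixel. The sketch below evaluates $g = \det(G)$ and the diffusion tensor $\sqrt{g}\, G^{-1}$ from (13) and (14) for given image derivatives; the function name and the sample derivative values are assumptions used for illustration only.

```python
import math

def beltrami_tensor(Ix, Iy):
    """Return (sqrt(g) * G^{-1}, g) for the gray-scale metric G of (14)."""
    g = 1.0 + Ix * Ix + Iy * Iy          # determinant of G
    s = 1.0 / math.sqrt(g)               # sqrt(g) * (1/g)
    # G^{-1} = (1/g) * [[1 + Iy^2, -Ix*Iy], [-Ix*Iy, 1 + Ix^2]]
    tensor = [[s * (1.0 + Iy * Iy), -s * Ix * Iy],
              [-s * Ix * Iy, s * (1.0 + Ix * Ix)]]
    return tensor, g

if __name__ == "__main__":
    print(beltrami_tensor(0.0, 0.0))   # flat region
    print(beltrami_tensor(3.0, 0.0))   # strong edge with gradient along x
```

In a flat region the tensor is the identity, so the operator reduces to the ordinary Laplacian. At an edge with gradient along x, the weight across the edge drops to $1/\sqrt{g}$ while the weight along the edge grows to $\sqrt{g}$, which is the anisotropy that limits smoothing across edges.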


This means that we are looking at the image as a two-dimensional manifold in three-dimensional space for gray-scale images and in five-dimensional space for color images. When applying the diffusion operator, distance between pixels is measured over this manifold, so the distance takes into account not only spatial offset but also intensity offset. The result is that pixels on different sides of an edge are farther apart than pixels in the same homogeneous region, and the edges act as insulators so that image data doesn't flow across edges.

This can be extended to color images by applying the diffusion on a per-channel basis; that is, for each channel $I^i$ the process is $\Delta_g I^i = \frac{1}{\sqrt{g}} \operatorname{div}\!\left( \sqrt{g}\, G^{-1} \nabla I^i \right)$ with

$$G = \begin{pmatrix} 1 + \sum_i (I_x^i)^2 & \sum_i I_x^i I_y^i \\ \sum_i I_x^i I_y^i & 1 + \sum_i (I_y^i)^2 \end{pmatrix}. \tag{15}$$

The metric itself takes into account all the channels, coupling them in the final process to remove misalignment of the edges across the different channels. Note that the image channels can be color channels such as RGB or CMY, or more general features such as textures [16].

When extending the functional to handle impulse noise using the augmented $L^1$ fidelity term, the Euler-Lagrange equation becomes instead

$$h(-\bar{x}) * \frac{h(\bar{x}) * I_0 - I}{\sqrt{(h(\bar{x}) * I_0 - I)^2 + \eta}} - \mu \Delta_g I_0 = 0. \tag{16}$$

3 Finding the Minimizer

There are several approaches to minimizing the resulting functional. We already have the Euler-Lagrange equations, i.e., Eqs. (12) and (16). Using the direct Fourier space approach, even for the $L^2$ fidelity term, is not applicable here since the Fourier transform doesn't diagonalize the Laplace-Beltrami operator. A different, relatively simple direct approach is to use the gradient descent equation

$$\frac{\partial I_0}{\partial t} = -h(-\bar{x}) * \frac{h(\bar{x}) * I_0 - I}{\sqrt{(h(\bar{x}) * I_0 - I)^2 + \eta}} + \mu \Delta_g I_0. \tag{17}$$

For the $L^2$ fidelity term there are two other spectral approaches that can be applied here, an eigen transform and the GSVD. The advantage of these, among other things, is that they provide a direct solution and thus prove the existence of the minimizer, the same as for the standard Wiener filter. Proving the existence of a minimizer for the proposed Tikhonov functional is much more difficult and beyond the scope of this paper, but can be done along lines similar to those taken in [5].

3.1 The Laplace-Beltrami Eigen-Space

We can use the same approach implemented in [17] to diagonalize the Laplace-Beltrami operator. The problem is that the eigenvectors of the Laplace-Beltrami operator don't convert the convolution into a multiplication, so we need to combine this approach with the Fourier transform.

We start with the Euler-Lagrange equation for the anisotropic Wiener filter, Eq. (12). If we linearize the Laplace-Beltrami operator by fixing the metric, it becomes a self-adjoint negative (semi-)definite operator, and thus its eigenspace is a basis of the function space under the $L^2$ norm. Insert into this equation the eigen decomposition of the image using this eigenspace,

$$I_0 = \sum_i c_i^0 \phi_i, \qquad I = \sum_i c_i \phi_i. \tag{18}$$

This produces

$$h(-x) * \left( h(x) * \sum_i c_i^0 \phi_i - \sum_i c_i \phi_i \right) - \mu \sum_i \lambda_i c_i^0 \phi_i = 0, \tag{19}$$

which after rearrangement gives

$$\sum_i c_i^0 \left( h(-x) * h(x) * \phi_i - \mu \lambda_i \phi_i \right) = \sum_i c_i\, h(-x) * \phi_i. \tag{20}$$

Now, to handle the convolution, apply the Fourier transform:

$$\sum_i \left( \hat{h}^* \cdot \hat{h} \cdot c_i^0 \hat{\phi}_i - \mu \lambda_i c_i^0 \hat{\phi}_i \right) = \sum_i c_i \hat{h}^* \hat{\phi}_i, \tag{21}$$

which can be rewritten as

$$\sum_i c_i^0 \left( |\hat{h}|^2 - \mu \lambda_i \right) \hat{\phi}_i = \sum_i c_i \hat{h}^* \hat{\phi}_i. \tag{22}$$

This is a linear set of equations of the form $A \tilde{I}_0 = B \tilde{I}$. Here $\tilde{I} = (c_i)$ and $\tilde{I}_0 = (c_i^0)$ are the coefficient vectors in the Laplace-Beltrami eigenspace. This system of equations needs to be solved for $\tilde{I}_0$. Using these coefficients the ideal image $I_0$ can be reconstructed. For a full solution this needs to be combined with fixed-point iterations updating the metric, although the metric is stable with respect to the flow, so in effect this is rarely needed.

There are two things to note here. First, the coefficients of $I$ decay rather quickly, so we can truncate $\tilde{I}$ and thus avoid calculating all of $B$. The same assumption can be made for $\tilde{I}_0$ and thus for $A$.

3.2 Using the GSVD

Consider an energy functional with two linear operators $L_a$ and $L_b$ using the $L^2$ norm,

$$S(f) = \int |L_a f|^2 + \mu |L_b f|^2 \, dA. \tag{23}$$

Assuming that these operators can be discretized as matrices $A$ and $B$ respectively, this can be written, with $v$ a vector representation of the function $f$, as

$$S(v) = \| A v \|^2 + \mu \| B v \|^2. \tag{24}$$

The two matrices $A$ and $B$ have a joint diagonalization based on the generalized singular value decomposition (GSVD) of the form [18]

$$A = U \Sigma_1 X^T, \qquad B = V \Sigma_2 X^T, \tag{25}$$

with $U$ and $V$ unitary matrices and $\Sigma_1$ and $\Sigma_2$ positive diagonal (not necessarily square). $A$ and $B$ must have the same number of columns but not necessarily the same number of rows (this last property we will need later on). Thus Eq. (24) can be rewritten as

$$S(v) = \left\| U \Sigma_1 X^T v \right\|_{L^2}^2 + \mu \left\| V \Sigma_2 X^T v \right\|_{L^2}^2. \tag{26}$$

Now, we can substitute $\tilde{v} = X^T v$ to construct a functional in $\tilde{v}$. Also note that the $L^2$ norm is invariant to unitary transformations, thus this functional is equivalent to

$$S(\tilde{v}) = \| \Sigma_1 \tilde{v} \|_{L^2}^2 + \mu \| \Sigma_2 \tilde{v} \|_{L^2}^2. \tag{27}$$

This new functional can be minimized with respect to $\tilde{v}$, resulting in

$$\Sigma_1^T \Sigma_1 \tilde{v} + \mu \Sigma_2^T \Sigma_2 \tilde{v} = 0. \tag{28}$$

We would like to do something similar with the Wiener filter formulation. The problem is that the gradient operator cannot be discretized as a matrix operator, since it takes a function and returns a vector. Luckily, what we need is an operator operating on $I$ such that the norm would be equal to that of the gradient. For the $L^2$ case this can be achieved as follows:

$$S(I_0) = \int |h * I_0 - I|^2 + \mu |\nabla I_0|^2 \, dA \ \Rightarrow\ \| H I_0 - I \|_{L^2}^2 + \mu \left\| \begin{pmatrix} D_x \\ D_y \end{pmatrix} I_0 \right\|_{L^2}^2 = \| H I_0 - I \|_{L^2}^2 + \mu \| D I_0 \|_{L^2}^2, \tag{29}$$

where $H$ is the convolution matrix (which is block cyclic but not cyclic in the 2D case) and $D = \begin{pmatrix} D_x \\ D_y \end{pmatrix}$ is the matrix resulting from stacking the matrix for the derivative in the $x$ direction and the one for the derivative in the $y$ direction. For the $L^2$ case we get

$$\left\| \begin{pmatrix} D_x \\ D_y \end{pmatrix} I_0 \right\|_{L^2}^2 = \| D_x I_0 \|_{L^2}^2 + \| D_y I_0 \|_{L^2}^2 = \| \nabla I_0 \|_{L^2}^2. \tag{30}$$

Now we can use the fact that the GSVD can be applied to matrices with a different number of rows to diagonalize this equation:

$$H = U \Sigma_1 X^T, \qquad D = V \Sigma_2 X^T. \tag{31}$$

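Using this we can follow the same procedure as before,

$$\| H I_0 - I \|_{L^2}^2 + \mu \| D I_0 \|_{L^2}^2 \ \Rightarrow\ \left\| U \Sigma_1 X^T I_0 - I \right\|_{L^2}^2 + \mu \left\| V \Sigma_2 X^T I_0 \right\|_{L^2}^2, \tag{32}$$

and again, based on $U$ and $V$ being unitary, substituting $\tilde{I}_0 = X^T I_0$ and $\tilde{I} = U^{-1} I = U^T I$ results in

$$S(\tilde{I}_0) = \left\| \Sigma_1 \tilde{I}_0 - \tilde{I} \right\|_{L^2}^2 + \mu \left\| \Sigma_2 \tilde{I}_0 \right\|_{L^2}^2. \tag{33}$$

This can be minimized with respect to $\tilde{I}_0$ to produce

$$\Sigma_1^T \left( \Sigma_1 \tilde{I}_0 - \tilde{I} \right) + \mu \Sigma_2^T \Sigma_2 \tilde{I}_0 = 0, \tag{34}$$

or, after rearrangement and back-substitution,

$$I_0 = X^{-T} \left( \Sigma_1^T \Sigma_1 + \mu \Sigma_2^T \Sigma_2 \right)^{-1} \Sigma_1^T U^T I. \tag{35}$$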

Note that $\Sigma_1^T \Sigma_1 + \mu \Sigma_2^T \Sigma_2$ is a diagonal matrix and thus easy to invert (in fact, for $\mu = 1$ it is the identity matrix). To apply the same idea to the anisotropic case, we need to formulate the prior leading to the Laplace-Beltrami operator as a gradient over a manifold instead. The operator is the minimizer of the following symmetric positive definite functional,

$$\int \nabla I^T G^{-1} \nabla I \, \sqrt{g} \, d^m\sigma = \int \| D_g \nabla I \|^2 \, d^m\sigma, \qquad \sqrt{g}\, G^{-1} = D_g^2, \tag{36}$$

and the discrete formulation for the anisotropic derivative matrix $D_g$ (which replaces $D$ in Eq. (29)) can be found via an eigen decomposition of the matrix $\sqrt{g}\, G^{-1}$. With $g = 1 + I_x^2 + I_y^2$,

$$D_g = A \begin{pmatrix} D_x \\ D_y \end{pmatrix} = \begin{pmatrix} \dfrac{I_x^2 + \sqrt{g}\, I_y^2}{(I_x^2 + I_y^2) \sqrt[4]{g}}\, D_x + \dfrac{I_x I_y (1 - \sqrt{g})}{(I_x^2 + I_y^2) \sqrt[4]{g}}\, D_y \\[2ex] \dfrac{I_x I_y (1 - \sqrt{g})}{(I_x^2 + I_y^2) \sqrt[4]{g}}\, D_x + \dfrac{I_y^2 + \sqrt{g}\, I_x^2}{(I_x^2 + I_y^2) \sqrt[4]{g}}\, D_y \end{pmatrix}. \tag{37}$$

One advantage of this approach is that it is applicable to non-local operators and to non-square domains, where the Fourier transform as applied to the original Wiener filter fails. For the isotropic case the decomposition needs to be calculated only once, offline, as the transform is constant; it can thus be very efficient for recurring problems (or by splitting the problem into constant-sized patches as described in [17]).
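The closed form (35) solves the normal equations $(H^T H + \mu D^T D)\, I_0 = H^T I$ of the discrete functional (29). Since the Python standard library offers no GSVD routine, the sketch below does not compute the decomposition; it solves the same normal equations directly for a small one-dimensional example, which is what a GSVD-based solver would have to reproduce. The cyclic three-tap blur, the forward-difference matrix and all function names are assumptions chosen for illustration.

```python
def transpose(A):
    return [list(c) for c in zip(*A)]

def matmul(A, B):
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def blur_matrix(n):
    """Cyclic convolution with the kernel (0.25, 0.5, 0.25)."""
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        H[i][i] += 0.5
        H[i][(i - 1) % n] += 0.25
        H[i][(i + 1) % n] += 0.25
    return H

def diff_matrix(n):
    """Forward differences, (n-1) x n."""
    D = [[0.0] * n for _ in range(n - 1)]
    for i in range(n - 1):
        D[i][i], D[i][i + 1] = -1.0, 1.0
    return D

def tikhonov(H, D, observed, mu):
    """Minimizer of ||H v - observed||^2 + mu * ||D v||^2 via the normal equations."""
    Ht, Dt = transpose(H), transpose(D)
    HtH, DtD = matmul(Ht, H), matmul(Dt, D)
    A = [[a + mu * b for a, b in zip(ra, rb)] for ra, rb in zip(HtH, DtD)]
    return solve(A, matvec(Ht, observed))
```

The GSVD route diagonalizes exactly this system matrix, so that the inverse in (35) reduces to an elementwise division once $U$, $X$, $\Sigma_1$ and $\Sigma_2$ are available.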

4 Numerical Results

Comparing the reconstruction quality based on standard measures alone, such as SNR and PSNR, does not do justice to the method, because these L2-based measures are poor assessors of edge reconstruction. Despite this, and for lack of a better objective comparison method, we note that these measures do show an improvement in the reconstruction. It is also important to note the subjective difference when looking at the images themselves. The biggest difference is seen near pronounced edges and textures, which are much better preserved than with the standard Wiener filter. The method also removes the ringing (Gibbs effect) seen around strong edges, as well as color skews in color images. Because of the limits of the medium, the results are cropped and zoomed to accentuate the differences.
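The point that SNR and PSNR are L2-based and therefore blind to where an error occurs can be made concrete with a small sketch. The definitions below are one common convention and are an assumption; the paper does not state which formulas were used for its reported values. The 1D signals are hypothetical.

```python
import math

def snr_db(reference, estimate):
    """10*log10(signal energy about its mean / squared error); assumed convention."""
    mean_ref = sum(reference) / len(reference)
    signal = sum((r - mean_ref) ** 2 for r in reference)
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal / error)

def psnr_db(reference, estimate, peak=255.0):
    """10*log10(peak^2 / mean squared error); assumed convention."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    return 10.0 * math.log10(peak * peak / mse)

reference = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]   # ideal step edge
est_edge = [0.0, 0.0, 3.0, 10.0, 10.0, 10.0]    # error sits next to the edge
est_flat = [3.0, 0.0, 0.0, 10.0, 10.0, 10.0]    # same error in a flat region

snr_edge = snr_db(reference, est_edge)
snr_flat = snr_db(reference, est_flat)
```

Both estimates have the same squared error and hence the same score, although only the first one softens the edge.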

Fig. 2. Reconstruction of a gray-scale image degraded using a Gaussian kernel and Gaussian noise with standard deviation of 10%. Panels: (a) input; (b) degraded; (c) reconstruction using the standard Wiener filter (W.F.); (d) reconstruction using the anisotropic W.F.

The first example (Fig. 2) shows the results for a gray-scale image degraded by a Gaussian kernel and Gaussian noise with standard deviation of 10% (with a resulting SNR of 16.34 dB). The reconstruction for both the standard Wiener filter (2(c)) and the anisotropic version (2(d)) is based on the L2 fidelity term. The SNRs of the reconstructed images are 20.72 dB and 21.08 dB, respectively. The anisotropic reconstruction displays less noise, which is especially visible in homogeneous areas such as the white background and skin. The isotropic version, on the other hand, displays both blurred edges (such as the back, hands and hair) and ringing around pronounced edges, neither of which appears in the anisotropic version. This is most pronounced around the dominant edges of the back and the hair.

Figure 3 shows the results of applying the Wiener filter to an image with impulse noise (11% density, with 8.47 dB SNR). The first two examples (3(b), 3(e)) display the results of applying the standard and anisotropic Wiener filters, respectively, both using the L2 fidelity term. Despite improved SNR values (15.9 dB and 16.48 dB), the results are still rather poor, although the anisotropic version displays more pronounced edges (teeth, wall) as well as less noise. The versions employing the augmented L1 fidelity term (3(c) and 3(f)), on the other hand, can at first glance be mistaken for the input image. Even so, the anisotropic version displays much sharper results up close, as well as improved SNR (22.48 dB compared to 22.98 dB).

The following examples for color images show the extendability of the method to vector-valued images.


M. Feigin and N. Sochen

Fig. 3. Restoration of a gray-scale image corrupted by impulse noise of density 0.11. Panels: (a) input image; (b) standard W.F., L2 fidelity; (c) standard W.F., L1 fidelity; (d) degraded image; (e) anisotropic (AI) W.F., L2 fidelity; (f) AI W.F., L1 fidelity. Figures 3(b) and 3(e) show the reconstruction using the regular and anisotropic Wiener filter with L2 fidelity. Figures 3(c) and 3(f) show the reconstruction using the L1 fidelity term.

Figure 4 shows the results for a color image degraded by a Gaussian kernel and Gaussian noise with a standard deviation of 10% (SNR of 16.7 dB). As can be seen, the anisotropic reconstruction produces sharper edges without the color shifts and ringing that are visible around sharp edges in the standard reconstruction. Additionally, there are less overall noise and fewer color shifts, owing to the smoothing of the noise. The SNR for the isotropic case is 21.04 dB, compared to 21.6 dB for the anisotropic variation.

Fig. 5 shows the results of applying both the regular and anisotropic Wiener filter, both based on the L1 fidelity term, to a color image degraded by a Gaussian kernel and impulse noise with 11% density (SNR of 11 dB). The anisotropic variation shows sharper edges, better color restoration and fewer color skews around edge boundaries. This, like the previous results, is most pronounced around bright edges such as the teeth, eyes and wall. The SNR of the reconstruction is 20 dB and 23.1 dB for the isotropic and anisotropic varieties, respectively.

Anisotropic Regularization for Inverse Problems

Fig. 4. Color image degraded by a Gaussian kernel and uncorrelated Gaussian noise with standard deviation of 10%. Panels: (a) degraded image; (b) standard W.F. reconstruction; (c) anisotropic W.F. reconstruction.

Fig. 5. Color image degraded by a Gaussian kernel and uncorrelated impulse noise with density 0.11. Panels: (a) degraded image; (b) standard W.F. restoration, L1 fidelity; (c) anisotropic (AI) W.F. restoration, L1 fidelity.

5 Conclusion

In this work we presented an anisotropic regularization term for inverse problems that better preserves object edges while at the same time improving noise suppression. Combined with an augmented L1 fidelity term, it provides remarkable results for images corrupted by impulse noise.

References

1. Goodman, J.: Introduction to Fourier Optics. McGraw-Hill Book Company, New York (1996)
2. Jähne, B.: Digital Image Processing, 5th edn. Springer, Heidelberg (2002)
3. Gonzalez, R.C., Woods, R.E.: Digital Image Processing, 2nd edn. Prentice-Hall, Englewood Cliffs (2002)
4. Bar, L., Sochen, N., Kiryati, N.: Semi-blind image restoration via Mumford-Shah regularization. IEEE Trans. on Image Processing 15(2), 483–493 (2005)
5. Bar, L., Kiryati, N., Sochen, N.: Image deblurring in the presence of impulsive noise. Int. J. Comput. Vision 70(3), 279–298 (2006)
6. Blomgren, P., Chan, T.F.: Color TV: Total variation methods for restoration of vector-valued images. IEEE Trans. Image Processing 7, 304–309 (1998)
7. Chan, T.F., Vese, L.A.: Image segmentation using level sets and the piecewise-constant Mumford-Shah model. Technical Report 00-14, UCLA CAM (2000)
8. Charbonnier, P., Blanc-Féraud, L., Aubert, G., Barlaud, M.: Deterministic edge-preserving regularization in computed imaging. IEEE Trans. Image Processing 6, 298–311 (1997)
9. Welk, M., Theis, D., Weickert, J.: Variational deblurring of images with uncertain and spatially variant blurs. In: Kropatsch, W.G., Sablatnig, R., Hanbury, A. (eds.) DAGM 2005. LNCS, vol. 3663, pp. 485–492. Springer, Heidelberg (2005)
10. Jalobeanu, A., Blanc-Féraud, L., Zerubia, J.: An adaptive Gaussian model for satellite image deblurring. IEEE Transactions on Image Processing (4), 613–621 (2004)
11. Krajsek, K., Mester, R.: The edge preserving Wiener filter for scalar and tensor valued images. In: DAGM-Symposium, pp. 91–100 (2006)
12. Kaftory, R., Sochen, N., Zeevi, Y.Y.: Variational blind deconvolution of multichannel images. Int. J. Imaging Science and Technology 15(1), 56–63 (2005)
13. Sochen, N., Kimmel, R., Malladi, R.: A general framework for low level vision. IEEE Trans. Image Processing, Special Issue on Geometry Driven Diffusion 7, 310–318 (1998)
14. Aharon, M., Elad, M., Bruckstein, A.: The K-SVD: An algorithm for designing of overcomplete dictionaries for sparse representation. IEEE Trans. on Signal Processing 54(11), 4311–4322 (2006)
15. Kimmel, R., Malladi, R., Sochen, N.: Images as embedded maps and minimal surfaces: Movies, color, texture, and volumetric medical images. International Journal of Computer Vision 39, 111–129 (2000)
16. Sagiv, C., Sochen, N., Zeevi, Y.: Gabor features diffusion via the minimal weighted area method. In: EMMCVPR (September 2001)
17. Feigin, M., Sochen, N., Vemuri, B.C.: Efficient anisotropic α-kernels decompositions and flows. In: POCV (2008)
18. Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)

Locally Adaptive Total Variation Regularization

Markus Grasmair

Department of Mathematics, University of Innsbruck, Technikerstr. 21a, A-6020 Innsbruck, Austria
[email protected]
http://infmath.uibk.ac.at

Abstract. We introduce a locally adaptive parameter selection method for total variation regularization applied to image denoising. The algorithm iteratively updates the regularization parameter depending on the local smoothness of the outcome of the previous smoothing step. In addition, we propose an anisotropic total variation regularization step for edge enhancement. Test examples demonstrate the capability of our method to deal with varying, unknown noise levels.

1 Introduction

Because of its ability to generate images with piecewise smooth structures that are well separated by pronounced edges, total variation regularization is one of the most widely used techniques for image denoising and related tasks. Since Rudin, Osher, and Fatemi [14] first proposed using the total variation, that is, the $L^1$-norm of the gradient, for denoising purposes, this method has been applied to a wide range of applications in imaging and inverse problems. We refer to [1, 2, 3, 5, 12, 13, 15] to name but a few contributions to this field.

Given a noisy function $f \in L^2(\Omega)$ on some open and bounded domain $\Omega \subset \mathbb{R}^n$, $n \in \mathbb{N}$, the goal of denoising is to find a new function $u$ close to $f$ that retains the important features of $f$ while noise, consisting of fast oscillations, is removed. Noting that edges belong to the most prominent features in images, this task can be achieved by minimizing the total variation functional

\[
T(u;\alpha) := \frac{1}{2} \int_\Omega \bigl(u(x) - f(x)\bigr)^2 \, dx + \alpha\, |Du|(\Omega)
\tag{1}
\]

with respect to $u \in \mathrm{BV}(\Omega)$. The regularization parameter $\alpha > 0$ in (1) controls the amount of smoothing that is desired: the larger $\alpha$, the more the regularized function $u_\alpha$ tends to consist of well separated homogeneous regions. Conversely, a small parameter $\alpha$ implies a function lying close to the input data, but also possibly exhibiting a significant number of oscillations. The relation between $\alpha$ and $u_\alpha$, however, exists only on a qualitative level. There is no simple connection between the value of $\alpha$ and the smoothness of $u_\alpha$, or even between $\alpha$ and the difference $f - u_\alpha$, which is simply the part of the data classified as noise by the functional $T$.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 331–342, 2009. © Springer-Verlag Berlin Heidelberg 2009

The necessity of taking into account both the data and the expected noise level is a well established fact in the theory of inverse problems (see for instance [8]). For many applications of mathematical imaging, in particular tasks that are to be completely automated, precise knowledge of the noise is not available, which leads to the conclusion that, in these cases, a-priori parameter choices are not feasible. Instead, one should adapt $\alpha$ until both $u_\alpha$ and the perceived noise $f - u_\alpha$ are satisfactory.

Though better than a fixed a-priori choice, even adaptation of the regularization parameter need not be sufficient for good results. It may happen that the noise on the image $f$ is not identically distributed but varies locally. In this case, it is difficult to find a compromise between oversmoothing in noise-free regions caused by too large a parameter choice, and a still noisy output resulting from a small parameter. Similar effects can be observed if the structure of the noise-free data itself changes over the image. Then, the regularization parameter should be larger for homogeneous parts of the image than for parts with small details.

The problem of finding a parameter that is suited for the whole image can be circumvented by passing from a global parameter $\alpha > 0$ to a parameter function $\alpha : \Omega \to \mathbb{R}_{>0}$. Then, the regularization functional reads as

\[
T(u;\alpha) = \frac{1}{2} \int_\Omega \bigl(u(x) - f(x)\bigr)^2 \, dx + \int_\Omega \alpha(x) \, d|Du|(x) .
\tag{2}
\]

This functional is well-defined if $\alpha$ is continuous and, using direct methods, can readily be shown to attain a minimizer if $\alpha$ is bounded away from zero.

Total variation regularization with a non-constant regularization parameter has already been studied in several other articles [6, 9, 10, 11, 16, 17]. In [16, 17], the choice of $\alpha$ is based on the scale of the features one wants to recover. In [10], the uniform problem is first solved with an automatically identified optimal regularization parameter $\alpha$. The result of this first denoising attempt is used for extracting the edges in the image, at which the regularization parameter is subsequently increased locally. Then the minimization problem is solved a second time with the localized parameter $\alpha(x)$. The approach in [11] uses statistical properties of the residual in order to decide whether the local regularization parameter is suited. The criterion employed there is based on the local variance of the residual: if it is close to the noise level, one can expect that mostly noise has been filtered. If it is higher, then the residual probably contains texture and therefore the regularization parameter has to be decreased. The estimates in [11] are closely related to the inequalities in [10], though the approaches by which they are reached differ considerably. Note moreover that the same idea has already been employed in [6] for one-dimensional total variation regularization.

In this paper, we propose to target some a-priori specified smoothness of the output $u_\alpha$, which is measured in terms of the oscillations of the direction $\nabla u_\alpha / |\nabla u_\alpha|$ of the gradient of the image. This direction can be determined by passing to a dual formulation, as it essentially equals the rescaled dual variable. This idea of parameter adaptation based on the properties of the dual function is taken from [6].
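As a rough illustration of the weighted functional (2), the following sketch evaluates a discrete one-dimensional analogue. The discretization (forward differences, with the weight taken at the left sample of each jump) is an assumption made for this example and is not specified in the paper; the function name and signals are hypothetical.

```python
def tv_energy(u, f, alpha):
    """Discrete 1D analogue of (2): 0.5*sum (u-f)^2 + sum alpha_i*|u[i+1]-u[i]|.

    alpha holds one weight per sample; the jump between samples i and i+1
    is weighted by alpha[i] (an assumed convention).
    """
    fidelity = 0.5 * sum((ui - fi) ** 2 for ui, fi in zip(u, f))
    variation = sum(alpha[i] * abs(u[i + 1] - u[i]) for i in range(len(u) - 1))
    return fidelity + variation

f = [0.0, 0.0, 1.0, 1.0]
flat = [0.5, 0.5, 0.5, 0.5]

e_keep_uniform = tv_energy(f, f, [0.5, 0.5, 0.5, 0.5])    # keep the jump, global weight
e_flat_uniform = tv_energy(flat, f, [0.5, 0.5, 0.5, 0.5]) # remove the jump
e_keep_local = tv_energy(f, f, [0.5, 2.0, 0.5, 0.5])      # larger weight at the jump
```

Raising the weight only at the location of the jump makes keeping that jump more expensive while leaving the cost elsewhere unchanged, which is the effect a parameter function is meant to provide.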


The main concept of this paper of using a dual variable to provide a guess on the smoothness of the regularized image is introduced in Section 2. For further improving this smoothed image by enhancing the edges, we propose to subsequently apply anisotropic total variation regularization with an anisotropy that is estimated from the same dual variable that has determined the isotropic regularization parameter (see Section 3). A complete description of the algorithm can be found in Section 4. Finally, we apply this method in Section 5 to two test examples that show its suitability for adaptive noise removal.

2 Parameter Adaptation via Dual Variables

Consider the dual formulation of $T(\cdot;\alpha)$, which consists in solving the constrained minimization problem

\[
\begin{aligned}
& J(V) := \int_\Omega \bigl(\operatorname{div} V(x) + f(x)\bigr)^2 \, dx \to \min , \\
& |V(x)| \le \alpha(x) \quad \text{almost everywhere on } \Omega , \\
& V(x) \cdot \nu(x) = 0 \quad \text{almost everywhere on } \partial\Omega ,
\end{aligned}
\tag{3}
\]

over the space of vector valued essentially bounded functions $L^\infty(\Omega; \mathbb{R}^n)$. In (3), $\nu$ denotes the outward normal to the domain $\Omega$, and the equation $V \cdot \nu = 0$ is understood in a distributional sense. Also, the divergence of an essentially bounded function is defined distributionally. To be precise, the functions $V$ and $\operatorname{div} V$ satisfy the equation

\[
\int_\Omega \nabla\varphi(x) \cdot V(x) \, dx = - \int_\Omega \varphi(x) \operatorname{div} V(x) \, dx
\]

for every $\varphi \in C^1(\mathbb{R}^n)$.

Minimization of $T(\cdot;\alpha)$ is equivalent to solving the dual problem (3) in the sense that a function $V_\alpha \in L^\infty(\Omega; \mathbb{R}^n)$ solves (3) if and only if $u_\alpha := f + \operatorname{div} V_\alpha$ minimizes $T(\cdot;\alpha)$. We refer to [4], which treats the dual formulation of total variation regularization, and to [7] for a detailed introduction to infinite dimensional convex analysis.

We now examine the dual variable $V$ more closely. Formally, the optimality condition for a minimizer $u_\alpha$ of the functional $T$ reads as

\[
u_\alpha(x) - f(x) = \operatorname{div}\!\left( \alpha(x)\, \frac{\nabla u_\alpha(x)}{|\nabla u_\alpha(x)|} \right)
\quad \text{for almost every } x \in \Omega .
\]

Since $u_\alpha - f = \operatorname{div} V_\alpha$, one sees that the dual minimizer $V_\alpha$ introduced above in fact coincides with the direction of the gradient of $u_\alpha$, multiplied by $\alpha(x)$. In particular, for almost every $x \in \Omega$, we either have that $|V_\alpha(x)| = \alpha(x)$ or the gradient of $u_\alpha$ at $x$ is zero, that is, $u_\alpha$ is approximately constant near $x$. Even more, the local behaviour of $V_\alpha$ is strongly related to a certain kind of regularity of the regularized function $u_\alpha$: large variations of $V_\alpha/\alpha$ on the unit sphere imply equally large variations of the direction of the gradient of $u_\alpha$. In other words, variations of $V_\alpha/\alpha$ imply small oscillations of $u_\alpha$. The method we propose in the following takes advantage of these properties of $V_\alpha$ and $u_\alpha$ and exploits their relation.

Let $r > 0$ be some fixed parameter. We define the $r$-local mean of a vector valued, essentially bounded function $W \in L^\infty(\Omega; \mathbb{R}^n)$ at $x \in \Omega$ by

\[
M_r(x; W) := \frac{1}{\mathcal{L}^n\bigl(B_r(x) \cap \Omega\bigr)} \int_{B_r(x) \cap \Omega} W(y) \, dy .
\]

Here, $\mathcal{L}^n$ denotes the $n$-dimensional Lebesgue measure. In addition, we define the $r$-local variation of $W$ by

\[
\Sigma_r(x; W) := \bigl| W(x) - M_r(x; W) \bigr| .
\tag{4}
\]

The definition of $\Sigma_r$ directly implies that

\[
\Sigma_r(x; W) \le 2 \operatorname{ess\,sup} \bigl\{ |W(y)| : y \in B_r(x) \cap \Omega \bigr\}
\]

for almost every $x \in \Omega$. Applying the above inequality to the scaled solution $W_\alpha(x) := V_\alpha(x)/\alpha(x)$ of (3), one immediately sees that

\[
0 \le \Sigma_r(x; W_\alpha) \le 2 \max \bigl\{ |V_\alpha(y)|/\alpha(y) : y \in B_r(x) \cap \Omega \bigr\} \le 2 .
\]

Moreover, the actual size of the value $\Sigma_r(x; W_\alpha)$ provides an indication of the oscillation of the function $u_\alpha$ near $x$: if $\Sigma_r(x; W_\alpha)$ is close to zero, then the gradient $\nabla u_\alpha$ points in roughly the same direction on the whole set $B_r(x)$. Conversely, a value above one implies that the orientation of $\nabla u_\alpha(x)$ vastly differs from the majority of directions present in $B_r(x)$. See Figure 1 for an example of a smoothed image with corresponding local variation of the dual variable $V_\alpha$.

In this manner, the function $\Sigma_r(x; W_\alpha)$ can serve as a local criterion for the smoothness of the regularized function $u_\alpha$. If the finally desired smoothness is not yet reached, that is, if $\Sigma_r(x; W_\alpha)$ is too large, it is necessary to increase the local regularization parameter $\alpha(x)$. Conversely, if the function $u_\alpha$ appears too smooth, that is, $\Sigma_r(x; W_\alpha)$ is close to zero, then $\alpha(x)$ is decreased and a new tentative solution $u_\alpha$ is computed. This process of computing $\Sigma_r(x; W_\alpha)$ and updating $\alpha$ is repeated until the update of $u_\alpha$ becomes small enough.
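The local mean and the local variation (4) can be sketched in a simplified one-dimensional setting, where the scaled dual variable is a scalar sequence with values in [-1, 1] and the ball $B_r(x) \cap \Omega$ becomes an index window clipped at the ends of the signal. This simplification and the function names are assumptions made for illustration only.

```python
def local_mean(w, r):
    """Mean of w over the window [i - r, i + r], clipped to the signal."""
    n = len(w)
    means = []
    for i in range(n):
        lo, hi = max(0, i - r), min(n, i + r + 1)  # 1D stand-in for B_r(x) ∩ Ω
        means.append(sum(w[lo:hi]) / (hi - lo))
    return means

def local_variation(w, r):
    """|w(x) - M_r(x; w)| at every sample, cf. (4)."""
    return [abs(wi - mi) for wi, mi in zip(w, local_mean(w, r))]

aligned = [1.0, 1.0, 1.0, 1.0, 1.0]          # direction constant: smooth region
oscillating = [1.0, -1.0, 1.0, -1.0, 1.0]    # direction flips at every sample

sigma_aligned = local_variation(aligned, r=1)
sigma_oscillating = local_variation(oscillating, r=1)
```

A constant direction gives zero variation everywhere, while the alternating sequence gives values above one in the interior, and no value exceeds the bound of two stated above.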
In order to reach a uniform smoothness of the regularized image $u_\alpha$ over its whole domain, we propose to prescribe some target smoothness $0 < \theta < 1$. Then one can compute a suitable update $\tilde\alpha$ of $\alpha$ by setting

\[
\tilde\alpha(x) = \alpha(x) \bigl( \theta + \Sigma_r(x; W_\alpha)/2 \bigr)^s
\tag{5}
\]

for some parameter $s > 0$ determining the size of the update. Iteration of this update will lead to a uniform smoothness $\Sigma_r(x; W_\alpha) \approx 2(1 - \theta)$, since this is the value at which the factor in (5) equals one. The choice of the target smoothness should reflect the properties of the image one wants to recover: a large parameter ($\theta \ge 0.7$) means that only the structures about the size of $r$ are of interest. Small values ($\theta \approx 0.55$) put more emphasis on the structures of size smaller than $r$ (see also Figure 4).

In order to avoid too rapid changes of the parameter $\alpha(x)$, it is necessary to smooth the update $\tilde\alpha$ computed by means of (5). Also from a theoretical point of view, this smoothing procedure is required for obtaining a continuous regularization function $\alpha$. We propose to simply replace the update $\tilde\alpha(x)$ by its local mean value $M_r(x; \tilde\alpha)$. In this way, the average smoothness in the balls $B_r(x)$ will be almost independent of $x$.

Fig. 1. Smoothed image (left) and corresponding function $\Sigma_r$ (right). Bright pixel values indicate a higher value of $\Sigma_r$.
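The update rule (5) followed by the local averaging can be sketched as below, again in a one-dimensional setting with an index window in place of the ball $B_r(x)$. The helper names and the sample values are hypothetical, and the sketch is not the author's implementation.

```python
def local_mean(values, r):
    """Mean over the window [i - r, i + r], clipped to the signal."""
    n = len(values)
    means = []
    for i in range(n):
        lo, hi = max(0, i - r), min(n, i + r + 1)
        means.append(sum(values[lo:hi]) / (hi - lo))
    return means

def update_alpha(alpha, sigma, theta, s, r):
    """Apply (5) pointwise, then replace the result by its local mean."""
    raw = [a * (theta + sg / 2.0) ** s for a, sg in zip(alpha, sigma)]
    return local_mean(raw, r)

alpha = [1.0] * 6
sigma = [0.0, 0.0, 0.8, 0.8, 2.0, 2.0]   # too smooth, on target, too rough

pointwise = update_alpha(alpha, sigma, theta=0.6, s=1.0, r=0)  # r = 0: no averaging
smoothed = update_alpha(alpha, sigma, theta=0.6, s=1.0, r=1)
```

With θ = 0.6 the target value is 2(1 - θ) = 0.8: samples at the target keep their weight, samples with zero variation have their weight reduced, and samples with the maximal variation have it increased. The averaged version stays between the smallest and largest pointwise values.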

3 Edge Enhancement by Anisotropy

Having determined the size of the local regularization parameter $\alpha(x)$ by means of the scaled dual variable $W_\alpha$, it is in addition possible to use the distribution of the values of $W_\alpha$ on the unit sphere for sharpening edges and, in particular, thin ridges, which usually tend to get oversmoothed. To that end, instead of applying isotropic regularization, we introduce an anisotropy, the direction of which is determined by the local covariance of $W_\alpha$.

For $R > 0$ we define the $\mathbb{R}^{n \times n}$-valued function $\operatorname{Cov}_R(x; W)$, the covariance of $W$ on $B_R(x) \cap \Omega$, by defining its $(i,j)$-th component as

\[
\operatorname{Cov}_R^{(i,j)}(x; W) := \frac{1}{\mathcal{L}^n\bigl(B_R(x) \cap \Omega\bigr)} \int_{B_R(x) \cap \Omega}
\bigl( W^{(i)}(y) - M_R^{(i)}(x; W) \bigr) \bigl( W^{(j)}(y) - M_R^{(j)}(x; W) \bigr) \, dy .
\tag{6}
\]

Again using the property that $W_\alpha$ is proportional to $\nabla u_\alpha$, one sees that the principal component of $\operatorname{Cov}_R(x; W_\alpha)$ indicates, up to sign, the prevailing direction of $\nabla u_\alpha$ near $x$. This dominant direction can be pronounced further by replacing the isotropic bound $|V_\alpha(x)| \le \alpha(x)$ in (3) by an anisotropic one defined by $\operatorname{Cov}_R(x; W_\alpha)$. This is achieved by minimizing $J(V)$ respecting the constraints $V \cdot \nu = 0$ on $\partial\Omega$ and

\[
c(x)\, V(x)^t \operatorname{Cov}_R(x; W_\alpha)\, V(x) \le 1 \quad \text{on } \Omega .
\tag{7}
\]

Here, the scalar valued function $c : \Omega \to \mathbb{R}_{>0}$ has to be chosen in such a way that a similar amount of smoothing is reached as for isotropic regularization with parameter $\alpha(x)$. For determining a suitable size for $c$, note that the amount of smoothing induced by the bound (7) can be estimated by the determinant of the matrix $c(x) \operatorname{Cov}_R(x; W_\alpha)$, which, for consistency with the constraint $|V(x)| \le \alpha(x)$, should equal $\alpha(x)^{-2n}$. Thus one obtains for the function $c$ the value

\[
c(x) = \alpha(x)^{-2} \det\bigl( \operatorname{Cov}_R(x; W_\alpha) \bigr)^{-1/n} .
\]

We therefore propose an edge enhancement via solving the minimization problem

\[
\begin{aligned}
& J(V) = \int_\Omega \bigl(\operatorname{div} V(x) + f(x)\bigr)^2 \, dx \to \min , \\
& V(x)^t A(x) V(x) \le 1 \quad \text{almost everywhere on } \Omega , \\
& V(x) \cdot \nu(x) = 0 \quad \text{almost everywhere on } \partial\Omega .
\end{aligned}
\tag{8}
\]

Here

\[
A(x) = \alpha(x)^{-2} \det\bigl( \operatorname{Cov}_R(x; W_\alpha) \bigr)^{-1/n} \operatorname{Cov}_R(x; W_\alpha) ,
\]

and $W_\alpha = V_\alpha/\alpha$, where $V_\alpha$ is the solution of (3). Denoting the solution of (8) by $V_A$ and defining $u_A := f + \operatorname{div} V_A$, we obtain an enhanced version of the isotropic total variation minimizer $u_\alpha$.
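The construction of the matrix A(x) at a single location can be sketched for n = 2 as follows. The samples stand for the values of the scaled dual variable inside the ball $B_R(x) \cap \Omega$, with equal weights replacing the integral; the function names are hypothetical, and the sketch assumes a covariance with positive determinant, since the text does not say how degenerate cases are handled.

```python
def covariance_2d(samples):
    """2x2 covariance of a list of 2-vectors (discrete, equally weighted form of (6))."""
    n = len(samples)
    mx = sum(p[0] for p in samples) / n
    my = sum(p[1] for p in samples) / n
    cxx = sum((p[0] - mx) ** 2 for p in samples) / n
    cyy = sum((p[1] - my) ** 2 for p in samples) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in samples) / n
    return [[cxx, cxy], [cxy, cyy]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def anisotropy_matrix(cov, alpha):
    """A = alpha^-2 * det(Cov)^(-1/n) * Cov with n = 2; assumes det(Cov) > 0."""
    scale = alpha ** -2 * det2(cov) ** -0.5
    return [[scale * c for c in row] for row in cov]

isotropic = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
skewed = [(1.0, 0.0), (0.8, 0.6), (0.6, 0.8), (0.0, 1.0)]

a_iso = anisotropy_matrix(covariance_2d(isotropic), alpha=0.5)
a_skew = anisotropy_matrix(covariance_2d(skewed), alpha=0.5)
```

For the isotropic samples the result is $\alpha^{-2}$ times the identity, so the constraint in (8) reduces to $|V| \le \alpha$. In both cases the determinant equals $\alpha^{-2n} = 16$, which is the normalisation used to derive c(x).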

4 Summary of the Algorithm

We now summarize the method developed in the previous sections for adaptive denoising of a noisy image $f \in L^2(\Omega)$.

Algorithm 1. Set $k = 1$, choose some initial regularization function $\alpha_1 : \Omega \to \mathbb{R}_{>0}$, a smoothness parameter $0 < \theta < 1$, some $r > 0$, $R > 0$, $s > 0$, and $\varepsilon > 0$.

1. Compute
   \[ V_k := \arg\min \bigl\{ J(V) : |V(x)| \le \alpha_k(x) \text{ on } \Omega,\ V \cdot \nu = 0 \text{ on } \partial\Omega \bigr\} . \]
2. Define $W_k := V_k/\alpha_k$ and compute $\Sigma_r(x; W_k)$ (see (4)).
3. If $\|V_k - V_{k-1}\| < \varepsilon$ go to 5.
4. Compute
   \[ \hat\alpha_{k+1}(x) := \alpha_k(x) \bigl( \theta + \Sigma_r(x; W_k)/2 \bigr)^s \quad\text{and}\quad \alpha_{k+1}(x) := M_r(x; \hat\alpha_{k+1}) , \]
   increase $k$ by one, and go to 1.
5. Compute $\operatorname{Cov}_R(x; W_k)$ (see (6)) and
   \[ A(x) := \alpha_k(x)^{-2} \det\bigl( \operatorname{Cov}_R(x; W_k) \bigr)^{-1/n} \operatorname{Cov}_R(x; W_k) . \]
6. Compute
   \[ V_A := \arg\min \bigl\{ J(V) : V(x)^t A(x) V(x) \le 1 \text{ on } \Omega,\ V \cdot \nu = 0 \text{ on } \partial\Omega \bigr\} . \]
   Define the regularized function $u_A := f + \operatorname{div} V_A$.

In steps 1–4, only the regularization function $\alpha$ is determined. For this, it is not necessary to compute the minimizers of $J$ precisely. Instead, a reasonable approximation of a minimizer is sufficient to provide a good update of $\alpha$, at least during the first iterations. In particular, if an iterative method is used for the minimization of $J$, the computation time can be improved by stopping the iteration well before convergence is reached.

In the numerical examples below, the functions $V_k$ and $V_A$ were computed by alternating between gradient descent steps for the minimization of $J$ and projections of $V$ onto the sets $\{V : |V(x)| \le \alpha_k(x)\}$ and $\{V : V(x)^t A(x) V(x) \le 1\}$, respectively. The function $V_{k-1}$ was used as initial guess for the computation of $V_k$.
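The alternation between gradient descent steps and projections in step 1 can be sketched for a one-dimensional signal as below. The discretization is an assumption made for this example: the dual variable lives on the edges between samples, its two boundary values are fixed at zero to mimic $V \cdot \nu = 0$, and the step size is a fixed, conservatively chosen constant. The function name, the parameter `v_init` (standing in for the warm start with the previous iterate) and the test signal are hypothetical.

```python
def tv_denoise_dual(f, alpha, iterations=500, tau=0.1, v_init=None):
    """Projected gradient descent on J(V) = sum (div V + f)^2 subject to |V| <= alpha.

    f: samples 0..n-1. V: values on edges 0..n, with V[0] = V[n] = 0 fixed.
    alpha[j] bounds the interior edge j between samples j-1 and j.
    """
    n = len(f)
    v = list(v_init) if v_init is not None else [0.0] * (n + 1)
    for _ in range(iterations):
        u = [f[i] + v[i + 1] - v[i] for i in range(n)]         # u = f + div V
        for j in range(1, n):
            step = v[j] + 2.0 * tau * (u[j] - u[j - 1])        # descent step on J
            v[j] = max(-alpha[j], min(alpha[j], step))         # projection onto the bound
    u = [f[i] + v[i + 1] - v[i] for i in range(n)]
    return u, v

f = [0.0, 0.0, 1.0, 1.0]
alpha = [0.25] * len(f)
u, v = tv_denoise_dual(f, alpha)
```

For this step signal the two plateaus move towards each other by α divided by their length, giving approximately [0.125, 0.125, 0.875, 0.875], and the dual variable reaches its bound exactly at the jump. Stopping after fewer iterations gives the kind of approximate minimizer that the text considers sufficient for updating α.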

5 Examples

The algorithm presented in Section 4 is tested on two images. The first, synthetic image shows a collection of ellipses and rectangles of different size and intensity (see Figure 2, upper left). These clean data were distorted by normally distributed random noise. In order to illustrate the capability of the algorithm to deal with a varying noise level, the standard deviation of the random noise was chosen to increase towards the bottom right of the image from about 10% of the maximal intensity to 150% (see Figure 2, lower left).

Since the original image consists only of simple geometric forms without any texture, it should be perfectly suited for total variation regularization. The changing noise level within the distorted data, however, makes a uniform parameter choice almost impossible: if the regularization parameter is chosen too small, then the noise on the right hand side of the data is barely removed. In particular, the right hand edges of the lower ellipses can hardly be recovered. On the other hand, too large a regularization parameter leads to the disappearance of the small circle at the left hand side of the image (see Figure 2, middle column). Only a very small range of parameters removes the noise reasonably well while still preserving the small scale structure, and even then the contrast deteriorates.

Figure 2, upper right, shows the smoothed image obtained with Algorithm 1. Since the original image is very smooth, the smoothness parameter was chosen rather large, as θ = 0.85. The local variation Σr was evaluated on balls with a radius of 3 pixels, the complete image measuring 256 × 256 pixels. The lower right image in Figure 2 shows the distribution of the finally chosen regularization function α. As expected, it increases towards the bottom right, where more noise is present. Over the whole image, the maxima and minima of α differ by a factor of 12.


Fig. 2. Left column: original and noisy image; the noise level increases to the right bottom of the image. Middle column: denoising without parameter adaptation; either small details are lost or the smoothing eﬀects are partially insuﬃcient. Right column, upper row: denoised image for smoothness parameter θ = 0.85. Right column, lower row: logarithm of the ﬁnally chosen regularization function α; the minima and maxima of α diﬀer by a factor of 12.

One can see in the resulting image that the noise is efficiently removed. Also, the shape of the two lower ellipses is reconstructed in a reasonable way, considering that rather more noise than signal is present in these regions. Moreover, the small circle on the left is clearly visible, though some contrast was lost.

As a second test example, we consider the photographer image. In a first experiment we add different levels of random noise (see Figure 3). The outcome of the adaptive Algorithm 1 (right column) is compared with the solution of standard total variation regularization with a constant parameter choice independent of the noise level (middle column). The smoothness parameter for the adaptive algorithm was chosen as θ = 0.60; the regularization parameter for the standard algorithm was selected in such a way that the results for moderate noise level (third row) are comparable. The results show that, as expected, a constant regularization parameter only yields good results for a very specific noise level. For stronger noise, almost no smoothing is obtained, whereas the image is oversmoothed in case it is already quite clean. In contrast, the adaptive algorithm yields comparable results for different noise levels, and is also able to treat noise-free images (first row).

In order to illustrate the effect of the smoothness parameter, we apply Algorithm 1 to the noise-free photographer image and vary θ (see Figure 4). For a value of θ = 0.55 mainly the grass and details of the camera are smoothed. As θ increases, more and more details are lost until only the large scale structures in the image remain. Thus, the smoothness parameter works in some sense like the regularization parameter of standard total variation regularization.

Fig. 3. Left column: image with Gaussian noise; the noise level increases with each row (σ = 0, 30, 50, 100). Middle column: total variation denoised image with constant parameter choice. The regularization parameter is kept the same for all images. Right column: denoised images with adaptive parameter choice for a smoothness parameter θ = 0.60.


Fig. 4. Influence of the smoothness parameter θ. First row: original image and smoothed images with θ = 0.55 and θ = 0.60. Second row: smoothed images with θ = 0.65, θ = 0.70, and θ = 0.80.

Table 1. Comparison between standard TV regularization, the method proposed in [11], and our method for different smoothness parameters. The table provides signal to noise ratios for the photographer image with various levels of Gaussian noise added (σ = 20, 30, 40, 50).

σ     original   uniform TV   adaptive ([11])   θ = 0.55   θ = 0.60   θ = 0.65
20    9.47       14.63        15.63             15.30      14.73      13.71
30    6.31       12.13        13.35             13.28      13.46      12.84
40    3.86       10.05        11.59             11.47      12.45      12.18
50    1.93       8.38         10.11             10.00      11.54      11.51

There is, however, a notable difference. In the standard method, the time when structures in the image disappear depends on their scale, which is basically the ratio between contrast, that is, the difference of the intensities of the structure and the background, and the perimeter of the structure. As opposed to this, the model presented here puts much less emphasis on the contrast. Low contrast but distinct parts of the image tend to disappear much later than with uniform regularization. Compare for instance the rightmost building in the images of Figure 4 with the outcome of the standard method (Figure 3, first row, middle image).

Finally, Table 1 compares the performance of our algorithm with uniform total variation regularization and the adaptive method from [11]. The regularization parameter for the comparison was chosen in such a way that the norm of the residual equals the norm of the noise. At small noise levels, the texture enhancing method [11] and even uniform regularization perform better. On the other hand, our algorithm provides good results if much noise is present. Note moreover that the method proposed here does not require a guess of the noise level, whereas the other methods do.

6 Conclusion

We have introduced an algorithm for the local adaptation of the regularization parameter in total variation regularization applied to the task of image denoising. The main idea of the method is to base the parameter choice on the smoothness of the output image, which is measured in terms of the variation of the direction of its gradient. This variation can be obtained when employing a dual method for the actual minimization of the total variation regularization functional. Starting from an initial guess of the regularization function, the proposed algorithm consecutively computes the corresponding minimizer of the total variation functional and updates the regularization function depending on the smoothness of the update. The iteration stops when the update is suﬃciently small. As a post-processing step, we propose to apply an anisotropic regularization method intended to sharpen edges. Again, the regularization is determined by the dual variable. This anisotropic regularization step reduces the contrast loss due to isotropic smoothing and, in particular, is suited for the enhancement of ridges. The examples presented in Section 5 indicate the suitability of the proposed method for denoising images with unknown, varying noise levels. In particular, they show its ability to provide an estimate for the amount of smoothing required to obtain a certain smoothness of the output.

Acknowledgement

This work has been supported by the Austrian Science Fund (FWF) within the national research network Industrial Geometry, project 9203-N12.

References

1. Acar, R., Vogel, C.R.: Analysis of bounded variation penalty methods for ill-posed problems. Inverse Probl. 10(6), 1217–1229 (1994)
2. Aubert, G., Kornprobst, P.: Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations, with a foreword by Olivier Faugeras, 2nd edn. Applied Mathematical Sciences, vol. 147. Springer, New York (2006)
3. Burger, M., Osher, S.: Convergence rates of convex variational regularization. Inverse Probl. 20(5), 1411–1421 (2004)
4. Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging Vision 20(1–2), 89–97 (2004)
5. Chambolle, A., Lions, P.-L.: Image recovery via total variation minimization and related problems. Numer. Math. 76(2), 167–188 (1997)
6. Davies, P.L., Kovac, A.: Local extremes, runs, strings and multiresolution. Ann. Statist. 29(1), 1–65 (2001)
7. Ekeland, I., Temam, R.: Convex Analysis and Variational Problems. North-Holland, Amsterdam (1976)
8. Engl, H.W., Hanke, M., Neubauer, A.: Regularization of Inverse Problems. Mathematics and its Applications, vol. 375. Kluwer Academic Publishers Group, Dordrecht (1996)
9. Frigaard, I.A., Ngwa, G., Scherzer, O.: On effective stopping time selection for visco-plastic nonlinear BV diffusion filters used in image denoising. SIAM J. Appl. Math. 63(6), 1911–1934 (electronic) (2003)
10. Frigaard, I.A., Scherzer, O.: Herschel–Bulkley diffusion filtering: non-Newtonian fluid mechanics in image processing. Z. Angew. Math. Mech. 86(6), 474–494 (2006)
11. Gilboa, G., Sochen, N., Zeevi, Y.Y.: Variational denoising of partly-textured images by spatially varying constraints. IEEE Trans. Image Process. 15(8), 2281–2289 (2006)
12. Ito, K., Kunisch, K.: Augmented Lagrangian methods for nonsmooth, convex optimization in Hilbert spaces. Nonlinear Anal. 41A, 591–616 (2000)
13. Nashed, M.Z., Scherzer, O.: Least squares and bounded variation regularization with nondifferentiable functional. Numer. Funct. Anal. Optim. 19(7-8), 873–901 (1998)
14. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D 60(1–4), 259–268 (1992)
15. Scherzer, O., Grasmair, M., Grossauer, H., Haltmeier, M., Lenzen, F.: Variational Methods in Imaging. Applied Mathematical Sciences, vol. 167. Springer, New York (2008)
16. Strong, D.M.: Adaptive Total Variation Minimizing Image Restoration. CAM Report 97-38, University of California, Los Angeles (1997)
17. Strong, D.M., Aujol, J.-F., Chan, T.F.: Scale recognition, regularization parameter selection, and Meyer's G norm in total variation regularization. Multiscale Model. Simul. 5(1), 273–303 (electronic) (2006)

Basic Image Features (BIFs) Arising from Approximate Symmetry Type

Lewis D. Griffin¹, Martin Lillholm¹, Mike Crosier¹, and Justus van Sande²

¹ Computer Science, University College London, London WC1E 6BT, UK
² Biomedical Engineering, Eindhoven University of Technology, The Netherlands
[email protected]

Abstract. We consider detection of local image symmetry using linear filters. We prove a simple criterion for determining if a filter is sensitive to a group of symmetries. We show that derivative-of-Gaussian (DtG) filters are excellent at detecting local image symmetry. Building on this, we propose a very simple algorithm that, based on the responses of a bank of six DtG filters, classifies each location of an image into one of seven Basic Image Features (BIFs). This effectively and efficiently realizes Marr’s proposal for an image primal sketch. We summarize results on the use of BIFs for texture classification, object category detection, and pixel classification.

Keywords: Gaussian Derivatives, Hermite Transform, Group Theory.

1 Introduction

Previous schemes for detection of image symmetry are fairly complex [1-6], requiring, for example, comparison of the outputs of filters at multiple positions. Herein we show that symmetries may be detected by single linear filters. Building on this, we present a simple algorithm that computes a Marr-type primal sketch [7] by categorizing local image structure according to its approximate symmetry.

The paper is organized as follows. In section 2 we present results on image symmetries. In section 3 we show how to test whether a linear filter is sensitive to a symmetry. In section 4 we review image measurement with derivative-of-Gaussian (DtG) filters. In section 5 we consider the symmetry-sensitivity of these DtG filters. In section 6 we show how this sensitivity gives rise to a system of Basic Image Features (BIFs). In section 7 we summarize results on using BIFs for texture categorization, object category detection and pixel classification. In section 8 we conclude. Sections 2-5 are a distillation of work published, in press and under review in fuller form elsewhere [8-14]; section 6 is new; parts of section 7 have been presented or are under review in fuller form elsewhere [9, 11].

2 Image Symmetries

Symmetry of a structure (X) is always relative to some class of admissible transformations. A structure is said to have a symmetry when a non-trivial group of admissible transformations, known as the automorphism group, each leave it indistinguishable from the original. This is denoted by $\mathrm{Aut}[X] := \{ t \mid t \circ X = X \}$.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 343–355, 2009. © Springer-Verlag Berlin Heidelberg 2009

Considering images, an obvious class of transformations are the spatial isometries; and the possible symmetries, relative to this class, have long been catalogued [15-17]. A broader class of transformations, where each spatial isometry is combined with a permutation of a finite set of image ‘colour’ values, has also been considered. These allow the symmetries of, for example, Escher’s ‘Reptiles’ to be expressed [18]. The gamut of possible ‘colour symmetries’ has been fully determined [19, 20].

We propose that the class of ‘image isometries’, defined as a spatial isometry combined with an intensity isometry, is appropriate for images. We write an image isometry as $g = (i, s)$, where $i : \mathbb{R} \to \mathbb{R}$ is an intensity isometry, and $s : \mathbb{R}^2 \to \mathbb{R}^2$ is a spatial isometry. Such an image isometry is applied to an image $I : \mathbb{R}^2 \to \mathbb{R}$ according to $g \circ I = i \circ I(s \circ \_) = i(I(s(\_)))$. Choosing a class of transformations is tantamount to choosing a geometry [21], and the geometry that corresponds to the class of image isometries has previously been considered for images [22] and much earlier, abstractly, as one of a larger class of possible geometries [23].

We have employed a method for determining the possible automorphism groups of images, relative to the class of image isometries. The method relies on two results. First, that the projection of a group of image isometries onto their spatial or intensity components in both cases makes a group. Second, that (except for a special case) the intensity projection group must be isomorphic to a factor group of the spatial projection group [8]. Using the method, we have determined the possible automorphism groups of 2-D images, except for cases that contain discrete periodic translations. A summary of these possible symmetries, together with our notational system, is shown in fig. 1. The symmetries include: familiar ones, such as reflectional ($J_{2,1}$), reflect-and-negate ($J_{6,1}$), and Yin-Yang type ($J_{7,2}$); simple but often ignored ones, such as variation in one direction only ($J_3$); simple but novel ones, such as continuous translate-and-increment in one direction, plus a line of reflection parallel to that direction ($J_{11}$); and some wholly novel ones, such as continuous translate-and-increment in one direction, plus a continuous line of centres of Yin-Yang type symmetry ($J_{12}$).

3 Sensitivity of Linear Filters to Symmetries

Detection of a symmetry seems to require multiple measurements, but this is incorrect. Consider a +1/-1 filter, such as used in finite-difference schemes. When positioned so that it straddles a putative line of reflection, a necessary criterion for the symmetry is that the filter gives a 0 response. We generalize this: a filter F is sensitive to a symmetry K if it gives the same response to all images that have the symmetry (i.e. $\exists f \in \mathbb{R} : \mathrm{Aut}[I] \supseteq K \Rightarrow \langle F \mid I \rangle = f$). This definition is impractical because it requires assessment across all images. However, we have found a necessary and sufficient test that requires only a single integral to be computed. We present this below in Theorem 1, after introducing some notation.

Fig. 1. The group/subgroup lattice of the possible image symmetries, excluding those with discrete periodic translation. (Lattice nodes are labelled J0, J1,n, J2,n, J3, J4, J5, J6,n, J7,n, J8,n, J9, J10, J11, J12, Jslope and Jconst.)

We use an inner product notation ($\langle F \mid I \rangle := \int_{\vec{x} \in \mathbb{R}^2} F(\vec{x})\, I(\vec{x})$) to denote the measurement of an image $I : \mathbb{R}^2 \to \mathbb{R}$ by a filter $F : \mathbb{R}^2 \to \mathbb{R}$; and we define an operator $F^{(K)} := \sum_{(i,s) \in K} i\, s \circ F$ which, roughly speaking, ‘smears’ a filter by a group.

Theorem: Symmetry-Sensitivity Test for Filters

F is sensitive to K if and only if $\langle F \mid F^{(K)} \rangle = 0$.

Proof. A formal proof will be published elsewhere [14]. Intuitively the truth of the theorem can be understood as follows. The signal that a filter ‘sees’ best is a copy of itself. Of all the symmetric signals, a symmetrised version of the filter should be the most easily seen. If the filter cannot see a symmetrised version of itself, then it is insensitive to the symmetry.
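As a rough numerical illustration of the theorem, the sketch below samples a first-order x-derivative-of-Gaussian filter on a grid and smears it by two reflection groups. It assumes that the intensity component of every group element is the identity, so that the smeared filter reduces to a plain sum of spatially transformed copies. The function names, the grid radius and the value of σ are illustrative choices, not taken from the paper.

```python
import numpy as np

def x_derivative_filter(sigma=2.0, radius=10):
    """Sampled first-order x-derivative-of-Gaussian filter on a square grid."""
    coords = np.arange(-radius, radius + 1, dtype=float)
    x, y = np.meshgrid(coords, coords)  # x varies along axis 1, y along axis 0
    gauss = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return -x / sigma**2 * gauss

def smear(filt, spatial_maps):
    """Sum of the copies of the filter transformed by each group element."""
    return sum(s(filt) for s in spatial_maps)

def sensitivity_integral(filt, spatial_maps):
    """Discrete stand-in for <F | F^(K)>; zero means F passes the test for K."""
    return float(np.sum(filt * smear(filt, spatial_maps)))

identity = lambda f: f
mirror_vertical_line = lambda f: f[:, ::-1]    # reflection x -> -x
mirror_horizontal_line = lambda f: f[::-1, :]  # reflection y -> -y

F = x_derivative_filter()
vertical = sensitivity_integral(F, [identity, mirror_vertical_line])
horizontal = sensitivity_integral(F, [identity, mirror_horizontal_line])
```

Under these assumptions `vertical` vanishes, because the filter and its mirrored copy cancel, while `horizontal` equals twice the squared norm of the filter. In the sense of the test, the sampled x-derivative filter is therefore sensitive to reflection in a vertical mirror line through the measurement point but not to reflection in a horizontal one.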

4 Gaussian Derivative Filters

Gaussian Derivative (DtG) filters are defined in 1-D by

\[ G_\sigma(x) := (2\pi\sigma^2)^{-\frac{1}{2}}\, e^{-\frac{x^2}{2\sigma^2}}, \qquad G_\sigma^{(n)}(x) := \frac{d^n}{dx^n} G_\sigma(x) = \left( \frac{-1}{\sigma\sqrt{2}} \right)^{n} H_n\!\left( \frac{x}{\sigma\sqrt{2}} \right) G_\sigma(x) \]

where $H_n$ is the nth Hermite polynomial; and in 2-D by

\[ G_\sigma^{(m,n)}(x, y) := G_\sigma^{(m)}(x)\, G_\sigma^{(n)}(y). \]

They are used as a general-purpose method to probe an image location (which for simplicity we assume is at the origin $\vec{0}$) by computation of inner products $j_{mn} = \langle G_\sigma^{(m,n)} \mid I \rangle$. Typically, one measures with a family of DtG filters up to some order, e.g. the 2nd order family $\{ G_\sigma^{(m,n)} \mid 0 \le m + n \le 2 \}$. Scale-normalized filter responses $c_{pq} := \sigma^{p+q} j_{pq}$ make later equations simpler.

The suitability of DtGs as the front-end of an uncommitted computational vision system arises from the symmetries that individual filters and families possess [24]. First amongst these is a scale symmetry, which manifests as a change of size, but not of shape, when a DtG is rescaled by blurring with a Gaussian kernel. Second is that the linear span of a family of DtGs is rotationally symmetric.

The responses of a bank of DtG filters entangle intrinsic and extrinsic aspects of image structure. For example, an in-plane rotation of the image about the measurement point causes the DtG responses to change. A representation that disentangles these aspects for measurement up to 2nd order has been developed [13]. The representation works by factoring out of the 6-D 2nd order DtG response space the changes due to the group of image isometries that fix the measurement point and do not invert the intensity axis, which we denote $D_\infty(0) \times A^+(1)$.
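The Hermite-polynomial form of the 1-D DtG filters can be checked numerically. The sketch below is a minimal illustration, assuming that $H_n$ denotes the physicists' Hermite polynomials (the convention of `numpy.polynomial.hermite`); the helper names `gaussian`, `dtg_1d` and `dtg_2d` are hypothetical and chosen only for this example.

```python
import numpy as np
from numpy.polynomial.hermite import hermval  # physicists' Hermite polynomials

def gaussian(x, sigma):
    """Zeroth-order 1-D Gaussian kernel."""
    return np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

def dtg_1d(x, sigma, n):
    """n-th derivative of the Gaussian via the Hermite-polynomial identity."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0  # selects H_n in the Hermite series
    scale = sigma * np.sqrt(2.0)
    return (-1.0 / scale) ** n * hermval(x / scale, coeffs) * gaussian(x, sigma)

def dtg_2d(x, y, sigma, m, n):
    """Separable 2-D filter: product of an x-derivative and a y-derivative."""
    return dtg_1d(x, sigma, m) * dtg_1d(y, sigma, n)

x = np.linspace(-6.0, 6.0, 241)
sigma = 1.5
# Closed forms obtained by differentiating the Gaussian by hand.
closed_form_1 = -x / sigma**2 * gaussian(x, sigma)
closed_form_2 = (x**2 / sigma**2 - 1) / sigma**2 * gaussian(x, sigma)
```

With the physicists' convention, `dtg_1d(x, sigma, 1)` and `dtg_1d(x, sigma, 2)` agree with the hand-differentiated closed forms, which is a quick way to confirm the sign and the $\sigma\sqrt{2}$ scaling in the identity.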

Fig. 2. The top part illustrates schematically the probing of an image patch by a bank of DtG filters resulting in a point in jet space; the bottom, the factoring of the jet space by a group of transformations resulting in the local-image-structure orbifold. (Panel labels: image patch; 2nd order DtG family; point in 6-D jet space; 6-D jet space factored by $D_\infty(0) \times A^+(1)$, the group of centred rotations and reflections and positive affine intensity re-scalings, gives the 2nd order local-image-structure orbifold.)

The result is an orbifold – a type of manifold with boundaries, creases and corners allowed – consisting of a 3-D and a 0-D component (figure 2). The intrinsic aspect of a 6-tuple of filter responses corresponds to a particular location in the orbifold, and is invariant to rotating the image about the measurement point, reflecting it in a line through the measurement point, or affinely scaling the intensity. When the responses of the 1st and 2nd order DtG filters are all zero, the intrinsic aspect is the 0-D part of the orbifold; all other responses map to the 3-D component. A coordinate system $(l, b, a) \in [-\tfrac{\pi}{2}, \tfrac{\pi}{2}] \times [0, \tfrac{\pi}{2}] \times [0, \tfrac{\pi}{2}]$ for the 3-D component is given by [13]:

\[ l = \arctan\!\left( \sqrt{4\left(c_{10}^2 + c_{01}^2\right) + (c_{20} - c_{02})^2 + 4c_{11}^2},\; c_{20} + c_{02} \right) \]

\[ b = \arctan\!\left( 2\sqrt{c_{10}^2 + c_{01}^2},\; \sqrt{(c_{20} - c_{02})^2 + 4c_{11}^2} \right) \]

\[ a = \tfrac{1}{2} \left| \arctan\!\left( \left(c_{01}^2 - c_{10}^2\right)(c_{02} - c_{20}) + 4c_{10}c_{01}c_{11},\; 2\left( \left(c_{01}^2 - c_{10}^2\right)c_{11} + c_{10}c_{01}(c_{20} - c_{02}) \right) \right) \right| \]

The orbifold has been equipped with a metric, induced by one on the filter response space, which expressed as a line element in the lba-system is

\[ ds^2 = dl^2 + \cos^2 l \left( db^2 + 2\,(5 - 3\cos 2b)^{-1} \sin^2 2b \; da^2 \right). \]

The orbifold is intrinsically curved, but it can be embedded into Euclidean 3-space with only mild distortion.
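The lba coordinates can be sketched in code as follows. This is an illustration under two assumptions inferred from the stated coordinate ranges rather than confirmed by the text: the two-argument arctan is read as the angle of the point whose first argument is the horizontal coordinate, and the absolute value is taken for a so that it falls in [0, π/2]. The function names and the helper `rotate_jet`, which applies a rotation about the measurement point to the 1st and 2nd order responses, are hypothetical.

```python
import numpy as np

def lba_coordinates(c10, c01, c20, c11, c02):
    """Orbifold coordinates (l, b, a) from scale-normalized jet responses."""
    grad2 = c10**2 + c01**2
    aniso2 = (c20 - c02)**2 + 4 * c11**2
    # np.arctan2 takes (vertical, horizontal), i.e. the reverse argument order.
    l = np.arctan2(c20 + c02, np.sqrt(4 * grad2 + aniso2))
    b = np.arctan2(np.sqrt(aniso2), 2 * np.sqrt(grad2))
    cos_part = (c01**2 - c10**2) * (c02 - c20) + 4 * c10 * c01 * c11
    sin_part = 2 * ((c01**2 - c10**2) * c11 + c10 * c01 * (c20 - c02))
    a = 0.5 * abs(np.arctan2(sin_part, cos_part))
    return l, b, a

def rotate_jet(c10, c01, c20, c11, c02, theta):
    """Jet responses after rotating the image by theta about the origin."""
    r = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta), np.cos(theta)]])
    g = r @ np.array([c10, c01])                      # gradient rotates as a vector
    h = r @ np.array([[c20, c11], [c11, c02]]) @ r.T  # Hessian rotates as a tensor
    return g[0], g[1], h[0, 0], h[0, 1], h[1, 1]

jet = (0.8, -2.3, 4.1, -0.1, -3.7)  # arbitrary illustrative responses
reference = lba_coordinates(*jet)
```

A useful sanity check of this reading is that `lba_coordinates(*rotate_jet(*jet, theta))` returns the same triple as `reference` for any `theta`, and that multiplying all five responses by a positive constant also leaves the triple unchanged, as the invariances described above require.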

5 Symmetry-Sensitivity of DtG Filters

Using the elements of sections 2-4, we can determine which DtG filters are sensitive to which symmetries. We consider not just canonical filter forms (e.g. an x-derivative) but any linear combination of filters in the 2nd order filter family. This allows us to determine the symmetry-sensitivity of the entire filter family, independent of the particular basis filters used. For example, while the x-derivative filter is sensitive to a reflectional symmetry with a vertical mirror line through the measurement point, the x- and y-derivatives together are sensitive to any reflectional symmetry in a line through the measurement point.

Fig. 3. The sensitivity-submanifolds (SS) of different symmetry types are shown in red. The different possible SS are arranged in a lattice induced by inclusion relations. The symmetry type labels correspond to those used in figure 1. Superscripts indicate the spatial relationship between the symmetry and the origin: a c indicates origin-centred rotation; an a+ that the origin is contained in a line of reflection, but is not a centre of rotation; similarly for a- and anti-reflections; a g indicates general position, neither centred nor aligned. All symmetries labelled in a box have the indicated SS; those on the left are minimal. (Panel annotations: "SS is the exterior only"; "SS is the entire volume".)

The filter family sensitivities can be projected into the orbifold to determine where the intrinsic component of the jet responses must lie whenever the image has any of a class of symmetries equivalent by conjugation with an element of $D_\infty(0) \times A^+(1)$. We call the restricted set of possible responses the sensitivity-submanifold (SS). For example, the SS is the orbifold exterior ($a = 0$ or $a = \pi/2$) for reflectional symmetry in a mirror through the measurement point. The results are summarized in fig. 3.

6 Symmetry-Based Basic Image Features (BIFs)

We have used the symmetry sensitivities of the DtG filters as a starting point in defining a set of Basic Image Features (BIFs) that realize Marr’s idea of a primal sketch of image structure, in a computationally simple scheme. We do not claim that the scheme is derived as rigorously as the results on symmetry sensitivity. Our scheme works by considering the orbifold projection of jets, and classifying them according to the SS that they are closest to, i.e. we define a Voronoi cell partitioning of the orbifold with the SS as cell centres. We find that this works best when only seven 0-D SS (the first and second rows of figure 3) are used, though we cannot justify this beyond that it produces nice results. The resulting orbifold partitioning is shown in the top-left of figure 4.

Fig. 4. Top left: the partitioning of the orbifold into BIF categories. Bottom left: BIFs calculated across a range of scales for a simple image of a figure ‘8’; in each cube scale increases right-to-left. Lower cubes sectioned for visualisation. Right: an example complex greyscale image, with BIFs calculated at one particular scale.

The orbifold distances to the six of these SS that lie in the 3-D component of the orbifold are simple to compute; for example, the distance to the $J^{c}_{7,2}$ SS is $\tan^{-1} \sqrt{ \left( \tfrac{1}{2} c_{20}^2 + c_{11}^2 + \tfrac{1}{2} c_{02}^2 \right) / \left( c_{10}^2 + c_{01}^2 \right) }$. To find which distance is shortest it is computationally equivalent but simpler to find which of six quantities is maximum. The distance to the seventh SS, which corresponds to the origin of jet space where all the 1st and 2nd derivative filters have zero response, is not well-defined. We incorporate it into our scheme by using a multiple of the 0th order jet response. The full resulting scheme for computing BIFs is as follows.

i) compute scale-normalized DtG filter responses as described in section 4.
ii) compute $\lambda := \tfrac{1}{2} (c_{20} + c_{02})$ and $\gamma := \sqrt{ \tfrac{1}{4} (c_{20} - c_{02})^2 + c_{11}^2 }$.
iii) classify according to which is the largest of $M = \left\{ \varepsilon c_{00},\; \sqrt{c_{10}^2 + c_{01}^2},\; \lambda,\; -\lambda,\; (\gamma + \lambda)/\sqrt{2},\; (\gamma - \lambda)/\sqrt{2},\; \gamma \right\}$.

In our scheme the only free parameters that have to be tuned to the application are the filter scale σ and ε, which controls the amount of image classified as flat; a setting of ε = 0.05 is an effective default. For display purposes we find the following colour scheme effective: if $\varepsilon c_{00}$ is the largest of M then colour the pixel pink; if $\sqrt{c_{10}^2 + c_{01}^2}$ is largest colour it grey; then, for the remaining entries of M in the order listed, black, white, blue, yellow and green.
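The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the kernels are sampled and truncated at a radius of 4σ, the image is implicitly zero-padded at its borders, the image is assumed to be larger than the kernel in both directions, and the names `dtg_kernel`, `filter_separable` and `bif_labels` are hypothetical. Labels 0 to 6 follow the order of the set M (pink, grey, black, white, blue, yellow, green).

```python
import numpy as np

def dtg_kernel(sigma, order):
    """Sampled 1-D derivative-of-Gaussian kernel of order 0, 1 or 2."""
    radius = int(np.ceil(4 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    if order == 0:
        return g
    if order == 1:
        return -x / sigma**2 * g
    if order == 2:
        return (x**2 / sigma**2 - 1) / sigma**2 * g
    raise ValueError("only orders 0, 1 and 2 are needed for 2nd order jets")

def filter_separable(image, kx, ky):
    """Inner product of a separable filter with the image at every pixel."""
    # Convolving with the reversed kernel gives correlation (the inner product).
    rows = np.apply_along_axis(lambda r: np.convolve(r, kx[::-1], mode="same"), 1, image)
    return np.apply_along_axis(lambda c: np.convolve(c, ky[::-1], mode="same"), 0, rows)

def bif_labels(image, sigma, eps=0.05):
    """Label each pixel 0..6 by the largest entry of the set M."""
    image = np.asarray(image, dtype=float)
    k = [dtg_kernel(sigma, n) for n in range(3)]
    c = {}
    for m, n in [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]:
        # step i: scale-normalized responses c_mn = sigma^(m+n) * j_mn
        c[m, n] = sigma ** (m + n) * filter_separable(image, k[m], k[n])
    # step ii
    lam = 0.5 * (c[2, 0] + c[0, 2])
    gam = np.sqrt(0.25 * (c[2, 0] - c[0, 2]) ** 2 + c[1, 1] ** 2)
    # step iii
    scores = np.stack([
        eps * c[0, 0],
        np.sqrt(c[1, 0] ** 2 + c[0, 1] ** 2),
        lam,
        -lam,
        (gam + lam) / np.sqrt(2),
        (gam - lam) / np.sqrt(2),
        gam,
    ])
    return np.argmax(scores, axis=0)
```

On simple synthetic patches the labels behave as the colour scheme suggests: away from the border, a constant positive image is labelled 0 (flat), a linear ramp 1 (slope), a paraboloid with a central minimum 2, one with a central maximum 3, a valley 4, a ridge 5 and a saddle 6. Border pixels are affected by the zero padding and would need a different boundary treatment in practice.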

7 Example Applications Using BIFs

We summarize results on using BIFs for texture, object and pixel classification.

7.1 Texture Classification

Textures are often classified based on a representation of them by a histogram over a texton vocabulary [25-29]. Textons are categorical patch classifications [25, 30]. To define the texton vocabulary, a space of patch descriptions is typically Voronoi partitioned into on-the-order-of 1000 texton categories, usually around centres found by k-means clustering of the responses from many images. Textures are then classified by nearest-neighbour matching of histograms.

We have investigated the classification performance of an approach in which images are labelled using spatial complexes of BIFs instead of Voronoi cells in a local description vector space. Our approach is (i) simpler because we have eliminated the clustering step needed to produce a dictionary of features, and (ii) faster because we assign image patches to histogram bins without having to use a high-dimensional nearest-neighbour computation. We call the spatial complexes of BIFs that we use analogously to textons, Basic Image Patterns (BIPs). The type of BIP that we have found effective for texture description is a scale-template of the BIFs at the same location but at four, octave-separated scales. Unlike spatial-template BIPs, these scale-templates retain the rotation invariance of BIFs, which has been shown [30] to be advantageous in texture classification tasks. For textures, we do not use the pink/flat BIF category, so four scales produce a $6^4 = 1296$ bin histogram representation, which seems to capture the right trade-off between specificity and generality (see figure 5).
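A column-BIP histogram of this kind could be built as sketched below. The sketch assumes that the four label maps have already been computed without the flat class and re-indexed to 0..5 (how the flat class is excluded is not specified here), and it encodes each scale-column as a base-6 number; the function name and the encoding order are illustrative choices.

```python
import numpy as np

def column_bip_histogram(label_stack, n_classes=6):
    """Normalized occurrence histogram of scale-columns of BIF labels.

    label_stack: integer array of shape (n_scales, height, width) holding
    labels in 0..n_classes-1, one label map per scale.
    """
    label_stack = np.asarray(label_stack)
    n_scales = label_stack.shape[0]
    weights = n_classes ** np.arange(n_scales)          # base-6 place values
    codes = np.tensordot(weights, label_stack, axes=1)  # one bin index per pixel
    counts = np.bincount(codes.ravel(), minlength=n_classes ** n_scales)
    return counts / counts.sum()

# Illustrative input: random labels at four scales for a 5 x 7 image.
rng = np.random.default_rng(0)
example_stack = rng.integers(0, 6, size=(4, 5, 7))
example_hist = column_bip_histogram(example_stack)
```

Each pixel contributes to exactly one of the 1296 bins, so no clustering or nearest-neighbour search is needed to assign it, which is the source of the speed advantage described above.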

Fig. 5. Left: An image from the CUReT 'polyester' texture class. Centre: BIFs computed at four octave-separated scales, stacked to form an array of 'column-BIPs'. Right: Occurrence histogram of column-BIPs from every position in the image.

Our method has been tested on the CUReT texture dataset [31]. As reported in [9], the simple column-BIP representation and nearest-neighbour matching using the Bhattacharyya distance correctly classifies 98.2±0.1% of the remaining 49 images per class, which is at least as good as other methods using nearest-neighbour classifiers. Extending this method by using a multi-scale histogram comparison [9] results in an improvement to 98.6±0.1% on CUReT, which is comparable to methods [27] using SVMs for classification; and produces what are, to the best of our knowledge, the best reported results [9] on the more challenging UIUCTex [32] and KTH-TIPS [27] datasets, which include variations in scale.

7.2 Object Categorization

Texton approaches have also been shown to be useful for object categorization [28, 33]. Similar to texture, the ‘standard’ approach is to partition a patch descriptor space, such as that used by SIFT [34, 35], into on-the-order-of 1000 categories (visual words) and then to describe each image to be analyzed by what visual words it contains, and to use machine learning techniques to determine a classifier that can predict the category of object based on such descriptions. We have conducted preliminary experiments to assess whether visual words built from BIFs could be used rather than SIFT-space categories. As with texture this would be simpler and faster.

For our initial experiments, we have labelled pixels according to their BIF type and, inspired by SIFT, with an orientation, quantized at the π/4 level. The orientation depends on the BIF type: grey BIFs have one of eight possible orientations based on 1st order structure; yellow, green and blue BIFs have one of four possible orientations based on 2nd order structure; black, white and pink BIFs are unoriented. Thus we have twenty-three possible orientation-augmented BIF (oBIF) labels. oBIFs are a natural 2nd order generalization of the gradient orientation alphabet typically used in SIFT [34, 35]. See figure 6 for an example image and calculated oBIFs.
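The count of twenty-three labels can be checked with a short worked sum. It assumes, as an interpretation consistent with the eight and four orientations quoted above, that 1st order orientations are distinct over the full circle while 2nd order orientations are distinct only modulo π:

\[ \underbrace{\frac{2\pi}{\pi/4}}_{\text{grey}} \;+\; \underbrace{3 \cdot \frac{\pi}{\pi/4}}_{\text{yellow, green, blue}} \;+\; \underbrace{3}_{\text{black, white, pink}} \;=\; 8 + 12 + 3 \;=\; 23. \]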

Fig. 6. The top row (left) shows an image from the PASCAL challenge, labelled with direction-augmented BIFs at right. On the bottom are shown the 4×4 template BIPs whose occurrence in an image most informatively signals the presence of a car.

We have tested three different types of visual word, which when built from BIFs or oBIFs we call Basic Image Patterns (BIPs); two based on geometrical partitioning of patch space and one based on more standard data-driven quantization. Each BIP system has been used with simple un-optimised off-the-shelf classifiers and applied to the 20-class PASCAL VOC 2008 [33] object recognition challenge dataset. Our score in figure 7 is based on a late fusion of the three schemes and is mid-field: above other first-time entrants and below well-optimised veteran entries. Using the PASCAL VOC 2008 [33] dataset, examples of 4×4 template BIPs whose presence in images is approximately independent, and which are maximally informative for the ‘car’ category, are shown in figure 6.

Fig. 7. Results for the PASCAL VOC 2008 challenge. Each bar in the chart is a challenge entry - our result is highlighted. (Entries, in the order shown: SurreyUvA_SRKDA, UvA_TreeSFS, LEAR_shotgun, UvA_FullSFS, UvA_Soft5ColorSift, LEAR_flat, XRCE, TKK_ALL_SFBS, TKK_MAXVAL, BerlinFIRSTNikon, UCL, ECPLIAMA, CASIA_LinSVM, INRIASaclay_CMA, CASIA_NonLinSVM, INRIASaclay_MEVO, FIRST_SCST, FIRST_SC1C, CASIA_NeuralNet; horizontal axis from 0 to 50.)

7.3 Pixel Classification

Many image problems involve inferring one of a small class of labels for each pixel of an image. For complex images with unpredictable global structure, most approaches balance the likelihood of the labels, given the local image structure, and the likelihood of the local arrangement of inferred labels. In both cases the likelihoods are computed on the basis of statistics learnt from groundtruth-labelled training data. We have experimented with the use of BIFs in the computation of label likelihoods given the image, i.e. ignoring the likelihood of arrangements of inferred labels.

For our experiments we have used 2-D Electron Microscopy images of neuronal grey matter tissue, stained to enhance neuronal membranes. We trained on four images with hand-drawn groundtruth data, indicating the position of membranes, and evaluated on a further four images. We use a k-Nearest Neighbour (k-NN) approach to classification. NN classification starts by compiling a list of descriptors of all the patches in the training data, together with the groundtruth label of the pixel in the patch centre. The classifier is used by extracting a patch around each pixel to be classified, forming a description of it, comparing the description to each of the compiled descriptions, finding the k which are most similar, and assigning the pixel being analyzed the label associated with the majority of the k.

We evaluated a baseline solution based on pixel values. The distance between two patch descriptions is simply the Euclidean distance between their blurred pixel values, minimized over allowing one patch to be rotated or reflected into eight configurations. We jointly optimize blur, patch size and k. The best settings that we find are: no blur, 7×7 patches, and k=14. At these settings membrane-labelled pixels overlap (intersection divided by union) with the groundtruth by 48%.

Our solution uses a patch of BIF labels as a patch descriptor. The distance between two descriptors is simply the number of pixels where the labels do not agree. As in the baseline, we minimize the distance over one of the patches being rotated or reflected. We jointly optimize the scale (σ) at which the BIFs are computed, the parameter ε which controls the amount of the flat BIF class, patch size and k. The best settings that we find are σ = 1.2, ε = 0.15, 9×9 patches, and k=10. At these settings we achieve an overlap of 55%. See figure 8.

Fig. 8. Typical results of our pixel-classification system. (Panels: image; groundtruth; greylevel-based classification; BIF-based classification; BIFs.)

So, using BIF- rather than greylevel-description raises the score from 48% to 55%. Computation is also faster because the kNN lookup dominates the cost of computing patch descriptions, and with BIFs the distances that need to be computed are of a Hamming rather than Euclidean type.
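The BIF-patch distance, the majority vote and the overlap score can be sketched as follows. This is an illustration under assumptions: patches are square integer label arrays, class labels are non-negative integers (for example 1 for membrane and 0 otherwise), ties in the vote go to the smaller label, and all function names are hypothetical.

```python
import numpy as np

def dihedral_configurations(patch):
    """The eight rotations and reflections of a square patch."""
    configs = []
    for quarter_turns in range(4):
        rotated = np.rot90(patch, quarter_turns)
        configs.append(rotated)
        configs.append(np.fliplr(rotated))
    return configs

def bif_patch_distance(patch_a, patch_b):
    """Smallest count of disagreeing labels over the configurations of patch_b."""
    return min(int(np.sum(patch_a != cfg)) for cfg in dihedral_configurations(patch_b))

def knn_label(query, train_patches, train_labels, k):
    """Majority label among the k training patches closest to the query."""
    dists = [bif_patch_distance(query, p) for p in train_patches]
    nearest = np.argsort(dists, kind="stable")[:k]
    votes = np.bincount(np.asarray(train_labels)[nearest])
    return int(np.argmax(votes))

def overlap(predicted, groundtruth):
    """Intersection divided by union of two boolean masks."""
    predicted = np.asarray(predicted, dtype=bool)
    groundtruth = np.asarray(groundtruth, dtype=bool)
    union = np.logical_or(predicted, groundtruth).sum()
    if union == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return np.logical_and(predicted, groundtruth).sum() / union
```

Because the descriptors are categorical, the distance is a count of mismatches (a Hamming-type distance) rather than a sum of squared differences, which keeps each of the eight comparisons cheap. A brute-force loop over all training patches is used here for clarity only.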

8 Conclusions

We have derived a scheme for classifying image structure into one of seven BIF types based on the outputs of a bank of six DtG filters. Applied to an entire image, the output realizes Marr’s notion of an image primal sketch. Presented results show that BIF description is simple, fast and effective for texture, object and pixel classification. The BIF system was derived by considering the sensitivity of DtG filters to image symmetry. Although the final algorithm is pleasingly simple, there are some weak points in the derivation of the BIFs from symmetry sensitivities. Specifically: why are only the 0-D SS considered, how exactly does orbifold distance correspond to degree of failure of symmetry, and why should least-approximate local symmetry be an effective feature label? We hope that the foundation of symmetry-sensitivity of DtGs can eventually answer all of these questions in a scheme where arbitrary choice has been eliminated. Such a scheme will be extendable to higher-order filter families (where appeal to visual evidence and past practice are less effective), for which a richer alphabet of feature labels is to be expected. We predict that such a richer alphabet will give more effective solutions in the application areas that we have reviewed.

References

1. Liu, Y.X., Collins, R.T., Tsin, Y.H.: A computational model for periodic pattern perception based on frieze and wallpaper groups. IEEE Transactions on Pattern Analysis and Machine Intelligence 26(3), 354–371 (2004)
2. Scognamillo, R., et al.: A feature-based model of symmetry detection. Proceedings of the Royal Society of London Series B-Biological Sciences 270(1525), 1727–1733 (2003)
3. Mellor, M., Brady, M.: A new technique for local symmetry estimation. In: Kimmel, R., Sochen, N.A., Weickert, J. (eds.) Scale-Space 2005. LNCS, vol. 3459, pp. 38–49. Springer, Heidelberg (2005)
4. Bonneh, Y., Reisfeld, D., Yeshurun, Y.: Quantification of local symmetry - application to texture-discrimination. Spatial Vision 8(4), 515–530 (1994)
5. Mancini, S., Sally, S.L., Gurnsey, R.: Detection of symmetry and anti-symmetry. Vision Research 45(16), 2145–2160 (2005)
6. Baylis, G.C., Driver, J.: Perception of symmetry and repetition within and across visual shapes: Part-descriptions and object-based attention. Visual Cognition 8(2), 163–196 (2001)
7. Marr, D.: Vision. W H Freeman & co., New York (1982)
8. Griffin, L.D.: Symmetries of 1-D Images. Journal of Mathematical Imaging and Vision 31(2-3), 157–164 (2008)
9. Crosier, M., Griffin, L.D.: Texture classification with a dictionary of basic image features. In: CVPR 2008. IEEE, Los Alamitos (2008)
10. Lillholm, M., Griffin, L.D.: Statistics and category systems for the shape index descriptor of local image. Image and Vision Computing (in press) (2008)
11. Lillholm, M., Griffin, L.D.: Novel image feature alphabets for object recognition. In: ICPR 2008 (2008)
12. Griffin, L.D.: Symmetries of 2D images: cases without periodic translation. Journal of Mathematical Imaging and Vision (in press)
13. Griffin, L.D.: The 2nd order local-image-structure solid. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(8), 1355–1366 (2007)
14. Griffin, L.D., Lillholm, M.: Symmetry-sensitivity of derivative of Gaussian filters. IEEE Transactions on Pattern Analysis and Machine Intelligence (in press)
15. Bieberbach, L.: Über die Bewegungsgruppen der Euklidischen Räume I. Mathematische Annalen 70, 297 (1911)
16. Conway, J.H., et al.: On three-dimensional space groups. Contributions to Algebra and Geometry 42(2), 475–507 (2001)
17. Grünbaum, B., Shephard, G.C.: Tilings and Patterns. WH Freeman & co., New York (1987)
18. Schattschneider, D.: MC Escher. Visions of Symmetry. Plenum Press (1990)
19. Holser, W.T.: Classification of symmetry groups. Acta Crystallographica 14, 1236–1242 (1961)
20. Loeb, A.A.: Color and Symmetry. Robert E. Krieger (1978)
21. Klein, F.: A comparative review of recent researches in geometry (trans. by MW Haskell). Bulletin of the New York Mathematical Society 2, 215–249 (1892)
22. Koenderink, J.J., van Doorn, A.J.: Image processing done right. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2350, pp. 158–172. Springer, Heidelberg (2002)
23. Cayley, A.: Sixth memoir upon the quantics. Philosophical Transactions of the Royal Society 149, 61–70 (1859)
24. Koenderink, J.J., van Doorn, A.J.: Generic Neighborhood Operators. IEEE Transactions on Pattern Analysis and Machine Intelligence 14(6), 597–605 (1992)
25. Varma, M., Zisserman, A.: Texture classification: are filter banks necessary? In: CVPR 2003. IEEE, Los Alamitos (2003)
26. Varma, M., Zisserman, A.: A statistical approach to texture classification from single images. International Journal of Computer Vision 62(1), 61–81 (2005)
27. Hayman, E., et al.: On the significance of real-world conditions for material classification. In: Pajdla, T., Matas, J(G.) (eds.) ECCV 2004. LNCS, vol. 3024, pp. 253–266. Springer, Heidelberg (2004)
28. Zhang, J., et al.: Local features and kernels for classification of texture and object categories: a comprehensive study. In: CVPR 2006 (2006)
29. Perronnin, F., et al.: Adapted vocabularies for generic visual categorization. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3954, pp. 464–475. Springer, Heidelberg (2006)
30. Varma, M., Zisserman, A.: Unifying Statistical Texture Classification Frameworks. Image and Vision Computing (in press) (2005)
31. Cula, O.G., Dana, K.J.: Compact representation of bidirectional texture functions. In: CVPR 2001. IEEE, Los Alamitos (2001)
32. Lazebnik, S., Schmid, C., Ponce, J.: A sparse texture representation using local affine regions. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(8), 1265–1278 (2005)
33. Csurka, G., et al.: Visual categorization with a bag of keypoints. In: ECCV 2004, pp. 1–22 (2004)
34. Lowe, D.G.: Towards a computational model for object recognition in IT cortex. In: Biologically Motivated Computer Vision, Proceedings, pp. 20–31 (2000)
35. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60(2), 91–110 (2004)

An Anisotropic Fourth-Order Partial Differential Equation for Noise Removal

Mohammad Reza Hajiaboli

Department of Electrical and Computer Engineering, Concordia University, Montreal, Canada
[email protected]

Abstract. Fourth-order nonlinear diffusion filters are isotropic filters in which the strength of diffusion at regions with strong image features, such as regions with an edge or texture, is reduced, leading to their preservation. However, the optimal choice of parameter in the numerical solver of these filters for minimal distortion of the image features results in a very slow convergence rate and the formation of speckle noise on the denoised image, especially when the noise level is moderately high. In this paper, a new fourth-order nonlinear diffusion filter is introduced, which has an anisotropic behavior on the image features. In the proposed filter, it is shown that a suitable design of a set of diffusivity functions to unevenly control the diffusion in the directions of the level set and the gradient leads to a fast-converging filter with a good edge preservation capability. The comparison of the results obtained by the proposed filter with those of the classical and recently developed techniques shows that the proposed method produces a noticeable improvement in the quality of denoised images evaluated subjectively and quantitatively, as well as a substantial increase in the convergence rate compared to the classical filter.

1

Introduction

Nonlinear diffusion denoising filters are known for their good edge preservation capabilities. In these techniques, the denoised image is the solution of a partial differential equation (PDE). The first of these denoising methods was introduced by Perona and Malik [1] in 1990 and is based on solving a nonlinear second-order PDE (the so-called Perona-Malik equation). Since then, a great deal of research in this field has led to a variety of nonlinear diffusion denoising techniques (see [2], [3] for a few examples). In spite of their good edge preservation, these methods tend to produce blocky effects in the images [4]. In fact, the solution of the Perona-Malik equation is piecewise constant, so these filters create blocky effects in the smooth regions of the image. A spatially regularized version of the nonlinear diffusion filter was introduced by Catte et al. [2] to reduce the formation of these artifacts in the denoised image. You and Kaveh [4] proposed a more effective solution to this problem by using a fourth-order PDE for noise removal, where a planar approximation of the noisy image is supported in the solution of the PDE, resulting in a significant improvement of ramp edge preservation and a dramatic reduction of blocky effects. Based on this idea, a variety of fourth-order PDE-based denoising techniques have been developed, such as the filters given in [5], [6], and [7]. However, fourth-order diffusion filters damp high spatial frequency components (i.e. noise and step edges) much faster than second-order ones [5]. This can distort the step edges during the evolution of the denoising process, especially when the smoothing strength of the filter at the detected edges is not effectively reduced by the diffusivity function. Setting a small threshold value in the diffusivity function substantially reduces the diffusivity on the edges, at the expense of a very slow convergence rate, as reported in [4] and [5].

All of the previously mentioned techniques belong to a class of diffusion-based denoising filters known as isotropic nonlinear diffusion denoising methods. This means that the total amount of diffusion controlled by the diffusivity function is applied to the different regions of the image regardless of the direction of the image features. To improve edge preservation, another class of diffusion-based denoising techniques has emerged in which the diffusion is adapted to the direction of the local image features [8], [9], [10]. In these techniques, the filter minimizes the diffusion strength in the direction perpendicular to the local features and maximizes it in the direction along them. However, these techniques have been developed in the context of second-order diffusion filters.

In this paper, an anisotropic fourth-order diffusion filter is proposed in which the diffusion strength is adjusted with respect to the direction of the local features. Two different diffusivity functions are designed to strongly reduce the diffusion perpendicular to the feature orientation, while allowing the diffusion parallel to the edge orientation and in the smooth regions to proceed with normal strength. A comparison of the results obtained by the proposed filter with those of classical and newly developed filters reveals a noticeable improvement in the quality of the denoised images, evaluated both subjectively and quantitatively, as well as a substantial increase of the convergence rate compared with the classical filter.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 356–367, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 A Brief Review

2.1 From Second to Fourth-Order Filters

Nonlinear diffusion filters are evolutionary processes. The fundamental PDE of the nonlinear diffusion filter introduced by Perona and Malik [1] is given by

$$\frac{\partial u}{\partial t} = \operatorname{div}\big(c(\|\nabla u\|)\,\nabla u\big), \tag{1}$$

where $u$ is the image intensity function, $c(\cdot)$ is a diffusivity function from which the diffusion coefficient is calculated, and $t$ is the evolution time. The symbols $\|\cdot\|$ and $\operatorname{div}$ denote the Euclidean norm and the divergence, respectively. The diffusivity function is a positive and nonincreasing function of $\|\nabla u\|$. One of the diffusivity functions defined by Perona and Malik is given by

$$c(\|\nabla u\|) = \frac{k^2}{k^2 + \|\nabla u\|^2}, \tag{2}$$

where $k$ is the so-called contrast parameter. You et al. [11] carried out a detailed analysis showing that the evolution (1) corresponds to the minimization of an energy functional. If the diffusivity function (2) is used, the energy functional is

$$R(u) = \int_\Omega \frac{k^2}{2}\,\ln\big(k^2 + \|\nabla u\|^2\big)\,dx\,dy, \tag{3}$$

where $\Omega$ is the region of support of $u$. $R(u)$ is minimized when $\|\nabla u\|^2$ is minimal, which leads to a piecewise constant approximation of $u$. Therefore, the formation of staircase artifacts on ramp edges is unavoidable. To resolve this problem, You and Kaveh [4] introduced a fourth-order PDE-based denoising method in which the denoised image is obtained by minimizing the potential function

$$E(u) = \int_\Omega f\big(|\nabla^2 u|\big)\,dx\,dy, \tag{4}$$

where $f'(s) = s\,c(s)$ and $|\nabla^2 u|$ is the absolute value of the Laplacian of $u$. For the same diffusivity function (2), $E(u)$ takes the form

$$E(u) = \int_\Omega \frac{k^2}{2}\,\ln\big(k^2 + |\nabla^2 u|^2\big)\,dx\,dy, \tag{5}$$

meaning that $E(u)$ is minimized when $|\nabla^2 u|$ is minimal. Therefore, the ramp regions of $u$ (i.e. the regions where $|\nabla^2 u| = 0$) fit the solution of the associated fourth-order PDE. Applying the Euler equation to the minimization problem (4), followed by a gradient descent procedure, gives

$$\frac{\partial u}{\partial t} = -\nabla^2\big[c(|\nabla^2 u|)\,\nabla^2 u\big]. \tag{6}$$

With the forward Euler approximation of $\partial u/\partial t$, the numerical solver of (6) is given by

$$u^{n+1} = u^n - dt\,\nabla^2\big[c(|\nabla^2 u^n|)\,\nabla^2 u^n\big], \quad u^0 = u_0, \quad n = 0, 1, \dots, N, \tag{7}$$

where $n$ is the iteration index, $dt$ is the time step size and $u_0$ is the noisy image. The process is iterative; to protect the edges from over-smoothing, it has to be stopped after a certain number of iterations, denoted by $N$. Besides these nonlinear diffusion filters, another class of techniques, known as regularization techniques and also based on solving nonlinear PDEs, has been widely used for image restoration. The classical paper of Rudin, Osher and Fatemi [12] introduced one of the first filters of this kind, in which the PDE to be solved is of second order. The same problem of staircase formation in the ramp regions of the image has therefore motivated researchers to introduce new regularization techniques based on higher-order PDEs, such as [13], [14]. However, the focus of this paper is on the diffusion-based techniques reviewed above.

2.2 Edge Preservation and Convergence Rate
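As a point of reference for the convergence discussion in this subsection, the explicit scheme (7) can be sketched as follows. This is only a minimal illustration: the paper does not state which discrete Laplacian or boundary treatment is used, so the 5-point stencil with replicated borders and the function names `laplacian`, `diffusivity` and `you_kaveh_step` are assumptions. The default values of `k` and `dt` are the setting quoted for this filter in Section 4.

```python
import numpy as np

def laplacian(u):
    """5-point discrete Laplacian with replicated borders (assumed discretization)."""
    p = np.pad(u, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u

def diffusivity(s, k):
    """Diffusivity (2): c(s) = k^2 / (k^2 + s^2)."""
    return k**2 / (k**2 + s**2)

def you_kaveh_step(u, k=1.0, dt=0.25):
    """One forward-Euler iteration of (7)."""
    lap = laplacian(u)
    # The inner Laplacian is weighted by c(|lap|); the outer Laplacian is applied to the result.
    return u - dt * laplacian(diffusivity(np.abs(lap), k) * lap)
```

Iterating `you_kaveh_step` on a noisy image and stopping after N iterations reproduces the structure of (7). Planar regions have a vanishing Laplacian and are therefore left unchanged away from the image border.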

Although (6) significantly reduces the blocky effects in the denoised image, the optimal parameter setting for the numerical solver (7) leads to a very slow convergence rate, especially when the level of contaminating noise is moderately high. A recently developed technique known as the fourth-order hybrid model [6] uses a relaxed median filter [15] to improve the quality of the denoised image when the observed image is heavily contaminated by noise. The numerical model of this filter is given by

$$u^{n+1} = RM_{\alpha\omega}\Big(u^n - dt\,\nabla^2\big[c(|\nabla^2 u^n|)\,\nabla^2 u^n\big]\Big), \tag{8}$$

where $RM$ denotes the relaxed median filter with lower bound $\alpha$ and upper bound $\omega$. This filtering process needs fewer iterations to produce a denoised image. However, the denoised image is strongly affected by the relaxed median filter, and the main advantage of fourth-order diffusion filters (i.e. ramp edge preservation) is hindered, as shown later. Moreover, the computational burden per iteration is dramatically higher than that of the You and Kaveh filter. Another recently introduced technique [7] demonstrates a significant improvement in the convergence rate along with good ramp edge preservation. In this technique, the diffusivity function of the You and Kaveh filter, $c(|\nabla^2 u|)$, is replaced by $c(\|\nabla u\|)$, and the PDE of the filter is given by

$$\frac{\partial u}{\partial t} = -\nabla^2\big[c(\|\nabla u\|)\,\nabla^2 u\big]. \tag{9}$$

Although the energy functional of (9) does not have a closed form, the filter still supports the planar approximation of the image. The ramp edge preservation of this fourth-order diffusion filter comes from the fact that $\partial u/\partial t \to 0$ when $\nabla^2 u \to 0$. However, as $|\nabla^2 u| \ge \|\nabla u\|$, the diffusivity function $c(|\nabla^2 u|)$ gives a smaller diffusion coefficient for the step edges than $c(\|\nabla u\|)$. Therefore, in spite of the good convergence rate obtained by (9), the step edges still suffer more distortion than with the classical methods.

2.3 Anisotropic Diffusion Filters

The so-called anisotropic diffusion filters are schemes in which the diffusion rate is controlled according to the direction of the local features, such as the ones introduced in [8], [9] and [10]. The coherence-enhancing diffusion filter [9] is one of this kind: the scalar diffusion coefficient in (1) is replaced by a diffusion tensor to reduce the diffusivity of the filter perpendicular to the orientation of the local features, while letting the diffusion proceed with high strength in the direction of the level set. Another anisotropic filter, introduced by Carmona and Zhong [10], uses scalar diffusivity functions to perform anisotropic diffusion. The PDE of this filter is given by

$$\frac{\partial u}{\partial t} = c_1\,(c_2\,u_{\eta\eta} + c_3\,u_{\xi\xi}), \tag{10}$$

where $c_1$, $c_2$ and $c_3$ are different diffusivity functions and $u_{\eta\eta}$ and $u_{\xi\xi}$ are second-order directional derivatives. Here $\eta$ denotes the direction perpendicular to the orientation of the feature (the so-called gradient direction) and $\xi$ denotes the direction of the contour or level set.

All of these techniques belong to the class of second-order diffusion filters. Some techniques, such as [16] for surface smoothing by anisotropic diffusion filtering of the normals to the surface, or its variant for image denoising [17], can be considered fourth-order anisotropic filters. However, these are two-phase filters: in the first phase, an anisotropic filter is applied to the normal map of the surface or image, and in the second phase, a surface is fitted to the processed normals. In Section 3, a new fourth-order anisotropic diffusion filter is proposed, which is a single-phase filter and can be seen as a generalization of the classical fourth-order nonlinear diffusion filter of You and Kaveh.

3 The Proposed Model

3.1 Diffusion Equation

The previously mentioned fourth-order diffusion filters are isotropic: the extent of the diffusion is controlled by the diffusivity function regardless of the orientation of the edges. The only anisotropic behavior of those filters is limited to the anisotropic response of the discrete Laplacian operator; most discrete Laplacian operators exhibit an anisotropic response to an edge with respect to x and y (i.e. the Cartesian coordinates) [18]. To give an anisotropic realization of the fourth-order diffusion filter, one should instead consider the second-order directional derivatives of the image. Two normalized and orthogonal vectors $\eta$ and $\xi$, pointing in the direction of the gradient and of the level set respectively, are given by

$$\eta = \frac{[u_x,\ u_y]}{\sqrt{u_x^2 + u_y^2}} \quad\text{and}\quad \xi = \frac{[-u_y,\ u_x]}{\sqrt{u_x^2 + u_y^2}}. \tag{11}$$

Based on the definition in (11), the second-order derivatives of the image in the directions of the gradient and of the level set are

$$u_{\eta\eta} = \frac{u_{xx}u_x^2 + 2u_xu_yu_{xy} + u_{yy}u_y^2}{u_x^2 + u_y^2} \tag{12}$$

and

$$u_{\xi\xi} = \frac{u_{xx}u_y^2 - 2u_xu_yu_{xy} + u_{yy}u_x^2}{u_x^2 + u_y^2}. \tag{13}$$

Adding (12) and (13) shows that the sum of these second-order directional derivatives is equal to the Laplacian of the image,

$$\nabla^2 u = u_{xx} + u_{yy} = u_{\xi\xi} + u_{\eta\eta}. \tag{14}$$

Therefore, the proposed fourth-order diffusion equation, which is a generalization of (6), can be written as

$$\frac{\partial u}{\partial t} = -\nabla^2\big(c_1\,(c_2\,u_{\eta\eta} + c_3\,u_{\xi\xi})\big). \tag{15}$$

In the proposed model, $c_1$, $c_2$ and $c_3$ are diffusivity functions, where $c_1$ controls the total amount of diffusion and $c_2$ and $c_3$ control the uneven diffusion in the directions $\eta$ and $\xi$. Choosing $c_2 = c_3$ and $c_1 c_2 = c$ leads to the nonlinear diffusion filter (6) or (9), depending on the definition of $c$. In the next section, the criteria for a suitable choice of these diffusivity functions are discussed.

3.2 Diffusivity Functions
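The directional derivatives (12) and (13), to which the diffusivity functions of this subsection are applied, can be sketched numerically as follows. The paper does not specify the finite-difference scheme, so the use of `numpy.gradient` (central differences), the small constant `eps` guarding against division by zero in flat regions, and the function name `directional_second_derivatives` are assumptions made for illustration.

```python
import numpy as np

def directional_second_derivatives(u, eps=1e-8):
    """Return (u_eta_eta, u_xi_xi) following (12) and (13)."""
    uy, ux = np.gradient(u)        # axis 0 is y (rows), axis 1 is x (columns)
    uyy, uyx = np.gradient(uy)
    uxy, uxx = np.gradient(ux)
    uxy = 0.5 * (uxy + uyx)        # symmetrize the mixed derivative
    g2 = ux**2 + uy**2 + eps       # eps is an assumed safeguard, not part of (12)-(13)
    u_nn = (uxx * ux**2 + 2.0 * ux * uy * uxy + uyy * uy**2) / g2
    u_tt = (uxx * uy**2 - 2.0 * ux * uy * uxy + uyy * ux**2) / g2
    return u_nn, u_tt
```

Away from the image border and wherever the gradient does not vanish, the two outputs add up to the Laplacian, which is the identity (14).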

Different diffusivity functions have been introduced in the context of nonlinear diffusion denoising, and the behavior of the filter varies with the choice of the diffusivity function. The most commonly used diffusivity function in fourth-order diffusion filters is the one in (2), written as $c(s)$, where $s$ is the modulus of a derivative of the image ($s = |\nabla^2 u|$ in (6) or $s = \|\nabla u\|$ in (9)). Regardless of the choice of $s$, this diffusivity function is bounded in $(0, 1]$. A suitable and computationally inexpensive choice of the diffusivity functions in the proposed model is given by

$$c_1(s) = c_2(s) = c(\|\nabla u\|) \quad\text{and}\quad c_3 = 1. \tag{16}$$

As in (9), $s$ in the function $c_1$ is the modulus of the gradient of $u$, which leads to a fast convergence rate, and $c_2 = c_1$ is an optimal choice in terms of the overall computational cost of the filter. Therefore, the proposed model (15) can be rewritten in the form

$$\frac{\partial u}{\partial t} = -\nabla^2\big[c(\|\nabla u\|)^2\,u_{\eta\eta} + c(\|\nabla u\|)\,u_{\xi\xi}\big]. \tag{17}$$

Since the function $c$ is bounded in $(0, 1]$, we have $c^2 \le c$, so the overall diffusivity in the $\eta$ direction is never larger than the one in the $\xi$ direction. Before the comparative results are presented in Section 4, the performance of the filter is compared with the second-order filter of Perona and Malik in Fig. 1, which shows the ability of the proposed filter to preserve ramp edges. In fact, the proposed filter supports the planar approximation of the image, similarly to (6) and (9), since for planar regions $u_{\eta\eta} \to 0$ and $u_{\xi\xi} \to 0$, which leads to $\partial u/\partial t \to 0$.
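One forward-Euler iteration of (17) might look like the sketch below. It is not the author's implementation: the finite-difference scheme, the replicated borders, the constant `eps` and the function names are assumptions, and $u_{\xi\xi}$ is obtained from the identity (14) rather than from (13) so that the weighted sum falls back to the plain Laplacian where the gradient vanishes. The defaults `k=7` and `dt=0.031` are the values quoted for the proposed filter in Section 4.

```python
import numpy as np

def laplacian(u):
    """5-point discrete Laplacian with replicated borders (assumed discretization)."""
    p = np.pad(u, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u

def anisotropic_step(u, k=7.0, dt=0.031, eps=1e-8):
    """One forward-Euler iteration of (17) with c1 = c2 = c(|grad u|) and c3 = 1."""
    uy, ux = np.gradient(u)                  # axis 0 is y, axis 1 is x
    uyy, uyx = np.gradient(uy)
    uxy, uxx = np.gradient(ux)
    uxy = 0.5 * (uxy + uyx)
    g2 = ux**2 + uy**2
    c = k**2 / (k**2 + g2)                   # diffusivity (2) with s = |grad u|
    u_nn = (uxx * ux**2 + 2.0 * ux * uy * uxy + uyy * uy**2) / (g2 + eps)  # (12)
    u_tt = (uxx + uyy) - u_nn                # (14), equivalent to (13) where g2 > 0
    return u - dt * laplacian(c**2 * u_nn + c * u_tt)
```

Because `c**2 <= c`, the gradient-direction term is always damped at least as strongly as the level-set term, which is the anisotropy described above.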

Fig. 1. Comparing the results obtained by a second-order filter and the proposed filter: (a) noisy image, (b) denoised image by the Perona and Malik filter, (c) denoised image by the proposed filter

3.3 Inverse Diffusion

The classical fourth-order filter of You and Kaveh [4] in (6) is a well-posed process because its potential function (5) is a positive potential function with a global minimum. On the other hand, deriving the potential function of the proposed filter (17) is not as simple as for (6). However, to demonstrate that the unevenly weighted sum of $u_{\eta\eta}$ and $u_{\xi\xi}$ may lead to inverse diffusion, it is sufficient to show that, at least in a sub-region of $u$,

$$\operatorname{sign}\big(c(\|\nabla u\|)^2\,u_{\eta\eta} + c(\|\nabla u\|)\,u_{\xi\xi}\big) \ne \operatorname{sign}\big(\nabla^2 u\big). \tag{18}$$

In this case, the dynamic flow of (17) performs an inverse diffusion, which results in edge enhancement. The largest difference between the coefficients of $u_{\eta\eta}$ and $u_{\xi\xi}$ occurs when $c(\|\nabla u\|) = 1/2$. In this case, the linear version of (17) can be written in the form

$$\frac{\partial u}{\partial t} = -\nabla^2\Big(\frac{u_{\eta\eta}}{4} + \frac{u_{\xi\xi}}{2}\Big) = -\nabla^2\Big(\frac{u_{\eta\eta}}{4} + \frac{u_{\xi\xi}}{4} + \frac{u_{\xi\xi}}{4}\Big) = -\nabla^2\Big(\frac{\nabla^2 u}{4} + \frac{u_{\xi\xi}}{4}\Big). \tag{19}$$

Knowing that (6) has a positive potential function, if $\operatorname{sign}(\nabla^2 u/4 + u_{\xi\xi}/4) = \operatorname{sign}(\nabla^2 u)$, then the filter (19) also has a positive potential function. This would require $|\nabla^2 u| > |u_{\xi\xi}|$ to hold throughout the whole image, which is not true in general. The example shown in Fig. 2 demonstrates that the linear diffusion equation (19) performs an inverse diffusion on the edges. The signal shown in Fig. 2-(b) is the intensity profile of the standard test image "disk" in Fig. 2-(a), extracted at the middle row. The signal in Fig. 2-(c) is the same intensity profile after the image has been filtered by (19); the inverse diffusion in this case leads to edge enhancement. However, if the filter is run in the nonlinear fashion proposed in (17), the profile in Fig. 2-(d) shows that the uplifting of the edges is dramatically reduced. In other words, in the general application of image denoising, the inverse diffusion in the proposed filter does not lead to instability of the filter or to the formation of ringing artifacts around the edges.

Fig. 2. Inverse diffusion as a result of the uneven diffusion in the directions of η and ξ: (a) the original image "disk"; (b), (c) and (d) the intensity profiles at the middle row of the original image, of the image diffused by (19) and of the image diffused by the proposed filter (17), respectively
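The algebra behind (18) and (19) can be checked with a few lines of arithmetic. The sketch below scans the diffusivity value c over (0, 1] to locate the largest gap between the two weights, then evaluates the weighted sum at c = 1/2. The sample values `u_nn = 3` and `u_tt = -2` are hypothetical numbers chosen only so that $|u_{\xi\xi}| > |\nabla^2 u|$; they do not come from the paper.

```python
def flux(c, u_nn, u_tt):
    """Quantity inside the outer Laplacian of (17) for a diffusivity value c."""
    return c * c * u_nn + c * u_tt

def weight_gap(c):
    """Difference between the weights of u_xi_xi and u_eta_eta in (17)."""
    return c - c * c

# Scan c over (0, 1]; the gap c - c^2 is largest at c = 1/2.
grid = [i / 1000 for i in range(1, 1001)]
c_star = max(grid, key=weight_gap)

# Hypothetical sample point where |u_xi_xi| exceeds |Laplacian|.
u_nn, u_tt = 3.0, -2.0
lap = u_nn + u_tt                      # identity (14)
linear_flux = flux(0.5, u_nn, u_tt)    # left-hand form of (19)
rewritten = lap / 4 + u_tt / 4         # right-hand form of (19)
```

At this sample point the Laplacian is positive while the weighted sum is negative, which is the sign mismatch (18) that the text associates with inverse diffusion.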

4 Comparative Results

In this section, the proposed method is compared with other fourth-order nonlinear diffusion filters. The following filters are compared:

1. The proposed filter with the PDE of (17), with k = 7 and dt = 0.031 (i.e. the time step size that provides data-independent stability in the numerical solver [7]).
2. The filter of (7) introduced by You and Kaveh [4], with the suggested parameter setting dt = 0.25 and k = 1.
3. The filter of (8) introduced in [6], with the suggested parameter setting dt = 0.1, k = 3, α = 3 and ω = 5.
4. The filter in (9) introduced in [7], which is a self-governing filter. In this filter, the diffusivity function of Perona and Malik, c(s), is used with s = ‖∇u‖, the contrast parameter k is estimated by the histogram-based mechanism used in [1], and dt = 0.031.

Three test images, "Pepper", "Cameraman" and "House", were corrupted by additive white Gaussian noise with a standard deviation of 15. Table 1 presents an objective comparison of these filters in terms of the signal-to-noise ratio (SNR) of the denoised image and their computational complexity.

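Table 1. Quantitative comparison of the results

| Image | Noisy image SNR (dB) | Method | Denoised image SNR (dB) | Num. of Iter. | CPU/Iter. (s) | Convergence (s) |
|---|---|---|---|---|---|---|
| Pepper | 10.98 | Proposed | 17.84 | 80 | 0.038 | 3.04 |
| Pepper | 10.98 | (9) | 17.32 | 14 | 0.080 | 1.12 |
| Pepper | 10.98 | (7) | 15.83 | 3133 | 0.031 | 97.12 |
| Pepper | 10.98 | (8) | 15.21 | 2 | 0.155 | 0.31 |
| Cameraman | 12.38 | Proposed | 17.08 | 35 | 0.038 | 1.33 |
| Cameraman | 12.38 | (9) | 16.83 | 6 | 0.082 | 0.492 |
| Cameraman | 12.38 | (7) | 16.59 | 3015 | 0.031 | 93.46 |
| Cameraman | 12.38 | (8) | 13.59 | 1 | 0.160 | 0.16 |
| House | 9.68 | Proposed | 17.44 | 89 | 0.038 | 3.382 |
| House | 9.68 | (9) | 17.08 | 36 | 0.081 | 2.916 |
| House | 9.68 | (7) | 15.80 | 3907 | 0.031 | 121.12 |
| House | 9.68 | (8) | 15.39 | 2 | 0.160 | 0.32 |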

Fig. 3. Comparing the convergence rate of the filters for denoising of test image "House" (plot of SNR in dB against the number of iterations on a logarithmic axis, with curves for filter (7), filter (8), filter (9) and the proposed filter (17))

The results show that the proposed method consistently produces the denoised image with the highest SNR. It is important to note that the results are obtained at the optimal number of iterations, at which the maximum SNR in the evolutionary process of each filter is achieved. If the iterative filtering process is continued beyond the optimal number of iterations, the SNR of the denoised image decreases due to over-smoothing of the edges. The other important feature of the proposed method is its fast convergence rate. As shown in Fig. 3 for the test image "House", the convergence rate of the proposed method is much higher than that of the filter of You and Kaveh. The computational burden of the filters is measured as the CPU time of each iteration, with all filters processing the same image on the same computer. Thus, the total convergence time of a filtering process is the product of the CPU time per iteration and the number of iterations. The relaxed median regularized filter converges faster than the proposed method; however, its maximum SNR is significantly lower than that of the other methods, and its SNR also decays very quickly due to over-smoothing of the edges. The computational cost of the proposed filter is slightly higher than that of the filter in (9); however, the higher SNR obtained by the proposed filter justifies this additional computational burden.

In Fig. 4, the perceptual quality of the image denoised by the proposed method is compared with that of the other methods. The first row shows the whole image and the second row a magnified portion of it. Each pair of images is labeled from (a) to (f). The first two pairs, (a) and (b), are the noiseless and the noisy images. Fig. 4-(c) shows the image denoised by the You and Kaveh filter, in which the formation of some speckle noise is visible. This drawback is known and addressed in [4]; it results from choosing a small value for k in the diffusivity function, a setting that is nevertheless necessary to protect the edges from over-smoothing. Fig. 4-(d) shows the image denoised by the relaxed median regularized filter (8). This image is blurred, and some staircase artifacts are formed in its smooth regions. The next image, shown in Fig. 4-(e), is the result of the filter in (9), in which the extent of denoising and edge preservation is noticeably better than that of the filters (7) and (8). However, comparing this result with the one obtained by the proposed filter in Fig. 4-(f) reveals that the extent of edge preservation in the proposed filter is noticeably higher.

Fig. 4. Comparing the perceptual quality of the results. The pairs of images labeled (a) to (f) are: (a) noiseless image, (b) noisy image, (c) denoised image using (7), (d) denoised image using (8), (e) denoised image using (9), (f) denoised image using the proposed filter (17).
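The relation stated above, total convergence time = CPU time per iteration × number of iterations, can be checked directly against the entries of Table 1. The snippet below is a small sanity check of that arithmetic; the variable and function names are illustrative only, and the numbers are copied from the table.

```python
# (image, method, iterations, CPU seconds per iteration, reported convergence time in seconds)
TABLE_1 = [
    ("Pepper", "Proposed", 80, 0.038, 3.04),
    ("Pepper", "(9)", 14, 0.080, 1.12),
    ("Pepper", "(7)", 3133, 0.031, 97.12),
    ("Pepper", "(8)", 2, 0.155, 0.31),
    ("Cameraman", "Proposed", 35, 0.038, 1.33),
    ("Cameraman", "(9)", 6, 0.082, 0.492),
    ("Cameraman", "(7)", 3015, 0.031, 93.46),
    ("Cameraman", "(8)", 1, 0.160, 0.16),
    ("House", "Proposed", 89, 0.038, 3.382),
    ("House", "(9)", 36, 0.081, 2.916),
    ("House", "(7)", 3907, 0.031, 121.12),
    ("House", "(8)", 2, 0.160, 0.32),
]

def convergence_time(iterations, cpu_per_iter):
    """Total time as the product of iteration count and CPU time per iteration."""
    return iterations * cpu_per_iter

# Largest deviation between the recomputed and the reported convergence times.
max_gap = max(abs(convergence_time(n, t) - reported) for _, _, n, t, reported in TABLE_1)

# Ratio of the reported times of filter (7) and of the proposed filter, per image.
reported = {(image, method): total for image, method, _, _, total in TABLE_1}
speedup = {image: reported[(image, "(7)")] / reported[(image, "Proposed")]
           for image in ("Pepper", "Cameraman", "House")}
```

The recomputed products agree with the reported convergence times up to rounding, and `speedup` quantifies how much sooner the proposed filter reaches its optimal SNR than the filter of You and Kaveh on each test image.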

5 Conclusion

An anisotropic fourth-order PDE for noise removal has been proposed. A brief theoretical review of second- and fourth-order diffusion denoising filters has been presented, highlighting the fact that previously developed fourth-order filters are isotropic: the extent of edge preservation is controlled by reducing the diffusivity of the filter near an edge regardless of its orientation. A major challenge with these filters is that the optimal choice of the model parameters for good edge preservation leads to a dramatically slow convergence rate. In the proposed filter, by contrast, the diffusion strength is adjusted with respect to the direction of the local features. Two different diffusivity functions have been designed to strongly reduce the diffusion perpendicular to the feature orientation (i.e. in the gradient direction), while letting the diffusion parallel to the orientation of the edge (i.e. in the direction of the level set) proceed at normal speed. Therefore, the proposed filter leads to a faster reduction of the uncorrelated noise and an overall faster convergence rate, with good edge preservation due to the reduced diffusivity of the filter in the gradient direction. The comparison of the results obtained by the proposed filter with those of classical and newly developed filters has shown that the proposed method produces a noticeable improvement in the quality of the denoised images, evaluated both subjectively and quantitatively, as well as a substantial increase of the convergence rate compared with the classical filter.


References

1. Perona, P., Malik, J.: Scale-space and edge detection using anisotropic diffusion. IEEE Trans. on Pattern Analysis and Machine Intelligence 12(7), 629–639 (1990)
2. Catte, F., et al.: Image selective smoothing and edge detection by nonlinear diffusion. SIAM J. Numer. Anal. 29(1), 182–193 (1992)
3. Black, M.J., et al.: Robust anisotropic diffusion. IEEE Transactions on Image Processing 7(3), 421–432 (1998)
4. You, Y.L., Kaveh, M.: Fourth-order partial differential equations for noise removal. IEEE Transactions on Image Processing 9(10), 1723–1730 (2000)
5. Lysaker, M., Lundervold, A., Tai, X.-C.: Noise removal using fourth-order partial differential equation with applications to medical magnetic resonance images in space and time. IEEE Tran. on Image Processing 12(12), 1579–1590 (2003)
6. Rajan, J., Kannan, K., Kaimal, M.R.: An Improved hybrid model for molecular image denoising. Journal of Mathematical Imaging and Vision 31, 73–79 (2008)
7. Hajiaboli, M.R.: A self-governing hybrid model for noise removal. In: Wada, T., Huang, F., Lin, S. (eds.) PSIVT 2009. LNCS, vol. 5414, pp. 295–305. Springer, Heidelberg (2009)
8. Weickert, J.: Anisotropic Diffusion in Image Processing. B. G. Teubner (1998)
9. Weickert, J.: Coherence-enhancing diffusion filtering. International Journal of Computer Vision 31(2-3), 111–127 (1998)
10. Carmona, R.A., Zhong, S.: Adaptive smoothing respecting feature directions. IEEE Transactions on Image Processing 7(3), 353–358 (1998)
11. You, Y.-L., et al.: Behavioral analysis of anisotropic diffusion in image processing. IEEE Trans. Image Processing 5, 1539–1553 (1996)
12. Rudin, L., Osher, S., Fatemi, E.: Nonlinear Total Variation based noise removal algorithms. Physica D 60, 259–268 (1992)
13. Chan, T., Marquina, A., Mulet, R.: High Order Total Variation-based Image Restoration. SIAM J. on Scientific Computing 22(2), 503–516 (2000)
14. Fang, L., et al.: Image restoration combining a total variational filter and a fourth-order filter. Journal of Visual Communication and Image Representation 18(4), 322–330 (2007)
15. Hamza, A.B., et al.: Removing noise and preserving details with relaxed median filters. Journal of Mathematical Imaging and Vision 11(2), 161–177 (1999)
16. Tasdizen, T., et al.: Geometric surface smoothing via anisotropic diffusion of normals. IEEE visualization 1(1), 125–132 (2002)
17. Lysaker, M., Osher, S., Tai, X.-C.: Noise removal using smoothed normals and surface fitting. IEEE Transactions on Image Processing 13(10), 1345–1357 (2004)
18. Kamgar-Parsi, B., Rosenfeld, A.: Optimally isotropic Laplacian operator. IEEE Transactions on Image Processing 8(10), 1467–1472 (1999)

Enhancement of Blurred and Noisy Images Based on an Original Variant of the Total Variation

Khalid Jalalzai and Antonin Chambolle

Centre de Mathématiques Appliquées (CMAP), École Polytechnique, 91128 Palaiseau Cedex, France
[email protected], [email protected]

Abstract. In this paper, we introduce a new variant of the total variation (TV). Its purpose is to simplify TV-based restoration when the image is degraded by some kernel which is easily computed in the Fourier domain (blur, Radon transform...). We actually replace the TV term by a mere L1 norm of some field, for which the optimization is much easier. This approach permits us to use a recent and fast algorithm to enhance, in particular, blurred and noisy images. We also compare our approach with standard total variation based denoising and show that it avoids the famous staircasing effect.

1 Introduction

In 1992, Rudin, Osher and Fatemi (ROF) introduced the total variation in their founding article [13] as a regularizing criterion for inverse problems in imaging. This has been fruitful in image restoration since it can regularize images without smoothing the edges. A possible approach to the minimization of ROF's problem is the generic forward-backward splitting method studied for instance by Combettes and Wajs in [3]. This consists in minimizing $(\varphi + \psi)$ where $\varphi$ and $\psi$ are both convex functions with certain regularity properties. Usually in signal restoration, given a signal $u$, $\varphi(u)$ is the so-called data fidelity term and is equal to $\frac{1}{2}\|Au - g\|^2$, where $g$ is a noisy signal which also underwent a linear perturbation $A$. The second term $\psi$ usually reflects a priori knowledge, about the noise for instance. In case $\psi = TV$ as in ROF's problem, the main drawback is that it is usually difficult to compute (even with a small error) the minimizer, namely the proximal operator $\operatorname{prox}_{TV}$ (see Moreau [9] or again Combettes and Wajs [3] for more about this). Therefore, what we propose in this article is a variant of ROF's problem where $\psi$ is simply the $L^1$ norm of some field $p$. This new term preserves the nice properties of the original total variation. Its relevance is due to the fact that its proximal operator is easy to compute and leaves the way open to compressed sensing-type algorithms (see Nesterov [12] or Beck and Teboulle [2] for instance).

However, our idea is different from the "Augmented Lagrangian" (see Tai and Wu [15]) or "Split Bregman" (see Goldstein and Osher [6]) methods, where the field $p$ must satisfy at convergence (sometimes approximately) the constraint $p = \nabla u$, while in our approach $p$ might be quite far from being a gradient.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 368–376, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 Few Notations

From now on, an image $u$ will be represented by an $n \times n$ matrix with real entries, i.e. an element of $X = \mathbb{R}^{n \times n}$. To simplify matters in the sequel, especially when we shall consider the discrete Fourier transform of $u$, we assume that the image $u$ is also periodic and defined for all $k \in \mathbb{Z}$ by $u_{i+kn,\,j+kn} = u_{i,j}$ with $i, j \in \{1, \dots, n\}$. To define the total variation of the image $u$, we first have to introduce a discretized version of the gradient. For $u \in X$, it is the vector $\nabla u$ of $Y = X \times X$ given by

$$(\nabla u)_{i,j} = \begin{pmatrix} u_{i+1,j} - u_{i,j} \\ u_{i,j+1} - u_{i,j} \end{pmatrix},$$

for $i, j = 1, \dots, n$. Finally, the simplest approximation of the total variation of $u \in X$ is defined by

$$TV(u) = \sum_{i,j} |(\nabla u)_{i,j}|$$

where $|\cdot|$ is simply the Euclidean norm of $\mathbb{R}^2$. Let us also introduce two important operators: the divergence $\operatorname{div} p$ of an element $p \in Y$ and the Laplacian $\Delta v$ of an image $v$. By analogy with the continuous setting, we want them to satisfy

$$\langle \operatorname{div} p, u \rangle_X = -\langle p, \nabla u \rangle_Y \quad\text{and}\quad \Delta v = \operatorname{div} \nabla v, \tag{1}$$

for all $u \in X$.
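These discrete operators can be sketched in a few lines. With periodic boundary conditions the forward-difference gradient is a circular shift, and requiring the duality relation in (1) forces the divergence to be a backward difference. The function names below are illustrative, and the array layout (a stack of two n×n components for an element of Y) is an assumption.

```python
import numpy as np

def grad(u):
    """Forward differences with periodic wrap-around; returns an array of shape (2, n, n)."""
    return np.stack((np.roll(u, -1, axis=0) - u, np.roll(u, -1, axis=1) - u))

def div(p):
    """Backward differences with periodic wrap-around, so that <div p, u> = -<p, grad u>."""
    return (p[0] - np.roll(p[0], 1, axis=0)) + (p[1] - np.roll(p[1], 1, axis=1))

def laplacian(v):
    """Discrete Laplacian defined as div(grad v), as in (1)."""
    return div(grad(v))

def total_variation(u):
    """Sum over pixels of the Euclidean norm of the discrete gradient."""
    gx, gy = grad(u)
    return np.sqrt(gx**2 + gy**2).sum()
```

With these definitions `laplacian` coincides with the usual periodic 5-point stencil, and the duality relation can be verified numerically on random inputs.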

3 The TV-Based Classical Approach

Given a noisy image $g$ which has also been exposed to a linear perturbation $A$, the Rudin, Osher and Fatemi method suggests minimizing the following functional

$$F(u) = \frac{1}{2}\|Au - g\|^2 + \lambda\,TV(u) \tag{2}$$

to restore the image $g$. The positive parameter $\lambda$ controls the regularization level. Actually the $TV$ term is not differentiable and in practice it is often replaced by another approximation of the total variation:

$$TV_\varepsilon(u) = \sum_{i,j} \sqrt{\varepsilon^2 + |(\nabla u)_{i,j}|^2}$$

where $\varepsilon$ is a positive real number. Therefore, we are led to find the unique $u_\varepsilon$ which minimizes

$$F_\varepsilon(u) = \frac{1}{2}\|Au - g\|^2 + \lambda\,TV_\varepsilon(u).$$

We are actually facing a smooth convex optimization problem which can be solved easily with the gradient method. To do so, it is enough to consider a sequence $(u_n)$ of images and a small enough gradient step $h > 0$ that satisfy

$$u_{n+1} = u_n - h\big(A^T(Au_n - g) + \lambda\,\nabla TV_\varepsilon(u_n)\big)$$

with

$$\big(\nabla TV_\varepsilon(u_n)\big)_{i,j} = -\operatorname{div}\left(\frac{(\nabla u_n)_{i,j}}{\sqrt{\varepsilon^2 + |(\nabla u_n)_{i,j}|^2}}\right),$$

for any $i, j = 1, \dots, n$. It remains to choose $u_0$: it is wise to take it as close as possible to the minimizer, so $u_0 = g$ seems to be a good choice. Unfortunately, the simple scheme suggested above is fairly slow since it converges as $O\big(\frac{1}{n}\big)$, which means that there exists a positive real $C$ such that

$$F_\varepsilon(u_n) - F_\varepsilon(u_\varepsilon) \le \frac{C}{n}.$$

A proof of this classical result can be found in [10], [11] or even [2]. Actually, in [11], Nesterov proposes a variant of the gradient algorithm with convergence rate $O\big(\frac{1}{n^2}\big)$ which solves the problem. It is as follows:

$$v_n = \operatorname{argmin}\Big\{ F_\varepsilon(u_n) + \langle v - u_n, \nabla F_\varepsilon(u_n)\rangle_X + \frac{L}{2}\|v - u_n\|^2,\ v \in X \Big\},$$

$$w_n = \operatorname{argmin}\Big\{ \sum_{k=0}^{n} \big[F_\varepsilon(u_k) + \langle w - u_k, \nabla F_\varepsilon(u_k)\rangle_X\big] + \frac{1}{2}\|w - u_0\|^2,\ w \in X \Big\},$$

$$u_{n+1} = \frac{2}{n+3}\,w_n + \frac{n+1}{n+3}\,v_n,$$

where $L$ is the Lipschitz constant of $\nabla F_\varepsilon$. This algorithm efficiently combines the classical gradient method (for the calculation of $v_n$) and the conjugate gradient method (for the calculation of $w_n$). We refer to Nesterov [10] for further explanations on these two techniques. See also Beck-Teboulle [2] for a recent, simpler variant.
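The plain gradient method on the smoothed functional $F_\varepsilon$ can be sketched as follows for the pure denoising case. Several choices here are assumptions rather than part of the paper: $A$ is taken to be the identity, the parameter values are arbitrary, and the step $h = 1/(1 + 8\lambda/\varepsilon)$ comes from the bound $L \le 1 + 8\lambda/\varepsilon$ on the Lipschitz constant of $\nabla F_\varepsilon$ for the periodic forward-difference gradient.

```python
import numpy as np

def grad(u):
    """Periodic forward-difference gradient, shape (2, n, n)."""
    return np.stack((np.roll(u, -1, axis=0) - u, np.roll(u, -1, axis=1) - u))

def div(p):
    """Periodic backward-difference divergence (negative adjoint of grad)."""
    return (p[0] - np.roll(p[0], 1, axis=0)) + (p[1] - np.roll(p[1], 1, axis=1))

def energy(u, g, lam, eps):
    """F_eps(u) with A = identity (assumption: pure denoising)."""
    gx, gy = grad(u)
    return 0.5 * np.sum((u - g) ** 2) + lam * np.sum(np.sqrt(eps**2 + gx**2 + gy**2))

def gradient_descent(g, lam=0.1, eps=0.1, n_iter=200):
    """Gradient method on F_eps, started at u_0 = g as suggested in the text."""
    h = 1.0 / (1.0 + 8.0 * lam / eps)        # step 1/L with the bound L <= 1 + 8*lam/eps
    u = g.copy()
    for _ in range(n_iter):
        gx, gy = grad(u)
        norm = np.sqrt(eps**2 + gx**2 + gy**2)
        grad_F = (u - g) - lam * div(np.stack((gx / norm, gy / norm)))
        u = u - h * grad_F
    return u
```

The small step forced by a small `eps` illustrates why this scheme is slow in practice, which is the motivation for the accelerated variants mentioned above.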

4 A Variant of TV

Let $u \in X$ be an image. The main idea is to replace the $TV$ term in (2) by

$$J(u) = \min_{\substack{p \in Y \\ \Pi p = \nabla u}} \|p\|_1$$

where, on the one hand, $\|p\|_1 = \sum_{i,j} \sqrt{(p^1_{i,j})^2 + (p^2_{i,j})^2}$ when $p = (p^1, p^2) \in X \times X$ and, on the other hand, $\Pi$ is the projection on the gradients defined by $\Pi p = \nabla \bar{v}$, where $\bar{v}$ realizes the minimum

$$\min_{v \in X} \|\nabla v - p\|. \tag{3}$$

Here $\|\cdot\|$ is the Euclidean norm of $Y$. Remark by the way that we have

$$J(u) \le TV(u) \le TV_\varepsilon(u)$$

for any $u \in X$. This is a straightforward consequence of the definition. In the sequel, we shall detail some other interesting properties of this functional which make us believe that it behaves the same way as $TV$. Let us get back to work: the solution of (3) is characterized by the Euler-Lagrange equation $\nabla^*(\nabla u - p) = 0$ or, using the notation introduced in (1), $\Delta u = \operatorname{div} p$ (we recall that our operators $\nabla$, $\operatorname{div}$ and $\Delta$ are here discrete operators with periodic boundary conditions). Therefore,

$$J(u) = \min_{\substack{p \in Y \\ \operatorname{div} p = \Delta u}} \|p\|_1.$$

Hence, Rudin, Osher and Fatemi's problem expressed in terms of this new functional consists in minimizing

$$G(p) = \frac{1}{2}\|Au - g\|^2 + \lambda\,\|p\|_1$$

over $(p, u)$ which satisfy the constraint $\Delta u = \operatorname{div} p$. Lately, minimization of such functionals has attracted much attention, in data compression in particular, and was the subject of many papers. Among those, two recent articles by Nesterov [12] and by Beck and Teboulle [2] focus on the minimization of objective functions which can be decomposed as a sum $\varphi + \psi$ where $\varphi$ is a continuously differentiable convex function whose gradient is Lipschitz continuous and $\psi$ is a continuous convex function which is possibly nonsmooth but is simple in the sense that its proximal operator is easy to compute (see Combettes and Wajs [3] for the definition). These characteristics suit perfectly the two terms composing $G$ and we henceforth denote

$$\varphi(p) = \frac{1}{2}\|A\Delta^{-1}\operatorname{div} p - g\|^2 \quad\text{and}\quad \psi(p) = \|p\|_1.$$


In their article, Beck and Teboulle describe the following scheme to construct a minimizing sequence $(p_n)$ for $G$:

$$\begin{aligned}
q_1 &= p_0 \in Y, \quad t_1 = 1,\\
p_n &= \operatorname*{argmin}\Big\{ \varphi(q_n) + \langle p - q_n, \nabla\varphi(q_n)\rangle_Y + \frac{L}{2}\|p - q_n\|^2 + \|p\|_1 \ :\ p \in Y \Big\},\\
t_{n+1} &= \frac{1 + \sqrt{1 + 4t_n^2}}{2},\\
q_{n+1} &= p_n + \frac{t_n - 1}{t_{n+1}}\,(p_n - p_{n-1}),
\end{aligned}$$

where $L = \frac{1}{2}\left(1 - \cos\left(\frac{2\pi}{n}\right)\right)^{-1}$ is the Lipschitz constant of $\nabla\varphi$. Note that in this algorithm each iteration needs only one computation of the gradient if it is implemented carefully. Nesterov's algorithm, which converges as $O(1/n^2)$, as does that of Beck and Teboulle, and which is again a clever combination of the gradient method and the conjugate gradient method, requires two computations of the gradient, which notably slows down each iteration.
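The scheme above can be sketched in a few lines of Python. The argmin step has a closed form: it is the proximal operator of $\frac{1}{L}\|\cdot\|_1$ applied to $q_n - \frac{1}{L}\nabla\varphi(q_n)$, which for the isotropic norm $\|p\|_1$ is a pixelwise shrinkage of the vector $(p^1_{i,j}, p^2_{i,j})$. The function names `prox_group_l1` and `fista`, the callable `grad_phi` and the array layout `(2, H, W)` are hypothetical choices for this sketch, not the authors' Matlab code.

```python
import numpy as np

def prox_group_l1(p, tau):
    """Proximal operator of tau * sum_ij sqrt(p1_ij^2 + p2_ij^2); p has shape (2, H, W)."""
    norm = np.sqrt(p[0] ** 2 + p[1] ** 2)
    # Shrink each 2-vector towards zero by tau; vectors shorter than tau become zero.
    scale = np.maximum(1.0 - tau / np.maximum(norm, 1e-300), 0.0)
    return p * scale

def fista(grad_phi, L, p0, n_iter=100):
    """Beck-Teboulle iteration for phi(p) + ||p||_1, with L the Lipschitz constant of grad_phi."""
    p_prev = p0.copy()
    q = p0.copy()
    t = 1.0
    for _ in range(n_iter):
        # Closed form of the argmin step: one gradient evaluation per iteration.
        p = prox_group_l1(q - grad_phi(q) / L, 1.0 / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        q = p + ((t - 1.0) / t_next) * (p - p_prev)
        p_prev, t = p, t_next
    return p_prev
```

For the toy choice $\varphi(p) = \frac{1}{2}\|p - b\|^2$ (so `grad_phi = lambda q: q - b` and `L = 1`), the minimizer is `prox_group_l1(b, 1.0)`, which gives a quick sanity check of the loop.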

5 The Continuous Setting

Let us mention in this section some properties of the functional $J$ in the continuous setting. We refer to Jalalzai [7] for proofs and further results. First of all, let us fix some notation for this section. Henceforth, $\Omega$ designates an open set of $\mathbb{R}^n$ with a smooth enough boundary and, to simplify matters, we first place ourselves in the context of functions $u$ whose distributional derivatives are integrable functions that we denote $Du$, i.e. $u$ lies in the Sobolev space $H^1(\Omega)$. The functional we previously introduced is a discretization of

$$J(u) = \inf\left\{ \int_\Omega |\phi| \ :\ \phi \in L^2(\Omega)^n \text{ and } \Pi\phi = Du \right\},$$

where $\Pi$ is the orthogonal projection on gradients as in Section 4. Formally speaking, given a function $\phi \in L^2(\Omega)^n$ we set $\Pi\phi = D\bar v$, where $\bar v$ realizes the minimum

$$\min_{v \in H^1(\Omega)} \|Dv - \phi\|_{L^2(\Omega)^n}.$$

It is actually easy to see that there exists a unique $\psi \in L^2(\Omega)^n$ such that we have the so-called Helmholtz decomposition

$$\phi = \Pi\phi + \psi \quad\text{where}\quad \int_\Omega \nabla v \cdot \psi = 0 \ \text{ for any } v \in C^1(\bar\Omega)$$

(we refer to Dautray-Lions [4] or Temam [14] for more about this topic). Putting things together, we have proved that

$$J(u) = \inf\left\{ \int_\Omega |Du + \psi| \ :\ \psi \in L^2(\Omega)^n \text{ and } \int_\Omega \nabla v \cdot \psi = 0 \ \ \forall v \in C^1(\bar\Omega) \right\}.$$


Nonetheless, this new formulation of $J$ remains meaningful even when $u$ is simply a function of bounded variation in $\Omega$ (denoted $u \in BV(\Omega)$), which means that its distributional derivative $Du$ is this time in $M_b(\Omega)^n$, the space of $\mathbb{R}^n$-valued finite Radon measures on $\Omega$. Henceforth, we also let $\psi$ range in the space $M_b(\Omega)^n$. We refer to Ambrosio, Fusco and Pallara [1] or Giusti [5] for properties of functions of bounded variation and for other measure-theoretic considerations. All this motivates a new definition of $J$ when $u \in BV(\Omega)$, namely:

$$J(u) = \inf\left\{ \int_\Omega |Du + \psi| \ :\ \psi \in M_b(\Omega)^n \text{ and } \int_\Omega \nabla v \cdot d\psi = 0 \ \ \forall v \in C^1(\bar\Omega) \right\}.$$

Note in passing that $J(u)$ is obviously well-defined for any $u \in BV(\Omega)$ since

$$J(u) \le \int_\Omega |Du|. \tag{4}$$

Thanks to a classical convex duality argument, it is possible to show that, under some additional assumptions on $\Omega$, we have

$$J(u) = \sup\left\{ \int_\Omega \nabla w \cdot Du \ :\ w \in C^1(\Omega) \text{ and } \|\nabla w\|_\infty \le 1 \right\}.$$

Using this dual formulation one can prove the following result:

Theorem 1. Let $\Omega$ be an open set in $\mathbb{R}^n$ and let $u = \chi_E$ be the characteristic function of a finite-perimeter set $E \subset \Omega$, or even let $u \in BV(\Omega)$ with a derivative $Du$ concentrated on the jump set. Then

$$J(u) = \int_\Omega |Du|.$$

The proof is mostly based on the fact that rectifiable sets admit approximate tangent hyperplanes. Note that when $Du$ has a diffuse part, inequality (4) may be strict. The theorem justifies the use of the functional $J$ in the image processing context, since it shows that $J$ coincides with $TV$ on such piecewise constant images.

6 Preliminary Numerical Simulations

In this last section, we compare the two approaches based on the functionals $TV$ and $J$. For this purpose, we use the two algorithms presented above. In our implementation, the Beck-Teboulle algorithm performs half as many iterations in the same time, since it needs to compute four Fourier transforms per iteration whereas Nesterov's algorithm needs only two. We think that one can do much better, especially in the case of $J$, since it makes extensive use of Fourier transform methods and is therefore easily parallelizable. Moreover, the functional $J$ seems to avoid the well-known staircasing effect (see the article by Louchet and Moisan [8]) produced by $TV$ minimization. Indeed, the latter yields images with peculiar local configurations, whereas $J$ does not. Throughout these tests, the regularization parameter $\lambda$ is kept equal to 1. For all the simulations, we used a personal computer with a 2 GHz Core2 Duo processor and we let the two Matlab programs run for exactly 20 seconds.

6.1 First Example

We look at the 256 × 256 Lenna image. This photo went through a Gaussian blur of standard deviation σblur = 1.5 followed by additive zero-mean Gaussian noise with standard deviation σnoise = 4. The original image is shown in Fig. 1 and the degraded image in Fig. 2. We then implemented and ran the Beck-Teboulle and Nesterov algorithms to restore the image. The results of these two experiments are shown in Figs. 3 and 4.
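The degradation process described here (Gaussian blur followed by additive Gaussian noise) can be simulated as follows. This is only a sketch: the periodic boundary handling through the FFT matches the periodic operators used earlier in the paper, but the function names `gaussian_blur_periodic` and `degrade`, the sampling of the Gaussian transfer function and the fixed random seed are assumptions of this illustration.

```python
import numpy as np

def gaussian_blur_periodic(img, sigma):
    """Blur with a Gaussian of standard deviation sigma (in pixels), periodic boundaries."""
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]   # frequencies in cycles per pixel
    fx = np.fft.fftfreq(w)[None, :]
    # Fourier transform of a unit-mass Gaussian: exp(-2 pi^2 sigma^2 |f|^2).
    transfer = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * transfer))

def degrade(img, sigma_blur, sigma_noise, seed=0):
    """Apply the blur, then add zero-mean Gaussian noise of standard deviation sigma_noise."""
    rng = np.random.default_rng(seed)
    blurred = gaussian_blur_periodic(img, sigma_blur)
    return blurred + rng.normal(0.0, sigma_noise, size=img.shape)
```

For the first experiment one would call `degrade(image, 1.5, 4.0)` on a 256 × 256 array holding the gray values of the test image.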

Fig. 1. Original Lenna photo

Fig. 2. σblur = 1.5, σnoise = 4

Fig. 3. J-processed Lenna, 600 iterations

Fig. 4. TV-processed Lenna, 1000 iterations

6.2 Second Example

The second example compares the deblurring of a text scan. Fig. 5 shows the original image. In Fig. 6 the 256 × 256 poem image underwent the same degradation process with parameters σblur = 1 and σnoise = 4.

Fig. 5. The first verse of a famous poem by Verlaine

Fig. 6. σblur = 1, σnoise = 4

Fig. 7. J-processed poem, 600 iterations

Fig. 8. TV-processed poem, 1200 iterations

References

1. Ambrosio, L., Fusco, N., Pallara, D.: Functions of Bounded Variation and Free Discontinuity Problems. Oxford University Press, Oxford (2000)
2. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences (accepted) (2008)
3. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. SIAM Journal on Multiscale Modeling and Simulation 4(4), 1168–1200 (2005)
4. Dautray, R., Lions, J.-L.: Mathematical Analysis VI and Numerical Methods for Science and Technology. Evolution Problems II, vol. 6. Springer, Heidelberg (1993)
5. Giusti, E.: Minimal Surfaces and Functions of Bounded Variation. Birkhäuser, Basel (1984)
6. Goldstein, T., Osher, S.: The Split Bregman Method for L1 Regularized Problems. UCLA CAAM Report 08-29 (2008)
7. Jalalzai, K.: Étude des propriétés d’une variante de la variation totale. Master thesis (2008)
8. Louchet, C., Moisan, L.: Total variation denoising using posterior expectation (2008), http://hal.archives-ouvertes.fr
9. Moreau, J.-J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. C. R. Acad. Sci. Paris Sér. A Math. 255, 2897–2899 (1962)
10. Nesterov, Y.: Introductory lectures on convex optimization. Kluwer Academic Publishers, Dordrecht (2004)
11. Nesterov, Y.: Smooth minimization of non-smooth functions. Mathematical Programming (A), pp. 127–152 (2005)
12. Nesterov, Y.: Gradient methods for minimizing composite objective function. CORE Report (2007)
13. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
14. Temam, R.: Navier-Stokes Equations Theory and Numerical Analysis. AMS Bookstore (2001)
15. Tai, X.-C., Wu, C.: Augmented Lagrangian Method, Dual Methods and Split Bregman Iteration for ROF Model. UCLA CAAM Report 09-05 (2009)

Coarse-to-Fine Image Reconstruction Based on Weighted Differential Features and Background Gauge Fields

Bart Janssen, Remco Duits, and Luc Florack

Eindhoven University of Technology, Dept. of Biomedical Engineering & Dept. of Mathematics and Computer Science
{B.J.Janssen,R.Duits,L.M.J.Florack}@tue.nl

Abstract. We propose an iterative approximate reconstruction method where we minimize the difference between reconstructions from subsets of multi-scale measurements. To this end we interpret images not as scalar-valued functions but as sections through a fibered space. Information from previous reconstructions, which can be obtained at a coarser scale than the current one, is propagated by means of covariant derivatives on a vector bundle. The gauge field that is used to define the covariant derivatives is defined by the previously reconstructed image. An advantage of using covariant derivatives in the variational formulation of the reconstruction method is that with the number of iterations the accuracy of the approximation increases. The presented reconstruction method allows for a reconstruction at a resolution of choice, which can also be used to speed up the approximation at a finer level. An application of our method to reconstruction from a sparse set of differential features of a scale space representation of an image allows for a weighting of the features based on the sensitivity of those features to noise. To demonstrate the method we apply it to the reconstruction from singular points of a scale space representation of an image.

1 Introduction

Reconstruction from signal samples is a long-standing problem in signal and image analysis [20]. We present a method for the approximation of a signal or image from its generalized samples, i.e. the samples are given on a non-equidistant grid and were obtained by means of spatially varying filters. Variational reconstruction of non-equidistant image samples has recently become of interest to the image compression community [9], where significant gains in reconstruction quality have been obtained by introducing anisotropic non-linear regularization strategies. In the scale space community there has been a general interest in reconstruction from generalized samples for quite some time [19, 18, 14, 12, 13]. We propose a method that produces an image that approximately satisfies all features. Features that are more robust to perturbations of the source image are given a higher weight, which steers the reconstruction method such that those features are better approximated than those that are more sensitive to noise. This leads to a more robust method compared to interpolating methods. A gauge field is introduced by means of covariant derivatives on a vector bundle. This way a model of the image to be reconstructed can be incorporated in the energy functional which is minimized to find a suitable reconstruction. Using this gauge field we can construct a coarse-to-fine image reconstruction method. A coarse-to-fine approach naturally leads to a more efficient algorithm in terms of memory consumption and computational cost.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 377–388, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 Image Reconstruction

In the reconstruction problem we aim for a reconstruction from a set of linear functionals on an image. These functionals represent measurements on the image and are henceforth called features. More rigorously: a feature $d_i \in \mathbb{R}$ of an image $f \in L_2(\mathbb{R}^2)$ measured with a filter $\psi_i \in L_2(\mathbb{R}^2)$ is given by $d_i = (\psi_i, f)_{L_2}$, $i = 1 \ldots P$, in which $(\cdot,\cdot)_{L_2}$ denotes the $L_2$-inner product. In general the set of features does not describe the input image $f$ unambiguously (the filters do not constitute a frame), and there is need for a model to which the reconstruction should adhere. When such a model can be described by a (semi-)norm the reconstruction can be obtained directly by means of an orthogonal projection onto the features [14]. Nielsen and Lillholm [19, 18] proposed to find a reconstruction of an image from its features using a nonlinear regularization term (model). Their so-called observation-constrained evolution ensures that the features are interpolated. When measurements are contaminated by noise, approximation is often favored over interpolation. In the following we will not discuss interpolation but approximation of a set of $P$ features $\{d_i\}_{i=1}^P$ that were obtained by means of the filters $\{\psi_i\}_{i=1}^P$.

3 Approximation

Instead of searching for a signal that interpolates the given features one can try to find a signal that approximates the features. In the case of noisy measurements the latter approach is often preferred. We now aim for the function $g \in H^1(\mathbb{R}^2)$ that minimizes

$$E(g) = \underbrace{\sum_{i=1}^P \left((g,\psi_i)_{L_2} - d_i\right)^2}_{\text{data term}} + \underbrace{\frac{\lambda}{2}\int_{\mathbb{R}^2} \|\nabla g\|^2\, dV}_{\text{regularization term}}, \tag{1}$$

where $\lambda \in \mathbb{R}^+$ is a parameter that controls the quality of the approximation. As $\lambda$ tends to 0 the approximation approaches the interpolation of the features. The minimizer of this functional can be found by finding the unique $g$ that solves the following Euler equation:

$$\sum_{i=1}^P \psi_i\left((\psi_i, g)_{L_2} - d_i\right) - \lambda\Delta g = 0. \tag{2}$$


The parameter $\lambda$ takes into account each feature with the same weight. This is not desirable when the features are not normalized, and even after normalization one can improve on the selection of the weights. We allow for these improvements by introducing $P$ extra parameters (which we will call feature weights), $\alpha_i \in \mathbb{R}^+$, $i = 1 \ldots P$, that will be set to a fixed value based on the properties of the features. In the case of reconstruction from differential features of a scale space representation of an image, which is the main motivation for our method, we can select the newly introduced parameters based on the noise propagation in the scale space representation of an image. The global parameter $\lambda$ can be absorbed by these parameters but will be maintained in our formulation for the sake of clarity. For fixed $\alpha_i$ we now search for the $g$ that satisfies

$$\arg\min_{g\in L_2(\mathbb{R}^2)} E(g) = \arg\min_{g\in L_2(\mathbb{R}^2)} \sum_{i=1}^P \alpha_i\left((g,\psi_i)_{L_2} - d_i\right)^2 + \frac{\lambda}{2}\int_{\mathbb{R}^2}\|\nabla g\|^2\, dV. \tag{3}$$

In the next section we will discuss how the feature weights can be selected.

3.1 Noise Propagation

In order to be able to select sensible values for the $\alpha_i$ parameters that appear in eq. (3), we need to make some assumptions on the noise and the set of filters $\{\psi_i\}_{i=1}^P$ that are used to extract the measurements. With regard to the noise we assume additive zero-mean white Gaussian noise which has a correlation distance of $\tau$ pixels. In recent work about the stability of toppoints [2] (which are singular points of a Gaussian scale space representation of an image) this was found to be a sensible assumption. In our application we will reconstruct from differential structure taken from the Gaussian scale space representation of the input image $f$; therefore we assume that the set of filters $\{\psi_i\}_{i=1}^P$ consists of Gaussian kernels or derivatives thereof. The idea is now to construct the weights $\alpha_i$ according to the sensitivity of their associated differential features to noise. In order to estimate the sensitivity of a feature $d_i$ of the image $f$ that is contaminated by additive noise we can adopt work on noise propagation in scale space by Blom [4]. He proposes to compute at a certain scale $t > 0$ the momenta $M^2_{m_x,m_y,n_x,n_y} = \langle N_{m_x,m_y}, N_{n_x,n_y}\rangle$ of derivatives of orders $m_x$, $m_y$, $n_x$, and $n_y$ of the fiducial noise function $N$. He assumes only the covariance matrix $\langle N^2\rangle$ of the noise to be given. In case the correlation distance $\tau$ is much smaller than the scale $t$,

$$M^2_{m_x,m_y,n_x,n_y} \simeq \langle N^2\rangle\, \frac{\tau}{2t}\left(\frac{-1}{4t}\right)^{\frac{1}{2}(m_x+m_y+n_x+n_y)} Q_{m_x+n_x}\,Q_{m_y+n_y}, \tag{4}$$

with $Q_n = (n+1)!!$ for $n$ even and $Q_n = 0$ otherwise. Features that are sensitive to perturbations of the source image $f$ should influence the final result less than features that are relatively insensitive to these perturbations. Therefore we compute $\alpha_i$ from eq. (4) such that $\alpha_i \propto M^{-2}_{n^i_x,n^i_y,n^i_x,n^i_y}$ at scale $t^i$. The parameters $n^i_x$, $n^i_y$, and $t^i$ are the derivative order in the x direction, the derivative order in the y direction and the scale of the $i$th filter $\psi_i$. Here we stress that these estimations are based on the assumption that the filters are partial derivatives of a Gaussian. We furthermore ensure that $\sum_{i=1}^P \alpha_i = 1$, which essentially makes $\alpha_i$ independent of the value of $\langle N^2\rangle$ and $\tau$.
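The normalization step can be illustrated with a short Python sketch. It assumes that the noise momenta $M^2$ of the individual features have already been evaluated (for instance from eq. (4)) and are passed in as plain positive numbers; the function name `feature_weights` and this input convention are hypothetical.

```python
import numpy as np

def feature_weights(noise_second_moments):
    """Weights alpha_i proportional to 1 / M_i^2, normalized so that they sum to 1."""
    m2 = np.asarray(noise_second_moments, dtype=float)
    if np.any(m2 <= 0):
        raise ValueError("noise momenta must be strictly positive")
    w = 1.0 / m2            # noise-sensitive features (large M^2) get small weights
    return w / w.sum()      # normalization removes any common factor of the M_i^2
```

Because of the normalization, multiplying all momenta by the same constant (as a change of the overall noise level $\langle N^2\rangle$ would do) leaves the weights unchanged: `feature_weights([1, 2, 4])` and `feature_weights([10, 20, 40])` give the same result.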

3.2 Discretization

We can try to solve for an approximation to $g$ by discretizing eq. (2) (augmented with the feature weights), or discretize the energy functional in eq. (3) and thereafter find a discrete minimizer of the discretized energy. These two approaches can be equivalent for a suitable choice of the so-called test functions that are involved in the former method. We proceed by directly discretizing the energy. To solve for $g$ in eq. (3) we approximate $g$ by a β-spline of order $n$:

$$\beta^n(x) = \mathcal{F}^{-1}\left[\omega \mapsto \frac{\left(e^{i\omega/2} - e^{-i\omega/2}\right)^{n+1}}{(i\omega)^{n+1}}\right](x), \tag{5}$$

where $\mathcal{F}^{-1}$ denotes inverse Fourier transformation. Equality (5) is equivalent to the $(n+1)$-fold convolution of the β-spline of order 0,

$$\beta^0(x) = \begin{cases} 1 & -\frac{1}{2} < x < \frac{1}{2}\\ \frac{1}{2} & |x| = \frac{1}{2}\\ 0 & \text{otherwise}\end{cases}; \tag{6}$$

a rectangle. Further details concerning β-splines can be found in e.g. [22]. It was shown in the context of optic flow [17] and registration [21] that such an approach has computational advantages over a finite difference approach. Arigovindan et al. [1] obtained good results when applying this approach to (a multigrid scheme for) image and vector field interpolation. Moreover it allows for a coarse-to-fine implementation in an elegant way because of the 2-scale relation

$$\beta^n\left(\frac{x}{2^j}\right) = 2^{-n}\sum_{k\in\mathbb{Z}}\binom{n+1}{k}\,\beta^n\left(\frac{x}{2^{j-1}} - k\right). \tag{7}$$
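To make the β-spline and its 2-scale relation concrete, here is a small Python sketch that evaluates the central β-spline through the standard closed-form sum of truncated powers and checks a two-scale identity numerically. Two points are assumptions of this sketch rather than statements from the text: the closed-form evaluation (instead of the Fourier definition in eq. (5)), and the explicit shift $(n+1)/2$ in `two_scale_rhs`, which is one way to index the sum for a spline centered at the origin. The names `bspline` and `two_scale_rhs` are hypothetical.

```python
import numpy as np
from math import comb, factorial

def bspline(x, n):
    """Central beta-spline of order n, evaluated elementwise at x."""
    x = np.asarray(x, dtype=float)
    if n == 0:
        # Rectangle of eq. (6), with the value 1/2 at the two jump points.
        return np.where(np.abs(x) < 0.5, 1.0,
                        np.where(np.abs(x) == 0.5, 0.5, 0.0))
    total = np.zeros_like(x)
    for k in range(n + 2):
        # Truncated power (x - k + (n + 1) / 2)_+^n.
        shifted = np.maximum(x - k + (n + 1) / 2.0, 0.0)
        total = total + (-1) ** k * comb(n + 1, k) * shifted ** n
    return total / factorial(n)

def two_scale_rhs(x, n):
    """2^{-n} * sum_k binom(n+1, k) * beta^n(x - k + (n+1)/2), i.e. beta^n(x / 2)."""
    x = np.asarray(x, dtype=float)
    return 2.0 ** (-n) * sum(
        comb(n + 1, k) * bspline(x - k + (n + 1) / 2.0, n) for k in range(n + 2)
    )
```

For the cubic case, `bspline(0.0, 3)` evaluates to 2/3 and `bspline(1.0, 3)` to 1/6, and `two_scale_rhs(x, 3)` agrees with `bspline(x / 2, 3)`; this refinement property is what allows spline coefficients at one resolution to be transferred to the next finer one.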

The $n$-th order β-spline approximation of $g$ in two spatial dimensions at resolution $a > 0$ is given by

$$\tilde g_a(x, y) = \sum_{l=0}^{M-1}\sum_{k=0}^{N-1} c_{k,l}\,\beta^n\left(\frac{x}{a} - k\right)\beta^n\left(\frac{y}{a} - l\right), \tag{8}$$

with $c_{k,l}, x, y \in \mathbb{R}$, $\beta^n(\cdot)$ the central β-spline of order $n \in \mathbb{N}$, resolution parameter $a$, and $N, M \in \mathbb{N}$ corresponding to the width and height of the image in pixels. Notice that this is a representation of the image in the continuous domain and that $\tilde g_a \in C^n(\mathbb{R}^2)$, i.e. $n$-times continuously differentiable.


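The regularization term in eq. (3), $\int_{\mathbb{R}^2} \|\nabla g\|^2\, dxdy$, can be approximated with the help of eq. (8) by

$$\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \|\nabla g_a(x, y)\|^2\, dxdy = \sum_{i=0}^{1}\sum_{l,n=0}^{M-1}\sum_{k,m=0}^{N-1} c_{k,l}\,c_{m,n} \int_{-\infty}^{\infty} \frac{\partial\beta^n}{\partial x_i}\left(\frac{x_i}{a} - k\right)\frac{\partial\beta^n}{\partial x_i}\left(\frac{x_i}{a} - m\right) dx_i \int_{-\infty}^{\infty} \beta^n\left(\frac{x_{1-i}}{a} - l\right)\beta^n\left(\frac{x_{1-i}}{a} - n\right) dx_{1-i}, \tag{9}$$

where $(x_0, x_1)$ correspond to $(x, y)$ in eq. (8). When we consider the integrals in the previous equation we notice that they can be expressed by a convolution:

$$\int_{-\infty}^{\infty} \frac{\partial\beta^n}{\partial x}\left(\frac{x}{a} - k\right)\frac{\partial\beta^n}{\partial x}\left(\frac{x}{a} - m\right) dx = -a\int_{-\infty}^{\infty} \frac{\partial\beta^n}{\partial u}(u)\,\frac{\partial\beta^n}{\partial u}\big((m - k) - u\big)\, du. \tag{10}$$

This is easily verified by substitution of the integration variable ($u = \frac{x}{a} - k$) and noting that $\beta^n(x) = \beta^n(-x)$ for all $x \in \mathbb{R}$. We furthermore note that a derivative of a central β-spline of degree $n$ is again a linear combination of β-splines, at the expense of lowering its degree to $(n - 1)$:

$$\frac{\partial}{\partial x}\beta^n(x) = \beta^{n-1}(x + 1/2) - \beta^{n-1}(x - 1/2). \tag{11}$$

As a result we can write eq. (9) in matrix-vector notation as

$$\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \|\nabla g_a(x, y)\|^2\, dxdy = c^T R\, c, \tag{12}$$

with $c = \{c_i\}_{i=0}^{(M-1)(N-1)}$ and

$$R = \left[a\beta^{2n}(n - l)\right]_{n,l=0}^{M-1} \otimes \left[-a\frac{\partial\beta^{2n}}{\partial x}(m - k)\right]_{m,k=0}^{N-1} + \left[-a\frac{\partial\beta^{2n}}{\partial y}(n - l)\right]_{n,l=0}^{M-1} \otimes \left[a\beta^{2n}(m - k)\right]_{m,k=0}^{N-1}. \tag{13}$$

We will express the inner product in the data term in equation (3) in terms of β-splines as well. This leads to an expression similar to eq. (10),

$$(g_a, \psi_i)_{L_2(\mathbb{R}^2)} = (-1)^{n_i+m_i} \sum_{k,l=0}^{N-1,M-1} c_{k,l}\,(\beta^n * \psi_i)(k - x_i, l - y_i), \tag{14}$$

where $(x_i, y_i)$ and $(n_i, m_i)$ are the location and differential order of the $i$th filter $\psi_i$ respectively. In contrast to the discretization of the regularization term we will not derive a closed-form expression for this convolution; instead we approximate the β-spline in eq. (14) by a Gaussian, using the observation in [23] that

$$\beta^n(x) \cong \sqrt{\frac{6}{\pi(n+1)}}\; e^{-\frac{6x^2}{n+1}}. \tag{15}$$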


The data term can be expressed in matrix-vector notation by

$$E_{\text{data}}(c) = \|Sc - d\|^2, \tag{16}$$

where $S = \{(\beta^n * \psi_i)(k - x_i, l - y_i)\}_{k,l=0,\,i=1}^{(N-1)(M-1),\,P}$ and $d = \{d_i\}_{i=1}^P$. Now we can write the minimizer of equation (3) in matrix-vector notation as

$$\left(S^T S - \lambda R\right) c = S^T d. \tag{17}$$

This linear system of equations can be solved using a conjugate gradient (CG) method [3]. In case the matrix $S$ is sparse it is beneficial to apply a multigrid method [5]. Mainly due to the non-sparseness of $S$, the conjugate gradient method is preferred. Notice that, in this specific case, $R$ can be expressed as a convolution. For large images it is infeasible to explicitly compute $S^T S$; therefore we compute the matrix-vector product $\hat c = S^T S c$ that appears in a conjugate gradient iteration by first evaluating $\tilde c = Sc$ and thereafter evaluating $\hat c = S^T \tilde c$.
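The matrix-free evaluation described above can be sketched as follows. The conjugate gradient loop is the textbook method for symmetric positive definite systems, written out by hand so that the example depends only on NumPy. As an assumption of this sketch, the regularization matrix is passed in as a positive semi-definite matrix `R_psd` and is added to $S^T S$; how this relates to the sign in eq. (17) depends on the sign convention chosen for $R$. The names `make_normal_operator` and `conjugate_gradient` are hypothetical.

```python
import numpy as np

def make_normal_operator(S, R_psd, lam):
    """Return c -> S^T (S c) + lam * R_psd c without ever forming S^T S."""
    def apply_A(c):
        c_tilde = S @ c                  # first evaluate S c ...
        return S.T @ c_tilde + lam * (R_psd @ c)   # ... then apply S^T
    return apply_A

def conjugate_gradient(apply_A, b, n_iter=200, tol=1e-10):
    """Solve A x = b for a symmetric positive definite operator given as a function."""
    x = np.zeros_like(b)
    r = b - apply_A(x)
    d = r.copy()
    rs = r @ r
    for _ in range(n_iter):
        if np.sqrt(rs) < tol:            # residual small enough: stop
            break
        Ad = apply_A(d)
        alpha = rs / (d @ Ad)
        x += alpha * d
        r -= alpha * Ad
        rs_new = r @ r
        d = r + (rs_new / rs) * d        # next search direction
        rs = rs_new
    return x
```

A call such as `conjugate_gradient(make_normal_operator(S, R_psd, lam), S.T @ d)` then returns the coefficient vector; each iteration costs one product with $S$ and one with $S^T$.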

4 Adaptation to a Gauge Field

In the previous sections we used a very simple model as a regularization term. For several applications it would be beneficial if we were able to introduce a more sophisticated model of the image we want to reconstruct. Feature-based image editing [16] and optic flow estimation [13, 8] are applications that could benefit greatly from such a refinement. Recently an image in-painting method was introduced that achieves a model refinement by means of covariant derivatives on a vector bundle that are guided by a user-selectable gauge field [10]. We will adopt a similar approach. The basic idea is to replace the gradient that appears in the regularization term of eq. (3) by a covariant derivative $D^{A^h}$ that is biased by a gauge field $h \in H^2(\mathbb{R}^2)$. To this covariant derivative the gauge field $h$ should be “invisible”, i.e. $D^{A^h} h = 0$. If we were able to set $h$ equal to the original image $f$, the approximation would exactly produce $f$ again. To this end we interpret $f$ not as a scalar function but as a section through a fibered space $E = \mathbb{R}^2 \times \mathbb{R}^+$. Heuristically this means that we rescale intensity by a spatially varying factor, the unit section $\sigma$. Thus we consider $f\sigma$ instead of $f$ to model intensity values in the image (the latter is a special case in which $\sigma(x) = (x, 1)$ $\forall x \in \mathbb{R}^2$). This implies that when we consider derivatives, we need to account for the spatial variability of $\sigma$. In the next subsection we introduce to this end a connection on a vector bundle. There, we also make the heuristic description of our approach presented here a bit more rigorous. For the reader who is not familiar with the concept of vector bundles it could be useful to look at Fig. 1 before reading the next subsection, since it aids in developing the right geometrical interpretation of the presented material.

4.1 Connections on Vector Bundles

Consider a vector bundle $(E, \pi, M)$, with total space $E = \mathbb{R}^2 \times \mathbb{R}^+$, base space $M = \mathbb{R}^2$, and projection $\pi : E \to M$. The map $\pi$ projects a point in $E$ (a point in $M$ augmented with an intensity $L \in \mathbb{R}^+$) to $M$ in the following manner:

$$\pi(x, y, L) = (x, y). \tag{18}$$

$L$ amounts to a certain physical quantity such as luminous intensity, which is expressed in candela (cd). Next we define a section $s : M \to E$ such that $\pi \circ s = \mathrm{id}_M$, where $\mathrm{id}_M$ denotes the identity map on $M$. We define the association of a section $\sigma_f$ with a unique image $f \in L_2(\mathbb{R}^2)$ as

$$f \leftrightarrow \sigma_f \ \Leftrightarrow\ \forall_{(x,y)\in\mathbb{R}^2}\ \ \sigma_f(x, y) = (x, y, f(x, y)). \tag{19}$$

The multiplication of such a section $\sigma_f$ by an image $g$ is given by

$$g\,\sigma_f = \sigma_{fg}. \tag{20}$$

Let $\tilde\sigma$ denote the unit section $\tilde\sigma(x, y) = (x, y, L_0)$, with $L_0$ a fixed luminous intensity unit (e.g. 1 cd). We want to define a connection $D$ over the space of sections $\Gamma(E)$ on $E$. Let $\mathcal{L}(\Gamma(TM), \Gamma(E))$ denote the space of linear operators that map a section of a tangent bundle on $M$ to a section of a vector bundle. Here we stress that a section of a tangent bundle, $V \in \Gamma(TM)$, is just a vector field on $M$. A map

$$D : \Gamma(E) \to \mathcal{L}(\Gamma(TM), \Gamma(E)) \tag{21}$$

is a connection on a vector bundle iff it possesses the following properties, cf. [15], p. 106. In the following we use the standard notation $D_V\sigma = (D\sigma)(V)$.

1. $D$ is tensorial in $V$:

$$D_{V+W}\,\sigma = D_V\sigma + D_W\sigma \quad\text{for } V, W \in \Gamma(TM),\ \sigma \in \Gamma(E), \tag{22}$$

$$D_{fV}\,\sigma = f D_V\sigma \quad\text{for } f \in C^\infty(M, \mathbb{R}),\ V \in \Gamma(TM). \tag{23}$$

2. $D$ is $\mathbb{R}$-linear in $\sigma$:

$$D_V(\sigma + \tau) = D_V\sigma + D_V\tau \quad\text{for } V \in \Gamma(TM),\ \sigma, \tau \in \Gamma(E), \tag{24}$$

and it satisfies the Leibniz product rule:

$$D_V(f\sigma) = V(f)\,\sigma + f D_V\sigma \quad\text{for } f \in C^\infty(M, \mathbb{R}). \tag{25}$$

Suppose we have a connection $D$ on a vector bundle. Then it must satisfy the four properties (eqs. (22) to (25)) mentioned above. Therefore we must have the following identity:

$$D\sigma(X)(c(t)) = D(z\tilde\sigma)(X)(c(t)) = X|_{c(t)}(z)\,\tilde\sigma + \sum_{i=1}^{2} z(c(t))\,\dot c^i(t)\, D_{\partial_{x^i}}\tilde\sigma \tag{26}$$

for all sections $\sigma = z\tilde\sigma$ and vector fields $X = \sum_{i=1}^{2} \dot c^i\,\partial_{x^i}$. Here $c : (0, 1) \to M$ is a smooth curve on $M$, $\dot c^i(t) = \langle dx^i, \dot c(t)\rangle$, $i = 1, 2$, with $\dot c(t) = \frac{d}{dt}c(t)$, and $z \in C^\infty(M, \mathbb{R})$ an arbitrary image. By $\{dx^i\}_{i=1}^{2} = \{dx, dy\}$ we denote the dual frame in the cotangent bundle $T^*M$.


For each $i = 1, 2$, $D_{\partial_{x^i}}\tilde\sigma$ should be a section on the vector bundle. Such a section can be identified with a function

$$A_i : M \to \mathbb{R} \tag{27}$$

by eq. (19), i.e.

$$D_{\partial_{x^i}}\tilde\sigma = \sigma_{A_i} = A_i\,\tilde\sigma. \tag{28}$$

Substituting eq. (28) into eq. (26) yields

$$D\sigma(X)(c(t)) = \sum_{i=1}^{2}\left(\dot c^i(t)\,\partial_{x^i}(z)(c(t)) + z(c(t))\,\dot c^i(t)\,A_i(c(t))\right)\tilde\sigma. \tag{29}$$

So each connection is parameterized by the co-vector field $A = \sum_{i=1}^{2} A_i\, dx^i$. At this point we still have a degree of freedom, namely we can still select a specific co-vector field. In our application we want a certain image $h$ to be “invisible”, so for a fixed $h$ we select $A = A^h$ such that

$$D^{A^h}(\sigma_h) = 0, \tag{30}$$

i.e. $D^{A^h}_{\dot c}(\sigma_h) = 0$ for all curves $c$, holds for a specific image $h$. Here we made the dependence of $D$ on $A^h$ explicit in the superscript notation (in the previous equations we left it out in order to facilitate readability). Given the requirement of eq. (30) we explicitly calculate

$$\sum_{i=1}^{2}\left(\dot c^i(t)\,(\partial_{x^i} h)(c(t)) + h(c(t))\,\dot c^i(t)\,A_i(c(t))\right)\tilde\sigma = 0\,\tilde\sigma \quad\text{for all curves } c : (0, 1) \to M$$

$$\Leftrightarrow\ (\nabla h)(c(t)) + h(c(t))\,A(c(t)) = 0 \tag{31}$$

$$\Leftrightarrow\ A^h(c(t)) = -\sum_{i=1}^{2} \partial_{x^i}\log h(c(t))\; dx^i \quad \forall h > 0. \tag{32}$$

This gives us an expression for $A^h$ (eq. (32)) provided $h$ is strictly positive. This is a limitation of our method. However, for a system that observes physical quantities this is a realistic assumption. From the previous derivations we conclude that applying a covariant derivative that is gauged by an image $h$ to an image $f$ amounts to

$$\left(D^{A^h}(\sigma_f)\right)(\dot c) = \left(\dot c(f) + \sum_{i=1}^{2} A_i\,\dot c^i f\right)\tilde\sigma = \left(\dot c(f) - \sum_{i=1}^{2}\left((\partial_{x^i}\log h)\,\dot c^i f\right)\right)\tilde\sigma = \left(\dot c(f) - \frac{f}{h}\,\dot c(h)\right)\tilde\sigma, \tag{33}$$

where we used the following short notation:

$$\dot c(f) = \sum_{i=1}^{2} \dot c^i\,\partial_{x^i}(f) = (\dot c(t)\cdot\nabla f)(c(t)) = \frac{d}{dt} f(c(t)). \tag{34}$$

Fig. 1. A visualization of the calculation of a covariant derivative as described in eq. (33). The base space $M$ corresponds to $\mathbb{R}^2$ and the total space $E$ corresponds to $\mathbb{R}^2 \times \mathbb{R}^+$. We refer to the text right after eq. (33) for an explanation of this figure.

Note that eq. (33) can be rewritten as

$$\forall_{c:(0,1)\to M} :\ D^{A^h}\sigma_f(c(t))(\dot c(t)) = \tilde\sigma\,(df + f A^h)(\dot c(t)), \tag{35}$$

i.e. $D^{A^h}(f\tilde\sigma) = \tilde\sigma\,(df + f A^h)$. When we identify $\sigma_f = f\tilde\sigma \leftrightarrow f$ this simplifies to

$$D^{A^h} f = (d + A^h) f, \tag{36}$$

in which $A^h f$ is a multiplication. The calculation of a covariant derivative as described in eq. (33) allows for a geometrical interpretation. A visualization thereof, which is depicted in Fig. 1, is described next. We note that this is a specially crafted example, since there is only structure present in one single direction. Therefore we only have to construct a visualization for the calculation of a covariant derivative in the direction that is labelled by $x$ in the figure. The derivative in the direction that is labelled by $y$ simply vanishes. On the base space $M$ a curve $c : (0, 1) \to M$ is drawn. We want to calculate the covariant derivative of the section $\sigma_f$ at the point that corresponds to $c(t)$ on the base space. The covariant derivative is gauged by the gauge field $h$. Therefore another section, $\sigma_h$, is depicted in the figure. The gradient of $\sigma_h$ at the point $\sigma_h(c(t))$ in the total space $E$ is depicted by a line, labelled A, through $\sigma_h(c(t))$ and $\sigma_h(c(t + \epsilon))$. The line labelled D visualizes in a similar manner the gradient of $\sigma_f$ at $\sigma_h(c(t))$. On the left side it is shown how the gradient of A is attenuated by the fraction of the values of $\sigma_f(c(t))$ and $\sigma_h(c(t))$. The value of this attenuated directional derivative is added to the directional derivative of $\sigma_f$ at $\sigma_f(c(t))$ in the upper left of the figure to finally produce the result of eq. (33). To clarify the attenuation process we added Fig. 2, where the relevant lines are labelled the same as their corresponding lines in Fig. 1.

Fig. 2. Visualization of the amplification of $\dot c(h)$ by $\frac{f(c(t))}{h(c(t))}$. This image clarifies the congruence relations that are used in Fig. 1.

In essence the energy functional for which we search a minimizer stays the same as the one for which a minimizer is sought in eq. (3). We merely change the notion of a gradient, which is adapted to a gauge field $h$; the resulting energy functional now reads

$$E(g) = \sum_{i=1}^{P} \alpha_i\left((g, \psi_i)_{L_2} - d_i\right)^2 + \frac{\lambda}{2}\int_{\mathbb{R}^2} \|D^{A^h} g\|^2\, dV, \tag{37}$$

where $D^{A^h}$ is the covariant derivative, or equivalently a linear connection, acting on an image as in eq. (36).
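A discrete counterpart of the gauged gradient in eq. (36), in the form $\nabla f - \frac{f}{h}\nabla h$ that follows from eq. (33), can be sketched as below. The central differences with periodic wrap-around, the unit pixel area and the function names `gradient`, `covariant_gradient` and `gauged_regularizer` are assumptions made for this illustration; the paper itself discretizes with β-splines.

```python
import numpy as np

def gradient(f):
    """Central differences with periodic wrap-around; returns shape (2, H, W)."""
    fx = 0.5 * (np.roll(f, -1, axis=1) - np.roll(f, 1, axis=1))
    fy = 0.5 * (np.roll(f, -1, axis=0) - np.roll(f, 1, axis=0))
    return np.stack([fx, fy])

def covariant_gradient(f, h):
    """Gauged gradient grad(f) - (f / h) * grad(h) for a strictly positive gauge field h."""
    if np.any(h <= 0):
        raise ValueError("the gauge field h must be strictly positive")
    return gradient(f) - (f / h) * gradient(h)

def gauged_regularizer(g, h):
    """Discrete version of the integral of ||D^{A^h} g||^2 (pixel area taken as 1)."""
    d = covariant_gradient(g, h)
    return float(np.sum(d[0] ** 2 + d[1] ** 2))
```

Two properties from the text can be checked directly: the gauge field is invisible, so `covariant_gradient(h, h)` vanishes (as does the derivative of any constant multiple of `h`), and a constant gauge field reduces the operator to the ordinary gradient used in eq. (3).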

5 Multi-scale Approximate Reconstruction from Singular Points

We apply the gauged reconstruction of eq. (37) to the reconstruction from singular points of a Gaussian scale space representation $u_f$ of an image $f$, with $u_f$ the unique solution to $\frac{\partial u_f}{\partial s} = \Delta u_f$ with initial condition $u_f(\cdot, 0) = f$. Singular points $(x, y, s) \in \mathbb{R}^2 \times \mathbb{R}^+$ of $u_f$ are those points satisfying

$$\begin{cases} \nabla u_f(x, y, s) = 0\\ \det \nabla\nabla^T u_f(x, y, s) = 0 \end{cases}. \tag{38}$$

For more information about catastrophe theory in general, its application in scale space theory and the calculation of the locations of singular points we refer to [11, 6, 7]. A filter $\psi_i$ corresponding to a derivative at a certain position in the scale space of an image is given by

$$\psi_i(x, y) = (2s_i)^{\frac{n_i+m_i}{2}}\, \frac{\partial^{\,n_i+m_i}}{\partial x^{n_i}\,\partial y^{m_i}}\, \frac{1}{4\pi s_i}\, e^{-\frac{(x-x_i)^2 + (y-y_i)^2}{4s_i}}. \tag{39}$$

Here we used multi-index notation $i = (x_i, y_i, m_i, n_i, s_i)$. A singular point is encoded by storing the second order derivative jet for each singular point location. The discretization proposed in Section 3.2 allows for a reconstruction at a certain resolution $a > 0$. We select scales $\{2^j\}_{j=0}^{J}$. First we find all features which can be approximated well at the coarsest resolution $J$, i.e. those features for which $\|\psi_i - P_{V_a}\psi_i\| < \epsilon$, where $P_{V_a}$ denotes the $L_2$-projection onto the set $V_a = \{\beta^n(\frac{x}{a} - k)\,\beta^n(\frac{y}{a} - l)\ |\ k, l \in \mathbb{Z}\}$ and $\epsilon > 0$ is a small constant. Next we compute a reconstruction at resolution $J$ using a constant gauge field $h$. Then, for each scale $j = J - 1 \ldots 0$ we find the gauge field by application of the two-scale relation (see eq. (7)) to the reconstructed image at scale $j + 1$. To reduce memory consumption and gain computational efficiency all features that were used in a coarser scale reconstruction are left out, such that those features are only implicitly encoded (via the gauge field) in the reconstruction algorithm. See the caption of Fig. 3 for a description of the experiments we conducted. Comparing the fourth and fifth image shows that features which are not directly encoded are passed on by the gauge field (lower resolution images). In fact the difference between the two reconstructions is quite striking. We furthermore note that the memory requirements and the computational complexity of the algorithms that produce these two images are equivalent. When the features of all 1070 singular points are used directly (rightmost image in Fig. 3) the visual quality is more appealing. The memory requirements are however much larger. We also mention that the method of feature selection for the next level is quite crude and can be improved by incorporating e.g. a feedback loop. These are possibilities for future exploration which are allowed by the presented framework.

Fig. 3. From left to right: (1) the source image “trui”, (2) reconstruction at resolution 65 × 65 pixels from 84 feature points, (3) reconstruction from 226 feature points at 129 × 129 pixels and gauged by the image on its left, (4) reconstruction from 727 feature points at 257 × 257 pixels and gauged by the image on its left, (5) same reconstruction as the image on the left but not gauged, and (6) reconstruction from all 1070 feature points, no gauge field. The features are up to second order differential structure obtained from the scale space representation of the source image at its singular point positions.
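The Gaussian scale space $u_f$ from which the singular points of this section are extracted can be sketched numerically as follows. The code evaluates the solution of $\partial u_f/\partial s = \Delta u_f$ at scale $s$ by multiplying the Fourier transform of $f$ with $e^{-s|\omega|^2}$, which corresponds to convolution with the kernel $\frac{1}{4\pi s}e^{-|x|^2/(4s)}$ appearing in eq. (39). The periodic boundary treatment and the function name `scale_space_level` are assumptions of this sketch; locating the singular points of eq. (38) is not attempted here.

```python
import numpy as np

def scale_space_level(f, s):
    """u_f(., s): solution of du/ds = Laplace(u), u(., 0) = f, with periodic boundaries."""
    h, w = f.shape
    wy = 2.0 * np.pi * np.fft.fftfreq(h)[:, None]   # angular frequencies per pixel
    wx = 2.0 * np.pi * np.fft.fftfreq(w)[None, :]
    multiplier = np.exp(-s * (wx ** 2 + wy ** 2))   # heat-kernel transfer function
    return np.real(np.fft.ifft2(np.fft.fft2(f) * multiplier))
```

The semigroup property of the heat equation gives a simple consistency check: blurring to scale 1 and then by a further 2 agrees with blurring directly to scale 3, and the mean gray value is preserved at every scale.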

6 Conclusions

We introduced a coarse-to-fine image reconstruction method that approximates a set of generalized samples that are weighted according to their noise robustness. Information from a coarse resolution reconstruction is passed to a finer resolution level by means of a gauge field. To this end we considered the image not as a scalar function but as a section through a fibered space. Application of the newly proposed method to the reconstruction from singular points of a scale space representation of an image shows the feasibility of the method.


B. Janssen, R. Duits, and L. Florack


Edge-Enhanced Image Reconstruction Using (TV) Total Variation and Bregman Refinement

Shantanu H. Joshi1, Antonio Marquina2,3, Stanley J. Osher3, Ivo Dinov1, John D. Van Horn1, and Arthur W. Toga1

1 Laboratory of Neuroimaging, University of California, Los Angeles, CA 90095, USA
2 Departamento de Matematica Aplicada, Universidad de Valencia, C/ Dr Moliner, 50, 46100 Burjassot, Spain
3 Department of Mathematics, University of California, Los Angeles, CA 90095, USA

Abstract. We propose a novel image resolution enhancement method for multidimensional images based on a variational approach. Given an appropriate downsampling operator, the reconstruction problem is posed using a deconvolution model under the assumption of Gaussian noise. In order to preserve edges in the image, we regularize the optimization problem by the norm of the total variation of the image. Additionally, we propose a new edge-preserving operator that emphasizes and even enhances edges during the up-sampling and decimation of the image. Furthermore, we propose the use of the Bregman iterative refinement procedure for the recovery of higher order information from the image. This is a coarse-to-fine approach that recovers finer scales in the image first, followed by the noise. The method is demonstrated on a variety of low-resolution natural images as well as 3D anisotropic brain MRI images. The edge-enhanced reconstruction is shown to yield significant improvement in resolution, especially in preserving important edges containing anatomical information.

Keywords: Edge-preserving operators, total variation regularization, deconvolution, Gaussian blur, Bregman iteration, up/down sampling.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 389–400, 2009. © Springer-Verlag Berlin Heidelberg 2009

1 Introduction

With the recent advances in low-cost imaging solutions and increasing storage capacities, there is an increased demand for better image quality in a wide variety of applications involving both image and video processing. Oftentimes, owing to sensor shortcomings, low-power requirements, or environmental limitations, one is only able to acquire a low-resolution observation of the scene. The low-resolution data can exist in the form of still images, a sequence of image frames devoid of inter-frame motion, a single video sequence, or a collection of video sequences. Furthermore, the observations can be corrupted by motion-induced artifacts in the case of either still images or videos. The collective approach that tackles the problem of reconstructing a high-resolution image from one or more of the above low-resolution observations is termed super-resolution. There are several prominent approaches to this problem, all of them largely employing various cues such as sub-pixel shifts between successive frames, camera blur, defocus, and zoom. These approaches can be divided into two types: those that use motion information between successive frames (e.g., video super-resolution), and


the others that use a motion-free approach. Most of these approaches expect multiple low-resolution observations as input. Super-resolution image reconstruction can be mathematically modeled as a nonlinear process consisting of a convolution operator acting on the image, followed by a down sampling operation and the mixing of additive noise. Most of the earlier research in this area was developed in the frequency domain, using (discrete) Fourier transform and wavelet-transform based methods. For example, Tsai and Huang [13] first outlined the idea of super-resolution in their seminal paper. Irani and Peleg [8] used an iterative back-projection scheme to achieve image reconstruction. Yet another approach [12] uses projections onto convex sets (POCS) of images to restrict the solution domain for reconstruction. A hybrid approach by Elad and Feuer [5] combines the POCS and maximum likelihood approaches for both motion-based and motion-free super-resolution.

A very different set of methods uses a learning-based approach to super-resolution. The general idea here is to learn a set of image features from exemplar images and use them for the reconstruction of a high-resolution image. Capel and Zisserman [2] use PCA on face image databases to learn the image model and use it to reconstruct images from multiple views. Freeman et al. [6] learn a feature set of image patches that encode the relationships among different spatial frequencies from a large training set, and use it as prior information for reconstructing higher frequencies for resolution enhancement. The reader is referred to the excellent monograph by Chaudhuri and Joshi [4] for a comprehensive bibliography and references in the field.
Along with a wide range of applications of super-resolution methods in tasks such as satellite image processing, surveillance, computer vision, and even video processing, there has been considerable effort by researchers to apply these methods to medical imaging. In particular, MRI acquisitions usually have low resolution in the inter-slice direction, and it is of considerable interest to “fill in” the intermediate slices. Carmi et al. [3] use sub-pixel shifted MR (Magnetic Resonance) images for high-resolution reconstruction. Greenspan et al. [7] combine several low-resolution images in the slice-select direction to achieve SR reconstruction. Kornprobst et al. [9] also achieve higher resolution in the slice-select direction for fMRI sequences.

While super-resolution methods attempt to exploit the information redundancy in several low-resolution observations of images, at times only a single low-resolution instance of the image is available. This is sometimes the case with MRI images, where, for economic or health reasons, a patient is scanned only once over a period of time, or the time elapsed between successive scans is too large to preserve any temporal coherence to take advantage of. Based on this assumption, we focus mainly on the problem of single-frame high-resolution reconstruction of images. Our approach is based upon a variational model that uses the TV norm [11] as a regularizing functional. Recently, Marquina and Osher [14] proposed a new variational model based on the TV norm [11] for super-resolution of multidimensional images. They use a new multi-scale approach (Bregman iterations) for iterative refinement and recovery of finer details in images. We follow this approach to solve the more general super-resolution problem using the TV norm as the regularizing functional. In addition, we propose an iterative refinement procedure, based on an original idea by Bregman [1], to improve spatial resolution.
The proposed super-resolution method improves upon the behavior of any interpolation method (including high-order and sinc interpolation), because it preserves edges satisfactorily and avoids the Gibbs phenomenon, while the iterative refinement procedure allows us to recover fine scales of the image. The main contributions of this paper are as follows:

– a three-dimensional variational model based on the TV norm [11] regularizer.
– a new multi-scale approach (Bregman iterations) for iterative refinement and recovery of finer details in images.
– a new piecewise-linear up (down) sampling operator that preserves edges.
– application of this method to super-resolution of anisotropic 3D MRI images.

This paper is organized as follows: Section 2 outlines the super-resolution model using TV regularization. In particular, it explains the variational model as well as a new scale-space approach that utilizes the Bregman iterative procedure for recovering finer details from images. Additionally, Section 2.2 proposes a new edge-preserving up (down) sampling operator used in the model. Section 3 presents details of the numerical implementation of the model. Section 4 demonstrates experimental results for a few 2D natural images as well as 2D slices and 3D volumes of MRI images, followed by the summary.

2 Image Observation and Synthesis Model

The low-resolution image observation model can be formulated in a standard fashion as a down-sampled, degraded version of the original high-resolution image. We assume that the low-resolution image $f$ is defined on a domain $\Omega \subset \mathbb{R}^k$, where, for the purpose of this paper, $k$ is either 2 or 3. From here onwards, all notation is specified for 3D images; the restriction to 2D images is straightforward. For a discrete representation, we assume $f \in \mathbb{R}^{n \times m \times p}$. Let the unknown high-resolution image to be estimated be given by $u \in \mathbb{R}^{2n \times 2m \times 2p}$. Then, given a linear down sampling operator $D$, we can write the observation model as

$$f = D(h * u) + n, \tag{1}$$

where $n$ is additive Gaussian white noise with zero mean and variance $\sigma^2$, and $h$ is a translation-invariant convolution kernel corresponding to the point spread function of the imaging device. A related problem in the above formulation is the estimation of the kernel $h$, which we shall skip in this paper. Throughout this paper, we assume that the kernel is given by the Gaussian

$$h(x, y, z) = K \exp\left(-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} + \frac{z^2}{\sigma_z^2}\right)\right), \tag{2}$$

where $K$ is a normalization constant, and $\sigma_x$, $\sigma_y$, $\sigma_z$ are the kernel widths (standard deviations) along the X, Y, and Z directions respectively. The problem in Eqn. 1 is usually solved as a constrained optimization problem that seeks to minimize the regularizer $\int_\Omega \|\nabla u\|^2 \, dx\,dy$ while constraining the noise to satisfy $\|h * u - f\|^2_{L^2} = \sigma^2$. This ensures that the reconstructed image $u$ is free of discontinuities. An alternative to the above regularizer is the total variation proposed by Rudin, Osher, and Fatemi [11]. This norm is shown to recover edges in images satisfactorily. The total variation norm is given as

$$TV(u) = \int_\Omega |\nabla u| \, dx\,dy \tag{3}$$

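Using the regularizer in Eqn. 3, we can state the single-frame image reconstruction model as follows:

$$\hat{u} = \arg\min_u \left\{ TV(u) + \frac{\lambda}{2}\left[\|f - D(h * u)\|^2_{L^2} - \sigma^2\right] \right\} \tag{4}$$

The Euler-Lagrange formulation for Eqn. 4 can be written as

$$\nabla \cdot \frac{\nabla u}{|\nabla u|} + \lambda\left(\tilde{h} * S(f) - \tilde{h} * (S \circ D(h * u))\right) = 0 \tag{5}$$

$$\implies \nabla \cdot \frac{\nabla u}{|\nabla u|} + \lambda \tilde{h} * (\bar{g} - T(h * u)) = 0 \tag{6}$$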

where $S$ is an upsampling operator, $\tilde{h}$ is the inverse of $h$, $\bar{g} = S(f)$, and the operator $T$ is defined as $T = S \circ D$. Furthermore, $D \circ S = \mathrm{Id}$. The Euler-Lagrange equation given by Eqn. 6 can be solved as a time-dependent equation

$$u_t = \nabla \cdot \frac{\nabla u}{|\nabla u|} + \lambda \tilde{h} * (\bar{g} - T(h * u)) \tag{7}$$

with homogeneous Neumann boundary conditions, initialized with $u_0 = S(f)$.

2.1 Bregman Iterative Method

The convergence of Eqn. 7 to the steady state yields a reconstructed high-resolution image. However, if one wishes to recover even finer scales from the reconstructed image, one can use the Bregman iterative refinement procedure [1] to do so. If $u_0$ is the solution of the Euler-Lagrange equation (6), then we have

$$\nabla \cdot \frac{\nabla u_0}{|\nabla u_0|} + \lambda \tilde{h} * (\bar{g} - T(h * u_0)) = 0 \tag{8}$$

We denote the image residual in the high-resolution scale by $v_0$:

$$v_0 = \bar{g} - T(h * u_0) \tag{9}$$

We now solve the Euler-Lagrange equation for the new image $\bar{g} + v_0$ to obtain a new solution, which we denote by $u_1$. Again, the solution $u_1$ will satisfy

$$\nabla \cdot \frac{\nabla u_1}{|\nabla u_1|} + \lambda \tilde{h} * (\bar{g} + v_0 - T(h * u_1)) = 0, \tag{10}$$

where the new residual is defined as

$$v_1 = \bar{g} + v_0 - T(h * u_1) \tag{11}$$


and so on. The images in the sequence $u_0, u_1, \dots, u_j, \dots$ are also referred to as Bregman iterates. It is advisable to terminate this procedure when a satisfactory image quality is obtained; otherwise it has a tendency to recover noise after all the finer scales in the image have been recovered. This iterative procedure was introduced for image restoration in [10].

2.2 Edge-Preserving Up (Down)-Sampling Operator

There are various choices for the up ($S$) and down ($D$) sampling operators used in the observation model in Eqn. 1 and the synthesis model in Eqn. 7 respectively. The simplest down sampling operator can be an averaging operator that simply averages the eight neighbors of the pixel using either a Gaussian kernel or an arithmetic average. Correspondingly, the up sampling operation simply involves repeating voxel values for each row, column, and slice. Alternatively, one can also use bilinear interpolation for up sampling and down sampling images. The problem with the above approaches is the unnecessary blurring (averaging) that is caused at each step of the iteration while solving the Euler-Lagrange equation in Eqn. 6. To overcome this problem, one can use better signal-preserving operators that involve sinc or Fourier interpolation for up and down sampling. However, these methods can potentially introduce ringing artifacts in images with sharp edges or boundaries. Especially for images with prominent edges and interfaces, we need an appropriate interpolation operator that preserves these features. Accordingly, we propose a new piecewise-linear up (down) sampling operator that preserves such edges and boundaries. We describe the edge-preserving operator in detail below.

We set up the grid $x_j = (j - 1)\Delta x$, $y_k = (k - 1)\Delta y$ and $z_l = (l - 1)\Delta z$, where $\Delta x > 0$, $\Delta y > 0$, $\Delta z > 0$ and $j = 1, \dots, n$, $k = 1, \dots, m$ and $l = 1, \dots, p$. We define the domain $E = [0, A] \times [0, B] \times [0, C]$, where $A = (n - 1)\Delta x$, $B = (m - 1)\Delta y$, and $C = (p - 1)\Delta z$.
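We consider the grid function $u$ defined as $u_{j,k,l} : \mathbb{R}^3 \to \mathbb{R}$. We define the edge-preserving piecewise-linear approximation of the grid function $u$ as the function $L(x, y, z)|_{E_{jkl}} = L_{jkl}(x, y, z)$, where the computational voxel $E_{jkl}$ is given by

$$E_{jkl} = \left[x_j - \frac{\Delta x}{2},\, x_j + \frac{\Delta x}{2}\right] \times \left[y_k - \frac{\Delta y}{2},\, y_k + \frac{\Delta y}{2}\right] \times \left[z_l - \frac{\Delta z}{2},\, z_l + \frac{\Delta z}{2}\right]$$

and

$$L_{jkl}(x, y, z) = u_{j,k,l} + a(x - x_j) + b(y - y_k) + c(z - z_l),$$

where $a$, $b$, and $c$ are determined from

$$a = \mathrm{minmod}\left(\frac{\Delta^x_- u_{j,k,l}}{\Delta x}, \frac{\Delta^x_+ u_{j,k,l}}{\Delta x}\right), \quad b = \mathrm{minmod}\left(\frac{\Delta^y_- u_{j,k,l}}{\Delta y}, \frac{\Delta^y_+ u_{j,k,l}}{\Delta y}\right), \quad c = \mathrm{minmod}\left(\frac{\Delta^z_- u_{j,k,l}}{\Delta z}, \frac{\Delta^z_+ u_{j,k,l}}{\Delta z}\right),$$

where the operations in the terms containing derivatives are understood component-wise, and are given by $\Delta^x_\pm u^n_{i,j,k} = \pm(u^n_{i\pm1,j,k} - u^n_{i,j,k})$, $\Delta^y_\pm u^n_{i,j,k} = \pm(u^n_{i,j\pm1,k} - u^n_{i,j,k})$, and $\Delta^z_\pm u^n_{i,j,k} = \pm(u^n_{i,j,k\pm1} - u^n_{i,j,k})$, where $i, j, k$ are the indices of the 3D grid.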


The $\mathrm{minmod}(d, e)$ function is defined as

$$\mathrm{minmod}(d, e) = \frac{\mathrm{sgn}(d) + \mathrm{sgn}(e)}{2} \min(|d|, |e|), \tag{12}$$

where $\mathrm{sgn}(d) = 1$ if $d \ge 0$ and $\mathrm{sgn}(d) = -1$ otherwise. The function $L_{jkl}(x, y, z)$ is defined on the computational voxel $E_{jkl}$. We want to up (down) sample the grid function $u$ with a spatial resolution of $h_x > 0$, $h_y > 0$, $h_z > 0$. Then the up (down) sampled grid function $v$ is defined on a new grid $v(q, r, s)$ for $q = 1, \dots, n_h$, $r = 1, \dots, m_h$, and $s = 1, \dots, p_h$, where

$$n_h = \mathrm{floor}\left(\frac{A}{h_x}\right), \quad m_h = \mathrm{floor}\left(\frac{B}{h_y}\right), \quad p_h = \mathrm{floor}\left(\frac{C}{h_z}\right),$$

and $\mathrm{floor}(d)$ is the maximum of all integers $i$ such that $i \le d$. The new grid is then defined as $x_{h_q} = (q - 1)h_x$, $y_{h_r} = (r - 1)h_y$, and $z_{h_s} = (s - 1)h_z$. Based on this grid, the function $v$ is defined as $v(q, r, s) = L(x_{h_q}, y_{h_r}, z_{h_s})$.

We demonstrate the edge-preserving property of the above operator by applying it to a checkerboard pattern, as shown in Fig. 1. Figure 1 shows a low-resolution image, as well as its up sampled versions using a bilinear, a sinc, and the edge-preserving operator, for two different types of checkerboard patterns. It also shows a magnified portion from the center of the image. It is observed that the bilinear and the sinc interpolation operators introduce significant spurious levels of gray in between the black squares in the pattern. Furthermore, they have a tendency to smooth out the boundaries of the flat black squares in the image. In contrast, the edge-preserving operator has retained, and in some cases even enhanced, the boundaries and edges as compared to the low-resolution image.

Figure 3 shows similar results with a 280 × 200 scene image. The first image in the top row shows the 560 × 400 pixel-replicated image, whereas the last image is the super-resolved image. The bottom row shows a small portion of the image magnified to show detail. One can immediately observe the blocking effects due to pixel replication in the first image, and blurring of the edge boundaries in the bilinearly interpolated version. The edges are somewhat better with sinc interpolation, but the best quality is given by the super-resolved image, which resolves and even enhances sharp edges and interfaces in the image. In both of the above cases, we used an isotropic Gaussian kernel with kernel widths $\sigma_x = \sigma_y = 1$.
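The construction of Section 2.2 can be illustrated in one dimension. The following is a minimal sketch, not the authors' implementation: the function names `minmod` and `resample_1d` are hypothetical, indices are 0-based, and the slope at the two boundary samples is computed with a zero one-sided difference, which is an assumption the text does not specify.

```python
import math


def sgn(d):
    # Sign convention of Eq. (12): sgn(0) is taken as +1.
    return 1.0 if d >= 0 else -1.0


def minmod(d, e):
    # Zero when d and e have opposite signs, else the one of smaller magnitude.
    return 0.5 * (sgn(d) + sgn(e)) * min(abs(d), abs(e))


def resample_1d(u, dx, h):
    """Sample the minmod-limited piecewise-linear function L on a grid of spacing h."""
    n = len(u)
    length = (n - 1) * dx           # A = (n - 1) * dx
    nh = math.floor(length / h)     # number of new samples, as in the text
    out = []
    for q in range(nh):             # x_q = q * h (0-based form of (q - 1) * h)
        x = q * h
        # Index of the cell E_j = [x_j - dx/2, x_j + dx/2] that contains x.
        j = min(n - 1, math.floor(x / dx + 0.5))
        # One-sided differences; set to zero outside the grid (assumption).
        back = (u[j] - u[j - 1]) / dx if j > 0 else 0.0
        fwd = (u[j + 1] - u[j]) / dx if j < n - 1 else 0.0
        a = minmod(back, fwd)
        out.append(u[j] + a * (x - j * dx))
    return out


step = [0, 0, 0, 1, 1, 1]
upsampled = resample_1d(step, dx=1.0, h=0.5)
```

For the step signal, every cell adjacent to the jump has one vanishing one-sided difference, so the limited slope is zero and the upsampled signal contains only the two original gray levels. This is the 1D analogue of the checkerboard behavior described for Fig. 1.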

3 Numerical Implementation

This section discusses the numerical implementation of the solution to the Euler-Lagrange equation. The Euler-Lagrange derivative of the TV norm is not well defined at points where $\nabla u = 0$, due to the presence of the term $\frac{1}{|\nabla u|}$. Hence we modify the TV regularization functional as follows:

$$\int_\Omega \sqrt{|\nabla u|^2 + \epsilon} \; dx\,dy \tag{13}$$

[Figure 1 panels, left to right: Low-resolution Image, Bilinear Interpolation, Sinc Interpolation, Edge-preserved Upsampling]

Fig. 1. The first and the third rows show a low-resolution image from the left, and its up sampled versions using a bilinear interpolation operator, a sinc operator, and the new edge-preserving operator for two different checkerboard patterns. The second and the fourth rows show a magnified area from the center of the image.

where $\epsilon$ is a small positive parameter. We express the 3D model (7) in terms of explicit partial derivatives:

$$\begin{aligned} u_t ={}& \lambda \tilde{h} * (\bar{g} - T(h * u)) \\ &+ \frac{u_{xx}(u_y^2 + u_z^2 + \epsilon) + u_{yy}(u_x^2 + u_z^2 + \epsilon) + u_{zz}(u_x^2 + u_y^2 + \epsilon)}{[u_x^2 + u_y^2 + u_z^2 + \epsilon]^{3/2}} \\ &+ \frac{-2u_{xy}u_x u_y - 2u_{xz}u_x u_z - 2u_{yz}u_y u_z}{[u_x^2 + u_y^2 + u_z^2 + \epsilon]^{3/2}} \end{aligned} \tag{14}$$

[Figure 2 panels: low-resolution image, sinc interpolation, Super-resolved reconstruction, 1st Bregman refinement]

Fig. 2. Clockwise from top: a 380 × 285 low-resolution image, the image upsampled to twice the size by sinc interpolation, the super-resolved reconstruction, and the first Bregman iterated image

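The model is solved using $u_0 = S(f)$ as the initial guess and homogeneous Neumann boundary conditions (i.e. absorbing boundary). The above expression can also be rewritten as

$$\begin{aligned} \frac{u^{n+1}_{i,j,k} - u^n_{i,j,k}}{\Delta t} ={}& \lambda\left[\tilde{h} * (\bar{g} - T(h * u^n))\right]_{i,j,k} && (15) \\ &+ \frac{u^n_{xx}((u^n_y)^2 + (u^n_z)^2 + \epsilon) + u^n_{yy}((u^n_x)^2 + (u^n_z)^2 + \epsilon) + u^n_{zz}((u^n_x)^2 + (u^n_y)^2 + \epsilon)}{[(u^n_x)^2 + (u^n_y)^2 + (u^n_z)^2 + \epsilon]^{3/2}} && (16) \\ &+ \frac{-2u^n_{xy}u^n_x u^n_y - 2u^n_{xz}u^n_x u^n_z - 2u^n_{yz}u^n_y u^n_z}{[(u^n_x)^2 + (u^n_y)^2 + (u^n_z)^2 + \epsilon]^{3/2}} && (17) \end{aligned}$$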

[Figure 3 panels, left to right: Low-resolution Image, Bilinear Interpolation, Sinc Interpolation, Super-resolved reconstruction]

Fig. 3. Top row shows the low-resolution image, and the upsampled versions using bilinear, sinc and the super-resolved reconstruction. The bottom row shows a magnified detail of a portion of the image.

The approximations to the derivatives in Eqn. 17 can be calculated as:

$[u^n_{xx}]_{i,j,k} = \Delta^x_+ \Delta^x_- u^n_{i,j,k} / h_x^2$,
$[u^n_{yy}]_{i,j,k} = \Delta^y_+ \Delta^y_- u^n_{i,j,k} / h_y^2$,
$[u^n_{zz}]_{i,j,k} = \Delta^z_+ \Delta^z_- u^n_{i,j,k} / h_z^2$,
$[u^n_{xy}]_{i,j,k} = (\Delta^x_- + \Delta^x_+)(\Delta^y_- + \Delta^y_+) u^n_{i,j,k} / (4 h_x h_y)$,
$[u^n_{xz}]_{i,j,k} = (\Delta^x_- + \Delta^x_+)(\Delta^z_- + \Delta^z_+) u^n_{i,j,k} / (4 h_x h_z)$,
$[u^n_{yz}]_{i,j,k} = (\Delta^y_- + \Delta^y_+)(\Delta^z_- + \Delta^z_+) u^n_{i,j,k} / (4 h_y h_z)$,
$[u^n_x]_{i,j,k} = (\Delta^x_- + \Delta^x_+) u^n_{i,j,k} / (2 h_x)$,
$[u^n_y]_{i,j,k} = (\Delta^y_- + \Delta^y_+) u^n_{i,j,k} / (2 h_y)$,
$[u^n_z]_{i,j,k} = (\Delta^z_- + \Delta^z_+) u^n_{i,j,k} / (2 h_z)$.

The Lagrange multiplier $\lambda$ was chosen to be the maximum value for which the algorithm was stable. It was empirically determined to be $\lambda = 10$, and was not changed thereafter.
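As an illustration of the difference formulas above, the following sketch evaluates the regularized curvature term of Eqn. 14 restricted to 2D and takes one explicit time step. It is a simplified sketch under stated assumptions: the fidelity term $\lambda \tilde{h} * (\bar{g} - T(h * u))$ is omitted, the names `tv_curvature` and `tv_flow_step` are hypothetical, the image is a plain list of lists, and the Neumann condition is imposed by copying the nearest interior value to the boundary.

```python
def tv_curvature(u, i, j, eps=1e-3, hx=1.0, hy=1.0):
    """2D restriction of the curvature term in Eq. (14) at interior point (i, j)."""
    # Central first differences, (D- + D+) u / (2 h).
    ux = (u[i + 1][j] - u[i - 1][j]) / (2.0 * hx)
    uy = (u[i][j + 1] - u[i][j - 1]) / (2.0 * hy)
    # Second differences, D+ D- u / h^2.
    uxx = (u[i + 1][j] - 2.0 * u[i][j] + u[i - 1][j]) / hx ** 2
    uyy = (u[i][j + 1] - 2.0 * u[i][j] + u[i][j - 1]) / hy ** 2
    # Mixed difference, (Dx- + Dx+)(Dy- + Dy+) u / (4 hx hy).
    uxy = (u[i + 1][j + 1] - u[i + 1][j - 1]
           - u[i - 1][j + 1] + u[i - 1][j - 1]) / (4.0 * hx * hy)
    num = uxx * (uy ** 2 + eps) + uyy * (ux ** 2 + eps) - 2.0 * uxy * ux * uy
    den = (ux ** 2 + uy ** 2 + eps) ** 1.5
    return num / den


def tv_flow_step(u, dt=0.01, eps=1e-3):
    """One explicit step u^{n+1} = u^n + dt * curvature (fidelity term omitted)."""
    rows, cols = len(u), len(u[0])
    new = [row[:] for row in u]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = u[i][j] + dt * tv_curvature(u, i, j, eps)
    # Homogeneous Neumann boundary: copy the nearest interior value outward.
    for j in range(cols):
        new[0][j] = new[1][j]
        new[-1][j] = new[-2][j]
    for i in range(rows):
        new[i][0] = new[i][1]
        new[i][-1] = new[i][-2]
    return new


plane = [[2.0 * i + 3.0 * j for j in range(4)] for i in range(4)]
bump = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
```

A planar image has zero curvature and is left unchanged in the interior, while an isolated bright pixel has negative curvature and is pulled down. With a small $\epsilon$ the curvature near flat regions can be large, so an explicit step of this kind is only stable for a small time step.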

4 Experimental Results

Lastly, we demonstrate the algorithm by performing experiments with 2D natural images, 2D slices of 3D volumetric images, and finally the full 3D volumetric MRI images themselves.

4.1 Results for Natural Images

Figure 2 shows the results of the super-resolution reconstruction algorithm applied to a 380 × 285 map image. This image has been scaled to 760 × 570 by pixel replication for display purposes. It can be observed that pixel replication inherently adds blocking artifacts to the image. The low-resolution image is up sampled by a factor of two using bilinear interpolation, sinc interpolation, and finally the super-resolution reconstruction method. It is observed that bilinear interpolation grossly smoothes out the image and that sinc interpolation preserves some high-frequency information, whereas the super-resolved reconstruction yields a sharp, crisp image, even resolving the small text at finer scales. One can further enhance this image by performing the 1st Bregman iteration, as shown in Fig. 2. However, this process should be terminated after one or two iterations.

4.2 Results for 2D Slices of 3D MRI Image

In this experiment, we look at enhancing the in-plane resolution of individual transverse slices of a 3D MRI image. From left, all rows of Fig. 4 show an isotropic original image of size 180 × 216, the subsampled image, a Fourier-interpolated image, and a super-resolved reconstructed image. For display purposes, the subsampled image is shown at twice the resolution using pixel replication. It is observed that the super-resolved reconstructed image has sharper edge features and more detail, and visually resembles the original image more closely than the Fourier-interpolated result.

[Figure 4 panels, left to right: Original Image, Subsampled Image, Fourier Interpolation, SR reconstruction]

Fig. 4. Examples of super-resolved reconstruction for 2D slices of 3D MRI images

4.3 Results for Full 3D MRI Images

The proposed super-resolution algorithm can be applied to arbitrary 2D images or even 3D volumes of anisotropic voxel dimensions. In this experiment, we apply the reconstruction algorithm to the full 3D MRI image volume. Figure 5 shows a volume rendering of an original image of dimensions 256 × 256 × 160, at voxel widths given by 1 × 1 × 1.25 mm³. This image is first subsampled to half the resolution at 128 × 128 × 80 (2 × 2 × 2.5 mm³) and then super-resolved to a full isotropic 256 × 256 × 160 image with 1 × 1 × 1 mm³ resolution. As expected, we can see an improvement in the resolution plus an increase in the detail simultaneously across all X, Y, and Z dimensions. In this experiment, we used an anisotropic Gaussian kernel with the variances proportional to the voxel dimensions. Furthermore, the grid dimensions for the edge-preserving up sampling and down sampling operators were taken to be $\Delta x = \frac{h_x}{2}$, $\Delta y = \frac{h_y}{2}$, $\Delta z = \frac{h_z}{2}$, where $h_x$, $h_y$, $h_z$ are the voxel dimensions of the appropriate up sampled or down sampled image.

[Figure 5 panels, left to right: Original Image, Subsampled Image, Fourier Interpolation, SR reconstruction]

Fig. 5. Examples of super-resolved reconstruction for full 3D MRI images (volume rendered)
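The Bregman refinement used in Section 4.1 follows the residual updates of Eqns. 9 and 11: the residual of each solve is added back to the data before the next solve. The following toy sketch shows only this outer loop. The arguments `solve` and `forward` are hypothetical stand-ins for the TV minimization of Eqn. 6 and for the operator $u \mapsto T(h * u)$; in the example they are replaced by a simple shrinkage and the identity, which is an assumption made purely for illustration.

```python
def bregman_refine(g, solve, forward, n_iter):
    """Return the Bregman iterates u_0, ..., u_{n_iter} for data g."""
    residual = [0.0] * len(g)   # v_{j-1}; zero before the first solve
    iterates = []
    for _ in range(n_iter + 1):
        # Solve with the data augmented by the previous residual: g + v_{j-1}.
        target = [gi + vi for gi, vi in zip(g, residual)]
        u = solve(target)
        # New residual v_j = g + v_{j-1} - forward(u_j), as in Eqs. (9) and (11).
        residual = [ti - fi for ti, fi in zip(target, forward(u))]
        iterates.append(u)
    return iterates


def shrink(target):
    # Stand-in for an over-smoothing solver: halves every value.
    return [0.5 * t for t in target]


def identity(u):
    return list(u)


iterates = bregman_refine([2.0, 4.0], shrink, identity, n_iter=2)
```

In this toy setting the iterates are [1.0, 2.0], [1.5, 3.0] and [1.75, 3.5], moving step by step toward the data [2.0, 4.0]. With noisy data the same mechanism eventually brings the noise back, which is why the text recommends stopping after one or two iterations.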

5 Conclusion and Future Directions

We have presented a method for enhancing the resolution of images. The strengths of this approach lie in (i) the TV norm as a regularizing functional in the variational model, and (ii) a new piecewise-linear up (down) sampling operator that preserves edges. While we are aware that the proposed method works in physical space, and not in the frequency domain (k-space) of the data, we emphasize that the TV prior is a nonlinear prior that does modify the amplitudes of the k-space data. In other words, our algorithm works on the processed physical image, yet it implicitly modifies the spectral information in the data. This is an important point to note, especially when comparing with other MRI image processing methods that work with the k-space representation of the data. We have demonstrated the improvement in spatial resolution for 2D as well as 3D anatomical MRI images. In the future, we intend to investigate the problem of high-resolution reconstruction of DT-MRI images using the proposed method.


Acknowledgments

This research was partially supported by the National Institutes of Health through the NIH Roadmap for Medical Research, Grant U54 RR021813. Additionally, Dr. Antonio Marquina gratefully acknowledges the support from the NSF grants DMS-0312222, ACI-0321917, the NIH grant G54 RR021813, as well as DGICYT MTM2008-03597 from the Spanish Government Agency.

References

1. Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. and Math. Phys. 7, 200–217 (1967)
2. Capel, D., Zisserman, A.: Super-resolution from multiple views using learnt image models. In: CVPR, vol. 2, pp. 627–634 (2001)
3. Carmi, E., Liu, S., Alon, N., Fiat, A., Fiat, D.: Resolution enhancement in MRI. Magnetic Resonance Imaging 24(2), 133–154 (2006)
4. Chaudhuri, S., Joshi, M.: Motion-Free Super-Resolution. Springer, New York (2005)
5. Elad, M., Feuer, A.: Restoration of a single super-resolution image from several blurred, noisy, and undersampled measured images. IEEE Trans. Image Processing 6(12), 1646–1658 (1997)
6. Freeman, W.T., Jones, T.R., Pasztor, E.C.: Example-based super-resolution. IEEE Computer Graphics and Applications 22(2), 56–65 (2002)
7. Greenspan, H., Oz, G., Kiryati, N., Peled, S.: MRI inter-slice reconstruction. Magnetic Resonance Imaging 20, 437–446 (2002)
8. Irani, M., Peleg, S.: Improving resolution by image registration. CVGIP: Graphical Models and Image Processing 53(3), 231–239 (1991)
9. Kornprobst, P., Peeters, R., Nikolova, M., Deriche, R., Ng, M., Van Hecke, P.: A superresolution framework for fMRI sequences and its impact on resulting activation maps. In: Ellis, R.E., Peters, T.M. (eds.) MICCAI 2003. LNCS, vol. 2879, pp. 117–125. Springer, Heidelberg (2003)
10. Osher, S.J., Burger, M., Goldfarb, D., Xu, J., Yin, W.: An iterative regularization method for Total Variation-based image restoration. Multiscale Modeling and Simulation 4(2), 460–489 (2005)
11. Rudin, L.I., Osher, S.J., Fatemi, E.: Nonlinear Total Variation based noise removal algorithms. Physica D 60(1-4), 259–268 (1992)
12. Stark, H., Oskoui, P.: High-resolution image recovery from image-plane arrays, using convex projections. Journal of the Optical Society of America 6, 1715–1726 (1989)
13. Tsai, R.Y., Huang, T.S.: Multi-frame image restoration and registration. In: Advances in Computer Vision and Image Processing, pp. 317–339 (1984)
14. Marquina, A., Osher, S.J.: Image super-resolution by TV-regularization and Bregman iteration. Journal of Scientific Computing 37(3), 367–382 (2008)

Nonlocal Variational Image Deblurring Models in the Presence of Gaussian or Impulse Noise

Miyoun Jung and Luminita A. Vese

University of California, Los Angeles, Department of Mathematics, Los Angeles, CA 90095-1555, USA
[email protected], [email protected]

Abstract. We wish to recover an image corrupted by blur and Gaussian or impulse noise, in a variational framework. We use two data-fidelity terms depending on the noise, and several local and nonlocal regularizers. Inspired by Buades-Coll-Morel, Gilboa-Osher, and other nonlocal models, we propose nonlocal versions of the Ambrosio-Tortorelli and Shah approximations to Mumford-Shah-like regularizing functionals, with applications to image deblurring in the presence of noise. In the case of the impulse noise model, we propose a necessary preprocessing step for the computation of the weight function. Experimental results show that these nonlocal MS regularizers yield better results than the corresponding local ones (proposed for deblurring by Bar et al.) in both noise models; moreover, they perform better than the nonlocal total variation in the presence of impulse noise. A characterization of minimizers is also given.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 401–412, 2009. © Springer-Verlag Berlin Heidelberg 2009

1 Introduction

We consider the problem of restoring an image that has been blurred and then contaminated by Gaussian or impulse noise. Let $f, u : \Omega \to \mathbb{R}$ be image intensity functions, where $\Omega \subset \mathbb{R}^2$ is open and bounded. The standard linear degradation model is $f = k * u + n$, where $f$ is the observed blurry-noisy image, $k$ is a (known) space-invariant blurring kernel, $u$ is the ideal image we want to recover, and $n$ is additive random noise independent of $u$. We approach the restoration problem within the variational framework $\inf_u \{\Phi(f - k * u) + \Psi(|\nabla u|)\}$, where $\Phi$ defines a data-fidelity term and $\Psi$ defines the regularization that enforces a smoothness constraint on $u$, depending on its gradient $\nabla u$. First, two different fidelity terms can be considered based on the noise. In the case of the Gaussian noise model, the $L^2$-fidelity term derived from maximum likelihood estimation is commonly used: $\Phi(f - k * u) = \int_\Omega (f - k * u)^2 \, dx$. However, impulse noise, which might be caused by bit errors in transmission or wrong pixels, acts as an outlier for the quadratic data-fidelity term. So, for the impulse noise model, the $L^1$-fidelity term is more appropriate, due to its robustness in removing outlier effects [2], [17]: $\Phi(f - k * u) = \int_\Omega |f - k * u| \, dx$. Image deblurring-denoising is an inverse problem, which is known to be ill-posed due to either the non-uniqueness of the solution or the numerical

402

M. Jung and L.A. Vese

instability of the inversion of the blurring operator. The regularization term Ψ alleviates this problem by reﬂecting some a-priori properties. Several regularization terms were suggested in the literature, including [23], [9], [19], [20], [16]. Here, we consider the total variation regularization [19], [20] and two approximations of Mumford-Shah regularizers [16], denoted M SH 1 and M ST V , proposed by Ambrosio-Tortorelli [3] and Shah [21], [1] respectively and recently used for image deblurring in the presence of Gaussian and impulse noise by Bar et al [4], [5]. These traditional regularization terms are based on local image operators, which denoise and preserve edges very well, but may induce loss of ﬁne structures like texture during the restoration process. Recently, Buades et al [8] introduced the nonlocal means ﬁlter, which produces excellent denoising results. Kindermann et al [13] and Gilboa-Osher [10,11] formulated the variational framework of NL-means by proposing nonlocal regularizing functionals. Lou et al [14] used the nonlocal total variation (N L/T V ) of Gilboa-Osher in image deblurring in the presence of Gaussian noise with a preprocessing step for the computation of the weight function. We propose here nonlocal versions of the approximated Mumford-Shah and Ambrosio-Tortorelli regularizing functionals, called N L/M SH 1 and N L/M ST V , by applying the nonlocal operators proposed by Gilboa-Osher to M SH 1 and M ST V respectively, for image restoration in the presence of blur and Gaussian or impulse noise. In addition, for the impulse noise model, we propose to use a preprocessed image to compute the weights w (the weights w deﬁned in the NL-means ﬁlter are more appropriate for the additive Gaussian noise). We note that the interesting parallel work [7] also proposed N L/M SH 1 regularizer for segmentation and denoising in the presence of Gaussian noise, but not for deblurring, nor for the impulse noise case. 
More details about our proposed methods are presented in [12].

Local Regularizers. In this section, we recall several regularization terms. The first is the Mumford-Shah regularizing functional [16], which gives preference to piecewise smooth images. The MS regularizer, depending on the image $u$ and on its edge set $K \subset \Omega$, is given by
$$\Psi^{MS}(u, K) = \beta \int_{\Omega \setminus K} |\nabla u|^2\,dx + \alpha \int_K d\mathcal{H}^1,$$
where $\mathcal{H}^1$ is the 1D Hausdorff measure. The first term enforces smoothness of $u$ everywhere except on the edge set $K$, and the second one minimizes the total length of the edges. However, the non-convex MS functional is difficult to minimize in practice. Ambrosio and Tortorelli [3] approximated this functional by a sequence of regular functionals $\Psi_\epsilon$ using $\Gamma$-convergence. The edge set $K$ is represented by a smooth auxiliary function $v$. Thus we have an approximation to $\Psi^{MS}$ of the form [3]
$$\Psi_\epsilon^{MSH^1}(u, v) = \beta \int_\Omega v^2 |\nabla u|^2\,dx + \alpha \int_\Omega \Big( \epsilon |\nabla v|^2 + \frac{(v - 1)^2}{4\epsilon} \Big) dx,$$
where $0 \le v(x) \le 1$ represents the edges: $v(x) \approx 0$ if $x \in K$ and $v(x) \approx 1$ otherwise, $\epsilon > 0$ is a parameter, and $\alpha, \beta > 0$. A minimizer $u = u_\epsilon$ of $\Psi_\epsilon^{MSH^1}$ approaches a minimizer $u$ of $\Psi^{MS}$ as $\epsilon \to 0$.
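To make the role of the edge function $v$ concrete, the following is a minimal NumPy sketch that evaluates a discrete version of the Ambrosio-Tortorelli regularizer on a unit-spaced grid. The forward-difference discretization, the function names `forward_gradient` and `ms_h1_energy`, and the parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def forward_gradient(a):
    """Forward differences; the difference at the last row/column is set to zero."""
    ax = np.diff(a, axis=1, append=a[:, -1:])
    ay = np.diff(a, axis=0, append=a[-1:, :])
    return ax, ay

def ms_h1_energy(u, v, alpha, beta, eps):
    """Discrete Ambrosio-Tortorelli (MSH^1) regularizer (assumed discretization)."""
    ux, uy = forward_gradient(u)
    vx, vy = forward_gradient(v)
    smoothness = beta * np.sum(v**2 * (ux**2 + uy**2))
    edges = alpha * np.sum(eps * (vx**2 + vy**2) + (v - 1.0)**2 / (4.0 * eps))
    return smoothness + edges

# Step image: left half 0, right half 1.
u = np.zeros((8, 8))
u[:, 4:] = 1.0

v_none = np.ones_like(u)      # no edge marked anywhere
v_edge = np.ones_like(u)
v_edge[:, 3] = 0.0            # v = 0 on the column where the jump is measured

alpha, beta, eps = 0.1, 1.0, 0.5   # hypothetical parameter values
e_none = ms_h1_energy(u, v_none, alpha, beta, eps)
e_edge = ms_h1_energy(u, v_edge, alpha, beta, eps)
print(e_none, e_edge)
```

In this toy case, marking the jump with $v \approx 0$ switches off the smoothness penalty there and replaces it by the (cheaper) edge penalty, which is the mechanism by which the approximation allows discontinuities in $u$.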

Nonlocal Variational Image Deblurring Models


An alternative approach is the total variation regularizer [19, 20] proposed by Rudin, Osher, and Fatemi, called the TV regularizer: $\Psi^{TV}(u) = \int_\Omega |Du| \approx \int_\Omega |\nabla u|\,dx$. Because it preserves edges (which have high gradient levels) and is convex, TV has been widely used in image restoration. Shah [21] suggested a modified version of the AT approximation to the MS functional by replacing the 2-norm of $|\nabla u|$ by the 1-norm in the first term (that is, $|\nabla u|^2$ by $|\nabla u|$):
$$\Psi_\epsilon^{MSTV}(u, v) = \beta \int_\Omega v^2 |\nabla u|\,dx + \alpha \int_\Omega \Big( \epsilon |\nabla v|^2 + \frac{(v - 1)^2}{4\epsilon} \Big) dx.$$
This functional $\Gamma$-converges to the functional $\Psi^{MSTV}$ as $\epsilon \to 0$ [1]:
$$\Psi^{MSTV}(u) = \beta \int_{\Omega \setminus K} |\nabla u|\,dx + \alpha \int_K \frac{|u^+ - u^-|}{1 + |u^+ - u^-|}\,d\mathcal{H}^1 + |D_c u|(\Omega),$$
where $u^+$ and $u^-$ denote the image values on the two sides of the jump set $K = K_u$ of $u$, and $D_c u$ is the Cantor part of the measure-valued derivative $Du$. Note that the non-convex term $\frac{|u^+ - u^-|}{1 + |u^+ - u^-|}$ is similar to the prior regularization by Geman-Reynolds [9]. We observe that this regularizing functional is similar to the total variation of $u \in BV(\Omega)$, which can be written as
$$\int_\Omega |Du| = \int_{\Omega \setminus K_u} |\nabla u|\,dx + \int_{K_u} |u^+ - u^-|\,d\mathcal{H}^1 + |D_c u|(\Omega).$$
By comparing the second terms, we see that the MSTV regularizer does not penalize the jump part as much as the TV regularizer. In this paper, we consider the TV regularizer $\Psi^{TV}$, the MSH$^1$ regularizer $\Psi_\epsilon^{MSH^1}$, and the MSTV regularizer $\Psi_\epsilon^{MSTV}$.

Nonlocal Regularizers. Nonlocal methods in image processing have been explored in many papers because they are well adapted to texture denoising, whereas the standard denoising models working with local image information tend to treat texture as noise, which results in a loss of detail. Nonlocal methods generalize neighborhood filters (e.g. the Yaroslavsky filter [24]) and patch-based methods. The idea of a neighborhood filter is to restore a pixel by averaging the values of neighboring pixels with a similar grey level value. Buades et al. [8] generalized this idea by applying the patch-based method, and proposed the well-known nonlocal-means (or NL-means) filter for denoising, given by
$$NLu(x) = \frac{1}{C(x)} \int_\Omega e^{-\frac{d_a(u(x), u(y))}{h^2}}\,u(y)\,dy,$$
where
$$d_a(u(x), u(y)) = \int G_a(t)\,|u(x + t) - u(y + t)|^2\,dt$$
is the patch distance, $G_a$ is the Gaussian kernel with standard deviation $a$ determining the patch size, $C(x) = \int_\Omega e^{-\frac{d_a(u(x), u(y))}{h^2}}\,dy$ is a normalization factor, and $h$ is the filtering parameter corresponding to the noise level (usually the standard deviation of the noise). NL-means compares not only the grey level at a single point but also the geometrical configuration in a whole neighborhood (patch).
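The NL-means formula can be sketched directly in NumPy. This is a slow, loop-based illustration under stated assumptions: the integral over $\Omega$ is restricted to a square search window, patches are truncated to a finite radius, and reflective padding is used at the boundary. The names `gaussian_kernel` and `nl_means` and all default values are hypothetical.

```python
import numpy as np

def gaussian_kernel(radius, a):
    """Normalized Gaussian weights G_a on a (2*radius+1)^2 patch."""
    t = np.arange(-radius, radius + 1)
    g = np.exp(-(t[:, None]**2 + t[None, :]**2) / (2.0 * a**2))
    return g / g.sum()

def nl_means(u, h, patch_radius=2, search_radius=5, a=1.0):
    """Nonlocal means: weighted average with weights exp(-d_a / h^2)."""
    p = patch_radius
    G = gaussian_kernel(p, a)
    up = np.pad(u, p, mode="reflect")          # so every pixel has a full patch
    n, m = u.shape
    out = np.zeros((n, m), dtype=float)
    for i in range(n):
        for j in range(m):
            patch_x = up[i:i + 2 * p + 1, j:j + 2 * p + 1]
            num, norm = 0.0, 0.0
            for k in range(max(0, i - search_radius), min(n, i + search_radius + 1)):
                for l in range(max(0, j - search_radius), min(m, j + search_radius + 1)):
                    patch_y = up[k:k + 2 * p + 1, l:l + 2 * p + 1]
                    d = np.sum(G * (patch_x - patch_y)**2)   # patch distance d_a
                    w = np.exp(-d / h**2)
                    num += w * u[k, l]
                    norm += w                                # normalization C(x)
            out[i, j] = num / norm
    return out

rng = np.random.default_rng(0)
noisy = 1.0 + 0.1 * rng.standard_normal((10, 10))
denoised = nl_means(noisy, h=0.5, patch_radius=1, search_radius=3)
print(noisy.std(), denoised.std())
```

Since each output value is a convex combination of input values, the result stays within the range of the input; the filtering parameter `h` controls how quickly dissimilar patches are down-weighted.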
In the variational framework, Kindermann et al. [13] formulated the neighborhood filters and NL-means filters as nonlocal regularizing functionals, which generally are not convex. Then, Gilboa-Osher [10] formalized a convex nonlocal functional inspired by graph theory and, moreover, based on the gradient and divergence definitions on graphs in the context of machine learning, they derived the corresponding nonlocal operators [11]. Let $u : \Omega \to \mathbb{R}$ be a function, and let $w : \Omega \times \Omega \to \mathbb{R}$ be a nonnegative and symmetric weight function. The nonlocal gradient vector $\nabla_w u : \Omega \times \Omega \to \mathbb{R}$ is
$$(\nabla_w u)(x, y) := (u(y) - u(x)) \sqrt{w(x, y)}.$$
The nonlocal divergence $\mathrm{div}_w \vec{v} : \Omega \to \mathbb{R}$ of the vector $\vec{v} : \Omega \times \Omega \to \mathbb{R}$ is then defined as the adjoint of the nonlocal gradient,
$$(\mathrm{div}_w \vec{v})(x) := \int_\Omega (v(x, y) - v(y, x)) \sqrt{w(x, y)}\,dy,$$
and the norm of the nonlocal gradient of $u$ at $x \in \Omega$ is given by
$$|\nabla_w u|(x) = \sqrt{\int_\Omega (u(y) - u(x))^2\,w(x, y)\,dy}.$$
Based on these nonlocal operators, they introduced nonlocal regularizing functionals of the general form $\Psi(u) = \int_\Omega \phi(|\nabla_w u|^2)\,dx$, where $s \mapsto \phi(s)$ is a positive function, convex in $\sqrt{s}$, with $\phi(0) = 0$. By taking $\phi(s) = \sqrt{s}$, they proposed the nonlocal TV regularizer (NL/TV), which corresponds in the local case to $\Psi^{TV}(u) = \int_\Omega |\nabla u|\,dx$.

Inspired by these ideas, we propose in the next section nonlocal versions of the Ambrosio-Tortorelli and Shah approximations to the MS regularizer for image denoising-deblurring. This also continues the work of Bar et al. [4], [5], who were the first to propose the use of Mumford-Shah-like approximations for image restoration. In practice, we use the search window $\Omega_w = \{y \in \Omega : |y - x| \le r\}$ instead of $\Omega$ (semi-local), and the weight function $w$ at $(x, y) \in \Omega \times \Omega$ depends on a function $f : \Omega \to \mathbb{R}$:
$$w(x, y) = \exp\Big( -\frac{d_a(f(x), f(y))}{h^2} \Big).$$
The weight function $w(x, y)$ measures the similarity of image features between two pixels $x$ and $y$, and is normally computed using the blurry-noisy image $f$. Recently, for image deblurring in the presence of Gaussian noise, Lou et al. [14] used a preprocessed image, obtained by applying the Wiener filter to $f$, instead of $f$ to compute $w$. In our work, only for the impulse noise model, we propose a different preprocessing step and evaluate $w$ using the preprocessed image.
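The nonlocal operators are easy to check numerically once the image is flattened to $N$ samples and $w$ becomes a symmetric $N \times N$ matrix. The sketch below is an illustration under these assumptions (function names are hypothetical, and the weights use a pointwise rather than patch-based distance for brevity); it verifies the adjoint relation $\langle \nabla_w u, v \rangle = -\langle u, \mathrm{div}_w v \rangle$ used in the Gilboa-Osher framework.

```python
import numpy as np

def nl_gradient(u, W):
    """(grad_w u)(x, y) = (u(y) - u(x)) * sqrt(w(x, y)); rows index x, columns y."""
    return (u[None, :] - u[:, None]) * np.sqrt(W)

def nl_divergence(V, W):
    """(div_w v)(x) = sum_y (v(x, y) - v(y, x)) * sqrt(w(x, y))."""
    return np.sum((V - V.T) * np.sqrt(W), axis=1)

def nl_gradient_norm(u, W):
    """|grad_w u|(x) = sqrt(sum_y (u(y) - u(x))^2 w(x, y))."""
    return np.sqrt(np.sum((u[None, :] - u[:, None])**2 * W, axis=1))

def nl_tv(u, W):
    """Discrete NL/TV: sum over x of |grad_w u|(x), i.e. phi(s) = sqrt(s)."""
    return np.sum(nl_gradient_norm(u, W))

rng = np.random.default_rng(1)
n = 6
f = rng.standard_normal(n)                            # reference signal for the weights
W = np.exp(-(f[:, None] - f[None, :])**2 / 0.5**2)    # symmetric, nonnegative
u = rng.standard_normal(n)
V = rng.standard_normal((n, n))

lhs = np.sum(nl_gradient(u, W) * V)                   # <grad_w u, v>
rhs = -np.dot(u, nl_divergence(V, W))                 # -<u, div_w v>
print(lhs, rhs, nl_tv(u, W))
```

The symmetry of `W` is what makes the two inner products agree; with a non-symmetric weight matrix the identity would fail.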

2

Description of the Proposed Models

We propose the following nonlocal Mumford-Shah regularizers (NL/MS), obtained by applying the nonlocal operators to the approximations of the MS regularizer:
$$\Psi^{NL/MS}(u, v) = \beta \int_\Omega v^2 \phi(|\nabla_w u|^2)\,dx + \alpha \int_\Omega \Big( \epsilon |\nabla v|^2 + \frac{(v - 1)^2}{4\epsilon} \Big) dx,$$
where $\phi(s) = s$ and $\phi(s) = \sqrt{s}$ correspond to the nonlocal versions of the MSH$^1$ and MSTV regularizers, called NL/MSH$^1$ and NL/MSTV, respectively. In addition, we use these nonlocal regularizers to deblur images in the presence of Gaussian or impulse noise. Thus, by incorporating the proper fidelity term depending on the noise model, we design two types of total energies:

Gaussian noise model: $E^G(u, v) = \int_\Omega (f - k * u)^2\,dx + \Psi^{NL/MS}(u, v)$,

Impulse noise model: $E^{Im}(u, v) = \int_\Omega |f - k * u|\,dx + \Psi^{NL/MS}(u, v)$.


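Minimizing these functionals in $u$ and $v$, we obtain the Euler-Lagrange equations
$$\frac{\partial E^G}{\partial v} = \frac{\partial E^{Im}}{\partial v} = 2\beta v\,\phi(|\nabla_w u|^2) - 2\epsilon\alpha \Delta v + \alpha\,\frac{v - 1}{2\epsilon} = 0,$$

Gaussian noise model: $\dfrac{\partial E^G}{\partial u} = \tilde{k} * (k * u - f) + L^{NL/MS} u = 0$,

Impulse noise model: $\dfrac{\partial E^{Im}}{\partial u} = \tilde{k} * \mathrm{sign}(k * u - f) + L^{NL/MS} u = 0$,

where $\tilde{k}(x) = k(-x)$ and
$$L^{NL/MS} u = -2 \int_\Omega (u(y) - u(x))\,w(x, y)\,\Big[ v^2(y)\,\phi'(|\nabla_w u|^2(y)) + v^2(x)\,\phi'(|\nabla_w u|^2(x)) \Big] dy.$$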
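As a consistency check on the equation in $v$, the first variation of either energy with respect to $v$ can be worked out in one line. The test function $\psi$ and the homogeneous Neumann condition $\partial v / \partial n = 0$ on $\partial\Omega$ (used to discard the boundary term in the integration by parts) are assumptions made here for the derivation; the fidelity term does not depend on $v$ and therefore drops out.

```latex
\begin{aligned}
\frac{d}{d\tau} E(u, v + \tau\psi)\Big|_{\tau = 0}
  &= \int_\Omega 2\beta\, v\, \phi(|\nabla_w u|^2)\, \psi \, dx
   + \alpha \int_\Omega \Big( 2\epsilon\, \nabla v \cdot \nabla \psi
   + \frac{v - 1}{2\epsilon}\, \psi \Big) dx \\
  &= \int_\Omega \Big( 2\beta\, v\, \phi(|\nabla_w u|^2)
   - 2\epsilon\alpha\, \Delta v
   + \alpha\, \frac{v - 1}{2\epsilon} \Big)\, \psi \, dx .
\end{aligned}
```

Requiring this to vanish for every $\psi$ gives the pointwise condition $2\beta v\,\phi(|\nabla_w u|^2) - 2\epsilon\alpha\Delta v + \alpha\frac{v-1}{2\epsilon} = 0$, which is linear in $v$ for fixed $u$.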

The energy functionals $E^G(u, v)$ and $E^{Im}(u, v)$ are convex in each variable and bounded from below. Therefore, to solve the two Euler-Lagrange equations simultaneously, an alternating minimization approach is applied. Note that since neither energy functional is convex in the joint variable $(u, v)$, we may compute only a local minimizer. However, this is not a drawback in practice, since the initial guess for $u$ in our algorithm is the data $f$.

To extend the nonlocal methods to the impulse noise case, we need a preprocessing step for the weight function $w$, since we cannot directly use the data $f$ to compute $w$: in the presence of impulse noise, the noisy pixels tend to receive larger weights than the other neighboring points, so the noise value is likely to be kept at such pixels. Thus, we propose a simple algorithm to obtain a preprocessed image $g$, which removes the impulse noise (outliers) while preserving texture as much as possible. Basically, we use the median filter, which is well known for removing impulse noise. However, if we apply a single step of the median filter, the output may be too smooth. In order to preserve fine structures as well as to remove noise properly, we borrow the idea of Bregman iteration [6], [18], and we propose the following algorithm to obtain a preprocessed image $g$ that will be used only in the computation of the weight function $w$:

    Initialize: $r_0 = 0$, $g_0 = 0$.
    do (iterate $n = 0, 1, 2, \ldots, m$)
        $g_{n+1} = \mathrm{median}(f + r_n, [a\ a])$
        $r_{n+1} = r_n + f - k * g_{n+1}$
    while $\|f - k * g_n\|_1 > \|f - k * g_{n+1}\|_1$
    [Optional] $g_m = \mathrm{median}(g_m, [b\ b])$

where $f$ is the given noisy-blurry data and $\mathrm{median}(f, [a\ a])$ is the median filter of size $a \times a$ with input $f$; the optional step is needed when the final $g_m$ still has some salt-and-pepper-like noise. This algorithm is simple, requires only a few iterations, and takes less than 1 second for a 256 × 256 image. Moreover, the preprocessed image $g_m$ is a deblurred and denoised version of $f$; it is used only in the computation of the weights $w$, while $f$ is kept in the data-fidelity term, so that the median filter does not introduce artifacts.
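The preprocessing iteration can be sketched with NumPy alone. This is a minimal illustration under several assumptions that are not in the paper: odd filter sizes, reflective boundary handling, a symmetric blur kernel (so correlation equals convolution), an iteration cap `max_iter`, and the choice to return the last iterate that lowered the $L^1$ residual. The names `median_filter`, `blur` and `preprocess` are hypothetical helpers.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter(img, size):
    """size x size median filter (size odd) with reflective padding."""
    r = size // 2
    windows = sliding_window_view(np.pad(img, r, mode="reflect"), (size, size))
    return np.median(windows, axis=(-2, -1))

def blur(img, kernel):
    """Apply a small symmetric blur kernel with reflective padding."""
    r = kernel.shape[0] // 2
    windows = sliding_window_view(np.pad(img, r, mode="reflect"), kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def preprocess(f, kernel, a=3, b=None, max_iter=10):
    """Median filtering with Bregman-style residual feedback (sketch)."""
    r = np.zeros_like(f)
    g = np.zeros_like(f)
    err = np.abs(f - blur(g, kernel)).sum()          # ||f - k*g_0||_1
    for _ in range(max_iter):
        g_new = median_filter(f + r, a)              # g_{n+1}
        new_err = np.abs(f - blur(g_new, kernel)).sum()
        if new_err >= err:                           # residual stopped decreasing
            break
        g, err = g_new, new_err
        r = r + f - blur(g, kernel)                  # r_{n+1}
    if b is not None:                                # optional final clean-up
        g = median_filter(g, b)
    return g

rng = np.random.default_rng(0)
k = np.full((3, 3), 1.0 / 9.0)                       # hypothetical 3x3 box blur
u = np.zeros((16, 16))
u[4:12, 4:12] = 1.0
f = blur(u, k)
mask = rng.random(f.shape) < 0.1                     # about 10% impulse noise
f[mask] = rng.integers(0, 2, size=mask.sum())
g = preprocess(f, k, a=3)
```

The image `g` would then replace `f` only when computing the weights $w$; the data-fidelity term still uses `f`.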


Characterization of Minimizers. In this section we characterize the minimizers of the functionals formulated with the nonlocal regularizers, using [15, 22]. Assuming that a functional $\|\cdot\|$ on a subspace of $L^2(\Omega)$ is a semi-norm, we can define the dual norm (where $\langle \cdot, \cdot \rangle$ denotes the $L^2(\Omega)$ inner product) of $f \in L^2(\Omega) \subset L^1(\Omega)$ as
$$\|f\|_* := \sup_{\|\varphi\| \ne 0} \frac{\langle f, \varphi \rangle}{\|\varphi\|} \le +\infty,$$
so that the usual duality $\langle f, \varphi \rangle \le \|\varphi\|\,\|f\|_*$ holds for $\|\varphi\| \ne 0$. We define two functionals (here $Ku := k * u$),
$$F(u) = \lambda \int_\Omega |f - Ku|^2\,dx + |u|_{NL/TV},$$
$$G(u, v) = \int_\Omega \sqrt{|f - Ku|^2 + \eta^2}\,dx + \beta |u|_{NL/MS} + \alpha \int_\Omega \Big( \epsilon |\nabla v|^2 + \frac{|v - 1|^2}{4\epsilon} \Big) dx,$$
where $\lambda > 0$ and $|u|_{NL/MS} \in \{|u|_{NL/MSTV,v},\ |u|_{NL/MSH^1,v}\}$. We use here the notations $|u|_{NL/TV} = \int_\Omega |\nabla_w u|(x)\,dx$, $|u|_{NL/MSTV,v} = \int_\Omega v^2(x)\,|\nabla_w u|(x)\,dx$, and $|u|_{NL/MSH^1,v} = \sqrt{\int_\Omega v^2(x)\,|\nabla_w u|^2(x)\,dx}$, which are semi-norms. We modified the regularizing functional $|u|_{NL/MSH^1,v}$: the square-root term replaces the original term of our model, $\int_\Omega v^2(x)\,|\nabla_w u|^2(x)\,dx$. It is introduced here to enable the characterization of minimizers below, but the numerical calculations use the original formulation. For the proofs we refer to [12].

Proposition 1. Let $K : L^2(\Omega) \to L^2(\Omega)$ be a linear bounded blurring operator with adjoint $K^*$ and let $F$ be the associated functional. Then
(1) $\|K^* f\|_* \le \frac{1}{2\lambda}$ if and only if $u \equiv 0$ is a minimizer of $F$.
(2) Assume that $\frac{1}{2\lambda} < \|K^* f\|_* < \infty$. Then $u$ is a minimizer of $F$ if and only if
$$\|K^*(f - Ku)\|_* = \frac{1}{2\lambda} \quad \text{and} \quad \langle u, K^*(f - Ku) \rangle = \frac{1}{2\lambda}\,|u|_{NL/TV},$$
where $\|\cdot\|_*$ is the corresponding dual norm of $|\cdot|_{NL/TV}$.

Proposition 2. Let $K : L^2(\Omega) \to L^2(\Omega)$ be a linear bounded blurring operator with adjoint $K^*$ and let $G$ be the associated functional. If $(u, v)$ is a minimizer of $G$ with $v \in [0, 1]$, then
$$\Big\| K^* \frac{f - Ku}{\sqrt{(f - Ku)^2 + \eta^2}} \Big\|_* = \beta \quad \text{and} \quad \Big\langle K^* \frac{f - Ku}{\sqrt{(f - Ku)^2 + \eta^2}},\ u \Big\rangle = \beta\,|u|_{NL/MS},$$
where $\|\cdot\|_*$ is the corresponding dual norm of $|\cdot|_{NL/MS}$.

3

Experimental Results and Comparisons

The nonlocal MS regularizers proposed here, NL/MSTV and NL/MSH$^1$, are tested on several images with different blur kernels and noise types. We compare them with their traditional (local) versions, MSTV and MSH$^1$, and with the local and nonlocal total variations (TV [20], NL/TV [11]). In addition, we test the nonlocal regularizers in the impulse noise model with a preprocessing step for the weight function.


Fig. 1. Image recovery with cross sections: Gaussian blur kernel with $\sigma_b = 1$ and Gaussian noise with $\sigma_n = 5$. Top: original image and its cross section, noisy blurry image and its cross section. Middle, bottom rows: recovered images (middle) and recovered cross sections (bottom) using TV, MSTV, NL/TV, NL/MSTV. SNR for the results: TV = 32.9485, MSTV = 33.5629, NL/TV = 45.1943, NL/MSTV = 50.6618. $\beta = 0.0045$ (MSTV), 0.001 (NL/MSTV), $\alpha = 0.00000015$, $\epsilon = 0.000001$.

Fig. 2. Top: (1st, 3rd) original images, (2nd, 4th) noisy blurry images obtained using a Gaussian kernel with $\sigma_b = 1$ (2nd) and the pill-box kernel of radius 2 (4th), and then contaminated by Gaussian noise with $\sigma_n = 5$. Bottom: recovered images with SNR values: TV (14.4240), MSTV (14.4693), NL/TV (17.4165), NL/MSTV (16.5776). $\beta = 0.007$, $\alpha = 0.00000015$ (MSTV), $\beta = 0.0025$, $\alpha = 0.00000025$ (NL/MSTV), $\epsilon = 0.0000005$.


Fig. 3. Recovery of the noisy blurry image from Fig. 2. Top: recovered image $u$ using TV (SNR=25.0230), MSTV (SNR=25.1968), MSH$^1$ (SNR=23.1324). Third row: recovered image $u$ using NL/TV (SNR=26.4554), NL/MSTV (SNR=26.4696), NL/MSH$^1$ (SNR=24.7164). Second, bottom rows: corresponding residuals $f - k * u$. $\beta = 0.0045$ (MSTV), 0.001 (NL/MSTV), 0.06 (MSH$^1$), 0.006 (NL/MSH$^1$), $\alpha = 0.00000001$, $\epsilon = 0.00002$.

First, we test the Gaussian noise model in Figs. 1-3. As expected, NL/MSTV and NL/MSH$^1$ perform better than MSTV and MSH$^1$, respectively: they recover fine scales such as texture better and, in the case of NL/MSTV, the model does not produce the staircase effect that appears with MSTV. Furthermore, comparing the nonlocal MS regularizers with NL/TV, we find that NL/MSTV and NL/TV lead to similar results, both visually and according to SNR, while NL/MSH$^1$ gives a smoother image and a lower SNR. Specifically, in Fig. 1, we use a simple image and its 1D cross section. In this example, an 11 × 11 search window is sufficient for NL/MSTV to obtain its best result, while NL/TV needs a 31 × 31 window. Moreover, NL/MSTV recovers the signal much better than NL/TV, which might be because the MSTV regularizer does not penalize the jump part as much as TV. On the other hand, in Fig. 2, NL/TV produces clearer edges, leading to a higher SNR, while NL/MSTV has some artifacts near edges, especially those of the small black boxes. However, for the real boat image, there is no significant difference between them, visually or according to SNR (see Fig. 3). Fig. 3 also supports the claim that the nonlocal regularizers preserve edges and details better than the traditional local ones, because less texture is visible in the residuals $f - k * u$.

Fig. 4. Recovery of a noisy blurry image obtained using a Gaussian kernel with $\sigma = 1$ and salt-and-pepper noise with $d = 0.3$. Top: original image, blurry image, noisy-blurry image. Middle: recovered images using TV (SNR=26.9251), MSTV (SNR=27.8336), MSH$^1$ (SNR=23.2052). Bottom: recovered images using NL/TV (SNR=29.2403), NL/MSTV (SNR=29.3503), NL/MSH$^1$ (SNR=27.1477). Second column: $\beta = 0.25$ (MSTV), 0.1 (NL/MSTV), $\alpha = 0.01$, $\epsilon = 0.002$. Third column: $\beta = 2$ (MSH$^1$), 0.55 (NL/MSH$^1$), $\alpha = 0.001$, $\epsilon = 0.0001$.

Fig. 5. Comparison between MSH$^1$ and NL/MSH$^1$ with the image blurred and contaminated by a high density ($d = 0.4$) of impulse noise. Top: noisy blurry images (left) using a motion blur kernel of length 10, oriented at angle $\theta = 25^\circ$ w.r.t. the horizon, and salt-and-pepper noise with $d = 0.4$, (middle) using a Gaussian kernel with $\sigma_b = 1$ and salt-and-pepper noise with $d = 0.4$, (right) using a Gaussian kernel with $\sigma_b = 1$ and random-valued impulse noise with $d = 0.4$. Middle: recovered images using MSH$^1$, (left) SNR=17.1106, (middle) SNR=15.2017, (right) SNR=16.6960. Bottom: recovered images using NL/MSH$^1$, (left) SNR=21.2464, (middle) SNR=23.1998, (right) SNR=24.2500. First column: $\beta = 2$ (MSH$^1$), 0.4 (NL/MSH$^1$), second column: $\beta = 2$ (MSH$^1$), 1 (NL/MSH$^1$), $\alpha = 0.001$, $\epsilon = 0.0002$. Third column: $\beta = 2.5$ (MSH$^1$), 0.65 (NL/MSH$^1$), $\alpha = 0.000001$, $\epsilon = 0.002$.


Next, we recover a blurred image contaminated by impulse noise (salt-and-pepper noise or random-valued impulse noise). First, we test all the nonlocal regularizers and the corresponding local ones on the Lenna image (Fig. 4) with a Gaussian blur kernel and salt-and-pepper noise with noise density $d = 0.3$. Then we test MSH$^1$ and NL/MSH$^1$ on the Einstein image (Fig. 5) with different blur kernels and both impulse noise models, salt-and-pepper noise and random-valued impulse noise, with the same noise density $d = 0.4$. By using a preprocessed image for the weight function, all the nonlocal regularizers outperform the traditional local ones, reducing the staircase effect and recovering details better. Comparing the nonlocal regularizers, both NL/TV and NL/MSTV give better results than NL/MSH$^1$ in terms of SNR, but visually NL/MSH$^1$ looks more natural because it preserves texture and details better, especially at high noise density (see Fig. 4). Moreover, in the presence of a high density of noise, MSH$^1$ has difficulty restoring images, especially those blurred with a Gaussian kernel, while it works satisfactorily with other blur kernels such as motion blur. In contrast, NL/MSH$^1$ performs very well with Gaussian blur and also produces better results with the other blur kernels. This can be seen in Figs. 4 and 5.

In Fig. 4, with Gaussian blur and high noise density $d = 0.3$, MSH$^1$ suffers from some artifacts induced by noise, while MSTV and TV give cleaner results. On the other hand, NL/MSH$^1$ provides a visually better result than the other nonlocal regularizers by preserving the fine structures. Even though NL/MSTV gives the highest SNR, its result still looks more like a cartoon because it suppresses the textured parts, especially in the hat. So in this case, we visually prefer NL/MSH$^1$. Based on the above results, in Fig. 5 we only compare MSH$^1$ and NL/MSH$^1$, with the different blur kernels and both impulse noise models at the higher noise density $d = 0.4$. As expected, NL/MSH$^1$ produces better results than MSH$^1$ in both blur cases; especially in the Gaussian blur case, the results do not have any artifacts, unlike those of MSH$^1$.

Finally, we note that in the MS regularizers the parameters $\alpha$, $\beta$ and $\epsilon$ were selected manually to provide the best SNR results. The smoothness parameter $\beta$ increases with the noise level, while the other parameters $\alpha$, $\epsilon$ are approximately fixed. As for computational time, it takes about 5 minutes to construct the weight function of a 256 × 256 image with an 11 × 11 search window and 5 × 5 patches in MATLAB on a dual-core laptop with a 2 GHz processor and 2 GB of memory. The minimization for the (local or nonlocal) MS regularizers takes around 60 seconds for the computation of both $u$, using an explicit scheme based on the gradient descent method, and $v$, using a semi-implicit scheme, with 5 × (100 + 5) total iterations, while the (local or nonlocal) TV regularizer, using gradient descent with an explicit scheme, takes less than 55 seconds with 500 iterations.
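For readers who want to reproduce test data of this kind, the two impulse noise models can be simulated as follows. This is a sketch under assumptions not stated in the paper: the noise density $d$ is interpreted as the fraction of corrupted pixels, intensities are assumed to lie in $[\mathrm{lo}, \mathrm{hi}]$, and the function names are hypothetical.

```python
import numpy as np

def salt_and_pepper(img, d, rng, lo=0.0, hi=1.0):
    """Replace a fraction d of the pixels by lo or hi, each with probability 1/2."""
    out = img.astype(float)
    n_bad = int(round(d * out.size))
    idx = rng.choice(out.size, size=n_bad, replace=False)   # distinct pixel positions
    out.flat[idx] = rng.choice([lo, hi], size=n_bad)
    return out

def random_valued_impulse(img, d, rng, lo=0.0, hi=1.0):
    """Replace a fraction d of the pixels by values drawn uniformly from [lo, hi]."""
    out = img.astype(float)
    n_bad = int(round(d * out.size))
    idx = rng.choice(out.size, size=n_bad, replace=False)
    out.flat[idx] = rng.uniform(lo, hi, size=n_bad)
    return out

rng = np.random.default_rng(0)
img = np.full((20, 20), 0.5)                 # flat grey test image
sp = salt_and_pepper(img, 0.3, rng)
rv = random_valued_impulse(img, 0.4, rng)
print((sp != 0.5).mean(), (rv != 0.5).mean())
```

In the degradation model of the paper the noise is applied after blurring, so in an experiment these functions would be called on $k * u$ rather than on $u$.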

Acknowledgments. This work has been supported by the National Science Foundation Grants DMS-0714945 and DMS-0312222.


References

1. Alicandro, R., Braides, A., Shah, J.: Free-discontinuity problems via functionals involving the L1-norm of the gradient and their approximation. Interfaces Free Bound. 1, 17–37 (1999)
2. Alliney, S.: Digital Filters as Absolute Norm Regularizers. IEEE TSP 40(6), 1548–1562 (1992)
3. Ambrosio, L., Tortorelli, V.M.: On the approximation of free discontinuity problems. Boll. Un. Mat. Ital. 6-B, 105–123 (1992)
4. Bar, L., Sochen, N., Kiryati, N.: Semi-Blind Image Restoration via Mumford-Shah Regularization. IEEE TIP 15(2), 483–493 (2006)
5. Bar, L., Sochen, N., Kiryati, N.: Image deblurring in the presence of impulsive noise. IJCV 70, 279–298 (2006)
6. Bregman, L.M.: The relaxation method for finding common points of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics 7, 200–217 (1967)
7. Bresson, X., Chan, T.F.: Non-local unsupervised variational image segmentation models. UCLA C.A.M. Report 08-67 (2008)
8. Buades, A., Coll, B., Morel, J.M.: A review of image denoising algorithms, with a new one. SIAM MMS 4(2), 490–530 (2005)
9. Geman, D., Reynolds, G.: Constrained Restoration and the Recovery of Discontinuities. IEEE TPAMI 14(3), 367–383 (1992)
10. Gilboa, G., Osher, S.: Nonlocal linear image regularization and supervised segmentation. SIAM MMS 6(2), 595–630 (2007)
11. Gilboa, G., Osher, S.: Nonlocal operators with applications to image processing. SIAM MMS 7(3), 1005–1028 (2008)
12. Jung, M., Vese, L.A.: Image restoration via nonlocal Mumford-Shah regularizers. UCLA C.A.M. Report 09-09 (2009)
13. Kindermann, S., Osher, S., Jones, P.W.: Deblurring and denoising of images by nonlocal functionals. SIAM MMS 4(4), 1091–1115 (2005)
14. Lou, Y., Zhang, X., Osher, S., Bertozzi, A.: Image recovery via nonlocal operators. UCLA C.A.M. Report 08-35 (2008)
15. Meyer, Y.: Oscillating Patterns in Image Processing and Nonlinear Evolution Equations. Univ. Lecture Ser. 22 (2002)
16. Mumford, D., Shah, J.: Optimal approximations by piecewise smooth functions and associated variational problems. Comm. Pure Appl. Math. 42, 577–685 (1989)
17. Nikolova, M.: Minimizers of cost-functions involving non-smooth data-fidelity terms. Application to the processing of outliers. SIAM Num. Anal. 40(3), 965–994 (2002)
18. Osher, S., Burger, M., Goldfarb, D., Xu, J., Yin, W.: An iterative regularization method for total variation based image restoration. SIAM MMS 4, 460–489 (2005)
19. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D 60, 259–268 (1992)
20. Rudin, L., Osher, S.: Total variation based image restoration with free local constraints. IEEE ICIP 1, 31–35 (1994)
21. Shah, J.: A common framework for curve evolution, segmentation and anisotropic diffusion. In: IEEE CVPR, pp. 136–142 (1996)
22. Tadmor, E., Nezzar, S., Vese, L.: Multiscale hierarchical decomposition of images with applications to deblurring, denoising and segmentation. CMS 6(2), 281–307 (2008)
23. Tikhonov, A., Arsenin, V.: Solution of ill-posed problems. Wiley, New York (1977)
24. Yaroslavsky, L.P.: Digital image processing: An Introduction. Springer, Heidelberg (1985)

A Geometric PDE for Interpolation of M-Channel Data

Frank Lenzen¹ and Otmar Scherzer¹,²

¹ Department of Mathematics, University of Innsbruck, Technikerstrasse 21a, A-6020 Innsbruck, Austria
{Frank.Lenzen,Otmar.Scherzer}@uibk.ac.at
http://infmath.uibk.ac.at
² Johann Radon Institute for Computational and Applied Mathematics (RICAM), Austrian Academy of Sciences, Altenbergerstrasse 69, A-4040 Linz, Austria

Abstract. We propose a partial differential equation to be used for interpolating M-channel data, such as digital color images. This equation is derived via a semi-group from a variational regularization method for minimizing displacement errors. For actual image interpolation, the solution of the PDE is projected onto a space of functions satisfying interpolation constraints. A comparison of the test results with standard and state-of-the-art interpolation algorithms shows the competitiveness of this approach.

1

Introduction

A frequent task in image processing is interpolation, by which we mean the process of assigning an interpolating function to a discrete set of pixel positions and the corresponding discrete M-channel image data (e.g. RGB color data). Interpolation is frequently used for zooming into or scaling digital images. A special kind of image interpolation problem is inpainting, i.e. the problem of reconstructing lost or corrupted parts of images. Linear interpolation (that is, convolution methods) [18], such as nearest neighbor, spline, and Whittaker-Shannon interpolation [14, 4], is computationally efficient but produces unpleasant artifacts. Nonlinear methods that adapt to geometrical structures, on the other hand, can produce more visually attractive results but are computationally more demanding. Nowadays, most of these nonlinear methods are motivated by energy minimization or by scale spaces of partial differential equations, see for example [1, 22, 21, 18]. In particular for inpainting, such nonlinear methods are widely used, see for example [2, 5, 6, 23]. In this paper we derive a partial differential equation that is designed to correct and filter displacement errors in M-channel data. Combined with the interpolation ideas of [11, 16], this method is suited for interpolation.

The paper is organized as follows: In Section 2 we consider a variational ansatz for correcting displacement errors. Application of the semi-group concept yields a PDE, which can be considered the gradient flow of the variational problem. A relationship of our PDE to the Mean Curvature Flow (MCF) equation is established. Our approach is combined with interpolation constraints in Section 3. For comparison, we show in Section 4 results from the proposed method and from interpolation methods from the scale space literature. In particular, we take into account the GREYCstoration software of Tschumperlé [21] and the interpolation method proposed by Roussos and Maragos [18,19]. The paper ends with a conclusion in Section 5.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 413–425, 2009. © Springer-Verlag Berlin Heidelberg 2009

2

Displacement Regularization

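Let $u : \Omega \to \mathbb{R}^M$ be an M-channel function representing continuous M-channel data on a bounded open domain $\Omega \subseteq \mathbb{R}^2$. We presume the following image acquisition model: data $u^{(0)}$ of $u$ are given, which satisfy
$$u^{(0)} = u \circ \Phi, \qquad (1)$$
where $\Phi : \Omega \to \Omega$ is a displacement vector field. In the following we consider the problem of finding $(u, \Phi)$ satisfying (1) such that the displacement $\Phi - \mathrm{Id}$ is small and $u$ has minimal total variation. A variational method corresponding to this problem consists in the minimization of
$$\frac{1}{2} \int_\Omega |\Phi(x) - x|^2\,dx + \alpha \int_\Omega |\nabla u(x)|\,dx \qquad (2)$$
for small $\alpha > 0$ over the set of functions satisfying $u^{(0)} = u \circ \Phi$. Here (written out for $M = 3$)
$$\nabla u = \begin{pmatrix} \partial_1 u_1 & \partial_1 u_2 & \partial_1 u_3 \\ \partial_2 u_1 & \partial_2 u_2 & \partial_2 u_3 \end{pmatrix} \quad \text{and} \quad |\nabla u(x)| = \Big( \sum_{j=1}^{M} \sum_{i=1}^{2} (\partial_i u_j(x))^2 \Big)^{1/2}.$$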

We want to avoid solving a coupled system for $(u, \Phi)$, and therefore we assume that $u$ is a smooth function, so that we can make a first-order Taylor series expansion. It then follows from our modeling assumptions that
$$u^{(0)}(x) = (u \circ \Phi)(x) = u(x + (\Phi(x) - x)) \approx u(x) + \nabla u^T(x)\,(\Phi(x) - x). \qquad (3)$$
Here, $\approx$ symbolizes that the right-hand side approximates the left-hand side for small displacements $\Phi - \mathrm{Id}$. In the following, we assume that equality holds instead of $\approx$, which implies that only small displacements occur. Note that the equation $\nabla u^T(x)(\Phi(x) - x) = u^{(0)}(x) - u(x)$ for the unknown $\Phi(x) - x$ is overdetermined whenever $M > 2$. If the difference $u^{(0)}(x) - u(x)$ is not caused solely by a distortion $\Phi$, a solution to this problem might not exist. To overcome this, we consider the minimization of
$$\big| \nabla u^T(x)(\Phi(x) - x) - u^{(0)}(x) + u(x) \big|^2, \qquad (4)$$
that is, we search for the displacement vector $\Phi(x) - x$ that best fits the data $(u^{(0)}(x), u(x))$. The minimizer of (4) is given by
$$\Phi(x) - x = (\nabla u^T(x))^\dagger\,(u^{(0)}(x) - u(x)), \qquad (5)$$

where $(\nabla u^T(x))^\dagger$ denotes the pseudo-inverse (see [17]) of $\nabla u^T(x)$. For notational convenience, we omit the dependence of $u$ on $x$ in the following. Inserting (5) into (2) gives the functional
$$F^0_{u^{(0)}}(u) := \int_\Omega \frac{1}{2} (u - u^{(0)})^T (\nabla u^T \nabla u)^\dagger (u - u^{(0)}) + \alpha |\nabla u|\,dx. \qquad (6)$$
In order to avoid computation of the pseudo-inverse, we additionally regularize the possibly singular matrix $\nabla u^T \nabla u$ by the regular, symmetric, and strictly positive definite matrix $(\varepsilon I + \nabla u^T \nabla u)$ with some $\varepsilon > 0$. To summarize, we consider in the sequel the variational problem of minimizing
$$F^\varepsilon_{u^{(0)}}(u) := \int_\Omega \frac{1}{2} (u - u^{(0)})^T (\varepsilon I + \nabla u^T \nabla u)^{-1} (u - u^{(0)}) + \alpha |\nabla u|\,dx. \qquad (7)$$
For this functional, existence theory within the classical framework of the Calculus of Variations [7, 8] is not applicable. Moreover, for a theoretical analysis, minimization has in fact to be considered over the space of M-channel functions with components of finite total variation. In order to implement the minimization of $F^\varepsilon_v$ numerically, quasi-convexification techniques would be most efficient. This approach requires the analytical calculation of the quasi-convex envelope of the function
$$(x, \xi, \nu) \mapsto \frac{1}{2} (\xi - v(x))^T (\varepsilon I + \nu^T \nu)^{-1} (\xi - v(x)) + \alpha |\nu|$$
with respect to $\nu$. However, the quasi-convex envelope is not known so far, and thus efficient numerical minimization based on this approach is not at hand.

In the following we recall the convex semi-group solution concept [3]: Let $R : H \to \mathbb{R} \cup \{\infty\}$ be a convex functional on a Hilbert space $H$, and let $u_\alpha$ be a minimizer of the variational regularization functional
$$G_{u^{(0)}}(u) := \frac{1}{2} \| u - u^{(0)} \|_H^2 + \alpha R(u).$$
Then, for $u^{(0)}$ sufficiently smooth, $(u_\alpha - u^{(0)})/\alpha$ converges for $\alpha \to 0$ to an element of $-\partial R(u^{(0)})$, where $\partial R$ denotes the subgradient of $R$. Choosing $u^{(k)} \in \arg\min G_{u^{(k-1)}}$, iterative minimization of $G_{u^{(k)}}$ yields an approximation of the solution of the flow
$$\frac{\partial u}{\partial t} \in -\partial R(u)$$
at scale $t = k\alpha$. In other words, variational regularization approximates a diffusion filtering scale space, which is the associated gradient flow equation. For convex semi-groups the solutions of diffusion filtering and variational methods are comparable and look rather similar [20]. We expect a similar behavior for the non-convex functional $F^\varepsilon_{u^{(0)}}$ and derive the corresponding flow equation, which is the gradient flow associated with (7). We use the abbreviations

−1 Aε (u) := εI + ∇uT ∇u and 1 (u − u(k−1) )T Aε (u)(u − u(k−1) ) dx . Suε (k−1) (u) := 2 Ω The directional derivative of Suε (k−1) at u in direction φ (provided it exists) satisﬁes ∂τ Suε (k−1) (u + τ φ) = φT Aε (u)(u − u(k−1) ) dx+ Ω (8) 1 (k−1) T ε (k−1) (u − u ) ∂u,φ A (u) (u − u ) dx , 2 Ω where

Aε (u + τ φ) − Aε (u) . τ →0 τ In a similar way, the directional derivative of Rα (u) := α Ω |∇u| at u in direction φ can be derived in a formal way: ∇u ∂τ Rα (u + τ φ) = α dx. (9) ∇φT |∇u| Ω ∂u,φ Aε (u) := lim

Note that the right hand side of (9) is meant as the subdiﬀerential of the TV semi-norm evaluated in the direction of φ. Using (8) and (9), the optimality condition for the minimizer u(k) of Fuε (k−1) reads as u(k) − u(k−1) dx φT Aε (u(k) ) α Ω 1 (u(k) − u(k−1) )T (10) ∂u(k) ,φ Aε (u(k) ) (u(k) − u(k−1) ) dx + 2 Ω α ∇u(k) dx. ∇φT =− |∇u(k) | Ω Let t > 0 be ﬁxed and k = t/α, then, as in the convex case, we can expect that (u(k) − u(k−1) )/α converges to ∂t u(t) for α → 0. From this it follows then that u(k) − u(k−1) → 0, and from (10) it follows that ∇u(t) dx. (11) φT Aε (u(t))∂t u(t) dx = − ∇φT |∇u(t)| Ω Ω Using Green’s formula and the fundamental lemma, from (11) the strong formulation

−1 ∇u(t) , (12) ∂t u(t) = ∇ · Aε (u(t))∂t u(t) = εI + ∇uT (t)∇u(t) |∇u(t)| follows, where u(t) satisﬁes natural (Neumann) boundary conditions.
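The derivative $\partial_{u,\phi} A^\varepsilon(u)$ appearing in (8) can be written out with the standard rule for differentiating a matrix inverse. The following is only a formal sketch, assuming $u$ and $\phi$ are smooth enough for all expressions to exist; the shorthand $B(u)$ is introduced here purely for illustration and is not part of the original notation.

\[ B(u) := \varepsilon I + \nabla u^T \nabla u , \qquad A^\varepsilon(u) = B(u)^{-1} , \]
\[ \partial_{u,\phi} B(u) = \nabla\phi^T \nabla u + \nabla u^T \nabla\phi , \]
\[ \partial_{u,\phi} A^\varepsilon(u) = - A^\varepsilon(u)\, \big( \nabla\phi^T \nabla u + \nabla u^T \nabla\phi \big)\, A^\varepsilon(u) . \]

With this expression, the second integral in (10) can be rewritten as $\alpha/2$ times an integral that is quadratic in $(u^{(k)} - u^{(k-1)})/\alpha$. If the difference quotient stays bounded, this term is formally of order $\alpha$ and drops out in the limit, which is consistent with its absence from (11).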


In the following, we leave out the dependence of $u$ on $t$ for notational convenience. Multiplying both sides of (12) by $(\varepsilon I + \nabla u^T \nabla u)$, we get

\[ \partial_t u = (\varepsilon I + \nabla u^T \nabla u)\, \nabla \cdot \left( \frac{\nabla u}{|\nabla u|} \right) . \tag{13} \]

Moreover, the initial condition associated with the flow is $u(0) := u^{(0)}$. Now, letting $\varepsilon \to 0$, which only seems to make sense mathematically if $M \le 2$, we obtain the evolutionary partial differential equation

\[ \partial_t u = (\nabla u^T \nabla u)\, \nabla \cdot \left( \frac{\nabla u}{|\nabla u|} \right) . \tag{14} \]

Remark 1. For scalar data ($M = 1$) the equation (14) reads as

\[ \partial_t u = |\nabla u|^2\, \nabla \cdot \left( \frac{\nabla u}{|\nabla u|} \right) . \tag{15} \]

One recognizes that (15) differs from the Mean Curvature Flow equation by the leading factor $|\nabla u|^2$ instead of $|\nabla u|$.

We generalize the functional in (6) to

\[ \int_\Omega \Big( \tfrac{1}{2}\,(u - u^{(0)})^T \big( (\nabla u^T \nabla u)^p \big)^\dagger (u - u^{(0)}) + \alpha\,|\nabla u| \Big)\,dx \tag{16} \]

with $p \ge 0$. We note that the power of a matrix is defined via spectral decomposition. The case $p = 1/2$ is of particular interest, because
– the functional (16) becomes invariant under affine rescaling of the image brightness.
– the semi-group approach (see also [10] for the scalar case) gives the gradient flow

\[ \partial_t u = (\nabla u^T \nabla u)^{1/2}\, \nabla \cdot \left( \frac{\nabla u}{|\nabla u|} \right) , \]

which, in the scalar case, is the Mean Curvature Flow equation.

For an analytical comparison of the solution of (16) for scalar, radially symmetric, monotone data to the MCF solution we refer to [9].
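To make the structure of the flow (13) concrete, the following NumPy sketch evaluates its right-hand side on a pixel grid and performs one explicit time step. It is a minimal illustration under assumptions not taken from the paper: the function names `flow_rhs` and `explicit_step` are hypothetical, derivatives are approximated with `np.gradient` (central differences, one-sided at the border, rather than a dedicated Neumann discretization), and a small constant `tiny` guards the division by $|\nabla u|$.

```python
import numpy as np

def flow_rhs(u, eps=0.05, tiny=1e-8):
    """Right-hand side of (13) for an M-channel image u of shape (M, H, W).

    Assumption: finite differences via np.gradient; 'tiny' avoids division
    by zero where the gradient vanishes.
    """
    u = np.asarray(u, dtype=float)
    gy, gx = np.gradient(u, axis=(1, 2))              # per-channel partial derivatives
    # |grad u|: Frobenius norm over all channels and both directions
    norm = np.sqrt((gx**2 + gy**2).sum(axis=0) + tiny)
    # per-channel divergence of grad u / |grad u|
    div = np.gradient(gx / norm, axis=2) + np.gradient(gy / norm, axis=1)
    # M x M matrix (grad u)^T (grad u) at every pixel, plus eps * I
    B = np.einsum("mhw,lhw->mlhw", gx, gx) + np.einsum("mhw,lhw->mlhw", gy, gy)
    B = B + eps * np.eye(u.shape[0])[:, :, None, None]
    # apply the matrix to the divergence vector at every pixel
    return np.einsum("mlhw,lhw->mhw", B, div)

def explicit_step(u, dt=0.03, eps=0.05):
    """One forward Euler step of (13)."""
    return u + dt * flow_rhs(u, eps)
```

For $M = 1$ and `eps=0` the matrix reduces to the scalar $|\nabla u|^2$, so the sketch evaluates the right-hand side of (15).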

3 Interpolation of M-Channel Data

The evolution equation (14) can be used for interpolating discrete M-channel data by restricting u to satisfy interpolation constraints. The problem of interpolating M-channel data has been studied before, see for example [1, 21, 18, 19]. Our approach differs from those of [21, 18, 19] in the PDE used for filtering: [21, 18, 19] use anisotropic diffusion, whereas the PDE (14) generalizes the Mean Curvature Flow equation.


To begin with, we recall the interpolation constraints proposed in [11, 16]. For simplicity of notation we restrict ourselves to M-channel data defined on a two-dimensional rectangular domain

\[ \Omega := \left( \tfrac{1}{2},\, N_x + \tfrac{1}{2} \right) \times \left( \tfrac{1}{2},\, N_y + \tfrac{1}{2} \right) , \]

where $N_x, N_y \in \mathbb{N}$. The domain is partitioned into cells ('pixels')

\[ Q_{i,j} := \left( i - \tfrac{1}{2},\, i + \tfrac{1}{2} \right) \times \left( j - \tfrac{1}{2},\, j + \tfrac{1}{2} \right) , \qquad (i, j) = (1, 1), (1, 2), \ldots, (N_x, N_y) . \]

Let $G$ be a kernel function defined on $\mathbb{R}^2$ and compactly supported in $[-\tfrac{1}{2}, \tfrac{1}{2}]^2$. Let $Z := (z_{m,i,j})$ be a tensor which denotes sampled data of a function $G * u : \mathbb{R}^2 \to \mathbb{R}^M$ at the positions $(i, j)$. Here $*$ denotes the convolution operator. In particular:

\[ z_{m,i,j} := (G * u_m)(i, j) , \qquad (m, i, j) = (1, 1, 1), (1, 1, 2), \ldots, (M, N_x, N_y) . \tag{17} \]

Examples of kernel functions typically used in the literature are listed in [18]. We rewrite (17) as follows: Let $G_{i,j} := G(\cdot - (i, j))$, then

\[ z_{m,i,j} = \langle G_{i,j}, u_m \rangle_{L^2(\Omega)} , \qquad (m, i, j) = (1, 1, 1), \ldots, (M, N_x, N_y) . \]

We say that an M-channel function $u = (u_1, \ldots, u_M)$ satisfies the interpolation constraints for some discrete data $Z = (z_{m,i,j})$, if $\langle G_{i,j}, u_m \rangle_{L^2(\Omega)} = z_{m,i,j}$. The set of functions satisfying the interpolation constraints for data $Z$ is denoted by $U_{Z,G}$.

Example 1. We consider for $G$ the two-dimensional $\delta$ distribution, i.e., $G(x, y) = \delta(x)\delta(y)$. Then $z_{m,i,j} = u_m((i, j))$. The nearest neighbor (componentwise, piecewise constant) interpolation reads as

\[ u^{(0)}_m \big|_{Q_{i,j}} = z_{m,i,j} , \qquad (m, i, j) = (1, 1, 1), \ldots, (M, N_x, N_y) . \]

Here, $u^{(0)} = u \circ \Phi$, where $\Phi(x, y)|_{Q_{i,j}} = (i, j)$. In particular $u$ can be interpreted as a distortion of $u^{(0)}$ by a local sampling displacement $\Phi$.

Now let $u^{(0)} \in U_{Z,G}$ be arbitrary. The nearest neighbor interpolation in Example 1 motivates the assumption that, for a sampled function $u$, there exists $\Phi$ such that $u^{(0)} = u \circ \Phi$. Recalling the concepts presented in Section 2 we consider the


functional defined in (7) restricted to the set $U_{Z,G}$ in order to reconstruct $u$ from given $u^{(0)}$. In turn, we restrict the flow equation (13) to $U_{Z,G}$:

\[ \partial_t u = P_{U_{0,G}} \left( (\varepsilon I + \nabla u^T \nabla u)\, \nabla \cdot \left( \frac{\nabla u}{|\nabla u|} \right) \right) , \tag{18} \]

where

\[ P_{U_{0,G}}(v) = v - \| G \|^{-2}_{L^2(\mathbb{R}^2)} \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} \langle G_{i,j}, v \rangle_{L^2(\Omega)}\, G_{i,j} \]

is applied on each component separately. Note that the assumption $u^{(0)} \in U_{Z,G}$ together with $\partial_t u \in U_{0,G}$ ensures that the solution $u(t)$, $t \ge 0$, stays in $U_{Z,G}$. At this point we remark that there is no analytical theory guaranteeing the well-posedness of (18). Since (18) comprises a projection, a time-explicit scheme with sufficiently small step size $\Delta t$ is required in order to solve (18) numerically.
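The projection $P_{U_{0,G}}$ has a simple discrete counterpart when each cell $Q_{i,j}$ is resolved by an $s \times s$ block of fine-grid samples and the kernel is supported inside one cell. The sketch below is an illustration under these assumptions only; the function name `project_onto_U0` and the sampled kernel array `G` are hypothetical, and the quadrature weight of the discrete inner product is omitted because it cancels between the inner product and $\|G\|^{-2}$.

```python
import numpy as np

def project_onto_U0(v, G):
    """Discrete analogue of P_{U_{0,G}} for one channel.

    v: (H, W) array on the fine grid, H and W multiples of s.
    G: (s, s) samples of the kernel on a single cell (assumed square).
    Because the shifted kernels have disjoint supports, the projection
    acts independently on every s-by-s block.
    """
    s = G.shape[0]
    H, W = v.shape
    blocks = v.reshape(H // s, s, W // s, s)            # indices (i, a, j, b)
    # <G_{i,j}, v> / ||G||^2 for every cell (i, j)
    coeff = np.einsum("iajb,ab->ij", blocks, G) / np.sum(G**2)
    # subtract the component along G_{i,j} inside each cell
    out = blocks - coeff[:, None, :, None] * G[None, :, None, :]
    return out.reshape(H, W)
```

In an explicit scheme for (18), one would apply such a projection channel by channel to the discretized right-hand side before adding $\Delta t$ times the result to the current iterate, so that the sampled values $\langle G_{i,j}, u_m \rangle$ remain unchanged.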

4 Numerical Results

We compare our method, which consists of numerically solving (18), to two standard interpolation methods, namely nearest neighbor and cubic interpolation, as well as to established, sophisticated interpolation methods proposed by Tschumperlé & Deriche [21] and by Roussos & Maragos [19]. The method of Tschumperlé & Deriche is implemented in the GREYCstoration software (see http://cimg.sourceforge.net/greycstoration/); for the method of Roussos & Maragos, test results are available from the site http://cvsp.cs.ntua.gr/~tassos/PDEinterp/ssvm07res/. In our method, the kernel function has to be chosen appropriately. We use

\[ G(x, y) := \frac{1}{\int_{[-\frac{1}{2}, \frac{1}{2}]^2} g_\sigma(x, y)\,dx\,dy}\; \chi_{[-\frac{1}{2}, \frac{1}{2}]^2}(x, y)\; g_\sigma(x, y) , \]

where $g_\sigma$ is the two-dimensional isotropic Gaussian of standard deviation $\sigma$. In our method a value of 20 is used for the variance $\sigma^2$.

For evaluating the methods, we use the two test images shown in Fig. 1. For both images, a low and a high resolution version is available, where the low resolution image is obtained from the high resolution image via low-pass filtering (convolution with a bicubic spline) and downsampling by a factor of four, see [19]. The test images were provided by Roussos & Maragos.

The methods mentioned above are used to upsample the low resolution image by a factor of four. Our method is applied with 100 time steps, $\Delta t = 0.03$, $\varepsilon = 0.05$ and $\sigma^2 = 20$ for the first test image, and with 100 time steps, $\Delta t = 0.05$, $\varepsilon = 0.01$ and $\sigma^2 = 20$ for the second test image. For GREYCstoration (version 2.9) we use the option '-resize' together with the target size of the high resolution image and the parameters '-anchor true', '-iter 3' and '-dt 10'. For the remaining parameters


Fig. 1. Two test images. Each test image is available in a low and a high resolution version, with a factor of four between the two resolutions.

the default values are used. The results of Roussos' method were obtained from the web site mentioned above.

Let us consider the results of upsampling the first test image. In order to highlight the differences between the methods, we compare only details of the resulting images, see Fig. 2. The results with nearest neighbor and cubic interpolation are shown in Fig. 2, top right and middle left, respectively. Both results are unsatisfactory and confirm what is well known from the literature: nearest neighbor interpolation makes the upsampled images look blocky, and cubic interpolation produces blurry images. The result of GREYCstoration with interpolation constraints (Fig. 2, middle row right) also appears blurry, but reconstructs the edges in the image better than cubic interpolation does. The method proposed by Roussos & Maragos as well as our method (see Fig. 2, bottom row) produce sharp and well reconstructed edges.

In order to further investigate the differences between the PDE based methods, we zoom into two regions of the second test image, one region containing an edge (see Fig. 3) and one region with texture (see Fig. 4). Fig. 3 shows the edge region after applying the methods proposed by Tschumperlé with interpolation constraints (top row, second left), Roussos (top row, second right) and our method (top row, right). For comparison we have plotted the detail of the original image (top row, left). One can see that with Tschumperlé's method the edges appear blurry and irregular. This seems to be an effect of the interpolation constraints, because when Tschumperlé's method is applied without constraints, strong anisotropic diffusion along the edge occurs so that the edge becomes more regular. With the method of Roussos the edge is reconstructed sharply, but overshoots appear. Our method is also able to reconstruct the edge sharply, but with only small overshoots.
Concerning the gray mark at the parrot’s beak, we observe that Tschumperlé’s method reconstructs the shape of the mark better than the other methods do. The diﬀerences in the behavior of the methods can also be recognized when applying the Sobel operator to the interpolated images: The thickness of the edges in the result of the Sobel operator indicates the blurriness of the reconstructed edge. We see that the proposed method produces sharper edges than the


Fig. 2. Upsampling by a factor of four, detail of the first test image. Top left: original high resolution image, top right: nearest neighbor interpolation, middle left: cubic interpolation, middle right: interpolation using GREYCstoration, bottom left: interpolation method proposed by Roussos et al., bottom right: proposed interpolation method.

method by Roussos and more regular edges than the method by Tschumperlé. The overshoots introduced by Roussos' method can also be observed in the outcome of the Sobel operator. They are far stronger than the overshoots produced by our method.

Now we investigate the effect of the interpolation methods on textures. Fig. 4, top left, shows a textured region of the original image. The results of the methods proposed by Tschumperlé (with interpolation constraints) and Roussos are given in Fig. 4, top right and bottom left, respectively. The result of the proposed method is shown in Fig. 4, bottom right. One observes a certain blurriness


Fig. 3. Detail of an edge in the original and interpolated images (top row, using GREYCstoration with interpolation constraints, Roussos’ method, and the proposed method) and subsequently applied Sobel operator (bottom row)

in the results by Tschumperlé's method. As for the previous result, we point out that incorporating the interpolation constraints seems to have a strong effect on the result. When applying GREYCstoration without imposing constraints, the results are much more influenced by the anisotropic diffusion, and the edges and the texture are accentuated. In the result of the interpolation method proposed by Roussos, we see a strong effect of the anisotropic diffusion on the texture, so that the result is more visually appealing than the other results. Nevertheless, a comparison with the original image shows that original and reconstructed texture differ significantly. In particular the orientations of the short stripes in the face of the parrot are different. Note that the anisotropic diffusion induced by the direction of the texture also affects the pupil of the parrot. Regarding the result of our method, we remark that the reconstruction of the texture is quite conservative, i.e., we stay near the initial guess. The blockiness is slightly reduced by the evolution process. Taking a look at the eye of the parrot, the relation of our


Fig. 4. A texture detail of the original (top left) and interpolated images using GREYCstoration (top right), Roussos’ method (bottom left) and the proposed method (bottom right)

method to Mean Curvature Flow can be observed: The pupil is reconstructed as a perfectly circular shape.

5 Conclusion

We have proposed a new PDE based method for the interpolation of color images. The method differs from other state-of-the-art methods in the underlying evolution process. We use a PDE which is a generalized Mean Curvature Flow, whereas other methods are based on anisotropic diffusion. Interpolation constraints are satisfied by projecting the evolution process onto an adequate function space. Numerical tests show that our method is competitive with state-of-the-art interpolation methods. Due to the Mean Curvature Flow nature of the method, edges are well reconstructed. Textures are treated in a conservative manner.


Acknowledgments We want to thank Gerhard Dziuk (Univ. Freiburg), Peter Elbau (RICAM, Linz) and Markus Grasmair (University Innsbruck) for inspirational discussions. We thank David Tschumperlé for providing GREYCstoration and Anastasios Roussos and Petros Maragos for providing the test images as well as the results of their algorithm. The work of O.S. is partially funded by the project FSP S 92 (subproject 9203-N12).

References

1. Belahmidi, A., Guichard, F.: A partial differential equation approach to image zoom. In: Proc. of the 2004 Int. Conf. on Image Processing, pp. 649–652 (2004)
2. Bertalmio, M., Sapiro, G., Caselles, V., Ballester, C.: Image inpainting. In: [13], pp. 417–424 (2000)
3. Brézis, H.: Opérateurs maximaux monotones et semi-groupes de contractions dans les espaces de Hilbert. North-Holland Publishing Co., Amsterdam (1973); North-Holland Mathematics Studies, No. 5. Notas de Matemática (50)
4. Burger, W., Burge, M.J.: Digitale Bildverarbeitung. Springer, Heidelberg (2005)
5. Chan, R., Setzer, S., Steidl, G.: Inpainting by flexible Haar wavelet shrinkage. Preprint, University of Mannheim (2008)
6. Chan, T., Kang, S., Shen, J.: Euler's elastica and curvature based inpaintings. SIAM J. Appl. Math. 63(2), 564–592 (2002)
7. Dacorogna, B.: Weak Continuity and Weak Lower Semicontinuity of Non-Linear Functionals. Lecture Notes in Mathematics, vol. 922. Springer, Heidelberg (1982)
8. Dacorogna, B.: Direct Methods in the Calculus of Variations. Applied Mathematical Sciences, vol. 78. Springer, Berlin (1989)
9. Elbau, P., Grasmair, M., Lenzen, F., Scherzer, O.: Evolution by non-convex energy functionals. Reports of FSP S092 - Industrial Geometry 75, University of Innsbruck, Austria (submitted) (2008)
10. Grasmair, M., Lenzen, F., Obereder, A., Scherzer, O., Fuchs, M.: A non-convex PDE scale space. In: [15], pp. 303–315 (2005)
11. Guichard, F., Malgouyres, F.: Total variation based interpolation. In: Proceedings of the European Signal Processing Conference, vol. 3, pp. 1741–1744 (1998)
12. Hagen, H., Weickert, J. (eds.): Visualization and Processing of Tensor Fields. Mathematics and Visualization. Springer, Heidelberg (2006)
13. Hoffmeyer, S. (ed.): Proceedings of the Computer Graphics Conference 2000 (SIGGRAPH 2000). ACM Press, New York (2000)
14. Jähne, B.: Digitale Bildverarbeitung, 5th edn. Springer, Heidelberg (2002)
15. Kimmel, R., Sochen, N.A., Weickert, J. (eds.): Scale-Space 2005. LNCS, vol. 3459. Springer, Heidelberg (2005)
16. Malgouyres, F., Guichard, F.: Edge direction preserving image zooming: a mathematical and numerical analysis. SIAM J. Numer. Anal. 39, 1–37 (2001)
17. Nashed, M.Z. (ed.): Generalized inverses and applications. Academic Press/Harcourt Brace Jovanovich Publishers, New York (1976)
18. Roussos, A., Maragos, P.: Vector-valued image interpolation by an anisotropic diffusion-projection pde. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 104–115. Springer, Heidelberg (2007)
19. Roussos, A., Maragos, P.: Reversible interpolation of vectorial images by an anisotropic diffusion-projection pde. In: Special Issue for the SSVM 2007 conference. Springer, Heidelberg (2007) (accepted for publication)
20. Scherzer, O., Weickert, J.: Relations between regularization and diffusion filtering. J. Math. Imaging Vision 12(1), 43–63 (2000)
21. Tschumperlé, D.: Fast anisotropic smoothing of multi-valued images using curvature-preserving pde's. International Journal of Computer Vision (IJCV) 68, 65–82 (2006)
22. Tschumperlé, D., Deriche, R.: Vector valued image regularization with pdes: A common framework for different applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (2005)
23. Weickert, J., Welk, M.: Tensor field interpolation with pdes. In: [12], pp. 315–325 (2006)

An Edge-Preserving Multilevel Method for Deblurring, Denoising, and Segmentation

Serena Morigi¹, Lothar Reichel², and Fiorella Sgallari¹

¹ Dept. of Mathematics-CIRAM, University of Bologna, Bologna, Italy
{morigi,sgallari}@dm.unibo.it
² Dept. of Mathematical Sciences, Kent State University, Kent, OH 44242, USA
[email protected]

Abstract. We present a fast edge-preserving cascadic multilevel image restoration method for reducing blur and noise in contaminated images. The method also can be applied to segmentation. Our multilevel method blends linear algebra and partial diﬀerential equation techniques. Regularization is achieved by truncated iteration on each level. Prolongation is carried out by nonlinear edge-preserving and noise-reducing operators. A thresholding updating technique is shown to reduce “ringing” artifacts. Our algorithm combines deblurring, denoising, and segmentation within a single framework.

1 Introduction

Digital image restoration, reconstruction, and segmentation are important in medical and astronomical imaging, film restoration, as well as in image and video coding. This paper introduces a cascadic multilevel method for simultaneous restoration and segmentation of blurred and noisy images. Blur arises for many reasons, including out-of-focus cameras, and camera or object motion during exposure. Blur often is modeled by a point-spread function (PSF). Noise is the random, unwanted variation in brightness of an image. It may originate from, e.g., film grain or electronic noise from a digital camera or scanner. We consider additive noise in this work.

It is well known that linear deblurring methods tend to introduce oscillatory artifacts. Variational deblurring methods are able to reduce these artifacts; however, they typically are much more computationally intensive than linear methods; see, e.g., Welk et al. [15] for a discussion.

Many segmentation methods apply curve evolution techniques. These methods seek to detect object boundaries, represented by closed curves in an image. The contours are represented as the zero level set of an implicit function defined in higher dimension. The active contours evolve in time according to a Partial Differential Equation (PDE) model, which takes into account intrinsic geometric measures of the image. We will use a variant proposed by Li et al. [7] of the well-known Geodesic Active Contours (GAC) model [2].

This paper discusses a cascadic multilevel image restoration method that allows both spatially variant and spatially invariant PSFs. The method requires the solution of a linear system of equations on each level. These systems are solved by an iterative method, the choice of which depends on properties of the PSF. We introduce a thresholding updating strategy in order to suppress "ringing." The restriction operators are defined by solving local weighted least-squares problems, and the prolongation operators are determined by piecewise linear prolongation followed by integrating a discretized nonlinear Perona-Malik diffusion equation for a few time-steps. The purpose of the integration is to reduce noise. The cascadic multilevel method so obtained shares the computational efficiency and simplicity of truncated iteration for the solution of linear discrete ill-posed problems with the edge-preserving property of nonlinear models.

The multilevel method proceeds from coarser to finer levels, and regularizes by truncated iteration on each level. For many image restoration problems, the multilevel method demands fewer matrix-vector product evaluations on the finest level than the corresponding 1-level truncated iterative method, and often determines restorations of higher quality. A benefit of our multilevel approach to image restoration is that it easily can be combined with image segmentation, as is illustrated in the present paper. We remark that our multilevel method differs significantly from multilevel methods for the solution of well-posed boundary value problems for elliptic partial differential equations in that prolongation and restriction operators, as well as the number of iterations on each level, are chosen in a different manner.

This paper is organized as follows. Section 2 introduces the variational deblurring and the denoising model, Section 3 discusses the cascadic multilevel framework, and Section 4 presents a few computed examples. Concluding remarks can be found in Section 5.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 426–438, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 Deblurring, Denoising, and Segmentation of Images

We consider the restoration of two-dimensional gray-scale images, which have been contaminated by blur and noise. The available observed blur- and noise-contaminated image $f^\delta$ is related to the unavailable blur- and noise-free image $\hat{u}$ by the degradation model

\[ f^\delta(x) = \int_\Omega h(x, y)\, \hat{u}(y)\,dy + \eta^\delta(x) , \qquad x \in \Omega , \tag{1} \]

where $\Omega \subset \mathbb{R}^2$ is the image domain, $\eta^\delta$ represents noise in the data, and the kernel $h(x, y)$ models the PSF. If the blur is spatially invariant, then $h$ is of the form $h(x, y) = \tilde{h}(x - y)$ for some function $\tilde{h}$. The kernel is smooth or piecewise smooth and, therefore, the integral operator is compact. It follows that the solution of (1) is an ill-posed problem; see, e.g., Engl et al. [3] and Hansen [5] for discussions on ill-posed problems and their numerical solution.

We would like to determine an accurate approximation of $\hat{u}$ when the observed image $f^\delta$ and the kernel $h$, but not the noise $\eta^\delta$, are known. A popular approach to achieving this is to minimize the functional

\[ E(u) = \int_\Omega \left( \frac{1}{2} \left( \int_\Omega h(x, y)\, u(y)\,dy - f^\delta(x) \right)^{2} + \rho\, R(u(x)) \right) dx , \tag{2} \]


where $\rho > 0$ is a regularization parameter and

\[ R(u) = \psi(|\nabla u|^2) \tag{3} \]

is a regularization operator. Here $\psi$ is a differentiable monotonically increasing function and $\nabla u$ denotes the gradient of $u$; see, e.g., Rudin et al. [11] and Welk et al. [15] for discussions on this kind of regularization operator. The Euler-Lagrange equation associated with (2), supplied with a gradient descent which yields a minimizer as "time" $t \to \infty$, is given by

\[ \frac{\partial u}{\partial t}(t, z) = - \int_\Omega h(x, z) \left( \int_\Omega h(x, y)\, u(t, y)\,dy - f^\delta(x) \right) dx + \rho\, D(u(t, z)) , \tag{4} \]

for $z \in \Omega$ and $t \ge 0$. The initial function $u(0, z) = f^\delta(z)$, $z \in \Omega$, and suitable boundary conditions are used. We also refer to $D$ as a regularization operator. Image restoration methods based on the Euler-Lagrange equation require that the regularization operator $D$, as well as values of the regularization parameter $\rho$ and a suitable finite time-interval of integration $[0, T]$, be chosen. The determination of suitable values of $\rho$ and $T$ generally is not straightforward. We get from (3) that

\[ D(u) = \operatorname{div}\big( g(|\nabla u|^2)\, \nabla u \big) , \qquad g(t) = d\psi(t)/dt . \tag{5} \]

The function $g$ is referred to as the diffusivity. Perona-Malik regularization is obtained by choosing the diffusivity

\[ g(s) = \frac{1}{1 + s/\sigma} , \tag{6} \]

where $\sigma$ is a positive constant; see [14]. Alternatively, one can use a regularization operator of total variation-type.

Nonlinear models based on (4)-(6) can provide denoising and deblurring of good quality; however, their time-integration is computationally demanding: explicit methods require many tiny time-steps and therefore are expensive, while each time-step with an implicit or semi-implicit method is, in general, expensive even if it could be accelerated by multigrid techniques. A much cheaper and simpler approach to determining an approximation of the desired image $\hat{u}$ is to apply a few steps of an iterative method to the linear system of equations obtained by a discrete approximation of (1),

\[ A u = b^\delta , \qquad A \in \mathbb{R}^{n \times n} , \qquad u, b^\delta \in \mathbb{R}^n . \tag{7} \]

Here $A$ is a discrete blurring operator and $b^\delta$ represents the available blur- and noise-contaminated image. In applications typically $b^\delta$, rather than $f^\delta$, is available; see [5] for details.

Approximate solutions of (7) conveniently can be computed by Krylov subspace iterative methods, where the choice of method depends on the matrix properties. For instance, spatially variant blur often gives rise to a nonsymmetric matrix $A$, and we may use the LSQR Krylov subspace method [13] to solve


(7). This method is an implementation of the conjugate gradient method applied to the normal equations. When the matrix is symmetric, but possibly indeﬁnite, the MR-II [4] Krylov subspace method is an attractive alternative to LSQR. The iteration number may be considered a discrete regularization parameter. It is important not to carry out too many iterations in order to avoid severe error propagation. This approach to determining a restored image is referred to as regularization by truncated iteration; see, e.g., [3, 4, 8] for discussions. Due to cut-oﬀ of high frequencies, these iterative methods may introduce artifacts, such as ringing, and fail to recover edges accurately. Many image analysis applications require image segmentation. The level to which segmentation is carried out depends on the problem being solved; segmentation should be terminated when the regions of interest in the application have been isolated. This problem-dependence makes autonomous segmentation one of the most diﬃcult computational tasks in image analysis. The presence of noise and blur makes this task even more complicated. In this paper we carry out segmentation by computing Geodesic Active Contours (GAC). This kind of segmentation methods are based on curve evolution theory, see [2] and references therein, and level sets [12]. The basic idea is to start with initial boundary shapes represented by closed curves, i.e., contours, and iteratively modify these contours by application of shrink/expansion operations determined by image constraints. The shrink/expansion operations, referred to as contour evolution, are performed by minimizing an energy functional, similarly to traditional region-based segmentation methods; however, the level set framework provides more ﬂexibility. 
The GAC PDE model proposed in [2] is given by

\[ \frac{\partial \phi}{\partial t} = |\nabla \phi|\, \operatorname{div}\left( g(|\nabla b^\delta|^2)\, \frac{\nabla \phi}{|\nabla \phi|} \right) , \tag{8} \]

where the edge-detector function $g$ is defined by (6) and the initial condition $\phi_0$ is the signed distance function to an arbitrary initial curve enclosing the objects to be segmented. The solution to the segmentation problem is the zero-level set of the steady state of the flow $\phi_t = 0$. We apply a fast curve evolution method recently suggested by Li et al. [7] in our multilevel method, which eliminates the need of costly re-initialization, but we remark that other GAC methods also can be used.
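The regularization by truncated iteration described in this section can be illustrated with a few lines of NumPy. The sketch below applies the conjugate gradient method to the normal equations in its CGLS form, which is mathematically equivalent to LSQR in exact arithmetic; it is not the LSQR or MR-II implementation used by the authors. The function name `cgls_truncated` and the parameters `max_iter` and `tol` are assumptions introduced for illustration, with `tol` standing in for any residual-based stopping threshold.

```python
import numpy as np

def cgls_truncated(A, b, max_iter=20, tol=0.0):
    """CGLS for A x = b, stopped after max_iter steps or once the scaled
    residual norm sqrt(mean(r**2)) drops to tol.

    Returns the iterate and the history of scaled residual norms. The
    iteration count acts as a discrete regularization parameter.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros(A.shape[1])
    r = b.copy()                 # residual b - A x for x = 0
    s = A.T @ r                  # residual of the normal equations
    p = s.copy()
    gamma = s @ s
    history = [np.sqrt(np.mean(r**2))]
    for _ in range(max_iter):
        if history[-1] <= tol or gamma == 0.0:
            break
        q = A @ p
        alpha = gamma / (q @ q)
        x += alpha * p
        r -= alpha * q
        s = A.T @ r
        gamma_new = s @ s
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
        history.append(np.sqrt(np.mean(r**2)))
    return x, history
```

Each step costs one product with $A$ and one with $A^T$, so stopping early keeps the work small, and it also limits the propagation of noise from $b^\delta$ into the computed solution.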

3 The Cascadic Multilevel Framework

We first review the cascadic multilevel method proposed in [8] for the removal of blur and noise. In [8] only symmetric blurring matrices are considered. Introduce for $v = [v^{(1)}, v^{(2)}, \ldots, v^{(n)}]^T \in \mathbb{R}^n$ the weighted least-squares norm

\[ \| v \| = \left( \frac{1}{n} \sum_{i=1}^{n} \big( v^{(i)} \big)^2 \right)^{1/2} . \tag{9} \]


Let $\hat{b} \in \mathbb{R}^n$ denote the unknown noise-free right-hand side associated with the right-hand side $b^\delta$ of (7). We assume that $\hat{b} \in \operatorname{Range}(A)$ and that a bound $\delta$ for the noise $e = b^\delta - \hat{b}$ is available, i.e.,

\[ \| e \| \le \delta . \tag{10} \]

Let $W_1 \subset W_2 \subset \cdots \subset W_\ell$ be a sequence of nested subspaces of $\mathbb{R}^n$ of dimension $\dim(W_i) = n_i$ with $n_1 < n_2 < \ldots < n_\ell = n$. We refer to the subspaces $W_i$ as levels, with $W_1$ being the coarsest and $W_\ell = \mathbb{R}^n$ the finest level. Each level is furnished with a weighted least-squares norm; level $W_i$ has a norm of the form (9) with $n$ replaced by $n_i$. We choose $n_{i-1} = n_i/4$, $1 < i \le \ell$.

Let $A_i \in \mathbb{R}^{n_i \times n_i}$ be the representation of the blurring operator $A$ on level $W_i$. The matrix $A_i$ is determined by discretization of the integral operator (1) similarly as $A$. This defines implicitly the restriction operator $R_i : \mathbb{R}^n \to W_i$, such that

\[ A_i = R_i A R_i^{*} . \tag{11} \]

We define $R_\ell = I$. The choice of restriction operators $R_i$ is in our experience less crucial for achieving high-quality restorations than the choice of restriction operators $R_i^{(\omega)} : \mathbb{R}^n \to W_i$ for reducing the available blur- and noise-contaminated image represented by the right-hand side $b^\delta$ in (7). We let

\[ b_i^\delta = R_i^{(\omega)} b^\delta , \qquad 1 \le i < \ell , \tag{12} \]

where the $R_i^{(\omega)}$ are determined by repeated local weighted least-squares approximation, inspired by a "staircasing"-reducing scheme recently proposed by Buades et al. [1].

Also the choice of prolongation operators from level $i-1$ to level $i$ is important for the performance of the multilevel method. We apply nonlinear prolongation operators $P_i : W_{i-1} \to W_i$, $1 < i \le \ell$, defined by piecewise linear interpolation followed by integration of the Perona-Malik equation over a short time-interval; see below. The $P_i$ are designed to be noise-reducing and edge-preserving.

The multilevel methods of the present paper are cascadic, i.e., they first determine an approximate solution of $A_1 u = b_1^\delta$ in $W_1$, using the LSQR or MR-II iterative methods. We refer to the iterative method as IM in Algorithm 1 below. The iterations with this method are terminated by the discrepancy principle; see below. The so determined approximate solution in $W_1$ is mapped into $W_2$ by the prolongation $P_2$. A correction of this mapped iterate in $W_2$ is computed by the IM. Again, the iterations are terminated by the discrepancy principle, and the approximate solution in $W_2$ so obtained is mapped into $W_3$ by $P_3$. The computations are continued in this fashion until an approximation of $\hat{u}$ has been determined in $W_\ell = \mathbb{R}^n$. In the algorithm $\Delta u_{i,m_i} := \mathrm{IM}(A_i, b_i^\delta - A_i u_{i,0})$ denotes the computation of the approximate solution $\Delta u_{i,m_i}$ of $A_i z_i = b_i^\delta - A_i u_{i,0}$ by $m_i$ iterations with one of the iterative methods MR-II or LSQR, using the initial iterate $\Delta u_{i,0} = 0$.


Multilevel Algorithm 1
Input: $A$, $b^\delta$, $\delta$, $\ell \ge 1$ (number of levels);
Output: approximate solution $u_\ell \in W_\ell$ of (7); segmented result $\phi_\ell$;
Determine $A_i$ and $b_i^\delta$ from (11) and (12), respectively, $1 \le i \le \ell$;
$u_0 := 0$; $\phi_0 :=$ initial contour;
for $i := 1, 2, \ldots, \ell$ do
    $u_{i,0} := P_i u_{i-1}$; $\phi_{i,0} := S_i \phi_{i-1}$;
    $\Delta u_{i,m_i} := \mathrm{IM}(A_i, b_i^\delta - A_i u_{i,0})$;
    Correction step: $u_i := u_{i,0} + \beta \Delta u_{i,m_i}$;
    Segmentation step: $\phi_i := \mathrm{GAC}(\phi_{i,0}, u_i)$;
endfor
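The restoration part of Algorithm 1 can be expressed as a short driver loop. The sketch below only illustrates the control flow: the segmentation step is omitted, and the function name `cascadic_restore` together with the callables `prolong`, `inner_solver`, and `weight` are hypothetical placeholders for the operators $P_i$, IM, and $\beta$, which the caller has to supply.

```python
import numpy as np

def cascadic_restore(A_levels, b_levels, prolong, inner_solver, weight=None):
    """Restoration loop of a cascadic multilevel method (segmentation omitted).

    A_levels, b_levels: matrices A_i and right-hand sides b_i, coarsest first.
    prolong(u, i):      maps the level i-1 solution to level i (role of P_i).
    inner_solver(A, r, i): approximate solution of A z = r (role of IM).
    weight(u0):         optional pointwise factor for the correction (role of beta).
    """
    u = None
    for i, (A, b) in enumerate(zip(A_levels, b_levels)):
        # initial iterate: zero on the coarsest level, prolongation afterwards
        u0 = np.zeros(A.shape[1]) if u is None else prolong(u, i)
        du = inner_solver(A, b - A @ u0, i)        # correction for the residual
        beta = 1.0 if weight is None else weight(u0)
        u = u0 + beta * du                         # correction step
    return u
```

A trivial usage example with exact inner solves and piecewise constant prolongation, chosen only so that the result is easy to check by hand:

```python
A_levels = [2 * np.eye(4), 2 * np.eye(8)]
b_levels = [np.arange(4.0), np.arange(8.0)]
u = cascadic_restore(
    A_levels, b_levels,
    prolong=lambda u, i: np.repeat(u, 2),
    inner_solver=lambda A, r, i: np.linalg.solve(A, r),
)
```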

The number of iterations on each level is based on the discrepancy principle as follows: we assume that there are constants $c_i$ independent of $\delta$, such that

\[ \| b_i^\delta - \hat{b}_i \| \le c_i \delta , \qquad 1 \le i \le \ell , \]

where $\delta$ satisfies (10). It can be seen by using the noise-reducing property of the restriction operators $R_i^{(\omega)}$ that a suitable choice is

\[ c_i = \frac{1}{3}\, c_{i+1} , \quad 1 \le i < \ell , \qquad c_\ell = \gamma , \tag{13} \]

for some constant $\gamma > 1$. In the computed examples of Section 4, we use $\gamma = 1.4$. The discrepancy principle prescribes that the iterations on level $i$ be terminated as soon as

\[ \| b_i^\delta - A_i u_{i,0} - A_i \Delta u_{i,m_i} \| \le c_i \delta . \tag{14} \]

When many iterations are carried out, the computed approximate solution $\Delta u_{i,m_i}$ generally is severely contaminated by noise, which is propagated from $b_i^\delta - A_i u_{i,0}$. The purpose of the stopping criterion (14) is to i) allow enough iterations to be carried out to determine as accurate a restoration on level $i$ as possible, and ii) avoid carrying out so many iterations that the computed approximate solution $\Delta u_{i,m_i}$ is severely contaminated by propagated noise. Discussions on properties of the stopping rule (14) can be found in [8, 10]. A general discussion on applications of the discrepancy principle to determine approximate solutions of ill-posed problems is provided in [3].

The nonlinear edge-preserving prolongation operators $P_i$ have previously been applied in [8], where further details on their implementation are provided; see also [16]. The prolongation operator $P_i$ first maps the approximate solution determined by the algorithm on level $W_{i-1}$ into $W_i$ by piecewise linear interpolation, and then uses the result as initial function for a discretized initial-boundary value problem for the Perona-Malik nonlinear diffusion equation

\[ \frac{\partial u}{\partial t} = \operatorname{div}\big( g(|\nabla u|^2)\, \nabla u \big) , \tag{15} \]


S. Morigi, L. Reichel, and F. Sgallari

where $g$ is the Perona-Malik diffusivity (6). Integration over a short time interval removes noise while preserving rapid spatial transitions, such as edges. Integration is performed by carrying out about 10 time-steps of size about 0.2 with an explicit finite difference method. The small number of time-steps avoids difficulties due to numerical instability and keeps the computational work required for integration negligible. We found it to be beneficial to apply more time-steps the more noise-contaminated the available image. However, in our experience the exact choices of the number of time-steps and their sizes are not crucial for the good performance of the multilevel method.

In the algorithm, $\phi_0$ denotes the initial contour for the GAC segmentation method implemented by solving (8); see [7]. The prolongation of the level set function from $W_{i-1}$ to $W_i$ is carried out by spline interpolation and denoted by $S_i$. The statement $\phi_i := \mathrm{GAC}(\phi_{i,0}, u_i)$ updates the contour on level $i$.

Ringing in restored images stems from the Gibbs phenomenon at discontinuities. The latter could be image borders or boundaries inside the image, or could be introduced by inadequate spatial sampling of the image or kernel. The larger the support of the kernel in (1), the more pronounced the ringing. High contrast edges cause strong ringing, and the magnitude of the ringing is proportional to the norm of the image gradient. Based on these observations, we propose a deringing correction obtained by multiplying the image by the spatially variant function

$$\beta(x, y) = \alpha + (1 - \alpha)\bigl(1 - g(|\nabla u_{i,0}(x, y)|^2)\bigr), \qquad (16)$$

where $g$ is the diffusivity (6) and the parameter $0 \le \alpha \le 1$ controls the suppression of the computed correction. Since we would like to suppress ringing in the smooth regions, but avoid suppression of edges, the correction function $\beta$ should be small in smooth regions and large elsewhere.
We use $\alpha = 0.05$ in the computed examples of this paper, but this value can be tuned depending on the presence of large homogeneous regions in the image.
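The behaviour of the deringing weight (16) can be illustrated with a minimal sketch. The diffusivity (6) is not restated in this section, so the form $g(s) = 1/(1 + s/\rho^2)$ with a contrast parameter `rho`, the forward-difference gradient, and the function names are all assumptions made for illustration only.

```python
def perona_malik_diffusivity(s, rho=10.0):
    """Assumed diffusivity g(s) = 1 / (1 + s / rho^2); (6) may differ."""
    return 1.0 / (1.0 + s / (rho * rho))


def deringing_weight(u, alpha=0.05, rho=10.0):
    """beta = alpha + (1 - alpha) * (1 - g(|grad u|^2)) on a 2-D list u."""
    rows, cols = len(u), len(u[0])
    beta = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            # Forward differences, replicated at the border (assumption).
            ux = u[y][min(x + 1, cols - 1)] - u[y][x]
            uy = u[min(y + 1, rows - 1)][x] - u[y][x]
            g = perona_malik_diffusivity(ux * ux + uy * uy, rho)
            beta[y][x] = alpha + (1.0 - alpha) * (1.0 - g)
    return beta


# Toy image with a vertical step edge between columns 2 and 3.
image = [[0.0, 0.0, 0.0, 200.0, 200.0, 200.0] for _ in range(4)]
beta = deringing_weight(image)
```

In this toy image the weight equals $\alpha$ in the flat regions and is close to 1 at the step, which matches the requirement that $\beta$ be small in smooth regions and large elsewhere.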

4 Numerical Results

We illustrate the performance of Algorithm 1. The computations are carried out in MATLAB with about 16 significant decimal digits. We assume that a fairly accurate estimate of the norm of the noise is available. If this is not the case, such an estimate can be computed by integration of $b^\delta$ for a few time-steps with the Perona-Malik differential equation; details are described in a forthcoming paper. Note that the matrices $A_i$, defined by (11), do not have to be explicitly stored; it suffices to define functions for the evaluation of matrix-vector products with the $A_i$ and, if $A_i$ is nonsymmetric, also with the $A_i^T$. For the examples of this section, the matrix-vector products can be computed efficiently by using the structure of the $A_i$; see, e.g., [9] for a discussion. The matrices corresponding to the finest level are numerically singular in all examples. The displayed restored images provide a qualitative comparison of the performance of the proposed cascadic multilevel method. A quantitative comparison is given by the Peak Signal-to-Noise Ratio,

Multilevel Method for Deblurring, Denoising, and Segmentation

$$\mathrm{PSNR}(u_\ell, \hat u) = 20 \log_{10} \frac{255}{\|u_\ell - \hat u\|} \ \mathrm{dB}, \qquad (17)$$

where $\hat u$ denotes the blur- and noise-free image and $u_\ell$ the restored image determined by Algorithm 1. Each pixel is stored with 8 bits; the numerator 255 is the largest pixel-value that can be represented with 8 bits. A high PSNR-value indicates that the restoration is accurate; however, the PSNR-values are not always in agreement with visual perception. We also measure the variation in the error image $u_{\mathrm{err}} = u_\ell - \hat u$, defined by

$$\mathrm{EV}(u_\ell, \hat u) = \sum_{\text{pixel}} \|\nabla u_{\mathrm{err}}\|_2^2, \qquad (18)$$

where the sum is over all pixels of the image. The more accurately the edges are restored, the smaller this sum.
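A minimal sketch of the two quality measures (17) and (18) follows. The normalization of the norm in (17) is not spelled out here, so the sketch assumes the root-mean-square pixel error; the forward-difference gradient in `edge_variation` and both function names are likewise assumptions.

```python
import math


def psnr(u, u_hat):
    """20*log10(255 / ||u - u_hat||), with ||.|| taken as the RMS error (assumption)."""
    n = len(u) * len(u[0])
    sq = sum((a - b) ** 2 for ru, rh in zip(u, u_hat) for a, b in zip(ru, rh))
    rms = math.sqrt(sq / n)
    return math.inf if rms == 0 else 20.0 * math.log10(255.0 / rms)


def edge_variation(u, u_hat):
    """Sum over pixels of |grad u_err|^2, forward differences, zero at the border."""
    rows, cols = len(u), len(u[0])
    err = [[u[y][x] - u_hat[y][x] for x in range(cols)] for y in range(rows)]
    total = 0.0
    for y in range(rows):
        for x in range(cols):
            dx = err[y][x + 1] - err[y][x] if x + 1 < cols else 0.0
            dy = err[y + 1][x] - err[y][x] if y + 1 < rows else 0.0
            total += dx * dx + dy * dy
    return total


exact = [[10.0, 10.0], [10.0, 10.0]]
restored = [[10.0, 12.0], [10.0, 10.0]]
value_psnr = psnr(restored, exact)
value_ev = edge_variation(restored, exact)
```

For this 2 × 2 example the RMS error is 1, so the PSNR equals $20\log_{10} 255$, and the single erroneous pixel contributes two unit-squared jumps of size 2 to the edge variation.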

Fig. 1. Blur- and noise-free images used in the numerical experiments. Left: butterfly, 400 × 400 pixels. Right: corner, 512 × 512 pixels.

We apply Algorithm 1 to blur- and noise-contaminated versions of the images shown in Figure 1. The corner image is representative of images with well-defined edges, while the butterfly image is a gray-scale photographic image with smoothed edges.

Example 4.1. We consider the restoration of a contaminated version of the left-hand side image of Figure 1. Contamination is by space-invariant Gaussian blur as generated by the MATLAB function blur.m from Regularization Tools [6] with parameters sigma = 3 and band = 9. This function generates a block Toeplitz matrix with Toeplitz blocks. The parameter band specifies the half-bandwidth of the Toeplitz blocks and the parameter sigma defines the variance of the Gaussian PSF. The image is also contaminated by 5% Gaussian noise. The blurring operator is symmetric. We therefore use the MR-II iterative method.


Fig. 2. Example 4.1. Top-left: Image contaminated by Gaussian blur and 5% Gaussian noise. Top-right: Image restored by 1-level method. Bottom-left: Image restored by 3-level method. Bottom-right: Deringing function β defined by (16).

Figure 2 provides a qualitative comparison of images restored by the basic 1-level MR-II method and the 3-level method defined by Algorithm 1. The restoration obtained with the latter method can be seen to be of higher quality with sharper edges. The deringing function β (16) is shown in Figure 2 (bottom right); it is small in smooth image regions and large elsewhere. Table 1(a) gives a quantitative comparison of the restorations determined by Algorithm 1 with $\ell = 2$ and $\ell = 3$ levels, and the basic 1-level MR-II method, for different amounts of noise. The columns marked “PSNR” and “EV” display (17) and (18), respectively. They show Algorithm 1 with $\ell = 3$ to yield images with the highest PSNR- and smallest EV-values. The column marked “iter” shows the number of iterations required on each level. For instance, the triplet 4-1-2 indicates that Algorithm 1 carried out 4 MR-II iterations on the coarsest level, 1 iteration on the intermediate level, and 2 iterations on the finest level. The dominating computational effort is the evaluation of matrix-vector products on the finest level. The 2- and 3-level methods can be seen to require fewer iterations on the finest level than the basic 1-level MR-II method. □

Table 1. PSNR, number of iterations (iter), and edge variation (EV) as functions of the number of levels $\ell$ and noise-level (% noise) for restorations of (a) the image of Example 4.1 contaminated by Gaussian blur determined by band = 9 and sigma = 3, and (b) the image of Example 4.2 contaminated by motion blur defined by r = 15 and θ = 10.

(a)
$\ell$   % noise   PSNR    iter      EV
1        1         26.05   11        5043
2        1         26.73   8-9       4179
3        1         26.86   9-7-9     4060
1        5         24.30   4         5279
2        5         24.38   3-3       4682
3        5         24.63   5-3-3     4555
1        10        23.25   3         5477
2        10        23.42   2-2       4949
3        10        23.60   4-1-2     4853

(b)
$\ell$   % noise   PSNR    iter      EV
1        1         30.93   12        4629
2        1         31.69   11-9      2294
3        1         32.02   17-8-8    2251
1        5         27.13   5         6519
2        5         28.56   4-3       3553
3        5         28.69   7-2-3     3140
1        10        25.15   3         5368
2        10        26.77   2-2       3692
3        10        26.93   3-1-2     3466

Fig. 3. Example 4.2. Left: Restoration determined by 3-level LSQR-based multilevel method. Right: Restoration obtained by basic 1-level LSQR.

Example 4.2. Consider the restoration of a version of the right-hand side image of Figure 1 that has been contaminated by motion blur and 5% Gaussian noise. The PSF is represented by a line segment of length $r$ pixels in the direction of the motion. The angle θ (in degrees) specifies the direction; it is measured counter-clockwise from the positive x-axis. The PSF takes on the value $r^{-1}$ on this segment and vanishes elsewhere. We refer to the parameter $r$ as the width.


Fig. 4. Example 4.3. Top-left: Segmentation of a blur- and noise-free image. Top-right: Segmentation of a blurred and noisy image by a 1-level method. Bottom-left: Segmentation by 3-level method of the blurred and noisy image on level 2. Bottom-right: Segmentation by 3-level method on finest level.

The motion blur for this example is defined by r = 15 and θ = 10. The blurring matrix $A$ is nonsymmetric. We therefore use the LSQR iterative method in Algorithm 1. Figure 3 (left) shows the restoration determined by Algorithm 1 with 3 levels. The restored image obtained by the basic 1-level LSQR method is shown in Figure 3 (right). Visual comparison shows Algorithm 1 to give the most pleasing restoration. This is in agreement with the PSNR- and EV-values reported in Table 1(b). □

Example 4.3. We apply Algorithm 1 to segmentation of a contaminated version of the image of Figure 1 (right). The contamination is caused by Gaussian blur, determined by band = 9 and sigma = 3, and 10% Gaussian noise. Segmentation is carried out using the variational formulation for geodesic active contours (GAC) without re-initialization as described by Li et al. [7]. The initial curve is close to the boundary of the image. Figure 4 (top-left) shows the segmentation obtained when GAC is applied to the noise- and blur-free image in Figure 1 (left). The curve evolution requires 900 iterations. Segmentation of the contaminated image is more difficult. We first deblur the contaminated image by the basic 1-level MR-II iterative method, and then apply GAC segmentation to the restored image. The resulting segmentation is shown in Figure 4 (top-right). The curve evolution required 1200 iterations. Finally, we apply Algorithm 1 with 3 levels and the Segmentation step. No segmentation is carried out on the coarsest level. On level 2, we apply GAC segmentation with 400 curve evolution iterations. The resulting segmentation is shown in Figure 4 (bottom-left). Prolongation of the evolved contour is carried out by spline interpolation. Only 100 curve evolution iterations are required on the finest level. The resulting segmentation is displayed in Figure 4 (bottom-right). The figure shows Algorithm 1 to be able to extract object boundaries with less computational effort and higher accuracy than the corresponding 1-level method. □

5 Conclusions and Extension

Visual inspection of the images shown in Section 4, as well as the computed PSNR- and EV-values, shows the cascadic multilevel method to give more accurate restorations than 1-level methods applied on the finest level only. A multilevel approach to segmentation of contaminated images also yields better results and requires less computational effort than the corresponding 1-level method. The aim of ongoing work is to gain increased understanding of the interplay between image restoration and segmentation.

Acknowledgments This research has been supported by PRIN-MIUR-Coﬁn 2006 project, by University of Bologna "Funds for selected research topics", and in part by an OBR Research Challenge Grant.

References

1. Buades, A., Coll, B., Morel, J.M.: The staircasing effect in neighborhood filters and its solution. IEEE Trans. Image Process. 15, 1499–1505 (2006)
2. Caselles, V., Kimmel, R., Sapiro, G.: Geodesic active contours. Int. J. Comput. Vis. 22, 61–79 (1997)
3. Engl, H.W., Hanke, M., Neubauer, A.: Regularization of Inverse Problems. Kluwer, Dordrecht (1996)
4. Hanke, M.: Conjugate Gradient Type Methods for Ill-Posed Problems. Longman, Essex (1995)
5. Hansen, P.C.: Rank-Deficient and Discrete Ill-Posed Problems. SIAM, Philadelphia (1997)
6. Hansen, P.C.: Regularization tools, version 4.0 for MATLAB 7.3. Numer. Algorithms 46, 189–294 (2007)
7. Li, C., Xu, C., Gui, C., Fox, M.D.: Level set evolution without re-initialization: a new variational formulation. In: Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, pp. 430–436 (2005)
8. Morigi, S., Reichel, L., Sgallari, F., Shyshkov, A.: Cascadic multiresolution methods for image deblurring. SIAM J. Imaging Sci. 1, 51–74 (2008)
9. Ng, M.K., Chan, R.H., Tang, W.-C.: A fast algorithm for deblurring models with Neumann boundary conditions. SIAM J. Sci. Comput. 21, 851–866 (1999)
10. Reichel, L., Shyshkov, A.: Cascadic multilevel methods for ill-posed problems. J. Comput. Appl. Math. (in press)
11. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
12. Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: algorithms based on Hamilton-Jacobi formulations. J. Comput. Phys. 79, 12–49 (1988)
13. Paige, C.C., Saunders, M.A.: LSQR: An algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Software 8, 43–71 (1982)
14. Perona, P., Malik, J.: Scale-space and edge detection using anisotropic diffusion. IEEE Trans. Pattern Anal. Mach. Intell. 12, 629–639 (1990)
15. Welk, M., Theis, D., Brox, T., Weickert, J.: PDE-based deconvolution with forward-backward diffusivities and diffusion tensors. In: Kimmel, R., Sochen, N.A., Weickert, J. (eds.) Scale-Space 2005. LNCS, vol. 3459, pp. 585–597. Springer, Heidelberg (2005)
16. Weickert, J., Romeny, B.M.H., Viergever, M.A.: Efficient and reliable schemes for nonlinear diffusion filtering. IEEE Trans. Image Process. 7, 398–410 (1998)

Fast Dejittering for Digital Video Frames

Mila Nikolova

CMLA, ENS Cachan, CNRS, PRES UniverSud, France
[email protected]
http://www.cmla.ens-cachan.fr/~nikolova/

Abstract. We propose several very fast algorithms to restore jittered digital video frames (their rows are shifted) in one iteration. The restored row shifts minimize non-smooth and possibly non-convex local criteria applied on the second-order diﬀerences between consecutive rows. We introduce speciﬁc error measures to assess the quality of dejittering. Our algorithms are designed for gray-value, color and noisy images. Some of them can be considered as parameter-free. They outperform by far the existing algorithms both in quality and in speed. They are a crucial step towards real-time dejittering of digital video.

1 Intrinsic Dejittering

Image jitter consists of a random horizontal shift of each row of a video frame. It occurs when the synchronization row pulses are corrupted, e.g. by noise or degradation of the storage medium, or in wireless transmission. The visual effect is disturbing since all shapes are jagged, cf. e.g. Fig. 4. Structured jitter can be provoked by acoustic or electrical interferences [7], cf. e.g. Fig. 8. Time base corrector machines recover the row synchronization pulses with some success, but this operation is often unsuccessful or impossible [6]. An alternative, restoring the video frames directly from the jittered data, called intrinsic dejittering [5], is much more flexible and widely applicable.

State of the Art. Intrinsic dejittering was invented in [5]. The method is based on a 2D auto-regressive (AR) image model. The unknown AR coefficients and row starts are estimated iteratively, jointly by blocks; a drift compensation is applied afterwards [6]. In [7], the $\ell_1$ norm of the differences between 2 or 3 consecutive shifted rows is compared in the framework of dynamic programming. A fully Bayesian iterative method using a TV-based prior for joint dejittering and denoising is derived in [12]. The Bake and Shake method in [3] uses a good PDE image model (e.g. Perona-Malik) to recover the row positions. In [4], the same authors analyze the vertical slicing moments of images of bounded variation and derive a variational method (faster than [3] but less effective for difficult data).

Our Approach. We exhibit a pertinent model that makes it possible to discriminate natural images from their jittered versions. Each row is restored based on the previously restored rows using a simple non-smooth and possibly non-convex local criterion. We thus construct effective and fast one-iteration dejittering algorithms. Noisy jittered images are restored in two stages: (a) dejittering of the raw data; (b) denoising of the obtained dejittered image.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 439–451, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 The Main Points of Our Approach

Notations. For any positive integers $m$ and $n$, the rows of a matrix $h \in \mathbb{R}^{m \times n}$ are denoted by $h_i$, $1 \le i \le m$, and the components of a row $h_i$ by $h_i(j)$, $1 \le j \le n$. The components of any $n$-length vector $u$ are denoted by $u_i$, $1 \le i \le n$. Given an original image $f \in \mathbb{R}^{r \times c}$, a jittered image $g$ is produced according to:

$$g_i(j) = \begin{cases} f_i(j + d_i) & \text{if } 1 \le j + d_i \le c, \\ 0 & \text{otherwise,} \end{cases} \qquad 1 \le i \le r, \ d_i \in \mathbb{Z}, \ 1 \le j \le c. \qquad (1)$$

In practice, the row shifts $d_i$ are bounded, $|d_i| \le M$, for $M \le 6$ or more [6]. The restored image and row shifts are denoted by $\hat f$ and $\hat d$, respectively.

2.1 Choice of a Local Criterion on Consecutive Rows

First of all, we need a good model for the columns of natural images.

Fig. 1. 50 × 50 zoom of Lena. (a) Original. (b) One column: gray value of column 15 in (a) (original) and in (c) (jittered). (c) Jittered.

Remark 1. The gray-value of the columns of natural images can be seen as pieces of 2nd or 3rd order polynomials; see Fig. 1(b) left or Fig. 3 in [9]. Such a claim is false for jittered images; see Fig. 1(b) right. This observation provides a sound basis to discriminate a natural image from its jittered versions.

Suppose that $\hat f_1, \ldots, \hat f_{i-1}$ are already dejittered. By Remark 1, we will estimate the next $\hat d_i$ using a criterion that compares $\hat f_{i-1}, \hat f_{i-2}, \ldots$ with all possible shifts of the $i$th data row, $g_i(j - d_i)$, $d_i \in \{-N, \ldots, N\}$ for $N \ge M$.

Fig. 2. Uniform jitter on $\{-M, \ldots, M\}$. Restorations using (2)-(3) and (4). Panels: uniform jitter, M = 6; arg min J, α = 1; original (116 × 200); arg min J, α = 0.5.

Remark 2. Each row of $g$ has no more than $N$ zero-valued pixels at both extremities because of the jitter, see e.g. Fig. 2. Involving them in our criterion can seriously distort its meaning. So for any row $i$, we will use only data samples $g_i(j)$ for $j \in \{N + 1, \ldots, c - N\}$, which certainly belong to the original image.

Guided by Remarks 1 and 2, as well as by a series of preliminary experiments (see e.g. Fig. 3), our main focus is on

$$\hat d_i = \arg\min \bigl\{ J(d_i) : d_i \in \{-N, \ldots, N\} \bigr\}, \quad N \ge M, \qquad (2)$$

$$J(d_i) = \sum_{j=N+1}^{c-N} \bigl| g_i(j - d_i) - 2\hat f_{i-1}(j) + \hat f_{i-2}(j) \bigr|^{\alpha}, \quad \alpha \in \{0.5, 1\}. \qquad (3)$$

$\hat d_i$ is easily found by exhaustive search since it belongs to a small finite set. Then:

$$\forall j \in \{1, \ldots, c\}, \quad \hat f_i(j) = g_i(j - \hat d_i) \ \text{ if } 1 \le j - \hat d_i \le c \ \text{ and } \ \hat f_i(j) = 0 \ \text{ else.} \qquad (4)$$
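The exhaustive search (2)-(3) followed by the row update (4) can be sketched in a few lines. This is a simplified illustration rather than the algorithm of Section 3: the first row is taken as an unshifted reference, it is used twice to start the recursion, indices are 0-based, and the names `jitter_row`, `estimate_shift` and `dejitter` as well as the synthetic test image are hypothetical.

```python
def jitter_row(row, d):
    """Model (1): g(j) = f(j + d) inside the image, 0 otherwise (0-based j)."""
    c = len(row)
    return [row[j + d] if 0 <= j + d < c else 0.0 for j in range(c)]


def estimate_shift(g_row, prev1, prev2, N, alpha=1.0):
    """Exhaustive search (2) over d in {-N, ..., N} for criterion (3)."""
    c = len(g_row)
    best_d, best_cost = 0, float("inf")
    for d in range(-N, N + 1):
        # Only the inner columns N, ..., c-N-1 (0-based) are used, cf. Remark 2.
        cost = sum(abs(g_row[j - d] - 2.0 * prev1[j] + prev2[j]) ** alpha
                   for j in range(N, c - N))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def dejitter(g, N, alpha=1.0):
    """Row-by-row restoration (4); row 0 is the reference (assumption)."""
    restored, shifts = [list(g[0])], [0]
    for i in range(1, len(g)):
        prev1 = restored[i - 1]
        prev2 = restored[i - 2] if i >= 2 else restored[0]
        d = estimate_shift(g[i], prev1, prev2, N, alpha)
        c = len(g[i])
        restored.append([g[i][j - d] if 0 <= j - d < c else 0.0
                         for j in range(c)])
        shifts.append(d)
    return restored, shifts


# Synthetic image: a fixed horizontal pattern plus a linear vertical ramp.
c, r, M = 40, 8, 3
N = M + 1
f = [[float((37 * j * j + 11 * j) % 50) + 0.5 * i for j in range(c)]
     for i in range(r)]
true_d = [0, 2, -3, 1, 0, -1, 3, -2]
g = [jitter_row(f[i], true_d[i]) for i in range(r)]
restored, est = dejitter(g, N)
```

Because the columns of this synthetic image are exactly linear, the second-order difference vanishes at the correct shift, so the search recovers every row shift; real images only satisfy the model approximately.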

Criterion $J$ for $\alpha \in (0, 1]$ is minimized by a $\hat d_i$ such that for a maximum number of components $j$ we have $\hat f_i(j) \approx 2\hat f_{i-1}(j) - \hat f_{i-2}(j)$ (i.e. $\hat f_i(j)$, $\hat f_{i-1}(j)$ and $\hat f_{i-2}(j)$ form a nearly linear segment) while breakpoints are preserved; for a mathematical flavor, see [10, 11, 13]. Then the gray value of each column of $\hat f$ varies nearly piecewise linearly. More details are given in [9].

Remark 3. Dejittering a single frame yields a translated estimate $\hat p$ of the row shifts, say $\hat p = \hat d + C$. Given the original $d$, the integer $C$ is such that

$$C = \arg\max_{n \in \mathbb{Z}} \ \# \bigl\{ i \in \{1, \ldots, r\} : \hat p_i - n = d_i \bigr\}, \qquad (5)$$

where # means cardinality.

Alternative criteria (see Fig. 3). Minimizing $J_1(d_i) = \sum_{j=N+1}^{c-N} \bigl| g_i(j - d_i) - \hat f_{i-1}(j) \bigr|^{\alpha}$ yields (c)-(d). Criteria $J_1$ work poorly: they tend to recover constant gray-value vertical pieces. Solving (2)-(3) yields the original image in (e)-(f). Criteria $J_3(d_i) = \sum_{j=N+1}^{c-N} \bigl| g_i(j - d_i) - 3\hat f_{i-1}(j) + 3\hat f_{i-2}(j) - \hat f_{i-3}(j) \bigr|^{\alpha}$ cannot discriminate well enough a natural image from its slightly shifted versions, see (g)-(h).

2.2 Error Measures for Dejittering

Recall that $\hat f$ is translated with respect to (w.r.t.) $f$ and that the extremities of its rows are null. In order to apply standard error measures, we shrink $\hat f$ to $\hat f^s$,

$$\hat f^s_i(j) = \hat f_i(j + N), \quad 1 \le j \le c - 2N, \quad \forall i \in \{1, \ldots, r\},$$

so that $\hat f^s$ contains only proper image information. Then we select an $r \times (c - 2N)$ inner submatrix $f^s$ of the original $f$ that matches $\hat f^s$ the best. Note that any error measure on $\hat f^s - f^s$ is sensitive to the choice of $f^s$. We select $f^s$ using the $\ell_1$ norm:

$$\|f^s - \hat f^s\|_1 = \min_{0 \le k \le 2N} \sum_{i=1}^{r} \sum_{j=1}^{c-2N} \bigl| f_i(j + k) - \hat f^s_i(j) \bigr|.$$

Then we consider the mean absolute error $\mathrm{mae}(\hat f, f) = \|f^s - \hat f^s\|_1 / \bigl(r(c - 2N)\bigr)$ and the peak signal to noise ratio, $\mathrm{psnr}(\hat f, f) = 10 \log_{10} \bigl( \delta^2 r(c - 2N) / \|f^s - \hat f^s\|_2^2 \bigr)$, where $\|.\|_2$ is the $\ell_2$-norm and $\delta$ is the dynamic range of $(\hat f^s, f^s)$.

Fig. 3. (a) Original. (b) Jittered: independent uniform jitter. Next, restorations for N = M + 1: (c) $J_1$, α = 1; (d) $J_1$, α = 0.5; (e) $J$, α = 1; (f) $J$, α = 0.5; (g) $J_3$, α = 1; (h) $J_3$, α = 0.5.

The quality of dejittering can also be evaluated using $d - \hat d$. The error measure $e_1(\hat d, d) \stackrel{\mathrm{def}}{=} (1/r)\|d - \hat d\|_1$ gives the average displacement of the pixels along any column. The following two measures are quite interesting:

$$e_\infty(\hat d, d) \stackrel{\mathrm{def}}{=} (100/c) \, \|d - \hat d\|_\infty \ \% ; \qquad (6)$$

$$e_0^\Delta(\hat d, d) \stackrel{\mathrm{def}}{=} \frac{100}{r-1} \ \# \bigl\{ (\hat d_i - d_i) - (\hat d_{i+1} - d_{i+1}) \ne 0, \ 1 \le i \le r-1 \bigr\} \ \% . \qquad (7)$$

$e_\infty$ measures the maximum horizontal error w.r.t. the width $c$ of the image while $e_0^\Delta$ measures the number of changes in $d - \hat d$ w.r.t. the height $r$ of the image.

Remark 4. When both $e_\infty$ and $e_0^\Delta$ are small (e.g. $e_\infty \le 0.4\%$ and $e_0^\Delta \le 0.8\%$), we are guaranteed that dejittering is nearly perfect, independently of any other error measure (see Figs. 6, 7, 10 and 12). Indeed, for a 512 × 512 image, the proposed error bounds mean that no more than 4 rows have an erroneous horizontal shift, which is no more than 2 pixels. For a natural image, such an error is invisible to the naked eye. However, if one of these values is larger, no conclusion can be drawn; cf. Fig. 9 and the relevant comments.
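The three shift-based error measures $e_1$, (6) and (7) are simple to compute once the true and estimated shifts are available as integer lists. The sketch below uses hypothetical function names and a made-up five-row example.

```python
def e1(d_hat, d):
    """Average absolute shift error, (1/r) * ||d - d_hat||_1."""
    return sum(abs(a - b) for a, b in zip(d, d_hat)) / len(d)


def e_inf(d_hat, d, c):
    """Maximum shift error as a percentage of the image width c, cf. (6)."""
    return 100.0 * max(abs(a - b) for a, b in zip(d, d_hat)) / c


def e0_delta(d_hat, d):
    """Percentage of rows where the error d_hat - d changes, cf. (7)."""
    diff = [a - b for a, b in zip(d_hat, d)]
    changes = sum(1 for i in range(len(diff) - 1) if diff[i] != diff[i + 1])
    return 100.0 * changes / (len(diff) - 1)


d_true = [1, -2, 0, 3, -1]
d_est = [1, -2, 1, 3, -1]   # one row is off by a single pixel
width = 200
```

A single misplaced row produces two changes in $\hat d - d$ (one when the error appears and one when it disappears), which is why $e_0^\Delta$ counts 2 of the 4 row transitions here.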

3 Algorithms for Gray-Value Natural Images

We construct an $r \times (c + 2N)$-size matrix $f^*$ for $N > M$. The middle of its first row $f_1^*$ is $g_1$, so $\hat p_1 = N + 1$. Then we restore the relative row shifts $\hat p_i \in \{1, \ldots, 2N + 1\}$, $\forall i \in \{2, \ldots, r\}$, based on (2)-(3) and (4). Then $\hat f$ is an inner sub-matrix of $f^*$.

Notations. $[a \,\vdots\, b \,\vdots\, c]$ means that $a$, $b$ and $c$ are concatenated horizontally; $a \leftarrow b$ means that we replace $a$ by $b$. $\forall n \in \mathbb{N}$, $\theta(n)$ is the $n$-length 0-valued row:

$$\theta(n) \stackrel{\mathrm{def}}{=} [0 \,\vdots\, \cdots \,\vdots\, 0], \quad \#\theta(n) = n. \qquad (8)$$

Algorithm 1 (Gray value images)
– Fix $N > M$, e.g., $N = M + 1$.
– Choose α = 1 or α = 0.5.
1. Define $f^* \in \mathbb{R}^{r \times (c + 2N)}$ and set $f_1^* = [\theta(N) \,\vdots\, g_1 \,\vdots\, \theta(N)]$.
2. Split $g = [g^L \,\vdots\, \gamma \,\vdots\, g^R]$ where $g^L \in \mathbb{R}^{r \times N}$, $\gamma \in \mathbb{R}^{r \times (c - 2N)}$ and $g^R \in \mathbb{R}^{r \times N}$.
3. Put $\hat p_0 = \hat p_1 = N + 1$ and $u = v = [\theta(N) \,\vdots\, \gamma_1 \,\vdots\, \theta(N)]$.
4. For any $i = 2, \ldots, r$, do:
   (a) $\forall k = 1, \ldots, 2N + 1$, do
       (i) Put $h^k = [\theta(k - 1) \,\vdots\, \gamma_i \,\vdots\, \theta(2N - k + 1)]$;
       (ii) Find $m = \max\{k, \hat p_{i-1}, \hat p_{i-2}\}$ and $n = \min\{k, \hat p_{i-1}, \hat p_{i-2}\} + c - 1$;
       (iii) $J(k) = \frac{1}{n - m + 1} \sum_{j=m}^{n} \bigl| h^k_j - 2u_j + v_j \bigr|^{\alpha}$;
   (b) Find $\hat p_i = \arg\min\{J(k) : 1 \le k \le 2N + 1\}$;
   (c) Replace $v \leftarrow u$ and $u \leftarrow h^{\hat p_i} = [\theta(\hat p_i - 1) \,\vdots\, \gamma_i \,\vdots\, \theta(2N + 1 - \hat p_i)]$;
   (d) Set $f_i^* = [\theta(\hat p_i - 1) \,\vdots\, g_i \,\vdots\, \theta(2N - \hat p_i + 1)]$.
5. Extract $\hat f \in \mathbb{R}^{r \times c}$ from $f^* \in \mathbb{R}^{r \times (c + 2N)}$: cancel 2N columns at the left and right ends of $f^*$ that have the largest number of zeros.

Explanations. $u$, $v$ and $h^k$ are $c$-length rows such that at step $i$, $u$ and $v$ correspond to the restored rows $i - 1$ and $i - 2$, respectively, while $h^k$ in 4a(i) realizes all possible shifts for row $i$. In 4a(ii), $m$ and $n$ help to satisfy Remark 2. In 4b, $\hat p_i$ is the estimate for the relative shift of row $i$.

Computation time. We used Matlab 7.2 on a PC with Pentium 4 CPU 2.8GHz and 1GB RAM, under Windows XP Professional service pack 2. For a 512 × 512 image and N = 7 we got the solution in 0.62 s for α = 1 and in 1 s for α = 0.5.

Translation Recovery. In order to compute the errors defined in § 2.2, we need the translation constant $C$ given in (5). Note that $1 - N \le C \le 3N + 1$.

Algorithm (Translation Recovery)
1. Define $I = \{-N + 1, \ldots, 3N + 1\}$.
2. Compute the histogram $H(n) = \# \{ j \in I : \hat p(j) - d(j) = n \}$, $\forall n \in I$.
3. Obtain $C = \arg\max_{n \in I} H(n)$. Then $\hat d_i = \hat p_i - C$, $1 \le i \le r$.
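The translation recovery step amounts to taking the most frequent value of $\hat p_i - d_i$ over the rows. A minimal sketch follows, assuming the shifts are available as Python lists; the function name is hypothetical, and the histogram is built over all observed differences rather than being restricted to the set $I$.

```python
from collections import Counter


def recover_translation(p_hat, d):
    """Return (C, d_hat) with C the most frequent value of p_hat[i] - d[i]."""
    histogram = Counter(p - t for p, t in zip(p_hat, d))
    C, _count = histogram.most_common(1)[0]
    return C, [p - C for p in p_hat]


# True shifts and relative estimates; the last row is misestimated by one pixel.
d_true = [0, 2, -3, 1, 0]
p_est = [5, 7, 2, 6, 6]
C, d_est = recover_translation(p_est, d_true)
```

In this example four of the five differences equal 5, so $C = 5$ and the recovered shifts agree with the true ones except in the last row.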


Compound models. If the gray-values of the columns of an image are nearly constant on large pieces, we should involve in $J$ a 1st-order differences term.

Algorithm 1(a). In Algorithm 1, 4a(iii), use $J$ below, where $\beta$ is a weight for 1st-order differences:

$$J(k) = \frac{1}{n - m + 1} \sum_{j=m}^{n} \Bigl( \bigl| h^k_j - 2u_j + v_j \bigr| + \beta \bigl| h^k_j - u_j \bigr| \Bigr)^{\alpha}, \quad \beta \ge 0.$$

Illustrations. In all experiments, Algorithm 1 is applied with N = M + 1. The jitter in Fig. 4 is significant. We kept this first trial since our method found the original for α ∈ {0.5, 1}. In Fig. 5 (Peppers), the dejittered image is hard to distinguish from the original. However, the error image $f^s - \hat f^s$ shows a slight displacement of several pixels. The dejittered image in Fig. 6 is nearly perfect since $e_0^\Delta = 0.6\%$ and $e_\infty = 0.39\%$. We observe that $d - \hat d$ has a 1-pixel error at rows 83, 84 and 401. The first two are within the zooms in the same figure. The restored Boat in Fig. 7 is quasi-perfect since $e_0^\Delta = 0.25\%$ and $e_\infty = 0.39\%$. The original Boat can be seen in Fig. 8 where the restorations are exact (all errors are null). For the results concerning [12] and [3], cf. section 6, p. 450.

Fig. 4. Algorithm 1 for α = 1 and α = 0.5 yields the original image. Panels: uniform jitter, M = 6; Bayesian TV [12] (mae=11.7, psnr=22); Bake & Shake [3] (mae=7.4, psnr=23); Algorithm 1 ≡ Original (mae=0, psnr=∞).

Fig. 5. Algorithm 1 with α = 0.5 yields mae=1.35, psnr=31.51 and $e_1$ = 0.4. Panels: uniform jitter, M = 10; original (512×512); Algorithm 1, α = 0.5; error $f^s - \hat f^s$.

Large-Scale Experiment. We tested all proposed algorithms using 1000 independent experiments where 4 images were degraded with 2 different types of random jitter and restorations were done for α = 1 and α = 0.5. The main conclusion is that α = 0.5 is better for images with texture or curvatures (Lena, Barbara, Peppers); α = 1 is better for images with many straight lines (Boat). In all cases α = 1 yields good results, but α = 0.5 usually works better. The details are reported in [9]. Globally, the obtained mean results are very encouraging.

Fig. 6. (512×512). Algorithm 1: mae=4.16, psnr=25.53, $e_0^\Delta$ = 0.6% and $e_\infty$ = 0.39%. Panels: uniform jitter, M = 6; Alg. 1, α = 0.5; zoom dejittered; zoom original.

Fig. 7. Boat (400×512). Algorithm 1 is nearly perfect: $e_0^\Delta$ = 0.25% and $e_\infty$ = 0.39%. Panels: uniform jitter, M = 10; Bayesian TV [12] (mae=13.4, psnr=20.8); Bake & Shake [3] (mae=12.5, psnr=20.3); Alg. 1, α ∈ {0.5, 1} (mae=0.6, psnr=42.9).

Fig. 8. Boat (400×512). Here $\lfloor . \rceil$ denotes approximation to the nearest integer. Panels: $d = \lfloor 6 \sin(n/20) \rceil$; Algorithm 1 ≡ Original; $d = \lfloor 6 \sin(n/4) \rceil$; Algorithm 1 ≡ Original.

4 Algorithms for Color Natural Images

We extend Algorithm 1 to RGB color images where all channels incur the same jitter. RGB images are represented by vector-valued matrices $f$ where each pixel $f_i(j)$ has 3 components, $f_i(j; \kappa)$, $1 \le \kappa \le 3$. The jittering model now reads:

$$g_i(j; \kappa) = \begin{cases} f_i(j + d_i; \kappa) & \text{if } 1 \le j + d_i \le c, \\ 0 & \text{otherwise,} \end{cases} \qquad 1 \le i \le r, \ |d_i| \le M, \ 1 \le j \le c, \ 1 \le \kappa \le 3.$$

The main algorithm is again based on (2)-(3) and (4). Since the jitter is the same for all color channels, we obtain from $g$ a gray-value image $\gamma$ and estimate the relative row shifts $\hat p_i$ using $\gamma$ as in Algorithm 1. The dejittered color image $\hat f$ is obtained by inserting $\hat p$ into $g$. Similarly to (8), for any positive integer $n$ we denote by $\theta(n \times 3)$ the $n$-length vector-valued row whose components are $(0, 0, 0)$ for all $i = 1, \ldots, n$.

Algorithm 2 (Color images)
– Fix $N > M$, e.g., $N = M + 1$.
– Choose α = 1 or α = 0.5.
1. Define $f^* \in \mathbb{R}^{r \times (c + 2N) \times 3}$ and set $f_1^* = [\theta(N \times 3) \,\vdots\, g_1 \,\vdots\, \theta(N \times 3)]$.
2. Split $g = [g^L \,\vdots\, \bar g \,\vdots\, g^R]$, where $g^L \in \mathbb{R}^{r \times N}$, $\bar g \in \mathbb{R}^{r \times (c - 2N)}$ and $g^R \in \mathbb{R}^{r \times N}$.
3. Calculate $\gamma_1(j) = |\bar g_1(j; 1)| + |\bar g_1(j; 2)| + |\bar g_1(j; 3)|$ for $1 \le j \le c - 2N$.
4. Put $\hat p_0 = \hat p_1 = N + 1$ and $u = v = [\theta(N) \,\vdots\, \gamma_1 \,\vdots\, \theta(N)]$.
5. For any $i = 2, \ldots, r$ do:
   (a) $\forall k = 1, \ldots, 2N + 1$ do:
       i. $\gamma_i(j) = \bar g_i(j; 1) + \bar g_i(j; 2) + \bar g_i(j; 3)$;
       ii. do step 4a as in Algorithm 1;
   (b) Do steps 4b and 4c as in Algorithm 1;
   (c) Set $f_i^* = [\theta((\hat p_i - 1) \times 3) \,\vdots\, g_i \,\vdots\, \theta((2N - \hat p_i + 1) \times 3)]$.
6. Find $\hat f \in \mathbb{R}^{r \times c}$ as in step 5, Algorithm 1.

Computation time. In the conditions of Remark 3, p.443, for a 512×512 RGB image and N = 7 we got the solution in 1 s for α = 1 and in 1.4 s for α = 0.5.

Algorithm 2(a) (Compound models). In step 5a, Algorithm 2, replace $J$ as done in Algorithm 1(a).

Illustrations. In all examples, Algorithms 2 and 2(a) are used with N = M + 1. In Fig. 9, the main part of the error in $\hat d$ corresponds to the sky and to the ground, which are quite homogeneous, so the error is invisible to the naked eye. Part of it reaches the boat, so we display a zoom of the latter. Fig. 10 shows a zoom of a 707 × 579 image. The dejittering of the full image is nearly perfect since $e_\infty = 0.17\%$ and $e_0^\Delta = 0.28\%$. The jitter in Fig. 11 is a centered Gaussian with standard deviation σ = 6, truncated and quantized on {−12, . . . , 12}. Algorithm 2(a) for α = 0.5 and β ∈ {2, 3} gives better visual results than Algorithm 2. Fig. 12 shows a nearly perfect restoration since $e_\infty = 0.18\%$ and $e_0^\Delta = 0.37\%$.

Fig. 9. Dejittering yields mae=1.45, psnr=33.82, $e_1$ = 0.76 and $e_\infty$ = 3.76%. Panels: uniform jitter, M = 8; Man (478 × 532); Algorithm 2, α = 1; zooms (original, restored).

Fig. 10. The restoration of the whole image is quasi-perfect: $e_\infty$ = 0.17% and $e_0^\Delta$ = 0.28%. Panels (zooms of a 707 × 579 image): jitter $N(0, 5^2)$ truncated on {−15, .., 15}; Algorithm 2, α = 0.5; original.

Fig. 11. Gaussian jitter, M = 12; Algorithm 2(a). Zooms: (a) Jittered, (b) Original, (c) Dejittered.

Fig. 12. The result is quasi-perfect: mae=0.14, psnr=45.15, $e_0^\Delta$ = 0.37% and $e_\infty$ = 0.18%. Panels: uniform jitter, M = 8; original (542 × 410); Algorithm 2, α = 0.5.
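The two color-specific ingredients of Algorithm 2, namely reducing an RGB row to a gray-value row and applying one estimated shift to all three channels, might look as follows in a minimal sketch. The representation of a row as a list of (R, G, B) tuples, the 0-based indexing, and the function names are assumptions.

```python
def gray_row(rgb_row):
    """gamma(j) = |R| + |G| + |B| for one row of (R, G, B) pixels."""
    return [abs(r) + abs(g) + abs(b) for r, g, b in rgb_row]


def shift_rgb_row(rgb_row, d):
    """Apply one shift d to all channels: out(j) = in(j - d), zero-filled."""
    c = len(rgb_row)
    return [rgb_row[j - d] if 0 <= j - d < c else (0, 0, 0) for j in range(c)]


row = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
gray = gray_row(row)
shifted = shift_rgb_row(row, 1)
```

The shift itself would be estimated on the gray-value rows exactly as in the gray-value case; only the final row update touches the color data.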

5 Restoration of Noisy Jittered Images

Our approach is to first dejitter the raw data using the ideas of Algorithms 1-2 and then to denoise the dejittered image. In the second stage, we use fast shrinkage estimators, see e.g. [8]. Better methods would improve the final result.

5.1 Moderate Noise

For noise with an SNR of 15-20 dB or more, Algorithms 1 and 2 perform well.

Experiment. The image in Fig. 13(a) is corrupted with white zero-mean normal noise, 15 dB SNR, and independent uniform jitter on {−6, . . . , 6}. Taking into account that the columns of the image are nearly constant on large segments, dejittering in (b) is done using Algorithm 1(a) for β = 3. Denoising of (b) is done in (c) by hard thresholding the 2D Daubechies wavelet transform with 4 vanishing moments for T = 30. The restoration is fast and the result is clean, compared to Fig. 5.

Fig. 13. Peppers (512 × 512). (a) 15 dB SNR + jitter. (b) Dejittered, Alg. 1. (c) Denoised. For the restored image in (c), psnr=29.34.

5.2 Strong Noise

When the noise is strong, we propose a slightly different scheme having a comparable computational cost. The idea is to partially denoise each row of the image using hard thresholding and to replace the function $|.|^{\alpha}$ in step 4a(iii) of Algorithm 1 by a better adapted edge-preserving function $\psi$. Let $W : \mathbb{R}^{1 \times n} \to \mathbb{R}^{1 \times n}$ denote a 1D wavelet transform and $W^*$ its inverse. Given a threshold $T > 0$, let us introduce the hard thresholding operator $H_T : \mathbb{R}^{1 \times n} \to \mathbb{R}^{1 \times n}$ by

$$H_T(w)(j) = \begin{cases} 0 & \text{if } |w(j)| \le T, \\ w(j) & \text{otherwise,} \end{cases} \qquad 1 \le j \le n, \ \forall w \in \mathbb{R}^{1 \times n}. \qquad (9)$$
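The row denoising $W^* H_T(W \cdot)$ built on (9) can be sketched with a one-level orthonormal Haar transform standing in for $W$. The Haar choice is an assumption made to keep the example self-contained (the experiments here use Daubechies wavelets), and the function names are hypothetical.

```python
import math


def haar_forward(x):
    """One-level orthonormal Haar transform of an even-length row."""
    s = 1.0 / math.sqrt(2.0)
    half = len(x) // 2
    approx = [s * (x[2 * k] + x[2 * k + 1]) for k in range(half)]
    detail = [s * (x[2 * k] - x[2 * k + 1]) for k in range(half)]
    return approx + detail


def haar_inverse(w):
    """Inverse of haar_forward."""
    s = 1.0 / math.sqrt(2.0)
    half = len(w) // 2
    x = []
    for a, d in zip(w[:half], w[half:]):
        x.extend([s * (a + d), s * (a - d)])
    return x


def hard_threshold(w, T):
    """H_T of (9): zero coefficients with |w(j)| <= T, keep the others unchanged."""
    return [0.0 if abs(v) <= T else v for v in w]


def denoise_row(row, T):
    """Partial denoising of one row: W* H_T (W row)."""
    return haar_inverse(hard_threshold(haar_forward(row), T))


noisy = [10.0, 12.0, 50.0, 48.0]
clean = denoise_row(noisy, T=2.0)
```

With T = 2 the two small detail coefficients are removed while the large approximation coefficients pass through untouched, so the row becomes piecewise constant (approximately 11, 11, 49, 49) and the jump between its two halves is preserved.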

Knowing that the asymptotically optimal $T$, cf. [2], oversmooths rows, we use an under-optimal $T$. In order to simplify the presentation, we give the algorithm for gray-value images. The extension to color images is straightforward, cf. [9].

Algorithm 3 (Quite noisy images)
– Fix $N > M$, e.g., $N = M + 1$.
– Choose a 1D wavelet transform $W$ (e.g. Daubechies).
– Fix an under-optimal threshold $T$.
– Choose $\psi : \mathbb{R} \times \mathbb{R} \to \mathbb{R}_+$, e.g. $\psi(s, t) = (|s| + \beta|t|)^{\alpha}$, and fix $\alpha > 0$ and $\beta \ge 0$.
1. Define $f^* \in \mathbb{R}^{r \times (c + 2N)}$ and set $f_1^* = [\theta(N) \,\vdots\, g_1 \,\vdots\, \theta(N)]$.
2. Split $g = [g^L \,\vdots\, \bar g \,\vdots\, g^R]$ where $g^L \in \mathbb{R}^{r \times N}$, $\bar g \in \mathbb{R}^{r \times (c - 2N)}$ and $g^R \in \mathbb{R}^{r \times N}$.
3. Compute $\gamma_1 = W^* H_T(W \bar g_1)$.
4. Do steps 3 to 5 of Algorithm 1 with the following changes:
   (a) in step 4a(i), insert $\gamma_i = W^* H_T(W \bar g_i)$;
   (b) in step 4a(iii), use $J(k) = \frac{1}{n - m + 1} \sum_{j=m}^{n} \psi\bigl( |h^k_j - 2u_j + v_j| + \beta |h^k_j - u_j| \bigr)$.

Fig. 14. Boat (512 × 512). Restoration of (a) using different methods. Panels: Original; (a) 10 dB SNR + jitter; (b) Bayesian TV [12], mae=19.36, psnr=20.24; (c) Bake & Shake [3], mae=20.62, psnr=19.37; (d) Algorithm 3, dejittering; (e) our 2-stage method, mae=7, psnr=28.31.

Comments. Hard thresholding in steps 3 and 4a is better than other shrinkages since it leaves the important coefficients unchanged. The 1D row under-denoising (step 4a) helps to approach the model of Remark 1. Denoising of a dejittered image can be done by various methods.

Experiment. Boat in Fig. 14 is corrupted with white zero-mean normal noise at 10 dB SNR and independent jitter, uniform on {−8, ..., 8}. The restoration using Bake and Shake [3] is visually better than Bayesian TV [12]. For these results, cf. section 6, p. 450. We used Algorithm 3(a) with β = 0 and $\psi(t) = |t|^{\alpha}$ with α = 0.5 in step 4b. In steps 3 and 4a we use hard thresholding of the Daubechies wavelet coefficients with 2 vanishing moments and T = 30. The dejittered image in (d) is denoised in (e) by hard thresholding of its curvelet transform using the enhanced-denoising program in the CurveLab 2.1.2 toolbox relevant to [1].

6 Conclusions

The results obtained are of remarkable quality, while the algorithms are nearly real-time. More details and examples are presented in [9]. The crux of our approach is (a) to minimize a nonsmooth and possibly nonconvex local criterion on the magnitude of the second-order differences between consecutive rows; (b) to exclude from J all pixels due to the jitter. In the presence of strong noise, a critical step is to (under-)denoise the rows successively so that the prior mentioned in Remark 1 remains relevant, and to adapt the criterion J if necessary. The natural evolution of this work is to apply it to the restoration of video sequences and to take advantage of the correlation between consecutive frames.

Acknowledgements This work has been supported by grant Freedom, anr07-jcjc-0048-01. The author thanks Louis Laborelli (Institut National de l'Audiovisuel, France) for his discussion on practical questions relevant to jittering. The author is thankful to Dr. Sung-Ha Kang, Georgia Institute of Technology, Atlanta, who carried out all experiments with the methods [3] and [12], as well as to Dr. Jackie Shen (Barclays Capital, Wall Street), who provided his Matlab codes for [12].

References
1. Candès, E.J., Demanet, L., Donoho, D.L., Ying, L.: Fast discrete curvelet transforms. SIAM J. on Multiscale Modeling and Simulation 5(3), 861–899 (2006)
2. Donoho, D.L., Johnstone, I.M.: Ideal Spatial Adaptation by Wavelet Shrinkage. Biometrika 81(3), 425–455 (1994)
3. Kang, S.-H., Shen, J.: Video dejittering by bake and shake. Image and Vision Computing 24(2), 143–152 (2006)
4. Kang, S.-H., Shen, J.: Image Dejittering Based on Slicing Moments. Springer Series on Mathematics and Visualization, pp. 35–55 (2007)
5. Kokaram, A., Roosmalen, P.M.B., Rayner, P., Biemond, J.: Line registration of jittered video. In: Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. 2553–2556 (1997)
6. Kokaram, A.: Motion picture restoration. Springer, Heidelberg (1998)
7. Laborelli, L.: Removal of video line jitter using a dynamic programming approach. In: Proc. of the IEEE ICASSP, pp. 331–334 (2003)
8. Mallat, S.: A Wavelet Tour of Signal Processing. Academic Press, London (1999)
9. Nikolova, M.: One-iteration dejittering of digital video images. Report CMLA n.2008-20, http://www.cmla.ens-cachan.fr/fileadmin/Membres/nikolova/RT-DJ.pdf
10. Nikolova, M.: Local strong homogeneity of a regularized estimator. SIAM J. on Appl. Mathematics 61(2), 633–658 (2000)
11. Nikolova, M.: Analysis of the recovery of edges in images and signals by minimizing nonconvex regularized least-squares. SIAM J. on Multiscale Modeling and Simulation 4(3), 960–991 (2005)
12. Shen, J.: Bayesian video dejittering by bv image model. SIAM J. on Appl. Mathematics 64(5), 1691–1708 (2004)
13. Welk, M., Weickert, J., Becker, F., Schnörr, C., Feddern, C., Burgeth, B.: Median and related local filters for tensor-valued images. Signal Processing (special issue Tensor Signal Processing) 7, 291–308 (2007)

Sparsity Regularization for Radon Measures

Otmar Scherzer^{1,2} and Birgit Walch^{1,*}

1 Department of Mathematics, University of Innsbruck, Technikerstr. 21a, A-6020 Innsbruck, Austria, [email protected], [email protected], http://infmath.uibk.ac.at
2 Radon Institute of Computational and Applied Mathematics, Altenberger Str. 69, A-4040 Linz, Austria

Abstract. In this paper we establish a regularization method for Radon measures. Motivated by sparse $L^1$ regularization, we introduce a new regularization functional for the Radon norm, whose properties are then analyzed. Furthermore, we show well-posedness of Radon measure based sparsity regularization. Finally, we present numerical examples along with the underlying algorithmic and implementation details. Here we shall see that the number of iterations turns out to be of utmost importance for obtaining reliable reconstructions of sparse data with varying intensities.

1 Introduction

In this paper we consider the solution of the abstract equation

$$ F u' = v \quad \text{subject to } u' \in \operatorname{dom} F. \tag{1} $$

The operator $F$ is linear and bounded between Hilbert spaces $W'$ and $V$. We assume that $\operatorname{dom} F$ is a subset of the Radon measures on a bounded domain $\Omega \subseteq \mathbb{R}^n$. We consider solving the operator equation (1) approximately by a variational regularization method, which consists in minimizing the functional

$$ \hat T_{\alpha,v^\delta}(u') := \| F u' - v^\delta \|_V^2 + \alpha \| u' \|_{RM} \tag{2} $$

on $\operatorname{dom} F \subseteq W'$. Here $\| u' \|_{RM}$ is the norm of the Radon measure $u'$. In order to see the relation to sparsity we note that if $u'$ is absolutely continuous with density $U$, i.e., $U\,dx = du'$, then we have that

$$ \| u' \|_{RM} = \sup\left\{ \int_\Omega U v \, dx : v \in C_0(\Omega),\ \| v \|_{L^\infty} \le 1 \right\} = \| U \|_{L^1}. $$

The regularization method with Tˆα,vδ , where the Radon measure is replaced by the L1 -norm, has been analyzed in [13]. There, however, diﬀerent assumptions

Birgit Walch is Recipient of a DOC fFORTE-fellowship of the Austrian Academy of Sciences at the Department of Mathematics of the University of Innsbruck.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 452–463, 2009. © Springer-Verlag Berlin Heidelberg 2009


have been made that guarantee existence of a minimizer in $L^1(\Omega)$, while in this work we consider minimizers that are Radon measures.

The notion of sparsity appears in a variety of settings. In the context of regularization it is mostly used in connection with regularization terms $R_S(u) := \sum_i \omega_i \, |\langle u, \phi_i \rangle|$, where $(\phi_i)$ is a set of appropriate functions, typically forming a basis or frame. The inner product is on a Hilbert space and the $\omega_i$ are positive coefficients. We refer to a few papers related to this topic [7, 2, 3, 4, 5, 8, 9, 10, 11, 12, 14]. Some researchers even call total variation minimization sparsity regularization. We study the reconstruction of sparse functions and measures. In contrast to total variation regularization we focus on reconstructing sparse measures and not gradient measures.

There is a fundamental difference between regularization terms $R_S$ and $L^1$, respectively Radon measure, regularization. To see this, take $(\phi_i)$ an orthonormal basis and $\omega_i = 1$ in the definition of $R_S(u)$ and note that standard convex analysis in the Hilbert space $l^2$ is applicable. Note that $l^1 \subseteq l^2$ and therefore we can consider minimization of $u \mapsto \| F u - v^\delta \|^2 + \alpha R_S(u)$ over $l^2 \equiv L^2(\Omega)$. That is, there is a proper extension of the functional from $l^1$ to $l^2$ if the operator $F$ can be extended to $l^2$. However, convex analysis in the Hilbert space $L^2$ is not applicable for $\|\cdot\|_{L^1}$ regularization, since on domains with finite measure $L^2(\Omega) \subset L^1(\Omega)$, and minimization of $u \mapsto \| F u - v^\delta \|^2 + \alpha \| u \|_{L^1}$ over $L^2(\Omega)$ is a real restriction of the proper domain of the regularization functional, which is $L^1(\Omega)$. The curiosity is that after discretization of the latter with piecewise constant functions, a truncated expansion of the former is revealed.

The outline of this paper is as follows: In Section 2 we review the analysis of regularization methods. In Section 3 we review some basic facts on Radon measures and duals of Sobolev spaces. Having specified the ingredients, we apply the general results of the review sections to $\hat T_{\alpha,v^\delta}$ in Section 4 and show well-posedness and regularizing properties. Section 5 shows the analogy in the analysis to total variation minimization. Section 6 presents an example for sparse recovery and shows some reconstructions.

2 Review on Convergence Properties of Variational Regularization Methods

In this section we make the following general assumptions, where we stick to the notation of [13]. Afterwards, we apply the results to the setting already used in the introduction.

Assumption 1
1. Let $U$ and $V$ be Hilbert spaces.
2. $L : U \to V$ is a bounded linear operator.
3. $F := L|_{\operatorname{dom} F}$, where $\emptyset \ne \operatorname{dom} F$ is closed and convex in $U$.
4. $\tau_U$ and $\tau_V$ are the weak topologies on $U$ and $V$, respectively.


We consider now the solution of the abstract equation

$$ F u = v \quad \text{subject to } u \in \operatorname{dom} F. \tag{3} $$

We consider solving this operator equation by variational regularization methods, which consist in minimizing the functional

$$ T_{\alpha,v^\delta}(u) := \| F u - v^\delta \|_V^2 + \alpha R(u), $$

where $v^\delta \in V$. In most applications $v^\delta$ is a noisy approximation of the data $v$ in equation (3). In order to have regularization properties of the family $(T_{\alpha,v^\delta})$ it is required that $R$, $\|\cdot\|_V$, and $L$ satisfy:

Assumption 2
1. The norm $\|\cdot\|_V$ is sequentially lower semi-continuous with respect to $\tau_V$.
2. The functional $R : U \to [0, \infty]$ is convex and sequentially lower semi-continuous with respect to $\tau_U$. $\operatorname{dom} R = \{u : R(u) \ne \infty\}$ is the domain of $R$.
3. $D := \operatorname{dom} F \cap \operatorname{dom} R \ne \emptyset$ (which, in particular, implies that $R$ is proper).
4. For every $\alpha > 0$ and $M > 0$, the level sets $M_\alpha(M) := \operatorname{level}_M(T_{\alpha,v}) := \{u \in U : T_{\alpha,v}(u) \le M\}$ are sequentially pre-compact with respect to $\tau_U$.
5. For every $M > 0$ the set $M_\alpha(M)$ is sequentially closed with respect to $\tau_U$ and the restriction of $F$ to $M_\alpha(M)$ is sequentially continuous with respect to the topologies $\tau_U$ and $\tau_V$.

We stress that the sets $M_\alpha(M)$ are defined based on the Tikhonov functional for unperturbed data $v$ and we do not a priori exclude the case that $M_\alpha(M) = \emptyset$. We refer to the following theorems from [13], which guarantee the existence of a minimizer, stability of the regularized solutions, and convergence:

Theorem 3 (Existence). Let $F$, $R$, $D$, $U$, and $V$ satisfy Assumption 2. Assume that $\alpha > 0$ and $v^\delta \in V$. Then, there exists a minimizer of $T_{\alpha,v^\delta}$.

It has been shown by several authors that information on the noise level

$$ \| v^\delta - v \| \le \delta \tag{4} $$

is essential for an analysis of regularization methods. In fact, without this information the regularization parameter cannot be chosen such that convergence of $u_\alpha^\delta$ to a solution of equation (1) can be guaranteed.

Theorem 4 (Stability). Let $F$, $\operatorname{dom} F$, $U$, and $V$ satisfy Assumption 2. Assume that $\alpha > 0$ and $v_k \to v^\delta$. Moreover, let

$$ u_k \in \arg\min T_{\alpha,v_k}, \quad k \in \mathbb{N}. $$

Then, $(u_k)$ has a convergent subsequence. Every convergent subsequence converges to a minimizer of $T_{\alpha,v^\delta}$.


The following theorem clarifies the role of the regularization parameter $\alpha$. It has to be chosen in dependence on the noise level to guarantee approximation of the solution of (3).

Theorem 5 (Convergence). Let $F$, $\operatorname{dom} F$, $U$, and $V$ satisfy Assumption 2. Assume that (3) has a solution in $\operatorname{dom} F$ and that $\alpha : (0, \infty) \to (0, \infty)$ satisfies

$$ \alpha(\delta) \to 0 \quad \text{and} \quad \frac{\delta^2}{\alpha(\delta)} \to 0, \quad \text{as } \delta \to 0. $$

Moreover, let the sequence $(\delta_k)$ of positive numbers converge to 0, and assume that the data $v_k := v^{\delta_k}$ satisfy $\| v - v_k \| \le \delta_k$. Let $u_k \in \arg\min T_{\alpha(\delta_k),v_k}$. Then $(u_k)$ has a subsequence converging to a solution of (1).
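As a worked illustration of the parameter choice in Theorem 5 (an example added here, not taken from [13]), consider a power rule with a hypothetical exponent $p$:

```latex
\[
\alpha(\delta) = \delta^{p}, \quad 0 < p < 2
\quad\Longrightarrow\quad
\alpha(\delta) \to 0
\quad\text{and}\quad
\frac{\delta^{2}}{\alpha(\delta)} = \delta^{\,2-p} \to 0
\quad\text{as } \delta \to 0 .
\]
```

By contrast, the choice $\alpha(\delta) = \delta^{2}$ gives $\delta^{2}/\alpha(\delta) = 1$ for every $\delta$, so the second condition fails: the parameter must tend to zero, but more slowly than the squared noise level.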

3 Regularization on the Space of Radon Measures

We assume that $\Omega \subseteq \mathbb{R}^n$ and $\Omega' \subseteq \mathbb{R}^m$ are bounded, open and connected with Lipschitz boundary, respectively. For simplicity of presentation we take $V = L^2(\Omega')$. Other spaces can be considered, but then the notation becomes less transparent. We study minimization of the functional

$$ \hat T_{\alpha,v^\delta}(u') := \int_{\Omega'} (F u' - v^\delta)^2 + \alpha \| u' \|_{RM} \tag{5} $$

over the set of Radon measures on $\Omega$. Here, $\| u' \|_{RM}$ denotes the Radon norm of $u'$.

Radon Measures

Below we briefly review some facts about Radon measures and specify the relevant properties. The set of Radon measures is the dual of $C_0(\Omega)$. Here, $C_0(\Omega)$ is the space of continuous functions from $\Omega$ into $\mathbb{R}$ with compact support in $\Omega$. We always consider $C_0(\Omega)$ equipped with the supremum norm. We denote the dual by $\mathcal{M} := (C_0(\Omega))'$, and for $u' \in \mathcal{M}$ the Radon norm is defined by

$$ \| u' \|_{RM} := \sup\left\{ \int_\Omega v \, du' : v \in C_0(\Omega),\ \| v \|_{L^\infty} \le 1 \right\}. $$

We recall the definition of weak* convergence in $\mathcal{M}$, i.e., a bounded sequence $(u'_k)_k$ in $\mathcal{M}$ is weakly* convergent to $u' \in \mathcal{M}$ if

$$ \lim_{k\to\infty} \int_\Omega f \, du'_k = \int_\Omega f \, du' \quad \text{for all } f \in C_0(\Omega). $$

Below we show that $\|\cdot\|_{RM}$ is lower semi-continuous with respect to the weak* convergence on $\mathcal{M}$.


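Lemma 1. $\|\cdot\|_{RM}$ is lower semi-continuous with respect to the weak* convergence on $\mathcal{M}$.

Proof. Let a sequence of Radon measures $(u'_k)_k$ be weakly* convergent to some measure $u'$. Then,

$$
\begin{aligned}
\| u' \|_{RM} &= \sup\left\{ \int_\Omega v \, du' : v \in C_0(\Omega),\ \| v \|_{L^\infty} \le 1 \right\} \\
&= \sup\left\{ \lim_{k\to\infty} \int_\Omega v \, du'_k : v \in C_0(\Omega),\ \| v \|_{L^\infty} \le 1 \right\} \\
&\le \liminf_{k\to\infty} \| u'_k \|_{RM}.
\end{aligned}
$$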
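Two standard examples, added here as illustrations and not part of the original argument, may help to fix ideas. The points $x_i$, $x_k$ and the coefficients $c_i$ are hypothetical. The first shows that the Radon norm of a finite sum of point masses is the $\ell^1$ norm of its coefficients, which is the link to sparsity; the second shows that the inequality in Lemma 1 can be strict.

```latex
% (i) Point masses at pairwise distinct points x_1, ..., x_N in Omega.
\[
u' = \sum_{i=1}^{N} c_i\, \delta_{x_i}
\quad\Longrightarrow\quad
\| u' \|_{RM}
= \sup\Big\{ \sum_{i=1}^{N} c_i\, v(x_i) : v \in C_0(\Omega),\ \| v \|_{L^\infty} \le 1 \Big\}
= \sum_{i=1}^{N} |c_i| .
\]

% (ii) A unit point mass drifting to the boundary of Omega.
\[
u'_k = \delta_{x_k}, \quad x_k \to x \in \partial\Omega
\quad\Longrightarrow\quad
\int_\Omega f \, du'_k = f(x_k) \to 0 \ \text{ for all } f \in C_0(\Omega),
\]
\[
\text{so } u'_k \text{ converges weakly* to } 0, \text{ while }
\| 0 \|_{RM} = 0 < 1 = \liminf_{k\to\infty} \| u'_k \|_{RM} .
\]
```

In (i) the supremum is approached by test functions with $v(x_i) = \operatorname{sgn}(c_i)$; in (ii) every $f$ has compact support in $\Omega$, so $f(x_k) = 0$ for all sufficiently large $k$.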

Dual of a Sobolev Space

Let $s \in \mathbb{N}$ be fixed. In the following we investigate the dual of the Sobolev space $W := W_0^{s,2}(\Omega)$, which is a Hilbert space with the inner product

$$ \langle w_1, w_2 \rangle_s := \int_\Omega \nabla^s w_1 \cdot \nabla^s w_2, $$

where $\nabla^s$ is the tensor containing all $s$-th derivatives. The associated norm is denoted by $\| w \|_s$. For $w' \in W'$, the dual of $W_0^{s,2}(\Omega)$, we have

$$ \| w' \|_{-s} := \sup\{ w' \tilde w : \tilde w \in W,\ \| \tilde w \|_s \le 1 \}. $$

$W'$ satisfies the following properties:

1. From the Riesz representation theorem (see e.g. [6, Theorem 3.4]) it follows that for every $w' \in W'$ there exists $w \in W$ such that $w' \tilde w = \langle w, \tilde w \rangle_s$ for all $\tilde w \in W$. We define the Riesz mapping

$$ I w' = w, \tag{6} $$

and note that $I$ is an isomorphism between $W'$ and $W$, i.e., $\| I w' \|_s = \| w' \|_{-s}$. In particular, we have that $(w'_k)_k \to w'$ with respect to the topology $\tau_{W'}$ if and only if $(w_k)_k = (I w'_k)_k \to I w' = w$ with respect to the topology $\tau_W$.

2. The inner product on the dual space $W'$ can be defined by $\langle w'_1, w'_2 \rangle_{-s} = \langle w_1, w_2 \rangle_s$, where $w_1, w'_1$ and $w_2, w'_2$ are related by the Riesz representation theorem, respectively.

Now, we state a lemma, which is central for our further considerations:

Lemma 2. Let $2s > n$; recall that $s$ is the order of differentiation in the definition of $W$ and $n$ is the dimension of $\Omega$. Then


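1. $\|\cdot\|_{RM}$ is convex and lower semi-continuous on $W'$.
2. $\mathcal{M}$ is closed in $W'$.
3. There exists a constant $C$ such that $\| w' \|_{-s} \le C \| w' \|_{RM}$ for all $w' \in \mathcal{M}$.

Proof. We make some general statements first. Since, by assumption, $2s > n$, the Sobolev embedding theorem (see [1, Thm. 5.4]) guarantees that the embedding from $W$ into $C_0(\Omega)$ is bounded, i.e., there exists a constant $C$ such that

$$ \| u \|_{L^\infty} \le C \| u \|_s \quad \text{for all } u \in W. \tag{7} $$

Since $C_0^\infty(\Omega)$ is dense in $W$ and $C_0(\Omega)$ (with respect to the topologies of $W$ and $C_0(\Omega)$, respectively), we have

$$
\begin{aligned}
\| u' \|_{RM} &= \sup\{ u' v : v \in C_0(\Omega),\ \| v \|_{L^\infty} \le 1 \} \\
&= \sup\{ u' v : v \in C_0^\infty(\Omega),\ \| v \|_{L^\infty} \le 1 \} \\
&= \frac{1}{C} \sup\{ u' v : v \in C_0^\infty(\Omega),\ \| v \|_{L^\infty} \le C \} \\
&\ge \frac{1}{C} \sup\{ u' v : v \in C_0^\infty(\Omega),\ \| v \|_s \le 1 \} \\
&= \frac{1}{C} \sup\{ u' v : v \in W_0^{s,2}(\Omega),\ \| v \|_s \le 1 \} \\
&= \frac{1}{C} \| u' \|_{-s}.
\end{aligned}
$$

Thus, $\mathcal{M} \subseteq W'$.

1. Let $(u'_k)_k$ be a sequence of Radon measures which is convergent to $u'$ in $W'$ (i.e., with respect to $\tau_{W'}$). It remains to prove that $u'$ is a Radon measure. Since $(u'_k)_k$ is bounded in $W'$, it is also weakly* convergent in $W'$, meaning that $u'_k v \to u' v$ for all $v \in W$. Then, in particular, we have $u'_k v \to u' v$ for all $v \in C_0^\infty(\Omega)$. Now, let $v \in C_0^\infty(\Omega)$ satisfy $\| v \|_{L^\infty} \le 1$; then

$$
\begin{aligned}
u' v &= \lim_{k\to\infty} u'_k v \\
&\le \lim_{k\to\infty} \sup\{ u'_k \tilde v : \tilde v \in C_0(\Omega),\ \| \tilde v \|_{L^\infty} \le 1 \} \\
&\le \liminf_{k\to\infty} \| u'_k \|_{RM}.
\end{aligned}
\tag{8}
$$

Since $C_0^\infty(\Omega)$ is dense in $C_0(\Omega)$, the last inequality shows that

$$ \| u' \|_{RM} \le \liminf_{k\to\infty} \| u'_k \|_{RM} $$

and, thus, $u'$ is a Radon measure.

2. From (8) it also follows that $\|\cdot\|_{RM}$ is lower semi-continuous on $W'$. The convexity is trivial.

3. Using (7) it follows that

$$
\| w' \|_{-s} = \sup\{ w' \tilde w : \tilde w \in W,\ \| \tilde w \|_s \le 1 \}
\le \sup\{ w' \tilde w : \tilde w \in C_0(\Omega),\ \| \tilde w \|_{L^\infty} \le C \}
= C \| w' \|_{RM}.
$$

This gives the third assertion.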

4 Application to Variational Regularization on Radon Measures

We consider minimization of $\hat T_{\alpha,v^\delta}$ on $W'$, the dual of the Sobolev space $W_0^{s,2}(\Omega)$, with $\operatorname{dom} F := \mathcal{M}$, the space of Radon measures, and $L : W' \to L^2(\Omega')$ bounded as in Assumption 1. Here $W'$ and $L^2(\Omega')$ play the role of $U$ and $V$ in Assumption 1; i.e., we consider the weak topologies on $W'$ (note that since $W'$ is a Hilbert space, weak and weak* convergence can be identified) and on $L^2(\Omega')$. Note that in the notation of Assumption 1 we use here $F := L|_{\operatorname{dom} F}$. In order to apply the general results stated in Section 2 we have to verify Assumption 2. The requirement in Assumption 1 that $\operatorname{dom} F = \mathcal{M}$ is closed in $W'$ has already been shown in Lemma 2.

1. We recall that every norm on a Hilbert space is continuous and convex and therefore lower semi-continuous with respect to the weak topology. Therefore, $\|\cdot\|_{W'}$ is sequentially weakly lower semi-continuous with respect to $\tau_{W'}$.
2. The functional $R(\cdot) := \|\cdot\|_{RM}$ is convex and lower semi-continuous, which has already been shown in Lemma 2.
3. The set of Radon measures, which equals the domain $D$, is not empty.
4. Let $\alpha > 0$, $M > 0$, and let $(u'_k)_k$ be a sequence in $M_\alpha(M)$. We show that $(u'_k)_k$ has a convergent subsequence with respect to $\tau_{W'}$. From the definition of $\hat T_{\alpha,v^\delta}$ it follows that $(\| u'_k \|_{RM})_k$ is bounded and, therefore, from Lemma 2 it follows that $(u'_k)_k$ is bounded with respect to $\|\cdot\|_{-s}$. Thus, $(u'_k)_k$ has a subsequence which weakly converges in $W'$. This shows that the level set is sequentially pre-compact with respect to $\tau_{W'}$.
5. Let us follow up on the proof of the previous item.
   – Let us denote the weak limit of $(u'_k)_k$ in $W'$ by $u'$. We show that $u' \in M_\alpha(M)$. We use that $\|\cdot\|_{RM}$ is lower semi-continuous with respect to $W'$. Moreover, since $L : W' \to L^2(\Omega')$ is bounded, the functional $w' \mapsto \| L w' - v^\delta \|^2$ is lower semi-continuous with respect to $W'$. Thus, the sum of both terms is lower semi-continuous and hence $u' \in M_\alpha(M)$. Thus $M_\alpha(M)$ is sequentially closed.
   – The operator $L|_{\operatorname{dom} F}$ is weakly continuous and $\operatorname{dom} F$ is weakly sequentially closed. This follows from Lemma 2, which states that $\operatorname{dom} F = \mathcal{M}$ is closed and convex, and from the boundedness of $L$ on $W'$.

Therefore, Assumption 2 is satisfied and the assertions follow. Theorem 5 requires the existence of a solution of (3) in $D$. Thus, for the application of this result the existence of a solution with finite Radon norm is required.

5 Methodological Comparison with Finite Total Variation Regularization

The method which we are proposing is methodologically related to total variation minimization, which can be viewed as the relaxation of $W^{1,1}$-regularization, which in turn consists in minimization of the functional

$$ u \mapsto \int_{\Omega'} (F u - v^\delta)^2 + \alpha \int_\Omega |\nabla u|. $$

Total variation minimization consists in minimization of $u \mapsto \int_{\Omega'} (F u - v^\delta)^2 + \alpha |Du|$, where $|Du|$ is the total variation of $u$, which is the norm of the finite, vector-valued Radon measure $Du$. In our context the regularization is with respect to Radon measures, which is a relaxation of $L^1$-regularization. Thus, total variation regularization can be considered as a regularization method on Radon measures for the first derivatives of the function, while according to our theory, $L^1$-regularization is for the distributions in $W^{-2,2}(\Omega)$. The derived analogy is not completely satisfactory and is certainly subject to further research. The analogy to total variation minimization suggests that the smallest Sobolev space which is a Hilbert space and contains the Radon measures is $W^{-1,2}(\Omega)$. However, based on our analysis so far, this space is slightly too small to perform analytical studies. Our analysis is based on the standard Sobolev embedding theorem and, as a consequence, slightly more regularity has to be imposed on the linear operator $F$ than expected from the comparison with the total variation analysis.

6 Application in Nuclear Medicine

Apart from its purely theoretical background, the concept of sparse data also proves relevant to a variety of real-world applications. From the imaging point of view we consider the field of nuclear medicine one major area of interest. In principle, however, any type of peaky (clustered) data on an otherwise relatively homogeneous background appears suitable for sparsity reconstruction. In the following we give a short description of this research topic as an introduction to the practical part of sparsity regularization: The two most popular techniques in nuclear medicine, PET (Positron Emission Tomography) and SPECT (Single Photon Emission Computed Tomography), both rely on nuclear disintegration. Here, a tomographic scanner measures the decay of a radioactive tracer substance which has previously been injected into the patient's body. Such a procedure is often used, e.g., in cancer diagnosis. For the purposes of imaging we consider the related isotopes our sparse data. Based on the respective measurements we obtain a so-called sinogram, plotting the number of radioactive disintegrations against the different scanner angles. The actual image is then reconstructed from the given sinogram. In the medical imaging context sparse variational reconstructions have already been used for MRI RF excitation pulse design in [15].

6.1 Algorithm Characteristics

The current section focuses on the most important implementation characteristics of the main reconstruction algorithms involved in sparsity reconstruction. Firstly, we have decided to apply our sample data (see Paragraph 6.2) to the following Daubechies, Defrise, De Mol [7] (DDD)-type implementation

$$ u^{k+1} := u^k - \lambda F^*(F u^k - v^\delta) - \alpha \, \operatorname{sgn}(u^{k+1}), \tag{9} $$

where the last term represents the sign operator (denoted by sgn), applied to the next-step reconstruction, and may also be expressed by $u^{k+1} / |u^{k+1}|$. We thus obtain the alternative formulation

$$ S^{-1}(u^{k+1}) := \left( 1 + \frac{\alpha}{|u^{k+1}|} \right) u^{k+1} = u^k - \lambda F^*(F u^k - v^\delta). \tag{10} $$

As indicated by the notation, the set-valued operator $S^{-1}$ has a univariate inverse and therefore we get an implementable scheme by applying the inverse of $S^{-1}$:

$$ u^{k+1} = S\big(u^k - \lambda F^*(F u^k - v^\delta)\big), \tag{11} $$

where

$$
S(t) :=
\begin{cases}
t + \alpha & \text{if } t \le -\alpha, \\
t - \alpha & \text{if } t \ge +\alpha, \\
0 & \text{else.}
\end{cases}
\tag{12}
$$
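The scheme (11) with the shrinkage function (12) can be sketched in a few lines. This is a minimal illustration under assumptions not made in the text: $F$ is a small dense matrix stored as a list of rows, the start image is zero, and the helper names (`soft_shrink`, `matvec`, `ddd_iterate`) are hypothetical. It is not the authors' implementation.

```python
def soft_shrink(t, alpha):
    """Shrinkage function S from (12)."""
    if t <= -alpha:
        return t + alpha
    if t >= alpha:
        return t - alpha
    return 0.0


def matvec(A, x):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    """Transpose of a matrix stored as a list of rows (plays the role of F*)."""
    return [list(col) for col in zip(*A)]


def ddd_iterate(F, v_delta, lam, alpha, n_iter, u0=None):
    """Iterate u <- S(u - lam * F^T (F u - v_delta)) as in (11), applied componentwise."""
    Ft = transpose(F)
    u = list(u0) if u0 is not None else [0.0] * len(F[0])
    for _ in range(n_iter):
        residual = [a - b for a, b in zip(matvec(F, u), v_delta)]
        gradient = matvec(Ft, residual)
        u = [soft_shrink(ui - lam * gi, alpha) for ui, gi in zip(u, gradient)]
    return u


# Usage with the identity as a toy operator: small entries are set to zero,
# large entries are moved towards zero by alpha.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
u = ddd_iterate(identity, [1.0, 0.05, -2.0], lam=1.0, alpha=0.1, n_iter=5)
```

The number of iterations is passed explicitly because, as discussed in Paragraph 6.2, it acts as an additional parameter of the reconstruction.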

We refer to this implementation as of DDD-type, since the implementation is for functions (actually measures) and not for basis coefficients, to which the original sparsity approach is devoted. Aside from this difference it is the algorithm suggested in [7]. The numerical implementation is for piecewise constant functions approximating Radon measures. The situation is analogous to the case of total variation regularization with finite elements, where derivatives (which are Radon measures) are approximated by derivatives of finite elements.

6.2 Experimental Results

In order to test the practical relevance of the above method we have created test data with a constant background exhibiting (clusters of) peaks, as we consider this the most realistic scenario. Most practical acquisition devices, however, rarely yield noise-free data, which has led to the decision to add different types of noise to our sample data v. That is, in order to achieve a proper real-world scenario we restrict the input to our reconstruction algorithms (see Paragraph 6.1) to noisy sinograms $v^\delta$ only. Since the tested algorithms are mainly intended for medical use, we have decided to adapt the sample framework to the nature of nuclear medical data acquisition. Most underlying processes in this field exhibit a clear Poisson nature, which has motivated the decision to overlay the clean sinogram data with typical Poisson noise. From a programming point of view we have decided to allow for the specification of four different parameters, each of which may have a certain influence on the outcome of the reconstruction process. The weighting parameters λ and α from Equations (9) to (12) appear an obvious choice in this case. Furthermore,

Fig. 1. Panels: l1 Regularization (reconstruction, left); l1 Residuals (residuals plotted against the number of iterations, right). The figure illustrates the convergence behavior of our proposed regularization scheme from a practical point of view. The right-hand plot shows the declining residuals obtained during the computation process yielding the reconstruction image on the left.

Fig. 2. Panels: α = 0.0036; α = 0.00036; α = 0.000036; α = 0.0000036. Decreasing values of α tend to sharpen even smaller object boundaries but at the same time also produce more background noise. Increasing the parameter, however, results in a quite homogeneous background while blurring and sometimes even removing smaller objects.

we have added one algorithm-independent input parameter, i.e., the number of iteration cycles. With the above implementation details specified, we have finally submitted the DDD-type algorithm from Paragraph 6.1 to different test cases. Number of Iterations: As is obvious from the problem statement in Equations (9) to (12), the final reconstruction is created by iteratively updating the current reconstruction image. In most cases the starting image will be of random nature. The number of iterations may thus have a certain impact on the outcome of the reconstruction process. For our algorithm we have created test cycles within the range of [25, 1600], with the remaining parameters fixed. In this respect we have determined 50 cycles as the minimum value for obtaining a relatively reliable result. Note, however, that here object boundaries appear blurred on an otherwise constant background. With an increasing number of iterations the different objects become sharper, while on

Fig. 3. Panels, left to right: l1 Regularization; L2 Regularization; W1,2 Regularization. The figures compare our benchmark results to those of other popular methods, e.g., $L^2$ and $W^{1,2}$ regularization. Here, $l^1$, depicted on the left, tends to yield the clearest approximations of the original objects. We have, however, noticed that in some cases small peaks may not be preserved during the regularization process. On the other hand, $L^2$ not only appears slightly more blurred but also fails to remove the circle object caused by the Radon transform, which we consider a major drawback. Finally, $W^{1,2}$ regularization tends to produce strong object blurs, which may be a problem not only for small and dense peaks but may also deteriorate the overall reconstruction quality.

the other hand we are faced with the problem of an ever more inhomogeneous background. Weighting Parameters: As described in Paragraph 6.1, the implementation includes two weighting parameters λ and α, closely related to each other. Since we consider the role of the first one to be of higher importance, we have decided on a ratio-based test environment. That is, setting λ within the range [0.016, 0.16], we have evaluated the quality of the reconstructions with α at $\lambda / 10^{n}$, where 1 ≤ n ≤ 4. The described test framework has, furthermore, helped to limit the computational effort to a reasonable extent. Interestingly, our experiments have shown that the ratio between λ and α turns out to be less important provided the first parameter is selected 'correctly'. There were no obvious differences between images with $\alpha = \lambda / 10^{2}$ and $\alpha = \lambda / 10^{3}$. On the other hand, we have noticed that lower values of λ produce a more homogeneous background while higher ones result in sharper object boundaries. In this respect the effects appear similar to those described for varying numbers of iterations. Finally, we may conclude that there exists a certain relation between the number of iterations and the choice of λ. The higher we set the weighting parameter, the sooner we have to stop the iterative cycle in order to limit the background inhomogeneities.
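The ratio-based parameter sweep described above can be sketched as a small grid generator. This is an illustration under two assumptions: that the ratio reads α = λ / 10^n (which reproduces the α values listed in Fig. 2 when λ = 0.036), and that λ = 0.036 is a representative value; neither the function name `parameter_grid` nor that λ value is stated in the text.

```python
def parameter_grid(lambdas, exponents=(1, 2, 3, 4)):
    """Return (lam, alpha) pairs with alpha = lam / 10**n for every exponent n."""
    return [(lam, lam / 10 ** n) for lam in lambdas for n in exponents]


# Usage: one hypothetical lambda inside the tested range [0.016, 0.16].
grid = parameter_grid([0.036])
for lam, alpha in grid:
    print(f"lambda = {lam:g}, alpha = {alpha:g}")
```

Each pair would then be combined with a fixed number of iterations, since the text reports that the iteration count and λ interact.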

Acknowledgement This work has been supported by the Austrian Science Fund (FWF) within the national research networks Industrial Geometry, project 9203-N12, and Photoacoustic Imaging in Biology and Medicine, project S10505-N20.

References
1. Adams, R.A.: Sobolev Spaces. Academic Press, New York (1975)
2. Bredies, K., Lorenz, D.: Iterated hard shrinkage for minimization problems with sparsity constraints. SIAM J. Sci. Comput. 30(2), 657–683 (2008)
3. Candès, E.J., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006)
4. Combettes, P.L., Pesquet, J.-C.: Proximal thresholding algorithm for minimization over orthonormal bases. SIAM J. Optim. 18(4), 1351–1376 (2007)
5. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul. 4(4), 1168–1200 (electronic) (2005)
6. Conway, J.B.: A Course in Functional Analysis, 2nd edn. Graduate Texts in Mathematics, vol. 96. Springer, Heidelberg (1990)
7. Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm. Pure Appl. Math. 57(11), 1413–1457 (2004)
8. Daubechies, I., Fornasier, M., Loris, I.: Accelerated projected gradient methods for linear inverse problems with sparsity constraints. J. Fourier Anal. Appl. (to appear) (2008)
9. Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006)
10. Figueiredo, M., Nowak, R., Wright, S.: Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems. IEEE J. Sel. Topics Signal Process 1(4), 586–598 (2007)
11. Griesse, R., Lorenz, D.: A semismooth Newton method for Tikhonov functionals with sparsity constraints. Inverse Probl. 24(3), 035007, 19 (2008)
12. Ramlau, R., Teschke, G.: A Tikhonov-based projection iteration for nonlinear ill-posed problems with sparsity constraints. Numer. Math. 104(2), 177–203 (2006)
13. Scherzer, O., Grasmair, M., Grossauer, H., Haltmeier, M., Lenzen, F.: Variational Methods in Imaging. Applied Mathematical Sciences, vol. 167. Springer, New York (2008)
14. Tropp, J.A.: Just relax: convex programming methods for identifying sparse signals in noise. IEEE Trans. Inf. Theory 52(3), 1030–1051 (2006)
15. Zelinski, A.C., Wald, L.L., Setsompop, K., Goyal, V.K., Adalsteinsson, E.: Sparsity-enforced slice-selective MRI RF excitation pulse design. IEEE Trans. Med. Imag. 27, 1213–1229 (2008)

Split Bregman Algorithm, Douglas-Rachford Splitting and Frame Shrinkage

Simon Setzer

University of Mannheim, A5, 68131 Mannheim, Germany, [email protected], http://kiwi.math.uni-mannheim.de

Abstract. We examine relations between popular variational methods in image processing and classical operator splitting methods in convex analysis. We focus on a gradient descent reprojection algorithm for image denoising and the recently proposed Split Bregman and alternating Split Bregman methods. By identifying the latter with the so-called Douglas-Rachford splitting algorithm we can guarantee its convergence. We show that for a special setting based on Parseval frames the gradient descent reprojection and the alternating Split Bregman algorithm are equivalent and turn out to be a frame shrinkage method.

1 Introduction

In recent years variational models have been applied successfully in image restoration. These methods came along with various computational algorithms. Interestingly, the roots of many restoration algorithms can be found in classical algorithms from convex analysis dating back more than 40 years. It is useful from different points of view to discover these relations: classical convergence results carry over to the restoration algorithms at hand and ensure their convergence. On the other hand, earlier mathematical results have found new applications and should be acknowledged. The present paper fits into this context. Our aim is twofold: First, we show that the Alternating Split Bregman Algorithm proposed by Goldstein and Osher for image restoration and compressed sensing can be interpreted as a Douglas-Rachford Splitting Algorithm. In particular, this clarifies the convergence of the algorithm. Second, we consider the following denoising problem which uses an $L_2$ data-fitting term and a Besov-norm regularization term [1]:

$$\operatorname*{argmin}_{u \in B^1_{1,1}(\Omega)} \Big\{ \tfrac{1}{2}\, \|u - f\|^2_{L_2(\Omega)} + \lambda\, \|u\|_{B^1_{1,1}(\Omega)} \Big\}. \tag{1}$$

We show that for discrete versions of this problem involving Parseval frames the corresponding alternating Split Bregman Algorithm can be seen as an application of a Forward-Backward Splitting Algorithm. The latter is also related to the Gradient Descent Reprojection Algorithm, see Chambolle [2]. Since our methods are based on soft (coupled) frame shrinkage, we also establish the relation to the classical wavelet shrinkage scheme. Finally, we consider the Rudin-Osher-Fatemi model [3]

$$\operatorname*{argmin}_{u \in BV(\Omega)} \Big\{ \tfrac{1}{2}\, \|u - f\|^2_{L_2(\Omega)} + \lambda \int_\Omega |\nabla u(x)| \, dx \Big\}, \tag{2}$$

which is a successful edge-preserving image restoration method. We apply our findings to create an efficient frame-based minimization algorithm for the discrete version of this problem.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 464–476, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 Operator Splitting Methods

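Proximation and Soft Shrinkage. We start by considering the proximity operator

$$\operatorname{prox}_{\gamma\Phi}(f) := \operatorname*{argmin}_{u \in H} \Big\{ \frac{1}{2\gamma}\, \|u - f\|^2 + \Phi(u) \Big\} \tag{3}$$

on a Hilbert space $H$. If $\Phi : H \to \mathbb{R} \cup \{+\infty\}$ is proper, convex and lower semi-continuous (lsc), then for any $f \in H$ there exists a unique minimizer $\hat u := \operatorname{prox}_{\gamma\Phi}(f)$ of (3). By Fermat's rule, this minimizer is determined by the inclusion

$$0 \in \frac{1}{\gamma}(\hat u - f) + \partial\Phi(\hat u) \;\Leftrightarrow\; f \in \hat u + \gamma\, \partial\Phi(\hat u) \;\Leftrightarrow\; \hat u = (I + \gamma\, \partial\Phi)^{-1} f,$$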

where the set-valued function $\partial\Phi : H \to 2^H$ is the subdifferential of $\Phi$. If $\Phi$ is proper, convex and lsc, then $\partial\Phi$ is a maximal monotone operator. For a set-valued function $F : H \to 2^H$, the operator $J_F := (I + F)^{-1}$ is called the resolvent of $F$. If $F$ is maximal monotone, then $J_F$ is single-valued and firmly nonexpansive. In this paper, we are mainly interested in the following two functions $\Phi_i$, $i = 1, 2$, on $H := \mathbb{R}^M$:

i) $\Phi_1(u) := \|\Lambda u\|_1$ with $\Lambda := \operatorname{diag}(\lambda_j)_{j=1}^M$, $\lambda_j \ge 0$,
ii) $\Phi_2(u) := \big\| \tilde\Lambda\, |u| \big\|_1$ with $\tilde\Lambda := \operatorname{diag}(\tilde\lambda_j)_{j=1}^N$, $\tilde\lambda_j \ge 0$ and $|u| := \big( \|\mathbf{u}_j\|_2 \big)_{j=1}^N$ for $\mathbf{u}_j := (u_{j+kN})_{k=0}^{p-1}$ and $M = pN$.

The corresponding Fenchel conjugate functions are given by

i) $\Phi_1^*(u) := \iota_C(u)$ with $C := \{u \in \mathbb{R}^M : |u_j| \le \lambda_j,\ j = 1, \dots, M\}$,
ii) $\Phi_2^*(u) := \iota_{\tilde C}(u)$ with $\tilde C := \{u \in \mathbb{R}^M : \|\mathbf{u}_j\|_2 \le \tilde\lambda_j,\ j = 1, \dots, N\}$,

where $\iota_C$ denotes the indicator function of the set $C$ (or $\tilde C$), i.e., $\iota_C(u) := 0$ for $u \in C$ and $\iota_C(u) := +\infty$ otherwise. A short calculation shows that for any $f \in \mathbb{R}^M$ we have

$$\operatorname{prox}_{\Phi_1}(f) = T_\Lambda(f), \qquad \operatorname{prox}_{\Phi_2}(f) = \tilde T_{\tilde\Lambda}(f),$$

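where $T_\Lambda$ denotes the soft shrinkage function given componentwise by

$$T_{\lambda_j}(f_j) := \begin{cases} 0 & \text{if } |f_j| \le \lambda_j, \\ f_j - \lambda_j \operatorname{sgn}(f_j) & \text{if } |f_j| > \lambda_j, \end{cases} \tag{4}$$

and $\tilde T_{\tilde\Lambda}$ denotes the coupled shrinkage function, compare [2, 4, 5],

$$\tilde T_{\tilde\lambda_j}(\mathbf{f}_j) := \begin{cases} 0 & \text{if } \|\mathbf{f}_j\|_2 \le \tilde\lambda_j, \\ \mathbf{f}_j - \tilde\lambda_j\, \mathbf{f}_j / \|\mathbf{f}_j\|_2 & \text{if } \|\mathbf{f}_j\|_2 > \tilde\lambda_j. \end{cases}$$

Similarly, we obtain

$$\operatorname{prox}_{\Phi_1^*}(f) = f - T_\Lambda(f), \qquad \operatorname{prox}_{\Phi_2^*}(f) = f - \tilde T_{\tilde\Lambda}(f). \tag{5}$$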
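The two shrinkage rules and relation (5) can be illustrated by a minimal NumPy sketch. The function names `soft_shrink` and `coupled_shrink` are hypothetical, and the grouping assumes the index layout $\mathbf{u}_j = (u_{j+kN})_{k=0}^{p-1}$ stated above, so that a vector of length $M = pN$ reshaped to $p$ rows has group $j$ in column $j$.

```python
import numpy as np

def soft_shrink(f, lam):
    """Soft shrinkage (4); lam is a scalar or an array broadcastable to f."""
    return np.sign(f) * np.maximum(np.abs(f) - lam, 0.0)

def coupled_shrink(f, lam, p):
    """Coupled shrinkage; f has length M = p*N, group j collects f[j + k*N]."""
    F = np.asarray(f, dtype=float).reshape(p, -1)   # column j of F is the group f_j
    norms = np.linalg.norm(F, axis=0)               # ||f_j||_2 for every group
    safe = np.where(norms > 0, norms, 1.0)          # avoid 0/0 for all-zero groups
    scale = np.maximum(1.0 - lam / safe, 0.0)       # equals 0 whenever ||f_j||_2 <= lam_j
    return (F * scale).ravel()

f = np.array([3.0, -0.5, -2.0])
shrunk = soft_shrink(f, 1.0)
projected = f - shrunk      # relation (5): prox of the conjugate, i.e. projection onto C
print(shrunk, projected)
print(coupled_shrink(np.array([3.0, 0.0, 4.0, 0.0]), 1.0, p=2))
```

In this sketch the quantity `f - soft_shrink(f, lam)` coincides with clipping `f` to the interval $[-\lambda, \lambda]$, which is the projection onto the set $C$.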

Operator Splittings. Now we consider more general minimization problems of the form

$$(P) \qquad \min_{u \in H_1} \underbrace{g(u) + \Phi(Du)}_{=: F_P(u)},$$

where $D : H_1 \to H_2$ is a bounded linear operator and both functions $g : H_1 \to \mathbb{R} \cup \{+\infty\}$ and $\Phi : H_2 \to \mathbb{R} \cup \{+\infty\}$ are proper, convex and lsc. Furthermore, we assume that $0 \in \operatorname{int}(D \operatorname{dom}(g) - \operatorname{dom}(\Phi))$. For $g(u) := \frac{1}{2\gamma}\|u - f\|^2$ and $D = I$ this is again our proximation problem. The corresponding dual problem has the form

$$(D) \qquad -\min_{b \in H_2} \underbrace{g^*(-D^* b) + \Phi^*(b)}_{=: F_D(b)}.$$

We assume that solutions $\hat u$ and $\hat b$ of the primal and dual problems, respectively, exist and that the duality gap is zero. In other words, we suppose that there is a pair $(\hat u, \hat b)$ which satisfies the Karush-Kuhn-Tucker conditions $0 \in \partial g(\hat u) + D^* \hat b$, $0 \in -D\hat u + \partial\Phi^*(\hat b)$. Then $\hat u$ is a solution of $(P)$ if and only if

$$0 \in \partial F_P(\hat u) = \partial g(\hat u) + \partial(\Phi \circ D)(\hat u).$$

Similarly, a solution $\hat b$ of the dual problem is characterized by

$$0 \in \partial F_D(\hat b) = \partial\big(g^* \circ (-D^*)\big)(\hat b) + \partial\Phi^*(\hat b).$$

In both the primal and the dual problem, one finally has to solve an inclusion of the form

$$0 \in A(\hat p) + B(\hat p). \tag{6}$$

Various splitting techniques make use of this additive structure. In this paper, we restrict our attention to the forward-backward splitting (FBS) and the Douglas-Rachford splitting (DRS). The inclusion (6) can be rewritten as the fixed point equation

$$\hat p - \eta B(\hat p) \in \hat p + \eta A(\hat p) \;\Leftrightarrow\; \hat p \in J_{\eta A}(I - \eta B)\hat p, \qquad \eta > 0, \tag{7}$$

and the FBS algorithm is just the corresponding iteration. For the following convergence result and generalizations of the algorithm we refer to [6, 7, 8, 9].

Theorem 1 (FBS). Let $A : H \to 2^H$ be maximal monotone and let $\beta B : H \to H$ be firmly nonexpansive for some $\beta > 0$. Furthermore, assume that a solution of (6) exists. Then, for any $p^{(0)}$ and any $\eta \in (0, 2\beta)$, the following FBS algorithm converges weakly to such a solution of (6):

$$p^{(k+1)} = J_{\eta A}(I - \eta B)\, p^{(k)}. \tag{8}$$

To introduce the DRS, we rewrite the right-hand side of (7) as

$$\hat p + \eta B \hat p \in J_{\eta A}(I - \eta B)\hat p + \eta B \hat p \;\Leftrightarrow\; \hat p \in J_{\eta B}\big( \underbrace{J_{\eta A}(I - \eta B)\hat p + \eta B \hat p}_{=: \hat t} \big).$$

The DRS algorithm [10] is the corresponding iteration, where we use $t^{(k)} := p^{(k)} + \eta B p^{(k)}$. For the following convergence result, which in contrast to the FBS algorithm holds also for set-valued operators $B$, see [6, 8].

Theorem 2 (DRS). Let $A, B : H \to 2^H$ be maximal monotone operators and assume that a solution of (6) exists. Then, for any initial elements $t^{(0)}$ and $p^{(0)}$ and any $\eta > 0$, the following DRS algorithm converges weakly to an element $\hat t$:

$$t^{(k+1)} = J_{\eta A}(2p^{(k)} - t^{(k)}) + t^{(k)} - p^{(k)}, \qquad p^{(k+1)} = J_{\eta B}(t^{(k+1)}).$$

Furthermore, it holds that $\hat p := J_{\eta B}(\hat t)$ satisfies $0 \in A(\hat p) + B(\hat p)$. If $H$ is finite-dimensional, then the sequence $(p^{(k)})_{k \in \mathbb{N}}$ converges to $\hat p$.
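As an illustration of the DRS iteration in Theorem 2, the following sketch applies it to a toy inclusion whose solution is known in closed form. The choice of operators is an assumption made only for this example: $A(p) = p - a$ (the gradient of $\frac12\|p - a\|^2$) and $B = \partial(\lambda\|\cdot\|_1)$, so that $J_{\eta A}(x) = (x + \eta a)/(1 + \eta)$, $J_{\eta B}$ is soft shrinkage with threshold $\eta\lambda$, and the solution of $0 \in A(\hat p) + B(\hat p)$ is $T_\lambda(a)$. The helper name `drs` is hypothetical.

```python
import numpy as np

def soft_shrink(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def drs(resolvent_A, resolvent_B, t0, iters=100):
    """Douglas-Rachford iteration of Theorem 2; returns the pair (t, p)."""
    t = t0.copy()
    p = resolvent_B(t)                       # one admissible choice of p^(0)
    for _ in range(iters):
        t = resolvent_A(2 * p - t) + t - p   # t^(k+1)
        p = resolvent_B(t)                   # p^(k+1)
    return t, p

a = np.array([3.0, -0.5, -2.0])
lam, eta = 1.0, 1.0
J_A = lambda x: (x + eta * a) / (1.0 + eta)  # resolvent of eta*A with A(p) = p - a
J_B = lambda x: soft_shrink(x, eta * lam)    # resolvent of eta*B with B = d(lam*||.||_1)

t, p = drs(J_A, J_B, np.zeros_like(a))
print(p, soft_shrink(a, lam))                # the two vectors should agree
```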

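3 Bregman Methods

For a function $\varphi : H \to \mathbb{R} \cup \{+\infty\}$, the Bregman distance $D_\varphi^{(p)}$ is defined as

$$D_\varphi^{(p)}(u, v) = \varphi(u) - \varphi(v) - \langle p, u - v \rangle,$$

with $p \in \partial\varphi(v)$, cp. [11]. Given an arbitrary initial value $u^{(0)}$ and a parameter $\gamma > 0$, the Bregman proximal point algorithm (BPP) applied to $(P)$ has the form [12, 13, 14]

$$u^{(k+1)} = \operatorname*{argmin}_{u \in H_1} \Big\{ \frac{1}{\gamma}\, D_\varphi^{(p^{(k)})}(u, u^{(k)}) + F_P(u) \Big\}, \qquad p^{(k+1)} \in \partial\varphi(u^{(k+1)}). \tag{9}$$

For conditions on $\varphi$ such that $(u^{(k)})_{k \in \mathbb{N}}$ converges to a minimizer of $(P)$, see [13] and the references therein. For $\varphi := \frac12 \|\cdot\|_2^2$, we recover the classical proximal point algorithm (PP) for $(P)$ which can be written as follows, compare [15],

$$u^{(k+1)} = \operatorname{prox}_{\gamma F_P}(u^{(k)}) = \operatorname*{argmin}_{u \in H_1} \Big\{ \frac{1}{2\gamma}\, \|u - u^{(k)}\|_2^2 + F_P(u) \Big\} = J_{\gamma \partial F_P}(u^{(k)}).$$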

Under our assumptions on $g$, $\Phi$ and $D$, the weak convergence of the PP algorithm is guaranteed for any initial point $u^{(0)}$, see [16]. In the same way, we can define the PP algorithm for $(D)$

$$b^{(k+1)} = \operatorname{prox}_{\gamma F_D}(b^{(k)}) = \operatorname*{argmin}_{b \in H_2} \Big\{ \frac{1}{2\gamma}\, \|b - b^{(k)}\|_2^2 + F_D(b) \Big\} = J_{\gamma \partial F_D}(b^{(k)})$$

and the same convergence result holds true. It is well-known that the PP algorithm applied to $(D)$ is equivalent to the augmented Lagrangian method (AL) for the primal problem, see, e.g., [15, 14]. To define this algorithm we first transform $(P)$ into the constrained minimization problem

$$\min_{u \in H_1,\, d \in H_2} E(u, d) \quad \text{s.t.} \quad Du = d, \tag{10}$$

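where $E(u, d) := g(u) + \Phi(d)$. This problem was introduced in [29]. The corresponding AL algorithm for $(P)$ is then defined as

$$\begin{aligned} (u^{(k+1)}, d^{(k+1)}) &= \operatorname*{argmin}_{u \in H_1,\, d \in H_2} \Big\{ E(u, d) + \langle b^{(k)}, Du - d \rangle + \frac{1}{2\gamma}\, \|Du - d\|_2^2 \Big\}, \\ b^{(k+1)} &= b^{(k)} + \frac{1}{\gamma}\, (Du^{(k+1)} - d^{(k+1)}). \end{aligned} \tag{11}$$

Indeed, it has been shown that for the same initial value $b^{(0)}$ the sequence $(b^{(k)})_{k \in \mathbb{N}}$ coincides with the one produced by the PP algorithm applied to $(D)$, see [15]. Moreover, if $(b^{(k)})_{k \in \mathbb{N}}$ converges strongly then every strong cluster point of $(u^{(k)})_{k \in \mathbb{N}}$ is a solution of $(P)$, cf. [17]. To solve the constrained optimization problem (10), Goldstein and Osher [18] proposed to use the Bregman distance

$$D_E^{(p^{(k)})}(u, d, u^{(k)}, d^{(k)}) = E(u, d) - E(u^{(k)}, d^{(k)}) - \langle p_u^{(k)}, u - u^{(k)} \rangle - \langle p_d^{(k)}, d - d^{(k)} \rangle$$

and the term $\frac{1}{2\gamma}\|Du - d\|_2^2$ instead of $F_P$ in (9). This results in the algorithm

$$\begin{aligned} (u^{(k+1)}, d^{(k+1)}) &= \operatorname*{argmin}_{u \in H_1,\, d \in H_2} \Big\{ D_E^{(p^{(k)})}(u, d, u^{(k)}, d^{(k)}) + \frac{1}{2\gamma}\, \|Du - d\|_2^2 \Big\}, \\ p_u^{(k+1)} &= p_u^{(k)} - \frac{1}{\gamma}\, D^*(Du^{(k+1)} - d^{(k+1)}), \\ p_d^{(k+1)} &= p_d^{(k)} + \frac{1}{\gamma}\, (Du^{(k+1)} - d^{(k+1)}), \end{aligned} \tag{12}$$

where we have used that (12) implies

$$\begin{aligned} 0 &\in \partial E(u^{(k+1)}, d^{(k+1)}) - \big( p_u^{(k)}, p_d^{(k)} \big) + \Big( \frac{1}{\gamma}\, D^*(Du^{(k+1)} - d^{(k+1)}),\; -\frac{1}{\gamma}\, (Du^{(k+1)} - d^{(k+1)}) \Big) \\ &= \partial E(u^{(k+1)}, d^{(k+1)}) - \big( p_u^{(k+1)}, p_d^{(k+1)} \big), \end{aligned}$$

so that $\big( p_u^{(k)}, p_d^{(k)} \big) \in \partial E(u^{(k)}, d^{(k)})$. Setting $p_u^{(k)} = -\frac{1}{\gamma} D^* b^{(k)}$ and $p_d^{(k)} = \frac{1}{\gamma} b^{(k)}$ for all $k \ge 0$ and regarding that for a bounded linear operator $D$,

$$\begin{aligned} D_E^{(p^{(k)})}(u, d, u^{(k)}, d^{(k)}) + \frac{1}{2\gamma}\, \|Du - d\|_2^2 = {} & E(u, d) - E(u^{(k)}, d^{(k)}) \\ & + \frac{1}{\gamma}\, \langle b^{(k)}, Du - Du^{(k)} \rangle - \frac{1}{\gamma}\, \langle b^{(k)}, d - d^{(k)} \rangle + \frac{1}{2\gamma}\, \|Du - d\|_2^2, \end{aligned}$$

Goldstein and Osher obtained the Split Bregman method [18]

$$\begin{aligned} (u^{(k+1)}, d^{(k+1)}) &= \operatorname*{argmin}_{u \in H_1,\, d \in H_2} \Big\{ E(u, d) + \frac{1}{2\gamma}\, \|b^{(k)} + Du - d\|_2^2 \Big\}, \\ b^{(k+1)} &= b^{(k)} + Du^{(k+1)} - d^{(k+1)}. \end{aligned} \tag{13}$$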

As already discovered in [19], the Split Bregman algorithm (13) is just the AL algorithm (11) with the only difference that in (13) the iterates $b^{(k)}$ are scaled by $\gamma$. Hence, we can conclude that the sequence $(\frac{1}{\gamma} b^{(k)})_{k \in \mathbb{N}}$ generated by the Split Bregman method (13) converges to a solution of the dual problem. The same holds true for the sequence $(p_d^{(k)})_{k \in \mathbb{N}}$ we get from (12). To summarize:

PP for (D) = AL for (P) = Split Bregman Alg.

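Since the minimization problem in (13) is hard to solve, Goldstein and Osher [18] proposed the following alternating Split Bregman algorithm without a convergence proof:

$$u^{(k+1)} = \operatorname*{argmin}_{u \in H_1} \Big\{ g(u) + \frac{1}{2\gamma}\, \|b^{(k)} + Du - d^{(k)}\|_2^2 \Big\}, \tag{14}$$

$$d^{(k+1)} = \operatorname*{argmin}_{d \in H_2} \Big\{ \Phi(d) + \frac{1}{2\gamma}\, \|b^{(k)} + Du^{(k+1)} - d\|_2^2 \Big\}, \tag{15}$$

$$b^{(k+1)} = b^{(k)} + Du^{(k+1)} - d^{(k+1)}. \tag{16}$$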

The next theorem identifies this alternating Split Bregman method as a special case of a DRS:

DRS for (D) = Alternating Split Bregman Alg.

If $H_1$ and $H_2$ are finite-dimensional it therefore provides us with a convergence result for the sequence $(b^{(k)})_{k \in \mathbb{N}}$ of this algorithm.

Theorem 3. The alternating Split Bregman algorithm coincides with the DRS algorithm applied to $(D)$ with $A := \partial(g^* \circ (-D^*))$ and $B := \partial\Phi^*$, where $\eta = 1/\gamma$ and

$$t^{(k)} = \eta\,(b^{(k)} + d^{(k)}), \qquad p^{(k)} = \eta\, b^{(k)}, \qquad k \ge 0. \tag{17}$$

Proof: 1. First, we show that for a proper, convex, lsc function $h : H_1 \to \mathbb{R} \cup \{+\infty\}$ and a bounded linear operator $K : H_1 \to H_2$ the following relation holds true:

$$\hat p = \operatorname*{argmin}_{p \in H_1} \Big\{ \frac{\eta}{2}\, \|Kp - q\|^2 + h(p) \Big\} \;\Rightarrow\; \eta\,(K\hat p - q) = J_{\eta\, \partial(h^* \circ (-K^*))}(-\eta q). \tag{18}$$

The first equality is equivalent to

$$0 \in \eta K^*(K\hat p - q) + \partial h(\hat p) \;\Leftrightarrow\; \hat p \in \partial h^*\big( -\eta K^*(K\hat p - q) \big).$$

Applying $-\eta K$ on both sides and then adding $\eta\,(K\hat p - q)$ implies

$$\begin{aligned} -\eta K \hat p &\in -\eta K\, \partial h^*\big( -\eta K^*(K\hat p - q) \big) = \eta\, \partial\big( h^* \circ (-K^*) \big)\big( \eta\,(K\hat p - q) \big), \\ -\eta q &\in \big( I + \eta\, \partial(h^* \circ (-K^*)) \big)\big( \eta\,(K\hat p - q) \big), \end{aligned}$$

which is by the definition of the resolvent equivalent to the right equality in (18).

2. Applying (18) to (14) with $h := g$, $K := D$ and $q := d^{(k)} - b^{(k)}$ we get

$$\eta\,(b^{(k)} + Du^{(k+1)} - d^{(k)}) = J_{\eta A}\big( \eta\,(b^{(k)} - d^{(k)}) \big).$$

Assume that the alternating Split Bregman iterates and the DRS iterates coincide with the identification (17) up to some $k \in \mathbb{N}$. Using this induction hypothesis it follows that

$$\eta\,(b^{(k)} + Du^{(k+1)}) = J_{\eta A}\big( \underbrace{\eta\,(b^{(k)} - d^{(k)})}_{2p^{(k)} - t^{(k)}} \big) + \underbrace{\eta\, d^{(k)}}_{t^{(k)} - p^{(k)}} = t^{(k+1)}. \tag{19}$$

By definition of $b^{(k+1)}$ in (16) we see that $\eta\,(b^{(k+1)} + d^{(k+1)}) = t^{(k+1)}$. Next we apply (18) to (15) with $h := \Phi$, $K := -I$ and $q := -(b^{(k)} + Du^{(k+1)})$ which gives together with (19),

$$\eta\,(b^{(k)} + Du^{(k+1)} - d^{(k+1)}) = J_{\eta B}\big( \underbrace{\eta\,(b^{(k)} + Du^{(k+1)})}_{t^{(k+1)}} \big) = p^{(k+1)}.$$

Again by the formula (16) for $b^{(k+1)}$ we obtain $\eta\, b^{(k+1)} = p^{(k+1)}$ which completes the proof. □

A similar result was shown in [20, 21].
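The identification (17) can also be checked numerically. The sketch below is only an illustration under stated assumptions: it uses $g(u) = \frac12\|u - f\|_2^2$ and $\Phi = \lambda\|\cdot\|_1$, a random test matrix `D`, and arbitrary starting values. For this $g$ one has $g^*(-D^T b) = \frac12\|D^T b\|_2^2 - \langle b, Df \rangle$, hence $A(b) = DD^T b - Df$ and $J_{\eta A}(x) = (I + \eta DD^T)^{-1}(x + \eta D f)$, while $J_{\eta B}$ is the projection onto $C$.

```python
import numpy as np

def soft_shrink(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

rng = np.random.default_rng(0)
N, M = 5, 8
D = rng.standard_normal((M, N))      # hypothetical test operator
f = rng.standard_normal(N)
lam, gamma = 0.7, 2.0
eta = 1.0 / gamma

# Resolvents of eta*A and eta*B for the dual problem (D)
J_A = lambda x: np.linalg.solve(np.eye(M) + eta * D @ D.T, x + eta * D @ f)
J_B = lambda x: np.clip(x, -lam, lam)

b = rng.standard_normal(M)           # arbitrary starting values b^(0), d^(0)
d = rng.standard_normal(M)
t, p = eta * (b + d), eta * b        # identification (17) for k = 0

max_dev = 0.0
for k in range(30):
    # alternating Split Bregman steps (14)-(16)
    u = np.linalg.solve(gamma * np.eye(N) + D.T @ D, gamma * f + D.T @ (d - b))
    d = soft_shrink(b + D @ u, gamma * lam)
    b = b + D @ u - d
    # Douglas-Rachford step of Theorem 2
    t = J_A(2 * p - t) + t - p
    p = J_B(t)
    max_dev = max(max_dev,
                  np.abs(t - eta * (b + d)).max(),
                  np.abs(p - eta * b).max())

print(max_dev)   # deviation from (17); expected to be at rounding level
```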

4 Application to Image Denoising

In the following, we restrict our attention to a discrete setting. We consider digital images defined on $\{1, \dots, n\} \times \{1, \dots, n\}$ and reshape them columnwise into vectors $f \in \mathbb{R}^N$ with $N = n^2$. If not stated otherwise the multiplication of vectors, their square root etc. are meant componentwise. We will now apply the algorithms defined in Sections 2 and 3 to the discrete denoising problem of the form

$$\operatorname*{argmin}_{u \in \mathbb{R}^N} \Big\{ \tfrac12\, \|u - f\|_2^2 + \Phi(Du) \Big\}, \qquad D \in \mathbb{R}^{M,N}, \quad M \ge N, \tag{20}$$

where $\Phi$ is defined as in Section 2. Consider the alternating Split Bregman algorithm (14)-(16) with $g(u) := \frac12 \|u - f\|_2^2$. Theorem 3 implies the convergence of $(b^{(k)})_{k \in \mathbb{N}}$ and it is not hard to show that for this special choice of $g$, the sequence $(u^{(k)})_{k \in \mathbb{N}}$ converges to a solution of the primal problem. The quadratic functional in (14) with the above choice of $g$ can simply be minimized by setting its gradient to zero which results in

$$u^{(k+1)} = (\gamma I + D^T D)^{-1} \big( \gamma f + D^T (d^{(k)} - b^{(k)}) \big).$$
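For completeness, the update formula for $u^{(k+1)}$ follows from the optimality condition of (14) with $g(u) = \frac12\|u - f\|_2^2$. A short worked derivation, using only the quantities defined above:

```latex
\begin{align*}
0 &= \nabla_u \Big( \tfrac12 \|u - f\|_2^2
      + \tfrac{1}{2\gamma} \|b^{(k)} + Du - d^{(k)}\|_2^2 \Big)
   = (u - f) + \tfrac{1}{\gamma}\, D^{T} \big( b^{(k)} + Du - d^{(k)} \big) \\
\Longleftrightarrow \quad
(\gamma I + D^{T} D)\, u &= \gamma f + D^{T} \big( d^{(k)} - b^{(k)} \big).
\end{align*}
```

Since $\gamma I + D^T D$ is symmetric positive definite for $\gamma > 0$, this linear system has a unique solution.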

Goldstein and Osher proposed to calculate the inverse $(\gamma I + D^T D)^{-1}$ by Gauß-Seidel iterations. Applying (4) we see that for $\Phi = \Phi_1$ the solution of the proximation problem in (15) is given by $d^{(k+1)} = T_{\gamma\Lambda}(b^{(k)} + Du^{(k+1)})$. The following algorithm shows the case $\Phi = \Phi_1$. Observe that in order to better compare this method to the other algorithms in this section, we have changed the order in which we compute $u^{(k+1)}$. This is allowed because there are no restrictions on the choice of the starting values.

Algorithm (Alternating Split Bregman Shrinkage)
Initialization: $u^{(0)} := f$, $b^{(0)} := 0$.
For $k = 0, 1, \dots$ repeat until a stopping criterion is reached
  $d^{(k+1)} := T_{\gamma\Lambda}(b^{(k)} + Du^{(k)})$,
  $b^{(k+1)} := b^{(k)} + Du^{(k)} - d^{(k+1)}$,
  $u^{(k+1)} := (\gamma I + D^T D)^{-1} \big( \gamma f + D^T (d^{(k+1)} - b^{(k+1)}) \big)$.

For $\Phi = \Phi_2$ we have to replace the soft shrinkage $T_{\gamma\Lambda}$ by the coupled shrinkage $\tilde T_{\gamma\tilde\Lambda}$. Note that this algorithm can also be used for the deblurring problem which differs from (20) in having a more general data-fitting term $g(u) := \frac12 \|Ku - f\|_2^2$ with some linear operator $K$. In this case one has to invert the matrix $\gamma K^T K + D^T D$ which can be diagonalized in many applications by FFT or DCT techniques, e.g., if it is circulant. The problem (20) can also be solved via its dual problem by

$$\hat u = f - D^T \hat b, \quad \text{where} \quad \hat b = \operatorname*{argmin}_{b \in \mathbb{R}^M} \Big\{ \tfrac12\, \|f - D^T b\|_2^2 + \Phi_i^*(b) \Big\}, \quad i = 1, 2, \tag{21}$$

see, e.g., [22]. Applying the FBS algorithm (8) to the dual problem (21) gives

$$b^{(k+1)} = \operatorname{prox}_{\gamma \Phi_i^*}\big( b^{(k)} + \gamma D (f - D^T b^{(k)}) \big), \quad i = 1, 2,$$

where $0 < \gamma < 2 / \|D^T D\|_2$. Using the relation (5) we obtain for $\Phi = \Phi_1$

$$b^{(k+1)} = b^{(k)} + \gamma D (f - D^T b^{(k)}) - T_\Lambda\big( b^{(k)} + \gamma D (f - D^T b^{(k)}) \big).$$

This yields the following algorithm to compute the minimizer of (20) for $\Phi = \Phi_1$:

Algorithm (FBS Shrinkage)
Initialization: $u^{(0)} := f$, $b^{(0)} := 0$.
For $k = 0, 1, \dots$ repeat until a stopping criterion is reached
  $d^{(k+1)} := T_\Lambda(b^{(k)} + \gamma D u^{(k)})$,
  $b^{(k+1)} := b^{(k)} + \gamma D u^{(k)} - d^{(k+1)}$,
  $u^{(k+1)} := f - D^T b^{(k+1)}$.
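Both algorithms above can be sketched in a few lines of NumPy. The example is an illustration under assumptions not taken from the paper: a one-dimensional signal, $D$ chosen as the forward difference matrix (so that $\Phi_1(Du)$ is a discrete total variation), a fixed number of iterations instead of a stopping criterion, and a dense direct solve in place of the Gauß-Seidel iterations. The function names are hypothetical.

```python
import numpy as np

def soft_shrink(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def asb_shrinkage(f, D, lam, gamma=1.0, iters=5000):
    """Alternating Split Bregman shrinkage for Phi_1 = lam*||.||_1 (reordered form)."""
    system = gamma * np.eye(f.size) + D.T @ D     # small dense system, solved directly
    u, b = f.copy(), np.zeros(D.shape[0])
    for _ in range(iters):
        v = b + D @ u
        d = soft_shrink(v, gamma * lam)
        b = v - d
        u = np.linalg.solve(system, gamma * f + D.T @ (d - b))
    return u

def fbs_shrinkage(f, D, lam, gamma, iters=5000):
    """FBS shrinkage; requires 0 < gamma < 2 / ||D^T D||_2."""
    u, b = f.copy(), np.zeros(D.shape[0])
    for _ in range(iters):
        v = b + gamma * (D @ u)
        d = soft_shrink(v, lam)
        b = v - d                                 # projection of v onto {|b_j| <= lam}
        u = f - D.T @ b
    return u

n = 20
rng = np.random.default_rng(0)
f = np.repeat([0.0, 1.0], n // 2) + 0.1 * rng.standard_normal(n)   # noisy step signal
D = np.diff(np.eye(n), axis=0)                    # forward differences, ||D^T D||_2 < 4
lam = 0.2

u_asb = asb_shrinkage(f, D, lam, gamma=1.0)
u_fbs = fbs_shrinkage(f, D, lam, gamma=0.4)
print(np.max(np.abs(u_asb - u_fbs)))              # both should approximate the same minimizer
```

Since both iterations solve the same strictly convex problem (20), their outputs should agree up to the accuracy reached after the chosen number of iterations.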

For the functional $\Phi_2$ we have to replace the shrinkage function $T_\Lambda$ by $\tilde T_{\tilde\Lambda}$. This algorithm can also be deduced as a simple gradient descent reprojection algorithm as it was done, e.g., by Chambolle [2]. Note that this is not the often cited Chambolle algorithm in [22]. A relation of this method to the Bermúdez-Moreno algorithm which also turns out to be an FBS algorithm was shown in [23]. A connection to min-max duality was established in [24].

4.1 Besov-Norm Regularization

For a sufficiently smooth orthogonal wavelet basis $\{\psi_i\}_{i \in I}$ of $L_2(\Omega)$ with wavelets of more than one vanishing moment, problem (1) can be rewritten as

$$\operatorname*{argmin}_{d} \Big\{ \tfrac12\, \|d - c\|_{\ell_2}^2 + \lambda\, \|d\|_{\ell_1} \Big\},$$

where $c := (\langle f, \psi_i \rangle)_{i \in I}$ and $d := (\langle u, \psi_i \rangle)_{i \in I}$. In the discrete setting, consider the orthogonal matrix $W \in \mathbb{R}^{N,N}$ having as rows the filters of orthogonal wavelets (and scaling functions) up to a certain level. Then the minimization problem corresponding to (1) is given by

$$\hat u = \operatorname*{argmin}_{u \in \mathbb{R}^N} \Big\{ \tfrac12\, \|u - f\|_2^2 + \|\Lambda W u\|_1 \Big\} = \operatorname*{argmin}_{u \in \mathbb{R}^N} \Big\{ \tfrac12\, \|Wu - Wf\|_2^2 + \|\Lambda W u\|_1 \Big\}. \tag{22}$$

The orthogonality of $W$ yields further $\hat u = W^T \hat d$, where

$$\hat d = \operatorname*{argmin}_{d \in \mathbb{R}^N} \Big\{ \tfrac12\, \|d - c\|_2^2 + \|\Lambda d\|_1 \Big\}, \qquad c := Wf, \quad \Lambda := \lambda I_N, \tag{23}$$

and by (4) we obtain the known wavelet shrinkage procedure $\hat u = W^T T_\Lambda(W f)$ consisting of a wavelet transform $W$ followed by soft shrinkage $T_\Lambda$ of the wavelet coefficients and the inverse wavelet transform $W^T$. However, for image processing tasks like denoising or segmentation, ordinary orthogonal wavelets are not suited due to their lack of translational invariance which leads to visible artefacts. Without the usual subsampling, the method becomes translationally invariant and the results can be improved. But then $W \in \mathbb{R}^{M,N}$, $M = pN$, where $p$ is three times the decomposition level plus one for the rows belonging to the scaling function filters on the coarsest scale. We still have $W^T W = I_N$, but of course $W W^T \ne I_M$, i.e., the rows of $W$ form a discrete Parseval frame on $\mathbb{R}^N$ but not a basis. For the design of such frames see, e.g., [25, 26]. Equality (22) is still true for Parseval frames, but the problem is no longer equivalent to (23). Instead we can apply FBS shrinkage or alternating Split Bregman shrinkage with $D = W$ and $\Phi = \Phi_1$. Note that in order to use the FBS algorithm, $\gamma$ has to fulfill $0 < \gamma < 2 / \|W^T W\|_2$. Now $W^T W = I_N$, thus we have to choose $\gamma$ in $(0, 2)$ and $\gamma = 1$ is an admissible choice. It was shown in [27] that both algorithms coincide for $D = W$ with $W^T W = I_N$ and $\gamma = 1$:

Alternating Split Bregman Shrinkage = FBS Shrinkage

Moreover, the third step of both algorithms can be simplified to the frame synthesis step

$$u^{(k+1)} = W^T d^{(k+1)}. \tag{24}$$

4.2 ROF Regularization

In this section, we apply the algorithms presented so far to the discrete ROF denoising method. We use an appropriate discretization of the absolute value of the gradient. Let $h_0 := \frac12 [1 \;\, 1]$ and $h_1 := \frac12 [1 \;\, {-1}]$ be the filters of the Haar wavelet. For convenience of notation, we use periodic boundary conditions and the corresponding circulant matrices are denoted by $H_0 \in \mathbb{R}^{n,n}$ and $H_1 \in \mathbb{R}^{n,n}$. Then the following matrix fulfills $W^T W = I_N$ but $W W^T \ne I_{4N}$:

$$W := \begin{pmatrix} \mathbf{H}_0 \\ \mathbf{H}_1 \end{pmatrix} = \begin{pmatrix} H_0 \otimes H_0 \\ H_0 \otimes H_1 \\ H_1 \otimes H_0 \\ H_1 \otimes H_1 \end{pmatrix},$$

where $\mathbf{H}_0 := H_0 \otimes H_0 \in \mathbb{R}^{N,N}$ and $\mathbf{H}_1 \in \mathbb{R}^{3N,N}$ collects the remaining three blocks. In [4, 5] it was shown that

$$\Big( \big( (H_0 \otimes H_1) u \big)^2 + \big( (H_1 \otimes H_0) u \big)^2 + \big( (H_1 \otimes H_1) u \big)^2 \Big)^{1/2}$$

is a consistent finite difference discretization of $|\nabla u|$. Using this gradient discretization, the discrete version of the ROF functional in (2) reads

$$\operatorname*{argmin}_{u \in \mathbb{R}^N} \Big\{ \tfrac12\, \|u - f\|_2^2 + \big\| \tilde\Lambda\, |\mathbf{H}_1 u| \big\|_1 \Big\}, \qquad \tilde\Lambda := \lambda I_N. \tag{25}$$

Observe that if we use the alternating Split Bregman algorithm with $D = \mathbf{H}_1$ for this problem we have to solve a linear system of equations in the third step of each iteration. This problem can be avoided by using that $\mathbf{H}_1$ is part of a Parseval frame, cp. [27]. To this end we define the proper, convex and lsc functional $\tilde\Phi_2$ which differs from $\Phi_2$ in that the first part of the input vector is neglected, i.e.,

$$\tilde\Phi_2(c) = \big\| \tilde\Lambda\, |c_1| \big\|_1 \quad \text{for } c = (c_0, c_1) \in \mathbb{R}^N \times \mathbb{R}^{3N}.$$

Now we can rewrite (25) as follows:

$$\operatorname*{argmin}_{u \in \mathbb{R}^N} \Big\{ \tfrac12\, \|u - f\|_2^2 + \tilde\Phi_2(W u) \Big\}.$$

Applying the alternating Split Bregman algorithm, or equivalently the FBS method, with $\gamma = 1$ and (24) we obtain the following algorithm.

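Initialization: $u^{(0)} := f$, $b^{(0)} := 0$.
For $k = 0, 1, \dots$ repeat until a stopping criterion is reached

$$\begin{aligned} d_0^{(k+1)} &:= (W u^{(k)})_0, \\ d_1^{(k+1)} &:= \tilde T_{\tilde\Lambda}\big( b_1^{(k)} + (W u^{(k)})_1 \big), \\ b_1^{(k+1)} &:= b_1^{(k)} + (W u^{(k)})_1 - d_1^{(k+1)}, \\ u^{(k+1)} &:= W^T \begin{pmatrix} d_0^{(k+1)} \\ d_1^{(k+1)} \end{pmatrix}, \end{aligned} \tag{26}$$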

where $(W u)_0 := \mathbf{H}_0 u$ and $(W u)_1 := \mathbf{H}_1 u$. Note that starting with $b_0^{(0)} := 0$ all iterates $b_0^{(k)}$ remain zero vectors. We also obtain algorithm (26) if we apply FBS shrinkage directly to (25) with $D = \mathbf{H}_1$ and $\gamma = 1$. We now give a numerical example for these two algorithms. The computations were performed in MATLAB. In Fig. 1 we see the result of applying the two algorithms to a noisy image. Note that we only show the resulting image for algorithm (26) here, since the difference to the alternating Split Bregman method with $D = \mathbf{H}_1$ is marginal. We also found that the two algorithms need nearly the same number of iterations. However, algorithm (26) is extremely fast since, unlike the alternating Split Bregman shrinkage, it does not require solving a linear system of equations. Moreover, $\gamma = 1$ seems to be a very good parameter choice. For the above numerical experiment we used periodic boundary conditions; concerning Neumann boundary conditions, see, e.g., [28].

Fig. 1. Comparison of algorithm (26) and the alternating Split Bregman method with $D = \mathbf{H}_1$. Stopping criterion: $\|u^{(k+1)} - u^{(k)}\|_\infty < 0.5$. Top left: Original image. Top right: Noisy image (white Gaussian noise with standard deviation 25). Bottom left: Algorithm (26), $\lambda = 70$ (53 iterations). Bottom right: Difference to alternating Split Bregman shrinkage with $D = \mathbf{H}_1$ (53 iterations); the color bar of the difference image ranges from −0.2 to 0.3.
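Algorithm (26) can be sketched in NumPy without forming the matrices explicitly. The following is a minimal illustration, not the MATLAB code used for Fig. 1: images are kept as 2D arrays, the periodic Haar filters are applied with `np.roll`, a fixed iteration count replaces the stopping criterion, and the test image, noise level and $\lambda$ are hypothetical. All function names are assumptions made for this example.

```python
import numpy as np

def lo(U, axis):       # periodic low-pass filter 1/2 [1 1] along one axis
    return 0.5 * (U + np.roll(U, -1, axis))

def hi(U, axis):       # periodic high-pass filter 1/2 [1 -1] along one axis
    return 0.5 * (U - np.roll(U, -1, axis))

def lo_T(U, axis):     # transpose of lo
    return 0.5 * (U + np.roll(U, 1, axis))

def hi_T(U, axis):     # transpose of hi
    return 0.5 * (U - np.roll(U, 1, axis))

def analysis(U):
    """Return (W u)_0 and the three stacked high-pass channels (W u)_1."""
    c0 = lo(lo(U, 0), 1)
    c1 = np.stack([hi(lo(U, 0), 1), lo(hi(U, 0), 1), hi(hi(U, 0), 1)])
    return c0, c1

def synthesis(c0, c1):
    """Apply W^T to the coefficients (c0, c1)."""
    return (lo_T(lo_T(c0, 1), 0) + lo_T(hi_T(c1[0], 1), 0)
            + hi_T(lo_T(c1[1], 1), 0) + hi_T(hi_T(c1[2], 1), 0))

def coupled_shrink(V, lam):
    """Shrink the three high-pass channels jointly at every pixel."""
    norms = np.sqrt((V ** 2).sum(axis=0))
    scale = np.maximum(1.0 - lam / np.where(norms > 0, norms, 1.0), 0.0)
    return V * scale

def rof_energy(u, f, lam):
    """Discrete ROF functional (25)."""
    _, c1 = analysis(u)
    return 0.5 * np.sum((u - f) ** 2) + lam * np.sum(np.sqrt((c1 ** 2).sum(axis=0)))

def haar_rof(f, lam, iters=200):
    """Iteration (26) with gamma = 1; returns u and the dual variable b_1."""
    u, b1 = f.copy(), np.zeros((3,) + f.shape)
    for _ in range(iters):
        c0, c1 = analysis(u)
        d1 = coupled_shrink(b1 + c1, lam)
        b1 = b1 + c1 - d1
        u = synthesis(c0, d1)              # frame synthesis step (24)
    return u, b1

rng = np.random.default_rng(0)
clean = np.zeros((16, 16))
clean[4:12, 4:12] = 1.0
f = clean + 0.2 * rng.standard_normal(clean.shape)
u, b1 = haar_rof(f, lam=0.2)
print(rof_energy(f, f, 0.2), rof_energy(u, f, 0.2))
```

No linear system is solved inside the loop; each iteration consists of one analysis step, one pixelwise shrinkage and one synthesis step.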

References

1. DeVore, R.A., Lucier, B.J.: Fast wavelet techniques for near-optimal image processing. In: IEEE MILCOM 1992 Conf. Rec., vol. 3, pp. 1129–1135. IEEE Press, San Diego (1992)
2. Chambolle, A.: Total variation minimization and a class of binary MRF models. In: Rangarajan, A., Vemuri, B.C., Yuille, A.L. (eds.) EMMCVPR 2005. LNCS, vol. 3757, pp. 136–152. Springer, Heidelberg (2005)
3. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
4. Mrázek, P., Weickert, J.: Rotationally invariant wavelet shrinkage. In: Michaelis, B., Krell, G. (eds.) DAGM 2003. LNCS, vol. 2781, pp. 156–163. Springer, Heidelberg (2003)
5. Welk, M., Steidl, G., Weickert, J.: Locally analytic schemes: A link between diffusion filtering and wavelet shrinkage. Applied and Computational Harmonic Analysis 24, 195–224 (2008)
6. Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM Journal on Numerical Analysis 16(6), 964–979 (1979)
7. Tseng, P.: Applications of a splitting algorithm to decomposition in convex programming and variational inequalities. SIAM Journal on Control and Optimization 29, 119–138 (1991)
8. Combettes, P.L.: Solving monotone inclusions via compositions of nonexpansive averaged operators. Optimization 53(5–6), 475–504 (2004)
9. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. Multiscale Modeling and Simulation 4, 1168–1200 (2005)
10. Douglas, J., Rachford, H.H.: On the numerical solution of heat conduction problems in two and three space variables. Transactions of the American Mathematical Society 82(2), 421–439 (1956)
11. Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics 7(3), 200–217 (1967)
12. Eckstein, J.: Nonlinear proximal point algorithms using Bregman functions, with applications to convex programming. Mathematics of Operations Research 18(1), 202–226 (1993)
13. Kiwiel, K.C.: Proximal minimization methods with generalized Bregman functions. SIAM Journal on Control and Optimization 35(4), 1142–1168 (1997)
14. Frick, K.: The Augmented Lagrangian Method and Associated Evolution Equations. Dissertation, University of Innsbruck (2008)
15. Rockafellar, R.T.: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Mathematics of Operations Research 1(2), 97–116 (1976)
16. Browder, F.E., Petryshyn, W.V.: The solution by iteration of nonlinear functional equations in Banach spaces. Bulletin of the American Mathematical Society 72, 571–575 (1966)
17. Iusem, A.N.: Augmented Lagrangian methods and proximal point methods for convex optimization. Investigación Operativa 8, 11–49 (1999)
18. Goldstein, T., Osher, S.: The Split Bregman method for l1 regularized problems. UCLA CAM Report (2008)
19. Yin, W., Osher, S., Goldfarb, D., Darbon, J.: Bregman iterative algorithms for ℓ1-minimization with applications to compressed sensing. SIAM Journal on Imaging Sciences 1(1), 143–168 (2008)
20. Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming 55, 293–318 (1992)
21. Gabay, D.: Applications of the method of multipliers to variational inequalities. In: Fortin, M., Glowinski, R. (eds.) Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary–Value Problems. Studies in Mathematics and its Applications, vol. 15, pp. 299–331. North–Holland, Amsterdam (1983)
22. Chambolle, A.: An algorithm for total variation minimization and applications. Journal of Mathematical Imaging and Vision 20, 89–97 (2004)
23. Aujol, J.F.: Some first-order algorithms for total variation based image restoration. Preprint ENS Cachan (2008)
24. Zhu, M., Chan, T.: An efficient primal-dual hybrid gradient algorithm for total variation image restoration. UCLA CAM Report (2008)
25. Daubechies, I., Han, B., Ron, A., Shen, Z.: Framelets: MRA-based construction of wavelet frames. Applied and Computational Harmonic Analysis 14, 1–46 (2003)
26. Dong, B., Shen, Z.: Pseudo-splines, wavelets and framelets. Applied and Computational Harmonic Analysis 22, 78–104 (2007)
27. Setzer, S., Steidl, G.: Split Bregman method, gradient descent reprojection method and Parseval frames. Preprint Univ. Mannheim (2008)
28. Chan, R.H., Setzer, S., Steidl, G.: Inpainting by flexible Haar-wavelet shrinkage. SIAM Journal on Imaging Sciences 1, 273–293 (2008)
29. Wang, Y., Yang, J., Yin, W., Zhang, Y.: A new alternating minimization algorithm for total variation image reconstruction. SIAM Journal on Imaging Sciences 1(3), 248–272 (2008)

Anisotropic Smoothing Using Double Orientations

Gabriele Steidl and Tanja Teuber
University of Mannheim, A5, 68131 Mannheim, Germany
[email protected], [email protected]
http://kiwi.math.uni-mannheim.de

Abstract. To improve the quality of image restoration methods, directional information has recently been incorporated into the restoration process. In this paper, we propose a two-step procedure for denoising images that is particularly suited to recover sharp vertices and X junctions in the presence of heavy noise. In the first step, we estimate the (smoothed) orientations of the image structures, where we find the double orientations at vertices and X junctions using a model of Aach et al. Based on shape preservation considerations this directional information is then applied to establish an energy functional which is minimized in the second step. We discuss the behavior of our new method in comparison with single direction approaches appearing, e.g., when using the classical structure tensor of Förstner and Gülch, and demonstrate the very good performance of our method by numerical examples.

1 Introduction

Recently, much effort has been put into improving image restoration processes by involving directional information. Our paper contributes to this topic. We restrict our attention to the denoising of images $f \in L_2(\mathbb{R}^2)$ corrupted by heavy white Gaussian noise and the minimization of energy functionals

$$\operatorname*{argmin}_{u \in L_2} \Big\{ \tfrac12\, \|f - u\|_{L_2}^2 + \lambda J(u) \Big\}, \tag{1}$$

where $J : L_2 \to \mathbb{R}_{\ge 0} \cup \{+\infty\}$ denotes a proper, convex, closed functional which is in addition positively homogeneous. Frequently applied examples of such functionals are

$$J(u) := \begin{cases} \int_{\mathbb{R}^2} \varphi(\nabla u)\, dx, & u \in BV_\varphi, \\ \infty, & u \in L_2 \setminus BV_\varphi, \end{cases} \tag{2}$$

where $\varphi(x) = \varphi_1(x) := |x_1| + |x_2|$ as in [1, 2] or $\varphi(x) = \varphi_2(x) := \sqrt{x_1^2 + x_2^2}$ as in the Rudin-Osher-Fatemi (ROF) model [3]. Here $BV_\varphi(\mathbb{R}^2) := \{u \in L_2(\mathbb{R}^2) : \int_{\mathbb{R}^2} \varphi(\nabla u)\, dx < \infty\}$ denotes the (anisotropic) space of functions of bounded variation equipped with the norm

$$\int_{\mathbb{R}^2} \varphi(\nabla u)\, dx := \sup_{\substack{V \in C_c^1(\mathbb{R}^2, \mathbb{R}^2) \\ V \in W_\varphi \text{ a.e.}}} - \int_{\mathbb{R}^2} u(x)\, \operatorname{div} V(x)\, dx, \tag{3}$$

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 477–489, 2009. © Springer-Verlag Berlin Heidelberg 2009

where the Wulff shape $W_\varphi := \{x \in \mathbb{R}^2 : \langle x, y \rangle \le \varphi(y) \ \forall y \in \mathbb{R}^2\}$ of $\varphi$ is the unit square with horizontal and vertical edges in case $\varphi = \varphi_1$ and the unit circle for $\varphi = \varphi_2$. Note that other positively homogeneous, finite, convex, even functions $\varphi$ with $\varphi(0) = 0$ and $\varphi(x) > 0$ for $x \ne 0$ can be used in (2) and that the spaces $BV_\varphi$ are equivalent for all these functions [4]. Besides (2) we will also apply inf-convolution functionals

$$J(u) := (J_1 \,\square\, J_2)(u) := \inf_{u = u_1 + u_2} \{ J_1(u_1) + J_2(u_2) \}, \tag{4}$$

where $J_1$, $J_2$ are nonnegative, proper, convex, closed and positively homogeneous. A possible choice for $J_1$ and $J_2$, suggested, e.g., in [5], is $\int_{\mathbb{R}^2} |\partial_x u_1|\, dx$ and $\int_{\mathbb{R}^2} |\partial_y u_2|\, dx$. It is well known that for large regularization parameters $\lambda$ model (2) with $\varphi_1$ and similarly the above inf-convolution model tend to cut vertices vertically and horizontally while the ROF approach rounds them. Therefore we propose to introduce local directional information obtained from the double direction tensors of Aach et al. [6] into these functionals.

Outline of our paper. In Sec. 2 we recall the single orientation estimations provided by the structure tensor in [7]. Then we turn to the double orientation estimations proposed in [6], where we get some additional insights on the nullspaces of these tensors. In Sec. 3 we start with shape preservation facts as motivation for the subsequent introduction of our new directional denoising model. Furthermore, we discuss our orientation choice in comparison to the classical structure tensor. The good performance of our method is demonstrated by numerical examples in Sec. 4. Conclusions are given in Sec. 5. More details including proofs are contained in the accompanying preprint [8].

Related work. Image restoration by first approximating the local geometry and then involving it in the restoration process was suggested in various papers. A group of methods retrieves the local geometry by computing the Gülch/Förstner structure tensor and then uses its eigenvalues and orthogonal eigenvectors to define a diffusion tensor which steers the direction of the flux in PDEs. Tschumperlé [9] divided these methods into divergence-based [10], trace-based [11] and his curvature-based methods. The first approach is also related to the minimization of specific energy functionals, see, e.g., [12, 13]. The curvature-based method [14, 9] which is related to the line integral convolution [15] is better suited for the restoration of sharp edges than the other two methods, but our method is superior in the presence of heavy noise. Note that as in [16] the curvature-based method can include multiple directions. Various papers deal with the smoothing of normal vectors by minimizing certain energy functionals [17, 18, 19, 20, 21, 22] and use this information for subsequent denoising. In general these minimization procedures are much more expensive than our double direction approach. Kimmel, Sochen et al. suggested restoration techniques within the Beltrami framework [23]. The corresponding smoothing with the so-called 'short-time Beltrami kernel' differs from the bilateral filters [24] in the fact that it uses geodesic distances on the image manifold while the bilateral kernel applies Euclidean distances. In [25], the authors considered special images containing rotated rectangles and established a single functional both for finding the rotation angles and for denoising. However, the resulting algorithm is again a two-step procedure. For a simpler two-step approach we refer to [26]. So far, the best results after those of our new method were obtained by applying nonlocal means [27, 28]. An example is reported in Sec. 4.

2 Orientation Estimations

2.1 Single Orientation Estimations

Let $\Omega \subset \mathbb{R}^2$ be the image part of interest. For simplicity, we assume that $\Omega := B_\varepsilon(0)$ is the ball around 0 with radius $\varepsilon$. Our ideal assumption is that this part of the image corresponds to a function $f : \Omega \to \mathbb{R}$ which has constant values along a single direction $r$ with $\|r\|_2 = 1$, i.e., $f = \varphi(s^T \cdot)$ with $s := r^\perp = (r_2, -r_1)^T$ and $\varphi : [-\varepsilon, \varepsilon] \to \mathbb{R}$. Then,

$$0 = \frac{\partial}{\partial r} f(x) = r^T \nabla f(x) = r^T \varphi'(s^T x)\, s \qquad \forall x \in \Omega$$

holds true and we also have for a nonnegative weight function $w : \Omega \to \mathbb{R}$ that

$$0 = \int_\Omega w(x)\,\bigl(r^T \nabla f(x)\bigr)^2\,dx = r^T \int_\Omega w(x)\,\nabla f(x)\,\nabla f(x)^T\,dx\; r. \tag{5}$$

If $\varphi$ is not constant, then the symmetric, positive semidefinite matrix

$$J := \int_\Omega w(x)\,\nabla f(x)\,\nabla f(x)^T\,dx = \int_\Omega w(x)\,\bigl(\varphi'(s^T x)\bigr)^2\,dx\; s s^T$$

has rank one and $r$ is an eigenvector belonging to the eigenvalue 0. So far we have considered image parts with an ideal directional behavior. Since in applications we deal with noisy images, a pre-smoothing step with the 2D Gaussian $K_\sigma$ of standard deviation $\sigma$ is performed before computing the gradient in $J$. Thus, (5) holds at least approximately and $r$ is the minimizer of the weighted least squares expression $r^T J r$ subject to $\|r\|_2 = 1$, i.e., the eigenvector belonging to the smallest eigenvalue of $J$. Moreover, in natural images the significant directions vary in different image parts. To detect the direction in the neighborhood of every image point $x$, we use the shifted Gaussian $w = K_\rho(\cdot - x)$ (truncated outside $B_{3\rho}(x)$). In this way, we can attach to each image point a $2 \times 2$ matrix, the so-called structure tensor

$$J_\rho := K_\rho * (\nabla f_\sigma \nabla f_\sigma^T), \qquad \nabla f_\sigma := \nabla(K_\sigma * f).$$

If the eigenvalues of $J_\rho(x)$ fulfill $\lambda_1 \ll \lambda_2$, then we are in the neighborhood of an edge and the orthogonal eigenvectors $r_1 = r$ and $r_2 = r^\perp$ approximate the isophote direction and the gradient direction in $x$. In the neighborhood of vertices, where $\lambda_2 \ge \lambda_1 \gg 0$, we obtain smoothed eigenvectors between neighboring edges. This causes artefacts in restoration models involving these directions. Therefore we are interested in double orientations.
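The eigenvector computation described above can be illustrated with a minimal sketch. It assumes an ideal patch whose sampled gradients are all multiples of one vector $s$, uses uniform weights instead of the Gaussian $K_\rho$, and relies on the closed-form eigen-decomposition of a symmetric $2 \times 2$ matrix. The function names are hypothetical and not taken from the paper.

```python
import math

def structure_tensor(gradients, weights=None):
    """Return (j11, j12, j22) of J = sum_k w_k * g_k g_k^T for 2D gradients g_k."""
    if weights is None:
        weights = [1.0] * len(gradients)  # assumption: uniform weights, no Gaussian
    j11 = j12 = j22 = 0.0
    for w, (gx, gy) in zip(weights, gradients):
        j11 += w * gx * gx
        j12 += w * gx * gy
        j22 += w * gy * gy
    return j11, j12, j22

def smallest_eigenpair(j11, j12, j22):
    """Smallest eigenvalue and a unit eigenvector of [[j11, j12], [j12, j22]]."""
    mean = 0.5 * (j11 + j22)
    radius = math.hypot(0.5 * (j11 - j22), j12)
    # The principal axis (largest eigenvalue) has angle 0.5 * atan2(2 j12, j11 - j22);
    # the eigenvector of the smallest eigenvalue is orthogonal to it.
    angle = 0.5 * math.atan2(2.0 * j12, j11 - j22) + 0.5 * math.pi
    return mean - radius, (math.cos(angle), math.sin(angle))

# Ideal single-orientation patch: every gradient is a multiple of s.
s = (math.cos(math.pi / 6), math.sin(math.pi / 6))
gradients = [(k * s[0], k * s[1]) for k in (-2.0, 0.5, 1.0, 3.0)]
lam_min, r = smallest_eigenpair(*structure_tensor(gradients))
```

In this ideal setting the smallest eigenvalue vanishes up to rounding and the returned $r$ is orthogonal to $s$, i.e., it is the isophote direction up to sign.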

2.2 Double Orientation Estimations

Assume that $f$ can be decomposed into two functions $f_i = \varphi_i(s_i^T \cdot)$ with $s_i := r_i^\perp$, $i = 1, 2$, where $r_1 \nparallel r_2$. As in Fig. 1, we consider two decompositions of $f$, the transparent model

$$f(x) = f_1(x) + f_2(x) \qquad \forall x \in \Omega \tag{6}$$

and the occlusion model with $\Omega = \Omega_1 \cup \Omega_2$, $\Omega_1 \cap \Omega_2 = \emptyset$ and

$$f(x) = \begin{cases} f_1(x) & \text{for } x \in \Omega_1, \\ f_2(x) & \text{for } x \in \Omega_2. \end{cases} \tag{7}$$

Fig. 1. Illustration of the transparent model (left) and the occlusion model (right)

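Transparent model. By the definition of $f_1$ and $f_2$ we conclude for all $x \in \Omega$ that

$$0 = \frac{\partial^2}{\partial r_1 \partial r_2}\bigl(f_1(x) + f_2(x)\bigr) = \frac{\partial^2}{\partial r_1 \partial r_2} f(x) = r_2^T H(x)\, r_1 = r_1^T H(x)\, r_2 \tag{8}$$

with the Hessian $H(x)$ of $f$ at $x$. Applying tensor products $\otimes$ of matrices, (8) becomes

$$0 = (r_1 \otimes r_2)^T h(x) = (r_2 \otimes r_1)^T h(x) \quad \text{with} \quad h := (\partial_{xx} f, \partial_{xy} f, \partial_{xy} f, \partial_{yy} f)^T \tag{9}$$

and since this holds true for all $x \in \Omega$ we also get

$$0 = \int_\Omega w(x)\,(r_1 \otimes r_2)^T h(x) h(x)^T (r_1 \otimes r_2)\,dx = (r_1 \otimes r_2)^T\, T\, (r_1 \otimes r_2) \tag{10}$$

with the symmetric, positive semidefinite matrix $T := \int_\Omega w(x)\, h(x) h(x)^T\,dx \in \mathbb{R}^{4,4}$. By (10) and since $r_1 \nparallel r_2$, the vectors $r_1 \otimes r_2$ and $r_2 \otimes r_1$ are two linearly independent eigenvectors belonging to the eigenvalue 0 of $T$. Instead of determining the directions $r_1$ and $r_2$ via (10), Aach et al. [6] proposed to rewrite (9) by skipping the double entry $\partial_{xy} f$ in $h$ as

$$0 = r^T \tilde h(x) \quad \text{with} \quad \tilde h := (\partial_{xx} f, \partial_{xy} f, \partial_{yy} f)^T, \quad r := (r_{11} r_{21},\; r_{11} r_{22} + r_{12} r_{21},\; r_{12} r_{22})^T. \tag{11}$$

Then our determining equation (10) becomes

$$0 = r^T\, T\, r \quad \text{with} \quad T := \int_\Omega w(x)\, \tilde h(x) \tilde h(x)^T\,dx \in \mathbb{R}^{3,3} \tag{12}$$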

and $r$ is an eigenvector belonging to the eigenvalue 0 of the symmetric, positive semidefinite matrix $T$. More precisely, we can prove that $T = S \Phi S^T$ with $S := (s_1 \tilde\otimes s_1,\; s_2 \tilde\otimes s_2)$, $v \tilde\otimes v := (v_1^2, v_1 v_2, v_2^2)^T$ and

$$\Phi := \int_\Omega w(x) \begin{pmatrix} \bigl(\varphi_1''(s_1^T x)\bigr)^2 & \varphi_1''(s_1^T x)\,\varphi_2''(s_2^T x) \\ \varphi_1''(s_1^T x)\,\varphi_2''(s_2^T x) & \bigl(\varphi_2''(s_2^T x)\bigr)^2 \end{pmatrix} dx$$

so that

$$\operatorname{rank} T = \begin{cases} 0 & \text{if } \varphi_i \in \Pi_1,\ i = 1, 2, \\ 1 & \text{if } \varphi_i \in \Pi_1 \text{ for exactly one } i \text{ or } \varphi_i \in \Pi_2 \setminus \Pi_1 \text{ for } i = 1, 2, \\ 2 & \text{otherwise,} \end{cases}$$

where $\Pi_n$ denotes the space of polynomials on $[-\varepsilon, \varepsilon]$ of degree $\le n$. If $\operatorname{rank} T = 2$ (vertex case), then the nullspace of $T$ is $N(T) = \{c\, r : c \in \mathbb{R}\}$. If $\operatorname{rank} T = 1$ (edge case) and $\varphi_2$ is linear but $\varphi_1$ is not, then $N(T) = \{(r_{11} c_1,\; r_{11} c_2 + r_{12} c_1,\; r_{12} c_2)^T : c = (c_1, c_2)^T \in \mathbb{R}^2\}$, i.e., $c$ plays the role of $r_2$ in (11). There exist several possibilities to detect the directions $r_i$, $i = 1, 2$, from an eigenvector $u = (u_1, u_2, u_3)^T \in N(T)$. For example, it is not hard to check that the following setting from [6] does the job: For $u_1 \neq 0$ set

$$r_1 := \frac{1}{\sqrt{u_1^2 + y_1^2}}\,(u_1, y_1)^T, \qquad r_2 := \frac{1}{\sqrt{u_1^2 + y_2^2}}\,(u_1, y_2)^T,$$

where $y_i$, $i = 1, 2$, are the solutions of the quadratic equation $y^2 - u_2 y + u_1 u_3 = 0$. If $u_1 = 0$, then $y_i = 0$ for one $i$ and we set $r_i := \frac{1}{\sqrt{u_2^2 + u_3^2}}\,(u_2, u_3)^T$ and $r_{3-i} := (0, 1)^T$.
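The recipe for recovering the two directions from a nullspace vector can be sketched as follows. This is a minimal illustration under the assumption that $u$ is exactly a multiple of the mixed vector $r$ from (11), so the quadratic has real roots; for a noisy tensor the discriminant may become negative, which the sketch only reports as an error. The function names and the tolerance `tol` are hypothetical.

```python
import math

def mixed_vector(r1, r2):
    """r = (r11 r21, r11 r22 + r12 r21, r12 r22)^T as in (11)."""
    return (r1[0] * r2[0], r1[0] * r2[1] + r1[1] * r2[0], r1[1] * r2[1])

def _unit(v):
    norm = math.hypot(v[0], v[1])
    if norm == 0.0:
        raise ValueError("zero vector has no direction")
    return (v[0] / norm, v[1] / norm)

def directions_from_null_vector(u, tol=1e-12):
    """Recover two unit directions (up to sign and order) from u = (u1, u2, u3)."""
    u1, u2, u3 = u
    if abs(u1) > tol:
        # Roots of y^2 - u2*y + u1*u3 = 0 give the second components.
        disc = u2 * u2 - 4.0 * u1 * u3
        if disc < 0.0:
            raise ValueError("no real roots: u is not of the form (11)")
        root = math.sqrt(disc)
        y1, y2 = 0.5 * (u2 + root), 0.5 * (u2 - root)
        return _unit((u1, y1)), _unit((u1, y2))
    # u1 = 0: one direction is (0, 1)^T, the other is (u2, u3)^T normalized.
    return _unit((u2, u3)), (0.0, 1.0)

# Round trip: build u from two known directions, then recover them.
r1 = (math.cos(math.radians(20)), math.sin(math.radians(20)))
r2 = (math.cos(math.radians(100)), math.sin(math.radians(100)))
u = tuple(2.5 * c for c in mixed_vector(r1, r2))
q1, q2 = directions_from_null_vector(u)
```

The recovered vectors `q1` and `q2` agree with `r1` and `r2` only up to sign and ordering, which is why a selection rule for $r_1$ is still needed.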

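In the following, we choose as direction $r_1$ the one fulfilling $|\langle r_1, \nabla f_{\tilde\sigma}\rangle| \le |\langle r_2, \nabla f_{\tilde\sigma}\rangle|$. In particular, $r_1$ is the isophote direction at edges, where some vector $c$ plays the role of $r_2$.

Occlusion model. By the definition of $f_1$ and $f_2$ we conclude for all $x \in \Omega$ that

$$0 = \frac{\partial}{\partial r_1} f(x)\, \frac{\partial}{\partial r_2} f(x) = \bigl(r_1^T \nabla f(x)\bigr)\bigl(r_2^T \nabla f(x)\bigr) = r_1^T\, \nabla f(x) \nabla f(x)^T\, r_2 \tag{13}$$

and by rewriting the equation using tensor products that

$$0 = (r_2 \otimes r_1)^T g(x) = (r_1 \otimes r_2)^T g(x) \quad \text{with} \quad g := \bigl((\partial_x f)^2,\; \partial_x f\, \partial_y f,\; \partial_x f\, \partial_y f,\; (\partial_y f)^2\bigr)^T.$$

This reads in the reduced form with $r$ defined by (11) as

$$0 = r^T \tilde g(x) \quad \text{with} \quad \tilde g := \bigl((\partial_x f)^2,\; \partial_x f\, \partial_y f,\; (\partial_y f)^2\bigr)^T.$$

Since this relation is true for all $x \in \Omega$, we also have that

$$0 = r^T\, C\, r \quad \text{with} \quad C := \int_\Omega w(x)\, \tilde g(x) \tilde g(x)^T\,dx. \tag{14}$$

Thus, $r$ is an eigenvector belonging to the eigenvalue 0 of the symmetric, positive semidefinite matrix $C$. More precisely, we can prove that $C = \alpha_1 (s_1 \tilde\otimes s_1)(s_1 \tilde\otimes s_1)^T + \alpha_2 (s_2 \tilde\otimes s_2)(s_2 \tilde\otimes s_2)^T$ with $\alpha_i := \int_{\Omega_i} w(x)\, \bigl(\varphi_i'(s_i^T x)\bigr)^4\,dx$, $i = 1, 2$, so that the rank of $C$ is $\nu \in \{0, 1, 2\}$ if exactly $2 - \nu$ of the functions $\varphi_i$ are constant on $\Omega_i$, $i = 1, 2$. The directions $r_i$, $i \in \{1, 2\}$, can be obtained from an eigenvector in $N(C)$ as in the transparent model.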
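The relation (14) can be checked numerically on a toy example. The sketch below assumes a handful of sampled gradients, uniform weights in place of $w$, and two occluding parts whose gradients are multiples of $s_1$ and $s_2$ respectively. All function names are hypothetical.

```python
import math

def mixed_vector(r1, r2):
    """r = (r11 r21, r11 r22 + r12 r21, r12 r22)^T as in (11)."""
    return (r1[0] * r2[0], r1[0] * r2[1] + r1[1] * r2[0], r1[1] * r2[1])

def g_tilde(grad):
    """Reduced product vector ((f_x)^2, f_x f_y, (f_y)^2)^T of a gradient."""
    gx, gy = grad
    return (gx * gx, gx * gy, gy * gy)

def occlusion_tensor(gradients, weights=None):
    """C = sum_k w_k * g~_k g~_k^T as a 3x3 nested list (uniform weights by default)."""
    if weights is None:
        weights = [1.0] * len(gradients)
    C = [[0.0] * 3 for _ in range(3)]
    for w, grad in zip(weights, gradients):
        g = g_tilde(grad)
        for i in range(3):
            for j in range(3):
                C[i][j] += w * g[i] * g[j]
    return C

def quadratic_form(C, r):
    return sum(r[i] * C[i][j] * r[j] for i in range(3) for j in range(3))

# Two occluding parts: gradients along s1 = r1^perp in one, along s2 = r2^perp in the other.
r1, r2 = (1.0, 0.0), (0.5, math.sqrt(3) / 2)
s1, s2 = (r1[1], -r1[0]), (r2[1], -r2[0])
gradients = [(k * s1[0], k * s1[1]) for k in (2.0, -1.0)]
gradients += [(k * s2[0], k * s2[1]) for k in (0.5, 3.0)]
C = occlusion_tensor(gradients)
residual = quadratic_form(C, mixed_vector(r1, r2))      # correct pair of directions
wrong = quadratic_form(C, mixed_vector(r1, (0.0, 1.0)))  # deliberately wrong second direction
```

For the correct pair the quadratic form vanishes up to rounding, whereas a wrong second direction gives a clearly positive value.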


Fig. 2. Noisy images and their double orientation estimations by the occlusion model (left) and by the transparent model (right)

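Double orientation tensors. In practice, we deal with noisy images having image parts with various significant directions. As for the classical structure tensor, the double orientation tensors are defined as

$$T_\rho := K_\rho * \bigl(\tilde h_\sigma \tilde h_\sigma^T\bigr), \qquad C_\rho := K_\rho * \bigl(\tilde g_\sigma \tilde g_\sigma^T\bigr),$$

where $\tilde h_\sigma := (\partial_{xx} f_\sigma, \partial_{xy} f_\sigma, \partial_{yy} f_\sigma)^T$, $\tilde g_\sigma := \bigl((\partial_x f_\sigma)^2,\; \partial_x f_\sigma\, \partial_y f_\sigma,\; (\partial_y f_\sigma)^2\bigr)^T$, and the directions $r_1, r_2$ can be derived from an eigenvector belonging to the smallest eigenvalue of $T_\rho(x)$ resp. $C_\rho(x)$. For an example of estimated double orientations see Fig. 2.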

3 Image Restoration and Shape Preservation

We start with a proposition which characterizes the solution of (1).

Proposition 1. The function $\hat u \in L_2$ is the solution of the minimization problem (1) iff
i) $\hat u = f - \lambda \hat v$,
ii) $\hat v \in C_J := \{v \in L_2 : \langle v, w\rangle \le J(w)\ \ \forall w \in L_2\}$,
iii) $\langle \hat u, \hat v\rangle = J(\hat u)$.
For the special functional (2) we have that $\hat v \in C_J$ if there exists a vector field $\hat V \in L_\infty(\mathbb{R}^2, \mathbb{R}^2)$ such that $\hat v := -\operatorname{div} \hat V \in L_2(\mathbb{R}^2)$ and $\hat V \in W_\varphi$ a.e. on $\mathbb{R}^2$.

Using this proposition, one can prove that rectangles with horizontal and vertical edges [4] and + junctions [8] are preserved by the solution of (1) with (2) and $\varphi = \varphi_1$.

Corollary 1. The solution $\hat u$ of (1) with (2) and $\varphi = \varphi_1$ reads
i) for $f := c\, 1_\Omega$ with the characteristic function $1_\Omega$ of $\Omega := (-a, a) \times (-b, b)$ as

$$\hat u = \Bigl(c - \lambda\, \frac{a + b}{ab}\Bigr) 1_\Omega, \qquad \lambda \le \frac{c\,ab}{a + b}, \quad a, b > 0,$$

ii) for $f := c_1 1_{\Omega_1} + c_2 1_{\Omega_2}$ with $\Omega_1 := (-l, l) \times (-a, a)$, $\Omega_2 := (-b, b) \times (-l, l)$ as

$$\hat u = \Bigl(c_1 - \lambda\, \frac{l + a}{la}\Bigr) 1_{\Omega_1} + \Bigl(c_2 - \lambda\, \frac{l + b}{lb}\Bigr) 1_{\Omega_2}, \qquad \lambda \le \min\Bigl\{\frac{c_1\, la}{l + a}, \frac{c_2\, lb}{l + b}\Bigr\}, \quad l > a, b > 0.$$

In this paper, we propose to modify (2) (and similarly (4)) by locally including directions. The basic idea is that the minimizer of the modified functional also preserves shapes as, e.g., shown in Fig. 3 and arbitrary X junctions. This


Fig. 3. Original and noisy trapezoid image (standard deviation 150)

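modification can be motivated by the following considerations for a globally fixed transform matrix $R$: Substituting $x := R^{-1} t$, $f_R := f(R^{-1} \cdot)$, we obtain

$$\begin{aligned} \int_{\mathbb{R}^2} \tfrac{1}{2}(f - u)^2 + \lambda \varphi(\nabla u)\,dx &= \frac{1}{|\det R|} \int_{\mathbb{R}^2} \tfrac{1}{2}\bigl(f(R^{-1} t) - u(R^{-1} t)\bigr)^2 + \lambda\, \varphi\bigl(\nabla_x u(R^{-1} t)\bigr)\,dt \\ &= \frac{1}{|\det R|} \int_{\mathbb{R}^2} \tfrac{1}{2}\bigl(f_R(t) - u_R(t)\bigr)^2 + \lambda\, \varphi\bigl(R^T \nabla_t u_R(t)\bigr)\,dt. \end{aligned}$$

Hence, if $\hat u$ minimizes the left-hand side, then the transformed image $\hat u_R := \hat u(R^{-1} \cdot)$ is a minimizer of

$$\frac{1}{2} \int_{\mathbb{R}^2} (f_R - u)^2\,dx + \lambda \int_{\mathbb{R}^2} \varphi(R^T \nabla u)\,dx. \tag{15}$$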
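As a brief check of the substitution leading to (15), the following sketch spells out the two ingredients. It assumes $u_R := u(R^{-1} \cdot)$, defined in the same way as $f_R$, and an invertible matrix $R$. The chain rule and the change of variables give

$$\nabla_t u_R(t) = R^{-T} (\nabla_x u)(R^{-1} t) \;\Longrightarrow\; (\nabla_x u)(R^{-1} t) = R^T \nabla_t u_R(t), \qquad dx = \frac{dt}{|\det R|}.$$

Since the positive factor $1/|\det R|$ multiplies the whole functional, it does not change the minimizer, which is why it no longer appears in (15).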

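In the following, we consider discrete square images $f := (f(x, y))_{x,y=0}^{n-1} \in \mathbb{R}^{n,n}$ in their columnwise reshaped form $f \in \mathbb{R}^N$, $N := n^2$. Instead of partial derivatives we use forward differences so that the discrete version of the gradient reads

$$D = \begin{pmatrix} D_x \\ D_y \end{pmatrix} := \begin{pmatrix} H_0 \otimes H_1 \\ H_1 \otimes H_0 \end{pmatrix}, \qquad H_0 := \frac{1}{2} \begin{pmatrix} 1 & 1 & & & \\ & 1 & 1 & & \\ & & \ddots & \ddots & \\ & & & 1 & 1 \\ & & & & 2 \end{pmatrix}, \qquad H_1 := \begin{pmatrix} -1 & 1 & & & \\ & -1 & 1 & & \\ & & \ddots & \ddots & \\ & & & -1 & 1 \\ & & & & 0 \end{pmatrix}.$$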
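To make the Kronecker structure of $D$ concrete, here is a small sketch in plain Python. It assumes the reading of $H_0$ (averaging, last diagonal entry 1 after the factor 1/2) and $H_1$ (forward difference, zero last row) given above, and the columnwise ordering `f[y * n + x]` for the pixel $(x, y)$. The helper names are hypothetical.

```python
def averaging_matrix(n):
    """H0: mean of two neighbouring entries; the last row keeps the last entry."""
    H = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        H[i][i] = H[i][i + 1] = 0.5
    H[n - 1][n - 1] = 1.0
    return H

def difference_matrix(n):
    """H1: forward difference with a zero last row."""
    H = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        H[i][i], H[i][i + 1] = -1.0, 1.0
    return H

def kron(A, B):
    """Kronecker product of two dense matrices given as nested lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

n = 4
Dx = kron(averaging_matrix(n), difference_matrix(n))  # H0 (x) H1
Dy = kron(difference_matrix(n), averaging_matrix(n))  # H1 (x) H0

# Test image f(x, y) = 3x, stored columnwise: index y * n + x.
f = [3.0 * x for y in range(n) for x in range(n)]
dx, dy = matvec(Dx, f), matvec(Dy, f)
```

For this linear ramp, `dx` equals 3 everywhere except in the last row $x = n - 1$, where the zero row of $H_1$ gives 0, and `dy` vanishes identically.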

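Then problem (1) becomes

$$\operatorname*{arg\,min}_{u \in \mathbb{R}^N}\ \tfrac{1}{2}\|f - u\|_2^2 + \lambda J(u) \tag{16}$$

and (2) with $\varphi = \varphi_1$ resp. (4) with $\int_{\mathbb{R}^2} |\partial_x u_1|\,dx$ and $\int_{\mathbb{R}^2} |\partial_y u_2|\,dx$ read as

$$J(u) := \|D u\|_1, \tag{17}$$

resp.

$$J(u) := \min_{u = u_1 + u_2} \{\|D_x u_1\|_1 + \|D_y u_2\|_1\}. \tag{18}$$

The solution of (16) can be characterized as in the continuous setting: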


Fig. 4. Denoising with the directions $r$, $r^\perp$ from the classical structure tensor. Left: Angle of $r$ mod 180° ($\sigma = 2.5$, $\rho = 5$). The directions are smoothed near vertices, following the shortest path between neighboring edge directions. Middle: Denoising result using only one direction $R := (r)$ ($\lambda = 2500$). Following this direction, obtuse vertices are rounded, while the acute one is elongated. Right: Denoising result using both directions $R = (r_1, r_2) = (r, r^\perp)$ ($\lambda = 1000$). The edges of the minimizer $\hat u$ tend to be aligned with one of the directions $r_i$, i.e., one of the summands $|\langle r_i, \nabla \hat u\rangle|$, $i = 1, 2$, becomes very small. Hence, rounding artefacts are visible at obtuse vertices, while the model decides for the wrong direction at the acute vertex, which leads to a cut-off artefact.

Proposition 2. The vector $\hat u \in \mathbb{R}^N$ is the solution of the minimization problem (16) if and only if i) - iii) of Proposition 1 hold true, where $L_2$ has to be replaced by $\mathbb{R}^N$ with the Euclidean inner product. For the special functionals (17) and (18) we have that $\hat v \in C_J$ if and only if there exists a vector $\hat V = \bigl((\hat V^{(1)})^T, (\hat V^{(2)})^T\bigr)^T \in \mathbb{R}^{2N}$ such that

$$\hat v := D^T \hat V \quad \text{and} \quad \|\hat V\|_\infty \le 1,$$

resp.,

$$\hat v := D_x^T \hat V^{(1)} = D_y^T \hat V^{(2)} \quad \text{and} \quad \|\hat V\|_\infty \le 1.$$

As in the continuous case, rectangles and + junctions are preserved by the solution of (16) with (17). However, due to image boundaries one has to be careful with the discretization.

Corollary 2. Let $x_0, y_0 \ge 0$ and $x_0 + a,\ y_0 + b \le n - 2$. The solution $\hat u$ of the minimization problem (16) with $J$ defined by (17) reads for
i) $f := c\, 1_\Omega$ with $\Omega := \{x_0 + 1, \dots, x_0 + a\} \times \{y_0 + 1, \dots, y_0 + b\}$ as

$$\hat u = \Bigl(c - \lambda\, \frac{2(a + b)}{ab}\Bigr) 1_\Omega, \qquad \lambda \le \frac{c\,ab}{2(a + b)},$$

where $H_i$ are modified by $H_i(0, 0) = 0$, $H_i(n - 1, n - 1) = (-1)^i$, $i = 0, 1$.
ii) $f := c_1 1_{\Omega_1} + c_2 1_{\Omega_2}$ with $\Omega_1 := \{x_0 + 1, \dots, x_0 + a\} \times \{0, \dots, n - 1\}$, $\Omega_2 := \{0, \dots, n - 1\} \times \{y_0 + 1, \dots, y_0 + b\}$ as

$$\hat u = \Bigl(c_1 - \lambda\, \frac{2}{a}\Bigr) 1_{\Omega_1} + \Bigl(c_2 - \lambda\, \frac{2}{b}\Bigr) 1_{\Omega_2}, \qquad \lambda \le \min\Bigl\{\frac{a c_1}{2}, \frac{b c_2}{2}\Bigr\},$$

where $H_i$ are modified by $H_0(n - 1, 0) = 1$, $H_1(0, 0) = 0$, $H_i(n - 1, n - 1) = (-1)^i$, $i = 0, 1$.

Similarly it can be shown that the inf convolution approach preserves + junctions [8].


Fig. 5. Denoising with double orientations from the occlusion model. Left/Middle: Energies $|\langle r_i, \nabla \hat u\rangle|$, $i = 1, 2$. Except at isolated vertex points the model aligns the edges of the minimizer $\hat u$ with the direction $r_1$ ($\sigma = 2$, $\rho = 9.5$). Right: Denoised image ($\lambda = 2500$). Although not perfect, this result is the best we got with various denoising methods so far.

Fig. 6. Denoising with the single direction $r_1$ from the occlusion model. Left: Angle corresponding to the chosen direction ($\sigma = 2$, $\rho = 9.5$, $\tilde\sigma = 5\sigma$). Middle: Denoising with the regularization term $|\langle r_1, \nabla u\rangle|$ introduces textures in flat regions ($\lambda = 2500$). Right: Denoising with the regularization term $|\nabla u| - \langle r_1^\perp, \nabla u\rangle$ avoids these artefacts ($\lambda = 4500$).

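Having (15) in mind we introduce our double orientations $r_1, r_2$ from Subsection 2.2 into (17) resp. (18) and consider for $\tilde r_i^T = (\operatorname{diag}(r_{i1}), \operatorname{diag}(r_{i2}))$, $i = 1, 2$, the minimizers of our new functionals

$$\tfrac{1}{2}\|f - u\|_2^2 + \lambda \|\tilde R^T D u\|_1 = \tfrac{1}{2}\|f - u\|_2^2 + \lambda \bigl(\|\tilde r_1^T D u\|_1 + \|\tilde r_2^T D u\|_1\bigr), \tag{19}$$

$$\tfrac{1}{2}\|f - u\|_2^2 + \lambda \min_{u = u_1 + u_2} \bigl\{\|\tilde r_1^T D u_1\|_1 + \|\tilde r_2^T D u_2\|_1\bigr\}. \tag{20}$$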
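The regularization term of (19) can be sketched pixelwise. Since $\tilde r_i^T D u = \operatorname{diag}(r_{i1}) D_x u + \operatorname{diag}(r_{i2}) D_y u$, each pixel contributes $|\langle r_1, \nabla u\rangle| + |\langle r_2, \nabla u\rangle|$ with its own local directions. The sketch assumes precomputed difference images passed as flat lists; the function name is hypothetical, and the example uses orthogonal directions only for simplicity.

```python
def double_orientation_penalty(grad_x, grad_y, r1, r2):
    """Sum over pixels of |<r1, grad u>| + |<r2, grad u>|.

    grad_x, grad_y: flat lists holding Dx u and Dy u.
    r1, r2: per-pixel lists of 2-vectors (the local double orientations).
    """
    total = 0.0
    for gx, gy, a, b in zip(grad_x, grad_y, r1, r2):
        total += abs(a[0] * gx + a[1] * gy) + abs(b[0] * gx + b[1] * gy)
    return total

# Two pixels on an edge whose gradient (4, -3) is orthogonal to (0.6, 0.8).
grad_x, grad_y = [4.0, 4.0], [-3.0, -3.0]
axes = double_orientation_penalty(grad_x, grad_y, [(1.0, 0.0)] * 2, [(0.0, 1.0)] * 2)
adapted = double_orientation_penalty(grad_x, grad_y, [(0.6, 0.8)] * 2, [(-0.8, 0.6)] * 2)
```

With the coordinate directions the edge costs $2 \cdot (4 + 3) = 14$, while directions adapted to the edge give $2 \cdot 5 = 10$, because one of the two summands vanishes. This is the mechanism by which aligned edges are penalized less.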

We want to examine the behavior of (19) using the simple denoising example in Fig. 3. First, we computed the minimizers using the directions $r$ and $r^\perp$ from the classical structure tensor. The resulting artefacts are described in the caption of Fig. 4. Then, Fig. 5 shows the good denoising result with the proposed occlusion model for double orientations. Finally, Fig. 6 presents the denoising results


obtained by using only the direction $r_1$ from this model. This leads to artefacts in flat regions, where the process introduces texture due to directional smoothing of heavy noise. This effect can be avoided by replacing $|\langle r_1, \nabla u\rangle|$ by $|\nabla u| - \langle r_1^\perp, \nabla u\rangle$. Note that we have to adapt the sign of $r_1^\perp$ such that $\langle r_1^\perp, \nabla f_{\tilde\sigma}\rangle \ge 0$ here. This functional was also proposed in [19] but with a more expensive procedure to find appropriate directions $r_1^\perp$.

4 Numerical Examples

In the following, we present further numerical examples. All programs were written in MATLAB, where we solved the minimization problems via their dual problem using second-order cone programming implemented in the software package MOSEK [29]. To discretize the derivatives occurring in the orientation estimation tensors we applied the ﬁlters suggested by Scharr in [30]. The gray values of the original images are in [0, 255] and for visualization we have used the MATLAB routine ’imagesc’, which incorporates an aﬃne gray value scaling. Moreover, the parameters are chosen with respect to the best visual result. To start with, we took a noisy image with diﬀerent shapes and restored it by nonlocal means, ROF and by (19) with occluding directions. The results are presented in Fig. 7. As already observed in [25] the result by ROF suﬀers from rounding artefacts at corners, since to remove all noise the regularization parameter λ has to be chosen rather large. This is avoided by (19) using occluding directions as visible at bottom right. The example with nonlocal means gives slightly worse results at corners. To demonstrate the performance on a real world image we included Fig. 8. Here, the example shows that the shape of

Fig. 7. Top: noisy image (standard deviation 100) and restored image by iterating two times the nonlocal means ﬁlter [28]. Bottom left: denoised image by ROF (λ = 500). Bottom right: restored image by (19) and occluding directions (λ = 900, σ = 2, ρ = 6).


Fig. 8. Top: noisy image (standard deviation 30) and result by the nonlocal means ﬁlter [28]. Bottom left: denoised image by ROF (λ = 50). Bottom right: result by (19) and occluding directions (λ = 50, σ = 0.5, ρ = 8).

Fig. 9. Left to right: original image [30], noisy image (standard deviation 10), denoised image by (19) (λ = 15, σ = 2, ρ = 12), denoised image by (20) (λ = 40, σ = 2,ρ = 12). The directions are estimated by the transparent model.


the building is much better preserved by (19) than by ROF, since the local directions in the image are treated much more accurately. In contrast to nonlocal means, our method as well as ROF suffer from staircasing effects. However, for a large smoothing parameter related to the noise level, nonlocal means creates small blur artefacts where our result has sharp structures. Besides, our method is computationally much faster. Finally, to point out the benefits of inf convolution, Fig. 9 shows restored images of an oriented texture by (19) and (20), respectively, using the transparent model. For such images inf convolution is better suited than (19), since (19), like ROF, aims for a piecewise constant solution, which means that too many details are removed.

5 Conclusions

We have demonstrated how directional information estimated by the transparent or the occlusion model [6] can be integrated into certain minimization problems to improve the restoration results, especially at sharp corners and X junctions. For simplicity we have restricted our attention to double orientations, but a generalization to more than two directions is possible with the results presented in [31]. To further improve the restoration results, one option would be to also use higher-order derivatives as done in [32]. In this way it is, for example, possible to overcome the staircasing effects observed for (19).

References

1. Chambolle, A.: Total variation minimization and a class of binary MRF models. In: Rangarajan, A., Vemuri, B.C., Yuille, A.L. (eds.) EMMCVPR 2005. LNCS, vol. 3757, pp. 136–152. Springer, Heidelberg (2005)
2. Hintermüller, M., Kunisch, K.: Total bounded variation regularization as a bilaterally constrained optimization problem. SIAM J. Appl. Math. 4(64), 1311–1333 (2004)
3. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
4. Esedoglu, S., Osher, S.: Decomposition of images by the anisotropic Rudin-Osher-Fatemi model. Comm. Pure and Applied Mathematics 57(12), 1609–1626 (2004)
5. Chambolle, A., Lions, P.L.: Image recovery via total variation minimization and related problems. Numerische Mathematik 76, 167–188 (1997)
6. Aach, T., Mota, C., Stuke, I., Mühlich, M., Barth, E.: Analysis of superimposed oriented patterns. IEEE Trans. on Image Processing 15(12), 3690–3700 (2006)
7. Förstner, W., Gülch, E.: A fast operator for detection and precise location of distinct points, corners and centres of circular features. In: Proc. ISPRS Intercommission Conf. on Fast Processing of Photogrammetric Data, pp. 281–305 (1987)
8. Teuber, T.: Anisotropic smoothing using double orientations. Preprint University of Mannheim (2009)
9. Tschumperlé, D.: Fast anisotropic smoothing of multivalued images using curvature preserving PDEs. International Journal of Computer Vision 68(1), 65–82 (2006)
10. Weickert, J.: Anisotropic Diffusion in Image Processing. Teubner, Stuttgart (1998)
11. Tschumperlé, D., Deriche, R.: Vector-valued image regularization with PDEs: A common framework for different applications. IEEE Trans. on Pattern Analysis and Machine Intelligence 27(4) (2005)


12. Aubert, G., Kornprobst, P.: Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations. Applied Mathematical Sciences, vol. 147. Springer, New York (2002)
13. Steidl, G., Teuber, T.: Diffusion tensors for denoising sheared and rotated rectangles (submitted) (2008)
14. Tschumperlé, D.: The CImg library. C++ Template Image Processing Library, http://cimg.sourceforge.net
15. Cabral, B., Leedom, L.C.: Imaging vector fields using line integral convolution. In: SIGGRAPH 1993, Computer Graphics, vol. 27, pp. 263–272 (1993)
16. Weickert, J.: Anisotropic diffusion filters for image processing based quality control. In: Fasano, A., Primicerio, M. (eds.) Proc. Seventh European Conference on Mathematics in Industry, pp. 355–362. Teubner, Stuttgart (1994)
17. Goldfarb, D., Wen, Z., Yin, W.: A curvilinear search method for p-harmonic flows on spheres. SIAM Journal on Imaging Sciences 2(1), 84–109 (2009)
18. Kimmel, R., Sochen, N.: Orientation diffusion or how to comb a porcupine? Journal of Visual Communication and Image Representation 13(1-2), 238–248 (2002)
19. Lysaker, O., Osher, S., Tai, X.C.: Noise removal using smoothed normals and surface fitting. IEEE Trans. on Image Processing 13(10), 1345–1357 (2004)
20. Vese, L., Osher, S.: Numerical methods for p-harmonic flows and applications to image processing. SIAM Journal on Numerical Analysis 40(6), 2085–2104 (2002)
21. Yuan, J., Schnörr, C., Steidl, G.: Convex Hodge decomposition and regularization of image flows. Journal of Mathematical Imaging and Vision 33(2), 169–177 (2009)
22. Rahman, T., Tai, X.C., Osher, S.: A TV-Stokes denoising algorithm. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 473–483. Springer, Heidelberg (2007)
23. Spira, A., Kimmel, R., Sochen, N.: A short-time Beltrami kernel for smoothing images and manifolds. IEEE Trans. on Image Processing 16(6), 1628–1636 (2007)
24. Tomasi, C., Manduchi, R.: Bilateral filtering for gray and color images. In: Proc. Sixth Intern. Conf. on Computer Vision, pp. 839–846. Narosa Publishing House (1998)
25. Berkels, B., Burger, M., Droske, M., Nemitz, O., Rumpf, M.: Cartoon extraction based on anisotropic image classification. In: Vision, Modeling, and Visualization Proceedings, pp. 293–300 (2006)
26. Setzer, S., Steidl, G., Teuber, T.: Restoration of images with rotated shapes. Numerical Algorithms 48(1-3), 49–66 (2008)
27. Buades, A., Coll, B., Morel, J.M.: A non-local algorithm for image denoising. In: IEEE Int. Conf. on Comp. Vision and Pattern Recognition, vol. 2, pp. 60–65 (2005)
28. Manjón, J.V., Buades, A.: NL means. MATLAB Software, http://dmi.uib.es/~abuades/software.html
29. The MOSEK Optimization Toolbox, http://www.mosek.com
30. Scharr, H.: Diffusion-like reconstruction schemes from linear data models. In: Franke, K., Müller, K.-R., Nickolay, B., Schäfer, R. (eds.) DAGM 2006. LNCS, vol. 4174, pp. 51–60. Springer, Heidelberg (2006)
31. Mühlich, M., Aach, T.: A theory for multiple orientation estimation. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3952, pp. 69–82. Springer, Heidelberg (2006)
32. Setzer, S., Steidl, G.: Variational methods with higher-order derivatives in image processing. In: Neamtu, M., Schumaker, L.L. (eds.) Approximation Theory XII: San Antonio 2007, pp. 360–385. Nashboro Press (2008)

Image Denoising Using TV-Stokes Equation with an Orientation-Matching Minimization

Xue-Cheng Tai¹,², Sofia Borok¹, and Jooyoung Hahn¹

¹ Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore
² Department of Mathematics, University of Bergen, Norway
[email protected]

Abstract. In this paper, we propose an orientation-matching minimization for denoising digital images with additive noise. Inspired by the two-step algorithm in the TV-Stokes denoising process [1, 2, 3], the regularized tangential vector field with the zero divergence condition is used in the first step. The present work suggests a different approach to reconstructing a denoised image in the second step. Namely, instead of finding an image that fits the regularized normal direction from the first step, we minimize the orientation difference between the image gradient and the regularized normal direction. This gives a nonlinear partial differential equation (PDE) for reconstructing denoised images, whose diffusivity depends on the orientation of the regularized normal vector field and whose weighted self-adaptive force term depends on the direction between the gradient of the image and the vector field. This makes it possible to obtain a denoised image with sharp edges and smooth regions, even when the original image has smoothly changing pixel values near sharp edges. The additive operator splitting scheme is used for discretizing the Euler-Lagrange equations. Various numerical experiments show the improved quality of the results.

1 Introduction

Digital image denoising processes based on partial differential equations (PDEs) and energy minimization have been studied extensively over the last 20 years, both theoretically and practically. From Gaussian filtering to anisotropic diffusion [4,5,6] and total variation (TV) minimization [7,8], a noisy image has been denoised from poorly estimated derivative information. TV filtering is very effective for piecewise constant images and anisotropic diffusion is adaptable to flow-like images. However, neither approach is suitable for an image which has smoothly changing pixel values near sharp edges. Since the quality of denoised images depends strongly on the estimated derivative information, regularizing the derivatives of an image [9], that is, its orientational information [10, 11, 12, 1], has been a crucial topic. Inspired by [1, 2, 3],

The research is supported by MOE (Ministry of Education) Tier II project T207N2202 and IDM project NRF2007IDMIDM002-010. In addition, the support from SUG 20/07 is also gratefully acknowledged.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 490–501, 2009. © Springer-Verlag Berlin Heidelberg 2009


we also use a regularization of the tangent vector field of an image with the zero divergence condition. The present work proposes a different approach to reconstructing a denoised image from the regularized normal vector field, which we call an orientation-matching minimization. That is, we minimize the orientation difference between the image gradient and the regularized normal direction. This gives a nonlinear PDE for reconstructing denoised images, whose diffusivity depends on the orientation of the regularized normal vector field and whose weighted self-adaptive force term depends on the direction between the gradient of the image and the vector field. This makes it possible to obtain a denoised image with sharp edges and smooth regions, even when the original image has smoothly changing pixel values near sharp edges. The paper is organized as follows. In Section 2, we introduce the proposed model with a review of the TV-Stokes (TVS) denoising algorithm [1,2]. Some numerical aspects are explained in Section 3. Several numerical examples are shown and different models are compared in Section 4. The paper is concluded in Section 5.

2 Two-Step Denoising Model

2.1 Review of TV-Stokes Denoising Algorithm

Let us consider a gray true image $d : \Omega \subset \mathbb{R}^2 \to [0, 1]$. We assume that a given noisy image $d_0$ is corrupted by additive Gaussian white noise $\eta$ with the relation

$$d_0(p) = d(p) + \eta(p), \qquad p = (x, y) \in \Omega.$$

The normal and tangential vectors of the level curves of an image $d$ are given by

$$n = \nabla d(p) = \Bigl(\frac{\partial d}{\partial x}, \frac{\partial d}{\partial y}\Bigr)^T \quad \text{and} \quad t = \nabla^\perp d(p) = \Bigl(-\frac{\partial d}{\partial y}, \frac{\partial d}{\partial x}\Bigr)^T, \tag{1}$$

where $T$ denotes the transpose. Then the vector fields satisfy the conditions

$$\nabla \times n = 0 \quad \text{and} \quad \nabla \cdot t = 0,$$

which means that $n$ is an irrotational vector field and $t$ is an incompressible vector field. This property is crucial when an image is reconstructed from the information of $n$ or $t$. The TVS denoising model [1, 2] consists of two steps to obtain a denoised image and uses the same process in the second step as the Lysaker-Osher-Tai (LOT) model [10]. However, in the first step, instead of regularizing the normal vector field as in the LOT model, a tangential vector field is regularized with the constraint of incompressibility. The regularized tangential vector field $t$ is obtained by minimizing a functional:

$$\min_{\nabla \cdot t = 0} \int_\Omega \Bigl(|\nabla t| + \frac{\delta}{2}\,|t - t_0|^2\Bigr)\,dp, \tag{2}$$

where $t_0 = \nabla^\perp d_0$, $\delta$ is a positive parameter, and $|\nabla t|$ is defined by

$$|\nabla t| = \sqrt{\Bigl(\frac{\partial u}{\partial x}\Bigr)^2 + \Bigl(\frac{\partial u}{\partial y}\Bigr)^2 + \Bigl(\frac{\partial v}{\partial x}\Bigr)^2 + \Bigl(\frac{\partial v}{\partial y}\Bigr)^2}, \qquad \nabla t = \begin{pmatrix} \nabla u \\ \nabla v \end{pmatrix}, \qquad t = \begin{pmatrix} u \\ v \end{pmatrix}.$$
The minimization problem was originally introduced in [2, 1]. The optimality condition for the saddle point is obtained by the gradient descent flow, which gives the PDE

$$\frac{\partial t}{\partial \tau} - \nabla \cdot \Bigl(\frac{\nabla t}{|\nabla t|}\Bigr) + \delta (t - t_0) - \nabla \lambda = 0, \qquad \nabla \cdot t = 0, \tag{3}$$

with the boundary conditions and the initial condition

$$\Bigl(\frac{\nabla t}{|\nabla t|} + \lambda I\Bigr) \cdot \nu = 0, \qquad t(p, 0) = t_0,$$

where $I$ is the identity matrix. Note that it is not straightforward to use the Perona-Malik (PM) model [4] or the Rudin-Osher-Fatemi (ROF) model [7] directly for regularizing derivative information of an image [9]. One reason for regularizing the tangential vector field is that the incompressibility condition, $\nabla \cdot t = 0$, can be computed numerically using the Chorin projection type method, which is well developed in fluid dynamics; see details in Section 3. Moreover, the condition guarantees the existence of an image $d$ which satisfies the relation (1). Once the regularized tangent vector field $t = (u, v)^T$ is obtained in the first step, the regularized normal vector field $n$ is defined by $(v, -u)^T$. In two-step algorithms for image denoising [10, 2, 1] and image inpainting [2], it is suggested to solve the following minimization problem in the second step to reconstruct an image from $n$:

$$\min_{\|d - d_0\|_2 = \sigma} \int_\Omega \Bigl(|\nabla d| - \nabla d \cdot \frac{n}{|n|}\Bigr)\,dp, \tag{4}$$

where $\|\cdot\|_2$ is the $L^2(\Omega)$ norm and $\sigma$ is the standard deviation of the Gaussian white noise. From the Euler-Lagrange equation and the gradient descent method along fictitious time $\tau$, we obtain a PDE for reconstructing an image with the free flux boundary condition and an initial condition $d(p, 0) = d_0(p)$:

$$\frac{\partial d}{\partial \tau}(p, \tau) = \nabla \cdot \Bigl(\frac{\nabla d}{|\nabla d|} - \frac{n}{|n|}\Bigr) - \mu (d - d_0), \tag{5}$$

where $\mu$ is a positive parameter. Note that the ROF model corresponds to the case $n = 0$, which means that the TV-norm filter is very suitable for denoising a piecewise constant image. In other words, the model suffers from a staircase effect in regions whose pixel values change smoothly.
Since the TVS denoising model and the LOT model find an image that fits the regularized normal vector field from the PDE (5), they naturally perform better than the ROF model. However, they still have problems when the original image has smoothly changing pixel values near sharp edges and the regularized normal vector field in some regions is almost parallel or contains numerical errors; see Figures 2 and 4.

2.2 Orientation-Matching Minimization

Inspired by the two-step algorithm in the TVS denoising model, we also use the regularized tangential vector field with the zero divergence condition in the first step. In this paper, we propose a new approach for reconstructing a denoised image in the second step. Namely, instead of finding an image that fits the regularized normal direction as in (4), we minimize the orientation difference between the image gradient and the regularized normal direction:

$$\min_{\|d - d_0\|_2 = \sigma} \int_\Omega -\frac{|\nabla d \cdot n|}{|\nabla d|\,|n|}\,dp, \tag{6}$$

where $\|\cdot\|_2$ and $\sigma$ are the same as in (4). From the Euler-Lagrange equation and the gradient descent method along fictitious time $\tau$, we obtain a new PDE for obtaining a denoised image with the free flux boundary condition and an initial condition $d(p, 0) = d_0(p)$:

$$\frac{\partial d}{\partial \tau}(p, \tau) = \nabla \cdot \Bigl(\frac{|\nabla d \cdot n|}{|\nabla d|^2\,|n|}\,\frac{\nabla d}{|\nabla d|} - \frac{\operatorname{sgn}(\nabla d \cdot n)}{|\nabla d|}\,\frac{n}{|n|}\Bigr) - \mu (d - d_0), \tag{7}$$

where $\operatorname{sgn}(\cdot)$ is the sign function and $\mu$ is a positive parameter. Unlike the diffusivity term $\frac{1}{|\nabla d|}$ and the fixed force term $\nabla \cdot \frac{n}{|n|}$ in (5), the PDE from the proposed minimization has a diffusivity depending on the orientation of the regularized normal vector field $n$ and a weighted self-adaptive force term depending on the direction between $\nabla d$ and $n$. We expect two differences between the proposed model (6) and the previous one (4) for reconstructing a denoised image. The first is that we have a smaller orientation difference between the gradient of the original image and the gradient of the denoised image. The second is that the result of our model will have sharper edges in the denoised image, especially when the original image has smoothly changing pixel values near sharp edges. These are easily observed in numerical experiments and there are some plausible reasons. In order to see the first difference, we assume that $\theta$ is the angle between $\nabla d / |\nabla d|$ and $n / |n|$. Then, the functional in the proposed model is written as

$$\int_\Omega (-|\cos \theta|)\,dp \tag{8}$$

and the functional in the previous model as

$$\int_\Omega \Bigl(|\nabla d| - \nabla d \cdot \frac{n}{|n|}\Bigr)\,dp = \int_\Omega |\nabla d| \Bigl(1 - \frac{\nabla d \cdot n}{|\nabla d|\,|n|}\Bigr)\,dp = \int_\Omega |\nabla d|\,(1 - \cos \theta)\,dp. \tag{9}$$

The previous energy functional minimizes both $|\nabla d|$ and the angle $\theta$. If an image $d$ has some regions where $|\nabla d|$ is large enough, the minimization of the angle difference between $\nabla d / |\nabla d|$ and $n / |n|$ has quite a weak effect. In the case of very


small $|\nabla d|$, any angle fits $n / |n|$. Even if there is only a small angle difference, the graph of a denoised image is easily affected and takes a shape different from that of the original image. Since the proposed energy functional only minimizes the orientation difference, the shape of a denoised result changes more sensitively in order to fit the original image, regardless of the magnitude of $|\nabla d|$. We numerically show the orientation difference in Table 1 using different methods. When we assume that $\nabla d$ is approximately parallel to $n$, the second difference is expected because the diffusion and force terms of the proposed PDE can be written as

$$\nabla \cdot \Bigl(\frac{|\nabla d \cdot n|}{|\nabla d|^2\,|n|}\,\frac{\nabla d}{|\nabla d|} - \frac{\operatorname{sgn}(\nabla d \cdot n)}{|\nabla d|}\,\frac{n}{|n|}\Bigr) \approx \nabla \cdot \Bigl(\frac{1}{|\nabla d|}\Bigl(\frac{\nabla d}{|\nabla d|} - (\pm)\,\frac{n}{|n|}\Bigr)\Bigr).$$

From the approximation, if $|\nabla d|$ is large, we observe that the proposed model (7) is dominantly influenced by the data fidelity term and only slightly affected by the regularization term. However, the previous model (5) is still affected by an additional force term from the regularized normal vector field. Since the vector field may contain numerical errors from the numerical computation of (2), it is difficult to know whether the additional force will generate a good result or not. Even though the extra force reduces the staircase effect compared to the TV-filtering method in smooth regions, it may produce an erroneous effect near edges where $|\nabla d|$ is large. We numerically show the quality of a denoised image when the original image has smoothly changing pixel values near sharp edges; see Figures 2, 3, and 4.
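The different roles of $|\nabla d|$ in the two functionals can be made concrete by evaluating the integrands of (9) and (8) pointwise. The sketch below assumes a nonzero normal vector $n$ and, as a convention of this sketch only, returns 0 for the orientation integrand where the quotient is undefined. The function names are hypothetical.

```python
import math

def fitting_integrand(grad, n):
    """|grad d| - grad d . n/|n|, the integrand of (4), i.e. |grad d| (1 - cos theta)."""
    return math.hypot(grad[0], grad[1]) - (grad[0] * n[0] + grad[1] * n[1]) / math.hypot(n[0], n[1])

def orientation_integrand(grad, n):
    """-|grad d . n| / (|grad d| |n|), the integrand of (6), i.e. -|cos theta|."""
    denom = math.hypot(grad[0], grad[1]) * math.hypot(n[0], n[1])
    if denom == 0.0:
        return 0.0  # convention of this sketch; the quotient is undefined here
    return -abs(grad[0] * n[0] + grad[1] * n[1]) / denom

# Same angle (60 degrees) between gradient and n, two different gradient magnitudes.
n = (2.0, 0.0)
theta = math.radians(60)
weak = (math.cos(theta), math.sin(theta))
strong = (10.0 * math.cos(theta), 10.0 * math.sin(theta))
values = {
    "fitting": (fitting_integrand(weak, n), fitting_integrand(strong, n)),
    "orientation": (orientation_integrand(weak, n), orientation_integrand(strong, n)),
}
```

For the same angle, the integrand of (9) scales with the gradient magnitude (0.5 versus 5), whereas the integrand of (8) stays at $-0.5$ in both cases, which illustrates that the proposed functional measures the orientation difference only.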

3 Numerical Aspects

For the discretization, we use the standard staggered grid suggested in [2]. In this section, we briefly note some discretization issues in the first and second steps.

3.1 A Regularization of the Tangent Vector Field

The minimization problem (2) for regularizing the tangent vector field under the incompressibility constraint is solved by the method of Lagrange multipliers and a Chorin-type projection method. We apply the Chorin-type projection method and the AOS method [13, 14] to solve the PDE (3).

1. Compute an intermediate tangent field $t^*$, which is not an incompressible vector field:

\[ \frac{t^* - t^n}{\Delta\tau} = \nabla \cdot \left( \frac{\nabla t^*}{|\nabla t^n|_\epsilon} \right) - \delta (t^* - t_0), \]

with the boundary condition $\nabla t^* \cdot \nu = 0$, where $|\nabla t^n|_\epsilon \equiv \sqrt{\epsilon + |\nabla t^n|^2}$ and $t^n$ is the tangent vector field at the $n$th time step. The AOS method is applied to the linearized equation for the components $u$ and $v$. The spatial derivatives with respect to $x$ and $y$ are approximated by standard one-sided finite differences.

2. Solve for $\lambda$ such that

\[ \begin{cases} \dfrac{t^{n+1} - t^*}{\Delta\tau} = \nabla\lambda, \\ \nabla \cdot t^{n+1} = 0. \end{cases} \]

This gives a Poisson equation for $\lambda$ with the zero Neumann boundary condition:

\[ \nabla \cdot \nabla\lambda = -\frac{1}{\Delta\tau} \nabla \cdot t^*. \]

3. Update the tangent vector field by $t^{n+1} = t^* + \Delta\tau \nabla\lambda$. The boundary values are updated by the incompressibility condition.

More details are given in [2, 1]. For the stopping criterion, we use the steady-state condition for the flow $t = (u, v)^T$:

\[ \max \left\{ \frac{\|u^{n+1} - u^n\|_\infty}{\|u^n\|_\infty}, \frac{\|v^{n+1} - v^n\|_\infty}{\|v^n\|_\infty} \right\} \le \alpha, \]

where $n$ and $n + 1$ are consecutive time steps and $\|\cdot\|_\infty$ is the $L^\infty(\Omega)$ norm. Note that $\alpha = 10^{-4}$ is fixed for all examples in the paper.
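Steps 2 and 3 and the steady-state test can be sketched as follows. This is only an illustration under simplifying assumptions that differ from the paper: a collocated grid with periodic boundary conditions (so the Poisson equation can be solved by FFT) instead of the staggered grid with Neumann conditions. The function names are hypothetical.

```python
import numpy as np

def grad(a):
    """Periodic forward differences (assumption; the paper uses a staggered grid)."""
    return np.roll(a, -1, axis=1) - a, np.roll(a, -1, axis=0) - a

def div(u, v):
    """Periodic backward differences, the negative adjoint of grad."""
    return (u - np.roll(u, 1, axis=1)) + (v - np.roll(v, 1, axis=0))

def project_divergence_free(u_star, v_star, dtau=1.0):
    """Solve div(grad(lam)) = -div(t*)/dtau by FFT, then return t* + dtau * grad(lam)."""
    rows, cols = u_star.shape
    ky = 2.0 * np.pi * np.fft.fftfreq(rows)[:, None]
    kx = 2.0 * np.pi * np.fft.fftfreq(cols)[None, :]
    symbol = (2.0 * np.cos(kx) - 2.0) + (2.0 * np.cos(ky) - 2.0)  # eigenvalues of div(grad)
    symbol[0, 0] = 1.0                      # avoid 0/0 at the constant mode
    lam_hat = np.fft.fft2(-div(u_star, v_star) / dtau) / symbol
    lam_hat[0, 0] = 0.0                     # lam is defined up to a constant
    lam = np.real(np.fft.ifft2(lam_hat))
    lx, ly = grad(lam)
    return u_star + dtau * lx, v_star + dtau * ly

def is_steady(u_new, u_old, v_new, v_old, alpha=1e-4):
    """Relative sup-norm change of both components, as in the stopping criterion."""
    ru = np.max(np.abs(u_new - u_old)) / np.max(np.abs(u_old))
    rv = np.max(np.abs(v_new - v_old)) / np.max(np.abs(v_old))
    return max(ru, rv) <= alpha
```

With matching discrete `grad` and `div` operators, the projected field is divergence-free up to rounding error.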

3.2 A Reconstruction of a Denoised Image

After the regularized tangent vector field $t = (u, v)^T$ is computed in the first step, we use the orientation-matching minimization (6) to reconstruct a denoised image from the regularized normal vector field $n = (v, -u)^T$. The optimality condition for the saddle point is obtained by the gradient descent flow, which gives the PDE (7). We also apply the AOS method to solve the proposed PDE. Note that we use a regularized sign function [15]:

\[ \operatorname{sgn}_\epsilon(s) \equiv 2 H_\epsilon(s) - 1, \qquad H_\epsilon(s) \equiv \begin{cases} 1, & s > \epsilon, \\ 0, & s < -\epsilon, \\ \dfrac{1}{2} \left( 1 + \dfrac{s}{\epsilon} + \dfrac{1}{\pi} \sin\dfrac{\pi s}{\epsilon} \right), & \text{otherwise}, \end{cases} \]

and a parameter $\epsilon$ is used to avoid division by zero in numerical experiments:

\[ |\nabla d^n|_\epsilon \equiv \sqrt{\epsilon + |\nabla d^n|^2}, \qquad |n|_\epsilon \equiv \sqrt{\epsilon + |n|^2}, \]

where the superscript $n$ denotes the $n$th time step. More details are given in [1, 2].
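A short sketch of the regularized Heaviside and sign functions follows. The function names are hypothetical, and the lower branch is taken at $s < -\epsilon$, which is the choice that makes $H_\epsilon$ continuous at both ends of the transition interval.

```python
import numpy as np

def heaviside_eps(s, eps):
    """Regularized Heaviside: 0 below -eps, 1 above eps, smooth transition in between."""
    s = np.asarray(s, dtype=float)
    mid = 0.5 * (1.0 + s / eps + np.sin(np.pi * s / eps) / np.pi)
    return np.where(s > eps, 1.0, np.where(s < -eps, 0.0, mid))

def sign_eps(s, eps):
    """Regularized sign function sgn_eps(s) = 2 H_eps(s) - 1."""
    return 2.0 * heaviside_eps(s, eps) - 1.0
```

For example, `sign_eps(0.0, 0.1)` evaluates to 0, and the function equals plus or minus 1 outside the interval $[-\epsilon, \epsilon]$.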

For the stopping criterion, we use the steady-state condition for the relative difference in the energy (6). That is,

\[ \frac{|E^{n+1} - E^n|}{E^n} \le \beta, \]

where $E^n$ is the energy value at time step $n$, approximated by

\[ E^n \approx \sum_{i,j} -\frac{|\nabla d^n \cdot n|}{|\nabla d^n|_\epsilon\, |n|_\epsilon}. \]

The value of $\beta$ may differ between images, and we use $10^{-4} \le \beta \le 10^{-2}$. The energy (4) is computed similarly and is used for the stopping criterion of the second step in the previous model.

Remark 1. The right choice of parameters is crucial for the quality of a denoised image. The parameters $\delta$ and $\mu$ control the balance between data smoothing and the fidelity term. The parameter $\epsilon$ is used to avoid division by zero and also controls the diffusivity for smoothing the data. The AOS scheme allows a wide range of time steps. However, if $\Delta\tau$ is too large, the visual quality of a denoised image deteriorates.

4 Examples

In this section, we show numerical experiments for denoising an image with the proposed method. Using synthetic and real images, we discuss the strengths of the proposed orientation-matching minimization and compare it with results from other methods. For simplicity, the following notation is used to indicate the parameters of the different methods.

– $V(\Delta\tau, \delta, \epsilon)$: a regularization of the tangent vector field (3).
– $M^1(\Delta\tau, \mu, \epsilon)$: a reconstruction of a denoised image from (7).
– $M^2(\Delta\tau, \mu, \epsilon)$: a reconstruction of a denoised image from (5).
– $M^3(\lambda)$: the TV-filtering method in [8].
– $M^4(\mu, \rho, \epsilon)$: a reconstruction of a denoised image from (10).

We also include an interesting numerical experiment that combines anisotropic nonlinear diffusion [6, 5] with the regularized tangent vector field $t = (u, v)^T$ from the first step (2). That is, the diffusivity tensor is constructed from $n = (v, -u)^T$, and we solve a PDE with the free flux boundary condition:

\[ \frac{\partial d}{\partial \tau}(p, \tau) = \nabla \cdot \left( g\left( G_\rho * n n^T \right) \nabla d \right) - \mu (d - d_0), \tag{10} \]

where $(G_\rho * M)_{ij} = G_\rho * m_{ij}$ for a matrix $M = (m_{ij})$, and $G_\rho * f$ is the convolution of $f$ with the two-dimensional Gaussian kernel with standard deviation $\rho$. The function $g$ is defined on the set $S$ of real symmetric positive semidefinite $2 \times 2$ matrices:

\[ g(M) \equiv \frac{1}{\sqrt{\epsilon + \Lambda_1}}\, v_{\Lambda_1} v_{\Lambda_1}^T + \frac{1}{\sqrt{\epsilon + \Lambda_2}}\, v_{\Lambda_2} v_{\Lambda_2}^T, \]

where $(\Lambda_1, v_{\Lambda_1})$ and $(\Lambda_2, v_{\Lambda_2})$ are the eigenpairs of $M \in S$, with $\Lambda_1 \ge \Lambda_2$.

Fig. 1. Results from the proposed method for test 1 to test 5: the first row shows the original images, the second row shows the images after adding Gaussian white noise with zero mean and standard deviation 10, and the last row shows the results of the proposed method.

Table 1. Comparison of the orientation difference γ in (11): (A) is the result of the proposed method, (B) is the result of the TVS denoising method, and (C) is the result of the TV-filter method. The denoised images from the proposed method are shown in the third row of Figure 1.

images   test 1   test 2   test 3   test 4   test 5
(A)      0.9706   0.8693   0.7668   0.5681   0.4936
(B)      0.9316   0.8478   0.6304   0.4983   0.4051
(C)      0.7466   0.6825   0.6218   0.3891   0.3228

Fig. 2. Comparison with other methods: (a), (b), and (c) are the graphs of the images from top to bottom of test 5 in Figure 1, respectively. (d) is the result of the TVS denoising model and (e) is the result of the TV-filtering model. (f) is the result from (10). Note that (c) is the result of the proposed model.

Fig. 3. (a) is an original image. In (b) we add Gaussian white noise with zero mean and standard deviation 20, which is larger than the noise in test 4 in Figure 1. (c) is the result of the proposed model. (d) is the result of the TVS denoising model and (e) is the result of the TV-filtering model. (f) is the result from (10).

Fig. 4. (a) is a part of a tangent vector field from (2). (a1), (a2), and (a3) in the first row are parts of the images (c), (d), and (f) in Figure 3, respectively. In the second row, we compute a less smooth tangent vector field (b) in the first step and use the same methods for the second step as in the first row, giving (b1), (b2), and (b3).

Fig. 5. (a) is an image from [16] with Gaussian white noise with zero mean and standard deviation 10. (b) is the result of the proposed model. (c) is the result of the TVS denoising model and (d) is the result of the TV-filtering model. The image size is 240 × 124.

Fig. 6. (a) is an image from [16] with Gaussian white noise with zero mean and standard deviation 10. (b) is the result of the proposed model. (c) is the result of the TVS denoising model and (d) is the result of the TV-filtering model. The image size is 181 × 274.

Example 1. We numerically check how well the orientation of the gradient of a denoised image fits the gradient of the original image. In Table 1, we measure the orientation difference for different test images:

\[ \gamma = \frac{1}{|\Omega|} \int_\Omega \frac{\nabla d_e}{|\nabla d_e|} \cdot \frac{\nabla d_c}{|\nabla d_c|}\, dp, \tag{11} \]

where $d_e$ is the original image, $d_c$ is the computed denoised image, and $|\Omega|$ is the area of the domain. In the first step in (A) and (B), $V(10^{-1}, 1, 10^{4})$ is fixed for all test images. In the second step in (A) and (B), we use $M^1(10^{-3}, 1, 10^{-3})$ and $M^2(10^{-3}, 1, 10^{-6})$ for test 1, $M^1(10^{-3}, 1, 5 \times 10^{-3})$ and $M^2(10^{-3}, 1, 2.5 \times 10^{-5})$ for test 2, $M^1(10^{-3}, 1, 2.5 \times 10^{-5})$ and $M^2(10^{3}, 1, 5 \times 10^{-3})$ for test 3, $M^1(10^{-3}, 1, 10^{-3})$ and $M^2(10^{-3}, 5, 10^{-3})$ for test 4, and $M^1(10^{-3}, 2, 3 \times 10^{-3})$ and $M^2(10^{3}, 3, 3 \times 10^{-3})$ for test 5, respectively. In (C), all results are obtained by $M^3(60)$. As explained in Section 2.2, the proposed model fits the orientation better. In Figure 2, the graphs of the computed results are presented to show the visual difference. The result (f) is obtained from (10) with $M^4(0.4, 0.1, 10^{-3})$. A denoised image from the proposed method has a very clean shape, even though the original image has smoothly changing pixel values near edges. We observe that the results of the other methods do not have very sharp edges. The result (e) from the TV-filtering model has a staircase effect in smooth regions. These results are as expected from Section 2.2.

Example 2. In Figure 3, we compare the results of different methods for noise larger than that in Figure 1. For the regularization of the tangent vector field in (c) and (d), $V(5 \times 10^{-2}, 1, 10^{-4})$ is used. The result of the proposed method in (c) is obtained with $M^1(10^{-3}, 2, 10^{-3})$. The results (d), (e), and (f) are obtained with $M^2(10^{-3}, 4, 10^{-4})$, $M^3(80)$, and $M^4(0.5, 1, 10^{-3})$. Next, we numerically show the effect of the first step (2) on the second step in (7), (5), and (10). The first row in Figure 4 shows parts of the images in Figure 3. In the second row, we obtain a relatively less smooth vector field with $V(10^{-1}, 3, 10^{-4})$. (b2) is obtained with $M^1(10^{-3}, 2, 10^{-3})$, and we use the same parameters for (b1) and (b3) as for (a1) and (a3). Note that the result (b2) does not have a very clean edge even if we use a smaller $\mu$ in the second step for the previous model (5). The other methods, (5) and (10), respond to small changes of the vector field because the field is used directly in the formulation without considering any relation to the image data.

Example 3. For real images, we compare denoised images from different methods. In Figure 5, the image (a) is obtained by the proposed method using $V(10^{-1}, 5, 10^{-4})$ and $M^1(5 \times 10^{-4}, 5, 5 \times 10^{-4})$. (b) is from $V(5 \times 10^{-2}, 5, 10^{-4})$ and $M^2(10^{-3}, 1, 5 \times 10^{-3})$. (c) is from $M^3(60)$. In Figure 6, the image (a) is obtained by the proposed method with $V(10^{-1}, 2, 10^{-4})$ and $M^1(10^{-4}, 30, 10^{-3})$. (b) is from $V(10^{-1}, 2, 10^{-4})$ and $M^2(10^{-3}, 2, 10^{-3})$. (c) is from $M^3(60)$. For these images, the two models (4) and (6) give similar results, which are better than those of the TV-filtering model.
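The orientation measure γ in (11) can be approximated on a pixel grid as in the sketch below. It assumes central differences via `numpy.gradient`, a pixel average in place of the normalized integral, and a small `eps` to avoid division by zero where a gradient vanishes; these choices and the function name are illustrative and are not taken from the paper.

```python
import numpy as np

def orientation_match(d_e, d_c, eps=1e-12):
    """Discrete version of gamma in (11): mean cosine between the two gradient fields."""
    gy_e, gx_e = np.gradient(np.asarray(d_e, dtype=float))
    gy_c, gx_c = np.gradient(np.asarray(d_c, dtype=float))
    norm_e = np.sqrt(eps + gx_e**2 + gy_e**2)
    norm_c = np.sqrt(eps + gx_c**2 + gy_c**2)
    cosine = (gx_e * gx_c + gy_e * gy_c) / (norm_e * norm_c)
    return float(np.mean(cosine))

# Sanity checks on linear ramps, whose gradients are constant.
yy, xx = np.mgrid[0:8, 0:8].astype(float)
ramp = xx + 2.0 * yy
gamma_same = orientation_match(ramp, ramp)              # gradients coincide
gamma_flip = orientation_match(ramp, -ramp)             # gradients are opposite
gamma_orth = orientation_match(ramp, 2.0 * xx - yy)     # gradients are orthogonal
```

Values of γ close to 1 indicate that the gradient directions of the denoised image match those of the original, which is how Table 1 should be read.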

5 Conclusions

We proposed an orientation-matching minimization for denoising digital images. Our algorithm consists of two steps. In the first step, we use the regularized tangent vector field with the incompressibility condition suggested in [2]. This condition is crucial for reconstructing an image from the vector field. In the second step, the present work proposes a minimization of the orientation difference between the image gradient and the regularized normal direction. It gives a nonlinear PDE for reconstructing a denoised image, which has a diffusivity depending on the orientation of the regularized normal vector field and a weighted self-adaptive force term depending on the direction between the gradient of the image and the vector field. This makes it possible to obtain a denoised image with sharp edges and smooth regions, even when the original image has smoothly changing pixel values near sharp edges. Various numerical experiments show the improved quality of the results.

References

1. Rahman, T., Tai, X.C., Osher, S.: A TV-Stokes denoising algorithm. In: Sgallari, F., Murli, A., Paragios, N. (eds.) Scale Space and Variational Methods in Computer Vision, pp. 473–482. Springer, Heidelberg (2007)
2. Tai, X.C., Osher, S., Holm, R.: Image inpainting using TV-Stokes equation. In: Image Processing Based on Partial Differential Equations, pp. 3–22. Springer, Heidelberg (2006)
3. Bertalmio, M., Sapiro, G., Bertozzi, A.L.: Navier-Stokes, fluid dynamics, and image and video inpainting. In: Proc. Conf. Comp. Vision Pattern Rec., pp. 355–362 (2001)
4. Perona, P., Malik, J.: Scale space and edge detection using anisotropic diffusion. IEEE Trans. Pattern Anal. Machine Intell. 12(7), 629–639 (1990)
5. Weickert, J.: Coherence-enhancing diffusion filtering. Int. J. Comput. Vis. 31, 111–127 (1999)
6. Brox, T., Weickert, J., Burgeth, B., Mrázek, P.: Nonlinear structure tensors. Image Vis. Comput. 24, 41–55 (2006)
7. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
8. Bresson, X., Chan, T.: Fast dual minimization of the vectorial total variation norm and applications to color image processing. Inverse Problems and Imaging 2(4), 455–484 (2008)
9. Hahn, J., Lee, C.O.: A nonlinear structure tensor with the diffusivity matrix composed of the image gradient. J. Math. Imag. Vis. (accepted)
10. Lysaker, M., Osher, S., Tai, X.C.: Noise removal using smoothed normals and surface fitting. IEEE Trans. Image Processing 13(10), 1345–1357 (2004)
11. Vese, L., Osher, S.: Numerical methods for p-harmonic flows and applications to image processing. SIAM J. Numer. Anal. 40(6), 2085–2104 (2002)
12. Sochen, N., Sagiv, C., Kimmel, R.: Stereographic combing a porcupine or studies on direction diffusion in image processing. SIAM J. Appl. Math. 64(5), 1477–1508 (2004)
13. Lu, T., Neittaanmaki, P., Tai, X.C.: A parallel splitting up method for partial differential equations and its application to Navier-Stokes equations. RAIRO Math. Model. and Numer. Anal. 26(6), 673–708 (1992)
14. Weickert, J., ter Haar Romeny, B.M., Viergever, M.A.: Efficient and reliable schemes for nonlinear diffusion filtering. IEEE Trans. Image Processing 7, 398–410 (1998)
15. Chan, T., Vese, L.: Active contours without edges. IEEE Trans. Image Processing 10, 266–277 (2001)
16. Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: Proc. 8th Int'l Conf. Computer Vision, vol. 2, pp. 416–423 (July 2001)

Augmented Lagrangian Method, Dual Methods and Split Bregman Iteration for ROF Model

Xue-Cheng Tai¹ and Chunlin Wu²

¹ Division of Mathematical Science, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore and Department of Mathematics, University of Bergen, Johannes Brunsgate 12, N-5008 Bergen, Norway [email protected]
² Division of Mathematical Science, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore

Abstract. In recent decades the ROF model (total variation (TV) minimization) has been very successful in image restoration due to its good edge-preserving property. However, the non-differentiability of the minimization problem brings computational difficulties. Different techniques have been proposed to overcome this difficulty. Methods regarded as particularly efficient include the dual methods of CGM (Chan, Golub, and Mulet) [7] and Chambolle [6], split Bregman iteration [14], and splitting-and-penalty based methods [28] [29]. In this paper, we show that most of these methods can be classified under the same framework. The dual methods and split Bregman iteration are just different iterative procedures for solving the same system, which results from a Lagrangian and penalty approach. We show this relationship only for the ROF model. However, it provides a uniform framework for understanding these methods for other models. In addition, we provide some examples to illustrate the accuracy and efficiency of the proposed algorithm.

1 Introduction

Image restoration, such as denoising and deblurring, is one of the most fundamental tasks in image processing and is in general based on regularization. Preserving image edges and features during regularization is difficult but highly desirable. The ROF model [23] has been demonstrated to be very successful in edge-preserving image restoration; see [9] [11] and references therein. Consequently, the model has attracted much attention and has been extended to high order models [8] [31] [18] [19] [16] [25] and to vectorial models [24] [2] [10] for color image restoration [17] [27]. However, the computation of the ROF model suffers from serious nonlinearity and non-differentiability. In [23], the authors proposed an artificial time marching strategy for the associated Euler-Lagrange equation. This method is slow due to strict stability constraints on the time step size. Besides, the artificial time marching method computes solutions not of the exact ROF model but of an approximation, namely a regularized ROF model. Different techniques have been proposed to overcome this difficulty.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 502–513, 2009. © Springer-Verlag Berlin Heidelberg 2009

There are several methods regarded as particularly efficient. One approach is the dual methods [7] [5] [6], which are based on various dual formulations of the model. Another is split Bregman iteration [14], which uses functional splitting and Bregman iteration for constrained optimization [20] [30]. Similar to split Bregman iteration, another approach based on splitting and then alternating minimization of the penalized cost function was proposed in [28] [29]. In this paper, we present an augmented Lagrangian method to solve the model and show that the dual methods and split Bregman iteration can be either deduced from, or shown to be equivalent to, our method.

2 ROF Model and Related Numerical Solvers

Assume $\Omega \subset \mathbb{R}^2$ is a bounded open subset (usually a rectangle in image processing) and $f : \Omega \to \mathbb{R}$ is an observed image. The image $f$ often contains various degradations and can be noisy and blurred, which is usually modelled as

\[ f = Ku + n, \tag{1} \]

where $u$ is the true image, and $K$ and $n$ are a linear operator and the noise, respectively. The operator $K$ may stand for the identity operator or for various blur operations such as Gaussian blur and motion blur. The noise $n$ may denote Gaussian noise, salt-and-pepper noise, or other noise. Image restoration aims to recover $u$ from $f$ given some information about $K$ and $n$. In this paper we assume that $n$ is Gaussian white noise and $K$ is a general blur operator. Since the variance of $n$ and the blur kernel of $K$ can usually be estimated, we further assume that we know $K$ and the variance of $n$ exactly. Even with this knowledge, it is still difficult to recover $u$ from $f$. Even in the pure denoising case ($K = I$), it is not an easy task to get $u$, since we only know the variance of the random noise $n$. For the pure deblurring case, in which $K \neq I$ and $n = 0$, we cannot directly solve $f = Ku$ to get $u$ due to the compactness of $K$. The problem $f = Ku$ is ill-posed, and the solution would be highly oscillatory. Regularization of the solution should be considered. The restoration problem is thus formulated with some regularity term $R(u)$ as

\[ \min_u R(u) \quad \text{s.t.} \quad \|f - Ku\|^2 = \sigma^2, \tag{2} \]

where $\sigma^2$ is the variance of $n$. The constrained minimization problem is often solved approximately using Tikhonov regularization as follows:

\[ \min_u F(u) = R(u) + \frac{\lambda}{2} \|Ku - f\|^2, \tag{3} \]

for some parameter $\lambda$. There are many choices for the regularity term $R(u)$. One of the most basic and successful choices is due to Rudin, Osher, and Fatemi [23], in which $R(u)$ was chosen to be the total variation of $u$. The so-called ROF model reads

\[ u = \arg\min_u F_{rof}(u) = \int_\Omega |\nabla u| + \frac{\lambda}{2} \|Ku - f\|^2. \tag{4} \]

In [23] the authors considered the image denoising problem ($K = I$) and presented a gradient descent method to solve (4). (Here the method is described for general $K$.) Artificial time marching was introduced for the associated Euler-Lagrange equation as follows:

\[ \begin{cases} u_t = \nabla \cdot \left( \dfrac{\nabla u}{\sqrt{|\nabla u|^2 + \beta}} \right) + K^*(f - Ku), \\ u(0) = f, \end{cases} \tag{5} \]

where $\beta$ is a small positive number to avoid division by zero and $K^*$ is the $L^2$ adjoint of $K$. The gradient descent method (5) has two main drawbacks. First, the method computes the solution of (4) not exactly, but approximately. Second, the method is slow due to strict constraints on the time step size. The choice of $\beta$ affects both aspects. The larger $\beta$ is, the more efficient the scheme, but the worse the approximation. There is a tradeoff between accuracy and efficiency in choosing $\beta$. Many algorithms have been proposed to improve on this method. Those regarded as particularly efficient include dual methods and split Bregman iteration, as well as splitting-and-penalty based methods, as mentioned before.

Before we go on, we present an obviously equivalent formulation of the restoration problem (4), which will play an important role in our derivation. The difficulty in solving the ROF restoration model (4) is due to the non-differentiability of the total variation norm. We introduce an auxiliary variable $q$ for $\nabla u$ to separate the calculation of the non-differentiable term from that of the fidelity term. The model (4) is thus equivalent to

\[ \min_{u, q} G_{rof}(u, q) = \int_\Omega |q| + \frac{\lambda}{2} \|Ku - f\|^2 \quad \text{s.t.} \quad q = \begin{pmatrix} q_1 \\ q_2 \end{pmatrix} = \begin{pmatrix} \partial_x u \\ \partial_y u \end{pmatrix} = \nabla u, \tag{6} \]

a constrained optimization problem.
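The role of $\beta$ in the time marching scheme can be seen in a small explicit discretization for the denoising case $K = I$. This is a sketch under stated assumptions rather than the scheme of [23]: it uses forward differences with zero-flux boundaries, an explicit Euler step, and a fidelity weight `lam` (the gradient flow of (4); setting `lam = 1` gives the unit weight of (5) as printed). The step size is tied to $\sqrt{\beta}$ through a Lipschitz bound, which is why a small $\beta$ forces small steps.

```python
import numpy as np

def grad(u):
    """Forward differences with zero-flux boundary."""
    ux = np.diff(u, axis=1, append=u[:, -1:])
    uy = np.diff(u, axis=0, append=u[-1:, :])
    return ux, uy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    dx = np.zeros_like(px)
    dx[:, 0] = px[:, 0]
    dx[:, 1:-1] = px[:, 1:-1] - px[:, :-2]
    dx[:, -1] = -px[:, -2]
    dy = np.zeros_like(py)
    dy[0, :] = py[0, :]
    dy[1:-1, :] = py[1:-1, :] - py[:-2, :]
    dy[-1, :] = -py[-2, :]
    return dx + dy

def rof_energy(u, f, lam, beta):
    """Regularized ROF energy: sum sqrt(|grad u|^2 + beta) + lam/2 * ||u - f||^2."""
    ux, uy = grad(u)
    return np.sum(np.sqrt(ux**2 + uy**2 + beta)) + 0.5 * lam * np.sum((u - f) ** 2)

def rof_time_marching(f, lam=10.0, beta=1e-2, steps=200):
    """Explicit time marching for the regularized Euler-Lagrange equation with K = I."""
    dt = 1.0 / (8.0 / np.sqrt(beta) + lam)   # 1/L, where L bounds the gradient's Lipschitz constant
    u = f.copy()
    for _ in range(steps):
        ux, uy = grad(u)
        w = np.sqrt(ux**2 + uy**2 + beta)
        u = u + dt * (div(ux / w, uy / w) + lam * (f - u))
    return u
```

With this step size the regularized energy is non-increasing along the iteration; reducing `beta` by a factor of 100 reduces the admissible step by roughly a factor of 10.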

2.1 CGM Dual Method

In [7] Chan et al. presented a primal-dual method for TV minimization. They introduced a new variable

\[ \omega = \frac{\nabla u}{|\nabla u|} \tag{7} \]

into the Euler-Lagrange equation of the model (4), yielding

\[ \begin{cases} -\nabla \cdot \omega + \lambda K^*(Ku - f) = 0, \\ \nabla u - \omega |\nabla u| = 0, \end{cases} \tag{8} \]

to remove some of the singularity caused by the non-differentiability of the objective functional. Unlike the original Euler-Lagrange equation for $u$, this system contains both the $u$ and $\omega$ variables. In [7], $u$ and $\omega$ are called the primal and dual variables, respectively. Again the authors approximate this primal-dual system using a regularized TV norm for actual calculation. Newton's linearization technique for both the primal and dual variables is used to solve the discrete version.

2.2 Chambolle's Dual Method

Another work based on a dual formulation, with a slightly different derivation, is due to Chambolle. In [6] Chambolle used the Legendre-Fenchel transform and a key result from optimization theory to obtain an original and efficient algorithm for total variation minimization. The primal variable (the image data) is expressed explicitly in terms of the dual variable, and only the dual variable is computed iteratively. The primal variable $u$ is obtained from the final result for the dual variable. However, the algorithm does not consider general operators $K$. Specifically, Chambolle adopted the following definition of total variation for a general (not necessarily smooth) function $u$:

\[ TV(u) = \sup \left\{ \int_\Omega u(x)\, \nabla \cdot \xi(x) : \xi \in C_c^1(\Omega; \mathbb{R}^2),\ |\xi(x)| \le 1,\ \forall x \in \Omega \right\}. \tag{9} \]

Denoting

\[ S = \mathrm{Closure}\left\{ \nabla \cdot \xi(x) : \xi \in C_c^1(\Omega; \mathbb{R}^2),\ |\xi(x)| \le 1,\ \forall x \in \Omega \right\}, \tag{10} \]

Chambolle showed that the ROF restoration model (4) with $K = I$ (note the slight difference between Eqn. (4) and the model in [6] regarding the parameter $\lambda$) yields

\[ u = f - \frac{1}{\lambda} \pi_S(\lambda f) = f - \pi_{S/\lambda}(f), \tag{11} \]

where $\pi_S(\cdot)$ is the $L^2$-norm projection operator onto $S$, which reads

\[ \pi_S(\cdot) = \arg\min_{\mathrm{div}\,\xi(x)} \left\{ \|\mathrm{div}\,\xi(x) - \cdot\|^2 : |\xi(x)| \le 1,\ \forall x \in \Omega \right\}. \tag{12} \]

Since $S$ is not a linear space, this projection is nonlinear. From the KKT conditions and with a careful observation, it was shown in [6] that $\xi(x)$ for $\pi_S(\lambda f)$ satisfies

\[ -\nabla(\mathrm{div}\,\xi(x) - \lambda f) + |\nabla(\mathrm{div}\,\xi(x) - \lambda f)|\, \xi(x) = 0, \tag{13} \]

which can be solved by a semi-implicit gradient descent algorithm. Note that here we present the continuous case instead of the discrete version used in [6].
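A discrete semi-implicit iteration for (13), followed by the recovery of $u$ through (11), might look as follows. This is a sketch in the scaling of this paper (fidelity weight $\lambda/2$), assuming forward differences with zero-flux boundaries and the fixed step `tau = 1/8` commonly used for this scheme; the function names are hypothetical and the code is not taken from [6].

```python
import numpy as np

def grad(u):
    """Forward differences with zero-flux boundary."""
    ux = np.diff(u, axis=1, append=u[:, -1:])
    uy = np.diff(u, axis=0, append=u[-1:, :])
    return ux, uy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    dx = np.zeros_like(px)
    dx[:, 0] = px[:, 0]
    dx[:, 1:-1] = px[:, 1:-1] - px[:, :-2]
    dx[:, -1] = -px[:, -2]
    dy = np.zeros_like(py)
    dy[0, :] = py[0, :]
    dy[1:-1, :] = py[1:-1, :] - py[:-2, :]
    dy[-1, :] = -py[-2, :]
    return dx + dy

def total_variation(u):
    ux, uy = grad(u)
    return np.sum(np.sqrt(ux**2 + uy**2))

def chambolle_rof(f, lam, tau=0.125, iters=200):
    """Semi-implicit iteration for the dual variable xi, then u = f - div(xi)/lam."""
    xi1 = np.zeros_like(f)
    xi2 = np.zeros_like(f)
    for _ in range(iters):
        gx, gy = grad(div(xi1, xi2) - lam * f)
        mag = np.sqrt(gx**2 + gy**2)
        xi1 = (xi1 + tau * gx) / (1.0 + tau * mag)   # keeps |xi| <= 1 pointwise
        xi2 = (xi2 + tau * gy) / (1.0 + tau * mag)
    return f - div(xi1, xi2) / lam
```

Only the dual variable is iterated; the primal image is formed once at the end, which mirrors the description above.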

2.3 Split Bregman Iteration

Recently (split) Bregman iteration has attracted much attention in the signal recovery and image processing communities. The basic idea is to transform a constrained optimization problem into a series of unconstrained problems. In each unconstrained problem, the objective function is defined by the Bregman distance [3] of a convex functional. The Bregman distance of a convex functional $J(u)$ is defined as the following (nonnegative) quantity:

\[ D_J^p(u, v) \equiv J(u) - J(v) - \langle p, u - v \rangle, \tag{14} \]

where $p \in \partial J(v)$. When $J(u)$ is a continuously differentiable functional, its sub-differential $\partial J(v)$ has a single element for each $v$, and consequently the Bregman distance is unique. In this case the distance is just the difference at the point $u$ between $J(\cdot)$ and its first order approximation at the point $v$. For non-differentiable functionals, the sub-differential may contain no element or multiple elements. Therefore, the Bregman distance between $u$ and $v$ can be ill-defined or multivalued. However, this poses no difficulty for the iterative algorithms, as they automatically choose a unique sub-gradient in each iteration as long as the fidelity term for the constraints is differentiable (this condition usually holds). We also want to emphasize that the Bregman distance of a functional is not a distance in the usual sense since, in general, $D_J^p(u, v) \neq D_J^p(v, u)$ and the triangle inequality does not hold. See [20] [30] for more details.

To find the solution of the ROF model (4), or equivalently of the constrained problem (6), split Bregman iteration (in [14], algorithms for $K = I$, that is, TV denoising, are presented) solves a sequence of unconstrained problems of the form

\[ (u^{k+1}, q^{k+1}) = \arg\min_{u, q} D_{G_{rof}}^{(p_u^k, p_q^k)}\big((u, q), (u^k, q^k)\big) + \frac{r}{2} \int_\Omega |q - \nabla u|^2, \tag{15} \]

where $p_u^k$ and $p_q^k$, sometimes written together as $(p_u^k, p_q^k)$, are the sub-gradients of $G_{rof}$ at $(u^k, q^k)$ with respect to $u$ and $q$, respectively. Taking the update of the sub-gradients into consideration, the iteration procedure can be formulated as Algorithm 1. For the computation of $(u^{k+1}, q^{k+1})$, we refer to Algorithm 3 for more details.

Algorithm 1. Split Bregman iteration for the ROF model
1. Initialization: $q^0 = 0$, $u^0 = 0$, $p_q^0 = 0$, $p_u^0 = 0$;
2. For $k = 0, 1, 2, \ldots$: compute $(u^{k+1}, q^{k+1})$ using Eqn. (15), and update

\[ \begin{cases} p_u^{k+1} = p_u^k - r\, \mathrm{div}(q^{k+1} - \nabla u^{k+1}), \\ p_q^{k+1} = p_q^k - r (q^{k+1} - \nabla u^{k+1}). \end{cases} \tag{16} \]
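The definition (14) and its lack of symmetry can be illustrated with a one-dimensional example. The sketch below takes $J(u) = |u|$ and uses $\operatorname{sign}(v)$ as a sub-gradient at $v \neq 0$; the helper name `bregman_distance` is hypothetical.

```python
import numpy as np

def bregman_distance(J, p, u, v):
    """D_J^p(u, v) = J(u) - J(v) - <p, u - v>, with p a sub-gradient of J at v."""
    return J(u) - J(v) - np.dot(p, u - v)

def J(u):
    """Absolute-value functional (the l1 norm for vectors)."""
    return np.sum(np.abs(u))

u = np.array([1.0])
v = np.array([-2.0])
d_uv = bregman_distance(J, np.sign(v), u, v)   # 1 - 2 - (-1)(3) = 2
d_vu = bregman_distance(J, np.sign(u), v, u)   # 2 - 1 - (1)(-3) = 4
```

Both values are nonnegative, but $D_J^p(u, v) = 2$ while $D_J^p(v, u) = 4$, so the Bregman distance is not symmetric.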

3 Augmented Lagrangian Method, and Relations to Dual Methods and Split Bregman Iteration

In this section we present the augmented Lagrangian method [15] [21] [22] for the ROF model, or equivalently for the constrained problem (6). The augmented Lagrangian method has many advantages over other methods such as the penalty method [1], and has been successfully applied to nonlinear PDEs and mechanics [13]. We also show that the dual methods and split Bregman iteration can be either deduced from, or shown to be equivalent to, the augmented Lagrangian method.

3.1 Augmented Lagrangian Method

In the augmented Lagrangian method, one solves the constrained optimization problem (6) by

\[ \min_{u, q} \max_\mu L_{rof}(u, q, \mu) = \int_\Omega |q| + \frac{\lambda}{2} \|Ku - f\|^2 + \int_\Omega \mu \cdot (q - \nabla u) + \frac{r}{2} \int_\Omega |q - \nabla u|^2, \tag{17} \]

where $\mu = (\mu_1, \mu_2)^T$ is the Lagrange multiplier and $r$ is a positive constant. That is, the method seeks a saddle point of the augmented Lagrangian functional $L_{rof}(u, q, \mu)$. The system of optimality conditions is thus

\[ \frac{\partial L_{rof}}{\partial u} = \lambda K^*(Ku - f) + \nabla \cdot \mu + r \nabla \cdot (q - \nabla u) = 0, \tag{18} \]
\[ \frac{\partial L_{rof}}{\partial q} = \frac{q}{|q|} + \mu + r (q - \nabla u) = 0, \tag{19} \]
\[ \frac{\partial L_{rof}}{\partial \mu} = q - \nabla u = 0. \tag{20} \]

We now have two ways to solve the problem (17). One is to use optimization techniques to directly minimize or maximize the corresponding functionals; the other is to solve the associated system of optimality conditions. The augmented Lagrangian method uses an iterative procedure to solve (17); see Algorithm 2. The iterative scheme runs until some stopping condition is satisfied.

Algorithm 2. Augmented Lagrangian method for the ROF model
1. Initialization: $u^0 = 0$, $q^0 = 0$, $\mu^0 = 0$;
2. For $k = 0, 1, 2, \ldots$: compute $(u^{k+1}, q^{k+1})$ as a minimizer of the augmented Lagrangian functional for the Lagrange multiplier $\mu^k$, i.e.,

\[ (u^{k+1}, q^{k+1}) = \arg\min_{u, q} L_{rof}(u, q, \mu^k), \tag{21} \]

where $L_{rof}(u, q, \mu^k)$ is defined in Eqn. (17); and update

\[ \mu^{k+1} = \mu^k + r (q^{k+1} - \nabla u^{k+1}). \tag{22} \]

To solve the problem (21), we separate it into the following two sub-problems ([28] [29]):

\[ \arg\min_u \frac{\lambda}{2} \|Ku - f\|^2 - \int_\Omega \mu^k \cdot \nabla u + \frac{r}{2} \int_\Omega |q - \nabla u|^2, \tag{23} \]

for given $q$, and

\[ \arg\min_q \int_\Omega |q| + \int_\Omega \mu^k \cdot q + \frac{r}{2} \int_\Omega |q - \nabla u|^2, \tag{24} \]

for given $u$. Sub-problems (23) and (24) can be solved efficiently. For (23), the optimality condition gives a linear equation

\[ \lambda K^*(Ku - f) + \mathrm{div}\,\mu^k + r\, \mathrm{div}\, q - r \Delta u = 0 \]

for $u$, which allows us to use fast Fourier transforms. Denoting by $\mathcal{F}(u)$ the Fourier transform of $u$, we get $u$ from

\[ u = \mathcal{F}^{-1}\left( \frac{\lambda \mathcal{F}(K^*) \mathcal{F}(f) - \mathcal{F}(\mathrm{div}) \cdot \mathcal{F}(\mu^k) - r \mathcal{F}(\mathrm{div}) \cdot \mathcal{F}(q)}{\lambda \mathcal{F}(K^*) \mathcal{F}(K) - r \mathcal{F}(\Delta)} \right), \tag{25} \]

where applying the Fourier transform to a vector such as $\mathrm{div}$ or $\mu^k$ means applying it to each of its components, and the Fourier transforms of operators such as $K$, $\partial_x$, $\partial_y$, and $\Delta$ are understood as the transforms of their corresponding convolution kernels (for differential operators, the kernels are approximated by kernels of difference operators). For (24), we have the following closed form solution:

\[ q = \begin{cases} \dfrac{1}{r} \left( 1 - \dfrac{1}{|w(x, y)|} \right) w(x, y), & |w(x, y)| > 1, \\ 0, & |w(x, y)| \le 1, \end{cases} \tag{26} \]

where $w = r \nabla u - \mu^k$, since we can reformulate the problem as

\[ \arg\min_q \int_\Omega |rq| + \frac{1}{2} \int_\Omega |rq - (r \nabla u - \mu^k)|^2. \]

Based on these observations, we can use Algorithm 3 to solve (21). Here $N$ can be chosen using some convergence test. In the common augmented Lagrangian method, one usually sets $N = 1$.
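The closed form (26) is a pointwise vector shrinkage and is straightforward to vectorize. The sketch below (hypothetical function name, arrays holding the two components of $\nabla u$ and $\mu^k$) applies it to every pixel at once.

```python
import numpy as np

def shrink_q(ux, uy, mu1, mu2, r):
    """Closed-form minimizer (26) of |q| + mu.q + (r/2)|q - grad u|^2, applied pointwise."""
    w1 = r * ux - mu1
    w2 = r * uy - mu2
    mag = np.sqrt(w1**2 + w2**2)
    # (1 - 1/|w|)/r where |w| > 1, and 0 otherwise; np.maximum avoids division by zero.
    scale = np.where(mag > 1.0, (1.0 - 1.0 / np.maximum(mag, 1.0)) / r, 0.0)
    return scale * w1, scale * w2
```

Because the sub-problem (24) decouples over pixels, no linear system is needed for the $q$ update; its cost is linear in the number of pixels.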

Algorithm 3. Augmented Lagrangian method for the ROF model – solve the sub-problem of Eqn. (21)
1. Initialization: $u^{k+1,0} = u^k$, $q^{k+1,0} = q^k$;
2. For $n = 0, 1, 2, \ldots, N - 1$: compute $u^{k+1,n+1}$ from Eqn. (25) for $q = q^{k+1,n}$; and then compute $q^{k+1,n+1}$ from Eqn. (26) for $u = u^{k+1,n+1}$;
3. $u^{k+1} = u^{k+1,N}$, $q^{k+1} = q^{k+1,N}$.
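Algorithms 2 and 3 can be combined into a compact denoising loop. The following sketch is restricted to $K = I$ and assumes periodic boundary conditions, so that the $u$ sub-problem is diagonalized exactly by the FFT; the parameter defaults and the function name `alm_rof_denoise` are illustrative and are not taken from the paper.

```python
import numpy as np

def grad(u):
    """Periodic forward differences."""
    return np.roll(u, -1, axis=1) - u, np.roll(u, -1, axis=0) - u

def div(q1, q2):
    """Periodic backward differences, the negative adjoint of grad."""
    return (q1 - np.roll(q1, 1, axis=1)) + (q2 - np.roll(q2, 1, axis=0))

def alm_rof_denoise(f, lam=10.0, r=10.0, outer=50, inner=1):
    """Augmented Lagrangian iteration for the ROF model with K = I."""
    rows, cols = f.shape
    ky = 2.0 * np.pi * np.fft.fftfreq(rows)[:, None]
    kx = 2.0 * np.pi * np.fft.fftfreq(cols)[None, :]
    neg_lap = (2.0 - 2.0 * np.cos(kx)) + (2.0 - 2.0 * np.cos(ky))  # symbol of -div(grad)
    denom = lam + r * neg_lap
    u = np.zeros_like(f)
    q1 = np.zeros_like(f)
    q2 = np.zeros_like(f)
    mu1 = np.zeros_like(f)
    mu2 = np.zeros_like(f)
    for _ in range(outer):
        for _ in range(inner):
            # u sub-problem (23): (lam - r*Laplacian) u = lam*f - div(mu) - r*div(q)
            rhs = lam * f - div(mu1, mu2) - r * div(q1, q2)
            u = np.real(np.fft.ifft2(np.fft.fft2(rhs) / denom))
            # q sub-problem (24): pointwise shrinkage (26) with w = r*grad(u) - mu
            ux, uy = grad(u)
            w1 = r * ux - mu1
            w2 = r * uy - mu2
            mag = np.sqrt(w1**2 + w2**2)
            scale = np.where(mag > 1.0, (1.0 - 1.0 / np.maximum(mag, 1.0)) / r, 0.0)
            q1, q2 = scale * w1, scale * w2
        # multiplier update (22)
        mu1 = mu1 + r * (q1 - ux)
        mu2 = mu2 + r * (q2 - uy)
    return u
```

Each outer iteration costs two FFTs plus pointwise operations, and `inner=1` corresponds to the common choice $N = 1$.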

As for the second approach to solving the problem (17), one can use other iterative procedures to solve the corresponding optimality system. In fact, the optimality system naturally leads to CGM and the dual method of Chambolle, as shown in the following.

3.2 Relations between Augmented Lagrangian Method and Dual Methods as Well as Split Bregman Iteration

In this sub-section we show that CGM and Chambolle’s dual methods for the ROF model can be deduced naturally from the augmented Lagrangian method. This is a much simpler derivation of the dual methods. Also split Bregman iteration is demonstrated to be equivalent to Algorithm 2. Connection to CGM Dual Method. We ﬁrst show that CGM dual method can be deduced from the augmented Lagrangian method. The optimality conditions for the augmented Lagrangian approach are given in (18)–(20). From Eqn. (20), we get q = ∇u. Combining this with (19), we see that μ=−

∇u . |∇u|

(27)

Therefore, the dual variable in CGM dual method is nothing but the Lagrange multiplier μ with a diﬀerent sign. Hence, the system of optimality conditions (18)–(20) is equivalent to ∇ · μ + λK ∗ (Ku − f ) = 0 , ∇u + μ|∇u| = 0

(28)

which is just the primal-dual system of CGM dual method if one replaces −μ with ω. Connection to Chambolle’s Dual Method. We now further derive Chambolle’s dual method. From the ﬁrst equation of (28), we get u as: u = (λK ∗ K)−1 (λK ∗ f − divμ),

(29)

yielding the equation for the dual variable ∇((K ∗ K)−1 (λK ∗ f − divμ)) + |∇((K ∗ K)−1 (λK ∗ f − divμ))|μ = 0.

(30)

For image denoising problems where $K = I$, (30) and (29) are just the equations used by Chambolle in [6] to solve for the dual variable and to recover the primal variable $u$, respectively. In [6], the equation (30) for the dual variable was obtained through a not well-known KKT condition for inequality-constrained optimization problems, whereas here we deduce this equation very naturally from the augmented Lagrangian method. This generic formulation is not discussed in [6]. We also point out that some connections between the CGM and Chambolle dual methods have been noticed in [32].

Connection to Split Bregman Iteration. The split Bregman iteration is actually equivalent to the augmented Lagrangian method. Considering the zero initialization for the sub-gradients and the Lagrange multiplier, and letting

$$(p_u^k, p_q^k) = -(\operatorname{div}\mu^k, \mu^k) \tag{31}$$

for each $k$, we have

$$
\begin{aligned}
(u^{k+1}, q^{k+1})
&= \arg\min_{u,q}\; D_{G_{rof}}^{(p_u^k,\,p_q^k)}\big((u,q),(u^k,q^k)\big) + \frac{r}{2}\int_\Omega |q - \nabla u|^2 \\
&= \arg\min_{u,q}\; \int_\Omega |q| + \frac{\lambda}{2}\|Ku - f\|^2 + \int_\Omega u\,\operatorname{div}\mu^k + \int_\Omega \mu^k\cdot q + \frac{r}{2}\int_\Omega |q - \nabla u|^2 \\
&= \arg\min_{u,q}\; \int_\Omega |q| + \frac{\lambda}{2}\|Ku - f\|^2 - \int_\Omega \mu^k\cdot\nabla u + \int_\Omega \mu^k\cdot q + \frac{r}{2}\int_\Omega |q - \nabla u|^2 \\
&= \arg\min_{u,q}\; L_{rof}(u, q, \mu^k),
\end{aligned}
$$

indicating the equivalence between split Bregman iteration and the iterative procedure for the augmented Lagrangian method. In the context of compressive sensing, this equivalence has been pointed out in [30].
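The iteration behind this equivalence can be sketched in a few lines. The code below is a minimal illustration only, not the authors' implementation: it assumes a 1-D signal, $K = I$, a forward-difference matrix `D` in place of $\nabla$, and a dense linear solve instead of the FFT-based solver; the names `rof_alm_1d` and `rof_energy_1d` and the parameter values are hypothetical. Each sweep minimizes the augmented Lagrangian in $u$, then in $q$ (closed-form shrinkage), then updates the multiplier $\mu$.

```python
import numpy as np

def rof_alm_1d(f, lam=5.0, r=2.0, iters=300):
    """Sketch of an augmented Lagrangian iteration for 1-D ROF denoising (K = I).

    Minimises sum|q| + lam/2 * ||u - f||^2 subject to q = D u.
    """
    n = f.size
    D = np.diff(np.eye(n), axis=0)       # (n-1) x n, (D u)_i = u[i+1] - u[i]
    A = lam * np.eye(n) + r * D.T @ D    # matrix of the quadratic u-subproblem
    u = f.astype(float).copy()
    q = D @ u
    mu = np.zeros(n - 1)
    for _ in range(iters):
        # u-subproblem: lam (u - f) - D^T mu - r D^T (q - D u) = 0
        u = np.linalg.solve(A, lam * f + D.T @ (mu + r * q))
        # q-subproblem: soft shrinkage of w = D u - mu / r with threshold 1/r
        w = D @ u - mu / r
        q = np.sign(w) * np.maximum(np.abs(w) - 1.0 / r, 0.0)
        # multiplier update
        mu = mu + r * (q - D @ u)
    return u, q, mu

def rof_energy_1d(u, f, lam):
    """Discrete 1-D ROF energy: total variation plus weighted fidelity."""
    return np.abs(np.diff(u)).sum() + 0.5 * lam * np.sum((u - f) ** 2)

# Hypothetical test signal: a noisy step.
rng = np.random.default_rng(0)
noisy = np.repeat([0.0, 1.0], 20) + 0.1 * rng.standard_normal(40)
u, q, mu = rof_alm_1d(noisy)
```

In this sketch the multiplier stays in $[-1, 1]$ after every update and equals $-\operatorname{sign}(q)$ wherever $q \neq 0$, which is the 1-D counterpart of relation (27).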

[Figure 1 panels: Original (SNR: Inf dB); Blurry and noisy (SNR: 6.30 dB); deconvwnr (SNR: 11.29 dB, t = 0.08 s); deconvreg (SNR: 11.17 dB, t = 0.36 s); deconvlucy (SNR: 9.29 dB, t = 1.31 s); ALM, r = 10 (SNR: 12.99 dB, t = 0.86 s)]

Fig. 1. Augmented Lagrangian method for ROF restoration, and comparisons to built-in Matlab functions. In the sub-figures, SNR and t denote the signal-to-noise ratio and the CPU time usage, respectively.

[Figure 2 panels: FTVd, r0 = 1, SF = 2, r = 256 (SNR: 12.62 dB, t = 1.09 s); ALM, r0 = 1, SF = 2, r = 128 (SNR: 12.52 dB, t = 0.75 s); ALM, r0 = 1, SF = 1.70, r = 69.758 (SNR: 12.71 dB, t = 0.80 s)]

Fig. 2. Comparisons between the FTVd package (splitting-and-penalty) and the augmented Lagrangian method with increasing penalty parameters for ROF restoration. In the sub-figures, r0, SF and r stand for the initial value, the scaling factor and the final value of the penalty parameter of the methods, respectively. Here, SNR and t denote the signal-to-noise ratio and the CPU time usage, respectively.

3.3 Remark

We want to emphasize that our observations can be extended to many other models, including anisotropic TV, high-order nonlinear PDE filters (e.g. fourth-order models), vectorial TV, and even more general models. Similarly, we can use FFT-based fast solvers and closed-form solutions to solve the sub-problems for the corresponding algorithms. In addition, one can also naturally derive the dual methods [12] [26] [4] from the system of optimality conditions of the augmented Lagrangian functionals for these models. Furthermore, the equivalence between split Bregman iteration and the augmented Lagrangian method is also valid for these models. More details will be given in a forthcoming paper.

4 Examples

Two numerical examples are provided in Fig. 1 and Fig. 2 to illustrate the accuracy and efficiency of our method. In Fig. 1 we compare our method with some built-in Matlab functions, i.e. deconvwnr.m, deconvreg.m and deconvlucy.m. As one can see, our method generates a much better restoration than these built-in Matlab functions at a comparable (or even lower) CPU time cost. In Fig. 2 we also compare our method (with increasing parameter r) with the recently developed FTVd package based on pure splitting-and-penalty, which is one of the most efficient approaches compared to other existing methods, as discussed in [29]. From Fig. 1 and Fig. 2 one can also compare FTVd with our method with fixed parameter r.
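The text does not spell out the SNR formula behind the figures. One common convention, used here purely as an assumption, measures the variance of the reference image against the squared error of the estimate, in decibels; the helper name `snr_db` is hypothetical. With this convention an exact reconstruction gives an infinite SNR, which would match the "Inf dB" label of the original image.

```python
import numpy as np

def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB (assumed convention, see lead-in)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    # Energy of the mean-removed reference signal.
    signal = np.sum((reference - reference.mean()) ** 2)
    # Energy of the reconstruction error.
    noise = np.sum((reference - estimate) ** 2)
    if noise == 0.0:
        return float("inf")
    return 10.0 * np.log10(signal / noise)
```

A restored image would then be scored with `snr_db(original, restored)`; other SNR definitions differ from this one by how the signal energy is normalized.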

5 Conclusion

In this paper we use an approach based on the augmented Lagrangian method for the ROF model. The algorithm benefits from FFT-based fast solvers and closed-form solutions. We also show that our method gives a uniform framework for understanding the approaches currently regarded as particularly efficient for the ROF model, such as dual methods and split Bregman iteration. The CGM and Chambolle dual methods are different iterative schemes for solving the augmented Lagrangian system, and the dual variables in these methods are nothing but the Lagrange multiplier. Split Bregman iteration is actually equivalent to the augmented Lagrangian method. Numerical examples demonstrate the accuracy and efficiency of our approach. The method can be extended to many other restoration models.

Acknowledgements This research has been supported by MOE (Ministry of Education) Tier II project T207N2202 and IDM project NRF2007IDM-IDM002-010. Support from SUG 20/07 is also gratefully acknowledged.

References

1. Bertsekas, D.P.: Multiplier methods: a survey. Automatica 12, 133–145 (1976)
2. Blomgren, P., Chan, T.F.: Color TV: total variation methods for restoration of vector-valued images. IEEE Trans. Image Process. 7, 304–309 (1998)
3. Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics 7, 200–217 (1967)
4. Bresson, X., Chan, T.F.: Fast minimization of the vectorial total variation norm and applications to color image processing. UCLA CAM Report 07-25 (2007)
5. Carter, J.L.: Dual methods for total variation-based image restoration. Ph.D. thesis, UCLA (2001)
6. Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging Vis. 20, 89–97 (2004)
7. Chan, T.F., Golub, G.H., Mulet, P.: A nonlinear primal-dual method for total variation-based image restoration. SIAM J. Sci. Comput. 20, 1964–1977 (1999)
8. Chan, T., Marquina, A., Mulet, P.: High-order total variation-based image restoration. SIAM J. Sci. Comput. 22, 503–516 (2000)
9. Chan, T.F., Osher, S., Shen, J.: The digital TV filter and nonlinear denoising. IEEE Trans. Image Process. 10, 231–241 (2001)
10. Chan, T.F., Kang, S.H., Shen, J.H.: Total variation denoising and enhancement of color images based on the CB and HSV color models. J. Visual Commun. Image Repres. 12, 422–435 (2001)
11. Chan, T., Esedoglu, S., Park, F.E., Yip, A.: Recent developments in total variation image restoration. UCLA CAM Report 05-01 (2005)
12. Chan, T.F., Esedoglu, S., Park, F.E.: A fourth order dual method for staircase reduction in texture extraction and image restoration problems. UCLA CAM Report 05-28 (2005)
13. Glowinski, R., Le Tallec, P.: Augmented Lagrangians and operator-splitting methods in nonlinear mechanics. SIAM, Philadelphia (1989)
14. Goldstein, T., Osher, S.: The split Bregman method for L1 regularized problems. UCLA CAM Report 08-29 (2008)
15. Hestenes, M.R.: Multiplier and gradient methods. Journal of Optimization Theory and Applications 4, 303–320 (1969)
16. Hinterberger, W., Scherzer, O.: Variational methods on the space of functions of bounded Hessian for convexification and denoising. Computing 76, 109–133 (2006)
17. Kimmel, R., Malladi, R., Sochen, N.: Images as embedded maps and minimal surfaces: movies, color, texture, and volumetric medical images. Int'l J. Computer Vision 39, 111–129 (2000)
18. Lysaker, M., Lundervold, A., Tai, X.-C.: Noise removal using fourth-order partial differential equation with applications to medical Magnetic Resonance Images in space and time. IEEE Trans. Image Process. 12, 1579–1590 (2003)
19. Lysaker, M., Tai, X.-C.: Iterative image restoration combining total variation minimization and a second order functional. Int'l J. Computer Vision 66, 5–18 (2006)
20. Osher, S., Burger, M., Goldfarb, D., Xu, J.J., Yin, W.T.: An iterative regularization method for total variation-based image restoration. SIAM Multiscale Model. Simul. 4, 460–489 (2005)
21. Powell, M.J.D.: A method for nonlinear constraints in minimization problems. In: Fletcher, R. (ed.) Optimization, pp. 283–298. Academic Press, New York (1972)
22. Rockafellar, R.T.: A dual approach to solving nonlinear programming problems by unconstrained optimization. Mathematical Programming 5, 354–373 (1973)
23. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
24. Sapiro, G., Ringach, D.L.: Anisotropic diffusion of multivalued images with applications to color filtering. IEEE Trans. Image Process. 5, 1582–1586 (1996)
25. Scherzer, O.: Denoising with higher order derivatives of bounded variation and an application to parameter estimation. Computing 60, 1–27 (1998)
26. Steidl, G.: A note on the dual treatment of higher-order regularization functionals. Computing 76, 135–148 (2006)
27. Tschumperlé, D., Deriche, R.: Vector-valued image regularization with PDEs: a common framework for different applications. IEEE Trans. Pattern Anal. Machine Intell. 27, 506–517 (2005)
28. Wang, Y.L., Yin, W.T., Zhang, Y.: A fast algorithm for image deblurring with total variation regularization. UCLA CAM Report 07-22 (2007)
29. Wang, Y., Yang, J., Yin, W., Zhang, Y.: A new alternating minimization algorithm for total variation image reconstruction. SIAM Journal on Imaging Sciences (to appear)
30. Yin, W.T., Osher, S., Goldfarb, D., Darbon, J.: Bregman iterative algorithms for compressed sensing and related problems. SIAM J. Imaging Sciences 1, 143–168 (2008)
31. You, Y.-L., Kaveh, M.: Fourth-order partial differential equation for noise removal. IEEE Trans. Image Process. 9, 1723–1730 (2000)
32. Zhu, M., Wright, S.J., Chan, T.F.: Duality-based algorithms for total variation image restoration. UCLA CAM Report 08-33 (2008)

The Convergence of a Central-Difference Discretization of Rudin-Osher-Fatemi Model for Image Denoising

Ming-Jun Lai (1), Bradley Lucier (2), and Jingyue Wang (3)

(1) University of Georgia, Athens GA 30602, USA [email protected]
(2) Purdue University, West Lafayette IN 47907, USA [email protected]
(3) University of Georgia, Athens GA 30602, USA [email protected]

Abstract. We study the connection between minimizers of the discrete and the continuous Rudin-Osher-Fatemi models. We use a central-difference total variation term in the discrete ROF model and treat the discrete input data as a projection of the continuous input data into the discrete space. We employ a method developed in [13], with a slight adaptation to the setting of the central-difference total variation ROF model. We obtain an error bound between the discrete and the continuous minimizer in the $L^2$ norm under the assumption that the continuous input data are in $W^{1,2}$.

1 Introduction

One of the most influential variational models for image denoising is the total variation-based model proposed by Rudin, Osher and Fatemi (ROF) [10]. This model studies the following constrained minimization problem:

$$\arg\min_u |u|_{BV} \quad\text{with}\quad \int_\Omega u = \int_\Omega g \quad\text{and}\quad \int_\Omega |u - g|^2 = \sigma^2, \tag{1}$$

where $g$ is the input data, $\sigma$ is the standard deviation of the noise, $\Omega$ is the unit square $[0,1]^2$, and $|u|_{BV}$ is the total variation (TV) of $u$ defined as follows. We consider functions $\phi$ in the space of $C^1$ functions from $\Omega$ to $\mathbb{R}^2$ with compact support, i.e., $[C_0^1(\Omega)]^2$. The variation of a function $u \in L^1(\Omega)$ is then defined to be

$$|u|_{BV} := \int_\Omega |Du| := \sup_{\substack{\phi\in[C_0^1(\Omega)]^2 \\ |\phi|\le 1 \text{ point-wise}}} \int_\Omega u\,\nabla\cdot\phi.$$

For more details on functions of bounded variation, we refer the reader to [9].

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 514–526, 2009. © Springer-Verlag Berlin Heidelberg 2009

The existence and uniqueness of the minimizer of (1) have been studied by Lions, Osher and Rudin [11] and more completely by Acar and Vogel [1]. Chambolle and Lions [4] proved that the constrained problem (1) is equivalent to the following unconstrained problem:

$$\arg\min_u\; |u|_{BV} + \frac{1}{2\lambda}\int_\Omega |u - g|^2. \tag{2}$$

They also proved more general results of existence and uniqueness of (1). We later call

$$E(u) = |u|_{BV} + \frac{1}{2\lambda}\int_\Omega |u - g|^2 \tag{3}$$

the ROF energy functional. On the computing side, the most commonly used discrete variational model is based on the discrete energy

$$E_k(u) = \sum_{i,j=0}^{k-1} \mu_{i,j}\,|(\nabla u)_{i,j}| + \frac{1}{2\lambda}\sum_{i,j=0}^{k-1} \mu_{i,j}\,(u_{i,j} - g_{i,j})^2, \tag{4}$$

where $u$ is defined by a 2-dimensional matrix of size $k \times k$ and $\mu_{i,j}$ is related to the scale $k$. A simple choice of $\mu_{i,j}$ is $\mu_{i,j} = 1/k^2$. There are several possible choices for the discrete gradient operator $\nabla u$ [3], [5], and [13]. A common choice is $(\nabla u)_{i,j} = ((\nabla_x u)_{i,j}, (\nabla_y u)_{i,j})$, with

$$(\nabla_x u)_{i,j} = \frac{u_{i+1,j} - u_{i,j}}{h}, \qquad (\nabla_y u)_{i,j} = \frac{u_{i,j+1} - u_{i,j}}{h},$$

where $h = 1/k$. On the boundary, $u$ is assumed to satisfy the discrete Neumann boundary conditions:

$$u_{-1,j} = u_{0,j}, \qquad u_{k,j} = u_{k-1,j}, \tag{5}$$
$$u_{i,-1} = u_{i,0}, \qquad u_{i,k} = u_{i,k-1}. \tag{6}$$
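To make the discrete energy (4) concrete, the following is a minimal NumPy sketch under the simple choice $\mu_{i,j} = 1/k^2$ with forward differences; the function name `discrete_rof_energy` is hypothetical, and the Neumann conditions (5)–(6) are imitated by replicating the last row and column so that the forward difference vanishes there.

```python
import numpy as np

def discrete_rof_energy(u, g, lam):
    """Discrete energy (4) with forward differences and mu_ij = 1/k^2."""
    k = u.shape[0]
    h = 1.0 / k
    # Replicate the last row and column: u_{k,j} = u_{k-1,j}, u_{i,k} = u_{i,k-1}.
    up = np.pad(u, ((0, 1), (0, 1)), mode="edge")
    dx = (up[1:, :-1] - up[:-1, :-1]) / h   # (u_{i+1,j} - u_{i,j}) / h
    dy = (up[:-1, 1:] - up[:-1, :-1]) / h   # (u_{i,j+1} - u_{i,j}) / h
    mu = 1.0 / k**2
    tv = mu * np.sqrt(dx**2 + dy**2).sum()
    fidelity = mu * ((u - g) ** 2).sum() / (2.0 * lam)
    return tv + fidelity

# Example: a linear ramp u_ij = i*h on a 4 x 4 grid, with g = u.
k = 4
ramp = np.tile((np.arange(k) / k)[:, None], (1, k))
ramp_energy = discrete_rof_energy(ramp, ramp, lam=1.0)
```

For the ramp the fidelity term vanishes and the gradient has unit length everywhere except on the last row, so the energy is $(k-1)/k$.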

The discrete function $g_{i,j}$ is the input image. Many efficient algorithms have been developed to find the numerical minimizer of (4) [6], [2], [3]. It is not hard to show that $E_k$ Γ-converges to $E$ (for the definition of Γ-convergence, we refer the reader to [7]); therefore, the sequence $\{u^k\}$ of minimizers of $E_k$ converges to $u$, the minimizer of $E$, in $L^1(\Omega)$, and $E_k(u^k)$ converges to $E(u)$ as $k$ tends to $\infty$ (cf. [7]). It is interesting to know the rate of convergence and whether convergence holds in other norms, e.g., in the $L^2$ norm. It is also interesting to see the difference between the continuous minimizer and the discrete minimizer. The authors in [13] proved that if the discrete energy $E_k$ is equipped with a symmetrical discrete total variation as defined in (7), and the discrete input data $g^k$ is the projection of the continuous input data $g$ obtained by averaging $g$ on each pixel, one can bound the error between the discrete minimizer $u^k$ and the continuous minimizer $u$ in the $L^2$ norm by the Lipschitz norm of $g$, provided that $g$ is in some Lipschitz space.

$$
\begin{aligned}
\|u^k\|_{TV} = \frac{h^2}{4}\sum_{i,j=0}^{k-1}\Bigg[
&\left(\Big(\frac{u^k_{i+1,j} - u^k_{i,j}}{h}\Big)^2 + \Big(\frac{u^k_{i,j+1} - u^k_{i,j}}{h}\Big)^2\right)^{1/2}
+ \left(\Big(\frac{u^k_{i+1,j} - u^k_{i,j}}{h}\Big)^2 + \Big(\frac{u^k_{i,j} - u^k_{i,j-1}}{h}\Big)^2\right)^{1/2} \\
+ &\left(\Big(\frac{u^k_{i,j} - u^k_{i-1,j}}{h}\Big)^2 + \Big(\frac{u^k_{i,j+1} - u^k_{i,j}}{h}\Big)^2\right)^{1/2}
+ \left(\Big(\frac{u^k_{i,j} - u^k_{i-1,j}}{h}\Big)^2 + \Big(\frac{u^k_{i,j} - u^k_{i,j-1}}{h}\Big)^2\right)^{1/2}
\Bigg]
\end{aligned}
\tag{7}
$$

In this paper, we extend the study in [13], [12] to the discrete ROF model equipped with a central-difference TV term, which is much simpler than the symmetrical discrete TV term. The ideas for the study in this paper are exactly the same as those in [13]. However, a problem of the central-difference model is that it does not deal well with some non-smooth data, for example a chessboard image. Thus we have to adapt the study in [13] slightly to this situation and put a stronger assumption on the input data $g$ in order to establish the convergence. We can still get a similar error bound if the input data $g \in W^{1,2}$. More precisely, our main results are the following.

Theorem 1. If $g \in W^{1,2}$, $u$ is the minimizer of $E$ in (3) and $u^k$ is the minimizer of $E_k$ in (4) equipped with the central-difference TV operator (whose definition we give in (10)), then

$$|E(u) - E_k(u^k)| \le C\Big(1 + \frac{1}{\lambda}\Big)\big(\|g\|_{W^{1,2}} + \|g\|^2_{W^{1,2}}\big)h^{1/2}.$$

Theorem 2. If $g \in W^{1,2}$, $u$ is the minimizer of the functional $E$ in (3) and $u^k$ is the minimizer of the functional $E_k$ in (10), then

$$\|I_h u^k - u\|^2 \le C(\lambda + 1)\big(\|g\|_{W^{1,2}} + \|g\|^2_{W^{1,2}}\big)h^{1/2},$$

where $I_h u^k$ is the piecewise constant injection of $u^k$ into the $L^2$ space. The definition of $I_h u^k$ will be given in (14) in the next section.

2 Preliminaries

A continuous image $u$ is defined as an $L^2$ function on $\Omega \subset \mathbb{R}^2$. In practice, we always assume $\Omega$ to be the unit square $[0,1] \times [0,1]$. We assume the denoised output image to be in the space of functions of bounded variation. In the discrete setting, we consider the discrete set $\Omega^k$ to be the set of all pairs $i = (i_1, i_2) \in \mathbb{Z}^2$ with $0 \le i_1, i_2 \le k$. A discrete image $u^k$ is defined as a function on $\Omega^k$. Throughout this paper, we always use superscripts to indicate that a function is a discrete function. For discrete functions, we define the discrete $\ell^p(\Omega^k)$ norms

$$\|u^k\|_{\ell^p(\Omega^k)} := \Big(\sum_{i\in\Omega^k} |u^k_i|^p\,\mu_i\Big)^{1/p} \quad\text{for } 1 \le p < \infty \text{ (with the usual supremum for } p = \infty\text{)},$$

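where $\mu_i$ is the measure of the discrete space at each index $i$. The simplest choice of $\mu_i$ is $\mu_i = 1$ for $i \in \Omega^k$. In analogy with the Sobolev norm, we define the discrete Sobolev norm as follows. The first-order forward finite differences of $u^k$ at the point $i = (i_1, i_2)$ are

$$\Delta_x^+ u^k_i = \frac{u^k_{i_1+1,i_2} - u^k_{i_1,i_2}}{h}; \qquad \Delta_y^+ u^k_i = \frac{u^k_{i_1,i_2+1} - u^k_{i_1,i_2}}{h},$$

where $h = 1/k$ is the step size. We can also define the backward finite differences as

$$\Delta_x^- u^k_i = \frac{u^k_{i_1,i_2} - u^k_{i_1-1,i_2}}{h}; \qquad \Delta_y^- u^k_i = \frac{u^k_{i_1,i_2} - u^k_{i_1,i_2-1}}{h}.$$

One can define the second-order finite difference as

$$\Delta_{xx} u^k_i = \frac{\Delta_x^+ u^k_i - \Delta_x^- u^k_i}{h}.$$

Also $\Delta_{yy} u^k_i$ can be similarly defined. We define $\|\nabla u^k\|_{\ell^1}$, $\|\Delta_{xx} u^k\|_{\ell^1}$, $\|\Delta_{yy} u^k\|_{\ell^1}$ as

$$\|\nabla u^k\|_{\ell^1} := \sum_i \big(|\Delta_x^+ u^k_i| + |\Delta_y^+ u^k_i|\big)\mu_i; \tag{8}$$

$$\|\Delta_{xx} u^k\|_{\ell^1} := \sum_i |\Delta_{xx} u^k_i|\,\mu_i, \qquad \|\Delta_{yy} u^k\|_{\ell^1} := \sum_i |\Delta_{yy} u^k_i|\,\mu_i. \tag{9}$$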

In this paper, we shall study the error bound for the central-difference discrete ROF model, whose energy functional is defined as follows:

$$E_c(u^k) = J_c(u^k) + \frac{1}{2\lambda}\|u^k - g^k\|_c^2, \tag{10}$$

where the BV term $J_c$ is defined by

$$J_c(u^k) := \sum_{i\in\Omega^k} \sqrt{|\Delta_x^c u^k_i|^2 + |\Delta_y^c u^k_i|^2}\;\mu_i, \tag{11}$$

and $\Delta_x^c u^k_i$ and $\Delta_y^c u^k_i$ at $i := (i_1, i_2)$ are defined by

$$\Delta_x^c u^k_i = \frac{u^k_{i_1+1,i_2} - u^k_{i_1-1,i_2}}{2h}, \qquad \Delta_y^c u^k_i = \frac{u^k_{i_1,i_2+1} - u^k_{i_1,i_2-1}}{2h}.$$

Here $u^k$ satisfies the discrete Neumann boundary condition:

$$u^k_{-1,j} = u^k_{1,j}, \qquad u^k_{k+1,j} = u^k_{k-1,j}, \qquad u^k_{i,-1} = u^k_{i,1}, \qquad u^k_{i,k+1} = u^k_{i,k-1}.$$

The discrete space measure is $\mu_i = |\Omega_i|$, where $\Omega_i$ is the intersection of $\Omega$ and the square with center $ih$ and size $h$:

$$\Omega_i := \Omega \cap \Big[i_1 h - \frac{h}{2},\, i_1 h + \frac{h}{2}\Big] \times \Big[i_2 h - \frac{h}{2},\, i_2 h + \frac{h}{2}\Big]. \tag{12}$$

It is straightforward to calculate

$$\mu_i = \begin{cases} h^2/4, & (i_1, i_2) \in \{(0,0), (0,k), (k,0), (k,k)\}, \\ h^2/2, & i_1 = 0, k;\ 0 < i_2 < k \ \text{ or } \ i_2 = 0, k;\ 0 < i_1 < k, \\ h^2, & 0 < i_1, i_2 < k. \end{cases} \tag{13}$$

The $\ell^2$ term is defined by

$$\|u^k - g^k\|_c^2 = \sum_{i,j=0}^{k} |u^k_{i,j} - g^k_{i,j}|^2\,\mu_{i,j}.$$
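The central-difference TV term and its weights can be sketched as follows. This is an illustration only: the names `cell_weights`, `central_tv` and `forward_tv` are hypothetical, the square root in the integrand is assumed (isotropic TV), and the Neumann condition $u^k_{-1,j} = u^k_{1,j}$, $u^k_{k+1,j} = u^k_{k-1,j}$ is realized with NumPy's `reflect` padding. The example also shows the weakness mentioned in the introduction: a chessboard image has zero central-difference TV, while a forward-difference TV (included only for comparison) does see it.

```python
import numpy as np

def cell_weights(k):
    """Weights mu_i = |Omega_i| from (13) on the (k+1) x (k+1) grid."""
    h = 1.0 / k
    w = np.full(k + 1, h)
    w[0] = w[-1] = h / 2          # half cells on the boundary
    return np.outer(w, w)         # corners get h^2/4, edges h^2/2, interior h^2

def central_tv(u):
    """Central-difference TV term J_c of (11) with reflected boundary values."""
    k = u.shape[0] - 1
    h = 1.0 / k
    up = np.pad(u, 1, mode="reflect")            # u_{-1} = u_1, u_{k+1} = u_{k-1}
    dx = (up[2:, 1:-1] - up[:-2, 1:-1]) / (2 * h)
    dy = (up[1:-1, 2:] - up[1:-1, :-2]) / (2 * h)
    return np.sum(np.sqrt(dx**2 + dy**2) * cell_weights(k))

def forward_tv(u):
    """Forward-difference TV on the same grid, for comparison only."""
    k = u.shape[0] - 1
    h = 1.0 / k
    dx = np.diff(u, axis=0, append=u[-1:, :]) / h
    dy = np.diff(u, axis=1, append=u[:, -1:]) / h
    return np.sum(np.sqrt(dx**2 + dy**2) * cell_weights(k))

k = 4
i, j = np.indices((k + 1, k + 1))
chessboard = ((i + j) % 2).astype(float)
ramp = i / k                                     # u_i = i_1 * h
```

Here `central_tv(chessboard)` is exactly zero because $u_{i_1+1,i_2}$ and $u_{i_1-1,i_2}$ always have the same colour, whereas `forward_tv(chessboard)` is large; for the ramp, `central_tv` gives $(k-1)/k$.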

We often need to extend $u \in L^p(\Omega)$ and $u^k \in \ell^p(\Omega^k)$ to all of $\mathbb{R}^2$ and $\mathbb{Z}^2$, respectively; we denote the extensions by $\mathrm{Ext}\,u$ and $\mathrm{Ext}^k u^k$. For $u \in L^p(\Omega)$, we use the following procedure. First,

$$\mathrm{Ext}\,u(x) = u(x), \qquad x \in \Omega.$$

We then reflect horizontally across the line $x_1 = 1$,

$$\mathrm{Ext}\,u(x_1, x_2) = \mathrm{Ext}\,u(2 - x_1, x_2), \qquad 1 \le x_1 \le 2,\ 0 \le x_2 \le 1,$$

and reflect again vertically across the line $x_2 = 1$,

$$\mathrm{Ext}\,u(x_1, x_2) = \mathrm{Ext}\,u(x_1, 2 - x_2), \qquad 0 \le x_1 \le 2,\ 1 \le x_2 \le 2.$$

Having defined $\mathrm{Ext}\,u$ on $2\Omega$, we then extend $\mathrm{Ext}\,u$ periodically with period $(2, 2)$ on all of $\mathbb{R}^2$. We use a similar construction for discrete functions $u^k$. First we extend $u^k$ to $2\Omega^k := \{i = (i_1, i_2) \in \mathbb{Z}^2 \mid 0 \le i_1, i_2 \le 2k\}$ as follows:

$$\mathrm{Ext}^k u^k_i = u^k_i, \qquad i \in \Omega^k;$$

then we reflect horizontally,

$$\mathrm{Ext}^k u^k_{(i_1,i_2)} = \mathrm{Ext}^k u^k_{(2k - i_1,\, i_2)}, \qquad k + 1 \le i_1 \le 2k,\ 0 \le i_2 \le k,$$

and then vertically,

$$\mathrm{Ext}^k u^k_{(i_1,i_2)} = \mathrm{Ext}^k u^k_{(i_1,\, 2k - i_2)}, \qquad 0 \le i_1 \le 2k,\ k + 1 \le i_2 \le 2k.$$

Now that $\mathrm{Ext}^k u^k$ is defined on $2\Omega^k$, we extend it periodically with period $(2k, 2k)$ to all of $\mathbb{Z}^2$. Note that with this definition, the value of $\mathrm{Ext}^k u^k$ at any point immediately "outside" $\Omega^k$ is the same as the value of $u^k$ at the closest point "inside" $\Omega^k$.

We sometimes need to inject or project functions into $L^2(\Omega)$ or the discrete space $\ell^2(\Omega^k)$, respectively. We use the piecewise constant injector to inject a discrete function $u^k$ into $L^p(\Omega)$:

$$(I_h u^k)(x) = u^k_i \quad\text{for } x \in \Omega_i. \tag{14}$$

We also define an injector $L_h$ into a space of continuous, piecewise linear functions. In fact, $L_h$ is the linear interpolation of the discrete points $\{u^k_i\}$ on a triangulation with vertices $h\mathbb{Z}^2$:

$$L_h u^k = \sum_{i\in\Omega^k} u^k_i\,\phi^k_i. \tag{15}$$

Here $\phi^k_i$ is a dilated and translated tent function,

$$\phi^k_i(x) := \phi^k_{i_1,i_2}(x_1, x_2) := \phi(x_1/h - i_1,\ x_2/h - i_2), \tag{16}$$

where $\phi$ is the tent function which is continuous on $\mathbb{R}^2$, supported in the hexagon shown in Fig. 1, linear on each triangle as shown in Fig. 1, and satisfies

$$\phi(i_1, i_2) = \begin{cases} 0, & (i_1, i_2) \in \mathbb{Z}^2 \setminus (0,0), \\ 1, & (i_1, i_2) = (0,0). \end{cases}$$

[Figure 1: hexagon with labelled points (−1, 1), (0, 1), (−1, 0), (0, 0), (1, 0), (0, −1), (1, −1)]

Fig. 1. The support of $\phi$

We also consider the piecewise constant projector of $u \in L^1(\Omega)$ onto the space of discrete functions, defined by

$$(P_k u)_i = \frac{1}{|\Omega_i|}\int_{\Omega_i} u, \qquad i \in \Omega^k,$$

where $|\Omega_i| = \mu_i$ is the measure of $\Omega_i$ defined in (12).

We need both continuous and discrete smoothing operators, which we define as follows. Assume that $\eta(x)$ is a fixed non-negative, rotationally symmetric mollifier with support in the unit disk that is $C^\infty$ and has integral 1. For $\varepsilon > 0$ we define the scaled function

$$\eta_\varepsilon(x) := \frac{1}{\varepsilon^2}\,\eta\Big(\frac{x}{\varepsilon}\Big), \qquad x \in \mathbb{R}^2;$$

we smooth a function $u \in L^p(\Omega)$, $1 \le p \le \infty$, by computing

$$(\eta_\varepsilon * \mathrm{Ext}\,u)(x) = \int_{\mathbb{R}^2} \eta_\varepsilon(x - y)\,\mathrm{Ext}\,u(y)\,dy, \qquad x \in 2\Omega.$$

The discrete smoothing operator $S_L$ is defined by

$$(S_L u^k)_i = \frac{1}{(2L+1)^2}\sum_{j_1,j_2=-L}^{L} u^k_{i+(j_1,j_2)} \quad\text{for } i \in \Omega^k.$$
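The discrete smoothing operator $S_L$ is a box average over a $(2L+1) \times (2L+1)$ window, with values outside $\Omega^k$ taken from the reflected extension. A minimal sketch, assuming $L \le k$ so that a single reflection suffices (NumPy's `reflect` padding reproduces $u_{k+j} = u_{k-j}$ and $u_{-j} = u_j$); the name `box_smooth` is hypothetical:

```python
import numpy as np

def box_smooth(u, L):
    """Box average S_L of a grid function u, using the reflected extension."""
    n1, n2 = u.shape
    up = np.pad(u, L, mode="reflect")       # reflected values outside the grid
    out = np.zeros_like(u, dtype=float)
    # Sum the (2L+1)^2 shifted copies of the extended array.
    for j1 in range(2 * L + 1):
        for j2 in range(2 * L + 1):
            out += up[j1:j1 + n1, j2:j2 + n2]
    return out / (2 * L + 1) ** 2

spike = np.zeros((7, 7))
spike[3, 3] = 1.0
smoothed = box_smooth(spike, 1)
```

An interior unit spike is spread evenly over its $3 \times 3$ neighbourhood, and constant images are left unchanged.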

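For $u \in L^p(\Omega)$ we define the (first-order) $L^p(\Omega)$ modulus of smoothness by

$$\omega(u, t)_{L^p(\Omega)} = \sup_{\tau\in\mathbb{R}^2,\ |\tau| < t}\Big(\int_{x,\,x+\tau\in\Omega} |u(x + \tau) - u(x)|^p\,dx\Big)^{1/p}.$$

We also define

$$\omega(\mathrm{Ext}\,u, t)_{L^p(2\Omega)} := \sup_{\tau\in\mathbb{R}^2,\ |\tau| < t} \|\mathrm{Ext}\,u(\cdot + \tau) - \mathrm{Ext}\,u\|_{L^p(2\Omega)}.$$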

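We also need a discrete modulus of smoothness. To begin, we define the translation operator

$$(T_\ell(u^k))_i := u^k_{i+\ell} \tag{17}$$

for any $\ell = (\ell_1, \ell_2) \in \mathbb{Z}^2$. We define the norm $|\ell| = |\ell_1| + |\ell_2|$ on $\mathbb{Z}^2$, and then the discrete modulus of smoothness is

$$\omega(u^k, m)_{\ell^p} := \sup_{\ell\in\mathbb{Z}^2,\ |\ell|\le m}\Big(\sum_{i,\,i+\ell\in\Omega^k} |u^k_{i+\ell} - u^k_i|^p\,\mu_i\Big)^{1/p}.$$

For $\mathrm{Ext}^k u^k$ we define similarly

$$\omega(u^k, m)_{\ell^p(2\Omega^k)} = \sup_{\ell\in\mathbb{Z}^2,\ |\ell|\le m} \|T_\ell u^k - u^k\|_{\ell^p(2\Omega^k)}.$$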
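The discrete modulus of smoothness on $\Omega^k$ can be evaluated by brute force over all shifts $\ell$ with $|\ell_1| + |\ell_2| \le m$. The sketch below is illustrative only; the function name `discrete_modulus` is hypothetical, and the weights are passed in explicitly (here the boundary-adjusted weights of (13) are used in the example).

```python
import numpy as np

def discrete_modulus(u, mu, m, p=2):
    """sup over |l1|+|l2| <= m of (sum_{i, i+l in grid} |u[i+l]-u[i]|^p mu[i])^(1/p)."""
    k = u.shape[0] - 1
    best = 0.0
    for l1 in range(-m, m + 1):
        rem = m - abs(l1)
        for l2 in range(-rem, rem + 1):
            # Index ranges for which both i and i + l lie in the grid.
            a1, b1 = max(0, -l1), min(k, k - l1)
            a2, b2 = max(0, -l2), min(k, k - l2)
            if a1 > b1 or a2 > b2:
                continue
            base = u[a1:b1 + 1, a2:b2 + 1]
            shifted = u[a1 + l1:b1 + l1 + 1, a2 + l2:b2 + l2 + 1]
            w = mu[a1:b1 + 1, a2:b2 + 1]
            best = max(best, float(np.sum(np.abs(shifted - base) ** p * w)) ** (1.0 / p))
    return best

k = 4
h = 1.0 / k
w1 = np.full(k + 1, h)
w1[0] = w1[-1] = h / 2
mu = np.outer(w1, w1)                            # weights as in (13)
ramp = np.tile((np.arange(k + 1) * h)[:, None], (1, k + 1))
omega_1 = discrete_modulus(ramp, mu, m=1)
omega_2 = discrete_modulus(ramp, mu, m=2)
```

For the ramp $u_i = i_1 h$ and $m = 1$ the supremum is attained at $\ell = (\pm 1, 0)$ and equals $h\sqrt{h(k - 1/2)}$; the modulus is non-decreasing in $m$ by construction.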

3 Basic Properties

We begin with the following properties.

Lemma 1. (Contraction) Let $u$, $v$ be the minimizers for input data $f$ and $g$ in problem (2), respectively. Then

$$\|u - v\|_{L^2} \le \|f - g\|_{L^2}.$$

See a proof in [13] or [12]. With the above property, one can obtain the following.

Lemma 2. (Continuity of translation) Assume $u$ is the minimizer of $E$ in problem (2) for input data $g$. Extend $u$ to $\mathrm{Ext}\,u$ over $\mathbb{R}^2$ by symmetric extension as defined before. Then

$$\|\mathrm{Ext}\,u(x + h) - \mathrm{Ext}\,u(x)\|_{L^2(\Omega)} \le \omega(g, |h|)_{L^2(\Omega)}.$$

Remark 1. One can conclude from Lemma 2 that

$$\omega(u, |h|)_{L^2(\Omega)} \le \omega(g, |h|)_{L^2(\Omega)}. \tag{18}$$

Remark 2. Similar techniques allow one to show that this result also holds for the discrete case of $u^k$ and $g^k$, where $u^k$ is the minimizer of the discrete energy $E_k$ with the symmetric discrete TV operator $J_c$, and $u^k$ is extended on $\mathbb{Z}^2$ as before. In fact, the corresponding discrete version is

$$\|T_\ell(u^k) - u^k\|_{\ell^2(A)} \le C\,\omega(g^k, |\ell|)_{\ell^2(A)}, \tag{19}$$

where $A$ is the index set $\{i := (i_1, i_2) : 0 \le i_1, i_2 \le k\}$. For any discrete image $v^k$, the discrete modulus of continuity is

$$\omega_1(v^k, m)_{\ell^2(A)} := \sup_{0 < |\ell| \le m} \|T_\ell(v^k) - v^k\|_{\ell^2(A_{n_1,n_2})} \tag{20}$$

with $T_\ell$ being the translation operator defined in (17) and $A_{n_1,n_2} := \{(i, j) : (i, j) \in A,\ (i + n_1, j + n_2) \in A\}$.

Lemma 3. (Maximum principle) Suppose $u^k$ is the minimizer of $E_k$. If $g^k \in \ell^\infty$, then

$$\|u^k\|_{\ell^\infty} \le \|g^k\|_{\ell^\infty}.$$

The following lemmas bound the errors introduced by the injectors and projectors defined before.

Lemma 4. Let $u \in L^2(\Omega)$ and $u^k \in \ell^2(\Omega^k)$. Then there exists a constant $C$ such that the following properties hold:
a) $\|P_k u\|_{\ell^2} \le \|u\|_{L^2}$;
b) $\omega(P_k u, m)_{\ell^2} \le C\,\omega(u, mh)_{L^2}$;
c) $\|u^k\|_{\ell^2} = \|I_h u^k\|_{L^2}$;
d) $\omega(I_h u^k, mh)_{L^2} \le C\,\omega(u^k, m)_{\ell^2}$;
e) $\|u - I_h P_k u\|_{L^2} \le C\,\omega(u, h)_{L^2}$.

The following lemma bounds the difference between the two injectors we defined in (14) and (15).

Lemma 5.

$$\|L_h u^k - I_h u^k\|_{L^2} \le C\,\omega(u^k, 1)_{\ell^2}.$$

The following lemmas show the properties of the smoothing operators.

Lemma 6.

$$\|S_L u^k - u^k\|_{\ell^2} \le \omega(u^k, L)_{\ell^2}, \tag{21}$$

$$J_c(S_L u^k) \le J_c(u^k), \tag{22}$$

and

$$\|\Delta_{xx} S_L u^k\|_{\ell^1} + \|\Delta_{yy} S_L u^k\|_{\ell^1} \le \frac{C}{Lh}\,\|\nabla u^k\|_{\ell^1}. \tag{23}$$

The first inequality in Lemma 6 shows that the error between $u^k$ and the smoothed $u^k$ can be bounded by its discrete modulus of continuity. The second inequality shows that smoothing does not increase the discrete total variation. The last inequality shows that the second-order difference of the smoothed function can be bounded by the first-order finite difference of the original function. Lemma 7 is the continuous counterpart of Lemma 6.

Lemma 7.

$$\|\eta_\varepsilon * u - u\|_{L^2} \le \omega(u, \varepsilon)_{L^2}, \tag{24}$$

$$|\eta_\varepsilon * u|_{BV} \le |u|_{BV}, \tag{25}$$

and

$$\|D_{xx}(\eta_\varepsilon * u)\|_{L^1} + \|D_{yy}(\eta_\varepsilon * u)\|_{L^1} \le \frac{C}{\varepsilon}\,|u|_{BV}. \tag{26}$$

4 Proof of the Main Result

4.1 Main Idea

Recall that the continuous and discrete ROF energy functionals are defined by

$$E(v) = |v|_{BV} + \frac{1}{2\lambda}\|v - g\|^2; \tag{27}$$

$$E_k(v^k) = J_c(v^k) + \frac{1}{2\lambda}\|v^k - g^k\|_c^2 \tag{28}$$

with input image $g^k = P_k g$. To study the difference between $E_k(u^k)$ and $E(u)$, it should first be noticed that $E_k$ and $E$ are two different functionals defined on different spaces: $E$ is defined on the continuous space $BV(\Omega)$, while $E_k$ is a discrete operator defined on a discrete function space. Therefore, some connection between these two operators should be built. We use two energy bounds to bridge them.

First, given a discrete minimizer $u^k$ of the functional $E_k$, we inject $u^k$ into the $L^2$ space by the function $L_h S_L u^k$, with $E(L_h S_L u^k)$ less than $E_k(u^k)$ plus some error. The construction of $L_h S_L u^k$ is done by first "smoothing" $u^k$ as $S_L u^k$, then linearly interpolating $S_L u^k$. Assuming $u$ is the minimizer of $E$, we have

$$E(u) \le E(L_h S_L u^k) \le E_k(u^k) + e_{g,h}, \tag{29}$$

where $e_{g,h}$ is the error to be bounded in the next section, which depends on the initial data $g$ and the mesh size $h$, and tends to zero as $h$ tends to zero.

The second energy bound is similar but taken in the opposite direction. Based on $u$, we construct a "smoothed" discrete function $P_k\,\eta_\varepsilon * u$ by first "smoothing" $u$, then projecting it into the discrete function space, with $E_k(P_k\,\eta_\varepsilon * u)$ less than $E(u)$ plus an error term $e'_{g,h}$ similar to $e_{g,h}$. By the definition of $u^k$, we have

$$E_k(u^k) \le E_k(P_k\,\eta_\varepsilon * u) \le E(u) + e'_{g,h}. \tag{30}$$

From (29) we see that $E(u) - E_k(u^k) \le e_{g,h}$; from (30), $E_k(u^k) - E(u) \le e'_{g,h}$; then we conclude that

$$|E_k(u^k) - E(u)| \le \max\{e_{g,h},\, e'_{g,h}\}.$$

This will complete our error bound.

4.2 Sketch of the Proof

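Proposition 1. If $g \in W^{1,2}$, and $u^k$, $u$ are the minimizers of $E_k$, $E$ in (28), (27) respectively, then

$$E(u) \le E_k(u^k) + C\Big(1 + \frac{1}{\lambda}\Big)\big(\|g\|_{W^{1,2}} + \|g\|^2_{W^{1,2}}\big)h^{1/2}.$$

Proof. We shall bound the energy $E(L_h S_L u^k)$. It is straightforward (albeit tedious) to calculate that its TV term satisfies

$$|L_h S_L u^k|_{BV} \le J_c(S_L u^k) + Ch\big(\|\Delta_{xx} S_L u^k\|_{\ell^1} + \|\Delta_{yy} S_L u^k\|_{\ell^1}\big).$$

By the properties (22) and (23) of the discrete smoothing operator in Lemma 6,

$$|L_h S_L u^k|_{BV} \le J_c(u^k) + \frac{C}{L}\,\|\nabla u^k\|_{\ell^1}.$$

By Hölder's inequality and Lemma 2, $\|\nabla u^k\|_{\ell^1}$ is bounded by

$$
\begin{aligned}
\|\nabla u^k\|_{\ell^1} &= \sum_i \big(|\Delta_x^+ u^k_i| + |\Delta_y^+ u^k_i|\big)\mu_i \\
&\le C\Bigg(\Big(\sum_i |\Delta_x^+ u^k_i|^2\,\mu_i\Big)^{1/2} + \Big(\sum_i |\Delta_y^+ u^k_i|^2\,\mu_i\Big)^{1/2}\Bigg) \\
&\le \frac{C}{h}\Big(\|T_{(1,0)} u^k - u^k\|_{\ell^2} + \|T_{(0,1)} u^k - u^k\|_{\ell^2}\Big) \\
&\le \frac{C}{h}\,\omega(g^k, 1)_{\ell^2} \qquad\text{by (19)} \\
&\le C\,\|g\|_{W^{1,2}}.
\end{aligned}
$$

We have

$$|L_h S_L u^k|_{BV} \le J_c(u^k) + \frac{C}{L}\,\|g\|_{W^{1,2}}.$$

The $L^2$ term of $E(L_h S_L u^k)$ can be written as

$$
\begin{aligned}
\|L_h S_L u^k - g\|_{L^2} &= \|(L_h S_L u^k - I_h S_L u^k) + (I_h S_L u^k - I_h u^k) + (I_h u^k - I_h g^k) + (I_h g^k - g)\|_{L^2} \\
&\le \|u^k - g^k\|_c + C(Lh)\,\|g\|_{W^{1,2}}.
\end{aligned}
$$

Applying the properties of the injectors and projectors (Lemma 4 and Lemma 5), and noting the assumption $Lh \le 1$ and the fact that $\|u^k - g^k\|_c \le \|g^k\|_c \le \|g\|$, we obtain

$$\|L_h S_L u^k - g\|^2_{L^2} \le \|u^k - g^k\|_c^2 + C(Lh)\,\|g\|^2_{W^{1,2}}.$$

Thus

$$
\begin{aligned}
E(L_h S_L u^k) &= |L_h S_L u^k|_{BV} + \frac{1}{2\lambda}\|L_h S_L u^k - g\|^2_{L^2} \\
&\le J_c(u^k) + \frac{C}{L}\,\|g\|_{W^{1,2}} + \frac{1}{2\lambda}\|u^k - g^k\|_c^2 + \frac{C}{\lambda}(Lh)\,\|g\|^2_{W^{1,2}} \\
&= E_k(u^k) + \frac{C}{L}\,\|g\|_{W^{1,2}} + \frac{C}{\lambda}(Lh)\,\|g\|^2_{W^{1,2}}.
\end{aligned}
$$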

Setting $L = h^{-1/2}$, so that $C/L = C h^{1/2}$ and $(C/\lambda)(Lh) = (C/\lambda) h^{1/2}$, we obtain the result of this proposition.

Using a similar method we prove the following.

Proposition 2. If $g \in W^{1,2}$, and $u$, $u^k$ are the minimizers of $E$, $E_k$ in (27), (28) respectively, then

$$E_k(u^k) \le E(u) + C\Big(1 + \frac{1}{\lambda}\Big)\big(\|g\|_{W^{1,2}} + \|g\|^2_{W^{1,2}}\big)h^{1/2}.$$

Combining Propositions 1 and 2 immediately yields the following.

Theorem 1. If $g \in W^{1,2}$, and $u$, $u^k$ are the minimizers of $E$, $E_k$ in (27), (28) respectively, then

$$|E(u) - E_k(u^k)| \le C\Big(1 + \frac{1}{\lambda}\Big)\big(\|g\|_{W^{1,2}} + \|g\|^2_{W^{1,2}}\big)h^{1/2}.$$

Next we need the following lemma.

Lemma 8. If $u$ is the minimizer of $E$ in (27), then for any $v \in BV$,

$$\|v - u\|^2 \le 2\lambda\big(E(v) - E(u)\big). \tag{31}$$

A proof of this lemma can be found in [13] or [12]. It then follows:

Theorem 2. If $g \in W^{1,2}$, and $u$, $u^k$ are the minimizers of $E$, $E_k$ in (27), (28) respectively, then

$$\|I_h u^k - u\|^2 \le C(\lambda + 1)\big(\|g\|_{W^{1,2}} + \|g\|^2_{W^{1,2}}\big)h^{1/2}.$$

Remark 3. In this paper, we have proved the error bound for the discrete ROF model equipped with a central-difference TV term using the method suggested in [13]. This model is simpler in form than the model studied in [13], where a symmetrical TV term is used. This model is also slightly easier to compute with Chambolle's method (cf. [3]). However, we notice that the central-difference model fails to deal with a class of data, for example a chessboard image. Thus we have to put a stronger assumption on the initial data ($g \in W^{1,2}$) to obtain the error bound, which may not be satisfied by all real images. Nevertheless, this result still shows that the method in [13] can be extended to other symmetric discrete TV operators. It is also interesting to study further whether a similar error bound for this model can be obtained without this assumption.

References

1. Acar, R., Vogel, C.R.: Analysis of bounded variation penalty methods for ill-posed problems. Inverse Problems 10, 1217–1229 (1994)
2. Carter, J.L.: Dual Methods for Total Variation-Based Image Restoration. Ph.D. thesis, UCLA (2001)
3. Chambolle, A.: An algorithm for total variation minimization and applications. Journal of Mathematical Imaging and Vision 20(1-2), 89–97 (2004)
4. Chambolle, A., Lions, P.-L.: Image recovery via total variation minimization and related problems. Numer. Math. 76(2), 167–188 (1997)
5. Chambolle, A., Levine, S., Lucier, B.: ROF image smoothing: some computational comments, draft (2008)
6. Chan, T.F., Golub, G.H., Mulet, P.: A nonlinear primal-dual method for total variation-based image restoration. SIAM J. Sci. Comput. 20(6), 1964–1977 (1999)
7. Dal Maso, G.: An Introduction to Γ-Convergence. Birkhäuser, Boston (1993)
8. DeVore, R., Lorentz, G.: Constructive Approximation. Springer, Heidelberg (1993)
9. Evans, L., Gariepy, R.: Measure theory and fine properties of functions. CRC Press, Boca Raton (1992)
10. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
11. Lions, P.-L., Osher, S.J., Rudin, L.: Denoising and deblurring using constrained nonlinear partial differential equations. Tech. Rep., Cognitech Inc., Santa Monica, CA, submitted to SINUM
12. Wang, J., Lucier, B.: Error bounds for numerical methods for the ROF image smoothing model (2008) (in preparation)
13. Wang, J.: Error Bounds for Numerical Methods for the ROF Image Smoothing Model. Ph.D. thesis, Purdue (2008)

Theoretical Foundations for Discrete Forward-and-Backward Diffusion Filtering

Martin Welk (1), Guy Gilboa (2), and Joachim Weickert (1)

(1) Mathematical Image Analysis Group, Faculty of Mathematics and Computer Science, Campus E1.1, Saarland University, 66041 Saarbrücken, Germany {welk,weickert}@mia.uni-saarland.de http://www.mia.uni-saarland.de
(2) 3DV Systems, 2nd Carmel St., Industrial Park Building 1, P.O. Box 249, Yokneam, 20692, Israel [email protected]

Abstract. Forward-and-backward (FAB) diffusion is a method for sharpening blurry images (Gilboa et al. 2002). It combines forward diffusion with a positive diffusivity and backward diffusion where negative diffusivities are used. The well-posedness properties of FAB diffusion are unknown, and it has been observed that standard discretisations can violate a maximum-minimum principle. We show that for a novel nonstandard space discretisation which pays specific attention to image extrema, one can apply a modification of the space-discrete well-posedness and scale-space framework of Weickert (1998). This allows us to establish well-posedness and a maximum-minimum principle for the resulting dynamical system. In the fully discrete 1-D case with an explicit time discretisation, a maximum-minimum principle and total variation reduction are proven in spite of the fact that negative diffusivities may appear. This provides a theoretical justification for applying FAB diffusion to digital images.

1 Introduction In the last two decades, many partial differential equations (PDEs) and variational approaches have been proposed for enhancing digital images; see e.g. [1, 13] for an overview. The continuous framework behind these models offers advantages such as transparent and compact formulations where rotationally invariant approaches are easy to model. However, some of the most interesting models are difficult to analyse in the continuous setting due to well-posedness problems. Often these filters work well in practice, but lack a sound continuous theory. This has triggered researchers to investigate wellposedness properties for space-discrete and fully discrete formulations. Let us mention a few examples. For the Perona–Malik filter, Weickert [13] has proposed a space-discrete and fully discrete theory for smooth nonnegative diffusivities. Moreover, in [14] it is proven that the corresponding explicit scheme preserves monotonicity in the 1-D case. This explains that staircasing is the worst phenomenon that can happen. Pollak et al. [12] have X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 527–538, 2009. c Springer-Verlag Berlin Heidelberg 2009


M. Welk, G. Gilboa, and J. Weickert

extended this analysis to singular nonnegative diffusivities by showing well-posedness for dynamical systems with discontinuous right-hand sides that result from a space-discrete Perona–Malik model. For the stabilised inverse linear diffusion process introduced by Osher and Rudin, it was not possible to establish a continuous well-posedness theory, but a stable minmod discretisation proved to work well in practice [9]. Later on, Breuß and Welk [2] showed that staircasing cannot be avoided by suitable space discretisations. Shock filtering [5, 10] constitutes another example of a PDE that is difficult to analyse in the continuous setting, while for a 1-D space discretisation, Welk et al. [15] have shown that this process is well-posed and satisfies a maximum–minimum principle. It was even possible to find an analytic solution of the corresponding dynamical system. On the variational side, Nikolova has published a number of impressive papers that provide deep insights into the behaviour of minimisers of space-discrete energies, even if they are highly nonconvex or nondifferentiable; see e.g. [7, 8]. It would have been extremely difficult if not impossible to obtain similar results in the continuous setting.

One PDE that has been proposed for sharpening images and for which no well-posedness results are known so far is the so-called forward-and-backward (FAB) diffusion model of Gilboa et al. [3]. Essentially this is a filter of Perona–Malik type, but its diffusivities are positive in certain areas and negative in others. Since pure inverse diffusion with a negative diffusivity is a prototype of an ill-posed problem, it is not surprising that no well-posedness results exist in the continuous setting. Experimentally it has been observed that straightforward explicit discretisations can violate a maximum–minimum principle. The goal of our paper is to address this problem.
We show that space-discrete FAB diffusion is well-posed and satisfies a maximum–minimum principle if a specific nonstandard discretisation is applied at extrema. This is achieved by modifying the space-discrete diffusion framework of Weickert [13]. Moreover, for the fully discrete 1-D case with an explicit time discretisation, a maximum–minimum principle and a total variation reduction property are established. Our paper is organised as follows. In Section 2 we discuss the FAB diffusion model, while Section 3 reviews the space-discrete diffusion framework from [13]. In Section 4 we present our nonstandard space discretisation for FAB diffusion, and we modify the space-discrete diffusion framework such that it becomes applicable to this model. The fully discrete 1-D case is discussed in detail in Section 5. Our paper is concluded with a summary in Section 6.

2 Forward-and-Backward Diffusion Filtering

Forward-and-backward (FAB) diffusion filtering was introduced by Gilboa, Sochen and Zeevi in 2002 [3]. Let Ω ⊂ ℝ² be a rectangular image domain and consider a greyscale image f : Ω → ℝ that is to be sharpened. Then FAB diffusion filtering creates filtered versions $u(\boldsymbol{x}, t)$ of $f(\boldsymbol{x})$ by solving a Perona–Malik type [11] equation

\[ \partial_t u = \operatorname{div}\bigl( g(|\nabla u|^2)\, \nabla u \bigr) \tag{1} \]

with f as initial condition,

\[ u(\boldsymbol{x}, 0) = f(\boldsymbol{x}), \tag{2} \]

Theoretical Foundations for Discrete Forward-and-Backward Diffusion Filtering


and homogeneous Neumann boundary conditions,

\[ \partial_n u = 0, \tag{3} \]

where n denotes a normal vector to the image boundary ∂Ω. Here $\boldsymbol{x} := (x, y)^\top$, subscripts denote partial derivatives, $\nabla := (\partial_x, \partial_y)^\top$ is the spatial gradient, and div its corresponding divergence operator. The diffusivity g may have different formulations, for example [4]:

\[ g(s^2) = \frac{1}{1 + (s/k_f)^2} - \frac{\alpha}{1 + (s/k_b)^2}, \tag{4} \]

where kf and kb control the gradient magnitudes for forward and backward diffusion, respectively, and α is the weight between these terms. Note that for small image gradients, this diffusivity is positive, while it becomes negative for larger ones, and finally becomes positive again. Our theory relies on the essential assumption g(0) > 0, which ensures that extrema undergo forward diffusion. FAB diffusion has also been interpreted as an energy minimisation process of a nonmonotone potential in the shape of a triple-well [4]. In the variational formulation of [4] two additional terms have been introduced: a fidelity term to the input image and a fourth order term (hyper-diffusion) which increases the regularisation, strongly suppressing highly oscillating regions. Here we keep the notion of a sharpening flow without these terms. Connections between FAB diffusion and wavelet methods for image enhancement have been described in [6]. Apart from these results not many theoretical properties of the FAB process have been proven. In particular, existence, uniqueness and stability results are not available. Moreover, it was conjectured that such a process violates a maximum–minimum principle, as it may have a negative diffusivity [3]. This was shown to happen in numerical experiments, using standard numerical methods. In this paper we will prove that using a more sophisticated space discretisation, the process admits the maximum–minimum principle and useful theoretical results can be established.

3 A Space-Discrete Diffusion Framework

Let us now review the space-discrete diffusion framework of Weickert [13], since parts of it can be extended to the FAB setting. A standard discretisation of a Perona–Malik type diffusion equation

\[ \partial_t u = \partial_x \bigl( g(|\nabla u|^2)\, \partial_x u \bigr) + \partial_y \bigl( g(|\nabla u|^2)\, \partial_y u \bigr) \tag{5} \]

in some inner pixel (i, j) yields the ordinary differential equation

\[ \frac{du_{i,j}}{dt} = \frac{1}{h_1} \left( \frac{g_{i+1,j} + g_{i,j}}{2}\, \frac{u_{i+1,j} - u_{i,j}}{h_1} - \frac{g_{i,j} + g_{i-1,j}}{2}\, \frac{u_{i,j} - u_{i-1,j}}{h_1} \right) + \frac{1}{h_2} \left( \frac{g_{i,j+1} + g_{i,j}}{2}\, \frac{u_{i,j+1} - u_{i,j}}{h_2} - \frac{g_{i,j} + g_{i,j-1}}{2}\, \frac{u_{i,j} - u_{i,j-1}}{h_2} \right). \tag{6} \]

Here $u_{i,j}$ denotes an approximation to u in pixel (i, j). It is centred in the location $((i - \tfrac12) h_1, (j - \tfrac12) h_2)$, where $h_1$ and $h_2$ denote the grid size (pixel width) in x- resp.


y-direction. This formula even holds for boundary pixels, provided that the homogeneous Neumann boundary conditions (3) are implemented by mirroring boundary pixels into dummy pixels. A suitable discretisation for the diffusivity g will be discussed later. In a more compact notation, one can represent a pixel (i, j) by a single index k(i, j). This leads to

\[ \frac{du_k}{dt} = \sum_{n=1}^{2} \sum_{l \in N_n(k)} \frac{g_l + g_k}{2 h_n^2}\, (u_l - u_k), \tag{7} \]

where $N_n(k)$ are the neighbours of pixel k in n-direction (boundary pixels may have fewer neighbours). This can be written as a system of ordinary differential equations (ODEs):

\[ \frac{d\boldsymbol{u}}{dt} = A(\boldsymbol{u})\, \boldsymbol{u}, \tag{8} \]

where $\boldsymbol{u} = (u_1, \ldots, u_N)^\top$, and the N × N matrix $A(\boldsymbol{u}) = (a_{k,l}(\boldsymbol{u}))$ satisfies

\[ a_{k,l} := \begin{cases} \dfrac{g_k + g_l}{2 h_n^2} & (l \in N_n(k)), \\[2ex] -\displaystyle\sum_{n=1}^{2} \sum_{l \in N_n(k)} \dfrac{g_k + g_l}{2 h_n^2} & (l = k), \\[2ex] 0 & (\text{else}). \end{cases} \tag{9} \]

Denoting the index set {1, ..., N} by J, a space-discrete problem class (Ps) is defined in the following way.

(Ps) Let $\boldsymbol{f} \in \mathbb{R}^N$. Find a function $\boldsymbol{u} \in C^1([0, \infty), \mathbb{R}^N)$ that satisfies the initial value problem

\[ \frac{d\boldsymbol{u}}{dt} = A(\boldsymbol{u})\, \boldsymbol{u}, \qquad \boldsymbol{u}(0) = \boldsymbol{f}, \]

where $A = (a_{ij})$ has the following properties:

(S1) Lipschitz-continuity of $A \in C(\mathbb{R}^N, \mathbb{R}^{N \times N})$ for every bounded subset of $\mathbb{R}^N$,
(S2) symmetry: $a_{ij}(\boldsymbol{u}) = a_{ji}(\boldsymbol{u})$ ∀ i, j ∈ J, ∀ $\boldsymbol{u} \in \mathbb{R}^N$,
(S3) vanishing row sums: $\sum_{j \in J} a_{ij}(\boldsymbol{u}) = 0$ ∀ i ∈ J, ∀ $\boldsymbol{u} \in \mathbb{R}^N$,
(S4) nonnegative off-diagonals: $a_{ij}(\boldsymbol{u}) \ge 0$ ∀ i ≠ j, ∀ $\boldsymbol{u} \in \mathbb{R}^N$,
(S5) irreducibility for all $\boldsymbol{u} \in \mathbb{R}^N$.

One should remember that a matrix $A \in \mathbb{R}^{N \times N}$ is called irreducible if for any i, j ∈ J there exist $k_0, \ldots, k_r \in J$ with $k_0 = i$ and $k_r = j$ such that $a_{k_p k_{p+1}} \neq 0$ for p = 0, ..., r−1. In other words: There is a way from pixel i to pixel j along which the diffusivities do not vanish. Under these requirements the subsequent theorem is proven in [13]:

Theorem 1 (Properties of Space-Discrete Diffusion Filtering)
For the space-discrete filter class (Ps) the following statements are valid:

(a) (Well-Posedness) For each T > 0 the problem (Ps) has a unique solution $\boldsymbol{u}(t) \in C^1([0, T], \mathbb{R}^N)$. This solution depends continuously on the initial value and the right-hand side of the ODE system.


(b) (Maximum–Minimum Principle) Let $a := \min_{j \in J} f_j$ and $b := \max_{j \in J} f_j$. Then $a \le u_i(t) \le b$ for all i ∈ J and t ∈ [0, T].

(c) (Average Grey Level Invariance) The average grey level $\mu := \frac{1}{N} \sum_{j \in J} f_j$ is not affected by the space-discrete diffusion filter: $\frac{1}{N} \sum_{j \in J} u_j(t) = \mu$ for all t > 0.

(d) (Lyapunov Functionals) $V(t) := \Phi(\boldsymbol{u}(t)) := \sum_{i \in J} r(u_i(t))$ is a Lyapunov function for all $r \in C^1[a, b]$ with increasing $r'$ on [a, b]: V(t) is decreasing and bounded from below by $\Phi(\boldsymbol{c})$, where $\boldsymbol{c} := (\mu, \ldots, \mu)^\top \in \mathbb{R}^N$.

(e) (Convergence to a Constant Steady State) $\lim_{t \to \infty} \boldsymbol{u}(t) = \boldsymbol{c}$.

The proof shows that not all of the requirements (S1)–(S5) are necessary for each of the theoretical results above: Requirement (S1) is needed for local well-posedness, while proving a maximum–minimum principle requires (S3) and (S4). Local well-posedness together with the maximum–minimum principle implies global well-posedness. The average grey value invariance is based on (S2) and (S3). The existence of Lyapunov functionals can be established by means of (S2)–(S4), and convergence to a constant steady state requires (S5) in addition to (S2)–(S4).
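To make the role of the individual requirements more tangible, the following minimal sketch assembles the matrix from (9) for a 1-D signal and checks (S2) and (S3) numerically. The function names and the restriction to one dimension are our own illustrative assumptions; they are not part of the framework in [13].

```python
from typing import List


def diffusion_matrix_1d(g: List[float], h: float) -> List[List[float]]:
    """Assemble the matrix A of (9) for a 1-D signal (illustrative sketch).

    g[k] is the diffusivity value in pixel k, h the pixel width.
    Boundary pixels have only one neighbour, which is the effect of
    mirroring them into dummy pixels.
    """
    n = len(g)
    a = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in (k - 1, k + 1):
            if 0 <= l < n:
                a[k][l] = (g[k] + g[l]) / (2.0 * h * h)
                a[k][k] -= a[k][l]  # diagonal entry makes the row sum vanish
    return a


def is_symmetric(a: List[List[float]]) -> bool:
    """Check requirement (S2)."""
    n = len(a)
    return all(a[i][j] == a[j][i] for i in range(n) for j in range(n))


def max_abs_row_sum(a: List[List[float]]) -> float:
    """Largest absolute row sum; zero means that (S3) holds."""
    return max(abs(sum(row)) for row in a)


def apply_matrix(a: List[List[float]], u: List[float]) -> List[float]:
    """Right-hand side A u of the ODE system (8)."""
    return [sum(a_kl * u_l for a_kl, u_l in zip(row, u)) for row in a]
```

With nonnegative diffusivities, the entry of `apply_matrix(a, u)` at a maximum of `u` is nonpositive, which is the nonenhancement property behind statement (b). With a negative diffusivity value, an off-diagonal entry may become negative, so that (S4) fails while (S2) and (S3) remain valid.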

4 Application to Space-Discrete FAB Diffusion

It is straightforward to verify the prerequisites (S1)–(S5) for the popular positive diffusivity functions, such that Theorem 1 is applicable. However, for FAB diffusion negative diffusivities are possible and the situation becomes much more complicated. One immediately sees that space-discrete FAB diffusion with $g \in C^1[0, \infty)$ satisfies (S1: smoothness), (S2: symmetry), and (S3: vanishing row sums). However, this just implies local well-posedness and average grey level invariance. By inspecting (9) it becomes clear that (S4: nonnegative off-diagonals) and (S5: irreducibility) cannot be satisfied for typical FAB diffusivities: These diffusivities may vanish (which violates (S5)) and they may even become negative (violating (S4)). As a consequence, global well-posedness, a maximum–minimum principle, Lyapunov functions and convergence to a constant steady state cannot be proven in this way. For the practical applicability of FAB diffusion it would be highly desirable to have at least global well-posedness and a maximum–minimum principle. Is there a remedy for these properties? Fortunately the answer is affirmative, since (S4: nonnegative off-diagonals) can be replaced by a less restrictive condition that only holds at extrema:

Theorem 2 (Space-Discrete Diffusion Filtering under Weaker Conditions)
Assume that a space-discrete filter satisfies only the properties (S1)–(S3) of the framework (Ps), and


(S4a) nonnegative off-diagonals at extrema: $a_{i,j}(\boldsymbol{u}) \ge 0$ for all j ∈ J with j ≠ i if $\boldsymbol{u}$ has an extremum in i.

Then the well-posedness result (a), the maximum–minimum principle (b), and the average grey level invariance (c) of Theorem 1 are still satisfied.

Proof. Following [13], one observes that in some pixel k that is a discrete global maximum (i.e. $u_k \ge u_j$ for all j ∈ J), condition (S4a) implies that

\[ \frac{du_k}{dt} = \sum_{j \in J} a_{kj}(\boldsymbol{u})\, u_j = a_{kk}(\boldsymbol{u})\, u_k + \sum_{j \in J \setminus \{k\}} \underbrace{a_{kj}(\boldsymbol{u})}_{\ge 0}\, \underbrace{u_j}_{\le u_k} \;\le\; u_k \cdot \sum_{j \in J} a_{kj}(\boldsymbol{u}) \overset{(S3)}{=} 0. \tag{10} \]

In the same way one can prove that if k is a minimum, one has $\frac{du_k}{dt} \ge 0$. This nonenhancement behaviour in extrema is the only place where nonnegativity is required in the entire proof of the maximum–minimum principle in [13]. As a consequence, the maximum–minimum principle still holds if (S4) is replaced by the weaker condition (S4a). Moreover, together with local well-posedness, global well-posedness is obtained. This completes the proof.

While the preceding results are encouraging, we have not yet shown that a suitable space-discretisation satisfies the nonnegativity requirement (S4a) at extrema. Unfortunately, this issue is a bit more delicate than one might assume: A standard discretisation of the diffusivity $g(|\nabla u|^2)$ in some pixel (i, j) is given by the central difference approximation

\[ g_{i,j} := g\left( \left( \frac{u_{i+1,j} - u_{i-1,j}}{2 h_1} \right)^2 + \left( \frac{u_{i,j+1} - u_{i,j-1}}{2 h_2} \right)^2 \right). \tag{11} \]

Note that even if u has an extremum in (i, j), the preceding central difference approximation of $|\nabla u|^2$ may become positive – and not 0 as one would expect from the continuous theory. Since the FAB diffusivities only guarantee that g(0) > 0, it can happen that this finite difference approximation creates negative diffusivities in extrema and (S4a) is violated. Fortunately there is an interesting alternative to the standard discretisation of the diffusivity that solves these problems immediately:

Theorem 3 (Properties of Space-Discrete FAB Diffusion)
The space discretisation (6) of FAB diffusion with g(0) > 0 and $g \in C^1[0, \infty)$ is well-posed, satisfies a maximum–minimum principle and average grey level invariance, if the diffusivity is evaluated by the nonstandard finite difference approximation

\[ g_{i,j} := g\left( \max\left( \frac{u_{i+1,j} - u_{i,j}}{h_1} \cdot \frac{u_{i,j} - u_{i-1,j}}{h_1},\, 0 \right) + \max\left( \frac{u_{i,j+1} - u_{i,j}}{h_2} \cdot \frac{u_{i,j} - u_{i,j-1}}{h_2},\, 0 \right) \right). \tag{12} \]


It should be noted that this approximation has the same quadratic order of consistency as the previous one. However, it guarantees a vanishing discrete gradient approximation in extrema. As a consequence, (S4a) is guaranteed, since FAB diffusivities satisfy g(0) > 0. Interestingly, the positivity property g(0) > 0 together with the smoothness assumption $g \in C^1[0, \infty)$ are the only requirements that are necessary to establish well-posedness and a maximum–minimum principle for space-discrete FAB diffusion. Last but not least, these results are not restricted to the two-dimensional case: With a similar nonstandard approximation, it is straightforward to verify that space-discrete FAB diffusion is well-posed and satisfies an extremum principle in any dimension.
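The difference between the two gradient approximations can be seen on a small example. The sketch below evaluates the argument of g in (11) and in (12) at an inner pixel; the function names and the convention that `u[i][j]` uses i for the x-direction are our own assumptions for illustration.

```python
from typing import List


def central_grad_sq(u: List[List[float]], i: int, j: int,
                    h1: float = 1.0, h2: float = 1.0) -> float:
    """Argument of g in the central difference approximation (11)."""
    ux = (u[i + 1][j] - u[i - 1][j]) / (2.0 * h1)
    uy = (u[i][j + 1] - u[i][j - 1]) / (2.0 * h2)
    return ux * ux + uy * uy


def nonstandard_grad_sq(u: List[List[float]], i: int, j: int,
                        h1: float = 1.0, h2: float = 1.0) -> float:
    """Argument of g in the nonstandard approximation (12).

    Each term multiplies the forward and the backward difference. At an
    extremum these have opposite signs (or vanish), so the max with 0
    returns 0.
    """
    px = (u[i + 1][j] - u[i][j]) / h1 * (u[i][j] - u[i - 1][j]) / h1
    py = (u[i][j + 1] - u[i][j]) / h2 * (u[i][j] - u[i][j - 1]) / h2
    return max(px, 0.0) + max(py, 0.0)


# A 3 x 3 patch whose centre pixel is a strict maximum.
peak = [[0.0, 1.0, 0.0],
        [2.0, 5.0, 4.0],
        [0.0, 3.0, 0.0]]
```

For `peak`, the central approximation at the centre pixel is positive although the pixel is an extremum, whereas the nonstandard approximation vanishes, so that the diffusivity there equals g(0) > 0. On linear data both approximations coincide.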

5 Fully Discrete FAB Diffusion

In order to establish useful properties for FAB diffusion in the fully discrete case, we restrict ourselves to the 1-D setting and use a simple explicit time discretisation with step size τ. Then the corresponding scheme to $\partial_t u = \partial_x \bigl( g((\partial_x u)^2)\, \partial_x u \bigr)$ is given by

\[ \frac{u_i^{k+1} - u_i^k}{\tau} = \frac{g_{i-1}^k + g_i^k}{2} \cdot \frac{u_{i-1}^k - u_i^k}{h^2} + \frac{g_{i+1}^k + g_i^k}{2} \cdot \frac{u_{i+1}^k - u_i^k}{h^2} \tag{13} \]

with the nonstandard approximation $g_i^k = g\left( \max\left( \frac{u_i^k - u_{i-1}^k}{h} \cdot \frac{u_{i+1}^k - u_i^k}{h},\, 0 \right) \right)$. The upper index denotes the time level, i.e. $u_i^k$ approximates u at location $(i - \tfrac12) h$ and time kτ. This approximation also holds at the boundary pixels $u_1$ and $u_N$ when one uses the aforementioned dummy pixels. For our analysis, two additional assumptions are essential. While the first one refers to the range of grey values, the second one requires a diffusivity g that still takes sufficiently large positive values for small positive arguments. We get the following result.

Theorem 4 (Properties of Fully Discrete FAB Diffusion)
Let an initial 1-D image $f = (f_i)$ be given and let the sequence of images $u^k = (u_i^k)$ evolve according to (13) with the initial condition $u^0 = f$. Let the grey values $f_i$ be restricted to a finite interval of length R. Assume further that two constants $c_1 > c_2 > 0$ exist such that the diffusivity g fulfils $g(0) = c_1$, and $g(z) > -c_2$ for all z > 0. Moreover, assume that a positive ω exists such that $g(s^2) > c_2$ holds for all s with 0 < s < ωR. If the time step satisfies

\[ \tau < \frac{\omega^2 h^4}{c_1 + c_2 + 2 c_1 \omega^2 h^2}, \tag{14} \]

the following results are true for the evolution of $(u^k)$.

(a) (Maximum–Minimum Principle) If the initial signal is bounded by $a \le f_i \le b$ for all i ∈ J, then $a \le u_i^k \le b$ holds for all i ∈ J and all k ≥ 0.


(b) (Total Variation Reduction) For each time step k ≥ 0, the total variation of the image $u^{k+1}$ is less than or equal to the total variation of $u^k$:

\[ \sum_{i=1}^{N-1} \bigl| u_{i+1}^{k+1} - u_i^{k+1} \bigr| \;\le\; \sum_{i=1}^{N-1} \bigl| u_{i+1}^k - u_i^k \bigr|. \tag{15} \]

Proof. The global statements of the theorem follow from local properties which will be proven in four steps.

Step 1: A local maximal pixel does not increase

Assume that $u_i^k$ is a local maximum of the 1-D image in time step k, i.e. we have $u_i^k \ge u_{i+1}^k$ and $u_i^k \ge u_{i-1}^k$. Since in this case $g_{i-1}^k + g_i^k$ and $g_i^k + g_{i+1}^k$ are certainly nonnegative, $u_i^{k+1}$ is a convex combination of $u_{i-1}^k$, $u_i^k$ and $u_{i+1}^k$ if only

\[ 1 - \frac{\tau}{2 h^2} \bigl( g_{i-1}^k + 2 g_i^k + g_{i+1}^k \bigr) \ge 0 \tag{16} \]

holds. Because of $g_{i-1}^k + 2 g_i^k + g_{i+1}^k \le 4 c_1$ this is certainly the case if

\[ \tau \le \frac{h^2}{2 c_1}. \tag{17} \]

Step 2: A neighbour pixel of a local maximum remains below this maximum

Assume that $u_i^k$ is a maximum and $u_{i+1}^k$ is not a local minimum. Then the inequality $u_{i+1}^{k+1} \le u_i^k$ holds if

\[ \tau \le \frac{\omega^2 h^4}{2 c_1 \omega^2 h^2 + c_2}. \tag{18} \]

To see this, we use the equation

\[ u_{i+1}^{k+1} = u_{i+1}^k + \tau \left( \frac{g_i^k + g_{i+1}^k}{2} \cdot \frac{u_i^k - u_{i+1}^k}{h^2} + \frac{g_{i+1}^k + g_{i+2}^k}{2} \cdot \frac{u_{i+2}^k - u_{i+1}^k}{h^2} \right) \tag{19} \]

and distinguish two cases.

Case 1: $(u_{i+1}^k - u_i^k)(u_{i+2}^k - u_{i+1}^k) \le \omega^2 h^2 R^2$.
Then $g_{i+1}^k + g_{i+2}^k$ is certainly nonnegative. The right-hand side of (19) is therefore a convex combination of $u_i^k$, $u_{i+1}^k$ and $u_{i+2}^k$ if (16) holds. Analogous to our above reasoning, this is true if (17) is satisfied.

Case 2: $(u_{i+1}^k - u_i^k)(u_{i+2}^k - u_{i+1}^k) > \omega^2 h^2 R^2$.
Here we conclude from $u_{i+1}^k - u_{i+2}^k \le R$ that

\[ u_i^k - u_{i+1}^k > \omega^2 h^2 R. \tag{20} \]

Using $\tfrac12 (g_i^k + g_{i+1}^k) < c_1$ and $\tfrac12 (g_{i+1}^k + g_{i+2}^k) > -c_2$ we obtain from (19) the estimate

\[ u_{i+1}^{k+1} \le u_{i+1}^k + \frac{\tau}{h^2}\, c_1 (u_i^k - u_{i+1}^k) + \frac{\tau}{h^2}\, c_2 R \tag{21} \]

which ensures $u_{i+1}^{k+1} \le u_i^k$, provided that

\[ \tau \le \frac{\omega^2 h^4}{c_1 \omega^2 h^2 + c_2} \tag{22} \]

holds. Condition (18) ensures the bounds of both cases, i.e. (17) and (22).

Step 3: No new extrema are generated around existing extrema

Assume that $u_i^k$ is a local maximum, and none of its neighbours is a local minimum. Assume first that

\[ (u_{i+1}^k - u_i^k)(u_{i+2}^k - u_{i+1}^k) > \omega^2 h^2 R^2 \tag{23} \]

and thus again (20) and (21) hold. Similar considerations for $u_i^{k+1}$ yield

\[ u_i^{k+1} \ge u_i^k + \frac{\tau}{h^2}\, c_1 (u_{i+1}^k - u_i^k) - \frac{\tau}{h^2}\, c_1 R \tag{24} \]

which together with (21) implies

\[ u_i^{k+1} - u_{i+1}^{k+1} \ge \left( 1 - 2\, \frac{\tau}{h^2}\, c_1 \right) (u_i^k - u_{i+1}^k) - \frac{\tau}{h^2}\, (c_1 + c_2) R. \tag{25} \]

By the hypothesis of the theorem, (14), and (20) we have that

\[ \tau < \frac{h^2}{(c_1 + c_2) R / (u_i^k - u_{i+1}^k) + 2 c_1}, \tag{26} \]

such that the expression on the right-hand side of (25) is nonnegative. Therefore $u_{i+1}^{k+1}$ can become a maximum in $(u^{k+1})$ only if

\[ (u_{i+1}^k - u_i^k)(u_{i+2}^k - u_{i+1}^k) \le \omega^2 h^2 R^2. \tag{27} \]

Analogous reasoning applies to the left neighbour $u_{i-1}^{k+1}$. This means that the maximum property of pixel i can be shifted to one of its neighbours. Our assertion that no new extrema are generated remains true except if both neighbours $u_{i-1}^{k+1}$ and $u_{i+1}^{k+1}$ would simultaneously turn into maxima. Let us therefore discuss this case. This would require that the two inequalities

\[ (u_{i+1}^k - u_i^k)(u_{i+2}^k - u_{i+1}^k) \le \omega^2 h^2 R^2 \quad \text{and} \quad (u_{i-1}^k - u_{i-2}^k)(u_i^k - u_{i-1}^k) \le \omega^2 h^2 R^2 \tag{28} \]

hold at the same time. In this situation, however, $g_{i+1}^k + g_{i+2}^k$ and $g_{i-1}^k + g_{i-2}^k$ are nonnegative, implying

\[ u_{i+1}^{k+1} \le u_{i+1}^k + \tau c_1\, \frac{u_i^k - u_{i+1}^k}{h^2} \quad \text{and} \quad u_{i-1}^{k+1} \le u_{i-1}^k + \tau c_1\, \frac{u_i^k - u_{i-1}^k}{h^2}, \tag{29} \]


while for the central pixel

\[ u_i^{k+1} \ge u_i^k + \tau c_1\, \frac{u_{i-1}^k - 2 u_i^k + u_{i+1}^k}{h^2} \tag{30} \]

holds. Hence,

\[ -u_{i-1}^{k+1} + 2 u_i^{k+1} - u_{i+1}^{k+1} \ge \left( 1 - 2\, \frac{\tau}{h^2}\, c_1 \right) \bigl( -u_{i-1}^k + 2 u_i^k - u_{i+1}^k \bigr). \tag{31} \]

For $\tau \le \frac{h^2}{2 c_1}$, the right-hand side is clearly nonnegative which ensures that $u_{i-1}^{k+1}$ and $u_{i+1}^{k+1}$ cannot both become maxima.

Step 4: Monotonicity is preserved in image segments without extrema

Assume that $u_i^k > u_{i+1}^k > u_{i+2}^k > u_{i+3}^k$. We show that then also $u_{i+1}^{k+1} \ge u_{i+2}^{k+1}$ holds. In the proof we distinguish three cases.

Case 1: $g_i^k + g_{i+1}^k \ge 0$ and $g_{i+2}^k + g_{i+3}^k \ge 0$.
Then

\[ u_{i+1}^{k+1} - u_{i+2}^{k+1} \ge \left( 1 - 2\, \frac{\tau}{h^2}\, c_1 \right) (u_{i+1}^k - u_{i+2}^k) \tag{32} \]

such that the right-hand side is again nonnegative if (17) holds.

Case 2: $g_i^k + g_{i+1}^k \ge 0$ and $g_{i+2}^k + g_{i+3}^k < 0$.
(The case $g_i^k + g_{i+1}^k < 0$ and $g_{i+2}^k + g_{i+3}^k \ge 0$ is treated in a symmetric way.)
From $u_{i+2}^k - u_{i+3}^k \le R$ and $(u_{i+1}^k - u_{i+2}^k)(u_{i+2}^k - u_{i+3}^k) > \omega^2 h^2 R^2$ we obtain

\[ u_{i+1}^k - u_{i+2}^k > \omega^2 h^2 R. \tag{33} \]

Consequently,

\[ \begin{aligned} u_{i+1}^{k+1} - u_{i+2}^{k+1} &\ge u_{i+1}^k - u_{i+2}^k - 2\, \frac{\tau}{h^2}\, c_1 (u_{i+1}^k - u_{i+2}^k) - \frac{\tau}{h^2}\, c_2 (u_{i+2}^k - u_{i+3}^k) \\ &> u_{i+1}^k - u_{i+2}^k - 2\, \frac{\tau}{h^2}\, c_1 (u_{i+1}^k - u_{i+2}^k) - \frac{\tau}{h^2}\, c_2 R. \end{aligned} \tag{34} \]

Due to (33) the right-hand side is certainly nonnegative if

\[ \tau \le \frac{\omega^2 h^4}{2 c_1 \omega^2 h^2 + c_2}. \tag{35} \]

Case 3: $g_i^k + g_{i+1}^k < 0$ and $g_{i+2}^k + g_{i+3}^k < 0$.
Since in this case we have

\[ (u_i^k - u_{i+1}^k) + (u_{i+2}^k - u_{i+3}^k) \le R, \tag{36} \]

it follows that

\[ (u_{i+1}^k - u_{i+2}^k)\, \min\bigl( u_i^k - u_{i+1}^k,\; u_{i+2}^k - u_{i+3}^k \bigr) > \omega^2 h^2 R^2 \tag{37} \]

and thus

\[ u_{i+1}^k - u_{i+2}^k > 2 \omega^2 h^2 R. \tag{38} \]

A similar reasoning as in Case 2 gives that $u_{i+1}^{k+1} - u_{i+2}^{k+1} \ge 0$ is ensured if

\[ \tau \le \frac{\omega^2 h^4}{2 c_1 \omega^2 h^2 + c_2 / 2}. \tag{39} \]


Comparing the bounds derived for the different statements yields (14) as the most restrictive one. If this condition is imposed, extrema cannot be created but only shifted to neighbouring pixels, and monotone segments preserve their monotonicity. Both the maximum–minimum principle and the reduction of total variation follow immediately. This completes the proof. We are convinced that Theorem 4 also possesses a 2-D analogue. The preceding proof, however, does not transfer in a straightforward way to this case: The dependency of g on nonstandard discretisations of ux and uy (cf. (12)) makes it highly cumbersome to control the sign of g.
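The statements of Theorem 4 can also be observed numerically. The following sketch implements the explicit scheme (13) with mirrored dummy pixels and the time step bound (14). The diffusivity `g_example` is a hypothetical choice made only for this illustration (it is not formula (4)); it satisfies the hypotheses of Theorem 4 with c1 = 1 and c2 = 0.25 as long as (ωR)² ≤ 1.5. All function names are our own.

```python
from typing import Callable, List


def g_example(z: float) -> float:
    """Hypothetical FAB-like diffusivity (assumption, not formula (4)).

    g(0) = 1, g decreases monotonically towards -0.25 and becomes
    negative for z > 4; g(z) > 0.25 holds exactly for z < 1.5.
    """
    return (1.0 - 0.25 * z) / (1.0 + z)


def tau_bound(c1: float, c2: float, omega: float, h: float) -> float:
    """Right-hand side of the time step restriction (14)."""
    return omega ** 2 * h ** 4 / (c1 + c2 + 2.0 * c1 * omega ** 2 * h ** 2)


def fab_step(u: List[float], tau: float, h: float,
             g: Callable[[float], float]) -> List[float]:
    """One explicit step of scheme (13) with the nonstandard diffusivity."""
    n = len(u)
    ext = [u[0]] + list(u) + [u[-1]]  # mirrored dummy pixels
    diff = [0.0] * (n + 2)
    for i in range(1, n + 1):
        prod = (ext[i] - ext[i - 1]) * (ext[i + 1] - ext[i]) / (h * h)
        diff[i] = g(max(prod, 0.0))
    # Values in the dummy pixels are irrelevant: their fluxes vanish.
    diff[0], diff[n + 1] = diff[1], diff[n]
    r = tau / (h * h)
    return [
        ext[i]
        + r * 0.5 * (diff[i - 1] + diff[i]) * (ext[i - 1] - ext[i])
        + r * 0.5 * (diff[i + 1] + diff[i]) * (ext[i + 1] - ext[i])
        for i in range(1, n + 1)
    ]


def total_variation(u: List[float]) -> float:
    """Discrete total variation, i.e. either side of (15)."""
    return sum(abs(u[i + 1] - u[i]) for i in range(len(u) - 1))
```

A possible experiment takes a signal with grey values in [0, 1] (so R = 1), pixel width h = 0.1, ω = 1.2 and a time step slightly below `tau_bound(1.0, 0.25, 1.2, 0.1)`. Iterating `fab_step` one can then monitor that the extrema stay within the initial range and that `total_variation` does not increase, although negative diffusivities occur in steep monotone segments.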

6 Summary and Conclusions In spite of its negative diffusivity, FAB diffusion becomes well-posed if a nonstandard space discretisation is used. It guarantees a positive diffusivity in discrete extrema. This result is fundamental for justifying FAB diffusion in a practical setting with digital images. Our ongoing work includes research on the multidimensional fully discrete case as well as extensions of our results to (semi-)implicit time discretisations.

Acknowledgement This work has been initiated during a visit of Guy Gilboa to Saarland University. His visit has been financially supported by the Minerva Foundation.

References

1. Aubert, G., Kornprobst, P.: Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations, 2nd edn. Applied Mathematical Sciences, vol. 147. Springer, New York (2006)
2. Breuß, M., Welk, M.: Staircasing in semidiscrete stabilised inverse diffusion algorithms. Journal of Computational and Applied Mathematics 206, 520–533 (2007)
3. Gilboa, G., Sochen, N.A., Zeevi, Y.Y.: Forward-and-backward diffusion processes for adaptive image enhancement and denoising. IEEE Transactions on Image Processing 11(7), 689–703 (2002)
4. Gilboa, G., Sochen, N.A., Zeevi, Y.Y.: Image sharpening by flows based on triple well potentials. Journal of Mathematical Imaging and Vision 20, 121–131 (2004)
5. Kramer, H.P., Bruckner, J.B.: Iterations of a non-linear transformation for enhancement of digital images. Pattern Recognition 7, 53–58 (1975)
6. Mrázek, P., Weickert, J., Steidl, G.: Diffusion-inspired shrinkage functions and stability results for wavelet denoising. International Journal of Computer Vision 64(2/3), 171–186 (2005)
7. Nikolova, M.: Local strong homogeneity of a regularized estimator. SIAM Journal on Applied Mathematics 61(2), 633–658 (2000)
8. Nikolova, M.: Minimizers of cost-functions involving nonsmooth data fidelity terms. Application to the processing of outliers. SIAM Journal on Numerical Analysis 40(3), 965–994 (2002)
9. Osher, S., Rudin, L.: Shocks and other nonlinear filtering applied to image processing. In: Tescher, A.G. (ed.) Applications of Digital Image Processing XIV. Proceedings of SPIE, vol. 1567, pp. 414–431. SPIE Press, Bellingham (1991)
10. Osher, S., Rudin, L.I.: Feature-oriented image enhancement using shock filters. SIAM Journal on Numerical Analysis 27, 919–940 (1990)
11. Perona, P., Malik, J.: Scale space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 12, 629–639 (1990)
12. Pollak, I., Willsky, A.S., Krim, H.: Image segmentation and edge enhancement with stabilized inverse diffusion equations. IEEE Transactions on Image Processing 9(2), 256–266 (2000)
13. Weickert, J.: Anisotropic Diffusion in Image Processing. Teubner, Stuttgart (1998)
14. Weickert, J., Benhamouda, B.: A semidiscrete nonlinear scale-space theory and its relation to the Perona–Malik paradox. In: Solina, F., Kropatsch, W.G., Klette, R., Bajcsy, R. (eds.) Advances in Computer Vision, pp. 1–10. Springer, Wien (1997)
15. Welk, M., Weickert, J., Galić, I.: Theoretical foundations for spatially discrete 1-D shock filtering. Image and Vision Computing 25(4), 455–463 (2007)

L0-Norm and Total Variation for Wavelet Inpainting

Andy C. Yau¹, Xue-Cheng Tai¹,², and Michael K. Ng³

¹ Division of Mathematical Science, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore
² Mathematics Institute, University of Bergen, Norway
³ Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong

Abstract. In this paper, we suggest an algorithm to recover an image whose wavelet coefficients are partially lost. We propose a wavelet inpainting model by using the L0-norm and total variation (TV) minimization. Traditionally, the L0-norm is replaced by the L1-norm or the L2-norm due to numerical difficulties. We use an alternating minimization technique to overcome these difficulties. In order to improve the numerical efficiency, we also apply a graph cut algorithm to solve the subproblem related to TV minimization. Numerical results will be given to demonstrate the advantages of the proposed algorithm.

1 Introduction

Inpainting refers to filling in the missing "information" in an image. The missing information may lie in the pixel domain, but it may also lie in another domain. Since wavelets play an important role in image compression, some information may be lost when the image is compressed and transmitted, either in terms of pixels or wavelet coefficients. In this work, we consider inpainting the missing wavelet information. Inpainting ideas in digital image processing have been developed for several years. Masnou and Morel [17] solved the inpainting problem by propagating level curves. Bertalmio et al. [2] suggested solving a third-order PDE. Chan and Shen [6] proposed a total variation (TV) inpainting model which uses variational methods in inpainting. Tai et al. [18] suggested an inpainting algorithm that propagates the information into the inpainting domain along the isophote direction by solving a TV-Stokes equation. Chan et al. [5] suggested a unified TV model for inpainting and superresolution. However, all these methods work in the pixel domain only. In the wavelet domain, the situation is totally different. Damage in the wavelet domain produces correlated damage patterns in the pixel domain. Therefore, we cannot recover the damaged image directly in the pixel domain. Chan et al. [7] suggested a wavelet-based TV

The research is supported by MOE (Ministry of Education) Tier II project T207N2202 and IDM project NRF2007IDMIDM002-010. In addition, the support from SUG 20/07 is also gratefully acknowledged.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 539–551, 2009. c Springer-Verlag Berlin Heidelberg 2009


algorithm to recover the image whose wavelet coefficients are lost. They applied TV regularization in the pixel domain to control and restore wavelet coefficients in the wavelet domain. Cai et al. [3] suggested a tight frame based inpainting algorithm. They found the sparse representation of the image by using the smoothed function with the L1-norm and solved the inpainting problem by the projection of a vector onto the convex set. In this paper, we will restore the image by finding the best sparse representation of the image in the wavelet domain with a TV minimization. There are some earlier works on image restoration in the wavelet domain with TV minimization. Chan and Zhou [8] discussed image denoising and compression by combining wavelets and TV minimization. Durand and Froment [11] used TV minimization in conjunction with wavelets to eliminate pseudo oscillations in image restoration. Wang and Zhou [20] suggested a wavelet-based TV minimization to denoise medical images. To find the sparse representation, we apply the L0-norm, which counts the number of nonzero entries of a vector. It is usually used for finding sparse representations. However, minimizing it is a combinatorial problem that is NP-hard [10]. It is hard to solve directly and therefore it has not been used much for real applications. It is usually replaced by the L1-norm. This makes the objective function of the optimization a convex functional, and the problem can be solved more easily. However, the computational cost of finding the solution of such an optimization problem is very high. Mancera and Portilla [16] suggested a method to find sparse representations by the L0-norm directly. They presented a sub-optimal method, that is, looking for the vector with K non-zero coefficients that minimizes the Euclidean distance to the input signal.
In this paper, we will consider the observed image damaged in the wavelet domain and propose a fast algorithm for image restoration by using the L0-norm and TV minimization. We first find the best sparse representation by taking the L0-norm in the wavelet domain and then fill in the missing information by TV minimization solved by a graph cut algorithm [1, 9]. We will present a method to solve the L0-norm minimization directly and efficiently. The paper is organized as follows. In Section 2, we introduce our mathematical model and discuss the method to solve it. Numerical results are given in Section 3 to demonstrate our algorithm. We conclude this paper in Section 4.

2 Mathematical Model

Consider the image domain Ω ⊂ ℝ². Let g be the original image of size n1 × n2. Then the noisy image $\tilde g$ is given by

\[ \tilde g = g + \eta \tag{1} \]

where η is a noise vector. Let W be the wavelet transform that maps an image from the pixel domain to the wavelet domain. Then the wavelet decomposition of $\tilde g$ is given by

\[ \hat g = W g + W \eta. \tag{2} \]


We assume that some of the wavelet coefficients are lost. Let J be the index set indicating the positions at which the wavelet coefficients are known; the rest are lost. Then we define a mapping Π by

\[ \Pi_{i,j} = \begin{cases} 1, & \text{if } (i, j) \in J; \\ 0, & \text{if } (i, j) \notin J. \end{cases} \tag{3} \]

Therefore, the observed image, which is damaged in the wavelet domain, can be written as

\[ \hat g_{ob} = \Pi (W g + W \eta). \tag{4} \]

To restore the image, we consider the following minimization:

\[ \min_u \; \| \Pi (\hat u - \hat g) \|_0 + \beta\, TV(u) \tag{5} \]

where β is a regularization parameter and TV(u) denotes the TV norm of u, which is

\[ TV(u) = \int_\Omega |\nabla u| \, dx. \tag{6} \]

Here and later, for any function defined in the image domain, we will use $\hat u$ to denote its wavelet representation, i.e. $\hat u = W u$. Due to the orthogonality of the wavelet transformation, we always have

\[ \| \hat u \|_2 = \| u \|_2. \tag{7} \]

For simplicity, we use $\| \cdot \|_2$ to denote both the matrix L2 norm and the pixel-domain L2(Ω) norm. The L0-norm measures the differences of wavelet coefficients between the observed image and the resultant image, while the TV norm fills in the missing information of the image in the pixel domain. Minimization problem (5) tries to find an image u whose wavelet coefficients are close to $\hat g$ while its TV norm is minimized in the pixel domain. This problem is not easy to solve. To solve the minimization (5), we introduce an auxiliary function f and an additional fitting term [13, 21] into (5). Then it becomes

\[ \min_{u, f} \; \| \Pi (\hat u - \hat g) \|_0 + \frac{1}{\alpha} \| u - f \|_2^2 + \beta\, TV(f) \;=\; \min_{\hat u} \min_{f} \; \| \Pi (\hat u - \hat g) \|_0 + \frac{1}{\alpha} \| \hat u - \hat f \|_2^2 + \beta\, TV(f). \tag{8} \]

We have used property (7) in the above formulation. The new auxiliary image f is an approximation of u, and the new fitting term $\frac{1}{\alpha} \| u - f \|_2^2$ is used to control the difference between u and f in the pixel domain. When α goes to zero, the image f will go to u. The advantage of the above formulation is that we can solve the minimization (5) by solving two sub-minimization problems, i.e.

\[ \min_{\hat u} \; \| \Pi (\hat u - \hat g) \|_0 + \frac{1}{\alpha} \| \hat u - \hat f \|_2^2, \tag{9} \]

\[ \min_{f} \; \| u - f \|_2^2 + \beta\, TV(f). \tag{10} \]


A.C. Yau, X.-C. Tai, and M.K. Ng

These two minimization problems are coupled. The first finds $u$ for a given $f$, and the second finds $f$ for a given $u$. We use an iterative scheme that minimizes the two sub-problems alternately. For the first problem (9), we show that the solution is given by a simple explicit formula when $\alpha$ approaches zero, which makes this step cheap. The second problem (10) is essentially the ROF model, and we solve it with a fast graph cut algorithm.

In the minimization (9), the functional is separable, so we can minimize with respect to each $\hat u_{i,j}$ separately. This is a very important property. For each $(i,j)$, there are two possible cases for $\hat u_{i,j}$, and we solve the problem by considering both.

Case 1: $\hat u_{i,j} = \hat g_{i,j}$. The value of the objective functional of (9) related to the $(i,j)$-th coefficient is

$$0 + \frac{1}{\alpha}\,|\hat g_{i,j} - \hat f_{i,j}|_2^2. \tag{11}$$

Case 2: $\hat u_{i,j} \neq \hat g_{i,j}$. The value of the objective functional of (9) related to the $(i,j)$-th coefficient is

$$1 + \frac{1}{\alpha}\,|\hat u_{i,j} - \hat f_{i,j}|_2^2. \tag{12}$$

However, $\hat u_{i,j}$ has only two possible choices, either $\hat g_{i,j}$ or $\hat f_{i,j}$. Therefore, in this case, we substitute $\hat u_{i,j} = \hat f_{i,j}$ into (12) and obtain

$$1 + \frac{1}{\alpha}\,|\hat f_{i,j} - \hat f_{i,j}|_2^2 = 1. \tag{13}$$

If $\hat u_{i,j} = \hat g_{i,j}$, the cost (11) must not exceed the cost (13), so the following inequality must hold:

$$\frac{1}{\alpha}\,|\hat u_{i,j} - \hat f_{i,j}|_2^2 \le 1. \tag{14}$$

Therefore, the update scheme for $\hat u_{i,j}$ is

$$\hat u_{i,j} = \begin{cases} \hat g_{i,j}, & \text{if } \frac{1}{\alpha}(\hat g_{i,j} - \hat f_{i,j})^2 \le 1 \text{ and } (i,j) \in J; \\ \hat f_{i,j}, & \text{otherwise.} \end{cases} \tag{15}$$

If $\alpha$ is small enough, the update scheme (15) becomes

$$\hat u_{i,j} = \begin{cases} \hat g_{i,j}, & \text{if } (i,j) \in J; \\ \hat f_{i,j}, & \text{otherwise.} \end{cases} \tag{16}$$
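The two update rules can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `update_coefficients`, the boolean mask `known` (standing for the index set $J$) and the convention that `alpha=None` selects rule (16) are hypothetical choices, not part of the original method description.

```python
import numpy as np

def update_coefficients(g_hat, f_hat, known, alpha=None):
    """Coefficient-wise solution of sub-problem (9).

    g_hat : observed wavelet coefficients (values outside `known` are ignored)
    f_hat : wavelet coefficients of the current auxiliary image f
    known : boolean mask, True where (i, j) belongs to the index set J
    alpha : penalty parameter; None selects the small-alpha rule (16)
    """
    g_hat = np.asarray(g_hat, dtype=float)
    f_hat = np.asarray(f_hat, dtype=float)
    known = np.asarray(known, dtype=bool)
    if alpha is None:
        keep = known  # rule (16): keep every known coefficient
    else:
        # rule (15): keep g_hat only where the fitting cost does not exceed 1
        keep = known & ((g_hat - f_hat) ** 2 / alpha <= 1.0)
    return np.where(keep, g_hat, f_hat)


g_hat = np.array([[4.0, 0.0], [1.0, -2.0]])
f_hat = np.array([[3.5, 0.7], [3.0, -2.1]])
known = np.array([[True, False], [True, True]])

u_small_alpha = update_coefficients(g_hat, f_hat, known)             # rule (16)
u_thresholded = update_coefficients(g_hat, f_hat, known, alpha=1.0)  # rule (15)
```

In this toy example the coefficient at position (1, 0) is known, but it differs from $\hat f$ by 2, so rule (15) with $\alpha = 1$ replaces it by $\hat f_{1,0}$, whereas rule (16) keeps the observed value.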

In case there is noise, we should choose a small $\alpha$ and use (15) to update $\hat u_{i,j}$. Once $\hat u$ is found, we find $f$ by applying the graph cut algorithm to the minimization (10). To use the graph cut algorithm, we need to discretize the TV norm in the minimization. Let $M = \{(i,j) \mid i \in \{1,\dots,n_1\},\ j \in \{1,\dots,n_2\}\}$ be the set of grid points, and let $\delta$ denote the mesh size. We use a special form of the discrete TV norm, as in [1] [4] [19]:

$$TV_k(f) = \sum_{p\in M}\sum_{q\in N_k(p)} \frac{1}{2}\,\omega_{pq}\,|f_p - f_q|, \tag{17}$$

where $N_k(p)$ is the set of neighboring points of a grid point $p \in M$, defined as

$$N_4 = \{(i\pm 1, j),\ (i, j\pm 1) \mid (i,j)\in M\}, \qquad N_8 = \{(i\pm 1, j),\ (i, j\pm 1),\ (i\pm 1, j\pm 1) \mid (i,j)\in M\},$$

and $\omega_{pq} = \dfrac{4\delta^2}{k\,\|p-q\|_2}$. Finally, the minimization (10) can be rewritten in the discrete form

$$\min_{f} \; \sum_{p\in M} |u_p - f_p|^2 + \beta \sum_{p\in M}\sum_{q\in N_k(p)} \frac{1}{2}\,\omega_{pq}\,|f_p - f_q|. \tag{18}$$
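The discrete TV norm (17) can be evaluated directly on an image array. The sketch below is a minimal illustration, assuming the weight $\omega_{pq} = 4\delta^2/(k\|p-q\|_2)$ as reconstructed above and assuming that neighbours falling outside the grid are simply skipped; the function name `discrete_tv` is hypothetical.

```python
import numpy as np

def discrete_tv(f, k=4, delta=1.0):
    """Discrete TV (17): sum over p and q in N_k(p) of 0.5 * w_pq * |f_p - f_q|."""
    f = np.asarray(f, dtype=float)
    offsets = [(1, 0), (0, 1)]            # each unordered pair is visited once
    if k == 8:
        offsets += [(1, 1), (1, -1)]      # diagonal neighbours
    elif k != 4:
        raise ValueError("k must be 4 or 8")
    n1, n2 = f.shape
    total = 0.0
    for di, dj in offsets:
        w = 4.0 * delta ** 2 / (k * np.hypot(di, dj))
        # slices selecting p and q = p + (di, dj), both inside the grid
        rows_p, rows_q = slice(0, n1 - di), slice(di, n1)
        if dj >= 0:
            cols_p, cols_q = slice(0, n2 - dj), slice(dj, n2)
        else:
            cols_p, cols_q = slice(-dj, n2), slice(0, n2 + dj)
        # (17) counts every pair twice with the factor 1/2,
        # so a single visit with the full weight gives the same value
        total += w * np.abs(f[rows_p, cols_p] - f[rows_q, cols_q]).sum()
    return total


# A vertical edge of height 1 across a 4 x 4 image with unit mesh size.
edge = np.tile(np.array([0.0, 0.0, 1.0, 1.0]), (4, 1))
tv4 = discrete_tv(edge, k=4)
tv8 = discrete_tv(edge, k=8)
```

For the 4-neighbourhood with $\delta = 1$ the weight equals 1, so `tv4` is simply the number of horizontal jumps, one per row.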

We assume that the image is in $n$-bit grey scale format, so $f$ can only take values in $\{0, 1, \dots, 2^n - 1\}$. Because of this restriction, we can solve the minimization (10) by a graph cut algorithm.

2.1 Graph Construction

In this subsection, we use the graph cut method to solve the minimization (18). The method consists of two parts: graph construction and finding the minimal cut. Our graph construction is based on the method of Bae and Tai [1], which constructs a 3-dimensional graph. Suppose that the observed image is an $n$-bit gray level image, so that its intensity levels range from 0 to $2^n - 1$. The set of vertices is defined as

$$V = \{v_{p,l} \mid p \in M,\ l \in \{1,\dots,2^n-1\}\} \cup \{s\} \cup \{t\}.$$

Here $s$ and $t$ are the two terminal nodes; we refer to [1] for more details. The edges of the graph are divided into two groups, $E_d$ and $E_r$, that is, $E = E_d \cup E_r$, where

$$E_d = \bigcup_{l=1}^{2^n-2} \{(v_{p,l}, v_{p,l+1}) \mid p\in M\} \;\cup\; \{(s, v_{p,1}) \mid \forall p\in M\} \;\cup\; \{(v_{p,2^n-1}, t) \mid \forall p\in M\};$$

$$E_r = \{(v_{p,l}, v_{q,l}) \mid p\in M,\ q\in N_k(p),\ \forall l\in\{1,\dots,2^n-1\}\}.$$

The costs of the edges in $E_d$ are defined by the data fitting term:

$$c(s, v_{p,1}) = \delta^2 |u_p|^2,$$

$$c(v_{p,l}, v_{p,l+1}) = \delta^2 |u_p - l|^2, \quad \text{where } l \in \{1,\dots,2^n-2\},$$

$$c(v_{p,2^n-1}, t) = \delta^2 |u_p - (2^n-1)|^2.$$
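To make the role of the $E_d$ costs concrete, the chain of edges belonging to a single pixel $p$ can be tabulated as an array indexed by grey level. This is a small sketch only; the helper name `data_edge_costs` and the indexing convention (entry $l$ holds the cost of the edge whose removal corresponds to grey level $l$) are assumptions made for illustration.

```python
import numpy as np

def data_edge_costs(u_p, n_bits, delta=1.0):
    """Costs of the 2**n_bits edges of E_d in the chain of one pixel p.

    Index 0 is the edge (s, v_{p,1}), index l with 1 <= l <= 2^n - 2 is the
    edge (v_{p,l}, v_{p,l+1}), and the last index is the edge (v_{p,2^n-1}, t).
    """
    levels = np.arange(2 ** n_bits)
    return delta ** 2 * (u_p - levels) ** 2


costs = data_edge_costs(u_p=2.3, n_bits=2)
best_level = int(np.argmin(costs))  # cheapest single edge to sever in this chain
```

Without the regularization edges $E_r$, the cheapest edge in each chain would be severed independently, which amounts to rounding $u_p$ to the nearest grey level; the $E_r$ edges couple neighbouring chains.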


We say that a cut is admissible if it severs exactly one edge of $E_d$ for each $p \in M$, in which case exactly $n_1 n_2$ edges from $E_d$ are severed. The cost of an edge in $E_r$ is defined by the TV norm:

$$c(v_{p,l}, v_{q,l}) = \beta\,\omega_{pq}, \quad \text{where } p \in M \text{ and } q \in N_k(p).$$

The construction above gives the 3-dimensional graph $G = (V, E)$, on which we find the minimal cut. A cut on $G$ is a partition of the vertices $V$ into two disjoint sets $(V_s, V_t)$ such that $s \in V_s$ and $t \in V_t$. For a given cut, the set of severed edges $C$ is defined as

$$C = \{(a,b) \mid a \in V_s,\ b \in V_t \text{ and } (a,b) \in E\}. \tag{19}$$

The cut severs the edge $e$ if $e$ is contained in $C$. The cost of the cut is defined as

$$|C| = \sum_{e\in C} c(e). \tag{20}$$

The minimal cut is the cut whose total cost $|C|$ is minimal. As in [1], we associate an $f$ with every admissible cut $C$ on $G$ through the definition

$$f_p = \begin{cases} 0, & \text{if } (s, v_{p,1}) \in C, \\ l, & \text{if } (v_{p,l}, v_{p,l+1}) \in C, \\ 2^n - 1, & \text{if } (v_{p,2^n-1}, t) \in C. \end{cases} \tag{21}$$

It is easy to see the following relation between a cut $C$ and the image function $f$:

$$|C| = \sum_{p\in M} |u_p - f_p|^2 + \beta \sum_{p\in M}\sum_{q\in N_k(p)} \frac{1}{2}\,\omega_{pq}\,|f_p - f_q|, \tag{22}$$

which is the objective function of the minimization (18). So if $C$ is the minimal cut, the objective function attains its minimum with respect to $f$. Therefore, finding the minimal cut on this graph is equivalent to solving the minimization (18).

Besides the graph construction described above, there are other ways to construct the graph. Darbon et al. suggested constructing the graph with one layer only and solving the problem level by level [9]. Ishikawa et al. proposed a similar graph construction with more vertices and edges [15] [14]. Following the method described above, we construct the graph corresponding to the TV minimization (18) and find the minimal cut by applying the push-relabel algorithm [12]. As a result, we can solve the minimization (10).

3 Numerical Results

In the numerical experiments, three images of size 96×96, namely the ‘lena’ image, the ‘bush’ image and a synthetic image, are used for testing; they are shown in Figure 1. We measure the quality of an image by the peak signal-to-noise ratio (PSNR), which is given by

$$PSNR = 10\log_{10}\frac{255^2}{\|u - u_0\|_2^2}, \tag{23}$$

where $u_0$ is the original image and $u$ is the reconstructed image.
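A PSNR routine following (23) might look as follows. The flag `per_pixel` is an assumption added for illustration: with `per_pixel=False` the denominator is the plain squared norm exactly as printed in (23), while `per_pixel=True` divides it by the number of pixels, which is the customary mean-squared-error convention. The text does not state which normalization was used for the reported values.

```python
import numpy as np

def psnr(u, u0, peak=255.0, per_pixel=True):
    """PSNR in dB between a reconstruction u and a reference image u0."""
    u = np.asarray(u, dtype=float)
    u0 = np.asarray(u0, dtype=float)
    err = np.sum((u - u0) ** 2)      # squared L2 norm of the error, as in (23)
    if per_pixel:
        err = err / u0.size          # assumed mean-squared-error normalization
    if err == 0.0:
        return float("inf")          # identical images
    return 10.0 * np.log10(peak ** 2 / err)


reference = np.zeros((2, 2))
restored = np.full((2, 2), 25.5)     # constant error of 25.5 grey levels
value_db = psnr(restored, reference)
```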


In the experiments, we assume that α is small enough that the update scheme (16) can be applied directly. We initialize β with a starting value and decrease β by 1 in each iteration. The reason is that a large β recovers the geometric information, while a small β recovers the details of the image. Figure 2 shows the positions of the missing wavelet coefficients of the observed images. We use the ‘db7’ wavelet in our experiments. The experiments are run in Matlab on a laptop computer with an Intel Core 2 CPU T7200 (2 GHz) and 2 GB of memory. We compare with the algorithms of Chan et al. [7] and use Model 1 and Model 2 to denote their model 1 and model 2.
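The overall alternating scheme with a decreasing β can be sketched as below. Everything beyond the two steps named in the text is an assumption: the wavelet transform pair (`forward`, `inverse`) and the TV solver (`tv_denoise`, standing in for the graph cut step) are passed in as callables, the iteration starts from the zero-filled observation, a fixed number of sweeps `n_iter` replaces a stopping rule, and β is clamped at zero.

```python
import numpy as np

def wavelet_inpaint(g_hat, known, forward, inverse, tv_denoise, beta0, n_iter):
    """Alternate between update (16) and the TV step (10), lowering beta by 1 per sweep."""
    g_hat = np.asarray(g_hat, dtype=float)
    known = np.asarray(known, dtype=bool)
    f = inverse(np.where(known, g_hat, 0.0))   # assumed start: zero-filled observation
    beta = float(beta0)
    for _ in range(n_iter):
        # update (16): known coefficients from g_hat, the rest from the current f
        u_hat = np.where(known, g_hat, forward(f))
        u = inverse(u_hat)
        f = tv_denoise(u, beta)                # approximate minimizer of (10)
        beta = max(beta - 1.0, 0.0)            # continuation in beta (clamp is an assumption)
    return f


# Toy run with identity "transforms" and a do-nothing TV step that records beta.
betas = []

def record_beta(u, beta):
    betas.append(beta)
    return u

g = np.arange(6.0).reshape(2, 3)
mask = np.ones((2, 3), dtype=bool)
result = wavelet_inpaint(g, mask, lambda x: x, lambda x: x, record_beta, beta0=3, n_iter=5)
```

In a real experiment `forward` and `inverse` would be an orthogonal wavelet transform pair such as the ‘db7’ transform mentioned above, and `tv_denoise` would be a solver for (18).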

Fig. 1. Original images: (a) ‘lena’ image, (b) ‘bush’ image and (c) synthetic image

Fig. 2. The position of the missing coefficients of the observed images: (a) 10% of wavelet coefficients are missing and (b) 50% of wavelet coefficients are missing

3.1 Noise Free Image

In the first experiment, we test our algorithm on noise-free images. Figure 3 shows the lena image with 10% of the wavelet coefficients lost and its restored image. The starting β for this case is 60 and the PSNR of the resulting image is 27.80 dB. The results obtained by the algorithms of Chan et al. [7] are also shown in the figure: Figure 3(c) is obtained by their Model 1 with PSNR = 28.26 dB, and Figure 3(d) by their Model 2 with PSNR = 26.70 dB. Figure 4 shows the bush image with 10% of the wavelet coefficients lost and its restored image. The starting β for this case is 55 and the PSNR of the resulting image is 29.54 dB. Figure 4(c) is obtained by Model 1 with PSNR = 26.89 dB, and Figure 4(d) by Model 2 with PSNR = 25.65 dB. Figure 5 shows the bush image with 50% of the wavelet coefficients lost and its restored image. The starting β for this case is 60 and the PSNR of the resulting image is 19.48 dB. Figure 5(c) is obtained by Model 1 with PSNR = 17.92 dB, and Figure 5(d) by Model 2 with PSNR = 18.22 dB. Figure 6 shows the synthetic image with 50% of the wavelet coefficients lost and its restored image. The starting β for this case is 60 and the PSNR of the resulting image is 23.79 dB. Figure 6(c) is obtained by Model 1 with PSNR = 26.04 dB, and Figure 6(d) by Model 2 with PSNR = 22.38 dB. Table 1 summarizes the results of this experiment.

Fig. 3. The image with 10% of wavelet coefficients lost: (a) observed image (PSNR = 11.84 dB) and (b) restored image (PSNR = 27.80 dB). (c) and (d) are obtained by the Chan et al. algorithms in [7].

Fig. 4. The image with 10% of wavelet coefficients lost: (a) observed image (PSNR = 16.08 dB) and (b) restored image (PSNR = 29.54 dB). (c) and (d) are obtained by the Chan et al. algorithms in [7].

Fig. 5. The image with 50% of wavelet coefficients lost: (a) observed image (PSNR = 11.00 dB) and (b) restored image (PSNR = 19.48 dB). (c) and (d) are obtained by the Chan et al. algorithms in [7].

Fig. 6. The image with 50% of wavelet coefficients lost: (a) observed image (PSNR = 10.02 dB) and (b) restored image (PSNR = 23.79 dB). (c) and (d) are obtained by the Chan et al. algorithms in [7].

Table 1. Comparison of noise free cases (PSNR in dB)

| Image | Missing coef. | Obs. image | Our alg. | Model 1 | Model 2 |
|---|---|---|---|---|---|
| Lena image | 10% | 11.84 | 27.80 | 28.26 | 26.70 |
| Bush image | 10% | 16.08 | 29.54 | 26.89 | 25.65 |
| Bush image | 50% | 11.00 | 19.48 | 17.92 | 18.22 |
| Synthetic image | 50% | 10.02 | 23.79 | 26.04 | 22.38 |

Fig. 7. Noisy (a) ‘lena’ image, (b) ‘bush’ image and (c) synthetic image

The experimental results show that our method obtains a higher PSNR than Model 2 in all four cases of Table 1. For the Bush image, our result is also better in PSNR than that of Model 1. The resulting images show that our method keeps more details in the restored image than the other methods. In Figure 6, the small circle in the upper right corner is clearer in our result than in the other images.

3.2 Noisy Image

Fig. 8. The image with 10% of wavelet coefficients lost: (a) observed image (PSNR = 11.28 dB) and (b) restored image (PSNR = 22.37 dB). (c) and (d) are obtained by the Chan et al. algorithms in [7].

Fig. 9. The image with 10% of wavelet coefficients lost: (a) observed image (PSNR = 15.93 dB) and (b) restored image (PSNR = 26.22 dB). (c) and (d) are obtained by the Chan et al. algorithms in [7].

Fig. 10. The image with 50% of wavelet coefficients lost: (a) observed image (PSNR = 10.76 dB) and (b) restored image (PSNR = 17.54 dB). (c) and (d) are obtained by the Chan et al. algorithms in [7].

Fig. 11. The image with 50% of wavelet coefficients lost: (a) observed image (PSNR = 9.85 dB) and (b) restored image (PSNR = 19.06 dB). (c) and (d) are obtained by the Chan et al. algorithms in [7].

Table 2. Comparison of noisy cases (PSNR in dB)

| Image | Missing coef. | Obs. image | Our alg. | Model 1 | Model 2 |
|---|---|---|---|---|---|
| Lena image | 10% | 11.28 | 22.37 | 17.79 | 20.67 |
| Bush image | 10% | 15.93 | 26.22 | 24.47 | 22.48 |
| Bush image | 50% | 10.76 | 17.54 | 15.36 | 15.90 |
| Synthetic image | 50% | 9.85 | 19.06 | 15.84 | 18.95 |

In the second experiment, we test our algorithm on noisy images. We add white Gaussian noise with σ = 0.01 to the original images; the noisy images are shown in Figure 7. Figure 8 shows the lena image with 10% of the wavelet coefficients lost and its restored image. The starting β for this case is 65. We obtained the best image when β = 23, with a PSNR of 22.37 dB. The PSNR of Figure 8(c), obtained with Model 1, is 17.79 dB, and that of Figure 8(d), obtained with Model 2, is 20.67 dB. Figure 9 shows the bush image with 10% of the wavelet coefficients lost and its restored image. The starting β for this case is 50 and the PSNR of the resulting image is 26.22 dB. The PSNR of Figure 9(c), obtained with Model 1, is 24.47 dB, and that of Figure 9(d), obtained with Model 2, is 22.48 dB. Figure 10 shows the bush image with 50% of the wavelet coefficients lost and its restored image. The starting β for this case is 60 and the PSNR of the resulting image is 17.54 dB. The PSNR values of the restored images are 15.36 dB with Model 1 (Figure 10(c)) and 15.90 dB with Model 2 (Figure 10(d)). Figure 11 shows the synthetic image with 50% of the wavelet coefficients lost and its restored image. The starting β for this case is 60 and the PSNR of the resulting image is 19.06 dB. The PSNR values of the restored images are 15.84 dB with Model 1 (Figure 11(c)) and 18.95 dB with Model 2 (Figure 11(d)). Table 2 summarizes the results of this experiment. This experiment is more difficult because noise is present in the observed image. Table 2 shows that our restored images have a higher PSNR than those of the other two methods in all four cases. The resulting images show that our method removes the noise while keeping more details of the image. In Figure 10, our result keeps more details than the other results: the head of Bush retains a better shape in our result than in the other images.

4 Conclusion

In this paper, we introduce an algorithm for solving the wavelet inpainting problem. We apply the L0-norm to fit the wavelet coefficients and TV minimization to fill in the missing information. We suggest a method that treats the L0-norm directly: we solve the minimization (5) by introducing one more fitting term and splitting it into the two minimizations (9) and (10). The minimization (9) handles the L0-norm and the minimization (10) handles the TV norm. We apply the graph cut algorithm to solve the TV minimization. The experimental results show that our algorithm obtains a higher PSNR than the two models of [7] in most of the tested cases, and in all of the noisy cases.


References

1. Bae, E., Tai, X.C.: Graph Cuts for the Multiphase Mumford-Shah Model Using Piecewise Constant Level Set Methods. UCLA, Applied Mathematics, CAM-report-08-36 (2008)
2. Bertalmio, M., Sapiro, G., Caselles, V., Ballester, C.: Image inpainting. Technical report, ECE-University of Minnesota 60, 259–268 (1999)
3. Cai, J.F., Chan, R.H., Shen, Z.: A Framelet-Based Image Inpainting Algorithm. Appl. Comput. Harmon. Anal. 24, 131–149 (2008)
4. Chambolle, A.: Total variation minimization and a class of binary MRF models. In: Rangarajan, A., Vemuri, B.C., Yuille, A.L. (eds.) EMMCVPR 2005. LNCS, vol. 3757, pp. 136–152. Springer, Heidelberg (2005)
5. Chan, T.F., Ng, M.K., Yau, A.C., Yip, A.M.: Superresolution image reconstruction using fast inpainting algorithms. Applied and Computational Harmonic Analysis 23(1), 3–24 (2007)
6. Chan, T., Shen, J.: Mathematical models for local non-texture inpainting. SIAM Journal on Applied Mathematics 62, 1019–1043 (2001)
7. Chan, T., Shen, J., Zhou, H.M.: Total Variation Wavelet Inpainting. Journal of Mathematical Imaging and Vision 25(1), 107–125 (2006)
8. Chan, T., Zhou, H.M.: Optimal Constructions of Wavelet Coefficients Using Total Variation Regularization in Image Compression. UCLA, Applied Mathematics, CAM Report, No. 00–27 (2000)
9. Darbon, J., Sigelle, M.: Image restoration with discrete constrained total variation part I: Fast and exact optimization. J. Math. Imaging Vis. 26(3), 261–276 (2006)
10. Donoho, D.L.: For Most Large Underdetermined Systems of Linear Equations the Minimal l1-norm Solution is also the Sparsest Solution. Communications on Pure and Applied Mathematics 59(7), 903–934 (2006)
11. Durand, S., Froment, J.: Artifact Free Signal Denoising with Wavelets. In: Proceedings of ICASSP 2001, vol. 6, pp. 3685–3688 (2001)
12. Goldberg, A.V., Tarjan, R.E.: A new approach to the maximum-flow problem. J. ACM 35(4), 921–940 (1988)
13. Huang, Y., Ng, M.K., Wen, Y.: A Fast Total Variation Minimization Method for Image Restoration. Multiscale Modeling & Simulation 7(2), 774–795 (2008)
14. Ishikawa, H.: Exact optimization for Markov random fields with convex priors. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(10), 1333–1336 (2003)
15. Ishikawa, H., Geiger, D.: Segmentation by grouping junctions. In: CVPR 1998: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, DC, USA. IEEE Computer Society, Los Alamitos (1998)
16. Mancera, L., Portilla, J.: L0-Norm-Based Sparse Representation Through Alternate Projections. In: IEEE International Conference on Image Processing, Atlanta, pp. 2089–2092 (2006)
17. Masnou, S., Morel, J.: Level-lines based disocclusion. In: Proc. 5th IEEE Int. Conf. on Image Process., Chicago, pp. 259–263 (1998)
18. Tai, X.C., Osher, S., Holm, R.: Image Inpainting Using a TV-Stokes Equation. In: Image Processing based on partial differential equations, pp. 3–22. Springer, Heidelberg (2006)
19. Ranchin, F., Chambolle, A., Dibos, F.: Total Variation Minimization and Graph Cuts for Moving Objects Segmentation. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 743–753. Springer, Heidelberg (2007)
20. Wang, Y., Zhou, H.: Total Variation Wavelet-Based Medical Image Denoising. International Journal of Biomedical Imaging 2006, 1–6 (2006)
21. Wang, Y., Yang, J., Yin, W., Zhang, Y.: A New Alternating Minimization Algorithm for Total Variation Image Reconstruction. SIAM J. Imaging Science 1(3), 248–272 (2008)

Total-Variation Based Piecewise Affine Regularization

Jing Yuan1, Christoph Schnörr1, and Gabriele Steidl2

1 Image and Pattern Analysis Group, Dept. Mathematics and Computer Science, University of Heidelberg, Germany
{yuanjing,schnoerr}@math.uni-heidelberg.de
2 Appl. Math. Comp. Sci. Group, Dept. Mathematics and Computer Science, University of Mannheim
[email protected]

Abstract. In this paper, we introduce a novel second-order regularizer, the Aﬃne Total-Variation term, to capture the geometry of piecewise aﬃne functions. The approach can be characterized by two convex decompositions of a given image into piecewise aﬃne structure and texture and noise, respectively. A convergent multiplier-based method is presented for computing a global optimum by computationally cheap iterative steps. Experiments with images and vector ﬁelds validate our approach and illustrate the diﬀerence to classical TV denoising and decomposition.

1 Introduction

1.1 Overview and Motivation

In this paper, we suggest and investigate a novel second-order regularization term,

$$TV_a(u) := \int_\Omega \Big( \sqrt{u_{xx}^2 + u_{yx}^2} + \sqrt{u_{xy}^2 + u_{yy}^2} \Big)\,dx, \tag{1}$$

called Affine Total Variation, for denoising and decomposing functions into piecewise affine structures. Our work has been motivated by the basic total variation approach [15] to the piecewise constant regularization of functions, henceforth called the ROF-model, and a recent extension of this approach suggested in [23] to the piecewise harmonic regularization of vector fields. The latter approach demonstrates that by modifying the usual total variation term

$$TV(u) = \int_\Omega |\nabla u|\,dx, \tag{2}$$

flows can be restored and decomposed into richer structure than merely piecewise constant functions, which model only a narrow subclass of real signals sufficiently accurately. At the same time, the basic structure of the ROF-model from the viewpoint of convex optimization has been preserved, such that standard methods from convex programming lead to efficient algorithms.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 552–564, 2009. © Springer-Verlag Berlin Heidelberg 2009


While the work [23] was motivated by ﬂows related to image sequences from experimental ﬂuid dynamics, our present work investigates piecewise aﬃne regularization as an alternative to the piecewise harmonic case studied in [23]. Figure 1 shows the result of applying the novel regularizer (1) to a noisy image function. Our approach returns a denoised version of the input data with the piecewise aﬃne structures preserved well. From the viewpoint of optimization, our approach has the same simple structure as the ROF-model. From the viewpoint of algorithm design, however, a bit more work is required to be able to resort to standard algorithms, due to the second-order partial derivatives appearing in (1).

Fig. 1. From left to right. Noisy input image f , denoised image u using the regularizer (1), and the diﬀerence between the original noise-free image and the denoised image. Up to local errors at discontinuities, this latter image is almost constant which means that the piecewise aﬃne structure underlying the noisy input data has been successfully restored.

1.2 Related Work and Contribution

Related work. Applying the standard TV-term (2) to general, not necessarily piecewise constant signals and images leads to the well-known staircasing effect, that is, to many jumps of the minimizing functions, which makes the decomposition of the input data useless for signal interpretation. In this connection, higher-order regularization has been studied in the literature. In [1], Chambolle and Lions propose an inf-convolution of the total-variation term and a functional based on the second-order derivatives:

$$R(u) = \min_{u=v+w} \int_\Omega |\nabla v|\,dx + \alpha \int_\Omega \sqrt{w_{xx}^2 + w_{yx}^2 + w_{xy}^2 + w_{yy}^2}\,dx.$$

A corresponding asymptotical case was studied in [16]. Chan et al. [3] adaptively add the Laplacian as regularizing term, or replace the second summand in the inf-convolution by the Laplacian in [2], to avoid staircasing. After mollifying the TV-measure, $TV(u) \approx \int_\Omega \sqrt{|\nabla u|^2 + \varepsilon}\,dx$, $\varepsilon \ll 1$, the corresponding Euler-Lagrange equation is iteratively solved by the lagged-diffusivity fixed-point method (cf. [19]). Likewise, You and Kaveh [20] and Didas et al. [5] investigate Laplacians $\Delta u$ and variations thereof as argument of one convex functional. In [11], Lysaker and Tai provide two regularizers

$$R_1(u) = \int_\Omega |u_{xx}| + |u_{yy}|\,dx, \tag{3}$$

$$R_2(u) = \int_\Omega \sqrt{u_{xx}^2 + u_{yx}^2 + u_{xy}^2 + u_{yy}^2}\,dx, \tag{4}$$

which are used in a PDE-based image diffusion process to avoid the staircase effect in smooth regions; a fourth-order numerical scheme is given. In [12], Lysaker and Tai further introduce the convex combination of a high-order regularizer and the classical total-variation term. The functional (3) was also considered in [8]. In [13], Rahman, Tai and Osher suggested a two-step high-order image denoising method, which first computes a denoised tangential field $\tau = (\tau_1, \tau_2)^t$, i.e. $\operatorname{div}\tau = 0$, by applying the regularizer $\int |\nabla \tau|\,dx$, which is actually equivalent to (4) for the image scalar field, and then reconstructs the image gray-values by fitting the resulting normal field $n = (\tau_2, -\tau_1)$ through

$$\min_u \int_\Omega \Big( |\nabla u| - \nabla u \cdot \frac{n}{|n|} \Big)\,dx, \quad \text{s.t.} \quad \int_\Omega (u - f)^2\,dx = \sigma^2.$$

Basically, the energy functionals used in our approaches possess the same structure as those in [11], except for the nonsmooth high-order regularizer that is applied, and the functional optimized in (13b) is similar to the tangential-smoothing step suggested in [13], except that our approach smooths the curl-free gradient field rather than the div-free field. In connection with optical flow estimation, Trobin et al. [18] adopt from [4] the second-order term

$$t(u) := \frac{1}{\sqrt{3}}\Big( \Delta u,\ \sqrt{2}\,(u_{xx} - u_{yy}),\ \sqrt{8}\,u_{xy} \Big)^T,$$

and use the corresponding TV-term $\int_\Omega \sqrt{t(u)\cdot t(u)}\,dx$ for flow estimation. The derivation of $t(u)$ in [4] is based on Fourier transforms and motivated by designing local detectors for ridges and valleys of image functions, say. As a consequence, the corresponding TV-term appears not to be a proper basis for piecewise affine decomposition, and boundaries are not treated adequately (as is clearly visible e.g. in Fig. 2f in [18]).

Contribution. Our contribution consists in devising a novel regularization term (1) that provides a mathematically precise solution to the problem of denoising piecewise affine signals. Staircasing is suppressed as well, and an augmented Lagrangian based problem decomposition is derived that makes it possible to compute a global optimum by iterating computationally simple steps. Numerical experiments are presented mainly to illustrate and validate properties of the approach.

2 Subspaces and Orthogonal Decompositions

We let $\Omega \subset \mathbb{R}^2$ denote an open bounded and simply-connected domain with Lipschitz-continuous boundary $\partial\Omega$. For scalar-valued functions, we denote by $|\cdot|_p$, $1 \le p < \infty$, the usual $L^p(\Omega)$ norm and by $\langle\cdot,\cdot\rangle$ the $L^2(\Omega)$ inner product. For vector-valued functions $g = (g_1, g_2)^T$, we set $\|g\|_p := \big|\sqrt{g_1^2 + g_2^2}\big|_p$ and $\langle g, h\rangle_\Omega := \langle g_1, h_1\rangle + \langle g_2, h_2\rangle$. Further, we use the notation $\bar u := |\Omega|^{-1}\int_\Omega u\,dx$ for the average of $u$ and $\nabla u := (u_x, u_y)^T$, $\nabla^\perp u := (u_y, -u_x)^T$, $\operatorname{div} g := g_{1x} + g_{2y}$ and $\operatorname{curl} g := g_{1y} - g_{2x}$. Let $H^1(\Omega)$ denote the Sobolev space with the inner product

$$\langle u, v\rangle_{H^1} := \langle \nabla u, \nabla v\rangle_\Omega + \bar u\,\bar v \tag{5}$$

and let $H_0^1(\Omega) := \{u \in H^1(\Omega) : u|_{\partial\Omega} = 0\}$. We are interested in the space

$$H(\Omega) := \big\{u \in H^1(\Omega) : \partial_n u|_{\partial\Omega} = (\bar u_x, \bar u_y)^T \cdot n\big\},$$

where $n$ denotes the outer unit normal vector at the boundary $\partial\Omega$. By the following proposition, we can decompose functions $u \in H(\Omega)$ into a globally affine component $u_a$ and an oscillating part $u_o$.

Proposition 1. The space $H(\Omega)$ admits the orthogonal decomposition

$$H(\Omega) = H_a(\Omega) \oplus_{H^1} H_o(\Omega), \tag{6a}$$

$$H_a(\Omega) := \big\{u \in H(\Omega) : \nabla u = (\bar u_x, \bar u_y)^T\big\}, \tag{6b}$$

$$H_o(\Omega) := \big\{u \in H(\Omega) : \bar u = \bar u_x = \bar u_y = 0,\ \partial_n u|_{\partial\Omega} = 0\big\}. \tag{6c}$$

Proof. For any $u \in H(\Omega)$, let $u_{ox} := u_x - \bar u_x$ and $u_{oy} := u_y - \bar u_y$. Then $u_a := \bar u_x x + \bar u_y y + \bar u \in H_a(\Omega)$, and the function $u_o$ defined by its partial derivatives $u_{ox}$, $u_{oy}$ and by $\bar u_o = 0$ belongs to $H_o(\Omega)$. Moreover, we have that $u = u_a + u_o$. The orthogonality of the decomposition follows by

$$\langle u_a, u_o\rangle_{H^1} = \langle \nabla u_a, \nabla u_o\rangle_\Omega + \bar u_a \bar u_o = \bar u_x \int_\Omega u_{ox}\,dx + \bar u_y \int_\Omega u_{oy}\,dx = 0. \qquad \Box$$

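The Helmholtz decomposition of vector fields (see [6, 22, 21], also for the discrete setting) is given by $L^2(\Omega)^2 = \nabla H^1(\Omega) \oplus \nabla^\perp H_0^1(\Omega)$, where the spaces can also be characterized by $\nabla H^1(\Omega) = \{v \in L^2(\Omega)^2 : \operatorname{curl} v = 0\}$ and $\nabla^\perp H_0^1(\Omega) = \{v \in L^2(\Omega)^2 : \operatorname{div} v = 0,\ v\cdot n|_{\partial\Omega} = 0\}$. We will need the space

$$V(\Omega) := \{v \in L^2(\Omega)^2 : v\cdot n|_{\partial\Omega} = 0,\ \bar v_1 = \bar v_2 = 0\}. \tag{7}$$

By the Helmholtz decomposition, this space admits the orthogonal decomposition

$$V(\Omega) = V_\nabla(\Omega) \oplus V_{\nabla^\perp}(\Omega), \tag{8}$$

where $V_\nabla(\Omega) := \{v \in \nabla H^1(\Omega) : v\cdot n|_{\partial\Omega} = 0,\ \bar v_1 = \bar v_2 = 0\}$ and $V_{\nabla^\perp}(\Omega) := \{v \in \nabla^\perp H_0^1(\Omega) : \bar v_1 = \bar v_2 = 0\}$.

Proposition 2. For every vector field $v \in V_\nabla(\Omega)$, there is a unique function $u_o \in H_o(\Omega)$ with $v = \nabla u_o$.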


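Proof. By definition, for any $v \in V_\nabla(\Omega)$ there exists $u \in H^1(\Omega)$ such that $v = \nabla u$. Then we see that $v\cdot n|_{\partial\Omega} = \partial_n u|_{\partial\Omega} = 0$ and $\bar v_1 = \bar u_x = 0$, $\bar v_2 = \bar u_y = 0$. On the other hand, $u_o \in H_o$ is uniquely determined by the Neumann problem

$$\Delta u_o = \operatorname{div} v, \qquad \partial_n u_o|_{\partial\Omega} = 0, \qquad \bar u_o = 0. \tag{9}$$

3 Variational Approaches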

In the rest of this paper, we follow the first discretize, then optimize paradigm, yet adopt the usual (continuous) notation that is easier to read. Accordingly, all operators like $\nabla$, $\operatorname{div}$ etc. denote linear mappings between finite dimensional spaces, $|\cdot|_p$ are the usual $\ell_p$ norms, and for $g := (g_i)_{i=1}^n$, $g_i \in \mathbb{R}^2$, we set $\|g\|_p := \big|(|g_i|_2)_{i=1}^n\big|_p$. In the following, we denote by $\delta_C$ the indicator function of a convex set $C$, i.e. $\delta_C(x) := 0$ if $x \in C$ and $\delta_C(x) := \infty$ otherwise, and by $P_C$ the orthogonal projector onto $C$. We exhibit the effect of the regularizer (1) by computing dual representations of the optimization problems (13), in accordance with the dual formulation of the ROF-model. In general, if $g : \mathbb{R}^n \to \mathbb{R}$ and $\Phi : \mathbb{R}^m \to \mathbb{R}$ are proper, closed convex functions and $D : \mathbb{R}^n \to \mathbb{R}^m$ is a linear operator, then the following problem (P) has the dual (D):

$$(P)\quad \inf_{u\in\mathbb{R}^n} \{g(u) + \Phi(Du)\}, \qquad\qquad (D)\quad -\inf_{p\in\mathbb{R}^m} \{g^*(-D^*p) + \Phi^*(p)\},$$

where $g^*$ denotes the conjugate function of $g$. For the problems considered in the following, it can be shown that solutions of the primal and dual problem exist and that the duality gap is zero.

Rudin-Osher-Fatemi (ROF) Model. We recall some basic formulas as a reference for our approach presented below. The ROF-model reads

$$\inf_u \; \frac{1}{2}|f - u|_2^2 + \alpha\,TV(u), \qquad \alpha\,TV(u) := \sigma_{C_\alpha}(u), \tag{10}$$

where $C_\alpha := \operatorname{div} B_\alpha$, $B_\alpha := \{p : \|p\|_\infty \le \alpha\}$. Let $\hat u$ denote the minimizer of (10). Setting $g(u) := \frac{1}{2}|f - u|_2^2$, $D := I$ and $\Phi(u) := \alpha\,TV(u)$, and noting that $g^*(v) = \frac{1}{2}|f + v|_2^2 - \frac{1}{2}|f|_2^2$ and $\Phi^*(v) = \delta_{C_\alpha}(v)$, the dual problem reads

$$-\inf_{v\in C_\alpha} \Big\{ \frac{1}{2}|f - v|_2^2 - \frac{1}{2}|f|_2^2 \Big\}, \tag{11}$$

where we have replaced $v$ by $-v$ by the symmetry of $C_\alpha$. Consequently, if

$$\hat p := \operatorname*{argmin}_{p\in B_\alpha} \frac{1}{2}|f - \operatorname{div} p|_2^2, \tag{12}$$

then $\hat v := \operatorname{div}\hat p = P_{C_\alpha}(f)$ is the minimizer of the dual problem. Primal and dual solutions are related by the optimality condition $f - \operatorname{div}\hat p = \hat u$, which in turn yields the image decomposition $f = \hat u + \hat v$.

Affine Variational Models. Based on the regularizer (1) we consider two variational approaches:

$$\inf_u \; \frac{1}{2}|u - f|_2^2 + \alpha\,TV_a(u), \tag{13a}$$

$$\inf_u \; \frac{1}{2}|u - f|_{H^1}^2 + \alpha\,TV_a(u). \tag{13b}$$

These approaches differ in the data term, which is the usual one in case of (13a), whereas the data term in (13b) is induced by the discrete counterpart of the inner product (5).

3.1 Variational Approach (13a)

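We introduce an auxiliary vector field $v$ in order to express the regularizer (1) in terms of the ordinary TV-measure defined in (2). Then approach (13a) reads

$$\inf_{u,v} \; \frac{1}{2}|f - u|_2^2 + \alpha\,TV(v_1) + \alpha\,TV(v_2), \quad \text{subject to} \quad v = \nabla u, \tag{14}$$

i.e.,

$$\inf_{u,v} \big\{ g(u,v) + \Phi\big(D(u,v)^T\big) \big\}$$

with $g(u,v) := \frac{1}{2}|f - u|_2^2 + \alpha\,TV(v_1) + \alpha\,TV(v_2)$, $\Phi := \delta_0$, $D := (\nabla\ \ {-I})$. Since $g^*(r,s) = \frac{1}{2}|f + r|_2^2 - \frac{1}{2}|f|_2^2 + \delta_{C_\alpha}(s_1) + \delta_{C_\alpha}(s_2)$, $\Phi^* \equiv 0$ and $-D^* = \begin{pmatrix}\operatorname{div}\\ I\end{pmatrix}$, the dual problem becomes

$$-\inf_{q\in C_\alpha^2} g^*\Big(\begin{pmatrix}\operatorname{div}\\ I\end{pmatrix} q\Big) = -\inf_{q\in C_\alpha^2} \Big\{ \frac{1}{2}|f - \operatorname{div} q|_2^2 - \frac{1}{2}|f|_2^2 \Big\}.$$

This formulation parallels the dual formulation (11) of the ROF-model. Let

$$\hat q := \operatorname*{argmin}_{q\in C_\alpha^2} \frac{1}{2}|f - \operatorname{div} q|_2^2. \tag{15}$$

The higher-order TV regularization becomes apparent through the texture part of $f$, which is defined by the orthogonal projection $\hat v = \operatorname{div}\hat q = P_{\operatorname{div} C_\alpha^2}(f)$ onto a different convex set. An alternative, more explicit characterization of the regularization effect of (1) in terms of the auxiliary field $v = \nabla u$ is obtained by reformulating (13a) as

$$\inf_v \; G(v) + \alpha\big(TV(v_1) + TV(v_2)\big), \quad \text{where} \quad G(v) := \inf_{u,\,\nabla u = v} \frac{1}{2}|u - f|_2^2. \tag{16}$$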


Exploiting strong duality again, we obtain that

$$G(v) = -\inf_p \Big\{ \frac{1}{2}|f - \operatorname{div} p|_2^2 - \frac{1}{2}|f|_2^2 - \langle v, p\rangle_\Omega \Big\}. \tag{17}$$

Fermat's rule yields that the minimizer $\hat p$ has to fulfill $\nabla\operatorname{div}\hat p = \nabla f - v$ and, in turn, $\Delta(\operatorname{div}\hat p) = \Delta f - \operatorname{div} v$. Insertion into $G(v)$ in (17) yields for (16) (omitting the constant)

$$\inf_v \; \frac{1}{2}\big|\Delta^{-1}(\Delta f - \operatorname{div} v)\big|_2^2 + \alpha\big(TV(v_1) + TV(v_2)\big). \tag{18}$$

This representation of (13a) and (15), respectively, shows that the edge image $\Delta f$ is approximated by the divergence of a piecewise smooth vector field $v$ in terms of the $|\cdot|_{\Delta^{-2}}$-norm. Clearly, inserting $v = \nabla u$ into $\Delta^{-1}(\Delta f - \operatorname{div} v)$ yields $\frac{1}{2}|f - u|_2^2$ from (13a).

3.2 Variational Approach (13b)

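The data term of problem (13b) decomposes according to the orthogonal decomposition (6a). By construction, the affine component $u_a$ of $u = u_a + u_o$ is not affected by the regularizer. Thus, $\hat u_a = f_a$, where $f_a$ can be computed in a preprocessing step. It remains to minimize

$$\inf_{u_o, v} \; \frac{1}{2} \| \nabla f_o - v \|_2^2 + \alpha \big( \mathrm{TV}(v_1) + \mathrm{TV}(v_2) \big), \quad \text{subject to } v = \nabla u_o.$$

Due to Prop. 2 the linear constraint can be expressed as $\delta_{V_\nabla}(v)$. Reasoning similar to the previous section yields

$$\begin{aligned}
& \sup_w \Big\{ \langle w, \nabla f_o - v \rangle_\Omega - \frac{1}{2} \| w \|_2^2 \Big\} + \sup_{q \in C_\alpha^2} \langle q, -v \rangle_\Omega + \delta_{V_\nabla}(v) \\
&\quad = \sup_{w,\, q \in C_\alpha^2} \Big\{ \langle w + q, -v \rangle_\Omega + \delta_{V_\nabla}(v) + \langle w, \nabla f_o \rangle_\Omega - \frac{1}{2} \| w \|_2^2 \Big\}.
\end{aligned}$$

Interchanging inf and sup and taking $\inf_v$ (ignoring constants), we obtain

$$\inf_{w,\, q \in C_\alpha^2} \; \frac{1}{2} \| \nabla f_o - w \|_2^2 \quad \text{subject to} \quad w + q \in V_\nabla^\perp. \qquad (19)$$

The minimizer $\bar w$ is obviously an element of $V_\nabla$, which together with the constraints $q \in C_\alpha^2$, $w + q \in V_\nabla^\perp$ leads to the reformulation of (19)

$$\inf_{q \in P_\nabla(C_\alpha^2)} \; \frac{1}{2} \| \nabla f_o - q \|_2^2. \qquad (20)$$

Here $P_\nabla$ denotes the orthogonal projector onto the subspace $V_\nabla$. To compare this approach with (15), we rewrite (20) as

$$\inf_{q \in P_\nabla(C_\alpha^2)} \frac{1}{2} \big\| \nabla (f_o - \Delta^{-1} \operatorname{div} q) \big\|_2^2 = \inf_{q \in P_\nabla(C_\alpha^2)} \frac{1}{2} \big\| \nabla \Delta^{-1} (\Delta f_o - \operatorname{div} q) \big\|_2^2, \qquad (21)$$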


where $\Delta^{-1}$ stands for the solution operator of problem (9). Approach (15), on the other hand, is given by

$$\inf_{q \in C_\alpha^2} \; |f_a + (f_o - \operatorname{div} q)|_2^2. \qquad (22)$$

Taking into account the representation of vector fields $q \in V_\nabla$ by potential functions $\phi_q$ in terms of $q = \nabla \phi_q$, viz. $\operatorname{div} q = \Delta \phi_q$ (Prop. 2), we see that (21) focuses on the decomposition of the edge set $\Delta f$, whereas (22) decomposes $f$ and does not discriminate between the two components $f_a$ and $f_o$. Comparing (21) with (18), on the other hand, indicates how regularization of the large-scale structural components of $f$ is accomplished by (21) in terms of the small-scale texture component $\phi_q$, by taking the gradient (after smoothing with $\Delta^{-1}$) and projecting onto a suitable set $P_\nabla(C_\alpha^2)$.

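4 Optimization

In this section we specify algorithms for computing a global minimum of (13a) and (13b), respectively. We apply an alternating version of the split Bregman algorithm [7]. Note that the split Bregman algorithm coincides with the augmented Lagrangian method applied to the primal problem [14] and that its alternating version is just a Douglas-Rachford splitting for the dual problem [17]. The convergence properties of this technique are well known.

4.1 Algorithm Minimizing (13a)

The split Bregman algorithm for (14) reads

$$\begin{aligned}
(u^{(k+1)}, v^{(k+1)}) &= \operatorname*{argmin}_{u, v} \Big\{ \frac{1}{2} |f - u|_2^2 + \alpha \big( \mathrm{TV}(v_1) + \mathrm{TV}(v_2) \big) + \frac{1}{2\tau} \| \nabla u - v \|_2^2 + \langle b^{(k)}, \nabla u - v \rangle_\Omega \Big\}, \\
b^{(k+1)} &= b^{(k)} + \frac{1}{\tau} \big( \nabla u^{(k+1)} - v^{(k+1)} \big).
\end{aligned}$$

Alternating the minimization of $u^{(k+1)}$ and $v^{(k+1)}$ we obtain

$$\begin{aligned}
v^{(k+1)} &= \operatorname*{argmin}_v \; \alpha \big( \mathrm{TV}(v_1) + \mathrm{TV}(v_2) \big) + \frac{1}{2\tau} \| \nabla u^{(k)} + \tau b^{(k)} - v \|_2^2, \\
u^{(k+1)} &= \operatorname*{argmin}_u \; \frac{1}{2} |f - u|_2^2 + \frac{1}{2\tau} \| \nabla u + \tau b^{(k)} - v^{(k+1)} \|_2^2.
\end{aligned}$$

Then $v^{(k+1)}$ follows as in the ROF approach by

$$v^{(k+1)} = \nabla u^{(k)} + \tau b^{(k)} - P_{C_{\alpha\tau}^2} \big( \nabla u^{(k)} + \tau b^{(k)} \big)$$

and $u^{(k+1)}$ can be computed by setting the gradient to zero:

$$u^{(k+1)} = \Big( I - \frac{1}{\tau} \Delta \Big)^{-1} \Big( f - \operatorname{div} \Big( \frac{1}{\tau} v^{(k+1)} - b^{(k)} \Big) \Big).$$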
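The $u$-update can be checked by writing out the optimality condition of the quadratic subproblem. The following is a worked sketch, assuming the adjoint relation $\nabla^* = -\operatorname{div}$ (so that $\Delta = \operatorname{div} \nabla$) under the boundary conditions used in the paper; it is not part of the original derivation.

$$\begin{aligned}
0 &= (u - f) + \frac{1}{\tau} \nabla^* \big( \nabla u + \tau b^{(k)} - v^{(k+1)} \big)
   = (u - f) - \frac{1}{\tau} \Delta u - \operatorname{div} b^{(k)} + \frac{1}{\tau} \operatorname{div} v^{(k+1)} \\
&\Longleftrightarrow \quad \Big( I - \frac{1}{\tau} \Delta \Big) u = f - \operatorname{div} \Big( \frac{1}{\tau} v^{(k+1)} - b^{(k)} \Big).
\end{aligned}$$

Since $-\Delta$ is positive semidefinite, the operator $I - \frac{1}{\tau} \Delta$ is invertible for every $\tau > 0$, so the update is well defined.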


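Algorithm. Initialization: $b^{(0)} = 0$ and $u^{(0)} = f$. For $k = 0, 1, \dots$ iterate until a convergence criterion is reached:

$$\begin{aligned}
w^{(k+1)} &:= \nabla u^{(k)} + \tau b^{(k)} \\
v^{(k+1)} &:= w^{(k+1)} - P_{C_{\alpha\tau}^2} \big( w^{(k+1)} \big) \\
u^{(k+1)} &:= \Big( I - \frac{1}{\tau} \Delta \Big)^{-1} \Big( f - \operatorname{div} \Big( \frac{1}{\tau} v^{(k+1)} - b^{(k)} \Big) \Big) \\
b^{(k+1)} &:= b^{(k)} + \frac{1}{\tau} \big( \nabla u^{(k+1)} - v^{(k+1)} \big)
\end{aligned}$$

4.2 Algorithm Minimizing (13b)

Based on the derivation in Section 3.2, we consider

$$\inf_{u_o, v} \; \frac{1}{2} \| \nabla f_o - \nabla u_o \|_2^2 + \alpha \big( \mathrm{TV}(v_1) + \mathrm{TV}(v_2) \big), \quad \text{subject to } v = \nabla u_o,$$

and have to iterate

$$\begin{aligned}
(u^{(k+1)}, v^{(k+1)}) &= \operatorname*{argmin}_{u, v} \Big\{ \frac{1}{2} \| \nabla f_o - \nabla u \|_2^2 + \alpha \big( \mathrm{TV}(v_1) + \mathrm{TV}(v_2) \big) + \frac{1}{2\tau} \| \nabla u - v \|_2^2 + \langle b^{(k)}, \nabla u - v \rangle_\Omega \Big\}, \\
b^{(k+1)} &= b^{(k)} + \frac{1}{\tau} \big( \nabla u^{(k+1)} - v^{(k+1)} \big).
\end{aligned}$$

Alternating the first minimization process we obtain the following algorithm.

Algorithm. Initialization: $b^{(0)} = 0$ and $u^{(0)} = f$. For $k = 0, 1, \dots$ iterate until a convergence criterion is reached:

$$\begin{aligned}
w^{(k+1)} &:= \nabla u^{(k)} + \tau b^{(k)} \\
v^{(k+1)} &:= w^{(k+1)} - P_{C_{\alpha\tau}^2} \big( w^{(k+1)} \big) \\
u^{(k+1)} &:= \frac{\tau}{1 + \tau} \Delta^{-1} \operatorname{div} \Big( \nabla f_o + \Big( \frac{1}{\tau} v^{(k+1)} - b^{(k)} \Big) \Big) \\
b^{(k+1)} &:= b^{(k)} + \frac{1}{\tau} \big( \nabla u^{(k+1)} - v^{(k+1)} \big)
\end{aligned}$$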
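To make the iteration of Section 4.1 concrete, the following is a minimal 1D sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the paper discretizes with mimetic finite differences in 2D, whereas here the gradient is a forward difference, the divergence is its negative transpose, and the projection step is replaced by an inner projected-gradient loop on the dual of the 1D TV problem. All function names (`diff`, `diff_t`, `tv_prox`, `solve_u`, `piecewise_affine_denoise`) are hypothetical.

```python
def diff(u):
    """Forward difference, the discrete gradient: (Du)[i] = u[i+1] - u[i]."""
    return [u[i + 1] - u[i] for i in range(len(u) - 1)]


def diff_t(p):
    """Transpose of diff; the discrete divergence is -diff_t."""
    padded = [0.0] + list(p) + [0.0]
    return [padded[i] - padded[i + 1] for i in range(len(p) + 1)]


def tv_prox(w, lam, n_iter=200):
    """Approximate argmin_v lam*TV(v) + 0.5*|w - v|^2, i.e. w minus its
    projection, by projected gradient steps on the dual variable p."""
    p = [0.0] * (len(w) - 1)
    for _ in range(n_iter):
        r = [wi - ci for wi, ci in zip(w, diff_t(p))]
        p = [min(lam, max(-lam, pi + 0.25 * gi)) for pi, gi in zip(p, diff(r))]
    return [wi - ci for wi, ci in zip(w, diff_t(p))]


def solve_u(rhs, tau):
    """Solve (I - Laplacian/tau) u = rhs with the Thomas algorithm; the
    matrix I + D^T D / tau is tridiagonal and diagonally dominant."""
    n = len(rhs)
    off = -1.0 / tau
    diag = [1.0 + (1.0 if i in (0, n - 1) else 2.0) / tau for i in range(n)]
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = off / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - off * c[i - 1]
        c[i] = off / m
        d[i] = (rhs[i] - off * d[i - 1]) / m
    u = [0.0] * n
    u[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        u[i] = d[i] - c[i] * u[i + 1]
    return u


def piecewise_affine_denoise(f, alpha, tau=1.0, n_outer=50):
    """Alternating split Bregman iteration for the 1D analogue of (13a)."""
    u = list(f)
    b = [0.0] * (len(f) - 1)
    for _ in range(n_outer):
        w = [g + tau * bi for g, bi in zip(diff(u), b)]
        v = tv_prox(w, alpha * tau)
        s = diff_t([vi / tau - bi for vi, bi in zip(v, b)])  # -div(v/tau - b)
        u = solve_u([fi + si for fi, si in zip(f, s)], tau)
        b = [bi + (g - vi) / tau for bi, g, vi in zip(b, diff(u), v)]
    return u
```

A useful sanity check is that an exactly affine signal is a fixed point of the iteration, because its gradient is constant and therefore has zero total variation. The inner loop count and the step size 0.25 are illustrative choices and would need tuning in practice.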

5 Numerical Experiments

In this section we illustrate the properties of our approach with a few numerical experiments. The mimetic finite difference method [9, 10] is used for discretizing the relevant scalar and vector fields, and a detailed implementation of the nonlinear functionals is given in [21]. With this numerical scheme, the relevant boundary conditions are preserved and turn out to be compatible with the corresponding integral identities.

Signals. Figure 2 shows that our approach (13b) effectively removes noise without the staircasing effect, in contrast to the ROF model. We also point out that boundaries are treated without introducing artifacts.


Fig. 2. Ground truth and noisy input data are shown in the first two graphs, respectively. Standard TV regularization (ROF model) leads to the well-known staircasing effect (see 3rd picture). Piecewise affine TV regularization effectively removes noise and recovers the piecewise affine signal structure (see 4th picture).

Variational approach (13a) versus (13b). Figure 3 compares the minimizers of the two variational approaches (13a) and (13b) for an arbitrary image section. The last two pictures of Figure 3 depict 3D plots of the minimizers subtracted from the original image section. The 5th plot, corresponding to approach (13b), clearly indicates an approximation "error" that is not noticeable in the 4th plot, corresponding to (13a). This result confirms the discussion above of the formal differences between equations (21) and (22): the $|\cdot|_{H^1}^2$-based data term is more sensitive to large noise levels due to the noise amplification by partial derivatives.

Fig. 3. From left to right: original image section, minimizer of (13a), minimizer of (13b), and 3D plots of the minimizers subtracted from the original data, which illustrate a major difference between the variational approaches (13a) and (13b). While the 4th plot shows almost pure noise, the rightmost plot indicates an estimation error due to using the $|\cdot|_{H^1}^2$ data term, which is sensitive to large noise levels.

Denoising of vector fields. Figure 4 compares the standard TV regularization (ROF model) with piecewise affine TV regularization for the denoising of vector fields. The input data simulate estimates obtained for a moving camera in a scene with moving objects. This scenario is roughly represented by a piecewise planar layout of the scene. The numerical results confirm again that our approach returns useful estimates of both the denoised vector fields and their discontinuities, while the ROF model only returns discontinuities but no useful vector field estimates.


Fig. 4. Top: color-coded motion field corresponding to a moving camera and static as well as moving objects represented by sections of planes; ground truth (1st fig.), input data (2nd fig.), the ROF-based result (3rd fig.) and the result based on affine regularization (13a) (4th fig.). Last two rows: components of $\nabla u_1$ and $\nabla u_2$ for the ROF model (2nd row) and for piecewise affine regularization (3rd row). The result on the right illustrates that no staircasing effect occurs with piecewise affine regularization, thus enabling both discontinuity detection and motion estimation, while the latter is not feasible for such scenarios with the standard ROF model.

6 Conclusion

We presented a novel convex variational approach to the denoising and the decomposition of signals, images and vector fields. Based on a suitable orthogonal decomposition of the underlying vector space, a TV measure comprising second-order derivatives was introduced that makes it possible to denoise noisy input data and to preserve piecewise affine signal structure using standard algorithms of convex programming. The latter are computationally simple due to a problem decomposition employing the augmented Lagrangian and primal and dual variables. By deriving dual variational formulations akin to the ROF model, differences between first- and second-order regularization and between two alternative data terms were worked out. Numerical experiments confirm these findings.


References

1. Chambolle, A., Lions, P.L.: Image recovery via total variation minimization and related problems. Numer. Math. 76(2), 167–188 (1997)
2. Chan, T., Esedoglu, S., Park, F.E.: Image decomposition combining staircase reduction and texture extraction. J. Visual Communication and Image Representation 18(6), 468–486 (2007)
3. Chan, T., Marquina, A., Mulet, P.: Higher-order total variation-based image restoration. SIAM J. Sci. Comput. 22(2), 503–516 (2000)
4. Danielsson, P.E., Lin, Q.: Efficient detection of second-degree variations in 2D and 3D images. J. Vis. Comm. Image Repr. 12, 255–305 (2001)
5. Didas, S., Setzer, S., Steidl, G.: Combined ℓ2 data and gradient fitting in conjunction with ℓ1 regularization. Advances in Computational Mathematics 30(1), 79–99 (2009)
6. Girault, V., Raviart, P.-A.: Finite Element Methods for Navier-Stokes Equations. Springer, Heidelberg (1986)
7. Goldstein, D., Osher, S.: The Split Bregman method for l1 regularized problems. UCLA CAM Report (2008)
8. Hintermüller, W., Kunisch, K.: Total bounded variation regularization as a bilaterally constraint optimization problem. SIAM J. Appl. Math. 64(4), 1311–1333 (2004)
9. Hyman, J.M., Shashkov, M.: Natural discretizations for the divergence, gradient, and curl on logically rectangular grids. Comput. Math. Appl. 33(4), 81–104 (1997)
10. Hyman, J.M., Shashkov, M.: Adjoint operators for the natural discretizations of the divergence, gradient and curl on logically rectangular grids. Appl. Numer. Math. 25(4), 413–442 (1997)
11. Lysaker, M., Lundervold, A., Tai, X.C.: Noise removal using fourth-order partial differential equation with applications to medical magnetic resonance images in space and time. IEEE Trans. Image Processing 12(12), 1579–1590 (2003)
12. Lysaker, M., Tai, X.C.: Iterative image restoration combining total variation minimization and a second-order functional. International Journal of Computer Vision 66(1), 5–18 (2006)
13. Rahman, T., Tai, X.C., Osher, S.J.: A TV-stokes denoising algorithm. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 473–483. Springer, Heidelberg (2007)
14. Rockafellar, R.T.: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2), 97–116 (1976)
15. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
16. Scherzer, O.: Denoising with higher order derivatives of bounded variation and an application to parameter estimation. Computing 60, 1–27 (1998)
17. Setzer, S.: Split Bregman algorithm, Douglas-Rachford splitting and frame shrinkage. In: Lie, K.A., Lysaker, M., Morken, K., Tai, X.C. (eds.) Scale Space and Variational Methods. LNCS. Springer, Heidelberg (2009)
18. Trobin, W., Pock, T., Cremers, D., Bischof, H.: An unbiased second-order prior for high-accuracy motion estimation. In: Rigoll, G. (ed.) DAGM 2008. LNCS, vol. 5096, pp. 296–405. Springer, Heidelberg (2008)
19. Vogel, C.R.: Computational Methods for Inverse Problems. SIAM, Philadelphia (2002)
20. You, Y.L., Kaveh, M.: Fourth-order partial differential equations for noise removal. IEEE Trans. Image Processing 9(10), 1723–1730 (2000)
21. Yuan, J., Schnörr, C., Steidl, G.: Simultaneous optical flow estimation and decomposition. SIAM J. Scientific Computing 29(6), 2283–2304 (2007)
22. Yuan, J., Schnörr, C., Memin, E.: Discrete orthogonal decomposition and variational fluid flow estimation. J. Math. Imaging and Vision 28(1), 67–80 (2007)
23. Yuan, J., Schnörr, C., Steidl, G.: Convex hodge decomposition and regularization of image flows. J. Math. Imag. Vision 33(2), 169–177 (2009)

Image Denoising by Harmonic Mean Curvature Flow

Mourad Zéraï

Laboratory for Mathematical and Numerical Modeling in Engineering Science, National Engineering School at Tunis, ENIT-LAMSIN, B.P. 37, 1002 Tunis Belvédère, Tunisia
[email protected]

Abstract. We propose a noise-removal method for vector-valued images by considering the negative gradient flow (the biharmonic map heat flow) of the intrinsic bi-energy on a Riemannian manifold of non-positive curvature. This method represents a natural generalization of both harmonic maps and minimal immersions. It is derived by finding the critical points of the variational problem associated with the integral of the squared norm of the tension field (bi-harmonic map) or of the mean curvature vector field (bi-minimal immersion). In local coordinates, this method yields a fourth-order nonlinear system of PDEs that we solve numerically by an explicit finite difference method. Experiments on real color images endowed with the Helmholtz and Stiles metrics show that the proposed method is effective, accurate and highly robust.

1 Introduction

Let $(D, g)$ be a flat 2D image domain endowed with the metric $g$ and mapped into a coordinate manifold $(V, h)$, which can be, for instance, a color RGB space endowed with the color metric $h$. Consider the energy

$$E_2(u) = \frac{1}{2} \int_D |\tau(u)|^2 \, d\mu_g, \qquad (1)$$

where $\mu_g$ is the area measure on $D$ endowed with the metric $g$ and $\tau(u) = \operatorname{tr}_g \nabla du$ is the tension vector field, vanishing for critical points of the Dirichlet energy (i.e. harmonic maps),

$$E(u) = \frac{1}{2} \int_D |du|^2 \, d\mu_g. \qquad (2)$$

In local coordinates, it takes the form:

$$\tau^\alpha(u) = g^{ij} \Big( \frac{\partial^2 u^\alpha}{\partial x^i \partial x^j} - {}^D\Gamma^k_{ij} \frac{\partial u^\alpha}{\partial x^k} + {}^V\Gamma^\alpha_{\beta\gamma} \frac{\partial u^\beta}{\partial x^i} \frac{\partial u^\gamma}{\partial x^j} \Big), \qquad (3)$$

where ${}^D\Gamma^k_{ij}$ and ${}^V\Gamma^\alpha_{\beta\gamma}$ are the Christoffel symbols of the Levi-Civita connections on $(D, g)$ and $(V, h)$.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 565–575, 2009. © Springer-Verlag Berlin Heidelberg 2009


Critical points of $E_2(u)$ are called biharmonic maps. The Euler-Lagrange operator attached to the bienergy (1), called the bitension field and derived by Jiang [16], is

$$\tau_2(u) = -\Delta_g \tau(u) - \operatorname{tr} R^V(du, \tau(u)) \, du, \qquad (4)$$

where $R^V$ is the curvature tensor of $V$. The corresponding gradient flow is given by the geometric evolution problem

$$\partial_t u = -\tau_2(u). \qquad (5)$$

Jiang also proved that, in the case of a target manifold $V$ with non-positive curvature, every biharmonic map is harmonic, which is the case for almost all coordinate manifolds (excluding the directional ones) that we are concerned with in image processing. In the same way, if we denote by $\mathrm{Imm}(D, V)$ the space of Riemannian immersions in $(V, h)$, then a Riemannian immersion $u : (D, g) \to (V, h)$ is called minimal if it is a critical point of the volume functional

$$V : \mathrm{Imm}(D, V) \to \mathbb{R}, \qquad V(u) = \frac{1}{2} \int_D d\mu_{u^*h},$$

where $u^*h$ is the pull-back metric and $\mu_{u^*h}$ is the induced area measure on $D$. The corresponding Euler-Lagrange equation is $H = 0$, where $H$ is the mean curvature vector field. We recall a fact established by Eells and Sampson [7] that will be of great importance in the sequel: if $u$ is an immersion, then its mean curvature is, up to a constant, the trace of the second fundamental form $\operatorname{tr}_g \nabla du$. As suggested by Eells and Sampson in their seminal paper [7], a natural generalization of harmonic maps can be given by considering the critical points of the functional obtained by integrating the square of the norm of the tension field, i.e.

$$E_2(u) = \frac{1}{2} \int_D |\tau(u)|^2 \, d\mu_g.$$

Critical points of the functional obtained by integrating the square of the norm of the mean curvature vector field, which is known in the literature as the (generalized) Willmore functional, can represent a possible generalization of minimal immersions. More precisely, biminimal immersions (or Willmore immersions) are critical points of the Willmore functional (see [21] for a short survey on those topics):

$$W(u) = \int_D (|H|^2 + K) \, d\mu_{u^*h}, \qquad (6)$$

where $K$ is the sectional curvature of $(V, h)$ restricted to the image of $D$. Historically, this functional appears in the context of an embedding of a surface $\Sigma$ in the three-dimensional Euclidean space $\mathbb{R}^3$, and consequently with a vanishing sectional curvature $K$. As noticed by Willmore in [34], it was Weiner who, in 1978, added the curvature term $K$ to the integrand when he considered immersions of orientable surfaces into a Riemannian manifold of constant sectional curvature [32].


The Willmore functional appears already in earlier work of Germain [11]. It was then considered in the early twentieth century in various works by Thomsen [27], and subsequently by Blaschke [2]. It was reintroduced, and more systematically studied within the framework of the conformal geometry of surfaces, by Willmore in 1965 [33]. The Willmore functional also plays an important role in various areas of science. In molecular biology, it is known as the Helfrich model [13], where it appears as a surface energy for lipid bilayers. In solid mechanics, the Willmore functional arises as the limit energy for thin-plate theory (see [9]). In general relativity, this functional appears as the main term in the expression of the Hawking quasilocal mass (see [12] and [15]). In image processing, the Willmore functional has been used since 2004, when Droske and Rumpf employed it in a level-set formalism (see [6]) for the restoration of damaged regions of a surface. Recently, Clarenz et al. [5] covered a hole and reconstructed a surface by minimizing the Willmore energy functional with a finite element implementation, leading to smooth surface patches with guaranteed continuity properties. In this paper, we use the generalized Willmore functional (6) in the context of manifold-valued image processing. We are interested in a noise-removal method for multi-channel images, taking color images as a typical representative of this class of images. More precisely, we use flat color metrics, i.e., metrics with vanishing curvature tensor. Namely, we use the Helmholtz and Stiles (flat) metrics, and consequently the curvature term disappears from (6). This simplifies the expression of the derived flow, which is a nonlinear parabolic system of PDEs,

$$\partial_t u = \Delta_g H, \qquad (7)$$

since the Euler-Lagrange equation associated with (6) reduces to $\Delta_g H = 0$. Here $g = u^*h$ and $\Delta_g$ stands for the rough Laplacian, i.e.:

$$(\Delta_g H)^\alpha = \frac{1}{\sqrt{\det g}} \frac{\partial}{\partial x^i} \Big( \sqrt{\det g} \, g^{ij} \frac{\partial H^\alpha}{\partial x^j} \Big) + {}^V\Gamma^\alpha_{\beta\gamma} \frac{\partial H^\beta}{\partial x^i} \frac{\partial H^\gamma}{\partial x^j} g^{ij}. \qquad (8)$$

Following the denomination used by Chen in [4], we call the flow (7) harmonic mean curvature flow (HMCF); it represents an extension of the notion of harmonic mean curvature from the Euclidean setting to the Riemannian one. We refer to [1] for more details about this topic in the Euclidean setting. We will tackle the problem with non-flat color metrics, such as the Schrödinger or Vos-Walraven metrics, in future work. We note that Sochen and Zeevi [25] have already used the Vos-Walraven line element in a Riemannian setting for processing color images with the Beltrami flow, which yields a second-order nonlinear parabolic system of PDEs. Finally, one can remark, formally, the apparent similarity between the two HMCFs in the Euclidean and (flat) Riemannian settings, since it is well known that every flat Riemannian n-dimensional manifold is locally isometric to $\mathbb{R}^n$ (see [10], p. 109, for instance).

1.1 Related Works

In local coordinates, the gradient descent of the Euler-Lagrange equation of the minimization problem related to (6) yields a fourth-order nonlinear system of PDEs, and thus our method can be classified in the family of fourth-order parabolic equations for image denoising. This family of methods has gained considerable importance in the last few years. Indeed, many nonlinear PDEs have been proposed to deal with the tradeoff between noise removal and edge preservation. Among them, the fourth-order parabolic PDEs have drawn great interest. In general, the forms of fourth-order PDEs are analogous to the second-order ones. For example, the fourth-order equation proposed by You and Kaveh [35],

$$u_t = -\Delta(g(\Delta u) \Delta u), \qquad (9)$$

or the equation proposed by Wei [30] ut = −∇(g(|∇u|)∇u),

(10)

are Perona-Malik analogue; the equation proposed by Tumblin and Turk’s in [28] ut = ∇(g(Dij u)∇u),

(11)

where g is a function of the second derivatives of the image intensity function u, is a fourth order possible analogue to the anisotropic diﬀusion equation of Weickert [31], and ﬁnally the equation proposed by Lysaker et al [19] u ut = , (12) |u| is similar to Total Variation model [23].

2 Color-Image as Typical Example of Vector-Valued Image Processing

Since the beginning of quantized color vision theories in the 19th century, two approaches have appeared. On one side is the Young-Helmholtz trichromatic approach, which is physically oriented and compatible with the science of colorimetry. On the other side is the opponent approach, which is mainly based on color sensation. In this paper, we are mainly interested in the geometrical trichromatic approach of Young-Helmholtz. This approach has many good computational characteristics but also many physiological limitations (for a nice discussion of this topic see [3], for instance). In this vein, many approaches were proposed through different line-element theories. The notion of line element is nothing but the metric associated with the color manifold. Among these line-element theories we mention Helmholtz [14], Schrödinger [24], Stiles [26] and Vos-Walraven [29]. In the geometrical trichromatic approach, a color image is considered as 3 images: Red, Green, and Blue (or their many other transformations) that are composed into one. These three channels represent a limited domain in the three-dimensional Euclidean space $\mathbb{R}^3$, which we endow with a metric derived from the expression of the considered line element. With this construction we can consider the color space as a Riemannian (sub-)manifold. The different line elements (or metrics) proposed in the literature are derived from two main considerations:

– The first consideration is what Ron Kimmel has called inductive [18]. In this case, the line elements are established by simple assumptions on the visual response mechanisms. All models of this category assume that the color space can be simplified and represented as a Riemannian space of non-positive curvature. Some of the proposed metrics have the effect of flattening the color space, like those of Helmholtz or Stiles. The others have the effect of warping (negatively) the color space, like the Schrödinger or Vos-Walraven metrics. Roughly speaking, we can see a manifold of negative curvature as a generalization of Euclidean space (which is flat) in the sense that if two geodesic lines start from the same point but in different directions, they will never cross again (which is not true in a manifold with positive curvature, like the sphere). This characteristic ensures some nice properties, like uniqueness results in minimization problems [17].
– The second class of line elements is derived from empirical considerations. In this category, the metric coefficients are determined to fit empirical data. Among them, some describe a Euclidean space, like CIELAB (CIE 1976 (L*a*b*)) [18]; some others, like MacAdam [20], are based on an effective arclength.

2.1 Vector-Valued Image as Isometric Immersions

Let $(D, g)$ be a flat 2D image domain embedded in a coordinate manifold $(V, h)$, which can be, for instance, a color RGB space endowed with the color metric $h$, and let $(\tilde V, \tilde h) = (\mathbb{R}^2 \oplus V, \mathrm{can} \oplus h)$, where

– can is the canonical metric of $\mathbb{R}^2$,
– $V$ is an n-dimensional manifold equipped with the metric $h$ and modeling the coordinate space formed by the (vector) channels of an image $u$ (the three-dimensional RGB space for a color image, for instance),
– $\tilde V$ is the direct sum of $\mathbb{R}^2$ and $V$,
– and $\tilde h = \mathrm{can} \oplus h$.

A vector-valued image can be described mathematically as an isometric immersion $(x^1, x^2) \mapsto u = (x^1, x^2; v^1(x^1, x^2), \dots, v^n(x^1, x^2))$ of a two-dimensional domain $D$ in the trivial fiber bundle $\mathbb{R}^2 \times (V, h)$, which is a $(2 + n)$-dimensional manifold. The image manifold and its metric $(D, g)$ are called the space of parameters in the dynamical systems community; the target manifold and its metric $(\tilde V, \tilde h)$ are called the space of coordinates. The metric $\tilde h$ of $\tilde V$ is then given by

$$d\tilde s^2 = ds^2_{\mathrm{spatial}} + \beta^2 \, ds^2_{\mathrm{vector}}, \qquad (13)$$

where $\beta$ is the relative scale between the spatial coordinates and the intensity components, which we will set equal to one for the sake of simplicity. We can rewrite the metric defined by (13) as the quadratic form

$$d\tilde s^2 = (dx^1)^2 + (dx^2)^2 + (dv)^T h \, (dv),$$

where $v = (v^i)$. The corresponding metric tensor is

$$\tilde h = \begin{pmatrix} I_2 & 0_{2,n} \\ 0_{n,2} & h \end{pmatrix},$$

where $h$ is the metric tensor of $V$. Since the image is an isometric immersion, we have $g = u^* \tilde h$. Therefore

$$g_{\alpha\beta} = \delta_{\alpha\beta} + h_{ij} \, \partial_\alpha v^i \, \partial_\beta v^j, \qquad \alpha, \beta = 1, 2, \quad i, j = 1, \dots, n, \qquad (14)$$

where $n = 3$ if we deal with color images.
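Equation (14) can be evaluated pixel by pixel once the channel derivatives are available. The sketch below is a minimal illustration in plain Python, not the author's code: the helper names `induced_metric` and `inverse_2x2` are hypothetical, and the channel derivatives and the color metric tensor at the pixel are assumed to be given as plain lists.

```python
def induced_metric(vx, vy, h):
    """Return the 2x2 induced metric g of Eq. (14) at one pixel.

    vx, vy: lists of the n channel derivatives dv^i/dx^1 and dv^i/dx^2.
    h: n x n color metric tensor (list of lists) evaluated at that pixel.
    """
    def pair(a, b):
        # Bilinear form h_ij a^i b^j
        return sum(h[i][j] * a[i] * b[j]
                   for i in range(len(a)) for j in range(len(b)))

    return [[1.0 + pair(vx, vx), pair(vx, vy)],
            [pair(vy, vx), 1.0 + pair(vy, vy)]]


def inverse_2x2(g):
    """Inverse metric (g^ij); det g >= 1 because h is positive definite."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]
```

In a flat region all derivatives vanish and $g$ is the identity. Across an edge in the $x^1$ direction, $g_{11}$ grows and the entry $g^{11}$ of the inverse shrinks, which damps diffusion across the edge.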

3 Main Examples of Flat Color-Metric

3.1 Helmholtz's Color Metric

Hermann von Helmholtz (1821–1894) was the first to attempt a mathematical formulation of the distance between colors through the concept of line element. He defined the following line element:

$$ds^2 = \Big( \frac{dR}{R} \Big)^2 + \Big( \frac{dG}{G} \Big)^2 + \Big( \frac{dB}{B} \Big)^2, \qquad (15)$$

where $R$, $G$ and $B$ are the three color channels: red, green and blue. In local coordinates, this can be expressed as a positive definite symmetric matrix:

$$(h_{ij})_{i,j=1,2,3} = \begin{pmatrix} \frac{1}{x_1^2} & 0 & 0 \\ 0 & \frac{1}{x_2^2} & 0 \\ 0 & 0 & \frac{1}{x_3^2} \end{pmatrix}, \qquad (16)$$

where we use the coordinate notation $x_1 = R$, $x_2 = G$ and $x_3 = B$. The color space is defined as a domain $D$ in the positive orthant $\mathbb{R}^3_+$ defined by

$$\mathbb{R}^3_+ = \{ x \in \mathbb{R}^3 \mid x_i > 0, \; i = 1, 2, 3 \}. \qquad (17)$$

Having the expression of the metric, we can now give the Christoffel symbols using the formula

$$\Gamma^i_{jk} = \frac{1}{2} h^{il} \Big( \frac{\partial h_{lj}}{\partial x_k} + \frac{\partial h_{lk}}{\partial x_j} - \frac{\partial h_{jk}}{\partial x_l} \Big), \qquad (18)$$

and hence the non-vanishing Christoffel symbols are

$$\Gamma^1_{11} = -\frac{1}{R}, \qquad \Gamma^2_{22} = -\frac{1}{G}, \qquad \Gamma^3_{33} = -\frac{1}{B}.$$

A simple computation shows that the color manifold endowed with the Helmholtz metric is flat.
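The flatness claim can be made explicit by a change of coordinates. The logarithmic coordinates $y_i$ below are our notation, introduced only for this worked sketch; they do not appear in the original text.

$$y_i = \ln x_i \;\Longrightarrow\; dy_i = \frac{dx_i}{x_i} \;\Longrightarrow\; ds^2 = \sum_{i=1}^{3} \Big( \frac{dx_i}{x_i} \Big)^2 = dy_1^2 + dy_2^2 + dy_3^2 .$$

In the coordinates $(y_1, y_2, y_3)$ the Helmholtz line element is therefore the Euclidean one, so its curvature tensor vanishes identically on the positive orthant.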

3.2 Stiles' Color Metric

Walter W. Stiles modified Helmholtz's proposal in order to better account for observations of threshold values (see [26], p. 660). Thus he proposed the following form of color metric:

$$(ds)^2 = \Big( \frac{\zeta(R)}{\rho} \, dR \Big)^2 + \Big( \frac{\zeta(G)}{\gamma} \, dG \Big)^2 + \Big( \frac{\zeta(B)}{\beta} \, dB \Big)^2,$$

where

$$\zeta(R) = \frac{9}{1 + 9R}, \qquad \zeta(G) = \frac{9}{1 + 9G}, \qquad \zeta(B) = \frac{9}{1 + 9B}.$$

The functions $\zeta(R)$, $\zeta(G)$ and $\zeta(B)$ are determined experimentally. The constants $\rho$, $\gamma$ and $\beta$ are proportional to the limiting Weber fractions of the three cone responses at high luminances, and Stiles obtained the following values:

$$\rho = 1.28, \qquad \gamma = 1.65, \qquad \beta = 7.25.$$

At high luminances, Stiles' metric reduces to

$$(ds)^2 = \Big( \frac{dR}{\rho R} \Big)^2 + \Big( \frac{dG}{\gamma G} \Big)^2 + \Big( \frac{dB}{\beta B} \Big)^2,$$

and in this form its relationship with Helmholtz's metric is obvious. With the same notations as in the previous section and using equation (18) we have

$$\Gamma^1_{11} = -\frac{9}{1 + 9R}, \qquad \Gamma^2_{22} = -\frac{9}{1 + 9G}, \qquad \Gamma^3_{33} = -\frac{9}{1 + 9B}.$$

Another simple computation shows that the color manifold endowed with Stiles' metric is flat.
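The Christoffel symbols quoted for both metrics can be checked numerically. For a diagonal metric whose i-th entry depends only on the i-th coordinate, formula (18) reduces to $\Gamma^i_{ii} = h_{ii}' / (2 h_{ii})$. The sketch below is our own illustration rather than anything from the paper: it evaluates this expression with a central difference, and the function names are hypothetical.

```python
def christoffel_diag(h_ii, x, eps=1e-6):
    """Gamma^i_ii = h_ii'(x) / (2 h_ii(x)) for a diagonal metric whose i-th
    entry depends only on the i-th coordinate (central difference)."""
    dh = (h_ii(x + eps) - h_ii(x - eps)) / (2.0 * eps)
    return dh / (2.0 * h_ii(x))


def helmholtz_entry(x):
    """Diagonal entry 1/x^2 of the Helmholtz metric (16)."""
    return 1.0 / x ** 2


RHO = 1.28  # Stiles' constant for the R channel, as quoted in the text


def stiles_entry_r(x):
    """Diagonal entry (zeta(R)/rho)^2 of Stiles' metric for the R channel."""
    zeta = 9.0 / (1.0 + 9.0 * x)
    return (zeta / RHO) ** 2
```

Evaluating `christoffel_diag(helmholtz_entry, 2.0)` should reproduce $-1/R = -0.5$ up to discretization error, and `christoffel_diag(stiles_entry_r, 1.0)` should reproduce $-9/(1 + 9R) = -0.9$. The constant $\rho$ cancels in the ratio, which is why it does not appear in the Christoffel symbols.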

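4 Harmonic Mean Curvature Flow

We consider the flow

$$\partial_t u = \Delta_g H, \qquad (19)$$

where $H$, in local coordinates, takes the form (up to a multiplicative constant):

$$H^\alpha(u) = \Delta u^\alpha + {}^V\Gamma^\alpha_{\beta\gamma} \frac{\partial u^\beta}{\partial x^i} \frac{\partial u^\gamma}{\partial x^j} g^{ij}, \qquad (20)$$

with $g = u^*h$ the pull-back induced metric and $\Delta$ the Laplace-Beltrami operator. Suppose that the color space is Euclidean; then all the Christoffel symbols ${}^V\Gamma^\alpha_{\beta\gamma}$ vanish and (19) becomes

$$\partial_t u = \Delta(\Delta u), \qquad (21)$$

where

$$\Delta = \frac{1}{\sqrt{\det g}} \frac{\partial}{\partial x^\alpha} \Big( \sqrt{\det g} \, g^{\alpha\beta} \frac{\partial}{\partial x^\beta} \Big).$$

We thus recover a You-Kaveh type equation (9) when we consider the intrinsic formulation (21), and a Wei type equation (10) if we consider (21) in local coordinates. The effect of the term ${}^V\Gamma^\alpha_{\beta\gamma} \frac{\partial u^\beta}{\partial x^i} \frac{\partial u^\gamma}{\partial x^j} g^{ij}$ is to constrain the flow to live on the color manifold, and thus to take account of the physiological aspects of the different luminances.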

5 Numerical Issues

The gradient descent corresponding to the minimization of the functional (6) leads to a system of fourth-order partial differential equations. We have used an explicit finite difference discretization for this system of PDEs, which requires the evaluation of higher-order derivatives and comes with strong restrictions on the time step. This is not the best method for this problem. A better strategy to overcome these difficulties is, for instance, discretization by the finite element method, as used by Clarenz et al. [5]. Nevertheless, the explicit finite difference scheme we used seems to be very robust and effective in numerical experiments.
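The time-step restriction of explicit schemes for fourth-order flows can be seen on a toy model. The sketch below is not the scheme used in the paper: it is a linear, scalar, 1D periodic analogue of the flat Euclidean case, and it assumes the analyst's sign convention, under which the smoothing flow reads $u_t = -\Delta(\Delta u)$. All names are hypothetical.

```python
def laplacian_periodic(u, h=1.0):
    """Second-order central difference Laplacian on a periodic 1D grid."""
    n = len(u)
    return [(u[i - 1] - 2.0 * u[i] + u[(i + 1) % n]) / h ** 2 for i in range(n)]


def biharmonic_step(u, dt, h=1.0):
    """One explicit Euler step of u_t = -Laplacian(Laplacian(u))."""
    bilap = laplacian_periodic(laplacian_periodic(u, h), h)
    return [ui - dt * bi for ui, bi in zip(u, bilap)]


def max_stable_dt(h=1.0):
    """Von Neumann bound: the fastest mode is multiplied by 1 - 16*dt/h^4,
    which stays in [-1, 1] only if dt <= h^4 / 8."""
    return h ** 4 / 8.0
```

The bound scales like $h^4$, so halving the grid spacing divides the admissible time step by 16. This is the kind of restriction that motivates implicit or finite element discretizations for such flows.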

6 Experiments

To check that our model is effective, we ran tests on different color images. Figure 1 presents the effects of HMCF, with the Helmholtz and Stiles metrics, on a detail of the peppers color image. In order to assess the efficiency of our method we must compare it with other fourth-order (and even second-order) methods; this is left for future work. It is interesting to analyse Figure 2, which shows as grey-level images the intensities of the four entries of the inverse image metric tensor, namely $(g^{ij})$ with the above notations. It clearly shows that $(g^{ij})$ captures the morphological structure of the image and acts like an anisotropic edge-stopping function. This shows, empirically, that our method preserves the contours of an image while smoothing homogeneous regions. Moreover, the fact that the edge-stopping function is matrix-valued, hence anisotropic, motivates the comparison with the fourth-order equation (11) proposed by Tumblin and Turk [28].

Fig. 1. From left to right: (1) original image, a small detail from peppers; (2) highly degraded image; (3) HMCF with the Helmholtz metric, 100 iterations at dt = 0.05; (4) HMCF with the Stiles metric, 100 iterations at dt = 0.05

Fig. 2. Grey-level image representation of the four entries of the (symmetric) tensor $g^{ij}$, which is the inverse of the induced metric $g_{ij}$ and acts like an anisotropic edge-stopping function

Acknowledgment

I am indebted to Professor Maher Moakher for encouragement, insightful comments and assistance throughout my work.

References 1. Barros, M., Garay, O.J.: On Submanifolds with Harmonic Mean Curvature. Proceedings of the American Mathematical Society 123(8) (August 1995) 2. Blaschke, W.: Vörlesungen über Diﬀerential Geometrie III. Springer, Berlin (1929) 3. Buchsbaum, G., Gottschalk, A.: Trichromacy, opponent colours coding and optimum colour information transmission in the retina. Proc. R. Soc. Lond. B (220), 89–113 (1983) 4. Chen, B.-Y.: Some open problems and conjectures on submanifolds of ﬁnite type. Soochow J. Math. 17, 169–188 (1991)

574

M. Zéraï

5. Clarenz, U., Diewald, U., Dziuk, G., Rumpf, M., Rusu, R.: A ﬁnite element method for surface restoration with smooth boundary conditions. Computer Aided Geometric Design 21(5), 427–445 (2004) 6. Droske, M., Rumpf, M.: A level set formulation for Willmore ﬂow. Interfaces and Free Boundaries 6(3), 361–378 (2004) 7. Eells, J., Sampson, J.H.: Harmonic mappings of Riemannian manifolds. Amer. J. Mah. 86, 109–160 (1964) 8. Eliasson, H.I.: Introduction to global calculus of variations. In: Global analysis and its applications, IAEA, Vienna, vol. II, pp. 113–135 (1974) 9. Friesecke, G., James, R.D., Muller, S.: A theorem on geometric rigidity and the derivation of nonlinear plate theory from three-dimensional elasticity. Commun. Pure Appl. Math. 17(11), 1461–1506 (2002) 10. Gallot, S., Hulin, D., Lafontaine, J.: Riemannian Geometry, 2nd edn. Springer, Heidelberg (1990) 11. Germain, S.: Recherches sur la théorie des surfaces élastiques. Courcier, Paris (1821) 12. Hawking, S.W.: Gravitational radiation in an expanding universe. J. Mat. Phys. 9, 598–604 (1968) 13. Helfrich, W.: Elastic properties of lipid bilayers: theory and possible experiments. Z. Nat. forsch. A C28, 693–703 (1973) 14. von Helmholtz, H.: Handbuch der Physiologischen Optik. Voss, Hamburg (1896) 15. Huisken, G., Ilmanen, T.: The Riemannian Penrose inequality. Int. Math. Res. Not. 1997(20), 1045–1058 (1997) 16. Jiang, G.Y.: 2-harmonic maps and their ﬁrst and second variational formulas. Chin. Annals Math. A 7, 389–402 (1986) 17. Jost, J.: Riemannian Geometry and Geometric Analysis, 2nd edn. Springer, Heidelberg (1998) 18. Kimmel, R.: A natural norm for color processing. In: Chin, R., Pong, T.-C. (eds.) ACCV 1998. LNCS, vol. 1351, pp. 88–95. Springer, Heidelberg (1997) 19. Lysaker, M., Lundervold, A., Tai, X.C.: Noise removal using fourth-order partial diﬀerential equation with applications to medical magnetic resonance images in space and time. IEEE Transactions on images processing 12, 1579–1590 (2003) 20. 
MacAdam, D.L.: Visual sensitivity to color diﬀerences in daylight. J. Opt. Soc. Am. 32, 247 (1942) 21. Montaldo, S., Oniciuc, C.: A short survey on biharmonic maps between Riemannian manifolds. Revista de la Unión Mathemática Argentina 47(2) (2006) 22. Olischlager, N., Rumpf, M.: A two step time discretization of Willmore ﬂow. In: 21st Chemnitz FEM Symposium (2008) 23. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992) 24. Schroedinger, E.: Grundlinien einer theorie de farbenmetrik in tagessehen. Ann. Physik 63, 481 (1920) 25. Sochen, N., Zeevi, Y.: Using Vos-Walraven line element for Beltrami ﬂow in color images. EE-Technion and TAU HEP report Technion and Tel-Aviv University (1992) 26. Stiles, W.S., Wyszecki, G.: Color Science, Concepts and Methods, Quantitative Data and Formulae. John Wiley & Sons, Inc., Chichester (2000) 27. Thomsen, G.: Über konforme Geometrie, I. Grundlagen der konformen Flächentheorie. Abh. Math. Semin. Univ. Hamburg, 31–56 (1923)


28. Tumblin, J., Turk, G.: LCIS: A boundary hierarchy for detail-preserving contrast reduction. In: Proceedings of the SIGGRAPH annual conference on Computer Graphics, Los Angeles, CA, USA, August 1999, pp. 83–90 (1999)
29. Vos, J.J., Walraven, P.L.: An analytical description of the line element in the zone-fluctuation model of colour vision II. The derivative of the line element. Vision Research (12), 1345–1365 (1972)
30. Wei, G.: Generalized Perona-Malik equation for image processing. IEEE Signal Processing Letters 6(7), 165–167 (1999)
31. Weickert, J.: Anisotropic diffusion in image processing. Teubner (1998)
32. Weiner, J.L.: On a problem of Chen, Willmore et alia. Indiana University Math. J. (27), 19–35 (1978)
33. Willmore, T.J.: Note on embedded surfaces. An. Stiint. Univ. Al. I. Cuza Iasi., Ser. Noua, Mat. 11B, 493–496 (1965)
34. Willmore, T.J.: Riemannian Geometry. Oxford Science Publications (1993)
35. You, Y.L., Kaveh, M.: Fourth-order partial differential equations for noise removal. IEEE Transactions on Image Processing 10(9), 1723–1730 (2000)

Tracking Closed Curves with Non-linear Stochastic Filters

Christophe Avenel¹, Etienne Mémin², and Patrick Pérez²

¹ ENS Cachan / IRISA
² INRIA, Vista Project, Center of Rennes
{Christophe.Avenel,Etienne.Memin,Patrick.Perez}@irisa.fr

Abstract. The joint analysis of motions and deformations is crucial in a number of computer vision applications. In this paper, we introduce a non-linear stochastic filtering technique to track the state of a free curve. The approach we propose is implemented through a particle filter which includes color measurements characterizing the target and the background respectively. We design a continuous-time dynamics that allows us to infer inter-frame deformations. The curve is defined by an implicit level-set representation and the stochastic dynamics is expressed on the level-set function. It takes the form of a stochastic differential equation with Brownian motion of low dimension. Specific noise models lead to traditional evolution laws based on mean curvature motions, while other forms lead to new evolution laws with different smoothing behaviors. In these evolution models, we propose to combine local motion information extracted from the images and a model of the uncertainty on the dynamics. The associated filter we propose for curve tracking thus belongs to the family of conditional particle filters. Its capabilities are demonstrated on various sequences with highly deformable objects.

1 Introduction

Tracking deformable structures delineated by free curves, with no prior on their possible shapes, is a very challenging problem. The shape of a deformable object, or even of a rigid body, may change drastically over an image sequence. These deformations are due to the apparent motion of the object, to perspective effects and to 3D shape evolution. The difficulty is amplified when the object becomes partially or totally occluded, even for a very short time. Cluttered backgrounds and ambiguities make tracking harder still. Numerous approaches to curve tracking based on the level set representation have been proposed [1, 2, 3, 4, 5, 6, 7]. These techniques mainly address the problem as a succession of instantaneous detection or segmentation problems. At best, only discrete snapshots of the location of the object of interest are provided, and no dynamical or morphological consistency can really be enforced. Implausible growing/shrinking or merging/splitting cannot be avoided without resorting to shape priors [8, 9, 10]. This considerably reduces the generality of the tracker and restricts its use to very specific applications [8, 10].

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 576–587, 2009. © Springer-Verlag Berlin Heidelberg 2009


Such deterministic approaches also have great difficulty coping with ambiguities and noise. The explicit introduction of a dynamics in the curve evolution law has been considered in [4]. However, the proposed technique, although much more satisfying from the point of view of forecasting the curves, is not embedded into a tracking framework. In [11], an approach based on a group action mean shape and a moving average has been proposed. This tracking is restricted to simple motions. Recently, an optimal control strategy has been defined for curve tracking [12]. This technique can cope with non-linear differential evolution laws. It is nevertheless a deterministic technique that only involves Gaussian uncertainty on the dynamical system. It is also a batch technique that relies on the entire image sequence, and it can hardly be used for on-line tracking.

The extraction of state trajectories relying on past measurements and on a dynamical model, as done with stochastic filtering, naturally handles partial occlusions, clutter and ambiguities. It also makes it possible to rely on an approximate knowledge of the underlying dynamics. However, the state dimension is the Achilles' heel of recursive Bayesian filters such as the particle filter. Due to this so-called curse of dimensionality, only a few works have attempted to combine stochastic filtering and level set representation for curve tracking [13, 14]. These works have to face a high-dimensional sampling problem and, as a consequence, rely on a crude discretization of the non-linear curve dynamics, which may be problematic in some situations.

The approach we propose for curve tracking is also implemented through a particle filter and a level set representation. This approach includes color measurements characterizing the target and the background respectively [15]. The dynamics involved is formulated as a stochastic differential equation.
This allows us to get a continuous-time representation of the curve trajectory and, thus, to infer inter-frame deformations. This gives access to richer dynamics on curves. It would also permit the use of continuous-time physical evolution laws in specific contexts. The stochastic dynamics is expressed on the level-set function and takes the form of a stochastic differential equation with Brownian motion of low dimension. Although stochastic dynamics have already been built for image segmentation in [16], our approach is different, as it naturally integrates the contribution of noise in the derivation of the dynamics. It also allows additional smoothing terms on the curve to be interpreted as a consequence of the uncertainty we have on the curve dynamics. Conceptually, this yields a rigorous derivation of the curve dynamics, which makes it possible to handle topological changes occurring between two frame instants and to cope with the propagation of possibly irregular curves driven by noisy motion fields. No additional ad hoc filters are needed to propagate the curve: such smoothing is explicitly handled within the stochastic expression of the level set dynamics. The evolution models we propose combine local motion information extracted from the image and a model of the dynamics uncertainty. The associated filter thus belongs to the family of conditional particle filters [17].

2 Stochastic Filtering and Particle Filter

Before introducing in detail the stochastic evolution laws on which we rely in this work, we present in this section the generic problem of continuous-time stochastic filtering in the presence of discrete-time measurements. Stochastic filters are well-known procedures to estimate the posterior pdf $p(x_k \mid z_{1:k})$ (called the filtering distribution) of a state variable of interest at any measurement instant $k$, given the discrete measurement series $z_{1:k} = (z_1 \cdots z_k)$ up to instant $k$ and an initial distribution $p(x_0)$. In the following, we consider a continuous-time state $x_t$. We denote by $x_{t=k}$ or $x_k$ its value at the measurement instant $k$. At each time instant $k$, the measurement equation relates the observation $z_k$ to the state $x_k$. In this work, the general system we are dealing with is described by:

\[
\begin{cases}
dx_t = f(x_t)\,dt + \sigma(t)\,dB_t, \\
z_k = g(x_k) + v_k,
\end{cases}
\tag{1}
\]

where $B_t$ is a Brownian motion and $v_k$ is a noise variable. Functions $f$ and $g$ are non-linear in the general case. Assuming there exists a transition distribution p(xt |xr

2.1 Particle Filter

Particle filtering is a sequential Monte Carlo framework that yields an approximate solution of the general stochastic filtering problem (non-linear likelihood, non-additive and non-Gaussian noises). The filtering distribution $p(x_k \mid z_{1:k})$ is recursively approximated by a finite weighted sum of $N$ Diracs centered on hypothesized locations in the state space, called particles, of the initial system $x_0$. Each particle $x_k^{(i)}$ ($i = 1:N$) is assigned a weight $\gamma_k^{(i)}$ describing its relevance. This approximation reads:

\[ p(x_k \mid z_{1:k}) \approx \sum_{i=1:N} \gamma_k^{(i)}\, \delta_{x_k^{(i)}}(x_k). \tag{2} \]

Assuming that the approximation of $p(x_{k-1} \mid z_{1:k-1})$ is known, the recursive computation of the filtering distribution is done by propagating the swarm of weighted particles $\{x_{k-1}^{(i)}, \gamma_{k-1}^{(i)}\}_{i=1:N}$. At each time instant (or iteration), the set of new particles $\{x_k^{(i)}\}_{i=1:N}$ is drawn from an approximation of the true distribution $p(x_{t \geq k-1} \mid z_{1:k})$, called the importance function and here denoted $\pi(x_t \mid x_{0:k-1}^{(i)}, z_{1:k})$. The closer the approximation to the true distribution, the more efficient the filter. The importance weights $\gamma_k^{(i)}$ account for the deviation w.r.t. the unknown true distribution. To maintain a consistent sample, the importance weights are updated according to a recursive evaluation as the new measurement $z_k$ becomes available:

\[ \gamma_k^{(i)} \propto \gamma_{k-1}^{(i)}\, \frac{p(z_k \mid x_k^{(i)})\, p(x_k^{(i)} \mid x_{k-1}^{(i)})}{\pi(x_k^{(i)} \mid x_{0:k-1}^{(i)}, z_{1:k})}, \qquad \sum_{i=1:N} \gamma_k^{(i)} = 1. \tag{3} \]

Different choices are possible for this proposal density [17]. The most common one consists in setting the proposal distribution to the dynamics:

\[ \pi(x_t \mid x_{0:k-1}^{(i)}, z_{1:t}) = p(x_t \mid x_{k-1}^{(i)}). \tag{4} \]

In this case the weight update in (3) is greatly simplified: it amounts to multiplying by the data likelihood $p(z_t \mid x_t^{(i)})$. This version of the particle filter is known as the bootstrap filter, and it is the kind of filter we use here. In our case the two steps of the filter read:

– Prediction step: $x_k^{(i)} \sim p(x_k \mid x_{k-1}^{(i)})$
– Correction step: $\gamma_k^{(i)} \propto \gamma_{k-1}^{(i)}\, p(z_k \mid x_k^{(i)})$.

The prediction step consists in sampling trajectories $\{x_t^{(i)} : k-1 \leq t \leq k\}_{i=1:N}$ from the stochastic differential equation describing the continuous evolution of the state:

\[ dx_t^{(i)} = f(x_t^{(i)})\,dt + \sigma(t, x_t^{(i)})\,dB_t^{(i)}, \tag{5} \]

from the initial condition $\{x_{k-1}^{(i)}\}_{i=1:N}$, where $\{B_t^{(i)}\}_{i=1:N}$ are independent Brownian motions. The SDE (5) can be simulated with the Euler scheme:

\[ x_{t+\Delta t}^{(i)} = x_t^{(i)} + f(x_t^{(i)})\,\Delta t + \sigma(t)\,(B_{t+\Delta t}^{(i)} - B_t^{(i)}), \tag{6} \]

where the increments $B_{t+\Delta t}^{(i)} - B_t^{(i)}$ are independent Gaussian noises with zero mean and variance $\Delta t\, I$. Note that the discretization step is much smaller than the inter-measurement time interval ($\Delta t \ll 1$). In order to avoid degeneracy of the particle swarm, a resampling step must be applied sufficiently often [19]. This process consists in drawing, with replacement, a new set of particles from the current one according to a probability distribution that depends on the importance weights. Particles with low weights tend to disappear, whereas those with larger weights are likely to be duplicated. In this work the state variables are closed curves represented by implicit surfaces. Their associated dynamics is defined in Section 3. Before that, let us define the likelihood on which we rely.
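As an illustration only, the prediction, correction and resampling steps described above can be sketched in Python on a scalar toy state. This is not the authors' implementation: the drift, the Gaussian likelihood, the step sizes and all function names (`euler_maruyama`, `bootstrap_step`, `toy_drift`, `toy_likelihood`) are hypothetical choices made for the example, and resampling is applied at every measurement for simplicity.

```python
import math
import random


def euler_maruyama(x, f, sigma, dt, n_steps, rng):
    """Simulate dx = f(x) dt + sigma dB with the Euler scheme."""
    for _ in range(n_steps):
        # Brownian increment: zero mean, variance dt.
        x = x + f(x) * dt + sigma * rng.gauss(0.0, math.sqrt(dt))
    return x


def bootstrap_step(particles, weights, z, f, sigma, likelihood, rng,
                   dt=0.01, n_steps=100):
    """One prediction / correction / resampling cycle of a bootstrap filter."""
    # Prediction: propagate every particle between two measurement instants.
    particles = [euler_maruyama(x, f, sigma, dt, n_steps, rng) for x in particles]
    # Correction: multiply each weight by the data likelihood, then normalise.
    weights = [w * likelihood(z, x) for w, x in zip(weights, particles)]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Resampling: multinomial draw with replacement, then uniform weights.
    particles = rng.choices(particles, weights=weights, k=len(particles))
    weights = [1.0 / len(particles)] * len(particles)
    return particles, weights


def toy_drift(x):
    """Hypothetical mean-reverting drift f(x)."""
    return -0.5 * x


def toy_likelihood(z, x):
    """Hypothetical Gaussian likelihood p(z | x) with standard deviation 0.3."""
    return math.exp(-0.5 * ((z - x) / 0.3) ** 2)


rng = random.Random(0)
n = 500
particles = [rng.gauss(0.0, 2.0) for _ in range(n)]
weights = [1.0 / n] * n
for z in [1.0, 1.0, 1.0]:
    particles, weights = bootstrap_step(particles, weights, z, toy_drift, 0.5,
                                        toy_likelihood, rng)
estimate = sum(particles) / n  # posterior mean estimate after three measurements
```

The many small Euler steps between two measurements mirror the requirement that the discretization step be much smaller than the inter-measurement interval.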

2.2 Likelihood Definition

In bootstrap filters, the likelihood associated with each particle directly determines its weight. It is therefore crucial for the likelihood to be sufficiently discriminant to discard curves that are too distant from the intended result. To this end, we define a likelihood that depends on the similarity between the color distributions inside the curve at times $t = 0$ and $t = k$ respectively. For each particle, it reads:

\[ p(z_t \mid x_t^{(i)}) \propto \exp\left(-\lambda\, d(h_0, h_k^{(i)})\right), \tag{7} \]

where $d$ is related to the Bhattacharyya distance between $h_0$, the reference interior color histogram instantiated at time 0, and $h_k^{(i)}$, the interior color histogram associated with the $i$-th level-set sample at time $k$, and $\lambda$ is a positive parameter. For discrete probability distributions $p$ and $q$ defined over the same domain $X$, the Bhattacharyya distance is defined as:

\[ d(p, q) = \left(1 - \sum_{x \in X} \sqrt{p(x)\,q(x)}\right)^{1/2}. \tag{8} \]
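The likelihood can be illustrated with a short Python sketch. It assumes the usual form of the distance built on the Bhattacharyya coefficient, $d(p, q) = (1 - \sum_x \sqrt{p(x) q(x)})^{1/2}$, and normalised histograms; the value of `lam` and the three example histograms are hypothetical and only serve to show that a closer histogram receives a larger weight.

```python
import math


def bhattacharyya_distance(p, q):
    """d(p, q) = sqrt(1 - sum_x sqrt(p(x) q(x))) for two normalised histograms."""
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp to guard against round-off pushing the coefficient above 1.
    return math.sqrt(max(0.0, 1.0 - coefficient))


def color_likelihood(h_ref, h_particle, lam=20.0):
    """Unnormalised likelihood exp(-lam * d(h_ref, h_particle))."""
    return math.exp(-lam * bhattacharyya_distance(h_ref, h_particle))


h0 = [0.5, 0.3, 0.2]         # reference interior histogram (hypothetical values)
h_close = [0.45, 0.35, 0.2]  # candidate curve with a similar interior
h_far = [0.05, 0.15, 0.8]    # candidate curve with a very different interior
```

With these inputs, `color_likelihood(h0, h_close)` is larger than `color_likelihood(h0, h_far)`, so after normalisation the first particle would dominate the second.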

3 A Stochastic Evolution Law for Level Sets

As mentioned in the introduction, the curve that we want to track is defined by an implicit level-set representation. The stochastic dynamics thus has to be defined on this level-set function, which is of infinite dimension (or at least of very high dimension in its discrete form). In order to cope with the curse of dimensionality, which makes any sampling in high dimension inefficient, the model we consider relies on a low-dimensional Brownian motion. To this end we next introduce three different evolution laws and explain how they are related to evolution laws of level sets. Let $C_t$ denote a closed Jordan curve $C_t : [0, 1] \to \mathbb{R}^2$ at time $t \in [t_0, \tau]$ of the image sequence. Let us first assume that this curve evolves in time according to the following evolution law:

\[ dC_t = w_n\, n\, dt + \sigma_1\, n\, dB_t^{(1)} + \sigma_2\, n^{\perp}\, dB_t^{(2)}, \tag{9} \]

where $B_t^{(1)}$ and $B_t^{(2)}$ are two independent Brownian motions, $n$ is the unit vector normal to the curve and $w_n = w \cdot n$ is the projection of some velocity field $w$ on this normal. In this model, the deterministic drift associated with the velocity field $w$ is combined with an isotropic Gaussian uncertainty that grows linearly in time. As a matter of fact, let us recall that the quadratic variation of the Brownian motion, on the real line for the sake of simplicity, is:

\[ \langle \sigma\, dB_t, \sigma\, dB_t \rangle_t = \int_0^t (\sigma\, dB_s)^2 = \lim_{\Delta t \to 0} \sum_{0}^{t} \sigma^2\, |B_{s+\Delta t} - B_s|^2 = \sigma^2 t, \tag{10} \]

where the sum runs over a partition of $[0, t]$ with step $\Delta t$. Contrary to differentiable deterministic functions, Brownian motion does not have a bounded variation (i.e., its total variation on $[0, t]$ is infinite).
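Both properties can be checked numerically. The sketch below is an illustration under assumed values of `sigma` and `t`: it sums squared and absolute increments of a simulated scaled Brownian motion over partitions of $[0, t]$. The first sum should stay close to $\sigma^2 t$, whereas the second keeps growing as the partition is refined.

```python
import math
import random


def quadratic_variation(sigma, t, n_steps, rng):
    """Sum of squared increments of sigma * B over a partition of [0, t]."""
    dt = t / n_steps
    return sum((sigma * rng.gauss(0.0, math.sqrt(dt))) ** 2 for _ in range(n_steps))


def total_variation(sigma, t, n_steps, rng):
    """Sum of absolute increments of sigma * B over a partition of [0, t]."""
    dt = t / n_steps
    return sum(abs(sigma * rng.gauss(0.0, math.sqrt(dt))) for _ in range(n_steps))


rng = random.Random(1)
sigma, t = 2.0, 1.5
qv = quadratic_variation(sigma, t, 100_000, rng)  # should be near sigma**2 * t
tv_coarse = total_variation(sigma, t, 100, rng)
tv_fine = total_variation(sigma, t, 10_000, rng)  # much larger than tv_coarse
```

In expectation the sum of absolute increments scales like the square root of the number of steps, which is why refining the partition by a factor of 100 multiplies it by roughly 10.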


Level Set Representation. As we wish to focus in this work on the tracking of non-parametric closed curves that may exhibit topology changes over the analyzed image sequence, we rely on an implicit level set representation of the curve of interest [5, 7]. Within this framework, the curve $C_t$ enclosing the region $D$ we wish to track is described at time $t$ through a higher-dimensional surface $\Phi : \mathbb{R}^2 \to \mathbb{R}$ and the implicit equation:

\[ C_t = \{ x_t(p) : \Phi(x_t(p)) = 0 \}, \tag{11} \]

where $p$ stands for a parameter of the curve and $x \in \Omega$ denotes image positions. This representation constitutes an Eulerian representation of the curve and enables natural topology adaptivity. The implicit representation is defined from an initial surface, such as a signed distance function to the contours of interest, and evolves according to the curve evolution law. The curve at time $t$ is defined by construction through its implicit representation at time $t$:

\[ \Phi_t = \Phi_0 + \int_0^t d\Phi_s. \tag{12} \]
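A minimal sketch of such an initial surface, assuming a circular contour and the sign convention that $\Phi$ is negative inside the curve, is given below. The grid size, the circle and the helper names are hypothetical; the zero level set is located through sign changes between neighbouring pixels.

```python
import math


def signed_distance_circle(nx, ny, cx, cy, radius):
    """Phi(x, y) = distance to the circle, negative inside and positive outside."""
    return [[math.hypot(x - cx, y - cy) - radius for x in range(nx)]
            for y in range(ny)]


def interior_mask(phi):
    """Pixels enclosed by the curve, i.e. where Phi < 0."""
    return [[value < 0.0 for value in row] for row in phi]


def crosses_zero(phi, y, x):
    """True if Phi changes sign between pixel (y, x) and its right or lower neighbour."""
    here = phi[y][x]
    for dy, dx in ((0, 1), (1, 0)):
        yy, xx = y + dy, x + dx
        if yy < len(phi) and xx < len(phi[0]) and (here < 0.0) != (phi[yy][xx] < 0.0):
            return True
    return False


phi0 = signed_distance_circle(64, 64, cx=32.0, cy=32.0, radius=10.0)
area = sum(sum(row) for row in interior_mask(phi0))  # pixel count inside the curve
contour = [(y, x) for y in range(64) for x in range(64) if crosses_zero(phi0, y, x)]
```

Because the curve is only stored implicitly, a split or a merge of the region changes the sign pattern of the grid without any re-parameterisation.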

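Assuming the level set representation is uniquely defined from an initial surface and the curve evolution (9), the surface $\Phi$ is a function of the stochastic process $C_t$. Its differential must be calculated using the so-called Itô formula from stochastic calculus [20, 21].

Stochastic Level Set Evolution Law. Let us apply the Itô formula to the implicit representation of the curve $\Phi(X_t)$, where $X_t = (X_t^x, X_t^y)^T \in \Omega$ is driven by an Itô diffusion defined as an extension of the curve velocity:

\[ dX_t = w_n^{*}\, n\, dt + \sigma_1\, n\, dB_t^{(1)} + \sigma_2\, dB_t^{(2)}\, n^{\perp}. \tag{13} \]

In this equation, the drift term $w_n^{*}$ is an extension to the whole image domain of the deterministic drift of the curve along the curve normal $n = \nabla\Phi / \|\nabla\Phi\|$. Following the Itô formula, the process $\varphi_t = \Phi(X_t)$ is an Itô process defined as

\[ d\varphi_t = w_n^{*}\, \|\nabla\varphi\|\, dt + \sigma_1\, \|\nabla\varphi\|\, dB_t^{(1)} + \frac{1}{2} \sum_{i,j = x,y} \frac{\partial^2 \varphi}{\partial x_i\, \partial x_j}\, \langle dX_t^i, dX_t^j \rangle. \tag{14} \]

The associated quadratic variations read:

\[
\begin{aligned}
\langle dX_t^x, dX_t^x \rangle &= \frac{\sigma_1^2 \varphi_x^2 + \sigma_2^2 \varphi_y^2}{\|\nabla\varphi\|^2}\, dt, \\
\langle dX_t^y, dX_t^y \rangle &= \frac{\sigma_1^2 \varphi_y^2 + \sigma_2^2 \varphi_x^2}{\|\nabla\varphi\|^2}\, dt, \\
\langle dX_t^x, dX_t^y \rangle &= \frac{(\sigma_1^2 - \sigma_2^2)\, \varphi_x \varphi_y}{\|\nabla\varphi\|^2}\, dt.
\end{aligned}
\tag{15}
\]

Introducing the surface normal expression, the Itô diffusion [21] driving the implicit surface evolution finally reads:

\[ d\varphi_t = \Big( w_n^{*}\, \|\nabla\varphi\| + \frac{1}{2\|\nabla\varphi\|^2} \big( \varphi_{xx} (\sigma_1^2 \varphi_x^2 + \sigma_2^2 \varphi_y^2) + \varphi_{yy} (\sigma_1^2 \varphi_y^2 + \sigma_2^2 \varphi_x^2) + 2 (\sigma_1^2 - \sigma_2^2)\, \varphi_x \varphi_y \varphi_{xy} \big) \Big)\, dt + \sigma_1\, \|\nabla\varphi\|\, dB_t^{(1)}. \tag{16} \]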
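The second-order (Itô correction) term can be checked at a single point with a few lines of Python. The sketch below is only a sanity check under assumed derivative values: it computes the term once from the covariance of the noise $\sigma_1 n\, dB^{(1)} + \sigma_2 n^{\perp} dB^{(2)}$, once from the expanded expression in $\varphi_x, \varphi_y, \varphi_{xx}, \varphi_{xy}, \varphi_{yy}$, and once after grouping the terms around the mean curvature, with $\kappa \|\nabla\varphi\| = \Delta\varphi - \nabla\varphi^T \nabla^2\varphi \nabla\varphi / \|\nabla\varphi\|^2$. All function names and the test values are hypothetical.

```python
import math


def ito_correction_trace(phi_x, phi_y, phi_xx, phi_xy, phi_yy, s1, s2):
    """0.5 * sum_ij phi_ij * C_ij, with C the covariance (per unit time) of the noise."""
    norm = math.hypot(phi_x, phi_y)
    n = (phi_x / norm, phi_y / norm)  # unit normal
    n_perp = (-n[1], n[0])            # unit tangent
    cxx = s1 ** 2 * n[0] ** 2 + s2 ** 2 * n_perp[0] ** 2
    cyy = s1 ** 2 * n[1] ** 2 + s2 ** 2 * n_perp[1] ** 2
    cxy = s1 ** 2 * n[0] * n[1] + s2 ** 2 * n_perp[0] * n_perp[1]
    return 0.5 * (phi_xx * cxx + phi_yy * cyy + 2.0 * phi_xy * cxy)


def ito_correction_expanded(phi_x, phi_y, phi_xx, phi_xy, phi_yy, s1, s2):
    """The same term written out with first and second derivatives of phi."""
    g2 = phi_x ** 2 + phi_y ** 2
    return (phi_xx * (s1 ** 2 * phi_x ** 2 + s2 ** 2 * phi_y ** 2)
            + phi_yy * (s1 ** 2 * phi_y ** 2 + s2 ** 2 * phi_x ** 2)
            + 2.0 * (s1 ** 2 - s2 ** 2) * phi_x * phi_y * phi_xy) / (2.0 * g2)


def ito_correction_curvature(phi_x, phi_y, phi_xx, phi_xy, phi_yy, s1, s2):
    """Grouped form: s2^2/2 * kappa*|grad| + s1^2/(2 |grad|^2) * grad^T H grad."""
    g2 = phi_x ** 2 + phi_y ** 2
    ghg = phi_xx * phi_x ** 2 + 2.0 * phi_xy * phi_x * phi_y + phi_yy * phi_y ** 2
    kappa_times_norm = (phi_xx + phi_yy) - ghg / g2
    return 0.5 * s2 ** 2 * kappa_times_norm + 0.5 * s1 ** 2 * ghg / g2


args = (0.7, -1.3, 0.4, 0.25, -0.9, 3.0, 4.0)  # arbitrary test values
values = (ito_correction_trace(*args), ito_correction_expanded(*args),
          ito_correction_curvature(*args))
```

The three numbers agree up to round-off, which confirms that the tangential noise amplitude is the one that multiplies the curvature term.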


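Recalling that the mean curvature can be expressed as:

\[ \kappa = \mathrm{curv}(\varphi) = \frac{1}{\|\nabla\varphi\|} \left( \Delta\varphi - \frac{\nabla\varphi^T\, \nabla^2\varphi\, \nabla\varphi}{\|\nabla\varphi\|^2} \right), \tag{17} \]

where $\nabla^2\varphi$ denotes the Hessian matrix and $\Delta\varphi$ the Laplacian, the surface evolution law may be written in a more compact form as:

\[ d\varphi = \left( w_n^{*}\, \|\nabla\varphi\| + \frac{\sigma_2^2}{2}\, \kappa\, \|\nabla\varphi\| + \frac{\sigma_1^2}{2\|\nabla\varphi\|^2}\, \nabla\varphi^T\, \nabla^2\varphi\, \nabla\varphi \right) dt + \sigma_1\, \|\nabla\varphi\|\, dB_t^{(1)}. \tag{18} \]

It can be observed from (18) that if both uncertainties have the same strength (i.e., $\sigma_1 = \sigma_2$), this model takes a particularly simple form:

\[ d\varphi_t = \left( w_n^{*}\, \|\nabla\varphi\| + \frac{1}{2} \sigma_1^2\, \Delta\varphi \right) dt + \sigma_1\, \|\nabla\varphi\|\, dB_t^{(1)}. \tag{19} \]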
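To make the equal-strength case concrete, the following sketch applies one explicit Euler step of $d\varphi = (w_n \|\nabla\varphi\| + \tfrac{1}{2}\sigma^2 \Delta\varphi)\,dt + \sigma \|\nabla\varphi\|\, dB$ pointwise on a fixed grid. This is a simplified illustration, not the authors' scheme: it assumes unit grid spacing, central differences, a constant drift `w_n`, untouched border pixels and a time step small enough for the explicit diffusion to remain stable. Note that a single scalar Brownian increment is shared by the whole grid.

```python
import math
import random


def gradient_norm_and_laplacian(phi):
    """Central differences on the interior of a 2D grid (unit spacing)."""
    ny, nx = len(phi), len(phi[0])
    grad = [[0.0] * nx for _ in range(ny)]
    lap = [[0.0] * nx for _ in range(ny)]
    for y in range(1, ny - 1):
        for x in range(1, nx - 1):
            gx = 0.5 * (phi[y][x + 1] - phi[y][x - 1])
            gy = 0.5 * (phi[y + 1][x] - phi[y - 1][x])
            grad[y][x] = math.hypot(gx, gy)
            lap[y][x] = (phi[y][x + 1] + phi[y][x - 1] + phi[y + 1][x]
                         + phi[y - 1][x] - 4.0 * phi[y][x])
    return grad, lap


def stochastic_level_set_step(phi, w_n, sigma, dt, rng):
    """One Euler step of the equal-strength stochastic level set dynamics."""
    grad, lap = gradient_norm_and_laplacian(phi)
    # One scalar Brownian increment for the whole surface: the noise is
    # low dimensional even though phi lives on a large grid.
    d_b = rng.gauss(0.0, math.sqrt(dt))
    return [[phi[y][x]
             + (w_n * grad[y][x] + 0.5 * sigma ** 2 * lap[y][x]) * dt
             + sigma * grad[y][x] * d_b
             for x in range(len(phi[0]))] for y in range(len(phi))]


rng = random.Random(2)
size = 48
phi = [[math.hypot(x - 24.0, y - 24.0) - 10.0 for x in range(size)]
       for y in range(size)]
for _ in range(20):
    phi = stochastic_level_set_step(phi, w_n=0.0, sigma=1.0, dt=0.05, rng=rng)
inside = sum(value < 0.0 for row in phi for value in row)  # area of the sampled region
```

Drawing several such trajectories from the same initial surface, each with its own random generator, would give one curve sample per particle in the prediction step.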

The dynamical model (18) constitutes a general stochastic process that guides a curve through an implicit surface. This stochastic process enables us to draw samples of curves in our tracking process. Before turning to the experiments, it is interesting to see what the expectation of these stochastic processes corresponds to. It can be shown, through the Kolmogorov backward equation (the adjoint of the Fokker-Planck equation), that the expectation $u(x, t) = E_x(\Phi(X_t))$ evolves as:

\[ \frac{\partial u}{\partial t} = \left( w_n^{*} + \frac{\sigma_2^2}{2}\, \kappa \right) \|\nabla u\| + \frac{\sigma_1^2}{2\|\nabla u\|^2}\, \nabla u^T\, \nabla^2 u\, \nabla u, \quad \text{and} \quad u(x, 0) = \Phi_0(x), \tag{20} \]

where $\Phi_0$ denotes the initial surface, built from an initial value of the contour. This equation gives the evolution law of the expectation, on a fixed grid, of an implicit surface driven by a stochastic dynamical model of the form (9). This dynamical model includes two independent Brownian uncertainties on the curve motion, directed along the curve's tangent and normal respectively. The first term corresponds to the traditional deterministic evolution law of a level set function. The curvature term is introduced by the motion uncertainty along the curve's tangent. The second term is less usual and corresponds to an uncertainty directed along the surface normal. If both uncertainties are set to the same amplitude $\sigma$, the previous equation simplifies to:

\[ \frac{\partial u}{\partial t} = w_n^{*}\, \|\nabla u\| + \frac{\sigma^2}{2}\, \Delta u, \qquad u(x, 0) = \Phi_0(x). \tag{21} \]

4 Experiments and Results

Motion Information Extracted from the Images. The evolution laws introduced in the previous section are based on a stochastic force $w$ calculated from the image. We now introduce the force we use in our experiments. It is a linear combination of two main components:

\[ w_n^{*(i)} = \beta(t)\, v^T n + (1 - \beta(t))\, \partial_\varphi F(\varphi^{(i)}), \tag{22} \]

with proportions $\beta(t) \in [0, 1]$ and $1 - \beta(t)$ respectively. The first component is a motion component obtained from an optical flow computation, while the second corresponds to a photometric edge component obtained from a generalized Chan-Vese operator [12].

Optical-Flow Component. The motion component $v = (v^x, v^y)^T$ is provided by a robust and fast optical-flow estimator. It is defined as the minimizer of the objective function:

\[ \int_\Omega f\big( \nabla I^T v + I(t + dt) - I(t)\, \mathbf{1}_{p(z_t \mid \nu(x)) < 1 - \epsilon} \big)\, dx + \lambda \int_\Omega \big( \|\nabla v^x\|^2 + \|\nabla v^y\|^2 \big)\, dx. \tag{23} \]

Function $f$ is a robust function whose role is to discard data that deviate significantly from the brightness constancy assumption. This function, together with the characteristic function defined from a local likelihood computed over a neighborhood $\nu(x)$ of $x \in \Omega$ (eq. 7), provides a smooth motion field on the whole image plane that represents only the motion of data points that likely correspond to the object of interest. This motion component is a rough description of the curve's motion. It is reasonable to combine it with a photometric edge force.
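The linear combination of the two components can be written at a single pixel as in the sketch below. It is a hedged illustration: the mixing proportion `beta`, the motion vector `v`, the gradient of the level set and the photometric term `edge_force` are all taken as given inputs, and the helper names are hypothetical.

```python
import math


def unit_normal(phi_x, phi_y, eps=1e-12):
    """n = grad(phi) / |grad(phi)|, with a small guard against a zero gradient."""
    norm = math.hypot(phi_x, phi_y) + eps
    return phi_x / norm, phi_y / norm


def combined_drift(beta, v, phi_x, phi_y, edge_force):
    """Drift beta * v.n + (1 - beta) * edge_force at one pixel."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    n_x, n_y = unit_normal(phi_x, phi_y)
    return beta * (v[0] * n_x + v[1] * n_y) + (1.0 - beta) * edge_force


# Hypothetical values: motion of 2 pixels along the normal, edge force of -1.
motion_only = combined_drift(1.0, (2.0, 0.0), 1.0, 0.0, -1.0)
edge_only = combined_drift(0.0, (2.0, 0.0), 1.0, 0.0, -1.0)
balanced = combined_drift(0.5, (2.0, 0.0), 1.0, 0.0, -1.0)
```

Only the component of the motion along the curve normal enters the drift, so a purely tangential motion vector contributes nothing.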

Fig. 1. Tracking of a skier; first row: drift term with only the photometric edge component; second row: drift term defined as a combination of a photometric edge component and a motion component

Photometric Edge Component. The second component is derived from an operator [12] that corresponds to the Chan and Vese operator [22] applied to histograms. It is thus defined from the derivative, w.r.t. the unknown level set, of the following objective function:

\[ F(\varphi, I)(x, t) = d(h(\nu(x)), h_0)^2\, \mathbf{1}_{\varphi(x) < 0} + d(h(\nu(x)), h_b)^2\, \mathbf{1}_{\varphi(x) \geq 0}, \tag{24} \]

where $d$ is the Bhattacharyya distance, $h_0$ and $h_b$ denote respectively the reference interior and exterior color histograms instantiated at time 0, and $h(\nu(x))$ represents the local color histogram at point $x$. The gradient of this objective function reads:

\[ \partial_\varphi F = \big( d(h(\nu(x)), h_0)^2 - d(h(\nu(x)), h_b)^2 \big)\, \delta(\varphi), \tag{25} \]

where $\delta(\cdot)$ is the Dirac function. Both components have their own advantages in the time interval between measurement instants $k$ and $k + 1$. For our tracking purpose, the photometric component is especially helpful in the temporal vicinity of the second image, whereas the optical-flow component is more likely to be meaningful, as a rough component of the motion, only in the temporal vicinity of the first image. As a consequence, we choose to change the proportion of each gradually according to:

\[ \beta(t) = \frac{2t}{\Delta k} - 1, \qquad t \in [0, \Delta k]. \tag{26} \]

Fig. 2. Tracking cyclone Vince in infrared channel of Meteosat satellite

In order to illustrate the role of each component, we first show results on a sequence of 21 frames depicting a skier in action. In Fig. 1, the first row shows the results obtained when considering only the photometric component with a constant weight. The second row shows the results obtained from the combination of the optical-flow and photometric components. Between $t = 13$ and $t = 15$, the skier moves rapidly to the right of the image. It can be observed that, in the first case, the tracker quickly focuses on the skier's shadow only. In the second case, the optical flow term allows us to cope with this large displacement and to improve the result. Note that, for visualization purposes, we have centered all the images on the skier.

The second sequence on which we present results is composed of 100 meteorological images (Meteosat infrared images) showing the evolution of cyclone Vince over the North Atlantic. In Fig. 2 we show in red the level set associated with the mean of all implicit function particles (after resampling) and the standard deviation of the estimate. As can be observed from these pictures or from the companion video, the method tracks the regions of interest robustly. When the cyclone collapses at the end of the sequence, the tracking becomes less certain and the variance of the estimate grows. Such an assessment of estimate confidence is another great advantage of probabilistic techniques.

We finally present results on 30 frames of a video showing a lion running in the savanna. The results obtained are shown in Fig. 3.

Fig. 3. Tracking of lion running in the savanna with our particle ﬁlter on the space of implicit functions
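The mean surface and the standard deviation mentioned above can be obtained pixel by pixel from the resampled particles. The sketch below assumes equally weighted particles stored as nested lists and uses a tiny hypothetical 2 x 2 example; the estimated curve would then be the zero level set of the mean surface, and the standard deviation map gives the local confidence.

```python
import statistics


def mean_and_std_surface(particles):
    """Pixel-wise mean and standard deviation over equally weighted implicit functions."""
    ny, nx = len(particles[0]), len(particles[0][0])
    mean = [[0.0] * nx for _ in range(ny)]
    std = [[0.0] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            values = [phi[y][x] for phi in particles]
            mean[y][x] = statistics.fmean(values)
            std[y][x] = statistics.pstdev(values)
    return mean, std


# Three hypothetical 2 x 2 implicit functions (resampled, hence equal weights).
particles = [
    [[-1.0, 2.0], [0.5, 0.5]],
    [[0.0, 2.0], [0.5, 1.5]],
    [[1.0, 2.0], [0.5, 2.5]],
]
mean_phi, std_phi = mean_and_std_surface(particles)
```

Pixels where all particles agree have zero standard deviation, while pixels where the samples disagree on the position of the curve show a large one.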

We can observe on this sequence that in regions where the background color is a source of high ambiguity (e.g., around the legs), the uncertainty is large. The top of the lion is clearly distinct from the background; it is therefore segmented with better accuracy and confidence. Besides the quality of the results, local confidence assessment via variance visualization (or analysis) is an interesting feature of our approach. This could probably be of practical interest in medical imaging applications. In order to show the advantage of our method, we present in Fig. 4 the same sequence with successive segmentations obtained using the Chan-Vese operator only. We can observe the lack of continuity in the tracking, and the selection of several portions of the background due to color ambiguities. Our method avoids these problems by favoring a continuous evolution of the implicit surface.

Our method involves two main parameters, which are related to the uncertainty we have on the curve dynamics. The estimation of these parameters is not addressed in this paper but will be investigated in future research. We have observed that better results were obtained with a noise along the curve tangent that is slightly larger than the one directed along the normal. For the sequences shown in this paper we chose $\sigma_1 = 3$ and $\sigma_2 = 4$.

Fig. 4. Successive segmentations of lion running in the savanna

5 Conclusions and Future Work

In this paper we have described a probabilistic filtering method for the tracking of level sets. The technique we propose is implemented through a particle filter and combines discrete-time image measurements with a continuous-time stochastic dynamics. This continuous dynamics relies on two different uncertainties on the curve motion, directed respectively along the curve normal and along the curve tangent. The curve dynamics is built from the image data by considering a drift term that combines, in varying proportions, a motion component and a photometric component. The measurements considered in this filter are built from color histograms of the object delineated by the user at the initial time.

The first perspective concerns the automatic estimation of the two noise variances. The first one is related to the uncertainty on the motion, whereas the second one corresponds to the level of noise in the image. Another perspective concerns the management of occlusions. To that end, an idea would be to modify the coefficient of the normal noise according to the average of the likelihoods of all particles. Thus, in case of loss of the object, the uncertainty would grow, resulting in a spread and expansion of the level sets and, as a consequence, in a more likely recovery of the tracker when the object reappears. Finally, it could be interesting to investigate the use of a Brownian motion of higher dimension to capture a larger set of deformations between two consecutive frames.

References

1. Cremers, D., Soatto, S.: Motion competition: A variational framework for piecewise parametric motion segmentation. IJCV 62(3), 249–265 (2005)
2. Goldenberg, R., Kimmel, R., Rivlin, E., Rudzsky, M.: Fast geodesic active contours. IEEE Trans. on Image Processing 10(10), 1467–1475 (2001)
3. Kimmel, R., Bruckstein, A.M.: Tracking level sets by level sets: a method for solving the shape from shading problem. Comput. Vis. Image Underst. 62(1), 47–58 (1995)
4. Niethammer, M., Tannenbaum, A.: Dynamic geodesic snakes for visual tracking. In: CVPR (1), pp. 660–667 (2004)
5. Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations. Journal of Computational Physics 79, 12–49 (1988)
6. Paragios, N., Deriche, R.: Geodesic active regions: a new framework to deal with frame partition problems in computer vision. J. of Visual Communication and Image Representation 13, 249–268 (2002)
7. Sethian, J.: Level set methods: An act of violence - evolving interfaces in geometry, fluid mechanics, computer vision and materials sciences (1996)
8. Cremers, D.: Dynamical statistical shape priors for level set based tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(8), 1262–1273 (2006)
9. Leventon, M., Grimson, E., Faugeras, O.: Statistical shape influence in geodesic active contours. In: CVPR (2000)
10. Paragios, N.: A level set approach for shape-driven segmentation and tracking of the left ventricle. IEEE Trans. on Med. Imaging 22(6) (2003)
11. Cremers, D., Soatto, S.: Variational space-time motion segmentation. In: ICCV 2003: Proceedings of the Ninth IEEE International Conference on Computer Vision, Washington, DC, USA, p. 886. IEEE Computer Society, Los Alamitos (2003)
12. Papadakis, N., Mémin, E.: A variational technique for time consistent tracking of curves and motion. Journal of Mathematical Imaging and Vision 31(1), 81–103 (2008)
13. Jiang, T., Tomasi, C.: Level-set curve particles. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3953, pp. 633–644. Springer, Heidelberg (2006)
14. Rathi, Y., Vaswani, N., Tannenbaum, A., Yezzi, A.: Tracking deforming objects using particle filtering for geometric active contours. IEEE Trans. Pattern Analysis and Machine Intelligence 29(8), 1470–1475 (2007)
15. Pérez, P., Hue, C., Vermaak, J., Gangnet, M.: Color-based probabilistic tracking. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2350, pp. 661–675. Springer, Heidelberg (2002)
16. Juan, O., Keriven, R., Postelnicu, G.: Stochastic motion and the level set method in computer vision: Stochastic active contours. International Journal of Computer Vision 69(1), 7–25 (2006)
17. Arnaud, E., Mémin, E.: Partial linear Gaussian model for tracking in image sequences using sequential Monte Carlo methods. IJCV 74(1), 75–102 (2007)
18. Jazwinski, A.H.: Stochastic Processes and Filtering Theory. Academic Press, London (1970)
19. Liu, J.S., Chen, R.: Sequential Monte Carlo methods for dynamic systems. Journal of the American Statistical Association 93(443), 1032–1044 (1998)
20. Karatzas, I., Shreve, S.E.: Brownian Motion and Stochastic Calculus. Graduate Texts in Mathematics. Springer, Heidelberg (2004)
21. Oksendal, B.: Stochastic Differential Equations: An Introduction with Applications (Universitext). Springer, Heidelberg (2005)
22. Chan, T., Vese, L.: An active contour model without edges. In: Nielsen, M., Johansen, P., Fogh Olsen, O., Weickert, J. (eds.) Scale-Space 1999. LNCS, vol. 1682, pp. 141–151. Springer, Heidelberg (1999)

A Multi-scale Feature Based Optic Flow Method for 3D Cardiac Motion Estimation

Alessandro Becciu¹, Hans van Assen¹, Luc Florack¹,², Sebastian Kozerke³, Vivian Roode¹, and Bart M. ter Haar Romeny¹

¹ Eindhoven University of Technology, Biomedical Engineering, Eindhoven 5600 MB, The Netherlands
[email protected]
² Eindhoven University of Technology, Mathematics and Computer Science, Eindhoven 5600 MB, The Netherlands
³ Institute for Biomedical Engineering, University of Zurich and Swiss Federal Institute of Technology, Zurich, Switzerland

Abstract. The dynamic behavior of the cardiac muscle is strongly dependent on heart diseases. Optic flow techniques are essential tools to assess and quantify the contraction of the cardiac walls. Most of the current methods, however, are restricted to the analysis of 2D MR-tagging image sequences: due to the complex twisting motion combined with longitudinal shortening, a 2D approach will always suffer from through-plane motion. In this paper we investigate a new 3D aperture-problem-free optic flow method to study the cardiac motion by tracking stable multi-scale features such as maxima and minima on 3D tagged MR and sine-phase image volumes. We applied harmonic filtering in the Fourier domain to measure the phase. This removes the dependency on intensity changes of the tagging pattern over time due to T1 relaxation. The regular geometry, the size-changing patterns of the MR-tags stretching and compressing along with the tissue, and the phase- and sine-phase plots represent a suitable framework for robustly extracting multi-scale landmark features. Experiments were performed on real and phantom data and the results revealed the reliability of the extracted vector field. Our new 3D multi-scale optic flow method is a promising technique for analyzing true 3D cardiac motion at voxel precision, free of the through-plane artifacts present in multiple-2D data sets.

1 Introduction

Cardiac diseases are the number one cause of death and disability in Western countries [1]. Symptoms of cardiac illness can sometimes be traced back to adolescence [2, 3], making prevention in childhood a necessity. Cardiac illnesses may influence the deformation of the cardiac walls. Visualization and quantification of cardiac motion may therefore become an important step in diagnosis, giving indications of the progress of the disease and/or therapy, and perhaps even serving as precursors of cardiac symptoms.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 588–599, 2009. © Springer-Verlag Berlin Heidelberg 2009

A Multi-scale Feature Based Optic Flow Method


Optic flow is one of the traditional techniques for motion analysis. It measures the apparent velocity pattern of moving structures in an image sequence. In the computer vision literature, several optic flow approaches have been described, ranging from gradient-based to feature-based methods. Differential techniques compute the velocity from spatiotemporal derivatives of the image intensity, or of altered versions of the image obtained with low-pass or band-pass filters. Most of these techniques assume that brightness does not change under small displacements, and the motion is estimated by solving the so-called Optic Flow Constraint Equation (OFCE):

$$L_x u + L_y v + L_t = 0 \tag{1}$$

where $L(x, y, t) : \mathbb{R}^3 \to \mathbb{R}$ is an image sequence, $L_x$, $L_y$, $L_t$ are the spatiotemporal derivatives, $u(x, y, t), v(x, y, t) : \mathbb{R}^3 \to \mathbb{R}$ are the unknown velocity components, and x, y and t are the spatial and temporal coordinates respectively. Since there is one equation and two unknowns (u and v), a unique solution cannot be found. This has been referred to as the "aperture problem" and can be solved by generating as many equations as there are unknown velocities. In order to find a plausible solution for equation (1), Horn and Schunck [4] combined the gradient constraint with a global smoothness term, finding the solution by minimizing an energy function. Lucas and Kanade [5] proposed a local differential technique, in which the flow field is assumed constant in a small spatial neighborhood. The results obtained by the early methods were considerably improved by Brox et al. and Bruhn et al. [6, 7], who investigated a continuous, rotationally invariant energy functional and gave a multi-grid approach to the variational optic flow methods. Feature tracking techniques are also well-studied methods for motion estimation in the literature. For instance, Thirion [8] proposed an optic flow method based on an analogy with Maxwell's demons. In this technique the constant brightness assumption is preserved and the feature points are pushed by forces toward their most likely successive position. One of the first applications of optic flow methods to tagged MRI was introduced by Dougherty et al. [9]. Florack et al. [10] developed a robust differential technique in a multi-scale framework, whose application to cardiac MR images was presented by Niessen et al. [11, 12] and Suinesiaputra et al. [13]. Van Assen et al. and Florack and Van Assen [14, 15] developed a method based on multiple independent MR tagging acquisitions, removing the aperture problem altogether by generating as many equations as unknowns.
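To make the aperture problem in equation (1) concrete, the following minimal sketch (not part of the original method; the function name `normal_flow` and the numeric values are illustrative assumptions) computes the only velocity component that a single OFCE determines, namely the one along the image gradient, and checks that adding any velocity along the iso-intensity direction satisfies the constraint equally well.

```python
import numpy as np

def normal_flow(grad, l_t):
    """Velocity component along the spatial gradient implied by the OFCE.

    grad : spatial gradient (L_x, L_y) at one pixel
    l_t  : temporal derivative L_t at the same pixel
    """
    grad = np.asarray(grad, dtype=float)
    norm_sq = float(grad @ grad)
    if norm_sq == 0.0:
        raise ValueError("the OFCE carries no information where the gradient vanishes")
    return -l_t * grad / norm_sq

# Illustrative values (assumed, not taken from the paper).
grad = np.array([2.0, 0.0])   # intensity changes along x only
l_t = -4.0
v_n = normal_flow(grad, l_t)

# Any flow v_n + alpha * tangent satisfies equation (1) equally well.
tangent = np.array([-grad[1], grad[0]])
residuals = [float(grad @ (v_n + a * tangent) + l_t) for a in (0.0, 1.0, -3.5)]
```

All three residuals vanish, which is exactly why one equation per pixel cannot fix both u and v.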
In recent years computational power has increased considerably, and it is becoming more feasible to compute 3-dimensional optic flow fields from MRI data. However, most of the current methods for flow estimation are restricted to the analysis of 2-dimensional MR images, even though the extension to a 3-dimensional approach would be straightforward. In the case of cardiac motion estimation, 2-dimensional optic flow techniques capture only expansions, contractions and rotations of the cardiac tissue, and miss the twisting motion. A 3-dimensional optic flow technique takes into account all the components of the cardiac motion, and therefore provides a more realistic estimate of the heart's behaviour. The 3-dimensional version of equation (1) is:


A. Becciu et al.

$$L_x u + L_y v + L_z w + L_t = 0 \tag{2}$$

where $u(x, y, z, t)$, $v(x, y, z, t)$ and $w(x, y, z, t) : \mathbb{R}^4 \to \mathbb{R}$ are now the unknown velocity components. An example of 3-dimensional gradient-based optic flow estimation was proposed in 2004 by Barron [16]. He explored the 3-dimensional motion from gated MRI cardiac datasets, extending the Horn and Schunck and the Lucas and Kanade approaches to three dimensions. This method, however, imposes a constant intensity assumption, which does not hold in MRI tagging images due to T1 relaxation. Pan et al. [17] instead tracked a cardiac mesh, consisting of a collection of material points extracted from HARP images. The estimation, however, is done on a sparse set of HARP planes, and therefore the tracking cannot be performed for every point within the heart volume. A similar approach, which makes use of so-called "slice-following", was performed by Sampath and Prince [18].

In this paper we investigate cardiac motion from image volumes by exploiting point features in Gaussian scale-space. These features are interesting candidates for motion analysis: for those points the aperture problem does not arise, and they are detected in a robust framework, which is inspired by findings on the multi-scale structure of the visual system. In the experiments maxima and minima are chosen as feature points, and the approach has been tested on artificial and real image sequences. Outcomes of the proposed technique reveal the reliability of the vector field.

In Section 2 a preprocessing approach is presented. In Sections 2.1 and 2.2 the image structure of the data and the dataset are discussed. The multi-scale framework used in the experiments and a convenient technique for extracting multi-scale features are explored in Section 3. There we also present the calculation of a sparse velocity vector field, the dense flow field extension and the angular error measure. Finally, in Sections 4 and 5 we describe the experiments and the results, and discuss future directions.

2 Materials

2.1 Image Structure

In 1988 Zerhouni et al. [19] introduced a tagging method for noninvasive assessment of myocardial motion. The method introduces structure, in the form of dark stripes (Figure 1, top), into the image, with the aim of improving the visualization of intramyocardial motion. The approach was later improved by Axel and Dougherty [20] and Fischer et al. [21], who explored magnetic resonance imaging using spatial modulation of magnetization (SPAMM) and complementary SPAMM (CSPAMM) respectively. The images, however, suffer from tag fading, which makes the frames unsuitable for optic flow methods based on conservation of brightness. In the harmonic phase (HARP) method [22, 23], MR images are filtered in the spectral domain; this technique overcomes the fading problem by using the spatial phase information from the inverse transform of the filtered images. In our experiments a similar technique was employed using Gabor filters [24]. Three tagged image series with mutually perpendicular tag lines were acquired (Figure 1, top) and all but the first harmonic peak were suppressed using a band-pass filter in the Fourier domain (Figure 1, row 3). After applying the inverse Fourier transform, the phase of the filtered images varies periodically from 0 to 2π, creating a sawtooth pattern (Figure 1, row 4, columns 1 to 3). A sine function was applied to the phase images so as to avoid spatial discontinuities in the input due to the sawtooth pattern. A combination of sine-phase frames was later employed to produce a grid, from which the feature points (maxima and minima) were retrieved (Figure 1, bottom).
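The filtering chain described above can be illustrated with a minimal 1-dimensional sketch. It is an assumption-laden stand-in rather than the authors' implementation: an ideal box band-pass replaces the Gabor filter, the signal is a synthetic cosine tag pattern, and the names `sine_phase`, `harmonic` and `half_width` are hypothetical. It shows that the sine-phase signal is unchanged when the tag contrast fades.

```python
import numpy as np

def sine_phase(signal, harmonic, half_width=1):
    """Band-pass one harmonic peak in the Fourier domain and return sin(phase)."""
    spectrum = np.fft.fft(signal)
    band = np.zeros(spectrum.shape)
    # Keep only the positive-frequency peak around the chosen harmonic.
    band[harmonic - half_width: harmonic + half_width + 1] = 1.0
    filtered = np.fft.ifft(spectrum * band)          # complex harmonic signal
    phase = np.mod(np.angle(filtered), 2 * np.pi)    # sawtooth in [0, 2*pi)
    return np.sin(phase)                             # removes the sawtooth jumps

n, k = 64, 8                                         # samples, tag frequency (assumed)
x = np.arange(n)
bright = 0.5 + 0.5 * np.cos(2 * np.pi * k * x / n)   # fresh tags
faded = 0.5 + 0.15 * np.cos(2 * np.pi * k * x / n)   # same tags after fading

sp_bright = sine_phase(bright, k)
sp_faded = sine_phase(faded, k)
```

Both calls return the same sine-phase signal, because only the phase of the retained harmonic is used and its amplitude is discarded.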

Fig. 1. Top: cross sections of the cardiac tagged MR image volumes of a patient. From left to right: short axis view (frames present horizontal tags), 2 long axis views (frames present vertical and horizontal tags). Second row: Fourier spectrum of the tagged MR images. Middle: Fourier spectrum with the band-pass filter. Fourth row: phase plots; the phase varies periodically from 0 to 2π, creating a sawtooth pattern. Bottom: sine-phase images and volume grid obtained by combining three sine-phase volumes.

2.2 Dataset

The experiments were carried out on a 3-dimensional tagged MR image volume sequence of a patient's heart. The data were acquired using a 3D CSPAMM sequence [25] developed at ETH Zurich, Switzerland, and consisted of 23 frames with a temporal resolution of 30 ms. In each frame, 14 image slices were present for each of three different views (one short axis and two long axis views). The views were mutually perpendicular (Figure 2) and, by combining them, a grid was obtained from which the critical points were retrieved. The images have a resolution of 112 × 112 pixels; in order to obtain an image volume of 112 × 112 × 112 voxels, linear interpolation through the 14 slices was applied.
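As an illustration of the slice interpolation step, the sketch below resamples a stack of 14 slices to 112 slices by linear interpolation. It assumes equally spaced slices whose first and last positions coincide with the first and last output voxels; the paper does not state these details, and the function name `interpolate_slices` is hypothetical.

```python
import numpy as np

def interpolate_slices(stack, n_out):
    """Linearly resample a stack of slices along axis 0 to n_out slices."""
    n_in = stack.shape[0]
    pos = np.linspace(0.0, n_in - 1, n_out)        # fractional slice positions
    lower = np.floor(pos).astype(int)
    upper = np.minimum(lower + 1, n_in - 1)        # clamp at the last slice
    weight = (pos - lower).reshape(-1, 1, 1)
    return (1.0 - weight) * stack[lower] + weight * stack[upper]

# Toy stack: slice i is filled with the value i (14 slices of 112 x 112 pixels).
stack = np.arange(14, dtype=float).reshape(14, 1, 1) * np.ones((14, 112, 112))
volume = interpolate_slices(stack, 112)
```

For this toy stack every output slice simply holds its own fractional slice position, which makes the linear blending easy to verify.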

Fig. 2. Ninth cardiac MR tagged frame. From left to right: short axis view (frames present horizontal tags), 2 long axis views (frames present vertical and horizontal tags) and a combination of the image planes.

3 Method

3.1 Scale Space

Scale is one of the most important concepts in human vision. When we look at a scene, we instantaneously view its contents at multiple scale levels. The Gaussian scale-space representation $L(x, y, z, s) \in \mathbb{R}^3 \times \mathbb{R}^+$ of a raw 3-dimensional image $f(x, y, z) \in \mathbb{R}^3$ is defined by the convolution of $f(x, y, z)$ with a Gaussian kernel $\phi(x, y, z, s) \in \mathbb{R}^3 \times \mathbb{R}^+$:

$$L(x, y, z, s) = (f * \phi)(x, y, z, s), \qquad \phi(x, y, z, s) = \frac{1}{(\sqrt{2\pi}\, s)^3} \exp\left(-\frac{x^2 + y^2 + z^2}{2 s^2}\right) \tag{3}$$

In equation (3) x, y and z are the spatial coordinates, whereas $s \in \mathbb{R}^+$ denotes the variance of the Gaussian kernel (scale). Equation (3) provides a blurred version of the image, where the strength of blurring depends on the choice of scale. For an extensive review on scale space see [26, 27, 28, 29].

3.2 Critical Point Detection

Singularities (critical points) induced by the MR tagging pattern are interesting candidates for structural descriptions. Computation of critical points in scale space can be performed in an eﬃcient way by detecting locations where the gradient of the input image vanishes. Classiﬁcation of the detected points can be then carried out by determining the sign of the eigenvalues of the Hessian matrix. Locations where the signs of all eigenvalues are positive correspond to locations of local minima; locations where the eigenvalues are all negative, match with locations of local maxima and, ﬁnally, eigenvalues with mixed signs provide information about saddle points.
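The classification rule above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: second derivatives are approximated by repeated central differences with `np.gradient` instead of Gaussian derivatives in scale space, the test volume is synthetic, and the names `hessian_field` and `classify_critical_point` are hypothetical.

```python
import numpy as np

def hessian_field(volume, spacing=1.0):
    """Second derivatives by repeated central differences.

    result[..., i, j] approximates d2L / (dx_i dx_j) at every voxel.
    """
    first = np.gradient(volume, spacing)
    hess = np.empty(volume.shape + (3, 3))
    for i in range(3):
        second = np.gradient(first[i], spacing)
        for j in range(3):
            hess[..., i, j] = second[j]
    return hess

def classify_critical_point(hessian, tol=1e-9):
    """Label a critical point from the signs of the Hessian eigenvalues."""
    sym = 0.5 * (hessian + hessian.T)      # enforce symmetry before eigvalsh
    eig = np.linalg.eigvalsh(sym)
    if np.any(np.abs(eig) <= tol):
        return "degenerate"
    if np.all(eig > 0):
        return "minimum"
    if np.all(eig < 0):
        return "maximum"
    return "saddle"

# Toy volume with a known minimum at the grid centre (assumed test data).
ax = np.arange(-4.0, 5.0)
x, y, z = np.meshgrid(ax, ax, ax, indexing="ij")
bowl = x**2 + y**2 + z**2
h_centre = hessian_field(bowl)[4, 4, 4]
labels = (
    classify_critical_point(h_centre),
    classify_critical_point(-h_centre),
    classify_critical_point(np.diag([1.0, -2.0, 3.0])),
)
```

In practice the volume would first be smoothed at the chosen scale, and the classification would be applied only at voxels where the gradient vanishes.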

3.3 Sparse Velocities of Feature Points and Dense Flow Field

In our experiments, given a sequence of frames, we assume that the singularity (feature) points move along with the moving tissue (this is true by construction of the tagging pattern, provided the feature points correctly correspond to the tag crossings). In general, given a point in a sequence of frames defined as $L(x(t), y(t), z(t), t)$, where t indicates the time, the critical points are defined implicitly by a vanishing spatial gradient:

$$\nabla L(x(t), y(t), z(t), t) = 0 \tag{4}$$

In order to track the feature points, we differentiate equation (4) with respect to time and apply the chain rule for implicit functions, yielding:

$$\frac{d}{dt}\left[\nabla L(x(t), y(t), z(t), t)\right] = \begin{bmatrix} L_{xx} u + L_{xy} v + L_{xz} w + L_{xt} \\ L_{yx} u + L_{yy} v + L_{yz} w + L_{yt} \\ L_{zx} u + L_{zy} v + L_{zz} w + L_{zt} \end{bmatrix} = 0 \tag{5}$$

where $\frac{d}{dt}$ is the total time derivative, and where we have dropped space-time arguments on the r.h.s. for simplicity. Equation (5) holds only at the locations of critical points and can also be written as:

$$\begin{bmatrix} u \\ v \\ w \end{bmatrix} = -H^{-1} \frac{\partial \nabla L}{\partial t} \tag{6}$$

where H denotes the Hessian matrix of $L(x(t), y(t), z(t), t)$. The velocities computed by equation (6) represent the flow field at a sparse set of positions. In order to retrieve a dense velocity field, the sparse velocities have been interpolated using homogeneous diffusion interpolation. Given a spatial domain $\Omega \subset \mathbb{R}^3$, the scalar functions $u(x, y, z)$, $v(x, y, z)$ and $w(x, y, z)$ are the components of a velocity vector field $V : \Omega \to \mathbb{R}^3$. We know the velocity vectors only at certain positions and we call these vectors $\tilde{V} = \{\tilde{u}, \tilde{v}, \tilde{w}\}$, such that $\tilde{V} : \Omega_s \to \mathbb{R}^3$, where $\Omega_s$ is a finite subset of $\Omega$. We are interested in retrieving a dense set of vectors $V$ $\forall x, y, z \in \Omega$. In order to do so, we minimize the energy function

$$E(u, v, w) = \int_\Omega \left( \|\nabla u(x, y, z)\|^2 + \|\nabla v(x, y, z)\|^2 + \|\nabla w(x, y, z)\|^2 \right) dx\, dy\, dz \tag{7}$$

under the constraint $V = \tilde{V}$ $\forall x, y, z \in \Omega_s$. The minimization of equation (7) is carried out by employing the Euler-Lagrange equations, and the resulting expression can be solved with numerical schemes.
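The two steps of this subsection can be sketched numerically as follows. Both functions are illustrative assumptions rather than the authors' implementation: `feature_velocity` solves the linear system behind equation (6) at a single critical point, and `diffuse` applies plain Jacobi iterations to the Laplace equation, which is the Euler-Lagrange equation of (7) for one velocity component, with reflecting boundaries and the known samples held fixed. The paper does not specify its numerical scheme.

```python
import numpy as np

def feature_velocity(hessian, grad_t):
    """Equation (6): solve H [u, v, w]^T = -d(grad L)/dt at one critical point."""
    return np.linalg.solve(hessian, -np.asarray(grad_t, dtype=float))

def diffuse(values, known, n_iter=500):
    """Fill unknown voxels of one velocity component by Jacobi iterations of the
    Laplace equation, keeping the known samples fixed (reflecting boundaries)."""
    u = np.where(known, values, 0.0).astype(float)
    for _ in range(n_iter):
        p = np.pad(u, 1, mode="edge")
        mean6 = (p[:-2, 1:-1, 1:-1] + p[2:, 1:-1, 1:-1] +
                 p[1:-1, :-2, 1:-1] + p[1:-1, 2:, 1:-1] +
                 p[1:-1, 1:-1, :-2] + p[1:-1, 1:-1, 2:]) / 6.0
        u = np.where(known, values, mean6)   # re-impose the constraint
    return u

# Check 1: for a pattern translating with velocity c, d(grad L)/dt = -H c,
# so equation (6) should return c (H and c are assumed toy values).
H = np.array([[-2.0, 0.3, 0.0], [0.3, -1.5, 0.1], [0.0, 0.1, -1.0]])
c = np.array([0.5, -0.25, 1.0])
v = feature_velocity(H, -H @ c)

# Check 2: samples known on two opposite faces give a linear ramp in between.
values = np.zeros((5, 3, 3))
values[4] = 4.0
known = np.zeros((5, 3, 3), dtype=bool)
known[0] = known[4] = True
dense = diffuse(values, known)
```

In a full pipeline `diffuse` would be called once per velocity component, with `known` marking the voxels where features were tracked. Jacobi iteration is chosen here only for brevity; faster solvers apply to the same equation.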

3.4 Angular Error

The flow vector at a given position in the image can deviate from the true flow vector at that position in direction and in length. In our assessment we are interested in the movement from one frame to the next. Therefore, we set the time component of the flow vector to 1, yielding a 4-dimensional vector $V = \{u, v, w, 1\}$. The computed vector field has been compared with the ground truth extracted from the two artificial sequences described in Section 4. The assessment has been performed using the so-called average angular error (AAE) introduced by Barron et al. [30]:

$$\text{Angular Error} = \arccos\left( \frac{V_t \cdot V_e}{\sqrt{u_t^2 + v_t^2 + w_t^2 + 1}\; \sqrt{u_e^2 + v_e^2 + w_e^2 + 1}} \right) \tag{8}$$

where $V_t$ is the true vector with spatial components $u_t$, $v_t$, $w_t$ and time component 1, whereas $V_e$ is the estimated velocity vector and $u_e$, $v_e$, $w_e$ and 1 are its spatial and time components respectively.
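Equation (8) translates directly into a few lines of NumPy. The sketch below is illustrative: the function name `angular_error_deg` and the sample vectors are assumptions, the cosine is clipped to guard against round-off, and the average and standard deviation are taken over all supplied points.

```python
import numpy as np

def angular_error_deg(v_true, v_est):
    """Equation (8) for arrays of shape (..., 3); returns degrees."""
    v_true = np.asarray(v_true, dtype=float)
    v_est = np.asarray(v_est, dtype=float)
    ones = np.ones(v_true.shape[:-1] + (1,))
    vt = np.concatenate([v_true, ones], axis=-1)   # append time component 1
    ve = np.concatenate([v_est, ones], axis=-1)
    cos = np.sum(vt * ve, axis=-1) / (
        np.linalg.norm(vt, axis=-1) * np.linalg.norm(ve, axis=-1))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Two sample points (assumed values): one wrong direction, one exact match.
truth = np.array([[1.0, 0.0, 0.0], [0.3, -0.2, 0.5]])
estimate = np.array([[0.0, 1.0, 0.0], [0.3, -0.2, 0.5]])
errors = angular_error_deg(truth, estimate)
aae, std = errors.mean(), errors.std()
```

The first pair gives 60 degrees rather than 90, because the shared time component 1 enters both 4-dimensional vectors; this is a known property of the measure.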

4 Results

The proposed optic flow method was applied to a real sequence of 23 MR image volumes (Figure 1), representing the beating heart of a patient. The image volumes had a resolution of 112 × 112 × 112 voxels and contained tags 8 voxels wide. The spatial scale is defined as $\sigma = \sqrt{2s}$, and the experiments were performed from spatial scale σ = 1 to σ = 3 at time scale 1. In order to assess the quality of the extracted vector field, an artificial translating sequence of 19 frames was built using the first frame of the sine-phase grid image (Figure 1, row 5, column 4). The algorithm was also tested on a more realistic sine-phase grid phantom with the same number of frames and with non-rigid motion, such as contraction and expansion. Computed vector fields of the translating sequence and of the expanding and contracting phantom are depicted in Figure 3. The computation of the flow field was performed from frame 8 to frame 11 in order to avoid outliers due to temporal boundary conditions. Table 1 displays the performance of the proposed method, employing multi-scale maxima and minima. Error measurement was carried out only at the locations of retrieved features, in order to assess the reliability of the corresponding velocities. In both sequences, evaluation revealed high accuracy of the extracted vector fields for both maxima and minima, suggesting that a combination of the two retrieved velocities could be employed during the interpolation process. The error measure is expressed in degrees. The accuracy of the dense vector field depends on the reconstruction method used. As a preliminary study, the homogeneous diffusion interpolation method was applied in this optic flow algorithm.

Fig. 3. Vector fields in the artificial sequences. Vector field of the translating sequence (left) and two frames of the vector field of the contracting and expanding phantom (middle and right).

Table 1. Performance of the proposed optic flow method with different multi-scale feature points. In the experiments the Average Angular Error (AAE) and its standard deviation have been employed as error measurement. The error measure is expressed in degrees. The scales used in the experiment were: spatial scale σ = {1, 1.3, 1.7, 2.3, 3}, time scale 1.

            Translating Sequence            Nonrigid Motion
Feature     Maxima         Minima           Maxima     Minima
AAE         5.4 × 10^-5    2.4 × 10^-5      1.0        0.2
Std         2.1 × 10^-5    1.3 × 10^-5      1.4        0.1

Figure 4 depicts plots of the average angular error for both phantoms with respect to the scale σ. The graphs reveal that the smallest average angular error was obtained at different scales for different features, highlighting the importance of using a multi-scale approach. In particular, for the translating sequence, maxima and minima (Figure 4, row 1) performed best at scale σ = 1 and scale σ = 1.3 respectively; for the contracting and expanding phantom, maxima and minima performed best at scale σ = 2.3 and scale σ = 1.7 respectively (Figure 4, row 2).

Fig. 4. Average angular error plotted as a function of scale. Plots in row 1 display the average angular error for the vector field extracted from the translating sequence. Case maxima, column 1; case minima, column 2. Plots in row 2 depict the average angular error for the motion field computed from the contracting and expanding phantom. Case maxima, column 1; case minima, column 2.

Figure 5 displays the 3-dimensional sparse vector fields on 2-dimensional cross-sections of the tenth frame of the real cardiac image volume. The heart is in the contraction phase. On the short axis view in row 1, column 1, the velocity vectors in yellow point not only toward the center of the ventricle, but also downward. To the right, the same image is displayed from another perspective, showing how the method is able to find through-plane components of the velocity vectors. This is also confirmed by the velocity vectors of the long axis view images in row 2, which point downward as well.

Fig. 5. Three-dimensional velocity flow field on two-dimensional cross sections of the cardiac image volume. Short axis (row 1) and two long axis views (row 2). The 3-dimensional vectors describe the cardiac motion accurately and overcome problems typical of 2-dimensional optic flow methods, such as through-plane motion.

5 Discussion

In this paper we investigate a new method to track cardiac motion from 3-dimensional volume images by following the movement of multi-scale singularity points. The computed 3-dimensional vector field exhibits expansions, contractions and twistings of the cardiac tissue (Figure 5), making the results more realistic than velocity fields obtained by a 2-dimensional approach. In the latter case, results would highlight only contractions, expansions and rotations of the cardiac muscle, and through-plane motion would not be detected. The proposed method is not based on conservation of brightness, but on the assumption that an extremum located at a crossing between the tags will remain an extremum after a displacement, even under T1 relaxation, which shows up as tag fading. Therefore, equation (4) does hold in this case. Moreover, in order to improve the localization of critical points, the images have been filtered in the Fourier domain, a phase image sequence has been reconstructed and a sine-phase grid sequence has been generated. The new images adhere to the brightness conservation principle due to the filtering in the Fourier domain, the fading problem is avoided completely, and equation (4) holds for these filtered images as well.

The method has been assessed using two phantoms, one translating sequence and one expanding and contracting phantom, for which the ground truth was known. In both cases qualitative and quantitative analysis of the results emphasizes the reliability of the vector field. The experiments have been carried out using only multi-scale maxima and multi-scale minima; in future tests the algorithm will also be assessed with other multi-scale feature points and combinations of those. In the tests the velocity field of our approach has been extracted at fixed scales. The most suitable scale has been chosen by taking into account the performance of the method with respect to the ground truth. In real data, due to continuous deformation of the cardiac walls, the structure changes scale over time; thus, the final results obtained in the assessment may not be optimal. It may therefore be interesting to repeat the same experiments using a more sophisticated scale selection method. Furthermore, the behavior of the cardiac muscle is characterized by twistings and contractions; therefore, interpolation with a term that takes into account the rotation and the expansion of the vector field may improve the results. Finally, the retrieved motion field may also find an application in validating mathematical models describing heart deformation. Ubbink et al. [31], for instance, compared three simulations of the cardiac muscle, illustrating how the orientation of modeled myofibers plays an important role in the computation of the final strain. A validation of these methods might be carried out by comparing the simulated strain with a ground truth strain calculated from the extracted optic flow field using real data.

Acknowledgements

We would like to thank Dr Markus van Almsick for his help with the Mathematica implementation. This work is supported by the ENN 06760 project grant through the Stichting voor de Technische Wetenschappen (STW).

References

1. Rosamond, W., Flegal, K., Furie, K., Go, A., Greenlund, K., Haase, N., Hailpern, S.M., Ho, M., Howard, V., Kissela, B., Kittner, S., Lloyd-Jones, D., McDermott, M., Meigs, J., Moy, C., Nichol, G., O'Donnell, C., Roger, V., Sorlie, P., Steinberger, J., Thom, T., Wilson, M., Hong, Y.: American Heart Association statistics committee and stroke statistics subcommittee: heart disease and stroke statistics 2008 update. A report from the American Heart Association statistics committee and stroke statistics subcommittee. Circulation 117, 2–122 (2008)
2. Rainwater, D.L., McMahan, C.A., Malcom, G.T., Scheer, W.D., Roheim, P.S., McGill, H.C., Strong, J.: Lipid and apolipoprotein predictors of atherosclerosis in youth. Arteriosclerosis, Thrombosis, and Vascular Biology 19, 753–761 (1999)
3. McGill, H.C., McMahan, C.A., Zieske, A.W., Sloop, G.D., Walcott, J.V., Troxclair, D., Malcom, G.T., Tracy, R.E., Oalmann, M.C., Strong, J.P.: Associations of coronary heart disease risk factors with the intermediate lesion of atherosclerosis in youth. Arteriosclerosis, Thrombosis, and Vascular Biology 20 (2000)
4. Horn, B.K.P., Schunck, B.G.: Determining optical flow. Artificial Intelligence 17, 185–203 (1981)
5. Lucas, B., Kanade, T.: An iterative image registration technique with application to stereo vision. In: DARPA, Image Process., vol. 21, pp. 85–117 (1981)
6. Brox, T., Bruhn, A., Papenberg, N., Weickert, J.: High accuracy optical flow estimation based on a theory for warping. In: Pajdla, T., Matas, J.G. (eds.) ECCV 2004. LNCS, vol. 3024, pp. 25–36. Springer, Heidelberg (2004)
7. Bruhn, A., Weickert, J., Kohlberger, T., Schnoerr, C.: A multigrid platform for real-time motion computation with discontinuity-preserving variational methods. International Journal of Computer Vision 70(3), 257–277 (2006)
8. Thirion, J.P.: Image matching as a diffusion process: an analogy with Maxwell's demons. Medical Image Analysis 2(3), 243–260 (1998)
9. Dougherty, L., Asmuth, J., Blom, A., Axel, L., Kumar, R.: Validation of an optical flow method for tag displacement estimation. IEEE Transactions on Medical Imaging 18(4), 359–363 (1999)
10. Florack, L., Niessen, W., Nielsen, M.: The intrinsic structure of optic flow incorporating measurements of duality. International Journal of Computer Vision 27(3), 263–286 (1998)
11. Niessen, W., Duncan, J., ter Haar Romeny, B., Viergever, M.: Spatiotemporal analysis of left ventricular motion. In: Medical Imaging 1995, San Diego, pp. 192–203. SPIE (1995)
12. Niessen, W., Duncan, J., Nielsen, M.L.F., ter Haar Romeny, B., Viergever, M.: A multiscale approach to image sequence analysis. Computer Vision and Image Understanding 65(2), 259–268 (1997)
13. Suinesiaputra, A., Florack, L., Westenberg, J., ter Haar Romeny, B., Reiber, J., Lelieveldt, B.: Optic flow computation from cardiac MR tagging using a multiscale differential method a comparative study with velocity encoded MRI. In: Ellis, R.E., Peters, T.M. (eds.) MICCAI 2003. LNCS, vol. 2878, pp. 483–490. Springer, Heidelberg (2003)
14. van Assen, H.C., Florack, L., Suinesiaputra, A., ter Haar Romeny, B.M., Westenberg, J.J.M.: Purely evidence based multi-scale cardiac tracking using optic flow. In: MICCAI 2007 workshop on Computational Biomechanics for Medicine II, pp. 84–93 (2007)
15. Florack, L., van Assen, H.C.: Dense multiscale motion extraction from cardiac cine MR tagging using HARP technology. In: Mathematical Methods in Biomedical Image Analysis. Workshop of the ICCV (2007)
16. Barron, J.: Experience with 3D optical flow on gated MRI cardiac datasets. In: Proceedings of the 1st Canadian Conference on Computer and Robot Vision, pp. 370–377. IEEE Computer Society, Los Alamitos (2004)
17. Pan, L., Prince, J., Lima, J., Arts, N.: Fast tracking of cardiac motion using 3D-HARP. IEEE Transactions on Biomedical Engineering 52(8), 1425–1435 (2005)
18. Sampath, S., Prince, J.: Automatic 3D tracking of cardiac material markers using slice-following and harmonic-phase MRI. Magnetic Resonance Imaging 25, 197–208 (2007)
19. Zerhouni, E.A., Parish, D.M., Rogers, W.J., Yang, A., Sapiro, E.P.: Human heart: tagging with MR imaging a method for noninvasive assessment of myocardial motion. Radiology 169(1), 59–63 (1988)
20. Axel, L., Dougherty, L.: MR imaging of motion with spatial modulation of magnetization. Radiology 171(3), 841–845 (1989)
21. Fischer, S.E., McKinnon, G., Maier, S., Boesiger, P.: Improved myocardial tagging contrast. Magnetic Resonance in Medicine 30(2), 191–200 (1993)
22. Osman, N.F., McVeigh, W.S., Prince, J.L.: Cardiac motion tracking using cine harmonic phase (HARP) magnetic resonance imaging. Magnetic Resonance in Medicine 42(6), 1048–1060 (1999)
23. Sampath, S., Derbyshire, J., Atalar, E., Osman, N., Prince, J.: Realtime imaging of two dimensional cardiac strain using a harmonic phase magnetic resonance imaging (HARP MRI) pulse sequence. Magnetic Resonance in Medicine 50(1), 154–163 (2003)
24. Gabor, D.: Theory of communication. J. IEE 93(26), 429–457 (1946)
25. Rutz, A., Ryf, S., Plein, S., Boesiger, P., Kozerke, S.: Accelerated whole-heart 3D CSPAMM for myocardial motion quantification. Magnetic Resonance in Medicine 59(4), 755–763 (2008)
26. Koenderink, J.J.: The structure of images. Biological Cybernetics 50, 363–370 (1984)
27. ter Haar Romeny, B.M.: Front-End Vision and Multi-Scale Image Analysis: Multiscale Computer Vision Theory and Applications, written in Mathematica. Computational Imaging and Vision. Kluwer Academic Publishers, Dordrecht (2003)
28. Florack, L.: Image Structure. Computational Imaging and Vision. Kluwer Academic Publishers, Dordrecht (1997)
29. Lindeberg, T.: Scale-Space Theory in Computer Vision, 1st edn. The Springer Intern. Series in Engineering and Computer Science. Kluwer Academic Publishers, Dordrecht (1994)
30. Barron, J.L., Fleet, D.J., Beauchemin, S.: Performance of optical flow techniques. International Journal of Computer Vision 12(1), 43–77 (1994)
31. Ubbink, S., Bovendeerd, P., Delhaas, T., Arts, T., van de Vosse, F.: Towards model-based analysis of cardiac MR tagging data: relation between left ventricular shear strain and myofiber orientation. Medical Image Analysis 10, 632–641 (2006)

A Combined Segmentation and Registration Framework with a Nonlinear Elasticity Smoother

Carole Le Guyader (1) and Luminita A. Vese (2)

(1) IRMAR, UMR CNRS 6625, Institut National des Sciences Appliquées de Rennes, 20, Avenue des Buttes de Coësmes, CS 14315, 35043 RENNES Cedex, France, [email protected]
(2) Department of Mathematics, University of California, Los Angeles, Los Angeles, CA 90095-1555, USA, [email protected]

Abstract. In this paper, we present a new non-parametric combined segmentation and registration method. The problem is cast as an optimization one, combining a matching criterion based on the active contour without edges [4] for segmentation, and a nonlinear-elasticity-based smoother on the displacement vector field. The interest of this modeling is twofold: first, registration is performed jointly with segmentation, since it is guided by the segmentation process; this means that the algorithm produces both a smooth mapping between the two shapes and the segmentation of the object contained in the reference image. Secondly, the use of a nonlinear-elasticity-type regularizer allows large deformations to occur, which makes the model comparable in this respect with the viscous fluid registration method [7]. Several applications are proposed to demonstrate the potential of this method both for segmentation of a single image and for registration between two images.

1 Introduction

Image registration and image segmentation are challenging issues that are encountered in a wide range of fields such as medical imaging (shape tracking, comparison of images taken at different instants, data fusion from images that have not necessarily been acquired with the same modality, comparison of data to a common reference frame), pattern recognition or geophysics. We propose in this paper a segmentation model based on the active contour model without edges [4] that is no longer solved in terms of level set functions; instead, it is solved using registration techniques. A displacement field thus models the deformation of the initial curve into the final segmented boundary via registration. The binary segmentation functional [4],

$$F(c_1, c_2, \phi) = \int_\Omega \left( \nu_1 |R - c_1|^2 H(\phi) + \nu_2 |R - c_2|^2 (1 - H(\phi)) + \mu |\nabla H(\phi)| \right) dx$$

(R is the given image, φ is a level set function describing the unknown contour, H is the Heaviside function), can therefore be reformulated as a warping problem between the binary image defining the initial contour and the (unknown) binary segmented image. The proposed model can also be used for registration between two images: having a segmentation of one of the images defined via a displacement field, this is used as the initial guess in the "registration-segmentation" model to segment/register the second image. The main ingredients of our proposed minimization model are thus the active contour model without edges [4], and registration via a nonlinear elasticity smoother, solved in a particular simplified way. The unknown level set function φ is substituted by the unknown transformation, with an appropriate regularization as a substitute for the length term. Topology-preserving segmentation results can be obtained.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 600–611, 2009. © Springer-Verlag Berlin Heidelberg 2009

An extensive overview of registration techniques can be found in [24]. These can be partitioned into two classes: parametric and non-parametric. In the non-parametric methods (our framework) the problem is phrased as a functional minimization whose unknown is the displacement vector field u. Denoting by T the template and by R the reference, the introduced functional combines a distance measure component D[R, T, u] and a smoother S = S[u] on the displacement vector field to remove the ill-posed character of the problem. Usually, the distance measure is intensity-driven and is chosen to be the $L^2$-norm of the difference between the deformed template and the reference (suitable when the images have been acquired through similar sensors), i.e.

$$D[R, T, u] = \frac{1}{2} \int_\Omega \left( T(x + u) - R(x) \right)^2 dx,$$

but one could also use correlation-based or mutual information-based techniques [24]. Several methods to regularize the displacement vector field have been investigated. One is the elastic registration introduced by Broit [3], in which the objects to be registered are considered to be observations of the same elastic body before and after being subjected to a deformation. The smoother S = S[u] is chosen to be the linearized elastic potential of the displacement vector field u, and its expression involves the Lamé coefficients λ, μ, which reflect material properties. A drawback of this smoother is that it is not suitable for problems involving large deformations. To circumvent this problem, Christensen et al. [7] proposed a viscous fluid registration model in which objects are viewed as fluids evolving in accordance with the fluid-dynamic Navier-Stokes equations. However, this is a computationally expensive procedure. In the diffusion registration model introduced by Fischer and Modersitzki [11], the smoother is based on the $H^1(\Omega, \mathbb{R}^n)$ semi-norm of $u = (u_1, \cdots, u_n)^T$, Ω being an open bounded subset of $\mathbb{R}^n$. This choice is motivated by regularizing properties (it minimizes oscillations of all components of u) rather than physical ones, but here again only small deformations can be expected. In the "curvature"-based registration model introduced by Fischer and Modersitzki [12], [13], the $\dot{H}^2$ (biharmonic) regularization is explored. Affine linear transformations belong to the kernel of the regularizer S[u], which is not the case in elastic, viscous fluid or diffusion registration. But here again, transformations are restricted to small deformations. To circumvent this drawback, we propose in this paper a nonlinear elasticity-based smoother that allows larger deformations.
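The intensity-driven distance measure D[R, T, u] introduced above can be illustrated with a small 1-dimensional discretization. This is a sketch under stated assumptions, not the authors' scheme: the template is warped with `np.interp` (linear interpolation, clamped at the ends of the domain), the integral is approximated by a rectangle rule, and the name `ssd_distance` and the test signals are hypothetical.

```python
import numpy as np

def ssd_distance(template, reference, u, x):
    """Discrete D[R, T, u] = 1/2 * integral (T(x + u) - R(x))^2 dx on a 1-D grid."""
    warped = np.interp(x + u, x, template)   # T(x + u), clamped at the domain ends
    dx = x[1] - x[0]
    return 0.5 * np.sum((warped - reference) ** 2) * dx

x = np.arange(10.0)
template = x.copy()            # T(x) = x
reference = x + 0.5            # R(x) = T(x + 0.5)
d_identity = ssd_distance(template, reference, np.zeros_like(x), x)
d_shifted = ssd_distance(template, reference, np.full_like(x, 0.5), x)
```

The displacement u = 0.5 aligns the template with the reference everywhere except at the last grid point, where the warp is clamped. The distance therefore drops sharply but not to zero, a reminder that boundary handling is part of any practical discretization.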


Many improvements or alternatives of these non-parametric methods have been proposed. These include [14], [15], [37], [21], [20], [19]. By comparison with some of these methods, the only input required in our method is a fixed level set function representing the template image, that is, partitioning the image into two regions. Also, we jointly treat segmentation and registration: the distance measure is devised using the segmentation criterion [4], while registration is jointly performed, guided by the segmentation process. Our method applies to a particular class of images, since the binary criterion is being used.

Before depicting our approach, we would like to mention previous work on joint segmentation and registration while stressing the main differences with our model. In [38], Yezzi et al. also suggest jointly treating segmentation and registration. The authors couple segmentation and registration as follows: denoting by $R : \Omega \subset \mathbb{R}^2 \to \mathbb{R}$ and $T = \tilde{R} : \tilde{\Omega} \subset \mathbb{R}^2 \to \mathbb{R}$ the two images containing a common object to be registered and segmented, find a closed curve $C \subset \Omega$ and a closed curve $\tilde{C} \subset \tilde{\Omega}$ related by $\tilde{C} = g(C)$, where $g : \mathbb{R}^2 \to \mathbb{R}^2$ is an element of a finite dimensional group $G$ (for instance, the group of rigid motions), such that $C$ and $\tilde{C}$ correctly delineate the object contained in $R$ and the one contained in $\tilde{R}$, respectively. Consequently, there are two unknowns, the closed curve $C \subset \Omega$ and the mapping $g$. The authors exploit region-based active contour models [4] and minimize the energy:

$$E(g, C) = E_1(C) + E_2(g(C)) = \int_{C_{in}} |R - c_1|^2 \, dx + \int_{C_{out}} |R - c_2|^2 \, dx + \int_{\tilde{C}_{in}} |\tilde{R} - \tilde{c}_1|^2 \, dx + \int_{\tilde{C}_{out}} |\tilde{R} - \tilde{c}_2|^2 \, dx,$$

with $C_{in}$ and $C_{out}$ the regions inside and outside $C$, $c_1$ and $c_2$ the mean values of $R$ on $C_{in}$ and $C_{out}$, and with $\tilde{C}_{in}$ and $\tilde{C}_{out}$ the regions inside and outside $\tilde{C}$, $\tilde{c}_1$ and $\tilde{c}_2$ the mean values of $\tilde{R}$ on $\tilde{C}_{in}$ and $\tilde{C}_{out}$. The main differences with our model are: the contours $C$ and $\tilde{C}$ are jointly deformed here through a combination of segmentation and registration methods while in our model, we assume that the object in the template image has already been detected (we could have considered a problem with two unknowns as well). It means that the energy-minimization problem is only written in terms of the unknown contour $C$. Segmentation is performed using a registration approach as in [38]. The model is cast in the level set setting, which allows a straightforward modeling of the evolving curve. Contrary to [38], the class of admissible deformations (rigid, etc.) is not an input in our model. Their model, first exposed in the context of rigid deformations, has then been extended to non-rigid motions [35], [34], [29].

We would also like to mention the interesting work by Lord et al. [22], which uses a matching criterion based on metric structure comparison. The authors propose a unified method that simultaneously treats segmentation and registration by introducing two unknowns in the process: the deformation map and the segmenting curve. The segmentation process is guided by the registration map. The matching criterion, unlike classical registration methods, rests on the minimization of deviation from isometry. The matching criterion introduced is based on the metric structure comparison of the surfaces, more precisely on their first
fundamental form, and on a homogeneity constraint as in [4]. Thus contrary to our model in which the expected curve (implicitly represented as the zero level set of a Lipschitz function) delineates two regions with homogeneous intensity, their criterion is still based on metric structure comparisons to disconnect normal regions from abnormal ones. We would also like to mention the related work by Vemuri et al. [31], [32]. The authors propose a coupled PDE model to perform both segmentation and registration. In the ﬁrst PDE, the level sets of the source image are evolved along their normals with a speed deﬁned as the diﬀerence between the target and the evolving source image. The second PDE allows to explicitly retrieve the displacement vector ﬁeld. In particular, in the work of Vemuri-Chen [30] for joint registration and segmentation, the piecewise-smooth level set segmentation model from [33] is combined with prior shape information through global alignment. As will be seen below, our model is diﬀerent from the one in [30]. We also refer the reader to [5] in which a geodesic-active-contour-based model including a shape prior is presented and [6] in which a shape prior is incorporated this time in the Mumford-Shah model. Related work is presented in [10], on an atlas-based segmentation of medical images locally constrained by level sets. We wish to refer to a segmentation method, diﬀerent from ours, that also uses nonlinear elasticity to deﬁne the deformation of the evolving contour or surface in Rouchdy et al. [27]. The segmentation criterion is based on the gradient vector ﬂow [36], and a deformation ﬁeld is computed via non-linear elasticity using the ﬁnite element method. 
For completeness, we also refer the reader to [2], [23] for a variational registration method for large deformations, to [26], for a much related work which also uses nonlinear elasticity regularization but which is implemented using the ﬁnite element method, and to [9], a related work that uses nonlinear elasticity principles but diﬀerent from our proposed approach. More details of the proposed method are presented in [18].

2 Description of the Proposed Model

As mentioned in Sect. 1, the scope of the proposed method is twofold:

– devise a model in which segmentation and registration are jointly performed;
– authorize large and smooth deformations, while keeping the deformation map topology-preserving.

We see in the sequel how these criteria are fulfilled.

Distance Measure Criterion. Let $\Omega$ be a bounded open subset of $\mathbb{R}^n$. For the purpose of illustration, we consider the case $n = 2$. Let us denote by $R : \bar{\Omega} \to \mathbb{R}$ the "reference" image to be segmented (later we will discuss how the proposed method can be used for registration between a template image $T : \bar{\Omega} \to \mathbb{R}$ and the reference image $R$; initially, our method is defined as a segmentation method based on [4]). Let $\Phi_0$ be a given Lipschitz level set function. Denoting by $C$ the zero level set of $\Phi_0$ and by $w \subset \Omega$ the open set it delineates, $\Phi_0$ is such that:

$$C = \{x \in \Omega \mid \Phi_0(x) = 0\}, \quad w = \{x \in \Omega \mid \Phi_0(x) > 0\}, \quad \Omega \setminus \bar{w} = \{x \in \Omega \mid \Phi_0(x) < 0\}.$$

The deformation of the evolving curve is made in order to satisfy a segmentation criterion. Indeed, the distance measure we introduce is related to the fitting term of the active contours without edges model [4]. In this way, registration and segmentation are correlated and we expect, at the end of the process, to obtain the segmentation of the reference image as well as a smooth deformation map. This results in a region-based intensity approach and no longer in a pointwise process as usually done. The idea is to find a smooth displacement vector field $u = (u_1, u_2) : \Omega \to \mathbb{R}^2$, $x \mapsto (u_1(x), u_2(x)) \in \Omega$, for each $x \in \Omega$, such that the zero level line of $\Phi$ defined by $\Phi(x) = \Phi_0(x + u(x))$ fits the boundary of the object to be warped in the given "reference" image. Denoting by $H$ the one-dimensional Heaviside function, by $\nu_1, \nu_2 > 0$ two fixed parameters, and $c_1$ and $c_2$ being two unknown constants depending on $\Phi_0$, $R$ and $u$, the distance measure functional $F_d$ (the segmentation criterion) is defined by:

$$F_d(c_1, c_2, u) = \nu_1 \int_{\Omega} |R(x) - c_1|^2 \, H(\Phi_0(x + u(x))) \, dx + \nu_2 \int_{\Omega} |R(x) - c_2|^2 \, \big(1 - H(\Phi_0(x + u(x)))\big) \, dx. \tag{1}$$

We need to add a regularization term of the form $F_{reg}(u)$ to (1), which is a substitute for the length term of the evolving curve in [4], and therefore the unknown $\Phi(x)$ from [4] is substituted by $\Phi_0(x + u(x))$, with $\Phi_0$ fixed now. Thus, we obtain a binary segmentation method that can also be used for registration.

Introduction of a Nonlinear Elasticity-Based Regularizer. A regularizing term $F_{reg}$ is now introduced to ensure the smoothness of the displacement vector field $u$. To allow large displacements, we introduce a nonlinear-elasticity-based smoother. We propose to view the deformation of the initial contour into the final segmented contour as the deformation undergone by St. Venant-Kirchhoff materials. These materials are homogeneous, isotropic, hyperelastic and the axiom of frame indifference is satisfied (see [8] for further details). Let us denote by $\varepsilon$ the Green-St. Venant strain tensor defined by $\varepsilon = \frac{1}{2}(C - I)$ with $C = \nabla\varphi^T \nabla\varphi$, $\varphi$ being the deformation such that $\varphi = \mathrm{Id} + u$, $\nabla\varphi$ being the Jacobian matrix and $I$ denoting the identity matrix. We have equivalently $\varepsilon = \varepsilon(u) = \frac{1}{2}(\nabla u^T + \nabla u + \nabla u^T \nabla u)$. The strain tensor is a measure of the deviation between a given deformation and a rigid deformation for which $C = I$. As stressed by Ciarlet [8], St. Venant-Kirchhoff materials are the simplest ones among nonlinear models (large strains are also possible when the stress is small, however a linear relation implies that the stress is small if and only if the strain is small). The stored energy of St. Venant-Kirchhoff materials [8] is given by $W(\varepsilon) = \frac{\lambda}{2} (\mathrm{tr}\, \varepsilon)^2 + \mu \, \mathrm{tr}\, \varepsilon^2$. Thus, the nonlinear elasticity regularizer that will be coupled with the distance measure functional $F_d$ is defined by:

$$F_{reg}(u) = \int_{\Omega} W(\varepsilon(u)) \, dx = \int_{\Omega} \left[ \frac{\lambda}{2} \big(\mathrm{tr}\, \varepsilon(u)\big)^2 + \mu \, \mathrm{tr}\, \varepsilon^2(u) \right] dx. \tag{2}$$
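To make the role of the Green-St. Venant strain concrete, the following sketch evaluates the stored energy at a single point for a given 2x2 displacement gradient. It is a minimal illustration only: the function names `strain` and `svk_energy`, the argument `G` standing for the matrix of partial derivatives of u at one point, and the sample values of the Lamé coefficients are assumptions of this example, not part of the paper.

```python
import math

def strain(G):
    """Green-St. Venant strain 0.5 * (G^T + G + G^T G) for a 2x2 matrix G
    holding the displacement gradient (G[i][j] = d u_i / d x_j)."""
    return [[0.5 * (G[j][i] + G[i][j] + sum(G[k][i] * G[k][j] for k in range(2)))
             for j in range(2)] for i in range(2)]

def svk_energy(G, lam, mu):
    """Stored energy W = lam/2 * (tr e)^2 + mu * tr(e^2) of a
    St. Venant-Kirchhoff material at one point."""
    e = strain(G)
    trace = e[0][0] + e[1][1]
    trace_sq = sum(e[i][j] * e[j][i] for i in range(2) for j in range(2))
    return 0.5 * lam * trace ** 2 + mu * trace_sq

# A pure rotation by 0.7 rad: u(x) = (Q - I) x, so G = Q - I.
c, s = math.cos(0.7), math.sin(0.7)
G_rot = [[c - 1.0, -s], [s, c - 1.0]]
w_rot = svk_energy(G_rot, lam=1.0, mu=1.0)

# A 10% stretch along the first axis.
G_stretch = [[0.1, 0.0], [0.0, 0.0]]
w_stretch = svk_energy(G_stretch, lam=1.0, mu=1.0)
```

A rigid rotation leaves the energy at zero (up to rounding), whereas the linearized strain 0.5 * (G + G^T) would not vanish for the same rotation; this is one way to see why the nonlinear tensor tolerates larger deformations.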


Although this functional does not satisfy the known theoretical assumptions that ensure existence of minimizers (the stored energy function is not polyconvex; it is also not rank-1 convex and consequently not quasiconvex, which raises a drawback of theoretical nature since the introduced functional is not lower semi-continuous on $W^{1,4}$), we can expect to get, in practice, better results than those obtained with linearized models, as will be demonstrated next. The computation of the Euler-Lagrange equation satisfied by $u$ is cumbersome. Following the idea of the more theoretical work [25], we propose to circumvent this issue by introducing a second unknown, a matrix auxiliary variable $V$, which approximates the Jacobian matrix of $u$. The nonlinear elasticity regularizer is thus applied to $V$ and no longer to $\nabla u$, that is, the nonlinearity is no longer in the derivatives of the unknown $u$. Also, as the matrix variable $V$ is introduced to mimic the Jacobian matrix of $u$, an additional term based on the Frobenius norm $\| \cdot \|_F$ of $\nabla u - V$ is incorporated in the modeling. More precisely, letting $\hat{V} = \frac{V^T + V + V^T V}{2}$ and $\alpha > 0$ a tuning parameter, we redefine the smoothing functional $F_{reg} = F_{reg}(u, V)$ by:

$$F_{reg}(u, V) = \int_{\Omega} W(\hat{V}) \, dx + \frac{\alpha}{2} \int_{\Omega} \|\nabla u - V\|_F^2 \, dx. \tag{3}$$

In the limit, as $\alpha \to +\infty$, we obtain $\nabla u \approx V$ in the $L^2$-topology.

Total Energy Functional. The total energy $E_{total}$ considered in the remainder of this work is given by:

$$E_{total}(c_1, c_2, u, V) = F_d(c_1, c_2, u) + F_{reg}(u, V). \tag{4}$$

Evolution Problem. We give the form of the associated Euler-Lagrange equations in the two-dimensional case. In the calculations, the Heaviside function is replaced by a smooth version denoted by $H_\epsilon$, with $H_\epsilon' = \delta_\epsilon$ a regularization of the Dirac measure. Fixing $u$ and $V$ and minimizing $E_{total}(c_1, c_2, u, V)$ with respect to $c_1$ and $c_2$ yields, as in [4]:

$$c_1 = \frac{\int_{\Omega} R(x) \, H_\epsilon(\Phi_0(x + u(x))) \, dx}{\int_{\Omega} H_\epsilon(\Phi_0(x + u(x))) \, dx}, \qquad c_2 = \frac{\int_{\Omega} R(x) \, \big(1 - H_\epsilon(\Phi_0(x + u(x)))\big) \, dx}{\int_{\Omega} \big(1 - H_\epsilon(\Phi_0(x + u(x)))\big) \, dx}.$$
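The update of the two constants can be sketched as weighted means over the pixel grid. The arctan-based regularization of the Heaviside function is the one the paper quotes from [4] in Sect. 3; everything else here is an assumption of this example: the function names, the default value of `eps`, and the argument `phi`, which is taken to hold the already warped level set values Φ0(x + u(x)) at each pixel.

```python
import math

def heaviside_eps(z, eps=1.0):
    """Smooth Heaviside H_eps(z) = 1/2 * (1 + (2/pi) * arctan(z / eps))."""
    return 0.5 * (1.0 + (2.0 / math.pi) * math.atan(z / eps))

def delta_eps(z, eps=1.0):
    """Derivative of heaviside_eps: a smooth approximation of the Dirac measure."""
    return eps / (math.pi * (eps * eps + z * z))

def region_means(R, phi, eps=1.0):
    """Weighted means c1 (where phi > 0) and c2 (where phi < 0) of the image R.
    phi holds the warped level set values, one per pixel."""
    num1 = den1 = num2 = den2 = 0.0
    for row_r, row_phi in zip(R, phi):
        for r, p in zip(row_r, row_phi):
            weight = heaviside_eps(p, eps)
            num1 += r * weight
            den1 += weight
            num2 += r * (1.0 - weight)
            den2 += 1.0 - weight
    return num1 / den1, num2 / den2

# Hypothetical 2x4 image: bright object on the left, dark background on the right.
R = [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
phi = [[100.0, 100.0, -100.0, -100.0], [100.0, 100.0, -100.0, -100.0]]
c1, c2 = region_means(R, phi, eps=1e-3)
```

With a level set that separates the two regions cleanly and a small `eps`, c1 and c2 approach the mean intensities inside and outside the contour.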

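Computing the first variation of the functional $F_d(c_1, c_2, u)$ in (1) with respect to $u$ gives the following gradient:

$$\partial_u F_d(c_1, c_2, u) = \left[ \nu_1 (R - c_1)^2 - \nu_2 (R - c_2)^2 \right] \delta_\epsilon(\Phi_0(x + u(x))) \, \nabla\Phi_0(x + u(x)).$$

Also, computing the first variation of the functional $F_{reg}(u, V)$ in (3) with respect to $u$ gives only linear differential equations in each component $u_k$:

$$\partial_{u_k} F_{reg}(u, V) = -\alpha \left( \Delta u_k - \frac{\partial v_{k1}}{\partial x_1} - \frac{\partial v_{k2}}{\partial x_2} \right), \quad k = 1, 2. \tag{5}$$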


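To finish, setting $V = (v_{ij})_{1 \le i,j \le 2}$ and letting

$$c_{01} = v_{11} + v_{22} + \frac{1}{2}\left(v_{11}^2 + v_{12}^2 + v_{21}^2 + v_{22}^2\right), \qquad c_{02} = 2v_{11} + v_{11}^2 + v_{21}^2,$$

$$c_{03} = 2v_{22} + v_{12}^2 + v_{22}^2, \qquad c_{04} = v_{12} + v_{21} + v_{11} v_{12} + v_{21} v_{22},$$

we obtain:

$$\begin{aligned}
\partial_{v_{11}} F_{reg}(u, V) &= \alpha \left( v_{11} - \frac{\partial u_1}{\partial x_1} \right) + (\lambda c_{01} + \mu c_{02})(1 + v_{11}) + \mu c_{04} v_{12}, \\
\partial_{v_{12}} F_{reg}(u, V) &= \alpha \left( v_{12} - \frac{\partial u_1}{\partial x_2} \right) + (\lambda c_{01} + \mu c_{03}) v_{12} + \mu c_{04} (1 + v_{11}), \\
\partial_{v_{21}} F_{reg}(u, V) &= \alpha \left( v_{21} - \frac{\partial u_2}{\partial x_1} \right) + (\lambda c_{01} + \mu c_{02}) v_{21} + \mu c_{04} (1 + v_{22}), \\
\partial_{v_{22}} F_{reg}(u, V) &= \alpha \left( v_{22} - \frac{\partial u_2}{\partial x_2} \right) + (\lambda c_{01} + \mu c_{03})(1 + v_{22}) + \mu c_{04} v_{21}.
\end{aligned} \tag{6}$$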
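The closed-form derivatives with respect to the entries of V can be checked numerically against the pointwise integrand of the regularizer. The sketch below does this with central finite differences. It is a sanity check under stated assumptions, not the authors' code: the function names, the matrix `G` standing for the value of ∇u at one point, and the sample parameter values are all hypothetical.

```python
def v_energy(V, G, lam, mu, alpha):
    """Pointwise integrand W(V_hat) + alpha/2 * ||G - V||_F^2,
    where V_hat = (V^T + V + V^T V) / 2 and G stands in for grad u."""
    (v11, v12), (v21, v22) = V
    e11 = 0.5 * (2 * v11 + v11 ** 2 + v21 ** 2)
    e22 = 0.5 * (2 * v22 + v12 ** 2 + v22 ** 2)
    e12 = 0.5 * (v12 + v21 + v11 * v12 + v21 * v22)
    trace = e11 + e22
    stored = 0.5 * lam * trace ** 2 + mu * (e11 ** 2 + 2 * e12 ** 2 + e22 ** 2)
    fit = sum((G[i][j] - V[i][j]) ** 2 for i in range(2) for j in range(2))
    return stored + 0.5 * alpha * fit

def v_gradient(V, G, lam, mu, alpha):
    """Closed-form derivatives with respect to v11, v12, v21, v22."""
    (v11, v12), (v21, v22) = V
    c01 = v11 + v22 + 0.5 * (v11 ** 2 + v12 ** 2 + v21 ** 2 + v22 ** 2)
    c02 = 2 * v11 + v11 ** 2 + v21 ** 2
    c03 = 2 * v22 + v12 ** 2 + v22 ** 2
    c04 = v12 + v21 + v11 * v12 + v21 * v22
    a, b = lam * c01 + mu * c02, lam * c01 + mu * c03
    return [[alpha * (v11 - G[0][0]) + a * (1 + v11) + mu * c04 * v12,
             alpha * (v12 - G[0][1]) + b * v12 + mu * c04 * (1 + v11)],
            [alpha * (v21 - G[1][0]) + a * v21 + mu * c04 * (1 + v22),
             alpha * (v22 - G[1][1]) + b * (1 + v22) + mu * c04 * v21]]

def fd_gradient(V, G, lam, mu, alpha, step=1e-6):
    """Central finite-difference approximation of the same derivatives."""
    grad = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            plus = [row[:] for row in V]
            minus = [row[:] for row in V]
            plus[i][j] += step
            minus[i][j] -= step
            grad[i][j] = (v_energy(plus, G, lam, mu, alpha)
                          - v_energy(minus, G, lam, mu, alpha)) / (2 * step)
    return grad

V = [[0.1, -0.2], [0.05, 0.3]]
G = [[0.2, 0.1], [-0.1, 0.0]]
exact = v_gradient(V, G, lam=1.0, mu=0.5, alpha=2.0)
approx = fd_gradient(V, G, lam=1.0, mu=0.5, alpha=2.0)
max_gap = max(abs(exact[i][j] - approx[i][j]) for i in range(2) for j in range(2))
```

The two gradients should agree up to the finite-difference error, which supports the term-by-term form of the four expressions.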

We solve the Euler-Lagrange equations in $u$ and $V$ using gradient descent, parameterizing the descent direction by an artificial time $t \ge 0$. Systems of 4 and 2 equations are obtained (solved by semi-implicit finite difference schemes),

$$\frac{\partial V}{\partial t} = -\partial_V F_{reg}(u, V), \qquad \frac{\partial u}{\partial t} = -\partial_u F_d(c_1, c_2, u) - \partial_u F_{reg}(u, V), \tag{7}$$

equipped with the boundary conditions $u = 0_{\mathbb{R}^2}$ on $\partial\Omega$ and with the initial conditions $u(x, 0) = 0_{\mathbb{R}^2}$ and $V = 0_{M_2(\mathbb{R})}$. In most cases, no regridding is necessary. Nevertheless, in the algorithm, we have used a regridding technique quite similar to the one proposed by Christensen et al. [7]. The Jacobian $\det(\nabla(\mathrm{Id} + u))$ is monitored and if it drops below a defined threshold in some parts of the image, the process is reinitialized. The only change is that instead of doing the reinitialization step with the last deformed template as done in [7], we use the last deformed level set function $\Phi_0(\cdot + u(\cdot))$. The overall displacement $u$ is reconstructed similarly to [7].
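The Jacobian monitoring that triggers regridding can be sketched as follows. This is an assumed discretization for illustration only: the function names, the central differences (one-sided at the border) and the grid layout with x along columns and y along rows are choices made here. The default threshold 0.01 is the admissibility value reported in the experiments of Sect. 3.

```python
def jacobian_det(ux, uy, h=1.0):
    """det(grad(Id + u)) at every pixel; ux, uy are the displacement components
    along x (columns) and y (rows). Requires at least 2 rows and 2 columns."""
    rows, cols = len(ux), len(ux[0])

    def d_dx(f, i, j):
        jm, jp = max(j - 1, 0), min(j + 1, cols - 1)
        return (f[i][jp] - f[i][jm]) / ((jp - jm) * h)

    def d_dy(f, i, j):
        im, ip = max(i - 1, 0), min(i + 1, rows - 1)
        return (f[ip][j] - f[im][j]) / ((ip - im) * h)

    return [[(1 + d_dx(ux, i, j)) * (1 + d_dy(uy, i, j))
             - d_dy(ux, i, j) * d_dx(uy, i, j)
             for j in range(cols)] for i in range(rows)]

def needs_regridding(ux, uy, threshold=0.01, h=1.0):
    """True if the Jacobian drops below the threshold somewhere in the image."""
    return min(min(row) for row in jacobian_det(ux, uy, h)) < threshold

# Hypothetical displacement fields on a 4x4 grid.
zero = [[0.0] * 4 for _ in range(4)]
squeeze = [[-0.995 * j for j in range(4)] for _ in range(4)]  # strong compression in x
ok_identity = needs_regridding(zero, zero)
ok_squeeze = needs_regridding(squeeze, zero)
```

For the identity map the determinant equals 1 everywhere, while the compression gives a determinant of about 0.005, so only the second field would trigger the reinitialization step.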

3 Numerical Experiments

We conclude the paper by presenting several results on both synthetic and real images in 2 dimensions. In most experiments, $\nu_1 = \nu_2 = 1$, but when dealing with complex topologies involving long and thin concavities, these parameters have been increased up to 2.5. The $C^\infty$ regularization of the Heaviside function [4] is $H_\epsilon(z) = \frac{1}{2}\left(1 + \frac{2}{\pi} \arctan \frac{z}{\epsilon}\right)$. Our first experimental test in Fig. 1 is an academic one and is similar to those performed by Modersitzki in [24] (we refer to pages 114–115, 129–130, 150–153, 168–170 for comparisons using linear elasticity, diffusion, curvature, or the viscous fluid method), with the goal to illustrate that the model easily handles large displacements while segmenting the reference object. The problem is to warp a


Fig. 1. Top: left, the reference image; right the template. Bottom: left, the boundary of the disk (zero level set of Φ0 ) superimposed on the reference image; middle, the segmentation of the letter C; right, deformed grid using nonlinear elasticity regularization.

Fig. 2. Left, boundary of the ellipse (zero level set of Φ0 ) superimposed on the reference image; middle, the topology-preserving segmentation of the two disks; right, deformed grid using nonlinear elasticity regularization

black disk to the letter C, both defined on the same image domain. The given data are the template and reference images as well as the curve delineating the disk boundary. We wish to demonstrate that our method qualitatively performs in a way similar to the fluid model without requiring the expensive Navier-Stokes solver employed for its numerical discretization, and provides two results: the segmentation of the reference image as well as a smooth displacement vector field u. The implementation is simple, based on finite difference schemes, and removes the nonlinearity in the derivatives of the unknown u. The method allows large deformations, unlike the linear elasticity, diffusion and curvature-based models, for which the registration cannot be accomplished because the images differ too much (see pages 114–115, 150–153, 168–171 from [24]). In this example, three regridding steps were necessary: the transformation was considered as admissible if the Jacobian exceeded 0.01. Note that regridding steps were also necessary with the fluid registration model.


Fig. 3. Topology-preserving segmentation of three complex slices of the brain. Left, the boundary of the disk (zero level set of Φ0 ) superimposed on the reference image; middle, the segmentation of the slice of the brain; right, deformed grid using nonlinear elasticity regularization.

The second example in Fig. 2 illustrates how the method can be used in the case of topology-preserving segmentation (see [16], [1], [28], [17] on this topic). This synthetic reference image represents two disks (similar to tests performed in prior related works [16], [28], [17]). The template image, defined on the same image domain, is made of a black ellipse such that, when superimposed on the reference image, its boundary encloses the two disks. We aim at segmenting these two disks while maintaining the same topology throughout the process (one path-connected component) and at obtaining a smooth displacement vector field u. In this example, two regridding steps were necessary: the transformation was considered as admissible if the Jacobian exceeded 0.01. The method has been tested on complex slices of brain data. The goal is to register a disk to the outer boundary of the brain with topology preservation. In Fig. 3, the template image, defined on the same image domain, is made of a disk (shown superimposed on the reference). Two regridding steps were necessary for the first row, and 3-4 regridding steps for the 2nd and 3rd rows: the transformation was considered as admissible if the Jacobian exceeded 0.01.


Fig. 4. Top: left, reference R; right, template T (mouse atlas and gene data). Bottom, left to right: contour obtained by the proposed algorithm segmenting template T (starting with Φ0 deﬁning a disk), superimposed over the reference R; segmented reference, using as Φ0 the output contour detected at the previous step; ﬁnal deformed grid using nonlinear elasticity smoother.

Fig. 5. Experiment exactly as in Fig. 4

Another medical application, as shown in Fig. 4 and Fig. 5, is proposed for mapping mouse gene data to an atlas. First, the proposed method is applied to the gene data, using Φ0 deﬁning a disk, to segment it and extract a contour; then the method is applied again using as Φ0 the new contour, to segment the atlas data. In the process, we obtain a smooth deformation between the gene and the atlas data. No regridding step was necessary for Fig. 4.

Acknowledgments

This work was supported in part by the National Institutes of Health (NIH) through the NIH Roadmap for Medical Research Grant U54 RR021813 entitled Center for Computational Biology, and by the National Science Foundation Grant DMS 0312222.

References

1. Alexandrov, O., Santosa, F.: A topology-preserving level set method for shape optimization. J. Comput. Phys. 204(1), 121–130 (2005)
2. Beg, F., Miller, M., Trouvé, A., Younes, L.: Computing large deformation metric mappings via geodesic flows of diffeomorphisms. IJCV 61(2), 139–157 (2005)
3. Broit, C.: Optimal Registration of Deformed Images. PhD thesis, Computer and Information Science, University of Pennsylvania (1981)
4. Chan, T., Vese, L.: Active Contours Without Edges. IEEE Trans. Image Process. 10(2), 266–277 (2001)
5. Chen, Y., Thiruvenkadam, H., Tagare, H., Huang, F., Wilson, D.: On the Incorporation of Shape Priors in Geometric Active Contours. In: IEEE Workshop on VLSM, pp. 145–152 (2001)
6. Chen, Y., Thiruvenkadam, H., Gopinath, K., Brigg, R.: Image Registration Using the Mumford-Shah Functional and Shape Information. In: World Multiconference on Systems, Cybernetics and Informatics, pp. 580–583 (2002)
7. Christensen, G.E., Rabbitt, R.D., Miller, M.I.: Deformable Templates Using Large Deformation Kinematics. IEEE Trans. Image Process. 5(10), 1435–1447 (1996)
8. Ciarlet, P.G.: Elasticité Tridimensionnelle. Masson (1985)
9. Droske, M., Rumpf, M.: A variational approach to non-rigid morphological registration. SIAM J. Appl. Math. 64(2), 668–687 (2004)
10. Duay, V., Houhou, N., Thiran, J.-P.: Atlas-based segmentation of medical images locally constrained by level sets. In: ICIP, vol. 2 (2005)
11. Fischer, B., Modersitzki, J.: Fast Diffusion Registration. AMS Contemporary Mathematics. Inverse Problems, Image Analysis, and Medical Imaging 313, 117–129 (2002)
12. Fischer, B., Modersitzki, J.: Curvature based image registration. JMIV 18(1), 81–85 (2003)
13. Fischer, B., Modersitzki, J.: A Unified Approach to Fast Image Registration and a New Curvature Based Registration Technique. Linear Algebra and its applications 380, 107–124 (2004)
14. Haber, E., Modersitzki, J.: Numerical methods for volume preserving image registration. Inverse problems 20(5), 1621–1638 (2004)
15. Haber, E., Modersitzki, J.: Image Registration with Guaranteed Displacement Regularity. Int. J. Comput. Vision 71(3), 361–372 (2007)
16. Han, X., Xu, C., Prince, J.L.: A Topology Preserving Level Set Method for Geometric Deformable Models. IEEE Trans. Pattern Anal. Mach. Intell. 25(6), 755–768 (2003)
17. Le Guyader, C., Vese, L.: Self-repelling snakes for topology-preserving segmentation models. IEEE Trans. Image Process. 17(5), 767–779 (2008)
18. Le Guyader, C., Vese, L.: A Combined Segmentation and Registration Framework with a nonlinear Elasticity Smoother. UCLA C.A.M. Report 08-16 (2008)
19. Leow, A., Chiang, M.-C., Protas, H., Thompson, P., Vese, L., Huang, H.S.C.: Linear and Non-Linear Geometric Object Matching with Implicit Representation. In: Proc. 17th ICPR, vol. 3, pp. 710–713 (2004)
20. Liao, W.-H., Khuu, A., Bergsneider, M., Vese, L., Huang, S.-C., Osher, S.: From Landmark Matching to Shape and Open Curve Matching: A Level Set Approach. UCLA CAM Report 02-59 (2002)
21. Liao, W.-H., Yu, C.-L., Bergsneider, M., Vese, L., Huang, S.-C.: A New Framework of Quantifying Differences Between Images by Matching Gradient Fields and Its Application to Image Blending. In: Nuclear Science Symposium Conference Record, vol. 2, pp. 1092–1096. IEEE, Los Alamitos (2002)
22. Lord, N.A., Ho, J., Vemuri, B.C., Eisenschenk, S.: Simultaneous Registration and Parcellation of Bilateral Hippocampal Surface Pairs for Local Asymmetry Quantification. IEEE Trans. Med. Imaging 26(4), 417–478 (2007)
23. Miller, M., Trouvé, A., Younes, L.: On the metrics and Euler-Lagrange equations of computational anatomy. Annu. Rev. B. Eng. 4, 375–405 (2002)
24. Modersitzki, J.: Numerical Methods for Image Registration. Oxford University Press, Oxford (2004)
25. Negrón Marrero, P.V.: A numerical method for detecting singular minimizers of multidimensional problems in nonlinear elasticity. Numerische Mathematik 58, 135–144 (1990)
26. Rabbitt, R.D., Weiss, J.A., Christensen, G.E., Miller, M.I.: Mapping of hyperelastic deformable templates using the finite element method. In: Proceedings SPIE, vol. 2573, pp. 252–265 (1995)
27. Rouchdy, Y., Pousin, J., Schaerer, J., Clarysse, P.: A nonlinear elastic deformable template for soft structure segmentation: application to the heart segmentation in MRI. IP 23, 1017–1035 (2007)
28. Sundaramoorthi, G., Yezzi, A.: Global regularizing flows with topology preservation for active contours and polygons. IEEE Trans. Image Process. 16(3), 803–812 (2007)
29. Unal, G.B., Slabaugh, G.G.: Coupled PDE's for non-rigid registration and segmentation. In: CVPR, pp. 168–175 (2004)
30. Vemuri, B., Chen, Y.: Joint image registration and segmentation. In: Osher, S., Paragios, N. (eds.) Geometric Level Set Methods, pp. 251–269 (2003)
31. Vemuri, B., Ye, J., Chen, Y., Leonard, C.: A level-set based approach to image registration. In: IEEE Workshop on Mathematical Methods in Biomedical Image Analysis, pp. 86–93 (2000)
32. Vemuri, B., Ye, J., Chen, Y., Leonard, C.: Image Registration via level-set motion: Applications to atlas-based segmentation. Medical Image Analysis 7(1), 1–20 (2003)
33. Vese, L., Chan, T.: A Multiphase Level Set Framework for Image Segmentation Using the Mumford and Shah Model. IJCV 50(3), 271–293 (2002)
34. Wang, F., Vemuri, B.C.: Simultaneous registration and segmentation of anatomical structures from brain MRI. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3749, pp. 17–25. Springer, Heidelberg (2005)
35. Xiaohua, C., Brady, J.M., Rueckert, D.: Simultaneous segmentation and registration of medical images. In: Barillot, C., Haynor, D.R., Hellier, P. (eds.) MICCAI 2004. LNCS, vol. 3216, pp. 663–670. Springer, Heidelberg (2004)
36. Xu, C., Prince, J.L.: Snakes, shapes, and gradient vector flow. IEEE Trans. Image Process. 7, 359–369 (1998)
37. Yanovsky, I., Thompson, P.M., Osher, S., Leow, A.D.: Topology Preserving Log-Unbiased Nonlinear Image Registration: Theory and Implementation. In: IEEE Conf. on CVPR (2007)
38. Yezzi, A., Zollei, L., Kapur, T.: A variational framework for joint segmentation and registration. IEEE-MMBIA, 44–51 (2001)

A Scale-Space Approach to Landmark Constrained Image Registration

Eldad Haber¹, Stefan Heldmann², and Jan Modersitzki³

¹ Dept. of Math. and Computer Science, Emory University, Atlanta, USA [email protected]
² Inst. of Mathematics, University of Lübeck, Lübeck, Germany [email protected]
³ Dept. of Computing and Software, McMaster University, Hamilton, Canada [email protected]

Abstract. Adding external knowledge improves the results for ill-posed problems. In this paper we present a new multi-level optimization framework for image registration when adding landmark constraints on the transformation. Previous approaches are based on a fixed discretization and do not allow for continuous landmark positions that are not on grid points. Our novel approach overcomes these problems, so that we can apply multi-level methods, which have proven crucial for avoiding local minima in the course of optimization. Furthermore, for our numerical method we are able to use constraint elimination, so that we reduce the landmark constrained problem to an unconstrained optimization, leading to an efficient algorithm.

1 Introduction

Image registration is a challenging problem in digital imaging. Roughly speaking, the problem can be described as follows. Given a reference image R and a template image T, find a reasonable spatial transformation y such that the transformed image T[y] is similar to the reference. Image registration is required whenever images resulting from different times, devices, and/or perspectives need to be compared or integrated. In the area of medical applications alone, registration is used in radiation therapy, surgery planning, treatment evaluation, motion correction and estimation, and many more; see, e.g. [1, 2, 3, 4, 5, 6] and references therein. See also [7, 8, 9] for related work. However, although the registration problem is easily stated, it is hard to solve. A key difficulty is the ill-posedness of the problem: for a particular point x, scalar intensity values R(x) and T(x) are given, but a transformation vector y(x) is to be computed. A common approach is to phrase image registration as an optimization problem involving a distance measure D reflecting similarity of images and a regularization term S reflecting reasonability of the transformation. Though appropriate regularization results in a well-posed problem in the sense of Hadamard [10] (see, e.g. [11, 12, 13]), it is sometimes difficult or even impossible to find a regularization that conforms to the application.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 612–623, 2009. © Springer-Verlag Berlin Heidelberg 2009


Fig. 1. Reference (left) and template (right) images

A simple example is shown in Fig. 1, where the reference and template image cover the full intensity range and share some obvious symmetries. Considering only rigid transformations, there are four different solutions for any reasonable distance measure. Regularization can be used to privilege one of these (for example by penalizing rotations). However, any regularization is somehow artificial and may favor a meaningless solution. One way to obtain better results and to guide the model towards a more realistic solution is by using landmarks. In the above example, just adding the information that the top-left corner of the square in the reference image corresponds to the bottom-right corner of the square in the template image eliminates three of the above solutions. Adding landmark information to image registration is far from being new, see e.g. [14, 15, 16, 17, 18, 4] and references therein. Although landmarks have been used extensively in the past, the numerical implementation of image registration with landmarks remains unsatisfactory. For example, no landmark registration scheme known to us allows for the incorporation of scale space or multi-level techniques, which are frequently used to avoid local minima. Typically, landmark constraints are described in a discrete sense, where the ith pixel in a fixed discretization is constrained. This causes trouble if the discretization is variable, that is, if discretizations on different scales are used. The goal of this paper is to develop a multilevel technique for the incorporation of landmarks in a registration process. We stress that, ignoring issues such as local minima, algorithms other than the one proposed here (for example [17]) should give similar results. Thus, the focus of this work is on the numerical implementation of multilevel algorithms with landmark constraints. Starting with a variational formulation of the landmark constrained registration problem, this paper provides a consistent numerical approach. The new approach is based on a discretize-then-optimize approach and takes advantage of a multi-level discretization. It automatically resolves the problem resulting from a fixed number of constraints versus a varying number of unknowns and the related inconsistency of the constraints. A numerically stable and computationally feasible basis of the constrained manifold is derived. Using a reduced formulation leads to an elegant algorithm, where indefinite Karush-Kuhn-Tucker systems [19] can be avoided.


This paper is organized as follows. Sect. 2 introduces the basic notation and states the problem in a variational framework. A discretize-then-optimize approach is used to numerically solve the constrained registration problem. Details are outlined in Sect. 3, where the discretization, the construction of a basis for the constraint manifold, the numerical optimization, and a multi-level strategy are described. Sect. 4 presents some numerical results. Conclusions are given in Sect. 5.

2 Variational Formulation

In this section we formulate the constrained registration problem. Let $d \in \mathbb{N}$ denote the spatial dimension (typically $d = 2, 3$) and $\Omega \subset \mathbb{R}^d$ the region of interest, and let $\mathcal{T}, \mathcal{R} \in L_2(\mathbb{R}^d, \mathbb{R})$ denote the template and reference image, respectively. The objective is to find a transformation $y : \mathbb{R}^d \to \mathbb{R}^d$ such that the transformed image $\mathcal{T}[y]$ is similar to $\mathcal{R}$ and the transformation $y$ is regular, where similarity and regularity are measured by $\mathcal{D}$ and $\mathcal{S}$, respectively. More precisely,

$$\mathcal{T}[y](x) := \mathcal{T}(y(x)) \ \text{for all } x \in \Omega, \quad \mathcal{D}[\mathcal{T}, \mathcal{R}] := \frac{1}{2} \int_{\Omega} (\mathcal{T} - \mathcal{R})^2 \, dx, \quad \mathcal{S}[y] := \frac{\alpha}{2} \int_{\Omega} |\mathcal{B} y|^2 \, dx, \quad \mathcal{B} := I_d \otimes \Delta.$$

Here, for ease of presentation, it is assumed that similarity is quantified by the energy in the difference image. However, other distance measures like mutual information [20, 21] or normalized gradient fields [22, 23] can be handled similarly. Regularity is measured using the curvature regularizer [24, 25], where the partial differential operator $\mathcal{B}$ is the vector valued Laplacian, $| \cdot |$ denotes the Euclidean norm in $\mathbb{R}^d$, and $\alpha$ is a regularization parameter. Note that the order of the regularizer has to be sufficiently high to cover the landmark constraints [26, 4]. It is assumed that a number $L$ of landmarks $r_1, \ldots, r_L \in \mathbb{R}^d$ in the reference and corresponding landmarks $t_1, \ldots, t_L \in \mathbb{R}^d$ in the template image are given. The automatic detection of landmarks is beyond the scope of this paper; see [16] for an overview. The point evaluation functional is denoted by $\delta_x$. With $(I_d \otimes \delta_{r_\ell})[y] = (y^1(r_\ell), \ldots, y^d(r_\ell)) = y(r_\ell) \in \mathbb{R}^d$, the landmark constraints can be phrased as $\mathcal{C}[y] = t := (t_1, \ldots, t_L) \in \mathbb{R}^{L,d}$, where $\mathcal{C}$ collects the point evaluations of $y$ at the landmarks $r_1, \ldots, r_L$. The constrained registration problem thus reads

$$\text{minimize} \quad \mathcal{J}[y] = \mathcal{D}[\mathcal{T}[y], \mathcal{R}] + \mathcal{S}[y - y_{ref}] \quad \text{subject to} \quad \mathcal{C}[y] = t, \tag{1}$$

where $y_{\mathrm{ref}}$ allows for a bias towards a particular solution. The above problem is strongly related to plain landmark-based registration, where $D = 0$ and $S = S^{\mathrm{TPS}}$ is the bending energy of a thin-plate spline; see, e.g., [26,4] for an extended discussion. The solution $y_{\mathrm{TPS}}$ is explicitly known and is a linear combination of shifts of a radial basis function $\rho$ associated with $S$ and a polynomial correction. Following [4], the $k$th component of $y_{\mathrm{TPS}}$ reads

$$y^k_{\mathrm{TPS}}(x) = \sum_{\ell=1}^{L} \theta^k_\ell \, \rho(|x - r_\ell|) + (1, x^1, \dots, x^d)\,(\theta^k_{L+1}, \dots, \theta^k_{L+d+1})^\top, \tag{2}$$

where the coefficients are given by $A\theta^k = (t^k_1, \dots, t^k_L, 0, \dots, 0)^\top$ with

$$A = \begin{pmatrix} [\rho(|r_i - r_j|)]_{i,j=1}^{L} & P^\top \\ P & 0 \end{pmatrix}, \qquad P = \begin{pmatrix} 1 & \cdots & 1 \\ r_1 & \cdots & r_L \end{pmatrix} \in \mathbb{R}^{d+1,L}, \qquad \rho(t) = \begin{cases} t^2 \log t & (d = 2), \\ t & (d = 3). \end{cases}$$

In our final formulation of the continuous problem, we use this function as a reference for regularization, i.e. $y_{\mathrm{ref}} = y_{\mathrm{TPS}}$, and it is thus convenient to rephrase the problem in terms of the update $u = y - y_{\mathrm{ref}}$:

$$\text{minimize } \mathcal{J}[u] = D[T[y_{\mathrm{ref}} + u], R] + S[u] \quad \text{subject to } C[u] = 0. \tag{3}$$

The plain landmark solution serves several purposes as a reference. It can be seen as a good starting guess for a later implementation, reducing the risk of being trapped in a local minimum. Moreover, it injects boundary values into the region of interest. In fact, these boundary conditions make $y_{\mathrm{TPS}}$ linear for $x \to \infty$ and thus invertible, which is preferable for most applications. Finally, it yields homogeneous constraints. As pointed out later, this is crucial for the discretization, since the feasible set is then always non-empty.
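The thin-plate-spline reference (2) can be sketched in a few lines. The following is a minimal illustration, assuming NumPy and a dense solve of $A\theta^k = (t^k, 0)^\top$ for all components $k$ at once; the function names `rho`, `tps_fit`, and `tps_eval` are hypothetical and not taken from the paper or from [4].

```python
import numpy as np

def rho(t, d):
    """Radial basis function of (2): t^2 log t for d = 2, t for d = 3."""
    t = np.asarray(t, dtype=float)
    if d == 2:
        out = np.zeros_like(t)
        pos = t > 0                      # rho(0) = 0 by continuity
        out[pos] = t[pos] ** 2 * np.log(t[pos])
        return out
    return t

def tps_fit(r, t):
    """Solve A theta = (t, 0) for landmarks r (L x d) and targets t (L x d)."""
    L, d = r.shape
    K = rho(np.linalg.norm(r[:, None, :] - r[None, :, :], axis=2), d)
    P = np.vstack([np.ones(L), r.T])                     # (d+1) x L
    A = np.block([[K, P.T], [P, np.zeros((d + 1, d + 1))]])
    rhs = np.vstack([t, np.zeros((d + 1, d))])
    return np.linalg.solve(A, rhs)                       # (L+d+1) x d

def tps_eval(x, r, theta):
    """Evaluate y_TPS at points x (N x d); column k of the result is y^k_TPS."""
    L, d = r.shape
    K = rho(np.linalg.norm(x[:, None, :] - r[None, :, :], axis=2), d)
    poly = np.hstack([np.ones((len(x), 1)), x])          # (1, x^1, ..., x^d)
    return K @ theta[:L] + poly @ theta[L:]
```

By construction the resulting map interpolates the landmarks, so `tps_eval(r, r, theta)` reproduces `t` up to rounding, which is what makes the constraints in (3) homogeneous.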

3 Numerical Treatment

A discretize-then-optimize approach is used to compute a numerical solution of (3). The discretization is briefly outlined for dimension $d = 2$; see [27] for a detailed and general description. Note that the discretization varies during the course of the optimization and that all quantities introduced in this section depend on the discretization width $h$. However, in this section a fixed discretization level is assumed and, to keep the presentation clear, the dependence on $h$ is suppressed in the notation.

3.1 Discretization

Fig. 2.a shows the discretization of a domain $\Omega$ in $m = (3, 4)$ cells with cell centers $x_j$, $j = 1, \dots, n = m_1 m_2$. Note that all discrete quantities depend on the discretization width $h$, where $h_i = \omega_i / m_i$ and $\hat{h} = h_1 \cdots h_d$. The following equations describe how the discrete quantities are assembled:

$$X = (x^1_1, \dots, x^1_n, \dots, x^d_n) \in \mathbb{R}^{dn}, \qquad U = (u^1_1, \dots, u^1_n, \dots, u^d_n) \in \mathbb{R}^{dn},$$

$$R = (R(x_1), \dots, R(x_n)) \in \mathbb{R}^{n}, \qquad T(U) = (T(u_1), \dots, T(u_n)) \in \mathbb{R}^{n}.$$
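A cell-centered grid of this kind might be assembled as follows. This is a sketch under the assumption that the first coordinate runs fastest in the ordering of the cell centers, which matches the Kronecker structure used for the curvature operator in this section; `cell_centered_grid` is a hypothetical helper name.

```python
import numpy as np

def cell_centered_grid(omega, m):
    """Cell centers of (0, omega_1) x ... x (0, omega_d) split into m_1 x ... x m_d cells.

    Returns X = (x^1_1, ..., x^1_n, ..., x^d_n), the widths h_i = omega_i / m_i,
    and the cell volume h_hat = h_1 * ... * h_d.
    """
    omega = np.asarray(omega, dtype=float)
    m = np.asarray(m, dtype=int)
    h = omega / m
    axes = [(np.arange(mi) + 0.5) * hi for mi, hi in zip(m, h)]
    # "ij" indexing plus Fortran-order flattening lets the first coordinate run fastest
    mesh = np.meshgrid(*axes, indexing="ij")
    X = np.concatenate([g.ravel(order="F") for g in mesh])
    return X, h, float(np.prod(h))
```

For $m = (3, 4)$ this yields $n = 12$ cell centers and a vector $X$ of length $dn = 24$.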

[Figure: domain $\Omega$ with cell centers $x_j$ and cell widths $h_1$, $h_2$ (left); discrete second derivative $\partial_i^{2,h} = h_i^{-2}\,\mathrm{tridiag}(1, -2, 1) \in \mathbb{R}^{m_i,m_i}$ (middle); landmark $r$ with neighboring grid points $x_a, \dots, x_d$ and local coordinates $\xi_1, \xi_2$ (right)]

Fig. 2. Discretization of a 2D domain $\Omega = (0, \omega_1) \times (0, \omega_2) \subset \mathbb{R}^2$ (left); discrete 2nd derivative $\partial_i^{2,h}$ (middle); linear interpolation (right)

The discretization of the curvature operator can be expressed as Kronecker products [25] of identity matrices $I_q \in \mathbb{R}^{q,q}$ and discrete 2nd derivatives $\partial_i^{2,h}$ (see Fig. 2.b):

$$\mathcal{B} \approx B = I_d \otimes (I_{m_2} \otimes \partial_1^{2,h} + \partial_2^{2,h} \otimes I_{m_1}).$$

Finally, the integrals are approximated using a midpoint quadrature rule. Thus

$$\mathcal{J}[u] \approx J(U) = \tfrac{1}{2} \hat{h} \, |T(y_{\mathrm{TPS}}(X) + U) - R|^2 + \tfrac{1}{2} \alpha \hat{h} \, |BU|^2.$$
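The Kronecker structure of the discrete curvature operator can be made concrete with a small dense example. This is only a sketch for $d = 2$, assuming NumPy and dense matrices for readability (a practical implementation would use sparse storage); the function names are hypothetical.

```python
import numpy as np

def second_derivative(m, h):
    """Discrete 2nd derivative of Fig. 2.b: tridiag(1, -2, 1) / h^2 of size m x m."""
    D = -2.0 * np.eye(m) + np.eye(m, k=1) + np.eye(m, k=-1)
    return D / h ** 2

def curvature_operator(m, h):
    """B = I_d (x) (I_m2 (x) d1 + d2 (x) I_m1) for d = 2, as a dense matrix."""
    m1, m2 = m
    D1 = second_derivative(m1, h[0])
    D2 = second_derivative(m2, h[1])
    laplace = np.kron(np.eye(m2), D1) + np.kron(D2, np.eye(m1))
    return np.kron(np.eye(2), laplace)   # same scalar operator for each component of U

def regularizer(U, B, alpha, h_hat):
    """Midpoint-rule approximation 0.5 * alpha * h_hat * |B U|^2 of S[u]."""
    return 0.5 * alpha * h_hat * float(np.sum((B @ U) ** 2))
```

At interior cells the operator reproduces the Laplacian of a quadratic exactly; the first and last rows of each tridiagonal block encode the boundary treatment shown in Fig. 2.b.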

The final step is the discretization of the point evaluation functional $\delta_x$. For an arbitrary location $r_\ell$, a $d$-linear interpolation of discrete point evaluation functionals located at the $2^d$ closest grid points is used. For example, let $d = 2$ and let the four neighboring grid points of $r_\ell$ be denoted by $x_a, \dots, x_d$; see Fig. 2.c. Then

$$\delta_{r_\ell}[u] \approx \delta^h_{r_\ell} u = C_\ell \, u(X) = (1 - \xi_1)(1 - \xi_2)\,u(x_a) + \xi_1 (1 - \xi_2)\,u(x_b) + (1 - \xi_1)\,\xi_2\,u(x_c) + \xi_1 \xi_2\,u(x_d),$$

and $C_\ell$ is a sparse row vector with non-zero entries only at the positions related to the locations of $x_a, \dots, x_d$. If, for a certain discretization, a landmark $r_\ell$ is located precisely on a grid point $x_j$, then $C_\ell$ has only one non-zero entry at position $j$. Assembling these rows for $\ell = 1, \dots, L$ results in a sparse $L$-by-$n$ matrix $C$ with at most $2^d$ non-zero entries per row; see Fig. 3.b. The Kronecker product $I_d \otimes C$ enables a simultaneous treatment of all components of the discretized vector field $U$. Note that even for a very coarse discretization ($n < L$) there exists a feasible solution fulfilling the constraints: $U = 0$. Thus the feasible set is non-empty. The discrete formulation of the constrained registration problem thus reads:

$$\text{minimize } J(U) = \tfrac{1}{2} \hat{h} \, |T(y_{\mathrm{ref}}(X) + U) - R|^2 + \tfrac{1}{2} \hat{h} \alpha \, |BU|^2 \quad \text{subject to } (I_d \otimes C)U = 0, \ U \in \mathbb{R}^{dn}. \tag{4}$$

3.2 An Efficient Basis for the Feasible Set
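The basis constructed in this section starts from the matrix $C$ of Sect. 3.1. As a minimal sketch of how $C$ could be assembled for $d = 2$, the following code applies bilinear interpolation on a cell-centered grid. It assumes NumPy, a dense matrix for readability, an ordering in which the first coordinate runs fastest, and linear extrapolation for landmarks closer to the boundary than half a cell; `landmark_matrix` is a hypothetical name.

```python
import numpy as np

def landmark_matrix(r, omega, m):
    """Rows C_l of bilinear interpolation weights for landmarks r (L x 2).

    The grid has m = (m1, m2) cells on (0, omega_1) x (0, omega_2), m1, m2 >= 2.
    """
    r = np.atleast_2d(np.asarray(r, dtype=float))
    m1, m2 = m
    h = np.asarray(omega, dtype=float) / np.array([m1, m2])
    C = np.zeros((len(r), m1 * m2))
    for l, point in enumerate(r):
        s = point / h - 0.5                        # position in units of cells, 0 = first center
        i = np.clip(np.floor(s).astype(int), 0, np.array([m1, m2]) - 2)
        xi = s - i                                 # local coordinates (xi_1, xi_2)
        for a, wa in ((0, 1.0 - xi[0]), (1, xi[0])):
            for b, wb in ((0, 1.0 - xi[1]), (1, xi[1])):
                C[l, (i[0] + a) + m1 * (i[1] + b)] += wa * wb
    return C
```

Each row sums to one and has at most $2^d = 4$ non-zero entries; a landmark that coincides with a cell center produces a single non-zero entry, as stated in Sect. 3.1.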

The objective is to derive a numerically feasible basis for the nullspace of the operator $C$. Note that the size $L$-by-$n$ of $C$ can be large (e.g. $n = 128^3$ and $L = 100$) and the rank of this matrix is generally unknown. For a coarse discretization, $C$ has more rows than columns, and for a fine discretization it has more columns than rows. The basic idea is to reorder the columns of $C$ such that the non-zero columns are placed first. Let $\Pi$ denote the corresponding $n \times n$ permutation matrix and let $C^*$ be the matrix consisting of the non-zero columns of $C$, such that $C\Pi = (\, C^* \mid 0 \,)$. The size of $C^*$ is $L$-by-$p$, where $p \le 2^d L$ since each row of $C$ can have at most $2^d$ non-zero entries. The matrix $C^*$ is not only relatively small but also very sparse. Assuming the number of landmarks to be less than 1,000, it is thus possible to compute a singular value decomposition (SVD) of $C^*$ [28], i.e.

$$C^* = W \Sigma V^\top, \qquad \text{where} \quad W^\top W = I_L, \quad V^\top V = I_p,$$

$$\text{and} \quad \Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_{\min\{L,p\}}) \in \mathbb{R}^{L,p}, \qquad \sigma_1 \ge \cdots \ge \sigma_{\min\{L,p\}} \ge 0.$$

The above SVD enables the computation of the numerical rank of the matrix $C^*$ and hence of $C$. To this end, let tol be a user-prescribed tolerance (e.g. tol $= 0$ or tol $= 10^{-16}$) and let $k$ be the largest integer such that $\sigma_k >$ tol. The last $p - k$ columns of $V$ are a basis of the (numerical) nullspace of the matrix $C^*$, and thus the columns of $Z$ form a basis for the nullspace of $C$, where

$$Z = \Pi \begin{pmatrix} V(:, k+1 : p) & 0 \\ 0 & I_{n-p} \end{pmatrix} \in \mathbb{R}^{n,n-k}, \qquad V(:, k+1 : p) \in \mathbb{R}^{p,p-k},$$

and the final step undoes the permutation.

Important issues are summarized as follows. The matrix $C^*$ is relatively small, such that the SVD becomes numerically feasible. The SVD enables a uniform treatment independent of the rank of $C^*$ and thus handles a coarse discretization ($L > n$) as well as a fine discretization ($L < n$). Note that in the case $L > n$ the solution is the thin-plate-spline solution since there are 0 degrees of freedom. The columns of $Z$ form a sparse, orthonormal, and numerically stable basis for the set of constraints. For very fine discretizations, the matrix $Z$ is essentially the identity matrix and can be stored efficiently. Any feasible vector is given by $U = (I_d \otimes Z)w$, where $w \in \mathbb{R}^{d(n-k)}$, and there always exists a feasible point $w = 0$.

3.3 Numerical Optimization
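The reduced problem of this section relies on the basis $Z$ from Sect. 3.2. That construction can be sketched as follows, assuming NumPy, a dense $C$, and a default tolerance of $10^{-12}$ chosen here purely for illustration; `nullspace_basis` is a hypothetical name. Instead of forming the permutation matrix $\Pi$ explicitly, the sketch writes the two blocks directly into the rows that $\Pi$ would address.

```python
import numpy as np

def nullspace_basis(C, tol=1e-12):
    """Orthonormal basis Z of the nullspace of C via an SVD of its non-zero columns.

    Returns Z (n x (n - k)) and the numerical rank k of C.
    """
    L, n = C.shape
    nz = np.flatnonzero(np.any(C != 0, axis=0))    # indices of the columns forming C*
    zero = np.setdiff1d(np.arange(n), nz)          # columns that are identically zero
    p = len(nz)
    _, sigma, Vt = np.linalg.svd(C[:, nz], full_matrices=True)
    k = int(np.sum(sigma > tol))                   # numerical rank of C* and of C
    Z = np.zeros((n, n - k))
    Z[nz, : p - k] = Vt[k:, :].T                   # last p - k right singular vectors
    Z[zero, p - k:] = np.eye(n - p)                # unconstrained grid points
    return Z, k
```

Because only the small matrix $C^*$ enters the SVD, the cost is governed by the number of landmarks rather than by the grid size, and the identity block shows why $Z$ is close to the identity on fine grids.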

The final version of the discrete constrained registration problem is given in terms of the reduced basis and reads

$$\text{minimize } J(w) = \tfrac{1}{2} \hat{h} \, |T(y_{\mathrm{ref}}(X) + Z_d w) - R|^2 + \tfrac{1}{2} \hat{h} \alpha \, |B Z_d w|^2, \tag{5}$$

where $Z_d = I_d \otimes Z$. In order to find a numerical solution to (5), standard optimization techniques can be applied; see e.g. [19] for an overview. Here, we use a Gauss-Newton type algorithm with an Armijo line search as outlined in [27]. The quasi-Newton system is given by

$$Z_d^\top H Z_d \, \delta w = -\nabla J(w),$$

where $\delta w$ is the new search direction and $H = \nabla T^\top \nabla T + \alpha B^\top B$ is an approximation to the Hessian. Note that since the regularization is quadratic, the term $B^\top B$ is exactly the Hessian of the regularization part and only the data fitting term is approximated. A generalized Gauss-Newton strategy can be used to handle other distance measures as mentioned before. For a numerical solution of the Newton systems, a preconditioned conjugate gradient solver is used with a symmetric Gauss-Seidel preconditioner; see [29] for details.

3.4 The Multilevel Strategy
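On every level of the strategy described in this section, the reduced problem (5) is solved with the Gauss-Newton scheme of Sect. 3.3. One such step might look as follows. The sketch assumes NumPy, drops the common factor $\hat{h}$, and replaces the preconditioned conjugate gradient solver and the Armijo line search of the paper by a dense direct solve with a full step; `residual` and `jac` are hypothetical callables returning $T(y_{\mathrm{ref}}(X) + U) - R$ and its Jacobian $\nabla T$.

```python
import numpy as np

def gauss_newton_step(w, residual, jac, B, Zd, alpha):
    """One reduced Gauss-Newton step: solve Zd^T H Zd dw = -grad J(w).

    H = dT^T dT + alpha * B^T B approximates the Hessian; the regularizer part is exact.
    Returns the search direction dw and the reduced gradient at w.
    """
    U = Zd @ w                                   # feasible update, (I_d (x) C) U = 0
    res = residual(U)
    dT = jac(U)
    grad = Zd.T @ (dT.T @ res + alpha * (B.T @ (B @ U)))
    H = dT.T @ dT + alpha * (B.T @ B)
    dw = np.linalg.solve(Zd.T @ H @ Zd, -grad)   # dense solve stands in for PCG
    return dw, grad
```

For a residual that is linear in $U$ the Gauss-Newton model is exact, so a single full step already drives the reduced gradient to zero; for image data the step would be damped by the line search.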

It remains to describe the multi-level framework. To this end, a multi-level representation $\{T_D^\ell, R_D^\ell, m^\ell\}$ of the given discrete data is initialized, where for ease of presentation it is assumed that $m_i^\ell = 2^\ell$, $i = 1, \dots, d$, $\ell = \ell_{\min}, \dots, \ell_{\max}$. Note that $h_i^\ell = \omega_i / m_i^\ell$ depends on the level. More precisely,

$$T^{\ell_{\max}} = \text{original data}, \qquad T^{\ell-1} = \mathrm{downsample}(\mathrm{conv}(G, T^{\ell})),$$

where $G$ is a smoothing kernel (in our numerical experiments we used the block smoother $G = (1, 1, 1)^\top (1, 1, 1)/9$). In general, we compute updates to the thin-plate-spline solution on different grids. As in many other multilevel algorithms, the solution on a finer grid is initialized by the coarser grid solution.

To be more specific, running from coarse to fine, the continuous representations $T^\ell$, $R^\ell$ of $T_D^\ell$, $R_D^\ell$ are computed (in our numerical experiments, spline interpolation is used). Moreover, the discretized thin-plate-spline solution $Y_{\mathrm{ref}}^\ell = y^{\mathrm{TPS}}(X^\ell)$ (cf. (2)) for a cell-centered grid $X^\ell$ of size $m^\ell$ and the matrix $Z^\ell$ (cf. Sect. 3.2) are initialized. A numerical solution $w_{\mathrm{opt}}^\ell$ of the discretized registration problem (5) is computed and the current grid solution is given by $Y_{\mathrm{opt}}^\ell = Y_0^\ell + Z_d^\ell w_{\mathrm{opt}}^\ell$. On the coarsest grid, we choose $w_0^{\ell_{\min}} = 0$ as starting guess such that $Y_0^{\ell_{\min}} = Y_{\mathrm{ref}}^{\ell_{\min}}$. The starting guess $w_0^\ell$ for a finer grid is chosen as the best least squares approximation of the prolongated coarser grid solution, where $P_{\ell-1}^{\ell}$ denotes the linear prolongation operator. Since the constraint basis $Z^\ell$ is orthogonal, the computation simplifies to

$$w_0^\ell := (Z^\ell)^\top P_{\ell-1}^{\ell} Z^{\ell-1} w_{\mathrm{opt}}^{\ell-1}.$$

When designing a multilevel strategy, the number of levels has to be chosen. Unfortunately, setting the number of levels is non-trivial. In general, as for other problems, one requires that the coarsest level actually represents the problem [30].
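The data pyramid and the transfer of the starting guess can be sketched as follows. The code assumes NumPy, a square 2D image of size $2^{\ell_{\max}} \times 2^{\ell_{\max}}$, replicated borders for the convolution, and averaging over $2 \times 2$ blocks as the downsampling step; the paper fixes only the block smoother $G$, so the last two choices and all function names are assumptions made for illustration.

```python
import numpy as np

def smooth_and_downsample(T):
    """T^(l-1) = downsample(conv(G, T^l)) with G = (1,1,1)^T (1,1,1) / 9."""
    m1, m2 = T.shape
    P = np.pad(T, 1, mode="edge")                           # replicated borders (assumption)
    S = sum(P[i:i + m1, j:j + m2] for i in range(3) for j in range(3)) / 9.0
    return S.reshape(m1 // 2, 2, m2 // 2, 2).mean(axis=(1, 3))  # 2x2 block average (assumption)

def build_pyramid(T, l_min):
    """Dictionary level -> image for l = l_min, ..., l_max, where T is 2^l_max x 2^l_max."""
    l_max = T.shape[0].bit_length() - 1
    pyramid = {l_max: np.asarray(T, dtype=float)}
    for l in range(l_max, l_min, -1):
        pyramid[l - 1] = smooth_and_downsample(pyramid[l])
    return pyramid

def prolongated_start(Z_fine, P, Z_coarse, w_coarse):
    """Starting guess w_0^l = (Z^l)^T P Z^(l-1) w_opt^(l-1) for an orthonormal Z^l."""
    return Z_fine.T @ (P @ (Z_coarse @ w_coarse))
```

For the 2D experiment of Sect. 4.1, with $\ell_{\min} = 3$ and $\ell_{\max} = 7$, such a pyramid would hold images of size 8, 16, 32, 64, and 128 per dimension.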

4 Results

4.1 Artificial 2D Data

We use the hand data shown in Fig. 4. In this example, a synthetic transformation $y_{\mathrm{true}}$ has been specified and the reference is a transformed copy of the template image, $R = T[y_{\mathrm{true}}]$; see Fig. 4. This construction allows a comparison with a ground truth. Here, 47 manually detected landmarks $\tilde{t}_j$ have been chosen in the template image. Using a numerical approximation to the inverse of the transformation, the landmarks in the reference are defined by $r_j \approx y_{\mathrm{true}}^{-1}(\tilde{t}_j)$ and the corresponding landmarks in the template image are defined by $t_j := y_{\mathrm{true}}(r_j)$. Note that since $y_{\mathrm{true}}$ is explicitly known, there are no errors in the landmark pairing. The original data is 128-by-128 and the level ranges from $\ell_{\min} = 3$ to $\ell_{\max} = 7$. Fig. 3.a shows the coarse grid representation of the data. Here, some cells contain many landmarks. The problem is over-constrained and the 47-by-64 matrix $C^{\ell_{\min}}$ is rank deficient (the rank being 27). The non-zero pattern of this matrix is shown in Fig. 3.b.

Fig. 3. Coarse grid representation of data with 47 landmarks (circles), $\ell_{\min} = 3$ (left); non-zero pattern of the matrix $C^{\ell_{\min}}$ (right)

Fig. 4 shows the original data (a,b,c) and the results based on the thin-plate-spline solution $y_{\mathrm{TPS}}$ (d,g), an unconstrained solution $y_{\mathrm{un}}$ (e,h), and the constrained solution $y_{\mathrm{con}}$ (f,i). The distance measure and landmark error are given by

$$\mathrm{err}(Y) := 100 \, D(Y)/D(X) \ [\%], \qquad D(Y) = |T(Y) - R|^2, \qquad \mathrm{LM}(Y) := |(I_d \otimes C)Y - t|_{\mathrm{Frobenius}}.$$

All three registration approaches (TPS, unconstrained, constrained) perform well for this example. The TPS approach gives perfect results for the landmarks but a large difference for the trapezoid. The unconstrained approach results in a very small difference, but the landmark error is relatively large. Finally, the constrained approach performs perfectly on the landmarks and results in the smallest difference. The latter is due to the fact that the stopping criterion is relative to the initial guess, which results in a smaller distance in the constrained approach.

4.2 3D Example

[Figure panel annotations, in the order initial, thin-plate spline, unconstrained, constrained: err ≈ 100%, LM ≈ 8.1; err ≈ 9%, LM ≈ $10^{-14}$; err ≈ 0.6%, LM ≈ 0.68; err ≈ 0.4%, LM ≈ $10^{-14}$]

Fig. 4. Original template image with landmarks (crosses) and visualization of an artificial transformation $y_{\mathrm{true}}$ (top left); reference (top middle) is a transformed template $R(x) = T(y_{\mathrm{true}}(x))$, with visualization of transformed landmarks and initial grid; initial difference $|T - R|$ (top right); transformed template $T[y]$ based on thin-plate spline solution $y_{\mathrm{TPS}}$ (center left), unconstrained solution $y_{\mathrm{un}}$ (center middle), and constrained solution $y_{\mathrm{con}}$ (center right); differences $|T[y] - R|$ for $y = y_{\mathrm{TPS}}$ (bottom left), $y = y_{\mathrm{un}}$ (bottom middle), and $y = y_{\mathrm{con}}$ (bottom right)

For our 3D experiment we use real data from CT and 3D power Doppler ultrasound (US) of a human liver. The goal of this application is the alignment of vessels that have been segmented from the original data. Consequently, we have binary images allowing for a direct comparison by the SSD distance measure. The size of the data in our experiment is 171 × 165 × 186 voxels. Additionally, we have 11 corresponding landmarks that were manually picked by an expert; see Fig. 5 (a,b). For the registration we used four levels starting from 22 × 21 × 24 and ranging to the original resolution with 171 × 165 × 186 voxels. Results for a plain landmark-based registration using only the thin-plate-spline solution $y_{\mathrm{TPS}}$ and for the constrained solution $y_{\mathrm{con}} = y_{\mathrm{TPS}} + u$ are shown in Fig. 5(c,d). As it turns out, the landmark solution provides a reasonable alignment but is far from perfect. On the other hand, the constrained approach improves the quality of the results considerably and leads to an almost perfect alignment of large parts of the vessel system.

Fig. 5. 3D registration of CT and US. (a) Reference $R$ with landmarks (black balls) (top left); (b) template $T$ with landmarks (top right); (c) reference $R$ and deformed template $T[y_{\mathrm{TPS}}]$ after landmark registration (bottom left); (d) reference $R$ and deformed template $T[y_{\mathrm{con}}]$ after constrained registration (bottom right)

5 Conclusions

The paper presents a variational framework for the landmark constrained registration problem and a discretize-then-optimize approach for computing a numerical solution. A difficulty for the multi-level discretization is that the number of constraints is constant while the number of degrees of freedom varies. In particular, for a coarse discretization, inconsistent constraints are to be expected. This paper provides a technique to overcome this problem by mixing landmark and update components, which results in compatible constraints. Moreover, it is shown how to efficiently compute a stable, orthogonal, and sparse basis for the constraint manifold, thus enabling a reduced space optimization that avoids saddle point problems.

References

1. Glasbey, C.: A review of image warping methods. Journal of Applied Statistics 25, 155–171 (1998)
2. Pluim, J., Maintz, J., Viergever, M.: Mutual-information-based registration of medical images: a survey. IEEE Transactions on Medical Imaging 22, 986–1004 (1999)
3. Hajnal, J., Hawkes, D., Hill, D.: Medical Image Registration. CRC Press, Boca Raton (2001)
4. Modersitzki, J.: Numerical Methods for Image Registration. Oxford University Press, Oxford (2004)
5. Goshtasby, A.A.: 2-D and 3-D Image Registration. Wiley Press, New York (2005)
6. Joshi, A., Shattuck, D., Thompson, P.: Brain image registration using cortically constrained harmonic mappings. In: Karssemeijer, N., Lelieveldt, B. (eds.) IPMI 2007. LNCS, vol. 4584, pp. 359–371. Springer, Heidelberg (2007)
7. Grady, L.: A lattice-preserving multigrid method for solving the inhomogeneous poisson equations used in image analysis. In: Forsyth, D.A., Torr, P.H.S., Zisserman, A. (eds.) Scale Space and Variational Methods in Computer Vision, SSVM, ECCV (2008)
8. Koestler, H.: A Multigrid Framework for Variational Approaches in Medical Image Processing and Computer Vision. Ph.d. dissertation, University of Erlangen, Netherland (2008)
9. Keller, S., Lauze, F., Nielsen, M.: Motion compensated video super resolution. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 801–812. Springer, Heidelberg (2007)
10. Hadamard, J.: Sur les problèmes aux dérivées partielles et leur signification physique, pp. 49–52. Princeton University Bulletin, Princeton (1902)
11. Weickert, J., Schnörr, C.: A theoretical framework for convex regularizers in PDE-based computation of image motion. Int. J. Computer Vision 45(3), 245–264 (2001)
12. Hinterberger, W., Scherzer, O., Schnörr, C., Weickert, J.: Analysis of optical flow models in the framework of calculus of variations. Num. Funct. Anal. Opt. 23, 69–82 (2002)
13. Droske, M., Rumpf, M.: A variational approach to non-rigid morphological registration. SIAM Appl. Math. 64(2), 668–687 (2004)
14. Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–585 (1989)
15. Maurer, C.R., Fitzpatrick, J.M.: A Review of Medical Image Registration. In: Interactive Image-Guided Neurosurgery. In: American Association of Neurological Surgeons, Park Ridge, IL, pp. 17–44 (1993)
16. Rohr, K.: Landmark-based Image Analysis. Computational Imaging and Vision. Kluwer Academic Publishers, Dordrecht (2001)
17. Fischer, B., Modersitzki, J.: Combining landmark and intensity driven registrations. PAMM 3, 32–35 (2003)
18. Ashburner, J., Friston, K.: Spatial normalization using basis functions. In: Frackowiak, R., Friston, K., Frith, C., Dolan, R., Friston, K., Price, C., Zeki, S., Ashburner, J., Penny, W. (eds.) Human Brain Function, 2nd edn. Academic Press, London (2003)
19. Nocedal, J., Wright, S.J.: Numerical optimization. Springer, New York (1999)
20. Collignon, A., Vandermeulen, A., Suetens, P., Marchal, G.: 3D multi-modality medical image registration based on information theory. Computational Imaging and Vision 3, 263–274 (1995)
21. Viola, P.A.: Alignment by Maximization of Mutual Information. PhD thesis, Massachusetts Institute of Technology (1995)
22. Clarenz, U., Droske, M., Rumpf, M.: Towards fast non-rigid registration. In: Inverse Problems, Image Analysis and Medical Imaging, AMS Special Session Interaction of Inverse Problems and Image Analysis, vol. 313, pp. 67–84. AMS (2002)
23. Haber, E., Modersitzki, J.: Intensity gradient based registration and fusion of multimodal images. Methods of Information in Medicine 46(3), 292–299 (2007)
24. Fischer, B., Modersitzki, J.: Fast curvature based registration of MR-mammography images. In: Meiler, M., et al. (eds.) Bildverarbeitung für die Medizin, pp. 139–143. Springer, Heidelberg (2002)
25. Fischer, B., Modersitzki, J.: A unified approach to fast image registration and a new curvature based registration technique. Linear Algebra and its Applications 380, 107–124 (2004)
26. Light, W.A.: Variational methods for interpolation, particularly by radial basis functions. In: Griffiths, D., Watson, G. (eds.) Numerical Analysis 1995, pp. 94–106. Longmans, London (1996)
27. Haber, E., Modersitzki, J.: A multilevel method for image registration. SIAM J. Sci. Comput. 27(5), 1594–1607 (2006)
28. Golub, G.H., van Loan, C.F.: Matrix Computations, 3rd edn. The Johns Hopkins University Press, Baltimore (2000)
29. Barrett, R., Berry, M., Chan, T.F., Demmel, J.W., Donato, J., Dongarra, J., Eijkhout, V., Pozo, R., Romine, C., van der Vorst, H.: Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd edn. SIAM, Philadelphia (1994)
30. Trottenberg, U., Oosterlee, C., Schuller, A.: Multigrid. Academic Press, London (2001)

A Variational Approach for Volume-to-Slice Registration

Stefan Heldmann and Nils Papenberg

Institute of Mathematics, University of Lübeck, Germany
{heldmann,papenber}@math.uni-luebeck.de

Abstract. In this work we present a new variational approach for image registration where part of the data is only known on a low-dimensional manifold. Our work is motivated by navigated liver surgery, where 3D volumetric CT data has to be registered to tracked 2D ultrasound (US) slices. The particular problem is that the set of all US slices does not fill a full 3D domain. Other approaches use so-called compounding techniques to interpolate a 3D volume from the scattered slices. Instead of inventing new data by interpolation, we use only the given data. Our variational formulation of the problem is based on a standard approach. We minimize a joint functional made up of a distance term and a regularizer with respect to a 3D spatial deformation field. In contrast to existing methods, we evaluate the distance of the images only on the two-dimensional manifold where the data is known. A crucial point here is regularization. To avoid kinks and to achieve a smooth deformation, it turns out that at least second order regularization is needed. Our numerical method is based on Newton-type optimization. We present a detailed discretization and give some examples demonstrating the influence of regularization. Finally, we show results for clinical data.

1 Introduction

In this paper we describe a new method for the registration of volumetric images to data that is given only on a low-dimensional submanifold. The work is motivated by a clinical problem of improved resection of tissue by pre-operative intervention planning in liver surgery [1, 2]. Before an intervention, extensive planning is carried out, including the definition of surgical paths and a risk analysis. The planning is based on abdominal CT scans of the patient and subsequent segmentation of liver, liver segments, and vessels, cf. Fig. 1(a). During the intervention the surgeon is guided by tracked ultrasound (US) images of the liver. Consequently, the pre-operative CT planning data has to be aligned to the actual deformation of the liver given by the US data. A challenge in laparoscopic liver surgery is that the US data is recorded as a sequence of two-dimensional slices in 3-space. Although the spatial ordering of the slices follows the scan path, they are not aligned and in general each slice can have an arbitrary position, cf. Fig. 1(b).

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 624–635, 2009. © Springer-Verlag Berlin Heidelberg 2009

Fig. 1. Clinical image data; (a) pre-operative CT planning data (few slices out of volume and segmentation of the liver); (b) few US slices from a single scan

One approach for the registration of a CT volume and US slices is to use so-called compounding techniques. In a first step, the US slice data is compounded into a volume by interpolation, and subsequently standard volumetric image registration is applied. However, compounding has several drawbacks [3, 4, 5], and practical experiments showed that registration based on this approach performs poorly and does not produce reasonable results. Besides poor performance, matching volumetric CT data to artificially generated volumetric US data does not give the surgeon confidence in the registration results. Here, we take a different approach by comparing volumetric data directly to the given slice data. We use a variational setting for image registration. That is, we minimize a cost functional consisting of a so-called distance measure and a regularizer with respect to a volumetric deformation. Here the regularizer is an integral on a $d$-dimensional domain while the distance is an integral on a $(d-1)$-dimensional manifold. Although this seems to be a slight modification, it turns out that higher order regularization is necessary to ensure smooth and differentiable deformations. In this work we provide a proof of concept for our new approach. Therefore we consider a simplified mono-modal setting, i.e., we assume the volumetric and the slice data stem from the same type of imaging device. Without loss of generality, this allows us to use the easy-to-present sum-of-squared-differences distance measure for the description of our method. The paper is organized as follows. First we present our variational approach to image registration and the novel distance measure. Next we discuss the need for higher-order regularization. In Sect. 4 we present a numerical scheme, and subsequently we discuss our specific discretization of the distance measure and the regularizer in detail. Finally, in Sect. 5 we demonstrate the method with a synthesized clinical example.

2 Approach

In general we are given two images, a so-called reference $R : \mathbb{R}^d \to \mathbb{R}$ and a so-called template $T : \mathbb{R}^d \to \mathbb{R}$. The goal of image registration is to find a smooth deformation $y : \Omega \to \mathbb{R}^d$ that spatially aligns the images best on a domain of interest $\Omega \subset \mathbb{R}^d$. Typically $\Omega$ is a rectangular domain. Mathematically we formulate image registration as an optimization problem [6]. That is, we want to compute a solution $y$ to

$$\min_y \ \mathcal{J}(y) := D(R, T(y)) + \alpha S(y) \tag{1}$$

where $T(y)$ denotes the composition $T \circ y$. The first term $D$ of the objective function is a so-called distance measure that quantifies similarity between the reference $R$ and the deformed template $T(y)$. The second building block $S$ is a regularizer forcing smoothness of the solution, where $\alpha > 0$ is a fixed chosen parameter. Typically $S$ has the form [7]

$$S(y) := \tfrac{1}{2} \|By\|^2_{L_2(\Omega)} \tag{2}$$

where $B$ is a linear differential operator. The particular difficulty in our case is that the template is a volumetric image while the reference is only known on a few scattered slices. As mentioned in the introduction, one can use compounding techniques to generate an artificial volume and subsequently use a standard distance measure that relies on comparing two images of the same dimension. We propose a different method. The idea of our new approach is to use only the given data rather than guessing the missing parts of the reference. To make the idea clear, in the following we assume that the distance measure is the so-called sum-of-squared-differences (SSD) [8], i.e., $D$ is the squared $L_2$ norm of the difference of the images. This is no loss of generality. The proposed modification also applies to other distance measures such as mutual information [9,10], which is more suitable for multi-modal registration of CT and US data. As mentioned in the introduction, the goal of this paper is a proof of concept and an outline of the general method. Therefore, and for ease of presentation, we use the SSD distance measure here. The standard SSD for $d$-dimensional images is given by

$$\mathrm{SSD}(R, T) = \tfrac{1}{2} \int_\Omega \big(T(x) - R(x)\big)^2 \, dx. \tag{3}$$

In our approach we assume the reference is given only on a few planes in $\Omega$. More generally, we assume $R$ is known only on a set of smooth and bounded $(d-1)$-dimensional sub-manifolds $\mathcal{M}_j \subset \Omega$, $j = 1, \dots, m$. Therefore, we modify (3) and define our distance measure by

$$D(R, T) := \tfrac{1}{2} \sum_{j=1}^{m} \int_{\mathcal{M}_j} \big(T(x) - R(x)\big)^2 \, dS(x) \tag{4}$$

where $dS$ is the $(d-1)$-dimensional surface measure. Note that in the particular case when the $\mathcal{M}_j$ are slices, we can trace back our modified distance to a sum of SSD distances of $(d-1)$-dimensional images, similar to serial registration. In this particular case we can parametrize $\mathcal{M}_j$ by linear maps $\tau_j$ with Gram determinant $\det(D\tau_j^\top D\tau_j) = 1$, where $D\tau_j$ denotes the Jacobian matrix of $\tau_j$, such that

$$D(R, T) = \sum_{j=1}^{m} \mathrm{SSD}(R_j, T_j)$$

with $R_j := R \circ \tau_j$ and $T_j := T \circ \tau_j$. Although changing the domain of integration in the distance measure seems a slight modification of problem (1), it turns out that regularization becomes crucial and needs to be chosen carefully. Since the data is now only given on a low-dimensional manifold, the solution is strongly influenced by the full-space regularization. It turns out that first-order regularization, e.g., choosing $B = \nabla$ in (2), will produce non-differentiable solutions with kinks at the boundary of the manifold, cf. Fig. 2(e) and (h). In contrast, using second order regularization, e.g., setting $B = \Delta$ where $\Delta$ denotes the vector Laplacian, produces smooth results, cf. Fig. 2(f) and (i). In Sect. 3 we analyze this behavior by considering a simplified quadratic functional. Generally, the order of regularization needed to ensure differentiability depends on the space dimension. However, from the analysis in Sect. 3 we found that second order regularization is sufficient for space dimensions $d = 2$ and $d = 3$. As a result we propose using the curvature regularizer, i.e., setting $B = \Delta$. Summarizing, for volume-to-slice registration we consider problem (1) with the distance measure (4) and the smoother (2) with $B = \Delta$. Thus, our approach is

$$\min_y \ \tfrac{1}{2} \sum_{j=1}^{m} \int_{\mathcal{M}_j} \big(T(y(x)) - R(x)\big)^2 \, dS(x) + \tfrac{\alpha}{2} \int_\Omega |\Delta y|^2 \, dx. \tag{5}$$
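As a rough illustration of the distance term in (5) for planar slices, the surface integrals can be approximated by a midpoint rule on the parameter domain of each slice. The sketch below assumes NumPy, slices of the form $\{Q t + b\}$ with orthonormal columns of $Q$ over a common parameter rectangle $(0, \theta_1) \times (0, \theta_2)$, and images given as hypothetical callables `T` and `R` that map an array of 3D points to intensities; none of these names come from the paper.

```python
import numpy as np

def slice_distance(T, R, slices, theta, p, y=None):
    """Midpoint-rule approximation of 0.5 * sum_j int_{M_j} (T(y(x)) - R(x))^2 dS(x).

    slices: list of (Q, b) with Q (3 x 2, orthonormal columns) and b (3,)
    theta:  side lengths of the parameter rectangle, p: number of cells per side
    y:      optional deformation acting on an (N x 3) array of points
    """
    theta = np.asarray(theta, dtype=float)
    p = np.asarray(p, dtype=int)
    h = theta / p
    t1 = (np.arange(p[0]) + 0.5) * h[0]            # cell centers in parameter space
    t2 = (np.arange(p[1]) + 0.5) * h[1]
    t = np.stack(np.meshgrid(t1, t2, indexing="ij"), axis=-1).reshape(-1, 2)
    total = 0.0
    for Q, b in slices:
        x = t @ np.asarray(Q).T + np.asarray(b)    # sample points on the slice
        yx = x if y is None else y(x)
        total += 0.5 * h[0] * h[1] * np.sum((T(yx) - R(x)) ** 2)
    return float(total)
```

Because the Gram determinant of each parametrization equals one, the area element is simply the cell size in parameter space and no additional weighting is needed.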

3 Regularization

In the following we motivate second order so-called curvature regularization [11, 12] by choosing $B = \Delta$. The resulting functional for the registration (cf. (5)) is highly non-linear and in general non-convex, which makes an analysis difficult and involved. To illustrate the main point on regularization we now consider a simplified quadratic problem

$$\min_y \ \tfrac{1}{2} \|By\|^2_{L_2(\Omega)} + \int_{\mathcal{M}} g\,y \, dS \tag{6}$$

where $\Omega \subset \mathbb{R}^d$ is a domain with smooth boundary (Lipschitz), $\mathcal{M} \subset \Omega$ is a smooth $(d-1)$-dimensional manifold, and $g \in L_2(\mathcal{M})$ is a given function. Without loss of generality we assume that locally coordinates can be chosen such that $\mathcal{M} = \{x \in \Omega : x_d = 0\}$. Then we can define a distribution $f$ as the product of $g$ and a Dirac delta distribution, i.e., $f$ is given by $f = g\,\delta_{x_d}$, such that

$$\int_\Omega f\,y \, dx = \int_\Omega g\,\delta_{x_d}\,y \, dx = \int_{\mathcal{M}} g\,y \, dS. \tag{7}$$

Fig. 2. Volume-to-slice-registration results for academic 2D (a)–(f) and 3D (g)–(i) experiments. (a) Template image and 1D manifold (vertical line); (d) Original Reference that is compared to the template on the 1D manifold (vertical line); (b)+(e) Deformed template (a) and deformation for 1st order regularization ($B = \nabla$); (c)+(f) Deformed template (a) and deformation for 2nd order regularization ($B = \Delta$); (g) Surface of 3D template (elongated bar) and three orthogonal 2D manifolds with reference data taken from a big cuboid; (h) Deformed template for 1st order regularization ($B = \nabla$); (i) Deformed template for 2nd order regularization ($B = \Delta$).

Furthermore we assume that $g \neq 0$, i.e., $\|g\|_{L_2(\mathcal{M})} \neq 0$. Computing the Euler-Lagrange equations in their weak form shows that a necessary condition for a minimizer is

$$Ay = f \tag{8}$$

where $A := -B^*B$ and $B^*$ denotes the adjoint of $B$.

The right-hand side $f$ belongs to the space $H^{-1}(\Omega)$ but clearly $f \notin L_2(\Omega) = H^0(\Omega)$, where $H^{-1}(\Omega)$ denotes the dual space of $H^1(\Omega)$ and $H^m(\Omega)$ is the Sobolev space of all $m$-times weakly differentiable functions [13, §3]. Now we discuss two different choices for the regularizer $B$: first, first-order so-called diffusive regularization [14] with $B = \nabla$, and second, second-order curvature regularization with $B = \Delta$. In the first case, $B = \nabla$ yields $B^* = -\nabla\cdot$ and hence $A = \Delta$ is a second-order differential operator. Since the right-hand side $f$ belongs to $H^{-1}(\Omega) \setminus H^0(\Omega)$, a solution of (8) must be in $H^{-1+2}(\Omega) \setminus H^{0+2}(\Omega) = H^1(\Omega) \setminus H^2(\Omega)$ (cf. [15, §8]). Due to the embedding $H^k(\Omega) \subset C^m(\Omega)$ for $m < k - d/2$, this shows that if $d > 1$ a solution cannot be differentiable [13, §6]. Applying the same logic in the second case for $B = \Delta$, we find $B^* = -\Delta$, yielding the fourth-order differential operator $A = \Delta^2$. Therefore, a weak solution $y$ of (8) has to satisfy $y \in H^3(\Omega) \setminus H^4(\Omega)$. Hence, if $d < 4$ then $y \in C^1(\Omega)$, such that a solution is continuously differentiable for $d = 2, 3$.
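A one-dimensional analogue, added here only as an illustration and not part of the original analysis, makes the effect of the order of $A$ explicit. Take $\Omega = \mathbb{R}$, $\mathcal{M} = \{0\}$ and $g = 1$, so that $f = \delta$, and consider the model equations $y'' = \delta$ and $y'''' = \delta$. Boundary conditions are ignored, so solutions are determined only up to polynomials in the nullspace of the operator, and the sign of the right-hand side does not affect regularity.

```latex
\begin{align*}
\text{first order } (B = \tfrac{d}{dx}):\quad
  & y'' = \delta
  &&\Rightarrow\quad y(x) = \tfrac{1}{2}\,|x|,
  &&y'(x) = \tfrac{1}{2}\operatorname{sign}(x) \text{ jumps at } x = 0, \\
\text{second order } (B = \tfrac{d^2}{dx^2}):\quad
  & y'''' = \delta
  &&\Rightarrow\quad y(x) = \tfrac{1}{12}\,|x|^3,
  &&y'(x) = \tfrac{1}{4}\,x\,|x|, \quad y''(x) = \tfrac{1}{2}\,|x| \text{ are continuous.}
\end{align*}
```

In one dimension the first-order solution is still continuous but has a kink exactly where the data lives, whereas the second-order solution is twice continuously differentiable. This mirrors the kinks seen for $B = \nabla$ and the smooth deformations seen for $B = \Delta$ in Fig. 2.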

4 Numerical Method

In this section we describe our approach to compute a numerical solution for the volume-to-slice registration problem (5). Here, we follow the first-discretize-then-optimize paradigm. That is, we discretize the functional and subsequently apply Gauss-Newton optimization. We start by explaining our discretization. In the following we describe the discretization for the three-dimensional case, i.e., $d = 3$. That is, the domain of interest $\Omega$ is a subset of $\mathbb{R}^3$ and the $\mathcal{M}_j$ are two-dimensional manifolds. We assume that the domain of interest is rectangular, i.e.,

$$\Omega = (a_1, b_1) \times (a_2, b_2) \times (a_3, b_3) \quad \text{with } -\infty < a_i < b_i < \infty, \ i = 1, 2, 3,$$

and the $\mathcal{M}_j$ are rectangular slices. For simplicity we assume that all slices $\mathcal{M}_j$ are parametrized over the same parameter space $\Theta$ such that

$$\mathcal{M}_j = \{x = \tau_j(t) : t \in \Theta\} \quad \text{and} \quad \Theta := (0, \theta_1) \times (0, \theta_2)$$

with parametrizations $\tau_j : \Theta \subset \mathbb{R}^2 \to \mathcal{M}_j \subset \mathbb{R}^3$ given by

$$\tau_j(t) := Q_j t + b_j, \qquad Q_j \in \mathbb{R}^{3 \times 2} \text{ such that } Q_j^\top Q_j = I \text{ and } b_j \in \mathbb{R}^3. \tag{9}$$

Note that the condition $Q_j^\top Q_j = I$ implies $\det(D\tau_j^\top D\tau_j) = 1$, where $D\tau_j$ denotes the Jacobian matrix of $\tau_j$. This property simplifies computing the integrals on the manifolds and will be used later. We start with the discretization of the deformation and the distance measure. Subsequently we describe the discretization of the regularizer.

Discretization of the Deformation

We use a nodal discretization for the deformation $y$ on $\Omega$. Therefore, we introduce a uniform grid composed of $n_1 \times n_2 \times n_3$ cells with grid spacing $h = \left(\frac{b_1 - a_1}{n_1}, \frac{b_2 - a_2}{n_2}, \frac{b_3 - a_3}{n_3}\right)$ and nodal grid points

$$\Omega^h := \{x_k = x_0 + k \odot h : k \in \{0, \dots, n_1\} \times \{0, \dots, n_2\} \times \{0, \dots, n_3\}\},$$

where $x_0 = (a_1, a_2, a_3)$ and $\odot$ denotes the Hadamard (point-wise) product of two vectors. Then, we collect the values $y(x_k) \in \mathbb{R}^3$ of the deformation at all $N = (n_1 + 1)(n_2 + 1)(n_3 + 1)$ nodal grid points $x_k \in \Omega^h$ in a grid function, i.e., a vector $y^h \in \mathbb{R}^{3N}$.

Discretization of the Distance Measure

Now we turn to the discretization of the distance measure. Recall that it was defined as

$$D(R, T(y)) = \frac{1}{2} \sum_{j=1}^{m} \int_{\mathcal{M}_j} \bigl(T(y(x)) - R(x)\bigr)^2 \, dS(x).$$

For an approximation of the integrals on $\mathcal{M}_j$ we start by discretizing the parameter space $\Theta$. Therefore, we define

$$\Theta^{\hat h} := \{t_k = (k - \tfrac{1}{2}) \odot \hat h : k \in \{1, \dots, p_1\} \times \{1, \dots, p_2\}\} \quad\text{with}\quad \hat h = \left(\frac{\theta_1}{p_1}, \frac{\theta_2}{p_2}\right),$$

such that $\Theta^{\hat h}$ contains the cell centers of a regular discretization by $p_1 \times p_2$ cells. Consequently, we discretize $\mathcal{M}_j$ by

$$\mathcal{M}_j^{\hat h} := \{m_k = \tau_j(t_k) : t_k \in \Theta^{\hat h}\}.$$

Note that we have two different grid spacings $h$ and $\hat h$ for the discretization of the deformation $y$ on $\Omega$ and the discretization of the manifolds $\mathcal{M}_j$, respectively.

Fig. 3. Schematic overview of the discretization of the parameter space Θ (left) and of a manifold Mj and the domain Ω (right). Left panel (axes t1, t2): cell-centered discretization of the parameter space. Right panel (axes x, y, z): nodal discretization of the deformation (gray) with cell-centered discretization of the manifold (black). The two panels are linked by the parametrization τj.
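The slice discretization described above can be sketched in a few lines. The following is a minimal illustration, assuming plain Python lists for vectors and matrices; the function names `cell_centers`, `tau` and `gram`, as well as the example matrix `Q`, are hypothetical and not taken from the paper.

```python
import math


def cell_centers(theta, p):
    """Cell-centered grid on (0, theta[0]) x (0, theta[1]) with p[0] x p[1] cells."""
    h_hat = (theta[0] / p[0], theta[1] / p[1])
    return [((k1 - 0.5) * h_hat[0], (k2 - 0.5) * h_hat[1])
            for k1 in range(1, p[0] + 1)
            for k2 in range(1, p[1] + 1)]


def tau(Q, b, t):
    """Affine slice parametrization tau(t) = Q t + b with Q a 3x2 matrix."""
    return tuple(Q[r][0] * t[0] + Q[r][1] * t[1] + b[r] for r in range(3))


def gram(Q):
    """Return the 2x2 matrix Q^T Q; it is the identity for an admissible Q."""
    return [[sum(Q[r][i] * Q[r][j] for r in range(3)) for j in range(2)]
            for i in range(2)]


# Hypothetical example slice spanned by (1, 1, 0)/sqrt(2) and (0, 0, 1).
c = math.sqrt(0.5)
Q = [[c, 0.0], [c, 0.0], [0.0, 1.0]]

# Slice points m_k = tau(t_k) for a 4 x 2 cell-centered parameter grid.
slice_points = [tau(Q, (0.0, 0.0, 0.0), t) for t in cell_centers((2.0, 1.0), (4, 2))]
```

Because the columns of `Q` are orthonormal, the surface element of the slice equals the area element of the parameter grid, which is what allows the plain mid-point rule on the parameter space.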


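A schematic overview of the different discretizations $\Theta^{\hat h}$, $\mathcal{M}_j^{\hat h}$, and $\Omega^h$ is shown in Fig. 3. Using the common mid-point rule for the discretization of an integral over $\mathcal{M}_j$ we obtain

$$\begin{aligned}
\int_{\mathcal{M}_j} \bigl(T(y(x)) - R(x)\bigr)^2 \, dS(x)
&= \int_{\Theta} \bigl(T(y(\tau_j(t))) - R(\tau_j(t))\bigr)^2 \sqrt{\det(D\tau_j^\top D\tau_j)} \, dt \\
&= \int_{\Theta} \bigl(T(y(\tau_j(t))) - R(\tau_j(t))\bigr)^2 \, dt \\
&\approx \hat h_1 \hat h_2 \sum_{t_k \in \Theta^{\hat h}} \bigl(T(y(\tau_j(t_k))) - R(\tau_j(t_k))\bigr)^2 \\
&= \hat h_1 \hat h_2 \sum_{m_k \in \mathcal{M}_j^{\hat h}} \bigl(T(y(m_k)) - R(m_k)\bigr)^2,
\end{aligned}$$

where we used orthogonality of the Jacobian matrix $D\tau_j$, cf. (9). For short notation, analogously to the deformation, we collect the $M = p_1 p_2$ grid points in $\mathcal{M}_j^{\hat h}$ in a vector $m_j^{\hat h} \in \mathbb{R}^{3M}$. With some abuse of notation let $R_j^{\hat h} := R(m_j^{\hat h}) \in \mathbb{R}^{M}$ be the values of the reference $R$ on $\mathcal{M}_j^{\hat h}$ and, analogously, let $T(y(m_j^{\hat h}))$ be the values of $T(y)$, such that

$$\bigl\|T(y(m_j^{\hat h})) - R_j^{\hat h}\bigr\|_2^2 = \sum_{m_k \in \mathcal{M}_j^{\hat h}} \bigl(T(y(m_k)) - R(m_k)\bigr)^2.$$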

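As we can see, this approximation involves values of the deformation $y$ at points $m_k \in \mathcal{M}_j^{\hat h}$ which are in general not grid points of our nodal discretization $\Omega^h$. Therefore, we approximate the values $y(m_k)$ for $m_k \in \mathcal{M}_j^{\hat h}$ by interpolation of the nodal grid function $y^h$, i.e.,

$$y(m_k) \approx \sum_{i=1}^{3N} \xi_i \, y_i^h \quad\text{for}\quad m_k \in \mathcal{M}_j^{\hat h}.$$

We use linear interpolation, such that in fact only 8 coefficients per point are involved. Collecting all interpolation weights $\xi_i$ for each point $m_k \in \mathcal{M}_j^{\hat h}$ in a $3M \times 3N$ matrix $P_j$ we have

$$T(P_j y^h) \approx T(y(m_j^{\hat h})).$$

Summarizing, we approximate the distance measure by

$$D(R, T(y)) = \frac{1}{2} \sum_{j=1}^{m} \int_{\mathcal{M}_j} \bigl(T(y(x)) - R(x)\bigr)^2 \, dS(x) \approx \frac{\hat h_1 \hat h_2}{2} \sum_{j=1}^{m} \bigl\|T(P_j y^h) - R_j^{\hat h}\bigr\|_2^2.$$

Setting $R^h = (R_1^{\hat h}, \dots, R_m^{\hat h}) \in \mathbb{R}^{Mm}$ and stacking the blocks $P_1, \dots, P_m$ vertically into $P = (P_1^\top, \dots, P_m^\top)^\top \in \mathbb{R}^{3Mm \times 3N}$, we obtain a concise formulation for a discrete version of $D(R, T(y))$ given by

$$D(y^h) := \frac{\hat h_1 \hat h_2}{2} \bigl\|T(P y^h) - R^h\bigr\|_2^2. \qquad (10)$$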
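The interpolation step and the mid-point rule above can be illustrated with a small sketch. It assumes a uniform nodal grid stored implicitly by its origin `x0`, spacing `h` and cell counts `n`; the names `trilinear_weights` and `discrete_distance` are hypothetical, and the clamping at the domain boundary is an assumption, since the text does not specify it.

```python
def trilinear_weights(point, x0, h, n):
    """Return {node_index: weight} for linear interpolation at `point`.

    The grid has (n[0]+1) x (n[1]+1) x (n[2]+1) nodes x0 + k*h. Only the
    8 corner nodes of the enclosing cell receive a weight.
    """
    base, frac = [], []
    for d in range(3):
        s = (point[d] - x0[d]) / h[d]
        k = min(max(int(s), 0), n[d] - 1)   # clamp to a valid cell (assumption)
        base.append(k)
        frac.append(s - k)
    weights = {}
    for c0 in (0, 1):
        for c1 in (0, 1):
            for c2 in (0, 1):
                w = 1.0
                for d, c in enumerate((c0, c1, c2)):
                    w *= frac[d] if c else 1.0 - frac[d]
                weights[(base[0] + c0, base[1] + c1, base[2] + c2)] = w
    return weights


def discrete_distance(t_vals, r_vals, h_hat):
    """Mid-point rule value h1*h2/2 * sum (T - R)^2 over all slice points."""
    ssd = sum((t - r) ** 2 for t, r in zip(t_vals, r_vals))
    return 0.5 * h_hat[0] * h_hat[1] * ssd
```

Each row of the matrix $P_j$ corresponds to one such dictionary of 8 weights, which is why $P$ is very sparse.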


Discretization of the Regularizer

For a discrete version of the curvature regularizer we use standard finite differences for approximating derivatives and the mid-point rule for approximating integrals. Recall that the curvature regularizer was defined as

$$S(y) = \frac{1}{2} \|\Delta y\|_{L^2(\Omega)}^2 = \frac{1}{2} \int_{\Omega} |\Delta y|^2 \, dx.$$

In a first step we approximate the Laplacian based on the standard second-order seven-point formula, i.e., we define

$$\Delta^h y(x) := \sum_{\ell=1}^{3} \frac{1}{h_\ell^2} \bigl(y(x - h_\ell e_\ell) - 2 y(x) + y(x + h_\ell e_\ell)\bigr),$$

where $e_1, e_2, e_3$ are the unit vectors of $\mathbb{R}^3$. Furthermore, let $B^h \in \mathbb{R}^{3N \times 3N}$ be its matrix representation such that $B^h y^h$ is a second-order approximation to $\Delta y$ at the nodal grid points in $\Omega^h$, so that $(B^h y^h) \odot (B^h y^h)$ is a second-order approximation to $(\Delta y)^2$. Now, let $A_n^c \in \mathbb{R}^{n_1 n_2 n_3 \times N}$ be a matrix that averages values from nodes to the cell centers such that $A_n^c \bigl((B^h y^h) \odot (B^h y^h)\bigr)$ is a second-order approximation to $(\Delta y)^2$ at the cell centers. Thus, applying the mid-point rule for mesh size $h = (h_1, h_2, h_3)$ we obtain

$$h_1 h_2 h_3 \, e^\top A_n^c \bigl((B^h y^h) \odot (B^h y^h)\bigr) \approx \int_{\Omega} |\Delta y|^2 \, dx$$

with $e = (1, 1, \dots, 1)^\top \in \mathbb{R}^{n_1 n_2 n_3}$ the one-vector. Moreover, applying some linear algebra we find

$$e^\top A_n^c \bigl((B^h y^h) \odot (B^h y^h)\bigr) = e^\top A_n^c \operatorname{diag}(B^h y^h) B^h y^h = (y^h)^\top (B^h)^\top \operatorname{diag}(e^\top A_n^c) B^h y^h.$$

As a result, we define the discrete version of the curvature regularizer by

$$S(y^h) := \frac{1}{2} (y^h)^\top A^h y^h$$

with a matrix $A^h := h_1 h_2 h_3 \, (B^h)^\top \operatorname{diag}(e^\top A_n^c) B^h \in \mathbb{R}^{3N \times 3N}$.

Gauss-Newton Optimization

Having established discrete versions of the distance measure and the smoother, we now aim to solve

$$\min_{y^h} \; D(y^h) + \alpha S(y^h). \qquad (11)$$

Clearly, (11) is not a quadratic function due to the non-linearity in the distance $D$. Therefore, we cannot compute a solution directly and have to rely on an iterative method. Here, we use a standard Gauss-Newton method [16]. In each iteration we solve a linear system of the type

$$H s = -g \qquad (12)$$

to compute an update $s$ for the current iterate. Here $g$ is the gradient $\nabla D + \alpha \nabla S$ of the objective function, given by

$$g = \hat h_1 \hat h_2 \, P^\top \nabla T^\top \bigl(T(P y^h) - R^h\bigr) + \alpha A^h y^h,$$

and $H$ is an approximation to the Hessian $\nabla^2 D + \alpha \nabla^2 S$. Neglecting second-order terms in $\nabla^2 D$ we set

$$H := \hat h_1 \hat h_2 \, P^\top \nabla T^\top \nabla T \, P + \alpha A^h.$$

Thus, the approximate Hessian $H$ is a sparse symmetric positive definite matrix, such that we can apply a conjugate gradient (CG) method for solving the linear system (12). In our implementation we use CG with symmetric Gauss-Seidel relaxation as a preconditioner. Summarizing, this leads to an efficient numerical algorithm for computing a solution to the discrete volume-to-slice registration problem (11).
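The inner linear solve of each Gauss-Newton step can be sketched as follows. This is a minimal, unpreconditioned conjugate gradient iteration in plain Python; the symmetric Gauss-Seidel preconditioner used by the authors is deliberately omitted, and the function name `conjugate_gradient` and the matrix-free `matvec` interface are assumptions made for illustration.

```python
def conjugate_gradient(matvec, rhs, tol=1e-10, max_iter=200):
    """Solve H s = rhs for a symmetric positive definite H given as a matvec."""
    s = [0.0] * len(rhs)
    r = list(rhs)                       # residual for the zero start vector
    p = list(r)                         # first search direction
    rr = sum(v * v for v in r)
    for _ in range(max_iter):
        if rr ** 0.5 <= tol:            # stop once the residual norm is small
            break
        Hp = matvec(p)
        alpha = rr / sum(a * b for a, b in zip(p, Hp))
        s = [si + alpha * pi for si, pi in zip(s, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hp)]
        rr_new = sum(v * v for v in r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return s
```

In a Gauss-Newton loop one would call this with `rhs` set to the negative gradient and add the returned update to the current iterate; a line search or damping strategy, which the text does not discuss, may be needed in practice.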

5 Experiments

We demonstrate our method by an academic example on real liver data. To this end, we use ultrasound (US) volumetric data of size 238 × 155 × 156 captured by a 3D US scanner.

Fig. 4. 3D volume-to-slice registration results for clinical data. (a) 3D data (black) with five 2D reference slices; (b) 3D template (gray) with five reference slices; (c)+(d) 3D template (gray) and original data (black) before and after registration.


We simulate a typical ultrasound sweep by extracting a few 2D slices from the volume. Fig. 4(a) shows the setting for five slices, where we visualize the volumetric data by a surface rendering of the contained vessels. This slice data is used as reference. Subsequently, we apply an artificial non-linear deformation to the volume, and the deformed volume is used as the template. Fig. 4(b) displays a surface rendering of the template with the reference slice data. Based on the five reference slices and the volumetric template we then performed a volume-to-slice registration. Fig. 4(c) and 4(d) show the 3D template vessels before and after registration together with the original vessels. Note that the original vessels served only to generate the reference slices and were not taken into account during registration. As we can see, we obtain an almost perfect alignment based on very few reference data (see Fig. 4(d)).

6 Conclusions

We described a new method for registration of a d-dimensional template to (d − 1)-dimensional reference data, motivated by CT/US registration. A key observation is that high-order regularization is required to avoid unwanted and non-differentiable deformations. Furthermore, we described an efficient algorithm based on a Gauss-Newton optimization method. In a first experiment we successfully demonstrated our method for the registration of artificially deformed data, where we were able to almost recover the original deformation based only on very few reference data. This promising first result shows that our approach works in general. Clearly, the chosen SSD distance measure is not suitable for the target application of CT and US registration. However, our overall method is independent of a particular choice for the distance measure. An extension to other distance measures that can handle multi-modality, such as mutual information, is straightforward. In conclusion, we have presented a novel scheme and proof of concept for a clinically relevant problem based on sound theory and efficient numerics. Future work includes the extension to a multi-modal setting for registration of CT and US.

Acknowledgments

We thank Dirk Langemann from the Institute of Mathematics at the University of Lübeck for his support on functional analysis. We also thank Thomas Lange from the Department of Surgery and Surgical Oncology at Charité - Universitätsmedizin Berlin for providing image data.

References

1. Fong, Y., Fortner, J., Sun, R., et al.: Clinical score for predicting recurrence after hepatic resection for metastatic colorectal cancer: analysis of 1001 consecutive cases. Ann. Surg. 230, 309–318 (1999)
2. Lang, H.: Technik der Leberresektion - Teil I. Chirurg 78(8), 761–774 (2007)
3. Barry, C., Allott, C., John, N., Mellor, P., Arundel, P., Thomson, D., Waterton, J.: Three-dimensional freehand ultrasound: Image reconstruction and volume analysis. Ultrasound in Medicine & Biology 23, 1209–1224 (1997)
4. Coupe, P., Azzabou, P.H.N., Barillot, C.: 3D freehand ultrasound reconstruction based on probe trajectory. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3749, pp. 597–604. Springer, Heidelberg (2005)
5. Rohling, R.: 3D Freehand Ultrasound: Reconstruction and Spatial Compounding. PhD thesis, Department of Engineering, University of Cambridge (1998)
6. Broit, C.: Optimal registration of deformed images. PhD thesis, Department of Computer and Information Science, University of Pennsylvania (1981)
7. Modersitzki, J.: Numerical Methods for Image Registration. Numerical Mathematics and Scientific Computation. Oxford University Press, Oxford (2003)
8. Brown, L.G.: A survey of image registration techniques. ACM Computing Surveys 24(4), 325–376 (1992)
9. Viola, P.A., Wells, W.M.I.: Alignment by maximization of mutual information. In: 5th International Conference on Computer Vision (1995)
10. Collignon, A., Maes, F., Vandermeulen, P., Suetens, P., Marchal, G.: Automated multi-modality image registration based on information theory. Information Processing in Medical Imaging (1995)
11. Fischer, B., Modersitzki, J.: Curvature based image registration. JMIV 18(1) (2003)
12. Fischer, B., Modersitzki, J.: Combining landmark and intensity driven registrations. PAMM 3, 32–35 (2003)
13. Wloka, J.: Partial Differential Equations. Cambridge University Press, Cambridge (1987)
14. Fischer, B., Modersitzki, J.: Fast diffusion registration. In: Nashed, M., Scherzer, O. (eds.) Inverse Problems, Image Analysis, and Medical Imaging. Contemporary Mathematics, vol. 313. AMS (2002)
15. Rudin, W.: Functional Analysis. McGraw-Hill, New York (1991)
16. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer Series in Operations Research. Springer, Heidelberg (1999)

Hyperbolic Numerics for Variational Approaches to Correspondence Problems

Henning Zimmer1,2, Michael Breuß1, Joachim Weickert1, and Hans-Peter Seidel2

1 Mathematical Image Analysis Group, Faculty of Mathematics and Computer Science, Building E1.1, Saarland University, 66041 Saarbrücken, Germany
{zimmer,breuss,weickert}@mia.uni-saarland.de
2 Max-Planck Institute for Informatics, Stuhlsatzenhausweg 85, 66123 Saarbrücken, Germany
[email protected]

Abstract. Variational approaches to correspondence problems such as stereo or optic ﬂow have now been studied for more than 20 years. Nevertheless, only little attention has been paid to a subtle numerical approximation of derivatives. In the area of numerics for hyperbolic partial diﬀerential equations (HDEs) it is, however, well-known that such issues can be crucial for obtaining favourable results. In this paper we show that the use of hyperbolic numerics for variational approaches can lead to a signiﬁcant quality gain in computational results. This improvement can be of the same order as obtained by introducing better models. Applying our novel scheme within existing variational models for stereo reconstruction and optic ﬂow, we show that this approach can be beneﬁcial for all variational approaches to correspondence problems.

1 Introduction

Numerous tasks in the field of computer vision belong to the class of correspondence problems, where one has to match pixels of two or more images. Popular examples are stereo reconstruction and optic flow, which both amount to computing a displacement field between two images. In the stereo context, the absolute value of this field is called disparity and is needed to recover the depth information of a static scene. For optic flow, the displacement field is called optic flow field and gives information about the dynamics of a moving scene.

A successful class of techniques for solving correspondence problems like stereo or optic flow are the variational approaches that find the displacement field as the minimiser of a continuous energy functional. Those methods have been studied for more than two decades, starting from the optic flow approach of Horn and Schunck [1]. During this period of time, a lot of effort has been spent on improving the quality of models [2, 3, 4, 5, 6, 7].

In order to apply those continuous models to sampled digital images and to solve the minimisation problem on a computer, one certainly has to discretise occurring image derivatives. This task obviously offers a certain degree of freedom in choosing a well-suited derivative approximation. Surprisingly, this issue has hardly been studied for variational approaches to correspondence problems. If the discretisation is discussed at all, most approaches use “standard” central finite difference approximations [3, 4, 5].

For variational approaches to image restoration, sophisticated approximation schemes have already been considered for a long time [8, 9]. They also have been thoroughly studied in the field of hyperbolic partial differential equations (HDEs) [10, 11], where one simulates the transport of liquids or gases, resulting in a problem setting related to correspondence problems: Given an initial density distribution (first image) and the velocity of transport (displacement), compute the density distributions at later times (second image). One realises that the role of known and unknown is switched compared to correspondence problems.

In this paper we make use of this relation between HDEs and correspondence problems for the first time in the literature. In the style of numerical schemes for HDEs, we develop an adaptive discretisation scheme that decides, based on a smoothness measure, on a suitable approximation of image derivatives at each point. This scheme is then used within variational frameworks for stereo reconstruction and optic flow. Experiments show that this approach improves the quality of results in the same order as can be achieved with model refinements.

This paper is organised as follows: In Sect. 2 we investigate the importance of an appropriate approximation of image derivatives on the example of simple 1-D correspondence problems. Based on this we develop the adaptive discretisation scheme that is applied to stereo reconstruction and optic flow in Sect. 3 and Sect. 4, respectively. There we also show corresponding experiments. The paper is then concluded by a summary and an outlook on future work in Sect. 5.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 636–647, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 Hyperbolic Numerics for 1-D Variational Approaches

2.1 A Variational Approach for 1-D Correspondence Problems

For simplicity, let us consider a 1-D signal sequence $f(x, t)$ where $x \in \Omega$ denotes the position in the signal domain $\Omega \subset \mathbb{R}$ and $t \ge 0$ denotes time. In order to compute the unknown displacement function $u(x)$ that gives the displacements from time $t$ to $t + 1$, we minimise the energy functional

$$E(u) = \int_{\Omega} \bigl( (f_x u + f_t)^2 + \alpha \, u_x^2 \bigr) \, dx, \qquad (1)$$

where subscripts denote partial derivatives. The term $(f_x u + f_t)^2$ is called data term and models how well the displacement $u$ matches the signal sequence $f$. We impose that the signal values are invariant under their displacement, i.e., $f(x + u, t + 1) = f(x, t)$. Assuming that $u$ is small and $f$ sufficiently smooth, we can perform a linearisation that finally leads to the presented data term. Note that in the 1-D setting, the data term alone allows to compute a solution $u = -f_t / f_x$, if $f_x \neq 0$. However, in 2-D this will no longer be the case. There, and also to obtain a solution in flat signal regions, the smoothness term $u_x^2$ is needed. By penalising large derivatives of $u$, it allows to smoothly fill in the displacement function where the data term is not sufficient. Its contribution to the energy is steered by a smoothness weight $\alpha > 0$.

In order to actually compute a minimiser $u$ of the energy (1), the calculus of variations states that $u$ necessarily has to fulfil the Euler-Lagrange equation

$$f_x (f_x u + f_t) - \alpha \, u_{xx} = 0, \qquad (2)$$

with homogeneous Neumann boundary conditions.

2.2 A Closer Look into Discretisation Issues

For solving the Euler-Lagrange equation (2) on a computer, we have to discretise the signal $f$, the displacement $u$ and their derivatives $f_x$, $f_t$ and $u_{xx}$. Note that the image derivatives that occur in the Euler-Lagrange equation (2) are in general the same as in the linearised data term of the energy (1). Thus, the data term suffices to find out which derivatives have to be approximated.

Let us start with the discretisation of the signals $f$ and $u$. To this end we sample them on a spatio-temporal discrete grid, which yields the approximations $f(x_i, t_k) \approx f_i^k$ and $u(x_i) \approx u_i$, where $x_i := (i - \tfrac{1}{2}) h$ and $t_k = k \tau$ for a spatial grid size $h$ and a time step size $\tau$. In this paper we will only consider the two frames $f_i^k$ and $f_i^{k+1}$, assuming a temporal sampling of $\tau = 1$.

Derivative Approximations. The discretisation of the occurring derivatives can be done in different ways. We use the popular concept of finite differences, as for example presented in [12]. As notation for the approximation of partial derivatives we use $f_d(x_i, t_k) \approx (f_d)_i^k$ to denote the corresponding finite difference discretisation.

I. Temporal Discretisation. For the time derivative we use the forward difference

$$(f_t)_i^k := \frac{1}{\tau} \bigl( f_i^{k+1} - f_i^k \bigr), \qquad (3)$$

as this is the only reasonable choice, given $f_i^k$ and $f_i^{k+1}$.

II. Spatial Discretisation of First Order. The approximation of $f_x$ offers different possibilities for $(f_x)_i^k$. Basic choices are forward, backward and central differences:

$$\begin{aligned}
D_x^{+} f_i^{k} &:= \frac{1}{h} \bigl( f_{i+1}^{k} - f_i^{k} \bigr), &
D_x^{+} f_i^{k+1} &:= \frac{1}{h} \bigl( f_{i+1}^{k+1} - f_i^{k+1} \bigr), \\
D_x^{-} f_i^{k} &:= \frac{1}{h} \bigl( f_i^{k} - f_{i-1}^{k} \bigr), &
D_x^{-} f_i^{k+1} &:= \frac{1}{h} \bigl( f_i^{k+1} - f_{i-1}^{k+1} \bigr), \\
D_x^{0} f_i^{k} &:= \frac{1}{2h} \bigl( f_{i+1}^{k} - f_{i-1}^{k} \bigr), &
D_x^{0} f_i^{k+1} &:= \frac{1}{2h} \bigl( f_{i+1}^{k+1} - f_{i-1}^{k+1} \bigr),
\end{aligned} \qquad (4)$$

where $D^{+}$ denotes forward, $D^{-}$ backward and $D^{0}$ central differences, respectively, that can be computed at the time level $k$ or $k + 1$.


Note that the approximation error of the one-sided differences (forward and backward) is in $O(h)$, whereas their central counterparts only involve an error of $O(h^2)$. This, together with the unbiased stencil orientation, explains why they are a popular “standard” choice in image processing applications. To further reduce the approximation error one may consider averaged differences, taking into account the time levels $k$ and $k + 1$. In the remainder of this paper those will be referred to as “standard” derivative approximation. They are given by

$$D_x^{0} f_i^{k+\frac{1}{2}} := \frac{1}{2} \bigl( D_x^{0} f_i^{k} + D_x^{0} f_i^{k+1} \bigr) = \frac{1}{4h} \bigl( f_{i+1}^{k} - f_{i-1}^{k} + f_{i+1}^{k+1} - f_{i-1}^{k+1} \bigr). \qquad (5)$$

III. Spatial Discretisation of Second Order. Finally we have to approximate the second-order spatial derivative of the displacement function. As this choice is not crucial we propose a simple central approximation

$$(u_{xx})_i := D_x^{-} D_x^{+} u_i = \frac{1}{h^2} \bigl( u_{i+1} - 2 u_i + u_{i-1} \bigr). \qquad (6)$$
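The difference operators (3) to (6) translate directly into code. The sketch below assumes signals stored as Python lists and interior indices only; the function names are hypothetical and boundary handling is left out.

```python
def d_forward(f, i, h=1.0):
    """Forward difference D+ at index i."""
    return (f[i + 1] - f[i]) / h


def d_backward(f, i, h=1.0):
    """Backward difference D- at index i."""
    return (f[i] - f[i - 1]) / h


def d_central(f, i, h=1.0):
    """Central difference D0 at index i."""
    return (f[i + 1] - f[i - 1]) / (2.0 * h)


def d_central_averaged(f0, f1, i, h=1.0):
    """Eq. (5): mean of the central differences at time levels k and k+1."""
    return 0.5 * (d_central(f0, i, h) + d_central(f1, i, h))


def d_time(f0, f1, i, tau=1.0):
    """Eq. (3): forward difference in time between the two frames."""
    return (f1[i] - f0[i]) / tau


def d_second(u, i, h=1.0):
    """Eq. (6): central approximation of the second derivative."""
    return (u[i + 1] - 2.0 * u[i] + u[i - 1]) / h ** 2
```

For the samples of $f(x) = x^2$ at $x = 0, 1, \dots, 4$, the central difference at $x = 2$ reproduces the exact derivative 4, while the one-sided differences give 5 and 3, which illustrates the different error orders mentioned above.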

Why the Discretisation of $f_x$ Matters. To show that an appropriate choice of $(f_x)_i^k$ is crucial for computing reasonable displacements $u$, we conduct a small experiment: Consider the two frames of a signal sequence in Fig. 1 (a). Here, the signal is displaced by one position to the right in its middle part and stays unchanged otherwise, which is also indicated in the ground truth displacement in Fig. 1 (b). Note that this example comprises smooth as well as discontinuous signal and displacement regions, which make it rather indicative. In Fig. 1 (c)–(e) we depict computed displacements using different discretisations for $f_x$. The displacements were obtained as the solution of a linear system of equations that arises from the discretised Euler-Lagrange equation (2). As the system matrix is tri-diagonal, it can directly be solved via the Thomas algorithm [13]. Further note that we set the smoothness weight $\alpha = 10^{-4}$, to clearly see the influence of the data term where $f_x$ occurs.

When comparing the displacements in Fig. 1 (c)–(e), the large influence of the choice of $(f_x)_i^k$ becomes obvious: Averaged central differences only perform well in the smooth signal regions at the left and right boundaries. At discontinuities they suffer from over- and undershoots. One-sided differences perform either favourably or fail totally. Obviously, the correct orientation matters here. When using the “correct” one-sided differences, the displacement almost coincides with the ground truth, except at one point. This is, however, not due to the numerics, but is caused by the occlusion at the jump in the displacement. Hence the considered point at time level $k$ does not possess a matching point at time level $k + 1$ and its displacement is undefined. In the ground truth, we assign to this point the displacement of its right neighbour.

The observed behaviour in our experiment can be explained when looking into the theory of HDEs [10, 11]. There, so-called upwind schemes are a widely used concept where the signal derivatives are approximated by “correctly oriented” one-sided differences. The correct orientation in our case means opposite to the displacement direction, see our experiment.
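The linear system behind this experiment can be sketched as follows. The code assembles the discretised Euler-Lagrange equation (2) with (6) for given derivative samples and solves it with the Thomas algorithm. The function names `thomas` and `solve_displacement` are hypothetical, and the treatment of the Neumann boundary (simply dropping the missing neighbour) is an assumption, since the text does not spell it out.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):                       # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):              # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def solve_displacement(fx, ft, alpha, h=1.0):
    """Solve fx_i (fx_i u_i + ft_i) - alpha (u_xx)_i = 0 for all i.

    The system is singular if fx vanishes everywhere (pure Neumann problem).
    """
    n = len(fx)
    w = alpha / h ** 2
    lower = [-w] * n
    upper = [-w] * n
    diag = []
    for i in range(n):
        neighbours = (1 if i > 0 else 0) + (1 if i < n - 1 else 0)
        diag.append(fx[i] ** 2 + neighbours * w)
    rhs = [-fx[i] * ft[i] for i in range(n)]
    return thomas(lower, diag, upper, rhs)
```

Passing forward, backward or averaged central differences as `fx` corresponds to the different discretisations compared in the experiment; only the data entering the system changes, not the solver.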

Fig. 1. Top row: (a) Signal at time k (solid) and k + 1 (dotted). (b) Ground truth displacement. Bottom row: (c) Displacement computed using standard averaged central differences (solid), compared to the ground truth (dotted). (d) Same for one-sided forward differences. (e) Same for one-sided backward differences.

2.3 An Adaptive Discretisation Scheme

After explaining the outcome of our experiment with the help of hyperbolic numerics, we now adapt a successful concept from this area for our purpose. Recall that one-sided upwind differences, which are low-order approximations, perform well at signal discontinuities. However, they involve a higher discretisation error than central differences, which are high-order approximations and perform favourably in smooth signal regions. Hence a natural idea is to combine the two strategies by using high-order approximations in smooth signal parts and low-order ones at discontinuities. Slightly more involved techniques utilising this idea are the high-resolution methods [11], developed in the context of HDEs. They use a nonlinear blend of low- and high-order approximations, steered by a smoothness measure. Adapting this methodology to the variational framework will result in an adaptive high-resolution-type (HRT) discretisation scheme for correspondence problems, that will be presented now.

Measuring Smoothness. First we discuss how to determine the smooth and discontinuous regions of a signal. Therefore we introduce a smoothness measure

$$\Theta_i := \Theta\bigl(f_i^{k}, f_i^{k+1}\bigr) := \bigl| D_x^{-} f_i^{k} - D_x^{+} f_i^{k} \bigr| + \bigl| D_x^{-} f_i^{k+1} - D_x^{+} f_i^{k+1} \bigr|, \qquad (7)$$

that is close to 0 in smooth regions where backward and forward differences of $f_i^{k}$ and $f_i^{k+1}$ are almost identical, and large at discontinuities of $f_i$.

Determining the Upwind Directions. Next we need to determine the appropriate upwind directions for the one-sided differences. Note that our experiment from Fig. 1 has shown that this is crucial. We propose to compute a predictor solution $\tilde u$ whose sign determines the upwind direction. The predictor is computed using standard averaged central differences and a comparatively large smoothness weight, e.g., $\tilde\alpha = 1$, to cope with outliers caused by the possibly less appropriate high-order discretisation. With its help the low-order upwind approximation $f_x^{L}$ of $f_x$ is defined as

$$\bigl(f_x^{L}\bigr)_i := \begin{cases}
D_x^{-} f_i^{k}, & \text{if } \tilde u_i > 0, \\
D_x^{+} f_i^{k}, & \text{if } \tilde u_i < 0, \\
\bigl(f_x^{H}\bigr)_i, & \text{if } \tilde u_i = 0,
\end{cases} \qquad (8)$$

where

$$\bigl(f_x^{H}\bigr)_i := D_x^{0} f_i^{k+\frac{1}{2}} \qquad (9)$$

denotes the high-order standard approximation of $f_x$ using averaged central differences. Revisiting the experiment from Fig. 1, we realise that this definition agrees with the results obtained there.

The High-Resolution-Type (HRT) Discretisation Scheme. Now we have everything at hand to define the adaptive HRT discretisation scheme as

$$(f_x)_i^{k} := \bigl(f_x^{L}\bigr)_i + \Phi(\Theta_i) \Bigl( \bigl(f_x^{H}\bigr)_i - \bigl(f_x^{L}\bigr)_i \Bigr), \qquad (10)$$

using a blending function $\Phi(\Theta_i)$. It is close to 1 in smooth signal regions (indicated by $\Theta_i$), yielding a high-order approximation there. At discontinuities it is close to 0, which leads to a low-order approximation that is better suited there. For the actual choice of $\Phi(\Theta_i)$ we propose

$$\Phi(\Theta_i) := \begin{cases}
1 - \dfrac{\Theta_i}{T}, & \text{if } 0 \le \Theta_i < T, \\
0, & \text{else},
\end{cases} \qquad (11)$$

using a threshold parameter $T > 0$. Note that for $T \to 0$ we obtain the upwind scheme and for $T \to \infty$ one falls back to a standard scheme. Applying the HRT scheme to the signal sequence from Fig. 1 gives the same result as with the appropriate upwind scheme, hence we omit an additional figure. However, for more challenging stereo and optic flow problems that we discuss in Sect. 3 and 4, the blending of the HRT scheme will give results superior to a pure upwind scheme.
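The HRT scheme for a single interior point can be sketched as below. It combines the smoothness measure (7), the upwind choice (8), the averaged central difference (9) and the blending (10) with (11). The function names are hypothetical, the absolute values in the smoothness measure are an assumption about the intended definition, and the predictor value `u_pred` is taken as given.

```python
def smoothness(f0, f1, i, h=1.0):
    """Eq. (7): mismatch of backward and forward differences at both time levels."""
    def jump(f):
        return abs((f[i] - f[i - 1]) / h - (f[i + 1] - f[i]) / h)
    return jump(f0) + jump(f1)


def blend(theta, T):
    """Eq. (11): 1 for theta = 0, decreasing linearly to 0 at theta = T."""
    return 1.0 - theta / T if 0.0 <= theta < T else 0.0


def fx_hrt(f0, f1, i, u_pred, T, h=1.0):
    """Eq. (10): blend of the upwind (low-order) and averaged central (high-order) value."""
    high = ((f0[i + 1] - f0[i - 1]) + (f1[i + 1] - f1[i - 1])) / (4.0 * h)
    if u_pred > 0:
        low = (f0[i] - f0[i - 1]) / h       # backward difference, eq. (8)
    elif u_pred < 0:
        low = (f0[i + 1] - f0[i]) / h       # forward difference, eq. (8)
    else:
        low = high
    return low + blend(smoothness(f0, f1, i, h), T) * (high - low)
```

On a linear ramp the smoothness measure vanishes and the scheme returns the central value, whereas at a jump of height 10 with `T = 1` the blending weight is 0 and the pure upwind difference is used.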

3 Integration into Variational Stereo Approaches

In this section we integrate our adaptive HRT discretisation scheme into a recent variational stereo approach by Slesareva et al. [6]. We restrict ourselves to the rectiﬁed scenario where displacements can only occur in horizontal direction and thus one has to solve a 1-D correspondence problem for each image row. However, it makes sense to couple those via a 2-D smoothness assumption, as will be described now.

3.1 Variational Stereo

We consider the image pair $f_l(\mathbf{x}) \equiv f(\mathbf{x}, t)$ and $f_r(\mathbf{x}) \equiv f(\mathbf{x}, t + 1)$ denoting the left and right view of a static scene, respectively. Here, $\mathbf{x} := (x, y)^\top$ denotes the location within a rectangular image domain $\Omega_2 \subset \mathbb{R}^2$. Further assume that the images are presmoothed by a Gaussian convolution of standard deviation $\sigma$. The unknown scalar-valued disparity is given by the absolute value of $u$, which can be written as $\mathbf{u} := (u, 0)^\top$ in the rectified case. In accordance with [6], the disparity is found by minimising the energy

$$E(u) = \int_{\Omega_2} \bigl[ M(u) + \alpha \, V(u) \bigr] \, d\mathbf{x}. \qquad (12)$$

The data term

$$M(u) = \Psi_M\Bigl( |f_r(\mathbf{x} + \mathbf{u}) - f_l(\mathbf{x})|^2 + \gamma \, |\nabla f_r(\mathbf{x} + \mathbf{u}) - \nabla f_l(\mathbf{x})|^2 \Bigr), \qquad (13)$$

where $\nabla := (\partial_x, \partial_y)^\top$ denotes the spatial gradient operator, combines the brightness and gradient constancy assumption weighted by $\gamma > 0$. The latter makes the method more robust under illumination changes. To cope with outliers caused by noise or occlusions, a robust penaliser function $\Psi_M(s^2) := \sqrt{s^2 + \varepsilon^2}$ using a small regularisation parameter $\varepsilon > 0$ is employed that results in modified $L^1$ penalisation. As will be described below, the linearisation of the data term is postponed to the minimisation phase to allow for a correct handling of large displacements. The smoothness term

$$V(u) = \Psi_V\bigl( |\nabla u|^2 \bigr) \qquad (14)$$

uses the same robust non-quadratic penaliser function as the data term, i.e., $\Psi_V = \Psi_M$, resulting in Total Variation regularisation [8].

Concerning the minimisation of the energy (12), we refer to [6] for the corresponding Euler-Lagrange equation. To solve it, we employ a coarse-to-fine multiscale warping approach [4] and compute on each warping level small flow increments $du$ using the linearised data term

$$\Psi_M\Bigl( (f_x \, du + f_t)^2 + \gamma \bigl( (f_{xx} \, du + f_{xt})^2 + (f_{xy} \, du + f_{yt})^2 \bigr) \Bigr). \qquad (15)$$

Note that the discretised Euler-Lagrange equation now leads to a nonlinear system of equations. After linearisation, we obtain a large but sparse linear system, which can be solved efficiently by an iterative solver of Gauß-Seidel type [14].

3.2 The HRT Discretisation Scheme for Variational Stereo

We now adapt the HRT scheme from Sect. 2.3 to the stereo setting. First, we extend the discrete grid to a 2-D version with grid sizes $h_x$ and $h_y$ in x- and y-direction, respectively. The images and the disparity are then approximated by $f_l(x_i, y_j) \approx f_{i,j}^{k}$, $f_r(x_i, y_j) \approx f_{i,j}^{k+1}$ and $u(x_i, y_j) \approx u_{i,j}$.


I. Smoothness Measures. In the 2-D stereo case, we first of all need distinct smoothness measures $\Theta_x$, $\Theta_y$ and $\Theta_{xy}$ for the x-, y- and xy-direction, respectively. For $\Theta_x$ we use the corresponding expression (7) from the 1-D case, and $\Theta_y$ is obtained by using y- instead of x-differences. With their help, the mixed expression is defined as $\Theta_{xy} = \Theta_x + \Theta_y$.

II. Derivative Approximations. Inspecting the linearised data term from (15), we realise that now also the second-order derivatives $f_{xx}$, $f_{xt}$, $f_{xy}$ and $f_{yt}$ need to be discretised. Due to space limitations we will exemplify our approach for $f_{xy}$. The other derivatives are then approximated accordingly. Note that given the two signals $f_{i,j}^{k}$ and $f_{i,j}^{k+1}$, the time derivative $f_t$ is always approximated as in (3). We start with the high-order approximation of $f_{xy} = \partial_x f_y$. This translates to the finite difference case as

$$\begin{aligned}
\bigl(f_{xy}^{H}\bigr)_{i,j} &= D_x^{0} D_y^{0} f_{i,j}^{k+\frac{1}{2}} = \frac{1}{2} \bigl( D_x^{0} D_y^{0} f_{i,j}^{k} + D_x^{0} D_y^{0} f_{i,j}^{k+1} \bigr) && (16) \\
&= \frac{1}{4 h_y} \Bigl( D_x^{0} \bigl( f_{i,j+1}^{k} - f_{i,j-1}^{k} \bigr) + D_x^{0} \bigl( f_{i,j+1}^{k+1} - f_{i,j-1}^{k+1} \bigr) \Bigr) && (17) \\
&= \frac{1}{8 h_x h_y} \Bigl( \bigl( f_{i+1,j+1}^{k} - f_{i+1,j-1}^{k} \bigr) - \bigl( f_{i-1,j+1}^{k} - f_{i-1,j-1}^{k} \bigr) \\
&\qquad\qquad + \bigl( f_{i+1,j+1}^{k+1} - f_{i+1,j-1}^{k+1} \bigr) - \bigl( f_{i-1,j+1}^{k+1} - f_{i-1,j-1}^{k+1} \bigr) \Bigr). && (18)
\end{aligned}$$

Note that for $f_{xx}$ we employ the central discretisation in accordance with (6). In the low-order case we use the upwind discretisation of $(f_x)_{i,j}^{k}$, steered by the predictor $\tilde u$. For the y-derivative we employ the averaged central difference approximation, since in the rectified scenario the displacement in y-direction is always zero. Thus we obtain for $\tilde u > 0$:

$$\bigl(f_{xy}^{L}\bigr)_{i,j} = D_x^{-} D_y^{0} f_{i,j}^{k} = \frac{1}{2 h_x h_y} \Bigl( \bigl( f_{i,j+1}^{k} - f_{i,j-1}^{k} \bigr) - \bigl( f_{i-1,j+1}^{k} - f_{i-1,j-1}^{k} \bigr) \Bigr), \qquad (19)$$

and a corresponding expression for $\tilde u < 0$. Note that we do not need a larger smoothness weight $\tilde\alpha$ to compute $\tilde u$ in this case, since an appropriate $\alpha$ for usual stereo pairs will be large enough.

3.3 Experiments for Variational Stereo

We now show results for disparity computations using the approach of Slesareva et al. [6] with different derivative approximations. We use greyscale versions of the stereo image data from the Middlebury University [15]¹. To measure the quality of estimated disparities compared to the given ground truth disparities, we employ the bad pixel error (BPE) measure [15]. As fixed parameters we set $\varepsilon = 10^{-3}$ and $T = 1$. In the stereo case we set $\sigma = 0.5$ and for the optic flow experiments in Sect. 4 we set $\sigma = 0.8$.

¹ Available under http://vision.middlebury.edu/stereo


In Fig. 2, the results for the Plastic pair are depicted. Considering the bad pixel maps in Fig. 2 (b)–(c), we see that the HRT scheme improves the results in the vicinity of image discontinuities and at the boundaries. Those areas are marked grey in the error maps. Note that the artefacts in Fig. 2 (f) are again caused by occlusions. The improvement also becomes visible in the BPE measures summarised in Table 1, which also lists other Middlebury pairs and the parameter settings. Error measures for a pure upwind scheme are given there as well. Comparing them to those of the HRT scheme shows that the blending performed by the latter also pays off in terms of quality measures.
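For readers who want to reproduce such numbers, the bad pixel error can be sketched as the percentage of pixels whose disparity deviates from the ground truth by more than a threshold. The following is a minimal sketch, not the authors' evaluation code: the function name, the optional `valid` mask and the threshold of 1.0 pixel are assumptions, since the text above does not state the threshold that was used.

```python
import numpy as np


def bad_pixel_error(disp, disp_gt, threshold=1.0, valid=None):
    """Percentage of valid pixels with |disp - disp_gt| > threshold.

    threshold=1.0 and the `valid` mask are assumptions of this sketch.
    """
    disp = np.asarray(disp, dtype=float)
    disp_gt = np.asarray(disp_gt, dtype=float)
    if valid is None:
        valid = np.ones(disp.shape, dtype=bool)  # evaluate every pixel
    bad = np.abs(disp - disp_gt)[valid] > threshold
    return 100.0 * np.count_nonzero(bad) / np.count_nonzero(valid)


# Tiny example: two of the four pixels are off by more than one pixel.
example_bpe = bad_pixel_error([[0.0, 0.5], [2.0, 3.0]], np.zeros((2, 2)))
```

A mask is useful in practice because ground truth disparities are often undefined in occluded regions.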

Fig. 2. Top row: (a) Left image of the Plastic pair. (b) Bad pixels for approach with a standard derivative approximation (bad pixels are coloured black). (c) Same for the HRT scheme. Bottom row: (d) Ground truth disparity. (e) Disparity for approach with a standard derivative approximation. (f ) Same for the HRT scheme.

4

Extension to Variational Optic Flow

Table 1. BPE measures and parameters for stereo experiments

Image Pair   Derivative Approximation   Parameters            BPE
Plastic      standard                   α = 5.5, γ = 190.0    25.85
Plastic      upwind                     α = 5.5, γ = 190.0    21.35
Plastic      HRT scheme                 α = 5.5, γ = 190.0    18.85
Teddy        standard                   α = 8.0, γ = 9.5      17.45
Teddy        upwind                     α = 8.0, γ = 9.5      16.94
Teddy        HRT scheme                 α = 8.0, γ = 9.5      16.75
Venus        standard                   α = 4.5, γ = 0.5       3.06
Venus        upwind                     α = 4.5, γ = 0.5       2.78
Venus        HRT scheme                 α = 4.5, γ = 0.5       2.77

Having presented how to employ the adaptive HRT discretisation scheme for stereo, its extension to the optic flow case is more or less straightforward. For optic flow we consider a presmoothed image sequence f(x, t) and want to compute a flow field $w := (u, v)^\top$, where u and v give the displacements in x- and y-direction, respectively. Using the method of Brox et al. [4], which was the basis for the stereo approach of Slesareva et al. [6], we compute w as the minimiser of an energy functional similar to the one from (12). One difference concerning the HRT scheme is that we now also have to approximate $f_y$ and $f_{yy}$. This, however, works analogously to the stereo case. More problematic are the low-order upwind approximations of $f_{xy}$, as they now depend on a predictor $\tilde{w} = (\tilde{u}, \tilde{v})^\top$. Hence we need an extensive case distinction that takes into account all possible combinations of the signs of $\tilde{u}$ and $\tilde{v}$. For example, let $\tilde{u} > 0$ and $\tilde{v} < 0$; then

\[
(f_{xy})^{L}_{i,j} = D^{-}_{x} D^{+}_{y} f^{k}_{i,j} = \frac{1}{h_x h_y}\Big( \big(f^{k}_{i,j+1} - f^{k}_{i,j}\big) - \big(f^{k}_{i-1,j+1} - f^{k}_{i-1,j}\big) \Big). \qquad (20)
\]
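The two kinds of approximation of $f_{xy}$ discussed above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, arrays are indexed as `f[i, j]` with `i` along x and `j` along y, and a predictor component that is exactly zero is treated like a negative one, a case the text does not specify. The high-order variant follows (16)–(18); the low-order variant picks a backward difference for a positive predictor component and a forward difference otherwise, which reproduces (20) for $\tilde{u} > 0$, $\tilde{v} < 0$.

```python
import numpy as np


def fxy_high_order(f_k, f_k1, i, j, hx=1.0, hy=1.0):
    """Central differences in x and y, averaged over frames k and k+1."""
    def central_xy(f):
        return (f[i + 1, j + 1] - f[i + 1, j - 1]
                - f[i - 1, j + 1] + f[i - 1, j - 1]) / (4.0 * hx * hy)
    return 0.5 * (central_xy(f_k) + central_xy(f_k1))


def fxy_low_order(f_k, i, j, u_pred, v_pred, hx=1.0, hy=1.0):
    """One-sided differences in x and y, steered by the predictor signs."""
    # Backward difference for a positive component, forward otherwise
    # (the zero case is an assumption of this sketch).
    i0, i1 = (i - 1, i) if u_pred > 0 else (i, i + 1)
    j0, j1 = (j - 1, j) if v_pred > 0 else (j, j + 1)
    return ((f_k[i1, j1] - f_k[i1, j0])
            - (f_k[i0, j1] - f_k[i0, j0])) / (hx * hy)


# Bilinear test image f(x, y) = x * y, for which f_xy = 1 everywhere.
ii, jj = np.meshgrid(np.arange(5.0), np.arange(5.0), indexing="ij")
f_bilinear = ii * jj
```

On the bilinear image all variants return the exact mixed derivative, which makes it a convenient sanity check before applying the stencils to real frames.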

In order to show that the HRT scheme also performs favourably for optic flow, we performed experiments using the recent optic flow data sets from Middlebury University [16]². In Fig. 3 we show results obtained for the Urban3 sequence. Note that the error maps now show the magnitude of the average angular error (AAE) [17] measure. Inspecting them, the favourable performance of the HRT scheme in the marked regions becomes visible, which is also reflected in the AAE measures shown in Table 2. The table also includes other Middlebury sequences, the parameter settings and results for the upwind scheme. Concerning the latter, we see that the HRT scheme performs better for optic flow as well.
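The angular error of Barron et al. [17] measures the angle between the space-time vectors (u, v, 1) of the estimated and the ground truth flow, averaged over the image. A minimal sketch follows; the function name is an assumption, and the result is returned in degrees.

```python
import numpy as np


def average_angular_error(u, v, u_gt, v_gt):
    """Mean angle in degrees between (u, v, 1) and (u_gt, v_gt, 1)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    u_gt, v_gt = np.asarray(u_gt, dtype=float), np.asarray(v_gt, dtype=float)
    num = u * u_gt + v * v_gt + 1.0
    den = np.sqrt(u ** 2 + v ** 2 + 1.0) * np.sqrt(u_gt ** 2 + v_gt ** 2 + 1.0)
    # Clip to guard against rounding slightly outside [-1, 1].
    angles = np.arccos(np.clip(num / den, -1.0, 1.0))
    return float(np.degrees(angles).mean())
```

For instance, an estimate of (1, 0) against a ground truth of (0, 0) compares the vectors (1, 0, 1) and (0, 0, 1), whose angle is 45 degrees.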

Fig. 3. Top row: (a) Frame 10 of the Urban3 sequence. (b) AAE map for approach with a standard derivative approximation. (c) Same for the HRT scheme. Bottom row: (d) Flow magnitude of the ground truth. (e) Flow magnitude for approach with a standard derivative approximation. (f ) Same for the HRT scheme.

² Available at http://vision.middlebury.edu/flow

Table 2. AAE measures and parameters for optic flow experiments

Image Sequence   Derivative Approximation   Parameters             AAE
Urban3           standard                   α = 4.5, γ = 4.0       5.71
Urban3           upwind                     α = 4.5, γ = 4.0       4.58
Urban3           HRT scheme                 α = 4.5, γ = 4.0       4.11
RubberWhale      standard                   α = 50.0, γ = 50.0     4.72
RubberWhale      upwind                     α = 50.0, γ = 50.0     4.73
RubberWhale      HRT scheme                 α = 50.0, γ = 50.0     4.34
Dimetrodon       standard                   α = 7.0, γ = 10.0      1.94
Dimetrodon       upwind                     α = 7.0, γ = 10.0      3.06
Dimetrodon       HRT scheme                 α = 7.0, γ = 10.0      1.88

5

Conclusions and Outlook

In this paper we have presented a sophisticated numerical scheme for the approximation of spatial image derivatives in variational approaches to correspondence problems. Our experiments demonstrated that such a scheme tangibly improves the quality of the results, something that in more than 20 years of research in this field has only been achieved by model refinements. We hence conjecture that the numerics can be a fruitful alternative starting point for further advances. This finding is no surprise for people acquainted with the theory of HDEs, where sophisticated numerical schemes have been thoroughly investigated. In this paper we have seen that HDEs and variational approaches share some structural similarities. However, we were the first to utilise this similarity for developing a well-engineered numerical scheme for variational approaches. We want to stress that the adaptive discretisation scheme developed within this paper is certainly not the only promising technique that can be adapted from the field of HDEs. Our current research is thus concerned with exploring further directions that may lead to better numerical schemes for variational approaches to correspondence problems.

Acknowledgement

Henning Zimmer gratefully acknowledges funding by the International Max-Planck Research School (IMPRS).

References

1. Horn, B., Schunck, B.: Determining optical flow. Artificial Intelligence 17, 185–203 (1981)
2. Alvarez, L., Deriche, R., Papadopoulo, T., Sanchez, J.: Symmetrical dense optical flow estimation with occlusions detection. International Journal of Computer Vision 75(3), 371–385 (2007)
3. Ben-Ari, R., Sochen, N.: Variational stereo vision with sharp discontinuities and occlusion handling. In: Proc. 2007 IEEE International Conference on Computer Vision, Rio de Janeiro, Brazil, pp. 1–7. IEEE Computer Society Press, Los Alamitos (2007)
4. Brox, T., Bruhn, A., Papenberg, N., Weickert, J.: High accuracy optical flow estimation based on a theory for warping. In: Pajdla, T., Matas, J. (eds.) ECCV 2004. LNCS, vol. 3024, pp. 25–36. Springer, Heidelberg (2004)
5. Nir, T., Bruckstein, A.M., Kimmel, R.: Over-parameterized variational optical flow. International Journal of Computer Vision 76(2), 205–216 (2008)
6. Slesareva, N., Bruhn, A., Weickert, J.: Optic flow goes stereo: A variational method for estimating discontinuity-preserving dense disparity maps. In: Kropatsch, W., Sablatnig, R., Hanbury, A. (eds.) DAGM 2005. LNCS, vol. 3663, pp. 33–40. Springer, Heidelberg (2005)
7. Zimmer, H., Bruhn, A., Valgaerts, L., Breuß, M., Weickert, J., Rosenhahn, B., Seidel, H.P.: PDE-based anisotropic disparity-driven stereo vision. In: Deussen, O., Keim, D., Saupe, D. (eds.) Proceedings of Vision, Modeling, and Visualization (VMV) 2008, pp. 263–272. AKA, Heidelberg (2008)
8. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
9. Marquina, A., Osher, S.: Explicit algorithms for a new time dependent model based on level set motion for nonlinear deblurring and noise removal. SIAM Journal on Scientific Computing 22(2), 387–405 (2000)
10. LeVeque, R.J.: Numerical Methods for Conservation Laws. Birkhäuser, Basel (1992)
11. LeVeque, R.J.: Finite Volume Methods for Hyperbolic Problems. Cambridge University Press, Cambridge (2002)
12. Morton, K.W., Mayers, D.F.: Numerical Solution of Partial Differential Equations. Cambridge University Press, Cambridge (1994)
13. Thomas, L.H.: Elliptic problems in linear difference equations over a network. Technical report, Watson Scientific Computing Laboratory, Columbia University, New York (1949)
14. Saad, Y.: Iterative Methods for Sparse Linear Systems, 2nd edn. SIAM, Philadelphia (2003)
15. Scharstein, D., Szeliski, R.: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision 47(1-3), 7–42 (2002)
16. Baker, S., Roth, S., Scharstein, D., Black, M., Lewis, J., Szeliski, R.: A database and evaluation methodology for optical flow. In: Proc. 2007 IEEE International Conference on Computer Vision, Rio de Janeiro, Brazil, pp. 1–8. IEEE Computer Society Press, Los Alamitos (2007)
17. Barron, J.L., Fleet, D.J., Beauchemin, S.S.: Performance of optical flow techniques. International Journal of Computer Vision 12(1), 43–77 (1994)

From a Single Point to a Surface Patch by Growing Minimal Paths

Fethallah Benmansour and Laurent D. Cohen

CEREMADE, UMR CNRS 7534, Université Paris Dauphine, Place du Maréchal De Lattre De Tassigny, 75775 PARIS CEDEX 16, France
{benmansour,cohen}@ceremade.dauphine.fr

Abstract. We introduce a novel implicit approach for surface patch segmentation in 3D images starting from a single point. Since the boundary surface of an object is locally homeomorphic to a disc, we know that the boundary of a small neighboring domain intersects the surface of interest on a single closed curve. Similarly to active surfaces, we use a cost potential which penalizes image regions of low interest. First, using a front propagation approach from the source point chosen by the user, one can see that the closed curve corresponds to a valley line of the arrival time from the source point. Next, we use an implicit 3D segmentation method, which assumes that the object boundary contains two known constraining curves. In our case, the first curve is reduced to a point and the other one is automatically detected by our approach. A partial differential equation is introduced and its solution is used for segmentation. The zero level set of this solution contains the valley line and the source point as well as the set of minimal paths joining them. We present a fast implementation which has been successfully applied to 3D biomedical and synthetic images.

1

Introduction

In this paper we are interested in interactive segmentation of a surface in a 3D image: the user clicks a single point on the boundary of an object and obtains a patch of the desired surface around the given point. For this we use energy minimizing techniques and partial differential equations.

Energy minimization techniques have been applied to a broad variety of problems in image processing and computer vision. Since the original work on snakes [1], they have notably been used for boundary detection. An active contour model, or snake, is a curve that deforms its shape in order to minimize an energy combining an internal part, which smooths the curve, and an external part, which guides the curve toward particular image features. One of the main drawbacks of this approach is that it suffers from local minima 'traps'. Consequently, results strongly depend on the model initialization.

Since the publication of [1], much work has been done in order to free active models from the problem of local minima. Cohen and Kimmel [2] introduced an approach to globally minimize the geodesic active contour energy, provided that two endpoints of the curve are initially supplied by the user. This energy is of the form $\int_\gamma \tilde{P}$, where the incremental cost $\tilde{P}$ is chosen to take lower values on the contours of the image, and γ is a path joining the two points. The solution of this minimization problem is obtained through the computation of the minimal action map associated to a source point. The minimal action map can be regarded as the arrival times of a front propagating from the source point with velocity $(1/\tilde{P})$, and it satisfies the Eikonal equation. Therefore, we can efficiently compute the minimal action map with the Fast Marching Method, as will be detailed in section 2. However, their approach [2] cannot be directly extended to find the global minimum for an active surface in a 3D image. Nevertheless, it has been extended to surfaces in a 3D image by extracting a minimal surface lying on two given curves [3]. The advantage of this method is that it does not suffer from local minima problems, as would other active surface methods like [4, 5].

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 648–659, 2009. © Springer-Verlag Berlin Heidelberg 2009

In this work, we focus on a novel approach for 3D object segmentation. Our aim is to generate a local surface patch from a single point. The method presented herein can be seen as an extension of the Eulerian approach presented by Ardon et al. [3] for surface extraction from a couple of 'constraining' closed curves. In our case, however, one of the curves is reduced to a single point and the other one is unknown. Let $\tilde{P} : \Omega \to \mathbb{R}^{+}$ be a potential, where $\Omega \subset \mathbb{R}^3$, such that $\tilde{P}$ takes lower values on the surface of the object to be extracted, which is unknown and denoted S. Given a single point p on S and a neighborhood Σ ⊂ Ω of p, the required conditions are (see figure 1):
• the boundary ∂Σ is a connected closed surface;
• ∂Σ ∩ S is a simple closed curve;
• p ∈ S ∩ Σ.
The volume Σ might be a ball or any topologically equivalent volume. Our objective is to find the surface patch S ∩ Σ from the source point p and the potential P.

We proceed in two stages. First, we look for the boundary S ∩ ∂Σ of the surface patch and compute a good estimate Γ of it: running the Fast Marching algorithm (detailed in section 2) from the source point p, one can see that the valley line Γ of the arrival time on the boundary ∂Σ is a good approximation of S ∩ ∂Σ. A detailed definition of the valley line and the way it is extracted is presented in section 3. Next, one can represent the surface of interest as a dense network of minimal paths joining points of the valley line Γ to the source point p (section 4). The surface generated by this algorithm is completely composed of globally minimal paths. Indeed, by solving a stationary transport equation of the form ∇Ψ · ∇U = 0, where U is the action map (defined in section 2) and Ψ is the unknown, we show that any minimal path between the valley line Γ and the source point p is contained in the zero level set $\Psi^{-1}(\{0\})$ of Ψ. Important advantages of this approach are that it needs minimal interaction and that it is computationally efficient, as explained later. This approach can also be used as a building block for a complete segmentation from one single point (see section 5). Segmentation results on synthetic and medical images are presented in section 5. Finally, conclusions, advantages and drawbacks of our method, and perspectives follow in section 6.

Fig. 1. On the left, one can see the required conditions for the surface patch extraction. The point p must be initialized on the surface S in the volume Σ. ∂Σ, the boundary of Σ, is a closed surface and ∂Σ ∩ S is a simple closed curve. On the right, we represent the information one has in practice: the surface S is unknown but the potential P takes lower values along S and higher values elsewhere.

2

Background on Minimal Paths

Given a 3D image $I : \Omega \to \mathbb{R}^{+}$ and two points $p_1$ and $p_2$, the underlying idea introduced by Cohen and Kimmel [2] is to build a potential $P : \Omega \to \mathbb{R}^{*}_{+}$ which takes lower values near desired features of the image I. The choice of the potential P depends on the application. For example, one can define P as a decreasing function of $\|\nabla I\|$ to extract image edges by finding a curve that globally minimizes the energy functional $E : \mathcal{A}_{p_1,p_2} \to \mathbb{R}^{+}$,

\[
E(\gamma) = \int_\gamma \big( P(\gamma(s)) + w \big)\, ds = \int_\gamma \tilde{P}(\gamma(s))\, ds, \qquad (1)
\]

where $\mathcal{A}_{p_1,p_2}$ is the set of all paths connecting $p_1$ to $p_2$, s is the arc-length parameter, w > 0 is a regularization term and $\tilde{P} = (P + w)$. A curve connecting $p_1$ to $p_2$ that globally minimizes the energy (1) is a minimal path between $p_1$ and $p_2$, denoted $C_{p_1,p_2}$. The solution of this minimization problem is obtained through the computation of the minimal action map $U_1 : \Omega \to \mathbb{R}^{+}$ associated to $p_1$. The minimal action is the minimal energy integrated along a path between $p_1$ and any point x of the domain Ω:

\[
\forall x \in \Omega, \quad U_1(x) = \min_{\gamma \in \mathcal{A}_{p_1,x}} \left\{ \int_\gamma \tilde{P}(\gamma(s))\, ds \right\}. \qquad (2)
\]

The values of $U_1$ may be regarded as the arrival times of a front propagating from the source $p_1$ with velocity $(1/\tilde{P})$. $U_1$ satisfies the Eikonal equation

\[
\|\nabla U_1(x)\| = \tilde{P}(x) \ \text{ for } x \in \Omega, \quad \text{and } U_1(p_1) = 0. \qquad (3)
\]

Fig. 2. Minimal action map U from the source p using the potential P of ﬁgure 1 computed using the Fast Marching algorithm. Left: slices through the volume. Right: some equi-distant surfaces (level sets) of U.

The map $U_1$ has only one local minimum, the point $p_1$, and its flow lines satisfy the Euler-Lagrange equation of functional (1). Thus, the minimal path $C_{p_1,p_2}$ can be retrieved with a simple gradient descent on $U_1$ from $p_2$ to $p_1$, solving the following ordinary differential equation with standard numerical methods like Heun's or Runge-Kutta's:

\[
\frac{dC_{p_1,p_2}}{ds}(s) = -\nabla U_1\big(C_{p_1,p_2}(s)\big), \quad \text{and } C_{p_1,p_2}(0) = p_2. \qquad (4)
\]

2.1

Fast Marching Method

The Fast Marching Method (FMM) is a numerical method introduced by Sethian [6] and Tsitsiklis [7] for efficiently solving the isotropic Eikonal equation on a Cartesian grid. In equation (3), the values of U may be regarded as the arrival times of wavefronts propagating from the source points with velocity $(1/\tilde{P})$. The central idea behind the FMM is to visit grid points in an order consistent with the way wavefronts propagate. It leads to a single-pass algorithm for solving equation (3) and computing the minimal action map U. The FMM is a front propagation approach that computes the values of U in increasing order, and the structure of the algorithm is almost identical to Dijkstra's algorithm for computing shortest paths on graphs [8]. In the course of the algorithm, each grid point is tagged as either Alive (point for which U has been computed and frozen), Trial (point for which U has been estimated but not frozen) or Far (point for which U is unknown). The set of Trial points forms an interface between the set of grid points for which U has been frozen (the Alive points) and the set of other grid points (the Far points). This interface may be regarded as a set of fronts expanding from each source until every grid point has been reached. The key to the speed of the FMM is the use of a priority queue to quickly find the Trial point with the smallest U value. If Trial points are ordered in a min-heap data structure, the computational complexity of the FMM is $O(N \log_2 N)$, where N is the total number of grid points.


A way to estimate U for a grid point $x_n$ is presented here. We limit ourselves to the 3D case. Adopting standard notation, we denote by $U_{i,j,k}$ the value of U at the grid vertex (i, j, k) associated to the point $x_n$ with coordinates $(i h_x, j h_y, k h_z)$, where $h_x$, $h_y$ and $h_z$ are the grid spacings in the x, y and z directions. A discretized version of (3) is solved in order to compute $U_{i,j,k}$. For the Eikonal equation, classic finite difference schemes tend to overshoot and are unstable. Rouy and Tourin [9] showed that the correct viscosity solution for $U_{i,j,k}$ is given by the following first-order accurate scheme:

\[
\begin{aligned}
&\left( \frac{\max\{ U_{i,j,k} - U_{i-1,j,k},\ U_{i,j,k} - U_{i+1,j,k},\ 0 \}}{h_x} \right)^{2}
+ \left( \frac{\max\{ U_{i,j,k} - U_{i,j-1,k},\ U_{i,j,k} - U_{i,j+1,k},\ 0 \}}{h_y} \right)^{2} \\
&\quad + \left( \frac{\max\{ U_{i,j,k} - U_{i,j,k-1},\ U_{i,j,k} - U_{i,j,k+1},\ 0 \}}{h_z} \right)^{2}
= \big( \tilde{P}_{i,j,k} \big)^{2}. \qquad (5)
\end{aligned}
\]

This is an upwind scheme: the forward and backward differences are chosen to follow the direction of the flow of information.
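To make the interplay between the priority queue and the local update of equation (5) concrete, here is a compact sketch of the Fast Marching Method. It is a plain illustration under stated assumptions rather than the authors' implementation: the names `solve_local` and `fast_marching` are hypothetical, a single source is used, only Alive neighbours enter the update, and stale heap entries are skipped instead of being re-keyed.

```python
import heapq
import numpy as np


def solve_local(mins, spacings, cost):
    """Solve sum_d (max(u - a_d, 0) / h_d)^2 = cost^2 for u, as in equation (5)."""
    # Keep only directions that have a frozen neighbour, smallest value first.
    pairs = sorted((a, h) for a, h in zip(mins, spacings) if np.isfinite(a))
    u = np.inf
    for n in range(1, len(pairs) + 1):
        a = np.array([p[0] for p in pairs[:n]])
        w = np.array([1.0 / p[1] ** 2 for p in pairs[:n]])
        # Quadratic A u^2 + B u + C = 0 built from the first n directions.
        A, B, C = w.sum(), -2.0 * (w * a).sum(), (w * a * a).sum() - cost ** 2
        disc = B * B - 4.0 * A * C
        if disc < 0.0:
            break  # keep the solution obtained with fewer directions
        u = (-B + np.sqrt(disc)) / (2.0 * A)
        if n == len(pairs) or u <= pairs[n][0]:
            break  # the remaining neighbours are not upwind of u
    return u


def fast_marching(cost, source, spacing=(1.0, 1.0, 1.0)):
    """Minimal action map U for the potential `cost` from one source voxel."""
    U = np.full(cost.shape, np.inf)
    alive = np.zeros(cost.shape, dtype=bool)
    U[source] = 0.0
    heap = [(0.0, source)]  # Trial points ordered by their U value
    while heap:
        _, x = heapq.heappop(heap)
        if alive[x]:
            continue  # stale entry: x was already frozen with a smaller value
        alive[x] = True
        for d in range(cost.ndim):
            for step in (-1, 1):
                y = list(x)
                y[d] += step
                y = tuple(y)
                if not 0 <= y[d] < cost.shape[d] or alive[y]:
                    continue
                # Smallest Alive neighbour of y along each axis.
                mins = []
                for e in range(cost.ndim):
                    vals = []
                    for s in (-1, 1):
                        z = list(y)
                        z[e] += s
                        if 0 <= z[e] < cost.shape[e] and alive[tuple(z)]:
                            vals.append(U[tuple(z)])
                    mins.append(min(vals, default=np.inf))
                new = solve_local(mins, spacing, cost[y])
                if new < U[y]:
                    U[y] = new
                    heapq.heappush(heap, (new, y))
    return U
```

With a constant potential, `fast_marching(np.ones((5, 5, 5)), (2, 2, 2))` approximates the Euclidean distance to the centre voxel: the values are exact along the grid axes and slightly overestimated along diagonals, as expected from a first-order scheme.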

3

Valley Line Detection

In this section, we present a method to extract the intersection between the sub-domain boundary and the unknown surface of interest. We propose to use the minimal action map to extract the desired curve, since one can observe (without a formal proof) that it corresponds to a valley line of the minimal action map. Ridge and valley lines are concepts used in geomorphology and computer vision [10, 11]. According to Koenderink [12], valley lines are the locus of points on a surface at which the normal curvature assumes a local minimum in the principal direction associated with the largest, negative curvature. The main drawback of the existing criteria [10, 11] is that thresholding is needed. Hence, the detection is not precise enough, and needs more interaction for real noisy images. Moreover, these approaches are not adapted to our case, where we want to extract the valley line of a scalar function defined on a surface topologically equivalent to a sphere. Our approach is heuristic: it is based on the fact that the fast marching front propagates faster along the desired surface, so that the minimal action map takes lower values along the curve of intersection between the domain boundary and the surface.

Discrete definition of Σ and ∂Σ and Minimal action map on ∂Σ

In practice, we assume that the volume Σ is defined as a boolean array. Then, we can partition Σ into two subsets, int(Σ) and ∂Σ, its interior and its boundary. A voxel x ∈ Σ is in the interior of the volume if all its 6 neighbors are in Σ, and it is a point of the boundary ∂Σ if x ∈ Σ \ int(Σ). Then ∂Σ is also represented by a boolean array (see figure 3).
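The partition of Σ into interior and boundary described above can be sketched directly on a boolean array. This is a minimal sketch assuming NumPy; the function name is hypothetical, and voxels lying on the border of the array are treated as having a neighbor outside Σ, which is an assumption the text does not discuss.

```python
import numpy as np


def split_interior_boundary(sigma):
    """Split a boolean volume into int(Sigma) and its boundary (6-neighborhood)."""
    sigma = np.asarray(sigma, dtype=bool)
    # Pad with False so that voxels on the array border count as boundary.
    padded = np.pad(sigma, 1, constant_values=False)
    interior = sigma.copy()
    for axis in range(3):
        for shift in (-1, 1):
            neighbor = np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
            interior &= neighbor  # all 6 neighbors must belong to Sigma
    boundary = sigma & ~interior
    return interior, boundary
```

For a 3 × 3 × 3 cube of voxels, only the central voxel is interior and the remaining 26 voxels form the boundary.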

Fig. 3. Discrete representation of the volume Σ and its boundary ∂Σ. (a) The volume Σ is described by a boolean array. (b) Σ is partitioned into two subsets int(Σ) and ∂Σ such that ∂Σ is connected with respect to 26-connectivity.

Fig. 4. Minimal action map associated to the source point p and the potential of figure 1. (a) Cut views of the minimal action map U on the volume Σ. (b) View of U on ∂Σ, and its valley line Γ, the frontier of the surface patch. (c) Unfolded U|∂Σ and valley line; the different marked points on Γ correspond to local minima.

Let us denote by $U|_{\partial\Sigma} : \partial\Sigma \to \mathbb{R}^{*}_{+}$ the restriction of U to ∂Σ (see figure 4). The value U(x) for a point x in ∂Σ is the arrival time at x of the wavefront propagating from the source point p with velocity $1/\tilde{P}$. Since the potential $\tilde{P}$ takes lower values along the surface S, the front propagates faster along it. So, we can reasonably assume that the first point reached by the front on ∂Σ belongs to ∂Σ ∩ S. This point is easy to detect, because it is the global minimum of $U|_{\partial\Sigma}$; it is denoted $x_{min}$. More generally, each local minimum $x_m$ of $U|_{\partial\Sigma}$ has been reached by the front before all points in a small neighborhood of $x_m$. Since the wavefront propagates faster along S, one can expect that the curve ∂Σ ∩ S corresponds to valley lines of $U|_{\partial\Sigma}$. For valley line detection, our approach is simple and fast. Using the function $U|_{\partial\Sigma}$ and without parametrizing the surface ∂Σ, we find the frontier Γ of the surface patch S ∩ ∂Σ by looking for the cyclic sequences of valley lines of $U|_{\partial\Sigma}$ containing $x_{min}$.

Finding Valley Lines of $U|_{\partial\Sigma}$

As explained above, valley lines of $U|_{\partial\Sigma}$ contain the local minima $x_m$ as well as the saddle points. A robust way to link two local minima is to detect the saddle point between them and to perform a gradient descent from it towards each of the two minima. The difficulty here is that some local minima and saddle points of $U|_{\partial\Sigma}$ do not belong to the curve of interest. To avoid this, saddle points of $U|_{\partial\Sigma}$ are detected in increasing order. During this step, we store the information in a graph G such that the vertices of the graph correspond to local minima of $U|_{\partial\Sigma}$, and an edge corresponds to a pair of valley lines joining two local minima via a saddle point. The valley line detection algorithm stops when a cycle (in the sense of a simple closed path) is detected in the graph G. However, a closed curve detected this way tends to be short, linking only local minima that are close to each other. In practice, one adds two ad hoc constraints which make it possible to extract the border of the surface patch in a more robust manner. The algorithm stops as soon as the global minimum $x_{min}$ of $U|_{\partial\Sigma}$ belongs to the closed sequence, and the subset of int(Σ) defined by

\[
U|_{\mathrm{int}(\Sigma)}^{-1}\big(\, ]\max_{x'} U(x'),\ +\infty[ \,\big) = \{ x \in \mathrm{int}(\Sigma);\ U(x) > \max_{x'} U(x') \},
\]

where the maxima are taken over the points x' of the closed sequence, includes exactly two connected components for the 26-connectivity, which means that the sequence cuts the boundary ∂Σ into exactly two connected components (see figure 4).
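The core of the stopping rule, adding saddle-point edges in increasing order until the graph G contains a cycle, can be sketched with a union-find structure. This is only a sketch of that single ingredient: the function name and the input format (triples of saddle value and the two local minima it links) are assumptions, and the two ad hoc constraints described above (membership of $x_{min}$ and the connected component test) are deliberately left out.

```python
def first_cycle(saddles):
    """Add edges by increasing saddle value; return (accepted edges, closing edge).

    `saddles` holds hypothetical triples (saddle_value, minimum_a, minimum_b).
    The closing edge is None when no cycle is ever formed.
    """
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    accepted = []
    for value, a, b in sorted(saddles):
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            # a and b are already linked, so this edge closes a cycle in G.
            return accepted, (value, a, b)
        parent[root_a] = root_b
        accepted.append((value, a, b))
    return accepted, None


# Hypothetical saddle points linking four local minima a, b, c and d.
example_saddles = [(3.0, "a", "b"), (1.0, "b", "c"), (2.0, "c", "d"),
                   (4.0, "a", "c"), (5.0, "d", "a")]
example_edges, example_closing = first_cycle(example_saddles)
```

In this example the edge with saddle value 4.0 closes the cycle a, b, c; a fuller implementation would keep going when the detected cycle fails the two additional constraints.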

4

Dense Network of Minimal Paths: An Implicit Approach

Once the boundary curve Γ is obtained, it is easy to construct explicitly a network of minimal paths linking points of Γ to the source point p by simple gradient descents as in [13]. The network linking Γ to p is denoted

\[
N_p^{\Gamma} = \bigcup_{x_\Gamma \in \Gamma} C_{x_\Gamma, p}.
\]

Since this network may have holes, our objective is to find a smooth function Ψ : Σ → R such that the network $N_p^{\Gamma}$ is included in the zero level set of Ψ, i.e. $N_p^{\Gamma} \subset \Psi^{-1}(\{0\})$, where $\Psi^{-1}(\{0\}) = \{x \in \Sigma;\ \Psi(x) = 0\}$. A necessary condition on the function Ψ is

\[
\nabla\Psi(x) \cdot \nabla U(x) = 0, \qquad (6)
\]

for each point x belonging to a path $C_{x_\Gamma, p}$. Thus, the vector ∇Ψ is perpendicular to ∇U along the minimal paths of the network $N_p^{\Gamma}$. Extending the constraint given by equation (6) to the whole domain Σ gives a sufficient condition on Ψ. Moreover, adding a linear term in Ψ smooths the solution without changing the zero level set of Ψ. Hence, if Ψ is a smooth function satisfying the following conditions:

\[
\begin{aligned}
(C_1)\quad & \forall x \in \Sigma, \quad \nabla\Psi(x) \cdot \nabla U(x) - \alpha\, \Psi(x) = 0, \\
(C_2)\quad & \forall x \in \Gamma, \quad \Psi(x) = 0,
\end{aligned}
\qquad (7)
\]

where α ≥ 0, then $N_p^{\Gamma} \subset \Psi^{-1}(\{0\})$. Finally, $\Psi^{-1}(\{0\})$ is a dense network of minimal paths. Indeed, if Ψ satisfies conditions $(C_1)$ and $(C_2)$, then for all $x \in \Psi^{-1}(\{0\})$, the minimal path $C_{x,p}$ linking x to the source p is included in $\Psi^{-1}(\{0\})$. Detailed proofs of these results can be found in [3]. Using conditions $(C_1)$ and $(C_2)$, we look for a solution Ψ of the following Dirichlet problem:

\[
\begin{cases}
\nabla\Psi(x) \cdot \nabla U(x) - \alpha\, \Psi(x) = 0 & \text{if } x \in \mathrm{int}(\Sigma), \\
\Psi(x) = d|_{\partial\Sigma}(x) & \text{if } x \in \partial\Sigma,
\end{cases}
\qquad (8)
\]

where $d|_{\partial\Sigma}$ is a signed Euclidean distance to Γ on ∂Σ. This choice makes the function Ψ satisfy the second condition $(C_2)$. One can propose other boundary conditions satisfying $(C_2)$, but empirically we found that the signed distance is an adequate choice. Since Γ is a simple closed curve on ∂Σ and ∂Σ is topologically equivalent to a sphere, Γ partitions ∂Σ into two distinct open surfaces. That makes the sign choice for $d|_{\partial\Sigma}$ obvious. First, the unsigned distance from Γ on ∂Σ is calculated using the Fast Marching algorithm (this time using 26-connectivity); then different signs are attributed to the distance on each of the two connected components of ∂Σ \ Γ (see figure 5).

Fig. 5. Transport initialization. First, the distance map from the curve Γ is computed. Then using Γ , ∂Σ \ Γ is partitioned into exactly two parts. Finally, diﬀerent signs are attributed to d|∂Σ on each connected component.

Equation (8) is a stationary transport equation. The associated non-stationary PDE models the transport in time and space of material along the vector field ∇U. The stationary transport equation has been studied for surface segmentation [3], for computing tissue thickness [14] and for inpainting [15]. The stationary transport equation (8), like most PDEs whose characteristics intersect, is numerically hard to solve. Nevertheless, the direction in which information propagates is known (−∇U); thus one can design a single-pass algorithm based on an ordered sweeping of the grid points [3, 14, 15]. We propose to find the values of Ψ by exploring the points of Σ in decreasing order of |Ψ|. The algorithm, called Fast Transport, is similar to the Fast Marching algorithm: only the ordering and the local update scheme are different. The complexity of the Fast Transport algorithm is O(N log(N)). The information propagates from ∂Σ to the source point p following the direction −∇U. Thus, it is important to use an upwind scheme that takes the direction −∇U into account to approximate the derivatives of Ψ. Let us denote by $\Psi_{i,j,k}$ the value of Ψ at the point x with coordinates $(i h_x, j h_y, k h_z)$, by $\partial_d \Psi_{i,j,k}$ the derivative of Ψ along direction d (where d is the x, y or z direction) and by $\partial_d U_{i,j,k}$ the derivative of U along direction d. If $\partial_d U_{i,j,k} < 0$, the information is transported increasingly along the d direction. Thus along the x direction we have:

\[
\partial_x \Psi_{i,j,k} =
\begin{cases}
\dfrac{\Psi_{i+1,j,k} - \Psi_{i,j,k}}{h_x} & \text{if } \partial_x U_{i,j,k} \geq 0, \\[2ex]
\dfrac{\Psi_{i,j,k} - \Psi_{i-1,j,k}}{h_x} & \text{if } \partial_x U_{i,j,k} < 0.
\end{cases}
\]

Fig. 6. On the left and in the middle are respectively shown, on a cut view, the function Ψ and its sign on Σ. On the right is shown the extracted surface patch, i.e. the isosurface Ψ −1 ({0}), as well as the network of minimal paths NpΓ .
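The one-sided choice described above can be sketched for an arbitrary axis as follows. This is an illustration only: the function name and argument layout are assumptions, and the sign of the derivative of U along the chosen axis is passed in explicitly rather than computed.

```python
import numpy as np


def upwind_derivative(psi, dU, idx, axis, h):
    """One-sided derivative of psi at voxel idx along `axis`.

    dU is the derivative of U along the same axis at idx: a forward
    difference is used when dU >= 0 and a backward difference otherwise.
    """
    forward = list(idx)
    forward[axis] += 1
    backward = list(idx)
    backward[axis] -= 1
    if dU >= 0:
        return (psi[tuple(forward)] - psi[idx]) / h
    return (psi[idx] - psi[tuple(backward)]) / h


# Example volume with psi = i^2 along the first axis.
psi_example = (np.arange(5.0) ** 2)[:, None, None] * np.ones((5, 3, 3))
```

At the voxel (2, 1, 1) of the example volume, the forward difference gives 9 − 4 = 5 and the backward difference gives 4 − 1 = 3.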

The derivatives along the y and z directions are similar. The update scheme of the Fast Transport algorithm is obtained by injecting the previous equation into equation (8); see [3] for more details. Lastly, although this scheme is of relatively low precision and dissipative, it gives satisfactory results in our experiments with an acceptable convergence speed.

In our implementation, α is a parameter that can be fixed through the maximum discontinuity jump of Ψ around the source p. Indeed, by considering the minimal path $C_{x,p}$ linking a point x ∈ ∂Σ to p, parametrized on the interval J = [0, L(x)], where L(x) is the Euclidean length of the path, one can prove using equation (8) that

\[
\forall s \in J, \quad \Psi\big(C_{x,p}(s)\big) = d|_{\partial\Sigma}(x)\, e^{-\alpha s}.
\]

Thus the discontinuity jump occurs around the source point p and is as high as $|d|_{\partial\Sigma}(x)|\, e^{-\alpha L(x)}$. Fixing a maximum discontinuity jump ε and setting

\[
\alpha = \frac{\log\Big( \max_{x \in \partial\Sigma} |d|_{\partial\Sigma}(x)| \Big) - \log(\varepsilon)}{\min_{x \in \partial\Sigma} L(x)}
\]

guarantees that the discontinuity jump around the source point p is less than or equal to ε. Imposing this constraint requires the computation of the Euclidean length L of the minimal paths. This computation can easily be done during the Fast Marching propagation, as explained in [16, 17]. In figure 6, the function Ψ solving equation (8), the final segmentation result $\Psi^{-1}(\{0\})$ and the network of minimal paths are shown.
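The choice of α can be checked numerically. The sketch below assumes that the signed boundary distances and the path lengths are available as plain sequences; the function name and the sample values are hypothetical. Since every $|d|_{\partial\Sigma}(x)|$ is at most the maximum and every L(x) is at least the minimum, each jump $|d|_{\partial\Sigma}(x)|\, e^{-\alpha L(x)}$ is bounded by ε, provided the maximum distance is at least ε so that α ≥ 0.

```python
import math


def transport_alpha(boundary_distances, path_lengths, eps):
    """alpha = (log(max |d|) - log(eps)) / min L, following the formula above."""
    d_max = max(abs(d) for d in boundary_distances)
    l_min = min(path_lengths)
    return (math.log(d_max) - math.log(eps)) / l_min


# Hypothetical signed distances on the boundary and matching path lengths.
sample_d = [-3.0, 2.0, 5.0]
sample_L = [10.0, 12.0, 20.0]
sample_alpha = transport_alpha(sample_d, sample_L, eps=0.01)
sample_jumps = [abs(d) * math.exp(-sample_alpha * L)
                for d, L in zip(sample_d, sample_L)]
```

Here α = log(500)/10, and the largest jump, 3/500 = 0.006, stays below the prescribed ε = 0.01.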


Fig. 7. We select a sub-volume from a CT cardiac image. Then an edge detector potential, inversely proportional to the gradient magnitude ‖∇I‖ of the image, is shown. The Fast Marching algorithm is launched from the selected source point to compute the minimal action map U. Then the valley line of U is calculated. Finally, the information is transported from the initialized values on the sub-volume boundary using the Fast Transport algorithm, and the segmentation result for this surface patch is found by applying the marching cubes algorithm to the solution of the transport equation.

Fig. 8. On the left: segmentation of a synthetic torus. On the right: segmentation of a closed cell from an electron microscopy image. (a) Potential P taking lower values on the features of interest, on which a single source point is selected. The other points are found automatically using the approach presented in [17]. (b) A cut view of the visited domain Ω∗ showing the values of the minimal action map U. (c) A cut view of the domain Ω∗ showing the Voronoi partition. (d) The set of sources and the valley lines detected on each Voronoi cell. (e) A cut view of the domain Ω∗ showing the values of the function Ψ, solution of the transport equation (8). (f) Isosurface Ψ −1 ({0}) on which the detected keypoints, the valley lines and the geodesic meshing are superimposed. On the right: (g-h-i) Some slices of the original image with the final segmentation Ψ −1 ({0}) superimposed.

5

Experimental Results

Using our method, one can extract a surface patch from a single point (see figure 7). The main advantages of our method are that it is minimally interactive and fast. The important constraint is that the boundary of the selected sub-volume intersects the surface on a single closed curve. One can imagine that by considering a subdivision of the whole domain, and by selecting a few points in the sub-domains that contain the surface of interest, one can extract a full segmentation of the desired object. Recently, we presented [17] a new method for segmenting closed contours and surfaces. That work builds on a variant of the minimal path approach. First, an initial point on the desired contour is chosen by the user. Next, new keypoints are detected automatically using a front propagation approach. We assume that the desired object has a closed boundary. This a priori knowledge of the topology is used to devise a relevant criterion for stopping the keypoint detection and front propagation. The final domain visited by the front yields a band surrounding the object of interest. Using this method for 3D closed objects, we can extract a network of minimal paths from a 3D image, called Geodesic Meshing. This network alone, however, is not a sufficient segmentation. The Voronoi partition of the visited domain gives a good subdivision of it, and by applying the algorithm presented in this paper on each Voronoi cell, one can find a full segmentation of the object of interest (see figure 8).

6 Conclusion

In this paper we have proposed a new method to segment a surface patch from a single source point. Our method needs minimal interaction: a single source point. An important condition is that the boundary of the sub-volume that contains the surface patch of interest should intersect the surface on a single closed curve. By remarking that this closed curve corresponds to the valley line of the arrival time from the source point, we have proposed a heuristic to extract it automatically. Finally, we adapted an existing implicit surface segmentation method to find a complete surface that contains the valley line and the network of minimal paths linking this valley line to the source point. Our approach can be extended to segment a complete surface by subdividing the domain into several sub-domains containing the desired surface patches. Then, a few points can be enough to generate a coherent object boundary segmentation.

Acknowledgements We would like to thank Stéphane Bonneau for his contributions, and Professor Anthony J. Yezzi for interesting discussions. This work was partially supported by ANR grant SURF -NT05-2_45825.

References

1. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: active contour models. International Journal of Computer Vision 1, 321–331 (1988)
2. Cohen, L.D., Kimmel, R.: Global minimum for active contour models: a minimal path approach. International Journal of Computer Vision 24, 57–78 (1997)
3. Ardon, R., Cohen, L.D., Yezzi, A.: Fast surface segmentation guided by user input implicit extension of minimal paths. Journal of Mathematical Imaging and Vision 25, 289–305 (2006)
4. Caselles, V., Kimmel, R., Sapiro, G., Sbert, C.: Minimal surfaces based object segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 19, 394–398 (1997)
5. Cohen, L.D., Cohen, I.: Finite element methods for active contour models and balloons for 2D and 3D images. IEEE Transactions on Pattern Analysis and Machine Intelligence 15, 1131–1147 (1993)
6. Sethian, J.A.: Level Set Methods and Fast Marching Methods. Cambridge University Press, Cambridge (1999)
7. Tsitsiklis, J.N.: Efficient algorithms for globally optimal trajectories. IEEE Transactions on Automatic Control 40, 1528–1538 (1995)
8. Dijkstra, E.W.: A note on two problems in connection with graphs. Numerische Mathematik 1, 269–271 (1959)
9. Rouy, E., Tourin, A.: A viscosity solution approach to shape from shading. SIAM Journal on Numerical Analysis 29, 867–884 (1992)
10. López, A., Lloret, D.: On ridges and valleys. In: ICPR 2000: Proceedings of the International Conference on Pattern Recognition, Washington, DC, USA, p. 4059. IEEE Computer Society, Los Alamitos (2000)
11. Tang, C.K., Medioni, G.G.: Extremal feature extraction from 3-D vector and noisy scalar fields. In: IEEE Visualization 1998, October 1998, pp. 95–102 (1998)
12. Koenderink, J.: Solid Shape. MIT Press, Cambridge (1990)
13. Ardon, R., Cohen, L.D.: Fast constrained surface extraction by minimal paths. Int. J. Comput. Vision 69(1), 127–136 (2006)
14. Yezzi, A., Prince, J.L.: An Eulerian PDE Approach for Computing Tissue Thickness. IEEE Transactions on Medical Imaging 22, 1332–1339 (2003)
15. Bornemann, F., Marz, T.: Fast image inpainting based on coherence transport. JMIV 28(3), 259–278 (2007)
16. Cohen, L.D., Deschamps, T.: Segmentation of 3D tubular objects with adaptive front propagation and minimal tree extraction for 3D medical imaging. Computer Methods in Biomechanics and Biomedical Engineering 10, 289–305 (2007)
17. Benmansour, F., Cohen, L.D.: Fast object segmentation by growing minimal paths from a single point on 2D or 3D images. Journal of Mathematical Imaging and Vision 33(2), 209–221 (2009)

Optimization of Convex Shapes: An Approach to Crystal Shape Identification

Timo Eirola and Toni Lassila

Helsinki University of Technology, Institute of Mathematics, P.O. Box 1100, FI-02015 TKK, Finland [email protected]

Abstract. We consider a shape identification problem of growing crystals. The shape of the crystal is to be constructed from a single interferometer measurement. This is an ill-posed inverse problem. The forward problem of interferogram from shape is injective if we restrict the problem to convex shapes with known boundary. The problem is formulated as a shape optimization problem. Our aim is to solve this numerically using the gradient descent method. In the numerical computations of this paper we study the behavior of the approach in simplified cases. Using $H^1$-gradients (inner products) acts as a regularization method. Methods for enforcing the convexity of shapes are discussed.

1 Introduction

Shape optimization is a field of mathematical optimization concerned with finding the shape (bounded open set with Lipschitz boundary) that minimizes a given cost functional. Boundary variational techniques can be used to compute sensitivities of functionals with respect to shape. Comprehensive texts on the topic of shape analysis include [1] and [2]. We consider a shape identification problem of finding the shape of a growing $^3$He crystal that best fits the interferogram produced in a Fabry-Pérot interferometer. Based on physical principles it is assumed that the crystal shape is convex at all times. For an overview of the growth process of $^3$He crystals and the interferometer setup, see [3]. The restriction to convex shapes can be used as a simplification tool in shape optimization problems. In [4] the authors showed the existence of solutions to very generic shape optimization problems with the constraint that the shapes were convex. In our problem of determining shape from interferogram the operator solving the forward problem is generally not injective if the shapes are allowed to be nonconvex. We prove that if the convexity assumption holds and the height of the shape at the boundary of the computational domain is known then the shape identification problem does have a unique solution.
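The role of convexity in the uniqueness claim can be illustrated numerically. The following is a minimal sketch, not taken from the paper: it assumes a sinusoidal waveform with unit scaling and two hypothetical 1-d height profiles `u` and `v` that share the same boundary values. Both give the same interferogram, but only `u` is concave.

```python
import numpy as np

# Hypothetical 1-d height profiles on D = [0, 1] with equal boundary values.
x = np.linspace(0.0, 1.0, 201)
bump = x * (1.0 - x)

u = np.pi / 2 + bump   # concave profile
v = np.pi / 2 - bump   # convex (hence non-concave) profile

# Assumed forward model: a sinusoidal waveform applied pointwise to the height.
f_u = np.sin(u)
f_v = np.sin(v)

same_data = np.allclose(f_u, f_v)              # identical interferograms
same_boundary = np.isclose(u[0], v[0]) and np.isclose(u[-1], v[-1])

# Discrete concavity check: second differences must be nonpositive.
u_is_concave = bool(np.all(np.diff(u, 2) <= 1e-12))
v_is_concave = bool(np.all(np.diff(v, 2) <= 1e-12))
```

Here the two profiles differ everywhere in the interior yet cannot be distinguished from the data alone; restricting the search to concave height functions discards `v`.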

This work has been supported by the Academy of Finland (decision number 107290/04). We would like to thank Heikki Junes from the Low Temperature Laboratory at TKK for his input and introducing us to this problem.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 660–671, 2009. © Springer-Verlag Berlin Heidelberg 2009


It has been previously noted that the convexity constraint can be difficult to handle in numerical computations, especially in higher dimensions. It is known that pointwise conditions, such as curvature conditions, can fail to guarantee convexity for functions sampled at discrete points. For further discussion on this point, see [5]. Methods for optimization in the family of convex functions have been previously studied in [5, 6, 7, 8, 9]. In contrast to most of these approaches we do not write a strict convex constraint system, but instead use a penalization method that allows convexity to be temporarily broken when it is beneficial to the convergence of the iteration. The shape identification problem is solved using level set methods and gradient descent for shapes. Methods for convexification by evolution equations, such as the level set method, have been previously considered in [10, 11]. As is typical for ill-posed inverse problems, the presence of experimental noise in the measurements requires some type of regularization. We demonstrate that using $H^1$-gradients (inner products) for the shape gradients acts as a form of regularization.

2 Shapes and Shape Evolution

2.1 Representing Shapes

We first define the notation. The computational domain $D \subset \mathbb{R}^d$, $d \in \{1, 2\}$, is a convex bounded open set. We consider convex shapes (open sets with Lipschitz boundary) $\Omega \subset D \times \mathbb{R}_+$, which are supported by $D$ from below, that is to say

$$\langle n(x), e_3 \rangle < 0 \implies x_3 = 0, \qquad (1)$$

where $n$ is the outward normal vector field on the surface $\partial\Omega$. A convex shape $\Omega$ supported by $D$ can be represented in many ways. One is to give a Lipschitz function $\phi : D \times \mathbb{R}_+ \to \mathbb{R}$ such that

$$\Omega = \{x : \phi(x) < 0\}, \qquad \Omega^c = \{x : \phi(x) \ge 0\} \qquad (2)$$

and $|\nabla\phi|$ nonvanishing on $\partial\Omega$. Then $\phi$ is called an implicit function or a level set function for $\Omega$. An alternative representation of $\Omega$ is with a function $u : D \to \mathbb{R}_+$ defined as

$$u(x_1, x_2) = \sup\{x_3 \ge 0 : \phi(x_1, x_2, x_3) \le 0\}, \qquad (3)$$

where $\phi$ is an implicit function for $\Omega$. We call this the height function of $\Omega$. Note that if $\Omega$ is convex then $u$ is concave. Denote by $\mathcal{C} \subset H^1(D) \cap C(\overline{D})$ the set of concave functions on $D$ that are continuous on $\overline{D}$. We also define $\mathcal{C}_h \subset \mathcal{C}$ as the subset of concave functions that are equal to $h$ on the boundary $\partial D$ for a given function $h : \partial D \to \mathbb{R}_+$.

2.2 Level Set Methods

Consider an initial shape $\Omega_0$ and an evolution of its boundary $\partial\Omega_0$ under a smooth velocity field $v(x, t)$. When the shape $\Omega(t)$ at time $t$ is represented by an implicit function $\phi(\cdot, t)$, we have an Eulerian representation of the evolution of the implicit function in time

$$\phi_t(x, t) + v_n(x, t)\,|\nabla\phi(x, t)| = 0, \qquad (4)$$

where $v_n$ is the component of $v$ in the outward normal direction of $\partial\Omega$. This is called a level set equation. Level set methods are a generic framework of nonlinear hyperbolic-parabolic PDEs for implicit functions that can be used to model evolution of shapes under certain types of flows. For a generic introduction to level set methods, see [12]. For a survey of level set methods specifically in inverse problems, see [13].
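To make the level set equation (4) concrete, the following minimal sketch evolves a 1-d implicit function with an explicit first-order upwind (Godunov) scheme. It is an illustration under stated assumptions, not the scheme used in the paper: the function name `evolve_level_set`, the grid, the constant speed and the time step are all hypothetical choices.

```python
import numpy as np

def evolve_level_set(phi, vn, dx, dt, steps):
    """Explicit first-order upwind scheme for phi_t + vn * |phi_x| = 0 in 1-d."""
    phi = phi.copy()
    for _ in range(steps):
        # One-sided differences; edge values are replicated at the boundary.
        padded = np.pad(phi, 1, mode="edge")
        d_minus = (padded[1:-1] - padded[:-2]) / dx
        d_plus = (padded[2:] - padded[1:-1]) / dx
        # Godunov approximations of |phi_x| for outward (vn > 0)
        # and inward (vn < 0) motion.
        grad_pos = np.sqrt(np.maximum(np.maximum(d_minus, 0.0) ** 2,
                                      np.minimum(d_plus, 0.0) ** 2))
        grad_neg = np.sqrt(np.maximum(np.minimum(d_minus, 0.0) ** 2,
                                      np.maximum(d_plus, 0.0) ** 2))
        phi -= dt * (np.maximum(vn, 0.0) * grad_pos
                     + np.minimum(vn, 0.0) * grad_neg)
    return phi

# Hypothetical test: the interval (0.3, 0.7) grows with unit-free speed 0.1.
x = np.linspace(0.0, 1.0, 201)
dx = x[1] - x[0]
phi0 = np.abs(x - 0.5) - 0.2          # signed distance to the interval
vn = np.full_like(x, 0.1)             # constant outward normal speed
dt = 0.5 * dx / 0.1                   # CFL number 0.5
steps = int(round(1.0 / dt))          # integrate up to t = 1
phi1 = evolve_level_set(phi0, vn, dx, dt, steps)
inside = x[phi1 < 0]                  # grid points inside the evolved shape
```

With outward speed 0.1 over unit time, each end of the interval should move outward by about 0.1, so the shape `{phi1 < 0}` is expected to be close to (0.2, 0.8).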

3 Shape Optimization

3.1 Shape Derivatives

Let $J(\Omega) : \Sigma \to \mathbb{R}$ be a shape functional defined on some family of admissible shapes $\Sigma$. The derivative with respect to shape at $\Omega_0$ in the direction of the smooth velocity field $v$ is defined as the limit

$$dJ(\Omega_0; v) = \lim_{t \to 0^+} \frac{J(\Omega_t) - J(\Omega_0)}{t} \qquad (5)$$

when it exists. With some general assumptions (see Chap. 8 of [1] for details) this expression is bounded and linear with respect to $v$, and has support only on the boundary of $\Omega_0$:

$$dJ(\Omega_0; v) = \int_{\partial\Omega_0} D \cdot v_n \, dS. \qquad (6)$$

Using the shape derivative (6) the shape functional can be expanded as

$$J(\Omega_t) = J(\Omega_0) + t \cdot dJ(\Omega_0; v) + o(t). \qquad (7)$$

For a given Hilbert space $H(\partial\Omega_0)$ we look for the unique function $\nabla_S J \in H(\partial\Omega_0)$ such that

$$dJ(\Omega_0; v) = \langle \nabla_S J, v_n \rangle_H. \qquad (8)$$

Then $\nabla_S J$ is the shape gradient of $J$ with respect to the chosen inner product. If the velocity normal field $v_n$ is chosen to be the negative shape gradient $v_n = -\nabla_S J(\Omega_0)$ we have

$$J(\Omega_t) = J(\Omega_0) - t \cdot \|dJ(\Omega_0)\|^2_{H(\partial\Omega_0)} + o(t) < J(\Omega_0) \qquad (9)$$

for sufficiently small $t > 0$. This is the method of gradient descent for shape optimization. The negative gradient flow can be efficiently implemented with numerical level set methods.
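For completeness, the descent property (9) can be checked in two lines from (7) and (8). The sketch below only combines those equations; it assumes that the norm in (9) is identified with the norm of the shape gradient through the Riesz representation (8).

```latex
\begin{align*}
dJ(\Omega_0; v)\big|_{v_n = -\nabla_S J}
  &= \langle \nabla_S J,\, -\nabla_S J \rangle_H
   = -\|\nabla_S J\|_{H(\partial\Omega_0)}^2 , \\
J(\Omega_t)
  &= J(\Omega_0) + t\, dJ(\Omega_0; v) + o(t)
   = J(\Omega_0) - t\,\|\nabla_S J\|_{H(\partial\Omega_0)}^2 + o(t) .
\end{align*}
```

The strict inequality in (9) therefore holds whenever the shape gradient does not vanish, since the $o(t)$ term is dominated by the linear term for small $t > 0$.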

3.2 Convexity Constraints

To obtain level set methods that preserve the convexity of the shape we follow the basic idea of constrained gradient descent. Let $G(\Omega)$ be a shape constraint functional. We consider the constrained shape optimization problem

$$\min_{\Omega} J(\Omega) \quad \text{subject to } G(\Omega) = 0. \qquad (10)$$

Then if $J$ and $G$ are shape differentiable and there exist shape gradients $\nabla_S J$ and $\nabla_S G$, we let $\mu$ be a Lagrange multiplier and obtain the necessary conditions for a constrained minimum

$$\nabla_S J(\Omega) + \mu \nabla_S G(\Omega) = 0, \qquad (11)$$
$$G(\Omega) = 0. \qquad (12)$$

A $C^2$ shape in the plane is convex if the curvature of its boundary is nonnegative. In three dimensions a sufficient condition for convexity is that both principal curvatures of the surface must be nonnegative. Let $\Omega$ be a convex shape with the height function $u$. Then the minimum curvature $k_1$ of the surface is given by

$$k_1 = -\frac{u_{x_1 x_1} + u_{x_2 x_2} + \sqrt{(u_{x_1 x_1} - u_{x_2 x_2})^2 + (2 u_{x_1 x_2})^2}}{\sqrt{1 + u_{x_1}^2 + u_{x_2}^2}}. \qquad (13)$$

This follows from taking the smaller eigenvalue of the matrix representation of the second fundamental form. We extend $k_1$ to all of $D \times \mathbb{R}_+$ by setting

$$k_1(x_1, x_2, x_3) = k_1(x_1, x_2, u(x_1, x_2)) \quad \text{for all } x_3 \ge 0. \qquad (14)$$

Let $\Omega$ be supported by $D$ and define $\tilde{k} := k_1 \sqrt{1 + |\nabla u|^2}$. We use the constraint functional

$$G(\Omega) = \int_{\partial\Omega} u(x) \max\{0, -k_1(x)\} \, dS. \qquad (15)$$

This functional vanishes if and only if $k_1$ is everywhere nonnegative. The scaling by $u$ is shown to be useful by the following computation. We reformulate the functional in terms of a change of integrals from $\partial\Omega$ to $D$. Then:

$$G(\Omega) = \int_{\partial\Omega} \frac{u}{\sqrt{1 + |\nabla u|^2}} \max\{0, -\tilde{k}\} \, dS = \int_D u \max\{0, -\tilde{k}\} \, dx_1 \, dx_2 = \int_D \int_0^{u(x_1, x_2)} \max\{0, -\tilde{k}\} \, dx_3 \, dx_2 \, dx_1 = \int_\Omega \max\{0, -\tilde{k}\} \, dx.$$

According to Theorem 4.2 of Chap. 8 in [1] this functional has the $L^2$ shape gradient

$$\nabla_S G = \max\{0, -\tilde{k}\}. \qquad (16)$$

We obtain the penalty function formulation for the level set equation (4) with a convexity constraint

$$\phi_t + \left(v_n - \mu \max\{0, -\tilde{k}\}\right) |\nabla\phi| = 0, \qquad (17)$$

with a penalty term $\mu > 0$. This method is a version of the min/max curvature flows studied in [14], since

$$\phi_t + v_n |\nabla\phi| = \mu \min\{0, k_1\} |\nabla\phi|. \qquad (18)$$

Furthermore, the minimum curvature flow will convexify the initial shape, justifying our choice of the constraint functional (15). The following theorem was proven in [11]:

Theorem 1. In the case that $v_n \equiv 0$, the viscosity solution of the equation (17) converges towards the convex hull of the initial shape $\Omega_0$ as $t \to \infty$.
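As an illustration of the constraint functional (15) in its reformulated form over $D$, the sketch below evaluates the scaled minimum curvature of (13) by finite differences and sums the penalty over a uniform grid. It is a minimal sketch under stated assumptions: the function names, the grid and the two test surfaces are hypothetical, and a pointwise finite-difference check of this kind is exactly the sort of sampled condition that the text warns may not guarantee convexity.

```python
import numpy as np

def scaled_min_curvature(u, h):
    """k_1 * sqrt(1 + |grad u|^2) from (13) on a uniform grid with spacing h."""
    ux1 = np.gradient(u, h, axis=0, edge_order=2)
    ux2 = np.gradient(u, h, axis=1, edge_order=2)
    u11 = np.gradient(ux1, h, axis=0, edge_order=2)
    u22 = np.gradient(ux2, h, axis=1, edge_order=2)
    u12 = np.gradient(ux1, h, axis=1, edge_order=2)
    return -(u11 + u22 + np.sqrt((u11 - u22) ** 2 + (2.0 * u12) ** 2))

def convexity_penalty(u, h):
    """Riemann-sum approximation of the integral over D of u * max(0, -k)."""
    integrand = u * np.maximum(0.0, -scaled_min_curvature(u, h))
    return integrand.sum() * h * h

# Hypothetical test surfaces on a square grid.
s = np.linspace(-0.5, 0.5, 51)
h = s[1] - s[0]
X1, X2 = np.meshgrid(s, s, indexing="ij")
G_concave = convexity_penalty(1.0 - X1**2 - X2**2, h)   # concave height function
G_saddle = convexity_penalty(1.0 + X1**2 - X2**2, h)    # saddle, not concave
```

The penalty vanishes for the concave paraboloid and is strictly positive for the saddle, which is the behavior required of $G$ in (10)-(12).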

4 A Problem in $^3$He Crystal Imaging

4.1 Fabry-Pérot Interferometer Measurement of a Crystal

The formation of faceted crystals in low-temperature $^3$He has been the subject of study in the low temperature physics community. It is known that at temperatures below 200 mK smooth facets appear that correspond to orientations of the lattice planes. The problem of predicting which facets appear at which temperature is still open. It is known that as the temperature is increased past the so-called roughening limit the facets become rounded out and no longer appear. The theoretical roughening limit is much higher than what has been observed in practical experiments. We consider an experimental setup where liquid $^3$He at a temperature below 200 mK is placed between the two plates of a Fabry-Pérot interferometer. Overpressure is then exerted to allow the creation of crystals to occur. As light passes through the crystals, a diffraction pattern is observed on a CCD imaging array. By relating the intensity of the interferogram to the phase delay through the crystal at each point we can determine the shape of the crystal and the orientation of all the facets.

4.2 Convexity of Crystals and the Growth Process

The growth of crystals is governed by three principal forces: the external work done to the system by the driving overpressure, the surface tension between the liquid and solid Helium, and gravity. When the crystal growth process is suﬃciently slow we can assume that at each measurement the crystal has achieved thermal equilibrium. The crystal shape is then determined by minimizing a surface energy. This leads to an anisotropic mean curvature ﬂow that models the growth process of crystals [15]. It is known that such ﬂows preserve convexity of the shapes [16]. We therefore assume that, apart from small irregularities, the thermal equilibrium shape is also convex. This assumption has been veriﬁed in experimental measurements.

4.3 Inverse Problem of Shape from Interferogram

Let $D = [0, 1]^2$ be the domain of the interferogram and $f : D \to \mathbb{R}$ a function that gives the intensity of the interference pattern at each point on the CCD. The physical parameters are $\Delta n_{sl}$, the difference between the refractive indices of the solid and liquid $^3$He, $\lambda$, the interferometer laser wavelength, and $a(x)$, the amplitude. The intensity of the interference pattern at each point is given approximately by

$$F(u)(x) = a(x)\, \varphi\!\left(\frac{\Delta n_{sl}}{\lambda}\, u(x)\right) = f(x), \qquad (19)$$

where $\varphi : \mathbb{R} \to [-1, 1]$ is a continuously differentiable piecewise strictly monotone waveform function. Note that this definition forbids square or sawtooth type waveforms. To simplify things we assume the laser amplitude to be almost constant and known, $a(x) \approx a$. The inverse problem to be solved is: given an interferogram $f \in L^2(D)$ of measured intensities (with noise), deduce the shape of the crystal $\Omega$. This problem can be posed as a mathematical shape optimization problem. Let $\Omega$ be a convex trial shape supported by $D$. Denote the bottom part of the surface of the shape as $\Gamma_b := \partial\Omega \cap D$. We consider the shape functional with the $L^2$-norm

$$J(\Omega) = \frac{1}{2} \int_{\partial\Omega \setminus \Gamma_b} |\varphi(x_3) - Sf(x_1, x_2)|^2 \, dS, \qquad (20)$$

where $\varphi$ is a continuously differentiable and piecewise strictly monotone function and $S : L^2(D) \to H^1(D)$ is a smoothing operator. The corresponding mathematical shape optimization problem is then

$$\min_{\Omega \in \Sigma^{\mathrm{convex}}_{\Gamma_b}} J(\Omega), \qquad (21)$$

where $\Sigma^{\mathrm{convex}}_{\Gamma_b}$ is a family of convex shapes with $\Gamma_b$ fixed. The choice of this family will be discussed later. We have the following existence theorem from [4]:

Theorem 2. Let $f$ be such that $Sf$ is continuous. Then the shape optimization problem (21) has at least one solution.

4.4 Is the Inverse Problem Uniquely Solvable?

It is possible to construct examples showing that, in the absence of a convexity constraint, the inverse problem of finding the shape $\Omega$ from its interference pattern $f$ is not uniquely solvable even when we set a perimeter constraint such as requiring $\Gamma_b$ to be fixed. But if we require convexity and fix $u$ on the boundary $\partial\Gamma_b$, we have the following result:

Theorem 3. Let $D \subset \mathbb{R}^d$ be a bounded convex open set and $\Gamma$ its boundary. Fix a function $h \in C(\Gamma)$ on the boundary. Let $\mathcal{C}_h$ be the family of concave functions $u : D \to \mathbb{R}$ in $C(\overline{D})$ such that $u|_\Gamma = h$. Let the operator $F : H^1(D) \to H^1(D)$ be defined as

$$(Fu)(x) = \varphi(u(x)), \qquad (22)$$

where $\varphi$ is a continuously differentiable and piecewise strictly monotone function. Then the restriction of $F$ to $\mathcal{C}_h$ is injective.

Proof. Case $d = 1$. Let $u, v : [a, b] \to \mathbb{R}$ be distinct concave functions such that $u(a) = v(a)$, $u(b) = v(b)$, and that $\varphi(u) \equiv \varphi(v)$. Let $(\xi, \eta) \subset [a, b]$ be any open interval where $u \neq v$ but $u(\xi) = v(\xi)$ and $u(\eta) = v(\eta)$. Without loss of generality we assume $u > v$ on $(\xi, \eta)$. Since $\varphi$ is continuously differentiable and $\varphi(u(\xi)) = \varphi(v(\xi))$, from the inverse function theorem it follows that $\varphi'(u(\xi)) = 0$. From the assumption that $\varphi$ is piecewise strictly monotone it follows that $\varphi'$ has only isolated zeros. Thus the local behavior of $\varphi$ near $u(\xi)$ can be of only two types, a) or b), as shown in Fig. 1.

Fig. 1. The different kinds of possible local behavior of the function $\varphi(u)$ near a bifurcation point $\xi$ (panels a) and b); horizontal axis $u, v$ with the point $u(\xi) = v(\xi)$ marked, vertical axis $\varphi$)

Since $u$ is concave there exists an interval $(\xi, \xi + \varepsilon)$ where it is either constant, increasing, or decreasing:

1. If $u$ was constant in some interval $(\xi, \xi + \varepsilon)$ then so would be $\varphi(v)$. But because $\varphi'$ cannot vanish in any neighborhood of $u(\xi)$ this would mean that $v$ would also be constant in $(\xi, \xi + \varepsilon)$, a contradiction. So neither $u$ nor $v$ can be locally constant past the bifurcation point $\xi$.
2. Assume that $u$ is increasing in some interval $(\xi, \xi + \varepsilon)$ and the local behavior of $\varphi$ is like in a). Then $v$ must be decreasing in $(\xi, \xi + \varepsilon)$.
3. Assume that $u$ is decreasing in some interval $(\xi, \xi + \varepsilon)$ and the local behavior of $\varphi$ is like in b). But since $u > v$, case b) is impossible.

Thus immediately after the bifurcation point $\xi$ we must have $u$ increasing and $v$ decreasing. Using the same argument at $\eta$ we get that $u$ must be decreasing and $v$ increasing in some interval $(\eta - \varepsilon, \eta)$. But $v$ is concave and cannot be first decreasing and later increasing, a contradiction.

Case $d \ge 2$. For every pair of points $x, y \in \Gamma$ we take the line segment $L$ connecting $x$ to $y$ and look at the restrictions $u|_L$, $v|_L$, which are concave functions of one variable. Since $u, v$ coincide on all such segments $L$ they are equal everywhere.

We remark that when the measurement is noisy we can lose the uniqueness of the solution. This is due to the fact that the range of the forward operator $F$ is nonconvex, and thus if the measurement $f$ lies outside the range of $F$ the minimization problem (21) can have multiple solutions.

4.5 Formulation for the $H^1$-Variation of a Shape Functional

To solve optimization problem (21) using the gradient descent method we must find the shape gradient of the functional given by (8). While the gradient could be computed only in the $L^2$ inner product, we prefer the $H^1$ inner product since the resulting gradients are smoother and hopefully also lead to a numerically more robust algorithm. The need for regularizing the shape variations is well established in the literature, but the relation with regularization of ill-posed inverse problems perhaps less so. The effect of different inner products on the convergence of the gradient descent iteration was studied in more detail in [17].

Lemma 1. Consider the shape functional for $d$-dimensional convex shapes $\Omega \subset D \times \mathbb{R}_+$:

$$J(\Omega) = \int_{\partial\Omega \setminus \Gamma_b} g(x, n) \, dS, \qquad (23)$$

where $g(x, n)$ is $H^1$ with respect to both arguments. Then $J$ is shape differentiable and the shape derivative $dJ(\Omega; v)$ with respect to a normal variation $v_n \in H_0^1(D)$ is given by

$$dJ(\Omega; v) = \int_D \left[ -\nabla_n g \cdot \nabla v_n + (\nabla_x g \cdot n + \kappa g)\, v_n \right] |F| \, d\xi, \qquad (24)$$

for all $v_n \in H_0^1(D)$, where $|F| := \sqrt{1 + |\nabla u|^2}$ is the change of integrals term given by $u$, the height function of the convex shape.

Proof. The details are given for example in [18]. Here we reproduce only the general procedure. Let $\Omega$ be given and $\phi$ its implicit function. Then the coarea formula [19] gives

$$J(\Omega) = \int_{\partial\Omega \setminus \Gamma_b} g(x, n) \, dS = \int_{\mathbb{R}^d} g\!\left(x, \frac{\nabla\phi}{|\nabla\phi|}\right) |\nabla\phi| \, \delta(\phi) \, \mathbb{1}_{\Gamma_b^c} \, dx.$$

The variation can now be performed in terms of $\phi$. Let $v_n = -\psi/|\nabla\phi|$ be an extension velocity field to the entire $\mathbb{R}^d$ such that $\psi|_{\Gamma_b} \equiv 0$, i.e. the base remains fixed. The Gâteaux derivative is, after some computations, given by

$$dJ(\Omega; v) = \frac{d}{d\tau} J(\phi + \tau\psi) = -\int_{\mathbb{R}^d} \frac{\psi}{|\nabla\phi|} \, \nabla \cdot \left( \nabla_n g + g \frac{\nabla\phi}{|\nabla\phi|} \right) |\nabla\phi| \, \delta(\phi) \, dx.$$

Integration by parts gives

$$\int_{\partial\Omega} \nabla \cdot (\nabla_n g) \, v_n \, dS = -\int_{\partial\Omega} \nabla_n g \cdot \nabla v_n \, dS$$

and the result follows by using the coarea formula in the other direction and noting that $n = \nabla\phi/|\nabla\phi|$ and $\kappa = \nabla \cdot n$ is the mean curvature of $\partial\Omega$.

We can thus compute the negative shape gradient of $J$ with respect to the $H^1$ inner product as the solution $w \in H_0^1(D)$ of the elliptic equation

$$\int_D (\nabla v_n \cdot \nabla w + v_n w) \, d\xi + \int_D (\alpha v_n + \beta \cdot \nabla v_n) \, d\xi = 0 \quad \text{for all } v_n \in H_0^1(D), \qquad (25)$$

where $\alpha = |F| (\nabla_x g \cdot n + \kappa g)$ and $\beta = -|F| \nabla_n g$ as in Lemma 1, plus homogeneous Dirichlet boundary conditions. For the convex constrained iteration it is also beneficial to use the $H^1$-gradient of the constraint functional (15), which can be obtained by the same procedure from (16).
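The smoothing effect of the $H^1$ inner product can be seen in a small 1-d example. The sketch below is an illustration only and makes simplifying assumptions that are not in the paper: it takes $\beta = 0$ in (25), so that the strong form reduces to $-w'' + w = -\alpha$ with $w = 0$ at both ends, and it discretizes this with standard second-order finite differences. The function name `negative_h1_gradient` and the test data are hypothetical.

```python
import numpy as np

def negative_h1_gradient(alpha, dx):
    """Solve -w'' + w = -alpha on a uniform 1-d grid with w = 0 at both ends.

    This is the strong form of (25) in the simplified case beta = 0.
    """
    n = alpha.size
    m = n - 2                                   # number of interior unknowns
    main = (2.0 / dx**2 + 1.0) * np.ones(m)
    off = (-1.0 / dx**2) * np.ones(m - 1)
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    w = np.zeros(n)
    w[1:-1] = np.linalg.solve(A, -alpha[1:-1])
    return w

# Hypothetical L2-type data: a smooth mode plus a strongly oscillating mode.
x = np.linspace(0.0, 1.0, 201)
dx = x[1] - x[0]
alpha = np.sin(np.pi * x) + 0.5 * np.sin(40 * np.pi * x)
w = negative_h1_gradient(alpha, dx)

# Closed-form solution of the continuous problem, mode by mode.
w_exact = -(np.sin(np.pi * x) / (1 + np.pi**2)
            + 0.5 * np.sin(40 * np.pi * x) / (1 + (40 * np.pi) ** 2))
```

Each sine mode $\sin(k\pi x)$ is scaled by $1/(1 + k^2\pi^2)$, so the oscillating component is damped by several orders of magnitude relative to the smooth one. This is consistent with the observation that oscillatory $L^2$ boundary variations become usable after switching to $H^1$-gradients.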

5 Numerical Experiments

5.1 Methodology

As a first approach to optimization of convex shapes we limit the numerical experiments to 1-d and choose $D = [0, 1]$. The questions to be answered are:

– Does the convexity constraint penalty term improve the quality of the recovered shapes?
– We would like to estimate the tensor of anisotropy of the mean curvature flow that drives the crystal formation process. Can reasonable estimates for the curvatures be obtained from the recovered shapes?

The quality of the recovered shapes was studied with two different crystal profiles (shown in Fig. 2). Case A represents a faceted crystal, while Case B is a smooth profile. For the forward model we used a sinusoidal waveform, $f(x) = \sin(\gamma u(x))$. To measure the error of the recovered shapes we generated a testing sample of 100 noisy realizations of the data $f$, each with 10% standard deviation, and took the mean $L^2$-error over this sample set. At each descent step the shape derivative (24) was computed. The $H^1$-gradient was solved from equation (25). The normal velocity field was extended to the entire computational domain and the resulting level set evolution was solved using the Level set method toolbox [20] for Matlab. The gradient descent step size was chosen according to the Armijo rule [21] to obtain decreasing steps in the functional (20). The iteration was stopped when the recovered height function $u$ changed less than 0.1% in the $L^2$-norm during the previous step. For the convex constrained iteration (17) we used a penalty parameter value of $\mu = 10^5$.
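The data generation and error measure described above can be sketched as follows. This is a minimal illustration, not the authors' Matlab code: the profile `u_true`, the value of `gamma`, the seed, and the reading of "10% standard deviation" as additive Gaussian noise with standard deviation 0.1 are all assumptions made for the example.

```python
import numpy as np

def forward(u, gamma):
    """Forward model with a sinusoidal waveform: f(x) = sin(gamma * u(x))."""
    return np.sin(gamma * u)

def noisy_realizations(f, n_samples, noise_std, seed=0):
    """Noisy copies of the data f (additive Gaussian noise is an assumption)."""
    rng = np.random.default_rng(seed)
    return f + noise_std * rng.standard_normal((n_samples, f.size))

def relative_l2_error(u_rec, u_true):
    """Relative discrete L2 error between a recovered and the true profile."""
    return np.linalg.norm(u_rec - u_true) / np.linalg.norm(u_true)

# Hypothetical smooth concave profile and waveform scaling.
x = np.linspace(0.0, 1.0, 201)
u_true = 3.0e-4 * 4.0 * x * (1.0 - x)
gamma = 5.0e4
f = forward(u_true, gamma)
samples = noisy_realizations(f, n_samples=100, noise_std=0.1)
```

In an experiment of the kind described, each row of `samples` would be fed to the reconstruction, and `relative_l2_error` would be averaged over the 100 recovered profiles.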

Fig. 2. Left: True crystal shape (solid line) and initial guess (dashed line) for the test Case A. Right: Same for Case B.

5.2 Choosing the Smoothing Operator S

To construct the smoothing operator $S$ in (20) we considered linear diffusion operators of the form

$$(Sf)(x_i) = (I - \delta D_{xx})^{-K} f(x_i), \qquad K \in \mathbb{N}, \qquad (26)$$

where $D_{xx}$ is an operator giving the discrete approximation of the second derivative of $f$ at $x_i$. The simplest choice is the symmetric difference approximation for the second derivative (in the 1-d case)

$$D_{xx} f(x_i) = \frac{f(x_{i+1}) - 2 f(x_i) + f(x_{i-1})}{\Delta x^2}. \qquad (27)$$
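The operator (26) with the stencil (27) can be sketched in a few lines. The following is an illustration under stated assumptions rather than the authors' implementation: the treatment of the end points (reflecting rows, chosen so that constants are preserved) is not specified in the text, and the values of `delta` and `K` used in the example call are illustrative only.

```python
import numpy as np

def smooth(f, dx, delta, K):
    """Apply (I - delta * Dxx)^(-K) to the samples f on a uniform 1-d grid."""
    n = f.size
    idx = np.arange(n)
    Dxx = np.zeros((n, n))
    Dxx[idx, idx] = -2.0
    Dxx[idx[:-1], idx[:-1] + 1] = 1.0
    Dxx[idx[1:], idx[1:] - 1] = 1.0
    # Reflecting end rows (an assumption): row sums vanish, constants are kept.
    Dxx[0, 0] = -1.0
    Dxx[-1, -1] = -1.0
    Dxx /= dx**2
    A = np.eye(n) - delta * Dxx
    out = np.asarray(f, dtype=float).copy()
    for _ in range(K):
        out = np.linalg.solve(A, out)     # one implicit diffusion step
    return out

# Hypothetical noisy data on D = [0, 1].
x = np.linspace(0.0, 1.0, 101)
dx = x[1] - x[0]
rng = np.random.default_rng(1)
noisy = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)
smoothed = smooth(noisy, dx, delta=1.0e-5, K=10)
```

Each application of the inverse is an implicit diffusion step, so the operator is stable for any positive `delta`; larger `delta` or `K` removes more of the high-frequency content, including genuine corners of the data.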

This difference approximation tends to smooth out especially the corners of $f$, so that for faceted profiles we should choose $K$ moderately small. We chose $\delta = 0.01$ and considered the cases $K = 0$ (no smoothing) and $K = 100$ (with smoothing).

5.3 Comparison of Convergence with and without the Convexity Constraint

The first observation we made was that the $L^2$-gradient descent iteration in general does not work at all. The computed boundary variations were too oscillatory. After an $H^1$-gradient was implemented the regularization was enough to provide local convergence from an initial guess that had 15%-20% relative $L^2$-error. In Table 1 we list the accuracy of the obtained shapes by the relative $L^2$-error from the true crystal shape. We note that in both cases the recovered solutions were roughly within 3% of relative error. This remained the case even with convexity constraints and smoothing of the data. The sharp corner of Case A also produced more error than the smooth profile of Case B.

5.4 Estimating the Curvature(s) of the Crystal Surface

One way of evaluating the quality of the recovered crystal shapes is to see if useful estimates for the curvature(s) of the crystal surface can be obtained. We ran both the unconstrained and convex constrained iterations for Case A. We also tested the effect of increasing $K$ in the smoothing operator (26). The obtained curvatures are plotted in Fig. 3. In this case the curvature should be zero almost everywhere with a singularity at one point. None of the curvature estimates are free from numerical artifacts. The convex constrained solution gives curvatures that are nearly nonnegative everywhere. The effect of added smoothing is to dampen the oscillations of the recovered curvatures.

Table 1. Relative $L^2$-error from the true profile $u$ obtained by the unconstrained ($\mu = 0$) and convex constrained ($\mu = 10^5$) iterations with and without smoothing

Case    No smoothing    No smoothing     With smoothing    With smoothing
        μ = 0           μ = 10^5         μ = 0             μ = 10^5
A       1.71 %          2.61 %           1.98 %            2.63 %
B       0.47 %          0.51 %           0.47 %            0.61 %

Fig. 3. Estimated curvatures for the Case A obtained with the unconstrained and convex constrained iterations, with and without smoothing of the data. The true curvature is denoted by a dashed line. (Panels: $\mu = 0$, no smoothing; $\mu = 0$, with smoothing; $\mu = 10^5$, no smoothing; $\mu = 10^5$, with smoothing.)
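A curvature estimate of the kind plotted in Fig. 3 can be obtained from a sampled height function by finite differences. The paper does not state which discrete formula was used, so the sketch below is an assumption: it uses the standard curvature of a graph in 1-d, signed so that concave profiles have nonnegative curvature, and checks it on a hypothetical circular arc.

```python
import numpy as np

def graph_curvature(u, dx):
    """Curvature of the graph of u in 1-d, with k >= 0 for concave profiles."""
    du = np.gradient(u, dx, edge_order=2)
    d2u = np.gradient(du, dx, edge_order=2)
    return -d2u / (1.0 + du**2) ** 1.5

# Hypothetical test profile: circular arc of radius 1, whose curvature is 1.
x = np.linspace(0.0, 1.0, 201)
dx = x[1] - x[0]
u_arc = np.sqrt(1.0 - (x - 0.5) ** 2)
k = graph_curvature(u_arc, dx)
```

Because the estimate involves second differences, noise or small oscillations in the recovered profile are strongly amplified, which is one plausible reason why smoothing the data dampens the oscillations of the recovered curvatures.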

6 Conclusions

The inverse problem of crystal shape identification from a single interferogram is uniquely solvable if the shape is required to be convex and we have boundary data available. Numerical level set methods can be used to solve such problems with the gradient descent method. We added a penalty term to enforce convexity of the shapes. By choosing $H^1$ shape gradients we introduced regularization to the problem. This allowed recovery of solutions of the otherwise ill-posed problem. We demonstrated that local convergence is obtained even when relatively large amounts of noise are present in the interferogram. The convex penalty term improved the quality of the recovered surface curvatures.


References

1. Delfour, M., Zolésio, J.P.: Shapes and geometries - analysis, differential calculus, and optimization. SIAM, Philadelphia (2001)
2. Sokolowski, J., Zolésio, J.P.: Introduction to shape optimization: shape sensitivity analysis. Springer, Heidelberg (2003)
3. Tsepelin, V., Alles, H., Babkin, A., Jochemsen, R., Parshin, A., Todoshchenko, I., Tvalashvili, G.: Morphology and growth kinetics of 3He crystals below 1 mK. J. Low Temp. Phys. 129(5-6), 489–530 (2002)
4. Buttazzo, G., Guasoni, P.: Shape optimization problems over classes of convex domains. J. Convex Anal. 4(2), 343–351 (1997)
5. Aguilera, N., Morin, P.: Approximating optimization problems over convex functions. Numer. Math. 111(1), 1–34 (2008)
6. Carlier, G., Lachand-Robert, T.: Convex bodies of optimal shape. J. Convex Anal. 10, 265–273 (2003)
7. Carlier, G., Lachand-Robert, T., Maury, B.: A numerical approach to variational problems subject to convexity constraint. Numer. Math. 88, 299–318 (2001)
8. Carlier, G., Lachand-Robert, T., Maury, B.: H1-projection into set of convex functions: A saddle point formulation. In: ESAIM: Proc., vol. 10, pp. 277–290 (2001)
9. Lachand-Robert, T., Oudet, É.: Minimizing within convex bodies using a convex hull method. SIAM J. Optim. 16(2), 368–379 (2005)
10. Hinterberger, W., Scherzer, O.: Variational methods on the space of functions of bounded Hessian for convexification and denoising. Comput. 76, 109–133 (2006)
11. Vese, L.: A method to convexify functions via curve evolution. Commun. Partial Differential Equations 24(9), 1573–1591 (1999)
12. Osher, S., Fedkiw, R.: Level set methods and dynamic implicit surfaces. Applied Mathematical Sciences, vol. 153. Springer, Heidelberg (2002)
13. Burger, M., Osher, S.: A survey on level set methods for inverse problems and optimal design. European Journal of Applied Mathematics 16(2), 263–301 (2005)
14. Malladi, R., Sethian, J.: Image processing: flows under min/max curvature and mean curvature. Graph. Models Image Process. 58(2), 127–141 (1996)
15. Wettlaufer, J., Jackson, M., Elbaum, M.: A geometric model for anisotropic crystal growth. J. Phys. A 27, 5957–5967 (1994)
16. Bellettini, G., Caselles, V., Chambolle, A., Novaga, M.: Crystalline mean curvature flow of convex sets. Arch. Ration. Mech. Anal. 179, 109–152 (2005)
17. Burger, M.: A framework for the construction of level set methods for shape optimization and reconstruction. Interfaces Free Bound. 5, 301–329 (2003)
18. Solem, J.: Variational problems and level set methods in computer vision - theory and applications. PhD thesis, Lund University (2006)
19. Federer, H.: Geometric measure theory. Springer, New York (1969)
20. Mitchell, I.: The flexible, extensible and efficient toolbox of level set methods. J. Sci. Comput. (2007) (online first)
21. Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific J. Math. 16(1) (1966)

An Implicit Method for Interpolating Two Digital Closed Curves on Parallel Planes

Nikolaos Gabrielides and Laurent Cohen

Centre de Recherche en Mathématique de la Décision, Université Paris IX, Dauphine, Place du Maréchal de Lattre de Tassigny, 75775 Paris Cedex 16, France [email protected], [email protected]

Abstract. Ardon et al. [2] presented an implicit method for surface segmentation in 3D images. The boundary of the surface is assumed to be constrained by two given curves in the image. In this work we adopt the aforementioned approach to interpolate two given digital curves lying on parallel planes, by introducing an artificial image potential, which is based on a triangular facet surface interpolation technique.

1 Introduction

Let us be given two digital contours Γ and Δ, i.e. two closed ordered sets of black voxels on a white background, lying on the planes z = rΓ and z = rΔ , of a 3D image Ωpqr , which discretizes the volume Ω ⊂ IR3 , with p, q and r being the number of voxels distributed equidistantly along the x, y and z axis, respectively. We wish to construct a surface that interpolates the data sets Γ and Δ. A similar formulation to the afore digital contour interpolation problem can be found in the construction of a gradual transformation from the closed polygon, PΓ to the closed polygon PΔ , most widely known as the morphing problem. Following Efrat et al. [11], this tranformation can be expressed as a mapping: M(PΓ , PΔ ) = {μ(t), t ∈ [0, 1], such that μ(0) = PΓ , μ(1) = PΔ }, which can be computed by solving the following two problems: (a) The correspondence problem, where an explicit mapping between PΓ and PΔ , is established, by specifying two functions cγ (u) : [0, 1] → PΓ and cδ (u) : [0, 1] → PΔ . (b) The vertex path problem, where we seek for the trajectory that connects cγ (u) with cδ (u) (see also [15]). If this path is a straight line, then it is easy to ﬁnd examples with self intersections. The authors of [11] assert that if one adopts the policy of moving cγ (u) to cδ (u) along the Euclidean shortest path, from cγ (u) to cδ (u) that avoids PΓ and PΔ , then it is guaranteed that all intermediate morphs are simple, since the shortest paths do not cross each other, although two such paths may have a common sub-path. Hence, in order to achieve a solution to the digital contour interpolation problem, free of self intersections, we seek for a method that constructs surfaces from

This work was partially supported by ANR grant SURF -NT05-2_45825.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 672–683, 2009. c Springer-Verlag Berlin Heidelberg 2009

An Implicit Method for Interpolating Two Digital Closed Curves

3D images that contain geodesic paths connecting the digital contours Γ and Δ. The method presented in [2] possesses this property and may therefore allow us to solve the problem with implicitly defined surfaces.

2

Preliminaries

In order to segment a given 2D or 3D image $I : \Omega \to \mathbb{R}$, a common approach is to define a Riemannian manifold, called the potential function, $P = P(I) : \Omega \to \mathbb{R}$, such that features in I will be captured on P. This is ensured by an "appropriate" definition of the function P, which takes into account the nature of the features we aim to follow. More specifically, after the classic work of Kass et al. [16], in 2D image segmentation methods the objective is to compute an active contour $C(s)$, $s \in [0, L]$, located on the surface P, that minimizes the energy functional:

$$ E(C) = \int_0^L P(C(s))\,ds. \tag{1} $$

Towards this aim, Cohen & Kimmel [8] presented a segmentation method which computes the active contour connecting two given points $\mathbf{P}_1$, $\mathbf{P}_2$ on P. The authors show that a globally minimal curve for (1) is obtained by following the opposite gradient direction on the minimal action map $U_{\mathbf{P}_1}(\mathbf{Q})$ (see [18]), which is defined by:

$$ U_{\mathbf{P}_1}(\mathbf{Q}) = \inf_{C(0)=\mathbf{P}_1,\ C(L)=\mathbf{Q}} \int_0^L P(C(s))\,ds, \qquad \mathbf{Q} \text{ on } P. \tag{2} $$

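The minimal path C(s) from $\mathbf{P}_1$ to $\mathbf{P}_2$ is then obtained by solving the problem:

$$ \frac{d\tilde{C}(\sigma)}{d\sigma} = -\nabla U_{\mathbf{P}_1}(\tilde{C}(\sigma)), \quad \text{with } \tilde{C}(0) = \mathbf{P}_2, \tag{3} $$

and setting $C(s) = \tilde{C}(L - s)$.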

According to the analysis in [19], the minimal action map $U_{\mathbf{P}_1}$ is the solution of the following eikonal equation:

$$ \|\nabla U_{\mathbf{P}_1}\| = P, \quad \text{with } U_{\mathbf{P}_1}(\mathbf{P}_1) = 0. \tag{4} $$
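As a quick sanity check of (2)-(4), consider the simplest case of a constant potential, $P \equiv 1$, on a convex domain. This special case is an illustration added here under that assumption, not an example taken from [8] or [19]:

```latex
% Worked example, assuming P \equiv 1 on a convex domain \Omega.
\begin{align*}
U_{\mathbf{P}_1}(\mathbf{Q}) &= \inf_{C} \int_0^L 1\, ds = \|\mathbf{Q} - \mathbf{P}_1\|
  && \text{(the straight segment is the shortest curve)},\\
\nabla U_{\mathbf{P}_1}(\mathbf{Q}) &= \frac{\mathbf{Q} - \mathbf{P}_1}{\|\mathbf{Q} - \mathbf{P}_1\|},
  \qquad \|\nabla U_{\mathbf{P}_1}\| = 1 = P
  && \text{(so (4) holds for } \mathbf{Q} \neq \mathbf{P}_1\text{)},\\
\frac{d\tilde{C}}{d\sigma} &= -\frac{\tilde{C} - \mathbf{P}_1}{\|\tilde{C} - \mathbf{P}_1\|},
  \quad \tilde{C}(0) = \mathbf{P}_2
  \;\Longrightarrow\;
  \tilde{C}(\sigma) = \mathbf{P}_2 + \sigma\,\frac{\mathbf{P}_1 - \mathbf{P}_2}{\|\mathbf{P}_1 - \mathbf{P}_2\|},
  \quad 0 \le \sigma \le L = \|\mathbf{P}_1 - \mathbf{P}_2\|.
\end{align*}
```

Back-tracking from $\mathbf{P}_2$ therefore reaches $\mathbf{P}_1$ at $\sigma = L$ along the straight segment, as one would expect for a constant potential.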

An extension of the above results to 3D images is presented in [1]. Given a 3D image I and the corresponding potential P, the Euler-Lagrange equations of the energy functional E in 3D space are:

$$ \nabla P(C) \cdot \hat{\mathbf{n}} = P(C)\,\kappa \quad \text{and} \quad \nabla P(C) \cdot \hat{\mathbf{b}} = 0, \tag{5} $$

where the vectors $\hat{\mathbf{n}}$, $\hat{\mathbf{b}}$ and the scalar $\kappa$ denote the normal, the binormal and the curvature of C, respectively. It was proved that if $U_{\mathbf{P}_1}$ is the solution of the eikonal equation (4), then every curve C(s) that is a solution of the ordinary differential equation (3) is also a solution of the Euler-Lagrange equations (5).

N. Gabrielides and L. Cohen

This result paved the way to define and compute the globally minimal path between a point $\mathbf{P}$ and a curve Γ on the Riemannian manifold P. The minimal action map with respect to Γ and P is defined as the function

$$ U_\Gamma(\mathbf{P}) = \min_C E(C) = \min_C \int_0^1 P(C(t))\,\|C'(t)\|\,dt, \tag{6} $$

where $C(t)$, $t \in [0, 1]$, is any curve from the point $\mathbf{P}$ to the curve Γ. Note that, by the definition of C, the minimal action map $U_\Gamma(\mathbf{P})$ is equal to $U_{\mathbf{Q}}(\mathbf{P})$ for some $\mathbf{Q} \in \Gamma$. Thus, $U_\Gamma$ satisfies the eikonal equation:

$$ \|\nabla U_\Gamma\| = P, \quad \text{with } U_\Gamma(\mathbf{Q}) = 0, \quad \forall \mathbf{Q} \in \Gamma. \tag{7} $$

Going one step further, let us assume that the point $\mathbf{P}$ belongs to a set Δ. Having solved (7), all the minimal paths from each point in Δ to the curve Γ can be computed using (3). Let us denote this set of paths by $S_\Gamma^\Delta$. If the points in Δ form a curve, then the set $S_\Gamma^\Delta$ consists of all the minimal paths, $C_\Gamma^\Delta(s)$, between the points of the two curves Γ and Δ. Next, in [2] a function Ψ was defined on the image domain such that its zero level set contains all the paths in $S_\Gamma^\Delta$, i.e. $\Psi(C_\Gamma^\Delta(s)) = 0$. Assuming that Ψ is continuously differentiable, the following necessary condition was obtained:

$$ \Psi(C_\Gamma^\Delta(s)) = 0 \;\Longrightarrow\; \nabla\Psi(C_\Gamma^\Delta(s)) \cdot \frac{dC_\Gamma^\Delta(s)}{ds} = 0 \;\Longrightarrow\; \nabla\Psi(\mathbf{P}) \cdot \nabla U_\Gamma(\mathbf{P}) = 0, \tag{8} $$

for every point $\mathbf{P} \in S_\Gamma^\Delta$. Demanding that Ψ satisfies a relation similar to (8) everywhere in Ω, a sufficient condition for the minimal paths to be contained in Ψ = 0 is given by the following transport equation:

$$ \nabla\Psi(\mathbf{P}) \cdot \nabla U_\Gamma(\mathbf{P}) + G(\Psi(\mathbf{P})) = 0, \qquad \Psi(\mathbf{Q}) = 0, \quad \forall \mathbf{Q} \in \Delta, \tag{9} $$

where the function G is such that G(0) = 0 (e.g., G(Ψ) = αΨ). In fact, it was proved that if Ψ satisfies (9), then for every point $\mathbf{P}$ of its zero level set, the minimal path joining $\mathbf{P}$ with the curve Γ is contained in the zero level set of Ψ. This, in turn, suggests solving equation (7) first and then computing Ψ through (9). Note that equations (7) and (9) can be solved over the nodes of $\Omega_{pqr}$, which discretizes Ω. In view of this, the point sets Γ and Δ form two digital contours, which implies that the method essentially establishes an interpolation between the two given digital contours. This allows us to employ it in the digital contour interpolation problem, provided that Γ and Δ lie on the parallel planes $z = r_\Gamma$ and $z = r_\Delta$, and no potential function is given. Since the surface Ψ = 0 contains all the minimal paths from the digital contour Γ to the digital contour Δ, we can claim that by solving problems (7) and (9) we obtain an interpolating surface free of self-intersections.
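To see why the condition G(0) = 0 matters, one can follow Ψ along a single minimal path. The short derivation below is an illustrative sketch added here, assuming G(Ψ) = αΨ with a constant α and using the back-tracking parametrization $\tilde{C}(\sigma)$ of (3) with $U_\Gamma$ in place of $U_{\mathbf{P}_1}$:

```latex
\begin{align*}
\frac{d}{d\sigma}\,\Psi(\tilde{C}(\sigma))
  &= \nabla\Psi(\tilde{C}(\sigma)) \cdot \frac{d\tilde{C}(\sigma)}{d\sigma}
   = -\,\nabla\Psi(\tilde{C}(\sigma)) \cdot \nabla U_\Gamma(\tilde{C}(\sigma))
  && \text{by (3)},\\
  &= G(\Psi(\tilde{C}(\sigma))) = \alpha\,\Psi(\tilde{C}(\sigma))
  && \text{by (9)},\\
\Longrightarrow\quad \Psi(\tilde{C}(\sigma)) &= \Psi(\tilde{C}(0))\, e^{\alpha\sigma}.
\end{align*}
```

Hence a path that starts on the zero level set, $\Psi(\tilde{C}(0)) = 0$, stays on it for all σ, which is consistent with the containment property quoted from [2].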

3

An Artificial Image Potential

The need for an artificial image potential, other than a constant one, can be explained as follows: if P is constant, then the induced Riemannian manifold is a hyperplane in $\mathbb{R}^4$. Thus, the minimization of the energy functional (1) leads to a set

of straight lines in $\mathbb{R}^3$, which start from the point set (contour) Δ and end on the points of the contour Γ, having the minimum length. Suppose now that the contour Γ is translated within the plane $z = r_\Gamma$ until one point $\mathbf{P}$ of it is closer to all points of the set Δ than any other point of Γ. Then, the surface that contains all minimal paths is conic, with its apex at $\mathbf{P}$ and its base the set Δ. In that case none of the points of Γ except $\mathbf{P}$ is interpolated by the surface Ψ = 0. Thus, the problem is to introduce an artificial potential function using only the given information of Γ and Δ. Let us suppose that we are given a matching between the two given point sets (pixel sets) Γ and Δ. Then, we can easily define the set of minimal paths $S_\Gamma^\Delta$ through equation (3) for any potential function P. If P is constant, then the minimal paths are the straight lines which connect the points of the two point sets (the pixel centers) according to the assumed matching; thus a $C^0$ surface containing all the minimal paths can be a triangular facet surface that interpolates Γ and Δ. The main disadvantage of such a construction is that self-intersections cannot be avoided in general (see [14]). However, since there are interpolation techniques which can easily construct triangular facet surfaces that interpolate the given point sets, the above remarks suggest that it is preferable to compute the potential P through the construction of such a surface, say S. Since S consists of triangles, it can easily be implicitized on the grid $\Omega_{pqr}$. This can be achieved, for example, by computing the Euclidean distance function, D, of S on the grid nodes $\mathbf{P}$, i.e.

$$ D(\mathbf{P}, S) = \min_{\mathbf{Q} \in S} \|\mathbf{P} - \mathbf{Q}\|, \quad \forall \mathbf{P} \in \Omega_{pqr}. \tag{10} $$

Then, regardless of the matching we choose between the points of Γ and Δ, if one traverses the minimal path from a point on Δ to some point on Γ and the surface intersects itself, the minimal path is chosen so as to have a common sub-path after the intersection point, thus avoiding self-intersections. This suggests that the surface S could be the Riemannian manifold on which the minimal paths lie, i.e. the unsigned distance D can play the role of the discrete potential P on the discretized image domain $\Omega_{pqr}$.

3.1

Interpolating Two Polygons with $C^0$ Triangular Facet Surfaces

Previous Work. The construction of the surface S can be formulated as follows:

Problem 1. Given the ordered closed planar point sets $P_\Gamma = \{\mathbf{P}_{\Gamma,j} \in \mathbb{E}^3,\ j = 0, \dots, n-1\}$ and $P_\Delta = \{\mathbf{P}_{\Delta,k} \in \mathbb{E}^3,\ k = 0, \dots, m-1\}$, which belong to the planes $z = r_\Gamma$ and $z = r_\Delta$, respectively, construct a $C^0$ surface that interpolates them and consists of triangles with vertices in $P_\Gamma$ and $P_\Delta$.

The total number of such triangulations is $\frac{(n+m)!}{(n-1)!\,(m-1)!}$. Among them, one has to compute the optimal one according to some objective function which quantifies the quality of these triangulations. The quality of such a surface depends mainly on the relative twist between the points of the two contours. This, in turn, lets us refer to the objective function as a twist minimization criterion.

Keppel introduced in [17] a representation of all continuous solutions with the aid of a toroidal graph, i.e., a binary matrix $K_{n \times m}$, where the indices j, k of its elements are taken as j = mod(j, n) and k = mod(k, m). If $K_{jk} = 1$, then the points $\mathbf{P}_{\Gamma,j}$ and $\mathbf{P}_{\Delta,k}$ are connected. If $K_{jk} = 1$ and $K_{j+1,k} = 1$, then the points $\mathbf{P}_{\Gamma,j}$, $\mathbf{P}_{\Gamma,j+1}$ and $\mathbf{P}_{\Delta,k}$ form a triangle. (Analogously, if $K_{jk} = 1$ and $K_{j,k+1} = 1$, then the points $\mathbf{P}_{\Gamma,j}$, $\mathbf{P}_{\Delta,k}$ and $\mathbf{P}_{\Delta,k+1}$ form a triangle.) Each triangle arrangement is represented by a set of unitary elements in this matrix. Keppel proved that, for an acceptable triangulation, these elements form a monotone path in the graph. Thus, the optimum surface can be obtained by searching among all monotone paths in the toroidal graph $K_{n \times m}$. The methods for computing such paths can be divided into two categories: the exhaustive search methods (e.g., [17, 13]), which evaluate the final surface according to some global criterion, and the methods based on weighted graphs (e.g., [6, 4, 12]), in which a weight is assigned to each graph node and then, starting from the node with the least weight, the whole path is computed by choosing at each step the neighboring node with minimum weight. The methods based on weighted graphs effectively reduce the computational cost, but since they depend on the selection of the nodal weights, they may yield surfaces that do not interpolate all points in $P_\Gamma$ and $P_\Delta$. Our intention is to propose a nodal weight definition which resolves such ambiguities.

Our Method. In order to introduce our method, let us further restrict ourselves to convex contour data sets. In [6, 12] the weight at the node $K_{jk}$ of the toroidal graph is the length $\|\mathbf{P}_{\Gamma,j} - \mathbf{P}_{\Delta,k}\|$. Thus, by definition, the final result depends on the relative position of the sets $P_\Gamma$ and $P_\Delta$. The method of [4] proposes a translation of the polygons so that their centers, $\mathbf{A}_\Gamma$ and $\mathbf{A}_\Delta$, coincide.
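Thus, the square of this distance for the translated polygons, expressed in terms of the initial points, is equal to

$$ \|(\mathbf{P}_{\Gamma,j} - \mathbf{A}_\Gamma) - (\mathbf{P}_{\Delta,k} - \mathbf{A}_\Delta)\|^2 = \|\mathbf{P}_{\Gamma,j} - \mathbf{A}_\Gamma\|^2 + \|\mathbf{P}_{\Delta,k} - \mathbf{A}_\Delta\|^2 - 2\,(\mathbf{P}_{\Gamma,j} - \mathbf{A}_\Gamma) \cdot (\mathbf{P}_{\Delta,k} - \mathbf{A}_\Delta). $$

Then, setting $-(\mathbf{P}_{\Gamma,j} - \mathbf{A}_\Gamma) \cdot (\mathbf{P}_{\Delta,k} - \mathbf{A}_\Delta)$ as nodal weight, the path is computed by choosing the minimum weight at each step. We propose as weight function the dimensionless quantity:

$$ -\frac{(\mathbf{P}_{\Gamma,j} - \mathbf{A}_\Gamma) \cdot (\mathbf{P}_{\Delta,k} - \mathbf{A}_\Delta)}{\|\mathbf{P}_{\Gamma,j} - \mathbf{A}_\Gamma\|\,\|\mathbf{P}_{\Delta,k} - \mathbf{A}_\Delta\|}, \tag{11} $$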

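which is equal to the negative cosine of the angle in [0, π] formed by the vectors $\mathbf{P}_{\Gamma,j} - \mathbf{A}_\Gamma$, j = 0, ..., n − 1, and $\mathbf{P}_{\Delta,k} - \mathbf{A}_\Delta$, k = 0, ..., m − 1. Since the cosine is a decreasing function on [0, π], the proposed weight can equivalently be defined as the least angle, $\varphi(\theta_{\Gamma,j}, \theta_{\Delta,k})$, formed by two lines with directions given by $\mathbf{P}_{\Gamma,j} - \mathbf{A}_\Gamma$ and $\mathbf{P}_{\Delta,k} - \mathbf{A}_\Delta$, where $\theta_{\Gamma,j}$ denotes the polar angle of the point $\mathbf{P}_{\Gamma,j}$ with respect to a coordinate system whose origin is $\mathbf{A}_\Gamma$. (An analogous definition holds for $\theta_{\Delta,k}$.) We connect the point $\mathbf{P}_{\Gamma,j}$ with the point $\mathbf{P}_{\Delta,k}$ (analogously, the point $\mathbf{P}_{\Delta,k}$ with $\mathbf{P}_{\Gamma,j}$) when the index k (index j) solves the following problems:

$$ \min_{k=0,\dots,m-1} \varphi(\theta_{\Gamma,j}, \theta_{\Delta,k}) \quad \text{and} \quad \min_{j=0,\dots,n-1} \varphi(\theta_{\Gamma,j}, \theta_{\Delta,k}). \tag{12} $$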

We set the weight at every node $K_{jk}$ equal to the angle $\varphi(\theta_{\Gamma,j}, \theta_{\Delta,k})$. Then, $K_{jk} = 1$ for all couples of points that constitute the set of solutions of the problems (12). Now, we can easily establish that the solution has the following properties (see, e.g., Fig. 1):

i. In every row and every column of the toroidal graph there exists at least one unitary node, since for every j we have computed the corresponding index k and for every k we have computed the corresponding j.

ii. The unitary nodes of the graph are ordered monotonically. The proof is simple if one realizes that, for any two connections $\mathbf{P}_{\Gamma,p_1}\mathbf{P}_{\Delta,q_1}$ and $\mathbf{P}_{\Gamma,p_2}\mathbf{P}_{\Delta,q_2}$, every point $\mathbf{P}_{\Gamma,j}$ which lies between $\mathbf{P}_{\Gamma,p_1}$ and $\mathbf{P}_{\Gamma,p_2}$ must be connected with a point which lies between $\mathbf{P}_{\Delta,q_1}$ and $\mathbf{P}_{\Delta,q_2}$, since both polygons share the same orientation and are convex.

iii. Solving the problems (12) does not imply that all the nodes of the monotone path in the graph have been computed. It is possible to be left with couples $(p_1, q_1)$ and $(p_1 + 1, q_1 + 1)$ but neither of $(p_1, q_1 + 1)$ and $(p_1 + 1, q_1)$.

Fig. 1. Left: The connections between the points of two convex polygons, as obtained by solving the problems (12). Right: The toroidal graph of the connections. The unitary nodes are illustrated by spheres, the computed triangle edges by blue lines and the possible triangle edges by red lines.
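The connections illustrated in Fig. 1 can be reproduced with a brute-force O(nm) evaluation of the problems (12). The following Python sketch is illustrative only: the function names are hypothetical, the polygons are given by their in-plane (x, y) coordinates, and the centers $\mathbf{A}_\Gamma$ and $\mathbf{A}_\Delta$ are assumed to be the vertex centroids, which the text does not specify.

```python
import math


def centroid(points):
    """Vertex centroid of a 2D polygon (assumed choice of the center A)."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def polar_angles(points, center):
    """Polar angle of each point about `center`, mapped to [0, 2*pi)."""
    cx, cy = center
    return [math.atan2(y - cy, x - cx) % (2 * math.pi) for x, y in points]


def least_angle(a, b):
    """Least angle in [0, pi] between two directions given by polar angles."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def solve_matching(poly_g, poly_d):
    """Return the unitary nodes (j, k) obtained from both problems in (12)."""
    th_g = polar_angles(poly_g, centroid(poly_g))
    th_d = polar_angles(poly_d, centroid(poly_d))
    nodes = set()
    for j, a in enumerate(th_g):  # best k for every j
        k = min(range(len(th_d)), key=lambda kk: least_angle(a, th_d[kk]))
        nodes.add((j, k))
    for k, b in enumerate(th_d):  # best j for every k
        j = min(range(len(th_g)), key=lambda jj: least_angle(th_g[jj], b))
        nodes.add((j, k))
    return nodes
```

Because only polar angles about the centers enter the weight, a translated and uniformly scaled copy of a polygon is matched vertex to vertex with the original.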

If we interpret these properties geometrically, we may assert that up to this point we have constructed a surface which interpolates the point sets $P_\Gamma$ and $P_\Delta$ and consists of triangular and quadrilateral patches. The final triangulation can be obtained by tracking all the quadrilateral patches (i.e. where property (iii) holds) and triangulating them based on the least nodal weight. Constructing the surface in this way requires O(nm) operations, but this cost can be reduced effectively. Towards this aim, we define the circular lists $L_\Gamma = \{\theta_{\Gamma,j}\}_{j=0}^{n-1}$ and $L_\Delta = \{\theta_{\Delta,k}\}_{k=0}^{m-1}$ of the polar angles of the points of the two initial point sets with respect to their centers. Note that the elements of these lists are in circularly increasing order. We find the element of the list $L_\Gamma$ with the least value and we

set the head of $L_\Gamma$ at its position. Then, we compute the index k which solves (12) for j = 0 and we set the head of $L_\Delta$ at that index. (We also reorder the elements of the point sets $P_\Gamma$ and $P_\Delta$ accordingly.) Now, we know that the element $K_{00}$ of the graph belongs to the set of solutions of the problem. Note that up to this point, the operations done are O(n + m). Suppose now that the node $K_{jk}$ belongs to the solution set of the problems (12), i.e. $K_{jk} = 1$. We consider only the possible connection of this node to the nodes $K_{j+1,k}$, $K_{j,k+1}$ and $K_{j+1,k+1}$, knowing that, due to the properties (i)-(iii), at least one of them belongs to the solution nodes. Thus, we begin from the node $K_{00}$, which is already computed, and at each step we compare the weights given by the function $\varphi(\cdot, \cdot)$ only for these three neighboring nodes. In case the node with the least weight is $K_{j+1,k+1}$, we also insert in the path the one of the other two that has the lesser weight. The path computed this way will traverse the nodes of the solution of the problems (12), and since the nodes to be computed are exactly (n + m), it readily follows that the complexity of the algorithm is O(n + m). Now, we can state the following result:

Lemma 1. A $C^0$ triangular facet surface that interpolates any two convex planar polygons with n and m points and satisfies the criteria (12) can be computed in O(n + m) operations. Moreover, the space needed for the whole process is O(n + m).

If one or both polygons are not convex, we can map them onto their convex hulls and apply the algorithm to the transformed polygons. The output of the algorithm is actually a point matching, thus the final surface can be constructed by adopting this matching. The use of such a technique was first proposed and implemented in [12], but their method increases the computational cost. Alternatively, in order to eliminate the cost of this mapping, we project all the points of the non-convex segments, $\mathbf{P}_j$, j = S + 1, ..., E − 1, to the corresponding convex hull segment $\mathbf{P}_S\mathbf{P}_E$, according to the rule

$$ \mathbf{P}_j \mapsto (1 - t_j)\,\mathbf{P}_S + t_j\,\mathbf{P}_E, \qquad t_j = \frac{\sum_{k=S}^{j-1} \|\mathbf{P}_{k+1} - \mathbf{P}_k\|}{\sum_{k=S}^{E-1} \|\mathbf{P}_{k+1} - \mathbf{P}_k\|}. $$

Computing the convex hull of a polygon by using the algorithm of [21] needs O(n) operations, hence we can state that the results of Lemma 1 still hold in the general case of non-convex polygons. It is worth remarking that, although this algorithm is of linear complexity, the criterion is not local (in the sense that the same result is obtained by following the exhaustive search procedure), in contrast to all algorithms published to date, except for the one given in [24], also for convex polygons. Finally, the result, i.e. the point matching, is independent of any translation of the initial data and moreover independent of an isotropic scaling of the initial data sets, thus it satisfies the criteria given by [24]. Note also that the whole method emulates the algorithmic procedure proposed by [5].

The Discrete Potential Function. Since the surface S consists of (n + m) triangles, the minimum Euclidean distance (10) from every point of the grid $\Omega_{pqr}$ to S can be found in O(n + m) operations, thus the total number of calculations for the discrete image potential becomes O(pqr(n + m)).
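A direct way to evaluate (10) on the grid is to loop over all nodes and all triangles, which gives the O(pqr(n + m)) cost quoted above. The Python sketch below does this with a standard region-based closest-point test for a triangle; the helper names are hypothetical, non-degenerate triangles are assumed, and this is not necessarily the routine used by the authors.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _axpy(a, t, d):
    """Return a + t * d for 3D tuples."""
    return (a[0] + t * d[0], a[1] + t * d[1], a[2] + t * d[2])


def closest_point_on_triangle(p, a, b, c):
    """Closest point to p on triangle abc (vertex, edge and face regions)."""
    ab, ac, ap = _sub(b, a), _sub(c, a), _sub(p, a)
    d1, d2 = _dot(ab, ap), _dot(ac, ap)
    if d1 <= 0 and d2 <= 0:
        return a                                   # vertex region of a
    bp = _sub(p, b)
    d3, d4 = _dot(ab, bp), _dot(ac, bp)
    if d3 >= 0 and d4 <= d3:
        return b                                   # vertex region of b
    vc = d1 * d4 - d3 * d2
    if vc <= 0 and d1 >= 0 and d3 <= 0:
        return _axpy(a, d1 / (d1 - d3), ab)        # edge ab
    cp = _sub(p, c)
    d5, d6 = _dot(ab, cp), _dot(ac, cp)
    if d6 >= 0 and d5 <= d6:
        return c                                   # vertex region of c
    vb = d5 * d2 - d1 * d6
    if vb <= 0 and d2 >= 0 and d6 <= 0:
        return _axpy(a, d2 / (d2 - d6), ac)        # edge ac
    va = d3 * d6 - d5 * d4
    if va <= 0 and d4 - d3 >= 0 and d5 - d6 >= 0:
        w = (d4 - d3) / ((d4 - d3) + (d5 - d6))
        return _axpy(b, w, _sub(c, b))             # edge bc
    denom = 1.0 / (va + vb + vc)                   # interior of the face
    return _axpy(_axpy(a, vb * denom, ab), vc * denom, ac)


def distance_potential(nodes, triangles):
    """Unsigned distance (10) from each grid node to a triangle soup."""
    return [
        min(math.dist(p, closest_point_on_triangle(p, *tri)) for tri in triangles)
        for p in nodes
    ]
```

In practice the node list would enumerate the p x q x r grid, and a spatial index could cut the per-node cost, but the brute-force loop mirrors the operation count stated in the text.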

4

Numerical Solution of the Eikonal and the Transport Equation

Both equations (7) and (9) belong to the class of stationary Hamilton-Jacobi equations and shall be considered simultaneously. The conditions under which the solution of a numerical approximation of a Hamilton-Jacobi equation converges towards the so-called viscosity solution can be found in [9] and [10]. In [25, 2] a first order upwind scheme is employed in order to solve equation (9). According to them, the numerical Hamiltonian of (9) can be written, for G(Ψ) = αΨ, as

$$ \left(\Psi_x^{i,j,k},\ \Psi_y^{i,j,k},\ \Psi_z^{i,j,k}\right) \cdot \left((U_\Gamma)_x^{i,j,k},\ (U_\Gamma)_y^{i,j,k},\ (U_\Gamma)_z^{i,j,k}\right) + \alpha\,\Psi^{i,j,k} = 0, \tag{13} $$

where the subscripts denote partial differentiation with respect to x, y and z. Approximating the derivatives by biasing the finite difference stencil in the direction where the characteristic information is coming from lets us write the product $\Psi_x^{i,j,k}(U_\Gamma)_x^{i,j,k}$ as:

$$ \Psi_x^{i,j,k}(U_\Gamma)_x^{i,j,k} = \begin{cases} \Psi_{+x}^{i,j,k}(U_\Gamma)_x^{i,j,k} = -(U_\Gamma)_x^{i,j,k}\,\dfrac{\Psi^{i,j,k} - \Psi^{i+1,j,k}}{\Delta x}, & \text{if } (U_\Gamma)_x^{i,j,k} < 0, \\[2ex] \Psi_{-x}^{i,j,k}(U_\Gamma)_x^{i,j,k} = (U_\Gamma)_x^{i,j,k}\,\dfrac{\Psi^{i,j,k} - \Psi^{i-1,j,k}}{\Delta x}, & \text{if } (U_\Gamma)_x^{i,j,k} > 0, \end{cases} $$

or

$$ \Psi_x^{i,j,k}(U_\Gamma)_x^{i,j,k} = \left|(U_\Gamma)_x^{i,j,k}\right| \frac{\Psi^{i,j,k} - \Psi^{I,j,k}}{\Delta x}, \quad \text{where } I = \begin{cases} i + 1, & \text{if } (U_\Gamma)_x^{i,j,k} < 0, \\ i - 1, & \text{if } (U_\Gamma)_x^{i,j,k} > 0. \end{cases} \tag{14} $$

Applying the above to (13) and solving it with respect to $\Psi^{i,j,k}$ we obtain:

$$ \Psi^{i,j,k} = \frac{\Psi^{I,j,k}\,\dfrac{|(U_\Gamma)_x^{i,j,k}|}{\Delta x} + \Psi^{i,J,k}\,\dfrac{|(U_\Gamma)_y^{i,j,k}|}{\Delta y} + \Psi^{i,j,K}\,\dfrac{|(U_\Gamma)_z^{i,j,k}|}{\Delta z}}{\dfrac{|(U_\Gamma)_x^{i,j,k}|}{\Delta x} + \dfrac{|(U_\Gamma)_y^{i,j,k}|}{\Delta y} + \dfrac{|(U_\Gamma)_z^{i,j,k}|}{\Delta z} + \alpha} \tag{15} $$

for i = 0, ..., p − 1, j = 0, ..., q − 1 and k = 0, ..., r − 1, with I, J and K being defined in a manner analogous to (14), according to the sign of the nodal derivatives of $U_\Gamma^{i,j,k}$ with respect to x, y and z, respectively. For the eikonal equation (7), the scheme proposed by Rouy & Tourin [22],

$$ \sum_{X \in \{x,y,z\}} \max\!\left(\max\!\left((U_\Gamma)_{-X}^{i,j,k},\, 0\right),\ -\min\!\left((U_\Gamma)_{+X}^{i,j,k},\, 0\right)\right)^2 = \left(P^{i,j,k}\right)^2, \tag{16} $$

leads to a quadratic equation with respect to $(U_\Gamma)^{i,j,k}$. Both equations (15) and (16) can be solved iteratively by updating their grid values until they converge to within some predefined accuracy. A highly efficient approach to solving them is based on the so-called fast marching method, which was introduced by Sethian [23] for the eikonal equation (16). Realizing that the solution of the eikonal equation represents the distance

map on the (hyper-)surface P from the boundary curve Γ (see [19] and [7]), it is to be expected that the information propagates from the smaller values, near the boundary Γ, to the larger ones as we move away from it. In other words, since the characteristics of the eikonal equation are straight lines (see [20]) emanating from the boundary Γ, the numerical solution can be built "outwards" from the smallest values, as Sethian pointed out. The idea is to sweep the front ahead by considering a set of points in a narrow band around the existing front, and to march this narrow band forward, freezing the values of existing points and bringing new ones into the narrow band structure. The key is the selection of the grid point in the narrow band to update. The answer is that the point having the smallest value (i.e. the closest to the already calculated points) in this narrow band around the front is the one that cannot be affected by the other points next to it, thus its value must be correct. Returning to the discrete transport equation (15), very fast convergence can be achieved by visiting the points in the order they are reached by the characteristic curves, in a way analogous to that of the fast marching method for the eikonal equation (see [25, 2]). Considering the characteristics of equation (9), we find that the absolute values of $\Psi^{i,j,k}$ decrease as we move from the boundary to the zero level set of Ψ, provided that the coefficient α is greater than zero; thus in each step we update the values of $\Psi^{i,j,k}$ on a narrow band of nodes, using the values of $\Psi^{i,j,k}$ that have already been calculated (solved), starting from the boundary of the domain, via equation (15). Then, we consider as solved the point whose value is closest to the solved points, i.e., the one with the maximum absolute value in the narrow band.
Regarding the boundary conditions, since we are concerned only with the zero level set of Ψ and the condition Ψ = 0 on Γ, following [2] we define the closed set

$$ V_\Gamma^\eta = \{\mathbf{P} \in \Omega_{pqr} : D(\mathbf{P}, \Gamma) \le \eta\}, $$

where η is a positive real value. We impose Ψ to be equal to the signed distance between $\mathbf{P}$ and Γ on the nodes of $V_\Gamma^\eta \cap \Omega_{pqr}$ and equal to ± min(D(P, Γ)), P ∈ ΩpqrΓ, on the rest of the boundary nodes of $\Omega_{pqr}$, choosing the negative sign for the nodes exterior to Γ and the positive sign for those interior to Γ. Note also that Γ can be on the boundary of Ω, while Δ must be entirely inside Ω. Numerical experimentation has shown that visually acceptable results can be achieved if we extend the grid $\Omega_{pqr}$ in the z direction so that Δ lies in the middle z-plane. Finally, we should remark that the algorithm yields a different solution if we compute the surface containing the minimal paths $S_\Delta^\Gamma$ instead of the one containing $S_\Gamma^\Delta$. The authors of [3] address this asymmetry by exploiting both minimal action maps $U_\Gamma$ and $U_\Delta$, the latter being defined analogously to $U_\Gamma$.
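The nodal updates behind (15) and (16) can be sketched as follows. This is a minimal Python illustration, assuming a uniform grid spacing h for the eikonal update and using hypothetical function names; the narrow-band ordering of the fast marching method and the boundary treatment are not included.

```python
import math


def eikonal_update(nb, p, h=1.0):
    """Solve the Rouy-Tourin relation (16) for U at one node.

    nb: for each axis, the smaller of the two neighbour values of U along
        that axis (math.inf if neither neighbour is known yet).
    p:  value of the potential at the node; h: uniform grid spacing.
    """
    a = sorted(v for v in nb if v < math.inf)
    u = math.inf
    for m in range(1, len(a) + 1):
        # Use the m smallest neighbours: sum_l (u - a_l)^2 = (p * h)^2.
        s = sum(a[:m])
        s2 = sum(v * v for v in a[:m])
        disc = s * s - m * (s2 - (p * h) ** 2)
        if disc < 0:
            break  # keep the value obtained with fewer neighbours
        u = (s + math.sqrt(disc)) / m
        if m == len(a) or u <= a[m]:
            break  # the remaining neighbours do not contribute
    return u


def upwind_index(i, u_x):
    """Index I of (14): i + 1 if (U)_x < 0, otherwise i - 1.

    When u_x == 0 the corresponding weight in (15) vanishes, so the
    returned index is irrelevant.
    """
    return i + 1 if u_x < 0 else i - 1


def transport_update(psi_up, grad_u, spacing, alpha):
    """Evaluate (15); psi_up holds the upwind neighbour values of Psi
    along x, y and z, grad_u the nodal derivatives of U, alpha > 0."""
    w = [abs(g) / d for g, d in zip(grad_u, spacing)]
    return sum(wi * pi for wi, pi in zip(w, psi_up)) / (sum(w) + alpha)
```

With one known neighbour the eikonal update reduces to the familiar one-sided step U = a + hP, and with α = 0 the transport update simply copies Ψ along the characteristic.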

5

Examples

In what follows we have taken the digital contours Γ and Δ to lie on the planes $z = r_\Gamma$ and $z = r_\Delta$, with $r_\Gamma < r_\Delta$, and the coefficient α in equation (15) to be equal to 0.1. The grids are relatively coarse, ranging from 50 to 70 nodes in the x and y directions and 20 nodes in the z direction. The first example (see Fig. 2) can be characterized as a "simple case", where the triangular facet surface has no self-intersections. In the example shown in Figs. 3-4 the triangular surface has a widely spread self-intersection region, because the interpolated contours are far from being convex. The method yields a surface with no self-intersections. The third example (see Fig. 5) is an interpolation of two contours with U- and S-like shapes. It shows that "morphing" cannot always be achieved, because in some cases the resulting surface, although it has no self-intersections, appears to have holes, i.e., disconnected cross sections in the area of self-intersection of the triangular surface. This means that the particular image potential function forces the minimal paths to go around the self-intersection area, thus generating a hole in the surface.

Fig. 2. Ex. 1: The $C^0$ triangular facet surface and the implicit surface Ψ = 0

Fig. 3. Ex. 2: The $C^0$ triangular facet surface and the implicit surface Ψ = 0

Fig. 4. Ex. 2: Intermediate slices of the implicit surface, from the contour Γ to Δ

Fig. 5. Ex. 3: The $C^0$ triangular facet surface and the implicit surface Ψ = 0

6

Conclusions

We presented an implicit method to interpolate two digital contours on parallel planes, employing the 3D image segmentation technique of [2]. In order to guarantee that the voxels of both contours will always be interpolated, we introduced an artificial potential function. To this end, we developed an interpolation method that matches all the pixel centers through a $C^0$ triangular facet surface, and set the potential function to be the Euclidean distance to this surface. The method results in non-self-intersecting surfaces. However, when the polygons connecting the contour voxel centers are far from convex, it cannot always produce morphs that preserve the connectedness of the given contours along each intermediate slice. This, in turn, raises the question of how the potential function can be improved so as to stably accomplish an acceptable morphing between Γ and Δ, which remains open. The idea behind this work was to set up processes for interpolating sets of pixels/voxels following minimal paths on some appropriately defined manifolds. The idea seems to be fruitful and it might pave the way to solving even more difficult interpolation problems in the future.

References

1. Ardon, R., Cohen, L.D.: Fast constrained surface extraction by minimal paths. Inter. J. of Computer Vision 69, 127–136 (2006)
2. Ardon, R., Cohen, L.D., Yezzi, A.: A new implicit method for surface segmentation by minimal paths in 3D images. Appl. Math. Optim. 55, 127–144 (2007)
3. Ardon, R., Cohen, L.D., Yezzi, A.: Fast surface segmentation guided by user input using implicit extension of minimal paths. J. of Math. Imaging & Vision 25, 289–305 (2006)
4. Batnitzki, S., et al.: Three-dimensional computer reconstruction from surface contours for head CT examinations. J. of Comp. Assist. Tomogr. 5, 60–67 (1981)
5. Choi, Y.-K., Park, K.-H.: A heuristic triangulation algorithm for multiple planar contours using an extended double branching procedure. Visual Computer 10, 372–387 (1994)
6. Christiansen, H.N., Sederberg, T.W.: Conversion of complex contour lines into polygonal element mosaics. In: Phillips, R.L. (ed.) Computer Graphics (SIGGRAPH 1978), vol. 12, pp. 187–192 (1978)

7. Cohen, L.D.: Minimal paths and fast marching methods for Image Analysis. In: Paragios, N., Chen, Y., Faugeras, O. (eds.) Mathematical Models in Computer Vision: The Handbook, pp. 97–111. Springer, Heidelberg (2005)
8. Cohen, L.D., Kimmel, R.: Global minimum for active contour models: A minimal path approach. Inter. J. of Computer Vision 24, 57–78 (1997)
9. Crandall, M., Lions, P.L.: Viscosity solutions of Hamilton-Jacobi equations. Trans. Amer. Math. Soc. 277, 1–42 (1983)
10. Crandall, M., Lions, P.L.: Two approximations of solutions of Hamilton-Jacobi equations. Math. of Comp. 43, 1–19 (1984)
11. Efrat, A., Har-Peled, S., Guibas, L., Murali, T.: Morphing between Polylines. In: Proc. 12th Ann. ACM-SIAM Symp. on Discr. Alg. (SODA 2001), pp. 680–689 (2001)
12. Ekoule, A.B., Peyrin, F.C., Odet, C.L.: A triangulation algorithm from arbitrary shaped multiple planar contours. ACM Trans. on Graph. 10, 182–199 (1991)
13. Fuchs, H., Kedem, Z.M., Uselton, S.P.: Optimal surface reconstruction from planar contours. Commun. ACM 20, 693–702 (1977)
14. Gitlin, G., O'Rourke, J., Sabramanian, V.: On reconstructing polyhedra from parallel slices. Intern. J. of Comp. Geom. & Appl. 6, 103–122 (1996)
15. Hahmann, S., Bonneau, G.-P., Caramiaux, B., Cornillac, M.: Multiresolution morphing of planar curves. Computing 79, 197–209 (2007)
16. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active Contour Models. Intern. J. of Computer Vision 1, 321–331 (1988)
17. Keppel, E.: Approximating complex surfaces by triangulation of contour lines. IBM J. Res. Devel. 19, 2–11 (1975)
18. Kimmel, R., Amir, A., Bruckstein, A.: Finding shortest paths on surfaces using level sets propagation. IEEE Trans. Pat. Anal. Mach. Int. 17, 635–640 (1995)
19. Kimmel, R., Kiryati, N., Bruckstein, A.: Sub-pixel distance map and weighted distance transforms. J. of Math. Imaging & Vision 6, 223–233 (1996)
20. Mauch, S.: Efficient Algorithms for Solving Static Hamilton-Jacobi Equations, Doctoral Thesis, California Institute of Technology, Pasadena, California (2003)
21. Melkman, A.: On-line construction of the convex hull of a simple polygon. Inform. Proc. Letters 25, 11–12 (1987)
22. Rouy, E., Tourin, A.: A viscosity solutions approach to shape-from-shading. SIAM J. Numer. Anal. 29, 867–884 (1992)
23. Sethian, J.: A fast marching level set method for monotonically advancing fronts. Proc. Natl. Acad. Sci. USA 93, 1591–1595 (1996)
24. Welzl, E., Wolfers, B.: Surface reconstruction between simple polygons. In: Lengauer, T. (ed.) ESA 1993. LNCS, vol. 726, pp. 397–408. Springer, Heidelberg (1993)
25. Yezzi, A., Prince, J.: An Eulerian PDE approach for computing tissue thickness. IEEE Trans. on Medical Imaging 22, 1332–1339 (2003)

Pose Invariant Shape Prior Segmentation Using Continuous Cuts and Gradient Descent on Lie Groups

Niels Chr. Overgaard, Ketut Fundana, and Anders Heyden

Applied Mathematics Group, Malmö University, Sweden
{nco,ketut.fundana,anders.heyden}@mah.se

Abstract. This paper proposes a novel formulation of the Chan-Vese model for pose invariant shape prior segmentation as a continuous cut problem. The model is based on the classic $L^2$ shape dissimilarity measure, with pose invariance under the full (Lie) group of similarity transforms in the plane. To overcome the common numerical problems associated with step size control for translation, rotation and scaling in the discretization of the pose model, a new gradient descent procedure for the pose estimation is introduced. This procedure is based on the construction of a Riemannian structure on the group of transformations and a derivation of the corresponding pose energy gradient. Numerically, this amounts to an adaptive step size selection in the discretization of the gradient descent equations. Together with efficient numerics for TV-minimization we get a fast and reliable implementation of the model. Moreover, the theory introduced is generic and reliable enough for application to more general segmentation and shape models.

1

Introduction

The celebrated model of T. Chan and L. Vese [1] for piecewise constant, two-phase segmentation of a gray scale image $I : \Omega \to \mathbb{R}_+$ can be formulated as follows: Among all characteristic functions $u = 1_\Sigma$ of measurable sets Σ contained in the bounded (image) domain $\Omega \subset \mathbb{R}^2$, and all pairs of real numbers $c = (c_0, c_1)$, find $u^* = 1_{\Sigma^*}$, $c^* = (c_0^*, c_1^*)$ which minimize the following energy

$$ E_{CV}(u, c) = J(u) + \frac{\lambda}{2}\left\{\langle 1 - u, (I - c_0)^2\rangle + \langle u, (I - c_1)^2\rangle\right\}, \tag{1} $$

where λ > 0 is a fixed weight, $J(u) = \int_\Omega |\nabla u|\,dx$ is the total variation of u, and $\langle u, v\rangle = \int_\Omega uv\,dx$ is the $L^2$ inner product between u and v. Recall that for $u = 1_\Sigma$, J(u) = Per(Σ), the perimeter (in Ω) of Σ, i.e. the length of the boundary Γ = ∂Σ in Ω. Traditionally, and originally [1], minimization of (1) was formulated in the level set framework of Osher and Sethian [2, 3, 4] by setting u = H(φ), where H denotes the Heaviside function and $\phi : \Omega \to \mathbb{R}$ is an embedding function used to represent the image object implicitly as Σ = {x ∈ Ω ; φ(x) > 0}. This highly

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 684–695, 2009. c Springer-Verlag Berlin Heidelberg 2009

non-linear optimization problem is solved using gradient descent, which, in the level set framework, corresponds to the following evolution PDE for the active contour Γ(t) := ∂Σ(t) = {x ∈ Ω ; φ(x, t) = 0},

$$ \frac{\partial\phi}{\partial t} = \left\{ \operatorname{div}\!\left(\frac{\nabla\phi}{|\nabla\phi|}\right) + \frac{\lambda}{2}\left[(I - c_0)^2 - (I - c_1)^2\right] \right\} |\nabla\phi|, $$

where t is an artificial time parameter and φ = φ(x, t) a time dependent level set function. At every instant in this evolution, the gray value estimates $c_0$, $c_1$ are updated according to

$$ c_0 = c_0(u) = \frac{\langle 1 - u, I\rangle}{\langle 1 - u, 1\rangle} \quad \text{and} \quad c_1 = c_1(u) = \frac{\langle u, I\rangle}{\langle u, 1\rangle}. \tag{2} $$
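On a pixel grid the inner products in (2) reduce to sums over pixels, since the pixel area cancels in the ratios. A minimal Python sketch, assuming u and I are given as nested lists of equal shape and that both regions are non-empty; the function name is hypothetical:

```python
def region_means(u, image):
    """Gray value estimates (c0, c1) of (2) for labels u with values in [0, 1]."""
    num0 = den0 = num1 = den1 = 0.0
    for u_row, i_row in zip(u, image):
        for ui, ii in zip(u_row, i_row):
            num0 += (1 - ui) * ii   # <1 - u, I>
            den0 += 1 - ui          # <1 - u, 1>
            num1 += ui * ii         # <u, I>
            den1 += ui              # <u, 1>
    return num0 / den0, num1 / den1
```

For a binary u these are simply the mean gray values outside and inside the current object.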

One of the most inspiring discoveries in recent years, due to Chan, Esedoḡlu and Nikolova [5], is that, for any fixed c, the minimization of (1) with respect to binary label functions u may be solved exactly by considering a convex relaxation of the problem, where the set of admissible u's is enlarged to:

$$ K := \{u \in BV(\Omega)\ ;\ 0 \le u(x) \le 1 \text{ for all } x \in \Omega\}. \tag{3} $$

In fact, it was shown in [5] that if u ∈ K minimizes (1), then for almost all thresholds t ∈ (0, 1) the function 1 if u(x) > t ut (x) = (x ∈ Ω), (4) 0 otherwise is a global minimizer for the original problem. The proof is recalled in Section 2.1. Thus, global minimizers of the Chan-Vese model can be found by truncation of the solution to an easier, unilaterally constrained, convex variational problem. The use of this truncation property is referred to as the continuous (graph) cut method, and problems formulated in this manner can be solved eﬃciently using fast algorithms for TV-minimization. See, e.g., Chambolle [6]. The problem of including apriori shape information into the segmentation process has been studied extensively within the level set framework for the last decade or so [7, 8, 9, 10, 11]. The common approach is to include a interaction energy between object Σ and a prior shape Σ into the segmentation functional. If f denotes the characteristic function of the prior shape Σ , then a typical shape prior segmentation functional looks like E(u, c, f ) = ECV (u, c) +

γ u − f 2 , 2

(5)

where γ > 0 is a ﬁxed coupling constant for the interaction, and u = u, u is the L2 norm. The shape interaction in (5) may be interpreted geometrically as u − f 2 = area(Σ Σ ), i.e. the area of the symmetric set diﬀerence between the sets Σ and Σ , c.f. [10] and [11]. The segmentation is now obtained by minimization of the functional (5) with respect to the (binary) label functions


N.Chr. Overgaard, K. Fundana, and A. Heyden

$u$, gray values $c$ and $f \in F$, where $F$ denotes a class of prescribed shape priors. This formulation is quite general. A specific example, considered in this paper, is segmentation with pose invariant priors. In this case $F = \{f = f_0 \circ T\}_{T \in G}$, where the binary function $f_0$ is a shape template, and $T$ ranges over a group of transformations $G$, e.g. the group of similarity transformations. Since continuous cuts have emerged as an alternative to level sets for minimization of the CV model and other segmentation models, it is natural to ask if known shape prior segmentation models can be reformulated as variational problems possessing the important truncation property, which allows them to be solved using TV-minimization algorithms. One such attempt has been made in [12], see Section 2.2, but it does not go all the way. The purpose of the present paper is to formulate the shape prior segmentation model (5) as a continuous cut problem. This is achieved by reformulating the problem as a CV model (see Section 3.1). We specifically consider shape priors which are pose invariant under the group of similarity transforms, which involves optimization over a Lie group. In order to solve this problem efficiently and reliably, we develop a theory for gradient descent on Lie groups (Section 3.3). The problem here is, essentially, to construct a Riemannian structure on the Lie group. The new theory eliminates the problems associated with step-size selection in discretizations of the gradient descent ODEs usually encountered in segmentation models with pose estimation.
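The geometric reading of the interaction term in (5) can be checked numerically for binary masks. The sketch below is only an illustration under the assumption that areas are measured in pixels; the function names are hypothetical.

```python
import numpy as np

def interaction_energy(u, f):
    """Squared L2 distance ||u - f||^2, with the integral replaced by a pixel sum."""
    u = np.asarray(u, dtype=float)
    f = np.asarray(f, dtype=float)
    return float(np.sum((u - f) ** 2))

def symmetric_difference_area(u, f):
    """Number of pixels belonging to exactly one of the two binary shapes."""
    return int(np.count_nonzero(np.logical_xor(np.asarray(u) > 0.5,
                                               np.asarray(f) > 0.5)))
```

For binary u and f the two quantities coincide, since (u - f)^2 equals 1 exactly where the two shapes disagree and 0 elsewhere.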

2 Background

2.1 Relaxation in the CV Model

In this section we briefly describe the theory behind the continuous cut solution for the CV model and its connection to the ROF denoising model and TV-minimization. Let us consider the minimization of (1) over the set of label functions $u \in K$ defined in (3), and gray values $c \in \mathbb{R}^2$. In this setting $E_{CV}$ is a bi-convex functional, that is, convex in each of its arguments $u$ and $c$ separately, when the other is kept fixed. However, $E_{CV}$ is not jointly convex. One therefore uses a method referred to, in this paper, as the CV-algorithm, which alternates between optimization in $u$ and $c$: If an initial state $(u^0, c^0)$ is given, then a minimizing sequence $(u^k, c^k)$ is constructed by

$$u^{k+1} = \arg\min_{u \in K} E_{CV}(u, c^k), \qquad (6)$$

$$c^{k+1} = \arg\min_{c \in \mathbb{R}^2} E_{CV}(u^{k+1}, c). \qquad (7)$$

The sub-problem (7) is a simple quadratic optimization whose solution is readily given by the formulas in (2) with $u = u^{k+1}$. We therefore proceed to describe the theory and algorithms for the continuous cut solution of the sub-problem (6). If $c$ is fixed then the minimization of (1) over $K$ is equivalent to minimization over $K$ of the energy

$$\hat{E}(u) = J(u) + \frac{\lambda}{2}\big\langle (I - c_1)^2 - (I - c_0)^2, u \big\rangle := J(u) + \langle g, u \rangle, \qquad (8)$$

where $g = (\lambda/2)[(I - c_1)^2 - (I - c_0)^2]$ is the data term. We now prove the result in Chan et al. [5] referred to in the Introduction, that minimization of $\hat{E}$ over binary $u$'s can be obtained from the solution of the convex variational problem $\inf_{u \in K} \hat{E}(u)$ by truncation. For $u \in BV(\Omega)$, let $u^t$ denote the function defined in (4). We recall:

The Truncation Lemma. If $u \in K$ solves $\inf_K \hat{E}$, then so does $u^t$ for almost all $t \in [0, 1]$.

Proof. The coarea formula, $J(u) = \int_0^1 J(u^t)\,dt$, and the layer cake representation $\langle g, u \rangle = \int_0^1 \langle g, u^t \rangle\,dt$, together yield $\hat{E}(u) = \int_0^1 \hat{E}(u^t)\,dt$. Since $u^t \in K$ it is admissible, and $\hat{E}(u^t) \ge \hat{E}(u)$ for all $t$ by assumption. Hence the non-negative integrand on the left hand side of $\int_0^1 \big[\hat{E}(u^t) - \hat{E}(u)\big]\,dt = 0$ must be zero for almost all $t \in [0, 1]$.
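The truncation (4) and the layer cake representation used in the proof are easy to reproduce on a pixel grid. The following sketch is illustrative only: the function names are hypothetical, and the integral over t is approximated by a midpoint rule.

```python
import numpy as np

def threshold(u, t):
    """Truncation u^t of formula (4): 1 where u > t, 0 otherwise."""
    return (np.asarray(u) > t).astype(float)

def layer_cake_average(u, n_levels=1000):
    """Midpoint-rule approximation of the integral of u^t over t in [0, 1].

    For u with values in [0, 1] the result approximates u itself.
    """
    levels = (np.arange(n_levels) + 0.5) / n_levels
    return np.mean([threshold(u, t) for t in levels], axis=0)
```

Because the data term is linear in u, the same averaging applied to the inner product of g with u^t recovers the inner product of g with u, which is the second identity invoked in the proof.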

In Chan et al. [5], the minimum was approximated by solving a degenerate parabolic PDE for $u$ (the gradient descent PDE) with an exact penalty term to ensure that the constraint $0 \le u \le 1$ is satisfied at all times. This PDE was implemented with an explicit finite difference scheme, and is therefore rather slow. We have chosen another method, introduced by Aujol and Chambolle [13] and used successfully by Bresson et al. [14, Sect. 3.2]. This consists of minimizing a variant of (8) which has been regularized slightly by infimal convolution with a quadratic function:

$$\inf_{v \in BV,\ u \in K} \Big\{ J(v) + \frac{1}{2\theta}\|v - u\|^2 + \langle g, u \rangle \Big\}, \qquad (9)$$

where $\theta > 0$ is a parameter; the problem (8) is recovered as $\theta \to 0$. For $\theta$ fixed, the problem is solved iteratively using what we call the ABC-algorithm: If $(v^0, u^0)$ denotes an initial guess, then a minimizing sequence is given by the pairs $(v^n, u^n)$ where

$$v^{n+1} = \arg\min_{v \in BV} \Big\{ J(v) + \frac{1}{2\theta}\|v - u^n\|^2 \Big\} = u^n - \theta\,\mathrm{Pr}_C(u^n/\theta), \qquad (10)$$

$$u^{n+1} = \arg\min_{u \in K} \Big\{ \frac{1}{2\theta}\|v^{n+1} - u\|^2 + \langle g, u \rangle \Big\} = \mathrm{Pr}_K(v^{n+1} - \theta g). \qquad (11)$$

The first of these problems is the classical Rudin-Osher-Fatemi (ROF) image denoising model [15] with $u^n$ as input image. The second one is a simple $L^2$ optimization. Both problems are strictly convex and thus admit unique solutions, and, as indicated, their optima can be expressed in terms of $L^2$-projections onto closed convex sets: the first projection is onto $C$, which is the $L^2$-closure of the set $\{\operatorname{div}\xi \;;\; \xi \in C^1(\Omega; \mathbb{R}^2),\ |\xi(x)| \le 1\ \forall x \in \Omega\}$, cf. Chambolle [6]. The second projection is onto $K$, defined above. The latter is easy to compute; indeed $\mathrm{Pr}_K f(x) = \min(1, \max(0, f(x)))$ for $x \in \Omega$, for any square integrable function $f : \Omega \to \mathbb{R}$.
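The alternation (10), (11) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' MATLAB implementation: the ROF step uses the fixed-point projection iteration of Chambolle [6] with forward differences and a non-periodic boundary treatment, the iteration counts and the initial guess are arbitrary choices, and all function names are hypothetical.

```python
import numpy as np

def grad(u):
    """Forward-difference gradient; the difference across the last row/column is zero."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    dx = np.zeros_like(px)
    dy = np.zeros_like(py)
    dx[0, :] = px[0, :]
    dx[1:-1, :] = px[1:-1, :] - px[:-2, :]
    dx[-1, :] = -px[-2, :]
    dy[:, 0] = py[:, 0]
    dy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]
    dy[:, -1] = -py[:, -2]
    return dx + dy

def rof_chambolle(f, theta, n_iter=20, tau=0.125):
    """Approximate minimizer of J(v) + ||v - f||^2 / (2 theta), cf. (10)."""
    px = np.zeros_like(f)
    py = np.zeros_like(f)
    for _ in range(n_iter):
        gx, gy = grad(div(px, py) - f / theta)
        norm = np.sqrt(gx ** 2 + gy ** 2)
        px = (px + tau * gx) / (1.0 + tau * norm)
        py = (py + tau * gy) / (1.0 + tau * norm)
    return f - theta * div(px, py)  # f - theta * Pr_C(f / theta)

def project_K(f):
    """Pointwise projection onto K: clip values to [0, 1]."""
    return np.clip(f, 0.0, 1.0)

def abc_minimize(g, theta=0.5, n_outer=10, n_inner=20):
    """Alternate the ROF step (10) and the projection step (11) for a data term g."""
    u = (g < 0).astype(float)  # initial guess: label 1 where the data term favors it
    for _ in range(n_outer):
        v = rof_chambolle(u, theta, n_inner)
        u = project_K(v - theta * g)
    return u
```

A binary segmentation is then obtained by thresholding, for example `abc_minimize(g) > 0.5`, in line with the Truncation Lemma.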

To minimize the ROF functional (10) we use a variant of the fast and reliable algorithm for TV-minimization proposed by Chambolle [6, 16].

2.2 The Algorithm of Fundana and Co-workers

A recent paper by Fundana et al. [12] contains what is probably the first attempt to include shape priors into continuous cut segmentation. The authors consider the model (5) where $f = f_0 \circ T$ is pose invariant under the group of similarity transformations $T$ of the plane, i.e. the variational problem

$$\inf_{u, c, T} E(u, c, T) := E_{CV}(u, c) + \frac{\gamma}{2}\|u - f_0 \circ T\|^2. \qquad (12)$$

This problem cannot be solved by continuous cuts (for $c$ and $T$ fixed) simply by enlarging the admissible label functions from the binary $u$'s to $u \in K$. The problem, of course, lies in the quadratic interaction term, which seems to "spoil" the Truncation Lemma. In [12] this problem is cleverly circumvented by the following construction: If $(u^0, c^0, T^0)$ denotes an initial guess then a minimizing sequence $(u^k, c^k, T^k)$ is (essentially) constructed by the following procedure:

$$c^{k+1} = c(u^k) \quad \text{using formula (2)}, \qquad (13)$$

$$T^{k+1} = T^k - \Delta t\,\frac{\partial}{\partial T} E(u^k, c^{k+1}, T^k), \quad \text{time step } \Delta t > 0, \qquad (14)$$

$$u^{k+1} = \arg\min_{u \in K} \Big\{ E_{CV}(u, c^{k+1}) + \frac{\gamma}{2}\big\langle u - f_0 \circ T^{k+1},\ u^k - f_0 \circ T^{k+1} \big\rangle \Big\}. \qquad (15)$$

Here we observe that by "freezing" one occurrence of $u = u^k$ in the quadratic interaction term, the update step (15) becomes linear in $u$, hence solvable by continuous cut methods. In [12] this minimization was performed using the gradient descent PDE from [5]. Our aim is to improve the above method by formulating the problem in such a way that the model itself, not only the algorithm, satisfies the truncation property.
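To make the linearity in (15) explicit, one can expand the frozen interaction term. The short derivation below is a sketch that introduces the shorthand $f^{k+1} := f_0 \circ T^{k+1}$ (not used in the original text) and reuses the data term $g$ of (8), evaluated at $c = c^{k+1}$.

```latex
\begin{align*}
\frac{\gamma}{2}\big\langle u - f^{k+1},\, u^k - f^{k+1} \big\rangle
  &= \frac{\gamma}{2}\big\langle u,\, u^k - f^{k+1} \big\rangle
   - \frac{\gamma}{2}\big\langle f^{k+1},\, u^k - f^{k+1} \big\rangle, \\
\text{so that}\quad
u^{k+1} &= \arg\min_{u \in K}\Big\{ J(u)
   + \big\langle g + \tfrac{\gamma}{2}\,(u^k - f^{k+1}),\, u \big\rangle \Big\},
\end{align*}
```

since the second term on the right of the first line, like the constant part of $E_{CV}(u, c^{k+1})$, does not depend on $u$. The update therefore has the same form as (8), with a modified data term, and the Truncation Lemma applies to each step.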

3 The Shape Prior Segmentation Model

3.1 The Basic Energy Functional

Our reformulation of the functional (5) is based on the following observation: If the label function $u : \Omega \to \{0, 1\}$ is binary, and we define an image model by $I_{model} = I_{model}(u, c) = c_0(1 - u) + c_1 u$, then it is easy to see that the CV-functional (1) may be rewritten as

$$E_{CV}(u, c) = J(u) + \frac{\lambda}{2}\|I - I_{model}\|^2. \qquad (16)$$

This suggests the following model for shape prior segmentation: If $f : \Omega \to \mathbb{R}$ denotes a (possibly fuzzy) shape prior, that is $0 \le f(x) \le 1$ on $\Omega$, then we associate an image model to $f$ given by $I_{prior} = I_{prior}(f, b) = b_0(1 - f) + b_1 f$. We now pose shape prior segmentation as the minimization over all binary label functions $u$ of the following functional:

$$E(u, c, f, b) = E_{CV} + E_{prior} = J(u) + \frac{\lambda}{2}\|I - I_{model}\|^2 + \frac{\mu}{2}\|I_{model} - I_{prior}\|^2. \qquad (17)$$

Notice that close to convergence, it is reasonable to expect that $b_0 \approx c_0$ and $b_1 \approx c_1$. Assuming that exact equality holds here, we find that

$$\frac{\mu}{2}\|I_{model} - I_{prior}\|^2 = \frac{\mu}{2}(c_1 - c_0)^2\|u - f\|^2, \qquad (18)$$

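which corresponds to the interaction term in (5) if we set $\gamma = \mu(c_1 - c_0)^2$. We will use this simplification in Section 3.2. Let us consider the minimization of (17) with respect to $u$ and $c$ when the prior data $b$ and $f$ are kept fixed. After completion of squares in (17) we find that

$$E(u, c, f, b) = J(u) + \frac{\lambda + \mu}{2}\Big\| I_{model} - \Big(\frac{\lambda}{\lambda + \mu} I + \frac{\mu}{\lambda + \mu} I_{prior}\Big) \Big\|^2 + \frac{\lambda}{2}\|I\|^2 + \frac{\mu}{2}\|I_{prior}\|^2 - \frac{\lambda + \mu}{2}\Big\| \frac{\lambda}{\lambda + \mu} I + \frac{\mu}{\lambda + \mu} I_{prior} \Big\|^2. \qquad (19)$$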

Only the first square depends on the (binary) $u$ and $c$. So updating $u$ and $c$ is equivalent to solving the following CV-problem:

$$\inf \Big\{ J(u) + \frac{\lambda + \mu}{2}\Big[ \langle 1 - u, (I_{eff} - c_0)^2 \rangle + \langle u, (I_{eff} - c_1)^2 \rangle \Big] \Big\}. \qquad (20)$$

Here $I_{eff} = \frac{\lambda}{\lambda + \mu} I + \frac{\mu}{\lambda + \mu} I_{prior}$ is an effective image obtained as a convex combination of the observed image $I$ and the prior image $I_{prior}$. The problem (20) has the truncation property, and may be solved by the CV-algorithm (6), (7), using continuous cuts. This solution is a minimizer of (17). Suppose that $c$ and $u$ have been updated and are now held fixed. Returning to the energy $E$, written in the original form (17), we optimize with respect to the prior image model $I_{prior} = b_0(1 - f) + b_1 f$. An easy calculation shows that the optimal gray scales $b = (b_0, b_1)$ are given by the formulas

$$b_0 = \frac{\langle 1 - f, I_{model} \rangle}{\|1 - f\|^2} \quad\text{and}\quad b_1 = \frac{\langle f, I_{model} \rangle}{\|f\|^2}.$$
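The two ingredients of this update, the effective image of (20) and the gray-scale formulas for b, translate directly into array operations. The sketch below assumes pixel sums for the inner products; the function names are illustrative and not taken from the paper.

```python
import numpy as np

def two_level_image(w, g0, g1):
    """Image model g0 * (1 - w) + g1 * w, used for both I_model and I_prior."""
    w = np.asarray(w, dtype=float)
    return g0 * (1.0 - w) + g1 * w

def effective_image(image, prior_image, lam, mu):
    """Convex combination I_eff of the observed image and the prior image, cf. (20)."""
    return (lam * np.asarray(image, dtype=float)
            + mu * np.asarray(prior_image, dtype=float)) / (lam + mu)

def prior_gray_values(f, model_image):
    """Gray scales b0 = <1-f, I_model>/||1-f||^2 and b1 = <f, I_model>/||f||^2."""
    f = np.asarray(f, dtype=float)
    b0 = np.sum((1.0 - f) * model_image) / np.sum((1.0 - f) ** 2)
    b1 = np.sum(f * model_image) / np.sum(f ** 2)
    return b0, b1
```

With these helpers, one outer iteration would run the CV-algorithm on `effective_image(...)` with weight lam + mu, and then refresh b from the updated image model.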

With these values fixed, we proceed to update the pose of the shape prior $f$, which is the subject of the next few sections.

3.2 Pose Invariant Prior Interaction Energy

Let $f_0 : \Omega \to \mathbb{R}$ denote a shape template of class $C_0^1(\Omega)$, and $T : \mathbb{R}^2 \to \mathbb{R}^2$ a similarity transformation, that is, a mapping of the form $y = T(x) = \mu^{-1}R^{-1}(x - a)$, $x \in \mathbb{R}^2$, where $R \in SO(2)$ denotes a rotation, $\mu > 0$ a scaling factor, and $a \in \mathbb{R}^2$ a translation. We define the shape prior $f$ as the transformed template $T^*f_0 : \mathbb{R}^2 \to \mathbb{R}$ by the formula $f(x) = T^*f_0(x) = (f_0 \circ T)(x) = f_0(T(x))$ for all $x \in \mathbb{R}^2$. If $T$ is sufficiently close to the identity map then, clearly, $T^*f_0 \in C_0^1(\Omega)$, so that the support of the prior remains inside the image domain $\Omega$. In the present paper we use the simplification of (17) in (18) and consider a pose invariant prior interaction defined by the energy

$$E_{prior}(u) = \inf_T \|u - T^*f_0\|^2 = \inf_T \int_\Omega \big(u(x) - f_0(T(x))\big)^2\,dx, \qquad (21)$$

where the infimum is taken over the group of similarity transforms $T$ in the plane. The following (natural) parametrization is used throughout:

$$a \in \mathbb{R}^2, \quad \mu = e^\sigma\ (\sigma \in \mathbb{R}), \quad\text{and}\quad R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\ (\theta \in \mathbb{R}). \qquad (22)$$

The pose parameters $\theta$, $\sigma$ and $a$ are collected in a vector $p = (p_1, p_2, p_3, p_4) := (\theta, \sigma, a) \in \mathbb{R}^4$, the corresponding map is occasionally denoted $T = T(p)$, and the shape prior becomes $f(x) = T^*f_0(x) = T(p)^*f_0(x) = f_0(e^{-\sigma}R(-\theta)(x - a))$.

Now, the infimum in (21) is usually computed by applying a gradient descent procedure to the function $\mathbb{R}^4 \ni p \mapsto E(p) := \|u - T(p)^*f_0\|^2/2$. That is, one solves a system of ODEs given by $p'(t) = -\nabla E(p(t))$ with respect to an artificial time parameter $t$, and then obtains the optimal pose $p^*$ as $p^* = \lim_{t \to \infty} p(t)$. This method requires the computation of the partial derivatives $\partial E(p)/\partial p_i$ for every component $p_i$ of $p$. A simple calculation shows that $\partial E(p)/\partial p_i = \langle T(p)^*f_0 - u,\ \partial T(p)^*f_0/\partial p_i \rangle$, so we begin with the partials $\partial T(p)^*f_0(x)/\partial p_i$. By the chain rule,

$$\frac{\partial}{\partial a} T^*f_0(x) = -\nabla_x T^*f_0(x) = -\nabla_x f(x) \quad \text{(two components!)}$$

$$\frac{\partial}{\partial \theta} T^*f_0(x) = -\nabla_x T^*f_0(x)^T J(x - a) = -\nabla_x f(x)^T J(x - a)$$

$$\frac{\partial}{\partial \sigma} T^*f_0(x) = -\nabla_x T^*f_0(x)^T (x - a) = -\nabla_x f(x)^T (x - a)$$

where $J = R(-\theta)^T R'(-\theta) = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ is the clockwise rotation by $\pi/2$ radians. Notice that $-\nabla_x f$ appears in all the formulas, with the $x$-derivative computed after transformation of the template. It follows from the above formulas that the partial derivatives of $E(p)$ are given by (the first equation being interpreted componentwise)

$$\frac{\partial}{\partial a} E(\theta, \sigma, a) = -\langle f - u, \nabla_x f \rangle, \quad \frac{\partial}{\partial \theta} E(\theta, \sigma, a) = -\langle f - u, \nabla_x f^T J(\cdot - a) \rangle, \quad\text{and}\quad \frac{\partial}{\partial \sigma} E(\theta, \sigma, a) = -\langle f - u, \nabla_x f^T (\cdot - a) \rangle. \qquad (23)$$
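The parametrization (22) and the resulting prior $f(x) = f_0(e^{-\sigma}R(-\theta)(x - a))$ can be written down directly. The sketch below assumes that the template is available as a Python callable acting on arrays of points, which sidesteps interpolation; this representation and the function names are assumptions made for illustration.

```python
import numpy as np

def rotation(theta):
    """Rotation matrix R(theta) of (22)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def similarity_map(p, x):
    """y = T(p)x = exp(-sigma) * R(-theta) (x - a) for points x of shape (..., 2).

    p = (theta, sigma, a1, a2) follows the ordering used in the text.
    """
    theta, sigma = p[0], p[1]
    a = np.asarray(p[2:4], dtype=float)
    x = np.asarray(x, dtype=float)
    # Row-vector convention: (R y_col)^T = y_row R^T.
    return np.exp(-sigma) * (x - a) @ rotation(-theta).T

def transformed_template(f0, p):
    """Shape prior f = f0 o T(p), returned as a callable on points."""
    return lambda x: f0(similarity_map(p, x))
```

For instance, with `f0` the indicator of the unit disk and `p = (0, log 2, 3, 0)`, the returned prior is the indicator of the disk of radius 2 centred at (3, 0).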

These integrals are effectively computed on the support of $-\nabla_x f$, that is, over a neighbourhood of the boundary of the shape prior. The traditional way to proceed is to iteratively update the pose parameters $a$, $\theta$ and $\sigma$ using (essentially) the schemes $a(t + \Delta t_a) = a(t) - \Delta t_a \cdot \partial E/\partial a$, $\theta(t + \Delta t_\theta) = \theta(t) - \Delta t_\theta \cdot \partial E/\partial\theta$, and $\sigma(t + \Delta t_\sigma) = \sigma(t) - \Delta t_\sigma \cdot \partial E/\partial\sigma$. This is problematic: in order for this method to work properly the time steps $\Delta t_a$, $\Delta t_\theta$ and $\Delta t_\sigma$ have to be chosen differently, and with great care. This is not only unsatisfying from a theoretical viewpoint, but it also limits the practical applicability of the method, not least because the delicate choice of time steps tends to be time-consuming. We address this problem in the next section.

3.3 The Gradient Construction

The group of similarity transformations constitutes a four-dimensional manifold that we denote $M$ (i.e., $M$ is a Lie group). Any point $p \in M$ may be represented by the coordinates $p = (\theta, \sigma, a)$ using (22), which may be regarded as an (almost global) parametrization of a neighbourhood of the identity map in $M$. If $E : M \to \mathbb{R}$ is a differentiable function then $dE(p) : T_pM \to \mathbb{R}$ denotes the differential of $E$ at $p \in M$, where $T_pM$ is the tangent space of $M$ at $p$. In the local coordinates the differential may be expressed as $dE = \frac{\partial E}{\partial a}\,da + \frac{\partial E}{\partial \theta}\,d\theta + \frac{\partial E}{\partial \sigma}\,d\sigma$. Suppose that $T_pM$ is equipped with a scalar product $(\cdot, \cdot)_p$; then we may define the gradient of $E$ at $p$ as the unique vector $\nabla E(p) \in T_pM$ which satisfies the relation

$$(\nabla E(p), v)_p = dE(p)v, \quad \forall v \in T_pM. \qquad (24)$$

The metric $ds^2 = |da|^2 + d\theta^2 + d\sigma^2$ defines a scalar product which, as already noted, is insufficient for the construction of a reliable gradient descent scheme for $E(p) = \|u - T(p)^*f_0\|^2/2$. Our goal is to define a Riemannian structure on $M$ which is better suited for this task. Let a function $f : M \times \mathbb{R}^2 \to \mathbb{R}$ be defined by $f(p, x) = T(p)^*f_0(x) = f_0(T(p)x)$. Since the shape template $f_0 \in L^2(\mathbb{R}^2)$, the mapping $p \mapsto f(p, \cdot)$ is a function $f : M \to L^2(\mathbb{R}^2)$. Now, $L^2(\mathbb{R}^2)$ comes with an inner product $\langle \cdot, \cdot \rangle$, so it is natural to define the scalar product on $T_pM$ as the pullback by $f$ of the $L^2$-inner product to the tangent space $T_pM$,

$$(v, w)_p = \langle df(p)v, df(p)w \rangle, \quad (v, w \in T_pM), \qquad (25)$$

where $df(p) : T_pM \to T_{f(p)}L^2(\mathbb{R}^2) \equiv L^2(\mathbb{R}^2)$ denotes the differential of $f$. By the chain rule, $df = -(D_x f_0 \circ T)\,dT$, so in view of the identity $dT = DT\,dp = D_xT(p)\,DT(0)\,dp$ (which uses the group structure of $M$) we see that $df = -\nabla_x f^T\,DT(0)\,dp$, where $DT(0)$ is the linear map given by the block matrix

$$DT(0) = \begin{bmatrix} I_{2\times 2} & J(x - a) & (x - a) \end{bmatrix}.$$

As before, $J = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$. With this calculation we find that

$$\langle df(p)v, df(p)w \rangle = \big\langle -\nabla_x f^T DT(0)\,dp(v),\ -\nabla_x f^T DT(0)\,dp(w) \big\rangle = dp(v)^T \big\langle 1,\ DT(0)^T \nabla_x f \nabla_x f^T DT(0) \big\rangle\,dp(w) := dp(v)^T G(p)\,dp(w),$$

where $G(p)$ denotes the metric tensor on $T_pM$ expressed in the coordinates $p$. If we define $M = \nabla_x f \nabla_x f^T$ then $G(p) = \langle 1, g(p, \cdot) \rangle$ where $g(p, \cdot) : \mathbb{R}^2 \to \mathbb{R}^{4\times 4}$ is given by $g(p, x) = DT(0)^T M\,DT(0)$, which equals

$$\begin{bmatrix} M & MJ(x - a) & M(x - a) \\ (x - a)^T J^T M & (x - a)^T J^T M J(x - a) & (x - a)^T J^T M (x - a) \\ (x - a)^T M & (x - a)^T M J(x - a) & (x - a)^T M (x - a) \end{bmatrix}.$$

This expression is, unfortunately, too complicated for our present purpose, so we need to make some simplification. This is achieved by approximating the structure tensor $M$ by the simpler tensor $\frac{1}{2}|\nabla_x f|^2 I_{2\times 2}$. (There are some compelling reasons for doing so! For instance $g_{3,3} + g_{4,4} = |\nabla_x f|^2|x - a|^2$.) With this simplification we get, up to the constant factor $\frac{1}{2}$, which merely rescales the metric,

$$g(p, x) = |\nabla_x f|^2 \begin{bmatrix} I_{2\times 2} & J(x - a) & (x - a) \\ (x - a)^T J^T & |x - a|^2 & (x - a)^T J^T (x - a) \\ (x - a)^T & (x - a)^T J(x - a) & |x - a|^2 \end{bmatrix},$$

where we notice that, in fact, the matrix elements $g_{4,3} = g_{3,4} = 0$ because $J$ is skew-symmetric. Finally, if we choose $a$ (the center of rotation and scaling) such that $\langle |\nabla_x f|^2, x - a \rangle = 0$, that is, as the barycenter of the mass distribution $dm = |\nabla_x f|^2\,dx$, then the metric tensor $G = \langle 1, g \rangle$ has the following diagonal form:

$$G(p) = \begin{bmatrix} \|\nabla_x f\|^2 I_{2\times 2} & 0 & 0 \\ 0 & \||x - a|\nabla_x f\|^2 & 0 \\ 0 & 0 & \||x - a|\nabla_x f\|^2 \end{bmatrix}. \qquad (26)$$

Equivalently, $(dp, dp)_p = \|\nabla_x f\|^2|da|^2 + \||x - a|\nabla_x f\|^2(d\theta^2 + d\sigma^2)$. It follows from (25) and the formulas (23) that the corresponding gradient of $E$ has the components

$$\nabla_a E = \frac{\langle f - u, -\nabla_x f \rangle}{\|\nabla_x f\|^2}, \quad \nabla_\theta E = \frac{\langle f - u, -\nabla_x f^T J(\cdot - a) \rangle}{\||x - a|\nabla_x f\|^2}, \quad\text{and}\quad \nabla_\sigma E = \frac{\langle f - u, -\nabla_x f^T (\cdot - a) \rangle}{\||x - a|\nabla_x f\|^2}. \qquad (27)$$

This is the gradient used in our implementation of gradient descent search for the optimal pose parameters. Its use amounts to an adaptive step-size control in the numerical discretization of the associated system of ODEs.
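As an illustration of how the normalization in (27) acts, the following sketch evaluates the translation and scale components on a pixel grid. It is a simplified example, not the authors' implementation: the rotation component is omitted for brevity, derivatives are taken with `np.gradient`, inner products are pixel sums, and the function name is hypothetical.

```python
import numpy as np

def pose_gradient(f, u, X, Y):
    """Translation and scale components of the gradient (27).

    f    : transformed shape prior sampled on the grid
    u    : label function on the same grid
    X, Y : pixel coordinates, e.g. from np.meshgrid(np.arange(nx), np.arange(ny))
    Returns (grad_a, grad_sigma, a), where a is the barycenter of |grad f|^2.
    """
    fy, fx = np.gradient(f)            # derivatives along rows (y) and columns (x)
    w = fx ** 2 + fy ** 2              # mass density |grad f|^2
    ax = np.sum(w * X) / np.sum(w)     # barycenter, so that <|grad f|^2, x - a> = 0
    ay = np.sum(w * Y) / np.sum(w)
    rx, ry = X - ax, Y - ay
    r = f - u
    grad_a = np.array([np.sum(r * -fx), np.sum(r * -fy)]) / np.sum(w)
    grad_sigma = np.sum(r * -(fx * rx + fy * ry)) / np.sum((rx ** 2 + ry ** 2) * w)
    return grad_a, grad_sigma, (ax, ay)
```

For a small shift d of an isotropic blob, the translation component comes out at roughly -d/2 regardless of the contrast of the blob, which illustrates the built-in step-size normalization: a descent step with unit time step moves the prior a fixed fraction of the way towards the target.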

Fig. 1. Experiment 1: First row: The original image, 212×320 pixels (left), the active contour Γ = {x ; u(x) = .5} in CV-segmentation without priors after 100 iterations (middle), and the corresponding segmentation (right). Second row: The shape template, the active contour and the shape prior after 150 iterations, and the final segmentation. Final row: segmentation of the image contaminated with 15% Gaussian noise using 200 iterations. Parameters: μ = .4, λ = .1, θ = .5 and step-size Δt = .75.

4 Experiments

The method presented in Section 3 was implemented in MATLAB with the following speciﬁcs: For the minimization of (17) (in the form (19)) we used the ABC-algorithm (10) and (11) with the parameter θ = 0.5 and a variant of Chambolle’s algorithm [16, Eq. (12)], implemented with periodic boundary conditions, for the TV-minimization in (10). This was alternated with an update of the pose of the prior, using gradient descent with the new gradient (27). The experiments presented here are limited to a proof-of-concept level. The ﬁrst experiment (Figure 1) shows the CV segmentation with and without the shape prior, and with added noise. The segmentation result is displayed as a cutout from the original image by multiplication with the optimal label function u. This veriﬁes the binary character of u. The second experiment (Figure 2) shows how the search evolves for three diﬀerent initializations. As shown, the method may not always converge to the wanted solution. In fact, the prior contour may sometimes even shrink and disappear. These cases correspond, however, to quite plausible local minima for the pose energy, and this behavior is not unexpected in a local optimization method. More details are found in the ﬁgure captions.


Fig. 2. Experiment 2: Shape prior segmentation with three diﬀerent initial poses (top row). Evolution after (approximately) 12, 25, 50, 100 and 200 iterations (rows 2–6). The run-time for 100 iterations is about 25 CPU-seconds. In the ﬁnal phase of the segmentation, objects previously detected outside the prior disappear. With the third initialization the shape prior gets stuck in a local minimum. Such behavior cannot be ruled out when we work with local optimization methods. Image size and parameter settings are as in Experiment 1.

5 Conclusion

This paper contains two central contributions. Firstly, the reformulation in (17) of the shape prior segmentation model in (5), which leads to a minimization problem which can be solved using continuous cut methods. Secondly, the derivation of the gradient expressions (27), which is the basis for a stable and eﬃcient gradient descent scheme for prior pose optimization. We believe that the ideas introduced here can be extended to cover more general and complex shape prior segmentation models. In particular it would be interesting to see if the ideas can be applied to pose problems in three dimensions.

References

1. Chan, T., Vese, L.: Active contours without edges. IEEE Transactions on Image Processing 10(2), 266–277 (2001)
2. Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations. Journal of Computational Physics 79, 12–49 (1988)
3. Sethian, J.: Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Materials Science. Cambridge University Press, Cambridge (1999)
4. Osher, S.J., Fedkiw, R.P.: Level Set Methods and Dynamic Implicit Surfaces. Springer, Heidelberg (2002)
5. Chan, T.F., Esedoğlu, S., Nikolova, M.: Algorithms for finding global minimizers of image segmentation and denoising models. SIAM J. Appl. Math. 66(5), 1632–1648 (2006)
6. Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging and Vision 20(1–2), 89–97 (2004)
7. Leventon, M., Grimson, W., Faugeras, O.: Statistical shape influence in geodesic active contours. In: CVPR (2000)
8. Rousson, M., Paragios, N.: Shape priors for level set representations. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2351, pp. 78–92. Springer, Heidelberg (2002)
9. Cremers, D., Soatto, S.: A pseudo-distance for shape priors in level set segmentation. In: Faugeras, O., Paragios, N. (eds.) 2nd IEEE Workshop on Variational, Geometric and Level Set Methods in Computer Vision (2003)
10. Chan, T., Zhu, W.: Level set based prior segmentation. Technical Report UCLA CAM 03-66, Department of Mathematics, UCLA (2003)
11. Riklin-Raviv, T., Kiryati, N., Sochen, N.: Unlevel-sets: Geometry and prior-based segmentation. In: Pajdla, T., Matas, J(G.) (eds.) ECCV 2004. LNCS, vol. 3024, pp. 50–61. Springer, Heidelberg (2004)
12. Fundana, K., Heyden, A., Gosch, C., Schnörr, C.: Continuous graph cuts for prior-based object segmentation. In: Proc. ICPR (2008)
13. Aujol, J.-F., Chambolle, A.: Dual norms and image decomposition models. Int. J. Comput. Vis. 63(1), 85–104 (2005)
14. Bresson, X., Esedoğlu, S., Vandergheynst, P., Thiran, J.-P., Osher, S.: Fast global minimization of the active contour/snake model. J. Math. Imaging Vis. 28(2), 151–167 (2007)
15. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
16. Chambolle, A.: Total variation minimization and a class of binary MRF models. UMR CNRS 7641, Ecole Polytechnique, Centre de mathématiques appliquées (June 2005)

A Non-local Approach to Shape from Ambient Shading

Emmanuel Prados¹, Nitin Jindal¹, and Stefano Soatto²

¹ Perception Lab., INRIA Grenoble – Rhône-Alpes, France
² Computer Science Department, University of California, Los Angeles, USA

Abstract. We study the mathematical and numerical aspects of the estimation of the 3-D shape of a Lambertian scene seen under diffuse illumination. This problem is known as "shape from ambient shading" (SFAS), and its solution consists of integrating a strongly non-local and non-linear Integro-Partial Differential Equation (I-PDE). We provide a first analysis of this global I-PDE, whereas previous work had focused on a local version that ignored effects such as occlusion of the light field. We also design an original approximation scheme which, following Barles and Souganidis' theory, ensures the correctness of the numerical approximations, and we discuss some numerical issues.

1 Introduction

Shape From Shading (SFS) refers to the problem of computing the three-dimensional shape of a surface, under certain assumptions on its reflectance and on the illumination, from a single grayscale image. By necessity, to render the problem tractable, these assumptions are rather coarse: most restrict the illumination to a single point-light source at infinity [20, 4, 13, 7]. Only recently, [14] showed that the problem actually simplifies when the attenuation of the light source at finite distance is taken into account. Nevertheless, due to inter-reflections and other complex phenomena, modeling illumination as a point source is very unrealistic even on a bright sunny day. Indeed, in most realistic conditions, including indoor and overcast outdoor conditions, a uniform hemispherical illumination source is a more realistic model. The study of SFS under such illumination conditions was pioneered by Langer et al. [10, 16, 9], and followed by others that we discuss shortly. In this work, we focus on the mathematical properties of the problem of "Shape From Ambient Shading" (SFAS), and seek conditions that render the problem well-posed.

1.1 Relation to Prior Work

Langer et al. [10, 16, 9] were the first to consider the case of ambient lighting, and to note that vignetting effects, far from being a nuisance, enable the inference of object shape similar to more traditional SFS, except for the added complication of the distributed source. In [17], Tian, Tsui and Yeung proposed a numerical SFS algorithm for dealing with some non-punctual and multiple light sources (any combination of spherical, rectangular and cylindrical light sources). Following a more elaborate and physically motivated model of illumination, [12, 18, 9, 19] introduced methods to deal with interreflection. However, in none of these works [17, 10, 16, 9, 12, 18] are the mathematical properties of the SFAS problem elucidated analytically. In particular, there are no results on the existence and uniqueness of solutions for the ensuing global PDE. At the opposite end of the spectrum, Lions, Rouy and Tourin [11] performed a theoretical analysis of the SFS problem for multiple and continuously distributed light sources. Like Tian, Tsui and Yeung [17], Lions, Rouy and Tourin neglect shadows (i.e. occlusions of the light sources by the surface itself); more specifically, they assume that for any fixed point x on the surface, all the light sources located on the hemisphere normal to the surface at x are visible from this point. This allows them to neglect the global nature of the equation, which in turn significantly simplifies the analysis. Like Langer et al. [10, 16, 9], we focus on ambient lighting. In their work, Langer et al. do not neglect the "shadows effect" and they model interreflections. They also underline the importance of ambient lighting in psychophysics. In this context, light comes from all directions and the assumption of Lions, Rouy and Tourin [11] is equivalent to assuming that the solution is concave. Here, we do not want to limit ourselves to concave objects. Therefore, Lions' constraints are far too restrictive.¹ The necessity to consider these phenomena takes us to mathematically uncharted territories. To the best of our knowledge, we are the first to provide theoretical results for the SFAS problem. Also, we introduce numerical algorithms verifying the properties of monotonicity, consistency and stability, which typically ensure convergence (see [1]).

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 696–708, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 Modeling Shape from Ambient Shading

Shape From Shading exploits assumptions on the illumination and reflectance properties of the scene (or of an object of interest within the scene) to relate its three-dimensional (3-D) shape to the measured grayscale image. The most typical assumptions are that the scene is Lambertian with constant diffuse albedo. This is akin to chalk and rough stone, and neglects specularities, translucency and other complex phenomena in the interaction of light with matter. While this assumption is clearly violated in most natural and man-made scenes, there are significant portions of scenes where the assumption is reasonable, and even objects that are far from Lambertian, such as human faces, have been successfully approximated as such for the purpose of analysis and inference (but not for synthesis, as humans are evolutionarily attuned to discriminate subtle features in human faces). Clearly, since SFS is an ill-posed problem, there is no way to validate the assumptions on the data themselves, so applying SFS to a scene that is not Lambertian and that does not have constant diffuse albedo will result in gross errors even if the SFS algorithm used is provably correct and optimal. The second class of assumptions commonly made concerns illumination. The most common assumption, that of a point light source, is made more for mathematical convenience than for realism. Under this model, anything hidden from direct line-of-sight to the sun would be invisible, clearly a far cry from reality. Modeling the entire sky as a constant-radiance hemisphere seems to be equally crude, but indeed it has been shown to be a better approximation than a single point-light source [8]. Clearly, both phenomena are important and we look forward to their eventual integration. In the next subsection we formalize these assumptions and introduce our notation.

¹ For simplicity, however, we also neglect interreflections, as Lions et al. [11] did, and we lump their contribution into the ambient illumination term, up to additive errors.

2.1 Reflectance Assumptions

Let $S$ be a surface that supports a bi-directional reflectance distribution function (BRDF) $\beta$ with Lambertian reflection and constant diffuse albedo $\rho$. In other words, following [6], the BRDF at a point $p \in S$ depends neither on the viewing direction $\nu_{px}$, nor on the light source direction $\nu$, nor on the position of the point $p \in S$ itself: $\beta(p; \nu_{px}, \nu) = \rho$. Because the intensity of the light source is not known, without loss of generality we can assume the albedo to be equal to 1, and attribute the actual value to the light source.

2.2 Lighting Assumptions

We assume the dominating sky principle [10], so we neglect inter-reflections and, for any point of the surface, consider only radiant energy coming from the sky, which is assumed to be a whole sphere of infinite radius. We also assume that the illumination is homogeneous, that is to say, that its power density distribution is constant. This assumption is required if we want to get rid of other constraints while still keeping the problem manageable. Now, unlike most previous work, we want to model the effect of self-occlusions, whereby the light source is only partly visible at each point. Let $q$ be a point in $\mathbb{R}^3$. We call visibility function, and denote by $\chi_S(q; \nu)$, the indicator function of the directions $\nu \in S^2$ from $q$ that are not occluded by $S$: $\chi_S(q; \nu) = 1$ if $\{q + \lambda\nu,\ \lambda \in \mathbb{R}_+\} \cap S = \emptyset$; otherwise, $\chi_S(q; \nu) = 0$. The visibility function specifies whether a point $q$ is reached by the light ray of direction $\nu$. The visibility cone assembles all the visible rays from a point $q \in \mathbb{R}^3$: $C_{S,q} = \{\nu \in S^2 : \chi_S(q; \nu) = 1\}$.

2.3 Resulting Radiance

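Given the assumptions above, the radiance of the surface at a point $p$ is

$$R_S(p) = \int_{S^2} \chi_S(p; \nu)\, \langle \nu, \nu_p \rangle \, d\nu = \int_{C_{S,p}} \langle \nu, \nu_p \rangle \, d\nu = \int_{C_{S,p}} \langle \nu, \nu_p \rangle^{+} \, d\nu, \qquad (1)$$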

where $\nu_p$ is the unit normal vector to the surface $S$ at $p$ (see [6]) and where, for all $a \in \mathbb{R}$, $a^+ = a$ if $a \ge 0$ and $a^+ = 0$ otherwise. Here, the surface is implicitly assumed to be smooth. This ensures that all the light rays visible from a point come from above its tangent plane (the tangent plane would not be defined otherwise). So, for all points $p$ on $S$, all the light rays visible from that point are included in the hemisphere defined by the normal $\nu_p$ to the surface at that point; that is to say, $C_{S,p} \subset \mathrm{Hemi}_{\nu_p}$. Therefore, $\forall \nu \in C_{S,p}$, $\langle \nu, \nu_p \rangle \ge 0$.

Already at this point one can immediately see the difficulty introduced by self-occlusions, for the integration domain of (1) is restricted to the visibility cone $C_{S,p}$, which directly depends on the global geometry of the scene $S$. This is unlike traditional SFS, where the radiance only depends on local properties of the scene, for instance the direction of the normal $\nu_p$ to the surface at a given point. This requires the deployment of a different arsenal of tools than that traditionally considered in SFS (see footnote 2). Unlike most prior work, we consider full ambient illumination. In such a case, the assumptions of [11] are equivalent to assuming that the surface is convex, which is too restrictive an assumption. In the next section we relate the measurements, i.e. the image greyscale, to the unknown, the 3-D shape of the scene, via the model above.
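The radiance integral (1) can be approximated numerically, which makes the role of the visibility cone concrete. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `ambient_radiance`, the midpoint quadrature over the sphere, and the ray-marching visibility test (with its hypothetical parameters `max_dist` and `step`) are all illustrative choices, and the surface is assumed to be the graph of a function $u$.

```python
import numpy as np

def ambient_radiance(u, grad_u, x, n_theta=32, n_phi=64, max_dist=10.0, step=0.05):
    """Approximate R_S(p) of equation (1) at p = (x, u(x)) on the graph of u.

    u(x, y) must accept NumPy arrays; grad_u(x, y) returns (du/dx, du/dy).
    Visibility is tested by marching along each ray up to max_dist
    (a hypothetical cut-off: occluders farther away are ignored).
    """
    p = np.array([x[0], x[1], u(x[0], x[1])], dtype=float)
    gx, gy = grad_u(x[0], x[1])
    normal = np.array([-gx, -gy, 1.0]) / np.sqrt(1.0 + gx * gx + gy * gy)

    # Midpoint quadrature nodes on the whole sphere of directions S^2.
    d_theta, d_phi = np.pi / n_theta, 2.0 * np.pi / n_phi
    thetas = (np.arange(n_theta) + 0.5) * d_theta
    phis = (np.arange(n_phi) + 0.5) * d_phi
    lambdas = np.arange(step, max_dist, step)

    total = 0.0
    for th in thetas:
        for ph in phis:
            nu = np.array([np.sin(th) * np.cos(ph),
                           np.sin(th) * np.sin(ph),
                           np.cos(th)])
            cos_term = float(nu @ normal)
            if cos_term <= 0.0:
                continue  # direction below the tangent plane: <nu, nu_p>^+ = 0
            pts = p[None, :] + lambdas[:, None] * nu[None, :]
            heights = u(pts[:, 0], pts[:, 1])
            occluded = np.any(pts[:, 2] < heights - 1e-9)  # chi_S(p; nu) = 0
            if not occluded:
                total += cos_term * np.sin(th) * d_theta * d_phi
    return total

# A flat plane sees the whole hemisphere; the bottom of a steep valley does not.
flat = ambient_radiance(lambda x, y: 0.0 * x, lambda x, y: (0.0, 0.0), (0.0, 0.0))
valley = ambient_radiance(lambda x, y: 2.0 * (x ** 2 + y ** 2),
                          lambda x, y: (4.0 * x, 4.0 * y), (0.0, 0.0))
print(flat, valley)
```

On the flat plane the estimate should be close to $\pi$, the maximal radiance, while at the bottom of the valley most directions are blocked by the surface itself and the value is much smaller.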

3 Shape from Ambient Shading

In this section we formalize the problem of SFAS as the solution of a global integro-partial differential equation, which we analyze in the next section.

3.1 Imaging Equation

We assume that we measure a greyscale image $I : D \subset \mathbb{R}^2 \to \mathbb{R}^+;\ x \mapsto I(x)$, on a closed domain $D$. Our goal is to characterize the surfaces $S$ which generate it. Note that in general there is no guarantee that the surface is unique. We now need to link the measurements ($I$) with the unknowns ($S$). To do so we use the assumptions developed in the previous section, together with the so-called radiance equation [6], which approximates the brightness of a pixel $x$ of the image with the radiance of the point $\pi_S^{-1}(x)$ of the surface viewed in $x$: $I(x) = R_S(\pi_S^{-1}(x))$. Using the results from the previous section we have

$$I(x) = \int_{C_{S,p}} \langle \nu, \nu_p \rangle^{+} \, d\nu, \qquad (2)$$

where $\nu_p$ is the outward-pointing normal vector to the surface $S$ at the point $p = \pi_S^{-1}(x)$. In what follows we are going to assume that the data $I$ corresponds to an image of a scene verifying our modeling assumptions. In particular, for convenience, we rescale the range so as to have $0 \le I(x) \le \pi$. Also for simplicity, we assume that the camera performs an orthographic projection of the scene. This is a reasonable hypothesis provided that the domain of interest in the scene is small compared to its distance to the camera. Under these conditions, we can represent the surface as the graph of a function $u$, and write the outward unit normal vector explicitly:

$$S = \{(x, u(x));\ x \in D\}; \qquad \nu_{(x,u(x))} = \frac{1}{\sqrt{1 + |\nabla u(x)|^2}}\, (-\nabla u(x), 1).$$

Footnote 2: In order to simplify the problem and to remove this global dependency, Lions, Rouy and Tourin [11] assume that for all the points of the surface, all the light sources located on the normal hemisphere are visible. In other words, they assume that there are no self-shadows. Such an assumption strongly simplifies the problem because we then have $\int_{C_{S,p}} \langle \nu, \nu_p \rangle R_L(\nu)\, d\nu = \int_{S^2} \langle \nu, \nu_p \rangle^{+} R_L(\nu)\, d\nu$, where $R_L(\nu)$ is the power density distribution of the lighting. Also, this completely removes the global dependency of the radiance with respect to the whole shape.

Finally, following [13], we could instead assume that the camera is a pinhole. The orthographic assumption could thus be forgone at the cost of more complicated notation, but the core of the analysis in this paper would still hold.

3.2 Formulation as an Integro-Differential Equation

With the orthographic camera model, the image formation model above can be interpreted as a Partial Differential Equation (PDE) in the unknown function $u$:

$$I(x) = \int_{C_{u,(x,u(x))}} \left\langle \frac{1}{\sqrt{1 + |\nabla u(x)|^2}}\, (-\nabla u(x), 1),\ \nu \right\rangle^{+} d\nu, \qquad (3)$$

where $C_{u,p}$ denotes $C_{S,p}$ (the surface $S$ is represented by the function $u$). Solving the SFAS problem then amounts to integrating the PDE (3) given an image $I$. Clearly the result would be meaningful only if a solution exists, and if it is unique, or at least if one can characterize the set of functions $u$ that are indistinguishable in the sense that they all solve (3) for a given measured image $I$. Note that this equation is a first-order stationary global integro-partial differential equation of the general form $H(x, u(x), \nabla u(x), u(\cdot)) = 0$, $\forall x \in \mathrm{Int}(D)$. The numerical and theoretical study of the solutions of this kind of equation is done via the Hamiltonian

$$H(x, t, p, u) = \int_{C_{u,(x,t)}} \left\langle \frac{1}{\sqrt{1 + |p|^2}}\, (-p, 1),\ \nu \right\rangle^{+} d\nu - I(x).$$
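As a sanity check on the normalization $0 \le I(x) \le \pi$, it may help to evaluate the Hamiltonian at a point that sees its whole normal hemisphere. The short computation below is a worked example under that single assumption (no occlusion at the point considered); the spherical angles $\vartheta, \varphi$ are introduced here only for the calculation.

```latex
% Unoccluded case: C_{u,(x,t)} = \mathrm{Hemi}_{\nu_p}, with \nu_p = (-p, 1)/\sqrt{1+|p|^2}.
% Spherical coordinates (\vartheta, \varphi) with polar axis \nu_p, so that
% \langle \nu, \nu_p \rangle = \cos\vartheta and d\nu = \sin\vartheta \, d\vartheta \, d\varphi.
\begin{align*}
\int_{\mathrm{Hemi}_{\nu_p}} \langle \nu, \nu_p \rangle \, d\nu
  &= \int_0^{2\pi} \int_0^{\pi/2} \cos\vartheta \, \sin\vartheta \, d\vartheta \, d\varphi
   = 2\pi \cdot \tfrac{1}{2} = \pi, \\
H(x, t, p, u) &= \pi - I(x) \qquad \text{whenever } C_{u,(x,t)} = \mathrm{Hemi}_{\nu_p}.
\end{align*}
```

In this case the Hamiltonian no longer depends on the gradient variable $p$, so the equation carries no information about the slope of the surface at such points; only occlusion makes the measured intensity drop below $\pi$.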

4 Analysis of the Shape from Ambient Shading Equation

We now consider the problem of uniqueness of the solution of (3). While we show that the solution is, in general, not unique, we give an analytical characterization of all the different scenes that, under the given assumptions, yield the same measured image. This analysis is important both for the purpose of implementing a viable numerical integration scheme, and also to make SFAS a useful tool in Computer Vision. This is akin to what is done in Structure From Motion [5], where the 3-D structure of a scene is in general not unique, but one can easily characterize the solutions as being equivalence classes under the similarity, affine or projective groups, depending on the available knowledge of the camera calibration.

4.1 An Intrinsic Ambiguity

First, recall that $0 \le R_S(p) \le \pi$ for $p \in S$ and $C_{S,p} \subset \mathrm{Hemi}_{\nu_p}$, so one can easily show that $R_S(p) = \pi$ iff $C_{S,p} = \mathrm{Hemi}_{\nu_p}$. Now, let us consider a completely white image with maximal intensity: $I(x) = \pi$, $\forall x \in D$. With such an image, the solutions of equation (2) satisfy $C_{S,p} = \mathrm{Hemi}_{\nu_p}$ for all the points $p$ on the surface. Therefore, if we represent the surface as the graph of the function $u$, it is easy to see that the surface lies below the tangent plane to the surface at each point $(x, u(x))$. So, the solutions $u$ of (3) are concave, and so is the surface $S$. Since, conversely, all concave functions generate such a white image, we can conclude that the set of solutions is comprised of all concave functions. In this case, the problem is clearly ill-posed because the image can be generated by many different surfaces, and therefore the solution cannot be unique. This problem does not arise only in this pathological case: it appears as soon as the image contains a subset of pixels having the maximal intensity, as we illustrate in Figure 1. Pixels with maximal intensity are shown in red; the green curve corresponds to a maximal solution, while the blue curve gives the minimal one. Any curve between these two which is concave on the set of points with maximal intensity generates the same image as the one generated by the black curve. In the following sections, we will show that this condition is minimal, in the sense that the solution is unique if and only if there are no subsets of pixels having the maximal intensity. Also, when there are multiple solutions, they are characterized in terms of their values on these subsets.

Fig. 1. Example of multiple solutions in dimension 1 when the image contains a subset of pixels having the maximal intensity. Any curve between the blue and the green curves, and which is concave on the set of points with maximal intensity, generates the same image as the one generated by the initial black curve. (Axes: x, u(x).)

4.2 Uniqueness Result and Characterization of the Solutions

In this section we show that the solutions of the SFAS problem are characterized by their values on the subset $\{x \mid I(x) = \pi\} \subset D$. To this end, let us define $\Omega = \{x \mid I(x) < \pi\}$ and let us complete the equation

$$H(x, u(x), \nabla u(x), u) = 0, \quad \forall x \in D \qquad (4)$$

by some Dirichlet boundary conditions on $C\Omega = D - \Omega = \{x \in D \mid I(x) = \pi\}$. In other words, we assume that we know the height of the solution on this subset. The equation then becomes

$$\begin{cases} H(x, u(x), \nabla u(x), u) = 0, & \forall x \in \Omega, \\ u(x) = \varphi(x), & \forall x \in C\Omega. \end{cases} \qquad (5)$$


For mathematical convenience, we also assume that the brightness image $I$ is continuous (then $\Omega$ is an open subset of $D$) and that the intensity is maximal on the boundary of the image (in other words, we assume that $\bar{\Omega} \subset \mathrm{Int}\, D$). We can now state the uniqueness theorem:

Theorem 1. If $u$ and $v$ are two $C^1$ solutions to equation (5) then $u = v$ on $D$.

This theorem ensures that there exists at most one $C^1$ solution to equation (5). Also, it provides a characterization of the set of the solutions of equation (4): each solution is characterized by its values on the subset $C\Omega$ (the region where $I(x) = \pi$). If the image never saturates ($C\Omega$ is empty), then the solution is unique when complemented by a Dirichlet boundary condition. Equivalently, all solutions are parameterized by their boundary conditions. Because of space constraints, we cannot report the complete proof of Theorem 1 here, and we refer the reader to our technical report [15] for details. The relevance of this result from the standpoint of Computer Vision is that if we know the depth of the scene on the subset where the image is saturated, then there exists a unique solution to the Shape From Ambient Shading problem. This means that, elsewhere on the image, ambient shading is sufficient to recover the original surface which generated the image. In the next section we develop an approximation scheme for numerically integrating (5).

5 Approximation Scheme and Numerical Algorithm

In Section 3 we have formalized the SFAS problem as the solution of a partial differential equation of the form $H(x, u(x), \nabla u(x), u) = 0$. We have then added Dirichlet boundary conditions on $C\Omega = D - \Omega$ to arrive at a unique solution when the image is not saturated. In order to compute a reliable numerical solution to this equation, we use machinery available for Hamilton-Jacobi equations. The key point then consists in designing approximation schemes which are monotone [2, 1].

5.1 A Monotonic Scheme

Following [1], we consider schemes of the form $S(h, x, u^\rho(x), u^\rho) = 0$, where

$$S : \mathbb{R}^+ \times \bar{\Omega} \times \mathbb{R} \times B(\bar{\Omega}) \to \mathbb{R} : (h, x, t, u) \mapsto S(h, x, t, u);$$

$h \in \mathbb{R}^+$ defines the size of the grid that is used in the corresponding numerical algorithms (a 2D Cartesian grid); $B(\bar{\Omega})$ is the space of bounded functions defined on the set $\bar{\Omega}$; $u^\rho$ is the unknown ($u^\rho$ is a function). Also, we are interested in the solution $u^\rho$ of the scheme $S$. We say that the scheme $S$ is monotone if for all $h \in \mathbb{R}^+$, $x \in \bar{\Omega}$ and $t \in \mathbb{R}$ the function $S(h, x, t, \cdot) : B(\bar{\Omega}) \to \mathbb{R}$ is monotone. That is, if $u(y) \le v(y)$ for all $y \in \bar{\Omega}$, then $S(h, x, t, u) \ge S(h, x, t, v)$.

An iterative algorithm for computing a numerical approximation of the solution directly follows. Given $u^n$ (the approximation of $u^\rho$ at step $n$) and a point $x$ of $\bar{\Omega}$, the associated algorithm consists in solving the equation

$$S(h, x, t, u^n) = 0 \qquad (6)$$

with respect to $t$. A solution of (6) is the updated value of $u^n$ at $x$. Here, we use the definition of monotonicity given by Barles and Souganidis in [1]:

Definition 1 (monotonicity). The scheme $S(h, x, u^\rho(x), u^\rho) = 0$, defined in $\bar{\Omega}$, is monotone if $\forall h \in \mathbb{R}^+$, $\forall x \in \bar{\Omega}$, $\forall t \in \mathbb{R}$ and $\forall u, v \in B(\bar{\Omega})$,

$$u \le v \implies S(h, x, t, u) \ge S(h, x, t, v)$$

(the scheme is non-increasing with respect to $u$).

The interest of monotonicity is twofold. (i) With other basic assumptions (monotonicity with respect to $t$, existence of a subsolution, bound for the subsolutions), this property is the key to ensuring that the scheme is stable (existence of the solution and of an upper bound) and that the computed approximations converge towards the solution of the scheme; see [13]. (ii) Combined with some stability and consistency properties, monotonicity ensures that the solutions of the scheme converge towards the continuous solution of the considered PDE as the grid size vanishes; see [1].

In what follows, we are going to design a monotonic approximation scheme for the SFAS problem in order to take advantage of all these benefits.

5.2 Monotonic Scheme for the SFAS Problem

For readability, we denote $H_{u,t}(x, p) = H(x, t, p, u)$. Let us recall that the Hamiltonian of interest in SFAS is

$$H_{u,t}(x, p) = \int_{C_{u,(x,t)}} \left\langle \frac{1}{\sqrt{1 + |p|^2}}\, (-p, 1),\ \nu \right\rangle^{+} d\nu - I(x).$$

One can easily verify that $C_{u,(x,t)}$ is decreasing (in the sense of inclusion) with respect to $u$ and increasing with respect to $t$. It follows that $H_{u,t}$ verifies exactly the same monotonicity properties. On the other hand, in order to get a consistent approximation scheme, we have to replace $\nabla u$ (represented by the variable $p$ in the above Hamiltonian) in the PDE by one of its numerical approximations (finite differences). The difficulty is then to find such a discretization while maintaining monotonicity. In order to get a monotonic scheme, we take inspiration from the Lax-Friedrichs scheme for conservation laws [3, 2]. We choose:

$$S(h, x, t, u) = H_{u,t}(x, Du(x)) - \theta\, L u_t(x), \qquad (7)$$

where $Du(x)$ is the vector obtained by a centered discretization of $\nabla u(x)$; more precisely, the $i$th component of $Du(x)$ is

$$[Du(x)]_i = \frac{u(x + h\vec{e}_i) - u(x - h\vec{e}_i)}{2h},$$

and where $L u_t(x)$ is the classical discretization of the Laplacian $\Delta u(x)$ (in which one replaces $u(x)$ by $t$), i.e.

$$L u_t(x) = \sum_{i=1..N} \frac{u(x + h\vec{e}_i) + u(x - h\vec{e}_i) - 2t}{h^2}.$$


This scheme, however, is still not necessarily monotonic. To satisfy this property, we need to find an adequate value for $\theta$. By differential calculus, one can verify that $\max_{i=1..N} h\, |\partial_{p_i} H_{u,t}(x, Dz)| \le 2\theta$ is a sufficient condition to ensure this property; see [15] for a detailed proof. By the same tools, one can also easily prove that $|\partial_{p_i} H_{u,t}(x, p)| \le 2\sqrt{2}\,\pi$. The scheme (7) is then monotonic as soon as $\theta \ge \sqrt{2}\,\pi h$. Also, to limit the smoothing due to the Laplacian term introduced in the scheme (a term which can be interpreted as a regularization), $\theta$ must be as small as possible.

On the other hand, under the assumptions of Section 4.2, one can verify that any deep enough function is a subsolution of the scheme (7) (because the visibility cone becomes arbitrarily small). Moreover, the subsolutions are necessarily bounded by the function corresponding to the convex hull defined by the Dirichlet boundary constraints. Since the scheme is also increasing with respect to $t$ and verifies $\lim_{t \to +\infty} S(h, x, t, u) \ge 0$, Theorems 3.1 and 3.5 of [13] ensure that the scheme (7) is stable and that the iterative approximations converge towards the solution of the scheme. In practice, we can start from any subsolution and we just have to update the surface with scheme (7) until convergence. Finally, since our scheme is also consistent with the SFAS I-PDE, relying on the Barles and Souganidis theorem [1], we can conjecture that the computed approximations converge towards the continuous solution of the I-PDE. This supports the reliability of our numerical approximations with respect to the theoretical solution of our problem.
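The discrete ingredients of scheme (7) are simple enough to sketch in code. The snippet below is an illustration only, not the authors' implementation: the function names, the array layout (`u[i, j]` holding the value at grid point $(ih, jh)$), and the `hamiltonian` callback signature are assumptions made for this example, and the toy Hamiltonian used at the end stands in for the real visibility integral.

```python
import numpy as np

def theta_min(h):
    """Smallest theta allowed by the sufficient condition theta >= sqrt(2) * pi * h."""
    return np.sqrt(2.0) * np.pi * h

def centered_gradient(u, i, j, h):
    """Centered finite differences Du(x) at the interior grid point (i, j)."""
    return np.array([(u[i + 1, j] - u[i - 1, j]) / (2.0 * h),
                     (u[i, j + 1] - u[i, j - 1]) / (2.0 * h)])

def laplacian_with_t(u, i, j, t, h):
    """Five-point Laplacian in which the centre value u(x) is replaced by t (N = 2)."""
    neighbours = u[i + 1, j] + u[i - 1, j] + u[i, j + 1] + u[i, j - 1]
    return (neighbours - 4.0 * t) / h ** 2

def scheme_value(hamiltonian, u, i, j, t, h, theta=None):
    """Evaluate S(h, x, t, u) = H_{u,t}(x, Du(x)) - theta * L u_t(x).

    `hamiltonian(u, i, j, t, p)` is a hypothetical callback returning H_{u,t}(x, p).
    """
    if theta is None:
        theta = theta_min(h)
    p = centered_gradient(u, i, j, h)
    return hamiltonian(u, i, j, t, p) - theta * laplacian_with_t(u, i, j, t, h)

# Toy check on a 5 x 5 grid with a constant stand-in Hamiltonian (pi - I with I = 1).
h = 0.1
axis = np.arange(5) * h
X, Y = np.meshgrid(axis, axis, indexing="ij")
quad = X ** 2 + Y ** 2
toy_hamiltonian = lambda u, i, j, t, p: np.pi - 1.0

s_before = scheme_value(toy_hamiltonian, quad, 2, 2, quad[2, 2], h)
bumped = quad.copy()
bumped[3, 2] += 0.1          # raise one neighbour of the centre point
s_after = scheme_value(toy_hamiltonian, bumped, 2, 2, quad[2, 2], h)
print(s_before, s_after)     # raising u lowers S through the -theta * L term
```

With a constant Hamiltonian the only dependence on the neighbouring values comes from the Laplacian term, so raising a neighbour can only lower $S$; the bound on $\theta$ is what is meant to preserve this behaviour once the gradient-dependent Hamiltonian is plugged in.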

6 Numerical Experiments

We focus here on the numerical results obtained by the algorithm associated with the scheme (7). As described in Section 5.1, the approximation scheme suggests an iterative numerical algorithm, whose updating step (at point $x$) consists in solving the equation $S(h, x, t, u) = 0$ (an equation in $t$), where $u$ is the approximation of the whole solution at the previous step. Here, to solve the equation $H_{u,t}(x, Du(x)) - \theta\, L u_t(x) = 0$, we rewrite it as a fixed point equation $t = g(t)$, where

$$g(t) = \frac{1}{4} \left( \sum_{i=1,2} \big( u(x + h\vec{e}_i) + u(x - h\vec{e}_i) \big) - \frac{h^2}{\theta}\, H_{u,t}(x, Du(x)) \right),$$

and then process the iterations $t_{n+1} = g(t_n)$. In practice this process systematically converges in fewer than 5 iterations (we initialize $t_0$ with the previous value of $u(x)$). The numerical algorithm starts with a subsolution in the form of a very steep valley, such that visibility is close to 0 for all points in the domain of the image. We refer the reader to [15] for further implementation details.

To test our algorithm, we consider some scenarios for which the problem is well-posed. In other words, we limit the computation domain to a subset of $\Omega = \{x \mid I(x) < \pi\}$. This computation domain is delimited by the red box in the corresponding figures. On the other part of the image domain, we enforce Dirichlet boundary conditions. In our tests, we use the $\sin(x)\sin(y)$ surface. For the first test, we restrict the computation domain to a subset on which the surface is convex. As shown in Figure 2, the computed iterative solution converges accurately towards the original surface.

Fig. 2. Left: image generated by the sin x sin y surface with h = 0.05 and region of interest where we run the algorithm; middle: original surface (ground truth) on the region of interest; right: surface reconstructed by our algorithm (result).

In the second test, we want to extend the computation domain to both concave and convex areas. To remove the ambiguity due to points with maximal intensity, we reduce the intensity of the image by placing the $\sin(x)\sin(y)$ surface in a box, i.e. surrounded by the four walls of a cube with the roof open. In this test, the algorithm converges towards the solution in both concave and convex regions. Nevertheless, as shown in Figure 3, while the reconstruction is very accurate in the convex region, there is a significant error in the concave region.

Fig. 3. Left: image generated by the sin x sin y surface with h = 0.05 inside a cubical box and region of interest where we run the algorithm; middle: original surface (ground truth) on the region of interest; right: surface reconstructed by our algorithm (result).

Table 1 shows the minimum and maximum values of the original surfaces in the regions of interest (where the algorithm is applied). It also shows the $L^1$, $L^2$ and $L^\infty$ errors. The top row shows the errors for the first test ($\sin(x)\sin(y)$ surface) illustrated in Figure 2. The second row shows the errors for the $\sin(x)\sin(y)$ surface inside a box; it corresponds to the result of Figure 3. In our experiments, we have used the $L^1$ error to test for convergence.

Table 1. Errors for the first two tests

                             min value   max value   L1 error   L2 error   L∞ error
sin x sin y, Fig. 2          -0.999707   0.066750    0.006191   0.009792   0.033867
sin x sin y in box, Fig. 3   -0.999707   0.999568    0.188896   0.240712   0.372564

In the second test, one can understand the error on the concave region as a result of the introduction of the regularization term (which was needed to make the scheme monotonic). To further analyze this effect, we focus on the concave part and we perform the following two experiments.

1) We run our algorithm with an input image containing the regularization term. More precisely, we use

$$\tilde{I}(x) = \int_{C_{u,(x,u(x))}} \left\langle \frac{1}{\sqrt{1 + |Du(x)|^2}}\, (-Du(x), 1),\ \nu \right\rangle^{+} d\nu - \theta\, L u(x)$$

as input to our algorithm. So, in practice, the algorithm computes the solution of the equation

$$\int_{C_{u,(x,u(x))}} \left\langle \frac{1}{\sqrt{1 + |Du(x)|^2}}\, (-Du(x), 1),\ \nu \right\rangle^{+} d\nu - \tilde{I}(x) - \theta\, L u(x) = 0$$

and the computed solution should then better coincide with the original surface. We run this third test with the $\sin x \sin y$ surface inside the box (with a computation domain reduced to the concave part). As shown in Table 2 and Figure 4, the algorithm is now able to recover the surface accurately.

Fig. 4. sin x sin y image with regularization and region of interest where we run the numerical scheme. Results of the numerical scheme with (right) and without (left) regularization in the input image.

Table 2. Errors by adding the regularization term in the input image

                         min value   max value   L1 error   L2 error   L∞ error
without regularization   -0.999707   0.999568    0.186037   0.189434   0.207331
with regularization      -0.999707   0.999568    0.065627   0.067900   0.078941

2) Finally, since the regularization parameter $\theta$ depends linearly on the grid size $h$, the regularization effect should diminish as the grid size vanishes. We then redo the second test ($\sin x \sin y$ surface inside a box, with the original image $I$, with the same reduced computation domain as previously) with smaller and smaller grid sizes: $h = 0.2, 0.1, 0.08, 0.05, 0.04$. As we can see in Figure 5 and Table 3, the computed approximations actually converge towards the original surface when the grid size is reduced. In addition to confirming the above assertion, this also validates our methodology and our theory, which ensures a well-posed algorithm whose output converges towards the continuous solution as the grid size vanishes.

Fig. 5. Reconstruction with different grid sizes h.

Table 3. Errors with respect to h

grid size (h)   L1 error   L2 error   L∞ error
h = 0.2         0.504147   0.526644   0.658852
h = 0.1         0.358676   0.371685   0.424875
h = 0.08        0.270054   0.276862   0.308127
h = 0.05        0.186037   0.189434   0.207331
h = 0.04        0.151427   0.153691   0.166671
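The decay of the errors in Table 3 can be summarized by an observed convergence order, obtained by fitting $\log(\text{error})$ against $\log h$. The snippet below is a reading aid rather than part of the paper's methodology: the error values are copied from Table 3, while the least-squares fit and the function name `observed_order` are choices made for this illustration. No convergence rate is claimed in the text itself.

```python
import numpy as np

# Grid sizes and errors copied from Table 3.
h = np.array([0.2, 0.1, 0.08, 0.05, 0.04])
errors = {
    "L1":   np.array([0.504147, 0.358676, 0.270054, 0.186037, 0.151427]),
    "L2":   np.array([0.526644, 0.371685, 0.276862, 0.189434, 0.153691]),
    "Linf": np.array([0.658852, 0.424875, 0.308127, 0.207331, 0.166671]),
}

def observed_order(h, err):
    """Slope of the least-squares line through (log h, log err)."""
    slope, _intercept = np.polyfit(np.log(h), np.log(err), 1)
    return slope

orders = {name: observed_order(h, err) for name, err in errors.items()}
for name, order in orders.items():
    print(f"{name}: observed order ~ {order:.2f}")
```

For all three norms the fitted slope lies between 0.5 and 1, i.e. the error shrinks somewhat more slowly than linearly in $h$ over this range of grid sizes.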

7 Conclusion and Future Work

In 3-D reconstruction approaches to Computer Vision, illumination is rarely modeled explicitly. With few notable exceptions, most work in Structure From Motion assumes that illumination is constant and therefore it ascribes all photometric eﬀects to the radiance of the scene, regardless of how it comes to be. In Shape From Shading, where the illumination is key, most existing work models it as an ideal point light source. In this paper we focus on the opposite abstraction, where the illumination is diﬀuse, and indeed it is constant. Outdoor scenes on a cloudy day, or indoor scenes in modern oﬃces are reasonably well approximated by these conditions. Clearly one would like to account for arbitrary unknown radiant distributions, and possibly also illumination, but this would render the analysis prohibitive. Already under the restrictive assumptions we have chosen to operate under, the problem of recovering the 3-D shape of the scene translates to a global integro-diﬀerential equation that, to the best of our knowledge, has never been analyzed. Although algorithms have been explored in the past to exploit diﬀuse shading for recovering properties of the scene, a thorough theoretical study of the mathematical properties of this problem has been lacking. We believe we are the ﬁrst to study the uniqueness of SFAS, to show that – in general – it is not unique, and to characterize the set of scenes that are indistinguishable, in the sense of satisfying the assumptions of SFAS and generating the same image. While we believe that the main contribution of this paper is analytical, we do validate our results empirically in simulation. To that end, we propose a monotonic scheme for numerically integrating the SFAS equation, and show experimental results that highlight the features, and challenges, of this method.

Acknowledgement ANR-06-MDCA-007 and ONR N00014-08-1-0414.


References

1. Barles, G., Souganidis, P.E.: Convergence of approximation schemes for fully nonlinear second order equations. Asymptotic Analysis 4, 271–283 (1991)
2. Crandall, M.G., Lions, P.L.: Two approximations of solutions of Hamilton-Jacobi equations. Mathematics of Computation 43(167), 1–19 (1984)
3. Crandall, M.G., Majda, A.: Monotone difference approximations for scalar conservation laws. Mathematics of Computation 34(149), 1–21 (1980)
4. Durou, J.-D., Falcone, M., Sagona, M.: Numerical methods for shape-from-shading: A new survey with benchmarks. CVIU 109(1), 22–43 (2008)
5. Faugeras, O.: Three-Dimensional Computer Vision: A Geometric Viewpoint. MIT Press, Cambridge (1993)
6. Horn, B.K.: Robot Vision. MIT Press, Cambridge (1986)
7. Horn, B.K., Brooks, M.J. (eds.): Shape from Shading. MIT Press, Cambridge (1989)
8. Koenderink, J.J., Pont, S.C., van Doorn, A.J., Kappers, A.M.L., Todd, J.T.: The visual light field. Perception 36, 1595–1610 (2007)
9. Langer, M.S., Bulthoff, H.H.: Depth discrimination from shading under diffuse lighting. Perception 29(6), 649–660 (2000)
10. Langer, M.S., Zucker, S.W.: Shape from shading on a cloudy day. Journal of Optical Society of America 11, 467–478 (1994)
11. Lions, P.-L., Rouy, E., Tourin, A.: Shape-from-shading, viscosity solutions and edges. Numer. Math. 64, 323–353 (1993)
12. Nayar, S., Ikeuchi, K., Kanade, T.: Shape from interreflections. IJCV 6(3), 173–195 (1991)
13. Prados, E.: Application of the theory of the viscosity solutions to the Shape From Shading problem. PhD thesis, Univ. of Nice-Sophia Antipolis (2004)
14. Prados, E., Faugeras, O.: Shape from shading: a well-posed problem? In: Proceedings of CVPR 2005, vol. II, pp. 870–877. IEEE, Los Alamitos (2005)
15. Prados, E., Jindal, N., Soatto, S.: A non-local approach to shape from ambient shading. Technical report, INRIA (2009)
16. Stewart, A.J., Langer, M.S.: Towards accurate recovery of shape from shading under diffuse lighting. IEEE Trans. on PAMI 19(9), 1020–1025 (1997)
17. Tian, Y.L., Tsui, H.T., Yeung, S.Y., Ma, S.: Shape from shading for multiple light sources. Journal of the Optical Society of America 16(1), 36–52 (1999)
18. Wada, T., Ukida, H., Matsuyama, T.: Shape from shading with interreflections under proximal light source-3D shape reconstruction of unfolded book surface from a scanner image. In: ICCV (1995)
19. Yang, J., Zhang, D., Ohnishi, N., Sugie, N.: Determining a polyhedral shape using interreflections. In: CVPR 1997, p. 110 (1997)
20. Zhang, R., Tsai, P.-S., Cryer, J.-E., Shah, M.: Shape from Shading: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 21(8), 690–706 (1999)

An Elasticity Approach to Principal Modes of Shape Variation

Martin Rumpf and Benedikt Wirth

Bonn University, 53113 Bonn, Germany
{martin.rumpf,benedikt.wirth}@ins.uni-bonn.de
http://www.ins.uni-bonn.de

Abstract. Concepts from elasticity are applied to analyze modes of variation on shapes in two and three dimensions. This approach represents a physically motivated alternative to shape statistics on a Riemannian shape space, and it robustly treats strong nonlinear geometric variations of the input shapes. To compute a shape average, all input shapes are elastically deformed into the same configuration. That configuration which minimizes the total elastic deformation energy is defined as the average shape. Each of the deformations from one of the shapes onto the shape average induces a boundary stress. Small amplitude stimulation of these stresses leads to displacements which reflect the impact of every single input shape on the average. To extract the dominant modes of variation, a PCA is performed on this set of displacements. To make the approach computationally tractable, a relaxed formulation is proposed, and sharp contours are approximated via phase fields. For the spatial discretization of the resulting model, piecewise multilinear finite elements are applied. Applications in 2D and in 3D demonstrate the qualitative properties of the presented approach.

1 Introduction

This paper is concerned with the notion of shape averages and principal modes of shape variation based on concepts from continuum mechanics, namely nonlinear and linearized elasticity. As shapes we consider object contours, encoded as edge sets in images. Compared to a classical principal component analysis in a vector space, where an average and a covariance tensor can be computed directly on the linear space itself, in the case of shapes we are dealing with highly nonlinear geometric variations. Hence, for the zero moment analysis (i.e. the definition of a suitable shape average) the total elastic energy stored in a set of deformations from the input shapes onto a single image shape is minimized. At the energy minimum the corresponding image shape is defined as the shape average. Concerning a first moment analysis, we propose a physically sound linearization of shape variations which allows us to define a covariance tensor. Each deformation from an input onto the average shape induces stresses on the shape average, which can be regarded as the imprint of the input shape. Modulating these stresses leads to displacements on the shape average, where the mapping from stresses to displacements is linear and well-defined. Each of these displacements can be regarded as a linearization of the usually nonlinear elastic deformation from one of the image shapes onto the shape average. Thus, a covariance tensor can be computed based on these displacements of the shape average. It linearly encodes the modes of variation of the shape average induced by the set of input shapes, even though the underlying deformations are usually large and nonlinear. Finally, we perform a principal component analysis based on this covariance tensor, which allows us to identify the dominant modes of variation of the input shapes.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 709–720, 2009. © Springer-Verlag Berlin Heidelberg 2009

Our model is related to the physical interpretation of the arithmetic mean and the covariance tensor for $n$ points $x_1, \dots, x_n$ in $\mathbb{R}^d$. Indeed, the arithmetic mean $x \in \mathbb{R}^d$ minimizes $\sum_{i=1,\dots,n} \alpha\, d(x, x_i)^2$, where $d(x, x_i)$ is the distance between $x$ and $x_i$. Due to Hooke's law, the stored elastic energy $\alpha\, d(x, x_i)^2$ in the spring connecting $x_i$ and $x$ is proportional to the squared distance. Hence, the arithmetic mean minimizes the total elastic energy of the system of connected springs. Likewise, the covariance tensor $(x_i - x, x_j - x)$ can, up to the spring constant, be identified with the covariance tensor $(\sigma_i, \sigma_j)$ of the forces $\sigma_i$ pulling at the mean $x$.

At first, shape analysis was mainly based on correspondences between landmark positions on different shapes, as in the influential work by Cootes et al. [1]. Principal component analysis (PCA) is a classical, by definition linear statistical tool. Chalmond and Girard [2] have proposed a PCA which also incorporates truly nonlinear geometric transformations. A survey on the potential of shape analysis in brain imaging is given by Faugeras and coworkers in [3]. Another important application concerns ready-made clothing, where it would be favorable to know the shape of the average human body and its principal modes of variation in order to design clothes which sufficiently fit as many people as possible.
Conceptually, correlations of shapes have been studied on the basis of a general framework of a space of shapes and its intrinsic structure. The notion of shape space was introduced by Kendall [4] already in 1984. Charpiat et al. [5] discuss shape averaging and shape statistics based on the Hausdorﬀ distance of sets. Statistics on signed distance functions was also studied by Leventon et al. [6], whereas Dambreville et al. [7] used shape statistics based on characteristic functions to deﬁne a robust shape prior in image segmentation. Kernel density estimation in feature space was introduced by Cremers et al. [8] to incorporate the probability of 2D silhouettes of 3D objects in image segmentation. An overview on related kernel density methods is given by Rathi et al. [9]. Mémoli and Sapiro [10] have investigated the Gromov–Hausdorﬀ distance as a global measure for the lack of isometry in shape analysis. In contrast to such a global measure for the defect from an isometry, the nonlinear elastic energy functional involved in our approach measures this defect locally, and locally isometric deformations indeed minimize the corresponding local functional. Understanding shape space as an inﬁnite-dimensional Riemannian manifold has been studied extensively by Miller et al. [11, 12]. Fuchs et al. [13] proposed a viscoelastic notion of the distance between shapes S given as boundaries of physical objects O. The elasticity paradigm for shape analysis on which our

An Elasticity Approach to Principal Modes of Shape Variation


approach is founded diﬀers signiﬁcantly from these metric approaches to shape space (cf. Sect. 4 for a detailed discussion of the conceptual diﬀerence). In this paper, shapes are represented implicitly via a diﬀused phase ﬁeld description. This in particular enables a robust and ﬂexible application in two and three dimensions.

2 Zero Moment Analysis

In this section we briefly recall an elastic approach to shape averaging already presented in [14]. We consider shapes $S_i$ as the boundaries $\partial O_i$ of sufficiently regular objects $O_i$. Given n shapes, $S_1, \dots, S_n$, we seek an average shape $S$ that reflects the geometric characteristics of the given shapes in a physical manner. For that purpose we assume that the average shape $S$ can be described as a deformed configuration of the input shapes, i.e. there are deformations $\phi_i : O_i \to \mathbb{R}^d$, $i = 1, \dots, n$, with $S = \phi_i(S_i)$ (see Fig. 1). A natural choice for the shape average $S$ is that particular shape which minimizes the total accumulated deformation energy of all deformations,

$$E[S, (\phi_i)_{i=1,\dots,n}] = \frac{1}{n} \sum_{i=1}^{n} W[O_i, \phi_i],$$

where $W[O_i, \phi_i]$ represents the stored deformation energy of the deformation $\phi_i$. To ensure existence of a minimizing shape $S$, we add a regularizing prior $L[S]$ to the energy. Here, we consider the $H^{d-1}$-measure of $S$, i.e. $L[S] = \int_S da$, and the shape average $S$ is defined as a minimizer of the energy $E[S, (\phi_i)_{i=1,\dots,n}] + \mu L[S]$. As deformation energy $W[O_i, \phi_i]$ we will employ a nonlinear, hyperelastic energy $W[O, \phi] = \int_O W(D\phi)\, dx$, whose integrand can be rewritten as a function of only the three invariants, $W(D\phi) = \hat W(D\phi, \mathrm{cof} D\phi, \det D\phi) = \bar W(I_1, I_2, I_3)$ with $(I_1, I_2, I_3) := (|D\phi|_2^2, |\mathrm{cof} D\phi|_2^2, \det D\phi)$. Here $|D\phi|_2^2 := \mathrm{tr}(D\phi^T D\phi)$, and $|D\phi|_2$, $|\mathrm{cof} D\phi|_2$, and $\det D\phi$ describe the averaged local change of length, area, and volume, respectively. We consider polyconvex energy functionals [15], where $\hat W$ is convex and isometries, i.e. deformations with $D\phi^T D\phi = \mathbb{1}$, are local minimizers (cf. Fig. 2). Typical energy densities are of the form

$$\bar W(I_1, I_2, I_3) = \alpha_1 I_1^{p/2} + \alpha_2 I_2^{q/2} + \alpha_3 I_3^{-s} + \alpha_4 I_3^{r}$$

with $\alpha_1, \dots, \alpha_4 > 0$, where the penalization of volume shrinkage, i.e. $\bar W \to \infty$ as $I_3 \to 0$, enables us to control local injectivity (cf. [16]).

Fig. 1. Sketch of elastic shape averaging. The input shapes $S_i$ (i = 1, . . . , 4) are mapped onto a shape $S$ via elastic deformations $\phi_i$. The shape $S$ which minimizes the elastic deformation energy is denoted the shape average.
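To make the invariants and the typical energy density of this section concrete, the following sketch evaluates $\bar W(I_1, I_2, I_3)$ for a given deformation gradient. The coefficients, the exponents, and the function names are illustrative assumptions; the paper does not state the values it uses.

```python
import numpy as np

def cofactor(A):
    """Cofactor matrix cof(A) = det(A) * A^{-T}, valid for invertible A."""
    return np.linalg.det(A) * np.linalg.inv(A).T

def invariants(A):
    """Return (I1, I2, I3) = (|A|_2^2, |cof A|_2^2, det A) with the Frobenius norm."""
    C = cofactor(A)
    return np.trace(A.T @ A), np.trace(C.T @ C), np.linalg.det(A)

def energy_density(A, alpha=(1.0, 1.0, 1.0, 1.0), p=2.0, q=2.0, s=1.0, r=1.0):
    """W = a1*I1^(p/2) + a2*I2^(q/2) + a3*I3^(-s) + a4*I3^r (hypothetical parameters)."""
    I1, I2, I3 = invariants(A)
    if I3 <= 0.0:
        return np.inf  # vanishing or negative volume is excluded
    a1, a2, a3, a4 = alpha
    return a1 * I1 ** (p / 2) + a2 * I2 ** (q / 2) + a3 * I3 ** (-s) + a4 * I3 ** r

theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                     [np.sin(theta),  np.cos(theta), 0.0],
                     [0.0,            0.0,           1.0]])
compression = np.diag([1.0, 1.0, 1e-3])       # almost collapses the volume

w_identity = energy_density(np.eye(3))
w_rotation = energy_density(rotation)         # rigid motion: same invariants as identity
w_compression = energy_density(compression)   # dominated by the I3^(-s) term
```

A rotation leaves all three invariants unchanged, so it costs the same energy as the identity, whereas a strong compression is penalized through the $I_3^{-s}$ term.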


M. Rumpf and B. Wirth

Fig. 2. For two input shapes from Fig. 1 the deformation (via a deformed checkerboard), the averaged local change of length $\frac{1}{\sqrt{2}} |D\phi_i|_2$, and the local change of area $\det(D\phi_i)$ are depicted (colors encode range [0.95, 1.05])

This type of energy has two major advantages: it allows us to incorporate large deformations with strong material and geometric nonlinearities, and its form follows from first principles and allows us to distinguish the physical effects of length, area, and volume distortion, which reflect the local distance from an isometry. The first Piola–Kirchhoff stress tensor, which describes force per unit area in the reference configuration $O$, is then recovered as $\sigma^{ref}[\phi] = W_{,A}(D\phi) := \frac{\partial W(A)}{\partial A}$, evaluated at $A = D\phi$. The Cauchy (real) stress, describing the force per unit area in the deformed configuration $\phi(O)$, reads $\sigma[\phi] = \sigma^{ref}[\phi]\, (\mathrm{cof} D\phi)^{-1}$. To simplify the numerical treatment and to allow for slight topological differences between the shapes $S_i$ we relax the constraint $\phi_i(S_i) = S$, $i = 1, \dots, n$, and introduce a penalty functional $F[S_i, \phi_i, S] = H^{d-1}\big((S_i \setminus \phi_i^{-1}(S)) \cup (\phi_i^{-1}(S) \setminus S_i)\big)$ which measures the symmetric difference of the input shapes $S_i$ and the pull back $\phi_i^{-1}(S)$ of $S$. Our shape averaging model is thus based on the energy

$$E^{\gamma}[S, (\phi_i)_{i=1,\dots,n}] = \frac{1}{n} \sum_{i=1}^{n} \left( \int_{O_i} W(D\phi_i)\, dx + \gamma F[S_i, \phi_i, S] \right) + \mu L[S] .$$
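The relation between the first Piola–Kirchhoff stress $W_{,A}(D\phi)$ and the Cauchy stress can be illustrated by differentiating an energy density numerically. This is a sketch under stated assumptions: the density below uses hypothetical coefficients chosen so that the identity is stress-free, and the derivative is approximated by central finite differences rather than computed analytically.

```python
import numpy as np

# Hypothetical coefficients (alpha_1..alpha_4) with p = q = 2, s = r = 1,
# chosen so that the undeformed state A = identity carries no stress in 3D.
ALPHA = (1.0, 1.0, 7.0, 1.0)

def energy_density(A):
    """W(A) = a1*|A|^2 + a2*|cof A|^2 + a3/det(A) + a4*det(A)."""
    J = np.linalg.det(A)
    cof = J * np.linalg.inv(A).T
    return ALPHA[0] * np.sum(A * A) + ALPHA[1] * np.sum(cof * cof) + ALPHA[2] / J + ALPHA[3] * J

def first_piola_kirchhoff(A, h=1e-6):
    """Approximate W_{,A}(A) entry by entry with central finite differences."""
    P = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            dA = np.zeros_like(A)
            dA[i, j] = h
            P[i, j] = (energy_density(A + dA) - energy_density(A - dA)) / (2.0 * h)
    return P

def cauchy_stress(A):
    """sigma = sigma_ref (cof A)^{-1}, with cof A = det(A) A^{-T}."""
    cof = np.linalg.det(A) * np.linalg.inv(A).T
    return first_piola_kirchhoff(A) @ np.linalg.inv(cof)

# A synthetic deformation gradient with stretch and shear (positive determinant).
A = np.array([[1.2, 0.1, 0.0],
              [0.0, 0.9, 0.05],
              [0.0, 0.0, 1.1]])
sigma_identity = cauchy_stress(np.eye(3))
sigma = cauchy_stress(A)
```

Because the density depends on $A$ only through the three invariants, the resulting Cauchy stress is symmetric up to discretization error, which gives a simple consistency check for such an implementation.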

3 First Moment Analysis

As outlined in the introduction, our first moment analysis on shapes is based on an analysis of stresses induced on the shape average by each individual input shape. Modulation of each of these stresses results in a certain displacement, and the proposed principal component analysis on shapes will be performed on these displacements. To comprehensively derive this model we proceed in several steps.

Encoding nonlinear deformations via stresses on a linear vector space. Let us first review the underlying physical concept of stress. By the Cauchy stress principle, each deformation $\phi_i : O_i \to O$ is characterized by pointwise boundary stresses on $S$ in the deformed configuration, which try to restore the undeformed configuration $O_i$. The stress at some point $x$ on $S$ is given by the application of the Cauchy stress tensor $\sigma_i = \sigma[\phi_i]$ to the outer normal $\nu$ on $S$. The resulting stress $\sigma_i \nu$ is a force density acting on a local surface element of $S$. Let us assume that the above relation between the energetically favorable deformation and its induced stresses is one-to-one. Hence, the average shape can be described in terms of the input shape $S_i$ and the boundary stress $\sigma_i \nu$, and


we write $S = S_i[\sigma_i \nu]$. If we now scale the stress with a weight $t \in [0, 1]$, we obtain a one-parameter family of shapes $S(t) = S_i[t \sigma_i \nu]$ connecting $S_i = S(0)$ with $S = S(1)$. Thus, we can regard $\sigma_i \nu$ as a representative of shape $S_i$ in the linear space of vector fields on $S$.

Modeling the impact of an input shape on the average shape. Let us now study how the average shape $S$ varies if we increase the impact of a particular input shape $S_k$ for some $k \in \{1, \dots, n\}$. In fact, we intend to associate to every surface load $\sigma_k \nu$ a displacement on the averaged object domain $O$ via the solution operator of a suitable linearized elasticity problem. Here, the object $O$ actually is a deformed configuration of different original objects $O_i$. Hence, we have to choose a proper elasticity tensor which reflects the compound stress configuration of the averaged domain $O$. A simple isotropic linearized elasticity model would not take into account the nonlinear geometric nature of our zero order analysis. To achieve this, we apply the Cauchy stress $\sigma_k \nu$ to the average shape $S$, scaled with a small constant $\delta$. Based on our above discussion of stresses and due to the sketched equilibrium condition, this additional boundary stress $\delta \sigma_k \nu$ acts as a first Piola–Kirchhoff stress on the (reference) configuration $S$. The elastic response is given by a correspondingly scaled displacement $u_k : O \to \mathbb{R}^d$. To properly model the loaded configurations we concatenate this displacement with every nonlinear deformation $\phi_i$ and take into account the sum of the resulting elastic energies plus a term involving the given Cauchy stress in the following energy,

$$E_k[\delta, u] = \frac{1}{n} \sum_{i=1,\dots,n} W[O_i, (\mathbb{1} + \delta u) \circ \phi_i] - \delta^2 \int_S \sigma_k \nu \cdot u \, da .$$

Now, the displacement $u_k$ is obtained as a minimizer of this modulated energy for a fixed set of deformations $(\phi_i)_{i=1,\dots,n}$ under the constraints $\int_O u_k \, dx = 0$ and $\int_O x \times u_k \, dx = 0$, which encode zero average translation and rotation. Let us remark that the boundary integral can be replaced by the volume integral $\int_O \sigma_k : Du \, dx$, which is more convenient with respect to a numerical discretization. To verify this, we use integration by parts and the fact that $\mathrm{div}\, \sigma_k = 0$ holds on $O$. As Euler–Lagrange condition for $u_k$ we obtain $\mathrm{div}\, \sigma[\delta u_k] = 0$ on $O$ and $\sigma[\delta u_k] \nu = \delta \sigma_k \nu$ on $S$ after a tedious but straightforward computation. Here,

$$\sigma[\delta u_k] := \frac{1}{n} \sum_{i=1,\dots,n} W_{,A}\big((\mathbb{1} + \delta Du_k)\, D\phi_i \circ \phi_i^{-1}\big)\, \mathrm{cof} D(\phi_i^{-1})$$

is the first Piola–Kirchhoff stress tensor on the compound object $O$, which effectively reflects an average of all stresses in the n deformed configurations $\phi_i(O_i)$ for $i = 1, \dots, n$. As long as $A \mapsto W(A)$ is not quadratic in $A$, $u_k$ still solves a nonlinear elastic problem. The advantage of this nonlinear variational formulation is that it is of the same type as the one for the zero moment analysis, and it encodes in a natural way the compound elasticity configuration of the


Fig. 3. Sketch of the pointwise stress balance relation on the averaged shape

averaged shape domain $O$. As an obvious drawback we have to consider the sum of n nonlinear elastic energies for the computation of every displacement $u_k$, $k = 1, \dots, n$. In the limit for $\delta \to 0$, we would obtain $u_k$ as the solution of the actually linear elasticity problem

$$\mathrm{div}\,(C\, \epsilon[u]) = 0 \ \text{in } O , \qquad C\, \epsilon[u]\, \nu = \sigma_k \nu \ \text{on } S$$

for the symmetric displacement gradient $\epsilon[u] = (Du + Du^T)/2$ under the constraint $\int_O u \, dx = 0$. Here, the in general inhomogeneous and anisotropic elasticity tensor $C$ is defined by

$$C = \frac{1}{n} \sum_{i=1,\dots,n} \left( \frac{1}{\det D\phi_i}\, D\phi_i\, W_{,AA}[D\phi_i]\, D\phi_i^T \right) \circ \phi_i^{-1} ,$$

based on an appropriate transformation of the Hessian of the energy density $W$. This elasticity tensor takes into account the loads of the compound configuration based on the combination of all deformations $\phi_i$ on the input objects $O_i$ for $i = 1, \dots, n$. In our current implementation, we avoid the evaluation of $C$ and consider the above nonlinear approximation, which is simpler to implement but computationally more expensive.

The actual covariance analysis based on the derived displacements. Now, we have a set of displacements $u_k : O \to \mathbb{R}^d$ at hand which represent the variations of the average shape, induced by a modulation of the stresses $\sigma_k$ from the deformations $\phi_k$ of the input shapes $S_k$ into the average shape $S$. On this space of displacements, we consider the standard $L^2$-product $(u, \tilde u)_2 := \int_O u \cdot \tilde u \, dx$ and define the covariance operator

$$\mathrm{Cov} : L^2(O) \to L^2(O); \quad u \mapsto \mathrm{Cov}\, u := \frac{1}{n} \sum_{k=1,\dots,n} (u, u_k)_2 \, u_k .$$

Obviously, Cov is positive definite on $\mathrm{span}(u_1, \dots, u_n)$. Hence, we can diagonalize Cov on this finite-dimensional space and obtain a set of $L^2$-orthogonal eigenfunctions $w_k : O \to \mathbb{R}^d$ (actually displacements) and eigenvalues $\lambda_k > 0$ with $\mathrm{Cov}\, w_k = \lambda_k w_k$.
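Since Cov acts on the span of the n displacements, its diagonalization reduces to an n-by-n eigenproblem for the Gram matrix $G_{jk} = (u_j, u_k)_2$: writing $w = \sum_j c_j u_j$ gives $\mathrm{Cov}\, w = \sum_k (\frac{1}{n} G c)_k u_k$. The sketch below illustrates this on synthetic data. The sampling of the displacements on a uniform grid, the quadrature weight `cell_volume`, and all names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def covariance_modes(U, cell_volume=1.0):
    """Diagonalize Cov on span(u_1..u_n).

    U has shape (n, N): row k holds the displacement u_k sampled at the grid
    nodes, with all vector components flattened into one axis. The L2 product
    is approximated by cell_volume times the Euclidean dot product.
    """
    n = U.shape[0]
    G = cell_volume * (U @ U.T)                 # Gram matrix G[j, k] = (u_j, u_k)_2
    lam, C = np.linalg.eigh(G / n)              # symmetric eigenproblem, ascending order
    order = np.argsort(lam)[::-1]               # sort modes by decreasing eigenvalue
    lam, C = lam[order], C[:, order]
    keep = lam > 1e-12 * lam[0]                 # drop numerically vanishing modes
    lam, C = lam[keep], C[:, keep]
    W = C.T @ U                                 # row k is w_k = sum_j C[j, k] u_j
    norms = np.sqrt(cell_volume * np.sum(W * W, axis=1))
    W /= norms[:, None]                         # normalize each mode in L2
    return lam, W

rng = np.random.default_rng(1)
U = rng.normal(size=(4, 60))                    # 4 synthetic displacements, 30 nodes x 2 components
cell_volume = 0.01                              # hypothetical quadrature weight
lam, W = covariance_modes(U, cell_volume)

# Apply Cov to every mode: Cov w = (1/n) sum_k (w, u_k)_2 u_k.
cov_W = (cell_volume * (W @ U.T)) @ U / U.shape[0]
```

The ratios `lam / lam[0]` then play the role of the ratios $\lambda_i / \lambda_1$ reported in the figures of the application section.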


Fig. 4. The two dominant modes (right) for four diﬀerent shapes (left) demonstrate that our principal component analysis properly captures strong geometric nonlinearities

These eigenfunctions can be considered as principal modes of variation of the average object $O$ and hence of the average shape $S$, given the n input shapes. The eigenvalues encode the actual strength of these variations. Let us underline that this covariance analysis properly takes into account the usually strong geometric nonlinearity in shape analysis via the transfer of geometric shape variation to elastic stresses on the average shape, based on paradigms from nonlinear elasticity (cf. Fig. 4). These stresses lie in a linear vector space and thus allow for a covariance analysis, which is by definition linear. The interpretation of stresses in terms of displacements can be regarded as a proper choice of a scalar metric $g(\cdot, \cdot)$ on the space of stresses interpreted as a tangent space of the shape space at the average shape: we define $g(\sigma\nu, \tilde\sigma\nu) := (u, \tilde u)_2$, given the above identification of stresses $\sigma\nu$, $\tilde\sigma\nu$ with induced displacements $u$, $\tilde u$ via the proper compound elasticity problem. Finally, this identification provides a suitable physical interpretation of stresses as modes of shape variation.

4 Elastic versus Riemannian Shape Analysis

The elasticity paradigm, on which our zero and first order shape analysis are based, differs significantly from a Riemannian approach to shape space as proposed for instance by Srivastava et al. [17]. Due to the axiom of elasticity, the energy at the deformed configuration $S$ is independent of the path from a shape $\tilde S$ to the shape $S$ along which the deformation is generated in time. Hence, there is no notion of shortest paths if we consider a purely elastic shape model. The visco-plastic model by Fuchs et al. [13] and the related model by Younes [18] define energies based on an integration of dissipation along transformation paths, where dissipation is understood as a Riemannian metric. This approach is not elastic in the classical axiomatic sense we consider here, and it particularly requires that at rest the intermediate configurations are all stress-free. The above-mentioned conceptual differences are reflected in a different behavior. If we regard shapes from a flow-oriented perspective, then a visco-elastic approach would be more appropriate. However, the elastic approach is favorable for rather rigid, more stable shapes, since it prevents locally strong isometry violation. An example is provided in Fig. 5: the input shapes are regarded as two versions of an object that may have none, one, or two pins at more or less stable positions. Both pins are apparently not interpreted as shifted versions of each other since a shifting deformation would cost too much energy. However, if the material were visco-plastic, a horizontal shift of each pin would be easier and result in an average shape with just one centered pin and its variation being a


Fig. 5. Average and variation (right) for two shapes with pins at diﬀerent positions (left). The pins are not interpreted as shifted versions of each other.

sideward movement. This corresponds to a completely different perception of the input shapes. The strong local rigidity and isometry preservation of the elasticity concept becomes particularly evident in Fig. 4 and Fig. 6, where non-isometric deformations are concentrated only at joints. On a Riemannian manifold, the exponential map allows one to describe geodesics from an averaged shape $S$ (in the sense of Karcher [19]) to the input shapes $S_k$ via $S_k = \exp_S(v_k)$ for some tangent vector $v_k$ at the shape $S$ in shape space. Hence, a covariance analysis will be performed on the tangent vectors $v_1, \dots, v_n$ with respect to the Riemannian metric $g(\cdot, \cdot)$. In the strictly elastic setup, the shape space is in general not metrizable. Instead, the stresses $\sigma_k$ play the role of the $v_k$, imprinting the impact of $S_k$ on the average shape $S$ in terms of an induced displacement $u_k$.

5 Finite Element Phase Field Approximation

Since explicit treatment of an edge set is difficult in a variational setting, we consider a phase field model picking up the approach by Ambrosio and Tortorelli [20] for the discretization of the Mumford–Shah model [21]. Hence, a shape $S$ is encoded by a smooth phase field function $v : \Omega \to \mathbb{R}$, which is close to zero on $S$ and close to one elsewhere. In our approach we construct such phase field functions $v_i$ for the input shapes $S_i$ in advance. Usually, $v_i$ can be computed based on the model in [20] applied to the input images $u_i$. The specific form of the phase field function $v$ for the averaged shape $S$ is then directly determined via a phase field approximation of our variational model. Given a phase field parameter $\epsilon$, which will determine the width of the phase field, we first define an approximate mismatch penalty

$$F^{\epsilon}[v_i, \phi_i, v] = \frac{1}{\epsilon} \int_\Omega (v \circ \phi_i)^2 (1 - v_i)^2 + v_i^2 (1 - v \circ \phi_i)^2 \, dx .$$

Here, we suppose $v$ to be extended by 1 outside the computational domain $\Omega$. Next, we consider the energy

$$L^{\epsilon}[v] = \int_\Omega \epsilon |\nabla v|^2 + \frac{1}{4\epsilon} (v - 1)^2 \, dx ,$$

which acts as an approximation of the prior $L[S]$. Furthermore, we simplify the later numerical implementation by assuming that the whole computational domain behaves elastically with an elasticity several orders of magnitude softer outside the object domains $O_i$ on the complement set $\Omega \setminus O_i$. Thus, given a smooth approximation $\chi^{\epsilon}_{O_i}$ of the characteristic function of the object domain $O_i$, we define an approximate elastic energy

$$W^{\epsilon}[O_i, \phi_i] = \int_\Omega \big( (1 - \eta) \chi^{\epsilon}_{O_i} + \eta \big) W(D\phi_i) \, dx ,$$

where in our applications $\eta = 10^{-4}$. Finally, the resulting approximation of the total energy functional for the variational description of the average shape reads

$$E^{\gamma,\epsilon}[v, (\phi_i)_{i=1,\dots,n}] = \frac{1}{n} \sum_{i=1}^{n} \big( W^{\epsilon}[O_i, \phi_i] + \gamma F^{\epsilon}[v_i, \phi_i, v] \big) + \mu L^{\epsilon}[v] .$$

In analogy, a phase field approximation $E_k^{\gamma,\epsilon}$ of the energy $E_k$ can be constructed. In these approximations, $F^{\epsilon}$ acts as a penalty with $\gamma \gg 1$ and $L^{\epsilon}$ ensures a mild regularization of the averaged shape with $\mu \ll 1$. Integration is performed only in regions where all integrands are defined. The actual spatial discretization is based on finite elements. We consider the phase fields $v$, $v_i$ and deformations $\phi_i$ as being represented by continuous, piecewise multilinear (trilinear in 3D and bilinear in 2D) finite element functions on an image domain $\Omega = [0, 1]^d$. A cascadic multiscale approach is applied for the relaxation of the energy. For details both on the phase field approximation and the numerical discretization we refer to [14].
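The two phase field functionals of this section, the mismatch penalty and the length prior, can be evaluated on a pixel grid as follows. This is a rough sketch only: it uses finite differences and a Riemann sum instead of the finite element discretization described above, it takes the warped field $v \circ \phi_i$ as a precomputed input, and the circle phase fields built from a distance function are hypothetical test data rather than minimizers of the model in [20].

```python
import numpy as np

def mismatch_penalty(v_warped, v_i, eps, h):
    """(1/eps) * integral of (v o phi_i)^2 (1 - v_i)^2 + v_i^2 (1 - v o phi_i)^2."""
    integrand = v_warped ** 2 * (1.0 - v_i) ** 2 + v_i ** 2 * (1.0 - v_warped) ** 2
    return integrand.sum() * h ** 2 / eps

def length_prior(v, eps, h):
    """Integral of eps * |grad v|^2 + (v - 1)^2 / (4 eps), approximating the shape length."""
    d_axis0, d_axis1 = np.gradient(v, h)
    integrand = eps * (d_axis0 ** 2 + d_axis1 ** 2) + (v - 1.0) ** 2 / (4.0 * eps)
    return integrand.sum() * h ** 2

n = 128
h = 1.0 / n                                    # grid spacing on the unit square
eps = 4.0 * h                                  # hypothetical phase field width
coords = (np.arange(n) + 0.5) * h
x, y = np.meshgrid(coords, coords, indexing="xy")

def circle_phase_field(cx, cy, radius):
    """Synthetic phase field: zero on the circle, tending to one away from it."""
    dist = np.abs(np.hypot(x - cx, y - cy) - radius)
    return 1.0 - np.exp(-dist / (2.0 * eps))

radius = 0.25
v_i = circle_phase_field(0.5, 0.5, radius)
v_aligned = circle_phase_field(0.5, 0.5, radius)   # average shape matching the input
v_shifted = circle_phase_field(0.6, 0.5, radius)   # average shape displaced sideways

penalty_aligned = mismatch_penalty(v_aligned, v_i, eps, h)
penalty_shifted = mismatch_penalty(v_shifted, v_i, eps, h)
approx_length = length_prior(v_i, eps, h)
```

In this sketch the penalty grows when the two phase fields are misaligned, and the prior is of the order of the circle's circumference $2\pi \cdot 0.25$, which is the behavior the approximations are designed to have.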

6 2D and 3D Applications

We have applied our shape analysis approach to various collections of 2D and 3D shapes. The computed average and dominant variations for sets of 2D shapes are depicted in Figs. 1 to 7 as ﬁrst illustrative examples. Figure 1 shows the average of ﬁve human silhouettes. The corresponding deformations φi and local deformation invariants are displayed in Fig. 2 for two of the input shapes. Particularly the deformed checkerboard patterns show that – due to the invariance properties of the energy – isometries are locally preserved. Also, the indicators of length and area variation only peak locally at the person’s joints. The corresponding principal components are given in Fig. 6. The average shape is represented by the dark line, whereas the light red lines signify deformations of the shape along the principal components. Here, we see the bending of the arm and the leg basically decoupled as the ﬁrst two dominant modes of variation. The silhouette variations of raising the arm or the leg can only be obtained as linear combinations of the ﬁrst and fourth or of the second and third mode of variation, respectively. A larger set of shapes is treated in Fig. 7, where 20 binary images “device7” from the MPEG7 shape database serve as input shapes. Apparently, the ﬁrst principal component is given by a thickening or thinning of the leaves, accompanied by a change of indentation depth between them. The second mode obviously corresponds to bending the leaves, and the third mode represents local changes at the tips: A sharpening and orientation of neighboring

Fig. 6. A set of input shapes (cf. Fig. 1) and their modes of variation with ratios $\lambda_i / \lambda_1$ of 1, 0.22, 0.15, and 0.06


Fig. 7. Original shapes and their first three modes of variation with ratios $\lambda_i / \lambda_1$ of 1, 0.20, and 0.05

Fig. 8. 24 given foot shapes, textured with the distance to the surface of the average foot (bottom right). The range [−6 mm, 6 mm] is color-coded.

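Fig. 9. The first six dominant modes of variation for the feet from Fig. 8, with ratios $\lambda_1/\lambda_1 = 1$, $\lambda_2/\lambda_1 = 0.010$, $\lambda_3/\lambda_1 = 0.010$, $\lambda_4/\lambda_1 = 0.003$, $\lambda_5/\lambda_1 = 0.001$, and $\lambda_6/\lambda_1 = 0.0008$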

tips towards each other, originating e. g. from the sixth or the second last input shape. The ﬁnal example uses 24 foot-shapes as input (which were originally provided as triangulated surfaces and then converted to characteristic functions


on the unit cube). The average shape is shown along with the original shapes in Fig. 8, where the input feet are color-coded according to their local distance to the surface of the average foot. It is doubtlessly diﬃcult to analyze the shape variation on this basis: We see modest variation at the toes and the heel as well as on the instep, but any correlation between these variations is diﬃcult to determine. The corresponding modes of variation in Fig. 9, however, are quite intuitive. For all modes we show the average in the middle and its conﬁgurations after deformation according to the principal components. The ﬁrst mode apparently represents changing foot lengths, the second and third mode belong to diﬀerent variants of combined width and length variation, and the fourth to sixth mode correspond to variations in relative heel position, ankle thickness, and instep height.

7 Conclusion

We have developed an elasticity-based notion of shape variation. Since the shape space of elastically deformable objects inherently does not possess a Riemannian structure, we utilized an alternative shape space structure, in which distance is replaced by elastic deformation energy and boundary stresses play the role of linear representations of shapes. Such an approach imposes a physically and mathematically sound structure on spaces of elastic objects. Its computational feasibility has been proven by application to sets of 2D and 3D shapes.

Acknowledgments

The authors thank Guillermo Sapiro for pointing them to the issue of an elastic principal component analysis. We are grateful to Heiko Schlarb from adidas, Herzogenaurach, Germany, for providing 3D scans of feet. Furthermore, we acknowledge support by the Hausdorff Center for Mathematics. Benedikt Wirth has been supported by the Bonn International Graduate School.

References

1. Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J.: Active shape models - their training and application. Computer Vision and Image Understanding 61(1), 38–59 (1995)
2. Chalmond, B., Girard, S.C.: Nonlinear modeling of scattered multivariate data and its application to shape change. IEEE Transactions on Pattern Analysis and Machine Intelligence 21(5), 422–432 (1999)
3. Faugeras, O., Adde, G., Charpiat, G., Chefd'Hotel, C., Clerc, M., Deneux, T., Deriche, R., Hermosillo, G., Keriven, R., Kornprobst, P., Kybic, J., Lenglet, C., Lopez-Perez, L., Papadopoulo, T., Pons, J.P., Segonne, F., Thirion, B., Tschumperlé, D., Viéville, T., Wotawa, N.: Variational, geometric, and statistical methods for modeling brain anatomy and function. NeuroImage 23, S46–S55 (2004)
4. Kendall, D.G.: Shape manifolds, procrustean metrics, and complex projective spaces. Bull. London Math. Soc. 16, 81–121 (1984)
5. Charpiat, G., Faugeras, O., Keriven, R.: Approximations of shape metrics and application to shape warping and empirical shape statistics. Foundations of Computational Mathematics 5(1), 1–58 (2005)
6. Leventon, M., Grimson, W., Faugeras, O.: Statistical shape influence in geodesic active contours. In: 5th IEEE EMBS International Summer School on Biomedical Imaging, 2002 (2002)
7. Dambreville, S., Rathi, Y., Tannenbaum, A.: A shape-based approach to robust image segmentation. In: Campilho, A., Kamel, M. (eds.) ICIAR 2006. LNCS, vol. 4141, pp. 173–183. Springer, Heidelberg (2006)
8. Cremers, D., Kohlberger, T., Schnörr, C.: Shape statistics in kernel space for variational image segmentation. Pattern Recognition 36, 1929–1943 (2003)
9. Rathi, Y., Dambreville, S., Tannenbaum, A.: Comparative analysis of kernel methods for statistical shape learning. In: Beichel, R., Sonka, M. (eds.) CVAMIA 2006. LNCS, vol. 4241, pp. 96–107. Springer, Heidelberg (2006)
10. Mémoli, F., Sapiro, G.: A theoretical and computational framework for isometry invariant recognition of point cloud data. Foundations of Computational Mathematics 5, 313–347 (2005)
11. Miller, M.I., Younes, L.: Group actions, homeomorphisms and matching: a general framework. International Journal of Computer Vision 41(1-2), 61–84 (2001)
12. Miller, M., Trouvé, A., Younes, L.: On the metrics and Euler-Lagrange equations of computational anatomy. Annual Review of Biomedical Engineering 4, 375–405 (2002)
13. Fuchs, M., Jüttler, B., Scherzer, O., Yang, H.: Shape metrics based on elastic deformations. Forschungsschwerpunkt S92, Industrial Geometry 71, Universität Innsbruck (2008)
14. Rumpf, M., Wirth, B.: A nonlinear elastic shape averaging approach. SIAM Journal on Imaging Sciences (2008) (submitted)
15. Ciarlet, P.G.: Three-dimensional elasticity. Elsevier Science Publishers B.V., Amsterdam (1988)
16. Baker, T.: Three dimensional mesh generation by triangulation of arbitrary point sets. In: Computational Fluid Dynamics Conference, 8th, Honolulu, HI, June 9-11, 1987, vol. 1124-CP, pp. 255–271 (1987)
17. Srivastava, A., Jain, A., Joshi, S., Kaziska, D.: Statistical shape models using elastic-string representations. In: Narayanan, P. (ed.) ACCV 2006. LNCS, vol. 3851, pp. 612–621. Springer, Heidelberg (2006)
18. Younes, L.: Computable elastic distances between shapes. SIAM J. Appl. Math. 58, 565–586 (1998)
19. Karcher, H.: Riemannian center of mass and mollifier smoothing. Communications on Pure and Applied Mathematics 30(5), 509–541 (1977)
20. Ambrosio, L., Tortorelli, V.M.: On the approximation of free discontinuity problems. Bollettino dell'Unione Matematica Italiana, Sezione B 6(7), 105–123 (1992)
21. Mumford, D., Shah, J.: Optimal approximation by piecewise smooth functions and associated variational problems. Communications on Pure and Applied Mathematics 42, 577–685 (1989)

Pre-image as Karcher Mean Using Diffusion Maps: Application to Shape and Image Denoising

Nicolas Thorstensen, Florent Segonne, and Renaud Keriven

Universite Paris-Est, Ecole des Ponts ParisTech, Certis
[email protected]
http://certis.enpc.fr/~thorsten

Abstract. In the context of shape and image modeling by manifold learning, we focus on the problem of denoising. A set of shapes or images being known through given samples, we capture its structure thanks to the Diffusion Maps method. Denoising a new element classically boils down to the key problem of pre-image determination, i.e. recovering a point, given its embedding. We propose to model the underlying manifold as the set of Karcher means of close sample points. This non-linear interpolation is particularly well-adapted to the case of shapes and images. We define the pre-image as such an interpolation having the targeted embedding. Results on synthetic 2D shapes and on real 2D images and 3D shapes are presented and demonstrate the superiority of our pre-image method compared to several state-of-the-art techniques in shape and image denoising based on statistical learning techniques.

1 Introduction

Manifold learning, the process of extracting the meaningful structure and correct geometric description present in a set of training points $\Gamma = \{s_1, \dots, s_p\} \subset S$, has seen renewed interest over the past years. These techniques are closely related to the notion of dimensionality reduction, i.e. the process of recovering the underlying low-dimensional structure of a manifold $M$ that is embedded in a higher-dimensional space $S$. Among the most recent and popular techniques are the Locally Linear Embedding (LLE) [5], Isomap [6], Laplacian eigenmaps [7] and Diffusion Maps [8, 9, 10]. In this paper we focus on Diffusion Maps. Their nonlinearity, as well as their locality-preserving property and stable behavior under noise, are generally viewed as a major advantage over classical methods like principal component analysis (PCA) and classical multidimensional scaling [8]. This method considers an adjacency graph on the set $\Gamma$ of training samples, whose matrix $(W_{i,j})_{i,j \in 1,\dots,p}$ captures the local geometry of $\Gamma$, i.e. its local connectivity, through the use of a kernel function $w$. $W_{i,j} = w(s_i, s_j)$ measures the strength of the edge between $s_i$ and $s_j$. Typically $w(s_i, s_j)$ is a decreasing function of the distance $d_S(s_i, s_j)$ between the training points $s_i$ and $s_j$. In this work, we use the Gaussian kernel $w(s_i, s_j) = \exp(-d_S^2(s_i, s_j)/2\sigma^2)$, with $\sigma$ estimated as the median of the distances between all the training points [2, 10]. The kernel function has the property of implicitly mapping data points into a high-dimensional space, called the feature space. This space is better suited for the study of non-linear data. Computing the Diffusion Maps amounts to embedding the data into the

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 721–732, 2009. © Springer-Verlag Berlin Heidelberg 2009


N. Thorstensen, F. Segonne, and R. Keriven

feature space through a mapping $\Psi$. While the mapping from input space to feature space is of primary importance, the reverse mapping from feature space back to input space (the pre-image problem) is also useful. Consider for example the use of kernel PCA for pattern denoising. Given some noisy patterns, kernel PCA first applies linear PCA on the $\Psi$-mapped patterns in the feature space, and then performs denoising by projecting them onto the subspace defined by the leading eigenvectors. These projections, however, are still in the feature space and have to be mapped back to the input space in order to recover the denoised patterns.

1.1 Related Work

Statistical methods for shape processing are very common in computer vision. A seminal work in this direction was published by Leventon et al. [11], adding statistical knowledge into energy-based segmentation methods. Their method captures the main modes of variation by performing a PCA on the set of shapes. This was extended to nonlinear statistics by Cremers et al. in [12]. The authors introduce non-linear shape priors by using a probabilistic version of Kernel PCA (KPCA). Dambreville et al. [1] and Arias et al. [2] developed a method for shape denoising based on Kernel PCA. So did Kwok et al. [3] in the context of image denoising. Both methods compute a projection of the noisy datum onto a low-dimensional space. In [13, 4] the authors propose another kernel method for data denoising, the so-called Laplacian Eigenmaps Latent Variable Model (LELVM), a probabilistic method. This model provides a dimensionality reduction and reconstruction mapping based on linear combinations of input samples. LELVM performs well on motion capture data but fails on complex shapes (see Fig. 1). Further, we would like to mention the work of Pennec [14] and Fletcher [15], who model the manifold of shapes as a Riemannian manifold and the mean of such shapes as a Karcher mean [16]. Their methodology is used in the context of computational anatomy to solve the average template matching problem. Closer to our work is the algorithm proposed by Etyngier et al. [17]. They use Diffusion Maps as a statistical framework for non-linear shape priors in segmentation. They augment an energy functional by a shape prior term. Contrary to us, they do not compute a denoised shape but propose an additional force toward a rough estimate of it.

Fig. 1. Digit images corrupted by additive Gaussian noise (from left to right, $\sigma^2$ = 0.25, 0.45, 0.65, 0.85). The different rows respectively represent, from top to bottom: the original digits; the corrupted digits; denoising with [1]; with [1]+[2]; with [3]; with [3]+[2]; with [4]; with our Karcher means based method. See Table 2 for quantified results.


1.2 Our Contributions

In this paper, we propose a new method to solve the pre-image problem (see Section 3) in the context of Diffusion Maps for shape and image denoising. We suggest a manifold interpretation and learn the intrinsic structure of a given training set. Our method relies on a geometric interpretation of the problem which naturally leads to the definition of the pre-image as a Karcher mean [16] that interpolates between neighboring samples according to the diffusion distance. Previous pre-image methods were designed for Kernel PCA. Our motivation for using Diffusion Maps comes from the fact that the computed mapping captures the intrinsic geometry of the underlying manifold independently of the sampling. Therefore, the resulting Nyström extension (see Section 2.2) proves to be more "meaningful" far from the manifold and leads to quantitatively better pre-image estimations, even for very noisy input data. In the case of shape denoising, we compare our results to the work proposed by Dambreville [1] and, for image denoising, to several denoising algorithms using Kernel PCA: [3], [2], [4]. Results on 3D shapes and 2D images are presented and demonstrate the superiority of our method. The rest of the paper is organized as follows. Section 2 presents the Diffusion Maps framework and the out-of-sample extension. Section 3 introduces our pre-image methodology. Numerical experiments on real data are reported in Section 4 and Section 5 concludes.

2 Learning a Set of Shapes

Let $\Gamma = \{s_1, \ldots, s_p\}$ be $p$ independent random points of an $m$-dimensional manifold $\mathcal{M}$, locally sampled under some density $q_{\mathcal{M}}(s)$ ($m \ll p$). The manifold $\mathcal{M}$ is assumed to be a smooth finite-dimensional sub-manifold embedded in a (potentially infinite-dimensional) space $\mathcal{S}$. The density $q_{\mathcal{M}}(s)$ is unknown and might not be uniform. In this work, we consider more general spaces than the traditional Euclidean space $\mathbb{R}^n$ and only assume that the input space $\mathcal{S}$ is equipped with a distance $d_{\mathcal{S}}$.

2.1 Diffusion Maps

To extract the meaningful structure present in the training set $\Gamma$, classical manifold learning techniques minimize a quadratic distortion measure of the desired coordinates on the data, naturally leading to the eigenfunctions of Laplace-type operators as minimizers [8, 9]. Unfortunately, most unsupervised learning methods generate coordinates (the embedding) that combine the information of both the density $q_{\mathcal{M}}$ and the geometry [9, 10, 18]. Diffusion Maps construct a discrete density-independent approximation of the Laplace-Beltrami operator $\Delta_{\mathcal{M}}$ defined on $\mathcal{M}$ and provide an embedding that captures the intrinsic geometry independently of the sampling density. We quickly review the construction of Diffusion Maps [8]. In a first step, we build a fully connected graph on the set $\Gamma$ where each node corresponds to a sample in $\Gamma$. Based on the distance $d_{\mathcal{S}}$ between samples, nodes are connected if their mutual distance is less than or equal to $\sigma$, with $\sigma$ being the median distance between all shapes. To build the normalized Laplacian matrix, we use the diffusion kernel $w(\cdot,\cdot)$:

$$P_{i,j} = p(s_i, s_j) = \frac{w(s_i, s_j)}{g(s_i)}. \tag{1}$$


N. Thorstensen, F. Segonne, and R. Keriven

The diffusion kernel $w(s_i, s_j)$ encodes the probability of transition between $s_i$ and $s_j$, and $g(s_i)$ normalizes the quantity in (1) such that $\sum_j p(s_i, s_j) = 1$. Therefore, the quantity $p(s_i, s_j)$ can be seen as the probability of a random walker to jump from $s_i$ to $s_j$, and $P$ encodes a Markov chain on $\Gamma$. The function $g(s_i)$ measures the number of incident edges to the node corresponding to the shape $s_i$. If we introduce a time $t$ and denote by $p_t$ the elements of $P^t$ (the $t$-th power of $P$), then $p_t(s_i, s_j)$ corresponds to the probability of transition after $t$ time steps. When $t \to \infty$, the random walk converges to a unique stationary distribution $\varphi_0$, which satisfies $\varphi_0^T P = \varphi_0^T$. Using a well-known fact from spectral theory, Coifman et al. [8] introduce the following eigen-decomposition of the kernel $p_t$:

$$p_t(s_i, s_j) = \sum_{l \ge 0} \lambda_l^t \, \psi_l^t(s_i) \, \varphi_l^t(s_j), \tag{2}$$

where $\{\lambda_l^t\}$ is the decreasing eigenspectrum of $P^t$, and $\{\varphi_l^t(s_j)\}$ and $\{\psi_l^t(s_i)\}$ are the corresponding biorthogonal left and right eigenvectors, respectively. They satisfy

$$\varphi_0(x) \, \psi_l(x) = \varphi_l(x). \tag{3}$$
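The construction of $P$ in Equation (1) can be illustrated with a minimal sketch. It assumes a Gaussian kernel for $w$ (the text does not specify the kernel), uses the median pairwise distance as $\sigma$, and takes random points in $\mathbb{R}^3$ as stand-ins for shapes; the function name `markov_matrix` is hypothetical.

```python
import numpy as np

def markov_matrix(dist, sigma=None):
    """Row-normalise a kernel built from pairwise distances, as in Eq. (1)."""
    if sigma is None:
        # Median of the off-diagonal pairwise distances.
        sigma = np.median(dist[np.triu_indices_from(dist, k=1)])
    w = np.exp(-(dist / sigma) ** 2)   # assumed Gaussian kernel w(s_i, s_j)
    g = w.sum(axis=1)                  # g(s_i): weighted degree of node i
    return w / g[:, None], g

rng = np.random.default_rng(0)
points = rng.normal(size=(20, 3))      # toy "shapes": points in R^3
dist = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
P, g = markov_matrix(dist)

# For a symmetric kernel, the normalised degrees are stationary for P.
phi0 = g / g.sum()
```

Each row of `P` sums to one, and `phi0 @ P` reproduces `phi0`, which is the discrete counterpart of $\varphi_0^T P = \varphi_0^T$.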

Note that, because the sequence of eigenvalues decays fast, only a few terms need to be retained to approximate the probability $p_t(\cdot,\cdot)$ within a given relative accuracy. The diffusion distance $D_t(s_i, s_j)$ between two points $s_i$ and $s_j$ can then be written as

$$D_t^2(s_i, s_j) = \sum_{l} \frac{\left(p_t(s_i, s_l) - p_t(s_j, s_l)\right)^2}{\varphi_0(s_l)}. \tag{4}$$

This simple $L^2$-weighted distance between the two conditional probabilities $p_t(s_i, \cdot)$ and $p_t(s_j, \cdot)$ defines a metric on the data that measures the amount of connectivity of the points $s_i$ and $s_j$ along paths of length $t$. To relate the diffusion distance to the eigenvectors, we combine (2) and (4) and find, using the biorthogonality relation between left and right eigenvectors (cf. [10]), that

$$D_t^2(s_i, s_j) = \sum_{l \ge 1} \left(\lambda_l^t \psi_l^t(s_i) - \lambda_l^t \psi_l^t(s_j)\right)^2 \tag{5}$$

(since $\psi_0$ is a constant vector, it is left out of the sum). Equation (5) shows that the right eigenvectors of $P^t$ can be used to express the diffusion distance. To this end, we introduce the family of Diffusion Maps indexed by a time parameter $t$:

$$\Psi_t(s) = \begin{pmatrix} \lambda_0^t \psi_0^t(s) \\ \lambda_1^t \psi_1^t(s) \\ \vdots \end{pmatrix}.$$

In the sequel we omit the parameter $t$ and assume it is set to a fixed value [10]. From Equation (5), we can see that Diffusion Maps generate a quasi-isometric mapping, since the diffusion distance is approximately equal to the $L^2$ metric in the new coordinate system when retaining the first $m$ eigenvectors. Also note that methods like LLE or Laplacian Eigenmaps do not provide an explicit metric, which is crucial for the contribution of this paper.
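The equivalence between Equations (4) and (5) can be checked numerically. The sketch below is illustrative only: it assumes a symmetric Gaussian kernel with unit bandwidth, computes the spectrum of $P$ through the symmetric matrix $D^{-1/2} W D^{-1/2}$ (a standard device, not taken from the text), and scales the right eigenvectors so that $\psi_0$ is constant. The name `diffusion_eigs` is hypothetical.

```python
import numpy as np

def diffusion_eigs(w):
    """Eigenvalues and right eigenvectors of P = D^{-1} W, obtained from the
    symmetric matrix D^{-1/2} W D^{-1/2}, which has the same spectrum."""
    g = w.sum(axis=1)
    lam, v = np.linalg.eigh(w / np.sqrt(np.outer(g, g)))
    order = np.argsort(-lam)                 # decreasing eigenvalues
    lam, v = lam[order], v[:, order]
    # Scale so that psi_0 is constant with modulus 1 (its sign is arbitrary).
    psi = np.sqrt(g.sum()) * v / np.sqrt(g)[:, None]
    return lam, psi, g / g.sum()             # spectrum, psi, stationary phi_0

rng = np.random.default_rng(1)
pts = rng.normal(size=(15, 2))
d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
w = np.exp(-d ** 2)                          # assumed Gaussian kernel

t, i, j = 3, 0, 5
lam, psi, phi0 = diffusion_eigs(w)
P = w / w.sum(axis=1, keepdims=True)
Pt = np.linalg.matrix_power(P, t)

# Eq. (4): weighted L2 distance between rows i and j of P^t.
d2_eq4 = np.sum((Pt[i] - Pt[j]) ** 2 / phi0)

# Eq. (5): squared Euclidean distance between diffusion coordinates, l >= 1.
coords = (lam ** t) * psi
d2_eq5 = np.sum((coords[i, 1:] - coords[j, 1:]) ** 2)
```

With all eigenvectors retained, the two values agree up to rounding; truncating `coords` to the first $m$ columns gives the approximation discussed in the text.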


2.2 Out-of-Sample Extension

In general, the mapping $\Psi$, also referred to as an embedding, is only known over the training set. The extension of the mapping to new input points is of primary importance for kernel-based methods, whose success depends crucially on the “accuracy” of the extension. This problem, referred to as the out-of-sample problem, is often solved using the popular Nyström extension method [2, 19, 18]. Instead of recomputing the whole embedding, which can be costly for very large datasets because it involves a spectral decomposition, the problem is solved through a method borrowed from numerical analysis [20]. With this technique in hand, and considering that every training sample satisfies

$$\forall s_j \in \Gamma,\ \forall l \in \{1, \ldots, p\}: \quad \sum_{s_i \in \Gamma} p(s_j, s_i)\, \psi_l(s_i) = \lambda_l \psi_l(s_j),$$

the embedding of new data points located outside the set $\Gamma$ can similarly be computed by a smooth extension $\hat{\Psi}$ of $\Psi$:

$$\hat{\Psi} : \mathcal{S} \to \mathbb{R}^p, \quad s \mapsto (\hat{\psi}_1(s), \ldots, \hat{\psi}_p(s)), \qquad \forall l \in \{1, \ldots, p\}: \ \hat{\psi}_l(s) = \frac{1}{\lambda_l} \sum_{y \in \Gamma} p(s, y)\, \psi_l(y). \tag{6}$$

It is obvious that the extension depends on the data, and recomputing the whole embedding with the new datum would yield a different embedding. In general, however, the approximation works well and is used throughout the literature. In addition, the reverse mapping from the feature space back to the input space is often required. After operations are performed in feature space (these operations necessitate the extension of the mapping), the corresponding data points in input space often need to be estimated. This problem, known as the pre-image problem, is the one addressed in this paper. We now tackle the problem of pre-image computation using Diffusion Maps.
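As an illustration of the out-of-sample extension, the following sketch embeds new points by a kernel-weighted average of the training eigenvectors. It is written under stated assumptions: a Gaussian kernel with unit bandwidth, Euclidean toy data, and an extension normalised by $1/\lambda_l$ so that training samples are mapped back onto their own coordinates. The helper names `kernel`, `fit` and `nystrom` are hypothetical.

```python
import numpy as np

def kernel(a, b, sigma=1.0):
    """Assumed Gaussian kernel between the rows of a and the rows of b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / sigma ** 2)

def fit(train, n_coords):
    """Leading non-trivial eigenpairs of the Markov matrix on the training set."""
    w = kernel(train, train)
    g = w.sum(axis=1)
    lam, v = np.linalg.eigh(w / np.sqrt(np.outer(g, g)))
    order = np.argsort(-lam)[: n_coords + 1]
    lam, v = lam[order], v[:, order]
    psi = v / np.sqrt(g)[:, None]          # right eigenvectors of P
    return lam[1:], psi[:, 1:]             # drop the constant eigenvector

def nystrom(new, train, lam, psi):
    """Extend the eigenvectors to new points by kernel-weighted averaging."""
    w = kernel(new, train)
    p = w / w.sum(axis=1, keepdims=True)   # transition probabilities p(s, y)
    return (p @ psi) / lam

rng = np.random.default_rng(2)
train = rng.normal(size=(30, 2))
lam, psi = fit(train, n_coords=3)
new_point = np.array([[0.1, -0.2]])
psi_new = nystrom(new_point, train, lam, psi)   # shape (1, 3)
```

Applying `nystrom` to the training samples themselves returns `psi`, which is a quick sanity check of the normalisation. Only the leading eigenvectors are extended here, since dividing by very small eigenvalues would amplify noise.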

3 Pre-image as Karcher Means

We push the manifold interpretation and define the pre-image of $\phi \in \mathbb{R}^p$ as the point $s = \Psi_{|\mathcal{M}}^{-1}(\phi)$ in the manifold $\mathcal{M}$ such that $\Psi(s) = \phi$. Although Diffusion Maps extract the global geometry of the training set and define a robust notion of proximity, they do not permit the estimation of the manifold between training samples, i.e., the local geometry of the manifold is not provided. Following [21], we propose to approximate the manifold as the set of Karcher means [16] interpolating between correctly chosen subsets of $m + 1$ sample points, $m$ being the fixed dimension reduction parameter. It is usually chosen by inspecting the eigenvalues. As mentioned in Section 2.1, only a few eigenvectors are needed to approximate the diffusion distance well, and the parameter $m$ is exactly the number of eigenvectors retained. From a dimensionality reduction point of view, this parameter corresponds to the number of degrees of freedom in the data set, which cannot be computed automatically and therefore must be guessed. In [21], these subsets are the Delaunay simplices of an $m$-dimensional Delaunay triangulation of


the sample points. This limits $m$ in practice to small values. Here, we simply exploit the Euclidean nature of the feature space: for a given $\phi$, we choose the interpolating subset as its $m + 1$ nearest neighbors with respect to the diffusion distance $D$. We then define the pre-image $s = \Psi_{|\mathcal{M}}^{-1}(\phi)$ as a Karcher mean that minimizes the mean-squared criterion:

$$s = \arg\min_{z \in \mathcal{S}} \|\Psi(z) - \phi\|^2. \tag{7}$$
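Selecting the interpolating subset is a plain nearest-neighbor query in the feature space. A minimal sketch, assuming the training embeddings are stored row-wise in an array and that the Euclidean distance between diffusion coordinates stands in for $D$ (the function name is hypothetical):

```python
import numpy as np

def interpolating_subset(phi, embedding, m):
    """Indices of the m + 1 training samples whose embeddings are closest to phi."""
    sq_dist = np.sum((embedding - phi) ** 2, axis=1)
    return np.argsort(sq_dist)[: m + 1]

# Toy embedding of four training samples in a 2-dimensional feature space.
embedding = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
phi = np.array([0.2, 0.1])
neighbours = interpolating_subset(phi, embedding, m=2)
```

Unlike a Delaunay triangulation of the samples, this query costs one sort per target point and does not restrict $m$ to small values.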

3.1 Shape Interpolation Using Karcher Means

Given a set of neighboring points $N = \{s_1, \ldots, s_{m+1}\}$ (i.e., neighboring for the diffusion distance $D$), we assume that the manifold $\mathcal{M}$ can be locally described (i.e., between neighboring samples) by a set of weighted-mean samples $\{s_\Theta\}$ that satisfy

$$s_\Theta = \arg\min_{z \in \mathcal{S}} \sum_{1 \le i \le m+1} \theta_i \, d_{\mathcal{S}}(z, s_i)^2, \tag{8}$$

where $d_{\mathcal{S}}$ is the distance in the input space, $\theta_i \ge 0$ and $\sum_{i=1}^{m+1} \theta_i = 1$. The coefficients $\Theta = \{\theta_1, \ldots, \theta_{m+1}\}$ are the barycentric coefficients of the point $s_\Theta$ with respect to its neighbors $N$ in $\mathcal{S}$. Proposed by Charpiat et al. [22], this model has been shown to give natural shape interpolations compared to linear approximations. One classical choice of distance is the area of the symmetric difference between the regions bounded by the two shapes:

$$d_{SD}(s_1, s_2) = \frac{1}{2} \int |\chi_{\Omega_1} - \chi_{\Omega_2}|, \tag{9}$$

where $\chi_{\Omega_i}$ is the characteristic function of the interior of shape $s_i$. This distance was recently advocated by Solem [23] to build geodesic paths between shapes. Its drawback is that it does not yield unique geodesics. We proved this behavior analytically in the context of our method, but in the simulations we did not encounter any problems with the symmetric difference distance. Another definition has been proposed [11, 24, 22], based on the representation of a curve in the plane, or of a surface in 3D space, by its signed distance function. In this context, the distance between two shapes can be defined as the $L^2$-norm or the Sobolev $W^{1,2}$-norm of the difference between their signed distance functions. Let us recall that $W^{1,2}(\Omega)$ is the space of square-integrable functions over $\Omega$ with square-integrable derivatives:

$$d_{L^2}(s_1, s_2)^2 = \|D_{s_1} - D_{s_2}\|^2_{L^2(\Omega, \mathbb{R})}, \tag{10}$$

$$d_{W^{1,2}}(s_1, s_2)^2 = \|D_{s_1} - D_{s_2}\|^2_{L^2(\Omega, \mathbb{R})} + \|\nabla D_{s_1} - \nabla D_{s_2}\|^2_{L^2(\Omega, \mathbb{R}^n)}, \tag{11}$$

where $D_{s_i}$ denotes the signed distance function of shape $s_i$ ($i = 1, 2$), and $\nabla D_{s_i}$ its gradient.

3.2 Pre-image and Manifold Interpolation

We propose to define the pre-image of a target point $\phi$ in the feature space as the point $s_\Theta$ that minimizes the energy $E_\Psi(s_\Theta) = \|\Psi(s_\Theta) - \phi\|^2$, $s_\Theta$ being expressed as a Karcher


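mean for the neighborhood $N$ made of the $m + 1$ samples of $\Gamma$ whose embeddings are the $m + 1$ closest neighbors of $\phi$ in the feature space equipped with $D$:

$$\Psi_{|\mathcal{M}}^{-1}(\phi) = \arg\min_{s_\Theta} \|\Psi(s_\Theta) - \phi\|^2, \quad \text{where } s_\Theta = \arg\min_{z \in \mathcal{S}} \sum_{1 \le i \le m+1} \theta_i \, d_{\mathcal{S}}(z, s_i)^2. \tag{12}$$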

When the input space is some Euclidean space $\mathbb{R}^n$ with its traditional $L^2$-norm, this indeed amounts to assuming that the manifold $\mathcal{M}$ is piecewise linear (i.e., linearly interpolated between neighboring training samples). For shapes, we will see that this yields natural pre-images. By simple extension, we define the projection of any new test sample $s$ on the manifold $\mathcal{M}$ by $\Pi_{\mathcal{M}}(s) = \Psi_{|\mathcal{M}}^{-1}(\Psi(s))$.

3.3 Implementation Issues

The pre-image $\Psi_{|\mathcal{M}}^{-1}(\phi)$ is computed by gradient descent. Instead of optimizing over $\Theta$, we use a descent over $s_\Theta$ itself (Equation 13), constraining it to remain a Karcher mean (Equation 8). This boils down to projecting the deformation field $\nabla_s E_\Psi$ onto the tangent space $T\mathcal{M}_{s_\Theta}$ of $\mathcal{M}$ at the point $s_\Theta$. Note that to compute this tangent space, we implicitly assume that the space $\mathcal{S}$ has a manifold structure, in particular that the tangent space $T\mathcal{S}_{s_\Theta}$ of $\mathcal{S}$ at location $s_\Theta$ (i.e., the space of local deformations around $s_\Theta$) is equipped with an inner product that we denote $\langle \cdot | \cdot \rangle_{\mathcal{S}}$. The optimality condition of Equation 8 is:

$$\forall \beta \in T\mathcal{S}_{s_\Theta}, \quad \sum_{i=1}^{m+1} \theta_i \, d_i \, \langle \nabla_s d_i \,|\, \beta \rangle_{\mathcal{S}} = 0,$$

where we denote $N = \{s_1, \ldots, s_{m+1}\}$ and $d_i = d_{\mathcal{S}}(s_\Theta, s_i)$. In order to recover the tangent space $T\mathcal{M}_{s_\Theta}$ at $s_\Theta$, one needs to relate the $m$ independent modes of variation of the coefficients $\Theta$ (remember that $\sum_{i=1}^{m+1} \theta_i = 1$) with local deformation fields $ds_\Theta \in T\mathcal{S}_{s_\Theta}$. To a small variation of the barycentric coefficients $\Theta \to \Theta + d\Theta$ corresponds a small deformation of the sample $s_\Theta \to s_\Theta + ds_\Theta$. Differentiating the optimality condition with respect to $\Theta$ and $s_\Theta$ provides the relation between $d\Theta$ and $ds_\Theta$. For example, when the input space is taken to be the Euclidean space, i.e., $\mathcal{S} = \mathbb{R}^n$, we obtain $ds_\Theta = \sum_{i=1}^{m+1} d\theta_i \, s_i$. Remembering that $\sum_{i=1}^{m+1} d\theta_i = 0$ and fixing the $d\theta_i$ appropriately, we can recover $T\mathcal{M}_{s_\Theta}$. Therefore we optimize for $s_\Theta$ without explicitly computing $\Theta$. The gradient descent generates a family of samples $s : \tau \in \mathbb{R}^+ \to s(\tau) \in \mathcal{M}$ such that

$$s(0) = s_0, \qquad \frac{ds}{d\tau} = -v_{\mathcal{M}}(s_\tau),$$

with $s_0 \in N$ (in practice, the nearest neighbor of $\phi$). The velocity field $v_{\mathcal{M}}(s_\tau)$ is the orthogonal projection of the deformation field $\nabla_{s_\tau} E_\Psi = (\Psi(s_\tau) - \phi)^T \Lambda \Psi^T \nabla_{s_\tau} p_{s_\tau}$ onto the tangent space $T\mathcal{M}_{s_\tau}$. Here $\Lambda$ is a diagonal matrix of eigenvalues and $\Psi$ contains the corresponding eigenvectors. Note that before projecting onto $T\mathcal{M}_{s_\tau}$, we first orthogonalize the basis of the tangent space using Gram-Schmidt. In the case of the $L^2$-norm, the $\Theta$'s can be


Fig. 2. Interpolation using Karcher means for 39 three-dimensional sample shapes. From left to right: a) a new shape not in the given sample b) the same shape with an occlusion c) the 3 nearest neighbors of the corrupted shape according to the diffusion distance (in red, green and blue) d) the original shape (in yellow) and our interpolation (in red). See text for quantitative results.

easily recovered. When using a different distance function, such as the symmetric difference or the Sobolev $W^{1,2}$-norm, one additionally needs to solve a system of linear equations at each step of the gradient descent.
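For the Euclidean case, the ingredients of this section can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: samples are points in $\mathbb{R}^n$, the Karcher mean reduces to the weighted average, the tangent space is spanned by the differences $s_i - s_1$, and a QR factorisation plays the role of Gram-Schmidt. The constraint $\theta_i \ge 0$ is not enforced, and the "gradient" is a random placeholder for $\nabla_s E_\Psi$. All function names are hypothetical.

```python
import numpy as np

def karcher_mean_l2(samples, theta):
    """Minimiser of sum_i theta_i |z - s_i|^2 in R^n: the weighted average."""
    return theta @ samples

def recover_theta(s, samples):
    """Barycentric coefficients of s with respect to the samples (L2 case).
    Solves sum_i theta_i s_i = s together with sum_i theta_i = 1."""
    a = np.vstack([samples.T, np.ones(len(samples))])
    b = np.append(s, 1.0)
    theta, *_ = np.linalg.lstsq(a, b, rcond=None)
    return theta

def project_to_tangent(field, samples):
    """Orthogonal projection of a deformation field onto span{s_i - s_1}."""
    directions = (samples[1:] - samples[0]).T   # n x m matrix of edge vectors
    q, _ = np.linalg.qr(directions)             # orthonormal basis of the span
    return q @ (q.T @ field)

rng = np.random.default_rng(3)
samples = rng.normal(size=(3, 4))               # m + 1 = 3 neighbours in R^4
theta = np.array([0.2, 0.3, 0.5])
s = karcher_mean_l2(samples, theta)

gradient = rng.normal(size=4)                   # placeholder for grad E_Psi
s_next = s - 0.1 * project_to_tangent(gradient, samples)
theta_next = recover_theta(s_next, samples)
```

Because the step is taken along the projected field, `s_next` stays in the affine hull of the neighbors, so `theta_next` still sums to one and reproduces `s_next` exactly.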

4 Results

In order to validate the proposed method, we run several experiments on real and synthetic data. First, we test the Karcher mean interpolation on the reconstruction of occluded 3D medical shapes [1]. In a second experiment, we validate the purpose of projecting the gradient onto the tangent space. Finally, a third experiment demonstrates the superiority of our method on a standard image denoising problem.

4.1 Remaining on the Manifold

To validate both the Karcher means modeling of the manifold and our projection constraint (Section 3.3), we generate a set of 200 synthetic shapes parameterized by an articulation angle and a scaling parameter (Fig. 3a). The corresponding embeddings are shown in Fig. 3b. Choosing two distant shapes $A$ and $B$, we compute a path $s(\tau)$ from $A$ to $B$ by means of a gradient descent starting from $s(0) = A$ and minimizing $d_{\mathcal{S}}(s(\tau), B)$. Fig. 3c and 3b show in red the intermediate shapes and the corresponding embeddings. The same path, obtained when projecting the gradient in order to remain on the manifold, is shown in purple. Observe how the intermediate shapes look more like the original sample ones in that case. Note also that when remaining on $\mathcal{M}$, the interpolating path is almost a straight line with respect to the diffusion distance.

4.2 Projection and Manifold as Karcher Means

We here test the validity of using Karcher means as a manifold interpolation model. We consider the space of two-dimensional surfaces embedded in $\mathbb{R}^3$. For such a general space, many different definitions of the distance between two shapes have been proposed in the computer vision literature, but there is no agreement on the correct way to measure shape similarity. In this work, we represent a surface $s_i$ in the Euclidean embedding space $\mathbb{R}^3$ by its signed distance function $D_{s_i}$. In this context, we define the distance between two shapes to be the $L^2$-norm of the difference between their signed distance functions [11]:

$$d_{\mathcal{S}}(s_1, s_2)^2 = \|D_{s_1} - D_{s_2}\|^2_{L^2}.$$
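The $L^2$ distance between signed distance functions is straightforward to discretise. The sketch below uses two concentric discs on a regular 2-D grid, for which the signed distance is known in closed form; the sign convention (negative inside), the grid and the function names are assumptions made for illustration, and real shapes would need a numerical distance transform instead.

```python
import numpy as np

def disc_sdf(grid_x, grid_y, centre, radius):
    """Signed distance to a circle: negative inside, positive outside."""
    return np.hypot(grid_x - centre[0], grid_y - centre[1]) - radius

def l2_shape_distance(sdf_a, sdf_b, cell_area):
    """Discrete L2 norm of the difference of two signed distance functions."""
    return np.sqrt(np.sum((sdf_a - sdf_b) ** 2) * cell_area)

h = 0.05                                   # grid spacing
axis = np.arange(-2.0, 2.0, h)
gx, gy = np.meshgrid(axis, axis)

small = disc_sdf(gx, gy, (0.0, 0.0), 0.5)
large = disc_sdf(gx, gy, (0.0, 0.0), 0.8)
dist = l2_shape_distance(small, large, h * h)
```

For concentric discs the two signed distance functions differ by the constant radius difference, so `dist` equals that difference times the square root of the grid area, which makes the example easy to verify by hand.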


Table 1. Average reconstruction error for a set of 9 noisy shapes

            Shapes with occlusion   Nearest neighbors (NN)   Mean of NN   [1]    Our method
Avg err     4.67                    1.81                     1.96         1.1    0.58

Fig. 3. Synthetic sample of 200 articulated and elongated shapes. From left to right: (a) a subset of the sample; (b) triangulated 2-dimensional embedding computed using Diffusion Maps and a gradient descent from an initial shape to a target one, without (red dots) and with (purple dots) remaining on the interpolated manifold; (c) some shapes of the resulting evolution (left column: without projection, right column: with projection).

Note that, in order to define a distance between shapes that is invariant to rigid displacements (e.g., rotations and translations), we first align the shapes using their principal moments before computing distances. Note also that the proposed method is not limited to a specific choice of distance [22, 17]. We use a dataset of 39 ventricles nuclei extracted from Magnetic Resonance Images (MRI). We learn a random subset of 30 shapes and corrupt the nine remaining shapes by an occlusion (Fig. 2a,b). In order to recover the original shapes, we project the corrupted shapes onto the shape manifold with our method. We then compare the reconstruction results with the nearest neighbor, the mean of the $m + 1$ nearest neighbors, and the method of Dambreville et al. [1]. The parameter of this experiment is $m = 2$. Figure 2d shows one example of a reconstructed shape (red), obtained from the $m + 1$ nearest neighbors of $s^\bullet$ (Fig. 2c). In order to quantitatively evaluate the projection, we define the reconstruction error as $e(s) = d_{\mathcal{S}}(s^\circ, s)/\sigma$, where $s^\circ$ is the original shape and $s$ is the reconstructed shape. The occluded shape has an error of $e(s^\bullet) = 4.35$, while the nearest neighbor has an error of 1.81. Table 1 shows that our method is superior to the one proposed by Dambreville et al. [1].

4.3 Application: Denoising of Digits

To test the performance of our approach on the task of image denoising, we apply the algorithm to the USPS dataset of handwritten digits¹. In a first experiment, we compare

¹ The USPS dataset is available from http://www.kernel-machines.org.


Table 2. Average PSNR (in dB) of the denoised images corrupted by different noise levels σ. Training sets consist of 60 samples (first 4 rows) and 200 samples (last 4 rows).

σ²      [1]     [3]     [2]+[1]   [2]+[3]   [4]     Our method
0.25    8.50    15.71   10.17     16.18     14.01   17.71
0.45    9.05    13.87   9.98      15.42     13.91   17.52
0.65    9.78    13.10   9.58      13.60     13.89   17.38
0.85    9.06    12.58   8.61      13.91     13.87   17.32
0.25    9.35    16.08   11.97     16.21     15.27   17.95
0.45    9.64    15.70   10.18     15.98     14.85   17.85
0.65    9.41    13.97   10.26     15.85     14.13   17.79
0.85    9.24    13.06   10.25     15.07     14.07   17.75

our method to five state-of-the-art algorithms: [1], [1]+[2], [3], [3]+[2] and [4]. For each of the ten digits, we form two training sets composed of randomly selected samples (60 and 200, respectively). The test set is composed of 40 randomly selected images corrupted by additive Gaussian noise at different noise levels. Denoising simply amounts to estimating the pre-images of the feature vectors given by the Nyström extension of the noisy samples. For all methods, we take $m = 8$ for the reduced dimension (number of eigenvectors for the kernel-PCA-based methods). Table 2 shows a quantitative comparison based on the peak signal-to-noise ratio (PSNR). Our method outperforms the other approaches both visually (Fig. 1) and quantitatively. Interestingly, it is less sensitive to noise than the others and yields good results even under heavy noise.
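For reference, the PSNR used in such comparisons can be computed as follows. This is a generic sketch: the peak value of 1.0 and the random test image are assumptions, since the text does not state the pixel range used in the experiments.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with the given peak value."""
    mse = np.mean((reference - estimate) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
clean = rng.random((16, 16))                     # stand-in for a 16x16 digit image
noisy = clean + rng.normal(scale=np.sqrt(0.25), size=clean.shape)  # sigma^2 = 0.25
score = psnr(clean, noisy)
```

A constant error of 0.1 on every pixel gives a mean squared error of 0.01 and hence a PSNR of 20 dB for a unit peak, which is a convenient check of the formula.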

5 Conclusions and Future Work

In this paper, we focused on the pre-image problem and provided a solution using Diffusion Maps. Following a manifold interpretation of the training set, we define the pre-image as a Karcher mean interpolation between neighboring samples with respect to the diffusion distance. Results on real-world data, such as 3D shapes and noisy 2D images, demonstrate the superiority of our approach. Several ideas may be explored in the continuation of this work. For complex shape spaces, our projection operator, defined from a manifold point of view, could be used in different tasks, such as segmentation with shape priors, interpolation and reconstruction of shapes, and manifold denoising. Interestingly, our approach is able to deal with manifolds of complex topology, a property that can be useful in the context of manifold denoising. So far, none of the pre-image methods has been tested when the training data itself contains heavy noise. We are currently investigating these directions.

References

1. Dambreville, S., Rathi, Y., Tannenbaum, A.: Statistical shape analysis using kernel PCA. In: IS&T/SPIE Symposium on Electronic Imaging (2006)
2. Arias, P., Randall, G., Sapiro, G.: Connecting the out-of-sample and pre-image problems in kernel methods. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 18-23 (2007)


3. Kwok, J.T., Tsang, I.W.: The pre-image problem in kernel methods. IEEE Transactions on Neural Networks 15(6), 1517–1525 (2004)
4. Carreira-Perpiñan, M.A., Lu, Z.: The Laplacian Eigenmaps Latent Variable Model. JMLR W&P 2, 59–66 (2007)
5. Roweis, S., Saul, L.: Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323–2326 (2000)
6. Tenenbaum, J.B., de Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290(5500), 2319–2323 (2000)
7. Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation 15(6), 1373–1396 (2003)
8. Coifman, R., Lafon, S., Lee, A., Maggioni, M., Nadler, B., Warner, F., Zucker, S.: Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. PNAS 102(21), 7426–7431 (2005)
9. Hein, M., Audibert, J.Y., von Luxburg, U.: From graphs to manifolds - weak and strong pointwise consistency of graph Laplacians. Journal of Machine Learning Research, ArXiv Preprint (forthcoming) (2006)
10. Lafon, S., Keller, Y., Coifman, R.R.: Data fusion and multicue data matching by diffusion maps. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(11), 1784–1797 (2006)
11. Leventon, M., Grimson, E., Faugeras, O.: Statistical shape influence in geodesic active contours. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 316–323 (2000)
12. Cremers, D., Kohlberger, T., Schnörr, C.: Nonlinear shape statistics in Mumford-Shah based segmentation. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2351, pp. 93–108. Springer, Heidelberg (2002)
13. Lu, Z., Carreira-Perpinan, M., Sminchisescu, C.: People tracking with the Laplacian eigenmaps latent variable model. In: Platt, J., Koller, D., Singer, Y., Roweis, S. (eds.) Advances in Neural Information Processing Systems, vol. 20, pp. 1705–1712. MIT Press, Cambridge (2008)
14. Pennec, X.: Intrinsic statistics on Riemannian manifolds: Basic tools for geometric measurements. Journal of Mathematical Imaging and Vision 25(1), 127–154 (2006); a preliminary version appeared as INRIA RR-5093 (January 2004)
15. Davis, B., Fletcher, P., Bullitt, E., Joshi, S.: Population shape regression from random design data. In: ICCV, vol. 1 (2007)
16. Karcher, H.: Riemannian center of mass and mollifier smoothing. Comm. Pure Appl. Math. (30), 509–541 (1977)
17. Etyngier, P., Segonne, F., Keriven, R.: Shape priors using manifold learning techniques. In: 11th IEEE International Conference on Computer Vision, Rio de Janeiro, Brazil (October 2007)
18. Lafon, S., Lee, A.B.: Diffusion maps and coarse-graining: a unified framework for dimensionality reduction, graph partitioning, and data set parameterization. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(9), 1393–1403 (2006)
19. Bengio, Y., Paiement, J.F., Vincent, P., Delalleau, O., Le Roux, N., Ouimet, M.: Out-of-sample extensions for LLE, Isomap, MDS, Eigenmaps, and spectral clustering. In: Thrun, S., Saul, L.K., Schölkopf, B. (eds.) Advances in Neural Information Processing Systems, vol. 16. MIT Press, Cambridge (2004)
20. Baker, C.T.H.: Numerical analysis of Volterra functional and integral equations. In: Duff, I.S., Watson, G.A. (eds.) The state of the art in numerical analysis, pp. 193–222. University Press (1996)


21. Etyngier, P., Keriven, R., Segonne, F.: Projection onto a shape manifold for image segmentation with prior. In: 14th IEEE International Conference on Image Processing, San Antonio, Texas, US (September 2007)
22. Charpiat, G., Faugeras, O., Keriven, R.: Approximations of shape metrics and application to shape warping and empirical shape statistics. Foundations of Computational Mathematics 5(1), 1–58 (2005)
23. Solem, J.: Geodesic curves for analysis of continuous implicit shapes. In: International Conference on Pattern Recognition, vol. 1, pp. 43–46 (2006)
24. Rousson, M., Paragios, N.: Shape priors for level set representations. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2351, pp. 78–92. Springer, Heidelberg (2002)

Fast Shape from Shading for Phong-Type Surfaces

Oliver Vogel, Michael Breuß, Thomas Leichtweis, and Joachim Weickert

Mathematical Image Analysis Group, Faculty of Mathematics and Computer Science, Building E1.1, Saarland University, 66041 Saarbrücken, Germany
{vogel,breuss,leichtweis,weickert}@mia.uni-saarland.de

Abstract. Shape from Shading (SfS) is one of the oldest problems in image analysis that is modelled by partial differential equations (PDEs). The goal of SfS is to compute from a single 2-D image a reconstruction of the depicted 3-D scene. To this end, the brightness variation in the image and the knowledge of illumination conditions are used. While the quality of models has reached maturity, there is still a need for efficient numerical methods that make it possible to compute sophisticated SfS processes for large images in reasonable time. In this paper we address this problem. We consider a so-called Fast Marching (FM) scheme, which is one of the most efficient numerical approaches available. However, the FM scheme is not trivial to use for modern non-linear SfS models. We show how this is done for a recent SfS model incorporating the non-Lambertian reflectance model of Phong. Numerical experiments demonstrate that, without compromising quality, our FM scheme is two orders of magnitude faster than standard methods.

1 Introduction

Given a single 2-D image, the aim of Shape from Shading (SfS) is to infer the 3-D depth of the surface of depicted objects. For this, SfS uses the brightness variation in the image together with information on intensity and position of the light source. Much progress has been achieved in the last years in modelling SfS. As proper model components have been identified, SfS is now considered to be a well-posed problem. In recent model extensions, non-Lambertian surfaces are also taken into account within this well-posed framework. Thus, SfS has reached a reasonable level of maturity. However, these advances on the modelling side also lead to new challenges for numerical methods in this field. In order to obtain 3-D reconstructions of good quality, it is recommended to use modern, highly non-linear SfS models together with large, high-resolution input images. Thus, a proper algorithm must be able to deal with the arising large non-linear problems in reasonable computing time. In this paper, we show how to use a Fast Marching (FM) scheme for this purpose. It turns out that this is not trivial because of the non-linearities involved.

Brief history of SfS models. The SfS problem is a classic problem in computer vision. It was introduced in the works of Horn [1]. In particular, his model

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 733–744, 2009. © Springer-Verlag Berlin Heidelberg 2009


O. Vogel et al.

assumptions of an orthographic camera and Lambertian surface reflectance became a standard for early SfS research, see the review article [2]. However, the authors of [2] also concluded that orthographic SfS models do not perform well on synthetic data, and even worse on real-world images. In recent years, sophisticated models employing a more realistic perspective projection have been developed [3,4,5]. In [5], it was shown that the perspective camera model, together with a point light source at the optical centre of the camera and a non-linear light attenuation term, leads to the well-posedness of the SfS task. Recently, this class of perspective SfS models has been extended to cover non-Lambertian surface reflectance as well. In [6], the Lambertian diffuse reflection has been replaced by the model of Oren and Nayar [7] for the purpose of facial recognition. Another approach has been introduced in [8], where the reflectance model of Phong [9], well known from computer graphics, is used.

The Fast Marching method. The SfS models of interest lead to boundary value problems for a class of non-linear hyperbolic partial differential equations (PDEs) called Hamilton-Jacobi equations. The fast marching (FM) method is an efficient technique for solving such problems. It was introduced by Tsitsiklis [10] and further developed by Sethian [11].

Our contribution. We show how to use the FM technique for the highly non-linear, perspective SfS model given in [8], which in particular incorporates light attenuation and the non-Lambertian reflectance model of Phong. We address the following issues. We consider the problem of computing an initial guess of the depth at surface points with minimal distance to the camera. This estimation is non-trivial for highly non-linear models such as the one we use. For this estimate, a suitable set of corresponding image points needs to be identified in advance. In order to realise the scheme, one also needs to perform a fixed-point iteration in each discretisation point, for which we give a well-working scheme here. Having solved these problems, we compare the method with other schemes in the Lambertian case, confirming that our FM scheme is two orders of magnitude faster without a trade-off in accuracy. Then we apply the FM scheme directly to Phong-type non-Lambertian SfS of objects in real-world images taken with a standard digital camera. We show that our FM scheme delivers high-quality results in just a few seconds of computing time, while the method from [8] that we compare with takes hours to compute comparable results.

Relation to previous work. It is quite well known that FM schemes may outperform other discretisation methods for the class of problems we are interested in, provided it is possible to construct such a scheme. The potential usefulness of FM schemes has also been noticed by other authors in the field of SfS. The first to apply FM to the SfS problem was Sethian [11]. The model he considered was the classic orthographic Lambertian model with a single far light source. Later, Kimmel and Sethian used FM for the same set-up but with an oblique light source [12]. In [13], Yuen et al. apply FM to a Lambertian model incorporating a perspective projection. Let us note that this model is formulated


in terms of unknown surface normals (in contrast to the unknown depth as in [5,3]), and it does not include a light attenuation term. Tankus et al. apply FM to a perspective Lambertian model that also does not incorporate light attenuation [14]. In [15], Prados and Soatto develop an FM approach based on ideas from optimal control theory. However, while they claim that their approach holds for perspective SfS with Lambertian reflectance, they only show computational results of their scheme for the classic orthographic model also considered in [11]. The paper [16] is an extension of the work [13], addressing problems of the authors' previous method with strong gradients caused by occluded regions. As it is of importance in the context of this paper, let us stress that up to now light attenuation has not been taken into account consistently in FM, and that non-Lambertian reflectance models have not been considered at all within an FM scheme. Note that it is exactly the terms corresponding to these model assumptions that yield strong non-linear contributions.

Paper organisation. After briefly introducing the Phong-type SfS model in Section 2, we describe in detail the discretisation of the SfS model, making use of the FM method, in Section 3. We then elaborate on the choice of points featuring the initial guess of the 3-D depth in Section 4. Sections 5 and 6 are devoted to the experimental evaluation and a conclusion, respectively.

2 The Perspective SfS Model with Phong-Type Reflectance

The SfS model we deal with in this paper is given in [8]. We briefly review here the developments in that work. The Phong reflection model. It is adequate to begin the presentation of the SfS model with the modeling ansatz given by the brightness equation due to Phong [9]. Assuming thereby the presence of only one light source, it reads as

$$I = k_a I_a + \frac{1}{r^2}\left(k_d I_d \cos\phi + k_s I_s (\cos\theta)^{\alpha}\right) \tag{1}$$

where $I := I(x)$ is the normalised grey value of the image pixel located at $x = (x_1, x_2)^T \in \mathbb{R}^2$, and $r = u f$ is the distance of the surface point from the light source. In (1), the intensities of ambient, diffuse, and specular components of light are denoted by $I_a$, $I_d$ and $I_s$, respectively. In analogous notation, the constants $k_a$, $k_d$, and $k_s$ with $k_a + k_d + k_s \le 1$ denote the ratio of ambient, diffuse, and specular reflection. Discussing the light reflection contributions, the ambient light models a base intensity in the depicted scene, i.e., a basic illumination present everywhere. The diffusely reflected light in each direction is proportional to the cosine of the angle $\phi$ between surface normal and light source direction. In our scenario, the latter is identical to the direction of the optical centre. The amount of specular light also reflected in this direction is proportional to $(\cos\theta)^{\alpha}$, where $\theta$ is the angle between the ideal (mirror) reflection direction of the incoming light and


the optical centre. The number $\alpha$ is a constant depending on the roughness of the material. An ideal mirror reflection can be described via $\alpha \to \infty$. Note also that the cosine in the specular term is to be set to zero if it yields negative values. The SfS model. Plugging in appropriate expressions, the brightness equation (1) yields a nonlinear Hamilton-Jacobi equation. For details of the derivation see [8]. One (usual) important model assumption not mentioned up to now is the visibility of the surface. This means that it is in front of the optical centre, so that the unknown 3-D depth $u$ is strictly positive. Employing then the change of variables $v := v(x) = \ln(u(x))$, the resulting model is given by

$$J M - k_d I_d \exp(-2v) - \frac{M k_s I_s}{Q} \exp(-2v) \left(\frac{2Q^2}{M^2} - 1\right)^{\alpha} = 0, \tag{2}$$

where $J(x) = (I - k_a I_a) f^2 / Q$, $M(x) = \sqrt{f^2 |\nabla v|^2 + (\nabla v \cdot x)^2 + Q^2}$, and $Q(x) = f / \sqrt{|x|^2 + f^2}$. In this description, $|\cdot|$ is the Euclidean vector norm and $f$ is the focal length relating the optical centre of the camera and the retinal plane. The terms occurring in (2) appear in the same order as the corresponding terms of the brightness equation (1). Note that $\nabla v = \left(\frac{\partial}{\partial x_1} v, \frac{\partial}{\partial x_2} v\right)^T =: (v_{x_1}, v_{x_2})^T$ contains first-order spatial derivatives, and thus the given model is a first-order PDE. It needs to be supplemented by boundary conditions: for details see the section concerned with experiments. The expressions in (2) are also the basis for our numerical implementation of the FM scheme.
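To make the structure of (2) concrete, the following minimal Python sketch evaluates its left-hand side at a single pixel. The function name `sfs_residual` and its argument layout are hypothetical and not taken from [8]; the sketch also assumes that the bracketed term raised to $\alpha$ in (2) is the specular cosine that has to be clamped to zero when negative.

```python
import math

def sfs_residual(I, x, grad_v, v, f, ka, Ia, kd, Id, ks, Is, alpha):
    """Left-hand side of equation (2) at one pixel (illustrative helper).

    I      : normalised grey value at the pixel
    x      : pixel position (x1, x2)
    grad_v : gradient (v_x1, v_x2) of v = ln(u)
    v      : logarithmic depth ln(u)
    """
    x1, x2 = x
    p, q = grad_v
    Q = f / math.sqrt(x1 * x1 + x2 * x2 + f * f)
    M = math.sqrt(f * f * (p * p + q * q) + (p * x1 + q * x2) ** 2 + Q * Q)
    J = (I - ka * Ia) * f * f / Q
    # Assumption: the bracket in (2) acts as the specular cosine and is
    # set to zero if it becomes negative, as stated for equation (1).
    cos_theta = max(0.0, 2.0 * Q * Q / (M * M) - 1.0)
    attenuation = math.exp(-2.0 * v)
    return (J * M
            - kd * Id * attenuation
            - (M * ks * Is / Q) * attenuation * cos_theta ** alpha)
```

For a vanishing gradient we have $M = Q$, and the residual is zero exactly when the grey value satisfies $I = k_a I_a + (k_d I_d + k_s I_s)/(u^2 f^2)$, which gives a simple consistency check for an implementation.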

3 Discretisation and Fast Marching Implementation

It is of importance to discretise the occurring spatial derivatives in the correct fashion, as in the case of hyperbolic PDEs like the currently given Hamilton-Jacobi equation it is well-known that simply using central differences leads to a blow-up of numerical solutions. In order to ensure the stability of our algorithm as well as the validity of reasonable theoretical properties, we thus employ an upwind method as in [5, 4, 8]. Spatial Discretisation. We use the following conventions:
– $v_{i,j}$ denotes the approximation of $v(ih_1, jh_2)$, where
– $i$ and $j$ are the coordinates of the pixel $(i, j)$ in $x_1$- and $x_2$-direction, respectively, and
– $h_1$ and $h_2$ are the corresponding mesh widths in our pixel grid.
Then the spatial discretisation of derivatives reads as

$$v_{x_1}(ih_1, jh_2) \approx h_1^{-1} \min\left(0,\; v_{i+1,j} - v_{i,j},\; v_{i-1,j} - v_{i,j}\right), \tag{3}$$

$$v_{x_2}(ih_1, jh_2) \approx h_2^{-1} \min\left(0,\; v_{i,j+1} - v_{i,j},\; v_{i,j-1} - v_{i,j}\right). \tag{4}$$
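The one-sided differences (3) and (4) can be sketched in a few lines of Python. The helper name `upwind_derivatives` and the nested-list grid are assumptions made for illustration; the sketch handles interior pixels only and leaves the boundary treatment open.

```python
import math

def upwind_derivatives(v, i, j, h1=1.0, h2=1.0):
    """Approximations (3) and (4) at an interior pixel (i, j).

    v is a 2-D grid (list of lists) with v[i][j] ~ v(i*h1, j*h2).
    Neighbours holding math.inf never win the minimum, so pixels that
    have not been computed yet are ignored automatically.
    """
    centre = v[i][j]
    vx1 = min(0.0, v[i + 1][j] - centre, v[i - 1][j] - centre) / h1
    vx2 = min(0.0, v[i][j + 1] - centre, v[i][j - 1] - centre) / h2
    return vx1, vx2
```

Taking the minimum with zero means that only neighbours with a smaller value contribute, so information is taken from the side that has already been reached.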


Terms like Q, I and exp (−2v) can be evaluated pointwise at (i, j), so that we have completely deﬁned the spatial discretisation of (2). We refrain from writing down the complete discrete expression of the scheme, as this is quite cumbersome and does not give more insight. Fast Marching. Let us now turn to the FM method. We only sketch here the idea behind it, as there are many extensive descriptions available in the literature, see especially [11]. The basic principle behind the FM scheme applied in the SfS setting is to advance monotonically a front from the foreground of the depicted object to the background. Thereby, the pixels are distinguished by the labels ’known’, ’trial’ and ’far’, respectively, referring thereby via ’known’ and ’trial’ to the corresponding 3-D depth. In the beginning, all pixels are labelled as ’far’ with their depth values set to inﬁnity. However, since the FM method propagates information from the foreground to the background, it relies on correct depth values being supplied in the pixel which is most in the foreground, i.e. the pixel with minimum depth. In the case of complex images which consist of multiple segments, for each of these segments the correct depth in the point with minimum depth must be supplied. These points are called singular points. These singular points are then marked as ’trial’, which concludes the initialisation of the method. For FM methods on SfS it is common to just require this data to be provided. Other methods like [5], however, do not require the knowledge of given initial depth data. We therefore aim at estimating very precisely the locations of singular points and obtain a SfS method using the FM scheme that does not rely on any depth information to be provided. The task of estimating this data will be the subject of the next section. The ’trial’ candidate with the smallest computed depth is then marked as ’known’, taking the computed 3-D depth in this point as the estimate. 
The pixels adjacent in terms of the stencil to the new set of known points are updated with respect to their label, marking them as ’trial’. The described process is then repeated until all image pixels are marked ’known’. Fixed-Point Iteration. Updating the depth at ’trial’ points consists of solving the discrete form of (2) for $v$ in this point. In contrast to other SfS techniques using FM, we need to solve a nonlinear equation. This is not trivial in our case, since near the solution, the derivative of (2) is very small, making standard solvers like the Newton method diverge in most cases. To avoid this, we employ the Regula Falsi: Starting with two values $v_1$ and $v_2$ such that $v_1 < v_2$ and the left-hand side $L$ of (2) is negative in $v_1$ and positive in $v_2$, one chooses

$$v_3 := \frac{L(v_2)\, v_1 - L(v_1)\, v_2}{L(v_2) - L(v_1)}, \tag{5}$$

which is between $v_1$ and $v_2$. If $L$ at $v_3$ is negative, set $v_1 := v_3$, otherwise set $v_2 := v_3$. Repeating this until $v_1$ and $v_2$ are very close together yields an estimate for the solution of (2) in this pixel. Note that computing the derivatives involves computing a minimum. Depending on $v_1$, $v_2$ and $v_3$, these minima might change


within the estimation process. Thus, it is necessary to update the values of v1 , v2 , v3 during the process.
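The bracketing iteration can be sketched as follows. This is a generic false-position solver and not the authors' code: it uses the standard update $v_3 = (L(v_2) v_1 - L(v_1) v_2)/(L(v_2) - L(v_1))$, i.e. the root of the secant through the two bracket points. The name `regula_falsi`, the tolerance, and the additional stopping test on $|L(v_3)|$ are assumptions. In the SfS setting, the callable `L` would re-evaluate the upwind minima for every trial value it receives.

```python
import math

def regula_falsi(L, v1, v2, tol=1e-12, max_iter=200):
    """Find a root of L in [v1, v2], given L(v1) < 0 < L(v2)."""
    L1, L2 = L(v1), L(v2)
    if not (L1 < 0.0 < L2):
        raise ValueError("bracket must satisfy L(v1) < 0 < L(v2)")
    v3 = v1
    for _ in range(max_iter):
        # Root of the secant through (v1, L1) and (v2, L2).
        v3 = (L2 * v1 - L1 * v2) / (L2 - L1)
        L3 = L(v3)
        # Assumption: also stop on a small residual, because one bracket
        # end can stay fixed for convex or concave functions.
        if abs(L3) < tol or (v2 - v1) < tol:
            break
        if L3 < 0.0:
            v1, L1 = v3, L3
        else:
            v2, L2 = v3, L3
    return v3
```

For example, `regula_falsi(lambda v: math.exp(v) - 2.0, 0.0, 2.0)` approximates $\ln 2$ without ever leaving the initial bracket, which is the property that makes the method attractive when Newton steps overshoot.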

4 Estimating the Initial Depth

The FM methods for SfS rely on the knowledge of ground truth data at singular points, i.e. at points with locally minimal depth. However, in general this kind of data is not given. Thus, these depth values need to be estimated. In the experimental section, we will show that a good estimate is crucial for the reconstruction quality. In most other works, this issue is neglected. In [4], the problem is solved by obtaining an initial estimate for the depth using an orthographic SfS method. Their perspective method, however, is not comparable with the one used in this paper, since they neglect the light attenuation term. By doing this, their solution is invariant to multiplicative scalings of the depth. This is not true in our case. To obtain a working method, we either need to know the correct depth at singular points or estimate both the singular points and their depth. In this section, we will introduce ways to estimate the locations of singular points and estimate their depths as correctly as possible. Lambertian Case. For simplicity, we first focus on the Lambertian case, i.e. $k_a = k_s = 0$, $k_d = 1$. In this simplified model, the brightness of a pixel is determined by two main factors: (i) the angle $\phi$ between surface normal and light source direction and (ii) the light attenuation because of the distance of the surface point to the light source. Directly from the model (1) we obtain the simple equation

$$I = I_d \frac{\cos\phi}{u^2 f^2}. \tag{6}$$

Assuming the surface to be continuously differentiable, the points of minimal depth are the points where the derivatives of the depth vanish, which means the surface normal points directly to the viewer. This results in $\phi = 0$, which leads by use of $\cos 0 = 1$ and re-arranging (6) to

$$u = \sqrt{\frac{I_d}{I f^2}}. \tag{7}$$

Knowing the coordinates of singular points, we can compute the depth. It remains to determine the coordinates of singular points. Singular points are local minima in depth. Since minima in depth mean both less attenuation and a maximum Lambertian reflectance, this suggests that local maxima in image brightness are the singular points. At the image boundary, it might happen that we have brightness maxima that do not satisfy $\phi = 0$. In this case, there can be errors. In most cases, this does not affect the reconstruction quality significantly. Due to sampling and quantisation artifacts, it is possible that this estimate might be slightly off, both in the location of singular points and in the estimated depth. This effect is usually rather small.
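A minimal sketch of this Lambertian initialisation is given below. The function name `lambertian_seed_depths`, the 8-neighbourhood, and the non-strict comparison are assumptions made for illustration. With the non-strict test, every pixel of a quantisation plateau becomes a seed, and completely flat bright regions would need extra handling.

```python
import math

def lambertian_seed_depths(image, f, Id):
    """Return {(i, j): depth} for interior local brightness maxima.

    The depth follows from u = sqrt(Id / (I * f^2)), i.e. equation (6)
    with cos(phi) = 1. Boundary pixels are skipped on purpose.
    """
    rows, cols = len(image), len(image[0])
    seeds = {}
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            centre = image[i][j]
            neighbours = [image[i + di][j + dj]
                          for di in (-1, 0, 1)
                          for dj in (-1, 0, 1)
                          if (di, dj) != (0, 0)]
            # Dark pixels cannot be seeds; ties are kept (plateaus).
            if centre > 0.0 and all(centre >= n for n in neighbours):
                seeds[(i, j)] = math.sqrt(Id / (centre * f * f))
    return seeds
```

The returned dictionary can serve directly as the set of initial ’trial’ points, each paired with its estimated depth.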


In conclusion, we propose to search local maxima in the image and estimate their depth according to equation (7). Boundary pixels should not be considered, since the estimate might be incorrect due to $\phi$ not being zero. The points obtained in this way should be marked as ’trial’ points for the subsequent FM method. In the Phong case which follows we use the same approach. Phong Case. To obtain a good estimate for singular points in the general case, we review the model equation again. Essentially, we have

$$I = k_a I_a + \frac{k_d I_d \cos\phi + k_s I_s (\cos\theta)^{\alpha}}{u^2 f^2}. \tag{8}$$

At singular points, we have $\phi = \theta = 0$, which simplifies equation (8) to

$$I = k_a I_a + \frac{k_d I_d + k_s I_s}{u^2 f^2}. \tag{9}$$

Now, after shifting the grey values down by the ambient brightness to $I - k_a I_a$, we can separate diffuse and specular light and compute the diffuse brightness $\tilde{I}$ by

$$\tilde{I} = \frac{k_d I_d}{k_d I_d + k_s I_s} (I - k_a I_a). \tag{10}$$

Now, we can make use of the equation (7) using $\tilde{I}$ instead of $I$.
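The Phong initialisation can be sketched in the same way. The function names below are hypothetical. The sketch assumes that the diffuse intensity entering equation (7) is the product $k_d I_d$, which reduces to (7) for $k_d = 1$; under that assumption the result coincides with solving (9) for $u$ directly.

```python
import math

def diffuse_brightness(I, ka, Ia, kd, Id, ks, Is):
    """Diffuse part of the brightness at a singular point, equation (10)."""
    return kd * Id / (kd * Id + ks * Is) * (I - ka * Ia)

def seed_depth_phong(I, f, ka, Ia, kd, Id, ks, Is):
    """Depth estimate at a singular point in the Phong case."""
    I_tilde = diffuse_brightness(I, ka, Ia, kd, Id, ks, Is)
    if I_tilde <= 0.0:
        raise ValueError("pixel is not brighter than the ambient level")
    # Assumption: equation (7) is applied with kd * Id as diffuse intensity.
    return math.sqrt(kd * Id / (I_tilde * f * f))
```

A round trip through (9), i.e. rendering a singular point at a known depth and feeding the grey value back into `seed_depth_phong`, recovers that depth up to rounding.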

5 Experiments

In this section, we evaluate the presented method on both synthetic and real-world images. We discuss the accuracy and importance of the initial estimates at singular points. In comparison to other methods in the field, we evaluate the accuracy and the performance of our method. Note that no a-priori depth information is used in any of the experiments. In the cases where we need depth initialisation at singular points, we use the estimation method introduced in Section 4. Lambertian case. First, we restrict the method to diffuse reflection only. We compare the reconstruction quality and performance with the methods of Prados et al. [5], Cristiani et al. [17], and Vogel et al. [18], which all use the same Lambertian model, but different schemes. Visually, the reconstructions of these methods are almost identical. Their performance, however, is different. Figure 1 shows the vase surface [2], a classic test surface for SfS algorithms, and a rendered version of this surface using a Lambertian model. The rendering parameters are f = 492 and Id = 100000, and the image size is 128 × 128 pixels. When detecting the local maxima, we notice that around each maximum there is more than one point with the same maximal grey value. This is a result of the quantisation of the image. Since we mark all of these points, together with their depth estimates, as ’trial’, only one of them will be used as an actual depth estimate. This might not be the actual position of the singular point, but it is close.


Fig. 1. Vase surface and Lambertian rendered image

Fig. 2. Reconstruction of the vase using a Lambertian model. Left: Reference methods. Middle: Our method. Right: Wrong depth estimate.

Table 1. Results of the Lambertian vase experiment

Method                Depth Error   Initialisation Time   Computation Time
Prados et al.         0.39%         ≈ 0s                  36.99s
Cristiani et al.      0.31%         ≈ 0s                  28.89s
Vogel et al.          0.32%         ≈ 0s                  2.96s
Our method            0.39%         0.02s                 0.39s
Wrong initialisation  8.15%         0.02s                 0.39s

Figure 2 shows the reconstructions of the vase surface using both the presented method and the reference methods. The results are visually very similar. In Figure 2, also a reconstruction can be found where we manually chose a wrong depth at the singular points, i.e. we multiplied the estimates with a factor 0.9. We can see that this distorts the reconstruction. Table 1 supports our visual impression. It shows the relative average depth errors for the reconstructions. With the correct depth estimates, our method is about as good as the three reference methods. In fact, all reconstructions are nearly perfect. The quality of the reconstruction using the faulty estimation technique is a lot worse. This means the correctness of our initial guess is crucial for the reconstruction quality. Phong case. Now we evaluate the method on a synthetic image rendered using the Phong reﬂectance model. We compare to the same methods as before, but since the reference methods only consider a Lambertian model, we additionally compare to the method of Vogel et al. [8] using the Phong model.


Fig. 3. Mozart surface and rendered image using the Phong model

Fig. 4. Reconstructions of the Mozart surface. Left: Lambertian methods. Middle: Vogel et al. using the Phong model. Right: Our method.

Table 2. Results of the Phong Mozart experiment. Methods marked with (L) use a Lambertian model for the reconstruction.

Method                Depth Error   Initialisation Time   Computation Time
Prados et al. (L)     12.58%        0.02s                 158.62s
Cristiani et al. (L)  12.17%        0.02s                 170.37s
Vogel et al. (L)      12.56%        0.02s                 16.33s
Vogel et al.          5.39%         0.02s                 68.76s
Our method            5.07%         0.03s                 1.85s

Figure 3 shows a rendered version and the ground truth of the Mozart face [2], a classic test image. This time, we rendered the image using the Phong reflectance model. Parameters for the rendering are f = 500, Id = Is = 100000, kd = 0.7, ks = 0.3, α = 5. Note that the Mozart face is a well-suited test image with multiple segments, each of which has its own singular point. In Figure 4, we show reconstructions of the Mozart face using our method, the method of Vogel et al., and the three Lambertian reference methods. The Phong reconstructions are clearly more accurate than the Lambertian ones. Table 2 shows the reconstruction errors and computation times of the Mozart experiment. We notice that the error of our method is about equal to that of the method of Vogel et al., and that it outperforms the Lambertian methods w.r.t. quality. Again, our method is up to two orders of magnitude faster than any of the other methods. Another important observation is that the performance gain is


Fig. 5. Real input image: Rook, knight, and pawn

Fig. 6. Reconstructions of chess ﬁgures. Left: [8] with Phong. Right: Our method.

much larger on the Mozart test image compared to the vase. The reason for this is the larger size of the Mozart image. This suggests that on high-resolution images, our method might have a clear advantage over other methods in the ﬁeld. This is particularly interesting for real-world images. Many authors apply their methods to relatively small test images, at most 256 × 256 pixels, usually even much less. We now apply our method to a full-size real-world image with 8 megapixels. On such images, the reference methods take very long to converge. A Real-World Experiment. Figure 5 shows a photograph of three chess ﬁgures: a rook, a knight, and a pawn. The original image has size 3264 × 2448 and has been taken with a digital camera with ﬂash in a darkened room. For the reconstruction, we used the known square pixel sizes of 1.61μm and the known focal length of 70.2mm. This gives for pixel size 1 a relative focal length of f = 43478, which we used for the reconstruction. Since scaling Ia , Id and Is only stretches the reconstructed surface by a factor that depends quadratically on the scaling factor, their magnitude is not important for the reconstruction process. For simplicity, we just chose them all equal to 100000. We manually estimated the other parameters to kd = 0.6, ks = 0.4 and α = 10. We neglected ambient light, i.e. we set ka = 0.


Figure 6 shows reconstructions of the high-resolution version of the image using our method and the method of Vogel et al., both with the same parameters. The reconstruction using our method looks much smoother than the one with the method of Vogel et al. This can be explained by the different numerics of both methods. Our method starts at singular points and reconstructs the surface from near to far, while the other method treats all depths equally. For images like this, i.e. images with light objects in front of a dark background, our method has the clear advantage of recovering the object of interest first, such that this part is not distorted by artifacts caused by the background.

Table 3. Run times of the chess experiment. (S) marks experiments on a downsampled image of size 408 × 306, (F) denotes the full, high-resolution image.

Method            Iterations   Initialisation Time   Computation Time
Vogel et al. (S)  296          0.03s                 139.8s
Our method (S)    1            0.07s                 2.8s
Vogel et al. (F)  1207         1.98s                 38645s
Our method (F)    1            2.9s                  263.2s

Table 3 shows the computation times compared to a test using a downsampled version of the image. While the computation times of our method are very low, the computation times of the iterative reference method are extremely high, especially for the large input image. This makes our method still applicable even on large images, outshining other methods with respect to computation time. It also shows that the performance gain of using FM for SfS increases with the size of the input image.
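The growth of the performance gain can be read off from the computation times in Table 3 with a little arithmetic. The short Python sketch below only re-uses the tabulated values; the variable names are illustrative.

```python
# Computation times in seconds and image sizes, taken from Table 3.
runs = {
    "S": {"pixels": 408 * 306, "reference": 139.8, "fm": 2.8},
    "F": {"pixels": 3264 * 2448, "reference": 38645.0, "fm": 263.2},
}

# Speed-up of the FM scheme over the iterative reference method.
speedup = {name: r["reference"] / r["fm"] for name, r in runs.items()}

# Growth factors when going from the downsampled to the full image.
pixel_ratio = runs["F"]["pixels"] / runs["S"]["pixels"]
fm_growth = runs["F"]["fm"] / runs["S"]["fm"]
reference_growth = runs["F"]["reference"] / runs["S"]["reference"]

print(speedup, pixel_ratio, fm_growth, reference_growth)
```

The speed-up is roughly 50 on the downsampled image and roughly 147 on the full image. While the number of pixels grows by a factor of 64, the FM computation time grows by a factor of about 94 and that of the reference method by a factor of about 276.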

6 Conclusion

We have shown that the FM scheme is the method of choice for modern SfS models that incorporate light attenuation and non-Lambertian reﬂectance. Without compromising quality it is two orders of magnitude faster than other approaches. We demonstrated that it is possible to estimate initial depths to obtain a method that does not rely on the knowledge of initial data. By combining state-of-the-art SfS models and proper numerical methods, it becomes possible to tackle real-world data with image sizes of many megapixels. This is far beyond the size of the model problems that are considered in many SfS papers.

References
1. Horn, B.K.P., Brooks, M.J.: Shape from Shading. Artificial Intelligence Series. MIT Press, Cambridge (1989)
2. Zhang, R., Tsai, P.S., Cryer, J.E., Shah, M.: Shape from shading: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 21(8), 690–706 (1999)
3. Tankus, A., Sochen, N., Yeshurun, Y.: A new perspective [on] shape-from-shading. In: Proc. Ninth International Conference on Computer Vision, vol. 2, pp. 862–869. IEEE Computer Society Press, Nice (2003)
4. Tankus, A., Sochen, N., Yeshurun, Y.: Perspective shape-from-shading by fast marching. In: Proc. 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 43–49. IEEE Computer Society Press, Washington (2004)
5. Prados, E., Faugeras, O.: Shape from shading: A well-posed problem? In: Proc. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 870–877. IEEE Computer Society Press, San Diego (2005)
6. Ahmed, A., Farag, A.: A new formulation for shape from shading for non-Lambertian surfaces. In: Proc. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 17–22. IEEE Computer Society Press, New York (2006)
7. Oren, M., Nayar, S.: Generalization of the Lambertian model and implications for machine vision. International Journal of Computer Vision 14(3), 227–251 (1995)
8. Vogel, O., Breuß, M., Weickert, J.: Perspective shape from shading with non-Lambertian reflectance. In: Rigoll, G. (ed.) DAGM 2008. LNCS, vol. 5096, pp. 517–526. Springer, Heidelberg (2008)
9. Phong, B.T.: Illumination for computer-generated pictures. Communications of the ACM 18(6), 311–317 (1975)
10. Tsitsiklis, J.N.: Efficient algorithms for globally optimal trajectories. IEEE Transactions on Automatic Control 40(9), 1528–1538 (1995)
11. Sethian, J.A.: Level Set Methods and Fast Marching Methods, 2nd edn. Cambridge University Press, Cambridge (1999)
12. Kimmel, R., Sethian, J.A.: Optimal algorithm for shape from shading and path planning. Journal of Mathematical Imaging and Vision 14(3), 237–244 (2001)
13. Yuen, S.Y., Tsui, Y.Y., Chow, C.K.: Fast marching method for shape from shading under perspective projection. In: Proceedings of the 2nd IASTED International Conference on Visualization, Imaging and Image Processing, Malaga, Spain, September 2002, pp. 584–589 (2002)
14. Tankus, A., Sochen, N., Yeshurun, Y.: Shape-from-shading under perspective projection. International Journal of Computer Vision 63(1), 21–43 (2005)
15. Prados, E., Soatto, S.: Fast marching method for generic shape from shading. In: Paragios, N., Faugeras, O., Chan, T., Schnörr, C. (eds.) VLSM 2005. LNCS, vol. 3752, pp. 320–331. Springer, Heidelberg (2005)
16. Yuen, S.Y., Tsui, Y.Y., Chow, C.K.: A fast marching formulation of perspective shape from shading under frontal illumination. Pattern Recognition Letters 28, 806–824 (2007)
17. Cristiani, E., Falcone, M., Seghini, A.: Some remarks on perspective shape-from-shading models. In: Sgallari, F., Murli, F., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 276–287. Springer, Heidelberg (2007)
18. Vogel, O., Breuß, M., Weickert, J.: A direct numerical approach to perspective shape-from-shading. In: Lensch, H., Rosenhahn, B., Seidel, H.P., Slusallek, P., Weickert, J. (eds.) Vision, Modeling, and Visualization, pp. 91–100. AKA, Berlin (2007)

Generic Scene Recovery Using Multiple Images

Kuk-Jin Yoon¹, Emmanuel Prados², and Peter Sturm²

¹ Computer Vision Lab., Dept. Information and Communications, GIST, Korea
² Perception Lab., INRIA Grenoble - Rhône-Alpes, France

Abstract. We present a generative model based method for recovering both the shape and the reflectance of the surface(s) of a scene from multiple images, assuming that illumination conditions are known in advance. Based on a variational framework and via gradient descents, the algorithm minimizes simultaneously and consistently a global cost functional with respect to both shape and reflectance. Contrary to previous works which consider specific individual scenarios, our method applies to a number of scenarios: multiview stereovision, multiview photometric stereo, and multiview shape from shading. In addition, our approach naturally combines stereo, silhouette and shading cues in a single framework and, unlike most previous methods dealing with only Lambertian surfaces, the proposed method considers general dichromatic surfaces.

1 Introduction and Related Work

Many methods have been proposed to recover the three-dimensional surface shape using multiple images during these last two decades [1]. On the other hand, for a long time, the estimation of surface radiance/reﬂectance was secondary. Even some recent works [2,3,4,5] compute the 3D shape without considering radiance estimation. However, radiance/reﬂectance estimation has become a matter of concern in multiview reconstruction scenarios in the last decade [6, 7, 8]. Especially, recovering reﬂectance is required for realistic relighting, which is also fundamental in virtual reality as well as augmented reality. In addition, in real life applications, perfect Lambertian surfaces are rare and, therefore, multiview stereo algorithms have to be robust to specular reﬂection. Widespread ideas are to use appropriate similarity measures [2,9,10] and/or to modify input images in order to remove specular highlights [11, 12]. However, those similarity measures are not generally valid under general lighting conditions and these methods are strongly limited by the speciﬁc lighting conﬁguration. Concerning the robustness to non-Lambertian eﬀects, it is also worth to cite [6] which considers the radiance tensor. However, the radiance tensor presented in [6] is not appropriate when the images of the scene are taken under several (diﬀerent) lighting conditions. In this paper, our goal is to provide a model based method that simultaneously estimates shape and reﬂectance by combining stereo, silhouette, and

This work was supported by the Flamenco project (ANR-06-MDCA-007) and by the GIST Dasan project.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 745–757, 2009. © Springer-Verlag Berlin Heidelberg 2009


shading cues in a single framework. The method we propose is robust to non-Lambertian effects by directly incorporating a specular reflectance model in the mathematical formulation of the problem. By incorporating a complete photometric image formation model, it also fully exploits all the photometric phenomena, as is explicitly done in photometric stereo methods. It also naturally handles images taken under several lighting conditions. Let us note that there already exist recent works that provide solutions in this direction. [13] proposes a model-based method for recovering the 3D shape and the reflectance of a non-Lambertian object. Nevertheless, in that paper, the authors constrain the object to be made of a single textureless material: the parameters of the reflectance (in particular the albedo) are the same for all the points of the object surface. So, the method in [13] is a “multiview shape from shading” method, similar to the one proposed by [8, 14], which focus on the Lambertian case. To our knowledge, with the exception of [15, 16], all the works going in the same direction as ours are limited to surfaces made of a single (textureless) material. In particular, this is the case for the photometric stereo methods proposed by [17, 18] and for the multiview photometric stereo work of [19]. Only the similar works [15, 16] are able to recover scenes with varying albedo. However, in [15,16], the authors tried to filter out specular highlights by using a simple thresholding and to use only diffuse components to estimate the shape. [15] also used a thresholding to detect shadowed pixels not visible from light sources, which however does not work under multiple light sources. In our work, we do not want to restrict ourselves to a single textureless material. (In return, we assume that lighting conditions are known in advance.) 
And, more generally, one of the goals of this paper is to show that the joint computation of shape and reﬂectance is beneﬁcial from several points of view. In addition to providing the reﬂectance of the scene, this allows to naturally introduce specular models in the mathematical formulation of the multiview reconstruction problem; and thus the method to be robust to highlights. Without any additional eﬀort, it is also possible to deal with a set of images lighted by several diﬀerent conditions (which is not possible with radiance only). Moreover in such a case, the method allows to completely exploit the variations of the radiance according to the changes of illumination, as in photometric stereo. Finally, this allows to easily incorporate some constraints on the reﬂectance and so in particular to naturally exploit shading eﬀects in textureless regions. Here, let us emphasize that, contrary to previous works considering speciﬁc scenarios, our method can be applied indiscriminately to a number of scenarios — multiview stereovision, multiview photometric stereo, and multiview shape from shading.

2 Modeling Assumptions and Notations

We assume here that the scene can be decomposed into two entities: the foreground, which corresponds to the objects of interest, and the background. The foreground is composed of a set of (bounded and closed) 2D manifolds of $\mathbb{R}^3$ and is represented by $S$.


Images are generated by $n_c$ pinhole cameras. The perspective projection performed by the $i$-th camera is represented by $\Pi_i : \mathbb{R}^3 \to \mathbb{R}^2$. $\pi_i \subset \mathbb{R}^2$ is the image domain of the $i$-th camera. It is split into two parts: the pixels corresponding to the foreground, $\pi_{iF} = \pi_i \cap \Pi_i(S)$, and the other points $\pi_{iB} = \pi_i \setminus \pi_{iF}$. $I_i : \pi_i \to \mathbb{R}^c$ is the image of the true scene, captured by the $i$-th camera¹. $I$ is the set of input images and $I_{iF}$ and $I_{iB}$ are the restrictions of the function $I_i$ to $\pi_{iF}$ and $\pi_{iB}$, respectively. In other respects, the visibility function $v_{S_i} : \mathbb{R}^3 \to \mathbb{R}$ is defined by: $v_{S_i}(X) = 1$ if $X$ is visible from the $i$-th camera and $v_{S_i}(X) = 0$ if not. $S_i$ denotes the part of $S$ that is visible from the $i$-th camera and $\Pi_{i,S}^{-1}$ is the back-projection from the $i$-th camera onto $S_i$.

We model the scene as being illuminated by a finite number of distant point light sources and an ambient light. $n_{il}$ is the number of illuminants corresponding to the $i$-th camera and $l_{ij} \in \mathbb{S}^2$ and $L_{ij} \in \mathbb{R}^c$ are the direction and the intensity¹ of the $j$-th illuminant of the $i$-th camera, respectively. Similarly, $L_{ia} \in \mathbb{R}^c$ is the intensity¹ of the ambient illumination of the $i$-th camera. $v_{L_{ij}} : \mathbb{R}^3 \to \mathbb{R}$ is the light visibility function: $v_{L_{ij}}(X) = 1$ if the $j$-th illuminant of the $i$-th camera is visible from $X$, $v_{L_{ij}}(X) = 0$ otherwise.

We model the foreground object(s) by its shape $S$ and its reflectance $R$. We denote $\Omega = (S, R)$. Contrary to most previous stereovision methods, we want to go beyond the Lambertian model. In order to get a solvable minimization problem without too many unknowns, we represent the reflectance by a parametric model. In this work, we consider the popular Blinn-Phong shading model. In this context, and assuming that $I_i(x)$ is equal to the radiance of the surface $S$ at point $X = \Pi_{i,S}^{-1}(x)$ in the direction of the $i$-th camera, the images $I_i$ are decomposed as $I_i = I_{id} + I_{is} + I_{ia}$, where $I_{id}$, $I_{is}$, and $I_{ia}$ are images with the diffuse, specular, and ambient reflection components of $I_i$, respectively. Here, diffuse reflection can be expressed by using the cosine law as

$$I_{id}(x) = \sum_{j=1}^{n_{il}} v_{L_{ij}}(X)\, \rho_d(X)\, L_{ij} \left(n(X) \cdot l_{ij}\right),$$

where $\rho_d(X) \in \mathbb{R}^c$ is the diffuse albedo¹ at point $X \in S$, and $n(X)$ is the normal vector to the surface $S$ at $X$. On the other hand, specular reflection is expressed as

$$I_{is}(x) = \sum_{j=1}^{n_{il}} v_{L_{ij}}(X)\, \rho_s(X)\, L_{ij} \left(n(X) \cdot h_{ij}(X)\right)^{\alpha_s(X)},$$

where $h_{ij}(X)$ is the bisector of the angle between the view of the $i$-th camera and the $j$-th illuminant at $X$, and $\rho_s(X) \in \mathbb{R}^c$ and $\alpha_s(X) \in \mathbb{R}^+$ are the specular albedo and the shininess parameter at point $X$. The ambient illumination is assumed to be uniform and modeled as $I_{ia}(x) = \rho_d(X) L_{ia}$, where $L_{ia}$ is defined above. Finally, the image formation equation is given as

$$I_i(x) = \sum_{j=1}^{n_{il}} v_{L_{ij}}(X)\, \mathcal{L}_{ij}(X, n(X)) + \rho_d(X) L_{ia}, \tag{1}$$

where $\mathcal{L}_{ij}(X, n(X)) = L_{ij}\, \rho_d(X) \left(n(X) \cdot l_{ij}\right) + L_{ij}\, \rho_s(X) \left(n(X) \cdot h_{ij}(X)\right)^{\alpha_s(X)}$. We denote $R = (R_d, R_s)$, where $R_d = \rho_d$ and $R_s = (\rho_s, \alpha_s)$. As suggested by [20,21], to be sure that the estimated foreground surface does not shrink to an empty set, it is crucial to define and characterize the background. In this work, we assume that we have the background images $\tilde{I} = \{\tilde{I}_1, \cdots, \tilde{I}_{n_c}\}$ and define $(\tilde{I}_{iF}, \tilde{I}_{iB})$ analogously to $(I_{iF}, I_{iB})$.

¹ Non-normalized color vector, if c = 3.
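The image formation equation (1) of this section can be illustrated for a single surface point and a single colour channel (c = 1). The sketch below is not the authors' implementation: the function and argument names are hypothetical, the light visibility is passed in as a precomputed flag, and clamping negative dot products to zero is an added assumption that is not part of (1).

```python
import math

def normalise(a):
    """Return the vector a scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in a))
    return tuple(c / norm for c in a)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def blinn_phong_intensity(normal, view_dir, lights,
                          rho_d, rho_s, alpha_s, L_ambient):
    """Single-channel intensity at one surface point.

    lights is a list of (direction, intensity, visible) tuples, where
    direction points from the surface towards the light source.
    """
    n = normalise(normal)
    v = normalise(view_dir)
    total = rho_d * L_ambient                       # ambient term
    for l_dir, L_int, visible in lights:
        if not visible:                             # light visibility is 0
            continue
        l = normalise(l_dir)
        # Bisector of viewing and light direction (undefined if l = -v).
        h = normalise(tuple(a + b for a, b in zip(l, v)))
        diffuse = rho_d * max(0.0, dot(n, l))       # assumption: clamped
        specular = rho_s * max(0.0, dot(n, h)) ** alpha_s
        total += L_int * (diffuse + specular)
    return total
```

With the normal, the viewing direction, and one visible light all aligned, both dot products equal one and the intensity reduces to $\rho_d L_{ia} + L_{ij}(\rho_d + \rho_s)$.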

3 Bayesian Formulation of the Problem

The goal of this work is to estimate the shape $S$ and the reflectance $R$ of a scene surface $\Omega$ that maximize $P(\Omega|I)$ for given $I$. By Bayes' rule,

$$P(\Omega|I) = P(I|\Omega)\,P(\Omega)/P(I) \propto P(I|\Omega)\,P(\Omega) = P(I|S, R)\,P(S)\,P(R) \tag{2}$$

under the assumption that $S$ and $R$ are independent. Here, $P(I|\Omega) = P(I|S, R)$ is a likelihood, and $P(S)$ and $P(R)$ are priors on the shape and reflectance, respectively.

When $\Pi_i$ is given, we can produce a synthetic image $\bar I_i(\Omega)$ corresponding to $I_i$ by using the current estimation of $\Omega$. This allows us to measure the validity of the current estimation by comparing input images with generated ones. When assuming an independent identical distribution of observations, the likelihood can be expressed as $P(I|\Omega) \propto \prod_{i=1}^{n_c} \exp\big(-\xi_i(\Omega)\big) = \prod_{i=1}^{n_c} \exp\big(-\xi(I_i, \bar I_i(\Omega))\big)$, where $\xi_i(\Omega) = \xi(I_i, \bar I_i(\Omega))$ is a function of $\Omega$ measuring image dissimilarity.

A typical and reasonable prior for the surface shape $S$ is about the area, given as $P(S) \propto \exp\big(-\psi(S)\big)$. Here, $\psi(S)$ is a monotonically increasing function of the surface area $\int_S d\sigma$, where $d\sigma$ is the Euclidean surface measure. In addition, a prior on the reflectance is also required because there are not enough observations exhibiting specular reflection at every surface point. To overcome the lack of observations, we assume that specular reflectance varies smoothly within each homogeneous material surface patch. This prior is reasonable in real-life applications and in common scenes. Thus, in this work, we use the diffuse reflectance of a surface as a soft constraint to partition $\Omega$ and define the prior on the surface reflectance as $P(R) \propto \exp\big(-\omega(R)\big)$, where $\omega(R)$ will be defined later.
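As a short worked step, the link between the posterior of Eq. (2) and a sum of costs can be written out explicitly. This is only a sketch using the exponential forms of the likelihood and priors stated above; the constants $K$ and $K'$ are introduced here for illustration and collect all normalisation terms that do not depend on $\Omega$.

```latex
-\log P(\Omega \mid I)
  = -\log P(I \mid S, R) - \log P(S) - \log P(R) + K
  = \sum_{i=1}^{n_c} \xi_i(\Omega) + \psi(S) + \omega(R) + K'
```

Since the logarithm is monotone, the $\Omega$ that maximizes the posterior is the one that minimizes the right-hand side.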

4 Description of the Cost Functions

Based on Section 3, the problem can be expressed in terms of cost functions as

$$E_{total}(\Omega) = E_{data}(\Omega) + E_{shape}(S) + E_{refl}(R) = \sum_{i=1}^{n_c} \xi_i(\Omega) + \psi(S) + \omega(R).$$

Maximizing the probability $P(\Omega|I)$ is equivalent to minimizing the total cost.

Data Cost Function. The current estimation of $\Omega$ gives a segmentation of the input image $I_i$ into foreground $I_{iF}$ and background $I_{iB}$, and we can synthesize $\bar I_{iF}$ according to the above image formation model. As for $\bar I_{iB}$, it is generated according to the available background model. In this paper, we use actual background images, i.e. $\bar I_{iB} = \tilde I_{iB}$. Also, as suggested by [20], $\xi_i(\Omega) = \xi(I_i, \bar I_i)$ is then rewritten as

$$\xi(I_i, \bar I_i) = \xi_F(I_{iF}, \bar I_{iF}) + \xi_B(I_{iB}, \bar I_{iB}) = \hat\xi_F(I_{iF}, \bar I_{iF}) + \xi(I_i, \tilde I_i), \tag{3}$$

Generic Scene Recovery Using Multiple Images


where $\hat\xi_F(I_{iF}, \bar I_{iF}) = \xi_F(I_{iF}, \bar I_{iF}) - \xi_F(I_{iF}, \tilde I_{iF})$. Since $\xi(I_i, \tilde I_i)$ is independent of $\Omega$, the data cost function is written as $E_{data}(\Omega) = \sum_{i=1}^{n_c} \hat\xi_F(I_{iF}, \bar I_{iF}) + C$, where $C = \sum_{i=1}^{n_c} C_i = \sum_{i=1}^{n_c} \xi(I_i, \tilde I_i)$ is constant. When computing $\xi$, any statistical correlation among color or intensity patterns, such as the sum of squared differences (SSD), cross correlation (CC), and mutual information (MI), can be used. In any case, $\xi$ can be expressed as an integral over the image plane, $\xi(I_i, \bar I_i) = \int_{\pi_i} e_i(x)\,d\sigma_i$, where $d\sigma_i$ is the surface measure and $e_i(x)$ is the contribution at $x$ to $\xi_i$. The data cost function is then given as

$$E_{data}(\Omega) = \sum_{i=1}^{n_c} \int_{\pi_{iF}} \hat e_i(x)\,d\sigma_i + C, \tag{4}$$

where $\hat e_i(x) = e_i\big(I_i(x), \bar I_i(x)\big) - e_i\big(I_i(x), \tilde I_i(x)\big)$.

Decoupling Appearance from Surface Normal. As shown in Eq. (1), surface appearance is dependent on both the surface normal and position, and this makes the problem hard to solve and unstable. To resolve this problem, we introduce a photometric unit vector field $v$ satisfying $\|v\| = 1$ as in [14], which is used for the computation of surface appearance. To penalize the deviation between the actual normal vector $n$ and the photometric normal vector $v$, we add a new term

$$E_{dev}(\Omega) = \tau \int_S \chi(X)\,d\sigma = \tau \int_S \big(1 - (n(X) \cdot v(X))\big)\,d\sigma \tag{5}$$

to the cost function, where $\tau$ is a control constant.

Shape Area and Reflectance Discontinuity Cost Functions. By using the area of a surface for the prior, $E_{shape}(S)$ is simply defined as $E_{shape}(S) = \psi(S) = \lambda \int_S d\sigma$, where $\lambda$ is a control constant. Based on the assumption in Section 3, we define a discontinuity cost function of surface reflectance, which makes the discontinuities of specular reflectance generally coincide with the discontinuities of diffuse reflectance, as

$$E_{refl}(R) = \omega(R) = \beta \int_S f(X)\,d\sigma = \beta \int_S \zeta\big(R_d(X)\big) \times \eta\big(R_s(X)\big)\,d\sigma, \tag{6}$$

where $\beta$ is a control constant, and $\zeta\big(R_d(X)\big)$ and $\eta\big(R_s(X)\big)$ are defined as

$$\zeta\big(R_d(X)\big) = \left(1 - \frac{\|\nabla_S R_d(X)\|^2}{M}\right), \qquad \eta\big(R_s(X)\big) = \|\nabla_S \rho_s(X)\|^2 + \gamma\,\|\nabla_S \alpha_s(X)\|^2 \tag{7}$$

with a pre-defined constant $M$.² $\nabla_S$ denotes the intrinsic gradient defined on $S$. By using the proposed discontinuity cost function of surface reflectance, surface points that do not have enough specular observations get assigned specular reflectance inferred from the specular reflectance of neighboring surface points.

² Be sure that M ≥ 3 for gray-level images and M ≥ 9 for color images.
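The integrand $f = \zeta \times \eta$ of Eqs. (6)-(7) can be illustrated pointwise with a few lines of code. This is a sketch under stated assumptions: the intrinsic gradients are supplied as plain tuples of components (how they are computed on the surface is not covered here), and the function names are hypothetical.

```python
def sq_norm(g):
    """Squared Euclidean norm of a gradient given as a tuple of components."""
    return sum(c * c for c in g)


def zeta(grad_rd, M):
    """zeta(R_d) = 1 - ||grad_S R_d||^2 / M, as in Eq. (7)."""
    return 1.0 - sq_norm(grad_rd) / M


def eta(grad_rho_s, grad_alpha_s, gamma):
    """eta(R_s) = ||grad_S rho_s||^2 + gamma * ||grad_S alpha_s||^2, as in Eq. (7)."""
    return sq_norm(grad_rho_s) + gamma * sq_norm(grad_alpha_s)


def refl_integrand(grad_rd, grad_rho_s, grad_alpha_s, M, gamma):
    """Pointwise integrand f(X) = zeta(R_d(X)) * eta(R_s(X)) of Eq. (6)."""
    return zeta(grad_rd, M) * eta(grad_rho_s, grad_alpha_s, gamma)
```

Where the diffuse reflectance is locally constant, $\zeta = 1$ and any variation of the specular parameters is penalized in full. Where the diffuse gradient is large, $\zeta$ shrinks towards zero and a specular discontinuity costs little, which is the behaviour the text describes.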


Total Cost Function. Finally, the total cost function is given by

$$E_{total}(\Omega) = C + \sum_{i=1}^{n_c} \int_{\pi_{iF}} \hat e_i(x)\,d\sigma_i + \tau \int_S \chi(X)\,d\sigma + \lambda \int_S d\sigma + \beta \int_S f(X)\,d\sigma. \tag{8}$$

Here, it is worth noting that $E_{dev}(\Omega)$, $E_{shape}(S)$, and $E_{refl}(R)$ are defined over the scene surface while $E_{data}(\Omega)$ is defined as an integral over the image plane. By the change of variable $d\sigma_i = -\frac{d_i(X) \cdot n(X)}{z_i(X)^3}\,d\sigma$, where $d_i(X)$ is the vector connecting the center of the $i$-th camera and $X$, and $z_i(X)$ is the depth of $X$ relative to the $i$-th camera, we can replace the integral over the image plane by an integral over the surface [7]. When denoting $g(X, n(X)) : \mathbb{R}^3 \times \Omega \to \mathbb{R}$ as

$$g(X, n(X)) = -\sum_{i=1}^{n_c} \left( v_S^i\,\hat e_i\,\frac{d_i \cdot n}{z_i^3} \right) + \tau\chi + \lambda + \beta f, \tag{9}$$

Eq. (8) is simply rewritten as $E_{total}(\Omega) = C + \int_S g(X, n(X))\,d\sigma$.

5 Scene Recovery

Recently, some authors have proposed global optimization methods, via graph cuts or convexity, for the classical multiview stereovision problem [5, 22, 23]. Nevertheless, because the cost function depends on the normal and also on the visibility, the state of the art in optimization does not allow computing the global minimum of the energy we have designed. Therefore, scene recovery is achieved here by minimizing $E_{total}$ via gradient descent. Moreover, since $S$ and $R$ are highly coupled, it is very complicated to estimate all unknowns simultaneously. To solve the problem, we adopt an alternating scheme, updating $S$ for a fixed $R$ and then $R$ for a fixed $S$.

5.1 Shape Estimation – Surface Evolution

When assuming that $R$ is given, $E_{total}$ is a function of $S$. In this work, we derive the gradient descent flow corresponding to each cost function separately. The final gradient descent flow is then given by

$$S_t = S_t^{data} + S_t^{dev} + S_t^{shape} + S_t^{refl}, \tag{10}$$

where $S_t^{data}$, $S_t^{dev}$, $S_t^{shape}$ and $S_t^{refl}$ are described below. The data cost is a function of the visibility of a surface point, which is dependent on the whole surface shape. According to [20, 21], for correctly dealing with the visibility of non-convex objects, $S_t^{data}$ is given by

$$S_t^{data} = \sum_{i=1}^{n_c} \left[ -\frac{v_S^i}{z_i^3}\,(\hat e_i - \hat e_i')\,d_i^t\,\nabla n\,d_i^t\,\delta(d_i \cdot n) + \frac{v_S^i}{z_i^3}\,\partial_2 \hat e_i\,\nabla \bar I_i \cdot d_i \right], \tag{11}$$


where $\delta(\cdot)$ is the delta function and $\hat e_i'$ is an error computed by using the radiance at point $X'$ in the direction of the $i$-th camera, where $X'$ is the terminator of a horizon point $X$ [21]. When a horizon point has no terminator point on the surface, $\hat e_i' = 0$ because the terminator point is from the background. $\nabla \bar I_i$ is expressed by using Eq. (1) as $\nabla \bar I_i = \sum_{j=1}^{n_{il}} \{(\nabla v_{L_{ij}}) L_{ij} + v_{L_{ij}} (\nabla L_{ij})\} + (\nabla \rho_d) L_{ia}$. This gradient descent flow includes both the variation related to the camera visibility changes (the first term) and the variation related to the image changes (the second term), which also includes the variation due to the light visibility changes. In addition, similarly to [8, 14], the gradient descent flow for the normal deviation cost, $S_t^{dev}$ (originating from $E_{dev}(\Omega)$), is $S_t^{dev} = (-2\tau H + \tau(\nabla \cdot v))$. Also, $S_t^{shape}$ (from $E_{shape}(S)$) is the mean curvature flow, $S_t^{shape} = -2\lambda H$. Due to the complexity of the discontinuity cost function of surface reflectance, deriving its gradient descent flow needs more care. By using the derivation in [24], we get the following equation for surface evolution:

$$S_t^{refl} = -2\beta \left[ \frac{1}{M}\,m(\rho_d)\,\eta(R_s) - \big(m(\rho_s) + \gamma\,m(\alpha_s)\big)\,\zeta(R_d) \right]. \tag{12}$$

Here, $m(y) = \mathrm{II}\big(\nabla_S y \times n\big) + \|\nabla_S y\|^2 H$, where $\mathrm{II}(t)$ is the second fundamental form for a tangent vector $t$ with respect to $n$.

5.2 Photometric Unit Vector Field Update

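The computed gradient descent flows minimize the total cost with respect to given reflectance and $v$. We then update the photometric unit vector field $v$ to minimize the total cost with respect to given shape and reflectance. The $v$ that minimizes the total cost satisfies the equation

$$\frac{\partial g}{\partial v} = -\sum_{i=1}^{n_c} v_S^i\,\partial_2 \hat e_i\,\frac{\partial \bar I_i}{\partial v}\,\frac{d_i \cdot n}{z_i^3} + (-\tau n) = 0.$$

Here, we have to keep $\|v\| = 1$. Since $v \in \mathbb{S}^2$, $v$ can be expressed in spherical coordinates as $[\cos\theta_v \sin\phi_v,\ \sin\theta_v \sin\phi_v,\ \cos\phi_v]^T$, where $\theta_v$ and $\phi_v$ are the coordinates of $v$. Therefore, we update $\theta_v$ and $\phi_v$ to update $v$. As before, the $\theta_v$ and $\phi_v$ that minimize the total cost satisfy the following two equations by the chain rule:

$$\frac{\partial g}{\partial \theta_v} = \frac{\partial g}{\partial v} \cdot \frac{\partial v}{\partial \theta_v} = 0, \qquad \frac{\partial g}{\partial \phi_v} = \frac{\partial g}{\partial v} \cdot \frac{\partial v}{\partial \phi_v} = 0. \tag{13}$$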
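The chain rule of Eq. (13) can be sketched in code as follows. The example is a toy under stated assumptions: only the deviation term contributes to $\partial g / \partial v$ (so $\partial g / \partial v = -\tau n$), the step size is arbitrary, and all function names are hypothetical. The point is that updating the two angles keeps $\|v\| = 1$ by construction.

```python
import math


def v_from_angles(theta, phi):
    """Unit vector v in spherical coordinates (theta_v, phi_v)."""
    return (math.cos(theta) * math.sin(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(phi))


def dv_dtheta(theta, phi):
    """Partial derivative of v with respect to theta_v."""
    return (-math.sin(theta) * math.sin(phi),
            math.cos(theta) * math.sin(phi),
            0.0)


def dv_dphi(theta, phi):
    """Partial derivative of v with respect to phi_v."""
    return (math.cos(theta) * math.cos(phi),
            math.sin(theta) * math.cos(phi),
            -math.sin(phi))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def descent_step(theta, phi, dg_dv, step):
    """One gradient descent step on (theta_v, phi_v) using the chain rule of Eq. (13).

    dg_dv is the gradient of g with respect to v at the current v.
    """
    g_theta = dot(dg_dv, dv_dtheta(theta, phi))
    g_phi = dot(dg_dv, dv_dphi(theta, phi))
    return theta - step * g_theta, phi - step * g_phi
```

With $n = (0, 0, 1)$ and $\tau = 1$, repeatedly calling `descent_step(theta, phi, (0.0, 0.0, -1.0), 0.5)` drives $\phi_v$ towards zero, i.e. $v$ towards $n$, as the deviation term alone would require.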

So, we update $v$ by performing gradient descent using the above two PDEs.

5.3 Reflectance Estimation

Here, we estimate $R$ for fixed $S$ and $v$, still minimizing the total cost function. Since $E_{dev}$ and $E_{shape}$ do not depend on $R$ at all, we seek an optimal $R$ by minimizing $(E_{data}(\Omega) + E_{refl}(R))$. Because the diffuse and specular reflectance are highly coupled, it is also complex to estimate them at the same time. We therefore estimate the reflectance parameters alternately, one at a time, while assuming that the others are given and fixed. We repeat the procedure until they no longer change.
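The alternation described above can be illustrated on a toy problem. The sketch below is not the method of the paper: it replaces the PDE-based gradient descent by closed-form least-squares updates, and it fits only two scalar parameters to observations of the hypothetical form $I \approx \rho_d\,d + \rho_s\,s$, where $d$ and $s$ stand for known diffuse and specular shading factors.

```python
def fit_alternating(observations, tol=1e-12, max_iter=10000):
    """Alternately update rho_d and rho_s until they no longer change.

    observations -- list of (d, s, I) with I modelled as rho_d * d + rho_s * s
    Each update minimises the squared error over one parameter while the
    other one is held fixed (an illustrative stand-in for the PDE updates).
    """
    rho_d, rho_s = 0.0, 0.0
    sum_dd = sum(d * d for d, _, _ in observations)
    sum_ss = sum(s * s for _, s, _ in observations)
    for _ in range(max_iter):
        # Update the diffuse parameter with the specular one fixed.
        new_d = sum(d * (I - rho_s * s) for d, s, I in observations) / sum_dd
        # Update the specular parameter with the new diffuse one fixed.
        new_s = sum(s * (I - new_d * d) for d, s, I in observations) / sum_ss
        change = abs(new_d - rho_d) + abs(new_s - rho_s)
        rho_d, rho_s = new_d, new_s
        if change < tol:
            break
    return rho_d, rho_s
```

On noise-free observations generated with $\rho_d = 0.6$ and $\rho_s = 0.3$, the loop recovers both values. The same stopping rule ("until they no longer change") is what the text prescribes for the actual reflectance updates.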


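Diffuse Reflectance Estimation. For given $S$ and $R_s$, we estimate $\rho_d$ that minimizes the cost $(E_{data} + E_{refl})$. Here, the $\rho_d$ that minimizes the total cost function will satisfy the Euler-Lagrange equation given as

$$-\sum_{i=1}^{n_c} v_S^i\,\partial_2 \hat e_i\,\frac{\partial \bar I_i}{\partial \rho_d}\,\frac{d_i \cdot n}{z_i^3} + \frac{2\beta}{M}\,\eta(R_s)\,\Delta_S \rho_d = 0,$$

where $\Delta_S$ denotes the Laplace-Beltrami operator defined on the surface $S$. We solve the PDE by performing gradient descent using the following PDE:

$$\frac{\partial \rho_d}{\partial t} = -\sum_{i=1}^{n_c} v_S^i\,\partial_2 \hat e_i\,\frac{\partial \bar I_i}{\partial \rho_d}\,\frac{d_i \cdot n}{z_i^3} + \frac{2\beta}{M}\,\eta(R_s)\,\Delta_S \rho_d. \tag{14}$$

Specular Reflectance Estimation. We then estimate $R_s = (\rho_s, \alpha_s)$ for given $S$ and $R_d$ in the same manner. The $\rho_s$ that minimizes the total cost function will satisfy the Euler-Lagrange equation given as

$$-\sum_{i=1}^{n_c} v_S^i\,\partial_2 \hat e_i\,\frac{\partial \bar I_i}{\partial \rho_s}\,\frac{d_i \cdot n}{z_i^3} - 2\beta\,\Delta_S \rho_s\,\zeta(\rho_d) = 0.$$

We again solve the PDE by performing gradient descent using the following PDE:

$$\frac{\partial \rho_s}{\partial t} = -\sum_{i=1}^{n_c} v_S^i\,\partial_2 \hat e_i\,\frac{\partial \bar I_i}{\partial \rho_s}\,\frac{d_i \cdot n}{z_i^3} - 2\beta\,\Delta_S \rho_s\,\zeta(\rho_d). \tag{15}$$

$\alpha_s$ is also estimated in the same manner by solving the PDE

$$\frac{\partial \alpha_s}{\partial t} = -\sum_{i=1}^{n_c} v_S^i\,\partial_2 \hat e_i\,\frac{\partial \bar I_i}{\partial \alpha_s}\,\frac{d_i \cdot n}{z_i^3} - 2\beta\gamma\,\Delta_S \alpha_s\,\zeta(\rho_d). \tag{16}$$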

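Single-Material Surface Case. When dealing with a single-material surface that has a single specular reflectance $R_s$, the discontinuity cost function of surface reflectance, $E_{refl}(R)$, can be excluded because $f(X)$ is zero everywhere on the surface. The PDE used for the $\rho_d$ estimation, Eq. (14), then simplifies to

$$\frac{\partial \rho_d}{\partial t} = -\sum_{i=1}^{n_c} v_S^i\,\partial_2 \hat e_i\,\frac{\partial \bar I_i}{\partial \rho_d}\,\frac{d_i \cdot n}{z_i^3}.$$

Here $\rho_s$ and $\alpha_s$ are also computed by performing gradient descent using PDEs given as

$$\frac{\partial \rho_s}{\partial t} = -\int_S \sum_{i=1}^{n_c} v_S^i\,\partial_2 \hat e_i\,\frac{\partial \bar I_i}{\partial \rho_s}\,\frac{d_i \cdot n}{z_i^3}\,d\sigma, \qquad \frac{\partial \alpha_s}{\partial t} = -\int_S \sum_{i=1}^{n_c} v_S^i\,\partial_2 \hat e_i\,\frac{\partial \bar I_i}{\partial \alpha_s}\,\frac{d_i \cdot n}{z_i^3}\,d\sigma. \tag{17}$$

6 Experiments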

We have implemented the gradient descent surface evolution in the level set framework. The proposed method starts with the visual hull obtained from rough silhouette images to reduce computational time and to avoid local minima. We also adopt a multi-scale strategy. Images of size 640×480 or 800×600 were used as inputs, and the simple $L_2$-norm was used to compute the image similarity $e$. For synthetic data sets, the estimated shape is quantitatively evaluated in terms of accuracy and completeness as in [1]. We used 95% for accuracy and the 1.0mm error for completeness. For easy comprehension, the size of a target object is normalized so that it is smaller than [100mm 100mm 100mm]. Besides the shape evaluation, we also evaluated the estimated reflectance in the same manner. In addition, we computed the average difference between input images and synthesized images as $e_{image} = \frac{1}{n_c} \sum_{i=1}^{n_c} \frac{1}{A} \int_{\pi_i} \|I_i(x) - \bar I_i(x)\|\,d\sigma_i$, where $A = \int_{\pi_i} d\sigma_i$.

Due to the generality of the proposed method, it can be applied to various types of image sets with different camera/light configurations. Here, knowledge of illumination allows factorizing radiance into reflectance and geometry. In practice, depending on the scenario, that knowledge may not be required, e.g. for recovering shape and radiance of Lambertian surfaces with static illumination. In this case, the proposed method can be applied even without lighting information, assuming only an ambient illumination, and it works much like conventional multiview stereo methods. Figure 1 shows the result for the “dino” image set [1], for which no lighting information is required. The proposed method successfully recovers the shape as well as the radiance.

Fig. 1. “dino” image set (16 images), Lambertian surface case (static illumination). (a) input images, (b) synthesized images, (c) results.

The proposed method can also be applied to images taken under varying illumination. Results using images of textureless/textured Lambertian surfaces are shown in Fig. 2 and Fig. 3. In the case of Fig. 2, the proposed method works as a multiview photometric stereo method and recovers the shape and the diffuse reflectance of each surface point. Based on these, we can synthesize images of the scene for different lighting conditions.

Fig. 2. “bimba” image set (18 images), textureless Lambertian surface case (varying illumination and viewpoint). (a) ground-truth model, (b) estimated model, (c) input vs. synthesized image. 95% accuracy (shape, $\rho_{dr}$, $\rho_{dg}$, $\rho_{db}$) = (2.16mm, 0.093, 0.093, 0.093), 1.0mm completeness (shape, $\rho_{dr}$, $\rho_{dg}$, $\rho_{db}$) = (82.63%, 0.104, 0.104, 0.104), $e_{image}$ = 1.44.

Fig. 3. “dragon” image set (32 images), textured Lambertian surface case (static illumination and varying viewpoint). (a) input image, (b) true refl., (c) true shading, (d) est. refl., (e) est. shading. 95% accuracy (shape, $\rho_{dr}$, $\rho_{dg}$, $\rho_{db}$) = (1.28mm, 0.090, 0.073, 0.066), 1.0mm completeness (shape, $\rho_{dr}$, $\rho_{dg}$, $\rho_{db}$) = (97.11%, 0.064, 0.056, 0.052), $e_{image}$ = 1.25.

We then applied our method to the images of textureless/textured non-Lambertian surfaces showing specular reflection. Note that, unlike [15, 16], we do not use any thresholding to filter out specular highlight pixels. The result for the smoothed “bimba” data set is shown in Fig. 4. In this case, the surface has uniform diffuse/specular reflectance and each image was taken under a different illumination. Here, we used the method with Eq. (17) to estimate the specular reflectance. Although there is high-frequency noise in the estimated shape, the proposed method estimates the specular reflectance well: the ground-truth specular reflectance is ($\rho_s = 0.7$, $\alpha_s = 50$) while the estimated one is ($\rho_s = 0.61$, $\alpha_s = 41.8$). Note that small errors in estimated surface normals can cause large errors in specular reflectance due to its sensitivity to the surface normal. For instance, $0.7 \times 0.98^{50}\ (= 0.255) \approx 0.61 \times 0.979^{41.8}\ (= 0.251)$.

Fig. 4. Smoothed “bimba” image set (36 images), textureless non-Lambertian surface case (uniform specular reflectance, varying illumination and viewpoint). (a) true model, (b) est. shape, (c) diffuse & specular images, (d) synthesized. 95% accuracy (shape, $\rho_{dr}$, $\rho_{dg}$, $\rho_{db}$, $\rho_s$, $\alpha_s$) = (0.33mm, 0.047, 0.040, 0.032, 0.095, 8.248), 1.0mm completeness (shape, $\rho_{dr}$, $\rho_{dg}$, $\rho_{db}$, $\rho_s$, $\alpha_s$) = (100%, 0.048, 0.041, 0.032, 0.095, 8.248), $e_{image}$ = 1.63.

Note that most previous methods do not work for image sets taken under varying illumination and, moreover, they have difficulty dealing with specular reflection even if the images are taken under static illumination. For example, Fig. 5 shows results obtained by the method of [2] and our result for comparison. We ran the original code provided by the authors many times while changing parameters and used mutual information (MI) and cross correlation (CCL) as similarity measures to get the best results under specular reflection. As shown, the method of [2] fails to get a good shape even when the shape is very simple, while our method estimates it accurately. Also, with such images, given the large proportion of overbright surface parts, it seems intuitive that the strategy chosen by [16] and [15] (who consider bright pixels as outliers) might return less accurate results, because it removes too much information.
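The specular-lobe comparison quoted in the discussion of Fig. 4 (0.7 × 0.98^50 versus 0.61 × 0.979^41.8) can be checked numerically. The snippet below is only a sanity check of that arithmetic; the helper name `specular_lobe` is hypothetical, and the cosine values 0.98 and 0.979 are the ones used in the text.

```python
def specular_lobe(rho_s, alpha_s, cos_angle):
    """Specular factor rho_s * (n . h)^alpha_s of the Blinn-Phong model."""
    return rho_s * cos_angle ** alpha_s


# Ground-truth parameters versus the estimated ones, each evaluated at a
# slightly different value of n . h, as in the example from the text.
true_value = specular_lobe(0.7, 50, 0.98)
estimated_value = specular_lobe(0.61, 41.8, 0.979)
print(round(true_value, 3), round(estimated_value, 3))
```

The two products differ by less than 0.005 even though the parameter pairs are clearly different, which illustrates why a small normal error can be absorbed by a noticeably different $(\rho_s, \alpha_s)$.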

Fig. 5. Comparison using the “ellipse” image set (16 images), textureless non-Lambertian surface case (uniform specular reflectance, static illumination and varying viewpoint). (a) two input images, (b) results using [2] (MI, CCL), (c) our result.

Fig. 6. Result for real image sets. (a) input image, (b) initial shape, (c) estimated shape, (d) diffuse image, (e) specular image, (f) synthesized image.

We also used real image sets of textured glossy objects, which were taken by using fixed cameras/light sources while rotating the objects as in [15, 16]. Here, we simply assumed a single-material surface. (72 × 72 × 72) grids were used for the “saddog” (59 images) and “duck” (26 images) image sets. Figure 6 shows that, although sparse grid volumes were used, the proposed method successfully estimated the shape of the glossy object even under specular reflection while estimating specular reflectance. In addition, although the estimated specular reflectance may not be very accurate because of the inaccuracy of lighting calibration, saturation, and some unexpected photometric phenomena such as interreflection, it really helps to recover the shape well.

Finally, we applied our method to the most general case: textured non-Lambertian surfaces with spatially varying diffuse and specular reflectance and shininess, cf. Fig. 7. (64 × 125 × 64) grids were used in this case. We can see that the proposed method yields plausible specular/diffuse images and shape. However, there is high-frequency noise in the estimated shape. Moreover, the error in reflectance estimation is larger than in the previous cases. This result shows that, although the proposed discontinuity cost function of surface reflectance helps to infer the specular reflectance of all points with sparse specular reflection observation, reliably estimating specular reflectance for all surface points is still difficult unless there are enough observations of specular reflection for every surface point.

Fig. 7. Result for the “amphora” image set (36 images). (a) input image, (b) true shading, (c) shape init., (d) est. shading, (e) syn. image. 95% accuracy (shape, $\rho_{dr}$, $\rho_{dg}$, $\rho_{db}$, $\rho_s$, $\alpha_s$) = (0.59mm, 0.041, 0.047, 0.042, 0.226, 13.59), 1.0mm completeness (shape, $\rho_{dr}$, $\rho_{dg}$, $\rho_{db}$, $\rho_s$, $\alpha_s$) = (89.73%, 0.042, 0.047, 0.042, 0.226, 13.55), $e_{image}$ = 1.99.

7 Conclusion

In this paper, we have presented a variational method that recovers both the shape and the reflectance of surfaces using multiple images. Scene recovery is achieved by minimizing a global cost functional by alternation. As a result, the proposed method produces a complete description of scene surfaces. Contrary to previous works that consider specific scenarios, our method can be applied indiscriminately to a number of classical scenarios. It naturally fuses and exploits several important cues (silhouettes, stereo, and shading) and can deal with most of the classical 3D reconstruction scenarios such as stereo vision, (multi-view) photometric stereo, and multiview shape from shading. In addition, our method can deal with strong specular reflection, which is difficult even for some other state-of-the-art methods using complex similarity measures.

References

1. Seitz, S.M., Curless, B., Diebel, J., Scharstein, D., Szeliski, R.: A comparison and evaluation of multi-view stereo reconstruction algorithms. In: IEEE CVPR, pp. 519–528 (2006)
2. Pons, J.P., Keriven, R., Faugeras, O.: Multi-view stereo reconstruction and scene flow estimation with a global image-based matching score. IJCV 72(2), 179–193 (2007)
3. Goesele, M., Curless, B., Seitz, S.M.: Multi-view stereo revisited. In: IEEE CVPR, vol. 2, pp. 2402–2409 (2006)
4. Tran, S., Davis, L.: 3D surface reconstruction using graph cuts with surface constraints. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3952, pp. 219–231. Springer, Heidelberg (2006)


5. Kolev, K., Klodt, M., Brox, T., Esedoglu, S., Cremers, D.: Continuous global optimization in multiview 3D reconstruction. In: Yuille, A.L., Zhu, S.-C., Cremers, D., Wang, Y. (eds.) EMMCVPR 2007. LNCS, vol. 4679, pp. 441–452. Springer, Heidelberg (2007)
6. Jin, H., Soatto, S., Yezzi, A.J.: Multi-view stereo reconstruction of dense shape and complex appearance. IJCV 63(3), 175–189 (2005)
7. Soatto, S., Yezzi, A.J., Jin, H.: Tales of shape and radiance in multi-view stereo. In: IEEE ICCV, pp. 974–981 (2003)
8. Jin, H., Cremers, D., Wang, D., Prados, E., Yezzi, A., Soatto, S.: 3-D reconstruction of shaded objects from multiple images under unknown illumination. IJCV 76(3), 245–256 (2008)
9. Jin, H., Yezzi, A., Soatto, S.: Variational multiframe stereo in the presence of specular reflections. In: 3DPVT, pp. 626–630 (2002)
10. Yang, R., Pollefeys, M., Welch, G.: Dealing with textureless regions and specular highlights-a progressive space carving scheme using a novel photo-consistency measure. In: IEEE ICCV, pp. 576–583 (2003)
11. Yoon, K.J., Kweon, I.S.: Correspondence search in the presence of specular highlights using specular-free two-band images. In: Narayanan, P.J., Nayar, S.K., Shum, H.-Y. (eds.) ACCV 2006. LNCS, vol. 3852, pp. 761–770. Springer, Heidelberg (2006)
12. Zickler, T., Mallick, S.P., Kriegman, D.J., Belhumeur, P.: Color subspaces as photometric invariants. To appear in IJCV (2008)
13. Yu, T., Xu, N., Ahuja, N.: Shape and view independent reflectance map from multiple views. IJCV 73(2), 123–138 (2007)
14. Jin, H., Cremers, D., Yezzi, A.J., Soatto, S.: Shedding light on stereoscopic segmentation. In: IEEE CVPR, vol. 1, pp. 36–42 (2004)
15. Esteban, C.H., Vogiatzis, G., Cipolla, R.: Multiview photometric stereo. IEEE TPAMI 30(3), 548–554 (2008)
16. Birkbeck, N., Cobzas, D., Sturm, P., Jägersand, M.: Variational shape and reflectance estimation under changing light and viewpoints. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3951, pp. 536–549. Springer, Heidelberg (2006)
17. Georghiades, A.S.: Incorporating the Torrance and Sparrow model of reflectance in uncalibrated photometric stereo. In: IEEE ICCV, vol. 2, pp. 816–823 (2003)
18. Vogiatzis, G., Favaro, P., Cipolla, R.: Using frontier points to recover shape, reflectance and illumination. In: IEEE ICCV, vol. 1, pp. 228–235 (2005)
19. Lu, J., Little, J.: Reflectance function estimation and shape recovery from image sequence of a rotating object. In: IEEE ICCV, pp. 80–86 (1995)
20. Yezzi, A., Soatto, S.: Stereoscopic segmentation. IJCV 53(1), 31–43 (2003)
21. Gargallo, P., Prados, E., Sturm, P.: Minimizing the reprojection error in surface reconstruction from images. In: IEEE ICCV (2007)
22. Paris, S., Sillion, F.X., Quan, L.: A surface reconstruction method using global graph cut optimization. IJCV 66(2), 141–161 (2006)
23. Vogiatzis, G., Esteban, C.H., Torr, P.H.S., Cipolla, R.: Multiview stereo via volumetric graph-cuts and occlusion robust photo-consistency. IEEE TPAMI 29(12), 2241–2246 (2007)
24. Jin, H., Yezzi, A.J., Tsai, Y.H., Cheng, L.T., Soatto, S.: Estimation of 3D surface shape and smooth radiance from 2D images: A level set approach. J. Sci. Comput. 19(1-3), 267–292 (2003)

Highly Accurate PDE-Based Morphology for General Structuring Elements

Michael Breuß and Joachim Weickert

Mathematical Image Analysis Group, Faculty of Mathematics and Computer Science, Building E1.1, Saarland University, 66041 Saarbrücken, Germany
{breuss,weickert}@mia.uni-saarland.de

Abstract. Modelling the morphological processes of dilation and erosion with convex structuring elements with partial differential equations (PDEs) allows for digital scalability and subpixel accuracy. However, numerical schemes suffer from blur by dissipative artifacts. In our paper we present a family of so-called flux-corrected transport (FCT) schemes that addresses this problem for arbitrary convex structuring elements. The main characteristics of the FCT-schemes are: (i) They keep edges very sharp during the morphological evolution process, and (ii) they feature a high rotational invariance. Numerical experiments with diamonds and ellipses as structuring elements show that FCT-schemes are superior to standard schemes in the field of PDE-based morphology.

1 Introduction

Mathematical morphology is concerned with the analysis of shapes. Beginning with the works of Serra and Matheron [1, 2], it has evolved into a highly successful field in image processing. Many monographs and conference proceedings document this development, see e.g. [4, 6, 8, 18] and [17, 21, 22, 25], respectively. In mathematical morphology two fundamental operations are employed, dilation and erosion. Many other morphological processes such as openings, closings, top hats and morphological derivative operators can be derived from them. While dilation/erosion are frequently realised using a set-theoretical framework, an alternative formulation is available via partial differential equations (PDEs) [10, 11, 13, 14, 15]. Compared to the set-theoretical approach, the latter offers the conceptual advantages of digital scalability and subpixel accuracy. However, a usual drawback of PDE-based algorithms is that they introduce blurring artefacts, especially at edges of dilated/eroded objects. In this paper we address this problem by dealing with the proper numerical realisation of PDE-based dilation and erosion for general structuring elements. We show how a flux-corrected transport (FCT) scheme that gives a sharp resolution of dilated/eroded object edges combined with a high rotational invariance can be used. It is not only easy to implement, but we also show in numerical experiments that it outperforms other schemes for PDE-based morphology.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 758–769, 2009. © Springer-Verlag Berlin Heidelberg 2009

Mathematical Formulation of Dilation and Erosion. Let us consider a grey-value image $f : \Omega \subset \mathbb{R}^2 \to \mathbb{R}$ and a so-called structuring element $B \subset \mathbb{R}^2$. The building blocks of morphological filters, dilation and erosion, are then defined by

$$\text{dilation:} \quad (f \oplus B)(x) := \sup\,\{f(x - z),\ z \in B\}, \tag{1}$$

$$\text{erosion:} \quad (f \ominus B)(x) := \inf\,\{f(x + z),\ z \in B\}. \tag{2}$$
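On a pixel grid, the definitions (1)-(2) reduce to a maximum or minimum over the offsets contained in the structuring element. The following sketch illustrates this for a small diamond-shaped mask. The image is a list of rows, the names are hypothetical, and samples falling outside the image are simply ignored, which is an assumption about boundary handling made here for illustration.

```python
# Offsets z of a diamond (1-norm ball) of radius one pixel; includes the origin.
DIAMOND = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]


def _morph(f, offsets, sign, pick):
    """Apply pick (max or min) over f(x + sign * z) for all offsets z."""
    rows, cols = len(f), len(f[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            samples = [f[i + sign * di][j + sign * dj]
                       for di, dj in offsets
                       if 0 <= i + sign * di < rows and 0 <= j + sign * dj < cols]
            row.append(pick(samples))
        out.append(row)
    return out


def dilate(f, offsets=DIAMOND):
    """Discrete counterpart of Eq. (1): maximum of f(x - z) over z in B."""
    return _morph(f, offsets, -1, max)


def erode(f, offsets=DIAMOND):
    """Discrete counterpart of Eq. (2): minimum of f(x + z) over z in B."""
    return _morph(f, offsets, +1, min)
```

Dilating a single bright pixel with this mask produces a plus-shaped region, and eroding that result with the same mask returns the single pixel again.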

Dilation/erosion are often realised in a set-theoretical framework. To this end, the structuring elements are given by masks defined in accordance with the discrete pixel grid in an image. For convex structuring elements, there exists an alternative formulation of dilation/erosion in terms of PDEs that guarantee the validity of the semigroup property of dilation/erosion operations [14, 15, 10, 11]. Here, a scaling parameter $t > 0$ is introduced within the structuring element, which is then given as $tB$, achieving digital scalability. In particular, in Paragraph 4.2 of [14] it was shown that dilation/erosion can be realised by solving the PDEs

$$\text{dilation:} \quad \partial_t u(x, t) = \sup_{z \in B}\,\langle z, \nabla u(x, t) \rangle, \tag{3}$$

$$\text{erosion:} \quad \partial_t u(x, t) = \inf_{z \in B}\,\langle z, \nabla u(x, t) \rangle, \tag{4}$$

respectively. In (3)-(4), $\nabla = (\partial_x, \partial_y)$ is the spatial nabla operator, and $\langle a, b \rangle$ denotes the Euclidean product of the vectors $a$ and $b$. Interpreting the scaling parameter $t$ as an artificial time, the given image $f$ serves as the initial condition for the temporal evolution described by the PDEs (3)-(4). As we deal with rectangular images of finite size, we also need to define boundary conditions. Thus, we employ homogeneous Neumann boundary conditions at the image boundary $\partial\Omega$, complementing the PDE-based problem description.

Set-Theoretical vs. PDE-Based Approach. As already mentioned, the PDE-based approach offers the advantages of digital scalability and subpixel accuracy compared to the set-theoretical formulation, while the PDE-based algorithms usually introduce blurring of edges. Let us note in addition that round structuring elements such as circles or ellipses cannot be represented conveniently in the set-theoretical approach, and they typically do not define a granulometric family [18]. Thus, conceptually the PDE-based approach is favourable.

Numerical Schemes. Let us first briefly comment on the nature of the evolutionary PDEs (3)-(4). By the first-order spatial derivatives these PDEs are hyperbolic, describing a wave propagation or transport behaviour, in analogy to Huygens' principle. Thereby, the shape of the evolving wavefront is determined by the shape of the scalable structuring element. Thus, given the hyperbolic character of the dilation/erosion PDEs (3)-(4), it is natural that techniques from hyperbolic conservation laws are of importance for this work; see e.g. [23] for a general discussion of numerical methods for hyperbolic PDEs. In the context of dilation/erosion, popular schemes are the Osher-Sethian (OS) schemes [24, 9, 20] and the Rouy-Tourin (RT) scheme [12, 19]. In particular, let us note that one of the mentioned OS-schemes is a second-order high-resolution method. The use of a comparable high-resolution ansatz, specifically an essentially non-oscillatory (ENO) approach, was reported in [16]. In [26], Breuß and Weickert constructed an FCT-scheme for performing dilation/erosion with a disc of radius $t$ as structuring element.

Our Contribution. We extend the applicability of the FCT-scheme introduced in [26] from discs to general structuring elements. As it turns out, this is feasible but involves technical difficulties, especially for the case of general ellipses as structuring elements, which we discuss here in detail. We validate experimentally that the attractive features discussed in [26], namely a sharp resolution of edges and high rotational invariance, do carry over to the general case. In order to compare the performance of the FCT-scheme to set-theoretical algorithms, we use a diamond-shaped structuring element. For a comparison relying completely on digitally scalable structuring elements, we use an ellipse as structuring element. We show experimentally that the FCT-scheme gives much more accurate results than other PDE-based schemes.

Paper Organisation. In Section 2, we briefly introduce classic numerical schemes important in this paper for the case of a diamond as structuring element. We also construct the FCT-scheme for the same structuring element there. After that, we elaborate in Section 3 on the FCT-construction for ellipses as structuring element. In Section 4, we present numerical results. The paper is finished by a conclusion and outlook in Section 5.

2 PDE-Based Algorithms for Diamonds

For the sake of brevity, we discuss only dilation in detail, as the corresponding scheme for erosion is easily obtained. Employing the structuring element B := z ∈ IR2 , z 1 ≤ 1 , (5) the sought PDE (3) describing speciﬁcally dilation with a diamond is based on the dual norm to the norm used in (5). It reads as ∂t u = ∇u ∞ ,

(6)

where ∇u ∞ = max (|∂x u| , |∂y u|). Now, we need to discretise the PDE (6). For this, we deﬁne a spatio-temporal grid with uniform mesh widths hx , hy and τ , respectively. For the formulae of numerical schemes, let us then introduce the n notation Ui,j via n Ui,j ≈ u (ihx , jhy , nτ ) . (7) Also, for writing down our schemes let us deﬁne the following ﬁnite diﬀerence operators:

right-sided:
$$D_+^x U^n_{i,j} := U^n_{i+1,j} - U^n_{i,j}, \tag{8}$$
left-sided:
$$D_-^x U^n_{i,j} := U^n_{i,j} - U^n_{i-1,j}, \tag{9}$$
central:
$$D_c^x U^n_{i,j} := U^n_{i+1,j} - U^n_{i-1,j}. \tag{10}$$

In an analogous fashion, we use corresponding finite difference operators $D_+^y$, $D_-^y$ and $D_c^y$ for the y-direction.
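The operators (8)-(10) and their y-counterparts can be sketched in a few lines of plain Python. This is only an illustration: the grid is stored as a list of lists `U[i][j]` with `i` the x-index, and the helper `clamp` replicates boundary values, which is an assumption of this sketch and is not specified in the text.

```python
def clamp(k, n):
    """Clamp index k into 0..n-1 (replicated boundary: an assumption of this sketch)."""
    return max(0, min(k, n - 1))


def d_plus_x(U, i, j):
    """Right-sided difference (8): U[i+1][j] - U[i][j]."""
    return U[clamp(i + 1, len(U))][j] - U[i][j]


def d_minus_x(U, i, j):
    """Left-sided difference (9): U[i][j] - U[i-1][j]."""
    return U[i][j] - U[clamp(i - 1, len(U))][j]


def d_central_x(U, i, j):
    """Central difference (10): U[i+1][j] - U[i-1][j] (no division by the mesh width)."""
    return U[clamp(i + 1, len(U))][j] - U[clamp(i - 1, len(U))][j]


def d_plus_y(U, i, j):
    """Right-sided difference in y-direction."""
    return U[i][clamp(j + 1, len(U[0]))] - U[i][j]


def d_minus_y(U, i, j):
    """Left-sided difference in y-direction."""
    return U[i][j] - U[i][clamp(j - 1, len(U[0]))]


def d_central_y(U, i, j):
    """Central difference in y-direction."""
    return U[i][clamp(j + 1, len(U[0]))] - U[i][clamp(j - 1, len(U[0]))]
```

Note that the central difference is the sum of the two one-sided differences; the mesh widths enter only later, when the schemes are assembled.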

2.1

The High-Resolution Osher-Sethian-Scheme

In what follows, we will refer to this method as the OS-scheme, as its simpler, first-order variant will not be considered here. For its definition, we employ the minmod-function (as it gives back the minimal modulus of its arguments) given as

$$\mathrm{mm}(a,b) := \begin{cases} \min(a,b) & \text{if } a > 0 \text{ and } b > 0, \\ \max(a,b) & \text{if } a < 0 \text{ and } b < 0, \\ 0 & \text{else.} \end{cases} \tag{11}$$

To keep the presentation of the OS-scheme short, let us define the following discrete derivative operators:

$$\delta_x^{OS-} U^n_{i,j} := \min\left( \frac{1}{h_x}\left[ D_-^x U^n_{i,j} + \frac{1}{2}\,\mathrm{mm}\left( D_-^x D_+^x U^n_{i,j},\, D_-^x D_-^x U^n_{i,j} \right) \right],\, 0 \right), \tag{12}$$

$$\delta_x^{OS+} U^n_{i,j} := \max\left( \frac{1}{h_x}\left[ D_+^x U^n_{i,j} - \frac{1}{2}\,\mathrm{mm}\left( D_+^x D_+^x U^n_{i,j},\, D_-^x D_+^x U^n_{i,j} \right) \right],\, 0 \right), \tag{13}$$

and we set analogously $\delta_y^{OS-} U^n_{i,j}$ and $\delta_y^{OS+} U^n_{i,j}$. Let us note that the basic idea behind the construction within (12)-(13) is to augment the first-order derivatives $D_-^x U^n_{i,j}$ and $D_+^x U^n_{i,j}$ by a higher-order correction given in terms of discrete second-order derivatives. For a compact notation, let us then set

$$L(U^n, i, j) := \max\left( \left| \delta_x^{OS-} U^n_{i,j} + \delta_x^{OS+} U^n_{i,j} \right|,\, \left| \delta_y^{OS-} U^n_{i,j} + \delta_y^{OS+} U^n_{i,j} \right| \right), \tag{14}$$

which realises the maximum norm on the discrete level. Let us briefly comment on the 'double' contributions of the discretised derivatives in (14), for instance in x-direction: $\delta_x^{OS-} U^n_{i,j} + \delta_x^{OS+} U^n_{i,j}$. For a strictly monotone grey-value profile in the points incorporating the indices i − 1, i, i + 1, there will only be one non-zero contribution from one of the summands; the other one will be zero. That is determined by the sign of the slope in a strictly monotone profile. Only at a local minimum $U^n_{i,j}$, both summands could be non-zero.

The OS-scheme is a second-order high-resolution scheme. As such, we need to employ a second-order time stepping scheme, for which we choose the well-known method of Heun which is a two-stage Runge-Kutta method [7]:

$$\begin{aligned} \bar U^{n+1}_{i,j} &= U^n_{i,j} + \tau L(U^n, i, j), \\ U^{n+1}_{i,j} &= \frac{1}{2} U^n_{i,j} + \frac{1}{2} \bar U^{n+1}_{i,j} + \frac{\tau}{2} L\left( \bar U^{n+1}, i, j \right). \end{aligned} \tag{15}$$
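As an illustration of how (11)-(15) fit together, the following plain-Python sketch performs one Heun step of the OS-scheme on a grid stored as a list of lists. Several points are assumptions of the sketch rather than statements of the paper: boundary values are replicated, the two directional sums in `os_rhs` are taken in absolute value so that the result is a discrete maximum norm, and all function names are hypothetical.

```python
def minmod(a, b):
    """Minmod function (11): minimal modulus if the signs agree, else 0."""
    if a > 0 and b > 0:
        return min(a, b)
    if a < 0 and b < 0:
        return max(a, b)
    return 0.0


def os_one_sided(g, h):
    """Return (delta^{OS-}, delta^{OS+}) as in (12)-(13) along one axis.

    g(k) gives the grey value at offset k from the current pixel.
    """
    d_minus = g(0) - g(-1)
    d_plus = g(1) - g(0)
    centred2 = g(1) - 2 * g(0) + g(-1)   # D- D+ U
    left2 = g(0) - 2 * g(-1) + g(-2)     # D- D- U
    right2 = g(2) - 2 * g(1) + g(0)      # D+ D+ U
    delta_minus = min((d_minus + 0.5 * minmod(centred2, left2)) / h, 0.0)
    delta_plus = max((d_plus - 0.5 * minmod(right2, centred2)) / h, 0.0)
    return delta_minus, delta_plus


def os_rhs(U, i, j, hx=1.0, hy=1.0):
    """Discrete maximum norm (14); the absolute values are this sketch's reading."""
    nx, ny = len(U), len(U[0])

    def gx(k):  # replicated boundary values (assumption)
        return U[min(max(i + k, 0), nx - 1)][j]

    def gy(k):
        return U[i][min(max(j + k, 0), ny - 1)]

    dmx, dpx = os_one_sided(gx, hx)
    dmy, dpy = os_one_sided(gy, hy)
    return max(abs(dmx + dpx), abs(dmy + dpy))


def heun_step(U, tau, hx=1.0, hy=1.0):
    """One step of Heun's method (15) for the dilation PDE (6)."""
    nx, ny = len(U), len(U[0])
    pred = [[U[i][j] + tau * os_rhs(U, i, j, hx, hy) for j in range(ny)]
            for i in range(nx)]
    return [[0.5 * U[i][j] + 0.5 * pred[i][j]
             + 0.5 * tau * os_rhs(pred, i, j, hx, hy) for j in range(ny)]
            for i in range(nx)]
```

On a linear ramp of slope one, the second-order corrections vanish and an interior grey value simply grows by τ per step, as expected for dilation with a diamond.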

2.2

The FCT-Scheme

Like the OS-scheme, the FCT-scheme is a predictor-corrector method. However, while this format arises in the case of the OS-scheme by use of a Runge-Kutta method for time integration, the FCT-construction works diﬀerently. As a predictor step, a ﬁrst-order scheme is used for wave propagation. Thus, by the ﬁrst-order error the predictor features desirable theoretical properties but also introduces much artiﬁcial dissipation. Then, by taking into account the so-called viscosity form of the predictor scheme, the dissipation can be quantiﬁed on a discrete level and is negated in a second step using stabilised inverse diﬀusion [27]. For details we refer to [26]. Let us note that the basic idea to negate dissipation by a corrector step was invented by Boris and Book [3,5]. However, the corrector step was realised technically quite diﬀerently in their original works. Following their procedure would lead to a diﬀerent (and less attractive) scheme than with the approach followed here. As a predictor step we use the dissipative scheme proposed by Rouy and Tourin [12]. In order to write this down, we use the abbreviation

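$$\delta_x^{RT} U^n_{i,j} := \max\left( \frac{1}{h_x} \max\left( -D_-^x U^n_{i,j},\, 0 \right),\, \frac{1}{h_x} \max\left( D_+^x U^n_{i,j},\, 0 \right) \right), \tag{16}$$

and $\delta_y^{RT} U^n_{i,j}$ is used accordingly. Then the RT-scheme is in our case defined as

$$\bar U^{n+1}_{i,j} = U^n_{i,j} + \tau \max\left( \delta_x^{RT} U^n_{i,j},\, \delta_y^{RT} U^n_{i,j} \right). \tag{17}$$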
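A minimal plain-Python sketch of the Rouy-Tourin predictor (16)-(17) may help to see its upwind character. The list-of-lists storage, the replicated boundary values and the function names are assumptions of this illustration.

```python
def rt_delta(left, centre, right, h):
    """One-sided Rouy-Tourin derivative (16) from three neighbouring grey values."""
    return max(max(-(centre - left), 0.0), max(right - centre, 0.0)) / h


def rt_step(U, tau, hx=1.0, hy=1.0):
    """One step of the RT-scheme (17) for dilation with a diamond."""
    nx, ny = len(U), len(U[0])
    out = [row[:] for row in U]
    for i in range(nx):
        for j in range(ny):
            c = U[i][j]
            # neighbours with replicated boundary values (assumption of this sketch)
            dx = rt_delta(U[max(i - 1, 0)][j], c, U[min(i + 1, nx - 1)][j], hx)
            dy = rt_delta(U[i][max(j - 1, 0)], c, U[i][min(j + 1, ny - 1)], hy)
            out[i][j] = c + tau * max(dx, dy)
    return out
```

Starting from a single bright pixel, one step with τ = 0.5 raises the four axis neighbours to half the peak value and leaves the diagonal neighbours untouched, which already shows the diamond shape as well as the smearing introduced by the first-order predictor.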

The FCT scheme then consists of a subsequent application of (17) and a corrector step negating the artificial dissipation of the RT scheme, reading in total as

$$\begin{aligned} \bar U^{n+1}_{i,j} &= U^n_{i,j} + \tau \max\left( \delta_x^{RT} U^n_{i,j},\, \delta_y^{RT} U^n_{i,j} \right), \\ U^{n+1}_{i,j} &= \bar U^{n+1}_{i,j} + C_h\left( \bar U^{n+1} \right) - C_d\left( \bar U^{n+1} \right). \end{aligned} \tag{18}$$

Let us consider the corrector step and especially the functions $C_h$ ('h' for high-order part) and $C_d$ ('d' for dissipative part) in some detail. As indicated, the first step of the FCT procedure is to split the dissipative part of the scheme from the non-dissipative second-order part. The latter part of the scheme can be described via central differences as in (10), since central differences do not feature dissipation in the leading-order part of the truncation error. Thus, the discretisation of the dilation PDE (6) using central differences only,

$$\bar U^{n+1}_{i,j} = U^n_{i,j} + \max\left( \left| \frac{\tau}{2h_x} D_c^x U^n_{i,j} \right|,\, \left| \frac{\tau}{2h_y} D_c^y U^n_{i,j} \right| \right), \tag{19}$$

incorporates no numerical dissipation in the approximation of spatial derivatives. Employing predicted data as arguments in the formulae of the corrector step, we can identify the high-order part within the predictor formula (which was absent before adding it) as

$$C_h\left( \bar U^{n+1} \right) := + \max\left( \left| \frac{\tau}{2h_x} D_c^x \bar U^{n+1}_{i,j} \right|,\, \left| \frac{\tau}{2h_y} D_c^y \bar U^{n+1}_{i,j} \right| \right) \tag{20}$$

by adding zero via adding/subtracting

$$\max\left( \left| \frac{\tau}{2h_x} D_c^x \bar U^{n+1}_{i,j} \right|,\, \left| \frac{\tau}{2h_y} D_c^y \bar U^{n+1}_{i,j} \right| \right). \tag{21}$$

Let us now stress that the remaining terms of the predictor formula plus the contribution due to (20)-(21), with data $\bar U^{n+1}$ as arguments, define the discrete dissipation $C_d$. However, since we want to subtract $C_d$ in (18), we aim for a backward dissipation. Thus, we need to stabilise this contribution with help of a straightforward extension of the minmod-function from (11) to three arguments:

$$G_{i+1/2,j} := \mathrm{mm}\left( D_-^x \bar U^{n+1}_{i,j},\, \frac{\tau}{2h_x} D_+^x \bar U^{n+1}_{i,j},\, D_+^x \bar U^{n+1}_{i+1,j} \right), \tag{22}$$

$$G_{i,j+1/2} := \mathrm{mm}\left( D_-^y \bar U^{n+1}_{i,j},\, \frac{\tau}{2h_y} D_+^y \bar U^{n+1}_{i,j},\, D_+^y \bar U^{n+1}_{i,j+1} \right). \tag{23}$$

Let us note that the left and right arguments in (22)-(23) are supposed to prevent overshoots, while the middle argument is determined in accordance to (16). For details concerning this procedure and an analysis of stabilised inverse diffusion, see [26,27], respectively. For the dissipative correction term $C_d(\bar U^{n+1})$ we employ the stabilised fluxes from (22)-(23), yielding

$$\delta_x^{bd} \bar U^{n+1}_{i,j} := \left| \frac{\tau}{2h_x} D_c^x \bar U^{n+1}_{i,j} \right| + G_{i+1/2,j} - G_{i-1/2,j}, \tag{24}$$

$$\delta_y^{bd} \bar U^{n+1}_{i,j} := \left| \frac{\tau}{2h_y} D_c^y \bar U^{n+1}_{i,j} \right| + G_{i,j+1/2} - G_{i,j-1/2}, \tag{25}$$

and finally:

$$C_d\left( \bar U^{n+1} \right) = \max\left( \delta_x^{bd} \bar U^{n+1}_{i,j},\, \delta_y^{bd} \bar U^{n+1}_{i,j} \right). \tag{26}$$

3

The FCT-Scheme for General Ellipses

The key for obtaining dilation with a general ellipse is to consider the normal form of an ellipse in the x-y-plane which can be written for our purpose as

$$a^2 x^2 + b^2 y^2 = 1. \tag{27}$$

This equation describes the location of the front of the solution of the evolutionary PDE

$$\partial_t u = \sqrt{a^2 (\partial_x u)^2 + b^2 (\partial_y u)^2} \tag{28}$$

at time t = 1, starting from the center (x, y)T = (0, 0)T . For a = b = 1, one obtains a circle, retrieving a disc as structuring element. Note that we should be able to handle a PDE like (28) easily, while implementing directly an algorithm for ellipses with a general orientation of the principal axis poses diﬃculties. The General Idea. Let us brieﬂy outline the procedure. In order to ﬁnally solve the PDE (28), we collect, for each pixel individually, grey values from positions


corresponding to a rotated grid. As these will not be located exactly at pixel centers, they will in general not coincide with the given grey values and need to be interpolated. With these interpolated data we solve pointwise the PDE (28).

Having thus described the general proceeding, we begin its realisation for general ellipses as structuring elements by implementing a rotation of the coordinate system. For a more detailed explanation of this, we need to fix some geometric properties of the ellipse defining the structuring element. In order to simplify the presentation, we set $h_x := h_y := 1$. First, let us calibrate the length of the principal axis to 1, i.e. the final ellipse is a subset of the unit disc. In order to use a PDE of the form of (28), we have to rotate the grid. Let us note that for $h_x = h_y = 1$, all points within the stencil of the Rouy-Tourin scheme (16)-(17) are on the unit sphere if we center this at $(ih_x, jh_y)^T$. Then we rotate the local Euclidean coordinate system centered at $(ih_x, jh_y)^T$ by an angle α with 0 ≤ α ≤ π/2. Making use of elementary trigonometry, the values rotated now onto the knots of our finite difference stencil are grey values from the points given by $(\cos\alpha_k, \sin\alpha_k)^T$, $\alpha_k := \alpha + k \cdot \frac{\pi}{2}$, k = 0, 1, 2, 3. Let us note that in using this procedure, we effectively consider an ellipse where the angle between x-axis and principal axis is −α. Let us stress that we can obtain via 0 ≤ α ≤ π/2 all possible ellipses, as we can switch at any time the roles of a and b in (28) that define the principal axis. It is just practical to impose 0 ≤ α ≤ π/2 since this helps to give a suitable interpolation formula, which is the next step.

Obviously, we need at each pixel the grey values after rotation for defining our finite difference scheme. We wish to achieve second-order accuracy because the second-order high-resolution OS-scheme will serve as the comparison scheme for the procedure.
Thus, we use standard bilinear interpolation for this purpose as the error of this approach is formally of the same order. In order to show how the computation works, we now clarify the details for the values in the first quadrant. As 0 ≤ α ≤ π/2, the grey value we need at the knot $((i+1)h_x, jh_y)^T$ is located at $(\cos\alpha_0, \sin\alpha_0)^T = (\cos\alpha, \sin\alpha)^T$. Because of $h_x = h_y = 1$, we can use the general formula for bilinear interpolation of some function g over the rectangle [0, 1] × [0, 1] reading as

$$g(x,y) \approx g(0,0)(1-x)(1-y) + g(1,0)\,x(1-y) + g(0,1)(1-x)\,y + g(1,1)\,xy, \quad x, y \in [0,1]. \tag{29}$$

Plugging in our values within the first quadrant, we obtain the rotated grey value $\tilde U^n_{i+1,j}$ as

$$\tilde U^n_{i+1,j} := U^n_{i,j}(1-\cos\alpha)(1-\sin\alpha) + U^n_{i+1,j}\cos\alpha\,(1-\sin\alpha) + U^n_{i,j+1}(1-\cos\alpha)\sin\alpha + U^n_{i+1,j+1}\cos\alpha\sin\alpha. \tag{30}$$

Analogously, we can compute the other members of our stencil after rotation of our local coordinate system. The resulting formulae are:

$$\tilde U^n_{i,j} := U^n_{i,j}, \tag{31}$$

$$\tilde U^n_{i,j+1} := U^n_{i,j}(1-\cos\alpha)(1-\sin\alpha) + U^n_{i,j+1}\cos\alpha\,(1-\sin\alpha) + U^n_{i-1,j}(1-\cos\alpha)\sin\alpha + U^n_{i-1,j+1}\cos\alpha\sin\alpha, \tag{32}$$

$$\tilde U^n_{i-1,j} := U^n_{i,j}(1-\cos\alpha)(1-\sin\alpha) + U^n_{i-1,j}\cos\alpha\,(1-\sin\alpha) + U^n_{i,j-1}(1-\cos\alpha)\sin\alpha + U^n_{i-1,j-1}\cos\alpha\sin\alpha, \tag{33}$$

$$\tilde U^n_{i,j-1} := U^n_{i,j}(1-\cos\alpha)(1-\sin\alpha) + U^n_{i,j-1}\cos\alpha\,(1-\sin\alpha) + U^n_{i+1,j}(1-\cos\alpha)\sin\alpha + U^n_{i+1,j-1}\cos\alpha\sin\alpha. \tag{34}$$
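The four interpolation formulae (30), (32)-(34) share one pattern: each rotated stencil point blends the centre, the neighbour along one axis, the neighbour along the axis turned by 90 degrees, and the diagonal neighbour between them. The following plain-Python sketch exploits this; the names `rotated_stencil`, `east`, `north`, `west`, `south` and the replicated boundary handling are assumptions of the illustration, with $h_x = h_y = 1$ as in the text.

```python
import math


def rotated_stencil(U, i, j, alpha):
    """Rotated grey values (30)-(34) at pixel (i, j) for 0 <= alpha <= pi/2."""
    c, s = math.cos(alpha), math.sin(alpha)
    nx, ny = len(U), len(U[0])

    def u(di, dj):
        # grey value at offset (di, dj); replicated boundary (assumption)
        return U[min(max(i + di, 0), nx - 1)][min(max(j + dj, 0), ny - 1)]

    def blend(e, p):
        # bilinear interpolation (29) in the cell spanned by axis e and its
        # 90-degree rotation p, evaluated at (cos(alpha), sin(alpha))
        return (u(0, 0) * (1 - c) * (1 - s)
                + u(e[0], e[1]) * c * (1 - s)
                + u(p[0], p[1]) * (1 - c) * s
                + u(e[0] + p[0], e[1] + p[1]) * c * s)

    axes = [(1, 0), (0, 1), (-1, 0), (0, -1)]
    names = ["east", "north", "west", "south"]   # (30), (32), (33), (34)
    stencil = {"centre": u(0, 0)}                # (31)
    for k, name in enumerate(names):
        stencil[name] = blend(axes[k], axes[(k + 1) % 4])
    return stencil
```

For α = 0 the rotated values coincide with the ordinary axis neighbours, and for grey values that vary linearly in space the interpolation is exact, which gives two simple checks of an implementation.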

In terms of these values we now give the main formulae for the FCT-scheme. Comparing especially with (18), (20) and (26) from Paragraph 2.2 shows what needs to be done:

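$$\begin{aligned} \bar U^{n+1}_{i,j} &= \tilde U^n_{i,j} + \tau \sqrt{ a^2 \left( \delta_x^{RT} \tilde U^n_{i,j} \right)^2 + b^2 \left( \delta_y^{RT} \tilde U^n_{i,j} \right)^2 }, \\ U^{n+1}_{i,j} &= \bar U^{n+1}_{i,j} + C_h\left( \bar U^{n+1} \right) - C_d\left( \bar U^{n+1} \right), \end{aligned} \tag{35}$$

with

$$C_h\left( \bar U^{n+1} \right) = + \frac{\tau}{2h_x} \sqrt{ a^2 \left( D_c^x \bar U^{n+1}_{i,j} \right)^2 + b^2 \left( D_c^y \bar U^{n+1}_{i,j} \right)^2 }, \tag{36}$$

$$C_d\left( \bar U^{n+1} \right) = \sqrt{ a^2 \left( \delta_x^{bd} \bar U^{n+1}_{i,j} \right)^2 + b^2 \left( \delta_y^{bd} \bar U^{n+1}_{i,j} \right)^2 }. \tag{37}$$

Note that for the arguments of the minmod-function used within $C_d(\bar U^{n+1})$, we also need to compute rotated grey values $\tilde U^{n+1}_{i\pm 2, j\pm 2}$ from the data set $\bar U^{n+1}$. This can be done in the same fashion as in (30)-(34). Also note that because of the calibration of ellipses, we always have in our experiments a = 1 and b ∈ [0, 1].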
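To make the interplay of predictor and corrector concrete, here is a plain-Python sketch of one FCT step (35)-(37) for the special case α = 0, i.e. an ellipse aligned with the grid, so that no interpolation is needed. It is a sketch under stated assumptions and not the authors' implementation: boundary values are replicated, $h_x = h_y = h$, the central-difference term inside the backward dissipation is taken in absolute value (this sketch's reading of the printed formulae), and all function names are hypothetical.

```python
import math


def minmod3(a, b, c):
    """Three-argument minmod: minimal modulus if all signs agree, else 0."""
    if a > 0 and b > 0 and c > 0:
        return min(a, b, c)
    if a < 0 and b < 0 and c < 0:
        return max(a, b, c)
    return 0.0


def fct_step(U, tau, a=1.0, b=1.0, h=1.0):
    """One FCT step for dilation with a grid-aligned ellipse (alpha = 0)."""
    nx, ny = len(U), len(U[0])

    def at(V, i, j):
        # replicated boundary values (an assumption of this sketch)
        return V[min(max(i, 0), nx - 1)][min(max(j, 0), ny - 1)]

    def rt(V, i, j, di, dj):
        # Rouy-Tourin derivative (16) along the axis (di, dj)
        back = at(V, i, j) - at(V, i - di, j - dj)
        fwd = at(V, i + di, j + dj) - at(V, i, j)
        return max(max(-back, 0.0), max(fwd, 0.0)) / h

    # Predictor: first line of (35); hypot(x, y) = sqrt(x^2 + y^2)
    P = [[U[i][j] + tau * math.hypot(a * rt(U, i, j, 1, 0), b * rt(U, i, j, 0, 1))
          for j in range(ny)] for i in range(nx)]

    lam = tau / (2.0 * h)

    def flux(i, j, di, dj):
        # stabilised flux between (i, j) and (i + di, j + dj), cf. (22)-(23)
        d_minus = at(P, i, j) - at(P, i - di, j - dj)
        d_plus = at(P, i + di, j + dj) - at(P, i, j)
        d_plus_next = at(P, i + 2 * di, j + 2 * dj) - at(P, i + di, j + dj)
        return minmod3(d_minus, lam * d_plus, d_plus_next)

    def central(i, j, di, dj):
        # scaled central difference used in (36)
        return lam * (at(P, i + di, j + dj) - at(P, i - di, j - dj))

    def backward(i, j, di, dj):
        # backward dissipation, cf. (24)-(25)
        return (abs(central(i, j, di, dj))
                + flux(i, j, di, dj) - flux(i - di, j - dj, di, dj))

    out = [row[:] for row in P]
    for i in range(nx):
        for j in range(ny):
            ch = math.hypot(a * central(i, j, 1, 0), b * central(i, j, 0, 1))
            cd = math.hypot(a * backward(i, j, 1, 0), b * backward(i, j, 0, 1))
            out[i][j] = P[i][j] + ch - cd   # corrector: second line of (35)
    return out
```

In one space dimension the corrector reduces to subtracting the flux difference, so a smeared edge is steepened while the outer minmod arguments keep the update from overshooting. Directly after the first step from a single bright pixel all stabilised fluxes vanish, and the result equals the predictor.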

4

Numerical Experiments

The main disadvantage in using PDE-based algorithms is the occurrence of dissipative discretisation artefacts. The resulting blurring is especially observable at edges of dilated/eroded objects.

The Diamond Experiment. In this experiment, we solve the dilation PDE (6), comparing the FCT-scheme with the set-theoretical approach. For convenience, we always employ $h_x = h_y = 1$. For the fully discrete, set-theoretical approach, we employ the usual 5-point structuring element defined centered in $(0, 0)^T$ with vertices

$$(1, 0)^T, \quad (0, 1)^T, \quad (-1, 0)^T, \quad (0, -1)^T. \tag{38}$$
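For reference, the set-theoretical counterpart with the 5-point structuring element (38) amounts to a running maximum over a pixel and its four axis neighbours. The plain-Python sketch below is only an illustration; the function name and the treatment of the image border (neighbours outside the image are ignored) are assumptions.

```python
def dilate_5point(U):
    """One set-based dilation step with the 5-point structuring element (38)."""
    nx, ny = len(U), len(U[0])
    out = [row[:] for row in U]
    for i in range(nx):
        for j in range(ny):
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ii, jj = i + di, j + dj
                # neighbours outside the image are ignored (assumption)
                if 0 <= ii < nx and 0 <= jj < ny:
                    out[i][j] = max(out[i][j], U[ii][jj])
    return out
```

After k iterations starting from a single bright pixel, exactly the pixels within city-block distance k of the seed carry the peak value, i.e. a discrete diamond of radius k.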

In Figure 1, we observe the outcome of this experiment, where we have inverted the grey values. As input image, we use an image of size 129 × 129, where we have exactly one pixel in the center of the image which is dilated. We perform 100 time steps with τ = 0.5 for dilation with FCT, and 50 iterations with the set-based algorithm, respectively. We observe that the FCT-result (c) is visually nice, with sharp diamond edges. Compared with the set-based result (b), we observe that there is some difference at the edges, which can be seen in the scaled difference map (d). Note that the average (unscaled) difference amounts to a grey value of 1.502. However, let us also note that the solution of the PDE is digitally scalable, so that the set-based solution is not intended to be the true solution of the dilation PDE.

Fig. 1. Comparison of dilation with a diamond using inverted grey values. Top. Left: Initial image (a). Right: The set-based result (b). Bottom. Left: The FCT-result (c). Right: Scaled difference (d). The average difference visualised in (d) is of the grey value 1.502.

The Ellipse Experiment. We now show computational results for ellipses as structuring elements. In order to give an impression of what quality one may expect, we first consider an ellipse where the principal axis is aligned with the grid. For this experiment, we define the structuring element via a = 1, b = 0.25, compare (27). For the numerical experiment, we use τ = 0.5, and we perform 100 time steps. The results of the OS-scheme together with the result of the FCT-scheme are displayed in Figure 2. While the result of the OS-scheme is quite blurry, we observe a mixed behaviour of the FCT-scheme. While the left and right front travelling with the largest signal velocity in this example are well-resolved, there is some blurring on the slow-moving upper and lower part of the edge of the ellipse.

Fig. 2. Dilation comparison with inverted grey values. Left: OS-result (a). Right: FCT-result (b).

Let us now consider the rotated case employing bilinear interpolation. In a setting analogous to the non-rotated case, but with a = 1, b = 0.25 and (a) α = 0.6, (b) α = 0.9, we obtain the results displayed in Figure 3. We observe that due to the interpolation there is some more blurring in using both schemes; however, the general qualitative relationship between results of these schemes is the same as in the non-rotated case.

Fig. 3. Dilation comparison with inverted grey values. Rotated ellipses with (top) α = 0.6 and (bottom) α = 0.9. Bilinear interpolation was used to rotate the grid in each time step. Left column: OS-results. Right column: FCT-results.

5

Conclusion and Outlook

The main message of this paper is twofold:
– The FCT-methodology is readily applicable in the context of general structuring elements. With the exception of suitable interpolation formulae, there is no more technical effort than in the case of a disc-shaped structuring element.
– The quality of FCT-results is better than the quality of results using other schemes with respect to edge resolution.

The current paper represents one of the most advanced numerical approaches to continuous-scale morphology. For our future work, we aim to improve the quality of numerical schemes in this field even further.

References 1. Serra, J.: Echantillonnage et estimation des phénomènes de transition minier. PhD thesis, University of Nancy, France (1967) 2. Matheron, G.: Eléments pour une théorie des milieux poreux. Masson, Paris (1967) 3. Boris, J.P., Book, D.L.: Flux corrected transport. I. SHASTA, a ﬂuid transport algorithm that works. Journal of Computational Physics 11(1), 38–69 (1973) 4. Matheron, G.: Random Sets and Integral Geometry. Wiley, New York (1975) 5. Boris, J.P., Book, D.L.: Flux corrected transport. III. Minimal error FCT algorithms. Journal of Computational Physics 20, 397–431 (1976) 6. Serra, J.: Image Analysis and Mathematical Morphology, vol. 1. Academic Press, London (1982) 7. Hairer, E., Norsett, S., Wanner, G.: Solving Ordinary Diﬀerential Equations. I: Nonstiﬀ Problems. Springer Series in Computational Mathematics, vol. 8. Springer, New York (1987) 8. Serra, J.: Image Analysis and Mathematical Morphology, vol. 2. Academic Press, London (1988) 9. Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton–Jacobi formulations. Journal of Computational Physics 79, 12–49 (1988)


10. Brockett, R.W., Maragos, P.: Evolution equations for continuous-scale morphology. In: Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, San Francisco, CA, March 1992, vol. 3, pp. 125–128 (1992) 11. van den Boomgaard, R.: Mathematical Morphology: Extensions Towards Computer Vision. PhD thesis, University of Amsterdam, The Netherlands (1992) 12. Rouy, E., Tourin, A.: A viscosity solutions approach to shape-from-shading. SIAM Journal on Numerical Analysis 29, 867–884 (1992) 13. Sapiro, G., Kimmel, R., Shaked, D., Kimia, B.B., Bruckstein, A.M.: Implementing continuous-scale morphology via curve evolution. Pattern Recognition 26(9), 1363– 1372 (1993) 14. Alvarez, L., Guichard, F., Lions, P.L., Morel, J.M.: Axioms and fundamental equations in image processing. Archive for Rational Mechanics and Analysis 123, 199– 257 (1993) 15. Arehart, A.B., Vincent, L., Kimia, B.B.: Mathematical morphology: The Hamilton–Jacobi connection. In: Proc. Fourth International Conference on Computer Vision, Berlin, pp. 215–219. IEEE Computer Society Press, Los Alamitos (1993) 16. Siddiqi, K., Kimia, B.B., Shu, C.W.: Geometric shock-capturing ENO schemes for subpixel interpolation, computation and curve evolution. Graphical Models and Image Processing 59, 278–301 (1997) 17. Heijmans, H.J.A.M., Roerdink, J.B.T.M. (eds.): Mathematical Morphology and its Applications to Image and Signal Processing. Computational Imaging and Vision, vol. 12. Kluwer, Dordrecht (1998) 18. Soille, P.: Morphological Image Analysis, 2nd edn. Springer, Berlin (2003) 19. van den Boomgaard, R.: Numerical solution schemes for continuous-scale morphology. In: Nielsen, M., Johansen, P., Olsen, O.F., Weickert, J. (eds.) Scale-Space 1999. LNCS, vol. 1682, pp. 199–210. Springer, Heidelberg (1999) 20. Sethian, J.A.: Level Set Methods and Fast Marching Methods, 2nd edn. Cambridge University Press, Cambridge (1999) 21. Goutsias, J., Vincent, L., Bloomberg, D.S. 
(eds.): Mathematical Morphology and its Applications to Image and Signal Processing. Computational Imaging and Vision, vol. 18. Kluwer, Dordrecht (2000) 22. Talbot, H., Beare, R. (eds.): Proc. Sixth International Symposium on Mathematical Morphology and its Applications, Sydney, Australia (April 2002), http://www.cmis.csiro.au/ismm2002/proceedings/ 23. LeVeque, R.J.: Finite Volume Methods for Hyperbolic Problems. Cambridge University Press, Cambridge (2002) 24. Osher, S., Fedkiw, R.P.: Level Set Methods and Dynamic Implicit Surfaces. Applied Mathematical Sciences, vol. 153. Springer, New York (2002) 25. Ronse, C., Najman, L., Decencière, E. (eds.): Mathematical Morphology: 40 Years On. Computational Imaging and Vision, vol. 30. Springer, Dordrecht (2005) 26. Breuß, M., Weickert, J.: A shock-capturing algorithm for the diﬀerential equations of dilation and erosion. Journal of Mathematical Imaging and Vision 25, 187–201 (2006) 27. Breuß, M., Welk, M.: Analysis of staircasing in semidiscrete stabilised inverse linear diﬀusion algorithms. Journal of Computational and Applied Mathematics 206, 520– 533 (2007)

Computational Geometry-Based Scale-Space and Modal Image Decomposition
Application to Light Video-Microscopy Imaging

Anatole Chessel¹, Bertrand Cinquin³,⁴, Sabine Bardin³, Jean Salamero³,⁵, and Charles Kervrann¹,²

¹ INRIA Rennes
² INRA-MIA
³ UMR 144 CNRS-Institut Curie
⁴ Soleil Synchrotron
⁵ PICT-IBiSA Institut Curie

Abstract. In this paper a framework for deﬁning scale-spaces, based on the computational geometry concepts of α-shapes, is proposed. In this approach, objects (curves or surfaces) of increasing convexity are computed by selective sub-sampling, from the original shape to its convex hull. The relationships with the Empirical Mode Decomposition (EMD), the curvature motion-based scale-space and some operators from mathematical morphology, are studied. Finally, we address the problem of additive image/signal decomposition in ﬂuorescence video-microscopy. An image sequence is mainly considered as a collection of 1D temporal signals, each pixel being associated with its temporal intensity variation.

1

Introduction

Vision is a complex and hierarchical process of aggregation and reconstruction going from pointwise data to global information. The scale-space approach, which builds several versions of the original signal at increasingly coarser scales, is a general framework for investigating those hierarchies. In this paper we propose to explore how to derive such a scale-space from a computational geometry point of view based on space or time signal/image convexity. To our knowledge, the tools from computational geometry [7], commonly used in image synthesis and 3D modeling, are quite unusual as far as raster images and computer vision are concerned. Typically, a 2D image may be viewed as a collection of sampled points on a surface in R³, and may be represented by computational geometry objects using projection and interpolation operators, as we shall see in section 2.

The notion of α-shape [8], which roughly corresponds to the “up-to-√α-details-convex-hull”, was introduced to represent 3D objects from given unorganized point clouds in R³. In our approach, it is applied to known objects (signals or images) to define the so-called α-scale-space. In some way, this modeling can be thought of as a continuous variant of the Empirical Mode Decomposition (EMD) introduced earlier for signal decomposition in [10].

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 770–781, 2009. © Springer-Verlag Berlin Heidelberg 2009


The relationships between the proposed scale-space and previous mathematical frameworks are given in section 3. The motion-by-curvature is especially examined since it constitutes a possible interpolation operator needed in our approach. An equivalence with the usual opening operators from mathematical morphology is also presented. In section 4, a demonstration of the methodology applied to a usual decomposition problem in biological imaging is proposed. In some circumstances, the studied fluorescence video-microscopy sequences can be viewed as the sum of a diffusing component (slowly varying in space and time), and a very localized faster moving component. The proposed algorithm is mainly used to analyze 1D temporal signals extracted from a temporal series of images. The implementation is now routinely used by biologists and collaborators because of its rapidity of execution and its simplicity of control.

2

Computational Geometry-Based Scale-Space for Curves

This section is devoted to the description of a new morphological scale-space based on the computational geometry theory and α-shape concept [8]. The theory is laid out for R^d, but is mainly applied to signals and images, that is α-shapes in R² and R³. An analogy with EMD is also discussed in this section.

2.1

α-Shapes and Convexity

The key ingredients of computational geometry (see Fig. 1) are simplices, that is points, segments, triangles and tetrahedra. Sets of simplices form complexes.

Definition 1
– A k-simplex σ_T is the convex combination of a set T of k + 1 points in R^d (k-simplices are points (k = 0), segments (k = 1), triangles (k = 2) or tetrahedra (k = 3)).
– A complex K is a set of simplices verifying: i) If σ_T ∈ K, ∀U ∈ T, σ_U ∈ K; ii) If σ_U, σ_V ∈ K, then σ_{U∩V} = σ_U ∩ σ_V.
– A triangulation of a set P of points in R^d is a connected complex where all the k-simplices are included in a k + 1-simplex, 0 ≤ k < d.
– A Delaunay triangulation of a set P of points in R^d is a triangulation where the circumscribed circle of each d-simplex does not include any points of P. The triangulation is unique if the points are in generic positions. The triangulation is a tessellation of the convex hull of P.
– A filtration of a complex K is a nested sequence of complexes included in K: ∅ = K^0 ⊆ K^1 ⊆ . . . ⊆ K^m = K.

In Fig. 1, the Delaunay triangulation of a point set is shown in blue and the original points are shown in red.


The notion of α-shape is based on Delaunay triangulations and amounts to selecting points “with enough empty space around them”. More formally,

Definition 2 (α-shapes). Let P be a set of points of R^d and T_P its Delaunay triangulation. Let σ_T be a simplex of T_P, c_T the smallest sphere going through T and ρ_T its radius. σ_T belongs to the α-complex K_α of P, α ≥ 0, if and only if either of these propositions is true:
1. ρ_T² < α and c_T does not include any other points of P,
2. σ_T ⊂ σ_U with σ_U ∈ K_α.
The subset of R^d actually covered by the α-complex (the underlying space) is called the α-shape S_α of P.

In the general case, the α-complex is not pure, i.e. it may contain isolated points or segments not included within a triangle. Thus, the regularized α-shapes are also defined:

Definition 3 (regularised α-shapes). Let P be a set of points of R^d, T_P its Delaunay triangulation and K_α its α-complex, α ≥ 0. The regularized α-complex K̃_α of P is the largest complex (wrt inclusion) included in K_α for which all the k-simplices are included in a k + 1-simplex. The regularized α-shape S̃_α is the underlying space of the regularized α-complex.

The following basic property holds true for both the α-shapes and regularized α-shapes we have defined:

Property 1
– There exists a finite number of values of α leading to distinct α-complexes and α-shapes.
– If α = 0, K_α = S_α = P.
– There exists a finite value α_M such that ∀α ≥ α_M, K_α = T_P.
– The sequence {K_α}, 0 ≤ α ≤ α_M, is a filtration of T_P.

Intuitively, the idea is to define the “convex-up-to-√α” hull of a set of points. For a given α, the α-shape consists of the points “with enough empty space around them”. The resulting shape is made of √α sized concavities. In Fig. 1, the α-shape (for a given α) is shown in green and is a subset (according to a size criterion) of the original Delaunay triangulation (in blue) computed from the initial set of points (in red).

2.2

α-Scale-Space Framework

In this section, we use the α-shape concept to deﬁne original scale-spaces. Let u : Rn → R be a continuous function, n = 1 or 2. Let d = n + 1, and P = {(x1 , . . . , xn , u(x1 , . . . , xn )) ∈ Rd | ( x1 , . . . , xn ) ∈ Nn } the ﬁnite set of points of Rn+1 = Rd corresponding to samples from u obtained at integer coordinates. Let P˜α , α ≥ 0 be the regularized α-shapes of P in Rd . Considering sampled points in Rn+1 allows us to pass from the continuous to the computational geometry setting. But going back from a computational geometry object in (n + 1)D to a continuous function in Rn two tools will be needed: a projection operator to get something mono-valued in nD and an interpolation operator for continuity.


Definition 4 (lower envelope). The lower envelope of a set of functions H = {H1 , . . . , Hn } defined in R1 . . . , Rn ⊂ Rn is the pointwise minimum over H: LH = min1
Thus, P_α are selective sub-samplings of u based on local convexity analysis. By interpolating those two sets of points, we can define sets of functions corresponding to the original function analyzed at different increasing scales. Let Ω be an open subset of R^n and f : ∂Ω → R values defined on its border. Let I be an interpolation operator that associates a unique function on Ω to f defined onto ∂Ω. In what follows, I is assumed to verify the maximality principle: max_∂Ω f ≥ max_Ω I(f) ≥ min_Ω I(f) ≥ min_∂Ω f. We can now define:

Definition 6. Let I be an interpolation operator verifying the maximality principle. Considering a point P = (x_1, x_2, x_3) as a function of R², f(x_1, x_2) = x_3, we define the upper and lower α-scale space as: $u_\alpha^+ = I(P_\alpha^+)$ and $u_\alpha^- = I(P_\alpha^-)$. The α-scale space of u is then defined as $u_\alpha = \frac{u_\alpha^+ + u_\alpha^-}{2}$.

In practice, various interpolation operators can be used, depending on the application. In 1D, a linear interpolation is the most usual. The question is more difficult in 2D and more details are given in the following sections. From the basic properties of α-shape explained above, basic properties of α-scale-spaces are given:

Property 2
– For α = 0, $u_\alpha^+ = u_\alpha^- = u_\alpha = u$.
– For α > α_M, $u_\alpha^+$ and $u_\alpha^-$ are respectively the upper and lower convex hulls of u.
– For all α, $u_\alpha^+ \ge u \ge u_\alpha^-$.

Properties of invariance (or covariance) are also of particular interest in the case of scale-spaces. Because the α-shape is related to the distance between points, it is invariant to isometric transformations and defined up to a scale factor.

Property 3 (Invariance). $u_\alpha^+$, $u_\alpha^-$ and $u_\alpha$ are invariant to similarity in R^d.

This includes rotation and translation (for images), gray level shift and contrast inversion. On the other hand, invariance to generic or even increasing contrast changes does not hold. Indeed such a transformation would treat one direction independently from the others and thus would break the isotropy of the underlying algorithms. It is worth noting that it might be possible to exhibit more invariance properties by formulating the computational geometry framework using other metrics, but this is out of the scope of this paper.

2.3

Relationships with the Empirical Mode Decomposition

The EMD algorithm was introduced for highly non-stationary and non-linear 1D physical signals by Huang in [10], and has since been extended to 2D images [11]. A modal decomposition of signals is obtained by applying an iterative and intuitive algorithm which recently received a more formal justification within the wavelet framework [9]. Briefly, the EMD algorithm is composed of two loops. In the inner loop, the upper and lower envelopes are iteratively computed based on an interpolation process of maxima and minima and further used to build a “mean envelope”. The difference between the “mean envelope” and the signals yields a component, called the IMF (Intrinsic Mode Function), and a residual signal with no high frequency. The computation of the “mean envelope” and subtraction are iterated in the outer loop until the residual is a monotonic trend. Ultimately, we have $u = \sum_{i=1}^{n} c_i + r$ where the {c_i} represent the modes that capture increasingly higher frequencies, and r represents the residual component. In the terminology of scale-spaces, we rather write $u_k = u - \sum_{i=1}^{k} c_i$, k = [1 . . . n], with u_0 = u and u_{n+1} = r by convention, to yield an increasingly coarse description of the original signal. Thus EMD can be seen as a discrete scale-space or, as the number of modes is relatively small, a scale-space with a limited number of scales automatically selected.

Our proposed α-scale-space is consistent with this framework. The common ideas are modes corresponding to variations around a local mean, and local maxima/minima used to compute upper and lower envelopes. Nevertheless, they differ for two reasons: i) the more continuous nature of the α-shape decomposition, which considers a larger number of modes and scales; ii) the possibility of computing non-symmetric modes with the α-shapes. The IMFs computed by EMD are symmetric components with respect to the computed mean, which was desired in [10]. In image analysis, since the images are lower bounded, this constraint can be relaxed. The relationships between our scale-space formulation and the modal decomposition outlined by EMD, however, are worth keeping in mind. Section 4 illustrates these connections, especially if we consider lower α-scale-spaces.

2.4

Implementation and First Examples

In the following, the CGAL implementation of the computational geometry algorithms is used [2]. In the 1D case, a simple linear interpolation is performed. For 2D images, the interpolation is more problematic and a scattered-point interpolation framework must be considered (see [1,6] for reviews). Natural neighbor interpolation [3] is used, as it is defined in the same computational geometry framework available in CGAL. Other possible choices are discussed in the next section.
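For the 1D case, the interpolation step can be sketched as follows. This is a plain piecewise-linear interpolation through the retained samples, written for illustration; the helper name `interpolate_linear` is an assumption, and the selection of the retained points (the α-shape step, done with CGAL in the paper) is not reproduced here.

```python
def interpolate_linear(kept, xs):
    """Piecewise-linear interpolation through `kept` = [(x, y), ...].

    `kept` must be sorted by x and contain at least two points; `xs` must be
    sorted increasingly and lie within [kept[0][0], kept[-1][0]].
    """
    out = []
    j = 0                                   # index of the current segment
    for x in xs:
        # Advance to the segment [kept[j], kept[j + 1]] that contains x.
        while j < len(kept) - 2 and x > kept[j + 1][0]:
            j += 1
        (x0, y0), (x1, y1) = kept[j], kept[j + 1]
        w = (x - x0) / (x1 - x0)
        out.append((1.0 - w) * y0 + w * y1)
    return out


# Three retained samples, interpolated back onto the full grid 0..6.
kept = [(0, 2.0), (4, 0.0), (6, 1.0)]
dense = interpolate_linear(kept, range(7))
```

The interpolated curve passes through every retained sample, which is the property required of the interpolation operator in the 1D setting.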


Fig. 1. Intensity variation with respect to time (for a given pixel) in video-microscopy (see Section 4). Red: original points (see also Fig. 4), blue: Delaunay triangulation, green: α-shapes.

Fig. 2. Example of α-scale-space applied to a 2D otolith image (see text). Left to right: original, α-scale-space for α = 100, image diﬀerence.

Figure 1 shows a 1D example in video-microscopy, i.e. the intensity variations for a given pixel with respect to time (see Section 4 for details). The two peaks, corresponding to two objects passing through that pixel, are of interest. The figure illustrates the Delaunay triangulation of a set of points and its relationship with the α-shape concept. The lower α-scale-space is defined as the green curve below the red points; the difference with the original curve thus allows us to recover the peaks. Figure 2 shows a 2D example of the α-scale-space decomposition applied to an otolith image. Otoliths are biological hard tissues (a few mm in size [12]) that are widely used in marine biology and ecology. The α-scale-space corresponds to the trend in intensity, while the difference between the trend and the original image corresponds to variations around that trend.

3 Interpolation and α-Shapes: Mean Curvature Motion and Mathematical Morphology

As defined, the α-scale-space is based on selective sub-sampling and interpolation. In this section, we investigate several interpolation operators. We mainly study the so-called mean curvature motion and some tools from mathematical morphology.

3.1 α-Shape and Curvature Motion

Motion by curvature is a partial differential equation (PDE) commonly used in image analysis. Curves evolve with a speed and a direction depending on the local curvature. If an image is represented as a stack of non-intersecting level lines, each level line evolves according to its own curvature. It is established that evolving an image according to the mean curvature motion amounts to minimizing the total variation of the image. Finally, an image is also a surface in $\mathbb{R}^3$ and may be deformed according to its local curvatures.

In [6], an axiomatic approach to image interpolation was studied, singling out three second-order interpolation operators: i) the first operator is the Laplacian operator, known not to be able to handle isolated data points and thus not suitable for scattered-point interpolation; ii) the second operator is the Absolutely Minimizing Lipschitz Extension (AMLE), which requires a Lipschitz initialization and is thus not suitable for scattered-point interpolation either; iii) the third operator is the curvature operator, for which, when used as an interpolation operator, neither uniqueness nor existence of solutions holds.

In [16], L. Vese proposed a mean curvature motion based on PDEs for the computation of convex hulls of signals and images, and proved existence and uniqueness of viscosity solutions:

$$\frac{\partial u}{\partial s} = \sqrt{1 + |Du|^2}\;\min\bigl(0, \lambda_{\min}(D^2 u)\bigr), \qquad u(x, 0) = u(x),$$

where $\lambda_{\min}(D^2 u)$ is the smallest eigenvalue of the Hessian matrix and the initial condition is the original function. It is an alternative way of computing a convexity-based scale-space, yielding a family of functions of increasing convexity up to the convex hull. Two flows, converging respectively toward the convex and concave hulls, are then defined. These flows can be used in a similar way to the upper and lower α-scale-spaces defined earlier, that is, as two envelopes enclosing the original function. However, the relationship between the two frameworks cannot be easily exhibited. In the particular case of a convex function $u$, the PDE can be used as an interpolation operator of $P_\alpha^-$; it amounts to using the convex hull of $P_\alpha^-$. A tessellation of $u$ into convex components would be needed in the general case.

3.2 α-Shape and Mathematical Morphology

Pioneered by Serra and Matheron, mathematical morphology [15] is one of the most classical theories in image analysis. In scale-space theory, it has been shown to be related to the mean curvature motion and related filters [5]. Typically, the gray-scale opening of an image is defined as the composition of a gray-scale erosion and a gray-scale dilation. Let $u$ be identified with the part of $\mathbb{R}^d$ lying below its graph (in $\mathbb{R}^3$ for images: $u = \{x \in \mathbb{R}^3 \mid x_3 < u(x_1, x_2)\}$). Its opening can be written as $O(u) = \sup\{B \in \mathcal{B},\ B \subset u\}$, with $\mathcal{B}$ the class closed under union generated by a structuring element $B_0$. Thus, if the structuring element $B_0$ is a ball of radius $r$, $O_r(u)$ is the best approximation of $u$ from below by unions of balls in $\mathbb{R}^3$. It corresponds to the so-called "rolling ball" algorithm commonly used to remove backgrounds and trends in biological imaging. It is intuitively related to the α-shape, as both are obtained through an approximation by "sweeping balls". Indeed, we have:


Fig. 3. Lower α-scale-space and opening on an artificial example (see text). Blue crosses: original points; red line: opening by a disk of radius 1; green: α-scale-space for α = 1.

Proposition 1. There exists an interpolation operator $I$ for which $u_\alpha^-$ and $u_\alpha^+$ are respectively the morphological opening and closing of $u$ by a sphere of radius $\sqrt{\alpha}$.

Proof. Let $P_\alpha^- = \{(x_1^i, \dots, x_d^i),\ i = 1 \dots |P_\alpha^-|\}$. Points in $P_\alpha^-$ have by definition an empty space of radius $\sqrt{\alpha}$ below them, so we can write

$$\forall i \in [1 \dots |P_\alpha^-|], \quad O_{\sqrt{\alpha}}(u)(x_1^i, \dots, x_{d-1}^i) = x_d^i,$$

that is, all points of $P_\alpha^-$ are in $O_{\sqrt{\alpha}}(u)$. Thus $u_\alpha^- = O_{\sqrt{\alpha}}(u)$ is a valid definition of a lower α-scale-space. The same ideas apply to closing and upper α-scale-spaces.

However, the resulting interpolation operator does not verify the maximality principle. This can be seen on a simple counter-example: let $b : [-1, 1] \to \mathbb{R}$ be the half-sphere of radius 1 centered on $O$, and let $u = b + \varepsilon b$, with $\varepsilon$ small. Then the opening with a ball of radius 1 is $O_1(u) = b$, while $P_\alpha^-$ for $\alpha = 1^2$ consists of the points $(0, -1)$ and $(0, 1)$. As an interpolation of $P_\alpha^-$ in $[-1, 1]$, $O_1(u)$ does not verify the maximality principle. This example is presented in Fig. 3, where the original points are shown in blue, and the opening and the interpolated α-scale-space are shown in red and green, respectively. Thus the connection with some tools from mathematical morphology gives an intuitive idea of the α-scale-space, but the opening is not a suitable choice in practice since the maximality principle does not hold.
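The "rolling ball" opening discussed in this section can be sketched for a sampled 1D signal as a gray-scale erosion followed by a gray-scale dilation with a half-disk structuring function. This is a minimal illustration under stated assumptions (unit sample spacing, truncated windows at the borders, hypothetical function names); it is not the CGAL-based α-shape computation used in the paper.

```python
import math


def ball(radius):
    """Half-disk structuring function b(k) = sqrt(r^2 - k^2) - r, so b(0) = 0."""
    r = int(math.floor(radius))
    return {k: math.sqrt(radius * radius - k * k) - radius
            for k in range(-r, r + 1)}


def erode(u, b):
    """Gray-scale erosion: min over offsets k of u[i + k] - b(k)."""
    n = len(u)
    return [min(u[i + k] - bk for k, bk in b.items() if 0 <= i + k < n)
            for i in range(n)]


def dilate(v, b):
    """Gray-scale dilation: max over offsets k of v[i - k] + b(k)."""
    n = len(v)
    return [max(v[i - k] + bk for k, bk in b.items() if 0 <= i - k < n)
            for i in range(n)]


def rolling_ball_opening(u, radius):
    """Opening = erosion followed by dilation; the result never exceeds u."""
    b = ball(radius)
    return dilate(erode(u, b), b)


# Flat background with one narrow spike, opened with a ball of radius 3.
u = [0.0] * 20
u[10] = 5.0
opened = rolling_ball_opening(u, 3.0)
residual = [a - o for a, o in zip(u, opened)]
```

On this toy signal the opening stays close to the flat background, so the residual isolates the spike; this mirrors the use of a lower envelope to separate peaks from a trend.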

4 Application to Additive Signal Decomposition in Fluorescence Video-Microscopy

The proposed framework is used here to solve the additive signal decomposition problem in fluorescence microscopy. We shall see that it is particularly well suited to processing image sequences that depict two components with different spatio-temporal characteristics, which translate into signals with different convexities.


Biological context and motivation. The discovery of the xFPs (naturally fluorescent proteins), for which the 2008 Nobel Prize in Chemistry was awarded, along with advances in genetic engineering, allows any protein of interest to be coupled with a fluorescent tag in cells, tissues and organisms. Dynamic visualization of in vivo protein behavior is then possible in many biological models. In this section, we study the Rab GTPase family, a family of proteins involved in intra-cellular transport and in the maintenance of membranes. The Rab proteins are known to exist in two main states: a slowly diffusing cytosolic state, and a membrane state in which they eventually move as vesicles with directed movements. To a first approximation, the fluorescence depends on the concentration of fluorescent proteins, and the membrane state consists of vesicles of much higher concentration than the cytosol in which they are embedded. Thus the vesicles are seen as dots of higher intensity moving rather quickly on a slowly varying background. The separation of those two components is a necessary step for several studies [13]. Thanks to the biophysical properties of the image, it can be regarded as an additive signal decomposition problem. The intensity of a given pixel is proportional to the total concentration of the proteins in the corresponding volume, defined as the sum of the concentrations of the proteins in each state. Thus, for an acquired image sequence $u$, we can write $u = u_{cyt} + u_{ves}$, with $u_{cyt}$ the cytosolic component, slowly varying in both space and time, and $u_{ves}$ the vesicular component, very localized in space and time and moving rather quickly. In our experiments, we performed pairwise comparisons of the behavior of specific Rab complexes. Rabin8 against Rab8wt (wild type) or against Rab11Awt are especially studied in this paper.
Rabin8 was previously shown to interact with both (or either) of the two Rab GTPases, Rab8 and Rab11A, whose functions in membrane traffic are closely related, although separated in time and location. Those proteins have clearly different properties, and in particular show different separations into membrane and cytosolic states. The images shown are taken using fast Total Internal Reflection Fluorescence microscopy (10 to 20 frames per second, 120 s time sequences), which allows imaging of the very narrow depth inside the cell near the plasma membrane.

Experiments and results. Previous related works include spatial detection using wavelets [14] and background extraction using a temporal model [4]; the vesicles appear as bright blobs in space or as spikes in time. Figure 4 shows the intensity variation with respect to time for a given pixel (the two spikes correspond to vesicles passing through that pixel). Generally, the decreasing trend corresponding to the loss of fluorescence over time (known as observational photobleaching) needs to be estimated and possibly removed. Our scale-space framework is then particularly well suited if we consider the lower envelope as the key feature to analyze such additive signals. When the method is applied to individual 2D images, the image sequence decomposition is not reliably performed (not shown in the paper): the typical size of vesicles is relatively small and they may be confused with noise. It turns out that the temporal characteristics (movements, appearance and disappearance) are actually more salient than the spatial ones, and they are analyzed further.


Fig. 4. Intensity variation with respect to time for a given pixel (the two spikes correspond to two vesicles passing through that pixel).

Figure 5 shows the application of the proposed lower α-scale-space to the analysis of 1D temporal signals, processed individually. Two typical sequences depicting several Rab proteins are presented. We chose to display the maximum intensity projection (MIP) along the time axis, since these maps summarize the sequence contents (the videos are more demonstrative but cannot be embedded in the paper). In Fig. 5, the bright lines correspond to the main trajectories of moving vesicles. The MIP maps of the original image sequences are shown on the first row in Fig. 5. The second and the third rows show respectively the MIPs of the vesicular components and the MIPs of the computed trends corresponding to the lower α-scale-spaces for α = 1000. The vesicular components are computed as the difference between the original image sequence and its computational geometry-based scale-space representation. By summing the two decomposed image sequences, we recover the original image sequences with no loss of information. Clearly, the bright lines are enhanced and the slowly moving background contains no blobs (moving vesicles), as desired. A few structures in the vesicular component are only hinted at in the original sequence. In this biological experiment, we focus on the pairwise co-localization of membrane (or vesicular) signals of two pairs of Rab proteins: (Rab8 and Rabin8) and (Rab11a and Rabin8). In Fig. 5 (left), co-localization of (Rab8 and Rabin8) is measured near the plasma membrane of the cell; (Rab11a and Rabin8) appear more co-localized in the same area, for the time scale we considered in this experiment. It turns out that our signal decomposition makes it possible to better assess co-localization for moving and static structures in the image sequences. Each temporal signal is processed individually, yet the resulting image sequences are surprisingly regular.
While the results are quite preliminary and need to be carefully inspected by experts, we now have the opportunity to perform quantitative spatio-temporal co-localization. This validation is currently underway. It is worth noting that the proposed algorithm is very fast (a few seconds to process several hundred images) and easy to control. It is routinely used by biologists (via the ImageJ software) to separate the membrane (or vesicular) and cytosolic states of Rab proteins. This method is relevant for elucidating the roles of different molecular partners and can be used in many other contexts. It should further allow a better description and classification of the complex behaviors of various proteins, more reliably than is possible from the original image sequences.
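The per-pixel temporal decomposition and the MIP maps described above can be sketched as follows. The trend estimator is passed in as a function; the `running_min` used in the example is only a crude stand-in for the lower α-scale-space (an assumption made to keep the sketch self-contained), and all function names are hypothetical.

```python
def running_min(signal, half_window):
    """Crude lower trend: sliding minimum (stand-in, NOT the alpha-scale-space)."""
    n = len(signal)
    return [min(signal[max(0, t - half_window): t + half_window + 1])
            for t in range(n)]


def decompose_sequence(frames, trend):
    """Split frames[t][y][x] into (cytosolic trend, vesicular residual).

    Each pixel's temporal signal is processed individually, and
    cyt + ves reproduces the input exactly.
    """
    n_t, n_y, n_x = len(frames), len(frames[0]), len(frames[0][0])
    cyt = [[[0] * n_x for _ in range(n_y)] for _ in range(n_t)]
    ves = [[[0] * n_x for _ in range(n_y)] for _ in range(n_t)]
    for y in range(n_y):
        for x in range(n_x):
            signal = [frames[t][y][x] for t in range(n_t)]
            low = trend(signal)
            for t in range(n_t):
                cyt[t][y][x] = low[t]
                ves[t][y][x] = signal[t] - low[t]
    return cyt, ves


def mip(frames):
    """Maximum intensity projection along the time axis."""
    n_t, n_y, n_x = len(frames), len(frames[0]), len(frames[0][0])
    return [[max(frames[t][y][x] for t in range(n_t)) for x in range(n_x)]
            for y in range(n_y)]


# One-pixel toy sequence: decaying background with a spike at t = 2.
frames = [[[v]] for v in (10, 9, 30, 8, 7)]
cyt, ves = decompose_sequence(frames, lambda s: running_min(s, 1))
mip_ves, mip_cyt = mip(ves), mip(cyt)
```

Passing the trend estimator as an argument keeps the decomposition independent of how the lower envelope is actually computed.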

Fig. 5. MIP maps of image sequences. Columns: Rab8 and Rabin8 (left); Rab11a and Rabin8 (right). Top to bottom: original images, vesicular components, cytosolic components (see text).

5 Conclusion

Computational geometry concepts, introduced earlier for 3D image synthesis, were exploited to derive an original scale-space-based signal/image representation. Sub-samplings of the original function are performed, keeping only the points with enough space around them. The selected points are then used to compute a continuous approximation of the original function by interpolation. The relationship with the Empirical Mode Decomposition was established, since the two methods share algorithmic similarities and both represent the original signal using a limited number of modes. The relationships between the proposed framework and several classical scale-space frameworks were also studied. In particular, it was shown that, for a well-chosen interpolation operator, the lower α-scale-space is equivalent to a gray-scale opening in mathematical morphology. Finally, the particular case of additive decomposition in video sequences acquired by fluorescence microscopy was addressed and served as a demonstration. The convexity-based decomposition is particularly well suited to the biophysical properties of the studied proteins in fluorescence microscopy.


A number of theoretical questions remain open for investigation. In particular, going beyond spheres to define α-shapes, toward α-shapes in a generic image-induced metric, may lead to interesting results both in theory and in practice. Finding an appropriate interpolation operator remains an open question to be addressed in future work.

References

1. Amidror, I.: Scattered data interpolation methods for electronic imaging systems: a survey. Journal of Electronic Imaging 11, 157–176 (2002)
2. CGAL Editorial Board: CGAL-3.2 User and Reference Manual (2006)
3. Bobach, T., Hering-Bertram, M., Umlauf, G.: Comparison of Voronoi based scattered data interpolation schemes. Palma de Majorque (2006)
4. Boulanger, J., Kervrann, C., Bouthemy, P.: Estimation of dynamic background for fluorescence video-microscopy. In: 2006 IEEE International Conference on Image Processing, pp. 2509–2512 (2006)
5. Cao, F.: Geometric curve evolution and image processing. Lecture Notes in Mathematics (2003)
6. Caselles, V., Morel, J.M., Sbert, C.: An axiomatic approach to image interpolation. IEEE Trans. Image Processing 7, 376–386 (1998)
7. de Berg, M., van Kreveld, M., Overmars, M., Schwarzkopf, O.: Computational geometry: algorithms and applications. Springer, New York (1997)
8. Edelsbrunner, H., Mucke, E.P.: Three-dimensional alpha shapes. ACM Transactions on Graphics 13, 43–72 (1994)
9. Flandrin, P., Rilling, G., Goncalves, P.: Empirical mode decomposition as a filter bank. IEEE Signal Processing Letters 11, 112–114 (2004)
10. Huang, N.E., Shen, Z., Long, S.R., Wu, M.C., Shih, H.H., Zheng, Q., Yen, N.C., Tung, C.C., Liu, H.H.: The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Royal Society of London Proceedings Series A 454, 903 (1998)
11. Nunes, J.C., Bouaoune, Y., Delechelle, E., Niang, O., Bunel, P.: Image analysis by bi-dimensional empirical mode decomposition. Image and Vision Computing 21, 1019–1026 (2003)
12. Panfili, J., de Pontual, H., Troadec, H., Wright, P.J. (eds.): Manual of fish sclerochronology, Ifremer-IRD coedition (2002)
13. Pecot, T., Kervrann, C., Bouthemy, P.: Minimal paths and probabilistic models for origin-destination traffic estimation in live cell imaging. In: ISBI 2008, pp. 843–846 (2008)
14. Racine, V., Sachse, M., Salamero, J., Frasier, V., Trubuil, A., Sibarita, J.-B.: Visualization and quantification of vesicle trafficking on a three-dimensional cytoskeleton network in living cells. Journal of Microscopy 225, 214–228 (2007)
15. Serra, J.: Image analysis and mathematical morphology, vol. 1. Academic Press, London (1982)
16. Vese, L.: A method to convexify functions via curve evolution. Commun. Partial Differential Equations 24, 1573 (1999)

Highlight on a Feature Extracted at Fine Scales: The Pointwise Lipschitz Regularity

Christophe Damerval¹ and Sylvain Meignen²

¹ Dept. of Computer Science, Katholieke Universiteit Leuven, Belgium
² Laboratoire Jean Kuntzmann (LJK), University of Grenoble, France

Abstract. The aim of this paper is to study the robustness of the pointwise Lipschitz regularity in 2D, which is a measure of the local regularity of the intensity function associated with an image. This regularity can be efficiently computed by an approach based on fine scales. We assess its robustness when the image undergoes various transformations, especially geometric ones. The results we obtain show that the pointwise Lipschitz regularity is a suitable feature for applications in computer vision.

Keywords: Lipschitz regularity, invariance properties, wavelet decompositions, multiscale edge detection, extraction of characteristic values, robustness to transformations applied to the image.

1 Introduction

The extraction of invariant or robust features from an image is a central issue in computer vision. The difficulty of this problem lies in the fact that natural scenes are often viewed under various conditions, corresponding to a wide class of transformations (geometric deformations or illumination change, for instance). In order to achieve certain invariance properties with respect to transformations such as scale change, multiscale approaches were put forward. In particular, methods based on the Scale-Space theory [1,2,3] turned out to be successful in computer vision. These can reveal regions of interest that are stable under local geometric deformations [4]. These regions are identified by their location and characteristic scale, and their content can be quantified by a suitable descriptor [5]. Recent works compared state-of-the-art interest region detectors [6] and region descriptors [7]. Existing methods proved to be efficient for one type of scene or transformation [8, 9, 10]; however, no method outperforms the others in all cases, so combining different kinds of features seems relevant.

In this paper, we study a feature related to the local regularity of the intensity function: the pointwise Lipschitz regularity α ∈ R (denoted regularity α). It was widely studied in 1D, especially in the case of multifractal signals [11, 12], and also applied to the characterization of singularities [13, 14] and landmark registration [15]. In 2D, methods based on regularity measures were also put forward with applications to textured images [16, 17], the regularity being used from a global point of view. Besides, recent advances in edge detection using multiscale approaches [18] were used for object detection [19]. New developments using the multiscale SIFT descriptor were also recently proposed [20]. Here we are concerned with the pointwise regularity in 2D. More precisely, a multiscale approach focused on fine scales makes it possible to compute values of the regularity α numerically. As we will see, this regularity appears as a relevant feature to detect points of interest and could be profitably used as an image descriptor: on the one hand it has invariance properties (especially regarding geometric transformations), and on the other hand values of the pointwise Lipschitz regularity α can be efficiently computed.

This paper is organized as follows. We first present the notion of regularity α in 2D and its invariance properties. In order to estimate α numerically in 2D, we recall an algorithm based on a multiscale edge detector [21], which gives pointwise estimations of α at edge points of the image. We also present a methodology to compare values of α between two images related by a geometric deformation. Finally, an evaluation procedure makes it possible to assess the robustness of the regularity α for natural scenes viewed under various imaging conditions. The obtained results show that the regularity α is a robust feature.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 782–794, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 Regularity α in the Context of Image Analysis

We consider a gray-level image given by its intensity function $f : \mathbb{R}^2 \to \mathbb{R}$. We first briefly present the definition of Lipschitz regularity in 2D, inferred from the 1D definition [13]. This leads to the notion of regularity α. Then we recall a known algorithm for computing the values of the regularity α. We also investigate its invariance properties, especially from a practical point of view.

2.1 Notion of Regularity α in 2D – Invariance Properties

The Lipschitz regularity generalizes the usual notion of regularity.

Definition 1. (1D Lipschitz regularity) Given $\alpha \in\, ]0, 1[$, a function $f : \mathbb{R} \to \mathbb{R}$ is α-Lipschitz at $x_0 \in \mathbb{R}$ if there exist a neighborhood $V$ of $x_0$ and $A > 0$ such that

$$\forall x \in V, \quad |f(x) - f(x_0)| \le A\,|x - x_0|^{\alpha}.$$

This can be extended to $\alpha \in \mathbb{R}$. In particular, for $\alpha = n \in \mathbb{N}^*$, this corresponds to a locally $C^n$ function. Besides, for $\alpha < 0$, this definition can be generalized thanks to the theory of distributions (see details in [14]).

Definition 2. (2D Lipschitz regularity) Let $f : \mathbb{R}^2 \to \mathbb{R}$ and $x_0 \in \mathbb{R}^2$. For $\theta \in [0, 2\pi[$, we define $f_\theta : \mathbb{R}_+^* \to \mathbb{R}$ as $f_\theta(h) = f(x_0 + h u_\theta)$, where $u_\theta = (\cos\theta, \sin\theta)$. For $\alpha \in \mathbb{R}$, $f$ is α-Lipschitz at $x_0 \in \mathbb{R}^2$ if

$$\forall \theta \in [0, 2\pi[, \quad f_\theta \text{ is } \alpha\text{-Lipschitz at } 0.$$

Note that this definition agrees with the usual definition of the Lipschitz regularity. Indeed, when $\alpha \in\, ]0, 1[$, provided $f$ is α-Lipschitz at $x_0$, we can write

$$|f(x) - f(x_0)| = |f(x_0 + (h\cos\theta, h\sin\theta)) - f(x_0)| = |f_\theta(h) - f_\theta(0)| \le A h^{\alpha},$$

$$|f(x) - f(x_0)| \le A\,\|x - x_0\|^{\alpha},$$

with $A > 0$, for $x$ in a neighborhood of $x_0$.
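As a worked illustration of Definitions 1 and 2 (an example added here for clarity, not taken from the paper), consider the radial function $f(x) = \|x\|^{1/2}$ at $x_0 = (0, 0)$:

```latex
\[
  f(x) = \|x\|^{1/2}, \qquad x_0 = (0,0), \qquad
  f(x_0 + h u_\theta) - f(x_0) = h^{1/2}
  \quad \text{for all } \theta \in [0, 2\pi[,\ h > 0 .
\]
% Case 0 < alpha <= 1/2: the bound holds with A = 1 on the neighborhood h <= 1.
\[
  0 < \alpha \le \tfrac12 : \qquad
  h^{1/2} = h^{\alpha}\, h^{1/2-\alpha} \le h^{\alpha}
  \quad \text{for } 0 < h \le 1 .
\]
% Case 1/2 < alpha < 1: the ratio blows up as h -> 0, so no constant A exists.
\[
  \tfrac12 < \alpha < 1 : \qquad
  \frac{h^{1/2}}{h^{\alpha}} = h^{1/2-\alpha} \xrightarrow[h \to 0^+]{} +\infty .
\]
```

Hence this $f$ is α-Lipschitz at the origin for every $\alpha \in\, ]0, 1/2]$ and for no $\alpha \in\, ]1/2, 1[$, in every direction $\theta$.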


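Definition 3. (Regularity α) Let $f : \mathbb{R}^2 \to \mathbb{R}$ and $x_0 \in \mathbb{R}^2$. The regularity α of $f$ at $x_0$ is defined as

$$\alpha = \alpha(f, x_0) = \sup\{\alpha_0 \in \mathbb{R},\ f \text{ is } \alpha_0\text{-Lipschitz at } x_0\} \tag{1}$$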

The relevance of α arises from its invariance properties: the regularity α appears as a characteristic value. In particular, let us study the case of a constant affine deformation (thus including rotation and scale change), widely studied in the Scale-Space theory [2].

Proposition 1. (Influence of an affine deformation on the regularity α) Let $f : \mathbb{R}^2 \to \mathbb{R}$, and $g$ defined by

$$\forall x \in \mathbb{R}^2, \quad g(x) = f(Bx), \quad \text{with } B \text{ a } 2 \times 2 \text{ invertible matrix.} \tag{2}$$

Then, for $\alpha \in\, ]0, 1[$, $\alpha(f, x_0) = \alpha(g, y_0)$ with $y_0 = B^{-1} x_0$.

Proof. According to Def. 3, there exists $A > 0$ such that

$$\forall \theta \in [0, 2\pi[, \quad |f(x_0) - f(x_0 + h u_\theta)| \le A h^{\alpha} \tag{3}$$

Let us study the regularity of $g$ at $y_0 = B^{-1} x_0$. For $\theta \in [0, 2\pi[$ we have

$$|g(y_0) - g(y_0 + h u_\theta)| = |g(B^{-1} x_0) - g(B^{-1} x_0 + h u_\theta)| \tag{4}$$

$$= |f(x_0) - f(x_0 + h B u_\theta)| \tag{5}$$

Moreover, since $B u_\theta = \lambda u_{\theta'}$ with $\lambda \in \mathbb{R}^*$ and $\theta' \in [0, 2\pi[$:

$$|g(y_0) - g(y_0 + h u_\theta)| = |f(x_0) - f(x_0 + h \lambda u_{\theta'})| \tag{6}$$

$$\le (A |\lambda|^{\alpha})\, h^{\alpha} \tag{7}$$

Since $|\lambda| = \|B u_\theta\| \le \|B\|$, there exists $A' > 0$ (for instance $A' = A \|B\|^{\alpha}$) such that

$$\forall \theta \in [0, 2\pi[, \quad |g(y_0) - g(y_0 + h u_\theta)| \le A' h^{\alpha} \tag{8}$$

and $g$ is α-Lipschitz at $y_0$. Then, let us assume that the regularity α of $f$ corresponds to a minimum $\alpha_0$ attained in a certain direction $\theta_0$. Since we consider a constant affine deformation, there exists $\theta_1$ such that $B u_{\theta_1}$ and $u_{\theta_0}$ are collinear. Hence, the regularity α of $g$ at $B^{-1} x_0$ corresponds to a minimum $\alpha_0$ in the direction $\theta_1$. So the regularity α is preserved when a constant affine deformation is applied to the image.

Note that this invariance property may not always hold in practice. Indeed, when $B$ becomes nearly singular (case of extreme deformations), $\lambda$ can be very small when $u_\theta$ is an eigenvector of $B$. So there may be numerical instabilities for extreme deformations. However, as we will see, the regularity α yields a significant robustness for wide-ranging transformations (and not only small deformations). Now, let us discuss more general transformations, given by

$$\forall x \in \mathbb{R}^2, \quad g(x) = f(v(x)) \quad \text{with } v : \mathbb{R}^2 \to \mathbb{R}^2 \tag{9}$$


In this general context, note that the regularity α is not necessarily preserved: depending on the regularity of $v$, $g$ may be more regular than $f$, resulting in a higher regularity α for $g$ than for $f$. Nevertheless, we point out that it can be preserved in many practical cases, especially when considering image edges [22]. Note that in the case of an image representing an edge, $f$ is regular along the tangent to the edge and irregular along the normal direction (see Fig. 1(a)). In this regard, in order to estimate precisely the regularity α of $f$ at a given point, it is important to determine the direction of maximum irregularity; we explain below how to compute this direction and estimate α (Section 2.2). More generally, since transformations such as local affine deformations do not alter the topology of the edges, the regularity α on these edges should be preserved (see Fig. 1(b)).

2.2 Numerical Computation through a Multiscale Approach

In order to compute numerical values of α, we use a known approach based on a multiscale edge detector [21]. Let us briefly recall some aspects of this detector. According to Canny [22], edge points correspond to locations where the magnitude of the gradient attains a local maximum in the direction of the gradient, which is the direction of maximum irregularity. A generalization of Canny's detector was put forward by Mallat, using wavelet decompositions [21]. This makes it possible to detect edge points, and also to compute an accurate estimation of the regularity α at these edge points. This computation of α is carried out using a linear regression at the finest scales. Besides, denoting by $N$ the size of the data ($N = n^2$ for an $n \times n$ image), this formulation can be computed in $O(N)$, thus allowing a fast computation. In summary, this detector is known as an efficient method for computing numerical values of α. We emphasize that this method is focused on the finest scales, and that it gives pointwise estimations of the regularity α. Given an image $f$, the output of this detector can be expressed as a set

$$(x_i, y_i, \alpha_i) \in \mathbb{R}^3, \quad 1 \le i \le n_f \tag{10}$$

where $n_f$ is the number of detected edge points $(x_i, y_i)$, each being associated with a value $\alpha_i$. Since edge points correspond to singularities (where $f$ may not be differentiable), the obtained values can be negative: typically a boundary leads to α = 0, a line to α = −1, and an isolated point to α = −2. For natural images, we obtain various values; for instance, given the image represented in Fig. 2(a), we represent the density associated with the regularity α of detected edge points in Fig. 2(c). In this case, 95% of the computed values are within [−1.4, 0.8]. Besides, some parameters of the detector (like thresholding) make it possible to tune the number of edge points. We use here a light thresholding in our numerical experiments, so as to obtain a large number of values of regularity α.
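The regression step can be sketched as below. The sketch assumes that, at an edge point, the wavelet amplitude followed across the finest scales behaves like $A s^{\alpha}$, so that α is the slope of $\log_2$(amplitude) against $\log_2$(scale); the exact normalization used by the detector of [21] may shift this exponent by a constant, and the function name and the synthetic amplitudes are hypothetical.

```python
import math


def estimate_alpha(scales, amplitudes):
    """Least-squares slope of log2(amplitude) against log2(scale).

    Assumes amplitude ~ A * scale**alpha at the finest scales (an assumption
    about the normalization, not the detector's documented convention).
    """
    xs = [math.log2(s) for s in scales]
    ys = [math.log2(a) for a in amplitudes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic amplitudes over four dyadic scales (illustrative values only).
scales = [1, 2, 4, 8]
line_like = [3.0 * s ** -1.0 for s in scales]   # decays like s^-1
step_like = [3.0 for _ in scales]               # constant across scales

alpha_line = estimate_alpha(scales, line_like)
alpha_step = estimate_alpha(scales, step_like)
```

Under this assumption the two synthetic profiles give slopes close to −1 and 0, the values quoted above for a line and a boundary.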
Using a light thresholding is consistent with our aim of evaluating the robustness of the regularity α from a practical point of view.

2.3 Empirical Study in the Case of an Affine Deformation

Fig. 1. (a) At a point belonging to an edge line, the Lipschitz regularity is minimal along the normal direction; along this direction, the regularity α can be accurately computed. (b) If an edge undergoes a deformation which does not change its topology, the regularity α is preserved ($D_1$, $D_2$: directions of maximum irregularity).

Fig. 2. (a, b) Two images related by a known affine deformation; (a', b') Detected edge points; (c) Density of the regularity α, based on the edge points represented in (a'); (d) Proportion of correct matches as a function of the tolerated error on the regularity α, for exact matches (EM) and approximate matches (AM) between edge points of (a') and (b').

We study here the robustness of the estimation of α in the case of natural images, for which values of regularity α are computed at detected edge points. For that purpose we consider an original image $X_0$ (see Fig. 2(a)) and a deformed image $X_1$ (see Fig. 2(b)) related by a known affine deformation. This homography makes it possible to carry out point-to-point correspondences and thus to compare the values of regularity between the two images (see Fig. 2(a', b')). Given $X_0$ and $X_1$, the detector leads to two sets of edge points with associated values of regularity α:


$$S_0 = (x_i^0, y_i^0, \alpha_i^0)_{i \in I_0} \quad \text{and} \quad S_1 = (x_j^1, y_j^1, \alpha_j^1)_{j \in I_1}$$

Afterwards, we project the points $(x_j^1, y_j^1)_{j \in I_1}$ into the coordinates of $X_0$, and we carry out correspondences between the sets $S_1$ and $S_0$. At this step, we have two possible choices: either exact matches (EM), for which a projected point of $S_1$ corresponds exactly to a point of $S_0$; or approximate matches (AM), obtained by tolerating an error of 1 pixel. This makes it possible to compare the computed values of α: given a correspondence between $(x_i^0, y_i^0, \alpha_i^0)$ and $(x_j^1, y_j^1, \alpha_j^1)$ (either EM or AM), we define the error on α (for one matched pair) as

$$d_\alpha = d_\alpha(i, j) = |\alpha_i^0 - \alpha_j^1| \tag{11}$$

Finally, a match is said to be correct if $d_\alpha < \varepsilon$, where the parameter $\varepsilon$ is a tolerated error on α. We point out that there is a certain freedom of choice for $\varepsilon$, which should depend on the application. In this regard, there is a trade-off between too low values (refusing any numerical error on α) and too high values of $\varepsilon$ (not taking α into account). Let us now study the effect of this parameter $\varepsilon > 0$ by comparing the sets $S_0$ and $S_1$: we represent in Fig. 2(d) the proportion of correct matches depending on $\varepsilon$, for both EM and AM. As expected, it increases with the parameter $\varepsilon$. More precisely, it increases rapidly for small values of $\varepsilon$ and quickly becomes significant: when $\varepsilon$ exceeds 0.3, this proportion reaches almost 80%. Moreover, we observe that the results are only slightly better for EM than for AM, so that it can be relevant to consider AM to define a descriptor, since the number of extracted points is significantly larger. So these first results show that the regularity α estimated at edge points is a feature robust to affine deformations. Let us now evaluate the robustness of this feature in a more general context, when various transformations are applied to natural images.
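The geometric matching step can be sketched as follows. Several choices here are assumptions made for illustration and are not specified in the text: edge points of $X_0$ are taken on the integer pixel grid, a projected point is rounded to its nearest pixel, an exact match means that this pixel is an edge point of $X_0$, and an approximate match accepts any edge point within one pixel in each coordinate. The function name `match_points` and the toy projection are hypothetical.

```python
def match_points(s0, s1, project, tol):
    """Pair edge points of X1 with edge points of X0.

    s0, s1: lists of (x, y, alpha); project: maps (x, y) in X1 to X0
    coordinates; tol: 0 for exact matches (EM), 1 for approximate (AM).
    Returns a list of (alpha0, alpha1) pairs.
    """
    index = {(x, y): a for x, y, a in s0}    # X0 edge points by pixel
    pairs = []
    for x1, y1, a1 in s1:
        px, py = project(x1, y1)
        cx, cy = round(px), round(py)
        best = None
        # Look for the closest X0 edge point in the tolerated neighbourhood.
        for dx in range(-tol, tol + 1):
            for dy in range(-tol, tol + 1):
                key = (cx + dx, cy + dy)
                if key in index:
                    d2 = (key[0] - px) ** 2 + (key[1] - py) ** 2
                    if best is None or d2 < best[0]:
                        best = (d2, index[key])
        if best is not None:
            pairs.append((best[1], a1))
    return pairs


# Toy data: X1 is X0 downscaled by 2, so projecting means scaling by 2.
s0 = [(10, 10, 0.1), (20, 5, -0.9)]
s1 = [(5, 5, 0.2), (10, 3, -1.0)]
project = lambda x, y: (2 * x, 2 * y)

exact = match_points(s0, s1, project, tol=0)
approx = match_points(s0, s1, project, tol=1)
errors = [abs(a0 - a1) for a0, a1 in approx]     # d_alpha for each pair
```

With these toy points the approximate criterion recovers one more correspondence than the exact one, which reflects the observation that AM yields a larger number of matched points.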

3 Quantifying the Robustness of the Regularity α

3.1 Evaluation Procedure

We consider 8 sequences, each consisting of 6 images (Xk)0≤k≤5: ZoomRotation1, ZoomRotation2, Viewpoint1, Viewpoint2, Blur1, Blur2, Jpeg and Light (see Fig.3). For each sequence, the 6 images represent a given scene viewed under a certain imaging condition. For instance, considering the sequence Viewpoint1 (see Fig.4), each image Xk (1 ≤ k ≤ 5) corresponds to a change of viewpoint applied to the reference image X0. These sequences are relevant in several respects. First, they represent various objects: textured scenes (repeated textures, see Fig.3(a,d)) and structured ones (homogeneous regions with edge boundaries, see Fig.3(b,c)). Secondly, the imaging conditions are wide-ranging: geometric deformations and specific transformations like JPEG compression. Thirdly, the degree of these transformations can be significant (scale change up to 4, angle of viewpoint up to 60°, JPEG compression rate up to 98%). Finally, we mention that the sequences ZoomRotation1, ZoomRotation2, Viewpoint1 and


C. Damerval and S. Meignen

Viewpoint2 correspond to actual camera displacements; the sequences Blur1, Blur2 and Light correspond to camera operations (varying the camera focus or shutter speed); for the sequence Jpeg, the different levels of JPEG compression were obtained with software. For illustration purposes, Fig.3 shows the images X0 and X5 associated with every sequence. For more details, see http://www.robots.ox.ac.uk/~vgg/research/affine

For a given set of images (Xk)0≤k≤5 associated with a sequence (viewpoint change for instance, see Fig.4), we carry out the following procedure:

1. For each image (Xk)0≤k≤5, detect edge points and compute the associated values of regularity α: $p_i^k = (x_i^k, y_i^k, \alpha_i^k)$, $1 \le i \le n_k$.

2. For fixed k (1 ≤ k ≤ 5), determine a set $C^k$ of point-to-point correspondences between edge points of X0 and Xk (thanks to the known homography between these images):

$$C^k = \{(p_i^0, p_j^k) \text{ matched according to a geometric criterion}\} \qquad (12)$$

This leads to a certain number of correspondences (NC), $\#C^k$.

3. Select the subset $C^k_\varepsilon$ of correspondences for which the regularities are sufficiently close (according to a parameter ε > 0):

$$C^k_\varepsilon = \{(p_i^0, p_j^k) \in C^k,\ d_\alpha = |\alpha_i^0 - \alpha_j^k| < \varepsilon\} \qquad (13)$$

and compute the matching score, representing the proportion of correct matches:

$$S_k = \frac{\#C^k_\varepsilon}{\#C^k} \qquad (14)$$

This score reflects the robustness of the regularity α. We will study its evolution with respect to ε for the sequence Viewpoint1 (Fig.5), and also with respect to k for all sequences (Fig.6, for fixed ε). Note that in step 2, the matches based on a geometric criterion can be either exact or approximate (as described in section 2.3); we study both EM and AM. In step 3, one has to choose the parameter ε, representing a tolerated error on α (as seen in section 2.3). In our experiments we use ε = 0.3, for which a high proportion of the matches (almost 80%) are deemed correct in the case of the affine deformation studied in section 2.3 (see Fig.2(d)).
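Step 3 of the procedure, i.e. equations (13) and (14), can be sketched as follows. The helper names `matching_score` and `score_curve` are hypothetical, and the list of errors is made-up illustrative data, not a result from the experiments.

```python
def matching_score(d_alphas, eps):
    """Matching score (14): fraction of correspondences with d_alpha < eps.

    d_alphas holds one error |alpha0_i - alphak_j| per geometric correspondence,
    so len(d_alphas) plays the role of #C^k.
    """
    if not d_alphas:
        raise ValueError("no geometric correspondences")
    return sum(1 for d in d_alphas if d < eps) / len(d_alphas)

def score_curve(d_alphas, eps_values):
    """Score as a function of the tolerated error eps (as plotted in Fig.5)."""
    return [(eps, matching_score(d_alphas, eps)) for eps in eps_values]

# Hypothetical errors on alpha for one image pair (X0, Xk).
d_alphas = [0.02, 0.05, 0.11, 0.18, 0.25, 0.31, 0.47, 0.80]
for eps, score in score_curve(d_alphas, [0.1, 0.2, 0.3, 0.4, 0.5]):
    print(f"eps = {eps:.1f}  ->  score = {score:.3f}")
```

By construction the score can only increase with ε, which is the behaviour observed in the curves.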
Note also that this choice makes it possible to clearly distinguish boundaries (α = 0), lines (α = −1) and isolated points (α = −2).

3.2 Analysis of the Results

Study of the sequence Viewpoint1 (Fig.5). First, we consider the sequence Viewpoint1, associated with increasing angles of viewpoint change (k = 1: 20°, k = 2: 30°, k = 3: 40°, k = 4: 50°, k = 5: 60°). For each k (1 ≤ k ≤ 5), Fig.5(ak) shows the evolution of Sk with respect to the tolerated error on α (parameter ε), for both EM and AM. The results obtained are similar to those for the affine deformation studied in section 2.3; indeed, EM


Fig. 3. Sample of data set, representing X0 (reference image) and X5 (highest degree of transformation) associated to: (a, b) Scale change and rotation; (c, d) Viewpoint change; (e, f) Blur; (g) JPEG compression; (h) Illumination change

Fig. 4. (Top) Complete sequence Viewpoint1 (viewpoint change): 6 images X0 , ..., X5 ; (Bottom) Associated edge points

[Fig. 5 plots, panels (a1)–(a5) and (b). Horizontal axis: tolerated error on regularity α (0 to 0.5); vertical axis: proportion of correct matches (0 to 1); curves: exact matches and approximate matches in (a1)–(a5), curves 1–5 in (b).]

Fig. 5. Proportion of correct matches with respect to the tolerated error on α, between the images X0 and Xk (1 ≤ k ≤ 5) associated to the sequence Viewpoint1 (see Fig.4). (ak) Comparison between X0 and Xk, for both EM and AM; (b) Comparison between X0 and all Xk, for AM (the curves 1–5 correspond to those of Fig.5(a1–a5) for AM).

[Fig. 6 plots, panels (a)–(h). Horizontal axes: factor of scale change (a, b); angle of viewpoint change (c, d); increasing blur (e, f); rate of JPEG compression in % (g); decreasing light (h). Vertical axis: proportion of correct matches (0 to 1); curves: exact matches and approximate matches.]

Fig. 6. (a–h) Matching scores (ε = 0.3) associated to the sequences of Fig.3(a–h)

yields slightly better results than AM in all cases, and the proportion of correct matches increases with the tolerated error on α (parameter ε). In particular, for ε = 0.3, we note that the proportion of correct matches (for EM and AM) exceeds: 80% for the angles 20°, 30° and 40°, see Fig.5(a1–a3); 60% for the angles 50° and 60°,


see Fig.5(a4–a5). Besides, Fig.5(b) displays all the preceding curves (only for EM), showing how the estimation of the regularity α degrades as the degree of the viewpoint change increases. Note that as soon as ε is larger than 0.2, the proportion of correct matches exceeds 50%, even for significant changes of viewpoint. This good result shows that the regularity α is a robust feature.

Study of all sequences (Fig.6). Now, considering the 8 sequences, for both approximate and exact matches, we plot the curves (k, Sk) in Fig.6. More precisely, each graph of Fig.6 describes the performance for one particular sequence associated with a certain image transformation. For instance, Fig.6(c) refers to the sequence Viewpoint1 (associated with a viewpoint change), represented partially in Fig.3(c) (and in full in Fig.4). This allows us to assess the robustness of the estimation of the regularity α in general situations, which is the main objective of this paper. We emphasize that a method is better the higher its scores are and the more stable they are, i.e., the more they remain high when the degree of deformation increases. On the basis of the matching scores (see section 3.1), we can evaluate how robust the estimation of α is under various imaging conditions (Fig.6). The matching score gives the proportion of correspondences (between two images) for which the computed values of α are close. Globally, we observe that the score tends to decrease as the degree of the transformation increases. Note also that the scores for EM and AM are close, and that EM yields better results than AM (as we pointed out in section 2.3, see Fig.2). We do not observe a significant difference between textured (Fig.6(a,d)) and structured scenes (Fig.6(b,c)); however, the structured scene of Fig.6(c) performs better, due to the presence of clear edges. Let us now detail the analysis of the results for each transformation.
Scale change and rotation, Fig.6(a,b): for both textured and structured scenes, the performance decreases overall (from 0.8 to 0.5). Since the sequences ZoomRotation1 and ZoomRotation2 correspond to significant scale changes and rotations, the results obtained are satisfactory.

Viewpoint change, Fig.6(c,d): the performance decreases moderately, remaining high for the structured scene (between 0.9 and 0.7, Fig.6(c)) and good for the textured scene (between 0.8 and 0.6, Fig.6(d)). In this regard, note that both sequences Viewpoint1 and Viewpoint2 contain distinct edges, which are only moderately affected by a viewpoint change.

Blur, Fig.6(e,f): the performance decreases rapidly (from 0.7 to 0.3) for both structured and textured scenes. These average results are not surprising, since blur modifies the regularity α (edges are smoothed). It is well known that a smoothing operation alters the regularity α; yet, one can retrieve the regularity α when the smoothing kernel is known [14].

JPEG compression, Fig.6(g): the performance remains high (between 0.9 and 0.6), decreasing steadily. So JPEG artifacts have little impact on the regularity α, even if JPEG compression tends to blur sharp edges.


Light change, Fig.6(h): very high performance (stable, close to 0.9). Since an illumination change does not alter the structure of the edges, the regularity α is not affected by such changes.

In conclusion, the regularity α appears to be very robust to light change and JPEG compression, and less so to image blur. Concerning geometric deformations, we obtain very good results for viewpoint change (especially for structured scenes) and satisfactory ones for scale change and rotation. In addition, we observe that for all transformations except blur, the performance does not collapse, even for a high degree of transformation. This emphasizes the fact that the regularity α is characteristic of the kind of edge. In a blurry context, the computation of α seems less reliable; nevertheless this can be improved by using higher scales in the detector (provided the edges are sufficiently far apart). Finally, note that there is a balance between quantity (larger number of AM than EM, better repeatability for AM) and quality (better estimation of α for EM than for AM). More precisely, this balance is in favor of AM: the scores are only slightly lower for AM than for EM, while AM leads to a greater number of points (which is important for image description). Note also that in an application such as image matching, decreasing the parameter ε leads to a limited number of matched pairs (for which the computed values are closer, see section 2.3).

3.3 Discussion

Let us now discuss some aspects of the regularity α. First, concerning the way of computing values of α, note that there exist methods based on wavelet transforms that make it possible to estimate the regularity α at any point of the image [23]. Here we focused on edges, since they appear to be robust to various transformations of the image. Moreover, in order to improve the matching performance, the number of detected edge points can be limited. This can be done by selecting only the highest responses (threshold on the modulus of the gradient), or by considering higher scales. This will bring out only the most salient edges. Here, we considered all the values of the regularity α, showing that a significant proportion of the computed values of α correspond to values for which the estimation is robust. In addition, we can compare certain aspects of our approach with works related to interest regions [6, 7], to the extent that they make it possible to characterize some objects present in the image. In that regard, it is important to note that we focus on position and pointwise regularity α, whereas these methods are based on position and characteristic scale. On the one hand, such methods use this characteristic scale to define interest regions, and then compute associated descriptors which characterize their content. On the other hand, our method gives pointwise features, so there is no region of interest associated with the Lipschitz regularity (the definition of such regions seems possible, but is not straightforward). Besides, we can compare the performance measures of these two kinds of methods. Since the criteria used are different, we can only draw conclusions from the shape of the curves. Numerically, we observe that our method is more stable than those based on interest regions: as the level of transformation increases, the performance declines more slowly overall. More precisely, our method appears:


more stable for Jpeg, Viewpoint1, Viewpoint2 and ZoomRotation2; less stable for ZoomRotation1 and Blur1; equivalent for Light and Blur2. So, compared to the best state-of-the-art methods, the regularity α (computed at edge points) shows significant robustness to various image transformations.
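The sensitivity to blur noted in section 3.2, and the remark that higher scales help, can be illustrated on a one-dimensional model. The sketch below is not the detector used in this paper: it assumes a unit step edge blurred by a Gaussian of standard deviation `blur_sigma`, and a wavelet response W f(s, x) = s d/dx (f * g_s)(x), with g_s a unit-mass Gaussian of standard deviation s, so that the apparent regularity is the slope of log |W f(s, 0)| against log s. Under these assumptions the response at the edge has the closed form s / sqrt(2π (s² + blur_sigma²)). The function names are hypothetical.

```python
import math

def modulus_at_edge(s, blur_sigma=0.0):
    """|W f(s, 0)| for a unit step blurred by a Gaussian of std blur_sigma.

    Assumed convention: W f(s, x) = s * d/dx (f * g_s)(x), g_s a unit-mass Gaussian.
    """
    return s / math.sqrt(2.0 * math.pi * (s * s + blur_sigma * blur_sigma))

def estimated_alpha(scales, blur_sigma=0.0):
    """Least-squares slope of log|W f(s, 0)| against log s over the given scales."""
    xs = [math.log(s) for s in scales]
    ys = [math.log(modulus_at_edge(s, blur_sigma)) for s in scales]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

fine_scales = [0.5, 1.0, 2.0]
coarse_scales = [8.0, 16.0, 32.0]
print(estimated_alpha(fine_scales))                    # sharp edge
print(estimated_alpha(fine_scales, blur_sigma=2.0))    # blurred edge, fine scales
print(estimated_alpha(coarse_scales, blur_sigma=2.0))  # blurred edge, coarse scales
```

In this model a sharp step gives a slope of 0, consistent with α = 0 for boundaries. Once the edge is blurred, the slope measured at scales below the blur width drifts towards 1, i.e. beyond a tolerance such as ε = 0.3, while at scales well above the blur width it returns close to 0.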

4 Conclusions and Perspectives

In this paper, we studied the regularity α in the context of interest point detection, focusing on edge points. This approach is based on fine scales (pointwise features), which differs from other methods based on coarser scales (local features). We explained why certain transformations of the image do not change the pointwise regularity α at such edge points. Hence the regularity α stands out as a characteristic value. The main contribution of our work lies in quantifying the robustness of the estimated value of α. For that purpose we proposed an evaluation procedure which makes it possible to compare the values of the regularity α between two images related by a known homography. This leads to good robustness results for geometric deformations (such as viewpoint change, scale change and rotation) and also for JPEG compression and illumination change. So the regularity α (computed at edges) appears to be a relevant feature for various tasks in computer vision. In terms of perspectives, let us point out potential applications of the regularity α. A first application may consist in clustering edge points into edges (1D curves), since these have connectivity properties. Instead of relying only on a distance measure, such a clustering would use a criterion based on both distance and regularity α. Secondly, the regularity α could be used as a complement to interest region descriptors: the estimated regularity α at all edge points within a given region may help to characterize the content of the region. For that purpose, it is useful that our method can bring out a great number of features. Besides, the regularity α has potential applications to image registration, in particular feature-based methods (thanks to the identification of lines, curves, points and corners).
Finally, the regularity α appears to be an interesting feature, complementary to existing local features: integrating the pointwise Lipschitz regularity into existing detectors is likely to improve their performance.

Acknowledgement We would like to thank the referees for relevant suggestions, and also Prof. M. Jansen from the K.U. Leuven for useful comments.

References

1. Iijima, T.: Basic theory on normalization of pattern (in case of typical one-dimensional pattern). Bull. of the Electrotechnical Laboratory 26, 368–388 (1962)
2. Lindeberg, T.: Scale Space Theory in Computer Vision. Kluwer, Dordrecht (1994)
3. Witkin, A.: Scale-space filtering. In: Proceedings of the 8th International Joint Conference on Artificial Intelligence, pp. 1019–1021 (1983)
4. Lindeberg, T.: Feature detection with automatic scale selection. International Journal of Computer Vision 30(2), 77–116 (1998)
5. Lowe, D.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60(2), 91–110 (2004)
6. Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Gool, L.V.: A comparison of affine region detectors. International Journal of Computer Vision 62(1), 43–72 (2005)
7. Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. IEEE Transactions on PAMI 27(10), 1615–1630 (2005)
8. Kadir, T., Zisserman, A., Brady, M.: An affine invariant salient region detector. In: Pajdla, T., Matas, J. (eds.) ECCV 2004. LNCS, vol. 3021, pp. 228–241. Springer, Heidelberg (2004)
9. Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from maximally stable extremal regions. In: British Machine Vision Conf., pp. 384–393 (2002)
10. Tuytelaars, T., Van Gool, L.: Matching widely separated views based on affine invariant regions. International Journal of Computer Vision 59(1), 61–85 (2004)
11. Arneodo, A., Bacry, E., Jaffard, S., Muzy, J.F.: Singularity spectrum of multifractal functions involving oscillating singularities. Journal of Fourier Analysis and Applications 4(2), 159–174 (1998)
12. Benassi, A., Cohen, S., Istas, J., Jaffard, S.: Identification of filtered white noises. Stochastic Processes and their Applications 75(1), 31–49 (1998)
13. Jaffard, S., Meyer, Y.: Wavelet methods for pointwise regularity and local oscillations of functions. American Mathematical Society (1996)
14. Mallat, S., Hwang, W.L.: Singularity detection and processing with wavelets. IEEE Transactions on Information Theory 38(2), 617–643 (1992)
15. Bigot, J.: Automatic landmark registration of 1d curves. In: Recent advances and trends in nonparametric statistics, pp. 479–496. Elsevier, Amsterdam (2003)
16. Deguy, S., Debain, C., Benassi, A.: Classification of texture images using multi-scale statistical estimators of fractal parameters. In: British Machine Vision Conference, pp. 192–201 (2000)
17. Kaplan, L.M., Kuo, C.C.: Texture roughness analysis and synthesis via extended self-similar model. IEEE Transactions on PAMI 17(11), 1043–1056 (1995)
18. Martin, D., Fowlkes, C., Malik, J.: Learning to detect natural image boundaries using local brightness, color and texture cues. IEEE Transactions on PAMI 26(5), 530–549 (2004)
19. Ferrari, V., Fevrier, L., Jurie, F., Schmid, C.: Groups of adjacent contour segments for object detection. IEEE Transactions on PAMI 30(1), 36–51 (2008)
20. Brown, M., Lowe, D.: Automatic panoramic image stitching using invariant features. International Journal of Computer Vision 74(1), 59–73 (2007)
21. Mallat, S., Zhong, S.: Characterization of signals from multiscale edges. IEEE Transactions on PAMI 14(7), 710–732 (1992)
22. Canny, J.: A computational approach to edge detection. IEEE Transactions on PAMI 8(6), 679–698 (1986)
23. Mallat, S.: A wavelet tour of signal processing. Academic Press, London (1998)

Line Enhancement and Completion via Linear Left Invariant Scale Spaces on SE(2)

Remco Duits1,2 and Erik Franken2

1 Dept. of Mathematics and Computer Science
2 Dept. of Biomedical Engineering, Eindhoven University of Technology, Den Dolech 2, P.O.Box 513, 5600 MB Eindhoven, The Netherlands
[email protected], [email protected]

Abstract. From an image we construct an invertible orientation score, which provides an overview of local orientations in an image. This orientation score is a function on the group SE(2) of both positions and orientations. It allows us to diﬀuse along multiple local line segments in an image. The transformation from image to orientation score amounts to convolutions with an oriented kernel rotated at multiple angles. Under conditions on the oriented kernel the transform between image and orientation score is unitary. This allows us to relate operators on images to operators on orientation scores in a robust way such that we can deal with crossing lines and orientation uncertainty. To obtain reasonable Euclidean invariant image processing the operator on the orientation score must be both left invariant and non-linear. Therefore we consider nonlinear operators on orientation scores which amount to direct products of linear left-invariant scale spaces on SE(2). These linear left-invariant scale spaces correspond to well-known stochastic processes on SE(2) for line completion and line enhancement and are given by group convolution with the corresponding Green’s functions. We provide the exact Green’s functions and approximations, which we use together with invertible orientation scores for automatic line enhancement and completion.

1 Introduction

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 795–807, 2009. © Springer-Verlag Berlin Heidelberg 2009

In many medical imaging applications elongated structures (such as catheters, blood-vessels and collagen fibres) appear only partially and vaguely in noisy medical image data, [9]. It is often desirable to process these images such that crossing elongated structures become more visible before actual detection takes place. Due to occlusions small parts of these line or edge-like structures may not be clearly visible, requiring line-completion, [15, 19, 1, 18, 7]. Furthermore, since the acquisition of, for example, X-ray images is harmful to a patient, the radiation dose is reduced as much as possible leading to very noisy images. Such images typically require line-enhancement, [9, 3] where the aim is to make the elongated structures more visible while reducing the noise. In this article we will consider operators for line enhancement, using diffusion equations on the non-commutative group SE(2) of planar translations


and rotations. This group SE(2) is a semi-direct product of $\mathbb{R}^2$ and the circle $\mathbb{T} = \{e^{i\theta} \mid \theta \in [0, 2\pi)\} \equiv SO(2)$ and is equipped with the following product

$$g g' = (x, e^{i\theta})(x', e^{i\theta'}) = (x + R_\theta x', e^{i(\theta + \theta')}), \qquad g = (x, e^{i\theta}),\ g' = (x', e^{i\theta'}) \in SE(2),$$

with $x = (x, y) \in \mathbb{R}^2$ and $R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \in SO(2)$. Before we can apply line completion and enhancement to images we need a map $U_f : SE(2) \to \mathbb{C}$ which provides an overview of all local orientations in the image $f : \mathbb{R}^2 \to \mathbb{R}$. There exist several approaches to construct such a map, see for example [11], [4], [19], [1], but only few methods put emphasis on the stability of the inverse transformation $U_f \mapsto f$. However, well-posed image enhancement on the basis of local orientations in an image f can be done via the map $U_f$ iff there exists a stable transformation between image f and map $U_f$. In this article we restrict ourselves to the case where $U_f = W_\psi f$ is given by

$$W_\psi f(g) = \int_{\mathbb{R}^2} \psi(R_\theta^{-1}(y - x))\, f(y)\, dy, \qquad g = (x, e^{i\theta}) \in SE(2),\ R_\theta \in SO(2), \qquad (1)$$

i.e. the orientation score $W_\psi f$ is obtained from the image f by convolution with a directed anisotropic kernel $\psi \in L_2(\mathbb{R}^2)$ rotated at multiple angles. In section 2 we will show that for a certain class of directional kernels ψ, we obtain quadratic norm preservation and thereby a stable reconstruction formula. This allows us to relate operators on images to operators on orientation scores via a robust commuting diagram, see Figure 1 (where the precise details will follow later). Note that an invertible orientation score has useful properties: it carries per position a whole distribution of orientations and by invertibility it automatically unwraps crossing lines, [9, 8]. So instead of applying a diffusion directly on the image f we apply anisotropic diffusion on the corresponding orientation score $W_\psi f$ such that we take advantage of these properties. Now in order to obtain Euclidean invariant smoothing of the image the diffusion on the orientation score must be left invariant and therefore in section 3 we consider left invariant diffusions on orientation scores. These diffusions are Fokker-Planck PDEs of well-known stochastic processes for line completion and enhancement. We provide their exact solutions as SE(2)-convolutions with the explicit Green’s functions, which were strongly desired by Mumford [15], Citti [3] and many others [19], [1] but hitherto unknown. Since our exact derivation of the Green’s functions (which are scale space kernels on SE(2)) is rather technical we omit the derivations here and focus only on the results. For details see our recent works [7], [6]. Instead we will consider the highly simplified case of scale space kernels on the circle T, which is often used in image analysis and quite analogous to the SE(2) case. This helps the reader to get a better grasp of the scale space kernels on SE(2). Finally, we include an experiment for both line enhancement and line completion.
Here the advantage of our approach compared to our previous work [8] on non-linear diffusion on SE(2) is that it involves fewer parameters, is easier to grasp from a stochastic point of view and is easier to implement (in parallel). The drawback, however, is that this scheme is less adaptive. For various biomedical engineering applications we refer to our theses, [9, 18, 4].
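To make the definition (1) concrete, the following sketch discretises the integral by a direct sum over pixels. It is only an illustration: the kernel `psi` is a hypothetical anisotropic Gaussian elongated along its first axis, which is not one of the proper wavelets constructed in section 2, and the function names and the coordinate convention x = (column, row) are assumptions made for this example.

```python
import math

def psi(u, v, sigma_long=3.0, sigma_short=0.7):
    """Hypothetical oriented kernel: a Gaussian elongated along the u-axis."""
    return math.exp(-0.5 * ((u / sigma_long) ** 2 + (v / sigma_short) ** 2))

def orientation_score(image, n_theta=8):
    """Discretise (1): score[k][r][c] = sum_y psi(R_theta_k^{-1}(y - x)) f(y).

    image is a list of rows; the position x = (c, r) uses column/row coordinates.
    """
    rows, cols = len(image), len(image[0])
    # Only non-zero pixels contribute to the sum.
    support = [(c, r, image[r][c])
               for r in range(rows) for c in range(cols) if image[r][c] != 0.0]
    score = []
    for k in range(n_theta):
        theta = 2.0 * math.pi * k / n_theta
        ct, st = math.cos(theta), math.sin(theta)
        layer = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                total = 0.0
                for yc, yr, value in support:
                    dx, dy = yc - c, yr - r
                    # R_theta^{-1}(y - x): rotate the offset by -theta.
                    u = ct * dx + st * dy
                    v = -st * dx + ct * dy
                    total += psi(u, v) * value
                layer[r][c] = total
        score.append(layer)
    return score

def dominant_orientation(score, r, c):
    """Angle theta_k with the largest response at pixel (r, c)."""
    n_theta = len(score)
    k = max(range(n_theta), key=lambda i: score[i][r][c])
    return 2.0 * math.pi * k / n_theta

size = 15
vertical_line = [[1.0 if c == 7 else 0.0 for c in range(size)] for r in range(size)]
theta = dominant_orientation(orientation_score(vertical_line), 7, 7)
print(theta / math.pi)  # multiple of pi for which the response is largest
```

For a vertical line the largest response at the centre pixel is obtained for θ = π/2 modulo π, since this symmetric kernel does not distinguish θ from θ + π. Each layer of the score is thus the image filtered by one rotated copy of the kernel.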

2 Invertible Orientation Scores

The transformation between an image $f : \mathbb{R}^2 \to \mathbb{R}$ and an orientation score $W_\psi f : \mathbb{R}^2 \rtimes \mathbb{T} \to \mathbb{R}$ given by (1) is a wavelet transformation generated by a reducible representation $U : SE(2) \to B(L_2(\mathbb{R}^2))$ of the Euclidean motion group $SE(2) = \mathbb{R}^2 \rtimes \mathbb{T}$ into the space of bounded operators on $L_2(\mathbb{R}^2)$. This important observation needs some explanation. By definition a representation of the group SE(2) (with unit element $e = (0, e^{i0})$) is a homomorphism from SE(2) into the space of bounded operators on $L_2(\mathbb{R}^2)$, which means that $U_g \circ U_h = U_{gh}$ for all $g, h \in SE(2)$ and $U_e = I$. In our case we have $U_g f(y) = f(R_\theta^{-1}(y - x))$, for all $f \in L_2(\mathbb{R}^2)$, $g = (x, e^{i\theta})$. The transform which maps an image f to the orientation score $W_\psi f$ given by (1) can now be rewritten in an $L_2$-inner product form:

$$W_\psi[f](g) = (U_g \psi, f)_{L_2(\mathbb{R}^2)}, \qquad g \in SE(2), \qquad (2)$$

which is the standard group theoretical structure of a continuous wavelet transform. However, we restrict ourselves initially to a single scale, like in [11]. The issue of scale comes into play later on by the diffusions on the orientation scores. Note that in standard continuous wavelet theory on the group of translations, rotations and scalings, it is not possible to obtain a stable reconstruction from a single scale layer as this conflicts with the admissibility condition [6, ch:2], [12]. The same holds for edgelets, curvelets, ridgelets [2]. Moreover, the admissibility condition in standard wavelet theory, [12], requires the wavelet to oscillate in radial direction which is undesirable with the diffusions we consider later on. However, in contrast to the standard approach, [12], our representation U is reducible, which means that there exists a closed subspace of $L_2(\mathbb{R}^2)$ which is invariant under $U_g$ for all $g \in SE(2)$. Consider for example the closed subspace:

$$L_2^{\varrho}(\mathbb{R}^2) = \{f \in L_2(\mathbb{R}^2) \mid \mathrm{support}\{\mathcal{F}f\} \subset B_{0,\varrho}\}, \qquad (3)$$

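where $B_{0,\varrho}$ denotes a ball around $0 \in \mathbb{R}^2$ with radius $\varrho > 0$ and where the Fourier transform $\mathcal{F} : L_2(\mathbb{R}^2) \to L_2(\mathbb{R}^2)$ is given by $\mathcal{F}f(\omega) = \frac{1}{2\pi} \int_{\mathbb{R}^2} e^{-i\omega \cdot x} f(x)\, dx$. Consequently, the celebrated result of Grossmann et al. [10] on stable reconstruction does not apply. Therefore in previous work [4] we showed that under minor conditions on ψ the wavelet transform $W_\psi$ is a unitary map from $L_2(\mathbb{R}^2)$ onto some reproducing kernel space of $L_2$-functions on SE(2). Here we avoid technicalities and just provide the essential formula which describes the stability:

$$\int_{\mathbb{R}^2} \int_{\mathbb{T}} |(\mathcal{F}W_\psi f)(\omega, e^{i\theta})|^2\, d\theta\, \frac{1}{M_\psi(\omega)}\, d\omega = \int_{\mathbb{R}^2} |(\mathcal{F}f)(\omega)|^2 \int_{\mathbb{T}} |\mathcal{F}\psi(R_\theta^T \omega)|^2\, d\theta\, \frac{1}{M_\psi(\omega)}\, d\omega = \int_{\mathbb{R}^2} |(\mathcal{F}f)(\omega)|^2\, d\omega = \|f\|^2_{L_2(\mathbb{R}^2)}, \qquad (4)$$

where $M_\psi \in C(\mathbb{R}^2, \mathbb{R})$ is defined by $M_\psi(\omega) := \int_0^{2\pi} |\mathcal{F}\psi(R_\theta^T \omega)|^2\, d\theta$. If ψ is chosen such that $M_\psi = 1$ then we get $L_2$-norm preservation. However, this is not possible as $\psi \in L_2 \cap L_1(\mathbb{R}^2)$ implies that $M_\psi$ is a continuous function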


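vanishing at infinity. This can be taken into account using distributional kernels [4]. In practice however, because of finite grid sampling, we can just restrict $W_\psi$ to the space of bandlimited images $L_2^{\varrho}(\mathbb{R}^2)$ given by (3) and use localized wavelets ψ with the property that $M_\psi(\omega) = M(\rho)$, $\rho = \|\omega\|$, where $M : [0, \varrho] \to \mathbb{R}^+$ is a smooth approximation of $1_{[0,\varrho)}$. We call these wavelets proper wavelets. Exact reconstruction is obtained by the adjoint wavelet transform $W_\psi^*$:

$$f = W_\psi^* W_\psi[f] = \mathcal{F}^{-1}\left[\omega \mapsto \int_0^{2\pi} \mathcal{F}[U_f(\cdot, e^{i\theta})](\omega)\, \mathcal{F}[R_{e^{i\theta}}\psi](\omega)\, d\theta\ M_\psi^{-1}(\omega)\right], \qquad (5)$$

where the rotated kernel is given by $R_{e^{i\theta}}\psi(x) = \psi(R_\theta^{-1} x)$. Now for proper wavelets one may as well use the (approximate) reconstruction:

$$\mathcal{F}^{-1}\left[\omega \mapsto \int_0^{2\pi} \mathcal{F}[U_f(\cdot, e^{i\theta})](\omega)\, \mathcal{F}[R_{e^{i\theta}}\psi](\omega)\, d\theta\right] \qquad (6)$$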

In [4] we construct two different classes of proper wavelets. Here we shall briefly mention a typical example of one particular class (for the other class see [18, 4]) that even allows a reconstruction by integration over θ only, which is practical, fast and intuitive. Example. Let $B^k$ be a k-th order B-spline, i.e. $B^k = B^{k-1} * B^0$, with $B^0(x) = 1_{[-\frac{1}{2}, \frac{1}{2}]}(x)$, then we set (with $\omega = (\rho \cos\varphi, \rho \sin\varphi)$):

$$\psi(x) = \mathcal{F}^{-1}\left[\omega \mapsto B^k\!\left(\frac{n_\theta\,((\varphi \bmod 2\pi) - \frac{\pi}{2})}{2\pi}\right) M(\rho)\right](x),$$

and where $n_\theta$ equals the number of orientation samples in our orientation score, k controls the “kernel-width” and $M(\rho) = e^{-\frac{\rho^2}{2\sigma^2}} \left(\sum_{k=0}^{4} (-1)^k (2^{-1}\sigma^{-2}\rho^2)^k\right)^{-1}$, $\sigma = \frac{\varrho}{2}$. Now that we have constructed a stable transformation between images f and corresponding orientation scores $U_f$ we can relate operators Υ on images to operators Φ on orientation scores in a robust manner, see Figure 1. This relation is one-to-one if we ensure that the operator on the orientation score again provides an orientation score of an image. However the operators Φ that we will propose in the remainder of this article will not leave the space of orientation scores (which we from now on denote by $\mathbb{C}^{SE(2)}_K$) invariant, i.e. the processed orientation score will not be the orientation score of an image but just some enhanced square integrable element $\Phi(W_\psi f)$ in $L_2(SE(2))$. In practice however, this does not matter since we naturally extend the reconstruction formula to $L_2(SE(2))$:

$$(W_\psi^*)^{ext}\, U(g) = \mathcal{F}^{-1}\left[\omega \mapsto \int_0^{2\pi} \mathcal{F}[U(\cdot, e^{i\theta})](\omega)\, \mathcal{F}[R_{e^{i\theta}}\psi](\omega)\, d\theta\ M_\psi^{-1}(\omega)\right](x), \qquad (7)$$

for all $U \in L_2(SE(2))$, where $g = (x, e^{i\theta}) \in SE(2)$. So no practical problems arise; however, one should be aware that the effective part of an operator Φ on an orientation score is in fact $P_\psi \Phi$, where $P_\psi = W_\psi (W_\psi^*)^{ext}$ is the orthogonal projection of $L_2(SE(2))$ onto the space of orientation scores $\mathbb{C}^{SE(2)}_K$. Next we give a brief motivation why we must restrict ourselves to left invariant operators on orientation scores: it can be verified that $W_\psi \circ U_g = L_g \circ W_\psi$ for all $g \in SE(2)$, where the left-regular representation $L : SE(2) \to B(L_2(SE(2)))$ is


given by $L_g U(h) = U(g^{-1}h)$. Consequently, the effective operator Υ on images is Euclidean invariant iff the operator Φ on orientation scores is left-invariant:

$$\Upsilon \circ U_g = U_g \circ \Upsilon \ \text{for all } g \in SE(2) \iff \Phi \circ L_g = L_g \circ \Phi \ \text{for all } g \in SE(2), \qquad (8)$$

for further details see [4, Thm. 21, p.153]. It is well-known that the only left-invariant kernel operators are convolutions. On SE(2) they are given by

$$(K *_{SE(2)} U)(g) = \int_{SE(2)} K(h^{-1}g)\, U(h)\, d\mu(h) = \int_{\mathbb{R}^2} \int_0^{2\pi} K(R_{\theta'}^T(x - x'), \theta - \theta')\, U(x', \theta')\, d\theta'\, dx', \qquad (9)$$

with $g = (x, e^{i\theta})$, $h = (x', e^{i\theta'})$ and $\mu$ the left-invariant Haar measure on SE(2). For a detailed overview of alternative algorithms (including complexity, steerability, performance, relation to the Fourier transform on SE(2), relation to tensor voting methods [9], [13] and extension to 3D), see the most complete and recent work [9, ch:3, p.53, p.72], containing new faster algorithms for steerable SE(2) convolutions, [9, ch: 3.5.1], [18, ch: 6.5.1, 6.5.2], [4, ch: 7.8, 5.4, 5.3.2].

However, the operators on orientation scores should not be linear, since this would imply that the effective operator $\Upsilon$ is a rotation and translation invariant kernel operator and thereby [16], $\Upsilon$ would be an $\mathbb{R}^2$-convolution with an isotropic kernel. Clearly, in this case one does not require invertible orientation scores. Therefore, based on the works [19], [1], [17] on “completion fields”, we consider so-called “collision distribution operators”, which are given by

$$(\tilde{\Phi}(U, V))(g) = (R_\gamma *_{SE(2)} \chi(U))(g) \cdot (R_\gamma *_{SE(2)} \chi(V))(g), \tag{10}$$

where $R_\gamma(g) = \gamma \int_0^\infty e^{-\gamma t} K_t(g)\, dt$, $g \in SE(2)$, is a time-integrated probability kernel obtained from a scale space kernel $K_t : SE(2) \to \mathbb{R}^+$ (satisfying $K_{t_1} *_{SE(2)} K_{t_2} = K_{t_1 + t_2}$), which we shall derive in Section 3, and where $U$ and $V$ denote two initial distributions on SE(2). Finally, $\chi$ in (10) is a monotonic, homogeneous greyvalue transformation on orientation scores such as $\chi(U)(x, y, \theta) = F(\mathrm{Re}\{U(x, y, \theta)\})$, with $F : \mathbb{R} \to \mathbb{R}$ given by $F(I) = |I|^p\, \mathrm{sign}(I)$, for some $p > 1$. Here we do not put sources and sinks by hand as delta-distributions on SE(2), [19], but we use invertible orientation scores instead. So in (10) we set $U = V = W_\psi f$ and consider the operators $W_\psi f \mapsto \Phi(W_\psi f) := \tilde{\Phi}(W_\psi f, W_\psi f)$. The motivation for our choice (10) comes from basic probability theory, which we explain next.
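The kernel argument $h^{-1}g$ in the SE(2)-convolution (9) can be made concrete with a few lines of code. The following is a minimal sketch, assuming group elements are stored as tuples `(x, y, theta)`; the function names `compose`, `inverse` and `kernel_argument` are illustrative choices and do not come from the paper.

```python
import math

TWO_PI = 2.0 * math.pi

def compose(g, h):
    """Group product g h on SE(2); elements are stored as (x, y, theta)."""
    x, y, th = g
    xh, yh, thh = h
    c, s = math.cos(th), math.sin(th)
    # Rotate the translation part of h by theta, then translate by (x, y).
    return (x + c * xh - s * yh, y + s * xh + c * yh, (th + thh) % TWO_PI)

def inverse(g):
    """Inverse element: (x, R_theta)^{-1} = (-R_theta^T x, R_theta^T)."""
    x, y, th = g
    c, s = math.cos(th), math.sin(th)
    return (-(c * x + s * y), -(-s * x + c * y), (-th) % TWO_PI)

def kernel_argument(g, h):
    """Closed form appearing in (9): (R_{theta'}^T (x - x'), theta - theta')."""
    x, y, th = g
    xp, yp, thp = h
    c, s = math.cos(thp), math.sin(thp)
    dx, dy = x - xp, y - yp
    return (c * dx + s * dy, -s * dx + c * dy, (th - thp) % TWO_PI)

g = (1.0, 2.0, 0.7)
h = (-0.5, 0.3, 2.1)
lhs = compose(inverse(h), g)   # h^{-1} g computed from the group law
rhs = kernel_argument(g, h)    # closed form used inside the convolution
print(max(abs(a - b) for a, b in zip(lhs, rhs)))
```

Both routes should agree up to floating-point rounding, which is why a discretized SE(2)-convolution can evaluate the kernel on rotated and shifted coordinates directly.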

3 Scale Spaces on SE(2) Based on Stochastic Processes

By the results of the previous section an operator on orientation scores must be left-invariant. Therefore we consider left-invariant scale spaces. The PDEs of these scale spaces are stochastic differential equations corresponding to left-invariant stochastic processes for line enhancement/completion. Just like an image can be interpreted as a distribution of greyvalue particles over space, the absolute value of an orientation score can be interpreted as a


[Figure 1: diagram. Image $f \in L_2(\mathbb{R}^2)$, mapped by $W_\psi$ to the Orientation Score $U_f \in \mathbb{C}_K^{SE(2)} \subset L_2(SE(2))$, mapped by $\Phi$ to the Processed Score $\Phi[U_f] \in L_2(SE(2))$, mapped by $(W_\psi^*)^{ext} = W_\psi^* P_\psi$ to the Processed Image $\Upsilon[f] = W_\psi^*[\Phi[U_f]]$; $\Upsilon$ maps the image directly to the processed image.]

Fig. 1. The complete scheme; for admissible vectors $\psi$ the linear map $W_\psi$ is unitary from $L_2(\mathbb{R}^2)$ onto the closed subspace of orientation scores $\mathbb{C}_K^{SE(2)}$ within $L_2(SE(2))$. So we can uniquely relate an operator $\Phi : \mathbb{C}_K^{SE(2)} \to \mathbb{C}_K^{SE(2)}$ on an orientation score to an operator on an image $\Upsilon = (W_\psi^*)^{ext} \circ \Phi \circ W_\psi \in B(L_2(\mathbb{R}^d))$, where $(W_\psi^*)^{ext}$ is given by (7) and where $\Phi(W_\psi f) = \tilde{\Phi}(W_\psi f, W_\psi f)$ is given by (10) using the Green’s functions/probability kernels $K_s := G_s^{D,a} : SE(2) \to \mathbb{R}^+$ of the scale spaces (for line enhancement and completion) on SE(2), that we shall derive in Section 3.

distribution of oriented greyvalue particles over space and orientation. Next we derive suitable stochastic processes on this distribution of oriented greyvalue particles. We first consider a single oriented greyvalue particle with initial position $X(0)$ and orientation $e^{i\Theta(0)}$ in SE(2). We will apply superposition afterwards. For line completion this oriented greyvalue particle is sent in the spatial plane along its preferred direction $e_\xi = \cos\theta\, e_x + \sin\theta\, e_y$, $\xi = x\cos\theta + y\sin\theta$, allowing random behavior (with variance $\sigma^2 > 0$) of its orientation over time:

$$\begin{cases} (X_{n+1}, \Theta_{n+1}) := (X_n, \Theta_n) + \Delta s\, (\cos\Theta_n\, e_x + \sin\Theta_n\, e_y,\ \kappa_0) + \sqrt{\Delta s}\, \sigma\, \epsilon_{n+1}\, (0, 0, 1), \\ (X_0, \Theta_0) = (0, 0), \end{cases} \quad \text{where } \epsilon_{n+1} \sim N(0, 1) \text{ independently normally distributed,} \tag{11}$$

with steps $\Delta s = L/N$, total length $L$ of the trajectory, $n = 0, 1, \ldots, N-1$, $N \in \mathbb{N}$ and $\kappa_0$ an a priori curvature. This stochastic process is known in computer vision as the direction process [15], see Figure 2. By infinite repetition of this process one gets a limiting distribution $G : SE(2) \times \mathbb{R}^+ \to \mathbb{R}^+$ of greyvalue particles which (by Itô’s formula) satisfies the following Fokker-Planck equation

$$\begin{cases} \partial_s G(x, y, \theta, s) = \left(-\partial_\xi - \kappa_0 \partial_\theta + D_{11} (\partial_\theta)^2\right) G(x, y, \theta, s), \\ G(\cdot, s = 0) = \delta_{g_0} = \delta_{x_0} \otimes \delta_{y_0} \otimes \delta_{\theta_0}. \end{cases} \tag{12}$$
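The discrete direction process (11) is straightforward to simulate. The sketch below is illustrative only: the function name `direction_process`, the default parameter values and the fixed random seed are assumptions made for the example, not settings taken from the paper.

```python
import math
import random

def direction_process(L=10.0, N=1000, sigma=0.3, kappa0=0.0, rng=None):
    """Simulate one random walk of the direction process (11).

    Returns the list of states (x, y, theta), starting at (0, 0, 0).
    """
    rng = rng or random.Random(0)   # fixed seed: an assumption for reproducibility
    ds = L / N                      # step size Delta s = L / N
    x = y = theta = 0.0
    path = [(x, y, theta)]
    for _ in range(N):
        eps = rng.gauss(0.0, 1.0)   # epsilon_{n+1} ~ N(0, 1)
        # Move along the current direction; only the orientation is noisy.
        x, y, theta = (
            x + ds * math.cos(theta),
            y + ds * math.sin(theta),
            theta + ds * kappa0 + math.sqrt(ds) * sigma * eps,
        )
        path.append((x, y, theta))
    return path

walk = direction_process()
print(len(walk), walk[-1])
```

Accumulating many such walks in an $(x, y, \theta)$ histogram gives a Monte Carlo picture of the limiting distribution whose evolution is described by (12).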

In a Markov process the traveling time $s$ is memoryless. Therefore $s$ must be negatively exponentially distributed, i.e. $P(S = s) = \gamma e^{-\gamma s}$ with expectation $E(S) = \gamma^{-1}$. Now by superposition the probability density of finding an oriented greyvalue particle at time $s > 0$, at position $(x, y)$ with orientation $\theta$, starting from the distribution $U \in L_1(SE(2))$ at $s = 0$, equals

$$\begin{aligned} P(x, y, \theta \mid U, S = s) &= (G_s^{D_{11}} *_{SE(2)} U)(x, y, \theta) \quad \text{with } G_s^{D_{11}}(x, y, \theta) = G(x, y, \theta, s), \\ P(x, y, \theta \mid U) &= \int_{\mathbb{R}^+} P(x, y, \theta \mid U, S = s)\, P(S = s)\, ds = (R_\gamma^{D_{11}} *_{SE(2)} U)(x, y, \theta), \end{aligned} \tag{13}$$

where $P(S = s) = \gamma e^{-\gamma s}$, so that $R_\gamma^{D_{11}} = \gamma \int_0^\infty G_s^{D_{11}} e^{-\gamma s}\, ds$. For line enhancement, we consider a different stochastic process on SE(2):

$$\begin{cases} (X_{n+1}, \Theta_{n+1}) := (X_n, \Theta_n) + \sqrt{\Delta s}\, \left(\sigma_2\, \epsilon^2_{n+1} (e_x \cos\Theta_n + e_y \sin\Theta_n),\ \sigma_1\, \epsilon^1_{n+1}\right), \\ (X_0, \Theta_0) = (0, 0), \end{cases} \quad \text{with } \epsilon^i_{n+1} \sim N(0, 1) \text{ independently normally distributed,} \tag{14}$$

where $i = 1, 2$. Again by infinite concatenation of this process one gets a limiting distribution which satisfies the following Fokker-Planck equation

$$\begin{cases} \partial_s G_{s,g_0}^{D_{11},D_{22}}(x, y, \theta) = \left(D_{11} (\partial_\theta)^2 + D_{22} (\partial_\xi)^2\right) G_{s,g_0}^{D_{11},D_{22}}(x, y, \theta), \\ G_{0,g_0}^{D_{11},D_{22}}(\cdot) = \delta_{g_0} = \delta_{x_0} \otimes \delta_{y_0} \otimes \delta_{\theta_0}, \end{cases} \tag{15}$$

with $\xi = x\cos\theta + y\sin\theta$, which coincides with Citti’s model for perceptual enhancement in SE(2), [3]. Next we consider all linear left-invariant 2nd-order scale spaces on the Euclidean motion group SE(2), whose solutions are SE(2)-convolutions with the corresponding Green’s functions. In two particular cases we arrive at the Green’s functions of (12) and (15). In contrast to previous work [15] and [3], we provide the exact Green’s functions.

3.1 Left-Invariant Scale Spaces on SE(2)

A vector field $X$ on SE(2) is called left-invariant if for all $g \in SE(2)$ the push-forward $(L_g)_* X_e$ by left multiplication $L_g h = gh$ equals $X_g$, that is

$$X_g = (L_g)_* (X_e) \;\Leftrightarrow\; X_g f = X_e (f \circ L_g), \quad \text{for all } f \in C^\infty : \Omega_g \to \mathbb{R}, \tag{16}$$

where $\Omega_g$ is some open set around $g \in SE(2)$. Recall that the tangent space at the unity element $e = (0, 0, e^{i0})$ is spanned by $\{e_x, e_y, e_\theta\} = \{(1, 0, 0), (0, 1, 0), (0, 0, 1)\}$ and by the general recipe explained in [5] we get the following basis for the space of left-invariant vector fields, $\mathcal{L}(SE(2))$:

$$\{A_1, A_2, A_3\} := \{\partial_\theta, \partial_\xi, \partial_\eta\} = \{\partial_\theta,\ \cos\theta\, \partial_x + \sin\theta\, \partial_y,\ -\sin\theta\, \partial_x + \cos\theta\, \partial_y\}, \quad \text{with } \xi = x\cos\theta + y\sin\theta \text{ and } \eta = -x\sin\theta + y\cos\theta. \tag{17}$$
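The frame (17) does not commute, and its Lie brackets can be checked numerically. The sketch below assumes each vector field is represented by its coefficient triple with respect to $(\partial_\theta, \partial_x, \partial_y)$ and uses the standard coordinate formula $[X, Y]^k = \sum_j (X^j \partial_j Y^k - Y^j \partial_j X^k)$ with central differences; the helper name `lie_bracket` and the step size are illustrative assumptions.

```python
import math

# Coefficients (c_theta, c_x, c_y) of the frame (17) at a point p = (theta, x, y).
def A1(p):
    return (1.0, 0.0, 0.0)

def A2(p):
    return (0.0, math.cos(p[0]), math.sin(p[0]))

def A3(p):
    return (0.0, -math.sin(p[0]), math.cos(p[0]))

def lie_bracket(X, Y, p, h=1e-5):
    """Approximate [X, Y] at p via central differences of the coefficients."""
    def directional(V, W):
        # Derivative of the coefficients of W along the vector V(p).
        out = [0.0, 0.0, 0.0]
        v = V(p)
        for j in range(3):
            if v[j] == 0.0:
                continue
            plus, minus = list(p), list(p)
            plus[j] += h
            minus[j] -= h
            wp, wm = W(plus), W(minus)
            for k in range(3):
                out[k] += v[j] * (wp[k] - wm[k]) / (2.0 * h)
        return out
    a, b = directional(X, Y), directional(Y, X)
    return tuple(a[k] - b[k] for k in range(3))

p = (0.8, 1.0, -2.0)
print(lie_bracket(A1, A2, p), A3(p))   # the two triples should agree closely
print(lie_bracket(A2, A3, p))          # should vanish
```

The first bracket reproduces $A_3$ because differentiating the $\theta$-dependent coefficients of $A_2$ rotates them by a quarter turn.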

Note that the non-commutative behavior of the group is intuitively reflected in a non-commuting Lie algebra: $[A_1, A_2] = A_1 A_2 - A_2 A_1 = A_3$, $[A_1, A_3] = -A_2$, $[A_2, A_3] = 0$. Next we follow our general theory for left-invariant scale spaces on Lie groups, [5], and set the following quadratic form on $\mathcal{L}(SE(2))$, with $a = (a_1, a_2, a_3) \in \mathbb{R}^3$,

$$Q^{D,a}(A_1, A_2, A_3) = \sum_{i=1}^{3} \left( -a_i A_i + \sum_{j=1}^{3} D_{ij} A_i A_j \right), \quad D := [D_{ij}] \in \mathbb{R}^{3 \times 3}, \tag{18}$$


with $D^T = D \geq 0$ and consider the linear left-invariant scale spaces on SE(2):

$$\begin{cases} \partial_s W(g, s) = Q^{D,a}(A_1, A_2, A_3)\, W(g, s), & s > 0,\ g \in SE(2), \\ W(g, s = 0) = U(g), & g \in SE(2), \end{cases} \tag{19}$$

with corresponding resolvent equations (obtained by Laplace transform over $s$):

$$P_\gamma(g) = -\gamma \left(Q^{D,a}(A_1, A_2, A_3) - \gamma I\right)^{-1} U(g), \tag{20}$$

which (for the cases $a = 0$) correspond to first order Tikhonov regularization on SE(2), [5]. By our results in [5], the solutions of these left-invariant evolution equations are SE(2)-convolutions with the corresponding Green’s functions:

$$W(x, \theta, s) = (G_s^{D,a} *_{SE(2)} U)(x, \theta), \qquad P_\gamma(x, \theta) = (R_\gamma^{D,a} *_{SE(2)} U)(x, \theta),$$

where we recall (9). In the special case $D_{ij} = \frac{1}{2}\sigma^2 \delta_{i1}\delta_{j1}$, $i, j = 1, 2, 3$, and $a = (\kappa_0, 1, 0)$ our scale space equation (19) is the Fokker-Planck equation (12) of Mumford’s direction process for line completion, and in the case $D_{ij} = D_{ii}\delta_{ij}$, $D_{11} = \frac{1}{2}(\sigma_1)^2$, $D_{22} = \frac{1}{2}(\sigma_2)^2$, $D_{33} = 0$, $a = 0$, our scale space equation is the Fokker-Planck equation (15) of the stochastic process for line enhancement. Next we provide the exact Green’s functions with suitable approximations, but first we provide a simple, intuitive, but nevertheless analogous example.

3.2 A Simple Introductory Example: Scale Spaces on the Circle

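The Gaussian scale space equation and corresponding resolvent equation on a circle $\mathbb{T} = \{e^{i\theta} \mid \theta \in [0, 2\pi)\}$ with group product $e^{i\theta} e^{i\theta'} = e^{i(\theta + \theta')}$ read

$$\begin{cases} \partial_s u(\theta, s) = D_{11} \partial_\theta^2 u(\theta, s), \\ u(0, s) = u(2\pi, s) \text{ and } u(\theta, 0) = f(\theta), \end{cases} \quad \text{and} \quad p_\gamma(\theta) = -\gamma \left(D_{11} \partial_\theta^2 - \gamma I\right)^{-1} f(\theta), \tag{21}$$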

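with $\theta \in [0, 2\pi)$ and $D_{11} > 0$ fixed, where we recall that the function $\theta \mapsto p_\gamma(\theta) = \gamma \int_0^\infty u(\theta, s)\, e^{-\gamma s}\, ds$ is the minimizer of the energy

$$E(p_\gamma) := \int_0^{2\pi} \gamma\, |p_\gamma(\theta) - f(\theta)|^2 + D_{11}\, |p_\gamma'(\theta)|^2 \, d\theta$$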

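under the periodicity condition $p_\gamma(0) = p_\gamma(2\pi)$. By left-invariance the solutions are given by $\mathbb{T}$-convolution with their Green’s function (or “impulse response”), say $G_s^{D_{11}} : \mathbb{T} \to \mathbb{R}^+$ and $R_\gamma^{D_{11}} : \mathbb{T} \to \mathbb{R}^+$. Note that $R_\gamma^{D_{11}} = \gamma \int_0^\infty G_s^{D_{11}} e^{-\gamma s}\, ds$. Now orthogonal eigenfunctions of the diffusion process correspond to eigenfunctions of the generator $D_{11} (\partial_\theta)^2$ and they are given by $\eta_n(\theta) = \frac{e^{in\theta}}{\sqrt{2\pi}}$, so that

$$\begin{aligned} u(\theta, s) &= \sum_{n \in \mathbb{Z}} (\eta_n, f)_{L_2(\mathbb{T})}\, \eta_n(\theta)\, e^{-n^2 s D_{11}}, & G_s^{D_{11}}(\theta) &= \sum_{n \in \mathbb{Z}} \eta_n(\theta)\, \eta_n(0)\, e^{-n^2 s D_{11}}, \\ p_\gamma(\theta) &= \sum_{n \in \mathbb{Z}} (\eta_n, f)_{L_2(\mathbb{T})}\, \eta_n(\theta)\, \frac{\gamma}{D_{11} n^2 + \gamma}, & R_\gamma^{D_{11}}(\theta) &= \sum_{n \in \mathbb{Z}} \eta_n(\theta)\, \eta_n(0)\, \frac{\gamma}{D_{11} n^2 + \gamma}. \end{aligned} \tag{22}$$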
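The Fourier series for the heat kernel in (22) can be evaluated directly. The sketch below also compares it with a periodized Gaussian of variance $2 s D_{11}$, which is a standard identity (Poisson summation) for the heat kernel on the circle; the function names and truncation limits `n_max` and `k_max` are illustrative assumptions.

```python
import math

def heat_kernel_circle(theta, s, D11, n_max=50):
    """Truncated series of (22): (1/2pi) * sum_n exp(i n theta) exp(-n^2 s D11)."""
    total = 1.0  # n = 0 term
    for n in range(1, n_max + 1):
        # Terms n and -n combine into a cosine.
        total += 2.0 * math.cos(n * theta) * math.exp(-n * n * s * D11)
    return total / (2.0 * math.pi)

def wrapped_gaussian(theta, s, D11, k_max=10):
    """Heat kernel on the real line (variance 2 s D11), summed over 2 pi shifts."""
    var = 2.0 * s * D11
    total = 0.0
    for k in range(-k_max, k_max + 1):
        d = theta - 2.0 * math.pi * k
        total += math.exp(-d * d / (2.0 * var))
    return total / math.sqrt(2.0 * math.pi * var)

theta, s, D11 = 0.3, 0.5, 1.0
print(heat_kernel_circle(theta, s, D11), wrapped_gaussian(theta, s, D11))
```

For small `s` the Fourier series needs many terms while only a few shifts of the Gaussian contribute, and for large `s` the situation is reversed.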

A well-known drawback of such an approach is that the series do not converge quickly if $s > 0$ resp. $\gamma > 0$ are small. In such a case one of course prefers a spatial implementation over a Fourier implementation, where one unfolds the circle and calculates modulo $2\pi$-shifts afterwards, i.e.

$$u(\theta, s) = (G_s^{D_{11}} * f)(\theta), \quad \text{where } G_s^{D_{11}}(\theta) = \sum_{n \in \mathbb{Z}} G_s^{D_{11},\infty}(\theta - 2\pi n), \tag{23}$$

where the Green’s functions for diffusion and Tikhonov regularization on $\mathbb{R}$ are

$$G_s^{D_{11},\infty}(\theta) = (4\pi s)^{-1/2}\, e^{-\frac{\theta^2}{4s}} \quad \text{and} \quad R_\gamma^{D_{11},\infty}(\theta) = \gamma \int_{\mathbb{R}^+} G_s^{D_{11},\infty}(\theta)\, e^{-\gamma s}\, ds = \frac{\sqrt{\gamma}}{2}\, e^{-\sqrt{\gamma}\,|\theta|}.$$

Again the latter formula follows by Laplace transform of the first, but a better derivation is by means of a continuous (not differentiable) fit at $\theta = 0$ of two solutions in the nullspace of the operator $(-\partial_\theta^2 + \gamma)$ which vanish at $+\infty$ resp. $-\infty$. The sums in (23) can be computed explicitly, yielding $G_s^{D_{11}}(\theta) = \frac{1}{2\pi}\, \vartheta_3\!\left(\frac{\theta}{2}, e^{-s D_{11}}\right)$, where $\vartheta_3$ is a theta-function of the 3rd kind.

3.3 The Green’s Functions of the Line-Completion Process

Let us consider the case $D_{ij} = \frac{\sigma^2}{2} \delta_{i1}\delta_{j1}$, $a = (\kappa_0, 1, 0)$, where our scale space equation (19) equals the Fokker-Planck equation (12) of Mumford’s direction process. The next theorems provide formulas (like (22)) for the Green’s function in terms of Mathieu functions, using the conventions of [14], $me_\nu(z, q)$, $ce_\nu(z, q)$ with Floquet exponent $\nu$, such that $\mathrm{Im}(\nu) \geq 0$.

Theorem 1. The Green’s functions $R_\gamma^{D_{11}}, G_s^{D_{11}} \in C^\infty(SE(2) \setminus \{e\})$ of the direction process with $\kappa_0 = 0$, i.e. the unique smooth solutions of

$$\begin{cases} \left(\partial_\xi - D_{11} \partial_\theta^2 + \gamma\right) R_\gamma^{D_{11}} = \gamma\, \delta_e, \\ R_\gamma^{D_{11}}(\cdot, 0) = R_\gamma^{D_{11}}(\cdot, 2\pi), \end{cases} \qquad \begin{cases} \partial_s G_s^{D_{11}} = \left(-\partial_\xi + D_{11} \partial_\theta^2\right) G_s^{D_{11}}, \\ G_s^{D_{11}}(\cdot, 0) = G_s^{D_{11}}(\cdot, 2\pi), \quad \lim_{s \downarrow 0} G_s^{D_{11}} = \delta_e, \end{cases} \tag{24}$$

RγD11 (x, y, θ)=F −1

⎛

∞ ω →

11 GD (x, y, θ) = F −1 ⎝ω → s

n=0 ∞ n=0

γ cen π 2 λn (ρ)

a2 (ρ)s − n D11 e π2

cen

−ϕ , i D2ρ11 2 −ϕ , i D2ρ11 2

cen cen

θ−ϕ , i D2ρ11 2 θ−ϕ , i D2ρ11 2

(24)

(x, y), ⎞ ⎠ (x, y)

(25)

with $\omega = (\rho\cos\varphi, \rho\sin\varphi)$, $-\lambda_n(\rho) = -a_n\!\left(\frac{2i\rho}{D_{11}}\right) - \gamma < 0$, where $a_n(h^2)$ denote the positive eigenvalues of Mathieu’s equation, cf. [14]. The Green’s function $R_\gamma^{D_{11}}$ is indeed a probability kernel, i.e. $R_\gamma^{D_{11}} > 0$ and $\int_{SE(2)} R_\gamma^{D_{11}}(g)\, dg = 1$.

For a detailed proof we refer to our latest work [7], where our most relevant observation is that the generator $B = -\partial_\xi + D_{11} \partial_\theta^2$ of the line-completion process in the Fourier domain (only with respect to $(x, y)$) reads

$$\hat{B} = \mathcal{F} B \mathcal{F}^{-1} = -i\omega_x \cos\theta - i\omega_y \sin\theta + D_{11} \partial_\theta^2 = -i\rho\cos(\theta - \varphi) + D_{11} \partial_\theta^2, \tag{26}$$

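so (25) is a bi-orthogonal expansion of eigenfunctions (directly related to the Mathieu functions, which are eigenfunctions of $\partial_z^2 - 2h^2\cos(2z)$) of the restriction of $-\hat{B} + \gamma I$ to the circle $\mathbb{T}$. The formulae (25) are the exact solutions of the numerical algorithm by August [1]. The drawback, however, of this bi-orthogonal expansion is the speed of convergence near $e$. This inspired us to find a much better series representation: the idea here is to make a continuous (but not differentiable at $\theta = 0$!) fit of elements on each side of the singularity at $\theta = 0$ within the null space of $-\hat{B} + \gamma I$ that vanish at $\theta \to \pm\infty$. Here we unfold the circle, providing a series of $2\pi$-shifts of the solutions with infinite boundary conditions. For relevant parameter settings this series can be truncated at $N = 0, 1$ or at the most $N = 2$ if $D_{11}/\gamma$ is small. This yields the following analogue of (23):

Theorem 2. The Green’s function $R_\gamma^{D_{11}}$ of the direction process with a priori curvature $\kappa_0 \geq 0$, i.e. the unique smooth solution of (24), is given by

$$R_\gamma^{D_{11}}(x, \theta) = \mathcal{F}^{-1}\!\left(\omega \mapsto \hat{R}_\gamma^{D_{11}}(\omega, \theta)\right)(x), \quad \text{with } \hat{R}_\gamma^{D_{11}}(\omega, \theta) = \lim_{N \to \infty} \sum_{k=-N}^{N} \hat{R}_{\gamma,\infty}^{D_{11}}(\omega, \theta - 2k\pi),$$

where the Fourier transform $\hat{R}_{\gamma,\infty}^{D_{11}} = \mathcal{F} R_{\gamma,\infty}^{D_{11}}$ of the solution $R_{\gamma,\infty}^{D_{11}}$ of

$$\begin{cases} \left(\partial_\xi - D_{11} (\partial_\theta)^2 + \gamma\right) R_{\gamma,\infty}^{D_{11}} = \gamma\, \delta_e, \\ R_{\gamma,\infty}^{D_{11}}(\cdot, \theta) \to 0 \text{ uniformly on compacta as } |\theta| \to \infty \end{cases}$$

is given by

$$\hat{R}_{\gamma,\infty}^{D_{11}}(\omega_x, \omega_y, \theta) = \frac{-\gamma\, e^{\frac{\kappa_0 \theta}{2 D_{11}}}}{\pi D_{11} W(\rho)} \left[ me_\nu\!\left(\tfrac{\varphi}{2}, i\tfrac{2\rho}{D_{11}}\right) me_{-\nu}\!\left(\tfrac{\varphi - \theta}{2}, i\tfrac{2\rho}{D_{11}}\right) u(\theta) + me_{-\nu}\!\left(\tfrac{\varphi}{2}, i\tfrac{2\rho}{D_{11}}\right) me_\nu\!\left(\tfrac{\varphi - \theta}{2}, i\tfrac{2\rho}{D_{11}}\right) u(-\theta) \right],$$

with Floquet exponent given by $\nu = \nu\!\left(\frac{-4\gamma}{D_{11}} - \frac{\kappa_0^2}{D_{11}^2}, \frac{2i\rho}{D_{11}}\right)$, unit step function $u(\theta) = \frac{1}{2}(1 + \mathrm{sign}(\theta))$, and where the function $W(\rho)$ denotes the Wronskian of $me_\nu(\cdot, i\tfrac{2\rho}{D_{11}})$ and $me_{-\nu}(\cdot, i\tfrac{2\rho}{D_{11}})$.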

Remark. The sum in Theorem 2 can be computed explicitly (by Floquet’s theorem), yielding a single exact formula on SE(2) consisting of only 4 Mathieu functions [18, p.127], [7, ch:4.2.1], like the $\vartheta$-function of subsection 3.2. However, it still requires sampling of Mathieu functions; therefore we derive a parametrix (see [3], [17]) by replacing the true left-invariant vector fields $\{A_1, A_2, A_3\}$ on SE(2) by

$$\{\hat{A}_1, \hat{A}_2, \hat{A}_3\} = \{\partial_\theta,\ \partial_x + \theta\partial_y,\ -\theta\partial_x + \partial_y\}. \tag{27}$$

Essentially, this replaces the group of positions and orientations locally by the (nilpotent) group of positions and velocities (normalized in x-direction). See Figure 2. By some theory on nilpotent Lie groups it follows, see [4, p.166], that

$$\tilde{G}_s^{D_{11}}(x, y, \theta) = \delta(x - s)\, \frac{\sqrt{3}}{2 D_{11} \pi x^2}\, e^{-\frac{3(x\theta - 2y)^2 + x^2(\theta - \kappa_0 x)^2}{4 x^3 D_{11}}},$$

yielding

$$\tilde{R}_\gamma^{D_{11}}(x, y, \theta) = \gamma\, \frac{\sqrt{3}}{2 D_{11} \pi x^2}\, e^{-\gamma x - \frac{3(x\theta - 2y)^2 + x^2(\theta - \kappa_0 x)^2}{4 x^3 D_{11}}}\, u(x) \tag{28}$$

by Laplace transform.
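Because (28) is a closed-form expression, it can be evaluated without sampling Mathieu functions. The sketch below evaluates the approximation and checks numerically that, for a fixed $x > 0$, integrating over $(y, \theta)$ gives $\gamma e^{-\gamma x}$, so that the kernel has unit mass overall. The function names, the integration window and the grid step are assumptions chosen for this example.

```python
import math

def parametrix_resolvent(x, y, theta, D11, gamma, kappa0=0.0):
    """Closed-form approximation (28) of the resolvent Green's function."""
    if x <= 0.0:
        return 0.0  # unit step u(x)
    q = 3.0 * (x * theta - 2.0 * y) ** 2 + x * x * (theta - kappa0 * x) ** 2
    pref = gamma * math.sqrt(3.0) / (2.0 * D11 * math.pi * x * x)
    return pref * math.exp(-gamma * x - q / (4.0 * x ** 3 * D11))

def slice_mass(x, D11, gamma, kappa0=0.0, half_width=8.0, step=0.05):
    """Midpoint-rule integral of (28) over (y, theta) at fixed x."""
    theta0 = kappa0 * x           # centre of the Gaussian in theta
    y0 = 0.5 * kappa0 * x * x     # centre of the Gaussian in y
    n = int(round(2.0 * half_width / step))
    total = 0.0
    for i in range(n):
        theta = theta0 - half_width + (i + 0.5) * step
        for j in range(n):
            y = y0 - half_width + (j + 0.5) * step
            total += parametrix_resolvent(x, y, theta, D11, gamma, kappa0)
    return total * step * step

x, D11, gamma, kappa0 = 1.0, 0.5, 0.1, 0.2
print(slice_mass(x, D11, gamma, kappa0), gamma * math.exp(-gamma * x))
```

The fixed integration window is only adequate for moderate `x` and `D11`; a more careful implementation would scale it with the spread of the Gaussian factor.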

3.4 The Green’s Functions of the Line-Enhancement Process

In this paragraph we will derive the Green’s functions of the line-enhancement process (14). These kernels are the exact heat-kernels for a Gaussian scale space

[Figure 2: three panels. Left: sample paths $g(t)$ of random walks in the image plane (axes $x$, $y$). Middle and right: isoline plots over the $(x, y)$-plane.]

Fig. 2. Left: Random walks of the direction process (11). Middle: isoline plots of the marginals of $R_\gamma^{D_{11}}$, which is the time-integrated limiting distribution (using Itô calculus) of all random walks, $\gamma = \frac{1}{10}$, $D_{11} = \frac{1}{32}$, $\kappa_0 = 0$. Left corner middle image: 3D-plot of 2D-isolines of $R_\gamma^{D_{11}}$. Right: a comparison of the level curves of the marginals of $\tilde{R}_\gamma^{D_{11}}$ and $R_\gamma^{D_{11}}$. Dashed lines denote the level sets of the approximation $\tilde{R}_\gamma^{D_{11}}$, see (28).

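on the group of positions and orientations. Here we even allow $D_{33} \geq 0$. Set $D_{33} = 0$ to get the Green’s functions of (14).

Theorem 3. Let $D_{11}, D_{22}, D_{33} > 0$, then the Green’s function (or rather Gaussian kernels) $G_s^{D_{11},D_{22},D_{33}}$ on the Euclidean motion group SE(2) of the scale space equation (19) generated by (18) with $a = 0$ and $D_{ij} = D_{ii}\delta_{ij}$ is given by $G_s^{D_{11},D_{22},D_{33}}(b_1, b_2, e^{i\theta}) = \mathcal{F}^{-1}[\omega \mapsto \hat{G}_s^{D_{11},D_{22},D_{33}}(\omega, e^{i\theta})](b_1, b_2)$, with

$$\hat{G}_s^{D_{11},D_{22},D_{33}}(\omega, e^{i\theta}) = \frac{e^{-s (1/2)(D_{22} + D_{33})\rho^2}}{\pi} \sum_{n=0}^{\infty} ce_n(\varphi, q)\, ce_n(\varphi - \theta, q)\, e^{-s\, a_n(q)\, D_{11}},$$

with $q = q(\rho) = \frac{\rho^2 (D_{22} - D_{33})}{4 D_{11}}$ and $a_n(q)$ the Mathieu characteristic. For a relatively simple formula for the corresponding resolvent Green’s functions $R_{\gamma,\infty}^{D_{11},D_{22},D_{33}}$, analogous to the formulas in Theorem 2, see [6, part I, Thm 5.2, 5.3]. For $D_{33} < D_{22}$ the resolvent (or Tikhonov regularization) kernel $R_\gamma^{D = \mathrm{diag}\{D_{11},D_{22},D_{33}\}}$ on SE(2) is given by

$$\begin{aligned} [\mathcal{F} R_\gamma^D(\cdot, \theta)](\omega) = \frac{\gamma}{4\pi D_{11}\, ce_\nu(0, q)\, se_\nu'(0, q)} \Big\{ &\big( -\cot(\nu\pi) \left( ce_\nu(\varphi, q)\, se_\nu(\varphi - \theta, q) + se_\nu(\varphi, q)\, se_\nu(\varphi - \theta, q) \right) \\ &\quad + ce_\nu(\varphi, q)\, se_\nu(\varphi - \theta, q) - se_\nu(\varphi, q)\, ce_\nu(\varphi - \theta, q) \big)\, u(\theta) \\ &+ \big( -\cot(\nu\pi) \left( ce_\nu(\varphi, q)\, ce_\nu(\varphi - \theta, q) - se_\nu(\varphi, q)\, se_\nu(\varphi - \theta, q) \right) \\ &\quad + ce_\nu(\varphi, q)\, se_\nu(\varphi - \theta, q) + se_\nu(\varphi, q)\, ce_\nu(\varphi - \theta, q) \big)\, u(-\theta) \Big\}, \end{aligned} \tag{29}$$

with $q = \frac{(D_{22} - D_{33})\rho^2}{4 D_{11}}$, $\omega = (\rho\cos\varphi, \rho\sin\varphi)$, Floquet exponent $\nu = \nu(a, q)$ and $a = -\frac{\gamma + (1/2)(D_{22} - D_{33})\rho^2}{D_{11}}$.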

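For the corresponding Green’s functions on the group of positions and velocities, recall (27), see [5]. Finally, in [6, ch:5.4] we derive the useful formula:

$$K_s^{D_{11},D_{22}}(x, y, e^{i\theta}) \approx \frac{1}{4\pi s^2 D_{11} D_{22}} \exp\!\left( -\frac{1}{4 s c^2} \sqrt{ \left( \frac{\theta^2}{D_{11}} + \frac{\theta^2 (y - \eta)^2}{4 (1 - \cos\theta)^2 D_{22}} \right)^2 + \frac{1}{D_{11} D_{22}}\, \frac{\theta^2 (\xi - x)^2}{4 (1 - \cos\theta)^2} } \right), \tag{30}$$

where we recall (17).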

[Figure 3: three rows of image panels. Top row: panels a (input) and b to f (outputs). Middle row: wavelet and xy-marginals of the Green’s functions. Bottom row: corresponding orientation scores for panels a, d and f.]

Fig. 3. Top row: a: noisy input image $f$, b: $|f|^p\, \mathrm{sign} f$, c: $(W_\psi^*)^{ext}(\chi_p(W_\psi f))$ with $p = 1.5$, d, e: line enhancement using the time-dependent diffusion kernel depicted below, f: line completion using the resolvent completion kernel depicted below. All involved orientation scores are sampled on a 100×100×64 grid. Circles depict parts where a clear difference arises between line completion and enhancement. Middle row: proper wavelet $\psi$ in the example of Section 2 (parameters $k = 2$, $n_\theta = 64$), Green’s function of the line enhancement process with parameters $D_{11} = 0.00015$, $D_{22} = 1$, $s = 15$, using the asymptotic formula (30), Green’s function of the line enhancement process with $D_{11} = 0.00015$, $D_{33} = 1$, $\gamma = \frac{1}{64}$, Green’s function of the line completion process with $D_{11} = 0.0024$, $\gamma = \frac{1}{64}$. Bottom row: Slices $W_\psi f(\cdot, \cdot, e^{i\theta_k})$ for $\theta_k = (2k + 1)\frac{\pi}{32}$, $k = 0, \ldots, 5$.

Now that we have derived the Green’s functions we return to our scheme, Fig. 1. First, we construct an invertible orientation score (1). Then we convolve the orientation score with 2 Green’s functions to compute the direct product in (10) and ﬁnally we apply the inverse transform (6). For experiments, see Fig. 3.

4 Conclusion

Since the transformations between image and orientation score are stable, we can apply image processing via orientation scores. To ensure Euclidean invariance of the operator on an image the corresponding operator on its orientation score must be left-invariant. Therefore we consider left-invariant scale spaces on these orientation scores based on stochastic processes on the group SE(2), the solutions of which are given by SE(2)-convolution with the corresponding Green’s functions, which we derived explicitly.


References

1. August, J.: The Curve Indicator Random Field. PhD thesis, Yale University (2001)
2. Candes, F.: New ties between computational harmonic analysis and approximation theory. Approximation Theory X, Innov. Appl. Math. (6), 87–153 (2000)
3. Citti, G., Sarti, A.: A cortical based model of perceptual completion in the rototranslation space. JMIV 24(3), 307–326 (2006)
4. Duits, R.: Perceptual Organization in Image Analysis. PhD thesis, Eindhoven University of Technology, Dep. of Biomedical Engineering, The Netherlands (2005)
5. Duits, R., Burgeth, B.: Scale spaces on Lie groups. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 300–312. Springer, Heidelberg (2007)
6. Duits, R., Franken, E.M.: Left-invariant parabolic evolutions on SE(2) and contour enhancement via invertible orientation scores, part I: Linear left-invariant diffusion equations on SE(2). Quarterly of Appl. Math. (to appear, 2009)
7. Duits, R., van Almsick, M.A.: The explicit solutions of linear left-invariant second order stochastic evolution equations on the 2D-Euclidean motion group. Quarterly of Applied Mathematics 66, 27–67 (2008)
8. Franken, E., Duits, R., ter Haar Romeny, B.M.: Nonlinear diffusion on the 2D Euclidean motion group. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 461–472. Springer, Heidelberg (2007)
9. Franken, E.M.: Enhancement of Crossing Elongated Structures in Images. PhD thesis, Dep. of Biomedical Engineering, Eindhoven University of Technology, The Netherlands, Eindhoven (October 2008)
10. Grossmann, A., Morlet, J., Paul, T.: Integral transforms associated to square integrable representations. J. Math. Phys. 26, 2473–2479 (1985)
11. Kalitzin, S.N., ter Haar Romeny, B.M., Viergever, M.A.: Invertible apertured orientation filters in image analysis. IJCV 31(2/3), 145–158 (1999)
12. Louis, A.K., Maass, P., Rieder, P.: Wavelets, Theory and Applications. Wiley, New York (1997)
13. Medioni, G., Lee, M.S., Tang, C.K.: A Computational Framework for Segmentation and Grouping. Elsevier, Amsterdam
14. Meixner, J., Schaefke, F.W.: Mathieusche Funktionen und Sphaeroidfunktionen. Springer, Heidelberg (1954)
15. Mumford, D.: Elastica and computer vision. Algebraic Geometry and Its Applications, pp. 491–506. Springer, Heidelberg (1994)
16. Sporring, J., Nielsen, M., Florack, L.M.J., Johansen, P.: Gaussian Scale-Space Theory. KAP, Dordrecht (1997)
17. Thornber, K.K., Williams, L.R.: Analytic solution of stochastic completion fields. Biological Cybernetics 75, 141–151 (1996)
18. van Almsick, M.A.: Context Models of Lines and Contours. PhD thesis, Eindhoven University of Technology, Department of Biomedical Engineering, Eindhoven, The Netherlands (2007) ISBN:978-90-386-1117-4
19. Zweck, J., Williams, L.R.: Euclidean group invariant computation of stochastic completion fields using shiftable-twistable functions. JMIV 21(2), 135–154 (2004)

Spatio-Featural Scale-Space

Michael Felsberg

Computer Vision Laboratory, Linköping University, S-58183 Linköping, Sweden
[email protected]

Abstract. Linear scale-space theory is the fundamental building block for many approaches to image processing like pyramids or scale-selection. However, linear smoothing does not preserve image structures very well and thus non-linear techniques are mostly applied for image enhancement. A different perspective is given in the framework of channel smoothing, where the feature domain is not considered as a linear space, but it is decomposed into local basis functions. One major drawback is the larger memory requirement for this type of representation, which is avoided if the channel representation is subsampled in the spatial domain. This general type of feature representation is called channel-coded feature map (CCFM) in the literature and a special case using linear channels is the SIFT descriptor. For computing CCFMs the spatial resolution and the feature resolution need to be selected. In this paper, we focus on the spatio-featural scale-space from a scale-selection perspective. We propose a coupled scheme for selecting the spatial and the featural scales. The scheme is based on an analysis of lower bounds for the product of uncertainties, which is summarized in a theorem about a spatio-featural uncertainty relation. As a practical application of the derived theory, we reconstruct images from CCFMs with resolutions according to our theory. The results are very similar to the results of non-linear evolution schemes, but our algorithm has the fundamental advantage of being non-iterative. Any level of smoothing can be achieved with about the same computational effort.

1 Introduction

The concept of scale is a central ingredient to many image analysis and computer vision algorithms. Scale was ﬁrst introduced systematically in terms of the concept of linear scale-space [1, 2, 3], establishing a 3D space of spatial coordinates and a scale coordinate. Often identiﬁed with Gaussian low-pass ﬁltering, a rigorous analysis of underlying scale-space axioms [4] has led to the discovery of the Poisson scale-space [5] and more general α scale-spaces [6]. In practice, discrete scale-spaces are mostly sub-sampled with increasing scale parameter, leading to the concept of scale-pyramids [7, 8], multi-scale analysis

The research leading to these results has received funding from the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement n◦ 215078 (DIPLECS).

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 808–819, 2009. c Springer-Verlag Berlin Heidelberg 2009


and wavelet theory [9, 10]. While pyramids and wavelets speed up the computation of linear operators and transforms, non-linear scale-space methods are widely used, e.g. for image enhancement. Non-linear scale-space is based on a non-stationary or anisotropic diffusivity function [11, 12]. More recently, non-linear methods have been introduced which are less directly connected to linear scale-space and diffusion, but allow for faster processing and partially superior results [13, 14]. The former method is based on wavelets, whereas the latter one is based on the channel representation [15] and is called channel smoothing. Combining the channel representation with a systematic decimation of spatial resolution, similar to the pyramid approach, has been applied in blob-detection [16] and in channel-coded feature maps (CCFM) [17, 18], a density representation in the spatio-featural domain, see also [19].

In this paper, we propose a new spatio-featural scale-space approach including an image reconstruction algorithm, which generates images from CCFMs. The CCFM scale-space is generated by applying the principles of linear scale-space to the spatial resolution of CCFMs and simultaneously increasing the resolution of the feature space. By subsampling this space and subsequent reconstruction, image evolutions are generated which are very similar to those generated by iterative methods. We show some examples and propose a scale-selection scheme based on a new uncertainty relation: the spatio-featural uncertainty relation. In Section 2, we introduce lesser known relevant techniques: channel representation, channel smoothing, CCFMs. In Section 3 we propose the novel reconstruction algorithm, define the linear scale-space of CCFMs, and formulate a scale-selection scheme based on a spatio-featural uncertainty relation. In Section 4 we present experimental results and in Section 5 we give some concluding remarks.

2 Required Methods

2.1 The Channel Representation

Channel coding, also called population coding [20, 21], is a biologically inspired data representation, where features are represented by weights assigned to ranges of feature values [22, 15], see Fig. 1. Similar feature representations can also be found in the visual cortex of the human brain, e.g. in the cortical columns. The closer the current feature value $f$ to the respective feature interval center $n$, the higher the channel weight $c_n$:

$$c_n(f) = k(f - n), \tag{1}$$

where $k(\cdot)$ is a suitable kernel function and where $f$ has been scaled such that it has a suitable range (note that we chose to place the channel centers at integers). By introducing $z$ as a continuous feature coordinate, $k_n(z) = k(z - n)$, and $\delta_f(z) = \delta(z - f)$ denoting the Dirac-delta at $f$, the encoding can be written as a scalar product

$$c_n(f) = \langle \delta_f \mid k_n \rangle = \int \delta_f(z)\, k_n(z)\, dz \tag{2}$$


Fig. 1. Orientation distribution is encoded into channels, resulting in a (low-pass filtered) reconstruction. Figure courtesy Erik Jonsson.

or as a sampled correlation in the feature domain:

$$c_n = (\delta_f \star k)(n) = \left. \int \delta_f(z')\, k(z' - z)\, dz' \right|_{z = n}. \tag{3}$$

From the weights of all channels the feature value can be decoded unambiguously by finding the mode, where the decoding depends on the kernel function. In some theoretic considerations we will consider Gaussian functions as kernels but in the practical implementation we have been using quadratic B-splines:

$$B_2(z) = \begin{cases} (z + 3/2)^2 / 2 & -3/2 < z \leq -1/2 \\ 3/4 - z^2 & -1/2 < z \leq 1/2 \\ (z - 3/2)^2 / 2 & 1/2 < z < 3/2 \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

The features can be scalar valued or vector valued, e.g. grey-scales, color vectors, or orientations. In the case of scalar features the decoding from quadratic B-splines has been considered in detail in [14], which we will not repeat here. For the case of non-interfering channel weights, a simplified scheme based on the quotient of linear combinations can be used:

$$M_n = c_{n-1} + c_n + c_{n+1}, \qquad n_0 = \arg\max_n M_n, \qquad \hat{f} = \frac{c_{n_0 + 1} - c_{n_0 - 1}}{M_{n_0}} + n_0, \tag{5}$$

where fˆ is our estimate of the feature f that had been encoded in cn . Channel representations obviously need more memory than directly storing features, but this investment pays oﬀ in several ways which we will show in the subsequent sections.
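The encoding (1) with the quadratic B-spline (4) and the simplified decoding (5) fit in a few lines. This is a minimal sketch for a single scalar feature value; the function names `b2`, `encode` and `decode`, and the restriction of the search to interior channels, are assumptions made for the example.

```python
def b2(z):
    """Quadratic B-spline kernel (4)."""
    if -1.5 < z <= -0.5:
        return (z + 1.5) ** 2 / 2.0
    if -0.5 < z <= 0.5:
        return 0.75 - z * z
    if 0.5 < z < 1.5:
        return (z - 1.5) ** 2 / 2.0
    return 0.0

def encode(f, num_channels):
    """Channel weights c_n(f) = B2(f - n) for integer centers n = 0, 1, ..."""
    return [b2(f - n) for n in range(num_channels)]

def decode(c):
    """Simplified decoding (5); only interior channels are considered."""
    best_n, best_m = None, float("-inf")
    for n in range(1, len(c) - 1):
        m = c[n - 1] + c[n] + c[n + 1]   # M_n
        if m > best_m:
            best_n, best_m = n, m
    return (c[best_n + 1] - c[best_n - 1]) / best_m + best_n

weights = encode(3.3, 8)
print(weights)
print(decode(weights))
```

For a single encoded value away from the borders, the three weights around the nearest center sum to one and their antisymmetric part equals the offset from that center, which is why the decoding recovers the value up to rounding.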

2.2 Channel Smoothing and Channel-Coded Feature Maps

The idea of channel smoothing is based on considering the feature $f$ in the encoding (1) as a stochastic variable. It has been shown in [14] that the distribution $p_f$ is approximated by $c_n$ in the expectation sense:

$$E\{c_n(f)\} = (p_f \star k)(n) \tag{6}$$

such that $\hat{f}$ becomes a maximum-likelihood estimate of $f$. If we assume that $p_f$ is locally ergodic, we can estimate $\hat{f}$ from a local image region, which corresponds to a local averaging of the channel weights within a spatial neighborhood. The algorithm consisting of the three steps channel encoding, channel averaging, and channel decoding is called channel smoothing and has been shown to be superior to many other robust smoothing methods [14]. Due to the assumption of (piecewise) constant distributions, the positioning of region boundaries might violate the sampling theorem, resulting in unstable edge-pixels. To avoid this effect, a modification to the channel decoding has been proposed in [23], called α-synthesis, which creates smooth transitions between neighborhoods with different feature levels. Instead of extracting the global maximum in (5), all local maxima are extracted, located at channels $n_r$. The decoding is then obtained according to

$$\hat{f} = \frac{\sum_r \hat{f}_r\, M_{n_r}^\alpha}{\sum_l M_{n_l}^\alpha}, \qquad \hat{f}_r = \frac{c_{n_r + 1} - c_{n_r - 1}}{M_{n_r}} + n_r. \tag{7}$$

For the choice of $\alpha$ see [23]; we used $\alpha = 2$ throughout this paper.

One major drawback of channel smoothing is the extensive use of memory if many feature channels are required. A high density of channels is only reasonable if the spatial support is large, which implies that the individual feature channels are heavily low-pass filtered along the spatial dimension. Therefore, the feature channels have a lower band limit and can be subsampled in the spatial domain without losing information. If the three steps of channel encoding, channel averaging, and subsampling are integrated into a single step, channel-coded feature maps (CCFMs) are generated. The advantage of CCFMs is a much higher number of channels, e.g. by combining several features as in Fig. 2, without increasing the memory requirements significantly. The CCFM encoding of a single feature point can be written as (cf. (1)):

$$c_{l,m,n}(f(x, y), x, y) = k_f(f(x, y) - n)\, k_x(x - l)\, k_y(y - m), \tag{8}$$

where kf , kx , ky are the 1D kernels in feature domain and spatial domain. Note that x and y are scaled such that they suit the integer spatial channel centers l, m. Note further, that the previous deﬁnition of CCFMs assumes separable kernels, but we could easily use non-separable kernels, e.g. in the case of orientation data. Similar to (1), the encoding (8) of a set of feature points can be written as a scalar product in 3D function space or as a 3D correlation, where we use δf (x, y, z) = δ(z − f (x, y))

(9)


Fig. 2. Simultaneous encoding of orientation and color in a local image region. Figure taken from [17] courtesy Erik Jonsson.

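and $k_{f,n}(z) = k_f(z - n)$, $k_{x,l}(x) = k_x(x - l)$, $k_{y,m}(y) = k_y(y - m)$:

$$c_{l,m,n}(f) = \langle \delta_f \mid k_{f,n}\, k_{x,l}\, k_{y,m} \rangle = \iiint \delta_f(x, y, z)\, k_{f,n}(z)\, k_{x,l}(x)\, k_{y,m}(y)\, dz\, dy\, dx = (\delta_f \star (k_f k_x k_y))(n, m, l). \tag{10}$$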

The ﬁnal formulation is the starting point of the CCFM scale-space.

3 The CCFM Scale-Space

In this section, we introduce the concept of CCFM scale-space. Our considerations are based on CCFMs computed from grey-scale images, i.e., we consider $f : \mathbb{R}^2 \to \mathbb{R}^+$ instead of a more general feature function.

3.1 Linear Scale-Space Theory in the Spatio-Featural Domain

The starting point is to embed the image $f(x,y)$ as a 3D surface according to (9). One might try to generate a 3D α scale-space [6] (with the Gaussian as the special case α = 1; all α-kernels are symmetric, i.e., correlation and convolution are the same):

$$F_s(x,y,z) = (k_s^{(\alpha)} * \delta_f)(x,y,z) . \qquad (11)$$

However, the semi-group property of scale-space implies that all dimensions (the spatial dimensions and the feature dimension) become increasingly blurred. Apart from the fact that this implies a rapidly growing loss of information with increasing scale and a singular zero scale, this procedure is not sensible from a statistical perspective and does not comply with the notion of scale selection [24, 25]. Since the latter argument is not straightforward, we explain our rationale in more detail. From the requirement that the dimensionless derivative attains its maximum at a position proportional to the wavelength of the signal [24] (section 13.1), we conclude that the scale of a structure is proportional to its spatial scale (a trivial fact) and inversely proportional to its feature scale. The latter can be shown by looking at the Taylor expansion of a harmonic oscillation

Spatio-Featural Scale-Space


$A\sin(\omega x)$ at the origin: $A\omega x$. The steepness $A\omega$ of a sinusoid at the origin grows linearly with the amplitude and the frequency, i.e., it is inversely proportional to the wavelength $\lambda = \frac{2\pi}{\omega}$. Alternatively, one can consider the energy of a harmonic oscillation. The energy is proportional to the square of the amplitude times the square of the frequency: $E \propto A^2\omega^2 \propto \frac{A^2}{\lambda^2}$. That means, if we apply a 3D lowpass filter to the spatio-featural domain, the energy decays with a power of four. Hence, scale selection would favor the highest possible frequencies in nearly all cases. If we scale the amplitude inversely proportionally to the spatial domain, the change of energy is balanced and will reflect intrinsic properties of the signal.

3.2 The Spatio-Featural Uncertainty: The Linear Case

In what follows, we analyze a linear 1D signal resulting from, e.g., the cross section of a locally planar image. Images are observations of a stochastic process, i.e., each measurement of the signal at each position follows a certain distribution in the feature domain. Furthermore, our measurements are subject to stochastic position errors and deterministic distortions (e.g. point-spread function), resulting in a distribution in the spatial domain. If we assume stationarity and independence of the two distributions, we can model the densities by a separable function that is shift-invariant in (f, x)-space, see Fig. 3, top left.

[Figure 3 (image not reproduced). Axes: x, f, s, t. Labels: linear gradient; spatio-featural Gauss kernel; k_f; k_x.]

Fig. 3. Illustration for the derivation of uncertainties of a linear gradient. Top left: spatio-featural distribution moving along a linear signal. Top right: 2D distribution obtained by marginalizing along s. Bottom: projections of the 2D distribution onto the original spatio-featural coordinates.


The overall distribution is obtained as the margin at angle φ, i.e. it is a function of t = cos φ f − sin φ x obtained by integrating along s = cos φ x + sin φ f , see Fig. 3, top right. For the case of Gaussian distributions, this integral can be computed analytically:

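$$p(t;\varphi) \propto \int \exp\left(-\frac{x^2}{2\sigma_x^2} - \frac{f^2}{2\sigma_f^2}\right) ds \qquad (12)$$

$$= \int \exp\left(-\frac{(s\cos\varphi - t\sin\varphi)^2}{2\sigma_x^2} - \frac{(s\sin\varphi + t\cos\varphi)^2}{2\sigma_f^2}\right) ds \qquad (13)$$

$$\propto \exp\left(-\frac{t^2}{2(\sin^2\varphi\,\sigma_x^2 + \cos^2\varphi\,\sigma_f^2)}\right) , \qquad (14)$$

where $\sigma_f^2$ is the variance of the distribution in the feature domain and $\sigma_x^2$ is the variance of the spatial distribution. In order to compute suitable kernels for a scale-space representation, the 2D distribution $p(t;\varphi)$ is projected onto the spatial domain and onto the feature domain, respectively, see Fig. 3, bottom. For the case of Gaussian distributions, the projections can again be computed analytically:

$$p_x(x;\varphi) = p(t;\varphi)\big|_{f=0} \propto \exp\left(-\frac{(f\cos\varphi - x\sin\varphi)^2}{2(\sin^2\varphi\,\sigma_x^2 + \cos^2\varphi\,\sigma_f^2)}\right)\Bigg|_{f=0} = \exp\left(-\frac{x^2}{2(\sigma_x^2 + \cot^2\varphi\,\sigma_f^2)}\right) \qquad (15)$$

and

$$p_f(f;\varphi) \propto \exp\left(-\frac{f^2}{2(\tan^2\varphi\,\sigma_x^2 + \sigma_f^2)}\right) , \qquad (16)$$

resulting in the variances

$$\sigma_{k_f}^2(\varphi) = \sigma_f^2 + \tan^2\varphi\,\sigma_x^2 \qquad (17)$$

$$\sigma_{k_x}^2(\varphi) = \sigma_x^2 + \cot^2\varphi\,\sigma_f^2 . \qquad (18)$$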

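Hence, we obtain the spatial uncertainty $(\Delta x)^2 = \frac{1}{2}\sigma_{k_x}^2(\varphi)$ and the feature uncertainty $(\Delta f)^2 = \frac{1}{2}\sigma_{k_f}^2(\varphi)$. Minimizing the product of uncertainties with respect to φ,

$$\varphi_0 = \arg\min_{\varphi}\,(\Delta x)^2(\Delta f)^2 = \arg\min_{\varphi}\,\frac{1}{4}\,\sigma_{k_x}^2(\varphi)\,\sigma_{k_f}^2(\varphi) , \qquad (19)$$

results in a global minimum at $\varphi_0 = \tan^{-1}\frac{\sigma_f}{\sigma_x}$, giving $\sigma_{k_x}^2(\varphi_0)\,\sigma_{k_f}^2(\varphi_0) = 4\sigma_f^2\sigma_x^2$.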
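The minimization in (19) can be checked numerically. The following Python sketch evaluates the kernel variances (17) and (18) on a grid of angles and compares the numerical minimizer of their product with the closed-form angle. The values of `sigma_x` and `sigma_f`, the grid size, and all function names are assumptions chosen for illustration; they are not taken from the paper.

```python
import math

def kernel_variances(phi, sigma_x, sigma_f):
    """Return (sigma_kx^2, sigma_kf^2) according to (18) and (17)."""
    t2 = math.tan(phi) ** 2
    var_kx = sigma_x ** 2 + sigma_f ** 2 / t2   # cot^2 = 1 / tan^2
    var_kf = sigma_f ** 2 + t2 * sigma_x ** 2
    return var_kx, var_kf

def uncertainty_product(phi, sigma_x, sigma_f):
    """(Delta x)^2 (Delta f)^2 = 1/4 * sigma_kx^2 * sigma_kf^2, cf. (19)."""
    var_kx, var_kf = kernel_variances(phi, sigma_x, sigma_f)
    return 0.25 * var_kx * var_kf

# Illustrative (assumed) standard deviations.
sigma_x, sigma_f = 2.0, 0.5

# Midpoint grid on (0, pi/2); avoids the singular end points.
n = 100000
grid = [(k + 0.5) * (math.pi / 2) / n for k in range(n)]
phi_num = min(grid, key=lambda p: uncertainty_product(p, sigma_x, sigma_f))

# Closed-form minimizer stated in the text.
phi_0 = math.atan(sigma_f / sigma_x)

print("numerical minimizer:", phi_num)
print("closed form        :", phi_0)
print("product at phi_0   :", uncertainty_product(phi_0, sigma_x, sigma_f))
```

At the closed-form angle the product of uncertainties equals $\sigma_f^2\sigma_x^2$, i.e. $(\Delta x)(\Delta f) = \sigma_f\sigma_x$ for this Gaussian example.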

3.3 The Spatio-Featural Uncertainty Relation

In order to generalize the result from the previous section, we have to deﬁne the group structure of spatio-featural transformations. Readers not familiar with


group theory might consider the previous example as a proof of concept and continue with the subsequent section. We choose a methodology which is based on the isotropic model used in [26], although restricted to the 1D case. The higher-dimensional case generalizes straightforwardly. The group that we consider contains the shearing group and the translation group, given as

$$x' = x + t_x \qquad (20)$$

$$f' = f + \tan(\varphi)\,x + t_f . \qquad (21)$$

The shearing transformation corresponds to the rotation of a Euclidean space and is obtained since the f-coordinate is a null-vector [26], i.e., $f \cdot f = 0$. The parameterization is chosen such that it is consistent with the angle φ in the previous section and it reflects the fact that points move along the surface / curve with angle φ, i.e., that we cannot determine whether measurement noise comes from spatial uncertainty or feature noise. Using this definition we state the following

Theorem 1. Let the spatio-featural domain be described by the isotropic model. The uncertainty product in the spatio-featural domain has a lower bound

$$\exists k > 0 : \quad (\Delta x)(\Delta f) \ge k \qquad (22)$$

and the lower bound is given as

$$k = \frac{1}{2}\,\sigma_f\,\sigma_x , \qquad (23)$$

where $\sigma_f^2$ is the variance of the feature domain marginal distribution and $\sigma_x^2$ is the variance of the spatial domain distribution.

The proof of this theorem is given as follows. The generators of (20) and (21) are given as

$$s_x = \partial_x , \qquad o_f = x\,\partial_f , \qquad s_f = \partial_f \qquad (24)$$

and the commutator of $s_x$ and $o_f$ is given by

$$[s_x, o_f] = s_x o_f - o_f s_x = \partial_x\, x\,\partial_f - x\,\partial_f\,\partial_x = \partial_f = s_f . \qquad (25)$$

Hence, using the Robertson-Schrödinger relation [27] (note that the considered shearing transformation is a spinor group in the considered space), we obtain

$$E\{s_x^2\}\,E\{o_f^2\} \ge \frac{1}{4}\,E\{[s_x, o_f]\}^2 = \frac{1}{4} . \qquad (26)$$

Taking the square root and scaling x and f by $\sigma_x$ and $\sigma_f$, respectively, we obtain Theorem 1. Note that (23) and the example of the previous section differ by a factor of 2. This might either mean that other types of noise distribution would lead to smaller uncertainties or that it is not possible to reach the lower bound. Regardless of the actual uncertainty product, Theorem 1 implies that we should not use iterative filtering with a 3D low-pass kernel as in (11). Instead, the scales in the different domains must behave reciprocally. The optimal choice of scales is the topic of the subsequent section.

3.4 Scale-Selection for CCFMs

The derivation from the previous section can be used to determine the proper change of scale when constructing a CCFM scale-space. The major trouble is, however, that we normally do not have access to the effective $\sigma_f^2$ and $\sigma_x^2$, i.e., we have to find a way to estimate these unknown parameters. When considering the 3D embedding (9) in the context of channel representations, we make the following observation. Encoding $f(x,y)$ with three arbitrarily, but finitely, large channels always results in the same image after decoding (assuming infinite accuracy of real numbers):

$$\left(c_1(f/R) - c_{-1}(f/R)\right) R = f , \qquad 0 \ll R < \infty . \qquad (27)$$
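The identity (27) can be illustrated with a small Python sketch. It assumes that the channel kernel is the quadratic B-spline (the text only mentions "quadratic B-spline channels"; the kernel definition (1) lies outside this excerpt), with three channels centered at −1, 0, and 1. The function names are hypothetical. Under this assumption, the identity holds whenever $|f| \le R/2$, which is the role of choosing R sufficiently large.

```python
def quadratic_bspline(x):
    """Quadratic B-spline kernel with support [-1.5, 1.5] (assumed channel kernel)."""
    ax = abs(x)
    if ax <= 0.5:
        return 0.75 - ax * ax
    if ax <= 1.5:
        return 0.5 * (ax - 1.5) ** 2
    return 0.0

def encode_three_channels(f, R):
    """Channel values (c_{-1}, c_0, c_1) of the rescaled feature f / R."""
    return [quadratic_bspline(f / R - n) for n in (-1, 0, 1)]

def decode_three_channels(c, R):
    """Decoding as in (27): (c_1 - c_{-1}) * R."""
    c_minus, _, c_plus = c
    return (c_plus - c_minus) * R

# Feature values in [-0.5, 0.5] with R = 1, as in the example of the text.
for f in (-0.5, -0.2, 0.0, 0.3, 0.5):
    c = encode_three_channels(f, 1.0)
    print(f, c, decode_three_channels(c, 1.0))
```

For these inputs the three channel values also sum to one, consistent with $M_{n_0} = 1$.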

This identity is obtained directly from (1) and (5) for the case of three channels (implying $n_0 = 0$ and $M_{n_0} = 1$) or directly by (4). This means, we can always select the relative feature resolution R such that the feature resolution $\sigma_{k_f}^{-2}(\varphi)$ is minimal. Furthermore, we know the original spatial resolution, which can be considered as a maximal spatial resolution [28]. These extremal resolutions can be considered as initial conditions for the CCFM scale-space. Consider for example an image of the size X × X with values in [−0.5, 0.5]. We simply select R = 1 and obtain

$$1 = \sigma_{k_f}^2(\varphi_{\min}) = \sigma_f^2 + \tan^2\varphi_{\min}\,\sigma_x^2 . \qquad (28)$$

From the image size, we know that

$$\frac{1}{X^2} = \sigma_{k_x}^2(\varphi_{\min}) = \sigma_x^2 + \cot^2\varphi_{\min}\,\sigma_f^2 . \qquad (29)$$

This means, we get two equations and three unknowns $\sigma_x^2$, $\sigma_f^2$, and $\varphi_{\min}$, and thus we cannot compute the proper scale-selection scheme from the initial image only. However, we also have knowledge about the final state of the scale-space: There is an angle $\varphi_{\max}$ for which the spatial resolution $\sigma_{k_x}^{-2}(\varphi)$ becomes minimal, i.e., equal to one, which corresponds to three channels (the minimum number for quadratic B-spline channels):

$$1 = \sigma_{k_x}^2(\varphi_{\max}) = \sigma_x^2 + \cot^2\varphi_{\max}\,\sigma_f^2 . \qquad (30)$$

However, this introduces another unknown $\varphi_{\max}$. This can finally be eliminated by selecting a maximal resolution in feature space. The choice of the latter is however subject to heuristics, as there is no natural upper bound to feature resolution. As an alternative, it can be chosen according to requirements in the further processing. In any case, by selecting some constant F (we used F = 20), we obtain

$$\frac{1}{F^2} = \sigma_{k_f}^2(\varphi_{\max}) = \sigma_f^2 + \tan^2\varphi_{\max}\,\sigma_x^2 . \qquad (31)$$

The four equations (28–31) are solved by

$$\sigma_f^2 = \frac{X^2 - 1}{F^2X^2 - 1} , \qquad \sigma_x^2 = \frac{F^2 - 1}{F^2X^2 - 1} \qquad (32)$$

$$\varphi_{\min} = \tan^{-1} X , \qquad \varphi_{\max} = \cot^{-1} F . \qquad (33)$$


Inserting these parameters into (17) and (18) determines good choices of scales in spatial-featural domain and has been used in subsequent experiments.
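As a plausibility check, the solution (32), (33) can be inserted into (17) and (18) numerically. The Python sketch below does this for F = 20 (the value used in the paper) and an assumed image size X = 256; the function names and the choice of X are illustrative only, and the sketch does not reproduce the quantization used in the experiments.

```python
import math

def ccfm_scale_parameters(X, F):
    """Solve (28)-(31): return (sigma_x^2, sigma_f^2, phi_min, phi_max) from (32), (33)."""
    denom = F ** 2 * X ** 2 - 1.0
    var_f = (X ** 2 - 1.0) / denom
    var_x = (F ** 2 - 1.0) / denom
    phi_min = math.atan(X)          # tan^{-1} X
    phi_max = math.atan(1.0 / F)    # cot^{-1} F
    return var_x, var_f, phi_min, phi_max

def kernel_variances(phi, var_x, var_f):
    """Return (sigma_kx^2, sigma_kf^2) from (18) and (17)."""
    t2 = math.tan(phi) ** 2
    return var_x + var_f / t2, var_f + t2 * var_x

X, F = 256, 20   # X is an assumed image size; F = 20 as in the text
var_x, var_f, phi_min, phi_max = ccfm_scale_parameters(X, F)

# Sweep phi between the two extremal angles and report 1 / sigma for both domains.
steps = 8
for k in range(steps + 1):
    phi = phi_min + (phi_max - phi_min) * k / steps
    var_kx, var_kf = kernel_variances(phi, var_x, var_f)
    print(round(phi, 4), 1.0 / math.sqrt(var_kx), 1.0 / math.sqrt(var_kf))
```

At the two end points of the sweep, the kernel variances reproduce the boundary conditions (28) to (31).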

4 Experiments

In the experiments shown in Fig. 4, we have quantized the spatio-featural resolutions resulting from (17) and (18). The image has been encoded in a CCFM with

Fig. 4. Examples for CCFM-smoothing at diﬀerent scales. The spatial and featural (denoted as channels) resolutions are given as (quantized) functions (18) and (17) where φ is linearly increasing with the frame number. The feature considered here is the grey-scale.


the corresponding number of channels and has been decoded afterwards. For the decoding of pixels not lying at a spatial channel center, we have linearly interpolated between the nearest channels. Obviously, the processed images maintain similar perceptual quality for a wide range of channels, before the image degrades at a resolution of 32 × 32. Note that any resolution can be computed in a single step, since the CCFM scale-space is linear and need not be computed iteratively.

5 Conclusion

This paper presents a new theorem for a spatio-featural uncertainty relation. The theoretic result is directly applicable in terms of scale-selection for non-linear image filtering using CCFMs. The main claim of this paper is that one should always consider the spatial domain and the feature domain in conjunction, since they are inherently connected. Still, the presented results are only a very first step: they need to be considered in more detail for various applications, the theoretic results need to be generalized to non-flat manifolds (non-trivial fibre bundles), and effects of higher dimensionality need to be considered. Classical scale-space features such as preservation of the average grey-scale or the max-min principle have to be considered in future work, presumably in some modified formulation.

Acknowledgement The authors would like to thank P.-E. Forssen for various discussions about the paper, in particular on alpha-synthesis.

References 1. Iijima, T.: Basic theory of pattern observation. Papers of Technical Group on Automata and Automatic Control, IECE, Japan (December 1959) 2. Witkin, A.P.: Scale-space ﬁltering. In: Int. Joint Conf. Art. Intell., pp. 1019–1022 (1983) 3. Koenderink, J.J.: The structure of images. Biolog. Cybernetics 50, 363–370 (1984) 4. Weickert, J., Ishikawa, S., Imiya, A.: Linear scale-space has ﬁrst been proposed in Japan. Mathematical Imaging and Vision 10, 237–252 (1999) 5. Felsberg, M., Sommer, G.: The monogenic scale-space: A unifying approach to phase-based image processing in scale-space. J. Math. Imag. Vis. 21, 5–26 (2004) 6. Duits, R., Florack, L.M.J., de Graaf, J., ter Haar Romeny, B.M.: On the axioms of scale space theory. Journal of Mathematical Imaging and Vision 20, 267–298 (2004) 7. Granlund, G.H.: In search of a general picture processing operator. Computer Graphics and Image Processing 8, 155–173 (1978) 8. Burt, P.J., Adelson, E.H.: The Laplacian pyramid as a compact image code. IEEE Trans. Communications 31(4), 532–540 (1983)


9. Mallat, S.G.: A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Anal. Machine Intelligence 11, 674–693 (1989) 10. Daubechies, I.: Orthonormal bases of compactly supported wavelets. Communications on Pure and Applied Mathematics 41(7), 909–996 (1988) 11. Perona, P., Malik, J.: Scale-space and edge detection using anisotropic diﬀusion. IEEE Trans. Pattern Analysis and Machine Intelligence 12(7), 629–639 (1990) 12. Weickert, J.: Theoretical foundations of anisotropic diﬀusion in image processing. Computing suppl. 11, 221–236 (1996) 13. Portilla, J., Strela, V., Wainwright, J., Simoncelli, E.P.: Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Trans. Image Processing 12(11), 1338–1351 (2003) 14. Felsberg, M., Forssén, P.E., Scharr, H.: Channel smoothing: Eﬃcient robust smoothing of low-level signal features. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(2), 209–222 (2006) 15. Granlund, G.H.: An associative perception-action structure using a localized space variant information representation. In: Sommer, G., Zeevi, Y.Y. (eds.) AFPAC 2000. LNCS, vol. 1888, pp. 48–68. Springer, Heidelberg (2000) 16. Forssén, P.E., Granlund, G.: Robust multi-scale extraction of blob features. In: Bigun, J., Gustavsson, T. (eds.) SCIA 2003. LNCS, vol. 2749, pp. 11–18. Springer, Heidelberg (2003) 17. Jonsson, E.: Channel-Coded Feature Maps for Computer Vision and Machine Learning. PhD thesis, Linköping University, Sweden (2008) 18. Jonsson, E., Felsberg, M.: Eﬃcient computation of channel-coded feature maps through piecewise polynomials. Image and Vision Computing (in press) 19. Felsberg, M., Granlund, G.: P-channels: Robust multivariate m-estimation of large datasets. In: International Conference on Pattern Recognition, Hong Kong (2006) 20. Zemel, R.S., Dayan, P., Pouget, A.: Probabilistic interpretation of population codes. Neural Computation 10(2), 403–430 (1998) 21. 
Pouget, A., Dayan, P., Zemel, R.: Information processing with population codes. Nature Reviews – Neuroscience 1, 125–132 (2000) 22. Howard, I.P., Rogers, B.J.: Binocular Vision and Stereopsis. Oxford University Press, Oxford (1995) 23. Forssén, P.E.: Low and Medium Level Vision using Channel Representations. PhD thesis, Linköping University, Sweden (2004) 24. Lindeberg, T.: Scale-Space Theory in Computer Vision. Kluwer Academic Publishers, Boston (1994) 25. Elder, J.H., Zucker, S.W.: Local scale control for edge detection and blur estimation. IEEE Trans. Pattern Analysis and Machine Intell. 20(7), 699–716 (1998) 26. Koenderink, J.J., van Doorn, A.J.: Image processing done right. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2350, pp. 158– 172. Springer, Heidelberg (2002) 27. Santhanam, T.S.: Higher-order uncertainty relations. Journal of Physics A: Mathematical and General 33(8), 83–85 (2000) 28. Florack, L.M.J., ter Haar Romeny, B.M., Koenderink, J.J., Viergever, M.A.: Scale and the diﬀerential structure of images. Image Vision Comp. 10(6), 376–388 (1992)

Scale Spaces on the 3D Euclidean Motion Group for Enhancement of HARDI Data

Erik Franken^1 and Remco Duits^{1,2}

1 Department of Biomedical Engineering
2 Department of Mathematics and Computer Science
Eindhoven University of Technology, The Netherlands
{E.M.Franken,R.Duits}@tue.nl

Abstract. In previous work we studied left-invariant diffusion on the 2D Euclidean motion group for crossing-preserving coherence-enhancing diffusion on 2D images. In this paper we study the equivalent three-dimensional case. This is particularly useful for processing High Angular Resolution Diffusion Imaging (HARDI) data, which can be considered as 3D orientation scores directly. A complicating factor in 3D is that all practical 3D orientation scores are functions on a coset space of the 3D Euclidean motion group instead of on the entire group. We show that, conceptually, we can still apply operations on the entire group by requiring the operations to be α-right-invariant. Subsequently, we propose to describe the local structure of the 3D orientation score using left-invariant derivatives and we smooth 3D orientation scores using left-invariant diffusion. Finally, we show a number of results for linear diffusion on artificial HARDI data.

1 Introduction

A common approach for enhancing elongated structures in noisy images is nonlinear anisotropic diffusion on the image [1]. This can be regarded as calculating a nonlinear scale space on the additive group $(\mathbb{R}^n, +)$, i.e. the translation group. In our earlier work [2, 3, 5], we proposed to enhance elongated structures via the orientation score of a 2D image, which has the practical advantage that crossing structures can be handled appropriately. An orientation score of a 2D image is a function on the 2D Euclidean motion group SE(2), which is constructed from a 2D image using an invertible transformation. The image enhancement in our previous work is accomplished by a nonlinear diffusion process in the orientation score of the image (which is a 3D dataset: 2 spatial dimensions and 1 orientation dimension), followed by an inverse orientation score transformation to obtain an enhanced image.

In this paper we go one step further and investigate how we can apply the same techniques to 3D orientation scores. Such an orientation score is a 5D dataset, i.e. 3 spatial dimensions and 2 orientation dimensions. The 3D case is very relevant for many (bio)medical problems, since many (bio)medical images are intrinsically 3D. Our main application of interest is high angular resolution diffusion imaging (HARDI).

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 820–831, 2009. © Springer-Verlag Berlin Heidelberg 2009

Scale Spaces on the 3D Euclidean Motion Group


With the term HARDI we refer to all diffusion MRI techniques in which the diffusion profile at each spatial position is modeled by a function on the sphere, which provides richer information especially in regions where different fibrous structures cross or bifurcate [4,6,7,8]. Roughly speaking, the MRI scanner measures the probability of finding a water molecule at each position for a certain direction, where the number of acquired directions can be varied. Clearly, all data obtained using any HARDI technique can be considered as 3D orientation scores directly. Remarkably, in HARDI processing algorithms that are proposed in literature, the data is processed as a function on the sphere for each spatial position separately, see e.g. [4, 7, 9]. In our approach, we consider both the spatial and the orientational part to be included in the domain, so a HARDI dataset is considered as a function $\mathbb{R}^3 \rtimes S^2 \to \mathbb{R}$. Furthermore, we explicitly employ the proper underlying group structure. The advantage is that we can enhance the data using both orientational and spatial neighborhood information, which potentially leads to improved enhancement and detection algorithms.

3D orientation scores are defined as functions $u : \mathbb{R}^3 \rtimes S^2 \to \mathbb{R}$ or $\mathbb{C}$, where $\mathbb{R}^3$ is the spatial domain and $S^2 = \{ n \in \mathbb{R}^3 \mid \|n\| = 1 \}$ is the unit sphere. In this paper, the domain of u is parameterized by (x, n), where $x = (x, y, z) \in \mathbb{R}^3$ and $n \in S^2$. Figure 1(a) shows an example clarifying the structure of a 3D orientation score.

This paper will start with the introduction of the group structure of the 3D orientation score domain, i.e. the 3D Euclidean motion group SE(3). Subsequently, we will introduce the important differential geometry on SE(3), needed

[Figure 1 (images not reproduced). (a) Example 3D orientation score. (b) Euler angles α, β, γ on the sphere, with axes x, y, z.]

Fig. 1. (a) Visualization of a simple 3D orientation score u(x, n) containing two crossing straight lines, visualized using Q-ball glyphs in the DTI tool (see http://www.bmia.bmt.tue.nl/software/dtitool/) from two diﬀerent viewpoints. At each (relevant) spatial position x the function on the sphere u(x, ·) is displayed by a so-called glyph, which is given by n → x + q u(x, n)n where q is a scaling factor. (b) Intuition of coset space SO(3)/SO(2): the Euler angles (α, β, γ) are needed to parametrize rotation in SO(3), while the two angles (β, γ) are suﬃcient to describe positions on the unit sphere, represented by a unit vector. The third Euler angle α is in fact a rotation of this vector around its own axis, leaving the vector invariant. Thus, each position on the sphere is identiﬁed by a coset space SO(3)/SO(2) cf. (4).


E. Franken and R. Duits

to estimate tangent vectors that locally ﬁt best to the elongated structures in the 3D orientation score. The next topic will be the diﬀusion on 3D orientation scores, which yields a scale space representation of the SE(3) group. The paper will end with results of linear SE(3)-diﬀusion on artiﬁcial HARDI datasets.

2 Group Structure of the Domain of 3D Orientation Scores

2.1 The Rotation Group SO(3) and Coset Space SO(3)/SO(2)

The noncommutative group of 3D rotations is defined as a matrix group by

$$SO(3) = \{ R \mid R \in \mathbb{R}^{3\times 3},\; R^T = R^{-1},\; \det(R) = 1 \} . \qquad (1)$$

In this section, we will ﬁrst consider diﬀerent parameterizations of SO(3). Then, we will describe the coset space SO(3)/SO(2), which is an essential prerequisite to relate functions on the sphere (i.e. two angles) to functions on SO(3) (i.e. three angles). The relation between positions on the sphere S 2 and a 3D rotation SO(3) is established by rotating the vector ez , i.e. n = R · ez .

(2)

This relation shows that the resulting position n on the sphere is independent of an arbitrary rotation around the z-axis, that is, $R\,R_\alpha^{e_z} \cdot e_z = R \cdot e_z$ for all α, where $R_\alpha^{n}$ denotes the rotation over α around the axis defined by vector n. This means that a function on the sphere is not equivalent to a function on the complete rotation group SO(3), but rather to a function on the set that partitions SO(3) into left cosets, $SO(3)/\mathrm{stab}(e_z)$, where $\mathrm{stab}(e_z)$ denotes the subgroup of SO(3) of all rotations around the z-axis, as is made intuitive in Figure 1(b). A left coset $[g]_H$ of a group G with subgroup H is defined as the set

$$[g]_H = gH = \{ gh \mid h \in H \} , \qquad (3)$$

for any g ∈ G. The left cosets form a partition of the group, i.e. the group is divided into disjoint cosets, and the set of all of these cosets is denoted by G/H. Two group elements $g_1 \in G$ and $g_2 \in G$ are equivalent, $g_1 \sim g_2$, if they belong to the same left coset, i.e. $g_1 H = g_2 H$. In the case $SO(3)/\mathrm{stab}(e_z)$, we have the equivalence relation $R_1 \sim R_2$ iff there is an α such that $R_1 R_\alpha^{e_z} = R_2$. From now on we will write SO(3)/SO(2) rather than $SO(3)/\mathrm{stab}(e_z)$ since $\mathrm{stab}(e_z)$ and SO(2) are isomorphic. The cosets SO(3)/SO(2) are isomorphic to the space of the unit vectors of (2), i.e.

$$SO(3)/SO(2) \cong S^2 = \{ n \in \mathbb{R}^3 \mid \|n\| = 1 \} . \qquad (4)$$

The isomorphism is given by means of (2). The set of all the cosets SO(3)/SO(2) can be parameterized using only two angles rather than three angles, for instance as $[R_\gamma^{e_z} R_\beta^{e_y}]_{SO(2)} \in SO(3)/SO(2)$ and therefore $n(\beta, \gamma) = R_\gamma^{e_z} R_\beta^{e_y} e_z \in S^2$. Note that the set of all disjoint cosets SO(3)/SO(2) does not form a group since SO(2) is not a normal subgroup of SO(3), so $[g_1]_{SO(2)}\,[g_2]_{SO(2)} \neq [g_1 g_2]_{SO(2)}$.
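The coset structure can be made concrete with a short Python sketch. It builds $n(\beta, \gamma) = R_\gamma^{e_z} R_\beta^{e_y} e_z$ and checks that multiplying the rotation on the right by any rotation around the z-axis leaves the point on the sphere unchanged. The helper names and the sample angles are assumptions for illustration, and the rotation matrices follow the usual right-handed convention.

```python
import math

def rot_y(beta):
    """Rotation over beta around the y-axis."""
    c, s = math.cos(beta), math.sin(beta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(alpha):
    """Rotation over alpha around the z-axis."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

E_Z = [0.0, 0.0, 1.0]

def sphere_point(beta, gamma):
    """n(beta, gamma) = R_z(gamma) R_y(beta) e_z, a representative of a coset."""
    return matvec(matmul(rot_z(gamma), rot_y(beta)), E_Z)

# All rotations R * R_z(alpha) belong to the same coset and map e_z to the same n.
R = matmul(rot_z(1.2), rot_y(0.4))
n = matvec(R, E_Z)
for alpha in (0.3, 1.0, 2.5):
    n_alpha = matvec(matmul(R, rot_z(alpha)), E_Z)
    print(alpha, max(abs(a - b) for a, b in zip(n, n_alpha)))
```

The printed deviations are at the level of floating-point rounding, which reflects that the third Euler angle carries no information about the position on the sphere.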

2.2 The 3D Euclidean Motion Group SE(3)

The 3D Euclidean motion group is the group of 3D translations and 3D rotations, i.e. $SE(3) = \mathbb{R}^3 \rtimes SO(3)$. An element of SE(3) can be parameterized by (x, R), where $x \in \mathbb{R}^3$ is the translation vector and $R \in SO(3)$ is the rotation matrix. The group product and inverse of SE(3) are given by

$$g\,g' = (x, R)\,(x', R') = (x + R \cdot x',\; R \cdot R') , \qquad g^{-1} = (x, R)^{-1} = (-R^{-1}x,\; R^{-1}) . \qquad (5)$$
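The group product and inverse in (5) translate directly into code. The following Python sketch represents a group element as a pair (translation vector, rotation matrix); the function names and the example elements are assumptions for illustration.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def se3_product(g, h):
    """(x, R)(x', R') = (x + R x', R R'), cf. (5)."""
    (x, R), (xp, Rp) = g, h
    Rxp = matvec(R, xp)
    return [x[i] + Rxp[i] for i in range(3)], matmul(R, Rp)

def se3_inverse(g):
    """(x, R)^{-1} = (-R^{-1} x, R^{-1}), using R^{-1} = R^T for rotations."""
    x, R = g
    Rt = transpose(R)
    return [-v for v in matvec(Rt, x)], Rt

IDENTITY = ([0, 0, 0], [[1, 0, 0], [0, 1, 0], [0, 0, 1]])

# Example: rotation by 90 degrees around the z-axis combined with a translation.
g = ([1, 2, 3], [[0, -1, 0], [1, 0, 0], [0, 0, 1]])
h = ([1, 0, 0], [[1, 0, 0], [0, 1, 0], [0, 0, 1]])

print(se3_product(g, se3_inverse(g)))   # compare with IDENTITY
print(se3_product(g, h))                # g h
print(se3_product(h, g))                # h g, differs from g h
```

The last two lines illustrate that SE(3) is noncommutative: the translation part of the second factor is rotated by the first factor before it is added.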

To map the structure of a group to operators on orientation scores, we need a representation. A representation is a mapping of the form $\mathcal{R} : G \to B(H)$, where H is the linear space of orientation scores and B(H) is the space of bounded linear invertible operators H → H, that maps a group element to an operator such that the group properties are preserved, i.e. $\mathcal{R}_g \mathcal{R}_h = \mathcal{R}_{gh}$ and $\mathcal{R}_e = I$. On SE(3) we define the left- and right-regular representations on a function $U \in L_2(SE(3))$ as

$$(\mathcal{L}_g \circ U)(h) = U(g^{-1}h) , \qquad g, h \in SE(3) , \qquad (6)$$

$$(\mathcal{Q}_g \circ U)(h) = U(h\,g) , \qquad g, h \in SE(3) . \qquad (7)$$

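The matrix Lie algebra [10] $T_e(SE(3))$ is spanned by the following basis:

$$X_1 = \begin{pmatrix} 0&0&0&1\\ 0&0&0&0\\ 0&0&0&0\\ 0&0&0&0 \end{pmatrix}, \quad
X_2 = \begin{pmatrix} 0&0&0&0\\ 0&0&0&1\\ 0&0&0&0\\ 0&0&0&0 \end{pmatrix}, \quad
X_3 = \begin{pmatrix} 0&0&0&0\\ 0&0&0&0\\ 0&0&0&1\\ 0&0&0&0 \end{pmatrix},$$

$$X_4 = \begin{pmatrix} 0&0&0&0\\ 0&0&-1&0\\ 0&1&0&0\\ 0&0&0&0 \end{pmatrix}, \quad
X_5 = \begin{pmatrix} 0&0&1&0\\ 0&0&0&0\\ -1&0&0&0\\ 0&0&0&0 \end{pmatrix}, \quad
X_6 = \begin{pmatrix} 0&-1&0&0\\ 1&0&0&0\\ 0&0&0&0\\ 0&0&0&0 \end{pmatrix}. \qquad (8)$$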

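The nonzero commutators can be found by $[X_i, X_j] = X_i X_j - X_j X_i$. By calculating the matrix exponentials we find the following matrix representation of the SE(3) group:

$$E_{(\mathbf{x},R)} = \exp(x X_1 + y X_2 + z X_3)\,\exp(\check\gamma X_4)\,\exp(\check\beta X_5)\,\exp(\check\alpha X_6) = \begin{pmatrix} R & \mathbf{x} \\ 0 & 1 \end{pmatrix} , \quad \text{with } R = R_{\check\gamma}^{e_x} R_{\check\beta}^{e_y} R_{\check\alpha}^{e_z} , \qquad (9)$$

where $\mathbf{x} = (x, y, z)^T$ is the translation vector and $(\check\alpha, \check\beta, \check\gamma)$ is a possible Euler angle parametrization of the rotation group SO(3), see [5, Chapter 7].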
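The commutators of the basis (8) can be tabulated mechanically. The Python sketch below builds the six 4 × 4 matrices with integer entries and prints every nonzero commutator $[X_i, X_j]$ with i < j as a signed basis element. The helper names are assumptions for illustration.

```python
def unit(i, j, value=1):
    """4x4 matrix with a single nonzero entry `value` at row i, column j (0-based)."""
    M = [[0] * 4 for _ in range(4)]
    M[i][j] = value
    return M

def add(A, B):
    return [[A[i][j] + B[i][j] for j in range(4)] for i in range(4)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def commutator(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(4)] for i in range(4)]

# Basis of the matrix Lie algebra as in (8).
X1, X2, X3 = unit(0, 3), unit(1, 3), unit(2, 3)   # translations
X4 = add(unit(2, 1), unit(1, 2, -1))              # rotation around x
X5 = add(unit(0, 2), unit(2, 0, -1))              # rotation around y
X6 = add(unit(1, 0), unit(0, 1, -1))              # rotation around z
BASIS = {1: X1, 2: X2, 3: X3, 4: X4, 5: X5, 6: X6}

def name_of(M):
    """Express M as +X_k or -X_k if possible, otherwise return None."""
    for k, X in BASIS.items():
        if M == X:
            return f"X{k}"
        if M == [[-v for v in row] for row in X]:
            return f"-X{k}"
    return None

for i in range(1, 7):
    for j in range(i + 1, 7):
        C = commutator(BASIS[i], BASIS[j])
        if any(any(row) for row in C):
            print(f"[X{i}, X{j}] = {name_of(C)}")
```

The translation generators commute with each other, while the rotation generators satisfy the familiar cyclic relations, e.g. $[X_4, X_5] = X_6$.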

2.3 Left-Invariance and Right-Invariance

An operator $\Phi : L_2(SE(3)) \to L_2(SE(3))$ is left-invariant if it commutes with the left-regular representation (6),

$$\forall g \in SE(3) : \quad \mathcal{L}_g \circ \Phi = \Phi \circ \mathcal{L}_g , \qquad (10)$$

and similarly an operator Φ is right-invariant if it commutes with the right-regular representation (7),

$$\forall g \in SE(3) : \quad \mathcal{Q}_g \circ \Phi = \Phi \circ \mathcal{Q}_g . \qquad (11)$$

In this work we aim at left-invariant operations and do not consider right-invariant operations sensible. The rationale behind this is clarified below. Define $\mathcal{W} : (SE(3) \to \mathbb{C}) \to (\mathbb{R}^3 \to \mathbb{C})$ to be the operator that calculates the orientation-marginal,

$$\mathcal{W}[U](x) = \int_{SO(3)} U(x, R)\,d\mu(R) , \qquad (12)$$

where dμ is the Haar measure, which is designed to fulfill the requirement

$$\int_{SO(3)} F(R)\,d\mu(R) = \int_{SO(3)} F(R \cdot R')\,d\mu(R) , \qquad \forall R' \in SO(3) . \qquad (13)$$

It is easy to derive that for the left-regular representation

$$\mathcal{U}_g \circ \mathcal{W} \circ U = \mathcal{W} \circ \mathcal{L}_g \circ U , \qquad \forall g \in SE(3) , \qquad (14)$$

where $\mathcal{U}$ is a representation of SE(3) on $L_2(\mathbb{R}^3)$ defined by $(\mathcal{U}_{(x',R')} f)(x) = f((R')^{-1}(x - x'))$. On the other hand, we note that

$$(\mathcal{W} \circ \mathcal{Q}_{(x,R)} \circ U)(x') = \int_{SO(3)} U(x' + R'x,\; R'R)\,d\mu(R') , \qquad (15)$$

which shows that the integration variable R' enters the spatial part, making it impossible to find a relation equivalent to (14) for the right-regular representation. In words, the left-regular representation "commutes" with $\mathcal{W}$, where $\mathcal{L}_g$ changes into $\mathcal{U}_g$ since the function space changes from SE(3) to $\mathbb{R}^3$, while it is not possible to find such a relation for the right-regular representation. This observation makes it sensible to require operators Φ to be left-invariant, i.e. $\mathcal{W} \circ \Phi \circ \mathcal{L}_g \circ U = \mathcal{W} \circ \mathcal{L}_g \circ \Phi \circ U = \mathcal{U}_g \circ \mathcal{W} \circ \Phi \circ U$ states that applying a group transformation ($\mathcal{L}_g$) on the input U renders the same result as applying the same group transformation ($\mathcal{U}_g$) on the orientation-marginal of the output.

2.4 Functions on SE(3) and $\mathbb{R}^3 \rtimes S^2$

In the beginning of this paper we defined a 3D orientation score u as a function of three spatial variables and only two angular variables describing a position on the sphere. However, since the sphere $S^2$ is isomorphic to the coset space SO(3)/SO(2), rather than the entire rotation group SO(3), such an orientation score is not a function on the entire Euclidean motion group SE(3), but rather a function on the coset space $SE(3)/(0 \times \mathrm{stab}(e_z))$. Here, $(0 \times \mathrm{stab}(e_z))$ denotes the subgroup of SE(3) consisting of rotations around the z-axis with translation 0, which is isomorphic to SO(2). Analogously to the isomorphism $SO(3)/SO(2) \cong S^2$, we have the isomorphism $SE(3)/(0 \times \mathrm{stab}(e_z)) \cong \mathbb{R}^3 \rtimes S^2$.


For the analysis it is more convenient to consider functions on $\mathbb{R}^3 \rtimes S^2$ as functions on the entire group SE(3) with the extra property of α-right-invariance. A function $\tilde{U} : SE(3) \to \mathbb{C}$ is defined to be α-right-invariant if $\mathcal{Q}_{(0,R_\alpha^{e_z})} \circ \tilde{U} = \tilde{U}$ for all α, that is,

$$\tilde{U}(x, R\,R_\alpha^{e_z}) = \tilde{U}(x, R) , \qquad \forall \alpha , \qquad (16)$$

where we write $\tilde{U}$ rather than U to make explicit in the notation that the function is α-right-invariant. We observe that the value of $\tilde{U}(x, R)$ is independent of a rotation around the z-axis applied on the right side, so $\tilde{U}$ can be identified one-to-one with an orientation score $u : \mathbb{R}^3 \rtimes S^2 \to \mathbb{C}$, as

$$\tilde{U}(x, R) = u(x, R \cdot e_z) , \qquad \text{where } \tilde{U} \text{ is α-right-invariant.} \qquad (17)$$

In this paper we will mostly work with the α-right-invariant function $\tilde{U}$, because it is more convenient to work with functions on the group.

2.5 SE(3)-Convolutions

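It can be shown that all operations on orientation scores that are linear and left-invariant can be expressed as an SE(3)-convolution, which is defined by

$$(\Psi *_{SE(3)} U)(g) = \int_{SE(3)} \Psi(h^{-1}g)\,U(h)\,dh . \qquad (18)$$

More explicitly this yields

$$(\Psi *_{SE(3)} U)(x, R) = \int_{\mathbb{R}^3} \int_{SO(3)} \Psi\big((R')^{-1}(x - x'),\; (R')^{-1}R\big)\,U(x', R')\,dx'\,d\mu(R') , \qquad (19)$$

where $d\mu(R')$ is defined in (13).

For an α-right-invariant $\tilde{U}$, cf. (16), we need to put additional requirements on the kernel Ψ. We require the result $\Psi *_{SE(3)} \tilde{U}$ to be α-right-invariant as well, leading to the following requirement:

$$\mathcal{Q}_{(0,R_{\alpha'}^{e_z})} \circ \big(\tilde\Psi *_{SE(3)} (\mathcal{Q}_{(0,R_\alpha^{e_z})} \circ \tilde{U})\big) = \tilde\Psi *_{SE(3)} \tilde{U} , \qquad \forall \alpha, \alpha' . \qquad (20)$$

This imposes requirements on the kernel $\tilde\Psi$. One can easily verify that the following properties hold for the SE(3)-convolution of (18):

$$\mathcal{Q}_g(\Psi *_{SE(3)} U) = (\mathcal{Q}_g \Psi) *_{SE(3)} U , \qquad \forall g \in SE(3) , \qquad (21)$$

$$(\mathcal{L}_g \Psi) *_{SE(3)} U = \Psi *_{SE(3)} (\mathcal{Q}_{g^{-1}} U) , \qquad \forall g \in SE(3) . \qquad (22)$$

Using the latter two equations, the left-hand side of (20) can now be rewritten as

$$\mathcal{Q}_{(0,R_{\alpha'}^{e_z})} \circ \big(\tilde\Psi *_{SE(3)} (\mathcal{Q}_{(0,R_\alpha^{e_z})} \circ \tilde{U})\big) = (\mathcal{Q}_{(0,R_{\alpha'}^{e_z})} \circ \tilde\Psi) *_{SE(3)} (\mathcal{Q}_{(0,R_\alpha^{e_z})} \circ \tilde{U}) = (\mathcal{L}_{(0,R_{-\alpha}^{e_z})} \circ \mathcal{Q}_{(0,R_{\alpha'}^{e_z})} \circ \tilde\Psi) *_{SE(3)} \tilde{U} . \qquad (23)$$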

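Therefore

$$\tilde\Psi = \mathcal{L}_{(0,R_{-\alpha}^{e_z})} \circ \mathcal{Q}_{(0,R_{\alpha'}^{e_z})} \circ \tilde\Psi , \qquad \text{for all } \alpha, \alpha' , \qquad (24)$$

so $\tilde\Psi$ is required to be both α-right-invariant and α-left-invariant (i.e. $\mathcal{L}_{(0,R_{\alpha'}^{e_z})} \circ \tilde{U} = \tilde{U}$ for all α'). More explicitly this yields

$$\tilde\Psi(x, R) = \tilde\Psi\big((R_\alpha^{e_z})^{-1}x,\; (R_\alpha^{e_z})^{-1} R\, R_{\alpha'}^{e_z}\big) , \qquad \text{for all } \alpha, \alpha' . \qquad (25)$$

3 Differential Geometry on SE(3)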

In [3] we introduced the basic differential geometry on SE(2). In this section we establish the same concepts for SE(3). We will introduce the left-invariant vector fields and left-invariant derivatives, and a procedure to estimate tangent vectors that locally fit best to elongated structures in 3D orientation scores. A more extensive description, including explicit expressions for e.g. curvature and torsion, can be found in [5, Chapter 7].

3.1 Left-Invariant Derivatives in SE(3)

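Using the matrix representation, cf. equation (9), left-invariant derivatives are given by

$$(\mathcal{A}_i U)(E_g) = \lim_{h \to 0} \frac{U(E_g \cdot \exp(h X_i)) - U(E_g)}{h} . \qquad (26)$$

The tangent space at g ∈ SE(3) is spanned by these vector fields, i.e. $T_g(SE(3)) = \mathrm{span}\{ \mathcal{A}_1|_g, \mathcal{A}_2|_g, \mathcal{A}_3|_g, \mathcal{A}_4|_g, \mathcal{A}_5|_g, \mathcal{A}_6|_g \}$, where we define $\mathcal{A}_i|_g (U) = (\mathcal{A}_i U)(E_g)$. Left-invariant derivatives $\mathcal{A}_1$, $\mathcal{A}_2$, and $\mathcal{A}_3$ can be implemented simply by approximating (26) using finite differences.

On an α-right-invariant function $\tilde{U}$, $\mathcal{A}_3\tilde{U}(g)$ is always α-right-invariant since $\exp(h X_3) = E_{(h e_z, I)}$ commutes with $E_{(0,R_\alpha^{e_z})}$. Furthermore, we always have $\mathcal{A}_6\tilde{U}(g) = 0$ for all g ∈ SE(3). The remaining left-invariant derivatives $\mathcal{A}_i\tilde{U}$ with i ∈ {1, 2, 4, 5} do not render α-right-invariant functions since these left-invariant derivatives are dependent on the value of α resp. $\check\alpha$. This implies that if one takes higher order derivatives one still needs to take all 6 left-invariant derivatives into account.

As an example, we derive the left-invariant Hessian $HU = \nabla(\nabla U)$ for α-right-invariant functions, where the gradient operator is $\nabla = (\mathcal{A}_1, \mathcal{A}_2, \ldots, \mathcal{A}_6)^T$. To this end, we first use the commutator relations to order the numbered left-invariant derivatives such that angular derivative $\mathcal{A}_1$ always appears on the left side and $\mathcal{A}_6$ always appears on the right side, and subsequently we can use $\mathcal{A}_6 U(g) = 0$ (which implies that $\mathcal{A}_i\mathcal{A}_6 U = 0$ for all i). This yields the following 5 × 6 Hessian matrix:

$$H\tilde{U} = \nabla(\nabla\tilde{U}) = \begin{pmatrix}
\mathcal{A}_1^2 & \mathcal{A}_1\mathcal{A}_2 & \mathcal{A}_1\mathcal{A}_3 & \mathcal{A}_1\mathcal{A}_4 & \mathcal{A}_1\mathcal{A}_5 - \mathcal{A}_3 & \mathcal{A}_2 \\
\mathcal{A}_1\mathcal{A}_2 & \mathcal{A}_2^2 & \mathcal{A}_2\mathcal{A}_3 & \mathcal{A}_2\mathcal{A}_4 + \mathcal{A}_3 & \mathcal{A}_2\mathcal{A}_5 & -\mathcal{A}_1 \\
\mathcal{A}_1\mathcal{A}_3 & \mathcal{A}_2\mathcal{A}_3 & \mathcal{A}_3^2 & \mathcal{A}_3\mathcal{A}_4 - \mathcal{A}_2 & \mathcal{A}_3\mathcal{A}_5 + \mathcal{A}_1 & 0 \\
\mathcal{A}_1\mathcal{A}_4 & \mathcal{A}_2\mathcal{A}_4 & \mathcal{A}_3\mathcal{A}_4 & \mathcal{A}_4^2 & \mathcal{A}_4\mathcal{A}_5 & \mathcal{A}_5 \\
\mathcal{A}_1\mathcal{A}_5 & \mathcal{A}_2\mathcal{A}_5 & \mathcal{A}_3\mathcal{A}_5 & \mathcal{A}_4\mathcal{A}_5 & \mathcal{A}_5^2 & -\mathcal{A}_4
\end{pmatrix} \tilde{U} . \qquad (27)$$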

Scale Spaces on the 3D Euclidean Motion Group


We use finite differences to calculate the left-invariant derivatives on orientation scores with a sampled domain $\mathbb{R}^3 \rtimes S^2$. To get a rotation matrix corresponding to an element $n$ of $S^2$ one can choose an arbitrary rotation matrix $R$ with $R \cdot e_z = n$. For first order centered finite differences one subsequently calculates

$$(\mathcal{A}_i U)(E_g) \approx \frac{1}{2h}\big(U(E_g \cdot \exp(h X_i)) - U(E_g \cdot \exp(-h X_i))\big). \tag{28}$$
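The centered difference (28) can be sketched as follows. This is only an illustration under stated assumptions: group elements are 4×4 homogeneous matrices, the basis ordering (X_1..X_3 translations, X_4..X_6 rotations about x, y, z) is assumed rather than taken from equation (9), and U is a hypothetical callable on such matrices, so the interpolation on the sampled domain that a real implementation needs is left out.

```python
import numpy as np

def se3_basis():
    """Assumed basis X_1..X_6 of se(3) as 4x4 matrices:
    X_1..X_3 generate translations along x, y, z,
    X_4..X_6 generate rotations about x, y, z."""
    X = np.zeros((6, 4, 4))
    for i in range(3):
        X[i, i, 3] = 1.0                     # translation generators
    X[3, 1, 2], X[3, 2, 1] = -1.0, 1.0       # rotation about x
    X[4, 0, 2], X[4, 2, 0] = 1.0, -1.0       # rotation about y
    X[5, 0, 1], X[5, 1, 0] = -1.0, 1.0       # rotation about z
    return X

def expm_series(M, terms=20):
    """Matrix exponential by a truncated Taylor series (fine for small M)."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

def left_invariant_derivative(U, E_g, i, h=1e-3):
    """Centered finite-difference estimate of (A_i U)(E_g), i in 1..6."""
    X_i = se3_basis()[i - 1]
    forward = U(E_g @ expm_series(h * X_i))
    backward = U(E_g @ expm_series(-h * X_i))
    return (forward - backward) / (2.0 * h)
```

With this convention, a function that depends only on the position x and on the direction R·e_z (an α-right-invariant function) gives a vanishing sixth derivative, in line with the statement that A_6 Ũ = 0.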

Note that this will require interpolations to be performed, both in the spatial dimensions and on the sphere. One should, however, always ensure that the result of the effective operator is independent of the specific choice of $R$. To this end, we have the following important relation between the left-invariant derivatives at $g_1$ and $g_2$ iff $g_1 = (x, R_1) \sim g_2 = (x, R_2)$:

$$\nabla \tilde U(g_1) = Z_{\alpha_1 - \alpha_2} \nabla \tilde U(g_2), \quad \text{with } Z_\alpha = R_\alpha \oplus (1) \oplus R_\alpha \oplus (1), \tag{29}$$

where $Z_{\alpha_1 - \alpha_2} \in \mathbb{R}^{6 \times 6}$ "converts" the left-invariant gradient at $g_2$ to the left-invariant gradient at $g_1$, the rotation matrix $R_\alpha$ is given by $R_\alpha = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}$, and the symbol "⊕" denotes the direct sum of matrices.

3.2 Estimation of Tangent Vectors in $\mathbb{R}^3 \rtimes S^2$

The exponential curves of SE(3) are found by (expressed in matrix form)

$$\gamma_c(t) = \exp\Big(t \sum_{j=1}^{6} c^j X_j\Big), \tag{30}$$

where $c = (c^1, c^2, \ldots, c^6)$ denotes the SE(3)-tangent vector components, which are elements of the tangent space at the unity element, $\sum_{j=1}^{6} c^j \mathcal{A}_j|_e \in T_e(SE(3))$, where we use the isomorphism between the Lie algebra and the left-invariant vector fields at the unity element, i.e. $\mathcal{A}_j|_e \leftrightarrow X_j$. We aim to estimate the locally best fitting exponential curve (in the previous subsection) at each position $g \in SE(3)$. Therefore, we formulate a minimization problem that minimizes over the "iso-contours" of the left-invariant gradient vector at position $g$, leading to the optimal tangent vector $c^*$:

$$c^*(g) = \arg\min_{c(g)} \Big\{ \Big\| \frac{d}{dt}\big(\nabla U(g\,\gamma_{c(g)}(t))\big)\Big|_{t=0} \Big\|_\mu^2 \;\Big|\; \|c(g)\|_\mu = 1 \Big\}, \tag{31}$$

where $\|\cdot\|_\mu$ denotes the norm on a vector in tangent space $T_g(SE(3))$ (i.e. the norm at the right side) resp. on a covector in the dual tangent space $T_g^*(SE(3))$. The norm on vectors is defined by $\|c\|_\mu = \sqrt{(c, c)_\mu}$ with the inner product $(c, c)_\mu = \mu^2 \sum_{j=1}^{3} c^j c^j + \sum_{j=4}^{6} c^j c^j$, where parameter μ ensures that all components of the inner product are dimensionless. The value of the parameter determines how the distance in the spatial dimensions relates to distance in the


E. Franken and R. Duits

orientation dimension. After some elementary math, we find that equation (31) can be expressed as

$$(M_\mu\, HU\, M_\mu)^T (M_\mu\, HU\, M_\mu)\, \tilde c^* = \lambda\, \tilde c^*, \tag{32}$$

where $M_\mu = \mathrm{diag}(1/\mu, 1/\mu, 1/\mu, 1, 1, 1)$ and $\tilde c^* = M_\mu^{-1} c^*$. This amounts to eigensystem analysis of the symmetric 6×6 matrix $(M_\mu\, HU\, M_\mu)^T (M_\mu\, HU\, M_\mu)$, where one of the three eigenvectors gives $\tilde c^*$. The eigenvector with the smallest corresponding eigenvalue is selected as tangent vector $\tilde c^*$, and the desired tangent vector $c^*$ is then given by $c^* = M_\mu \tilde c^*$.

Once the local tangent vector is found, it is of interest to obtain a measure for orientation confidence, which can be used for controlling the anisotropy factor of an adaptive diffusion process, as described for 2D in [2, 3]. Such a measure can be obtained by calculating the Laplacian in the five-dimensional (considering the full SE(3)) hyperplane orthogonal to the estimated tangent vector.
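The eigenvector selection in (32) might be sketched as below. The Hessian is passed in as a plain numeric array `H`, which is an assumption of this sketch: computing its entries from the orientation score is not shown, and for a 5×6 Hessian as in (27) the left scaling is taken to be the leading 5×5 block of M_μ, which is a guess at how the dimensions are meant to match.

```python
import numpy as np

def estimate_tangent(H, mu):
    """Return (c_star, eigenvalue) for a numeric Hessian H of shape (5, 6) or (6, 6).

    c_star is the eigenvector of (M H M)^T (M H M) with the smallest
    eigenvalue, mapped back by c_star = M_mu @ c_tilde.
    """
    M = np.diag([1.0 / mu, 1.0 / mu, 1.0 / mu, 1.0, 1.0, 1.0])
    rows = H.shape[0]
    B = M[:rows, :rows] @ H @ M             # assumed scaling for non-square H
    eigvals, eigvecs = np.linalg.eigh(B.T @ B)   # eigenvalues in ascending order
    c_tilde = eigvecs[:, 0]                 # unit vector, smallest eigenvalue
    return M @ c_tilde, eigvals[0]

def mu_norm(c, mu):
    """Norm induced by (c, c)_mu = mu^2 * sum_{j<=3} c_j^2 + sum_{j>=4} c_j^2."""
    c = np.asarray(c, dtype=float)
    return np.sqrt(mu**2 * np.sum(c[:3] ** 2) + np.sum(c[3:] ** 2))
```

Because `c_tilde` has unit Euclidean length, the returned vector automatically satisfies the constraint ‖c‖_μ = 1 of (31); its sign is arbitrary, as for any eigenvector.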

4 Diffusion on 3D Orientation Scores

The general left-invariant diffusion equation on SE(3) is given by

$$\begin{cases} \partial_t W(g, t) = \nabla \cdot D \nabla W(g, t) = \sum_{i=1}^{6} \sum_{j=1}^{6} \mathcal{A}_i D_{ij} \mathcal{A}_j W(g, t), \\ W(g, 0) = U(g), \end{cases} \tag{33}$$

where $W(\cdot, t)$ represents the diffused orientation score at time $t$. This equation generates the diffusion scale space on the 3D Euclidean motion group SE(3).

Next, we will derive which types of diffusions on SE(3) preserve the α-right-invariance of an α-right-invariant input function $\tilde W(g, 0) = \tilde U(g)$. In that case, the right-hand side of (33) becomes, using (29),

$$\nabla \cdot D(g_1) \nabla \tilde W(g_1) = \nabla \cdot Z^T_{\alpha_1 - \alpha_2} D(g_1) Z_{\alpha_1 - \alpha_2} \nabla \tilde W(g_2) = \nabla \cdot D(g_2) \nabla \tilde W(g_2), \tag{34}$$

which shows that diffusion is only valid (i.e., α-right-invariance-preserving) if

$$D(g_1) = Z_{\alpha_1 - \alpha_2} D(g_2) Z^T_{\alpha_1 - \alpha_2}, \quad \text{for all } g_1 \sim g_2. \tag{35}$$
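For a constant diffusion tensor, requirement (35) reduces to D = Z_α D Z_α^T for every angle α. A small numerical check of this special case could look as follows; the function names are hypothetical and the finite set of test angles is an assumption of the sketch, not a proof.

```python
import numpy as np

def rot2(alpha):
    """2x2 rotation matrix R_alpha."""
    return np.array([[np.cos(alpha), -np.sin(alpha)],
                     [np.sin(alpha),  np.cos(alpha)]])

def Z(alpha):
    """Z_alpha = R_alpha (+) (1) (+) R_alpha (+) (1) as a 6x6 matrix."""
    Zm = np.zeros((6, 6))
    Zm[0:2, 0:2] = rot2(alpha)
    Zm[2, 2] = 1.0
    Zm[3:5, 3:5] = rot2(alpha)
    Zm[5, 5] = 1.0
    return Zm

def is_valid_constant_tensor(D, n_angles=16, tol=1e-12):
    """Check D == Z_a D Z_a^T on n_angles equally spaced angles."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    return all(np.allclose(D, Z(a) @ D @ Z(a).T, atol=tol) for a in angles)
```

For instance, `is_valid_constant_tensor(np.diag([2, 2, 1, 0.5, 0.5, 0]))` holds, while a tensor with unequal first two diagonal entries fails the check.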

Next, we separately consider constant diffusion (diffusion tensor $D$ is constant for all $g \in SE(3)$) and adaptive diffusions (diffusion tensor $D$ varies).

Linear and Constant Diffusion: Suppose $D$ is an arbitrary diffusion tensor, which is not necessarily valid. One can always make it valid by taking the α-marginal to remove the dependency on α, i.e.

$$\begin{aligned} \frac{1}{2\pi} \int_0^{2\pi} \nabla \cdot D \nabla \tilde W(g, t)\, d\alpha &= \frac{1}{2\pi} \int_0^{2\pi} \nabla \cdot Z^T_{\alpha - \alpha_0} D Z_{\alpha - \alpha_0} \nabla \tilde W(g_0, t)\, d\alpha \\ &= \nabla \cdot \Big( \frac{1}{2\pi} \int_0^{2\pi} Z^T_{\alpha - \alpha_0} D Z_{\alpha - \alpha_0}\, d\alpha \Big) \nabla \tilde W(g_0, t) = \nabla \cdot \tilde D \nabla \tilde W(g_0, t), \end{aligned} \tag{36}$$

with $\tilde D = \frac{1}{2\pi} \int_0^{2\pi} Z_\alpha^T D Z_\alpha\, d\alpha$ and where $g = (x, R_{(\alpha, \beta, \gamma)})$ and $g_0 = (x, R_{(\alpha_0, \beta, \gamma)})$. So by considering only diffusion tensors $\tilde D$, α-right-invariance is preserved. All diffusion tensors $\tilde D$ have the form $\tilde D = \mathrm{diag}(A, A, B, C, C, 0)$ (where the sixth value is irrelevant since $\mathcal{A}_6 \tilde U = 0$). This corresponds to horizontal, zero-curvature and zero-torsion, linear diffusion.

Adaptive Diffusion: In case of adaptive diffusions, both linear and nonlinear, the diffusion above with adaptive $A$, $B$, and $C$ is valid as well, since the derivation in (36) can also be applied on an adaptive $D$. Furthermore, adaptive diffusion with diffusion tensor $D(g) = c(g)\, c(g)^T$, which can be interpreted as a diffusion process that only diffuses tangent to an exponential curve at each position $g \in SE(3)$ with tangent vector $c(g)$, is a valid diffusion as well. This can be easily seen by observing that $c(g_1) = Z_{\alpha_1 - \alpha_2} c(g_2)$ iff $g_1 \sim g_2$. This yields for the diffusion tensor $D$

$$D(g_1) = (Z_{\alpha_1 - \alpha_2} c(g_2))(Z_{\alpha_1 - \alpha_2} c(g_2))^T = Z_{\alpha_1 - \alpha_2}\, c(g_2)\, c(g_2)^T Z^T_{\alpha_1 - \alpha_2}, \tag{37}$$

which matches requirement (35). Furthermore, the sum of two valid diffusion tensors $D_1 + D_2$ forms a valid diffusion tensor again since

$$D_1(g_1) + D_2(g_1) = Z_{\alpha_1 - \alpha_2} D_1(g_2) Z^T_{\alpha_1 - \alpha_2} + Z_{\alpha_1 - \alpha_2} D_2(g_2) Z^T_{\alpha_1 - \alpha_2} = Z_{\alpha_1 - \alpha_2} (D_1(g_2) + D_2(g_2)) Z^T_{\alpha_1 - \alpha_2}. \tag{38}$$
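The α-averaging that defines the tensor D̃ in (36) can be checked numerically. The sketch below replaces the integral by an equally spaced sum, which is exact up to rounding here because the integrand is a trigonometric polynomial of degree two in α; `alpha_average` is a hypothetical helper name.

```python
import numpy as np

def Z(alpha):
    """Z_alpha = R_alpha (+) (1) (+) R_alpha (+) (1) as a 6x6 matrix."""
    c, s = np.cos(alpha), np.sin(alpha)
    Zm = np.eye(6)
    Zm[0:2, 0:2] = [[c, -s], [s, c]]
    Zm[3:5, 3:5] = [[c, -s], [s, c]]
    return Zm

def alpha_average(D, n=360):
    """Approximate (1 / 2 pi) * integral of Z_a^T D Z_a over a in [0, 2 pi)."""
    alphas = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return sum(Z(a).T @ D @ Z(a) for a in alphas) / n
```

Applied to a diagonal tensor, the averaging replaces the first two entries by their mean and likewise the fourth and fifth, so `alpha_average(np.diag([1, 3, 2, 4, 6, 5]))` is `diag(2, 2, 2, 5, 5, 5)` up to rounding.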

Therefore, in an adaptive setting one can also use a mixture between spatially-isotropic diffusion and diffusion along estimated exponential curves, i.e.

$$D(c, D_a) = (1 - D_a)\, \frac{\mu^2}{\|c\|_\mu^2}\, c\, c^T + D_a\, \mathrm{diag}(1, 1, 1, \mu^2, \mu^2, \mu^2), \tag{39}$$

where $D_a$ is the anisotropy factor. Both $D_a$ and $c$ are made dependent on the local structure in the orientation score. This diffusion process is analogous to the nonlinear curvature-adaptive diffusion process on 2D orientation scores that we have proposed in [2, 3].
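As an illustration, the mixed tensor (39) can be assembled directly from a tangent vector and an anisotropy factor. The function name `mixed_diffusion_tensor` is hypothetical, and how D_a and c are obtained from the local structure is not part of this sketch.

```python
import numpy as np

def mixed_diffusion_tensor(c, D_a, mu):
    """D(c, D_a) = (1 - D_a) * mu^2 / ||c||_mu^2 * c c^T
                   + D_a * diag(1, 1, 1, mu^2, mu^2, mu^2)."""
    c = np.asarray(c, dtype=float)
    # squared mu-norm: mu^2 on the three spatial components, 1 on the angular ones
    norm2 = mu**2 * np.sum(c[:3] ** 2) + np.sum(c[3:] ** 2)
    isotropic = np.diag([1.0, 1.0, 1.0, mu**2, mu**2, mu**2])
    return (1.0 - D_a) * (mu**2 / norm2) * np.outer(c, c) + D_a * isotropic
```

With D_a = 1 the tensor is the μ-isotropic one; with D_a = 0 it is a rank-one tensor that diffuses only along c.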

5 Results

We implemented linear, left-invariant and α-right-invariance-preserving diffusion on 3D orientation scores with $\tilde D = \mathrm{diag}(A, A, B, C, C, 0)$ using an explicit numerical scheme. The time derivative is taken as a first order forward finite difference. Spatially, we take second order centered finite differences for $\partial_x^2$, $\partial_y^2$, and $\partial_z^2$. In the orientation dimensions we calculate the Laplace operator on the sphere $\Delta_{S^2}$ via the spherical harmonic transform, where for stability a small regularization is applied via the spherical harmonic domain as well [11]. In Figure 2 we show a result of the linear SE(3)-diffusion process. In these examples an artificial three-dimensional HARDI dataset is created, to which Rician noise [12] is added. Next, we apply two different SE(3)-diffusions on


[Figure 2 panels: (a) t = 0, no noise; (b) t = 0, noisy; (c) t = 1, μ-isotropic, no noise; (d) t = 1, μ-isotropic, noisy; (e) t = 1, anisotropic, no noise; (f) t = 1, anisotropic, noisy.]

Fig. 2. Result of $\mathbb{R}^3 \rtimes S^2$-diffusion on an artificial HARDI dataset of two crossing lines where one of the lines is curved, with and without added Rician noise with σ = 0.17 (signal amplitude 1). Image size: 10×10×10 spatial and 162 orientations. Parameters of the μ-isotropic diffusion process: A = B = 1, C = 0.01. Parameters of the anisotropic diffusion process: A = 0.01, B = 1, C = $10^{-4}$.

both the noise-free and the noisy dataset. To visualize the result we use an experimental version of the DTI tool, which can visualize HARDI glyphs (recall Figure 1(a)) using the Q-ball visualization method [7]. In the results, all glyphs are scaled equivalently. The μ-isotropic diﬀusion does not preserve the anisotropy of the glyphs well; especially in the noisy case we observe that we get almost isotropic glyphs. With anisotropic diﬀusion, the anisotropy of the HARDI glyphs is preserved much better and in the noisy case the noise is clearly reduced. The resulting glyphs are, however, less directed than in the noise-free input image. This would improve when using nonlinear diﬀusion, or when adding some sort of “thinning” step in the method.
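The spatial part of the explicit scheme described above (first order forward difference in time, second order centered differences in space) might look as follows for the special case A = B, where the spatial operator reduces to B times the ordinary Laplacian. This is a simplified sketch, not the implementation used for Figure 2: periodic boundaries are assumed, the array layout (three spatial axes followed by an orientation axis) is an assumption, and the angular term computed via spherical harmonics is omitted.

```python
import numpy as np

def explicit_spatial_step(W, dt, B, h=1.0):
    """One forward-Euler step of dW/dt = B * (d_xx + d_yy + d_zz) W.

    W has shape (nx, ny, nz, n_orientations); axes 0..2 are spatial.
    Periodic boundary conditions are assumed (np.roll wraps around).
    """
    laplacian = np.zeros_like(W)
    for axis in range(3):
        laplacian += (np.roll(W, 1, axis=axis) - 2.0 * W
                      + np.roll(W, -1, axis=axis)) / h**2
    return W + dt * B * laplacian
```

For this three-dimensional stencil the usual stability bound of the explicit scheme is dt ≤ h² / (6 B); larger steps can amplify high-frequency components.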

6 Conclusions

In this paper we have shown that we can map all techniques of our previous work on 2D orientation scores to the more complicated case of 3D orientation scores. Some issues require special attention. Especially the fact that we usually have to deal with the coset space $SE(3)/(0 \times \mathrm{stab}(e_z)) \cong \mathbb{R}^3 \rtimes S^2$ has been emphasized as an important issue. We have shown that we can consider functions $\mathbb{R}^3 \rtimes S^2 \to \mathbb{C}$ as functions on SE(3) which are α-right-invariant. The required preservation of α-right-invariance imposed additional constraints on the SE(3)-convolution kernel and the allowed types of (non)linear diffusion. The results suggest that even anisotropic linear diffusion on SE(3) is a useful way to denoise HARDI data. Future work should include the implementation and evaluation of nonlinear SE(3)-diffusion.

References

1. Weickert, J.A.: Coherence-enhancing diffusion filtering. International Journal of Computer Vision 31(2/3), 111–127 (1999)
2. Franken, E., Duits, R., ter Haar Romeny, B.M.: Nonlinear diffusion on the 2D Euclidean motion group. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 461–472. Springer, Heidelberg (2007)
3. Franken, E., Duits, R.: Crossing-preserving coherence-enhancing diffusion on invertible orientation scores. International Journal of Computer Vision (IJCV) (to appear, 2009)
4. Özarslan, E., Mareci, T.H.: Generalized diffusion tensor imaging and analytical relationships between diffusion tensor imaging and high angular resolution imaging. Magnetic Resonance in Medicine 50, 955–965 (2003)
5. Franken, E.: Enhancement of Crossing Elongated Structures in Images. PhD thesis, Eindhoven University of Technology, Department of Biomedical Engineering, Eindhoven, The Netherlands (2008)
6. Özarslan, E., Shepherd, T.M., Vemuri, B.C., Blackband, S.J., Mareci, T.H.: Resolution of complex tissue microarchitecture using the diffusion orientation transform (DOT). NeuroImage 31, 1086–1103 (2006)
7. Descoteaux, M., Angelino, E., Fitzgibbons, S., Deriche, R.: Regularized, fast, and robust analytical Q-ball imaging. Magnetic Resonance in Medicine 58(3), 497–510 (2007)
8. Jian, B., Vemuri, B.C., Özarslan, E., Carney, P.R., Mareci, T.H.: A novel tensor distribution model for the diffusion-weighted MR signal. NeuroImage 37, 164–176 (2007)
9. Florack, L.: Codomain scale space and regularization for high angular resolution diffusion imaging. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2008. CVPR Workshops 2008, June 2008, pp. 1–6 (2008)
10. Varadarajan, V.: Lie groups, Lie algebras, and their representations. Prentice-Hall, Englewood Cliffs (1974)
11. Kin, G., Sato, M.: Scale space filtering on spherical pattern. In: Proc. 11th international conference on Pattern Recognition, vol. C, pp. 638–641 (1992)
12. Macovski, A.: Noise in MRI. Magnetic Resonance in Medicine 36(3), 494–497 (1996)

On the Rate of Structural Change in Scale Spaces David Gustavsson, Kim S. Pedersen, Francois Lauze, and Mads Nielsen DIKU, University of Copenhagen Universitetsparken 1, DK-2100 Copenhagen Ø, Denmark {davidg,kimstp,francois,madsn}@diku.dk

Abstract. We analyze the rate at which image details are suppressed as a function of the regularization parameter, using first order Tikhonov regularization, Linear Gaussian Scale Space and Total Variation image decomposition. The squared $L^2$-norm of the regularized solution and the residual are studied as a function of the regularization parameter. For first order Tikhonov regularization it is shown that the norm of the regularized solution is a convex function, while the norm of the residual is not a concave function. The same result holds for Gaussian Scale Space when the parameter is the variance of the Gaussian, but may fail when the parameter is the standard deviation. Essentially, this implies that the norm of the regularized solution cannot be used for global scale selection because it does not contain enough information. An empirical study based on synthetic images as well as a database of natural images confirms that the squared residual norms contain important scale information.

Keywords: Regularization, Tikhonov Regularization, Scale Space, TV, Total Variation, Geometric Structure, Texture.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 832–843, 2009. © Springer-Verlag Berlin Heidelberg 2009

1 Introduction

Images contain a mix of different types of information, from fine scale stochastic textures to large scale geometric structures. Image regularization can be viewed as approximating the observed original image with a simpler image, where simpler is defined by the regularization (prior) term and the regularization parameter λ. Here an image is considered to be simpler if it is smoother (or piece-wise smoother). Regularization can also be viewed as decomposing the observed image into a regularized (smooth) component and a small scale texture/noise component (called the residual, because it is the difference between the regularized solution and the observed image). By increasing the regularization parameter λ, smoother and smoother approximations are generated.

The rate at which image details are suppressed as a function of the regularization parameter depends on the image content and the regularization method. The image residual contains the details that are suppressed during the regularization, and the norm of the residual is a measurement of the amount of details that are suppressed. The norm of the residual as a function of the regularization parameter gives important information about the image content. For images containing small scale structure, many details are suppressed even for small λ and the norm of the residual will be large for small λ. For images containing solely large scale geometric structures, few details will be suppressed for small λ and the norm of the residual will be small. The rate at which details are suppressed can be viewed as the derivative of the norm of the residual with respect to the regularization parameter, and reveals the amount of details that are suppressed if the regularization parameter increases.

First order Tikhonov regularization, Gaussian linear scale space (which is equivalent to infinite order Tikhonov regularization [1]) and Total Variation image decomposition are studied. The squared $L^2$-norm of the regularized solution and the residual are studied as functions of the regularization parameter. Of special interest is the convexity/concavity of those norms viewed as functions, because it relates to the possibility that the rate at which details are suppressed can increase/decrease.

In section 2, first order Tikhonov regularization is revisited and it is shown that the norm of the regularized solution is a convex function, while the norm of the residual is not a concave function. In section 3, linear Gaussian Scale Space is revisited, and it is shown that the norm of the regularized solution is convex as a function of the Gaussian variance, or equivalently diffusion time, but may fail to be convex when the parameter is the Gaussian standard deviation. The squared norm of the residual is in general not a concave function of its parameter. In section 4, Total Variation (TV) image decomposition is revisited. In section 5 experimental results are presented: the Sinc function, a synthetic image containing image structures at different scales, and natural images are studied. These studies tend to show that the squared residual norm contains scale information, particularly at values where the local convexity/concavity behavior changes.

1.1 Related Work

Characterization of images by analyzing the behavior of the norm of the regularized solution and the residual as functions of the regularization parameter has not received much research attention.
Sporring and Weickert [2, 3] view images as distributions of light quanta and use information theory to study the structure of images in scale space. The entropy of an image as a function of the scale (in scale-space) is analyzed and shown to be an increasing function of the scale. The result holds both for linear Gaussian scale space and non-linear scale-space. Furthermore, the derivative of the entropy with respect to the scale is shown, empirically, to be a good texture descriptor. The derivative of the scale-space entropy function with respect to the scale is a global measure of how much the entropy of an image changes at different scales. Where Sporring and Weickert study monotone functions of images across scale, we study norms of the scale space image and residual. Buades et al. [4] introduced the concept of Method Noise in denoising. The Method Noise is the image details that are removed in the denoising, i.e. the residual image, and its content is used for comparing denoising methods. The residual image has often been used to determine the optimal regularization parameter. (See Thompson et al. [5] for a classical study.) Selection of the optimal stopping time for diffusion filtering was studied by Mrazek and Navara [6], which also relates to the Lyapunov functionals studied by Weickert [7].


1.2 Convexity, Fourier Transforms, Power Spectra

Recall that a function $f(x)$ defined on a convex set $C$ is convex if $f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$ for all $0 \le \lambda \le 1$ and for all $x, y \in C$. If $f(x)$ is convex on a convex set $C$ then $-f(x)$ is said to be concave on $C$. When $f(x)$ is twice-differentiable, a necessary and sufficient condition for convexity is

$$\forall x \in C, \quad f''(x) \ge 0 \tag{1}$$

(in the multidimensional case, the Hessian matrix is positive semi-definite). Two elementary facts will be used in the sequel:

1) Let $h(\lambda)$ be a function of the form

$$h(\lambda) = \int d(\lambda, x)\, s(x)\, dx \tag{2}$$

where $d(\lambda, x)$ is convex in λ and $s(x) \ge 0$; then $h(\lambda)$ is convex.

2) Assume that $f(x) = h(g(x))$ where $g : \mathbb{R}^n \to \mathbb{R}^k$ and $h : \mathbb{R}^k \to \mathbb{R}$. Then
– if $h$ is convex and non-decreasing and $g$ is convex, then $f$ is convex,
– if $h$ is convex and non-increasing and $g$ is concave, then $f$ is convex.

The Fourier transform of a function $f$ is denoted by $\hat f$. Parseval's theorem asserts that this is an isometry of $L^2$: $\|f\|_{L^2} = \|\hat f\|_{L^2}$, where

$$\|f(x, y)\|^2 = \iint |f(x, y)|^2\, dx\, dy. \tag{3}$$

The frequency domain variables are denoted $(\omega_x, \omega_y) =: \omega$. The power spectrum of a function $f$ is the function $\omega \mapsto |\hat f(\omega)|^2$. $f$ is said to follow an (α-)power law if $|\hat f(\omega)| \sim C/|\omega|^\alpha$, where $C$ and α are some constants. It is well known that the power spectra computed over a large ensemble of natural images approximate a power law in spatial frequencies with α around 1.7, or at least in (0, 2) [8, 9].

We often use implicitly the following classical result from Calculus. Let $B := B(0, 1)$ be the unit ball of $\mathbb{R}^n$ and $B^c$ its complement. Let $g$ be a positive function defined on $\mathbb{R}^n$. Assume that $g \sim \|x\|^{-\alpha}$ in $B$ (resp. $B^c$). Then $\int_B g\, dx < \infty$ if and only if $\alpha < n$ (resp. $\int_{B^c} g\, dx < \infty$ if and only if $\alpha > n$).

Finally, to conclude this paragraph, given a regularization, the functions $s(\lambda)$ and $r(\lambda)$ will denote the squared $L^2$-norm of respectively the regularized solution and the residual as a function of the regularization parameter λ.

2 Tikhonov Regularization

The first order Tikhonov regularization is defined as the minimizer of the energy functional

$$E_\lambda[f] = \iint (f - g)^2 + \lambda |\nabla f|^2\, dx\, dy \tag{4}$$

where $g$ is the observed data and λ is the regularization parameter. The energy functional is composed of two terms: the data fidelity term $\|f - g\|_2^2$ and the regularization term $\|\nabla f\|_2^2$. Note that the Wiener filter can be regarded as a Tikhonov regularization method applied to the Fourier domain. Thanks to Parseval's theorem all calculations can be performed in the Fourier domain, where this energy becomes

$$\hat E_\lambda[\hat f] = \iint |\hat f - \hat g|^2 + \lambda (\omega_x^2 + \omega_y^2) |\hat f|^2\, d\omega_x\, d\omega_y. \tag{5}$$

Using the Calculus of Variations, a necessary condition for a function $f$ to minimize the functional (4) is given by its Euler-Lagrange equation: $(f - g) - \lambda \Delta f = 0$. In the Fourier domain, it becomes

$$\hat f - \hat g + \lambda (\omega_x^2 \hat f + \omega_y^2 \hat f) = 0, \quad \text{i.e.} \quad \hat f = \frac{\hat g}{1 + \lambda |\omega|^2}, \tag{6}$$

that is, the original signal multiplied with the filter function $F(\lambda, \omega) = \frac{1}{1 + \lambda |\omega|^2}$, which is a non-increasing convex function w.r.t. λ (for λ ≥ 0). Set $d(\lambda, \omega) = F(\lambda, \omega)^2$. It is important to remark that defining the regularization in the frequency domain by $\lambda \mapsto F(\lambda, \omega)\hat g(\omega)$ extends Tikhonov regularization beyond the case where $g \in W^{1,2}(\mathbb{R}^2)$, the Sobolev space of $L^2$ functions with $L^2$ weak derivatives, which is the natural space for Tikhonov regularization as defined by minimization of (4). Indeed, the corresponding function $s(\lambda)$ is given by

$$s(\lambda) = \|F(\lambda, \omega)\hat g\|_2^2 = \int d(\lambda, \omega) |\hat g|^2\, d\omega. \tag{7}$$
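The Fourier-domain filtering in (6) and the squared norms s(λ) and r(λ) can be sketched for a discrete image as follows. This assumes a periodic image and the discrete Fourier transform with frequencies in radians per pixel, which only approximates the continuous setting of the text; `tikhonov_norms` is a hypothetical helper name.

```python
import numpy as np

def tikhonov_norms(g, lam):
    """Filter g with F = 1 / (1 + lam * |omega|^2) in the DFT domain.

    Returns (f, s, r): the regularized image, its squared L2-norm,
    and the squared L2-norm of the residual g - f.
    """
    ny, nx = g.shape
    wy = 2.0 * np.pi * np.fft.fftfreq(ny)          # radians per pixel
    wx = 2.0 * np.pi * np.fft.fftfreq(nx)
    w2 = wy[:, None] ** 2 + wx[None, :] ** 2       # |omega|^2 on the DFT grid
    F = 1.0 / (1.0 + lam * w2)
    f = np.real(np.fft.ifft2(F * np.fft.fft2(g)))  # F is even, so f is real
    return f, float(np.sum(f ** 2)), float(np.sum((g - f) ** 2))
```

Evaluating `s` on an equally spaced grid of λ values and taking second differences gives a quick numerical check of the convexity stated in Proposition 1.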

This is the integral of the squared filter function times the power spectrum of the original signal $g$, and we have the following result:

Proposition 1. The squared $L^2$-norm $s(\lambda)$ of the minimizer of the Tikhonov regularization functional as a function of the regularization parameter λ is, for non-trivial images, a monotonically decreasing convex function (for $\lambda \in (0, \infty)$), when it exists. If $g$ follows an α-power law, then from the Calculus fact recalled in the previous section, $g \notin L^2(\mathbb{R}^n)$; however $s(\lambda)$, $s'(\lambda)$ and $s''(\lambda)$ exist and are finite for λ > 0 if and only if $\alpha \in (0, 2)$ (which is the case for natural images). Both $s'$ and $s''$ diverge for $\lambda \to 0^+$.

The square of a non-negative, non-increasing convex function is a convex function, and from Section 1.2 we have the first part of the proposition. Now

$$d_\lambda(\lambda, \omega) = -\frac{2|\omega|^2}{(1 + \lambda|\omega|^2)^3}, \qquad d_{\lambda\lambda}(\lambda, \omega) = \frac{6|\omega|^4}{(1 + \lambda|\omega|^2)^4},$$

$s'(\lambda) = \int d_\lambda(\lambda, \omega) |\hat g|^2\, d\omega$ and $s''(\lambda) = \int d_{\lambda\lambda}(\lambda, \omega) |\hat g|^2\, d\omega$, and the rest of the proposition follows by elementary analysis.

Set $R(\lambda, \omega) = 1 - F(\lambda, \omega)$ and $e(\lambda, \omega) = R(\lambda, \omega)^2$. The Fourier image residual is $R(\lambda)\hat g$ and its squared norm is

$$r(\lambda) = \|R(\lambda, \omega)\hat g\|^2 = \int e(\lambda, \omega) |\hat g|^2\, d\omega.$$

An elementary calculation gives $e_\lambda(\lambda, \omega) = 2\lambda|\omega|^4 / (1 + \lambda|\omega|^2)^3$, and this function is, for λ fixed, bounded in ω while it satisfies

$$\forall \omega, \quad \lim_{\lambda \to 0^+} e_\lambda(\lambda, \omega) = 0, \qquad \lim_{\lambda \to \infty} e_\lambda(\lambda, \omega) = 0.$$

The same holds for $r'(\lambda)$ when it is finite, and therefore by the mean value theorem, as it is positive, it must have a maximum and $r''(\lambda)$ must change sign, and we can state the following:

Proposition 2. Assume first that $g \in W^{1,2}(\mathbb{R}^2)$ is non trivial. Then, although $s(\lambda)$ is convex and decreasing, the squared norm residual $r(\lambda)$ of Tikhonov regularization, while increasing from 0 to $\|g\|_2^2$, is neither concave nor convex.

Note that when $g$ follows an α-power law with $\alpha \in (0, 2)$, $g \notin L^2(\mathbb{R}^2)$ while its regularization $g_\lambda$ is when λ > 0, thus $g - g_\lambda \notin L^2(\mathbb{R}^2)$ and $r(\lambda) = \|g - g_\lambda\|_2^2 = +\infty$.
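The sign change behind Proposition 2 is already visible at a single frequency. Differentiating e(λ, ω) = (λ|ω|² / (1 + λ|ω|²))² twice with respect to λ gives 2|ω|⁴(1 − 2λ|ω|²) / (1 + λ|ω|²)⁴, which is positive for λ < 1/(2|ω|²) and negative beyond. The sketch below writes `w2` for |ω|² (a name chosen here) so that the closed forms can be compared with finite differences.

```python
def e(lam, w2):
    """Squared residual filter e = (lam*w2 / (1 + lam*w2))^2, with w2 = |omega|^2."""
    return (lam * w2 / (1.0 + lam * w2)) ** 2

def e_lam(lam, w2):
    """First derivative of e with respect to lam."""
    return 2.0 * lam * w2 ** 2 / (1.0 + lam * w2) ** 3

def e_lamlam(lam, w2):
    """Second derivative of e with respect to lam; changes sign at lam = 1 / (2*w2)."""
    return 2.0 * w2 ** 2 * (1.0 - 2.0 * lam * w2) / (1.0 + lam * w2) ** 4
```

For |ω|² = 4 the inflection point is at λ = 0.125, so each frequency contributes a convex piece for small λ and a concave piece for large λ.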

3 Linear Scale-Space and Regularization

Linear scale-space theory [10, 11, 12] deals with simplified coarse scale representations of an image $g$, generated by solving the diffusion (heat) equation with initial value $g$:

$$\frac{\partial f}{\partial t} = \Delta f, \qquad f(-, 0) = g(-) \tag{8}$$

where $\Delta = \partial_{xx} + \partial_{yy}$ is the Laplacian. Equivalently, this coarse scale representation can be obtained by convolution with a Gaussian kernel:

$$f_\sigma = g * G_\sigma, \qquad G_\sigma(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{9}$$

and the link between the two formulations is given by $f_\sigma = f(-, \sigma^2/2)$. A third formulation of Linear Scale-Space is obtained as "infinite order" Tikhonov regularization; the 1-dimensional case was introduced by Nielsen et al. in [1]. In dimension 2, one defines for λ > 0

$$E[f] = \iint (f - g)^2\, dx\, dy + \sum_{k=1}^{\infty} \frac{\lambda^k}{k!} \sum_{\ell=0}^{k} \binom{k}{\ell} \iint \Big( \frac{\partial^k f}{\partial x^\ell\, \partial y^{k-\ell}} \Big)^2 dx\, dy \tag{10}$$

where $\binom{k}{\ell}$ is the $(\ell, k)$-binomial coefficient. By a direct computation, its associated Euler-Lagrange equation is given by

$$f - g + \sum_{k=1}^{\infty} \frac{(-1)^k \lambda^k}{k!} \Delta^k f = 0$$

where $\Delta^k$ is the $k$-th iterated Laplacian

$$\Delta^k = \underbrace{\Delta \circ \cdots \circ \Delta}_{k \text{ times}} = \sum_{\ell=0}^{k} \binom{k}{\ell} \frac{\partial^{2k}}{\partial x^{2\ell}\, \partial y^{2(k-\ell)}}.$$

Via Fourier Transform, the Laplacian operator becomes the multiplication by $-|\omega|^2$ operator and, as in 1st order Tikhonov regularization, the solution is given by filtering:

$$\hat f = \frac{\hat g}{1 + \sum_{k=1}^{\infty} \frac{\lambda^k |\omega|^{2k}}{k!}} = e^{-\lambda|\omega|^2} \hat g. \tag{11}$$

The solution of the filtering problem for a given λ > 0 is the same as solving (8) with $t = \lambda$. By setting $\lambda = \sigma^2/2$ and applying the convolution theorem to (9) one gets the above equation. Using the Fourier formulation, the squared norm $s(\lambda)$ of the solution (11) at λ and the squared-norm residual $r(\lambda)$ are given by

$$s(\lambda) = \|e^{-\lambda|\omega|^2} \hat g\|_2^2 = \int e^{-2\lambda|\omega|^2} |\hat g(\omega)|^2\, d\omega,$$

$$r(\lambda) = \|(1 - e^{-\lambda|\omega|^2}) \hat g\|_2^2 = \int \big(1 - e^{-\lambda|\omega|^2}\big)^2 |\hat g(\omega)|^2\, d\omega.$$

If one defines $d(\lambda, \omega) = e^{-2\lambda|\omega|^2}$ and $e(\lambda, \omega) = (1 - e^{-\lambda|\omega|^2})^2$, they have, with respect to convexity/concavity, the same properties as their Tikhonov counterparts defined in the previous section, and one can state the following, in terms of the heat equation / Gaussian variance:

Proposition 3.
1. The squared $L^2$-norm $s(t)$ of the solution of the heat equation as a function of the diffusion "time" $t$ (or equivalently the convolution by the Gaussian kernel as a function of the kernel variance) is, for non-trivial images, a monotonically decreasing convex function (for $t \in (0, \infty)$), when it exists.
2. The squared norm residual $r(t)$ of the solution of the heat equation at time $t$, while increasing from 0 to $\|g\|_2^2$, is neither concave nor convex.

If, instead of using the diffusion time / variance as parameter, one uses the standard deviation σ of the Gaussian kernel, the resulting solution squared norm function $s(\sigma)$, although decreasing, may fail to be convex, as the function $\sigma \mapsto e^{-\sigma^2|\omega|^2}$ is not convex in σ: it is a half Gaussian bell. A simple example showing the convexity failure is provided by the band limited function $b$ whose Fourier transform is $\hat b(\omega) = 1$ if $|\omega| \le 1$ and $\hat b(\omega) = 0$ otherwise. A direct calculation gives

$$s(\sigma) = \frac{\pi \big(1 - e^{-\sigma^2}\big)}{\sigma^2}$$

which is neither convex nor concave. On the other hand, for a function $g$ following an α-power law with α < 2, $s(\sigma)$ seems to be convex (for instance if α = 0, $s(\sigma) = \pi/\sigma^2$; if α = 1, $s(\sigma) = \pi^{3/2}/\sigma^2$). If, again, the power spectrum of the image $g$ follows a power law in spatial frequencies, then while its regularized $L^2$-norm is finite, the residual norm is not, as the initial datum is not square-integrable.
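The band-limited example can be checked numerically. Assuming the squared filter e^{−σ²|ω|²} and a Fourier transform equal to 1 on the unit disk, integrating in polar coordinates gives s(σ) = π(1 − e^{−σ²})/σ². The sketch below compares this closed form with a simple radial quadrature and uses a second difference to look at the curvature; the helper names are hypothetical.

```python
import numpy as np

def s_bandlimited(sigma):
    """Closed form of the integral of exp(-sigma^2 |w|^2) over the unit disk."""
    return np.pi * (1.0 - np.exp(-sigma ** 2)) / sigma ** 2

def s_quadrature(sigma, n=20001):
    """Same integral in polar coordinates, by the trapezoid rule on [0, 1]."""
    rad = np.linspace(0.0, 1.0, n)
    y = 2.0 * np.pi * rad * np.exp(-(sigma * rad) ** 2)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(rad)) / 2.0)

def second_difference(fn, x, h=1e-3):
    """Centered second difference, an estimate of fn''(x)."""
    return (fn(x + h) - 2.0 * fn(x) + fn(x - h)) / h ** 2
```

The second difference is negative at σ = 0.5 and positive at σ = 3, so the curve is concave for small σ and convex for large σ, which illustrates why s(σ) is neither convex nor concave.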

4 Total Variation Image Decomposition

Bounded Variation image modeling was introduced in the seminal work of Rudin et al. in [13], where the following variational image denoising problem is considered. Given an image $g$ and λ > 0, find the minimizer of the following energy

$$E(f; g, \lambda) = \iint (g - f)^2\, dx\, dy + \lambda \iint |\nabla f|\, dx\, dy \tag{12}$$

The regularized image $f_\lambda$ can be interpreted as a denoised version of $g$, but also as the "geometric" content of $g$, while the residual $v_\lambda = g - f_\lambda$ contains the "noise/fine texture" component. Several methods have been proposed to solve the above problem, by solving a regularized form of the Euler-Lagrange equation of the functional

$$f - g - \lambda \nabla \cdot \frac{\nabla f}{|\nabla f|} = 0$$

where $\nabla \cdot$ denotes the divergence operator, but also for instance the non linear projection method of Chambolle [14], which we have used in this work. λ is a regularization parameter that determines the level of details that ends up in the (noise/texture) component $v_\lambda$. As λ increases, $v_\lambda$ will contain details of larger and larger scale that will not appear in $f_\lambda$. Again it is interesting to see how the image content changes as λ increases. The component $v_\lambda$ is the residual of the regularization and contains the details that are suppressed in the cartoon component $f_\lambda$, and we set

$$r(\lambda; g) = \|v_\lambda\|_2^2 = \|g - f_\lambda\|_2^2 \tag{13}$$

i.e. the squared $L^2$-norm of the residual image as a function of the regularization parameter λ. Related to the norm of the residual is the norm of the cartoon component as a function of λ

$$s(\lambda; g) = \|f_\lambda\|_2^2 \tag{14}$$

$s'(\lambda)$ encodes the rate at which details are suppressed in the cartoon component $f_\lambda$. Due to the high non linearity of the TV-regularization problem, there is no relatively simple expression for $s(\lambda)$, $r(\lambda)$ and their respective derivatives. A norm study for the dual norm of the TV norm was done by Meyer in [15]. A more direct behavior for the 2-norm can be computed in a few cases. For instance Strong and Chan [16] showed that if $g$ is the function $g(x) = 1$ if $x \in B(0, 1)$, the unit disk, and $g(x) = 0$ if $x \notin B(0, 1)$, then its regularization has the form $cg$, where $c \in (0, 1)$ is a constant, therefore attenuating the contrasts of the image. In general, we cannot expect this type of simple result. We have instead decided to study the behavior of these functions experimentally on an image database.

5 Experiments

5.1 Sinc in Scale Space

Let $g(x) = \sin(x)/x$ be the Sinc function where $x \in (-\infty, \infty)$. The squared $L^2$-norm of the residual as a function of the regularization parameter is in the Tikhonov case

$$r(\lambda) = \int_{-1}^{1} \Big( \frac{\lambda x^2}{1 + \lambda x^2} \Big)^2 dx \tag{15}$$

and in the scale space case

$$r(\sigma) = \int_{-1}^{1} \Big( 1 - e^{-\frac{\omega^2 \sigma^2}{2}} \Big)^2 d\omega. \tag{16}$$
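Both residual norms (15) and (16) are one-dimensional integrals over [−1, 1] and can be evaluated by simple quadrature. The sketch below uses a hand-written trapezoid rule on a fixed grid and a centered second difference to inspect the curvature; the grid size and the step h are arbitrary choices of this sketch.

```python
import numpy as np

OMEGA = np.linspace(-1.0, 1.0, 20001)   # quadrature grid on [-1, 1]

def trapezoid(y, x):
    """Composite trapezoid rule."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def r_tikhonov(lam):
    """Residual norm (15) for the Tikhonov filter."""
    return trapezoid((lam * OMEGA ** 2 / (1.0 + lam * OMEGA ** 2)) ** 2, OMEGA)

def r_scale_space(sigma):
    """Residual norm (16) for the Gaussian scale space filter."""
    return trapezoid((1.0 - np.exp(-OMEGA ** 2 * sigma ** 2 / 2.0)) ** 2, OMEGA)

def second_difference(fn, x, h=1e-3):
    """Centered second difference, an estimate of fn''(x)."""
    return (fn(x + h) - 2.0 * fn(x) + fn(x - h)) / h ** 2
```

In both cases the second difference is positive for a small parameter value and negative for a large one, so the residual norm is increasing but starts out convex and is therefore not concave.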

[Figure 1 plots: (a) Tikhonov regularization: residual norm, first and second order derivative. (b) Scale Space: residual norm, first and second order derivative.]

Fig. 1. The residual norm as a function of the regularization parameter for $g(x) = \sin(x)/x$. The plots clearly indicate that the residual norm functions are, in both cases, increasing functions, but not concave.

The result is presented in figure 1. The plots clearly indicate that the residual norm is, in both cases, not concave.

5.2 Black Squares with Added Gaussian Noise

The first experiment is done on an artificially generated 100 × 100 image containing four 3 × 3 black squares, one 20 × 20 black square and added Gaussian white noise with $\sigma^2 = 12$. The white background has intensity 125 and the black squares 10; after the noise has been added the image is normalized to zero mean.

In figure 2 the regularized and residual images are shown for increasing regularization using first order Tikhonov regularization. As the small scale noise is suppressed, the large scale geometric structures are also smoothed out. The norm of the residual is an increasing function of the scale and it seems to be concave, and in fact it can be concave for the shown λ. However, λ may be small at the inflection point.

In figure 3 the regularized and residual images are shown for increasing regularization using linear Gaussian scale space. The results for the linear Gaussian scale-space are similar to the results using first order Tikhonov regularization.

In figure 4 the regularized and residual images are shown for increasing regularization using Total Variation image decomposition. The different structures are suppressed at different λ while the large scale structures are well preserved. At λ = 12 the Gaussian white noise is suppressed, at λ = 210 the small boxes are removed, and finally the large box is suppressed at λ = 550. The residual norm as a function of the regularization parameter is not a concave function of λ.

840

D. Gustavsson et al.

[Plots: Residual Norm and First Order Derivative of the Residual Norm, each against Regularization Parameter (0 to 200)]

Fig. 2. Result for the squares and noise image using first order Tikhonov regularization. On the first row the regularized and the residual images for λ = 3, 10, 20 and 50 are shown. The plots contain the L2-norm of the residual as a function of the scale λ, followed by the first order derivative in log-scale.

[Plots: Residual Norm and Derivative of the Residual Norm in Log-Scale, each against Regularization Parameter σ² (0 to 100)]

Fig. 3. Result for the squares and noise image using linear scale space. On the first row the regularized and the residual images for σ² = 1, 7, 13 and 64 are shown. The plots contain the L2-norm of the residual as a function of the scale σ, followed by the first order derivative in log-scale.

5.3 DIKU Multi Scale Image Sequence Database I

The newly collected DIKU Multi-Scale image sequence database [17] contains sequences of the same scene captured using varying focal length. The sequences contain both man-made structures and nature; the distance to the main objects in the scenes also shows a large variation (from a few meters to a few kilometers).

On the Rate of Structural Change in Scale Spaces


[Plots: Residual Norm and Derivative of the Residual Norm in Log-Scale, each against Regularization Parameter (0 to 800)]

Fig. 4. Result for the squares and noise image using TV-decomposition. On the first row the regularized and the residual images for λ = 12, 38, 100 and 200 are shown. The plots contain the L2-norm of the residual as a function of the scale λ, followed by the first order derivative in log-scale. The residual norm seems to be a monotonically increasing non-concave function. The residual norm has three points of 'high' curvature: one at λ = 12 (the noise is suppressed), one at λ = 210 (the small squares are suppressed), and one at λ = 580 (the large square is suppressed).

Each image has first been normalized by an affine intensity range change so that the intensity range becomes [0, 1], followed by subtracting the mean value (i.e. the mean intensity is 0 in each image). The mean residual norm was computed on the normalized images in the database, using fixed scales $\sigma = 2^i$ where $i = 0, \dots, 12$, using linear Gaussian scale space. The result is a feature vector $\bar{r}(0), \dots, \bar{r}(12)$ containing

\[ \bar{r}(i) = \frac{1}{N} \sum_{I \in F} r(i; I) \qquad (17) \]

where F is the set of all N normalized images in the database. The (signed) distance function $d(I_0)$ of a normalized image $I_0 \in F$ to the mean is defined as

\[ d(I_0) = \sum_{i=0}^{12} \left( r(i; I_0) - \bar{r}(i) \right) \qquad (18) \]

The (signed) distance to the mean has been computed for all images in the DIKU database. Images with large positive values have a larger than average residual and images with large negative values have a smaller than average residual. The first row in figure 5 contains the 4 images with the largest positive distance to the mean, and the second row the 4 images with the largest negative distance to the mean. The difference in content is striking and clearly indicates that the residual norm carries important information about the image content. The same experiment was performed using first order Tikhonov regularization with similar, but not identical, results.
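The feature vector of Eq. (17) and the signed distance of Eq. (18) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the Gaussian smoothing is done in the Fourier domain with periodic boundaries, r(i; I) is taken to be the L2-norm of the difference between the normalized image and its smoothed version at scale σ = 2^i, and all function names are hypothetical.

```python
import numpy as np


def gaussian_blur_fft(image, sigma):
    """Periodic Gaussian smoothing with standard deviation sigma (in pixels).
    The Fourier multiplier of a Gaussian is exp(-sigma^2 |w|^2 / 2)."""
    rows, cols = image.shape
    w_rows = 2.0 * np.pi * np.fft.fftfreq(rows)
    w_cols = 2.0 * np.pi * np.fft.fftfreq(cols)
    w2 = w_rows[:, None] ** 2 + w_cols[None, :] ** 2
    multiplier = np.exp(-0.5 * sigma ** 2 * w2)
    return np.real(np.fft.ifft2(np.fft.fft2(image) * multiplier))


def normalize(image):
    """Affine map of the intensity range to [0, 1], then mean subtraction.
    Assumes the image is not constant."""
    image = np.asarray(image, dtype=float)
    scaled = (image - image.min()) / (image.max() - image.min())
    return scaled - scaled.mean()


def residual_norms(image, exponents=range(13)):
    """r(i; I) for sigma = 2**i, i = 0, ..., 12 (assumed residual definition)."""
    return np.array([np.linalg.norm(image - gaussian_blur_fft(image, 2.0 ** i))
                     for i in exponents])


def signed_distances(images):
    """Signed distance d(I0) of every image to the mean feature vector."""
    features = np.array([residual_norms(normalize(im)) for im in images])
    mean_features = features.mean(axis=0)            # Eq. (17)
    return (features - mean_features).sum(axis=1)    # Eq. (18)
```

By construction the signed distances of all images in the set sum to zero, so the sign of d(I0) directly separates images with above-average residuals from those with below-average residuals.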


Fig. 5. The top row shows images where f(σ) is much larger than the average and the bottom row shows images where f(σ) is much smaller than the average. The difference in content is striking: the images in the first row contain small scale details (texture), while the images in the bottom row contain large scale geometric structures.

6 Conclusions

For square-integrable images, the squared L2-norms of the regularized images in first order Tikhonov regularization and linear Gaussian scale space are, in general, decreasing convex functions of the regularizing parameter. This may fail for linear scale space when the Gaussian standard deviation is used as the parameter. Their squared residual norms are, however, not concave functions. For Total Variation regularization too, it is shown empirically that the squared norm of the residual is not concave. This confirms that the squared norm of the residual may be an indicator of image structure for first order Tikhonov regularization, Gaussian scale space and Total Variation regularization alike. The behavior of the latter will be studied further in future research.

Acknowledgements

This research was funded by the EU Marie Curie Research Training Network VISIONTRAIN MRTN-CT-2004-005439 and the Danish Natural Science Research Council project Natural Image Sequence Analysis (NISA) 272-05-0256. The authors want to thank Christoph Schnörr (Heidelberg University), Niels-Christian Overgaard (Lund University) and Vladlena Gorbunova (Copenhagen University) for sharing their knowledge.

References

1. Nielsen, M., Florack, L., Deriche, R.: Regularization, scale-space, and edge detection filters. International Journal of Computer Vision 7(4), 291–307 (1997)
2. Sporring, J., Weickert, J.: On generalized entropies and scale-space. In: ter Haar Romeny, B.M., Florack, L.M.J., Viergever, M.A. (eds.) Scale-Space 1997. LNCS, vol. 1252, pp. 53–64. Springer, Heidelberg (1997)
3. Sporring, J., Weickert, J.: Information measures in scale-spaces. IEEE Transactions on Information Theory 45, 1051–1058 (1999)
4. Buades, A., Coll, B., Morel, J.M.: A non-local algorithm for image denoising. In: CVPR 2005: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), Washington, DC, USA, vol. 2, pp. 60–65. IEEE Computer Society, Los Alamitos (2005)
5. Thompson, A.M., Brown, J.C., Kay, J.W., Titterington, D.M.: A study of methods of choosing the smoothing parameter in image restoration by regularization. IEEE Transactions on Pattern Analysis and Machine Intelligence 13(4), 326–339 (1991)
6. Mrázek, P., Navara, M.: Selection of optimal stopping time for nonlinear diffusion filtering. International Journal of Computer Vision 52(2-3), 189–203 (2003)
7. Weickert, J.: Anisotropic Diffusion in Image Processing. ECMI. Teubner-Verlag (1998)
8. Ruderman, D.L., Bialek, W.: Statistics of natural images: Scaling in the woods. Physical Review Letters 73(6), 814–817 (1994)
9. Field, D.J.: Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A 4, 2379–2394 (1987)
10. Koenderink, J.J.: The structure of images. Biological Cybernetics 50, 363–370 (1984)
11. Witkin, A.P.: Scale-space filtering. In: Proceedings 8th International Joint Conference on Artificial Intelligence, Karlsruhe, August 1983, vol. 2, pp. 1019–1022 (1983)
12. Iijima, T.: Basic theory on normalization of a pattern. Bulletin of Electrical Laboratory 26, 368–388 (1962) (in Japanese)
13. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D 60(1-4), 259–268 (1992)
14. Chambolle, A.: An algorithm for total variation minimization and applications. Journal of Mathematical Imaging and Vision 20(1-2), 89–97 (2004)
15. Meyer, Y.: Oscillating Patterns in Image Processing and Nonlinear Evolution Equations: The Fifteenth Dean Jacqueline B. Lewis Memorial Lectures. American Mathematical Society (AMS), Boston (2001)
16. Strong, D., Chan, T.F.: Exact solutions to total variation problems. Technical Report 96-41, UCLA, Ca (1996)
17. Gustavsson, D., Pedersen, K.S., Nielsen, M.: A multi-scale study of the distribution of geometry and texture in natural images (2009) (in preparation)
18. Florack, L., Duits, R., Bierkens, J.: Tikhonov regularization versus scale space: A new result. In: Proceedings of International Conference on Image Processing (ICIP), pp. 271–274 (2004)

Transitions of a Multi-scale Image Hierarchy Tree

Arjan Kuijper

Fraunhofer Institute for Computer Graphics Research & TU Darmstadt, Germany

Abstract. In this work we describe the possible transitions for the hierarchical structure that describes an image in Gaussian scale space. Until now, this tree structure has only been used for topological segmentation. In order to perform image matching and retrieval tasks based on this structure, one needs to know which transitions are allowed when the structure is changed under influence of one control parameter. We present a list of such transitions, enabling tree edit distance operations.

1 Introduction

In the analysis of images and shapes, descriptors take a prominent place. The first aim of these descriptors is to represent the underlying structure in a simple way that is as invariant as possible, for instance with respect to rotations and scaling. Secondly, they should be robust with respect to (some) noise. Thirdly, they should capture "essential" aspects of the underlying structure, so that efficient and effective comparison tasks can be carried out on the descriptors. Essential for the latter is that the way the descriptor is obtained is well understood. This allows the definition of its possible changes.

Robustness towards noise can be achieved by considering noise as a local perturbation of the structure. One way to overcome this perturbation is blurring the structure. A Gaussian filter is traditionally used for this purpose. It was pointed out by Koenderink [1] that choosing an a priori width of the kernel relates to observing the image at only one scale. Taking into account all widths (scales), the image is investigated at all small ("noisy") levels and coarse ("structure containing") ones. Doing so, one obtains a scale space image. Secondly, he pointed out that this is equivalent to observing the image dynamically changed by the heat equation, thus linking the kernel based approach to a partial differential equation.

It was shown that the scale space image contains a tree-like sub-structure that serves as a rotation and scale invariant image descriptor [2] which can be used for image segmentation based on topological arguments. See for example [3] for full details and related work. In order to be able to use the proposed tree structure for tasks like image matching, retrieval, and reconstruction (like [4,5,6], where points in the scale space image are used), one needs to understand how this tree can change. The focus of this paper is to describe the possible changes of this tree structure. We restrict ourselves to a one parameter family of perturbations, that is, the structural changes (called singularities or catastrophes) that occur when one extra introduced parameter changes, for instance due to varying imaging conditions. Such changes can influence the "building blocks" of the tree structure and allow well-defined tree-based edit distance operations [7].

We will start with a short introduction to scale space and polynomials in it [1, 8], catastrophe theory [9, 10], and the tree structure [2] in section 2. In section 3 we introduce a special type of points that occur in our analysis, namely degenerated scale space saddles. Together with catastrophe points these form the basis of the possible transitions. They are presented in section 4, while the consequences for the tree structure are given in section 5. We give a simple example to illustrate the theory on an MR image in section 6 and give conclusions in section 7.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 844–855, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 Theory

Let L(x) : ℝⁿ → ℝ be an image with x an n-dimensional spatial variable (point) and L(x) the intensity measured at a point x. The Gaussian scale space image L(x; t) is defined as the convolution of L with a Gaussian:

\[ L(x; t) = \int_{\mathbb{R}^n} \frac{1}{\sqrt{4\pi t}^{\,n}} \, e^{-\frac{|x-y|^2}{4t}} \, L(y) \, dy \qquad (1) \]

The Gaussian filter is the Green's function of the diffusion, or heat, equation: $\partial_t L(x; t) = \Delta L(x; t)$ with $\lim_{t \downarrow 0} L(x; t) = L(x)$. For simplicity we will assume that n = 2 and, for notational ease, x ∈ ℝ², i.e. we assume that the image is embedded in the complete ℝ².
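The link between the convolution in Eq. (1) and the heat equation can be checked numerically on a signal whose evolution is known in closed form. The sketch below is an illustration only and makes two assumptions that are not part of the text: it works in one spatial dimension (n = 1) and approximates the integral by a trapezoidal sum over a truncated interval. For L(y) = cos(ky), the heat equation gives L(x; t) = exp(−k²t) cos(kx).

```python
import math


def gaussian_scale_space_1d(signal, x, t, half_width=12.0, steps=4000):
    """Trapezoidal approximation of Eq. (1) for n = 1.
    The kernel exp(-(x - y)^2 / (4t)) / sqrt(4 pi t) has standard deviation
    sqrt(2t); the integral is truncated at half_width standard deviations."""
    std = math.sqrt(2.0 * t)
    start = x - half_width * std
    h = 2.0 * half_width * std / steps
    total = 0.0
    for j in range(steps + 1):
        y = start + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * signal(y) * math.exp(-(x - y) ** 2 / (4.0 * t))
    return total * h / math.sqrt(4.0 * math.pi * t)


if __name__ == "__main__":
    k, t, x = 3.0, 0.2, 0.7
    numeric = gaussian_scale_space_1d(lambda y: math.cos(k * y), x, t)
    exact = math.exp(-k * k * t) * math.cos(k * x)
    print(numeric, exact)
```

Note that in this parameterisation the Gaussian has variance 2t, so a kernel of standard deviation σ corresponds to scale t = σ²/2.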

2.1 Scale Space Polynomials, Jets

At each point $(x_0, y_0)$ a Taylor expansion can be made of a function L(x, y) to investigate the local structure: $L(x, y) \approx L + i L_i + \frac{1}{2} ij L_{ij} + \frac{1}{6} ijk L_{ijk} + \dots$ (summation over repeated indices implied), where $L_{(\cdot)}$ denotes the partial derivatives with respect to the variables $i, j, \dots \in (x, y)$, evaluated at the point of interest. In Gaussian scale space the same holds for the spatial and scale variables, i.e. $i, j, \dots \in (x, y, t)$ and all derivatives of L(x, y, t) are evaluated at $(x_0, y_0, t_0)$. This yields a scale space polynomial. Next, due to the heat equation, the scale derivatives in the Taylor expansion can be expressed in terms of spatial derivatives, since $\partial_t^n = \Delta^n$. The nth order scale space jet is defined as the scale space polynomial with spatial derivatives up to order n.
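Because scale derivatives are tied to spatial derivatives, every monomial xⁿ has a unique polynomial extension in t that solves the heat equation. The following sketch generates these extensions in one spatial variable and checks the heat equation term by term; it is an illustration with hypothetical function names, and the closed form used (the sum over k of n!/(k!(n−2k)!) xⁿ⁻²ᵏ tᵏ) is the standard heat polynomial rather than a formula quoted from the text. It reproduces the building blocks x² + 2t and x³ + 6xt that appear in the germs later in the paper.

```python
from math import factorial


def heat_polynomial(n):
    """Return {(power of x, power of t): coefficient} for the polynomial
    solution of L_t = L_xx that reduces to x**n at t = 0."""
    return {(n - 2 * k, k): factorial(n) // (factorial(k) * factorial(n - 2 * k))
            for k in range(n // 2 + 1)}


def d_dt(poly):
    """Derivative with respect to t, in the same dictionary representation."""
    return {(a, b - 1): c * b for (a, b), c in poly.items() if b > 0}


def d_dxx(poly):
    """Second derivative with respect to x."""
    return {(a - 2, b): c * a * (a - 1) for (a, b), c in poly.items() if a > 1}


if __name__ == "__main__":
    for n in range(2, 7):
        poly = heat_polynomial(n)
        print(n, poly, d_dt(poly) == d_dxx(poly))
```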

2.2 Critical Curves, Scale Space Germs

Critical curves are curves in scale space that satisfy $\nabla_x L = 0$. It has been proven by Damon [9] that these curves do not intersect in scale space unless extra constraints (like symmetry) are added. The curves consist of saddle branches and extremum branches that meet pairwise at catastrophe points. At such points the spatial Hessian matrix

\[ H = \begin{pmatrix} L_{xx} & L_{xy} \\ L_{xy} & L_{yy} \end{pmatrix} \qquad (2) \]

degenerates and has exactly one eigenvalue equal to zero. Tracing critical points over scale, at such catastrophe points a saddle-extremum pair is created or annihilated. These catastrophe points are also called top points [4, 5, 6], since they occur at local extrema with respect to the scale axis: at local maxima for annihilations and at local minima for creations.

In Gaussian scale space, there is one semi-free parameter: scale. Therefore, an eigenvalue of the Hessian matrix can become zero with multiplicity one. The generic catastrophe is thus described by terms x³ and y², called A2 or cusp [10]. To account for the fact that t can only increase during the evolution, two scale space polynomials are needed to describe an annihilation (Eq. (3)) and a creation (Eq. (4)) in a small environment of the origin with local coordinates x, y, and t:

\[ L^a = x^3 + 6xt + y^2 + 2t \qquad (3) \]

\[ L^c = x^3 - 6xy^2 - 6xt + y^2 + 2t \qquad (4) \]

The critical curves cc occur in the (x, t) plane and are parameterised by $cc^a(x, y, t) = (\pm\sqrt{-2t}, 0, t)$ and $cc^c(x, y, t) = (\pm\sqrt{2t}, 0, t)$. This follows directly from the x derivatives of Eqs. (3) and (4): $L^a_x|_{y=0} = 3x^2 + 6t$ and $L^c_x|_{y=0} = 3x^2 - 6t$. As an important consequence of the A2 catastrophe, critical curves do not intersect (as this requires a higher order catastrophe), but can contain subsequent creation-annihilation points. This implies that we can define scale space germs as scale space polynomials that yield critical curves with only generic catastrophes. For instance, $L^a$ in Eq. (3) is a valid scale space germ, but $L^c$ in Eq. (4) is not, as for $t = \frac{1}{72}$ an intersection of two critical curves occurs. However, $L^c + \epsilon y$ with $\epsilon \neq 0$ is a scale space germ.
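The statements about the two germs can be verified numerically. The sketch below is an illustration with hypothetical function names: it checks the heat equation by central differences (exact for these low-degree polynomials up to rounding) and evaluates hand-derived spatial gradients on the stated critical curves. The second critical curve of the creation polynomial, x = 1/6 with y² = 1/72 − t, is not written out in the text; it follows from setting the y derivative to zero and is what produces the intersection at t = 1/72.

```python
import math


def L_annihilation(x, y, t):
    return x**3 + 6*x*t + y**2 + 2*t                   # Eq. (3)


def L_creation(x, y, t):
    return x**3 - 6*x*y**2 - 6*x*t + y**2 + 2*t        # Eq. (4)


def grad_annihilation(x, y, t):
    return (3*x*x + 6*t, 2*y)


def grad_creation(x, y, t):
    return (3*x*x - 6*y*y - 6*t, -12*x*y + 2*y)


def heat_residual(L, x, y, t, h=1e-3):
    """L_t - (L_xx + L_yy), approximated by central differences."""
    L_t = (L(x, y, t + h) - L(x, y, t - h)) / (2*h)
    L_xx = (L(x + h, y, t) - 2*L(x, y, t) + L(x - h, y, t)) / h**2
    L_yy = (L(x, y + h, t) - 2*L(x, y, t) + L(x, y - h, t)) / h**2
    return L_t - (L_xx + L_yy)


if __name__ == "__main__":
    print(heat_residual(L_annihilation, 0.3, -0.2, 0.1))
    print(grad_annihilation(math.sqrt(1.0), 0.0, -0.5))   # on cc^a
    print(grad_creation(math.sqrt(1.0), 0.0, 0.5))        # on cc^c
    print(grad_creation(1/6, 0.0, 1/72))                  # intersection point
```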

2.3 Saddle Points in Scale Space

In Gaussian scale space the only type of critical points are saddle points [11]. These scale space saddles appear at critical curves since the spatial derivatives vanish. To investigate these points, consider the Hessian matrix in scale space (the extended Hessian):

\[ H = \begin{pmatrix} L_{xx} & L_{xy} & L_{xt} \\ L_{xy} & L_{yy} & L_{yt} \\ L_{xt} & L_{yt} & L_{tt} \end{pmatrix} \qquad (5) \]

Since this matrix contains the spatial Hessian (Eq. (2)), at least one eigenvalue is positive and one is negative [2]. At scale space saddles the intensity on a critical curve has a local extremum: let a curve be parametrised by L(x(t), y(t), t), then $\frac{d}{dt} L(x(t), y(t), t) = L_x x_t + L_y y_t + L_t$. Since the parametrisation takes place at a critical curve, the spatial derivatives are zero, so $\frac{d}{dt} L(x(t), y(t), t) = L_t$. Finally, at a scale space saddle $L_t = 0$ (and consequently $\Delta L = 0$ and $\operatorname{tr} H = 0$).

2.4 A Multi-scale Image Descriptor

In 2+1D scale space images iso-manifolds through scale space saddles divide the 3D volume into two parts. At the initial image such parts reduce to areas that correspond to topological segments around two extrema. It is possible to discriminate between the two parts connected at the saddle, due to the fact that the scale space saddle is connected to one of the extrema via a critical curve. A sketch of such a structure is given in Figure 1 (see e.g. [3] for full details, e.g. on the role of creations). On the left one sees critical curves and an iso-manifold through a scale space saddle; a sketch in the (x, t) plane is given in the middle. The critical curve on the right (called "C") contains a saddle branch and an extremum branch. The two branches are connected at the catastrophe (top) point. Via the iso-manifold through the scale space saddle "SSS", this critical curve is connected to the other one on the left (called "D"). This is schematically visualised in the right image, where the "C" child part is connected to the curve with nodes "D" (child) and "P" (parent) via SSS. This is the "building block" of the hierarchical tree. Each inner node represents a scale space saddle, while the leaves are formed by the extrema in the initial image.

[Sketch with node labels P, SSS, D and C]

Fig. 1. A sketch of the local structure at a scale space saddle in (x, y, t) space, t vertical (left), same in the y = 0 plane with the critical curves dashed (middle), and its algebraic tree representation (right)

3 Degenerated Scale Space Saddles

The matrix in Eq. (5) is degenerated when at least one of its eigenvalues equals zero, i.e. det H = 0. Obviously, this is an extra requirement in scale space and thus non-generic. A degenerated (extended) Hessian implies that the type of the point cannot be resolved. For critical points it means that it is neither a saddle nor an extremum, but merely a combination of both, exactly because two such points coincide. For scale space saddles, a zero eigenvalue of the extended Hessian implies that one of the other eigenvalues is positive and one negative, for instance when a saddle with two negative eigenvalues and one with two positive eigenvalues coincide. Since the latter denotes a local minimum of L(x(t), y(t), t) and the former a local maximum of L(x(t), y(t), t), such an event is visible as a point of inflexion of L(x(t), y(t), t).

Theorem 1. Degenerated scale space saddles coincide with points of inflexion:

\[ \frac{d^2}{dt^2} L(x(t), y(t), t) = 0 \Leftrightarrow \det H = 0 \qquad (6) \]

Proof. The points of inflexion of L(x(t), y(t), t) are the points with vanishing first (i.e. scale space saddles) and second order derivatives of L with respect to t. Then

\[ \begin{aligned} \frac{d^2}{dt^2} L(x(t), y(t), t) &= \frac{d}{dt} \left( L_x x_t + L_y y_t + L_t \right) \\ &= (L_{xx} x_t + L_{xy} y_t + L_{xt}) x_t + (L_{xy} x_t + L_{yy} y_t + L_{yt}) y_t + (L_{xt} x_t + L_{yt} y_t + L_{tt}), \end{aligned} \qquad (7) \]

since we can ignore the derivatives of the spatial parametrisations $x_t$ and $y_t$, as they are accompanied by spatial derivatives that vanish on critical curves. Next, the right hand side of Eq. (7) can be written as $(x_t, y_t, 1) \cdot H \cdot (x_t, y_t, 1)^T$, which equals zero iff det H = 0 since H is symmetric. So when two scale space saddles coincide, the resulting point is degenerate.

Theorem 2. On critical curves, Eq. (7) can be simplified to

\[ \frac{d^2}{dt^2} L(x(t), y(t), t) = L_{xt} x_t + L_{yt} y_t + L_{tt}. \qquad (8) \]

Proof. On critical curves, we have $L_x = 0$ and consequently $\frac{d}{dt} L_x(x(t), y(t), t) = 0$. A similar argument holds for $L_y$, so $L_{xx} x_t + L_{xy} y_t + L_{xt} = 0$ and likewise $L_{xy} x_t + L_{yy} y_t + L_{yt} = 0$.

In a one-parameter family, only one eigenvalue equals zero. So we assume that the special event locally takes place in the (x, t) plane, while y is a regular (Morse) variable. Then we may neglect y derivatives and get $\frac{d^2}{dt^2} L(x(t), 0, t) = L_{xt} x_t + L_{tt}$ and $\det H = L_{xx} L_{tt} - L_{xt}^2$. At critical points we obtain $L_{xx} x_t + L_{xt} = 0$.
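In this reduced (x, t) setting the two relations can be combined into a single explicit formula. The short derivation below is a worked step added for illustration, under the assumptions already stated (y derivatives neglected) plus the extra assumption that $L_{xx} \neq 0$, i.e. the point is not a catastrophe point.

```latex
\[
\begin{aligned}
L_{xx} x_t + L_{xt} = 0 \;&\Rightarrow\; x_t = -\frac{L_{xt}}{L_{xx}}, \\
\frac{d^2}{dt^2} L(x(t), 0, t) &= L_{xt} x_t + L_{tt}
  = \frac{L_{xx} L_{tt} - L_{xt}^2}{L_{xx}}
  = \frac{\det H}{L_{xx}} .
\end{aligned}
\]
```

Away from catastrophe points, the second derivative of L along the critical curve therefore vanishes exactly when det H = 0, which is the statement of Theorem 1 restricted to the (x, t) plane.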

4 Transitions

When we allow a change driven by one parameter, we expect to see situations that are non-generic for still images. However, for moving images, e.g. films or a sequence to warp one image into another, such situations can become generic. Since the tree structure relies on critical curves, catastrophe points and scale space saddles, we will discuss the effect of the simplest combinations of them:

1. two catastrophe points coincide on a critical curve,
2. two critical curves intersect (necessarily at a catastrophe point),
3. a catastrophe point coincides with a scale space saddle,
4. two scale space saddles coincide,
5. two scale space saddles on a critical curve have the same value, either at one critical curve or at different curves, and
6. two scale space saddles on different critical curves but on the same iso-manifold have the same value.


Fig. 2. A critical curve in (x, y, t) space, t vertical. From left to right: When ε in Eq. (9) increases, a pair of top points is removed.

For all situations we will describe scale space germs. They are generic in a one-parameter family of perturbations iff there is exactly one parameter that has to be fixed to obtain the described situation. We note that we described cases 1 and 2 (partially) in some detail before 4, but summarize the results for completeness. In section 5 we describe the consequences of these events for the tree structure.

4.1 Two Catastrophe Points Coincide on a Critical Curve

The situation that two catastrophe points coincide on a critical curve implies the description of a creation or an annihilation of a pair of creation and annihilation events. Such a pair exists if a critical curve traverses the manifold det H = 0 twice. If the curve is perturbed, it is pulled away from this manifold. Exactly where the curve is tangent to the manifold, this special situation occurs. In [12], this was modelled by using a scale space germ in analogy to Eq. (4):

\[ L^c = x^3 - 6xy^2 - 6xt + y^2 + 2t + \epsilon y, \qquad (9) \]

with $\epsilon \neq 0$ a free parameter. For $\epsilon \in (0, \frac{1}{32}\sqrt{6})$ a creation and an annihilation occur, for $\epsilon > \frac{1}{32}\sqrt{6}$ there are zero catastrophes, and for $\epsilon = \frac{1}{32}\sqrt{6}$ the two catastrophes coincide (are created or annihilated, depending on the decrease or increase of ε). So with an additional parameter, "wiggles" at a critical curve can be removed, i.e. a smoothing of the critical curve, as shown in Figure 2.
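The threshold in ε can be made concrete by locating the catastrophe points of Eq. (9) directly. The sketch below is an illustration and relies on an elimination step that is not spelled out in the text: setting the spatial gradient and the determinant of the spatial Hessian of Eq. (9) to zero and eliminating y and t gives ε² = x(1 − 6x)³/3, whose right hand side has its maximum 3/512 at x = 1/24, i.e. ε = √6/32. The function name and the bisection solver are hypothetical choices.

```python
import math

EPS_CRITICAL = math.sqrt(6.0) / 32.0


def catastrophes(eps):
    """Catastrophe points (x, y, t) of Eq. (9) for eps > 0, found from the
    derived condition eps^2 = x (1 - 6x)^3 / 3 on the interval (0, 1/6)."""
    def g(x):
        return x * (1.0 - 6.0 * x) ** 3 - 3.0 * eps * eps

    def bisect(lo, hi):
        # g(lo) and g(hi) have opposite signs by construction.
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if (g(lo) < 0.0) == (g(mid) < 0.0):
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    peak = 1.0 / 24.0                  # maximiser of x (1 - 6x)^3 on (0, 1/6)
    if g(peak) <= 0.0:
        return []                      # no catastrophes (or they have just merged)
    points = []
    for x in (bisect(0.0, peak), bisect(peak, 1.0 / 6.0)):
        y = eps / (12.0 * x - 2.0)     # from L_y = 0
        t = 0.5 * x * x - y * y        # from L_x = 0
        points.append((x, y, t))
    return points


if __name__ == "__main__":
    for eps in (0.05, EPS_CRITICAL, 0.10):
        print(eps, catastrophes(eps))
```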

4.2 Two Critical Curves Intersect

The intersection of critical curves occurs at catastrophe points. Therefore, this event is described by a higher order catastrophe. For an annihilation (Eq. (3)) this can be modelled [12] by $L_x = 4x(x^2 + 6t)$, arising from the scale space germ

\[ L = x^4 + 12x^2 t + 12t^2 + \epsilon x + y^2 + 2t \qquad (10) \]

For ε = 0 one obtains the so-called A3 catastrophe in scale space, for ε ≠ 0 one has the generic A2 catastrophe, see Figure 3. For a creation we use the modelling $L_x = 4x(x^2 - 6t)$, arising from the more complicated scale space polynomial

\[ L = x^4 - 12x^2 t - 12t^2 - 12x^2 y^2 + 2y^4 \qquad (11) \]

The scale space germ is obtained by adding the perturbation terms

\[ \alpha(x^2 + 2t) + \beta(y^2 + 2t) + \gamma x + \delta y. \qquad (12) \]

Choosing non-zero values for α, β, and γ, together with δ = 0 (that is, a one parameter degeneration), yields the desired creation result.

Fig. 3. A critical curve in (x, y, t) space, t vertical. From left to right: When the sign of ε in Eq. (10) changes, the annihilation takes place with the other extremum.
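For the annihilation germ of Eq. (10) the effect of the sign of ε can be computed in closed form. The sketch below is an illustration with hypothetical function names; the formulas are derived here rather than quoted: in the y = 0 plane the catastrophe satisfies 4x³ + 24xt + ε = 0 together with 12x² + 24t = 0, which gives x = ε^(1/3)/2 and t = −x²/2, so the catastrophe switches sides when ε changes sign.

```python
import math


def catastrophe_point(eps):
    """Solve L_x = 4x^3 + 24xt + eps = 0 together with L_xx = 12x^2 + 24t = 0
    (in the plane y = 0) for the germ of Eq. (10)."""
    x = math.copysign(abs(eps) ** (1.0 / 3.0), eps) / 2.0
    t = -0.5 * x * x
    return x, t


def number_of_critical_points(eps, t):
    """Number of real roots of 4x^3 + 24xt + eps = 0, from the discriminant
    of the depressed cubic x^3 + p x + q with p = 6t and q = eps / 4.
    Exactly at the catastrophe two of the roots coincide."""
    p, q = 6.0 * t, eps / 4.0
    discriminant = -4.0 * p ** 3 - 27.0 * q ** 2
    return 3 if discriminant > 0.0 else 1


if __name__ == "__main__":
    for eps in (0.5, -0.5):
        x, t = catastrophe_point(eps)
        print(eps, x, t,
              number_of_critical_points(eps, t - 0.01),
              number_of_critical_points(eps, t + 0.01))
```

Below the catastrophe scale there are three critical points on the line y = 0 and above it only one, so a pair is annihilated; which side the pair lies on is decided by the sign of ε.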

4.3 A Catastrophe Point Coincides with Scale Space Saddle

When a scale space saddle and a catastrophe point coincide, the following requirements hold: det H = 0 and $\operatorname{tr} H = \Delta L = \partial_t L = 0$. The latter implies that $L_{xx} = -L_{yy}$, so the former reads $-L_{xx}^2 - L_{xy}^2 = 0$. So the complete second order structure has to vanish: $L_{xx} = L_{xy} = L_{yy} = 0$. This is due to the fact that tr H = 0 implies that det H is non-positive. We needed to set two parameters equal to zero, instead of one. So this situation is not generic in a one parameter family of perturbations. This is in line with intuition, stating that we cannot simply change an extremum into a saddle, or vice versa.

4.4 Two Scale Space Saddles Coincide

For the situation that two scale space saddles coincide (a degenerated scale space saddle) we take the case that the event takes place in the (x, t) plane. Then it follows that $L_{tt}$ is at least O(x): if $L_{tt} = 0$ then $L_{xt} = 0$, i.e. we are left with an ordinary critical point. If $L_{tt} = O(1)$ we have $L = O(x^4, t^2)$, and similar to the case above the saddle coincides with the catastrophe point, since $L_{xx}$ has to vanish. Therefore, $L_{tt} = O(x)$ and the simplest scale space 5-jet (in x only) reads

\[ L = \frac{1}{120} x^5 + \frac{1}{6} t x^3 + \frac{1}{2} t^2 x + x^2 - y^2 + \delta (x^2 + 2t) + \epsilon x \qquad (13) \]

Eq. (13) represents an A4 catastrophe in scale space, where the saddle is located at the origin. For such a catastrophe perturbations are required for the orders x³, x², and x. Note that scale t perturbs x³. For the three requirements $L_x = 0$, $L_t = 0$ and det H = 0 one gets δ = δ(ε), that is, one free parameter remains. In Figure 4 a parameterized critical curve is shown for several values of δ and ε. Each column shows a sequence with varying ε where two scale space saddles (local extrema on the curve) meet and (dis)appear. Constraining the event to the origin yields δ = 0 and ε = 0, i.e. the plot in the middle. Here one clearly sees a horizontal tangent. Degenerated scale space saddles are generic in a one parameter family of perturbations, which implies that pairs of scale space saddles can be created and annihilated on a critical curve.

[Plots: nine panels, each labelled with its values of (δ, ε), showing L along the critical curve; horizontal axis from −2 to 2, vertical axis from −0.8 to 0.4]

Fig. 4. Plots of L(x(t), y(t), t) along a critical curve of the scale space polynomial of Eq. (13) for several values of δ and ε. Each column (changing ε, δ fixed) shows a sequence where two scale space saddles (local extrema on the curve) meet and (dis)appear.
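The one-parameter family δ = δ(ε) can be traced explicitly for Eq. (13). The following sketch is an illustration with hypothetical function names; the parameterisation is derived here and is not taken from the text. With y neglected, $L_t = 0$ forces $L_{xx} = 2$, so det H = 0 reduces to 2x = (x²/2 + t)². The x coordinate of the degenerated saddle (x ≥ 0) then serves as the free parameter, and t, δ and ε follow in turn.

```python
import math


def derivatives(x, t, delta, eps):
    """(L_x, L_t, det H) of Eq. (13) in the plane y = 0, with
    det H = L_xx * L_tt - L_xt**2 as in Section 3."""
    L_x = x**4 / 24 + t * x**2 / 2 + t**2 / 2 + 2 * (1 + delta) * x + eps
    L_t = x**3 / 6 + t * x + 2 * delta
    L_xx = x**3 / 6 + t * x + 2 * (1 + delta)
    L_xt = x**2 / 2 + t
    L_tt = x
    return L_x, L_t, L_xx * L_tt - L_xt**2


def degenerate_saddle(x, branch=1):
    """Return (t, delta, eps) such that (x, 0, t) is a degenerated scale space
    saddle of Eq. (13); x >= 0 is the free parameter, branch is +1 or -1."""
    t = -x**2 / 2 + branch * math.sqrt(2 * x)      # det H = 0 with L_xx = 2
    delta = -(x**3 / 6 + t * x) / 2                # L_t = 0
    eps = -(x**4 / 24 + t * x**2 / 2 + t**2 / 2 + 2 * (1 + delta) * x)  # L_x = 0
    return t, delta, eps


if __name__ == "__main__":
    for x in (0.0, 0.01, 0.1):
        t, delta, eps = degenerate_saddle(x)
        print(x, t, delta, eps, derivatives(x, t, delta, eps))
```

For x = 0 the function returns t = δ = ε = 0, the event constrained to the origin described above.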

4.5 Two Scale Space Saddles with the Same Value

The case that two scale space saddles have the same intensity is easily derived from the previous section. In Figure 4, the plot in the middle of the third row shows exactly this phenomenon when in Eq. (13) ε, instead of δ, is now taken fixed. For ε < 0 the critical curve contains 3 extrema. When δ varies, their intensities vary and two have equal intensity for δ = 0. This effect in the (x, t) plane, i.e. for the separation of parts in the scale space image, is shown in Figure 5 (cf. the middle plot in Figure 1). Here the iso-manifold is visible as an isophote since we do not consider the (Morse) y variable. When δ = 0, the iso-manifolds are connected at two places (middle plot), one of which is taken when δ ≠ 0. As one can see, the region enclosed by the iso-manifold remains stable when going through the transition. Only the location of the scale space saddle changes suddenly.


Fig. 5. The critical curves (dashed) and iso-manifolds through the scale space saddles for Eq. (13) for several values of δ and ε = −1. When δ = 0, the saddles with equal intensity occur.

Fig. 6. Two scale-space saddles are located at one manifold (middle); Perturbing yields two nested manifolds with each one scale-space saddle (left and right). Critical curves are represented by the dashed curves, the iso-manifolds by the continuous curves.

4.6 Two Saddle Points on a Manifold with the Same Intensity

Of course, the scale-space saddles do not have to lie on the same critical curve. In the situation that an iso-manifold contains two scale-space saddles, the local description needs two saddle branches and three extremum ones. Consequently, one needs a polynomial expression of $L_6(x, t) = O(x^6) = x^6 + 30x^4 t + 180x^2 t^2 + 120t^3$. Perturbations are of orders $L_3(x, t) = x^3 + 6xt$, $L_2(x, t) = x^2 + 2t$, and $L_1(x) = x$. Again, t perturbs the $O(x^4)$ terms. So the simplest description reads

\[ L(x, y, t) = x^2 - y^2 + L_6(x, t) + \alpha L_3(x, t) + \beta L_2(x, t) + \gamma L_1(x) \qquad (14) \]

In Fig. 6 one can see the unperturbed situation in the middle, and two perturbed situations on the left and the right. The unstable situation with two scale-space saddles on one iso-manifold is a transition.

5 Consequences for the Tree Structure

For the tree structure these transitions imply the following results:

1. Two catastrophe points coincide on a critical curve: This has no direct influence, since the complete critical curve is used in the construction of the tree. This event merely describes a smoothing effect allowing one to reduce the number of catastrophes on a critical curve.
2. Two critical curves intersect: This event describes the change in ordering of the two child nodes "C" and "D" in the hierarchy when one catastrophe describes an annihilation. In the case that one describes a creation, it can be regarded as handing over a local creation-annihilation from one curve to another, which has no influence on the tree.
3. A catastrophe point coincides with a scale space saddle: This requires a total disappearance of second order structure and is a co-dimension two event, i.e. not generic.
4. Two scale space saddles coincide: When going through a degenerated scale space saddle, two scale space saddles are created or annihilated. This does not influence the tree (although it may have some consequences when followed by the following event).
5. Two scale space saddles on a critical curve have the same value: In this case, another scale space saddle connects the two parts of the manifold. Although the intensity of the manifold changes continuously, the location of the scale space saddles changes discontinuously. In the tree structure this influences the information stored at the node.
6. If an iso-manifold under perturbation goes through a situation where it contains two scale space saddles on two different curves, the impact on the tree is a change of parent-child nodes, as shown in Fig. 7.

[Two tree diagrams with internal node labels D2, C2, D3, C3 and leaves e1, e2, e3]

Fig. 7. The tree representations of the transition visualised in Fig. 6. The two scale-space saddles swap in hierarchy, which is a simple rotation of a parent-child pair of internal nodes.
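The last transition, a rotation of a parent-child pair of internal nodes, maps onto a familiar tree operation. The sketch below is a hypothetical illustration, not the data structure used in [2]: it assumes a plain binary tree in which internal nodes stand for scale space saddles and leaves for extrema, and it uses the standard binary-tree rotation, which is assumed here to correspond to the swap shown in Fig. 7. The class and function names are invented for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Internal nodes model scale space saddles, leaves model extrema."""
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def leaves(node: Node) -> List[str]:
    """Labels of the leaves below node, from left to right."""
    if node.left is None and node.right is None:
        return [node.label]
    result: List[str] = []
    for child in (node.left, node.right):
        if child is not None:
            result.extend(leaves(child))
    return result


def rotate_right(parent: Node) -> Node:
    """Swap the hierarchy of parent and its left internal child.
    Returns the new root of the rotated subtree."""
    child = parent.left
    if child is None or child.left is None or child.right is None:
        raise ValueError("left child must be an internal node")
    parent.left = child.right
    child.right = parent
    return child


if __name__ == "__main__":
    inner = Node("S_inner", Node("e1"), Node("e2"))
    outer = Node("S_outer", inner, Node("e3"))
    root = rotate_right(outer)
    print(root.label, leaves(root), leaves(root.right))
```

The rotation leaves the set and order of the extrema untouched and only changes which saddle is the parent, which is the behaviour described for transition 6.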

6 Example

We illustrate the theory with the MR image shown in Fig. 8. Normal [0, 100] distributed noise is added, and a blurred version of both images is computed. We derived the pre-segmentations and tree structures of both images shown in Fig. 9 as described in [2]. For visualisation purposes, we used the blurred versions for reference to keep the trees rather simple. The labels in the trees refer to the extrema in the blurred MR images as shown in Fig. 10. It is clear that a mapping based on the pre-segmentation in these images yields the pairs A1, B2, C3, E5, F6, and G7. This is also supported by the spatial locations of these extrema (deviation of at most one pixel).


Fig. 8. From left to right: An MR image, its noisy version (although intensities increased, the graphical output is rescaled) and their blurred versions

[Two tree diagrams: one with node labels R, G, D, E, F, A, C, B and one with node labels R, 7, 5, 6, 1, 3, 2, 4]

Fig. 9. The tree structures of the MR image and the noisy MR image, respectively, with the blurred versions as starting scale

[Pre-segmentation images with extremum labels A to H and 1 to 8]

Fig. 10. The labelled pre-segmentations, and the segment belonging to the left sub trees in the white matter for both MR images

The differences in the trees are the labels (extrema) D and 4. The operation on D is a simple deletion of a leaf. Leaf (extremum) 4 is added to the subtree spanned by extremum 1. Its position is found by comparing the intensity of its scale space saddle with those of the saddles related to extremum 1. Alternatively, the operation can be considered as replacing leaf 1 by the building block with extrema 1 and 4, and applying the subsequent rotations with the scale space saddle belonging to extremum 4: first with the node representing the scale space saddle related to extremum 3, then with the one related to extremum 2.
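The two edit operations used in this example (deleting a leaf, and replacing a leaf by a building block of two extrema) can be sketched on a toy hierarchy. This is only an illustration, not the implementation of [2]: it assumes a binary tree stored as nested tuples `(saddle, left, right)` with extrema as string leaves, and the toy trees, the saddle names and the helpers `delete_leaf` and `add_leaf` are hypothetical. The subsequent rotations are not implemented.

```python
def delete_leaf(tree, leaf):
    """Remove an extremum; its parent saddle disappears and the sibling moves up."""
    if isinstance(tree, str):          # a single leaf: nothing to delete below it
        return tree
    saddle, left, right = tree
    if left == leaf:
        return right
    if right == leaf:
        return left
    return (saddle, delete_leaf(left, leaf), delete_leaf(right, leaf))


def add_leaf(tree, at, new, saddle):
    """Replace leaf `at` by the building block (saddle, at, new)."""
    if isinstance(tree, str):
        return (saddle, tree, new) if tree == at else tree
    s, left, right = tree
    return (s, add_leaf(left, at, new, saddle), add_leaf(right, at, new, saddle))


# Hypothetical toy trees (not the trees of Fig. 9).
toy = ("s1", ("s2", "A", "D"), "B")
without_d = delete_leaf(toy, "D")                    # ("s1", "A", "B")
with_4 = add_leaf(("s1", "1", "2"), "1", "4", "s4")  # ("s1", ("s4", "1", "4"), "2")
```

Storing the saddle at the internal node keeps the intensity information needed to decide where a new building block belongs, which is what the rotations would then act on.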

7 Summary and Discussion

We introduced degenerated scale space saddles in section 3 and discussed in section 4 the six possible simple situations that may occur when an extra constraint is posed on the building blocks of the hierarchical structure in scale space, viz. the critical curves, catastrophe points, and (degenerated) scale space saddles. We showed that one of them (described in section 4.3) is a co-dimension two event, requiring two vanishing control parameters. The other cases are co-dimension one events and generic in a one-parameter setting. These cases describe transitions of structures that are non-generic in scale space, but become generic when an additional constraint is allowed. This is useful when we want to change one image into another, i.e. matching. The list in section 5 indicates that the hierarchical tree structure changes only with respect to the ordering of children, the information stored in the nodes, and the rotation of a parent-child node combination. The consequences of the standard events in scale space, viz. creation and annihilation of pairs of critical points, are the addition or removal of leaf elements. Together they form the possible changes of the tree under a one-parameter family of changes, give the grammar for relevant matching algorithms, cf. [7], and extend point-based methods like [4, 5, 6]. A simple example of this was shown in section 6.

References
1. Koenderink, J.J.: The structure of images. Biological Cybernetics 50, 363–370 (1984)
2. Kuijper, A., Florack, L.M.J.: Hierarchical pre-segmentation without prior knowledge. In: Proceedings of the 8th ICCV, pp. 487–493 (2001)
3. Kuijper, A.: Exploring and exploiting the structure of saddle points in Gaussian scale space. Computer Vision and Image Understanding 112(3), 337–349 (2008)
4. Kanters, F., Platel, B., Florack, L.M.J., ter Haar Romeny, B.: Content based image retrieval using multiscale top points. In: Griffin, L.D., Lillholm, M. (eds.) Scale-Space 2003. LNCS, vol. 2695, pp. 33–43. Springer, Heidelberg (2003)
5. Kanters, F., Lillholm, M., Duits, R., Janssen, B., Platel, B., Florack, L.M.J., ter Haar Romeny, B.: On image reconstruction from multiscale top points. In: Kimmel, R., Sochen, N.A., Weickert, J. (eds.) Scale-Space 2005. LNCS, vol. 3459, pp. 431–442. Springer, Heidelberg (2005)
6. Platel, B., Balmachnova, E., Florack, L.M.J., Kanters, F., ter Haar Romeny, B.: Using Top-Points as Interest Points for Image Matching. In: Fogh Olsen, O., Florack, L.M.J., Kuijper, A. (eds.) DSSCV 2005. LNCS, vol. 3753, pp. 211–222. Springer, Heidelberg (2005)
7. Fogh Olsen, O.: Tree edit distances from singularity theory. In: Kimmel, R., Sochen, N.A., Weickert, J. (eds.) Scale-Space 2005. LNCS, vol. 3459, pp. 316–326. Springer, Heidelberg (2005)
8. Lindeberg, T.: Scale-Space Theory in Computer Vision. The Kluwer International Series in Engineering and Computer Science. Kluwer Academic Publishers, Dordrecht (1994)
9. Damon, J.: Local Morse theory for solutions to the heat equation and Gaussian blurring. Journal of Differential Equations 115(2), 386–401 (1995)
10. Arnold, V.I.: Catastrophe Theory. Springer, Berlin (1984)
11. Koenderink, J.J.: A hitherto unnoticed singularity of scale-space. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(11), 1222–1224 (1989)
12. Kuijper, A., Florack, L.M.J.: The relevance of non-generic events in scale space models. International Journal of Computer Vision 57(1), 67–84 (2004)

Local Scale Measure for Remote Sensing Images

Bin Luo¹, Jean-François Aujol², and Yann Gousseau³

¹ CNES/DLR/ENST Competence Center and Telecom ParisTech [email protected]
² CMLA, ENS Cachan, CNRS, UniverSud [email protected]
³ Telecom ParisTech, LTCI CNRS [email protected]

Abstract. This paper addresses the problem of defining a scale measure for digital images, that is, the problem of assigning meaningful scale information to each pixel. We propose a method relying on the set of level lines of an image, the so-called topographic map. We make use of the hierarchical structure of level lines to associate a level line to each pixel, enabling the computation of local scales. This computation is made under the assumption that blur is constant over the image, and is therefore adapted to the case of satellite images. We then investigate the link between the proposed definition of local scale and recent methods relying on total variation diffusion. Finally, we perform various experiments illustrating the spatial accuracy of the proposed approach.

1 Introduction

Scale is a fundamental concept in digital image analysis. In particular, computing local scale information is often a preliminary step for structure extraction, enabling the tuning of analysis tools and permitting scale invariant analysis. In texture or remote sensing image analysis, scale itself is also a useful feature for recognition or classification. In this paper, we propose a new method relying on morphological tools to compute characteristic scales in a digital image. Roughly speaking, a pixel is characterized by the size of the most contrasted object it belongs to.

In order to compute scales in an image, the approach initially proposed in [13] is probably the most extensively used. The basic idea is to compute local scales as extrema of various differential operators in the linear scale-space. Based on similar ideas, it is proposed in [11,23,28] to estimate scales in an image by considering extrema in the linear scale-space of operators based on information theory. The linear scale-space has also been proposed in the framework of remote sensing image analysis as a convenient way to estimate a resolution invariant characteristic scale [15]. However, such methods cannot achieve spatial accuracy in the computation of local scales, since it is well-known that Gaussian convolutions yield geometric degradations of the image through the diffusion of edges.

Recently, it has been proposed to use the total variation regularization to compute local scales [26, 27]. The underlying idea is that using non-linear partial differential equations enables one to get spatially accurate results. In [4], it is proposed to estimate the local scales of structures by looking at the way they evolve under the total variation flow.

In a different context, the mathematical morphology school has proposed to characterize materials through the size distribution of their constituents, by using the concept of granulometry [10]. Similarly, it has been proposed in [16] to use the derivative of granulometry, the pattern spectrum, to index gray-scale images. In the framework of remote sensing imaging, the authors of [7] have proposed, in view of the classification of satellite images, to compute size distributions (called derivative morphological profile) at each pixel.

In this paper, we first introduce a method to compute a local scale measure (a characteristic scale defined at each pixel) relying on the topographic map [5] of the image. The main idea is to associate to each pixel the scale of the most significant structure containing it. Contrary to previous morphological approaches, the proposed method is auto-dual (i.e. dark and bright structures are processed in the same way) and does not require any structuring element. More precisely, we use the Fast Level Set Transformation (FLST) [20], an efficient tool to compute the topographic map, representing an image by a hierarchical structure (an inclusion tree) of shapes. From this tree we search, for a given pixel, the most contrasted shape containing it and we associate the scale of this shape to the pixel. Since remote sensing images are blurred by the PSF of the optical instrument, the contour of a single structure can be diffused into several level lines. We propose to group the level lines belonging to the same structure. The criterion used to decide whether level lines should be grouped is based on the assumption that the optical blur is constant over the image, an assumption which makes sense for remote sensing images but would be wrong for natural, everyday life images.

The second contribution of the paper is a study of the relationships between morphological definitions of scale and definitions relying on the total variation. We show that approaches relying on the total variation flow or regularization yield local scales that are defined, under some assumptions, as weighted averages of the size of shapes containing a given pixel.

The paper is organized as follows. In Sect. 2, we present our method for local scale computation. In Sect. 3, alternative variational definitions of scale are recalled and the link between these approaches and ours is investigated. In Sect. 4, we illustrate the method with numerical examples on remote sensing images and compare our approach with variational methods.

X.-C. Tai et al. (Eds.): SSVM 2009, LNCS 5567, pp. 856–867, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 Local Scale Measure Based on Topographic Map

2.1 Topographic Map

We present in this section the main tool to be used in this paper in order to deﬁne local scales, the topographic map of an image as introduced in [5]. The topographic map is made of the set of level lines of an image. A level line is deﬁned as a connected component of the topological boundary of a level set.
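Level sets are simply thresholded versions of the image: the upper level set at level λ collects the pixels with gray level at least λ, the lower level set those with gray level at most λ. As a minimal sketch (plain Python lists, a small hypothetical image `u`, and helper names that are assumptions of this illustration rather than part of the FLST), the following computes both families and checks that the image is recovered from its upper level sets alone.

```python
def upper_level_set(u, lam):
    """Boolean mask of the pixels x with u(x) >= lam."""
    return [[v >= lam for v in row] for row in u]


def lower_level_set(u, lam):
    """Boolean mask of the pixels x with u(x) <= lam."""
    return [[v <= lam for v in row] for row in u]


def reconstruct_from_upper(upper_sets):
    """u(x) = max{lam : x belongs to the upper level set at lam}.

    `upper_sets` maps each gray level to its mask; the lowest level is
    assumed to contain every pixel.
    """
    levels = sorted(upper_sets)
    some_mask = upper_sets[levels[0]]
    height, width = len(some_mask), len(some_mask[0])
    return [[max(lam for lam in levels if upper_sets[lam][y][x])
             for x in range(width)]
            for y in range(height)]


# Hypothetical 4x4 image: a bright plateau with a brighter pixel inside.
u = [[1, 1, 1, 1],
     [1, 3, 3, 1],
     [1, 3, 5, 1],
     [1, 1, 1, 1]]
levels = sorted({v for row in u for v in row})
upper_sets = {lam: upper_level_set(u, lam) for lam in levels}
u_rec = reconstruct_from_upper(upper_sets)
```

The masks are nested (the set at level 5 lies inside the set at level 3, which lies inside the set at level 1), which is the inclusion structure the topographic map exploits.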

Equivalently, the topographic map can be seen as a collection of shapes, as defined below. For an image $u : \Omega \subset \mathbb{R}^2 \to \mathbb{R}$, its upper and lower level sets are respectively defined as

$$\Psi_\lambda = \{x \in \Omega,\ u(x) \ge \lambda\} \quad \text{and} \quad \Psi^\lambda = \{x \in \Omega,\ u(x) \le \lambda\},$$

for $\lambda \in \mathbb{R}$. Observe that $u$ can be reconstructed either from its upper level sets or from its lower level sets. Moreover, these sets are globally invariant with respect to contrast changes. Each of these families, upper sets on the one hand and lower sets on the other hand, has a tree structure with respect to inclusion. Several authors ([22, 5]) have proposed the connected components of level sets as an efficient way to represent images. In order to obtain a unique tree structure of connected components, Monasse et al. [19, 20] introduced the FLST. A shape is defined as the union of a connected component of an upper or lower set together with its holes. The holes of a set $A$ are defined as the connected components of the complementary set of $A$ which do not intersect the boundary of $\Omega$. Under mild regularity assumptions, shapes correspond to the interior of level lines, see [6]. An important property of the tree of shapes is its invariance to local contrast changes and its auto-duality, that is, its invariance with respect to the operation $u \mapsto M - u$. This implies in practice that light and dark objects are treated in the same way, a property which enables us to associate a unique contrasted shape to each pixel. Figure 1 shows the result obtained with the FLST algorithm on a synthetic image.

For a pixel $x$ of an image $u$, we denote by $\{f_i(x)\}_{i \in A(x)}$ the set of shapes that contain $x$, $A(x)$ being the set of indices, ordered such that $f_i(x) \subset f_{i+1}(x)$. For the sake of clarity, we will omit the dependency on $x$ when it is not necessary. For each shape, we denote by $S(f_i)$ its area, by $P(f_i)$ its perimeter, and by $I(f_i)$ the gray level value associated to $f_i$. The contrast of the shape $f_i$ is then defined as the absolute value of the difference between the gray level values associated respectively to $f_{i+1}$ and $f_i$:

$$C(f_i) = |I(f_{i+1}) - I(f_i)| \tag{1}$$

Fig. 1. Example of FLST: (a) Synthetic image; (b) Inclusion tree obtained with FLST

2.2 Scales of an Image

Basically, we want to associate to each pixel a shape (i.e. a node of the tree computed by the FLST) from which its scale can be computed. Such shapes are obtained by filtering the topographic map. Shapes are recursively grouped in order to account for structures present in the image and the most contrasted groups are kept. Shape grouping is defined by taking advantage of the particular structure of satellite images, for which the blur is constant over the image and depends only on the (usually known) PSF of the acquisition device. We then define the local scale associated to each pixel.

Most Contrasted Shape. In view of (1), the simplest definition of the most contrasted shape at a pixel $x$ would be the shape with highest contrast among all shapes containing $x$, i.e.

$$\hat f(x) = f_{\arg\max_{i \in A(x)} C(f_i)}. \tag{2}$$

However, this definition is not applicable. Indeed, the definition of contrast by (1) corresponds to the contrast of a given binary structure under the assumption that only one line is associated to the boundary of this structure. Now, in a natural image, the contours of objects are always blurred by the acquisition process. As a consequence, a discrete image being quantized, a contour is in fact associated to a set of level lines. In practice, the contrast of each line is often equal to one. Therefore, the choice of the most contrasted shape using (2) can be ambiguous at best or even completely meaningless in the presence of strong noise. A possible solution to this problem has been proposed in [8]. It consists in computing the contrast of a line in a neighborhood, and then in selecting the most meaningful line along a monotone branch of the tree. In the present work, we choose to group level lines corresponding to a single structure by using a simple model of blur. To do so, we recursively sum up the contrasts of shapes $f_i$ and $f_{i+1}$ such that

$$S(f_{i+1}) - S(f_i) < \lambda P(f_i), \tag{3}$$

where $\lambda$ is a constant. This criterion relies on the hypothesis that the level lines corresponding to a blurred contour are regularly separated by a distance $\lambda$. Let us remark that the hypothesis of a constant blurring kernel for the whole image is realistic in the case of satellite images. We thus define the cumulated contrast of a shape $f_i$ as

$$\bar C(f_i) = \sum_{k=a(i)}^{i} C(f_k), \tag{4}$$

where, for all $i$, $a(i) = \min\{j \mid \forall k = j+1, \dots, i,\ S(f_k) - S(f_{k-1}) \le \lambda P(f_{k-1})\}$. If $a(i)$ is not defined (that is, if (3) is not satisfied), then $\bar C(f_i) = C(f_i)$. The cumulated contrast of $f_i$ is therefore obtained by adding the contrasts of close enough level lines, which usually correspond to the same contour in the image. The most contrasted shape associated to $x$ is then defined as

$$\hat f_c(x) = f_{\arg\max_{i \in N} \bar C(f_i(x))}. \tag{5}$$
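The selection rule (1), (4), (5) can be sketched for a single pixel as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the branch of nested shapes containing the pixel is given as three hypothetical lists (areas, perimeters, gray levels, innermost shape first), the outermost shape is assigned contrast 0 because it has no parent, the grouping test uses the non-strict inequality appearing in the definition of $a(i)$, and ties are broken in favour of the smaller index.

```python
def contrasts(gray):
    """C(f_i) = |I(f_{i+1}) - I(f_i)|; the outermost shape gets 0 (assumption)."""
    return [abs(gray[i + 1] - gray[i]) for i in range(len(gray) - 1)] + [0]


def cumulated_contrasts(area, perim, gray, lam=1.0):
    """Cumulated contrast (4) along one branch of nested shapes."""
    c = contrasts(gray)
    cbar = []
    for i in range(len(c)):
        a = i
        # Extend the group inwards while consecutive shapes are close enough.
        while a > 0 and area[a] - area[a - 1] <= lam * perim[a - 1]:
            a -= 1
        cbar.append(sum(c[a:i + 1]))
    return cbar


def most_contrasted_index(cbar):
    """Index selected by (5); max() returns the first, i.e. smallest, maximiser."""
    return max(range(len(cbar)), key=lambda i: cbar[i])


# Hypothetical branch: an edge spread over three close level lines (indices 0-2),
# followed by two much larger shapes.
area = [100, 138, 180, 5000, 10000]
perim = [40, 48, 54, 300, 400]
gray = [200, 170, 140, 110, 100]

c = contrasts(gray)                               # [30, 30, 30, 10, 0]
cbar = cumulated_contrasts(area, perim, gray)     # [30, 60, 90, 10, 0]
selected = most_contrasted_index(cbar)            # 2
```

Without grouping, the three lines of the blurred edge have the same contrast and (2) cannot distinguish them; with grouping, the outermost line of the edge accumulates the whole contrast and is selected.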

In the case when the maximum is reached at more than one index, the smallest one is chosen. Note that a method to group level lines relying on criteria similar to (3) (but using no perimeter information) was proposed in [19] as an efficient alternative to shock filters, in the framework of image restoration. Notice also that the notion of Maximally Stable Extremal Regions (MSER) [18], which is popular in the computer vision community, can be seen as an alternative way of selecting the significant shape for a given pixel. MSERs are defined as the shapes for which the quantity $(S(f_{i+1}(x)) - S(f_{i-1}(x)))/S(f_i(x))$ reaches a local minimum. Observe that there may be several MSERs containing a pixel, so that a further selection of shapes must be performed to compute the scale. It is shown in [14] that the selection method strongly biases the definition of local scale.

Level Lines, Edges and Blur. We now investigate the validity of the use of (3) for grouping lines corresponding to a single edge. Let $f_i$ and $f_{i+1}$ be two consecutive shapes corresponding to the same object, an object being defined as a constant times the indicator function of some set smoothed by the acquisition kernel. Writing $q$ for the quantization step and neglecting sampling, we have, for some gray level $l$, $f_i = \Psi_l = \{x \in \Omega / u(x) \ge l\}$ and $f_{i+1} = \Psi_{l+q}$. Now, as noticed in [9], if $x(s)$ is a parameterization of $\partial\Psi_l$, then $\partial\Psi_{l+q}$ can be approximated, for small $q$, by

$$\tilde x(s) = x(s) + q \frac{\nabla u}{|\nabla u|^2}. \tag{6}$$

If we now assume that $|\nabla u| \ge C$ for some $C > 0$, then $f_{i+1} \subset f_i \oplus D(qC^{-1})$, where $\oplus$ stands for the Minkowski addition (for two sets $A$ and $B$, $A \oplus B = \{x + y,\ x \in A,\ y \in B\}$) and $D(r)$ is a disk of radius $r$ centered at the origin. On the other hand, assuming that $f_i$ is a convex set, the area of $f_i \oplus D(qC^{-1})$ is given by the Steiner formula [25]:

$$S\left(f_i \oplus D\left(\frac{q}{C}\right)\right) = S(f_i) + \pi \left(\frac{q}{C}\right)^2 + \frac{q}{C} P(f_i).$$

This suggests that (3) enables one to group level lines corresponding to the same edge as soon as $\lambda > qC^{-1}$.
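The argument can be checked numerically on a disk, for which the dilated area is known in closed form. The sketch below is only an illustration: the radius, the quantization step `q`, the gradient bound `c` and the function names are hypothetical, and `satisfies_grouping_criterion` tests the worst case in which $f_{i+1}$ fills the whole dilated set.

```python
import math


def steiner_area(area, perimeter, r):
    """Area of a convex planar set dilated by a disk of radius r (Steiner formula)."""
    return area + perimeter * r + math.pi * r ** 2


def satisfies_grouping_criterion(area, perimeter, q, c, lam):
    """Criterion (3) in the worst case S(f_{i+1}) = S(f_i dilated by D(q / c))."""
    growth = steiner_area(area, perimeter, q / c) - area
    return growth < lam * perimeter


# Hypothetical shape: a disk of radius 20 pixels.
R = 20.0
disk_area = math.pi * R ** 2
disk_perimeter = 2 * math.pi * R

# Dilating a disk of radius R by r gives a disk of radius R + r.
exact = math.pi * (R + 0.5) ** 2
from_steiner = steiner_area(disk_area, disk_perimeter, 0.5)

# With q = 1 and |grad u| >= c = 2, consecutive level lines are at most 0.5 apart.
grouped_with_lambda_1 = satisfies_grouping_criterion(disk_area, disk_perimeter, 1, 2, 1.0)
grouped_with_lambda_04 = satisfies_grouping_criterion(disk_area, disk_perimeter, 1, 2, 0.4)
```

In this example the lines are grouped for λ = 1 but not for λ = 0.4, which is below q/C = 0.5. The quadratic term of the Steiner formula is why the text says "suggests": λ has to exceed q/C by a small margin depending on the perimeter.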

Scale Definition. We have seen above that the most contrasted shape at each pixel is defined by (5). In order to define the scale at each pixel, we choose to consider as final shape associated to $x$ the shape $\hat f(x)$ minus the most contrasted shapes embedded in it. Let us recall indeed that a shape is a connected component of a level set whose holes have been filled in. In Fig. 1, the shape F contains the pixels of the shape H. In satellite images occlusion is not preponderant, and contrasted shapes containing other contrasted shapes often correspond to road or river networks. To accurately represent such structures, we finally define the most contrasted shape associated to a pixel $x$ as

$$\tilde f(x) = \hat f(x) \setminus \bigcup_{\hat f(y) \subsetneq \hat f(x)} \hat f(y), \tag{7}$$

i.e. the shape $\hat f(x)$ minus the most contrasted shapes strictly embedded in it. Other choices would be possible in the framework of other applications. We choose to define the scale as

$$E(x) = S(\tilde f(x))/P(\tilde f(x)) \tag{8}$$

so that the geometry of $\tilde f(x)$ is taken into account. In particular, long and thin shapes (e.g. the roads) correspond to relatively small scales, even though their area can be quite large.
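A small sketch of (7) and (8) on binary masks shows how the ratio area/perimeter penalizes thin shapes. It rests on assumptions that the paper does not specify: shapes are given as 0/1 masks, the area is the pixel count, and the perimeter is the number of unit pixel edges separating the shape from its complement (edges on the image border included). All helper names and example shapes are hypothetical.

```python
def area_perimeter(mask):
    """Pixel count and number of pixel edges between the shape and its complement."""
    h, w = len(mask), len(mask[0])
    area = perim = 0
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            area += 1
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                outside = not (0 <= ny < h and 0 <= nx < w)
                if outside or not mask[ny][nx]:
                    perim += 1
    return area, perim


def scale(mask):
    """Scale (8): area divided by perimeter."""
    area, perim = area_perimeter(mask)
    return area / perim


def subtract(outer, inner):
    """Shape of (7): the outer shape minus an embedded contrasted shape."""
    return [[int(o and not i) for o, i in zip(row_o, row_i)]
            for row_o, row_i in zip(outer, inner)]


def rectangle(h, w):
    return [[1] * w for _ in range(h)]


field = rectangle(8, 8)     # compact shape, area 64
road = rectangle(2, 32)     # thin shape, same area 64
building = [[1 if 2 <= y < 6 and 2 <= x < 6 else 0 for x in range(8)]
            for y in range(8)]
field_without_building = subtract(field, building)

scale_field = scale(field)                    # 64 / 32 = 2.0
scale_road = scale(road)                      # 64 / 68
scale_ring = scale(field_without_building)    # 48 / 48 = 1.0
```

The road and the field have the same area, but the road gets less than half the scale of the field. Removing the embedded shape lowers the scale as well, since the boundary of the hole adds to the perimeter.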

3 Link with Variational Definitions of Scales

Recently, two definitions of local scale related to the total variation of images have been proposed in the literature. They are presented in Sect. 3.1. In Sect. 3.2 a geometrical interpretation of these definitions is given and the link between them is clarified. These definitions are actually closely connected with the scale definition of this paper: the methods relying on the total variation define the scale at each pixel as a weighted average over many shapes of the ratio area/perimeter, whereas the approach proposed in this paper defines scale using the same ratio but relies on only one shape per pixel.

3.1 Variational Definitions of Scale

Definition Based on Total Variation Regularization. It is proposed in [27] to define the scales in an image by using the Rudin-Osher-Fatemi (ROF) model [21]. Recall that the ROF model (or total variation regularization) consists, given an image $f$, in finding the solution $u$ of

$$\inf_u \left( \int |Du| + \frac{1}{2T} \|f - u\|_{L^2}^2 \right). \tag{9}$$

It is shown in [27] that if the scale of a set $E$ is defined as $S(E)/P(E)$ (i.e. its area divided by its perimeter, as done in (8)) and if $f$ is a binary image of a disk, then the intensity change $\delta$ between $u$ and $f$ inside this disk is inversely proportional to its scale, i.e. $\delta = T\,P(E)/S(E)$. Therefore, the idea in [27] to define scales in an image is to use the gray level difference at each pixel between $u$ and $f$. The scale at each pixel $x$ is defined as

$$\mathrm{scale}(x) = T\,|u(x) - f(x)|^{-1}. \tag{10}$$

Observe that, in general, this definition of scale depends on the parameter $T$.

Definition Based on the Total Variation Diffusion. Another definition of scale in images has been introduced in [4], by making use of the properties of total variation diffusion. Let us recall that the solution $u$ of the total variation diffusion satisfies

$$\frac{\partial u}{\partial t} = \mathrm{div}\left(\frac{Du}{|Du|}\right), \qquad u(\cdot, 0) = f. \tag{11}$$

In [24], the authors have proved the equivalence, for 1-dimensional signals, of total variation regularization (ROF model) and total variation diffusion. They have derived the same type of results as in [27] (where the considered functions were 2-dimensional radially symmetric signals). In particular, when using the total variation diffusion on an image, a constant region evolves with speed $2/m$, where $m$ is the number of pixels in the considered region. Therefore in [4] the authors have proposed to define the scale $m$ of a region (in dimension 1) as

$$m = \frac{2T}{\int_0^T |\partial_t u|\,dt},$$

where $T$ is the evolution time of the total variation diffusion. In the same paper, the following definition of scale $m$ is then proposed for 2-dimensional images:

$$m = \frac{T}{\int_0^T |\partial_t u|\,dt}. \tag{12}$$

3.2 Equivalence and Geometrical Interpretation

As explained above, a geometrical interpretation of the scale definition given by (10) is provided by results from [27]. On the other hand, equivalence results between total variation regularization (see (9)) and total variation flow (see (11)) are provided in [24]. We now summarize some recent mathematical results in order to further investigate the definitions of scale given by (10) and (12), as well as to clarify the link with the definition of scale given in the present paper, (8). These results have been proved by V. Caselles and his collaborators in a series of papers [2, 3, 1]. In particular, it is shown that, if an image $f$ is the characteristic function of a convex set $C$, i.e. $f = 1_C$, then total variation regularization is equivalent to total variation flow. In both cases, the evolution speed of a convex body $C$ is $P(E)/|E|$, where $E$ is the Cheeger set of $C$ (see [1]), that is, $E$ is a solution of $\min_{K \subset C} P(K)/|K|$. The set $C$ is said to be a Cheeger set in itself if it is itself a solution to this minimization problem. In dimension 2, a necessary and sufficient condition for $C$ to be a Cheeger set in itself is that $C$ is convex and the curvature of $\partial C$ is smaller than $P(C)/|C|$. A disk is thus a Cheeger set in itself.

Assume that $f = 1_C$, with $C$ a Cheeger set in itself. Then it is shown in [3] that the solution of the total variation flow, (11), or equivalently of the total variation regularization, (9), is given by

$$u(x, T) = \max\left(1 - T\,\frac{P(C)}{|C|},\ 0\right) 1_C(x). \tag{13}$$

The evolution speed of $C$ is thus $P(C)/|C|$ (and in the case when $C$ is a disk, this is what was proved by Strong and Chan in [27]). As a consequence, in the case when the considered image $f$ is the characteristic function of a Cheeger set, both definitions of scale (10) and (12) are equivalent. Notice that in this particular case these two definitions of scale are also equivalent to the one proposed in this paper, (8). With all three definitions (10), (12), and (8), the scale of $C$ is equal to $|C|/P(C)$. Of course, in the case of more complicated images, the equivalence does not hold any more (see [14] for a detailed study).
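The agreement of the three definitions on a Cheeger set can be checked numerically from the explicit solution (13). The sketch below is an illustration under stated assumptions: the image is the indicator of a disk of hypothetical radius 10, the evolution time `T` is chosen smaller than $|C|/P(C)$ so that the disk has not vanished, the time integral in (12) is approximated by summing absolute increments on a uniform grid, and all function names are ours.

```python
import math


def height(T, area, perimeter):
    """Value of u(., T) inside C according to (13), for f = 1_C."""
    return max(1.0 - T * perimeter / area, 0.0)


def scale_rof(T, area, perimeter):
    """Definition (10): T divided by the gray level change |u - f| inside C."""
    delta = abs(height(T, area, perimeter) - 1.0)
    return T / delta


def scale_tv_flow(T, area, perimeter, steps=1000):
    """Definition (12): T divided by the time integral of |d_t u| inside C."""
    dt = T / steps
    total = 0.0
    previous = height(0.0, area, perimeter)
    for k in range(1, steps + 1):
        current = height(k * dt, area, perimeter)
        total += abs(current - previous)
        previous = current
    return T / total


radius = 10.0
area = math.pi * radius ** 2
perimeter = 2 * math.pi * radius

geometric = area / perimeter              # definition (8): radius / 2
from_rof = scale_rof(2.0, area, perimeter)
from_flow = scale_tv_flow(2.0, area, perimeter)
```

All three values equal radius/2 up to rounding. Once `T` exceeds $|C|/P(C)$ the disk has disappeared, the gray level change saturates at 1, and (10) and (12) both return `T` instead of the geometric scale, which illustrates the dependence on the parameter noted in Sect. 3.1.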

4 Experiments

In this section, we compute scale maps on real satellite images using the approach presented in Sect. 2. We then compare these results with those obtained by variational methods.

We first consider SPOT5 HMA images with a spatial resolution of 5 meters. Most contrasted shapes are extracted using (5). We choose to use a value of λ = 1. Of course, the choice of λ is related to the image acquisition process and the image quantization. It is shown in Appendix A of [14] that, under reasonable assumptions, it makes sense to use λ = 1 in (3). Figures 2(a) and (d) display a SPOT5 HMA image of Toulouse (half urban area and half rural area) together with its computed scale map. It can be observed that the computed scales are spatially accurate (e.g. at the edges of buildings and warehouses). Moreover, these scales are in good qualitative agreement with the size of structures of the image (large scales for fields and the forest, smaller scales for individual houses on the right). One also observes that computed scales are largely constant inside objects. Notice also that the road network is attributed relatively small scales, in agreement with (8).

Figures 2(b) and (e) show the scale map obtained from a SPOT5 THR image [12] of Marseille with resolution 2.5m. In this clever imaging system, two images captured by two different CCD line arrays are interpolated to generate the high resolution image. The PSF of SPOT5 THR images is much more complicated than that of HMA images and cannot be modeled in a simple way, for instance using a Gaussian kernel. However, the slope of this PSF is sharp enough so that a value of λ = 1 still allows level lines belonging to the same contour to be grouped. Again one can observe that the scales computed for the fields at the bottom are larger than those of the urban area at the top.

Finally we present the scale map for a QuickBird Panchromatic image with a resolution of 0.6m, taken at Ouagadougou (see Fig. 2(c)). Again we use a value of λ = 1. Considering the high resolution of QuickBird images, shapes smaller than 16 pixels are not taken into consideration. This is equivalent to the application of a grain filter of size 16 before the computation of the scale map, see [17]. The scale map is shown in Fig. 2(f). Again it can be observed that for most structures, such as the big buildings on the top left, computed scales are spatially accurate. However, for the city block in the middle of the image, the scale of the shape corresponding to the whole block has been associated to all pixels of this shape. This shows one of the limitations of the method: to each pixel is associated the scale of exactly one structure. A natural extension of the method would be to compute a scale profile at each pixel, in a way similar to [7].

Fig. 2. (a) Image of Toulouse, SPOT5 (5m) ©CNES; (b) Image of Marseille, SPOT5 (2.5m, 512 × 512) ©CNES; (c) Image of Ouagadougou, QuickBird (0.6m, 512 × 512) ©CNES; (d)-(f) Scale maps corresponding to (a)-(c) obtained by the method proposed in this paper. Notice the spatial accuracy of the method. (g)-(i) Scale maps corresponding to (a)-(c) obtained by using the total variation diffusion, (12).

In Sect. 3, we have compared our scale definition with the scale definitions proposed by [4] and [27], which are based respectively on the total variation flow and the Rudin-Osher-Fatemi model. We have also explained why these two approaches ([4] and [27]) are equivalent under some regularity assumptions. Therefore, in this section, we only use scale maps computed by the method of [4] for experimental comparisons. Figures 2(g)-(i) display the scale maps of the previously shown images of Toulouse, Marseille and Ouagadougou (see Figs. 2(a)-(c)), obtained by using total variation diffusion (see (12)). The total evolution time is T = 60. The first observation is that all these methods yield a good spatial accuracy. Second, one can observe that the scale maps displayed in Figs. 2(g)-(i) are noisier than the ones obtained using the method presented in this paper, Figs. 2(d)-(f). Third, regions of the original images with homogeneous scales are more clearly identified in Figs. 2(d)-(f) than with the method using the total variation flow to define scales. It appears quite clearly on these three examples that the approach presented in this paper yields sharper results. This is probably due to the definition of scale given by (8): only one contrasted shape is selected for each pixel.

5 Conclusions and Perspectives

By using the topographic map, we have introduced a definition of local scale. The scale of a pixel is defined as the scale of the most contrasted shape containing this pixel. We have validated our approach on various satellite images (we refer the interested reader to [14] for more numerical examples). These experiments indicate that the method gives robust and spatially accurate results. No complex parameter tuning is involved. However, it should be noted that this approach is dedicated to remote sensing images: indeed, when defining the contrast of level lines, we make the assumption that blur is uniform over the image. Another contribution of this paper is the study of the links between the proposed method and previous variational definitions of scale. This bridges a gap between morphological and variational methods to compute scales in an image. We think that the proposed scale measure could be an efficient feature for image classification or segmentation. This should be the subject of further studies. Indeed, this feature can be expected to be complementary to more traditional indexing features obtained through wavelets or pixel statistics.

References 1. Alter, F., Caselles, V.: Uniqueness of the cheeger set of a convex body (submitted, 2007) (preprint) 2. Bellettini, G., Caselles, V., Novaga, M.: The total variation ﬂow in Rn . Journal of diﬀerential equation 184, 475–525 (2002)

866

B. Luo, J.-F. Aujol, and Y. Gousseau

3. Bellettini, G., Caselles, V., Novaga, M.: Explicit solutions of the eigenvalue problem $-\operatorname{div}\left(\frac{Du}{|Du|}\right) = u$ in $\mathbb{R}^2$. SIAM Journal on Mathematical Analysis 36, 1095–1129 (2005)
4. Brox, T., Weickert, J.: A TV flow based local scale estimate and its application to texture discrimination. Journal of Visual Communication and Image Representation 17, 1053–1073 (2006)
5. Caselles, V., Coll, B., Morel, J.-M.: Topographic maps and local contrast changes in natural images. Int. J. Comp. Vision 33, 5–27 (1999)
6. Caselles, V., Monasse, P.: Geometric Description of Topographic Maps and Applications to Image Processing. Lecture Notes in Mathematics. Springer, Heidelberg (to appear, 2009)
7. Chanussot, J., Benediktsson, J., Fauvel, M.: Classification of remote sensing images from urban areas using a fuzzy possibilistic model. IEEE Geoscience and Remote Sensing Letters 3, 40–44 (2006)
8. Desolneux, A., Moisan, L., Morel, J.: Edge detection by Helmholtz principle. Int. J. of Computer Vision 14, 271–284 (2001)
9. Dibos, F., Koepfler, G., Monasse, P.: Total variation minimization for scalar/vector regularization. In: Osher, S., Paragios, N. (eds.) Geometric Level Set Methods in Imaging, Vision, and Graphics, pp. 121–140 (2003)
10. Haas, A., Matheron, G., Serra, J.: Morphologie mathématique et granulométries en place. Annales des mines, 736–753 (1967)
11. Jägersand, M.: Saliency maps and attention selection in scale and spatial coordinates: An information theoretic approach. In: Proc. 5th Int. Conf. on Computer Vision, Cambridge, MA, USA, pp. 195–202 (1995)
12. Latry, C., Rouge, B.: SPOT5 THR mode. In: Proc. SPIE Earth Observing Systems III, October 1998, vol. 3493, pp. 480–491 (1998)
13. Lindeberg, T.: Feature detection with automatic scale selection. Int. J. of Computer Vision 30, 79–116 (1998)
14. Luo, B., Aujol, J.-F., Gousseau, Y.: Local scale measure from the topographic map and application to remote sensing images. SIAM Multiscale Modeling and Simulation (to appear)
15. Luo, B., Aujol, J.-F., Gousseau, Y., Ladjal, S., Maître, H.: Resolution independent characteristic scale dedicated to satellite images. IEEE Trans. on Image Processing 16, 2503–2514 (2007)
16. Maragos, P.: Pattern spectrum and multiscale shape representation. IEEE Trans. Pattern Anal. Mach. Intell. 11, 701–716 (1989)
17. Masnou, S., Morel, J.-M.: Image restoration involving connectedness. In: DIP 1997, pp. 84–95. SPIE (1997)
18. Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from maximally stable extremal regions. In: Proceedings of the British Machine Vision Conference, vol. 1, pp. 384–393 (2002)
19. Monasse, P.: Morphological representation of digital images and application to registration. PhD thesis, University Paris IX (2000)
20. Monasse, P., Guichard, F.: Fast computation of a contrast-invariant image representation. IEEE Trans. on Image Processing 9, 860–872 (2000)
21. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
22. Salembier, P., Serra, J.: Flat zones filtering, connected operators, and filters by reconstruction. IEEE Transactions on Image Processing 4, 1153–1160 (1995)


23. Sporring, J., Weickert, J.: On generalized entropies and scale-space. In: Scale-Space Theories in Computer Vision, pp. 53–64 (1997)
24. Steidl, G., Weickert, J., Brox, T., Mrazek, P., Welk, M.: On the equivalence of soft wavelet shrinkage, total variation diffusion, total variation regularization, and SIDEs. SIAM Journal on Numerical Analysis 42, 686–713 (2004)
25. Stoyan, D., Kendall, W.S., Mecke, J.: Stochastic Geometry and its Applications, 2nd edn. Wiley, Chichester (1995)
26. Strong, D., Aujol, J.-F., Chan, T.: Scale recognition, regularization parameter selection, and Meyer’s G norm in total variation regularization. SIAM Journal on Multiscale Modeling and Simulation 5, 273–303 (2006)
27. Strong, D., Chan, T.: Edge-preserving and scale-dependent properties of total variation regularization. Inverse Problems 19, 165–187 (2003)
28. Winter, A., Maître, H., Cambou, N., Legrand, E.: An Original Multi-Sensor Approach to Scale-Based Image Analysis for Aerial and Satellite Images. In: IEEE ICIP 1997, Santa Barbara, CA, USA, vol. II, pp. 234–237 (1997)

Author Index

Alrefaya, Musa 212
Andersson, Thord 124
Astola, Laura 224
Aubert, Gilles 137
Aujol, Jean-François 295, 856
Avenel, Christophe 576

Bae, Egil 1
Bardin, Sabine 770
Becciu, Alessandro 588
Becker, Florian 150
Benmansour, Fethallah 14, 648
Berkels, Benjamin 26
Bischof, Horst 200
Borga, Magnus 124
Borok, Sofia 490
Bourgine, Paul 38
Bresson, Xavier 112
Breuß, Michael 247, 636, 733, 758
Brune, Christoph 235
Burger, Martin 235
Burgeth, Bernhard 247

Chambolle, Antonin 368
Chan, Tony F. 112
Chessel, Anatole 770
Cinquin, Bertrand 770
Cohen, Laurent D. 14, 163, 648, 672
Crosier, Mike 343

Damerval, Christophe 782
Dascal, Lorina 259
DeCezaro, Adriano 50
Dinov, Ivo 389
Dong, Yiqiu 271
Drblíková, Olga 63
Duits, Remco 377, 795, 820
Durand, Sylvain 282
Duval, Vincent 295

Eirola, Timo 660
Elmoataz, Abderrahim 187
Elo, Christoffer A. 307

Fadili, Jalal 137, 282
Feigin, Micha 319
Felsberg, Michael 808
Florack, Luc 224, 377, 588
Franchini, Elena 75
Franken, Erik 795, 820
Frolkovič, Peter 38
Fundana, Ketut 684

Gabrielides, Nikolaos 672
Gilboa, Guy 527
Gousseau, Yann 856
Grasmair, Markus 331
Griffin, Lewis D. 343
Guillot, Laurence 87
Gurumoorthy, Karthik S. 100
Gustavsson, David 832

Haber, Eldad 612
Hahn, Jooyoung 490
Hajiaboli, Mohammad Reza 356
Hao, Dinh Nho 212
Heldmann, Stefan 612, 624
Heyden, Anders 684
Hintermüller, Michael 271
Houhou, Nawal 112

Imiya, Atsushi 175

Jalalzai, Khalid 368
Janssen, Bart 377
Jehan-Besson, Stephanie 137
Jindal, Nitin 696
Joshi, Shantanu H. 389
Jung, Miyoun 401

Kappes, Jörg 150
Keriven, Renaud 721
Kervrann, Charles 770
Kimmel, Ron 259
Kozerke, Sebastian 588
Kuijper, Arjan 844

Lai, Ming-Jun 514
Lassila, Toni 660
Läthén, Gunnar 124
Lauze, Francois 832
Lecellier, Francois 137
Le Guyader, Carole 87, 600
Leichtweis, Thomas 733
Leitão, Antonio 50
Lellmann, Jan 150
Lenz, Reiner 124
Lenzen, Frank 413
Lézoray, Olivier 187
Lillholm, Martin 343
Lucier, Bradley 514
Luo, Bin 856

Malyshev, Alexander 307
Marquina, Antonio 389
Meignen, Sylvain 782
Mémin, Etienne 576
Mikula, Karol 38, 63
Mille, Julien 163
Modersitzki, Jan 612
Morigi, Serena 75, 426

Ng, Michael K. 539
Nielsen, Mads 832
Nikolova, Mila 282, 439

Osher, Stanley J. 389
Overgaard, Niels Chr. 684

Papenberg, Nils 624
Pedersen, Kim S. 832
Pérez, Patrick 576
Peyriéras, Nadine 38, 63
Pizarro, Luis 247
Pock, Thomas 200
Prados, Emmanuel 696, 745

Rahman, Talal 307
Rangarajan, Anand 100
Reichel, Lothar 426
Remešíková, Mariana 38
Revenu, Marinette 137
Roode, Vivian 588
Rosman, Guy 259
Rumpf, Martin 709

Sahli, Hichem 212
Sakai, Tomoya 175
Salamero, Jean 770
Sawatzky, Alex 235
Scherzer, Otmar 413, 452
Schnörr, Christoph 150, 552
Segonne, Florent 721
Seidel, Hans-Peter 636
Setzer, Simon 464
Sgallari, Fiorella 75, 426
Soatto, Stefano 696
Sochen, Nir 319
Steidl, Gabriele 477, 552
Sturm, Peter 745
Szlam, Arthur 112

Ta, Vinh-Thong 187
Tai, Xue-Cheng 1, 50, 259, 490, 502, 539
ter Haar Romeny, Bart M. 588
Teuber, Tanja 477
Thiran, Jean-Philippe 112
Thorstensen, Nicolas 721
Toga, Arthur W. 389

Unger, Markus 200

van Assen, Hans 588
Vanhamel, Iris 212
Van Horn, John D. 389
van Sande, Justus 343
Vese, Luminita A. 295, 401, 600
Vogel, Oliver 733

Walch, Birgit 452
Wang, Jingyue 514
Weickert, Joachim 247, 527, 636, 733, 758
Welk, Martin 527
Werlberger, Manuel 200
Wirth, Benedikt 709
Wu, Chunlin 502

Yau, Andy C. 539
Yoon, Kuk-Jin 745
Yuan, Jing 150, 552

Zéraï, Mourad 565
Zimmer, Henning 636