Lecture Notes in Computer Science Edited by G. Goos, J. Hartmanis, and J. van Leeuwen
2886
Berlin Heidelberg New York Hong Kong London Milan Paris Tokyo
Ingela Nyström Gabriella Sanniti di Baja Stina Svensson (Eds.)
Discrete Geometry for Computer Imagery 11th International Conference, DGCI 2003 Naples, Italy, November 19-21, 2003 Proceedings
Series Editors Gerhard Goos, Karlsruhe University, Germany Juris Hartmanis, Cornell University, NY, USA Jan van Leeuwen, Utrecht University, The Netherlands Volume Editors Ingela Nyström Uppsala University, Centre for Image Analysis Lägerhyddvägen 3, 752 37 Uppsala, Sweden E-mail:
[email protected] Gabriella Sanniti di Baja Institute of Cybernetics "E. Caianiello" National Research Council of Italy Via Campi Flegrei, 34, 80078 Pozzuoli (Naples), Italy E-mail:
[email protected] Stina Svensson Swedish University of Agricultural Sciences Centre for Image Analysis L¨agerhyddv¨agen 3, 752 37 Uppsala, Sweden E-mail:
[email protected] Cataloging-in-Publication Data applied for A catalog record for this book is available from the Library of Congress. Bibliographic information published by Die Deutsche Bibliothek Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available in the Internet at
.
CR Subject Classification (1998): I.4, I.3.5, G.2, I.6.8, F.2.1 ISSN 0302-9743 ISBN 3-540-20499-7 Springer-Verlag Berlin Heidelberg New York This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law. Springer-Verlag is a member of Springer Science+Business Media GmbH springeronline.com © Springer-Verlag Berlin Heidelberg 2003 Printed in Germany Typesetting: Camera-ready by author, data conversion by Olgun Computergrafik Printed on acid-free paper SPIN: 10967836 06/3142 543210
Preface
This proceedings volume includes papers presented at DGCI 2003 in Naples, Italy, November 19–21, 2003. DGCI 2003 was the 11th conference in a series of international conferences on Discrete Geometry for Computer Imagery. The conference was organized by the Italian Institute for Philosophical Studies, Naples and the Institute of Cybernetics “E. Caianiello,” National Research Council of Italy, Pozzuoli (Naples). DGCI 2003 was sponsored by the International Association for Pattern Recognition (IAPR). This is the second time the conference took place outside France. The number of researchers active in the field of discrete geometry and computer imagery is increasing. Both these factors contribute to the increased international recognition of the conference. The DGCI conferences attract more and more academic and research institutions in different countries. In fact, 68 papers were submitted to DGCI 2003. The contributions focus on discrete geometry and topology, surfaces and volumes, morphology, shape representation, and shape analysis. After careful reviewing by an international board of reviewers, 23 papers were selected for oral presentation and 26 for poster presentation. All contributions were scheduled in plenary sessions. In addition, the program was enriched by three lectures, presented by internationally well-known invited speakers: Isabelle Bloch (École Nationale Supérieure des Télécommunications, France), Longin Jan Latecki (Temple University, USA), and Ralph Kopperman (City College of New York, USA). In 2002, a technical committee of the IAPR, TC18, was established with the intention to promote interactions and collaboration between researchers working on discrete geometry. The first TC18 meeting was planned to be held in conjunction with DGCI 2003, to allow the members to discuss the activity of the technical committee.
The outcome of this meeting will support ongoing research and communication among researchers active in the field during the 18 months between conferences. We hope that we made DGCI 2003 an unforgettable event where researchers gathered for fruitful discussions and enjoyable social activities, both of which provided stimuli for further research. We would like to thank the contributors who responded to the call for papers in a very positive manner, the invited speakers, all reviewers and members of the steering, program, and local organizing committees, as well as the DGCI participants. We are also grateful to Regione Campania for its financial help, indispensable in guaranteeing a successful conference. September 2003
Ingela Nyström Gabriella Sanniti di Baja Stina Svensson
Organization
DGCI 2003 was organized by the Institute of Cybernetics “E. Caianiello” of the National Research Council of Italy and by the Italian Institute for Philosophical Studies. The conference venue was the Italian Institute for Philosophical Studies. The conference was sponsored by the International Association for Pattern Recognition (IAPR).
Conference Chairs
General Chair: Gabriella Sanniti di Baja, Institute of Cybernetics “E. Caianiello,” National Research Council of Italy, Pozzuoli (Naples), Italy
Program Chair: Stina Svensson, Centre for Image Analysis, Swedish University of Agricultural Sciences, Uppsala, Sweden
Publications Chair: Ingela Nyström, Centre for Image Analysis, Uppsala University, Uppsala, Sweden
Steering Committee
Gilles Bertrand (France)
Gunilla Borgefors (Sweden)
Achille Braquelaire (France)
Jean-Marc Chassery (France)
Annick Montanvert (France)
Gabor Szekely (Switzerland)
Program Committee
Eric Andrès (France)
Bidyut Baran Chaudhuri (India)
Michel Couprie (France)
Leila De Floriani (Italy)
Ulrich Eckhardt (Germany)
Bianca Falcidieno (Italy)
Christophe Fiorio (France)
Richard W. Hall (USA)
Atsushi Imiya (Japan)
Pieter Jonker (The Netherlands)
Ron Kimmel (Israel)
Nahum Kiryati (Israel)
Christer O. Kiselman (Sweden)
Reinhard Klette (New Zealand)
Walter Kropatsch (Austria)
Jacques-Olivier Lachaud (France)
Grégoire Malandain (France)
Rémy Malgouyres (France)
Serge Miguet (France)
Punam K. Saha (USA)
Pierre Soille (Italy)
Edouard Thiel (France)
Jayaram K. Udupa (USA)
Local Organizing Committee
Publicity: Salvatore Piantedosi
Scientific Secretariat: Silvia Rossi
Web Managers: Luca Serino, Henrik Boström
Publications: Ida-Maria Sintorn
Referees Michal Aharon Eric Andres Mario Bertero Gilles Bertrand Ilya Blayvas Gunilla Borgefors Achille Braquelaire Alexander Bronstein Michael Bronstein Jasmine Burguet Jean Marc Chassery Bidyut Baran Chaudhuri David Coeurjolly Michel Couprie Leila De Floriani Ulrich Eckhardt Bianca Falcidieno Fabien Feschet Christophe Fiorio
Richard Hall Omer Heymann Atsushi Imiya Pieter Jonker Ron Kimmel Nahum Kiryati Christer Kiselman Reinhard Klette Walter Kropatsch Jacques-Olivier Lachaud Pascal Lienhardt Michael Lindenbaum Cris L. Luengo Hendriks Grégoire Malandain Rémy Malgouyres Erik Melin Serge Miguet Annick Montanvert Dipti Prasad Mukherjee
Jan Neumann Ingela Nyström Francesca Odone Punam K. Saha Gabriella Sanniti di Baja Pierre Soille Stina Svensson Gabor Szekely Benjamin Taton Edouard Thiel Laure Tougne Jayaram K. Udupa Avi Vardi Anne Vialard Ola Weistrand Yaser Yacoob
Table of Contents
Discrete Geometry for Computer Imagery

Invited Lectures

Topological Digital Topology . . . 1
Ralph Kopperman
Fuzzy Spatial Relationships from Mathematical Morphology for Model-Based Pattern Recognition and Spatial Reasoning . . . 16
Isabelle Bloch

Shape Similarity and Visual Parts . . . 34
Longin Jan Latecki, Rolf Lakämper, and Diedrich Wolter
Peer-Reviewed Papers

On the Morphological Processing of Objects with Varying Local Contrast . . . 52
Pierre Soille

Watershed Algorithms and Contrast Preservation . . . 62
Laurent Najman and Michel Couprie

Digital Flatness . . . 72
Valentin E. Brimkov and Reneta P. Barneva

Shape Preserving Digitization of Ideal and Blurred Binary Images . . . 82
Ullrich Köthe and Peer Stelldinger

Towards Digital Cohomology . . . 92
Rocio Gonzalez-Diaz and Pedro Real

New Results about Digital Intersections . . . 102
Isabelle Sivignon, Florent Dupont, and Jean-Marc Chassery

On Local Definitions of Length of Digital Curves . . . 114
Mohamed Tajine and Alain Daurat

Characterising 3D Objects by Shape and Topology . . . 124
Stina Svensson, Carlo Arcelli, and Gabriella Sanniti di Baja

Homotopic Transformations of Combinatorial Maps . . . 134
Jocelyn Marchadier, Walter G. Kropatsch, and Allan Hanbury
Combinatorial Topologies for Discrete Planes . . . 144
Yukiko Kenmochi and Atsushi Imiya

Convex Structuring Element Decomposition for Single Scan Binary Mathematical Morphology . . . 154
Nicolas Normand

Designing the Lattice for Log-Polar Images . . . 164
V. Javier Traver and Filiberto Pla

On Colorations Induced by Discrete Rotations . . . 174
Bertrand Nouvel and Éric Rémila

Binary Shape Normalization Using the Radon Transform . . . 184
Salvatore Tabbone and Laurent Wendling

3D Shape Matching through Topological Structures . . . 194
Silvia Biasotti, Simone Marini, Michela Mortara, Giuseppe Patanè, Michela Spagnuolo, and Bianca Falcidieno

Contour-Based Shape Representation for Image Compression and Analysis . . . 204
Ciro D’Elia and Giuseppe Scarpa

Systematized Calculation of Optimal Coefficients of 3-D Chamfer Norms . . . 214
Céline Fouard and Grégoire Malandain

Look-Up Tables for Medial Axis on Squared Euclidean Distance Transform . . . 224
Eric Remy and Edouard Thiel

Discrete Frontiers . . . 236
Xavier Daragon, Michel Couprie, and Gilles Bertrand

Towards an Invertible Euclidean Reconstruction of a Discrete Object . . . 246
Rodolphe Breton, Isabelle Sivignon, Florent Dupont, and Eric Andres

Reconstruction of Discrete Surfaces from Shading Images by Propagation of Geometric Features . . . 257
Achille Braquelaire and Bertrand Kerautret

Shape Representation and Indexing Based on Region Connection Calculus and Oriented Matroid Theory . . . 267
Ernesto Staffetti, Antoni Grau, Francesc Serratosa, and Alberto Sanfeliu

Incremental Algorithms Based on Discrete Green Theorem . . . 277
Srečko Brlek, Gilbert Labelle, and Annie Lacasse
Using 2D Topological Map Information in a Markovian Image Segmentation . . . 288
Guillaume Damiand, Olivier Alata, and Camille Bihoreau

Topology Preservation and Tricky Patterns in Gray-Tone Images . . . 298
Carlo Arcelli and Luca Serino

Shortest Route on Height Map Using Gray-Level Distance Transforms . . . 308
Leena Ikonen and Pekka Toivanen

On the Use of Shape Primitives for Reversible Surface Skeletonization . . . 317
Stina Svensson and Pieter P. Jonker

d-Dimensional Reverse Euclidean Distance Transformation and Euclidean Medial Axis Extraction in Optimal Time . . . 327
David Coeurjolly

Efficient Computation of 3D Skeletons by Extreme Vertex Encoding . . . 338
Jorge Rodríguez, Federico Thomas, Dolors Ayala, and Lluís Ros

Surface Area Estimation of Digitized Planes Using Weighted Local Configurations . . . 348
Joakim Lindblad

Surface Area Estimation in Practice . . . 358
Guy Windreich, Nahum Kiryati, and Gabriele Lohmann

Perimeter and Area Estimations of Digitized Objects with Fuzzy Borders . . . 368
Nataša Sladoje, Ingela Nyström, and Punam K. Saha

Geodesic Object Representation and Recognition . . . 378
A. Ben Hamza and Hamid Krim

A Fast Algorithm for Reconstructing hv-Convex 8-Connected but Not 4-Connected Discrete Sets . . . 388
Péter Balázs, Emese Balogh, and Attila Kuba

Stability in Discrete Tomography: Linear Programming, Additivity and Convexity . . . 398
Sara Brunetti and Alain Daurat

Removal and Contraction for n-Dimensional Generalized Maps . . . 408
Guillaume Damiand and Pascal Lienhardt

The Generation of N Dimensional Shape Primitives . . . 420
Pieter P. Jonker and Stina Svensson

Geometric Measures on Arbitrary Dimensional Digital Surfaces . . . 434
Jacques-Olivier Lachaud and Anne Vialard
Nonlinear Optimization for Polygonalization . . . 444
Truong Kieu Linh and Atsushi Imiya

A Representation for Abstract Simplicial Complexes: An Analysis and a Comparison . . . 454
Leila De Floriani, Franco Morando, and Enrico Puppo

A Computation of a Crystalline Flow Starting from Non-admissible Polygon Using Expanding Selfsimilar Solutions . . . 465
Hidekata Hontani, Mi-Ho Giga, Yoshikazu Giga, and Koichiro Deguchi

Morphological Image Reconstruction with Criterion from Labelled Markers . . . 475
Damián Vargas-Vazquez, Jose Crespo, and Victor Maojo

Intertwined Digital Rays in Discrete Radon Projections Pooled over Adjacent Prime Sized Arrays . . . 485
Imants Svalbe and Andrew Kingston

Power Law Dependencies to Detect Regions of Interest . . . 495
Yves Caron, Harold Charpentier, Pascal Makris, and Nicole Vincent

Speed Up of Shape from Shading Using Graduated Non-convexity . . . 504
Daniele Gelli and Domenico Vitulano

Tissue Reconstruction Based on Deformation of Dual Simplex Meshes . . . 514
David Svoboda and Pavel Matula

Spherical Object Reconstruction Using Simplex Meshes from Sparse Data . . . 524
Pavel Matula and David Svoboda

A System for Modelling in Three-Dimensional Discrete Space . . . 534
Andreas Emmerling, Kristian Hildebrand, Jörg Hoffmann, Przemyslaw Musialski, and Grit Thürmer

Interactively Visualizing 18-Connected Object Boundaries in Huge Data Volumes . . . 544
Robert E. Loke and Hans du Buf
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 555
Topological Digital Topology Ralph Kopperman Department of Mathematics City College of New York New York NY 10031, USA
Abstract. The usefulness of topology in science and mathematics means that topological spaces must be studied, and computers should be used in this study. We discuss how many useful spaces (including all compact Hausdorff spaces) can be approximated by finite spaces, and these finite spaces are completely determined by their specialization orders. As a special case, digital n-space, used to interpret Euclidean n-space and in particular, the computer screen, is also dealt with in terms of the specialization. Indeed, algorithms written using the specialization are comparable in difficulty, storage usage and speed to those which use the traditional (8,4), (4,8) and (6,6) adjacencies, and are of course completely representative of the spaces. Keywords: Digital topology, general topology, T0 -space, specialization (order), connected ordered topological space (COTS), Alexandroff space, Khalimsky line, digital n-space, metric and polyhedral analogs, chaining maps, calming maps, normalizing maps, inverse limit, Hausdorff reflection, skew (=stable) compactness, (graph) path and arc connectedness and components, (topological) adjacency, Jordan curve, robust scene, cartoon.
1 Introduction: Why Topological Spaces?
During the first calculus or post-calculus course with any intellectual glue, students meet the idea of topology:

Definition 1. A topological space is a set X, together with a collection τ of subsets of X, such that:
(a) if G is a finite subset of τ then its intersection ⋂G ∈ τ, and
(b) if G is any subset of τ then its union ⋃G ∈ τ.
A subset of X is called open if it is in τ, closed if its complement is in τ.

As a result of this definition, since ∅ is a finite subset of τ, ⋂∅ = X and ⋃∅ = ∅ are open (are in τ). Why does topology come up there? First, metrics (distance functions) are noticed in calculus, such as d(x, y) = |x − y|, or for vectors, d(x, y) = ‖x − y‖.
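On a finite carrier set, Definition 1 can be checked mechanically. The following is a small illustrative Python sketch (the function name and the test sets are ours, not from the text); for a finite τ it suffices to check closure under pairwise unions and intersections together with membership of ∅ and X.

```python
def is_topology(X, tau):
    """Check clauses (a) and (b) of Definition 1 on a finite set X.

    X   : frozenset of points
    tau : set of frozensets (the candidate open sets)
    """
    tau = set(tau)
    # The empty intersection is X and the empty union is the empty set,
    # so both must belong to tau.
    if X not in tau or frozenset() not in tau:
        return False
    for A in tau:
        for B in tau:
            # (a) closure under (finite, hence pairwise) intersections
            if A & B not in tau:
                return False
            # (b) closure under (here necessarily finite) unions
            if A | B not in tau:
                return False
    return True

X = frozenset({1, 2, 3})
tau = {frozenset(), frozenset({1}), frozenset({1, 2}), X}
print(is_topology(X, tau))   # True: a T0, non-T1 topology on three points
bad = {frozenset(), frozenset({1}), frozenset({2}), X}   # missing {1, 2}
print(is_topology(X, bad))   # False
```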
The author wishes to acknowledge support for this research, both from the EPSRC of the United Kingdom through grant GR/S07117/01, and the City University of New York, through CUNY-PSC grant 64472-0033.
I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 1–15, 2003. © Springer-Verlag Berlin Heidelberg 2003
It is easy to define a topology from a metric: a set T is open if whenever x ∈ T, then some ball of positive radius, Br(x) = {y | d(x, y) < r}, is contained in T (for some r > 0, Br(x) ⊆ T). Essentially no properties of the distance are used in the proof that this gives a topology, and for metrics satisfying the triangle inequality d(x, z) ≤ d(x, y) + d(y, z), each Br(x) is open: if y ∈ Br(x) then for s = r − d(x, y) > 0, Bs(y) ⊆ Br(x), since d(y, z) < s implies d(x, z) ≤ d(x, y) + d(y, z) < d(x, y) + (r − d(x, y)) = r. Good references in general topology include [18] and [19].

Using topology one can easily define:
• limit (thus derivative), continuous function (at a point or always),
• closure, interior and boundary of sets,
• connected set, compact set.

It then becomes easy to show that each function is continuous at each point where it has a derivative. Also, the connected sets of real numbers are the intervals and the compact sets are the bounded closed sets; thus the closed bounded intervals (sets of the form [a, b] = {x | a ≤ x ≤ b}) are the connected compact subsets. If f : X → Y is a function and A ⊆ X, the image of A under f is f[A] = {f(x) | x ∈ A}; further, if B ⊆ Y, the inverse image of B is f⁻¹[B] = {x | f(x) ∈ B}. We don’t bother with any of these textbook proofs, although we do some later which are related to our particular interest.

Facts: Suppose f is continuous and A ⊆ X. If A is connected then f[A] is connected; if A is compact, then f[A] is compact. Thus in particular, if X = IR and a < b then f[[a, b]] is a closed, bounded interval [m, M], so: there are x, y ∈ [a, b] so that f(x) = m and f(y) = M – that is, f achieves a minimum and a maximum on [a, b], so these are worth looking for. This justifies much of differential calculus. Since f(a) and f(b) are in the interval f[[a, b]], if p is between f(a) and f(b) then p ∈ f[[a, b]], which is to say that for some c ∈ [a, b], p = f(c).
That is, the equation f(c) = p has a solution in [a, b]. This justifies much of the search for roots in algebra. The above and many similar facts mean that topological questions permeate analysis, and thus theoretical science. Therefore, much computing must be done with topological data. We now discuss methods to do this.
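The root-searching use of the intermediate value property can be illustrated by bisection; a minimal sketch (names ours), which repeatedly halves an interval [a, b] on which f changes sign:

```python
def bisect_root(f, a, b, tol=1e-12):
    """Find c in [a, b] with f(c) ~ 0, assuming f continuous and
    f(a), f(b) of opposite signs (the intermediate value setting)."""
    fa, fb = f(a), f(b)
    assert fa * fb <= 0, "f(a) and f(b) must bracket a root"
    while b - a > tol:
        m = (a + b) / 2
        fm = f(m)
        if fa * fm <= 0:       # the root stays bracketed in [a, m]
            b, fb = m, fm
        else:                  # otherwise it is bracketed in [m, b]
            a, fa = m, fm
    return (a + b) / 2

# solve x**2 = 2 on [0, 2], i.e. find a root of f(x) = x**2 - 2
r = bisect_root(lambda x: x * x - 2, 0.0, 2.0)
print(abs(r - 2 ** 0.5) < 1e-9)   # True
```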
2 Finite and Alexandroff Spaces
Definition 2. A topological space is Alexandroff if:
(a') if G is ANY subset of τ then ⋂G ∈ τ.
(The above is in addition to (b), and implies (a) of Definition 1.)

These spaces were studied systematically long ago by the author after whom they are named; see [2]. This is quite atypical of spaces. In IR for example, {0} = ⋂n≥1 (−1/n, 1/n) = ⋂n≥1 B1/n(0) is an intersection of open sets which isn’t open. But it is typical of
the finite topological spaces that one can completely store in a computer, since then any subset of τ is finite, so its intersection is in τ. The theory of Alexandroff spaces, applied especially to digital topology, is discussed in [11] and [7]. Most of the results in Lemma 2 through Theorem 1 can be found there conveniently (though none originate there).

Alexandroff spaces have a particular property that is extremely useful in computing. Recall that a preorder is a relation ≤ such that each a ≤ a and a ≤ b & b ≤ c ⇒ a ≤ c; a partial order is a preorder for which a ≤ b & b ≤ a ⇒ a = b. We now work toward a proof that for finite spaces, topology and continuity are completely determined by a preorder (which should be seen as an asymmetric adjacency relation). That is (see Theorem 1 (b), or [11]): there is a preorder such that the open sets are the upper sets, those for which x ∈ T & x ≤ y ⇒ y ∈ T (lower sets are similarly defined). Furthermore, a function between Alexandroff spaces will be continuous if and only if it preserves the order.

Here are some relevant textbook proofs:

Lemma 1. Given any topological space:
(a) Finite unions and arbitrary intersections of closed sets are closed.
(b) For each A ⊆ X there is a smallest closed set containing A, called its closure and defined by cl A = ⋂{C closed | A ⊆ C}, and a largest open subset of A, its interior, int A = ⋃{T open | T ⊆ A}.
A function f : X → Y is defined to be continuous at a point a if whenever f(a) ∈ T and T is open, then for some open U ∋ a, f[U] ⊆ T. It is continuous if continuous at every point in X.
(c) The following are equivalent: f is continuous ⇔ for each open T, f⁻¹[T] is open ⇔ for each closed C, f⁻¹[C] is closed ⇔ for each A, f[cl(A)] ⊆ cl(f[A]).

Proof. (a) Let G be a collection of closed sets. By de Morgan’s laws, X \ ⋃{C | C ∈ G} = ⋂{X \ C | C ∈ G}, so the complement of ⋃{C | C ∈ G} is open if G is finite; thus ⋃{C | C ∈ G} is closed if G is finite. The other proof is similar.
(b) By definition of a topological space, ⋃{T open | T ⊆ A} is an open set, is certainly contained in A, and is the largest such set (since if U ⊆ A is open, then U is one of the sets whose union is being taken). Thus int(A) is the largest open set contained in A. By (a), ⋂{C closed | A ⊆ C} is closed, and the proof that it is the smallest closed set containing A is like the above.
(c) For this proof it’s necessary to notice some properties of f⁻¹:
x ∈ f⁻¹[⋃G] ⇔ f(x) ∈ ⋃G ⇔ for some B ∈ G, f(x) ∈ B ⇔ x ∈ ⋃{f⁻¹[B] | B ∈ G},
x ∈ f⁻¹[⋂G] ⇔ f(x) ∈ ⋂G ⇔ for each B ∈ G, f(x) ∈ B ⇔ x ∈ ⋂{f⁻¹[B] | B ∈ G},
x ∈ f⁻¹[Y \ B] ⇔ f(x) ∈ Y \ B ⇔ f(x) ∉ B ⇔ x ∉ f⁻¹[B].
That is, f⁻¹[⋃G] = ⋃{f⁻¹[B] | B ∈ G}, f⁻¹[⋂G] = ⋂{f⁻¹[B] | B ∈ G}, and f⁻¹[Y \ B] = X \ f⁻¹[B]. Another useful property is that A ⊆ f⁻¹[B] ⇔ f[A] ⊆ B.
Suppose f is continuous, T is open and a ∈ f⁻¹[T]. Then f(a) ∈ T, so for some open Ua ∋ a, f[Ua] ⊆ T, thus a ∈ Ua ⊆ f⁻¹[T]; therefore f⁻¹[T] ⊆ ⋃{Ua | a ∈ f⁻¹[T]} ⊆ f⁻¹[T], showing f⁻¹[T] to be open.
If the inverse image of each open set is open and C is closed, then Y \ C is open, so f⁻¹[C] = X \ f⁻¹[Y \ C] is closed. If the inverse image of each closed set is closed, then so is f⁻¹[cl(f[A])] ⊇ A. But then, as the smallest closed set containing A, cl(A) ⊆ f⁻¹[cl(f[A])], showing f[cl(A)] ⊆ cl(f[A]). Finally, if always f[cl(A)] ⊆ cl(f[A]) and f(x) is in an open set T, then f(x) ∉ cl(Y \ T), thus x ∉ cl(f⁻¹[Y \ T]), since f[cl(f⁻¹[Y \ T])] ⊆ cl(f[f⁻¹[Y \ T]]) ⊆ cl(Y \ T). But this says that for some open set U, x ∈ U ⊆ X \ cl(f⁻¹[Y \ T]) ⊆ X \ f⁻¹[Y \ T] = f⁻¹[T]. Therefore f is continuous at x.

The same principles are used to see the key facts for Alexandroff spaces. But we need other definitions first.

Definition 3. Let X be any set and B any collection of subsets of X. Then there is a smallest topology τB on X which contains B. Let (X, τ) be a topological space. The specialization is defined by x ≤X y ⇔ x ∈ cl{y}. The space X is T0 if whenever x ∈ cl{y} and y ∈ cl{x} then x = y, and T1 if each {x} is closed. If Y ⊆ X then the subspace topology τ|Y is defined by saying that T ∈ τ|Y if (and only if) for some U ∈ τ, T = U ∩ Y. Given a collection of spaces (Xi, τi), i ∈ I, the product topology on the set ∏i∈I Xi is the smallest one containing each set of the form {x | xi ∈ U}, where i ∈ I and U ∈ τi.

Lemma 2. (a) For each X, ≤X is a preorder. It is a partial order iff the space is T0, and equality if and only if the space is T1.
(b) Each closed set is a ≤X-lower set and each open set is a ≤X-upper set. For each continuous f : X → Y, x ≤X y ⇒ f(x) ≤Y f(y).
(c) Given a subspace Y of a space X, for x, y ∈ Y, x ≤Y y ⇔ x ≤X y. In a product, for x, y ∈ ∏i∈I Xi, x ≤∏ y if and only if for every coordinate, xi ≤Xi yi.

Proof. (a) Of course, x ∈ cl{x}. Next notice that x ∈ cl{y} if and only if cl{x} ⊆ cl{y}; it is immediate that ≤X is transitive. The assertion about partial order is immediate from our slightly non-standard definition of T0, and that about equality is immediate from our standard definition of T1.
(b) If x ∈ C, C is closed, and y ≤X x, then y ∈ cl{x} ⊆ C, so y ∈ C; thus C is lower. Therefore each open set is upper, since its complement is lower. If f is continuous and y ≤X x, then y ∈ cl{x}, so f(y) ∈ f[cl{x}] ⊆ cl(f[{x}]), which is to say, f(y) ≤Y f(x).
(c) Notice that in the subspace topology, C ⊆ Y is closed if and only if C = Y ∩ D for some closed D ⊆ X. Thus Y ∩ cl{y} is closed in τ|Y, and if y ∈ C closed in τ|Y then for some closed D ⊆ X, y ∈ D (thus cl{y} ⊆ D) and
Recall that the product is the set of all maps x on I such that each x(i) ∈ Xi . Usually x(i) is called the i’th coordinate, and denoted xi .
C = Y ∩ D; thus C ⊇ Y ∩ cl{y}. This shows that Y ∩ cl{y} is the smallest closed set in τ|Y containing y, and of course, for x ∈ Y, x ∈ cl{y} ⇔ x ≤X y.
Notice that if each Ci ⊆ Xi is closed, then ∏i∈I Ci = {x ∈ ∏i∈I Xi | each xi ∈ Ci} = ⋂i∈I {x ∈ ∏i∈I Xi | xi ∈ Ci}, and this is closed in ∏i∈I Xi, since for each i the complement {x ∈ ∏i∈I Xi | xi ∈ Xi \ Ci} ∈ τ. Thus for each y, ∏i∈I cl{yi} = {x ∈ ∏i∈I Xi | each xi ∈ cl{yi}} is the smallest closed set containing y, and of course x is in this set iff each xi ≤Xi yi.

The converses of (a) and (b) above are not true: notice that each function must preserve =, the specialization order of T1 spaces, while most are not continuous. For similar reasons, each set in a T1 space is both upper and lower, but the only sets in IR which are both open and closed are ∅ and IR. But the converses hold for Alexandroff spaces:

Theorem 1. (a) A space is Alexandroff if and only if all unions of closed sets are closed; equivalently, if and only if each A is contained in a smallest open set, which we call n(A).
(b) For an Alexandroff space (X, τ), the closed sets are precisely the ≤τ-lower sets, and the ≤τ-upper sets are exactly the open sets. Further, the continuous functions are simply the specialization order preserving functions.

Proof. (a) The first assertion is shown using de Morgan’s laws, exactly as Lemma 1 (a) was shown. For the second, the existence of n(A) in Alexandroff spaces is shown just like that of cl(A) in all topological spaces, in Lemma 1 (b). Conversely, if n(A) always exists and G is a collection of open sets, then for each T ∈ G, n(⋂G) ⊆ T; therefore n(⋂G) ⊆ ⋂G; but since in general A ⊆ n(A), we have that ⋂G = n(⋂G), an open set.
(b) One direction of each assertion in the first sentence holds by Lemma 2. For the converses, if C is a lower set in an Alexandroff space, then C = ⋃{cl({x}) | x ∈ C}, a closed set. Thus if T is an upper set then its complement is lower, so closed, thus T is open.
For functions, we show more than stated in (b): a function f : X → Y, X, Y Alexandroff, is continuous at x ∈ X ⇔ whenever x ≤X y then f (x) ≤Y f (y). To see this, note that “x ≤X y ⇒ f (x) ≤Y f (y)” is equivalent to f [ n{x}] ⊆ n(f [{x}]), and if the latter holds and f (x) ∈ T , an open set, then n{f (x)} ⊆ T , so for U = n{x}, x ∈ U and f [U ] ⊆ T . From the last paragraph, it results that a function between Alexandroff spaces is continuous if and only if it is specialization preserving. The results in Theorem 1 essentially say that for all Alexandroff spaces, (including each space, X, that can be completely stored in a computer), all the information about X can be learned from the “asymmetric adjacency” ≤X . We use this below.
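On a finite space, the specialization and Theorem 1 (b) can be checked by brute force. The following Python sketch (function names ours, purely illustrative) computes ≤X from a listed topology via x ≤X y ⇔ x ∈ cl{y} ⇔ every open set containing x also contains y, and verifies on a two-point space that continuity agrees with order preservation:

```python
def specialization(points, tau):
    # x <= y  iff  x is in cl{y}  iff  every open set containing x contains y
    return {(x, y) for x in points for y in points
            if all(y in T for T in tau if x in T)}

def preserves_order(f, le):
    # f is monotone with respect to the specialization preorder
    return all((f[x], f[y]) in le for (x, y) in le)

def is_continuous(f, points, tau):
    # textbook test: the preimage of every open set is open
    return all(frozenset(x for x in points if f[x] in T) in tau for T in tau)

# Two-point space with {1} open and {0} closed (a Sierpinski-like space)
X = {0, 1}
tau = {frozenset(), frozenset({1}), frozenset(X)}
le = specialization(X, tau)
assert le == {(0, 0), (1, 1), (0, 1)}   # 0 lies in cl{1}, so 0 <= 1

# On finite (hence Alexandroff) spaces, continuity == order preservation
for f in ({0: 0, 1: 0}, {0: 1, 1: 1}, {0: 0, 1: 1}, {0: 1, 1: 0}):
    assert is_continuous(f, X, tau) == preserves_order(f, le)
print("continuity agrees with monotonicity on all four self-maps")
```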
3 The Computer Screen
Since the execution of programs and the computer screen are “discrete”, programs for the computer screen operate in terms of adjacencies, that is, binary
Fig. 1. (4,4) and (8,8) violations of the Jordan curve theorem.
relations that are symmetric and irreflexive; the most popular are 4-adjacency, where each (x, y) ∈ Z2 is adjacent to (x, y+1) and (x+1, y), and 8-adjacency, in which each (x, y) ∈ Z2 is adjacent to (x+1, y+1) and the above 4 points. This very well known theory is discussed in [5] and [12], and many other places.

Given an adjacency A on X and a subset S of X, an A-path in S (from y to z) is a finite sequence x1, . . . , xn ∈ S such that for each 1 ≤ k < n, (xk, xk+1) ∈ A (and y = x1, z = xn). The subset S is A-connected if for each y, z ∈ S there is an A-path in S from y to z. An A-component is a maximal A-connected subset. Further, an A-arc is an A-path x1, . . . , xn such that whenever 1 ≤ k, m ≤ n and (xk, xm) ∈ A, then |k − m| = 1, and an A-Jordan curve is defined like an A-arc, except that also (xn, x1) ∈ A.

But adjacencies that seem to respect nearness need not mirror topological reality. For example, Figure 1 shows well-known, easy examples of a 4-Jordan curve whose complement has three 4-components, and an 8-Jordan curve whose complement is 8-connected. But if {k, m} = {4, 8}, then whenever J is a k-Jordan curve, Z2 \ J has exactly two m-components. This suggests the care needed in selecting an adjacency to represent Euclidean space.

With the help of the earlier discussion, we discuss the solution of putting a topology on the finite computer screen which behaves like that on the rectangle in the plane that it is supposed to represent. This raises several issues: finite T1-spaces are discrete (each singleton is the finite intersection of the complements of the other singletons; thus singletons are open, and therefore all sets are open). Thus they can’t be connected if they have more than one point. When a space (X, τ) isn’t T1, its specialization order becomes important. For us, the specialization is centrally important; it will be the tool for writing algorithms which, by Theorem 1, fully represent the topology of the space.
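The Figure 1 phenomena are easy to reproduce computationally. The following Python sketch is ours (the particular curve is our own small example, not the one drawn in Figure 1); it computes adjacency components by breadth-first search on a finite window standing in for Z2, for a 4-Jordan curve that encloses the diagonal pair (1, 1), (2, 2):

```python
from collections import deque
from itertools import product

N4 = [(1, 0), (-1, 0), (0, 1), (0, -1)]
N8 = N4 + [(1, 1), (1, -1), (-1, 1), (-1, -1)]

def components(cells, nbrs):
    """Connected components of a finite cell set under an adjacency."""
    cells, comps = set(cells), []          # copy; the original is untouched
    while cells:
        seed = cells.pop()
        comp, queue = {seed}, deque([seed])
        while queue:                       # breadth-first search
            x, y = queue.popleft()
            for dx, dy in nbrs:
                p = (x + dx, y + dy)
                if p in cells:
                    cells.remove(p)
                    comp.add(p)
                    queue.append(p)
        comps.append(comp)
    return comps

# A 4-Jordan curve whose inside is the 4-disconnected pair (1,1), (2,2):
J = [(0, 0), (1, 0), (2, 0), (2, 1), (3, 1), (3, 2), (3, 3),
     (2, 3), (1, 3), (1, 2), (0, 2), (0, 1)]
window = set(product(range(-1, 5), repeat=2))   # finite stand-in for Z^2
comp = window - set(J)

print(len(components(comp, N4)))  # 3: outside, {(1,1)} and {(2,2)}
print(len(components(comp, N8)))  # 2: the mixed (4,8) rule restores Jordan
```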
It isn’t difficult to see that if ≤ is any preorder, then the collection of ≤-upper sets, α(≤), is an Alexandroff topology, and by Theorem 1 (a), for each Alexandroff space, τ = α(≤τ ). For the moment, we take dimension in its most trivial sense: an object will surely be k-dimensional if it is the product of k 1-dimensional objects. The
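As a small illustration (my own sketch, not from the paper), one can enumerate the upper sets of a finite preorder, check that α(≤) is closed under arbitrary unions and intersections, and check that its specialization order recovers ≤:

```python
from itertools import chain, combinations

def upper_sets(X, leq):
    """All upper sets of the finite preorder (X, leq); these form the
    Alexandroff topology alpha(<=)."""
    X = list(X)
    subsets = chain.from_iterable(combinations(X, r) for r in range(len(X) + 1))
    return {frozenset(S) for S in subsets
            if all(y in S for x in S for y in X if leq(x, y))}

# Khalimsky-style specialization on {0,...,4}: an even point lies below its
# adjacent odd points.
X = range(5)
leq = lambda x, y: x == y or (x % 2 == 0 and abs(x - y) == 1)
tau = upper_sets(X, leq)
print(len(tau))   # 13 open sets

# closed under unions and intersections (the space is finite, so pairwise
# closure gives closure under arbitrary families):
assert all(frozenset(A | B) in tau and frozenset(A & B) in tau
           for A in tau for B in tau)

# the specialization order of alpha(<=) recovers <=:
spec = lambda x, y: all(y in U for U in tau if x in U)
assert all(spec(x, y) == leq(x, y) for x in X for y in X)
```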
Topological Digital Topology
7
computer screen certainly looks like the product of two such spaces – in fact, it looks like the product of two intervals. Recall that a topological space is connected if whenever A ⊆ X is both open and closed, then A = X or A = ∅. We take the following to be the essence of 1-dimensionality in IR and intervals: a connected ordered topological space (COTS) is a connected space such that among any three points there is one whose deletion leaves the other two in separate components of the remainder. Certainly the reals and intervals have this property; IR2 doesn't, since the deletion of any singleton leaves the remainder connected. But Figure 2 shows a finite COTS.
Fig. 2. A COTS with 8 points: 4 open, 4 closed.
The diagram uses two conventions which enable us to draw "Euclidean" pictures and interpret them as finite T0 -spaces: • apparently featureless sets represent points, • sets which 'look' open are open. Figure 3 below uses these conventions to show products of 2 and 3 COTS, looking appropriately 2- and 3-dimensional. The computer screen can reasonably be seen as the product of two long finite COTS; in it, the open points can be seen (they are the 'pixels') and the others are invisible addresses that might be used in programs. (In fact, would it be reasonable to think of space as the product of 3 long finite COTS?) These diagrams suggest that COTS are natural 1-dimensional spaces. Here is a theorem which reinforces that idea: Theorem 2. A topological space X is a COTS if and only if there is a linear order < on X such that for each x ∈ X, (x, ∞)² and (−∞, x) are the two components of X \ {x}. In this case there are exactly two such total orders, the other being <−1 . In Z or IR, the orders which satisfy Theorem 2 are the usual order and its reverse; note that the specialization order, ≤Z , discussed after Proposition 2, is quite different, relating only adjacent numbers (and not all of them). Although we haven't assumed any separation, the following result tells us that our spaces are T0 , and shows the generality of Figure 2: Proposition 1. For a COTS with at least 3 points: (a) Each point is open or closed, but never both. The space is T0 . (b) Distinct points x, y ∈ X are adjacent (with respect to <) if and only if {x, y} is connected. (c) X is T1 if and only if it has no adjacent points; in this case, X is infinite.
² For x, y ∈ X, (x, ∞) = {z | x < z}, (−∞, y) = {z | z < y}, and (x, y) = (x, ∞) ∩ (−∞, y).
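The COTS condition itself is easy to verify mechanically on the finite space of Figure 2. The sketch below (mine, not from the paper) uses the fact, stated later as Proposition 3, that connectivity in a finite Alexandroff space is adjacency-connectivity; in the Khalimsky interval, distinct points are comparable under the specialization order exactly when they are consecutive integers:

```python
from collections import deque
from itertools import combinations

def adjacent(x, y):
    """Comparability under the Khalimsky specialization: consecutive integers."""
    return abs(x - y) == 1

def component(S, seed):
    """The adjacency component of seed inside the point set S (BFS)."""
    comp, queue = {seed}, deque([seed])
    while queue:
        p = queue.popleft()
        for q in S:
            if q not in comp and adjacent(p, q):
                comp.add(q)
                queue.append(q)
    return comp

X = list(range(8))                      # the 8-point COTS of Fig. 2
for triple in combinations(X, 3):
    a, m, b = sorted(triple)            # the middle point separates the others
    rest = [p for p in X if p != m]
    assert b not in component(rest, a)
print("among any three points, one disconnects the other two")
```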
8
Ralph Kopperman
Fig. 3. A product of 2 9-point COTS
A 9 × 9 × 3 “3-space”.
Proposition 2. The set Z of integers, with the smallest topology in which each {2n − 1, 2n, 2n + 1}, n ∈ Z, is open, is a T0 COTS such that each finite T0 COTS is homeomorphic to some (x, y), x, y ∈ Z. In fact, the numbers in Figure 2 indicate one of many ways that finite COTS can be embedded in Z. The space of Proposition 2 is often called the Khalimsky line. In it, a set T is open if and only if, whenever it contains an even number, it contains the adjacent odd numbers; that is, 2n ∈ T ⇒ 2n − 1, 2n + 1 ∈ T. Thus a set C is closed if and only if, whenever it contains an odd number, it contains the two adjacent even numbers; that is, 2n + 1 ∈ C ⇒ 2n, 2n + 2 ∈ C. As a result, x ≤Z y if and only if x = y or, for some n, x = 2n and y = 2n ± 1. By Lemma 2 (c), the specialization in digital n-space, Zn , is found coordinatewise by the rule: for x, y ∈ Zk , x ≤Zk y if and only if for each i = 1, . . . , k, xi = yi or, for some n, xi = 2n and yi = 2n ± 1. With Theorem 1 (b) and the usefulness of adjacencies in mind, we define the adjacency A(τ ) induced by τ by: (x, y) ∈ A(τ ) if {x, y} is connected in τ (that is, if and only if x ≤τ y or y ≤τ x) and x, y are distinct. We also let A(p) denote the set of points which are A(τ )-adjacent to p. Note that this adjacency depends only on the topological space, and not on the "background" and "foreground". In Zk , for example, A(p) depends on how many of the coordinates are odd and how many are even. For example, if both coordinates are even (writing n{p} for the smallest open set containing p): A(2n, 2m) = cl{(2n, 2m)} ∪ n{(2n, 2m)} \ {(2n, 2m)} = {(2n, 2m)} ∪ {2n − 1, 2n, 2n + 1} × {2m−1, 2m, 2m+1} \ {(2n, 2m)}, the points 8-adjacent to (2n, 2m); similarly (but exchanging the roles of cl and n), each A(2n + 1, 2m + 1) (both coordinates odd) is again the set of points 8-adjacent to (2n + 1, 2m + 1).
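This parity-dependent case analysis can be checked directly from the specialization order. The sketch below (an illustration with my own function names, not the author's code) computes A(p) in Z2 and confirms that pure points get 8 neighbors while mixed points get 4:

```python
def leq_Z(x, y):
    """Khalimsky-line specialization: x <= y iff x == y or x = 2n, y = 2n +/- 1."""
    return x == y or (x % 2 == 0 and abs(x - y) == 1)

def leq_Z2(p, q):
    """Coordinatewise specialization on Z^2."""
    return all(leq_Z(a, b) for a, b in zip(p, q))

def A(p):
    """A(p): the points q != p with p <= q or q <= p.  Comparable points
    differ by at most 1 in each coordinate, so scanning the 3x3 box suffices."""
    x, y = p
    box = [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    return {q for q in box if q != p and (leq_Z2(p, q) or leq_Z2(q, p))}

print(len(A((2, 2))))  # 8: both coordinates even ("pure" closed point)
print(len(A((3, 3))))  # 8: both coordinates odd  ("pure" open point)
print(len(A((3, 2))))  # 4: mixed parity
```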
For a point where one coordinate is even (the other odd), we have A(2n+1, 2m) = cl{(2n+1, 2m)} ∪ n{(2n + 1, 2m)} \ {(2n + 1, 2m)} = {2n + 1} × {2m − 1, 2m, 2m + 1} ∪ {2n, 2n + 1, 2n + 2} × {2m} \ {(2n + 1, 2m)}, the points 4-adjacent to (2n + 1, 2m). Figure 4 below illustrates some typical cases in Z2 and Z3 . We then have the notions of τ -path, etc., and:
Fig. 4. A(2, 1) and A(1, 2, 1).
Proposition 3. Let (X, τ ) be an Alexandroff space. (a) A subset S ⊆ X is an A(τ )-path if and only if it is the continuous image of a COTS (equivalently, of an interval in Z). It is an A(τ )-arc if and only if it is a COTS. (b) A subset S ⊆ X is connected if and only if it is A(τ )-connected (also, if and only if for each x, y ∈ S there is an A(τ )-arc in S from x to y). (c) If J ⊆ Z2 is a Jordan curve then Z2 \ J has two connected components. Boundary-tracking is another concern of digital topology. A digital image often holds about a million pixels, and a region in it has comparable magnitude, but a relatively straight boundary might be a few thousand bytes in size. So considerable savings in storage are often achieved by replacing regions by their boundaries. Not all Jordan curves are closed sets, so not all can be boundaries, and not every set has as its boundary a Jordan curve. (Examples: the boundary of the set of closed points is the set itself, and that of the set of open points is its complement.) But these issues are overcome in a natural way: A set S is regular if int(cl(S)) ⊆ S ⊆ cl(int(S)). A robust scene is a partition of Z2 into regular sets whose interiors are connected. A cartoon is a finite union of Jordan curves. Then (see [8]): Theorem 3. (a) For any finite S ⊆ Z2 , ∂S is a (closed) Jordan curve if and only if S is regular and int(S), int(Z2 \ S) are both connected. (b) The union of the boundaries of the sets in a robust scene is a cartoon, and every cartoon is such a union. Although we have only discussed the two-dimensional case, most of these results extend to arbitrary (finite) dimensions. An important fact, however, is that while the proofs in the two-dimensional case are all appropriately digital (carried out, for example, by induction on the lengths of the shortest paths with certain properties), those now known in higher dimensions require the use of other techniques. Problem 1: Find digital proofs in higher dimensions.
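Theorem 3 (a) can be illustrated computationally. The sketch below (mine; it computes closures and interiors in the Khalimsky plane directly from the specialization order) checks that a closed 5 × 5 Khalimsky square is regular and that each of its boundary points is adjacent to exactly two others, so that ∂S is a Jordan curve, while a single closed point is not regular:

```python
def leq1(x, y):
    """Khalimsky-line specialization: x <= y iff x == y or x = 2n, y = 2n +/- 1."""
    return x == y or (x % 2 == 0 and abs(x - y) == 1)

def leq(p, q):
    return all(leq1(a, b) for a, b in zip(p, q))

def box(p):
    return [(p[0] + dx, p[1] + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]

def closure(S):
    """cl(S): add every q <= p, p in S (comparable points differ by <= 1
    per coordinate, so the 3x3 box suffices)."""
    return {q for p in S for q in box(p) if leq(q, p)}

def interior(S):
    """int(S): points whose minimal open neighbourhood {q : p <= q} lies in S."""
    return {p for p in S if all(q in S for q in box(p) if leq(p, q))}

def regular(S):
    """int(cl(S)) is contained in S, which is contained in cl(int(S))."""
    return interior(closure(S)) <= S <= closure(interior(S))

S = {(x, y) for x in range(5) for y in range(5)}   # a closed Khalimsky square
assert regular(S)
boundary = closure(S) - interior(S)                # a 16-point ring
# every boundary point is adjacent (comparable) to exactly two others,
# as expected of a Jordan curve:
adj = lambda p, q: p != q and (leq(p, q) or leq(q, p))
assert all(sum(adj(p, q) for q in boundary) == 2 for p in boundary)
# a single closed point is not regular: its interior is empty
assert not regular({(0, 0)})
print("5x5 Khalimsky square: regular, with Jordan-curve boundary")
```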
There are algorithms written in terms of the topological adjacency, but in overwhelming number, they are in terms of the traditional adjacencies and some newer ones that have the advantage of providing a great deal of guidance by
being “small” – since boundaries are traced by going from point to adjacent point, adjacencies in which few points are adjacent require fewer steps to carry out. Thus the best that can be hoped for is: Problem 2: Are the sound algorithms in digital topology those that can be shown sound by comparison to some finite T0 -space? For example, soundness of the (4, 8), (8, 4) and (6, 6) algorithms can be shown this way (see [13]).
4 Comparing to Polyhedra
The following basic tool is developed in [10], from which most results in this section come. Definition 4. A metric analog of a topological space X with base point x0 is a metric space M with base point m0 , together with an open quotient map q : M → X, such that whenever A is a metric space with base point a0 : for any map f : A → X there is a map f̂ : A → M such that f = qf̂; and for any maps f, g : A → M so that qf = qg there is a homotopy F : A × [0, 1] → M such that whenever x ∈ A and t, u ∈ [0, 1]: F (x, 0) = f (x), F (x, 1) = g(x), F (a0 , t) ≡ m0 , and qF (x, t) = qF (x, u) (that is, t → qF (x, t) is constant). Composition with the open quotient q induces a bijection between the path components (see [6]) of M and those of X, and this composition induces isomorphisms between the homotopy groups of M and those of X; that is to say, q is a weak homotopy equivalence between M and X. A homotopy which, like the above, has the property that t → qF (x, t) is constant is said to ignore the quotient q. Further, suppose (M, q) is a pair such that M is a metric space and q : M → X is an open quotient, and suppose A is a metric space; then maps F, G : A → M are quotient homotopic if there is a homotopy H : A × [0, 1] → M between F and G such that qH(x, t) = qH(x, 0) = qH(x, 1) for all x ∈ A and t ∈ [0, 1]; this relation is denoted F ∼ G. Theorem 4. Each T0 countable join³ of Alexandroff topologies has a metric analog. By the following, any two metric analogs of the same space are homotopy equivalent. (Below, 1A denotes the identity map on A.) Theorem 5. Suppose (M, q) is a metric analog of a space X. If (N, r) is another metric analog of X, then there are maps F : M → N and G : N → M such that GF ∼ 1M and F G ∼ 1N . Conversely, if N is a metric space, r : N → X is an open quotient, and there are maps g : M → N, h : N → M so that gh ∼ 1N and hg ∼ 1M , then (N, r) is a metric analog of X.
³ The join of a collection of topologies is the smallest topology containing them all.
Fig. 5. Approximation of the unit interval by finite COTS.
The converse is useful in creating other metric analogs from a given one. In particular, it is used in showing the existence, for each finite T0 space K, of a polyhedral analog: a subset |K| of a finite dimensional Euclidean space, with a vertex for each point in K, and whose simplices are the convex hulls of the specialization order chains in K, together with the quotient map which takes each point of this metric space to the specialization-largest vertex of the smallest simplex in which the point lies. Two results shown using polyhedral analogs are the Jordan surface theorem for three-dimensional digital spaces, and the fact that the product topology on Zn is the only simply-connected one whose connected sets include all 2n-connected sets but no (3n − 1)-disconnected sets. This last result (of [9]) is a two-edged sword: it gives a complete representation of the topological adjacencies that emulate finite dimensional Euclidean space topologies; in doing so, it points out their scarcity among all adjacencies. There are other adjacencies which emulate many of the properties of Euclidean space, and give rise to faster algorithms.
5 Finite Approximation of Compacta
Now we will use finite spaces to approximate others. Figure 5 illustrates such an approximation and motivates the mathematics that is needed. Its top horizontal line represents the unit interval, but those at the bottom are meant to be finite COTS: Dn = {i/2^n | 0 ≤ i ≤ 2^n} ∪ {(i/2^n, (i + 1)/2^n) | 0 ≤ i < 2^n}, with 2^(n+1) + 1 points and the quotient topology induced from [0, 1]. The vertical lines indicate maps going down, for which a closed point is the image of the one directly above it, while an open point is that of the three above it. Recall that a topological space X is compact if whenever X = ⋃G for some collection G of open sets, then there is a finite subcollection H ⊆ G such that X = ⋃H. It is Hausdorff (T2 ) if whenever x ≠ y there are T, U ∈ τ such that x ∈ T , y ∈ U and T ∩ U = ∅.
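The approximation of Figure 5 can be made concrete. The sketch below (my own; representing open points as pairs of rational endpoints is an implementation choice, and the bonding map is my reading of the vertical lines of the figure) builds Dn, the quotient map q : [0, 1] → Dn, and a map Dn+1 → Dn, and checks that the diagram commutes on a few sample points:

```python
from fractions import Fraction

def D(n):
    """The finite COTS D_n: closed points i/2^n and open points, the dyadic
    intervals (i/2^n, (i+1)/2^n) -- 2^(n+1) + 1 points in all."""
    closed = [Fraction(i, 2**n) for i in range(2**n + 1)]
    opens = [(Fraction(i, 2**n), Fraction(i + 1, 2**n)) for i in range(2**n)]
    return closed, opens

def q(t, n):
    """Quotient map [0, 1] -> D_n: a dyadic point of level n is a closed point,
    anything else goes to the open interval containing it."""
    t = Fraction(t)
    if t * 2**n == int(t * 2**n):
        return t
    i = int(t * 2**n)
    return (Fraction(i, 2**n), Fraction(i + 1, 2**n))

def bond(p, n):
    """Bonding map D_{n+1} -> D_n: each point maps to the point of D_n
    directly below it."""
    return q((p[0] + p[1]) / 2 if isinstance(p, tuple) else p, n)

closed, opens = D(2)
assert len(closed) + len(opens) == 2**3 + 1
# the triangle [0,1] -> D_3 -> D_2 agrees with the direct map [0,1] -> D_2:
for t in (Fraction(1, 3), Fraction(1, 4), Fraction(7, 10)):
    assert bond(q(t, 3), 2) == q(t, 2)
print("inverse-system diagram commutes on sample points")
```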
The following result has long been known ([Al]): Theorem 6. A T2 space X is compact if and only if there is an inverse system of finite spaces and continuous maps such that X is the largest T2 continuous image of the limit of the system. The largest T2 continuous image of a space is called its Hausdorff reflection. Also, recall that an inverse system of topological spaces and continuous maps is a directed set (Γ, ≤) together with a space Xγ for each γ ∈ Γ and, whenever δ ≥ γ, a continuous fδγ : Xδ → Xγ , such that each fγγ = 1Xγ and if δ ≥ γ ≥ β then fδβ = fγβ fδγ . Its inverse limit (unique up to homeomorphism) is an XΓ , together with, for each α, a pα : XΓ → Xα , such that whenever α ≥ β, pβ = fαβ pα , and minimal among such spaces, in that whenever we have a Y and, for each α, a gα : Y → Xα such that whenever α ≥ β, gβ = fαβ gα , then there is a unique g : Y → XΓ such that for each α, gα = pα g. This inverse limit can be represented as the subspace of the product ∏γ∈Γ Xγ whose elements are those x in the product such that whenever α ≥ β, xβ = fαβ (xα ). In the case of the diagram above, the inverse limit is essentially [0, 1] ∪ {d+ | d = m/2^n, 0 ≤ m < 2^n} ∪ {d− | d = m/2^n, 0 < m ≤ 2^n}, where, at each sufficiently fine level k, the thread d+ picks the open interval of Dk immediately to the right of d (and d− the one immediately to the left). This space is rarely Hausdorff, thus rarely the X we set out to approximate. It is for this reason that we need to use the Hausdorff reflection. We now look at cases of this construction that are sufficiently general to study all compact Hausdorff spaces, but relatively easy to understand; these are studied in [16], [15] and [14] (related earlier constructions can be found in [1], [2], [4] and [3]). First we look at the method used to get the inverse system, which dates from [Al] and is given in our notation in [KW]. Suppose (X, τ ) is our compact Hausdorff space. Whenever F is a finite set of open sets, we get a partition of X into a finite number of subsets: for each of the finitely many subsets G of F , let PG = {x ∈ X | for T ∈ F, x ∈ T ⇔ T ∈ G}.
Let XF = {PG | PG ≠ ∅}, with the map πF : X → XF defined by letting πF (x) be the element of the partition in which x lies. Also, let τF be the quotient topology resulting from πF (that is, U ∈ τF ⇔ πF−1 [U ] is open in X). Each XF is a T0 space. Also, we get increasingly fine partitions of X by taking more and more open sets; that is, if F ⊆ F′, then fF′F (PG ) = PG∩F defines a map fF′F : XF′ → XF such that πF = fF′F πF′ . Certainly {F ⊆ τ | F finite} is directed by ⊆, and it can be checked that fFF = 1XF and the fF′F are continuous maps such that if F ⊆ F′ ⊆ F″ then fF″F = fF′F fF″F′ . Thus this method of considering partitions by larger and larger finite sets of open sets yields a natural inverse system of finite spaces and maps. The above has been refined to cases that are easy to handle, but the refinement is best understood if we work with bitopological spaces: sets with two topologies, (X, τ, τ ∗ ). A bitopological space is pseudoHausdorff (pH) if whenever x ∉ clτ (y) then there is a T ∈ τ and U ∈ τ ∗ which are disjoint and such that x ∈ T and y ∈ U . It is pairwise Q if both it and its dual, (X, τ ∗ , τ ), are Q. It is joincompact if it is pairwise pH and the join, τ ∨ τ ∗ , is compact and T0 . A topological space (X, τ ) is skew compact if there is a second topology τ ∗ on X such that (X, τ, τ ∗ ) is joincompact.
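The partition XF and the refinement maps described above can be sketched for a small space (my own illustration; blocks are indexed here by which members of F contain them, an implementation stand-in for the subsets G of F):

```python
def partition(X, F):
    """Partition X by a finite family F of (open) sets: x, y lie in the same
    block P_G iff they belong to exactly the same members of F."""
    blocks = {}
    for x in X:
        G = frozenset(i for i, T in enumerate(F) if x in T)
        blocks.setdefault(G, set()).add(x)
    return blocks

# X = Khalimsky interval {0,...,4}; both families consist of open sets there.
X = set(range(5))
F1 = [{1, 2, 3}]
F2 = [{1, 2, 3}, {0, 1}]          # F1 contained in F2: a finer partition

P1 = partition(X, F1)
P2 = partition(X, F2)
print(sorted(map(sorted, P1.values())))   # [[0, 4], [1, 2, 3]]
print(sorted(map(sorted, P2.values())))   # [[0], [1], [2, 3], [4]]

# Refinement: each block of P2 lies inside a block of P1, so the map
# f_{F2,F1}(P_G) = P_{G ∩ F1} of the inverse system is well defined.
assert all(any(b2 <= b1 for b1 in P1.values()) for b2 in P2.values())
```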
For example, if X = [0, 1] then τ = {(a, 1] | 0 ≤ a ≤ 1} ∪ {X} is skew compact, using τ ∗ = {[0, a) | 0 ≤ a ≤ 1} ∪ {X}, and each compact Hausdorff space is skew compact, with τ ∗ = τ . In what follows, µ(X) will denote the set of specialization-minimal elements of X, that is, those x ∈ X such that {x} is closed. Further, m will denote the relation {(x, y) | y ∈ cl({x}), {y} closed}. Proposition 4. Suppose X is skew compact. (a) µ(X) is a compact subspace of (X, τ ). (b) If each x ∈ X lies above a unique element mx ∈ µ(X), then m is a continuous map from (X, τ ) onto (µ(X), τ |µ(X)). (c) Suppose T ∩ U ≠ ∅ whenever x ∈ T, y ∈ U and T, U ∈ τ . Then there is a z ∈ X such that x, y ∈ cl(z). (d) If each element of X has a unique minimal element in its closure, then µ(X) is a Hausdorff subspace of (X, τ ). A topological space is normal if disjoint closed sets are contained in disjoint open sets. Theorem 7. The following are equivalent for a skew compact space X: (a) X is normal; (b) each point of X has a unique closed point in its closure; (c) m is a retraction from X to its subspace µ(X). If any of these hold, then (µ(X), m) is the Hausdorff reflection of (X, τ ). While there are many finite normal spaces, normality is best built up in the approximation: Definition 5. Suppose that X and Y are T0 -spaces; we say that a map f : X → Y is: normalizing if inverse images of disjoint closed sets are contained in disjoint open sets; chaining if f [cl{x}] is a specialization chain for each x. Then a space X is normal if and only if the identity map 1X on X is a normalizing map. An inverse system of topological spaces and continuous maps (Xα , fβα ) whose inverse limit is X is eventually normalizing (resp. chaining) if for each α there is some γ ≥ α such that fγα is normalizing (resp. chaining). Theorem 8. (a) The limit of an inverse system of finite T0 -spaces and continuous maps is normal if and only if the system is eventually normalizing.
(b) Each compact Hausdorff space is the Hausdorff reflection of the inverse limit of an eventually chaining inverse system of finite T0 -spaces and continuous maps. Also, every chaining map is normalizing, so the same holds with "eventually normalizing" in place of "eventually chaining". The simplicialization of a finite T0 -space X is the set X C of nonempty chains (totally ordered subsets) of (X, ≤), with the Alexandroff topology A(⊆) whose
specialization order is containment (that is, if S, T ∈ X C , then S ∈ cl{T } if and only if S ⊆ T ). Define the simplicial quotient pX : X C → X by pX (S) = max(S). Proposition 5. The map pX : X C → X is continuous, open, and chaining. Furthermore, a continuous map f : X → Y is chaining if and only if there is a continuous map f̃ : X → Y C such that pY f̃ = f . Finally, if h : X → Y C is closed and pY h = f then h = f̃. A calming map is a chaining map f for which f̃ is a closed map. Compact Hausdorff spaces are also often approximated using polyhedra (see the survey [17]); the following relates our approach to this: Theorem 9. Suppose (Xn , fn ) is an inverse sequence of finite T0 -spaces and calming maps. Then the limit of the (|XnC |, |fnC |) is homeomorphic to the space of minimal points of the limit of the (Xn , fn ). Corollary 1. (a) A metrizable space is compact if and only if it is the Hausdorff reflection of the limit of an inverse sequence of finite T0 -spaces and calming maps. (b) Under these conditions, our space is ≤ k-dimensional if and only if the finite spaces can be assumed ≤ k-dimensional, and is connected if and only if the finite spaces can be assumed connected.
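The simplicialization X C and the simplicial quotient pX can be computed directly for a small space (a sketch of mine, not the author's code; continuity of pX between Alexandroff spaces is checked as monotonicity for the specialization orders):

```python
from itertools import chain, combinations

def chains(X, leq):
    """Nonempty chains (totally ordered subsets) of (X, <=): the points of
    the simplicialization X^C."""
    subsets = chain.from_iterable(combinations(sorted(X), r)
                                  for r in range(1, len(X) + 1))
    return [frozenset(S) for S in subsets
            if all(leq(a, b) or leq(b, a) for a in S for b in S)]

def p(S, leq):
    """Simplicial quotient p_X: a chain goes to its largest element."""
    return next(x for x in S if all(leq(y, x) for y in S))

# Khalimsky interval {0, 1, 2}: 0 <= 1 and 2 <= 1.
X = {0, 1, 2}
leq = lambda x, y: x == y or (x % 2 == 0 and abs(x - y) == 1)
XC = chains(X, leq)
print(sorted(map(sorted, XC)))  # [[0], [0, 1], [1], [1, 2], [2]]

# Continuity of p_X between Alexandroff spaces = monotonicity: containment
# on X^C maps into <= on X.
assert all(not (S <= T) or leq(p(S, leq), p(T, leq)) for S in XC for T in XC)
```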
6 Summary and Further Indicated Work
Of course, the topological spaces that can be completely stored and studied in a computer are finite. These spaces can be completely analyzed using the specialization order, x ≤X y ⇔ x ∈ cl(y), and this "asymmetric adjacency" gives rise to an adjacency, defined by: for x ≠ y, (x, y) ∈ AX ⇔ x ≤X y or y ≤X x. The traditional adjacencies, (4, 8), (8, 4), and (6, 6), and their n-dimensional analogues can be used to study Zn , and have been shown to capture the notions of connectedness and boundary quite well. But by their definitions, ≤Zn perfectly captures all of the properties of these spaces, and determines the adjacency AZn , with which boundary tracking and other traditional algorithms (typically written in terms of the traditional adjacencies) can be written. Further, the latter need not be adjusted to take into account the background and foreground. It should be repeated that there are "sparse" (nontopological, and typically nonsymmetric) adjacencies which limit the number of choices available and thus can result in faster execution times. However, all compact Hausdorff spaces arise by approximation using finite T0 -spaces. These finite spaces can be completely analyzed as partially ordered sets, using their specializations ≤X , and algorithms in terms of this relation work well for them, as they do for the traditional digital n-spaces that arise in image processing. Note that it is easy to find spaces for which "boundary tracking" is a useless idea; for example, in the two-dimensional space on the left hand side of Figure 3, imagine that none of the points both of whose coordinates are odd are
in the space (so it represents a "graph paper" grid). Then almost no boundaries are connected, and none can be tracked. On the other hand, they can still be found, and can be useful in storing sets. More must be learned about this approximation; we know, for example, that dimension is preserved in the approximation of spaces, and we are presently working to find out how homotopy and homology are preserved. We are also studying how best to represent functions between spaces in terms of finite approximation.
References

1. Alexandroff, P. S., "Untersuchungen über Gestalt und Lage abgeschlossener Mengen beliebiger Dimension", Annals Math., 30 (1928-29), 101-187.
2. Alexandroff, P. S., "Diskrete Räume", Mat. Sbornik, 2-44 (1937), 501-519.
3. Flachsmeyer, J., "Zur Spektralentwicklung topologischer Räume", Math. Ann., 144 (1961), 253-274.
4. Freudenthal, H., "Entwicklungen von Räumen und ihren Gruppen", Compositio Math., 4 (1937), 154-234.
5. Herman, G. T., Geometry of Digital Spaces, Birkhäuser, 1998.
6. Hocking, J. and Young, G., Topology, Addison Wesley, Reading, Mass., 1961.
7. Khalimsky, E., Kopperman, R. D. and Meyer, P. R., "Computer graphics and connected topologies on finite ordered sets", Topology and its Appl., 36 (1990), 1-17.
8. Khalimsky, E., Kopperman, R. D. and Meyer, P. R., "Boundaries in digital planes", Journal of Applied Mathematics and Stochastic Analysis, 3 (1990), 27-55.
9. Kong, T. Y., "The Khalimsky topologies are precisely those simply-connected topologies on Zn whose connected sets include all 2n-connected sets but no (3n − 1)-disconnected sets". To appear, Theoretical Computer Science.
10. Kong, T. Y. and Khalimsky, E., "Polyhedral analogs of locally finite topological spaces", General Topology and Applications: Proceedings of the 1988 Northeast Conference, R. M. Shortt, editor, pp. 153-164, Marcel Dekker, 1990.
11. Kong, T. Y., Kopperman, R. D. and Meyer, P. R., "A topological approach to digital topology", Am. Math. Monthly, 98 (1991), 901-917.
12. Kong, T. Y. and Rosenfeld, A., "Digital topology: introduction and survey", Computer Vision, Graphics, and Image Processing, 48 (1989), 357-393.
13. Kopperman, R. D., "The Khalimsky line as a foundation for digital topology", Shape in Picture, Ed: Ying-Lie O et al., Springer-Verlag, Vol. F-126 (1994), 3-20.
14. Kopperman, R. D., Tkachuk, V. V. and Wilson, R. G., "The approximation of compacta by finite T0 -spaces". To appear, Quaestiones Math.
15. Kopperman, R. D. and Wilson, R. G., "Finite approximation of compact Hausdorff spaces", Topology Proceedings, 22 (1999), 175-201.
16. Kopperman, R. D. and Wilson, R. G., "On the role of finite, hereditarily normal spaces and maps in the genesis of compact Hausdorff spaces". To appear in Topology and its Appl.
17. Mardešić, S., "Approximating topological spaces by polyhedra", Approximation Theory and its Applications, eds. J. Ferrera, J. López-Gómez and F. R. Ruiz del Portal, Nova Science Publishers, Huntington, New York, USA.
18. Morris, S. A., Topology Without Tears, available from the author's web site.
19. Simmons, G. F., Introduction to Topology and Modern Analysis, Krieger, Malabar, FL, 1983.
Fuzzy Spatial Relationships from Mathematical Morphology for Model-Based Pattern Recognition and Spatial Reasoning

Isabelle Bloch

École Nationale Supérieure des Télécommunications, Dept. TSI - CNRS UMR 5141, 46 rue Barrault, 75013 Paris, France
[email protected]
Abstract. This paper discusses the interest of fuzzy set representations and of mathematical morphology for structural spatial knowledge representation and its use in model-based pattern recognition in images. It also briefly addresses the issues of digitization effects and computational aspects.
1 Introduction
I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 16–33, 2003. © Springer-Verlag Berlin Heidelberg 2003
In model-based pattern recognition in images, the model constitutes a description of the scene where objects have to be recognized. This description can be iconic, as for instance a digital map or a digital anatomical atlas, or symbolic, as a linguistic description of the main structures. The model can be attached to a specific scene, the typical example being a digital map used for recognizing structures in an aerial or satellite image of a specific region. It can also be more generic, as an anatomical atlas, which is a schematic representation that can be used for recognizing structures in a medical image of any person. In both types of descriptions (iconic and symbolic), objects are usually described through some characteristics like shape, size, appearance in the images, etc. But this is generally not enough to discriminate all objects in the scene, in particular if they are embedded in a complex environment. For instance in a magnetic resonance image (MRI) of the brain, several internal structures appear as smooth shapes with similar grey levels, making their individual recognition difficult. Similar examples can be found in other application domains. In such cases, spatial relationships play a crucial role, and it is important to include them in the model in order to guide the recognition. Such relationships can be of topological nature (inclusion, exclusion, adjacency...) or of metric nature (distances, orientations...). They often make it possible to identify structures that could not be distinguished based on their individual characteristics, by using their relationships to other structures. This topic is the main component of spatial reasoning. Unlike temporal reasoning, which is well structured [1], spatial reasoning is more recent and less unified. It can be defined as the domain of spatial knowledge representation, in
particular spatial relations between spatial entities, and of reasoning on these relations. Usually vision and image processing make use of quantitative representations of spatial relationships. In artificial intelligence, mainly symbolic representations are developed (see [30] for a survey). The limitations of purely qualitative reasoning have already been stressed in [18], as well as the interest of adding a semi-quantitative extension to qualitative values (as done in fuzzy set theory for linguistic variables [31,16]) for deriving useful and practical conclusions (as for recognition). On the other hand, purely quantitative representations are limited in the case of imprecise statements, and of knowledge expressed in linguistic terms. The use of fuzzy approaches for representing spatial relationships allows us to integrate both quantitative and qualitative knowledge, using the semi-quantitative interpretation of fuzzy sets. As already mentioned in [20], this allows us to provide a computational representation and interpretation of imprecise spatial relations, expressed in a linguistic way, possibly including quantitative knowledge. These representations can then be used for semi-quantitative reasoning, intermediate between quantitative and qualitative spatial reasoning. In this context, mathematical morphology is of particular interest. Although its basic transformations rely mainly on local information, based on the concept of structuring element, mathematical morphology also deals with more global and structural information, since several spatial relationships can be expressed in terms of morphological operations (mainly dilations). Its algebraic framework leads to nice extensions to fuzzy sets with good properties, based on fuzzy mathematical morphology [9]. In Section 2, we show how spatial relationships can be defined from mathematical morphology in the fuzzy set framework.
This summarizes our previous work, and more details can be found in [10,6,4,7] for instance. In Section 3, we briefly explain how these relationships can be used in model-based pattern recognition. In Section 4, we mention some digital aspects and address computational issues.
2 Fuzzy Spatial Relationships Based on Mathematical Morphology
In this Section we address the problem of modeling spatial relationships in the fuzzy set framework. This framework is interesting here for several reasons: – the objects of interest can be imprecisely defined, for instance due to the segmentation step; – some relations are imprecise, such as "to the left of", and find a more suitable definition in the fuzzy set framework; – the type of knowledge available about the structures, or the type of question we would like to answer, can be imprecise too. We consider here set relations, adjacency, distances, and directional relative position. Some of them have led to a rich literature in the fuzzy set community,
like distances, which have been defined using a lot of different approaches, while others have not raised so much attention. We summarize here our work based on fuzzy mathematical morphology [9], which allows us to represent various spatial relationships in a unified way [7]. Two types of questions are important for applications in structural pattern recognition: 1. given two (possibly fuzzy) objects, assess the degree to which a relation is satisfied; 2. given one reference object, define the area of the space in which a relation to this reference is satisfied (to some degree). Our approach provides answers to these two types of questions. The second one will be illustrated only for distances and directional position here (see [7] for the other relations). We consider the general case of a 3D space S (typically R3 or Z3 in the digital case), where objects can have any shape and any topology, and can be crisp or fuzzy.

Fuzzy Dilation and Erosion. Several definitions of fuzzy mathematical morphology have been proposed. Here we just give an example, chosen for its nice properties with respect to classical morphology, where the dilation and erosion of a fuzzy set µ by a structuring element ν are respectively defined, for all x ∈ S, by [9]:

Dν (µ)(x) = sup{t[ν(y − x), µ(y)], y ∈ S},    (1)
Eν (µ)(x) = inf{T [c(ν(y − x)), µ(y)], y ∈ S},    (2)

where t is a t-norm and T the associated t-conorm with respect to the complementation c.

Set Relationships. If the objects are imprecise and represented as spatial fuzzy sets (i.e. fuzzy sets defined by membership functions from the spatial domain S into [0, 1]), stating whether they intersect or not, or whether one is included in the other, becomes a matter of degree. The degree of intersection between two fuzzy sets µ and ν can be defined using a supremum of a t-norm t:

µint (µ, ν) = sup_{x∈S} t[µ(x), ν(x)],    (3)

or using the fuzzy volume of the t-norm in order to take more spatial information into account:

µint (µ, ν) = Vn [t(µ, ν)] / min[Vn (µ), Vn (ν)],    (4)

where

Vn (µ) = Σ_{x∈S} µ(x).    (5)
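Equations (1)-(3) can be sketched on a 1-D grid with t = min, its dual t-conorm T = max, and the complementation c(v) = 1 − v (an illustration of mine, not the authors' code; the grid indexing convention is an assumption):

```python
def fuzzy_dilation(mu, nu, t=min):
    """Eq. (1) on a 1-D grid: D_nu(mu)(x) = sup_y t[nu(y - x), mu(y)].
    nu is indexed so that nu[len(nu)//2] is its value at offset 0."""
    c = len(nu) // 2
    return [max(t(nu[c + y - x], mu[y])
                for y in range(len(mu)) if 0 <= c + y - x < len(nu))
            for x in range(len(mu))]

def fuzzy_erosion(mu, nu, T=max, comp=lambda v: 1 - v):
    """Eq. (2): E_nu(mu)(x) = inf_y T[c(nu(y - x)), mu(y)], with the t-conorm
    T = max dual to t = min under the complementation c(v) = 1 - v."""
    c = len(nu) // 2
    return [min(T(comp(nu[c + y - x]), mu[y])
                for y in range(len(mu)) if 0 <= c + y - x < len(nu))
            for x in range(len(mu))]

def degree_of_intersection(mu, nu):
    """Eq. (3) with t = min: sup_x min(mu(x), nu(x))."""
    return max(min(a, b) for a, b in zip(mu, nu))

mu = [0.0, 0.2, 1.0, 0.4, 0.0]      # a fuzzy object on a 5-point line
nu = [0.5, 1.0, 0.5]                # fuzzy structuring element, origin centred
print(fuzzy_dilation(mu, nu))       # membership spreads outwards
print(fuzzy_erosion(mu, nu))        # membership shrinks
print(degree_of_intersection(mu, [0.0, 0.0, 0.3, 0.8, 0.1]))  # 0.4
```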
The degree of non-intersection is then simply defined by µ¬int = 1 − µint . It is interesting to note that the degree of intersection defined from a t-norm corresponds to the dilation of µ by ν at origin. In a similar way, the degree of inclusion of ν in µ can be defined as: inf T [c(ν(x)), µ(x)],
x∈S
(6)
and corresponds to the erosion of µ by ν at the origin. These morphological interpretations allow us to include set relationships in the same framework as the other relations detailed below.

Adjacency. Adjacency is of great interest in image processing and pattern recognition, since it denotes an important relationship between image objects or regions [29], widely used as a feature in model-based pattern recognition. In the crisp case, it is defined from the digital connectivity n_c(x, y) defined on the image: two subsets X and Y of S are adjacent according to the c-connectivity if: X ∩ Y = ∅ and ∃x ∈ X, ∃y ∈ Y : n_c(x, y). This definition can be expressed equivalently in terms of morphological dilation as: X ∩ Y = ∅ and D_B(X) ∩ Y ≠ ∅, D_B(Y) ∩ X ≠ ∅, where B denotes the elementary structuring element associated with the c-connectivity. This morphological expression can be extended to the fuzzy case, leading to the following degree of adjacency between two fuzzy sets [10]:

µ_adj(µ, ν) = t[µ_¬int(µ, ν), µ_int[D_B(µ), ν], µ_int[D_B(ν), µ]].   (7)
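As a sketch, Eq. (7) can be evaluated on small arrays as follows; here t = min, the structuring element is the crisp elementary one {−1, 0, 1} of a 1-D domain, and the membership arrays are illustrative assumptions.

```python
import numpy as np

# Sketch of the adjacency degree of Eq. (7) on a 1-D domain, with t = min
# and the crisp elementary structuring element {-1, 0, 1}; the membership
# arrays are illustrative assumptions.
def dilate(f):
    padded = np.pad(f, 1)                     # elementary dilation: local max
    return np.maximum(np.maximum(padded[:-2], padded[1:-1]), padded[2:])

def mu_int(f, g):                             # Eq. (3): sup of the t-norm
    return np.minimum(f, g).max()

mu = np.array([0.0, 1.0, 0.7, 0.0, 0.0, 0.0])
nu = np.array([0.0, 0.0, 0.0, 0.9, 1.0, 0.0])

mu_adj = min(1.0 - mu_int(mu, nu),            # degree of non-intersection
             mu_int(dilate(mu), nu),          # nu meets the dilation of mu
             mu_int(dilate(nu), mu))          # mu meets the dilation of nu
```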
This definition represents a conjunctive combination (through a t-norm t) of a degree of non-intersection µ_¬int between µ and ν and a degree of intersection µ_int between one fuzzy set and the dilation of the other. B can be taken as the elementary structuring element related to the considered connectivity, or as a fuzzy structuring element, representing for instance spatial imprecision (i.e. the possibility distribution of the location of each point). We proved that this definition is symmetrical, is consistent with the binary definition if µ, ν and B are binary, decreases if the distance between µ and ν increases, and is invariant with respect to geometrical transformations [10].

Distances. The importance of distances in image processing is well established. Their extensions to fuzzy sets can be useful in several parts of image processing under imprecision (classification and clustering, skeletonization, registration, structural pattern recognition), since distances constitute a major component of the spatial arrangement of objects. Several definitions can be found in the literature for distances between fuzzy sets, which is the main problem addressed. They can be roughly divided into two classes: distances that take only membership functions into account and compare them point-wise, and distances that additionally include spatial distances. The definitions which combine spatial distance and fuzzy membership comparison allow for a more general analysis of structures in images, for applications where the topological and spatial arrangement of the structures of interest is important (segmentation, classification, scene interpretation). These distances combine membership values at different points in the space, and take into account their proximity or distance in S. The price to pay is an increased complexity, generally quadratic in the cardinality of S. We proposed in [6] original approaches for defining fuzzy distances that take spatial information into account, based on fuzzy mathematical morphology. They are summarized below. The idea is that in the binary case there exist strong links between mathematical morphology (in particular dilation) and distances (from a point to a set, and between two sets), and this can also be exploited in the fuzzy case. The advantage is that distances are expressed in set-theoretical terms, and are therefore easier to translate to the fuzzy case, with nice properties, than the usual analytical expressions. We give only the example of the nearest point distance between two fuzzy sets; the case of the Hausdorff distance can be treated in a similar way. The minimum or nearest point distance between X and Y is defined (in the discrete finite case) as:

d_N(X, Y) = min_{(x,y)∈X×Y} d_E(x, y) = min_{x∈X} d_E(x, Y) = min_{y∈Y} d_E(y, X),   (8)
where d_E denotes the Euclidean distance in S. This has an equivalent morphological expression:

d_N(X, Y) = inf{n ∈ N, X ∩ D^n(Y) ≠ ∅} = inf{n ∈ N, Y ∩ D^n(X) ≠ ∅}.   (9)
By translating Equation 9, we define a distance distribution ∆_N(µ, µ′)(n) that expresses the degree to which the distance between µ and µ′ is less than n by:

∆_N(µ, µ′)(n) = f[sup_{x∈S} t[µ(x), D^n_ν(µ′)(x)], sup_{x∈S} t[µ′(x), D^n_ν(µ)(x)]],   (10)

where f is a symmetrical function. A distance density, i.e. a fuzzy number δ_N(µ, µ′)(n) representing the degree to which the distance between µ and µ′ is equal to n, can be obtained implicitly from ∆_N(µ, µ′)(n) = ∫_0^n δ_N(µ, µ′)(n′) dn′. Clearly, this expression is not very tractable and does not lead to a simple explicit expression of δ_N(µ, µ′)(n). Therefore, we suggest an explicit method, exploiting the fact that, for n > 0, the nearest point distance can be expressed in morphological terms as:

d_N(X, Y) = n ⇔ D^n(X) ∩ Y ≠ ∅ and D^{n−1}(X) ∩ Y = ∅,   (11)
or equivalently by the symmetrical expression. For n = 0 we have d_N(X, Y) = 0 ⇔ X ∩ Y ≠ ∅. The translation of these equivalences provides, for n > 0, the following distance density:

δ_N(µ, µ′)(n) = t[sup_{x∈S} t[µ′(x), D^n_ν(µ)(x)], c[sup_{x∈S} t[µ′(x), D^{n−1}_ν(µ)(x)]]],   (12)

or a symmetrical expression derived from this one, and δ_N(µ, µ′)(0) = sup_{x∈S} t[µ(x), µ′(x)].
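A minimal sketch of the distance density of Eq. (12), assuming t = min, c(a) = 1 − a, and a crisp elementary structuring element, so that the dilation of size n is n iterations of the elementary dilation; the two fuzzy sets are illustrative.

```python
import numpy as np

# Sketch of the distance density of Eq. (12), assuming t = min,
# c(a) = 1 - a, and a crisp elementary structuring element, so that the
# dilation of size n is n iterations of the elementary dilation.
def dilate(f):
    padded = np.pad(f, 1)
    return np.maximum(np.maximum(padded[:-2], padded[1:-1]), padded[2:])

def dilate_n(f, n):
    for _ in range(n):
        f = dilate(f)
    return f

def delta_N(mu, mup, n):
    if n == 0:                                # d = 0 <=> the sets intersect
        return np.minimum(mu, mup).max()
    reach_n = np.minimum(mup, dilate_n(mu, n)).max()
    reach_nm1 = np.minimum(mup, dilate_n(mu, n - 1)).max()
    return min(reach_n, 1.0 - reach_nm1)      # t[... , c(...)]

mu = np.array([1.0, 0.8, 0.0, 0.0, 0.0, 0.0])
mup = np.array([0.0, 0.0, 0.0, 0.0, 0.9, 1.0])
density = [delta_N(mu, mup, n) for n in range(6)]
# density peaks at n = 3, the gap between the two supports
```

The resulting fuzzy number concentrates its mass around the crisp nearest point distance between the supports, as expected.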
This definition of fuzzy nearest point distances (defined as fuzzy numbers) between two fuzzy sets (and of the Hausdorff distance too) does not necessarily share the same properties as its crisp equivalent; for some of the properties this depends on the choice of the involved t-norms and t-conorms. Let us now consider the second question, i.e. defining the area of the space that satisfies some distance property with respect to a reference object. We assume that a set A is known, as an already recognized object or a known area of S, and that we want to determine B, which should satisfy some distance relationship with A. According to the algebraic expressions of distances, dilation of A is an adequate tool for this. For instance, if the knowledge expresses that d(A, B) ≥ n, then B should be looked for in D^{n−1}(A)^C. As another example, expressing that B should lie between a distance n1 and a distance n2 of A can be obtained by considering both minimum and maximum (Hausdorff) distances: the minimum distance should be greater than n1 and the maximum distance should be less than n2. In this case, the volume of interest for B is reduced to D^{n2}(A) \ D^{n1−1}(A). In cases where imprecision has to be taken into account, fuzzy dilations are used, with the corresponding equivalences with fuzzy distances [9,6]. The extension to approximate distances calls for fuzzy structuring elements. We define these structuring elements through their membership function ν on S. Structuring elements with a spherical symmetry can typically be used, where the membership degree only depends on the distance to the center of the structuring element. Let us consider the generalization to the fuzzy case of the last case (minimum distance of at least n1 and maximum distance of at most n2 to a fuzzy set µ). Instead of defining an interval [n1, n2], we consider a fuzzy interval, defined as a fuzzy set on R+ whose core equals the interval [n1, n2].
The membership function µ_n is increasing between 0 and n1 and decreasing after n2 (this is but one example). Then we define two structuring elements as:

ν1(v) = 1 − µ_n(d_E(v, O)) if d_E(v, O) ≤ n1, and ν1(v) = 0 otherwise,   (13)

ν2(v) = 1 if d_E(v, O) ≤ n2, and ν2(v) = µ_n(d_E(v, O)) otherwise,   (14)
where d_E is the Euclidean distance in S and O the origin. The spatial fuzzy set expressing the approximate relationship about the distance to µ is then defined as:

µ_distance = t[D_ν2(µ), 1 − D_ν1(µ)]   (15)

if n1 ≠ 0, and µ_distance = D_ν2(µ) if n1 = 0. The increasingness of fuzzy dilation with respect to both the set to be dilated and the structuring element [9] guarantees that these expressions do not lead to inconsistencies: we have ν1 ⊂ ν2, ν1(O) = ν2(O) = 1, and therefore µ ⊂ D_ν1(µ) ⊂ D_ν2(µ). In the case where n1 = 0, we no longer have ν1(O) = 1, but then only the dilation by ν2 is considered. This case actually corresponds to a distance to µ less than "about n2". These properties are indeed expected for representations of distance knowledge.

Directional Relative Position. This type of relation is ambiguous and imprecise even if the objects are crisp. Therefore, relative position concepts may find a better understanding in the framework of fuzzy sets, as fuzzy relationships, even for crisp objects. This framework makes it possible to propose flexible definitions which fit the intuition and may include subjective aspects, depending on the application and on the requirements of the user. The few existing fuzzy approaches in the literature mostly rely on the angle histogram [24,22] or extensions of it [23]. Our approach is completely different since it works directly in the spatial domain. Let us consider a reference object R and an object A whose relative position with respect to R has to be evaluated. In order to evaluate the degree to which A is in some direction with respect to R, we propose the following approach [4,5]:

1. We first define a fuzzy "landscape" around the reference object R as a fuzzy set such that the membership value of each point corresponds to the degree of satisfaction of the spatial relation under examination.
2. We then compare the object A to the fuzzy landscape attached to R, in order to evaluate how well the object matches the areas with high membership values (i.e. areas that are in the desired direction). This is done using a fuzzy pattern matching approach, which provides an evaluation as an interval instead of a single number. This is another difference with respect to previous approaches and, in our opinion, it provides richer information about the considered relationship.

The first step answers the second type of question, while the second step answers the first type. Let us detail the first step. In the 3D Euclidean space S, a direction is defined by two angles α1 and α2, where α1 ∈ [0, 2π] and α2 ∈ [−π/2, π/2] (α2 = 0 in the 2D case).
The direction in which the relative position of an object with respect to another is evaluated is denoted by u_{α1,α2} = (cos α2 cos α1, cos α2 sin α1, sin α2)^T, and we write α = (α1, α2). We denote by µ_α(A) the fuzzy region representing the relation "to be in the direction u_{α1,α2} with respect to the reference object A". Points that satisfy this relation with high degree should have high membership values. In other terms, the membership function µ_α(A) has to be an increasing function of the degree of satisfaction of the relation. Let us denote by P any point in S, and by Q any point in A. Let β(P, Q) be the angle between the vector QP and the direction u_{α1,α2}, computed in [0, π]:

β(P, Q) = arccos[(QP · u_{α1,α2}) / ‖QP‖],  and β(P, P) = 0.   (16)

Setting β(P, P) = 0 actually allows us to deal with overlapping objects or with fuzzy objects with overlapping supports.
We then determine for each point P the point Q of A leading to the smallest angle β, denoted by β_min. In the crisp case, this point Q is the reference object point from which P is visible in the direction closest to u_{α1,α2}: β_min(P) = min_{Q∈A} β(P, Q). The spatial fuzzy set µ_α(A) at point P is then defined as µ_α(A)(P) = f(β_min(P)), where f is a decreasing function from [0, π] into [0, 1]. We choose a function that sets µ_α(A)(P) to 0 as soon as β_min becomes greater than π/2. This avoids positive membership values for points whose coordinates are completely outside the coordinate range of A in the desired direction. In the fuzzy case, we propose a method which translates binary equations and propositions into fuzzy ones as:

µ_α(A)(P) = max_{Q∈Supp(A)} t[µ_A(Q), f(β(P, Q))],   (17)
where t is a t-norm. An advantage of this approach is its easy interpretation in terms of morphological operations. It can indeed be shown [4] that µ_α(A) is exactly the fuzzy dilation of A by ν, where ν is the fuzzy structuring element defined on S as:

∀P ∈ S, ν(P) = f[β(O, P)],   (18)

with O the center of the structuring element. Among the nice properties of this definition is invariance with respect to geometrical transformations (translation, rotation, scaling), which is a requirement in object recognition. It also behaves in accordance with intuition when the distance to the reference object increases, and in the presence of concavities. These properties are detailed in [4], where several examples are shown.

For the second step, let us denote by µ_A the membership function of the object A, which is a function from S into [0, 1]. The evaluation of the relative position of A with respect to R is given by a function of µ_α(R)(x) and µ_A(x) for all x in S. An appropriate tool for defining this function is the fuzzy pattern matching approach [17]. Following this approach, the evaluation of the matching between two possibility distributions consists of two numbers, a necessity degree N (a pessimistic evaluation) and a possibility degree Π (an optimistic evaluation), as often used in the fuzzy set community. An average measure can also be useful from a practical point of view. We proved [5] that the possibility has a symmetry property (i.e. the possibility for A to be in some direction with respect to B is equal to the possibility for B to be in the opposite direction with respect to A). Also, the proposed definition is invariant with respect to translation, rotation and scaling, for 2D and 3D objects (crisp and fuzzy). We also proved that when the distance between the objects increases, the objects are seen as points: the value of their relative position can then be predicted only from the direction of interest and the direction in which one object moves away from the reference object, so the shape of the objects no longer plays any role in the assessment of their relative position. Finally, we examined the behavior of the proposed definition in cases where the reference object has strong concavities, and showed that it corresponds to what can be intuitively expected.
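The two steps above can be sketched together: an exhaustive evaluation of Eq. (17), then the fuzzy pattern matching evaluation. The choices t = min and f(β) = max(0, 1 − 2β/π), as well as the small reference set R and object A, are illustrative assumptions.

```python
import numpy as np

# Sketch of the two steps for the relation "to the right of R" on a small
# 2-D grid; t = min and f(beta) = max(0, 1 - 2*beta/pi) are illustrative.
def f(beta):
    return max(0.0, 1.0 - 2.0 * beta / np.pi)

def beta(p, q, u):                            # angle between vector QP and u
    v = np.array([p[0] - q[0], p[1] - q[1]], dtype=float)
    if not v.any():
        return 0.0                            # beta(P, P) = 0 by convention
    cos = np.dot(v, u) / np.linalg.norm(v)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def fuzzy_landscape(mu_R, u):                 # step 1: exhaustive Eq. (17)
    out = np.zeros_like(mu_R, dtype=float)
    support = list(zip(*np.nonzero(mu_R)))
    for p in np.ndindex(mu_R.shape):
        out[p] = max(min(mu_R[q], f(beta(p, q, u))) for q in support)
    return out

right = np.array([0.0, 1.0])                  # direction of increasing column
mu_R = np.zeros((5, 5)); mu_R[2, 1] = 1.0     # crisp one-point reference
land = fuzzy_landscape(mu_R, right)

# step 2: fuzzy pattern matching of an object A against the landscape
mu_A = np.zeros((5, 5)); mu_A[2, 4] = 1.0; mu_A[1, 2] = 0.4
possibility = np.minimum(land, mu_A).max()    # optimistic evaluation
necessity = np.maximum(land, 1.0 - mu_A).min()  # pessimistic evaluation
```

The interval [necessity, possibility] is the evaluation discussed above: here the object lies mostly, but not entirely, to the right of the reference point.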
3 Use of Fuzzy Spatial Relationships in Model-Based Pattern Recognition
Let us now briefly illustrate how these fuzzy spatial relations can be used for recognizing structures in a scene based on a model of this scene. Two types of approaches can be developed, corresponding to the two types of questions mentioned in Section 2.

Graph-Based Approach. In the first approach, spatial relations evaluated between spatial entities (typically objects or regions) are considered as attributes in a graph. Graph representations are widely used for dealing with structural information, in different domains including image interpretation and model-based pattern recognition. Here, we assume that the model is represented as a graph where nodes are objects and edges represent links between these objects. Both nodes and edges are attributed: node attributes are characteristics of the objects, while edge attributes quantify spatial relationships between the objects. A data graph is then constructed from each image in which the recognition has to be performed. Since it is usually difficult to segment the objects directly, the graph is typically based on an over-segmentation of the image, for instance using watersheds. Each region constitutes a node of this data graph, and edges represent links between regions. Attributes are computed as for the model. The use of fuzzy relations is particularly useful for reducing sensitivity to the segmentation. One important problem to be solved then is graph matching. In order to achieve a good correspondence between both graphs, the most used concept is that of graph isomorphism, and a lot of work is dedicated to the search for the best isomorphism between two graphs or subgraphs. However, in a number of cases, the bijective condition is too strong: because of the schematic aspect of the model and of the difficulty of segmenting the image into meaningful entities, no isomorphism can be expected between both graphs. In particular, several regions of the image can be assigned to the same node of the model graph.
Such problems call for inexact graph matching, which generally consists in finding a morphism that furthermore optimizes an objective function based on similarities between attributes. The morphism aims at preserving the structure of the graphs, while the objective function favors associations between nodes, and between edges, with similar attribute values. This approach can benefit from the huge literature on fuzzy comparison tools (see e.g. [13]) and from recent developments on fuzzy morphisms [25]. The optimization is not an easy task since the problem is NP-hard. Genetic algorithms, estimation of distribution algorithms (EDA) and tree search methods have been developed towards this aim [26,2,14]. This approach has been applied in brain imaging, in order to recognize brain structures in a 3D magnetic resonance image (MRI) based on an anatomical atlas, and in face feature recognition, based on a rough model of a face constructed from an image of a different person (an example is shown in Figure 1).

Fig. 1. Left: model; middle: over-segmented image (subset); bottom: results on a few face features obtained with EDA (from [14]).

Focusing Attention Based on Spatial Representation of Spatial Knowledge. In the second type of approach, we use the spatial representation of spatial knowledge. Each relation is then represented as a spatial fuzzy set, constraining the search for the object that should satisfy this relation. This region of interest allows attention to be focused on the only region satisfying the relation (to some degree). Since several relations are usually represented in the model for describing one structure, fusion of these representations should be performed. The fuzzy set framework offers a large set of fusion operators, varying from conjunctive to disjunctive ones, including adaptive operators [3]. The fusion of all regions of interest leads to a fuzzy region representing the combination of all relationships concerning one structure. Segmentation of the structure can then be based on the image information (typically grey levels) restricted to the obtained fuzzy region. A recognition procedure based on this type of representation has been developed for the recognition of internal brain structures in MRI [21,8]. The model has an iconic part (digital atlas) and a symbolic part (linguistic descriptions of relationships between anatomical structures). The procedure consists in recognizing first simple structures (typically brain and lateral ventricles), and then progressively more and more difficult structures, based on relationships between these
structures and previously recognized structures. Each relationship describing the structure to be recognized is translated into a spatial fuzzy set representing the area satisfying this relation, to some degree. The fuzzy sets representing all relationships involved in the recognition process are fused using a numerical fusion operator. In the obtained fuzzy region of interest, a segmentation procedure is performed, and the quality of the results is guaranteed by the very restricted (focused) area in which the structure of interest is searched. For instance, the recognition of a caudate nucleus in a 3D MRI image uses the previous recognition of the brain and lateral ventricles and the following pieces of knowledge, illustrated in Figure 2:

– rough shape and localization are provided by the representation of the caudate nucleus in the atlas, and its fuzzy dilation, which accounts for variability and for inexact matching between the model and the image;
– the caudate nucleus belongs to the brain (black) but is outside both lateral ventricles (white components inside the brain);
– the caudate nucleus is lateral to the lateral ventricle.

Fig. 2. Information representation in the image space (only one slice of the 3D volume is shown), illustrating knowledge about one caudate nucleus: shape information (left), set relationships (middle), and relative directional relationship (right). Membership values vary from 0 (white) to 1 (black).

These pieces of knowledge can be combined (also with information extracted from the image itself), which leads to a successful recognition of the caudate nucleus. Figure 3 illustrates the spatial representation of some knowledge about distances, used for other structures. Figure 4 shows 3D views of some cerebral objects as defined in the atlas and as recognized in an MR image with our method. They are correctly recognized although the size, the location and the morphology of these objects in the image significantly differ from their definitions in the atlas. Note in particular the good recognition of the third and fourth ventricles, which are very difficult to segment directly from the image. Here the help of relationships to other structures is very important.
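A minimal sketch of this fusion step, assuming a conjunctive combination by min of three hypothetical 1-D membership maps (one per relationship); all values are illustrative, not taken from the brain experiments.

```python
import numpy as np

# Sketch of the fusion step: three hypothetical 1-D membership maps, one
# per relationship constraining a structure, combined with a conjunctive
# operator (min). All values are illustrative assumptions.
shape_prior = np.array([0.1, 0.8, 1.0, 0.9, 0.2])    # dilated atlas shape
inside_brain = np.array([1.0, 1.0, 1.0, 1.0, 0.3])   # set relationship
lateral_to_lv = np.array([0.0, 0.6, 1.0, 1.0, 1.0])  # directional relation

roi = np.minimum.reduce([shape_prior, inside_brain, lateral_to_lv])
best = int(np.argmax(roi))        # most plausible location inside the ROI
```

A location must satisfy every constraint to keep a high membership in the fused region of interest; segmentation then runs only inside this focused area.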
Fig. 3. Examples of representation of knowledge about distances. Top: membership functions µ_n. Bottom: spatial fuzzy sets. The following types of knowledge are illustrated: the putamen has an approximately constant distance to the brain surface (left), the caudate nucleus is at a distance about less than D from the lateral ventricles (in white) (middle), and the lateral ventricles are inside the brain and at a distance larger than about D from the brain surface (right). The contours of the objects we are looking at are shown in white.
Fig. 4. Recognition results. The left view represents six objects from the model atlas: lateral ventricles (medium grey), third and fourth ventricles (light grey), caudate nucleus and putamen (dark grey). The right view represents the equivalent objects recognized from an MRI acquisition. (From [21].)
The segmentation can be further improved once recognition is achieved by integrating the fuzzy regions representing the spatial relations as new energy terms in deformable models [15]. This approach has been used in other domains, for instance in mobile robotics to reason about the spatial position of the robot and the structure of its environment [11].
Fig. 5. Sensitivity of crisp adjacency: small modifications in the shapes may completely change the adjacency relation, and thus prevent a correct recognition based on this relationship. Left: model (adjacent objects); right: image (segmentation errors).
4 Digital Aspects and Computational Issues
In this section we address a few issues related to digital and computational aspects. In particular, we show that introducing fuzziness overcomes some problems occurring when working in digital spaces. Although this may induce an additional computation cost, fast algorithms can be designed to obtain good approximations in reasonable time.

Topological Relations. We first discuss set relationships and adjacency. In digital spaces, these relations are highly sensitive, since in the binary case the result can depend on one point only. The segmentation can also induce errors that completely change the relations. For instance, two objects that are expected to be adjacent (e.g. because they are described as such in the model) can appear as not adjacent depending on the digitization, or if even tiny errors occur during the segmentation. Figure 5 illustrates this sensitivity. This is clearly a limitation of binary (all or nothing) definitions. In the fuzzy case, the problem is much less crucial. Indeed, membership is no longer strict: the fuzziness allows us to deal with a gradual transition between objects or between object and background, and relations then become a matter of degree. Therefore, through the notions of fuzzy neighborhood, fuzzy inclusion and fuzzy adjacency, we can expect to gain robustness when assessing the relationships between two objects. In this respect, the fuzziness, even on digital images, can be interpreted as a partial recovery of the continuity lost during the digitization process. Two ways can be followed to achieve this aim, for instance for adjacency [10]. In the first one, the satisfaction of the adjacency property between two objects is considered a matter of degree even if the objects are crisp; this can be more appropriate than a binary index [27,28]. This degree can be, for instance, a decreasing function of the minimum distance between the objects (a zero distance meaning exact adjacency). The second one consists in introducing imprecision in the objects themselves, and in dealing with fuzzy spatial objects. For instance, spatial imprecision due to the limited quality of image information can be represented in an adequate way by considering fuzzy objects. Then adjacency is obviously also a matter of degree. This is the approach described in Section 2.
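The first way can be sketched as follows; the point sets and the linear decreasing function f are illustrative assumptions.

```python
import numpy as np

# Sketch of the first approach: a degree of adjacency between crisp
# objects defined as a decreasing function of their minimum distance
# (zero distance meaning exact adjacency). The point sets and the linear
# function f are illustrative assumptions.
def min_distance(X, Y):
    return min(np.hypot(px - qx, py - qy) for (px, py) in X for (qx, qy) in Y)

def adjacency_degree(X, Y, d_max=4.0):
    return max(0.0, 1.0 - min_distance(X, Y) / d_max)   # f(0) = 1, linear decay

X = [(0, 0), (0, 1)]
near = [(0, 2)]                   # one pixel away from X
far = [(0, 4)]                    # three pixels away from X
```

A small segmentation error that moves a boundary by one pixel now lowers the degree slightly instead of flipping a binary adjacency predicate.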
Fig. 6. Two examples where the relative position of objects with respect to the reference object R is difficult to define in an "all-or-nothing" manner: object A is to the right of R but can also be considered, to some extent, to be above it; object B is strongly both to the right of R and above it.
Relative Directional Position. The above discussion on robustness also holds for relative directional position. An additional aspect concerns the digitization of the directions of space. The approach described in Section 2 offers several advantages from this point of view. In particular, it avoids describing the relative position by only one dominant direction, which is not satisfactory in several situations, even of moderate complexity (see the examples of Figure 6). It is also more flexible than just cardinal directions, which is an advantage of the semi-quantitative aspect. One limitation of this approach, however, is that its computation might be a problem in large 3D spaces. Here we can take advantage of the digital nature of the space to propose two solutions to reduce the computational cost. The computation can be made faster by storing the list of points in R (which are often far fewer than the image points), and by tabulating angles (since QP takes a finite number of integer values in discrete images). The interpretation of the proposed definition as a fuzzy dilation suggests a further way to reduce the computation time by reducing the precision of µ_α(R): it consists in performing the fuzzy dilation with a limited support for the structuring element. This amounts to a rough quantization of angles, and therefore an approximate result is obtained. A second solution is to approximate the result using a propagation algorithm, similar to the ones used for computing chamfer distances. We proposed in [4] a fast algorithm for computing µ_α(R) that still provides an approximation of µ_α(R), but with increased precision with respect to the algorithm based on dilation. This algorithm is based on a propagation technique inspired by the chamfer methods used, for instance, for discrete distance computation [12].
This idea comes from the observation of the results, where it appears that membership values in the fuzzy set µ_α are constant along lines issuing from contour points of the reference object. The algorithm performs two passes over the image, one in the conventional raster order, and one in the opposite order. For each point P, we store the point Q = O(P) from which the minimum visibility angle is obtained. For a point P, we do not consider all points in R as in the exhaustive method, but only those of a neighborhood of P. The algorithm consists of the following steps:

1. Initialization: we set O(P) = P if P ∈ R and O(P) = Null otherwise.
2. First pass: we compute the fuzzy landscape from the visibility angle at P as: µ_α(R)(P) = max_{Q∈V(P)} t[µ_R(O(Q)), f(β(P, O(Q)))], where V(P) denotes the neighborhood of P. Let Q_P be the point Q for which the maximum value is obtained: Q_P = arg max_{Q∈V(P)} t[µ_R(O(Q)), f(β(P, O(Q)))]. Then we set O(P) = O(Q_P).
3. Second pass: it is performed as the first one, except that the points are examined in the reverse order.

Note that during these two passes, the points of R can also be modified. This algorithm is applicable in 2D as well as in 3D, and for crisp objects as well as fuzzy ones. We used 8-connectivity in 2D and 26-connectivity in 3D for defining V(P). More precise results could be obtained with larger neighborhoods or with more passes over the image using other propagation directions, but at the price of increased computation time. The errors are mainly due to the fact that when there are several candidates for Q_P (i.e. leading to the same minimal value for β_min), there is no clear strategy for choosing one particular point among the candidates. Although the result obtained for µ_α(R) using the propagation algorithm is not exact, it can be considered a good approximation. Figure 7 illustrates the results obtained with the propagation algorithm and the difference with the exact method for several reference objects; they show the quality of the approximation. The results may show no error at all depending on the angle with respect to the propagation directions and on the object (this is the case, for instance, for the square of Figure 6). In the fuzzy case too, only few differences can be observed.
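The two-pass algorithm can be sketched as follows, under illustrative assumptions: crisp reference object, direction α = (0, 0) ("to the right"), t = min, f(β) = max(0, 1 − 2β/π), and an 8-neighborhood.

```python
import numpy as np

# Sketch of the two-pass propagation, under illustrative assumptions:
# crisp reference object, direction alpha = (0, 0) ("to the right"),
# t = min, f(beta) = max(0, 1 - 2*beta/pi), 8-neighbourhood.
def f(beta):
    return max(0.0, 1.0 - 2.0 * beta / np.pi)

def beta(p, q, u=(0.0, 1.0)):
    dr, dc = p[0] - q[0], p[1] - q[1]
    norm = np.hypot(dr, dc)
    if norm == 0.0:
        return 0.0                             # beta(P, P) = 0
    cos = (dr * u[0] + dc * u[1]) / norm
    return np.arccos(np.clip(cos, -1.0, 1.0))

def propagate(R):
    h, w = R.shape
    origin = {p: p for p in zip(*np.nonzero(R))}   # O(P) = P on R, Null elsewhere
    out = np.zeros((h, w))
    nbrs = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
    scan = list(np.ndindex(h, w))
    for order in (scan, scan[::-1]):           # first pass, then reverse pass
        for p in order:
            best, best_q = out[p], origin.get(p)
            for dr, dc in nbrs:
                q = (p[0] + dr, p[1] + dc)
                if q in origin:                # neighbour with a known origin
                    v = f(beta(p, origin[q]))
                    if v > best:
                        best, best_q = v, origin[q]
            if best_q is not None:
                origin[p] = best_q
                out[p] = best
    return out

R = np.zeros((5, 5)); R[2, 1] = 1.0
land = propagate(R)
```

For this one-point reference object every propagated value carries the correct origin, so the result coincides with the exhaustive computation; for general objects the propagation is only an approximation, as discussed above.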
Moreover, when using these results instead of the exact ones, we observed only few differences in the pattern matching results (the maximum error is at most a few percentage points, and generally less than 5%). These differences cannot be considered significant for pattern recognition purposes.

Distances. As for distances, the possibility of representing distance knowledge expressed in an imprecise way offers more flexibility and more robustness against digitization effects. For instance, stating that two objects are at a distance of exactly 10 can easily fail due to the digitization. On the contrary, modeling a distance of "about 10", as illustrated in Figure 3, is much more robust. Let us now address the computational aspects for distances. Here again we can take advantage of the digitization. If the object is binary, we can compute a distance map to the object using a chamfer algorithm, as usual. The values of the fuzzy relation are then obtained by simply using the curves of Figure 3 as a look-up table. This is the approach used, for instance, in [21].
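A sketch of the look-up-table approach; a brute-force Euclidean distance map stands in for the chamfer map, and the membership curve for "at a distance of about 10" is an illustrative trapezoid with core [9, 11].

```python
import numpy as np

# Sketch of the look-up-table approach for a binary reference object.
# A brute-force Euclidean distance map stands in for the chamfer map,
# and the curve mu_about_10 ("at a distance of about 10") is an
# illustrative trapezoid with core [9, 11].
def distance_map(obj):
    pts = np.array(list(zip(*np.nonzero(obj))))
    out = np.empty(obj.shape)
    for p in np.ndindex(obj.shape):
        out[p] = np.min(np.hypot(pts[:, 0] - p[0], pts[:, 1] - p[1]))
    return out

def mu_about_10(d):                       # look-up curve applied to the map
    return np.clip(1.0 - (np.abs(d - 10.0) - 1.0) / 3.0, 0.0, 1.0)

obj = np.zeros((20, 20)); obj[0, 0] = 1.0
relation = mu_about_10(distance_map(obj))
```

A one-pixel digitization error moves a point from distance 10 to 11 and leaves the degree at 1, whereas a crisp "distance exactly 10" test would fail.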
Fig. 7. A few examples of µα (R) for α1 = α2 = 0 for different types of reference objects (reference objects are black) using the propagation method. The second line shows the difference with the exact method (a grey level of 128 corresponds to no error, and the differences have been enhanced for the visualization). For the corner (left example) we obtain no error for all directions. (From [4].)
In the fuzzy case, an approximation of the distance map can be obtained by dilating the reference object by a fuzzy structuring element of conic shape, the membership value at each point being a linear function of the distance to the origin. In the binary case this approach is exact, while in the fuzzy case it is only an approximation. Computation can be reduced by limiting the size of the support of the structuring element to some maximal distance of interest. This type of fuzzy structuring element has been used, for instance, in [19] to represent the concept of large open space in a robot's environment. Finally, if we have a fuzzy object and a fuzzy distance knowledge representation, it could be interesting to develop algorithms generalizing chamfer algorithms to address the second type of question raised in Section 2. We leave this for future work.
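The conic structuring element and the corresponding fuzzy dilation can be sketched as follows; the sizes and the truncation distance are illustrative assumptions.

```python
import numpy as np

# Sketch of a conic fuzzy structuring element (membership decreasing
# linearly with the distance to the origin, support truncated at d_max)
# and of the fuzzy dilation that yields an approximate, inverted
# distance map. Sizes are illustrative assumptions.
def conic_element(d_max):
    r = int(np.ceil(d_max))
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    return np.clip(1.0 - np.hypot(y, x) / d_max, 0.0, 1.0)

def fuzzy_dilate(mu, nu):                 # D_nu(mu)(x) = sup_y min(mu(y), nu(x-y))
    h, w = mu.shape
    r = nu.shape[0] // 2
    out = np.zeros_like(mu)
    for (py, px) in zip(*np.nonzero(mu)):
        ys = slice(max(0, py - r), min(h, py + r + 1))
        xs = slice(max(0, px - r), min(w, px + r + 1))
        nys = slice(ys.start - py + r, ys.stop - py + r)
        nxs = slice(xs.start - px + r, xs.stop - px + r)
        patch = np.minimum(mu[py, px], nu[nys, nxs])
        out[ys, xs] = np.maximum(out[ys, xs], patch)
    return out

obj = np.zeros((9, 9)); obj[4, 4] = 1.0
dist_like = fuzzy_dilate(obj, conic_element(4.0))
```

The result decreases linearly with the distance to the object and is zero beyond the truncation distance, i.e. an inverted, clipped distance map.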
5 Conclusion
The spatial arrangement of objects in images provides important information for recognition and interpretation tasks, in particular when the objects are embedded in a complex environment like in medical or remote sensing images. Such information can be expressed in different ways varying from purely quantitative and precise ones to purely qualitative and symbolic ones. The fuzzy set framework provides an interesting semi-quantitative alternative. We have shown that mathematical morphology provides an appropriate framework to express different types of spatial relationships in a unified formalism and to answer different questions about them, with good properties. Due to the strong algebraic structure of this framework, it applies to objects represented as sets, as fuzzy sets, and as logical formulas as well, which offers different points of view compared to the one adopted in this paper [7]. The different types of representation of spatial relations lead to model-based pattern recognition approaches such as graph-based or focalization methods.
Isabelle Bloch
Applications of this work concern model-based pattern recognition in complex images, spatial knowledge representation issues, and spatial reasoning. Finally, digital and computational issues can also benefit from the fuzzy set framework, in particular to gain robustness. These aspects are worth further development in future work.
References
1. J. Allen. Maintaining Knowledge about Temporal Intervals. Communications of the ACM, 26(11):832–843, 1983.
2. E. Bengoetxea, P. Larranaga, I. Bloch, A. Perchant, and C. Boeres. Inexact Graph Matching by Means of Estimation of Distribution Algorithms. Pattern Recognition, 35:2867–2880, 2002.
3. I. Bloch. Information Combination Operators for Data Fusion: A Comparative Review with Classification. IEEE Transactions on Systems, Man, and Cybernetics, 26(1):52–67, 1996.
4. I. Bloch. Fuzzy Relative Position between Objects in Image Processing: A Morphological Approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(7):657–664, 1999.
5. I. Bloch. Fuzzy Relative Position between Objects in Image Processing: New Definition and Properties based on a Morphological Approach. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 7(2):99–133, 1999.
6. I. Bloch. On Fuzzy Distances and their Use in Image Processing under Imprecision. Pattern Recognition, 32(11):1873–1895, 1999.
7. I. Bloch. Mathematical Morphology and Spatial Relationships: Quantitative, Semi-Quantitative and Symbolic Settings. In L. Sztandera and P. Matsakis, editors, Applying Soft Computing in Defining Spatial Relationships, pages 63–98. Physica Verlag, Springer, 2002.
8. I. Bloch, T. Géraud, and H. Maître. Representation and Fusion of Heterogeneous Fuzzy Information in the 3D Space for Model-Based Structural Recognition - Application to 3D Brain Imaging. Artificial Intelligence Journal, 2003.
9. I. Bloch and H. Maître. Fuzzy Mathematical Morphologies: A Comparative Study. Pattern Recognition, 28(9):1341–1387, 1995.
10. I. Bloch, H. Maître, and M. Anvari. Fuzzy Adjacency between Image Objects. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 5(6):615–653, 1997.
11. I. Bloch and A. Saffiotti. On the Representation of Fuzzy Spatial Relations in Robot Maps. In IPMU 2002, volume III, pages 1587–1594, Annecy, France, 2002.
12. G. Borgefors. Distance Transforms in the Square Grid. In H. Maître, editor, Progress in Picture Processing, Les Houches, Session LVIII, 1992, chapter 1.4, pages 46–80. North-Holland, Amsterdam, 1996.
13. B. Bouchon-Meunier, M. Rifqi, and S. Bothorel. Towards General Measures of Comparison of Objects. Fuzzy Sets and Systems, 84(2):143–153, September 1996.
14. R. Cesar, E. Bengoetxea, and I. Bloch. Inexact Graph Matching using Stochastic Optimization Techniques for Facial Feature Recognition. In International Conference on Pattern Recognition ICPR 2002, Québec, August 2002.
15. O. Colliot. Représentation, évaluation et utilisation de relations spatiales pour l'interprétation d'images. Application à la reconnaissance de structures anatomiques en imagerie médicale. PhD thesis, Ecole Nationale Supérieure des Télécommunications, 2003.
Fuzzy Spatial Relationships from Mathematical Morphology
16. D. Dubois and H. Prade. Fuzzy Sets and Systems: Theory and Applications. Academic Press, New York, 1980.
17. D. Dubois, H. Prade, and C. Testemale. Weighted Fuzzy Pattern Matching. Fuzzy Sets and Systems, 28:313–331, 1988.
18. S. Dutta. Approximate Spatial Reasoning: Integrating Qualitative and Quantitative Constraints. International Journal of Approximate Reasoning, 5:307–331, 1991.
19. E. Fabrizi and A. Saffiotti. Extracting Topology-Based Maps from Gridmaps. In IEEE International Conference on Robotics and Automation (ICRA-2000), San Francisco, CA, 2000.
20. J. Freeman. The Modelling of Spatial Relations. Computer Graphics and Image Processing, 4(2):156–171, 1975.
21. T. Géraud, I. Bloch, and H. Maître. Atlas-guided Recognition of Cerebral Structures in MRI using Fusion of Fuzzy Structural Information. In CIMAF'99 Symposium on Artificial Intelligence, pages 99–106, La Havana, Cuba, 1999.
22. J. M. Keller and X. Wang. Comparison of Spatial Relation Definitions in Computer Vision. In ISUMA-NAFIPS'95, pages 679–684, College Park, MD, September 1995.
23. P. Matsakis and L. Wendling. A New Way to Represent the Relative Position between Areal Objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(7):634–642, 1999.
24. K. Miyajima and A. Ralescu. Spatial Organization in 2D Segmented Images: Representation and Recognition of Primitive Spatial Relations. Fuzzy Sets and Systems, 65:225–236, 1994.
25. A. Perchant and I. Bloch. Fuzzy Morphisms between Graphs. Fuzzy Sets and Systems, 128(2):149–168, 2002.
26. A. Perchant, C. Boeres, I. Bloch, M. Roux, and C. Ribeiro. Model-based Scene Recognition Using Graph Fuzzy Homomorphism Solved by Genetic Algorithm. In GbR'99 2nd International Workshop on Graph-Based Representations in Pattern Recognition, pages 61–70, Castle of Haindorf, Austria, 1999.
27. A. Rosenfeld. Fuzzy Digital Topology. Information and Control, 40:76–87, 1979.
28. A. Rosenfeld. The Fuzzy Geometry of Image Subsets. Pattern Recognition Letters, 2:311–317, 1984.
29. A. Rosenfeld and A. C. Kak. Digital Picture Processing. Academic Press, New York, 1976.
30. L. Vieu. Spatial Representation and Reasoning in Artificial Intelligence. In O. Stock, editor, Spatial and Temporal Reasoning, pages 5–41. Kluwer, 1997.
31. L. A. Zadeh. The Concept of a Linguistic Variable and its Application to Approximate Reasoning. Information Sciences, 8:199–249, 1975.
Shape Similarity and Visual Parts
Longin Jan Latecki1, Rolf Lakämper1, and Diedrich Wolter2
1 Dept. of Computer and Information Sciences, Temple University, Philadelphia, USA
{latecki,lakamper}@temple.edu
2 Dept. of Computer Science, University of Bremen, Bremen, Germany
[email protected]
Abstract. Human perception of shape is based on visual parts of objects to the point that a single, significant visual part is sufficient to recognize the whole object. For example, if you see a hand in the door, you expect a human behind the door. Therefore, a cognitively motivated shape similarity measure for recognition applications should be based on visual parts. This cognitive assumption leads to two related problems of scale selection and subpart selection. To find a given query part Q as part of an object C, Q needs to have a correct size with regard to C (scale selection). Assuming that the correct size is selected, the part Q must be compared to all possible subparts of C (subpart selection). For global, contour-based similarity measures, scaling the whole contour curves of both objects to the same length usually solves the problem of scale selection. Although this is not an optimal solution, it works if the whole contour curves are 'sufficiently' similar. The subpart selection problem does not occur in the implementation of global similarity measures. In this paper we present a shape similarity system that is based on correspondence of visual parts, and apply it to robot localization and mapping. This is a particularly interesting application, since the scale selection problem does not occur here and visual parts can be obtained in a very simple way. Therefore, only the problem of subpart selection needs to be solved. Our solution to this problem is based on a contour-based shape similarity measure supplemented by structural arrangement information of visual parts.
1 Motivation and Overview of Shape Descriptors
Shape descriptors for comparing silhouettes of 2D objects in order to determine their similarity are important and useful for a wide range of applications, of which the most obvious is shape-based object retrieval in image databases. The importance of shape is indicated by the fact that the MPEG-7 group incorporated shape descriptors into the MPEG-7 standard. Since 2D objects are projections of 3D objects, their silhouettes may change due to: 1. change of the view point with respect to objects, 2. non-rigid object motion (e.g., people walking or fish swimming), 3. noise (e.g., digitization and segmentation noise).
I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 34–51, 2003. © Springer-Verlag Berlin Heidelberg 2003
Fig. 1. Some shapes used in part B of MPEG-7 Core Experiment CE-Shape-1. Shapes in each row belong to the same class.
The goal of the Core Experiment CE-Shape-1 [20] was to evaluate the performance of 2D shape descriptors under such conditions. The shapes were restricted to simple pre-segmented shapes defined by their bitmaps. Some example shapes are shown in Figure 1. The main requirement was that the shape descriptors should be robust to small non-rigid deformations due to (1), (2), or (3). In addition, the descriptors should be scale and rotation invariant. The main part of the Core Experiment CE-Shape-1 was part B: similarity-based retrieval. The data set used for this part is composed of 1400 shapes stored as binary images. The shapes are divided into 70 classes with 20 images in each class. In the test, each image was used as a query, and the number of similar images (which belong to the same class) was counted in the top 40 matches (bulls-eye test). Since the maximum number of correct matches for a single query image is 20, the total number of correct matches is 28000. It turned out that this data set is the only set that is used to objectively evaluate the performance of various shape descriptors. We now present some of the shape descriptors with the best performance on this data set. It is not our goal to provide a general overview of all possible shape descriptors; a good overview can be found in the book by Costa and Cesar [4]. The shape descriptors can be divided into three main categories:
1. contour based descriptors: the contour of a given object is mapped to some representation from which a shape descriptor is derived;
2. area based descriptors: the computation of a shape descriptor is based on summing up pixel values in a digital image of the area containing the silhouette of a given object; the shape descriptor is a vector of a certain number of parameters derived this way (e.g., Zernike moments [13]);
3. skeleton based descriptors: after a skeleton is computed, it is mapped to a tree structure that forms the shape descriptor; the shape similarity is computed by some tree-matching algorithm.
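The bulls-eye evaluation described above can be sketched as follows. This is a hedged illustration, not code from the experiment: the function name and the assumption of a precomputed dissimilarity matrix are ours.

```python
import numpy as np

def bullseye_rate(dist, labels, top_k=40):
    # dist: n x n dissimilarity matrix; labels: n class ids.
    # For each query, count same-class shapes among its top_k nearest
    # matches (the query itself is included, as in the MPEG-7 test).
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    correct, best_possible = 0, 0
    for q in range(len(labels)):
        nearest = np.argsort(dist[q])[:top_k]
        correct += int(np.sum(labels[nearest] == labels[q]))
        best_possible += min(top_k, int(np.sum(labels == labels[q])))
    return correct / best_possible
```

For MPEG-7 CE-Shape-1 part B (n = 1400, 70 classes of 20 shapes, top_k = 40), best_possible is 28000, matching the count given above.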
The idea of representing shapes by their skeletons in Computer Vision goes back to Blum [3]. Siddiqi et al. [25] also convert object skeletons to a tree representation and use a tree-matching algorithm to determine the shape similarity. In the MPEG-7 Core Experiment CE-Shape-1 part B, shape descriptors of all three categories were used. A general conclusion is that contour based descriptors significantly outperformed the descriptors of the other two categories [20]. It seems that area based descriptors are more suitable for shape classification than for indexing. The weak performance of skeleton based descriptors can probably be explained by the unstable computation of skeletons, related to the discontinuous relation between object boundary and skeletons: a small change in the object boundary may lead to a large change in the skeleton. As reported in [20], the best retrieval performance of 76.45% for part B was obtained for the shape descriptor of Latecki and Lakaemper [17] that will be described in this paper (presented by the authors in cooperation with Siemens Munich), followed by the shape descriptor of Mokhtarian et al. [22,23] with a retrieval rate of 75.44% (presented by Mitsubishi Electric ITE-VIL). It is important to mention that a 100% retrieval rate on this data set cannot be achieved employing shape alone. The classification of the objects was done by human subjects, and consequently, some shapes can only be correctly classified when semantic knowledge is used. Meanwhile, new shape descriptors have been developed that yield a slightly better performance. The best reported performance on this data set, 76.51%, is obtained by Belongie et al. [2]. The small differences in the retrieval rates of these approaches are more likely to indicate better parameter tuning than a better approach. All the contour based shape descriptors have a common feature that limits their applicability: they require the presence of the whole contour to compute shape similarity.
Although they are robust to some small distortions of contours, they will fail if a significant part of the contour is missing or is different. The same critique applies to area and skeleton based shape descriptors, which require the whole object area or the complete skeleton to be present. The goal of this paper is to direct our attention to a cognitively motivated ability of shape descriptors and shape similarity measures that is necessary for most practical applications of shape similarity: the ability of partial matching. Partial matching leads to two related problems of scale selection and subpart selection. To find a given query part Q as part of an object C, Q needs to have a correct size with regard to C (scale selection). Assuming that the correct size is selected, the part Q must be compared to all possible subparts of C (subpart selection). The subparts may be obtained either by a decomposition of Q into parts using some decomposition criterion or simply by sliding Q over all possible positions with respect to C, e.g., the beginning point of Q is aligned with each point of C. A good example of an approach that allows for partial matching is a single-directional Hausdorff distance [12], which tries to minimize the distance of all
points of the query part Q to points of object C. However, the problem of scale selection cannot be solved in the framework of the Hausdorff distance alone. For example, the approach presented in [12] simply enumerates all possible scales. Moreover, the Hausdorff distance does not tolerate shape deformations that preserve the structure of visual parts, i.e., objects differing by such deformations, although very similar to humans, will have a large distance value. For global, contour-based similarity measures, scaling the whole contour curves of both objects to the same length usually solves the problem of scale selection. Although this is not an optimal solution, it works if the whole contour curves are 'sufficiently' similar. The subpart selection problem does not occur in the implementation of global similarity measures. To our knowledge, there does not exist an approach to partial shape similarity that also solves the scaling problem. In this paper we show that the shape descriptor presented by Latecki and Lakaemper [17] can be easily modified to perform partial matching when the scale is known. An ideal application where this restriction is satisfied is robot localization and mapping using laser range data. Therefore, we apply our shape similarity measure in this context.
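For illustration, the single-directional (directed) Hausdorff distance discussed above can be written in a few lines. This is a brute-force sketch; [12] uses more efficient machinery, and the function name is ours.

```python
import math

def directed_hausdorff(Q, C):
    # Distance from part Q to object C: every point of Q must lie
    # close to some point of C, but not vice versa, which is what
    # makes partial matching possible.
    return max(min(math.dist(q, c) for c in C) for q in Q)
```

Note that this measure is small whenever Q matches any subset of C, regardless of how much of C is left unmatched.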
2 Shape Representation, Simplification, and Matching
For a successful shape representation we need to account for arbitrary shapes. Any kind of boundary information obtained must be representable. Therefore, we will use polygonal curves as boundary representation. We developed a theory and a system for a cognitively motivated shape similarity measure for silhouettes of 2D objects [17,18,16]. To reduce the influence of digitization noise as well as segmentation errors, the shapes are first simplified by a novel process of discrete curve evolution, which we introduced in [16,19]. This allows us
• (a) to reduce the influence of noise and
• (b) to simplify the shape by removing irrelevant shape features without changing relevant shape features.
A few stages of our discrete curve evolution are shown in Figure 2. The discrete curve evolution is context sensitive, since whether shape components are relevant or irrelevant cannot be decided without context. In [16], we show that the discrete curve evolution allows us to identify significant visual parts, since significant visual parts become maximal convex arcs on an object contour simplified by the discrete curve evolution. Let P be a polyline (that does not need to be simple). We denote the vertices of P by Vertices(P). A discrete curve evolution produces a sequence of polylines P = P^0, ..., P^m such that |Vertices(P^m)| ≤ 3, where |.| is the cardinality function. Each vertex v in P^i (except the first and the last if the polyline is not closed) is assigned a relevance measure that depends on v and its two neighbor vertices u, w in P^i:

K(v, P^i) = K(u, v, w) = |d(u, v) + d(v, w) − d(u, w)|,   (1)
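The relevance measure of Eq. (1) translates directly into code (a sketch; the function name is ours):

```python
import math

def relevance(u, v, w):
    # K(u, v, w) = |d(u,v) + d(v,w) - d(u,w)|: zero iff u, v, w are
    # collinear (with v between u and w), growing with the bending at v.
    return abs(math.dist(u, v) + math.dist(v, w) - math.dist(u, w))
```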
38
Longin Jan Latecki, Rolf Lak¨ amper, and Diedrich Wolter
Fig. 2. A few stages of our discrete curve evolution.
where d is the Euclidean distance function. Note that K measures the bending of P^i at vertex v; it is zero when u, v, w are collinear. The process of discrete curve evolution (DCE) is very simple:
– At every evolution step i = 0, ..., m − 1, a polygon P^{i+1} is obtained after the vertices whose relevance measure is minimal have been deleted from P^i.
For end vertices of open polylines no relevance measure is defined, since the end vertices do not have two neighbors. Consequently, end points of open polylines remain fixed. Note that P^{i+1} is obtained from P^i by deleting such a vertex that the length change between P^i and P^{i+1} is minimal. Observe that the relevance measure K(v, P^i) is not a local property with respect to the polygon P = P^0, although its computation is local in P^i for every vertex v. This implies that the relevance of a given vertex v is context dependent, where the context is given by the adaptive neighborhood of v, since the neighborhood of v in P^i can be different from its neighborhood in P. The discrete curve evolution has also been successfully applied in the context of video analysis to simplify video trajectories in feature space [6,15]. DCE may be implemented efficiently. The polyline's vertices can be represented within a doubly-linked polyline structure and a self-balancing tree simultaneously. Setting up this structure for a polyline containing n vertices has complexity O(n log n). A step within DCE consists of picking out the least relevant point (O(log n)), removing it (O(log n)), and updating its neighbors' relevance measures (O(1)). As there are at most n points to be deleted, this yields an overall complexity of O(n log n). As DCE is applied to segmented polylines, the number of vertices is much smaller than the number of points read from the sensor. To compute our similarity measure between two polygonal curves, we establish the best possible correspondence of maximal convex arcs.
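The evolution step can be sketched as follows. For clarity, this version recomputes the minimum naively in O(n) per step (O(n^2) overall) rather than using the doubly-linked list plus self-balancing tree that yields the O(n log n) bound; the handling of open polylines (fixed end points) follows the description above.

```python
import math

def dce(polyline, target=3):
    # Discrete curve evolution of an open polyline: repeatedly delete
    # the interior vertex of least relevance K until `target` vertices remain.
    pts = list(polyline)

    def K(i):
        u, v, w = pts[i - 1], pts[i], pts[i + 1]
        return abs(math.dist(u, v) + math.dist(v, w) - math.dist(u, w))

    while len(pts) > target:
        # End points of an open polyline carry no relevance measure
        # and therefore remain fixed.
        i = min(range(1, len(pts) - 1), key=K)
        del pts[i]
    return pts
```

Vertices that lie nearly on a straight line (small K) are removed first, which cancels noise while preserving significant bends.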
To achieve this, we first decompose the polygonal curves into maximal convex subarcs. Since a simple one-to-one comparison of maximal convex arcs of two polygonal curves is of little use, due to the facts that the curves may consist of a different number of such arcs and that even similar shapes may have different small features, we allow for 1-to-1, 1-to-many, and many-to-1 correspondences of the maximal convex arcs. The main idea here is that on at least one of the contours we have a maximal convex arc that corresponds to a part of the other contour composed of adjacent
Fig. 3. The corresponding arcs are labeled by the same numbers.
maximal convex arcs. In this context the corresponding parts of contours can be identified with visual object parts. The best correspondence of the visual object parts, i.e., the one yielding the lowest similarity measure, can be computed using dynamic programming, where the similarity of the corresponding visual parts is defined below. Using dynamic programming, the similarity between corresponding parts is computed and aggregated. The computation is described extensively in [17]. The similarity induced from the optimal correspondence of polylines C and D will be denoted S(C, D). Two example correspondences obtained by our approach are shown in Fig. 3. Since our shape matching technique is based on correspondence of visual parts, it will also work under a moderate amount of occlusion and/or segmentation errors. Basic similarity of arcs is defined in tangent space. Tangent space, also called the turning function, is a multi-valued step function mapping a curve into the interval [0, 2π) by representing angular directions of line segments only. Furthermore, arc lengths are normalized to 1 prior to mapping into tangent space. This representation was previously used in computer vision, in particular in [1]. Denoting the mapping function by T, the similarity is defined as follows:

S_arcs(C, D) = ∫_0^1 (T_C(s) − T_D(s) + Θ_{C,D})^2 ds · max(l(C)/l(D), l(D)/l(C)),   (2)

where l(C) denotes the arc length of C. The constant Θ_{C,D} is chosen to minimize the integral (it accounts for different orientations of curves) and is given by

Θ_{C,D} = ∫_0^1 (T_C(s) − T_D(s)) ds.
Obviously, the similarity measure is rather a dissimilarity measure, since identical curves yield 0, the lowest possible value. It should be noted that this measure is based on shape information only; neither the arcs' position nor their orientation is considered. This is possible due to the large context information of closed contours.
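A discrete version of the arc similarity of Eq. (2) can be sketched by sampling the turning functions at uniformly spaced normalized arc-length positions. The function names and the sampling scheme are our own assumptions; the constant theta is computed as the value minimizing the integral.

```python
import math

def turning_function(poly, samples=256):
    # Sample the turning function (tangent-space representation) of a
    # polyline at `samples` positions, arc length normalized to 1.
    segs = list(zip(poly, poly[1:]))
    lengths = [math.dist(a, b) for a, b in segs]
    total = sum(lengths)
    angles = [math.atan2(b[1] - a[1], b[0] - a[0]) for a, b in segs]
    out, acc, i = [], lengths[0] / total, 0
    for k in range(samples):
        s = (k + 0.5) / samples
        while s > acc and i + 1 < len(segs):
            i += 1
            acc += lengths[i] / total
        out.append(angles[i])
    return out, total

def arc_similarity(C, D, samples=256):
    # Discrete Eq. (2): squared difference of turning functions shifted
    # by the optimal constant theta, weighted by the arc-length ratio.
    TC, lC = turning_function(C, samples)
    TD, lD = turning_function(D, samples)
    theta = sum(td - tc for tc, td in zip(TC, TD)) / samples
    integral = sum((tc - td + theta) ** 2 for tc, td in zip(TC, TD)) / samples
    return integral * max(lC / lD, lD / lC)
```

Because theta absorbs a constant rotation and arc lengths are normalized, the measure is invariant to rotation and, up to the length-ratio penalty, to scaling; identical curves yield 0.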
Fig. 4. An illustration of query process in our shape database; from left to right: query sketch, first result, and refined result.
3 First Application: Image Database
The performance of our shape descriptor (described in Section 2) can be evaluated using the shape-based image database located at http://knight.cis.temple.edu/~shape. The interface allows query by shape based on hand-drawn sketches as well as by texture and keywords. Using shape, the user defines the query by drawing a shape boundary, see Figure 4(left). Since the system has to deal with the moderate artistic abilities of the user (who may not be a gifted illustrator), the results are achieved in two steps of increasing precision: the first result set shows examples of different shape classes, presenting not a precise match but a wide variety of similar shapes. The reason is that not all parts existing in the hand-drawn sketch are considered as descriptive features. A typical example is an airplane roughly drawn from top view: the first search result includes planes, but also shows a cactus, a peeled banana, etc., Figure 4(middle); note that these shapes have a boundary similar to a plane. To refine the search, one of the shapes is chosen as the new query, which is now an object formerly stored in the database. It is independent of the user's sketching talents; therefore, it is reasonable to enhance the search precision based on all parts of the shape. The results of this second query are the most similar matches in the database using our similarity measure. The shapes in Figure 4(right) are the best matches for the airplane in the center of the first result. The search can be recursively continued by choosing shapes of each result set as a new query. Since the boundary of the chosen shape is first imported into the input interface, it is possible to further refine the search with additional information (e.g., texture).
4 Second Application: Robot Mapping and Localization
Robot mapping and localization are the key points in building truly autonomous robots. The central method required is the matching of sensor data, which - in the typical case of a laser range finder as the robot's sensor - is called scan matching. Whenever a robot needs to cope with unknown or changing environments, localization and mapping have to be carried out simultaneously; this technique is called SLAM (Simultaneous Localization and Mapping). To attack the problem of mapping and/or localization, mainly statistical techniques are used (Thrun [28], Dissanayake et al. [7]). The extended Kalman filter, a linear recursive estimator for systems described by non-linear process and observation models, is usually employed in current SLAM algorithms. The robot's internal geometric representation builds the basis for these techniques. It is built atop the perceptual data read from the laser range finder. Typically, either the planar locations of the reflection points read from the laser range finder are used directly as the geometric representation, or simple features in the form of line segments or corner points are extracted (Cox [5]; Gutmann and Schlegel [8]; Gutmann [10]; Röfer [24]). Although robot mapping and localization techniques are very sophisticated, they do not yield the desired performance in all respects. We observe that these systems use only a very primitive geometric representation. As the internal geometric representation is the foundation for localization and mapping, shortcomings at the level of geometric representation affect the overall performance. Systems with a geometric representation based on extracted features outperform systems based on the locations of scan points in terms of compute time, but there is a major drawback: systems relying on linear features can only cope with surroundings that are largely made up of linear segments. Hence, these approaches are limited to indoor office scenarios (Röfer [24]).
To cope with unconstrained scenarios, as needed for service robot applications, more general features are required, as most environments, like furnished rooms, lack linear features but show a great variety of shapes. Figure 5 gives an impression of a regular home scenario. Furthermore, extracting lines from an environment that lacks exactly linear parts but presents many slightly curved ones introduces a lot of noise. This noise affects the matching quality, and as it is propagated from matching to matching, it accumulates, resulting in errors. But just as environments can lack the features chosen for mapping, the presence of many such features can also lead to difficulties. Problems arise in a surrounding containing many similar features. For example, scanning a zigzag-shaped wall (or a curtain) results in detecting many lines at positions near each other pointing in similar directions. Since a line-based matching treats all lines individually, the matching is susceptible to a mix-up. Hence, the map gets misaligned. Besides the specific shortcomings discussed, it has been claimed by various authors that a purely metric geometric representation will not suffice for a mobile robot system. Especially solving navigational tasks can benefit from a more abstract representation, e.g., a topological one. As metric information is needed for path planning and topological information is desired in navigation,
Fig. 5. A regular living room perceived by a laser range finder. Each circle represents a reflection point measured. The lack of linear features is evident. Hence, more complex, versatile features need to be employed. The cross denotes the position and the orientation of the robot.
an urge to abstract from metrical data arises. Therefore, hybrid representations have been proposed (Thrun [26]; Kuipers [14]). Thus, a representation granting topological access alongside the metric information would be advantageous. Whether or not a feature extraction is used, mapping applications face yet another problem. As a topologically correct map is a prerequisite for successful navigation, maintaining topological correctness is a key point. We discuss two problems that require a careful mapping in order not to violate topology. The first problem is self-intersections. Among existing approaches there are no global geometric constraints that prevent the resulting map from containing overlaps. Such overlaps between parts of the map wrongly restrict the robot's passable space. Maps containing such errors can no longer be used for navigation. The second problem is cycle detection. The problem is illustrated in Figure 6(a). To link the processing of perceptual data and the handling of navigational tasks more tightly, we believe that introducing an improved geometric representation as the basis of a mobile robot's spatial representation is the central point. A successful geometric representation must result in a much more compact representation than uninterpreted perceptual data, but must neither discard valuable information nor imply any loss of generality. We claim that a shape representation as the robot's underlying spatial representation fulfills these demands. Representing the passable space explicitly by means of shape is not only adequate for mapping applications but also helps to bridge the gap from metric to topological information due to the object-centered perspective offered. Moreover, an object-centered representation is a crucial building block in dealing with changing environments, as this representation allows us to separate the partial changes from the unchanged parts.
The demands posed on scan matching are similar to the ones in computer vision as discussed in the beginning: the environment is perceived from different viewpoints, the environment is composed of different visual parts, and sensor
Fig. 6. (a) This figure from the paper by Gutmann and Konolige [9] shows a partially mapped environment. Due to the propagation of errors, the cyclic path the robot was following is no longer cyclic. Subsequent mapping would lead to an overlap. (b) Using shape similarity we can detect the overlapping parts (highlighted).
data is noisy. This provides a strong connection to shape matching. Although it has been stated in Lu and Milios' fundamental work [21] that "scan matching is similar to model-based shape matching", approaches to scan matching have so far not taken advantage of state-of-the-art techniques in shape matching. We propose a shape representation of the robot's surrounding as it is perceived by a laser range finder (LRF). After the scan points are mapped to a 2D top view of the surrounding, they can easily be grouped to form connected polylines. Our features are these polylines, which we interpret as visual parts of the boundary of the scanned objects. Shape processing and matching of these visual parts allow us to derive a sophisticated matching of scans that is reliable as well as efficient. Using visual parts as features allows us to maintain the generality required for arbitrary indoor scenarios, since the boundary of any shape can easily be represented with a polyline. The richness of perceivable shapes in a regular indoor scenario yields a more reliable matching than other feature-based approaches, as mix-ups in determining features are more unlikely to occur. At the same time, we are able to construct a compact representation for an arbitrary environment. Our motivation for this approach is related to human visual perception, where shape representation and recognition play a primary role. It is well known that this is the case in object recognition. We claim that it is also the case for localization tasks and for route description in navigation. In the following part of this paper, we will show that the proposed shape-based representation and matching of LRF scans lead to robust robot localization and mapping. Moreover, shape matching allows us to also perform object recognition (as is the case in Computer Vision). This ability is extremely useful to
maintain the global map consistency in robot navigation, as we now illustrate on the problem of cycle detection. Using the shape representation and shape similarity measure, we can easily correct the map depicted in Figure 6(a). A shape matching procedure can identify that the two parts marked in Figure 6(b) are very similar. Since these parts have a complicated shape structure, the probability of an accidental similarity is very close to zero. By transforming the map so that the matching parts are aligned, we correct the map. Observe that this process is cognitively motivated, since a human observer will notice that the map in Figure 6(a) is incorrect and will correct it by identifying the highlighted parts in Figure 6(b) as having identical shape.
5 From LRF Scan Data to Simplified Polylines
This section details how boundary information of scanned objects is extracted from LRF data and how the similarity between two boundaries is determined. First, the range data acquired by the laser range finder are mapped to locations of reflection points in the Euclidean plane, i.e., reflection points are represented as points in the plane. Thus, we obtain a sequence of scan points in the plane in a local coordinate system, with the robot's heading aligned with the positive y-axis, e.g., see Figure 5. The order of the sequence reflects the order of the data as returned by the LRF. The next step is to segment this sequence into polylines that represent visual parts of the scan. This is necessary, since two consecutive points in the scan reading do not necessarily belong to the same object; in that case they must not be represented by the same polyline. For this segmentation, a simple heuristic may be used: whenever the Euclidean distance of two consecutive points exceeds a given threshold (20 cm is used), these points are assumed to belong to different objects. The obtained polylines that represent boundaries of these objects are viewed as visual parts of the scan boundary. Thus, the extraction of visual parts in this context is a very simple process. Segmented polylines still contain all the information read from the LRF. However, these data contain some noise. Therefore, we apply DCE (Section 2), which cancels noise as well as makes the data compact without losing valuable shape information. The complete process of feature extraction and, most importantly, the applicability of DCE to range finder data are illustrated in Figure 7. Once the simplified boundaries are computed, the similarity of boundaries can be computed as described in Section 2. However, for matching two scans we will not rely only on matching individual boundaries.
A structural shape representation representing all boundaries within a single compound object is used to avoid faulty matches.
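The mapping and segmentation steps described above can be sketched in a few lines; the function names are illustrative (not from the paper), while the 20 cm gap threshold follows the text.

```python
import numpy as np

def scan_to_points(ranges, angles):
    """Map LRF range readings to 2D points in the robot's local frame
    (heading aligned with the positive y-axis, as in the text)."""
    ranges = np.asarray(ranges, dtype=float)
    angles = np.asarray(angles, dtype=float)
    # angle 0 is straight ahead; x grows to the right, y ahead
    return np.column_stack((ranges * np.sin(angles), ranges * np.cos(angles)))

def segment_polylines(points, gap=0.2):
    """Split the ordered scan points into polylines wherever two consecutive
    points are more than `gap` metres apart (20 cm in the text)."""
    if len(points) == 0:
        return []
    d = np.linalg.norm(np.diff(points, axis=0), axis=1)
    cuts = np.flatnonzero(d > gap) + 1
    return np.split(points, cuts)
```

Each returned array is one polyline, i.e., one visual part of the scan boundary, ready to be simplified by DCE.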
6 Structural Shape Representation and Matching
The boundary-based computation of similarity provides a distinctive measure for matching boundaries against each other. However, self-similarities in the
Shape Similarity and Visual Parts
Fig. 7. The process of extracting polygonal features from a scan consists of two steps: first, polygonal lines are set up from raw scanner data (a) (1 meter grid; the cross denotes the coordinate system's origin). The lines are split wherever two adjacent vertices are more than 20 cm apart. The resulting set of polygonal lines (b) is then simplified by means of discrete curve evolution with a threshold of 50. The resulting set of polygonal lines (c) consists of less data while still capturing the most significant information. Below, results of applying DCE with different threshold parameters are shown. As can be observed, the choice of this value is not critical for shape information. Thresholds chosen: (d) 10, (e) 30, (f) 70.
environment can still cause faulty matches. For example, typical indoor scenarios, especially office buildings, exhibit a high self-similarity of objects; e.g., door frames always look the same. Though door frames can – due to their complexity – provide a distinctive shape feature, they might easily be mixed up when several of them are observable from a single viewpoint. Matching structural shape representations made up from an entire laser scan allows us to overcome this problem. Structural representations allow us to incorporate abstract, qualitative knowledge (here: ordering information) and metric information into a single representation. Boundaries extracted from sensory data provide the metric information needed for aligning scans. Bringing more abstract spatial knowledge into play enables an efficient matching. Just as the representation consists of two individual aspects, the employed shape matching is likewise twofold. Matching shapes is built up from a matching of the shapes' structure and from determining the similarity of boundaries. A similarity measure determined for a pair of boundaries – which were extracted from different scans – serves as a plausibility measure for the matching. The more similar these boundaries are, the more likely they correspond to each other. The key point of the proposed approach is to exploit as much context information as possible. We elaborate on this in a bit more detail. In a purely data-driven approach, no context information is used at all. Each reflection point measured by the laser range finder is matched individually against another point from a different scan. Of course, such an attempt is prone to errors. Therefore, several enhancements need to be applied. The technique of position filtering is applied to neglect, in the matching process, any reflection points that
– most likely – could not have been observed from both viewpoints. Second, the displacement that best aligns two scans is determined as the mean value of the largest cluster of displacements induced by the individual correspondences of scan points¹. Employing a feature-based scan matching can be viewed as an increase in context information: the local context of scan points which form features is respected. However, features are still matched independently. Hence, computing the overall displacement still requires computing a mean value from the most plausible cluster of individual displacements. The advantages of increasing the context respected in a matching can be summarized as (a) a reduction of computing time, as there are just a few features compared to the raw LRF data, and (b) an increase in matching reliability, as a faulty correspondence of features is much less likely to happen accidentally than with raw data. Therefore, a structural shape representation is employed that captures the configuration of boundaries. In the terminology of context, individual visual parts are matched in the context of a complete scan. This prevents mix-ups in the determination of correspondences. The spatial information stored in the structural representation is a very simple, yet powerful one: ordering information. Sensing a scene in a counter-clockwise manner induces a cyclic ordering of the objects perceived. When matching two scenes against each other, hence determining a correspondence of objects present in both scenes, this ordering must not be violated. As an LRF does not provide a full round view, the cyclic ordering may be represented by a linearly ordered structure, i.e., a vector. Proceeding this way, we can represent a scan by a vector B of visual parts (represented as boundary polylines).
When matching two vectors of visual parts against each other, only 1-to-1 correspondences of boundaries are considered, but some visual parts may remain unmatched (new objects may appear and some objects may no longer be visible). Let us assume that all similarities S(B_i, B′_j) for individual pairs of visual parts have been computed for two vectors B = (B_1, B_2, . . . , B_b) and B′ = (B′_1, B′_2, . . . , B′_b′), using our shape similarity measure S. Correspondence of visual parts B_i and B′_j will be denoted B_i ∼ B′_j. The task of computing an optimal correspondence can then be written as a minimization of the summed-up similarities of corresponding visual parts: the goal is to compute the correspondence relation ∼ that yields the lowest overall sum. To prevent a tendency not to match any visual parts (as ∼ = ∅ would yield 0, the lowest sum possible), a penalty C is introduced for each visual part, in either B or B′, that is left unmatched. Thus, the matching can be written as the minimization

Σ_{(B_i, B′_j) ∈ ∼} S(B_i, B′_j) + C · (|B| + |B′| − 2|∼|) → min,

where |B| + |B′| − 2|∼| is the number of unmatched visual parts.
¹ This can be viewed as introducing context information: the scan is treated as a compound object of points, allowing scan points only to be displaced equally.
Respecting the ordering of visual parts is enforced simply by restricting the correspondence relation ∼ to be strictly monotone in the indices i, j of S(B_i, B′_j). Such an optimal correspondence can be computed by dynamic programming.
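The order-preserving minimisation just described can be solved with standard dynamic programming. The helper below is an illustrative sketch (not the authors' code): it assumes S[i, j] holds the precomputed dissimilarity of parts B_i and B′_j (lower means more similar) and charges the penalty C per unmatched part.

```python
import numpy as np

def match_parts(S, C):
    """Order-preserving 1-to-1 matching of two vectors of visual parts.
    Returns the minimal total cost and the list of matched index pairs."""
    b, bp = S.shape
    dp = np.empty((b + 1, bp + 1))
    dp[0, :] = C * np.arange(bp + 1)   # leading parts of B' left unmatched
    dp[:, 0] = C * np.arange(b + 1)    # leading parts of B left unmatched
    for i in range(1, b + 1):
        for j in range(1, bp + 1):
            dp[i, j] = min(dp[i - 1, j - 1] + S[i - 1, j - 1],  # B_i ~ B'_j
                           dp[i - 1, j] + C,                    # skip B_i
                           dp[i, j - 1] + C)                    # skip B'_j
    # backtrack to recover the correspondence relation ~
    pairs, i, j = [], b, bp
    while i > 0 and j > 0:
        if np.isclose(dp[i, j], dp[i - 1, j - 1] + S[i - 1, j - 1]):
            pairs.append((i - 1, j - 1)); i -= 1; j -= 1
        elif np.isclose(dp[i, j], dp[i - 1, j] + C):
            i -= 1
        else:
            j -= 1
    return dp[b, bp], pairs[::-1]
```

Because the recursion only advances the indices monotonically, the recovered correspondence automatically respects the ordering of the visual parts.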
7 Aligning Scans
Once a correspondence has been computed, the scans involved need to be aligned in order to determine the robot's current position, from which the latest scan has been perceived, and finally to build a global map from the perceived scans. To align two scans, a translation and rotation (termed a displacement) must be computed such that corresponding visual parts are placed at the same position. The overall displacement is determined from the individual correspondences. Of course, due to noise, this can only be fulfilled to a certain extent, as boundaries may sometimes not align perfectly and individual displacements may differ. To define the best overall displacement, the overall error, i.e., the sum of differences to the individual displacements, is minimized according to the method of least squares. To mediate between all possibly differing individual displacements, it is advantageous to restrict attention to the most reliable matches. The presented approach uses only the best three matching pairs of visual parts, selected using a reliability criterion described in Section 7.1. Based on the correspondence of these three matching pairs, the two complete scan boundaries from times t and t − 1 are aligned. For each corresponding polyline pair, we also know the correspondence of the line segments of which the polylines are composed. These correspondences have been determined along the way when computing the similarity of two polylines. Proceeding this way, the problem of aligning two scans is reduced to aligning two sets of corresponding lines. This is tackled by computing the individual displacements that reposition the corresponding line segments atop each other using standard techniques. First, the induced rotation is computed as the average value of the rotational differences, and the scans are aligned accordingly. Second, the induced translation is computed. This is done by solving an over-determined set of linear equations.
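The text estimates rotation and translation separately from the matched segments. A closely related, standard least-squares estimate of a 2D rigid displacement is sketched below under the assumption that corresponding points sampled from the matched segments are available; this is the classical Procrustes/Kabsch solution, not necessarily the authors' exact formulation.

```python
import numpy as np

def estimate_displacement(P, Q):
    """Least-squares rigid displacement (rotation R, translation t) mapping
    point set P onto corresponding point set Q (both (n, 2) arrays),
    minimising sum ||R @ p + t - q||^2."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)               # cross-covariance of centred sets
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # keep a proper rotation (det = +1)
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t
```

With noisy correspondences the residual of this fit is exactly the least-squares error minimised in the text.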
As, due to noise, usually no exact solution exists, the solution minimizing the least-squares error is chosen.

7.1 Matching Reliability
The reliability of matching a pair of polylines is influenced by two parameters, namely their similarity and their shape complexity. The higher the complexity, the more distinctive a matching is, as accidental matchings become much more unlikely with growing complexity. So, alongside the similarity measure, complexity mirrors the plausibility of a particular matching. The motivation is that choosing the most complex correspondences from an overall matching of scans should guarantee picking correct correspondences only. Determination of the
similarity measure S has been presented in Section 2. To determine the complexity of a polyline P with vertices (p_1, p_2, . . . , p_n), n > 2, the following formula is used:

C_P = Σ_{i=2}^{n−1} K(p_{i−1}, p_i, p_{i+1}).   (3)
Here K denotes the relevance measure of vertices as defined in formula (1). For a polyline composed of a single line segment, however, no relevance measure can be assigned this way. Therefore, in this case simply half the length of the line segment is chosen as complexity (d denotes the Euclidean distance):

C_{(p_1, p_2)} = 0.5 · d(p_1, p_2).   (4)
The matching reliability of two polylines P, R is then determined by

Q(P, R) = C_P + C_R − S(P, R).   (5)
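Formulas (3)–(5) can be sketched directly. Note that formula (1), the DCE relevance measure K, is not reproduced in this excerpt; the version below uses the standard measure from the authors' earlier work (turn angle weighted by the adjacent segment lengths), with lengths left unnormalised for simplicity.

```python
import numpy as np

def relevance(p0, p1, p2):
    """Relevance measure K of vertex p1 (formula (1); here the standard DCE
    measure K = beta * l1 * l2 / (l1 + l2), beta being the turn angle)."""
    v1 = np.subtract(p1, p0)
    v2 = np.subtract(p2, p1)
    l1, l2 = np.linalg.norm(v1), np.linalg.norm(v2)
    beta = abs(np.arctan2(v1[0] * v2[1] - v1[1] * v2[0], np.dot(v1, v2)))
    return beta * l1 * l2 / (l1 + l2)

def complexity(P):
    """Shape complexity C_P of polyline P, formulas (3) and (4)."""
    P = np.asarray(P, dtype=float)
    if len(P) == 2:  # single line segment: half its length, formula (4)
        return 0.5 * np.linalg.norm(P[1] - P[0])
    return sum(relevance(P[i - 1], P[i], P[i + 1]) for i in range(1, len(P) - 1))

def reliability(P, R, S):
    """Matching reliability Q(P, R) = C_P + C_R - S(P, R), formula (5)."""
    return complexity(P) + complexity(R) - S
```

A straight polyline has complexity 0, while every corner contributes positively, so complex, well-matched pairs score highest.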
Thus, two polylines with complex shapes that are very similar receive a high matching reliability value.

7.2 Advanced Incremental Alignment
The previous sections explained how correspondences between two scans can be detected and how an induced displacement can be computed. In principle, incremental scan matching can be realized in a straightforward manner: for each scan (at time t), visual parts are extracted and matched against the last scan perceived (at time t − 1). As the boundaries are matched, they are displaced accordingly and entered into a map. However, such an approach suffers from accumulating noise. For example, if a wall is perceived in front of the robot with a noise in distance of about 4 cm (typical noise of an LRF), computing a single displacement can introduce an error of 8 cm. Such errors accumulate during the continuous matching. Hence, maps resulting from several hundred scans render themselves useless. This is reason enough for any real application to incorporate some handling of uncertainty, e.g., by means of stochastic models. Our way of handling the uncertainty is again based on shape similarity. Instead of aligning all scans incrementally, i.e., scan t aligned with respect to scan t − 1, we align scan t with respect to a reference scan t − n for some n > 1. Scan t − n remains the reference scan as long as the three most reliably matching visual parts from scan t are sufficiently similar to the corresponding visual parts from scan t − n. This reference scan allows us to keep the accumulating incremental error down, as the reference visual parts do not change so often. Our criterion on when to change the reference scan is a threshold on the shape similarity of the actual visual parts to the reference ones. The performance of our system is demonstrated in Figure 8(a), where the map constructed from 400 scans obtained by a robot moving along the path marked with the dashed line is shown. For comparison, a ground truth map of the reconstructed indoor environment (a hallway at the University of Bremen) is shown in Figure 8(b).
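The reference-scan strategy can be summarised as a short control loop. All helper callables below are hypothetical stand-ins for the matching, alignment, and dissimilarity computations described in this paper; the sketch only illustrates when the reference scan is replaced.

```python
def incremental_alignment(scans, match, align, dissimilarity, thresh):
    """Align a stream of scans against a fixed reference scan, replacing the
    reference only once the most reliable visual parts of the current scan
    become too dissimilar to their reference counterparts."""
    ref = scans[0]
    poses = [None]  # pose/displacement associated with each scan
    for scan in scans[1:]:
        pairs = match(ref, scan)       # e.g. the three most reliable pairs
        poses.append(align(ref, scan, pairs))
        # adopt a new reference scan once shape similarity degrades
        if max(dissimilarity(ref, scan, p) for p in pairs) > thresh:
            ref = scan
    return poses
```

Because the reference changes only rarely, small alignment errors no longer accumulate at every time step.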
Fig. 8. (a) A map created by our approach. The robot path is marked with a dashed line. (b) A ground truth map of the indoor environment.
8 Conclusions
The problems of self-localization and robot mapping are of high importance to the field of mobile robotics. These problems consist of a geometric level and a handling of uncertainty. The state of the art in robot mapping and self-localization provides us with good techniques to master the latter, while the underlying geometric representation remains rather simple: either perceptual data stays largely uninterpreted or simple features (e.g., lines, corners) are extracted. A connection between the geometric level and shape matching exists but is still underexploited. By using a shape representation as the underlying geometric representation, we combined the advantages of feature-based approaches, namely a compact representation and a high-level, object-centered interface, with the generality of uninterpreted approaches, owing to the versatility of shape representation. Our future goal is to gain a deeper geometric understanding of robot localization. It is well known that shape representation and shape-based object recognition play a primary role in human visual perception. Our research indicates that localization and mapping tasks can also be based on shape representation and shape matching. Therefore, we are developing a robot localization and mapping formalism that employs a cognitively motivated shape representation and shape matching.
References

1. E. M. Arkin, L. P. Chew, D. P. Huttenlocher, K. Kedem, and J. S. B. Mitchell. An efficiently computable metric for comparing polygonal shapes. IEEE Trans. PAMI, 13:209–216, 1991.
2. S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Analysis and Machine Intelligence, 24:509–522, 2002.
3. H. Blum. Biological shape and visual science. Journal of Theor. Biol., 38:205–287, 1973.
4. L. da F. Costa and R. M. Cesar. Shape Analysis and Classification: Theory and Practice. CRC Press, Boca Raton, 2001.
5. I. J. Cox. Blanche – an experiment in guidance and navigation of an autonomous robot vehicle. IEEE Transactions on Robotics and Automation, 7(2):193–204, 1991.
6. D. F. DeMenthon, L. J. Latecki, A. Rosenfeld, and M. Vuilleumier Stückelberg. Relevance ranking and smart fast-forward of video data by polygon simplification. pages 49–61, 2000.
7. G. Dissanayake, H. Durrant-Whyte, and T. Bailey. A computationally efficient solution to the simultaneous localization and map building (SLAM) problem. ICRA'2000 Workshop on Mobile Robot Navigation and Mapping, 2000.
8. J.-S. Gutmann and C. Schlegel. AMOS: comparison of scan matching approaches for self-localization in indoor environments. 1st Euromicro Workshop on Advanced Mobile Robots (Eurobot), 1996.
9. J.-S. Gutmann and K. Konolige. Incremental mapping of large cyclic environments. Int. Symposium on Computational Intelligence in Robotics and Automation (CIRA'99), Monterey, 1999.
10. J.-S. Gutmann. Robuste Navigation mobiler Systeme. PhD thesis, University of Freiburg, Germany, 2000.
11. D. Hähnel, D. Schulz, and W. Burgard. Map building with mobile robots in populated environments. Int. Conf. on Intelligent Robots and Systems (IROS), 2002.
12. D. Huttenlocher, G. Klanderman, and W. Rucklidge. Comparing images using the Hausdorff distance. IEEE Trans. PAMI, 15:850–863, 1993.
13. A. Khotanzad and Y. H. Hong. Invariant image recognition by Zernike moments. IEEE Trans. PAMI, 12:489–497, 1990.
14. B. Kuipers. The spatial semantic hierarchy. Artificial Intelligence, 119:191–233, 2000.
15. L. J. Latecki and D. de Wildt. Automatic recognition of unpredictable events in videos. In Proc. of Int. Conf. on Pattern Recognition (ICPR), volume 2, Quebec City, August 2002.
16. L. J. Latecki and R. Lakämper. Convexity rule for shape decomposition based on discrete contour evolution. Computer Vision and Image Understanding, 73:441–454, 1999.
17. L. J. Latecki and R. Lakämper. Shape similarity measure based on correspondence of visual parts. IEEE Trans. Pattern Analysis and Machine Intelligence, 22(10):1185–1190, 2000.
18. L. J. Latecki and R. Lakämper. Application of planar shape comparison to object retrieval in image databases. Pattern Recognition, 35(1):15–29, 2002.
19. L. J. Latecki and R. Lakämper. Polygon evolution by vertex deletion. In M. Nielsen, P. Johansen, O. F. Olsen, and J. Weickert, editors, Scale-Space Theories in Computer Vision (Proc. of Int. Conf. Scale-Space'99), volume 1682 of LNCS, Corfu, Greece, September 1999.
20. L. J. Latecki, R. Lakämper, and U. Eckhardt. Shape descriptors for non-rigid shapes with a single closed contour. In Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pages 424–429, South Carolina, June 2000.
21. F. Lu and E. Milios. Robot pose estimation in unknown environments by matching 2D range scans. Journal of Intelligent and Robotic Systems, 18(3):249–275, 1997.
22. F. Mokhtarian, S. Abbasi, and J. Kittler. Efficient and robust retrieval by shape content through curvature scale space. In A. W. M. Smeulders and R. Jain, editors, Image Databases and Multi-Media Search, pages 51–58. World Scientific Publishing, Singapore, 1997.
23. F. Mokhtarian and A. K. Mackworth. A theory of multiscale, curvature-based shape representation for planar curves. IEEE Trans. PAMI, 14:789–805, 1992.
24. T. Röfer. Using histogram correlation to create consistent laser scan maps. IEEE Int. Conf. on Robotics Systems (IROS), EPFL, Lausanne, Switzerland, 625–630, 2002.
25. K. Siddiqi, A. Shokoufandeh, S. J. Dickinson, and S. W. Zucker. Shock graphs and shape matching. Int. J. of Computer Vision, 35:13–32, 1999.
26. S. Thrun. Learning metric-topological maps for indoor mobile robot navigation. Artificial Intelligence, 99:21–71, 1998.
27. S. Thrun. Probabilistic algorithms in robotics. AI Magazine, 21(4):93–109, 2000.
28. S. Thrun. Robot mapping: a survey. In G. Lakemeyer and B. Nebel, editors, Exploring Artificial Intelligence in the New Millennium. Morgan Kaufmann, 2002.
29. S. Thrun, W. Burgard, and D. Fox. A real-time algorithm for mobile robot mapping with applications to multi-robot and 3D mapping. IEEE Int. Conf. on Robotics and Automation (ICRA), 2000.
On the Morphological Processing of Objects with Varying Local Contrast

Pierre Soille

EC Joint Research Centre, Institute for Environment and Sustainability, Land Management Unit, TP 262, I-21020 Ispra, Italy
[email protected]
Abstract. Most morphological operators appear in pairs, such as erosion/dilation, opening/closing, and thinning/thickening. These are pairs of dual operators with respect to set complementation. The output of a (dual) morphological operator applied to an object depends on whether it is a bright object over a dark background or a dark object over a bright background. When dealing with complex images such as earth observation data, there is no clear distinction between background and foreground, because the image consists of a partition of the space into image objects of arbitrary intensity values. In this paper, we present an overview of existing approaches for tackling this problem and propose new techniques based on area filters applied first to the image extrema and then to all flat regions.

Keywords: Mathematical morphology, self-duality, self-complementarity, partition, region growing, flat regions, compression, satellite images.
1 Introduction
Most morphological operators appear in pairs, such as erosion/dilation (ε, δ), opening/closing (γ, φ), and thinning/thickening (THIN, THICK). These are pairs of dual operators with respect to set complementation. Rather than pairs of dual operators, we could have referred to the more general concept of pairs of adjunct operators. However, because we restrict our attention to morphological operators applied to grey scale images, the notion of duality suits our needs. In mathematical terms, two image transformations Ψ and Φ are dual with respect to complementation if applying Ψ to an image is equivalent to applying Φ to the complement of the image and taking the complement of the result (denoting the complementation operator by C):

Ψ and Φ are dual with respect to complementation ⇔ Ψ = C Φ C.   (1)

For example, dilating an image is equivalent to eroding the complement of this image and then complementing the output eroded image: δ = C ε C. Although the
This work was supported by the EC-JRC ESDI Project.
I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 52–61, 2003. © Springer-Verlag Berlin Heidelberg 2003
duality principle is fundamental to many morphological operators, its implications are often overlooked. Indeed, the output of a (dual) morphological operator applied to an object depends on whether it is a bright object over a dark background or a dark object over a bright background. This is not a major issue for applications where the local contrast of a given object type does not vary over the image. This is the case, for instance, with images of cells whose nuclei usually appear darker than their cytoplasm, man-made objects inspected by machine vision systems, etc. However, when dealing with more complex images such as earth observation data, there is no clear distinction between background and foreground, because the image consists of a partition of the space into arbitrary image objects. As a consequence, an object such as a field may appear darker or brighter than the surrounding fields depending on the reflectance of its neighbouring fields [9]. In addition, satellite images are often multichannel images. It follows that the independent processing of each channel by a dual operator may lead to inconsistent results, because an object may appear darker than its neighbourhood in one channel and brighter in another. A further relevant application is the filtering of speckle noise: owing to its symmetric nature, constructive and destructive interferences should be processed identically. A solution to mitigate the non-symmetric behaviour of dual morphological operators is to apply, in sequence, a filter and then its dual. For instance, one may apply an opening followed by a closing (or vice versa), starting with the smallest possible size and proceeding until the maximum filter size is reached. This idea is at the basis of alternating sequential filters.
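As an illustration of the alternating-sequential-filter idea (not the area-based variant introduced later in this paper), the following sketch uses SciPy's grey-scale morphology with square structuring elements of increasing size; whether one starts with the opening or the closing is exactly the choice discussed next.

```python
import numpy as np
from scipy import ndimage as ndi

def asf(f, n, start_with_opening=True):
    """Alternating sequential filter: opening/closing pairs with square
    structuring elements of increasing size (2i+1 for i = 1..n)."""
    first = ndi.grey_opening if start_with_opening else ndi.grey_closing
    second = ndi.grey_closing if start_with_opening else ndi.grey_opening
    for i in range(1, n + 1):
        size = 2 * i + 1
        f = second(first(f, size=size), size=size)
    return f
```

Running `asf(f, n, True)` and `asf(f, n, False)` on the same image generally gives two different results, which is precisely the asymmetry that motivates self-dual operators.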
However, although alternating sequential filters process bright and dark structures much more equally than the mere use of a single opening or closing, their output usually depends on whether one starts the sequence with an opening or a closing. In situations where a strictly symmetric processing of objects either brighter or darker than their neighbourhood is required, self-dual operators should be considered. In mathematical terms, an image-to-image transformation Ψ is self-dual with respect to the complementation operator C if its dual transformation with respect to complementation is itself:

Ψ is self-dual with respect to complementation ⇔ Ψ = C Ψ C.   (2)
The complement of an image f, denoted by f^c, is defined for each pixel x as the maximum value of the data type used for storing the image, t_max, minus the value of the image f at position x, i.e., f^c(x) = t_max − f(x). For example, normalised shift-invariant operators are also self-dual operators. The median filter is an example of a non-linear self-dual filter. Motivated by applications dealing with objects whose local contrast may vary across the image, we propose to analyse the behaviour of morphological operators under such variations. The paper is organised as follows. A brief review of morphological self-dual operators, including new developments related to the link between switch operators and thinning/thickening operations, is given in Sec. 2. We show in Sec. 3 that self-duality is not sufficient for some applications and discuss possible techniques addressing this problem.
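The self-duality of the median filter is easy to verify numerically: filtering the complement gives the complement of the filtered image. A small illustrative check (with t_max = 255 for 8-bit-style data):

```python
import numpy as np
from scipy import ndimage as ndi

# Numerical check of self-duality for the median filter: applying the filter
# to the complement f^c = t_max - f equals complementing the filtered image.
rng = np.random.default_rng(0)
f = rng.integers(0, 256, size=(32, 32)).astype(np.int32)
tmax = 255
lhs = ndi.median_filter(tmax - f, size=3)   # Psi(f^c)
rhs = tmax - ndi.median_filter(f, size=3)   # (Psi(f))^c
assert np.array_equal(lhs, rhs)             # Psi is self-dual
```

The identity holds exactly here because a 3x3 window contains an odd number of samples, so the median is always one of the input values.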
2 Morphological Self-dual Operators
Self-dual operators defined in terms of switch operators are recalled in Sec. 2.1. We then show in Sec. 2.2 that there exist equivalent representations in terms of thinnings and thickenings. We finally describe in Sec. 2.3 self-dual alternating sequential filters based on area openings and closings.

2.1 Switch-Based Self-dual Operators [4]
A thorough paper about the notion of self-duality in mathematical morphology is due to Heijmans [4]. This paper was motivated by the search for a representation of self-dual morphological increasing and idempotent operators. Because increasing self-dual activity-extensive¹ operators converge when iterated, the problem comes down to finding a representation of the latter operators. This is achieved step by step, starting from an arbitrary self-dual operator and progressively constraining it with the desired properties of a morphological filter (increasingness, translation invariance, and idempotency). These developments are summarised in this section for increasing and translation invariant self-dual operators (i.e., without taking into account the idempotency property). Consider an arbitrary self-dual operator Ξ applied to an arbitrary set X. We denote by σ(X) the points of X which are suppressed by Ξ, i.e., σ(X) = X \ Ξ(X) = X ∩ (Ξ(X))^c = X ∩ Ξ(X^c). It follows that Ξ can be written in terms of σ as follows:

Ξ(X) = (X \ σ(X)) ∪ σ(X^c).   (3)
An anti-extensive operator σ leading to a self-dual increasing operator is called a switch operator and satisfies the following two conditions [4, p. 22]:

X ⊆ Y ⇒ X ∩ σ(Y) ⊆ σ(X),   (4)
σ(X ∪ {h}) ∩ σ(X^c ∪ {h}) = ∅,  h ∈ Z², X ⊆ Z².   (5)
Recall that any increasing operator extends directly to grey tone images using the threshold superposition principle, which states that the output of an increasing operator applied to a grey tone image equals the sum of its binary cross-sections processed by this operator. The adjective 'switch' indicates that the operator σ yields all points which switch value from 1 to 0 (points in σ(X)) or from 0 to 1 (points in σ(X^c)) by application of the corresponding self-dual increasing operator Ξ: Ξ(X) = (X \ σ(X)) ∪ σ(X^c). By considering the additional translation-invariance property, the basis representation of a translation invariant increasing operator [3] leads to a representation of switch operators as well as of their corresponding translation invariant increasing self-dual operators. More precisely, every translation invariant switch σ_A can be written as follows [4, p. 25]:
¹ An operator Ψ₁ is less active than an operator Ψ₂ if, for all pixels of any given input image, the output value of Ψ₁ at any given pixel lies in the interval defined by the input value and the output of Ψ₂ at this pixel. An operator Ψ is called activity-extensive if Ψ iterated n times is less active than Ψ iterated n + 1 times.
σ_A(X) = X ∩ ⋃_{A∈A} ε_A(X^c),   (6)
where A is a collection of structuring elements such that the origin o belongs to none of them (i.e., o ∉ A for all A ∈ A) and the intersection of any two of them is non-empty (i.e., A ∩ B ≠ ∅ for all A, B ∈ A). The representation of the corresponding self-dual operator Ξ_A is obtained by substituting σ_A for σ in Eq. 3 and simplifying the right term:

Ξ_A(X) = (X ∩ ⋂_{A∈A} δ_A(X)) ∪ ⋃_{A∈A} ε_A(X).   (7)
2.2 Link between Switch and Thinning/Thickening Representations
We now show that Eq. 7 can be expressed in terms of thinnings and thickenings, as exemplified in [8, Fig. 5.12, p. 157] for the special case of a self-dual filter removing isolated pixels of grey scale images. Indeed, the left part of Eq. 7 corresponds to an intersection of thinnings performed in parallel with a series of composite structuring elements B_i = (B_i1, B_i2) such that B_i1 is restricted to the origin o and the sets B_i2 form a one-to-one correspondence with the sets A of A:

⋂_{B_i} THIN_{B_i}(X) = ⋂_{B_i} (X \ HMT_{B_i}(X))
  = ⋂_{B_i} (X ∩ (HMT_{B_i}(X))^c)
  = ⋂_{B_i} (X ∩ (ε_{B_i1}(X) ∩ ε_{B_i2}(X^c))^c)
  = ⋂_{B_i} (X ∩ (δ_{B_i1}(X^c) ∪ δ_{B_i2}(X)))
  = ⋂_{B_i2} (X ∩ δ_{B_i2}(X))
  = X ∩ ⋂_{B_i2} δ_{B_i2}(X)
  = X ∩ ⋂_{A∈A} δ_A(X),

where the fifth equality uses B_i1 = {o}, so that δ_{B_i1}(X^c) = X^c,
and HMT denotes the hit-or-miss transformation. This result can also be obtained starting from the observation [4, p. 26] that σ_A corresponds to a union of hit-or-miss transformations: σ_A = ⋃_{A∈A} HMT_({o},A). Indeed, it follows that:
X \ σ_A(X) = X ∩ (⋃_{A∈A} HMT_({o},A)(X))^c
  = ⋂_{A∈A} (X ∩ (HMT_({o},A)(X))^c)
  = ⋂_{A∈A} THIN_({o},A)(X).
Note that in this paper the constrained version of the hit-or-miss and thinning operators [7] must always be considered when processing grey level images. Remember that the right term of Eq. 7 (as well as the corresponding term in Eq. 3) corresponds to pixels switching from 1 to 0 when σ is applied to X^c; that is, some pixels of the background of X are unioned with the intersection of thinnings. Now, because Ξ is self-dual, an equivalent formulation of Eq. 3 is:
Ξ(X) = (X ∪ σ(X^c)) ∩ (σ(X))^c,   (8)

and, accordingly,

Ξ_A(X) = (X ∪ ⋃_{A∈A} ε_A(X)) ∩ ⋂_{A∈A} δ_A(X).   (9)
Observing that HMT_B(X) equals HMT_{B*}(X^c), where B = (B_1, B_2) and B* = (B_2, B_1), the left term of Eq. 8 can be decomposed as follows:

X ∪ σ_A(X^c) = X ∪ ⋃_{A∈A} HMT_({o},A)(X^c)
  = X ∪ ⋃_{A∈A} HMT_(A,{o})(X)
  = ⋃_{A∈A} (X ∪ HMT_(A,{o})(X))
  = ⋃_{A∈A} THICK_(A,{o})(X).
We now show that when this union of thickenings alters (increases) the intensity value of a given pixel, the subsequent intersection with ⋂_{A∈A} δ_A appearing in Eq. 9 will never further alter this value (equivalent developments apply to the thinning representation corresponding to the left term of Eq. 7).

Proof. The union of thickenings at a given position x of an image f modifies the value of the input image at this position if and only if (⋁_{A∈A} ε_A(f))(x) > f(x). This implies that there exists an A′ ∈ A such that f(x + a) > f(x) for all a ∈ A′ and (⋁_{A∈A} ε_A(f))(x) = ⋀_{a∈A′} f(x + a). Because A ∩ B ≠ ∅ for all A, B ∈ A, for every B ∈ A there exists an a′ ∈ A′ such that a′ ∈ B. This implies that (⋀_{A∈A} δ_A(f))(x) ≥ (⋁_{A∈A} ε_A(f))(x). Indeed, for all B ∈ A and a′ ∈ B ∩ A′, the following inequality holds: ⋁_{b∈B} f(x + b) ≥ f(x + a′) ≥ (⋁_{A∈A} ε_A(f))(x).
On the Morphological Processing of Objects with Varying Local Contrast
57
Consequently Ξ_A reduces to the anti-centre based on the intersection of thinnings and the union of thickenings.

2.3 Area Based Self-dual Filters and Self-dual Reconstruction
Specific compositions of 8-connected area openings and closings lead to self-dual morphological filters. More precisely, the composition of an 8-connected area opening γ⁸_λ with the dual area closing φ⁸_λ and area parameter λ equal to 2 is a self-dual morphological filter. This filter can also be expressed in terms of a switch operator with the 8-neighbour ring as structuring element (SE) and the origin at its centre. Its formulation in terms of thinnings and thickenings is illustrated in [8, Fig. 5.12, p. 157]. More generally, it can be shown that the open-close filters, based on 8-connected area filters, are self-dual up to an area of 8 pixels and are identical to the corresponding close-open filters:

$$
\phi^{8}_{\lambda}\gamma^{8}_{\lambda} = \gamma^{8}_{\lambda}\phi^{8}_{\lambda}, \quad \forall \lambda \in \{2, 3, \ldots, 8\}. \tag{10}
$$
For larger sizes, alternating sequential filters (ASFs) based on 8-connected area closings and openings lead to self-dual filters. From a computational point of view, it is worth mentioning that, for an 8-connected area ASF of size n larger than 20, only the sizes 8, 16, and 20 need to be considered, then every even size smaller than n, and finally the size n itself. For example, it can be shown that the following equalities hold:

$$
\gamma^{8}_{25}\phi^{8}_{25} \cdots \gamma^{8}_{3}\phi^{8}_{3}\,\gamma^{8}_{2}\phi^{8}_{2}
= \phi^{8}_{25}\gamma^{8}_{25} \cdots \phi^{8}_{3}\gamma^{8}_{3}\,\phi^{8}_{2}\gamma^{8}_{2}
= \gamma^{8}_{25}\phi^{8}_{25}\,\gamma^{8}_{24}\phi^{8}_{24}\,\gamma^{8}_{22}\phi^{8}_{22}\,\gamma^{8}_{20}\phi^{8}_{20}\,\gamma^{8}_{16}\phi^{8}_{16}\,\gamma^{8}_{8}\phi^{8}_{8}
= \phi^{8}_{25}\gamma^{8}_{25}\,\phi^{8}_{24}\gamma^{8}_{24}\,\phi^{8}_{22}\gamma^{8}_{22}\,\phi^{8}_{20}\gamma^{8}_{20}\,\phi^{8}_{16}\gamma^{8}_{16}\,\phi^{8}_{8}\gamma^{8}_{8}.
$$
The latter filter is illustrated in Fig. 1 on a satellite image. In this experiment, the three channels of the input multichannel image have been processed independently.
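The paper's area filters operate on grey-scale images and have no short self-contained implementation, but the binary analogue conveys the idea: an area opening removes bright 8-connected components smaller than λ pixels, and the dual closing removes the small dark ones. A minimal sketch (function names and the pure-Python list-of-lists representation are ours):

```python
from collections import deque

def components(img, value, conn8=True):
    """Yield the 8- (or 4-) connected components of pixels equal to `value`."""
    h, w = len(img), len(img[0])
    if conn8:
        nbrs = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    else:
        nbrs = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    seen = set()
    for y in range(h):
        for x in range(w):
            if img[y][x] == value and (y, x) not in seen:
                comp, queue = [], deque([(y, x)])
                seen.add((y, x))
                while queue:
                    cy, cx = queue.popleft()
                    comp.append((cy, cx))
                    for dy, dx in nbrs:
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == value \
                                and (ny, nx) not in seen:
                            seen.add((ny, nx))
                            queue.append((ny, nx))
                yield comp

def self_dual_area_filter(img, lam):
    """Binary sketch of the area open-close: flip bright (1) then dark (0)
    8-connected components whose area is smaller than `lam` pixels."""
    out = [row[:] for row in img]
    for value in (1, 0):  # area opening on the 1s, then area closing on the 0s
        for comp in list(components(out, value)):
            if len(comp) < lam:
                for y, x in comp:
                    out[y][x] = 1 - value
    return out
```

For λ ≤ 8 the order of the opening and the closing does not matter, as per Eq. 10; for larger sizes the grey-scale filters discussed above are required.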
3 Beyond Self-duality
In some applications, self-duality only partially addresses the problem of treating equally objects with varying local contrast. Indeed, self-dual filters such as the alternating sequential filters described in Sec. 2.2 still assume that the targeted image objects are either brighter or darker than the surrounding objects (i.e., we assume that the targeted image structures are marked by image extrema). While this is a valid assumption for processing objects such as speckle patterns, it does not apply to image structures showing more than two phases, such as satellite images with various crop fields or microscopy images of rock samples with various minerals. In these situations, objects correspond to regions of homogeneous grey scale rather than simply to maxima and minima. As a consequence, a self-dual filter such as the alternating sequential filters described in Sec. 2.2 is not robust in the sense that it will only alter those regions that
(a) 521×381 Landsat image of Naples: false RGB colour composite using bands 4, 5, and 7.
(b) 8-connected self-dual area ASF up to an area of 25 pixels.
Fig. 1. Self-dual image simplification removing all dark and bright connected components whose area is smaller than a given threshold. This is achieved by performing the alternating sequential filter based on 8-connected area openings and closings (as per Eq. 11).
are completely surrounded by either darker or brighter regions. That is, intermediate regions occurring as plateaus may remain unaffected. We present in this section two approaches tackling this problem. The first (Sec. 3.1) consists in applying self-complementary operators, while the second (Sec. 3.2) is based on the processing of the image flat zones.

3.1 Self-complementary Based Operators
Self-complementary operators may be considered for processing image objects independently of their local contrast. Indeed, a self-complementary operator Φ outputs the same result if applied to an image or its complement [8, p. 55]:

$$
\Phi \text{ is self-complementary with respect to } \complement \;\Leftrightarrow\; \Phi\complement = \Phi. \tag{11}
$$
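A quick numerical check of this property for one such operator, the morphological gradient. The example below is our own sketch: it uses a flat 3×3 structuring element and takes the complement of an 8-bit image to be ∁f = 255 − f:

```python
def window_op(img, op):
    """Apply `op` (max or min) over each pixel's 3x3 neighbourhood (flat SE)."""
    h, w = len(img), len(img[0])
    return [[op(img[cy][cx]
                for cy in range(max(0, y - 1), min(h, y + 2))
                for cx in range(max(0, x - 1), min(w, x + 2)))
             for x in range(w)] for y in range(h)]

def morphological_gradient(img):
    """Arithmetic difference between the dilation and the erosion of the image."""
    dil, ero = window_op(img, max), window_op(img, min)
    return [[d - e for d, e in zip(dr, er)] for dr, er in zip(dil, ero)]

f = [[10, 50, 50], [10, 200, 50], [10, 10, 10]]
complement = [[255 - v for v in row] for row in f]
# self-complementarity: the gradient of the complement equals the gradient of f
assert morphological_gradient(complement) == morphological_gradient(f)
```

The equality holds identically: dilating ∁f amounts to 255 minus the erosion of f (and vice versa), so the difference is unchanged.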
For example, both the morphological gradient (arithmetic difference between the dilation and erosion of an image) and the norm of a gradient computed using derivative convolution kernels are self-complementary operators. Consequently, morphological segmentation techniques such as those based on the watershed transformation of self-complementary gradients guarantee that the resulting segmentation is independent of the local contrast of the searched objects in the input image (assuming the marker sets are extracted accordingly). Moreover, contrary to self-dual filters, plateaus are treated identically to image extrema by the gradient operator. Indeed, a plateau region such as a field adjacent to both darker and brighter fields may remain unaffected by a self-dual filter, although this latter filter may remove a similar field appearing as an image extremum, i.e., a field surrounded by either darker or brighter fields. However, a drawback of gradient-based processing is that the searched regions of the image must be thick enough to display a core with low gradient
values. This problem is caused by the limited resolution of the gradient. It has motivated Crespo et al. [2] to propose an alternative approach based on the merging of flat zones. Similarly, Pesaresi and Benediktsson [5] proposed the notion of morphological profiles to avoid this resolution problem when segmenting satellite images.

3.2 Sequential Area Filtering of the Image Flat Zones
Owing to their very nature, area opening and closing act only on the image extrema. It follows that transition regions or intermediate plateaus may be preserved by these filters even if their extent is smaller than the selected area parameter. This issue is illustrated in Fig. 2 by displaying the partition of the satellite image filtered by an area opening and closing corresponding to Fig. 1b.
Fig. 2. Flat zones of the image shown in Fig. 1b (alternating self-dual area opening and closing up to an area of 25 pixels). Although this filter ensures that all extrema of the filtered image have an area larger than or equal to the size of the filter, flat zones belonging to non-extrema regions can be of arbitrary size in the filtered image. The flat zones corresponding to the first channel of the processed multichannel satellite image are displayed.
We propose to simplify the image by removing all flat zones whose area is below a given threshold value, as follows:
– First, extract all flat zones whose area is greater than or equal to a given threshold value. This is achieved by labelling the flat zones of the initial image according to the fast breadth-first, stack-based algorithm described in [8, p. 38]. Those labelled regions whose area equals or exceeds the threshold value are then selected.
– Then, define an ordered procedure to grow the selected flat zones while preserving their initial grey level values, so as to obtain a new partition of the image definition domain into flat zones. This is achieved by adapting the seeded region growing algorithm described in [1].
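The two steps above can be sketched compactly. This is our own simplified adaptation of seeded region growing (all names are ours): flat zones are 8-connected, and an unassigned pixel joins the neighbouring seed zone whose grey level is closest, the zone keeping its level; ties are broken by insertion order:

```python
import heapq
from collections import deque

N8 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]

def label_flat_zones(img):
    """Breadth-first labelling of the 8-connected iso-intensity zones."""
    h, w = len(img), len(img[0])
    label = [[0] * w for _ in range(h)]
    zones, cur = {}, 0
    for y in range(h):
        for x in range(w):
            if label[y][x] == 0:
                cur += 1
                zones[cur], queue = [(y, x)], deque([(y, x)])
                label[y][x] = cur
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in N8:
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and label[ny][nx] == 0 \
                                and img[ny][nx] == img[y][x]:
                            label[ny][nx] = cur
                            zones[cur].append((ny, nx))
                            queue.append((ny, nx))
    return label, zones

def flat_zone_area_filter(img, lam):
    """Keep flat zones of area >= lam as seeds, then grow them (SRG-like)."""
    h, w = len(img), len(img[0])
    label, zones = label_flat_zones(img)
    out = [[None] * w for _ in range(h)]
    heap, tick = [], 0
    for zid, pixels in zones.items():
        if len(pixels) >= lam:                       # step 1: seed selection
            for y, x in pixels:
                out[y][x] = img[y][x]
            for y, x in pixels:
                for dy, dx in N8:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and out[ny][nx] is None:
                        heapq.heappush(heap, (abs(img[ny][nx] - img[y][x]),
                                              tick, ny, nx, img[y][x]))
                        tick += 1
    while heap:                                      # step 2: ordered growing
        _, _, y, x, level = heapq.heappop(heap)
        if out[y][x] is not None:
            continue
        out[y][x] = level                            # the zone keeps its grey level
        for dy, dx in N8:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and out[ny][nx] is None:
                heapq.heappush(heap, (abs(img[ny][nx] - level), tick, ny, nx, level))
                tick += 1
    return out
```

As in the text, this single-threshold pass would be iterated for increasing area thresholds up to the desired value.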
Similarly to alternating sequential filters, a better preservation of the relevant image structures is obtained by iterating the proposed area-based filtering for increasing values of the area threshold until the desired value is reached. For example, Fig. 3a shows the output of the proposed filtering, iterated up to an area threshold of 25 pixels using 8-connectivity. Contrary to the
(a) 8-connected area sequential filter applied to Fig. 1a for an area of up to 25 pixels.
(b) Corresponding image partition (for the first channel).
Fig. 3. Sequential area filtering of the flat zones of the multichannel satellite image displayed in Fig. 1a. Compare the image displayed in (a) with the output of the alternating sequential area opening/closing filter displayed in Fig. 1b as well as the corresponding flat zones partitions of the first channel.
partition produced by the alternating sequential area opening/closing filter (see Fig. 2), each flat zone of the alternating area filtering of the flat zones up to an area of λ pixels has at least λ pixels (Fig. 3b). Note that Salembier et al. [6] also propose a filter suppressing all flat regions whose area is less than a given threshold. It is based on the processing of the region adjacency graph of the flat zones, using an area merging criterion and setting the grey level of the merged region to the median value of the largest region (or to the arithmetic mean of the two merged regions if they both have the same size), while considering an ad hoc merging order. Contrary to our approach, this type of process defines a connected operator. That is, when a flat zone is below the threshold level, it cannot be shared by two different flat zones.
4 Conclusion and Perspectives
Beyond background techniques for generating self-dual morphological filters and new links between switch operators and thinning/thickening pairs, we have focused our attention on new filters based on area criteria. The first category is based on area openings and closings. However, it assumes that relevant objects are either brighter or darker than their neighbourhood. This model does not apply to complex images such as satellite images with numerous land cover types.
We have proposed to address this problem by suppressing all flat zones of the image using an area criterion and then growing the remaining flat zones using a modified seeded region growing technique. We are in the process of evaluating this technique for extracting thematic information from pan-European Landsat imagery.
Acknowledgements I wish to thank Henk Heijmans for stimulating discussions about switch operators.
References
[1] R. Adams and L. Bischof. Seeded region growing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(6):641–647, 1994.
[2] J. Crespo, R. Schafer, J. Serra, C. Gratin, and F. Meyer. The flat zone approach: a general low-level region merging segmentation method. Signal Processing, 62(1):37–60, 1997.
[3] H. Heijmans. Morphological Image Operators. Advances in Electronics and Electron Physics Series. Academic Press, Boston, 1994.
[4] H. Heijmans. Self-dual morphological operators and filters. Journal of Mathematical Imaging and Vision, 6:15–36, 1996. URL ftp://ftp.cwi.nl/pub/morphology/report/Heijmans_selfdual.ps.Z.
[5] M. Pesaresi and J. Benediktsson. A new approach for the morphological segmentation of high resolution satellite imagery. IEEE Transactions on Geoscience and Remote Sensing, 39(2):309–320, February 2001.
[6] P. Salembier, L. Garrido, and D. Garcia. Auto-dual connected operators based on iterative merging algorithms. In H. Heijmans and J. Roerdink, editors, Mathematical Morphology and its Applications to Image and Signal Processing, volume 12 of Computational Imaging and Vision, pages 183–190, Dordrecht, 1998. Kluwer Academic Publishers.
[7] P. Soille. Advances in the analysis of topographic features on discrete images. Lecture Notes in Computer Science, 2301:175–186, March 2002. URL http://link.springer.de/link/service/series/0558/bibs/2301/23010175.htm.
[8] P. Soille. Morphological Image Analysis: Principles and Applications. Springer-Verlag, Berlin Heidelberg New York, 2nd edition, 2003. See also http://ams.jrc.it/soille/book2nd.
[9] P. Soille and M. Pesaresi. Advances in mathematical morphology applied to geoscience and remote sensing. IEEE Transactions on Geoscience and Remote Sensing, 40(9):2042–2055, September 2002.
Watershed Algorithms and Contrast Preservation

Laurent Najman and Michel Couprie

Laboratoire A2SI, Groupe ESIEE, Cité Descartes, BP 99, 93162 Noisy-le-Grand Cedex, France
{l.najman,m.couprie}@esiee.fr
http://www.esiee.fr/~coupriem/Sdi/
Abstract. This paper is devoted to the study of the behavior of watershed algorithms. Through the introduction of a concept of pass value, we show that most classical watershed algorithms do not allow the retrieval of some important topological features of the image (in particular, saddle points are not correctly computed). An important consequence of this result is that it is not possible to compute sound measures such as the depth, area, or volume of basins using most classical watershed algorithms. Only one watershed principle, called the topological watershed, produces correct watershed contours.
Keywords: Mathematical Morphology, Watersheds, Contours Saliency, Topology
1 Introduction
This paper is the first of a series dedicated to the notion of watershed contour saliency. Using this concept, introduced in [1,2], we can sum up in one image all the contour information that we can obtain by filtering the image by an attribute opening [3,4,5,6] for all values of the parameter and applying a watershed on each of the filtered images. Several algorithms [2,7,8,9] for computing the saliency of watershed contours have been proposed. We expect to obtain the same result either by thresholding the saliency image at a given level k, or by filtering the original image using an attribute opening with k as parameter value and applying a watershed algorithm on the filtered image. None of the existing saliency algorithms computes this expected result. The goal of the series is to show why this is the case, and to propose a novel efficient algorithm that computes the expected result. This paper is devoted to the study of the behavior of watershed algorithms with respect to what is needed to compute the saliency of contours. For computing saliency, one needs a map of watershed basin neighborhoods, with the altitude of their associated saddle points, and a valuation on each basin. A review of watershed algorithms and their associated results can be found in [10]. This review does not study algorithms from the point of view of the preservation of important

I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 62–71, 2003. © Springer-Verlag Berlin Heidelberg 2003
topological features of the original image; for instance, it does not consider the question: does the algorithm compute correct saddle points? We are going to tackle the difficult notion of saddle point through the introduction of a concept of "pass value". We demonstrate that the watershed algorithms most used in practice do not behave correctly with respect to the preservation of pass values, and thus cannot be used in a saliency algorithm. We show that the approach called the topological watershed [11] (which is not mentioned in [10]) provides the only existing algorithm that produces a correct entry point for a saliency algorithm.
2 Brief Description of Watershed Algorithms

2.1 Intuitive Notions for Watershed
The intuitive idea underlying the watershed notion comes from the field of topography: a drop of water falling on a relief follows a descending path and eventually reaches a minimum. Watershed lines are the divide lines of the domains of attraction of drops of water. This intuitive approach is not well suited to practical implementations, and can yield biased results in some cases [12]. An alternative approach is to imagine the surface being immersed in a lake, with holes pierced in local minima. Water will fill up basins starting at these local minima, and, at points where waters coming from different basins would meet, dams are built. As a result, the surface is partitioned into regions or basins separated by dams, called watershed lines.

2.2 What Is a Watershed Algorithm
This paper is not the place to describe in detail the (large) family of watershed algorithms. Nevertheless, it is worthwhile to give a brief description of the main algorithms. Let E be a set of vertices (or points). Let P(E) denote the set of all subsets of E. Let G = (E, Γ) be a (symmetric) graph, where Γ is a mapping from E into P(E) which associates to each point x of E the set Γ(x) of points adjacent to x. Let X ⊆ E, and let x₀, xₙ ∈ X. A path from x₀ to xₙ in X is an ordered family (x₀, x₁, . . . , xₙ) of points of X such that xᵢ₊₁ ∈ Γ(xᵢ), with i = 0 . . . n−1. Let x, y ∈ X; we say that x is connected to y if there exists a path from x to y in X. The relation "is connected to" is an equivalence relation. A connected component of X is an equivalence class for the relation "is connected to". Let E = Z². We denote by F(E) the set composed of all functions from E to Z. Let F ∈ F(E). We denote by X̄ the complement of X. We write Fₖ = {x ∈ E; F(x) ≥ k} with k ∈ Z; Fₖ is called an upper (cross-)section of F, and F̄ₖ is called a lower (cross-)section of F. A connected component of a section F̄ₖ is called a (level k) lower-component of F. A level k lower-component of F that does not contain a level (k − 1) lower-component of F is called a (regional) minimum of F.
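The last definition is equivalent to saying that a regional minimum is a flat zone none of whose pixels has a strictly lower neighbor, which yields a direct extraction routine. A sketch under 4-connectivity (the function names are ours):

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def regional_minima(img):
    """Return the regional minima of `img` as lists of (y, x) coordinates:
    4-connected iso-level zones none of whose pixels has a lower neighbor."""
    h, w = len(img), len(img[0])
    seen, minima = set(), []
    for y in range(h):
        for x in range(w):
            if (y, x) in seen:
                continue
            # flood the flat zone containing (y, x)
            zone, queue, is_min = [(y, x)], deque([(y, x)]), True
            seen.add((y, x))
            while queue:
                cy, cx = queue.popleft()
                for dy, dx in N4:
                    ny, nx = cy + dy, cx + dx
                    if not (0 <= ny < h and 0 <= nx < w):
                        continue
                    if img[ny][nx] < img[y][x]:
                        is_min = False          # a lower neighbor: not a minimum
                    elif img[ny][nx] == img[y][x] and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        zone.append((ny, nx))
                        queue.append((ny, nx))
            if is_min:
                minima.append(zone)
    return minima
```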
Let us recall that a partition of a set S is a collection of non-empty disjoint subsets of S whose union is S. A watershed algorithm builds a partition of the space:
– it associates an influence zone B(M), called a catchment basin, to each minimum M of the image; the set B(M) is connected and contains M;
– it may produce a set of watershed lines which separate those catchment basins from one another.

2.3 Vincent-Soille Watershed Algorithm [12]
For any set A and any set B ⊂ A made of several connected components Bᵢ, the geodesic influence zone iz_A(Bᵢ) of Bᵢ in A is the locus of the points of A whose geodesic distance to Bᵢ is strictly smaller than their geodesic distance to any other component of B. We define the following recursion:

$$
X_{h_{\min}+1} = \overline{F}_{h_{\min}+1} = \mathrm{MIN}_{h_{\min}} \tag{1}
$$
$$
X_{h+1} = \mathrm{MIN}_{h} \cup \mathrm{IZ}_{\overline{F}_{h+1}}(X_{h}) \tag{2}
$$
where h_min is the lowest grey value of F, where IZ_{F̄ₕ₊₁}(Xₕ) is the union of the geodesic influence zones of the connected components of Xₕ in F̄ₕ₊₁, and where MINₕ is the union of the minima of F with grey level equal to h. The watershed lines are the complement of X_{h_max+1}. As noted in [10], Vincent-Soille's algorithm does not implement exactly this recursion. Thanks to a FIFO queue, it floods the catchment basins of the image and, to build the watershed lines, it associates a special value WSHED to the pixels where two different catchment basins would merge. A point labelled WSHED by the algorithm is not considered again in the following iteration, as it should be. Furthermore, pixels labelled WSHED are propagated. This allows the detection of special thick watershed zones, like those called buttonholes (see Fig. 2.a).
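The geodesic influence zones driving recursion (1)-(2) amount to a multi-source breadth-first search restricted to a mask. A simplified sketch with names of our own (ties at equal geodesic distance are labelled 0, mimicking WSHED, and are not propagated further, which is only one of the possible conventions):

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def influence_zones(mask, seeds):
    """mask: set of pixels forming A; seeds: dict (y, x) -> component label.
    Returns a dict giving, for each reached pixel of A, the label of the
    strictly nearest seed component, or 0 when two labels tie."""
    label = dict(seeds)
    dist = {p: 0 for p in seeds}
    queue = deque(seeds)
    while queue:
        cy, cx = queue.popleft()
        if label[(cy, cx)] == 0:
            continue                      # tie pixels do not propagate
        for dy, dx in N4:
            n = (cy + dy, cx + dx)
            if n not in mask:
                continue
            if n not in dist:             # first arrival: claim the pixel
                dist[n] = dist[(cy, cx)] + 1
                label[n] = label[(cy, cx)]
                queue.append(n)
            elif dist[n] == dist[(cy, cx)] + 1 and label[n] not in (0, label[(cy, cx)]):
                label[n] = 0              # reached at the same distance by two labels
    return label
```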
Meyer’s Watershed Algorithm [13]
Starting from a greyscale image F and a set M of markers with different labels (in our case, these will be the minima of F), it expands as much as possible the set M, while preserving the number of connected components of M:
1. insert every neighbor x of every marked area in a hierarchical queue, with a priority level corresponding to the grey level F(x). Note that a point cannot be inserted twice in the queue;
2. extract a point x from the hierarchical queue, at the highest priority level, that is, the lowest grey level. If the neighborhood Γ(x) of x contains only points with the same label, then x is marked with this label, and its neighbors that are not yet marked are put into the hierarchical queue;
Step 2 must be repeated until the hierarchical queue is empty. The watershed lines set is the complement of the set of labeled points. Let us note that this algorithm neither labels nor propagates watershed pixels, which "stop" the flooding. Thus, the watershed lines produced by Meyer's algorithm are always thinner than the lines produced by other watershed algorithms.
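This hierarchical-queue flooding can be sketched as follows — a minimal version of our own, using `heapq` as the hierarchical queue; marker labels are positive integers and unlabeled pixels remain `None` (the watershed lines):

```python
import heapq

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def meyer_watershed(img, markers):
    """img: 2D list; markers: dict label -> iterable of (y, x) seed pixels.
    Grows the markers in increasing grey-level order; a pixel whose labeled
    neighbors disagree is left unlabeled."""
    h, w = len(img), len(img[0])
    label = [[None] * w for _ in range(h)]
    heap, queued, tick = [], set(), 0

    def push(y, x):
        nonlocal tick
        if (y, x) not in queued:          # a point cannot be inserted twice
            queued.add((y, x))
            heapq.heappush(heap, (img[y][x], tick, y, x))
            tick += 1

    for lab, pixels in markers.items():
        for y, x in pixels:
            label[y][x] = lab
    for lab, pixels in markers.items():   # step 1: enqueue marker neighbors
        for y, x in pixels:
            for dy, dx in N4:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and label[ny][nx] is None:
                    push(ny, nx)
    while heap:                           # step 2, repeated until empty
        _, _, y, x = heapq.heappop(heap)
        nbr_labels = {label[y + dy][x + dx] for dy, dx in N4
                      if 0 <= y + dy < h and 0 <= x + dx < w
                      and label[y + dy][x + dx] is not None}
        if len(nbr_labels) == 1:          # all labeled neighbors agree
            label[y][x] = nbr_labels.pop()
            for dy, dx in N4:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and label[ny][nx] is None:
                    push(ny, nx)
        # otherwise the pixel is discarded: it lies on a watershed line
    return label
```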
2.5 Cost-Based Watershed
The principle is to define a distance or a cost for travelling between pixels, and to define the influence zone of a minimum as the set of points which are strictly closer to this minimum than to any other minimum. Various costs or distances can be considered, the most popular one being the topographical distance [14,15], but other approaches exist [16], among which we can mention the max-arc path cost. The (so-called) topographical distance of an image F is a digital analogue of $d_F^t(x, y) = \inf_{\pi \in \Pi(x,y)} \int_{\pi} \lVert \nabla F(\pi(s)) \rVert \, ds$. Let us note that if we are on a line of steepest slope between x and y, then $d_F^t(x, y) = |F(x) - F(y)|$. The catchment basin of a minimum mᵢ is defined as the set of pixels x for which $F(m_i) + d_F^t(m_i, x) < F(m_j) + d_F^t(m_j, x)$ for all minima mⱼ ≠ mᵢ. The watershed lines set is the complement of those catchment basins. Another simple possible choice is the max-arc path cost [16], which assigns to a path the maximum of F over the pixels of the path. In this case, $d_F^m(x, y) = \inf_{\pi \in \Pi(x,y)} \max_i F(\pi(i))$, and the catchment basin of a minimum mᵢ is defined as the set of pixels x for which $d_F^m(m_i, x) < d_F^m(m_j, x)$ for all minima mⱼ ≠ mᵢ. The watershed lines set is the complement of those catchment basins.
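The max-arc variant can be sketched with a Dijkstra-like minimax propagation (helper names are ours; `None` marks pixels where two minima tie, i.e. pixels that are not strictly closer to any single minimum):

```python
import heapq

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def minimax_costs(img, seed):
    """d_F^m(seed, .): minimal, over paths, of the maximal grey level met."""
    h, w = len(img), len(img[0])
    cost = {seed: img[seed[0]][seed[1]]}
    heap = [(cost[seed], seed)]
    while heap:
        c, (y, x) = heapq.heappop(heap)
        if c > cost[(y, x)]:
            continue                      # stale heap entry
        for dy, dx in N4:
            n = (y + dy, x + dx)
            if 0 <= n[0] < h and 0 <= n[1] < w:
                nc = max(c, img[n[0]][n[1]])
                if nc < cost.get(n, float("inf")):
                    cost[n] = nc
                    heapq.heappush(heap, (nc, n))
    return cost

def maxarc_basins(img, minima):
    """minima: dict label -> (y, x). Assign each pixel to the minimum with the
    strictly smallest max-arc cost; ties are left as None (watershed)."""
    costs = {lab: minimax_costs(img, p) for lab, p in minima.items()}
    h, w = len(img), len(img[0])
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ranked = sorted((costs[lab][(y, x)], lab) for lab in costs)
            if len(ranked) == 1 or ranked[0][0] < ranked[1][0]:
                out[y][x] = ranked[0][1]
    return out
```

Running one minimax search per minimum is quadratic and only meant for small illustrative images.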
2.6 Topological Watershed [11]
The idea is to define a transform that acts directly on the greyscale image, by lowering some points in such a manner that the connectivity of each lower cross-section F̄ₖ is preserved. The regional minima of the result, which have been spread by this transform, can be interpreted as the catchment basins. The formal definition relies on a particular notion of simple point:

Definition 1. Let G = (E, Γ) be a graph, and let X ⊂ E. The point x ∈ X̄ is simple (for X) if the number of connected components of X ∪ {x} equals the number of connected components of X. In other words, x is simple (for X) if x is adjacent to exactly one connected component of X.

We can now define a notion of destructible point, and the topological watershed:

Definition 2. Let F ∈ F(E), x ∈ E, and k = F(x). The point x is destructible (for F) if x is simple for F̄ₖ. We say that W ∈ F(E) is a topological watershed of F if W may be derived from F by iteratively lowering destructible points by one until stability (that is, until all points of E are non-destructible for W).
66
Laurent Najman and Michel Couprie
The catchment basins of the topological watershed W are the minima of W, and the watershed lines are the non-minima of W. As a consequence of this definition, a topological watershed W of a function F is a function which has the same number of regional minima as F. Furthermore, the connectivity of any lower cross-section is preserved during this transformation. Let us note that, in this case, and contrary to other watershed principles, the watershed lines are part of the definition: there exists no variation of this notion that does not build those lines. An efficient algorithm to compute the topological watershed has been proposed in [11]. Let us emphasize the essential difference between this notion of topological watershed and the notion of homotopic greyscale skeleton, pioneered by Goetcherian [17] and extensively studied in [18,19]. With the topological watershed, only the connected components of the lower cross-sections of the function are preserved, while the homotopic greyscale skeleton preserves both these components and the components of the upper cross-sections. As a consequence, a homotopic greyscale skeleton may be computed by using a purely local criterion for testing whether a point may be lowered or not, while computing a topological watershed requires the use of a global data structure [11].
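A naive sketch faithful to Definitions 1-2 (but far less efficient than the algorithm of [11]): repeatedly scan the image and lower by one every point adjacent to exactly one connected component of its lower cross-section. All names are ours; 4-connectivity is assumed throughout:

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def components_of(pixels):
    """Partition a set of pixels into its 4-connected components."""
    comp, seen, cur = {}, set(), 0
    for p in pixels:
        if p in seen:
            continue
        cur += 1
        queue = deque([p])
        seen.add(p)
        while queue:
            cy, cx = queue.popleft()
            comp[(cy, cx)] = cur
            for dy, dx in N4:
                n = (cy + dy, cx + dx)
                if n in pixels and n not in seen:
                    seen.add(n)
                    queue.append(n)
    return comp

def topological_watershed(img):
    """Iteratively lower destructible points by one until stability."""
    f = [row[:] for row in img]
    h, w = len(f), len(f[0])
    changed = True
    while changed:
        changed = False
        for y in range(h):
            for x in range(w):
                k = f[y][x]
                lower = {(cy, cx) for cy in range(h) for cx in range(w)
                         if f[cy][cx] < k}          # lower cross-section below x
                comp = components_of(lower)
                adjacent = {comp[(y + dy, x + dx)] for dy, dx in N4
                            if (y + dy, x + dx) in lower}
                if len(adjacent) == 1:              # x is simple: destructible
                    f[y][x] -= 1
                    changed = True
    return f
```

The scanning order fixes one particular topological watershed among the valid ones; the number of regional minima of the input is preserved in any case.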
3 Watershed Algorithms Comparison
Intuitively, for application to image analysis, the watershed lines represent the location of the pixels which best separate the dark objects (regional minima) in terms of grey level difference (contrast). In order to evaluate the effectiveness of this separation, we have to consider the values of the pixels along the watershed lines. This motivates the following definition.

Definition 3. The watershed contours of F is a grayscale image W such that W(x) = 0 for any x in a catchment basin, and W(x) = F(x) elsewhere.

Let us note that such a definition is not necessary for the topological watershed, which produces a function and not a binary result.

3.1 Saddle Point, Pass Value and the Dynamics
To formalize the notion of contrast between two minima, we need to characterize first-contact points between basins. In the continuous framework, such points are called saddle points, but this notion is difficult to transfer to the digital grid. Furthermore, such a notion is not fundamental for contrast criteria. More precisely, for each pair of neighboring basins, we only need the altitude of the lowest contact point between them. This is the motivation for defining the pass value, a natural concept already used by several authors.

Definition 4. Let M(F) be the set of all minima of F. We define the pass value F(m₁, m₂) between two minima m₁ and m₂ in M(F) as

$$
F(m_1, m_2) = \min_{\pi \in \Pi(m_1, m_2)} \max_{i} F(\pi(i)) \tag{3}
$$
where Π(m₁, m₂) is the set of all paths linking m₁ to m₂. For applications to image analysis like filtering, and especially for saliency, we want to compute:
– all pass values; we would like the watershed contours to have the same pass values as the original image;
– a measure of contrast or importance of each basin (minimum) of the original image; such a measure should correspond to a measure taken on the lower cross-sections of the original image.
Various contrast measures can be computed, among which we can mention the depth (dynamics [20]), area and volume [3]. We are going to examine more particularly the case of the dynamics. We first recall the basic definitions introduced by Grimaud [20] (in fact these definitions were proposed for 2D images; we extend them to arbitrary graphs). Let F ∈ F(E) and let X be a minimum of F. The attraction domain of X is the set composed of all points x such that there exists a descending path from x to X. The attraction domain of a minimum X is denoted by K(X). Let π be a path. The dynamics of π (for F) is the value Dyn(π) = Max{|F(x) − F(y)|; for all x, y in π}. Let x, y be two points. The dynamics between x and y (for F) is the value Dyn(x, y) = Min{Dyn(π); for all π ∈ Π(x, y)}. Let X and Y be two subsets of E. The dynamics between X and Y (for F) is the value Dyn(X, Y) = Min{Dyn(x, y); for all x ∈ X, y ∈ Y}.

Definition 5. Let X ∈ M(F). The dynamics of X (for F) is the number Dyn(X) such that:
– If F(X) = Min{F(Y); Y ∈ M(F)}, then Dyn(X) = ∞;
– Otherwise, Dyn(X) = Min{Dyn[X, K(Y)]; ∀Y ∈ M(F), F(Y) < F(X)}.
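The pass value of Eq. 3, on which these contrast measures rest, can be evaluated without enumerating paths, using a Dijkstra-like minimax propagation. A sketch with names of our own, under 4-connectivity:

```python
import heapq

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def pass_value(img, m1, m2):
    """F(m1, m2) of Eq. 3: the minimal, over all paths from m1 to m2, of the
    maximal grey level met along the path (endpoints included)."""
    h, w = len(img), len(img[0])
    best = {m1: img[m1[0]][m1[1]]}
    heap = [(best[m1], m1)]
    while heap:
        c, (y, x) = heapq.heappop(heap)
        if (y, x) == m2:
            return c                      # first settled cost is optimal
        if c > best[(y, x)]:
            continue                      # stale heap entry
        for dy, dx in N4:
            n = (y + dy, x + dx)
            if 0 <= n[0] < h and 0 <= n[1] < w:
                nc = max(c, img[n[0]][n[1]])
                if nc < best.get(n, float("inf")):
                    best[n] = nc
                    heapq.heappush(heap, (nc, n))
    return None
```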
3.2 The Case of the Topological Watershed
We can prove [21] that the topological watershed preserves the pass values.

Property 1. Let W be a topological watershed of F. For all (m₁, m₂) ∈ M(F)², and for the corresponding minima (m′₁, m′₂) ∈ M(W)², we have F(m₁, m₂) = W(m′₁, m′₂).

In the sequel of the paper, we are going to show that this property holds neither for Vincent-Soille's algorithm nor for Meyer's algorithm. An important consequence of this property is that measures (such as depth (dynamics [20]), area or volume [3]) computed on the basins obtained by either Vincent-Soille's algorithm or Meyer's algorithm do not correspond to measures of connected components of the lower cross-sections of the image. On the contrary, the topological watershed does allow such computations. In particular, in the case of the dynamics, a consequence of Property 1 is the following result.
Fig. 1. Counter-example to pass-value preservation. A greyscale image (a) and some results of watershed algorithms: (b) Vincent-Soille, (c) Meyer, and (d) cost-based and topological watershed. One can see that the pass value between E and any other basin is 6 in (c) and is 7 in (b). Both the cost-based and the topological watershed (d) preserve the correct pass value of 255.
Property 2. Let F ∈ F(E) and let W be a topological watershed of F. Then the dynamics of a minimum for F is equal to the dynamics of the corresponding minimum for W.

We can also prove that, for suitable cost functions, cost-based watersheds of an image F preserve the pass values of F. But, as we will see, the cost-based watershed produces very thick contours that prevent it from being used for a saliency algorithm.

3.3 Comparison and Counter Examples for Other Watershed Algorithms
We are going to examine the behavior of watershed algorithms on several examples. In the sequel, the watershed examples are computed in 4-connectivity. In particular, regional minima are 4-connected subsets of Z². On all the pictures, the basins are labeled with letters, and the watershed pixels are given with their corresponding value in the original image. Similar configurations can be found for other connectivities. Let us emphasize that configurations similar to the examples presented in this paper were found in real images. Neither Vincent-Soille's nor Meyer's algorithm preserves the pass values. A counter-example that illustrates this behavior is given in Figure 1. Figure 1.a presents a high contour at altitude 255. This contour is run over by the flooding principle of both Meyer and Vincent-Soille. This is especially visible with Meyer's algorithm: in Figure 1.c, the pass value between E and any other minimum is 6 instead of 255. Vincent-Soille's algorithm, while having the same kind of problem, tries to detect special pixel configurations called buttonholes, and thus produces thick lines. But in this case, the Vincent-Soille watershed is not thick enough, and the pass value between E and any other basin is 7 for the watershed contours, while it is 255 for the original image. The only correct result is produced by both the topological watershed and the cost-based watershed, and is presented in Figure 1.d. Vincent-Soille's watershed algorithm aims at detecting watershed areas such as buttonholes. These areas are such that one cannot decide towards which
Fig. 2. Another counter-example to pass-value preservation. (a): Original "buttonhole" image, (b): Meyer's watershed contours, (c): Vincent-Soille's watershed contours, (d): cost-based watershed contours, (e): topological watershed contours. One can note that the contour at altitude 20 is kept neither by Vincent-Soille's algorithm nor by Meyer's algorithm. One can also note that both the cost-based and the topological watersheds preserve the pass values of the buttonhole (a), but the topological watershed (e) is thinner than the cost-based watershed (d).
minimum a drop falling on them will slide. Figure 2a exhibits a particular case of buttonhole. Clearly, the pixels at altitude 20 are essential since they carry the pass value between the minimum A (level 2) and the minima B and C (levels 1 and 0). We can observe in Figures 2b and 2c that both Meyer’s algorithm and Vincent-Soille’s remove the contour at altitude 20; in fact, Meyer’s algorithm does not “see” this buttonhole at all. In both cases, the pass value between A and B or C is at an altitude of 10 instead of 20 for the watershed contours. In order to preserve pass values on the buttonhole, we have two possibilities:

– either keeping in the watershed lines all the pixels of the buttonhole: this is what is done by the cost-based watershed (Fig. 2d), which produces contours that cover the whole buttonhole;
– or making a careful (but arbitrary) choice among all the contours possible in the buttonhole, the choice being such that it preserves the pass values. This is what is done by the topological watershed (Fig. 2e).

On real images, both the cost-based and Vincent-Soille’s watersheds are very sensitive to buttonholes, and the resulting watershed lines can cover a large part of the image [example not shown due to space constraints]. Meyer’s algorithm and the topological watershed compute thinner lines. Furthermore, cost-based watersheds produce very thick lines even in the absence of buttonholes, as noted in [10], and tend to isolate basins. Figure 3 illustrates this problem. Indeed, those algorithms have been designed to compute basins, not lines. Thus, they cannot be used as an input for a saliency algorithm. Let us note that all watershed algorithms can produce thick watershed lines in some configurations (for instance, think of four lines crossing at one point).

Laurent Najman and Michel Couprie

    (a)          (b)          (c)
0 0 0 0 1    A A A A 1    A A A A 1
4 3 2 1 0    4 3 2 1 C    A A A 1 C
5 4 3 2 0    5 4 3 2 C    A A 3 C C
6 5 4 3 0    6 5 4 3 C    6 5 C C C
0 6 5 4 0    B 6 5 4 C    B 6 C C C

Fig. 3. A greyscale image (a) and some results of watershed algorithms: (b) cost-based watershed and (c) result according to Vincent-Soille, Meyer or Topological watershed. Basin B is isolated in (b).
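The pass value that these comparisons revolve around is simply the minimax altitude over all paths joining two minima. As a concrete illustration on the image of Fig. 3(a), here is a small sketch (the Dijkstra-style minimax search and the choice of 4-adjacency are our own illustration, not one of the algorithms compared in the paper):

```python
import heapq

def pass_value(img, src, dst):
    """Minimax path altitude between two pixels of a greyscale image:
    the smallest, over all 4-connected paths from src to dst, of the
    maximal altitude met along the path (endpoints included)."""
    rows, cols = len(img), len(img[0])
    best = {src: img[src[0]][src[1]]}
    heap = [(best[src], src)]
    while heap:
        alt, (i, j) = heapq.heappop(heap)
        if (i, j) == dst:
            return alt
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= ni < rows and 0 <= nj < cols:
                nalt = max(alt, img[ni][nj])
                if nalt < best.get((ni, nj), float("inf")):
                    best[(ni, nj)] = nalt
                    heapq.heappush(heap, (nalt, (ni, nj)))
    return None

# Fig. 3(a): minimum A at top-left, B at bottom-left, C in the right column.
img = [[0, 0, 0, 0, 1],
       [4, 3, 2, 1, 0],
       [5, 4, 3, 2, 0],
       [6, 5, 4, 3, 0],
       [0, 6, 5, 4, 0]]
print(pass_value(img, (0, 0), (4, 0)))  # → 6
```

The pass value between A and B here is 6 (every path into basin B must climb to altitude 6), so an algorithm that preserves pass values must keep their separating contour at this altitude.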
4 Consequences and Conclusion
In this paper, we have shown that:

– Meyer’s and Vincent-Soille’s algorithms do not preserve important topological features of the image; in particular, pass values are not correct. Only the cost-based and topological watersheds are correct from this point of view;
– furthermore, the cost-based watershed and Vincent-Soille’s algorithm can produce very thick watershed lines.

Thus, only one watershed notion, the topological watershed, is suited to our task: the associated algorithm is the only one that produces both a correct basin neighborhood map and correct pass values. For computing saliency, we need a measure of contrast of the watershed basins, such as depth (dynamics [20]), area or volume [3]. An important consequence of the results of this paper is that measures computed on the basins obtained by either Vincent-Soille’s algorithm or Meyer’s algorithm do not correspond to measures of connected components of lower cross-sections of the image. On the contrary, we have seen that the topological watershed allows such computations. Thus, it is not possible to use the propagation mechanism of the line-building versions of Meyer’s or Vincent-Soille’s algorithms to compute such a measure “on the fly”. Such a mechanism was implemented in the Najman-Schmitt saliency algorithm [1,2], and has also been proposed in [22], leading to incorrect results. One could think that past saliency algorithms can be corrected by replacing their watershed operator with the topological watershed. Unfortunately, this is not enough. In future papers of the series, we are going to review past saliency algorithms, show which of the hypotheses they rely on are wrong, and propose a novel, efficient saliency algorithm.
Watershed Algorithms and Contrast Preservation
References

1. Najman, L.: Morphologie Mathématique: de la Segmentation d’Images à l’Analyse Multivoque. PhD thesis, Université Paris-Dauphine (1994)
2. Najman, L., Schmitt, M.: Geodesic saliency of watershed contours and hierarchical segmentation. IEEE Trans. on PAMI 18 (1996) 1163–1173
3. Vachier, C.: Extraction de caractéristiques, segmentation d’images et Morphologie Mathématique. PhD thesis, École Supérieure Nationale des Mines de Paris (1995)
4. Breen, E., Jones, R.: Attribute openings, thinnings and granulometries. Computer Vision and Image Understanding 64 (1996) 377–389
5. Salembier, P., Oliveras, A., Garrido, L.: Anti-extensive connected operators for image and sequence processing. IEEE Trans. on Image Proc. 7 (1998) 555–570
6. Meijster, A., Wilkinson, M.: A comparison of algorithms for connected set openings and closings. IEEE Trans. on PAMI 24 (2002) 484–494
7. Meyer, F.: The dynamics of minima and contours. In P. Maragos, R.S., Butt, M., eds.: ISMM 3rd. Computational Imaging and Vision, Kluwer Academic Publishers (1996) 329–336
8. Lemaréchal, C., Fjørtoft, R., Marthon, P., Cubero-Castan, E.: Comments on ‘Geodesic saliency of watershed contours and hierarchical segmentation’. IEEE Trans. on PAMI 20 (1998) 762–763
9. Schmitt, M.: Response to the comment “Geodesic saliency of watershed contours and hierarchical segmentation”. IEEE Trans. on PAMI 20 (1998) 764–767
10. Roerdink, J., Meijster, A.: The watershed transform: definitions, algorithms and parallelization strategies. Fundamenta Informaticae 41 (2000) 187–228
11. Couprie, M., Bertrand, G.: Topological grayscale watershed transform. In: SPIE Vision Geometry V Proceedings. Volume 3168. (1997) 136–146
12. Vincent, L., Soille, P.: Watersheds in digital spaces: an efficient algorithm based on immersion simulations. IEEE Trans. on PAMI 13 (1991) 583–598
13. Meyer, F.: Un algorithme optimal de ligne de partage des eaux. In: Actes du 8ème Congrès AFCET, Lyon-Villeurbanne, France (1991) 847–859
14. Meyer, F.: Topographic distance and watershed lines. Signal Processing 38 (1994) 113–126. Special issue on Mathematical Morphology
15. Najman, L., Schmitt, M.: Watershed of a continuous function. Signal Processing 38 (1994) 99–112. Special issue on Mathematical Morphology
16. Lotufo, R.A., Falcao, A.X., Zampirolli, F.A.: IFT-watershed from gray-scale marker. In: SIBGRAPI’02, Fortaleza-CE, Brazil (2002) 146–152
17. Goetcherian, V.: From binary to grey tone image processing using fuzzy logic concepts. Pattern Recognition 12 (1980) 7–15
18. Bertrand, G., Everat, J., Couprie, M.: Image segmentation through operators based upon topology. Journal of Electronic Imaging 6 (1997) 395–405
19. Couprie, M., Bezerra, F.N., Bertrand, G.: Topological operators for grayscale image processing. Journal of Electronic Imaging 10 (2001) 1003–1015
20. Grimaud, M.: A new measure of contrast: dynamics. In: SPIE Vol. 1769, Image Algebra and Morphological Processing III, San Diego (1992) 292–305
21. Najman, L., Couprie, M.: Topological watershed and contrast preservation. Discrete Applied Mathematics (2003). In preparation, special issue on DGCI 2003
22. Andrade, M.: A topological image segmentation method by attributes and applications. PhD thesis, Universidade Federal de Minas Gerais (Brazil) (1998)
Digital Flatness

Valentin E. Brimkov¹ and Reneta P. Barneva²

¹ Inst. of Math. and Comp. Science, Bulg. Acad. of Sci., Sofia 1113, Bulgaria
² SUNY Fredonia, Fredonia, NY 14063, USA
{brimkov,barneva}@cs.fredonia.edu
Abstract. In this paper we define and study the notion of digital flatness. We extend to dimension two various definitions and classical results about digital lines and rays. In particular, we resolve a conjecture of Maurice Nivat restricted to the case of digital planes, and define and characterize 2D Sturmian rays. Keywords: Digital planarity, 2D Sturmian word, periodic array, digitization of planes, slope of digital planes
1 Introduction
Straight line/ray discretizations have been extensively studied over the last decades. Different aspects of “digital straightness” have been found relevant to scientific disciplines as diverse as discrete geometry and topology, number theory, computer graphics, self-similarity studies in pattern recognition, and periodicity studies in the theory of words, as well as to some branches of physics and biology. Fundamental results characterizing digital lines and rays (in particular, their periodicity structure) have been obtained by Rosenfeld [16], Bruckstein [5], Brons [4], and others. Other deep theoretical results (e.g., related to properties of Sturmian words) have been obtained by Morse and Hedlund [13], Lunnon and Pleasants [11], and Coven and Hedlund [6]. For a nice survey of the subject the reader is referred to [15]. At the same time, very little has been done towards extending the existing theory to higher dimensions. For instance, most of the above-cited theoretical studies on discrete lines and rays do not have a plane counterpart, although the discrete plane is a very basic primitive, widely used in computer imagery. Moreover, “digital flatness” is relevant to all the scientific disciplines listed above, and sometimes the possible applications are even more significant than in 1D. Thus developing a relevant theory for the case of digital planes is seen as an important task. In the present work we propose a 2D extension of various concepts and results about digital rays. Some of these developments turn out to be quite perplexing due to certain intrinsic structural differences caused by the higher dimension. The paper is organized as follows. In the next section, we recall some well-known basic facts from digital topology and the combinatorics of 2D arrays, which we will need in order to describe our results. In Section 3, we propose a definition of a digital 2D ray and study its basic properties. In particular, we
present two basic theorems about periodicity of rational and irrational digital rays. In Section 4, we present other related results. In particular, we define and characterize 2D Sturmian rays and resolve a conjecture by Maurice Nivat for the case of digital 2D rays. Further tasks are commented in the final Section 6. Since the proofs of the reported theorems considerably exceed the imposed space limit for the paper, most of them are omitted here and will be included in the full journal version. Detailed proofs of all theorems (from 1 through 8) together with a lot of illustrations, examples and additional references are available online in [2], as well as in [3].
2 Preliminaries

2.1 Basic Notions of Digital Topology and Digital Geometry
The discrete coordinate plane consists of unit squares (pixels), centered on the integer points of the two-dimensional Cartesian coordinate system in the plane. The discrete coordinate space consists of unit cubes (voxels), centered on the integer points of the three-dimensional Cartesian coordinate system in space. The pixels’/voxels’ coordinates are the coordinates of their centers. Sometimes they are called discrete points. The edges of a pixel/voxel are parallel to the coordinate axes. A set of discrete points is usually referred to as a discrete object. We presume that any possible reader of this paper is familiar with the basic notions of digital topology and geometry, such as pixel/voxel adjacency and connectivity, separability and minimality of a discrete object in another object, etc. For a detailed account of these and other basic concepts we refer to [10].

2.2 Periodicity, Repetitions and Tilings of Infinite Arrays
Consider the real plane R²₊, the integer lattice Z²₊, and the corresponding set of pixels with centers at the integer points and sides parallel to the coordinate axes. The pixels’ sides form a grid. An array A on Z²₊ over an alphabet Σ is a mapping from Z²₊ to Σ, i.e.,

        ...   ...   ...  ...
A =    a2,0  a2,1  a2,2  ...
       a1,0  a1,1  a1,2  ...
       a0,0  a0,1  a0,2  ... ,    where aij ∈ Σ.
An array on Z² is defined analogously. A subset s ⊆ Z²₊ is called a shape. Given an array A on Z²₊, by A[s] we denote the restriction of A to s. A[s] is connected if s is connected. We will call A[s] a factor of A on shape s. A rectangular factor of size m × n will be called an m × n-factor. Below we follow in part [7], where periodicity in infinite 2D arrays is considered.

Definition 1. Let A be an array on Z²₊ (or on Z²). Let S = A[s] be a factor of A on shape s (possibly, s = Z²₊ and S = A). A vector v is a symmetry vector
for S if A(i, j) = A(v + (i, j)) for any point (i, j) ∈ s such that v + (i, j) is still in s. (If s = Z², then clearly v + (i, j) ∈ s for any point (i, j).) v is a periodicity vector (or a period) for S if for any integer k, the vector kv is a symmetry vector for S.

Definition 2. An array A on Zⁿ₊ is lattice periodic if there are two linearly independent vectors u and v such that w = iu + jv is a period for A for any pair of integers i, j for which w ∈ Zⁿ₊. A is line periodic if all periods of A are parallel vectors.
Note that the above definitions are similar but not equivalent to the well-known definitions from [1,8].

Definition 3. Let A be a lattice-periodic array on Z²₊. The set of its symmetry vectors is a subset of (is extendable to) a sublattice Λ of Z². Then any basis of Λ will be considered as a basis of A.

Definition 4. We say that an array A on Z²₊ is tiled by a tile W if it can be represented in the form

       ... ...
A =    W   W   ...
       W   W   ...

for a certain rectangular block W.
We have the following proposition.

Proposition 1. Any lattice-periodic array on Z²₊ can be tiled by some tile W.
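Proposition 1 can be checked mechanically on finite windows: for a lattice-periodic array, a brute-force search finds a rectangular block that tiles the window by translation. A small sketch (the window, taken from the 2-by-3-periodic example shown later in Section 4, and the search itself are our own illustration):

```python
def rect_tile(window):
    """Smallest block size (p, q), scanned in row-major order, such that
    the p x q top-left block tiles the finite window by pure translation.
    The full window always qualifies, so a result smaller than the window
    certifies a proper tile W in the sense of Definition 4."""
    m, n = len(window), len(window[0])
    for p in range(1, m + 1):
        for q in range(1, n + 1):
            if all(window[i][j] == window[i % p][j % q]
                   for i in range(m) for j in range(n)):
                return p, q

# 6 x 6 window with vertical period 3 and horizontal period 2:
window = [[0, 1, 0, 1, 0, 1],
          [1, 0, 1, 0, 1, 0],
          [0, 0, 0, 0, 0, 0]] * 2
print(rect_tile(window))  # → (3, 2)
```

The 3 × 2 block returned is exactly a tile W whose translates cover the array.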
3 Digital Planes

3.1 Basic Definition
Consider the Euclidean plane

    P(α1, α2, α3, β) = {(x1, x2, x3) ∈ R³ : α1x1 + α2x2 + α3x3 = β}.    (1)

W.l.o.g., assume that P makes with the coordinate plane Ox1x2 an angle θ,

    0 ≤ θ ≤ arctan √2.    (2)

(See Figure 1.) Then the coefficient α3 of x3 in (1) will be nonzero. Dividing both sides of (1) by α3, we obtain the following equivalent formulation:

    P(a1, a2, b) = {(x1, x2, x3) ∈ R³ : x3 = a1x1 + a2x2 + b},    (3)

where a1 = −α1/α3, a2 = −α2/α3, b = β/α3. We will consider digitizations of the plane P or its portions in the set of grid points Z³ = {(i, j, k) : i, j, k ∈ Z}. In terms of representation (3), we will digitize the third coordinate x3 over the integer grid points Z² = {(i, j) : i, j ∈ Z} in the coordinate plane Ox1x2. Let us write (3) in a more general form:

    P^D(a1, a2, b) = {(x1, x2, x3) ∈ R³ : x3 = a1x1 + a2x2 + b, (x1, x2) ∈ D ⊆ R²}.
Fig. 1. A plane forming an angle arctan √2 with the plane Ox1x2 (the figure shows the plane through (0,0,0), (1,0,1), and (0,1,1)).
We call P^D(a1, a2, b) a restriction of P(a1, a2, b) to the domain D. We have that P^D(a1, a2, b) is connected as long as D is connected. Also, P^D(a1, a2, b) is bounded (resp. unbounded) if and only if D is bounded (resp. unbounded). Note, however, that a 2D domain D admits many different shapes, whether D is bounded or not. (The possible unbounded shapes are even infinitely many.) As far as our study is concerned with periodicity properties of digitized planes, it is reasonable to restrict ourselves to a few cases. When D is a finite domain, we will assume that it is the rectangle D = {(x1, x2) : m1 ≤ x1 ≤ n1, m2 ≤ x2 ≤ n2, m1, n1, m2, n2 ∈ Z}. Then the corresponding portion P^D(a1, a2, b) of P(a1, a2, b) will be a space rectangle. For an infinite domain D, one can consider the following three basic cases: (a) D is a quadrant; (b) D is a half-plane; (c) D is the whole plane. Note that the first case corresponds to a ray, while the third one corresponds to a line in the plane. Therefore, if D is a quadrant, we will call P^D(a1, a2, b) a 2D ray. The second case of a half-plane does not have a 1D counterpart. We will deal mostly with digitizations of 2D rays. The other cases can be handled in a similar (although not fully identical) way. Below we explain how one can digitize P^D(a1, a2, b) when D is a 2D ray, i.e., the first quadrant of the plane. Formally, we have P^D(a1, a2, b) = {(x1, x2, x3) ∈ R³ : x3 = a1x1 + a2x2 + b, (x1, x2) ∈ D}, where D = {(x1, x2) : 0 ≤ x1, x2 < ∞}. We discretize P^D(a1, a2, b) in Z³₊ = {(i, j, k) : i, j, k ∈ Z₊}, where Z₊ is the set of nonnegative integers. We discretize the third coordinate x3 over the nonnegative integer grid points Z²₊ = {(i, j) : i, j ∈ Z₊} in the first quadrant QuadI. Consider an array on Z²₊

        ...   ...   ...  ...
ρ =    ρ2,0  ρ2,1  ρ2,2  ...
       ρ1,0  ρ1,1  ρ1,2  ...
       ρ0,0  ρ0,1  ρ0,2  ... ,

whose elements are the intersection points of P^D(a1, a2, b) with the vertical grid lines. Let (i, j, Ii,j) ∈ Z³ be the grid point nearest to ρi,j. (If there are two nearest points, i.e., at a vertical distance 1/2 from ρi,j, we choose the upper one.) Formally, we have that the discretization of P^D(a1, a2, b) over Z³₊ is Ia1,a2,b = {(i, j, Ii,j) :
i, j ≥ 0, Ii,j = ⌊a1 i + a2 j + b + 1/2⌋}. It has a slope vector (a1, a2) and intercept b. The discretization of a 2D ray R will alternatively be denoted discr(R). The following theorem is an analog of a result about digital rays [16].

Theorem 1. A discretization of a plane P is 2-minimal in Z³₊.

Corollary 1. A discretization of a 2D ray is 2-minimal in Z³₊.

The plane P intersects the coordinate planes Ox1x3 and Ox2x3 in straight lines with equations x3 = a1x1 + b, x2 = 0 and x3 = a2x2 + b, x1 = 0, respectively. Considered in the planes Ox1x3 and Ox2x3 respectively, the first line has slope a1, while the second has slope a2. The slope vector of the plane has the slopes of these two lines as coordinates. Now we define a digital 2D ray ra1,a2,b with a slope vector (a1, a2) and intercept b, as follows:
               ...              ...              ...             ...
ra1,a2,b =     ra1,a2,b(2, 0)   ra1,a2,b(2, 1)   ra1,a2,b(2, 2)  ...
               ra1,a2,b(1, 0)   ra1,a2,b(1, 1)   ra1,a2,b(1, 2)  ...
               ra1,a2,b(0, 0)   ra1,a2,b(0, 1)   ra1,a2,b(0, 2)  ... ,
where the ra1,a2,b(i, j) are called cell codes and are defined for i, j ≥ 0 as follows. Let us set ra1,a2,b(0, 0) = I0,0.

Defining the 0-th digitized row:

    ra1,a2,b(0, j + 1) = I0,j+1 − I0,j = { 0, if I0,j+1 = I0,j
                                          1, if I0,j+1 = I0,j + 1

Defining the 0-th digitized column:

    ra1,a2,b(i + 1, 0) = Ii+1,0 − Ii,0 = { 0, if Ii+1,0 = Ii,0
                                          1, if Ii+1,0 = Ii,0 + 1

Defining the i-th digitized row:

    ra1,a2,b(i, j + 1) = Ii,j+1 − Ii,j = { 0, if Ii,j+1 = Ii,j
                                          1, if Ii,j+1 = Ii,j + 1

Alternatively, we can digitize the array columnwise. Defining the i-th digitized column:

    ra1,a2,b(i + 1, j) = Ii+1,j − Ii,j = { 0, if Ii+1,j = Ii,j
                                          1, if Ii+1,j = Ii,j + 1

Note that the 0-th row and the 0-th column are the same in both the rowwise and the columnwise digitizations. Code 0 can be interpreted as a horizontal rowwise/columnwise grid increment, and 1 as a vertical rowwise/columnwise increment in the grid N³. Because of assumption (2), a horizontal/vertical move
from one integer point to another in the domain D can increase the z-coordinate by at most 1. Once the 0-th row or column is generated, one can build the rest of the array either rowwise or columnwise. ra1,a2,b is called the digitization of the 2D ray P^D. The digitization of a 2D ray R will alternatively be denoted by digit(R). We will also say that the digital 2D ray ra1,a2,b is generated by the 2D ray x3 = a1x1 + a2x2 + b. If for two 2D rays ra1,a2,b and ra1,a2,b′ the number b − b′ is an integer, then clearly ra1,a2,b = ra1,a2,b′. Thus, without loss of generality we may assume that the intercepts are limited to 0 ≤ b ≤ 1. For any b, the special digital rays r0,0,b and r1,1,b are composed entirely of 0’s and 1’s, respectively. Let (a1, a2) be the slope vector of a plane discretization Ia1,a2,b. Ia1,a2,b (as well as the corresponding digital 2D ray ra1,a2,b and the Euclidean plane x3 = a1x1 + a2x2 + b) is called rational if both a1 and a2 are rational numbers. Otherwise, it is called irrational. The following theorem is an analog of a well-known result of Bruckstein about digital rays [5].

Theorem 2. For an irrational plane with slope vector (a1, a2), the plane discretization Ia1,a2,b uniquely determines both (a1, a2) and b. For rational (a1, a2), Ia1,a2,b uniquely determines (a1, a2), and b is determined up to an interval.

3.2 Periodicity Properties of 2D Digital Rays
We consider separately the cases of rational and irrational 2D rays.

Rational Digital 2D Rays. Consider the rational 2D ray R = P^D(a1, a2, b), its discretization discr(R) = Ia1,a2,b, and the corresponding digital 2D ray digit(R) = ra1,a2,b. The coefficients a1, a2, b are rational numbers. Without loss of generality we may assume that they are integers and that R contains integer points. These integer points belong to a 2-dimensional integer lattice Λ ⊂ Z³ in the plane P = P(a1, a2, b) = {(x1, x2, x3) ∈ R³ : x3 = a1x1 + a2x2 + b}. Consider a basis B for Λ, i.e., a linearly independent system of integer vectors B = {e1, e2} such that

    {x ∈ Z³ : x3 = a1x1 + a2x2 + b} = {e0 + λ1e1 + λ2e2 : λ1, λ2 ∈ Z},

where e0 is an arbitrary integer point in P. Note that Λ has different bases. For instance, in Figure 2a, any one of the pairs of vectors B1 = {e1, e2}, B2 = {−e1, e2}, B3 = {e1, −e2}, and B4 = {−e1, −e2} constitutes a basis. Geometrically, for a given basis {e1, e2}, the whole plane P is partitioned into parallelograms spanned by the basis vectors. (See Figure 2a.) Any two parallelograms are equivalent up to translation. Every lattice point can be obtained from any other lattice point by consecutive passes along the vectors e1, e2, −e1, or −e2. The discretization discr(P) and the digitization digit(P) are periodic as well. discr(P) has period vectors e1 and e2, while digit(P) has as period vectors the projections of e1 and e2 on the coordinate plane Ox1x2. For an integer point e0 ∈ P^D(a1, a2, b), one can obtain an identical periodicity picture for R, discr(R), and ra1,a2,b = digit(R). (See Figure 2b.) In view of the above discussion, one can consider digit(P) and digit(R) as tiled by a tile with
the shape of a parallelogram formed by the vectors of a given basis. It follows from Proposition 1 that ra1,a2,b and r^D_a1,a2,b are also tiled by a rectangular tile of suitable size. We have seen that the lattice of the integer points of a plane or a 2D ray can be generated by different bases, which feature different parallelogram partitions (Figure 2). Nevertheless, it is a well-known fact from lattice theory that the lattice cells have the same area for all possible bases. It equals the value max(|α1|, |α2|, |α3|), where α1, α2, α3 are the coefficients in the plane representation (1) with gcd(α1, α2, α3) = 1. The above discussion leads us to the following 2D version of a result of Brons about digital rays [4].

Theorem 3. Rational digital 2D rays are lattice-periodic. For a given basis of the lattice, the corresponding lattice cells are parallelograms. For all possible bases, the lattice cells have the same area, which equals the maximal by absolute value coefficient in the plane representation α1x1 + α2x2 + α3x3 = β with gcd(α1, α2, α3) = 1.

Fig. 2. Illustration to the proof of Theorem 3. a) The 2D integer lattice Λ in the plane P and some of its bases. b) The points of the integer lattice Λ of the 2D ray R.
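Theorem 3 can be observed numerically: digitizing a rational 2D ray via Ii,j = ⌊a1 i + a2 j + b + 1/2⌋ and testing candidate vectors shows that the projections of the lattice basis vectors are periods of the discretization. A sketch (the slope vector (1/2, 1/3), the window size, and the use of exact rational arithmetic are our own choices):

```python
from fractions import Fraction
from math import floor

def discretize(a1, a2, b, size):
    """I[i][j] = floor(a1*i + a2*j + b + 1/2): the rounded height of the
    2D ray x3 = a1*x1 + a2*x2 + b over a size x size corner of QuadI."""
    return [[floor(a1 * i + a2 * j + b + Fraction(1, 2))
             for j in range(size)] for i in range(size)]

def is_period(I, u):
    """True iff I[i+u0][j+u1] - I[i][j] equals the same integer wherever
    defined; the projected vector u is then a period of the digital ray."""
    n = len(I)
    diffs = {I[i + u[0]][j + u[1]] - I[i][j]
             for i in range(n - u[0]) for j in range(n - u[1])}
    return len(diffs) == 1

I = discretize(Fraction(1, 2), Fraction(1, 3), Fraction(0), 12)
# (2, 0) and (0, 3) project lattice vectors of x3 = x1/2 + x2/3 onto Ox1x2:
print(is_period(I, (2, 0)), is_period(I, (0, 3)))  # True True
```

By contrast, is_period(I, (1, 0)) is False: stepping by one in x1 sometimes raises the height and sometimes does not.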
4 Sturmian Planes and 2D Rays
Recall that an m × n factor of a 2D array A (finite or infinite) is any m × n subarray of A. We define the complexity function PA(m, n) of A as the number of different m × n factors of A. In particular, we have PA(0, 0) = 1 (the empty word is the unique factor in this case), while PA(1, 1) is the size of the alphabet. Thus for a binary alphabet {0, 1} we have PA(1, 1) = 2. Further on we will consider arrays on the alphabet {0, 1}.

Definition 5. We call a digital 2D ray r Sturmian if Pr(m, n) = mn + 1.

The following properties are, as a matter of fact, based on the Kronecker theorem.

Theorem 5. All digital planes/2D rays with an irrational slope vector (a, b) contain the same set of rectangular factors.

Theorem 6. Let r be an irrational digital 2D ray. Then every rectangular factor appearing in r appears in it infinitely many times.

Theorem 7. Any rectangular factor of an irrational digital plane is a factor of a certain rational digital plane.

We now present a fundamental theorem.

Theorem 8. Let r be a digital 2D ray.
(a) If Pr(m, n) ≤ mn for some integers m, n ≥ 0, then r has at least one periodicity vector.
(b) If r is rational, then it is lattice-periodic and has at least two linearly independent periodicity vectors. In this case, Pr(m, n) is bounded for any m, n ≥ 0. Also, Pr(m, n) ≤ mn − k always holds for some m, n, k ≥ 0, 0 ≤ k ≤ mn.
(c) If r is irrational, then the inequality Pr(m, n) ≤ mn − k may hold for some m, n, k ≥ 0, 0 ≤ k ≤ mn − max(m, n). This may happen only if r is line-periodic. If r is aperiodic, then Pr(m, n) = mn + 1, i.e., r is Sturmian.
(d) If r is irrational, then Pr(m, n) is always unbounded, even if Pr(m, n) ≤ mn − k for some m, n, k ≥ 0.

For example, the rational digital 2D ray digitcol(R(1/2, 1/3)) is lattice-periodic, as illustrated below. (Bullet signs indicate lattice elements.)

digitcol(R(1/2, 1/3)):

    7 |  0  0  0  0  0  0  0  0  0
    6 | •0  1 •0  1 •0  1 •0  1 •0
    5 |  1  0  1  0  1  0  1  0  1
    4 |  0  0  0  0  0  0  0  0  0
    3 | •0  1 •0  1 •0  1 •0  1 •0
    2 |  1  0  1  0  1  0  1  0  1
    1 |  0  0  0  0  0  0  0  0  0
    0 | •0  1 •0  1 •0  1 •0  1 •0
         0  1  2  3  4  5  6  7  8
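The boundedness of Pr(m, n) claimed in Theorem 8(b) can be observed on the array above: counting distinct m × n factors in a finite window of digitcol(R(1/2, 1/3)) never yields more than the tile area 2 · 3 = 6. A sketch (the window construction is our own; it is large enough to exhibit every phase of the periodic array, so the counts are exact here):

```python
def complexity(A, m, n):
    """Number of distinct m x n factors occurring in the finite array A."""
    rows, cols = len(A), len(A[0])
    factors = {tuple(tuple(A[i + di][j + dj] for dj in range(n))
                     for di in range(m))
               for i in range(rows - m + 1) for j in range(cols - n + 1)}
    return len(factors)

# Window of digitcol(R(1/2, 1/3)); row order does not affect the counts.
A = [[0, 1, 0, 1, 0, 1, 0, 1],
     [1, 0, 1, 0, 1, 0, 1, 0],
     [0, 0, 0, 0, 0, 0, 0, 0]] * 3
for m, n in ((1, 1), (2, 2), (3, 3)):
    print((m, n), complexity(A, m, n))
# prints (1, 1) 2, (2, 2) 6, (3, 3) 6
```

P(1, 1) = 2 is the alphabet size, as noted above, and the counts stay capped at 6 for every larger window shape, in contrast with the unbounded complexity of irrational rays in Theorem 8(d).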
As an example of an aperiodic irrational digital 2D ray, consider the digital Fibonacci 2D ray defined as a digitization of the 2D ray RFib : x3 = φx1 + φx2, x1, x2 ≥ 0. Here φ is one of the golden ratio numbers, namely φ = (√5 − 1)/2 = 0.618033988.... (We also have φ = 1/τ, where τ = (1 + √5)/2.) Through our digitization process we obtain the following digitization of the lower-left corner of RFib:

    12 | 0 1 1 0 1 1 0 1 0 ...
    11 | 1 0 1 1 0 1 1 0 1 0 ...
    10 | 0 1 0 1 1 0 1 1 0 1 0 ...
     9 | 1 0 1 0 1 1 0 1 1 0 1 0 ...
     8 | 1 1 0 1 0 1 1 0 1 1 0 1 0 ...
     7 | 0 1 1 0 1 0 1 1 0 1 1 0 1 0 ...
     6 | 1 0 1 1 0 1 0 1 1 0 1 1 0 1 0 ...
     5 | 1 1 0 1 1 0 1 0 1 1 0 1 1 0 1 0 ...
     4 | 0 1 1 0 1 1 0 1 0 1 1 0 1 1 0 1 0 ...
     3 | 1 0 1 1 0 1 1 0 1 0 1 1 0 1 1 0 1 0 ...
     2 | 0 1 0 1 1 0 1 1 0 1 0 1 1 0 1 1 0 1 0 ...
     1 | 1 0 1 0 1 1 0 1 1 0 1 0 1 1 0 1 1 0 1 0 ...
     0 | 0 1 0 1 0 1 1 0 1 1 0 1 0 1 1 0 1 1 0 1 ...
         0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ...
Note that, since the array is symmetric with respect to the line x1 = x2, we have digitcol(RFib) = digitrow(RFib). Assertion (a), stated for an arbitrary array r, is sometimes referred to as Nivat’s conjecture [14]. Only partial results for small values of m and n have been proved regarding this conjecture. A weaker statement is proved in [7] under the condition Pr(m, n) ≤ (1/100)mn. Theorem 8a resolves Nivat’s conjecture for the important case of arrays that are digital planes.
5 Concluding Remarks
In this paper we studied properties of digital planes (in particular, a characterization of their periodicity/aperiodicity) in terms of 2D arrays on binary alphabets. A similar “linguistic” approach to studying digital straightness led to the design of efficient linear-time algorithms for digital straight line recognition (see, e.g., [17,18]). A further research task is seen in extending these algorithms to digital plane recognition algorithms, based on the results presented above. We believe that some of the ideas and results of this work, combined with those of Epifanio, Koskas, and Mignosi [7], may lead to a complete proof of Nivat’s conjecture for the case of arbitrary digital arrays (not necessarily digitizations of 2D rays). The irrational digital lines possess certain properties that are reminiscent of those of the Penrose tilings of the plane (see, e.g., [9]), the latter having been found relevant to the structure of quasicrystals. It would be worthwhile to explore such interesting relations when irrational digital planes are involved.
Acknowledgments We would like to thank Azriel Rosenfeld who encouraged this research. We are grateful to Reinhard Klette who read an extended preliminary version of the paper, suggested certain improvements, and proposed to post it as a technical report on the CITR web site. Thanks go also to Eric Andres, Alberto Apostolico, and Stefan Dantchev for some helpful discussions. We thank the referees for some useful remarks and suggestions. The reported research was done in part while the first author was visiting the Laboratory on Signal, Image and Communication, CNRS, University of Poitiers, France, and supported by a grant from the University of Poitiers.
References

1. Amir, A., G. Benson, Two-dimensional periodicity and its applications, Proc. 3rd ACM-SIAM Symp. on Discrete Algorithms (1992) 440–452
2. Brimkov, V.E., Digital flatness and related combinatorial problems, CITR-TR-120, University of Auckland, New Zealand (2002) 44 pages, http://www.citr.auckland.ac.nz/techreports/?year=2002
3. Brimkov, V.E., Notes on digital flatness, TR 2002-01, Laboratory on Signal, Image and Communication, CNRS, University of Poitiers, France, July 2002, 52 pages
4. Brons, R., Linguistic methods for description of a straight line on a grid, Computer Graphics Image Processing 2 (1974) 48–62
5. Bruckstein, A.M., Self-similarity properties of digitized straight lines, Contemp. Math. 119 (1991) 1–20
6. Coven, E., G. Hedlund, Sequences with minimal block growth, Math. Systems Theory 7 (1973) 138–153
7. Epifanio, Ch., M. Koskas, F. Mignosi, On a conjecture on bidimensional words, http://dipinfo.math.unipa.it/mignosi/periodicity.html
8. Galil, Z., K. Park, Truly alphabet-independent two-dimensional pattern matching, Proc. 33rd IEEE Symp. Found. Computer Science (1992) 247–256
9. Grünbaum, B., G.C. Shephard, Tilings and Patterns, Freeman & Co, New York, 1987
10. Kong, T.Y., A. Rosenfeld, Digital topology: introduction and survey, Comput. Vision Graphics Image Processing 48 (1989) 357–393
11. Lunnon, W.F., P.A.B. Pleasants, Characterization of two-distance sequences, J. Austral. Math. Soc. (Ser. A) 53 (1992) 198–218
12. Mignosi, F., G. Perillo, Repetitions in the Fibonacci infinite words, RAIRO Theor. Inform. Appl. 26 (1992) 199–204
13. Morse, M., G.A. Hedlund, Symbolic dynamics II: Sturmian sequences, Amer. J. Math. 61 (1940) 1–42
14. Nivat, M., Invited talk at ICALP’97
15. Rosenfeld, A., R. Klette, Digital straightness, Electronic Notes in Theoretical Computer Science 46 (2001) URL: http://www.elsevier.nl/locate/entcs/volume46.html
16. Rosenfeld, A., Digital straight line segments, IEEE Trans. Computers 23 (1974) 1264–1269
17. Smeulders, A. W. M., L. Dorst, Decomposition of discrete curves into piecewise segments in linear time, Contemporary Mathematics 119 (1991) 169–195
18. Wu, A. Y., On the chain code of a line, IEEE Trans. Pattern Analysis Machine Intelligence 4 (1982) 347–353
Shape Preserving Digitization of Ideal and Blurred Binary Images

Ullrich Köthe and Peer Stelldinger

Cognitive Systems Group, University of Hamburg, Vogt-Köln-Str. 30, D-22527 Hamburg, Germany

Abstract. In order to make image analysis methods more reliable, it is important to analyse to what extent shape information is preserved during image digitization. Most existing approaches to this problem consider topology preservation and are restricted to ideal binary images. We extend these results in two ways. First, we characterize the set of binary images which can be correctly digitized by both regular and irregular sampling grids, such that not only topology is preserved but also the Hausdorff distance between the original image and the reconstruction is bounded. Second, we prove an analogous theorem for gray scale images that arise from blurring of binary images with a certain filter type. These results are steps towards a theory of shape digitization applicable to real optical systems.
1 Introduction
When an analog image is digitized, much of its information may get lost. Therefore, it is important to understand which information is preserved. In this paper, we will be concerned with the problem of shape preservation. In particular, we would like discrete regions to have the same topology as their analog originals, and geometric distortions to be bounded. This problem of topology preservation was first investigated by Pavlidis [3]. He showed that a particular class of binary analog shapes (which we will call r-regular shapes, cf. Definition 4) does not change topology under discretization with any sufficiently dense square grid. Similarly, Serra showed in [5] that the homotopy tree of r-regular sets is preserved under discretization with any sufficiently dense hexagonal grid. Both results apply to binary sets and the so-called subset digitization, where a pixel is considered part of the digital shape iff its center is an element of the given set. Real images are always subjected to a certain amount of blurring before digitization. Blurring is an unavoidable property of any real optical system. It can be described by a convolution of the analog image with the point spread function (PSF) of the optical system. After convolution, analog images are no longer binary, and the above theorems do not apply. Latecki et al. [1] therefore generalized the findings of Pavlidis to other digitizations including the square subset and intersection digitizations. These digitizations can be interpreted as subset digitizations of a level set of the blurred image where the PSF is a square
Shape Preserving Digitization of Ideal and Blurred Binary Images
(a)
(b)
(c)
83
(d)
Fig. 1. Comparison of similarity criteria. (a) and (b) are topologically equivalent, (b) and (c) have the same homotopy tree, (c) and (d) have a very small Hausdorff distance when overlaid. No pair fulfills more than one condition.
with the same size as the pixels. Under this paradigm, topology preservation requires to halve the sampling distance. In contrast, Ronse and Tajine [4] based their approach to digitization on the Hausdorff distance, i.e. a geometric measure of shape similarity. They proved that in the limit of infinitely dense sampling the Hausdorff distance between the original and digitized shapes converges to zero. However, they do not analyse under which circumstances the topology remains unchanged. In this paper, we combine the three shape similarity criteria topological equivalence, identical homotopy tree and bounded Hausdorff distance. We prove that r-regularity is a sufficient condition for an analog set to be reconstructible (in the sense that all three criteria are met simultaneously) by any regular or irregular grid with sampling distance smaller than r. The results of [3,5] are obtained as corollaries of this theorem. We also apply these findings to binary images blurred with a flat disk-like PSF and show that the sampling density has to be increased according to the PSF’s radius to ensure correct reconstruction.
2 Shape Similarity
Ullrich Köthe and Peer Stelldinger

Given two sets A and B, their similarity can be expressed in several ways. The most fundamental is topological equivalence. A and B are topologically equivalent if there exists a bijective function f : A → B with f and f−1 continuous. Such a function is called a homeomorphism. However, it does not completely characterize the topology of a set when it is embedded in the plane IR2. Therefore, [5] introduced the homotopy tree, which encodes whether some components of A enclose others in a given embedding. Fig. 1 (a) to (c) illustrate how shapes may differ if they are either topologically equivalent or have the same homotopy tree. We can capture both notions simultaneously when we extend the homeomorphism f to the entire IR2 plane. Then it refers to a particular planar embedding of A and B and defines a mapping Ac → Bc for the set complements as well. This ensures preservation of both the topology and the homotopy tree. We call this an IR2-homeomorphism. Geometric similarity between two shapes can be measured by the Hausdorff distance

dH(∂A, ∂B) = max{ max_{x∈∂A} min_{y∈∂B} d(x, y), max_{y∈∂B} min_{x∈∂A} d(x, y) }

between the shapes' boundaries. Fig. 1 (c) and (d) show two shapes with small Hausdorff distance that are not IR2-topologically equivalent. All these criteria are necessary to regard a reconstructed image as similar to the original. Thus we combine them and call two sets r-similar if there exists an IR2-homeomorphism that maps A into B, and dH(∂A, ∂B) ≤ r. That is, two sets A, B are r-similar iff they are topologically equivalent, have the same homotopy tree, and their boundaries have a bounded Hausdorff distance.
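As an illustration (our own, not from the paper), the geometric part of the r-similarity criterion can be computed directly from the definition when the two boundaries are given as finite point samplings:

```python
import math

def hausdorff(P, Q):
    """Symmetric Hausdorff distance between two finite point sets,
    following dH = max of the two directed distances."""
    def directed(P, Q):
        # max over x in P of the distance to the nearest y in Q
        return max(min(math.dist(x, y) for y in Q) for x in P)
    return max(directed(P, Q), directed(Q, P))

# Sampled "boundaries": a unit square and the same square shifted by 0.5.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
shifted = [(x + 0.5, y) for x, y in square]
print(hausdorff(square, shifted))  # 0.5
```

The topological part of the criterion (existence of an IR2-homeomorphism) has no comparably simple test; this is exactly why the theorems of the following sections are needed.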
3 Reconstructible Images
A set A ⊆ IR2 can be transformed into an analog binary image by means of the characteristic function of the set, χA : IR2 → {0, 1}, χA(x) = 1 iff x ∈ A. A discretisation is obtained by storing the values of this image only at a countable number of sampling points. To characterize sampling formally, we must restrict the distance of the sampling points:

Definition 1. A countable set S ⊂ IR2 of points with dH(IR2, S) ≤ r for some r ∈ IR+, such that for each bounded set A the subset S ∩ A is finite, is called r-grid. The elements of S are the sampling points, and their associated Euclidean Voronoi regions are the pixels:

PixelS : S → P(IR2),  PixelS(s) := {x : ∀s′ ∈ S \ {s} : |x − s| ≤ |x − s′|}

The intersection of A ⊆ IR2 with S is called the S-digitization of A, and the restriction of the domain of A's characteristic function to S is the associated digital binary image:

DigS(A) := A ∩ S
DigitalImageS(χA) := χA|S : S → {0, 1}

This definition is very broad and captures not only the usual rectangular and square grids, but also other regular and even irregular grids, provided their Voronoi regions have bounded radius, see fig. 2. As it is not useful to directly compare a discrete set with an analog one, we reconstruct an analog set from the given digitization. This is done by assigning the information stored at each sampling point to the entire surrounding pixel:

Definition 2. Given a set A ⊆ IR2 and a grid S, the S-reconstruction of DigS(A) is defined as

Â = RecS(DigS(A)) = ⋃_{s ∈ S ∩ A} PixelS(s)
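Definitions 1 and 2 translate directly into a membership test: a point x belongs to the S-reconstruction Â iff the sampling point whose Voronoi pixel contains x, i.e. the sampling point nearest to x, lies in A. A minimal sketch (the disk-shaped A and the unit grid are our own example, not from the paper):

```python
import math

def reconstruction_contains(x, sampling_points, in_A):
    """x lies in the S-reconstruction iff the nearest sampling point
    (the one whose Voronoi pixel contains x) belongs to A."""
    nearest = min(sampling_points, key=lambda s: math.dist(s, x))
    return in_A(nearest)

# Example: A is a disk of radius 2, S a unit square grid.
in_disk = lambda p: p[0] ** 2 + p[1] ** 2 <= 4.0
grid = [(i, j) for i in range(-4, 5) for j in range(-4, 5)]

print(reconstruction_contains((0.3, 0.2), grid, in_disk))  # True: nearest point (0, 0) is in A
print(reconstruction_contains((3.4, 0.1), grid, in_disk))  # False: nearest point (3, 0) is outside A
```

Ties on pixel boundaries are resolved arbitrarily by `min`, which is harmless here since boundary points belong to both adjacent pixels by Definition 1.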
The results of a reconstruction process will be considered correct if the reconstructed set Â is sufficiently similar to the original set A. Formally, we get:

Definition 3. A set A ⊆ IR2 is reconstructible by an r-grid S if the S-reconstruction Â is r-similar to A.
Fig. 2. Many different grid types can be described when pixels are defined as the Voronoi regions of suitably located sampling points. These include regular grids like the square (a), hexagonal (b) and trigonal ones (c), and irregular grids (d) as found in natural image acquisition devices like the human eye.
This definition imposes stricter conditions on reconstruction than preservation of topology or homotopy trees as used by Pavlidis and Serra. Pavlidis gave a weaker bound for the Hausdorff distance and did not prove that the homotopy tree remains unchanged, while Serra did not prove topology preservation. Corollary 1 shows that their geometric sampling theorems can be strengthened according to our requirements. We recall the definition of the type of shapes they looked at:

Definition 4. A compact set A ⊂ IR2 is called r-regular iff for each boundary point of A it is possible to find two osculating open balls of radius r, one lying entirely in A and the other lying entirely in Ac.

In the following we will show that an r-regular set is reconstructible by any grid with sufficiently small pixel size, regardless of the grid structure. The following lemmas describe some prerequisites. We only formulate them for the foreground A, but their claims and proofs apply to the background Ac analogously.

Lemma 1. Let A be an r-regular set and Â the reconstruction of A by an r′-grid S, with 0 < r′ < r. Then two sampling points lying in different components of A cannot lie in the same component of Â.

Proof. Since the Hausdorff distance of two components of A is at least 2r (cf. [1,2]), and the S-reconstruction of any component A′ is a subset of the r′-dilation of A′, the Hausdorff distance between two components of Â is at least 2r − 2r′ > 0. Thus the reconstruction process cannot merge two components of A.

Lemma 2. Let A′ be a component of an r-regular set A, S be an r′-grid, 0 < r′ < r′′ < r. Further, let A′′ = (A′ ⊖ B̄r′′)° be the interior of the erosion of A′ with a closed ball of radius r′′, and Si := {s ∈ S : Pixel(s) ∩ A′′ ≠ ∅} the set of all sampling points whose pixels intersect A′′. Then at least one member of Si is in A′.

Proof. Since A is r-regular, every component A′ contains at least one ball of radius r. The center m of such a ball lies in A′′. Let s ∈ S be a sampling point with m ∈ Pixel(s). Then s is also an element of Si, and the distance between s and m is at most r′ < r′′. Thus, s lies within A′.
Lemma 3. Let A, A′′, S and Si be defined as in lemma 2. Then any pair of pixels with sampling points in Si is connected by a chain of adjacent pixels whose sampling points are also in Si. Pixels are adjacent if they have a common boundary edge (direct neighborhood).

Proof. Every component A′ of an r-regular set A is r-regular, too. Thus A′′ is an open, connected set. Now let s1 and s2 be sampling points in Si. The interiors of their pixels intersect A′′, and there exist two points s1′, s2′ lying in (Pixel(s1))° ∩ A′′ and (Pixel(s2))° ∩ A′′ respectively. s1′ and s2′ can be connected by a path in A′′ which, without loss of generality, does not intersect any pixel corner. The sampling points of all pixels intersecting this path are in Si as well. The order in which the path enters those pixels defines a chain of adjacent pixels.

Lemma 4. Let A, A′, A′′, S and Si be defined as in lemma 2. Then each sampling point lying in A′ is either a member of Si or is connected to a member of Si by a chain of adjacent pixels whose sampling points all lie in A′.

Proof. Let c be any sampling point in A′. Then there exists a ball of radius r in A′ such that c lies in the ball. Let m ∈ A′′ be the center of the ball. The half-line starting at c and going through m crosses the boundary of the convex Pixel(c) at exactly one point c′. If d(c, m) ≤ d(c, c′), the point m is part of Pixel(c) and thus c ∈ Si. If d(c, m) > d(c, c′), let g be the line defined by the edge of Pixel(c) going through c′. If there are two such lines (i.e. if c′ is a corner of Pixel(c)), one is chosen arbitrarily. Due to the definition of Voronoi regions, the point c′′ constructed by mirroring c on g is a sampling point in S, and Pixel(c′′) is adjacent to Pixel(c). Since d(c′, c) = d(c′, c′′), the point c′′ always lies on the circle around c′ through c. Among all points on this circle, c has the largest distance to m, and in particular d(m, c′′) < d(m, c). Thus, the sampling point c′′ lies in A′, and is closer to m than c.
We can repeat this construction iteratively to obtain a sequence of adjacent pixels whose sampling points successively get closer to m. Since there are only finitely many sampling points in A′, one such pixel will eventually intersect A′′.

Theorem 1 (sampling theorem for ideal binary images). Let r ∈ IR+ and A an r-regular set. Then A is reconstructible with any r′-grid S, 0 < r′ < r.

Proof. Due to lemma 2 there is a mapping of the foreground components of A to the foreground components of Â. Lemma 1 states that this mapping is injective, and from lemmas 3 and 4 follows surjectivity. The same holds for the background components of A and Â. This implies a one-to-one mapping between the boundaries of A and Â. Due to lemma 4, both the foreground and background components of Â are connected via direct pixel neighborhood. Therefore, their boundaries are Jordan curves. The same holds for the boundaries of A due to r-regularity. Consequently, an IR2-homeomorphism can be constructed, and A and Â are IR2-topologically equivalent.
It remains to be shown that the Hausdorff distance between the boundaries of A and Â is bounded. Suppose to the contrary that ∂Â contains a point s whose distance from ∂A exceeds r′. Due to the definition of an r′-grid, the sampling points of all pixels containing s are located in a circle around s with radius r′. Under the supposition, this circle would lie either completely inside or completely outside of A, and the pixels would all be either in Â or in Âc. Thus, s could not be on ∂Â – a contradiction. Therefore, the Hausdorff distance between ∂A and ∂Â is at most r′.

This geometric sampling theorem applies not only to square or hexagonal grids, but also to irregular grids as can be found in the human retina, see fig. 2. Moreover, if a set is reconstructible by some grid S due to this theorem, this also holds for any translated and rotated copy of the grid. It can also be shown that r-regularity is not only a sufficient but also a necessary condition for a set to be reconstructible. That is, if A is not r-regular for some r, there exists an r-grid S such that the S-reconstruction is not topologically equivalent to A. Due to space limitations, the proof of this claim had to be omitted.

The sampling theorems of Serra and Pavlidis are corollaries of theorem 1:

Corollary 1. Let S1 := h1 · ZZ2 be the square grid with grid size (minimal sampling point distance) h1. Then every r-regular set with r > h1/√2 is reconstructible with S1. Let S2 be the hexagonal grid with grid size h2. Then every r-regular set with r > h2/√3 is reconstructible with S2.
4 Sampling of Blurred Images
In the previous section we worked exclusively with the subset digitization, where a sampling point is set if it lies within the foreground region of the binary image. Unfortunately, this digitization scheme can never be realized in practice: every real optical system blurs the binary image before the light reaches the optical sensors. The finite area of real sensors introduces additional blurring. Both effects can be described by a convolution of the ideal binary image with a suitable point spread function. Thus, the image actually observed is always a gray-scale image. A binary image can be recovered by considering a particular level set Ll = {x ∈ IR2 | fˆ(x) ≥ l} of the blurred image fˆ, i.e. by thresholding. Since thresholding and digitization commute, we can apply thresholding first and then digitize the resulting level set by standard subset digitization. (This order facilitates the following proofs.) Now the question arises whether and how we can bound the difference between the original set before blurring and the S-reconstruction of a level set of the blurred image. We first analyse the relationship between the original set and an analog level set, and then between the level set and its S-reconstruction. In order to get definitive results, we restrict ourselves to a particular type of PSF, namely flat disks of radius p. Flat, disk-shaped PSFs have the advantage that the result of the convolution can be calculated by measuring the area of sets. In the sequel, A shall be an r-regular set and kp a disk PSF with radius
Fig. 3. If a p-ball is shifted orthogonally to the boundary ∂A from an inner osculating to an outer osculating position, its intersection area with A strictly decreases.

Fig. 4. The boundary of the circle b0 centered at point c0 (light gray) intersects the boundary of the set A (bold line) at the two points s1 and s2. Since A is r-regular, its boundary can only lie within the area marked with dark gray.
p < r. If Kp(c) denotes the PSF's support region after translation to the point c, the result of the convolution at c is given by

fˆ(c) = (kp ∗ χA)(c) = |Kp(c) ∩ A| / |Kp(c)|

where ∗ denotes convolution and |·| is the area. Therefore, it is possible to derive properties of the level sets by purely geometrical means. Obviously, all interesting effects occur in a 2p-wide strip Ap = ∂A ⊕ Kp around the boundary ∂A, because outside this strip the kernel does not overlap ∂A, and the gray values are either 0 or 1 there (⊕ denotes morphological dilation). Level sets have the following property:

Lemma 5. Let s be a point on ∂A, and let c1 and c2 be the centers of the inside and outside osculating circles of radius r. Moreover, let c3 and c4 be the two points on the normal c1c2 with distance p from s. Then the boundary of every level set has exactly one point in common with c3c4.

Proof. Consider a point c in Kp(c3) and translate the line segment c3c4 by c − c3 (see fig. 3). Because of the restricted curvature of ∂A, the translated line segment intersects ∂A at exactly one point. Thus, as t ∈ [0, 1] increases, the area of Kp(c3 + t · (c4 − c3)) ∩ A is strictly decreasing. This area is proportional to the result of the convolution, so the same holds for the gray values. Since the p-ball centered at c3 is an inside osculating ball of A, the gray value at c3 is f(0) = 1, writing f(t) for the gray value at c3 + t · (c4 − c3); likewise, f(1) = 0. This implies the lemma.

The curvature of the level set contours is bounded by the following lemma:

Lemma 6. Let c0 ∈ Ap be a point such that (kp ∗ χA)(c0) = l (0 < l < 1); thus, c0 is part of the level set Ll. Then there exists a circle bout of radius ro ≥ r′ = r − p
Fig. 5. Left: The gray level at any point c4 ≠ c0 on b3 is smaller than the gray level at c0; center and right: decomposition of the circles b0 and b4 into subsets (see text).
that touches c0 but is otherwise completely outside of Ll. Likewise, there is a circle bin with radius ri ≥ r′ that is completely within Ll.

Proof. Consider the set b0 = Kp(c0) centered at c0. Let its boundary ∂Kp(c0) intersect the boundary ∂A at the points s1 and s2 (see fig. 4). Let g0 be the bisector of the line s1s2. By construction, c0 is on g0. Define c1 and c2 as the points on g0 whose distance from s1 and s2 is r, and draw the circles b1 and b2 with radius r around them. Now, the boundary of A cannot lie inside either b1 \ b2 or b2 \ b1, because otherwise A could not be r-regular. The areas where ∂A may run are marked dark gray in fig. 4. Since p < r, there can be no further intersections between ∂Kp(c0) and ∂A besides s1 and s2. On g0, mark the points c3 between c0 and c1, and c3′ between c0 and c2, such that |c1c3| = |c2c3′| and min(|c0c3|, |c0c3′|) = r′ = r − p. Due to the triangle inequality, and since p < r, such a configuration always exists. We prove the lemma for the circle bout around c3; bin around c3′ is treated analogously. Let b3 = bout be the circle around c3 with radius r′, and b3′ the circle around c3 that touches s1 and s2 (fig. 5 left). Consider a point c4 on ∂b3 and draw the circle b4 with radius p around c4. This circle corresponds to the footprint of the PSF centered at c4. Now we would like to compare the result of the convolution kp ∗ χA at c0 and c4. The convolution results are determined by the amount of overlap between A and b0 = Kp(c0) and b4 = Kp(c4), respectively. To compare b0 ∩ A and b4 ∩ A, we split the two circles into subsets according to fig. 5 center (only b0, b4 and b3′ are shown in this figure). Circle b0 consists of the subsets f1, f2, f3, f4, whereas b4 consists of f1, f2, f3′, f4′. The subsets f1 and f2 are shared by both circles, while due to symmetry f3, f3′ and f4, f4′ are mirror images of each other.
In terms of the subsets, we can express the convolution results as follows:

(kp ∗ χA)(c0) = (|f1 ∩ A| + |f2 ∩ A| + |f3 ∩ A| + |f4 ∩ A|) / |Kp|

(kp ∗ χA)(c4) = (|f1 ∩ A| + |f2 ∩ A| + |f3′ ∩ A| + |f4′ ∩ A|) / |Kp|
By straightforward algebraic manipulation we get:

|Kp| · ((kp ∗ χA)(c0) − (kp ∗ χA)(c4)) = |f3 ∩ A| − |f3′ ∩ A| + |f4 ∩ A| − |f4′ ∩ A|  (1)

Since the radius of b3′ is smaller than r, and its center c3 is between c0 and c1, the boundary ∂b3′ intersects ∂A only at s1 and s2. It follows that subset f3 is completely inside of A, whereas f4′ is completely outside of A. Hence, we have |f3 ∩ A| = |f3| = |f3′| and |f4′ ∩ A| = 0. Inserting this into (1), we get

|Kp| · ((kp ∗ χA)(c0) − (kp ∗ χA)(c4)) = |f3′| − |f3′ ∩ A| + |f4 ∩ A| > 0  (2)
Thus, the gray level at c4 is smaller than l. When c4 is moved further away from c0, the subset f2 will eventually disappear from the configuration (fig. 5 right). If c3 is outside of b0, f1 will finally disappear as well. It can easily be checked that (2) remains valid in either case. Due to the definition of c3, no other configurations are possible. Therefore, the gray values on the boundary ∂bout are below l everywhere except at c0. It remains to prove the same for the interior of bout. Suppose the gray level at a point c in the interior of bout were l′ ≥ l. By what we have already shown, the associated level line ∂Ll′ cannot cross the boundary ∂bout (except at the single point c0 if l′ = l). So it must form a closed curve within bout. However, this curve would cross some normal of ∂A twice, in contradiction to lemma 5. This implies the claim for outside circles. The proof for inside circles proceeds analogously.

We conclude that the shape of the level sets Ll is quite restricted:

Theorem 2. Let A be an r-regular set, and Ll any level set of kp ∗ χA, where kp is a flat disk-like point spread function with radius p < r. Then Ll is r′-regular (with r′ = r − p) and p-similar to A.

Proof. The r′-regularity follows directly from the definition of r-regularity and lemma 6. Now assume that there exists a homeomorphism f : IR2 → IR2 such that f(A) = Ll and ∀x ∈ IR2 : |f(x) − x| ≤ p. This homeomorphism would induce a homeomorphism from A to Ll. Due to the embedding of f in IR2, the homotopy trees of A and f(A) would be equal. Since |f(x) − x| ≤ p, the Hausdorff distance between ∂A and f(∂A) would be at most p. Thus, the existence of such a homeomorphism is sufficient to prove p-similarity. The required homeomorphism can indeed be constructed: Because of the restricted curvature of ∂A, the normals of ∂A cannot intersect within the p-strip Ap around ∂A (cf. [1,2]).
Therefore, due to lemma 5, every point s on ∂A can be translated along its normal towards a unique point on the given level line ∂Ll and vice versa. The distance between s and its image is at most p. This mapping can be extended to the entire IR2-plane in the usual way, so that we get a homeomorphism with the desired properties. This finally allows us to show what happens during the digitization of a set A that was subjected to blurring with a PSF:
Theorem 3 (sampling theorem for blurred binary images). Let A be an r-regular set, Ll any level set of kp ∗ χA, where kp is a flat disk-like point spread function with radius p < r, and S a grid with maximum pixel radius r′′ < r − p. Then the S-reconstruction L̂l of Ll is (p + r′′)-similar to A.

Proof. By theorem 2, Ll is r′-regular (r′ = r − p) and p-similar to A. By theorem 1, the S-reconstruction of an r′-regular set with an r′′-grid (r′′ < r′) is r′′-similar to the original set. Thus A, Ll and L̂l are topologically equivalent and have the same homotopy tree. Due to the triangle inequality of the Hausdorff metric, the Hausdorff distance between A and L̂l is at most p + r′′.

Corollary 2. Since r′′ + p < r, any S-reconstruction of Ll is r-similar to A, regardless of how the grid is rotated and translated relative to A.
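The flat-disk blurring model of Sect. 4 can be sketched numerically. The following illustration is our own (the sub-grid resolution n and the half-plane example are not from the paper); it estimates the area fraction fˆ(c) = |Kp(c) ∩ A| / |Kp(c)| for a flat disk PSF and then applies the threshold l:

```python
def blurred_value(c, in_A, p, n=100):
    """Estimate f^(c) = |Kp(c) ∩ A| / |Kp(c)| for a flat disk PSF of
    radius p by counting points of an n-by-n sub-grid inside the disk."""
    cx, cy = c
    inside_disk = inside_A = 0
    for i in range(n):
        for j in range(n):
            # sample point within the bounding box of the disk Kp(c)
            x = cx - p + (2 * p) * (i + 0.5) / n
            y = cy - p + (2 * p) * (j + 0.5) / n
            if (x - cx) ** 2 + (y - cy) ** 2 <= p * p:
                inside_disk += 1
                if in_A((x, y)):
                    inside_A += 1
    return inside_A / inside_disk

# A = half plane x <= 0; at a boundary point the blurred value is 1/2.
half_plane = lambda q: q[0] <= 0.0
print(round(blurred_value((0.0, 0.0), half_plane, 1.0), 2))  # 0.5
# Thresholding at l = 0.5 then yields membership in the level set L_0.5:
print(blurred_value((0.0, 0.0), half_plane, 1.0) >= 0.5)  # True
```

This is only a discretized stand-in for the exact area measurement used in the proofs; its accuracy grows with n.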
5 Conclusions
Our results are intuitively very appealing: When we digitize an ideal binary image with any r′-grid, we can properly reconstruct a shape if it is r-regular with r > r′. But when the image is first subjected to blurring with a PSF of radius p, the set must be r-regular with r > r′ + p. In other words, the radius of the PSF must be added to the radius of the grid pixels to determine the regularity requirements for the original shape. It should also be noted that r > r′ + p is a tight bound, which for instance would be reached if A consisted of a circle of radius r and the threshold were 1 – in this case, any smaller circle could get lost in the reconstruction. However, for a single, pre-selected threshold a better bound can be derived. Our result is closely related to the findings of Latecki et al. [1,2] about v-digitization (and thus also square subset digitization and intersection digitization). In their approach, the grid must be square with sampling distance h, and the PSF is an axis-aligned flat square with the same size as the pixels. Then the pixel and PSF radius are both r′ = p = h/√2, and the original shape must be r-regular with r > r′ + p = √2 h. This is exactly the same formula as in our case. We conjecture that our results can be generalized to a much wider class of radially symmetric PSFs, but we cannot prove this yet.
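As a small numeric sketch of this budget (the function name and example values are our own), the admissible pixel radius r′ < r − p translates, for a square grid whose pixel radius is h/√2, into an upper bound on the spacing h:

```python
import math

def max_square_grid_spacing(r, p=0.0):
    """Supremum of square-grid spacings h that still guarantee
    reconstruction: the pixel radius h / sqrt(2) must stay strictly
    below r - p (Corollary 1 / Theorem 3)."""
    return (r - p) * math.sqrt(2)

# Ideal binary image: r-regular with r = 2 -> any h < 2 * sqrt(2) works.
print(max_square_grid_spacing(2.0))       # 2.828...
# Blurred with a disk PSF of radius p = 0.5: the budget shrinks.
print(max_square_grid_spacing(2.0, 0.5))  # 2.121...

# Latecki's setting: r' = p = h / sqrt(2) gives the requirement r > sqrt(2) * h.
h = 1.0
assert math.isclose(h / math.sqrt(2) + h / math.sqrt(2), math.sqrt(2) * h)
```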
References

1. Latecki, L.J., Conrad, C., Gross, A.: Preserving Topology by a Digitization Process. Journal of Mathematical Imaging and Vision 8, 131–159, 1998.
2. Latecki, L.J.: Discrete Representation of Spatial Objects in Computer Vision. Kluwer Academic Publishers, Dordrecht, 1998.
3. Pavlidis, T.: Algorithms for Graphics and Image Processing. Computer Science Press, Rockville, Maryland, 1982.
4. Ronse, C., Tajine, M.: Discretization in Hausdorff Space. Journal of Mathematical Imaging and Vision 12, 219–242, 2000.
5. Serra, J.: Image Analysis and Mathematical Morphology. Academic Press, New York, 1982.
Towards Digital Cohomology

Rocio Gonzalez-Diaz and Pedro Real

Applied Math Dept., University of Seville, Spain, {rogodi,real}@us.es, http://www.us.es/gtocoma
Abstract. We propose a method for computing the Z2-cohomology ring of a simplicial complex uniquely associated with a three-dimensional digital binary-valued picture I. Binary digital pictures are represented on the standard grid Z3, in which all grid points have integer coordinates. Considering a particular 14-neighbourhood system on this grid, we construct a unique simplicial complex K(I) topologically representing (up to isomorphisms of pictures) the picture I. We then compute the cohomology ring of I via the simplicial complex K(I). The usefulness of a simplicial description of the digital Z2-cohomology ring of binary digital pictures is tested by means of a small program visualizing the different steps of our method. Some examples concerning topological thinning, the visualization of representative generators of cohomology classes and the computation of the cup product on the cohomology of simple 3D digital pictures are shown. Keywords: Digital topology, chain complexes, cohomology ring.
1 Introduction
The homology groups (given in terms of numbers of connected components, holes and cavities in the digital picture), the digital Euler characteristic and the digital fundamental group are well-known operations in Digital Topology [15,10]. All of them can be considered as translations into the discrete setting of classical continuous topological invariants. In order to prove that a digital topology operation πD (associated with a continuous operation πC) correctly reflects the topology of digital pictures considered as Euclidean spaces, the main idea is to associate a "continuous analog" C(I) with the digital picture I. In most cases, each binary digital picture I is associated with a polyhedron C(I) [10,11,9,1]. It is clear that C(I) "fills the gaps" between black points of I in a way that strongly depends on the grid and adjacency relations chosen for the digital picture I. Recent attempts to enrich the list of computable digital topological invariants in such a way can be found in [8]. In this paper, we will consider binary digital pictures I = (Z3, 14, 14, B), having the standard lattice Z3 as the underlying grid and fixing a special 14-adjacency for both the points of B and the points of its complement. Our binary digital picture space (or, briefly, DPS) is regular and isomorphic to the
Partially supported by the PAICYT research project FQM-296 from Junta de Andalucía (Spain).
I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 92–101, 2003. © Springer-Verlag Berlin Heidelberg 2003
well-known DPS called 3-d body-centered cubic grid [11]. Starting from a picture I, we construct, in a straightforward way, a simplicial complex K(I) based on the triangulation of the Euclidean 3-space determined by the previous 14-neighbourhood relation: the i-simplices of K(I) (i ∈ {0, 1, 2, 3}) are constituted by the different sets of i + 1 mutually 14-neighbour black points in I (analogously, we can construct another simplicial complex whose i-simplices are the different sets of i + 1 mutually 14-neighbour white points in I). We do not take care of the orientation of the simplices, due to the fact that we are interested in computing the mod 2 cohomology. Since an isomorphism of pictures is equivalent to a simplicial homeomorphism of the corresponding simplicial representations, we are able to define the digital cohomology ring H∗(I; Z2) as the cohomology ring H∗(K(I); Z2). In this simplicial setting, and using the technique of simplicial collapses [5], we topologically thin K(I), obtaining a smaller simplicial complex Mtop K(I). The following step is the computation of the cohomology ring H∗(Mtop K(I); Z2). Since H∗(Mtop K(I); Z2) is isomorphic to H∗(K(I); Z2), the information obtained in this way can be used for "topologically" classifying (up to isomorphisms of pictures) and distinguishing (up to cohomology ring level) 3-d binary digital pictures. A small program, called EditCup, for editing binary digital pictures and visualizing cohomological aspects of them has been designed by the authors and developed by others¹. This software allows us to test, on some simple examples, the potentiality and topological acuity of our method.
2 Simplicial Representation of 3D Pictures
We follow the terminology given in [11] for representing binary digital pictures. A 3D binary digital picture space (or, briefly, DPS) is a triple (V, β, ω), where V is the set of grid points in a 3-d grid and each of β and ω is a set of closed straight line segments joining pairs of points in V. The set β (resp. the set ω) determines the neighbourhood relations between black points (resp. white points) in the grid. An isomorphism of a DPS (V1, β1, ω1) to a DPS (V2, β2, ω2) is a homeomorphism h of the Euclidean 3-space to itself such that h maps V1 onto V2, each β1-adjacency onto a β2-adjacency and each ω1-adjacency onto an ω2-adjacency, and h−1 maps each β2-adjacency onto a β1-adjacency and each ω2-adjacency onto an ω1-adjacency. A 3D digital binary picture is a quadruple I = (V, β, ω, B), where (V, β, ω) is a DPS and B (the set of black points) is a finite subset of V. An isomorphism of a picture I1 = (V1, β1, ω1, B1) to a picture I2 = (V2, β2, ω2, B2) is an isomorphism of the DPS (V1, β1, ω1) to the DPS (V2, β2, ω2) that maps B1 onto B2. The DPS used in this paper, which we call the (14, 14)-DPS, is (Z3, 14, 14), in which the underlying grid is the set of points with integer coordinates in the Euclidean 3-space E3, and the 14-neighbours of a grid point (black or white) with integer coordinates (x, y, z) are: (x ± 1, y, z), (x, y ± 1, z), (x, y, z ± 1), (x + 1, y − 1, z), (x − 1, y + 1, z), (x + 1, y, z − 1), (x − 1, y, z + 1), (x, y + 1, z − 1), (x, y − 1, z + 1), (x + 1, y + 1, z − 1), (x − 1, y − 1, z + 1). Nevertheless, the 14-adjacency for digital pictures has usually been defined on a 3-d body-centered cubic grid (BCC grid) [11]: the grid points are the points (a, b, c) ∈ Z3 such that a ≡ b ≡ c (mod 2), and the 14-neighbours of a grid point p with coordinates (a, b, c) are: (a ± 2, b, c), (a, b ± 2, c), (a, b, c ± 2), (a ± 1, b ± 1, c ± 1). The (14, 14)-DPS and the BCC grid are isomorphic DPSs: a grid point (x, y, z) of the (14, 14)-DPS can be associated to the point (x + y + 2z, −x + y, −x − y) of the BCC grid.

¹ The 1st version was programmed by J.M. Berrio, F. Leal and M.M. Maraver. The 2nd version was programmed by F. Leal. This program has already been presented in [2]. http://www.us.es/gtocoma/editcup.zip
Fig. 1. The 14–neighbours of a grid point p of the (14,14)–DPS (on the left) and the BCC grid (on the right).
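The neighbourhood system and the grid isomorphism above can be checked mechanically. In the following sketch (our own encoding of the offsets), we use the fact that the map (x, y, z) ↦ (x + y + 2z, −x + y, −x − y) is linear, so it suffices to verify that each of the 14 offsets is sent to a BCC neighbour offset of type (±2, 0, 0) or (±1, ±1, ±1), and that image points satisfy the parity condition a ≡ b ≡ c (mod 2):

```python
# 14-neighbour offsets of the (14,14)-DPS, as listed in the text above.
OFFSETS_14 = [
    (1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1),
    (1, -1, 0), (-1, 1, 0), (1, 0, -1), (-1, 0, 1), (0, 1, -1), (0, -1, 1),
    (1, 1, -1), (-1, -1, 1),
]

def to_bcc(p):
    """Grid isomorphism (x, y, z) -> (x + y + 2z, -x + y, -x - y)."""
    x, y, z = p
    return (x + y + 2 * z, -x + y, -x - y)

def is_bcc_neighbour(d):
    """BCC 14-neighbour offsets: (+-2, 0, 0)-type or (+-1, +-1, +-1)-type."""
    return sorted(map(abs, d)) in ([0, 0, 2], [1, 1, 1])

# Image points have coordinates of equal parity, as the BCC grid requires...
a, b, c = to_bcc((3, 5, -2))
assert a % 2 == b % 2 == c % 2
# ...and, by linearity of to_bcc, checking the 14 offsets shows that
# 14-neighbours are mapped to 14-neighbours of the BCC grid:
assert all(is_bcc_neighbour(to_bcc(off)) for off in OFFSETS_14)
print("all 14 neighbour offsets map to BCC neighbour offsets")
```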
3 An Approach to Digital Cohomology Ring
Given a binary digital picture I = (Z3, 14, 14, B) on the (14, 14)-DPS, we can uniquely associate with it a 3-dimensional simplicial complex K(I) that we call the simplicial representation of the digital picture I. The vertices (or 0-simplices) of K(I) are the black points of I. The edges, triangles and tetrahedra are formed by joining two, three and four mutually 14-neighbour points of B, respectively. This naive simplicial construction, together with the satisfactory algorithmic solution presented here to the problem of computing cohomology operations on finite simplicial complexes, will allow us to "cohomologically control" the digital picture I (up to isomorphisms of pictures). Before explaining in detail the different steps of our method, we state the following theorem, whose proof is straightforward and left to the reader.

Theorem 1. Two binary digital pictures, I1 = (Z3, 14, 14, B1) and I2 = (Z3, 14, 14, B2), are isomorphic if and only if the simplicial representations K(I1) and K(I2) are simplicially homeomorphic.

This result allows us to define the following notion:

Definition 1. Given a binary digital picture I = (Z3, 14, 14, B), the digital Z2-cohomology ring of I is defined as the Z2-cohomology ring of K(I).

Since the simplicial complexes considered in this paper are embedded in R3, their homology groups are torsion-free (moreover, the only possibly non-null homology groups are H0(K), H1(K) and H2(K)). Therefore, homology and cohomology are isomorphic. The q-Betti number is the rank of the qth homology group.
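A brute-force sketch of the construction of K(I) (our own code, quartic in the number of black points and meant only as an illustration): the simplices are exactly the sets of up to four mutually 14-adjacent black points:

```python
from itertools import combinations

# 14-neighbour offsets of the (14,14)-DPS, as listed in Sect. 2.
OFFSETS_14 = {
    (1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1),
    (1, -1, 0), (-1, 1, 0), (1, 0, -1), (-1, 0, 1), (0, 1, -1), (0, -1, 1),
    (1, 1, -1), (-1, -1, 1),
}

def adjacent(p, q):
    # the offset set is closed under negation, so this relation is symmetric
    return tuple(a - b for a, b in zip(p, q)) in OFFSETS_14

def simplicial_representation(black_points):
    """K(I): the i-simplices are the sets of i+1 mutually 14-adjacent
    black points (vertices, edges, triangles, tetrahedra)."""
    pts = sorted(black_points)
    K = {0: [(p,) for p in pts], 1: [], 2: [], 3: []}
    for k in (2, 3, 4):  # edges, triangles, tetrahedra
        for combo in combinations(pts, k):
            if all(adjacent(p, q) for p, q in combinations(combo, 2)):
                K[k - 1].append(combo)
    return K

# Three mutually 14-adjacent black points span a triangle (plus its faces).
K = simplicial_representation([(0, 0, 0), (1, 0, 0), (0, 1, 0)])
print(len(K[0]), len(K[1]), len(K[2]), len(K[3]))  # 3 3 1 0
```

A practical implementation would enumerate only local point configurations instead of all point subsets, but the output complex is the same.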
Towards Digital Cohomology
95
In general, the 0th Betti number is the number of connected components, while the 1st and 2nd Betti numbers have intuitive interpretations as the number of independent non-bounding loops and the number of independent non-bounding shells, respectively. Since the Betti numbers are independent of the group of coefficients we consider, throughout the paper the ground ring is Z2. In the next three subsections, we will reinterpret classical methods in Algebraic Topology and Homological Algebra in terms of chain contractions [12], which will enable us to design an algorithm for computing the cohomology rings of binary digital pictures. Reading the appendix first is strongly recommended if the reader is not familiar with the concepts from Algebraic Topology presented in this section. Let us emphasize that a fundamental notion here is that of chain contraction:

Definition 2. A chain contraction from a chain complex C to another chain complex C′ is a set of three homomorphisms (f, g, φ) such that:
– f : C → C′ and g : C′ → C are chain maps;
– fg is the identity map of C′;
– φ : C → C is a chain homotopy of the identity map idC of C to gf, that is, φ∂ + ∂φ = idC + gf.

Important properties of chain contractions are that C′ has fewer or the same number of generators as C, and that C and C′ have isomorphic homology groups. We will also use the following notation: let a be a chain and b an element of a. We denote by (a; b) the new chain obtained by replacing b in a by a variable x and solving the equation a = 0 for the variable x.

3.1 Topological Thinning
Topological thinning is an important preprocessing operation in Image Processing. The aim is to shrink a digital picture to a smaller, simpler picture that retains a lot of the significant information of the original. Then, further processing or analysis can be performed on the shrunken picture. In our approach, a 3D binary digital picture is directly converted into a 3D simplicial complex. There is a well-known process for thinning a simplicial complex using simplicial collapses [3]. Suppose K is a simplicial complex, σ ∈ K is a maximal simplex and σ′ is a free facet of σ. Then, K simplicially collapses onto K − {σ, σ′}. An important property of this process is that there exists an explicit chain contraction from C(K) to C(K − {σ, σ′}) [5]. More generally, a simplicial collapse is any sequence of such operations. A thinned simplicial complex Mtop K is a subcomplex of K with the condition that all the faces of the maximal simplices of Mtop K are shared; it is then obvious that no further collapse is possible. There is also an explicit chain contraction from C(K) to C(Mtop K). In particular, recall that this means that the (co)homology of K and Mtop K are isomorphic. The following algorithm computes Mtop K and a chain contraction (ftop, gtop, φtop) from C(K) to C(Mtop K). Initially, Mtop K = K.
96
Rocio Gonzalez–Diaz and Pedro Real
While there exists a maximal simplex σ with a free facet σ′ do
    Mtop K := Mtop K − {σ, σ′};
    ftop(σ) := 0; φtop(σ) := 0;
    ftop(σ′) := ftop(∂σ; σ′); φtop(σ′) := σ + φtop(∂σ; σ′);
if σ ∈ Mtop K, then ftop(σ) := σ, gtop(σ) := σ and φtop(σ) := 0.
End.

3.2 "Algebraic Thinning"
Having obtained the simpler thinned complex Mtop K(I), we next compute its homology. The computation of a chain contraction (falg, galg, φalg) from the chain complex C(Mtop K(I)) to its homology can be considered as a thinning, at the algebraic level, of C(Mtop K(I)) (for this reason we call it "algebraic thinning"). We compute (falg, galg, φalg) by interpreting the "incremental algorithm" [4] for computing homology groups in R3 in terms of chain homotopies. This procedure is essential for us in order to calculate the cohomology ring of I. Let (σ1, . . . , σm) be a sorted set of all the simplices of a given simplicial complex L with the property that any subset {σ1, . . . , σi}, i ≤ m, is a subcomplex of L. The algorithm computes a chain complex C′ with set of generators h, and a chain contraction (falg, galg, φalg) from C(L) to C′. Initially, h is empty. In the ith step of the algorithm, the simplex σi is added to the subcomplex {σ1, . . . , σi−1} and then a homology class is created or destroyed. If falg∂(σi) = 0 then σi "creates" the class αi. Otherwise, σi "destroys" one homology class involved in the expression of falg∂(σi). At the end of the algorithm, C′ is a chain complex isomorphic to the homology of L. The pseudocode of the algorithm is:

For i = 1 to i = m do
    if falg∂(σi) = 0 then h := h ∪ {αi},
    else let falg(σj) be an element of falg∂(σi);
        falg(σi) := 0; φalg(σi) := 0; h := h − {αj};
        falg(σj) := (falg∂(σi); falg(σj));
        φalg(σj) := σi + (φalg∂(σi); φalg(σj));
For i = 1 to i = m do
    if αi ∈ h then αi := [σi + φalg∂(σi)], falg(σi) := αi,
        galg(αi) := σi + φalg∂(σi), φalg(σi) := 0.
End.

Recall that the output of the algorithm is a chain contraction (falg, galg, φalg) from C(L) to H(L), allowing us to determine both a representative cycle for each homology class and the homology class of each cycle. Moreover, for any q-boundary a on L we can obtain a (q+1)-chain a′ = φalg(a) on L such that a = ∂(a′).
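The creation/destruction mechanism of the incremental algorithm can be sketched with a standard Z2 column reduction. This is our simplified illustration, computing only Betti numbers rather than the authors' full falg/galg/φalg bookkeeping:

```python
def betti_incremental(simplices):
    """Incremental Z2 homology of a filtered simplicial complex.

    `simplices` lists each simplex as a sorted tuple of vertex ids, in an
    order where every prefix is a subcomplex; each simplex either creates
    a q-dimensional class or destroys a (q-1)-dimensional one."""
    index = {}     # simplex -> position in the filtration
    reduced = {}   # pivot position -> reduced boundary column (a set, i.e. Z2 chain)
    betti = {}     # dimension -> current Betti number
    for i, s in enumerate(simplices):
        index[s] = i
        q = len(s) - 1
        # boundary column over Z2: positions of the (q-1)-faces of s
        col = {index[s[:k] + s[k + 1:]] for k in range(q + 1)} if q > 0 else set()
        while col:                      # reduce against the stored columns
            pivot = max(col)
            if pivot not in reduced:
                break
            col ^= reduced[pivot]       # Z2 addition = symmetric difference
        if not col:                     # boundary vanished: s creates a class
            betti[q] = betti.get(q, 0) + 1
        else:                           # s destroys the class created at the pivot
            reduced[max(col)] = col
            betti[q - 1] -= 1
    return betti
```

On the hollow triangle the result is β0 = 1, β1 = 1; adding the 2-simplex (0,1,2) destroys the loop, giving β1 = 0.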
The algorithm runs in time at most O(m^3) if L has m simplices. The idea of computing a contraction from a chain complex to its homology has also been used in [6,7]. In [6], the computation of the contraction is based on a transcription of the reduction algorithm [14, p. 58] and is used for computing primary and secondary cohomology operations.
We can compose the chain contraction (ftop, gtop, φtop) from C(K(I)) to C(Mtop K(I)), described in the section above, with the contraction (falg, galg, φalg) from C(Mtop K(I)) to H(Mtop K(I)) (which is isomorphic to H(K(I))). We then obtain a new chain contraction [12] (falg ftop, gtop galg, φtop + gtop φalg ftop) from C(K(I)) to H(K(I)).

Example 1. Let I be the digital picture shown in Figure 2. The non-null images of the component morphisms of a chain contraction (falg, galg, φalg) from C(K(I)) to H(K(I)) obtained using the algorithm explained above are:

  K       h      falg   galg   φalg
  ⟨1⟩     α1     α1     ⟨1⟩    0
  ⟨2⟩            α1            ⟨1,7⟩ + ⟨6,7⟩ + ⟨5,6⟩ + ⟨4,5⟩ + ⟨3,4⟩ + ⟨2,3⟩
  ⟨3⟩            α1            ⟨1,7⟩ + ⟨6,7⟩ + ⟨5,6⟩ + ⟨4,5⟩ + ⟨3,4⟩
  ⟨4⟩            α1            ⟨1,7⟩ + ⟨6,7⟩ + ⟨5,6⟩ + ⟨4,5⟩
  ⟨2,4⟩          0             ⟨2,3,4⟩
  ⟨5⟩            α1            ⟨1,7⟩ + ⟨6,7⟩ + ⟨5,6⟩
  ⟨6⟩            α1            ⟨1,7⟩ + ⟨6,7⟩
  ⟨7⟩            α1            ⟨1,7⟩
  ⟨1,2⟩   α16    α16    a      0

where a = ⟨1,2⟩ + φalg(⟨2⟩). Therefore, H0(I) ≅ Z2, H1(I) ≅ Z2 and H2(I) = 0.
Fig. 2. A digital picture I and its simplicial representation K(I).
3.3 Computing the Digital Z2-Cohomology Ring
After applying, in order, topological thinning and algebraic thinning to the simplicial representation K of a binary digital picture I, we are able to compute the multiplication table of the cohomology. Let (f, g, φ) be a contraction from C(K) to H(K). Observe that if γ ∈ Hq(K), then γ∗ : Hq(K) → Z2, defined by γ∗(ω) = 1 if ω = γ and γ∗(ω) = 0 otherwise, is a cohomology class of K. Moreover, γ∗f : Cq(K) → Z2 is a representative cocycle of γ∗. Let {α1, . . . , αp} and {β1, . . . , βq} be sets of generators of H1(K) and H2(K); then {α1∗, . . . , αp∗} and {β1∗, . . . , βq∗} are sets of generators of H^1(K) and H^2(K). The cohomology ring of K is computed as follows:

For i = 1 to i = p do
    for j = i to j = p do
        αi∗ ⌣ αj∗ := Σ_{k=1}^{q} ((αi∗f ⌣ αj∗f)(g(βk))) · βk∗.
End.
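The cocycle evaluation inside this loop is a direct application of the cup-product formula recalled in the appendix. A minimal Z2 sketch (our own representation, not the paper's code: a p-cochain is stored as the set of ordered p-simplices on which it takes the value 1):

```python
def cup(c1, c2, simplex):
    """Evaluate (c1 ⌣ c2) on an ordered (p+q)-simplex over Z2.

    c1 is the support of a p-cochain (a set of ordered p-simplices),
    c2 the support of a q-cochain; p is inferred from c1."""
    p = len(next(iter(c1))) - 1
    front, back = simplex[:p + 1], simplex[p:]   # ⟨v0..vp⟩ and ⟨vp..vp+q⟩
    return int(front in c1) * int(back in c2)    # product in Z2
```

For instance, on the triangle (0, 1, 2), the 1-cochains supported on edge (0, 1) and edge (1, 2) have cup product 1.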
Fig. 3. The pictures X and Y and their simplicial representations K and L.
Given a binary digital picture I, the total algorithm for computing the cohomology ring of I runs in time at most O(m^6) if K(I) has m simplices. As we said in Section 1, in order to show an example of the computation and visualization of the cohomology ring of simple 3D binary digital pictures, we present a small prototype called EditCup. We use a free program for building 3D worlds. In our case, a world is a particular 3D simplicial complex K representing a digital picture I considering the 14-adjacency. A way of distinguishing the different maximal simplices of a simplicial representation is by using different colours: red for tetrahedra, green for triangles, blue for edges, and black for vertices. For visualizing (co)chains, the simplices on which a given (co)chain is non-null are lighted in a different colour. On the other hand, the "visualization" of any Z2-(co)homology class on the original binary digital picture I is given by lighting the points of I such that the corresponding vertices span simplices on which the representative cochain of this class (obtained using our algorithm) is non-null. Let us consider now the following pictures (see Figure 3): the torus (the picture X) and the wedge of two topological circles and a topological 2-sphere (the picture Y). In order to compute the cup product, we need the simplicial representations K and L of X and Y, respectively (see Figure 3). It is clear that the (co)homology groups of X are isomorphic to those of Y: they are Z2, Z2 ⊕ Z2 and Z2 in dimensions 0, 1 and 2, respectively. Let us denote by a1, a2 the representative cycles of the generating classes of H1(K) and by a3 that of H2(K); and by a1′, a2′ and a3′ the corresponding cycles for H1(L) and H2(L). We show the visualization of these cycles in Figure 4. In Figure 5 we show the two representative cocycles u, v generating H^1(K) and the cup product w = u ⌣ v, which is a representative cocycle of H^2(K).
If we consider now the representative cocycles u′ and v′ generating H^1(L), and w′ generating H^2(L), then [u′] ⌣ [u′] = [v′] ⌣ [v′] = [u′] ⌣ [v′] = 0. We conclude that X and Y are not isomorphic. Let us note that this multiplication table for the cohomology ring of K is not suitable in general for topological classification tasks, due to the fact that determining whether two rings are isomorphic by means of their respective multiplication tables is an extremely difficult computational question. In order to avoid this problem, we can put the information of the cup product in matrix form M (pairs of cohomology classes of dimension 1 × cohomology classes of dimension 2).

Fig. 4. The cycles a1, a2 and a1′, a2′ (in yellow); and a3 and a3′ (in green).

Fig. 5. The cocycles u, v and u′, v′ (in yellow); and w and w′ (in green).

From the diagonalization D of the matrix M, a first cohomology invariant HB1(I), appropriate for distinguishing non-isomorphic binary digital pictures with isomorphic (co)homology groups, appears.

Definition 3. Given a 3D binary digital picture I, the cohomology invariant HB1(I) is defined as the rank of the matrix M.

For example, the matrices corresponding to the cohomology rings of the pictures X and Y are:

  X      ([u],[u])   ([u],[v])   ([v],[v])
  [w]        0           1           0

  Y      ([u′],[u′])   ([u′],[v′])   ([v′],[v′])
  [w′]       0             0             0
Therefore, HB1(X) = 1 and HB1(Y) = 0. In fact, more complicated topological invariants can be derived from the cohomology ring in a similar way. Constructing these invariants is a very technical matter and we will study it in detail in the near future. Nevertheless, we confine ourselves here to saying that these topological numbers can be directly generated from the homology of a well-known chain complex in Homological Algebra: the reduced bar construction of an algebra [12].
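The invariant HB1 of Definition 3 thus reduces to a rank computation over Z2, which can be sketched with Gaussian elimination on row bitmasks (an illustrative helper of ours, not from the paper):

```python
def z2_rank(matrix):
    """Rank over Z2 of a 0/1 matrix given as a list of rows."""
    basis = {}                          # leading-bit position -> echelon row mask
    for row in matrix:
        m = 0
        for bit in row:                 # pack the row into an integer bitmask
            m = (m << 1) | (bit & 1)
        while m:
            lead = m.bit_length()
            if lead not in basis:       # new pivot: the row is independent
                basis[lead] = m
                break
            m ^= basis[lead]            # reduce by the stored echelon row
    return len(basis)
```

On the matrices above this gives HB1(X) = z2_rank([[0, 1, 0]]) = 1 and HB1(Y) = z2_rank([[0, 0, 0]]) = 0.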
References

1. Ayala R., Domínguez E., Francés A.R., Quintero A.: Homotopy in Digital Spaces. DGCI 2000, LNCS 1953, Springer-Verlag (2000) 3–14
2. Berrio J.M., González-Díaz R., Leal F., López M.M., Real P.: Visualizing Cohomology Aspects of 3D Objects. Proc. of the 6th Asian Tech. Conf. in Math. (2001) 459–468
3. Björner A.: Topological Methods. Handbook of Combinatorics. Elsevier Sci., 2 (1995) 1819–1872
4. Delfinado C.J.A., Edelsbrunner H.: An Incremental Algorithm for Betti Numbers of Simplicial Complexes on the 3-Sphere. Comput. Aided Geom. Design 12 (1995) 771–784
5. Forman R.: Combinatorial Differential Topology and Geometry. New Perspectives in Geom. Combinatorics. MSRI Publ. 8 (1999) 177–206
6. González-Díaz R., Real P.: Computation of Cohomology Operations on Finite Simplicial Complexes. Homology, Homotopy and Applications 5 (2) (2003) 83–93
7. González-Díaz R., Real P.: Geometric Objects and Cohomology Operations. Proc. of the 5th Workshop on Computer Algebra in Scientific Computing (2002) 121–130
8. Kenmochi Y., Imiya A.: Discrete Polyhedrization of Lattice Point Set. Digital and Image Geometry, LNCS 2243, Springer-Verlag (2001) 150–162
9. Khalimsky E.D., Kopperman R.D., Meyer P.R.: Computer Graphics and Connected Topologies on Finite Ordered Sets. Topology and Appl. 36 (1990) 1–17
10. Kong T.Y.: A Digital Fundamental Group. Comput. Graphics 13 (1989) 159–166
11. Kong T.Y., Roscoe A.W., Rosenfeld A.: Concepts of Digital Topology. Topology and its Applications 8 (1992) 219–262
12. MacLane S.: Homology. Classics in Math., Springer-Verlag (1995)
13. Kovalevsky V.A.: Discrete Topology and Contour Definition. Pattern Recognition Letters 2 (1984) 281–288
14. Munkres J.R.: Elements of Algebraic Topology. Addison-Wesley Co. (1984)
15. Rosenfeld A.: 3D Digital Topology. Inform. and Control 50 (1981) 119–127
Appendix: Basic Notions from Algebraic Topology

In this section we briefly explain the main concepts from Algebraic Topology used in this paper. Our terminology follows Munkres' book [14]. The four types of non-empty simplices in R3 are: a 0-simplex, which is a vertex; a 1-simplex, which is an edge; a 2-simplex, which is a triangle; and a 3-simplex, which is a tetrahedron. Considering an ordering on a vertex set V, a q-simplex with vertices v0 < · · · < vq in V is denoted by ⟨v0, . . . , vq⟩. If i < q, an i-face of a q-simplex σ = ⟨v0, . . . , vq⟩ is an i-simplex whose vertices are in the set {v0, . . . , vq}. A facet of σ is a (q − 1)-face of it. A simplex is shared if it is a face of more than one simplex. Otherwise, the simplex is free if it belongs to one higher-dimensional simplex, and maximal if it does not belong to any. A simplicial complex K is a collection of simplices such that every face of a simplex of K is in K and the intersection of any two simplices of K is a face of each of them or empty. The set of all the q-simplices of K is denoted by K(q). A subset K′ ⊆ K is a subcomplex of K if it is a simplicial complex itself. Let K and L be simplicial complexes and let |K| and |L| be the subsets of R3 that
are the union of the simplices of K and L, respectively. Let f : K(0) → L(0) be a map such that whenever the vertices v0, . . . , vn of K span a simplex of K, the points f(v0), . . . , f(vn) are vertices of a simplex of L. Then f can be extended to a continuous map g : |K| → |L| such that if x = Σi ti vi then g(x) = Σi ti f(vi). The map g is called a simplicial homeomorphism if f is bijective and the points f(v0), . . . , f(vn) always span a simplex of L.

A chain complex C is a sequence

· · · −→ C_{q+1} −−∂_{q+1}−→ C_q −−∂_q−→ C_{q−1} −→ · · ·

of abelian groups Cq and homomorphisms ∂q, indexed by the integers, such that ∂q ∂q+1 = 0 for all q. A q-chain a ∈ Cq is called a q-cycle if ∂q(a) = 0. If a = ∂q+1(a′) for some a′ ∈ Cq+1, then a is called a q-boundary. We denote the groups of q-cycles and q-boundaries by Zq and Bq respectively, and define Z0 = C0. Since Bq ⊆ Zq, define the qth homology group to be the quotient group Zq/Bq, denoted by Hq(C). Given a ∈ Zq, the coset a + Bq is the homology class in Hq(C) determined by a. We denote this class by [a]. Let C = {Cq, ∂q} and C′ = {C′q, ∂′q} be two chain complexes. A chain map f : C → C′ is a family of homomorphisms {fq : Cq → C′q} such that ∂′q fq = fq−1 ∂q. A chain map f : C → C′ induces a homomorphism f∗ : H(C) → H(C′). Let K be a simplicial complex. A q-chain a on K is a formal sum of simplices of K(q). The q-chains form a group with respect to component-wise addition mod 2; this group is the qth chain group of K, denoted by Cq(K). The boundary of a q-simplex σ = ⟨v0, . . . , vq⟩ is the formal sum ∂q(σ) = Σ_{i=0}^{q} ⟨v0, . . . , v̂i, . . . , vq⟩, where the hat means that vi is omitted. By linearity, the boundary operator ∂q can be extended to q-chains. The homology of K, denoted by H(K), is defined as the homology of the chain complex C(K). Let C = {Cq, ∂q} be a chain complex. Define the q-dimensional cochain group of C by the equation C^q(C) = {c : Cq → Z2 such that c is a homomorphism}. The boundary operator ∂q+1 on Cq+1 induces the coboundary operator δq : C^q → C^{q+1} via δq c = c ∂q+1. It follows that δq δq−1 = 0. In the obvious way, there are also the dual notions of cocycles, coboundaries and cohomology of a cochain complex C∗(C). Given a simplicial complex K, C^q(K) denotes the q-cochain group C^q(C(K)). Observe that a q-cochain c can be defined on the q-simplices of K and naturally extended to Cq(K).
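The identity ∂q ∂q+1 = 0 can be checked mechanically over Z2, where a chain is a set of simplices and addition is symmetric difference (an illustrative sketch, not from the paper):

```python
from itertools import combinations

def boundary(chain):
    """Z2 boundary of a chain, given as a set of simplices (sorted vertex tuples)."""
    out = set()
    for s in chain:
        if len(s) == 1:
            continue                               # the boundary of a vertex is zero
        for f in combinations(s, len(s) - 1):      # drop one vertex at a time
            out ^= {f}                             # mod-2 addition: symmetric difference
    return out

tet = {(0, 1, 2, 3)}
assert boundary(boundary(tet)) == set()            # ∂∂ = 0: every face cancels in pairs
```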
Define the cup product ⌣ : C^p(K) × C^q(K) → C^{p+q}(K) by the formula (c ⌣ c′)(σ) = c(⟨v0, . . . , vp⟩) · c′(⟨vp, . . . , vp+q⟩), where σ = ⟨v0, . . . , vp+q⟩ ∈ K(p+q). It induces an operation ⌣ : H^p(K) × H^q(K) → H^{p+q}(K) that is bilinear, associative, independent of the ordering of the vertices of K and topologically invariant, as follows: [c] ⌣ [c′] = [c ⌣ c′].
New Results about Digital Intersections

Isabelle Sivignon¹, Florent Dupont², and Jean-Marc Chassery¹

¹ Laboratoire LIS, Domaine universitaire Grenoble - BP46, 38402 St Martin d'Hères Cedex, France, {sivignon,chassery}@lis.inpg.fr
² Laboratoire LIRIS - Université Claude Bernard Lyon 1, Bâtiment Nautibus - 8, boulevard Niels Bohr, 69622 Villeurbanne cedex, France, [email protected]
Abstract. Digital geometry is very different from Euclidean geometry in many ways and the intersection of two digital lines or planes is often used to illustrate those differences. Nevertheless, while digital lines and planes are widely studied in many areas, very few works deal with the intersection of such objects. In this paper, we investigate the geometrical and arithmetical properties of those objects. More precisely, we give some new results about the connectivity, periodicity and minimal parameters of the intersection of two digital lines or planes. Keywords: Digital straight lines and planes, intersection.
1 Introduction
Digital straight line and digital plane properties have been widely studied in many fields like topology, geometry and arithmetics. Topologically, those objects are well defined according to the digitization scheme employed. On the geometrical ground, connectivity features have been determined and a characterization using convex hull properties [1] has been proposed. Finally, an arithmetical definition [2,3] provides a general model to handle all the definitions proposed so far. Those properties led to many recognition algorithms. Geometric algorithms [4] decide whether a set of pixels/voxels is a digital line/plane or not, and arithmetical algorithms [5] moreover return, for a given digitization scheme, the parameters of the Euclidean lines/planes the digitization of which contains the set of pixels/voxels.

Discrete geometry differs from Euclidean geometry in many ways, and the intersection of two digital lines is often used to illustrate this difference. Indeed, while the intersection of two Euclidean lines is a Euclidean point, the intersection of two digital lines can be a discrete point, a set of discrete points or even empty on rectangular grids. However, only a few works deal with the properties of digital line or plane intersections. Nevertheless, a good knowledge of those objects is useful, for instance during the polygonalization process of a discrete curve or a discrete surface. Indeed, this process implies the definition of edges and vertices that are to be found in the intersection of digital lines in the case of polygonal curves, or digital planes in the case of digital surfaces. In [6], using the arithmetical definition of a discrete line/plane, Debled et al. present a definition of the set of intersection pixels/voxels of two digital lines/planes using a unimodular matrix. This definition enables the design of an efficient algorithm to determine all the pixels/voxels of an intersection given the parameters of the two lines/planes. However, no results are given about the topology and arithmetics of this intersection.

In this paper, we present new results about digital line and digital plane intersections. We focus our study on two properties that describe both topology and arithmetics: connectivity and minimal parameters. The first part deals with the intersection of two digital lines. We present a criterion to analyze the connectivity of the intersection of any two digital lines, thus completing the results presented in [2] for lines with slopes between 0 and 1. Then, we propose a study of the minimal arithmetic parameters of digital line intersections and give a result allowing the design of an efficient algorithm to find those parameters. The second part deals with digital plane intersections: after some results about connectivity characteristics, we prove that the intersection is periodic and give the minimal period. Finally, we define and determine the minimal parameters of the intersection of two digital planes.

I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 102–113, 2003. © Springer-Verlag Berlin Heidelberg 2003

Fig. 1. (a) The digital naive line (2, −3, 0); (b) Freeman code; (c) Two naive lines with no common direction; (d) Two naive lines with one common direction; (e) Two naive lines in the same octant.
2 Digital Lines Intersection
In this section, we focus on the properties of digital lines intersections. A digital naive line of parameters (a, b, µ) is the set of integer points {(x, y)} fulfilling the conditions 0 ≤ ax + by + µ < max(|a|, |b|). An illustration is proposed in Figure 1(a). Let us consider two digital naive lines denoted L1 and L2 . L1 ∩ L2 is a set of pixels the connectivity of which depends on the parameters of the two digital lines.
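The arithmetical definition gives an immediate membership test, from which the intersection pixels can be enumerated over a finite window (an illustrative sketch; function names are ours):

```python
def on_naive_line(a, b, mu, x, y):
    """Pixel (x, y) belongs to the digital naive line (a, b, mu)."""
    r = a * x + b * y + mu
    return 0 <= r < max(abs(a), abs(b))

def intersection(l1, l2, window=range(-20, 21)):
    """Pixels of L1 ∩ L2 within a square window, in lexicographic order."""
    return sorted((x, y) for x in window for y in window
                  if on_naive_line(*l1, x, y) and on_naive_line(*l2, x, y))
```

For example, the lines (0, 1, 0) (the pixels with y = 0) and (1, 0, 0) (the pixels with x = 0) intersect in the single pixel (0, 0).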
2.1 Connectivity
In 1991, J.-P. Reveillès [2] proposed a criterion to determine whether the intersection of two digital naive lines with slopes between 0 and 1 is connected or not. Nevertheless, he did not give any information about the intersection of two arbitrary digital naive lines. We propose here such a criterion using the Freeman code depicted in Figure 1(b). These directions define 8 octants, but only 4 remain if we consider symmetries around the central point. For instance, the octant {4, 5} is equivalent to the octant {0, 1}. A classical result is that the Freeman code of any digital naive line is composed of at most two consecutive different directions, which means that one digital line belongs to one octant.

Proposition 1. Let L1 and L2 be two digital naive lines. Then:
– if they belong to the same octant, their intersection may not be connected, and [2] gives a criterion to analyze exactly the connectivity;
– if they belong to two neighboring octants, their intersection is either empty or connected;
– otherwise, their intersection is either empty or reduced to a unique pixel.

In the following we denote by F1 (resp. F2) the set of directions composing the Freeman code of L1 (resp. L2). An illustration is given in Figure 1.

Proof. Let L1 and L2 be two digital naive lines. If L1 and L2 belong to the same octant, |F1 ∩ F2| = 2. If they belong to neighboring octants, |F1 ∩ F2| = 1. Otherwise |F1 ∩ F2| = 0. Let us give a classification of the pixels of L1 and L2. We denote by p1,k = p2,k = pk the pixel of L1 ∩ L2 with minimal x-coordinate and maximal y-coordinate, if there exists one. Then, p1,k+1 (resp. p2,k+1) is the successor of pk along L1 (resp. L2) with increasing x-coordinate.
– If |F1 ∩ F2| = 0 (Figure 1(c)), then p1,k+1 ≠ p2,k+1, as they are the successors of the same point using two different directions. Suppose that L1 is composed of Freeman codes 0 and 1, and that L2 is composed of 2 and 7. The other cases are symmetrical. Then, let us consider a pixel p1(xp, yp1) ∈ L1 with xp greater than the x-coordinate xk of pk, and p2(xp, yp2) ∈ L2. Then, yk ≤ yp1 and yp2 ≤ yk − (xp − xk), with xp ≥ xk. Hence, the two lines do not have any common point after pk.
– If |F1 ∩ F2| = 1 (Figure 1(d)), then let us denote by α1i (resp. α2i) the direction used from p1,i to p1,i+1 (resp. p2,i to p2,i+1). Hence, while α1i = α2i, i ≥ k, we have p1,i+1 = p2,i+1. Both pixels p1,i and p1,i+1 belong to the intersection and are 8-connected. Unless the two lines are confounded, there exists j such that α1j ≠ α2j. Hence, p1,j+1 ≠ p2,j+1. Suppose that L1 is composed of Freeman codes 0 and 1, and that L2 is composed of 1 and 2. The other cases are symmetrical. Then, let us consider a pixel p1(xp, yp1) ∈ L1 with xp greater than the x-coordinate xj of p1,j = p2,j, and p2(xp, yp2) ∈ L2. Then, yp1 ≤ yj + (xp − xj − 1) and yp2 ≥ yj + (xp − xj). Hence, the two lines do not have any common point after p1,j.
– If |F1 ∩ F2| = 2 (Figure 1(e)), we refer to [2] to analyze the connectivity.
2.2 Minimal Parameters
The intersection of two digital lines is a set of collinear discrete points. To characterize this set of points, it is interesting to know the straight lines whose digitization contains all the intersection pixels. Obviously, the two lines we are studying are solutions. Consider a straight line y = α0x + β0, 0 ≤ α0, β0 ≤ 1, thus in the octant {0, 1}. Its digitization with the Object Boundary Quantization (OBQ) is the set of discrete points lying on or just under the line. Given a set of discrete points P, we call preimage, and denote by D(P), the set of straight lines (α, β) : y = αx + β the OBQ digitization of which contains the discrete points of P.

Definition 1. Let P be a set of discrete points and D(P) its preimage. The minimal parameters of P are the values (a/b, µ/b) ∈ D(P) such that b and µ are minimal.

In the following, we show how to find the minimal parameters of the intersection of any two digital naive lines using two different methods, emphasizing the links between them.

Preimage Study. First of all, we show how to find the directional vector of the minimal parameters by studying the structure of the intersection preimage. To study the intersection of any two digital lines, we need to work in the same straight-line parameter space for any slope, greater or smaller than 1. In [7], Veelaert shows that the transformation from the space where a ≥ b to the space where a ≤ b can be done with a central symmetry in a 3D space. Thus, we can work in the straight-line parameter space where a point (α, β) represents the line y = αx + β, for all α and β. In this space, the preimage of a digital straight line of slope a/b with a ≤ b and no remainder is the segment [(a/b, 0), (a/b, 1/b)], and the preimage of a digital straight line of slope a/b with a ≥ b and no remainder is the segment [(a/b, 0), (a/b, −1/b)]. For instance, the preimage of the line of slope 1 is the segment [(1, −1), (1, 1)] in the parameter space.

We consider two digital naive lines L1 and L2 with slopes a/b and c/d and no remainder, and their intersection I = L1 ∩ L2. Without loss of generality, we assume that a/b < c/d. We denote by D(L1) (resp. D(L2)) the preimage of L1 (resp. L2). The preimage D(I) of L1 ∩ L2 is a convex polygon including D(L1) and D(L2), and its convexity implies that it includes the segment [(a/b, 0), (c/d, 0)] (see Figure 2 for illustrations). Moreover, as I contains all the discrete points belonging simultaneously to L1 and L2, adding one more pixel of L1 or L2 to I cuts D(I) into two parts, one including D(L1) and the other including D(L2).

Theorem 1. The minimal directional vector of the intersection of two lines of slopes a/b and c/d, a/b < c/d, is given by the rational fraction u/v lying between a/b and c/d with minimal denominator v.

Proof. Consider the set of discrete points belonging to L1 and L2, I = L1 ∩ L2, and call D(I) its preimage. We divide the proof of the theorem into 3 cases, which are depicted in Figure 2.
Fig. 2. Illustration of the three cases of Theorem 1.
– Assume that a/b ≤ 0 and c/d ≥ 0. Then, the fraction 0/1 lies between a/b and c/d. Consequently, the line with slope 0/1 is a solution, and obviously the solution with minimal denominator (cf. Figure 2a).
– Assume that a/b ≤ 1 and c/d ≥ 1. Then, the fraction 1/1 lies between a/b and c/d, and from what we said before, we deduce that the line with slope 1/1 is a solution, and thereby the one with minimal denominator (cf. Figure 2b).
– Assume that 0 ≤ a/b < c/d ≤ 1. We know that any fraction between a/b and c/d is a solution. In particular, the fraction with minimal denominator lying between a/b and c/d is a solution. We show that there does not exist a solution fraction with a smaller denominator outside the segment defined by a/b and c/d. Suppose that there exists such a fraction, denoted u/v. Then, v < b and v < d. Suppose that u/v < a/b and that |a/b − u/v| is minimal for the set of irreducible fractions smaller than a/b with denominator v. The case u/v > c/d is symmetrical. Consider the discrete point p(−v, −u − 1). Adding this point to L1 ∩ L2 implies two new half-space constraints given by 0 ≤ −αv + u + 1 + β < 1 in the straight-line parameter space. This strip is delimited by two lines l1 : −αv + u + 1 + β = 0 and l2 : −αv + u + 1 + β = 1. l1 cuts the x-coordinate axis at x = (u + 1)/v and l2 at x = u/v (see Figure 2c). Thus, since v is smaller than any denominator of the fractions lying between a/b and c/d, (u + 1)/v is either greater than c/d or smaller than a/b. But since we assumed that u/v was the closest fraction with denominator v smaller than a/b, we get that u/v < a/b < c/d < (u + 1)/v. Finally, D(I ∪ p) includes at the same time D(L1) and D(L2), which leads to a contradiction.

All the remaining cases can be treated as one of those three.
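Theorem 1 translates directly into an algorithm: the fraction with minimal denominator in [a/b, c/d] can be found by a continued-fraction descent, which is equivalent to a Stern-Brocot search (an illustrative sketch; the helper name is ours):

```python
import math
from fractions import Fraction

def simplest_between(lo, hi):
    """Fraction with minimal denominator in the closed interval [lo, hi]."""
    if lo > hi:
        lo, hi = hi, lo
    n = math.ceil(lo)
    if n <= hi:                       # an integer lies in the interval
        return Fraction(n)
    fl = math.floor(lo)               # lo and hi share the same integer part:
    # strip it, invert the interval and recurse (continued-fraction step)
    return fl + 1 / simplest_between(1 / (hi - fl), 1 / (lo - fl))
```

For instance, the simplest fraction between 2/5 and 3/5 is 1/2, and any interval containing 0 (the first case of the proof) yields the slope 0.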
Geometrical Method. The preimage study gives us the value of the minimal directional vector of the intersection of two digital lines. We propose here a geometrical point of view that leads to an algorithm to find both the minimal directional vector and the corresponding remainder. To do so, let us introduce a structure called the Stern-Brocot tree (see [8] for a complete definition or [9] for a more informal approach), which contains all the positive irreducible rational fractions. An illustration of this tree is proposed in Figure 3(a). The idea behind its construction is to begin with the two fractions 0/1 and 1/0 and to repeatedly insert mediants: insert (m + m′)/(n + n′) between m/n and m′/n′. Many works deal with the relations between irreducible rational fractions and digital lines (see [10,11] for a characterization with Farey series, and [12] for a link with decomposition into continued fractions), but in [5], Debled first introduced the link between this tree and digital lines. She noticed that recognizing a piece of digital line is like going down the Stern-Brocot tree up to the directional vector of the line. In the following, we call the Stern-Brocot tree root the two fractions 0/1 and 1/0.

Fig. 3. (a) Stern-Brocot tree: positive and negative irreducible rational fractions. (b) Decomposition of one period of the digital line of slope 5/8: for each fraction of the path in the Stern-Brocot tree, the corresponding subset of pixels of the line.

Theorem 2. Let L be a digital line of slope a/b, and S(a/b) be the path going from the Stern-Brocot tree root to the fraction a/b. Then, for each fraction ai/bi lying on S(a/b), there exists a subset of bi + 1 pixels of L having minimal directional vector ai/bi. Moreover, for any other fraction, there does not exist such a subset of L.

This theorem means that the path leading to the fraction a/b represents all the patterns of length smaller than b included in L. If b = 0 for a given digital line, then we consider the fraction a/b and the same results hold. Before the proof of this theorem, let us give a few lemmas. The proof of Lemma 1 was given by Dorst and Duin in [13].

Lemma 1. Let L1 and L2 be two digital naive lines of slopes u1/v1 and u2/v2 such that u2v1 − u1v2 = 1. Let C1 (resp. C2) be the Freeman code associated to a period of L1 (resp. L2), of length v1 + 1 (resp. v2 + 1). Then, the Freeman code associated to a period of the digital naive line of slope (u1 + u2)/(v1 + v2) is C1C2, of length v1 + v2 + 1.

An illustration of this lemma is given in Figure 3(b).
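Lemma 1 can be verified computationally. In the sketch below (our own encoding, not the paper's), the period of the line of slope a/b is written as its sequence of elementary Freeman moves, one symbol per x-step, so a period has b moves; the paper counts pixels instead, hence the lengths v + 1:

```python
def period_code(a, b):
    """One period of moves of the naive line of slope a/b (0 <= a <= b)."""
    # move "1" (north-east) whenever the line crosses a horizontal grid line
    return "".join(str((i + 1) * a // b - i * a // b) for i in range(b))

# mediant of 1/2 and 1/1 (determinant 1*2 - 1*1 = 1) is 2/3:
assert period_code(1, 2) + period_code(1, 1) == period_code(2, 3)
# mediant of 0/1 and 1/2 is 1/3:
assert period_code(0, 1) + period_code(1, 2) == period_code(1, 3)
```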
Isabelle Sivignon, Florent Dupont, and Jean-Marc Chassery
We call mothers of a fraction u/v the two fractions u1/v1 and u2/v2 such that (u1 + u2)/(v1 + v2) = u/v. Hence, we have the following result:

Lemma 2. Let a/b be an irreducible rational fraction and S(a/b) its related path. Then, the mothers of a/b lie on S(a/b). Moreover, if we denote by A(a/b) the set of ancestors of a/b according to the definition of mothers, we have S(a/b) = A(a/b).

This lemma is directly derived from the definition and construction of the Stern-Brocot tree.

Proof (Theorem 2). Let a/b be an irreducible rational fraction and S(a/b) its related path. Let u/v ∈ S(a/b) be another rational fraction. There are two possibilities:
– if u/v is one of the mothers of a/b, then we derive the result from Lemma 1;
– otherwise, according to Lemma 2, u/v is one of the ancestors of a/b, and the result is obtained by induction. □

The ancestors of a/b represent all the connected subsets of discrete points that appear in the digital line of slope a/b. As S(a/b) = A(a/b), there is no fraction outside the path corresponding to a connected pattern of the digital line of slope a/b. Hence, each node of the tree matches a pattern. Since the intersection of two digital lines is composed of patterns appearing in the two lines, we just have to look for the closest common ancestor of the two corresponding fractions to find the minimal parameters of the intersection.

Theorem 3. Let L1 and L2 be two digital lines of slopes a1/b1 and a2/b2. Then, the minimal parameters of L1 ∩ L2 are given by the closest common ancestor of a1/b1 and a2/b2 in the Stern-Brocot tree.

If the two digital lines studied are such that b1 = 0 and a2 = 0, then the corresponding nodes are the root of the Stern-Brocot tree, and the minimal parameters are any of the two fractions of the root. Originally, the Stern-Brocot tree defines only the positive irreducible rational fractions. In order to study the intersection of any two digital lines, we generalize this tree by adding its negative symmetrical counterpart, as shown in Figure 3(a). It is easy to see with the preimage study or the geometrical method that the directional vector found for two digital lines with no remainder is also a solution for any remainder. Nevertheless, if the cardinality of the intersection is smaller than the length of the common pattern described by the directional vector found, there exist smaller parameters. In that case, the minimal directional vector can be found among the common ancestors of the two fractions in the Stern-Brocot tree, by looking for the one with the smallest denominator greater than or equal to the intersection cardinality minus 1. Theorems 1 and 3 are equivalent, as looking for the closest common ancestor of two fractions in the Stern-Brocot tree is like looking for the fraction with minimal denominator lying between those two fractions. □
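Theorem 3 suggests a direct procedure: descend the Stern-Brocot tree towards each slope and keep the deepest node shared by both descent paths. The sketch below (hypothetical helper names, positive slopes only) illustrates this on the slopes 4/5 and 5/8 of Figure 4:

```python
from fractions import Fraction

def sb_path(a, b):
    """Nodes (mediants) visited while descending the Stern-Brocot tree
    from the root down to the positive irreducible fraction a/b."""
    target = Fraction(a, b)
    (ln, ld), (rn, rd) = (0, 1), (1, 0)      # bounds: root fractions 0/1 and 1/0
    path = []
    while True:
        mn, md = ln + rn, ld + rd            # mediant of the two bounds
        node = Fraction(mn, md)
        path.append(node)
        if node == target:
            return path
        if target < node:
            rn, rd = mn, md                  # descend into the left subtree
        else:
            ln, ld = mn, md                  # descend into the right subtree

def closest_common_ancestor(f1, f2):
    """Deepest node shared by the two descent paths (Theorem 3)."""
    cca = None
    for x, y in zip(sb_path(*f1), sb_path(*f2)):
        if x != y:
            break
        cca = x
    return cca

# Slopes 4/5 and 5/8 share the ancestors 1/1, 1/2, 2/3 in the tree:
assert closest_common_ancestor((4, 5), (5, 8)) == Fraction(2, 3)
```

When one fraction is an ancestor of the other, the loop exhausts the shorter path and correctly returns the ancestor itself.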
Nevertheless, this geometrical point of view is useful to design an efficient algorithm to determine the
New Results about Digital Intersections
Fig. 4. Remainder calculation for the digital lines (4, −5, µ1) and (5, −8, µ2).
minimal directional vector. Moreover, we show that this method enables us to find the minimal remainder associated with this minimal directional vector. Let us define the following labelling L of the Stern-Brocot tree nodes:
– L(0/1) = µ and L(1/0) = µ′;
– let a/b be a node and u1/v1 and u2/v2 its mothers: then L(a/b) = L(u1/v1) + L(u2/v2).

Finally, L(a/b) = bµ + aµ′. Each node label thus depends on only two variables. Now let us consider the intersection of two digital lines L1(a, −b, µ1) and L2(c, −d, µ2). Matching the remainder values with the corresponding node labels, we get the following system:

bµ + aµ′ = µ1
dµ + cµ′ = µ2

Hence, we can deduce the values of µ and µ′, and injecting those values into the label of the node corresponding to the intersection parameters, we get the remainder of the intersection. Figure 4 illustrates this with an example.
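The remainder computation can be sketched as follows (the helper name `intersection_remainder` is ours; exact rational arithmetic): solve the 2×2 system by Cramer's rule and evaluate the label of the closest common ancestor node:

```python
from fractions import Fraction

def intersection_remainder(line1, line2, cca):
    """Solve  b*mu + a*mu' = mu1  and  d*mu + c*mu' = mu2,  then evaluate
    the label  w*mu + u*mu'  of the closest common ancestor node u/w."""
    (a, b, mu1), (c, d, mu2) = line1, line2   # line (a, -b, mu) has slope a/b
    det = b * c - a * d                       # determinant of the system
    mu = Fraction(c * mu1 - a * mu2, det)     # Cramer's rule
    mup = Fraction(b * mu2 - d * mu1, det)
    u, w = cca
    return w * mu + u * mup

# Lines (4, -5, 2) and (5, -8, 3): slopes 4/5 and 5/8 have closest common
# ancestor 2/3, labelled 3*mu + 2*mu'.  Here mu = 2/7, mu' = 1/7.
r = intersection_remainder((4, 5, 2), (5, 8, 3), (2, 3))
assert r == Fraction(8, 7)
```

The sketch assumes the two slopes are distinct, so the determinant is nonzero.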
3 Digital Planes Intersection
In this part, we extend the properties found on digital line intersections to digital plane intersections and present some properties peculiar to planes. The grid considered is a cubic grid with 26-6 connectivity.

3.1 Periodicity
Proposition 2. Let P1(a, b, c, µ) and P2(d, e, f, ν) be two digital planes. Let v = (v1, v2, v3)^T be the cross product of (a, b, c)^T and (d, e, f)^T. Let g = gcd(v1, v2, v3) and v′ = (1/g)v. Then P1 ∩ P2 is periodic with period v′.
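Proposition 2 is easy to check numerically; the sketch below (hypothetical helper names) computes v′ and verifies that both remainder functions are invariant under a translation by v′. Parallel planes, for which the cross product vanishes, are not handled:

```python
from math import gcd

def cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def period(n1, n2):
    """Period v' of P1 ∩ P2 (Proposition 2): the cross product of the
    normal vectors divided by the gcd of its components."""
    v = cross(n1, n2)
    g = gcd(gcd(abs(v[0]), abs(v[1])), abs(v[2]))
    return tuple(c // g for c in v)

n1, n2 = (1, 3, 5), (2, 3, 4)
vp = period(n1, n2)                                   # (-1, 2, -1)
# Translating by v' leaves both remainders r(x,y,z) = a*x+b*y+c*z+mu unchanged:
assert sum(a * b for a, b in zip(n1, vp)) == 0
assert sum(a * b for a, b in zip(n2, vp)) == 0
```

The two dot products vanish because v′ is orthogonal to both normal vectors, which is exactly the invariance used in the proof.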
Proof. Let us denote by r1(x, y, z) = ax + by + cz + µ and r2(x, y, z) = dx + ey + fz + ν the remainder functions of the two planes. Let M(x, y, z) ∈ P1 ∩ P2. Then M + tv′ is not an integer point if t is not an integer. We show that M + v′ belongs to P1 ∩ P2 and that r1(M + v′) = r1(M) and r2(M + v′) = r2(M):

r1(M + v′) = ax + by + cz + µ + (1/g)(abf − ace + bcd − abf + ace − bcd) = r1(M)

The same calculation can be done with the r2 function, and this shows that P1 ∩ P2 is periodic with period v′. □

3.2 Minimal Parameters
In this part, we focus on the minimal parameters of the intersection of two digital planes. To work in the same parameter space for any parameters, we use the same trick as the one proposed by Veelaert [7] for lines, presented in Section 2.2. Hence, we work in the parameter space where a point (α0, β0, γ0) stands for the plane α0·x + β0·y + z + γ0 = 0 in the Cartesian space, for any value of α0, β0 and γ0. Given two digital planes P1 and P2, we look for the plane parameters (u, v, w, µ) with minimal w and µ the OBQ digitization of which contains all the voxels of P1 ∩ P2. In the following, we consider digital naive planes with no remainder: digital naive planes are the thinnest 18-connected digital planes without 6-connected holes. First of all, Proposition 3 gives a description of the intersection preimage.

Proposition 3. Let P1(a, b, c, 0) and P2(d, e, f, 0) be two digital naive planes. We denote I = P1 ∩ P2. Then, D(I) is a polygon included in the plane perpendicular to γ = 0 and containing the points (a/c, b/c, 0) and (d/f, e/f, 0).

Proof. Since the two planes have no remainder, the point (0, 0, 0) is a lower leaning point of the two digital planes. As I is periodic with period v′ (Proposition 2), for every integer t, the point tv′ belongs to P1 ∩ P2 and is a lower leaning point of the two digital planes. In the dual space, the point tv′ corresponds to the two constraints 0 ≤ αtv′1 + βtv′2 + tv′3 + γ < 1. Since tv′ is a lower leaning point for the two digital planes, the plane αtv′1 + βtv′2 + tv′3 + γ = 0 goes through the two points (a/c, b/c, 0) and (d/f, e/f, 0). Hence, for all t, D(I) is constrained by the plane αtv′1 + βtv′2 + tv′3 + γ = 0, equivalent to αv′1 + βv′2 + v′3 + (1/t)γ = 0 for t ≠ 0. When t goes to +∞, the normal vector of this plane converges to the value (v′1, v′2, 0) with positive values of t, and with negative values of t when t goes to −∞. □
Then, for infinite planes, D(I) is reduced to a polygon included in the plane with normal vector (v′1, v′2, 0) which contains the two points (a/c, b/c, 0) and (d/f, e/f, 0). An example of an intersection preimage is given in Figure 5. This description enables us to characterize the minimal parameters of I:
Fig. 5. Preimage of the intersection of the digital naive planes P1 (1, 3, 5, 0) and P2 (2, 3, 4, 0).
Theorem 4. Let P1(a, b, c, 0) and P2(d, e, f, 0) be two digital naive planes. We denote by A(a/c, b/c, 0) and B(d/f, e/f, 0) the corresponding points in the parameter space, and I = P1 ∩ P2. Then, the minimal normal vector of I is given by the point (u/w, v/w, 0) on [AB] with minimal w.

Proof. Without loss of generality, we suppose that a/c ≤ d/f. To prove this theorem, we use the results obtained for digital lines, via the decomposition of a digital plane into digital lines presented in [14]. Indeed, we can decompose any digital plane P(a, b, c, µ) into digital 3D lines: for instance, a decomposition along the y axis gives the set of lines Sy_j(P) = {(x0, y0, z0) ∈ P | y0 = j}, ∀j ∈ Z. For two out of these three possible decompositions, those lines are naive lines, and for the third one, they are thicker than naive lines. Since I is a piece of naive plane, we can use this decomposition. Consider the decomposition of I along the y axis. We denote by Sy_j(I) the 3D digital lines of this decomposition. Then we have D(I) = ∩_j D(Sy_j(I)). Moreover, Sy_j(I) = Sy_j(P1 ∩ P2) = Sy_j(P1) ∩ Sy_j(P2), as Sy_j(I) is the set of voxels of P1 ∩ P2 the y-coordinate of which is j. Let us consider the set Sy_0(I) = Sy_0(P1) ∩ Sy_0(P2). Then, we get two cases:
– if Sy_0(P1) and Sy_0(P2) are naive lines, we denote them N3D,1(a, c, 0) and N3D,2(d, f, 0). Then, Sy_0(I) = N3D,1 ∩ N3D,2.
– otherwise, Sy_0(P1) or Sy_0(P2) is thicker than a naive line but contains the naive line of the previous case. Thus we have Sy_0(I) ⊃ N3D,1 ∩ N3D,2.
If we consider the preimages of those sets, we then get the following property: D(Sy_0(I)) ⊆ D(N3D,1 ∩ N3D,2). N3D,1 ∩ N3D,2 is a piece of 3D naive line, and its preimage is a prism such that the basis in the plane β = 0 is the preimage of the intersection of the two 2D naive lines N2D,1(a, c, 0) and N2D,2(d, f, 0), and such that the directional vector is (1, 0, 0)^T. Let p(u/w, v/w, k/w) be a point of D(I), as illustrated in Figure 6. Then p ∈ D(Sy_0(I)) and thus p ∈ D(N3D,1 ∩ N3D,2). The projection of p along the prism previously described onto the plane β = 0 is the point proj(p)(u/w, 0, 0). proj(p) ∈ D(N2D,1 ∩ N2D,2) and, according to the results about the preimage of the intersection of two digital 2D naive lines, if w < c and w < f, then
Fig. 6. Illustration of the proof of Theorem 4.

a/c ≤ u/w ≤ d/f. If a/c = d/f, then b/c = e/f, and the same argument can be applied using a decomposition along the x axis. Otherwise, we finally derive that, if w < c and w < f, then p belongs to [AB], from the structure of D(I) presented in Proposition 3. This shows that the minimal parameters are to be found on [AB]. □
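Theorem 4 can be illustrated by brute force (this is not the paper's algorithm, and the helper name is ours): scan the denominators w = 1, 2, . . . and stop at the first point (u/w, v/w, 0) of [AB] whose coordinates both have denominator dividing w:

```python
from fractions import Fraction
from math import ceil, floor

def minimal_normal(A, B, w_max=100):
    """Brute-force sketch of Theorem 4: find the point (u/w, v/w, 0) on the
    parameter-space segment [A, B] with minimal w.  A and B are pairs of
    Fractions (the gamma = 0 coordinate is implicit); assumes A.x != B.x."""
    (ax, ay), (bx, by) = A, B
    lo, hi = min(ax, bx), max(ax, bx)
    for w in range(1, w_max + 1):
        for u in range(ceil(lo * w), floor(hi * w) + 1):
            x = Fraction(u, w)
            # ordinate of the point of line (AB) with abscissa x
            y = ay + (x - ax) * (by - ay) / (bx - ax)
            if (y * w).denominator == 1:
                return u, int(y * w), w
    return None

# Planes P1(1,3,5,0) and P2(2,3,4,0) give A(1/5, 3/5, 0) and B(1/2, 3/4, 0);
# the smallest admissible denominator is w = 3, at the point (1/3, 2/3, 0).
F = Fraction
assert minimal_normal((F(1, 5), F(3, 5)), (F(1, 2), F(3, 4))) == (1, 2, 3)
```

Exact `Fraction` arithmetic avoids any rounding issue in the denominator test; the example values follow the planes of Figure 5.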
4 Conclusion
In this paper, we present new results about the intersection of two digital lines or two digital planes. We give criteria to analyze its connectivity and propose a characterization of the minimal parameters of a given intersection as a function of the parameters of the two lines/planes. Although the properties are stated and proved for digital naive lines and planes, those results also hold, or can easily be transposed, for standard objects. For instance, the connectivity results for line intersections can be adapted by transforming any diagonal move into a horizontal and a vertical one. Moreover, all the results about minimal parameters are based on the intersection preimage features, which depend on the shape of the line or plane preimages. But the preimage of a standard line or plane is a translated copy of the preimage of the naive line or plane having the same parameters. Those properties can be used, for instance, in the polygonalization process for digital curves and digital surfaces to define edges and vertices, and a study of the intersection of two 3D digital lines would be interesting for that problem.
References
1. Kim, C.E.: Three-dimensional digital planes. IEEE Trans. on Pattern Analysis and Machine Intelligence 6 (1984) 639–645
2. Réveillès, J.P.: Géométrie discrète, calcul en nombres entiers et algorithmique. PhD thesis, Université Louis Pasteur, Strasbourg, France (1991)
3. Andrès, E., Acharya, R., Sibata, C.: Discrete analytical hyperplanes. Graphical Models and Image Processing 59 (1997) 302–309
4. Kim, C.E., Stojmenović, I.: On the recognition of digital planes in three-dimensional space. Pattern Recognition Letters 12 (1991) 665–669
5. Debled-Rennesson, I.: Etude et reconnaissance des droites et plans discrets. PhD thesis, Université Louis Pasteur, Strasbourg, France (1995)
6. Debled, I., Reveillès, J.P.: A new approach to digital planes. In: SPIE's Internat. Symposium on Photonics and Industrial Applications – Technical Conference Vision Geometry 3, Boston (1994)
7. Veelaert, P.: Geometric constructions in the digital plane. Journal of Mathematical Imaging and Vision 11 (1999) 99–118
8. Hardy, G.H., Wright, E.M.: An Introduction to the Theory of Numbers. Oxford University Press (1989)
9. Hayes, B.: On the teeth of wheels. In: Computing Science. Volume 88-4, American Scientist (2000) 296–300
10. McIlroy, M.D.: A note on discrete representation of lines. AT&T Technical Journal 64 (1985) 481–490
11. Dorst, L., Smeulders, A.W.M.: Discrete representation of straight lines. IEEE Trans. on Pattern Analysis and Machine Intelligence 6 (1984) 450–463
12. Yaacoub, J.: Enveloppes convexes de réseaux et applications au traitement d'images. PhD thesis, Université Louis Pasteur, Strasbourg, France (1997)
13. Dorst, L., Duin, R.P.W.: Spirograph theory: A framework for calculations on digitized straight lines. IEEE Trans. on Pattern Anal. and Mach. Intell. 6-5 (1984) 632–639
14. Coeurjolly, D., Sivignon, I., Dupont, F., Feschet, F., Chassery, J.M.: Digital plane preimage structure. In Del Lungo, A., Di Gesù, V., Kuba, A., eds.: Electronic Notes in Discrete Mathematics, IWCIA'03. Volume 12, Elsevier Science Publishers (2003)
On Local Definitions of Length of Digital Curves

Mohamed Tajine and Alain Daurat

LSIIT UMR 7005 CNRS-ULP, Pôle API, Boulevard Sébastien Brant, 67400 Illkirch-Graffenstaden, France
{tajine,daurat}@lsiit.u-strasbg.fr
Abstract. In this paper we investigate the ‘local’ definitions of length of digital curves in the digital space rZ2 where r is the resolution of the discrete space. We prove that if µr is any local definition of the length of digital curves in rZ2 , then for almost all segments S of R2 , the measure µr (Sr ) does not converge to the length of S when the resolution r converges to 0, where Sr is the Bresenham discretization of the segment S in rZ2 . Moreover, the average errors of classical local definitions are estimated, and we define a new one which minimizes this error. Keywords: Digital segments, local length estimation, frequency of factors, convergence.
1 Introduction
A digital curve is the discretization of a curve in R². We investigate the local definitions of length of digital curves in rZ², where r is the resolution of the discrete space. A local definition of length is obtained by associating a weight p(w) to each digital curve w of size m, where the size of a digital curve is its cardinality minus one, i.e., its number of edges between consecutive points. If C(m) is the set of digital curves of size m, then any digital curve γr in rZ² can be obtained by concatenation of elements in C(m), with perhaps a digital curve ε of size less than m. In other words, γr can be viewed as a word in C(m)*.ε. If γr = w1 w2 . . . wN ε where wi ∈ C(m) for all i, then we define the length of γr by µ_{r,m,p}(γr) = r Σ_i p(wi) (we neglect the contribution of the digital curve ε). Actually, we investigate the following problem: does there exist m, p(·) such that for any curve γ of R² the lengths µ_{r,m,p}(γr) converge to the length of γ when r tends to 0 (where γr is a discretization of γ)? In this paper, we study this problem for a particular class of curves, the set of segments in R²; moreover we suppose that the discretization operator δr restricted to the segments is the Bresenham discretization. We consider the segment S = {(x, αx + β) | A ≤ x ≤ B} of R² such that the slope α ∈ [0, 1]; the other cases can be studied by symmetry. Its Bresenham discretization Sr = δr(S) ⊂ rZ² is the set

I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 114–123, 2003.
© Springer-Verlag Berlin Heidelberg 2003
Sr = r { (X, Y) ∈ Z² | A/r ≤ X ≤ B/r and αX + β/r − 1 < Y ≤ αX + β/r }.
We fix m as a positive integer. As it has been explained for curves, the segment Sr can be seen as the word w1 w2 . . . wN ε, where m·N + Card(ε) = Card(Sr), wi is a word of size m, and ε is a word of size less than m. We call Sm the set of all such factors wi when S describes all the segments. Figure 1 illustrates this situation.
We construct µ_{r,m,p} as the local definition of measure by using a weight function p : Sm → R. Then µ_{r,m,p} is defined by:

µ_{r,m,p}(Sr) = r(p(w1) + p(w2) + . . . + p(wN)) = r Σ_{w∈Sm} n(w, Sr, r) p(w)
where n(w, Sr , r) is the number of i between 1 and N such that wi = w.
The central question of this paper can be formulated as follows: does there exist m, p(·) such that, for any segment S, the estimation µ_{r,m,p}(Sr) converges to the length of S when the resolution r tends to 0? In this paper, we will prove that for almost all segments S, the estimation µ_{r,m,p}(Sr) does not converge to the length of S when the resolution r tends to 0.
2 Segments in Z²

2.1 Preliminaries
Let a ∈ R; ⌊a⌋ (resp. {a}) denotes the integral part (resp. the fractional part) of a. So, a = ⌊a⌋ + {a} with ⌊a⌋ ∈ Z, ⌊a⌋ ≤ a < ⌊a⌋ + 1 and 0 ≤ {a} < 1. We also define ⌈a⌉ = −⌊−a⌋. For example, ⌊7/3⌋ = 2, ⌈7/3⌉ = 3 and {7/3} = 1/3. We have:

Property 1. Let x, u be real numbers; then:

⌊x + u⌋ − ⌊x⌋ = ⌊u⌋ if {x} < 1 − {u}, and ⌊u⌋ + 1 otherwise.

So, for all α, β ∈ R,

⌊α(x + u) + β⌋ − ⌊αx + β⌋ = ⌊αu⌋ if {αx + β} < 1 − {αu}, and ⌊αu⌋ + 1 otherwise.

In this paper, we consider the discretization operator δr in rZ² of the family of Bresenham discretizations.

Definition 1. Let r > 0 and let α ∈ [0, 1], β, A, B ∈ R. Consider the segment S = {(x, αx + β) | A ≤ x ≤ B} of slope α and displacement β. Sr = δr(S) = {r(X, ⌊αX + (1/r)β⌋) | A/r ≤ X ≤ B/r and X ∈ Z} is the discretization of S in rZ².

The notion of digital segment is a central notion in this paper for the local definitions of length. This notion can be defined as a particular subset of a digital straight line (as in the Euclidean case) or by using the chaincodes.

Definition 2. Let r > 0. Let α ∈ [0, 1], β ∈ R and m ∈ N*.
• Let n ∈ Z. A subset S′ = {r(X, ⌊αX + (1/r)β⌋) | n ≤ X ≤ n + m and X ∈ Z} is called a segment of size m of rZ². The point r(n, ⌊αn + (1/r)β⌋) is called the starting point of S′.
• A subset S of rZ² is a digital segment of size m in rZ² if there exists a segment S′ of size m of rZ² such that S = {p − p0 | p ∈ S′}, where p0 is the starting point of S′. So, a digital segment is a segment up to a translation.
• S_{r,m} is the set of all digital segments of size m of rZ² with slope in [0, 1].
If the slope α ∈ [0, 1], then the notion of digital segment can be described by using the relative or the absolute chaincode, as follows:

Definition 3. Let α ∈ [0, 1] and β ∈ R.
• The relative chaincode v^{α,β}_{x,r,m} of length m at abscissa x ∈ rZ (x = rX where X ∈ Z) is the word on {0, 1} defined by:

v^{α,β}_{x,r,m}(k) = ⌊α(X + k) + β/r⌋ − ⌊α(X + k − 1) + β/r⌋ for 0 < k ≤ m.

• The absolute chaincode w^{α,β}_{x,r,m} is defined by:

w^{α,β}_{x,r,m}(k) = ⌊α(X + k) + β/r⌋ − ⌊αX + β/r⌋ for 0 ≤ k ≤ m.

These two chaincodes are equivalent, since v^{α,β}_{x,r,m}(k) = w^{α,β}_{x,r,m}(k) − w^{α,β}_{x,r,m}(k − 1) and w^{α,β}_{x,r,m}(k) = Σ_{l=1}^{k} v^{α,β}_{x,r,m}(l). We consider the set of absolute chaincodes for a given slope α and displacement β:

C^{α,β}_{r,m} = {w^{α,β}_{x,r,m} | x ∈ rZ}

and the set of all the absolute chaincodes

A_{r,m} = {w^{α,β}_{x,r,m} | α ∈ [0, 1], β ∈ R, x ∈ rZ}.

So, A_{r,m} = ∪_{α∈[0,1],β∈R} C^{α,β}_{r,m} and Card(A_{r,m}) = Card(S_{r,m}). In the following, we will prove that the set C^{α,β}_{r,m} depends neither on β nor on r, and that the sets A_{r,m} and S_{r,m} do not depend on r.

2.2 Some Combinatorial Properties of Digital Segments
In this subsection, we consider a segment with slope α ∈ [0, 1] and displacement β ∈ R.

Definition 4. Let m ∈ N*. Fm is the set of Farey numbers of order m: Fm = {p/q | 0 ≤ p ≤ q ≤ m and q ≠ 0}. The elements of Fm are called m-Farey numbers.

We recall properties about the structure of the chaincodes of a given line (see [1,2]). The first one is a direct consequence of Property 1:

Property 2. Let x ∈ rZ. If x = rX with X ∈ Z, then

w^{α,β}_{x,r,m}(k) = ⌊αk⌋ if {αX + β/r} < 1 − {αk}, and ⌊αk⌋ + 1 otherwise.
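Property 2 can be verified exhaustively on a sample slope with exact rational arithmetic (a sketch; the chosen α, β values and helper names are ours):

```python
from fractions import Fraction
from math import floor

def frac(x):
    """Fractional part {x}."""
    return x - floor(x)

def w_abs(alpha, c, X, k):
    """Absolute chaincode w(k) = floor(alpha*(X+k) + c) - floor(alpha*X + c),
    where c stands for beta/r."""
    return floor(alpha * (X + k) + c) - floor(alpha * X + c)

# Check Property 2 over a range of abscissas (exact Fraction arithmetic,
# so no floor/rounding artefacts).
alpha, c, m = Fraction(37, 100), Fraction(3, 25), 4
for X in range(60):
    for k in range(m + 1):
        expected = floor(alpha * k)
        if not frac(alpha * X + c) < 1 - frac(alpha * k):
            expected += 1
        assert w_abs(alpha, c, X, k) == expected
```

Since Property 2 is an instance of Property 1 with x = αX + β/r and u = αk, the check passes for any rational slope and displacement.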
We define (B^α_i)_{0≤i≤m} as the sequence (1 − {αk})_{0≤k≤m} reordered increasingly; notice that B^α_m = 1 − {α·0} = 1. By convention we set B^α_{−1} = 0.

Property 3. [1,2]
• If α ∈ ([0, 1] \ Fm), then 1 − {αi} ≠ 1 − {αj} for all i, j such that −1 ≤ i < j ≤ m.
• The chaincode w^{α,β}_{x,r,m} at x depends only on the position of the number {αX + β/r} relative to the elements of the sequence (B^α_i)_{−1≤i≤m}. So if x = rX, x′ = rX′ ∈ rZ, then w^{α,β}_{x,r,m} = w^{α,β}_{x′,r,m} ⇐⇒ ∃i ∈ {−1, 0, . . . , m − 1} such that {αX + β/r}, {αX′ + β/r} ∈ [B^α_i, B^α_{i+1}[.
• If B^α_i < B^α_{i+1}, then for all β ∈ R and r > 0 there exists X ∈ Z such that {αX + β/r} ∈ [B^α_i, B^α_{i+1}[. So, the sets C^{α,β}_{r,m} depend neither on β nor on r and will be denoted C^α_m; thus the set A_{r,m} (resp. S_{r,m}) does not depend on r and will be denoted A_m (resp. S_m). Moreover Card(C^α_m) = Card({i | −1 ≤ i < m and B^α_i < B^α_{i+1}}) ≤ m + 1. Thus, if α ∈ ([0, 1] \ Fm), then Card(C^α_m) = m + 1.

Property 4. [1]
• S_m is the set of segments of Z² with (0, 0) as starting point.
• Card(S_m) = 1 + Σ_{i=1}^{m} ϕ(i), where ϕ is Euler's totient function (ϕ(i) = Card({j | 1 ≤ j < i and i and j are coprime})).
• Card(S_m) = m³/π² + O(m² log(m)).

Definition 5. Let α ∈ ([0, 1] \ Fm). σ_α is the permutation on {1, . . . , m} such that 1 − {ασ_α(i)} < 1 − {ασ_α(i + 1)} for 1 ≤ i < m. So, B^α_i = 1 − {ασ_α(i)} for all 1 ≤ i ≤ m.

Lemma 1. Let f, f′ be two consecutive m-Farey numbers and α, α′ ∈ ]f, f′[. Then σ_α = σ_{α′}. In other words, the function α → σ_α is constant on ]f, f′[. Moreover, the function α → B^α_i is an affine function on ]f, f′[.

The proof of Lemma 1 is omitted due to space constraints and is available in [3].

Theorem 1. Let m ∈ N* and 0 ≤ j < m. Let I ⊆ [0, 1] be an interval. Then

lim_{r→0} Card({X ∈ (mZ + j) | {αX + β/r} ∈ I and A/r ≤ X ≤ B/r}) / (B/r − A/r + 1) = (1/m) µ(I)

where µ(I) is the length of the interval I. The proof of Theorem 1 is analogous to the proof of Theorem 1.19 of [4] (Weyl's theorem), and is given in [3].
2.3 Local Definitions of Length of Digital Segments
Let m ∈ N*. We construct the local definition of length by using a weight function p : Sm → R as follows. Let S = {(x, αx + β) | A ≤ x ≤ B} be a segment in R² with α ∈ [0, 1], β, A, B ∈ R, and let r > 0. Then Sr = δr(S) = {r(X, ⌊αX + (1/r)β⌋) | A/r ≤ X ≤ B/r and X ∈ Z}. Let N(r) = ⌊(B/r − A/r + 1)/m⌋. So, Sr can be seen as the word w_{1,r} w_{2,r} . . . w_{N(r),r} ε_r, where w_{i,r} ∈ Sm for i = 1, . . . , N(r) and ε_r is a word of size less than m. Consider µ_{r,m,p}(Sr) = r(p(w_{1,r}) + p(w_{2,r}) + . . . + p(w_{N(r),r})) as an approximation of the length of the segment S (we neglect the contribution of ε_r). Put DA_{m,p}(S) = lim_{r→0} µ_{r,m,p}(Sr).

Definition 6. Let j be such that 0 ≤ j < m. The frequency F^{α,β,A,B}_{j,r}(w) of a word w of length m in the segment δr(S) = {r(X, ⌊αX + (1/r)β⌋) | A/r ≤ X ≤ B/r and X ∈ Z} of rZ² is defined by:

F^{α,β,A,B}_{j,r}(w) = Card({X ∈ (mZ + j) | A/r ≤ X ≤ B/r and w^{α,β}_{x,r,m} = w}) / (B/r − A/r + 1).
Lemma 2. Let α ∈ [0, 1] be an irrational number, β, A, B ∈ R, w ∈ C^α_m, 0 ≤ j < m and i as in Property 3. Then

F^{α,β,A,B}_j(w) = lim_{r→0} F^{α,β,A,B}_{j,r}(w) = (1/m)(B^α_i − B^α_{i−1}).

In particular F^{α,β,A,B}_j(w) does not depend on j, β, A and B, and will be denoted F^α(w) in the following.

Proof. By Property 3 we have:

F^{α,β,A,B}_j(w) = lim_{r→0} Card({X ∈ (mZ + j) | A/r ≤ X ≤ B/r and w^{α,β}_{x,r,m} = w}) / (B/r − A/r + 1)
= lim_{r→0} Card({X ∈ (mZ + j) | A/r ≤ X ≤ B/r and {αX + (1/r)β} ∈ [B^α_{i−1}, B^α_i[}) / (B/r − A/r + 1).

So, by Theorem 1, F^{α,β,A,B}_j(w) = (1/m)(B^α_i − B^α_{i−1}). □
Remark 1. This lemma is wrong for rational slopes. For example, if we consider the line y = (1/2)x, then the frequency of the word w = (0, 0, 1) is 1. But this word corresponds to the interval [B^{1/2}_{−1}, B^{1/2}_0[, whose length is 1/2.
Theorem 2. Let f, f′ be two consecutive m-Farey numbers. There exist u, v such that DA_{m,p}(S) = (B − A)(uα + v) for all segments S = {(x, αx + β) | A ≤ x ≤ B} such that α ∈ (]f, f′[ \ Q) (i.e., α is an irrational number between the two m-Farey numbers f, f′). In other words, DA_{m,p}(·) is a piecewise affine function of α for α ∈ ([0, 1] \ Q).
Proof. We consider the weight function p : Sm → R which associates to each chaincode w of size m a weight p(w). The digital segment Sr can be seen as the word w_{1,r} w_{2,r} . . . w_{N(r),r} ε_r, where N(r) = ⌊(B/r − A/r + 1)/m⌋, w_{i,r} ∈ Sm for i = 1, . . . , N(r), and ε_r is a word of length less than m. So, the approximated length of the digital segment Sr is:

µ_{r,m,p}(Sr) = r(p(w_{1,r}) + p(w_{2,r}) + . . . + p(w_{N(r),r})) = r Σ_{w∈Sm} n(w, Sr, r) p(w)

where n(w, Sr, r) = Card({X ∈ mZ | A/r ≤ X ≤ B/r and w^{α,β}_{x,r,m} = w}), which is the number of i such that w_{i,r} = w. So,

DA_{m,p}(S) = lim_{r→0} r Σ_{w∈Sm} n(w, Sr, r) p(w)
= lim_{r→0} r (B/r − A/r + 1) Σ_{w∈Sm} (n(w, Sr, r) / (B/r − A/r + 1)) p(w)
= (B − A) Σ_{w∈Sm} F^α(w) p(w).     (1)

So, according to Lemma 1 and Lemma 2, DA_{m,p}(S) is an affine function of α if α is an irrational number between two consecutive m-Farey numbers. □

Corollary 1. There are at most 2 Σ_{i=1}^{m} ϕ(i) irrational numbers α ∈ [0, 1] such that DA_{m,p}(S) = length(S), where S = {(x, αx + β) | A ≤ x ≤ B} (length(S) is the length of the segment S and ϕ is Euler's totient function).

Proof. We consider an interval ]f, f′[ bordered by two consecutive m-Farey numbers. By the previous theorem, there exist u, v ∈ R such that the estimated length of the segment S = {(x, αx + β) | A ≤ x ≤ B} is DA_{m,p}(S) = (B − A)(uα + v) for the irrational slopes α. The exact length of S is length(S) = (B − A)√(1 + α²). But the equation DA_{m,p}(S) = length(S) ⇔ (1 + α²) = (uα + v)² ⇔ (u² − 1)α² + 2uvα + (v² − 1) = 0 has more than two solutions only when u² − 1 = 0, uv = 0 and v² − 1 = 0, which never happens. So the estimated length can be equal to the exact length for only two values on each interval. There are exactly Σ_{i=1}^{m} ϕ(i) such intervals. So the estimated length is exact for at most 2 Σ_{i=1}^{m} ϕ(i) slopes. □
Remark 2. Theorem 2 and Corollary 1 imply that, for any m ∈ N* and any weight p(·), the set of slopes α ∈ [0, 1] such that the length of segments S of slope α is equal to DA_{m,p}(S) is at most a countable set. So, for any m ∈ N*, for almost all segments S of slope α ∈ [0, 1], length(S) ≠ DA_{m,p}(S).
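The non-convergence can be observed numerically (a sketch with Freeman weights and m = 1; the helper name is ours): the estimate converges to the affine limit 1 + α(√2 − 1) of Theorem 2, which differs from the true length √(1 + α²):

```python
from math import floor, sqrt

def freeman_length(alpha, r, B=1.0):
    """mu_{r,1,p} with Freeman weights p(0) = 1, p(1) = sqrt(2) for the
    segment y = alpha*x, 0 <= x <= B, discretized in r*Z^2."""
    n = floor(B / r)
    est, prev = 0.0, 0
    for X in range(1, n + 1):
        y = floor(alpha * X)
        est += r * (1.0 if y == prev else sqrt(2))   # east vs north-east step
        prev = y
    return est

alpha = sqrt(2) - 1
true_len = sqrt(1 + alpha ** 2)           # exact length of the segment
limit = 1 + alpha * (sqrt(2) - 1)         # DA_{1,p}(S) for Freeman weights
est = freeman_length(alpha, 1e-5)
assert abs(est - limit) < 1e-3            # the estimate converges to the limit...
assert abs(limit - true_len) > 0.05       # ...which is not the true length
```

The persistent gap of about 0.089 at this slope is exactly the phenomenon Remark 2 describes.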
3 Examples of Estimators for Segments
In this section we compare different local estimators. Table 1 gives, for each method, the irrational slopes for which the asymptotic length is exact (by Corollary 1, for every local estimator there is always a finite number of such slopes), and the root mean square error. The latter is given by the formula:

RMSE = √( ∫₀¹ (l_est(α) − l_real(α))² D(α) dα / ∫₀¹ D(α) dα )

where l_est(α) = DA_{m,p}(S(α)) is the estimated length of the segment S(α) = {(x, αx) | 0 ≤ x ≤ 1}, l_real(α) is the real length, l_real(α) = √(1 + α²), and D(α) is the density of the lines of slope α. In the following we suppose that the distribution of the angles of the lines is uniform, which means D(α) = d arctan(α)/dα = (1 + α²)⁻¹. In the previous section we have computed l_est(α) for every irrational α, so we can compute the RMSE precisely for every local estimator of length.

3.1 Some Classical Estimators
We have considered three classical kinds of estimators: Freeman's estimator [5], chamfer estimators [6] and BLUE estimators [7]. The weights of these estimators and their RMSE are given in Table 1. For more details see [3].

3.2 Minimum RMSE Estimator
In this paragraph we propose estimators which minimize the RMSE. In fact, Formula (1) and Lemma 2 easily permit expressing the RMSE in terms of the weights:

RMSE² ∫₀¹ D(α) dα = ∫₀¹ (l_est(α) − l_real(α))² D(α) dα
= ∫₀¹ ( Σ_{w∈Sm} F^α(w) p(w) − √(1 + α²) )² D(α) dα
= Σ_{(w1,w2)∈(Sm)²} p(w1) p(w2) ∫₀¹ F^α(w1) F^α(w2) D(α) dα
  − 2 Σ_{w∈Sm} p(w) ∫₀¹ F^α(w) √(1 + α²) D(α) dα + ∫₀¹ (1 + α²) D(α) dα
so the function (p(w))_{w∈Sm} → RMSE² is a quadratic form. It is positive, so it reaches its minimum for some values of the weights, which give the minimum RMSE estimators. The last line of each of the two parts of Table 1 gives these weights for the word lengths m = 1, 2 and the corresponding errors (computed with the formal calculus system Maple). Figure 2 gives the estimated length when the resolution tends to zero for three different estimators. By definition, the minimum RMSE estimator is the closest to the real length.
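As a numerical cross-check of Table 1 (a sketch of ours, not the paper's Maple computation): for the Freeman estimator with m = 1, Lemma 2 gives frequencies 1 − α and α for the codes 0 and 1, so l_est(α) = 1 + α(√2 − 1), and a midpoint-rule integration reproduces the tabulated RMSE:

```python
from math import sqrt

def rmse_freeman_m1(n=200000):
    """Numerical RMSE of the Freeman estimator for m = 1, using
    l_est(alpha) = 1 + alpha*(sqrt(2) - 1) and D(alpha) = 1/(1 + alpha^2)."""
    num = den = 0.0
    for i in range(n):                       # midpoint rule on [0, 1]
        a = (i + 0.5) / n
        D = 1.0 / (1.0 + a * a)
        err = 1.0 + a * (sqrt(2) - 1.0) - sqrt(1.0 + a * a)
        num += err * err * D
        den += D
    return sqrt(num / den)

# Matches the Freeman entry of Table 1 (0.066143) to integration accuracy:
assert abs(rmse_freeman_m1() - 0.066143) < 2e-4
```

The same recipe, with the frequencies of the m = 2 words, would reproduce the other table entries.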
Table 1. Comparison of length estimators in the plane.

m = 1
                 p(00)      p(01)      asymptotic root     slopes with no errors
                                       mean square error
Freeman          1          √2         0.066143            {0, 1}
Chamfer 3-4      1          4/3        0.042255            {0, 3/4}
BLUE             1.059416   1.183276   0.084863            {0.510130}
minimum RMSE     0.941246   1.351320   0.026524            {0.184382, 0.743633}

m = 2
                 p(000)     p(001), p(011)   p(012)     asymptotic root     irrational slopes with no errors
                                                        mean square error
Chamfer 5-7-11   2          22/10            28/10      0.011875            none
BLUE             2.037583   2.226499         2.583985   0.043534            {0.480972}
minimum RMSE     1.958843   2.205554         2.811569   0.007466            {0.106259, 0.408328, 0.634893, 0.897172}
Fig. 2. Length approximated by three estimators (Chamfer 5-7-11, BLUE, and minimum RMSE) as a function of the slope of the segment (m = 2); each panel plots the estimated length against the real length.
4 Conclusion
In this paper we have proved that local definitions of digital length cannot be used to estimate the length of continuous curves, because we do not have convergence of such measurements to the desired length when the resolution r tends to 0, even if we restrict the curves to segments. But, of course, this does not mean that the discretizations of the curves do not permit computing a good estimation of the length of the continuous curve. For example, in [8] the authors measure the length of a curve by summing the lengths of segments included in the curve. They prove that the limit length when the resolution r tends to 0 is the desired length if the curve satisfies some regularity properties. See also [9] for a comparison between different estimators.
References
1. Mignosi, F.: On the number of factors of Sturmian words. Theoret. Comput. Sci. 82 (1991) 71–84
2. Gérard, Y.: Contribution à la Géométrie Discrète. PhD thesis, Université Clermont 1 (1999)
3. Tajine, M., Daurat, A.: On local definitions of digital curves. Technical report, LSIIT (2003). Extended version with proofs.
4. Drmota, M., Tichy, R.F.: Sequences, Discrepancies and Applications. Lecture Notes in Mathematics 1651. Springer-Verlag (1997)
5. Freeman, H.: Boundary encoding and processing. In Lipkin, B.S., Rosenfeld, A., eds.: Picture Processing and Psychopictorics (1970)
6. Borgefors, G.: Distance transformations in digital images. Computer Vision, Graphics, and Image Processing 34 (1986) 344–371
7. Dorst, L., Smeulders, A.W.M.: Best linear unbiased estimators for properties of digitized straight lines. IEEE Transactions on Pattern Analysis and Machine Intelligence 8 (1986) 276–282
8. Coeurjolly, D., Debled-Rennesson, I., Teytaud, O.: Segmentation and length estimation of 3D discrete curves. Lecture Notes in Computer Science 2243 (2001) 299–317 (DGCI 2001)
9. Coeurjolly, D., Klette, R.: A comparative evaluation of length estimators. Technical Report CITR-TR-105, University of Auckland (2001)
10. Berthé, V.: Fréquences des facteurs des suites sturmiennes. Theoret. Comput. Sci. 165 (1996) 295–309
11. Dorst, L., Smeulders, A.W.M.: Discrete straight line segments: Parameters, primitives and properties. In Melter, R., Bhattacharya, P., Rosenfeld, A., eds.: Vision Geometry, Contemporary Mathematics. Volume 119, AMS (1991) 45–62
12. Dorst, L., Smeulders, A.W.M.: Discrete representation of straight lines. IEEE Transactions on Pattern Analysis and Machine Intelligence 6 (1984) 450–463
13. Berenstein, C.A., Kanal, L.N., Lavine, D., Olson, E.C.: A geometric approach to subpixel registration accuracy. Computer Vision, Graphics, and Image Processing 40 (1987) 334–360
14. Amanatides, J., Woo, A.: A fast voxel traversal algorithm for ray tracing. In Maréchal, G., ed.: Eurographics '87, Elsevier (1987) 3–10
15. Borgefors, G.: Distance transformations in arbitrary dimensions. Computer Vision, Graphics, and Image Processing 27 (1984) 321–345
Characterising 3D Objects by Shape and Topology Stina Svensson1 , Carlo Arcelli2 , and Gabriella Sanniti di Baja2 1
Centre for Image Analysis Swedish University of Agricultural Sciences, Uppsala, Sweden [email protected] 2 Istituto di Cibernetica National Research Council of Italy, Pozzuoli (Napoli), Italy {car,gsdb}@imagm.cib.na.cnr.it
Abstract. Information on the shape of an object can be combined with information on the shape of the complement of the object in order to describe objects having complex shape. We present a method for decomposing the convex deficiencies of an object, i.e., the regions obtained by subtracting the object from its convex hull, into parts corresponding to cavities, tunnels, and concavities of the object, and for characterising these parts. The method makes use of the detection of watersheds in a distance image. Keywords: Distance transform, watershed segmentation, topological erosion, volume image.
1
Introduction
The description of objects having complex shape, but that are not easily decomposable into meaningful simple parts, can be achieved if the shape of the complement of the object, the background, is also investigated. In fact, object and background play dual roles, and concavities of the object can be described as convexities of the background. The analysis of the entire background can be rather time consuming and, for this reason, only the voxels of the background that are embedded in concavities of the object should be taken into account. Therefore, it is convenient to compute the convex hull of the object, since in this way the portion of the background to be investigated can be limited to the convex deficiencies, i.e., the difference between the convex hull and the object, [1]. A bounding box could be used instead of the convex hull to save computation time. However, the use of a bounding box would only limit the size of the portion of the background to be investigated, and would not provide useful hints for the description of the object. In fact, the difference between the bounding box and the object seldom yields components that faithfully correspond to the perceived convex deficiencies. Topological features involve both the object and its complement and, as such, are particularly useful for the description of non-intuitively decomposable complex objects. Topological features of objects in 3D images are the connected I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 124–133, 2003. © Springer-Verlag Berlin Heidelberg 2003
Fig. 1. Object with one concavity, one tunnel, and one cavity, left. A cross section of the object and both topological and shape features, right.
components of the object, called object components, and, for each object component, the connected components of the background that are completely enclosed by the object component, called cavities, and the tunnels. A tunnel exists whenever a component of the background interpenetrates the object. Besides topological features, some shape features are also of interest for the description of non-intuitively decomposable complex objects. For example, this is the case for concavities of the object, which can be interpreted as convexities or protrusions of the background. Concavities of the object, as well as tunnels and cavities, can be identified by computing the convex hull and by analysing the convex deficiencies. A simple case is given in Fig. 1. There, a brick-shaped object consisting of one connected component is shown. The object includes a concavity, visible on top of the object, a tunnel, crossing the object in the middle, and a cavity at the bottom of the object, visible only in the cross section of the object. While the number of object components and the number of cavities are easy to compute by means of local operators, tunnels are more difficult to identify, and only recently have some contributions appeared dealing with this subject. In [2], an algorithm was presented to close tunnels, called holes in that article. The purpose of that algorithm was actually to detect tunnels (especially in nearly thin objects, as is clear from the examples shown in the article) in order to suppress them (or to suppress only those considered less significant, based on the size of the tunnels). In this respect, the algorithm works nicely, as it identifies a closure located in the middle of each tunnel. However, our purpose is to count, represent, and describe complex tunnels in objects of arbitrary thickness. In this sense, the algorithm [2] is not adequate.
In fact, for a complex tunnel consisting of crossing branches it is not possible to know a priori how many distinct closures will be identified, since this depends on the length of the different branches. More recently, a method to detect tunnels (and cavities) and to represent them by linear structures (and single voxels) has been introduced in [3]. The method is based on the topological erosion of the convex deficiencies of the object. The number of crossing points (or, better, clusters of crossing points) found within the linear representation of the tunnel is used to estimate tunnel complexity, and the number of end points accounts for the number of exits of the
tunnel. Moreover, information on the maximal thickness of the tunnel is given in terms of the number of iterations of topological erosion necessary to generate the linear representation. In this paper, we perform the analysis of concavities, tunnels, and cavities, still using the convex deficiencies of the object. In particular, we here address the problem of decomposing convex deficiencies that correspond to a number of concavities and tunnels merged into a single component into their constituting entities. To this aim, we use a combination of constrained distance transformation and watershed segmentation. Once the convex deficiencies have been decomposed into individual entities, the method in [3] can be applied to extract from each entity its representation.
2
Preliminaries
We consider volume images consisting of object and background. We treat the object as 26-connected, i.e., two object voxels are adjacent if they share a face, an edge, or a vertex, and the background as 6-connected, i.e., two background voxels are adjacent if they share a face. For an object voxel v, we denote by N26(v) the set of voxels in the immediate neighbourhood of v including all face, edge, and vertex neighbours of v, and by N18(v) the set of voxels including all face and edge neighbours of v. An object component is a set of voxels in which each pair of voxels u and v can be connected by a path u = w0, w1, ..., wn = v within the object such that wi+1 ∈ N26(wi), i = 0, . . . , n − 1. For simplicity, in this paper we will consider a volume image including a single object component. In case of more than one object component, connected component labelling (performed by using, e.g., the algorithm in [4]) is preliminarily carried out, so as to work on each object component individually. The convex hull of an object is the smallest convex set containing that object. Different, equivalent, definitions of a convex set S exist, e.g., a set is convex when for all points P, Q ∈ S the straight line segment connecting P and Q is also in S. Defining and finding the convex hull of a discrete object is not trivial, [5]. Often an approximation of the convex hull, e.g., a covering polyhedron, is adequate. In this paper, we use the method described in [6] to build a covering polyhedron by repeatedly applying concavity filling operators. Though using 3 × 3 × 3 operators, the method actually derives and uses information from a 5 × 5 × 5 neighbourhood of each voxel, to establish whether the voxel is located in a planar region. Thus, the resulting approximation of the convex hull is quite good, as the covering polyhedron is characterised by up to 90 faces.
An even larger number of faces could be obtained by deriving and using information from a larger neighbourhood. However, for our purpose this is not necessary, as the increased accuracy does not affect the result enough to justify the increase in computational cost. The convex deficiencies of an object are obtained by computing the difference between the covering polyhedron and the object. In what follows, we denote the
Fig. 2. Two cross sections of a box with a cavity shaped as a torus, left, and of a box with a concavity including a torus, right.
convex deficiencies by CDs. For each CD, we call a cap each connected component of voxels belonging to the CD and having at least one face neighbour in the complement of the covering polyhedron. An object has a tunnel if there exists a closed connected path in the object which cannot be deformed to a single voxel (for details, see [7]). A tunnel is identified by a CD having more than one cap (two caps for a simple tunnel, more than two caps for tunnels consisting of many branches). An object has a cavity if a background component is fully enclosed in the object. A cavity is identified by a CD having no cap at all. An object has a concavity whenever a CD including a single cap is found. We note that using the number of caps to establish the nature of a CD allows us to be consistent also in the presence of otherwise ambiguous cases. For example, see Fig. 2, left, where both the definition of cavity (i.e., a background component fully enclosed by the object) and the definition of tunnel (i.e., a background region such that there exists a closed connected path that cannot be deformed to a single voxel) apply. By using the number of caps, the CD is classified as a cavity. Analogously, for the example shown in Fig. 2, right, the CD is classified as a concavity, though part of it is clearly shaped as a tunnel. An object voxel v is simple if the object including v is homotopic to the object obtained after v has been assigned to the background, [8]. This means that the number of object components, the number of tunnels, and the number of cavities are the same, independently of whether v is in the object or in the background. A decision on whether v is simple or not can be taken based on the local neighbourhood configuration of v, [9,10]. The voxel v is simple if the number of object components in N26(v) is one and the number of background components in N18(v) having v as a face neighbour is also one.
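The simple-voxel test just recalled can be sketched directly from the two counting conditions. The following is an illustrative implementation, not the code of [9,10]; the representation of the object as a set of (x, y, z) tuples is an assumption made for compactness:

```python
# Offsets for the 26-, 18- and 6-neighbourhoods of a voxel.
N26 = [(dx, dy, dz) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
       for dz in (-1, 0, 1) if (dx, dy, dz) != (0, 0, 0)]
N18 = [o for o in N26 if abs(o[0]) + abs(o[1]) + abs(o[2]) <= 2]
N6 = [o for o in N26 if abs(o[0]) + abs(o[1]) + abs(o[2]) == 1]

def components(cells, adjacency):
    """Connected components of a voxel set under the given adjacency offsets."""
    cells, comps = set(cells), []
    while cells:
        seed = cells.pop()
        stack, comp = [seed], {seed}
        while stack:
            x, y, z = stack.pop()
            for dx, dy, dz in adjacency:
                n = (x + dx, y + dy, z + dz)
                if n in cells:
                    cells.remove(n)
                    comp.add(n)
                    stack.append(n)
        comps.append(comp)
    return comps

def is_simple(v, obj):
    """v is simple iff N26(v) contains exactly one 26-component of object
    voxels and N18(v) contains exactly one 6-component of background
    voxels having v as a face neighbour."""
    x, y, z = v
    obj_n = {(x + dx, y + dy, z + dz) for dx, dy, dz in N26} & obj
    bg_n = {(x + dx, y + dy, z + dz) for dx, dy, dz in N18} - obj
    face_n = {(x + dx, y + dy, z + dz) for dx, dy, dz in N6}
    bg_comps = [c for c in components(bg_n, N6) if c & face_n]
    return len(components(obj_n, N26)) == 1 and len(bg_comps) == 1
```

For a solid 3 × 3 × 3 cube, a face voxel is simple (it can be assigned to the background without changing topology), while the central voxel is not, since its removal would create a cavity.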
Topological erosion of the object is a process that assigns simple voxels to the background. The process terminates when no more object voxels are simple. Distances between voxels or sets of voxels in an image can be represented by means of a distance transform, [11]. In a distance transform, each voxel in the object is assigned a value corresponding to the distance to its closest voxel in a reference set, which is often the background. A good approximation to the Euclidean distance, i.e., a distance that is stable under rotation, can be obtained by taking the distance between two voxels as the length of the minimal path between the voxels, where each step in a face direction is weighted 3, each step in an edge direction is weighted 4, and each step in a vertex direction is weighted 5, [12].
We will use this distance function throughout this paper. In case the reference set with respect to which the distance is computed is a subset of the object, instead of the background, the distance transform is said to be constrained, [13].
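As a sketch, the constrained ⟨3,4,5⟩ distance transform can be computed as a weighted shortest-path propagation from the reference set. The priority-queue formulation below is an illustrative choice (chamfer transforms are usually computed by raster scans), and the sparse set representation of domain and reference is an assumption:

```python
import heapq

# 26 step offsets with chamfer weights: 3 (face), 4 (edge), 5 (vertex).
STEPS = [((dx, dy, dz), {1: 3, 2: 4, 3: 5}[abs(dx) + abs(dy) + abs(dz)])
         for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)
         if (dx, dy, dz) != (0, 0, 0)]

def chamfer_345(domain, reference):
    """Constrained <3,4,5> distance transform: for every voxel of `domain`,
    the length of the cheapest weighted 26-connected path to `reference`,
    with all intermediate steps constrained to lie inside `domain`."""
    dist = {v: 0 for v in reference}
    heap = [(0, v) for v in reference]
    heapq.heapify(heap)
    while heap:
        d, (x, y, z) = heapq.heappop(heap)
        if d > dist.get((x, y, z), float('inf')):
            continue  # stale queue entry
        for (dx, dy, dz), w in STEPS:
            n = (x + dx, y + dy, z + dz)
            if n in domain and d + w < dist.get(n, float('inf')):
                dist[n] = d + w
                heapq.heappush(heap, (d + w, n))
    return {v: dist[v] for v in domain if v in dist}
```

When the reference set is the background and the domain is the whole object, this reduces to the ordinary ⟨3,4,5⟩ distance transform; when the domain is a CD and the reference set is the object, it yields the constrained DT used below.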
3
Decomposing the Convex Deficiencies of an Object
We first briefly summarise the method introduced in [3] to associate a representation with cavities and tunnels, because it constitutes the final part of the procedure described in this paper. In [3], once the covering polyhedron is achieved and the difference from the object is computed, the CDs are identified using a connected component labelling algorithm, [4], to assign an identity label to each component. Among the CDs, cavities could easily be distinguished as the CDs having no face neighbour in the background. However, all CDs, including those corresponding to cavities, undergo the topological erosion, performed to detect the respective representations, because these structures are easier to manage than the CDs and carry enough information for shape description. Topological erosion of the CDs is accomplished by removing simple voxels having no face or edge neighbours in the complement of the covering polyhedron. To guide the erosion through successive, more and more internal, voxels of the CDs, the constrained distance transform of the CDs (called DT, for short) is computed, where the reference set from which to derive distance information is the original object. Using the DT also allows us to associate with each CD information concerning its maximal thickness, given by the maximal distance label found within the DT. The resulting representation will consist, for each cavity, of an isolated voxel having no face neighbours in the complement of the covering polyhedron and, for each tunnel, of a linear structure where a number of voxels have edge or vertex neighbours in the complement of the covering polyhedron (there are as many such voxels as exits of the tunnel). If CDs corresponding to concavities are also found, an isolated voxel having edge or vertex neighbours in the complement of the covering polyhedron is found for each connected set of concavities.
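The cap-based classification of the CDs recalled in Sect. 2 can be sketched as follows. This is an illustration under assumed set representations, with 6-connectivity for caps since CDs are background components; it is not the implementation of [3]:

```python
N6 = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def components6(cells):
    """6-connected components of a voxel set."""
    cells, comps = set(cells), []
    while cells:
        stack = [cells.pop()]
        comp = set(stack)
        while stack:
            x, y, z = stack.pop()
            for dx, dy, dz in N6:
                n = (x + dx, y + dy, z + dz)
                if n in cells:
                    cells.remove(n)
                    comp.add(n)
                    stack.append(n)
        comps.append(comp)
    return comps

def classify_cd(cd, polyhedron):
    """Classify a CD by its number of caps, i.e., 6-connected components
    of CD voxels having a face neighbour outside the covering polyhedron:
    0 caps -> cavity, 1 cap -> concavity, 2 or more caps -> tunnel."""
    caps = {v for v in cd
            if any((v[0] + dx, v[1] + dy, v[2] + dz) not in polyhedron
                   for dx, dy, dz in N6)}
    n = len(components6(caps))
    return 'cavity' if n == 0 else 'concavity' if n == 1 else 'tunnel'
```

For a 3 × 3 × 3 covering polyhedron, an interior voxel is a cavity, a pit on one face is a concavity, and a through-going column of voxels is a tunnel with two caps.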
A limitation of the above method is that whenever a CD corresponds to a combination of more than a single entity, e.g., a number of concavities or a number of tunnels and concavities, the obtained representation accounts for only one entity: the thickest concavity, if the CD corresponds to a combination of concavities, or the tunnels, in case of a combination of tunnels and concavities. As an illustrative example, consider the object in Fig. 3. There, a solid brick-shaped object is shown, from which a number of cylinders and (parts of) balls have been removed to create tunnels and concavities. Though eight entities are perceived (two tunnels and six concavities), only four CDs are found: one simple concavity; one simple tunnel; one component consisting of the combination of one tunnel and three concavities; and one component consisting of the combination of two concavities. The corresponding representations are shown in Fig. 4, where we note that the structures corresponding to the combination of concavities and tunnels and to the combination of concavities account only for the tunnel and for one concavity, respectively.
Fig. 3. From left to right: object, a cross section and the convex deficiencies.
Fig. 4. Representation of the convex deficiencies for the object in Fig. 3, framed by a cross section of the borders of the CDs.
It is clear that, to obtain correct representations by the method in [3], the CDs corresponding to combinations of concavities and combinations of tunnels and concavities should first be decomposed into their constituting entities. In this paper, we aim at achieving a decomposition of these CDs into parts corresponding to single tunnels and single concavities. To decompose CDs corresponding to combinations of concavities or combinations of tunnels and concavities, we resort to watershed segmentation, [14,15]. The concept of watershed is based on the idea of a “topographic” interpretation of a multi-valued image, e.g., a grey-level image or a distance transform where distance labels play the role of grey levels. The three spatial coordinates x, y, and z of a voxel v are used together with the elevation of v, which is the grey level of v. This gives rise to an elevation model in terms of a hyper-surface. In this interpretation we have three types of voxels: voxels that are minima; voxels belonging to catchment basins; and voxels belonging to watersheds (crest lines). See Fig. 5 for the 2D case. Watersheds are found by “immersion”. Imagine each minimum as pierced, so that when the hyper-surface is immersed in water, the catchment basins start to be filled. A watershed is built in correspondence with any voxel which is reached by water coming from two basins. To identify the minima within the CDs, we compute the DT, where we use as reference set the complement of the covering polyhedron. The minima for the watershed segmentation are the voxels farthest from the reference set. Accordingly, they are detected as the maxima of the DT. In correspondence with each tunnel, most of the maxima will be placed midway with respect to the caps delimiting the tunnel. A problem to be solved is that the number of connected components of maxima generally exceeds the number of entities, so that an over-segmentation is likely to be obtained.
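The detection of the maxima on the DT can be sketched as a local test under 26-adjacency; plateaux of equal labels are returned whole, so their connected components give the candidate markers. This is an illustrative sketch, not the paper's implementation, and the DT is assumed to be a mapping from voxels to distance labels:

```python
N26 = [(dx, dy, dz) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
       for dz in (-1, 0, 1) if (dx, dy, dz) != (0, 0, 0)]

def dt_maxima(dt):
    """Maxima of a DT given as a dict voxel -> distance label: voxels
    none of whose 26-neighbours (within the CD) carries a larger label.
    Their connected components seed the watershed segmentation."""
    return {v for v, d in dt.items()
            if all(dt.get((v[0] + dx, v[1] + dy, v[2] + dz), -1) <= d
                   for dx, dy, dz in N26)}
```

On a one-dimensional toy DT with labels 3, 6, 3, only the middle voxel survives; a plateau of equal labels is returned in its entirety.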
Well-known techniques to reduce this over-segmentation can be applied, e.g., see [16]. We do not discuss these techniques here, but concentrate on additional criteria we adopt to reduce over-segmentation.
Fig. 5. Voxels involved in watershed computation: minima, watersheds, and catchment basins.
Fig. 6. Object, left, and its convex deficiencies, right. Only the convex deficiency corresponding to the tunnel is meaningful.
Specifically, our criteria involve the reduction of the number of components of maxima, the merging of components of maxima, and the merging of parts of the decomposition. To reduce the number of components of maxima, we perform a small number of erosion/dilation operations. This results in smoothed CDs, since spurs and thin protrusions are removed. We note that erosion/dilation is also useful to avoid considering spurious CDs. In fact, when building the covering polyhedron, concavity filling changes the status of a number of voxels that are not really placed in concavities of the object, but are such that the planes passing through them and tangent to the object are not oriented along permitted directions. Thus, CDs are likely to be identified even when no concavities are actually present in the object. In Fig. 6, a cube rotated 30◦ about the z-axis with respect to the upright position is shown together with the CDs found. It can be noted that, besides the expected CD corresponding to the tunnel, four other spurious CDs are detected. In fact, the faces of the cube are not oriented along directions permitted for the faces of the covering polyhedron. Thus, in correspondence with each face of the cube, concavity filling adds to the covering polyhedron all voxels interpreted as belonging to local concavities, until a face of the covering polyhedron oriented along a permitted direction is obtained. The tool used to merge components of maxima is active only for maxima found midway with respect to the caps delimiting each tunnel. To this aim, we need to compute closures in correspondence with tunnels, and we accomplish this task similarly to [2]. Since the DT is already available, we can use it to guide a topological erosion that removes simple voxels, starting from the voxels having minimal distance label and proceeding inwards, until the closures are obtained. The process is illustrated in Fig. 7, where a brick-shaped object with
Fig. 7. Object with a combination of concavities and a tunnel, left, a cross section, middle, and the closure of the tunnel, right.
Fig. 8. Decomposition for the CDs of the object in Fig. 7, left, and in Fig. 3, right. (Fig. 8 is actually in colours. For a better understanding, please refer to the electronic version of the paper.)
a combination of concavities and a tunnel is shown. The closure of the tunnel is shown in Fig. 7, right. Connected component labelling of the closures is accomplished. Then, we can ascribe the same identity label to all components of maxima found midway with respect to the caps delimiting the tunnels. These maxima are either included in a closure (and as such already have the same identity label as the closure they belong to) or are adjacent to it (and the identity label of the closure can be assigned to them). The latter case occurs when the length of the tunnel is expressed by an even number of voxels and, hence, the set of maxima is two voxels thick. Connected component labelling is then accomplished on the remaining maxima. This completes the process of identifying the markers for the watershed segmentation. We use an algorithm for computing the watersheds which is basically an extension, to deal with 3D images, of the algorithm presented in [14]. The watershed decompositions for the CDs of the objects in Fig. 7 and in Fig. 3 are shown in Fig. 8, to the left and the right, respectively. As concerns merging of parts of the decomposition, we distinguish two cases, dealing respectively with complex tunnels and with tunnels or concavities having significant protrusions that have not been removed by erosion/dilation. For tunnels having complex shape and, hence, more than one branch, e.g., a Y-shaped tunnel, more than one closure can be found. As a consequence, it may happen that, after the watershed segmentation, a branch of the tunnel is assigned more than one identity label. See Fig. 9. Merging of the parts identified within the tunnel can easily be accomplished. In fact, closures have been assigned identity labels that not only distinguish a closure from other closures, but also discriminate between components of maxima found in correspondence with a closure and all other components of maxima. The second merging case
Fig. 9. A Y-shaped tunnel with the found two closures, left, watershed segmentation before merging, middle, and after merging, right.
Fig. 10. From left to right: a tunnel with a significant protrusion, a cross section, its convex deficiencies, and watershed segmentation before merging. Result after merging is one tunnel part. (Fig. 10 is actually in colours. For a better understanding, please refer to the electronic version of the paper.)
regards entities with significant protrusions, see Fig. 10. When this is the case, maxima in the DT are also found in correspondence with the significant protrusions of the tunnel, which will cause the creation of additional parts of the decomposition once watershed segmentation is performed. Since, according to our definition, all entities except cavities have at least one cap, we merge all adjacent parts without a cap until a compound part with a cap is achieved. Once all parts of the decomposition have at least one cap, over-segmentation can be treated by well-known techniques. Finally, the representations, consisting of a linear structure for tunnels and of isolated voxels for concavities and cavities, can be obtained by using the method described in [3].
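The cap-driven merging rule can be sketched greedily: parts without a cap are absorbed into a face-adjacent part until every remaining part has a cap. This illustrative version (with assumed set representations; cavities are supposed to have been set aside beforehand, since a CD with no cap at all has no part to merge into) is not the authors' implementation:

```python
N6 = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def has_cap(vox, polyhedron):
    """A part has a cap if some voxel has a face neighbour outside the polyhedron."""
    return any((x + dx, y + dy, z + dz) not in polyhedron
               for x, y, z in vox for dx, dy, dz in N6)

def face_adjacent(a, b):
    """True if some voxel of part a is a face neighbour of a voxel of part b."""
    return any((x + dx, y + dy, z + dz) in b
               for x, y, z in a for dx, dy, dz in N6)

def merge_capless(parts, polyhedron):
    """parts: dict label -> voxel set. Repeatedly merge a part without a
    cap into a face-adjacent part, until every remaining part has a cap
    (or no capless part has an adjacent partner)."""
    merged = True
    while merged:
        merged = False
        for lab, vox in list(parts.items()):
            if has_cap(vox, polyhedron):
                continue
            for lab2, vox2 in parts.items():
                if lab2 != lab and face_adjacent(vox, vox2):
                    parts[lab2] = vox2 | vox
                    del parts[lab]
                    merged = True
                    break
            if merged:
                break
    return parts
```

For a through-going column split into three one-voxel parts, the capless middle part is merged into one of its neighbours, leaving two parts, each with a cap.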
4
Conclusion
We have characterised an object in a 3D binary image in terms of topology and shape by analysing the convex deficiencies of (an approximation of) the convex hull of the object. While the identification of cavities is a trivial problem, the detection of tunnels and concavities is often tricky. Various techniques have been used, including distance transformation, connected component labelling, watershed segmentation, and topological erosion. The method has given satisfactory results when tested on a large set of artificial objects. No evidence is available
yet of the effectiveness of the method on real images. We expect that new problems will arise that have not occurred with the artificial objects used so far. This will be a topic for future research.
References 1. Borgefors, G., Sanniti di Baja, G.: Analyzing nonconvex 2D and 3D patterns. Computer Vision and Image Understanding 63 (1996) 145–157 2. Aktouf, Z., Bertrand, G., Perroton, L.: A three-dimensional holes closing algorithm. Pattern Recognition Letters 23 (2002) 523–531 3. Svensson, S., Arcelli, C., Sanniti di Baja, G.: Finding cavities and tunnels in 3D complex objects. Proceedings of 12th International Conference on Image Analysis and Processing (ICIAP 2003), Mantova, Italy, IEEE CS (in press) 4. Thurfjell, L., Bengtsson, E., Nordin, B.: A new three-dimensional connected components labeling algorithm with simultaneous object feature extraction capability. CVGIP: Graphical Models and Image Processing 54 (1992) 357–364 5. Soille, P.: Morphological Image Analysis. Springer-Verlag (1999) 6. Borgefors, G., Nyström, I., Sanniti di Baja, G.: Computing covering polyhedra of non-convex objects. In: Proceedings of 5th British Machine Vision Conference, York, UK (1994) 275–284 7. Kong, T.Y.: A digital fundamental group. Computers & Graphics 13 (1989) 159–166 8. Kong, T.Y., Rosenfeld, A.: Digital topology: Introduction and survey. Computer Vision, Graphics, and Image Processing 48 (1989) 357–393 9. Saha, P.K., Chaudhuri, B.B.: Detection of 3-D simple points for topology preserving transformations with application to thinning. IEEE Transactions on Pattern Analysis and Machine Intelligence 16 (1994) 1028–1032 10. Bertrand, G., Malandain, G.: A new characterization of three-dimensional simple points. Pattern Recognition Letters 15 (1994) 169–175 11. Rosenfeld, A., Pfaltz, J.L.: Distance functions on digital pictures. Pattern Recognition 1 (1968) 33–61 12. Borgefors, G.: On digital distance transforms in three dimensions. Computer Vision and Image Understanding 64 (1996) 368–376 13. Piper, J., Granum, E.: Computing distance transformations in convex and nonconvex domains. Pattern Recognition 20 (1987) 599–615 14.
Vincent, L., Soille, P.: Watersheds in digital spaces: An efficient algorithm based on immersion simulations. IEEE Transactions on Pattern Analysis and Machine Intelligence 13 (1991) 583–597 15. Beucher, S., Lantuejoul, C.: Use of watersheds in contour detection. In: International Workshop on image processing: Real-time edge and motion detection/estimation. (1979) Rennes, France. 16. Meyer, F.: An overview of morphological segmentation. International Journal of Pattern Recognition and Artificial Intelligence 15 (2001) 1089–1118
Homotopic Transformations of Combinatorial Maps Jocelyn Marchadier, Walter G. Kropatsch, and Allan Hanbury Pattern Recognition and Image Processing Group (PRIP) Favoritenstraße 9/1832, A-1040 Wien, Austria [email protected]
Abstract. In this contribution, we propose the notion of homotopy for both combinatorial maps and weighted combinatorial maps. We also describe transformations that are homotopic in the defined sense. The usefulness of the introduced concept is illustrated using two applications. The first one consists of calculating a skeleton using homotopic transformations of weighted combinatorial maps. The result is a compact combinatorial map describing the structure of the skeleton, which may be viewed as a “combinatorial map skeleton”. The second application consists of run-length encoding of all the regions described by a combinatorial map. Although these demonstrations use combinatorial maps defined on a square grid, the major insights of the paper are independent of the embedding. Keywords: Homotopy, skeletonization, combinatorial map.
1
Introduction
Homotopy characterizes, in continuous topology, elastic transformations that preserve certain topological properties, transforming a simple arc into a simple arc, for example. The definition of homotopy for digital sets has been proposed [15] in order to characterize transformations of such sets that preserve topological properties such as the region inclusion tree or, more generally, equivalence classes of paths. The definition of homotopy of transformations on gray-level images has also been proposed [15,12], as well as on ordered sets [2]. Homotopy is an important concept, as it characterizes topological properties of skeletons, gray-tone skeletons, and watersheds [15,12,13,14]. Combinatorial maps have been introduced as a code for planar graphs. They have already been used in image analysis to encode topological maps with different embeddings [7,6,3,11]. Some transformations of combinatorial maps have been proposed [4]. In this paper, we propose to extend the notion of homotopy to combinatorial maps (section 2) and to weighted combinatorial maps, i.e., combinatorial maps in which a single real number is associated with each dart (section 3), coherently with the classical definitions. The main advantage is in the design of classes
This paper has been supported by the Austrian Science Fund (FWF) under grants P14445-MAT and P14662-INF
I. Nystr¨ om et al. (Eds.): DGCI 2003, LNCS 2886, pp. 134–143, 2003. c Springer-Verlag Berlin Heidelberg 2003
Fig. 1. A combinatorial map
of transformations that have nice topological properties, independently of the embedding of the sets studied. Thus, combinatorial maps with different embeddings can be treated with the same classes of transformations and algorithms, such that properties defined independently of their embedding are preserved. Some transformations that are homotopic in the defined sense are also presented. Two applications are presented (section 4), demonstrating the advantages of the proposed definitions and transformations. Defining homotopy on a combinatorial map naturally leads to the definition of a new class of skeletonization algorithms (section 4.1), producing combinatorial map skeletons. The second application, presented in section 4.2, consists of constructing a combinatorial map conveniently encoding horizontal runs.
2
Combinatorial Maps
2.1
Basic Definitions
Let us review some definitions. A combinatorial map is a triplet G = (D, σ, α) where D is a set of elements called darts (or half-edges), and σ and α are two permutations defined on D such that α is an involution without fixed point (∀d ∈ D, α²(d) = d and α(d) ≠ d). An example of a combinatorial map is drawn in Fig. 1. Each dart may be viewed as a directed half-edge of an embedded planar graph, and is associated to a vertex. The darts d and α(d) are associated to a unique edge of the drawn planar graph. σ defines the arrangement of darts turning counterclockwise around a vertex. A combinatorial map can be seen as a graph with explicit orientation around the vertices. A combinatorial map may be used to encode a topological map, i.e., a cellular complex of dimension 2 which partitions an orientable surface into a set of vertices (0-cells), a set of arcs (1-cells), and a set of faces (2-cells). Here, the continuous embedding of the underlying cellular complex is assumed, although most of the following results can also be interpreted with embeddings on other topological spaces. The darts of D may be viewed as cell-tuples (s, a, f) [10], where s, a, and f are incident. The orbits σ∗(d) are bijectively associated to vertices of the represented topological map, the orbits α∗(d) are associated to
Fig. 2. Removal and contraction transformations: a) initial map; b) removal of e; c) contraction of e.
edges of the topological map, and the orbits ϕ∗(d) of the permutation ϕ = σ ◦ α are associated to the faces of the encoded topological map. Some topological notions such as loops, bridges, etc., can be defined straightforwardly for combinatorial maps [4]. We recall the following configurations, as they are special cases to be considered in the following text. Let us consider a combinatorial map G = (D, σ, α) and one of its darts d ∈ D. d is a self-loop iff α(d) ∈ σ∗(d). d is a bridge iff α(d) ∈ ϕ∗(d). d is pendant iff σ(d) = d. d is redundant iff σ²(d) = d. Paths and loops can also be defined for combinatorial maps. A path of a combinatorial map G = (D, σ, α) is an ordered sequence of darts P = (d1, ..., dn) such that ∀i ∈ {1, ..., n − 1}, di+1 ∈ σ∗(α(di)). The reverse path α(P) of P is defined by α(P) = (α(dn), ..., α(d1)). A loop is a path P = (d1, ..., dn) such that d1 ∈ σ∗(α(dn)). The dual of a combinatorial map G = (D, σ, α) is the combinatorial map Ḡ = (D, ϕ, α). It is well defined when the combinatorial map is (path-)connected, i.e., ∀d, d′ ∈ D, there exists a path P = (d1, ..., dn) with d ∈ α∗(d1) and d′ ∈ α∗(dn). In the following text, we will consider only connected combinatorial maps. The removal of an edge α∗(d) removes d and α(d) from the initial combinatorial map. Consider a combinatorial map G = (D, σ, α) and a dart d ∈ D with d not being a bridge. The removal of the edge α∗(d) creates the sub-map G \ α∗(d) = (D \ α∗(d), σ′, α) defined by:
σ′(d′) = σ(d′) if d′ ∈ D \ {σ⁻¹(d), σ⁻¹(α(d))},
σ′(σ⁻¹(d)) = σ(d) and σ′(σ⁻¹(α(d))) = σ(α(d)) if σ(d) ≠ α(d) and σ(α(d)) ≠ d,
σ′(σ⁻¹(d)) = σ(α(d)) if σ(d) = α(d).
The contraction of an edge α∗(d) transforms a combinatorial map G into a combinatorial map G′ where d and α(d) have been removed from the dual Ḡ. Consider a combinatorial map G = (D, σ, α) and a dart d ∈ D with α∗(d) not being a self-loop.
The contraction of the edge α∗(d) creates the sub-map G′ defined by G′ = G/α∗(d), with Ḡ′ = Ḡ \ α∗(d), i.e. contraction amounts to removal in the dual map. The two transformations are illustrated in Fig. 2.

2.2 Homotopic Transformations
Let us recall that a continuous path is the image of the unit interval under a continuous map f into some space X (f : [0, 1] → X). Two continuous paths defined by
Homotopic Transformations of Combinatorial Maps
137
f : [0, 1] → X and g : [0, 1] → X are said to be homotopic iff there exists a continuous map H : [0, 1] × [0, 1] → X that transforms f into g: H(x, 0) = f(x) and H(x, 1) = g(x). Homotopic transformations are used to define fundamental groups in both continuous and digital topological spaces [1]. The fundamental group of a topological space X is the group formed by all equivalence classes of loops (paths f with f(0) = f(1)) under the equivalence relation of homotopy. We introduce the new notion of homotopy on combinatorial maps, derived from the definitions of [15] p. 187. However, for combinatorial maps, the dual structure is straightforwardly defined as shown in the previous section, and the definition is simpler. We say that two paths are equivalent if one can be obtained from the other by a finite sequence of operations of the form:

– if dk = α(dk−1), replace ..., dk−2, dk−1, dk, dk+1, ... by ..., dk−2, dk+1, ...
– or, conversely, replace ..., dk−1, dk, ... by ..., dk−1, d, α(d), dk, ...

It is clear that two equivalent paths must contain the same loops. Consider the set G of all combinatorial maps. A mapping Φ from G onto itself is said to be homotopic if it transforms a combinatorial map G into a combinatorial map Φ(G) such that each pair of equivalent paths of G is transformed into a pair of equivalent paths of Φ(G). As a direct consequence, there must be a bijection between the orbits of ϕ of G and of Φ(G).

Theorem 1. The contraction of an edge α∗(d) is a homotopic transformation.

The proof is straightforward, as the contraction neither removes nor creates a face. This simple result demonstrates the power of the notion of “homotopy” for combinatorial maps. Moreover, as underlined above, there is a direct interpretation of this notion for continuous topological maps. The contraction operation applied on combinatorial maps will lead to a description of topological maps having the same fundamental group (in the common continuous interpretation).
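The first equivalence move above (cancelling an adjacent pair d, α(d)) can be sketched as a stack-based reduction. This is our own illustration, with α taken as sign change on integer dart labels.

```python
def alpha(d):
    """Opposite dart; darts are encoded as signed integers here."""
    return -d

def reduce_path(path):
    """Repeatedly apply the first equivalence move of the text:
    delete any adjacent pair ..., d, alpha(d), ... from the path."""
    out = []
    for d in path:
        if out and d == alpha(out[-1]):
            out.pop()        # cancel the pair (out[-1], d)
        else:
            out.append(d)
    return out
```

For instance, a path followed by its reverse reduces to the empty path, consistent with the remark that two equivalent paths contain the same loops.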
This simple definition is completely equivalent to more complex frameworks [1], and can benefit from previously published results. Moreover, the contraction of pendant darts can be interpreted as the operation of simple point removal, neither of them changing the topology of the described set. In [9], a dual graph is transformed such that the degree of the surviving vertices is preserved. This rule can be applied to a combinatorial map. Theorem 2. The contraction of α∗ (d), where d is a redundant dart, preserves the cardinality of the orbits of surviving darts. We just give an intuitive idea of the result: Fig. 3 demonstrates the contraction of a redundant dart; as |σ ∗ (d)| = 2, the contraction of d does not change |σ ∗ (σ(α(d)))| and |σ ∗ (α(d))|. Contraction of redundant edges is connectivity preserving. It is a homotopic transformation, as it preserves the number of orbits of ϕ of a combinatorial map.
138
Jocelyn Marchadier, Walter G. Kropatsch, and Allan Hanbury
Fig. 3. Contraction of a redundant dart d: a) initial map; b) contraction of d. [Figure omitted.]
3 Weighted Combinatorial Maps

3.1 Definitions
We introduce here new notions related to combinatorial maps whose darts are associated with a single real number. A weighted combinatorial map is a 4-tuple (D, σ, α, w) where (D, σ, α) defines a combinatorial map, and w : D → R is a function defined on D, associating a real number w(d) to each dart d ∈ D. Weights associated with darts can take any value, depending on the application. We restrict ourselves to the study of a particular class of weighted combinatorial maps, where two opposite darts d and α(d) have opposite weights. We say that a weighted combinatorial map M = (D, σ, α, w) is antisymmetric iff ∀d ∈ D, w(d) = −w(α(d)). The following notions interpret the weights of darts of a weighted combinatorial map as differences of elevations between connected vertices. An upstream path is a path P = (d1, ..., dn) with only positive weights (∀i ≤ n, w(di) > 0). A downstream path is a path P = (d1, ..., dn) with only negative weights (∀i ≤ n, w(di) < 0). A plateau path is a path P = (d1, ..., dn) with only null weights (∀i ≤ n, w(di) = 0). Since the weights on opposite darts of a weighted combinatorial map can have any value, the reverse of an upstream path is not necessarily a downstream path. However, for an antisymmetric weighted map, this is the case, as stated by the following theorem:

Theorem 3. If a weighted combinatorial map G = (D, σ, α, w) is antisymmetric, then the reverse of every upstream path is a downstream path.

Proof. By definition, the reverse of a path P = (d1, ..., dn) is the path P′ = (α(dn), ..., α(d1)). Suppose that P is upstream; then ∀i ≤ n, w(di) > 0. As the map is antisymmetric, we have ∀i ≤ n, w(α(di)) = −w(di) < 0, and P′ is downstream.

The removal and the contraction transformations are defined as removal and contraction of the underlying combinatorial map, and do not modify the weights of the remaining darts.

3.2 Homotopy Revisited
We define here the concept of homotopy for weighted combinatorial maps, in such a way that it is coherent with the definition of Serra ([15] p. 448). Consider the set G of all weighted combinatorial maps. A mapping Φ from G onto itself is said to be homotopic if it transforms a weighted combinatorial map G into a weighted combinatorial map Φ(G) such that:
1. the combinatorial maps underlying G and Φ(G) are homotopic,
2. Φ preserves upstream and downstream paths.

Two weighted combinatorial maps G1 and G2 are homotopic iff there exists a homotopic transformation Φ such that G2 = Φ(G1). In the preceding definition, the path-preserving condition (condition 2) has to be understood as: any upstream (downstream) path is transformed into an upstream (downstream) path, possibly empty, and no new upstream (downstream) path is created. The contraction of a dart d of an antisymmetric weighted combinatorial map G = (D, σ, α, w) is path-preserving iff σ∗(d) = {d} (d is a pendant dart), or d is not a self-loop, w(d) ≠ 0, and ∀d′ ∈ σ∗(d), w(d)w(d′) ≤ 0 (the weights of d and of any of the darts adjacent to the same vertex have opposite signs). The contraction of path-preserving darts is a homotopic transformation, as stated in the following theorem.

Theorem 4. A path-preserving contraction of an antisymmetric weighted combinatorial map G is a homotopic transformation.

Proof. Condition 1 is already proved in Theorem 1. Condition 2 holds by the following arguments. If σ∗(d) = {d}, then d is a pendant dart, and its contraction does not create a new upstream or downstream path. Consider an upstream path P = (d1, ..., di−1, di, di+1, ..., dn). By definition, ∀j ≤ n, w(dj) > 0. By contraction of di, this path is transformed into the path P′ = (d1, ..., di−1, di+1, ..., dn), which is upstream. Suppose that, by contraction of a dart di with w(di) ≠ 0, a non-upstream path P = (d1, ..., di−1, di, di+1, ..., dn) is transformed into the upstream path P′ = (d1, ..., di−1, di+1, ..., dn). Then, as P′ is upstream and P is not, ∀j ≠ i, w(dj) > 0 and w(di) < 0. As the combinatorial map is antisymmetric, we have w(α(di−1)) = −w(di−1) < 0. Then w(α(di−1))w(di) > 0 and the contraction of di is not path-preserving. This reasoning can also be used to prove that no downstream path is created. Fig. 4.a illustrates the contraction of a dart which is not path-preserving: a downstream path which did not previously exist is created (in bold). In Fig. 4.b, the contraction of the path-preserving dart d does not create any upstream or downstream path.
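Our reading of the path-preserving condition can be written as a predicate. This is a sketch, not the authors' code; the dart encoding and the example map and weights are ours.

```python
def orbit(perm, d):
    """Orbit of dart d under the permutation perm."""
    result, cur = [d], perm[d]
    while cur != d:
        result.append(cur)
        cur = perm[cur]
    return result

def is_path_preserving(d, sigma, alpha, w):
    """Contraction of dart d is path-preserving iff d is pendant, or
    d is not a self-loop, w(d) != 0, and the weight of every other
    dart at the same vertex has the opposite sign (or is zero)."""
    vertex = orbit(sigma, d)
    if vertex == [d]:
        return True                          # pendant dart
    if alpha[d] in vertex:
        return False                         # self-loop
    if w[d] == 0:
        return False
    return all(w[d] * w[dp] <= 0 for dp in vertex if dp != d)

# Antisymmetric example: a degree-2 vertex carrying darts 1 and 2,
# each edge ending in a pendant vertex (darts -1 and -2).
SIGMA = {1: 2, 2: 1, -1: -1, -2: -2}
ALPHA = {d: -d for d in SIGMA}
W = {1: 2, -1: -2, 2: -3, -2: 3}             # w(alpha(d)) = -w(d)
```

With these weights, dart 1 is contractible (its vertex neighbor has opposite sign); flipping the sign of w(2) breaks the condition.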
4 Applications

4.1 Gray-Tone Skeletons
In this section, we apply the preceding treatments to the computation of skeletons of gray-level images, i.e. thin subsets of the crest network of a gray level image [12]. Dual graph contractions invariant to monotonic transformations have been studied in [8]. The framework presented here is different in that it is based on an alternative original graph, and uses the notion of homotopic transformation for combinatorial map discussed above.
Fig. 4. Contraction of a dart d: a) non-path-preserving contraction; b) path-preserving contraction. [Figure omitted.]
For skeletonisation applications, we start by constructing a weighted combinatorial map on a pixel-based scale. Then, by applying transformations which reduce the number of darts while preserving the homotopy of the combinatorial map until stability is reached, we obtain a compact representation of the skeleton of the original image. The initial combinatorial map can be obtained from a straightforward algorithm first introduced by M. Pierrot Deseilligny et al. [13]. An image I is defined as a function from the digital support [0, xmax] × [0, ymax] to Z. We define I′ as the image:

I′(x, y) = ((I(x, y) · dmax + D(x, y)) · xmax + x) · ymax + y    (1)
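Equation (1) can be sketched directly. This is our transcription, with the distance map D assumed to be given (e.g. precomputed from the cross-section distance transforms); the toy image and D values below are invented.

```python
def relabel(I, D, x_max, y_max, d_max):
    """Equation (1): I'(x, y) = ((I(x, y)*d_max + D(x, y))*x_max + x)*y_max + y.
    I and D are dicts over the pixel grid [0, x_max] x [0, y_max]."""
    return {(x, y): ((I[x, y] * d_max + D[x, y]) * x_max + x) * y_max + y
            for (x, y) in I}

# Toy 2 x 2 image with an assumed distance map D.
I = {(0, 0): 5, (1, 0): 3, (0, 1): 5, (1, 1): 7}
D = {(0, 0): 1, (1, 0): 0, (0, 1): 2, (1, 1): 1}
```

On this toy input every pixel receives a distinct value; in general the paper relies on I′ assigning a unique value to each pixel of the actual image.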
where D(x, y) is the distance of a point (x, y) to the nearest point with lower intensity and dmax = max(x,y) D(x, y) (D can easily be related to the classical distance transform of cross-sections of the original gray-level image). I′ assigns to each pixel a unique value. The weighted combinatorial map is built using a simple algorithm based on a local analysis of the 8-neighborhood of each pixel p of I′. Each 3 × 3 neighborhood is decomposed into sets of 4-connected components, the values of which are greater than the value of the central pixel. We construct the contour map by adding an edge (pair of conjugate darts) that connects the central pixel with the highest-valued pixel of each component. The weight associated with each dart is given by the difference between the values of its end-vertex and its origin-vertex. The map is obviously antisymmetric. One can demonstrate that the combinatorial map is connected, and that a bijection exists between the faces (ϕ-orbits) of the combinatorial map and the local minima of I′ [13]. Homotopic transformations can then be applied in order to simplify the combinatorial map and to get rid of the undesired edges. For example, darts that are either redundant or pendant, and whose contraction is path-preserving, can be contracted until stability in order to obtain the simplest combinatorial map (the kernel of the transformation) describing the crest network of the image. Some
Fig. 5. Skeleton of a weighted combinatorial map
darts describing relevant features can be excluded from the contraction operation in order to preserve these features. This is equivalent to defining anchor points [13,14]. As a classical example of “anchor darts”, we may want to keep pendant darts with negative weights, corresponding to peaks in the image, leading to results very similar to [11]. Fig. 5 shows the result of path-preserving contractions, applied until stability, of the redundant darts (on the right), with pendant darts of negative weight as anchor darts (characterizing peaks in the original image). The recursive contraction of pendant darts is applied first, and then the path-preserving contraction of redundant darts is computed. The algorithm in that order leads to a time complexity linear in the number of edges (two passes over all edges). As the combinatorial map can be constructed within a single pass over the original image, the complete algorithm is linear in the number of pixels of the original image. The result is a compact combinatorial map describing the structure of the skeleton much more compactly than a raster graph; it may be thought of as a “graph” skeleton. Hierarchies of combinatorial maps may be defined, describing different simplification levels of the underlying graph. The homotopic kernel of the contraction (made up of loops only) may then be thought of as the top level of the hierarchy. Criteria other than homotopy, such as geometry, may also be considered. With the criteria used, the result is independent of the order of the contractions, and the algorithm can be implemented in parallel. The above scheme also works for different types of grids or digital topologies, given a proper embedding of the underlying combinatorial maps.

4.2 Curve-Based Runlength Encoding
The main idea proposed in this section is to construct a combinatorial map such that in a band defined by two consecutive vertices of the map, no topological events occur (no region appears or disappears, the interior of regions being described by convex domains). For example, in Fig. 6, a simple region described by its boundary is decomposed into two bands (A and B) within which the region is decomposed into connected components which are convex on each horizontal line. By carefully choosing the weights associated to each dart of the map, we can use the algorithm described in the preceding section in order to compute the
Fig. 6. Curve-based encoding of horizontal runs
combinatorial map. We consider an antisymmetric weighted combinatorial map G = (D, σ, α, w) such that each vertex σ∗(d) is a point of Z², each edge α∗(d) relates a pair of (4- or 8-)neighbors, and, for a dart d ∈ D, w(d) = y′ − y with y′ and y the vertical coordinates of the vertices σ∗(α(d)) and σ∗(d), respectively. The algorithm computing the sought combinatorial map performs the contraction, until stability, of the path-preserving redundant darts. Fig. 6 illustrates the result of such an algorithm. On the left, the initial map is drawn. On the right, the contracted map is represented. The regions A and B indicate the rows of the image for which the connected components associated with each region are convex. The resulting map can be used for filling regions by a simple scan-line algorithm which retrieves from the curves the horizontal runs [5] describing the interior of regions. The first part of the algorithm is to sort the vertices of the computed map according to their vertical coordinates. Between two vertices, the curves of the map decrease monotonically. Thus, a simple loop can be used to deduce the left and right extremities of a run filling a connected component associated with a region. When a line corresponding to the end of the treated band is reached, the followed curves can be updated according to the topological events that occur on that line (insertion/deletion of followed curves, insertion/deletion of filled connected components).
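The vertical-difference weights of this section can be sketched with darts as ordered pairs of grid vertices. The encoding (tail vertex first, head vertex second) is our own choice for illustration.

```python
def weight(dart):
    """w(d) = y' - y: vertical coordinate of the head vertex minus
    that of the tail vertex, as in the text."""
    (x, y), (x2, y2) = dart
    return y2 - y

def opposite(dart):
    """alpha reverses the dart."""
    tail, head = dart
    return (head, tail)

d = ((3, 1), (3, 2))        # dart pointing one row upward
```

Every map weighted this way is automatically antisymmetric, so the path-preserving contractions of Sect. 3 apply unchanged.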
5 Conclusion and Perspectives
In this paper, we have proposed the concept of homotopy for combinatorial maps and weighted combinatorial maps. This leads to defining homotopy between unembedded structures encoding topological maps. The main advantage in doing so is to define homotopic transformations independently of the embedding. The concepts, proofs, and algorithms proposed are simple. Nevertheless, the proposed concepts lead to results completely analogous to those of more complex frameworks. This apparent simplicity demonstrates the usefulness of this research. As an application, we considered the computation of skeletons, for which a compact structure is produced. This naturally extends the concept of skeleton, leading to the new concept of “combinatorial map skeletons”, i.e. planar embedded graphs which describe homotopic digital or continuous topological structures in the classical sense. The encoding of a combinatorial map whose vertices define bands decomposing the described topological map into
convex connected components has been proposed in the same framework. Other applications are possible. For example, the computation of hierarchies of skeletons is straightforward. We could also consider the extension of the proposed framework into higher dimensions.
References

1. Ayala R., Dominguez E., Francés A. R., Quintero A., “Homotopy in digital spaces”, Discrete Applied Mathematics 125(1) (2003) 218–228
2. Bertrand G., “New Notions for Discrete Topology”, DGCI’99, Lecture Notes in Computer Science no. 1568 (1999) 3–24
3. Braquelaire J.-P., Brun L., “Image Segmentation with Topological Maps and Interpixel Representation”, Journal of Visual Communication and Image Representation, vol. 9(1) (1998) 62–79
4. Brun L., Kropatsch W. G., “Dual Contraction of Combinatorial Maps”, PRIP-TR-54, Vienna University of Technology (1999), 37 pages
5. Burge M., Kropatsch W. G., “A Minimal Line Property Preserving Representation of Line Images”, Computing, vol. 62 (1999) 355–368
6. Fiorio C., “A Topologically Consistent Representation for Image Analysis: the Frontiers Topological Graph”, DGCI’96, Lecture Notes in Computer Science no. 1176 (1996) 151–162
7. Gangnet M., Hervé J.-C., Pudet T., Van Tong J.-M., “Incremental Computation of Planar Maps”, SIGGRAPH Proc., Computer Graphics, vol. 23(3) (1989) 345–354
8. Glantz R., Englert R., “Dual Image Graph Contractions Invariant to Monotonic Transformations of Image Intensity”, in Proc. of the 2nd Int. IAPR Workshop on Graph-based Representation (1999)
9. Kropatsch W. G., “Property Preserving Hierarchical Graph Transformations”, Advances in Visual Form Analysis, C. Arcelli, L. Cordella and G. Sanniti di Baja, Eds. (1997) 340–349
10. Lienhardt P., “Topological Models for Boundary Representation: a Comparison with n-dimensional Generalized Maps”, Computer Aided Design, vol. 23(1) (1991) 59–82
11. Marchadier J., Arquès D., Michelin M., “Thinning Grayscale Well-Composed Images: A New Approach for Topological Coherent Image Segmentation”, DGCI’02, Lecture Notes in Computer Science no. 2301 (2002) 360–371
12. Meyer F., “Skeletons and Perceptual Graphs”, Signal Processing, vol. 16 (1989) 335–363
13. Pierrot Deseilligny M., Stamon G., Suen C., “Veinerization: A New Shape Description for Flexible Skeletonization”, IEEE Trans. on PAMI, vol. 20(5) (1998) 505–521
14. Ranwez V., Soille P., “Order Independent Homotopic Thinning for Binary and Grey Tone Anchored Skeletons”, Pattern Recognition Letters, vol. 23 (2002) 687–702
15. Serra J., “Image Analysis and Mathematical Morphology”, Academic Press, London, 1982, 610 pages
Combinatorial Topologies for Discrete Planes

Yukiko Kenmochi¹ and Atsushi Imiya²,³

¹ Department of Information Technology, Okayama University, 3-1-1 Tsushimanaka, Okayama 700-8530, Japan, [email protected]
² National Institute of Informatics / Department of Informatics, The Graduate University for Advanced Studies, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
³ Institute of Media and Information Technology, Chiba University, 1-33 Yayoi-cho, Inage-ku, Chiba 263-8522, Japan, imiya@{nii,media.imit.chiba-u}.ac.jp
Abstract. A discrete analytical plane DAP is defined as a set of lattice points which satisfy two inequalities. In this paper, we define a discrete combinatorial plane DCP and show relations between DAPs and DCPs such that a DCP is a combinatorial surface of a DAP. From the relations, we derive new combinatorial topological properties of DAPs.
1 Introduction
A plane P in the 3-dimensional Euclidean space R3 is given by an analytical form such that P = {(x, y, z) ∈ R3 : ax + by + cz + d = 0}
(1)
where a, b, c, d are real numbers. Let Z be the set of integers; Z³ denotes the set of lattice points whose coordinates are all integers. The discrete analytical form of discrete planes in Z³, called discrete analytical planes, was introduced by Reveillès [9] and defined such that DAP = {(x, y, z) ∈ Z³ : 0 ≤ ax + by + cz + d < w}
(2)
where a, b, c, d are all integers. We call w the width of a DAP. If w = |a| + |b| + |c|, a DAP is called a standard plane SP [2,5], and if w = max{|a|, |b|, |c|}, a DAP is called a naive plane NP [9]. In this paper, we define discrete planes which have combinatorial topological structures, called discrete combinatorial planes DCPs. We construct a DCP by applying our algorithm of combinatorial boundary tracking [8] to one of the digitized half spaces separated by P. Any DCP is defined as the combinatorial boundary of a polyhedral complex which is considered to be a polygonal decomposition of the border points of such a separated region. Thus, a DCP is a topological space and not a subset of Z³ as a DAP is. Our main aim is to show relations between DAPs and DCPs such that a DCP is a combinatorial surface of a DAP. In [7], we have already shown such

I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 144–153, 2003. © Springer-Verlag Berlin Heidelberg 2003
relations between NPs and DCPs. The discrete combinatorial planes defined in [7] are based on simplicial complexes [1,10]; in this paper, they are based on polyhedral complexes [11]. This is because we would like to make use of our recent results from the polyhedral approach [8] to obtain relations between DAPs and DCPs, some of which could not be derived from the simplicial approach [7], for example, relations between SPs and DCPs. From the relations, we obtain combinatorial topological properties of DAPs. First, we consider configurations of points of a DAP in local regions which project onto the coordinate plane z = 0 as rectangles of size 2 × 2; these configurations are called bicubes. If the sizes of the rectangles are 3 × 3, they are called tricubes. It has been shown in [4,6] that there exist five different bicubes and forty different tricubes in NPs where 0 ≤ a ≤ b ≤ c, c > 0. Note that they have been obtained only for NPs, but it is possible to extend the results to SPs if we use the local differences of combinatorial topological structures between an NP and an SP. We therefore obtain combinatorial topological structures of bicubes and tricubes, called combinatorial bicubes and tricubes, for both SPs and NPs. By observing combinatorial bicubes and tricubes, we show that a DCP is a 2-dimensional combinatorial manifold of a DAP. Similar properties have been given for SPs in [5] and for NPs in [6], respectively, but no proof and no details are given in [6]. We also study the connectivities of points in a DAP and in its complement, and derive the same results as previous work [3,5], whose proofs are different from ours.
2 Discrete Combinatorial Planes
In Rⁿ, a convex polyhedron σ is the convex hull of a finite set of points in some Rᵈ where d ≤ n. The dimension of σ is the dimension of its affine hull. An n-dimensional convex polyhedron σ is abbreviated to an n-polyhedron. A linear inequality a · x ≤ z is valid for σ if it is satisfied for all points x ∈ σ. A face of σ is defined by any set δ = σ ∩ {x ∈ Rᵈ : a · x = z} where a · x ≤ z is valid for σ.

Definition 1. A polyhedral complex K is a set of convex polyhedra such that
1. the empty polyhedron is in K,
2. if σ ∈ K, then all faces of σ are also in K,
3. the intersection σ ∩ τ of two convex polyhedra σ, τ ∈ K is a face both of σ and of τ.

The dimension of K is the largest dimension of a convex polyhedron in K. In Z³, m-neighborhoods are defined by

Nm(x) = {y ∈ Z³ : ∥x − y∥₂² ≤ t}    (3)
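With t = 1, 2, 3 for m = 6, 18, 26 (as stated next), the neighborhoods can be enumerated directly. This is a sketch in our own notation; we exclude the center point itself so that the neighborhood sizes come out as 6, 18 and 26.

```python
from itertools import product

T = {6: 1, 18: 2, 26: 3}     # t = 1, 2, 3 for m = 6, 18, 26

def neighborhood(x, m):
    """Lattice points y != x with squared Euclidean distance
    ||x - y||^2 <= t from x; |N_m(x)| = m for m = 6, 18, 26."""
    cx, cy, cz = x
    return {(cx + dx, cy + dy, cz + dz)
            for dx, dy, dz in product((-1, 0, 1), repeat=3)
            if 0 < dx * dx + dy * dy + dz * dz <= T[m]}
```

Squared distance 1 gives the 6 face neighbors, distance 2 adds the 12 edge neighbors, and distance 3 adds the 8 corner neighbors.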
setting t = 1, 2, 3 for each m = 6, 18, 26. We consider all convex polyhedra in Z³ such that the vertices are all lattice points and any adjacent vertices are m-neighboring each other for either m = 6, 18, 26. We call such convex polyhedra discrete convex polyhedra. We illustrate all discrete convex polyhedra with the
Table 1. All discrete n-polyhedra for n = 0, 1, 2, 3. [Figures showing each discrete convex polyhedron P1–P8, grouped by dimension (0, 1, 2, 3) and by neighborhood system (N6, N18, N26), are omitted.]
dimension n = 0, 1, 2, 3 and with the m-neighborhood relations between adjacent vertices for m = 6, 18, 26 in Table 1. We then construct a discrete polyhedral complex, which is a collection of discrete convex polyhedra satisfying the three conditions of Definition 1, for each m-neighborhood system. Hereafter, n-dimensional discrete convex polyhedra and n-dimensional discrete polyhedral complexes are called discrete n-polyhedra and discrete n-complexes. We give some topological notions for discrete polyhedral complexes [1]. A discrete n-complex K is said to be pure if each of the discrete n′-polyhedra of K with n′ < n is a face of a discrete n-polyhedron of K. If K0 is any subcomplex of K, the complex consisting of all the elements of K0 and of all the elements of K each of which is a face of at least one element of K0 is called the combinatorial closure Cl(K0) of K0 in K. We consider a discrete polyhedral complex C as a topological representation of any subset V ⊂ Z³, i.e. a topological space obtained by topologizing V. We first obtain a pure discrete 3-subcomplex O ⊆ C and then define the combinatorial boundary ∂O of O as follows [10].

Definition 2. Let O be a pure discrete 3-complex and Q be the set of all discrete 2-polyhedra in O each of which is a face of exactly one discrete 3-polyhedron in O. The combinatorial boundary of O is defined such that ∂O = Cl(Q).

From Definition 2, we see that the boundary ∂O of a pure discrete 3-complex O is a pure discrete 2-subcomplex of O. Because discrete convex polyhedra are defined for each m-neighborhood system, m = 6, 18, 26, a discrete polyhedral complex C, a pure discrete 3-complex O and the combinatorial boundary ∂O are also defined for each m-neighborhood system. When we wish to make explicit the m-neighborhood system considered for them, they are denoted by Cm, Om and ∂Om.
Given a finite lattice-point set V ⊂ Z³, we show in [8] how to construct a pure discrete 2-complex ∂Om which is the combinatorial boundary of V. The idea is simple: we first obtain a discrete polyhedral complex Cm by putting as many discrete convex polyhedra as possible into V, so that all the vertices of the discrete convex polyhedra are points in V and the dimensions of the discrete convex polyhedra are maximal. Then we cut away the less-than-3-dimensional parts of Cm to obtain a pure discrete 3-complex Om. Finally, we extract the combinatorial boundary ∂Om from Om following Definition 2. The full details and a more efficient algorithm for obtaining ∂Om from V are found in [8]. In the sense of general topology, we define the set of border points of V such that

Brm(V) = {x ∈ V : Nm(x) ∩ V̄ ≠ ∅},    (4)

where V̄ = Z³ \ V. Let Sk(K) be the set of all vertices of discrete convex polyhedra in a discrete complex K. We then have the following important relations, which will be used later in this paper. The proof is omitted because of the page limitation; it will be seen in our prepared paper.¹

Theorem 1. For any subset V ⊂ Z³, we have relations such as

Br6(V) = Sk(∂O26) ∪ (Sk(C26) \ Sk(O26)),
Br26(V) = Sk(∂O6) ∪ (Sk(C6) \ Sk(O6)).
A plane P of (1) defines two digitized half spaces such as I− = {(x, y, z) ∈ Z 3 : ax + by + cz + d ≤ 0}, I+ = {(x, y, z) ∈ Z 3 : ax + by + cz + d ≥ 0}.
(5)
We apply the algorithm of combinatorial boundary tracking shown in the previous subsection to I+, instead of V, to obtain a discrete combinatorial plane DCPm, which is a pure discrete 2-complex ∂Om. Table 2 illustrates how to obtain Cm for each m = 6, 26 from I+; depending on the point configuration Hi, i = 0, ..., 9, of I+ at each unit cubic region, we have a discrete polyhedral complex, and we set Cm to be the union of these discrete polyhedral complexes over all unit cubic regions in Z³. Obviously, I+ is an infinite set. Therefore, from a computational viewpoint, the algorithm will not end if it is applied to I+. However, from a mathematical viewpoint, we see that DCPm is uniquely obtained from I+. We then have the following inclusion relations. The proof, which is omitted here, can be made similar to that of Lemma 1 in [7].²

Property 1. For any plane P, we have the relations such that Sk(DCP6) ⊇ Sk(DCP18) = Sk(DCP26).
¹ Similar relations are shown in [8] with some illustrations: Br6(V) = Sk(∂O26) ∪ Sk(C26 \ O26), Br26(V) = Sk(∂O6) ∪ Sk(C6 \ O6).
² In [7] a discrete combinatorial plane is a simplicial complex, but it is easy to show that it has the same set of vertices as that of our DCP, which is a polyhedral complex.
Table 2. A discrete polyhedral complex Cm for each configuration Hi, i = 0, ..., 9, of points of I+ for m = 6, 26. We consider cases of 0 ≤ a ≤ b ≤ c, c > 0 in the table. [Figures showing, for each configuration H0–H9 of points of I+ at a unit cube, the corresponding complexes C6 and C26 are omitted.]
Similarly to Brm(V) of (4), we can define the set of border points of I+, called a discrete morphological plane, such that

DMPm = {x ∈ I+ : Nm(x) ∩ Ī+ ≠ ∅},    (6)

where Ī+ = Z³ \ I+.
for each m = 6, 18, 26. We then derive the next relations between DMP and DCP corresponding to Theorem 1 about the relations between Br(V) and ∂O. Lemma 1. For any plane P, we have relations such as DMP6 = Sk(DCP18 ) = Sk(DCP26 ), DMP26 = Sk(DCP6 ).
(7)
Proof. From Theorem 1, replacing V with I+, we have DMP6 = Sk(DCP26) ∪ (Sk(C26) \ Sk(O26)) and DMP26 = Sk(DCP6) ∪ (Sk(C6) \ Sk(O6)). From Property 1, it is easily seen that we only need to show that the second terms are empty, namely, that Cm = Om for m = 6, 26. For the proof, we show that Cm is pure, so that each discrete n-polyhedron of Cm with n < 3 is a face of a discrete 3-polyhedron of Cm. We first consider the case of m = 6. Let us consider a discrete 2-polyhedron σ2 in Table 2, for example in H4. Set H(i, j, k) to be a configuration of points of I+ at a unit cube whose vertices are the eight lattice points (i + ε1, j + ε2, k + ε3) with εl = 0 or 1 for l = 1, 2, 3. From the configuration H(i, j, k) of H4, we see that H(i, j, k + 1) can only be H9, which has a 3-polyhedron σ3. Thus σ2 is a face of σ3, and the faces of σ2 are also faces of σ3. Similarly, we can show that the other discrete 2-polyhedra of H6, H7 and H8 are also faces of some discrete 3-polyhedra of H9 if we consider the possible point configurations of the adjacent cubes. Let us consider discrete 1-polyhedra which are not faces of discrete 2-polyhedra in Table 2, for example, a discrete 1-polyhedron σ1 of H2. From the configuration H(i, j, k) of H2, we see that H(i, j, k + 1) can only be H7, and σ1 is a face of the right-side discrete 2-polyhedron σ2. Such a σ2 is a face of a discrete 3-polyhedron of H9, as we have already shown above. Similarly, we can show that the other discrete 1-polyhedra of H3, H5 and H6 are also faces of some discrete 2-polyhedra which are faces of some discrete 3-polyhedra of H9. Finally, let us consider discrete 0-polyhedra which are not faces of any discrete 1-polyhedron in Table 2, such as a discrete 0-polyhedron σ0 of H1. From the 1-point configuration H(i, j, k) of H1, we see that H(i, j, k + 1) can be H5 or H6
which has a discrete 1-polyhedron σ1 such that σ0 is a face of σ1 and σ1 is a face of a discrete 3-polyhedron of H9. Let us now consider the case of m = 26. In this case, we need to check only the discrete 0-, 1- and 2-polyhedra of H1, H2, H3 and H4. Similarly to the case of m = 6, we find the possible configurations H(i, j, k + 1) adjacent to H(i, j, k) of H1, H2, H3 and H4: H(i, j, k + 1) can only be H5 or H6 for H(i, j, k) of H1, H7 for H2, H8 for H3, and H6 or H9 for H4. Therefore, all discrete 0-, 1- and 2-polyhedra are faces of some discrete 3-polyhedra. (Q.E.D.)
3 Relations between DAPs and DCPs
Given a plane P in R³ and obtaining SP, NP, and DCPm for m = 6, 18, 26, respectively, we derive the next theorem.

Theorem 2. For any P, we have relations such as

SP = Sk(DCP6),    (8)
NP = Sk(DCP18) = Sk(DCP26).    (9)
The relations of (9) have already been proved in [7] (Theorem 2 in [7]).² In this paper, we give a proof for (8). Our approach in the following is completely different from that given in [7] for (9). For a proof of (8), due to (7) in Lemma 1, we only need to show the following lemma. Note that it is easy to modify the following lemma for (9), in the form NP = DMP6.

Lemma 2. For any plane P, we have SP = DMP26.

In order to prove this lemma, we need the following lemma.

Lemma 3. For any plane P such that 0 ≤ a ≤ b ≤ c, c > 0, if a point (u − 1, v − 1, w − 1) ∈ I+, then N26(u, v, w) ⊂ I+.

Proof. Because (u − 1, v − 1, w − 1) ∈ I+, we obtain a(u − 1) + b(v − 1) + c(w − 1) + d ≥ 0 from (5). Setting (u′, v′, w′) ∈ N26(u, v, w), we have u − 1 ≤ u′, v − 1 ≤ v′, w − 1 ≤ w′; thus au′ + bv′ + cw′ + d ≥ a(u − 1) + b(v − 1) + c(w − 1) + d ≥ 0 because a, b, c are not negative. (Q.E.D.)

Proof of Lemma 2. For simplification, we set w = a + b + c for the SP of (2) such that 0 ≤ a ≤ b ≤ c, c > 0. Similar proofs are easily derived for other Ps. Let us consider two Euclidean planes, P of (1) and P′ = {(x, y, z) ∈ R³ : ax + by + cz + d = a + b + c}. We see that SP is the set of lattice points between P and P′. Obviously, a point (p, q, r) ∈ R³ is on P′ if (p − 1, q − 1, r − 1) ∈ R³ is on P. Geometrically, this means that there is a unit cube between P and P′ such that the two vertices (p, q, r) and (p − 1, q − 1, r − 1) of the unit cube are on P′ and P, respectively.
150
Yukiko Kenmochi and Atsushi Imiya
(i) For any point (u, v, w) ∈ SP, i.e. a point (u, v, w) between P and P′ (it can be on P but not on P′), we have 0 ≤ au + bv + cw + d < a + b + c from (2). Thus, −(a + b + c) ≤ a(u − 1) + b(v − 1) + c(w − 1) + d < 0, so that (u − 1, v − 1, w − 1) ∉ I+. Because (u − 1, v − 1, w − 1) ∈ N26(u, v, w), we have N26(u, v, w) ⊄ I+.
(ii) For any point (u, v, w) ∈ I+ \ SP, we have au + bv + cw + d ≥ a + b + c, thus a(u − 1) + b(v − 1) + c(w − 1) + d ≥ 0. Therefore (u − 1, v − 1, w − 1) ∈ I+, and we obtain N26(u, v, w) ⊂ I+ from Lemma 3.
(iii) From (i) and (ii), we have SP = DMP26. (Q.E.D.)

From Theorem 2, we see that DCP18 and DCP26 are topological spaces on NP and DCP6 is a topological space on SP.
4 Combinatorial Topological Properties of DAPs

4.1 Combinatorial Bicubes and Tricubes
It has been shown in [4,6] that there exist five bicubes and forty tricubes in NPs where 0 ≤ a ≤ b ≤ c, c > 0. Considering each of the five bicubes [6] at a region B(i, j, k) = {(p, q, r) ∈ Z³ : p = i, i + 1; q = j, j + 1; r = k, k + 1} for (i, j, k) ∈ Z³, we obtain a combinatorial bicube such that CBm(i, j, k) = {σ ∈ DCPm : Sk({σ}) ⊆ B(i, j, k)} for m = 6, 18, 26. Similarly, considering each of the forty tricubes [4] to be a union of eight bicubes, we obtain combinatorial tricubes such that CTm(i, j, k) = ∪(p,q,r)∈B(i,j,k) CBm(p, q, r) for m = 6, 18, 26. We illustrate all CTm(i, j, k) for m = 6, 26 in Figs. 1 and 2. We can easily obtain CT18(i, j, k), which is similar to CT26(i, j, k), by replacing CB26(p, q, r) with CB18(p, q, r) for (p, q, r) ∈ B(i, j, k); there are not many differences between CB18(p, q, r) and CB26(p, q, r).

4.2 Combinatorial Topological Properties
Let K be a polyhedral complex. For each vertex v ∈ Sk(K), the subcomplex consisting of all convex polyhedra σ of K which contain v, such that v ∈ Sk({σ}), is called the star St(v, K) of v in K [1,11]. The link of v is then defined as Lk(v, K) = Cl(St(v, K)) \ St(v, K) in [1,11] (the link is called the outer boundary in [1]). A star St(v, K) is said to be cyclic if Lk(v, K) is a simple closed broken line (i.e., if its elements are disposed in cyclic order, like the elements of a circle split up into sectors) [1]. If a star is cyclic, it is combinatorially equivalent to a disc and called an umbrella [10]. For each combinatorial tricube in Figs. 1 and 2, if we consider a star of each white vertex, we obtain 8 and 34 different configurations of stars for m = 6, 26, respectively. Note that we also obtain 34 configurations for m = 18. We show the stars as polygons with diagonal lines in Figs. 1 and 2, and it is obvious that they are cyclic, i.e. umbrellas. The number of umbrellas is less than 40 because there are umbrellas of the same shape for different tricubes. Therefore, we obtain the following property. Note that similar properties are presented in [5,6]; a different proof is seen in [5] and no proof is presented in [6].
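As an illustration of the cyclicity condition (not part of the paper), the following Python sketch checks whether the link of a vertex, given as a set of undirected edges between link vertices, forms a simple closed broken line:

```python
from collections import defaultdict

def link_is_simple_cycle(edges):
    """Return True when the undirected edges form one simple closed cycle,
    i.e. the condition for a star to be cyclic (an umbrella)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    # every vertex of a simple closed broken line has exactly two neighbors
    if any(len(nbrs) != 2 for nbrs in adj.values()):
        return False
    # walk the cycle from an arbitrary vertex; it must visit every vertex
    start = next(iter(adj))
    prev, cur, steps = None, start, 0
    while True:
        cur, prev = next(v for v in adj[cur] if v != prev), cur
        steps += 1
        if cur == start:
            return steps == len(adj)
```

A link that is an open broken line, or one that splits into several closed curves, fails the test, matching the informal "circle split up into sectors" picture.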
Combinatorial Topologies for Discrete Planes
151
Fig. 1. Combinatorial tricubes CT6 (i, j, k)s for SPs with combinatorial structures obtained from DCP6 s. Eight umbrellas of white vertices which are different from others are also shown as polygons with diagonal lines.
Property 2. Let us consider SP, NP and DCPm, m = 6, 18, 26 for a given P. We then see that DCP6 is a 2-dimensional combinatorial manifold of SP and that DCP18 and DCP26 are those of NP.

4.3 Connectivity Properties
A subset A ⊂ Z³ is said to be m-connected if any pair of elements a, b ∈ A has a path a1 = a, a2, a3, ..., ap = b such that ai+1 ∈ Nm(ai) and ai ∈ A for every i = 1, 2, ..., p − 1. Andrès derived connectivity properties of DAPs with the definitions of k-tunnel and k-separating [3] (in [3], k is set to be 0, 1, 2; in this paper we set k = 26, 18, 6 instead to avoid confusion). If the complement of a DAP in Z³ is not k-connected, the DAP is said to be k-separating for k = 6, 18, 26. Considering the two regions

A = {(x, y, z) ∈ Z³ : ax + by + cz + d < 0},
Fig. 2. Combinatorial tricubes CT26 (i, j, k)s for NPs with combinatorial structures obtained from DCP26 s. There are thirty-four different umbrellas of white vertices which are shown as polygons with diagonal lines. Six combinatorial tricubes with asterisks have the umbrellas which are the same as the others.
B = {(x, y, z) ∈ Z³ : ax + by + cz + d ≥ ω},

if there are two k-neighboring points a and b such that a ∈ A and b ∈ B, the DAP is said to have a k-tunnel for k = 6, 18, 26. The following properties have already been presented in [3,5]; we derive them differently and more simply by making use of Theorem 2: from Lemmas 4 and 5 and Theorem 2, we derive Properties 3 and 4.

Property 3. A standard plane SP is tunnel free and 6-connected.

Property 4. A naive plane NP may have an 18-tunnel but no 6-tunnel, and is 6-separating, i.e. NP is 18-connected but not 6-connected.

Lemma 4. Any Sk(DCPm) is m-connected for each m = 6, 18, 26.

Proof. From the definition of discrete polyhedra, a set Sk({σ}) for a discrete 2-polyhedron σ is m-connected. From Property 2, we see that any vertex v in
DCPm has the star St(v, DCPm), which is an umbrella, and that Sk(St(v, DCPm)) is also m-connected. Because DCPm is a union of connected St(v, DCPm)s, Sk(DCPm) is m-connected. (Q.E.D.)

Lemma 5. Any DMPm for m = 6, 18, 26 is m-separating.

Proof. From (6), any two points a, b such that a ∈ I+ \ DMPm and b ∈ I− are not m-neighboring. Thus, a DMPm is m-separating. (Q.E.D.)
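Properties 3 and 4 can also be checked empirically on a small window of the lattice. The following Python sketch (an illustration, not part of the paper) builds the naive plane 0 ≤ x + 2y + 3z < 3 (thickness max(|a|, |b|, |c|)) restricted to a finite window, and tests its m-connectivity by breadth-first search; the coefficients and window are arbitrary choices:

```python
from collections import deque
from itertools import product

def neighbors(p, m):
    """The m-neighbors (m = 6, 18, 26) of a lattice point p in Z^3."""
    x, y, z = p
    result = []
    for dx, dy, dz in product((-1, 0, 1), repeat=3):
        s = abs(dx) + abs(dy) + abs(dz)
        if s == 0:
            continue
        if (m == 6 and s == 1) or (m == 18 and s <= 2) or m == 26:
            result.append((x + dx, y + dy, z + dz))
    return result

def is_m_connected(points, m):
    """BFS test of m-connectivity of a finite subset of Z^3."""
    pts = set(points)
    if not pts:
        return True
    start = next(iter(pts))
    seen, queue = {start}, deque([start])
    while queue:
        p = queue.popleft()
        for q in neighbors(p, m):
            if q in pts and q not in seen:
                seen.add(q)
                queue.append(q)
    return seen == pts

# Naive plane restricted to a window in which every (x, y) column keeps
# its unique z inside the window.
NP = [(x, y, z)
      for x in range(-6, 7) for y in range(-6, 7) for z in range(-6, 7)
      if 0 <= x + 2 * y + 3 * z < 3]
```

On this window the set is 18- and 26-connected but not 6-connected, as Property 4 states for naive planes.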
5 Conclusions
We defined discrete combinatorial planes DCPm for m = 6, 18, 26 and showed the relations between DAPs and DCPms given in Theorem 2: a DCP6 is a combinatorial topology on an SP, and a DCP18 or DCP26 is a combinatorial topology on an NP. From these relations, we obtained combinatorial structures on DAPs, the combinatorial bicubes and tricubes. Using them, we proved that any DAP can be considered a 2-dimensional combinatorial manifold, and we also derived its connectivity properties. Part of this work was supported by JSPS Grant-in-Aid for Encouragement of Young Scientists (15700152).
References

1. P. S. Alexandrov, Combinatorial Topology, Vol. 1, Graylock Press, Rochester, New York, 1956.
2. E. Andrès, "Le plan discret," in Actes du 3e Colloque Géométrie discrète en imagerie : fondements et applications, Strasbourg, September 1993.
3. E. Andrès, R. Acharya, C. Sibata, "Discrete analytical hyperplanes," Graphical Models and Image Processing, Vol. 59, No. 5, pp. 302–309, 1997.
4. I. Debled-Rennesson, "Etude et reconnaissance des droites et plans discrets," Thèse de doctorat de l'université Louis Pasteur, 1995.
5. J. Françon, "Sur la topologie d'un plan arithmétique," Theoretical Computer Science, Vol. 156, pp. 159–176, 1996.
6. J. Françon, J. M. Schramm, M. Tajine, "Recognizing arithmetic straight lines and planes," in LNCS 1176; Discrete Geometry for Computer Imagery, Proceedings of the International Workshop DGCI'96, pp. 141–150, Springer-Verlag, Berlin, Heidelberg, 1996.
7. Y. Kenmochi, A. Imiya, "Naive planes as discrete combinatorial surfaces," in LNCS 1953; Discrete Geometry for Computer Imagery, Proceedings of the 9th International Conference, DGCI 2000, pp. 249–261, Springer, 2000.
8. Y. Kenmochi, A. Imiya, "Discrete polyhedrization of a lattice point set," in LNCS 2243; Digital and Image Geometry, pp. 148–160, Springer, 2001.
9. J.-P. Reveillès, "Géométrie discrète, calcul en nombres entiers et algorithmique," Thèse d'état soutenue à l'université Louis Pasteur, 1991.
10. J. Stillwell, Classical Topology and Combinatorial Group Theory, Springer-Verlag, New York, 1993.
11. G. M. Ziegler, Lectures on Polytopes, Springer-Verlag, New York, 1994.
Convex Structuring Element Decomposition for Single Scan Binary Mathematical Morphology

Nicolas Normand

IRCCyN-IVC (CNRS UMR 6597), École polytechnique de l'université de Nantes, La Chantrerie, rue Christian Pauc, BP 50609, 44306 Nantes Cedex 3, France
Abstract. This paper presents a structuring element decomposition method and a corresponding morphological erosion algorithm able to compute the binary erosion of an image using a single regular pass, whatever the size of the convex structuring element. Similarly to classical dilation-based methods [1], the proposed decomposition is iterative and builds a growing set of structuring elements. The novelty consists in using the set union instead of the Minkowski sum as the elementary structuring element construction operator. At each step of the construction, already-built elements can be joined together in any combination of translations and set unions. There are no restrictions on the shape of the structuring element that can be built. Arbitrary shape decompositions can be obtained with existing genetic algorithms [2] with a homogeneous construction method. This paper, however, addresses the problem of convex shape decomposition with a deterministic method.
1 Introduction
Mathematical morphology operators are time-consuming with large structuring elements and brute-force algorithms. In the past, several methods have been described to reduce the cost of these operators. Two main approaches exist. The first one uses a decomposition of a large structuring element into a set of smaller ones; the result is obtained by a series of operations with small structuring elements, so the overall cost is directly connected to the number of operations and depends on the size of the initial structuring element. The second one consists in binarizing a distance map, which can be computed in a fixed number of image scans; it requires the structuring element to be expressed as a distance disk, but then the computational cost is constant whatever the size of the structuring element. The method proposed here is based on a new generalized distance transform (GDT). The algorithm is quite similar to local distance propagation algorithms but, in our case, distance increments are not constant over disk sizes, which allows for much more flexibility in the disk construction. As an example, we describe an algorithm to decompose any convex 2D polygon into a series of pseudo-distance disks for single scan distance map computation.

I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 154–163, 2003. © Springer-Verlag Berlin Heidelberg 2003
Convex Structuring Element Decomposition
155
The overall cost of the mathematical morphology operators derived from this GDT is constant with the size of the structuring element. Moreover, they can be used in a pipeline fashion and have very low memory requirements. In Section 2, existing structuring element decomposition methods are recalled.
2 Distances and Structuring Elements

2.1 Mathematical Morphology Operators
Let A and B be two sets of points in the discrete grid E with origin O, the neutral element for the symmetry in E (the symmetric element of p is denoted p̌). The erosion of A by the structuring element B is defined as:

A ⊖ B̌ = {p | (B)p ⊆ A}   (1)

where (B)p is B translated by p: (B)p = {x + p | x ∈ B}. The dual operator of the erosion, the dilation, can be defined as:

A ⊕ B̌ = (Aᶜ ⊖ B̌)ᶜ = {p | (B)p ∩ A ≠ ∅}.   (2)

The notation ⊕ denotes the Minkowski sum of two sets, i.e. the set of sums of two elements, one taken from the first set and the other from the second one. These basic operators lead to a great variety of image transformations [3]. However, the algorithm directly derived from the fundamental definition given in (eq. 1) is not efficient for a large structuring element B, as it requires the exploration of all translated points of B for each point of the image.

2.2 Distance Map
The distance transform associates to any point x the smallest distance to a point outside the set X:

dX(x) = min{d(x, y) : y ∉ X}   (3)

The distance map is linked to the mathematical morphology erosion by the fact that the set of points whose distance map values are at least r is the eroded set of X by D(r):

X ⊖ Ď(r) = {p | dX(p) ≥ r}   (4)

with D(r) = {p | d(O, p) < r}. Eroding a shape from a distance map consists in thresholding the distance values, so the erosion cost depends only on the cost of the distance transform. Since algorithms exist to compute a distance map in a fixed number of scans [4,5,6], they can be used to erode with a constant cost whatever the size of the structuring element. For usual distances, each disk is constructed by the dilation of the previous disk with a basic structuring element, as illustrated in fig. 1.a. In a sequential distance map computation, the symmetric neighborhood is divided in two halves which are passed over the image once, in reverse order scans [4].
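As a concrete sketch of this two-pass sequential scheme (an illustration, not the paper's algorithm), the following Python code computes a chamfer d₂,₃ distance map (fig. 1.c: axial cost 2, diagonal cost 3) with the two half-masks of the symmetric neighborhood, and then erodes by thresholding as in eq. (4):

```python
def chamfer_23(img):
    """Two-pass chamfer d_{2,3} distance transform.
    img: 2D list of 0/1; returns each pixel's distance to the nearest 0."""
    h, w = len(img), len(img[0])
    INF = 3 * (h + w)
    d = [[0 if img[y][x] == 0 else INF for x in range(w)] for y in range(h)]
    fwd = [(-1, -1, 3), (0, -1, 2), (1, -1, 3), (-1, 0, 2)]  # causal half-mask
    bwd = [(1, 1, 3), (0, 1, 2), (-1, 1, 3), (1, 0, 2)]      # anti-causal half
    for y in range(h):                      # forward raster scan
        for x in range(w):
            for dx, dy, c in fwd:
                if 0 <= x + dx < w and 0 <= y + dy < h:
                    d[y][x] = min(d[y][x], d[y + dy][x + dx] + c)
    for y in range(h - 1, -1, -1):          # backward raster scan
        for x in range(w - 1, -1, -1):
            for dx, dy, c in bwd:
                if 0 <= x + dx < w and 0 <= y + dy < h:
                    d[y][x] = min(d[y][x], d[y + dy][x + dx] + c)
    return d

def erode_by_disk(img, r):
    """Erosion by the chamfer disk D(r) = {p : d(O, p) < r}, via eq. (4):
    keep the pixels whose distance value is at least r."""
    return [[1 if v >= r else 0 for v in row] for row in chamfer_23(img)]
```

The two scans give a constant cost regardless of r, which is the point of the distance-based approach recalled above.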
156
Nicolas Normand
Fig. 1. Some disk and structuring element construction examples. a: d4 disks (first row). b: octagonal distance disks (second row). c: chamfer distance d2,3 disks (third row). d: line elements obtained by a GDT are gathered in the last step (last row)
By moving the morphological center of the disks to the last scanned pixel, some one-pass algorithms can be obtained [7]. We cannot refer to these disks as distance disks since the symmetry property of distances is not verified anymore; hence the transform is called a generalized distance transform (GDT). However, there is a strong constraint on the shape of the disks, since the basic structuring element is unique for the whole set of disks, so this method only applies to very specific structuring elements. By mixing different structuring elements, distances like the octagonal distance add some variability: each disk can be built from the previous one by a different structuring element (fig. 1.b). Any shape that is decomposed into a series of dilations can be constructed with this method. However, distance transform algorithms are only known for some specific cases (for instance when two building structuring elements are used periodically [7,8]). On the other hand, another way of mixing different neighborhoods is used in chamfer distances (fig. 1.c): each disk is built from different-size disks according to local distances described in a neighborhood mask.

2.3 Structuring Element Decomposition
The structuring element decomposition methods rely on the fact that a series of erosions with a set of structuring elements is equivalent to a single erosion with the Minkowski sum of the structuring elements:

(A ⊖ B̌) ⊖ Č = A ⊖ (B̌ ⊕ Č).   (5)
Decomposition methods generally use a series of basic structuring elements computable by specific hardware machines in one clock cycle. The shape of the building structuring elements depends on the hardware platform; convex polygon decomposition algorithms were presented, for instance, for linear-shaped building structuring elements [9] and for 4- and 8-neighborhood parallel machines [1]. These decompositions lead to optimal morphological operator implementations for parallel or pipeline architectures but, in contrast to distance-based methods, the computational complexity depends on the size of the structuring element. Since some convex polygonal structuring elements cannot be decomposed by Minkowski sums, an extra final set union can be needed, as displayed in fig. 1.d [10]. In this case, the initial decomposition can be obtained from a single scan GDT. However, the complexity of the last step depends on the shape of the structuring element. Other methods use a fixed number of scans, but are still restricted to simple shapes such as lines [11] or rectangles [12] and also need combination for other kinds of elements [13]. In order to deal with arbitrary shapes, combinatorial and genetic algorithms have been proposed [2].

Fig. 2. Series of structuring elements (top), series of polygons (bottom)
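The erosion-chain identity (5), and the need for a final set union when a shape has no Minkowski decomposition, can be illustrated on small point sets. The sketch below (illustrative Python, not from the paper; the reflected elements are written directly as B and C, and the origin is assumed to belong to every element so the eroded set is a subset of A):

```python
def minkowski_sum(A, B):
    """Minkowski sum: all pairwise sums of a point of A and a point of B."""
    return {(ax + bx, ay + by) for (ax, ay) in A for (bx, by) in B}

def erode(A, B):
    """Erosion of A by B, restricted to candidate points p in A
    (valid whenever the origin belongs to B)."""
    return {p for p in A
            if all((p[0] + bx, p[1] + by) in A for (bx, by) in B)}
```

Eroding successively by B and C gives the same result as eroding once by B ⊕ C, which is what dilation-based decompositions exploit.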
3 Convex Polygon Decomposition for Single Scan Erosion
The proposed method is the combination of a construction scheme used to recursively build structuring elements (Section 3.1), a generalized distance transform (Section 3.2) and a decomposition algorithm (Section 3.3), which determines how structuring elements have to be assembled to obtain a given convex polygon. A sample polygon P is shown in fig. 2. It is convex since it is equal to the intersection of all the half-planes supported by its sides. The aim of the method is to obtain the structuring element B, the discrete counterpart of P: B is the set of discrete points of the square grid included in the closed polygon P. The construction is directed by a series of increasing polygons {P(i)}i∈[2..N] used as templates for the assembling of the structuring elements. Each structuring element B(i) is the discrete counterpart of its corresponding polygon P(i), defined in the continuous plane.

3.1 Structuring Element Construction
Like the methods recalled in the previous section, the proposed structuring element construction scheme recursively builds a family of increasing elements.
Table 1. Structuring element construction table (see text concerning column 1)

i:      1  2  3  4
I1(i):  0  1  2  3
I2(i):  0  0  1  3
However, each structuring element can be built from different smaller elements (conversely to dilation-based construction) and size increments are not fixed for each neighborhood (conversely to chamfer disks). The operation of this method can be compared to local distance increments with varying weights. Each structuring element B(i) is the union of smaller structuring elements translated according to a set of neighbors {pk}. For instance, in fig. 2,

B(2) = B(1) ∪ (B(1))p1
B(3) = B(2) ∪ (B(2))p1 ∪ (B(1))p2
B(4) = B(3) ∪ (B(3))p1 ∪ (B(3))p2

where B(1) is the simplest element, only containing the origin {O}. A general expression is given by introducing Ik(i), the index of the element used in neighborhood pk for B(i), B(0) the empty set and neighbor p0 the origin:

∀i = 2...N, B(i) = ⋃k∈[0,K] (B(Ik(i)))pk   (6)

The values of Ik(i) are summarized in a construction table. Such a table is shown in Table 1 for the fig. 2 structuring elements. Although B(1) is not built stricto sensu, an extra column 1 is added for later computing purposes.

Disk Increase. By adding p0 = O with I0(i) = i − 1, we have (B(I0(i)))p0 = B(i − 1), so B(i − 1) is always a subset of B(i). Without loss of generality, we can assume that each Ik table contains increasing values (Ik(i) ≥ Ik(i − 1)).

Comparison with Other Methods. This construction scheme generalizes the disk and structuring element construction methods previously recalled. Chamfer distances use constant local distance increments, which correspond to a fixed difference between a constructed disk size i and the included disk size Ik(i). The chamfer distance da,b is obtained with Ik(i) = i − a or Ik(i) = i − b depending on pk. Dilation series are obtained by taking Ik(i) = i − 1 for each pk belonging to the structuring element used to build B(i). Each pk can be any point in the discrete plane. The neighbor set is determined from the shape of the structuring element (Section 3.3).
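A direct transcription of (6) with the Table 1 values can be sketched as follows; note that the neighbor offsets p1 and p2 are hypothetical placeholders, since the actual offsets are read from fig. 2:

```python
# Construction table (Table 1): I_k(i) for k = 1, 2 and i = 1..4
I = {1: {1: 0, 2: 1, 3: 2, 4: 3},
     2: {1: 0, 2: 0, 3: 1, 4: 3}}
p = {1: (1, 0), 2: (1, 1)}   # hypothetical neighbor offsets

def translate(s, v):
    return {(x + v[0], y + v[1]) for (x, y) in s}

def build(N):
    """Build B(1)..B(N) by set unions of translated smaller elements (eq. 6)."""
    B = {0: set(), 1: {(0, 0)}}          # B(0) empty, B(1) = {O}
    for i in range(2, N + 1):
        B[i] = set(B[i - 1])             # neighbor p0 = O with I_0(i) = i - 1
        for k in (1, 2):
            B[i] |= translate(B[I[k][i]], p[k])
    return B
```

With these placeholder offsets the family grows monotonically, as the Disk Increase remark requires: B(i − 1) ⊆ B(i) for all i.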
3.2 Single Pass Generalized Distance Transform and Erosion

The value of the distance map at point x is the index of the largest structuring element centered in x contained in X. It is built from the elements located on the neighbors of x: {x + pk}. The current element size is the greatest one that contains all the neighbor elements:

dX(x) = max{i | ∀k, Ik(i) ≤ dX(x + pk)} = min over k of max{i | Ik(i) ≤ dX(x + pk)}

In order to speed up the distance transform computation, we introduce Mk(j):

Mk(j) = max{i | Ik(i) ≤ j}.   (7)

The distance transform is then:

dX(x) = min over k of Mk(dX(x + pk)) if x ∈ X, and dX(x) = 0 otherwise.   (8)

Table 2. Generalized distance transform table Mk. M1(1) = 2 because disk 2 can be built with disk 1 in neighborhood 1 but disk 3 can not

j:      0  1  2  3  4
M1(j):  1  2  3  4  4
M2(j):  2  3  3  4  4
Mk(j) represents the index of the largest element B(i) that can be built with B(j) in the neighborhood k. Mk(j) is at least equal to 1 due to column 1 being filled with 0 in Table 1. Mk can be computed once from the construction table Ik. Table 2 shows the Mk values corresponding to the example construction values displayed in Table 1. The overall complexity is linear in the number of image pixels, like all GDTs. Furthermore, if all the neighborhoods are chosen to be causal (all pk precede O in the scan order), then only one image scan is needed. While this is also true for some GDTs for a few restricted shape classes, this GDT works with any convex polygonal shape, as will be shown in the next section. The erosion of X by B = B(N) is finally:

p ∈ X ⊖ B̌ ⇔ dX(p) = N

The causality hypothesis implies that the last vertex in scan order must be equal to the origin O. The single-scan algorithm structure permits its use in a pipeline chain, with one stage for each morphological operation (for instance, a morphological opening requires two pipeline stages). A first implementation has been realized on a Xilinx Spartan IIE FPGA educational card fed with a PAL video signal. The FPGA handles the input synchronization signal and regenerates it on the output. Due to the low cost of the algorithm, at least 8 morphological pipeline stages with different structuring elements can be handled at video rates without extra resources (only in-chip memory is used). The input-output delay is only a fraction of a pixel for each stage, and an extra delay can be introduced in the output synchronization signal to compensate for the translation of the structuring element center.
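A minimal sketch of the single-scan computation follows (illustrative Python, using the Mk values as reconstructed from Table 2; the causal neighbor offsets are hypothetical placeholders for a raster scan):

```python
# M_k lookup tables (j = 0..4) from Table 2, and two causal neighbors
# (offsets are hypothetical: they must precede O in the raster scan order).
M = {1: [1, 2, 3, 4, 4],
     2: [2, 3, 3, 4, 4]}
p = {1: (-1, 0), 2: (-1, -1)}
N = 4  # index of the final structuring element B(N)

def gdt_erode(img):
    """Single raster scan GDT (eq. 8); erosion by B(N) = threshold d == N."""
    h, w = len(img), len(img[0])
    d = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                vals = []
                for k, (dx, dy) in p.items():
                    nx, ny = x + dx, y + dy
                    dn = d[ny][nx] if 0 <= nx < w and 0 <= ny < h else 0
                    vals.append(M[k][min(dn, len(M[k]) - 1)])
                d[y][x] = min(vals)
    return [[1 if v == N else 0 for v in row] for row in d]
```

The lookup tables replace the constant increments of a chamfer scan, which is exactly the extra flexibility the construction table buys.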
Table 3. Half-plane location table

i:             0   1  2  3  4
A0,−1(P(i)):   −∞  0  0  0  0
A−1,2(P(i)):   −∞  0  1  2  4
A−2,1(P(i)):   −∞  0  1  2  3
A1,0(P(i)):    −∞  0  0  0  0

3.3 Convex Structuring Element Decomposition
The proposed structuring element construction evokes the anisotropic growth of a single crystal, in which epitaxial layers of atoms are successively deposited on a crystal seed. The shape of the crystal is influenced by the physical properties of the atoms, which constrain the orientations, and by the speed of the deposit, which may differ from one orientation to another; the orientation of its sides remains constant during the growth. Here, the shape of the structuring element is controlled by artificial constraints which maintain the direction of its sides. However, as the discrete plane produces orientation artifacts, especially for small structuring element sizes, the growth proceeds on a family of continuous polygons which are then used as templates for the structuring elements. The decomposition method is able to process any convex polygon, i.e. any closed shape that can be obtained from the intersection of half-planes. For instance, P(4) shown in fig. 2 is bounded by the following half-planes:

(x, y) ∈ P(4) ⇔ { −y ≤ 0, −x + 2y ≤ 4, −2x + y ≤ 3, x ≤ 0 }
            ⇔ { A0,−1(P(4)) ≤ 0, A−1,2(P(4)) ≤ 4, A−2,1(P(4)) ≤ 3, A1,0(P(4)) ≤ 0 }   (9)

with Ap,q(X) = max over (x, y) ∈ X of (px + qy).
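The half-plane representation of eq. (9) translates directly into a membership test. The small sketch below (illustrative only; the polygon is reconstructed from the four bounds of eq. (9)) evaluates the support values Ap,q and enumerates the discrete points of P(4):

```python
def A(p, q, X):
    """Support value A_{p,q}(X) = max over (x, y) in X of (p*x + q*y)."""
    return max(p * x + q * y for (x, y) in X)

# The four half-planes of eq. (9): ((p, q), bound) pairs.
sides = [((0, -1), 0), ((-1, 2), 4), ((-2, 1), 3), ((1, 0), 0)]

def in_P4(x, y):
    """Membership in P(4) = intersection of the supporting half-planes."""
    return all(p * x + q * y <= bound for (p, q), bound in sides)

# Discrete counterpart B(4): lattice points included in the closed polygon.
B4 = [(x, y) for x in range(-4, 3) for y in range(-2, 5) if in_P4(x, y)]
```

Computing A back from the discrete set recovers the half-plane locations of the last column of Table 3.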
The decomposition algorithm consists in moving the planes from their initial seed position (tangent at the origin, with Apl,ql = 0) to their final position. A series of positions is computed for all half-planes, as displayed in Table 3 for the fig. 2 polygons. Half-plane locations are set in such a way that the sides of the intermediate polygons P(i) have a constant orientation and an increasing length:

∀l, ∀i, ∀j, det[vl+1,i − vl,i ; vl+1,j − vl,j] = 0   (10)

∀i > 0, ∀j ≥ i, ||vl+1,i − vl,i|| ≥ ||vl+1,j − vl,j||   (11)

Structuring Elements from Polygons. The index of the structuring element used in neighborhood pk is determined as the largest polygon translated by pk that is included in P(i):

Ik(i) = max{j : (P(j))pk ⊆ P(i)} = max{j : ∀l, Apl,ql(P(j)) + Apl,ql({pk}) ≤ Apl,ql(P(i))}
This expression of Ik(i) ensures that every structuring element B(i) is a subset of the corresponding polygon P(i): ∀i, B(i) ⊆ P(i).

Polygon Set. As a result of the polygon side properties (constant orientation and increasing length, eq. 10, 11), the series of polygons can be iteratively constructed by Minkowski sums in the continuous plane [14]. A direct consequence of the construction is that:

vl,i = vl,j + pk ⇒ Ik(i) ≥ j ⇒ (vl,j ∈ B(j) ⇒ vl,i ∈ B(i))

Therefore, if the set of intermediate positions {vl,i}i∈[1..N] of a vertex vl contains a path from O to vl using neighbor moves (plus extra non-discrete positions), then vl is necessarily contained in P. Algorithm 1 takes this point into consideration. Half-plane positions are guided by the movement of the vertices. Each vertex is initially located at the origin and follows a path to its final position using the two neighbors of its influence cone. The algorithm ensures that each position in the path is correctly reached by the half-plane, i.e. that half-plane boundaries meet exactly at the intermediate vertex positions.

Neighbor Selection. This phase is actually the first in the decomposition process; it must ensure that the obtained structuring elements are convex and that paths to vertices can be obtained with polygons of increasing side length (eq. 11). Each pair of successive neighbors defines an influence cone that has some similarities with chamfer disk geometry [15]. In an influence cone, each pixel is reached by a series of moves along the two neighbors. The main difference with chamfer distances is that the vertices of the structuring element do not necessarily belong to boundaries between cones; therefore the number of needed neighbors is generally less than the number of vertices of P. There are two constraints on the pair of neighbors (pk, pk+1):

(i) all pixels of the influence cone must be reachable from the neighbors. A necessary and sufficient condition is that pk and pk+1 form a regular cone (det[pk, pk+1] = 1) [15].
(ii) all paths to a point must be included in the structuring element. If p ∈ B(i) is in the cone (pk, pk+1) and p = apk + bpk+1, then all the points in the parallelogram (O, apk, p, bpk+1) must be in B(i).
4 Conclusion
We have introduced a unified structuring element construction scheme, the corresponding generalized distance transform algorithm and a convex polygon decomposition method. Eroding an image only requires a single regular scan of the image pixels, which differs from the classical chamfer distance transform by table lookups instead of constant local distance increments. The computational properties of these algorithms allow their use in a pipeline manner, optimizing time and memory consumption for series of morphological operations.
Algorithm 1 Half-plane shift computation

i ← 0
while ∃l : vl,i ≠ vl do
    {Update the intermediate positions of the reached vertices}
    for l ← 1 to L do
        {Test of vertex vl,i}
        if ∀m, Apm,qm({vl,i}) ≤ AHm then
            {All half-planes contain vl,i; move it to the next intermediate location}
            choose neighbor k
            vl,i+1 ← vl,i + pk
        else
            vl,i+1 ← vl,i
        end if
    end for
    {Half-plane shift}
    for l ← 2 to L − 1 do
        {Reach the closest vertex vl or vl+1}
        AHl ← min(Apl,ql({vl, vl+1}))
    end for
    i ← i + 1
end while
Algorithm 2 Determination of neighbors

{Selection of the two initial neighbors}
p1 = (v2x / gcd(v2x, v2y), v2y / gcd(v2x, v2y))
p2 = (vLx / gcd(vLx, vLy), vLy / gcd(vLx, vLy))
{Neighbor insertion for condition (i)}
while k < K do
    if det(pk, pk+1) ≠ 1 then
        {(pk, pk+1) is not a regular cone}
        find a and b with the extended Euclidean algorithm such that b·pk,x − a·pk,y = 1
        choose n such that n ≥ (b·pk+1,x − a·pk+1,y) / (pk,x·pk+1,y − pk,y·pk+1,x) > n − 1
        insert (a + n·pk,x, b + n·pk,y) after k (indices above k are shifted)
    end if
    k ← k + 1
end while
{Neighbor insertion for condition (ii)}
for n = 1 to L do
    {Detection of the cone (pk, pk+1) containing vn}
    while vn is not reachable do
        {Division of the cone}
        insert neighbor pk + pk+1 after k
        if vn is in the second half-cone (pk+1, pk+2) (indices after insertion) then
            k ← k + 1
        end if
    end while
end for
References

1. Xu, J.: Decomposition of convex polygonal morphological structuring elements into neighborhood subsets. IEEE Trans. on PAMI 13 (1991) 153–162
2. Anelli, G., Broggi, A., Destri, G.: Decomposition of arbitrarily shaped binary morphological structuring elements using genetic algorithms. IEEE Trans. on PAMI 20 (1998) 217–224
3. Serra, J.: Image Analysis and Mathematical Morphology. Academic Press, London (1982)
4. Rosenfeld, A., Pfaltz, J.: Distance functions on digital pictures. Pattern Recognition 1 (1968) 33–61
5. Yokoi, S., Toriwaki, J., Fukumura, T.: On generalized distance transformation of digitized pictures. PAMI 3 (1981) 424–443
6. Borgefors, G.: Distance transformations in digital images. CVGIP 34 (1986) 344–371
7. Wang, X., Bertrand, G.: An algorithm for a generalized distance transformation based on Minkowski operations. In: ICPR. (1988) 1164–1168
8. Wang, X., Bertrand, G.: Some sequential algorithms for a generalized distance transformation based on Minkowski operations. IEEE Trans. on PAMI 14 (1992) 1114–1121
9. Gong, W.: On decomposition of structure element for mathematical morphology. In: ICPR. (1988) 836–838
10. Ji, L., Piper, J., Tang, J.: Erosion and dilation of binary images by arbitrary structuring elements using interval coding. Pattern Recognition Letters (1989) 201–209
11. van Herk, M.: A fast algorithm for local minimum and maximum filters on rectangular and octagonal kernels. Pattern Recognition Letters 13 (1992) 517–521
12. Van Droogenbroeck, M.: Algorithms for openings of binary and label images with rectangular structuring elements. In Talbot, H., Beare, R., eds.: Mathematical Morphology. CSIRO Publishing, Sydney, Australia (2002) 197–207
13. Soille, P., Breen, E., Jones, R.: Recursive implementation of erosions and dilations along discrete lines at arbitrary angles. IEEE Trans. on PAMI 18 (1996) 562–567
14. Ohn, S.: Morphological decomposition of convex polytopes and its application in discrete image space. In: ICIP. Volume 2. (1994) 560–564
15. Thiel, E., Montanvert, A.: Chamfer masks: discrete distance functions, geometrical properties and optimization. In: ICPR. Volume III. (1992) 244–247
Designing the Lattice for Log-Polar Images

V. Javier Traver and Filiberto Pla

Dep. de Llenguatges i Sistemes Informàtics, Universitat Jaume I, E-12071 Castelló, Spain
{vtraver,pla}@uji.es
Abstract. Log-polar images have been used for pattern recognition and active vision tasks for some years. These images are obtained either from true retina-like sensors or from conventional cartesian images by software conversion. From the hardware perspective, the design of such log-polar retinae faces its own technological limitations. In the case of software remappers, however, their very flexibility has led many researchers to use them with little or no justification of the choice of the particular log-polar layout. In this paper, a set of design criteria are proposed, and an approach to choosing the parameters involved in the log-polar transform is described. This kind of design could not only be used in simulation software, but could also act as design guidelines for artificial hardware-built retinae.

Keywords: Log-polar transform, receptive fields, design criteria.
1 Introduction
Motivation. Following its biological foundations [12], the log-polar image representation has been adopted in fields of computer vision such as pattern recognition [16] and active vision [4]. Three basic techniques exist for obtaining log-polar images:

– Hardware retinae, which involve the design and manufacture of very specialized sensors that directly yield log-polar frames [6, 17, 9].
– Software remappers, which implement the log-polar transform by taking conventional cartesian images as input [8, 3, 13].
– Virtual sensors, which simulate the log-polar mapping via special-purpose hardware, also with cartesian images as input [7, 5].

The main challenge faced by the first choice (true log-polar sensors) relates to technological obstacles during their design and fabrication, which, in part, have been overcome over time [11]. In contrast, software-based simulations of the log-polar transform have an amazing flexibility, allowing an easy implementation of different log-polar models, each with a variety of designs. Traditionally, however, scarce attention has been paid to the selection or justification of the proper values for the parameters of the log-polar transform.
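As a simple illustration of what a software remapper computes (a hypothetical model, not the specific layout developed in this paper), the following Python sketch maps a cartesian point to ring/sector indices, with rings growing geometrically outside a fovea of radius r0; all parameter names and default values here are assumptions:

```python
import math

def to_logpolar(x, y, r0=5.0, R=64, S=128, rmax=128.0):
    """Map a cartesian point (x, y) to (ring, sector) indices under a
    hypothetical log-polar model: R rings from radius r0 to rmax, S sectors."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x) % (2 * math.pi)
    if r < r0:
        return None                      # inside the fovea (not modeled here)
    a = (rmax / r0) ** (1.0 / R)         # growth factor: ring radii r0 * a^u
    u = int(math.log(r / r0, a))         # ring index (logarithmic in r)
    v = int(S * theta / (2 * math.pi))   # sector index (linear in theta)
    if u >= R:
        return None                      # outside the sensor
    return u, v
```

The design questions addressed in this paper amount to choosing r0, R, S and the growth factor in a principled way rather than by habit.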
Research partly funded by Conselleria d'Educació, Cultura i Ciència, Generalitat Valenciana, under project CTIDIB/2002/333.
I. Nystr¨ om et al. (Eds.): DGCI 2003, LNCS 2886, pp. 164–173, 2003. c Springer-Verlag Berlin Heidelberg 2003
Designing the Lattice for Log-Polar Images
165
This fact seems particularly apparent in the case of software conversion, probably due to the very nature of software. Importance. Nevertheless, criteria for the design of the log-polar layout would be of great help to practitioners of computer vision interested in using log-polar imagery. Furthermore, the choice of the log-polar geometry has an impact on the performance of the algorithms used in certain tasks. Finally, this kind of design guideline might also benefit the conception of new silicon retinae. Related Work. Among the scarce work addressing the topic of log-polar design, we can mention [2], which studies how log-polar mapping parameters affect the performance of a vergence control algorithm. In [14], the log-polar sensor design is driven by the relation between the geometry parameters and 3D sensing precision requirements. Several alternatives for the more specific problem of fovea design are proposed in [15]. Quantitative measures of the quality of log-polar sensors are given in [10, 11]. Although not directly related to the geometric design, these measures are useful for comparisons between different sensors. The importance of a good choice for the transform parameters is stressed and considered in [1]. Our Work and Structure of the Paper. With respect to these works, we propose a set of general design criteria, and a means to find the transform parameters meeting these criteria. The rest of the paper is organized as follows. Section 2 describes the log-polar model and its parameters. This is the model on which design criteria are discussed in Section 3. Based on these design considerations, parameters of the transform can be selected as described in Section 4. Finally, concluding remarks are given in Section 5.
2 Log-Polar Mapping

2.1 Definition and Basic Parameters
Among the different log-polar image representations proposed in the literature, we choose the central blind-spot model because of its interesting properties [13] (e.g., retinal rotations and scalings both map to simple shifts in the cortical plane). Under this model, the log-polar coordinates are defined as:

(ξ, η) = (log_a(ρ/ρ0), θ),   (1)

with (ρ, θ) being the polar coordinates defined from the cartesian coordinates (x, y) as usual, i.e., (ρ, θ) = (√(x² + y²), arctan(y/x)). Because of the discretization, the continuous coordinates (ξ, η) become the discrete ones (u, v) = (⌊ξ⌋, ⌊q · θ⌋), 0 ≤ u < R, 0 ≤ v < S, with R and S being the number of rings and sectors of the discrete log-polar image, and q = S/(2π) sectors/radian. The notation ⌊z⌋ denotes the common floor operation, i.e., the largest integer value not greater than z. Having chosen R, ρ0 (the radius of the innermost ring), and ρ_max (the radius of the visual field), the transformation parameter a is computed as a = exp(ln(ρ_max/ρ0)/R). If the original cartesian image is sized M × N, ρ_max can be
166
V. Javier Traver and Filiberto Pla
Fig. 1. Log-polar mapping: (a) grid layout example (10 × 16), (b) original cartesian image (256 × 256), (c) cortical image (64 × 128), (d) retinal image (256 × 256) obtained by the inverse mapping from (c)
defined as ρ_max = ½ min(M, N), and the log-polar transform is centered at the foveation point (x_c, y_c) = (M/2, N/2). Illustrative Example. An example of a log-polar transformation is shown in Fig. 1, from which several observations can be made. First of all, one can appreciate the much smaller size of the cortical image (Fig. 1(c)) compared to the original uniformly-sampled image (Fig. 1(b)), which illustrates the data reduction property. Second, the small arrows radially disposed in the cartesian image become magnified and parallel to each other (see Fig. 1(c)), which demonstrates how (i) visual acuity is higher in the fovea area, and (ii) rotations become translations along the angular axis. Third, note in the retinal visualization of the cortical image (Fig. 1(d)) how edges near the image center are much sharper than edges at the periphery, because of the space-variant resolution.
2.2 Derived Properties
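Before quantifying the derived properties of the layout, the basic mapping just described can be made concrete in code. The sketch below is our own minimal software remapper for the central blind-spot model (the function name, the per-RF averaging, and the pixel-center offset are our choices, not the authors'):

```python
import numpy as np

def to_logpolar(img, R=64, S=128, rho0=3.0):
    """Map a cartesian image to a cortical (R x S) image under the
    central blind-spot model, averaging the pixels of each receptive field."""
    M, N = img.shape
    rho_max = 0.5 * min(M, N)
    a = np.exp(np.log(rho_max / rho0) / R)   # ring growth factor
    q = S / (2.0 * np.pi)                    # sectors per radian

    ys, xs = np.mgrid[0:M, 0:N]
    dx = xs - N / 2.0 + 0.5                  # pixel-center offsets from the
    dy = ys - M / 2.0 + 0.5                  # foveation point (M/2, N/2)
    rho = np.hypot(dx, dy)
    theta = np.arctan2(dy, dx) % (2.0 * np.pi)

    valid = (rho >= rho0) & (rho < rho_max)  # blind spot and FOV limits
    u = np.clip(np.log(rho[valid] / rho0) / np.log(a), 0, R - 1).astype(int)
    v = np.clip(q * theta[valid], 0, S - 1).astype(int)

    acc = np.zeros((R, S))
    cnt = np.zeros((R, S))
    np.add.at(acc, (u, v), img[valid])       # accumulate each RF's pixels
    np.add.at(cnt, (u, v), 1)
    return np.where(cnt > 0, acc / np.maximum(cnt, 1), 0.0)
```

Applied to a 256 × 256 image with R = 64 and S = 128, this produces a cortical image of the same size as the one shown in Fig. 1(c).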
From the basic parameters involved in the log-polar transform, we define other parameters which quantify some properties of the log-polar layout. These measures will later be used in Section 3 for formalizing the design criteria. Log-Polar Image Size. It is simply the total number of pixels, i.e., N = R · S. Aspect Ratio of Receptive Fields. The aspect ratio of a geometric entity is the ratio between its width and its height. Given that a receptive field (RF) is not rectangular, its width is not well-defined. As an approximation, we can consider its outer or its inner boundary as its width (or even a function of both of them). Here, the length of the inner arc will be chosen as the RF's width (see Fig. 2). With these considerations, for any RF at eccentricity u, its width will be given by (arc equals angle times radius):

w(u) = (2π/S) · ρ_{u-1} = (2π/S) · ρ0 · a^{u-1},   (2)
Designing the Lattice for Log-Polar Images
167
Fig. 2. Geometric elements involved in the computation of the area and the aspect ratio of receptive fields
and its height by

h(u) = ρ_u − ρ_{u-1} = ρ0 · a^u − ρ0 · a^{u-1} = ρ0 · a^{u-1} · (a − 1).   (3)

Therefore, the aspect ratio γ(u) = w(u)/h(u) is:

γ = w/h = ((2π/S) · ρ0 · a^{u-1}) / (ρ0 · a^{u-1} · (a − 1)) = 2π / (S(a − 1)).   (4)
By observing Eq. 4, it can be noticed that the aspect ratio is not a space-variant quantity: all RFs in a log-polar grid have the same aspect ratio. This interesting result, however, does not hold in other log-polar models (e.g., in Jurie's model [8]). Area of Receptive Fields. Because of the space-variant nature of the log-polar geometry, RFs at different eccentricities cover a different surface. The area of a RF can be found by computing the area of a circular annulus and then dividing the result by the number of sectors. The area of a circular annulus at eccentricity u is π · ρ²_{u+1} − π · ρ²_u = π(a² · ρ²_u − ρ²_u) = π · ρ²_u (a² − 1). Then, the area σ of a single RF will be an S-th part of this, i.e.:

σ(u) = π · ρ²_u (a² − 1) / S.   (5)
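These derived quantities are straightforward to evaluate. The helper below (our own code, not from the paper) implements Eqs. 2-5 and makes it easy to check numerically that γ does not depend on u:

```python
import math

def rf_geometry(R, S, rho0, rho_max):
    """Return a, the functions w(u), h(u), sigma(u), and the constant
    aspect ratio gamma (Eqs. 2-5) for a central blind-spot layout."""
    a = math.exp(math.log(rho_max / rho0) / R)
    w = lambda u: (2.0 * math.pi / S) * rho0 * a ** (u - 1)   # Eq. 2
    h = lambda u: rho0 * a ** (u - 1) * (a - 1.0)             # Eq. 3
    gamma = 2.0 * math.pi / (S * (a - 1.0))                   # Eq. 4
    sigma = lambda u: math.pi * (rho0 * a ** u) ** 2 * (a * a - 1.0) / S  # Eq. 5
    return a, w, h, gamma, sigma
```

For any ring u, w(u)/h(u) reproduces the space-invariant γ of Eq. 4, and σ(u) matches the annulus area divided by S.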
Oversampling. In a software-based implementation of the log-polar mapping, RFs near the center (the fixation point) become much smaller than the cartesian pixels (Fig. 3(b)). As a result, the information contained in cortical images at the center, where most pixels are, becomes highly redundant. This situation is known as "oversampling", because cartesian images are oversampled, i.e., sampled at
(a) in the periphery
(b) in the fovea
Fig. 3. Relative size between RFs and cartesian pixels: in (a) RFs (in dark solid lines) are larger than cartesian pixels (in dotted lines); in (b) cartesian pixels (in dark solid lines) are larger than RFs (in dotted lines)
a frequency higher than its maximum frequency. Undersampling also occurs at the periphery, where cartesian pixels and RFs happen to be in a many-to-one relationship (Fig. 3(a)). However, this undersampling is not only desired, but is the very essence of the selective data reduction of discrete log-polar images. The maximum oversampling occurs at the innermost ring (u = 0), and this is the one it is interesting to quantify. Note that the area of a RF at ring u, σ(u), is expressed in (No. cartesian pixels)/(1 receptive field). Because near the fovea (i.e., for small u) this ratio tends to become smaller than unity, we choose the inverse of the area at u = 0 as the (maximum) oversampling, so that its units become (No. receptive fields)/(1 cartesian pixel), which is closer in meaning to the oversampling effect. Thus, the oversampling is quantified as

o = σ(0)⁻¹,   (6)
i.e., the number of RFs covering a single underlying cartesian pixel.
3 Design Criteria
Limiting Computational Complexity. In computer vision applications, running time and memory space requirements are proportional to the number of pixels of the images to be processed. Therefore, a simple way of bounding computational resources can be modeled as N < N_max, i.e., imposing that the size of the log-polar image stays under a certain value (N_max), according to the computational power available. Having Unit Aspect Ratio RFs. In theory, it is possible to choose any combination of R and S. Even with the above constraint, there are many possibilities for these parameters. However, not all of these combinations result in "good" log-polar grids. As an example, see Fig. 4, where different layouts are shown,
(a) 15 × 60
(b) 40 × 20
(c) 20 × 40
Fig. 4. Log-polar grids with different aspect ratios in the RFs’ geometry: (a) γ < 1, (b) γ > 1, (c) γ ≈ 1
which illustrate what the problem can be. In one case (Fig. 4(a)), pixels are too "elongated" (when R ≪ S); in the other (Fig. 4(b)), pixels are too "flattened" (when R ≫ S). The problem in both cases is the same: the log-polar pixels (actually, their associated receptive fields) have a "wrong" aspect ratio, either too small or too big. We believe this is an undesirable feature when applying some operations on the image, because neighboring RFs would be at different distances along the radial and angular directions. From a different point of view, we might be interested in having comparable resolutions along both the radial and angular axes. Therefore, keeping the aspect ratio close to 1, i.e., having RFs approximately square (Fig. 4(c)), allows local image processing operators to be applied correctly, and log-polar images mapped back to the cartesian domain are perceptually better. Then, by forcing γ = w/h = 1, we have 2π/(S(a − 1)) = 1. To fulfill this constraint, we can choose S as a function of R (or vice versa). As a is already a function of R, we can write:

S = 2π/(a − 1) ≡ S_γ.   (7)
Minimizing Oversampling. In some applications, oversampling is not desirable or, at least, should not be too big. Ideally, null oversampling would be achieved with o = σ(0)⁻¹ = S/(π · ρ0² (a² − 1)) = 1. Then,

S = π ρ0² (a² − 1) ≡ S_o.   (8)
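As a quick numerical sanity check of Eqs. 7 and 8 (the parameter values below are arbitrary examples of our own choosing): setting S = S_γ yields γ = 1, setting S = S_o yields o = 1, and for such typical values S_o < S_γ, a fact used later when the two constraints are combined:

```python
import math

rho0, rho_max, R = 5.0, 128.0, 40          # example values, not from the paper
a = math.exp(math.log(rho_max / rho0) / R)

S_gamma = 2.0 * math.pi / (a - 1.0)        # Eq. 7: unit aspect ratio
S_o = math.pi * rho0 ** 2 * (a * a - 1.0)  # Eq. 8: null oversampling

gamma = 2.0 * math.pi / (S_gamma * (a - 1.0))    # aspect ratio with S = S_gamma
o = S_o / (math.pi * rho0 ** 2 * (a * a - 1.0))  # oversampling with S = S_o
```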
Keeping Small Objects Observable. The log-polar model considered in this paper is characterized by a central blind spot (as can be seen in Fig. 1(a) and Fig. 4). This implies that objects centered in the visual field will only be detectable if they are bigger than the sensor's blind area. Therefore, if objects with radius r_min are to be observed (at least partially), we should have ρ0 < r_min.
Having a Wide Field of View (FOV). One of the appealing properties of log-polar images is that they offer a good compromise between three important requirements: resolution, image size, and FOV. The FOV's width can be modeled as ρ_max.
4 Finding the Mapping Parameters
Free parameters. There are a total of six parameters Ψ = (R, S, ρ0, ρ_max, a, q) involved in the log-polar model. In principle, these are the parameters whose values should be found. However, R, ρ0, ρ_max, and a are related, so that each of them can be found from the other three. Second, ρ_max will be considered a fixed parameter because: (i) the log-polar image is computed from cartesian images, so the size of the latter gives the value for ρ_max; (ii) this parameter only affects the scale of the sensory layout. Third, q is a function of S. Finally, with the design criteria considered above, S can be found from R. Therefore, only two parameters remain free: R and ρ0. Trading off criteria. As usual with any design process, there are conflicting criteria: to observe small targets, ρ0 should be small; to have small oversampling, ρ0 should be large. Therefore, a trade-off solution is required. We propose the use of a user-selectable parameter, λ, which weighs the relative importance given to these two criteria. Notice that these criteria were expressed as constraints on the values of S (Eqs. 7 and 8). On the other hand, it can be shown that S_o < S_γ. Then, we suggest using the following combined constraint:

S = S_o + λ(S_γ − S_o),   λ > 0.
Therefore, the lower λ is, the more importance is given to null oversampling; the closer λ is to 1, the more importance is given to unit aspect ratio RFs. Regarding the aspect ratios, γ > 1 for λ < 1, and γ < 1 for λ > 1. Algorithm. Going a step further, a mere user of the log-polar transformation should be concerned as little as possible with the particularities and details of the log-polar model being used. Therefore, the user requirements should be expressed in terms of higher-level design criteria. To that end, we propose a simple procedure (Algorithm 1) in which the input from the user is:
– N_max, the approximate number of pixels for the resulting log-polar image;
– α, the allowable error between N_max and the total number of pixels N found;
– r_min, the size of the smallest object that should be visible; and
– λ, the trade-off value weighting small oversampling versus close-to-one aspect ratio RFs.
This algorithm proceeds iteratively. Initially, an estimate is set for R from the required Nmax (e.g., assuming R = S). At each iteration, a new set of mapping parameters, Ψ , is computed, and R is updated from the newly found N , using
ComputeMappingParameters(R, ρ0, λ) : Ψ(R, S, ρ0, ρ_max, a, q)
    a ← exp(ln(ρ_max/ρ0)/R)      // ρ_max considered given a priori
    S_o ← π · ρ0² · (a² − 1)     // Null oversampling
    S_γ ← 2π/(a − 1)             // Unit aspect ratio RFs
    S ← S_o + λ(S_γ − S_o)       // Compromise solution
    q ← S/(2π)
    return Ψ(R, S, ρ0, ρ_max, a, q)

FindGoodDesign(N_max, α, r_min, λ) : Ψ(R, S, ρ0, ρ_max, a, q)
    ρ0 ← r_min/2                 // Any convenient function of r_min
    R ← √N_max                   // Any reasonable initial guess
    repeat
        Ψ ← ComputeMappingParameters(R, ρ0, λ)
        β ← Ψ_R · Ψ_S / N_max
        R ← R/β                  // Rectify estimate
    until |β − 1| < α
    return Ψ(R, S, ρ0, ρ_max, a, q)
Algorithm 1: Finding the parameters of a log-polar layout from quantitatively formalized design criteria

the amount of deviation, β, as a corrective factor. The process is repeated until N and N_max are close enough (according to how demanding the user has been in specifying α). Notice that the value of a particular parameter of the 6-tuple Ψ is denoted using the name of that parameter as a subindex (e.g., Ψ_R is the value of R in Ψ). Examples. Table 1 shows four examples of input parameters and the results obtained by using the algorithm described above. The resulting real values for R and S have been rounded to the nearest integers, and o and γ have been approximated to 2 decimal places.

Table 1. Design examples: input requirements and resulting parameters and measures
Example | N_max | r_min |  α  |  λ  |  R |  S |   N  | ρ0  |   o   |  γ   | # iters.
--------|-------|-------|-----|-----|----|----|------|-----|-------|------|---------
   1    | 2000  |  20   | 0.1 | 0.9 | 28 | 65 | 1820 | 10  |  1.03 | 1.00 |  13
   2    | 2000  |  10   | 0.1 | 0.9 | 32 | 56 | 1792 |  5  |  3.20 | 1.08 | 127
   3    | 2000  |  10   | 0.1 | 0.1 | 93 | 23 | 2139 |  5  |  4.04 | 7.77 |   3
   4    | 4000  |   5   | 0.1 | 0.7 | 63 | 69 | 4347 | 2.5 | 26.62 | 1.41 |   1
In the first example, because r_min is relatively high, it was possible to have both small oversampling and unit aspect ratio RFs. The total number of pixels (N = 1820) differs by less than 10% (as specified with α = 0.1) from N_max = 2000. In the second example, r_min is smaller, and it can be appreciated that the algorithm takes longer (127 vs. 13 iterations) to find a good combination of mapping parameters. It is interesting to note that, because the criterion of
having unit aspect ratio is stressed more than having low oversampling (because λ = 0.9), we get a good aspect ratio (γ = 1.08), but oversampling has increased (o = 3.2). In the third example, less attention is paid to having unit aspect ratio RFs (λ = 0.1), resulting in a very high, unreasonable aspect ratio. In the fourth example, an intermediate trade-off value λ = 0.7 is provided. This example would be a design meant for visualization purposes: a higher number of pixels (N_max = 4000), and small observable targets (r_min = 5). In this case, oversampling is not an issue, because its effect is not visually perceivable; it only affects the redundancy of the data in the log-polar image. Notice the very high oversampling we incur in this case (o = 26). Comments. The idea of an algorithm processing high-level design specifications and yielding low-level mapping parameters is very attractive. In this sense, Algorithm 1 represents an effort along the line of automating the design process. However, because the procedure is basically driven to achieve a given total number of pixels, other criteria cannot easily be met at the same time, and compromise solutions are not handled conveniently. In practical terms, for certain input requirements, this algorithm might not find a solution (and waste iterations in the attempt), or settle for improvable solutions in just one or a few iterations (like example 4 in Table 1).
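Algorithm 1 translates almost line for line into executable form. The sketch below is our own Python transcription (ρ_max is fixed at 128, consistent with the 256 × 256 cartesian images used earlier, and the iteration cap is our safeguard against the non-convergent cases just mentioned):

```python
import math

def compute_mapping_parameters(R, rho0, lam, rho_max=128.0):
    """ComputeMappingParameters of Algorithm 1."""
    a = math.exp(math.log(rho_max / rho0) / R)
    S_o = math.pi * rho0 ** 2 * (a * a - 1.0)  # null oversampling (Eq. 8)
    S_g = 2.0 * math.pi / (a - 1.0)            # unit aspect ratio RFs (Eq. 7)
    S = S_o + lam * (S_g - S_o)                # compromise solution
    return {"R": R, "S": S, "rho0": rho0, "rho_max": rho_max,
            "a": a, "q": S / (2.0 * math.pi)}

def find_good_design(N_max, alpha, r_min, lam, rho_max=128.0, max_iter=1000):
    """FindGoodDesign of Algorithm 1: iterate until R*S deviates from
    N_max by a factor of less than alpha."""
    rho0 = r_min / 2.0            # any convenient function of r_min
    R = math.sqrt(N_max)          # any reasonable initial guess
    for _ in range(max_iter):
        psi = compute_mapping_parameters(R, rho0, lam, rho_max)
        beta = psi["R"] * psi["S"] / N_max
        if abs(beta - 1.0) < alpha:
            return psi
        R = R / beta              # rectify estimate
    return None                   # gave up: no design within tolerance
```

With the inputs of example 1 in Table 1 (N_max = 2000, α = 0.1, r_min = 20, λ = 0.9), this transcription settles on a design close to the reported R = 28, S = 65.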
5 Conclusions
Little attention has been paid in the past to a proper selection of the parameters of the log-polar mapping, in particular when the transform is implemented in software. After a brief description of a log-polar model, this paper examines possible design criteria that should guide the choice of the values of the mapping parameters. A mathematical expression has been derived for each criterion. Then, it is discussed how these design constraints could be met. Because the different criteria lead to contradictory goals, only trade-off solutions are possible. Although the design process can be completely trial-and-error based, an algorithm has been proposed to help designers and end users find reasonably adequate solutions. Interestingly, the inputs are specified as high-level design requirements. Further work could be directed to developing an algorithm which considers and explores the design space more effectively.
References

1. R. Alan Peters II, M. Bishay, and T. Rogers. On the computation of the log-polar transform. Technical report, School of Engineering, Vanderbilt University, Mar. 1996. http://www.vuse.vanderbilt.edu/~rap2/papers/oncomplp.pdf.
2. A. Bernardino and J. Santos-Victor. Sensor geometry for dynamic vergence: Characterization and performance analysis. In Workshop on Performance Characteristics of Vision Algorithms, ECCV, 1996. (Also as TR 01/96 at VisLab, Lisbon, Portugal).
3. A. Bernardino and J. Santos-Victor. Visual behaviors for binocular tracking. Robotics and Autonomous Systems, 25:137–146, 1998.
4. M. Bolduc and M. D. Levine. A review of biologically motivated space-variant data reduction models for robotic vision. Computer Vision and Image Understanding (CVIU), 69(2):170–184, Feb. 1998.
5. J. R. del Solar, C. Nowack, and B. Schneider. VIPOL: A virtual polar-logarithmic sensor. In Scandinavian Conf. on Image Analysis (SCIA), pages 739–744, Finland, 1997.
6. J. Van der Spiegel, G. Kreider, C. Claeys, I. Debusschere, G. Sandini, P. Dario, F. Fantini, P. Belluti, and G. Soncini. A foveated retina-like sensor using CCD technology. In C. Mead and M. Ismail, editors, Analog VLSI and Neural Network Implementations, Boston, 1989. Kluwer Publ.
7. T. E. Fisher and R. D. Juday. A programmable video image remapper. In SPIE Conf. on Pattern Recognition and Signal Processing, volume 938 (Digital and Optical Shape Representation and Pattern Recognition), pages 122–128, 1988.
8. F. Jurie. A new log-polar mapping for space variant imaging. Application to face detection and tracking. Pattern Recognition, 32:865–875, 1999.
9. F. Pardo, J. A. Boluda, J. J. Pérez, B. Dierickx, and D. Scheffer. Design issues on CMOS space-variant image sensors. In SPIE Conf. on Advanced Focal Plane Arrays and Electronic Cameras (AFPAEC), Berlin, Germany, Oct. 1996.
10. A. S. Rojer and E. L. Schwartz. Design considerations for a space-variant visual sensor with complex-logarithmic geometry. In Intl. Conf. on Pattern Recognition (ICPR), pages 278–285, 1990.
11. G. Sandini, P. Questa, D. Scheffer, B. Dierickx, and A. Mannucci. A retina-like CMOS sensor and its applications. In Proc. 1st IEEE SAM Workshop, Cambridge, USA, Mar. 2000.
12. E. L. Schwartz. Spatial mapping in the primate sensory projection: Analytic structure and relevance to perception. Biological Cybernetics, 25:181–194, 1977.
13. V. J. Traver. Motion Estimation Algorithms in Log-polar Images and Application to Monocular Active Tracking. PhD thesis, Dep. Llenguatges i Sistemes Informàtics, Universitat Jaume I, Castellón (Spain), Sept. 2002.
14. C. F. R. Weiman. Exponential sensor array geometry and simulation. In Preprint from the Proc. of SPIE, Orlando, Florida, Apr. 1988. Vol. 938 (Digital and Optical Shape Representation and Pattern Recognition).
15. C. F. R. Weiman. Log-polar binocular vision system. NASA Phase II SBIR Final Report, Dec. 1994.
16. J. C. Wilson and R. M. Hodgson. Log-polar mapping applied to pattern representation and recognition. Computer Vision and Image Processing, pages 245–277, 1992.
17. R. Wodnicki, G. W. Roberts, and M. D. Levine. A foveated image sensor in standard CMOS technology. In Custom Integrated Circuits Conf., Santa Clara, May 1995.
On Colorations Induced by Discrete Rotations

Bertrand Nouvel and Éric Rémila

Laboratoire de l'Informatique du Parallélisme
UMR CNRS - ENS Lyon - INRIA 5668
École Normale Supérieure de Lyon
46, Allée d'Italie, 69364 LYON CEDEX 07 - France
{bertrand.nouvel,eric.remila}@ens-lyon.fr
Abstract. We consider an uncountable family of colorations induced by discrete rotations. We first describe the symbolic dynamical system associated with these colorations. We then introduce a group that supports the dynamics of the system. The periodic cases are characterized: they are induced by Pythagorean triples. Finally, a proof of the quasi-periodicity of the colorations, and a description of asymmetrical colorations, conclude this paper.
1 Introduction
The search for discrete rotation algorithms that have properties similar to those of Euclidean rotations (bijectivity, commutativity, etc.) was started by Andrès and Réveillès ([3], [8]) ten years ago. It remains today one of the most interesting (and hardest) problems of discrete geometry theory. In this paper, we focus on the image of a single point's neighbors as transformed by discrete rotations. We embed this neighborhood information into each point as a color, thereby describing the transformation as a coloration of the grid Z². This paper documents our investigation of local deformations by means of a study of these colorations. We explain when and why the colorations investigated are periodic or asymmetric. We prove their quasi-periodicity, with the aim of laying the foundations for ongoing research based on their use.
2 Definitions
We denote by ⌊x⌋ the integer part of x: the integer such that ⌊x⌋ ≤ x < ⌊x⌋ + 1. The rounding function, or point-discretization function, is defined as [x] = ⌊x + 0.5⌋; it may be applied to vectors, component by component. We may notice that [(−0.5, 0.5)] = (0, 1). The composition of a function f with the rounding function will be denoted by [f]. We define the application {.} by {x} = x − [x]. For a binary relation r on a set E, i.e., r is a subset of E², we denote by f(r) the
Financed via CIFRE by TF1, a French television channel.
I. Nystr¨ om et al. (Eds.): DGCI 2003, LNCS 2886, pp. 174–183, 2003. c Springer-Verlag Berlin Heidelberg 2003
set of all pairs (f(x), f(y)) with (x, y) belonging to r; r(x) is the set of all y such that (x, y) belongs to r. RC(r), which stands for "relative coding", will refer to the binary relation¹ formed by all (x, y − x) for (x, y) belonging to r. In the real plane R², i, j will refer to the unit vectors (1, 0) and (0, 1). Assuming v is a vector, we denote the horizontal (resp. vertical) coordinate of v by v_x (resp. v_y). U denotes the application of Z² that maps the point p to p + j. If U(p) stands for "up", one easily guesses what D(p) is for down, L(p) for left, and R(p) for right. By V_U we mean the binary relation that links any point of the plane with its upward neighbor (V_U = {(x, U(x)) | x ∈ Z²}). By V₄ we denote the 4-neighborhood (von Neumann neighborhood) of a point in the discrete plane Z² (V₄ = V_L ∪ V_R ∪ V_D ∪ V_U). When an application f is bijective, f⁻¹ denotes the inverse application. If it is not one-to-one, f^{−1} will denote the application that maps to x the set of elements y such that f(y) = x. If x ∈ R², the set [[x]]^{−1} will be called the discretization cell associated with x; it is the unit square centered on [x]. In this document, an arrow is an element of A = {−1, 0, 1}². A coloration is an application that maps a point of Z² into a finite set, which is called the colorset. (The elements of the colorset may be different from what we usually call colors; it is just a finite set.) The colorset we are going to use throughout this document will be P(A), the set of subsets of A. A coloration C is deemed periodic if and only if there exist two non-collinear vectors v and v′ of Z², such that for all x ∈ Z², C(x) = C(x + v) = C(x + v′).
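These operators are small enough to state directly in code. The sketch below (our own, not the authors') mirrors the definitions of [.] and {.} and the example [(−0.5, 0.5)] = (0, 1):

```python
import math

def rnd(x):
    """Rounding / point-discretization function: [x] = floor(x + 0.5)."""
    return math.floor(x + 0.5)

def rnd_pt(v):
    """[v]: the rounding function applied component by component."""
    return tuple(rnd(c) for c in v)

def frac(x):
    """{x} = x - [x]; it always lies in [-0.5, 0.5)."""
    return x - rnd(x)
```

For instance, rnd_pt((-0.5, 0.5)) evaluates to (0, 1), matching the example in the text.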
3 Description of the Studied Colorations
r_α is the Euclidean rotation of angle α in the real plane. We consider the application that maps a point x of Z² to the set of the arrows to its neighbors after a discrete rotation [r_α]. Formally:

G_α(x) := ∪_{x′ ∈ V₄(x)} {[r_α](x′) − [r_α](x)}

The information G_α can also be attached to the discretization cell of r_α(x). For a discretization cell c = [[x]]^{−1}, we can also attach to x the union of G_α(y) for all y such that r_α(y) ∈ c. We denote by G′_α the following coloration:

G′_α(x) := ∪_{y ∈ [r_α]^{−1}(x)} G_α(y) = ∪_{y ∈ [r_α]^{−1}(x)} ∪_{y′ ∈ V₄(y)} {[r_α](y′) − [r_α](y)}

G′_α's construction is now detailed. We consider the discrete lattice² Z². We rotate it, thus obtaining r_α(Z²); the rotated lattice is represented in dark in Fig. 1a. On Z², with dashed lines, we have also represented its dual, which corresponds to the Voronoi diagram of Z² and divides the space into cells; these cells are the discretization cells. If a real point v is located in the cell associated with an integer coordinate point p, then [v] = p. The exact behavior of the relation on the border is induced from the behavior of the discretization operator [.].

¹ Note that we assume a "minus" operator has been defined between the elements.
² A lattice is here the couple (Z², V₄).
Fig. 1. Construction of the colorations G_α(y) and G′_α(x).
The neighborhood relation V₄ on the rotated lattice is r_α(V₄) (see Fig. 1a). We have highlighted the edges of the relation r_α(V₄) for one point of the rotated lattice. The relation is then discretized according to the underlying network³: each edge of the relation is moved to the nearest integer point (see Fig. 1b), and thus we obtain [r_α](V₄). We then draw the edges of the corresponding relation ([r_α](V₄)(x), see Fig. 1c) for the points y that fall in a cell centered on x after rotation. We consider RC([r_α](V₄))(x), the relative coding of the preceding relation. This is required in order to have a finite color set (which is independent of the point considered in the relation). Moreover, this provides the ability to compare the colors of two points directly. The cardinal of [r_α]^{−1}(x) is at most 2. If the cardinal of [r_α]^{−1}(x) is zero, then x is a hole in G′_α. If the cardinal of [r_α]^{−1}(x) is one, then x is a normal point in G′_α. If the cardinal of [r_α]^{−1}(x) is two, then x is a double point in G′_α (see Section 4). When x is a double point in G′_α, its associated code is the superposition (the union of the sets of arrows) of the codes in G_α that were assigned to the points that transform to x via [r_α]. An example of G′ that denotes colors using arrows is presented in Fig. 2.
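The construction of G_α and the classification of cells by the cardinality of [r_α]^{−1} can be reproduced numerically. The following sketch is our own code (α = 0.5 is an arbitrary non-special angle; the rounding operator is the [x] = ⌊x + 0.5⌋ of Section 2); it also lets one check the elementary properties stated in the next section:

```python
import math
from collections import Counter
from itertools import product

ALPHA = 0.5
COS, SIN = math.cos(ALPHA), math.sin(ALPHA)

def rnd(x):
    return math.floor(x + 0.5)            # [x] = floor(x + 0.5)

def drot(p):
    """Discrete rotation [r_alpha]: rotate, then round each coordinate."""
    x, y = p
    return (rnd(COS * x - SIN * y), rnd(SIN * x + COS * y))

def G(p):
    """G_alpha(p): set of arrows towards the rotated V4-neighbors of p."""
    x, y = p
    bx, by = drot(p)
    nbrs = [(x, y + 1), (x, y - 1), (x - 1, y), (x + 1, y)]
    return frozenset((drot(q)[0] - bx, drot(q)[1] - by) for q in nbrs)

# Preimage cardinalities of [r_alpha]: 0 = hole, 1 = normal, 2 = double point
card = Counter(drot(p) for p in product(range(-40, 41), repeat=2))
```

On a patch of Z², every color indeed contains 4 distinct arrows with 3 or 4 of them non-null, and preimage cardinalities 0 (holes), 1 (normal points), and 2 (double points) all occur.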
4 Elementary Properties
Throughout this article, we consider α ∈ [0, π/4]. We now present some basic fundamental properties. Proofs have been omitted; they are available in [7].
– Two V₄-neighbors cannot both be holes.
– In G_α, each color contains exactly 3 or 4 different non-null arrows. If there are only 3 arrows, it means that there is one null arrow⁴.
³ The one that has not been rotated.
⁴ And not that two arrows merge.
Fig. 2. Sample representation of the G′ coloration for a random angle (α = 0.54977832 rad). The set of arrows inside a cell c = [[p]]^{−1} represents the value G′([p]). The background color behind the arrows is a function of the number of points that have an image in the discretization cell. Finally, the axes have also been slightly darkened.
– In G′_α, each color contains exactly 0, 4, or 6 different non-null arrows. If it is 0, the point considered is a hole; if it is 4, it is a normal point; and finally, if there are 6 arrows, it is a double point (issued from two 3-arrow symbols in G_α). It is important to note that no two non-null arrows merge.
– The application α ↦ G_α is injective from [0, π/4] to P(A)^{Z²}.
– The colorations admit a central symmetry for all angles except a denumerable set called special angles, which will be presented later in this text.
– In G, there exists an application that maps a color to the arrow that denotes only the position of a specified neighbor (U(x), R(x), L(x), or D(x)).
5 Algebraic Properties
A window is a product of intervals⁵ on Z. We will denote by [p_x, p_x + s_x[_Z × [p_y, p_y + s_y[_Z the window located at p and of size s; it contains s_x · s_y points of Z². We define a pattern as a function from a window [0, s_x[_Z × [0, s_y[_Z to the colorset Q. Let C be a coloration and π a pattern of size (s_x, s_y). If there exists a point
⁵ We denote intervals on Z by [a, b[_Z.
0 ≤ α < π/6    π/6 < α ≤ π/4
Fig. 3. In the figure above, we see an example of a map that binds a frame in the torus to its associated symbol. If the image of a point p by {r_α} is in the frame I_s, then G_α(p) = s. Of course, the arrows in the symbol indicate the location of the cells where the rotated neighbors of that point are.
p ∈ Z2 such that for all 0 ≤ tx < sx and for all 0 ≤ ty < sy we have C(p + t) = π(t), then we say that the pattern π appears (at p) in the coloration C.
We define the torus T = (R/Z)2 , and we will often use {.} as the projection onto this torus. When we represent the torus, we will generally represent it from −1/2 to 1/2, so that {0} is placed at the center of the square representing the torus. We can therefore identify this representation of the torus with a discretization cell. We define a frame as a product of projections of real intervals onto the torus T. The following theorem is fundamental, since it is the basis of all the subsequent analysis of the colorations.
Theorem 1 (Fundamental Theorem). There exists a partition I1 , . . . , In of the torus T into a set of generally 25 frames such that for all Gα (x) there exists a frame fi such that for all x', Gα (x') = Gα (x) if and only if {rα (x')} ∈ fi .
Proof. (elements of proof) Without loss of generality, we focus only on the position of the right neighbor p + i of a point p, and only on the question: "Is this neighbor mapped by [rα ] onto the same vertical coordinate as p or not?" We have [rα ](p) · i = [rα ](p + i) · i if and only if {rα }(p) · i + 1/2 < 1 − cos(α). Therefore this splits the torus into two frames along a vertical line located at x = 1/2 − cos(α). We obtain the same kind of result for the other quotient colorations6 . Finally we obtain 4 vertical lines and 4 horizontal lines that split the torus (generally into 25 parts, when no lines are merged).
6 Considering the positions of the other neighbors (U (x), L(x), D(x), R(x)) relative to p in one of the two directions i or j.
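The notion of pattern appearance defined above is easy to make concrete. The sketch below is our own illustration (not from the paper): colorations and patterns are modeled as plain Python functions from points to colors, and `pattern_appears` checks C(p + t) = π(t) over the whole window.

```python
def pattern_appears(C, pi, p, size):
    """Check whether pattern `pi` of size (sx, sy) appears at point p
    in the coloration C, i.e. C(p + t) == pi(t) for every t in the window."""
    sx, sy = size
    px, py = p
    return all(C((px + tx, py + ty)) == pi((tx, ty))
               for tx in range(sx) for ty in range(sy))

# Example: a 2-periodic checkerboard coloration and a 2x2 checkerboard pattern.
C = lambda q: (q[0] + q[1]) % 2
pi = lambda t: (t[0] + t[1]) % 2
```

With these definitions the pattern appears exactly at the points p with even coordinate sum.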
On Colorations Induced by Discrete Rotations
The partitions we present (Fig. 3) are easily built by attaching a rotated unit vector to each corner of the square, and by splitting the unit square vertically and horizontally at the other end of that vector. The construction lines are dashed.
Corollary 1 (Fundamental theorem for G'). There exists a partition I1 , . . . , In and Ii1 , Ii2 , Ii3 , Ii4 of the torus T into frames such that G'α (x) = n if and only if ∀y ∈ rα^{−1}(x), ({rα }(y) ∈ In or {rα }(y) ∈ In ).
The proof relies on similar ideas.
Theorem 2 (Fundamental theorem for patterns in Gα (resp. G'α )). To any pattern π that appears in the coloration Gα (resp. G'α ), it is possible to associate a frame Iπ such that the pattern π appears in Gα at a position p if and only if {p} ∈ Iπ . Moreover, for any size vector s, the set of patterns of size s partitions the torus.
Proof. We consider a rectangular pattern π1 of size s appearing at p0 in a coloration Gα . The pattern appears: for all t with 0 ≤ tx < sx and 0 ≤ ty < sy , we get Gα (p + t) = π(t). Using the fundamental theorem, for all t with 0 ≤ tx < sx and 0 ≤ ty < sy , {rα (p + t)} ∈ If (π(t)) . In addition, {.} is a morphism for addition, therefore for all t with tx < sx and ty < sy , ({rα (p)} + {rα (t)}) ∈ {If (π(t)) }, and thus {rα (p)} ∈ {If (π(t)) − rα (t)}. Finally we set Iπ1 = ∩_{0≤t<s} {If (π(t)) − rα (t)}.7
6 Study of Periodicity and Quasiperiodicity
A coloration C is quasi-periodical if and only if for every pattern π that appears in C, there exists a window size s such that π appears in all windows of size s.
We introduce the two vectors iα = rα (i) and jα = rα (j). We consider the abelian group G = Zi + Zj + Ziα + Zjα . We note that G is invariant under integer translations and under rotation by angle π/2. Let G' be the subgroup {G} of T; it is generated by {iα } and {jα }.
7 It appears at p0 : {rα (p0 )} ∈ Iπ1 .
The application {.} is a one-to-one map from G to G' ∩ F , where F is a unit window [β, β + 1[×[γ, γ + 1[ in R2 . Let p ∈ R2 ; then p ∈ G if and only if {p} ∈ G'. G' can equivalently be introduced as the set of points {rα (p)} for all p ∈ Z2 .
We are going to investigate the structure of G. We define the set SG = {||v||, v ∈ G \ {(0, 0)}}. We can consider the two following cases: either inf(SG ) = l0 ≠ 0, or inf(SG ) = 0.
6.1 The Case inf(SG ) = l0 ≠ 0, the Discrete Case
We are in the case where the lower bound of SG is not null.
Proposition 1. Let l0 be this lower bound. There exists a vector e of G which has norm l0 .
Proof. We consider the compact crown8 of depth [l0 , l0 + ε]. There exists a sequence of vectors of G whose norms converge to l0 ; from it we can extract a subsequence that converges to some vector v0 . The differences between consecutive terms of this subsequence converge to 0; hence, if the bound were not attained, we could extract from the previous sequence an arbitrarily small nonzero element of G. Thus we would have a contradiction.
Proposition 2. We define e' = rπ/2 (e); we have G = Ze + Ze'. In addition, G' is finite.
Proof. Let us assume that there is an element k of G outside Ze + Ze'. Since we are in a two-dimensional space, this would generate a smaller vector in G, which would contradict the minimality of e. G' is finite since G' ∩ T is finite and {.} is a one-to-one map to G'.
Theorem 3. If G' is finite then the colorations Gα and G'α are periodical.
Proof. Since G' is finite, there exists a constant K such that K{e} = 0 in G'. Write e = ae i + be j + ce iα + de jα , which gives us {e} = ce {iα } + de {jα }. Moreover ce² + de² ≠ 0 unless G is included in Z2 (which does not occur if we suppose α ≢ 0 modulo π/2).
Let x be a point of Z2 . Then {rα (x + Kce i + Kde j)} = {rα (x) + Kce iα + Kde jα } = {rα (x)} + Kce {iα } + Kde {jα } = {rα (x)} + K{e} = {rα (x)}. This proves that the coloration Gα is periodical of period Kce i + Kde j. Using e' in a similar process, we find that the coloration is periodical of period −Kde i + Kce j. Therefore the coloration is periodical (using the fundamental Theorem 1). Since G is assumed to be discrete, the coloration is periodical.
There exist constants K1 and K2 such that K1 {e} + K2 {e'} = {i}. One may project the vectors e and e' on one dimension of the space. Hence, one may view this equation as
8 the set B_{l0 +ε} \ B_{l0 } , where Bx is the open ball of radius x
one where the two unknowns are cos(α) and sin(α) (all other parameters are integers). This implies that cos(α) and sin(α) are both rational, and thus that α is a Pythagorean angle 9 . There are particular Pythagorean cases for which the discrete rotation is bijective; for more details see [6]. Conversely, when α is Pythagorean, cos(α) and sin(α) are rational, G is discrete, all points are separated and G' is finite.
6.2 The Case Where inf(SG ) = 0, the Dense Case
We are in the case where the lower bound of SG is null. For all ε > 0, there exists a vector vε of G such that ||vε || ≤ ε. Thus, there exists in G a grid formed by squares of side less than ε. In addition G is a dense subset of R2 .
Theorem 4. For every angle α ∈ [0 . . . π/4]R , if inf(SG ) = 0 then Gα is quasiperiodical.
Proof. We consider Gε = [−1 . . . 1]2R ∩ (eε Z + e'ε Z), the set generated by linear combinations of eε and e'ε (orthogonal vectors in G with norm at most ε) whose components on i, j are less than 1. It has the property that for all elements z of G and x of G, there exists an element y of Gε such that {x + y} is at distance at most ε from z.
All elements y of Gε may be written y = ay i + by j + cy iα + dy jα . Gε is finite, so there exist constants Cε and Dε such that for all y ∈ Gε , we have |cy | < Cε and |dy | < Dε . Consider now a window Fε,x of size 2Cε × 2Dε centered on x. For every element y of Gε , there exists an integer element x' of this window such that {rα }(x') − {rα }(x) = {y}. One may take x' = x + cy i + dy j. In other words: {rα }(Fε,x ) contains {rα }(x) + {Gε }.
Let x be a point of Z2 , and π a pattern of the coloration; this pattern π appears at p if and only if {rα }(p) is in a fixed frame ZM of T (Theorem 2). We note that this ZM contains a frame of size µ × µ (without loss of generality, we may suppose it square). Thus if we take ε = µ/2, we ensure that some element {y + rα (x)} belongs to ZM (because there will be one in any square of dimension µ × µ), which provides an element x' of Fε,x such that {rα }(x') is in ZM . This proves the quasi-periodicity of Gα . More precision on the quasiperiodicity can be obtained by considering the continued fraction expansion of cos(α) and sin(α) and then using the three-distance theorem (see [9]; a good survey is also [1]).
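The dichotomy between the discrete case of Section 6.1 and the dense case can be observed numerically. The sketch below is our own illustration (the helper `frac_parts` is not part of the paper): it collects the fractional parts {rα (p)} of rotated integer points. For the Pythagorean angle with cos α = 4/5 and sin α = 3/5, these fractional parts form a finite cyclic subgroup of the torus (of order 5 here), consistent with Theorem 3; for a generic angle they spread over the torus.

```python
from fractions import Fraction
import math

def frac_parts(c, s, n):
    """Fractional parts {r_alpha(p)} of rotated integer points over a
    (2n+1) x (2n+1) grid; exact with Fractions, rounded with floats."""
    vals = set()
    for x in range(-n, n + 1):
        for y in range(-n, n + 1):
            rx, ry = c * x - s * y, s * x + c * y
            vals.add((round(float(rx % 1), 9), round(float(ry % 1), 9)))
    return vals

# Pythagorean angle (3, 4, 5 triple): only finitely many fractional parts.
pyth = frac_parts(Fraction(4, 5), Fraction(3, 5), 8)
# Generic angle: the fractional parts fill the torus densely.
dense = frac_parts(math.cos(0.3), math.sin(0.3), 8)
```

Exact rational arithmetic is used in the Pythagorean case so that the finiteness of the set is not an artifact of floating-point rounding.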
7 Asymmetrical Cases
Let α and α' be two angles. We define a distance on colorations by considering the distance of the first point that differs in the two colorations:
9 A Pythagorean angle is an angle α which may be written α ≡ arctan(a/b), where (a, b, c) is a Pythagorean triple, i.e. a, b, c ∈ N and a2 + b2 = c2 . Pythagorean angles are angles for which both the cosine and the sine are rational.
d(Gα , Gα' ) = 2^(−inf{ d ∈ R : ||p|| = d, p ∈ Z2 , Gα (p) ≠ Gα' (p) })
We introduce the following function: st : [0, π/4]R → R+ , which to α associates lim_{ε→0} sup_{α'∈]α−ε,α+ε[} d(Gα , Gα' ). An angle α is stable if st(α) = 0. An angle α is unstable if st(α) = k, k ∈ R+ \ {0}; in that case we also say that the configuration is unstable. A configuration is special if and only if there exists a point p of Z2 that is mapped after transformation onto the border of a discretization cell.
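A rough numerical approximation of this distance can be sketched as follows (our own illustration, with two simplifying assumptions not made explicit in the paper: the discrete rotation is taken to be the rounded Euclidean rotation, and we compare the discretization cells of rα (p) as a proxy for the full neighborhood symbol Gα (p)):

```python
import math

def cell(p, alpha):
    # Discretization cell of the rotated point (assumed rounded rotation).
    x, y = p
    rx = x * math.cos(alpha) - y * math.sin(alpha)
    ry = x * math.sin(alpha) + y * math.cos(alpha)
    return (round(rx), round(ry))

def coloration_distance(a1, a2, radius=30):
    """Approximate d = 2^(-inf{||p|| : cells differ}) by scanning integer
    points of norm at most `radius` in order of increasing norm."""
    pts = [(x, y) for x in range(-radius, radius + 1)
                  for y in range(-radius, radius + 1)]
    pts.sort(key=lambda p: math.hypot(*p))
    for p in pts:
        if cell(p, a1) != cell(p, a2):
            return 2.0 ** -math.hypot(*p)
    return 0.0  # no difference found within the scanned radius
```

Identical angles give distance 0, while nearby distinct angles give a small positive value determined by the first integer point whose rotated image lands in a different cell.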
Proposition 3. For an angle α, the following propositions are equivalent:
– Gα is special
– Gα is unstable
– Gα is asymmetric
Proof. The only interesting case is: unstable ⇒ special; all other implications are immediate. If Gα is unstable, then there is a point p of Z2 such that any slight variation in α would change the cell that rα (p) belongs to. Therefore rα (p) is on an edge of a discretization cell. By definition, this means that Gα is a special configuration.
As a corollary, we obtain that the Pythagorean configurations are stable.
8 Conclusion
The quasi-periodicity of the colorations studied provides us with information on the constraints of an algorithm that could generate the colorations. It may be important to note that there is a simple algorithm which takes as input a point p of the plane and the coloration induced by a discrete rotation of angle α, and returns the image of p by the discrete rotation. Since no finite-memory automaton can generate a strictly quasiperiodical sequence, in order to provide the required input to the algorithm it will be necessary to study in depth the simpler cases formed by the periodical and special colorations. To answer the previous question, it is also possible to explore these colorations from a combinatorial viewpoint, basing our research on studies of bi-sturmian sequences (see [10], [5], [4]). Discrete rotations may also be considered as an alternative interpretation of patterns issued from a particular class of 4-to-2 quasi-periodic tilings.
Finally, one may define similar colorations in order to study any quasi-isometry – particularly, quasi-affine transformations. In ongoing research, we will compare these colorations with the ones generated by other kinds of rotations, such as E. Andrès's rotations (see [2]) or variants of rotation by circles.
References
1. Pascal Alessandri and Valérie Berthé. Three distance theorems and combinatorics on words. L'enseignement mathématique, 2(44):103–132, 1998.
2. Eric Andres. The quasi-shear rotation. In 6th Int. Workshop on Discrete Geometry for Computer Imagery, Lyon (France), volume 1176 of Lecture Notes in Computer Science, pages 307–314, Nov 1996.
3. Eric Andres. Habilitation à diriger des recherches: Modélisation analytique discrète d'objets géométriques, 2000.
4. Pierre Arnoux, Valérie Berthé, Hiromi Ei, and Shunji Ito. Tilings, quasicrystals, discrete planes, generalized substitutions, and multidimensional continued fractions. Discrete Mathematics and Theoretical Computer Science Proceedings, 2001.
5. Valérie Berthé and Laurent Vuillon. Suites doubles de faible complexité. Journal de Théorie des Nombres de Bordeaux, 12:179–208, 2000.
6. Marie-André Jacob and Eric Andrès. On discrete rotations. In Discrete Geometry for Computer Imagery, 1995.
7. Bertrand Nouvel. Action des rotations sur le voisinage dans le plan discret. Master's thesis, ENS-Lyon, 2002.
8. J.-P. Réveillès. Géométrie discrète, calcul en nombres entiers, et algorithmique. PhD thesis, ULP, 1991.
9. V. T. Sós. On the distribution mod 1 of the sequence nα. Ann. Univ. Scient. Budapest., Eötvös Sect. Math. 1, 1958.
10. Laurent Vuillon and Valérie Berthé. Tilings and rotations on the torus: a two-dimensional generalization of sturmian sequences. Discrete Mathematics, 2000.
Binary Shape Normalization Using the Radon Transform
Salvatore Tabbone and Laurent Wendling
LORIA, Campus scientifique, BP 239, 54506 Vandœuvre-lès-Nancy Cedex, France
{tabbone,wendling}@loria.fr
Abstract. This paper presents a novel approach, based on the Radon transform, to normalize binary shapes. The key idea of the paper is an original adaptation of the Radon transform: the binary shape is projected in Radon space for different levels of the (3-4) distance transform. This decomposition gives rise to a representation which has a nice behavior with respect to common geometrical transformations. The accuracy and the efficiency of the proposed algorithm in the presence of a variety of transformations are demonstrated within a shape recognition process.
1 Introduction
By definition, the Radon transform of an image is determined by a set of projections of the image along lines taken at different angles. For discrete binary image data, each non-zero image point is projected into a Radon matrix. Earlier works [6,10,11] on the 2D Radon transform were dedicated to finding high-valued coefficients in the transformed domain, in order to detect specific shape primitives like straight lines or arcs of conics. In all these approaches the information encoded is contour-based, allowing the characterization of simple shapes. Furthermore, this kind of representation is not suited to a recognition task because it needs to be normalized with respect to geometric parameters (translation, rotation and scaling). Indeed, it is difficult to recover all the geometric parameters of the transformation between two objects directly from the Radon transform.
To overcome this problem we propose an original adaptation of the Radon transform. We define a new representation which has a low time complexity and a nice behavior with respect to common geometrical transformations. The key idea is to project each binary shape into the Radon space for different levels of the (3-4) distance transform. In this manner we take into account the link between the internal structure and the boundaries of the shape. Thus, we provide a global description of any binary shape whatever its type and form are.
The remainder of the paper is organized as follows. A brief review of shape representation methods is presented in Section 2. The definition of the Radon transform is recalled in Section 3. The current method is described in Section 4, digital considerations are given in Section 5, and experimental results in Section 6. Finally, Section 7 presents conclusions and future work.
I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 184–193, 2003. c Springer-Verlag Berlin Heidelberg 2003
2 Related Works
The normalization of binary shapes [4] is a mandatory step in many computer vision systems, especially when the main goal is the discrimination of objects depending on their forms. Shape representation for object recognition has been the subject of much research, and extensive surveys of shape analysis can be found in [7,12]. The choice of a particular representation scheme is usually driven by the need to cope with requirements such as robustness against noise, stability with respect to small distortions, invariance to common geometrical transformations, or tolerance to occlusions.
Many approaches have been proposed to describe boundary contours from a small set of features. Fourier descriptors [8,15] have been widely used, and modified versions [16] have been proposed to compute the affine transformation from one shape to another. In most cases the centroid of the shape is required to define the geometric transform, and it is well known that the position of the centroid is sensitive to noise. Moreover, the Fourier descriptors represent the global appearance of a shape from their most important components. The number of required coefficients depends on the shape of the given object and is usually rather large.
In curvature approaches [13,23] a shape is described in a scale space by the maxima of the curvature. The similarity of two shapes is determined by measuring the distance between their corresponding scale space representations. In some approaches the similarity is computed at a high scale or at all scales. These methods yield interesting results. However, the number of scales is set manually because it is difficult to compute it automatically. For the same reason the first and the last scales are set manually too. To solve the correspondence problem of contour points between two shapes, S. Belongie [1] links a shape context to each contour point.
The shape context at a contour point captures the distribution around it and enables solving the correspondences as an optimal assignment problem. This method has the advantage over the previous one that it does not require ordered boundary points. Nevertheless, matching local contexts does not necessarily preserve the coherence of shapes.
A recognition system which recognizes objects from their silhouettes has been proposed in [24]. Each instance of an object is represented by a graph built on the medial axis of the shape silhouette. The nodes are skeleton junctions and the edges are the primitive points between them. An improved version is defined by the shock structure notion [9], which is obtained by viewing the medial axis as the locus of singularities. This structure is represented in a shock graph, which describes the shape more accurately than the medial axis graph. Several approaches have been proposed to compare shock graphs. K. Siddiqi [21] modifies the shock graph into a shock tree, and matching is performed by subgraph isomorphism or by finding the maximal cliques [14]. An alternative approach, also based on the singularities of a curve evolution process, has been given in [20]. The underlying graph is hierarchical, but more complex. Novel recognition frameworks have been proposed for matching shock graphs of 2D
Fig. 1. Definition of the Radon transform. (The figure shows the domain D, the function fD , and a line Li with normal parameters (ρi , θi ) in the (x, y) plane.)
shape outlines [18]. In these approaches the edit distance is used to measure the similarity between shapes. It is computed by searching for the optimal sequence in the space of all possible transition sequences. These methods are highly effective since they rely on global optimizations. Partial and occluded shapes are well matched. The main disadvantage is that they are computationally expensive, although heuristics can be used to reduce the complexity [19] in practice. A further problem is that they do not take into account the internal structure of general objects, and the medial axis is difficult to extract from real images due to its noise-sensitivity.
3 The Radon Transform
Let f (x, y) be an image. Its Radon transform is defined in [5]:
TRf (ρ, θ) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f (x, y) δ(x cos(θ) + y sin(θ) − ρ) dx dy    (1)
where δ(.) is the Dirac delta-function (δ(x) = 1 if x = 0 and 0 elsewhere), θ ∈ [0, π[ and ρ ∈ ]−∞, ∞[. In other words, TRf is the integral of f over the line L(ρ,θ) defined by ρ = x cos(θ) + y sin(θ). In the context of shape recognition it is of particular interest to consider the case where the general function f is replaced by (see Figure 1):
fD (x, y) = 1 if (x, y) ∈ D, and 0 otherwise.    (2)
Let Li be in normal form (ρi , θi ) in the plane (see Figure 1). The Radon transform TRf (ρ, θ) describes the intersection length of every line Li with the domain D, for all θi ∈ [0, π[ and −ρmin < ρ ≤ ρmax . When an implementation is considered, ρmin and ρmax are finite and depend on the image size.
Fig. 2. A 2D shape and its Radon transform.
Since the Radon transform is linear by definition, geometric properties like straight lines or curves can be made explicit by the Radon transform, which concentrates energies (loci of intersection of several sinusoidal curves) from the image into few high-valued coefficients in the transformed domain. These remarks are illustrated in Figure 2, where white pixels are the loci of high energies.
The Radon transform has several useful properties. Some of them are convenient for shape representation [5]:
– Periodicity: TRf (ρ, θ) = TRf (ρ, θ + 2kπ), k integer. The period is therefore 2π.
– Shift by a translation vector u = (x0 , y0 ): TRf (ρ − x0 cos(θ) − y0 sin(θ), θ). A translation of f results in the shift of its transform in the variable ρ by a distance equal to the projection of the translation vector on the line ρ = x cos(θ) + y sin(θ).
– Rotation by θ0 : TRf (ρ, θ + θ0 ). A rotation of the image by an angle θ0 implies a translation of the Radon transform in the variable θ.
– Scaling by α: (1/|α|) TRf (α × ρ, θ). A scaling of f results in a scaling of both the ρ coordinate and the amplitude of the transform.
To be useful, a shape recognition framework should provide explicit invariance under translation, rotation and scaling. To measure the similarity between the Radon matrices of two shapes it is necessary to know the underlying geometric transformation from one shape to the other. However, we can see from the previous properties that if a given shape is translated, rotated and scaled, it will be difficult to recover all the parameters of the geometric transformations from the Radon transform. To overcome this problem we propose in the next section an original adaptation of the Radon transform.
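The rotation property can be verified directly on the sinusoid of a single point: rotating the point by θ0 shifts its projection offset in θ, i.e. the ρ of the rotated point at angle θ equals the ρ of the original point at θ − θ0. A quick numerical check (our own illustration, not from the paper):

```python
import math

def rho(x, y, theta):
    # Normal-form offset of the projection line through (x, y) at angle theta.
    return x * math.cos(theta) + y * math.sin(theta)

def rotate(x, y, t0):
    # Euclidean rotation of the point (x, y) by angle t0.
    return (x * math.cos(t0) - y * math.sin(t0),
            x * math.sin(t0) + y * math.cos(t0))

x0, y0, t0 = 2.0, 3.0, 0.7
xr, yr = rotate(x0, y0, t0)
# Rotating the image by t0 translates the Radon transform in theta:
# rho of the rotated point at theta equals rho of the original at theta - t0.
checks = [abs(rho(xr, yr, th) - rho(x0, y0, th - t0))
          for th in [0.1, 0.9, 1.7, 2.5]]
```

The identity follows from expanding cos(θ − θ0) and sin(θ − θ0), so the residuals are zero up to floating-point error.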
4 R-Transform
Let the following transform, called the R-transform, be:
Rf (θ) = ∫_{−∞}^{∞} TRf²(ρ, θ) dρ    (3)
where TRf is the Radon transform of f . We can show the following properties:
Fig. 3. R-transforms of the same shape which has been rotated, scaled and translated. Only the rotation provides a shift of the R-transform. (The two plots show Rf against θ ∈ [0, 180].)
– Periodicity: Rf (θ ± π) = Rf (θ). The period is therefore π.
– Rotation: Rf (θ + θ0 ) = ∫_{−∞}^{∞} TRf²(ρ, θ + θ0 ) dρ. A rotation of the image by an angle θ0 implies a translation of the R-transform by θ0 .
– Shift: ∫_{−∞}^{∞} TRf²(ρ − x0 cos(θ) − y0 sin(θ), θ) dρ = Rf (θ). The R-transform is invariant under a translation of f by a vector u = (x0 , y0 ).
– Scaling: (1/α) ∫_{−∞}^{∞} TRf²(αρ, θ) dρ = (1/α²) Rf (θ) (α > 0). A scaling of f results in a scaling of only the amplitude of the R-transform.
To summarize, the R-transform is invariant under translation, and under scaling if the transform is normalized. A rotation of the shape implies a translation of the transform modulo π. Figure 3 shows two R-transforms of the same object which has been rotated, scaled and translated. We note that only the rotation provides a modification of the function.
Given a large collection of shapes, unraveling its redundancies with only one R-transform per shape is not efficient, because the R-transform provides a highly compact shape representation. In this perspective, to improve the description, each shape is projected into the Radon space for different segmentation levels of the (3 − 4) distance transform. A distance transformation is an operation that converts a binary image to an image where each element holds the distance to the nearest boundary contour [2,17]. There are different families of distance transformations (see [2,17] for more details). The chamfer (3 − 4) distance transform is fast and simple to implement and provides a good approximation of the Euclidean distance. To compute the (3 − 4) distance transform, the two additive masks of Figure 4 are applied in two passes on the image. In the forward pass the first mask starts in the upper left corner of the
Fig. 4. The masks for computing the (3 − 4) distance transform. (Forward and backward masks, with weight 3 for horizontal/vertical neighbors and 4 for diagonal neighbors.)
Fig. 5. First column: (3-4) distance transform. Other columns: segmented images for 8 different levels of distance transform.
image, moving from left to right and from top to bottom. The opposite operations are performed for the backward mask.
Given the distance transform of a shape, the distance image is segmented into n equidistant levels in order to keep the segmentation isotropic. For each distance level, pixels having a distance value greater than that level are selected, and at each level of segmentation an R-transform is computed. In this manner, we capture both the internal structure and the boundaries of the shape. Since the Radon transform is linear, all the R-transforms are computed in only one step. That is, each non-zero point (xi , yi ) is projected simultaneously into different Radon matrices. The number of projections depends on the number of segmentation levels and on the value of the distance transform at that point. So, each binary shape is described by a set of R-transforms forming a 2D surface and verifying the previous geometric properties.
Figure 5 shows the distance transform of the two dog images of Figure 3 and their corresponding distance images for 8 segmentation levels. We note that the isotropy of the segmentation is kept. The corresponding 2D surfaces are presented in Figure 6. We can see that one surface is very close to a circular permutation of the other. In the next section, we give some indications on the way of computing the discrete R-transform.
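The two-pass (3 − 4) chamfer computation described above can be sketched as follows (our own minimal implementation, assuming pixel value 1 for shape and 0 for background):

```python
INF = 10 ** 9

def chamfer34(img):
    """Two-pass (3-4) chamfer distance transform of a binary image:
    each shape pixel receives the approximate distance to the background."""
    h, w = len(img), len(img[0])
    d = [[0 if img[y][x] == 0 else INF for x in range(w)] for y in range(h)]
    # Forward mask: neighbors already visited in a top-left-to-bottom-right scan.
    fwd = [(-1, -1, 4), (-1, 0, 3), (-1, 1, 4), (0, -1, 3)]
    # Backward mask: mirror neighbors, scanned bottom-right-to-top-left.
    bwd = [(1, 1, 4), (1, 0, 3), (1, -1, 4), (0, 1, 3)]
    for mask, rows in ((fwd, range(h)), (bwd, range(h - 1, -1, -1))):
        cols = range(w) if mask is fwd else range(w - 1, -1, -1)
        for y in rows:
            for x in cols:
                for dy, dx, cost in mask:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        d[y][x] = min(d[y][x], d[ny][nx] + cost)
    return d

# Example: a 3x3 block of shape pixels inside a 5x5 background.
demo = [[0] * 5, [0, 1, 1, 1, 0], [0, 1, 1, 1, 0], [0, 1, 1, 1, 0], [0] * 5]
dist = chamfer34(demo)
```

For this block, border pixels of the shape get value 3 (one orthogonal step from the background) and the center pixel gets 6 (two orthogonal steps).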
5 Digital Considerations
Since the aim of this paper is not to provide a new version of the discrete Radon transform but to emphasize its application, we only give some tips about its implementation. A great number of more or less fast algorithms have been proposed. We have adapted the approach proposed in [3], which is well-suited to deal with binary images even though its complexity is high. Let a single-point image with coordinates (x0 , y0 ) be:
I(x, y) = δ(x − x0 )δ(y − y0 ).    (4)
Fig. 6. Surface visualization of the corresponding 2D R-transforms of Figure 5. On the X-axis is the number of orientations in the Radon transform. On the Y-axis is the number of level cuts in the distance transform.
Its Radon transform is by definition:
TRI (ρ, θ) = δ(ρ − x0 cos(θ) − y0 sin(θ)).    (5)
That is, a single point has a Radon transform which is non-zero along a sinusoidal curve of equation ρ = x0 cos(θ) + y0 sin(θ). Therefore, since the Radon transform is by definition linear, a way to compute the transform of a binary image is to map every non-zero image point, using the normal parameterization ρi = xi cos(θi ) + yi sin(θi ), into a Radon matrix. That is, for each point (xi , yi ) of the image, i is fixed and the value ρi is calculated using stepwise increments of θi from 0 to π. The increment is defined to avoid aliasing, following Shannon theory (see [22] for more details). Here we set ∆θ = ∆ρ = 1, ∆x = ∆y = 1/2, and the sampled values of ρi are defined by a linear interpolation. This algorithm requires time O(N²M) for an image of size N × N and M different angles (here M = 180). Two major optimizations are made to reduce the complexity. The cosine and the sine of all the possible values of θi are computed only once. The values of ρi are also defined recursively. That is, since the step increment ∆x is set to 1/2, we have ρ_{i+1/2} = ρ_i + cos(θ_i)/2. Therefore, when we move in the x-direction, ρ is incremented by cos(θ)/2. Similarly, in the y-direction ρ is increased by sin(θ)/2. Hence, the discrete Radon transform is represented by a digital image and the discrete R-transform is defined by:
R̂(θ̂) = Σ_ρ̂ T̂Rf²(ρ̂, θ̂).    (6)
We are aware that the properties of the continuous Radon transform carry over to the discrete Radon transform only approximately, due to discretization errors. However, we will see in the next section that the results obtained tend to show that these errors are small.
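The point-projection scheme just described can be sketched as follows (our own simplified version, not the authors' code: one ρ bin per unit and linear interpolation on ρ, without the ∆x = ∆y = 1/2 sub-pixel stepping). Since the two interpolation weights of every point sum to 1 at each angle, the total mass of the Radon matrix equals (number of shape points) × (number of angles).

```python
import math

def radon_binary(img, n_angles=180):
    """Discrete Radon transform of a binary image: each non-zero pixel is
    projected along rho = x cos(theta) + y sin(theta) for each sampled angle,
    with linear interpolation between the two nearest rho bins."""
    h, w = len(img), len(img[0])
    offset = w + h                      # shift so negative rho stays in range
    R = [[0.0] * (3 * (w + h)) for _ in range(n_angles)]
    cos = [math.cos(math.pi * m / n_angles) for m in range(n_angles)]
    sin = [math.sin(math.pi * m / n_angles) for m in range(n_angles)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for m in range(n_angles):
                    r = x * cos[m] + y * sin[m] + offset
                    i = int(r)                  # r > 0, so this is floor(r)
                    R[m][i] += 1.0 - (r - i)    # linear interpolation weights
                    R[m][i + 1] += r - i
    return R

def r_transform(R):
    # Discrete R-transform: sum of squared Radon coefficients over rho.
    return [sum(v * v for v in row) for row in R]

# Example: two isolated pixels in an 8x8 image.
img = [[0] * 8 for _ in range(8)]
img[2][3] = 1
img[5][1] = 1
sinogram = radon_binary(img, 60)
profile = r_transform(sinogram)
```

The sinogram of each pixel is a discretized sinusoid, and the R-transform collapses it into one value per angle.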
6 Experimental Results
To show the efficiency of the proposed algorithm we provide experimental results within a shape recognition process. The method was tested on the database of D. Sharvit [20], who kindly made it available on his Web site. This database consists of nine categories with 11 shapes in each category. Figure 7 shows an example of matching one shape of each category against all other shapes of the database. Since there were ten possible similar shapes excluding the shape itself, we provide as results the ten nearest neighbors.
The similarity of two surfaces is defined by the χ² distance [1]. Such a distance can be efficiently computed and is well adapted to our context:
χ² = C(hq , hm ) = Σ_{k=1}^{#levels} Σ_{θ=0}^{π} (hq (k, θ) − hm (k, θ))² / (hq (k, θ) + hm (k, θ)),    (7)
where hq and hm are respectively the 2D surfaces of the query and model shapes belonging to the database. Given the previous cost, we look for the best permutation Π which minimizes: P(Π) = min{C(hΠ(q) , hm )}. Due to the (3-4) transform level cuts, a 2D surface may be composed of non-uniformly sampled data. In this perspective, we interpolate the values at uniformly spaced points before computing the χ² distance.
We can observe from Figure 7 that most of the shapes are well classified. For example, the wrench image provides a good demonstration of the behavior of the proposed approach with respect to geometric transformations and small deformations. Furthermore, images with different sizes and types of occlusion have been added to the database. For example, five occluded images of the fish class have been inserted in the database. We can see from Figure 8 that our approach has retrieved all the occluded shapes for the fish query.
Each image of the database is represented by about 128 × 128 pixels. All the 2D R-transforms are computed off-line and the running time is about 0.5 s per shape. These results are obtained on a Pentium III, 866 MHz, running under Linux.
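The χ² distance of equation (7) is straightforward to implement (our own sketch; the surfaces are modeled as 2D arrays indexed by level and angle, and bins where both surfaces vanish are skipped to avoid division by zero — an assumption on our part, since the paper does not discuss that case):

```python
def chi2_distance(hq, hm):
    """Chi-square distance between two 2D surfaces (level x angle arrays)."""
    total = 0.0
    for row_q, row_m in zip(hq, hm):
        for a, b in zip(row_q, row_m):
            if a + b > 0:                    # skip empty bins (our assumption)
                total += (a - b) ** 2 / (a + b)
    return total

# Toy surfaces with 2 levels and 2 angles each.
hq = [[1.0, 2.0], [3.0, 4.0]]
hm = [[1.0, 4.0], [3.0, 2.0]]
```

The distance is zero exactly when the two surfaces coincide, and it weights each bin's squared difference by the inverse of the local mass.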
7 Conclusion
We have presented a new approach to shape recognition which is simple and easy to apply. The key characteristic of our approach is the definition of a new transform, based on both the Radon and the (3-4) distance transforms, allowing us to capture the shape at different levels of distance. In our experiments we have shown the invariance of our approach under several common image transformations, including the ability to handle partial and occluded shapes. Currently the number of segmentation levels is defined manually. It is obvious that a small-sized shape does not require the same number of description levels as a larger one. In this case we also need more precision on the distance decomposition. To achieve more accuracy, further work will be devoted to obtaining a better approximation of the Euclidean distance and to defining the number of level cuts automatically.
Fig. 7. Top: A database of 99 shapes. Each shape of the first column is matched against every other shape in the database. Bottom: the 10 nearest neighbors. Self-matching, which is a perfect match, is excluded from the results.
Fig. 8. Robustness to occlusion. The occluded images are represented in gray. Each shape of the first column is matched against every other shape in the database (self-matching is excluded from the results). Right: the 14 nearest neighbors.
References
1. S. Belongie, J. Malik and J. Puzicha. Shape Matching and Object Recognition Using Shape Contexts. IEEE Transactions on PAMI, 24(4):509–522, 2002.
2. G. Borgefors. Distance Transformations in Arbitrary Dimensions. CVGIP, 27:321–345, 1984.
3. R. N. Bracewell. Two-Dimensional Imaging. Englewood Cliffs, NJ: Prentice Hall, 1995. pp. 505–537.
4. J. Cortadellas, J. Amat and M. Frigola. Robust Normalization of Shapes. DGCI 2002, Bordeaux, France, 2002.
5. S. R. Deans. Applications of the Radon Transform. New York: Wiley Interscience Publications, 1983.
6. P. Fränti, A. Mednonogov, V. Kyrki and H. Kälviäinen. Content-based Matching of Line-drawing Images Using the Hough Transform. International Journal on Document Analysis and Recognition, 3(2):117–124, 2000.
7. A. K. Jain, R. P. W. Duin and J. Mao. Statistical Pattern Recognition: A Review. IEEE Transactions on PAMI, 22(1):4–37, 2000.
8. H. Kauppinen, T. Seppänen and M. Pietikäinen. An Experimental Comparison of Autoregressive and Fourier-Based Descriptors in 2D Shape Classification. IEEE Transactions on PAMI, 17(2):201–207, 1995.
9. B. B. Kimia, A. R. Tannenbaum and S. W. Zucker. Shapes, Shocks, and Deformations I: The Components of Two-Dimensional Shape and the Reaction-Diffusion Space. IJCV, 15:189–224, 1995.
10. V. F. Leavers. Use of the Radon Transform as a Method of Extracting Information about Shape in two Dimensions. Image and Vision Computing, 10(2):99–107, 1992.
11. V. F. Leavers. Use of the Two-Dimensional Radon Transform to Generate a Taxonomy of Shape for the Characterization of Abrasive Powder Particles. IEEE Transactions on PAMI, 22(12):1411–1423, 2000.
12. S. Loncaric. A Survey of Shape Analysis Techniques. Pattern Recognition, 31(8):983–1001, 1998.
13. F. Mokhtarian and S. Abbasi. Shape Similarity Retrieval under Affine Transforms. Pattern Recognition, 10(2):31–41, 2002.
14. M. Pelillo, K. Siddiqi and S. Zucker. Matching Hierarchical Structures Using Association Graphs. IEEE Transactions on PAMI, 21(11):1105–1119, 1999.
15. E. Persoon and K. Fu. Shape Discrimination using Fourier Descriptors. IEEE Transactions on SMC, 7(3):170–179, 1977.
16. Y. Rui, A. She and T. S. Huang. A Modified Fourier Descriptor for Shape Matching in MARS. Image Databases and Multimedia Search, 8:165–180, 1998.
17. G. Sanniti di Baja and E. Thiel. Skeletonization algorithm running on path-based distance maps. Image and Vision Computing, 14:47–57, 1996.
18. T. B. Sebastian, P. N. Klein and B. Kimia. Recognition of Shapes by Editing Shock Graphs. ICCV 2001, 755–762, 2001.
19. T. B. Sebastian, P. N. Klein and B. Kimia. Shock-based Indexing into Large Shape Databases. ECCV 2002, 731–746, Denmark, 2002.
20. D. Sharvit, J. Chan, H. Tek and B. Kimia. Symmetry-based Indexing of Image Databases. Journal of Visual Communication and Image Representation, 1998.
21. K. Siddiqi, A. Shokoufandeh, S. J. Dickinson and S. W. Zucker. Shock Graphs and Shape Matching. IJCV, 35(1):13–39, 1999.
22. P. Toft. The Radon Transform – Theory and Implementation. Ph.D. thesis, Department of Mathematical Modelling, Technical University of Denmark, June 1996.
23. C. Urdiales, A. Bandera and F. Sandoval. Non-parametric Planar Shape Representation Based on Adaptive Curvature Functions. Pattern Recognition, 35:43–53, 2002.
24. S. C. Zhu and A. L. Yuille. FORM: A Flexible Object Recognition and Modelling System. IJCV, 20(3):187–212, 1996.
3D Shape Matching through Topological Structures
Silvia Biasotti, Simone Marini, Michela Mortara, Giuseppe Patanè, Michela Spagnuolo, and Bianca Falcidieno
Istituto di Matematica Applicata e Tecnologie Informatiche, Consiglio Nazionale delle Ricerche
{silvia,simone,michela,patane,spagnuolo,falcidieno}@ge.imati.cnr.it
http://www.ge.imati.cnr.it
Abstract. This paper introduces a framework for matching 3D shapes represented by topological graphs. The comparison algorithm is an error-tolerant graph isomorphism that includes a structured process for identifying matched areas on the input objects. Finally, we provide a series of experiments showing the capability of the method to automatically compare complex objects starting from different skeletal representations used in Shape Modeling.
1 Introduction
Work on shape representation and comparison is based on a trade-off between conciseness and expressiveness of the chosen scheme. Shape representations have to capture the object topology and geometry in a compact and effective way; for instance, shock graphs [7], component/max trees [9,19] in 2D imaging, and topological graphs of 3D meshes [2] and volumes [10] are popular tools for the abstraction of complex information. As shape descriptor we consider a topological representation that codes the relations among the surface features in a graph and serves as input for the matching algorithm. The matching method proposed in the paper is based on an error-tolerant graph isomorphism which identifies a greedy approximation of the maximal common subgraph shared by the two input graphs and which gives information about the analogies and differences among the features of the compared objects. The paper is organized as follows: the graph matching algorithm for attributed directed acyclic graphs is introduced in Section 2, where a possible similarity measure is also proposed. The definition of a particular class of topological graphs, the Reeb graphs, is outlined in Section 3, while a series of experiments for comparing complex objects and some conclusions are given in Section 4.
2 Matching of Topological Structures
The problem of comparing topological structures has been approached in several ways. In [6], multi-resolution Reeb graphs are extracted and compared in order
I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 194–203, 2003. © Springer-Verlag Berlin Heidelberg 2003
to estimate the similarity between 3D models; the graph matching algorithm is based on a coarse-to-fine strategy that preserves the consistency of the graph structure. In [21] the shock graph is used as a signature of the shape of 2D objects. In order to perform the graph matching, the shock graphs are transformed into rooted trees and then compared. Analogously, in [10] a skeletal graph is computed from a volumetric object and the corresponding graphs are compared using the methodology presented in [21]. In this paper we propose an algorithm to compare objects described by topological structures, where the input shape is represented by an attributed, directed and acyclic graph as formalized in Definition 1.
Definition 1. An M-graph G is a quadruple G = (V, E, µV, µE), where V is a set of nodes, E ⊆ V × V is the set of graph edges, and µV : V → AV and µE : E → AE are the node and edge attribute functions, with AV, AE the sets of node and edge attributes of G. The set of M-graphs is denoted by MGset. A subgraph S of G is a quadruple (VS, ES, µVS, µES), where VS ⊆ V, ES ⊆ E ∩ (VS × VS), and µVS and µES are induced by µV and µE, respectively.
Since the graph is directed, each node v ∈ V identifies a subgraph S of G, where VS contains v itself and all the nodes of which v is an ancestor. This property is used during the graph comparison process in order to match not only nodes, but subgraphs too. The isomorphism notion defines an equivalence relation among M-graphs [15]. Since requiring two graphs to be isomorphic is a strong condition, generally unsuitable for similarity tasks, we relax this hypothesis and propose a weaker notion of isomorphism. Starting from the properties of weak isomorphism presented in [15], an error-tolerant graph isomorphism is defined through a set of graph editing operations that make the two M-graphs isomorphic.
Definition 2. Let G and G′ be two M-graphs and ∆ = (δ1, . . .
, δn) a sequence of graph editing operations (where a graph edit operation δi is an addition, a deletion or an attribute modification of nodes and edges); then:
– the edited graph ∆(G) is the graph ∆(G) = δn(δn−1(. . . (δ1(G)) . . .));
– an error-tolerant graph isomorphism is a couple ψ = (∆, ψ∆), where ∆ is a sequence of editing operations such that there exists an isomorphism ψ∆ between ∆(G) and G′.
Given two M-graphs (named input and model graph, respectively), we construct an error-tolerant isomorphism originating a subgraph (possibly not connected) in the input graph that is “mapped” onto an isomorphic subgraph (with respect to the graph edit operations) of the model graph. The error-tolerant isomorphism is seen as a transformation process from the input graph to the model graph, where the track of the editing operations makes the differences between the two objects explicit and is used to check the effectiveness of the matching process.
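Definition 1 and the subgraph identified by a node can be made concrete with a small sketch. This is not the authors' implementation; all names (MGraph, subgraph_nodes) are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class MGraph:
    """Attributed directed graph of Definition 1 (names hypothetical)."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)        # directed pairs (u, v)
    node_attr: dict = field(default_factory=dict)  # mu_V : V -> A_V
    edge_attr: dict = field(default_factory=dict)  # mu_E : E -> A_E

    def subgraph_nodes(self, v):
        """The node set V_S of the subgraph identified by v:
        v itself plus every node of which v is an ancestor."""
        reach, stack = {v}, [v]
        while stack:
            u = stack.pop()
            for (a, b) in self.edges:
                if a == u and b not in reach:
                    reach.add(b)
                    stack.append(b)
        return reach
```

Since the graph is acyclic, `subgraph_nodes` terminates and returns exactly the descendant closure used below for subgraph matching.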
2.1 Error Tolerant Graph Isomorphism
The construction of a graph isomorphism is intrinsically an NP problem and, in the scientific literature, most methods compare graphs by exploiting statistics [12], genetic algorithms [8], fuzzy approaches [13], optimization theory [4] or a search in the space of states [15]. In this paper a heuristic algorithm has been devised to assess and quantify an error-tolerant graph isomorphism. Let S be the subgraph identified by v ∈ V; then v is described by the vector sv = (in(v), out(v), in_sum, out_sum, n, s_sum), where in() (resp. out()) is the indegree (resp. outdegree) of v, in_sum (resp. out_sum) the sum of the indegrees (resp. outdegrees) of the nodes of S, n the cardinality of VS and s_sum the sum of the attributes of ES. The proposed algorithm is a search in the state space defined by the input graph G1 = (V1, E1, µV1, µE1) and the model graph G2 = (V2, E2, µV2, µE2), where a state is the ordered set of node pairs St = {(vj, vk) | vj ∈ V1, vk ∈ V2, (vj, vk) is a candidate to be mapped by ψ∆}. The state ordering is based on a cost associated with each pair of nodes and computed by the real-valued function C(vj, vk), which is the Euclidean distance between the vectors svj, svk and works as a node similarity measure. Therefore, if µVi, µEi are the attribute functions of Gi, i = 1, 2, we require that there exist two totally ordered sets AV and AE such that AVi ⊆ AV and AEi ⊆ AE. The algorithm maps nodes and edges of G1 and G2 moving from an initial state St0 to a final state Stf, where Stf = ∅, through a sequence of intermediate states Sti.
The initial state. The initial state St0 is set using a heuristic based on the information associated with the nodes. The nodes of Gi, i = 1, 2, are ordered with respect to µVi and a set of candidate node pairs is chosen as follows. Firstly, the pairs having similar attribute values are selected.
Then, the set of candidate pairs is refined by considering only the relevant nodes: since each node generates a subgraph S, the bigger S is, the more relevant the node. Finally, the set of candidates is ordered with respect to the cost function C. In figure 1 an example is shown on two directed acyclic M-graphs: the initial node candidates, selected with respect to the node attributes and the node relevance, are marked with the same symbols (see figure 1(a)). Leaves are not considered because the associated information is negligible.
Choice of the best candidate node pair. The heuristic used to choose the best candidate pair (vj, vk) involves both the cost of the node pair C(vj, vk) and the information carried by the two nodes. The relevance of a candidate pair is given by the minimum of the cardinalities of the subgraphs induced by the two nodes. Therefore, the chosen candidate is the node pair of minimum cost with respect to C which has a relevance greater than or equal to the average relevance value in the state. This choice generates a priority list for the construction of the isomorphism node mapping. In figure 1(a), the first best candidate pair is denoted by a grey square.
From the state Sti to the state Sti+1. The candidate node pair (vj, vk) having highest priority is removed from Sti and becomes a component of the node mapping isomorphism ψ∆. New candidate pairs are obtained from vj and
Fig. 1. Example of the matching method: initial state and best candidate pair (a) and final matching (b).
vk: they are all the possible node pairs obtained by combining the child nodes of vj and vk. Then, the state i + 1 is the state i enriched by the new candidate node pairs obtained from vj and vk. According to figure 1(a), when the node pair (e, 7) is added to ψ∆, the new state is obtained from the previous one by adding the new node pairs (h, 6), (h, 8), (h, 9), (g, 6), (g, 8), (g, 9), (f, 6), (f, 8), (f, 9) and removing (e, 7).
End of the search. When the set of candidate pairs is empty, the final state is reached. The node mapping that identifies the common subgraph (and consequently the edge mapping) is complete and the set of graph editing operations ∆ (see Definition 2) is computed. Results are shown in figure 1(b): the node mapping is highlighted with the same symbols; bold arcs represent the induced edge matching. According to the notation proposed in [3], a graph SM is a common subgraph of G1 and G2 if there exists a subgraph isomorphism between SM and G1, and between SM and G2. Moreover, G1 can be transformed into G2 by adding to ∆ a deletion operation for each edge in E1 not belonging to SM and an addition operation for each edge in E2 not belonging to SM. Furthermore, modify operations are added to ∆ for each couple of edges belonging to the edge mapping that do not have the same attributes.
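The descriptor vector sv, the cost C and the greedy state-space search described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the graph encoding (adjacency dicts, edge-attribute maps) and all function names are assumptions:

```python
import math

def descendants(adj, v):
    """v together with every node reachable from it (the subgraph S of Sec. 2)."""
    seen, stack = {v}, [v]
    while stack:
        for w in adj.get(stack.pop(), []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def descriptor(adj, edge_attr, v):
    """s_v = (in(v), out(v), in_sum, out_sum, n, s_sum) as in Sec. 2.1."""
    S = descendants(adj, v)
    indeg = lambda u: sum(u in ws for ws in adj.values())
    in_sum = sum(indeg(u) for u in S)
    out_sum = sum(len(adj.get(u, [])) for u in S)
    s_sum = sum(a for (u, w), a in edge_attr.items() if u in S and w in S)
    return (indeg(v), len(adj.get(v, [])), in_sum, out_sum, len(S), s_sum)

def cost(s1, s2):
    """C(vj, vk): Euclidean distance between descriptor vectors."""
    return math.dist(s1, s2)

def greedy_match(adj1, ea1, adj2, ea2, initial):
    """Greedy construction of the node mapping: repeatedly pick the
    minimum-cost pair of at-least-average relevance, then expand the state
    with all child-pair combinations of the matched pair."""
    C = lambda p: cost(descriptor(adj1, ea1, p[0]), descriptor(adj2, ea2, p[1]))
    rel = lambda p: min(len(descendants(adj1, p[0])), len(descendants(adj2, p[1])))
    state, mapping = set(initial), []
    while state:
        avg = sum(rel(p) for p in state) / len(state)
        best = min((p for p in state if rel(p) >= avg), key=C)
        mapping.append(best)
        vj, vk = best
        state |= {(c1, c2) for c1 in adj1.get(vj, []) for c2 in adj2.get(vk, [])}
        state = {(a, b) for a, b in state            # drop pairs reusing a mapped node
                 if all(a != x and b != y for x, y in mapping)}
    return mapping
```

Because each iteration maps one new pair and discards every candidate that reuses a mapped node, the loop terminates after at most min(|V1|, |V2|) iterations.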
2.2 Similarity Measure
To compare two M-graphs G1 and G2, a distance metric d : MGset × MGset → [0, 1] that satisfies the properties of uniqueness, non-negativity, identity, symmetry and triangle inequality is necessary [22]. Then, the similarity measure s(G1, G2) between G1 and G2 is defined by s(G1, G2) = 1 − d(G1, G2). The bigger the common subgraph SM of G1 and G2 defined by the error-tolerant isomorphism is, the bigger the similarity measure s(G1, G2) should be. A possible choice [3] is dB(G1, G2) = 1 − |maxcs(G1, G2)| / max(|G1|, |G2|), where maxcs(G1, G2) is the maximal common subgraph of G1 and G2 and | · | is the number of nodes of a graph. As required by our statements, such a distance depends on the size of the subgraph; however, it does not take into account the attribute values of edges and nodes. This implies that each node of the subgraph has the same weight, regardless of its relevance in the graph. Our aim is to compute
the distance on the edges of SM (where an arc e belongs to SM iff the nodes it connects do) and to adapt dB to our purposes, correcting the contribution of each edge in SM by considering the difference between the attributes of the corresponding edges of G1 and G2. More formally, our distance measure is defined as:
d(G1, G2) = 1 − ( Σ_{e ∈ SM} (1 − |µE(a) − µE(b)| / α) ) / max(|G1|, |G2|),
where α represents the maximal attribute value on the edges of the graphs in the database and e = ψ∆(a) = ψ∆(b), a ∈ E1, b ∈ E2. Clearly, if the edges of G1 mapped in SM have the same attribute values as those in G2, our metric corresponds to dB. Furthermore, the non-negativity, uniqueness, identity and symmetry properties immediately follow from the definition; if SM is the maximal common subgraph, the triangle inequality can also be demonstrated through a procedure similar to that proposed in [3], and d is a distance.
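The attribute-weighted distance and the derived similarity can be sketched in a few lines. Input conventions (matched edge pairs, attribute maps, node counts, the database-wide maximum α) are assumptions for illustration:

```python
def graph_distance(matched_edges, mu1, mu2, n1, n2, alpha):
    """d(G1, G2) = 1 - [sum over e in SM of (1 - |muE(a) - muE(b)| / alpha)]
    / max(|G1|, |G2|).  matched_edges holds pairs (a, b) of corresponding
    edges of G1 and G2 in SM; mu1, mu2 are their edge-attribute maps;
    n1, n2 are the node counts; alpha is the maximal edge attribute over
    the database (all names hypothetical)."""
    contrib = sum(1 - abs(mu1[a] - mu2[b]) / alpha for a, b in matched_edges)
    return 1 - contrib / max(n1, n2)

def similarity(distance):
    """s(G1, G2) = 1 - d(G1, G2)."""
    return 1 - distance
```

When all matched edges have identical attributes each term contributes 1, and the expression reduces to dB, as stated in the text.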
2.3 Computational Complexity
The isomorphism construction involves the generation of the node description represented by the vector sv previously described, the computation of the initial state, the transitions among states and the choice of a candidate. Denoting by N the number of nodes of a graph, the generation of the vector sv has a cost linear in N: each node and edge is read exactly once. The ordering of nodes with respect to µV requires O(N²) operations, and the generation of the initial candidates is quadratic as well. The transition between two states is also O(N²): the removal of the chosen candidate is performed in linear time, but the generation of the new candidates has a quadratic cost. The choice of the best candidate from the current state is linear in the number of candidates but, in the worst case, the number of candidates is quadratic in the number of nodes.
3 Topological Structures
Among the possible shape descriptors of 3D objects that may be coded as an M-graph, we detail in this section a discrete representation of the Reeb graph [18], discussing its application to shape retrieval. Let f : M → R be a real mapping function defined on a surface M, let [fmin, fmax] be the variation interval of f on M, and let fmin < f1 < · · · < fh < fmax be the distribution of the values of the contour levels of M, which are all supposed to be non-degenerate contours. In addition, let I = {(fmin, f1), (fi, fi+1), (fh, fmax) | i = 1, . . . , h−1} ∪ {fmin, f1, . . . , fh, fmax} be the partition of the interval [fmin, fmax] provided by the set of the h + 1 interior parts and the function values of the contour levels.
Definition 3. An extended Reeb equivalence between two points P, Q ∈ M is given by the following conditions: 1. f(P), f(Q) belong to the same element t ∈ I; 2. P, Q belong to the same connected component of f⁻¹(f(t)), t ∈ I.
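Definition 3 can be illustrated on a sampled surface represented as a graph of vertices with f-values: vertices are first assigned to an element of I (a contour level or an open interval), then split into connected components within each element. This is a sketch under assumed inputs, not the authors' implementation:

```python
def extended_reeb_classes(f, edges, levels):
    """Group sampled surface points into extended Reeb equivalence classes
    (Definition 3, a sketch): same element of the partition I and same
    connected component within it.  f maps vertex -> value, edges are
    neighbor pairs, levels is the sorted list fmin, f1, ..., fh, fmax."""
    def element(fv):
        # Element of I containing fv: a contour level or an open interval.
        for i, l in enumerate(levels):
            if fv == l:
                return ('level', i)
            if fv < l:
                return ('open', i - 1)
        return ('open', len(levels) - 1)

    parent = {v: v for v in f}            # union-find over vertices
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:                    # merge neighbors sharing an element of I
        if element(f[u]) == element(f[v]):
            parent[find(u)] = find(v)

    classes = {}
    for v in f:
        classes.setdefault((element(f[v]), find(v)), set()).add(v)
    return sorted(classes.values(), key=sorted)
```

Each returned class corresponds to one point of the ER quotient space; linking classes that are adjacent along edges of the surface would yield the nodes and arcs of the ERG.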
By the quotient relation in Definition 3, all the points belonging to R := f⁻¹(f(t)), for some t ∈ I, are Reeb-equivalent in the extended sense and are collapsed into the same point of the quotient space, called the Extended Reeb (ER) quotient space. Moreover, the ER space, which is an abstract subspace of M independent of the geometry, may be represented as a traditional graph, called the Extended Reeb Graph (ERG). The ERG with respect to the function f is an M-graph; in fact, the nodes correspond to critical points of f, i.e. the points of M where the gradient of f vanishes, while the node attributes are provided by their classification (as minima, maxima or saddles) and by some geometric information such as the spatial position, the value of f, etc. In addition, the function f induces a natural orientation on an edge e = (v1, v2) through the relation f(v1) < f(v2), while the edge attributes are defined as µE(e) = f(v2) − f(v1). Additional geometric information, such as the cross-section length, area and volume of the corresponding portion of the surface, may also be stored for each edge. Finally, due to the monotonicity of f along an edge, the ERG is acyclic. Like the Reeb graph [18,16], under the hypothesis that I is a sufficiently dense partition of the domain of f [1], the ERG representation identifies the main topological properties of M independently of the chosen f. However, the application domain restricts the choice of f; for instance, a suitable mapping function f has to be independent of rotation, translation, uniform scaling of the object and user choices. These requirements prevent the use for matching purposes of the height function [1] and of the centerline representation [11,5], which depend on the orientation and on the selection of a seed point, respectively.
The family of continuous or Morse functions is a natural set for identifying f, and in the following we present an overview of possible choices of f for coding triangular meshes without boundary.
ERG with respect to the Euclidean distance from a point. Differential topology and Morse theory guarantee that the distance functions of the surface points from a given point p of the Euclidean space are appropriate for extracting a Reeb graph. Such a point may or may not belong to the mesh, even though a reasonable choice seems to be the barycentre of the object [2], which is easily calculated and, due to its linear dependence on all the vertices, is stable to small perturbations. In figure 2(a), an example of this graph representation is given.
ERG with respect to the integral geodesic distance. A different mapping function has been defined by Hilaga et al. [6], where the notion of integral geodesic distance has been introduced for matching purposes. In particular, for each vertex v of a mesh M, the value of the function f is given by f(v) = Σ_i g(v, bi) · area(bi), where g(v, bi) represents the geodesic distance between v and bi, {bi}i are the base vertices for Dijkstra's algorithm, scattered almost equally on the surface, and area(bi) is the area of the neighborhood of bi (see figure 2(b)).
Geodesic distance from curvature extrema. The strategy proposed in [17] extracts the M-graph of a surface represented by a simplicial complex. More precisely, once a multi-resolution Gaussian curvature has been computed on the input
Fig. 2. The Reeb graph with respect to the distance from the barycenter (a), the integral geodesic distance (b), and the curvature extrema distance (c). The blue, red and green nodes represent minima, maxima and saddles, respectively.
mesh [14], for each high-curvature region Ri, i = 1, . . . , n, a seed vertex pi is selected. Starting at the same time from all the representative vertices, rings made of vertices of increasing neighborhoods are computed in parallel until the whole surface is covered. Rings growing from different seed points collide and join where two distinct protrusions depart, thus identifying a branching zone. A graph is drawn according to the ring expansion: terminal nodes are identified by the seed points, while the union or split of topological rings gives branching nodes. Arcs join consecutive nodes of the graph (see figure 2(c)). Experimental results have shown that this framework works on shapes of arbitrary genus.
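The integral geodesic mapping function f(v) = Σ_i g(v, bi) · area(bi) described above can be sketched by approximating each geodesic distance g with shortest paths on the mesh edge graph, as in [6]. Input conventions (weighted adjacency dict, base-vertex list, area map) are assumptions:

```python
import heapq

def dijkstra(adj, src):
    """Shortest-path lengths from src over a weighted graph.
    adj maps vertex -> list of (neighbor, edge_length) pairs."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue                     # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def integral_geodesic(adj, bases, area):
    """f(v) = sum_i g(v, b_i) * area(b_i), with g approximated by
    graph shortest paths (a sketch; names hypothetical)."""
    f = {v: 0.0 for v in adj}
    for b in bases:
        dist = dijkstra(adj, b)
        for v in adj:
            f[v] += dist.get(v, 0.0) * area[b]
    return f
```

On a real mesh the edge lengths would be the Euclidean lengths of the triangle edges, and the base vertices would be scattered almost uniformly over the surface.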
4 Experimental Results and Concluding Remarks
The behavior of the different ERG representations has to be taken into account during the similarity analysis: each function emphasizes different aspects of the object shape. For instance, the geodesic distance distribution on a human model does not change if the legs and the arms are stretched rather than curled up, since the geodesic distance from the body does not change, while the Euclidean distance from the center of mass does. Therefore, if the aim is to distinguish between different poses of the same object, the ERG with respect to the distance from the barycentre should be preferred. Figure 3 highlights how the choice of f influences the matching results: a teapot has been slightly modified and the resulting graphs differ considerably. The graph obtained with the distance from the barycentre is a representation of the spatial distribution of the object with respect to the barycentre: even if a part of the handle has been removed, the remaining part folds on itself, generating a critical point in the Reeb function. The graph based on the integral geodesic distance does not take into account the spatial embedding; thus the broken handle of the teapot results in a maximum critical point with respect to the geodesic distance, neglecting the shape of the handle itself. Concerning the distance from the curvature extrema, the modification of the teapot handle results in a new curvature extremum, generating a new maximum critical point.
Fig. 3. Matching between the teapot and its modified version for the three ERG structures and corresponding similarity values.
In our experimental results, the ERG structure obtained from a uniform partition I of the interval [fmin, fmax] highlights that the main structure of the object is better detected through a rough subdivision, while smaller features are located when the number of sub-intervals of I increases. On the contrary, the representation provided by the geodesic distance from curvature extrema depends on the choice of the base points, which are identified using the multiresolution strategy proposed in [14]. Finally, we observe that the function f in our ERG representation is always non-negative. Then, to compare the shape features according to their relevance on the model, we adopt the following “normalized” ERG extraction: for each model the partition I is given by I = IP ∩ [fmin, fmax], where IP = {(i·fmax/m, (i+1)·fmax/m) | i = 0, . . . , m−1} ∪ {i·fmax/m | i = 0, . . . , m} is a partition of the interval [0, fmax] and m is an integer chosen by the user. Experimental results of our matching method are shown in figure 4, where the top objects retrieved by our matching algorithm for two query models (a child and a dog) are shown. Results are arranged according to their similarity value with respect to the query models, in decreasing order from left to right. For both, all the proposed ERG representations were compared: line (a) corresponds to the distance from the barycentre, line (b) to the integral geodesic distance and line (c) to the geodesic distance from curvature extrema. For each function the best match was the model itself and is not depicted. We can conclude that the graph comparison reflects the intuitive notion of similarity and groups the objects into a number of families (for instance quadrupeds, humans, pots, hands, etc.), even if some false positive results are obtained.
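The “normalized” partition used above can be sketched as follows (function names are hypothetical; the restriction step implements I = IP ∩ [fmin, fmax]):

```python
def normalized_partition(fmax, m):
    """I_P of the normalized ERG extraction: m open intervals
    (i*fmax/m, (i+1)*fmax/m) plus the m+1 cut values i*fmax/m."""
    cuts = [i * fmax / m for i in range(m + 1)]
    opens = [(cuts[i], cuts[i + 1]) for i in range(m)]
    return opens, cuts

def restrict(opens, cuts, fmin, fmax):
    """I = I_P intersected with [fmin, fmax]: clip the open intervals
    to the range and keep only the cut values inside it."""
    kept_opens = [(max(a, fmin), min(b, fmax))
                  for a, b in opens if b > fmin and a < fmax]
    kept_cuts = [c for c in cuts if fmin <= c <= fmax]
    return kept_opens, kept_cuts
```

Since fmax/m fixes the element width for every model, features are compared at the same relative scale across the whole database, which is the point of the normalization.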
Such false positives arise more frequently when the graph representation of the query model is very simple, both in terms of number of entities and of configuration: in this case the query graph is easily contained in other graph representations of the models in our database. The framework for graph matching proposed in this paper is valid not only for Extended Reeb graphs but for any graph-like representation which can be related to an M-graph. In particular, in the digital context, the component tree [9], the max-tree [19] and the topological graph proposed in [20] seem to be natural candidates. Moreover, even if the adopted matching approach is mainly based on the topological information
Fig. 4. Matching results for two query models in our database with respect to the three ERG representations: (a) distance from the barycentre, (b) integral geodesic distance and (c) distance from curvature extrema.
stored in the M-graph, as a future development we plan to consider a greater number of geometric attributes, which should improve the results obtained so far. Further improvements of the matching algorithm can also be obtained by considering the sequences of editing operations of the input and model graphs and using them for partial mapping or metamorphosis purposes.
Acknowledgements This work has been partially supported by the National Project “MACROGeo: Metodi Algoritmici e Computazionali per la Rappresentazione di Oggetti Geometrici”, FIRB grant.
References
1. M. Attene, S. Biasotti and M. Spagnuolo. Shape understanding by contour driven retiling. The Visual Computer, 19(2-3):128–137, 2003.
2. S. Biasotti, S. Marini, M. Mortara and G. Patanè. An overview on properties and efficacy of topological skeletons in shape modelling. Proc. of Shape Modelling and Applications 2003, IEEE Press, Seoul, pp. 245–254, 2003.
3. H. Bunke and K. Shearer. A graph distance metric based on the maximal common subgraph. Pattern Recognition Letters, 19:255–259, 1998.
4. S. Gold and A. Rangarajan. A Graduated Assignment Algorithm for Graph Matching. IEEE Trans. on Patt. Anal. Mach. Intell., 18(4):377–388, April 1996.
5. F. Hétroy and D. Attali. Topological Quadrangulations of Closed Triangulated Surfaces using the Reeb Graph. Graphical Models, 65:131–148, 2003.
6. M. Hilaga, Y. Shinagawa, T. Komura and T. L. Kunii. Topology Matching for Fully Automatic Similarity Estimation of 3D Shapes. ACM Computer Graphics (Proc. of SIGGRAPH 2001), Los Angeles, pp. 203–212, 2001.
7. B. Kimia, A. Tannenbaum and S. Zucker. Shapes, shocks, and deformations, I. Computer Vision, 15:189–224, 1995.
8. K. G. Koo and P. N. Suganthan. Multiple Relational Graphs Mapping Using Genetic Algorithms. Proc. of Congr. on Evolutionary Comp., pp. 727–733, 2001.
9. R. Jones. Connected Filtering and Segmentation Using Component Trees. Computer Vision and Image Understanding, 75(3):215–228, 1999.
10. H. Sundar, D. Silver, N. Gagvani and S. Dickinson. Skeleton Based Shape Matching and Retrieval. Proc. of Shape Modelling and Applications 2003, IEEE Press, Seoul, pp. 130–139, 2003.
11. F. Lazarus and A. Verroust. Level Set Diagrams of Polyhedral Objects. ACM Solid Modeling ’99, Ann Arbor, Michigan, pp. 130–140, 1999.
12. B. Luo and E. R. Hancock. Symbolic Graph Matching using the EM Algorithm and Singular Value Decomposition. Proc. of Int. Conf. on Pattern Recognition, Vol. 2, pp. 2141–2144, 2000.
13. S. Medasani, R. Krishnapuram and Y. Choi. Graph Matching by Relaxation of Fuzzy Assignments. IEEE Trans. on Fuzzy Systems, 9(1):173–182, February 2001.
14. M. Mortara, G. Patanè, M. Spagnuolo, B. Falcidieno and J. Rossignac. Blowing bubbles for multi-scale analysis and decomposition of triangle meshes. Algorithmica, Special Issue on Shape Algorithmics, Springer-Verlag, to appear.
15. B. T. Messmer and H. Bunke. A New Algorithm for Error Tolerant Subgraph Isomorphism Detection. IEEE Trans. Patt. Anal. Mach. Intell., 20(5):493–504, 1998.
16. J. Milnor. Morse Theory. Princeton University Press, New Jersey, 1963.
17. M. Mortara and G. Patanè. Shape-Covering for Skeleton Extraction. Int. J. of Shape Modelling, 8(2):245–252, 2002.
18. G. Reeb. Sur les points singuliers d’une forme de Pfaff complètement intégrable ou d’une fonction numérique. Comptes Rendus Acad. Sciences, 222:847–849, 1946.
19. P. Salembier, A. Oliveras and L. Garrido. Anti-extensive connected operators for image and sequence processing. IEEE Trans. on Image Processing, 7(4):555–570, 1998.
20. D. Shattuck and R. Leahy. Automated graph based analysis and correction of cortical volume topology. IEEE Trans. on Medical Imaging, 20(11):1167–1177, 2001.
21. K. Siddiqi, A. Shokoufandeh, S. J. Dickinson and S. W. Zucker. Shock graphs and shape matching. Proc. of 6th Int. Conf. on Computer Vision, pp. 222–229, 1998.
22. R. C. Veltkamp and M. Hagedoorn. State-of-the-Art in Shape Matching. In Principles of Visual Information Retrieval, M. Lew (Ed.), Springer-Verlag, pp. 87–119, 2000.
Contour-Based Shape Representation for Image Compression and Analysis
Ciro D’Elia¹ and Giuseppe Scarpa²
¹ University of Cassino, Department of Automation, Electromagnetism, Information Eng. and Industrial Mathematics (DAEIMI), (FR) Italy
[email protected]
http://webuser.unicas.it/delia
² University Federico II of Naples, Dept. of Electronic and Telecommunication Eng., via Claudio 21, (NA) Italy
Abstract. With the rapid growth of computing power, many concepts and tools of image analysis are becoming more and more popular in other data processing fields, such as image and video compression. Image segmentation, in particular, has a central role in the object-based video coding standard MPEG-4, as well as in various region-based coding schemes used for remote-sensing imagery. A region-based image description, however, is only useful if it has a limited representation cost, which calls for accurate and efficient tools for the description of region boundaries. A very promising approach relies on the extended boundary concept, first discussed in [6] and [7] and later used by Liow [5] to develop a contour tracing algorithm. In this work, we extend Liow’s algorithm and introduce the corresponding reconstruction technique needed for coding purposes. In addition, we define an algebraic semi-group structure that allows us to formally prove the algorithm properties, to extend it to other boundary definitions, and to introduce a fast contour tracing algorithm which only requires a raster scan of the image.
1 Introduction
Thanks to the increasing availability of computing resources, segmentation-based compression algorithms, in which the classical transform coding scheme is preceded by a segmentation preprocessing step, are becoming more and more popular. A well-known example is the MPEG-4 video coding standard [8][9][10], but segmentation-based compression is of interest for many other applications [4]. This approach guarantees two main results: on one hand, segmentation divides the image into statistically homogeneous regions, so that image encoding can be optimized locally for each region, with excellent encoding performance; on the other hand, the end user is provided with a high-quality segmentation, obtained on the original data and embedded at no cost in the output stream. This byproduct can be extremely useful in many application fields, e.g., remote sensing and biomedicine, where such “high-level” image descriptions can be used
I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 204–213, 2003. © Springer-Verlag Berlin Heidelberg 2003
Fig. 1. Boundary definitions: (a) boundary in R2, (b) trivial application of the R2 definition to the discrete space N2, (c) boundary lattice, (d) extended boundary.
for subsequent automated analysis, such as classification, object detection, automatic diagnosis, and data mining. Obviously, the quality of the segmentation map is very important for this scheme to succeed. It should be faithful to the image features, to enable an efficient coding of the image regions, but should also present smooth region boundaries, to limit the encoding cost of the map itself. Once a smooth map is obtained, a promising way to encode it is to extract the map contours and represent them efficiently. To this aim it is necessary to choose a boundary definition that allows for:
1. an efficient contour representation, i.e. a contour representation that is intrinsically suited for coding applications;
2. a memory-efficient algorithm, considering that in some applications, like remote sensing, images of 8000×8000 pixels and more are commonplace;
3. a simple contour tracing algorithm;
4. a lossless (namely, perfect) image reconstruction.
In figure 1 several definitions of boundary are illustrated: fig. 1.a shows the well-known topological definition of boundary in the continuous space R2, where a point x in a closed set R of R2 belongs to the boundary ∂R¹ if and only if each neighborhood of x contains at least one point external to R. However, since our segmentation map is defined on the discrete set N2, we have to extend this definition. Fig. 1.b shows the application of the previous definition to the discrete case N2 with the neighborhood of fig. 2. It is clear that this definition is unable to provide a common boundary between two regions: the boundary of region R is different from the boundary of the complementary region R̄, while in R2 we have ∂R ≡ ∂R̄. Therefore, this extension to discrete sets is not suitable for coding purposes, because ∂R and ∂R̄ would both have to be encoded although they carry the same information.
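The asymmetry of the naive discrete boundary (fig. 1.b) is easy to verify on a toy map. The following sketch uses a hypothetical 3×3 grid with an 8-neighborhood; it is an illustration, not part of the proposed algorithm:

```python
def discrete_boundary(region):
    """Naive discrete boundary of fig. 1.b: pixels of the region having at
    least one 8-neighbor outside it (off-grid positions count as outside)."""
    nbrs = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
            (0, 1), (1, -1), (1, 0), (1, 1)]
    return {(x, y) for (x, y) in region
            if any((x + dx, y + dy) not in region for dx, dy in nbrs)}

# Hypothetical 3x3 map: R is the centre pixel, R_bar its complement.
grid = {(x, y) for x in range(3) for y in range(3)}
R = {(1, 1)}
R_bar = grid - R
# Every pixel of R_bar touches an off-grid position, so its boundary is all
# of R_bar, while the boundary of R is just the centre pixel: the two
# boundaries differ, unlike in the continuous case where they coincide.
```

This is exactly the redundancy the extended boundary definition of fig. 1.d avoids, since it assigns a single common boundary to R and its complement.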
Furthermore, this definition contradicts the physical meaning of boundary and makes it difficult to carry out a geometrical analysis of the image. The definition of fig.1.c, where a Boolean lattice of boundaries
1 Where ∂R is the boundary of R.
206
Ciro D’Elia and Giuseppe Scarpa
between pixels is introduced, preserves the common boundary between regions but requires this supplementary lattice, which may not be memory efficient; moreover, contrarily to the definition in the continuous case2, ∂R is not included in N2. Finally, the definition of fig.1.d, based on the concept of extended boundary of the region R, preserves both the common boundary between regions and the desirable property that ∂R belongs to N2.
Fig. 2. Neighbors of point P and Freeman codes.
Fig. 3. “Map encoding” block scheme (segmentation map → contour tracer → contours → contour encoder → contour encoded stream).
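The eight Freeman codes of fig.2 can be written down as displacement vectors. A minimal illustrative sketch (an assumption of this sketch: the layout of fig.2 is 3 2 1 / 4 P 0 / 5 6 7, with x growing to the right and y growing upward):

```python
# Freeman codes of fig.2: code i maps to the displacement reaching the i-th
# 8-connected neighbor of P (x grows rightward, y grows upward).
FREEMAN = {
    0: (1, 0),  1: (1, 1),   2: (0, 1),  3: (-1, 1),
    4: (-1, 0), 5: (-1, -1), 6: (0, -1), 7: (1, -1),
}

def neighbor(p, code):
    """P_i(P): the 8-connected neighbor of p addressed by Freeman code i."""
    dx, dy = FREEMAN[code]
    return (p[0] + dx, p[1] + dy)
```

Note that opposite codes differ by 4, so following a code and then its opposite returns to the starting pixel.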
For these reasons, to develop our “Map encoding” algorithm (see fig.3), we have chosen the extended boundary definition, which meets our major requirement about the efficiency of the contour representation. Of course, we have yet to show how this boundary definition leads to a simple contour tracing algorithm and, above all, how the original segmentation map can be recovered, without loss of information, starting from the contours. In other words, we must prove that the extended boundary definition can be used to represent the map, and that the contours have the same information content as the map itself. In Section 2 we will show how to trace the image contours using the previous definition, while in Section 3 we will show how to reconstruct the image starting from the contours or the shapes.
2 Contour and Shape Tracing
In the following, we will show how to trace the contours using the extended boundary concept, how to organize them in shapes for a multi-label image2
2 Where ∂R ⊆ R2.
Contour-Based Shape Representation for Image Compression and Analysis
Fig. 4. In this sample there are two regions, dark gray R1 and light gray R2, together with their extended boundaries, depicted with ///. We can observe that the common boundary between regions R1 and R2 is preserved: for instance, pixel (A) belongs both to SA(R2) and SC(R1), while pixel (B) belongs both to SA(R2) and SB(R1).
like our segmentation map, and how to reconstruct the image starting from the shapes or contours. First of all, we need a formal definition of extended boundary. Let Pi(P) be the i-th 8-connected neighbor of pixel P, with i given by the code of fig.2, and let the following definitions hold:
1) LEFT(R) ≡ {P : P ∈ R and P4(P) ∈ \R};
2) UPPER(R) ≡ {P : P ∈ R and P2(P) ∈ \R};
3) RIGHT(R) ≡ {P : P ∈ R and P0(P) ∈ \R};
4) LOWER(R) ≡ {P : P ∈ R and P6(P) ∈ \R}.
Then, the extended boundary of the region R is the union of the sets:
SA(R) = {P : P ∈ LEFT(R) or P ∈ UPPER(R)};
SB(R) = {P6(P) : P ∈ LOWER(R) or P ∈ LEFT(R)};
SC(R) = {P0(P), P7(P) : P ∈ RIGHT(R)}.
Looking at the definition, we can observe that SA(R) contains boundary points that are all internal to R, while SB(R) and SC(R) essentially add external points3. Indeed, for example, SB(R) is composed of points that lie “under” internal points of R. More importantly, note that the points of SB(R) and SC(R), which are external boundary points for R, are internal points for the complementary region \R, and in fact belong to the set SA(\R). That is the reason why this boundary definition preserves the common boundary between regions (a sample is provided in fig.4).
2.1
2.1 Single Region Contour Tracing
To trace the boundary ∂R of a region R, it is sufficient to select a starting point x0 ∈ ∂R, and to follow the contour step by step using suitable rules (for instance those of fig.7). Indeed, during the contour tracing algorithm the boundary3
3 Note that SB(R) also contains some points already included in SA(R).
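As a concrete illustration of the sets LEFT/UPPER/RIGHT/LOWER and SA, SB, SC defined in the previous section, here is a NumPy sketch (assumptions of this sketch: the region is given as a boolean mask with the row index growing southward, so P2 is row−1 and P6 is row+1; the mask is padded by one pixel so that the external points P6(P), P0(P), P7(P) stay on the grid):

```python
import numpy as np

def extended_boundary(R):
    """Extended boundary S_A(R) | S_B(R) | S_C(R) of a boolean mask R,
    returned on a grid padded by one pixel on every side."""
    P = np.pad(np.asarray(R, dtype=bool), 1)          # False ring around R
    west  = np.roll(P, 1, axis=1)    # value of the P4 neighbor at each pixel
    east  = np.roll(P, -1, axis=1)   # P0 neighbor
    north = np.roll(P, 1, axis=0)    # P2 neighbor
    south = np.roll(P, -1, axis=0)   # P6 neighbor
    LEFT,  RIGHT = P & ~west,  P & ~east
    UPPER, LOWER = P & ~north, P & ~south
    S_A = LEFT | UPPER                                   # internal points
    S_B = np.roll(LOWER | LEFT, 1, axis=0)               # P6 of lower/left points
    S_C = np.roll(RIGHT, 1, axis=1) \
        | np.roll(np.roll(RIGHT, 1, axis=1), 1, axis=0)  # P0 and P7 of right points
    return S_A | S_B | S_C
```

By the definitions above, a single-pixel region yields a 2×2 block: the pixel itself (SA) plus its east, south and south-east neighbors (SC and SB).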
is described by a sequence of boundary steps represented with the codes of fig.2 (Freeman codes). In Section 2.2 we will show how to select the starting point x0, while here we derive the rules to follow the contours. To this end, we first define the rules for an elementary region, and then show how to combine them in order to trace the contours of more complex regions. In fig.5.a a single-pixel region is shown (in gray), together with its extended boundary, depicted with ///, and its visiting order, indicated by the vector arrows. Fig.5.c shows the same items for a two-pixel region. In fig.5.b, instead, we can see how the boundary and visiting order of the two-pixel region can be obtained by combining the visiting orders of each component pixel. In fact, by considering the visiting order of each pixel and adding (and hence erasing) overlapping vectors, we obtain exactly the result of figure 5.c. By using this composition rule, it is possible to derive the visiting order of an arbitrary region by adding one pixel at a time. Furthermore, the same rule applies not only to the so-called 4-connected boundary of fig.1.d, but also to the 6-connected boundary of fig.1.c, as is evident in figure 6. With reference to this figure, the composition rule can easily be defined in the form of an internal addition “+” operation over the set of the contours C4. To define this operation in C, for 4- or 6-connected contours, we need in turn to define the boundary cliques as the subsets of the boundary in which each pair of pixels is connected. In fig.6.a, for example, one such clique is shown with ///. We can note that for the 6-connected boundary definition the cliques comprise 4 elements, while for the 4-connected one they comprise just 2 elements. With the notion of boundary clique it is easy to define the addition of the boundaries of two disjoint regions R1 and R2.
Definition 1.
Let ∂R1 and ∂R2 be the oriented boundaries of the two generic disjoint5 regions R1 and R2, respectively: the operation (∂R1 + ∂R2) is defined as the vectorial sum of the boundary steps in each common boundary clique of R1 and R2.
To gain insight into this simple definition we can apply it to the 6-connected example of fig.6.a, where R1 and R2 are two single-pixel regions. We have already observed that ∂R1 and ∂R2 have two cliques in common, and for each one of these we have to add vectorially the “boundary steps” as depicted in fig.6.b. Note that this addition operation also allows us to define the Abelian semi-group C(C, +)6, from which we can derive the contour tracing rules of fig.7 (as presented in [5]) by applying the previous operation to the depicted configurations. In particular, to decide about the next boundary step, it is sufficient to observe the value of the current point P and of P2(P), P3(P) and P4(P) (see fig.2), because only these points can affect the clique for the next boundary step. This is also true for the 6-connected boundary definition, and hence a similar look-up table can be derived for this case.
4 This set contains the oriented contours of all the subsets of the image.
5 If R1 and R2 are not disjoint we can consider R1 − (R1 ∩ R2) instead of R1.
6 It is also interesting to note that the semi-group is generated by the singleton regions: trivially, any region, and hence any contour, can be obtained by combining single-pixel regions (singletons).
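The composition rule (“adding, and hence erasing, overlapping vectors”) can be illustrated with a small sketch. To keep it self-contained, the sketch represents a contour as a set of oriented inter-pixel edges (the boundary-lattice flavor of fig.1.c) rather than the paper's pixel-based extended boundary; the cancellation mechanism is the same: steps shared by the two boundaries in opposite directions erase each other.

```python
def pixel_boundary(x, y):
    """Counter-clockwise cycle of oriented unit edges (between lattice
    corners) around the single pixel (x, y)."""
    c = [(x, y), (x + 1, y), (x + 1, y + 1), (x, y + 1)]
    return {(c[i], c[(i + 1) % 4]) for i in range(4)}

def add_boundaries(b1, b2):
    """Composition rule for two disjoint regions: keep all steps, but erase
    every pair of overlapping steps traversed in opposite directions."""
    shared = {(p, q) for (p, q) in b1 if (q, p) in b2}
    return (b1 | b2) - shared - {(q, p) for (p, q) in shared}
```

Adding the boundaries of two horizontally adjacent pixels erases the two copies of the shared vertical edge and leaves the 6-step outline of the two-pixel region, mirroring fig.5.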
Fig. 5. Contour Tracing Composition Rule.
Fig. 6. 6-connected contour definition: (a) boundary cliques, (b) composition rule (dashed arrows are summed into continuous-line arrows).
In conclusion, using the rules of fig.7, we can move step by step along the boundary starting from a generic point; therefore, to implement the contour tracer, we can store this look-up table or else compute the next step using the C operations on P, P2(P), P3(P), P4(P). In addition, the properties of C(C, +) can also be used to implement the contour tracing of a region R with a single raster scan. In fact, during the raster scan, it is possible to update the contour of the object by adding pixels one by one. This can be convenient when dealing with a very large image that exceeds the available memory resources, to obtain an algorithm with better data locality that speeds up the contour tracing procedure, or to implement a parallel version of the contour tracing algorithm, if needed.
2.2
2.2 Multi-label Map Contour Tracing
The last problem to solve, for the contour tracing, is the selection of a suitable starting point x0 for each connected region in the map. In [5], Liow suggests selecting x0 as the first point, during the raster scan, that meets the conditions:
a) P has the selected label;
b) P2(P) ≠ P and P4(P) ≠ P (i.e., the upper and left neighbors do not belong to the same region as P).
So for each label we have just one starting point, and hence one region traced. In classification applications, however, the same label can be used for a large
Fig. 7. Contour tracing rule: the circle is the traced region, X is any other region, a small circle is a don’t-care, while the big circle is the contour point being traced. Observing the figure it is clear that, knowing the incoming direction, the outgoing direction of the current boundary point depends at most on the configuration of P, P2(P), P3(P) and P4(P), because the other points are don’t-cares.
number of non-connected regions. By selecting all the points that meet condition (b) we would end up with too many starting points, and each region could be traced twice. To solve this problem we can use two solutions, one memory efficient and the other computationally efficient. The first solution detects a starting point using condition (b) and then, while the region is traced, fills a list of forbidden starting points with the other points of the traced region that also satisfy condition (b). This solution is memory efficient but, for large images, could be quite slow, because once a candidate starting point is detected we have to search a possibly large list to see whether it is forbidden or not. The other solution, which we use, requires a boolean bitmap in which each forbidden point is marked, so that it is straightforward to see whether a candidate is forbidden or not. Up to now, the presented algorithm detects one starting point per connected region and then, using the rule of fig.7, traces the contours of each region of the map. The map is then represented by a list of region contours (shapes). In this way, however, no information is retained on the region adjacency, which could turn out to be useful both for the map compression and for subsequent applications based on image analysis. Therefore, the implemented algorithm is slightly different: when the region is traced, like in [5], the contour is cut into chains (contour segments shared by exactly two regions) connected by vertices (contour points shared by three or more regions, see fig.8). So, after the contour tracing algorithm, the image is represented by a list of vertices and a list of chains7. The shape-based image representation can be derived easily by building a list of shapes and grouping, for each shape, all the chains that belong to it. The shape-based representation is quite convenient because only one label per shape must be saved (rather than two labels per chain), and because coding together
7 As will become clear in Section 3, to have a lossless image representation it is also necessary to store the internal and external label of each chain.
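A sketch of the starting point detection with the forbidden bitmap. The tracing itself is not implemented here, so only the detected start is marked; in the full algorithm, every point of the traced region satisfying condition (b) would be marked in the bitmap during the trace:

```python
def starting_points(label_map):
    """Raster scan for candidate starting points: a pixel qualifies when its
    north (P2) and west (P4) neighbors carry a different label (or lie
    outside the image) and the pixel is not marked in the forbidden bitmap."""
    H, W = len(label_map), len(label_map[0])
    forbidden = [[False] * W for _ in range(H)]
    starts = []
    for y in range(H):
        for x in range(W):
            if forbidden[y][x]:
                continue
            lab = label_map[y][x]
            north = label_map[y - 1][x] if y > 0 else None
            west = label_map[y][x - 1] if x > 0 else None
            if north != lab and west != lab:
                starts.append((y, x, lab))
                forbidden[y][x] = True   # the trace would mark the rest here
    return starts
```

The bitmap lookup makes the forbidden test O(1) per candidate, which is the computational advantage over the list-based solution described above.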
statistically homogeneous chains is usually more efficient. In addition, with the organization in shapes, it is quite easy to build the region adjacency graph, to compute geometrical features like the region perimeter and, more in general, any feature obtained as an integral along the shape boundary. We can, finally, summarize the entire shape extraction algorithm with the following steps:
1) detection of the region starting points;
2) extraction of the contour segments of each region (like in [5]), following the rule of fig.7 (contour image representation);
3) grouping of contour segments into shapes (shape image representation);
4) building of the region adjacency graph using both the contour and the shape representation.
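The adjacency bookkeeping of step 4 is straightforward once each chain carries its internal and external label (cf. footnote 7); a minimal sketch:

```python
def region_adjacency_graph(chains):
    """chains: iterable of (internal_label, external_label) pairs, one per
    chain; two labels are adjacent when some chain separates them."""
    return {tuple(sorted(pair)) for pair in chains if pair[0] != pair[1]}
```

Each undirected adjacency edge is stored once, regardless of how many chains the two regions share.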
3 Image Reconstruction from Shapes
In this section we will describe how to recover a segmentation map from its contour or shape representation without any loss of information. First of all, we have to prove that it is indeed possible to recover the original map from its extended boundaries. As a matter of fact, by simple inspection of fig.1.d or fig.4 it is clear that the extended boundary does not by itself contain all the information needed to recover the image, because it is not possible to tell whether a point is internal or external to the region: observing, for instance, the light gray region (R2) of fig.4, we can see that some boundary pixels are internal and some others are external, so it is not possible to reconstruct the boundary label values. This is the main problem to solve, because once the right label has been assigned to the boundary points it is quite simple to reconstruct the whole image by using, for instance, a flood fill procedure. Even if the extended boundary itself does not contain the required information, by observing fig.7 we can see that, except for the case of fig.7.a8, the oriented extended boundary tells us directly whether a point is internal or not. For instance, in the case of fig.7.b, we can say that the current point is external if we enter and leave both from the left, while in the case of fig.7.c, we can say that it is internal if we enter from the left and leave on the bottom. Hence, the reconstruction procedure can be summarized as follows:
1) reconstruction of the boundaries chain by chain, i.e. painting the boundaries with the original labels (wire-frame image);
2) reconstruction of the internal points (original map).
It is worth underlining that we can exploit the extended boundary properties also to implement the reconstruction of internal points. Indeed, moving from left to right and top to bottom of the image, the label changes only
8 This case has to be treated in a special way; anyway, it is possible to reconstruct the right value in this case too.
Fig. 8. Vertex conditions: (a) vertex among 4 regions, (b) vertex among 3 regions.
on the boundary points, and the internal points to the right of a boundary have the same label as that boundary. This property makes it possible to implement step (2) of the reconstruction algorithm during the raster scan, when the internal points are filled with the value of the current brush, and the brush changes value on the boundaries. The procedure described above reconstructs the image from a representation in terms of vertices, chains, and their internal and external labels. To improve the encoding efficiency, of particular importance in compression applications, we can switch to a representation of the image in terms of shapes, so that just one label per shape is needed rather than two labels per chain of the shape. On the other hand, the reconstruction procedure from the shape representation is more complex, because we only know the internal label of the chains of each shape, and the missing information (the external label) must be recovered in some way. This can be accomplished, however, by observing that the external points are internal to other shapes. There are two possible cases, because an external point of a shape belongs either to an adjacent shape or to a surrounding shape. In the first case, the boundary label is painted with the right value when we reconstruct the adjacent shape, while in the case of a surrounding shape we have to retrieve its label in some other way. This can be accomplished during step (2), using a stack of brushes for each image line instead of a single brush. Indeed, in each line we can build, during the painting step, the stack of the labels of the surrounding regions in such a way that it is extremely easy to access the needed information.
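A minimal sketch of step (2) for the chain representation, in the single-brush version (the stack of brushes needed for surrounding shapes is omitted). The wire-frame is assumed given as rows where boundary pixels already carry their label and interior pixels are None:

```python
def fill_interior(wireframe):
    """Raster fill: scanning each row left to right, the brush changes value
    on boundary pixels and paints the interior pixels that follow it."""
    filled = []
    for row in wireframe:
        brush, out = None, []
        for v in row:
            if v is not None:
                brush = v          # the brush changes value on the boundary
            out.append(brush)
        filled.append(out)
    return filled
```

This works because, as noted above, the internal points to the right of a boundary pixel carry that boundary's label.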
4 Conclusions
The proposed representation is based on the extended boundary concept, first discussed in [6] and [7] and then used for a contour tracing algorithm in [5]. This work extends the contour tracing algorithm proposed in [5] and introduces a reconstruction algorithm needed for coding purposes. Furthermore, an algebraic semi-group structure has been defined that allows the formal demonstration of the algorithm of [5], its extension to other definitions of boundaries, and the introduction of a raster contour tracing algorithm.
Acknowledgements
The authors are grateful to Prof. Francesco Tortorella for his valuable suggestions and encouragement.
References
1. G. Gelli, G. Poggi: “Compression of multispectral images by spectral classification and transform coding”, IEEE Transactions on Image Processing, vol. 8, no. 4, pp. 476–489, April 1999.
2. G. Gelli, G. Poggi, A.R.P. Ragozini: “Multispectral-image compression based on tree-structured Markov random field segmentation and transform coding”, Proc. IEEE IGARSS ’99, vol. 2, pp. 1167–1170, 1999.
3. C. D’Elia, G. Poggi, G. Scarpa: “An adaptive MRF model for boundary-preserving segmentation of multispectral images”, Proc. EUSIPCO 2002, September 2002.
4. C. D’Elia, G. Poggi, G. Scarpa: “Advances in segmentation and compression of multispectral images”, Proc. IEEE IGARSS ’01, vol. *, pp. *, Sydney, July 2001.
5. Y.-T. Liow: “A contour tracing algorithm that preserves common boundaries between regions”, CVGIP: Image Understanding, vol. 53, no. 3, pp. 313–321, May 1991.
6. H.F. Feng, T. Pavlidis: “The generation of polygonal outlines of objects from gray level pictures”, IEEE Transactions on Circuits and Systems, CAS-22, pp. 427–439, 1975.
7. T. Pavlidis: “Structural Pattern Recognition”, Springer-Verlag, Berlin, New York, 1977.
8. A. Puri, T. Chen: “Multimedia Systems, Standards, and Networks”, Signal Processing and Communications Series, Marcel Dekker Inc., March 2000.
9. MPEG-4 Systems Group: “Coding of audio-visual objects: video”, ISO/IEC JTC1/SC29/WG11 N2202, March 1998.
10. A.K. Katsaggelos, L.P. Kondi, F.W. Meier, J. Ostermann, G.M. Schuster: “MPEG-4 and rate-distortion-based shape-coding techniques”, Proc. IEEE, vol. 86, pp. 1029–1051, 1998.
Systematized Calculation of Optimal Coefficients of 3-D Chamfer Norms
Céline Fouard1,2,3 and Grégoire Malandain1
1 Epidaure Research Project, INRIA Sophia Antipolis, France
2 TGS Europe SA, PA Kennedy 1 - BP 227, F-33708 Merignac Cedex
3 INSERM U455, Toulouse
Abstract. Chamfer distances are widely used in image analysis, and many ways have been investigated to compute optimal chamfer mask coefficients. Unfortunately, these methods are not systematized: they have to be conducted manually for every mask size or image anisotropy. Since image acquisition (e.g. medical imaging) can lead to anisotropic discrete grids with unpredictable anisotropy values, automated calculation of chamfer mask coefficients becomes mandatory for efficient distance map computation. This article presents a systematized calculation of these coefficients, based on the automatic construction of chamfer masks of any size, associated with a triangulation that allows the relative error with respect to the Euclidean distance to be derived analytically, in any 3-D anisotropic lattice.
Keywords: chamfer distance, anisotropic lattice.
1 Introduction
Distance transformations (DTs) are widely used in image analysis since they allow morphometric features of a binary shape to be recovered. Among other applications, they can be applied to skeleton computation [1], Voronoï diagram construction, or shape-based interpolation [2]. A distance transformation turns a binary image into a grey level image where the value of each foreground pixel corresponds to its shortest distance to the background. Brute-force computation of DTs is not compatible with expected image analysis requirements, so DTs are usually computed by propagation. Exact Euclidean maps can be computed through Euclidean Distance Transformations (EDTs). Several EDTs have been proposed, using morphological operators [3,4], filters [5], several passes on rows and columns [6], or propagating vectors [7,8], but they lead to time and/or memory consuming algorithms. A good trade-off between precision and computational cost for DTs is achieved by chamfer maps, made popular by Borgefors [9]. These maps are computed through two raster scans on the image that propagate the distance values by means of chamfer masks. The coefficients of the mask are (proportional) estimates of short-range distances: the larger the chamfer mask, the closer to the Euclidean map the chamfer map will be. The calculation of optimal coefficients can be done by minimizing either an absolute error [10] or a relative one [11]. It has first been done for 2-D 3 × 3 masks [10]
I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 214–223, 2003. © Springer-Verlag Berlin Heidelberg 2003
in isotropic lattices, then extended to larger masks [9,11] and to higher dimensions [12]. Anisotropic lattices have also been considered [13,14,15]. However, those calculations remain tedious and are not systematized: thus they have to be conducted manually for every mask size or anisotropy value. Our motivation is the computation of DTs in 3-D medical images: they are usually acquired on anisotropic lattices (the slice thickness is usually larger than the pixel size) and this anisotropy may vary from one acquisition to the other. The efficient computation of chamfer maps then requires the calculation of the chamfer mask coefficients to be automated. This article presents a systematized calculation of these coefficients for any mask size and any anisotropy value. In addition to classical error criteria, we also consider norm constraints [16] that guarantee predictable results. Our approach is based on the automatic construction of chamfer masks of any size, associated with a triangulation that allows the relative error with respect to the Euclidean distance to be derived analytically. In the following, we first recall some basic definitions. Then we describe error estimation and norm constraints. Some results (coefficients of isotropic 7³ and anisotropic 3³ masks) are given before we conclude.
2 Definitions and Notations
We recall here some notations and definitions. We consider the discrete space E = Z3. An image I is an application defined on E. A discrete distance is an application d : E × E −→ N that verifies, for all p, q, r ∈ E, the following 4 properties:
d(p, q) ≥ 0,   d(p, q) = 0 ⇐⇒ p = q,   d(p, q) = d(q, p),   and   d(p, q) ≤ d(p, r) + d(r, q).
Given a discrete distance d, the application n : E −→ N is a discrete norm if and only if ∀p ∈ E, n(λp) = |λ|·n(p) ∀λ ∈ Z. Let us consider a binary image I with foreground X and background X̄. The distance map DX is an application defined on E such that DX(p) = inf q∈X̄ d(p, q). Distance maps can be approximated by chamfer maps, which can be computed with a two-pass (so-called forward and backward passes) algorithm [17]. To do so, we need to define the chamfer mask, which is a set MC = {(vi, ωi), 1 ≤ i ≤ m} of weighted vectors representing authorized displacements. It is centered in O, symmetrical with respect to its center, and contains at least a base of E. Given a chamfer mask MC and two points p, q ∈ E, we define a path Ppq from p to q as a sequence of vectors vi ∈ MC so that the vector pq is expressed as a linear combination of mask vectors: pq = Σi ni·vi with ni ∈ N. The cost W of a path Ppq is defined as W(Ppq) = Σi ni·ωi. The chamfer distance between two points p, q ∈ E is the minimal possible cost, i.e. dC(p, q) = minPpq W(Ppq).
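As an illustration of the two-pass scheme, a sketch in 2-D with the classical 3×3 mask and the 3-4 weights of [9] (the 3-D anisotropic masks discussed in this paper work the same way, with more offsets per pass):

```python
INF = 10 ** 9

def chamfer_map(binary, w1=3, w2=4):
    """Two-pass 3x3 chamfer distance transform: (scaled) distance of each
    foreground pixel of `binary` to the nearest background pixel."""
    H, W = len(binary), len(binary[0])
    d = [[INF if binary[y][x] else 0 for x in range(W)] for y in range(H)]
    fwd = [(-1, -1, w2), (-1, 0, w1), (-1, 1, w2), (0, -1, w1)]
    bwd = [(1, 1, w2), (1, 0, w1), (1, -1, w2), (0, 1, w1)]
    for y in range(H):                      # forward raster scan
        for x in range(W):
            for dy, dx, w in fwd:
                ny, nx = y + dy, x + dx
                if 0 <= ny < H and 0 <= nx < W:
                    d[y][x] = min(d[y][x], d[ny][nx] + w)
    for y in range(H - 1, -1, -1):          # backward raster scan
        for x in range(W - 1, -1, -1):
            for dy, dx, w in bwd:
                ny, nx = y + dy, x + dx
                if 0 <= ny < H and 0 <= nx < W:
                    d[y][x] = min(d[y][x], d[ny][nx] + w)
    return d
```

The 3-4 weights approximate the Euclidean distances 1 and √2 up to the common scale factor 3, the ε of the next section.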
3 Computing Optimal Coefficients for Chamfer Norms
Calculating the optimal weights for a given chamfer mask is usually achieved by minimizing the error (either absolute or relative) between the chamfer distance
and the Euclidean one. Thanks to symmetry considerations, we can consider only the mask generator MgC, i.e. the part of the anisotropic chamfer mask MC that is included in the first eighth of the space, (1/8)Z3, delimited by the half-lines (O, x), (O, y) and (O, z). Moreover, the vectors are chosen so that ∀i, j with i ≠ j, there is no n ∈ N such that vi = n·vj. Estimating the error between a chamfer distance and the Euclidean one is quite awkward when dealing with large masks. This difficulty can be reduced if we are able to triangulate the mask generator MgC into regular cones. A continuous cone, defined by a triplet of vectors and denoted by ⟨vi, vj, vk⟩, represents the region of R3 delimited by the vectors vi, vj and vk, i.e.
⟨vi, vj, vk⟩ = {M ∈ R3 | OM = λi·vi + λj·vj + λk·vk, λi, λj, λk ∈ R+}.
A discrete cone is the set of points of Z3 that are included in the continuous cone ⟨vi, vj, vk⟩. A regular cone is a discrete cone that verifies ∆i,j,k = ±1, where ∆i,j,k is the determinant of the matrix |vi vj vk| (the first column is vector vi, etc.). Regular cones have the interesting property that any point of the cone can be expressed with integer coordinates in the basis of the 3 vectors defining the cone, i.e.
{M ∈ Z3 | OM = λi·vi + λj·vj + λk·vk, λi, λj, λk ∈ N},
which only holds for regular cones [18]. Having a mask generator that can be triangulated into regular cones allows us to reduce the calculation of the error to independent calculations in each regular cone. In the following, we will only deal with such mask generators. To ensure that they can be triangulated into regular cones, we build them with the Farey triangulation [16]. This technique allows us to recursively (and automatically) build large mask generators MgC with their associated regular triangulation TCg = {⟨vi, vj, vk⟩} (see appendix A).
3.1
3.1 Error Definition and Calculation
We have chosen to minimize the relative error between the chamfer distance and the Euclidean one, computed on planes x = cste, or y = cste, or z = cste. Let us consider a point P = (x, y, z). According to the above section, we know that its chamfer distance to the origin O, denoted dC(P), is a linear combination of the weights of the 3 vectors defining the regular cone it belongs to. We then have dC(P) = a·ωi + b·ωj + c·ωk with OP = a·vi + b·vj + c·vk. Solving the latter expression yields (recall that ∆i,j,k = ±1 for regular cones)
a = (1/∆i,j,k) · det |x xj xk; y yj yk; z zj zk|,
b = (1/∆i,j,k) · det |xi x xk; yi y yk; zi z zk|,
c = (1/∆i,j,k) · det |xi xj x; yi yj y; zi zj z|
(rows separated by semicolons), and allows us to obtain dC(x, y, z) = α·x + β·y + γ·z with
α = (yj·zk − yk·zj)·ωi + (yk·zi − yi·zk)·ωj + (yi·zj − yj·zi)·ωk
β = (zj·xk − zk·xj)·ωi + (zk·xi − zi·xk)·ωj + (zi·xj − zj·xi)·ωk
γ = (xj·yk − xk·yj)·ωi + (xk·yi − xi·yk)·ωj + (xi·yj − xj·yi)·ωk
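A numerical sketch of these formulas, using matrix inversion rather than the explicit cofactors (the 3-4-5 weights used in the check below are only an illustrative assumption):

```python
import numpy as np

def cone_plane_coeffs(vi, vj, vk, wi, wj, wk):
    """(alpha, beta, gamma) such that d_C(x, y, z) = alpha*x + beta*y + gamma*z
    inside the regular cone <vi, vj, vk>: with (a, b, c) = V^{-1} P and
    d_C(P) = a*wi + b*wj + c*wk, the coefficient row is (wi, wj, wk) V^{-1}."""
    V = np.column_stack([vi, vj, vk]).astype(float)
    assert round(abs(np.linalg.det(V))) == 1          # regular cone
    return np.array([wi, wj, wk], dtype=float) @ np.linalg.inv(V)
```

For the cone ⟨(1,0,0), (1,1,0), (1,1,1)⟩ with weights (3, 4, 5) this gives (α, β, γ) = (3, 1, 1), so e.g. dC(3, 2, 1) = 12, which indeed equals 1·3 + 1·4 + 1·5.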
Chamfer distances are usually computed with integer weights, and have to be scaled by a real factor ε (typically the displacement associated with the smallest voxel size) to be compared to the Euclidean distance dE(x, y, z) = sqrt(dx²·x² + dy²·y² + dz²·z²), where dx, dy and dz denote the voxel size in the x, y, and z directions. The relative error to minimize is then defined by
Erelative(x, y, z) = (dC/ε − dE)/dE = (1/ε) · (α·x + β·y + γ·z) / sqrt(dx²·x² + dy²·y² + dz²·z²) − 1.   (1)
Depending on the orientation of the cone, this error has to be minimized on either the plane x = M, or y = M, or z = M. Without loss of generality, we will only go into details for the case x = M, M ≠ 0. The error then has to be estimated on the triangle defined as the intersection between the cone ⟨vi, vj, vk⟩ and the plane x = M. The vertices of this triangle are the points Vl = (M, M·yl/xl, M·zl/xl), l = i, j, k (in the following, the in-plane coordinates of Vl are still denoted yl and zl). In the plane x = M, the relative error can be rewritten
Ex(y′, z′) = (1/ε) · (α + β·y′ + γ·z′) / sqrt(dx² + dy²·y′² + dz²·z′²) − 1,  with y′ = y/M and z′ = z/M.   (2)
Ex is continuous on a closed and bounded set (the triangle ViVjVk); it is therefore bounded and reaches its bounds. Its extrema can be located either inside the triangle, or on the edges of the triangle, or at the vertices of the triangle. Let us consider the three cases.
1. The extremum is inside the triangle. By derivating equation 2, it comes that the extremum is located at (ymax, zmax) = (β·dx²/(α·dy²), γ·dx²/(α·dz²)). If this point is inside the triangle ViVjVk, it yields the extreme value
Ex(ViVjVk) = (1/ε) · sqrt(α²/dx² + β²/dy² + γ²/dz²) − 1.
2. The extremum is on an edge. There are three edges, but we will only present the calculation for ViVj. In this case, a point M belonging to the edge can be represented by M = a·Vi + (1 − a)·Vj, yielding the relative error along the edge
Ex(a) = (1/ε) · ((β·Y + γ·Z)·a + (α + β·yj + γ·zj)) / sqrt((dy²·Y² + dz²·Z²)·a² + 2·(dy²·yj·Y + dz²·zj·Z)·a + dx² + dy²·yj² + dz²·zj²) − 1,   (3)
with Y = yi − yj and Z = zi − zj. After derivation, it can be shown that the extreme value is reached for
amax = − ((β(yjZ − zjY) + αZ)·zj·dz² + (γ(zjY − yjZ) + αY)·yj·dy² − (βY + γZ)·dx²) / ((β(yjZ − zjY) + αZ)·Z·dz² + (γ(zjY − yjZ) + αY)·Y·dy²).
If 0 ≤ amax ≤ 1, the extreme value Ex(ViVj) is given by Ex(amax), whose form is not simple enough to be displayed here.
3. The extremum is reached at one of the triangle's vertices Vl, where the relative error value is given by
Ex(Vl) = (1/ε) · ωl/||vl||R − 1,  with ||vl||R = sqrt(xl²·dx² + yl²·dy² + zl²·dz²).
Thus we are now able to compute both the minimum and maximum relative errors, τmin and τmax (recall that they depend on ε), for a mask generator MgC by
τmin = min over ⟨vi, vj, vk⟩ ∈ TCg of { Eu(ViVjVk), min over l,m of Eu(VlVm), min over l of Eu(Vl) },
τmax = max over ⟨vi, vj, vk⟩ ∈ TCg of { Eu(ViVjVk), max over l,m of Eu(VlVm), max over l of Eu(Vl) },
u denoting the plane (x, y, or z) where the error is minimized; Eu(ViVjVk) and Eu(VlVm) are estimated only if the corresponding extremum lies in the correct interval. The global relative error is defined by τ(ε) = max(|τmin(ε)|, τmax(ε)). Since τmin(ε) < 0 and τmax(ε) > 0, we can make them equal in absolute value by changing the value of ε into εopt = ε·((τmin + τmax)/2 + 1) [19]. We obtain the optimal relative error τopt by τopt = −τmin(εopt) = τmax(εopt).
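The rescaling step can be sketched directly from the formula: each extreme error has the form E = r/ε − 1, so changing ε into ε·k maps an error τ to (τ + 1)/k − 1.

```python
def balance_errors(tau_min, tau_max, eps=1.0):
    """Rescale eps so that the extreme relative errors become equal in
    absolute value: eps_opt = eps * ((tau_min + tau_max) / 2 + 1)."""
    k = (tau_min + tau_max) / 2.0 + 1.0
    eps_opt = eps * k
    rescaled = lambda tau: (tau + 1.0) / k - 1.0
    return eps_opt, rescaled(tau_min), rescaled(tau_max)
```

After the rescaling, the two returned errors are opposite numbers, and their common magnitude is the optimal relative error τopt.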
3.2 Norm Constraints
It can be shown that the chamfer distance dC induced by any chamfer mask MC is a discrete distance [20]. However, a distance that is not a norm is not invariant by homothety, and this may not be desirable (for instance when comparing skeletons of the same object at different scales). Therefore, we introduce additional criteria to ensure that the computed weights will define a discrete norm. A distance is a norm if and only if its ball is convex, symmetric, and homogeneous. For chamfer masks, symmetry is achieved by construction, homogeneity is due to the regular triangulation (also obtained by construction), while convexity can be assessed on the equivalent rational ball of the chamfer mask [16]. Given a chamfer mask MC = {(vi ∈ Z3, ωi ∈ N)}, its equivalent rational mask is defined by MQC = {(vi/ωi ∈ Q3, 1)}. The polyhedron defined by this equivalent rational mask is the equivalent rational ball (see figure 1).
Fig. 1. Equivalent rational ball of a 3D 5×5×5 isotropic chamfer mask.
Fig. 2. Notations (O, P, Q, R, S) for equation 4.
To check the convexity of the ball, we have to check whether the ball is convex at each of its edges [16]: each
edge must be “turned to the outside” of the ball. It turns out that we only have to check a local convexity criterion (LCC) at each edge of the equivalent rational ball. Given 2 faces (P, Q, S) and (Q, R, S) of a triangulation sharing edge (Q, S), the LCC can be expressed as xQ 1 y . Q LCC(P, Q, R, S) = ωP .ωQ .ωR .ωS zQ ωQ
4
xR yR zR ωR
xS yS zS ωS
xP yP ≥ 0. zP ωP
(4)
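Equation (4) can be evaluated with exact integer arithmetic, since only the sign of the determinant matters (the weights are positive). The sketch below is our illustration, not the authors' code: it tests the LCC across the border edge (a, b) of the 3^3 isotropic triangulation, taking the fourth point as the symmetric image of c across the plane z = 0, first with the classic 3-4-5 weighting and then with a weighting that breaks convexity.

```python
def det(m):
    # exact integer determinant by Laplace expansion along the first row
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] *
               det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def lcc_sign(P, Q, R, S):
    """Sign of LCC(P, Q, R, S) from equation (4); each argument is a pair
    (vector, weight). The 1/(wP.wQ.wR.wS) factor is positive, so only the
    determinant sign matters."""
    cols = [Q, R, S, P]                  # column order of equation (4)
    m = [[v[i] for v, w in cols] for i in range(3)]
    m.append([w for v, w in cols])       # last row holds the weights
    return det(m)

a, b, c = (1, 0, 0), (1, 1, 0), (1, 1, 1)
c_sym = (1, 1, -1)   # symmetric fourth point for the border edge (a, b)

# 3-4-5 weighting: the ball is locally convex across edge (a, b)
ok = lcc_sign((c, 5), (a, 3), (c_sym, 5), (b, 4))
# raising the (1,1,0) weight to 6 violates the criterion at this edge
bad = lcc_sign((c, 5), (a, 3), (c_sym, 5), (b, 6))
```

The helper `det` and the weighting examples are our own; the paper only states the criterion itself.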
4   Automatic Calculation of Chamfer Mask Coefficients
The computation of optimal coefficients for a mask of size (2n + 1)^3 is done in three steps: generation of the Farey triangulation, generation of the norm constraints, and iterative computation of the optimal sets of weights.

4.1   Building the Farey Triangulation
The recursive automated construction of the Farey triangulation of order n is described in appendix A. This triangulation TgC corresponds to the isotropic chamfer mask generator MgC. When dealing with an anisotropic lattice, one has to add extra vectors to the mask generator and extra cones to the triangulation. This is achieved by symmetry considerations. For instance, for a 3^3 mask, if the voxel size dz along z is different from the sizes dx and dy along x and y, we have to consider in the mask generator, in addition to the vectors {(1,0,0), (1,1,0), (1,1,1)}, the two extra vectors {(0,0,1), (1,0,1)} that correspond to weights induced by the anisotropy. These extra vectors belong to the two extra cones,
⟨(1,0,0), (1,0,1), (1,1,1)⟩ and ⟨(0,0,1), (1,0,1), (1,1,1)⟩, that are to be considered for the error computation and the local convexity constraints.

4.2   Generating Convexity Criteria
The triangulation TgC has been built as described above. It allows us to generate all the local convexity constraints (equation 4) that are to be verified. They have to be generated for every edge inside the mask generator, but also for the edges at the border of the mask generator. For the latter, the fourth point (see figure 2) is derived from symmetry considerations. Notice that each generated LCC depends on 4 weights ωi.

4.3   Finding the Optimal Coefficients
This is the tough part. We have to examine the m-tuples (ω1 . . . ωm) of weights corresponding to the chamfer mask generator MgC = {vi, 1 ≤ i ≤ m} and find the ones that yield the minimal error. These sets of optimal coefficients are searched for by a brute-force method. However, we try to reduce this computationally expensive search by throwing away
220
C´eline Fouard and Gr´egoire Malandain
m-tuples (ω1 . . . ωm) as soon as a part of them does not satisfy the local convexity constraints (as sketched by the recursive algorithm below¹).

 1: procedure Test(n)
 2:   if some LCCs can be verified with (ω1, . . . , ωn) then
 3:     test these LCCs and return if one of them is not verified
 4:   if n equals m then {All ωi are set.}
 5:     compute the error τopt
 6:     if this τopt is smaller than the previous one then
 7:       (ω1, . . . , ωm) is an optimal set of coefficients
 8:     return
 9:   for ωn+1 from ω1·||vn+1||∞ to ω1·||vn+1||1 do {Iteratively set a value to ωn+1.}
10:     Test(n + 1)
11: {Main Program}
12: for ω1 from 1 to some user-provided value do
13:   Test(1)
ω1, the coefficient corresponding to the direction of smallest voxel size, varies from 1 to some maximal value provided by the user, while the other coefficients are searched in the interval [ω1·||vi||∞, ω1·||vi||1]. Error computation is only performed on coefficient sets that verify all the local convexity constraints. As a result, this algorithm gives all the optimal m-tuples in lexicographical order.

Table 1. 3 × 3 × 3 chamfer mask coefficients for an anisotropic grid.

aX  aY  aZ  bYZ  bXZ  bXY   c    εopt    τopt (%)
 1   2   3    3    3    2    3    1.257   25.66
 1   2   3    4    4    2    4    1.238   23.79
 2   3   6    6    6    3    6    2.370   18.49
 2   3   6    7    6    4    7    2.353   17.65
 2   3   6    7    7    4    7    2.302   15.09
 4   6  12   13   12    7   13    4.592   14.81
 4   6  12   13   13    7   14    4.584   14.60
 4   6  12   14   13    7   14    4.581   14.52
 5   8  15   17   16    9   17    5.703   14.06
 6   9  18   20   19   11   21    6.834   13.90
 6   9  18   21   19   11   21    6.815   13.59
10  15  30   34   32   18   35   11.343   13.43

5   Results
Table 1 presents optimal sets of weights of a 3 × 3 × 3 chamfer mask for an anisotropic grid with dx = 1, dy = 1.5, dz = 3.0. The points belonging to this mask are: aX(1, 0, 0), aY(0, 1, 0), aZ(0, 0, 1), bYZ(0, 1, 1), bXZ(1, 0, 1), bXY(1, 1, 0), and c(1, 1, 1). The time needed to compute these sets is 958 ms.

¹ Java code is available from http://www-sop.inria.fr/epidaure/personnel/Celine.Fouard/.
Table 2 presents optimal sets of weights and the associated maximum relative error for 7 × 7 × 7 isotropic chamfer masks. The points belonging to this mask are: a(1, 0, 0), b(1, 1, 0), c(1, 1, 1), d(2, 1, 0), e(2, 1, 1), f(2, 2, 1), g(3, 1, 0), h(3, 1, 1), i(3, 2, 0), j(3, 2, 1), k(3, 2, 2), l(3, 3, 1), m(3, 3, 2). The computational times needed to examine all the m-tuples with ω1 less than or equal to 5, 7, 10, and 14 are respectively 2 min, 25 min, 6 h 37 min, and 102 h.

Table 2. 7 × 7 × 7 chamfer mask coefficients.

 a   b   c   d   e   f   g   h   i   j   k   l   m    εopt    τopt (%)
 1   1   1   2   2   2   3   3   3   3   3   3   3    1.211   21.13
 1   2   2   3   3   4   4   4   5   5   5   6   6    1.207   20.71
 2   2   3   4   4   5   6   6   6   6   7   7   8    2.293   14.64
 2   3   3   5   5   6   7   7   8   8   8   9   9    2.252   12.60
 2   3   4   5   6   7   7   8   8   9  10  10  11    2.225   11.24
 3   4   5   6   7   9   9   9  10  11  12  13  14    3.158    5.28
 4   6   7   9  10  13  13  14  15  16  17  19  20    4.179    4.49
 5   7   9  11  12  15  16  16  18  19  21  22  24    5.186    3.72
 5   7   9  11  12  15  16  17  18  19  21  22  24    5.149    2.97
 7  10  12  16  17  21  22  23  26  27  29  31  33    7.176    2.51
 8  11  14  18  19  24  25  26  29  30  33  34  38    8.184    2.30
10  14  17  22  24  30  32  33  36  37  41  43  47   10.224    2.24
12  17  21  27  29  36  38  40  44  45  49  52  56   12.245    2.04
14  20  24  31  34  43  44  46  51  53  58  62  67   14.248    1.77

6   Conclusion
We have proposed an automated approach to compute optimal chamfer norm coefficients for masks of any size and for lattices of any anisotropy. It is based on the Farey triangulation, which permits us to recursively build large masks while ensuring a regular triangulation of the chamfer mask generators. It allows us to automatically compute the error of any mask, thanks to analytical expressions of the errors that we can derive on regular cones. In addition, the coefficients we calculate verify norm constraints, thus yielding scale-invariant chamfer maps.
References

1. C.J. Pudney. Distance-ordered homotopic thinning: A skeletonization algorithm for 3D digital images. CVIU, 72(3):404–413, 1998.
2. G.T. Herman, J. Zheng, and C.A. Bucholtz. Shape-based interpolation. IEEE Computer Graphics & Applications, pages 69–79, 1992.
3. F.Y. Shih and O.R. Mitchell. A mathematical morphology approach to Euclidean distance transformation. IEEE Trans. on Image Processing, 1(2):197–204, 1992.
4. C.T. Huang and O.R. Mitchell. A Euclidean distance transform using grayscale morphology decomposition. IEEE Trans. on PAMI, 16(4):443–448, 1994.
5. T. Saito and J.I. Toriwaki. New algorithms for Euclidean distance transformation of an n-dimensional digitized picture with applications. Pattern Recognition, 27(11):1551–1565, 1994.
6. T. Hirata. A unified linear-time algorithm for computing distance maps. Information Processing Letters, 58:129–133, 1996.
7. P.E. Danielsson. Euclidean distance mapping. CGIP, 14:227–248, 1980.
8. I. Ragnemalm. The Euclidean distance transform in arbitrary dimensions. PRL, 14(11):883–888, 1993.
9. G. Borgefors. Distance transformations in digital images. CVGIP, 34(3):344–371, 1986.
10. G. Borgefors. Distance transformations in arbitrary dimensions. CVGIP, 27:321–345, 1984.
11. B.J.H. Verwer. Local distances for distance transformations in two and three dimensions. PRL, 12:671–682, 1991.
12. G. Borgefors. On digital distance transforms in three dimensions. CVIU, 64(3):368–376, 1996.
13. D. Coquin and Ph. Bolon. Discrete distance operator on rectangular grids. PRL, 16:911–923, 1995.
14. J.F. Mangin, I. Bloch, J. López-Krahe, and V. Frouin. Chamfer distances in anisotropic 3D images. In VII European Signal Processing Conference, Edinburgh, UK, 1994.
15. I.M. Sintorn and G. Borgefors. Weighted distance transforms for images using elongated voxel grids. In Proceedings of DGCI, pages 244–254, 2002. LNCS 2301.
16. E. Remy. Optimizing 3D chamfer masks with norm constraints. In IWCIA, pages 39–56, July 2000.
17. A. Rosenfeld and J.L. Pfaltz. Sequential operations in digital picture processing. JACM, 13(4):471–494, 1966.
18. G.H. Hardy and E.M. Wright. An Introduction to the Theory of Numbers. Oxford University Press, 1978.
19. E. Thiel. Les distances de chanfrein en analyse d'images : fondements et applications. PhD thesis, Université Joseph Fourier, 1994.
20. E. Remy. Normes de chanfrein et axe médian dans le volume discret. PhD thesis, Université de la Méditerranée, Marseille, France, 2001.
A   Recursive Farey Triangulation Construction

A.1   Farey Set Points
A Farey set F^n of order n is the set of all irreducible points (y/x, z/x) in (Q ∩ [0, 1])^2 whose denominator x does not exceed n. It is built only with visible points (meaning that the greatest common divisor of (x, y, z) is 1). A Farey set of order n corresponds to the vectors of the generator of a 3-D chamfer mask of size (2n + 1)^3. For example, the ordered (lexicographical order) Farey set of order 1, F^1 = {(0/1, 0/1), (1/1, 0/1), (1/1, 1/1)}, corresponds to the set of vectors {(1, 0, 0), (1, 1, 0), (1, 1, 1)}, which is the generator of an isotropic chamfer mask of size 3^3. Other vectors involved in an anisotropic chamfer mask are deduced from the previous ones by symmetries.
The Farey set of order n + 1, F^{n+1}, can be built from F^n by

F^{n+1} = F^n ∪ { (y/x, z/x) + (y'/x', z'/x') : (y/x, z/x), (y'/x', z'/x') ∈ F^n and x + x' ≤ n + 1 },

the addition being defined by (y/x, z/x) + (y'/x', z'/x') = ((y + y')/(x + x'), (z + z')/(x + x')) [18].
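The mediant construction can be checked mechanically. In the sketch below (ours, not the paper's code) a Farey point is stored as the visible integer triple (x, y, z) standing for the pair of fractions (y/x, z/x); the example rebuilds the order-2 set from F^1.

```python
from math import gcd

# F^1, stored as visible triples (x, y, z) for the fractions (y/x, z/x)
F1 = {(1, 0, 0), (1, 1, 0), (1, 1, 1)}

def next_farey(F, n):
    """Build F^{n+1} from F^n with the mediant addition of the appendix."""
    out = set(F)
    pts = list(F)
    for p in pts:
        for q in pts:
            s = (p[0] + q[0], p[1] + q[1], p[2] + q[2])   # mediant
            # keep only visible points whose denominator stays within n + 1
            if s[0] <= n + 1 and gcd(gcd(s[0], s[1]), s[2]) == 1:
                out.add(s)
    return out
```

The new triples produced from F^1 are (2, 1, 0), (2, 1, 1) and (2, 2, 1), i.e. exactly the extra vectors of the 5^3 generator shown in Figure 3.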
Fig. 3. Construction of T2 from T1.
Fig. 4. T3 and T4.
A.2   Recursive Construction of Farey Set Triangulations
The triangulation T1 associated to F^1 is composed of a single cone ⟨(1,0,0), (1,1,0), (1,1,1)⟩, or equivalently a Farey triangle ⟨(0/1, 0/1), (1/1, 0/1), (1/1, 1/1)⟩, which is regular. To build Tn+1 from Tn, we first put all the Farey triangles in a list L. We then examine the triangles in L successively, and try to build new triangles by splitting an existing one into two triangles. Let us consider the triangle ⟨A, B, C⟩ of L. We try to add a new vertex along its largest edge², say AB. Such a vertex belongs to F^{n+1} if and only if xA + xB ≤ n + 1. If the latter is not true, the triangle is put back in the list but will not be considered again. If xA + xB ≤ n + 1 is true, let us denote C' = A + B the new Farey point: the two triangles ⟨A, C', C⟩ and ⟨B, C', C⟩ are put into the list L. It can also be shown recursively that those two triangles are regular. The construction of Tn+1 stops when no more triangles, whose vertices are Farey points of order n + 1, can be inserted into L. Figure 3 shows the different steps of the construction of T2 from T1. T3 and T4 are displayed in Figure 4.
² We consider that large discrepancies between the chamfer distance and the Euclidean one are more likely to occur along the largest edges.
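The splitting procedure of appendix A.2 can be sketched as follows. Coordinates are compared exactly with Fraction; the tie-breaking between equally long edges is our arbitrary choice (the appendix does not specify one), and all helper names are ours.

```python
from fractions import Fraction

def to_plane(v):
    # project a visible triple (x, y, z) to the Farey point (y/x, z/x)
    return (Fraction(v[1], v[0]), Fraction(v[2], v[0]))

def sq_len(a, b):
    pa, pb = to_plane(a), to_plane(b)
    return (pa[0] - pb[0]) ** 2 + (pa[1] - pb[1]) ** 2

def refine(tris, n):
    """Build T_{n+1} from T_n by splitting triangles along their largest edge."""
    todo, done = list(tris), []
    while todo:
        tri = todo.pop()
        # pick the largest edge (discrepancies are assumed larger there)
        edges = [(sq_len(tri[i], tri[j]), i, j)
                 for i, j in ((0, 1), (1, 2), (0, 2))]
        _, i, j = max(edges)
        a, b = tri[i], tri[j]
        c = tri[({0, 1, 2} - {i, j}).pop()]
        if a[0] + b[0] <= n + 1:
            m = (a[0] + b[0], a[1] + b[1], a[2] + b[2])   # mediant vertex
            todo.append((a, m, c))
            todo.append((b, m, c))
        else:
            done.append(tri)      # largest edge cannot be split: triangle is final
    return done

T1 = [((1, 0, 0), (1, 1, 0), (1, 1, 1))]
T2 = refine(T1, 1)
```

On T1 this reproduces the construction of Figure 3: the edge (1,0,0)-(1,1,1) splits first at (2,1,1), then (2,1,0) and (2,2,1) are inserted, giving four regular triangles.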
Look-Up Tables for Medial Axis on Squared Euclidean Distance Transform

Eric Remy¹ and Edouard Thiel²

¹ LSIS (UMR CNRS 6168), ESIL, Case 925, 163 Av. de Luminy, 13288 Marseille Cedex 9, France. [email protected]
² LIF (UMR CNRS 6166), Case 901, 163 Av. de Luminy, 13288 Marseille Cedex 9, France. [email protected], http://www.lim.univ-mrs.fr/~thiel
Abstract. The Medial Axis (MA), also known as the Centres of Maximal Disks, is a useful representation of a shape for image description and analysis. The MA can be computed on a distance transform, where each point is labelled with its distance to the background. Recent algorithms allow the Squared Euclidean Distance Transform (SEDT) to be computed in linear time in any dimension. While these algorithms provide exact measures, the only known method to characterize the MA on the SEDT, using local tests and Look-Up Tables, is limited to 2D and small distance values [5]. We have proposed in [14] an algorithm which computes the look-up table and the neighbourhood to be tested in the case of chamfer distances. In this paper, we adapt our algorithm to the SEDT in arbitrary dimension and show that the results have completely different properties.

Keywords: Medial Axis, Centres of Maximal Disks, Look-Up Tables, Squared Euclidean Distance Transform, Digital Shape Representation.
1   Introduction

Blum proposed in [2] the medial axis transform (MAT), which consists in detecting the centres of the maximal disks in a 2D binary shape. Following Pfaltz and Rosenfeld in [11], a disk is said to be maximal in a shape S if it is not completely covered by any single other disk in S. The medial axis MA of S is the set of centres and radii of the maximal disks in S; an example is given in Figure 1. Pfaltz and Rosenfeld have shown that the union of the maximal disks in S is a covering, thus MA is a reversible coding of S. MA is a global representation, centred in S, allowing shape description, analysis, simplification or compression. While MA is often disconnected and not thin in Z^n, further treatments are applied to achieve shape analysis. In this way, MA is an important step for weighted skeleton computation [17]. A maximal disk can be included in the union of other maximal disks; so the covering by maximal disks, which is unique by construction, is not always minimal. Minimizing this set while preserving reversibility can be interesting for compression, see [10,4].

I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 224–235, 2003. © Springer-Verlag Berlin Heidelberg 2003
Fig. 1. Medial Axis with circles.
Fig. 2. Balls inside the shape.
One attractive solution to detect MA is to use a distance transform, denoted DT. In a distance transform of S, each pixel is labelled with its distance to the background; this is also the radius of the largest disk in S centred on the pixel. A reverse distance transform (RDT) allows the initial shape to be recovered from MA. Rosenfeld and Pfaltz have shown in [15], for the city block and chessboard distances d4 and d8, that it is sufficient to detect the local maxima on the DT image. For chamfer (i.e. weighted) distances using 3×3 masks, Arcelli and Sanniti di Baja proved in [1] that some labels have to be lowered on the DT before identifying the local maxima; but their solution cannot be extended to larger masks. Borgefors presented in [3] a method to extract MA in the case of a 5 × 5 chamfer mask (namely ⟨5, 7, 11⟩), using a look-up table. Borgefors, Ragnemalm and Sanniti di Baja have previously used the same method for SEDT in [5], but giving only a partial look-up table, which cannot be used for radii greater than √80. The principle of the look-up table (LUT) is general: it gives, for each radius value read in the DT, the minimum value of the neighbours which forbids a point to be in MA. The problem is to systematically compute the LUT associated with a distance function, for any radius, and also to compute the test neighbourhood (which is not necessarily 3 × 3, as seen later). In [14] we have shown an efficient algorithm which computes both of them for any chamfer norm in any dimension. The first Euclidean distance transforms (EDT), proposed by Danielsson [6] and Ragnemalm [12], give approximate results, which were improved afterwards by many authors. Saito and Toriwaki [16] have presented an efficient algorithm computing the exact SEDT (S for Squared) in arbitrary dimension. Recently, Hirata [8] and Meijster et al. [9] have optimized this algorithm to linear time complexity in the number of pixels. The reverse SEDT can be easily derived from [16,8,9].

These exact and fast transforms bring about renewed interest in MA computation for the Euclidean distance. We present in this paper an adaptation of [14], which efficiently computes the LUT for SEDT in any dimension. Our algorithm also computes the test neighbourhood, and certifies that this neighbourhood is sufficient up to a given radius. We recall in §2 some basic notions and definitions. We present and justify our method in §3. Results are given in §4 in the 2D and 3D cases, and we finally conclude in §5.
Fig. 3. The generators G(Zn ) for n = 2, 3 and 4 in projection.
2   Definitions

2.1   Generator and Grid Symmetries

The rectilinear grid of Z^n has a number of natural symmetries, which we employ to simplify our study. We denote by SG(n) the group of axial and diagonal symmetries in Z^n. The cardinal of the group is #SG(n) = 2^n · n! (which is 8, 48 and 384 for n = 2, 3 and 4). A subset X of Z^n is said to be G-symmetrical if for all σ ∈ SG(n) we have σ(X) = X. We call the generator of X the subset

G(X) = { (x1, ..., xn) ∈ X : 0 ≤ xn ≤ xn−1 ≤ ... ≤ x1 }.   (1)

If X is G-symmetrical, the subset G(X) is sufficient to reconstruct X with the G-symmetries. Figure 3 shows G(Z^n) for n = 2 (an octant), n = 3 and 4 (cones).
2.2   Balls and Reverse Balls

We call the direct ball B and the reverse ball B⁻¹ of centre p ∈ Z^n and radius r ∈ N the G-symmetric sets of points

B(p, r) = { q ∈ Z^n : d²E(p, q) ≤ r },   (2)
B⁻¹(p, r) = { q ∈ Z^n : r − d²E(p, q) > 0 }.   (3)

Since d²E is integral, balls and reverse balls are linked by the relation

B(p, r) = B⁻¹(p, r + 1).   (4)
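Relation (4) is easy to verify exhaustively on small radii; a brute-force sketch (the window bound is ours):

```python
def ball(r, bound=12):
    # direct ball B(O, r) for the squared Euclidean distance d_E^2, eq. (2)
    return {(x, y) for x in range(-bound, bound + 1)
                   for y in range(-bound, bound + 1)
                   if x * x + y * y <= r}

def rev_ball(r, bound=12):
    # reverse ball B^{-1}(O, r): strict inequality r - d_E^2 > 0, eq. (3)
    return {(x, y) for x in range(-bound, bound + 1)
                   for y in range(-bound, bound + 1)
                   if r - (x * x + y * y) > 0}
```

Because d²E only takes integer values, d²E ≤ r and d²E < r + 1 select the same lattice points, which is exactly relation (4).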
We point out that on DT, the value DT[p] for any shape point p is the radius of the greatest reverse ball centred in p inside the shape, namely B⁻¹(p, DT[p]).

2.3   Look-Up Tables
In the following, we denote by MLut a G-symmetric set of vectors, MgLut = G(MLut), and vg = G(v) for any vector v ∈ MLut.
A shape point p is the centre of a maximal disk if there is no other shape point q such that the ball B⁻¹(q, DT[q]) entirely covers the ball B⁻¹(p, DT[p]). The presence of q forbids p to be an MA point. Suppose that it is sufficient to search for q in a local neighbourhood MLut of p. Suppose also that we know, for each DT[p], the minimal value DT[q] which forbids p, stored in a look-up table Lut. The minimal value for p and direction v = pq is stored in Lut[v][DT[p]]. Because of the G-symmetry, it is sufficient to store only the values relative to MgLut; hence the minimal value for p and v is accessed using Lut[vg][DT[p]]. Finally we have the following criterion:

p ∈ MA ⟺ DT[p + v] < Lut[vg][DT[p]], ∀v ∈ MLut.   (5)
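Criterion (5) can be exercised with the three sample Lut entries for r = 4 quoted later in the paper (excerpt of Fig. 12: thresholds 6, 9 and 14); the small DT patches below are invented for illustration only.

```python
# Sample Lut entries for r = 4 quoted later in the paper (Fig. 12 excerpt);
# only these three generator columns are filled in this sketch.
LUT = {(1, 0): {4: 6}, (1, 1): {4: 9}, (2, 1): {4: 14}}

def sym_vectors(vg):
    # all G-symmetries of a generator vector in Z^2
    x, y = vg
    return {(sx * a, sy * b) for a, b in ((x, y), (y, x))
                             for sx in (1, -1) for sy in (1, -1)}

def is_ma(dt, p):
    """Criterion (5): p is kept in MA iff no neighbour value reaches Lut."""
    r = dt[p]
    for vg, col in LUT.items():
        for v in sym_vectors(vg):
            q = (p[0] + v[0], p[1] + v[1])
            if dt.get(q, 0) >= col[r]:   # outside the patch counts as 0
                return False
    return True
```

A point valued 4 with a (1, 0)-neighbour valued 6 is rejected; the same point with neighbours 5 and 8 survives the three tests.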
3   Computation of Lut and MLut for SEDT

3.1   Computing an Entry of Lut

The computation of an entry Lut[v][r] in the look-up table, for r = DT[p] in direction v, consists in finding the smallest radius R of a ball B⁻¹(p + v, R)
which completely covers B⁻¹(p, r) (see Figure 2). Since all the considered balls are convex, G-symmetric, and such that if r1 ≤ r2 then B(O, r1) ⊆ B(O, r2), we can limit the covering test by restricting the two balls to G(Z^n). One can find R, as illustrated in Figure 4, by decreasing the radius R+ while keeping the ball B⁻¹(q, R+) covering the ball B⁻¹(p, r), where q = p + v = p − vg by symmetry. A basic method, using a reverse SEDT at each step, would be prohibitive. We avoid it by using relation (4) and another distance image, denoted CTg, resulting from the cone transform of Figure 6, where each point of G(Z^n) is labelled with its distance to the origin (see the example in Figure 14.a). The covering of the ball B⁻¹(q, R+) over B⁻¹(p, r) can be tested by simply scanning CTg; moreover, the smallest radius R can be read in CTg during the scan. We propose to translate both B⁻¹(p, r) and B⁻¹(q, R) to the origin, as shown in Figure 5. We scan each point p1 of G(B⁻¹(O, r)), which by translation of vector vg gives p2. The values d²E(O, p1) and d²E(O, p2) are read in CTg. We have

R = max{ d²E(O, p2) : p2 = p1 + vg, p1 ∈ G(B⁻¹(O, r)) }, so   (6)
R = max{ d²E(O, p1 + vg) : p1 ∈ G(B⁻¹(O, r)) }.   (7)

This process can be efficiently implemented (see Figure 7), because all the covering relations (r, R) in a direction vg can be detected during the same scan (lines 2–7). To remain in the bounds of the CTg image, the x scan is limited to L − vgx − 1 (where vgx is the x component of vg). For each point p1, we look for the corresponding radius r1, which is CTg[p1] + 1 by (4). Then we look for the radius r2 of the ball passing through the point p2. Its value is CTg[p2] + 1 = CTg[p1 + vg] + 1, by (4). During the scan, we keep in Lut[vg][r1] the greatest value found for r2, which at the end is R by (7).
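The listing of Fig. 7 did not survive extraction, so the sketch below is our reconstruction from the textual description (same scan bounds, same final correction pass), not the authors' code; names and bounds are ours.

```python
L = 16       # side length of the cone image
RMAX = 120   # greatest radius value to be verified in Lut

# Cone distance transform CT^g: squared Euclidean distance on the
# generator octant 0 <= y <= x < L (cf. Fig. 6).
CT = {(x, y): x * x + y * y for x in range(L) for y in range(x + 1)}

def lut_column(vg):
    """Column Lut[v^g], following the description of Fig. 7."""
    vx, vy = vg
    col = [0] * (RMAX + 1)
    for x in range(L - vx):       # x scan limited to L - vgx - 1
        for y in range(x + 1):
            r1 = CT[(x, y)] + 1   # radius of the ball through p1, by (4)
            if r1 <= RMAX:
                # radius of the covering ball through p2 = p1 + v^g
                col[r1] = max(col[r1], CT[(x + vx, y + vy)] + 1)
    # correction pass: enforce a total order compatible with ball inclusion
    for r in range(1, RMAX + 1):
        col[r] = max(col[r], col[r - 1])
    return col
```

The reconstructed columns reproduce the published sample entries for r = 4 (thresholds 6, 9 and 14 in the Fig. 12 excerpt).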
At this stage, our algorithm gives a set of local covering relations, which stands for a partial ordering on the covering of balls. This ordering is not total, since one can observe in Lut cases where ra < rb while Lut[vg][ra] > Lut[vg][rb]; it would mean that the ball covering B⁻¹(O, ra) is bigger than the ball covering B⁻¹(O, rb), which is impossible. Thus, we correct the table by assuming that in this case Lut[vg][rb] should at least equal Lut[vg][ra], building in this way a compatible total order (Figure 7, lines 8–10).

Fig. 4. Covering test on two balls restricted to G(Z^2).

Fig. 5. Translated covering test on CTg.

Fig. 6. Fast Cone Distance Transform. Input: L the side length. Output: CTg, the L^n distance image to the origin for d²E.

3.2   Computing MLut
Let us assume that a given MgLut is sufficient to correctly extract the MA from any DT whose values do not exceed RKnown. This means that MgLut enables the extraction, from any ball B(O, R) with R ≤ RKnown, of an MA which is, by definition, the sole point O. At the beginning, MgLut is empty and RKnown = 0. So as to increase RKnown to a given RTarget, we propose to test each ball B(O, R), where R > RKnown, each time extracting its DT and then its MA, until either R reaches RTarget or a point different from O is detected in the MA of B(O, R). If R reaches RTarget, then we know that MgLut enables the correct extraction of the MA for any DT containing values lower than or equal to RTarget. This value RTarget must thus be kept as the new RKnown. On the contrary, if one extra point p is found in the MA during the scan, then MgLut is not sufficient to properly extract the MA, since by construction B(O, R) covers B⁻¹(p, DTg[p]). In this case we add the new vector Op to MgLut (and keep R for further usage, see §4.2). This vector is necessary and sufficient to remove p from the MA of the ball B(O, R), because the current MgLut is validated until
Fig. 7. Lut Column Computation. Input: CTg the cone, L the side length, vg the direction of the search, Rmax the greatest radius value to be verified in Lut. Output: the column Lut[vg] is filled with the correct values.
Fig. 8. Full MgLut and Lut Computation. Input: L the side length, MgLut, RKnown and RTarget. Output: Lut, MgLut and RTarget. At the first call, MgLut and RKnown must be set to ∅ and 0 respectively. After exit, RKnown must be set to RTarget.

Fig. 9. Fast extraction of MA points from G(B). Input: p the point to test, MgLut the generator of the Lut neighbourhood, Lut the look-up table, DTg the distance transform of the section of the ball. Output: returns true if the point p is detected as MA in DTg.
R − 1; thus it enables all the direct balls covering B⁻¹(p, DTg[p]) of radii lower than or equal to R − 1 to be found. So, the only direct ball which is not tested is the only ball of radius R: B(O, R) itself. This ball is in direction pO from p and must be searched by MgLut to remove p. Since MLut is G-symmetric, B(O, R) is detected by adding Op to its generator. After having added the vector, we compute the corresponding new column in Lut. Then, we ensure that this new MLut is sufficient to remove p. This is actually a consistency test of the Lut column computation algorithm of Figure 7, because we are sure that the new MLut is correct. Once p is removed, we resume the scan for the current R. Other extra points p may be detected sequentially, each time giving a new vector and Lut column. The computation of MgLut is finished when R reaches RTarget. The full algorithm, presented in Figure 8, uses an adapted version of the MA extraction (see Figure 9), working on G(Z^n) with MgLut in a single scan. Note also that the computation of DTg (function CompSEDTg, called in Figure 8, line 9), using a slightly modified SEDT working on G(Z^n), is mandatory, since the MA is extracted from the DT to the background. In fact, a simple threshold of the image CTg at the radius R gives only the set G(B(O, R)), but not the correct DTg labels (see Figure 14, where the values of (a) differ from those of (b)).
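The validation loop of CompLutMask can be imitated on the single ball discussed in §4.3 (R = 101 in Z^2). In this sketch of ours, a brute-force DT stands in for the SEDT, the column computation of Fig. 7 is repeated so the block stays self-contained, and the extraction runs on the whole plane rather than on G(Z^2); all names and bounds are assumptions.

```python
R = 101     # squared radius of the direct ball B(O, R) of section 4.3
BOX = 14    # half-size of the working window (the ball radius is about 10)
L = 16      # side length of the cone image
RMAX = 110  # covers the largest DT label of this ball

shape = {(x, y) for x in range(-BOX, BOX + 1) for y in range(-BOX, BOX + 1)
         if x * x + y * y <= R}
background = [(x, y) for x in range(-BOX - 2, BOX + 3)
              for y in range(-BOX - 2, BOX + 3) if (x, y) not in shape]

# brute-force squared Euclidean DT (stands in for a linear-time SEDT)
dt = {p: min((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 for q in background)
      for p in shape}

CT = {(x, y): x * x + y * y for x in range(L) for y in range(x + 1)}

def lut_column(vg):                       # as reconstructed from Fig. 7
    vx, vy = vg
    col = [0] * (RMAX + 1)
    for x in range(L - vx):
        for y in range(x + 1):
            r1 = CT[(x, y)] + 1
            if r1 <= RMAX:
                col[r1] = max(col[r1], CT[(x + vx, y + vy)] + 1)
    for r in range(1, RMAX + 1):          # total-order correction pass
        col[r] = max(col[r], col[r - 1])
    return col

def sym_vectors(vg):
    x, y = vg
    return {(sx * a, sy * b) for a, b in ((x, y), (y, x))
            for sx in (1, -1) for sy in (1, -1)}

def medial_axis(generators):
    luts = {vg: lut_column(vg) for vg in generators}
    ma = set()
    for p in shape:
        forbidden = any(dt.get((p[0] + v[0], p[1] + v[1]), 0) >= col[dt[p]]
                        for vg, col in luts.items() for v in sym_vectors(vg))
        if not forbidden:
            ma.add(p)
    return ma

ma_2dirs = medial_axis([(1, 0), (1, 1)])
ma_3dirs = medial_axis([(1, 0), (1, 1), (2, 1)])
```

With only the directions (1, 0) and (1, 1), spurious MA points (labelled 65, cf. Figure 14) survive inside the ball; adding the direction (2, 1) reduces the extracted MA to the sole centre O, which is the event that makes CompLutMask enlarge MgLut at this radius.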
4   Results for SEDT

4.1   Complexity
While the function d²E is not a metric (the triangular inequality is not satisfied), its balls respect sufficient conditions for the validity of our method (convexity, G-symmetry and increase by inclusion). The same can be applied to the discrete functions round(dE), ⌈dE⌉ and ⌊dE⌋ (successfully tested). For CompSEDTg (not presented), we have chosen to use a modified version of the algorithm in [16], which provides exact results and can be relatively easily adapted to G(Z^n). In particular, the backward scans can be suppressed [13, §6.5.2]. Note that the SEDT on a ball is the worst case for the complexity of [16], and that the optimised algorithms [8,9] are noticeably more efficient for large radii. The complexity in Z^n of CompSEDTg for a ball of radius R is O(n·R^n) with [8,9] or O(n·R^(n+1)) with [16]. The complexity of CompLutCol is O(2·R^n) (one scan of G(Z^n) plus one scan of a Lut column). The complexity of IsMAg, with a number k of directions to test, is O(k·R^n) in the worst case, that is to say, when p is detected as an MA point. Since this event is rare, the algorithm almost always returns early, hence the real cost of IsMAg is negligible. In CompLutMask, the complexity of one iteration of the main loop (lines 4–16 in Figure 8) is thus the complexity of CompSEDTg. As CompLutMask makes the radius R increase, its total cost grows quite fast. We present the results of our method in 2D and 3D in Figures 10 and 13. Computing the MgLut shown in Figure 10 takes 590 s, while computing one corresponding Lut column takes 0.004 s, for L = 400 and from RKnown = 0 to
Fig. 10. Beginning of MgLut for Z2 (appearance rank i, coordinates, appearance radius R).
Fig. 12. Beginning of Lut for Z^2 (radius r, next columns Lut[vg][r]).
Fig. 13. Beginning of MgLut for Z3 (appearance rank i, coordinates, appearance radius R).
RTarget = 128 200 (on a Pentium 4 at 2.26 GHz with Debian Gnu/Linux 2.4.19). This load is explained by the systematic test of about 26 000 balls. As expected, CompLutCol is very fast, whereas CompLutMask is much slower, and its resulting (and compact) MgLut should thus be saved for further re-use. The memory required to store Lut is m·R·e, where m is the number of columns in MgLut for R, and e is the size of one long integer (to store d²E values). In Figures 10 and 13 we can see that m grows slowly with R. Since R grows with the square of the radius in pixels of the largest Euclidean ball tested, the memory cost of Lut becomes important for large images. For instance, the size of the Lut corresponding to Figure 10 is 23 MB. Memory can be saved by storing only the possible values of d²E. The set of possible values in 2D is S = { a² + b² ≤ R : a, b ∈ [0 .. R] }. The Lut entries are then accessed by Lut[vg][index[r]], where index is a table of size R + 1, built in a single scan on CTg, which gives for any r ∈ [0 .. R] the rank index[r] in S. The gain for the Lut corresponding to Figure 10 is about 78%, with only 5.1 MB to store. The same holds in 3D, but in lesser proportion. On the contrary, in 4D and higher dimensions, any positive integer can be decomposed into a sum of four (or more) squares (Lagrange's theorem, see [7, §20.5]), so that no space can be saved in this manner.

4.2   Extracting the Medial Axis
A sample usage of the Lut given in Figure 12 and formula (5) is: a point valued 4 on DT is not an MA point if, following the third entry in the table, it has at least a (1, 0)-neighbour ≥ 6, or a (1, 1)-neighbour ≥ 9, or a (2, 1)-neighbour ≥ 14, etc. The table is compressed by showing only possible radii r. Figures 10 and 13 give the vectors of MgLut in 2D and 3D respectively, together with their appearance radius R during CompLutMask. Keeping this radius is important because it allows the number of directions to test for each point to be limited during the whole MA extraction. In a DT where the greatest value is Rmax, it is necessary and sufficient to take the subset M^Rmax_Lut = { (v; R) ∈ MLut : R < Rmax } as the test neighbourhood to detect all MA points. In fact, CompLutMask guarantees that M^Rmax_Lut is necessary and sufficient up to RKnown = Rmax − 1 in CTg (as a radius of a direct ball), thus by (4), up to Rmax in DT (as a radius of a reverse ball). For example in Figure 10, if Rmax = 101 on DT, then the test neighbourhood will be limited to (1, 0)-neighbours and (1, 1)-neighbours. The extraction of the MA from a binary image I can be divided into the following steps. One must first compute the SEDT, then search for Rmax in the resulting DT. Next, CompLutMask is applied using the Rmax value as RTarget; this step can be avoided if a sufficient MgLut, computed once and for all, is already stored. The subset M^Rmax_Lut is then used to extract the MA, which is initialized to the shape points. To minimize memory usage, we propose to allocate only one Lut column, instead of computing the whole Lut for Rmax and #M^Rmax_Lut, which might be very large as seen in §4.1: for each vector vg in M^Rmax_Lut, we overwrite the previous column
using CompLutCol, then reject from the MA all the points which do not fulfill (5) with the G-symmetries of vg. In this way, the MA set often decreases extremely fast at each step, thus accelerating the computation.

4.3   Properties
Two reverse balls of radii r and r' are said to be equivalent if the sets of pixels B⁻¹(O, r) and B⁻¹(O, r') are the same (even if the labels of the pixels on the DT are generally different). The equivalence class of a reverse ball is the interval of radii for which the reverse balls are equivalent. In Z^n, the equivalence classes are easily obtained by underlining the possible values in DT (i.e. the integers which can be written as a sum of n squares); the equivalence class of a possible value b is [a .. b], where a − 1 is the largest possible value less than b. The first equivalence classes in 2D are [1], [2], [3, 4], [5], [6, 7, 8], [9], [10], [11, 12, 13], etc. Equivalence classes of size > 1 exist in 2D and 3D because the sums of two or three squares do not fill N. All the balls are different for dimension n ≥ 4 because of Lagrange's theorem; we think that this might have implications for the properties of MLut and Lut which are linked to equivalence classes.
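The 2D equivalence classes listed above can be regenerated in a few lines; the bound 13 below is simply where the quoted list stops.

```python
LIMIT = 13
# values representable as a sum of two squares: the "possible" DT values
possible = sorted({a * a + b * b for a in range(6) for b in range(6)
                   if 0 < a * a + b * b <= LIMIT})
classes, prev = [], 0
for b in possible:
    classes.append(list(range(prev + 1, b + 1)))   # the class [a .. b]
    prev = b
```

Each class groups every radius whose reverse ball coincides with the one of the next possible value, e.g. the reverse balls of radii 6, 7 and 8 are all the same pixel set.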
Figure 11 geometrically represents the set of vectors in MgLut from Figure 10 with their rank of appearance. While the layout seems random, one can note that all MLut points are visible points. A point (x1, . . . , xn) is said to be visible (from the origin) if gcd(x1, . . . , xn) = 1; the set of visible points in Zⁿ is denoted Vⁿ (see [18]). When carrying on the computation of MgLut with CompLutMask, all visible points seem to be gradually detected, while non-visible points never are. We therefore propose the conjecture:

lim_{R→∞} M^R_Lut = Vⁿ .  (8)
These properties for d_E² are very different from those of chamfer distances (see [14]), where MLut is always bounded, Lut is bounded in most cases, and non-visible points may appear in MLut. We think this is linked to the number of normals of the balls, which grows without bound for Euclidean balls, while it is bounded for chamfer balls.
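A minimal sketch of the visibility test used in the conjecture (the helper names are ours; `reduce` is used so that the gcd works for any dimension):

```python
from functools import reduce
from math import gcd

def is_visible(point):
    """A point of Z^n \\ {O} is visible from the origin iff the gcd of its
    coordinates is 1 (no other integer point lies on the segment [O, p])."""
    return reduce(gcd, (abs(c) for c in point)) == 1

def visible_points(radius):
    """Visible points of Z^2 within the square [-radius, radius]^2."""
    return [(x, y) for x in range(-radius, radius + 1)
                   for y in range(-radius, radius + 1)
                   if (x, y) != (0, 0) and is_visible((x, y))]
```

For instance (2, 1) is visible, while (2, 2) is hidden behind (1, 1).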
Eric Remy and Edouard Thiel
Fig. 14. Appearance of vector (2, 1) in MLut for R = 101 in Z². In (a), B(101) is obtained using points ≤ 101 from CTg, and gives, after SEDT, B⁻¹(104) in (b), on which MA is extracted. In (c), B⁻¹(65) (in gray) is not overlapped by B⁻¹(80) in direction (1, 0), nor in (d) by B⁻¹(85) in direction (1, 1), but is overlapped in (e) by B⁻¹(104) in direction (2, 1).

5 Conclusion
The computation of the medial axis (MA) from the squared Euclidean distance transform (SEDT) has been detailed for arbitrary dimension. The principle of MA extraction using look-up tables (Lut) was already published for d_E² in 2D, for small values and a 3 × 3 neighbourhood, in [5], but no general method to compute the tables was given. We have introduced the mask MLut, which stores the test neighbourhood used during the MA extraction. We have shown that, in the general case, the mask MLut is larger than just the 3ⁿ neighbourhood. We have presented and justified efficient algorithms which compute both Lut and MLut for d_E². Our algorithms certify that MLut is sufficient up to a given ball radius. We give a sample Lut table in 2D for comparison with [5]. We give two sets of MgLut in 2D and 3D which enable a simple MA extraction using only the Lut table computation algorithm (provided that the greatest radius R in the image is lower than 128 178 in 2D and 947 in 3D). Our experiments show that, in the case of d_E², the test neighbourhood MLut is a set of visible points. Unlike in the case of chamfer distances [14], this set seems to grow forever as the radius R of the greatest possible ball in the image grows. Further work is needed to better understand the inclusions of discrete Euclidean balls and to find arithmetical rules.
Look-Up Tables for Medial Axis on Squared Euclidean Distance Transform
References

1. C. Arcelli and G. Sanniti di Baja. Finding local maxima in a pseudo-Euclidean distance transform. Comp. Vision, Graphics and Image Proc., 43:361–367, 1988.
2. H. Blum. A transformation for extracting new descriptors of shape. In W. Wathen-Dunn, editor, Models for the Perception of Speech and Visual Form, pages 362–380, Cambridge, 1967. MIT Press.
3. G. Borgefors. Centres of maximal disks in the 5-7-11 distance transform. In 8th Scand. Conf. on Image Analysis, pages 105–111, Tromsø, Norway, 1993.
4. G. Borgefors and I. Nyström. Efficient shape representation by minimizing the set of centres of maximal discs/spheres. Pat. Rec. Letters, 18:465–472, 1997.
5. G. Borgefors, I. Ragnemalm, and G. Sanniti di Baja. The Euclidean distance transform: finding the local maxima and reconstructing the shape. In 7th Scand. Conf. on Image Analysis, volume 2, pages 974–981, Aalborg, Denmark, 1991.
6. P.E. Danielsson. Euclidean distance mapping. Comp. Graphics and Image Proc., 14:227–248, 1980.
7. G.H. Hardy and E.M. Wright. An Introduction to the Theory of Numbers. Oxford University Press, fifth edition, October 1978.
8. T. Hirata. A unified linear-time algorithm for computing distance maps. Information Proc. Letters, 58:129–133, 1996.
9. A. Meijster, J.B.T.M. Roerdink, and W.H. Hesselink. A general algorithm for computing distance transforms in linear time. In Goutsias and Bloomberg, editors, Math. Morph. and its App. to Image and Signal Proc., pages 331–340. Kluwer, 2000.
10. F. Nilsson and P.E. Danielsson. Finding the minimal set of maximum disks for binary objects. Graph. Models and Image Proc., 59(1):55–60, 1997.
11. J.L. Pfaltz and A. Rosenfeld. Computer representation of planar regions by their skeletons. Comm. of the ACM, 10:119–125, Feb. 1967.
12. I. Ragnemalm. The Euclidean distance transform in arbitrary dimensions. Pat. Rec. Letters, 14(11):883–888, 1993.
13. E. Remy. Normes de chanfrein et axe médian dans le volume discret. PhD thesis, Univ. de la Méditerranée, Aix-Marseille 2, Dec. 2001.
14. E. Remy and E. Thiel. Medial axis for chamfer distances: computing look-up tables and neighbourhoods in 2D or 3D. Pat. Rec. Letters, 23(6):649–661, April 2002.
15. A. Rosenfeld and J.L. Pfaltz. Sequential operations in digital picture processing. Journal of the ACM, 13(4):471–494, 1966.
16. T. Saito and J.I. Toriwaki. New algorithms for Euclidean distance transformation of an n-dimensional digitized picture with applications. Pat. Rec., 27(11):1551–1565, 1994.
17. G. Sanniti di Baja and E. Thiel. A skeletonization algorithm running on path-based distance maps. Image and Vision Computing, 14(1):47–57, Feb. 1996.
18. E. Thiel. Géométrie des distances de chanfrein. Docent, Univ. de la Méditerranée, Aix-Marseille 2, Dec. 2001. http://www.lim.univ-mrs.fr/~thiel/hdr
Discrete Frontiers

Xavier Daragon, Michel Couprie, and Gilles Bertrand

École Supérieure d'Ingénieurs en Électrotechnique et Électronique, Laboratoire A2SI, 2, boulevard Blaise Pascal, Cité DESCARTES, BP 99, 93162 Noisy le Grand CEDEX, France
{daragonx,coupriem,bertrand}@esiee.fr
Abstract. Many applications require extracting the surface of an object from a discrete set of valued points, and the topological soundness of the obtained surface is, in many cases, of the utmost importance. In this paper, we introduce the notion of frontier order, which provides a discrete framework for defining frontiers of arbitrary objects. A major result we obtained is a theorem which guarantees the topological soundness of such frontiers in any dimension. Furthermore, we show how frontier orders can be used to design topologically coherent "Marching Cubes-like" algorithms.
1 Introduction
The Marching Cubes algorithm [1] provides an efficient way to extract a polygonal surface from an object expressed as a subset of a digital image, or an iso-surface from a function. However, the polygonal mesh obtained by this algorithm is not guaranteed to be a topological surface, since artefacts such as holes [2,3,4,5] might appear. While small holes, though a nuisance, might not seem an overly important issue for the visualization of large objects, they can have a dramatic impact on collision detection and most calculations. Consequently, much research has been directed toward solving this problem [3,4,5,6,7,8]. The approach of J. O. Lachaud [8] is especially interesting: it guarantees the topology of the extracted surface using the topology of the underlying discrete object. Such guarantees are obtained using the framework of digital topology [9] for the underlying object while defining continuous analogs of digital boundaries, and the results hold true for Zⁿ, n ∈ N. In a former article [10], we introduced the notion of frontier orders in 2D and 3D partially ordered sets, asserting the possibility of defining the frontiers of objects as symmetrical separating surfaces in such spaces. The present article will encompass and extend our previous results: frontier orders will be presented as a purely discrete framework, based on order topology [11,12,13], which provides topological guarantees for a wide variety of spaces of any dimension. The main result of this paper is a theorem establishing that the frontier order of any subset of an n-surface [14] is a union of disjoint (n − 1)-surfaces. This result allows us to design sound "Marching Cubes-like" algorithms to extract frontiers of objects both in the Khalimsky grid and in Z³ equipped with the digital topology.

I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 236–245, 2003. © Springer-Verlag Berlin Heidelberg 2003
2 Definitions
Let us first introduce the notations we will use in this article. If X is a set and S a subset of X, S̄ denotes the complement of S in X. If λ is a binary relation on X, i.e., a subset of X × X, the inverse of λ is the binary relation {(x, y) ∈ X × X; (y, x) ∈ λ}. For any binary relation λ, λ² is defined by λ² = λ \ {(x, x); x ∈ X}. For each x of X, λ(x) denotes the set {y ∈ X; (x, y) ∈ λ} and, for any subset S of X, λ(S) denotes the set {y ∈ λ(s); s ∈ S}.

2.1 Orders
An order is a pair |X| = (X, αX) where X is a set and αX is a reflexive, antisymmetric and transitive binary relation on X. The set αX(x) is called the αX-adherence of x. We denote by βX the inverse of αX and by θX the union of αX and βX. The set θX(x) is called the θX-neighborhood of x. A path from x0 to xn in |X| is a sequence x0, . . . , xn of elements of X such that ∀i ∈ [1 . . . n], xi−1 ∈ θX(xi). A connected component C of |X| is a maximal subset of X such that, for all x, y ∈ C, there exists a path from x to y in C. The rank of an element x of X is 0 if α²X(x) = ∅, and is equal to the maximal rank of the elements of α²X(x) plus 1 otherwise; the rank of an order is the maximal rank of its elements. Any element of an order is called a point; it is also called an n-element, n being the rank of this point. An order |X| is countable if X is countable; it is locally finite if, for each x ∈ X, θX(x) is a finite set. A CF-order is a countable locally finite order. Let |X| and |Y| be two orders; |X| and |Y| are order isomorphic if there exists a bijection f : X → Y such that, for all x1, x2 ∈ X, x1 ∈ αX(x2) ⇔ f(x1) ∈ αY(f(x2)). If (X, αX) is an order and S is a subset of X, the sub-order of |X| relative to S is the order (S, αS) with αS = αX ∩ (S × S). When no confusion may arise, we also write |S| = (S, αS).

2.2 Discrete Surfaces
We use the general definition for n-dimensional surfaces (or simply n-surfaces) proposed by Evako, Kopperman and Mukhin [14]; such surfaces are also known as Jordan n-surfaces [15]. This definition is both elegant and efficient. Let |X| = (X, αX) be a non-empty CF-order.
• The order |X| is a 0-surface if X is composed of exactly two points x and y such that y ∉ αX(x) and x ∉ αX(y).
• The order |X| is an n-surface, n > 0, if |X| is connected and if, for each x in X, the order |θ²X(x)| is an (n − 1)-surface.

2.3 Simplicial Complexes
Let Λ be a set; any non-empty subset of Λ is called a simplex. A subset constituted of (n + 1) elements of Λ is also called an n-simplex. Now, let C be a family of simplexes of Λ; C is a simplicial complex if it is closed by inclusion, which means that, if s belongs to C, then any non-empty subset of s also belongs to C. A (simplicial) n-complex is a simplicial complex in which maximal elements are n-simplexes. The minimal subset ΛC of Λ such that any element of C is a subset of ΛC is called the support of C. In this paper, simplicial complexes are also seen as orders: any simplicial complex C will be interpreted as the order |C| = (C, ⊆). Consequently, C will be said to be an n-surface if |C| is an n-surface. The simplicial complexes we just defined are often known as abstract simplicial complexes, as opposed to other notions of complexes based upon an underlying Euclidean space.

Fig. 1. Schema depicting our methodology. a) Our data is a set of points. b) Upon this set of points, a simplicial complex is built. c) Independently from this simplicial complex, some of these points are labeled as object points, the others as background points. d) This bi-partition of the point set induces a tri-partition of the simplicial complex between an object complex [white], a background complex [black] and a frontier complex [grey]. e) The frontier order [depicted by a discrete curve], isomorphic to the frontier complex, is then defined.

2.4 Chains of an Order
Let |X| be an order; a chain of |X| is a fully ordered subset of X. An n-chain is a chain of size n + 1. We denote by C^X the set of all the chains of |X|, i.e., C^X = {S ⊆ X; S ≠ ∅ and ∀s1, s2 ∈ S with s1 ≠ s2, s1 ∈ θ²X(s2)}. It should be noted that (C^X, ⊆) is an order and that C^X is a simplicial complex, the support of which is X. Moreover, the topology of (C^X, ⊆) is strongly related to the topology of |X|, as shown by the following proposition:

Proposition 1. Let |X| be an order. If |X| is an n-surface, then the order |C^X| = (C^X, ⊆) is an n-surface as well.

The proof of the above proposition is not included in this article due to space restrictions: while not overly long nor difficult by itself, this proof would require several lemmas. The same holds for the other properties introduced in this article.
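On a small finite order, the set C^X can be computed by brute force directly from the definition. A sketch, using our own representation of an order as a dictionary mapping each element to its α-adherence (the relation is reflexive, as in the text):

```python
from itertools import combinations

def chains(alpha):
    """All chains (totally ordered non-empty subsets) of an order given
    as a dict element -> alpha-adherence."""
    elems = list(alpha)

    def comparable(x, y):
        # two distinct elements are comparable iff one adheres to the other
        return y in alpha[x] or x in alpha[y]

    return {frozenset(s)
            for r in range(1, len(elems) + 1)
            for s in combinations(elems, r)
            if all(comparable(x, y) for x, y in combinations(s, 2))}
```

For the order with two closed points a, b and one 1-element ab adhering to both (a tiny Khalimsky segment), the chains are {a}, {b}, {ab}, {a, ab}, {b, ab}, and the result is indeed closed under non-empty subsets, i.e. a simplicial complex.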
3 Frontier Orders
If we consider a simplicial complex C (figure 1.b) and its support X (figure 1.a), the partition of X between a set K, the object, and its complement K̄, the background (figure 1.c), induces a partition of C into three sets (figure 1.d):
• CK, the set of all the simplexes which are subsets of K;
• CK̄, the set of all the simplexes which are subsets of K̄;
• CK/K̄, the set of the simplexes which are subsets of neither K nor K̄.

Since a singleton (0-simplex) is either a subset of K or a subset of K̄, CK/K̄ is not closed under inclusion and, consequently, is not a simplicial complex. Nevertheless, |CK/K̄| = (CK/K̄, ⊆) is still the sub-order of |C| relative to CK/K̄. It should be noted that, for any given C and K, |CK/K̄| is order isomorphic to the frontier order |ĈK/K̄| (figure 1.e), defined as the couple (ĈK/K̄, αĈ) where ĈK/K̄ = {{A, B}; A ⊆ K, B ⊆ K̄, A ≠ ∅, B ≠ ∅, A ∪ B ∈ C} and αĈ is the binary relation such that, considering M = {A1, B1} and N = {A2, B2}, M ∈ αĈ(N) is equivalent to A1 ⊆ A2 and B1 ⊆ B2. By definition, CK/K̄ is both symmetrical, since CK/K̄ = CK̄/K, and separating, since any path from x ∈ K to y ∈ K̄ crosses CK/K̄. Consequently, the frontier order, which is symmetrical, can also be said to be separating. Furthermore, the following theorem, the main result of this paper, guarantees that a frontier order is a union of discrete surfaces:

Theorem 2. Let C be a simplicial complex with the property of being an n-surface, n > 1, and let X be its support. Now, let K be a non-empty proper subset of X. Then the frontier order ĈK/K̄ is a union of disjoint (n − 1)-surfaces.

As seen previously, to any order can be associated the simplicial complex composed of its chains. So, as a consequence of Proposition 1 and Theorem 2, we have:

Corollary 3. Let |X| = (X, αX) be an order and K a non-empty proper subset of X. If |X| is an n-surface, then the frontier order |Ĉ^X_{K/K̄}| is a union of disjoint (n − 1)-surfaces.
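The tri-partition induced by K is direct to compute. A sketch, assuming simplexes are represented as frozensets (names are ours, not from the paper):

```python
def tri_partition(complex_, K):
    """Split a simplicial complex (an iterable of frozensets) according to
    the bi-partition of its support into K and its complement."""
    K = set(K)
    C_K   = {s for s in complex_ if s <= K}                # object complex
    C_Kc  = {s for s in complex_ if not (s & K)}           # background complex
    C_mix = {s for s in complex_ if (s & K) and (s - K)}   # frontier part
    return C_K, C_Kc, C_mix
```

On the boundary of a triangle (a 1-surface) with K a single vertex, the frontier part consists of the two edges touching K, two mutually incomparable elements, i.e. a 0-surface, consistent with the theorem one dimension down.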
4 Marching Cubes and the Khalimsky Grid
The main feature of the Marching Cubes algorithm is a look-up table associating a surface patch to each possible partition of the corners of a unit cube between two sets of points, K and K̄. Given a map f : Z³ → R and a value n, the Marching Cubes algorithm sets K = {x ∈ Z³, f(x) > n} and K̄ = Z³ \ K. Then, for each unit cube of the cubic grid Z³, the algorithm finds the appropriate surface patch in the look-up table and builds this patch, interpolated according to the values of the eight corners of this unit cube. The union of all those patches constitutes the approximated iso-surface. This algorithm is often used to extract the surface of an object in a grey-level image, in which case n is interpreted as a threshold. In the case of a binary image, it is sufficient to apply the look-up table without any interpolation. While the original Marching Cubes algorithm [1] did not consider the topology of the underlying image, and did not guarantee the topology of the extracted surface, we will now explain how to generate a Marching Cubes algorithm coherent with the topology of the Khalimsky grid.
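The per-cube lookup step reduces to computing an 8-bit configuration index. A sketch of that step only; the corner ordering chosen here is an assumption of ours, not necessarily that of [1]:

```python
def cube_index(f, x, y, z, threshold):
    """8-bit configuration index of the unit cube at (x, y, z): bit i is set
    when corner i belongs to the object K = {p : f(p) > threshold}."""
    corners = [(x + dx, y + dy, z + dz)
               for dz in (0, 1) for dy in (0, 1) for dx in (0, 1)]
    return sum(1 << i for i, c in enumerate(corners) if f(c) > threshold)
```

The index (0 to 255) then selects the surface patch in the look-up table; for a binary image no interpolation is needed, exactly as stated above.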
Fig. 2. A unit cube ({n, n + 1} × {m, m + 1} × {l, l + 1} and its closure) of H³, one of the 8 unit cubes of Z³ of which it is made, and the tetrahedra (chains of |H³|) it contains.
4.1 Khalimsky Grid and Embedded Frontier Order
Let us now introduce the Khalimsky grids as the family of orders |Hⁿ| = (Hⁿ, ⊆), defined by:

H¹₀ = {{a}; a ∈ Z} ;  H¹₁ = {{a, a + 1}; a ∈ Z}
H¹ = H¹₀ ∪ H¹₁
Hⁿ = {h1 × . . . × hn ; ∀i ∈ [1, n], hi ∈ H¹}

It is important to note that |Hⁿ| is an n-surface for all n ∈ N*, as proved by V. A. Evako et al. [14]. This implies, by Corollary 3, that the frontier defined for any subset of an order Hⁿ is a union of disjoint (n − 1)-surfaces. A natural encoding of the set Hⁿ into the corresponding discrete space Zⁿ is defined as follows [11]: to every element h1 × . . . × hn of Hⁿ is assigned the vertex of coordinates (z1, . . . , zn) in Zⁿ, such that ∀i ∈ [1 . . . n], zi = 2vi if hi = {vi} and zi = 2vi + 1 if hi = {vi, vi + 1}. Figure 2 depicts the cube of H³ constituted by {n, n + 1} × {m, m + 1} × {l, l + 1} and its subsets, which contains 8 unit cubes of Z³, each of which is itself constituted by 6 tetrahedra, images of the chains of H³. This encoding of Hⁿ induces an embedding of the frontier orders based upon it: to each 0-element {{A}, {B}} of the frontier order we assign the vertex of coordinates (a + b)/2, where a (resp. b) is the vertex assigned to A (resp. B). Then, to each 1-element we assign the segment joining the vertices associated to the 0-elements of its θ-neighborhood, to each 2-element we assign the corresponding polygon (which is in fact either a triangle or a parallelogram), and so on.
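The encoding of Hⁿ into Zⁿ reads directly as code. A sketch, with elements of Hⁿ represented (our choice) as tuples of sets {v} or {v, v+1}:

```python
def encode(h):
    """Map an element h1 x ... x hn of H^n to its vertex in Z^n:
    zi = 2v for a singleton {v}, zi = 2v + 1 for a pair {v, v + 1}."""
    z = []
    for hi in h:
        v = min(hi)
        z.append(2 * v if len(hi) == 1 else 2 * v + 1)
    return tuple(z)
```

Singletons land on even coordinates and pairs on odd ones, so the parity of a vertex of Zⁿ recovers the type of each factor.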
4.2 Marching Cubes-Like Algorithm in Dimension 3
The look-up table obtained for the possible configurations of a unit cube of H³ is depicted in figure 3. Unlike both the original Marching Cubes algorithm and its correction by Lachaud in the framework of digital topology, our surface generation process is not translation invariant, since the Khalimsky grid itself is not. In practice, it is sufficient to rotate the configuration according to the coordinates of the upper-left-front (or any other) corner of the unit cube. The configurations given in figure 3 being based upon chains (tetrahedra) rather than upon cubes, they are more facetized than those of the original Marching Cubes algorithm. It is possible to simplify these configurations, with the guarantee to preserve the overall topology, and the guarantee that the surface still separates the object from the background. The simplification process is as follows: the configurations of figure 3 are first triangulated, then anti-stellar and bi-stellar moves [16] are applied to reduce the number of faces. In order to ensure the coherency of the frontier between adjacent unit cubes, we systematically replace any point located on a face but not on an edge of a cubic cell by the segment connecting its two nearest neighbors in this face, as depicted in figure 5. We thereby obtain the configuration table depicted in figure 4.

Fig. 3. Configurations obtained for the look-up table of the Marching Cubes-like algorithm in the H³ case. Whenever several configurations are identical up to rotations and symmetries, only one is presented here. While the original Marching Cubes algorithm generates from 1 to 4 triangles for each configuration, the count here ranges from 2 to 12 triangles (2 to 6 frontier order elements, some of which correspond to parallelograms).
5 Frontier Orders and Digital Topology
In the framework of digital topology [9], a digital image built upon Z³ can be seen as a quadruple (Z³, m, n, K), where K ⊆ Z³ is the set of the object points (or object), where K̄ is the set of the background points (or background), and where (m, n) ∈ {(6, 26), (6, 18), (26, 6), (18, 6)}, m being the adjacency of the object and n the adjacency of the background. More precisely, any two points belonging to the object are adjacent if:
• both belong to a unit edge;
• both belong to a unit face and either m = 18 or m = 26;
• both belong to a unit cube and m = 26.
The same goes for the background, with n instead of m.
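The three cases above amount to counting how many coordinates differ between the two points. A sketch of the m-adjacency test (helper name is ours):

```python
def adjacent(p, q, m):
    """m-adjacency (m in {6, 18, 26}) between points of Z^3."""
    d = [abs(a - b) for a, b in zip(p, q)]
    if p == q or max(d) > 1:
        return False
    moved = sum(1 for c in d if c == 1)  # 1: unit edge, 2: face diagonal, 3: cube diagonal
    return moved <= {6: 1, 18: 2, 26: 3}[m]
```

For instance, two points on the opposite corners of a unit face are 18- and 26-adjacent, but not 6-adjacent.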
Fig. 4. Simplified configurations obtained for the look-up table of the Marching Cubes-like algorithm in the H³ case, from the configurations presented in figure 3. One should note that some originally different frontiers have identical simplifications, up to rotations. Most simplified configurations are equivalent to the corresponding configuration of the original Marching Cubes algorithm, in the sense that they have the same number of triangles, the same intersection with the cube boundary, and are stellar equivalent. Nevertheless, some new configurations appear whenever two points located on the opposite corners of a face or cube are adjacent according to the |H³| topology; and one of the original algorithm's configurations, assuming four non-adjacent corners, has no equivalent here.
Fig. 5. a) An original configuration. b) A triangulation of a). c) Obtained from b) by the anti-stellar move replacing the vertex A by the 1-simplex {B, C}, this same move being applied to all points located on the centers of the faces (observe that this move has effects not only on this cube but on the neighboring ones as well). d) and e) are then obtained by consecutive bi-stellar moves.
In this framework, Lachaud [7,8] has provided a topologically sound Marching Cubes algorithm using continuous analogs of digital boundaries; we will show how the same result can be reached using purely discrete means: frontier orders. Since Z³ equipped with digital topology is not an order, we first need to build a simplicial complex C upon it. However, were C built using only Z³ as its support, it would be unable to emulate the various adjacency relations used by digital topology; two points x and y of K located on the opposite corners of a face, for example, would be considered to be adjacent if {x, y} ∈ C, whatever the adjacency. In order to take the adjacency into account, we need to introduce two types of intermediary points: face points, which are located in the center of a face, and cube points, which are located in the center of a cube. Then, referring to the previous example, two points of K located on the opposite corners of a face will be considered adjacent if, and only if, the face point associated to this face also belongs to K, which will depend on the adjacency (and, maybe, the other corners of the face). No points are introduced for edges, since two points of K located on the same edge are always adjacent. As a result, each cube (figure 6.a) is triangulated into 24 identical tetrahedra, each defined by 2 points of Z³, a face point and a cube point (figure 6.b). It should be noted that C is then a 3-surface, which can be easily verified by an exhaustive checking of every existing simplex configuration; thus the hypotheses of Theorem 2 are satisfied. Since the entries of the look-up table are to be entirely determined by the points of Z³ and the adjacency, the belonging of an intermediary point to either K or K̄ is entirely determined by an affectation strategy (figure 6.d) defined as follows:
• 6/26-adjacency and 26/6-adjacency (let K be the 26-adjacent set):
  – a face point belongs to K iff at least one corner of this face does;
  – a cube point belongs to K iff at least one corner of this cube does.
• 6/18-adjacency and 18/6-adjacency (let K be the 18-adjacent set):
  – a face point belongs to K iff at least one corner of this face does;
  – a cube point belongs to K iff at least three corners of this cube do.
The simplified results, which can be found in figure 7, are obtained from the initial ones by stellar and bi-stellar moves, as in the |H³| case, and are equivalent to the results obtained by Lachaud for the same configurations.

Fig. 6. a) Triangulation of a unit cube, with intermediary (smaller) points. b) One of the 24 identical tetrahedra of this triangulation is outlined in grey. c) Assume now that one corner point (black) belongs to the object, and all the others (white) to the background. d) Result of the affectation strategy, assuming that the object is 26-connected (which implies that the background is 6-connected). e) Generation of the frontier complex. f) Simplified frontier.
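The affectation strategy reads directly as two small predicates. A sketch with our own names, taking corner memberships as booleans:

```python
def face_point_in_K(face_corners_in_K, m):
    """Affectation of a face point from the 4 corners of its face,
    for K m-adjacent (m in {26, 18}); both strategies use the same rule."""
    return sum(face_corners_in_K) >= 1

def cube_point_in_K(cube_corners_in_K, m):
    """Affectation of a cube point from the 8 corners of its cube:
    one corner suffices for m = 26, three corners are needed for m = 18."""
    need = 1 if m == 26 else 3
    return sum(cube_corners_in_K) >= need
```

A single black corner thus pulls the cube point into K under 26-adjacency but not under 18-adjacency, which is exactly where the two strategies diverge.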
6 Conclusion
We have introduced frontier orders, which make it possible to define the frontier of a discrete object. We have established that frontier orders are surfaces, which appears as a necessary property for the design of topologically sound Marching Cubes-like algorithms.

An extended version of this paper [17] will provide proofs for the properties stated in this article, as well as other important properties which, due to space limitations, have not been included. In particular, we proved that any simplicial complex which is an n-surface is an n-pseudomanifold. We will also show the link between frontier orders and regular neighborhoods [18]. The frontier order associated to a cortex segmentation is depicted in figure 8, and those interested in further images may find some at the following address: http://www.esiee.fr/~info/xavier/MC03_res.html.

Fig. 7. 1) Simplified configurations, according to the adjacency of the set of black points. As previously (in the H³ case), the initial configurations are the direct embedding of the frontier order (induced by the subdivision into 24 tetrahedra and the affectation strategy appropriate for the adjacency) into R³, while the simplified configurations are obtained from the initial ones through stellar and bi-stellar moves. 2) It may happen that, depending on the chosen adjacency, the same vertex configuration produces different frontier order (initial) configurations, which in turn produce the same simplified configuration.

Fig. 8. Results for a segmented cortex (in |H³|): a) using initial configurations; b) using simplified configurations.
References

1. Lorensen, W., Cline, H.: Marching cubes: a high resolution 3D surface construction algorithm. Computer Graphics 21 (1987) 163–169
2. Payne, B.A., Toga, A.W.: Surface mapping brain function on 3D models. IEEE Computer Graphics and Applications 10 (1990) 33–41
3. Cignoni, P., Ganovelli, F., Montani, C., Scopigno, R.: Reconstruction of topologically correct and adaptive trilinear isosurfaces. Computers and Graphics 24 (2000) 399–418
4. Delibasis, K.S., Matsopoulos, G.K., Mouravliansky, N.A., Nikita, K.S.: A novel and efficient implementation of the marching cubes algorithm. Computerized Medical Imaging and Graphics 25 (2001) 343–352
5. Zhou, C., Shu, R., Kankanhalli, M.S.: Handling small features in isosurface generation using marching cubes. Computers and Graphics 18 (1994) 845–848
6. Chan, S.L., Purisima, E.O.: A new tetrahedral tesselation scheme for isosurface generation. Computers and Graphics 22 (1998) 83–90
7. Lachaud, J.O.: Topologically defined iso-surfaces. Lecture Notes in Computer Science 1176 (1996) 245–256
8. Lachaud, J.O., Montanvert, A.: Continuous analogs of digital boundaries: A topological approach to iso-surfaces. Graphical Models 62 (2000) 129–164
9. Kong, T.Y., Rosenfeld, A.: Digital topology: Introduction and survey. Computer Vision, Graphics and Image Processing 48 (1989) 357–393
10. Daragon, X., Couprie, M., Bertrand, G.: Marching chains algorithm for Alexandroff-Khalimsky spaces. In: Vision Geometry XI. (2002) 51–62
11. Khalimsky, E.: On topologies of generalized segments. Soviet Mat. Doklady 10 (1969) 1508–1511
12. Bertrand, G.: New notions for discrete topology. In: DGCI'99. Volume 1568 of LNCS, Springer (1999) 218–228
13. Bertrand, G., Couprie, M.: A model for digital topology. In: DGCI'99. Volume 1568 of LNCS, Springer (1999) 229–241
14. Evako, A.V., Kopperman, R., Mukhin, Y.V.: Dimensional properties of graphs and digital spaces. Jour. of Math. Imaging and Vision 6 (1996) 109–119
15. Kopperman, R.: The Khalimsky line as a foundation for digital topology. In: Shape in Pictures. Volume 126 of NATO ASI Series F. (1994) 3–20
16. Lickorish, W.: Simplicial moves on complexes and manifolds. Geometry and Topology Monographs, Proceedings of the Kirbyfest 2 (1998) 229–320
17. Daragon, X., Couprie, M., Bertrand, G.: Discrete surfaces and frontier orders. (in preparation)
18. Hudson, J.: Piecewise Linear Topology. W.A. Benjamin Inc. (1969)
Towards an Invertible Euclidean Reconstruction of a Discrete Object

Rodolphe Breton¹, Isabelle Sivignon³, Florent Dupont², and Eric Andres¹

¹ Laboratoire IRCOM-SIC, Université de Poitiers, BP 30179, 86962 Futuroscope Chasseneuil Cedex, France
{andres,breton}@sic.sp2mi.univ-poitiers.fr, http://www.sic.sp2mi.univ-poitiers.fr
² Laboratoire LIRIS – FRE 2672 CNRS, Université Claude Bernard Lyon I, Bât. NAUTIBUS, 8, bd Niels Bohr, 69622 Villeurbanne Cedex, France
[email protected], http://liris.cnrs.fr
³ Laboratoire LIS, Domaine universitaire Grenoble, BP 46, 38402 St Martin d'Hères Cedex, France
[email protected], http://www.lis.inpg.fr
Abstract. An invertible Euclidean reconstruction method for a 2D curve is proposed. Hints on an extension to 3D are provided. The framework of this method is discrete analytical geometry. The reconstruction result is more compact than that of classical methods such as the Marching Cubes. The notions of discrete cusps and patches are introduced.

Keywords: Discrete object, invertible Euclidean reconstruction.
1 Introduction
The reconstruction of discrete objects is mainly performed in practice with the "Marching Cubes" method [1] (and all its follow-ups). For a couple of years, another approach, based on discrete analytical geometry, has been investigated in the discrete geometry community. The aim is to decompose the boundary of a discrete object into discrete analytical polygons, and then these polygons into Euclidean polygons. The method has to be invertible, i.e. the discretization of the reconstructed boundary has to be equal to the original discrete object. We do not want any information to be added or lost. The aim of this new approach is to provide a more compact reconstruction. Several other attempts have already been made in this direction that are not satisfactory and usually not invertible (see [2] for details). Our method is based on Vittone's recognition algorithm for the decomposition of the discrete boundary into discrete line pieces in 2D and discrete plane pieces in 3D. The analytical framework is provided by the standard discrete analytical model, which defines 2D and 3D discrete polygons [3]. A working solution in 2D and indications on how to tackle the 3D case are proposed. The method works basically as follows: a discrete boundary is decomposed with Vittone's algorithm [4] into discrete line pieces in 2D (resp. discrete plane pieces in 3D). The result of Vittone's algorithm is adapted to the standard analytical model as it is, for the moment, the only suitable discrete analytical model [3]. The reconstruction process is guided by so-called discrete cusps, in order to propose a reconstruction that better fits a "common sense" reconstruction. A Euclidean line (resp. 3D plane) candidate is chosen among all the possible solutions. This is done for each discrete line piece (resp. 3D plane piece). All these 2D lines (resp. 3D planes) form a Euclidean 2D polygon (resp. 3D polyhedron). The discretization of this Euclidean object (2D polygon or 3D polyhedron) is not necessarily equal to the boundary of the discrete object, but is usually larger. In 2D, in order to avoid this problem and provide the invertibility property, patches are introduced. In 3D, the problem is more difficult and not completely solved so far. Not only the vertices but also the 3D edges of the polyhedron can be located outside the discrete object. Several hints are given on how to solve these problems, especially with convex and non-convex discrete objects. In section 2, a new discrete curve reconstruction method is provided. Notions such as discrete cusps and patches are introduced. In section 3, the 3D case is examined. The convex and non-convex cases are studied and hints on solutions are given. We conclude in section 4 with some perspectives.

I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 246–256, 2003. © Springer-Verlag Berlin Heidelberg 2003

Brief recall on the standard model. The standard digitization of a Euclidean object consists in all the pixels (resp. voxels) that are cut by the object. The standard lines (resp. planes) can be defined arithmetically: a discrete standard line (resp. plane) of parameters (a, b, µ) (resp. (a, b, c, µ)) is the set of integer points (x, y) (resp. (x, y, z)) verifying −ω ≤ ax + by (resp. + cz) + µ < ω, where ω = (|a| + |b| (resp. + |c|))/2. A standard line (resp. plane) is a 4-connected line (resp. 6-connected plane).
If we denote by St(O) the standard digitization of the object O, the following useful properties can be derived from the geometrical definition of this model: St(O1 ∩ O2) ⊆ St(O1) ∩ St(O2) and St(O1 ∪ O2) = St(O1) ∪ St(O2).
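The arithmetical definition above translates directly into code. Below is a minimal sketch (function and variable names are ours, not from the paper) of the standard-line membership test, used to enumerate the pixels of a small standard line and check its 4-connectivity:

```python
def in_standard_line(x, y, a, b, mu):
    """Standard line (a, b, mu): -omega <= a*x + b*y + mu < omega,
    with omega = (|a| + |b|) / 2."""
    omega = (abs(a) + abs(b)) / 2
    return -omega <= a * x + b * y + mu < omega

# Pixels of the standard line (2, -3, 0) inside an 8x8 window.
pixels = {(x, y) for x in range(8) for y in range(8)
          if in_standard_line(x, y, 2, -3, 0)}

# Every pixel of the digitization has a 4-neighbour in it: a 4-connected curve.
assert all(any(n in pixels
               for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)))
           for x, y in pixels)
```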
2 Reconstruction of a 2D Discrete Curve

2.1 Principle
We consider here 4-connected curves. To reconstruct a discrete curve, we first choose a point on that curve, recognize a discrete straight-line segment and then repeat this process along the curve. The recognition algorithm used was developed by Vittone [5]. For a given discrete edge, it provides the set of all corresponding Euclidean straight lines as a polygon in a parameter space (well studied by Veelaert in [6]). The standard discretization [7] [3] of any of these Euclidean lines contains the original discrete edge. It has been proven that the set of solutions is a (3 or 4)-vertex convex polygon (see [8]) in the (α, β) parameter space P, and can only have one of the five shapes illustrated in fig. 1. A Euclidean straight line y = αx + β, in the Cartesian space C, corresponds to a point (α, β) in P. Thus, the three (resp. four) vertices of the solution set correspond to three (resp. four) Euclidean straight lines in C. We chose one
Rodolphe Breton et al.
Fig. 1. The 5 possible shapes of the solution set and in each case, the chosen solution.
Fig. 2. Example of discrete cusps and Euclidean solutions. In (a), a regular case. In (b), addition of a patch.
particular line as a solution and called it the median solution. This seems to be a reasonable choice, as illustrated in fig. 1. This figure shows the median solution in P and C for each possible shape of the set. Prior to the recognition process, we look for remarkable points on the discrete curve. We call those points discrete cusps and define them as follows: a point of a discrete curve is a discrete cusp iff the segment composed of this point, the next two points and the previous two points is not a discrete segment. We use the Freeman code to determine whether or not such a 5-pixel set is a discrete segment. Fig. 2 shows an example of discrete cusps. These cusps act like "anchors" and help us to adjust the segments' extremities: during the recognition of a discrete segment, we preferably begin (and end) a segment on a discrete cusp. Starting Point. If there are cusps, we choose as a starting point of the algorithm the cusp with the smallest x-coordinate and then with the smallest y-coordinate. If there are no cusps on the curve, we choose a regular point that fits the same conditions. This choice ensures the uniqueness of the process. We then proceed with the recognition of the curve counterclockwise.
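The cusp test can be illustrated with a naive stand-in for the segment check (this is our own brute force over small coefficients with a real translation constant µ, not the Freeman-code criterion used in the paper):

```python
def is_discrete_segment(points, max_coef=10):
    """Brute-force stand-in for a digital straight segment test: the points
    fit in some standard line (a, b, mu) iff the values a*x + b*y span
    strictly less than 2*omega = |a| + |b| (mu can then be chosen freely)."""
    for a in range(-max_coef, max_coef + 1):
        for b in range(-max_coef, max_coef + 1):
            if (a, b) == (0, 0):
                continue
            vals = [a * x + b * y for x, y in points]
            if max(vals) - min(vals) < abs(a) + abs(b):
                return True
    return False

def is_cusp(curve, i):
    """Discrete cusp: the 5-pixel window centred on curve[i] is not a
    discrete segment (the window needs two points on each side)."""
    if i < 2 or i > len(curve) - 3:
        return False
    return not is_discrete_segment(curve[i - 2:i + 3])

# A right-angle corner: the corner pixel (2, 0) is a cusp, its successor is not.
corner = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3), (2, 4)]
```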
2.2 Details on the Reconstruction
Before going on, we have to introduce some useful notations: pi denotes the i-th pixel of a curve and sk is the k-th segment of the polygonalized curve. After running Vittone's algorithm, for each discrete segment found we obtain an equivalence class of all the lines that match this discrete segment, and we choose the median line as a solution (see fig. 2). Then, we have to handle the intersections between those Euclidean lines. The simplest case occurs when two lines intersect in a pixel which belongs to the two corresponding discrete segments sk and sk+1 (see fig. 2 (a)). In this section, we explain the different cases we face during the reconstruction.
Fig. 3. (a) addition of a patch. (b) smoothing of this patch.
Patch. Our first problem is the intersection of two lines outside a pixel of the curve, or even the non-intersection of two lines. As we must "constrain" the Euclidean curve inside the discrete curve, we decide to add a little patch to join the lines together (see fig. 2 (b)). But in some cases, adding a patch causes undesired visual results, as shown in fig. 3 (a). We soften this patch by extending it to the neighbouring pixels, as illustrated in fig. 3 (b). In order to reduce the number of patches, we allow two solution lines to intersect in a 3-pixel long area, that is, the pixel common to the two discrete segments and its two neighbours. This little trick still allows reversibility. Post-process Patch Removal. Sometimes, we can get rid of a patch thanks to a second pass of the recognition algorithm in the opposite direction. In fig. 4 (a), we see the result of a first reconstruction. As the two solution lines do not intersect in the permitted intersection area, we normally should add a patch. But a second recognition, in the opposite direction, leads to (b) and a valid intersection. So, we eventually end up with the result (c).
2.3 The Algorithm
Initialization:
– we consider a discrete curve, i.e. a sorted sequence of n pixels: p1 . . . pn
– the cusps of the curve are determined
Fig. 4. Patch removal thanks to a reverse recognition.
Step 1: Recognition
– sk denotes the current segment (at first k = 1)
– pi denotes the current pixel (at first i = 2)
– we use Vittone's algorithm to recognize a discrete segment:
• we insert pixel pi in sk
• if this extended sk is still a discrete segment, we go on: i = i + 1
• else sk ends on pi−1 and either pi−1 or pi−2 becomes the starting point of the new segment: i = i − 1 (or i = i − 2) and k = k + 1
– until we reach the last pixel (i = n)
– in the case of a closed curve we carry on the recognition until we meet a cusp, and then we possibly merge the last and the first segments
– at this point, the curve is entirely recognized and split into k discrete segments, each one linked to a coset of Euclidean solutions in the parameter space

Step 2: Reconstruction
– for each coset of solutions, we choose the median line dk
– we must now create the Euclidean segments that are contained in dk
– so, we set the first extremity of the first Euclidean segment r1 (a point on d1 that belongs to p1, the first pixel of the curve)
– then, we enter a loop through the lines dk:
• if dk (segment sk = [pa, pb]) and dk+1 (segment sk+1 = [pb, pc]) intersect in pb, pb−1 or pb+1
• then∗, this intersection point becomes the second extremity of rk and the first one of rk+1
• else (intersection outside or no intersection), we launch another recognition between pc and pa, which can lead to two cases: either we still have the same two segments sk and sk+1, so the patch is unavoidable, and the second extremity of rk is the first vertex of the patch while the first extremity of rk+1 is its second vertex; or sk+1 has been extended and the intersection between dk and the new solution line allows us to avoid the patch, so we go back to the regular case (see ∗)
– we eventually have a sequence of Euclidean segments rk (each one defined by two Euclidean points), and this sequence forms a polygonal line whose discretization perfectly matches the starting
discrete curve
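Step 1 can be sketched as a greedy loop. In the sketch below (our own simplification), `fits_standard_line` is a naive brute-force stand-in for Vittone's incremental recognition, and the cusp handling is omitted; consecutive segments share their junction pixel, as in the paper:

```python
def fits_standard_line(points, max_coef=10):
    """Naive segment test: some (a, b) with small coefficients keeps the
    values a*x + b*y inside a window of width |a| + |b|."""
    for a in range(-max_coef, max_coef + 1):
        for b in range(-max_coef, max_coef + 1):
            if (a, b) != (0, 0):
                vals = [a * x + b * y for x, y in points]
                if max(vals) - min(vals) < abs(a) + abs(b):
                    return True
    return False

def recognize_segments(curve, fits_one_line):
    """Greedy split of a 4-connected curve into discrete segments."""
    segments, start, i = [], 0, 2
    while i < len(curve):
        if fits_one_line(curve[start:i + 1]):
            i += 1                            # extend the current segment s_k
        else:
            segments.append(curve[start:i])   # s_k ends on p_{i-1}
            start = i - 1                     # p_{i-1} starts s_{k+1}
            i = start + 2
    segments.append(curve[start:])
    return segments

# A right-angle corner splits into two segments sharing the pixel (2, 1).
corner = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3), (2, 4)]
segments = recognize_segments(corner, fits_standard_line)
```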
3 Discrete Object Surface Polygonalization
In this section, we present the problem for 3D discrete volumes. We point out the type of difficulties we encounter and give some indications on the possible solutions.
3.1 Discrete Surface Segmentation
We consider an 18-connected discrete volume and its surface, defined as the set of object voxels sharing one face with the background. Since discrete naive planes are the thinnest 18-connected discrete planes without 6-connected holes [9], they are well adapted for a segmentation of an object surface. Arithmetically, a discrete naive plane of parameters (a, b, c, µ) is the set of integer points (x, y, z) fulfilling the condition 0 ≤ ax + by + cz + µ < max(|a|, |b|, |c|). We use, as in 2D, a discrete naive plane recognition algorithm proposed by Vittone [10] in 3D. For a given discrete plane, it provides the set of all corresponding Euclidean planes as a polyhedron in a parameter space. The standard discretization of any of these Euclidean planes contains the original discrete plane. Consider a discrete point (x0, y0, z0) and the parameter space (α, β, γ) where a point (α0, β0, γ0) stands for the plane α0x + β0y + z + γ0 = 0. The discrete point corresponds to a double constraint defined by the double inequality 0 ≤ αx0 + βy0 + z0 + γ < 1 in the parameter space. Hence, the recognition algorithm adds the voxels one by one, reducing the solution set in the parameter space according to the corresponding double inequality. Figure 5 gives an illustration of a piece of plane and the corresponding set of solutions in the parameter space.
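The arithmetical test and the height function of a naive plane can be sketched as follows (helper names are ours; the plane (1, 3, −5, 0) is the one of Fig. 5):

```python
def in_naive_plane(x, y, z, a, b, c, mu):
    """Naive plane (a, b, c, mu): 0 <= a*x + b*y + c*z + mu < max(|a|,|b|,|c|)."""
    return 0 <= a * x + b * y + c * z + mu < max(abs(a), abs(b), abs(c))

# For the plane (1, 3, -5, 0) the height is functional in (x, y):
# x + 3*y - 5*z in [0, 5)  <=>  z = floor((x + 3*y) / 5).
voxels = [(x, y, (x + 3 * y) // 5) for x in range(6) for y in range(6)]
```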
Fig. 5. A piece of the discrete naive plane (1, 3, −5, 0) and the corresponding set of solutions in the parameter space.
We proposed in [11] a discrete surface segmentation based on this algorithm. We will not describe this algorithm precisely but just give some hints. The general idea is to propose a coplanarity test ensuring a "regular shape" for the recognized plane pieces. To do so, we use a local configuration of discrete planes called a tricube. Let P be a discrete plane in the first quadrant. Then, a tricube is a set of 9 voxels of P such that the projection of those voxels onto the plane (x, y)
is a 3 × 3 square: T(i, j) = {(x, y, z) ∈ P | i ≤ x < i + 3, j ≤ y < j + 3}. There exist 40 different tricubes [12–14] and it has been shown that any discrete plane can be built using tricubes. In our algorithm, we impose that any voxel of a plane piece belongs to a tricube of this plane, which means that at least 3 out of the 8 neighbours of any voxel of a plane piece P belong to P. Moreover, we allow planes to overlap in order to avoid tiny plane pieces as much as possible. The pieces of planes recognized may contain holes, which can be removed by splitting the pieces around the holes. Hence, the result of the algorithm is a labelling of the voxel faces with discrete plane piece numbers.
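As a toy illustration of tricubes (our helper names, and reusing the functional form z = ⌊(x + 3y)/5⌋ of the naive plane (1, 3, −5, 0) from Fig. 5), each tricube can be encoded as a 3 × 3 patch of heights normalised by its corner height; a single plane only exhibits a few of the 40 possible patterns:

```python
def tricube(zf, i, j):
    """3x3 pattern of plane heights over [i, i+3) x [j, j+3),
    normalised by the height at the corner (i, j)."""
    z0 = zf(i, j)
    return tuple(zf(i + di, j + dj) - z0 for dj in range(3) for di in range(3))

zf = lambda x, y: (x + 3 * y) // 5        # height of the naive plane (1, 3, -5, 0)
patterns = {tricube(zf, i, j) for i in range(30) for j in range(30)}
```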
3.2 Use of the Standard Model
After the discrete surface segmentation, we need to define discrete polygons on this surface in order to get a polygonal reversible surface. This implies the definition of vertices and edges and thus the study of the discrete plane intersections. Unfortunately, naive planes, which were well adapted for the segmentation step, do not have the geometrical consistency properties needed to define discrete edges and vertices. To solve this problem, we choose to switch to another model, called the standard model, already presented briefly for lines in the introduction. We use the connectivity characteristics of naive and standard planes to add, to the naive plane pieces, the voxels needed to get standard planes. As we do not want to add information to the initial object, we must add those voxels inside the object. If we look at the arithmetical definitions of naive and standard planes, this means that we only add voxels (x, y, z) which satisfy −(|a| + |b| + |c| − max(|a|, |b|, |c|)) ≤ ax + by + cz + µ < 0 and which lie "under" a surface voxel of the considered plane piece. Once we have done this transformation, we need to move the set of solutions in the parameter space, in order to fit the definition of standard plane we gave. Consider a point (a, b, c, µ) of the parameter space, a solution for the piece of naive plane P. Then, the point (a, b, c, µ + (|a| + |b| + |c| − 2 max(|a|, |b|, |c|))/2) is a solution for the standard plane defined by the previously given transformation.
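The parameter shift can be checked on residues: the naive voxels have residues r = ax + by + cz + µ in [0, M) and the added voxels in [−(S − M), 0), with S = |a| + |b| + |c| and M = max(|a|, |b|, |c|); after shifting µ by (S − 2M)/2, every residue falls in the standard range [−S/2, S/2). A small sanity check (our code, not from the paper):

```python
def naive_to_standard_ok(a, b, c):
    """Check that integer residues of the naive plane ([0, M)) plus those of
    the added voxels ([-(S - M), 0)) all fall in [-S/2, S/2) once shifted
    by (S - 2M)/2."""
    S = abs(a) + abs(b) + abs(c)
    M = max(abs(a), abs(b), abs(c))
    shift = (S - 2 * M) / 2
    residues = range(-(S - M), M)    # added voxels, then naive voxels
    return all(-S / 2 <= r + shift < S / 2 for r in residues)
```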
3.3 From a Discrete Surface to a Polygonal Surface
We have shown how to get a segmentation of a discrete surface into pieces of standard planes. In the following, we show how to get a polygonal surface for convex objects, and give some hints on the problems encountered for non-convex objects. First Approach for Convex Objects. For each piece of discrete plane of the segmentation, we know the whole set of solutions in the parameter space. Thus, one can choose a solution for each piece of plane, and the intersection of all those half-spaces is a polygonal approximation of the object surface. Figure 6(a) gives the result we get with such a solution for a discrete sphere of radius 20.
Fig. 6. Some examples on convex discrete volumes.
This solution is, however, usually not a reversible one. Figure 6(b) shows an example where some of the reconstructed edges and vertices are outside the discrete volume. Thus, the standard digitization of this polygonal surface contains more voxels than the original volume. This is exactly the same type of problem we discussed and solved by adding patches for discrete curves. In 3D, such patches are more difficult to define, but a solution would be to run the discrete plane recognition algorithm at the surface locations where the polygonal surface goes through the discrete object. This new plane would give the needed patch, as shown in figure 6(c).
General Case and Specific Problems. Solving the reversibility problems is a second step after the construction of a polygonal surface. The half-space intersection method presented above cannot work on non-convex volumes. In order to reconstruct a polygonal surface from the segmentation for any object, we propose a face-by-face construction. Moreover, this allows us to control the position of edges and vertices, as we calculate them one by one. The general algorithm we propose is shown in Algorithm 1.
Algorithm 1 Construction of a polygonal surface
Polygonal_Surface(S)
1: For each piece of discrete plane of S, choose a Euclidean solution.
2: Let p be a piece of discrete plane, and E(p) the Euclidean solution chosen.
– track the 6-connected border of p, numbering its neighbour planes pi, 0 ≤ i < n, n ≥ 3;
– [edges] for all i, compute Li = E(p) ∩ E(pi);
– [vertices] for all i, compute Li ∩ Li+1.
3: Repeat for each pi, 0 ≤ i < n, until each discrete plane has been treated.
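The vertex computation Li ∩ Li+1 amounts to intersecting three Euclidean planes. A minimal sketch (our helpers, not the paper's; planes are given as tuples (a, b, c, d) with ax + by + cz + d = 0, and the 3 × 3 system is solved by Cramer's rule):

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def plane_vertex(p1, p2, p3, eps=1e-12):
    """Common point of three planes a*x + b*y + c*z + d = 0, or None
    when the planes are (near-)degenerate."""
    A = [list(p[:3]) for p in (p1, p2, p3)]
    rhs = [-p[3] for p in (p1, p2, p3)]
    D = det3(A)
    if abs(D) < eps:
        return None
    # Cramer's rule: replace column k of A by the right-hand side.
    return tuple(det3([row[:k] + [rhs[i]] + row[k + 1:]
                       for i, row in enumerate(A)]) / D
                 for k in range(3))
```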
From the face-by-face construction, we derive that this very simple algorithm is valid for convex and non-convex objects. Nevertheless, the discrete structure of the volume induces many problems. Let us look at this algorithm step by step. The first important step is to track the border of each piece of plane in order to get an order on the plane neighbours. This step highly depends on the segmentation we get. Indeed, the segmentation algorithm we proposed allows planes to overlap, and this leads to many neighbourhood relationships between discrete planes, whatever neighbourhood definition we use. It is sometimes impossible to get an order on the neighbours which is consistent with the construction of a polygonal face. We tried other strategies to get rid of this problem, the underlying idea always being the suppression of useless neighbourhood relationships. Algorithm 2 describes the solution we propose to compute the neighbourhoods.
Algorithm 2 Neighbourhood calculation
Neighbours()
1: Apply the segmentation algorithm allowing only one piece of discrete plane for each voxel: the voxels already labelled by another plane piece are added to the current plane but not labelled.
2: Compute the 4-connected border B(p) of the projection of each piece of plane p.
3: Order the neighbour planes of each p tracking B(p): two planes p1 and p2 are neighbours when there exist v1 ∈ p1 and v2 ∈ p2 such that v1 and v2 are 18-neighbours.
4: For each plane piece, label the voxels that were added but not labelled during step 1.
With Algorithm 2, we use the minimal plane number to compute the neighbourhood relationships, but finally get the same pieces of planes as before. This method gives good neighbourhood relationships most of the time, but needs to be improved, because the order in the plane segmentation has an influence on the result we get. The next and last problem of Algorithm 1 occurs during the vertex calculation, when one vertex should be the intersection of more than three planes. For instance, let us consider a vertex that should be the intersection of four planes p0, p1, p2 and p3. This vertex is computed four times, once for each polygon, and we denote them α0 = p0 ∩ p1 ∩ p2, α1 = p0 ∩ p1 ∩ p3, α2 = p0 ∩ p2 ∩ p3 and α3 = p1 ∩ p2 ∩ p3. Figure 7 illustrates this situation. Those four vertices are either coincident or all different. Thus, either we get one point or four. Moreover, in the case of four points, they cannot be coplanar. In the case of four different points, we need to make some changes in order to get a surface. For instance, if at least one of the αi is outside the discrete object, then we need to add a patch. Another case is when the four vertices belong to the same voxel: then we can delete some of those vertices or add some small triangular faces. Otherwise, the four vertices are inside the object but do not lie in the same voxel. This case may be very tricky, and the simplest way to
Fig. 7. The multiple vertices problem: four planes and four different vertices. The polygonal faces computed are drawn with dashed lines.
Fig. 8. Illustration of the different steps for the reconstruction of a polygon.
solve the problem is probably to try to recognize a new piece of digital plane with the voxels containing the vertices αi . Figure 8 illustrates the whole process described in this section: on the left, a digital piece of plane P : the 4-connected border of its projection is represented by a polygonal line, and the labels of the neighbour voxels are depicted; on the right, an illustration of the reconstructed polygon from the neighbour planes.
4 Conclusions and Future Work
In this paper we described a framework to find a polygonal curve (resp. surface in 3D) from a discrete curve (resp. surface in 3D) with an invertible method. In 2D, a new algorithm has been developed to vectorize a discrete curve. We first introduce some remarkable points called discrete cusps and use Vittone's algorithm for line recognition. The addition of patches makes it possible to keep the Euclidean curve inside the discrete curve. Then a post-processing stage removes patches in order to give a visually correct result. In 3D, a solution has been presented for convex objects, which is for the moment not reversible. We have also proposed a general algorithm to construct a polygonal surface, based on Vittone's algorithm and a face-by-face neighbourhood calculation. We have pointed out the main problems encountered to find neighbourhood relationships
and have proposed some solutions. In future work, improvements have to be made in order to keep the Euclidean surface inside the object, even in the identified particular cases.
References
1. Lorensen, W., Cline, H.: Marching cubes: a high resolution 3D surface construction algorithm. In: SIGGRAPH '87, Computer Graphics J. Volume 21, Anaheim, USA (1987) 163–169
2. Cœurjolly, D.: Algorithmique et géométrie discrète pour la caractérisation des courbes et des surfaces. PhD thesis, Université Lumière, Lyon 2, France (2002)
3. Andres, E.: Discrete linear objects in dimension n: the standard model. Graphical Models (2003) (To appear)
4. Vittone, J., Chassery, J.M.: (n − m)-cubes and Farey nets for naive plane understanding. In: 8th Int. Workshop on Discrete Geometry for Computer Imagery. Volume 1568, Marne-la-Vallée, France (1999) 76–87
5. Vittone, J.: Caractérisation et reconnaissance de droites et de plans en géométrie discrète. PhD thesis, Université Joseph Fourier - Grenoble 1, France (1999)
6. Veelaert, P.: Geometric constructions in the digital plane. Journal of Mathematical Imaging and Vision 11 (1999) 99–118
7. Andres, E.: Defining discrete objects for polygonalization: the standard model. In: Braquelaire, A., Lachaud, J.O., Vialard, A., eds.: Discrete Geometry for Computer Imagery 2002. Volume 2301 of Lecture Notes in Computer Science, Bordeaux, France, Springer (2002) 313–325
8. Lindenbaum, M., Bruckstein, A.: On recursive, O(n) partitioning of a digitized curve into digital straight segments. IEEE Transactions on Pattern Analysis and Machine Intelligence 15 (1993) 949–953
9. Andres, E., Acharya, R., Sibata, C.: Discrete analytical hyperplanes. Graphical Models and Image Processing 59 (1997) 302–309
10. Vittone, J., Chassery, J.M.: Recognition of digital naive planes and polyhedrization. In: Discrete Geometry for Computer Imagery. Volume 1953 of LNCS, Springer-Verlag (2000) 296–307
11. Sivignon, I., Dupont, F., Chassery, J.M.: Decomposition of a 3D discrete object surface into discrete plane pieces. Algorithmica, Special Issue on Shapes Algorithmics (To appear)
12. Debled-Rennesson, I.: Étude et reconnaissance des droites et plans discrets. PhD thesis, Université Louis Pasteur, Strasbourg, France (1995)
13. Schramm, J.: Coplanar tricubes. In: Ahronovitz, E., Fiorio, C., eds.: Discrete Geometry for Computer Imagery. Volume 1347 of LNCS, Springer-Verlag (1997) 87–98
14. Vittone, J., Chassery, J.M.: Coexistence of tricubes in digital naive plane. In: Discrete Geometry for Computer Imagery. Volume 1347 of LNCS, Springer-Verlag (1997) 99–110
Reconstruction of Discrete Surfaces from Shading Images by Propagation of Geometric Features

Achille Braquelaire and Bertrand Kerautret
LaBRI, Laboratoire Bordelais de Recherche en Informatique, UMR 5800, Université Bordeaux 1, 351, cours de la Libération, 33405 Talence, France
{achille,kerautre}@labri.fr
Abstract. This paper describes two new methods for the reconstruction of discrete surfaces from shading images. Both approaches are based on the reconstruction of a discrete surface by mixing photometric and geometric techniques. The processing of photometric information is based on reflectance maps, which are classic tools of shape from shading. The geometric features are extracted from the discrete surface and propagated along the surface. The propagation is based in one case on equal height discrete contour propagation and in the other case on region propagation. Both methods allow photometric stereo. Results of reconstruction from synthetic and real images are presented.
Keywords: Computer vision; Shape from shading; Discrete surface; Discrete normal.
1 Introduction
Shape recovery is an important domain of computer vision, whose aim is to reconstruct a surface from 2D images of this surface. In general we consider only topographic surfaces S defined by z = Z(x, y). The human visual system may combine different kinds of information in order to perform such a reconstruction, such as shading, focus, or stereo information. But the combination of this information is not trivial, and the methods developed in computer vision are generally based on the processing of one kind of data: shading, shadows, motion, stereo-vision, defocus. In this paper we address the problem of shape from shading, which consists in using shading information to retrieve the normals to the surface and thus its shape. This approach was introduced in 1970 by Horn [3], and many different methods have since been proposed (see Zhang et al. [13] for a comprehensive survey). The main difficulty of shape from shading is that, for a given light source direction, a gray level may correspond to many different orientations of the surface normal. The possible surface orientations for each intensity are usually represented by a map called a reflectance map. Four approaches have been proposed:

I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 257–266, 2003. © Springer-Verlag Berlin Heidelberg 2003
– The global minimization approaches, the principle of which is to minimize a global energy function. Usually this energy measures a difference between the image intensity and the intensity calculated from the reconstructed surface. Additional constraints like smoothness or integrability of the surface are often used (see for example Ikeuchi and Horn [5] and Frankot and Chellappa [2]).
– The local derivative approaches, the principle of which is to try to recover the shape information from the intensity image and its derivatives. For example, Lee and Rosenfeld [7] compute the normal vector to the surface by using the first derivative of the intensity.
– The linear approaches, based on the linearization of the reflectance map (see Tsai and Shah [11]).
– The propagation approaches, which consist in propagating shape information from singular points. The first shape from shading technique introduced by Horn was a propagation approach with a reconstruction based on the extension of characteristic strips [3].

In some areas of the image the reconstruction of the surface may be altered by self-shadows or interreflection. Self-shadows depend both on the orientation of the surface and on the position of the light source. Thus the reconstruction may be improved by considering more than one image of the surface, for instance with several different light source directions. This technique, called photometric stereo, was introduced by Woodham [12] for the reconstruction of the surface gradient from several images and is suited to minimization methods (for example, Saito proposed a method based on this principle for recovering the skin surface [8]). In this work we develop methods based on the processing of geometric features extracted from the explicit reconstruction of the discrete surface. The reconstruction of the surface is based on the propagation of geometrical features related to equal height contours or regions.
The propagation of equal height information, called level sets, was introduced by Kimmel and Bruckstein in 1992 [6]. A closed curve was initialized in the areas of singular points and propagated according to the light source direction. The evolution of the parametric curve was solved via a Eulerian formulation. Propagating in the direction of the light source makes it possible to resolve the ambiguities and to choose between multiple solutions. But this restriction is not compatible with the use of several light sources. In the following we propose two methods of shape from shading based on the local estimation of the discrete surface normal and the propagation of height information. Both methods can be used with several images in order to improve the reconstruction. We have tested these methods both on computer-generated images and on real images provided by archaeologists. The real images we use are photos of carved stone and of resin casts of small archaeological objects. In both cases it is possible to make the hypothesis that the surface to reconstruct is a Lambertian surface. A Lambertian surface is a surface with only diffuse
reflectance. The intensity only depends on the normal vector and on the light source direction, and thus does not depend on the position of the observer. In section 2 we briefly recall the definition of the reflectance map introduced by Horn in 1977 [4] to describe the luminance of a Lambertian surface as a function of the orientation of the surface normal. In section 3 we present a shape from shading method based on the propagation of equal height contours, and in section 4 a method based on the propagation of regions. Some results on synthetic and real images are presented in section 5.
2 Reflectance Map
It is convenient to choose a coordinate system (O, x, y, z) where the z axis is in the direction of the observer. A direction is represented by a pair of angles (θ, φ) where φ is a horizontal component and θ a vertical component (see Fig. 1-a). Moreover, if we denote by p and q the partial derivatives of the height of the surface Z(x, y) in the directions of x and y (p = ∂Z(x, y)/∂x and q = ∂Z(x, y)/∂y), we have:

p = −cos φ tan θ and q = −sin φ tan θ   (1)

where (θ, φ) is the direction of the normal at the point (x, y). Given a light source of direction (ps, qs) and a surface Z(x, y), it is convenient to describe the intensity of the surface by a function R(p, q) of the orientation of the normal. This function is called a reflectance map. For a Lambertian surface the reflectance map is given by:

R(p, q) = (1 + p·ps + q·qs) / (√(1 + p² + q²) · √(1 + ps² + qs²))   (2)

Given an image intensity Li, the possible orientations of the surface at this point are given by the reflectance equation:

R(p, q) = Li   (3)
By substituting p and q by φ and θ in Equ. (3) we get:

(1 − ω tan θ) / √(1 + tan² θ) = K   (4)

with K = Li √(1 + ps² + qs²) and ω = ps cos φ + qs sin φ. The problem of shape from shading is to define strategies to select the best normal orientation among all the possible solutions given by the reflectance map.
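Equ. (2) and the substitution behind Equ. (4) are easy to check numerically; a small sketch (variable names are ours):

```python
from math import sqrt, cos, sin, tan

def reflectance(p, q, ps, qs):
    """Lambertian reflectance map R(p, q) of Equ. (2)."""
    return ((1 + p * ps + q * qs)
            / (sqrt(1 + p * p + q * q) * sqrt(1 + ps * ps + qs * qs)))

# With p = -cos(phi)*tan(theta) and q = -sin(phi)*tan(theta), the quantity
# (1 - w*tan(theta)) / sqrt(1 + tan(theta)^2) equals K = Li*sqrt(1+ps^2+qs^2).
phi, theta, ps, qs = 0.3, 0.5, 0.2, -0.4
p, q = -cos(phi) * tan(theta), -sin(phi) * tan(theta)
Li = reflectance(p, q, ps, qs)
w = ps * cos(phi) + qs * sin(phi)
K = Li * sqrt(1 + ps * ps + qs * qs)
```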
3 Contour Based Approach
If we suppose the horizontal angle φ of a normal (p, q) to be known, we can derive an expression of the vertical angle from Equ. (4), and we get only two possible solutions:

θ = arctan( (−ω ± √(−K⁴ + K² + ω²K²)) / (K² − ω²) )   (5)
The related partial derivatives are given by:

p₁ = −cos φ · (−ω + √(−K⁴ + K² + ω²K²)) / (K² − ω²)
q₁ = −sin φ · (−ω + √(−K⁴ + K² + ω²K²)) / (K² − ω²)
p₂ = −cos φ · (−ω − √(−K⁴ + K² + ω²K²)) / (K² − ω²)
q₂ = −sin φ · (−ω − √(−K⁴ + K² + ω²K²)) / (K² − ω²)

This result is illustrated by Fig. 1-b, which shows a reflectance map as a function of φ and θ. The level displayed in black corresponds to the intensity of the point whose normal we want to reconstruct. If we consider a pixel Q of the gray level image for which the horizontal component φQ of the reconstructed normal is known, the two possible values for θQ are given by the points Q and Q′ of the reflectance map. If we make the hypothesis that the surface to reconstruct is continuous, we can choose between the two solutions by selecting the one which is the closest to the normal of a known neighbor. For example, let us consider a pixel P1 of the gray level image for which the normal of the reconstructed surface is known, and let P2 be a neighbor of P1 in the image. If the horizontal component of the normal of the reconstructed surface at P2 is known, there are only two solutions for the vertical component. We select the solution which is the closest to the normal at P1. On the running example, the solution for θP2 is given by the point P2 of the reflectance map (see Fig. 1-b).
Fig. 1. The angles θ and φ associated with a normal (a) (here the normal is also the light source direction), and the two solutions corresponding to a value of the horizontal angle φ (b).
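Equ. (5) gives the two candidate vertical angles directly; a sketch (our function, not from the paper; it returns both candidates, and an empty list when the discriminant is negative, i.e. when no orientation matches the intensity):

```python
from math import atan, sqrt, cos, sin, tan

def vertical_angles(K, w, eps=1e-12):
    """Two candidates theta = arctan((-w +/- sqrt(-K^4 + K^2 + w^2*K^2))
    / (K^2 - w^2)), following Equ. (5)."""
    disc = -K ** 4 + K ** 2 + w * w * K * K
    den = K * K - w * w
    if disc < 0 or abs(den) < eps:
        return []
    return [atan((-w + sqrt(disc)) / den), atan((-w - sqrt(disc)) / den)]
```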
Let us now consider a discrete surface Z(x, y) and a discrete contour Γ on this surface. The contour Γ is a sequence of points (x1, y1, z1) . . . (xk, yk, zk) such that the sequence γ = (x1, y1) . . . (xk, yk) is a connected path of the discrete plane according to a connectivity relation. Let us also suppose that all the points of Γ have the same height z. Thus we have Γ = (x1, y1, z) . . . (xk, yk, z), and the horizontal component φ of the normal vector at a point P of Γ is also the normal at the related point of the discrete 2D path γ. Thus we have decomposed the problem of determining the normal into two subproblems:
1. The determination of the horizontal component of the normal by estimation of the discrete normals of a 2D discrete path. We use the discrete tangent estimator of the Euclidean paths model [1].
2. The determination of the vertical component of the normal by selecting in the reflectance map one of the two possible solutions.

Suppose now that we have determined the normal of the reconstructed surface for each point of the equal height contour. We thus have an estimation of the local derivatives (p, q) at each point of the contour, and we use it to calculate the height of neighboring points. Fig. 2-a shows how this information is propagated from an outer contour (the light gray points), for which heights and normals are known, to an inner adjacent contour (the dark gray points). Each dark gray point is 4-adjacent to at least one light gray point. Let P, of coordinates (i, j), be a point whose height has to be calculated and N(P) the set of 4-neighbors of P for which the height and the normal are known. The propagation of the height to adjacent points is computed according to the following formula:

Z(P) = \frac{1}{|N(P)|} \sum_{P' \in N(P)} \big( Z(P') + (i - i') \, p_{P'} + (j - j') \, q_{P'} \big)

where (i', j') are the coordinates of the point P' and (p_{P'}, q_{P'}) is the direction of the reconstructed normal at P'. We can now define a strategy to reconstruct the surface corresponding to a shading image. The principle of the reconstruction is to traverse the whole surface by propagating equal height contours. First we initialize an equal height contour for which the normals are assumed to be known. The evolution of the contour is done by iterating the following steps:

1. Calculate the horizontal component of the normal at each point of the equal height contour from the related 2D discrete curve.
2. Calculate the vertical component of the normal and the local derivatives p and q at each point of the equal height contour.
3. Propagate the height estimations to the adjacent contour in the direction of the propagation.
4. If some adjacent points have the same height as the contour (the difference between both heights is lower than a threshold), go to step 1.

When the propagation of the contour is achieved, a new equal height contour is initialized and propagated. With this approach it is straightforward to process several images of the same surface together. The images are supposed to share the same viewpoint, with only the light source changing from one image to another. According to the distribution of the iso-intensity areas of the reflectance map, we get more precision when the angle φ is closer to the light source direction. Thus for each point we consider the image whose light direction is the closest to the angle φ.
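For illustration only (this is a sketch, not the authors' code), the propagation formula can be written as follows; the dictionaries `Z`, `p`, `q` and the set `known` of already initialized points are assumptions of this illustration:

```python
def propagate_height(P, Z, p, q, known):
    """Estimate the height of point P = (i, j) as the average, over the
    4-neighbors P' = (i', j') whose height and normal are known, of
    Z(P') + (i - i') p_{P'} + (j - j') q_{P'}."""
    i, j = P
    total, count = 0.0, 0
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ip, jp = i + di, j + dj          # neighbor P' = (i', j')
        if (ip, jp) in known:
            total += Z[(ip, jp)] + (i - ip) * p[(ip, jp)] + (j - jp) * q[(ip, jp)]
            count += 1
    return total / count                 # average over N(P)
```

On a planar surface Z(i, j) = 2i + 3j with p = 2 and q = 3 everywhere, the formula recovers the exact height of the new point.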
262 Achille Braquelaire and Bertrand Kerautret

4 Region Based Approach
The method of shape from shading described in this section is based on the evolution of patches initialized in planar areas of the source image, which are orthogonal to the viewer direction. The principle of the method is to determine the height of the points adjacent to the patch by minimizing the error between the luminance calculated with the normal estimated in the patch and the real image intensity. If RI is the reflectance map of an image I, if P is a pixel of I, and if (p, q) is the normal of the reconstructed surface at the point P, the error of the reconstruction at P is given by:

E(P) = (R_I(p, q) - I(P))^2

where I(P) denotes the intensity of P in I. The normal of a point at the boundary of the patch depends on points both inside and outside the patch. For the inside points we assume that both the height and the normal are known (the height is a relative height depending on the height of the initial equal height patch). On the other hand, no information is known for outside points. The method consists of initializing the height of each point adjacent to the patch with the average of the heights of its neighbors in the patch. Then a point adjacent to the patch, say P, is randomly selected and its height is changed upward or downward by a step ∆h. The direction, upward or downward, is also selected randomly. This change influences the normals of the points of the patch. Thus it is possible to calculate the error E at each neighbor of P in the patch before and after the vertical move of P. The move is validated only if it makes the error decrease. Consider the example presented in Fig. 2-b. The initial patch is composed of the dark gray pixels and the pixels connected to the patch are the light gray ones. The white pixels are uninitialized pixels. Any change of the height of the point D involves a modification of the normals of the points A, B and E.
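The reflectance map R_I is not made explicit in this section; as an illustration only, assuming the classical Lambertian model used in shape from shading (cf. Horn [3]), with the light source direction encoded by local derivatives (ps, qs), the per-pixel error could be sketched as follows (the clamping of self-shadowed facets to zero is also an assumption of this sketch):

```python
import math

def lambertian_reflectance(p, q, ps, qs):
    """Lambertian reflectance map R(p, q): cosine of the angle between the
    surface normal (-p, -q, 1) and the light direction (-ps, -qs, 1)."""
    num = 1.0 + p * ps + q * qs
    den = math.sqrt(1 + p * p + q * q) * math.sqrt(1 + ps * ps + qs * qs)
    return max(0.0, num / den)  # self-shadowed facets reflect nothing

def reconstruction_error(p, q, intensity, ps, qs):
    """E(P) = (R_I(p, q) - I(P))^2 for one pixel."""
    return (lambertian_reflectance(p, q, ps, qs) - intensity) ** 2
```

A facet whose normal points straight at the light gives R = 1 and, for a pixel of intensity 1, an error of 0.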
Fig. 2. Example of normal propagation for determining the height of the internal pixels (a) and influence of the height of the pixels connected to the patch (b).
If P is a point adjacent to the patch, we denote by Π(P) the points of the patch which are 8-adjacent to P. We define the error EΠ(P) by:

E_\Pi(P) = \sum_{Q \in \Pi(P)} E(Q)
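A minimal sketch of the randomized descent step built on EΠ (illustrative only; the callables `patch_neighbors` and `E_at`, standing for the patch adjacency and the per-pixel error evaluated for a hypothetical height of P, are assumptions of this sketch):

```python
def try_move(P, height, dh, direction, patch_neighbors, E_at):
    """Move P vertically by dh in the chosen direction (+1 or -1) and keep
    the move only if it decreases E_Pi(P), the sum of the errors E(Q) over
    the patch points Q that are 8-adjacent to P. Returns the new height."""
    candidate = height + direction * dh   # P' = vertical move of P
    before = sum(E_at(Q, height) for Q in patch_neighbors(P))
    after = sum(E_at(Q, candidate) for Q in patch_neighbors(P))
    return candidate if after < before else height
```

With a toy convex error centered at height 3, moving up from 0 is accepted while moving down is rejected.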
When P is randomly selected and moved along the vertical axis to a point P', the move is validated only if EΠ(P') < EΠ(P). The processing of a point P of the neighborhood of the patch may be summarized as follows:

1. Select a direction (upward or downward).
2. Let P' be the result of the move of P in the selected direction. Estimate the normals at each point of Π for the positions P and P'.
3. Compute EΠ(P) and EΠ(P').
4. If EΠ(P') < EΠ(P), then validate the move of P.

The iteration of this process may diverge if the estimation of the normal is not accurate enough. This raises a problem, since only the immediate neighborhood of a point P can be used to estimate the normal at this point. This is because the height is defined only for the points which are either inside the patch or adjacent to it. We have experimented with the method using two different normal estimations: a discrete estimator and a direct geometric calculation.

When using the discrete estimator, the vertical step ∆h is set to 1. We used the discrete estimator proposed by Thurmer [10]. Since we only consider a 3D neighborhood of size 1, the estimation of the normal at a point P is given by

\vec{n}(P) = \sum_{Q \in N(P)} \frac{\overrightarrow{PQ}}{\|\overrightarrow{PQ}\|}

where N(P) is the subset of the 26-neighborhood of P lying outside the surface according to the normal direction.

With the direct geometric calculation, the step ∆h may be set to values lower than 1. Let P0 be the point at which the normal is estimated, P1, . . . , P8 its neighborhood on the reconstructed surface, and S1, . . . , S4 its interpixel neighborhood (see Fig. 3). The normal at S1 is estimated by the cross-product \overrightarrow{P_0 P_2} \times \overrightarrow{P_3 P_1}, and so on for S2, S3 and S4. The normal at P0 is then the average of the normals at S1, S2, S3 and S4 [9].

Fig. 3.

When the height of all the points adjacent to the patch has been determined, we select the points (i, j) such that Z(i, j) ∈ [h − ∆h, h + ∆h], where h is the reference height of the patch. The reference height of the initial patch is zero and changes as the patch grows, as described below. The selected points are added to the patch and the process is iterated until saturation. The reference height of the patch is then increased by ∆h and the patch is saturated with the points whose estimated height is around the new current patch height. The points of decreasing height are processed in the same way. The process is iterated until the whole image is traversed. This method has been tested with both normal estimation methods described above. It appears that with the discrete estimator the method always converges, whereas with the direct geometric calculation the method may diverge. On the other hand, the discrete estimator produces rough results, because of the smallness of the available neighborhood, whereas the direct geometric calculation gives more accurate results and can be used with thresholds lower than one. Thus we use first the discrete estimator to initialize the reconstructed surface
Fig. 4. Contour evolution with the contour based method and related height maps (from (a) to (f)). Reconstruction of a sphere-cone (e) and of a pyramid from synthetic images (f).
and then we refine the reconstruction with the direct geometric calculation, with thresholds lower than one. It is possible to run the method with more than one patch. All the patches are processed in parallel. The initial reference height is the same for all the patches, but as soon as the patches grow their reference heights may change independently. When two different patches meet, they are merged, and both their reference heights and the heights of their points are adjusted according to an average value of the heights of the points at the junction. Finally, we can process several images in parallel by summing the errors EΠ(P) calculated for each reflectance map. The results are improved by invalidating the self-shadow areas in each image.
5 Experiments
The first results presented in this section are reconstructions performed on computer-generated images. We used sets of four images of the same scene, made of simple forms such as spheres and pyramids. The vertical orientation of the light source θs was set to 70° in order to improve the precision of the reflectance map. Fig. 4 shows different steps of the reconstruction of an object which
Fig. 5. One of the four real source images (a) and the initial patches used in the reconstruction (b). The result of the reconstruction (c) and a 3D visualization of the difference between the reconstructed surface and the scanned one (d).
consists of a sphere and a cone. For each point, the image with the highest value is used to compute the vertical component of the normal. In the left part of Fig. 4-a we can see the angular sector of this image which is used in the reconstruction. The other points are reconstructed by using one of the three other images. Fig. 4-e and Fig. 4-f display the results of the reconstruction of the object of Fig. 4-a and of a pyramid. On real images this method is quite sensitive to noise, and errors may be propagated along the whole reconstruction. The second method gives very good results with synthetic images. We have also experimented with it for the reconstruction of archaeological objects. The first example comes from microscope images of a resin cast of a small hole, which was also scanned with a 3D scanner. It was thus possible to compare the result of the reconstruction with the real surface. We used four images of the cast, with the vertical angle of the light source θs set to 60° and the horizontal angle φs varying by steps of approximately 90°. The light source direction was determined from the shadow of a small vertical pin. One of the four images is displayed in Fig. 5-a, and the black areas in Fig. 5-b are the initial patches. The result of the reconstruction is displayed in Fig. 5-c, and the differences between the reconstructed height map and the real one are displayed in Fig. 5-d. We have experimented with this method on more complex surfaces, such as the surface of the border of the replica of an ivory spatula. The spatula presents a set of notches on its border. One photo of the spatula with the initial patches is displayed in Fig. 6-a. The initial patches (drawn in black in the image) were selected on small planar regions located between the notches. The interest of the reconstruction of archaeological objects has been validated by an expert archaeologist.
6 Conclusion
In this paper we have proposed two new methods for reconstructing a surface from shading images. Both methods are based on the propagation of geometric features along the reconstructed discrete surface. The first method decomposes the estimation of the normal at a point of the reconstructed surface into a geometric estimation of the horizontal component and a photometric estimation of the vertical one. The second method uses normal reconstructions to minimize a photometric error function. It appears that the second method gives better results than the first one with photos of real objects, but it would be interesting to try to use the principle of the first one to improve the results of the second one. More
Fig. 6. One of the four real images of a replica of a prehistoric spatula with the initial patches drawn in black (a) and a representation with OpenGL of the reconstructed surface (b).
generally we have proposed an approach to develop shape from shading methods in the context of discrete geometry.
Acknowledgment. We thank Francesco d'Errico from the Institut de préhistoire et géologie du quaternaire for having kindly provided the numerical data of the archaeological objects used in this work.
References

1. J.P. Braquelaire and A. Vialard. Euclidean paths: A new representation of boundary of discrete regions. Graphical Models and Image Processing, 61:16–43, 1999.
2. R.T. Frankot and R. Chellappa. A method for enforcing integrability in shape from shading algorithms. IEEE PAMI, 10, 1988.
3. B.K.P. Horn. Shape from Shading: A Method for Obtaining the Shape of a Smooth Opaque Object from One View. PhD thesis, Department of Electrical Engineering, MIT, 1970.
4. B.K.P. Horn. Understanding image intensity. Artificial Intelligence, 8(11), 1977.
5. K. Ikeuchi and B.K.P. Horn. Numerical shape from shading and occluding boundaries. Artificial Intelligence, 17(1-3), 1981.
6. R. Kimmel and A.M. Bruckstein. Tracking level sets by level sets: A method for solving the shape from shading problem. CVIU, 62(2), July 1995.
7. C.H. Lee and A. Rosenfeld. Improved methods of estimating shape from shading using the light source coordinate system. Artificial Intelligence, 26:439–451, 1985.
8. H. Saito, Y. Somiya, and S. Ozawa. Shape reconstruction of skin surface from shading images using simulated annealing. ACCV, 3, 1995.
9. W.F. Taylor. The Geometry of Computer Graphics. Wadsworth and Brooks, 1992.
10. G. Thürmer. Normal computation for discrete surfaces in 3D space. Eurographics, 16(3), 1997.
11. P.S. Tsai and M. Shah. A simple shape from shading algorithm. CVPR, 1992.
12. R.J. Woodham. Photometric method for determining surface orientation from multiple images. Optical Engineering, 19, 1980.
13. R. Zhang, P. Tsai, J.E. Cryer, and M. Shah. Shape from shading: A survey. IEEE PAMI, 21(8):690–706, August 1999.
Shape Representation and Indexing Based on Region Connection Calculus and Oriented Matroid Theory

Ernesto Staffetti (1), Antoni Grau (2), Francesc Serratosa (3), and Alberto Sanfeliu (1)

(1) Institute of Industrial Robotics (CSIC-UPC), Llorens i Artigas 4-6, 08028 Barcelona, Spain {estaffetti,asanfeliu}@iri.upc.es
(2) Department of Automatic Control, Technical University of Catalonia, Pau Gargallo 5, 08028 Barcelona, Spain [email protected]
(3) Department of Computer Engineering and Mathematics, Rovira i Virgili University, Av. Paisos Catalanes 26, 43007 Tarragona, Spain [email protected]
Abstract. In this paper a novel method for indexing views of 3D objects is presented. The topological properties of the regions of the views of a set of objects are used to define an index based on the region connection calculus and oriented matroid theory. Both are formalisms for qualitative spatial representation and reasoning and are complementary in the sense that whereas the region connection calculus encodes information about connectivity of pairs of connected regions of the view, oriented matroids encode relative position of the disjoint regions of the view and give local and global topological information about their spatial distribution. This indexing technique is applied to 3D object hypothesis generation from single views to reduce candidates in object recognition processes.
1 Introduction
In this paper we present a new method for indexing views of 3D objects, which is applied to 3D object hypothesis generation from single views to reduce candidates in 3D object recognition processes. Given a set of views of different 3D objects, the problem of object recognition using a single view becomes the problem of finding a subset of the set of regions in the image with a relational structure identical to that of a member of the set of views. The standard way to reduce the complexity of shape matching is to subdivide the problem into hypothesis generation followed by verification. To be of interest for object recognition, hypothesis generation should be a relatively fast although imprecise procedure in which several possible candidates for matching are generated. In this way the verification can be carried out using a more complex, and therefore slower, procedure [1] over a reduced number of

I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 267–276, 2003. © Springer-Verlag Berlin Heidelberg 2003
268
Ernesto Staffetti et al.
Fig. 1. Some of the 8 possible relative positions of two regions and the corresponding descriptions using the formalism of the region connection calculus. The other two can be obtained from (d) and (e) by interchanging a with b. In situation (a) a is disconnected from b, in (b) a is externally connected to b, in situation (c) a partially overlaps b, in (d) a is a tangential proper part of b, in (e) a is a non-tangential proper part of b and, finally, in situation (f) a and b coincide.
candidates. The hypothesis generation can be carried out very efficiently if it is formulated as an indexing problem, where the set of views of the set of 3D objects is stored into a table that is indexed by some function of the views themselves. In this paper an indexing technique that combines the region connection calculus and oriented matroid theory is presented. More precisely, the type of connectivity between connected regions of the views is described by means of the formalism of the region connection calculus [2], whereas the topological properties of the disconnected regions of the views are encoded into a data structure called the set of cocircuits [3]. Sets of cocircuits, which are one of the several combinatorial data structures referred to as oriented matroids, encode incidence relations and the relative position of the elements of the image, and give local and global topological information about their spatial distribution. Reasoning with the region connection calculus is based on composition tables, while oriented matroids permit algebraic techniques to be used. These two descriptions, merged, are used as an index of the database. This indexing method is employed for hypothesis generation for 3D object recognition from single views, and can be regarded as a qualitative counterpart of the geometric hashing technique [4]. For another approach to shape representation and indexing based on combinatorial geometry see [5]. The region connection calculus and oriented matroids are introduced in Section 2, whereas Section 3 describes the proposed indexing method. In Section 4 some experimental results are reported and Section 5 contains the conclusions.
2 Qualitative Spatial Representation
Qualitative reasoning is based on comparative knowledge rather than on metric information. Many methods for shape representation and analysis are based on extracting points and edges which are used to define projectively invariant descriptors. In this paper, instead of points, regions of the images are taken into account. The motivation behind this choice is that the regions of an image can be more reliably extracted than vertices and edges. In the following sections two formalisms for qualitative representation and reasoning are described: the first
Shape Representation and Indexing
269
Fig. 2. Some of the possible positions of a convex region with respect to the convex hull of a non-convex one.
one is based on the region connection calculus and the second one is derived from oriented matroid theory.

2.1 Region Connection Calculus
For spatially extended objects we can qualitatively distinguish the interior, the boundary, and the exterior of the object, without taking into account the concrete shape or size of the object. A set-theoretical analysis of the possible relations between objects based on the above partition is provided by [6]. The relation between objects that they examine is the intersection between their boundaries and interiors. This setting is based on the distinction between the values empty and non-empty for the intersection. Some variants of this theory were developed by Cohn and his coworkers in a series of papers (see for example [2]). In this work the distinction between the interior and the boundary of an object is abandoned, and eight topological relations derived from the single binary relation "connected to" are taken into account. Some of them are represented in Fig. 1. Some of these relations, namely those of Fig. 1-d and Fig. 1-e, are not symmetrical and, following the notation of [2], their inverses are denoted TPPi(a, b) and NTPPi(a, b), respectively. Furthermore, in [2] the theory is extended to handle concave objects by distinguishing the regions inside and outside the convex hull of the objects. A convex object can be inside, partially inside, or outside the convex hull of a non-convex one (Fig. 2). If both regions are non-convex, 23 relations between them can be defined. These relations permit qualitative description of rather complex situations, such as that represented in Fig. 3. Moreover, by means of this formalism, called the region connection calculus, it is possible, for instance, to infer the relative position of two regions knowing their position with respect to a third one. Reasoning with the region connection calculus is essentially based on composition tables.

2.2 Oriented Matroids
Oriented matroid theory [3], [7], [8] is a broad setting in which the combinatorial properties of geometrical configurations can be described and analyzed. It
Fig. 3. With the formalism of the region connection calculus the relation between these two disconnected non-convex regions, where a is partially inside the convex hull of b and vice versa, is denoted by P-INS P-INSi DC(a, b).
provides a common generalization of a large number of different mathematical objects usually treated at the level of concrete coordinates. In this section oriented matroids will be introduced over arrangements of points using two combinatorial data structures, called the chirotope and the set of cocircuits, which represent the main tools to translate geometric problems into this formalism. In the abstraction process from the concrete configuration of points to the oriented matroid, metric information is lost, but the structural properties of the configuration of points are represented at a purely combinatorial level.

Oriented Matroids of Arrangements of Points. Given a point configuration in R^{d-1} whose elements are the columns of the matrix P = (p1, p2, . . . , pn), the associated vector configuration is a finite spanning sequence of vectors {x1, x2, . . . , xn} in R^d, represented as columns of the matrix X = (x1, x2, . . . , xn), where each point pi is represented in homogeneous coordinates as x_i = (p_i, 1)^T. To encode the combinatorial properties of the point configuration we can use a data structure called the chirotope [8], which can be computed by means of the associated vector configuration X. The chirotope of X is the map

\chi_X : \{1, 2, \ldots, n\}^d \to \{+, 0, -\}, \qquad (\lambda_1, \lambda_2, \ldots, \lambda_d) \mapsto \mathrm{sign}\left(\det[x_{\lambda_1}, x_{\lambda_2}, \ldots, x_{\lambda_d}]\right)

that assigns to each d-tuple of vectors of the finite configuration X a sign + or − depending on whether it forms a basis of R^d having positive or negative orientation, respectively. This function assigns the value 0 to those d-tuples that do not constitute a basis of R^d. The chirotope describes the incidence structure between the points of X and the hyperplanes spanned by the same points and, at the same time, encodes the relative position of the points of the configuration with respect to the hyperplanes that they span. Consider the point configuration P represented in Fig. 4, whose associated vector configuration X is given in Table 1.

Table 1. Vector configuration that corresponds to the planar point configuration represented in Fig. 4.

x1 = (0, 3, 1)^T    x2 = (−3, 1, 1)^T    x3 = (−2, −2, 1)^T
x4 = (2, −2, 1)^T   x5 = (3, 1, 1)^T     x6 = (0, 0, 1)^T
Fig. 4. A planar point configuration.

Table 2. Chirotope of the planar point configuration represented in Fig. 4.

χ(1, 2, 3) = +   χ(1, 2, 4) = +   χ(1, 2, 5) = +   χ(1, 2, 6) = +
χ(1, 3, 4) = +   χ(1, 3, 5) = +   χ(1, 3, 6) = +   χ(1, 4, 5) = +
χ(1, 4, 6) = −   χ(1, 5, 6) = −   χ(2, 3, 4) = +   χ(2, 3, 5) = +
χ(2, 3, 6) = +   χ(2, 4, 5) = +   χ(2, 4, 6) = +   χ(2, 5, 6) = −
χ(3, 4, 5) = +   χ(3, 4, 6) = +   χ(3, 5, 6) = +   χ(4, 5, 6) = +
Table 3. Set of cocircuits of the planar point configuration represented in Fig. 4.

(0, 0, +, +, +, +)   (0, −, 0, +, +, +)   (0, −, −, 0, +, −)
(0, −, −, −, 0, −)   (0, −, −, +, +, 0)   (+, 0, 0, +, +, +)
(+, 0, −, 0, +, +)   (+, 0, −, −, 0, −)   (+, 0, −, −, +, 0)
(+, +, 0, 0, +, +)   (+, +, 0, −, 0, +)   (+, +, 0, −, −, 0)
(+, +, +, 0, 0, +)   (−, +, +, 0, −, 0)   (−, −, +, +, 0, 0)
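As an illustrative aside (not part of the paper), the chirotope of Table 2 can be recomputed directly from the homogeneous coordinates of Table 1 as signs of 3 × 3 determinants:

```python
import numpy as np
from itertools import combinations

# Points of Fig. 4 (Table 1), in homogeneous coordinates x_i = (p_i, 1)^T.
X = np.array([[0, 3, 1], [-3, 1, 1], [-2, -2, 1],
              [2, -2, 1], [3, 1, 1], [0, 0, 1]], dtype=float)

def chi(i, j, k):
    """Chirotope value: sign of det[x_i, x_j, x_k] (1-based indices)."""
    d = np.linalg.det(np.column_stack([X[i - 1], X[j - 1], X[k - 1]]))
    return 0 if abs(d) < 1e-9 else (1 if d > 0 else -1)

# All C(6, 3) = 20 orientations of Table 2.
chirotope = {t: chi(*t) for t in combinations(range(1, 7), 3)}
```

For example, chi(1, 2, 3) returns 1 and chi(1, 4, 6) returns -1, matching Table 2.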
The chirotope χX of this vector configuration is given by the orientations listed in Table 2. The element χ(1, 2, 3) = + indicates that in the triangle formed by p1, p2, and p3 these points are counterclockwise ordered. These orientations can be rearranged in an equivalent data structure called the set of cocircuits of X, shown in Table 3. In this planar case, the set of cocircuits of X is the set of all partitions generated by the lines passing through two points of the configuration. For example, (0, 0, +, +, +, +) means that the points p3, p4, p5, and p6 lie in the same half-plane determined by the line through the points p1 and p2. Reversing all the signs of the set of cocircuits, we obtain an equivalent description of the planar arrangement of points. Besides chirotopes and cocircuits, there are several data structures capable of encoding the topological properties of a point configuration. Their definitions can be found in [8], where it is shown that all of them are equivalent; they are referred to as oriented matroids.

Oriented Matroid of Arrangements of Regions. Consider a segmented view of a 3D object. Extracting the oriented matroid of a view is not straightforward, since the regions that form the image cannot be reduced to points, taking for instance their centroids, without losing essential topological information for
object recognition. Therefore, the convex hull [9] of each region is employed to represent the region itself. Then pairs of the resulting convex polygons are considered, and the oriented matroid is computed based on the spatial location of the other convex regions of the image with respect to the two lines arising when merging the convex hulls of pairs of disconnected regions. Consider, for instance, the ordered pair of convex regions (S, T) of Fig. 5-a. It is easy to see that the convex hull of these two planar convex disconnected polygonal regions is a polygon whose set of vertices is included in the union of the sets of vertices of S and T. On the contrary, the set of edges of the convex hull of S and T is not included in the union of their sets of edges. Indeed, two new "bridging edges," e1 and e2, appear, as illustrated in Fig. 5-a. Actually, efficient algorithms for merging convex hulls are based on finding these two edges [10].
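As a sketch of this construction (illustrative only; Andrew's monotone chain is one standard hull algorithm, not necessarily the one used in [10]), the bridging edges can be found by computing the hull of the union and keeping the edges whose endpoints come from different polygons:

```python
def convex_hull(pts):
    """Andrew's monotone-chain convex hull; returns vertices counterclockwise."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    def half(points):
        h = []
        for p in points:
            while len(h) >= 2 and cross(h[-2], h[-1], p) <= 0:
                h.pop()
            h.append(p)
        return h
    lower, upper = half(pts), half(list(reversed(pts)))
    return lower[:-1] + upper[:-1]

def bridging_edges(S, T):
    """Edges of conv(S ∪ T) joining a vertex of S to a vertex of T:
    the two 'bridging edges' e1, e2 when S and T are disjoint."""
    H = convex_hull(list(S) + list(T))
    Sset = set(S)
    return [(a, b) for a, b in zip(H, H[1:] + H[:1])
            if (a in Sset) != (b in Sset)]   # endpoints from different polygons
```

For two disjoint unit squares the merged hull is a rectangle, and exactly two of its edges bridge the squares.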
Fig. 5. Steps of encoding of the combinatorial properties of a view of an object into a chirotope.
Consider the two lines l1 and l2 that support e1 and e2. These two lines divide the image into three or four zones, depending on the location of their intersection point with respect to the image. Let RS,T and LS,T (Fig. 5-b) be, respectively, the rightmost and leftmost zones with respect to l1 and l2, and IS,T the zone of the image comprised between them. Since RS,T, LS,T and IS,T can be univocally determined from the ordered couple of regions (S, T), the location of a region U with respect to the regions (S, T) of the image is encoded into a chirotope using the following rule:

\chi(S, T, U) = \begin{cases} + & \text{if } U \in L_{S,T}, \\ 0 & \text{if } U \in I_{S,T}, \\ - & \text{if } U \in R_{S,T}. \end{cases}

It has been implicitly assumed that U is completely contained in either RS,T, LS,T or IS,T but, in general, it may belong to more than one of them. In this case, since the ratio of areas is an affine invariant, introducing an approximation, we can choose the sign based on which zone contains the largest portion of the area of U. For instance, if the regions U, V and Z are located as in Fig. 5-c, we have χ(S, T, U) = +, χ(S, T, V) = 0 and χ(S, T, Z) = −.
Invariance of the Representation
Consider a 3D point configuration and one of its views. The combinatorial structure of the 3D point configuration and that of its 2D perspective projection are
related in the following way: if x0 represents in homogeneous coordinates the center of the camera, p0, we have that

sign[\bar{x}_i, \bar{x}_j, \bar{x}_k] = sign[x_i, x_j, x_k, x_0]    (1)

where xi, xj and xk are the homogeneous coordinates of the 3D points pi, pj and pk, and x̄i, x̄j and x̄k are those of the corresponding points in the view, p̄i, p̄j and p̄k. Equation (1) can be regarded as a projection equation for chirotopes. It is easy to see that, whereas the matrix that represents in homogeneous coordinates the vertices of a projected set of points is coordinate-dependent, an oriented matroid is a coordinate-free representation. Moreover, the representation of object views based on oriented matroids is a topological invariant, that is, an invariant under homeomorphisms. Roughly speaking, this means that the oriented matroid that represents the arrangement of points of a view of an object does not change when the points undergo a continuous transformation that does not change any orientation of the chirotope. Due to this property, this representation is robust to discretization errors of the image as well as to small changes of the point of view that do not change any orientation of the chirotope. Since projective transformations can be regarded as special homeomorphisms, we can assert that the representation of the projected set of points based on oriented matroids is projectively invariant. In particular, since affine and Euclidean transformations are special projective transformations, the oriented matroid of the projected set of points of a view of an object does not change under rotations, translations, and affine transformations of the planar arrangement of points themselves. These considerations can be extended to the case in which oriented matroids represent arrangements of planar regions. Since the ratio of areas is not invariant under projective transformations, this representation will be invariant only under affine and Euclidean transformations of the views.
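Equation (1) can be checked numerically. The sketch below uses a camera at the origin looking down the z axis and projects onto the plane z = 1; these choices, and the random points in front of the camera, are assumptions of the illustration:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
p0 = np.array([0.0, 0.0, 0.0])                   # camera center at the origin
pts3d = rng.uniform(-1, 1, (6, 3)) + [0, 0, 5]   # six points in front of the camera

def sign_det(M):
    d = np.linalg.det(M)
    return int(np.sign(round(d, 12)))

ok = True
for i, j, k in combinations(range(6), 3):
    # Right-hand side: sign[x_i, x_j, x_k, x_0] in 3D homogeneous coordinates.
    X4 = np.column_stack([np.append(pts3d[t], 1.0) for t in (i, j, k)]
                         + [np.append(p0, 1.0)])
    # Left-hand side: project onto the plane z = 1 and homogenize.
    Xbar = np.column_stack([np.append(pts3d[t][:2] / pts3d[t][2], 1.0)
                            for t in (i, j, k)])
    ok = ok and sign_det(Xbar) == sign_det(X4)
```

With all points in front of the camera (positive depth), the two signs agree for every triple.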
3 Indexing Views of 3D Objects
The process of indexing a database of views of a set of objects starts with some preliminary choices, namely the features used to characterize the regions of the segmented views of the set of 3D objects. Suppose that hue and area are used to characterize each region. Other parameters to choose are the number of levels in which the hue is quantized and the number of regions having the same hue that will be taken into account. These choices, of course, depend on the properties of the views of the database. Then the views are segmented according to these choices and the convex hull of each region is computed. As a consequence, the resulting images are compositions of convex polygonal regions that can be disconnected or partially or completely overlapped. Fig. 6 represents two views of two objects to which a hue quantization with six levels, W, R, Y, G, B and N, has been applied; only the two biggest regions with the same hue value are taken into account.
Let (W, R, Y, G, B, N) be the ordered tuple of hue levels considered. For example, labels G1 and G2 in Fig. 6 denote, respectively, the first and the second biggest regions of the views having the same hue value G. The type of connection between the existing regions is described using the formalism of the region connection calculus. For each pair of disconnected regions the set of cocircuits is computed. This is done for each view of the database, and this information is combined into a unique index table whose entries are spatial combinations of features and whose records contain a list of the views in which each combination is present.
Fig. 6. Two views of two objects whose topological properties are indexed in Table 4.
In Table 4 the index of the topological properties of the two views v1,1 and v1,2 of the objects represented in Fig. 6 is reported. In the first column, the relation between ordered couples of regions is described in terms of the region connection calculus. The symbol "∅" for a certain couple (S, T) indicates that no view contains two regions having features S and T. This is the case for the regions R and Y. When S and T are disconnected, the corresponding cocircuit is present in the index. The symbol "∗" in the column of a certain feature indicates that no region with that feature is present in the views listed in the record. For example, the cocircuit W R contains a ∗ in the column Y because no region with the Y feature is present in v1,1. If (S, T) is a couple of connected regions, the corresponding row of the index is empty because the cocircuit cannot be computed.
3.1 Hypothesis Generation for Object Recognition
Given a database of views of a set of 3D objects and a view vi of one of them, not necessarily contained in the database, its set of cocircuits is computed. Each cocircuit is used to access the table that constitutes the index of the database. Then the views that best match vi are selected based on the number of correspondences they have with vi in terms of cocircuits. It is easy to see that this method for hypothesis generation, that can be regarded as a qualitative version of the geometric hashing technique [4], is also robust to partial occlusions of the objects. Indeed, if a region of an image is
Shape Representation and Indexing
275
Table 4. Index of the topological properties of the two views v1,1 and v1,2 of the two objects represented in Fig. 6.
WR WY W G1 W G1 W G2 W B1 W B1 W B2 W B2 WN WN RY RG1 ··· B2 N B2 N
Connection DC DC NTPP DC DC DC NTPP DC NTPPi DC DC ∅ NTPP ··· DC DC
W R Y G1 G2 B1 B2 N Objects 0 0 ∗ 0 0 0 - + v1,1 0 ∗ 0 0 ∗ 0 0 v1,2 v1,1 0 ∗ 0 0 ∗ 0 0 0 v1,2 0 0 ∗ 0 0 + 0 0 v1,1 0 0 ∗ 0 0 0 0 0 v1,1 v1,2 0 0 ∗ + + + 0 + v1,1 v1,2 0 0 ∗ - - - - 0 v1,1 0 ∗ + + ∗ 0 0 0 v1,2
··· ··· ··· ··· ··· ··· ··· ··· + 0 ∗ - - - 0 0 - ∗ + + ∗ + 0 0
v1,1 ··· v1,1 v1,2
occluded, the set of cocircuits can still be computed and therefore, the number of correspondences with the views of the database can still be calculated. In this case, obviously, its selectivity decreases.
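The lookup-and-vote step described above can be sketched as follows (an illustration, not the authors' implementation); the cocircuit signatures and view names in the index are hypothetical placeholders:

```python
from collections import Counter

def generate_hypotheses(index, query_cocircuits):
    """Vote for the stored views sharing the most cocircuits with the query."""
    votes = Counter()
    for cocircuit in query_cocircuits:
        for view in index.get(cocircuit, ()):  # unknown cocircuits vote for nobody
            votes[view] += 1
    return votes.most_common()

# Hypothetical index: cocircuit signature -> list of views containing it.
index = {
    "c1": ["v1,1"],
    "c2": ["v1,1", "v1,2"],
    "c3": ["v1,2"],
}
hypotheses = generate_hypotheses(index, ["c1", "c2", "c4"])  # "c4" is unknown
```

Occlusion only removes some query cocircuits, so the same voting still ranks candidate views, just with lower scores, which is the robustness claimed above.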
4 Experimental Results
The method has been fully implemented and experiments with different sets of 3D objects have been carried out to validate it. Sixteen views of each object, with an angular separation of 22.5 degrees, have been used for the experiments. These images have been segmented using the segmentation method described in [11]. Then, the index of the learning set of eight views per object, taken at the angles 0, 45, 90, 135, 180, 225, 270 and 315 degrees, has been created. In the recognition process, the set of cocircuits of each image of the test set, composed of the eight views not used in the learning process, that is, the views taken at the angles 22.5, 67.5, 112.5, 157.5, 202.5, 247.5, 292.5 and 337.5 degrees, has been calculated. The experimental results are encouraging and currently we are refining the method by introducing a distance measure between sets of cocircuits.
5 Conclusions
In this paper a new method for indexing a database of views of 3D objects has been presented. It is based on the combination of two qualitative representations derived from the region connection calculus and oriented matroid theory. This combination of qualitative representations characterizes the local and global topology of the regions of an image, is invariant under affine and Euclidean transformations of the views, is intrinsically robust to discretization errors of the image, and is insensitive to small displacements of the point of view.
References

1. Serratosa, F., Alquézar, R., Sanfeliu, A.: Function-described graphs for modeling objects represented by attributed graphs. Pattern Recognition 36 (2003) 781–798
2. Cohn, A., Bennett, B., Gooday, J., Gotts, N.M.: Qualitative spatial representation and reasoning with the region connection calculus. GeoInformatica 1 (1997) 275–316
3. Björner, A., Las Vergnas, M., Sturmfels, B., White, N., Ziegler, G.M.: Oriented Matroids. Volume 43 of Encyclopedia of Mathematics and its Applications. Cambridge University Press (1993)
4. Lamdan, Y., Schwartz, J.T., Wolfson, H.J.: Affine invariant model-based object recognition. IEEE Transactions on Robotics and Automation 6 (1990)
5. Carlsson, S.: Combinatorial geometry for shape representation and indexing. In: Proceedings of the International Workshop on Object Representation for Computer Vision (1996)
6. Egenhofer, M.J., Franzosa, R.D.: Point set topological relations. International Journal of Geographical Information Systems 5 (1991) 161–174
7. Bokowski, J., Sturmfels, B.: Computational Synthetic Geometry. Volume 1355 of Lecture Notes in Mathematics. Springer-Verlag (1989)
8. Richter-Gebert, J., Ziegler, G.M.: Oriented matroids. In Goodman, J.E., O'Rourke, J., eds.: Handbook of Discrete and Computational Geometry. CRC Press (1997)
9. O'Rourke, J.: Computational Geometry in C. Cambridge University Press (1999)
10. Toussaint, G.T.: Solving geometric problems with the rotating calipers. In: Proceedings of IEEE MELECON'83, Athens, Greece (1983)
11. Comaniciu, D., Meer, P.: Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (2002) 603–619
Incremental Algorithms Based on Discrete Green Theorem

Srečko Brlek, Gilbert Labelle, and Annie Lacasse

Laboratoire de Combinatoire et d'Informatique Mathématique, Université du Québec à Montréal, CP 8888, Succ. Centre-ville, Montréal (QC) Canada H3C3P8
{brlek,gilbert,lacasse}@lacim.uqam.ca

Abstract. By using the discrete version of Green's theorem and bivariate difference calculus we provide incremental algorithms to compute various statistics about polyominoes given, as input, by 4-letter words describing their contour. These statistics include area, coordinates of the center of gravity, moment of inertia, higher order moments, size of projections, hook lengths, number of pixels in common with a given set of pixels and also q-statistics.

Keywords: Discrete Green Theorem, statistics about polyominoes.
1 Introduction
In this paper, the word polyomino means a finite union of closed unit lattice squares (pixels) in the plane whose boundary consists of a simple closed polygonal path using 4-connectedness. In particular, our polyominoes are simply connected (contain no holes), and have no multiple points. The polygonal path γ (contour) of a polyomino can be encoded by an ordered pair (s, ω) where s is a lattice point belonging to γ and ω is a word over the 4-letter alphabet A = {r, u, l, d} = {→, ↑, ←, ↓}, known as the Freeman chain code [8,9], corresponding to the unit translations, respectively, in the right, up, left and down direction. The word ω represents the perimeter of the polyomino described in a counterclockwise manner starting from point s. For example, the polyomino of Figure 1 is coded by (s, ω) where s = (0, 0) and ω = rdrdrrruuruulluuldlddlld.

Many basic parameters associated with polyominoes (see Figure 1) can be represented by surface integrals. For example, the area A(P), center of gravity CG(P) and moment of inertia I(P) of a polyomino P are defined by the integrals

A(P) = ∫∫_P dx dy,

CG(P) = (x̄, ȳ) = ( ∫∫_P x dx dy / ∫∫_P dx dy , ∫∫_P y dx dy / ∫∫_P dx dy ),

I(P) = ∫∫_P ((x − x̄)² + (y − ȳ)²) dx dy = ∫∫_P (x² + y²) dx dy − (x̄² + ȳ²) A(P).
With the support of NSERC (Canada).
I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 277–287, 2003. © Springer-Verlag Berlin Heidelberg 2003
278
Sreˇcko Brlek, Gilbert Labelle, and Annie Lacasse
Area = 20; Center of gravity = (3.3, 0.55); Moment of inertia = 73.4833; Horizontal projections = (3, 4, 6, 4, 2, 1); Vertical projections = (1, 2, 5, 6, 4, 2)
Fig. 1. Some parameters for polyominoes.
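The coding (s, ω) above is easy to manipulate programmatically. As a sketch (not part of the paper), the contour word of Figure 1 can be decoded into its vertex sequence, checked for closure, and its area recovered with the standard shoelace formula:

```python
def decode(s, word):
    """Vertices of the contour encoded by (s, word) in the Freeman chain code."""
    step = {'r': (1, 0), 'u': (0, 1), 'l': (-1, 0), 'd': (0, -1)}
    pts = [s]
    for c in word:
        x, y = pts[-1]
        dx, dy = step[c]
        pts.append((x + dx, y + dy))
    return pts

def shoelace_area(pts):
    # Signed area; positive when the contour is counterclockwise.
    return sum(x1 * y2 - x2 * y1
               for (x1, y1), (x2, y2) in zip(pts, pts[1:])) / 2

pts = decode((0, 0), "rdrdrrruuruulluuldlddlld")
# pts[-1] == pts[0]         (the contour is closed)
# shoelace_area(pts) == 20.0  (the area announced in Figure 1)
```

The incremental algorithms of Section 2 compute the same quantities without ever materializing the vertex list.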
The classical Green’s Theorem (see below) relates surface integrals to contour integrals. Since our polyominoes are given by words describing their contours, it is natural to use Green’s Theorem for the construction of our first general algorithms. In Section 2, we introduce the notion of incremental algorithm for polyominoes given by their contour and show how Green’s theorem can be used to generate families of such algorithms. In Section 3, we drop the continuity conditions of Green’s Theorem and deal with general additive incremental algorithms for which the output associated to the sum of two polyominoes is the sum of the outputs associated to each polyomino. The use of Green’s Theorem is not new in discrete geometry [9]. Our present approach is similar to the one given in [8, 11, 12] where discrete Green’s Theorem is applied to efficient moment computations. For a general presentation of polyominoes and their properties see [7]. A survey of enumerative results concerning polyominoes can be found in [10](see also [2, 3, 5]).
2 Green's Theorem and Incremental Algorithms
The following version of Green's Theorem will be sufficient to start our analysis.

Theorem 1. [Green] Let P(x, y), Q(x, y) be continuously differentiable functions on an open set containing a simply connected region Ω bounded by a simple, piecewise continuously differentiable, positively oriented curve Γ. Then

∫∫_Ω (∂Q/∂x − ∂P/∂y) dx dy = ∮_Γ P(x, y) dx + Q(x, y) dy.

Since the above parameters about polyominoes involve integrals of the form ∫∫_P f(x, y) dx dy,
our next step is to choose P(x, y) and Q(x, y), in Green's Theorem, such that (∂Q/∂x − ∂P/∂y) = f. There are many ways to do this, and we list three important ones in the following Lemma.

Lemma 1. Let P be a polyomino with contour γ, and let f = f(x, y) be continuous. Then we have

∫∫_P f(x, y) dx dy = ∮_γ f1(x, y) dy          (1)
                   = −∮_γ f2(x, y) dx          (2)
                   = ∮_γ F(x, y)(x dy − y dx), (3)

where

f1(x, y) = ∫^x f(u, y) du,  f2(x, y) = ∫^y f(x, v) dv,  F(x, y) = ∫_0^1 f(sx, sy) s ds.

The notation ∮_γ stands for contour integration on γ while ∫^t means the indefinite integration.
Proof. For (1), take P = 0, Q = f1 in Green's Theorem. For (2), take P = −f2, Q = 0. Formula (3) is more delicate and can be established as follows. Take, in Green's Theorem, P(x, y) = −y F(x, y) and Q(x, y) = x F(x, y). Using some analytical manipulations it can be shown that

(∂Q/∂x − ∂P/∂y) = 2F + x ∂F/∂x + y ∂F/∂y = f.
Incremental Algorithms. The evaluation of each line integral in Lemma 1 can be broken into simpler integrals over successive unit (horizontal or vertical) line segments forming γ:

∮_γ α = Σ_{i=0}^{n−1} ∫_{[v_i, v_{i+1}]} α,

where v_i = (xi, yi), i = 0, . . . , n − 1, denotes the successive vertices of the polyomino P, v_n = v_0, v_{i+1} = v_i + ∆v_i = (xi + ∆xi, yi + ∆yi). Since our polyominoes are coded by (s, ω) where s ∈ Z × Z is the starting point and ω is a word over the alphabet A = {r, u, l, d}, the above discussion gives rise to incremental algorithms in the following sense: starting from the source point s, the contour γ of the polyomino is described by reading ω letter by letter. At each step, the action made depends only on the current position on the boundary and on the letter read. More precisely, consider four vectors r = (1, 0), u = (0, 1), l = (−1, 0), d = (0, −1) and take four functions (one for each letter in A) Φr(x, y), Φu(x, y), Φl(x, y), Φd(x, y). Then cumulate sequentially the partial sums on ω = ω1 ω2 . . . ωn as follows:

v := (x0, y0); S := 0;
for i = 1 to n do S := S + Φ_{ωi}(v); v := v + ωi od;
return S.

We will use the following suggestive notation to represent the output of our incremental algorithm:

Σ_→ Φr(xi, yi) + Σ_↑ Φu(xi, yi) + Σ_← Φl(xi, yi) + Σ_↓ Φd(xi, yi).
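The loop above translates directly into code. A minimal Python sketch, where the four potentials are supplied as a dictionary of functions indexed by the letters of A (the unit-square example at the end is illustrative):

```python
def incremental(s, word, phi):
    """Cumulate phi[letter](x, y) at the current vertex, then move one unit."""
    step = {'r': (1, 0), 'u': (0, 1), 'l': (-1, 0), 'd': (0, -1)}
    (x, y), S = s, 0
    for letter in word:
        S += phi[letter](x, y)
        dx, dy = step[letter]
        x, y = x + dx, y + dy
    return S

# VH-algorithm for the area (see Table 1) applied to a unit square:
area_vh = {'r': lambda x, y: -y / 2, 'u': lambda x, y: x / 2,
           'l': lambda x, y: y / 2, 'd': lambda x, y: -x / 2}
incremental((0, 0), "ruld", area_vh)  # 1.0
```

One pass over the word suffices for any choice of potentials, which is the source of the linear complexity noted in the conclusion.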
The integral formulas in Lemma 1 yield the corresponding incremental algorithms called respectively V-algorithm, H-algorithm and VH-algorithm, where the letters V and H stand for the words vertical and horizontal: in a V-algorithm (resp. H-algorithm) only vertical (resp. horizontal) sides of the polyomino are used; in VH-algorithms both vertical and horizontal sides are used.

Proposition 1 (Incremental Algorithms of Green's Type). Let P be a polyomino encoded by (s, ω). Then,

∫∫_P f(x, y) dx dy = Σ_→ Φr(xi, yi) + Σ_↑ Φu(xi, yi) + Σ_← Φl(xi, yi) + Σ_↓ Φd(xi, yi),

where the functions Φr, Φu, Φl, Φd are taken from any of the following three sets of possibilities:

V:  Φr = 0, Φu = ∫_0^1 f1(x, y+t) dt, Φl = 0, Φd = −∫_0^1 f1(x, y−t) dt.
H:  Φr = −∫_0^1 f2(x+t, y) dt, Φu = 0, Φl = ∫_0^1 f2(x−t, y) dt, Φd = 0.
VH: Φr = −y ∫_0^1 F(x+t, y) dt, Φu = x ∫_0^1 F(x, y+t) dt, Φl = y ∫_0^1 F(x−t, y) dt, Φd = −x ∫_0^1 F(x, y−t) dt.

where f1(x, y), f2(x, y) and F(x, y) are defined by Lemma 1.

Elementary instances of these algorithms are given in the following tables for the area (Table 1), where f(x, y) = 1; center of gravity (Table 2), where f(x, y) = x and f(x, y) = y; and moment of inertia (Table 3), where f(x, y) = x² + y².

Table 1. Area.

Algorithm  Φr    Φu   Φl   Φd
V-algo     0     x    0    −x
H-algo     −y    0    y    0
VH-algo    −y/2  x/2  y/2  −x/2
For instance, using the polyomino ω = rrururullulddldd, we obtain:

VH-algorithm for the area, Σ_→ (−yi/2) + Σ_↑ (xi/2) + Σ_← (yi/2) + Σ_↓ (−xi/2):

∫∫_P 1 dx dy = −y0/2 − y1/2 + x2/2 − y3/2 + x4/2 − y5/2 + x6/2 + y7/2 + y8/2 + x9/2 + y10/2 − x11/2 − x12/2 + y13/2 − x14/2 − x15/2 = 1 − 1/2 + 3/2 − 1 + 2 + 3/2 + 3/2 + 1 + 2 − 1/2 − 1/2 + 1 = 9.

V-algorithm for x̄ of the center of gravity, Σ_→ 0 + Σ_↑ (xi²/2) + Σ_← 0 + Σ_↓ (−xi²/2):

∫∫_P x dx dy = x2²/2 + x4²/2 + x6²/2 + x9²/2 − x11²/2 − x12²/2 − x14²/2 − x15²/2 = 31/2.

V-algorithm for the integral involved in the moment of inertia:

∫∫_P (x² + y²) dx dy = Σ_↑ (xi/3 + xi yi + xi³/3 + xi yi²) + Σ_↓ (−xi/3 + xi yi − xi³/3 − xi yi²) = 74.
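These three computations can be replayed exactly with rational arithmetic. The following verification sketch (not part of the paper) uses the potentials of Tables 1–3 on the same word ω = rrururullulddldd, and cross-checks the moment sum against the per-pixel integrals of x² + y², which equal α² + α + β² + β + 2/3 for the pixel pix_{α,β}:

```python
from fractions import Fraction as F

def run(word, phi, s=(0, 0)):
    """Incremental algorithm with exact rational partial sums."""
    step = {'r': (1, 0), 'u': (0, 1), 'l': (-1, 0), 'd': (0, -1)}
    (x, y), S = s, F(0)
    for c in word:
        S += phi[c](F(x), F(y))
        x, y = x + step[c][0], y + step[c][1]
    return S

w = "rrururullulddldd"
zero = lambda x, y: F(0)

area = run(w, {'r': lambda x, y: -y / 2, 'u': lambda x, y: x / 2,
               'l': lambda x, y: y / 2, 'd': lambda x, y: -x / 2})
xbar_num = run(w, {'r': zero, 'u': lambda x, y: x**2 / 2,
                   'l': zero, 'd': lambda x, y: -x**2 / 2})
moment = run(w, {'r': zero,
                 'u': lambda x, y: x / 3 + x * y + x**3 / 3 + x * y**2,
                 'l': zero,
                 'd': lambda x, y: -x / 3 + x * y - x**3 / 3 - x * y**2})

# Cross-check against the per-pixel integral of x^2 + y^2 over the
# nine pixels of P (lower-left corners read off the contour):
pixels = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1),
          (1, 2), (2, 2), (3, 2), (1, 3)]
direct = sum(F(a * a + a) + F(b * b + b) + F(2, 3) for a, b in pixels)
```

The sums evaluate to area = 9, x̄-numerator = 31/2, and moment integral = 74, the last agreeing with the direct pixel-by-pixel summation.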
Table 2. Center of gravity.

Algorithm        Φr           Φu          Φl          Φd
V-algo (num x̄)   0            x²/2        0           −x²/2
V-algo (num ȳ)   0            x/2 + xy    0           x/2 − xy
H-algo (num x̄)   −y/2 − xy    0           −y/2 + xy   0
H-algo (num ȳ)   −y²/2        0           y²/2        0
VH-algo (num x̄)  −xy/3 − y/6  x²/3        xy/3 − y/6  −x²/3
VH-algo (num ȳ)  −y²/3        xy/3 + x/6  y²/3        −xy/3 + x/6
Table 3. Moment of inertia.

V-algo:  Φr = 0, Φu = x/3 + xy + x³/3 + xy², Φl = 0, Φd = −x/3 + xy − x³/3 − xy²
H-algo:  Φr = −y/3 − xy − x²y − y³/3, Φu = 0, Φl = y/3 − xy + x²y + y³/3, Φd = 0
VH-algo: Φr = −y/12 − xy/4 − x²y/4 − y³/4, Φu = x/12 + xy/4 + x³/4 + xy²/4,
         Φl = y/12 − xy/4 + x²y/4 + y³/4, Φd = −x/12 + xy/4 − x³/4 − xy²/4
The next example computes the probability that a random point (x, y) ∈ R × R, under a normal bivariate probability distribution f(x, y) = (1/π) exp(−x² − y²), falls in a given polyomino P. In this case the VH-algorithm is complicated and only the V- and H-algorithms are given (see Table 4). Discrete probability distributions (such as uniform distributions over rectangles) will be considered in the next section. Due to its formulation, the VH-algorithm is in general more complicated than the corresponding V- and H-algorithms. There is, however, an important class of functions for which the VH-algorithm is generally preferable: the class of homogeneous functions f(x, y), that is, those functions satisfying a functional equation of the form f(sx, sy) = s^k f(x, y) for a constant k, called the degree of homogeneity. The corresponding VH-algorithm is described in Corollary 1.

Corollary 1. Let f(x, y) be a continuous homogeneous function of degree k > −2 and let Φr, Φu, Φl, Φd be defined by

Φr(x, y) = (−y/(k+2)) (f1(x+1, y) − f1(x, y)),  Φu(x, y) = (x/(k+2)) (f2(x, y+1) − f2(x, y)),
Φl(x, y) = (−y/(k+2)) (f1(x−1, y) − f1(x, y)),  Φd(x, y) = (x/(k+2)) (f2(x, y−1) − f2(x, y)),

where f1(x, y) and f2(x, y) are defined in Lemma 1. Then the corresponding additive incremental VH-algorithm computes ∫∫_P f(x, y) dx dy, for P.
Here is a typical illustration of Corollary 1 for which the VH-algorithm is simpler than the corresponding V- or H-algorithms. The computation of the average Euclidean distance from a given point (a, b) ∈ Z × Z to a random point in a polyomino P is given by the formula
Table 4. f(x, y) = (1/π) exp(−x² − y²), erf(x) = (2/√π) ∫_0^x exp(−t²) dt.

V-algo: Φr = 0, Φu = (1/4) erf(x)(erf(y + 1) − erf(y)),
        Φl = 0, Φd = (1/4) erf(x)(erf(y − 1) − erf(y)).
H-algo: Φr = −(1/4) erf(y)(erf(x + 1) − erf(x)), Φu = 0,
        Φl = −(1/4) erf(y)(erf(x − 1) − erf(x)), Φd = 0.
(1/A(P)) ∫∫_P √((x − a)² + (y − b)²) dx dy.

This is reducible to the computation of the integral ∫∫_P f(x, y) dx dy by simply replacing the starting point s = (x0, y0) by s − (a, b) = (x0 − a, y0 − b). This corresponds to the choice f(x, y) = √(x² + y²) and k = 1 in Corollary 1. In this case, the functions f1(x, y) and f2(x, y) are given by the formulas

f1(x, y) = (1/2) x √(x² + y²) + (1/2) y² ln(x + √(x² + y²)),
f2(x, y) = (1/2) y √(x² + y²) + (1/2) x² ln(y + √(x² + y²)).
3 Additive Incremental Algorithms
In the above examples, the function f = f(x, y) was assumed to be continuous. We can often drop this condition on f and still use Proposition 1 as a guideline to devise corresponding algorithms. For example, algorithms for the computation of horizontal and vertical projections of a polyomino can be found in this way: take an integer α and define f(x, y) = χ(α ≤ x < α + 1), where χ denotes the characteristic function (which takes the value 1 if the inequations are satisfied, and 0 otherwise). Then, obviously, ∫∫_P f(x, y) dx dy is the α-vertical projection of the polyomino P:

∫∫_P f(x, y) dx dy = #{β ∈ Z | pix_{α,β} ⊆ P} = vα(P),

where pix_{α,β} denotes the unit pixel of the plane having the point (α, β) ∈ Z × Z as its lowest left corner: pix_{α,β} = {(x, y) ∈ R × R | α ≤ x < α + 1, β ≤ y < β + 1}. In this case, using Proposition 1, we find that

f1(x, y) = ∫^x χ(α ≤ u < α + 1) du = 0 if x < α; x − α if α ≤ x < α + 1; 1 if α + 1 ≤ x.

This gives the following V-algorithm for the vertical projection vα(P):

Φr = 0, Φu = χ(x ≥ α + 1), Φl = 0, Φd = −χ(x ≥ α + 1).
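As a runnable sketch (not from the paper), this V-algorithm can be checked against a direct pixel count; the pixel set of the example polyomino ω = rrururullulddldd of Section 2 is hardcoded below for the comparison:

```python
def v_projection(word, alpha, s=(0, 0)):
    """alpha-vertical projection via Phi_u = chi(x >= alpha+1), Phi_d = -chi(...)."""
    step = {'r': (1, 0), 'u': (0, 1), 'l': (-1, 0), 'd': (0, -1)}
    (x, y), S = s, 0
    for c in word:
        if c == 'u' and x >= alpha + 1:
            S += 1
        elif c == 'd' and x >= alpha + 1:
            S -= 1
        x, y = x + step[c][0], y + step[c][1]
    return S

w = "rrururullulddldd"
pixels = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1),
          (1, 2), (2, 2), (3, 2), (1, 3)]  # pixels of the example polyomino
# v_projection(w, a) equals the number of pixels in column a, for every a
```

The check also passes trivially for columns outside the polyomino, where both counts are zero.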
Similarly, taking f(x, y) = χ(β ≤ y < β + 1), the β-horizontal projection of the polyomino P, defined by #{α ∈ Z | pix_{α,β} ⊆ P} = hβ(P), can be computed by the H-algorithm for the horizontal projection hβ(P):

Φr = −χ(y ≥ β + 1), Φu = 0, Φl = χ(y ≥ β + 1), Φd = 0.
These algorithms for the projections are special instances of the general notion of additive incremental algorithm defined as follows.

Definition 1. An incremental algorithm Φr(x, y), Φu(x, y), Φl(x, y), Φd(x, y) is called additive if, whenever P is the union of two polyominoes P1, P2 with disjoint interiors, we have

output(P) = output(P1 ∪ P2) = output(P1) + output(P2).

An example of a non-additive incremental algorithm is given by Φr = Φu = Φl = Φd = 1, which simply computes the perimeter of a polyomino.

Proposition 2. An incremental algorithm Φr(x, y), Φu(x, y), Φl(x, y), Φd(x, y) is additive if and only if

Φl(x, y) = −Φr(x − 1, y) and Φd(x, y) = −Φu(x, y − 1).

Moreover the output of an additive incremental algorithm on P is given by

output(P) = Σ_{pix_{α,β} ⊆ P} (∆x Φu(α, β) − ∆y Φr(α, β)),   (1)

where ∆x Φ(x, y) = Φ(x + 1, y) − Φ(x, y) and ∆y Φ(x, y) = Φ(x, y + 1) − Φ(x, y).

Proof. (Sketch) The main idea is to reduce the analysis to the case where the polyomino is a (horizontal or vertical) domino, where the sum cancels over the common edge.

Proposition 2 can be used, for example, to prove rigorously that a given additive incremental algorithm actually works. For example, the reader can check, using it, that the above algorithms for the projections vα(P) and hβ(P) are valid. The validity of the boolean-valued additive incremental algorithms below can also be checked using Proposition 2. Another use of this proposition is to create new algorithms starting first from an arbitrary choice of functions Φr(x, y), Φu(x, y); then by defining the associated functions Φl(x, y), Φd(x, y); and, finally, by computing the corresponding output.
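Additivity can be observed on a small example (an illustrative sketch, not from the paper): splitting a 2 × 1 domino into two unit squares, the V-algorithm for the numerator of x̄ (Φu = x²/2, Φd = −x²/2, see Table 2) gives the same output for the union as the sum of the outputs for the parts:

```python
def run(word, phi, s=(0, 0)):
    step = {'r': (1, 0), 'u': (0, 1), 'l': (-1, 0), 'd': (0, -1)}
    (x, y), S = s, 0.0
    for c in word:
        S += phi[c](x, y)
        x, y = x + step[c][0], y + step[c][1]
    return S

# V-algorithm for the numerator of the center of gravity (Table 2):
phi = {'r': lambda x, y: 0.0, 'u': lambda x, y: x * x / 2,
       'l': lambda x, y: 0.0, 'd': lambda x, y: -x * x / 2}

left = run("ruld", phi, s=(0, 0))      # unit square [0,1] x [0,1]
right = run("ruld", phi, s=(1, 0))     # unit square [1,2] x [0,1]
domino = run("rrulld", phi, s=(0, 0))  # their union, a 2 x 1 domino
# domino == left + right
```

The contributions of the two contour edges along the shared vertical segment cancel, exactly as in the proof sketch of Proposition 2.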
Fig. 2. (a) Pixel pix1,3 in the polyomino (b) pixel pix4,3 not in the polyomino.
Deciding if a Polyomino Contains a Given Pixel. Let (α, β) ∈ Z × Z and consider the boolean-valued function Wα,β(x, y) = χ(x = α)χ(y = β). Since

Σ_{pix_{x,y} ⊆ P} Wα,β(x, y) = χ(pix_{α,β} ⊆ P) = 1 if pix_{α,β} ⊆ P, 0 otherwise,

then the following additive incremental algorithms can be used to decide whether the pixel determined by (α, β) belongs or not to a polyomino P.

V-algorithm: Φr = 0, Φu = χ(x ≥ α + 1)χ(y = β), Φl = 0, Φd = −χ(x ≥ α + 1)χ(y = β + 1).
H-algorithm: Φr = −χ(x = α)χ(y ≥ β + 1), Φu = 0, Φl = χ(x = α + 1)χ(y ≥ β + 1), Φd = 0.

For example, the V-algorithm applied to Figure 2(a) with (α, β) = (1, 3) and to Figure 2(b) with (α, β) = (4, 3) gives respectively (only non-zero terms listed):

χ(pix_{1,3} ⊆ P) = χ(x11 ≥ 2)χ(y11 = 3) − χ(x16 ≥ 2)χ(y16 = 4) + χ(x22 ≥ 3)χ(y22 = 3) = 1 − 1 + 1 = 1 (since pix_{1,3} ⊆ P);
χ(pix_{4,3} ⊆ P) = χ(x11 ≥ 2)χ(y11 = 3) − χ(x16 ≥ 2)χ(y16 = 4) = 1 − 1 = 0 (since pix_{4,3} ⊄ P).

Of course there is an uncountable family of algorithms Φr^{α,β}, Φu^{α,β}, Φl^{α,β}, Φd^{α,β} from which one can compute χ(pix_{α,β} ⊆ P).
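A runnable sketch of the V-algorithm above (not part of the paper), tested on the example polyomino ω = rrururullulddldd of Section 2; the tested pixels are illustrative:

```python
def contains_pixel(word, alpha, beta, s=(0, 0)):
    """chi(pix_{alpha,beta} in P) via the V-algorithm:
    Phi_u = chi(x >= alpha+1)chi(y = beta), Phi_d = -chi(x >= alpha+1)chi(y = beta+1)."""
    step = {'r': (1, 0), 'u': (0, 1), 'l': (-1, 0), 'd': (0, -1)}
    (x, y), S = s, 0
    for c in word:
        if c == 'u' and x >= alpha + 1 and y == beta:
            S += 1
        elif c == 'd' and x >= alpha + 1 and y == beta + 1:
            S -= 1
        x, y = x + step[c][0], y + step[c][1]
    return S

w = "rrururullulddldd"
# contains_pixel(w, 1, 1) == 1   (pix_{1,1} is inside the polyomino)
# contains_pixel(w, 3, 3) == 0   (pix_{3,3} is not)
```

Only the contour is read; no filled raster of the polyomino is ever built.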
Pixels in Common between a Polyomino and a Given Set. Let S be a set of pixels and let Φr^{p,q}, Φu^{p,q}, Φl^{p,q}, Φd^{p,q} be an algorithm for the computation of χ(pix_{p,q} ⊆ P), (p, q) ∈ Z × Z. Then, to decide if a polyomino P intersects S, one must compute χ(S ∩ P ≠ ∅). This can obviously be done by taking Φr^S, Φu^S, Φl^S, Φd^S, where

Φr^S(x, y) = sup_{pix_{p,q} ⊆ S} Φr^{p,q}(x, y),  Φu^S(x, y) = sup_{pix_{p,q} ⊆ S} Φu^{p,q}(x, y),
Φl^S(x, y) = sup_{pix_{p,q} ⊆ S} Φl^{p,q}(x, y),  Φd^S(x, y) = sup_{pix_{p,q} ⊆ S} Φd^{p,q}(x, y).
To compute the number #(S ∩ P) of pixels in common between S and P, simply replace in the last algorithm the sup symbols by summation symbols Σ.

Computation of Hook-Lengths. Consider the north-east corner in the R × R plane associated to a given lattice point (α, β) ∈ Z × Z:

NE_{α,β} = {(x, y) ∈ R × R | α ≤ x, β ≤ y} = [α, ∞) × [β, ∞).

Then the reader can check that the following algorithms can be used to compute, for a polyomino P, the number of pixels in P ∩ NE_{α,β}, i.e., the number of pixels of P which are to the north-east of (α, β) (see Figure 3):

V-algorithm: Φr = 0, Φu = (x − α)χ(x ≥ α + 1)χ(y ≥ β), Φl = 0, Φd = −(x − α)χ(x ≥ α + 1)χ(y ≥ β + 1).
H-algorithm: Φr = −(y − β)χ(x ≥ α)χ(y ≥ β + 1), Φu = 0, Φl = (y − β)χ(x ≥ α + 1)χ(y ≥ β + 1), Φd = 0.
Fig. 3. There are 21 pixels in P to the north-east of (α, β), and 11 pixels in the Hookα,β .
Let (α, β) ∈ Z × Z and P be a polyomino. The hook-length of (α, β) ∈ P is hook_{α,β}(P) = #(P ∩ Hook_{α,β}) where Hook_{α,β} = NE_{α,β} \ NE_{α+1,β+1}. In other words, it is the number of pixels of P belonging to the L-shaped Hook_{α,β} determined by (α, β) (see Figure 3). Replacing (α, β) by (α + 1, β + 1) in the above algorithms and subtracting gives corresponding algorithms for the computation of hook-lengths.
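A sketch of the V-algorithm for north-east counts, together with the hook-length obtained by the subtraction just described (checked on the example polyomino of Section 2; not part of the paper):

```python
def ne_count(word, alpha, beta, s=(0, 0)):
    """Number of pixels of P in NE_{alpha,beta}, via the V-algorithm:
    Phi_u = (x-a)chi(x>=a+1)chi(y>=b), Phi_d = -(x-a)chi(x>=a+1)chi(y>=b+1)."""
    step = {'r': (1, 0), 'u': (0, 1), 'l': (-1, 0), 'd': (0, -1)}
    (x, y), S = s, 0
    for c in word:
        if c == 'u' and x >= alpha + 1 and y >= beta:
            S += x - alpha
        elif c == 'd' and x >= alpha + 1 and y >= beta + 1:
            S -= x - alpha
        x, y = x + step[c][0], y + step[c][1]
    return S

def hook_length(word, alpha, beta, s=(0, 0)):
    # Hook_{a,b} = NE_{a,b} \ NE_{a+1,b+1}
    return ne_count(word, alpha, beta, s) - ne_count(word, alpha + 1, beta + 1, s)

w = "rrururullulddldd"
# ne_count(w, 0, 0) == 9 (all nine pixels), ne_count(w, 1, 1) == 6,
# hook_length(w, 1, 1) == 4
```

Both quantities come from two passes over the same contour word.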
P
286
Sreˇcko Brlek, Gilbert Labelle, and Annie Lacasse
where the second is obtained by a simple translation. In this case, (x + 1)m+1 − (x)m+1 (y + 1)n+1 − (y)n+1 . xm y n dx dy = W (x, y) = m+1 n+1 P 1 m+1 n+1 = ∆y y . ∆x x (m + 1)(n + 1) k k (v) , where Svk denotes the Now, it is well-known (see [4]) that tk = v=0 Sv t (v) Stirling numbers of the second kind and t = t(t − 1) . . . (t − v + 1). Since ∆t t(v) = vt(v−1) , it is easily seen that,
W(x, y) = Σ_{0≤i≤m, 0≤j≤n} w_{i,j} x^{(i)} y^{(j)},   w_{i,j} = ((i + 1)(j + 1)/((m + 1)(n + 1))) S_{i+1}^{m+1} S_{j+1}^{n+1}.
To find solutions (U, V) = (Φu, Φr) of the difference equation (1), let U(x, y) = Σ u_{i,j} x^{(i)} y^{(j)}, V(x, y) = Σ v_{i,j} x^{(i)} y^{(j)}. Then,

∆x U − ∆y V = Σ ((i + 1) u_{i+1,j} − (j + 1) v_{i,j+1}) x^{(i)} y^{(j)},

and the problem is reduced to solving the linear system

(i + 1) u_{i+1,j} − (j + 1) v_{i,j+1} = w_{i,j},   i, j ≥ 0.
Of course, many choices are possible for the ui,j ’s, vi,j ’s and the same kind of approach can be used for other wi,j ’s.
4 Conclusion
The Discrete Green Theorem provides a general framework allowing the discovery and development of new algorithms for the computation of many statistics on polyominoes. Let us also mention the simultaneous computation of vertical projections of a polyomino P: setting Φr(x, y) = 0 and Φu(x, y) = q^x, where q is a formal variable, the coefficients of

Σ_{α∈Z} vα(P) q^α = −output(P)/(1 − q)

are the vertical projections (horizontal or oblique projections are obtained in a similar way). This might be of some help for the study of families of polyominoes defined by their projections (see [1, 6]). Computations on integer partitions are obtained along the same lines since partitions are special cases of polyominoes which are encoded by words of the type ω = r^i θ d^j, where θ is a word on {u, l} containing i times the letter l and j times the letter u. Note also that the complexity of these algorithms is linear (in time and space) in the boundary size of a polyomino: indeed, the length of the Freeman chain code of a polyomino is its perimeter, which determines the number of iterations in the incremental algorithms. The careful reader has certainly noticed that the algorithms carried out can be straightforwardly adapted to more general objects: for a polyomino with holes
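The generating-function identity can be checked with elementary polynomial arithmetic (a verification sketch, not part of the paper) on the example polyomino of Section 2, whose vertical projections are (2, 4, 2, 1) for columns 0–3: with Φu = q^x and Φd = −q^x, output(P) must equal (q − 1) Σ_α vα(P) q^α.

```python
from collections import defaultdict

def output_poly(word, s=(0, 0)):
    """Coefficients of output(P) for Phi_u = q^x, Phi_d = -q^x (Phi_r = Phi_l = 0)."""
    step = {'r': (1, 0), 'u': (0, 1), 'l': (-1, 0), 'd': (0, -1)}
    (x, y), out = s, defaultdict(int)
    for c in word:
        if c == 'u':
            out[x] += 1
        elif c == 'd':
            out[x] -= 1
        x, y = x + step[c][0], y + step[c][1]
    return {e: c for e, c in out.items() if c}

w = "rrururullulddldd"
projections = {0: 2, 1: 4, 2: 2, 3: 1}  # vertical projections, by direct count

# Expand (q - 1) * sum_a v_a q^a as a coefficient dictionary:
product = defaultdict(int)
for a, v in projections.items():
    product[a + 1] += v
    product[a] -= v
product = {e: c for e, c in product.items() if c}
# product == output_poly(w), i.e. sum_a v_a q^a == -output(P)/(1 - q)
```

Multiplying by (q − 1) avoids a formal power-series division while checking exactly the same identity.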
it suffices to subtract the holes; needless to say, it also extends to objects coded by a closed curve. The lack of space permits us to show only a small part of the results of this method. For the detailed proofs, discussion, as well as other features not presented here, the reader is referred to the research report [3], which can be obtained from the authors on special request.
Acknowledgements. The authors wish to thank the anonymous referees for the valuable comments that greatly improved the readability of the paper.
References

1. Barcucci, E., Del Lungo, A., Nivat, M., Pinzani, R.: Reconstructing convex polyominoes from their vertical and horizontal projections. Theoret. Comput. Sci. 155 (1996) 321–347
2. Bousquet-Mélou, M.: New enumerative results on two-dimensional directed animals. Discrete Math. 180 (1-3) (1998) 73–106
3. Brlek, S., Labelle, G., Lacasse, A.: Incremental Algorithms for Polyominoes Coded by their Contour. Research Report, Lacim (Université du Québec à Montréal) (2003)
4. Clarke, A. L.: Isometrical polyominoes. J. Recreational Math. 13 (1980) 18–25
5. Comtet, L.: Advanced Combinatorics. Reidel (1974)
6. Delest, M. P., Gouyou-Beauchamps, D., Vauquelin, B.: Enumeration of parallelogram polyominoes with given bound and site perimeter. Graphs Comb. 3 (1987) 325–339
7. Del Lungo, A.: Polyominoes defined by two vectors. Theoret. Comput. Sci. 127 (1) (1994) 187–198
8. Freeman, H.: On the Encoding of Arbitrary Geometric Configurations. IRE Trans. Electronic Computer 10 (1961) 260–268
9. Freeman, H.: Boundary encoding and processing. In: Picture Processing and Psychopictorics, B.S. Lipkin and A. Rosenfeld, Editors. Academic Press, New York (1970) 241–266
10. Golomb, S. W.: Polyominoes: Puzzles, Patterns, Problems, and Packings. Princeton University Press (1996)
11. Philips, W.: A new fast algorithm for moment computation. Pattern Recognition 26 (11) (1993) 1619–1621
12. Tang, G.Y., Lien, B.: Region Filling With The Use Of The Discrete Green Theorem. Proc. CVGIP 42 (1988) 297–305
13. Viennot, X. G.: A survey of polyomino enumeration. Proc. Séries formelles et combinatoire algébrique, Montréal, Juin 1992. Publications du LACIM 11, Université du Québec à Montréal (1996)
14. Yang, L., Albregtsen, F.: Fast computation of invariant geometric moments: A new method giving correct results. In: Proceedings of the International Conference on Pattern Recognition (ICPR'94) (1994) A:201–204
15. Yang, L., Albregtsen, F.: Fast and exact computation of Cartesian geometric moments using discrete Green's theorem. Pattern Recognition 29 (7) (1996) 1061–1073
Using 2D Topological Map Information in a Markovian Image Segmentation

Guillaume Damiand, Olivier Alata, and Camille Bihoreau

IRCOM-SIC, UMR-CNRS 6615 - bât. SP2MI, Bvd M. et P. Curie, BP 30179, 86962 Futuroscope Chasseneuil Cedex, France
{damiand,alata}@sic.univ-poitiers.fr

Abstract. Topological map is a mathematical model of labeled image representation which contains both topological and geometrical information. In this work, we use this model to improve a Markovian segmentation algorithm. Image segmentation methods based on the Markovian assumption consist in optimizing a Gibbs energy function. This energy function can be given by a sum of potentials which could be based on the shape or the size of a region, the number of adjacencies, etc., and can be computed by using the topological map. In this work we propose the integration of a new potential, the global linearity of the boundaries, and show how this potential can be extracted from the topological map. Moreover, to decrease the complexity of our algorithm, we propose a local modification of the topological map in order to avoid the reconstruction of the entire structure.

Keywords: Markovian segmentation, topological maps, region segmentation, boundaries linearity.
1 Introduction
Topological maps have been studied for several years in 2D [1,2,3] and more recently in 3D [4,5,6,7]. Indeed, a topological map represents a labeled image with interesting properties: it is minimal in number of atomic elements (darts); it is complete, representing both the topology and the geometry of the image; and it is unique. For these reasons, the topological map allows one to retrieve most of the information which may be required by an image processing algorithm with a low computational cost. So the topological map seems to be a good tool for defining efficient image processing algorithms.

The main objective of image segmentation is to partition the pixels of an image. In this context, there are two main research axes: the boundary-based and the region-based methods [8]. Fusions of both approaches have also been proposed. Image segmentation can be used in many applications like content-based image retrieval, computer-aided medical diagnosis, recovery of shape information from an image, etc.

In previous works [9,10,11], the topological map was often used in split-and-merge algorithms, since it is well suited to implement such methods efficiently. In this work, we present a new utilization of the topological map to improve a Markovian segmentation algorithm.

I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 288–297, 2003. © Springer-Verlag Berlin Heidelberg 2003
The image partition contains areas of pixels considered as homogeneous with respect to some properties. Using Markov models and Monte-Carlo Markov Chain (MCMC) implementations like Simulated Annealing (SA) [12], the properties used for aggregating pixels are often only statistical [13,14,15,16]. Nevertheless, many other geometrical or topological properties of the segmented or label field could be used: the Markovian assumption for the representation of the hierarchical field, composed of an observation field and a label field, leads to an unnormalized Gibbs distribution; the energy of the Gibbs distribution can be written as a sum of potential functions, which is a powerful tool for the fusion of information; geometrical or topological information on the label field could then be integrated in potential functions. Our aim is then to find some potential functions based on geometrical or topological properties and to compute them thanks to the topological map. In this paper, we show how to favor the creation of regions with linear boundaries during the segmentation process. We first present in Sec. 2 a brief review of topological maps, which are combinatorial maps extended to represent images. Then in Sec. 3 we introduce Markovian image segmentation and show how potentials are integrated in such a process. In Sec. 4 we define our new potential used to favor linear boundaries, and present how this new potential is integrated with topological maps in Sec. 5. We give experimental results in Sec. 6, then we conclude and present some perspectives in Sec. 7.
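For readers unfamiliar with the formalism, a Gibbs energy written as a sum of potential functions might look like the following sketch; the data and Potts-style smoothness potentials are purely illustrative and are not the potentials of this paper, whose point is precisely to add further geometrical and topological summands (such as the boundary-linearity potential of Sec. 4):

```python
def gibbs_energy(labels, observations, beta):
    """Illustrative Gibbs energy: a data term plus a Potts-style pairwise term."""
    h, w = len(labels), len(labels[0])
    energy = 0.0
    for i in range(h):
        for j in range(w):
            energy += (observations[i][j] - labels[i][j]) ** 2  # data potential
            if i + 1 < h:  # vertical neighbor pair
                energy += beta * (labels[i][j] != labels[i + 1][j])
            if j + 1 < w:  # horizontal neighbor pair
                energy += beta * (labels[i][j] != labels[i][j + 1])
    return energy

labels = [[0, 0], [1, 1]]
observations = [[0.1, 0.2], [0.9, 1.1]]
# gibbs_energy(labels, observations, beta=1.0): 0.07 of data term
# plus 2 boundary edges weighted by beta
```

Because the energy is a plain sum, any new potential, including one computed from the topological map, simply contributes one more summand.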
2 Topological Maps
Topological maps are an extension of combinatorial maps [17,18] designed to represent a labeled image in a unique and minimal way. Indeed, combinatorial maps are a good model that allows the representation of any orientable, quasi-manifold, closed subdivision in any dimension, but one object can be represented by different maps. We briefly present here the main notions of combinatorial maps and of topological maps (see [19,5] for more details). Intuitively, a 2D combinatorial map is an extension of a planar graph that keeps the orientation of the edges around each vertex. Each edge of the graph is divided into two parts. The basic elements obtained are called darts and are the unique atoms of the combinatorial map definition. A combinatorial map is an algebra composed of a set of darts that represent the elements of the subdivision, and two mappings defined on these darts that represent adjacency relations (this can easily be extended to nD, with n mappings). We can see in Fig. 1 an image and the corresponding topological map. In this figure, there are 20 darts numbered from 1 to 20. β1 is a permutation1 that connects a dart to the next dart of the same face. For example, β1(2) = 3 and β1(3) = 11. β2 is an involution2 that connects the two darts belonging to the same edge. In our example, we have β2(1) = 2 (and, since β2 is an involution, also β2(2) = 1). When two darts d1 and d2 are such that
1. A permutation on a set S is a one-to-one mapping from S onto S.
2. An involution f on a set S is a one-to-one mapping from S onto S such that f = f−1.
Guillaume Damiand, Olivier Alata, and Camille Bihoreau
Fig. 1. (a) A 2D image drawn with its interpixel boundaries. (b) The corresponding topological map. Each dart is represented by a numbered arrow. β1 connects a dart to the next dart of the same face (drawn consecutively, orientation is represented by arrows). β2 connects two darts drawn parallel, close to each other, and with reverse orientations. (c) The embedding used in this work (partial representation). Each edge is linked with a 1D oriented curve. Only one of the two darts that compose an edge points to the 1D curve (links are represented by dashed arrows). The orientation of the curve is given by the orientation of this dart.
βi(d1) = d2, we say that d1 is i-sewn with d2. We call i-sewing (resp. i-unsewing) the operation that connects (resp. disconnects) two darts for βi. A topological map is a combinatorial map that represents a labeled image and that satisfies particular properties. Indeed, this map is minimal, complete and unique. These properties lead to another characteristic of the topological map: each edge represents exactly one interpixel boundary between two regions of the image (this can be verified in Fig. 1). The interpixel boundary between two regions Ri and Rj is the set of interpixel curves such that each linel of these curves is incident to exactly one pixel of Ri and one pixel of Rj. The combinatorial map represents the topological part of our model: all the cells of the space subdivision and all the adjacency and incidence relations. But it is also necessary to represent the geometry of the image. We call embedding the operation that associates a geometrical model with a combinatorial map, and we also use the term embedding to designate this geometrical model. There are many possibilities for embedding a combinatorial map, and the choice of one of them depends on the needs of each application. In this work, we link to each edge of the map a 1D oriented curve. This curve represents the geometry of the interpixel boundary associated with the edge. We can see in Fig. 1(c) this type of embedding for the map already presented in the previous figure (only a partial representation). Each 1D curve is described with a 1D combinatorial map (we thus have a hierarchical model that facilitates extension to higher dimensions). Each vertex of these curves represents a pointel of an interpixel boundary, and each edge represents a maximal set of aligned linels.
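As a concrete illustration, the dart/β1/β2 structure just described can be prototyped in a few lines. This is a toy sketch under our own naming (`CombinatorialMap2D`, `sew1`, `sew2` are hypothetical, not the authors' implementation), encoding a single square region: darts 1-4 bound the inner face and darts 5-8 the outer one.

```python
class CombinatorialMap2D:
    """Toy 2D combinatorial map: darts are integers, beta1 is a permutation
    (next dart of the same face), beta2 an involution (opposite dart of the
    same edge)."""

    def __init__(self):
        self.beta1 = {}   # dart -> next dart around its face
        self.beta2 = {}   # dart -> opposite dart on the same edge

    def sew1(self, d1, d2):
        """1-sew: make d2 follow d1 around a face."""
        self.beta1[d1] = d2

    def sew2(self, d1, d2):
        """2-sew: pair the two darts of one edge (beta2 is an involution)."""
        self.beta2[d1] = d2
        self.beta2[d2] = d1

    def face_orbit(self, d):
        """Darts of the face containing d (orbit of d under beta1)."""
        orbit, cur = [d], self.beta1[d]
        while cur != d:
            orbit.append(cur)
            cur = self.beta1[cur]
        return orbit

# A single square region: inner face darts 1-4, outer face darts 5-8.
m = CombinatorialMap2D()
for a, b in [(1, 2), (2, 3), (3, 4), (4, 1), (5, 6), (6, 7), (7, 8), (8, 5)]:
    m.sew1(a, b)
for a, b in [(1, 5), (2, 8), (3, 7), (4, 6)]:
    m.sew2(a, b)

assert m.face_orbit(1) == [1, 2, 3, 4]
assert all(m.beta2[m.beta2[d]] == d for d in m.beta2)  # involution property
```

In a full topological map each dart would in addition carry a pointer to its region and to the 1D curve embedding its edge.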
3 Integration of Information in Markovian Image Segmentation
Let X = {Xs, s ∈ S} be a family of random variables on a regular grid S, a finite subset of Z2, and let an image x = {xs, s ∈ S} be a realization of X. Each
Xs has the same state space, which can be Ωx = {0, 1, . . . , 255}, the set of gray pixel values, or Ωx = R, etc. We now suppose that there is another random field L = {Ls, s ∈ S}, called the label field, with state space Ωl = {c1, c2, . . . , cK}, for which a realization will be l = {ls, s ∈ S}. In our case, a label represents the region to which the pixel belongs. From these definitions, we suppose that X and L are defined in a hierarchical way: for each ls, we have a conditional model for Xs that can be, for example, an independently and identically distributed (i.i.d.) model or a 2D Gaussian Markovian model [13,15,16]. As the field X is supposed to be composed of K areas, we then have K probability laws, defined by the chosen stochastic models, describing the variations of gray pixel values in x for the different regions belonging to the K areas. Besides, such models allow the use of Maximum-Likelihood-based algorithms. For example, in the case of the Gaussian i.i.d. mixture model, K and the parametric models, θk = {µk, σk}, k = 1 . . . K, can be estimated with a Stochastic Expectation Maximization (SEM) algorithm [20]. µk and σk are respectively the mean and the standard deviation of the area k. In the following, K and θk, k = 1 . . . K, are therefore supposed to be known. At this step, we now need to estimate the segmented field, ˆl, or, in other words, the image partition. The likelihood method defines the a posteriori law P(L = l/X = x), or P(l/x), as the probability of getting one specific realization of the label field knowing the observation field. If the Markovian assumption is made about the (X, L) hierarchical field, P(l/x) can be written in a general form following the Gibbs distribution:

P(l/x) ∝ exp(−U(x, l) / T)   (1)

in which U is an "energy function" depending on the observation and label fields, and T is the temperature. Therefore, the Maximum a Posteriori (MAP) estimation of l, ˆl, consists in the minimization of U in order to maximize P(l/x).
SA methods have been shown to be appropriate for such an optimization problem [12]. SA is an iterative process with decreasing T. For each T, a Gibbs sampling sweep is performed over all the pixels of S: a label is sampled at each pixel following the local probabilities of the labels. The energy function can be seen as a sum of weighted potentials, each one corresponding to a particular measure on the observation field or the label field. Potentials can be expressed locally, i.e. for each pixel, thanks to the Markovian assumption. The local probabilities of the labels at each pixel can then be computed from these potentials. Therefore, the potential functions allow us to integrate global information about the label field into a local probability.
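The SA scheme described above (sampling each pixel's label from probabilities proportional to exp(−U/T), with T decreasing between sweeps) can be sketched as follows. This is a minimal illustration with hypothetical names (`gibbs_sample_label`, `energy_at`, `anneal`), not the authors' implementation:

```python
import math
import random

def gibbs_sample_label(local_energies, T, rng=random):
    """Sample one pixel's label from p(l) proportional to exp(-U(l)/T),
    where local_energies maps each candidate label to its local energy."""
    weights = [math.exp(-u / T) for u in local_energies.values()]
    total = sum(weights)
    r = rng.uniform(0, total)
    for label, w in zip(local_energies, weights):
        r -= w
        if r <= 0:
            return label
    return label  # numerical safety fallback

def anneal(pixels, energy_at, T0=2.0, decay=0.99, n_sweeps=200):
    """SA skeleton: one Gibbs sweep per temperature, T decreasing
    geometrically between sweeps (T_k = decay**k * T0)."""
    labels = {}
    for k in range(n_sweeps):
        T = T0 * decay ** k
        for p in pixels:
            # energy_at returns the local energies of each label at p,
            # given the current label field
            labels[p] = gibbs_sample_label(energy_at(p, labels), T)
    return labels
```

At low T the sampler becomes nearly deterministic: with energies {A: 0, B: 1000} and T = 0.01, label A is returned with overwhelming probability.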
4 The Global Linearity Potential (GLP)
In some regions, favoring geometric properties of the boundaries during a segmentation algorithm can be of interest. For example, aerial images of cities mostly contain regions with linear boundaries. In the following, we present a potential function that is able to take such a property into account.
Fig. 2. An example of a boundary pixel.
First of all, at a given iteration of the Gibbs sampler, the GLP (called ΦGL) is calculated only on pixels belonging to a boundary, i.e., we exclude isolated pixels and pixels inside a region:

ΦGL(lp) = { V(lp), if p belongs to a boundary; 0, elsewhere }   (2)

V(lp) is the energy function associated with the value of lp. For the example given in Fig. 2, V(lp = w) is the energy when we consider the pixel p in the white region, and V(lp = g) the energy when p is in the grey one. To favor the white case, we search for a function that gives: V(lp = w) ≤ V(lp = g) ≤ V(lp = k) = 1, with (w, g, k) ∈ Ωl³, w ≠ g, w ≠ k, g ≠ k. To achieve this objective, we discretize each boundary into a succession of discrete segments. We can therefore choose V as follows, when the pixel p has the label lp:

V(lp) = nb(lp) / Σi=1..nb(lp) li(lp)   (3)

where nb(lp) is the number of segments of the boundaries, and li(lp), i = 1, ..., nb(lp), the lengths of the different segments. If we still consider the example in Fig. 2, this gives V(w) = 1/lb < V(g) = 2/(5+2) < 1, with lb the length in the case of a linear boundary; lb > 1 as we excluded isolated pixels.
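The potential V of Eq. (3) is simple to compute once the boundary has been discretized into segments; a minimal sketch (the function name is our own):

```python
def global_linearity_potential(segment_lengths):
    """V = (number of segments) / (total length of the segments):
    minimal for a single long straight segment, and approaching 1 when the
    boundary breaks into many unit-length segments."""
    nb = len(segment_lengths)
    return nb / sum(segment_lengths)

# One straight segment of length 7 beats two segments of lengths 5 and 2:
assert global_linearity_potential([7]) < global_linearity_potential([5, 2]) < 1
```

Lower V thus rewards label choices whose resulting boundary polygonalizes into fewer, longer segments.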
5 Integration of the GLP with Topological Maps
Since the SA is an iterative method, we compute at each iteration a topological map that corresponds to the current label field. Then, to compute the GLP at a pixel p, we proceed in four steps:
1. Test if p belongs to a boundary. Indeed, we compute the GLP only for boundary pixels; for other pixels the GLP is equal to 0;
2. Compute in which regions p can be set. Indeed, since we do not consider the boundaries of isolated pixels, we cannot set p in a region when this leads to the creation of such a pixel;
3. Modify the map locally to take into account the modification of the region of p. This optimization is necessary in order to avoid the entire reconstruction of the map for each pixel of the image;
4. Finally, compute the GLP on this map by using a discretization algorithm.
The first step can easily be achieved by testing the 4-neighbor pixels of p. When they are all in the same region as p, p is not a boundary pixel, and conversely. For step 2, we traverse the topological map in order to find all the darts that touch p (note that we could obtain the same result by looking at the regions of the 4-neighbor pixels of p, but the darts computed here are going to be used in the following). We say that a dart touches a pixel when it represents an interpixel curve that contains a linel incident to the pixel. We can remark that only 0, 1, 2, 3 or 4 darts can touch a pixel. All possible configurations are given in Fig. 3 (the cases where no dart or 4 darts touch p are not represented).
(a) 1 dart. (b) 2 darts. (c) 3 darts.
Fig. 3. The configurations of darts around a pixel, sorted by number of darts. Other configurations can be deduced from these by rotations or by changing the length of the darts.
Given p, retrieving all the incident darts can be done by traversing all the darts of the region of p and, for each dart, checking whether the corresponding 1D curve contains a linel incident to p (by comparing coordinates). This can be performed with complexity linear in the number of linels belonging to the boundaries of the region of p. We present in Fig. 4(a) an image made of 3 regions (white, light grey and dark grey), and its topological map (in the same figure). The pixel p touches 2 darts (numbered 1 and 2), which corresponds to case number 7 in Fig. 3.
(a) Initial map. (b) After unsewings. (c) Map obtained.
Fig. 4. A configuration of darts around a boundary pixel, and local modifications performed to change the region of the pixel.
Then we can find all the regions around p, just by recovering the regions of the darts 2-sewn with darts that touch p. In Fig. 4(a), β2 (1) gives a dart that belongs to the dark grey region, and β2 (2) a dart that belongs to the white region.
Fig. 5. Two maps with the same initial configuration of darts around a boundary pixel p, when p belonged to the light grey region (initial darts drawn in light grey), leading to different maps after modification.
Step 3 of the algorithm consists in assigning p successively to each of these regions and modifying the map locally in order to represent this modification. For that, we need to define, for each case shown in Fig. 3, how to transform the map locally. First, we can remove cases number 1, 5 and 9, since we do not consider isolated pixels. For the same reason, we can remove the two cases not drawn in the figure, where p touches 0 or 4 darts. We also remove case number 8, because changing the region of p then leads to a topological modification of the map. With the GLP, we want to favor linear boundaries only by locally modifying the boundaries extracted from the previous iteration. For this reason, we do not allow changing the region of p when this would lead to a topological modification. So we only have to consider cases number 2, 3, 4, 6, 7 and 10. We present in Fig. 4 the modifications performed for case 7, since we cannot give all the different algorithms here, and they are quite similar. Starting from the map shown in Fig. 4(a), we change the region of p to the white region. This region touches the dart 2, so we first begin by cutting the face between the two darts 1 and 2 (see Fig. 4(b)). This is done by unsewing the two edges incident to the darts 1 and 2 and by decreasing their sizes. Then, we create two new edges, one passing to the left of p and the other below it. These edges are sewn with the darts that were previously sewn with the initial edges. The map obtained is given in Fig. 4(c), but this is not the final result. Indeed, the modifications performed here are done for the general case, and we do not obtain a topological map (the map in Fig. 4(c) is not minimal since there are some degree-two vertices). We prefer to simplify the map after the local modification in order to propose a general algorithm and to decrease the number of different cases to consider. We can see in Fig. 5 two examples with the same initial configuration of darts around p, when p belonged to the light grey region. For both configurations, putting p in the white region leads to the same map (the one presented in Fig. 4(c)), since both local configurations of darts around p are the same. It is only during the simplification of the map (which merges edges around degree-two vertices) that different operations are performed, yielding the two different maps shown in Fig. 5. The last step of the GLP computation consists in retrieving, in the modified topological map, the two quantities used in the V(lp) formula: the number of segments of the boundaries and the lengths of the different segments. For that, we use a discrete curve polygonalization algorithm on the pointels of each 1D
(a) Original image. (b) SA - without GLP. (c) SA - with GLP.
Fig. 6. GLP influence on a synthetic image.
(a) Original image. (b) SA - without GLP. (c) SA - with GLP.
Fig. 7. GLP influence on a real image.
curve around p (with the algorithm presented in [21]). This gives the number of discrete segments of each boundary around p. The length of each segment is simply computed with the Euclidean distance.
6 Experimental Results
The synthetic (64×64) image given in Fig. 6(a) has been used to test the method. This image is made up of five Gaussian i.i.d. processes, with different means and variances, inside regions with linear boundaries. The weights between potentials3 have been fixed in order to focus on the influence of the GLP. 200 iterations of the Gibbs sampler have been performed from T0 = 2 with the decreasing scheme Tk = 0.99^k T0. Figure 6(b) shows the result without using the GLP and Fig. 6(c) with the GLP. Both results are good, but the boundaries are better with the GLP. Moreover, the percentage of label errors is 0.3174 for Fig. 6(b) against 0.1709 for Fig. 6(c). We can see in Fig. 7 one result obtained for a "real" image. We can observe the influence of the GLP by comparing Fig. 7(b) and Fig. 7(c). The boundaries are straighter with the GLP, even if the differences are not very visible. We need
3. Besides the GLP, we used three potentials, respectively based on the Gaussian probability law, the local label neighborhood and the size of regions.
more experiments in order to tune the weights associated with each potential, and thus to show the interest of this potential in real applications.
7 Conclusion
In this work we have shown how the topological map can be used to improve an existing image processing method: a Markovian segmentation algorithm. Indeed, the topological map is a good model that allows us to retrieve most of the information of the image, both topological and geometrical, in an efficient way. Moreover, this work shows that we can use this model in different algorithms, and not only in split-and-merge approaches. In other works, Markovian segmentation with an MCMC implementation has been carried out essentially by using statistical properties. This was due to a lack of structures allowing the geometrical and topological properties of the label field to be encoded efficiently. With the topological map, we can propose new global potentials that favor properties of the label field. We have shown in this paper how to favor the global linearity of the boundaries. It is now possible to extend this work in order to propose other potentials. Many other properties can be used, like the shape of a region or the number of adjacent regions. Our goal is to define a set of particular potentials that can be used or not, depending on the type of image to process. Moreover, the definition of algorithms to compute these potentials will probably lead to questions on how to perform particular operations on the topological map, as for the local modification. This is particularly interesting in order to improve our model and to propose new tools to deal with topological maps.
References
1. Domenger, J.: Conception et implémentation du noyau graphique d'un environnement 2D1/2 d'édition d'images discrètes. Thèse de doctorat, Université Bordeaux I (1992)
2. Fiorio, C.: A topologically consistent representation for image analysis: the frontiers topological graph. In: Discrete Geometry for Computer Imagery. Number 1176 in Lecture Notes in Computer Science, Lyon, France (1996) 151–162
3. Pailloncy, J., Jolion, J.: The frontier-region graph. In: Workshop on Graph based Representations. Volume 12 of Computing Supplementum, Springer (1997) 123–134
4. Braquelaire, J., Desbarats, P., Domenger, J., Wüthrich, C.: A topological structuring for aggregates of 3D discrete objects. In: Workshop on Graph based Representations, Austria, IAPR-TC15 (1999) 193–202
5. Bertrand, Y., Damiand, G., Fiorio, C.: Topological encoding of 3D segmented images. In: Discrete Geometry for Computer Imagery. Number 1953 in Lecture Notes in Computer Science, Uppsala, Sweden (2000) 311–324
6. Braquelaire, J., Desbarats, P., Domenger, J.: 3D split and merge with 3-maps. In: Workshop on Graph based Representations, Ischia, Italy, IAPR-TC15 (2001) 32–43
7. Damiand, G., Resch, P.: Topological map based algorithms for 3D image segmentation. In: Discrete Geometry for Computer Imagery. Number 2301 in Lecture Notes in Computer Science, Bordeaux, France (2002) 220–231
8. Gonzalez, R.C., Woods, R.E.: Digital Image Processing. Addison-Wesley (1993)
9. Brun, L., Domenger, J.: A new split and merge algorithm with topological maps and inter-pixel boundaries. In: The Fifth International Conference in Central Europe on Computer Graphics and Visualization (1997)
10. Brun, L., Domenger, J., Braquelaire, J.: Discrete maps: a framework for region segmentation algorithms. In: Workshop on Graph based Representations, Lyon, IAPR-TC15 (1997), published in Advances in Computing (Springer)
11. Braquelaire, J., Brun, L.: Image segmentation with topological maps and interpixel representation. Journal of Visual Communication and Image Representation 9 (1998) 62–79
12. Geman, S., Geman, D.: Stochastic Relaxation, Gibbs Distribution, and the Bayesian Restoration of Images. IEEE Trans. on Pattern Analysis and Machine Intelligence PAMI-6 (1984) 721–741
13. Bouman, C., Liu, B.: Multiple Resolutions Segmentation of Textured Images. IEEE Trans. on Pattern Analysis and Machine Intelligence 13 (1991) 99–113
14. Kervrann, C., Heitz, F.: A Markov Random Field Model-based Approach to Unsupervised Texture Segmentation using Local and Global Spatial Statistics. IEEE Trans. on Image Processing 4 (1995) 856–862
15. Barker, S.A.: Image Segmentation using Markov Random Field Models. PhD thesis, University of Cambridge (1998)
16. Melas, D.E., Wilson, S.P.: Double Markov random fields and Bayesian image segmentation. IEEE Trans. on Signal Processing 50 (2002) 357–365
17. Jacques, A.: Constellations et graphes topologiques. In: Combinatorial Theory and Applications. Volume 2 (1970) 657–673
18. Cori, R.: Un code pour les graphes planaires et ses applications. In: Astérisque. Volume 27. Soc. Math. de France, Paris, France (1975)
19. Lienhardt, P.: Topological models for boundary representation: a comparison with n-dimensional generalized maps. Computer-Aided Design 23 (1991) 59–82
20. Celeux, G., Diebolt, J.: The SEM Algorithm: a Probabilistic Teacher Algorithm Derived from the EM Algorithm for the Mixture Problem. Computational Statistics Quarterly 2 (1985) 73–82
21. Debled-Rennesson, I., Reveilles, J.P.: A linear algorithm for segmentation of digital curves. International Journal of Pattern Recognition and Artificial Intelligence 9 (1995) 635–662
Topology Preservation and Tricky Patterns in Gray-Tone Images

Carlo Arcelli and Luca Serino

Istituto di Cibernetica "E. Caianiello", CNR, 80078 Pozzuoli, Napoli, Italy
{c.arcelli,l.serino}@cib.na.cnr.it
Abstract. A gray-tone image including perceptually meaningful elongated regions can be represented by a set of line patterns, the skeleton, consisting of pixels having different gray-values and mostly placed along the central positions of the regions themselves. We discuss a skeletonization algorithm, computed over the Distance Transform of the image and employing topology-preserving operations. Unlike the binary case, where the use of the connectivity test is generally sufficient to create a one-pixel-thick skeleton, we also consider a suitable labeling of the pixel neighborhood. In this way, we are able to deal with some of the tricky patterns in the gray-tone image that can be regarded as irreducible.
1 Introduction

In gray-tone digital images, regions with locally higher gray-value can be understood, in certain problem domains, as the ones carrying the most relevant information. This is the case when an image includes perceptually significant elongated subsets, generally constituted by pixels characterized by different gray-values. This meaningful information can conveniently be represented in terms of a set of line patterns, called the skeleton hereafter, generally consisting of pixels having different gray-values and mostly placed along the central positions of the regions themselves. If a gray-tone digital image is regarded as a mountainous relief, the gray-value of a pixel being its height, the identification of the skeleton can be related to the detection of topographic features such as ridges, peaks and saddles. For instance, the skeleton could be found by considering the image as a continuous surface, and by using the first and second partial derivatives of this surface to identify the skeletal pixels [1]. Alternatively, one could consider the gray-tone image as an ordered set of binary images, each one obtained, by suitable thresholding, as a cross-section of the gray-tone image [2]. Skeletonization is then accomplished by repeatedly lowering the gray-value of certain pixels until gray-values which characterize regional minima are eventually assigned to them [3]. Generally, a lowering operation should not modify the topology of the gray-tone image, in the sense that any cross-section binary image should preserve its topology [4].
I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 298–307, 2003. © Springer-Verlag Berlin Heidelberg 2003
In this paper, we describe a skeletonization algorithm driven by the Distance Transform of the gray-tone image. In particular, to find a skeleton placed along the proper medial positions, we exploit the structural information characterizing the Distance Transform of a single-valued region and take into account the dominance relations among the regions constituting the gray-tone image. We regard the image as piecewise constant [5] and, for each region with constant gray-value, we compute the Distance Transform. The latter is of the constrained type [6] whenever there exist adjacent regions with higher gray-value. Computation of the Distance Transform is accomplished according to the (3,4)-weighted distance [7], by ordered propagation over regions with increasing gray-values. The pixels in each region receive a distance label related to their geodesic distance from a reference set constituted by the pixels with lower gray-values adjacent to the region. Then, the pixels are examined in a suitable order, and the ones that are end points or non-simple points are taken as elements of the skeleton.

Due to the possible complexity of the morphology of a gray-tone image, topology-preserving reduction operations are not always sufficient to create a one-pixel-thick skeleton. In this respect, we consider a suitable labeling of the pixel neighborhood, which allows us to deal with some tricky patterns in the gray-tone image that can be regarded as irreducible.

We note that both a preprocessing phase and a postprocessing phase should be included in any skeletonization algorithm applied to real-world images. The role of the preprocessing is to remove narrow peaks and pits as well as to fill in valleys and flatten plateaux. In turn, the postprocessing phase is required to remove skeleton branches which do not constitute significant separations (watersheds) between adjacent basins, and to prune branches which do not denote significant promontories.
We are not specifically interested in these phases and will only mention some features of the preprocessing phase we take into account.
2 Preliminaries

Let G be a gray-tone digital image. Pixels in G are assigned one out of a finite number of increasing integer values gk, k = 0, 1, ..., N, which indicates for any pixel p the gray-value or status g(p) of the pixel itself. Letters will be used to denote both pixels and their gray-values. We assume that G is bordered by a frame of pixels with gray-value greater than gN. The neighbors of p are its 8-adjacent pixels. They constitute the neighborhood N(p) of p and are denoted by n1, n2, ..., n8, where the subindices increase clockwise from the pixel n1 placed to the left of p. The neighbors ni, i odd, are called direct neighbors (d-neighbors). The neighbors ni, i even, are called indirect neighbors (i-neighbors). We denote by max and min, respectively, the maximal gray-value and the minimal gray-value of the ni having gray-values less than g(p). If p > ni for at least one d-neighbor ni, p is termed a lower border point. If p < ni for at least one d-neighbor ni, p is termed an upper border point. If p has only one neighbor, or just two consecutive neighbors, with gray-value equal to its own gray-value, and all the remaining neighbors have smaller gray-value, p is termed an end point.
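The border-point definitions above can be stated as small predicates. This is a sketch under our own encoding of N(p) as a clockwise list [n1, ..., n8] of gray-values (the function names are hypothetical):

```python
def d_neighbors(n):
    """Direct (4-adjacent) neighbors among n = [n1, ..., n8]:
    the odd-indexed n1, n3, n5, n7 (0-based positions 0, 2, 4, 6)."""
    return [n[i] for i in (0, 2, 4, 6)]

def is_lower_border(gp, n):
    """p is a lower border point if g(p) exceeds some d-neighbor."""
    return any(gp > v for v in d_neighbors(n))

def is_upper_border(gp, n):
    """p is an upper border point if g(p) is below some d-neighbor."""
    return any(gp < v for v in d_neighbors(n))

# p with gray-value 5 and one darker d-neighbor is a lower border point:
assert is_lower_border(5, [3, 5, 5, 5, 5, 5, 5, 5])
assert not is_upper_border(5, [3, 5, 5, 5, 5, 5, 5, 5])
```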
A gray-tone image can be regarded as a mosaic, generally made of very irregular pieces (or regions), different in gray-value, shape and size. The regions do not overlap each other and are maximal 4-connected sets of pixels with a same gray-value. Two regions with different gray-values are called adjacent if they are 4-adjacent. A bottom region (shortly, a bottom) is a region with all its adjacent regions having higher gray-values. Any bottom is a regional minimum of G. A top region (shortly, a top) is a region with all its adjacent regions having lower gray-values. When G has to be processed by using topology-preserving operations, it is necessary to identify, for each of its pixels, which are the foreground and the background and which kind of connectedness holds for each of them. Thus, rather than as a mosaic, it is convenient to understand G as a stack of binary images, by following an approach dating back to the studies on threshold logic [8] and widely used in the literature. In more detail, for any gray-value gk, with k different from 0, the gray-tone image is regarded as a binary one, say Bk, where the set of pixels having gray-values not less than gk constitutes the current foreground and the set of pixels having gray-values less than gk constitutes the current background. Thus, according to the chosen threshold values, there are N binary images in correspondence with G. In Bk, 8-connectedness should be understood to hold for the foreground, and 4-connectedness for the background. When applying operations which change the status of some pixels, we say that the topology of the gray-tone image does not change if none of the N binary images Bk, found in correspondence with the various thresholds, has its topology changed.
In a binary image, the assignment of a pixel p to a component different from the one it currently belongs to changes the topology of the image whenever it causes a modification in the number of components of the foreground or of the background. In this respect, topology is preserved if one removes from the foreground (i.e., assigns to the background) only the pixels, termed simple points, which satisfy certain neighborhood conditions. For instance, simple points are those p for which the 8-connectivity number C(p) is equal to one [9]:

C(p) = Σi odd [(1 − ni) − (1 − ni)(1 − ni+1)(1 − ni+2)]

where the subscripts are taken cyclically (n9 = n1, n10 = n2). When considering a pixel p in a gray-tone image, we should refer to the binary image Bk, where the threshold t = gk corresponds to g(p). By regarding ni as a Boolean variable equal to 1 if ni ≥ p, and equal to 0 otherwise, the 8-connectivity number C(p) for a lower border point p turns out to be equal to the number of 8-components in N(p) of pixels with gray-value not less than g(p). We say that p is a simple point of G if it is a simple point of Bk, namely if C(p) = 1 in Bk. End points and pixels that are not simple points are called feature points. A reduction operator is an operator which replaces the gray-value of a pixel by the gray-value of one of its neighbors having a smaller gray-value. A reduction operator is topology preserving in G whenever it is applied only to simple points and lowers the gray-value of each of them to max [4, 10]. Let X and Xc respectively denote a region of G and its complement, and suppose that R is a subset of Xc adjacent to the whole border of X. The Distance Transform of X with respect to the reference set R is the multi-valued set DT(X,R), which differs from X in having each pixel labelled with its distance from R, computed according to a
chosen distance function. If R is not adjacent to the whole border of X, the transform is called the constrained Distance Transform of X.
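The 8-connectivity number C(p) defined above can be checked on a toy neighborhood. The sketch below translates the formula directly (0-based positions with cyclic wrap-around; the name `connectivity_number` is our own):

```python
def connectivity_number(n):
    """C(p) for neighbors n = [n1, ..., n8] (clockwise, each ni in {0, 1}):
    C(p) = sum over odd i of (1-ni) - (1-ni)(1-n(i+1))(1-n(i+2)),
    subscripts taken cyclically."""
    c = lambda i: 1 - n[i % 8]      # complemented neighbor, with wrap-around
    return sum(c(i) - c(i) * c(i + 1) * c(i + 2) for i in (0, 2, 4, 6))

# One 8-component of foreground neighbors -> simple point (C = 1):
assert connectivity_number([1, 0, 0, 0, 0, 0, 0, 0]) == 1
# Two opposite foreground d-neighbors -> not simple (C = 2):
assert connectivity_number([1, 0, 0, 0, 1, 0, 0, 0]) == 2
```

An interior pixel (all neighbors in the foreground) yields C = 0, so it is not simple either, matching the use of C(p) = 1 as the removal condition.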
3 Skeletonization

In this section, we outline a skeletonization algorithm driven by the Distance Transform of the gray-tone image and based on the use of topology-preserving reduction operations. Moreover, we briefly discuss the preprocessing phase, which is often crucial to obtain meaningful results. In summary, the main steps leading to the creation of the skeleton are the following:
1. Preprocessing;
2. Distance transformation;
3. End point detection;
4. Lowering of simple points present in successively adjacent regions with increasing gray-value;
5. Postprocessing.

3.1 Preprocessing

The aim is to create an image having only a number of significant (i.e., deep enough) bottoms and a number of tops that are not too crenelated. Bottoms and tops of the input image are taken as seeds and, in correspondence with them, we construct multi-level (ε,δ)-components [11]. Each component is then identified by the gray-value of the corresponding seed. A multi-level (ε,δ)-component is a region where the difference in gray-value between two d-neighbors never exceeds the adjacency parameter δ, and the maximum difference in the gray-values of its pixels does not exceed the range parameter ε. Moreover, any component satisfies a maximality property, i.e., no valid component can be merged with an adjacent valid component to form a larger valid component. It has been pointed out that the values of ε and δ can conveniently be adjusted for different types of images or different levels of analysis. In this paper, we refer to input images characterized by 256 gray-levels and relate ε to the greatest difference in gray-value ∆ between adjacent d-neighbors. In particular, we set ε = ∆ − 1. The rationale for this choice is to ensure a distinction between the foreground and the background in a binary image. As for δ, we select the value ∆/2. Tops and bottoms are identified and then grown into (ε,δ)-components, which are created by iteratively aggregating to each seed the δ-adjacent regions.
The output of this phase is a modified image G∗ where the obtained (ε,δ)-components take the place of the corresponding regions in the initial image.

3.2 Distance Transformation

We regard the gray-tone image G∗ as the union of a number of single-valued regions, and compute the Distance Transform of every region with respect to a reference set constituted by the regions with lower gray-values and adjacent to the region.
302
Carlo Arcelli and Luca Serino
The Distance Transform of the gray-tone image is the union of the (constrained and unconstrained) Distance Transforms of the regions constituting the image. A region is classified into one of three types, depending on the gray-values of the adjacent regions:

type 1. All the adjacent regions have smaller gray-values;
type 2. Only some of the adjacent regions have smaller gray-values;
type 3. All the adjacent regions have greater gray-values.

It is straightforward to observe that for the regions of type 1, the Distance Transform is unconstrained, since there are no adjacent regions with higher gray-values. These regions are characterized by locally higher intensities and will certainly include a skeleton branch. For any region of type 2, the Distance Transform is constrained and its computation leads to a set of propagating wave fronts (each wave front being a connected set of pixels with the same distance label), which interact with each other whenever the region protrudes over adjacent regions with smaller gray-value. A region of type 2 is perceptually dominated by the adjacent regions with higher gray-value, and this dominance is the stronger, the less the region protrudes. Let X and Y denote two regions of F, with g(X) < g(Y), and let L_XY be the length of the part of the border of X adjacent to Y; X is strongly dominated by Y when the perimeter of X is smaller than 3L_XY. Our procedure detects significant skeleton subsets only in correspondence with regions that are not strongly dominated. Finally, for the regions of type 3 the Distance Transform cannot be computed, since all border pixels are adjacent to regions with higher gray-values and the reference set is empty. The computation is accomplished on the array where the preprocessed gray-tone image is stored, and is performed according to the (3,4)-weighted distance, by ordered propagation over regions with ascending gray-values. Queues, i.e., first-in-first-out data structures, are used for this purpose.
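The ordered propagation of the (3,4)-weighted distance over a region can be sketched with a generic priority queue. This is a sketch under assumptions: the paper uses a set of FIFO queues, one per priority level, whereas Python's heapq serves here as a simpler stand-in, and the reference-set pixels are assumed to be initialized with distance 0.

```python
import heapq

def weighted_dt(mask, seeds):
    """Geodesic (3,4)-weighted distance transform inside `mask`.

    mask: 2D boolean array delimiting the region; seeds: list of (row, col)
    reference pixels. 4-neighbor steps cost 3, diagonal steps cost 4.
    """
    h, w = len(mask), len(mask[0])
    INF = float('inf')
    dist = [[INF] * w for _ in range(h)]
    pq = []
    for r, c in seeds:
        dist[r][c] = 0
        heapq.heappush(pq, (0, r, c))
    steps = [(-1, 0, 3), (1, 0, 3), (0, -1, 3), (0, 1, 3),
             (-1, -1, 4), (-1, 1, 4), (1, -1, 4), (1, 1, 4)]
    while pq:
        d, r, c = heapq.heappop(pq)
        if d > dist[r][c]:
            continue          # stale queue entry
        for dr, dc, wgt in steps:
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and mask[nr][nc] and d + wgt < dist[nr][nc]:
                dist[nr][nc] = d + wgt
                heapq.heappush(pq, (d + wgt, nr, nc))
    return dist
```

Because the weights are small integers, a bucket queue indexed by distance value would reproduce the ordered propagation of the paper more faithfully than a heap; the heap is used here only for brevity.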
The lower border points of the image are traced and stored in a priority queue constituted by a set of queues with different priority levels. The use of the priority queue allows one both to minimize pixel accesses and to process pixels efficiently in increasing order of gray-value. The priority levels correspond to the increasing gray-values in the image, higher priority corresponding to lower gray-value. The queue at priority level gk contains only the pixels with gray-value gk currently under examination. As a result, the pixels in each region receive a distance label related to their geodesic (3,4)-weighted distance from a reference set constituted by the pixels with lower gray-values and adjacent to the region. If Xk denotes the region(s) with gray-value gk, the distance labels in Xk turn out to be smaller (greater) than those in Xk+1 (Xk−1).

3.3 End Point Detection and Pixel Lowering

We refer to the Distance Transform of G∗ and follow the classical scheme which is concerned first with the detection of the end points present in every region of the
Topology Preservation and Tricky Patterns in Gray-Tone Images
303
image (they are marked as feature points), and successively, starting from the lower border points of every region, with the iterated lowering of more and more internal pixels. The definition of end point in the Distance Transform is given in terms of distance labels. Specifically, a pixel p is defined as an end point if it has only one neighbor, or just two consecutive neighbors, with distance label equal to its own distance label, and all the remaining neighbors have smaller distance labels. Moreover, to cope with the discrete nature of the digital plane, we also mark two-pixel-thick end point configurations such as the set of p's in Fig. 1.

c c c c
c p p a
c p p a
c c c c

Fig. 1. The 4-tuple of p's is a two-pixel-thick end point configuration. Pixels c have distance labels less than p, pixels a have distance labels not less than p.
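The end point definition above can be sketched as a test on the eight neighbor labels taken in circular order. This is a minimal sketch; the function name and the neighbor ordering E, NE, N, NW, W, SW, S, SE are assumptions.

```python
def is_end_point(label_p, neigh):
    """End point test on distance labels.

    label_p: distance label of pixel p; neigh: the 8 neighbor labels listed
    in circular order around p (E, NE, N, NW, W, SW, S, SE).
    """
    if any(v > label_p for v in neigh):
        return False                 # a larger-labeled neighbor: not an end point
    eq = [i for i, v in enumerate(neigh) if v == label_p]
    if len(eq) == 1:
        return True                  # exactly one neighbor at the same label
    if len(eq) == 2:
        i, j = eq
        # two neighbors at the same label must be consecutive on the ring
        return j - i == 1 or (i == 0 and j == 7)
    return False
```

The two-pixel-thick configurations of Fig. 1 would be marked separately; this test covers only the single-pixel definition.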
End point detection is performed on the Distance Transform during one scan of the image. As for pixel lowering, the same data structure adopted for the computation of the Distance Transform is used to access the pixels of each region, according to the increasing value of their distance from the lower border points. For every set of pixels with the same distance label, the connectivity test is performed repeatedly until only pixels that are end points or not simple points are left. When computing the 8-connectivity number of a pixel p, we refer to its binary neighborhood, where a neighbor is regarded as equal to 1 if it is a feature point or has a distance label not less than the distance label of p, and equal to 0 otherwise. The pixels that are simple points are lowered, while the remaining ones are marked as feature points. Once a region has been completely examined, the process is repeated on the successive regions with greater gray-value, until the image is exhausted. At the end of the process, G∗ will be transformed into an image including a set of feature points, which should represent the skeleton of the gray-tone image. Indeed, the set of feature points does not always have a linear structure, so that it might not be correct to regard it as the skeleton of G∗. An example is shown in Fig. 2, concerning a part of magnified biological material scanned at 300 dpi, 256 gray-levels.

3.4 Tricky Patterns

The set of feature points we obtain by the previous process is not guaranteed to be one-pixel-thick. Indeed, there are patterns in G∗ whose lowering is inhibited by the connectivity test, even if their lowering seems intuitively easy to achieve.
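The 8-connectivity number used in the connectivity test of Sect. 3.3 can be computed on the binary neighborhood with Yokoi's formula [9]. A minimal sketch; the neighbor ordering x1 = E, x2 = NE, ..., x8 = SE is an assumption.

```python
def connectivity8(n):
    """Yokoi connectivity number for 8-connectivity.

    n: the 8 binary neighbors [x1..x8] in circular order starting East.
    A pixel is a simple point when the result is 1; internal pixels
    (all neighbors equal to 1) give 0, matching the text.
    """
    xb = [1 - v for v in n]          # complemented neighborhood
    c = 0
    for i in (0, 2, 4, 6):           # the 4-neighbors E, N, W, S
        c += xb[i] - xb[i] * xb[(i + 1) % 8] * xb[(i + 2) % 8]
    return c
```

For example, a pixel whose neighborhood splits into two opposite foreground components gets connectivity number 2 and therefore cannot be lowered.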
Fig. 2. a) input image G∗. b) skeleton including a thick subset (superimposed over the input).
We do not discuss here the simple cases of the type shown in Fig. 3, where the presence of pixels that are non simple points in the binary images B_k (k = 6 and k = 7) causes a local thickening of the skeleton. These patterns are also common when dealing with non-gray-tone images [12], and their reduction to unit thickness is easy to obtain by using topology preserving operations in a postprocessing phase (namely, by lowering the underlined pixels with value 7).

1 1 1 1 1 1 1
1 6 1 1 1 6 1
1 1 6 1 6 1 1
1 7 7 6 1 1 1
1 1 1 7 9 1 1
1 1 1 1 9 1 1
1 1 1 1 1 1 1
Fig. 3. Pixels with gray-value different from 1 are not allowed to change their status. Further lowering can be achieved during a postprocessing phase.

1 1 1 7 7 7 7
1 1 1 7 6 6 7
1 6 6 6 6 6 7
1 1 1 7 6 6 7
1 1 7 7 7 6 7
1 1 7 7 7 7 7
Fig. 4. Pixels with gray-value 6 cannot be lowered when topology and end points are preserved.
Here we refer to regions, even of considerable size, whose pixels are all detected as feature points. This occurs, for instance, for the regions with gray-value 6 in Fig. 4. Their pixels cannot be lowered: when the application of the topology preserving reduction operator, starting from the lower border points, reaches the underlined pixels, it turns out that the underlined pixels are either non simple points or end points, and all the successive, more internal pixels, when checked, are non simple points because their connectivity number turns out to be equal to zero.
Indeed, the problem arises as soon as an image subset is bordered almost completely by pixels with higher gray-values, and the subset can communicate with an adjacent region with lower gray-value only through non simple points belonging to a narrow one-pixel-wide path. A variation of the previous pattern may be as in Fig. 5a, where the gray-values of the pixels are ordered as follows: a < b < c < d.

a)
a a a a b a a a a
d b b b b a a a a
d b b b b d d d d
d b b b b c c c d
d b b b b c c c d
d b b b b c c c d
d b b b b c c c d
d b b b b c c c d
d d d d d d d d d

b)
a a a a 3 a a a a
a a a a 3 a a a a
d 3 3 3 3 a a a a
d 6 6 6 4 d d d d
d 9 9 8 7 c c c d
d 12 12 11 10 c c c d
d 15 15 14 13 c c c d
d 18 18 17 16 c c c d
d 21 21 20 19 c c c d
d d d d d d d d d

Fig. 5. a) Four regions, where the gray-values of the pixels are ordered a < b < c < d. b) The distance labels obtained in the configuration of a).
It is straightforward to see that the underlined pixels are detected as feature points. In particular, these pixels remain internal for all the distance labels starting from the underlined pixel labeled 4. When the set of c's has to be processed, it happens that the lower border points cannot be lowered, because they appear to be internal pixels (i.e., with connectivity number equal to 0) due to the presence of the non simple points detected on the set of b's. In turn, if lower border points are not lowered, the remaining c's cannot be lowered either, because they are not adjacent to any pixel with lower gray-value. Thus, the set of c's is not modified. To deal with the kind of patterns shown above in Figs. 4 and 5, our approach is to break the barrier (constituted by one or more non simple points) which prevents lowering of the pixels of an irreducible region. To this purpose, once a pixel p is found that is detected as a non simple point because it appears to be internal, and p has at least one d-neighbor belonging to a region
with higher gray-value, we temporarily interrupt the lowering process and check the number and position of such d-neighbors. If there is only one d-neighbor, or a pair of two opposite d-neighbors, we induce the lowering of p and of all its not-yet-examined neighbors with the same gray-value as p, and at the same time we mark as feature points the d-neighbors with higher gray-value. Then, the lowering process is resumed. This induced lowering is just a labeling operation which creates the conditions allowing the modification of the irreducible region. Marking the d-neighbors having higher gray-value ensures the connectedness of the set of feature points (hence of the skeleton). The price to be paid for the success of this operation is that topology may not be preserved. For instance, in Fig. 4 the non-underlined pixels with value 6 are lowered to 1, so that the binary image B6 initially has a foreground constituted by one simply connected component, while after the process that component also includes one hole. On the other hand, in the case of Fig. 5 topology is preserved. Anyway, as far as skeletonization of gray-tone images is concerned, topology preservation is not a fundamental issue [13]. As for the example mentioned in Fig. 2, the one-pixel-wide skeleton is shown in Fig. 6.
Fig. 6. One-pixel-wide modification of the skeleton shown in Fig. 2.
3.5 Postprocessing

Generally, the gray-skeleton found is not everywhere perceptually meaningful, since it may include a number of branches which either are created in correspondence with end points that are not significant, or are found as lines dividing two bottoms at least one of which is not significantly deep. In this paper, we do not deal specifically with this phase and refer to the recent literature for a discussion of suitable significance criteria [13,14].
4 Conclusion

We have described a sequential algorithm for the skeletonization of a gray-tone image. We have regarded the image as constituted by a number of constant gray-value regions, and have looked for a set of digital lines that is mainly placed centrally in
correspondence with regions with locally higher gray-values. This set has been detected on the Distance Transform of the image, computed according to the (3,4)-weighted distance by ordered propagation over regions with increasing gray-value. An advantage of using the Distance Transform is that it creates a structure in the interior of each region, and favors the detection of skeleton subsets in correspondence with elongated regions not strongly dominated by other regions. Topology preserving reduction operations have been used to lower the gray-value of simple points. However, differently from the binary case, where the use of the connectivity test is generally sufficient to create a one-pixel-thick skeleton, we need to consider also a different operation, including a suitable labeling of the neighborhood of the pixel under examination. In this way, we are able to obtain a one-pixel-thick skeleton in correspondence with some of the tricky subsets of the gray-tone image that can be regarded as irreducible.
References

1. Wang, L., Pavlidis, T.: Detection of curved and straight segments from grey scale topography. CVGIP: Image Understanding 58 (1993) 352-365
2. Beucher, S., Meyer, F.: The morphological approach to segmentation: the watershed transformation. In: Dougherty, E.R. (ed.): Mathematical Morphology in Image Processing. Marcel Dekker, New York (1993) 433-481
3. Goetcherian, V.: From binary to grey tone image processing using fuzzy logic concepts. Pattern Recognition 12 (1980) 7-15
4. Bertrand, G., Everat, J.-Ch., Couprie, M.: Image segmentation through operators based on topology. J. Electronic Imaging 6 (1997) 395-405
5. Rosenfeld, A.: On connectivity properties of greyscale pictures. Pattern Recognition 16 (1983) 47-50
6. Piper, J., Granum, E.: Computing distance transformations in convex and non-convex domains. Pattern Recognition 20 (1987) 599-615
7. Borgefors, G.: Distance transformations in digital images. Computer Vision, Graphics and Image Processing 34 (1986) 344-371
8. Gilbert, E.N.: Lattice-theoretic properties of frontal switching functions. J. Math. Phys. 33 (1954) 57-67
9. Yokoi, S., Toriwaki, J.-I., Fukumura, T.: An analysis of topological properties of digitized binary pictures using local features. Computer Graphics and Picture Processing 4 (1975) 63-73
10. Arcelli, C.: Topological changes in grey-tone digital pictures. Pattern Recognition 32 (1999) 1019-1023
11. Wang, Y., Bhattacharya, P.: On parameter-dependent connected components of gray images. Pattern Recognition 29 (1996) 1359-1368
12. Arcelli, C., Sanniti di Baja, G.: Skeletons of planar patterns. In: Kong, T.Y., Rosenfeld, A. (eds): Topological Algorithms for Digital Image Processing. North Holland, Amsterdam (1996) 99-143
13. Arcelli, C., Serino, L.: Regularization of graphlike sets in gray-tone digital images. Int. J. Pattern Recognition and Artificial Intelligence 15 (2001) 643-657
14. Najman, L., Schmitt, M.: Geodesic saliency of watershed contours and hierarchical segmentation. IEEE Trans. on PAMI 18 (1996) 1163-1173
Shortest Route on Height Map Using Gray-Level Distance Transforms

Leena Ikonen and Pekka Toivanen
Lappeenranta University of Technology
P.O. Box 20, 53851 Lappeenranta, Finland
[email protected]
Abstract. This article presents an algorithm for finding and visualizing the shortest route between two points on a gray-level height map. The route is computed using gray-level distance transforms, which are variations of the Distance Transform on Curved Space (DTOCS). The basic Route DTOCS uses the chessboard kernel for calculating the distances between neighboring pixels, but variations, which take into account the larger distance between diagonal pixels, produce more accurate results, particularly for smooth and simple image surfaces. The route optimization algorithm is implemented using the Weighted Distance Transform on Curved Space (WDTOCS), which computes the piecewise Euclidean distance along the image surface, and the results are compared to the original Route DTOCS. The implementation of the algorithm is very simple, regardless of which distance definition is used.
1 Introduction
Finding the shortest path between two points on a three-dimensional surface is a common optimization problem in many practical applications, e.g. robotic and terrain navigation, highway planning, and medical image analysis. By considering the digitized surface as a graph, variations of Dijkstra's classical path search algorithm become feasible (e.g. [4], [10]). A dynamic programming-based algorithm for computing distances of fuzzy digital objects is presented in [9]. This article presents an algorithm for finding optimal routes, or so-called minimal geodesics, between two points on a gray-level height map. Other distance map approaches for path optimization include level set propagation [3], and morphological grassfire algorithms [5]. Our algorithm is based on the Distance Transform on Curved Space (DTOCS, presented in [13]), which calculates distances on a gray-level surface, when the gray-levels are understood as height values of the image surface. The Route DTOCS, first presented in [2], is developed further by using distance definitions which give more accurate values for the global distances compared to the original chessboard distance transform. In particular, the piecewise Euclidean distance calculated with the Weighted Distance Transform on Curved Space (WDTOCS [13]) produces reliably optimal routes.

I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 308-316, 2003.
© Springer-Verlag Berlin Heidelberg 2003
2 Definitions for Route DTOCS and WDTOCS
In the distance image produced by the DTOCS or the WDTOCS, every pixel in the calculation area X has a value which corresponds to the distance of that pixel to the nearest background pixel in X^C. The definition of the DTOCS for any calculation area X can be found in [13]. In the Route DTOCS the same distance metrics apply, but the complement area X^C is restricted to a single point, and the distance can be calculated according to the following, slightly simplified, definitions. A discrete gray-level image is a function G : Z^2 → N, where N is the set of positive integers.

Definition 1. Let N_8(p) denote the set of all 8 neighbors of pixel p in Z^2. Pixels p and q are 8-connected if q ∈ N_8(p). Let N_4(p) denote the set of 4-connected neighbors, and N_8(p) \ N_4(p) the set of diagonal neighbors. A discrete 8-path from pixel p to pixel s is a sequence of pixels p = p_0, p_1, ..., p_n = s, where every p_i is 8-connected to p_{i−1}, i = 1, 2, ..., n.

Definition 2. Let Ψ(x, y) denote the set of all possible discrete 8-paths linking points x ∈ X and y ∈ X^C. Let γ ∈ Ψ(x, y) and let γ have n pixels. Let p_i and p_{i+1} be two adjacent pixels in path γ. Let G(p_i) denote the gray value of pixel p_i. The length of the path γ is defined by Λ(γ) = Σ_{i=1}^{n−1} d(p_i, p_{i+1}), where the definition of d(p_i, p_{i+1}), i.e. the distance between neighbor pixels p_i and p_{i+1} on the path, depends on the distance transform used. The Weighted Distance Transform on Curved Space (WDTOCS) uses the Euclidean distance calculated with Pythagoras' theorem from the height difference and the horizontal displacement of the two pixels:

d(p_i, p_{i+1}) = √(|G(p_i) − G(p_{i+1})|^2 + 1),  p_{i+1} ∈ N_4(p_i)
d(p_i, p_{i+1}) = √(|G(p_i) − G(p_{i+1})|^2 + 2),  p_{i+1} ∈ N_8(p_i) \ N_4(p_i)    (1)

In the chessboard DTOCS the distance is defined as the height (gray-level) difference between the pixels, plus one for the horizontal displacement:

d(p_i, p_{i+1}) = |G(p_i) − G(p_{i+1})| + 1    (2)

The distance can also be defined using separate height and pixel-to-pixel displacements as in DTOCS, but using the accurate horizontal distance between diagonal neighbors:

d(p_i, p_{i+1}) = |G(p_i) − G(p_{i+1})| + 1,   p_{i+1} ∈ N_4(p_i)
d(p_i, p_{i+1}) = |G(p_i) − G(p_{i+1})| + √2,  p_{i+1} ∈ N_8(p_i) \ N_4(p_i)    (3)

Definition 3. The distance image F∗(x) when X^C = {y} is

F∗(x) = min_{γ∈Ψ} Λ(γ),  x ∈ X
F∗(x) = 0,               x ∈ X^C    (4)
The same distance image definition is used for the WDTOCS, the DTOCS, and for the distance transform using √2 as the horizontal displacement between diagonal neighbors. The definition of the neighbor distances d(p_i, p_{i+1}) used in calculating the path length Λ(γ) determines which version of the distance transform is produced by the algorithm.
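The three neighbor-distance definitions (1)-(3) can be written down directly. A minimal sketch; the function names and the boolean `diagonal` flag are assumptions.

```python
import math

def d_wdtocs(g1, g2, diagonal):
    # Eq. (1): piecewise Euclidean distance along the image surface
    return math.sqrt(abs(g1 - g2) ** 2 + (2.0 if diagonal else 1.0))

def d_dtocs(g1, g2, diagonal):
    # Eq. (2): chessboard distance, height difference plus one;
    # `diagonal` is ignored but kept for a uniform signature
    return abs(g1 - g2) + 1

def d_sqrt2_dtocs(g1, g2, diagonal):
    # Eq. (3): chessboard height term with sqrt(2) diagonal displacement
    return abs(g1 - g2) + (math.sqrt(2.0) if diagonal else 1.0)
```

Any of these can be plugged into the same path-length computation Λ(γ); this uniform signature is also convenient for the weighting factor mentioned in Sect. 5.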
3 The Distance Transformation Algorithms
The two-pass algorithm (see [13]) for calculating the DTOCS or the WDTOCS image F∗(x) is a sequential local operation (see [7]). The algorithm requires two images: the original gray-level image G(x) and a binary image F(x), which determines the region(s) in which the transformation is performed. The calculation area X in F(x) is initialized to max (the maximal representable number in memory) and the complement area X^C to 0. The first computation pass proceeds using the mask M_1 = {p_nw, p_n, p_ne, p_w} in Figure 1 rowwise from the top left corner of the image, substituting the middle point F(p_c) with the distance value

F_1∗(p_c) = min[ F(p_c), min_{p∈M_1} (∆(p) + F_1∗(p)) ]    (5)
The distance ∆(p) between pixels p_c and p is calculated according to the definition of the distance transformation that is used:

WDTOCS:  ∆(p) = √(|G(p) − G(p_c)|^2 + 1),  p ∈ N_4(p_c)
         ∆(p) = √(|G(p) − G(p_c)|^2 + 2),  p ∈ N_8(p_c) \ N_4(p_c)    (6)

DTOCS:   ∆(p) = |G(p) − G(p_c)| + 1    (7)

√2-DTOCS: ∆(p) = |G(p) − G(p_c)| + 1,   p ∈ N_4(p_c)
          ∆(p) = |G(p) − G(p_c)| + √2,  p ∈ N_8(p_c) \ N_4(p_c)    (8)
The backward pass uses the mask M_2 = {p_e, p_sw, p_s, p_se} in Figure 1, replacing the distance value F_1∗(p_c) calculated by the forward pass with the new value

F∗(p_c) = min[ F_1∗(p_c), min_{p∈M_2} (∆(p) + F∗(p)) ]    (9)
If the original gray-level map is complex, the two calculation passes may have to be repeated several times to get the perfect distance map (see [11]). The distance image F∗(x) is then used in place of the binary image F(x) as input to the next computation pass, repeatedly, until the DTOCS algorithm has converged to the globally optimal distances.
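The two-pass computation can be sketched as follows. This is a sketch under assumptions: `delta` is any of the neighbor-distance definitions (6)-(8), plain Python lists stand in for the images, and the passes are simply repeated until no value changes.

```python
def dtocs(G, F, delta):
    """Two-pass DTOCS-style transform.

    G: gray-level image; F: initial image (0 in the complement area X^C,
    a large value elsewhere); delta(g_c, g_p, is_diagonal): neighbor cost.
    """
    h, w = len(G), len(G[0])
    F = [row[:] for row in F]                      # work on a copy
    M1 = [(-1, -1, True), (-1, 0, False), (-1, 1, True), (0, -1, False)]
    M2 = [(0, 1, False), (1, -1, True), (1, 0, False), (1, 1, True)]
    changed = True
    while changed:                                  # repeat until convergence
        changed = False
        for r in range(h):                          # forward pass, mask M1
            for c in range(w):
                for dr, dc, diag in M1:
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < h and 0 <= nc < w:
                        cand = F[nr][nc] + delta(G[r][c], G[nr][nc], diag)
                        if cand < F[r][c]:
                            F[r][c] = cand
                            changed = True
        for r in range(h - 1, -1, -1):              # backward pass, mask M2
            for c in range(w - 1, -1, -1):
                for dr, dc, diag in M2:
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < h and 0 <= nc < w:
                        cand = F[nr][nc] + delta(G[r][c], G[nr][nc], diag)
                        if cand < F[r][c]:
                            F[r][c] = cand
                            changed = True
    return F
```

On a flat image with the chessboard cost of equation (7), the result reduces to the ordinary chessboard distance from the complement area.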
4 The Shortest Route Algorithm
The shortest route algorithm is based on calculating two distance maps, one for each endpoint of the desired route. Assuming we have a gray-level map G(x)
p_nw p_n p_ne        (p_c) p_e
p_w  (p_c)           p_sw  p_s  p_se

Fig. 1. The masks for calculating the DTOCS. The left mask M_1 is used in the forward calculation pass, and the right mask M_2 in the backward pass.
and want to find an optimal route from point a with gray-level value (i.e. height) G(a) to point b with value G(b), we initialize the binary images F_a(x) and F_b(x) with X_a^C = {a} and X_b^C = {b} respectively. Using these two images, we calculate the distance images F_a∗(x) and F_b∗(x) with one of the distance transformation algorithms. In the resulting distance maps each value corresponds to the distance between point x and point a (or b respectively) along an 8-connected path that is optimal according to the distance definition of the used algorithm, WDTOCS, DTOCS or √2-DTOCS. It can be noted that F_a∗(b) as well as F_b∗(a) equals the length of the shortest route between points a and b, but the route itself can not be seen in the separate maps. Using the two maps we define the route distance:

D_R(x) = F_a∗(x) + F_b∗(x)    (10)

For each point x the value D_R(x) is the length of the shortest path from point a to b that passes through point x. The value F_a∗(x) is the shortest distance from a to x, and F_b∗(x) is the shortest distance from x to b, and these optimal subpaths form an optimal path (see [6]). The equal distance propagation curves in [3] are combined similarly to form minimal geodesics. Now the optimal route from a to b is the set of points for which the route distance is minimal. We define the route:

R(a, b) = { x | D_R(x) = min_x D_R(x) }    (11)
There can be several optimal paths, and the set R(a, b) contains all points that are on any optimal path, so this method does not provide an analytical description of a distinct route (e.g. a sequence of pixels). However, the routes can be visualized by marking the set of pixels R(a, b) on the original image. In WDTOCS and √2-DTOCS real values are used in the calculations, but the route distance D_R(x) is rounded up to the nearest integer before finding the points with the minimal distance. To summarize, the shortest route algorithm is:

1. Calculate the distance image F_a∗(x) from source point a
2. Calculate the distance image F_b∗(x) from destination point b
3. Calculate the route distance D_R(x) = F_a∗(x) + F_b∗(x)
4. Mark points with D_R(x) = min_x D_R(x) as points on the optimal route R(a, b)
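The four steps can be sketched end-to-end with the chessboard distance. A sketch under assumptions: a Dijkstra-style computation replaces the two-pass transform for brevity (both yield the globally optimal distances), and the function names are hypothetical.

```python
import heapq

def dtocs_from(G, src):
    """Chessboard DTOCS distance image from a single source pixel
    (Dijkstra-style stand-in for the two-pass algorithm)."""
    h, w = len(G), len(G[0])
    F = [[float('inf')] * w for _ in range(h)]
    F[src[0]][src[1]] = 0
    pq = [(0, src)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if d > F[r][c]:
            continue
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < h and 0 <= nc < w:
                    nd = d + abs(G[r][c] - G[nr][nc]) + 1   # Eq. (2)
                    if nd < F[nr][nc]:
                        F[nr][nc] = nd
                        heapq.heappush(pq, (nd, (nr, nc)))
    return F

def shortest_route(G, a, b):
    """Set of points on any optimal route between a and b, Eqs. (10)-(11)."""
    Fa, Fb = dtocs_from(G, a), dtocs_from(G, b)
    h, w = len(G), len(G[0])
    DR = [[Fa[r][c] + Fb[r][c] for c in range(w)] for r in range(h)]
    m = min(min(row) for row in DR)
    return {(r, c) for r in range(h) for c in range(w) if DR[r][c] == m}
```

On a flat 3x3 image the set R(a, b) between opposite corners is the main diagonal, illustrating that R(a, b) collects every pixel lying on some optimal path.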
5 Experiments and Results
This section demonstrates how the shortest route algorithm works, and compares the results of implementations with different distance definitions. Figure 2 presents a step-by-step application of the algorithm. Figure 2 a) is the original gray-level image. Figures 2 b) and 2 c) show the DTOCS images F_a∗(x) and F_b∗(x) calculated from the endpoints a and b (marked with 'x'). As the distance function is symmetric, it does not matter which endpoint corresponds to a and which to b. Figure 2 d) shows the route distance image, i.e. the sum of the DTOCS images. Images 2 b)-d) are scaled to gray-levels, but the original distance values, which can be beyond 255, are used in the calculation of D_R(x). Figure 2 e) presents the final result, i.e. the points in the set R(a, b). Figure 2 f) presents the same route calculated with the WDTOCS. It can be seen that for the complex image surface representing varying terrain the route is very similar, but sharper than the route by the DTOCS.
Fig. 2. a) Original image, b) distance from source point, c) distance from destination point, d) sum of distance images, e) route by DTOCS, f) route by WDTOCS.
A sample application, where the shortest route idea is used to solve a labyrinth, was presented in [2]. Figure 3 a) shows the route through a labyrinth produced by the original Route DTOCS. The algorithm needs a threshold-segmented image, where labyrinth paths get value zero and walls get a very high value. Then the shortest path from the entrance to the exit of the labyrinth is the route through the labyrinth. It can be seen in figure 3 a) that the route makes seemingly extra 90° corners when calculated with the chessboard DTOCS. The explanation of this problem is visualized in figure 4. The route from point A to B that passes through point x is just as short as the intuitively optimal straight route, as there are as many pixel-to-pixel displacements on both routes. Consequently, there are several optimal discrete 8-connected paths through the labyrinth, and as the route is defined as the set of all points that are on any optimal path, the visualized route becomes wide. Figure 3 b) shows how the route width decreases when the longer distance between diagonal pixels is taken into account according to equation 3.
Fig. 3. a) Route through labyrinth by DTOCS. b) Route through labyrinth by DTOCS with √2 diagonal distances.
Fig. 4. Two of several possible routes from point A to B on a flat image surface according to the chessboard distance definition. The DTOCS distance is the same along the route through point x as along the straight line, as there are as many pixels on both routes (each square represents a pixel).
Tests with a gray-scale ball image show similar results. The routes between the endpoints of the horizontal diameter of the half-sphere are too wide when calculated with the basic Route DTOCS (figure 5 a), but introducing the √2 factor for the diagonal neighbor distances makes the routes as optimal as can be expected of discrete 8-connected paths (figure 5 b). Using the Euclidean neighbor distances of the WDTOCS changes the result dramatically, i.e. the algorithm finds the route across the half-sphere rather than around it (figure 5 c). The differing route lengths are partly a result of the digitization of the sphere function. Figure 6 shows a cross-section and a horizontal projection of a digital ball with few pixels. The digitization error is smaller but still present when using a higher-resolution ball image. Another big factor is that the variation in surface height increases the WDTOCS distances less than the distances of the transforms that add the height difference to the horizontal displacement. The DTOCS distance across the ball along the route the WDTOCS algorithm finds optimal (as in figure 5 c) would be clearly longer than the WDTOCS distance, as each neighbor distance √(d² + 1) is replaced with d + 1, where d is the height difference of the neighbor pixels. When gray-level variations, i.e. height differences, are large, the effect the horizontal displacements have on the distance value decreases in WDTOCS, whereas it stays constant in DTOCS and √2-DTOCS. The application determines which approach is better. If the transformation is used to approximate actual distances along a real surface, using the piecewise Euclidean distance of WDTOCS is justified. If the gray-level differences represent a different type of cost than the horizontal displacements, the transformations adding horizontal and vertical distances may work better and be more easily scalable. To modify the effect the height differences have on the distance transform, the original image can be scaled before applying the transformation. Alternatively, a weighting factor can be added to the height difference in equations 1, 2 and 3.
Fig. 5. a) Route by DTOCS, b) Route by DTOCS with √2 diagonal distance, c) Route by WDTOCS.

6 Discussion
In previous work, the DTOCS algorithm has mostly been used to calculate local distances. For example in image compression (see e.g. [12]) distance values are used to measure the variation of the image surface. More control points need to be stored from image areas where local distances are high, i.e. where gray-level values change rapidly. In such applications the chessboard distance transform works well enough, and the use of integer approximations of distance values is justified to save computation time and space. However, the route optimization algorithm computes global distances across the whole image, and the approximation error of the chessboard distance accumulates. Particularly on smooth and simple image surfaces, the chessboard Route DTOCS performs poorly, and using the WDTOCS produces more reliable optimal routes.
Fig. 6. a) Cross-section of digitized ball with the WDTOCS route across the ball. The height of the bars corresponds to gray-level values. b) Flat projection of digitized ball with the √2-DTOCS route around the ball, and the shape of the WDTOCS route marked with a dashed line for comparison. Each square represents a pixel.
The distance transform using √2 as the diagonal pixel-to-pixel displacement is an interesting hybrid of the chessboard and Euclidean distance definitions, as the locally Euclidean distance is used as the horizontal pixel-to-pixel displacement, but the height difference is calculated just as in the chessboard DTOCS. The theoretical basis for this hybrid distance transform may not be as solid as for the DTOCS and the WDTOCS, but in route optimization it can give some interesting results. For example, in the labyrinth application the horizontal displacements form the desired route, and the values of the gray-level differences are not significant, as long as distances along low paths are clearly shorter than distances over high walls. Other obstacle avoidance problems can be solved using the route optimization algorithm, and treating the horizontal and vertical displacements differently can be practical. Using the piecewise Euclidean distances of WDTOCS gives the most accurate approximations for distances along the image surface. If the slightly heavier computation of floating point values instead of integers is not a problem, the WDTOCS algorithm should be used to get the best results in route optimization. A question for future research is whether we can define integer kernel distances which approximate the Euclidean distance more accurately than the DTOCS. Borgefors [1] showed that using local distances 3 and 4 for square and diagonal neighbors in binary images actually gives a better approximation of the Euclidean distance along the horizontal image plane than the distances 1 and √2 used here. Extending these ideas to gray-level images requires further investigation into how the height differences affect, and how they should affect, the distance transformation.
References

1. Borgefors, G.: Distance Transformations in Digital Images. Computer Vision, Graphics, and Image Processing 34 (1986) 344–371
2. Ikonen, L., Toivanen, P., Tuominen, J.: Shortest Route on Gray-Level Map using Distance Transform on Curved Space. Proc. of Scandinavian Conference on Image Analysis (2003) 305–310
3. Kimmel, R., Amir, A., Bruckstein, A.: Finding Shortest Paths on Surfaces Using Level Sets Propagation. IEEE Transactions on Pattern Analysis and Machine Intelligence 17 (1995) 635–640
316
Leena Ikonen and Pekka Toivanen
4. Kimmel, R., Kiryati, N.: Finding Shortest Paths on Surfaces by Fast Global Approximation and Precise Local Refinement. International Journal of Pattern Recognition and Artificial Intelligence 10 (1996) 643–656
5. Lin, P., Chang, S.: A Shortest Path Algorithm for a Nonrotating Object Among Obstacles of Arbitrary Shapes. IEEE Transactions on Systems, Man, and Cybernetics 23 (1993) 825–833
6. Piper, J., Granum, E.: Computing Distance Transformations in Convex and Non-Convex Domains. Pattern Recognition 20 (1987) 599–615
7. Rosenfeld, A., Pfaltz, J.L.: Sequential Operations in Digital Picture Processing. Journal of the Association for Computing Machinery 13 (1966) 471–494
8. Rosin, P., West, G.: Salience Distance Transforms. Graphical Models and Image Processing 56 (1995) 483–521
9. Saha, P.K., Wehrli, F.W., Gomberg, B.R.: Fuzzy Distance Transform: Theory, Algorithms and Applications. Computer Vision and Image Understanding 86 (2002) 171–190
10. Saab, Y., VanPutte, M.: Shortest Path Planning on Topographical Maps. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans 29 (1999) 139–150
11. Toivanen, P.J.: Convergence Properties of the Distance Transform on Curved Space (DTOCS). Proc. of Finnish Signal Processing Symposium (1995) 75–79
12. Toivanen, P.J.: Image Compression by Selecting Control Points Using Distance Function on Curved Space. Pattern Recognition Letters 14 (1993) 475–482
13. Toivanen, P.J.: New geodesic distance transforms for gray-scale images. Pattern Recognition Letters 17 (1996) 437–450
On the Use of Shape Primitives for Reversible Surface Skeletonization

Stina Svensson¹ and Pieter P. Jonker²

¹ Centre for Image Analysis, Swedish University of Agricultural Sciences, Uppsala, Sweden, [email protected]
² Pattern Recognition Group, Faculty of Applied Sciences, Delft University of Technology, Delft, The Netherlands, [email protected]
Abstract. We use a mathematical morphology approach to compute the surface and curve skeletons of a 3D object. We focus on the behaviour of the surface skeleton, in particular on reversibility in the cases when the skeleton is, and is not, anchored to the set of centres of maximal balls. We elaborate on the difficulties in obtaining a reversible surface skeleton that does not depend on the orientation of the original object with respect to the grid and that has no jagged borders.
Keywords: Topological erosion, mathematical morphology, distance transform.
1 Introduction
For efficient shape analysis of the foreground set in an image, various shape representation schemes have been developed, among which skeletonization is commonly used. Skeletonization is a way to reduce the intrinsic dimension of the foreground objects, i.e., a surface in 2D is reduced to a curve (the 2D skeleton), or a volume in 3D is reduced to a curved surface (the 3D surface skeleton) that might be further reduced to a space curve (the 3D curve skeleton). In this paper we focus on the behaviour of the surface skeleton. To function as an efficient representation scheme, the skeleton of the object should fulfil a number of properties: The skeleton should be a thin subset of the object. To reflect the main structure of the object, the reduction process should not alter the topology. The reduced set should be centred within the original object. The result should be identical under rotation of the original. Finally, the process should be reversible, meaning that the object can be recovered from the skeleton. This property is useful if shape analysis related to changes in thickness of the object is performed. Various approaches to compute the surface skeleton of the foreground set in a 3D image can be found in the literature, [1,2,3,4,5,6]. This paper deals with the latter two, [5,6]. In [5], an algorithm based on conditional erosion of the foreground set was presented. The conditions take care of the preservation of the topology and contain subsets that preserve surfaces,
I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 317–326, 2003. © Springer-Verlag Berlin Heidelberg 2003
surface ends, curves, curve ends and single points. When iteratively eroding the foreground, they take care that surfaces, curves and single points are not eroded. The resulting skeleton is thin and centred within the object; however, it is not fully reversible. In [6], a distance transform based algorithm to compute reversible surface skeletons was presented. The algorithm is based on topology preserving iterative thinning guided by the distance transform of the foreground set. During this process, voxels needed for reversibility are preserved in combination with voxels needed for surface preservation. The resulting surface skeletons are reversible and centred within the respective objects with respect to the distance function used. However, the surface preservation condition is not enough to avoid a certain jaggedness in the border of the resulting surface skeleton. As far as we know, there is to date no reversible surface skeletonization algorithm without this drawback. In this paper, we study the possibility of constructing an algorithm to compute reversible surface skeletons without this drawback. We show results from an algorithm based on the D26 distance, i.e., the 3D equivalent of the chessboard distance, based on the combination of the algorithms in [5,6]. Moreover, we show to what extent the surface skeletons computed by the algorithm in [5] are reversible and conclude with some remarks on the difficulties in computing reversible surface skeletons that are orientation independent and have no jagged surface borders. See Figs. 5–8.
2 Notions
Consider a 3D image consisting of foreground X and its complement Xᶜ, the background. In a 3 × 3 × 3 set of voxels centred on a voxel v, there are three types of neighbours to v: 6 face neighbours (at Euclidean distance 1 from v), 12 edge neighbours (at distance √2 from v), and 8 vertex neighbours (at distance √3 from v). The different neighbours give three types of neighbourhoods, which will be denoted N6, N18, and N26. A set A ⊆ X is n-connected, n ∈ {6, 18, 26}, if each pair of voxels v1, vm ∈ A can be joined by a path v1, v2, . . . , vm−1, vm such that each successive pair vi, vi+1 is n-connected, i.e., vi ∈ Nn(vi+1). Most often, the highest connectivity is used for the foreground set and the lowest for the background; we adopt this convention. Each voxel in an image can be labelled with a distance, according to the chosen distance function, to its closest voxel in the background. The result can be stored in a distance image or distance transform (DT). The DT can be computed in two scans of the image using local distance information only. For more information on how to use and compute DTs, we refer to [7,8,9,10]. We will use a distance function where the distance between two voxels, v and w, depends on the number of steps in the face (A), edge (B), and vertex (C) directions in a minimal path between v and w. The distance is given by d(v, w) = max(A, B, C), i.e., the distance equals the number of steps in a minimal 26-connected path between v and w. We refer to this distance as D26 and to the corresponding distance transform as the D26 DT. The D26 DT has the drawback of being unstable under rotation, i.e., it is not a good approximation of the Euclidean DT, [11,12]; on
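As a small illustration of these definitions (our own sketch, with hypothetical function names), the neighbour classification and the D26 distance can be written directly from the coordinate differences:

```python
def neighbour_type(v, w):
    # Classify w relative to v within the 3x3x3 neighbourhood:
    # face, edge, or vertex neighbour, or None if w is not in N26(v).
    d = [abs(a - b) for a, b in zip(v, w)]
    if max(d) != 1:
        return None
    return {1: "face", 2: "edge", 3: "vertex"}[sum(d)]

def d26(v, w):
    # D26 distance: the number of steps in a minimal 26-connected path
    # between v and w equals the largest coordinate difference.
    return max(abs(a - b) for a, b in zip(v, w))
```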
the other hand, the Euclidean DT has the disadvantage of being more difficult to use [13,10]. The label of a voxel v in a DT can be interpreted as the radius of a ball centred on v which is fully enclosed in the foreground (the DTball). The foreground can be efficiently represented by a subset of its voxels, considering the fact that some of the DTballs are completely covered by other DTballs. A DTball that is not completely covered by another DTball is called a maximal ball, and the voxel it is centred on a centre of maximal ball (CMB). The foreground can be recovered from its CMBs by taking the union of the corresponding DTballs. This process can be efficiently implemented using the reverse distance transformation [14]. The CMBs can be detected on the DT by simple label comparison, based on the fact that no CMB propagates distance information to its neighbours. In fact, a voxel v in a D26 DT is a centre of a maximal ball if it has no neighbour with a larger distance label. Detecting CMBs on the Euclidean DT is not equally trivial [13]. As far as we know, there is to date no publication on the detection of centres of maximal Euclidean balls (for 3D images). A basic morphological operation is the hit-or-miss transformation, e.g., [15]. This can be described as a point-by-point transformation of a set X with a structuring element S consisting of two sets S1 and S2, performed in such a way that x ∈ X belongs to the transformation Y if and only if S1x, i.e., S1 centred on x, is included in X and S2x, i.e., S2 centred on x, is included in Xᶜ:

Y ← X ⊗ S ≡ {x | S1x ⊆ X, S2x ⊆ Xᶜ}

An image is a set of elements (pixels, voxels, . . . ) with an underlying vector space, with vectors k representing the positions of the elements within the image. In an image, we have the foreground set of elements (value 1) and the background set of elements (value 0). The foreground set is constituted by all objects in the image.
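The CMB criterion and the reverse transformation can be sketched as follows (a simplified illustration under our own naming, storing the DT as nested lists and the CMBs as (voxel, label) pairs):

```python
from itertools import product

def cmbs_d26(dt):
    # A voxel in a D26 DT is a centre of a maximal ball iff no 26-neighbour
    # carries a larger distance label (distance information is not propagated).
    dims = (len(dt), len(dt[0]), len(dt[0][0]))
    out = []
    for x, y, z in product(*map(range, dims)):
        lab = dt[x][y][z]
        if lab == 0:  # background voxel
            continue
        maximal = True
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            if (dx, dy, dz) == (0, 0, 0):
                continue
            nx, ny, nz = x + dx, y + dy, z + dz
            if (0 <= nx < dims[0] and 0 <= ny < dims[1]
                    and 0 <= nz < dims[2] and dt[nx][ny][nz] > lab):
                maximal = False
                break
        if maximal:
            out.append(((x, y, z), lab))
    return out

def reverse_d26(cmbs, dims):
    # Recover the object as the union of D26 balls (cubes of side 2r - 1)
    # centred on the CMBs, cf. the reverse distance transformation [14].
    obj = set()
    for (x, y, z), r in cmbs:
        for dx, dy, dz in product(range(-(r - 1), r), repeat=3):
            p = (x + dx, y + dy, z + dz)
            if all(0 <= p[i] < dims[i] for i in range(3)):
                obj.add(p)
    return obj
```

For a 3 × 3 × 3 solid block, only the centre voxel (label 2) is a CMB, and the union of its single D26 ball recovers the whole block.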
For a structuring element S, we can associate the set S1 with foreground elements and the set S2 with background elements. These are all do-care elements. To use a structuring element of a fixed size and shape, e.g., to operate upon 3ⁿ neighbourhoods Mk, centred around element xk in an nD image X, we adopt the notion of don't care elements. The transformation Y ← X ⊗ S can be implemented with the neighbourhood transformation {∀k : yk ← (Mk ≅ S)}. The structuring element S can be used to perform operations like the erosion. The erosion of an image X by a structuring element S, εS(X), is equal to all voxels x ∈ X such that Sx ⊆ X, i.e., εS(X) = {x | Sx ⊆ X}. The structuring element S can also be used to put constraints upon the erosion, such as in topology preserving erosion (denoted topological erosion) in which only simple elements are eroded. A voxel v belonging to the foreground of image X is called simple if X is homotopic to X \ {v}. Topological erosion is used, e.g., in skeletonization.
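A minimal sketch of the hit-or-miss transformation and the induced erosion on point sets (our own illustration; representing X and the structuring element parts as coordinate sets makes the complement test a simple disjointness check):

```python
def hit_or_miss(X, S1, S2, domain):
    # Y = X (x) S: a point x is in the output iff S1 shifted to x lies in the
    # foreground X and S2 shifted to x lies entirely in the complement of X.
    # Structuring-element positions not in S1 or S2 are "don't cares".
    def shift(S, x):
        return {tuple(s[i] + x[i] for i in range(len(x))) for s in S}
    return {x for x in domain
            if shift(S1, x) <= X and not (shift(S2, x) & X)}

def erosion(X, S, domain):
    # eps_S(X) = {x | Sx subset of X}: hit-or-miss with empty background part.
    return hit_or_miss(X, S, set(), domain)
```

With S1 a single origin pixel and S2 the 8-neighbourhood, hit_or_miss detects exactly the isolated foreground pixels.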
3 Shape Primitives
A voxel and its 3 × 3 × 3 neighbourhood come in four states. It can be part of a single voxel object, it can be part of a space curve, it can be part of a curved surface, or it can be part of a volume. As such it can be assigned
320
Stina Svensson and Pieter P. Jonker
Fig. 1. Structuring elements for conditions for Ñ = 0 and Ñ = 3, respectively. Foreground voxels are shown in light grey and background voxels are transparent.
Fig. 2. Curve primitives. Foreground voxels are shown in light grey and background voxels in dark grey.
an object dimension or intrinsic dimension Ñ, with Ñ = 0..3, respectively. For topological erosion to obtain a surface skeleton, only a volume voxel can be changed from foreground to background, and only if it is on the object boundary, i.e., it has a face neighbour in the background. Changing a foreground surface voxel to background would cause (locally) the creation of a tunnel [16], or, expressed differently, a foreground surface is "pierced" by a background curve. Changing a foreground curve voxel to background would cause (locally) the breaking of one foreground component into two foreground components, or, expressed differently, a foreground curve is "sliced" into two curve parts by a background surface. Changing an isolated foreground voxel to background would cause removal of that foreground component. For Ñ = 0 and Ñ = 3, topological erosion can be obtained using the structuring elements in Fig. 1 as conditions. To deal with Ñ = 1 and Ñ = 2, we use the concept of shape primitives. A detailed description can be found in [5]. Shape primitives for curves, i.e., Ñ = 1, are given by the voxel v and two of its neighbours u and w. These two neighbours should be disconnected. Considering that 26-connectedness is used for the foreground and 6-connectedness for the background, we have for a foreground curve primitive that u, w ∈ N26(v) and u ∉ N26(w), and for a background curve primitive that u, w ∈ N6(v) and u ∉ N6(w). Curve primitives are shown in Fig. 2 (rotated and mirrored primitives are not shown). Shape primitives for a surface, i.e., Ñ = 2, can be generated by encircling the voxel v by a simply connected curve, i.e., each curve voxel has exactly two neighbours in the curve. For a foreground surface primitive, the curve is a set of n voxels ui, where n ≥ 4 and ui ∈ N26(ui+1). This, together with ui ≠ v, necessarily gives ui ∈ N18(v), which effectively expresses that surfaces are locally 18-connected.
For a background surface primitive, the curve is a set of n voxels ui, where n ≥ 6 and ui ∈ N6(ui+1). This, together with ui ≠ v, necessarily gives ui ∈ N18(v). Surface primitives are shown in Fig. 3 (rotated and mirrored primitives are not shown).
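These primitive definitions can be checked mechanically; the following sketch (our own, with hypothetical names) encodes the foreground and background curve-primitive conditions:

```python
def in_n26(a, b):
    # b is a 26-neighbour of a: all coordinate differences at most 1, not equal.
    d = [abs(x - y) for x, y in zip(a, b)]
    return max(d) == 1

def in_n6(a, b):
    # b is a 6-neighbour (face neighbour) of a: exactly one difference of 1.
    d = [abs(x - y) for x, y in zip(a, b)]
    return sum(d) == 1 and max(d) == 1

def is_fg_curve_primitive(v, u, w):
    # Foreground curve primitive: u, w in N26(v), mutually disconnected.
    return u != w and in_n26(v, u) and in_n26(v, w) and not in_n26(u, w)

def is_bg_curve_primitive(v, u, w):
    # Background curve primitive: u, w in N6(v), mutually disconnected.
    return u != w and in_n6(v, u) and in_n6(v, w) and not in_n6(u, w)
```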
4 Surface Skeletonization
The shape primitives described in the previous section can be used to find the structuring elements (or, simply, masks) for the conditional erosion to be used
Fig. 3. Surface primitives. Foreground voxels are shown in light grey and background voxels in dark grey.
during skeletonization. To generate the needed masks, curve and surface primitives are suitably intersected. In fact, all possible combinations of intersection of foreground surface primitives and background curve primitives as well as all possible intersections of foreground curve primitives and background surface primitives should be investigated. The masks that will actually be used are those for which a foreground surface is prevented from being penetrated by a background curve (thus, avoiding the creation of a tunnel) and those for which a foreground curve is prevented from being sliced by a background surface (thus, avoiding breaking a foreground component into two components). The masks can be viewed in Fig. 4. For their details, see [5].
Fig. 4. Mask set for topological erosion with respect to surfaces, left, and curves, right. Foreground voxels are shown in light grey, background voxels in dark grey, and don’t care voxels transparent.
Using iterated conditional erosion on the foreground (with the generated masks) is not enough to guarantee a topology preserving removal of voxels. For example, two-voxel-thick parts of the foreground cannot be properly detected using a 3 × 3 × 3 neighbourhood. This can be solved by using subiterations, where the foreground is eroded from one direction only in each subiteration, or a subfield sequential method, where the image is examined in a directional and sequential fashion [4]. Yet another approach is to use a recursive neighbourhood [17]. In this case, masks are applied both simultaneously and sequentially, and each structuring element is matched both in the input and in the output image, where only voxels from the current iteration are considered. The recursive neighbourhood method is the fastest procedure [17].
The shape primitives are also used to generate surface and curve end conditions [5]. A curve extends from a voxel v in two directions. The curve end conditions are found by setting one of the two neighbours of v to background. A surface extends from a voxel v in four directions. Systematically one and two directions can be set to background, yielding half and quarter surfaces.
Fig. 5. From left to right: Object, non-reversible surface skeleton, centres of maximal D26 balls, and D26 surface skeleton.
The first step in the skeletonization algorithm is to compute the D26 DT, from which the CMBs are extracted. Anchoring the surface skeleton onto the CMBs will give a reversible skeletonization. For the actual skeletonization, the process is the same as in [5], except for two details. We use distance guided erosion, meaning that for each erosion iteration, we only consider voxels with a distance label equal to the iteration number. CMBs are never removed. We denote this skeleton the D26 surface skeleton. For details on the implementation, we refer to [18]. In Fig. 5, we show the surface skeletons for two objects as well as the set of CMBs and the surface skeleton obtained when anchoring to the CMBs is omitted. The latter is denoted non-reversible surface skeleton. The examples are “dog” and “pot plant”.
5 Reversibility and Surface Preservation
The surface skeletons obtained with anchoring to the CMBs are reversible with respect to the D26 distance: the object can be fully recovered from the skeleton, e.g., by applying the reverse D26 DT. If the anchoring to CMBs is omitted, the resulting surface skeleton is not reversible. The surface skeletonization algorithm described in [5], from which this distance guided algorithm originates,
Fig. 6. Recovered objects when anchoring is omitted, left. Difference with respect to the original objects, right.
aims to compute a surface skeleton that is centred within the object with respect to the Euclidean distance, i.e., to give a surface skeleton which is stable under rotation. If we assign distance labels from the Euclidean DT to the surface skeleton and then apply the reverse Euclidean DT (here computed in a brute-force way by, for each skeletal voxel v, adding a Euclidean ball centred on v with radius equal to the distance label of v), the objects in Fig. 6 are obtained. Compared to the original objects, we have recovered 16903 of the 17229 voxels for the "dog" and 8413 of the 8871 for the "pot plant". If we consider a more brick-like object, the situation is slightly worse, see Fig. 7. Compared to the original objects, we have recovered 259475 of the 283604 voxels for the "pyramid" and 260658 of the 288040 for the "rotated pyramid". Observe that the surface skeletons differ in shape for different rotations. This is due to the problem of perfect alignment of objects made of square voxels in a square tessellated grid. Possibly, special surface-end preservation conditions could be made that detect the situations in which the object is perfectly aligned to the grid, and hence could make the top row of Fig. 7 similar to the bottom row. However, this probably requires a larger support for the surface end-preservation conditions, e.g., 5³, but such a patch may also have the drawback that it preserves voxels in non-aligned cases, where this is not preferable. This needs further research. In conclusion, anchoring to the CMBs should be used to ensure reversibility; however, this does not ensure that the obtained skeleton is independent of the rotation of the original. Using anchoring to CMBs also does not guarantee that a surface skeleton without jagged borders is obtained. Fig. 8 shows that the anchoring to CMBs gives dashed lines from the top of the cone to the ground plane.
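The brute-force reverse Euclidean DT described above can be sketched as follows (our own illustration; `skeleton` maps a voxel to its Euclidean distance label, and the open ball of radius r is used, matching labels that measure the distance to the background):

```python
import math
from itertools import product

def reverse_euclidean_dt(skeleton, dims):
    # Brute-force reverse Euclidean DT: for each skeletal voxel v with
    # distance label r, add the Euclidean ball of radius r centred on v.
    recovered = set()
    n = len(dims)
    for v, r in skeleton.items():
        rr = int(math.floor(r))
        for d in product(range(-rr, rr + 1), repeat=n):
            p = tuple(v[i] + d[i] for i in range(n))
            if (sum(x * x for x in d) < r * r
                    and all(0 <= p[i] < dims[i] for i in range(n))):
                recovered.add(p)
    return recovered
```

Comparing the recovered set with the original object gives the voxel counts reported above.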
The skeletonization algorithm produces a correct result; however, it anchors to single points in space and not to surface patches: the cone is eroded surface by surface, and the CMBs emerge on the gradually eroded surface of the cone, leading to single voxel skeleton branches. Hence, the Christmas tree pattern is correct, although it does not look nice. If we also want to preserve jagged surface borders, we can add additional conditions, e.g., as shown in Fig. 9. (They have the same effect as the surface preservation condition in [6].) Note, however, that these masks only preserve battlements where there is only a single background voxel between each pair of foreground voxels. Their effect is a form of surface border closing with a support of one. If there is a distance of two background voxels between each pair
Fig. 7. From left to right: Object, non-reversible surface skeleton, recovered object, difference with respect to the original object, and D26 surface skeleton. (The pyramid in the bottom row is a rotated version (45◦ around y) of the pyramid in the top row.)
Fig. 8. From left to right: Object, centres of maximal D26 balls, D26 surface skeleton, and improved D26 surface skeleton.
Fig. 9. Additional masks intended for surface shape preservation. Foreground voxels are shown in light grey, background voxels in dark grey, and don’t care voxels transparent.
of foreground voxels, a larger support, e.g., 5³, needs to be taken. But this leads to a scale problem: how large should the support be? The actual cause lies in the fact that we want to obtain a certain surface boundary property (which involves a direction), but we anchor to points (which have no direction). Further research has to be done on, e.g., using the centres of maximal ellipses, which give anchors that are line pieces. This probably will lead to smooth surface skeleton borders, as then the surface end-conditions will match on them, in contrast with the current situation, where only the curve end-conditions match on the anchor points.
6 Conclusion
In this paper, we have pointed out the difficulties in finding an algorithm for computing a reversible surface skeleton of a 3D object which results in a surface skeleton that is also independent of the orientation of the original object with respect to the grid and has no jagged surface borders. We showed that surface skeletons resulting from the algorithm described in [5] are fairly stable under rotation and that the original object can be largely recovered. For full reversibility, anchoring to centres of maximal balls is needed. We have shown the results for a distance guided algorithm derived from [5] and based on the D26 DT, leading to the conclusion that we can make fully reversible surface skeletons that are centred within the original object with respect to the D26 distance. We have also shown that even if we combine [5] with anchoring on centres of maximal Euclidean balls, the problems with orientation and jagged surface borders remain. To better preserve jagged surface borders, conditions that perform a closing operation on the skeleton border, in addition to the topology preservation conditions, can be used to improve the result. However, the orientation problem, due to perfect alignment of objects and masks made of square voxels on a square tessellated grid, remains. In real applications the orientation dependence and the jagged surface skeleton border are minor problems. From a theoretical point of view, however, it would be interesting to develop an algorithm for which the skeletal properties of orientation independence and smooth surface skeleton boundaries are guaranteed.
Acknowledgement The work presented in this paper was carried out while Stina Svensson visited the Pattern Recognition Group in Delft, The Netherlands. This was made possible by a grant from The Swedish Foundation for International Cooperation in Research and Higher Education (STINT).
References

1. Attali, D., Lachaud, J.O.: Delaunay conforming iso-surface, skeleton extraction and noise removal. Computational Geometry 19 (2001) 175–189
2. Leymarie, F.F., Kimia, B.B.: The shock scaffold for representing 3D shape. In Arcelli, C., Cordella, L.P., Sanniti di Baja, G., eds.: Visual Form 2001. Volume 2059 of Lecture Notes in Computer Science, Capri, Italy, Springer-Verlag (2001) 216–228
3. Ma, C.M., Wan, S.Y.: A medial-surface oriented 3-d two-subfield thinning algorithm. Pattern Recognition Letters 22 (2001) 1439–1446
4. Palágyi, K., Kuba, A.: A parallel 3D 12-subiteration thinning algorithm. Graphical Models and Image Processing 61 (1999) 199–221
5. Jonker, P.P.: Skeletons in N dimensions using shape primitives. Pattern Recognition Letters 23 (2002) 677–686
6. Svensson, S.: Reversible surface skeletons of 3D objects by iterative thinning of distance transforms. In Bertrand, G., Imiya, A., Klette, R., eds.: Digital and Image Geometry. Volume 2243 of Lecture Notes in Computer Science, Dagstuhl, Germany, Springer-Verlag (2002) 395–406
7. Rosenfeld, A., Pfaltz, J.L.: Sequential operations in digital picture processing. Journal of the Association for Computing Machinery 13 (1966) 471–494
8. Borgefors, G.: Applications using distance transforms. In Arcelli, C., Cordella, L.P., Sanniti di Baja, G., eds.: Aspects of Visual Form Processing. World Scientific Publishing Co. Pte. Ltd. (1994) 83–108
9. Borgefors, G.: On digital distance transforms in three dimensions. Computer Vision and Image Understanding 64 (1996) 368–376
10. Svensson, S., Borgefors, G.: Digital distance transforms in 3D images using information from neighbourhoods up to 5 × 5 × 5. Computer Vision and Image Understanding 88 (2002) 24–53
11. Danielsson, P.E.: Euclidean distance mapping. Computer Graphics and Image Processing 14 (1980) 227–248
12. Ragnemalm, I.: The Euclidean distance transform in arbitrary dimensions. Pattern Recognition Letters 14 (1993) 883–888
13. Borgefors, G., Ragnemalm, I., Sanniti di Baja, G.: The Euclidean distance transform: Finding the local maxima and reconstructing the shape. In Johansen, P., Olsen, S., eds.: Proceedings of Scandinavian Conference on Image Analysis (SCIA'91), Pattern Recognition Society of Denmark (1991) 974–981
14. Nyström, I., Borgefors, G.: Synthesising objects and scenes using the reverse distance transformation in 2D and 3D. In Braccini, C., Floriani, L.D., Vernazza, G., eds.: Proceedings of ICIAP'95: Image Analysis and Processing, Springer-Verlag (1995) 441–446
15. Serra, J.: Image Analysis and Mathematical Morphology. Academic Press, Inc. (1982)
16. Kong, T.Y.: A digital fundamental group. Computers & Graphics 13 (1989) 159–166
17. Jonker, P.P.: Morphological operations in recursive neighbourhoods. Accepted for publication in Pattern Recognition Letters (2002)
18. Jonker, P.P.: Lecture notes on mathematical morphology for 2, 3 & 4 dimensional images and its implementation in soft and hardware. In Wojciechowski, K., ed.: Summer School on Mathematical Morphology and Signal Processing. Volume 2, Zakopane, Poland (1995) 41–120. ISBN 83-904743-2-8
d-Dimensional Reverse Euclidean Distance Transformation and Euclidean Medial Axis Extraction in Optimal Time

David Coeurjolly

Laboratoire LIRIS, Université Lumière Lyon 2, 5 avenue Pierre Mendès-France, F-69676 Bron, France, [email protected]
Abstract. In this paper, we present optimal in time algorithms to solve the reverse Euclidean distance transformation and the reversible medial axis extraction problems for d-dimensional images. In comparison to previous techniques, the proposed Euclidean medial axis may contain fewer points than the classical medial axis.
Keywords: Reverse Euclidean distance transform, medial axis extraction, d-dimensional shapes.
1 Introduction
In binary images, the distance transformation (DT) and the geometrical skeleton extraction are classical tools for shape analysis [15, 16]. The distance transformation consists in labelling each pixel of an object with the distance to the closest pixel of its complement (also called the background). Obviously, a distance transformation algorithm is deeply linked to the underlying metric. In the digital image literature, the problem of approximating the Euclidean distance, or the isotropy property of digital objects, has received much attention. Hence, for the DT problem, we have mask based or chamfer distances [2, 14, 16, 19]; vector displacement based Euclidean distance [5, 12]; Voronoi diagram based Euclidean distance [4, 8]; or squared distance based Euclidean distance [7, 9, 17]. From a computational cost point of view, several of these methods lead to optimal in time algorithms to compute the error-free Euclidean Distance Transform (EDT) for n-dimensional binary images [4, 7, 8]. The skeleton, or medial axis, is a classical and convenient representation of a shape for description or recognition purposes [1]. Many definitions exist for such an object [10]. A classical one defines the skeleton as the set of center pixels of the maximal disks covering the shape. A maximal disk is a disk contained in the shape that is not entirely covered by any other disk contained in the shape. Many discrete implementations of these models have been proposed, either for chamfer distances [2, 6, 15] or for the Euclidean distance [13, 18, 19].
I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 327–337, 2003. © Springer-Verlag Berlin Heidelberg 2003
In this paper, we investigate the d-dimensional medial axis (MA for short) extraction based on the error-free Euclidean distance. A first problem, prior to the MA extraction, is the reverse Euclidean distance transformation (REDT). Formally, given a set of points associated with their Euclidean distance values, how can we reconstruct the shape resulting from the overlapping of the corresponding balls? An optimal in time algorithm is given to solve this problem. Based on this process, we present an optimal in time algorithm to compute a reduced medial axis of d-dimensional shapes. In Section 2, we first detail algorithms to solve the EDT problem in d dimensions in linear time. Based on these techniques, we optimize the REDT algorithm proposed by Saito and Toriwaki [18] to obtain a linear in time algorithm in Section 3. Then, in Section 4, we present an optimal in time algorithm that extracts a reversible subset of the classical medial axis. Finally, we present possible generalizations of these algorithms to other grids.
2 d-Dimensional Euclidean Distance Transformation
In [17], Saito et al. propose an n-dimensional approach for the EDT problem. The authors present a simple n-dimensional algorithm that labels image pixels with the squared distance to the closest background pixel. This process is done dimension by dimension and allows a simple generalization to d dimensions. We present the algorithm in the 2-D case: we consider a two-dimensional binary image P of size n × n, B denotes the non-empty set of background pixels, and the output of the algorithm is a 2-D image S = {sij} storing the squared distance transform. For each point p(i, j) of the image, the squared distance transform is given by

sp = min_{q∈B} {dist²(p, q)}   (1)
   = min_{q(x,y)∈B} {(i − x)² + (j − y)²}.   (2)
This formulation of the problem leads to an efficient two-pass process for the squared distance transform (SDT for short) labelling in 2-D:

1. Build from the source image P a one-dimensional SDT along the first dimension (x-axis), denoted by G = {gij}, where

   gij = min_{p(x,y)∈B} {(i − x)²}.   (3)

2. Then, construct the sij image with a y-axis process:

   sij = min_y {giy + (j − y)², 1 ≤ y ≤ n}.   (4)
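A direct (non-optimized) transcription of the two steps might look as follows (our own sketch; note that, as in the Saito-Toriwaki scheme, the step 1 minimization is carried out within each column of the image):

```python
def sdt_2d(P):
    # Naive Saito-Toriwaki style squared distance transform.
    # P is a 2-D binary image (1 = object, 0 = background); B is assumed
    # non-empty. Returns the squared EDT image S = {sij}.
    n, m = len(P), len(P[0])
    INF = float("inf")
    # Step 1: g[i][j] = min over background pixels (x, j) of (i - x)^2.
    g = [[INF] * m for _ in range(n)]
    for j in range(m):
        bg_rows = [x for x in range(n) if P[x][j] == 0]
        for i in range(n):
            if bg_rows:
                g[i][j] = min((i - x) ** 2 for x in bg_rows)
    # Step 2: s[i][j] = min over y of g[i][y] + (j - y)^2.
    s = [[min(g[i][y] + (j - y) ** 2 for y in range(m)) for j in range(m)]
         for i in range(n)]
    return s
```

This naive version runs in O(n³) for an n × n image; the point of the discussion that follows is to replace the inner minimizations by linear-time passes.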
This algorithm provides a direct implementation of a d-dimensional EDT algorithm: we only have to compute a one-dimensional SDT for the initialization step (step 1 of the previous algorithm) and then add, for each higher dimension,
Fig. 1. Illustration of Hirata's optimization: let [1, 4, 4, 9, 4] be a column of G after step 1, (left) the set of parabolas (j − y)² + g_iy and (right) the bold curve is the lower envelope. Thus the result of the minimization process is [1, 2, 4, 5, 4].
a mix process (step 2) that merges results of inferior dimensions. From a computational cost point of view, given a d-dimensional binary shape of size n^d, the first step can be done in time linear in the number of grid points, i.e. O(n^d). In [17], the authors present an O(Avg·n^d) algorithm that computes each mix step, where Avg denotes the average of the Euclidean distance values in the image (Avg = O(n) without any assumption on the input image). In [7] and [9], Hirata and Roerdink et al. independently present optimal algorithms to solve the min operation and thus propose an optimal in time algorithm for the EDT. The idea is to see the min operation as a lower envelope computation of a set of parabolas. More precisely, let us suppose we have computed step 1 of the algorithm (the x-axis SDT) and let {g_iy} (1 ≤ y ≤ n) be a column of G. If we consider the set of parabolas f_y^i(j) = (j − y)² + g_iy, the column {s_iy} after step 2 is exactly the lower envelope of the f_y^i with 1 ≤ y ≤ n (see figure 1). In [7] and [9], the authors present an O(n) algorithm to compute such a lower envelope using a parabola elimination process. Finally, for a d-dimensional image, the dimensional mix processes are computed in O(n^d) and thus the global cost to compute the EDT based on this approach is O(n^d).
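The lower-envelope example of figure 1 can be checked by brute force (an illustrative sketch; the linear-time elimination of [7, 9] computes the same values without the inner loop):

```python
def column_mix(g):
    """Brute-force y-axis mix of eq. (4) on one column of G:
    s[j] = min over y of g[y] + (j - y)^2 (0-based indices)."""
    n = len(g)
    return [min(g[y] + (j - y) ** 2 for y in range(n)) for j in range(n)]

print(column_mix([1, 4, 4, 9, 4]))  # the column of Fig. 1 -> [1, 2, 4, 5, 4]
```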
3 Reverse Euclidean Distance Transformation

3.1 Definitions
In [18], Saito and Toriwaki present both definitions and algorithms to compute the d-dimensional REDT. Using their notations and dimension 2, let us consider L as a set of l points {(x_m, y_m)}_{1≤m≤l} and f_{x_m y_m} the squared Euclidean distance value associated to the pixel (x_m, y_m). The reverse Euclidean distance transformation of L consists in obtaining the set of points P such that

P = {(i, j) | ∃m, (i − x_m)² + (j − y_m)² < f_{x_m y_m}, (x_m, y_m) ∈ L} .   (5)
In other words, a point (i, j) belongs to P if it belongs to at least one disk whose center is a point m of L, with radius √f_{x_m y_m}. Let F = {f_ij} be a picture of size n × n such that f_ij is set to f_{x_m y_m} if (i, j) belongs to L and 0 otherwise. The authors show that equation (5) is equivalent to
David Coeurjolly
P = {(i, j) | max_{(x,y)∈F} {f_xy − (i − x)² − (j − y)²} > 0} .   (6)
Hence, if we compute the map H = {h_ij} such that

h_ij = max_{(x,y)∈F} {f_xy − (i − x)² − (j − y)²} ,   (7)
we obtain P by extracting from H all pixels with positive values. So, to build H from F, we can decompose the computation into two one-dimensional processes:
1. Build from the image F the picture G = {g_ij} such that

   g_ij = max_{1≤x≤n} {f_xj − (i − x)²} .   (8)
2. Build from G the picture H such that

   h_ij = max_{1≤y≤n} {g_iy − (j − y)²} .   (9)
To prove this decomposition, we can substitute equation (8) into equation (9) and we obtain equation (7). Finally, we can design algorithms to solve the REDT similar to those proposed for the EDT labelling; we only have to replace the minimization steps by maximization steps, and thus to compute the upper envelope of parabolas (see next section). Note that this process extends easily to d-dimensional images: we just have to compute d one-dimensional maximization steps. In [18], Saito and Toriwaki use their algorithm presented in [17] to compute the REDT and they obtain a computational cost in O(Avg·n^d) for a d-dimensional image. In the next section, we detail an O(n^d) algorithm to compute the REDT.

3.2 Optimal REDT Algorithm
The basic idea of the optimal REDT algorithm is to use Hirata's parabola elimination process to compute the maximization steps. We detail the optimization of step 1 of the previous algorithm; all other steps can be easily deduced. First of all, for a given column j of F, we define a function describing a parabola:

F_x(i) = f_xj − (i − x)²   (10)

and a function that computes the abscissa of the intersection between two parabolas. Thus, we have to find the point i such that F_u(i) ≥ F_v(i) with u < v. Setting F_u(i) = F_v(i) and solving for i, the "separation" between the parabolas is given by

Sep(u, v) = (u² − v² − f_uj + f_vj) / (2(u − v)) .   (11)

Based on these elementary functions, the algorithm presented in figure 2 computes the upper envelope of the parabolas {F_x}. This algorithm is derived from Meijster et al.'s [9] (similar to Hirata's). The idea is to manipulate
1:  for all j ∈ [0..n − 1] do
2:    q := 0; s[0] := 0; t[0] := 0;
3:    for u := 1 to n − 1 do
4:      while (q ≥ 0) and (F_{s[q]}(t[q]) < F_u(t[q])) do
5:        q := q − 1;
6:      end while
7:      if q < 0 then
8:        q := 0; s[0] := u;
9:      else
10:       w := 1 + Sep(s[q], u);
11:       if w < n then
12:         q := q + 1; s[q] := u; t[q] := w;
13:       end if
14:     end if
15:   end for
16:   for u := n − 1 downto 0 do
17:     g_{uj} := F_{s[q]}(u);
18:     if u = t[q] then
19:       q := q − 1;
20:     end if
21:   end for
22: end for
Fig. 2. Pseudo-code of the optimal one-dimensional upper envelope parabola computation (left); an illustration of the notations and an example in one dimension (right).
two arrays s and t that simulate a parabola stack. The array s contains the set of parabola apexes (tops of parabolas) of the upper envelope and t the intersection abscissas between consecutive parabolas in s. In lines 3−15 we compute the upper envelope and the arrays s and t; in lines 16−21, we construct the map G using s and t. The computational cost of this upper envelope extraction is O(n) if n is the size of a row in F. Finally, we can use this algorithm to compute step 2 and construct P, by thresholding H, in O(n²) if F is an n × n image. More generally, if we apply it to all one-dimensional maximization steps, we obtain a global complexity in O(n^d) for a d-dimensional image, which is optimal in time.
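The stack-based pseudo-code of figure 2 can be sketched in Python as follows (a transcription under stated assumptions: 0-based indexing, one column at a time, and Sep algebraically rearranged so that the floor division's denominator is positive):

```python
def upper_envelope(f):
    """One-dimensional maximization step of the REDT: given a row f of
    squared values, returns g[i] = max over x of f[x] - (i - x)^2 in O(n),
    following the stack-based pseudo-code of figure 2."""
    n = len(f)

    def F(x, i):  # parabola of apex x evaluated at i
        return f[x] - (i - x) ** 2

    def sep(u, v):  # with u < v: floor of the abscissa where F_v overtakes F_u
        return (v * v - u * u + f[u] - f[v]) // (2 * (v - u))

    s = [0] * n  # apexes of the parabolas on the upper envelope
    t = [0] * n  # t[q]: first index where parabola s[q] dominates s[q-1]
    q = 0
    for u in range(1, n):
        # pop parabolas hidden by the new one at their domination point
        while q >= 0 and F(s[q], t[q]) < F(u, t[q]):
            q -= 1
        if q < 0:
            q = 0
            s[0] = u
        else:
            w = 1 + sep(s[q], u)
            if w < n:
                q += 1
                s[q], t[q] = u, w
    # backward sweep: read the envelope off the stack
    g = [0] * n
    for i in range(n - 1, -1, -1):
        g[i] = F(s[q], i)
        if i == t[q]:
            q -= 1
    return g
```

For instance, `upper_envelope([0, 9, 0, 0, 1])` agrees with the brute-force maximum `max(f[x] - (i - x)**2 for x in range(n))` at every index.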
4 Euclidean Medial Axis Extraction

4.1 Definitions
First of all, we introduce some definitions. Definition 1 (Maximal ball). A maximal ball is a ball contained in the shape not exactly covered by another ball contained in the shape. Based on this property, we can define the medial axis: Definition 2 (Medial axis). The medial axis (MA for short) of a shape is the set of maximal ball centers contained in the shape. In [18], Saito and Toriwaki define a geometrical Euclidean skeleton based on elliptic paraboloids in dimension 2. Such an elliptic paraboloid of center (i, j) and height qij is given by the following equation
0 ≤ z < q_ij − (x − i)² − (y − j)² .   (12)
The intersection between such a domain and the plane z = 0 is a disk of center (i, j) and radius √q_ij. We say that an elliptic paraboloid is contained in a shape S if the disk of center (i, j) and radius √q_ij is contained in S. In the following, we prove that Saito and Toriwaki's skeleton is a subset of the medial axis of a shape.

Definition 3 (Maximal elliptic paraboloid). A maximal elliptic paraboloid is an elliptic paraboloid contained in the shape not exactly covered by another elliptic paraboloid contained in the shape.

Note that this object can be generalized to d-dimensional shapes.

Proposition 1. Let (i, j) be a point in a continuous shape and q_ij be a number. The disk D of center (i, j) and radius √q_ij is maximal if and only if the elliptic paraboloid P of center (i, j) and height q_ij is maximal.

Proof. Note that D is the intersection between P and z = 0. We first prove the left to right implication. If we suppose that P is not maximal, there exists another elliptic paraboloid P′ that contains P. Thus the intersection D′ between P′ and the plane z = 0 contains the intersection D between P and the same plane. Hence, there exists a disk D′ contained in the shape that contains D, and so D is not maximal. Conversely, we suppose that D is not maximal. Hence, there exists a disk D′ that contains D. We denote by P′ the elliptic paraboloid, uniquely defined, such that D′ is the intersection between P′ and z = 0. If we suppose that P′ does not contain P, there exists a point p ∈ P such that p ∉ P′. Let us consider the intersections between P and P′ with a plane H perpendicular to z = 0 that contains p and the center of P. In the plane H and using the elliptic paraboloid definition, P (resp. P′) leads to the domain

0 ≤ z < f_u − (x − u)²   (resp.  0 ≤ z < f_v − (x − v)²) ,   (13)
with u, v, f_u, f_v ∈ R (see figure 3-(b)). Since H contains the center of P and p, these domains are not empty. Using the notations of figure 3-(b), D′ contains D implies that both m and n belong to D′. Furthermore, since p does not belong to P′, the two parabolas given by equation (13) must have two intersection points a and b. However, using equation (13), such parabolas have only one intersection point if u ≠ v. Since the upper parts of the parabolas are excluded, u = v implies that the intersection is empty. Hence such a point p does not exist and thus P′ contains P, which finally proves that P is not maximal. Note that this proof can be generalized to other dimensions because we transform the problem into a one-dimensional parabola intersection. Hence, in the continuous plane, maximal balls and maximal elliptic paraboloids coincide. In [18], the authors use the term "skeleton" to describe a geometric
Fig. 3. Skeleton definitions: (a) comparison between maximal balls and maximal elliptic paraboloids; (b) notations for the proof of Proposition 1 (left two parabolas) and differences between the medial axis and Sk (right three parabolas).
object not based on topological feature preservation. Let Q = {q_ij} be a SDT of the shape. The skeleton Sk is defined by

Sk = {(i, j) | ∃(x, y), (i − x)² + (j − y)² < q_ij and max_{(u,v)} {q_uv − (x − u)² − (y − v)²} = q_ij − (x − i)² − (y − j)²} .
In other words, Sk is the set of elliptic paraboloids that belong to the upper envelope (in dimension 2) of all elliptic paraboloids whose heights are given by the squared distance transform. Using Proposition 1, we have the corollary:

Corollary 1. In the continuous plane, Sk is a subset of the medial axis. Furthermore, the original figure can be reconstructed from Sk.

Proof. First of all, all elliptic paraboloids that belong to the upper envelope are maximal by definition of such an envelope. Since maximal elliptic paraboloids and maximal balls coincide, points in Sk belong to the medial axis. Some maximal elliptic paraboloids may not belong to Sk, as illustrated in figure 3-(b) in the 1-D case: the parabolas {A, B, C} belong to the medial axis whereas only the parabolas A and C belong to Sk (B is covered by the union of A and C). To prove the second statement, we remark that the definition of Sk strictly coincides with the reverse distance transformation equations of section 3. Once Sk is computed, if we threshold the height values of the upper envelope elliptic paraboloids by 0, we obtain the original shape [18].

If we consider a binary shape in dimension d, Saito and Toriwaki [18] use the O(Avg·n^d) REDT process to extract the skeleton Sk. The idea is to mark upper envelope elliptic paraboloids. If we use the optimal REDT algorithm proposed in the previous section, we obtain an algorithm to compute Sk in O(n^d), which is optimal for the problem.

4.2 Reduced Medial Axis Extraction
In the previous section, we prove that the skeleton Sk is a subset of the medial axis in the continuous case. As illustrated in dimension 2 in figure 4-(left), this
property does not hold in the discrete case. In the following we present a filtering process to transform Sk points into maximal ball centers in the discrete case. In the 2-D case, let us consider a binary shape and its skeleton Sk. We denote by {F_x(i)}_{i=0..N} the sequence of parabolas given by the intersection between the Sk elliptic paraboloids and the column j of the image. Hence, each parabola is such that F_x(i) = f_xj − (i − x)². In this one-dimensional case, the differences between Sk and the discrete medial axis are illustrated in figure 4-(left): {D, E} belong to Sk whereas only D belongs to the medial axis. We denote by D_x the disk associated to F_x(i) (i.e. a segment in the one-dimensional case). Furthermore, we consider the discrete disk D̂_x associated to D_x as the set of discrete points contained in D_x. To keep only discrete maximal disks in Sk, we have to remove all points x such that D̂_x is not maximal. Given two parabolas of centers x and x′, we have a simple test, denoted Incl(x, x′), to decide if D̂_x contains D̂_x′ (we just compare segment extremities). Let us denote by [l_y, r_y] the interval given by a disk D̂_y. We consider the list L of parabolas sorted according to the left extremity of the segments. If some parabolas have the same left extremity coordinate, we sort such parabolas according to the right extremity position (see figure 4-(right)). If n denotes the size of the column j in the image, the list L can be computed in O(n) (we store the extremities in two arrays of size n during the scan of the parabolas). If two segments are identical, we remove one of them and label the other one with a flag "double" (see definition 4). Using L, we have a simple algorithm, presented in figure 4-(right), to remove from the set {F_x(i)} all points that do not belong to the discrete medial axis. In this algorithm, we scan the parabolas according to the L order and we test the inclusion of two consecutive parabolas in a greedy process.
Hence, the computational cost of this filtering algorithm is O(n). The resulting set of parabolas is stored in the array s and the correctness of this algorithm is given by the proposition:

Proposition 2. The associated disk of a parabola is maximal if and only if the parabola belongs to s at the end of the filtering process.

Proof. First of all, if the list L is reduced to one parabola, the associated disk is maximal and it belongs to s. We prove the proposition by induction. Let us consider the step k (k ≥ 1) of the algorithm of figure 4-(right). We suppose that, at this point, s contains the maximal disks of the parabolas in {L(i)}_{0≤i≤k} and we consider the disk [u, v] of the parabola L(k + 1). Note that the order of parabolas in s is the same as the order of parabolas in L. We denote by [m, n] the segment associated to s[q] (the last inserted parabola in s). If the test Incl(s[q], L(k + 1)) is true, the segment [m, n] contains the segment [u, v], so L(k + 1) is not maximal and this parabola is not inserted in s. If we suppose that the inclusion test fails, L(k + 1) is inserted in s. First of all, the segment [u, v] cannot contain a segment in s. Indeed, by definition of L, if a parabola x is before the parabola x′ in L, then the segment associated to x′ cannot contain the segment associated to x. Hence L(k + 1) does not change the maximal property of the segments in s. To complete the proof, we show that if the test fails, no segment in s contains the segment [u, v]. Let us consider a segment [a, b] in s such that [a, b] contains
[u, v] and such that the segment [a, b] is not associated to s[q]. So, we have b ≥ v and a ≤ u. If the inclusion test fails between [u, v] and [m, n], then v > n (we have u ≥ m by construction of L). Hence, we have b > n. This leads to the contradiction that [a, b] contains [m, n], because the segments in s are supposed to be maximal. Finally, it is sufficient to consider the inclusion test between L(k + 1) and the last inserted parabola in s to construct the set of maximal disks from the set {L(i)}_{0≤i≤k+1}.

In higher dimensions, we apply this filtering process in each dimension and define the reduced medial axis as follows:

Definition 4 (Reduced Medial Axis). Let P be a binary shape in dimension d and Q the SDT of P. We consider Sk, Saito and Toriwaki's skeleton of P. The reduced medial axis (RMA for short) is the set of points (i, j) such that there exists at least one row in one of the d dimensions in which the parabola associated to (i, j, q_ij) is preserved and not labeled "double" during the one-dimensional filtering process.

Theorem 1. Let P be a binary shape in dimension d. The RMA is a subset of the discrete medial axis of the shape, it has the reversibility property, and the RMA extraction is in O(n^d).

Proof. According to corollary 1, Sk is a subset of the continuous medial axis of the shape. Let us consider a discrete ball B; we prove that if B is preserved at the end of the filtering process, then B belongs to the discrete medial axis. If we suppose that B is not maximal in the discrete case, there exists another ball B′ such that B′ contains B. During the filtering process, in each dimension, the segments associated to B will either be removed or labeled "double" because they are contained in B′ segments. Hence, the ball B will be removed from Sk. Finally, all resulting balls are maximal in the discrete plane.
Furthermore, since the parabola removal process between two parabolas maintains the reversibility property, the final result allows us to reconstruct the shape. Concerning the computational cost, the Sk computation is done in O(n^d) and, for each row in each dimension, the cost of the one-dimensional filtering process is linear in the number of parabolas in the row. Hence the global cost of the filtering is linear in the number of points in P, which is optimal for the problem.

4.3 Results and Generalizations
In this section we present some results of both the REDT and skeleton extraction algorithms in dimensions 2 and 3. Figure 5 presents results on several 2-D and 3-D shapes. Note that the EDT of the images is computed using Hirata's algorithm [7]. Due to the one-dimensional decomposition of the REDT and skeleton extraction algorithms, several generalizations that have been proposed for EDT algorithms carry over. For example, the same algorithms can be used for d-dimensional elongated grids (different scale factors between axes); we just have to insert those scale factors in the process without changing the algorithms [18].
q := 0; s[0] := 0;
for k := 1 to N do
  if not Incl(s[q], L(k)) then
    q := q + 1; s[q] := L(k);
  end if
end for
Fig. 4. An illustration of the difference between the skeleton Sk and the discrete medial axis (left): dashed segments indicate the Euclidean balls {D_x} and plain segments the discrete balls {D̂_x}. Right: pseudo-code for the filtering process of Sk points and an illustration of the algorithm (arrows indicate the order in L or in s).
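The one-dimensional filtering of figure 4 can be sketched in Python over (left, right) interval pairs. This is an illustrative sketch, not the paper's implementation: segments are sorted by left endpoint with the longer one first on ties, so a later segment can never strictly contain an earlier one, and exact duplicates simply collapse (playing the role of the "double" flag of definition 4):

```python
def filter_maximal(segments):
    """Keep only the maximal segments of a row, scanning in the L order
    and testing inclusion against the last kept segment (Incl)."""
    L = sorted(set(segments), key=lambda seg: (seg[0], -seg[1]))
    s = []
    for (l, r) in L:
        # Incl(s[q], L(k)): is the candidate contained in the last kept one?
        if s and s[-1][0] <= l and r <= s[-1][1]:
            continue  # not maximal, drop it
        s.append((l, r))
    return s
```

As in Proposition 2, one comparison against the last kept segment suffices, so the scan is linear once L is sorted.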
Fig. 5. Results of skeleton extraction on 2-D and 3-D images: first row the input binary shapes, second row the Sk skeleton extraction and last row RMA extraction results.
5 Conclusion
In this article, we have first optimized the REDT computation algorithm and obtained a computational cost in O(n^d) for a d-dimensional image, which is optimal in time (n^d is the total number of grid points). Then, we have presented a d-dimensional reversible RMA extraction algorithm in O(n^d). We have shown that the proposed RMA is a subset of the classical medial axis of the shape. In future work, we expect further optimizations of the RMA extraction process to further reduce the number of points. The final goal of this optimization would be to compute the optimal reversible skeleton of a shape (in the sense of having a minimal number of points; see [3, 11] for related papers). Furthermore, we would like to illustrate the d-dimensional algorithms on real data in higher dimensions.
References

1. H. Blum. A transformation for extracting descriptors of shape. In Models for the Perception of Speech and Visual Forms, pages 362–380. MIT Press, 1967.
2. G. Borgefors. Distance transformations in digital images. Computer Vision, Graphics, and Image Processing, 34(3):344–371, June 1986.
3. G. Borgefors and I. Nyström. Efficient shape representation by minimizing the set of centers of maximal discs/spheres. Pattern Recognition Letters, 18:465–472, 1997.
4. D. Coeurjolly. Algorithmique et géométrie discrète pour la caractérisation des courbes et des surfaces. PhD thesis, Université Lumière Lyon 2, Bron, Laboratoire ERIC, Dec. 2002.
5. P. E. Danielsson. Euclidean distance mapping. Computer Graphics and Image Processing, 14:227–248, 1980.
6. G. Sanniti di Baja. Well-shaped, stable, and reversible skeletons from the (3,4)-distance transform. Journal of Visual Communication and Image Representation, 5:107–115, 1994.
7. T. Hirata. A unified linear-time algorithm for computing distance maps. Information Processing Letters, 58(3):129–133, May 1996.
8. C. R. Maurer Jr., V. Raghavan, and R. Qi. A linear time algorithm for computing the Euclidean distance transform in arbitrary dimensions. In Information Processing in Medical Imaging, pages 358–364, 2001.
9. A. Meijster, J. B. T. M. Roerdink, and W. H. Hesselink. A general algorithm for computing distance transforms in linear time. In Mathematical Morphology and its Applications to Image and Signal Processing, pages 331–340. Kluwer, 2000.
10. U. Montanari. Continuous skeletons from digitized images. Journal of the Association for Computing Machinery, 16(4):534–549, Oct. 1969.
11. F. Nilsson and P.-E. Danielsson. Finding the minimal set of maximum disks for binary objects. Graphical Models and Image Processing, 59(1):55–60, January 1997.
12. I. Ragnemalm. Contour processing distance transforms, pages 204–211. World Scientific, 1990.
13. I. Ragnemalm. The Euclidean Distance Transform. PhD thesis, Linköping University, Linköping, Sweden, 1993.
14. E. Remy and E. Thiel. Optimizing 3D chamfer masks with norm constraints. In Int. Workshop on Combinatorial Image Analysis, pages 39–56, Caen, July 2000.
15. A. Rosenfeld and J. L. Pfaltz. Sequential operations in digital picture processing. Journal of the ACM, 13(4):471–494, October 1966.
16. A. Rosenfeld and J. L. Pfaltz. Distance functions on digital pictures. Pattern Recognition, 1:33–61, 1968.
17. T. Saito and J. I. Toriwaki. New algorithms for Euclidean distance transformations of an n-dimensional digitized picture with applications. Pattern Recognition, 27:1551–1565, 1994.
18. T. Saito and J.-I. Toriwaki. Reverse distance transformation and skeletons based upon the Euclidean metric for n-dimensional digital pictures. IEICE Trans. Inf. & Syst., E77-D(9):1005–1016, Sept. 1994.
19. E. Thiel. Géométrie des distances de chanfrein. Habilitation à Diriger des Recherches, Université de la Méditerranée, Aix-Marseille 2, Déc. 2001.
Efficient Computation of 3D Skeletons by Extreme Vertex Encoding

Jorge Rodríguez¹, Federico Thomas², Dolors Ayala¹, and Lluís Ros²

¹ Technical University of Catalonia (UPC), Computer Science Department (LSI), Diagonal 647, 8 planta, 08028 Barcelona, Spain
{jrodri,dolorsa}@lsi.upc.es
² Industrial Robotics Institute (CSIC-UPC), Llorens Artigas 4-6, 2 planta, 08028 Barcelona, Spain
{fthomas,llros}@iri.upc.es
Abstract. Many skeletonisation algorithms for discrete volumes have been proposed. Despite its simplicity, the one given here still has many theoretically favorable properties. Actually, it provides a connected surface skeleton that allows shapes to be reconstructed with bounded error. It is based on the application of directional erosions, while retaining those voxels that introduce disconnections. This strategy proves to be especially well-suited for extreme vertex encoded volumes, leading to a fast thinning algorithm.
Keywords: 3D surface skeleton, mathematical morphology, extreme vertex encoding.
1 Introduction
The computation of three-dimensional skeletons is a fundamental tool for an increasing number of applications related to shape matching and tracking, virtual navigation, shape abstraction, animation control, growth modelling, analysis of symmetries, path planning, feature recognition, etc. Actually, the skeleton of a solid has been proposed as an alternative to its boundary or constructive solid model because it provides a complete representation [6]. The word skeleton is usually understood in 2D to mean the medial axis of a given shape. The medial surface of a 3D solid is defined similarly to its 2D counterpart: it is the set of the centers of all inscribed spheres of maximal radius. The computation of the medial surface for arbitrary solids is a complex problem. Here we concentrate on the computation of skeletons of discrete solids, i.e. objects described in terms of sets of voxels. When working in a discrete space, spheres must necessarily be approximated and, as a consequence, the concept of skeleton should be redefined. Then, 3D discrete skeletons are approximations of the medial surface. In any case, the three common required properties for any of these approximations are [15]: (a) it should have no interior (thinness); (b) it must be homotopic to the object it corresponds to, i.e. it must preserve
I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 338–347, 2003. © Springer-Verlag Berlin Heidelberg 2003
26-neighbor connectedness for the foreground and 6-neighbor connectedness for the background (connectedness); and (c) it must retain sufficient information to reconstruct the original object (reconstructability). Other interesting criteria for some applications have been introduced recently [17], but it is important to realize that in the discrete space all these requirements become mutually incompatible [7] and, as a consequence, practical skeletonisation methods are invariably a compromise between them. Although other alternatives are possible [17], two main methods to obtain skeletons in discrete spaces have been proposed: methods based on thinning and methods based on distance transforms. Thinning methods produce skeletons by iteratively deleting voxels from the boundary of the solid. Voxels can be deleted by either a sequential or a parallel algorithm. Each iteration of a sequential algorithm consists, in general, of three steps: (1) identify all border voxels and label them with the iteration number; (2) inspect all voxels labelled with the current iteration number, and mark those that cannot be removed in order to preserve the shape of the original solid; and (3) remove all unmarked border voxels. An example of an algorithm using this technique can be found in [12]. The methods based on distance transforms first convert the volume, which consists of object (foreground) and non-object (background) voxels, into an object where every object voxel has the value corresponding to the minimum distance to the background. Different types of metrics for discrete solids are used, aiming at approximating the Euclidean distance so that rotation invariance is attained to some extent. Then, the ridges of the induced scalar field constitute the skeleton. In general, these algorithms are not iterative, so the skeleton is produced in a fixed number of passes through the solid. An example of an algorithm using this technique can be found in [11].
The procedure proposed here can be outlined as follows. Those voxels whose deletion by a directional erosion might destroy the connectedness are retained and classified as gaps, then the region is eroded and the corresponding residuals computed. Gaps and residuals are retained in the solid, and this process is repeated until no progress is made. This approach satisfies the requirements given above in the following order of priority: connectedness, thinness and reconstructability. The preservation of the connectivity is the essential condition for the skeleton in order to extract the shape of the original solid. In our case, shapes can be nearly reconstructed with an error bounded to one voxel. This strategy was first introduced in [7] for 2D objects. A similar approach has recently been presented in [12], but using the classical template-based thinning approach. In our case, we avoid this explicit use of templates by reducing all the steps of the presented algorithm to boolean operations between solids. This paper is structured as follows. In order to make it as self-contained as possible, the required morphological operations, as well as the concepts of residuals and gaps associated with directional erosions, are introduced in Section 2. The skeletonisation algorithm is presented in Section 3. The adopted spatial encoding and how boolean operations can be performed between encoded objects is described in Section 4. The results, showing how this encoding improves the performance of the algorithm are detailed in Section 5.
Fig. 1. Numbering assignment to the 26 neighbors of a voxel.
2 Background
Let Z³ be the discrete space. Let X ⊂ Z³ be a 3D solid. Let X̄ = Z³\X denote the background of X. The connectivity used herein is (26,6)-connectivity, which means 26-connectivity for the solid and 6-connectivity for the background. Each of the 26 neighbors of a voxel in the solid defines a vector which will be numbered as shown in figure 1. The erosion of X using the structuring element B is defined as X ⊖ B = {y | ∀b ∈ B, y + b ∈ X}, its dilation using the same structuring element as X ⊕ B = {y | y = x + b, x ∈ X, b ∈ B}, and its opening as X ∘ B = (X ⊖ B) ⊕ B. The residual, X⊥B, is the set made of those voxels in X which do not belong to its opening using the structuring element B, that is, X⊥B = X\(X ∘ B). If X_b denotes the translation of X in the direction associated with b ∈ B, then it can be shown that X ⊖ B = ∩_{b∈B} X_{−b}. In other words, this erosion can be accomplished by taking the intersection of all the translates of X, where the shifts in the translates are the negated members of B, seen as vectors. An especially interesting case for B is that in which B consists of two voxels, one of them centered at the origin. Then, the erosion of X using B can be computed simply by X ⊖ B = X ∩ X_{−b}, and its opening by X ∘ B = (X ∩ X_{−b}) ∪ (X ∩ X_{−b})_b. Since (B1 ⊖ B2) ⊖ B3 = B1 ⊖ (B2 ⊕ B3), then, if B = B1 ⊕ B2 ⊕ ··· ⊕ Bk, one concludes that X ⊖ B = (...((X ⊖ B1) ⊖ B2) ··· ⊖ Bk). Thus, if a structuring element can be broken down into a chain of dilations of smaller substructuring elements, the desired operation may be performed as a sequence of suboperations. As a first approximation, a skeleton can be defined as the set of all the residuals of the successive erosions of X, using the following simple algorithm:

algorithm T1;
input: X; output: S;
S ← ∅;
while X ≠ ∅
  E ← X ⊖ B;
  S ← S ∪ (X\(E ⊕ B));
  X ← E;
endwhile;
end.
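The set-theoretic definitions above map directly onto Python sets of voxel triples. This is an illustrative sketch of the translation-based formulation (not the extreme vertex encoding used later in the paper):

```python
def translate(X, b):
    """Translate the voxel set X by the vector b (written X_b in the text)."""
    return {(p[0] + b[0], p[1] + b[1], p[2] + b[2]) for p in X}

def erode(X, B):
    """Erosion X (-) B as the intersection of the translates X_{-b}, b in B."""
    out = None
    for b in B:
        t = translate(X, (-b[0], -b[1], -b[2]))
        out = t if out is None else out & t
    return out

def dilate(X, B):
    """Dilation X (+) B as the union of the translates X_b, b in B."""
    out = set()
    for b in B:
        out |= translate(X, b)
    return out

def opening(X, B):
    """Opening X o B = (X (-) B) (+) B."""
    return dilate(erode(X, B), B)

def residual(X, B):
    """Residual X ⊥ B = X \ (X o B): voxels lost by the opening."""
    return X - opening(X, B)
```

For a two-voxel element B = {origin, b}, the erosion reduces to the single intersection X ∩ X_{−b} mentioned above, which is exactly what the loop computes.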
Now, let us assume that B is a centered 3×3×3 cubic structuring element which can be broken down into a chain of 6 dilations of two-voxel elements in the directions 0, 2, 4, 6, 8, and 10. Then, the above algorithm can be rewritten as follows:

algorithm T2;
input: X; output: S;
S ← ∅;
while X ≠ ∅
  E ← X ∩ X0 ∩ X2 ∩ X4 ∩ X6 ∩ X8 ∩ X10;
  S ← S ∪ (X\(E ∪ E4 ∪ E2 ∪ E8 ∪ E0 ∪ E6 ∪ E10));
  X ← E;
endwhile;
end.
The main advantage of this algorithm over T1 is that it only involves directional erosions and dilations along the coordinate axes. Although the skeleton that it obtains allows the initial shape to be entirely reconstructed by simply dilating each voxel of the result (according to the iteration in which it was obtained), it is neither thin nor connectivity preserving. The first drawback can be easily overcome as follows:

algorithm T3;
input: X; output: S;
S ← ∅; A ← ∅;
while X ≠ ∅
  /* directional erosion along y+ */
  E ← X ∩ X0;
  A ← A ∪ (X\(E ∪ E4));
  X ← E;
  /* repeat for directions z+, x+, y-, z-, and x- */
  ...
  S ← S ∪ A;
endwhile;
end.
Now, since the residuals are independently thin (they are obtained from single directional erosions), the obtained skeleton is thin. On the other hand, the original shape can only be nearly reconstructed but, as has already been pointed out, thinness and reconstructability are mutually incompatible goals. Shapes can thus be nearly reconstructed with an error bounded, in our case, to one voxel. In order to overcome the remaining drawback (connectivity), we first introduce the concept of directional gaps. Those voxels required to ensure connectivity in the final skeleton, and not included in the medial surface computed by algorithm T3, will be part of a set of disjoint regions that we call gaps. Contrary to what one might expect, when considering only directional erosions, gaps can
Jorge Rodríguez et al.
be easily computed. For example, the directional gap of a binary region X in direction y+ can be obtained by computing:

(X\X0) ∩ ((X7\X6) ∪ (X1\X2) ∪ (X20\X16) ∪ (X22\X17) ∪ (X21\X14) ∪ (X23\X15) ∪ (X12\X10) ∪ (X9\X8)) .    (1)
This boolean formula detects the voxel configurations shown in figure 2 and all their π/2-rotated versions around the y axis. Grey cubes correspond to points in the foreground, transparent ones to points in the background. Gaps along the other coordinate axes, either in positive or negative directions, can be obtained analogously. The above expression is obtained as a generalization of the two-dimensional case. It is worth noting that the concept of gaps, first introduced in [7], is closely related to the set of β templates presented in [12].

Fig. 2. Gaps.
3 The Thinning Algorithm
The motivation behind our thinning algorithm is as follows. First, those voxels whose deletion by a directional erosion might destroy connectedness are retained and classified as gaps; then the region is effectively eroded and the corresponding residual computed. Gaps and residuals are removed from the solid in order to concentrate the thinning effort on the thick region. Iterations continue until the solid becomes empty. The following algorithm in pseudo-code implements this procedure.

algorithm T4;
input: X; output: S; /* skeleton of X */
S ← ∅; L ← ∅;
do
  /* erosion along y+ */
  I ← ∅; /* increment of skeleton */
  G ← (X\X0) ∩ [(X7\X6) ∪ (X1\X2) ∪ ··· ∪ (X9\X8)]; /* gap */
  E ← X ∩ X0; /* eroded solid */
  R ← X\(E ∪ E4); /* residual */
  I ← I ∪ R ∪ G;
  X ← E ∪ I;
  /* repeat for directions z+, x+, y-, z-, and x- */
  ...
  X ← X\L;
  S ← S ∪ L;
  L ← I\L;
until (X == ∅);
end.
Fig. 3. Result of the application of T3 (b), and T4 (c) on the solid in (a). The input solid is shown in wireframe.
Note that the number of iterations is exactly half the 1-1-1 maximum thickness of the volume. This is perhaps the simplest thinning algorithm directly expressed in terms of boolean operations that provides a connected, single-voxel-in-width, well-centered homotopic skeleton. Figure 3 shows a solid and the skeletons obtained using algorithms T3 and T4. The thinning action of T3 causes clear topological changes; the introduction of gaps by T4 fixes the problem. Moreover, it avoids the explicit consideration of a huge number of 3D templates characterizing deletable voxels. Finally, it is worth highlighting three properties of the proposed algorithm. First, each voxel of the skeleton can be labelled with its 1-1-1 distance to the nearest volume border by storing the iteration in which it was obtained. Second, the iterations continue until the volume becomes empty, which is faster than detecting idempotence. Third, the thinning effort is concentrated on the thick part of the volume; in other words, once a thin part is obtained, it is removed from the image and hence no longer considered. This is of interest when processing spatially-encoded objects, as described in the next section.
4 Extreme Vertex Encoding
Any subset of Z³ is geometrically analogous to an orthogonal pseudo-polyhedron (OPP), that is, a polyhedron with all its faces oriented according to the three orthogonal coordinate axes and with a non-manifold boundary [9]. Figure 4(a) shows an OPP with seven non-manifold vertices and three non-manifold edges. Let P be an OPP and Πc a plane whose normal is parallel, without loss of generality, to the X axis, intersecting it at x = c, where c ranges from −∞ to ∞. This plane sweeps the whole space as c varies within its range, intersecting P at some intervals. Let us assume that this intersection changes at c = c1, ..., cn. More formally, P ∩ Πci−δ ≠ P ∩ Πci+δ, i = 1, ..., n, where δ is an arbitrarily small quantity. Then, Ci(P) = P ∩ Πci is called a cut of P and
Fig. 4. (a) An OPP with three non-manifold edges and seven non-manifold vertices. (b) A brink from vertex A to vertex E where cuts and sections perpendicular to the X axis are shown in dark and light grey, respectively. (c) Forward and backward differences in black and light grey, respectively.
Si(P) = P ∩ Πcs, ci < cs < ci+1, is called a section of P. Figure 4(b) shows an OPP with its cuts and sections perpendicular to the X axis. Since we work with bounded regions, S0(P) = P ∩ Π−∞ = ∅ and Sn(P) = P ∩ Π∞ = ∅, n being the total number of cuts along a given coordinate axis. The concept of cuts and sections can be extended to pseudo-polygons. Hence, we can compute the cuts and sections of Ci(P) by intersecting it with a sweeping line. Each resulting 1D cut is, in general, a set of disjoint segments. Each of these segments is called a brink. Actually, a brink is a maximal uninterrupted segment built out of a sequence of collinear edges of P. The ending vertices of a brink are called extreme vertices. Figure 4(b) shows an OPP with a brink going from vertex A to vertex E and traversing the non-extreme vertices B, C and D. The organization of the extreme vertices of an OPP in terms of all its brinks parallel to a given coordinate axis, then in terms of 1D cuts and, finally, in terms of 2D cuts is called an extreme vertex encoding [2]. This codification can obviously be done in six different ways depending on the chosen sequence of coordinate axes: XYZ, XZY, YXZ, YZX, ZXY, or ZYX. For example, in an XYZ ordering, the 2D cuts are perpendicular to the X axis and ordered from low to high x values and, for each of them, the 1D cuts are parallel to the Y axis and ordered from low to high y values. Sections can be computed from cuts and vice versa by noting that Si(P) = Si−1(P) ⊗ Ci(P), and Ci(P) = Si−1(P) ⊗ Si(P), for i = 1...n, where ⊗ denotes the regularized XOR operation. Applying the definition of ⊗ in the above expression for Ci, we get Ci(P) = Si−1(P) ⊗ Si(P) = (Si−1(P)\Si(P)) ∪ (Si(P)\Si−1(P)), for i = 1...n. As a consequence, we can decompose any cut into two terms, named forward difference and backward difference, defined as Fi(P) = Si−1(P)\Si(P) and Bi(P) = Si(P)\Si−1(P), respectively.
Figure 4(c) shows an OPP with
its sections perpendicular to the X axis, together with the corresponding forward and backward differences. It can be checked that Fi(P) and Bi(P) coincide with the sets of faces of P whose normals point forward and backward, respectively. This guarantees that the correct orientation of all faces of P can be obtained from its extreme vertex encoding. This, together with the fact that sections can be computed from cuts, allows one to prove the completeness of this codification as a boundary model [3], [4]. Moreover, its efficiency in terms of memory requirements has been shown to be clearly favorable [1] compared to that of the semiboundary representation [16], which is usually considered the most efficient block-form representation. Arbitrary boolean operations between OPPs can be carried out by recursively applying the same operation over the corresponding OPP sections, which can be reduced to XOR operations between the extreme vertex encodings of both operands [2]. The computation of gaps in algorithm T4 consists of successive boolean operations between the input object, X, and displaced versions of itself, Xd, d = 1, ..., 25. A direct implementation of these operations requires computing the extreme vertex encoding of all displaced operands; for example, computing expression (1) would require computing 17 extreme vertex encodings. Fortunately, this can be avoided: displacements of an object can be taken into account directly, by introducing the proper indices when operating with them. In practice this means that a directional erosion or dilation takes the same time as an AND or OR operation.
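The cut/section relations can be illustrated with ordinary sets, modelling the regularized XOR as a symmetric difference; the sections below are a hypothetical example, not taken from the paper's figures:

```python
def xor(a, b):
    # regularized XOR modelled as symmetric difference on point sets
    return a ^ b

# Hypothetical sections S_0 .. S_n of an OPP swept along one axis
# (S_0 and S_n are empty, since the object is bounded).
sections = [set(), {"a", "b"}, {"b", "c"}, {"c"}, set()]

# Cuts from sections: C_i = S_{i-1} (+) S_i
cuts = [xor(sections[i - 1], sections[i]) for i in range(1, len(sections))]

# Forward and backward differences decompose each cut:
#   F_i = S_{i-1} \ S_i   (faces pointing forward),
#   B_i = S_i \ S_{i-1}   (faces pointing backward).
F = [sections[i - 1] - sections[i] for i in range(1, len(sections))]
Bk = [sections[i] - sections[i - 1] for i in range(1, len(sections))]

# Sections recovered from cuts: S_i = S_{i-1} (+) C_i
recovered = [set()]
for c in cuts:
    recovered.append(xor(recovered[-1], c))
```

The round trip recovers the sections exactly, and each cut is the disjoint union of its forward and backward differences, which is the decomposition used above.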
5 Results and Conclusions
The above algorithms have been implemented in C on a Sun Ultra2-2200 workstation with one processor running at 360 MHz and 4 Mbytes of cache memory. It has been criticized that many papers on skeletonisation of solid objects give only tiny test images as examples, which makes it difficult to judge what the results would be for reasonably sized real objects [5]. In our case, T4 has been tested on 3D images obtained using a CAT device. Figure 5 shows the result for a human vertebra. Before skeletonisation, it contains 22263 voxels. The memory required to store it using its extreme vertex encoding is 26 Kbytes, instead of the 59 Kbytes required by the semiboundary codification. The total number of extreme vertices for this model is 6028, that is, 3014 brinks. The skeleton is computed in 1004 seconds if the codifications of the shifted objects are explicitly computed before operating with them. This time drops to 399 seconds when these computations are avoided. Table 1 shows the evolution of the processing time for each iteration using both approaches in separate columns. Note the reduction in the time required for each iteration thanks to the adopted incremental strategy, which concentrates the computational effort on the thick regions. The last column of Table 1 gives an idea of the complexity of these thick regions in terms of their total number of brinks before the corresponding iteration. The resulting skeleton contains 6059 voxels, encoded using 3387 brinks.
Fig. 5. Top-left: A voxel model of a human vertebra. Top-right: its skeleton, overlaid on the original voxel model (in wireframe). As a rule of thumb, darker areas correspond to places where the erosion is deeper. Bottom: zoom-in on three selected areas.

Table 1. Processing time per iteration, in seconds, with (t1) and without (t2) explicit computation of the codifications of the shifted objects, and number of brinks before each iteration.

Iteration    t1    t2   Brinks
    1       274   108    6028
    2       329   132    5814
    3       253    98    4174
    4       101    41    1798
    5        34    13     628
    6        12     5     216
    7         4     2      78
  Total    1007   399
We have presented a skeletonisation algorithm fully described using only differences, unions and intersections of possibly shifted solids. It has been shown how the efficiency of the algorithm can be greatly improved, and its memory requirements dramatically reduced, by using extreme vertex encoded solids. The enormous algorithmic difficulties that arise when working at the voxel level, due to the identification of all removable voxels using large look-up tables (as standard thinning algorithms usually do), are thus avoided. Thanks to this, the algorithm can be applied, contrary to the alternative algorithms, to other solid representations, as long as they allow the easy computation of regularized boolean operations.
References

1. J. Rodríguez, D. Ayala and A. Aguilera, "A Complete Solid Model for Surface Rendering," in Geometric Modeling for Scientific Visualization, Springer-Verlag, to appear, 2003.
2. A. Aguilera and D. Ayala, "Orthogonal Polyhedra as Geometric Bounds in Constructive Solid Geometry," ACM SM'97, pp. 56-67, 1997.
3. A. Aguilera and D. Ayala, "Domain extension for the extreme vertices model (EVM) and set-membership classification," CSG'98, Ammerdown (UK), Information Geometers Ltd., pp. 33-47, 1998.
4. A. Aguilera and D. Ayala, "Converting orthogonal polyhedra from extreme vertices model to B-Rep and to alternative sum of volumes," Computing Suppl., Springer-Verlag, No. 14, pp. 1-28, 2001.
5. G. Borgefors, I. Nyström and G. Sanniti di Baja, "Computing skeletons in three dimensions," Pattern Recognition, Vol. 32, pp. 1225-1236, 1999.
6. J.W. Brandt, "Describing a solid with the three-dimensional skeleton," SPIE Curves and Surfaces in Computer Vision and Graphics III, No. 1830, pp. 258-269, 1992.
7. R. Cardoner and F. Thomas, "Residuals + directional gaps = skeletons," Pattern Recognition Letters, Vol. 18, pp. 343-353, 1997.
8. P.P. Jonker, "Morphological operations on 3D and 4D images: from shape primitive detection to skeletonization," Proc. 9th Conf. on Discrete Geometry for Computer Imagery (DGCI 2000), LNCS 1953, pp. 371-391, Springer-Verlag, 2000.
9. L. Latecki, "3-D well-composed pictures," Graphical Models and Image Processing, Vol. 59, No. 3, pp. 164-172, 1997.
10. Ch. Lohou and G. Bertrand, "A new 3D 6-subiteration thinning algorithm based on P-simple points," Proc. 9th Conf. on Discrete Geometry for Computer Imagery (DGCI 2000), LNCS 1953, pp. 102-113, Springer-Verlag, 2000.
11. G. Malandain and S. Fernández-Vidal, "Euclidean skeletons," Image and Vision Computing, Vol. 16, pp. 317-327, 1998.
12. A. Manzanera, T.M. Bernard, F. Prêteux, and B. Longuet, "Medial faces from a concise 3D thinning algorithm," Proc. 7th IEEE Int. Conf. on Computer Vision, pp. 337-343, 1999.
13. A. Manzanera, T.M. Bernard, F. Prêteux, and B. Longuet, "Ultra-fast skeleton based on an isotropic fully parallel algorithm," Proc. 8th Conf. on Discrete Geometry for Computer Imagery (DGCI 1999), LNCS 1568, pp. 313-324, Springer-Verlag, 1999.
14. E. Remy and E. Thiel, "Medial axis for chamfer distances: computing look-up tables and neighbourhoods in 2D and 3D," Pattern Recognition Letters, Vol. 23, No. 6, pp. 649-661, 2002.
15. A. Sudhalkar, L. Gürsöz, and F. Prinz, "Box-skeletons of discrete solids," Computer-Aided Design, Vol. 28, No. 6-7, pp. 507-517, 1996.
16. J. Udupa and D. Odhner, "Fast visualization, manipulation and analysis of binary volumetric objects," IEEE Computer Graphics and Applications, Vol. 11, No. 6, pp. 53-62, 1991.
17. Y. Zhou and A.W. Toga, "Efficient skeletonization of volumetric objects," IEEE Transactions on Visualization and Computer Graphics, Vol. 5, No. 3, pp. 196-209, 1999.
Surface Area Estimation of Digitized Planes Using Weighted Local Configurations

Joakim Lindblad

Centre for Image Analysis, Uppsala University
Lägerhyddsv. 3, SE-75237 Uppsala, Sweden
[email protected]
Abstract. We describe a method for estimating the surface area of three-dimensional binary objects. The method assigns a surface area weight to each 2 × 2 × 2 configuration of voxels. The total surface area of a digital object is given by a summation of the local area contributions. We derive optimal area weights, in order to get an unbiased estimate with minimum variance for randomly oriented planar surfaces. This gives a coefficient of variation (CV) of 1.40% for planar regions. To verify the results and to address the feasibility of area estimation for curved surfaces, the method is tested on convex and non-convex synthetic test objects of increasing size. The algorithm is appealingly simple and uses only a small local neighbourhood. This allows efficient implementations in hardware and/or in parallel architectures.

Keywords: Surface area estimation, marching cubes, optimal weights, digital planes, local voxel configurations
1 Introduction
Surface area of three-dimensional (3D) objects is an important feature for image analysis. In digital image analysis, we are given only a digitized version of the original continuous object. Digital surface area measurements can therefore only be estimates of the true surface area of the original object. Quantitative analysis of digital images requires that such estimates are both accurate and precise, i.e., that the estimates agree well with the true measure on the continuous object and that we get similar values for repeated measurements. A good estimator should be unbiased, i.e., the expected value of the estimate should be equal to the true value. To be precise, an estimator should also have as small a Mean Squared Error (MSE) as possible. For an unbiased estimator the MSE is equal to the variance σ². From a theoretical point of view, multigrid convergence is a very appealing property of an estimator (see e.g. [6]). It ensures that the estimate converges toward the true value as the grid resolution increases. However, from a practical viewpoint, grid resolution is rarely a parameter that can be easily increased. A property that is attractive from a practical point of view is locality. Local techniques compute features using information only from a local part of the image. Computations on different parts of the image are independent, and can thus be performed in parallel. Local algorithms are, in general, simple to implement and efficient in terms of computing power. This, together with their inherent parallelism, makes them most suitable for demanding real-time applications. Unfortunately, due to the limited distance that information is allowed to travel in local algorithms, it is believed that they can never be made multigrid convergent in a general sense. This leads to a trade-off between the desire for a local algorithm, for reasons of simplicity and speed, and the improved performance at higher resolutions given by multigrid convergent estimators. The best estimator for one particular situation may well differ from the best choice in another situation. In this paper, we present a method to obtain accurate surface area estimates with high precision, using only local computations and avoiding strong assumptions about the object of study. The estimator is based on assigning area weights to local configurations of binary voxels. The weights for the different configurations are optimized in order to give an unbiased estimate with minimal MSE for an isotropic distribution of flat surfaces of infinite size. We verify the performance of the estimator by applying it to synthetic test objects of increasing size with randomized alignment in the digitization process.

I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 348-357, 2003. © Springer-Verlag Berlin Heidelberg 2003
2 Background
In 2D image analysis, the perimeter of a digital object can be estimated as the cumulative distance from pixel centre to pixel centre along the border of the object. This is straightforward to accomplish using the Freeman chain code [4], but results in an incorrect estimate. The weights 1 for isothetic and √2 for diagonal steps are not optimal when measuring digitized line segments. By assigning optimized weights [8,12] to the steps, a more accurate perimeter estimate is obtained. Weights for the 2D case have been optimized for infinitely long straight lines and have then been proven to perform even better for curved contours [3]. In addition to the above local type of estimators, different multigrid convergent perimeter estimators exist, most of them based on finding straight line segments and performing a polygonalization of the object. See e.g. [2] for a compact overview. A straightforward and simple approach to get a surface area estimate of a 3D object is to count the number of foreground voxels with a neighbour in the background. This, however, results in a quite severe underestimate. Assigning a weight of 1.2031, instead of 1, to each border voxel, we get an unbiased estimate for an isotropic distribution of planar surfaces [11]. The variance of that estimate is, however, rather large, due to the fact that we do not differentiate between different types of surface voxels. Mullikin and Verbeek [11] propose a method where each voxel in the foreground is classified depending on the number of six-connected neighbours it has in the background. This gives a total of 9 different types of configurations of boundary voxels. Three of these configurations, namely voxels with one, two, and three neighbouring voxels in the background, occur for flat surfaces and are also by far the most commonly appearing for smooth objects sampled at reasonable resolution. Mullikin and Verbeek derive optimal weights for these three
cases, in order to get an unbiased estimate with minimal MSE for planar surfaces. The weights are W1 = 0.8940, W2 = 1.3409, and W3 = 1.5879, respectively, in sample grid units squared. This gives an unbiased estimate with a coefficient of variation (CV = σ/µ) of 2.33% for planar surfaces. Since their method depends on what is defined to be foreground and background in the image, the estimate will change if applied to the complementary image. The true area of the surface between a continuous object and the background does not change if we interchange what is object and what is background, and a desired property of a surface area estimator is, therefore, that it gives the same result on the complementary image. Mullikin and Verbeek suggest using the average of the estimates for the original and the complementary image, to achieve a symmetric estimate. Note that this property of symmetry with respect to foreground and background is slightly dependent on the chosen digitization method. We have in this paper used Gauss centre point digitization, where the digitized object is defined to be the set of all grid points (voxel centres) contained in the continuous set. Recent publications [5,7,1] have studied multigrid convergent surface area estimators with promising results. Klette et al. [7] use global polyhedrization techniques to arrive at a surface area estimate, whereas Coeurjolly et al. [1] present an efficient algorithm based on discrete normal vector field integration, where the problem of surface area estimation is transformed into a problem of normal vector estimation.
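A minimal sketch of the border-voxel classification just described (Python for illustration): voxels are weighted by their number of six-connected background neighbours, using the W1, W2, W3 quoted above. Giving weight 0 to the six remaining configuration types, which do not occur for flat surfaces, is a simplification made here, not part of the original method:

```python
W = {1: 0.8940, 2: 1.3409, 3: 1.5879}  # weights quoted from Mullikin and Verbeek
N6 = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def mv_area(X):
    """Weighted count of border voxels of the voxel set X.
    A voxel with k six-connected background neighbours (k = 1, 2, 3)
    contributes W[k]; other configurations are ignored in this sketch."""
    total = 0.0
    for (x, y, z) in X:
        k = sum((x + dx, y + dy, z + dz) not in X for (dx, dy, dz) in N6)
        total += W.get(k, 0.0)
    return total
```

For a 3×3×3 solid cube, the 6 face centres have one background neighbour, the 12 edge centres two, and the 8 corners three, giving an estimate of 6·W1 + 12·W2 + 8·W3.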
3 Surface Area Estimation
The method proposed in this paper has similarities to the one described in [11]. Both methods count the occurrences of a set of local configurations of binary-valued voxels. Each configuration is assigned an area contribution, and the total area is calculated as the sum of the local area contributions over the surface of the object. Where Mullikin and Verbeek use the six-connected neighbourhood, we use a 2 × 2 × 2 neighbourhood. This gives a total of 13 different types of surface configurations, five of which appear for flat surfaces. The increased number of cases gives a better discrimination between different normal directions, and therefore an improved surface estimate is achieved. In addition, since the 2 × 2 × 2 neighbourhood is symmetric with respect to foreground and background, the estimate does not change if we apply it to the complementary image. The use of the 2 × 2 × 2 neighbourhood makes the configurations appearing in the method similar to those of the Marching Cubes algorithm [10]. In fact, assigning to each configuration a surface area equal to the area of the triangles of the Marching Cubes algorithm is not a bad idea; it leads to an overestimate of 8%. If we divide the result by the factor 1.08, we get an unbiased estimate with a remaining CV of 2.25%. In [9], empirical optimization when varying the weight of one of the configurations (case 5) was studied with promising results. Note that the optimization that we perform in this paper is based only on the binary voxel configurations, and we do not care about possible triangulations of the actual surface.
3.1 m-Cubes
An m-cube (short for Marching Cube) is the cube bounded by the eight voxels in a 2 × 2 × 2 neighbourhood. Hence, each corner of the m-cube corresponds to a voxel. In a binary image, the number of possible configurations of the eight voxels is 256. Using symmetry, the 256 configurations can be grouped into 14 (or 15) cases [10], see Fig. 1. We number the cases according to [10], except for the mirrored cases 11 and 14, which we group into one, case 11.
Fig. 1. m-cubes of 2 × 2 × 2 voxels. Voxels denoted by a • are inside the object. The complementary cases are classified to be the same as the original cases. Only cases 1, 2, 5, 8, and 9, (emphasized ) appear for planar surfaces.
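The grouping of the 256 configurations can be verified by brute force: under the 24 rotations of the cube the corner configurations fall into 23 classes; identifying each configuration with its complement reduces this to 15 (case 0 included), and additionally allowing reflections, which merges the mirrored cases, gives the 14 used here. A sketch (Python), enumerating the symmetries as signed permutation matrices:

```python
from itertools import product

corners = list(product((0, 1), repeat=3))   # the 8 voxels of an m-cube
index = {c: i for i, c in enumerate(corners)}
EVEN = ((0, 1, 2), (1, 2, 0), (2, 0, 1))
PERMS3 = ((0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0))

def symmetries(proper_only=True):
    """Symmetries of the cube as permutations of its 8 corners:
    the 24 rotations, or all 48 rotations and reflections."""
    result = []
    for perm in PERMS3:
        for signs in product((1, -1), repeat=3):
            det = (1 if perm in EVEN else -1) * signs[0] * signs[1] * signs[2]
            if proper_only and det != 1:
                continue
            p = tuple(index[tuple(c[perm[j]] if signs[j] == 1 else 1 - c[perm[j]]
                                  for j in range(3))]
                      for c in corners)
            result.append(p)
    return result

def canon(cfg, perms, use_complement):
    """Smallest 8-bit code reachable from configuration cfg."""
    best = 0xFF
    for p in perms:
        r = 0
        for i in range(8):
            r |= ((cfg >> i) & 1) << p[i]   # rotate the configuration
        best = min(best, r)
        if use_complement:
            best = min(best, r ^ 0xFF)      # foreground/background swap
    return best
```

Counting the distinct canonical codes over all 256 configurations reproduces the class counts stated above; the numbers 23, 15 and 14 can also be confirmed with Burnside's lemma.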
Surface area weights, Ai, are assigned to the different cases. One of the configurations, case 0, does not represent a boundary situation, and therefore has zero area contribution. The sum of the area weights over all surface m-cubes of an object gives an estimate of the total surface area of that object. The histogram presenting the cardinality, Ni, of each of the 13 surface configurations (skipping case 0) is computed for each digitized object. The surface area estimate of the particular object is then calculated as

    Â = Σ_{i=1}^{13} Ai Ni .    (1)
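A sketch of this summation (Python for illustration). Classifying an arbitrary m-cube among all cases requires the full symmetry analysis; the shortcut classifier below is only valid for m-cubes of digitized planes, where a minority-corner count of 1, 2 or 3 identifies cases 1, 2 and 5, and a count of 4 is case 8 when the four corners form a face and case 9 otherwise. The weights are the ones derived in Sect. 3.3, eq. (10):

```python
from itertools import product

A_WEIGHTS = {1: 0.2118, 2: 0.6690, 5: 0.9779, 8: 0.9270, 9: 1.2706}  # eq. (10)
CORNERS = list(product((0, 1), repeat=3))

def classify_planar(minority):
    """Case of an m-cube given its minority corner set (subset of {0,1}^3).
    Only valid for m-cubes of digitized planes (cases 1, 2, 5, 8, 9)."""
    m = len(minority)
    if m in (1, 2, 3):
        return {1: 1, 2: 2, 3: 5}[m]
    # m == 4: four corners sharing one coordinate form a face (case 8);
    # the only other planar 4-corner configuration is case 9.
    if any(all(c[ax] == v for c in minority) for ax in range(3) for v in (0, 1)):
        return 8
    return 9

def estimate_area(volume, lo, hi):
    """Eq. (1): sum of area weights over all 2x2x2 windows whose lower
    corner (x, y, z) satisfies lo <= x, y, z < hi."""
    total = 0.0
    for x, y, z in product(range(lo, hi), repeat=3):
        inside = {c for c in CORNERS if (x + c[0], y + c[1], z + c[2]) in volume}
        if 0 < len(inside) < 8:   # surface m-cube
            minority = inside if len(inside) <= 4 else set(CORNERS) - inside
            total += A_WEIGHTS[classify_planar(minority)]
    return total
```

A grid-aligned slab of foreground voxels at z = 0 yields only case-8 m-cubes, each contributing A8 = 0.9270, in line with the maximum-error discussion in Sect. 3.3.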
3.2 Planar Surfaces
We optimize the weights, Ai, in order to get an unbiased estimate with minimal MSE for planar surfaces of isotropic orientation. This can be justified by the fact that the surface of an object becomes locally planar as the sampling density increases, if the maximum spatial frequency is kept constant. Furthermore, planar surfaces of distinct orientation represent the worst-case objects for the estimation method. Since all other objects have a more isotropic distribution of
Fig. 2. The surface between object and background, z = zx·x + zy·y + w, here shown for w = 0. (a) zx + zy ≤ 1. (b) zx + zy > 1.
normal directions, the variance of the estimated surface area of any reasonably shaped object should be lower than that of planar objects. This is verified on the synthetic test objects in Sect. 4. To optimize the surface area weights, we need to study the different types of configurations that appear when a volume of voxels is divided by a randomly oriented and positioned plane. We can, without loss of generality, restrict the analysis to planes with a reduced set of normal directions, due to the symmetry of the sampling grid. We have chosen, in spherical coordinates, the region

    −π ≤ φ < −3π/4 ,    0 ≤ θ < π/2 + arctan(cos φ) ,    (2)

where the transformation from spherical to Cartesian coordinates of the normal vector is given by n = (cos φ sin θ, sin φ sin θ, cos θ). The reason for choosing this set of normal directions is that it allows us to represent the surface plane as a function of x and y,

    z(x, y) = zx·x + zy·y + w ,    0 ≤ zy ≤ zx < 1 ,    (3)

where voxels with a centre on, or below, the plane are included in the object. Figure 2 shows two such surfaces of different slopes. Depending on whether zx + zy is less than or greater than 1, we get two different sets of configurations appearing as we vary the offset term w. This is illustrated in Figs. 3 and 4. We keep track of the intersection between the surface and all m-cubes. For example, in Fig. 3(b) the lower m-cube is a case 5 and the upper one is a case 1. Note that only five of the 13 possible surface configurations appear for planar surfaces. Since the offset term for randomly aligned planes is uniformly distributed, we can calculate the probability, P(ci), that an intersected m-cube is of type i, given a specific normal direction n, directly from Fig. 3 for zx + zy ≤ 1,
Fig. 3. The different cases appearing for zx + zy ≤ 1 as w is varied: (a) case 8 for 0 ≤ w < 1−zx−zy; (b) case 5+1 for 1−zx−zy ≤ w < 1−zx; (c) case 2+2 for 1−zx ≤ w < 1−zy; (d) case 1+5 for 1−zy ≤ w < 1.
    P(c1|n) = 2zy/ztot ,             (4a)
    P(c2|n) = 2(zx − zy)/ztot ,      (4b)
    P(c5|n) = 2zy/ztot ,             (4c)
    P(c8|n) = (1 − zx − zy)/ztot ,   (4d)
and from Fig. 4 for zx + zy > 1,

    P(c1|n) = 2zy/ztot ,             (5a)
    P(c2|n) = 2(zx − zy)/ztot ,      (5b)
    P(c5|n) = 2(1 − zx)/ztot ,       (5c)
    P(c9|n) = (zx + zy − 1)/ztot ,   (5d)
where ztot = 1 + zx + zy. The total number of intersected m-cubes for a plane of area A and normal direction n is

    Ntot(n) = (1 + zx + zy)/√(1 + zx² + zy²) · A .    (6)
The cardinality of a specific configuration, i, is

    Ni(n) = P(ci|n) Ntot(n) ,    i = 1 ... 13 ,    (7)

and the total estimated surface area is given by

    Â(n) = Σ_{i=1}^{13} Ai Ni(n) .    (8)
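Equations (4)–(6) can be checked numerically: for any admissible slope the case probabilities must sum to one, and for a grid-aligned plane every intersected m-cube is a case 8 with Ntot equal to the plane's area. A small sketch (Python):

```python
from math import sqrt

def case_probs(zx, zy):
    """P(ci | n) for the planar cases, eqs. (4) and (5); 0 <= zy <= zx < 1."""
    ztot = 1 + zx + zy
    if zx + zy <= 1:
        return {1: 2*zy/ztot, 2: 2*(zx - zy)/ztot,
                5: 2*zy/ztot, 8: (1 - zx - zy)/ztot}
    return {1: 2*zy/ztot, 2: 2*(zx - zy)/ztot,
            5: 2*(1 - zx)/ztot, 9: (zx + zy - 1)/ztot}

def n_tot(zx, zy, area):
    """Eq. (6): number of intersected m-cubes for a plane of given area."""
    return (1 + zx + zy) / sqrt(1 + zx**2 + zy**2) * area
```

For zx = zy = 0, `case_probs` assigns all probability to case 8 and `n_tot` equals the area, as expected for a grid-aligned plane.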
Fig. 4. The different cases appearing for zx + zy > 1 as w is varied: (a) case 5+1 for 0 ≤ w < 1−zx; (b) case 2+2 for 1−zx ≤ w < 1−zy; (c) case 1+5 for 1−zy ≤ w < 2−zx−zy; (d) case 1+9+1 for 2−zx−zy ≤ w < 1.
3.3 Optimization
We wish to optimize (8) over all normal directions, in order to get an unbiased estimate with minimal MSE. However, since the cardinalities of cases 1, 5, and 9 are linearly dependent,¹ N1 = N5 + 2N9, the solution becomes non-unique. By grouping the cases that co-appear, we can still get a unique solution. That is, instead of assigning a surface area to each individual m-cube, we assign an area to the different situations shown in Figs. 3 and 4. Optimizing area contributions for this new set of cases, using standard methods, leads to the following grouped set of area contributions, in sample grid units squared:

    A1 + A5 = 1.1897,  2A2 = 1.3380,  A8 = 0.9270,  2A1 + A9 = 1.6942.    (9)
This gives an unbiased area estimate with a CV of 1.40% for planar surfaces. For non-planar objects the relation between cases 1, 5, and 9 no longer holds. Therefore, we need area contributions for the individual m-cubes. Since the Marching Cubes triangulations of cases 1 and 9 together represent a well-behaved and flat surface (Fig. 5), we have used the ratio given by the corresponding triangle areas, i.e., A1/A9 = 1/6. Inserting this into (9), we get the following set of area contributions:

    A1 = 0.2118,  A2 = 0.6690,  A5 = 0.9779,  A8 = 0.9270,  A9 = 1.2706.    (10)
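The grouped contributions (9) and the individual weights (10) can be cross-checked, and equations (4)–(8) combine into the estimated-over-true area ratio as a function of the slope; a consistency sketch (Python):

```python
from math import sqrt

A = {1: 0.2118, 2: 0.6690, 5: 0.9779, 8: 0.9270, 9: 1.2706}   # eq. (10)

def ratio(zx, zy):
    """Estimated area / true area for a plane z = zx*x + zy*y + w,
    averaged over the offset w, from eqs. (4)-(8); 0 <= zy <= zx < 1."""
    ztot = 1 + zx + zy
    if zx + zy <= 1:
        probs = {1: 2*zy, 2: 2*(zx - zy), 5: 2*zy, 8: 1 - zx - zy}
    else:
        probs = {1: 2*zy, 2: 2*(zx - zy), 5: 2*(1 - zx), 9: zx + zy - 1}
    weighted = sum(A[c] * p for c, p in probs.items()) / ztot
    return ztot / sqrt(1 + zx**2 + zy**2) * weighted
```

Here `ratio(0, 0)` equals A8 = 0.9270, the grid-aligned worst case, while other slopes fall on both sides of 1; and the grouped relations of (9), including A1/A9 = 1/6, hold for the weights (10) up to rounding.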
This specific choice of how to distribute the area between cases 1, 5, and 9 is not indisputable. Since the optimization for planar surfaces does not supply enough information to give a unique solution, further optimization on some other type of objects is required. Note, however, that the distribution of area between cases 1, 5, and 9 affects neither the CV nor the maximum error for planar surfaces, as long as (9) holds. The maximum absolute error is reached for planes aligned with the digitization grid, and the error is then 1 − A8 = 0.0730. Figure 6 shows the estimate divided by the true area as a function of the normal direction.

¹ Easily verified by observing Figs. 3 and 4. For cases 5 and 9, the neighbouring voxel (voxels for case 9) will always contain a complementary case 1.

Fig. 5. Possible triangulation of cases 1 and 9.

Fig. 6. Estimated surface area divided by true surface area, for different values of zx and zy. The maximum error is reached for zx = zy = 0.

Fig. 7. Digitized spherical cap of radius 40 pixels.
4 Simulations
To verify the results and to address the feasibility of area estimation for curved and non-convex surfaces, the method is tested on synthetic objects of known surface area. The test objects used are balls of radii 0–80, cubes of side length 0–160, cylinders of height = 2·radius, and, to get a concave object, thick spherical caps (see Fig. 7), where radius_cavity = ½·radius_cap. The objects are generated in the continuous space and then digitized, using Gauss centre point digitization, in different sizes, and with random rotation and position in the digitization grid. Since our test objects are not only planar surfaces, some additional m-cube cases will be present. To calculate a total area for our test objects we need to assign an area contribution also to those additional cases. Since these cases constitute no more than 0.2% of the total number of surface m-cubes of the test objects, this area contribution has a very small impact on the overall result. We have assigned the triangle area of a Marching Cubes triangulation to these additional cases.
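Gauss centre point digitization with a randomized pose can be sketched as follows (Python; the rotated-cube predicate is an illustrative stand-in for the continuous test objects):

```python
from itertools import product
from math import cos, sin

def gauss_digitize(inside, bound):
    """All grid points (voxel centres) in [-bound, bound]^3 that are
    contained in the continuous set described by the predicate `inside`."""
    rng = range(-bound, bound + 1)
    return {(x, y, z) for x, y, z in product(rng, repeat=3) if inside(x, y, z)}

def rotated_cube(side, angle):
    """Predicate for a continuous cube of the given side, rotated by
    `angle` about the z axis and centred at the origin."""
    c, s = cos(angle), sin(angle)
    def inside(x, y, z):
        u, v = c * x + s * y, -s * x + c * y     # rotate the grid point back
        return max(abs(u), abs(v), abs(z)) <= side / 2
    return inside
```

An axis-aligned continuous cube of side 9 centred at the origin digitizes to exactly 9³ voxels; random rotations perturb the count slightly, which is exactly what the randomized test protocol above exercises.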
5 Results
Surface area estimates and average relative errors for digitized objects of increasing resolution can be seen in Fig. 8. Surface area estimates for 10,000 digitized balls of radius 70 pixels, 20,000 cubes of side length 140, 10,000 cylinders of radius 70 and height 140, and 10,000 spherical caps of radius 70 pixels, are summarized in Table 1. The results are (on average) a slight underestimate of the true surface area. This is due to the cutting of corners and edges [13]. For large objects this effect can be neglected, though. The surface of a large ball is a good sampling of planes in all directions and should thus exhibit very low variance. This is verified by the simulations, where
Joakim Lindblad
Fig. 8. (a) Surface area estimates divided by true surface area for 150,000 digitizations of objects of increasing size. (b) Log plot of relative error for the different objects.
Table 1. Performance on synthetic test objects of radius 70 / side length 140.

Object         mean(Â)/A   CV       mean(|Â − A|/A)   max(|Â − A|/A)
Ball           0.9999      0.015%   0.012%            0.082%
Cube           0.9963      0.89%    0.68%             7.14%
Cylinder       0.9975      0.63%    0.45%             3.36%
Spherical Cap  0.9965      0.32%    0.36%             1.99%
superlinear convergence O(r^−α), α ≈ 1.5, is observed. The cube has planar surfaces which are aligned so that it represents a worst-case situation for the cubic digitization grid; accordingly, it shows the worst performance in the simulations.
6 Discussion and Conclusions
We have presented a method for estimating the surface area of binary 3D objects using local computations. The algorithm is appealingly simple and uses only a very small local neighbourhood, allowing efficient implementations in hardware and/or on parallel architectures. The estimated surface area is computed as a sum of local area contributions. We have derived optimal area weights for the 2 × 2 × 2 configurations of voxels that appear on digital planar surfaces. The method gives an unbiased estimate with minimum variance for randomly oriented planar surfaces. The theoretical worst-case CV for the suggested surface area estimator is 1.40%, and the maximum absolute error is 7.30%. The maximum error is reached for planar surfaces aligned with the digitization grid. The performance of the surface area estimator is verified on more than 200,000 convex and non-convex digitized synthetic objects. Due to the local nature of the method, it cannot be made multigrid convergent. However, for objects of size less than a few hundred voxels in diameter, it is competitive in terms of precision with existing multigrid convergent methods. A more detailed comparison is of interest. Further work on finding an optimal and unique distribution of surface area between cases 1, 5, and 9 will follow.
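The only per-object computation the estimator needs is a histogram of 2 × 2 × 2 voxel configurations (m-cubes); the estimate is then a weighted sum over that histogram. The optimal weights themselves are given in the paper's equations and are not reproduced here; the Python sketch below, with illustrative names, only builds the configuration counts that such weights would multiply.

```python
import numpy as np

def mcube_histogram(vol):
    """Histogram of 2x2x2 voxel configurations (m-cubes).

    Each cube of eight neighbouring voxels is encoded as an 8-bit
    occupancy pattern; only mixed cubes (with both object and background
    corners) lie on the surface and would receive an area weight."""
    v = np.asarray(vol, dtype=np.uint8)
    # Stack the eight corners of every 2x2x2 cube via shifted views.
    corners = [v[dz:v.shape[0] - 1 + dz,
                 dy:v.shape[1] - 1 + dy,
                 dx:v.shape[2] - 1 + dx]
               for dz in (0, 1) for dy in (0, 1) for dx in (0, 1)]
    codes = np.zeros(corners[0].shape, dtype=np.int32)
    for bit, c in enumerate(corners):
        codes += c.astype(np.int32) << bit
    patterns, counts = np.unique(codes, return_counts=True)
    return {int(p): int(n) for p, n in zip(patterns, counts)
            if 0 < p < 255}  # drop all-background and all-object cubes

# A two-voxel-thick slab: every surface m-cube on the two flat sides has
# four object voxels in one layer, so exactly two patterns occur.
slab = np.zeros((6, 6, 6), dtype=np.uint8)
slab[2:4, :, :] = 1
hist = mcube_histogram(slab)
```

Grouping the 254 mixed patterns into the equivalence cases (1, 5, 9, etc.) and multiplying by the derived weights would complete the estimator.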
Acknowledgements
We thank Doc. Ingela Nyström, Prof. Gunilla Borgefors, and Nataša Sladoje for their strong scientific support.
References
1. D. Coeurjolly, F. Flin, O. Teytaud, and L. Tougne. Multigrid convergence and surface area estimation. In Theoretical Foundations of Computer Vision "Geometry, Morphology, and Computational Imaging", volume 2616 of LNCS, pages 101–119. Springer-Verlag, 2003.
2. D. Coeurjolly and R. Klette. A comparative evaluation of length estimators. In Proceedings of the 16th International Conference on Pattern Recognition (ICPR), pages IV: 330–334. IEEE Computer Society, 2002.
3. L. Dorst and A. W. M. Smeulders. Length estimators for digitized contours. Computer Vision, Graphics and Image Processing, 40:311–333, 1987.
4. H. Freeman. Boundary encoding and processing. In B. S. Lipkin and A. Rosenfeld, editors, Picture Processing and Psychopictorics, pages 241–266, New York, 1970. Academic Press.
5. Y. Kenmochi and R. Klette. Surface area estimation for digitized regular solids. In L. J. Latecki, R. A. Melter, D. M. Mount, and A. Y. Wu, editors, Vision Geometry IX, pages 100–111. Proc. SPIE 4117, 2000.
6. R. Klette. Multigrid convergence of geometric features. In G. Bertrand, A. Imiya, and R. Klette, editors, Digital and Image Geometry, volume 2243 of LNCS, pages 314–333. Springer-Verlag, 2001.
7. R. Klette and H. J. Sun. Digital planar segment based polyhedrization for surface area estimation. In C. Arcelli, L. P. Cordella, and G. Sanniti di Baja, editors, Visual Form 2001, volume 2059 of LNCS, pages 356–366. Springer-Verlag, 2001.
8. Z. Kulpa. Area and perimeter measurement of blobs in discrete binary pictures. Computer Graphics and Image Processing, 6:434–454, 1977.
9. J. Lindblad and I. Nyström. Surface area estimation of digitized 3D objects using local computations. In Proceedings of the 10th International Conference on Discrete Geometry for Computer Imagery (DGCI), volume 2301 of LNCS, pages 267–278. Springer-Verlag, 2002.
10. W. E. Lorensen and H. E. Cline. Marching Cubes: A high resolution 3D surface construction algorithm. In Proceedings of the 14th ACM SIGGRAPH on Computer Graphics, volume 21, pages 163–169, 1987.
11. J. C. Mullikin and P. W. Verbeek. Surface area estimation of digitized planes. Bioimaging, 1(1):6–16, 1993.
12. D. Proffit and D. Rosen. Metrication errors and coding efficiency of chain-encoding schemes for the representation of lines and edges. Computer Graphics and Image Processing, 10:318–332, 1979.
13. I. T. Young. Sampling density and quantitative microscopy. Analytical and Quantitative Cytology and Histology, 10(4):269–275, 1988.
Surface Area Estimation in Practice
Guy Windreich¹, Nahum Kiryati¹, and Gabriele Lohmann²
¹ Dept. of Electrical Engineering–Systems, Tel Aviv University, Tel Aviv 69978, Israel [email protected]
² Max-Planck Institute of Cognitive Neuroscience, Stephanstr. 1a, 04103 Leipzig, Germany [email protected]
Abstract. Consider a complex, convoluted three dimensional object that has been digitized and is available as a set of voxels. We describe a fast, practical scheme for delineating a region of interest on the surface of the object and estimating its original area. The voxel representation is maintained and no triangulation is carried out. The methods presented rely on a theoretical result of Mullikin and Verbeek, and bridge the gap between their idealized setting and the harsh reality of 3D medical data. Performance evaluation results are provided, and operation on segmented white matter MR brain data is demonstrated.
Keywords: Surface area estimation, digital geometry, voxel objects, morphometric measurements, segmented white matter.
1 Introduction
Consider a three dimensional object that has been digitized and is given as a set of voxels. How can one delimit a region of interest on the surface of the object and estimate its area? Estimating the area of specific regions in the highly convoluted cortical surface is a challenging instance of this generic problem. The cortex is the thin outermost layer of grey matter in the brain; cortical surface area is likely to be related to functional capacities [15]. The interesting problem of topologically-correct brain segmentation in MR images is beyond the scope of this paper. Given the segmented cortical voxel set, a useful surface area measurement process should include three non-trivial steps: Tracing the boundary of the region of interest on the surface, identifying the region surrounded by the boundary, and estimating its area. Marking a boundary contour on a convoluted surface is not straightforward, because parts of the intended curve may not be visible. To overcome this limitation, the user should be able to select a sequence of visible key points, and have them connected automatically to form the boundary. This calls for an efficient algorithm for geodesic path generation, i.e., for finding shortest paths between points on a surface. As to region identification, in the continuous world Jordan's theorem ensures that a simple closed contour encloses a region and separates the interior and exterior. In the discrete domain, identifying a region of interest by
I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 358–367, 2003. © Springer-Verlag Berlin Heidelberg 2003
its outline is prone to paradoxes, since the discrete version of Jordan’s theorem does not generally hold and different definitions of digital connectivity must be used for the region and its boundary [12]. Estimating the continuous area of a surface that is available only in digital form is fundamentally difficult. Different continuous surfaces, with different surface areas, may have the same digital representation. Furthermore, the voxel representation of smooth continuous surfaces is generally jagged, so the total area of exposed voxel faces is usually much greater than that of the original continuous surface. Transforming the digital object and its surface from their original voxel representation to a triangle-based polyhedral representation has certain advantages. An efficient algorithm for geodesic path generation on triangulated domains is available [8], facilitating key point based boundary generation. Moreover, on the triangulated domain the paths found are continuous, so the boundary outlines the region of interest in a well defined way. Two difficulties however arise. First, when using the marching cubes algorithm [13] to create the triangulated representation, topological ambiguities may occur and holes may be generated [10]. Second, the surface area estimate produced by summing up the area of the resulting triangles does not converge to the true surface area as the resolution increases [7]; this follows from the locality of the marching cubes algorithm. Klette and Sun [11] suggest that surface area estimators that converge to the true surface area can be obtained by using a global polyhedrization method. More efficient algorithms are required in order to make their method practical for large, high resolution data sets; see also [2]. This research follows a different approach: the original voxel representation of the object is maintained and no polyhedrization is carried out. 
This simplifies the overall system design and avoids difficulties, ambiguities and distortion that may arise due to the application of a polyhedrization process to a complex, convoluted surface. A voxel-based surface area estimator was presented by Mullikin and Verbeek [14]. Extending the planar perimeter estimation methodology [3,5], their estimator is designed to be unbiased and minimize the mean square error (MSE) for planes, and its operation is evaluated with spheres. The estimator of Mullikin and Verbeek [14] is at the core of the method presented here. However, as will be discussed, it cannot be directly applied to complex convoluted surfaces and various difficulties need to be addressed. The creation of a complete voxel-based surface area estimation method, applicable to surfaces as complex as that of the brain, is the focus of this research. Preliminary results were presented in [16].
2 Delimiting the Region of Interest
2.1 Border vs. Boundary
Given a 3D object that is represented as a set of voxels, one can easily identify the voxels that are 6-connected to the background and view them as the border of the object. The border set can be represented as a graph; once the user defines keypoints on the border, they can be automatically connected using the algorithm of [9] to obtain a closed contour that encloses the region of interest.
Fig. 1. Left: The shortest path generated (solid) between border voxels ‘1’ and ‘2’ does not follow the intended contour (dashed), even though the path is constrained to border voxels. Right: A region grown from the seed voxel (black) within the contour will include not only the intended region of interest (darker voxels in the top layer) but also other voxels in the border set to which they are connected (bottom layer).
Delimiting the region using the voxel-chain contour is inadequate. Consider for example the object detail shown in Fig. 1 (left). The voxels marked '1' and '2' both belong to the border set. When connecting them (as part of the contour generation process) using an algorithm like [9], the connection (solid) will not follow its intended path (dashed). Once a closed contour within the border set has been created, Fig. 1 (right) demonstrates that the contour does not properly enclose the region of interest. A region grown from the seed voxel (black) within the outline will include not only the intended region of interest (darker voxels in the top layer) but also other voxels in the border set to which they are connected (bottom layer). The difficulties associated with using the border voxel set can be eliminated by keeping track of the boundary of the object, the set of voxel faces that separates the object from the background [1]. This is easily verified for the pathological cases shown in Fig. 1. Marking the region of interest requires the following steps. First, detection of the border-set and the boundary. Second, selection of key-faces on the boundary and creation of a chain of faces that connects the key-faces on the surface. Third, seed selection within the region of interest, growing the region of interest on the boundary and associating it with the border-set. Maintaining the border-set representation is crucial, since the surface area estimation algorithm, based on [14], operates on the border-set.
2.2 Algorithms
Border and Boundary Detection. The straightforward approach to simultaneous detection of the border-set and the boundary is to visit each voxel in the 3-D image and determine whether it is 6-connected to a background voxel. Each surface voxel detected is inserted to a border hash table and each boundary
face to a boundary hash table. In practice, for the brain images used in this research, this operation took only 7 seconds on a 350MHz PC. Thus, sophisticated alternative algorithms were not needed. Constructing the Boundary Adjacency Graph. Following the detection of the border and the boundary, the boundary face adjacency graph is constructed. For each boundary face, adjacent boundary faces are determined. A simple closed surface can be represented as a directed graph with indegree and outdegree two [1]. Here, we extract a region of interest on the closed surface of an object, i.e., a surface patch. The directed graph representation is not suitable for the surface patch: there will be nodes in the graph with indegree (or outdegree) smaller than two. For example, consider an object that consists of a single voxel. Suppose the region of interest contains five faces out of the six that form the surface of the voxel. Only one of these five faces has four neighbors (two of them are adjacent faces in the sense of [1], that share the outgoing edges with the source face; the other two share the incoming edges). Each of the other four faces has only three edges shared with other faces. Clearly, this simple region of interest cannot be described using the directed graph representation. Thus, to find the boundary faces adjacent to each face, we modify the method of [1] to allow four adjacent faces for each boundary face, one for each of its edges. Connecting Key-Points on the Boundary. Kiryati and Székely [9] described an efficient algorithm for finding shortest paths on voxel surfaces represented as graphs. Here, given the boundary graph representation, a similar algorithm can be devised to find reasonably short paths between the keypoints defining the region of interest on the surface. As in [9], the sparsity of the boundary adjacency graph allows very efficient search.
Unlike [9], where different spatial adjacency relations (link types) between voxels induce different weights for arcs in the surface graph, here all four arcs connecting a boundary face to adjacent faces are equally weighted. This algorithm requires O(N log N ) time, where N is the number of boundary faces. Growing the Region of Interest on the Object Boundary. In the boundary adjacency graph, the degree of each node is between one and four. Like [1], we use a breadth first search algorithm for graph traversal that begins at an arbitrary node. Unlike [1], we have the list of boundary faces in memory so we do not have to detect the boundary, but only to mark the nodes (boundary faces) that are within the region of interest. To allow the search to stop at the borderline of the region of interest, all the nodes representing the outline of the region of interest are marked as they are generated by the shortest-path algorithm. Thus, they are already marked when surface growing starts.
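A minimal Python sketch of the border/boundary detection pass and the breadth-first region growing described above, with illustrative names of our own. Note one simplification: the growing here traverses border voxels rather than boundary faces; the paper works on faces precisely because voxel-level growing can leak (Fig. 1), so this version only illustrates the traversal itself.

```python
from collections import deque
import numpy as np

N6 = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def detect_border_and_boundary(vol):
    """One pass over the object voxels: a border voxel is an object voxel
    6-connected to the background; each such connection contributes one
    boundary face, stored as a (voxel, outward direction) pair.
    Coordinates outside the array count as background."""
    border, faces = set(), []
    for p in map(tuple, np.argwhere(vol)):
        for d in N6:
            q = (p[0] + d[0], p[1] + d[1], p[2] + d[2])
            if (not all(0 <= q[i] < vol.shape[i] for i in range(3))
                    or vol[q] == 0):
                border.add(p)
                faces.append((p, d))
    return border, faces

def grow_region(border, seed, outline):
    """Breadth-first growth from a seed over the border set, stopping at
    voxels marked as the outline of the region of interest."""
    region, queue = {seed}, deque([seed])
    while queue:
        p = queue.popleft()
        for d in N6:
            q = (p[0] + d[0], p[1] + d[1], p[2] + d[2])
            if q in border and q not in outline and q not in region:
                region.add(q)
                queue.append(q)
    return region

# A 3x3x3 solid cube: 26 border voxels, 6 sides of 9 exposed faces each.
cube = np.zeros((5, 5, 5), dtype=np.uint8)
cube[1:4, 1:4, 1:4] = 1
border, faces = detect_border_and_boundary(cube)
```

Using hash-based sets here mirrors the hash tables mentioned above; the whole pass is linear in the number of voxels.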
3 Surface Area Estimation
3.1 The Estimator of Mullikin & Verbeek [14]
Mullikin and Verbeek [14] extended the theory of 2-D perimeter estimation to 3-D surface area estimation. Their algorithm begins by detecting all surface
Fig. 2. The nine unique surface voxel classes (after [14]). A voxel under consideration (dark) is classified according to the arrangement of adjacent surface voxels (lighter grey and white). Only voxels of types S1−3 appear in a planar surface.
voxels, i.e., object voxels that are 6-connected to background voxels. Each voxel is classified into one of nine possible classes, and the surface area is estimated as a linear combination of the class membership values {N_i}:

Ŝ = Σ_{i=1}^{9} W_i N_i
Each surface voxel is classified according to the number and configuration of its faces that are exposed to the background. Up to rotation and mirroring, there are exactly nine unique voxel classes (Fig. 2), denoted S1−9 . Only voxels of types S1−3 appear in digital planes. Voxel types S4−6 are found in curved border regions. Voxel types S7−9 exist in extreme situations, where the object is a plane, line or point respectively. Having defined the voxel classification scheme, Mullikin and Verbeek determined the weights W1−3 associated with voxels in classes S1−3 , to make the surface area estimate unbiased for random plane orientations and to minimize the mean square error. These weights are W1 ≈ 0.894, W2 ≈ 1.3409 and W3 ≈ 1.5879; the coefficient of variation (CV = σ/µ) for planes is 2.33%. Clearly, an unbiased estimator for planes will have very small errors when operating on curved surfaces, where local estimation errors, obtained at differently oriented patches, essentially cancel out. This methodology does not determine the weights W4−9 associated with classes S4−9 . Following the spatial grid method [4], Mullikin and Verbeek set W4 = 2, W5 = 8/3 and W6 = 10/3. No weights were assigned by Mullikin and Verbeek to voxel types S7−9 . Experimental performance evaluation with spheres
revealed some bias related to the radius, which can be alleviated by averaging the surface area of the object with that of the background. The surface area estimator of [14] is local. While it does not exhibit multigrid convergence, it operates directly on voxels, is easy to implement, very fast to compute, and achieves very reasonable accuracy. Note that multigrid convergence concerns surface area estimation accuracy as resolution approaches infinity (and surface curvature approaches zero). This is not the case in present MR brain images, where the cortical surface curvature is high. For an alternative voxel-based surface area estimation method, see, e.g., [6].
3.2 Voxel Types in Brain Surfaces
Table 1 shows the frequency of the nine surface voxel types in a 160 × 200 × 160 segmented white matter MR brain image (in the grey-white matter interface). It is seen that all nine voxel classes are represented, and that voxel types S4−6 constitute 3.23% of the 156,825 surface voxels. About 1.35% of the surface voxels are of types S7−9.

Table 1. The frequencies of surface voxel types in the grey-white matter interface of a segmented 160 × 200 × 160 MR brain image.

               S1      S2      S3      S4     S5     S6     S7    S8    S9    total
No. of voxels  65878   46532   37218   1151   2547   1383   468   923   725   156825
Frequency (%)  42.1    29.6    23.7    0.73   1.62   0.88   0.30  0.59  0.46  100
The presence of voxel types S4−9 in brain data necessitates the assignment of weights to these classes, but these are not provided by the design methodology, which is based on digital planes. However, the fairly low frequency of these voxels means that the overall surface estimation accuracy is not too sensitive to the weights selected. As discussed above, weights for classes S4−6 were already proposed in [14]. For voxel classes S7−9 , we suggest the following. For S7 , with two opposite faces exposed to the background, we take the weight to be twice the weight of voxel type S1 (that has only one face exposed to the background), i.e., W7 = 1.79. A voxel in S8 has two pairs of adjacent faces exposed to the background; we can take its weight as twice that of voxel type S2 (that has only one pair of adjacent faces exposed to the background), i.e., 2.68. Alternatively, one can argue that the weight should be 4/5 of the weight of S6 (that has 5 faces exposed to the background). This gives an almost identical weight of 2.67. Thus, for all practical purposes we can take W8 = 2.68. As to S9 , with all 6 faces exposed to the background, the weight can be taken to be the sum of the weights of S6 and S1 (4.23), or twice that of S4 (4) or the sum of the weights of S5 and S2 (4.01). The difference between these values is insignificant considering the low frequency of these voxels. We take the average, W9 = 4.08. Consider a flat, thin object consisting of a single layer of S7 voxels. Suppose that the boundary region of interest is on one side of the object. Each of the S7
Fig. 3. Left: The relative mean estimation error (percent) in estimating the surface area of spheres, as a function of sphere radius (with object-background averaging). Right: The corresponding coefficient of variation (percent).
voxels has two faces exposed to the background, but only one of them belongs to the region of interest! In this case only half of the weight W7 should contribute to the surface area estimate. Generally, for a voxel with P faces exposed to the background, of which p faces belong to the region of interest on the boundary, we take

W_i^p = (p/P) · W_i

where W_i is the voxel class weight. In most cases p = P, so W_i^p = W_i.
3.3 Performance Evaluation
Mullikin and Verbeek [14] evaluated the performance of their surface area estimator using simulated spheres and cylinders. Here we report on our simulation results, with synthetic spheres and ellipsoids. Small spheres represent objects with high surface curvature, which deviate most from the planar surface model used in the design of the estimator. Large spheres, with their uniformly distributed surface normals, can demonstrate the unbiasedness of the estimator. Unlike spheres, surface normal directions on ellipsoids are not uniformly distributed. Testing the surface area estimator on ellipsoids, suggested in [7], indicates the sensitivity of the estimation error to nonuniformity of the normal direction distribution. Note that planar objects, having a single normal direction (or two opposite directions if both sides are considered), are generally the worst case for the estimator: the coefficient of variation (the standard deviation divided by the mean) is 2.33% for randomly oriented planes. Fig. 3 (left) shows the relative mean estimation error (percent) in estimating the surface area of spheres (average of object and background surface areas), as a function of sphere radius. Each point in the graph is based on 50 spheres, whose center points are uniformly distributed within the unit voxel. It is seen that the relative mean estimation error is less than 1% even for spheres of radius 2, that it rapidly decreases as the radius is increased, and that it is practically zero for radii larger than 10. The coefficient of variation of these measurements is presented
Fig. 4. Left: Relative mean estimation error (percent) as a function of ellipsoid main semi-axis a (using object-background surface area averaging). The dots refer to the ellipsoid family (a, 26, 25); the x’s to (a, 51, 50). Right: The corresponding coefficients of variation (percent).
in Fig. 3 (right). It rapidly decreases with the sphere radius, from about 4% at sphere radius 2, through 0.5% at radius 5, to negligibly small values at larger radii. These results are similar to those obtained with spheres in [14]; they demonstrate the outstanding accuracy of the estimator even when the surface curvature is very high. Note that surface curvature radii between 2 and 5 are common in the segmented MR white matter brain data used in this research. Consider an ellipsoid with semi-axes a, b and c parallel to the coordinate system axes. Each data point in Fig. 4 corresponds to surface area estimation of 50 such ellipsoids, with center points uniformly distributed within the unit voxel. The dots in Fig. 4 (left) refer to the ellipsoid family (a, b = 26, c = 25), and show the relative mean surface area estimation error as a function of a. The x signs refer to the ellipsoid family (a, b = 51, c = 50). As expected, the error is almost zero for nearly spherical ellipsoids; it slowly grows as a increases and the ellipsoids become elongated. The respective coefficients of variation are shown in Fig. 4 (right); they are very small.
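The classification-plus-weighting pipeline evaluated above can be sketched as follows. The weights are the ones quoted in Sections 3.1–3.2; the tie-breaking of classes with equal exposed-face counts by counting opposite face pairs is our reading of Fig. 2, and the Python code is an illustrative sketch rather than the authors' implementation.

```python
import numpy as np

# Class weights: W1-W3 from the plane-optimal design of [14], W4-W6 from
# the spatial grid method, W7-W9 as proposed above.
W = {1: 0.894, 2: 1.3409, 3: 1.5879, 4: 2.0, 5: 8 / 3, 6: 10 / 3,
     7: 1.79, 8: 2.68, 9: 4.08}

DIRS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def classify(exposed):
    """Class S1..S9 of a surface voxel, from the number of exposed faces
    and, where counts coincide, the number of opposite face pairs among
    them (cf. Fig. 2)."""
    n = len(exposed)
    opp = sum(1 for d in exposed if (-d[0], -d[1], -d[2]) in exposed) // 2
    if n == 1:
        return 1
    if n == 2:
        return 7 if opp else 2       # opposite pair: thin plane-like voxel
    if n == 3:
        return 4 if opp else 3       # three-in-a-row vs. corner
    if n == 4:
        return 8 if opp == 2 else 5  # full band: line-like voxel
    return 6 if n == 5 else 9

def estimate_area(vol):
    """Surface area estimate: sum of class weights over all surface
    voxels (object side only; no object/background averaging here)."""
    area = 0.0
    for p in map(tuple, np.argwhere(vol)):
        exposed = {d for d in DIRS
                   if not all(0 <= p[i] + d[i] < vol.shape[i] for i in range(3))
                   or vol[p[0] + d[0], p[1] + d[1], p[2] + d[2]] == 0}
        if exposed:
            area += W[classify(exposed)]
    return area

# Sanity check on a digitized ball of radius 20 (true area 4*pi*400).
zz, yy, xx = np.indices((45, 45, 45))
ball = ((xx - 22.3) ** 2 + (yy - 22.1) ** 2
        + (zz - 22.6) ** 2 <= 20 ** 2).astype(np.uint8)
```

Without the object/background averaging the residual radius-related bias remains, so the estimate for this ball should land within a few percent of 4π·400 rather than on it exactly.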
4 Application to Brain Data
The methods presented in this paper have been implemented as a C++ program named Surf3D, for Unix platforms. Surf3D receives as input 3D binary images, visualizes the data, and allows the user to interact with the surface using the mouse. A graphical user interface allows easy object rotation, synthetic illumination, etc. Using the mouse, the user marks a set of key points that indicate the region of interest on the surface of the viewed object. The keypoints are connected automatically to create the surrounding contour. From a seed point selected by the user, Surf3D grows the region of interest and estimates its area. A typical work session with Surf3D is illustrated in Fig. 5. The input data was a 160 × 200 × 160 segmented white matter MR brain image, with 558,363 cubic object voxels, of which 156,825 were surface voxels, having 300,130 boundary faces (larger data sets are readily accommodated). The session begins by loading
an image containing segmented white matter MR brain data. The border and the boundary of the object are detected and displayed. The user defines the contour of the region of interest by selecting keypoints on the surface. When ready, the program is instructed to close the contour. The user then clicks on a surface point within the region of interest, from which the program grows the region and estimates its surface area. With a 350MHz PC running Linux, the computing time spent in each of the steps is a few seconds or less.
Fig. 5. Interactive definition of the region of interest on the surface. Left: Automatic connection of key points provided by the user. Right: Closure of the contour surrounding the region of interest.
5 Conclusions
This research provides a fast, accurate and convenient scheme for estimating the surface area of regions of interest on the surface of digital objects. The input is a 3D binary digital image, i.e., a set of voxels. The voxel representation is maintained and no triangulation is carried out. The suggested technique bridges the gap between the theoretical results of Mullikin and Verbeek [14] and the reality of complex medical data. In particular, the method is well suited for highly convoluted surfaces. The accuracy is verified using synthetic surfaces: simulation results reported in [14] are corroborated, and augmented by results on ellipsoids. Operation is demonstrated on segmented white-matter MR brain data.
Acknowledgments This research was supported by a grant from the G.I.F., the German-Israeli Foundation for Scientific Research and Development.
References
1. E. Artzy, G. Frieder and G.T. Herman, "The Theory, Design, Implementation and Evaluation of a Three-Dimensional Surface Detection Algorithm", Computer Graphics and Image Processing, Vol. 1, pp. 1-24, 1981.
2. D. Coeurjolly, F. Flin and O. Teytaud, "Digital Surface Area Estimation", abstract, in Dagstuhl Seminar Report no. 339, Schloss Dagstuhl, Germany, 2002.
3. L. Dorst and A.W.M. Smeulders, "Length Estimators for Digital Contours", Computer Vision Graphics Image Processing, Vol. 40, pp. 311-333, 1987.
4. U. Hahn and K. Sandau, "Precision of Surface Area Estimation Using Spatial Grids", Acta Stereologica, Vol. 8, pp. 425-430, 1989.
5. J. Koplowitz and A.M. Bruckstein, "Design of Perimeter Estimators for Digitized Planar Shapes", IEEE Trans. Pattern Analysis Machine Intelligence, Vol. 11, pp. 611-622, 1989.
6. J. Lindblad and I. Nyström, "Surface Area Estimation of Digitized 3D Objects using Local Computations", Proc. DGCI'2002, Lecture Notes in Computer Science, Vol. 2301, pp. 267-278, 2002.
7. Y. Kenmochi and R. Klette, "Surface Area Estimation for Digital Regular Solids", Technical Report CITR-TR-62, Computer Science Department, University of Auckland, New Zealand, 2000. Available online.
8. R. Kimmel and J. Sethian, "Computing Geodesics on Manifolds", Proc. National Academy of Sciences, Vol. 95, pp. 8431-8435, 1998.
9. N. Kiryati and G. Székely, "Estimating Shortest Paths and Minimal Distances on Digitized Three-Dimensional Surfaces", Pattern Recognition, Vol. 26, pp. 1623-1637, 1993.
10. R. Klette, F. Wu and S. Zhou, "Multigrid Convergence of Surface Approximations", Technical Report CITR-TR-25, Computer Science Department, University of Auckland, New Zealand, 1998. Available online.
11. R. Klette and H.J. Sun, "A Global Surface Area Estimation Algorithm for Digital Regular Solids", Technical Report CITR-TR-69, Computer Science Department, University of Auckland, New Zealand, 2000. Available online.
12. G. Lohmann, Volumetric Image Analysis, Wiley, Chichester, UK & Teubner, Stuttgart, Germany, 1998.
13. W.E. Lorensen and H.E. Cline, "Marching Cubes: A High Resolution 3D Surface Construction Algorithm", ACM Computer Graphics, Vol. 21, pp. 163-169, 1987.
14. J.C. Mullikin and P.W. Verbeek, "Surface Area Estimation of Digitized Planes", Bioimaging, Vol. 1, pp. 6-16, 1993.
15. X. Zeng, L.H. Staib, R.T. Schultz and J.S. Duncan, "Segmentation and Measurement of the Cortex from 3-D MR Images Using Coupled-Surfaces Propagation", IEEE Transactions on Medical Imaging, Vol. 18, pp. 927-937, 1999.
16. G. Windreich and N. Kiryati, "Voxel-based Surface Area Estimation", abstract, in Dagstuhl Seminar Report no. 339, Schloss Dagstuhl, Germany, 2002.
Perimeter and Area Estimations of Digitized Objects with Fuzzy Borders
Nataša Sladoje¹, Ingela Nyström¹, and Punam K. Saha²
¹ Centre for Image Analysis, Uppsala, Sweden {natasa,ingela}@cb.uu.se
² MIPG, Dept. of Radiology, University of Pennsylvania, Philadelphia, PA, USA [email protected]
Abstract. Fuzzy segmentation methods have been developed in order to reduce the negative effects of the unavoidable loss of data in the digitization process. These methods require the development of new image analysis methods that handle grey-level images. This paper describes the first step in our work on developing shape analysis methods for fuzzy images: the investigation of several measurements on digitized objects with fuzzy borders. The performance of perimeter, area, and P²A measure estimators for digitized disks and digitized squares with fuzzy borders is analyzed. The method we suggest greatly improves the results obtained from crisp (hard) segmentation, especially in the case of low resolution images.
Keywords: Fuzzy shape representation, measurement, accuracy, precision
1 Introduction
It has become clear that some of the problems caused by crisp (hard) segmentation of a grey-level image (leading to a binary image) may be solved by using fuzzy segmentation instead (see, e.g., [7]), i.e., by performing image analysis directly on a grey-level image via its corresponding fuzzy segmented image. To date, very little has been published on the development of shape analysis methods that can handle fuzzy segmented images. We are interested in estimating quantitative properties of fuzzy objects. In particular, we would like to investigate the behaviour of perimeter estimation of a fuzzy object digitized at relatively low resolution. A comparative evaluation of several estimators of the length of a binary digitized curve is presented in [3]. Generally, they perform asymptotically very well, but produce either over- or under-estimation at low resolution. We believe that perimeter measures computed from a fuzzy segmentation could provide better results. We also expect improved area estimations. In the context of computing perimeter and area estimators, it is natural to analyze how they affect the P²A shape descriptor, which, for a shape S, is calculated as
The first author is supported by a grant from the Swedish Institute.
I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 368–377, 2003. © Springer-Verlag Berlin Heidelberg 2003
Fig. 1. Examples of digitized objects with crisp (top) and fuzzy border (bottom).
P²A(S) = perimeter²(S) / (4π · area(S)).

This is a measure of the compactness of a two-dimensional (2D) object, closely related to the well-known isoperimetric inequality, (perimeter(S))² ≥ 4π · area(S). According to this inequality, for any continuous 2D object, P²A is larger than or equal to 1. The P²A value of an object decreases as its compactness increases, and the lower limit is reached for a disk, the most compact object. However, for digitized 2D objects, the P²A measure can be less than 1, even for non-circular objects. We expect that fuzzification of a digital object increases its similarity to the continuous analogue shape.
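As a quick sanity check on this definition (our own illustration, not code from the paper), the P²A measure can be computed directly from perimeter and area values; a continuous disk reaches the lower limit 1:

```python
import math

def p2a(perimeter: float, area: float) -> float:
    """Compactness measure P2A(S) = perimeter^2(S) / (4*pi*area(S))."""
    return perimeter ** 2 / (4.0 * math.pi * area)

# A continuous disk of radius r reaches the lower limit of the
# isoperimetric inequality: (2*pi*r)^2 / (4*pi * pi*r^2) = 1.
r = 5.0
print(p2a(2 * math.pi * r, math.pi * r ** 2))   # ≈ 1.0

# A continuous square of side a: (4a)^2 / (4*pi*a^2) = 4/pi ≈ 1.273.
a = 3.0
print(p2a(4 * a, a * a))
```

For digitized shapes the two arguments are replaced by the estimates discussed in the following sections, which is what can push the measured value below 1.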
2 Background and Related Works
We assume that, during a segmentation process, most of the image points can be classified as either object or background, but that for some of the points this discrimination is hard to make. Such points are often located around the border of an object. One adequate way to treat these points is to determine the extent of their membership to the object/background, i.e., to define the object as a fuzzy subset. In this study, we compute the membership value of a pixel (a point in a digitized image) as the fraction of its area belonging to the original object, where the discrete analogue of the area is used. The area coverage of a pixel is expressed as the number of subpixels, within the candidate pixel, having their centroids inside the object. This is achieved by increasing the resolution of the image (i.e., sub-sampling). Examples of digital objects obtained this way are presented in Figure 1. As a starting point, we utilize some already known approaches for measuring the perimeter of a fuzzy subset. We are particularly interested in the results of Rosenfeld [6] and Bogomolny [1]. Both papers are related to fuzzy subsets in the
continuous domain, and utilize the gradient of a fuzzy membership function for the calculation of the perimeter of a set. In this paper, we propose a new gradient estimator for the discrete case.

2.1 Perimeter of Fuzzy Subsets in the Continuous Domain
It is natural to define the perimeter of a digital fuzzy subset with respect to the definition of the perimeter of a fuzzy subset in the continuous domain. The notions of area and perimeter of a (continuous) 2D fuzzy subset were introduced in [6] as generalizations of their hard analogues. However, some simple interrelations, e.g., the isoperimetric inequality, that hold in the crisp case do not hold if the perimeter and area are defined as in [6]. This fact initiated further research and resulted in a modified definition of the perimeter of a fuzzy subset [1]. We are interested in the results related to fuzzy step subsets, which were considered in both [6] and [1]. We utilize the following definitions, cited from [10] (Definition 1) and [1] (Definitions 2–5).

Definition 1 A fuzzy subset M of a reference set X ⊆ Rⁿ is a set of ordered pairs M = {(x, µ_M(x)) | x ∈ X}, where µ_M : X → [0, 1] is the membership function of M in X.

Definition 2 A set S, given by its membership function µ_S : R² → [0, 1], is a fuzzy step subset if
(i) there exist (crisp) open sets S₁, ..., S_{n+1}, n = n(µ_S), of which all but one (say, S_{n+1}) are bounded;
(ii) S_i ∩ S_j = ∅, i ≠ j;
(iii) ∪_{i=1}^{n+1} S̄_i = R², where S̄ denotes the closure of a (crisp) set S in the Euclidean topology of R²;
(iv) if i ≠ j, then S̄_i ∩ S̄_j = ∪_{k=1}^{n_ij} B_ijk, where B_ijk is a rectifiable Jordan arc of length l(B_ijk);
(v) µ_S(x) = s_i for x ∈ S_i, i.e., for i = 1, 2, ..., n + 1, S_i is iso-membership valued, and s_{n+1} = 0.

Note: A fuzzy digital image is a fuzzy step subset, where the sets S_i are determined with respect to the connected sets of pixels having the same membership (grey-level) value.

Definition 3 The area A(M) of a fuzzy subset M of a reference set X, given by its membership function µ_M, is A(M) = ∫_X µ_M(x) dx.

Definition 4 (Rosenfeld) The perimeter P(S) of a fuzzy step subset S, given by its membership function µ_S, is

P(S) = Σ_{i,j=1, i<j}^{n+1} Σ_{k=1}^{n_ij} |s_i − s_j| · l(B_ijk).
Definition 5 (Bogomolny) The perimeter P(S) of a fuzzy step subset S, given by its membership function µ_S, is

P(S) = Σ_{i,j=1, i<j}^{n+1} Σ_{k=1}^{n_ij} |√s_i − √s_j| · l(B_ijk).
For the notation used in Definitions 4 and 5, see Definition 2. The isoperimetric inequality does not hold if the perimeter is defined by Definition 4, but it holds if Definition 5 is used. Both definitions reduce to the usual definition of a perimeter in the case of a crisp set.

2.2 Perimeter of Crisp Sets in the Discrete Domain
The direct application of Definitions 4 and 5 assumes estimation of the length of the border line, l(B_ijk), between two neighbouring sets of pixels, S_i and S_j, having different grey-levels. Consequently, the accuracy of the perimeter estimation of a fuzzy subset strongly depends on the accuracy of the estimation of the length of a line in a binary digitized set. The results of the comparative evaluation of length estimators presented in [3] show that the length estimators which are multigrid convergent, i.e., which perform very well if the image resolution is high, are all global. It is also noted that they have disadvantages when applied to low-resolution images. In this paper we focus on the evaluation of estimators at low resolutions. We are not interested in multigrid convergence, but instead use an intuitively simple local approach that relies on measuring elementary moves within the path B_ijk, and that is naturally incorporated in the gradient-based method for perimeter estimation that we suggest.
3 Perimeter of a Digital Fuzzy Subset
The perimeter of a digital fuzzy subset is obtained by calculating a measure related to the gradient of the membership function. This measure is determined by summing, for each pixel, the difference between membership values of neighbouring pixels multiplied by the length of the border line between every two neighbouring pixels. It is assumed that the digital objects we work with are fuzzy (only) on the border; the membership value of the pixels is equal to 1 in the inner – “central” – area of the object, and “radially” decreases toward the border. This intuitively follows from the fuzzification method that we apply and allows us to assume that the objects we work with fulfil the following property: Definition 6 (Local fuzzy convexity property) A fuzzy digital object has a local fuzzy convexity property if the 3 × 3 neighbourhood of each pixel in the image is a convex fuzzy subset. (For the definition of a convex fuzzy subset, see, e.g., [9].)
z1 z2 z3
z4 z5 z6
z7 z8 z9

Fig. 2. The 3 × 3 neighbourhood of a pixel z5 in a 2D digital image.
Note: A convex fuzzy subset has the local fuzzy convexity property, but the reverse does not hold in general.
For each pixel, an increase of membership values in its 3 × 3 neighbourhood may occur only in the directions contained in the convex angle positioned at the observed pixel. Consequently, the estimated gradient at each point of the fuzzy object can be computed in the way described in the sequel. Let the 3 × 3 neighbourhood of a pixel z5 be denoted as in Figure 2. To compute the increase of the membership function, i.e., the estimated gradient at z5, we first calculate

d_hor = max{µ(z6) − µ(z5), µ(z4) − µ(z5), 0},
d_vert = max{µ(z2) − µ(z5), µ(z8) − µ(z5), 0},

and then assign d_max = max{d_hor, d_vert} and d_min = min{d_hor, d_vert}. The contribution per(z5) of the observed pixel z5 to the estimated perimeter is determined as

per(z5) = d_min · b-step + (d_max − d_min) · a-step,   (1)

where a-step and b-step are the estimates (weights) of the isothetic and the diagonal distance between two neighbouring pixels, respectively; see [2]. The perimeter P(S) of a fuzzy object S in the image I is calculated as

P(S) = Σ_{z∈I} per(z).   (2)
Locally, equation (1) corresponds to Definition 4. If the square root of the membership value is used instead, the result is in accordance with Definition 5. To optimize the perimeter estimation of a fuzzy object, methods and results already successfully applied in the binary case are used. For the a- and b-step weights, we use the coefficients a_MSE^(n→∞) ≈ 0.948 and b_MSE^(n→∞) ≈ 1.343 ([4], [5]), which minimize the expected mean square error (MSE) for measurement of the length of long line segments (n → ∞) and give an unbiased estimate. It is shown by Dorst and Smeulders [4] that such a choice of the coefficients provides the best linear unbiased estimator (BLUE for short) for straight lines and that it also performs well when applied to curved lines. In the binary case, equation (2) reduces to the perimeter estimator where the perimeter corresponds to the outer boundary of an object, which leads to an over-estimation of the real perimeter, especially for small objects. In that case, the perimeter estimation is highly improved if a (negative) correction term is used. We find that the same holds for fuzzy perimeter estimation. In other words,
the subtraction of an appropriately chosen constant value from the estimated perimeter compensates for the over-estimation caused by digitization, particularly for small objects. We have determined the correction term CorrTerm as the difference between the mean of the estimated perimeter values of 10,000 (hard) disks of radius 1 pixel, randomly positioned inside a pixel, and the real perimeter value of a disk of the same size. We obtained CorrTerm = −0.689078. The perimeter of a fuzzy object is finally calculated by using the following definition.

Definition 7 The perimeter P_corr(S) of a fuzzy object S is given by the equation

P_corr(S) = Σ_{z∈I} per(z) − 0.689078,
where per(z) is the contribution to the perimeter of a pixel z in the image I. Our choice of correction term, based on the specific shape of a disk, should not be seen as shape-dependent (however, the possibility of further adjustment of the estimation results for some specific cases should not be excluded). The disk, being the most compact shape and having a smooth boundary, seems to be a reasonable choice for the “calibration” of the experimental results.
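The pipeline described so far can be condensed into a short sketch (our own illustrative code with hypothetical names, not the authors' implementation): a disk is digitized with a fuzzy border by sub-sampled area coverage as in Section 2, its area is then estimated as the sum of membership values, and its perimeter by equations (1) and (2) with the correction term of Definition 7.

```python
import math
import numpy as np

A_STEP, B_STEP = 0.948, 1.343   # BLUE isothetic/diagonal weights [4, 5]
CORR_TERM = 0.689078            # disk-calibrated correction (Definition 7)

def fuzzy_disk(size, cx, cy, r, subsample=8):
    """Membership image of a disk: each pixel's value is the fraction of
    its subsample x subsample subpixel centroids lying inside the disk."""
    off = (np.arange(subsample) + 0.5) / subsample
    ox, oy = np.meshgrid(off, off)
    m = np.empty((size, size))
    for y in range(size):
        for x in range(size):
            m[y, x] = np.mean((x + ox - cx) ** 2 + (y + oy - cy) ** 2 <= r * r)
    return m

def fuzzy_perimeter(m):
    """Sum of per-pixel gradient contributions per(z), equations (1)-(2),
    minus the correction term of Definition 7."""
    p = np.pad(m, 1)                       # zero (background) frame
    total = 0.0
    for y in range(1, m.shape[0] + 1):
        for x in range(1, m.shape[1] + 1):
            c = p[y, x]
            d_hor = max(p[y, x + 1] - c, p[y, x - 1] - c, 0.0)
            d_vert = max(p[y + 1, x] - c, p[y - 1, x] - c, 0.0)
            d_max, d_min = max(d_hor, d_vert), min(d_hor, d_vert)
            total += d_min * B_STEP + (d_max - d_min) * A_STEP
    return total - CORR_TERM

img = fuzzy_disk(32, 16.3, 15.8, r=10.0)
print(img.sum())             # fuzzy area estimate, close to pi * 10^2
print(fuzzy_perimeter(img))  # close to 2 * pi * 10
```

In the crisp special case (memberships 0/1) the same code reduces to the weighted chain-code estimator: each isothetic border move contributes a-step and each diagonal move b-step.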
4 Results
We have performed tests for 100 disks for each observed real-valued radius (the centre is randomly positioned inside a pixel) and for 1,000 randomly positioned squares for each observed real-valued side length (for each size, 100 random centre positions, each with 10 random rotations between 0 and 45 degrees, are used). The experiment is repeated for different sub-sample factors (1, 4, 8, and 16). Note that sub-sample factor 1 corresponds to hard segmentation and that it is meaningless to increase the sub-sample factor above 16 for 8-bit pixel values (16 × 16 = 256 = 2⁸). We focus on the results obtained for disks with radius up to 20 pixels and for squares with side length up to 30 pixels. We present results obtained for area, perimeter, and P²A measure estimations. For each of the parameters, we determine the mean of the estimated values, as well as the maximal and minimal estimated values.

4.1 Area Estimation
The results obtained for area estimation of both digital disks and digital squares with fuzzy borders are very encouraging. Even though the number of pixels in the object gives a very good estimate of the area of a (hard) object, the precision of this estimate is rather low for small objects. According to the results presented in Figure 3, for sub-sample factor 1 (corresponding to the binary case), we cannot expect the maximal error to be less than 5% of the real value if the radius of the disk is less than 7 pixels. The digitized squares are even more affected by
Fig. 3. Area estimation of small size objects randomly positioned in a grid. Results for disks (left) and squares (right).
the problem of imprecise area estimation than the disks, because of the possible appearance of straight line segments aligned with the grid. In that case, theoretically, the order of the area estimation error is equal to the number of points on the straight line segment. Introducing fuzziness in the segmentation procedure and using the analogous estimate (the sum of the pixel values having their centroid within the object, see Definition 3), we get much better results. According to the experiments done for sub-sample factor 8, the area estimation error is less than 1% for disks with radius slightly larger than 1 pixel, while the improvement is even more significant in the case of digitized squares, where the error is reduced to less than 1% for squares with side length 4 pixels. The precision of this unbiased estimation increases both with the size of the object and with the sub-sample factor.

4.2 Perimeter Estimation

Perimeter estimation of digitized objects is our main topic. The disadvantages of the available estimators are their low precision for small objects and their biased behaviour [3]. We manage to get better estimations by applying fuzzy segmentation both to a disk and to a square, and by using the method suggested by Definition 7, incorporating either Definition 4 or Definition 5. In order to test Bogomolny's approach (Definition 5), we used the image where the grey-level at each point is determined by the square root of the membership of the point, instead of the membership value itself. In this way, (strict) object points and background points are not affected, but the grey-levels of the points on the (fuzzy) border are slightly increased. Our methods based on Definitions 4 and 5 considerably improve the precision of perimeter estimation of small objects. Results for digitized disks and digitized squares are shown in Figure 4. The error becomes less than 1% for disks with radius of about 13 pixels, even with sub-sample factor 4.
The estimation is slightly biased (the real perimeter value is over-estimated) if Definition 4 is applied. The results obtained for randomly positioned and randomly rotated
Fig. 4. Perimeter estimations of disks (top) and squares (bottom) randomly positioned in a grid. (a, c) Definition 4 used. (b, d) Definition 5 used.
squares (the rotation angle is uniformly distributed over the interval [0°, 45°]) of different side lengths are presented in Figure 4(c) and (d). In our opinion, the estimation method based on Definition 5 is rather non-intuitive, because the underlying image transformation (application of the square root function) is not clearly motivated. Moreover, it does not provide any improvement of the perimeter estimation for the disks; actually, the over-estimation for the disks is much higher compared to the results based on Definition 4, even though the results of the square perimeter estimation are rather good. Consequently, we see no strong reasons to suggest Bogomolny's approach for perimeter estimation. We suggest selecting a method expected to be reasonably reliable when applied to general (convex) shapes; according to the results we obtained, our perimeter estimation method based on Definition 4 seems to be a good choice, considerably improving estimation precision even in the very delicate case of a shape bounded by straight line segments aligned with the grid.
Fig. 5. P²A measure estimation for digitized disks. Left: The results based on Definition 4. Right: The results based on Definition 5.
4.3 P²A Measure Estimation
The results presented in Figure 5 (digitized disks) correspond to P²A measure estimation. We conclude that the estimate of the P²A measure for the disks, based on Definition 4, is rather close to the true value (which is 1 for real disks). Not only is the precision of the estimation improved by introducing fuzzy segmentation (i.e., objects with fuzzy borders), but the P²A measure is larger than 1, i.e., the isoperimetric inequality is satisfied, for disks with radius slightly larger than 3 pixels at sub-sample factor 4. This is not the case for hard objects; even though the mean estimated P²A value is above 1, the estimation is rather imprecise, and results lower than 1 (violating the isoperimetric inequality) may be obtained even for disks with radius larger than 20 pixels. The main intention of Bogomolny's approach is to define the perimeter of a (continuous) fuzzy subset in such a way that the P²A measure is never less than 1 and is equal to 1 for disks. By using Definition 5 together with Definition 7, analogous results should be obtained in the discrete case. However, the P²A measure resulting from such a perimeter estimation clearly over-estimates the correct value for digital disks, and the over-estimation increases as the sub-sample factor increases. Since the isoperimetric inequality should reduce to equality for disks, we can conclude that the results are even worse than in the binary case.
5 Comments and Conclusions
A fuzzy subset representation of a digitized object seems promising for improving the precision of estimates of several quantitative properties. The estimates of the area, perimeter, and P²A (compactness) measure for digitized disks and squares are generally improved when fuzzy membership values, instead of just 0 or 1, are allowed for the pixels. The improvements from using the fuzzy representation have been found to be more significant at low resolution. Based on the investigation in Section 4.2, we suggest the perimeter estimation method based on Definition 4
and our Definition 7 as a general and natural one. An important property of this method is that no hard segmentation, which is usually difficult to obtain, is needed; the estimations are based on the fuzzy (border of an) object. This makes our method attractive for applications where, e.g., the imaging devices produce images with grey-levels proportional to the picture element coverage, so that the grey-levels can be used for defining the membership values of a fuzzy object. This is exactly the approach we used in developing the theoretical background for the shape analysis methods we suggest. The next step is to study the results of applying the proposed method to non-convex shapes and to shapes with fuzziness appearing not only on the object border. Also, generalizing the method to 3D fuzzy shapes is of interest. The primary goal of this work is to apply the estimation methods to real images, where the fuzziness of an object is obtained as a result of fuzzy segmentation methods [8].
Acknowledgments Prof. Gunilla Borgefors and Dr. Joakim Lindblad, both at the Centre for Image Analysis, Uppsala, Sweden, are gratefully acknowledged for their scientific support.
References
1. A. Bogomolny. On the perimeter and area of fuzzy sets. Fuzzy Sets and Systems, 23:257–269, 1987.
2. G. Borgefors. Distance transformations in digital images. Computer Vision, Graphics, and Image Processing, 34:344–371, 1986.
3. D. Coeurjolly and R. Klette. A comparative evaluation of length estimators. In Proceedings of International Conference on Pattern Recognition (ICPR 2002), volume IV, pages 330–334. IEEE Computer Society, August 2002.
4. L. Dorst and A. W. M. Smeulders. Length estimators for digitized contours. Computer Vision, Graphics, and Image Processing, 40:311–333, 1987.
5. Z. Kulpa. Area and perimeter measurement of blobs in discrete binary pictures. Computer Graphics and Image Processing, 6:434–454, 1977.
6. A. Rosenfeld and S. Haber. The perimeter of a fuzzy subset. Pattern Recognition, 18:125–130, 1985.
7. P. K. Saha and J. K. Udupa. Relative fuzzy connectedness among multiple objects: Theory, algorithms, and applications in image segmentation. Computer Vision and Image Understanding, 82(1):42–56, Apr. 2001.
8. N. Sladoje, I. Nyström, and P. K. Saha. Measuring perimeter and area in low resolution images using a fuzzy approach. In J. Bigun and T. Gustavsson, eds., Proc. of 13th Scandinavian Conference on Image Analysis (SCIA 2003), Göteborg, Sweden, vol. 2749 of LNCS, pp. 853–860. Springer-Verlag, 2003.
9. X. Yang. Some properties of convex fuzzy sets. Fuzzy Sets and Systems, 72:129–132, 1995.
10. L. Zadeh. Fuzzy sets. Information and Control, 8:338–353, 1965.
Geodesic Object Representation and Recognition

A. Ben Hamza and Hamid Krim

Department of Electrical and Computer Engineering
North Carolina State University, Raleigh, NC 27695, USA
Abstract. This paper describes a shape signature that captures the intrinsic geometric structure of 3D objects. The primary motivation of the proposed approach is to encode a 3D shape into a one-dimensional geodesic distribution function. This compact and computationally simple representation is based on a global geodesic distance defined on the object surface, and takes the form of a kernel density estimate. To gain further insight into the geodesic shape distribution and its practicality in 3D computer imagery, some numerical experiments are provided to demonstrate the potential and the much improved performance of the proposed methodology in 3D object matching. This is carried out using an information-theoretic measure of dissimilarity between probabilistic shape distributions. Keywords: Geodesic shape distribution, 3D object representation and matching, Jensen-Shannon divergence.
1 Introduction
Three-dimensional objects consist of geometric and topological information, and their compact representation is an important step towards a variety of imaging applications, including indexing, retrieval, and matching in a database of 3D models. The latter will be the focus of the present paper. There are two major steps in object matching. The first step involves finding a reliable and efficient shape representation or descriptor, and the second step is the design of an appropriate dissimilarity measure for comparing shape representations. Most three-dimensional shape matching techniques proposed in the literature of computer graphics, computer vision, and computer-aided design are based on geometric representations, which represent the features of an object in such a way that the shape dissimilarity problem reduces to the problem of comparing two such object representations. Feature-based methods require that features be extracted and described before two objects can be compared. Among feature-based methods, one popular approach is graph matching, where two objects are represented by their graphs composed of vertices and edges. An efficient representation that captures the topological properties of 3D objects is the Reeb graph descriptor proposed by Shinagawa et al. [1]. The vertices of the Reeb graph are the singular points of a function defined on the underlying object [1,2]. These singularities carry important information for further operations, such as image registration, shape analysis, surface evolution, and object recognition [3,4,5].

I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 378–387, 2003. © Springer-Verlag Berlin Heidelberg 2003
An alternative to feature-based representations is the shape distribution developed by Osada et al. [6]. The idea here is to represent an object by a global histogram based on the Euclidean distance defined on the surface of the object. The shape matching problem is then solved by computing a dissimilarity measure between the shape distributions of two arbitrary objects. This approach is computationally stable and relatively insensitive to noise. However, because of the unsuitability of the Euclidean distance for nonlinear manifolds, the shape distribution does not capture the nonlinear geometric structure of the data. In this paper, we propose a new approach to object matching based on a global geodesic measure. The key idea behind our technique is to represent an object by a probabilistic shape descriptor, called the geodesic shape distribution, built from the global geodesic distance between two arbitrary points on the surface of an object. In contrast to the Euclidean distance, which is better suited to linear spaces, the geodesic distance has the advantage of being able to capture the (nonlinear) intrinsic geometric structure of the data. The geodesic shape distribution may be used to facilitate representation, indexing, retrieval, and object matching in a database of 3D models. More importantly, the geodesic shape distribution provides a new way of looking at the object matching problem by exploring the intrinsic geometry of the shape. The matching task therefore becomes a one-dimensional comparison problem between probability distributions, which is much easier than comparing 3D structures. Object matching may be carried out by dissimilarity measure calculations between the corresponding geodesic shape distributions, and it is accomplished in a highly efficient way. Information-theoretic measures provide quantitative entropic divergences between two probability distributions.
A common entropic dissimilarity measure is the Kullback-Leibler divergence [7], which has been successfully used in many applications, including indexing and image retrieval [8]. Another entropy-based measure is the Jensen-Shannon divergence, which may be defined between any number of probability distributions [9]; it has been applied to a variety of signal/image processing and computer vision applications, including DEM image matching [10] and ISAR image registration [11]. The rest of this paper is organized as follows. The next section is devoted to the problem formulation. Section 3 describes the representation step of our proposed technique. In Section 4, we present the Jensen-Shannon divergence and show its attractive properties as a dissimilarity measure between probability distributions. In Section 5, we provide numerical simulations to show the power of the geodesic shape distribution for 3D object matching. Finally, Section 6 contains some conclusions and a brief outline of possible future work.
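For concreteness, the two-distribution case of the Jensen-Shannon divergence can be written in terms of Shannon entropies; the sketch below is our own generic illustration (not the paper's implementation), showing that it is symmetric, always finite, and bounded by ln 2:

```python
import math

def jensen_shannon(p, q):
    """Jensen-Shannon divergence between two discrete probability
    distributions p and q: JS(p, q) = H((p + q)/2) - (H(p) + H(q))/2,
    where H is the Shannon entropy (natural logarithm)."""
    def entropy(d):
        return -sum(x * math.log(x) for x in d if x > 0)
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2

print(jensen_shannon([0.5, 0.5], [0.5, 0.5]))   # identical: 0.0
print(jensen_shannon([1.0, 0.0], [0.0, 1.0]))   # disjoint: ln 2 ≈ 0.6931
```

Unlike the Kullback-Leibler divergence, no term diverges when one distribution assigns zero probability where the other does not, which is convenient when comparing empirical shape distributions.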
2 Problem Statement
Three-dimensional objects are usually represented as triangular meshes in computer graphics and computer-aided design. A triangle mesh is a pair M = (V, T), where V = (v₁, ..., v_m) is the set of vertices, and T = (T₁, ..., T_n) is the set of triangles.
In scientific visualization and analysis, a triangle mesh is often too large to be examined without simplification. One way to overcome this limitation is to represent a triangle mesh by surface features which can easily be computed and which effectively characterize the global surface shape. The centroids of the set of triangles T are desirable features that can efficiently be computed and that have a global significance for the surface shape representation, as illustrated in Fig. 1. In addition, there is a well defined correspondence between a centroid and its corresponding region (triangle), as depicted in Fig. 1. It is important to point out that centroid-based methods have been used in a variety of computer vision applications, including clustering. In particular, the K-means algorithm is one of the most widely used centroid-based clustering techniques [13]. Unless we establish a meaningful measure of distance between the centroids of a triangle mesh, no meaningful exploration of the underlying object structure is possible. In order to take into account the interaction between the centroids, we compute a pairwise distance measure g(c_i, c_j) from any centroid c_i to all the other centroids c_j ∈ C. Fig. 1 illustrates an arbitrary distance between two centroids. Notice that the distance g need not be a Euclidean metric.
Fig. 1. Distance between two arbitrary centroids of a 3D human head.
To obtain a global measure of the shape M, we simply integrate over all centroids. More precisely, we define a function f : C ⊂ M → R such that

f(c_i) = (1/|C|) ∫_{c_j} g(c_i, c_j) dc_j,   (1)

where dc_j denotes the area element that contains the centroid c_j, that is, in the discrete domain, dc_j = area(T_j), the area of the triangle T_j, and |C| =
Σ_{j=1}^{n} area(T_j) is the total area of the surface M. The function f clearly represents a global measure of the shape, and therefore to each triangle mesh M we will assign its global measure f. The problem addressed in this paper can now be concisely described by the following statement: Given two 3D objects M₁ and M₂ to be matched, find their global measures f₁ and f₂, and then, for a computed dissimilarity measure D(f₁, f₂), determine the relative match between them. In other words, the dissimilarity between two objects measures “how different they are”, and a smaller value of D means that the two objects are more similar. Fig. 2 depicts a block-diagram of the proposed framework.
Fig. 2. Block diagram of the proposed methodology.
3 Proposed Method
The basic idea behind the shape descriptor is to characterize a 3D object by a one-dimensional function that helps us discriminate between objects in a database of 3D models.

3.1 Global Geodesic Shape Function
The Reeb graph concept has been shown to be very effective in modeling 3D objects based on cross sections, such as MRI or CT images. It is better adapted to modeling applications where height is of special interest, such as terrain imaging. The height function, however, has limitations when used as an object signature for matching, indexing, or retrieval of arbitrary 3D objects.
A. Ben Hamza and Hamid Krim
The main reason is that the height function is not rotationally invariant. To overcome these limitations, we propose a global geodesic function defined on the object surface as follows. Let c_i and c_j be two points (centroids) on a triangle mesh M. The geodesic distance g(c_i, c_j) between c_i and c_j is the shortest length L(\gamma) = \int_a^b \|\gamma'(t)\| \, dt of a smooth curve \gamma : [a, b] → M such that \gamma(a) = c_i and \gamma(b) = c_j. The geodesic distance may be viewed locally as the Euclidean one, d_E(c_i, c_j) = \|c_i - c_j\|, and it is clearly invariant to rigid transformations. Inspired by the geodesic-based representation for 3D topology matching proposed by Hilaga et al. [12], we define a global shape function f : C → R expressed in terms of the rotationally invariant (squared) geodesic distance as follows:

    f(c_i) = \frac{1}{|C|} \int_{c_j} g(c_i, c_j)^2 \, dc_j .   (2)

The primary motivation behind the geodesic distance is to overcome the limitations of the Euclidean distance, which is linear in nature and therefore cannot capture nonlinear structures in the underlying object. It is worth noting that the geodesic distance is not only rotationally invariant but also invariant to isometric transformations of the surface [16]. Unlike the Euclidean distance, which is basically a straight line between two points in 3D space, the geodesic distance captures the global nonlinear structure and the intrinsic geometry of the data, as illustrated in Fig. 3. The Euclidean distance between two arbitrary points on a nonlinear manifold is just the straight segment connecting them, whereas the geodesic distance, the shortest curve along the surface connecting them, clearly reflects the nonlinear structure of the object.

Geodesic Distance Calculation. Given the set of centroids C = {c_1, . . .
, c_n} of a triangle mesh M, the geodesic distance calculation is based on a graph-based approach recently used to compute the isometric feature mapping (Isomap) for multidimensional scaling on nonlinear manifolds [14]. The algorithm has two main steps: (i) construct a neighborhood graph by connecting each centroid to its k nearest neighbors, and link these neighboring centroids by edges whose weights equal the Euclidean distances; (ii) compute the geodesic distances (shortest paths) between all pairs of the n points in the constructed graph using Dijkstra's or Floyd's algorithm. Note that the geodesic distance on triangulated surfaces may also be computed efficiently using the fast marching method introduced by Kimmel and Sethian [17]. In discrete form, the geodesic shape function reduces to

    f = \frac{1}{|C|} G a = \frac{1}{a^T \mathbf{1}} G a ,   (3)

where G = (g_{ij}^2) is the (squared) geodesic distance matrix of size n × n, and a = (a_1, . . . , a_n)^T is an n × 1 vector of triangle areas, i.e., a_j = area(T_j). The matrix G = (g_{ij}^2) is symmetric, with zeros on the diagonal and positive off-diagonal elements. Note that the geodesic shape function can be expressed as a geodesic shape vector X = (X_1, . . . , X_n), where X_i = f(c_i). This vector may be viewed as a shape descriptor that can be used for 3D shape comparison.

Fig. 3. Euclidean vs. geodesic distance on a nonlinear manifold.
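Steps (i) and (ii) above, followed by Eq. (3), can be sketched as follows for a small mesh (dense n × n matrices, NumPy, and binary-heap Dijkstra; the symmetrization of the k-nearest-neighbor graph and the function name are our choices, not the authors'):

```python
import heapq
import numpy as np

def geodesic_shape_vector(centroids, areas, k=6):
    """Isomap-style geodesic computation: (i) k-nearest-neighbor graph
    with Euclidean edge weights, (ii) all-pairs shortest paths by
    Dijkstra, then f = (1/|C|) G a with G the squared distances."""
    n = len(centroids)
    d = np.linalg.norm(centroids[:, None] - centroids[None, :], axis=2)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in np.argsort(d[i])[1:k + 1]:     # k nearest neighbors of i
            adj[i].append((j, d[i, j]))         # symmetrize the graph
            adj[j].append((i, d[i, j]))
    G = np.empty((n, n))
    for s in range(n):                          # Dijkstra from every source
        dist = np.full(n, np.inf)
        dist[s] = 0.0
        heap = [(0.0, s)]
        while heap:
            du, u = heapq.heappop(heap)
            if du > dist[u]:
                continue
            for v, w in adj[u]:
                if du + w < dist[v]:
                    dist[v] = du + w
                    heapq.heappush(heap, (dist[v], v))
        G[s] = dist
    return (G ** 2) @ areas / areas.sum()       # Eq. (3), |C| = 1^T a
```

Fed with the centroids and areas of the mesh, the returned vector is the discrete shape descriptor X of Sect. 3.1.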
3.2 Global Geodesic Shape Distribution
Assume that the geodesic shape vector X of an object M is a random sample with a common (unknown) probability density function p. A common approach to approximating the probability density function p is kernel density estimation, an important data-driven tool that provides a very effective way of unraveling structure in a set of data [15]. The kernel density estimator \hat{p} is given by

    \hat{p}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right) ,   (4)

where K is a Gaussian kernel and h is the bandwidth, or window width, to be estimated. A good selection of this bandwidth is given by (see [15])

    \hat{h} = \left( \frac{243 \, R(K)}{35 \, \mu_2(K)^2 \, n} \right)^{1/5} \hat{\sigma} ,

where R(K) = \int K(t)^2 \, dt, \mu_2(K) = \int t^2 K(t) \, dt, and \hat{\sigma} = med_j{|X_j - med_i{X_i}|} is the median absolute deviation. To each 3D object represented by a triangle mesh M, we therefore associate a kernel density \hat{p}, which we will refer to as the geodesic shape distribution. This probabilistic shape descriptor represents the object information and will be used in our matching experiments. In order to compare two geodesic shape distributions, and hence to measure the performance of the proposed scheme, we describe in the next section an information-theoretic distance that quantifies the difference between two 3D shapes through their probabilistic shape descriptors.
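A sketch of the estimator of Eq. (4) with the bandwidth rule above, assuming the Gaussian kernel, for which R(K) = 1/(2√π) and μ_2(K) = 1 (the function name is illustrative):

```python
import numpy as np

def geodesic_shape_distribution(X):
    """Kernel density estimate of the geodesic shape vector X, Eq. (4),
    with the bandwidth rule of Sect. 3.2 and sigma-hat the median
    absolute deviation of X."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    sigma = np.median(np.abs(X - np.median(X)))        # MAD
    R_K, mu2_K = 1.0 / (2.0 * np.sqrt(np.pi)), 1.0     # Gaussian kernel
    h = (243.0 * R_K / (35.0 * mu2_K ** 2 * n)) ** 0.2 * sigma
    def p_hat(x):
        u = (np.asarray(x) - X[:, None]) / h           # (n, len(x))
        K = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
        return K.sum(axis=0) / (n * h)
    return p_hat, h
```

Evaluated on a fine grid, p_hat integrates to 1 up to discretization error, so two such densities can be compared by the divergence introduced next.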
4 Probabilistic Dissimilarity
Let M1 and M2 be two 3D objects with geodesic shape distributions \hat{p} and \hat{q}, respectively. Information-theoretic measures provide quantitative entropic divergences between two probability distributions. A common entropic measure is the Kullback-Leibler divergence. This dissimilarity measure, however, is not symmetric, is unbounded, and is undefined if \hat{p} is not absolutely continuous with respect to \hat{q}. To overcome these limitations, we use the Jensen-Shannon divergence D defined as

    D(\hat{p}, \hat{q}) = H\left(\frac{\hat{p} + \hat{q}}{2}\right) - \frac{H(\hat{p}) + H(\hat{q})}{2} ,

where H(\hat{p}) = -\int \hat{p}(x) \log_2 \hat{p}(x) \, dx is the differential entropy. The Jensen-Shannon divergence is a statistical distance that is very useful in quantifying differences between probability distributions or densities; in other words, this dissimilarity measure quantifies differences in shape between two arbitrary objects. Unlike the Kullback-Leibler divergence, the Jensen-Shannon divergence has the advantage of being symmetric, always defined, and generalizable to any number of probability distributions, with the possibility of assigning weights to these distributions [9]. In addition to being convex, the Jensen-Shannon divergence has been shown to be a well-adapted measure of disparity among probability distributions, and it is bounded above: D(\hat{p}, \hat{q}) ≤ \log_2(2) = 1 (see [11] for more details).
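On a common discretization grid, the divergence above can be approximated directly (a sketch; the densities are assumed to be sampled on the same equally spaced grid):

```python
import numpy as np

def jensen_shannon(p, q, dx):
    """Jensen-Shannon divergence between two densities sampled on a
    common grid with spacing dx (a discretization of the formula in
    Sect. 4). Log base 2 makes D symmetric, always defined, and
    bounded above by 1."""
    def H(r):                                   # differential entropy
        r = np.asarray(r, dtype=float)
        safe = np.where(r > 0, r, 1.0)          # 0 * log 0 := 0
        return -np.sum(np.where(r > 0, r * np.log2(safe), 0.0)) * dx
    m = 0.5 * (np.asarray(p) + np.asarray(q))
    return H(m) - 0.5 * (H(p) + H(q))
```

Identical densities give D = 0, while densities with disjoint supports attain the upper bound D = 1.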
5 Experimental Results
Object matching experiments were performed using a database of 3D models collected from the Internet. Each model is represented as a triangle mesh, and the number of centroids is set to n = 1000 for all models used in the experiments. This number was fixed by reducing the number of triangles while trying to preserve the overall shape of the triangle mesh. We experimented with different values of n, and the resulting geodesic shape distribution proved to be relatively insensitive to the choice of the number of centroids. We conducted two sets of experiments. The first set of simulations deals with objects that are topologically equivalent to a sphere (i.e., with genus equal to zero), as shown in Fig. 4. The numerical results using the Jensen-Shannon dissimilarity measure are depicted in Table 1, where the grayscale colorbar displays the grayscale colormap of this dissimilarity matrix. This grayscale colormap ranges from white (maximum similarity) to black (maximum dissimilarity), and passes
Fig. 4. First set of experiments: 3D models and their geodesic shape distributions.

Table 1. Jensen-Shannon dissimilarity results for the first set of experiments.
through the gray colors indicating the values of the matching performance. As illustrated in Table 1, the minimum dissimilarity rate is about 9%; that is, the matching rate is about 91%. The second set of experiments is similar to the first, except that the underlying objects have a different topology from those in the first set. Fig. 5 shows a set of objects with genus equal to one. Matching is achieved by the minimum Jensen-Shannon distance computations, as illustrated
Fig. 5. Second set of experiments: 3D models and their geodesic shape distributions.

Table 2. Jensen-Shannon dissimilarity results for the second set of experiments.
in Table 2. Note that the minimum dissimilarity rate is about 2%; that is, the matching rate is about 98%.
6 Conclusions and Future Work
In this paper, we proposed a new methodology for object matching. The key idea is to encode a 3D shape into a 1D geodesic shape distribution. Object matching is then achieved by calculating an information-theoretic measure of dissimilarity between the probability distributions; that is, the dissimilarity computations are carried out in a low-dimensional space of geodesic shape distributions. Finally, while the experimental results presented in this paper are very encouraging, significant additional performance gains and efficient parallelizations are still possible.
References

1. Y. Shinagawa, T.L. Kunii, and Y.L. Kergosien, "Surface coding based on Morse theory," IEEE Computer Graphics and Applications, vol. 11, no. 5, pp. 66-78, 1991.
2. A.T. Fomenko and T.L. Kunii, Topological Modeling for Visualization, Springer-Verlag, Tokyo, 1997.
3. J.J. Koenderink, Solid Shape, MIT Press, Cambridge, Massachusetts, 1990.
4. J.W. Bruce and P.J. Giblin, Curves and Singularities, 2nd ed., Cambridge University Press, 1992.
5. C. Lu, Y. Cao, and D. Mumford, "Surface evolution under curvature flows," Journal of Visual Communication and Image Representation, vol. 13, no. 1/2, pp. 65-81, 2002.
6. R. Osada, T. Funkhouser, B. Chazelle, and D. Dobkin, "Shape distributions," ACM Transactions on Graphics, vol. 21, no. 4, pp. 807-832, October 2002.
7. S. Kullback and R.A. Leibler, "On information and sufficiency," Annals of Mathematical Statistics, vol. 22, pp. 79-86, 1951.
8. R. Stoica, J. Zerubia, and J.M. Francos, "Image retrieval and indexing: A hierarchical approach in computing the distance between textured images," IEEE International Conference on Image Processing, Chicago, 1998.
9. J. Lin, "Divergence measures based on the Shannon entropy," IEEE Transactions on Information Theory, vol. 37, no. 1, pp. 145-151, 1991.
10. A.O. Hero, B. Ma, O. Michel, and J. Gorman, "Applications of entropic spanning graphs," IEEE Signal Processing Magazine, vol. 19, pp. 85-95, September 2002.
11. Y. He, A. Ben Hamza, and H. Krim, "A generalized divergence measure for robust image registration," IEEE Transactions on Signal Processing, vol. 51, no. 5, May 2003.
12. M. Hilaga, Y. Shinagawa, T. Komura, and T.L. Kunii, "Topology matching for fully automatic similarity estimation of 3D shapes," Proc. SIGGRAPH, pp. 203-212, August 2001.
13. J. MacQueen, "Some methods for classification and analysis of multivariate observations," Proc. 5th Berkeley Symposium on Mathematical Statistics and Probability, 1967.
14. J.B. Tenenbaum, V. de Silva, and J.C. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, vol. 290, pp. 2319-2323, December 2000.
15. M.P. Wand and M.C. Jones, Kernel Smoothing, Chapman and Hall, London, 1995.
16. A. Elad and R. Kimmel, "Bending invariant representations for surfaces," Proc. Computer Vision and Pattern Recognition, pp. 168-174, 2001.
17. R. Kimmel and J.A. Sethian, "Computing geodesic paths on manifolds," Proceedings of the National Academy of Sciences, vol. 95, pp. 8431-8435, 1998.
A Fast Algorithm for Reconstructing hv-Convex 8-Connected but Not 4-Connected Discrete Sets

Péter Balázs, Emese Balogh, and Attila Kuba

Department of Informatics, University of Szeged, Árpád tér 2, H-6720 Szeged, Hungary
{pbalazs,bmse,kuba}@inf.u-szeged.hu
Abstract. One important class of discrete sets for which the reconstruction from two given projections can be solved in polynomial time is the class of hv-convex 8-connected sets. The worst-case complexity of the fastest algorithm known so far for solving the problem is O(mn · min{m², n²}) [2]. However, as we show, in the case of 8-connected but not 4-connected sets we can give an algorithm with worst-case complexity O(mn · min{m, n}) by identifying the so-called S4-components of the discrete set. Experimental results are also presented in order to investigate the average execution time of our algorithm.

Keywords: Discrete tomography, reconstruction, convex and connected discrete sets.
1 Introduction
One of the most frequently studied areas of discrete tomography [8,9] is the reconstruction of 2-dimensional (2D) discrete sets from their row and column sum vectors. There are reconstruction algorithms for different classes of discrete sets (e.g., [3,4,6,7,10,11,14]). However, the reconstruction in certain classes can be NP-hard (see [15]). Since applications require fast algorithms, it is important to find reconstruction algorithms for those classes of 2D discrete sets where the reconstruction can be performed in polynomial time. We always suppose that we have some a priori information about the set to be reconstructed. The most frequently used properties are connectedness, directedness, and some kind of discrete version of convexity. One important class where the reconstruction problem from two given projections can be solved in polynomial time is the class of hv-convex 8-connected sets. Several algorithms have been developed for solving this problem [5,11]; among them, the fastest has worst-case complexity O(mn · min{m², n²}) [2]. In this paper we give an algorithm with worst-case complexity O(mn · min{m, n}) for the reconstruction problem in the class of hv-convex 8-connected but not 4-connected sets by examining the features of these sets. This paper is structured as follows. First, the necessary definitions are introduced in Section 2. In Subsection 3.1 we define the S4-components of an 8-connected but not 4-connected hv-convex set and prove some of their properties;

I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 388-397, 2003. © Springer-Verlag Berlin Heidelberg 2003
then, in Subsection 3.2, we investigate the directedness of these components. S4-components can be identified from the two given projections, as shown in Subsection 3.3. The reconstruction algorithm is presented in Subsection 3.4, and experimental results are given in Subsection 3.5.
2 Definitions and Notation
The finite subsets of Z^2 (the 2D integer lattice) are called discrete sets; their elements are called points or positions. F denotes the class of discrete sets. In the following, discrete sets will be represented by binary matrices F = (f_{ij})_{m×n}, where f_{ij} ∈ {0, 1}. Figure 1 shows a discrete set represented by the binary matrix

    F = \begin{pmatrix}
    0 & 1 & 0 & 0 & 0 & 0 \\
    0 & 1 & 0 & 0 & 0 & 0 \\
    1 & 1 & 0 & 0 & 0 & 0 \\
    0 & 1 & 1 & 0 & 0 & 0 \\
    0 & 0 & 0 & 1 & 1 & 1
    \end{pmatrix} .

For any discrete set F we define its projections by the operations H and V as follows:

    H : F → N_0^m ,   H(F) = H = (h_1, . . . , h_m) ,   where h_i = \sum_{j=1}^{n} f_{ij} ,  i = 1, . . . , m ,   (1)

and

    V : F → N_0^n ,   V(F) = V = (v_1, . . . , v_n) ,   where v_j = \sum_{i=1}^{m} f_{ij} ,  j = 1, . . . , n .   (2)
The vectors H and V are called the row and column sum vectors of F, respectively (see Fig. 1). H and V are also called the projections of F. Not every pair of vectors is the pair of projections of some discrete set. In the following we suppose that H ∈ N_0^m and V ∈ N_0^n are compatible, which means that they satisfy the following two conditions:
(i) h_i ≤ n for 1 ≤ i ≤ m, and v_j ≤ m for 1 ≤ j ≤ n;
(ii) \sum_{i=1}^{m} h_i = \sum_{j=1}^{n} v_j, i.e., the two vectors have the same total sum.
The cumulated vectors of H and V are denoted by H̃ = (h̃_1, . . . , h̃_m) and Ṽ = (ṽ_1, . . . , ṽ_n), respectively, and are defined by the recursive formulas

    h̃_1 = h_1 ,   h̃_i = h̃_{i-1} + h_i ,  i = 2, . . . , m ,   (3)
    ṽ_1 = v_1 ,   ṽ_j = ṽ_{j-1} + v_j ,  j = 2, . . . , n   (4)
(see Fig. 1). Given a class G of discrete sets, we say that the discrete set F ∈ G is unique in the class G (w.r.t. the row and column sum vectors) if there is no different discrete set F′ ∈ G for which H(F′) = H(F) and V(F′) = V(F).
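The projections and cumulated vectors amount to row/column sums and prefix sums; a small Python sketch (the function name is ours), which on the matrix of Fig. 1 reproduces the vectors shown there:

```python
def projections(F):
    """Row and column sum vectors H and V of a binary matrix F,
    Eqs. (1)-(2), together with the cumulated vectors, Eqs. (3)-(4)."""
    m, n = len(F), len(F[0])
    H = [sum(F[i][j] for j in range(n)) for i in range(m)]
    V = [sum(F[i][j] for i in range(m)) for j in range(n)]
    Hc, Vc = H[:], V[:]                  # prefix sums
    for i in range(1, m):
        Hc[i] = Hc[i - 1] + H[i]
    for j in range(1, n):
        Vc[j] = Vc[j - 1] + V[j]
    return H, V, Hc, Vc
```

The compatibility conditions (i) and (ii) can be checked directly on the returned vectors.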
Fig. 1. An hv-convex 8-connected but not 4-connected discrete set F. The projections of F are the vectors H and V; the cumulated vectors of H and V are denoted by H̃ and Ṽ.
Two points P = (p_1, p_2) and Q = (q_1, q_2) in Z^2 are said to be 4-adjacent if |p_1 - q_1| + |p_2 - q_2| = 1. The points P and Q are said to be 8-adjacent if they are 4-adjacent, or if |p_1 - q_1| = 1 and |p_2 - q_2| = 1. The sequence of distinct points (i^{(0)}, j^{(0)}), . . . , (i^{(k)}, j^{(k)}) is a 4/8-path from point (i^{(0)}, j^{(0)}) to point (i^{(k)}, j^{(k)}) in a discrete set F if each point of the sequence is in F and (i^{(l)}, j^{(l)}) is 4/8-adjacent, respectively, to (i^{(l-1)}, j^{(l-1)}) for each l = 1, . . . , k. Two points are 4/8-connected in the discrete set F if there is a 4/8-path, respectively, in F between them. A discrete set F is 4/8-connected if any two points in F are 4/8-connected, respectively, in F. A 4-connected set is also called a polyomino. The discrete set F is h-convex/v-convex if its rows/columns are 4-connected, respectively. The h- and v-convex sets are called hv-convex (see Fig. 1). We denote the classes of hv-convex 8-connected and hv-convex 4-connected discrete sets by S8 and S4, respectively. Clearly, S8 ⊃ S4 (see, e.g., Fig. 1), and so S8 \ S4 ≠ ∅. Let S8′ = S8 \ S4. In this paper we study the problem of reconstruction in the class of hv-convex 8- but not 4-connected discrete sets:

Reconstruction(S8′).
Instance: Two compatible vectors H ∈ N_0^m and V ∈ N_0^n.
Task: Construct a discrete set F ∈ S8′ such that H(F) = H and V(F) = V.

Note that the components of the row and column sum vectors of an 8-connected set cannot be zero; therefore, in the following we assume that the inputs of the above problem are vectors H ∈ N^m and V ∈ N^n.
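These definitions can be checked mechanically on a binary matrix: hv-convexity says the 1s of every row and every column are consecutive, and 4/8-connectivity is a flood fill over the chosen neighborhood. A sketch (names are ours):

```python
def is_hv_convex(F):
    """F is hv-convex iff the 1s in every row and every column
    occupy consecutive positions."""
    def runs_ok(lines):
        for line in lines:
            ones = [k for k, x in enumerate(line) if x]
            if ones and ones[-1] - ones[0] + 1 != len(ones):
                return False
        return True
    return runs_ok(F) and runs_ok(list(zip(*F)))

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]                 # 4-neighborhood
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]          # 8-neighborhood

def is_connected(F, neighbours):
    """Flood fill over the given adjacency."""
    pts = {(i, j) for i, row in enumerate(F) for j, x in enumerate(row) if x}
    if not pts:
        return True
    stack, seen = [next(iter(pts))], set()
    while stack:
        i, j = stack.pop()
        if (i, j) in seen:
            continue
        seen.add((i, j))
        stack += [(i + di, j + dj) for di, dj in neighbours
                  if (i + di, j + dj) in pts]
    return seen == pts
```

For the matrix of Fig. 1, hv-convexity and 8-connectivity hold while 4-connectivity fails, i.e., the set belongs to S8′.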
3 Reconstruction of Sets of S8′

3.1 S4-Components
Let F ∈ S8′. A maximal hv-convex 4-connected subset of F is called an S4-component of F. Clearly, the S4-components F_1, . . . , F_k of F form a uniquely determined partition of F, and the number of S4-components of F is at least 2 (see, e.g., Fig. 1, where there are two S4-components: {(5,4), (5,5), (5,6)} and {(1,2), (2,2), (3,1), (3,2), (4,2), (4,3)}).
Since F is hv-convex, the sets of row/column indices of the elements of F_1, . . . , F_k consist of consecutive integers, and they are disjoint. It follows that there is an S4-component of F, say F_1, with smallest containing discrete rectangle (SCDR) R_1 = I_1 × J_1 such that I_1 = {1, . . . , i_1} for some i_1 ≥ 1. Similarly, there is another S4-component of F, say F_2, with SCDR R_2 = I_2 × J_2 such that I_2 = {i_1 + 1, . . . , i_2} for some i_2 > i_1, and so on. Generally, there are integers 0 = i_0 < i_1 < . . . < i_{k-1} < i_k = m (k ≥ 2) such that I_l = {i_{l-1} + 1, . . . , i_l} contains the row indices of the l-th S4-component of F for each l (1 ≤ l ≤ k). Among I_1, . . . , I_k we define a relation "<" as follows. Let I, I′ ∈ {I_1, . . . , I_k}. We say that I < I′ if each element of I is less than every element of I′. Using this relation we can write shortly that

    I_1 < I_2 < . . . < I_k .   (5)
We define the same relation among J_1, . . . , J_k. In order to describe the relative positions of the S4-components of F, consider the following theorem.

Theorem 1. Let F ∈ S8′ have S4-components F_1, . . . , F_k with SCDRs I_1 × J_1, . . . , I_k × J_k (k ≥ 2) such that (5) is satisfied. Then exactly one of the following cases is possible:

    Case 1.  J_1 < J_2 < . . . < J_k ,   (6)
    Case 2.  J_1 > J_2 > . . . > J_k .   (7)
Proof. The proof is quite technical (see [1]).

In the following we say that F ∈ S8′ has type 1 if Case 1 of Theorem 1 is satisfied; otherwise, that is, if Case 2 of Theorem 1 is satisfied, it has type 2. As an example see Fig. 2.
Fig. 2. A discrete set F of type 1 and a discrete set F′ of type 2. The SCDRs are drawn with bold lines. C_F = {(2,2), (5,5), (8,8)}, C_{F′} = {(3,8), (7,4)}.
Corollary 1. Let F ∈ S8′. Then there are uniquely determined row indices 0 = i_0 < i_1 < . . . < i_k = m and column indices 0 = j_0 < j_1 < . . . < j_k = n such that I_l × J_l is the SCDR of the S4-component F_l of F for each l = 1, . . . , k (k ≥ 2), where I_l = {i_{l-1} + 1, . . . , i_l} and

    J_l = \begin{cases} \{j_{l-1} + 1, . . . , j_l\}, & \text{if } F \text{ has type 1,} \\ \{j_{k-l} + 1, . . . , j_{k-l+1}\}, & \text{if } F \text{ has type 2.} \end{cases}   (8)
3.2 Directed Discrete Sets
An 8-path in a discrete set F is an NE-path from point (i^{(0)}, j^{(0)}) to point (i^{(t)}, j^{(t)}) if each point (i^{(l)}, j^{(l)}) of the path is to the north, east, or northeast of (i^{(l-1)}, j^{(l-1)}) for each l = 1, . . . , t. SW-, SE-, and NW-paths can be defined similarly. The discrete set F is NE-directed if there is a particular point of F, called the source (which is necessarily the point (m, 1)), such that there is an NE-path in F from the source to any other point of F. Similar definitions can be given for SW-, SE-, and NW-directedness. The discrete set F in Fig. 1 is NW-directed with source (5, 6). On the basis of the following lemma, it is easy to check the directedness of discrete sets in the class S4.

Lemma 1. Let G ∈ S4 and let R = {i′ + 1, . . . , i′′} × {j′ + 1, . . . , j′′} (i′ < i′′, j′ < j′′) be its SCDR.
(i) G is SE-directed if and only if g_{i′+1, j′+1} = 1;
(ii) G is NW-directed if and only if g_{i′′, j′′} = 1;
(iii) G is SW-directed if and only if g_{i′+1, j′′} = 1;
(iv) G is NE-directed if and only if g_{i′′, j′+1} = 1.

Proof. It follows directly from the definitions. (As an example see Fig. 2.)
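Lemma 1 reduces each kind of directedness to a single corner test on the SCDR. A sketch for a polyomino given as a set of (row, col) points (assuming, as in the lemma, that the input is an hv-convex polyomino; the function name is ours):

```python
def directedness(pts):
    """Lemma 1 corner test: each directedness of an hv-convex
    polyomino is equivalent to one corner of its SCDR being filled."""
    rows = [i for i, _ in pts]
    cols = [j for _, j in pts]
    r0, r1 = min(rows), max(rows)       # i'+1 and i'' of the SCDR
    c0, c1 = min(cols), max(cols)       # j'+1 and j''
    return {"SE": (r0, c0) in pts, "NW": (r1, c1) in pts,
            "SW": (r0, c1) in pts, "NE": (r1, c0) in pts}
```

For the first S4-component of Fig. 1, {(1,2), (2,2), (3,1), (3,2), (4,2), (4,3)}, only the NW test succeeds, consistent with Theorem 2 below for a set of type 1.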
Theorem 2. Let F ∈ S8′ have S4-components F_1, . . . , F_k (k ≥ 2). If F has type 1 then F_1, . . . , F_{k-1} are NW-directed and F_2, . . . , F_k are SE-directed. If F has type 2 then F_1, . . . , F_{k-1} are NE-directed and F_2, . . . , F_k are SW-directed.

Proof. The proof is based on Corollary 1 and Lemma 1 (see [1]).

Depending on the type of F, let us define

    C_F = \begin{cases} \{(i_l, j_l) \mid l = 1, . . . , k - 1\}, & \text{if } F \text{ has type 1,} \\ \{(i_l, j_{k-l} + 1) \mid l = 1, . . . , k - 1\}, & \text{if } F \text{ has type 2,} \end{cases}   (9)
where i_1, . . . , i_{k-1} and j_1, . . . , j_{k-1} denote the uniquely determined indices mentioned in Corollary 1. That is, C_F consists of the sources of the NW-/NE-directed S4-components F_1, . . . , F_{k-1} if F has type 1/2, respectively (see Fig. 2). The knowledge of any element of C_F is useful in the reconstruction of an F ∈ S8′, as the following theorem shows.

Theorem 3. Any F ∈ S8′ is uniquely determined by its projections, its type, and an arbitrary element of C_F.

Proof. The proof is based on Theorem 3 of [12] and Theorem 2 (see [1]).
Corollary 2. If F, F′ ∈ S8′ are different solutions of the same reconstruction problem and they have the same type, then C_F ∩ C_{F′} = ∅.
3.3 Equality Positions

Let H̃ and Ṽ be the cumulated vectors of the projections of F ∈ S8′. We say that (i, j) ∈ {1, . . . , m} × {1, . . . , n} is an equality position of type 1 if h̃_i = ṽ_j. (m, n) is a trivial equality position of type 1, and in the following we omit it. We say that (i, j) ∈ {1, . . . , m} × {2, . . . , n + 1} is an equality position of type 2 if h̃_i = ṽ_n - ṽ_{j-1}. Not every equality position is in C_F, but they are useful for finding the elements of C_F (see Fig. 3).
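Since every component of H and V is at least 1, the cumulated vectors are strictly increasing, so all equality positions of type 1 can be collected by one merge-like scan in O(m + n), in the spirit of the Algorithm L_1 used later in the reconstruction (the code is our sketch, not the authors'):

```python
def equality_positions_type1(Hc, Vc):
    """All 1-indexed (i, j) with Hc[i] = Vc[j], found by a merge scan
    of the strictly increasing cumulated vectors; the trivial
    position (m, n) is omitted."""
    m, n = len(Hc), len(Vc)
    out, i, j = [], 0, 0
    while i < m and j < n:
        if Hc[i] == Vc[j]:
            if (i + 1, j + 1) != (m, n):    # skip the trivial position
                out.append((i + 1, j + 1))
            i += 1
            j += 1
        elif Hc[i] < Vc[j]:
            i += 1
        else:
            j += 1
    return out
```

On cumulated vectors consistent with Fig. 3, H̃ = (1, 4, 6, 8, 9) and Ṽ = (1, 2, 4, 6, 9), the scan returns exactly the positions (1, 1), (2, 3), and (3, 4) listed in the caption.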
Fig. 3. A discrete set F with cumulated vectors H̃ and Ṽ. (1, 1), (2, 3) and (3, 4) are equality positions of type 1. However, only (2, 3) is in C_F. (4, 2) is the only equality position of type 2, but it is not in C_F since F has type 1.

Lemma 2. Let F ∈ S8′ and let C_F be defined by (9). Then the elements of C_F are all equality positions of the same type as F.

Proof. See [1].

3.4 The Reconstruction Algorithm
Our algorithm is called Algorithm REC8' and works as follows. We first assume that the set F ∈ S8′ to be reconstructed has type 1. On the basis of Theorem 3, it is sufficient to find an arbitrary element of C_F in order to reconstruct F from its projections uniquely. By Lemma 2, the elements of C_F are equality positions of type 1. So, in order to find all solutions of the reconstruction problem, we check every equality position of type 1 to see whether it is an element of C_F; if it is, then we find a solution. The set L_1 of equality positions of type 1 can be found by comparing the cumulated row and column sums. This algorithm is called Algorithm L_1, and it is similar to the procedure used for reconstructing the spine of hv-convex polyominoes [12]. An analogous algorithm can be given to find all equality positions of type 2 (Algorithm L_2). Since the knowledge of any element of C_F is sufficient, again on the basis of Theorem 3, without losing any solution we can assume that if an investigated equality position (i, j) of type 1 is in C_F then it is the source of the first S4-component F_1, i.e., the one with the SCDR {1, . . . , i_1} × {1, . . . , j_1}, i.e., (i, j) = (i_1, j_1). On the basis of Theorem 2, this S4-component is NW-directed. Now, in order to decide whether (i, j) is the source of F_1, we try to reconstruct an hv-convex NW-directed polyomino with source (i, j). This can be done using Algorithm RecNW, which is a simple modification of the algorithm for reconstructing
directed discrete sets given in [12]. Algorithm RecNW tries to reconstruct an m × n binary matrix G from the input data H, V, and (i, j) such that the 1s of G constitute an hv-convex NW-directed polyomino with source (i, j), and the row and column sums of G in the non-zero rows and columns are equal to the corresponding elements of H and V, respectively. If RecNW can reconstruct such a G, then it also returns the upper left position (i′, j′) of the SCDR of G. If RecNW fails, there is no such binary matrix G. Now, there are two cases:
(1) RecNW fails. Clearly, in this case (i, j) cannot be the source of F_1. We continue with the investigation of the next equality position from L_1.
(2) RecNW gives a (unique) solution, i.e., it is possible to reconstruct an hv-convex NW-directed polyomino G with source (i, j) and with SCDR {i′, . . . , i} × {j′, . . . , j}, where 1 ≤ i′ ≤ i = i_1 and 1 ≤ j′ ≤ j = j_1. If (i′, j′) ≠ (1, 1) then clearly G cannot be the first S4-component of F, i.e., F_1 ≠ G, and we continue with the investigation of the next equality position from L_1. Otherwise, i.e., when (i′, j′) = (1, 1), we can assume that F_1 = G, and we try to reconstruct the 2nd, 3rd, ... S4-components iteratively. Reconstruction of the SE-directed k-th component F_k (k = 2, . . .) can be done using Algorithm RecSE. Algorithm RecSE tries to reconstruct an m × n binary matrix G from the input data H, V, and (i, j) such that the 1s of G constitute an hv-convex SE-directed polyomino with source (i, j), and the row and column sums of G in the non-zero rows and columns are equal to the corresponding elements of H and V, respectively. If RecSE can reconstruct such a G, then it also returns the lower right position (i′′, j′′) of the SCDR of G. If RecSE fails, there is no such binary matrix G. On the basis of Theorem 2, F_k must be SE-directed with source (i_{k-1} + 1, j_{k-1} + 1). We call RecSE to reconstruct such a polyomino. Again, there are two cases:
(2.1) RecSE fails.
Clearly, in this case (i_{k-1} + 1, j_{k-1} + 1) cannot be the source of F_k, which contradicts the assumption that (i, j) is the source of F_1. We continue with the investigation of the next equality position from L_1.
(2.2) RecSE gives a (unique) solution, i.e., it is possible to reconstruct an hv-convex SE-directed polyomino G with source (i_{k-1} + 1, j_{k-1} + 1) and with SCDR {i_{k-1} + 1, . . . , i′′} × {j_{k-1} + 1, . . . , j′′}, where i_{k-1} + 1 ≤ i′′ ≤ m and j_{k-1} + 1 ≤ j′′ ≤ n. Depending on the properties of G, we again have two cases:
(2.2.1) If (i′′, j′′) ≠ (m, n) then F_k cannot be the last component. Then, on the basis of Theorem 2, F_k is NW-directed, and therefore f_{i_k, j_k} = 1 (on the basis of Lemma 1). If g_{i′′, j′′} ≠ 1 then clearly F_k ≠ G, which contradicts the assumption that (i, j) is the source of F_1; we continue with the investigation of the next equality position from L_1. Otherwise, that is, when g_{i′′, j′′} = 1, we can assume that F_k = G. On the basis of Corollary 2, G cannot be the first component of any other solution of the same type; therefore (i′′, j′′) can be deleted from L_1, and we continue with the next iteration.
(2.2.2) If (i′′, j′′) = (m, n) then F_k = G and F = F_1 ∪ . . . ∪ F_k. We have found a solution, and we continue with the investigation of the next equality position from L_1 in order to find other solutions.
Fig. 4. Reconstructing sets of S8′ of type 1 with projections H = (1, 2, 2, 2) and V = (1, 2, 2, 2). L_1 = {(1, 1), (2, 2), (3, 3)}. First row: trying to build a solution from position (1, 1) by filling a row or a column in each step, the algorithm fails because there is no room to fill the last row. After this step L_1 = {(2, 2)}, since position (3, 3) can be deleted from L_1 (see Case (2.2.1) of the algorithm). Second row: testing the only remaining position (2, 2) from L_1 as the source of F_1, the only solution of type 1 is found.
Fig. 5. Four hv-convex 8-connected but not 4-connected discrete sets with the same row and column sums: H = (1, 2, 2, 2, 2, 1), V = (1, 2, 2, 2, 2, 1).
The first part of our algorithm (searching for solutions of type 1) is illustrated in Fig. 4. The second part of the algorithm, i.e., when it is assumed that F has type 2, is similar to the first part, but we investigate equality positions of type 2 instead of type 1 and try to build NE- and SW-directed components from the corresponding sources (using the algorithms RecNE and RecSW). If no solution is found after investigating all equality positions of both types, then the assumption that F ∈ S8′ is not met, i.e., there is no discrete set with the given projections which is hv-convex and 8-connected but not 4-connected. However, in some cases there can be more than one solution (see Fig. 5).

Theorem 4. The worst-case computational complexity of Algorithm REC8' is O(mn · min{m, n}). The algorithm finds all sets of S8′ with the given projections.

Proof. Every row and column index can occur in an equality position of each type at most once. This means that we have at most min{m, n} equality positions of type 1 and at most min{m, n} equality positions of type 2. Moreover, the equality positions can be found in O(m + n) time by Algorithms L_1 and L_2. Building the S4-components of F assuming that an equality position (i, j) is in C_F takes O(mn) time. We have to examine every equality position to see whether it is in C_F, so we get the execution time O(mn · min{m, n}) in the worst case.
Table 1. Average execution times in seconds of Algorithm REC8' and Algorithm C depending on the size of the matrix. Each set of test data consists of 1000 hv-convex 8-connected but not 4-connected discrete sets.

Size n × n    Algorithm REC8'   Algorithm C
20 × 20       0.000272          0.011511
40 × 40       0.001064          0.032524
60 × 60       0.002597          0.065897
80 × 80       0.004746          0.116505
100 × 100     0.007831          0.178633
On the basis of Theorems 2 and 3, the sets reconstructed by Algorithm REC8' are hv-convex and 8-connected and have the given projections H and V. On the basis of Theorem 3, any element of C_F, together with the projections and the knowledge of the type of F, is sufficient to reconstruct F uniquely. Elements of C_F are equality positions, too, on the basis of Lemma 2. Since Algorithm REC8' examines every equality position to see whether it is in C_F, the second part of the theorem follows.

3.5 Experimental Results
In 2001 E. Balogh et al. presented an algorithm with worst case complexity O(mn · min{m², n²}), which so far has the best average execution time for reconstructing hv-convex 8-connected discrete sets (Algorithm C in [2]). In order to compare the average execution times of our Algorithm REC8' and Algorithm C, we need to generate sets of S8 at random with uniform distribution. In [2] an algorithm is also given to generate hv-convex 8-connected discrete sets with fixed numbers of rows and columns with uniform distribution. The method is also suitable for generating sets of S8 with uniform distribution (we check whether the generated set is 4-connected and, if so, we simply omit it). We have generated discrete sets of S8 of different sizes and reconstructed them with both algorithms. We used a PC with an AMD Athlon processor at 1.4 GHz and 1.5 GB RAM under Red Hat Linux release 7.3. The programs were written in C++. The average execution times in seconds for obtaining all the solutions of the different test sets are presented in Table 1. The results show that not only is the worst case complexity of our algorithm better (see Theorem 4), but its average execution time was also much better on each of the five test sets.
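The rejection step described above (a candidate set is kept only when it is 8-connected but not 4-connected) can be sketched with a standard flood fill. This is an illustrative sketch, not the generator of [2]: the function names and the representation of a set as (row, col) pairs are our own, and the hv-convexity of a candidate is assumed to be guaranteed by the generator, so only the connectivity filter is shown.

```python
from collections import deque

N4 = ((1, 0), (-1, 0), (0, 1), (0, -1))
N8 = N4 + ((1, 1), (1, -1), (-1, 1), (-1, -1))

def is_connected(S, neighbours):
    """Flood fill: True if the lattice set S is connected w.r.t. `neighbours`."""
    if not S:
        return False
    start = next(iter(S))
    seen, queue = {start}, deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in neighbours:
            cell = (r + dr, c + dc)
            if cell in S and cell not in seen:
                seen.add(cell)
                queue.append(cell)
    return len(seen) == len(S)

def keep_for_S8(S):
    """Connectivity filter for S8: 8-connected but not 4-connected."""
    return is_connected(S, N8) and not is_connected(S, N4)
```

For instance, the diagonal pair {(0, 0), (1, 1)} passes the filter, while the horizontally adjacent pair {(0, 0), (0, 1)} is 4-connected and would be discarded.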
4 Conclusions
We have introduced a subclass of hv-convex 8-connected discrete sets, the class of hv-convex 8- but not 4-connected sets, and investigated the reconstruction problem in this class. We have shown that sets belonging to this class can be decomposed into so-called S4 -components which can be uniquely reconstructed. We also introduced the concept of equality positions in order to determine these components.
A Fast Algorithm for Reconstructing hv-Convex
A reconstruction algorithm has been given with worst case complexity O(mn · min{m, n}) and compared to a previous (more general) one given in [2]. It is quite surprising that the assumption that a set is 8-connected but not 4-connected yields such an improvement in the reconstruction complexity (cf. [13]). These results give us a better understanding of the reconstruction problems and hopefully lead us towards designing reconstruction algorithms for broader classes of discrete sets.
References

1. P. Balázs, E. Balogh, A. Kuba, Reconstruction of 8-connected but not 4-connected discrete sets, Technical Report, University of Szeged (2002) http://www.inf.u-szeged.hu/~pbalazs/research/research.html
2. E. Balogh, A. Kuba, Cs. Dévényi, A. Del Lungo, Comparison of algorithms for reconstructing hv-convex discrete sets, Lin. Alg. and Its Appl. 339 (2001) 23–35.
3. E. Barcucci, A. Del Lungo, M. Nivat, R. Pinzani, Reconstructing convex polyominoes from horizontal and vertical projections, Theor. Comput. Sci. 155 (1996) 321–347.
4. S. Brunetti, A. Daurat, Reconstruction of discrete sets from two or more X-rays in any direction, Proceedings of the Seventh International Workshop on Combinatorial Image Analysis (2000) 241–258.
5. S. Brunetti, A. Del Lungo, F. Del Ristoro, A. Kuba, M. Nivat, Reconstruction of 8- and 4-connected convex discrete sets from row and column projections, Lin. Alg. and Its Appl. 339 (2001) 37–57.
6. M. Chrobak, Ch. Dürr, Reconstructing hv-convex polyominoes from orthogonal projections, Information Processing Letters 69(6) (1999) 283–289.
7. A. Del Lungo, M. Nivat, R. Pinzani, The number of convex polyominoes reconstructible from their orthogonal projections, Discrete Math. 157 (1996) 65–78.
8. G.T. Herman, A. Kuba (Eds.), Discrete Tomography, Special Issue, Int. J. Imaging Systems and Techn. 9 (1998) No. 2/3.
9. G.T. Herman, A. Kuba (Eds.), Discrete Tomography: Foundations, Algorithms and Applications (Birkhäuser, Boston, 1999).
10. A. Kuba, The reconstruction of two-directionally connected binary patterns from their two orthogonal projections, Comp. Vision, Graphics, and Image Proc. 27 (1984) 249–265.
11. A. Kuba, Reconstruction in different classes of 2D discrete sets, Lecture Notes in Computer Science 1568 (1999) 153–163.
12. A. Kuba, E. Balogh, Reconstruction of convex 2D discrete sets in polynomial time, Theor. Comput. Sci. 283 (2002) 223–242.
13. L. Latecki, U. Eckhardt, A. Rosenfeld, Well-Composed Sets, Computer Vision and Image Understanding 61(1) (1995) 70–83.
14. H.J. Ryser, Combinatorial properties of matrices of zeros and ones, Canad. J. Math. 9 (1957) 371–377.
15. G.W. Woeginger, The reconstruction of polyominoes from their orthogonal projections, Information Processing Letters 77 (2001) 225–229.
Stability in Discrete Tomography: Linear Programming, Additivity and Convexity

Sara Brunetti 1 and Alain Daurat 2

1 Dipartimento di Scienze Matematiche e Informatiche, Università di Siena, Via del Capitano 15, 53100, Siena, Italy
[email protected]
2 LSIIT CNRS UMR 7005, Université Louis Pasteur (Strasbourg 1), Pôle API, Boulevard Sébastien Brant, 67400 Illkirch-Graffenstaden, France
[email protected]
Abstract. The problem of reconstructing finite subsets of the integer lattice from X-rays has been studied in discrete mathematics and applied in several fields such as image processing, data security, and electron microscopy. In this paper we focus on the stability of the reconstruction problem for some lattice sets. First we show some theoretical bounds for additive sets; then a numerical experiment using linear programming addresses stability for convex sets.

Keywords: Discrete Tomography, Linear Programming, Additivity.
1 Introduction
A lattice set is a non-empty finite subset of the integer lattice Z2. A lattice direction is a direction directed by a vector in Z2 \ {0}; it can also be given by an equation p(x, y) = ax + by with a, b ∈ Z. Further, the X-ray of a lattice set E in a lattice direction p is the function Xp E giving the number of points of E on each line parallel to this direction; formally, Xp E(k) = |{M ∈ E : p(M) = k}|. Discrete Tomography is the area of mathematics and computer science that deals with the inverse problem of reconstructing lattice sets from a finite set of X-rays. The reconstruction problem can be formulated as a linear program in terms of fuzzy sets instead of lattice sets, and efficient algorithms based on interior point methods can be provided for finding a solution or proving that no solution exists [9,11]. This approach is also motivated by the computational complexity result stating that the reconstruction problem is NP-hard when the X-rays are taken in more than two directions (m > 2), so that (if P ≠ NP) any algorithm will take exponential time. In this paper we use linear programming to deal with the stability of the reconstruction problem. Stability is of major importance in practical applications where the X-rays may be affected by errors. For instance, in electron microscopy, techniques are known [17] that make it possible to count the number of atoms lying on a line up to an error of ±1. But in case of instability, the reconstructed set can be quite different from the original one even if the error on the data

I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 398–407, 2003.
© Springer-Verlag Berlin Heidelberg 2003
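As a minimal illustration of the definition above, the X-ray Xp E can be computed by bucketing the points of E by the value of p. This is a plain Python sketch; the representation of E as a set of (x, y) pairs is our own choice, not from the paper.

```python
from collections import Counter

def xray(E, a, b):
    """X-ray of the lattice set E in the direction given by p(x, y) = a*x + b*y:
    Xp E(k) = |{M in E : p(M) = k}|."""
    return Counter(a * x + b * y for (x, y) in E)
```

For E = {(0, 0), (1, 0), (0, 1)}, xray(E, 0, 1) counts points on each horizontal line y = k and gives {0: 2, 1: 1}.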
is small. In [1] the authors prove that when m > 2 the two sets can even be disjoint, permitting an error of 2(m − 1) on the X-rays. In Section 3 we show that, in order to obtain a stability result even with a very small error on the data, the requirement of uniqueness of the sets is not enough (see Remark 1). If the sets are additive, then a stability result holds. We recall that additivity implies uniqueness, whereas the converse is not true. Additionally, the notion of additivity can be regarded as a property of the solutions of the linear program. In Section 4 we treat the stability of reconstructing convex sets. Experimental results suggest the conjecture that, for the set of directions {x, y, 2x + y, −x + 2y}, convex sets are additive. This would imply that the results of Section 2 may hold for convex sets, giving a stability result that corresponds to the continuous case, where the reconstruction problem for convex bodies is well-posed ([15]).
2 The Problem
The reconstruction problem is the task of determining any lattice set having the given X-rays. Stability concerns how sensitive the problem is to noisy data. Hence one can ask whether to a perturbation of the data there correspond solutions that are close. To study the problem we define a measure for the error on the X-rays and one for the distance between two solutions. Let D be a set of m prescribed lattice directions with m ≥ 2 and let E, F be lattice sets:

DX_D(E, F) = max_{p∈D} Σ_{k∈Z} |Xp E(k) − Xp F(k)|
and

card(E △ F) = card((E \ F) ∪ (F \ E)).

The formulation of the problem that we consider is the following:

Problem 1. Let E be known. Determine F maximizing card(E △ F), under the constraint that DX_D(E, F) is given.

Let us introduce some definitions that we need in the following.

Definition 1. A lattice set E is additive with respect to D, or D-additive, if there is a function e which gives a value ep(k) for each line p = k parallel to a direction p of D such that for all M in Z2:

M ∈ E if and only if Σ_{p∈D} ep(p(M)) > 0.
This definition, introduced by Fishburn et al., can be better understood with linear programming: a lattice set E is additive if it is the unique solution of the linear programming problem which looks for a fuzzy set having the same X-rays as E.
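The two measures used in Problem 1 can be computed directly from the definitions. A plain Python sketch (the set-of-pairs representation of lattice sets and the function names are our own):

```python
from collections import Counter

def xray(E, a, b):
    """Xp E for the direction p(x, y) = a*x + b*y."""
    return Counter(a * x + b * y for (x, y) in E)

def dx(E, F, directions):
    """DX_D(E, F): max over p in D of sum_k |Xp E(k) - Xp F(k)|."""
    dists = []
    for (a, b) in directions:
        xe, xf = xray(E, a, b), xray(F, a, b)
        dists.append(sum(abs(xe[k] - xf[k]) for k in set(xe) | set(xf)))
    return max(dists)

def card_sym_diff(E, F):
    """card(E symmetric-difference F) = card((E \\ F) union (F \\ E))."""
    return len(set(E) ^ set(F))
```

For E = {(0, 0)} and F = {(1, 0)} with D = {x, y} (encoded as p = y and p = x), the horizontal X-rays agree while the vertical ones differ on two lines, so dx returns 2 and card_sym_diff returns 2.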
Definition 2. A lattice set E is unique with respect to D, or D-unique, if F ⊂ Z2 and Xp E = Xp F for every p ∈ D imply E = F.

There is an intimate relationship between these two definitions: every D-additive set is D-unique, and the converse is true if m = 2 (see [9]). As a last remark we recall that if p and q are two directions, then a p-line does not always intersect a q-line. Indeed, Z2 can be split into det(p, q) pq-lattices such that in each pq-lattice any p-line intersects any q-line. Precisely, a pq-lattice has the form

Lpq_i = {M : p(M) = i (mod det(p, q)) and q(M) = κ i (mod det(p, q))},

where κ only depends on the directions p and q (see for example [6]). Moreover, we denote by ⟨i, j⟩pq the point M such that p(M) = i and q(M) = j. Notice that this point is in Z2 only if the lines p = i and q = j are in the same pq-lattice.
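The last remark can be checked computationally: the lines p = i and q = j meet at a lattice point exactly when Cramer's rule yields integer coordinates. A small sketch, with directions passed as the coefficient pairs (a, b) of p (our own encoding):

```python
def lines_meet_in_Z2(p, q, i, j):
    """True if the lines p(M) = i and q(M) = j intersect at a point of Z2.
    Solves a*x + b*y = i, c*x + d*y = j by Cramer's rule and checks that
    both coordinates are integers. det(p, q) is assumed non-zero."""
    a, b = p
    c, d = q
    det = a * d - b * c
    x_num = i * d - j * b
    y_num = a * j - c * i
    return x_num % det == 0 and y_num % det == 0
```

For p = x + y and q = x − y we have |det(p, q)| = 2, and the line x + y = i meets x − y = j in Z2 exactly when i + j is even, as the congruence description of the pq-lattices predicts.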
3 Stability for Additive Sets

3.1 Error Equal to 1
In this section we study the symmetric difference of any two D-additive sets E and F verifying the condition DX_D(E, F) ≤ 1. In the first two lemmas additivity is not required. The condition DX_D(E, F) ≤ 1 permits the X-rays of the two sets to differ by one on at most one line for each direction. Then, there exist p ∈ D and an integer kp such that |Xp E(kp) − Xp F(kp)| = 1 and Xp E(k) = Xp F(k) for k ≠ kp.

Lemma 1. If p ∈ D and an integer kp exist such that |Xp E(kp) − Xp F(kp)| = 1, then for every q ∈ D there is an integer kq such that |Xq F(kq) − Xq E(kq)| = 1 and ⟨kp, kq⟩pq ∈ Z2.

Proof. Let Lpq_i be the pq-lattice containing the line p = kp, or equivalently kp ∈ p(Lpq_i). Suppose that Xp F(kp) − Xp E(kp) = +1. Thus we have that

Σ_{k∈p(Lpq_i)} Xp F(k) = 1 + Σ_{k∈p(Lpq_i)} Xp E(k).

Using the consistency of the X-rays of F and E, the previous identity leads to

Σ_{k∈q(Lpq_i)} Xq F(k) = 1 + Σ_{k∈q(Lpq_i)} Xq E(k)

for all q in D. From this, the thesis easily follows. □
In the next lemma we show that all the lines with error 1 have a common point and this point is in Z2 . In the following, we assume that card(F ) > card(E) and for any p ∈ D the integer kp is as in the previous lemma.
Lemma 2. If DX_D(E, F) = 1, then a point W ∈ Z2 exists such that, for all directions p in D,

Xp F(k) = Xp E(k) + 1  if k = p(W),
Xp F(k) = Xp E(k)      otherwise.

Proof. Let p, q and r be directions in D and suppose that A = ⟨kp, kq⟩pq, B = ⟨kp, kr⟩pr, C = ⟨kq, kr⟩qr are three distinct points. Let a, b be such that r = ap + bq. Thus, summing up, we can write

Σ_{M∈F} r(M) = a Σ_{M∈F} p(M) + b Σ_{M∈F} q(M),

and by grouping line by line we obtain

Σ_k k Xr F(k) = a Σ_k k Xp F(k) + b Σ_k k Xq F(k).

We can exhibit the corresponding identity for the set E. Taking the difference of these two identities we obtain kr = a kp + b kq, and so r(A) = r(B) = r(C). Thus the three points A, B and C coincide, and the claim is proved. □

Suppose now that E and F are D-additive, that is, E = {M : e(M) > 0} and F = {M : f(M) > 0}.

Proposition 1. Let E and F be D-additive lattice sets. If DX_D(E, F) = 1, then card(E △ F) = 1.

Proof. Let W be as in Lemma 2. At first suppose that W ∉ E, and let E′ = E ∪ {W}. For each direction p in D we have that Xp E′ = Xp F. Finally, since the additivity of F implies the uniqueness of F, we conclude that F = E ∪ {W}. On the contrary, if W ∈ E we study the quantity

ΦE = Σ_{M∈Z2} Σ_{p∈D} ep(p(M)) (1E(M) − 1F(M)).

Rewriting it as

Σ_{M∈E} Σ_{p∈D} ep(p(M))(1E(M) − 1F(M)) + Σ_{M∉E} Σ_{p∈D} ep(p(M))(1E(M) − 1F(M)),

we notice that ΦE > 0, because the additivity of E implies that if M is in E then e(M) > 0 and 1E(M) = 1 hold, and otherwise e(M) ≤ 0 and 1E(M) = 0. We can also make the terms Xp E and Xp F explicit in ΦE, obtaining

ΦE = Σ_{k≠p(W)} Σ_{p∈D} ep(k)(Xp E(k) − Xp F(k)) + Σ_{p∈D} ep(p(W))(Xp E(p(W)) − Xp F(p(W))),

which is strictly less than zero. □
Fig. 1. E and F are non-additive sets of uniqueness such that DXD (E, F ) = 1 and E ∩ F = ∅.
Remark 1. The comparison between uniqueness and additivity can be made following [9]. Given any three lattice directions, we may construct two sets E, F in such a way that they are unique but non-additive. (We do not give the proof for reasons of space and refer the reader to [9].) Figure 1 illustrates two such sets verifying the constraint DX_D(E, F) = 1. Since they are disjoint, Proposition 1 does not hold for D-unique sets.

3.2 Error Larger than 1
In this section we consider the case where the error is larger than 1. Since we already have instability when the error is just equal to 2 and the number of lattice directions is larger than 2, we restrict our attention to the case of two directions. In more detail, the instability follows from the result of [1, Theorem 1], because the sets constructed in the proof of [1] are actually D-additive. Therefore we can restate it as follows:

Proposition 2 (see [1]). For any n and any set D of m ≥ 3 directions, there exist D-additive sets E and F such that |E| = |F| ≥ n, DX_D(E, F) = 2 and E ∩ F = ∅.

As a result, our focus is on the case of two directions. In this case additivity is equivalent to uniqueness, and the construction used to prove Proposition 2 cannot be carried out. Since pq-lattices are equivalent to Z2, we can see that it is sufficient to consider the case D = {x, y}.
So, in the following, we suppose that E and F are unique with respect to D = {x, y}. In [13, p. 17] it is proved that M = (xM, yM) is in E if and only if e(M) = ex(xM) + ey(yM) ≥ 0, where ex(j) = Xx E(j) and ey(i) = −card({l : Xy E(l) ≥ Xy E(i)}). Notice that this property directly implies the additivity of Definition 1, because we can add a small positive number to e in such a way that e(M) remains negative if M ∉ E. We define similarly f(M) = fx(xM) + fy(yM). Then, as in the previous section, we can prove that:

ΦE = Σ_{(j,i)∈Z2} (ex(j) + ey(i))(1E(j, i) − 1F(j, i)) ≥ 0,

ΦF = Σ_{(j,i)∈Z2} (fx(j) + fy(i))(1E(j, i) − 1F(j, i)) ≤ 0.
Remark 2. By definition of ex and fx, if an error of ±a occurs on x = j then fx(j) = ex(j) ± a. The relationship between fy(i) and ey(i) is more complex and will be studied in special cases.

We begin with a short lemma:

Lemma 3. Let P be a point of F \ E such that Xx F(xP) = Xx E(xP) + 1 and Xy F(yP) ≤ Xy E(yP). Then a point Q ∈ E \ F exists satisfying yP = yQ, and for any such point we have Xx E(xQ) > Xx F(xQ).

Proof. Since Xy F(yP) ≤ Xy E(yP), there exists a point Q ∈ E \ F such that yP = yQ. Let ex, ey be defined as above. We have:

ex(xQ) + ey(yQ) ≥ 0 > fx(xQ) + fy(yQ)   (3.1)
ex(xP) + ey(yP) < 0 ≤ fx(xP) + fy(yP)   (3.2)

Substituting fx(xP) = ex(xP) + 1 in (3.2) we get ey(yP) < 0 ≤ fy(yP) + 1, that is, ey(yP) ≤ fy(yP) (because ey and fy are always integers). Since ey(yP) = ey(yQ) and fy(yP) = fy(yQ), equation (3.1) gives ex(xQ) > fx(xQ). □

Proposition 3. Let D = {x, y}; if E and F are any two D-unique lattice sets satisfying DX_D(E, F) = 2, then card(E △ F) ≤ 4.

The proof is omitted due to space constraints and is available in [4].
4 Stability for Convex Sets
In this section we experimentally study the stability of the reconstruction of convex sets via linear programming. A convex set is the intersection of a convex polygon and the digital plane Z2. The result of [10] states that convex sets are uniquely determined by their X-rays taken in a suitable set of directions. In the "continuous" plane an analogous result holds, and additionally the problem is well-posed ([15]). Moreover, there is a connection between additive sets and convex sets, since a Euclidean ball is additive with respect to two orthogonal
directions ([8]). Our experiments support the suspicion that these results have a counterpart in the "digital" plane. Actually, in this section we consider a class of lattice sets which is more general than the convex sets [3]. For each point M = (xM, yM) ∈ Z2 the four quadrants around M are defined by the following formulas:

R0(M) = {(x, y) ∈ Z2 / x ≤ xM and y ≤ yM},
R1(M) = {(x, y) ∈ Z2 / x ≥ xM and y ≤ yM},
R2(M) = {(x, y) ∈ Z2 / x ≥ xM and y ≥ yM},
R3(M) = {(x, y) ∈ Z2 / x ≤ xM and y ≥ yM}.
Definition 3. A lattice set E is Q-convex if and only if for each M ∉ E there exists i ∈ {0, 1, 2, 3} such that Ri(M) ∩ E = ∅.

An example of a Q-convex set is given in the left part of Figure 2. We have generated 184 Q-convex sets of semi-perimeter from 4 to 370 using a uniform generator ([5], inspired by [12]). Then their X-rays in the set of directions D = {x, y, 2x + y, −x + 2y} have been computed. (These directions have been chosen because the X-rays along them uniquely determine the convex sets ([10]) and they contain the horizontal and vertical directions.) We then used these X-rays and an error e ∈ {0, 1, 2, 3} as input data in the following linear program:

Maximize

Σ_{(i,j)∈E} (1 − v_{i,j}) + Σ_{(i,j)∉E} v_{i,j}              (4.3)

such that

Σ_{p(i,j)=k} v_{i,j} = Xp E(k) + er+_{p,k} − er−_{p,k}        (4.4)
Σ_k (er+_{p,k} + er−_{p,k}) ≤ e                               (4.5)
0 ≤ v_{i,j} ≤ 1,  er+_{p,k} ≥ 0,  er−_{p,k} ≥ 0               (4.6)
We solved the linear program with the software SoPlex ([16]). Notice that solving this problem with v_{i,j} ∈ Z makes it possible to find exactly the maximum of card(E △ F), where F ranges over all the lattice sets such that DX_D(E, F) ≤ e. Unfortunately, integer linear programming is an NP-hard problem, and hence we solved the relaxed problem where the unknown variables can be fractional: this computation provides an upper bound to card(E △ F). Figure 2 illustrates (on the right-hand side) a solution of the linear program for card(E) = 200 and e = 3. The different grey-scale colors of the squares correspond to different values of v_{i,j}. The complete results are summarized in Figures 3 and 4. In Figure 3 the upper bound to
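The relaxed program (4.3)-(4.6) can be sketched with SciPy's linprog instead of SoPlex. This is our own simplified formulation, not the authors' implementation: in particular, restricting the variables v_{i,j} to the bounding box of E and all function names are our choices.

```python
import numpy as np
from scipy.optimize import linprog

def xray(E, a, b):
    """Xp E for p(x, y) = a*x + b*y."""
    counts = {}
    for (x, y) in E:
        counts[a * x + b * y] = counts.get(a * x + b * y, 0) + 1
    return counts

def stability_bound(E, directions, e):
    """Relaxed-LP upper bound on card(E symmetric-difference F) over fuzzy
    sets F whose X-ray error is at most e per direction."""
    E = set(E)
    xs = [p[0] for p in E]
    ys = [p[1] for p in E]
    box = [(x, y) for x in range(min(xs), max(xs) + 1)
                  for y in range(min(ys), max(ys) + 1)]
    rays = [xray(E, a, b) for (a, b) in directions]
    # one (er+, er-) pair per (direction, line meeting the box)
    lines = [(d, k) for d, (a, b) in enumerate(directions)
                    for k in sorted({a * x + b * y for (x, y) in box})]
    nv, nl = len(box), len(lines)
    # maximising (4.3) = |E| - (sum_E v - sum_notE v), so minimise the latter
    c = np.zeros(nv + 2 * nl)
    c[:nv] = [1.0 if p in E else -1.0 for p in box]
    A_eq = np.zeros((nl, nv + 2 * nl))
    b_eq = np.zeros(nl)
    for r, (d, k) in enumerate(lines):
        a, b = directions[d]
        for j, (x, y) in enumerate(box):
            if a * x + b * y == k:
                A_eq[r, j] = 1.0           # sum of v on the line       (4.4)
        A_eq[r, nv + 2 * r] = -1.0         # -er+_{p,k}
        A_eq[r, nv + 2 * r + 1] = 1.0      # +er-_{p,k}
        b_eq[r] = rays[d].get(k, 0)
    A_ub = np.zeros((len(directions), nv + 2 * nl))
    for r, (d, k) in enumerate(lines):     # sum_k (er+ + er-) <= e     (4.5)
        A_ub[d, nv + 2 * r] = 1.0
        A_ub[d, nv + 2 * r + 1] = 1.0
    b_ub = np.full(len(directions), float(e))
    bounds = [(0, 1)] * nv + [(0, None)] * (2 * nl)                   # (4.6)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return len(E) - res.fun
```

With e = 0 and D = {x, y}, a filled 2 × 2 square is forced exactly (bound 0), since each row and column sum of 2 saturates the two unit-bounded variables on that line.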
Fig. 2. A Q-convex set E and the corresponding extremal values of v_{i,j} for e = 3. In this case we have card(E) = 200 and Σ_{(i,j)∈E} (1 − v_{i,j}) + Σ_{(i,j)∉E} v_{i,j} = 33.7.
Fig. 3. An upper bound to card(E △ F)/card(E) for the generated Q-convex sets. (Only 40% of the 184 generated sets have been represented, for readability.)
card(E △ F) is divided by card(E), so that each value gives an upper bound to the relative distance from a given set. Moreover, the black squares show the values of the maximum of the quantity (4.3) when the constraints (4.4), (4.5) are replaced by Xp E(k) − 1 ≤ Σ_{p(i,j)=k} v_{i,j} ≤ Xp E(k) + 1: these values give an upper bound to card(E △ F) when DX_D(E, F) = max_{p∈D} max_{k∈Z} |Xp E(k) − Xp F(k)| = 1.
Fig. 4. An upper bound to card(E △ F)/√card(E) for the 184 generated Q-convex sets.
These experimental results bring out the following points:

– If DX_D(E, F) = 0, then we always found a null relative distance. In other words, according to our experiments every Q-convex set is D-additive. In fact this property was first conjectured by L. Thorens ([14]) (with additivity replaced by uniqueness), and can be seen as a variant of Conjecture 4.6 of [2] and Theorem 5.7 of [10]. We may state the conjecture as follows:

Conjecture 1. If D is a set of directions which contains {x, y}, such that the directions are not all in the same quadrant and they uniquely determine the convex sets, then every Q-convex set is D-additive.

Notice that the property about the quadrants is necessary, because there is a counter-example with D = {x, y, x + y, x + 5y}.

– For a fixed error e, the relative distance seems to converge to zero as card(E) grows. If we divide by √card(E) instead of card(E), this ratio seems to be bounded, so that on average card(E △ F) = O(√card(E)) according to our experiments (see Figure 4). It must be noticed that in the case e = 1, by the previous remark and Proposition 1, the real maximum error for lattice sets is always 1 for the generated cases; we obtain a stronger result in the experiment because the problem has been relaxed.
– If DX_D(E, F) = 1, then the relative distance does not seem to converge to zero, but the computed values are only upper bounds; that is, we do not know whether the fractional values reflect instability or are just an artifact introduced by relaxing the integrality constraints of the problem. In the former case, the reconstruction of convex sets could not easily be applied in the continuous world (as in medical imaging), because a rounding error of the measurements can always be of ±1.
References

1. A. Alpers, P. Gritzmann, and L. Thorens. Stability and instability in discrete tomography. In Dagstuhl Seminar: Digital and Image Geometry 2000, volume 2243 of Lecture Notes in Comp. Sci., pages 175–186. Springer, 2001.
2. E. Barcucci, A. Del Lungo, M. Nivat, and R. Pinzani. X-rays characterizing some classes of discrete sets. Linear Algebra Appl., 339:3–21, 2001.
3. S. Brunetti and A. Daurat. An algorithm reconstructing convex lattice sets. To be published in Theoret. Comput. Sci.
4. S. Brunetti and A. Daurat. Stability in discrete tomography: Linear programming, additivity and convexity. Extended version with proofs.
5. A. Daurat. Counting and generating Q-convex sets. In preparation.
6. A. Daurat, A. Del Lungo, and M. Nivat. Median points of discrete sets according to a linear distance. Disc. and Comp. Geom., 23:465–483, 2000.
7. A. Del Lungo and M. Nivat. Reconstruction of connected sets from two projections. In Discrete Tomography, Appl. Numer. Harmon. Anal., pages 163–188. Birkhäuser Boston, 1999.
8. P. C. Fishburn, J. C. Lagarias, J. A. Reeds, and L. A. Shepp. Sets uniquely determined by projections on axes. I. Continuous case. SIAM J. Appl. Math., 50(1):288–306, 1990.
9. P. C. Fishburn and L. A. Shepp. Sets of uniqueness and additivity in integer lattices. In Discrete Tomography, Appl. Numer. Harmon. Anal., pages 35–58. Birkhäuser, 1999.
10. R. J. Gardner and P. Gritzmann. Discrete tomography: determination of finite sets by X-rays. Trans. Amer. Math. Soc., 349(6):2271–2295, 1997.
11. P. Gritzmann, S. de Vries, and M. Wiegelmann. Approximating binary images from discrete X-rays. SIAM J. Optim., 11(2):522–546, 2000.
12. W. Hochstättler, M. Loebl, and C. Moll. Generating convex polyominoes at random. Discrete Math., 153(1-3):165–176, 1996.
13. A. Kuba and G. T. Herman. Discrete tomography: a historical overview. In Discrete Tomography, Appl. Numer. Harmon. Anal., pages 3–34. Birkhäuser Boston, Boston, MA, 1999.
14. L. Thorens. Personal communication, 2000.
15. A. Volčič. Well-posedness of the Gardner-McMullen reconstruction problem. In Measure Theory, Oberwolfach 1983, volume 1089 of Lecture Notes in Math., pages 199–210. Springer, 1984.
16. R. Wunderling. Paralleler und objektorientierter Simplex-Algorithmus. PhD thesis, ZIB technical report TR 96-09, 1996.
17. S. de Vries. Discrete Tomography, Packing and Covering, and Stable Set Problems: Polytopes and Algorithms. PhD thesis, Utz, München, 1999.
Removal and Contraction for n-Dimensional Generalized Maps

Guillaume Damiand and Pascal Lienhardt

IRCOM-SIC, UMR-CNRS 6615, bât. SP2MI, Bvd M. et P. Curie, BP 30179, 86962 Futuroscope Chasseneuil Cedex, France
{damiand,lienhardt}@sic.univ-poitiers.fr
Abstract. Removal and contraction are basic operations for several methods conceived to handle irregular image pyramids, for instance for multi-level image analysis. Such methods are often based upon graph-like representations which do not maintain all topological information, even for 2-dimensional images. We study the definitions of removal and contraction operations in the generalized maps framework. These combinatorial structures enable us to unambiguously represent the topology of a well-known class of subdivisions of n-dimensional (discrete) spaces. The results of this study provide a basis for further work on irregular pyramids of n-dimensional images.

Keywords: Removal, contraction, irregular pyramids, generalized maps.
1 Introduction
Many works deal with regular (cf. e.g. [1]) or irregular (cf. e.g. [2,3,4]) image pyramids for multi-level analysis and processing (cf. also [5]). For irregular pyramids, it is necessary to handle a (topological) representation and basic operations, for instance dual graphs together with removal and contraction operations for 2D images [6,7]. Similar problems about multi-level representations also arise in geometric modeling (e.g. for CAD applications, architectural or geological modeling, etc.). Our goal is to build a theoretical framework for the definition and handling of n-dimensional irregular pyramids: the 2D case has been widely studied; the importance of the 3D case is now well-known, and several works deal with 4D objects for which time corresponds to the fourth dimension (e.g. sequences of 3D images and 4D geometric modeling for animation). We think it is thus important to get coherent definitions of data structures and operations for any dimension. So we study the definition of removal and contraction of i-dimensional cells within n-dimensional objects, in order to rigorously define the relations between two consecutive levels of a pyramid. This is a basic work which will enable us to conceive data structures (including their consistency constraints) providing unique and unambiguous representations for pyramids of subdivided objects¹.
¹ Informally, a subdivision of an n-dimensional space is a partition into i-dimensional cells (or i-cells), for 0 ≤ i ≤ n.
I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 408–419, 2003.
© Springer-Verlag Berlin Heidelberg 2003
Our choice of the data structures for which we study the definitions of removal and contraction operations is mainly a consequence of the fact that unique and unambiguous representations are needed which take multi-adjacency into account. For instance, a main drawback of many graph-like representations is that the whole topological information is often not maintained. More precisely, such representations are often ambiguous, even for 2-dimensional images, in particular when image regions are multi-adjacent; and multi-adjacency usually appears when constructing irregular pyramids. The definition of subdivision representations is the subject of many works in the field of imagery, but also in the field of geometric modeling (cf. for instance [8,9,10,11,12,13]). Several representations extend the notion of 2D combinatorial map [14,15,16] to several classes of n-dimensional subdivisions [12,17]. Note also that several models based on combinatorial maps have been proposed for handling 2-dimensional [18,19] and 3-dimensional segmented or multi-level images [20,21,22,23,24]. For instance, Brun and Kropatsch revisit works about graphs [25,26,27,28] in order to define 2-dimensional combinatorial map pyramids. We choose to study the definitions of removal and contraction operations for n-dimensional generalized maps, since this notion enables us to unambiguously represent the topology of quasi-manifolds, which is a well-defined class of subdivisions [12]. Generalized maps are defined for any dimension, and their algebraic definition is simple; several kernels of geometric modeling software are based upon data structures derived from this notion. More precisely, generalized maps (resp. combinatorial maps) represent orientable or non-orientable quasi-manifolds, with or without boundaries (resp. orientable, without boundaries). It is clear that most applications deal with orientable subdivisions without boundaries.
Nevertheless, we mainly choose to deal with generalized maps rather than combinatorial maps or equivalent structures, since we can provide simpler definitions of data structures and operations, and thus more efficiency in the conception of software. Moreover, we know how to deduce combinatorial maps from generalized maps, so the results presented in this paper can be extended to combinatorial maps. Precise relations between generalized maps, combinatorial maps, and other classical data structures are presented in [29]. The main result of this paper consists in Def. 5 and Theorem 2, which show that, for a given dimension n, we can simultaneously remove and contract cells of different dimensions under some simple conditions. This extends previous 2D and 3D results to the n-D case; even for the more studied 2D and 3D cases (cf. [25,20,21]), it shows that more operations can be simultaneously applied (this is important for parallelization and for reducing the number of pyramid levels). More precisely, we define the removal of one i-dimensional cell, and contraction by duality (cf. Sec. 3 and Sec. 4); according to the respective values of i and n, a simple precondition has to be satisfied. Then we extend this definition in order to simultaneously remove and contract several i-cells for a given i. At last, we extend it for removing and contracting several cells of any dimensions (cf. Sec. 5): we show that this is possible if and only if the cells are disjoined.
[Fig. 1(a): drawing of a 2D subdivision with vertices s1, s2, edges a1, a2 and faces f1, f2; not reproducible in text form. Fig. 1(b): the involutions of the corresponding 2-G-map:]

dart  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
α0    2  1  4  3  6  5  8  7 10  9 12 11 14 13 16 15 18 17 20 19 22 21
α1    5  3  2  8  1  7  6  4 18 13 15 19 10 17 11 22 14  9 12 21 20 16
α2    1  2 20 19  5  6  7  8  9 10 17 18 13 14 15 16 11 12  4  3 21 22
Fig. 1. (a) A 2D subdivision. (b) The corresponding 2-G-map (involutions are given explicitly in the array). Darts are represented by numbered black segments. Two darts in relation by α0 share a little vertical segment (ex. darts 1 and 2). Two darts in relation by α1 share a same point (ex. darts 2 and 3). Two distinct darts in relation by α2 are parallel and close to each other (ex. darts 3 and 20); otherwise, the dart is its own image by α2 (ex. dart 2). Dart 1 corresponds to (s1 , a1 , f1 ), dart 2 = 1α0 corresponds to (s2 , a1 , f1 ), 3 = 2α1 corresponds to (s2 , a2 , f1 ), and 20 = 3α2 corresponds to (s2 , a2 , f2 ). The vertex incident to dart 2 is < α1 , α2 > (2) = {2, 3, 20, 21}, the edge incident to dart 3 is < α0 , α2 > (3) = {3, 4, 19, 20}, and the face incident to dart 9 is < α0 , α1 > (9) = {9, 10, 13, 14, 17, 18}.
2 Generalized Maps Recall
An n-dimensional generalized map is a set of abstract elements, called darts, together with applications defined on these darts:

Definition 1 (Generalized Map). Let n ≥ 0. An n-dimensional generalized map (or n-G-map) is G = (B, α0, ..., αn) where:
1. B is a finite set of darts;
2. ∀i, 0 ≤ i ≤ n, αi is an involution² on B;
3. ∀i, j, 0 ≤ i < i + 2 ≤ j ≤ n, αi αj is an involution.

Let G be an n-G-map, and let S be the corresponding subdivision. Intuitively, a dart of G corresponds to an (n + 1)-tuple of cells (c0, ..., cn), where ci is an i-dimensional cell that belongs to the boundary of ci+1 (cf. [11] and Fig. 1). αi associates the darts corresponding to (c0, ..., cn) and (c0′, ..., cn′), where cj′ = cj for j ≠ i and ci′ ≠ ci (αi swaps the two i-cells that are incident to the same (i − 1)- and (i + 1)-cells). When two darts b1 and b2 are such that b1 αi = b2 (0 ≤ i ≤ n), b1 is said to be i-sewn with b2.
² An involution f on S is a one-to-one mapping from S onto S such that f = f⁻¹.
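The two structural conditions of Definition 1 are easy to verify mechanically. The following sketch is not from the paper: it encodes darts as integers and each involution αi as a Python dict, then checks conditions 2 and 3 for a candidate G-map.

```python
def is_involution(alpha, darts):
    # alpha maps every dart to its image; an involution satisfies alpha(alpha(b)) = b
    return all(alpha[alpha[b]] == b for b in darts)

def is_gmap(darts, alphas):
    """Check conditions 2 and 3 of Definition 1 for G = (darts, alpha_0, ..., alpha_n)."""
    n = len(alphas) - 1
    # Condition 2: every alpha_i is an involution on the set of darts.
    if not all(is_involution(a, darts) for a in alphas):
        return False
    # Condition 3: alpha_i alpha_j is an involution whenever j >= i + 2.
    for i in range(n + 1):
        for j in range(i + 2, n + 1):
            composite = {b: alphas[j][alphas[i][b]] for b in darts}
            if not is_involution(composite, darts):
                return False
    return True

# A tiny 1-G-map: one edge made of two darts sewn by alpha_0; alpha_1 fixes both.
darts = {1, 2}
alpha0 = {1: 2, 2: 1}
alpha1 = {1: 1, 2: 2}
print(is_gmap(darts, [alpha0, alpha1]))  # True
```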
Removal and Contraction for n-Dimensional Generalized Maps
G-maps represent cells in an implicit way:

Definition 2 (i-Cell). Let G be an n-G-map, b a dart and i ∈ N = {0, . . . , n}. The i-cell incident to b is the orbit³ <>N−{i}(b) = <α0, . . . , αi−1, αi+1, . . . , αn>(b).

Intuitively, an i-cell is the set of all darts which can be reached starting from b by using any combination of all the involutions except αi. For each i between 0 and n, the set of i-cells is a partition of the darts of the G-map. Two cells are disjoint if their intersection is empty, i.e. when no dart is shared by the cells. More details about G-maps are provided in [12].
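Orbits, and hence i-cells, can be computed by a plain graph traversal over the involutions. A possible sketch (darts as integers, involutions as Python dicts; this representation is mine, not the paper's):

```python
from collections import deque

def orbit(b, involutions):
    """All darts reachable from b by composing the given involutions (an orbit)."""
    seen = {b}
    queue = deque([b])
    while queue:
        d = queue.popleft()
        for a in involutions:
            if a[d] not in seen:
                seen.add(a[d])
                queue.append(a[d])
    return seen

def i_cell(b, i, alphas):
    """The i-cell incident to b: the orbit of b under all involutions except alpha_i."""
    return orbit(b, [a for j, a in enumerate(alphas) if j != i])

# Two edges sharing a vertex: alpha_0 sews darts 1-2 and 3-4, alpha_1 sews 2-3.
alpha0 = {1: 2, 2: 1, 3: 4, 4: 3}
alpha1 = {1: 1, 2: 3, 3: 2, 4: 4}
print(i_cell(2, 0, [alpha0, alpha1]))  # {2, 3}: the vertex <alpha_1>(2)
print(i_cell(2, 1, [alpha0, alpha1]))  # {1, 2}: the edge <alpha_0>(2)
```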
3 Removal
Intuitively, and in a general way for an n-dimensional space, the removal of an i-cell consists in removing this cell and merging its two incident (i + 1)-cells; removal can thus be defined for 0- to (n − 1)-cells.

3.1 Dimension 1: 0-Removal
For dimension 1, only the 0-removal exists: it consists in removing a vertex and merging its two incident edges. Let C = <α1>(b) be a vertex, let Cα0 be the set of "neighbour" darts of C for α0, i.e. Cα0 = {b' | ∃b'' ∈ C such that b''α0 = b'}, and let B^S = Cα0 − C be the "neighbour" darts of C for α0 that do not belong to C (see Fig. 2). The G-map resulting from the 0-removal of C is obtained by redefining α0 for the darts of B^S as follows: ∀b' ∈ B^S, b'α0' = b'(α0α1)^k α0, where k is the smallest integer such that b'(α0α1)^k α0 ∈ B^S. Note that α1 is not modified by 0-removal.

3.2 Dimension 2
There are two different removal operations (0-removal and 1-removal) for dimension 2.

0-Removal. It consists in removing a 0-cell C = <α1, α2>(b). Let B^S = Cα0 − C (Cα0 is defined as above). This operation can be applied only if the following precondition is satisfied: ∀b' ∈ C, b'α1α2 = b'α2α1. This constraint corresponds, in the general case, to the fact that the degree of the vertex is equal to 2 (two edges are incident to the vertex). If this constraint is not satisfied, we do not know how to join the cells incident to C, and it is then impossible to define the removal in a simple way. [30] proposes a generalization of this operation, but it is complex and cannot be used in an automatic process, in particular in automatic image processing.

³ Let {Π0, . . . , Πn} be a set of permutations on B. The orbit of an element b relatively to this set of permutations is <Π0, . . . , Πn>(b) = {Φ(b), Φ ∈ <Π0, . . . , Πn>}, where <Π0, . . . , Πn> denotes the group of permutations generated by Π0, . . . , Πn.
Guillaume Damiand and Pascal Lienhardt
Fig. 2. 0-removal in 1D. (a) Initial 1-G-map. (b) Result. C = <α1>(2) = {2, 3} (darts marked with empty squares), Cα0 = {1, 4} = B^S (darts marked with crosses). 0-removal consists in setting 1α0' = 1(α0α1)α0 = 4 ∈ B^S and 4α0' = 4(α0α1)α0 = 1 ∈ B^S.
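The example of Fig. 2 can be replayed in code. A hedged sketch of the 1D 0-removal (the function name `zero_removal_1d` and the dict encoding of darts and involutions are mine, not the paper's):

```python
def zero_removal_1d(alpha0, alpha1, cell):
    """0-removal in a 1-G-map: delete the vertex `cell` and re-link alpha_0 around it."""
    bs = {alpha0[b] for b in cell} - cell                 # surviving neighbours B^S
    new_a0 = {b: im for b, im in alpha0.items() if b not in cell}
    new_a1 = {b: im for b, im in alpha1.items() if b not in cell}
    for b in bs:
        d = alpha0[b]                                     # b (alpha_0 alpha_1)^0 alpha_0
        while d not in bs:                                # walk until leaving the cell
            d = alpha0[alpha1[d]]                         # one more (alpha_0 alpha_1) step
        new_a0[b] = d
    return new_a0, new_a1

# Fig. 2: alpha_0 sews 1-2 and 3-4, alpha_1 sews 2-3; remove the vertex C = {2, 3}.
alpha0 = {1: 2, 2: 1, 3: 4, 4: 3}
alpha1 = {1: 1, 2: 3, 3: 2, 4: 4}
a0, a1 = zero_removal_1d(alpha0, alpha1, {2, 3})
print(a0)  # {1: 4, 4: 1}: darts 1 and 4 are now 0-sewn, as in Fig. 2(b)
```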
The G-map resulting from 0-removal is obtained by redefining α0 for the darts of B^S as follows: ∀b' ∈ B^S, b'α0' = b'(α0α1)^k α0, where k is the smallest integer such that b'(α0α1)^k α0 ∈ B^S. Note that this redefinition of α0 is the same as for dimension 1 but concerns different darts, since here it is a 0-cell within a 2D object: cf. Fig. 3 (intuitively, in the general case, this operation consists in applying the 0-removal defined for dimension 1 twice).
Fig. 3. 0-removal in 2D. (a) Initial 2-G-map. (b) Result. C = <α1, α2>(2) (darts marked with empty squares), Cα0 = B^S (darts marked with crosses). For instance, 1α0' = 1(α0α1)α0 = 4 ∈ B^S.
1-Removal. It consists in removing a 1-cell C = <α0, α2>(b). This can be achieved without any precondition. Let B^S = Cα1 − C. The resulting G-map is obtained by redefining α1 for the darts of B^S as follows: ∀b' ∈ B^S, b'α1' = b'(α1α2)^k α1, where k is the smallest integer such that b'(α1α2)^k α1 ∈ B^S. Examples of 1-removal are presented in Figs. 4 and 5. For the latter example, k = 2, since the removed edge is incident twice to the same vertex.
3.3 Dimension n
The general definition of i-cell removal for an n-dimensional G-map is an obvious extension of the previous cases. Let C be an i-cell to remove; when i < n − 1
Fig. 4. 1-removal in 2D in the general case. Darts of the edge to remove are marked with circles. (a) Initial 2-G-map. (b) Result.
Fig. 5. 1-removal in 2D of a loop. (a) Initial 2-G-map. (b) Result. For instance, 1α1' = 1(α1α2)(α1α2)α1 = 4 ∈ B^S (since 1(α1α2)α1 ∉ B^S: this dart belongs both to C and to Cα1).
the operation can be applied only when, informally⁴, the degree of C is equal to 2 (a vertex incident to exactly two edges, an edge incident to two faces, a face incident to two volumes, . . . ). The i-removal then consists in redefining αi for the darts of B^S = Cαi − C in the following way: b'αi' = b'(αi αi+1)^k αi, where k is the smallest integer such that b'(αi αi+1)^k αi ∈ B^S. We thus obtain the general definition of the i-removal operation:

Definition 3 (i-Cell Removal). Let G = (B, α0, . . . , αn) be an n-G-map, i ∈ {0, . . . , n − 1} and C = <>N−{i}(b) be an i-cell, such that: ∀b' ∈ C, b'αi+1αi+2 = b'αi+2αi+1. Let B^S = Cαi − C, the set of darts i-sewn to C that do not belong to C. The n-G-map resulting from the removal of this i-cell is G' = (B', α0', . . . , αn') defined by:
– B' = B − C;
– ∀j ∈ {0, . . . , n} − {i}, αj' = αj |B' ⁵;
– ∀b' ∈ B' − B^S, b'αi' = b'αi;
– ∀b' ∈ B^S, b'αi' = b'(αi αi+1)^k αi, where k is the smallest integer such that b'(αi αi+1)^k αi ∈ B^S.
Theorem 1. G' is an n-G-map.

Proof. It is easy to check that G' satisfies conditions 2 and 3 of Def. 1: cf. [31].

Note that G' may contain only one n-cell, and may even be empty if G contains only one i-cell.
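Definition 3 translates almost line by line into code. A sketch (the dict encoding of involutions is my assumption; the precondition is only checked when it applies, i.e. for i ≤ n − 2):

```python
def i_removal(alphas, cell, i):
    """Remove the i-cell `cell` from the n-G-map (B, alpha_0..alpha_n) per Definition 3."""
    n = len(alphas) - 1
    ai, ai1 = alphas[i], alphas[i + 1]
    if i <= n - 2:  # precondition "degree 2"; it does not apply when i = n - 1
        assert all(alphas[i + 2][ai1[b]] == ai1[alphas[i + 2][b]] for b in cell)
    bs = {ai[b] for b in cell} - cell      # B^S: darts i-sewn to the cell, outside it
    new = [{b: im for b, im in a.items() if b not in cell} for a in alphas]
    for b in bs:
        d = ai[b]
        while d not in bs:                 # compute b (alpha_i alpha_{i+1})^k alpha_i
            d = ai[ai1[d]]
        new[i][b] = d
    return new

# Same 1D example as Fig. 2: removing the vertex {2, 3} merges the two edges.
alphas = [{1: 2, 2: 1, 3: 4, 4: 3}, {1: 1, 2: 3, 3: 2, 4: 4}]
new = i_removal(alphas, {2, 3}, 0)
print(new[0])  # {1: 4, 4: 1}
```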
4 Contraction
Informally, i-contraction consists in contracting an i-cell into an (i − 1)-cell. Contraction is the dual of the removal operation. Intuitively, the dual of a subdivision is a subdivision of the same space in which an (n − i)-cell is associated with each initial i-cell, and incidence relations are kept. A nice property of G-maps
⁴ The formal precondition is: ∀b' ∈ C, b'αi+1αi+2 = b'αi+2αi+1. Note that if i = n − 1 this condition does not apply, so we can always remove any (n − 1)-dimensional cell.
⁵ αj' is equal to αj restricted to B', i.e. ∀b' ∈ B', b'αj' = b'αj.
Fig. 6. 1-contraction in 1D. Darts of the edge to contract are marked with black disks. (a) Initial 1-G-map. (b) Result.
Fig. 7. 1-contraction in 2D. (a) Initial 2-G-map. (b) Result.
is the fact that the dual G-map of G = (B, α0, . . . , αn) is (B, αn, . . . , α0): we just need to reverse the order of the involutions. We can thus easily deduce the definition of i-contraction from the general definition of i-removal: we just have to replace '+' by '−' in the involution indices of the preconditions and operations, i.e. αi+1αi+2 → αi−1αi−2 and αiαi+1 → αiαi−1 (see two examples of contraction in Figs. 6 and 7).

Definition 4 (i-Cell Contraction). Let G = (B, α0, . . . , αn) be an n-G-map, i ∈ {1, . . . , n} and C = <>N−{i}(b) be an i-cell, such that⁶: ∀b' ∈ C, b'αi−1αi−2 = b'αi−2αi−1. Let B^S = Cαi − C, the set of darts i-sewn to C that do not belong to C. The n-G-map resulting from the contraction of this i-cell is G' = (B', α0', . . . , αn') defined by:
– B' = B − C;
– ∀j ∈ {0, . . . , n} − {i}, αj' = αj |B';
– ∀b' ∈ B' − B^S, b'αi' = b'αi;
– ∀b' ∈ B^S, b'αi' = b'(αi αi−1)^k αi, where k is the smallest integer such that b'(αi αi−1)^k αi ∈ B^S.
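Because contraction is the dual of removal, its implementation is the removal code with indices i + 1 replaced by i − 1. A sketch under an assumed dict encoding of darts and involutions (mine, not the paper's):

```python
def i_contraction(alphas, cell, i):
    """Contract the i-cell `cell` (Definition 4): dual of removal, i+1 becomes i-1."""
    n = len(alphas) - 1
    ai, aim1 = alphas[i], alphas[i - 1]
    if i >= 2:  # precondition "degree 2"; it does not apply for i = 1 (footnote 6)
        assert all(alphas[i - 2][aim1[b]] == aim1[alphas[i - 2][b]] for b in cell)
    bs = {ai[b] for b in cell} - cell
    new = [{b: im for b, im in a.items() if b not in cell} for a in alphas]
    for b in bs:
        d = ai[b]
        while d not in bs:                 # compute b (alpha_i alpha_{i-1})^k alpha_i
            d = ai[aim1[d]]
        new[i][b] = d
    return new

# 1D: contracting the edge {3, 4} merges its two end vertices (cf. Fig. 6).
alphas = [{1: 2, 2: 1, 3: 4, 4: 3}, {1: 1, 2: 3, 3: 2, 4: 4}]
new = i_contraction(alphas, {3, 4}, 1)
print(new[0], new[1])  # {1: 2, 2: 1} {1: 1, 2: 2}
```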
5 Generalisations
The previous definitions enable us to remove or to contract a single cell. For some applications, it can be more efficient to apply several operations simultaneously. Concretely, let G be an n-G-map, and assume that each dart belonging to a removed or contracted cell is marked with the dimension and type of the corresponding operation. The operations can be applied simultaneously if and only if:
– the cells are disjoint (implying that a dart carries at most one mark, so that there is no ambiguity when redefining αi for each i, 0 ≤ i ≤ n);
– the preconditions of the corresponding operations are satisfied.
Indeed, when a precondition is satisfied before a set of operations, it is still satisfied after each step of the operation (the converse is obviously false), because the cells are disjoint. This allows us to apply a set of operations simultaneously or successively and to obtain the same result.

⁶ Note that this condition does not apply for i = 1, so we can always contract any edge.
Fig. 8. (a) Simultaneous 1-removals in 2D. (b) Simultaneous 1-removals (darts marked with circles) and 1-contractions (darts marked with black disks) in 2D. For instance, 1α1' = 1(α1α0)(α1α2)α1 = 2, since the edge incident to 1α1 is contracted and the edge incident to 1(α1α0)α1 is removed.
We now present this generalization in several steps. First, we show that it is possible to perform removals (resp. contractions) of several i-cells simultaneously, for a given i (0 ≤ i ≤ n).

Generalisation 1. We can easily prove that the previous definition of removal (resp. contraction) still stands for the removal (resp. contraction) of a set of cells of the same dimension i. The (possible) precondition of the initial operation has to be satisfied for each cell (cf. Fig. 8(a)). Moreover, removing (resp. contracting) several i-cells simultaneously, or applying the initial operation successively and in any order for each removed cell, produces the same result. The main idea of the proof is the following: each αi redefinition consists in ∀b ∈ B^Si, bαi' = b(αiαi+1)^k αi. The darts of this path can be partitioned depending on the removed cells they belong to, i.e. bαi' = b(αiαi+1)^k1 (αiαi+1)^k2 . . . (αiαi+1)^kp αi. Each subpath corresponds to a single removal, so the order is not important, and each removal does not depend on the other operations. We retrieve here the notion of connecting walk of Brun and Kropatsch [26]: the darts of B^Si are surviving darts, and αi' puts two darts of B^Si in relation by traversing some non-surviving darts (the darts traversed by (αiαi+1)^k).

Generalisation 2. The previous generalization can be directly extended to simultaneously removing and contracting cells of the same dimension i. A cell is either removed or contracted, but not both at the same time. The (possible) precondition of the corresponding initial operation has to be satisfied for each cell (cf. Fig. 8(b)). More precisely, let CS^i (resp. CC^i) be a set of i-cells to remove (resp. contract), such that CS^i ∩ CC^i = ∅ and such that the (possible) precondition of the i-removal (resp. i-contraction) operation is satisfied for each cell of CS^i (resp. CC^i). Let B^Si = (CC^i ∪ CS^i)αi − (CC^i ∪ CS^i). As before, αi is redefined for these darts: ∀b ∈ B^Si, bαi' = b' = b(αiαk1) . . . (αiαkp)αi, where p is the smallest integer such that b' ∈ B^Si and, ∀j, 1 ≤ j < p, if bc = b(αiαk1) . . . (αiαkj−1)αi ∈ CS^i then kj = i + 1, else (bc ∈ CC^i) kj = i − 1.
Generalisation 3. The previous generalization can be directly extended to the removal and/or contraction of a set of disjoint cells of any dimension. The (possible) precondition of the corresponding initial operation has to be satisfied for each cell. This last generalization is possible because the cells are disjoint. If we consider two disjoint cells, there are two possibilities: the two cells have the same dimension, and they can be adjacent or not; or the two cells have different dimensions, and then they cannot be incident. The first case is covered by the previous generalizations. The second one is easy to handle since the cells are not incident: some surviving darts exist between the two cells, and αi is redefined only for these darts. The other cells are not modified, so the other preconditions remain valid when applying any subset of these operations.

The following definition covers all the previous operations. As for the previous generalizations, αi is redefined only for the darts of B^Si, but this redefinition is now done for any i, 0 ≤ i ≤ n.

Definition 5 (Simultaneous Removal and Contraction of Cells of Any Dimension). Let G = (B, α0, . . . , αn) be an n-G-map, CS^0, . . . , CS^{n−1} be sets of 0-cells, . . . , (n − 1)-cells to be removed, and CC^1, . . . , CC^n be sets of 1-cells, . . . , n-cells to be contracted. Let CS = ∪_{i=0}^{n−1} CS^i and CC = ∪_{i=1}^{n} CC^i. Two preconditions have to be satisfied: the cells are disjoint (i.e. ∀C, C' ∈ CC ∪ CS with C ≠ C', C ∩ C' = ∅), and "the degree of each cell is equal to 2", i.e.:
– ∀i, 0 ≤ i ≤ n − 2, ∀b ∈ CS^i, bαi+1αi+2 = bαi+2αi+1;
– ∀i, 2 ≤ i ≤ n, ∀b ∈ CC^i, bαi−1αi−2 = bαi−2αi−1.
Let B^Si = (CS^i ∪ CC^i)αi − (CS^i ∪ CC^i), ∀i, 0 ≤ i ≤ n. The resulting n-G-map is G' = (B', α0', . . . , αn') defined by:
– B' = B − (CC ∪ CS);
– ∀i, 0 ≤ i ≤ n, ∀b ∈ B' − B^Si, bαi' = bαi;
– ∀i, 0 ≤ i ≤ n, ∀b ∈ B^Si, bαi' = b' = b(αiαk1) . . . (αiαkp)αi, where p is the smallest integer such that b' ∈ B^Si and, ∀j, 1 ≤ j < p, if bc = b(αiαk1) . . . (αiαkj−1)αi ∈ CS^i then kj = i + 1, else (bc ∈ CC^i) kj = i − 1.

Theorem 2. The general removal and contraction operation produces an n-G-map.

Proof. Cf. [31].

An example of this last generalization is given in Fig. 9, for which all possible removal and contraction operations are applied simultaneously. We can check that we get the same result when the cells are successively removed or contracted by the initial operation in any order. We retrieve here the notion of connecting dart sequence of Brun and Kropatsch [32].
Fig. 9. An example in 2D of simultaneous removal and contraction of cells of different dimensions. (a) 2-G-map before the operations. (b) The resulting 2-G-map. Darts belonging to a removed 1-cell (resp. removed 0-cell, contracted 1-cell, contracted 2-cell) are marked with a circle (resp. an empty square, a black disk, a filled square). Darts marked with crosses belong to ∪i B^Si. Three connecting walks are represented: C1, which traverses a contracted edge; C2, which traverses two removed vertices; and C3, which traverses one contracted edge then one removed edge.
6 Conclusion and Perspectives
In this paper, we have defined removal and contraction operations that can be applied to any cells of any G-map, whatever their respective dimensions. Moreover, we have studied how to perform different operations simultaneously. These definitions are homogeneous for any dimension. Since combinatorial maps [20,25] can be easily deduced from orientable generalized maps [12], these operations can also be defined on combinatorial maps.

We intend to revisit the works of Brun and Kropatsch for handling irregular pyramids of n-dimensional generalized maps. The properties of the removal and contraction operations would enable us to establish relations between two contiguous levels within a pyramid, and thus between any levels. Efficient data structures could be deduced by taking these relations into account.

Theorem 2 means that the result of removal and contraction operations is a valid object. However, topological properties (connectivity, for instance) may not be preserved when applying these operations. Since preserving topological properties is an essential issue, in particular for image pyramids, we are studying the evolution of some topological characteristics in order to control the construction of coherent pyramids. Some results are presented in [24] in the particular framework of 2D and 3D image representation; they have to be generalized to higher dimensions and to the general case: since we know exactly the mathematical objects associated with G-maps, it is possible to apply well-known
results of combinatorial topology in order to control the evolution of topological properties.

To design efficient algorithms, another interesting perspective is the parallelization of the application of a set of operations. We think that, in the general case, checking the preconditions could be distributed over the cells: it is then possible to compute the sets Cαi − C simultaneously; the application of the operations could be distributed over the surviving darts (but this has to be studied more deeply). It is also necessary to study particular cases, for instance when the removed or contracted cells satisfy some particular properties (a well-known example consists in removing a tree of edges).
References

1. Burt, P., Hong, T.H., Rosenfeld, A.: Segmentation and estimation of image region properties through cooperative hierarchical computation. IEEE Transactions on Systems, Man and Cybernetics 11 (1981) 802–809
2. Meer, P.: Stochastic image pyramids. Computer Vision, Graphics and Image Processing 45 (1989) 269–294
3. Montanvert, A., Meer, P., Rosenfeld, A.: Hierarchical image analysis using irregular tessellations. IEEE Transactions on Pattern Analysis and Machine Intelligence 13 (1991) 307–316
4. Jolion, J., Montanvert, A.: The adaptive pyramid: a framework for 2d image analysis. Computer Vision, Graphics and Image Processing 55 (1992) 339–348
5. Kropatsch, W.: Abstraction pyramids on discrete representations. In: Discrete Geometry for Computer Imagery. Number 2301 in LNCS, Bordeaux, France (2002) 1–21
6. Kropatsch, W.: Building irregular pyramids by dual graph contraction. Technical report PRIP-TR-35, Dept. for Pattern Recognition and Image Processing, Institute for Automation, Technical University of Vienna, Austria (1994)
7. Kropatsch, W.: Building irregular pyramids by dual-graph contraction. Vision, Image and Signal Processing 142 (1995) 366–374
8. Baumgart, B.: A polyhedron representation for computer vision. In: AFIPS Nat. Conf. Number 44 (1975) 589–596
9. Guibas, L., Stolfi, J.: Primitives for the manipulation of general subdivisions and the computation of Voronoï diagrams. ACM Transactions on Graphics 4 (1985) 74–123
10. Dobkin, D., Laszlo, M.: Primitives for the manipulation of three-dimensional subdivisions. Algorithmica 4 (1989) 3–32
11. Brisson, E.: Representing geometric structures in d dimensions: topology and order. Discrete Comput. Geom. 9 (1993) 387–426
12. Lienhardt, P.: N-dimensional generalized combinatorial maps and cellular quasi-manifolds. International Journal of Computational Geometry and Applications 4 (1994) 275–324
13. de Floriani, L., Mesmoudi, M., Morando, F., Puppo, E.: Non-manifold decomposition in arbitrary dimensions. In: Discrete Geometry for Computer Imagery. Number 2301 in LNCS, Bordeaux, France (2002) 69–80
14. Edmonds, J.: A combinatorial representation for polyhedral surfaces. Notices of the American Mathematical Society 7 (1960)
15. Jacques, A.: Constellations et graphes topologiques. In: Combinatorial Theory and Applications. Volume 2. (1970) 657–673
16. Cori, R.: Un code pour les graphes planaires et ses applications. PhD thesis, Université de Paris VII (1973)
17. Elter, H., Lienhardt, P.: Cellular complexes as structured semi-simplicial sets. International Journal of Shape Modeling 1 (1994) 191–217
18. Brun, L.: Segmentation d'images couleur à base topologique. PhD thesis, Université de Bordeaux I (1996)
19. Fiorio, C.: Approche interpixel en analyse d'images: une topologie et des algorithmes de segmentation. PhD thesis, Université Montpellier II (1995)
20. Braquelaire, J., Desbarats, P., Domenger, J., Wüthrich, C.: A topological structuring for aggregates of 3d discrete objects. In: Workshop on Graph based representations, Austria, IAPR-TC15 (1999) 193–202
21. Bertrand, Y., Damiand, G., Fiorio, C.: Topological encoding of 3d segmented images. In: Discrete Geometry for Computer Imagery. Number 1953 in LNCS, Uppsala, Sweden (2000) 311–324
22. Bertrand, Y., Damiand, G., Fiorio, C.: Topological map: minimal encoding of 3d segmented images. In: Workshop on Graph based representations, Ischia, Italy, IAPR-TC15 (2001) 64–73
23. Braquelaire, J., Desbarats, P., Domenger, J.: 3d split and merge with 3-maps. In: Workshop on Graph based representations, Ischia, Italy, IAPR-TC15 (2001) 32–43
24. Damiand, G.: Définition et étude d'un modèle topologique minimal de représentation d'images 2d et 3d. PhD thesis, Université de Montpellier II (2001)
25. Brun, L., Kropatsch, W.: Dual contraction of combinatorial maps. In: Workshop on Graph based representations, Austria, IAPR-TC15 (1999) 145–154
26. Brun, L., Kropatsch, W.: Pyramids with combinatorial maps. Technical report 57, Institute of Computer Aided Automation, Vienna University of Technology, Austria (1999) URL: http://www.prip.tuwien.ac.at/.
27. Brun, L., Kropatsch, W.: The construction of pyramids with combinatorial maps. Technical report 63, Institute of Computer Aided Automation, Vienna University of Technology, Austria (2000) URL: http://www.prip.tuwien.ac.at/.
28. Brun, L., Kropatsch, W.: Contraction kernels and combinatorial maps. In: Workshop on Graph based representations, Ischia, Italy, IAPR-TC15 (2001) 12–21
29. Lienhardt, P.: Topological models for boundary representation: a comparison with n-dimensional generalized maps. Computer Aided Design 23 (1991) 59–82
30. Elter, H.: Etude de structures combinatoires pour la représentation de complexes cellulaires. PhD thesis, Université Louis-Pasteur, Strasbourg (1994)
31. Damiand, G., Lienhardt, P.: Removal and contraction for n-dimensional generalized maps. Technical Report 2003-01, Laboratoire IRCOM-SIC (2003) URL: http://damiands.free.fr/.
32. Brun, L., Kropatsch, W.: Receptive fields within the combinatorial pyramid framework. In: Discrete Geometry for Computer Imagery. Number 2301 in LNCS, Bordeaux, France (2002) 92–101
The Generation of N Dimensional Shape Primitives

Pieter P. Jonker¹ and Stina Svensson²

¹ Pattern Recognition Group, Faculty of Applied Sciences, Delft University of Technology, Lorentzweg 1, 2628 CJ Delft, The Netherlands. [email protected]
² Centre for Image Analysis, Lägerhyddvägen 3, SE-75237 Uppsala, Sweden. [email protected]
Abstract. This paper describes a method to accelerate the generation of shape primitives for N-dimensional images X^N. These shape primitives can be used in conditions for topology preserving erosion or skeletonization in N-dimensional images. The method is based on the possibility to describe primitives of intrinsic dimension Ñ = N − 1 by quadratic equations of the form

x_N = Σ_{n=0}^{N−1} (a_n x_n² + b_n x_n).
Keywords: N-dimensional morphology, N-dimensional digital topology, N-dimensional thinning
1 Introduction

In [4–9], a general principle for morphological operations in cubic tessellated binary images X^N was described. One of its major applications is the skeletonization operation, which can be described as a conditional erosion operation. The conditions to the erosion are topology preservation conditions, which are in this case constituted out of sets of structuring elements, or masks. These sets can be partitioned into a number of subsets; e.g., for three-dimensional images one can distinguish subsets that preserve volumes, surfaces, surface-ends, curves, curve-ends and single voxel objects. The subsets for topology preservation of surfaces and curves are made of so-called shape primitives, which are, in short, all possible ways a non-bifurcating surface or curve intersects a 3^3 neighbourhood. We have elaborated on the connectivity paradox for high-dimensional images and, considering the fact that one likes to have the highest possible curvature to describe foreground objects, it is preferable to select the highest connectivity for foreground objects and the lowest for background objects. The topology preservation conditions are generated out of the shape primitives by properly intersecting foreground and background primitives. For three-dimensional images, the intersections of foreground surface primitives with background curve primitives yield the surface preservation conditions; they specify all possible ways in which a foreground surface is possibly "pierced" by a background curve. This happens when

I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 420–433, 2003. © Springer-Verlag Berlin Heidelberg 2003
the central voxel swaps value from foreground to background. The same principle holds for the curve preservation conditions; they specify all possible ways in which a foreground curve is possibly "sliced" by a background surface. This, again, happens when the central voxel swaps value from foreground to background.

As we designed a general principle for skeletonization in cubic tessellated binary images X^N based on the hit-or-miss transform [10], we presented in [8] a method that can be used to generate shape primitives up to X^N. It is a bootstrapping method in which, starting from a single voxel, we gradually generate the shape primitives for ever higher object dimensions: a single voxel with two adjacent, mutually not connected neighbour voxels forms all possible space curve primitives; a voxel encircled by a closed space curve constitutes all surface primitives. We noticed also that, as a consequence, the surface primitive is 18-connected and its behaviour can be described by a quadratic equation. This can also be used to measure the surface area of objects.

When we come to skeletonization in four-dimensional images, the skeletonization procedure is identical to that in 3D and 2D; the question is only to find the suitable shape primitives. As was suggested in [8], the 3D procedure could be extended to 4D. In 4D we find space curves, curved surfaces, curved volumes and a hyper-volume. We stated that encircling a central voxel by surface patches generates the curved volumes, much like a football is made out of leather patches. However, in practice this is a recursive procedure that takes prohibitively long to generate all possible situations. As a consequence, in this paper we describe a method to speed up the generation of shape primitives, starting from their quadratic descriptions.
2 Connectivities in N Dimensional Images

A set is connected if each pair of its points can be joined by a path along points that are all in the set [3]. Within a square tessellated two-dimensional image with objects on a background, the objects can be chosen to consist of pixels connected with one or more of their 8 neighbours at (E, NE, N, NW, W, SW, S, SE), but as a consequence the background pixels must be connected with one or more of their 4 neighbours at (E, N, W, S) [1], or vice versa. This paradox extends to higher dimensions. Tessellation in higher dimensions is also known as honeycombing. Coxeter [2] proved that the only regular honeycomb, i.e., a single regular identical cell on a lattice, which exists in all dimensions N, is the cubic honeycomb. This leaves hypercubic honeycombing as the only way to set up a method that is able to operate from low to high dimensions. In [5], [6], we derived expressions to generate Nx, the connectivity between elements for N-dimensional cases, i.e. the connectivities for 3^N neighbourhoods: in 3D they are N6, N18, N26 and in 4D they are N8, N32, N64, N80.

An object in a binary image X^N is defined as a set of mutually connected image elements with the same value. Generally the highest connectivity is chosen as foreground connectivity. Consequently, the connectivity of an object is the highest connectivity between any two adjacent elements (pixels, voxels, . . . ) within that object. If an image has more
than one object, we assume that all objects have the same value 1. The set of all objects in an image is the image foreground. When we define a basic object as a non-forking object with arbitrary shape and connectivity having a single intrinsic dimension Ñ with 0 ≤ Ñ ≤ N, we can categorize the basic objects as in Table 1.

Table 1. Intrinsic dimensions of basic objects

Dimension | Intrinsic dimension | Basic object type | Non-linear form
N = 1     | Ñ = 0               | Point             | –
          | Ñ = 1               | Line              | –
N = 2     | Ñ = 0               | Point             | –
          | Ñ = 1               | Line              | Curve
          | Ñ = 2               | Plane             | –
N = 3     | Ñ = 0               | Point             | –
          | Ñ = 1               | Line              | Space Curve
          | Ñ = 2               | Planar            | Curved Surface
          | Ñ = 3               | Volume            | –
N = 4     | Ñ = 0               | Point             | –
          | Ñ = 1               | Line              | Space Curve
          | Ñ = 2               | Planar            | Curved Surface
          | Ñ = 3               | Volume            | Curved Volume
          | Ñ = 4               | Hyper-Volume      | –
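The connectivities quoted in Sect. 2 (N6, N18, N26 in 3D; N8, N32, N64, N80 in 4D) count the neighbours of a hypervoxel that differ in at most m coordinates. The derivation is in [5], [6]; the closed form below is my own restatement, which reproduces the quoted numbers:

```python
from math import comb

def connectivity(N, m):
    """Neighbours in a 3^N neighbourhood that differ in at most m coordinates."""
    # For each count j of differing coordinates: choose which j coordinates
    # differ, each of which can be offset by -1 or +1.
    return sum(comb(N, j) * 2 ** j for j in range(1, m + 1))

print([connectivity(3, m) for m in (1, 2, 3)])     # [6, 18, 26]
print([connectivity(4, m) for m in (1, 2, 3, 4)])  # [8, 32, 64, 80]
```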
An example of a basic object is a non-bifurcating, 18-connected curved surface in X^3. Another example is a non-forking, 26-connected space curve in X^3. Note that curving is only possible when 1 ≤ Ñ ≤ N − 1. For Ñ = 0 the basic object's size is the unit size, which prevents curving. For Ñ = N the basic object's structure spatially fills all dimensions and leaves no freedom for curving in another dimension. The degree of freedom for a basic object to curve is N − Ñ. Basic objects are tessellated from shape primitives with a single intrinsic dimension Ñ only.

In contrast with a basic object, we can define a compound object as an object of any arbitrary shape and size composed of one or more basic objects. Commonly a compound object is referred to as an "object"; it can bifurcate. Compound objects are tessellated from shape primitives of any intrinsic dimension. Examples in X^3 are objects composed of a number of volumes and/or surfaces and/or space curves, and in X^2 objects composed of a number of planes and/or curves. Compound objects can be thinned from an intrinsic dimension Ñ to a lower intrinsic dimension Ñ − 1 by eroding them until the topology preservation conditions for dimensions ≤ Ñ − 1 prevent further thinning. These conditions for intrinsic dimension Ñ are found by intersecting foreground and background shape primitives of Ñ.
Since each shape primitive can be given an area or length contribution (e.g., in 3D), the total area of object boundaries or the length of space curves can be calculated by adding up the contributions of the primitive occurrences within an object.
3 The Generation of Shape Primitives in X^N

In [8], we defined Table 2, which gives, per dimension N, the relations between a shape primitive of object dimension Ñ, the connectivity of the neighbour voxels to the Central Element (CE), the number of neighbour voxels connected to the central element (NE), and the recursive connectivity (RC). For example, a curved surface in 3D, (N, Ñ) = (3, 2), has at least 4 neighbour voxels 18-connected to its central voxel. The neighbour voxels are mutually 26-connected (they form a closed space curve).
Table 2. Dimension, intrinsic dimension and connectivity for various shapes

(N,Ñ) | Foreground shape primitive | CE | NE | RC | Background shape primitive | CE | NE | RC
(2,2) | Flat surface               | 4  | 4  | –  | Point                      | –  | –  | –
(2,1) | Curve                      | 8  | 2  | –  | Curve                      | 4  | 2  | –
(3,3) | Flat volume                | 6  | 6  | –  | Point                      | –  | –  | –
(3,2) | Curved surface             | 18 | ≥4 | 26 | Space curve                | 6  | 2  | 6
(3,1) | Space curve                | 26 | 2  | –  | Curved surface             | 18 | ≥4 | 6
(4,4) | Flat hyper-volume          | 8  | 8  | –  | Point                      | –  | –  | –
(4,3) | Curved volume              | 32 | ≥6 | 64 | Space curve                | 8  | 2  | 8
(4,2) | Curved surface             | 64 | ≥4 | 80 | Curved surface             | 32 | ≥4 | 8
(4,1) | Space curve                | 80 | 2  | –  | Curved volume              | 64 | ≥6 | 8
Table 2 specifies the algorithms with which the primitives can be generated:
– A curve primitive in X^3 is formed by a central voxel connected to two mutually not connected voxels. The curve roams over the 26-connected positions.
– A curved surface primitive in X^3 is formed by a central voxel encircled by a closed space curve. This curve is 26-connected but is 18-connected to the central voxel.
– A curve primitive in X^4 is formed by a central voxel connected to two mutually not connected voxels. The curve roams over the 80-connected positions.
– A curved surface primitive in X^4 is formed by a central voxel encircled by a closed space curve. The curve is 80-connected but it is 64-connected to the central voxel.
– A curved volume primitive in X^4 is formed by a central voxel encircled by a closed set of surface patches. Each patch is a surface primitive. The surface patch is 64-connected but is 32-connected to the central voxel.
424
Pieter P. Jonker and Stina Svensson
These rules can be used to generate the primitives of figures 1-5. However, for dimensions higher than 3 this becomes cumbersome because of the recursive nature of the algorithms. Using quadratic equations to generate candidates for intrinsic dimension Ñ = N − 1 speeds up this generation considerably. Fig. 1 shows the set of space curve primitives for X^3. Masks a and b form the 6-connected background set. Masks b-l form the 26-connected foreground set. Note that the masks {b, c, f, i} form the set of masks that is used as the foreground set of curves in X^2, and the masks {a, b} form the background set in X^2. The curves of the foreground set in X^2 are described (and can hence be generated) by:
y = ax + bx²    (1)

with (a, b) ∈ {0, ±0.5, ±1} and (|a| + |b| ≤ 1)    (2)
This generates a set of 7 curves, of which 3 are mirrored versions of others:

mask  (a, b)         equation             remark
b     (0, 0)         y = 0
c     (1, 0)         y = x
i     (0, 1)         y = x²
f     (0.5, 0.5)     y = 0.5x + 0.5x²
-     (−1, 0)        y = −x               mirror of c
-     (0, −1)        y = −x²              mirror of i
-     (−0.5, −0.5)   y = −0.5x − 0.5x²    mirror of f
Fig. 1. The relation between the 4- and 8-connected curve primitives in X^2 and the 6- and 26-connected curve primitives in X^3
The Generation of N Dimensional Shape Primitives
425
Observe in fig. 1 that the N8 curve primitives in 2D, {b, c, f, i} [4], are a subset of the N26 curve primitives in 3D. In X^2 they stay in the x-z plane. Note that another method to generate the 3D curve primitives {d, e, g, h, j, k, l} from the 2D curve primitives is by exploring the newly obtained degree of freedom in the y direction. All non-central voxels are permuted over the ordinates y = {−1, 0, 1}. See, e.g., {j, k, l}, which have been derived from i.
Fig. 2. The 18-connected surface primitives in X^3
Fig. 2 shows the set of foreground curved surface primitives for X^3. In geometry, two lines span a plane. Likewise, two space curves may span a curved surface. This is shown by the surface primitive of mask a, which is spanned by two perpendicular versions of mask i of fig. 1 (the quadratic mask). The reason for this is that the surfaces of the foreground set in X^3 are described (and can hence be generated) by:

z = ax + bx² + cy + dy²    (3)

with (a, b, c, d) ∈ {0, ±0.5, ±1}
and (|a| + |b| ≤ 1) and (|c| + |d| ≤ 1)
and ( ((a ≥ 0) ∧ (b ≥ 0) ∧ (c ≥ 0) ∧ (d ≥ 0)) ∨ ((b > 0) ∧ (c < 0)) )    (4)
This generates a set of 22 surfaces, of which 9 are mirrored versions of others: (5)
Fig. 3 shows some curved volume primitives for X^4. As in 3D, where two lines span a plane and two space curves may span a curved surface, in 4D two planes may span a volume and two curved surfaces may span a curved volume. The curved volumes of the foreground set in X^4 are described (and can hence be generated) by:

u = ax + bx² + cy + dy² + ez + fz²    (6)

with (a, b, c, d, e, f) ∈ {0, ±0.5, ±1}
and (|a| + |b| ≤ 1) and (|c| + |d| ≤ 1) and (|e| + |f| ≤ 1)
and ( ((a ≥ 0) ∧ (b ≥ 0) ∧ (c ≥ 0) ∧ (d ≥ 0) ∧ (e ≥ 0) ∧ (f ≥ 0)) ∨ ((b > 0) ∧ (d > 0) ∧ (f < 0)) )    (7)
Fig. 3 shows by way of example a flat volume a (u = 0), and the linear volumes¹ b (u = y), c (u = z), d (u = x + y), e (u = y + z) and f (u = x + y + z).

¹ Note that there is a difference in terminology: in general, shape primitives of dimension Ñ = N − 1 are indicated in the field of pattern recognition as hyper-planes, whereas we indicate them here as curved volumes. The term hyper-plane is used in pattern recognition to indicate that the structure separates the hyperspace into two parts, e.g. each of which contains a point cloud that represents a class. Our terminology indicates the basic shape of the objects.
4 The Number of Unique Shape Primitives for Ñ = N − 1

Quadratic equations of the form (8) can be used to generate the shape primitives of dimension Ñ = N − 1. It is of interest how many shape primitives can be found for each dimension.

X_Ñ = Σ_{n=0}^{Ñ} ( a_n x_n + b_n x_n² )    (8)
First, note that all linear solutions have two symmetry axes. Hence, for linear equations a difference version, e.g., z = x − y, is a mirrored version of the sum version, i.e., z = x + y. Secondly, all quadratic solutions have only one symmetry axis, but a difference of a linear term and a quadratic term will also produce mirrored versions. Only differences that involve two quadratic terms give rise to a new valid combination. This gives rise to the extra conditions (4) and (7) for the curved surfaces in 3D and the curved volumes in 4D.
Fig. 3. Some 32-connected curved volume primitives in X^4
In order to estimate how many candidates we may expect for shape primitives of intrinsic dimension Ñ = N − 1, we observe that in (8) the parameters come in tuples per dimension, (a, b), (c, d), ..., and each dimension adds a tuple. Each tuple has the four sum combinations (0,0), (1,0), (0,1), (0.5,0.5), which we may code with (0, 1, 2, 3). Consequently there are 4^Ñ sum candidate combinations. In the case of the 3D surface primitives, this is 16. However, there are four self-mirrors (ab, cd) = (00, 11, 22, 33), leaving 12 candidates that are asymmetric. Of those 12, 12/2 = 6 are unique (10 = 01, 20 = 02, 30 = 03, 21 = 12, 31 = 13, 32 = 23). Adding the four self-mirrors (00, 11, 22, 33) yields as number of unique 3D surface primitives, based on a sum of terms, Cm+ = 10. To set up a general equation for the number of shape primitives, observe (9), in which the generating equation is set up for Ñ = 2.
(i, j):   00  10  20  30
              11  21  31
                  22  32
                      33

⇒  Cm+ = Σ_{i=0}^{3} Σ_{j=0}^{i} 1 = 10    (9)
This can be generalized for higher dimensions to:

Cm+ = Σ_{n=0}^{Ñ} Σ_{i=0}^{n} Σ_{j=0}^{i} 1 = (1/2) Σ_{n=0}^{Ñ} (n² + 3n + 2) = (1/2) Σ_{n=0}^{Ñ} (n + 1)(n + 2)    (10)

This sum can be solved to:

Cm+ = (1/6)(Ñ + 1)(Ñ + 2)(Ñ + 3)  ⇒  Cm+ = N(N + 1)(N + 2)/6    (11)
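As a verification sketch (not part of the paper), the triple sum of (10) and the closed form of (11) can be checked against each other numerically; the helper names below are illustrative:

```python
# Verify that the triple sum (10) and the closed form (11) agree, and that
# they yield the tetrahedral numbers for N = N~ + 1 = 1..12.
def cmplus_sum(nt):                      # nt: intrinsic dimension N~
    return sum(1 for n in range(nt + 1)
                 for i in range(n + 1)
                 for j in range(i + 1))

def cmplus_closed(nt):
    return (nt + 1) * (nt + 2) * (nt + 3) // 6

assert all(cmplus_sum(nt) == cmplus_closed(nt) for nt in range(12))
values = [cmplus_closed(N - 1) for N in range(1, 13)]
print(values)  # [1, 4, 10, 20, 35, 56, 84, 120, 165, 220, 286, 364]
```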
This yields, for the intrinsic dimensions N = Ñ + 1 = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), the values Cm+ = (1, 4, 10, 20, 35, 56, 84, 120, 165, 220, 286, 364). These numbers are also known as the tetrahedral numbers from Pascal's triangle.

For the combinations involving differences of terms that all involve squares, each tuple has two combinations (0, 1), (0.5, 0.5), which we may code with (a, b). Consequently, there are in principle 2^Ñ candidate combinations: (00, 01, 10, 11), in which the first and the last are self-mirrors and the middle two are mirrors of each other. This gives as number of unique 3D surface primitives based on differences of terms: Cm− = 3. The general case is more complicated. For example, for Ñ = 2, we have the sum k + l and the difference k − l. For Ñ = 3, we have the sum k + l + m and the
differences k + l − m and k − l − m, of which the latter is, however, a mirrored version of the first difference. For Ñ = 4, we have the sum k + l + m + n and the differences k + l + m − n, k + l − m − n and k − l − m − n, where the first difference is a mirrored version of the last difference. If we write out the combinations of a and b for differences of terms for some intrinsic dimensions, we obtain Table 3. In this table, bold a and b indicate a negative term, the others a positive term. E.g., aababb indicates the term −k − l − m + n + o + p. Note that we have written down only half of the combinations; the lower half for each intrinsic dimension is identical to the upper half, i.e. for Ñ = 8 the combinations for (m, n) = (3,5), (2,6), (1,7) and (0,8) must be added. Note also the regular pattern in Table 3, which gives rise to the generating function of equation (12), valid for odd Ñ only. The second term in (12) is half of the total sum of the boxed entries in Table 3. The grey entries are omitted mirrored versions.

Cm− = Σ_{n=0}^{Ñ} ( Ñ − 1 + (m − 1)(n − 1)/2 ) = Σ_{n=0}^{Ñ} ( Ñ − 1 + (Ñ − n − 1)(n − 1)/2 )    (12)
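Equation (12) can be checked against the closed form (13) for odd Ñ; a small verification sketch (hypothetical helper names, exact rational arithmetic to avoid rounding):

```python
from fractions import Fraction as F

# Compare the sum (12) with the closed form (13) for odd N~ (the text states
# that (12) holds for odd N~ only).
def cmminus_sum(nt):
    return sum(F(nt - 1) + F((nt - n - 1) * (n - 1), 2) for n in range(nt + 1))

def cmminus_closed(nt):
    return F(nt**3, 12) + F(nt**2, 2) - F(nt, 12) - F(1, 2)

for nt in (1, 3, 5, 7, 9, 11):
    assert cmminus_sum(nt) == cmminus_closed(nt)
print([int(cmminus_sum(nt)) for nt in (1, 3, 5, 7, 9, 11)])  # [0, 6, 22, 52, 100, 170]
```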
Table 3. Combinations of a^m b^n for intrinsic dimensions Ñ = 6, 7, 8 with m + n = Ñ
[Table 3 body: for each intrinsic dimension Ñ = 8, 7, 6, the rows list, per (m, n) with m + n = Ñ, the combinations of a's and b's (negative terms printed in bold in the original); the numbers of combinations written down are 44, 26 and 22, respectively.]
This equation can be solved to:

Cm− = (1/12)Ñ³ + (1/2)Ñ² − (1/12)Ñ − 1/2,  while Cm+ was (1/6)Ñ³ + Ñ² + (11/6)Ñ + 1    (13)
For even dimensions N this gives:

Cm,even = (1/4)(N³ + 3N² − 2N)  ⇒  Cm,even = N(N + 1)(N + 2)/4 − N    (14)

Table 3 shows that for odd dimensions N, i.e. for even intrinsic dimensions Ñ, when n is also even, there is a problem, as the product (m − 1)(n − 1) is odd. For these dimensions we have to add a contribution. This results in the equation:

Cm,odd = (1/4)(N³ + 3N² − N + 1)    (15)

Written in one equation this gives:

C(N) = (1/4)(N³ + 3N² − 2N) + ((N + 1)/2)(N/2 − ⌊N/2⌋)    (16)
Consequently, the numbers for Cm− are 0, 3, 6, 14, 22, 37, 52, 76, 100, 135, 170 (for N = 2..12), while the total numbers of unique Ñ = N − 1 primitives for the dimensions N = 1..12 are:

C(1..12) = 1, 4, 13, 26, 49, 78, 121, 172, 241, 320, 421, 534    (17)
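The totals in (17) can be reproduced by evaluating (16); a verification sketch (illustrative names, exact rational arithmetic):

```python
from fractions import Fraction as F

# Evaluate eq. (16) for N = 1..12 and compare with the list in (17).
def C(N):
    base = F(N**3 + 3 * N**2 - 2 * N, 4)
    frac = F(N, 2) - F(N // 2)           # 1/2 for odd N, 0 for even N
    return base + F(N + 1, 2) * frac

totals = [int(C(N)) for N in range(1, 13)]
print(totals)  # [1, 4, 13, 26, 49, 78, 121, 172, 241, 320, 421, 534]
```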
5 The Unique Shape Primitives for Ñ < N − 1

As fig. 1 showed, the curve primitives in 2D are a subset of the curve primitives in 3D. This procedure can also be used in higher dimensions, for example to generate the ND curve primitives from the (N−1)D curve primitives by exploring the newly obtained degree of freedom. Fig. 4 shows some masks from the set of foreground curved surface primitives for X^4. The procedure to generate these masks is similar to the generation of the space curves in X^3 using the space curves from X^2. The surface masks from X^3 (figure 2) are placed in X^4. The 3D surface primitives are a subset of the 4D surface primitives; in X^4 they stay in the x-y-z subspace. The 4D masks can be generated from the 3D set by exploring the newly obtained degree of freedom in the u direction. All non-central voxels are permuted over the ordinates u = {−1, 0, 1}. Fig. 4 shows the permutation of mask l of fig. 2. In addition to this method, there is a downward method that is based on the observation that in 3D the intersection of two planes is a line. Furthermore, the intersection of two curved surfaces may produce a space curve. This is not always the case, because two intersecting curved surfaces may partially overlap. As a consequence, this method can only be used to generate candidate situations for the shape primitives of Ñ < N − 1. The candidates need to be sieved using the rules of section 3.
Fig. 4. Some 64-connected surface primitives in X 4
6 The Unique Shape Primitives for the Background

Fig. 5 shows the 6-connected surface primitives in X^3, generated by taking a central voxel and encircling it with a 6-connected space curve that is 18-connected to the central voxel. This can be accelerated by using the quadratic equations of (18). However, as can be observed in fig. 5f, the z-axis runs along the body diagonal of the neighbourhood. Consequently, the points generated by (18) have to be rotated over an angle atan(√2), rotating the body diagonal in the direction of the z-axis. Fig. 6 shows for each mask (a..f) of fig. 5 the result (a..f) of the surface generated by (18) and its rotated version.
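The rotation angle can be checked numerically: the body diagonal (1,1,1) of the neighbourhood indeed makes an angle atan(√2) with the z-axis (a small sketch, not from the paper):

```python
import math

# Angle between the body diagonal (1,1,1) and the z-axis; this is the
# rotation applied to the points generated by (18).
diag = (1.0, 1.0, 1.0)
norm = math.sqrt(sum(c * c for c in diag))
angle = math.acos(diag[2] / norm)        # acos(1/sqrt(3))
assert math.isclose(angle, math.atan(math.sqrt(2)))
print(math.degrees(angle))  # ~54.7356 degrees
```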
Fig. 5. The 6-connected surface primitives in X 3 , generated by encircling
Fig. 6. The 6-connected surface primitives in X 3 , generated by (18) and rotation
mask  (a, b; c, d)   equation
a     (0, 0; 0, 0)   z = 0
c     (1, 0; 0, 0)   z = x
e     (0, 1; 0, 0)   z = x²
d     (1, 0; 1, 0)   z = x + y
f     (0, 1; 0, 1)   z = x² + y²
b     (0, 1; 0, −1)  z = x² − y²
                                    (18)
7 Conclusions

We described a method to accelerate the generation of shape primitives for N-dimensional images. These shape primitives can be used in conditions for topology-preserving erosion or skeletonization in N-dimensional images, as well as for measurements. The method is based on the possibility to describe primitives of intrinsic dimension Ñ = N − 1 by quadratic equations. We derived an equation to predict the number of unique shape primitives thus generated, being C(N) = (1, 4, 13, 26, 49, 78, 121) for the dimensions N = Ñ + 1 = 1..7. Finally, we showed that, derived from this method, the primitive candidates for Ñ < N − 1, as well as those for the background (lowest) connectivities, can also be found. This is used to speed up their generation.
Acknowledgement This work was partially funded by the Swedish Foundation for International Cooperation in Research and Higher Education (STINT). The authors thank Jurjen Caarls for his fruitful contributions.
References

1. Rosenfeld A., Pfaltz J.L. (1966) Sequential operations in digital image processing. Journal of the ACM, 471-494
2. Coxeter H.S.M. (1974) Regular Polytopes (3rd edition). Dover Publications, Inc., New York, ISBN 0-486-61480-8
3. Alexandroff P. (1961) Elementary Concepts of Topology. Dover Publications, Inc., New York, ISBN 0-486-60747-X
4. Jonker P.P., Komen E.R., Kraayveld M.A. (1995) A scalable, real-time image processing pipeline. Machine Vision Applications, (8): 110-121
5. Jonker P.P. (1992) Morphological Image Processing: Architecture and VLSI design. (Ph.D. Thesis) Kluwer, Dordrecht, ISBN 90-2012766-7
6. Jonker P.P., Vossepoel A.M. (1994) Connectivity in high dimensional images. In: Proc. MVA'94, IAPR Workshop on Machine Vision Applications (Kawasaki, Japan, Dec. 13-15), IAPR, Kawasaki, Japan, 30-33
7. Jonker P.P., Vermeij O. (1996) On skeletonization in 4D images. In: P. Perner, P. Wang, A. Rosenfeld (eds.), Advances in Structural and Syntactical Pattern Recognition, LNCS, vol. 1121, Springer Verlag, Berlin, 79-89
8. Jonker P.P. (2002) Skeletons in N dimensions using shape primitives. Pattern Recognition Letters, 23:677-686
9. Jonker P.P. (2003) Morphological operations in recursive neighbourhoods. To appear in Pattern Recognition Letters
10. Serra J. (1982) Image Analysis and Mathematical Morphology. Academic Press Inc., London
11. Soille P. (1999) Morphological Image Analysis. Springer Verlag, Berlin
Geometric Measures on Arbitrary Dimensional Digital Surfaces Jacques-Olivier Lachaud and Anne Vialard LaBRI, Univ. Bordeaux 1, 351 cours de la Lib´eration 33405 Talence, France {lachaud,vialard}@labri.fr
Abstract. This paper proposes a set of tools to analyse the geometry of multidimensional digital surfaces. Our approach is based on several works of digital topology and discrete geometry: representation of digital surfaces, bel adjacencies and digital surface tracking, 2D tangent computation by discrete line recognition, 3D normal estimation from slice contours. The main idea is to notice that each surface element is the crossing point of n − 1 discrete contours lying on the surface. Each of them can be seen as a 4-connected 2D contour. We combine the directions of the tangents extracted on each of these contours to compute the normal vector at the considered surface element. We then define the surface area from the normal field. The presented geometric estimators have been implemented in a framework able to represent subsets of n-dimensional spaces. As shown by our experiments, this generic implementation is also efficient.
1 Introduction
Many applications in the image analysis field need to represent and manipulate regions defined as subsets of n-dimensional images. Moreover, it is often necessary to perform geometric measurements on these regions and on the digital surfaces that form their boundaries. Classically, geometric estimators are defined over frontiers in 2D or 3D images. In this paper, we present a set of tools for the analysis of the geometry of arbitrary dimensional digital surfaces. This work is based on a concise coding of the cells of n-dimensional finite regular grids [8]. This coding induces a generic and efficient framework for implementing classical digital topology data structures and algorithms. We show here that this framework is also suited to defining the geometry of digital surfaces, namely by a careful use of digital surface tracking. Note that we do not compare our work with geometric definitions based on a continuous approximation of digital sets. Our topological and geometric definitions are purely discrete. Furthermore, they are much easier to define and compute in arbitrary dimension. Some authors define arbitrary dimensional digital surfaces as set of spels (pixels in 2D, voxels in 3D, n-cells in nD) with specific properties [11]. However, frontiers of regions in images are generally not digital surfaces in this sense. Moreover, it is not clear how to extend classical 2D and 3D discrete geometry I. Nystr¨ om et al. (Eds.): DGCI 2003, LNCS 2886, pp. 434–443, 2003. c Springer-Verlag Berlin Heidelberg 2003
estimators to these surfaces. This paper is concerned with digital surfaces that are defined as subsets of the cellular decomposition of R^n into a regular grid (pixel edges in 2D, voxel faces or surfels in 3D). This space was introduced in image analysis by Kovalevsky [6]. There are several approaches to defining discrete geometric estimators on digitized objects (e.g. see [3] for a recent survey). Our approach for tangent and normal estimation follows the basic idea of "slice" decomposition proposed by Lenoir et al. [10]. Intuitively, there are n − 1 orthogonal 2D planes containing the point of interest. The intersection of each of those planes with the digitized object forms a contour on which a 2D tangent is computed. Lenoir builds the 3D normal as the vector product of the two extracted tangents. Tellier and Debled-Rennesson [13] proposed a similar technique where the tangent is defined as a discrete line segment. This paper extends these two works to arbitrary dimensional digital surfaces. We use the obtained normal estimator to compute the area of a digital surface. This definition coincides with the one proposed by Lenoir [9] in 3D. The paper is organized as follows. First we show how to represent boundaries of digital objects in arbitrary dimension as a set of surface elements (surfels) with a topology. This representation allows the definition of n − 1 contours around each surfel. Secondly we define a discrete tangent at a surfel on each of these contours and detail its computation algorithm. In the last section we combine these 2D pieces of information to obtain nD estimators (normal vector, elementary area, surface area). The presented material has been implemented in nD. We show its efficiency on some experiments. All the necessary information to reimplement it is provided.
2 Representation and Properties of Digital Surfaces
In this paper, we are interested in computing geometric characteristics of (oriented) digital surfaces that are boundaries of sets of spels. However, all the presented material is adaptable to any kind of digital surface (open or not, orientable or not) with little work. In this section, we assume we are working in a finite n-dimensional image forming a parallelepiped in Z^n. We denote by M^i the inclusive upper bound for the i-th coordinate of any spel. All coordinates have 0 as lower bound.

2.1 Cell Coding
There is an isomorphism between the cellular decomposition C^n of R^n into a regular grid and the n-dimensional Khalimsky space K^n [5]. This space is the Cartesian product of n connected ordered topological spaces (COTS). A COTS can be seen as a set of ordered discrete points, like Z, whose topology alternates closed points and open points. If we define even points of Z as closed and odd points of Z as open, each point of K^n is then identified by its n integer coordinates, whose parities define its topological properties.
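The parity-based identification can be sketched in a few lines (an illustrative toy encoding, not the paper's exact bit layout; names are hypothetical):

```python
# A cell of the Khalimsky grid is given by integer coordinates whose parities
# encode its topology: even = closed, odd = open along that coordinate.
def cell_info(k_coords):
    digital = [c // 2 for c in k_coords]     # digital coordinates x^i
    topology = [c % 2 for c in k_coords]     # 1 = open along coordinate i
    dim = sum(topology)                      # a k-cell has k open coordinates
    return digital, topology, dim

# A voxel (3-cell) in 3D: all Khalimsky coordinates odd.
_, _, d = cell_info((5, 3, 7))
assert d == 3
# A surfel (2-cell) in 3D: exactly one even coordinate, here coordinate 1,
# which is the coordinate orthogonal to the surfel.
digital, topo, d = cell_info((5, 2, 7))
assert d == 2 and topo.index(0) == 1
print(digital)  # [2, 1, 3]
```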
Consequently, any cell c of C^n has exactly one corresponding point in K^n with coordinates (x_K^0, ..., x_K^{n−1}). We propose to code any (unoriented) cell c as one binary word α x^{n−1} ... x^i ... x^0, called the unsigned code of c, as follows:

– The i-th coordinate x_K^i is coded by its binary decomposition after a right-shift (x^i = x_K^i div 2). We say that x^i is the i-th digital coordinate of c.
– All coordinates are packed as one binary word (from x^{n−1} to x^0). Every coordinate is allocated a fixed number of bits N_i, given by N_i = ⌊log2(M^i)⌋ + 1.
– The parities of all coordinates are also packed as an n-bit word α, with α = Σ_i (x_K^i mod 2) 2^i. The word α is called the topology of c.

According to the isomorphism, cells of C^n that are k-dimensional (or k-cells) have a topology word composed of k 1's. The coordinate where a surfel c has a 0 in its topology word is called the coordinate orthogonal to the surfel c and is denoted by ⊥(c). The cell topology (dimension, open or closed along a coordinate, adjacent and incident cells) and geometry (coordinates in Z^n, centroid, trivial normal and tangent vectors) can be computed from the code without any further information. It has been shown in [7] that most basic operations on cells (e.g. adjacency, incidence) have an efficient implementation that is independent of the dimension of the space. All specific subsets of C^n (e.g., objects, digital surfaces, cubical complexes) then have an efficient and compact representation.

2.2 Oriented Cells, Boundary Operators, Bels, Boundary of an Object
To define some geometric characteristics (e.g. normal vector), a digital surface must be oriented (at least locally). It is thus convenient to associate an orientation to each cell of Cn . We therefore define the signed code of an oriented cell c by adding an orientation bit s (0 for positive orientation and 1 for negative orientation) to its unsigned code as follows: α s xn−1 . . . xi . . . x0 . The opposite cell −c of c is the same cell as c but with opposite orientation. With oriented cells, we can define boundary operators, which represent at the same time how cells are incident with each others and how orientations are propagated from one cell to another. Definition 1. Let c = ik . . . ij . . . i0 s xn−1 . . . xij . . . x0 be any cell with topology bits set to 1 on the coordinates ik , . . . , ij , . . . , i0 , n − 1 ≥ ik > . . . > ij > · · · > i0 ≥ 0 and the others bits set to 0. The symbol ˆij means that the bit ij is set to 0. Let τ = (−1)(k−j) . The set ∆ij c composed of the two oppositely signed cells ik . . . ˆij . . . i0 τ s xn−1 . . . xij . . . x0 and ik . . . ˆij . . . i0 −τ s xn−1 . . . xij + 1 . . . x0 , is called the lower boundary of the cell c along coordinate ij . The lower boundary ∆c of c is then the set of cells ∪l=0,...,k ∆il c. The lower boundary of a k-cell c thus corresponds to the set of k − 1-cells low incident to c with specific orientations (e.g. on Figure 1, +b is the positively
oriented 0-cell low incident to the 1-cell b along coordinate x). The upper boundary ∇ of a cell is defined symmetrically (the upper boundary is taken on topology bits set to 0). It can be shown that this definition of boundary operators induces that any cubical cell complex is a polyhedral complex. In the remainder of the paper, the set O is an object of the image I with an empty intersection with the border of I. Assume that all spels of O are oriented positively. We merge the sets ∆p with p ∈ O with the rule that two identical cells except for their orientation cancel each other. The resulting set of oriented surfels is called the boundary of O, denoted by ∂O. It is an oriented digital surface, whose elements are called bels of O. This surface separates the object O from its complement [7]. 2.3
Followers of Surfel, Bel Adjacency, Digital Surface Tracking
The bel adjacency defines the connectedness relations between bels bounding an object. It has two nice consequences: (i) the boundary of an object can be extracted by tracking the bels throughout their bel adjacencies [1]; (ii) sets of surfels can be considered as classical Euclidean surfaces, where one can move on the surface in different orthogonal directions (2 in 3D). The second property is thus essential for defining the geometry of digital surfaces. We start by defining which surfels are potentially adjacent to a given bel with the notion of follower. We then define two kinds of bel adjacency for each pair of coordinates.

Definition 2. We say that an oriented r-cell q is a direct follower of an oriented r-cell p, p ≠ ±q, if ∆p and ∆q have a common r−1-cell, called the direct link from p to q, such that this cell is positively oriented in ∆p and negatively oriented in ∆q. The cell p is then an indirect follower of q.

It is easy to check that any surfel has 3 direct followers and 3 indirect followers along all coordinates except the one orthogonal to the surfel. We order the followers consistently for digital surface tracking (see Figure 1a).

Definition 3. Let b be an oriented n−1-cell with ∇b = {+p, −q}. Let j be a coordinate with j ≠ ⊥(b). The three direct followers of b along j are ordered as follows: (1) the first direct follower belongs to ∆j + p, (2) the second direct follower belongs to ∇j − b with +b direct link in ∆j b, (3) the third direct follower belongs to ∆j − q.

Intuitively, when tracking a digital surface, you have 3 different possibilities for a move along a given coordinate. This is true in arbitrary dimension. The following definition shows which one to choose at each step. It is in agreement with the definitions of bel adjacencies proposed by Udupa [14], but easier to implement in our framework.

Definition 4. Let b be a bel of ∂O, such that ∇b = {+p, −q} (thus p ∈ O and q ∉ O). For any coordinate j ≠ ⊥(b), the bel b has one interior direct adjacent bel (resp. exterior direct adjacent bel), which is the first (resp. last) of the three ordered direct followers of b along coordinate j that is a bel of ∂O. The bel adjacency is the symmetric closure of the direct bel adjacency.
Fig. 1. (a) Direct followers of a surfel b along coordinate x. (b) The two direct contours crossing at a given surfel in 3D.
In 3D, the interior (resp. exterior) bel adjacency along all coordinates induces the classical (6,18) bel-adjacency (resp. (18,6) bel-adjacency). Interior and exterior bel adjacencies can be mixed for different coordinate pairs. This might be useful in an application where the image data are not isotropic (e.g., some CT scan images, confocal microscopy).

2.4 Contours over Digital Surfaces
The following definition is consistent since a direct follower c of a surfel b along a coordinate j =⊥ (b) satisfies ⊥ (c) ∈ {⊥ (b), j}. Definition 5. Let S be a set of oriented surfels and i, j two distinct coordinates. A sequence of distinct surfels p0 , . . . , pk in S is called a direct {i, j}-contour over S iff: (i) ∀0 ≤ l ≤ k, ⊥ (pl ) ∈ {i, j}, and (ii) ∀0 ≤ l < k, pl+1 is a direct follower of pl along the coordinate i or j different from ⊥ (pl ). The next propositions state that contours can be defined over boundaries of objects for any pair of coordinates and that these contours can be seen as 4connected paths of pixels in the 2-dimensional plane that “contains” the contour (see Figure 1b for a 3D illustration). Proofs can be found in [7]. Proposition 1. Let b be any bel in ∂O and j any coordinate different from ⊥ (b). The sequence (pl )0≤l≤k of direct interior adjacent bels starting from b and going along either j or ⊥ (b) is a direct {⊥ (b), j}-contour over ∂O. Note that p0 is the direct interior adjacent bel of pk . Proposition 2. Given a direct {i, j}-contour C over a set of oriented surfels S with C = (pl )0≤l≤k , then the sequence D = (ql )0≤l
3 Discrete 2D Tangent over a 4-Connected Contour
From the last proposition, we can trace from any bel b the n − 1 contours Cj(b), j ≠ ⊥(b), on the boundary ∂O. Each of these contours is a 4-connected contour composed of edges and points in the 2D plane it spans. Bels are then contour edges and links are contour points. For each contour Cj(b), we define a discrete 2D tangent (βj(b), αj(b)) at b using a discrete line segment recognition algorithm.

3.1 Recognition of a 4-Connected Discrete Line Segment
An incremental algorithm was proposed for recognizing 8-connected line segments by Debled-Rennesson and Reveillès [4]. It was adapted for the recognition of 8- or 4-connected discrete tangent lines by Vialard [2] and by Tellier and Debled-Rennesson [13]. We recall here the principle of this line recognition algorithm in the case of 4-connected contours. A 4-connected discrete line of characteristics (a, b, µ) ∈ Z³ can be defined as the following set of discrete points [12]: {(x, y) ∈ Z², µ ≤ ax − by < µ + |a| + |b|}. The slope of the line is given by a/b while µ describes its location in the 2D plane. The real lines of equations ax − by = µ and ax − by = µ + |a| + |b| − 1 are called the upper and lower leaning lines. A point belonging to the upper (resp. lower) leaning line is called an upper (resp. lower) leaning point. Let us now consider a discrete line segment. We denote by U (resp. U′) the upper leaning point of minimum (resp. maximum) abscissa of this segment. In the same way, we denote by L (resp. L′) the lower leaning point of minimum (resp. maximum) abscissa of this segment. Given a starting point on the contour, we orient the x-axis in the direction of the following point. The initial characteristics of the line segment are (0, 1, 0), with U = L = (0, 0) and U′ = L′ = (1, 0). Now assume that the characteristics of the line segment are (a, b, µ) after adding m successive contour points. When adding the next contour point (x, y), we update the characteristics of the line according to the value r = ax − by, with the rules defined in the following table. In the three first cases the point (x, y) extends the segment without changing its characteristics (a, b, µ); the new point may just become a leaning point of maximum abscissa. In cases (4) and (5) the segment plus the point (x, y) is still a line segment. In case (4) (resp. (5)) the slope of the extended line segment is greater (resp. lower) than the slope of the initial line segment.
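The set-membership definition of a 4-connected discrete line translates directly into a remainder test; a minimal sketch (function name is illustrative):

```python
# (x, y) belongs to the 4-connected discrete line of characteristics
# (a, b, mu) iff mu <= a*x - b*y < mu + |a| + |b|.
def on_line(a, b, mu, x, y):
    r = a * x - b * y
    return mu <= r < mu + abs(a) + abs(b)

# Points of the line (1, 2, -1) of Fig. 2, and one point outside it.
assert all(on_line(1, 2, -1, x, y)
           for (x, y) in [(0, 0), (1, 0), (1, 1), (2, 1), (3, 1)])
assert not on_line(1, 2, -1, 2, 2)
print("ok")
```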
These two last cases are illustrated in Figure 2. Any other value of r indicates that the current line segment completed by point (x, y) is no longer a line segment.

(1) µ < r < µ + |a| + |b| − 1: no change.
(2) r = µ: U′ ← (x, y).
(3) r = µ + |a| + |b| − 1: L′ ← (x, y).
(4) r = µ − 1: U′ ← (x, y), L ← L′; then a ← yU′ − yU, b ← xU′ − xU, µ ← a·xU′ − b·yU′.
(5) r = µ + |a| + |b|: L′ ← (x, y), U ← U′; then a ← yL′ − yL, b ← xL′ − xL, µ ← a·xL′ − b·yL′ − |a| − |b| + 1.
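The five cases above can be sketched as follows — forward extension only (the tangent computation of Sect. 3.2 also extends backward); as in the text, the first two points are assumed to be (0, 0) and (1, 0) after the change of coordinates, and the function name is illustrative:

```python
# Incremental recognition of a 4-connected discrete line segment.
def recognize_4dss(points):
    a, b, mu = 0, 1, 0
    U, Up = (0, 0), (1, 0)               # first/last upper leaning points
    L, Lp = (0, 0), (1, 0)               # first/last lower leaning points
    for (x, y) in points[2:]:
        r = a * x - b * y
        w = abs(a) + abs(b)
        if mu <= r <= mu + w - 1:        # cases (1)-(3): same characteristics
            if r == mu:
                Up = (x, y)
            if r == mu + w - 1:
                Lp = (x, y)
        elif r == mu - 1:                # case (4): slope increases
            a, b = y - U[1], x - U[0]
            Up, L = (x, y), Lp
            mu = a * x - b * y
        elif r == mu + w:                # case (5): slope decreases
            a, b = y - L[1], x - L[0]
            Lp, U = (x, y), Up
            mu = a * x - b * y - abs(a) - abs(b) + 1
        else:
            return None                  # no longer a discrete line segment
    return a, b, mu

# A 4-connected contour piece: recognition ends with characteristics (1,2,-1).
print(recognize_4dss([(0, 0), (1, 0), (1, 1), (2, 1), (3, 1), (3, 2), (4, 2)]))
# (1, 2, -1)
```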
Fig. 2. Recognition of a 4-connected line. (a) Slope increase - (b) Slope decrease.
Fig. 3. Tangent line computation. (a) Initialization. (b–e) Growth of the tangent line segment. (f) The contour piece is no longer a discrete line segment. The tangent line is thus the discrete line segment obtained at the previous step.
3.2
Discrete Tangent Computation
The tangent line segment at a contour edge e can be defined as the longest line segment corresponding to the contour and centered on e. This definition is a slight adaptation of the one given in [2]: here the discrete tangent is centered on a contour edge instead of a contour point. Computing the discrete tangent at e is performed by successively adding pairs of points, one of negative abscissa and one of positive abscissa, to a discrete segment. The preceding line segment recognition algorithm is therefore slightly adapted so that points are added alternately to the front and to the back of the segment. The rules for adding a point to the back are very similar to the ones presented in the previous table. Figure 3 illustrates the tangent computation algorithm.

Definition 6. Given a bel b and a coordinate j ≠ ⊥(b), the 2D tangent vector is defined as (β_j(b), α_j(b)) = (b, a), where (a, b, µ) are the characteristics of the tangent line segment computed over the contour C_j(b).
4
Geometric Measures
In this section, we define the normal vector to, and the area of, a bel from its n − 1 2D tangent vectors. We assume that (e_0, . . . , e_{n−1}) is the trivial orthonormal basis of R^n.
Geometric Measures on Arbitrary Dimensional Digital Surfaces
4.1
Tangent Vectors and Plane at a Bel; Normal Vector at a Bel
The orientations of the tangent vectors in the following definitions come from the definition of the boundary operators (see the computation of τ in Definition 1) and from the fact that contours are implicitly oriented by the sequence of direct links.

Definition 7. Let b be a bel of ∂O. Let i = ⊥(b) and j a coordinate different from i. The j-th tangent vector t_j(b) at b is the n-dimensional vector (−1)^(n−1−j) β_j(b) e_j + (−1)^(n−i) α_j(b) e_i.

Those n − 1 tangent vectors at b span an (n − 1)-dimensional plane since they are linearly independent. We define the tangent plane at b as the affine plane parallel to these vectors and containing the centroid of b. It is now easy to define the normal vector at b.

Definition 8. The normal vector n(b) at bel b on ∂O is the unit vector orthogonal to any vector of the tangent plane at b and pointing outside the object O.

It is easy to find that n(b) = u(b)/‖u(b)‖ with, for all j ≠ ⊥(b), u(b) · e_j = (−1)^(n−j) α_j(b)/β_j(b), and, for i = ⊥(b), u(b) · e_i = (−1)^(n−i−1).

4.2
Elementary Area of a Bel; Area of a Boundary
As the boundary of an object is made of bels, each bel has a given contribution to the area of the whole boundary.

Definition 9. The elementary area dσ(b) of a bel b is defined as dσ(b) = 1 / (Σ_{d=0}^{n−1} |n(b) · e_d|). The area of the boundary of O is then the sum Σ_{b∈∂O} dσ(b).

The following theorem justifies the previous definition by examining the elementary area of each bel of a 3D plane.

Theorem 1. Let U = (bc, 0, 0), V = (0, ac, 0), W = (0, 0, ab) be three points of R^3 with a, b, c positive integers. The continuous plane P containing the triangle UVW follows the equation ax + by + cz = abc and its normal vector n is thus (1/√(a² + b² + c²)) (a, b, c). The digital plane Q, digitized version of P, follows the equation abc ≤ ax + by + cz < abc + a + b + c, and forms the vertices of a set of bels in C^3. Then the elementary contribution to the area of each bel of Q is 1/(n · e_0 + n · e_1 + n · e_2).

Proof. Each bel of Q can be projected onto the plane parallel to it and going through the origin. We restrict Q to the bels included in the positive octant. The number of bels m of Q is therefore obtained by counting the projected bels on each of the three planes of projection. There are ac·ab/2 bels projected on x = 0, bc·ab/2 bels on y = 0, and bc·ac/2 bels on z = 0, so that m = (abc/2)(a + b + c). Now this subset of Q corresponds exactly to the triangle UVW. The elementary area of each bel of Q is equal to the total area of the triangle UVW divided by the number of bels of Q that are part of the digitization of UVW. The area of UVW is given by the identity UV ∧ UW = abc (a, b, c) = 2 area(UVW) n. A short computation concludes the proof.
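As an illustration of Definitions 7–9 and Theorem 1, here is a small Python sketch; it is ours, not from the paper, and the dictionary encoding of the tangent pairs (β_j(b), α_j(b)) is an assumption of this example.

```python
import math

def normal_at_bel(n_dim, i, tangents):
    """u(b) from the 2D tangent characteristics (Definitions 7-8):
    u(b)·e_j = (-1)^(n-j) * alpha_j(b)/beta_j(b) for j != i = ⊥(b),
    u(b)·e_i = (-1)^(n-i-1);  n(b) = u(b)/|u(b)|.
    `tangents[j]` holds the pair (beta_j(b), alpha_j(b))."""
    u = [0.0] * n_dim
    u[i] = (-1.0) ** (n_dim - i - 1)
    for j, (beta, alpha) in tangents.items():
        u[j] = (-1.0) ** (n_dim - j) * alpha / beta
    norm = math.sqrt(sum(c * c for c in u))
    return [c / norm for c in u]

def elementary_area(n):
    """dσ(b) = 1 / sum_d |n(b)·e_d|  (Definition 9)."""
    return 1.0 / sum(abs(c) for c in n)

# Numerical check of Theorem 1: the plane ax + by + cz = abc has unit
# normal (a, b, c)/sqrt(a^2 + b^2 + c^2); its digitization restricted
# to the positive octant has m = abc(a+b+c)/2 bels, and m * dσ equals
# the area of the triangle UVW, namely abc * sqrt(a^2+b^2+c^2) / 2.
a, b, c = 2, 3, 5
norm = math.sqrt(a * a + b * b + c * c)
n = (a / norm, b / norm, c / norm)
m = a * b * c * (a + b + c) / 2
assert math.isclose(m * elementary_area(n), a * b * c * norm / 2)
```

The final assertion reproduces, for one choice of (a, b, c), the computation that concludes the proof above.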
Table 1. Comparison between discrete geometric estimators and expected geometric measures. The object under consideration is a ball of increasing radius. For each bel b, we measure the positive angle φ in degrees between the expected normal vector to the sphere and the discrete normal n(b). Mean value and standard deviation of φ from 0 are listed for 2D, 3D and 4D balls of increasing radii. The estimated area of the discrete balls, their number of surfels, and the computation time of the normal vector fields are also listed (tests made on a Celeron 400 MHz with 128 Mbytes of memory).

                                 2D ball          3D ball                  4D ball
                                 r=50    r=1250   r=20    r=50    r=100    r=10    r=30
  nb surfels                     404     10004    7542    47070   188502   33352   904648
  normal computation time (ms)   0       380      170     1230    6210     1070    29960
  mean value of φ                2.24°   0.22°    3.82°   2.19°   1.51°    6.75°   3.98°
  std. dev. of φ from 0          6.47°   1.30°    5.76°   3.46°   2.34°    8.09°   5.15°
  area / expected area           1.011   1.000    0.994   0.997   0.998    1.042   1.042
Fig. 4. 3D examples of normal vector computations. Surfaces are rendered with flat shading. (a) Sphere of radius 30 with trivial normals of bels. (b) Same object but with discrete normals. (c) Cube minus sphere with discrete normals.
This theorem can be extended to arbitrary dimension using the n-dimensional exterior product, since it also provides an area measurement of (n − 1)-dimensional parallelograms. The preceding exposition is sufficient to understand the link between the normal vector and the elementary area of a bel without too cumbersome notations. A corollary to this theorem is that if the discrete object of interest is the digitization of a continuous object with good properties (C^1 boundary), then the area of the discrete object tends toward the area of the continuous object as the discretization resolution increases. Our experiments have confirmed this theoretical result. Table 1 shows that the proposed discrete estimators of normal and area are consistent with expected values. Figure 4 illustrates the computation of the normal vector field for two different objects.
5
Conclusion
We have defined several geometric measures (tangent plane, normal vector, elementary area, surface area) for boundaries of n-dimensional objects and we have
shown how to compute them efficiently. An immediate extension of this work is the definition and computation of the mean curvature field of (n − 1)-dimensional digital surfaces. Our main motivation is the development of a multidimensional discrete deformable model for image segmentation. Local area and curvature measurements are used to maintain a regular and smooth shape during the evolution of the model towards the boundaries of image components.
References

1. E. Artzy, G. Frieder, and G.T. Herman. The theory, design, implementation and evaluation of a three-dimensional surface detection algorithm. Computer Graphics and Image Processing, 15:1–24, 1981.
2. J.P. Braquelaire and A. Vialard. Euclidean paths: a new representation of boundary of discrete regions. Graphical Models and Image Processing, 61:16–43, 1999.
3. D. Coeurjolly. Algorithmique et géométrie discrète pour la caractérisation des courbes et des surfaces. PhD thesis, Université Lumière Lyon 2, France, Dec. 2002.
4. I. Debled-Rennesson and J.P. Reveillès. A linear algorithm for segmentation of discrete curves. International Journal of Pattern Recognition and Artificial Intelligence, 9:635–662, 1995.
5. T. Y. Kong, R. D. Kopperman, and P. R. Meyer. A topological approach to digital topology. Am. Math. Monthly, 98:901–917, 1991.
6. V. A. Kovalevsky. Finite topology as applied to image analysis. Computer Vision, Graphics, and Image Processing, 46(2):141–161, May 1989.
7. J.-O. Lachaud. Coding cells of multidimensional digital spaces to write generic digital topology and geometry algorithms. Research Report 1283-02, LaBRI, University Bordeaux 1, Talence, France, 2002.
8. J.-O. Lachaud. Coding cells of digital spaces: a framework to write generic digital topology algorithms. In Proc. Int. Work. Combinatorial Image Analysis (IWCIA'2003), Palermo, Italy, ENDM. Elsevier, 2003. To appear.
9. A. Lenoir. Des outils pour les surfaces discrètes. Estimation d'invariants géométriques. Préservation de la topologie. Tracé de géodésiques. Visualisation. PhD thesis, Université de Caen, France, Sep. 1999.
10. A. Lenoir, R. Malgouyres, and M. Revenu. Fast computation of the normal vector field of the surface of a 3D discrete object. In Proc. of 6th Discrete Geometry for Computer Imagery (DGCI'96), Lyon, France, volume 1176 of LNCS, pages 101–112. Springer-Verlag, 1996.
11. G. Malandain. On topology in multidimensional discrete spaces. Research Report 2098, INRIA, France, 1993.
12. J.P. Reveillès. Géométrie discrète, calcul en nombres entiers et algorithmique. PhD thesis, Université Louis Pasteur, Strasbourg, France, 1991.
13. P. Tellier and I. Debled-Rennesson. 3D discrete normal vectors. In Proc. of 8th Discrete Geometry for Computer Imagery (DGCI'99), Marne-la-Vallée, France, volume 1568 of LNCS, pages 447–457. Springer-Verlag, 1999.
14. J. K. Udupa. Multidimensional digital boundaries. CVGIP: Graphical Models and Image Processing, 56(4):311–323, July 1994.
Nonlinear Optimization for Polygonalization Truong Kieu Linh1 and Atsushi Imiya2,3 1
School of Science and Technology, Chiba University, Japan 2 National Institute of Informatics, Japan 3 Institute of Media and Information Technology, Chiba University, Japan
Abstract. In this paper, we first derive a set of inequalities for the parameters of a Euclidean line from sample pixels, and an optimization criterion with respect to this set of constraints for the recognition of the Euclidean line. Second, using this optimization problem, we prove uniqueness and ambiguity theorems for the reconstruction of a Euclidean line. Finally, we develop a polygonalization algorithm for the boundary of a discrete shape.
1
Introduction
In this paper, we aim to develop a polygonalization algorithm for the boundary of a discrete shape. For the reconstruction of a smooth boundary from sample points, polygonalization on a plane is the first step. Following the polygonalization, we estimate the geometric features of a figure, such as the normal vector at each point on the boundary, and the length and area of planar shapes. We first derive a set of inequalities for the parameters of a Euclidean line from sample points, and an optimization criterion with respect to this system of inequalities. Second, we develop an algorithm for the computation of the parameters of Euclidean lines from pixels on a plane. Finally, using this algorithm, we introduce a polygonalization algorithm. There are basically three types of models for the expression of a linear manifold in the grid space: the supercover, standard, and naive models [1]. We deal with the supercover model on a plane. Recently, a linear-programming-based method for the recognition of linear manifolds has been proposed [2,3]. This method is based on the mathematical property that a point set determines a system of linear inequalities for the parameters of a linear manifold, so that the recognition process for a linear manifold is converted to the computation of the feasible region of this system of inequalities. The other class of methods for the recognition of a linear manifold is based on the binary relations among local configurations of pixels in 3 × 3 neighborhoods, which characterize the local properties of the discrete linear manifold [4,5,6,7]. The method proposed in this paper follows the former approach for the derivation of constraints on the parameters of the Euclidean line that passes through pixels. Furthermore, we derive a minimization criterion for the parameters of the Euclidean line with respect to the constraints yielded by sets of pixels on a plane. I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 444–453, 2003. © Springer-Verlag Berlin Heidelberg 2003
2
Optimization Problem for Recognition
Setting p = (p_1, p_2)^T to be a point in Z^2, the hypercube

v(p) = { x | |x_i − p_i| ≤ 1/2 },   (1)

for x = (x_1, x_2)^T is called a pixel of Z^2. Hereafter, P = {x_i}_{i=1}^N stands for the set of the centers of the pixels V = {v(x_i)}_{i=1}^N. For a = (a, b)^T ∈ Z^2 and µ ∈ Z, the supercover of the line a^T x + µ = 0 is the collection of pixels which satisfy the inequality

|a^T x + µ| ≤ (1/2)|a|_1,   (2)

where |x|_1 is the l_1-norm of the vector x. Setting s_x = (sgn x_1, sgn x_2)^T, the l_1-norm is expressed as |x|_1 = s_x^T x, since |x|_1 = Σ_{i=1}^n |x_i|. This yields

−(1/2) a^T s_a ≤ a^T x + µ ≤ (1/2) a^T s_a.   (3)
Recognition of a Euclidean line is stated as the following problem.

Problem 1. For a collection of sample points x_i, i = 1, 2, ..., N, if there exists a Euclidean line whose supercover contains all pixels {v(x_i)}_{i=1}^N, compute the parameters a and µ.

This problem is mathematically equivalent to finding parameters a and µ which satisfy the system of inequalities

|a^T x_i + µ| ≤ (1/2)|a|_1,  i = 1, 2, ..., N.   (4)
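The membership test of eq. (4) is easy to state in code. The following Python sketch is ours, a brute-force search for illustration only, not the optimization algorithm developed in this paper; the search bound is an assumption of the example.

```python
def in_supercover(a, b, mu, x, y):
    # The pixel centered at (x, y) belongs to the supercover of
    # ax + by + mu = 0  iff  |ax + by + mu| <= (|a| + |b|) / 2.
    return 2 * abs(a * x + b * y + mu) <= abs(a) + abs(b)

def recognize_line(pixels, bound=20):
    """Exhaustively search integer (a, b, mu) whose supercover contains
    all pixels, minimizing |a| + |b| + |mu|.  Illustration only: the
    paper instead computes the feasible cone and minimizes over it."""
    best = None
    for a in range(-bound, bound + 1):
        for b in range(-bound, bound + 1):
            if a == 0 and b == 0:
                continue
            for mu in range(-bound * bound, bound * bound + 1):
                if all(in_supercover(a, b, mu, x, y) for x, y in pixels):
                    cost = abs(a) + abs(b) + abs(mu)
                    if best is None or cost < best[0]:
                        best = (cost, a, b, mu)
    return best
```

For example, the pixel centers (0,0), (1,0), (2,1), (3,1), (4,2) are all within the supercover of x − 2y = 0, and the search recovers a line of minimal cost 3.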
Equation (4) is equivalent to the system of inequalities

a^T x_i + µ ≥ −(1/2)|a|_1,  −a^T x_j − µ ≥ −(1/2)|a|_1,  i, j = 1, 2, ..., N.   (5)
Furthermore, eq. (5) yields the system of inequalities

a^T (x_ij + s_a) ≥ 0,  −a^T x_i − (1/2)|a|_1 ≤ µ ≤ −a^T x_i + (1/2)|a|_1,   (6)

for i, j = 1, 2, ..., N, i ≠ j, where x_ij = x_i − x_j. Since the vector x_ij + s_a, i ≠ j, is constant for i, j = 1, 2, ..., N, eq. (6) determines a cone in Z^2, that is, an integer cone in R^2, depending on the half-space in which a lies. Since

(1/2)|a|_1 ≤ (1/2)(|a|_1 + |µ|) ≤ |a|_1 + |µ|,   (7)

we solve the following problem.
Problem 2. Find a ∈ Z^2 and µ ∈ Z minimizing z = y + |µ|, for z ∈ Z, with respect to

−a^T x_i − (1/2) y ≤ µ ≤ −a^T x_i + (1/2) y,   (8)

and for y which minimizes y = |a|_1, for a ∈ Z^2, with respect to

a^T y_ij ≥ 0,  i, j = 1, 2, ..., N, i ≠ j,   (9)

where y_ij = x_i − x_j + s_a.

For the supercover of a line, we have the following theorem.

Theorem 1. Consider the pixels v(x_i) with |a^T x_i + µ| ≤ (1/2)|a|_1. If |a|_1 = 2n, then only the line a^T x + µ = 0 passes through all v(x_i); if |a|_1 = 2n + 1, then for all |γ| ≤ 1/2 the line a^T x + µ + γ = 0 passes through all pixels v(x_i) with |a^T x_i + µ| ≤ (1/2)|a|_1.

(Proof) If |a|_1 = 2n, then we have 0 ≤ a^T(x_i − (1/2)e) + µ ≤ n. For y_i = x_i − (1/2)e,

{v(x_i) | 0 ≤ a^T x + µ + (1/2)|a|_1 ≤ |a|_1} = {v(x) | 0 ≤ a^T(x_i + (1/2)e) + µ ≤ |a|_1}.   (10)

Furthermore, a^T(x − (1/2)e) + µ = 0 and a^T x + µ = 0 are the same line. This geometric property leads to the first part of the theorem. If |a|_1 = 2n + 1, we have 0 ≤ a^T x_i + µ + γ + (1/2)|a|_1 ≤ |a|_1 for |γ| ≤ 1/2. Therefore, a^T x_i + µ + γ = 0 and a^T x_i + µ = 0 determine the same set of pixels. This geometric property leads to the second part of the theorem. (Q.E.D.)

For |a|_1 = 2n + 1, the lines exist in a strip whose center line is a^T x + µ = 0. For an integer λ, if λµ + λγ is an integer, both a^T x + µ + γ = 0 and λ a^T x + λµ + λγ = 0 are the same line. Therefore |λ| > 1, since

|a|_1 + |µ| < |λ|(|a|_1 + |µ|) ≤ |a'|_1 + |µ'|   (11)

for a' = λa and µ' = λ(µ + γ). This inequality geometrically means that the line which minimizes |a|_1 + |µ| is the central line of the strip region. This geometric property shows the validity of our minimization algorithm. Therefore, we have the following theorem.

Theorem 2. If and only if |a|_1 = 2n, the supercover |a^T x + µ| ≤ (1/2)|a|_1 contains 2 × 2 squares.

(Proof) If |a|_1 = 2n, the Euclidean line a^T x + µ = 0 passes through the point x + (1/2)e for some x ∈ Z^2. Moreover, for pixels v(x) and v(x + e_i), if |a^T x + µ| ≤ (1/2)|a|_1, the Euclidean line a^T x + µ = 0 passes through the point x + (1/2)e, and |a|_1 = 2n. (Q.E.D.)

This theorem leads to the conclusion that, for a supercover, 2 × 2 squares guarantee the uniqueness of the Euclidean reconstruction of a plane. For gcd(a, b, µ) = 1, let gcd(a, b) = g, a' = a/g, and b' = b/g. For the supercover of the line L: ax + by + µ = 0, elementary number theory yields the relations in Tables 1 and 2 on the uniqueness of the Euclidean reconstruction of lines with
Table 1. Reconstruction of a line from the supercover with bubbles.

  gcd(a, b) | a + b       | Equivalent line | Universal line
  g = 1     | a + b: even | L               | ∅
  g = 2     | a + b: odd  | L               | ∅
  g > 2     | ×           | ×               | ×
Table 2. Reconstruction of a line from the supercover without bubbles.

  gcd(a, b) = 1, a + b odd:
    Center line: L
    Equivalent line: kax + kby + kµ + kε = 0, where k ∈ Z, |ε| < 1/2, kε ∈ Z
    Universal line: 2ax + 2by + 2µ ± 1 = 0
  gcd(a, b) = 2, a + b odd:
    Center line: L
    Equivalent line: kax + kby + kµ + kε = 0, where k ∈ Z, |ε| < 1, kε ∈ Z
    Universal line: ax + by + µ ± 1 = 0
  gcd(a, b) > 2, a + b odd:
    Center line: a'x + b'y + µ' = 0, where µ' ∈ Z, µ/g − 1/2 < µ' < µ/g + 1/2
    Equivalent line: ka'x + kb'y + kµ' + kε = 0, where k ∈ Z, |ε| < 1/2, kε ∈ Z
    Universal line: 2a'x + 2b'y + 2µ' ± 1 = 0
  gcd(a, b) > 2, a + b even:
    Center line: a'x + b'y + µ' = 0, where µ' ∈ Z, µ/g − 1 < µ' < µ/g
    Equivalent line: ka'x + kb'y + kµ' + kε = 0, where k ∈ Z, 0 < ε < 1, kε ∈ Z
    Universal line: 2a'x + 2b'y + 2µ' + 1 = 0 and a'x + b'y + µ' + 1 = 0
bubbles and without bubbles, respectively, from the geometrical and algebraic properties of bubbles. In the tables, Q is the set of all rational numbers, and the center line of the supercover L is the line which minimizes the optimization criterion. The equivalent lines of L are the lines which define the same supercover as L. The universal lines of L are the lines which contain all pixels of the supercover of L. The supercover of a universal line of L always contains bubbles. These relations imply that if gcd(a, b) = 1, the line which minimizes the criterion is uniquely computed. Furthermore, if a + b is odd, the line reconstructed from the supercover does not pass through the corners of pixels.
3
Algorithm for Line Recognition
Generally, we can set a > 0 and b > 0. Then, we have the following two cases:

case 1:  0 ≤ ax + by + µ + (a + b)/2 ≤ a + b,   (12)
case 2:  0 ≤ ax − by + µ + (a + b)/2 ≤ a + b.   (13)
We set X_ij = x_i − x_j + 1 and Y_ij = y_i − y_j + 1 for

P = {x_i = (x_i, y_i)^T | x_i, y_i ∈ Z, i = 1, 2, ..., N},   (14)

where x_1 ≤ x_2 ≤ x_3 ≤ ... ≤ x_N. We define the sets of points

H    = {(X_ij, Y_ij)^T | i ≠ j, i, j = 1, 2, ..., N},
Q_++ = {(X_ij, Y_ij)^T ∈ H | X_ij > 0, Y_ij > 0, i ≠ j},
Q_-- = {(X_ij, Y_ij)^T ∈ H | X_ij < 0, Y_ij < 0, i ≠ j},
Q_+- = {(X_ij, Y_ij)^T ∈ H | X_ij > 0, Y_ij < 0, i ≠ j},
Q_-+ = {(X_ij, Y_ij)^T ∈ H | X_ij < 0, Y_ij > 0, i ≠ j},
Q_0X = {(X_ij, Y_ij)^T ∈ H | X_ij = 0, i ≠ j},
Q_0Y = {(X_ij, Y_ij)^T ∈ H | Y_ij = 0, i ≠ j}.   (15)

For these sets of points, if at least one of the four conditions
1. Q_-- ≠ ∅,
2. ∃(X_ij, Y_ij)^T ∈ Q_0X with Y_ij ≤ 0,
3. ∃(X_ij, Y_ij)^T ∈ Q_0Y with X_ij ≤ 0,
4. Q_+- ≠ ∅, Q_-+ ≠ ∅, and min{−X_ij/Y_ij | (X_ij, Y_ij)^T ∈ Q_+-} < max{−X_nm/Y_nm | (X_nm, Y_nm)^T ∈ Q_-+}
is satisfied, the system of inequalities has no solution. Therefore, we have the following theorem.

Theorem 3. If

Q_-- = ∅,
∀(X_ij, Y_ij)^T ∈ Q_0X: Y_ij > 0,
∀(X_ij, Y_ij)^T ∈ Q_0Y: X_ij > 0,
min{−X_ij/Y_ij | (X_ij, Y_ij)^T ∈ Q_+-} ≥ max{−X_nm/Y_nm | (X_nm, Y_nm)^T ∈ Q_-+},   (16)

then the point set P is the supercover of a Euclidean line ax + by + µ = 0 for a > 0 and b > 0 which lies in the region X̄_ij a + Ȳ_ij b ≥ 0, X̄_nm a + Ȳ_nm b ≥ 0, where

(X̄_ij, Ȳ_ij)^T achieves min{−X_ij/Y_ij | (X_ij, Y_ij)^T ∈ Q_+-},
(X̄_nm, Ȳ_nm)^T achieves max{−X_nm/Y_nm | (X_nm, Y_nm)^T ∈ Q_-+}.   (17)
For x_i = (x_i, y_i)^T, we now set X_ij = x_i − x_j + 1 and Y_ji = y_j − y_i + 1. Then, for

H    = {(X_ij, Y_ji)^T | i ≠ j, i, j = 1, 2, ..., N},
Q_++ = {(X_ij, Y_ji)^T ∈ H | X_ij > 0, Y_ji > 0, i ≠ j},
Q_-- = {(X_ij, Y_ji)^T ∈ H | X_ij < 0, Y_ji < 0, i ≠ j},
Q_+- = {(X_ij, Y_ji)^T ∈ H | X_ij > 0, Y_ji < 0, i ≠ j},
Q_-+ = {(X_ij, Y_ji)^T ∈ H | X_ij < 0, Y_ji > 0, i ≠ j},
Q_0X = {(X_ij, Y_ji)^T ∈ H | X_ij = 0, i ≠ j},
Q_0Y = {(X_ij, Y_ji)^T ∈ H | Y_ji = 0, i ≠ j},   (18)

if at least one of the four conditions
1. Q_-- ≠ ∅,
2. ∃(X_ij, Y_ji)^T ∈ Q_0X with Y_ji ≤ 0,
3. ∃(X_ij, Y_ji)^T ∈ Q_0Y with X_ij ≤ 0,
4. Q_+- ≠ ∅, Q_-+ ≠ ∅, and min{−X_ij/Y_ji | (X_ij, Y_ji)^T ∈ Q_+-} < max{−X_mn/Y_nm | (X_mn, Y_nm)^T ∈ Q_-+}
is satisfied, the system of inequalities has no solution. Therefore, we have the following theorem.

Theorem 4. If

Q_-- = ∅,
∀(X_ij, Y_ji)^T ∈ Q_0X: Y_ji > 0,
∀(X_ij, Y_ji)^T ∈ Q_0Y: X_ij > 0,
min{−X_ij/Y_ji | (X_ij, Y_ji)^T ∈ Q_+-} ≥ max{−X_mn/Y_nm | (X_mn, Y_nm)^T ∈ Q_-+},   (19)

then the point set P is the supercover of a Euclidean line ax − by + µ = 0 for a > 0 and b > 0 which lies in the region

X̄_ij a + Ȳ_ji b ≥ 0,  X̄_mn a + Ȳ_nm b ≥ 0,   (20)

where

(X̄_ij, Ȳ_ji)^T achieves min{−X_ij/Y_ji | (X_ij, Y_ji)^T ∈ Q_+-},
(X̄_mn, Ȳ_nm)^T achieves max{−X_mn/Y_nm | (X_mn, Y_nm)^T ∈ Q_-+}.
Assume that the feasible region of the inequalities is the cone bounded by the two lines α_1 x + β_1 y ≤ 0 and α_2 x + β_2 y ≥ 0. Since a > 0 and b > 0, the minimum of |a| + |b| is computed by the following algorithm.
step 1: Set a + b = k.
step 2: The line a + b = k crosses the lines α_1 a + β_1 b = 0 and α_2 a + β_2 b = 0 at the points (a_1, b_1)^T and (a_2, b_2)^T, respectively. Then we have a_1 = β_1 k/(β_1 − α_1) and a_2 = β_2 k/(β_2 − α_2).
step 3: k := 2.
step 4: For a ∈ {a | a_1 ≤ a ≤ a_2, a ∈ Z} with b = k − a, taking the minimum such k: if both a and b are integers, then stop; else set k := k + 1 and go to step 2.

For ax + by + µ = 0, with a > 0 and b > 0, µ satisfies the inequality

max_i{−(x_i + 1/2)a − (y_i + 1/2)b} ≤ µ ≤ min_i{(1/2 − x_i)a + (1/2 − y_i)b}.   (21)

Therefore,
– if max_i{−(x_i + 1/2)a − (y_i + 1/2)b} ≥ 0, then µ = max_i{−(x_i + 1/2)a − (y_i + 1/2)b};
– if min_i{(1/2 − x_i)a + (1/2 − y_i)b} ≤ 0, then µ = min_i{(1/2 − x_i)a + (1/2 − y_i)b};
– if max_i{−(x_i + 1/2)a − (y_i + 1/2)b} < 0 and min_i{(1/2 − x_i)a + (1/2 − y_i)b} > 0, then µ = 0.

Furthermore, for ax − by + µ = 0, where a > 0 and b > 0, µ satisfies the inequality

max_i{−(x_i + 1/2)a + (y_i − 1/2)b} ≤ µ ≤ min_i{(1/2 − x_i)a + (1/2 + y_i)b}.   (22)

Therefore,
– if max_i{−(x_i + 1/2)a + (y_i − 1/2)b} ≥ 0, then µ = max_i{−(x_i + 1/2)a + (y_i − 1/2)b};
– if min_i{(1/2 − x_i)a + (1/2 + y_i)b} ≤ 0, then µ = min_i{(1/2 − x_i)a + (1/2 + y_i)b};
– if max_i{−(x_i + 1/2)a + (y_i − 1/2)b} < 0 and min_i{(1/2 − x_i)a + (1/2 + y_i)b} > 0, then µ = 0.
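The selection rules for µ in the case ax + by + µ = 0 can be sketched as follows. This is our own Python illustration; the paper works over integers and rationals, while this sketch evaluates the bounds with floats.

```python
def compute_mu(a, b, pixels):
    """Pick mu for ax + by + mu = 0 with a, b > 0, following eq. (21)
    and the three selection rules above: take the feasible mu closest
    to zero.  `pixels` is the list of pixel centers (x_i, y_i)."""
    lo = max(-(x + 0.5) * a - (y + 0.5) * b for x, y in pixels)
    hi = min((0.5 - x) * a + (0.5 - y) * b for x, y in pixels)
    if lo >= 0:
        return lo
    if hi <= 0:
        return hi
    return 0
```

For instance, the pixel centers (1,1), (1,2), (2,1), (0,3), (3,0) all satisfy |x + y + µ| ≤ 1 for µ = −2, and the rules indeed select µ = −2 (the feasible interval is [−3, −2] and its upper bound is non-positive).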
4
Polygonalization
Using the optimization procedure for the recognition of a Euclidean line from a collection of pixels, in this section we develop an algorithm for the polygonalization of the discrete boundary of a binary shape. We assume that the 4-connected boundary is extracted by an appropriate boundary extraction method. Setting P to be a digital curve which is a sequence of 4-connected pixels, our problem is described as follows.

Problem 3. For a digital boundary curve P, setting P_i = {p_ij}_{j=1}^{n(i)}, derive a partition of P, P = ∪_{i=1}^n P_i, such that |P_i ∩ P_{i+1}| = ε, where ε is an appropriate integer, which minimizes

z = Σ_{i=1}^n (|a_i|_1 + |µ_i|)   (23)

subject to the system of inequalities

|a_i^T x_ij + µ_i| ≤ (1/2)|a_i|_1,   (24)

for i = 1, 2, ..., n and j = 1, 2, ..., n(i).
To solve this minimization problem, we prepare the following lemmas.

Lemma 1. Setting p_1 = (x_1, y_1)^T and p_2 = (x_2, y_2)^T to be a pair of points on the supercover

L(a, b, µ) = {(x, y)^T ∈ Z^2 | |ax + by + µ| ≤ (1/2)(|a| + |b|)},   (25)

the number of pixels between p_1 and p_2 along this supercover is

N(p_1, p_2) = |p_1 − p_2| + 1,                          if |a| + |b| is odd,
N(p_1, p_2) = |p_1 − p_2| + [ |x_1 − x_2| / |b| ] + δ,  if |a| + |b| is even,   (26)

where δ ∈ {1, 2}.

Lemma 2. For a pair of points p_1 = (x_1, y_1)^T and p_2 = (x_2, y_2)^T on a supercover, setting p_3 to be the center of a pixel on this supercover between these two points, we have the relation

|det(p_2 − p_3, p_1 − p_3)| ≤ |p_2 − p_1|_1.   (27)
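Lemma 2 bounds the doubled area of the triangle spanned by three supercover pixels; the same determinant is proportional to the distance from p_3 to the line through p_1 and p_2, which is what the farthest-point selection of step 5 below exploits. A minimal check (our own code, not the paper's):

```python
def within_lemma2(p1, p2, p3):
    # Check |det(p2 - p3, p1 - p3)| <= |p2 - p1|_1  (Lemma 2).
    ax, ay = p2[0] - p3[0], p2[1] - p3[1]
    bx, by = p1[0] - p3[0], p1[1] - p3[1]
    det = ax * by - ay * bx
    return abs(det) <= abs(p2[0] - p1[0]) + abs(p2[1] - p1[1])
```

A collinear triple trivially satisfies the bound (det = 0), while a point far away from the segment violates it, signalling that it cannot lie on the same supercover between the endpoints.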
Using these properties of the supercover, we introduce the following algorithm.

step 1: Input P = {p_i}_{i=0}^n.
step 2: Set head = 0, tail = 0, j = 0, L_j = {p_i}_{i=head}^{tail}.
step 3: Select the maximum k for p_k ∈ P such that N(p_head, p_k) = |p_head − p_k| + 1; set k_max to this maximum and put tail = k_max.
step 4: If there exists a line l_j = {ax + by + µ = 0}, with |a| + |b| odd, whose supercover contains L_j, then go to step 6.
step 5: For the line segment through the pair of points p_head and p_tail, select a point p_d ∈ L_j whose distance to this line segment is maximum, set tail = d, and go to step 4.
step 6: If the lines l_{j−1} and l_j are parallel, then go to step 8.
step 7: For l_{j−1} ∩ l_j = (u_j, v_j)^T and p_head = (x_h, y_h)^T, if the conditions |u_j − x_h| ≤ 1/2 and |v_j − y_h| ≤ 1/2 are not satisfied, then go to step 9.
step 8: Set head := head − 1 and go to step 3.
step 9: Output L_j and l_j.
step 10: If tail ≠ n, then set head = tail, j = j + 1, and go to step 3; else stop.

Geometrically, the algorithm detects the candidate of the minimum-length path between a pair of points on the digital boundary, examining the relation
Fig. 1. Boundary pixels and the Euclidean polygon of SS253TL2 [8].

Table 3. Parameters of lines for the boundary of SS253TL2 [8]. The parameters of the polygonal edges:
(1, −2, 117) (0, 1, −64) (4, −1, 8) (8, 7, −518) (4, 1, −134) (0, 1, −46) (2, −1, −2) (8, −1, −157) (1, −10, 597) (1, 0, −37) (1, −2, 86) (12, −1, −332) (5, −2, −64) (3, −2, −5) (1, 0, −26) (1, −6, 179) (1, 0, −34) (1, −12, 414) (4, 1, −227) (1, −2, 40) (1, 0, −49) (0, 1, −59) (10, 1, −589) (0, 1, −45) (2, 1, −194) (6, −1, −380) (6, −5, −153) (1, −6, 300) (1, 2, −214) (3, 2, −397) (10, −1, −897) (0, 1, −46) (10, 1, −1260) (2, −17, 360) (1, −38, 1196) (2, −1, −46) (0, 1, −30) (1, 0, −29) (3, 2, −143) (4, −1, −110) (1, 2, −66) (4, 1, −156) (2, −1, −61) (1, −4, −3) (4, 3, −137) (1, 0, −26) (0, 1, −15) (3, 4, −127) (2, 1, −54) (6, 1, −111) (9, −2, −53) (1, 0, −15) (3, 2, −137) (2, 1, −73) (1, 0, −7) (2, −1, 47) (1, 0, −7) (0, 1, −62)
between numbers of pixels on a supercover. If this candidate determines a Euclidean line, the algorithm computes the parameters by solving the nonlinear optimization problem derived in Section 2. If this candidate does not determine a Euclidean line, the algorithm splits the candidate path and continues the same procedure in order to detect a Euclidean line. In this algorithm, steps 4, 7, and 8 enforce the conditions that polygonal edges do not cross the corners of the pixels, that two successive lines are not parallel, and that the polygonal vertices lie in the pixels of the digital boundary. Thanks to its greedy strategy, the algorithm terminates, and the solution is unique for a given starting point. In Figure 1, we show a result of the polygonalization by this algorithm for a graphical ornament [8]: the boundary is extracted from a binary graphical ornament and the polygonal curve is superimposed on the sequence of boundary pixels. Table 3 shows the parameters of the Euclidean lines for the polygonalization of the boundary.
Nonlinear Optimization for Polygonalization
5
453
Conclusions
We developed an algorithm for the computation of the parameters of a Euclidean line from pixels on a plane. We also proved uniqueness and ambiguity theorems for the reconstruction of Euclidean lines. We have also developed an algorithm for the polygonalization of a digital 4-connected boundary of binary shapes. The standard model of a discrete line is defined as

0 ≤ a^T x + µ < |a|_1.   (28)

Assuming that a > 0, for a standard line we have the system of inequalities

a^T x_i + µ ≥ 0,  a^T(e − x_j) − µ > 0,  i.e.,  −a^T x_i ≤ µ < a^T(e − x_j),   (29)

from the collection of sample points P. Therefore, the feasible region of a is defined by the system of inequalities

a > 0,  a^T(x_i − x_j + e) > 0.   (30)

Then, setting x_ij = x_i − x_j + e, the algorithm proposed in this paper recognizes a standard line and reconstructs a Euclidean line from sample pixels.
References

1. Andres, E., Nehlig, P., Françon, J.: Supercover of straight lines, planes, and triangles. LNCS, 1347, 243–254, 1997.
2. Françon, J., Schramm, J.M., Tajine, M.: Recognizing arithmetic straight lines and planes. LNCS, 1176, 141–150, 1996.
3. Buzer, L.: An incremental linear time algorithm for digital line and plane recognition using a linear incremental feasibility problem. LNCS, 2301, 372–381, 2002.
4. Barneva, R. P., Brimkov, V. E., Nehlig, P.: Thin discrete triangular meshes. Theoretical Computer Science, 246, 73–105, 2000.
5. Schramm, J.M.: Coplanar tricubes. LNCS, 1347, 87–98, 1997.
6. Vittone, J., Chassery, J. M.: Digital naive planes understanding. Proceedings of SPIE, 3811, 22–32, 1999.
7. Reveillès, J.-P.: Combinatorial pieces in digital lines and planes. Proceedings of SPIE, 2573, 23–34, 1995.
8. SS253TL2 in Graphic Ornaments. The Pepin Press–Agile Rabbit Editions, Amsterdam, 2001.
A Representation for Abstract Simplicial Complexes: An Analysis and a Comparison Leila De Floriani, Franco Morando, and Enrico Puppo Department of Computer and Information Sciences, University of Genova Via Dodecaneso 35, 16146 Genova, Italy
Abstract. Abstract simplicial complexes are used in many application contexts to represent multi-dimensional, possibly non-manifold and non-uniformly dimensional, geometric objects. In this paper we introduce a new general yet compact data structure for representing simplicial complexes, which is based on a decomposition approach that we have presented in our previous work [3]. We compare our data structure with the existing ones and we discuss in which respects it performs better than others. Keywords: Non-manifold modeling, simplicial complexes, data structures.
1
Introduction
Geometric cell complexes are widely used to represent multi-dimensional geometric objects in many applications. In particular, simplicial complexes have received great attention both from a theoretical and from a practical point of view. In fact, their combinatorial properties make them easier to understand, represent and manipulate than more general cell complexes. A data structure representing a complex should not only describe its shape unambiguously, but should also support efficient traversal and editing operations [13]. Although most work in the geometric modeling literature has been aimed at representing just three-dimensional manifold objects, several authors have pointed out the need for developing more general data structures, which can also represent higher-dimensional and/or non-manifold and non-uniformly dimensional objects [8,15,10]. Non-manifold singularities in modeled objects occur as a side-effect of feature extraction from images, 3D reconstruction, or as a by-product of severe discretization. Sometimes singularities are actually essential when, for instance, we choose to model the semantic content of an image (e.g. [9]) with an object of mixed dimensionality. This generality is usually paid for in terms of some overhead in storage costs. On the other hand, most objects encountered in the applications contain a relatively small number of non-manifold singularities. Thus, it is important to develop data structures that are not burdened by an excessive overhead when they are used to represent manifold objects, i.e., that scale well with the degree of "non-manifoldness". I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 454–464, 2003. © Springer-Verlag Berlin Heidelberg 2003
In this paper, we first review several data structures available from the literature for non-manifold modeling. Next, we describe a two-level data structure that we call the Non-Manifold Decomposition Data Structure (NMD-DS). The NMD-DS can represent any simplicial complex in any dimension and downscales well to the manifold case. This data structure is based on a scheme for decomposing non-manifold complexes into nearly manifold parts, which we presented in [3]. A complex is decomposed in a unique way into a reduced number of components such that each component is as free as possible from singularities. Each decomposition component belongs to a well-understood class of complexes, which we called initial quasi-manifolds. Such complexes are simple enough to be represented with a data structure having a cost comparable to those used for representing manifolds. We call this data structure the Initial Quasi-Manifold Data Structure (IQM-DS). This data structure was introduced in [4] and is detailed here in Section 5.1. The collection of representations of components constitutes the first level of our data structure. The assembly of all components is represented in the second level, which is designed to support efficient traversal of the complex across different components.
2
Background
Purely geometrical aspects are not relevant in the design of data structures, because a geometric embedding can always be encoded by simply adding coordinates to vertices. Therefore, we will address only abstract complexes, focusing on their combinatorial structure and on the topological relations among their cells. Abstract Simplicial Complexes. Let V be a finite set of elements that we call vertices. An abstract simplicial complex on V is a subset Ω of the set of non-empty subsets of V such that: {v} ∈ Ω for every vertex v ∈ V ; and if γ ⊆ V is an element of Ω, then every non-empty subset of γ is also an element of Ω. Each element of Ω is called an abstract simplex, or just a simplex. The dimension of a simplex γ ∈ Ω, denoted dim(γ), is the number of vertices in γ minus one. A cell of dimension s is called an s-cell. A complex Ω is called d-dimensional, or a d-complex, if maxγ∈Ω(dim(γ)) = d. Each d-cell of a d-complex Ω is called a maximal cell of Ω. The set of all cells of dimension smaller than or equal to m is called the m-skeleton of Ω (denoted by Ωm). The set of all simplices of dimension m will be denoted by Ω[m]. It is easy to see that Ωm is a subcomplex of Ω, while Ω[m] is not. The boundary ∂γ of a cell γ is defined as the set of all proper (non-empty) subsets of γ. Cells ξ in ∂γ are called faces of γ. Similarly, the co-boundary or star of a cell γ, denoted st(γ), is defined as st(γ) = {ξ ∈ Ω | γ ⊆ ξ}. Cells ξ in st(γ) are called co-faces of γ. Any cell γ such that st(γ) = {γ} is called a top cell of Ω. Two distinct cells are said to be incident if and only if one of them is a face of the other. Two simplices are called s-adjacent if and only if they share an
Leila De Floriani, Franco Morando, and Enrico Puppo
s-face. In particular, two p-simplices, with p > 0, are said to be adjacent if they are (p − 1)-adjacent. Two vertices are called adjacent if and only if they are both incident at a common 1-cell. The link of a cell γ, denoted by lk(γ), is the set of all faces of co-faces of γ that are neither incident at, nor adjacent to, γ. An h-path is a sequence of simplices (γ0, . . . , γk) such that any two consecutive simplices γi−1, γi in the sequence are h-adjacent. Two simplices γ and γ′ are h-connected if and only if there exists an h-path such that γ is a face of γ0 and γ′ is a face of γk. A subset Ω′ of a complex Ω is called h-connected iff every pair of its vertices is h-connected. Classes of Complexes. A d-complex Ω in which every non-maximal simplex is a face of some maximal simplex is called regular or uniformly d-dimensional. An s-simplex γ in a d-complex, with 0 ≤ s ≤ d − 1, is a manifold simplex if and only if its link is combinatorially equivalent either to a (d − s − 1)-sphere or to a (d − s − 1)-ball [7]. If γ is not a manifold simplex, it is called a singularity. A regular (d − 1)-connected d-complex where all (d − 1)-simplices are manifold is called a combinatorial pseudomanifold. A regular d-complex where all vertices are manifold is called a combinatorial d-manifold. In a combinatorial manifold all simplices are manifold. Topological Relations. Let γ be a p-simplex in a d-complex Ω, with 0 ≤ p ≤ d. For each integer value q, 0 ≤ q ≤ d, we define the topological relation Rpq(γ) as a retrieval function that returns q-cells of Ω. Whenever p < q, Rpq(γ) returns the set of simplices of dimension q that contain γ. Similarly, for p > q, Rpq(γ) returns the set of simplices of dimension q that are contained in γ. Relation Rpp, for p > 0, is defined using Rqp for q < p as Rpp(γ) = ∪v∈γ R(p−1)p(γ − {v}), i.e., Rpp(γ) gives all p-simplices which are (p − 1)-adjacent to γ.
Similarly, R00 (v) = ∪e∈R01 (v) {e − {v}}, i.e., R00 (v) gives all 0-simplices which share a 1-simplex with v.
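The combinatorial definitions above translate almost directly into code. The following Python sketch (illustrative helper names, not from the paper) builds a complex as a set of frozensets closed under taking faces, and implements the skeleton, star, link, and the relations Rpq for p ≠ q by brute-force enumeration over all simplices; an actual data structure would avoid storing every simplex explicitly, which is precisely the point of the IA and the NMD-DS discussed later.

```python
from itertools import combinations

def closure(top_simplices):
    """Abstract simplicial complex generated by the given simplices:
    every non-empty subset of each simplex is included, so the result
    is closed under taking faces."""
    omega = set()
    for s in top_simplices:
        for k in range(1, len(s) + 1):
            for face in combinations(sorted(s), k):
                omega.add(frozenset(face))
    return omega

def dim(simplex):
    # dimension = number of vertices minus one
    return len(simplex) - 1

def skeleton(omega, m):
    """m-skeleton: all cells of dimension <= m (always a subcomplex)."""
    return {g for g in omega if dim(g) <= m}

def star(omega, g):
    """Co-boundary (star) of g: all cells of which g is a face,
    including g itself; g is a top cell iff star(omega, g) == {g}."""
    return {x for x in omega if g <= x}

def link(omega, g):
    """Faces of co-faces of g sharing no vertex with g (so they are
    neither incident at nor adjacent to g)."""
    return {x for c in star(omega, g) for x in closure([c]) if not (x & g)}

def R(omega, p, q, g):
    """Topological relation Rpq(g) for p != q: the q-simplices
    containing g (p < q) or contained in g (p > q)."""
    assert dim(g) == p and p != q
    if p < q:
        return {x for x in omega if dim(x) == q and g < x}
    return {x for x in omega if dim(x) == q and x < g}

# a single triangle {1, 2, 3}: 3 vertices + 3 edges + 1 face = 7 simplices
omega = closure([{1, 2, 3}])
print(len(omega))  # 7
```

For the triangle above, `link(omega, frozenset({1}))` returns the opposite edge and its vertices, matching the definition of the link as a combinatorial 0-sphere/ball boundary piece.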
3
Related Work
Several data structures for manifolds can partially encode the non-manifold domain using simplicial and cell complexes. Dimension-independent data structures have been proposed for d-dimensional manifold complexes, which include the Cell Tuple (CT) [1] and the n-G-map (nGM) [12] for cellular complexes, and the Indexed data structure with Adjacencies (IA) for simplicial complexes (which directly extends to arbitrary dimension, being called the winged representation in [15]). If the IA is used to encode a simplicial d-complex, 2(d + 1) references are needed for each d-simplex. If either CTs or n-G-maps are used to describe just simplicial complexes, they require (d + 1)!(d + 1) references for each d-simplex. This represents a storage cost much larger than that of the IA, by a factor that grows combinatorially with the dimension of the complex.
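The factor of growth can be checked with a line of arithmetic, using the per-simplex reference counts stated above (2(d + 1) for the IA, (d + 1)!(d + 1) for CT/nGM on a simplicial complex):

```python
from math import factorial

def ia_refs(d):
    """IA: per d-simplex, d+1 vertex references (relation Rd0)
    plus d+1 adjacency references (relation Rdd)."""
    return 2 * (d + 1)

def cell_tuple_refs(d):
    """Cell Tuple / n-G-map on a simplicial d-complex: (d+1)! cell
    tuples per d-simplex, each carrying d+1 references."""
    return factorial(d + 1) * (d + 1)

for d in (2, 3, 4):
    print(d, ia_refs(d), cell_tuple_refs(d))
# d = 3: 8 references per tetrahedron for the IA vs. 96 for CT/nGM
```

Already at d = 3 the cell-tuple encoding costs twelve times as much per simplex, and the ratio grows factorially with d.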
The representation domain of all such data structures actually extends beyond the class of d-manifolds. The IA, although extremely compact, can only describe pseudomanifolds embedded in Euclidean d-dimensional space. The n-G-map describes a larger sub-class of pseudomanifolds, introduced in [12], called cellular quasi-manifolds. The representation domain of the CT is similar to that of n-G-maps (see [1] for details). However, none of them can completely encode the non-manifold domain. A data structure for encoding any two-dimensional simplicial complex, called the Triangle-Segment (TS) data structure, has been proposed in [2]. The TS extends the IA to deal with non-manifoldness. This data structure is quite compact, since it requires at most 4nst additional references with respect to the IA, where nst denotes the number of top simplices incident to a non-manifold vertex. Moreover, the TS data structure downscales to the IA in the manifold case. Data structures for non-manifold, non-regular three-dimensional cell complexes have been proposed for modeling non-manifold solids. They are basically all variants of the Radial-Edge (RE) data structure [10]. The RE encodes any 3-cell implicitly through the manifold 2-complex partitioning its boundary. A face can be shared by at most two 3-cells. More compact versions of the RE, namely the Tri-Cyclic Cusps (TCC) data structure [8] and the Partial Entity (PE) data structure [11], have been proposed more recently. To give an idea of the storage costs of these data structures, we can compute the number of references necessary to encode a simplicial 3-complex using them. Let v, e, f, t be, respectively, the number of 0-, 1-, 2- and 3-cells/simplices in the non-manifold solid. Then, the RE uses 155t + 2f + e + v references, the TCC uses 94t + f + e + v references, and the storage requirements of the PE reduce to 27t + 19f + 2e + v references.
Experimental evaluations reported in [11] show that these data structures do not downscale well to the manifold case, i.e., they are extremely inefficient when used to encode manifolds. In summary, we can conclude that data structures that fully model non-manifold solids do not downscale well to the manifold case. The NMD-DS data structure, presented in this paper, effectively downscales its storage requirements in the manifold domain, while still being able to encode a generic, possibly non-manifold, abstract simplicial d-complex. Of course, there are alternative, though less efficient, ways of implementing abstract simplicial complexes. For instance, simplicial sets, which are close to simplicial complexes, can be implemented as variants of incidence graphs [5] or as chains of maps [6].
4
The Standard Decomposition
In this section, we summarize the results of previous work [3,14], in which we proposed a sound decomposition of non-manifold complexes. We say that a decomposition Ω′ is an essential decomposition of Ω if and only if all simplices of Ω′ that must be pasted together to produce Ω are glued at some singularity (non-manifold face) of Ω. The decompositions in Figures 1c and 1d are examples of essential decompositions for the complex of Figure 1a. The decompositions in Figures 1b and 1e
Fig. 1. A 2-complex with a non-manifold edge (having three incident triangles) and a non-manifold vertex V marked in bold (a); four possible decompositions of the complex (b–e).
are non-essential decompositions. The decomposition in Figure 1e is a manifold complex, but it is not essential because we split along a manifold edge (marked in bold). We consider essential decompositions as the only candidates, and we define the standard decomposition ∇Ω as the most decomposed essential decomposition. It can be proven [14] that the standard decomposition exists, is unique, and is obtained by cutting the complex Ω along all its non-manifold faces. For instance, the complex of Figure 1d is the standard decomposition of the complex in Figure 1a. In [3], we also presented an algorithm that computes the standard decomposition ∇Ω in O(d! t log t) time, where t is the number of maximal simplices in the d-complex Ω. This decomposition algorithm produces a map, which we denote by σ, that maps vertices of ∇Ω back to their original vertices in Ω (i.e., σ(∇Ω) = Ω). An example of the σ map associated with a standard decomposition is presented in Figure 3. The standard decomposition is a complex formed of regular connected components, and each of its components belongs to a class of complexes, which we called initial quasi-manifolds, that admits a local characterization. A regular h-complex Ω is an initial quasi-manifold if and only if we can always traverse the maximal h-simplices in the star of each vertex through manifold (h − 1)-faces (see [3] for the formal definition of initial quasi-manifolds). In this case we say that the star of each vertex is manifold-connected. This characterization is relevant to the design of data structures, as we will see in the next sections. The class of initial quasi-manifolds coincides with that of manifolds in dimension d ≤ 2, while in higher dimensions (d ≥ 3) there are initial quasi-manifolds which are non-manifold, and it is also possible to build examples of initial quasi-manifolds that are not even pseudomanifolds [14].
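A minimal sketch of the cutting criterion in the top-dimensional case: a (d − 1)-face with three or more incident top d-simplices is necessarily non-manifold, so the standard decomposition must cut along it. This is only part of the construction in [3] — singularities at lower-dimensional simplices also require a link-connectivity test, omitted here — and the function names are illustrative, not from the paper:

```python
from itertools import combinations
from collections import defaultdict

def nonmanifold_facets(top_simplices, d):
    """Return the (d-1)-faces with three or more incident top
    d-simplices; such faces are always singular, and the standard
    decomposition cuts along them (lower-dimensional singularities
    need a separate link test, not shown)."""
    count = defaultdict(int)
    for s in top_simplices:
        for f in combinations(sorted(s), d):
            count[frozenset(f)] += 1
    return {f for f, c in count.items() if c >= 3}

# three triangles sharing edge {1, 2}: a non-manifold edge, as in Fig. 1a
tris = [{1, 2, 3}, {1, 2, 4}, {1, 2, 5}]
print(nonmanifold_facets(tris, 2))  # {frozenset({1, 2})}
```

Cutting along the detected faces (duplicating their vertices per component, which is what the map σ records) yields the components of ∇Ω.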
5
The Non-manifold Data Structure (NMD-DS)
In this section, we present a data structure, which we call the NMD-DS, to encode non-manifold d-complexes according to their decomposition. The data structure contains a lower level, which encodes separately each initial quasi-manifold component obtained from the decomposition, and an upper level, which encodes
information necessary to traverse different components through (non-manifold) joints. The lower level is based on a data structure, which we call the Initial Quasi Manifold Data Structure (IQM-DS), for encoding initial quasi-manifolds, which we already sketched in [4]. Next, in order to extract all topological relations efficiently, we add to this two-level data structure a set of d partial relations, each denoted by V iT, for 0 ≤ i < d. Partial relation V iT gives, for each i-simplex γ, a top simplex that is incident to γ. Proofs of the claims and analyses of space and time complexity are omitted for brevity; all proofs can be found in [14].
5.1
The Initial Quasi Manifold Data Structure (IQM-DS)
Let Ω be an h-dimensional initial quasi-manifold simplicial complex, with h > 0. We represent Ω with a data structure that extends the indexed data structure with adjacencies [15]. In the original IA, a complex is represented by encoding relations Rh0 and Rhh, under the assumption that the encoded complex is a pseudomanifold. We extend this data structure to accommodate non-pseudomanifold situations, where the set Rhh(ξ) may contain two or more h-simplices. Let ξ be a non-manifold (h − 1)-face, and let γ0, . . . , γk−1 be the h-simplices incident at ξ. Then, for i = 0, . . . , k − 1, in the adjacency list of γi, at the entry corresponding to ξ, we encode a link to γ(i+1) mod k. This allows us to visit the cells incident at ξ in cyclic order, thus supporting efficient retrieval of the complete relation Rhh for all such cells. In Figure 2b we report the references for the R20 and the R22 relations in the adjacency data structure for the 2-complex in Figure 2a. As in the original IA, the symbol ⊥ is used to mean “no adjacency”. Note that in the tables for these two relations we adopt the usual consistency rule in ordering the two lists of links encoding R20 and R22 for a given triangle t, i.e., the adjacency at a given position i in the list of R22 corresponds to the edge of t which is opposite to the vertex at the same position i in the list of R20. It is easy to see that, using this data structure, we can encode the Rh0 and Rhh relations using 2(h + 1) references for each h-simplex. Moreover, through a suitable renumbering of vertices and top cells, we can ensure that a vertex indexed by wi, for i > h, is always incident at a top h-cell indexed by t(i−h) (i.e., wi ∈ Rh0(t(i−h)) for i > h). This renumbering allows us to maintain the V 0T relation implicitly. Furthermore, by imposing Rh0(t1) = {w1, . . . , wh+1}, and by exploiting wi ∈ Rh0(t(i−h)) for i > h, we can encode part of the Rh0 relation implicitly, saving v references.
This reduces the storage cost for the three relations Rh0, Rhh and V 0T to 2fh(h + 1) − v references, where fh is the number of h-simplices. The table in Figure 2, for instance, is built using this numbering scheme (i.e., w3 is in t1, w4 is in t2, etc.). Such a data structure is sufficient to retrieve all topological relations for an initial quasi-manifold h-complex. Vertex-based relations R0m, for (h − 2) ≤ m ≤ h, can be computed in O(|R0m|) whenever the given abstract simplicial complex is embeddable in Rh. In particular, we can compute in linear time all vertex-based topological relations R0m for 2- and 3-complexes embeddable in R3.
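The cyclic-link extension of the IA described above can be illustrated with a tiny sketch. We assume a flat dictionary encoding of the per-simplex adjacency entries (a simplification of the actual array layout in the paper); following the links recovers the complete Rhh relation at a non-manifold (h − 1)-face:

```python
def facet_fan(adj, face, start):
    """Recover the full Rhh relation at `face` by following the cyclic
    links: each h-simplex incident at a non-manifold (h-1)-face stores,
    at the entry for that face, a link to the next incident h-simplex.
    `adj` maps (h_simplex, face) -> next h_simplex; this flat dict is a
    hypothetical stand-in for the per-simplex adjacency lists."""
    fan, t = [start], adj[(start, face)]
    while t != start:
        fan.append(t)
        t = adj[(t, face)]
    return fan

# three triangles t1, t2, t3 around a non-manifold edge e, linked cyclically
adj = {("t1", "e"): "t2", ("t2", "e"): "t3", ("t3", "e"): "t1"}
print(facet_fan(adj, "e", "t1"))  # ['t1', 't2', 't3']
```

In the manifold case each entry simply points to the unique adjacent simplex (or ⊥), so the traversal degenerates to the ordinary IA adjacency lookup.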
t    R20(t)           R22(t)
t1   (w1, w2, w3)     (⊥, t2, ⊥)
t2   (w4, w1, w3)     (t1, t3, t5)
t3   (w5, w4, w3)     (t2, ⊥, t5)
t4   (w6, w5, w1)     (t5, ⊥, ⊥)
t5   (w5, w1, w4)     (t2, t3, t4)

Fig. 2. References for the R20 and the R22 relations in the IA for the 2-complex on the left.
Fig. 3. A 3-complex (a) and its standard decomposition (b). Edge uv splits into u1 v1 and u2 v2 and the star of u2 v2 is not manifold connected.
5.2
A Data Structure to Connect Components
We now assume that each component of ∇Ω obtained from the decomposition algorithm is encoded with the IQM-DS data structure described in the previous subsection. In order to build a data structure for the original complex Ω, we add to this the encoding of the map σ. Recall that the map σ is computed by the decomposition process (see Section 4) and is such that σ(∇Ω) = Ω. Consider, for instance, the complex of Figure 3a, whose standard decomposition is shown in Figure 3b. The maps σ and σ−1 for this complex are shown in Figure 3c. The encoding of ∇Ω together with the maps σ and σ−1 is sufficient to extract all topological relations. However, additional relations are necessary to support more efficient traversal. To this end, we introduce the partial relation σ∇. This relation is defined for all simplices γ ∈ Ω such that γ is either a splitting simplex or a simplex whose star is not (h − 1)-connected. When γ is a splitting simplex, σ∇ relates γ with the set
of simplices into which γ is split (these are called the copies of γ). Figure 3c reports the map σ∇ for the complex of Figure 3a. Note that edge uv splits into the two copies u1v1 and u2v2, and the star of u2v2 is not manifold-connected in ∇Ω. The encoding of the components of ∇Ω, together with the encoding of σ, σ−1 and σ∇, is sufficient to compute efficiently all topological relations Rpq(γ), provided that we can supply a top simplex θ incident to γ. We denote this computation by Rpq(γ|θ) (read: Rpq(γ) given θ). If we assume logarithmic access time to the maps σ, σ−1 and σ∇, then Rpq(γ|θ) can be computed, for a d-complex Ω embeddable in Rd, in O(|Rpq(γ)| + log nst) for all (d − 3) ≤ p < q ≤ d, where nst is the total number of top simplices incident to non-manifold vertices. This means that, for d = 2 and d = 3, under the above assumptions, all topological relations Rpq(γ|θ) are computed in O(|Rpq(γ)| + log nst). By adding an encoding of the relations V pT, we can provide a top simplex θ incident to a generic p-simplex γ. We assume that access to relation V pT can be done in O(log |Ω[p]|) (recall that Ω[p] is the set of all simplices of dimension p). With this assumption, it is easy to see that relation Rpq(γ) can be computed in O(|Rpq(γ)| + log nst + log |Ω[p]|).
5.3
Implementation and Storage Requirements
We describe here an implementation of the NMD-DS that is optimized for storage costs as well as for traversal operations. This implementation is inherently static and does not support editing operations. Maps σ and σ−1 are encoded as balanced binary search trees, which support logarithmic access time, implemented as arrays. Each entry in the array encoding map σ contains one key corresponding to a vertex copy and one pointer to its corresponding split vertex, for a total cost of 2nc references, where nc is the number of vertex copies introduced by the decomposition process. Similarly, each entry in the array encoding the map σ−1 contains one key corresponding to a split vertex and one pointer to the list of its vertex copies. All vertex copies can be maintained in a single array, segmented according to the different (disjoint) lists corresponding to split vertices. Thus, one list can be located in such an array by two offset numbers, which can be compressed into a single reference. Therefore, σ−1 can be implemented with 2ns + nc references in total, where ns is the number of original vertices duplicated by the decomposition process (nc ≥ 2ns). Relations V pT and σ∇ are encoded as trie dictionaries whose words are the sequences of vertex indexes obtained by the lexicographic ordering of simplices. A trie dictionary is usually implemented as a special binary search tree called a ternary tree. We assume again an array implementation of this tree. In this case, the trie for the map σ∇ for a d-complex takes less than (2^(d+1) − (d + 3)) nst (4d + 1) references. In order to implement the relations V pT, for 0 ≤ p < d, we note that the trees for all tries for V pT, for all 0 ≤ p < d, overlap. From this property, assuming again an array implementation for the trie, the collective encoding of all relations V pT for 0 < p < d can be done with 2|Ωd−2| + 2|Ω[d−1]| − v references.
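The array-based encoding of σ−1 can be sketched as follows. Class and field names are illustrative, and the (offset, length) pair locating each segment is kept explicit rather than compressed into a single reference as in the paper; lookup is a binary search on the sorted key array, matching the logarithmic access-time assumption in the text:

```python
from bisect import bisect_left

class SigmaInverse:
    """Illustrative encoding of sigma^{-1}: split vertices are keys in a
    sorted array; their vertex copies occupy contiguous segments of one
    shared copies array, each segment located by an (offset, length)
    pair. Lookup is a binary search over the keys."""
    def __init__(self, mapping):
        self.keys = sorted(mapping)          # split vertices, sorted
        self.copies, self.segments = [], []  # shared array + locators
        for k in self.keys:
            self.segments.append((len(self.copies), len(mapping[k])))
            self.copies.extend(mapping[k])

    def __getitem__(self, v):
        i = bisect_left(self.keys, v)
        if i == len(self.keys) or self.keys[i] != v:
            raise KeyError(v)
        off, n = self.segments[i]
        return self.copies[off:off + n]

# two split vertices u, v, each duplicated into two copies by the cut
inv = SigmaInverse({"u": ["u1", "u2"], "v": ["v1", "v2"]})
print(inv["v"])  # ['v1', 'v2']
```

The forward map σ is the simpler structure: one sorted array of (copy, original) pairs, again searched binarily.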
Table 1. (a) Acronyms for data structures reviewed in Section 3; (b) ratios of storage costs against reviewed data structures for the NMD-DS used to encode a simplicial 3-manifold; (c) break-even thresholds on the number of singularities that make other data structures more competitive than the NMD-DS.

(a)
IA   Indexed with Adjacencies [15]
CT   Cell Tuple [1]
nGM  n-G-Map [12]
RE   Radial Edge [10]
TCC  Tri-Cyclic Cusps [8]
PE   Partial Entity [11]
TS   Triangle-Segment [2]

(b)
     Ratio to NMD-DS
IA   < 0.21
CT   > 2.28
nGM  > 2.28

(c)
     threshold
RE   > 86
TCC  > 50
PE   > 33

6
Comparisons and Discussion
In this section, we compare the NMD-DS data structure with the data structures reviewed in Section 3 and listed in Table 1a. In Table 1b we compare the NMD-DS, over the 3-manifold domain, against the IA, CT and nGM data structures. Over the 3-manifold domain, the NMD-DS reduces to the IQM-DS augmented with the V iT relations. Our comparison shows that the NMD-DS requires nearly five times the space required by the IA. However, edge-based (R1h) and face-based (R2h) relations cannot be efficiently retrieved from the IA, while it is possible to retrieve all topological relations in optimal time from the NMD-DS (see [14] for details). The NMD-DS encodes non-manifoldness in a separate layer, and thus the NMD-DS storage requirements grow as the degree of non-manifoldness increases. We have compared the NMD-DS with the RE, the TCC and the PE data structures used for representing non-manifold solids. For each of them, we compute a threshold, on the number of top simplices incident to a singular vertex, below which our data structure is more compact than the others. Table 1c summarizes the results of this analysis (see [14] for details). The break-even point, above which our data structure is no longer competitive, occurs, for the PE, when at least one third of the top simplices are incident to a singular vertex. The storage costs of the NMD-DS and of the TS both depend on the degree of non-manifoldness in the modeled 2-complex. However, under the hypothesis that the average vertex order is greater than six, we find that the TS is always more compact than the NMD-DS.
7
Concluding Remarks
In this paper, we have introduced a new, dimension-independent data structure for describing simplicial complexes, called the Non-Manifold Decomposition data structure (NMD-DS). The NMD-DS is a two-level data structure based
on a decomposition of the complex into simpler components, called initial quasi-manifolds, which can be encoded in a compact data structure supporting efficient traversal. We have reviewed and analyzed existing data structures for simplicial and cell complexes, and we have evaluated the NMD-DS data structure with respect to them. The NMD-DS structure supports efficient traversal algorithms, and it is compact. In particular, our analysis has shown that it is more compact than any data structure for non-manifold solids when less than one third of the cells of the complex are non-manifold. Moreover, the NMD-DS structure scales very well to the manifold case, since it exhibits a negligible overhead when applied to a manifold complex.
Acknowledgements This work has been performed while Leila De Floriani has been visiting the Computer Science Department of the University of Maryland at College Park (USA). This work has been supported by the Italian Ministry of Education University and Research under FIRB project MACROGeo (contract N.RBAU01MZJ5), by the Italian Space Agency (ASI) under project “Augmented Reality for Teleoperation of Free Flying Robots” and by the Italian National Research Council under research project Efficient modeling and transmission of three-dimensional scenes and objects under program “Agenzia 2000”, Contract N.CNRC00FE45 004.
References
1. E. Brisson. Representing geometric structures in d dimensions: Topology and order. In Proceedings 5th ACM Symposium on Computational Geometry, pages 218–227. ACM Press, June 1989.
2. L. De Floriani, P. Magillo, E. Puppo, and D. Sobrero. A multi-resolution topological representation for non-manifold meshes. In Proceedings 7th ACM Symposium on Solid Modeling and Applications (SM02), Saarbrucken, Germany, June 17-21, pages 159–170, 2002.
3. L. De Floriani, M.M. Mesmoudi, E. Puppo, and F. Morando. Non-manifold decomposition in arbitrary dimensions. In A. Braquelaire, J.-O. Lachaud, and A. Vialard, editors, Discrete Geometry for Computer Imagery, volume 2301 of Lecture Notes in Computer Science, pages 69–80. Springer-Verlag, 2002. Extended version to appear in Graphical Models.
4. L. De Floriani, F. Morando, and E. Puppo. Representation of non-manifold objects through decomposition into nearly manifold parts. In G. Elber and V. Shapiro, editors, Proceedings 8th ACM Symposium on Solid Modeling and Applications, Seattle, WA, June 16-20, pages 304–309. ACM Press, 2003.
5. H. Edelsbrunner. Algorithms in Combinatorial Geometry. EATCS Monographs on Theoretical Computer Science. Springer-Verlag, 1987.
6. H. Elter and P. Lienhardt. Cellular complexes as structured semi-simplicial sets. International Journal of Shape Modeling, 1(2):191–217, 1994.
7. L. C. Glaser. Geometric Combinatorial Topology. Van Nostrand Reinhold, New York, 1970.
8. E. L. Gursoz, Y. Choi, and F. B. Prinz. Vertex-based representation of non-manifold boundaries. In M. J. Wozny, J. U. Turner, and K. Preiss, editors, Geometric Modeling for Product Engineering, pages 107–130. Elsevier Science Publishers B.V., North Holland, 1990.
9. V. A. Kovalevsky. Finite topology as applied to image analysis. Computer Vision, Graphics, and Image Processing, 46(2):141–161, May 1989.
10. K. Weiler. The radial edge data structure: A topological representation for non-manifold geometric boundary modeling. In J.L. Encarnacao, M.J. Wozny, and H.W. McLaughlin, editors, Geometric Modeling for CAD Applications, pages 3–36. Elsevier Science Publishers B.V. (North-Holland), Amsterdam, 1988.
11. S.H. Lee and K. Lee. Partial entity structure: a fast and compact non-manifold boundary representation based on partial topological entities. In Proceedings Sixth ACM Symposium on Solid Modeling and Applications, Ann Arbor, Michigan, June 2001.
12. P. Lienhardt. Topological models for boundary representation: a comparison with n-dimensional generalized maps. Computer-Aided Design, 23(1):59–82, 1991.
13. M. Mantyla. An Introduction to Solid Modeling. Computer Science Press, 1983.
14. F. Morando. Decomposition and Modeling in the Non-Manifold Domain. PhD thesis, February 2003.
15. A. Paoluzzi, F. Bernardini, C. Cattani, and V. Ferrucci. Dimension-independent modeling with simplicial complexes. ACM Transactions on Graphics, 12(1):56–102, January 1993.
A Computation of a Crystalline Flow Starting from Non-admissible Polygon Using Expanding Selfsimilar Solutions

Hidekata Hontani (1), Mi-Ho Giga (2), Yoshikazu Giga (2), and Koichiro Deguchi (3)

(1) Department of Informatics, Yamagata University, 4-3-16, Yonezawa, Yamagata, 992-8510 Japan
[email protected]
(2) Department of Mathematics, Hokkaido University, nishi-8, kita-10-jo, Sapporo, Hokkaido, 060-0810 Japan
{mihogiga,giga}@math.sci.hokudai.ac.jp
(3) Department of System Information Science, Tohoku University, Aramaki-aza Aoba01, Aoba-ku, Sendai, Miyagi, 980-8579, Japan
[email protected]
Abstract. A numerical method for obtaining a crystalline flow from a given polygon is presented. A crystalline flow is a discrete version of a classical curvature flow. In a crystalline flow, a given polygon evolves, and it remains polygonal through the evolving process. Each facet moves keeping its normal direction, and the normal velocity is determined by the length of the facet. In some cases, a set of new facets sprouts out at the very beginning of the evolving process. The facet lengths are governed by a system of singular ordinary differential equations. The proposed method solves the system of ODEs and obtains the length of each new facet systematically. Experimental results show that the method obtains a crystalline flow from a given polygon successfully.
1
Introduction
Evolution-based multi-scale analysis plays an important role in characterizing a contour figure in an image [1][2]. A family of evolving contours, called a curvature flow, is used for this analysis. In the flow, every point of the contour moves in the normal direction of the contour with a velocity V determined by the curvature κ at that position. As a given contour evolves, its shape changes; observing this change, those methods extract shape features of the given contour. The flow given by setting V = κ is called the curve shortening flow, which is widely used for multi-scale analysis [3][4][5][6]. Several methods have been proposed for computing the curve shortening flow. For those methods, the representation of a contour figure is quite important, since each of them represents the smoothly evolving curve discretely. For example, the Gaussian-based method [3] represents an evolving contour figure by a set of points that are equally spaced along the contour. The coordinates of the ith point are represented as (x(i∆), y(i∆)), where ∆ denotes I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 465–474, 2003. © Springer-Verlag Berlin Heidelberg 2003
Hidekata Hontani et al.
the interval between adjacent points. The method iterates two processes: (1) smoothing both x(·) and y(·) with a small-scale Gaussian filter, and (2) resampling the resulting contour at equal intervals after the smoothing. The resampling process is needed because the arc length changes as the contour evolves. This method can obtain the curvature flow without computing the curvature. On the other hand, the resampling process slightly deforms the shape of the represented contour figure at each iteration step. In addition, the interval ∆ changes at each iteration, because ∆ must divide the total peripheral length exactly, which is not always realizable. A level set method [7][8][9] is a powerful and widely used tool for obtaining a curvature flow. The method represents an evolving interface as the zero level set of a higher-dimensional function φ. For example, an evolving contour in an x-y plane is represented as the zero level set of the evolving function φ(x, y; t). This method needs no arc length parameter along the contour. For obtaining the curvature flow, we only need to solve the level set equation φt + κ|∇φ| = 0. In the computation, the function φ is discretely represented on fixed pixels, and finite difference operators are used for computing the spatial derivatives. The operators’ width is usually two or three pixels. If there is a small part of the evolving contour that is comparable to the operators’ width, then the computed values do not approximate the spatial derivatives well.
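The smooth-then-resample iteration of the Gaussian-based method can be sketched as follows. The kernel below is a simple three-point smoothing filter standing in for the Gaussian, and the equal-interval resampling is done by linear interpolation along the arc length; both function names are illustrative:

```python
import math

def resample(pts, n):
    """Resample a closed polyline at n points equally spaced along its
    arc length (the 'resampling at equal intervals' step)."""
    segs = [math.dist(pts[i], pts[(i + 1) % len(pts)]) for i in range(len(pts))]
    total = sum(segs)
    out, i, acc = [], 0, 0.0
    for k in range(n):
        target = k * total / n          # arc-length position of point k
        while acc + segs[i] < target:   # advance to the segment containing it
            acc += segs[i]
            i += 1
        t = (target - acc) / segs[i]    # interpolate inside the segment
        (x0, y0), (x1, y1) = pts[i], pts[(i + 1) % len(pts)]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

def smooth(pts, w=0.25):
    """One pass of a (w, 1-2w, w) kernel applied to x(.) and y(.)
    separately -- a crude stand-in for the small-scale Gaussian filter."""
    n = len(pts)
    return [(w * pts[i - 1][0] + (1 - 2 * w) * pts[i][0] + w * pts[(i + 1) % n][0],
             w * pts[i - 1][1] + (1 - 2 * w) * pts[i][1] + w * pts[(i + 1) % n][1])
            for i in range(n)]

# one smooth-then-resample iteration on a square contour
square = resample([(0, 0), (1, 0), (1, 1), (0, 1)], 40)
square = resample(smooth(square), 40)
```

Iterating these two steps shrinks and rounds the contour, but, as the text notes, each resampling slightly deforms the represented shape.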
The velocity is determined by the non-local curvature, which depends on the length of the facet. Polygons are well represented in a discrete manner. Unlike a classical curvature flow, it is not difficult to compute the non-local curvature correctly, and to obtain the crystalline flow, provided an appropriate initial polygon is given. The crystalline flow proposed in [10] and [11] restricts the admissible initial polygonal contours. In [12] and [13], a level set formulation was extended to handle some families of non-local curvature flows, including crystalline flows. With the level set formulation proposed in [12] and [13], any polygon can be given as an initial contour of a crystalline flow. In some cases, new facets sprout out at corners of a given polygon at the very beginning of the evolving process. Once new facets have sprouted out, no further facets appear, and the number of facets decreases monotonically as time increases. Recently, a system of singular ordinary differential equations for the facet lengths has been studied to handle the sprouting of new facets [14]. In this article, we present a numerical method for solving the system of ODEs and for obtaining a crystalline flow from a given polygon. The length of each new facet is systematically calculated. The proposed method makes it possible to use any simple convex polygon as the Wulff shape, which determines the non-local curvature of each facet.
2
Crystalline Flow
2.1
Weighted Curvature Flow
First, we recall the notion of the weighted curvature. Let γ be a continuous, convex function on R2 which is positively homogeneous of degree one, i.e., γ(λp) = λγ(p) for all p ∈ R2, λ > 0. Assume that γ(p) > 0 for p ≠ 0. For the moment, assume that γ is smooth (except at the origin). For an oriented curve S with orientation n, which is a unit normal, we call Λγ(n) = −div(ξ(n)) the weighted curvature of S in the direction of n, where ξ = ∇γ. We note that the weighted curvature of S is the first variation of I(S) with respect to a variation of the area enclosed by S; here I(S) is defined by

I(S) = ∫S γ(n) ds,   (1)
where ds denotes the line element; I(S) is called the interfacial energy with interfacial energy density γ. We recall that the Wulff shape defined by

Wγ = ∩_{|m|=1} {x ∈ R²; x · m ≤ γ(m)}
is the unique minimizer of I(S) among all S whose enclosed area is the same as that of Wγ (see e.g. [15]). If γ(p) = |p|, then Λγ is the usual curvature, and Wγ is nothing but the unit disk. For any γ the weighted curvature of ∂Wγ always equals −1, so Wγ plays the role of a unit disk for the usual curvature. We consider the motion of an evolving curve Γt governed by the anisotropic curvature flow equation of the form

V = Λγ(n)   (2)
on Γt, where V denotes the normal velocity of {Γt} in the direction of n. When γ(p) = |p|, equation (2) becomes the curve shortening equation. There are several methods to track the evolution of Γt; a typical one is the level-set method (see [7], [8], [9], [16]). If γ is C² except at the origin, global unique solvability for (2) is established in [9] (see also [17]). However, when γ has corners, the conventional notions of a solution, including viscosity solutions, do not apply to (2). If the Frank diagram of γ, Frankγ = {p ∈ R²; γ(p) ≤ 1}, is a convex polygon, γ is called a crystalline energy (density), and a notion of solution for (2) was proposed in [10] and [11] independently by restricting {Γt} to a special family of evolving polygonal curves called admissible. Even for more general γ with corners, not necessarily a crystalline energy, the level-set approach for (2) and more general equations was successfully extended in [13] (see also [12]), although the problem has a nonlocal nature. They introduced a new notion of solution consistent with that of [10] and [11], and proved global unique solvability at least for a general initial simple curve (not necessarily admissible).
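For a polygonal curve the integral (1) reduces to a finite sum over the facets, I(S) = Σj γ(nj) Lj. The following is a minimal sketch of that computation (the function name and the counterclockwise-orientation convention are our own assumptions, not from the paper):

```python
import math

def interfacial_energy(vertices, gamma):
    """I(S) = sum over facets of gamma(n_j) * L_j, the polygonal form of (1).

    `vertices` lists the polygon vertices in counterclockwise order;
    `gamma` is the (1-homogeneous) energy density evaluated on unit normals.
    """
    total = 0.0
    n = len(vertices)
    for j in range(n):
        (x0, y0), (x1, y1) = vertices[j], vertices[(j + 1) % n]
        L = math.hypot(x1 - x0, y1 - y0)          # facet length L_j
        # outward unit normal of the edge, for a counterclockwise polygon
        nx, ny = (y1 - y0) / L, -(x1 - x0) / L
        total += gamma((nx, ny)) * L
    return total
```

With the isotropic density γ(p) = |p| the energy of a unit square is its perimeter, 4, while an anisotropic density such as γ(p) = |p1| + 2|p2| weights horizontal facets twice as much.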
Hidekata Hontani et al.

2.2 Crystalline Flow
Here and hereafter we assume that γ is a crystalline energy, i.e., Frankγ is a convex M-polygon. In this section we introduce an evolving polygonal curve called a crystalline flow governed by (2). To track such an evolving polygon, we shall derive a system of ordinary differential equations (ODEs) for the lengths of the sides (facets) of the polygon. For this purpose we need to prepare several notions. Let q_i (i = 1, . . . , M) be the vertices of Frankγ. We call a simple polygonal curve S an essentially admissible crystal if the outward unit normal vectors m and m̂ of any adjacent segments (facets) of S satisfy

((1 − λ)m + λm̂) / |(1 − λ)m + λm̂| ∉ N   (3)
for any λ ∈ (0, 1), where N = {q_i/|q_i|; i = 1, . . . , M}. Let J be a time interval. We say that a family of polygons {S(t)}_{t∈J} is an essentially admissible evolving crystal if S(t) is an essentially admissible crystal for all t ∈ J and each corner moves continuously differentiably in time. These conditions imply that the orientation of each facet is preserved in J. By definition S(t) is of the form S(t) = ∪_{j=1}^{r} Sj(t), where Sj(t) is a maximal, nontrivial, closed segment with orientation nj. Here we number the facets clockwise. Then we obtain a transport equation for Lj(t), the length of Sj(t):

dLj(t)/dt = (cot ψj + cot ψj+1) Vj − (1/sin ψj) Vj−1 − (1/sin ψj+1) Vj+1   (4)
for j = 1, . . . , r; the index j is considered modulo r. Here ψj = θj − θj−1 (modulo 2π) with nj = (cos θj, sin θj), and Vj denotes the normal velocity of Sj(t) in the direction of nj. We say that an essentially admissible evolving crystal {S(t)}_{t∈J} is a γ-regular flow of (2) if

Vj(t) = χj ∆(nj)/Lj(t)   (5)

for j = 1, 2, . . . , r. Here ∆(m̂) = γ̃(θ̂ + 0) − γ̃(θ̂ − 0) with m̂ = (cos θ̂, sin θ̂) and γ̃(θ) = γ(cos θ, sin θ). We note that ∆(m̂) is the length of the facet of Wγ with outward normal m̂ if m̂ ∈ N; otherwise ∆(m̂) = 0. The quantity χj is called a transition number, and takes the value +1 (resp. −1) if the j-th facet is concave (resp. convex) in the direction of nj; otherwise χj = 0. We call the quantity Λj ≡ χj ∆(nj)/Lj(t) the nonlocal weighted curvature of the j-th facet with respect to γ. (We use the convention that 1/Lj(t) = 0 if Lj(t) = ∞.) Thus we get a system of ODEs (4) and (5) for the Lj(t)'s. For the moment we assume that S(0) is an essentially admissible closed curve. The fundamental theory of ODEs yields the (local in time) unique solvability of (4) and (5). Unless S(t) shrinks to a point, self-intersects, or develops degenerate pinching, at most two consecutive facets with zero nonlocal weighted curvature may disappear (i.e., the length of a facet tends to zero) at some time T∗. However, S(T∗) remains essentially admissible, so that we can continue calculating the ODE system (4), (5) for t > T∗ starting with initial data S(T∗) (see [11], [12]).
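As a concrete illustration of (4) and (5), the following sketch advances the facet lengths of an evolving crystal by explicit Euler stepping. The function, the time stepping, and the test configuration (a square with all Wulff facet lengths ∆(nj) set to 1, for which each side obeys dL/dt = −2/L and so L(t) = √(L(0)² − 4t)) are our own assumptions, not the authors' code:

```python
import math

def evolve_crystalline(L, theta, chi, Delta, dt, t_end):
    """Euler time stepping for the facet-length ODE system (4)-(5).

    L[j]     : facet lengths, facets numbered clockwise
    theta[j] : angle of the outward normal n_j
    chi[j]   : transition numbers
    Delta[j] : Wulff facet length for orientation n_j (0 if n_j not in N)
    """
    r = len(L)
    t = 0.0
    while t < t_end:
        # normal velocities V_j = chi_j * Delta(n_j) / L_j  -- equation (5)
        V = [chi[j] * Delta[j] / L[j] for j in range(r)]
        newL = []
        for j in range(r):
            psi_j = (theta[j] - theta[j - 1]) % (2 * math.pi)
            psi_j1 = (theta[(j + 1) % r] - theta[j]) % (2 * math.pi)
            cot = lambda a: math.cos(a) / math.sin(a)
            # transport equation (4)
            dL = ((cot(psi_j) + cot(psi_j1)) * V[j]
                  - V[j - 1] / math.sin(psi_j)
                  - V[(j + 1) % r] / math.sin(psi_j1))
            newL.append(L[j] + dt * dL)
        L = newL
        t += dt
    return L
```

With clockwise numbering, the angle differences ψj for a convex polygon fall in (π, 2π), so the sine factors in (4) come out negative and a shrinking convex polygon indeed loses length.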
We say that {S(t)}_{t∈J} is a crystalline flow with initial data S(0) if there are times 0 = t0 < t1 < t2 < · · · < tl such that {S(t)}_{t∈Jh} is a γ-regular flow on Jh = [th, th+1) with initial data S(th) (h = 0, 1, . . . , l − 1), and S(t) → S(th+1) in the sense of the Hausdorff distance topology as t ↑ th+1, with some facets disappearing at th+1 (h = 0, 1, . . . , l − 2). By an argument similar to that in [12], we see that a crystalline flow {S(t)}_{t∈J} starting from an essentially admissible closed curve S(0) shrinks to a point and neither self-intersects nor develops degenerate pinching. A crystalline flow {S(t)}_{t∈J} agrees with a solution in the level-set sense for (2) introduced in [13], again by an argument similar to that in [12]. The discussion in [12] is for an admissible evolving crystal, but it is easy to extend it to an essentially admissible evolving crystal. For convenience we recall the notion of an admissible evolving crystal. An essentially admissible crystal S is called an admissible crystal if the outward unit normal vector m of each segment of S belongs to N. We say that {S(t)}_{t∈J} is an admissible evolving crystal if S(t) is an admissible crystal for each t ∈ J.

2.3 General Polygonal Initial Curve
In the previous section we restricted the initial curve to essentially admissible crystals. Here we shall focus on a simple, closed, polygonal initial curve S(0) which is not necessarily an essentially admissible crystal. In [13], it is shown that there exists a unique level-set flow (solution) for (2) with a crystalline energy γ starting from a general polygonal initial curve. However, it is not clear a priori whether or not the solution is described by an ODE system, since new facets whose orientations belong to N are expected to be created instantaneously at the places where property (3) is violated on S(0). Moreover, it is not clear how to solve the expected ODE system, since it is singular at newly created facets. In this section we give a heuristic argument to solve such a singular ODE system. Let m and m̂ be the orientations of any adjacent facets Sj(0) and Sj+1(0) of S(0). If

M ≡ { ((1 − λ)m + λm̂)/|(1 − λ)m + λm̂| ; 0 < λ < 1 } ∩ N

is not the empty set, all facets (say, R1(t), . . . , Rn(t), numbered clockwise) with orientation in M are expected to be created between Sj(0) and Sj+1(0) just after t = 0, so that the transition number of each Ri(t) is 1 (resp. −1) for small t > 0 if the bounded polygon enclosed by S(0) is concave (resp. convex) near Sj(0) ∩ Sj+1(0). By inserting these newly created facets, our solution S(t) becomes essentially admissible instantaneously. This observation can be justified by approximating S(0) by essentially admissible crystals from inside and from outside and using the comparison principle [13]. For a given initial polygon S(0) one is able to find the place, the orientation and the transition number of all the facets that are expected to be newly created at the initial time. For later convenience, we shall renumber clockwise all facets of S(0) together with all facets that are expected to be created at t = 0, i.e., the length of a newly created facet equals 0 at t = 0. Then the expected ODE system for a
simple, closed, polygonal initial curve S(0) again becomes (4) and (5); however, the initial data Lj(0) may be 0. The ODE system is of the form

dLj(t)/dt = p̃j/Lj(t) + q̃j−1/Lj−1(t) + r̃j+1/Lj+1(t)   (6)
for j = 1, . . . , r; the index j is considered modulo r. Here the numbers p̃j, q̃j, r̃j are determined uniquely by (4) and (5), since the transition number and the orientation of a newly created facet are known. To solve equation (6) we consider the Puiseux series

Lj(t) = Σ_{k=0}^{∞} a_{jk} t^{k/2},   (7)
with real coefficients a_{jk}. Clearly, for j with Lj(0) = 0 the coefficient a_{j0} must be zero. Suppose that n consecutive facets, say S1(t), . . . , Sn(t), are created at t = 0, i.e., L1(0) = . . . = Ln(0) = 0 and L0(0), Ln+1(0) > 0. We plug (7) into (6) and multiply both sides of (6) by t^{1/2}. Comparing both sides, we observe that all coefficients are determined. The first coefficients {a_{j1}}_{j=1}^{n} have a significant meaning. If the nonlocal curvatures of S0(0) and Sn+1(0) equal zero, then Lj(t) = a_{j1} t^{1/2} for j = 1, . . . , n exactly solves the ODE system (6) with j = 1, . . . , n (as long as both S0(t) and Sn+1(t) exist), since it is decoupled from the whole system (6) with j = 1, . . . , r by the fact that q̃0 = 0 = r̃n+1. In this case the solution {a_{j1}}_{j=1}^{n} represents a selfsimilar expanding solution of the problem in the next section.

2.4 Selfsimilar Expanding Solutions
Let {S(t)}_{t>0} be an essentially admissible evolving crystal of the form

S(t) = ∪_{j=0}^{n+1} Sj(t)

with nonparallel half lines S0(t) and Sn+1(t). We say that {S(t)}_{t>0} is selfsimilar if there exists an essentially admissible crystal S∗ such that

S(t) = t^{1/2} S∗ = {t^{1/2} x; x ∈ S∗},   t > 0.
If {S(t)}_{t>0} solves (6), we call {S(t)}_{t>0} a selfsimilar expanding solution of (2). By definition S(+0) = lim_{t↓0} S(t) consists of two (nonparallel) half lines emanating from the origin. We also observe that ∪_{j=1}^{n} Sj(t) is admissible for all t > 0 and that the transition number of Sj(t) is independent of j = 1, . . . , n and t > 0; it must be either −1 or +1. It turns out that {S(t)}_{t>0} is a selfsimilar expanding solution if and only if the length Lj(t) of Sj(t) (j = 1, . . . , n) solves the ODE system (6) for t > 0 and for j = 1, . . . , n with q̃0 = 0 = r̃n+1. Note that the coefficient a_{j1} of Lj(t) = a_{j1} t^{1/2} represents the length of the j-th facet of S∗ for j = 1, . . . , n.
Theorem. For a given oriented closed cone C (with connected interior) there exists a unique selfsimilar expanding solution S(t) such that S(+0) agrees with the boundary of C (see [14]).

From the ODE system (6) we see that this problem is equivalent to the unique solvability of the algebraic equation

(an, an−1, . . . , a1)ᵀ = 2 H̃n (1/an, 1/an−1, . . . , 1/a1)ᵀ   (8)

for aj = a_{j1} (j = 1, 2, . . . , n), where H̃n is the tridiagonal matrix with diagonal entries p̃n, . . . , p̃1, superdiagonal entries q̃n−1, . . . , q̃1, and subdiagonal entries r̃n, . . . , r̃2. We solved this equation by a method of continuity, and proved the uniqueness of the solution by a geometric observation [14].
3 Numerical Method for Obtaining a Crystalline Flow
In this section, we describe a numerical method for obtaining a crystalline flow starting from a given polygon that is not necessarily an essentially admissible crystal. For each pair of adjacent facets of the initial polygon with orientations m and m̂, if M ≠ ∅ then all facets with orientation in M should sprout out instantaneously, so that the given polygon becomes essentially admissible instantaneously. Once the polygon becomes essentially admissible, no new facets sprout out, and it remains essentially admissible throughout the evolution. We calculate the singular ODE system (4) and (5) by the Euler method. A special treatment is necessary to obtain the approximate lengths of the newly created facets at the first time step ∆t. We take Lj(∆t) = aj √∆t for such facets, where aj is a numerical solution of (8). To solve (8) numerically, as in [14] we rewrite (8) with αj = 1/aj:

(1/αn, 1/αn−1, . . . , 1/α1)ᵀ = Hn (αn, αn−1, . . . , α1)ᵀ,   (9)

where Hn is the tridiagonal matrix with diagonal entries pn, . . . , p1, superdiagonal entries qn−1, . . . , q1, and subdiagonal entries rn, . . . , r2, with pj = 2p̃j, qj = 2q̃j, and rj = 2r̃j. We introduce an extra parameter s ∈ [0, 1] by replacing Hn in (9) by Kn(s), which is Hn with every off-diagonal entry multiplied by s:

Kn(s): diagonal pn, . . . , p1; superdiagonal sqn−1, . . . , sq1; subdiagonal srn, . . . , sr2.   (10)
Evidently [1/αj(0)] = Kn(0)[αj(0)] can be solved easily: αj(0) = 1/√pj. Based on [14], we calculate the numerical solution of (8) as follows.

1. Set αj = 1/√pj, where pj = 2(cot ψj + cot ψj+1) χj ∆(nj), as initial values.
2. Apply the Newton-Raphson method to obtain the numerical solution αj.
3. Calculate aj = 1/αj, and set Lj(∆t) = aj √∆t, where ∆t is the time interval.

Once the new facets are inserted into the given polygon, the length of each facet Lj(i∆t) is calculated at each time step from the system (4) and (5). Note that Lj(t) = aj t^{1/2} is the exact solution if the speeds of both facets bounding the newly created facets are zero.
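Steps 1-3 can be sketched as follows in plain Python, under our own choices for the matrix indexing, the small Gaussian-elimination solver, and a damping safeguard that keeps the iterates positive; this is not the authors' implementation. The test solves the three-facet system of the first experiment in Section 4 (with p = 4/tan(π/8) and q = −2/sin(π/8)), which yields a1 = a3 ≈ 1.68 and a2 ≈ 1.29:

```python
import math

def tridiag(p, q, r, s):
    """K_n(s) of (10): diagonal p_j, off-diagonal entries scaled by s.
    (Rows are ordered 0..n-1 here; the paper orders facets n..1.)"""
    n = len(p)
    K = [[0.0] * n for _ in range(n)]
    for j in range(n):
        K[j][j] = p[j]
        if j + 1 < n:
            K[j][j + 1] = s * q[j]
            K[j + 1][j] = s * r[j + 1]
    return K

def gauss_solve(A, b):
    """Dense linear solve by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for i in range(n):
        piv = max(range(i, n), key=lambda k: abs(M[k][i]))
        M[i], M[piv] = M[piv], M[i]
        for k in range(i + 1, n):
            fct = M[k][i] / M[i][i]
            for c in range(i, n + 1):
                M[k][c] -= fct * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def solve_alpha(p, q, r, steps=20, iters=30):
    """Continuation in s plus Newton for  [1/alpha_j] = K_n(s)[alpha_j]  (9)-(10)."""
    n = len(p)
    alpha = [1.0 / math.sqrt(pj) for pj in p]        # exact solution at s = 0
    for step in range(1, steps + 1):
        K = tridiag(p, q, r, step / steps)
        for _ in range(iters):
            # residual F_j = sum_k K_jk alpha_k - 1/alpha_j
            F = [sum(K[j][k] * alpha[k] for k in range(n)) - 1.0 / alpha[j]
                 for j in range(n)]
            J = [row[:] for row in K]                # Jacobian = K + diag(1/alpha^2)
            for j in range(n):
                J[j][j] += 1.0 / alpha[j] ** 2
            d = gauss_solve(J, [-fj for fj in F])
            # damped update: never shrink below half, keeping alpha positive
            alpha = [max(a + da, 0.5 * a) for a, da in zip(alpha, d)]
    return alpha
```

The continuation in s matters because the s = 0 system is diagonal and trivially solvable, and each small increase in s leaves the previous solution close enough for Newton to converge.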
4 Experimental Results
In the first experiment, we used a regular 16-polygon as the Wulff shape and a square as the initial contour. Let mi (i = 1, 2, · · · , 16) denote the outward unit normals of the Wulff shape. We set arg mi = π − π(i − 1)/8 (the facets are numbered clockwise). Let Fj denote the j-th facet of the initial contour, and nj (j = 1, 2, 3, 4) the outward unit normal of Fj. Assume that arg nj = π − π(j − 1)/2. Then, three new facets sprout out at each corner of the square. For example, between F1 (arg n1 = π) and F2 (arg n2 = π/2) of the given square, three facets sprout out whose normals are parallel to m2, m3, and m4, respectively. In order to obtain the quantities aj, we solve the following equations, which correspond to (8):

(a3, a2, a1)ᵀ = M (1/a3, 1/a2, 1/a1)ᵀ,   (11)

where M is the tridiagonal matrix with rows (p, q, 0), (q, p, q), (0, q, p), p = 4/tan(π/8), and q = −2/sin(π/8). Let α = 1/a1 = 1/a3 and β = 1/a2. Equation (11) can be solved analytically:

α = −(1/(2q))(pβ − 1/β),
β = √( [(p² + q²) + √((p² + q²)² − p²(p² − 2q²))] / [p(p² − 2q²)] ).   (12)

We can calculate the quantities aj using a1 = a3 = 1/α and a2 = 1/β. The values p and q in (12) are known, as shown in (11). The resulting values are a1 = a3 ≈ 1.68 and a2 ≈ 1.29. The three facets sprout out with a symmetric shape in this case. It should be noted that the shape of the set of new facets is not the same as the shape of the corresponding part of the Wulff shape: here the middle facet is shorter than its neighbors, even though the Wulff shape is regular. Figure 2 shows some experimental results of crystalline flows. The initial contour is common to all, but the Wulff shape is different. As described before, the Wulff shape plays the role of the unit circle of a classical curve shortening flow. Because the proposed method can obtain a crystalline flow from a polygon that is not an essentially admissible crystal, any simple and convex polygon can be used as the Wulff shape.
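The bookkeeping that determines which Wulff normals sprout at a corner (the set M of Section 2.3) can be sketched as follows; applied to the example above (n1 at angle π, n2 at π/2, regular 16-gon Wulff shape) it returns the three orientations parallel to m2, m3, m4. The function name and angle-based representation are our own assumptions:

```python
import math

def sprouting_orientations(ang_m, ang_mhat, wulff_angles, eps=1e-9):
    """Return the Wulff-normal angles strictly between two adjacent facet
    normals: the orientations of the facets expected to sprout at that corner."""
    # signed angular gap from m to m_hat, mapped to (-pi, pi)
    gap = (ang_mhat - ang_m + math.pi) % (2 * math.pi) - math.pi
    out = []
    for a in wulff_angles:
        d = (a - ang_m + math.pi) % (2 * math.pi) - math.pi
        # strictly between m and m_hat, in the same rotation sense as the gap
        if eps < d < gap - eps or gap + eps < d < -eps:
            out.append(a % (2 * math.pi))
    return sorted(out)
```

If the returned list is empty, the corner already satisfies condition (3) and no facet sprouts there.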
Fig. 1. An example of the Wulff shape and an initial contour. An analytic solution can be calculated in this case. Three new facets are inserted at the beginning, as shown in the figure. It should be noted that the middle facet is shorter than the side ones.
Fig. 2. Examples of the crystalline flow. The initial contour is common to all and is shown in the second column. The Wulff shapes are shown at the left: (A) a regular 30-polygon, (B) a decagon two of whose facets are longer than the others, (C) a regular pentagon, and (D) a 30-polygon all of whose facets have the same length.
5 Conclusion
A numerical method for obtaining a crystalline flow from a given polygon that is not essentially admissible has been presented. The method makes it possible to use any simple and convex polygon as the Wulff shape, because a crystalline flow can be obtained from any simple polygon even if it is not essentially admissible. In many cases, a contour in an image is given as a polygon. For example, a contour represented by a chain code is a polygon that consists of short facets. Because the nonlocal curvature Λγ is determined by the facet length, no approximation is needed for the calculation of the curvature. In addition, because each facet moves while keeping its direction, it is not difficult to trace every facet through the evolution. We believe that these features of a crystalline flow are useful for multi-scale contour figure analysis.
References

1. Koenderink, J.J.: The Structure of Images. Biological Cybernetics 50 (1984) 363–370
2. Alvarez, L., Guichard, F.: Axioms and Fundamental Equations of Image Processing. Arch. Rational Mech. Anal. 123 (1993) 199–257
3. Mokhtarian, F., Mackworth, A.: A Theory of Multiscale, Curvature-Based Shape Representation for Planar Curves. IEEE Transactions on Pattern Analysis and Machine Intelligence 14(8) (1992) 789–805
4. Rattarangsi, A., Chin, R.T.: Scale-Based Detection of Corners of Planar Curves. IEEE Transactions on Pattern Analysis and Machine Intelligence 14(4) (1992) 430–449
5. Kimia, B.B., Tannenbaum, A.R., Zucker, S.W.: Shapes, Shocks, and Deformations I: The Components of Two-Dimensional Shape and the Reaction-Diffusion Space. International Journal of Computer Vision 15 (1995) 189–224
6. Hontani, H., Deguchi, K.: Multi-Scale Image Analysis for Detection of Characteristic Component Figure Shapes and Sizes. In: Proceedings of the 14th International Conference on Pattern Recognition (1998) 1470–1472
7. Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations. J. Comput. Phys. 79 (1988) 12–49
8. Evans, L.C., Spruck, J.: Motion of level sets by mean curvature, I. J. Differential Geometry 33 (1991) 635–681
9. Chen, Y.-G., Giga, Y., Goto, S.: Remarks on viscosity solutions for evolution equations. J. Differential Geometry 33 (1991) 749–786
10. Angenent, S.B., Gurtin, M.E.: Multiphase thermomechanics with interfacial structure 2. Evolution of an isothermal interface. Arch. Rational Mech. Anal. 108 (1989) 323–391
11. Taylor, J.: Constructions and conjectures in crystalline nondifferential geometry. In: Proceedings of the Conference on Differential Geometry, 52, Pitman, London (1991) 321–336
12. Giga, M.-H., Giga, Y.: Crystalline and level-set flow – Convergence of a crystalline algorithm for a general anisotropic curvature flow in the plane. In: Kenmochi, N. (ed.): Free Boundary Problems: Theory and Applications I. Gakuto International Ser. Math. Sci. Appl. 13 (2000) 64–79
13. Giga, M.-H., Giga, Y.: Generalized Motion by Nonlocal Curvature in the Plane. Arch. Rational Mech. Anal. 159 (2001) 295–333
14. Giga, M.-H., Giga, Y., Hontani, H.: Selfsimilar solutions in motion of curves by crystalline energy. Minisymposium lecture, 5th International Congress on Industrial and Applied Mathematics, Sydney, July 2003
15. Gurtin, M.E.: Thermomechanics of Evolving Phase Boundaries in the Plane. Clarendon Press, Oxford (1993)
16. Giga, Y.: A level set method for surface evolution equations. Sugaku 47 (1993) 321–340; Eng. translation: Sugaku Expositions 10 (1995) 217–241
17. Giga, Y., Goto, S.: Motion of hypersurfaces and geometric equations. J. Mathematical Society of Japan 44 (1992) 99–111
Morphological Image Reconstruction with Criterion from Labelled Markers

Damián Vargas-Vazquez, Jose Crespo, and Victor Maojo

Facultad de Informática, Universidad Politécnica de Madrid, 28660 Boadilla del Monte (Madrid), Spain
[email protected]
Abstract. In Mathematical Morphology, the reconstruction of images from markers has proven to be useful in morphological filtering and image segmentation. This work investigates the use of a criterion in the reconstruction process, whose utilization in the problem of image reconstruction from an image marker has been partially treated elsewhere. This work further investigates this idea and extends it to the problem of image reconstruction from labelled markers. In the binary case, this allows us to compute the modified influence zones associated with a set of labelled markers. A significant difference from the usual case (i.e., the "normal" influence zones) is that we generally do not obtain a whole partition of the space, because the criterion added to the reconstruction process causes some points or pixels not to be recovered. In addition, in this paper we consider the gray-level case, and we use the reconstruction with criterion to separate regions of a non-binary input image. This input image is considered as a topographic relief (as in a normal watershed); however, the flooding mechanism is modified by the reconstruction criterion. The benefit is that we can control to some extent how the flooding proceeds and, therefore, how image region shapes are recovered.

Keywords: Mathematical Morphology, segmentation, flat zones, labelled markers, reconstruction with criterion.
1 Introduction
In Mathematical Morphology [10] [11] [14] [6] [9], reconstruction algorithms have been used successfully in the image processing and analysis stages. Filters by reconstruction [12] [2] [11] [8] [3] [4] [13] have become powerful tools that enable us to eliminate undesirable features while practically not affecting desirable ones. These filters are computed by reconstructing a reference image f from a marker image g, and they preserve well the shapes of the marked structures. A new type of transformation, known as transformations with reconstruction criterion, is derived from filters by reconstruction. A modification of the reconstruction process, in particular the inclusion of a criterion, allows us to

I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 475–484, 2003.
© Springer-Verlag Berlin Heidelberg 2003
control the shape of some structures while preserving contours and the structures of interest. The main feature of these transformations is that they enable us to obtain intermediate results between the standard morphological opening (respectively, closing) and the opening (respectively, closing) by reconstruction, so that some of the inconveniences of both can be avoided. These filters by reconstruction with criterion have been partially treated in [15] [17]. In this paper, we will study the application of the reconstruction criterion to the problem of the reconstruction of an input image from labelled markers (or connected components). In the binary case, we will discuss the main differences that exist with respect to the "non-criterion" case, and we will study how to compute the modified influence zones. As will be discussed, some points are not recovered in the reconstruction process, and the criterion used ultimately modifies the shapes of the influence zones. In addition, we will apply these ideas to the gray-level case, in which the reconstruction criterion provides some control over how the "flooding" of the topographic relief constituted by the input image (using terms normally employed in the watershed transformation method) proceeds. As expected from the binary case discussion, not all pixels of the input image will belong to the computed regions, since the criterion causes, in certain situations, some pixels not to be reached by the reconstruction mechanism. Section 2 discusses some aspects of filters by reconstruction with criterion. Section 3 considers the problem of image reconstruction (with criterion) using labelled markers in both the binary case (Section 3.2) and the gray-level case (Section 3.3), where differences with the normal reconstruction (i.e., where no criterion is employed) will be highlighted.
2 Openings and Closings with Reconstruction Criterion
The process to build these types of transformations involves the use of a reference image and a marker image. Thus, a reconstruction process of a marker image inside a reference image is performed (as is the case in transformations by reconstruction), but a reconstruction criterion is taken into account [15] [17]. Let f and g be the reference and the marker image, respectively. We will consider the following propagation criteria:

f ∧ γλ δ(1)(g)   and   f ∨ ϕλ ε(1)(g),   (1)

the first for the opening case, and the second for the closing case. Note: γλ and ϕλ denote, respectively, an opening and a closing of index λ (which defines the structuring element size), and δ(1) and ε(1) symbolize the elementary dilation and erosion, respectively (which, for example, employ a 3 × 3 square structuring element using 8-connectivity). In the following expressions we will refer only to the opening case (dual expressions apply to the closing case). Let us remember that, in the opening by reconstruction, the operation used is f ∧ δ(1)(g), which is the geodesic dilation. In our case, the opening γλ plays the special role of propagation criterion. We have the following inequality:

γλ δ(1)(g) ≤ δ(1)(g)   (2)
since γλ is an anti-extensive operator. For λ = 0, we have g ≤ γλδ(1)(g) = δ(1)(g); that is, the propagation of the marker g proceeds in the same way as in the opening by reconstruction. However, if λ ≥ 1, although the inequality γλδ(1)(g) ≤ δ(1)(g) holds (since γλ is anti-extensive), the inequality g ≤ γλδ(1)(g) is not necessarily true. In the case of the opening by reconstruction, when the marker is given by a morphological opening g = γµ(f) instead of the erosion g = εµ(f), we can obtain the same result (assuming the structuring element contains the center). Specifically, when the marker image is given by g = γµ(f), for λ ≤ µ + 1, the output images of successive iterations of the operation γλδ(1) are similar to those generated by successive iterations of δ(1). However, the reconstruction process changes when the reference image f is used. It is possible to appreciate the propagation criterion given by γλ if we iterate the operator σλ,f(g) = f ∧ γλδ(1)(g) on γµ(f) until idempotence to obtain the opening with reconstruction criterion γ̃λ,µ (and, in a similar way, for the closing ϕ̃λ,µ):

γ̃λ,µ(f) = lim_{n→∞} σλ,fⁿ(γµ(f)) = σλ,f σλ,f · · · σλ,f (γµ(f))  [iterated until idempotence].   (3)

In this case, the reference image modifies the reconstruction process of the successive iterations of γλδ(1), where the opening γλ restricts the reconstruction to some regions of the reference image f.
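In the binary case, expression (3) can be sketched directly on pixel sets; the following is our own minimal implementation (square structuring elements, with a 3 × 3 square for δ(1), and an iteration cap as a safeguard), not the authors' code:

```python
def dilate(S, r=1):
    """Dilation by a (2r+1)x(2r+1) square (8-connectivity for r = 1)."""
    return {(x + dx, y + dy) for (x, y) in S
            for dx in range(-r, r + 1) for dy in range(-r, r + 1)}

def erode(S, r):
    """Erosion by a (2r+1)x(2r+1) square."""
    return {(x, y) for (x, y) in S
            if all((x + dx, y + dy) in S
                   for dx in range(-r, r + 1) for dy in range(-r, r + 1))}

def opening(S, lam):
    """gamma_lambda: erosion then dilation; the identity for lambda = 0."""
    return S if lam == 0 else dilate(erode(S, lam), lam)

def reconstruction_with_criterion(X, g, lam, max_iter=1000):
    """Iterate sigma_{lam,X}(g) = X ∧ gamma_lam(delta^(1)(g)) until
    idempotence: the binary version of expression (3)."""
    M = set(g)
    for _ in range(max_iter):
        Mnext = X & opening(dilate(M, 1), lam)
        if Mnext == M:
            return M
        M = Mnext
    return M
```

For λ = 0 this reduces to the classical reconstruction by dilation and recovers the whole marked connected component; for larger λ, thin parts of X stop the propagation, as in the gradation shown in Figures 1 and 2.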
In this case, the reference image modifies the reconstruction process of successive iterations of γλ δ(1) , where the opening γλ restricts the reconstruction to some regions of the reference image f. There are inclusion relationships between the flat zones obtained at the output of the opening (and, respectively, closing) with reconstruction criteria and those at the output of classical opening (respectively, closing) by reconstruction. I.e., each flat zone of the output of an opening with reconstruction criterion is included in a flat zone of the output of the corresponding opening by reconstruction. In fact, there are inclusion relationships between those filters and morphological openings and closings (without reconstruction). Thus, using a non-connected opening as a marker we can establish a flat zone inclusion relationship. One extreme would be the case of the non-connected opening (used as the marker), and the other extreme would be the case of the classical opening by reconstruction. Between those cases we would find the gradation constituted by the family of the opening with reconstruction criteria, whose criterion allows us to control the reconstruction of flat zones and the resulting inclusion relationships. Figure 1 and 2 illustrate this. The example in Figure 1 shows the gradation that can be obtained with the opening with reconstruction criteria, whose outputs appear as intermediate results between those of the non-connected filters and of the filters by reconstruction. In the binary case (Figure 2), we can see how we can control the flat zone extension and, if we desire it, separate certain regions in some cases.
Fig. 1. (a) Morphological opening γµ with µ = 2; (b) opening by reconstruction γ̃µ with µ = 2; (c) opening with reconstruction criterion γ̃λ,µ using µ = 2, λ = 1; (d) γ̃λ,µ using µ = 2, λ = 2; and (e) γ̃λ,µ using µ = 2, λ = 3.
3 Reconstruction with Criterion from Labelled Markers
In this section we will discuss the reconstruction with criterion from labelled markers. First, we will consider the binary case, where we will discuss the computation of the influence zones associated with the markers. Afterwards we will apply these concepts to the gray-level image case.

3.1 General Definitions
We will consider only digital images in the following. A gray-level image can be represented by a function f : D → L, where D is a subset of Z² and L is a subset of Z (Z denotes the set of integers). A section of f at level i is the set Xi(f) defined as: Xi(f) = {x ∈ D : f(x) ≥ i}. In the same way, we may define the set Zi(f) as: Zi(f) = {x ∈ D : f(x) ≤ i}. We clearly have Xi(f) = ∁(Zi−1(f)), where ∁ denotes the complementation operator. Let M be a subset of D. For every point y of M, we denote the distance function of y to the complementary set ∁(M) as:

∀y ∈ M,  d(y) = dist(y, ∁(M)),   (4)

where dist(y, ∁(M)) is the shortest distance between y and a point of ∁(M). Let X ⊂ D be a set, and x, y two points of X. We define the geodesic distance dX(x, y) between x and y as the length of the shortest path (if any) included in X and linking x and y.
Fig. 2. (a) Original image; (b) opening with reconstruction criterion γ̃λ,µ using µ = 5, λ = 2; (c) γ̃λ,µ using µ = 5, λ = 3; (d) γ̃λ,µ using µ = 5, λ = 4; and (e) γ̃λ,µ using µ = 5, λ = 5.
Suppose now that M is composed of n connected components (markers) Mi. The geodesic zone of influence zX(Mi) of marker Mi is the set of points of X located at a finite geodesic distance from Mi that are closer to Mi than to any other marker Mj:

zX(Mi) = {x ∈ X : dX(x, Mi) finite, ∀j ≠ i, dX(x, Mi) < dX(x, Mj)}.   (5)
The boundaries between the various zones of influence constitute the geodesic skeleton by zones of influence (SKIZ) of M in X. We can write:

IZX(M) = ∪i zX(Mi),   (6)

and:

SKIZX(M) = X / IZX(M),   (7)

where / stands for set subtraction.

3.2 Binary Case: Geodesic Distance Modification
Let CX(x, y) denote the set of paths that link x and y. Such a set can be empty, in particular if x and y belong to disjoint connected components of X. We can write the geodesic distance as:

dX(x, y) = ∧ {ℓ(f), f ∈ CX(x, y)},   (8)
dX(x, y) = ∞ if CX(x, y) = ∅,   (9)

where ℓ(f) is the length of a path f (its number of points).
Suppose now that we apply an opening with reconstruction criterion γ̃λ,g (expression (3)) to an image X. We will modify the geodesic distance dX(x, y) expressions indicated above, and we will use instead DX(x, y), which is defined next:

DX(x, y) = ∧ {ℓ(f), f ∈ Cγ̃λ,g(X)(x, y)},   (10)
DX(x, y) = ∞ if Cγ̃λ,g(X)(x, y) = ∅.   (11)

Note that, instead of paths included in X, we are considering paths included in the filter output γ̃λ,g(X).
Fig. 3. Shortest path differences.
Figure 3 illustrates the DX(x, y) concept. Figures 3(a) and 3(b) display the usual case and the shortest path between a pair of points x and y that belong to X. Figure 3(c) visualizes the filter output γ̃λ,g(X), and Figure 3(d) displays the shortest path between x and y that is included in γ̃λ,g(X). Note that, in this case, dX(x, y) is quite different from DX(x, y). Now we will consider the problem of computing the influence zones associated with a set of markers. The following expression defines the new influence zone z̃X(Mi) of marker Mi:

z̃X(Mi) = {x ∈ γ̃λ,g(X) : DX(x, Mi) finite, ∀j ≠ i, DX(x, Mi) < DX(x, Mj)},   (12)

where g = ∪i Mi. The following figures illustrate the computation of z̃X(Mi). Figure 4(a) displays an input set with two markers M1 and M2. Figures 4(b) and 4(c) display, respectively, the influence zones of M1 and M2, considering there is only one
Morphological Image Reconstruction with Criterion from Labelled Markers
Fig. 4. Modified influence zones.
marker (M1 or M2). Figure 4(d) displays the influence zones of both markers. Note that there are some (a few) points at the right corner in Figure 4(d) that do not belong to any influence zone, but that do belong to an influence zone in Figure 4(b) or 4(c). The reason is that some pixels belong to γλ,g(X) when g = M1 or g = M2, but not when g = M1 ∪ M2.

3.3 Gray-Level Case
In this section, we apply the previous concepts to the gray-level case. The input image is considered as a topographic relief that is flooded (using the terminology of the watershed method [5][7][16][1]). We discuss next the expressions of this modified flooding process, which proceeds level by level. Let the marker image Mj be a gray-level image composed of labelled markers mk at step j, where mk is a connected component with label k, k ∈ {1, ..., N}, and N is the number of markers. In Mj, background pixels (those that do not belong to a labelled marker) have an intensity value of 0. The initial marker image M0 is composed of the set of all the minima of the original image f. Then, the following sequence of operations computes the boundary image B (B is a binary image where boundary pixels will have zero intensity value and the rest will have the maximum value MaxValue of the images under consideration):1

Initialize counters: i = 0, j = 1
Initialize boundary image B: B(x) = MaxValue ∀ pixel x
For example, for two-byte-per-pixel images, this value is equal to 65535.
For all levels i of f do
    Ti(f)(x) = MaxValue if f(x) ≤ i, 0 otherwise
    Do
        Mj = (γλ δ(1)(Mj−1) ∧ Ti(f)) ∨ Mj−1    /* marker image update */
        for (k = 1; k ≤ N) do
            for all border pixels p of mk do
                if ∃ p′ ∈ NG(p) such that Mj(p′) > 0 and Mj(p′) ≠ k then
                    B(p′) = 0    /* p′ is labelled as a boundary pixel */
        Mj = Mj ∧ B
        j = j + 1
    While idempotence is not reached in Mj
    i = i + 1

where NG(p) is the set of neighbors of a pixel p. Note that the inf operation "Mj = Mj ∧ B" is necessary to prevent the mixing of different markers. At the end of the process, the boundaries are the separations of the modified catchment basins. Nevertheless, note that some pixels may not be flooded (as discussed in Section 3.2 for the binary case) because of the added reconstruction criterion. The suggested reconstruction criterion γλ δ(1)(Mj−1), introduced by the transformations described in Section 3, gives us some flexibility to separate flat or connected zones by limiting the immersion process to certain zones.

In Figure 5, we show an application of this modified flooding transformation for particle extraction in a medical gray-level image. Figure 5(a) displays an input image, and Figure 5(b) shows the markers highlighted in white over the original (note that the background marker is also displayed). If we only want to segment the marked cells as unique regions, the watershed transformation is not the most suitable option, because the flooding process recovers all overlapping components (including the small particle at the upper-left corner, joined by a thin structure, which in this case we want to extract separately), as shown in Figure 5(c). Figure 5(d) visualizes the image region recovered by the transformation presented above. As we can see, the added reconstruction criterion can prevent the undesired mixing of overlapping particles (as is the case for the particle at the upper-left corner). Note that, in this case, pixels that are not assigned to the particle markers are ultimately flooded by the background marker.

3.4 Conclusion
This paper has investigated the utilization of a criterion in the reconstruction process and has applied it to the problem of image reconstruction from labelled markers. In the binary case, we have studied the differences that exist in the
Fig. 5. (a) Original Image; (b) markers used (displayed in white) over the original image; (c) result of a watershed; and (d) result of modified flooding with γ λ,g (where λ is equal to 5).
computation of the influence zones of each connected component of the marker. As discussed, it is possible that some pixels do not belong to any influence zone because of the added criterion. Then, the gray-level case has been considered, and a modified flooding process arises that can be used to segment regions of interest with additional flexibility. We have shown a medical image example in which the modified flooding process permits the separation of overlapping particles.
References
1. S. Beucher, F. Meyer: The morphological approach to segmentation: the watershed transformation, in "Mathematical Morphology in Image Processing" (Ed.: E. Dougherty), pp. 433-481, New York: Marcel Dekker, 1993.
2. J. Crespo, J. Serra, R. Schafer: Theoretical aspects of morphological filters by reconstruction. Signal Process., 47(2), 201-225, 1995.
3. J. Crespo, R. Schafer: Locality and adjacency stability constraints for morphological connected operators, Journal of Mathematical Imaging and Vision, vol. 7, pp. 85-102, 1997.
4. J. Crespo, V. Maojo: New results on the theory of morphological filters by reconstruction, Pattern Recognition, vol. 31, no. 4, pp. 419-429, April 1998.
5. H. Digabel, C. Lantuéjoul: Iterative algorithms. Second Symposium Européen d'Analyse Quantitative des Microstructures en Sciences des Matériaux, Biologie et Médecine, Caen. J.-L. Chermant, Ed., Riederer Verlag, Stuttgart, pp. 85-99, 1977.
6. H. Heijmans: Morphological Image Operators (Advances in Electronics and Electron Physics, Series Editor: P. Hawkes), Academic Press, 1994.
7. F. Meyer, S. Beucher: Morphological segmentation. J. Visual Commun. Image Repres., vol. 1, no. 1, pp. 21-45, 1990.
8. P. Salembier, J. Serra: Flat zones filtering, connected operators, and filters by reconstruction, IEEE Transactions on Image Processing, vol. 4, pp. 1153-1160, 1995.
9. M. Schmitt, J. Mattioli: Morphologie Mathématique, Masson, 1993.
10. J. Serra: Image Analysis and Mathematical Morphology, Vol. 1, Academic Press, 1982.
11. J. Serra (Ed.): Image Analysis and Mathematical Morphology, Vol. 2, Academic Press, 1988.
12. J. Serra, Ph. Salembier: Connected operators and pyramids. In SPIE, editor, Proc. Image Algebra Math. Morphology, vol. 2030, pp. 65-76, San Diego (CA), USA, July 1993.
13. J. Serra: Connectivity on complete lattices. Journal of Mathematical Imaging and Vision, vol. 9, pp. 231-251, 1998.
14. P. Soille: Morphological Image Analysis: Principles and Applications, Springer, 2nd edition, 2003.
15. I. R. Terol, D. Vargas: A study of openings and closings with reconstruction criteria. In H. Talbot and R. Beare, Eds., Mathematical Morphology, Proc. of the VIth International Symposium, 2002.
16. L. Vincent, P. Soille: Watersheds in digital spaces: an efficient algorithm based on immersion simulations, IEEE Trans. Pattern Anal. Machine Intell., vol. 13, pp. 583-598, June 1991.
17. D. Vargas, J. Crespo, V. Maojo, I. R. Terol: Medical image segmentation using openings and closings with reconstruction criteria, to be published in Proceedings of the International Conference on Image Processing ICIP, September 2003.
Intertwined Digital Rays in Discrete Radon Projections Pooled over Adjacent Prime Sized Arrays

Imants Svalbe and Andrew Kingston

Center for X-ray Physics and Imaging, School of Physics and Materials Engineering, Monash University, VIC 3800, Australia
Abstract. Digital projections are image intensity sums taken along directed rays that sample whole pixel values at periodic locations along the ray. For 2D square arrays with sides of prime length, the Discrete Radon Transform (DRT) is very efficient at reconstructing digital images from their digital projections. The periodic gaps in digital rays complicate the use of the DRT for efficient reconstruction of tomographic images from real projection data, where there are no gaps along the projection direction. A new approach to bridge this gap problem is to pool DRT digital projections obtained over a variety of prime sized arrays. The digital gaps are then partially filled by a staggered overlap of discrete sample positions to better approximate a continuous projection ray. This paper identifies primes that have similar and distinct DRT pixel sampling patterns for the rays in digital projections. The projections are effectively pooled by combining several images, each reconstructed at a fixed scale, but using projections that are interpolated over different prime sized arrays. The basis for the pooled image reconstruction approach is outlined, and we demonstrate that this mechanism works in principle.

Keywords: Discrete Radon transform, tomographic image reconstruction.
1 Introduction
The Discrete Radon Transform (DRT) maps discrete image data I(x, y) into discrete digital projections R(t, m) that closely resemble continuous space integral Radon transforms [1]. The inherently discrete nature of the sampling and representation of projections makes the DRT an attractive tool to transform and interpret digital data [2]. In contrast with the case for continuous space, the digital projection mechanism requires no data interpolation, as each digital projection (labelled by index m) sums whole pixel values sampled along its ray direction. The samples are oriented at integer array displacements of xm horizontally and ym vertically on the lattice at each translate position (t). Arrays of prime size [3] generate unique pixel sampling patterns for each DRT projection. This means that digital images can be projected and reconstructed exactly with

I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 485–494, 2003. © Springer-Verlag Berlin Heidelberg 2003
the DRT using only simple (and hence fast) addition operations. Basic properties of digital projections for the DRT are reviewed in [4,5]. The prime based DRT has been applied successfully for the reconstruction of tomographic images from projections based on x-ray transmission [6,7]. In [7] linear 1D interpolation was used to match each continuous integral ray to a corresponding discretely sampled digital ray in a DRT with expanded translates, called k-space, R(k, θ). The inverse DRT was then used to reconstruct the image using the digital projections inferred from the analogue projection data, after mapping R(k, θ) back to R(t, m). The reconstructed image size is a free variable in such a process. The major disadvantage of this DRT based inversion of analog projections is that accurate image reconstruction requires a very large final image size (or, at best, calculation of a sub-sampled result based on a large image). The size of the gaps between samples on digital rays scales as the square root of the image size [4], so that, in increasingly large images, the finite gaps have diminishing importance. The computational efficiency of the DRT method is, however, rapidly overwhelmed by the additional computation required to reconstruct large format digital images. In this paper, we look at reconstructing images using the same DRT method, but by applying it over a relatively small range of neighbouring prime array sizes, to avoid the need to reconstruct large arrays. Linear interpolation is a poor approximation to match digital and continuous rays. Direct interpolation from the rays of a sinogram into k-space at each projection angle is complicated by the jumbled ordering and variable spatial overlap of the digital rays. A more direct approach to solving this interpolation problem is developed in [8].
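The exact projection/reconstruction property on prime-sized arrays can be sketched with one common formulation of the finite Radon transform; the indexing convention below is chosen for illustration and may differ in detail from the paper's (t, m) labelling:

```python
def frt(image, p):
    """Forward finite Radon transform of a p x p integer array (p prime).
    Projection m (0 <= m < p) sums image[x][(t + m*x) mod p] over x for each
    translate t; the extra projection m = p collects the row sums."""
    R = [[sum(image[x][(t + m * x) % p] for x in range(p)) for t in range(p)]
         for m in range(p)]
    R.append([sum(image[x]) for x in range(p)])
    return R

def ifrt(R, p):
    """Exact inversion: back-projecting all p ordinary projections through
    (x, y) over-counts every other row's total exactly once, so subtract the
    image sum S, add back the row sum R[p][x], and divide by p."""
    S = sum(R[0])
    return [[(sum(R[m][(y - m * x) % p] for m in range(p)) + R[p][x] - S) // p
             for y in range(p)]
            for x in range(p)]

p = 7
img = [[(3 * x + 5 * y + 1) % 10 for y in range(p)] for x in range(p)]
assert ifrt(frt(img, p), p) == img    # exact reconstruction, integers only
```

Only additions (and one final division by p) are needed, which is the "simple and fast" property the text describes; the exactness relies on p being prime, so that m(x − x0) mod p sweeps every residue as m varies.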
In this paper, the aim is to compensate for the poor approximation of linear interpolation by pooling the reconstructed images derived from appropriately resampled digital projections. Section 2 establishes the link between the prime array size and the pattern of digital ray sampling for any given digital projection. Section 3 shows that the pattern of sampling for a projection xm : ym repeats for primes p′ = p ± xm ym and that this repetition limits the range of image array sizes that can be usefully pooled. Section 4 shows examples of digital projections based on different sized images. Comparative image reconstruction results using the pooled projection approach are given in Section 5, followed by discussion of the limitations of this technique and conclusions in Section 6.
2 Wrap Factors on Prime Arrays
A digital projection with label m samples image pixels on adjacent rows of the image data that are always m horizontal units apart, so that the pixels located at (x, y) and (x + m, y + 1) are always part of the same digital projection. We take (0, 0) to be the image origin, with positive displacements in x increasing to the left and with y increasing in the downwards direction. Wherever (x + m) ≥ p, the displacement is wrapped modulo p to a new displacement on the same row. Each digital projection m is comprised of a set of parallel segments or digital rays. These rays link the nearest neighbour pixels of a projection. The samples in
a ray for projection m are located at regular intervals xm units apart horizontally and ym units apart vertically on a square image array of size p. The sample gap dm, given by dm² = xm² + ym², is the minimum distance between samples along that ray direction. The xm and ym values must be relatively prime; [9] outlines how the set of xm and ym values for any p are drawn from the Farey series in number theory. The perpendicular separation between digital rays is given by p/dm, with the horizontal offset between wrapped rays being p/ym. We are interested in 0 < m < p, as m = 0 and m = p are defined respectively as row and column projections. The integer variable t, 0 ≤ t < p, defines the horizontal translation of a digital projection. For square lattice arrays, with xm < ym, there are four symmetric digital projections, m0, m1, m2 and m3, corresponding to xm : ym, ym : xm, −xm : ym and −ym : xm, with projection angles θm, 90 − θm, 90 + θm and 180 − θm. It can be shown that m3 = p − m0 and m2 = p − m1, as these projections form complementary angles. The symmetric projections for each xm : ym are important as they share symmetric patterns of pixel sampling. The values xm and ym are solutions of the digital linear projection equations [4], hence

m0 ym = α0 p + xm,   (1a)
m1 xm = α1 p + ym,   (1b)
m2 xm = α2 p − ym,   (1c)
m3 ym = α3 p − xm,   (1d)
where the four constants αi are positive integers indicating how many times the translation wraps around the array of size p to get to the nearest ray sample location. For example, (1a) means that to project from the sample at (0, 0) to the nearest digital ray sample at (xm, ym) in ym horizontal steps of m0 requires α0 wraps around the right edge of the image of size p. Figure 1 shows an example where xm = 2, ym = 13 for m = 422 on a 457 by 457 array. The rays wrap 12 times for the samples to be nearest neighbours. The points sampled by the digital projection xm : ym are characterised completely by some m0 for a particular p, that is (xm, ym) ⇔ (m0, p). Whilst there are many possible ways to have α0 wraps on an array of size p, the set {αi} described by (1a – 1d) defines a particular and distinct pattern of sampling of the digital rays across the image space to form that projection.

Fig. 1. Pixel locations (white dots) in a 457 by 457 image array that are sampled by the digital projection 2:13 (m0 = 422), for translate t = 200. Digital rays follow the lines joining nearest neighbour pixels with α0 = 12.

Combining (1a – 1d),

α0 + α3 = ym,   (2a)
α1 + α2 = xm,   (2b)

we see that each {αi} characterises an xm and ym pixel sample pattern independently of p. For the 2:13 ray, {αi} = {12, 1, 1, 1}. The values taken by the {αi} in (2a – 2b) are further restricted, as α1 and α2 cannot be factors of xm and, similarly, α0 and α3 cannot be factors of ym. To prove this, assume, for example, that xm = jα1. Then equation (1b) implies that α1 is also a factor of ym. For the gap dm to be a minimum distance, however, xm and ym must be relatively prime, so the assumption must be false. For similar reasons, equation (2b) implies α1 and α2 must be relatively prime, as must be α0 and α3. Then, like the ordered set of xm/ym fractions, {α1/α0, xm/ym, α2/α3} also form a Farey-like sequence of relatively prime ratios [9], with xm and ym being mediants of the wrap factors.
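Equations (1a)-(1d) and the constraints (2a)-(2b) can be checked numerically. A sketch for the paper's 2:13 example on a 457 by 457 array (m0 = 422, {αi} = {12, 1, 1, 1}):

```python
def wrap_factors(xm, ym, p):
    """Solve the digital projection equations (1a)-(1d) on a prime array:
    m0*ym = a0*p + xm,  m1*xm = a1*p + ym,
    m2*xm = a2*p - ym,  m3*ym = a3*p - xm."""
    m0 = xm * pow(ym, -1, p) % p          # m0 is xm/ym modulo p
    m1 = ym * pow(xm, -1, p) % p
    m2, m3 = p - m1, p - m0               # complementary-angle projections
    alphas = ((m0 * ym - xm) // p, (m1 * xm - ym) // p,
              (m2 * xm + ym) // p, (m3 * ym + xm) // p)
    return (m0, m1, m2, m3), alphas

labels, alphas = wrap_factors(2, 13, 457)
print(labels, alphas)                 # (422, 235, 222, 35) (12, 1, 1, 1)
assert alphas[0] + alphas[3] == 13    # (2a): a0 + a3 = ym
assert alphas[1] + alphas[2] == 2     # (2b): a1 + a2 = xm
```

The three-argument `pow` with exponent −1 (Python 3.8+) gives the modular inverse, which exists because p is prime.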
3 Primes with Common Wrap Factors
Each allowed {αi} defines a distinct pattern of sampling for the digital rays that comprise the projection xm : ym of an image of size p. An array of size p′ that has the same {αi} will have the same relative pixel sampling pattern for the projection xm : ym, but with the separation of digital rays scaled by p′/p. Let the image array size change from p to p′, with p and p′ having the same {αi} for projection xm : ym. Then equation (1a) becomes m0′ ym = α0 p′ + xm. Defining ∆m0 = m0′ − m0 and ∆p = p′ − p, then ∆m0 and ∆p are related by

∆m0 ym = α0 ∆p,   (3a)

and similarly,

∆m1 xm = α1 ∆p.   (3b)

Equation (3a) requires that ∆p = n ym for some integer n = ∆m0/α0, as ∆m0 is always an integer and α0 is not a factor of ym. Equation (3b) requires that ∆p = n′ xm for some integer n′. Taken together, these constraints mean that the lowest possible value of ∆p for p and p′ to have the same {αi} is then ∆p = xm ym. Hence

p′ = p ± xm ym   (4)

defines the array size nearest to p that has the same {αi} for the digital projection xm : ym. If xm and ym are both odd then, since p is odd, the nearest prime p′ with the same wrap factors is at least p′ = p ± 2 xm ym. The value of p′ given by (4) may not be a prime number. Figure 2 shows that the 2:13 digital rays for the three array sizes, 457 and 457 ± 26, each with the same {αi}, have the same pixel sampling pattern, in proportion to the array width.

Fig. 2. Pooled pixel locations (white dots) for three array sizes, 457 − 26 (m0 = 398), 457 (m0 = 422) and 457 + 26 (m0 = 150), sampled by the digital projection 2:13, each with t = 200. The array shown is a 431 by 100 image subset, with common origin (0, 0). The digital rays for the three array sizes have the same sampling pattern scaled to the width of each array.
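Relation (4) is easy to verify numerically for the 2:13 projection: stepping from p = 457 down to the prime p′ = 431 = 457 − 26 leaves the wrap factor unchanged, with m0 moving from 422 to 398 as in Figure 2. A sketch:

```python
def label_and_wrap(xm, ym, p):
    """m0 and a0 from (1a): m0*ym = a0*p + xm, with m0 = xm/ym (mod p)."""
    m0 = xm * pow(ym, -1, p) % p
    return m0, (m0 * ym - xm) // p

xm, ym, p = 2, 13, 457
m0, a0 = label_and_wrap(xm, ym, p)
m0_new, a0_new = label_and_wrap(xm, ym, p - xm * ym)   # p' = 431, prime
print((m0, a0), (m0_new, a0_new))   # (422, 12) (398, 12): same wrap factor
# Relation (3a): the change in projection label obeys dm0 * ym = a0 * dp.
assert (m0 - m0_new) * ym == a0 * (xm * ym)
```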
4 Coincident Projection Segments
Suppose the translate of one ray of a digital projection at xm : ym in an image of size p is aligned with the translate of one ray at xm : ym in the size p′ image. If p and p′ have the same {αi}, then all of the samples along the aligned rays match exactly within the array of size p, as shown in Figure 2. The perpendicular separation between digital rays is p/dm and the horizontal separation of the rays is p/ym. The digital rays immediately adjacent to the aligned ray for array sizes of p and p′ = p + xm ym will have a perpendicular difference in separation, d, given by d = (p′ − p)/dm, so that

d = xm ym / √(xm² + ym²).   (5)

The integer horizontal displacement of the rays beside the coincident ray is ±xm. These displaced rays form part of a new projection translated by xm away from the ray with the gaps we are trying to fill, so that blending arrays with p′ = p ± xm ym will not fill the gaps. However all of the primes between p and p′ will have a perpendicular offset less than that given by (5) and will have pixels that sample the space between the original ray samples and those of the bounding rays for p ± xm ym. As the array sizes are chosen to be prime, the sample locations for each blended digital ray will be unique. For those parallel digital rays further away from the aligned ray, the separation between rays belonging to p and those belonging to p′ becomes progressively more and more out of alignment (the nth parallel ray from the aligned set has a horizontal spread of ±n xm).

Fig. 3. Pooled digital rays for the projection 2:13, aligned at t = 200, for the nine prime image array sizes 431 < p < 483, shown as a 431 by 100 array subset, with (0,0) as common origin.

Figure 3 shows, as an example, the pattern of pixels sampled by the nine prime array sizes between 431 and 483, for the projection 2:13, each drawn with a common translate, t = 200. The image shown is a 431 by 100 subset of the nine image arrays, with (0,0) as a common origin. Here xm ym = 26, so the array sizes are chosen to lie inside the range 457 ± 26. The samples for the ray through t = 200 match exactly for all these primes. The size of the gap between sample points along the ray direction is dm = √173 ≈ 13.15. The adjacent rays immediately either side of the aligned ray at t = 200 have 9 sample points inside ±d, where here d ≈ 1.98 pixel units (the horizontal separation is xm = 2 pixel units). The nine sample locations are distinct and randomly spread inside the area of size 2 d dm. In this example, the nine pooled projections fill about 1/5 of the area bounded by the gap distance between digital ray pixel samples and the limiting rays set by p ± xm ym. As p′ = p ± xm ym sets limits for those primes with digital rays that would at least partially fill the gap between digital samples for p, it matters how many primes fall between p − xm ym and p + xm ym. Each of those primes is guaranteed to have a different {αi} to that for p. When more primes lie within this range, the gap is filled by more pixels. This results in a better approximation by the pooled digital rays to an integral projection passing through the same image space. As the number of primes lying within ∆p will vary with p, not all possible {αi} values will necessarily occur.
For the projection 2:13, only 3 of the 12 possible distinct {αi } sets do not occur between 431 and 483. Some reconstructed image arrays will pad out gaps in given projections better than others, simply because more primes fall inside the same ∆p interval.
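How many primes fall inside the pooling window (p − xm ym, p + xm ym) can be checked directly; for the 2:13 example around p = 457 the window contains the nine primes shown in Figure 3. A sketch:

```python
def is_prime(n):
    """Trial-division primality test; adequate for array-size ranges."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

p, xy = 457, 2 * 13   # projection 2:13, so xm*ym = 26
window = [q for q in range(p - xy + 1, p + xy) if is_prime(q)]
print(window)   # [433, 439, 443, 449, 457, 461, 463, 467, 479]
assert len(window) == 9 and p in window
```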
5 Reconstruction of Images
The pooling of digital rays as shown in Figure 3 enables the sum of these rays to be more correctly identified with the continuous space rays in real projection
data at the corresponding angle and translation. Figure 1 makes it clear why linear interpolation works so poorly with the DRT method when reconstructing a single image using a small array size. Projections can be pooled in image space rather than in k-space or t-space. If images are reconstructed at a constant scale, but from data sourced over arrays of different prime size, then the same process of staggered ray sampling occurs, but we see the effects in the reconstructed image rather than in the interpolated projections. If the original real data sinogram is first padded with zeros and then reconstructed using the method described in [7], the result is an image at the same physical scale as the unpadded data, but in a larger frame. Adding several reconstructed images that are appropriately padded, cropped and registered emulates the pooling of digital rays in reducing the effects of the ray gaps. The method to achieve the appropriate image scaling is described next. A sinogram comprised of N rays at M angles is first reconstructed to an image of size p by p. The same sinogram, padded symmetrically left and right by N(p′ − p)/(2p) zeros (where p′ > p), can be used to reconstruct an image of size p′ by p′ but will retain the image data at the same physical scale as for the p by p image. Averaging these two images after registration (shifting the origin by (p′ − p)/2 in x and y) will be equivalent to blending the digital rays as sampled over p and p′. To fill as much of the gap in the digital rays as possible, the reconstructed images from all primes between p − xm ym and p + xm ym would be averaged. This process should be applied individually for each projection xm : ym. To avoid cycling through all p projections, we choose p′ = p + xM yM, where xM : yM is the largest product for the projections reconstructing at size p. This ensures the largest gap lengths are pooled sufficiently (but will also "over"-average the smaller gap lengths).
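The constant-scale padding rule can be sketched as follows; the per-side pad count N(p′ − p)/(2p), rounded to a whole bin as the text describes, is our reading of the scaling argument and should be treated as an assumption:

```python
def constant_scale_pads(N, p, p_prime):
    """Per-side zero padding (in whole sinogram bins) so that an N-ray
    sinogram reconstructed at prime size p_prime keeps the physical scale of
    the size-p reconstruction; the origin registration shift is
    (p_prime - p)/2.  The pad formula N*(p_prime - p)/(2*p) is an assumption
    consistent with the scaling argument in the text."""
    return round(N * (p_prime - p) / (2 * p)), (p_prime - p) / 2

# 511 rays, base prime 601, largest prime pooled in Figure 4b.
pad, shift = constant_scale_pads(511, 601, 653)
print(pad, shift)   # 22 26.0
```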
Figure 4a shows a 601 by 601 image, reconstructed using the prime-based DRT, from an x-ray transmission sinogram of 511 entries at 180 uniformly spaced angles. For p = 601, xm : ym ranges from 1:24 to 17:18 so that xM yM = 306. Figure 4b shows the corresponding reconstructed image result for the average of 10 images reconstructed using 10 prime array sizes from 601 to 653. Figure 4c shows the average of 40 reconstructed images using the 40 prime array sizes from 601 to 863. Figure 4d shows the same image reconstructed once but for a large array size (p = 4091). The projections were subsampled by 8:1 to produce a 512 by 512 final result. Figure 4d is very similar to the result obtained using standard back-projection methods [7]. The reconstruction artefacts evident in Figure 4a arise from mismatching digital and analog projection rays and show the effect of the digital ray gaps. These artefacts are reduced in Figures 4b and 4c as the pooling of digital projections produces on average a better interpolation result for each projection. The artefacts produced at each different prime image size are effectively random and cancel in the summed result. The quality of the images in Figures 4b – c is still well below that obtained by more conventional reconstruction, such as Figure 4d. The errors arise not only
Fig. 4. a) Prime DRT image reconstructed for p = 601, from 511 x-ray projections at each of 180 equally spaced angles. b) Average of 10 constant scale images reconstructed using the pooled DRT projection method, for primes from 601 to 653. c) Average of 40 constant scale images reconstructed using the pooled DRT projection method, for primes from 601 to 863. d) DRT reconstruction for a single large final image format (p = 4091), sub-sampled by 1:8 to produce a 512 by 512 image.
because the digital ray gaps are only partially filled, but also from the limitations of the image scaling method used to pool the digital rays. Pooling projections in the spatial or image domain requires that the image scaling and registration be as precise as possible for each prime array size to avoid blurring image details. The padding of the sinogram data was done to the nearest integral number of bins to avoid the effects of redistributing the original x-ray projection data. The pooling of digital rays should also be optimised for each xm : ym projection value rather than over-averaging the rays with small gaps. This produces no
additional benefit for the small gap projections but makes them very sensitive to the scale and registration problems outlined above. There was also no efficiency gain in applying the pooled image approach, as the combined time to scale and compute multiple reconstructed images, such as Figure 4c, was greater than the time required to produce a single, higher quality result from a large prime reconstructed image (such as that in Figure 4d). See [7] for relative reconstruction times as a function of image size.
6 Conclusions
We have shown that digital rays from digital projections that are sampled over different prime array sizes can be pooled to better approximate integral projection rays. A distinct sampling pattern for the digital rays at each digital projection angle was associated with the uniqueness of each set of array wrap factors, {αi}. The limit on the range of array sizes that can be usefully pooled was established and estimates were given for the degree of gap filling by the staggered ray samples. The pooling of digital rays through the indirect method of spatial averaging of scaled images gave some improvement in image reconstruction quality for the prime-based DRT method. However this was only enough to make this approach interesting, rather than providing a practical alternative to the "large image" DRT approximation to reduce the ray gap problem. Solving the inverse problem of direct distribution of the content of an integral projection ray amongst the component digital rays in k-space for any prime p is considered further in [8]. Nevertheless, the relative improvement in image quality seen here offers a proof of principle that the approach of pooling digital rays does work.
Acknowledgments IS and AK acknowledge support for this work from the Centre for X-Ray Physics and Imaging within the School of Physics and Material Engineering at Monash University. AK is a PhD student supported by an Australian Postgraduate Award provided by the Australian Government.
References
1. Beylkin, G., Discrete Radon Transform, IEEE Trans. on Acoustics, Speech and Signal Processing, vol. ASSP-35, no. 2, pp. 162-172, 1987.
2. Svalbe, I., Image Operations in Discrete Radon Space, DICTA 2002, Melbourne, Australia, Jan. 21-22, 2002, pp. 285-290.
3. Matus, F., and Flusser, J., Image Representation via a Finite Radon Transform, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, no. 10, pp. 996-1006, 1993.
4. Svalbe, I., Digital Projections in Prime and Composite Arrays, IWCIA, Philadelphia, August 2001, also see Electronic Notes in Theoretical Computer Science, www.elsevier.nl/locate/entcs/volume46.free.
5. Svalbe, I., Sampling Properties of the Discrete Radon Transform, accepted for publication, Discrete Applied Mathematics, 2003. 6. Salzberg, P., and Figueroa, R., Chapter 19, “Tomography on the 3D-Torus and Crystals”, in Discrete Tomography: Foundations, Algorithms and Applications, Eds. G. T. Herman and A. Kuba, Birkhauser, Boston, 1999. 7. Svalbe, I. and van der Spek, D., Reconstruction of Tomographic Images Using Analog Projections and the Digital Radon Transform, Linear Algebra and its Applications, 339 (2001) 125-145. 8. Kingston, A., k-space Representation of the Discrete Radon Transform, PhD. Thesis, School of Physics and Materials Engineering, Monash University (in preparation) 2003. 9. Svalbe I. and Kingston, A., Farey Sequences and Discrete Radon Transform Projection Angles, IWCIA’03, May 14-16, Palermo, Italy, 2003.
Power Law Dependencies to Detect Regions of Interest

Yves Caron, Harold Charpentier, Pascal Makris, and Nicole Vincent

Laboratoire d'Informatique, Université François Rabelais, 64 avenue Jean Portalis, 37200 Tours, France
[email protected], {vincent,makris}@univ-tours.fr
Abstract. This paper presents a novel approach to detecting regions of interest in digital photographic grayscale images using power laws. The method is intended to find regions of interest in various types of unknown images. Either Zipf law or inverse Zipf law is used to achieve this detection. The detection method consists in dividing the image into several sub-images, computing the frequency of occurrence of each different image pattern, representing this distribution by a power law model and classifying the sub-frames according to the power law characteristics. Both power law models allow region of interest detection; however, inverse Zipf law performs better than Zipf law. The detection results are generally consistent with the human perception of regions of interest.

Keywords: Segmentation, region detection, region of interest, compression, coding.
1 Introduction The detection of regions of interest in images is a difficult problem, which has many applications, such as object detection and recognition, image indexation and compression optimization. Our aim is to design a method which is generic enough to be able to detect regions of interest in various types of photographic static images without requiring previous knowledge of the scene or the objects to be detected. A possible application is to implement this function for automatic region of interest detection in a JPEG 2000 encoder. Unsupervised region of interest detection in unknown images is a difficult task since we cannot use color or shape information to find the objects of interest in the image. A possible approach is to use semi-local texture characteristics to detect parts of the image which have distinctive features from the surrounding background. Some existing works in this domain include Beaver et al. [1] who use a measure of fractal dimension to detect man-made objects in aerial images, Kadir and Brady [2] who use an entropy measure to detect salient regions in images, and Wang et al. [3] who use the statistical repartition of wavelet coefficient to extract foreground objects in photographic images. This paper presents a novel approach to detect region of interest by analyzing statistical distribution of image patterns using power laws. Power laws models such as Zipf law have been used in many domains, such as linguistics by Zipf [4], Miller and Newman [5] and Cohen et al.[6] by Hill [7] and Makse and al.[8] in urban population studies, by Mantegna and al. [9] in the sequencing of the human genome, or by Breslau et al. [10] and Huberman [11] in Internet I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 495–503, 2003. © Springer-Verlag Berlin Heidelberg 2003
496
Yves Caron et al.
traffic analysis. It has also been used in the domain of image analysis by Vincent et al. [12] for compression quality evaluation and by Caron et al. [13] for detecting artificial objects in natural environments. The problem of region-of-interest detection can be considered a generalization of the object detection problem. In this paper we first present the power-law models used, namely Zipf law and inverse Zipf law, then show how they can be used for image analysis. The detection method is then detailed and experimental results are presented.
2 Power Law Models

2.1 Zipf Law

This law was determined empirically in 1949 by G.K. Zipf [4]. It states that in a topologically organized set of symbols, the distribution of the frequency of appearance of the different symbol patterns, such as the words in a text, follows a power-law distribution. If the n-tuples of symbols are sorted in decreasing order of frequency, the frequency of appearance N_σ(i) of the n-tuple of rank i in the sequence is given by formula (1):
N_σ(i) = k · i^(−α)    (1)
In this formula, k and α are constants, and the value of α characterizes the power law. In the distribution of words in English texts studied by Zipf, the value of α is close to 1. This law can be represented graphically in a bi-logarithmic scale diagram called a Zipf plot. In this plot, the least-square regression slope of the distribution gives an estimate of the power-law exponent. Power-law distributions have been found in all natural languages as well as in different domains such as the distribution of city populations, Internet traffic, or the repartition of DNA sequences in the human genome. The interest of the Zipf law model mainly lies in pattern coding.

2.2 Inverse Zipf Law

G.K. Zipf [14] also defined inverse Zipf law. It also deals with the statistical repartition of pattern frequencies but, unlike the previously described Zipf law, which emphasizes the most frequent patterns in the sequence, inverse Zipf law concerns the least frequent patterns. According to inverse Zipf law, the number of words I(f) which have the occurrence frequency f is given by formula (2):

I(f) = a · f^(−b)    (2)
In this formula, a and b are constants; the value of b estimated by Zipf in his work on English texts is close to 2. This formulation has notably been used by Cohen et al. [6] for linguistic analysis. All these analysis tools, designed primarily for text analysis, can be adapted to other domains, in our case image analysis.
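As a sketch of how such a fit can be obtained, the following Python example estimates a and b from a list of occurrence counts by least squares in log-log space. The use of np.polyfit and the synthetic counts are our illustration, not the paper's implementation.

```python
import numpy as np

def inverse_zipf_fit(frequencies):
    """Fit I(f) = a * f**(-b) by least squares in log-log space.

    `frequencies` lists the occurrence count of each distinct pattern
    (or word); I(f) counts how many distinct patterns occur exactly
    f times.
    """
    freqs, counts = np.unique(frequencies, return_counts=True)  # f, I(f)
    log_f, log_i = np.log(freqs), np.log(counts)
    slope, log_a = np.polyfit(log_f, log_i, 1)  # slope is -b
    return np.exp(log_a), -slope                # a, b

# Synthetic counts roughly following I(f) = 100 * f**(-2):
# 100 patterns seen once, 25 seen twice, 11 seen three times.
freqs = [1] * 100 + [2] * 25 + [3] * 11
a, b = inverse_zipf_fit(freqs)  # b comes out close to 2
```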
Power Law Dependencies to Detect Regions of Interest
497
3 Application to Image Analysis

3.1 Image Pattern Coding

An image is a discrete representation in a two-dimensional space. In order to use these models, designed for text analysis, in the domain of image analysis, we must first define an equivalent of the notion of word for an image. We work on image patterns, each pattern being defined as a 3x3 mask of adjacent pixels. Then we must define a pattern coding. If the grayscale levels were used directly, the frequency of appearance of each particular pattern would be very low, due to the great number of possible patterns, and the distribution of pattern frequencies would not be statistically significant. So we must define a coding that reduces the number of possible patterns. A possibility is to divide the grayscale into a relatively small number of classes and to assign to each pixel the value of its class according to its luminance value. An example of pattern coding with this method is given in Fig. 1. The class c(x, y) of a pixel with grayscale g(x, y) is given by formula (3), where N is the number of classes.
c(x, y) = int( N · g(x, y) / 255 )    (3)
This coding significantly decreases the maximal number of patterns, from 256^9 ≈ 4.7×10^21 to 9^9 ≈ 3.87×10^8, and maintains consistency with human visual perception. With only 9 gray levels, the image structure is preserved and the main features are still clearly visible, as shown in Fig. 2.
Original pattern        Coded in 9 classes
255 210 210             8 7 7
 25   2  34             0 0 1
 40   2  40             1 0 1
Fig. 1. Original pattern (left) and pattern coded in 9 classes (right)
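As an illustration, the coding of formula (3) can be sketched in a few lines of Python. The clamp to class N−1 (so that g = 255 falls in the top class, as in Fig. 1) is our reading of the figure, not stated explicitly in the text.

```python
import numpy as np

def quantize(img, n_classes=9):
    """Map 8-bit grayscale values to n_classes classes, formula (3):
    c(x, y) = int(N * g(x, y) / 255), clamped so that g = 255 falls
    in the top class (as in Fig. 1, where 255 maps to class 8)."""
    img = np.asarray(img, dtype=np.float64)
    classes = (n_classes * img / 255.0).astype(int)
    return np.minimum(classes, n_classes - 1)

# The 3x3 pattern of Fig. 1:
pattern = np.array([[255, 210, 210],
                    [ 25,   2,  34],
                    [ 40,   2,  40]])
print(quantize(pattern))
# [[8 7 7]
#  [0 0 1]
#  [1 0 1]]
```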
Fig. 2. Original image (left) and image coded with 9 gray levels (right)
3.2 Application of Power Law Models
Both Zipf law and inverse Zipf law can be used to analyze an image. Analyzing an image with Zipf law consists in scanning the image with a 3x3 mask, coding the patterns, counting the frequency of appearance of each distinct pattern, and sorting the patterns in decreasing order of frequency. The distribution of pattern frequencies with respect to their ranks can then be plotted in a bi-logarithmic scale diagram to visualize the actual law. Fig. 3 presents an image and the associated Zipf plot. For an image coded by partitioning the grayscale into 9 classes, it can be noticed that the actual behavior of the data would be better modeled using two different power laws. In fact, two inner structures are brought to the fore: one concerns the layout of regions and the other concerns the contours. The left part of the curve generally contains the most frequent patterns, in which all the pixels belong to the same grayscale class; those patterns represent the homogeneous zones of the image. The right part is made of patterns whose pixels belong to different classes; they represent contours and non-homogeneous zones and, in photographic images, they are considerably less frequent. As a consequence, the Zipf plot can be modeled by the least-square regression lines of the two parts of the curve. The inverse Zipf plot can also be used for analyzing an image. The patterns are coded in the same way as for the Zipf plot, the number of distinct patterns having each frequency of appearance is counted, and the number of patterns with respect to their frequency is plotted in a bi-logarithmic scale diagram, as shown in Fig. 4. We can notice that the left part, which represents the least frequent patterns, is linear. This result is in accordance with the results obtained by G.K. Zipf on texts: the distribution of pattern frequencies in an image follows a power law.
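The Zipf-plot computation described above can be sketched as follows. The split point between the two regression segments is a free parameter in this sketch (the paper distinguishes patterns whose pixels all share one class from the others), so this is an illustrative approximation.

```python
import numpy as np
from collections import Counter

def zipf_slopes(coded, split=None):
    """Rank-frequency (Zipf) analysis of the 3x3 patterns of a coded
    image. Returns the least-square slopes of the left (frequent,
    homogeneous) and right (rare, contour) parts of the log-log
    rank/frequency plot. The split heuristic below is our assumption."""
    h, w = coded.shape
    counts = Counter(
        tuple(coded[i:i + 3, j:j + 3].ravel())
        for i in range(h - 2) for j in range(w - 2)
    )
    freqs = np.array(sorted(counts.values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(freqs) + 1, dtype=float)
    if split is None:
        split = max(2, min(len(freqs) - 2, len(freqs) // 10))
    left = np.polyfit(np.log(ranks[:split]), np.log(freqs[:split]), 1)[0]
    right = np.polyfit(np.log(ranks[split:]), np.log(freqs[split:]), 1)[0]
    return left, right
```

Both slopes are non-positive, since the frequencies are sorted in decreasing order of rank.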
Fig. 3. Zipf plot (right) associated to the F-16 image (left) coded by partitioning the grayscale in 9 classes
Fig. 4. Inverse Zipf plot for the F-16 image
4 Region of Interest Detection

4.1 Principle of the Method

The objective is to detect regions of interest in images using power laws. The notion of region of interest is largely subjective; however, some general considerations can be made to characterize it. A region of interest can be defined as a region that has features distinct from the rest of the image. It may be the foreground of the image, a region that appears less homogeneous than the background, or a particular object in the image. In any case, it is a region that differs from the rest of the image. The use of power-law models allows us to find a region of the image that has a different distribution of texture pattern frequencies. The detection method consists in dividing the image into sub-images, computing the Zipf or inverse Zipf distribution of each sub-image, and classifying the sub-images according to the characteristics of this distribution. The size of the sub-images must be chosen properly: they must be large enough to have a statistically significant pattern distribution, but small enough to allow a precise determination of the region of interest. The optimal size of the sub-images was determined experimentally; the best results were obtained for sub-images containing about 5000 pixels and having the same aspect ratio as the initial image. Thus, the number of sub-images depends on the image size.

4.2 Use of Zipf Law

As seen in Section 3, the Zipf plot associated with an image can be represented by the regression lines of its two parts. The classification method is based on the Zipf exponents, that is to say on the regression slopes of the two parts of the plot. The Zipf plot associated with each sub-image of the image is then represented by a dot in the plane.
In this representation, the horizontal coordinate is the slope of the left part of the Zipf plot, corresponding to the homogeneous patterns, and the vertical coordinate is the slope of the right part, corresponding to the non-homogeneous patterns. The points of this graph can be classified in two clusters according to their position with respect to the line of equation y = x. In most images, the points representing foreground objects tend to be situated just below the y = x line, and more precisely in the left part of the cluster, at the left of a vertical line close to the cluster centroid. The equation of this vertical line was determined empirically to be:

x = Gx / 1.2    (4)
In this formula, Gx is the horizontal coordinate of the cluster center of gravity. The sub-images belonging to the region of interest are situated between the two lines, as indicated in Fig. 4. However, not all the points situated between the two lines represent sub-images belonging to the main region of interest. To deal with this problem, in an automatic detection of a region of interest only the largest connected component of the sub-images corresponding to these points is kept.

4.3 Use of Inverse Zipf Law

A similar classification method can also be used to detect regions of interest with inverse Zipf law. The parameters of this classification are the inverse Zipf exponent
that is estimated by the slope of the inverse Zipf plot, denoted b in Equation (2), and the number of distinct patterns, denoted a in the same equation. The value of a depends on the image size, so this parameter is only significant if all the sub-images have the same size. The inverse Zipf plot associated with each sub-image is represented by a dot in a graph with log(b) as the horizontal coordinate and log(a) as the vertical coordinate, as seen in Fig. 5.
Fig. 4. Classification diagram for Zipf law (left) and corresponding region of interest (right)
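A hedged sketch of the Zipf classification rule of Eq. (4) follows; taking the cluster's centre of gravity as the mean of the points is our assumption, since the paper does not spell out how it is computed.

```python
import numpy as np

def classify_zipf_points(points):
    """Select sub-images whose (left-slope, right-slope) dot lies
    below the y = x line and left of the vertical line x = Gx / 1.2
    (Eq. 4), Gx being the x coordinate of the cluster centroid
    (here, the mean of the points: our assumption)."""
    pts = np.asarray(points, dtype=float)
    gx = pts[:, 0].mean()
    below = pts[:, 1] < pts[:, 0]   # below the y = x line
    left = pts[:, 0] < gx / 1.2     # left of the vertical line (4)
    return below & left

pts = [(-2.0, -2.5), (-1.0, -0.5), (-0.5, -0.6)]
mask = classify_zipf_points(pts)
# mask -> [True, False, False]: only the first sub-image is selected
```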
Fig. 5. Classification diagram for inverse Zipf law (left) and corresponding region of interest on the image(right)
In general, regions of interest tend to have more details than other parts of the image. Consequently, the sub-images associated with these regions have more distinct patterns than the other sub-images, because they comprise more specific details, and they are situated at the top of the cluster. This part of the cluster may also contain sub-images that do not belong to the region of interest; in that case, only the largest connected component is kept, as with Zipf law. The region of interest may contain holes; these are filled by including the sub-images whose neighboring sub-images all belong to the region of interest. To ensure a better determination of the region of interest, the separation line between the two classes of the cluster can be determined dynamically, in such a way that the surface of the region of interest is always between 20% and 50% of the total image surface. At first, the direction of the separation line is chosen with respect to the inertia of the set of points, and it intersects the center of gravity of the cluster. If the area of the region of interest is more than 50% of the image area, after extracting the
largest connected component and filling the holes, the separation line is raised iteratively in steps of 0.01 and the region of interest is recomputed until its area is 50% or less of the image area. Conversely, if the region of interest covers less than 20% of the total image area, the separation line is lowered and the region of interest recomputed until it covers at least 20% of the image area. In some images, especially images of artificial objects in natural backgrounds, the region of interest is more uniform than the background, and it is represented by points situated in the bottom part of the set of points. In that case, the region of interest is constituted by the sub-images represented by the points below the separation line. Thus we must distinguish between the two types of images: the image is considered as having a region of interest more uniform than the background if the average proportion of distinct patterns is higher than 50% in the whole image. This distinction has been confirmed by experimental results. This method allows the detection of regions of interest in various types of images, with results generally consistent with human interpretation.
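The post-processing steps above (keeping the largest connected component of selected sub-images, then filling holes) can be sketched as follows; 4-connectivity and the simple flood fill are our choices, since the paper does not specify them.

```python
import numpy as np

def largest_component(mask):
    """Keep only the largest 4-connected component of a boolean grid
    of selected sub-images (a small flood fill; scipy.ndimage.label
    would do the same job)."""
    mask = np.asarray(mask, dtype=bool)
    best, seen = set(), set()
    for start in zip(*np.nonzero(mask)):
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            i, j = stack.pop()
            if (i, j) in comp:
                continue
            comp.add((i, j))
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if (0 <= ni < mask.shape[0] and 0 <= nj < mask.shape[1]
                        and mask[ni, nj] and (ni, nj) not in comp):
                    stack.append((ni, nj))
        seen |= comp
        if len(comp) > len(best):
            best = comp
    out = np.zeros_like(mask)
    for i, j in best:
        out[i, j] = True
    return out

def fill_holes(mask):
    """Add sub-images whose four neighbours all belong to the region."""
    m = np.asarray(mask, dtype=bool).copy()
    inner = m[2:, 1:-1] & m[:-2, 1:-1] & m[1:-1, 2:] & m[1:-1, :-2]
    m[1:-1, 1:-1] |= inner
    return m
```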
5 Experimental Results

The two detection methods, using Zipf law and inverse Zipf law, have been compared with each other to determine which of them detects the "best" region of interest, i.e., which one is the more consistent with human interpretation. An example of this comparison is shown in Fig. 6. In this image, the method using inverse Zipf law gives better results than the method using Zipf law: with Zipf law, some parts of the object are classified as background and a significant part of the background is included in the region of interest. In most of the tested images, inverse Zipf law gives better results than Zipf law for detecting objects of interest. The method has been tested on a set of 100 images of various subjects, each featuring a region of interest outlined by a human observer; the region of interest was correctly detected in 56% of the images with Zipf law and in 80% of the images with inverse Zipf law.
Fig. 6. Detection results with Zipf law (left) and inverse Zipf law (right)
It is also interesting to study the influence of the number of sub-images on the detection results. The image in Fig. 7 has been segmented into 8x8, 19x19, and 32x32 sub-images, and the inverse Zipf detection method has been applied. When segmented into
8x8 sub-frames, the detection fails because the image is misclassified as having an object of interest more uniform than the background; in this case, it is the ground in front of the object that is detected. The segmentation into 19x19 sub-frames correctly detects the object of interest, while with the segmentation into 32x32 sub-frames, some uniform parts of the object are classified as background and textured background regions are classified as regions of interest. In most images, the best detection results are obtained when the surface of the sub-frames is around 5000 pixels, which corresponds to the 19x19 segmentation.
Fig. 7. Regions of interest detected with inverse Zipf law for a segmentation in 8x8 (left), 19x19 (center) and 32x32 (right) sub-images.
6 Conclusion

The use of a method based on power laws allows the detection of regions of interest in an image without prior knowledge of the image or of the nature of the region to be detected. Both Zipf law and inverse Zipf law can be used for region-of-interest detection. The classification of the sub-frames of the image as a function of the characteristics of their power-law representation can find a region of interest that is consistent with the human interpretation. Inverse Zipf law performs better than Zipf law for region-of-interest detection: it finds more precise regions of interest, and it is able to detect regions of interest both when the region to be detected is more uniform than the image background and when it is less uniform. It is possible to find an optimal size for the sub-frames. The criteria used here to classify the sub-frames are not the only possible ones; different criteria, such as pattern frequency entropy, could be used. Different pattern codings could also be used to improve the performance of the method.
References

1. Beaver, P., Quirk, S.M., Sattler, J.P.: Object Identification in Greyscale Imagery using Fractal Dimension. In: Novak, M. (ed.): Fractal Reviews in the Natural and Applied Sciences. Chapman & Hall, London (1995) 63-73
2. Kadir, T., Brady, M.: Scale, Saliency and Image Description. International Journal of Computer Vision, Vol. 45, No. 2 (2001) 83-105
3. Wang, J.Z., Li, J., Gray, R.M., Wiederhold, G.: Unsupervised Multiresolution Segmentation for Images with Low Depth of Field. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 23, No. 1 (2001) 85-90
4. Zipf, G.K.: Human Behavior and the Principle of Least Effort. Addison-Wesley, New York (1949)
5. Miller, G.A., Newman, E.B.: Test of a Statistical Explanation of the Rank-Frequency Relation for Words in Written English. American Journal of Psychology, 71 (1958) 209-218
6. Cohen, A., Mantegna, R.N., Havlin, S.: Numerical analysis of word frequencies in artificial and natural language texts. Fractals, Vol. 5, No. 1 (1997) 95-104
7. Hill, B.M.: Zipf's law and prior distributions for the composition of a population. Journal of the American Statistical Association, 65 (1970) 1220-1232
8. Makse, H.A., Havlin, S., Stanley, H.E.: Modelling urban growth patterns. Nature, 377 (1995) 608-612
9. Mantegna, R.N., Buldyrev, S.V., Goldberger, A.L., Havlin, S., Peng, C.K., Simons, M., Stanley, H.E.: Linguistic Features of Noncoding DNA Sequences. Phys. Rev. Lett., 73 (1994) 3169
10. Breslau, L., Cao, P., Fan, L., Phillips, G., Shenker, S.: Web caching and Zipf-like distributions: Evidence and implications. In: Proceedings of IEEE Infocom 99, New York (1999) 126-134
11. Huberman, B.A., Pirolli, P., Pitkow, J., Lukose, R.: Strong Regularities in World Wide Web Surfing. Science, 280 (1998) 95-97
12. Vincent, N., Makris, P., Brodier, J.: Compressed Image Quality and Zipf's Law. In: Proceedings of the International Conference on Signal Processing (ICSP-IFIC-IAPR-WCC2000), Beijing, China (2002) 1077-1084
13. Caron, Y., Makris, P., Vincent, N.: A method for detecting artificial objects in natural environments. In: Proceedings of the International Conference on Pattern Recognition (ICPR-IAPR), Québec, Canada (2002) 600-603
14. Zipf, G.K.: The Psychology of Language, an Introduction to Dynamic Philology. M.I.T. Press, Cambridge, Massachusetts (1965)
Speed Up of Shape from Shading Using Graduated Non-convexity Daniele Gelli and Domenico Vitulano Istituto per le Applicazioni del Calcolo IAC-C.N.R. Viale del Policlinico 137, 00161 Roma, Italy {Gelli,Vitulano}@iac.rm.cnr.it
Abstract. This paper proposes a way to speed up Shape From Shading (SFS) approaches based on energy minimization. To this end, the Graduated Non-Convexity (GNC) algorithm is adopted to minimize the strongly non-convex SFS energy. The achieved results are very promising, both theoretically and practically: our approach provides a generalization of the original GNC formulation as well as an effective discrete shape recovery. Finally, a drastic reduction of computation time is reached in comparison with other currently available approaches.
1 Introduction
Shape From Shading (SFS) is a classical problem of Computer Vision; it consists of recovering the 3-D shape of an object from its image [1,2]. In the last few years a lot of research has been devoted to it. The approaches in the literature can be coarsely classified into four groups: minimization, propagation, local, and linear. Minimization-based approaches seek the solution by minimizing an energy composed of several terms, each corresponding to a constraint on the solution itself [3,4]. In the second class, introduced by Horn [3] and later developed by Bruckstein [5], shape information is propagated along strips in the direction of the intensity gradient; these strips are lines in the image where both surface depth and orientation can be computed, provided they are known at the starting point. Newer approaches are based on the Hamilton-Jacobi equation and viscosity solutions, such as [6] and the one recently proposed in [7], where the Eikonal equation is solved using the Fast Marching Method. Local approaches exploit some a priori assumptions on the shape to be recovered (e.g., locally spherical, as in [8]). Finally, approaches belonging to the last class are based on a linearization of the SFS scheme to obtain the surface depth; two examples are [1,9]. Minimization approaches seem to achieve the best results among the aforementioned ones [2]. Their only drawback is the huge computing time needed to reach the solution, which makes them unsuitable for real-time applications. This problem stems from the fact that the involved energies are usually non-convex functions, so their (global) minimum is hard to find.

I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 504–513, 2003. © Springer-Verlag Berlin Heidelberg 2003
This paper focuses on a drastic reduction of the SFS computing time using the Graduated Non-Convexity (GNC) approach [10]. The latter is a clever method for minimizing discrete non-convex energies. It consists of producing a sequence of energy functions by a gradual deconvexification of the original one. This allows us to reach a minimum close to the absolute one in a moderate computing time [10]. If on one hand GNC is a valid alternative to very expensive algorithms like Simulated Annealing [10], on the other hand its main drawback consists of "adapting" this strategy to the energy to be minimized. We stress that the contribution of this paper is not a trivial application of GNC to the SFS problem: starting from this kind of energy, we have generalized the original formulation of GNC, as will become clearer in the following. The outline of the paper is as follows. The next section presents some basic concepts about the SFS problem useful for understanding the rest of the paper. Section 3 first gives a general presentation of GNC and then applies it to the energy under study. Experimental results are shown and discussed in Section 4.
2 Shape from Shading
The aim of Shape from Shading is to recover the 3-D shape of an object from the 2-D intensity data of an image. The simplest model for image formation is the Lambertian one [2,3]. The grey-level intensity at a given pixel of the image under study can be considered proportional to both the light source intensity L = (l_x, l_y, l_z) and the surface normal N. Strictly speaking, the Reflectance Map in the Lambertian model is:

R(p, q) = max{0, ρ N · L} = max{0, ρ (l_z − p l_x − q l_y) / √(1 + p² + q²)} = max{0, ρ · cos(θ)}    (1)

where (p, q) = (∂Z/∂x, ∂Z/∂y) is the surface gradient, N = (−p, −q, 1)/√(1 + p² + q²) the surface normal, ρ (albedo) is a positive constant including factors such as strength of illumination and surface reflectivity, and θ is the angle between N and L. As a matter of fact, alternative image formation models have been proposed in the literature [11]. They try to overcome some intrinsic drawbacks of the Lambertian model; nonetheless, the simplicity of the latter makes it more attractive and effective in many cases. Uniformity of the albedo and an infinite distance between a single light source and the object in the scene are usually assumed [11]. The SFS problem is ill-posed [12], since it consists of recovering the surface depths Z(x, y) satisfying the image irradiance equation E(x, y) = R(p, q), with E(x, y) the image intensity at position (x, y). A smoothness constraint therefore has to be added, yielding the following energy:

E(p, q, Z) = Σ_{i,j=1}^{N} [E_{i,j} − R(p_{i,j}, q_{i,j})]² + λ Σ_{i,j=1}^{N−1} [Φ(|∇p_{i,j}|) + Φ(|∇q_{i,j}|)]    (2)

(the first sum is the data-closeness term E1, the second the regularization term E2)
506
Daniele Gelli and Domenico Vitulano
where E_{i,j}, i, j ∈ {1, ..., N}, is the input image, R(p_{i,j}, q_{i,j}) the Lambertian reflectance map, and |∇p_{i,j}| = √[(p_{i+1,j} − p_{i,j})² + (p_{i,j+1} − p_{i,j})²] the gradient magnitude of p (similarly for q). We selected the well-known regularization function Φ(k) = k² / (1 + δk²), with δ a positive constant [2,3,10,13,14], which performs a selective regularization preserving the salient parts of the shape to be recovered [15,16]. λ is a positive constant balancing E1 and E2 in (2).
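Under the definitions above, the energy (2) can be sketched numerically as follows; the forward differences for ∇p, ∇q and the cropping of the difference arrays are our discretization choices.

```python
import numpy as np

def reflectance(p, q, light, albedo=1.0):
    """Lambertian reflectance map R(p, q), Eq. (1)."""
    lx, ly, lz = light
    r = albedo * (lz - p * lx - q * ly) / np.sqrt(1.0 + p**2 + q**2)
    return np.maximum(0.0, r)

def phi(k2, delta):
    """Regularization Phi(k) = k^2 / (1 + delta * k^2),
    applied here to squared gradient norms k2 = k**2."""
    return k2 / (1.0 + delta * k2)

def sfs_energy(E, p, q, light, lam, delta):
    """Discrete energy of Eq. (2): data-closeness + smoothness."""
    data = np.sum((E - reflectance(p, q, light))**2)
    # |grad p|^2 and |grad q|^2 by forward differences
    gp2 = np.diff(p, axis=0)[:, :-1]**2 + np.diff(p, axis=1)[:-1, :]**2
    gq2 = np.diff(q, axis=0)[:, :-1]**2 + np.diff(q, axis=1)[:-1, :]**2
    reg = np.sum(phi(gp2, delta) + phi(gq2, delta))
    return data + lam * reg
```

For a flat surface (p = q = 0) lit from (0, 0, 1) with unit albedo, an image that is identically 1 gives zero energy, as expected.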
3 GNC for Shape from Shading
In the previous sections we introduced GNC as an effective way to minimize discrete non-convex energies like the SFS one. Its effectiveness stems from its ability to "avoid the pitfall of sticking in local minima" (p. 46 of [10]) of a given energy F. This is achieved by building a finite sequence of energies F^(p) with 1 ≥ p ≥ 0. The first element, F^(1), is built as a convex approximation of the original energy and thus has a unique (global) minimum. The last one, F^(0), is the original non-convex energy. The intermediate elements of the sequence correspond to gradually deconvexified versions of F^(0). The main idea of the minimization step is very simple: the minimum of F^(1) is the starting point for the minimization of the second element of the sequence, and so on. Unfortunately, although the minimization itself is both simple and fast, the construction of the sequence is often not trivial; the rest of this section focuses on it. The next subsection deals with the deconvexification of the data-closeness term of (2). This represents a generalization of the classical GNC proposed in [10]; in fact, this term has always been convex in all energies minimized so far by this algorithm, which raises some non-trivial problems. The second subsection focuses on the regularization term. Again, our proposal is more general than that in [10]: it provides a direct solution of the second-order energy usually called thin plate. In order to simplify the computation, in the following it is convenient to consider the functional in (2) in terms of the variable z only. Moreover, N × N matrices z_{i,j} will be considered as N² × 1 arrays z_k, with k = (i − 1)N + j, to make the formulation tractable.

3.1 Data-Closeness Term
The Reflectance Map is:

R_k = (l_z − l_x p_k − l_y q_k) / √(1 + p_k² + q_k²) = [l_z − l_x (z_{k+N} − z_k) − l_y (z_{k+1} − z_k)] / √[1 + (z_{k+N} − z_k)² + (z_{k+1} − z_k)²]

where p_k = (∂z/∂x)_k = z_{k+N} − z_k and q_k = (∂z/∂y)_k = z_{k+1} − z_k. In order to study the convexity of the first term of (2), we have to analyze when its Hessian

∂²E1/∂z_i∂z_j = 2 · Σ_k [ (∂R_k/∂z_i)(∂R_k/∂z_j) − (E_k − R_k) ∂²R_k/∂z_i∂z_j ]    (3)
is positive definite. We introduce a positive constant γ ∈ [0, 1] to build a sequence of reflectance maps whose first term is convex:

R̃_k = { l_z − l_x [γ z_{k+N} − z_k] − l_y [γ z_{k+1} − z_k] } / √{ 1 + γ [(z_{k+N} − z_k)² + (z_{k+1} − z_k)²] }    (4)

In fact, for γ = 0 the function (4) becomes R̃_k = l_z + z_k (l_x + l_y), and then

∂²E1/∂z_i∂z_j = 2 (l_x + l_y)² ≥ 0.

On the contrary, for γ → 1 we again obtain the original Lambertian map. Since ∂R_k/∂z_i ≠ 0 only for k ∈ {i, i − 1, i − N}, it follows that

∂/∂z_i Σ_k (E_k − R_k)² = −2 · [ (E_i − R_i) ∂R_i/∂z_i + (E_{i−1} − R_{i−1}) ∂R_{i−1}/∂z_i + (E_{i−N} − R_{i−N}) ∂R_{i−N}/∂z_i ],

and, deriving with respect to the variable z_j, it is then easy to show that

∂²/∂z_i∂z_j Σ_k (E_k − R_k)² ≠ 0

only for the indices j ∈ {i, i − 1, i + 1, i − N, i + N, i − (N − 1), i + (N − 1)}.

3.2 Regularizing Term
In this section we derive a convex approximation of the regularization term:

E2 = λ Σ_{i,j} [ Φ(|∇p|) + Φ(|∇q|) ]    (5)

where

|∇p| = √(p_x² + p_y²) = √(z_xx² + z_xy²),   |∇q| = √(q_x² + q_y²) = √(z_xy² + z_yy²)

and

z_xx(k) = z_{k+N} − 2 z_k + z_{k−N}
z_xy(k) = z_{k+N+1} − z_{k+N} − z_{k+1} + z_k
z_yy(k) = z_{k+1} − 2 z_k + z_{k−1}.
Consider the circulant matrices G1_{k,l}, G2_{k,l}, G3_{k,l} generated by the vectors

V1 = (..., 0, 1, 0, ..., 0, −2, 0, ..., 0, 1, 0, ...), with entries at positions k − N, k, k + N,
V2 = (..., 0, 1, −1, 0, ..., 0, −1, 1, 0, ...), with entries at positions k, k + 1, k + N, k + N + 1,
V3 = (..., 0, 1, −2, 1, 0, ...), with entries at positions k − 1, k, k + 1.

The k-th component of the discretized derivative operators can then be written as

z_xx(k) = Σ_{l=1}^{N²} G1_{k,l} z_l,   z_xy(k) = Σ_{l=1}^{N²} G2_{k,l} z_l,   z_yy(k) = Σ_{l=1}^{N²} G3_{k,l} z_l.

Then:

E2 = (λ/δ²) Σ_k Ψ( δ √[ (Σ_l G1_{k,l} z_l)² + (Σ_l G2_{k,l} z_l)² ] ) + (λ/δ²) Σ_k Ψ( δ √[ (Σ_l G2_{k,l} z_l)² + (Σ_l G3_{k,l} z_l)² ] )    (6)

where Φ(u) = (1/δ²) Ψ(δu). Defining the variables

t_k = √[ (Σ_l G1_{k,l} z_l)² + (Σ_l G2_{k,l} z_l)² ],   w_k = √[ (Σ_l G2_{k,l} z_l)² + (Σ_l G3_{k,l} z_l)² ],

(6) can be written as:

E2(z) = (λ/δ²) Σ_k { Ψ(δ t_k) + Ψ(δ w_k) }.
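The operators z_xx, z_xy, z_yy and the auxiliary variables t_k, w_k can be sketched as follows; the periodic (wrap-around) indexing mirrors the circulant structure of G1, G2, G3 and is our modelling choice at the image border.

```python
import numpy as np

def second_derivatives(z):
    """z_xx, z_xy, z_yy of a flattened N x N depth map z (length N*N),
    where k+1 steps in y and k+N steps in x, as in the text. Periodic
    (wrap-around) indexing mirrors the circulant G1, G2, G3."""
    n2 = z.size
    N = int(round(np.sqrt(n2)))
    k = np.arange(n2)
    zxx = z[(k + N) % n2] - 2 * z[k] + z[(k - N) % n2]
    zxy = z[(k + N + 1) % n2] - z[(k + N) % n2] - z[(k + 1) % n2] + z[k]
    zyy = z[(k + 1) % n2] - 2 * z[k] + z[(k - 1) % n2]
    return zxx, zxy, zyy

def t_w(z):
    """Auxiliary variables t_k and w_k of the regularization term."""
    zxx, zxy, zyy = second_derivatives(z)
    return np.hypot(zxx, zxy), np.hypot(zxy, zyy)
```

On a constant depth map all second derivatives vanish, so t_k = w_k = 0 and the regularization term is zero, as expected.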
Its convexity derives from the sign of its Hessian, ∂²E2/∂z_i∂z_j > 0. As regards the first part, Ψ(δ t_k), of the regularization term, we have:

∂²/∂z_i∂z_j Σ_k Ψ(δ t_k) = δ² Σ_k Ψ''(δ t_k) (∂t_k/∂z_i)(∂t_k/∂z_j) + δ Σ_k Ψ'(δ t_k) ∂²t_k/∂z_i∂z_j.

With A_{k,i} = (Σ_l G1_{k,l} z_l) G1_{k,i} + (Σ_l G2_{k,l} z_l) G2_{k,i} we have:

∂²/∂z_i∂z_j ( Σ_k Ψ(δ t_k) ) = Σ_k (A_{k,i} A_{k,j}) [ δ² Ψ''(δ t_k)/t_k² − δ Ψ'(δ t_k)/t_k³ ] + Σ_k [ δ Ψ'(δ t_k)/t_k ] [ G1_{k,i} G1_{k,j} + G2_{k,i} G2_{k,j} ].
Then

δ² Ψ''(δ t_k)/t_k² − δ Ψ'(δ t_k)/t_k³ = δ⁴ [ Ψ''(u)/u² − Ψ'(u)/u³ ]_{u = δ t_k} = −8δ⁴ / (1 + u²)²    (7)

which is bounded as −8δ⁴ ≤ −8δ⁴/(1 + u²)² ≤ 0, and

δ Ψ'(δ t_k)/t_k = δ² [ Ψ'(u)/u ]_{u = δ t_k} = 2δ² / (1 + u²)²    (8)

with 0 ≤ 2δ²/(1 + u²)² ≤ 2δ². Functions (7) and (8) can be lower bounded by their minima:

∂²/∂z_i∂z_j Σ_k Ψ(δ t_k) ≥ −8δ⁴ Σ_k (A_{k,i} A_{k,j}).

In order to consider the worst case, we analyze the maximum of Σ_k (A_{k,i} A_{k,j}). Exploiting the eigenvalues of G1 and G2, we have

G1 z = γ¹ z  ⟹  Σ_l G1_{k,l} z_l = (G1 z)_k = γ¹ z_k ≤ γ¹_max z_k

and similarly for G2. In [17] we proved

Σ_k A_{k,i} A_{k,j} ≤ C_z² ( |γ¹_max||γ̃¹_max| + |γ²_max||γ̃²_max| )²

with γ¹_max, γ²_max and γ̃¹_max, γ̃²_max respectively the greatest eigenvalues of G1, G2 and of G̃1_{i,j} = |G1_{i,j}|, G̃2_{i,j} = |G2_{i,j}|, and supposing z_k bounded: z_k ≤ C_z. Hence

∂²/∂z_i∂z_j Σ_k Ψ(δ t_k) ≥ −8 C_z² δ⁴ [ (|γ¹_max||γ̃¹_max|) + (|γ²_max||γ̃²_max|) ]².

Similar considerations can be made for the other term, containing Ψ(δ w_k). Then we obtain:

∂²E2/∂z_i∂z_j ≥ −8 C_z² λ δ² Γ_max

with

Γ_max = ( |γ¹_max||γ̃¹_max| + |γ²_max||γ̃²_max| )² + ( |γ²_max||γ̃²_max| + |γ³_max||γ̃³_max| )².

The eigenvalues have been computed as in Appendix D of [10], with suitable changes. The obtained constraints are general and can be applied to all energies having a convex first term and the same regularization function.
Fig. 1. The 128 × 128 original discrete shape: a) gray-scale image, b) surface.
3.3 Implementation

From the data-closeness term of the energy E we have:

∂²E1/∂z_i∂z_j = 2 (l_x + l_y)².

The Hessian matrix is positive definite if the following condition is satisfied:

Γ_max ≤ (l_x + l_y)² / (4 C_z² λ δ²).

Under the above constraint, the functional (2) is convex. Now we set

δ_p² = (l_x + l_y)² / (4 p C_z² λ Γ_max)   and   γ_p = (1 − p) / (1 − p̄)    (9)

with 1 ≥ p ≥ p̄, where p̄ = (l_x + l_y)² / (4 C_z² λ δ² Γ_max).
Let us denote by E^(p) the energy in (2) corresponding to the values δ_p and γ_p. The sequence of energies is such that E^(1) is convex and E^(p) tends towards E as p approaches p̄. We first minimize the energy E^(1), and then the parameter p is gradually lowered from 1 to p̄.
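The graduated schedule can be sketched as below; the linear spacing of p and the number of steps are our illustrative choices, since the paper does not fix a step rule beyond lowering p from 1 to p̄.

```python
import numpy as np

def gnc_schedule(lx, ly, lam, cz, gamma_max, delta, steps=10):
    """Yield (p, delta_p, gamma_p) for the sequence of energies E^(p),
    following Eq. (9). At p = 1, gamma_p = 0 (convex approximation);
    at p = p_bar, gamma_p = 1 and delta_p = delta (original energy)."""
    p_bar = (lx + ly) ** 2 / (4 * cz ** 2 * lam * delta ** 2 * gamma_max)
    for p in np.linspace(1.0, p_bar, steps):
        delta_p = np.sqrt((lx + ly) ** 2 / (4 * p * cz ** 2 * lam * gamma_max))
        gamma_p = (1.0 - p) / (1.0 - p_bar)
        yield p, delta_p, gamma_p
```

Each minimization of E^(p) starts from the minimum found at the previous p, which is the GNC continuation principle described in Section 3.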
4 Experimental Results and Discussion
We have applied our model to many shapes. In this paper we show the results for the shape depicted in Fig. 1, which is simple but has evident discontinuities. Fig. 2 shows the surface recovered by our model in just 350 seconds on a RISC workstation (Octane/SI R10000, 175 MHz, 1 MB cache), using a multigrid speed-up in the convex step. It can be noticed that our model is able to catch discontinuities that are usually difficult for SFS minimization models to recover. In other words, our minimization model strongly decreases the energy, achieving good results. To better understand this point, we compared our performances
Fig. 2. Row n. 64 of: original surface (topmost), convex recovery (bottommost), GNC (close to the convex one but with discontinuities), Gauss-Seidel (intermediate solution). A zoom around a discontinuity point is also shown.
with those achieved by another well-known energy minimization: the Gauss-Seidel algorithm on the Euler equations relative to p, q, Z. To make the performances of the two models comparable, we start from the same convex surface obtained via the convex energy relative to (2), i.e. with Φ = t². This choice ensures the same multigrid speed-up [18] in the convex step of both minimizations. Looking again at Fig. 2 we can see the difference between the behaviors of the two minimizations. Our solution remains close to the convex surface while better recovering the discontinuity regions. The other one tries to reach the original surface but loses the discontinuity information, in opposition to the aim of non-convex models.
Fig. 3. Energy decay behavior during minimization performed by Gauss-Seidel (top) and GNC (bottom).
Fig. 3 emphasizes this fact, showing the energy decay versus time for both models. Despite its closeness to the convex solution, our model yields a drastic reduction of the global energy, which entails a better recovery of the discontinuity points. On the contrary, the other minimization shows a trade-off between a better surface (closer to the original one) and the loss of discontinuities. Moreover, our model achieves a good result in a few seconds against many minutes for the other. Finally, in order to show the potential of our approach, in Fig. 4
Fig. 4. An example of GNC performed on a 1-D case: on the left, the original (dashed) and recovered (solid) reflectance maps; on the right, the original and recovered surfaces (nearly coincident).
we show the result obtained in 180 seconds on a 1-D version of the energy in (2), where the original surface is a cone overlapping a spherical cap. It turns out that, at the price of a higher computational time, our approach can achieve results close to the absolute minimum. In order to keep the time moderate, a further speed-up of the non-convex steps should be achieved. This is presently an open problem and will be the topic of our future research.
Acknowledgements
The authors would like to thank Dr. Riccardo March for his helpful suggestions in developing this work.
References
1. Pentland, A.: Shape information from shading: a theory about human perception. In: Proceedings of Int'l Conference on Computer Vision. (1988) 404–413
2. Zhang, R., Tsai, P., Cryer, J., Shah, M.: Shape from shading: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (1999) 690–705
3. Ikeuchi, K., Horn, B.: Numerical shape from shading and occluding boundaries. Artificial Intelligence 17 (1981) 141–184
4. Zheng, Q., Chellappa, R.: Estimation of illuminant direction, albedo, and shape from shading. IEEE Transactions on Pattern Analysis and Machine Intelligence 13 (1991) 680–702
5. Bruckstein, A.M.: On shape from shading. Computer Vision, Graphics, and Image Processing 44 (1988) 139–154
6. Rouy, E., Tourin, A.: A viscosity solution approach to shape from shading. SIAM Journal of Numerical Analysis 29 (1992) 867–884
7. Kimmel, R., Sethian, J.: Optimal algorithm for shape from shading and path planning. Journal of Mathematical Imaging and Vision 14 (2001) 237–244
8. Pentland, A.: Local shading analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 6 (1984) 170–187
9. Tsai, P., Shah, M.: Shape from shading using linear approximation. Image and Vision Computing 12 (1993) 487–498
10. Blake, A., Zisserman, A.: Visual Reconstruction. MIT Press, Cambridge MA (1987)
11. Stewart, A., Langer, M.: Towards accurate recovery of shape from shading under diffuse lighting. IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (1997) 1020–1025
12. Hadamard, J.: Lectures on the Cauchy Problem in Linear Partial Differential Equations. Yale University Press (1923)
13. Mumford, D., Shah, J.: Optimal approximation by piecewise smooth functions and associated variational problems. Communications on Pure and Applied Mathematics 42 (1989) 577–685
14. March, R.: Visual reconstruction with discontinuities using variational methods. Image and Vision Computing 10 (1992) 30–38
15. Chipot, M., March, R., Vitulano, D.: Numerical analysis of oscillations in a non convex problem related to the image selective smoothing. Journal of Computational and Applied Mathematics 136 (2001) 123–133
16. Teboul, S., Blanc-Féraud, L., Aubert, G., Barlaud, M.: Variational approach for edge-preserving regularization using coupled PDEs. IEEE Transactions on Image Processing 7 (1998) 387–397
17. Gelli, D., Vitulano, D.: GNC for thin plate. Technical Report Q33-002, IAC-C.N.R. (2002)
18. Terzopoulos, D.: Image analysis using multigrid relaxation methods. IEEE Trans. PAMI 8 (1986) 129–139
Tissue Reconstruction Based on Deformation of Dual Simplex Meshes
David Svoboda and Pavel Matula
Laboratory of Optical Microscopy, Faculty of Informatics, Masaryk University
Botanická 68a, 602 00 Brno, Czech Republic
{xsvobod2,pam}@fi.muni.cz
Abstract. A new semiautomatic method for tissue reconstruction based on deformation of a dual simplex mesh was developed. The method is suitable for specifically-shaped objects and consists of three steps. The first step includes searching for object markers, i.e. the approximate centre of each object is localized. The searching procedure is based on careful analysis of object boundaries and on the assumption that the analyzed objects are sphere-like. The first contribution of the method is the ability to find the markers without choosing the particular objects by hand. In the next step the surface of each object is reconstructed. The procedure is based on the method for spherical object reconstruction presented in [3]; the method was partially changed and adapted to be more suitable for our purposes. The problem of getting stuck in local minima was solved and, in addition, the deformation process was sped up. The final step concerns quality evaluation: the first two steps are nearly automatic, therefore the quality of their results should be measured.
Keywords: Deformable models, dual simplex mesh, quality evaluation, reconstruction.
1 Introduction
Experience in analyzing biomedical images shows that tissue cells are quite heterogeneous. Every cell is different and it is difficult to find a universal image analysis approach. The main characteristics, such as similarity in shape, continuity of surface, volume, etc., are the only guide for choosing which objects in the image can be supposed to be cells and which cannot. Many techniques for the reconstruction of particular cells have been developed. Some of them have used thresholding [7,4], others deformable models. It seems better to choose deformable models as the way of object reconstruction, because these techniques are capable of working with a priori knowledge about the shape of an object and can handle missing or noisy data.

For the first time, the deformation of curves in 2D was introduced by Kass et al. [6], who proposed the active contour model. The idea was based on the successive

I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 514–523, 2003.
© Springer-Verlag Berlin Heidelberg 2003
deformation of the curve, defined by differential equations, until a stable state was reached. This model is very suitable for the reconstruction of circular objects. Therefore Bamford & Lovell [1] applied it to biomedical data, especially cells. They simplified this method and improved it by solving the problem of touching or overlapping cells. Unfortunately, the extension of these methods to 3D space is nontrivial. Another approach has been chosen by Delingette. He proposed simplex meshes [2] and suggested using them as a data model suitable for deformation. In his work objects of general shape were reconstructed. Later Matula & Svoboda [3] used this idea and applied it to biomedical images. They developed the idea further and made it more suitable for the purposes of biomedical imaging.

Unfortunately, not many techniques for tissue reconstruction have been developed yet. Nowadays the reconstruction of biomedical objects based on mathematical morphology is very often used [15,13]. Another technique based on deformable models has also been proposed: Sethian introduced level sets [8], based on moving a general surface defined by partial differential equations. The idea was used by Solorzano et al., who applied this approach to tissue segmentation [5].

In this paper a new semiautomatic three-step method for tissue reconstruction is described. It is based on the deformation of a dual simplex mesh. This simplex mesh has the same meaning as the star-shaped simplex mesh shown in [3]. The attribute "dual" originates in the dual active contour method presented by Gunn [10]. Due to the duality of the mesh, setting the initial state of the deformed surface is simpler than in previous methods. Therefore, it seems to be convenient for tissue segmentation. The dual simplex mesh is defined in Section 2.1. Next, the deformation principle and the search for energy minima of the mesh are presented, and the application to real biomedical data is demonstrated. The results are discussed in Section 3.
2 The Method

2.1 Definition of Dual Simplex Mesh
Definition 1. Let M be an arbitrary star-shaped simplex mesh [3] with centre Q. Let H(Q, qi) be homotheties with centre Q and ratios qi, qi ∈ ⟨1, ∞), ∀i ∈ {1, 2, ..., n}, where n is the number of vertices in M. Let M′ be another star-shaped simplex mesh such that ∀Pi ∈ M: P′i = H(Q, qi)(Pi); P′i ∈ M′; i ∈ {1, ..., n}, and each P′i has the corresponding neighbours P′iN1, P′iN2 and P′iN3. The union D = M ∪ M′ is called a dual simplex mesh.
Note 1. All the vertices in D have the same properties as the vertices in a general simplex mesh.
Note 2. Let us introduce the symbols used in the following text: the symbol D is used for any dual simplex mesh. M is called the inner mesh and its mapping M′ is called the outer mesh. The vertex Pi ∈ M and its mapping P′i ∈ M′ are called counterparts. The line segment li connecting them is called the connection line of vertices Pi and P′i.
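In code, the outer mesh follows directly from the homothety in Definition 1; a sketch (function and argument names are hypothetical, not from the paper):

```python
import numpy as np

def make_outer_mesh(Q, inner_vertices, ratios):
    """Map each inner-mesh vertex P_i to its counterpart
    P'_i = H(Q, q_i)(P_i) = Q + q_i * (P_i - Q), with q_i >= 1,
    so each connection line P_i P'_i lies on a ray from the centre Q."""
    Q = np.asarray(Q, dtype=float)
    P = np.asarray(inner_vertices, dtype=float)
    q = np.asarray(ratios, dtype=float)[:, None]   # one ratio per vertex
    return Q + q * (P - Q)
```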
2.2 Deformation of Dual Simplex Mesh
Suppose the vertices of the mesh can be moved according to a law of motion and suitable forces; arbitrary movement is not permitted. In the following text the law of motion and the principle of deformation of the dual simplex mesh are presented.

Let Pi ∈ M be any vertex and P′i ∈ M′ its counterpart. The motion of vertex Pi is allowed only along the ray PiP′i towards P′i. Vice versa, the motion of vertex P′i is allowed only along the ray P′iPi towards Pi. Now let us define two energies for each vertex of a dual simplex mesh (shortly, mesh):

Internal Energy. As mentioned above, for any vertex Pi it is possible to define many of its geometric properties. The simplex angle ϕi is the most important one. The values of ϕi determine the shape of the mesh in the local neighbourhood of vertex Pi and consequently the relative position of vertex Pi with respect to its neighbours. Namely, the local curvature and the continuity of the n-th derivative (for any natural number n) of the surface represented by the mesh are known. Vice versa, if the local curvature or derivative at vertex Pi is known (e.g. when a specific smoothness of the object surface is required), the desired simplex angle ϕ̃i is known and the position of the vertex Pi can be set to fulfil the appropriate requirements. The desired simplex angle is called the reference simplex angle. Then, every time the surface is deformed, the shape constraint pushes the surface towards the state required by ϕ̃i. There are several constraints applied to the surface smoothness [3]. The two most important of them are:

– shape constraint. ϕ̃i is usually set to the constant value ϕ⁰i (the zero index indicates the value of the simplex angle in the initial state of the mesh, before any deformation).
– C² constraint. ϕ̃i is set to the average value of the simplex angles at neighbouring vertices.
Now, let ϕ̃i be the reference simplex angle bearing a constraint for the vertex Vi in mesh D and let ϕi be the current simplex angle. The internal energy Eint(Vi) in vertex Vi is defined as:

Eint(Vi) = (ϕi − ϕ̃i)/(2π) for Vi ∈ M,   Eint(Vi) = (ϕ̃i − ϕi)/(2π) for Vi ∈ M′   (1)
and expresses the tension between the current position of vertex Vi, with its simplex angle ϕi, and the desired position of this vertex given by the reference simplex angle ϕ̃i.

External Energy. Let li ∈ D be any connection line between its vertices Pi and P′i. Furthermore, let the image data be represented by two image buffers:
Fig. 1. Composition of image gradient direction (left), gradient image (center) and a dual simplex mesh (right). For any vertex W ∈ IR³ the gradient magnitude and gradient direction can easily be acquired.
Fig. 2. (left) Deformation with image gradient direction is correct. (right) Incorrect result – without using gradient direction.
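The gradient-direction test illustrated in Fig. 2, combined with the gradient magnitude along a connection line, is formalized in the text as the external energy. A minimal Python sketch of its precomputation follows; the array layout and names are assumptions, not the authors' implementation:

```python
import numpy as np

def external_energy(grad_I, v, grad_max):
    """E_ext(W) = 1 - delta_{W,v} * |grad I(W)| / max(|grad I|),
    with delta_{W,v} = 1 iff grad I(W) . v > 0.

    grad_I: (n_voxels, 3) gradients sampled along one connection line,
    v: the line's direction vector, grad_max: max(|grad I|) in the image."""
    grad_I = np.asarray(grad_I, dtype=float)
    mag = np.linalg.norm(grad_I, axis=1)
    delta = (grad_I @ np.asarray(v, dtype=float) > 0).astype(float)
    return 1.0 - delta * mag / grad_max
```

Voxels whose gradient points along the line direction receive a low (stable) energy; gradients pointing the wrong way are ignored, which is exactly the situation Fig. 2 illustrates.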
– I . . . simply preprocessed (smoothing, median) data
– |∇I| . . . gradient image evaluated from I

Let us put all these data structures together (see Fig. 1), so that each particular voxel W ∈ li has its own I(W) and ∇I(W) values as well as its local neighbourhood. The external energy Eext(W) in voxel W, which belongs to connection line li with direction vector v, is defined by:

Eext(W) = 1.0 − δW,v |∇I(W)| / max(|∇I|),   (2)

where

δW,v = 1 if ∇I(W) · v > 0, and 0 otherwise,   (3)
and max(|∇I|) is the maximal possible value of |∇I|. Owing to the division, the value of the external energy is normalized. Hence the external energy in voxel W represents the importance of the gradient and its direction in the image data lying at the position of voxel W. The lower the value of the external energy, the more stable the position of any vertex occupying the place of voxel W. The reason for using the image gradient direction is explained in Figure 2. A similar problem has been solved in [1]. It can be seen that after computing the external energy in each voxel of connection line li, ∀i ∈ {1, ..., n}, both buffers I and |∇I| can be removed. This step has two advantages:
1. There is no need to compute the external energy later during the deformation. Everything has already been precomputed.
2. The image buffers are about 1300 × 1030 × 40 voxels stored in main memory. Hence the permanent allocation of this huge memory block is undesirable.

Now the two different energies are defined. The internal energy represents the connectivity of the mesh. The external energy expresses the important voxels in image space.

Law of motion. The same law of motion as in general simplex meshes [2] and star-shaped simplex meshes [3] was used. Vertices of a mesh are considered as physical mass submitted to a Newtonian law of motion:

m d²Pi/dt² = −γ dPi/dt + F(Pi),   (4)

where m is the vertex mass and γ is the damping factor [2]. F is the force applied to each vertex and is defined in the following text. The evolution of the mesh in time under this law of motion can be discretized using central finite differences:
Pi^{t+1} = Pi^t + (1 − γ)(Pi^t − Pi^{t−1}) + F(Pi)   (5)
Fig. 3. Force definition: Each vertex moves towards its counterpart only. The movement is enabled/disabled according to the values of potential energies of particular vertices.
The principle is explained in Figure 3. For each connection line li between the corresponding vertices Pi and P′i in the mesh, four energies can be computed: Eint(Pi), Eext(Pi), Eint(P′i), Eext(P′i). They are used in this way:

E(Pi) = αEint(Pi) + (1 − α)Eext(Pi)   (6)
E(P′i) = αEint(P′i) + (1 − α)Eext(P′i)   (7)
where α ∈ ⟨0, 1⟩ is a parameter affecting the smoothness of the reconstructed object. The force F is defined as a shrinkage of the connection lines li: let li be any connection line with its vertices Pi and P′i. One of them has higher energy and is therefore less stable. This vertex is shifted towards its counterpart. If both of them have the same energy level, the outer one is chosen for motion.
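The per-line decision combining eqs. (6)-(7) with this force rule can be sketched as follows (the function name and return convention are illustrative assumptions):

```python
def shift_decision(E_int_P, E_ext_P, E_int_Pp, E_ext_Pp, alpha):
    """Combine internal and external energies as in eqs. (6)-(7) and
    report which end of the connection line moves: the vertex with the
    higher (less stable) energy; on a tie, the outer vertex P'_i moves."""
    E_inner = alpha * E_int_P + (1 - alpha) * E_ext_P     # E(P_i)
    E_outer = alpha * E_int_Pp + (1 - alpha) * E_ext_Pp   # E(P'_i)
    return "outer" if E_outer >= E_inner else "inner"
```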
2.3 The Algorithm
Inputs. When proposing this method, the aim was to eliminate the necessity of choosing many parameters and particular objects by hand. Therefore the algorithm requires only the following input: image data, the minimal and maximal diameter (din and dout, respectively) of the studied objects, the required smoothness α and the damping factor γ.

Outputs. The set of appropriately deformed dual simplex meshes (see Fig. 6) is returned. Optionally, contour lines (see Fig. 5 (right)) are drawn back onto the source data to show the exact shape of the reconstructed surface. In this way, the results can easily be observed.

The presented method is semiautomatic and consists of three steps:

Searching for Markers. The aim of this step is to find object markers (see Fig. 5 (left)). In this case the tissue cells are the objects. One marker belongs to one object and is its approximate centre of gravity. There should not be two or more markers matching one object; a nearly one-to-one correspondence between markers and cells is expected. The procedure is defined in Figure 4.
1. Read the image data and parameters from the input and create a convolution kernel K in the shape of a sphere with radius din.
2. buf ← |∇I| ⊗ K
3. buf ← I \ buf
4. buf ← buf ⊗ K
5. markers ← NonMaxSuppression(buf)

Fig. 4. Marker searching algorithm: the symbol ⊗ represents the convolution operator, the symbol \ represents voxel-by-voxel subtraction of two images. The function "NonMaxSuppression(buf)" leaves only the local peaks in the image buffer (see [12]). By this means many redundant markers are suppressed.
The procedure stems from the idea of the Hough transform [16] and is similar to the approach used in [5]. The main advantage of this approach is its noise robustness. It is well known that Hough-based methods are very time consuming; therefore the Fast Fourier Transform [16] was used to implement the convolution and speed up the whole process. We purposely avoid using mathematical morphology methods [15], such as the watershed algorithm, because of the undesirable over-segmentation phenomenon. Finally, the set of markers in space is obtained as a result. If some markers are too close to each other (closer than din), the marker with higher quality is kept and the other one is removed.
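A sketch of the Fig. 4 procedure with the FFT speed-up, here using SciPy's `fftconvolve` for the convolutions (the function name `find_markers` and the array shapes are assumptions, not the authors' code):

```python
import numpy as np
from scipy.ndimage import maximum_filter
from scipy.signal import fftconvolve

def find_markers(I, grad_I, d_in):
    """Marker search following Fig. 4: spherical kernel K of radius d_in,
    buf = |grad I| (x) K;  buf = I \\ buf;  buf = buf (x) K;
    markers = NonMaxSuppression(buf)."""
    r = int(d_in)
    z, y, x = np.ogrid[-r:r + 1, -r:r + 1, -r:r + 1]
    K = (x * x + y * y + z * z <= r * r).astype(float)   # sphere-shaped kernel
    buf = fftconvolve(grad_I, K, mode="same")
    buf = I - buf                                        # voxel-wise subtraction
    buf = fftconvolve(buf, K, mode="same")
    peaks = buf == maximum_filter(buf, size=2 * r + 1)   # non-max suppression
    return np.argwhere(peaks)                            # (n, 3) voxel coordinates
```

Merging markers closer than din (keeping the better one) would follow as a post-processing pass over the returned coordinates.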
Deformation. The set of markers and the two parameters (din and dout) are used to construct the appropriate set of dual simplex meshes. Each dual simplex mesh has its own centre and corresponds to one of the markers. All dual simplex meshes are constructed from the same parameters: din serves for the construction of the inner mesh, dout for the outer mesh. When all the meshes are constructed and composed with the image buffers (see Figure 1), the underlying data can be read and the external energies for particular voxels can be precomputed. Now everything is prepared for the deformation.

Deformation is an iterative process. In each step all the objects in the space IR³, and therefore all the connection lines, are exposed to the law of motion. One iteration is equivalent to shortening each of the connection lines. All the connection lines have finite length and the magnitude of the force F is equivalent to at least one voxel; therefore the iterative process is finite. The deformation continues until no further shortening is possible.

Note 3. At this moment the deformation of mesh D is stopped and the inner mesh M and the outer mesh M′ are identical.

As mentioned above, the dual simplex mesh is tailored for the approximation of sphere-like objects. In general, any object topologically equivalent to a sphere may be approximated with this model. For simplicity of implementation of the initial dual simplex mesh, only the reconstruction of sphere-like objects was considered.

The Evaluation of the Results Quality. The only input parameters, besides the image data, given to the algorithm are the inner diameter, the outer diameter, the smoothness parameter and the damping factor. It is not guaranteed that the results of the previous step are correct, so it is essential to assess their quality. The assessment is made by measuring some properties of the resulting objects (volume, surface, roundness, shape, ...).
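The finite shrinking of a single connection line can be sketched as follows. Here E is the precomputed combined-energy profile along the line, indexed from the inner to the outer vertex; this is a simplification, since in the real mesh the internal energy of a vertex changes as its neighbours move:

```python
def shrink_line(E):
    """Shrink one connection line until the inner and outer vertices meet
    (Note 3): in each iteration the end with the higher energy moves one
    voxel towards its counterpart; on a tie, the outer end moves."""
    i, j = 0, len(E) - 1          # inner vertex P_i, outer vertex P'_i
    while i < j:
        if E[j] >= E[i]:
            j -= 1                # outer vertex steps inwards
        else:
            i += 1                # inner vertex steps outwards
    return i                      # meeting voxel, a recovered surface point
```

Because every step shortens the line by one voxel and the lines have finite length, the loop terminates, mirroring the argument for the finiteness of the iterative process above.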
According to these measures, the results are evaluated by the user as suitable or unsuitable for further processing or another application.
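As one concrete example of such a measure, a sphericity-style roundness (one plausible reading of the "roundness" property; the exact definition used is not given in the paper) compares an object's volume and surface to those of a ball:

```python
import math

def roundness(volume, surface):
    """Sphericity: pi^(1/3) * (6V)^(2/3) / A.
    Equals 1.0 for a perfect sphere and is smaller for elongated or
    irregular objects, so low values flag dubious reconstructions."""
    return math.pi ** (1 / 3) * (6 * volume) ** (2 / 3) / surface
```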
3 Results and Discussion
All presented images show cell nuclei whose interior was stained with DAPI. The images were acquired using a Zeiss Axiovert 100 confocal microscope equipped with a CARV confocal unit and a MicroMax CCD camera. The resolution was 0.124 µm in the lateral (x, y) direction and 0.3–0.5 µm in the axial (z) direction. The images had on average 40 z-slices and were acquired and stored using the FISH 2.0 software package [4,9]. The dimension of the processed data was 1300 × 1030 × 40 voxels. All computations were performed on an Intel Pentium IV 1.4 GHz machine with 256 MB RAM and Red Hat Linux OS.

During the first step of the computation the markers were searched for. (This is a very important part of the algorithm.) If some markers are localized incorrectly or omitted, the subsequent deformation is not able to mend it. During the assessment the objects falsely denoted as cells may be deleted, but those that are missing cannot be recovered.

Fig. 5. (left) The input image with overlaid markers that were found in the first step of the algorithm. Markers are the approximate centres of gravity of the studied objects. (right) The results can be stored in many different ways. Drawing contour lines into each slice of the source image is one of them. (The depicted contours were obtained by setting both the smoothness parameter and the damping factor to the value 0.5.)

Many runs of the first part of the algorithm were performed. It was found that the correct setting of the inner and outer diameter parameters (din and dout, respectively) is very important. Unfortunately, this procedure is highly CPU-time demanding and memory consuming: more than 1 GB of main memory was required at one time. Hence the source images had to be cut into a large number of smaller, partially overlapping subimages, which were then processed sequentially without any memory problems. For image data of dimensions 1300 × 1030 × 40 this took 45 minutes. On the other hand, a significant acceleration of this process will be possible after distributing the task among two or more computers. Another very important feature of this step is its noise robustness. Studying the procedure in Figure 4, it can be seen that only the gradient image evaluation is noise sensitive. If e.g. the Canny edge detector or ISEF (see [14]) is used, the noise is suppressed. In addition, the input image was preprocessed (smoothing or median filter) before any computation. The procedure was tested on different sets of input data. One possible result is depicted in Figure 5 (left). This and many other analogous results were considered by biologists to be sufficiently valuable and suitable for further use.

The deformation step is also noise insensitive [3]. Moreover, it gives very accurate results and can easily be adjusted by the two parameters: smoothness and damping factor (α and γ, respectively). Setting appropriate values for both parameters was simplified by the fact that all the computed energies, and consequently the results, were normalized. The deformation is stable and the whole computation takes only a little time.
The initialization of one cell (dout < 100) took approximately 230 ms and the deformation of the same cell took 2.5 sec on average. As can be seen in Figure 5 (right), the results were very accurate and therefore feasible for further computation and measurement. Theoretically, the reconstruction of objects of any size and shape topologically equivalent to a sphere is possible with our method. Given the a priori knowledge of the likely shape of the reconstructed objects, and for programming simplicity, only sphere-like objects such as cells were submitted to the reconstruction process.

Fig. 6. Reconstructed image visualized using the OpenGL library. The large box encloses the image data space. The small box designates one cell in the sequence for which the measurement and the evaluation are performed.

The evaluation step of the algorithm is based on processing the basic measures used in biomedical statistics: volume, surface, roundness, etc. A simple application was implemented for the visualization of the results (see Figure 6). The preliminary results demonstrate that the algorithm converges to the nuclear surface, and that it tolerates a range of variation in the quality of the staining within and between images. In forthcoming research more sophisticated and unsupervised methods for result quality evaluation, such as those published in [11], will be used.
Acknowledgements
Many thanks are due to I. Koutná, who provided a large amount of data suitable for testing. This work was supported by the Ministry of Education of the Czech Republic (Project No. MSM-143300002) and by the Academy of Sciences of the Czech Republic (Grants No. S5004010 and No. B5004102).
References 1. P. Bamford and B. Lovell. Unsupervised Cell Nucleus Segmentation with Active Contours. Signal Processing Special Issue: Deformable Models and Techniques for Image and Signal Processing, vol. 71, p. 203–213, December, 1998 2. H. Delingette. General object reconstruction based on simplex meshes. International Journal of Computer Vision, 32(2):111–146, 1999
3. P. Matula, D. Svoboda. Spherical Object Reconstruction Using Star-Shaped Simplex Meshes, in Figueiredo M.A.T., Zerubia J., Jain A.K. (Eds.): EMMCVPR 2001, LNCS 2134, pp. 608–620, 2001
4. M. Kozubek, S. Kozubek, E. Lukášová, A. Marečková, E. Bártová, M. Skalníková and A. Jergová. High-resolution cytometry of FISH dots in interphase cell nuclei. Cytometry, vol. 36, p. 279–293, 1999
5. Solorzano C. O., Malladi R., Lelievre S. A. & Lockett S. J., Segmentation of nuclei and cells using membrane related protein markers, Journal of Microscopy, Vol. 201, Pt 3, March 2001, pp. 404–415
6. M. Kass, A. Witkin and D. Terzopoulos. Active contour models. International Journal of Computer Vision, 1(4):133–144, 1987
7. H. Netten, I. T. Young, L. J. Van Vliet, H. J. Tanke, H. Vrolijk and W. C. R. Sloos. FISH and chips: automation of fluorescent dot counting in interphase cell nuclei. Cytometry, 28:1–10, 1997
8. J. A. Sethian. Level Set Methods and Fast Marching Methods, Cambridge University Press, 1999
9. M. Kozubek. High-resolution cytometry: Hardware approaches, image analysis techniques and applications, PhD thesis, Masaryk University, Brno, 1998
10. S. R. Gunn and M. S. Nixon. A Dual Active Contour. BMVC 94, September, York, U.K., 305–314, 1994
11. M. Ankerst, G. Kastenmüller, H.-P. Kriegel and T. Seidl. 3D Shape Histograms for Similarity Search and Classification in Spatial Databases, Advances in Spatial Databases, 6th International Symposium, SSD'99, vol. 1651, p. 207–228, 1999
12. F. Devernay. A non-maxima suppression method for edge detection with sub-pixel accuracy. Technical report, INRIA, 1995
13. K. Rodenacker, M. Aubele, P. Hutzler, and P. S. U. Adiga. Groping for quantitative digital 3-D image analysis: An approach to quantitative fluorescence in situ hybridization in thick tissue sections of prostate carcinoma. Anal Cell Pathol, 15:19–29, 1997
14. J. R. Parker.
Algorithms for Image Processing and Computer Vision, John Wiley & Sons, NY 1997 15. P. Soille. Morphological image analysis: principles and applications, Springer Verlag, Berlin 1999 16. W. K. Pratt. Digital image processing, 3rd ed., Wiley, New York 2001, ISBN: 0471-37407-5
Spherical Object Reconstruction Using Simplex Meshes from Sparse Data
Pavel Matula and David Svoboda
Laboratory of Optical Microscopy, Faculty of Informatics, Masaryk University
Botanická 68a, CZ–602 00 Brno, Czech Republic
{pam,xsvobod2}@fi.muni.cz
Abstract. A new method for spherical object reconstruction based on deformation of star-shaped simplex meshes has been developed in our laboratory and published recently. The method can handle volumetric as well as three-dimensional range data and is easy to use and relatively fast. The method, however, can yield wrong results for sparse data. The goal of this paper is to describe a modification of the method that is suitable also for sparse data. The performance of the proposed modification is demonstrated on real biomedical data. Keywords: Spherical objects, object reconstruction, deformable models, sparse data, simplex mesh, volumetric image segmentation.
1 Introduction
Spherical object reconstruction is of great importance, especially in the field of cell biology in the research of the 3D organisation of the human genome, since both cells and cell nuclei mostly have the shape of a deformed sphere. Biological applications very often require processing a large number of cells or cell nuclei to obtain statistically significant results. One of the key issues in this field is to have a good 3D model of object boundaries. Therefore a fast, reliable and precise procedure for automatic image analysis and object reconstruction is needed.

Good reconstruction methods in this field must also yield good results for sparse data. The reason is that the biochemical visualisation of the nuclear envelope using standard methods of molecular biology and cytology is not always completely successful, and only some parts of the nuclear surface may be clearly visible in the volumetric image data. Therefore even the best image analysis technique cannot, in principle, determine the whole nuclear envelope.

Many techniques for the reconstruction of cell nuclei are based on thresholding [14,1,5]. In this case the nuclei are represented as a set of voxels and a boundary representation can be produced by boundary tracking algorithms (see [7]). Isosurfacing methods based on the marching cubes algorithm [8] are also used [6,4]. However, these methods do not handle missing and noisy data and can hardly be used for reconstruction from sparse input data, because they make no assumption about the shape to recover.

I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 524–533, 2003.
© Springer-Verlag Berlin Heidelberg 2003
Deformable modelling [13,12] is suitable for the reconstruction of objects from incomplete and noisy data because it can exploit a priori knowledge about the reconstructed shape. There are several existing frameworks for deformable models, but a common approach is to consider the deformation as an iterative process minimising a trade-off between an internal energy, controlling a priori known shape qualities of the model, and an external force, controlling the closeness of fit. Three-dimensional objects of arbitrary shape and topology can be reconstructed using Delingette's general algorithm, which is based on the deformation of simplex meshes [3]. The algorithm works for both volumetric and range data. One of the advantages of simplex meshes is that their deformation can be performed in a simple and efficient manner. Simple deformation is possible at the expense of not having a global functional guiding the minimisation; certain a priori knowledge about the shape of an unknown object has to be incorporated into the deformation process only by means of constraints on the local shape of the simplex mesh.

The idea of the general algorithm was exploited during the design of a new method for spherical object reconstruction [11]. Star-shaped simplex meshes are considered for object representation instead of general simplex meshes. Therefore, the deformation scheme can be simplified, and the star-shaped method is faster and more stable (in the sense of converging for a wider range of minimisation parameters) [9].

This paper discusses the usage of both the general and the star-shaped method for spherical object reconstruction from sparse data. First, the necessary background of the methods is given in Sect. 2. In Sect. 3, the usage of the methods on sparse data is discussed and a modification of the methods is proposed to make them more suitable for spherical object reconstruction from sparse data. Section 4 presents an application example of the modified star-shaped method.
2 Method Background
This section reviews the basics of reconstruction using simplex meshes and star-shaped simplex meshes.
2.1 Simplex Meshes
The surface of an object can be represented using a simplex mesh [2,3]. A simplex mesh is a structure consisting of vertices and edges. The vertices are points in 3D space. Every edge connects two distinct vertices; the shorter the edges are, the more detail of the surface can be modelled. An important property of a simplex mesh is that each vertex has exactly three neighbouring vertices connected via edges (see Fig. 6). A simplex mesh is called star-shaped (has the shape of a star) if a point exists inside the mesh such that any ray going from this point intersects the mesh only once. The set of all such points inside the mesh is called the kernel of the star-shaped simplex mesh.
Thanks to the three-neighbour property, the following definitions can be given. The tangent plane at a vertex is given by its three neighbours. The normal vector at a vertex is equal to the normal vector of the tangent plane. The local shape of the simplex mesh can be controlled by means of a simplex angle. The simplex angle at a vertex is related to the local mean curvature of the surface at this vertex. Exactly one sphere can be circumscribed about the vertex and its three neighbours. This sphere can be seen as an approximation of the local shape near the vertex. The simplex angle expresses how much this sphere is elevated above or sunken below the tangent plane. The simplex angle is invariant to the position of the vertex on the sphere and to the position of the three neighbours on the circle circumscribed about these three points. The simplex angle is also invariant to translation, rotation, and scale transformations [2].
2.2 Deformation of Simplex Meshes
Law of Motion. All vertices of a simplex mesh are considered as physical masses submitted to a Newtonian law of motion including internal and external forces. The discrete formula is [3]:

$P_i^{t+1} = P_i^t + (1 - \gamma)(P_i^t - P_i^{t-1}) + \alpha F_{int} + \beta F_{ext}$,   (1)

where $P_i^t$ is the position of the $i$-th vertex at time $t$. The internal force $F_{int}$ and external force $F_{ext}$ are computed at time $t$ and are defined below. The real parameter $\gamma$ is the damping factor. The real parameters $\alpha$ and $\beta$ must belong to a given interval to guarantee a stable scheme; their ratio expresses the trade-off between the influence of internal and external forces, i.e. between the required local shape of the mesh and the closeness of fit.
All forces deforming a star-shaped simplex mesh act only along rays, called deformational rays, emanating from a suitable point in the kernel of the initial simplex mesh (usually the centre of an initial ellipsoid is used). This point is called the deformational centre. In this way the star-shaped quality is preserved during the deformation process. The general and star-shaped methods differ in the definition of the internal and external force.
Internal Force. The internal force of Delingette's general algorithm is defined as the composition of a tangential force and a normal force. The goal of the tangential force is to control the vertex position with respect to its three neighbours in the tangent plane, i.e. to spread the vertices of the final mesh uniformly. The normal force acts to change the local mean curvature at a vertex towards the required local shape. The requirements are expressed by means of a reference simplex angle. The reference simplex angle can be determined in four ways [3]. However, only two of them are useful for spherical object reconstruction:
Shape constraint. The reference simplex angle $\tilde\varphi_i$ is equal to a constant value $\varphi_i^0$. The value can be, for example, computed from an initial ellipsoid or sphere. In this way the internal force can keep the spherical shape of the mesh.
$C^2$ constraint. The reference simplex angle $\tilde\varphi_i$ is computed as the average of the simplex angles at neighbouring vertices:

$\tilde\varphi_i = \frac{1}{n} \sum_{j \in Q_{s_i}(P_i)} \varphi_j$,

where $Q_{s_i}(P_i)$ is the set of all vertices which are reachable in at most $s_i$ steps via edges from the vertex $P_i$ (the neighbourhood of size $s_i$), and $n$ is the cardinality of this set. The neighbourhood size $s_i$ corresponds intuitively to the notion of rigidity, or deformation scale.
The tangential force is not needed for the deformation of star-shaped simplex meshes. A deformational ray plays the role of this force. The internal force of the star-shaped method behaves only like the normal force of the general method; the differences are only in the direction of action and in the way of computation. The role of both forces is the same [11].
External Force. The external force for general simplex meshes can be defined for both volumetric and range data [3]. In all cases, the external force is directed along the normal direction $n_i$ at the vertex $P_i$ where the force is applied. For every vertex $P_i$ of the mesh, the closest point $Q_i$ is searched in a given scope and the force $F_{ext}$ is then computed as

$F_{ext} = ((Q_i - P_i) \cdot n_i)\, n_i$,   (2)

if $(Q_i - P_i) \cdot n_i < \tilde G$, where $\tilde G$ is a user-defined parameter called the gravitation limit. The external force is set to 0 otherwise. The force is computed as the projection of the vector $(Q_i - P_i)$ onto the normal direction. Notice that if there is no data point in the vicinity of a mesh vertex (defined by the gravitation limit), then no external force influences the vertex and the vertex is only submitted to the internal (regularisation) force.
The force $F_{ext}$ is defined similarly for star-shaped simplex meshes. The force is not, however, projected onto the normal direction of the vertex, but onto the corresponding deformational ray [11,9].
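As an illustration, one vertex update combining the law of motion (1) with the perpendicular-projection external force (2) can be sketched in a few lines of Python. The function names and the representation of vectors as 3-tuples are ours, not part of the original implementation:

```python
def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def scale(v, s): return tuple(x * s for x in v)
def dot(a, b): return sum(x * y for x, y in zip(a, b))

def external_force(P_i, Q_i, n_i, G):
    """Perpendicular-projection external force of Eq. (2): project
    (Q_i - P_i) onto the unit normal n_i, applied only while the
    projection is below the gravitation limit G, otherwise zero."""
    d = dot(sub(Q_i, P_i), n_i)
    if d < G:
        return scale(n_i, d)
    return (0.0, 0.0, 0.0)

def motion_step(P_t, P_prev, F_int, F_ext, alpha, beta, gamma):
    """One Newtonian update of Eq. (1) for a single vertex."""
    inertia = scale(sub(P_t, P_prev), 1.0 - gamma)
    return add(add(P_t, inertia),
               add(scale(F_int, alpha), scale(F_ext, beta)))
```

With $\gamma = 1$, $\alpha = 0$, and $\beta = 1$ (the settings used in the initialisation stage of the star-shaped method), the vertex simply moves by the full external force.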
2.3 Reconstruction Algorithm
The general reconstruction algorithm works in two stages. First, the initialisation stage creates a simplex mesh in the vicinity of the data and then, after a few iterations, the mesh topology can be modified. Second, an iterative refinement algorithm decreases the distance of the mesh from the data while preserving high geometric and topological quality.
During the first stage, Delingette recommends using the $C^2$ constraint for the internal force computation, setting the neighbourhood size $s_i$ to high values ($\approx 10$) and $\beta$ to low values ($\approx 0.1$) in order to obtain smooth and large-scale deformations of the mesh. The gravitation limit $\tilde G$ is recommended to be set to a relatively large value (up to 20% of the overall size of the dataset) in order to allow large deformations of the mesh.
Fig. 1. Reconstruction of the cube vertices. One iteration was performed while only external forces were active. Thanks to the perpendicular projection the mesh vertices lie on spheres given by the cube vertices and the centre of the initial mesh. The result is the same for both the general and star-shaped methods.
The second stage is recommended to be performed with the parameters set as follows: $s_i \approx 1$, $\beta \approx 0.5$, and $\tilde G < 8\%$. The deformation is stopped when the change of the mesh is no longer significant (is less than a predefined constant $\varepsilon$). The remaining parameters are recommended to be set as follows during the whole reconstruction process: $\alpha = 0.5$, $\gamma = 0.65$.
The reconstruction algorithm based on star-shaped simplex meshes also works in two stages. However, the initialisation phase is usually performed in only one iteration. The iteration is done with the parameters $\alpha = 0$, $\beta = 1$, i.e. only external forces are active, while the gravitation limit is large. Therefore the mesh settles onto the input data in a given scope and interpolates them. The task of the refinement stage, where the internal forces are employed and the influence of the external forces is reduced, is to smooth the mesh.
3 Reconstruction from Sparse Data
3.1 The Problem
Both the general and the star-shaped methods can fail in the case of sparse data. The source of the problem can be demonstrated on a simple example of sparse data reconstruction described below. Let the input data be the eight vertices of a cube with unit edges (the vertices have coordinates $[\pm\frac{1}{2}, \pm\frac{1}{2}, \pm\frac{1}{2}]$). The initial mesh has the shape of the unit-radius sphere with centre in the origin $O$. If one iteration according to (1) with the parameters $\gamma = 1$, $\alpha = 0$, $\beta = 1$, and $\tilde G = 2$ is performed, i.e. only external forces are active and are strong, then the mesh has the shape shown in Fig. 1. This shape is a result of the definition of the external force: the perpendicular projection of the nearest data point $X$ is computed and therefore the mesh vertices lie on the circles with diameter $XO$. Notice that the normal directions coincide with the directions of the deformational rays. The behaviour of the general and the star-shaped method is the same in this example.
Evidently, the definition of the external force is in conflict with the a priori knowledge about the spherical shape. This problem especially affects sparse data reconstruction during the initialisation stage, because the gravitation limit is typically large. A poor initialisation stage can significantly slow down the whole reconstruction process. However, the refinement stage is affected as well. The gravitation limit must be sufficiently high during the minimisation in order that the external forces can keep the mesh near the data points. Therefore protuberances occur close to the data points and the surface is consequently underestimated; see Fig. 2.
Fig. 2. (left) The mesh was computed by the general method with the parameter setting $\beta_i = 0.1$, $s_i = 10$, $\alpha_i = 0.5$, $\gamma = 0.8$ during the initialisation stage (10 iterations, $C^2$ constraint) and $\alpha_i = \beta_i = 0.5$, $\gamma = 0.8$, $\tilde G = 0.1$ during the refinement stage (250 iterations, shape constraint). The initial mesh had the shape of the unit sphere with centre in the origin. Eight data points with coordinates $[\pm\frac{1}{2}, \pm\frac{1}{2}, \pm\frac{1}{2}]$ were shifted to the left by the vector $[0, 0.14, 0]$. The perpendicular projection was applied for the external force computation. Notice that protuberances occur close to the data points and the surface is underestimated. (right) The mesh was computed by the star-shaped method modified according to Sect. 3.2, i.e. the spherical projection was applied for the external force computation. The initial mesh and data points were the same as on the left. The refinement stage was performed (250 iterations) with the parameters $\alpha_i = \beta_i = 0.5$, $\gamma = 0.8$, $\tilde G = 0.1$.
3.2 The Solution
We suggest using a spherical projection instead of the perpendicular projection for spherical object reconstruction from sparse data as a solution to this problem. Let $O$ be the deformational centre (see Sect. 2.2) and $X$ be the data point closest to the vertex $P_i$. The external force is redefined as

$F_{ext} = (P_i - O)\left(\frac{\|X - O\|}{\|P_i - O\|} - 1\right)$.   (3)

The proposed solution is illustrated in Fig. 3. The novel external force definition for spherical object reconstruction can also be used in the general method.
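A minimal sketch of the redefined force (3), representing vectors as 3-tuples (an illustrative choice, not the authors' code): the force moves a vertex along its deformational ray towards the sphere centred at $O$ with radius $\|X - O\|$.

```python
import math

def spherical_external_force(P_i, X, O):
    """Spherical-projection external force of Eq. (3): move P_i along
    the deformational ray O -> P_i towards the sphere of radius
    |X - O| centred at O."""
    r = tuple(p - o for p, o in zip(P_i, O))   # P_i - O
    d = tuple(x - o for x, o in zip(X, O))     # X - O
    norm = lambda v: math.sqrt(sum(c * c for c in v))
    f = norm(d) / norm(r) - 1.0
    return tuple(c * f for c in r)
```

For example, a vertex at distance 2 from $O$ with its closest data point at distance 1 receives a force that pulls it exactly onto the sphere of radius 1.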
Fig. 3. Illustration of the proposed solution to the problem from Fig. 1. The spherical projection is used in the external force computation instead of the perpendicular projection.
Only the point $O$ must be defined. One possible definition is to take the point $O$ equal to the centre of the initial ellipsoidal mesh. It is assumed that the mesh does not move much and that only its shape and size change during the deformation. The spherical projection is computed in the normal direction of the mesh points. In both cases the gravitation limit $\tilde G$ for the maximal value of an external force is applied. Notice that with this redefinition, the reconstruction of the cube vertices in the first example above reaches the sphere circumscribed about the cube already after the first iteration.
4 Application Example
Both the general method and the star-shaped method with the proposed external force redefinition can be applied to cell nucleus reconstruction from sparse data. Only the results of the star-shaped method are presented, because the results do not differ significantly for the general method.
The test volumetric images of 3D fixed cell nuclei obtained from the stabilised cell line of human colon adenocarcinoma HT-29 (the nuclear envelope was visualised using Lamin B) were acquired by a confocal microscope. The typical size of the data was 140×140×40 voxels. The resolution was 0.124 µm/voxel in the lateral and 0.3 µm/voxel in the axial direction. The images were manually cropped from larger volumetric images. One of the test images is depicted in Fig. 4 (left column) as two cross-sections through the data.
The fractions of the nuclear envelope visible in the images were automatically extracted by image analysis methods (Gaussian smoothing followed by thresholding with a suitable threshold). Results of this step are presented in Fig. 4 (middle column). The extracted voxels were then considered as a point set. The point set was approximated by an ellipsoid in a least-squares manner. The ellipsoid superimposed onto the image data is presented in Fig. 4 (right column). In the last step before running the deformation process, the ellipsoid was covered with an initial mesh.
Fig. 4. Input data. The upper row shows, from left to right: the xy slice (z = 20) of the raw 3D input image, the object boundary points extracted from this slice, and the projection of an initial ellipsoid fitted to the boundary points onto the 20th xy slice; the lower row shows the same data, only the xz slice (y = 50) is shown.
Fig. 5. The final mesh projected onto the raw input data. The upper row shows xy slices at z = 1, 4, 7, 10, 13. The lower row shows xy slices at z = 15, 18, 21, 24, 27.
The mesh was deformed according to the reconstruction algorithm for star-shaped simplex meshes (Sect. 2.3). The external force was computed by the proposed definition (3). The best results were obtained for the following parameters: the initialisation stage (one iteration) was run with $\alpha = 0$, $\beta = 1$, $\tilde G \approx 30\%$ of the dataset size; the refinement stage was run with $\alpha = 0.8$, $\beta = 0.2$, $\gamma = 0.8$, $\tilde G \approx 5\%$ of the dataset size. The projection of the mesh onto the raw data after 100 iterations is shown in Fig. 5. The shape of the final mesh and its relation to the extracted voxels is shown in Fig. 6.
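The initialisation step that fits a quadric to the extracted boundary points can be illustrated with a simplified stand-in: an algebraic least-squares fit of a sphere (rather than the general ellipsoid used in the paper). The function below is only a sketch under that simplification, solving $2\,c\cdot p + k = \|p\|^2$ for the centre $c$ and $k = r^2 - \|c\|^2$:

```python
def fit_sphere(points):
    """Algebraic least-squares sphere fit (simplified stand-in for the
    paper's ellipsoid fit). Solves the 4x4 normal equations of
    2*c.p + k = |p|^2 by Gaussian elimination."""
    A = [[2 * x, 2 * y, 2 * z, 1.0] for (x, y, z) in points]
    b = [x * x + y * y + z * z for (x, y, z) in points]
    # Normal equations M x = v with M = A^T A, v = A^T b
    m = len(A)
    M = [[sum(A[r][i] * A[r][j] for r in range(m)) for j in range(4)]
         for i in range(4)]
    v = [sum(A[r][i] * b[r] for r in range(m)) for i in range(4)]
    # Gaussian elimination with partial pivoting
    for i in range(4):
        p = max(range(i, 4), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        v[i], v[p] = v[p], v[i]
        for r in range(i + 1, 4):
            f = M[r][i] / M[i][i]
            for j in range(i, 4):
                M[r][j] -= f * M[i][j]
            v[r] -= f * v[i]
    x = [0.0] * 4
    for i in range(3, -1, -1):
        x[i] = (v[i] - sum(M[i][j] * x[j] for j in range(i + 1, 4))) / M[i][i]
    cx, cy, cz, k = x
    r = (k + cx * cx + cy * cy + cz * cz) ** 0.5
    return (cx, cy, cz), r
```

Given at least four non-coplanar boundary points, the fit recovers the centre and radius that initialise the mesh.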
Fig. 6. (Left) Range data extracted from the input volumetric image and the fitted mesh. (Right) The final mesh alone.
5 Conclusion
The problem of the applicability of deformable simplex meshes for spherical object reconstruction from sparse data was studied. We have discovered that the general [3] and the star-shaped [11,9] reconstruction algorithms can yield poor results on sparse data. A redefinition of the external forces suitable for spherical object reconstruction was proposed in this paper. The method was tested on real biological data acquired by a confocal microscope. The proposed modification increases the speed of the reconstruction process (the number of iterations is reduced) and the final meshes seem more natural. The assumption about the spherical shape of the reconstructed objects is exploited by the novel external force definition in a natural way.
Acknowledgements
This work was supported by the Ministry of Education of the Czech Republic (Project No. MSM-143300002) and by the Academy of Sciences of the Czech Republic (Grants No. S5004010 and No. B5004102). We thank Irena Koutná for providing the image data. We also thank both reviewers for their useful comments.
References
1. C. O. de Solórzano, E. García Rodriguez, A. Jones, D. Pinkel, J. W. Gray, D. Sudar, and S. J. Lockett. Segmentation of confocal microscope images of cell nuclei in thick tissue sections. Journal of Microscopy, 193:212–226, 1999.
2. H. Delingette. Simplex meshes: a general representation for 3D shape reconstruction. Technical Report 2214, INRIA, France, 1994.
3. H. Delingette. General object reconstruction based on simplex meshes. International Journal of Computer Vision, 32(2):111–146, 1999.
4. F. Guilak. Volume and surface area measurement of viable chondrocytes in situ using geometric modelling of serial confocal sections. Journal of Microscopy, 173(3):245–256, 1994.
5. M. Kozubek, S. Kozubek, E. Lukášová, A. Marečková, E. Bártová, M. Skalníková, and A. Jergová. High-resolution cytometry of FISH dots in interphase cell nuclei. Cytometry, 36:279–293, 1999.
6. L. Kubínová, J. Janáček, F. Guilak, and Z. Opatrný. Comparison of several digital and stereological methods for estimating surface area and volume of cells studied by confocal microscopy. Cytometry, 36:85–95, 1999.
7. G. Lohmann. Volumetric Image Analysis. John Wiley & Sons, Inc. and B. G. Teubner, 1998.
8. W. E. Lorensen and H. E. Cline. Marching cubes: A high resolution 3D surface construction algorithm. In Computer Graphics (SIGGRAPH '87), volume 21, pages 163–169, 1987.
9. P. Matula. Effectivity of spherical object reconstruction using star-shaped simplex meshes. In G. M. Cortelazzo and C. Guerra, editors, 1st International Symposium on 3D Data Processing Visualisation and Transmission, pages 794–799, Padova, Italy, June 2002. IEEE Computer Society.
10. P. Matula. Three-dimensional object reconstruction and its application in cytometry. PhD thesis, Faculty of Informatics, Masaryk University, Brno, September 2002. In Czech.
11. P. Matula and D. Svoboda. Spherical object reconstruction using star-shaped simplex meshes. In M. Figueiredo, J. Zerubia, and A. K. Jain, editors, Energy Minimization Methods in Computer Vision and Pattern Recognition, volume 2134 of LNCS, pages 608–620, Sophia Antipolis, France, September 2001. Springer Verlag.
12. T. McInerney and D. Terzopoulos. Deformable models in medical image analysis: A survey. Medical Image Analysis, 1(2):91–108, 1996.
13. J. Montagnat, H. Delingette, and N. Ayache. A review of deformable surfaces: topology, geometry and deformation. Image and Vision Computing, 19:1023–1040, 2001.
14. H. Netten, I. T. Young, L. J. Van Vliet, H. J. Tanke, H. Vrolijk, and W. C. R. Sloos. FISH and chips: automation of fluorescent dot counting in interphase cell nuclei. Cytometry, 28:1–10, 1997.
A System for Modelling in Three-Dimensional Discrete Space
Andreas Emmerling, Kristian Hildebrand, Jörg Hoffmann, Przemyslaw Musialski, and Grit Thürmer
Computer Graphics, Visualization, Man-Machine Communication Group
Faculty of Media, Bauhaus-University Weimar, 99421 Weimar, Germany
{andreas.emmerling,kristian.hildebrand,joerg.hoffmann,przemyslaw.musialski,grit.thuermer}@medien.uni-weimar.de
Abstract. A system for modelling in three-dimensional discrete space is presented. Objects can be modelled by combining simple shapes with set operations to obtain large and regular shapes. The system also supports aspects of free-form modelling to generate organic and more complex shapes. Techniques known from image processing are applied to transform and to smooth objects. The basic geometric transformations translation, rotation, scaling, and shearing are provided for discrete objects.
1 Introduction
The growing interest of computer graphics in the three-dimensional discrete space opens up a new application field of volume data: volume-based interactive design and sculpting [10,16]. This requires a new modelling approach based on the discrete space which deals with the generation and manipulation of synthetic objects. If a solid object is represented in continuous space by its boundary surfaces, e.g. by a polygon mesh, the manipulation of local geometric properties may affect the entire surface representation. In contrast, such local manipulations can be easily performed if the objects are represented in discrete space. Moreover, objects modelled in discrete space can be directly merged with measured data, e.g. as obtained from computed tomography. This is frequently necessary in applications of virtual reality in medicine [17].
A number of systems have already been developed for modelling in discrete space, which are mainly concerned with special cases either in the way of modelling [2,4] or in the representation of the discrete space [13,5,9]. Rasterising geometric descriptions of continuous objects is one approach; for example, object boundaries are modelled using NURBS [19], and the continuous representation of the boundaries is voxelized to obtain a set of voxels as the discrete representation. Modelling with conventional surface-based modellers and representing the resulting objects in discrete space tries to take advantage of both kinds of representation [6,19,11]. Less hybrid systems have been developed using Constructive Solid Geometry (CSG) [15,2,3], although there CSG is not always related to classical solid geometry.
I. Nyström et al. (Eds.): DGCI 2003, LNCS 2886, pp. 534–543, 2003. © Springer-Verlag Berlin Heidelberg 2003
Instead, simple elementary objects are used to modify a model in the context of free-form modelling by adding voxels to or subtracting voxels from an object by set operations. This modelling technique is often called volume sculpting. Free-form modelling is well suited to model objects which cannot be easily described by basic geometric shapes. Another enhancement of the CSG approach is the idea of sweeping objects: a volume can be created by sweeping a two- or three-dimensional template along three-dimensional trajectories [1,18]. Sweeping gives good results where accuracy is important, since the movement is done along a predefined continuous curve. There exists another extension of CSG for volume graphics, Constructive Volume Geometry (CVG) [8,7], in which the objects are represented by their scalar fields. In case more than one object occupies the same voxel, their scalar values, e.g. colours, are adjusted in this voxel.
We have been developing a system for modelling in three-dimensional discrete space to experiment with different modelling approaches. In the future, we want to have a modelling system which relies on the advantages of representing objects by volumetric data, combined with the functionality and easy handling known from modelling in continuous space. The system works consistently in discrete space: we never make use of an intermediate continuous representation of the objects, as other systems do, e.g. [19]. The first results of our investigation are presented in this paper.
The paper is organized as follows: Section 2 outlines the fundamentals of the modelling system and states our basic assumptions. Section 3 deals with the properties of objects in the system. Section 4 is concerned with the functionality provided by the system to generate and manipulate objects. Afterwards, the implementation and the interface are briefly described in Sect. 5. Finally, Sect. 6 summarizes the paper.
2 Modelling Approach
The $n$-dimensional discrete space $\mathbb{Z}^n$ is constituted by the $n$-dimensional array of points with integer coordinates in the Cartesian coordinate system. An object in discrete space is a subset of $\mathbb{Z}^n$. There exists another approach to define objects in discrete space, based on the assumption that the discrete space is a tessellation of the continuous space: a point in $\mathbb{Z}^n$ is assumed to represent an $n$-dimensional unit cube. In $\mathbb{Z}^3$, such a unit cube is called a voxel. If each unit cube has homogeneous properties, the two representations are basically interchangeable in volume modelling.
We are interested in modelling solid objects in discrete space, i.e. an object in discrete space has homogeneous properties such that each point of the object has the same properties, e.g. colour. As stated above, an object in discrete space is a set of points of $\mathbb{Z}^n$; in particular, in three-dimensional discrete space, an object is a set of voxels. A scene is a set of objects placed in the discrete space, whereby each point of the space is either empty, if it belongs to no object, or full, in which case it belongs to exactly one object.
Fig. 1. Compound object.
Fig. 2. Object hierarchy.
The system supports two basic modelling approaches. On the one hand, there are CSG tools to generate and manipulate large regular shapes by set operations with basic geometric shapes. Such shapes also require the basic geometric transformations, e.g. translation, rotation, scaling, and shearing. On the other hand, there are tools for free-form modelling to obtain organic and more complex shapes. These approaches are described in detail in Sect. 4.
Modelling complex objects or scenes containing a rather large number of objects requires the possibility to compound, decompound, and recompound objects. For this reason, our system supports an object hierarchy which can be dynamically changed by the user. Morphological operations are provided to smooth small details of objects and to identify the hull of objects. How these operations are applied for modelling is described in Sect. 4.4.
3 Object Properties
In our modelling system we consider a finite subset $V$ of $\mathbb{Z}^3$, which we subsequently call the volume buffer. More precisely, $V$ is a regular three-dimensional array of points. As stated in the previous section, an object $O$ in our modelling system is a subset of $\mathbb{Z}^3$ which is located in $V$ such that $O \subset V$. We need some properties of $O$ for the object management and for an efficient processing of $O$. The set $O$ has a certain size $|O|$, expressed by the number of points of $O$. There is no requirement on the connectivity of $O$. However, the connectivity of an object is preserved by the transformations. For example, if an object is a connected set and has no holes before a rotation, then the object must have the same properties after the rotation.
For processing, each object $O$ is associated with a unique identifier which is assigned by the system and is saved in the volume buffer at the positions of the voxels which belong to $O$. Additional meta data like name, colour, and density complete the object representation.
Each object has a certain position in the object hierarchy. An object represented by a leaf in the hierarchy is a simple object which cannot be decompounded further. In contrast, an object is a compound object if it is the result of grouping a number of simple or compound objects. Such an object is represented by an inner node in the hierarchy. The properties of a compound object, like size and bounding box, depend on the simple objects from which it is composed. The object hierarchy for the example in Fig. 1 is illustrated in Fig. 2. For example, the object desk is a compound object and the object desk-top is a simple object. The root node is pre-defined by the system. Any object which is created in the volume buffer is by default a child node of the root, i.e. it is a simple object which does not belong to any compound object. The hierarchy can be changed interactively by the user.
Fig. 3. Views of a Lego block.
Fig. 4. Olympic games.
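The object hierarchy described above can be sketched as a small tree structure in which leaves hold voxel sets and inner nodes aggregate their children. The class and method names below are illustrative only, not the system's actual interface:

```python
class SceneObject:
    """Sketch of the object hierarchy: a simple object is a leaf
    holding a voxel set; a compound object is an inner node whose
    voxel set is the union of its children's voxel sets."""

    def __init__(self, name, voxels=None):
        self.name = name
        self.voxels = set(voxels or [])   # only leaves hold voxels
        self.children = []

    def group(self, child):
        """Attach a child, turning this object into a compound object."""
        self.children.append(child)
        return self

    def all_voxels(self):
        if not self.children:
            return set(self.voxels)
        vs = set()
        for c in self.children:
            vs |= c.all_voxels()
        return vs

    def size(self):
        """|O|: the number of voxels of the (possibly compound) object."""
        return len(self.all_voxels())
```

For example, a compound `desk` built from a `desk-top` and a `base` reports a size derived from its leaves.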
4 Generation and Manipulation of Objects
4.1 Generation, Deletion, Set Operations
Simple objects can be generated by rasterising basic geometric shapes, e.g. sphere, cylinder, cone, or cuboid, and combinations of them. A second way to obtain an object is by setting the voxels of this object one by one, which is not very efficient for large objects. A third way is the import of binary data, e.g. from medical imaging, into our system. This data can subsequently be manipulated in the same way as synthetic data. Of course, all voxels of an object can be deleted at once after the object has been selected. Alternatively, each voxel can be deleted separately.
The set operations union, difference, and average are provided by the system to combine single objects as well as compound objects. An example is presented in Fig. 3, which is the result of combining simple shapes with set operations.
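When voxel objects are represented as Python sets of integer coordinate triples (an illustrative representation, not the system's actual volume buffer), the union and difference operations reduce to the built-in set operators:

```python
# Two toy voxel objects as sets of integer coordinate triples.
blob = {(0, 0, 0), (1, 0, 0), (0, 1, 0)}
box = {(1, 0, 0), (1, 1, 0)}

union = blob | box           # combine two objects into one
difference = blob - box      # subtract one object from another
intersection = blob & box    # keep only the overlapping voxels
```

Each point of a scene then remains either empty or assigned to exactly one object, as required in Sect. 2.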
4.2 Templates
Another way of generating an object is to define a three-dimensional template $T \subset \mathbb{Z}^3$, which can be viewed as a three-dimensional pencil that is moved in space. The voxels which are hit by moving $T$ belong to the object. A template $T$ can also be used like an eraser; then the voxels which are hit by moving $T$ are set to empty. Any simple object can be used as a template. In this way the user is able to define his own tools by dynamically connecting an object with one of the operations fill or erase. This is a very natural way of modelling and is well suited for free-form modelling. Figure 4 shows an example of free-form modelling; the poles and the slate in this figure were modelled by shifting a template.
Fig. 5. Rotation by $-\pi/4$: (a) original object $O$ (grey squares), (b) result after rotating each point of $O$ (dots) and $O'$ (grey squares) after nearest neighbour rounding, (c) original object $O$ (grey squares) and relevant points of the output image after the inverse transformation (dots), and (d) $O'$ (grey squares).
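The template mechanism can be sketched as follows, with the template given as a set of voxel offsets and the trajectory as a list of integer positions. The function name and signature are illustrative, not the system's API:

```python
def sweep(template, trajectory, obj=None, erase=False):
    """Move a template (set of voxel offsets) along a trajectory of
    integer positions. Voxels hit by the template are added to the
    object (fill) or removed from it (erase)."""
    obj = set() if obj is None else set(obj)
    for (px, py, pz) in trajectory:
        hit = {(px + tx, py + ty, pz + tz) for (tx, ty, tz) in template}
        obj = obj - hit if erase else obj | hit
    return obj
```

Sweeping a two-voxel template along two positions, for instance, fills four voxels; sweeping again with `erase=True` removes the hit voxels.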
4.3 Geometric Transformations
Geometric transformations are important for placing and manipulating objects. At the current state, our system provides the affine transformations translation, scaling (and with it also reflection), rotation, and shearing. Applying the corresponding transformation matrix $A$ to each point of an object $O$ works well for the translation. The other transformations cannot be done in this way, since holes may appear in the transformed object. This problem is illustrated in Fig. 5 by an example in two-dimensional space: the object $O$ is rotated by $-\pi/4$ around the centre of the lower left point of $O$. Denote the result by $O'$. In Fig. 5(a) the points of the original object $O$ (grey squares) and their actual representation in $\mathbb{Z}^3$ (little dots) are shown. Figure 5(b) illustrates the result after rotating each point of $O$ separately; the grey squares in (b) show $O'$ after nearest neighbour rounding. A hole arises in $O'$ which does not exist in the original object.
To solve the problem of holes, we use a common approach of image processing which has been adapted to volumetric image processing [12]. The basic idea is to successively fill in new values at each position of the output image. For this, we have to reverse the transformation by inverting the transformation matrix, apply this matrix $A^{-1}$ to each point of the output image, and round the result to its nearest neighbour in $\mathbb{Z}^3$. This is illustrated in Fig. 5(c) and (d) for our example. For simplification, only the points of the output image which hit $O$ are illustrated by dots. Figure 5(d) shows $O'$ (grey squares); note that no hole arises in $O'$. We have adapted this approach for our purposes, keeping in mind that we want to compute these transformations for separate objects only and not for the entire volume buffer.
Fig. 6. Rotation by $-\pi/4$: (a) original object $O$ (grey squares) and the points of $S(B'_O)$ after the inverse transformation (dots and crosses), (b) points of $S(B'_O)$ (dots and crosses) and $O'$ (grey squares).
To determine $O'$, one could apply $A^{-1}$ to each point of $V$. In general, this would be very inefficient. Therefore, the remaining problem is to identify the subset of $V$ in which the transformed object $O'$ is located. Assume $O \subset V$ and $O' \subset V$. In Sect. 3, it was stated that $O$ is associated with its bounding box $B_O$, which fully includes $O$. In fact, a bounding box can be viewed as a continuous cuboid. This is illustrated in Fig. 6 for an example in two-dimensional space: again, the object $O$ is rotated by $-\pi/4$ around the centre of the lower left point of $O$ and the result is denoted by $O'$. In Fig. 6(a) the points of $O$ (grey squares) and its bounding box (thicker lines) are illustrated. Apparently, $O'$ will also be enclosed by the cuboid which represents its bounding box after this cuboid is transformed. Let us denote this transformed cuboid by $B'_O$. We only have to transform the corner points of $B_O$ with $A$ to obtain $B'_O$. Then we determine the points of $V$ which belong to the supercover of $B'_O$, including the interior points. These points of the example in Fig. 6(b) are surrounded by a thick line. Denote this set of points $S(B'_O)$ and assume $S(B'_O) \subset V$. Finally, we calculate the inverse transformation of each point $q \in S(B'_O)$. If the nearest neighbour of $qA^{-1}$ is in $O$, then $q \in O'$. This basic approach for transforming an object in $\mathbb{Z}^3$ can be applied whenever the inverse transformation is known. One should keep in mind that this is not always the most efficient way, as for the translation. The examples shown in Fig. 7 and Fig. 8 are modelled with simple geometric shapes which were transformed by rotation, shearing, scaling, and translation.
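The inverse-mapping approach can be sketched for the two-dimensional rotation example of Figs. 5 and 6. The integer bounding box of the rotated corners below is a simplified stand-in for the supercover $S(B'_O)$, and the function name is ours:

```python
import math

def rotate_object(obj, angle, centre=(0, 0)):
    """Hole-free rotation by inverse mapping (2D sketch): scan a
    candidate output region, apply the inverse rotation to each of
    its points, and keep a point if its nearest neighbour is in obj."""
    cx, cy = centre
    # Candidate region: rotate the bounding-box corners forward and
    # take the integer bounding box of the result.
    xs = [p[0] for p in obj]
    ys = [p[1] for p in obj]
    corners = [(x, y) for x in (min(xs), max(xs)) for y in (min(ys), max(ys))]
    rc = []
    for (x, y) in corners:
        dx, dy = x - cx, y - cy
        rc.append((cx + dx * math.cos(angle) - dy * math.sin(angle),
                   cy + dx * math.sin(angle) + dy * math.cos(angle)))
    out = set()
    for qx in range(math.floor(min(p[0] for p in rc)),
                    math.ceil(max(p[0] for p in rc)) + 1):
        for qy in range(math.floor(min(p[1] for p in rc)),
                        math.ceil(max(p[1] for p in rc)) + 1):
            # Inverse rotation of the output point q, then rounding.
            dx, dy = qx - cx, qy - cy
            sx = cx + dx * math.cos(-angle) - dy * math.sin(-angle)
            sy = cy + dx * math.sin(-angle) + dy * math.cos(-angle)
            if (round(sx), round(sy)) in obj:
                out.add((qx, qy))
    return out
```

Because every output position is tested, the transformed object cannot contain the holes produced by forward point-wise rotation.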
4.4 Morphological Operations
Our system provides a smoothing function to reduce tiny details of objects. For this purpose, we apply morphological filtering, well known from image processing [14]. A morphological transformation is given by the relation of the set of points in question to another (smaller) set of points called the structuring element. Morphological filtering in digital image processing is done with two elementary functions, erosion and dilation, and combinations of them. Erosion shrinks objects by smoothing away their boundaries. Dilation expands objects, fills small holes and connects disjoint parts of an object. Combinations of these functions, namely opening and closing, are used to smooth objects.
Andreas Emmerling et al.
Fig. 7. Stadium.
Fig. 8. Starship.
Fig. 9. Smoothed scene.
Fig. 10. Hollowed cylinder.
Opening is defined as erosion followed by dilation, and closing as dilation followed by erosion. We applied a discrete sphere as structuring element, i.e. a voxelized sphere with a user-defined diameter. However, other structuring elements, e.g. cubes, could be used as well. Figure 9 shows an example of the result of a morphological closing. The lower left side of the image shows the scene before smoothing, and the upper right side shows the result after the closing. Furthermore, we use the result of the erosion to hollow out objects, i.e. to delete the interior of an object. As said above, erosion removes the boundary of an object, i.e. a kind of shell is subtracted from the original object O. The thickness of the shell depends on the structuring element which is applied for the erosion. To hollow out O, the result of the erosion is subtracted from O such that the shell of O is kept. The example of a hollowed cylinder with its bases removed is shown in Fig. 10. The left side of the image shows the shell after hollowing the entire cylinder, and the right side shows the result after hollowing the shell with a smaller structuring element. The morphological operations as described above turned out to be a valuable tool for modelling. The smoothing is particularly important for free-form modelling, and hollowing out objects is frequently useful for modelling regular shapes.
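The closing and hollowing operations described above can be sketched directly on boolean voxel arrays. This is a minimal NumPy sketch under our own assumptions: the helper names and the zero-padding at the volume border are not from the paper, and a real system would use the discrete-sphere structuring element described in the text rather than the full cube used in the test below.

```python
import numpy as np

def erode(vol, se):
    """Binary erosion: a voxel survives iff the structuring element,
    centred on it, fits entirely inside the object."""
    rz, ry, rx = (s // 2 for s in se.shape)
    padded = np.pad(vol, ((rz, rz), (ry, ry), (rx, rx)))
    out = np.ones(vol.shape, dtype=bool)
    for dz, dy, dx in zip(*np.nonzero(se)):
        out &= padded[dz:dz + vol.shape[0],
                      dy:dy + vol.shape[1],
                      dx:dx + vol.shape[2]]
    return out

def dilate(vol, se):
    """Binary dilation (valid for a symmetric structuring element)."""
    rz, ry, rx = (s // 2 for s in se.shape)
    padded = np.pad(vol, ((rz, rz), (ry, ry), (rx, rx)))
    out = np.zeros(vol.shape, dtype=bool)
    for dz, dy, dx in zip(*np.nonzero(se)):
        out |= padded[dz:dz + vol.shape[0],
                      dy:dy + vol.shape[1],
                      dx:dx + vol.shape[2]]
    return out

def closing(vol, se):
    # Closing: dilation followed by erosion; fills small holes.
    return erode(dilate(vol, se), se)

def hollow(vol, se):
    # Hollowing: subtract the erosion from the object, keeping only the shell.
    return vol & ~erode(vol, se)
```

Closing a cube with a one-voxel hole fills the hole; hollowing a solid cube leaves a one-voxel-thick shell.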
5 Implementation and Interface
For the implementation of our modelling system, we decided to rely on a common PC with the ordinary input devices, mouse and keyboard. We give a brief outline of the implementation below. Basically, the system is subdivided into four parts:
– The volume buffer and the data management are responsible for the entire data handling, including the organization of the data flow and the memory management for a session.
– The interface is implemented using QT 3.0. It enables interactive modelling via menu- and icon-based control. Sessions and scenes can be stored on external storage units. The main parts of the interface are illustrated in Fig. 11. We tried to compensate for the disadvantages of the input devices for interacting with a three-dimensional scene by an interface which enables straightforward navigation in and manipulation of the volume buffer. The user is able to move three slices through the volume buffer that are parallel to the three coordinate planes. Objects can be interactively selected and placed in these slices. The slice-based interaction is a first and easy way to model in discrete space. However, it is not always sufficient. Therefore, a numerical manipulation of objects is also supported: a command line interface is provided, and parameters, e.g. for the transformations, can be set numerically. A real-time OpenGL-based visualization unit gives the user visual feedback. As in the examples shown throughout the paper, the objects are visualized by their cuberille representation. We have preferred this representation for the modelling process since the results can be judged well on the voxel level.
– The interaction pool keeps track of all provided interactions between the user and the volume buffer and executes them. The result of an interaction is mapped directly into the volume buffer.
– The object pool manages the object hierarchy and the meta-data associated with each object, and organizes the data flow between the volume buffer and the interactions.
6 Summary
We have presented a system for modelling in three-dimensional discrete space. Our system is intended as an experimental environment for volume modelling. It enables different modelling approaches and works consistently in discrete space. The combination of simple shapes is suitable for generating large and regular shapes. Free-form modelling is supported by defining templates and moving them interactively in the volume buffer. The management of complex objects is facilitated by an object hierarchy. The geometric transformations translation, rotation, scaling, and shearing are provided for objects in discrete space. Morphological operations to smooth and hollow out objects turned out to be a powerful tool for any modelling approach. At the current state, the functionality provided
Fig. 11. Interface.
by the system already supports the modelling of a wide variety of objects. This is illustrated by the examples shown throughout the paper. The development of our modelling system is in progress. As a next step, we are concerned with deformations. At the current state, the interface is not comfortable for an inexperienced user. It will also be a matter of future work to develop a more intuitive interface with an improved visualization unit.
Acknowledgement We would like to thank all people who have contributed to the development and the implementation of the modelling system. In particular, thanks are given to Sebastian Derkau and Marcel Schlönvoigt. Thanks also go to Christoph Lincke and Marko Meister for helpful discussions and comments on the paper.
References
1. Ayasse, J., and Müller, H. Interactive manipulation of voxel volumes with free-formed voxel tools. In Vision, Modeling, and Visualization 2001, T. Ertl, B. Girod, G. Greiner, H. Niemann, and H.-P. Seidel, Eds. IOS Press – infix, 2001, pp. 359–366.
2. Bærentzen, A. Octree-based volume sculpting. In LBHT Proceedings of IEEE Visualization ’98 (October 1998), C. M. Wittenbrink and A. Varshney, Eds.
3. Bærentzen, A., and Christensen, N. J. A technique for volumetric CSG based on morphology. In Volume Graphics 2001, K. Mueller and A. Kaufman, Eds. Springer-Verlag, 2001, pp. 117–130.
4. Bærentzen, J. A., and Christensen, N. J. Volume sculpting using the level-set method. Shape Modeling International 2002, Proceedings (2002), 175–182.
5. Bönning, R., and Müller, H. Interactive sculpturing and visualization of unbounded voxel volumes. In Proceedings 7th ACM Symposium on Solid Modeling and Applications (2002), pp. 212–219.
6. Chen, H., and Fang, S. A volumetric approach to interactive CSG modeling and rendering. In Proceedings 5th ACM Symposium on Solid Modeling and Applications (1999), pp. 318–319.
7. Chen, M., and Tucker, J. V. Constructive volume geometry. Computer Graphics Forum 19, 4 (2000), 281–293.
8. Chen, M., Tucker, J. V., and Leu, A. Constructive representations of volumetric environments. In Volume Graphics, M. Chen, A. E. Kaufman, and R. Yagel, Eds. Springer-Verlag, 2000, pp. 97–117.
9. Chen, M., Winter, A. S., Rodgman, D., and Treavett, S. M. F. Enriching volume modelling with scalar fields. In Data Visualization: The State of the Art, F. Post, G.-P. Bonneau, and G. Nielson, Eds. Kluwer Academic Press, 2002.
10. Kaufman, A., Yagel, R., and Cohen, D. Modeling in volume graphics. In Modeling in Computer Graphics – Methods and Applications, B. Falcidieno and T. L. Kunii, Eds. Springer-Verlag, 1993, pp. 441–454.
11. Liao, D., and Fang, S. Fast volumetric CSG modeling using standard graphics system. In Proceedings 7th ACM Symposium on Solid Modeling and Applications (2002), pp. 204–211.
12. Lohmann, G. Volumetric Image Analysis. Wiley–Teubner, 1998.
13. Savchenko, V. V., Pasko, A. A., Sourin, A. I., and Kunii, T. L. Volume modelling: Representations and advanced operations. In Proc. of Computer Graphics International ’98 (1998), IEEE Computer Society Press, pp. 616–625.
14. Sonka, M., Hlavac, V., and Boyle, R. Image Processing, Analysis, and Machine Vision. PWS Publishing, 1999.
15. Wang, S., and Kaufman, A. Volume sculpting. In Symposium on Interactive 3D Graphics (1995), ACM Siggraph, pp. 151–156.
16. Wang, S. W., and Kaufman, A. E. Volume-sampled 3D modeling. IEEE Computer Graphics and Applications 14, 5 (1994), 26–32.
17. Westwood, J., Hoffman, H., Mogel, G., Phillips, R., Robb, R., and Stredney, D., Eds. Medicine Meets Virtual Reality 11. IOS Press, 2003.
18. Winter, A. S., and Chen, M. Image-swept volumes. Computer Graphics Forum (Proc. Eurographics ’02) 21, 3 (2002), 441–450.
19. Wu, Z., Seah, H. S., and Lin, F. NURBS volume for modelling complex objects. In Volume Graphics, M. Chen, A. E. Kaufman, and R. Yagel, Eds. Springer-Verlag, 2000, pp. 159–167.
Interactively Visualizing 18-Connected Object Boundaries in Huge Data Volumes
Robert E. Loke and Hans du Buf
Vision Laboratory, University of Algarve, 8000-810 Faro, Portugal
{loke,dubuf}@ualg.pt, http://w3.ualg.pt/~dubuf/vision.html
tel: +351 289 800900 ext. 7761, fax: +351 289 819403
Abstract. We present a multiresolution framework for the visualization of structures in very large volumes. Emphasis is given to a new algorithm, embedded in the framework, for triangulating 18-connected object boundaries while preserving 6-connectivity details. Such boundaries cannot be triangulated by standard 6-connectivity algorithms such as Marching Cubes. Results on real sonar images show that the framework allows not only global subbottom structure but also high-resolution objects to be visualized, with reduced CPU time and improved user interactivity.
Keywords: Boundary triangulation, Marching Cubes, Voxel connectivity, Visualization.
1 Introduction
Visualization facilitates the analysis, modeling and manipulation of scalar data volumes. It can be done by direct volume rendering (DVR) or surface rendering [1, 2]. In surface rendering, object boundaries are visualized by first extracting a geometric model of the volume (iso)surfaces and then rendering the model. The advantages are speed and low memory requirements compared to DVR, because the geometric model has to be extracted only once; rotations etc. deal with the model only and are not affected by the entire data volume, as in DVR. Furthermore, realtime shading algorithms and hardware support are available for surface graphics. In this paper we describe our visualization framework, which has in part been published before; see e.g. [3]. However, here we accurately define and extend the embedded boundary triangulation. Below, we describe the framework and the triangulation algorithm used to build surfaces for detected object boundaries (Sections 2 and 3), apply them to a real sonar dataset (Section 4), and give conclusions and directions for future work (Section 5).
2 Interactive Visualization
Similar to other approaches [4, 5], we render surfaces in an octree, aiming at quick (multiresolution) processing and fast user interactivity. Octrees are representations of volumes in which different spatial resolution levels are computed
Fig. 1. Visualization at different resolution levels in the octree spatial data structure.
by sampling or filtering data in blocks of 2×2×2 [6]. They are hierarchical data structures with explicitly defined parent–child relationships: a parent represents 2×2×2 voxels at the next lower level, 2²×2²×2² voxels at the level below that, etc. We use an octree in which low-resolution data at the higher tree levels are determined by spatially smoothing the available data at the lower tree levels. Voxel values at a higher level are the average of all values of non-empty data voxels in non-overlapping blocks of size 2×2×2 at the lower level. This simple processing results in a fast tree construction and facilitates quick data visualizations at low resolutions, because in the tree both the signal noise and the size of gaps decrease, even for huge volumes with a large number of gaps or volumes reconstructed from very noisy data. The loss in spatial resolution at the higher tree levels is compensated using adequate down-projection techniques. In particular, once the data have been classified at a high tree level, the boundaries of the segmented regions are refined by filtering the available data at the lower levels [7]. After first selecting a region of interest (ROI), and registering the selected data to a regular 3D grid, we do all critical processing in an octree. Because the tree construction and the processing at the highest tree levels are very fast, initial coarse visualizations are quickly obtained, such that the ROI can be immediately adjusted. The initial coarse visualizations already give much insight into the structures which are being studied and are only refined, i.e. the data at the lower tree levels are only processed, if the ROI has been correctly set. A first, coarse visualization at the lowest resolution is obtained by interpolating data around gaps, segmenting the volume into regions, and constructing shaded, colored and/or transparent surfaces for all region boundaries; see Fig. 1 (pings are specific underwater acoustic signals which represent vertical columns in the volume). Subsequent visualizations at higher resolutions are obtained by down-projecting and interpolating the available data into gaps, and refining the segmented structures and the constructed surfaces. Importantly, once the data have been visualized, the processing can be stopped at any moment in order to select a new ROI. The processing proceeds only if, according to the user, the ROI has been correctly set. If not, the processing is stopped and a tree is built for another ROI. The octree provides a computational framework in which the following techniques can be employed: (A) the construction of a quadtree that can fill
empty voxel columns [8], (B) a first but coarse visualization at a high tree level in order to rapidly adjust the ROI, and (C) a very efficient triangulation (mesh reduction) that allows for fast interactivity even at the highest detail level. By using one single octree, all this processing can be combined, because (1) gaps can be filled by interpolation, since they are smaller at higher tree levels, (2) connected components can be projected down the tree and refined using the data available there, and (3) triangulations at higher tree levels can be used to steer those at lower levels in order to fill large and smooth surface areas efficiently. After the segmentation (and possibly a connected-component labeling) in the visualization framework, the object boundaries are visualized using surface rendering. Software libraries such as OpenGL (Open Graphics Library) or VRML (Virtual Reality Modeling Language) provide interfaces which enable an interactive analysis of structures by “flying” through and around the structures of interest. Thus, apart from using an octree, we use two extra techniques for improving interactivity: the selection of a ROI and the use of VRML/OpenGL.
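The parent-averaging rule described above (a parent voxel averages the non-empty voxels of its 2×2×2 block, and empty blocks stay empty) can be sketched as follows. The function name `build_levels`, the sentinel value for empty voxels, and the array layout are our own assumptions, not the paper's implementation.

```python
import numpy as np

def build_levels(volume, empty=0.0, n_levels=4):
    """Build an octree-like pyramid from a 3D array: each parent voxel is
    the average of the non-empty voxels in its 2x2x2 child block."""
    levels = [volume]
    for _ in range(n_levels - 1):
        v = levels[-1]
        # Group voxels into non-overlapping 2x2x2 blocks.
        z, y, x = (s // 2 for s in v.shape)
        blocks = v[:2 * z, :2 * y, :2 * x].reshape(z, 2, y, 2, x, 2)
        blocks = blocks.transpose(0, 2, 4, 1, 3, 5).reshape(z, y, x, 8)
        filled = blocks != empty
        count = filled.sum(axis=-1)
        total = np.where(filled, blocks, 0.0).sum(axis=-1)
        # Average over non-empty voxels only; blocks with no data stay empty.
        parent = np.where(count > 0, total / np.maximum(count, 1), empty)
        levels.append(parent)
    return levels
```

For a 4×4×4 volume whose first block contains the values 2 and 4 (and is otherwise empty), the parent voxel becomes 3, while blocks with no data remain empty.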
3 Triangulation
The well-known Marching Cubes (MC) algorithm, as well as topology-improved [9] and efficiency-enhanced (in terms of a reduced triangle count) versions, are all based on locally triangulating cuberille (2×2×2 voxel) configurations. Other surface construction algorithms decompose the cuberilles into voxels or tetrahedra, use boxes instead of cuberilles, use polyhedra or polygonal volume primitives instead of triangles, use rules instead of a look-up table for cuberille configurations, use heterogeneous grids to guarantee topologically coherent (closed, oriented) surfaces, or optimize the search for relevant cuberilles. In contrast to all these algorithms, we:
1. Triangulate object boundaries by mapping complete 3×3×3 neighborhoods to polygons. This allows the polygons to be optimized locally.
2. Interpolate between the coordinates of boundary voxels. This improves the smoothness of the built surfaces.
3. Allow 18-connectivity for objects (as in [9]; unlike the 6-connectivity in MC)¹. This allows surfaces to be constructed for boundaries which are not connected according to a 6-connectivity model, e.g. an object boundary which is tilted and thinned, see Fig. 2 (left).
Our algorithm is based on a property of non-intersecting surfaces, excluding the edges: for such surfaces, each point on the surface has exactly four neighboring
¹ Here we note that two voxels are n-connected (n = 6, 18, 26) if there exists a path between the voxels such that all subsequent voxels on the path are maximally n-adjacent to one another. Two voxels are n-adjacent if they are n-neighbors. The 6-neighborhood (respectively, 18-, 26-neighborhood) of a voxel at (x, y, z) is comprised by those voxels for which |x − a| + |y − b| + |z − c| = 1 (2, 3), with (a, b, c) arbitrary voxel coordinates. Thus, 6-connected voxels are also 18-connected and 26-connected, but 18-connected ones are not necessarily 6-connected, and 26-connected ones not necessarily 18- or 6-connected.
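The adjacency relations in the footnote can be sketched as a small predicate. This is a hypothetical helper, not from the paper; we read the neighborhood condition as a city-block bound of 1, 2 or 3 combined with coordinate differences of at most 1.

```python
def n_adjacent(p, q, n):
    """Test whether voxels p and q are n-adjacent (n = 6, 18, 26).

    q lies in the n-neighborhood of p iff each coordinate differs by at
    most 1 and the city-block distance does not exceed 1, 2 or 3,
    respectively.
    """
    d = [abs(a - b) for a, b in zip(p, q)]
    if max(d) > 1 or sum(d) == 0:  # too far apart, or the same voxel
        return False
    return sum(d) <= {6: 1, 18: 2, 26: 3}[n]
```

For example, a diagonal neighbor in a coordinate plane is 18-adjacent but not 6-adjacent, and a corner neighbor is only 26-adjacent.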
Fig. 2. Two examples of 3×3×3 voxel neighborhoods. Boundary voxels are grey. On the left, the boundary is 6-connected in z (into the paper) and 18-connected in the (x, y)-plane. On the right, it is 6-connected in the (x, y)-plane or, put differently, 18-connected with 6-connected shortcuts. The 2nd and 3rd layer have been shifted to the right in order to show their contents.
points which are also located on the surface. In discrete terms, this means that between a boundary voxel a and another boundary voxel in its neighborhood, say b, two other adjacent boundary voxels c and d are needed to form a surface patch a−c−b−d. Below, we will distinguish between two types of voxels: face voxels and non-face voxels. Figure 3 (a) shows the definition of face voxels in the 3×3×3 neighborhood N of a boundary voxel B. Since we assume up to 18-connectivity for objects, a boundary voxel in N is a face if it is 6-connected to B, but also if it is 18-connected to B and no other boundary voxel can be found which 6-connects the voxel and B. If a boundary voxel in N is not a face, we call it a non-face. Furthermore, we will model the boundary topology using a very small set of configurations with, in each configuration, varying connectivity paths between a and b. In these configurations, a boundary voxel is 18-connected and sometimes 6-connected to each other boundary voxel in its 26-neighborhood. Then, by defining a surface patch for each configuration, object boundaries can be mapped to surfaces. Finally, we will extend the set of configurations in order to correctly model and map non-thin boundaries, i.e. boundaries with additional 6-connectivity paths between a and b.
3.1 Boundary Definition
In order to triangulate the boundaries in a volume, we must first determine all boundary voxels. Here, we define a voxel at (x, y, z) to be part of a component’s boundary if at least one of the values of the voxels at (x + 1, y, z), (x − 1, y, z), (x, y + 1, z), (x, y − 1, z), (x, y, z + 1) and (x, y, z − 1) differs from its own value. However, the triangulation is not restricted by this definition, i.e. other boundary definitions may be used, employing for example 18- and 26-neighborhoods. Obviously, the resulting boundaries are not necessarily smooth, e.g. they may contain sharp corners/edges: in neighborhoods, boundaries may be both 6- and 18-connected, see Fig. 2 (right). Thinning can be used to remove boundary voxels which do not contribute to the connectivity of the boundary. This normally decreases the triangle counts of the resulting surfaces. However, in some applications this leads to undesired information loss or deformations. We note that for a correct application of our algorithm, thinning may be done but is not required.
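This 6-neighborhood boundary definition can be sketched on a labelled volume as follows. The function name and the replicated-edge handling at the volume border are our own assumptions; the paper does not specify how voxels at the edge of the volume are treated.

```python
import numpy as np

def boundary_voxels(vol):
    """Mark voxels whose value differs from at least one of their six
    face neighbors (the boundary definition used in the text)."""
    # Replicate edge values so that border voxels are compared only to
    # data inside the volume (an assumption, not from the paper).
    padded = np.pad(vol, 1, mode='edge')
    c = padded[1:-1, 1:-1, 1:-1]
    mask = np.zeros(vol.shape, dtype=bool)
    for axis in range(3):
        for shift in (-1, 1):
            # Shift the padded volume by one voxel along this axis and
            # compare the neighbor values against the center values.
            nb = np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
            mask |= nb != c
    return mask
```

For a single foreground voxel, the boundary mask marks that voxel and its six face neighbors, but not the diagonal corners.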
Fig. 3. Triangulation look-up table for boundary configurations with varying connectivity between voxel pair a = (0, 0, 0) and b = (1, 1, 0/1), or triplet a, c = (0, 1, 1) and d = (1, 0, 1). Only some configurations for one octant in the 3×3×3 neighborhood around a boundary voxel are shown; the other configurations for the same octant and those for the other octants are obtained by mirroring. Boundary voxels are either black or grey; background voxels white. Grey spheres denote face voxels. Corners on the cubes without spheres are positions which do not affect the connectivity.
3.2 Boundary Matching
We triangulate the volumetric boundaries locally, in the 3×3×3 neighborhood N around each boundary voxel B. We independently map the boundary in each of the eight octants in N to multiple vertex lists, such that in each list the coordinates and the order of the vertices of a matched boundary configuration are defined. This decomposition into octants allows us to: (A) reduce the total number of configurations, and (B) correctly map neighborhoods at edges of boundaries. Figure 3 (b), (c) and (d) show configurations for the octant in N with positive x, y and z coordinates, together with the triangles which are to be applied. The configurations for the other octants are obtained by mirroring about the planes x = 0, y = 0 and z = 0, about the x, y and z axes, or about B. The total number of configurations has been reduced using mirroring about the diagonal planes x = y, x = z and y = z. We² obtained the configurations by: (A) determining the set of all valid (i.e., 6- and/or 18-connected) a−c−b−d boundary voxel patterns in N (yielding Fig. 3 (b) and (c)); (B) extending the resulting set by increasing
² Similar configurations have also been obtained from a theoretical approach [10], providing a topological validation of the object surfaces built by our algorithm.
the boundary connectivity in all patterns (yielding Fig. 3 (d)). There are in total 12 configurations, which are divided into three types: (1) Configurations with four boundary voxels, in which the non-face is 6- and/or 18-connected to B using exactly two faces. (2) A special configuration with three boundary voxels, in which the non-face is (assumed to be) located outside N and is 18-connected, again using exactly two faces. This configuration corresponds to the case in which the position of the center is (x, y, z) and two faces exist at (x + 1, y, z + 1) and (x, y + 1, z + 1). Then the voxel which is adjacent to both faces may be positioned outside N, at (x + 1, y + 1, z + 2). (3) Configurations with more than four boundary voxels, in which the 18-connected voxels in (1) are now also 6-connected. The latter configurations we call shortcuts, because they add an extra 6-connectivity to an already existing 18-connectivity. For each configuration a vertex list is defined, which specifies the coordinates and the order of the vertices which are to be applied in the triangulation. Vertex coordinates are determined for each boundary voxel in N (apart from B, whose voxel and vertex coordinates are (0,0,0)) in one of two ways, depending on whether it is a face of B or not. The vertex position computed for each face is the average of the voxel coordinates of B and the face. The vertex position of each non-face is the average of the four neighboring voxel coordinates, except for the one in the special non-face configuration, for which the coordinates are (0.33, 0.33, 0.67), and the additional non-faces in the shortcut configurations. All vertex coordinates can be derived from Fig. 3; e.g., for the second shortcut (column 1, row 2) the vertex list is {(0.5, 0, 0), (0.5, 0.5, 0.5), (0, 0.5, 0.5), (0, 0.5, 0)}. We match each octant in N with all (mirrored) configurations. If a configuration matches an octant, the (mirrored) vertex list of the configuration is stored.
By using “don’t care” voxels, i.e. voxels which may belong to either the boundary or the background, multiple configurations can match the same octant. This allows “sharp” boundaries to be mapped correctly to surfaces. We note that this even allows intersecting boundaries to be mapped, but that for intersections the linking algorithm [11] is not trivial. The neighborhood matching results in a number of vertex lists, which must be stored for all positive matches, in each of the eight octants. The order of the vertices in each list is implicitly defined in Fig. 3. After the matching, the order of the vertex lists is determined by linking all vertex lists, and the final patch can be triangulated and optimized [11]. Also, a normal vector is attributed to each patch for surface shading. Figure 4 shows surface patches obtained by triangulating the boundaries of a cube of size 16×16×16 and a sphere of radius 14, without any patch optimization.
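The two vertex-placement rules used in the matching can be sketched as small helpers (the function names are hypothetical; coordinates are voxel coordinates relative to B at (0, 0, 0), and the special and shortcut cases get their coordinates from the look-up table instead):

```python
def face_vertex(b, f):
    # Face voxel: the vertex is the midpoint between the boundary voxel B
    # and the face voxel.
    return tuple((p + q) / 2 for p, q in zip(b, f))

def non_face_vertex(a, c, b, d):
    # Non-face voxel: the vertex is the average of the four voxels of the
    # surface patch a-c-b-d.
    return tuple(sum(v) / 4 for v in zip(a, c, b, d))
```

For a face at (1, 0, 0) this reproduces the vertex (0.5, 0, 0) from the example vertex list in the text, and averaging the patch voxels (0,0,0), (0,1,1), (1,1,0), (1,0,1) yields (0.5, 0.5, 0.5).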
4 Visualization Results
We obtained several 3D datasets by manoeuvring vessels mounted with bottom-penetrating sonar in shallow water areas. Dataset sizes may range up to several GBs per seabed, and this will grow further due to increasing demands on sampling rate and trace size. Obviously, it is impossible to conduct a vessel such that an entire site is scanned, which implies that a lot of 3D data are missing. Commonly, sonar operators need to explore data at different scales. They may want to visualize a large area of a seabed, but also a small part, for example when they look for objects. The required analyses demand different sampling rates. Here, we will show volumetric reconstructions of a seabed at two different scales, in two different ROIs: a large region in which the voxel size equals 3.8×4.5×0.6 m³ and a small region with a voxel size of 0.5×0.5×0.08 m³.
Fig. 4. Wireframes of 1/8th part of a cube and a sphere (top left), and shaded and optimized surfaces of detected subbottom structures in the large ROI at octree level 2 (top right) and 1 (bottom). Note the improvement in detail when refining structures from level 2 to 1.
Figure 4 shows shaded surfaces with wireframes for boundaries of the structures found in the large ROI. The images were obtained by mapping all data from one site to a regular grid of size 32×128×128; thereafter, 29% of the volume was filled. The octree consisted of 4 levels. For these images, we did not apply any interpolation. We directly constructed the tree and projected the segmented boundaries from the highest tree level down to the lower levels using a robust planar filtering on the boundary data [7]. CPU times on an SGI Origin 200QC server, using all 4 processors and including disk IO, were 1.6, 1.3, 4.4 and 19.7 s at octree levels 3, 2, 1 and 0. We note that the Origin has MIPS R10000 processors at 180 MHz, and that a normal Pentium III computer at 733 MHz is faster than one Origin processor by a factor of 2.2. Using the latest GHz processors, the total time, about 27 s, can be reduced to less than 4 s. Hence, our visualization framework can be applied in realtime for routine inspection and interpretation. Ideally, octrees can be used for visualizing large structures in huge data volumes at high tree levels and small ones at low levels. Here we have preferred to select another, much smaller ROI, and to reconstruct another volume (of size 384×64×700) at a much higher spatial resolution. The vertical resolution in depth was increased by averaging and sampling each underwater acoustic signal with a mask of size 2 (for the large ROI a mask of size 40 was used).
In order to automatically detect and visualize the sewage pipes which appear at this level of detail, and to cope with the increased data sparseness (this volume was filled for only 9%), we performed additional inter-slice interpolations. In these interpolations, we match/correlate voxel columns in order to correctly obtain single surface reflections and to avoid artificial double/multiple reflections [8]. Hereafter, an octree of three levels was built in order to interpolate remaining gaps, automatically detect the pipes and triangulate their boundaries. The CPU time was 228 s for the inter-slice interpolation and 28, 127 and 241 s for the octree processing at levels 2, 1 and 0. These times are much larger than those for the large ROI. However, again, using the latest GHz processors, the octree times can be reduced to less than 4, 19 and 35 s, and the time for the extra interpolations can be reduced to less than 33 s. The optimized time needed for a complete processing at the highest tree level, 37 s, enables application of the framework for routine inspection and interpretation work, in near realtime. Figure 5 shows the seafloor and some semi-buried pipeline segments, as well as a zoom-in of one segment, seen from different viewpoints. It is even possible to “look through” the pipe. Although a correct reconstruction of the seabottom is a very difficult task, due to the sparseness of and the noise in the data, these volumetric reconstructions allow for a detailed exploration and analysis of the seabed.
Fig. 5. Optimized and shaded seabottom surfaces in the small ROI at octree level 2, 1 and 0, and a sewage pipe seen from three viewpoints at octree level 1. These surfaces can be extracted from an incomplete volume of very noisy sonar data, sized 384×64×700, in near realtime.
5 Conclusions
We use multiresolution octrees for interactively visualizing large data volumes in (near) realtime. Application to volumes reconstructed from a very large sonar dataset showed that octree visualizations facilitate a fast seabottom analysis and/or a fast search for objects in the subbottom, even for volumes reconstructed from very noisy data and for volumes with a large number of unknown voxel values. In the future we will look for further applications, aiming at further fine-tuning and optimization of the embedded techniques, in order to enable a fast processing of huge datasets, thereby focusing on a fast user interactivity.
Acknowledgements The data were obtained at Åsgårdstrand, a site in the North Sea close to the city of Horten, in the Oslofjord, in the European MAST-III ISACS Project (http://w3.ualg.pt/isacs/), contract MAS3-CT95-0046. The visualizations in Figs. 4 and 5 have been partly obtained with VRMLview, from Systems in Motion, Oslo, Norway. This work was partially supported by the FCT Programa Operacional Sociedade de Informação (POSI) in the frame of QCA III. Loke is currently involved in a Portuguese project on 3D modeling from video, contract POSI/SRI/34121/1999.
References
1. T. T. Elvins, “A survey of algorithms for volume visualization,” Computer Graphics, vol. 26, no. 3, pp. 194–201, 1992.
2. A. Kaufman, Volume Visualization. Los Alamitos (CA), USA: IEEE Computer Society Press Tutorial, 1991.
3. R. E. Loke and J. M. H. du Buf, “Sonar object visualization in an octree,” in Proc. OCEANS 2000 MTS/IEEE Conf., Providence (RI), USA, 2000, pp. 2067–2073.
4. J. Wilhelms and A. Van Gelder, “Multi-dimensional trees for controlled volume rendering and compression,” in 1994 ACM Symposium on Volume Visualization, Tysons Corner (VA), USA: ACM Press, 1994, pp. 27–34.
5. D. Meagher, “Geometric modeling using octree encoding,” Computer Graphics and Image Processing, vol. 19, no. 2, pp. 129–147, 1982.
6. H. Samet, Applications of Spatial Data Structures: Computer Graphics, Image Processing, and GIS. Reading (MA), USA: Addison-Wesley, 1990.
7. R. E. Loke and J. M. H. du Buf, “3D data segmentation by means of adaptive boundary refinement in an octree,” Pattern Recognition, 2002, submitted.
8. R. E. Loke and J. M. H. du Buf, “Quadtree-guided 3D interpolation of irregular sonar data sets,” IEEE J. Oceanic Eng., 2003, to appear.
9. J. O. Lachaud and A. Montanvert, “Continuous analogs of digital boundaries: A topological approach to iso-surfaces,” Graphical Models and Image Processing, vol. 62, pp. 129–164, 2000.
10. M. Couprie and G. Bertrand, “Simplicity surfaces: a new definition of surfaces in Z³,” in Proc. SPIE Vision Geometry VII, vol. 3454, 1998, pp. 40–51.
11. R. E. Loke and J. M. H. du Buf, “Linking matched cubes: efficient triangulation of 18-connected 3D object boundaries,” The Visual Computer, 2003, to appear.
Author Index

Alata, Olivier 288
Andres, Eric 246
Arcelli, Carlo 124, 298
Ayala, Dolors 338
Balázs, Péter 388
Balogh, Emese 388
Barneva, Reneta P. 72
Ben Hamza, A. 378
Bertrand, Gilles 236
Biasotti, Silvia 194
Bihoreau, Camille 288
Bloch, Isabelle 16
Braquelaire, Achille 257
Breton, Rodolphe 246
Brimkov, Valentin E. 72
Brlek, Srečko 277
Brunetti, Sara 398
Buf, Hans du 544
Caron, Yves 495
Charpentier, Harold 495
Chassery, Jean-Marc 102
Coeurjolly, David 327
Couprie, Michel 62, 236
Crespo, Jose 475
Damiand, Guillaume 288, 408
Daragon, Xavier 236
Daurat, Alain 114, 398
Deguchi, Koichiro 465
D'Elia, Ciro 204
Dupont, Florent 102, 246
Emmerling, Andreas 534
Falcidieno, Bianca 194
Floriani, Leila De 454
Fouard, Céline 214
Gelli, Daniele 504
Giga, Mi-Ho 465
Giga, Yoshikazu 465
Gonzalez-Diaz, Rocio 92
Grau, Antoni 267
Hanbury, Allan 134
Hildebrand, Kristian 534
Hoffmann, Jörg 534
Hontani, Hidekata 465
Ikonen, Leena 308
Imiya, Atsushi 144, 444
Jonker, Pieter P. 317, 420
Kenmochi, Yukiko 144
Kerautret, Bertrand 257
Kingston, Andrew 485
Kiryati, Nahum 358
Köthe, Ullrich 82
Kopperman, Ralph 1
Krim, Hamid 378
Kropatsch, Walter G. 134
Kuba, Attila 388
Labelle, Gilbert 277
Lacasse, Annie 277
Lachaud, Jacques-Olivier 434
Lakämper, Rolf 34
Latecki, Longin Jan 34
Lienhardt, Pascal 408
Lindblad, Joakim 348
Linh, Truong Kieu 444
Lohmann, Gabriele 358
Loke, Robert E. 544
Makris, Pascal 495
Malandain, Grégoire 214
Maojo, Victor 475
Marchadier, Jocelyn 134
Marini, Simone 194
Matula, Pavel 514, 524
Morando, Franco 454
Mortara, Michela 194
Musialski, Przemyslaw 534
Najman, Laurent 62
Normand, Nicolas 154
Nouvel, Bertrand 174
Nyström, Ingela 368
Patanè, Giuseppe 194
Pla, Filiberto 164
Puppo, Enrico 454
Real, Pedro 92
Rémila, Éric 174
Remy, Eric 224
Rodríguez, Jorge 338
Ros, Lluís 338
Saha, Punam K. 368
Sanfeliu, Alberto 267
Sanniti di Baja, Gabriella 124
Scarpa, Giuseppe 204
Serino, Luca 298
Serratosa, Francesc 267
Sivignon, Isabelle 102, 246
Sladoje, Nataša 368
Soille, Pierre 52
Spagnuolo, Michela 194
Staffetti, Ernesto 267
Stelldinger, Peer 82
Svalbe, Imants 485
Svensson, Stina 124, 317, 420
Svoboda, David 514, 524
Tabbone, Salvatore 184
Tajine, Mohamed 114
Thiel, Edouard 224
Thomas, Federico 338
Thürmer, Grit 534
Toivanen, Pekka 308
Traver, V. Javier 164
Vargas-Vazquez, Damián 475
Vialard, Anne 434
Vincent, Nicole 495
Vitulano, Domenico 504
Wendling, Laurent 184
Windreich, Guy 358
Wolter, Diedrich 34