Pattern Recognition 33 (2000) 1405–1410
On the independence of rotation moment invariants

Jan Flusser*

Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic, Pod vodárenskou věží 4, 182 08 Prague 8, Czech Republic

Received 28 August 1998; accepted 19 May 1999
Abstract

The problem of the independence and completeness of rotation moment invariants is addressed in this paper. First, a general method for constructing invariants of arbitrary orders by means of complex moments is described. As a major contribution of the paper, it is shown that for any set of invariants there exists a relatively small basis by means of which all other invariants can be generated. The method of constructing such a basis and of proving its independence and completeness is presented. Some practical impacts of the new results are mentioned at the end of the paper. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Moment invariants; Complex moments; Independence; Completeness
1. Introduction

Moment invariants have become a classical tool for object recognition during the last 30 years. They were first introduced to the pattern recognition community by Hu [1], who employed the results of the theory of algebraic invariants [2,3] and derived his seven famous invariants to rotation of 2-D objects:
"k #k ,
"(k !k )#4k ,
"(k !3k )#(3k !k ),
"(k #k )#(k #k ),
"(k !3k )(k #k )((k #k ) !3(k #k ))#(3k !k )(k #k ) ;(3(k #k )!(k #k )),
"(k !k )((k #k )!(k #k )) #4k (k #k )(k #k ),
* Tel.: #420-2-6605-2357; fax: #420-2-6641-4903. E-mail address: #
[email protected] (J. Flusser).
"(3k !k )(k #k )((k #k ) !3(k #k ))!(k !3k )(k #k ) ;(3(k #k )!(k #k )),
(1)
where
k " NO
(x!x )N(y!y )O f (x, y) dx dy A A \ \
(2)
is the central moment of the object $f(x, y)$ and $(x_c, y_c)$ are the coordinates of the object centroid. Hu also showed how to achieve invariance to scaling and demonstrated the discriminative power of these features in the recognition of printed capital characters.
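To make the definitions concrete, here is a minimal numpy sketch (not from the original paper) that evaluates the central moments (2) of a discrete gray-level image and Hu's seven invariants (1); the function names and the 90-degree rotation check are illustrative choices.

```python
import numpy as np

def central_moment(img, p, q):
    """Discrete form of Eq. (2): central moment mu_pq of a gray-level image."""
    y, x = np.mgrid[:img.shape[0], :img.shape[1]].astype(float)
    m00 = img.sum()
    xc, yc = (x * img).sum() / m00, (y * img).sum() / m00
    return ((x - xc) ** p * (y - yc) ** q * img).sum()

def hu_invariants(img):
    """Hu's seven rotation invariants, Eq. (1)."""
    mu = {(p, q): central_moment(img, p, q)
          for p in range(4) for q in range(4 - p)}
    A, B = mu[3, 0] + mu[1, 2], mu[2, 1] + mu[0, 3]
    phi1 = mu[2, 0] + mu[0, 2]
    phi2 = (mu[2, 0] - mu[0, 2]) ** 2 + 4 * mu[1, 1] ** 2
    phi3 = (mu[3, 0] - 3 * mu[1, 2]) ** 2 + (3 * mu[2, 1] - mu[0, 3]) ** 2
    phi4 = A ** 2 + B ** 2
    phi5 = ((mu[3, 0] - 3 * mu[1, 2]) * A * (A ** 2 - 3 * B ** 2)
            + (3 * mu[2, 1] - mu[0, 3]) * B * (3 * A ** 2 - B ** 2))
    phi6 = (mu[2, 0] - mu[0, 2]) * (A ** 2 - B ** 2) + 4 * mu[1, 1] * A * B
    phi7 = ((3 * mu[2, 1] - mu[0, 3]) * A * (A ** 2 - 3 * B ** 2)
            - (mu[3, 0] - 3 * mu[1, 2]) * B * (3 * A ** 2 - B ** 2))
    return np.array([phi1, phi2, phi3, phi4, phi5, phi6, phi7])

# a 90-degree rotation is an exact lattice rotation, so all seven values survive
img = np.random.rand(64, 64)
assert np.allclose(hu_invariants(img), hu_invariants(np.rot90(img)))
```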
Since then, numerous works have been devoted to various improvements and generalizations of Hu's invariants and also to their use in many application areas. Dudani [4] and Belkasim [5] described their application to aircraft silhouette recognition; Wong and Hall [6], Goshtasby [7] and Flusser and Suk [8] employed moment invariants in template matching and registration of satellite images; Mukundan [9,10] applied them to estimate the position and attitude of an object in 3-D space; Sluzek [11] proposed using local moment invariants in industrial quality inspection; and many authors used moment invariants for character recognition [5,12–15]. Maitra [16] and Hupkens [17] made them invariant also to contrast changes, Wang [18] proposed illumination invariants particularly suitable for texture classification, Van Gool [19] achieved photometric invariance, and Flusser et al. [20,21] described moment invariants to linear filtering. Several papers studied recognitive and reconstruction aspects, noise tolerance and other numerical properties of various kinds of moment invariants and compared their performance experimentally [5,22–27]. Moment invariants were also shown to be a useful tool for geometric normalization of an image [28,29]. A large amount of effort has been spent on finding effective algorithms for moment calculation (see [30] for a survey). Recently, Flusser and Suk [31] and Reiss [32] have corrected some mistakes in Hu's theory and have derived invariants to affine transforms.

In contrast to the large number of application-oriented works, only a few attempts have been made to derive invariants from moments of orders higher than three. Li [33] and Wong [34] presented systems of invariants up to the orders nine and five, respectively. Unfortunately, neither paid attention to the mutual dependence/independence of the invariants. The invariant sets presented in their papers are algebraically dependent. There is also a group of papers [14,35,36] that use Zernike moments to construct rotation invariants. Their motivation comes from the fact that Zernike polynomials are orthogonal on the unit circle. Thus, Zernike moments do not contain redundant information and are more convenient for image reconstruction. However, Teague [35] showed that the Zernike invariants of second and third orders are equivalent to Hu's ones when expressed in terms of geometric moments. He presented the invariants up to the eighth order in explicit form but gave no general rule for deriving them. Wallin [36] described an algorithm for the formation of moment invariants of any order. Since Teague [35] as well as Wallin [36] were particularly interested in the reconstruction abilities of the invariants, they did not pay much attention to the question of independence. However, the independence of features is a fundamental issue in all pattern recognition problems, especially in the case of a high-dimensional feature space.

The benefit of this paper is twofold. First, a general scheme for deriving moment invariants of any order is presented. Secondly, we show that there exists a relatively small set (basis) of invariants by means of which all other invariants can be expressed, and we give an algorithm for its construction. As a consequence of this, we show that most of the previously published sets of rotation moment invariants, including Hu's system (1), are dependent. This is a surprising result, giving a new look at Hu's invariants and possibly yielding a new interpretation of some previous experimental work.
2. A general scheme for deriving invariants

There are various approaches to the theoretical derivation of moment-based rotation invariants. Hu [1] employed the theory of algebraic invariants, Li [33] used the Fourier–Mellin transform, Teague [35] and Wallin [36] proposed to use Zernike moments, and Wong [34] used complex monomials which also originate from the theory of algebraic invariants. In this paper, we present a new scheme based on complex moments. The idea of using complex moments to derive invariants was already described by Abu-Mostafa and Psaltis [24], but they concentrated on the evaluation of the invariants rather than on constructing higher-order systems. In comparison with the previous approaches, this one is more transparent and allows the mutual dependence/independence of the invariants to be studied easily. It should be noted that all the above approaches differ from each other formally in the mathematical tools and notation used, but the general idea behind them is common and the results are similar or even equivalent.

The complex moment $c_{pq}$ of order $(p+q)$ of an integrable image function $f(x, y)$ is defined as
c " NO
(x#iy)N(x!iy)Of (x, y) dx dy, \ \
(3)
where $i$ denotes the imaginary unit. Each complex moment can be expressed in terms of geometric moments $m_{pq}$ as

$$c_{pq} = \sum_{k=0}^{p} \sum_{j=0}^{q} \binom{p}{k} \binom{q}{j} (-1)^{q-j}\, i^{p+q-k-j}\, m_{k+j,\, p+q-k-j}. \qquad (4)$$
In polar coordinates, Eq. (3) takes the form

$$c_{pq} = \int_{0}^{\infty}\int_{0}^{2\pi} r^{p+q+1}\, e^{i(p-q)\theta} f(r, \theta)\, d\theta\, dr. \qquad (5)$$
It follows immediately from Eq. (5) that $c_{pq} = c_{qp}^*$ (the asterisk denotes complex conjugation). The following lemma describes an important rotation property of the complex moments.

Lemma 1. Let $f'$ be a rotated version (around the origin) of $f$, i.e. $f'(r, \theta) = f(r, \theta + \alpha)$, where $\alpha$ is the angle of rotation. Let us denote the complex moments of $f'$ as $c'_{pq}$. Then

$$c'_{pq} = e^{-i(p-q)\alpha}\, c_{pq}. \qquad (6)$$
Using Eq. (5), the proof of Lemma 1 is straightforward. It can be seen immediately that $|c_{pq}|$ is invariant to rotation for any $p$ and $q$. However, the moment magnitudes do not generate a complete set of invariants. In the following theorem, we propose a better approach to the construction of rotation invariants.
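Lemma 1 is also easy to check numerically. The sketch below is ours, under the assumption that the image is represented as a weighted point set; none of the names come from the paper.

```python
import numpy as np

def complex_moment(pts, w, p, q):
    """Discrete analogue of Eq. (3): c_pq of a weighted point set."""
    z = pts[:, 0] + 1j * pts[:, 1]
    return np.sum(w * z ** p * np.conj(z) ** q)

rng = np.random.default_rng(0)
pts = rng.normal(size=(500, 2))          # point masses around the origin
w = rng.random(500)                      # gray-level weights
alpha = 0.7                              # rotation angle (radians)

# f'(r, theta) = f(r, theta + alpha) moves each point mass by -alpha
c, s = np.cos(-alpha), np.sin(-alpha)
rot = pts @ np.array([[c, -s], [s, c]]).T

for p, q in [(2, 0), (2, 1), (3, 1)]:
    c_pq = complex_moment(pts, w, p, q)
    c_rot = complex_moment(rot, w, p, q)
    assert np.isclose(c_rot, np.exp(-1j * (p - q) * alpha) * c_pq)   # Eq. (6)
    assert np.isclose(abs(c_rot), abs(c_pq))                         # |c_pq| invariant
```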
Theorem 1. Let $n \ge 1$ and let $k_i$, $p_i$ and $q_i$; $i = 1, \ldots, n$, be non-negative integers such that

$$\sum_{i=1}^{n} k_i (p_i - q_i) = 0.$$

Then

$$I = \prod_{i=1}^{n} c_{p_i q_i}^{k_i} \qquad (7)$$

is invariant to rotation.
The proof of Theorem 1 follows immediately from Lemma 1. According to Theorem 1, some simple examples of rotation invariants are $c_{11}$, $c_{21}c_{12}$, $c_{20}c_{12}^2$, etc. As a rule, most invariants (7) are complex. If we want to have real-valued features, we only take the real and imaginary parts of each of them. To achieve also translation invariance (or, equivalently, invariance to rotation around an arbitrary point), we only use the central coordinates in the definition of the complex moments (3). It can be seen that Hu's invariants (1) are nothing else than particular representatives of the general form (7):
"c ,
"c c ,
"c c ,
"c c ,
"Re(c c ),
"Re(c c ),
"Im(c c ).
1407
3. Independence and completeness of the sets of invariants

In this section, our attention is paid to the construction of a basis of the invariants. Theorem 1 allows us to construct an infinite number of invariants for any order of moments, but only a few of them are mutually independent. By the term basis we intuitively understand the smallest set by means of which all other invariants can be expressed. The knowledge of the basis is a crucial point in all pattern recognition problems because it provides the same discriminative power as the set of all invariants at minimum computational cost. To formalize this approach, we introduce the following definitions first.

Definition 1. Let $k \ge 1$, let $I = \{I_1, \ldots, I_k\}$ be a set of rotation invariants of the type (7) and let $J$ be an invariant of the same type. Invariant $J$ is said to be dependent on $I$ if and only if there exists a function $F$ of $k$ variables, containing only the operations of multiplication, involution with an integer (positive or negative) exponent and complex conjugation, such that $J = F(I_1, \ldots, I_k)$.

Definition 2. Let $k > 1$ and let $I = \{I_1, \ldots, I_k\}$ be a set of rotation invariants of the type (7). The set $I$ is said to be dependent if and only if there exists $k_0 \le k$ such that $I_{k_0}$ depends on $I - \{I_{k_0}\}$. The set $I$ is said to be independent otherwise.

According to this definition, $\{c_{20}c_{02},\ c_{20}^2c_{02}^2\}$, $\{c_{20}c_{02},\ c_{21}c_{12},\ c_{20}c_{02}c_{21}c_{12}\}$ and $\{c_{21}c_{12},\ c_{12}c_{21}\}$ are examples of dependent invariant sets.

Definition 3. Let $I$ be a set of rotation invariants of the type (7) and let $B$ be its subset. $B$ is a basis of $I$ if and only if

- $B$ is independent,
- any element of $I - B$ depends on $B$ (this property is called completeness).

Now we can formulate the central theorem of this paper, which tells us how to construct an invariant basis above a given set of moments.

Theorem 2. Let $M$ be a set of complex moments of any orders (not necessarily of all moments), let $M^*$ be the set of their complex conjugates and let $c_{p_0 q_0} \in M \cup M^*$ be such that $p_0 - q_0 = 1$ and $c_{p_0 q_0} \ne 0$. Let $I$ be the set of all rotation invariants created from the elements of $M \cup M^*$ according to (7). Let $B$ be constructed as follows:
$$(\forall p, q \mid p \ge q \wedge c_{pq} \in M \cup M^*)\ \bigl(\Phi(p, q) \equiv c_{pq}\, c_{q_0 p_0}^{\,p-q} \in B\bigr). \qquad (9)$$

Then $B$ is a basis of $I$.

For the proof of Theorem 2 see Appendix A.
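The construction (9) is easy to implement. The sketch below is our own illustration (variable names are not from the paper); it builds the basis from a dictionary of complex moments with $p \ge q$, using $(p_0, q_0) = (2, 1)$.

```python
import numpy as np

def rotation_basis(c, order, p0=2, q0=1):
    """Theorem 2: B = { Phi(p,q) = c_pq * c_{q0 p0}^(p-q) : p >= q }.
    `c` maps (p, q), p >= q, to the complex moment c_pq of a centered object;
    conjugation supplies the moments with p < q, so only p >= q is stored."""
    assert p0 - q0 == 1 and c[(p0, q0)] != 0
    c_q0p0 = np.conj(c[(p0, q0)])        # c_{q0 p0} = c_{p0 q0}^*
    return {(p, q): c[(p, q)] * c_q0p0 ** (p - q)
            for p in range(order + 1) for q in range(p + 1)
            if 2 <= p + q <= order}

# For order 3 this gives Phi(1,1), Phi(2,0), Phi(2,1), Phi(3,0); taking real and
# imaginary parts of the complex ones yields six real invariants, and order 4
# adds Phi(2,2), Phi(3,1), Phi(4,0), i.e. five more, for a total of 11
# (cf. Section 4).
```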
4. Some consequences of Theorem 2

In this section, we highlight some consequences of Theorem 2 that are of practical importance.

4.1. On the dependence of the previously published invariants

We show that the previously published systems of rotation invariants, including the famous Hu system, are dependent. This fact has not been reported in the literature yet. Using Eq. (8) and assuming $c_{21} \ne 0$, it is easy to prove that
$$\phi_3 = c_{30}c_{03} = \frac{(c_{30}c_{12}^3)(c_{03}c_{21}^3)}{(c_{21}c_{12})^3} = \frac{\phi_5^2 + \phi_7^2}{\phi_4^3},$$

i.e. $\phi_3$ depends on $\phi_4$, $\phi_5$ and $\phi_7$.
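This dependency is easy to verify numerically; the following short sketch (ours, with randomly drawn moment values) checks the identity in the equivalent form $\phi_3\phi_4^3 = \phi_5^2 + \phi_7^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
c30, c21 = rng.normal(size=2) + 1j * rng.normal(size=2)  # arbitrary nonzero moments
c03, c12 = np.conj(c30), np.conj(c21)                    # c_qp = c_pq^*

phi3 = (c30 * c03).real            # Hu's invariants in the form (8)
phi4 = (c21 * c12).real
phi5 = (c30 * c12 ** 3).real
phi7 = (c30 * c12 ** 3).imag

# the dependency derived above: phi3 * phi4^3 = phi5^2 + phi7^2
assert np.isclose(phi3 * phi4 ** 3, phi5 ** 2 + phi7 ** 2)
```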
Moreover, Hu's system is incomplete. There are two third-order invariants — $c_{20}c_{12}^2$ and $c_{02}c_{21}^2$ — that are independent of $\{\phi_1, \ldots, \phi_7\}$.

Li [33] published a set of invariants from moments up to the ninth order. Unfortunately, his system includes Hu's one as a subset and therefore it also cannot be a basis. Wong [34] presented a set of 16 invariants from moments up to the third order and a set of "more than 49" invariants from moments up to the fourth order. It follows immediately from Theorem 2 that a basis of the third-order invariants has only six elements and a basis of the fourth-order invariants has 11 elements (these numbers relate to the real-valued invariants). Thus, most of Wong's invariants are dependent. The systems of invariants published by Teague [35] and Wallin [36] are also dependent. We can, however, obtain an independent system just by removing all skew invariants. The proof of completeness is given in Ref. [36], but that term is defined there as the possibility of recovering all the moments up to a given order from the set of invariants.

4.2. An explicit construction of the third-order and fourth-order bases

In this section, we present the bases of low-order invariants that have been constructed according to Theorem 2 and that we recommend for use in 2-D object recognition.

Second and third orders:

$$\psi_1 = c_{11} = \phi_1,$$
$$\psi_2 = c_{21}c_{12} = \phi_4,$$
$$\psi_3 = \mathrm{Re}(c_{20}c_{12}^2) = \phi_6,$$
$$\psi_4 = \mathrm{Im}(c_{20}c_{12}^2) = 2\mu_{11}\bigl((\mu_{30} + \mu_{12})^2 - (\mu_{21} + \mu_{03})^2\bigr) - 2(\mu_{20} - \mu_{02})(\mu_{30} + \mu_{12})(\mu_{21} + \mu_{03}),$$
$$\psi_5 = \mathrm{Re}(c_{30}c_{12}^3) = \phi_5,$$
$$\psi_6 = \mathrm{Im}(c_{30}c_{12}^3) = \phi_7.$$

Fourth order:

$$\psi_7 = c_{22},$$
$$\psi_8 = \mathrm{Re}(c_{31}c_{12}^2),$$
$$\psi_9 = \mathrm{Im}(c_{31}c_{12}^2),$$
$$\psi_{10} = \mathrm{Re}(c_{40}c_{12}^4),$$
$$\psi_{11} = \mathrm{Im}(c_{40}c_{12}^4).$$
In the case of the third-order and fourth-order invariants, the bases are determined unambiguously. However, there are various possibilities of applying Theorem 2 when constructing higher-order bases. The difference is in the choice of the indices $p_0$ and $q_0$. Although it is not strictly required, it is highly desirable to always choose $p_0$ and $q_0$ as small as possible, because low-order moments are less sensitive to noise than higher-order ones.
5. Skew invariants

In this section, we investigate the behavior of the rotation invariants under reflection. The invariants which do not change their values under reflection are traditionally called true invariants, while the others are called skew invariants [1] or pseudoinvariants [35]. Skew invariants distinguish between mirrored images of the same object, which is useful in some applications but may be undesirable in other cases. In the following text we show which of the invariants introduced in Theorem 1 are skew and which are true ones.

Let us consider an invariant of the type (7) and investigate its behavior under reflection across an arbitrary line. Due to the rotation and shift invariance, we can restrict ourselves to reflection across the x-axis. Without loss of generality, we consider the invariants from the basis only. Let $\bar f(x, y)$ be a reflected version of $f(x, y)$, i.e. $\bar f(x, y) = f(x, -y)$. It follows from Eq. (3) that $\bar c_{pq} = c_{pq}^*$. Thus, it holds for any basic invariant $\Phi(p, q)$ that

$$\bar\Phi(p, q) = \bar c_{pq}\, \bar c_{q_0 p_0}^{\,p-q} = c_{pq}^*\, (c_{q_0 p_0}^*)^{p-q} = \Phi(p, q)^*.$$

This indicates that the real parts of the basic invariants are true invariants. On the other hand, their imaginary parts are skew invariants, because they change their signs under reflection.
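The reflection behavior can also be checked numerically. The short sketch below (ours, reusing the weighted-point-set representation) mirrors a point set across the x-axis and confirms that each basic invariant is conjugated, so its real part survives and its imaginary part changes sign.

```python
import numpy as np

rng = np.random.default_rng(2)
pts = rng.normal(size=(400, 2))
w = rng.random(400)

def cm(points, p, q):
    z = points[:, 0] + 1j * points[:, 1]
    return np.sum(w * z ** p * np.conj(z) ** q)

mirror = pts * np.array([1.0, -1.0])          # reflection across the x-axis

for p, q in [(2, 0), (3, 0), (3, 1)]:
    # Phi(p, q) with (p0, q0) = (2, 1), so c_{q0 p0} = c_12 = conj(c_21)
    phi = cm(pts, p, q) * np.conj(cm(pts, 2, 1)) ** (p - q)
    phi_bar = cm(mirror, p, q) * np.conj(cm(mirror, 2, 1)) ** (p - q)
    assert np.isclose(phi_bar, np.conj(phi))  # Re unchanged, Im flips sign
```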
6. Summary and conclusion

In this paper, the problem of the independence and completeness of rotation moment invariants was discussed. Although moment invariants have attracted significant attention in the pattern recognition community within the last thirty years, they had not been studied from this point of view before. A general method for deriving rotation invariants of any order was described first. Then the theorem showing what a basis of the invariants looks like was formulated and proven. This is the major theoretical result of the paper. Finally, the relationship to previous works was demonstrated. As an interesting consequence
of our results, it was shown that Hu's system of moment invariants is dependent and incomplete.
Acknowledgements This work has been supported by the grants Nos. 102/96/1694 and 106/97/0827 of the Grant Agency of the Czech Republic.
Appendix A. Proof of Theorem 2

Completeness of $B$: Let $I$ be an arbitrary element of $I$. Thus

$$I = \prod_{i=1}^{n} c_{p_i q_i}^{k_i},$$

where $c_{p_i q_i} \in M \cup M^*$. The product can be decomposed into two factors according to the relation between $p_i$ and $q_i$:

$$I = \prod_{i=1}^{n_0} c_{p_i q_i}^{k_i} \cdot \prod_{i=n_0+1}^{n} c_{p_i q_i}^{k_i},$$

where $p_i \ge q_i$ if $i \le n_0$ and $p_i < q_i$ if $i > n_0$.

Let us construct another invariant $J$ from the elements of $B$ only, as follows:

$$J = \prod_{i=1}^{n_0} \Phi(p_i, q_i)^{k_i} \cdot \prod_{i=n_0+1}^{n} \Phi(q_i, p_i)^{*k_i}.$$

Grouping the factors $c_{p_0 q_0}$ and $c_{q_0 p_0}$ together, we get

$$J = c_{q_0 p_0}^{\sum_{i=1}^{n_0} k_i(p_i - q_i)}\; c_{p_0 q_0}^{\sum_{i=n_0+1}^{n} k_i(q_i - p_i)} \prod_{i=1}^{n_0} c_{p_i q_i}^{k_i} \prod_{i=n_0+1}^{n} c_{p_i q_i}^{k_i} = c_{q_0 p_0}^{\sum_{i=1}^{n_0} k_i(p_i - q_i)}\; c_{p_0 q_0}^{\sum_{i=n_0+1}^{n} k_i(q_i - p_i)}\; I.$$

Since $I$ is assumed to be an invariant, it must hold that

$$\sum_{i=1}^{n_0} k_i(p_i - q_i) + \sum_{i=n_0+1}^{n} k_i(p_i - q_i) = 0$$

and, consequently,

$$\sum_{i=1}^{n_0} k_i(p_i - q_i) = \sum_{i=n_0+1}^{n} k_i(q_i - p_i) = K.$$

Now $I$ can be expressed as a function of the elements of $B$:

$$I = \Phi(p_0, q_0)^{-K} J.$$

Thus, $I$ has been proven to be dependent on $B$.

Independence of $B$: Let us assume $B$ is dependent, i.e. there exists $\Phi(p, q) \in B$ such that it depends on $B - \{\Phi(p, q)\}$. As follows immediately from the mutual independence of the moments, it must hold that $p = p_0$ and $q = q_0$. That means, according to the above assumption, there exist invariants $\Phi(p_1, q_1), \ldots, \Phi(p_n, q_n)$ and $\Phi(s_1, t_1), \ldots, \Phi(s_m, t_m)$ from $B - \{\Phi(p_0, q_0)\}$ such that

$$\Phi(p_0, q_0) = \frac{\prod_{i=1}^{n_0} \Phi(p_i, q_i)^{k_i} \prod_{i=n_0+1}^{n} \Phi(p_i, q_i)^{*k_i}}{\prod_{i=1}^{m_0} \Phi(s_i, t_i)^{l_i} \prod_{i=m_0+1}^{m} \Phi(s_i, t_i)^{*l_i}}. \qquad (\mathrm{A.1})$$

Substituting from Eq. (9) into Eq. (A.1) and grouping the factors $c_{p_0 q_0}$ and $c_{q_0 p_0}$ together, we get

$$\Phi(p_0, q_0) = \frac{c_{q_0 p_0}^{\sum_{i=1}^{n_0} k_i(p_i - q_i)}\; c_{p_0 q_0}^{\sum_{i=n_0+1}^{n} k_i(p_i - q_i)} \prod_{i=1}^{n_0} c_{p_i q_i}^{k_i} \prod_{i=n_0+1}^{n} c_{q_i p_i}^{k_i}}{c_{q_0 p_0}^{\sum_{i=1}^{m_0} l_i(s_i - t_i)}\; c_{p_0 q_0}^{\sum_{i=m_0+1}^{m} l_i(s_i - t_i)} \prod_{i=1}^{m_0} c_{s_i t_i}^{l_i} \prod_{i=m_0+1}^{m} c_{t_i s_i}^{l_i}}. \qquad (\mathrm{A.2})$$

Comparing the exponents of $c_{q_0 p_0}$ and $c_{p_0 q_0}$ on both sides, we get the constraints

$$K_1 = \sum_{i=1}^{n_0} k_i(p_i - q_i) - \sum_{i=1}^{m_0} l_i(s_i - t_i) = 1 \qquad (\mathrm{A.3})$$

and

$$K_2 = \sum_{i=n_0+1}^{n} k_i(p_i - q_i) - \sum_{i=m_0+1}^{m} l_i(s_i - t_i) = 1. \qquad (\mathrm{A.4})$$

Since the rest of the right-hand side of Eq. (A.2) must equal 1 and since the moments themselves are mutually independent, the following constraints must be fulfilled for every index $i$:

$$n_0 = m_0, \quad n = m, \quad p_i = s_i, \quad q_i = t_i, \quad k_i = l_i.$$

Introducing these constraints into Eqs. (A.3) and (A.4), we get $K_1 = K_2 = 0$, which is a contradiction. □

References
[1] M.K. Hu, Visual pattern recognition by moment invariants, IRE Trans. Inform. Theory 8 (1962) 179–187.
[2] G.B. Gurevich, Foundations of the Theory of Algebraic Invariants, Noordhoff, Groningen, The Netherlands, 1964.
[3] I. Schur, Vorlesungen über Invariantentheorie, Springer, Berlin, 1968.
[4] S.A. Dudani, K.J. Breeding, R.B. McGhee, Aircraft identification by moment invariants, IEEE Trans. Comput. 26 (1977) 39–45.
[5] S.O. Belkasim, M. Shridhar, M. Ahmadi, Pattern recognition with moment invariants: a comparative study and new results, Pattern Recognition 24 (1991) 1117–1138.
[6] R.Y. Wong, E.L. Hall, Scene matching with invariant moments, Comput. Graphics Image Process. 8 (1978) 16–24.
[7] A. Goshtasby, Template matching in rotated images, IEEE Trans. Pattern Anal. Mach. Intell. 7 (1985) 338–344.
[8] J. Flusser, T. Suk, A moment-based approach to registration of images with affine geometric distortion, IEEE Trans. Geosci. Remote Sensing 32 (1994) 382–387.
[9] R. Mukundan, K.R. Ramakrishnan, An iterative solution for object pose parameters using image moments, Pattern Recognition Lett. 17 (1996) 1279–1284.
[10] R. Mukundan, N.K. Malik, Attitude estimation using moment invariants, Pattern Recognition Lett. 14 (1993) 199–205.
[11] A. Sluzek, Identification and inspection of 2-D objects using new moment-based shape descriptors, Pattern Recognition Lett. 16 (1995) 687–697.
[12] F. El-Khaly, M.A. Sid-Ahmed, Machine recognition of optically captured machine printed arabic text, Pattern Recognition 23 (1990) 1207–1214.
[13] K. Tsirikolias, B.G. Mertzios, Statistical pattern recognition using efficient two-dimensional moments with applications to character recognition, Pattern Recognition 26 (1993) 877–882.
[14] A. Khotanzad, Y.H. Hong, Invariant image recognition by Zernike moments, IEEE Trans. Pattern Anal. Mach. Intell. 12 (1990) 489–497.
[15] J. Flusser, T. Suk, Affine moment invariants: a new tool for character recognition, Pattern Recognition Lett. 15 (1994) 433–436.
[16] S. Maitra, Moment invariants, Proc. IEEE 67 (1979) 697–699.
[17] T.M. Hupkens, J. de Clippeleir, Noise and intensity invariant moments, Pattern Recognition 16 (1995) 371–376.
[18] L. Wang, G. Healey, Using Zernike moments for the illumination and geometry invariant classification of multispectral texture, IEEE Trans. Image Process. 7 (1998) 196–203.
[19] L. van Gool, T. Moons, D. Ungureanu, Affine/photometric invariants for planar intensity patterns, Proceedings of the Fourth ECCV'96, vol. 1064, Lecture Notes in Computer Science, Springer, Berlin, 1996, pp. 642–651.
[20] J. Flusser, T. Suk, S. Saic, Recognition of blurred images by the method of moments, IEEE Trans. Image Process. 5 (1996) 533–538.
[21] J. Flusser, T. Suk, Degraded image analysis: an invariant approach, IEEE Trans. Pattern Anal. Mach. Intell. 20 (1998) 590–603.
[22] R.J. Prokop, A.P. Reeves, A survey of moment-based techniques for unoccluded object representation and recognition, CVGIP: Graphical Models and Image Processing 54 (1992) 438–460.
[23] C.H. Teh, R.T. Chin, On image analysis by the method of moments, IEEE Trans. Pattern Anal. Mach. Intell. 10 (1988) 496–513.
[24] Y.S. Abu-Mostafa, D. Psaltis, Recognitive aspects of moment invariants, IEEE Trans. Pattern Anal. Mach. Intell. 6 (1984) 698–706.
[25] S.X. Liao, M. Pawlak, On image analysis by moments, IEEE Trans. Pattern Anal. Mach. Intell. 18 (1996) 254–266.
[26] M. Pawlak, On the reconstruction aspects of moment descriptors, IEEE Trans. Inform. Theory 38 (1992) 1698–1708.
[27] R.R. Bailey, M. Srinath, Orthogonal moment features for use with parametric and non-parametric classifiers, IEEE Trans. Pattern Anal. Mach. Intell. 18 (1996) 389–398.
[28] Y.S. Abu-Mostafa, D. Psaltis, Image normalization by complex moments, IEEE Trans. Pattern Anal. Mach. Intell. 7 (1985) 46–55.
[29] M. Gruber, K.Y. Hsu, Moment-based image normalization with high noise-tolerance, Pattern Recognition 19 (1997) 136–139.
[30] L. Yang, F. Albregtsen, Fast and exact computation of cartesian geometric moments using discrete Green's theorem, Pattern Recognition 29 (1996) 1061–1073.
[31] J. Flusser, T. Suk, Pattern recognition by affine moment invariants, Pattern Recognition 26 (1993) 167–174.
[32] T.H. Reiss, The revised fundamental theorem of moment invariants, IEEE Trans. Pattern Anal. Mach. Intell. 13 (1991) 830–834.
[33] Y. Li, Reforming the theory of invariant moments for pattern recognition, Pattern Recognition 25 (1992) 723–730.
[34] W.H. Wong, W.C. Siu, K.M. Lam, Generation of moment invariants and their uses for character recognition, Pattern Recognition Lett. 16 (1995) 115–123.
[35] M.R. Teague, Image analysis via the general theory of moments, J. Opt. Soc. Am. 70 (1980) 920–930.
[36] A. Wallin, O. Kübler, Complete sets of complex Zernike moment invariants and the role of the pseudoinvariants, IEEE Trans. Pattern Anal. Mach. Intell. 17 (1995) 1106–1110.
About the Author — JAN FLUSSER was born in Prague, Czech Republic, on April 30, 1962. He received the M.Sc. degree in mathematical engineering from the Czech Technical University, Prague, Czech Republic, in 1985 and the Ph.D. degree in computer science from the Czechoslovak Academy of Sciences in 1990. Since 1985 he has been with the Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic, Prague. Since 1995 he has been the head of the Department of Image Processing. Since 1991 he has also been affiliated with the Faculty of Mathematics and Physics, Charles University, Prague, where he gives courses on digital image processing. His current research interests include digital image processing, pattern recognition and remote sensing. He has authored or coauthored more than 30 scientific publications in these areas. Dr. Flusser is a member of the Pattern Recognition Society, the IEEE Signal Processing Society, the IEEE Computer Society and the IEEE Geoscience and Remote Sensing Society.
Pattern Recognition 33 (2000) 1411–1422
Coarse-to-"ne planar object identi"cation using invariant curve features and B-spline modeling Yu-Hua Gu *, Tardi Tjahjadi Department of Signals and Systems, Chalmers University of Technology, Gothenburg, S-41296, Sweden School of Engineering, University of Warwick, Coventry, CV4 7AL, UK Received 7 January 1999; accepted 1 June 1999
Abstract

This paper presents a hybrid algorithm for coarse-to-fine matching of affine-invariant object features and B-spline object curves, and simultaneous estimation of transformation parameters. For coarse matching, two dissimilar measures are exploited, using the significant corners of object boundaries to remove candidate objects with large dissimilarity to a target object. For fine matching, a robust point interpolation approach and a simple gradient-based algorithm are applied to B-spline object curves under the MMSE criterion. The combination of the coarse- and fine-matching steps reduces the computational cost without degrading the matching accuracy. The proposed algorithm is evaluated using affine transformed objects. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Curve matching; Feature extraction; Dissimilar measures; B-splines; Object recognition
1. Introduction

The identification, matching and analysis of objects of interest are of prime importance in application domains including target tracking in traffic [1,2], autonomous robot navigation [3], reconstruction of body structures in medicine [4], the use of eyes as an interface for controlling computer graphics [5], gesture analysis and recognition [6], image browsing and retrieval [7], and object-oriented video coding [8]. Object matching means establishing correspondences between object boundaries, shapes, texture and various other features in a set of images. Since an object may undergo various affine transformations including rotations, scalings and translations, object matching also means computing any transformation parameters based on 2D or 3D object representations [9]. These methods can be further classified as boundary-, region- and model-based methods [10].
If an object is simple, the matching and analysis of affine objects can be successfully performed using boundary-based approaches. A direct approach is to match object boundary curves extracted from images, where the objects may undergo various affine transformations (e.g. rotations, scalings and translations) and small deformations. One method of comparing two object curves is to match the corresponding points on the two curves, assuming affine rigid objects [11]. However, the method is sensitive both to noise, because of imperfect data, and to imprecise correspondences. Instead of point matching, a variety of less noise-sensitive approaches have been proposed based on comparing corresponding line segments and polygonal arcs [12,13]. Many techniques have also been proposed for representing a curve by a set of features or by models. These include Fourier descriptors [14], B-splines [15,16], autoregressive models [17], moments [18], curvatures [19], HMM models [20] and wavelets [21]. Among these methods, B-splines are often used in computer graphics because of their properties of continuity, local shape controllability, and affine invariance [22]. In [15] a scheme for matching and classification of B-spline curves is presented. However, it requires two separate
stages for classifying objects and estimating the corresponding affine parameters. Furthermore, the method uses the point where a curve crosses the positive horizontal x-axis, which is arbitrary for affine curves, as the starting point. This results in a lack of good correspondences between curve segments and hence increases the error in curve matching.

Motivated by the above problems, we present a hybrid two-step matching scheme which explores both dissimilarities of object features and of B-spline approximated curves. In the feature-based coarse-matching step, a small number of significant corners is extracted from each discrete object curve to form affine-invariant object features. Since objects may undergo various affine transformations, features which are affine invariant are more attractive for object matching. Many curve features, e.g. arc length and the area within an object boundary, are affine invariant [23,24]. We introduce two dissimilar measures on affine-invariant features based on significant corners. These enable a fast discrimination between affine and non-affine objects based on a small set of features. To maintain high reliability in object identification, only candidate objects with large affine dissimilarities to a target object are rejected in the coarse-matching step. The non-rejected objects are then passed to the fine-matching process, which not only enables matching based on the entire curve-shape information, but also enables the estimation of parameters associated with affine objects. We also introduce a gradient-based algorithm which simultaneously matches B-spline modeled object curves and estimates transformation parameters. This algorithm incorporates a robust method of assigning corresponding curve points via interpolation.

The remainder of the paper is organized as follows. Section 2 describes the coarse-matching algorithm, including the method for estimating significant corners and the two dissimilar measures. Section 3 describes the fine-matching algorithm, including the selection of corresponding curve points and the gradient-based MMSE matching algorithm. Section 4 presents the experimental results and evaluations. Finally, Section 5 concludes the paper.
2. Coarse matching based on affine-invariant features

2.1. Extracting significant corner points

A corner point on an object curve subtends a sharp angle between its neighboring points. Significant corners of an object curve are estimated and exploited for both coarse and fine matching. Let the boundary curve of an object be described by discrete points $r_k = [x_k\ y_k]^T$, $k = 0, 1, \ldots, n-1$, where $T$ denotes matrix transpose. Since the x and y coordinates of each curve point can be handled separately, the vector form will not be emphasized throughout the paper. Define an angle $u_k$ associated with each point $r_k$ as the angle between the two vectors $(r_{k+1}, r_k)$ and $(r_k, r_{k-1})$, i.e.

$$u_k = \cos^{-1}\frac{a^2 + b^2 - c^2}{2ab}, \qquad (1)$$

where $a = |r_{k-1} - r_k|$, $b = |r_{k+1} - r_k|$ and $c = |r_{k+1} - r_{k-1}|$.

To prevent artificial variation in angular values due to discrete points, a curve is smoothed before the angles are calculated. A smoothed curve $r_k^{sm} = [x_k^{sm}\ y_k^{sm}]^T$, $k = 0, 1, \ldots, n-1$, is obtained as

$$x_k^{sm} = \frac{1}{2w_1 + 1}\sum_{l=k-w_1}^{k+w_1} x_l, \qquad y_k^{sm} = \frac{1}{2w_1 + 1}\sum_{l=k-w_1}^{k+w_1} y_l, \qquad (2)$$

where $w_1$ is a small integer. Significant corners are those points which have local minimum angular values below a pre-determined threshold. To prevent them from being located too close to each other, only one significant corner is allowed in each curve interval $w_2$. That is, the angle $u_{k_0}$ of a significant corner point $r_{k_0}$ must satisfy

$$\bigl\{u_{k_0} : u_{k_0} \text{ is a local minimum angle};\ u_{k_0} \le \min_{l \in [k_0 - w_2,\, k_0 + w_2]} u_l;\ \bigl((u_{k_0} < \text{Threshold}) \text{ or } (u_{k_0} \text{ is among the } L \text{ most significant corners})\bigr)\bigr\}, \qquad (3)$$

where the $L$ most significant corners correspond to the $L$ smallest $u_k$.
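A compact numpy sketch of the corner extraction of Eqs. (1)–(3) is given below. It is our own illustration: the parameter defaults are placeholders within the ranges later listed in Table 11, and the "L most significant" fallback is simplified.

```python
import numpy as np

def significant_corners(curve, w1=3, w2=12, threshold_deg=160.0, L=None):
    """Eqs. (1)-(3): smooth the closed boundary, compute the angle at each
    point, and keep local minima below the threshold (or the L sharpest).
    `curve` is an (n, 2) array of discrete boundary points."""
    n = len(curve)
    # Eq. (2): circular moving-average smoothing over a (2*w1 + 1) window
    win1 = (np.arange(n)[:, None] + np.arange(-w1, w1 + 1)) % n
    sm = curve[win1].mean(axis=1)
    # Eq. (1): angle at r_k from the triangle (r_{k-1}, r_k, r_{k+1})
    prev, nxt = np.roll(sm, 1, axis=0), np.roll(sm, -1, axis=0)
    a = np.linalg.norm(prev - sm, axis=1)          # |r_{k-1} - r_k|
    b = np.linalg.norm(nxt - sm, axis=1)           # |r_{k+1} - r_k|
    c = np.linalg.norm(nxt - prev, axis=1)         # |r_{k+1} - r_{k-1}|
    u = np.degrees(np.arccos(np.clip((a**2 + b**2 - c**2) / (2 * a * b), -1, 1)))
    # Eq. (3): minimum within +/- w2 samples, and below the angular threshold
    win2 = (np.arange(n)[:, None] + np.arange(-w2, w2 + 1)) % n
    local_min = u <= u[win2].min(axis=1)
    corners = np.flatnonzero(local_min & (u < threshold_deg))
    if L is not None and len(corners) < L:         # top up to the L sharpest minima
        cand = np.flatnonzero(local_min)
        corners = np.sort(cand[np.argsort(u[cand])][:L])
    return corners
```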
2.2. Dissimilar measures

Let the $i$th object be described by the discrete boundary curve $\{r_k^i\}$, $k = 0, 1, \ldots, n-1$, and let its centroid be $r_c^i = [x_c^i\ y_c^i]^T$. Denote by $\tilde r_k^i = r_k^i - r_c^i$ the curve points having the coordinate origin shifted to the object centroid, and by $cn_j^i$ and $\tilde{cn}_j^i$, $j = 0, 1, \ldots, L-1$, the significant corners before and after shifting the coordinate origin. Further, assume the discrete points $\{r_k^i\}$ are densely populated on the original curve, such that the arc length between any two adjacent points is small.

2.2.1. Dissimilar Measure 1

Dissimilar Measure 1 computes the dissimilarity of two objects based on the accumulated difference of normalized arc lengths between adjacent significant corners of the two object curves. Without loss of generality, assume that the adjacent corners $cn_{j-1}^i$ and $cn_j^i$ are equal to the curve points $r_{k_{j-1}}^i$ and $r_{k_j}^i$, respectively. The arc length between two adjacent significant corners is approximated by the accumulated chord length of adjacent curve points between these two corners, i.e.

$$\mathrm{len}(cn_{j-1}^i, cn_j^i) = \sum_{k=k_{j-1}+1}^{k_j} |r_k^i - r_{k-1}^i|, \qquad (4)$$
where j"0, 1,2, ¸!1, k implies mod (k) and j implies L mod ( j), (e.g. mod (7)"2), due to the closed curve. * Eq. (4) is a good approximation under the assumption that +rG , are densely populated. Since arc lengths are I a$ne invariant [24], arc lengths between adjacent signi"cant corners are also a$ne invariant. To obtain scale invariance, arc lengths in Eq. (4) are normalized by the total length of the object curve as follows: len(cnG , cnG ) H\ H . ¸EN(cnG , cnG )" (5) H\ H *\len(cnG , cnG ) H\ H H Dissimilar Measure 1 is de"ned as the accumulated difference of corner-to-corner arc lengths between the two object curves l and m, *\ dissim (l, m)" min "¸EN(cnJ , cnJ ) H>Q\ H>Q WQW*\ H !¸EN(cnK , cnK)", (6) H\ H where s is used to eliminate the in#uence of rotation between the two objects. 2.2.2. Dissimilar Measure 2 Dissimilar Measure 2 computes the dissimilarity of two objects based on the accumulated di!erence of corner-to-centroid distance between the two object curves. Obviously, it is invariant to object translation and rotation. To obtain scale invariance, the scale factor b is estimated from the ratio of the curve lengths of two objects by using Eq. (4), i.e., *\len(cnJ , cnJ ) H\ H . bK " H (7) *\len(cnK , cnK) H\ H H The proof of Eq. (7) is straight forward: Since r J I is assumed to be a scaled version of r K, r J "br K, it folI I I lows that *\len(cn J , cn J )" L\ " r J !r J "" H\ H I\ H I I L\b " r K!r K ""b *\len(cn K ,cn K), where I I\ H\ H H I "r !r ""((x !x )#(y !y ). Note the I I\ I I\ I I\ translational-invariant property of any arc length, len(cn , cn ) in Eq. (7) is equal to len(cn , cn ). The H\ H H\ H Dissimilar Measure 2 between curves l and m with respect to l is de"ned as *\ "cn J !bK cn K" Q>H H . dissim (l, m)" min (8) "cn J " WQW*\ H Q>H Similar to Eq. (6), s is applied to eliminate the in#uence of rotation between the two objects. 2.3. Coarse-matching algorithm In coarse-object-matching, the two dissimilar measures are used for matching S candidate objects to a target object. If both dissim *¹h1 and dissim *¹h2 are satis"ed for a candidate object, where ¹h1 and ¹h2
Table 1
Algorithm for coarse matching

For all candidate curves and a given mth target curve, do:
  For the lth candidate curve, do:
    Compute smoothed curve points $r_k^{sm}$ using Eq. (2);
    Compute angle $u_k$ using Eq. (1), where $a$, $b$ and $c$ are calculated using $r_k^{sm}$;
    Extract $L$ significant corners using Eq. (3);
    Compute the arc length between adjacent significant corners $\mathrm{len}(cn_{j-1}, cn_j)$ using Eq. (4);
    Compute the normalized arc length $\mathrm{LEN}(cn_{j-1}, cn_j)$ using Eq. (5);
    Calculate $\mathrm{dissim}_1$ using Eq. (6);
    Estimate $\hat\beta$ using Eq. (7), and calculate $\mathrm{dissim}_2$ using Eq. (8);
    If ($\mathrm{dissim}_1 \ge Th1$ and $\mathrm{dissim}_2 \ge Th2$), then eliminate this curve from the candidate list; otherwise forward this curve to the fine-matching algorithm.
2.3. Coarse-matching algorithm

In coarse object matching, the two dissimilar measures are used for matching $S$ candidate objects to a target object. If both $\mathrm{dissim}_1 \ge Th1$ and $\mathrm{dissim}_2 \ge Th2$ are satisfied for a candidate object, where $Th1$ and $Th2$ are pre-determined thresholds, the candidate object is considered to be affine dissimilar to the target object, and is thus removed from the candidate list. Otherwise, the candidate object is passed to the fine-matching process to be analyzed further. The coarse-matching algorithm is summarized in Table 1. Since these dissimilar measures are based on a few sparsely located corners, the coarse-matching process is only designed for discriminating candidate objects which are obviously affine different from the target object.

In general, different types of object contain different numbers of significant corners. However, objects of a similar type, e.g. hand-gesture curves or aircraft, may contain a similar number of significant corners. If the difference in number between two unknown objects is large, e.g. more than 50%, it is assumed that the two objects are affine dissimilar. In that case the algorithm simply regards the candidate object as dissimilar without applying the dissimilar measures, and the object is removed from the candidate list. Otherwise, extra corners are added to the object which has fewer corners, by selecting the final number of significant corners $L = \max_i \{L_i\}$ in Eq. (3), where $L_i$ is the initial number of significant corners of the $i$th object in a group.
3. Fine matching of object curves

The fine-matching method is applied to those non-rejected candidate curves from the coarse matching. The criterion is to minimize the MSE between the continuous B-spline target curve and the B-spline candidate curves with respect to the unknown transformation parameters. An efficient implementation, however, requires comparing two sets of B-spline curve points at corresponding curve positions. This ensures that the resulting MMSEs
from matching truly reflect the affine dissimilarity of objects, while the contributions of error from inaccurate point positions on the curves are negligible. We present a robust approach for selecting points from each B-spline curve, which results in good correspondences if two objects are affine related. This includes exploiting significant corners and interpolating an extra set of corresponding points on each curve. We then present a gradient-based algorithm which simultaneously estimates the transformation parameters and sub-optimally determines the affine-similar object from all candidate objects using the MMSE criterion. To limit the scope of this paper, only objects with closed boundary curves are considered. The types of affine transformation are limited to translations, rotations and scalings. Further, the cubic B-spline is chosen as a tradeoff between computation and smoothness of curves [22].

3.1. B-spline curve modeling and approximation

For fine object matching, a discrete object curve $\{r_k\}$, $k = 0, 1, \ldots, n-1$, is modeled by a continuous B-spline curve $r(t)$. The cubic B-spline curve model is briefly reviewed here in order to facilitate the subsequent descriptions and clarify the notation used. Readers are referred to de Boor [22] and Rogers and Adams [25] for further details.

Let a discrete closed curve consist of vector points $r_k = [x_k\ y_k]^T$, $k = 0, 1, \ldots, n-1$, in a 2D plane, and be modeled by a continuous periodic cubic B-spline curve $r(t) = [x(t)\ y(t)]^T$, where $t \in [t_0, t_m]$ is a parameter. A B-spline knot point $\hat r_j = [\hat x_j\ \hat y_j]^T$, $j = 0, 1, \ldots, m-1$, is defined as a position vector at the connecting point of two B-spline curve segments, and is equal to a discrete curve sample. For B-spline curve modeling, $m = n$ holds. The $j$th curve segment of a B-spline curve can be described by the cubic B-spline coefficients (or control points) $\{C_j\}$ on the four basis functions $\{N_i(t)\}$:

$$r_j(t) = \sum_{i=0}^{3} C_{(j+i-1) \bmod m}\, N_i(t), \qquad (9)$$

where $t_j(\min) \le t \le t_j(\max)$, $C_m = C_0$ and $C_{-1} = C_{m-1}$. The basis functions $N_i(t)$ may be derived from the Cox–deBoor recursive formulae [22]. A B-spline curve with $m$ normalized segments is represented by $r(t) = \sum_{j=0}^{m-1} r_j(t - t_j)$, where $t \in [t_0, t_m]$ and $\bar t = t - t_j \in [0, 1]$ is the normalized parameter within the $j$th segment. It follows that $r(t)$ can be spanned on $(m+3)$ normalized cubic B-spline basis functions $Q_j(t)$ with coefficients $\{C_j\}$:

$$r(t) = \sum_{j=0}^{m+2} C_{j-1}\, Q_j(t). \qquad (10)$$
Setting t"t in Eq. (10) yields the knot points H r( , j"0, 1,2, m!1. A B-spline curve is fully described H by m knot points +r( ,, or, by m coe$cients +C , [22]. H H If m(n, the resulting B-spline curve is an approximation of the original curve. B-spline curve approximation involves determining the &best' m and "nding the m &best' knot points, or coe$cients, such that the resulting curve is as close as possible to the original curve under a given criterion. This is non-trivial in general [22,26]. In this paper, knot points of a Bspline object curve are chosen using a constraint active B-spline curve model under MMSE criterion which automatically includes all signi"cant corners (de"ned in Section 2.1) as a subset of knot points. Let a B-spline object curve r(t) be described by m knot points r( and ¸ of H these knot points are equal to the signi"cant corners on the curve, i.e., cn 3+r( ,, s"0, 1,2¸!1, and ¸(m. Q H The constraint active B-spline model "nds the &best' remaining (m!¸) knot points such that the mean-square error, MSE(r( ,2,r( )"(1/n) L\#r !r(t )#, is mini* K\ I I I mized, where r is a discrete sample from the original I curve, r(t ) is either a B-spline knot point or an interI polated point, and t is a parameter. To limit the scope of I this paper, the detailed algorithm for "nding these knot points is not included in the paper. 3.2. Method for selecting curve points In order to calculate the MMSE distance between B-spline object curves, the same number of curve points must be chosen at corresponding positions on the curves to be compared. Let a B-spline object curve be represented by m knot points, i"1, 2. In general, m Om , or, G m "m but the curve segments of two a$ne-object curves lack correspondences. The task is to interpolate a small number of m points at corresponding positions so that the distance between the two curves can be computed. The selected "nal curve points of each object may contain three parts: (a) the knot points from the B-spline object curve itself; (b) points interpolated at corresponding positions of the knot points from another object to be compared; (c) extra points by uniform interpolation. The basic set of points is selected from (a) and (b), and contains the minimum number of points as follows: All knot points including signi"cant corners are included in the basic set. In addition, a set of points is interpolated using the lengths of B-spline curve segments and the sharpest signi"cant corner as the starting point. Assume curve i, rG(t), i"1, 2, contains m knots. To interpolate G points rG(t ), t are determined from the normalized H H lengths of curve segments (i.e., the ratios of the lengths of curve segments to the total length of the curve) by applying Eq. (5) to the knot points. Starting from the sharpest signi"cant corner of curve 1, (m !1) new points r(t ) are interpolated at t equal to H H
the normalized curve segment lengths of curve 2. Similarly, using the sharpest corner of $r^2(t)$ and the normalized curve segment lengths of $r^1(t)$, $(m_1 - 1)$ new points $r^2(t_j)$ are interpolated on curve 2. Once the $t_j$ are determined, the points $r^i(t_j)$ are interpolated using Eq. (10). Hence, the basic set for each curve contains $m = (m_1 + m_2 - 1)$ corresponding points. As an example, Fig. 1 illustrates the point interpolation process for two hand curves, one rotated by 30°. Initially, the hand curves contain 39 and 52 knot points, respectively.

If the number of points in the basic set is very small, more corresponding points may be interpolated on each of the object curves. As indicated in (c), this is done by uniformly selecting on both curves the same normalized parameters $t_j$, starting from their sharpest corner. Whether these extra points are required is a tradeoff between the computation and the accuracy of estimation. Our experiments indicate that using the points in the basic set is sufficient in most cases; e.g. for the hand curves, a basic set contains approximately $m = 70$ points and results in MMSEs close to those obtained using 600 points.

3.3. B-spline object matching and parameter estimation

A gradient-based algorithm is introduced to simultaneously compute the MMSE distance between two object curves and to estimate the associated translation, rotation and scaling parameters. Let $r^T(t)$ be the B-spline curve of a target object boundary, and $r^s(t)$, $s = 1, 2, \ldots, S$, be the candidate object boundary curves. The mean-square error (MSE) distance between $r^T(t)$ and $r^s(t)$ is calculated as

$$e(r^T(t), r^s(t)) = \frac{1}{m}\sum_{j=0}^{m-1} \|r^T(t_j) - \beta R(\theta)\, r^s(t_j) - b\|^2, \qquad (11)$$

where the $r(t_j)$ denote B-spline points, including both knot points and interpolated points.
The objective of object matching is to find the "best" affine-similar object curve between the target $r^T(t)$ and one of the candidate curves $r^s(t)$, $s = 1, 2, \ldots, S$, such that the two curves have a minimum distance with respect to the unknown parameters, i.e.

$$e_{\min}(r^T(t), r^{s_0}(t)) = \min_s\Bigl(\min_{\beta, \theta, b} e(r^T(t), r^s(t))\Bigr).$$

If this minimum distance is greater than an acceptable level, all candidate curves are rejected.

A sub-optimal matching process is applied. First, the displacement vector $b$ is estimated from the distance between the estimated centroids of the two object curves. Then, using the gradient-based method, the error $e(r^T(t), r^s(t))$ is minimized with respect to $\beta$ and $\theta$, i.e. by setting $\partial e/\partial\theta_k = 0$ and $\partial e/\partial\beta_k = 0$. The rotation and scaling parameters at the $(k+1)$th iteration are updated from the $k$th iteration using the steepest-descent method:

$$\theta_{k+1} = \theta_k - \kappa_\theta \nabla_\theta^{(k)}, \qquad \beta_{k+1} = \beta_k - \kappa_\beta \nabla_\beta^{(k)}, \qquad (12)$$

where $\nabla_\theta^{(k)} = \partial e/\partial\theta_k$, $\nabla_\beta^{(k)} = \partial e/\partial\beta_k$, and $\kappa_\theta$ and $\kappa_\beta$ are small positive constants which control the speed of convergence and the steady-state performance.

Defining $d(r(t_j)) = r^T(t_j) - \beta R(\theta)\, r^s(t_j) - b$, Eq. (11) is equivalent to $e(r^T(t), r^s(t)) = (1/m)\sum_{j=0}^{m-1} d^2(r(t_j))$. Noting that $e = [e(x)\ e(y)]$, $d(r(t_j)) = [d(x(t_j))\ d(y(t_j))]$, $r(t) = [x(t)\ y(t)]$, $b = [b_x\ b_y]$ and

$$R = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix},$$

the two elements of $d(r(t_j))$ are equal to

$$d(x(t_j)) = x^T(t_j) - \beta(\cos\theta\, x^s(t_j) + \sin\theta\, y^s(t_j)) - b_x,$$
$$d(y(t_j)) = y^T(t_j) - \beta(-\sin\theta\, x^s(t_j) + \cos\theta\, y^s(t_j)) - b_y, \qquad (13)$$
Fig. 1. Interpolating corresponding curve points on two hand curves ('o': the estimated sharpest corner). From left to right: (a) a hand curve with 39 knot points '*'; (b) the rotated hand curve with 52 knot points '*'; (c) interpolated points on (a); (d) interpolated points on (b).
and "d(r(t ))""d(x(t ))#d(y(t )) holds. Hence, *e/*h " H H H I I (1/m) K\(*d(x(t ))/*h #*d(y(t ))/*h ) and *e/*b " H I H I I H I I I (1/m) K\(*(d(x(t ))/*b #*(d(y(t ))/*b ), which are H I H I I H I equivalent to
extending Eq. (11) to include both types of curves
2b K\
I" I d (x(t ))(sin h xQ(t )!cos h yQ(t )) F I H I H I H m H
where l is the number of curves, including the object boundary and all its inner curves.
#d (y(t ))(cos h xQ(t )#sin h yQ(t )), I H I H I H
(14) 4. Experimental results
2 K\
I"! d (x(t ))(cos h xQ(t )#sin h yQ(t )) @ I H I H I H m H !d (y(t ))(sin h xQ(t )!cos h yQ(t )). I H I H I H
(15)
The new b and h are obtained using Eq. (12). I> I> Substituting b and h into Eq. (11) and noting I> I> Eq. (13) yields e (r2(t ), rQ(t )) I> H H 1 K\ " +(x2(t )!b (cos h xQ(t ) H I> I> H m H #sin h
I>
1 KJ\ e(r2(t), rQ(t))" #r2J(t )!bR(h)rQJ(t )!b#, H H m J J J H
yQ(t ))!b )#(y2(t ) H V H
#b (sin h xQ(t )!cos h yQ(t ))!b ),. (16) I> I> H I> H W The iteration is repeated until Eq. (16) converges, or a pre-de"ned maximum number of iteration is reached. For each candidate curve s, the iterations can be summarized as updating h using Eqs. (12) and (14), upI> dating b using Eqs. (12) and (15), and updating I> e (r2(t ), rQ(t )) using Eq. (16). This process is repeated I> H H for all candidate curves. The candidate curve s which has the minimum e(r2(t), rQ(t)) value, s"1, 2,2, S, is selected, i.e., s "argmin +e(r2(t), rQ(t)),. If this value is Q within an acceptable level then a correct match is assumed, and the corresponding parameters are assigned as the transformation parameters. Otherwise all candidate curves are rejected. In situations where a rigid object is characterized not only by its outside boundary but also by some inner closed curves, the algorithm may easily be generalized by
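Putting Eqs. (12)–(16) together, the iteration fits in a few lines of code. The sketch below is our illustration, not the authors' implementation: it pre-compensates the translation by centroid subtraction as described above, uses fixed step sizes $\kappa_\theta = \kappa_\beta = 0.1$ as in the experiments of Section 4, and assumes the two point sets are already in correspondence.

```python
import numpy as np

def fine_match(rT, rS, kappa=0.1, iters=200):
    """Estimate scale beta and rotation theta minimizing Eq. (11) by the
    steepest-descent updates (12), with gradients (14)-(15).
    rT, rS: (m, 2) arrays of corresponding B-spline curve points."""
    rT = rT - rT.mean(axis=0)      # translation b from centroids (pre-compensated)
    rS = rS - rS.mean(axis=0)
    m = len(rT)
    beta, theta = 1.0, 0.0
    for _ in range(iters):
        c, s = np.cos(theta), np.sin(theta)
        dx = rT[:, 0] - beta * (c * rS[:, 0] + s * rS[:, 1])     # Eq. (13)
        dy = rT[:, 1] - beta * (-s * rS[:, 0] + c * rS[:, 1])
        g_theta = (2 * beta / m) * np.sum(                        # Eq. (14)
            dx * (s * rS[:, 0] - c * rS[:, 1]) + dy * (c * rS[:, 0] + s * rS[:, 1]))
        g_beta = (-2 / m) * np.sum(                               # Eq. (15)
            dx * (c * rS[:, 0] + s * rS[:, 1]) - dy * (s * rS[:, 0] - c * rS[:, 1]))
        theta -= kappa * g_theta                                  # Eq. (12)
        beta -= kappa * g_beta
    c, s = np.cos(theta), np.sin(theta)
    dx = rT[:, 0] - beta * (c * rS[:, 0] + s * rS[:, 1])
    dy = rT[:, 1] - beta * (-s * rS[:, 0] + c * rS[:, 1])
    return beta, theta, np.mean(dx ** 2 + dy ** 2)                # Eq. (16)

# example: a curve rotated by 30 degrees and scaled by 1/0.8 should be matched
# with beta close to 0.8 and theta close to 0.52 rad
t = np.linspace(0, 2 * np.pi, 80, endpoint=False)
rT = np.c_[np.cos(t) + 0.3 * np.cos(3 * t), np.sin(t)]
ang = np.radians(30)
Rinv = np.array([[np.cos(ang), -np.sin(ang)], [np.sin(ang), np.cos(ang)]])
rS = (rT @ Rinv.T) / 0.8           # so that 0.8 * R(30 deg) maps rS back onto rT
beta, theta, mmse = fine_match(rT, rS)
```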
4. Experimental results

Four sets of object boundaries, including hand-gesture curves, tools, aircraft and animals, were used for the tests. The object boundaries were extracted using the edge-curve estimation algorithms [27,28], followed by a boundary tracing and closing algorithm [29]. Figs. 2–5 show, respectively, the four sets of closed object boundary curves. Affine hand curves were obtained by applying various rotations, scalings and translations to the original discrete curves. For all other object curves, affine curves were obtained by applying various rotations, scalings and translations to the original two-tone images, followed by boundary extraction.

4.1. Evaluation of the coarse-matching algorithm

The four sets of curves were used to evaluate the coarse-matching algorithm. The initial numbers of significant corners for each curve are listed in Table 2. These initial corners are marked on the curves in Figs. 2–5. For each set of curves, the number of final significant corners $L$ was set equal to the maximum number of initial significant corners in that set (3rd column of Table 2). For each object curve, the dissimilar measures were applied to its affine transformed curves (denoted "affine curves") as well as to the other object curves (i.e. non-affine-similar, or non-affine, curves).

Since the estimated significant corners of two affine-related curves may not correspond at the exact curve locations, it is desirable that the dissimilar measures be insensitive to small positional shifts of corners due to estimation errors. To determine the sensitivity of the
Fig. 2. Hand-gesture curves with marked corners. From left to right: hd1–hd6. '*': significant corner; 'o': the sharpest corner.
Fig. 3. Boundaries of planes with marked corners. From left to right and top to bottom: plane 1 to plane 9. '*': significant corner; 'o': the sharpest corner.
Fig. 4. Boundaries of animals with marked corners. From left to right and top to bottom: a1–a8. '*': significant corner; 'o': the sharpest corner.
dissimilar measures, dissimilar values were computed by shifting signi"cant corners of a$ne-related curves within a range of discrete sample points along the original curve. The set of six hand curves was used for such tests. The
corresponding dissimilar values were then compared to determine whether large di!erences appear from small positional shifts of corners. Table 3 shows the test results, where the range of shifts are indicated by `shifted
Fig. 5. Boundaries of tools with marked corners. From left to right: tool1–tool4. '*': significant corner; 'o': the sharpest corner.
Table 2
Number of initial and final significant corners

| Objects | No. of initial corners | No. of final corners (L) |
|---|---|---|
| hd1–hd6 | 11, 8, 9, 9, 10, 7 | 11 |
| Plane 1–Plane 9 | 14, 16, 20, 19, 17, 20, 16, 17, 15 | 20 |
| a1–a8 | 17, 16, 18, 11, 20, 14, 13, 16 | 20 |
| Tool1–Tool4 | 2, 12, 6, 9 | 12 |
samples from corners". The results show that the dissimilarities of affine curves are much smaller than those of non-affine-similar curves; hence these two types of curve are well separated in the feature spaces. These results also hold when the significant corners of two affine curves do not correspond exactly due to the shifts. Consequently, the two proposed dissimilar measures are effective criteria for discriminating candidate objects which are obviously different from the target object.

Tables 4–6 show the measured dissimilar values, $\mathrm{dissim}_1$ and $\mathrm{dissim}_2$, and the associated mean values of $\mathrm{dissim}_1$ and $\mathrm{dissim}_2$ for all non-affine objects in the sets of planes, animals and tools, respectively. For affine object curves, the values of $\mathrm{dissim}_1$ and $\mathrm{dissim}_2$ were smaller by roughly an order of magnitude or more (not included in the tables), well below one half of the corresponding mean dissimilar values of the non-affine objects. These results indicate that the proposed dissimilar measures are valid and effective, since there is a large inter-class distance between the features of affine and non-affine objects.
4.2. Evaluation of the fine-matching algorithm

The four sets of object curves and their affine transformed versions were used to evaluate the curve matching and parameter estimation using only the fine-matching algorithm. In the experiments, each object curve was approximated by a cubic B-spline using a small number of knot points (3rd column of Tables 7–10), including the significant corners ($L$ in Table 2). In all these tests $\kappa_\theta = \kappa_\beta = 0.1$ was selected, using the principle that a smaller $\kappa$ gives better steady-state performance but slower convergence, while a large $\kappa$ can cause the algorithm to become unstable and to oscillate around the minimum.

Tables 7–10 show the results from the fine object-matching algorithm. In these tables, the accuracies of the estimated parameters are evaluated by the differences in rotation $|\Delta\theta| = |\hat\theta - \theta|$, in scaling $|\Delta\beta| = |\hat\beta - \beta|$, and in translation $|\Delta b| = \sqrt{(\hat b_x - b_x)^2 + (\hat b_y - b_y)^2}$. The MMSE values from matching all non-affine object curves in the same set are also included in these tables as references. The results consistently showed that the algorithm estimated the parameters with high accuracy and with small MMSEs for affine objects. The convergence of the algorithm was fast, with an average of 50 iterations for the hand curves, 49 for the aircraft curves, 46 for the animal curves, and 42 for the tool curves. Further, for all non-affine objects, much larger MMSE values were obtained. Consequently, the fine-matching algorithm easily distinguishes affine and non-affine object curves due to the large separable MMSE distances between the two cases.

Comparing the results of matching aircraft curves in Table 8 to those shown in Tables II–IV of Cohen et al. [15] (the first two rows corresponding to rotations, scalings and translations), our results show a significantly increased capability for object discrimination, i.e. smaller MMSE values for all affine-similar planes (approximately 10 times smaller), and larger MMSE values for the remaining non-affine plane curves (approximately 10 times larger). Consequently, our algorithm has a greater potential for discriminating object curves.

4.3. Coarse-to-fine matching of objects

The coarse-to-fine-matching algorithm was applied, respectively, to the four sets of object curves and their affine transformed versions. Since the dissimilar values between affine object curves were well below the mean
Table 3
Coarse matching of affine and non-affine-similar hand curves. hd1–hd6: the original hand curves; t-hd1–t-hd6: affine transformed hand curves

| Object | Objects compared | Shifted samples from corners | dissim_1 | Mean (dissim_1) | dissim_2 | Mean (dissim_2) |
|---|---|---|---|---|---|---|
| hd1 | t-hd2,3,4,5,6 | 0 | 0.2446–0.7141 | 0.4600 | 0.2924–0.5217 | 0.4004 |
| | t-hd1 | [−10,10] | ≤0.0090 | * | ≤0.0886 | * |
| | t-hd1 | [−20,20] | ≤0.0157 | * | ≤0.1945 | * |
| hd2 | t-hd1,3,4,5,6 | 0 | 0.4026–0.5657 | 0.4492 | 0.3138–0.9084 | 0.5562 |
| | t-hd2 | [−10,10] | ≤0.0171 | * | ≤0.0999 | * |
| | t-hd2 | [−14,14] | ≤0.0197 | * | ≤0.1532 | * |
| hd3 | t-hd1,2,4,5,6 | 0 | 0.2892–0.6477 | 0.4243 | 0.3059–0.5770 | 0.4230 |
| | t-hd3 | [−10,10] | ≤0.0128 | * | ≤0.0933 | * |
| | t-hd3 | [−20,20] | ≤0.0197 | * | ≤0.1821 | * |
| hd4 | t-hd1,2,3,5,6 | 0 | 0.4071–0.7141 | 0.5677 | 0.3881–1.1559 | 0.7757 |
| | t-hd4 | [−10,10] | ≤0.0124 | * | ≤0.0864 | * |
| | t-hd4 | [−12,12] | ≤0.0149 | * | ≤0.1022 | * |
| hd5 | t-hd1,2,3,4,6 | 0 | 0.2412–0.5748 | 0.4018 | 0.3113–0.5003 | 0.3991 |
| | t-hd5 | [−10,10] | ≤0.0125 | * | ≤0.1557 | * |
| | t-hd5 | [−20,20] | ≤0.0154 | * | ≤0.2100 | * |
| hd6 | t-hd1,2,3,4,5 | 0 | 0.4026–0.4947 | 0.4532 | 0.3487–1.0332 | 0.7075 |
| | t-hd6 | [−10,10] | ≤0.0112 | * | ≤0.0780 | * |
| | t-hd6 | [−20,20] | ≤0.0198 | * | ≤0.1441 | * |
Table 4
Coarse matching of non-affine-similar airplane curves

| Object | dissim_1 (min, max) | Mean (dissim_1) | dissim_2 (min, max) | Mean (dissim_2) |
|---|---|---|---|---|
| Plane1 | (0.307, 0.519) | 0.400 | (0.289, 0.567) | 0.437 |
| Plane2 | (0.347, 0.524) | 0.464 | (0.319, 0.636) | 0.454 |
| Plane3 | (0.388, 0.544) | 0.459 | (0.309, 0.489) | 0.404 |
| Plane4 | (0.284, 0.451) | 0.390 | (0.132, 0.556) | 0.386 |
| Plane5 | (0.307, 0.544) | 0.442 | (0.232, 0.606) | 0.414 |
| Plane6 | (0.317, 0.461) | 0.395 | (0.352, 0.505) | 0.418 |
| Plane7 | (0.298, 0.450) | 0.384 | (0.321, 0.729) | 0.535 |
| Plane8 | (0.337, 0.508) | 0.421 | (0.495, 1.068) | 0.788 |
| Plane9 | (0.284, 0.501) | 0.414 | (0.155, 0.464) | 0.378 |
Table 5
Coarse matching of non-affine-similar animal curves. a1–a8: cat, elephant, horse, lizard, mouse, rabbit, squirrel, bear

| Object | dissim_1 (min, max) | Mean (dissim_1) | dissim_2 (min, max) | Mean (dissim_2) |
|---|---|---|---|---|
| a1 | (0.456, 0.537) | 0.495 | (0.233, 0.751) | 0.367 |
| a2 | (0.452, 0.588) | 0.509 | (0.259, 0.971) | 0.458 |
| a3 | (0.425, 0.564) | 0.492 | (0.252, 0.733) | 0.385 |
| a4 | (0.482, 0.588) | 0.523 | (0.280, 0.379) | 0.333 |
| a5 | (0.459, 0.564) | 0.528 | (0.237, 0.752) | 0.358 |
| a6 | (0.425, 0.513) | 0.488 | (0.233, 0.679) | 0.344 |
| a7 | (0.467, 0.562) | 0.506 | (0.234, 0.663) | 0.366 |
| a8 | (0.452, 0.540) | 0.495 | (0.271, 0.888) | 0.448 |
dissimilar values of the non-affine object curves, $Th1$ and $Th2$ were chosen to be proportional to the mean dissimilar values. Curve "tool1" in Table 6 is a special case, since its number of initial significant corners is obviously different from those of the remaining tools. "tool1" was therefore considered by the coarse-matching algorithm as non-affine-similar and removed from the candidate list before the dissimilar measures were applied. The parameters used in these tests are summarized in Table 11.
Table 6
Coarse-matching of non-affine-similar tool curves (Tool1 was removed from the candidate list before the dissimilarity measures were applied)

Object  dissim₁   Mean(dissim₁)  dissim₂   Mean(dissim₂)
Tool1   *         *              *         *
Tool2   ≥0.4556   0.6516         ≥0.4917   0.5745
Tool3   ≥0.5091   0.6202         ≥0.3374   0.7869
Tool4   ≥0.6992   0.7618         ≥0.6793   1.4395
Table 7
Fine-matching of hand curves. The columns |Δθ| (deg), |Δβ|, |Δb| and MMSE refer to matching the affine-transformed version; MMSEs refers to matching the non-affine-similar curves

Object  No. samples in   No. B-spline  |Δθ| (deg)  |Δβ| (10⁻…)  |Δb| (10⁻…)  MMSE   MMSEs
        discrete curve   knots
hd1     1223             34            0.1316      0.93         5.640        3.020  ≥1910.6
hd2     675              34            0.0096      3.65         0.141        0.025  ≥678.3
hd3     894              34            0.0198      0.64         0.358        0.088  ≥624.9
hd4     656              33            0.0184      2.03         0.250        0.090  ≥510.5
hd5     950              33            0.0078      7.88         0.364        0.545  ≥1226.4
hd6     783              34            0.0099      7.00         0.248        0.207  ≥580.1
Table 8
Fine-matching of airplane curves. The columns |Δθ| (deg), |Δβ|, |Δb| and MMSE refer to matching the affine-transformed version; MMSEs refers to matching the non-affine-similar curves

Object  No. samples in   No. B-spline  |Δθ| (deg)  |Δβ| (10⁻…)  |Δb| (10⁻…)  MMSE    MMSEs
        discrete curve   knots
Plane1  663              35            0.0448      0.096        0.170        0.1818  ≥196.5
Plane2  672              35            0.0641      1.198        0.016        0.3257  ≥215.3
Plane3  735              35            0.0082      0.166        0.154        0.1202  ≥171.0
Plane4  757              36            0.0244      0.967        0.193        1.1389  ≥184.4
Plane5  725              35            0.0370      0.447        0.266        0.2372  ≥341.5
Plane6  706              41            0.0271      1.151        0.105        0.5699  ≥300.9
Plane7  539              35            0.0205      0.560        0.100        0.3083  ≥149.8
Plane8  552              34            0.0276      0.339        0.062        0.2003  ≥184.2
Plane9  656              35            0.1128      0.625        0.090        0.9178  ≥163.5
Table 9
Fine-matching of animal curves. The columns |Δθ| (deg), |Δβ|, |Δb| and MMSE refer to matching the affine-transformed version; MMSEs refers to matching the non-affine-similar curves

Object  No. samples in   No. B-spline  |Δθ| (deg)  |Δβ| (10⁻…)  |Δb| (10⁻…)  MMSE    MMSEs
        discrete curve   knots
a1      924              35            0.0130      0.580        0.170        0.4122  ≥1615.9
a2      799              38            0.0023      0.234        0.016        0.2835  ≥807.2
a3      881              35            0.0710      0.026        0.154        1.0300  ≥1446.7
a4      868              35            0.0200      0.170        0.193        0.2633  ≥1440.1
a5      916              35            0.0163      0.517        0.266        0.2081  ≥1718.6
a6      887              34            0.0598      1.145        0.105        0.5984  ≥1056.1
a7      769              34            0.0410      0.711        0.100        0.7484  ≥990.9
a8      539              29            0.0269      0.028        0.062        0.1673  ≥480.7
Table 10
Fine-matching of tool curves. The columns |Δθ|, |Δβ|, |Δb| and MMSE refer to matching the affine-transformed version; MMSEs refers to matching the non-affine-similar curves

Object  No. samples in   No. B-spline  |Δθ| (deg, 10⁻…)  |Δβ| (10⁻…)  |Δb| (10⁻…)  MMSE    MMSEs
        discrete curve   knots
Tool1   431              21            1.804             16.71        1.6991       0.2717  ≥149.6
Tool2   201              32            8.633             8.61         1.607        0.0901  ≥159.9
Tool3   777              31            26.56             4.21         0.738        0.0545  ≥2577.4
Tool4   576              32            2.563             0.38         1.293        0.0001  ≥195.3
Table 11
The parameters used in coarse-to-fine curve matching tests

Parameter   In Eq.         Value(s) used in tests
w₁          (2)            [1,5]
w₂          (3)            [10,15]
L           (3)            3rd column of Table 2
Threshold   (3)            [1453, 1703]
Th1         Section 2.3    0.5 mean(dissim₁)
Th2         Section 2.3    0.5 mean(dissim₂)
k₁ and k₂   (12)           0.1
The final results obtained from the coarse-to-fine matching were, as expected, the same as those obtained by applying the fine-matching algorithm alone. However, most candidates were removed at the coarse-matching step. The approximate program execution time for the coarse-to-fine comparison of two curves on a DEC AXP-3000 workstation was 3 s.
4.4. Limitations

Since the fine-matching step uses B-spline approximated curves, the algorithm requires that objects be rigid, or subject only to small deformations. The algorithm is also sensitive to the quality of the object boundary curves: inaccurately estimated boundary curves are equivalent to curves from deformed objects. Consequently, a poorly estimated curve of an affine object might be recognized as an affine-dissimilar object. Finally, the affine transformations considered are limited to rotations, scalings and translations.

5. Conclusions

A coarse-to-fine curve-matching algorithm is proposed and tested for matching planar objects. The significant corners are extracted for each object using the smoothed discrete object boundary curve. By applying the proposed dissimilarity measures to the object features derived from the significant corners, our test results showed small dissimilarity values among affine-similar objects and large dissimilarity values among non-affine-similar objects. Consequently, the coarse-matching step is effective in removing most affine-dissimilar objects with low computation. By exploiting significant corners for interpolating curve points and by using the proposed fine-matching algorithm, we simultaneously achieved small MMSE values for matching affine objects and obtained the estimated transformation parameters with good accuracy. Further, we obtained large MMSE differences between matching affine object curves and matching non-affine object curves. Compared to the results shown in [15], our fine-matching algorithm has shown a greater capacity for discriminating objects. Finally, by combining the coarse- and the fine-matching steps, the algorithm is robust in terms of matching accuracy and computational efficiency.

References
[1] G. Sullivan, Visual interpretation of known objects in constrained scenes, Philos. Trans. Roy. Soc. Lond. B 337 (1992) 361–370.
[2] X. Wu, B. Bhanu, Gabor wavelet representation for 3-D object recognition, IEEE Trans. Image Process. 6 (1) (1997) 47–64.
[3] R.C. Nelson, J. Aloimonos, Obstacle avoidance using flow field divergence, IEEE Trans. Pattern Anal. Mach. Intell. 11 (10) (1989) 1102–1106.
[4] G.E. Mailloux, F. Langlois, P.L. Simard, M. Bertrand, Restoration of the velocity field of the heart from two-dimensional echocardiograms, IEEE Trans. Med. Imaging 8 (2) (1989) 143–153.
[5] A. Gee, R. Cipolla, Fast visual tracking by temporal consensus, Image Vision Comput. 14 (2) (1994) 105–114.
[6] V.I. Pavlovic, R. Sharma, T.S. Huang, Visual interpretation of hand gestures for human-computer interaction: a review, IEEE Trans. Pattern Anal. Mach. Intell. 19 (1997) 667–695.
[7] B.S. Manjunath, W.Y. Ma, Texture features for browsing and retrieval of image data, IEEE Trans. Pattern Anal. Mach. Intell. 18 (1996) 837–842.
[8] H.G. Musmann, M. Hötter, J. Ostermann, Object-oriented analysis-synthesis coding of moving images, Signal Process. Image Commun. 1 (2) (1989) 117–138.
[9] R. Chellappa, B. Girod, D.C. Munson Jr., M. Tekalp, M. Vetterli, The past, present and future of image and multidimensional signal processing, IEEE Signal Process. Mag. 15 (2) (1998) 21–58.
[10] L. Torres, M. Kunt, Video Coding: The Second Generation Approach, Kluwer Academic Publishers, Dordrecht, 1996.
[11] J. Weng, N. Ahuja, T.S. Huang, Motion and structure from point correspondences with error estimation: planar surfaces, IEEE Trans. Signal Process. 39 (12) (1991) 2691–2717.
[12] B. Kamgar-Parsi, B. Kamgar-Parsi, Matching sets of 3D line segments with application to polygonal arc matching, IEEE Trans. Pattern Anal. Mach. Intell. 19 (1997) 1090–1099.
[13] B. Kamgar-Parsi, A. Margalit, A. Rosenfeld, Matching general polygonal arcs, CVGIP: Image Understanding 53 (1991) 227–234.
[14] E. Persoon, K.S. Fu, Shape discrimination using Fourier descriptors, IEEE Trans. Systems Man Cybernet. 7 (1977) 170–179.
[15] F.S. Cohen, Z. Huang, Z. Yang, Invariant matching and identification of curves using B-spline curve representation, IEEE Trans. Image Process. 4 (1995) 1–10.
[16] F.S. Cohen, J. Wang, Modeling image curves using invariant 3-D object curve models - a path to 3-D recognition and shape estimation from image contours, IEEE Trans. Pattern Anal. Mach. Intell. 16 (1994) 1–23.
[17] M.J. Paulik, M. Das, N.K. Loh, Nonstationary autoregressive modeling of object contours, IEEE Trans. Signal Process. 40 (1992) 660–675.
[18] J. Flusser, T. Suk, Pattern recognition by affine moment invariants, Pattern Recognition 26 (1993) 167–174.
[19] P.J. Besl, R.C. Jain, 3D object recognition, Comput. Surv. 17 (1) (1985) 75–145.
[20] A.L.N. Fred, J.S. Marques, P.M. Jorge, Hidden Markov models vs syntactic modeling in object recognition, in: Proceedings of the International Conference on Image Processing (ICIP '97), 1997, pp. 893–896.
[21] Q.M. Tieng, W.W. Boles, Wavelet-based affine invariant representation: a tool for recognizing planar objects in 3D space, IEEE Trans. Pattern Anal. Mach. Intell. 19 (1997) 846–857.
[22] C. de Boor, A Practical Guide to Splines, Applied Mathematical Sciences, vol. 27, Springer, Berlin, 1978.
[23] K. Arbter, W.E. Snyder, H. Burkhardt, G. Hirzinger, Application of affine-invariant Fourier descriptors to recognition of 3-D objects, IEEE Trans. Pattern Anal. Mach. Intell. 12 (1990) 452–459.
[24] D. Cyganski, R.F. Vaz, A linear signal decomposition approach to affine invariant contour identification, Proc. SPIE: Intell. Robots Comput. Vision X 1607 (1991) 98–109.
[25] D.F. Rogers, J.A. Adams, Mathematical Elements for Computer Graphics, McGraw-Hill, New York, 1990.
[26] F.S. Cohen, J. Wang, Modeling image curves using invariant 3-D object curve models - a path to 3-D recognition and shape estimation from image contours, IEEE Trans. Pattern Anal. Mach. Intell. 16 (1994) 1–12.
[27] Y.H. Gu, Adaptive multiresolution Hermite-binomial filters for image edge and texture analysis, Proc. SPIE 2308 (1994) 748–759.
[28] K.H. Liang, T. Tjahjadi, Y.H. Yang, Bounded diffusion for multiscale edge detection using regularized cubic B-spline fitting, IEEE Trans. Systems Man Cybernet. 29 (2) (1999) 291–297.
[29] B.D. Chen, P. Siy, Forward/backward contour tracing with feedback, IEEE Trans. Pattern Anal. Mach. Intell. 9 (1987) 438–446.
About the Author - (IRENE) YU-HUA GU received the M.Sc. degree from East China Normal University in 1984 and the Ph.D. degree in electrical engineering from Eindhoven University of Technology, The Netherlands, in 1992. She was a research fellow at Philips Research Institute IPO (NL) and Staffordshire University (UK), and a lecturer at the University of Birmingham (UK) during 1992–1996. Since September 1996 she has been an assistant professor in the Department of Signals and Systems at Chalmers University of Technology, Sweden. Her current research interests include multispectral image processing, object recognition, and time-frequency and time-scale domain signal analysis. Yu-Hua Gu is a member of the IEEE Signal Processing Society.

About the Author - TARDI TJAHJADI received the B.Sc. (Hons.) degree in mechanical engineering from University College London, UK, in 1980, the M.Sc. degree in management sciences (operational management) and the Ph.D. in total technology from the University of Manchester Institute of Science and Technology, UK, in 1981 and 1984, respectively. He joined the Department of Engineering at the University of Warwick and the UK Daresbury Synchrotron Radiation Source Laboratory as a Joint Teaching Fellow in 1984. Since 1986 he has been a lecturer in computer systems engineering at the same university. His research interests include multiresolution image processing, model-based colour image processing and fuzzy expert systems.
Pattern Recognition 33 (2000) 1423–1436

Computing the shape of a planar points set
Mahmoud Melkemi*, Mourad Djebali
L.I.G.I.M., Université Claude Bernard Lyon 1, 43 boulevard du 11 Novembre 1918, Bât. 710, 69622 Villeurbanne, France
Received 7 January 1999; accepted 23 April 1999

Abstract

In this article, we introduce a mathematical formalism defining the shape of a finite point set, which we call the A-shape. The parameter A is a finite set of points whose position variations allow the A-shape to generate a family of graphs extracted from the Delaunay triangulation. Each graph corresponds to an element of a set of shapes presenting more and more details, going from the convex hull of the point set down to the point set itself. The shape having the suitable level of details is obtained by a judicious choice of A. We also propose a method to determine the A for which the A-shape gives an adequate shape for point sets containing both dense and sparse regions. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Pattern recognition; Computational geometry; Shape hull; Voronoi diagram
1. Introduction

By the shape of a finite point set we mean a description of this set by connected components, where each component is a polygon whose vertices are selected among the initial points of the set. This polygonal representation must reflect the forms perceived by a human observer of the dot pattern. In the literature, several works treat the problem of shape computation. According to the approach followed, we distinguish two complementary categories. The first category is a set of algorithms using the Delaunay triangulation and based on heuristics. In certain complex situations, these algorithms give satisfactory results. However, on the one hand, they depend on a significant number of thresholds to be adjusted; on the other hand, the interpretation of the obtained results is often based on visual criteria. In this class, Jarvis [1] proposes an algorithm based on the idea of regarding the shape as a generalization of the convex hull. Ahuja and Tuceryan [2] describe a classification method based on the geometrical properties of a Voronoi region. Other
* Corresponding author. Tel.: +4-72432630; fax: +4-72431312.
E-mail address: [email protected] (M. Melkemi).
algorithms [3,4] use the minimum spanning tree or the relative neighborhood graph to first define neighborhood relationships between the points of the set; these proximity relations are then used in a classification process. The second category gathers various concepts that reveal morphological properties of a point set, among them the minimum spanning tree [5], the Gabriel graph [6], the relative neighborhood graph [7,8] and the β-skeletons [9], which are subgraphs of the Delaunay triangulation and exhibit the organization of the points according to neighborhood relationships. In this class, we are interested in a concept related to shape computation according to a mathematical formalism. Indeed, Edelsbrunner et al. [10] describe the shape by a family of graphs, named α-shapes, obtained from the Delaunay triangulation, where the parameter α is a real number which controls the level of details revealed by each graph. Conceptually, this formalism regards the notion of a shape as a generalization of the convex hull. In this context, the convex hull is a description of the shape with an insufficient level of details. By varying the parameter α from +∞ to 0, the α-shape reveals gradually more details. On the basis of this construction, one might conclude that for a finite point set there is always a suitable α for which the α-shape reflects the shape with the desired level of details. This conclusion, however, admits counter-examples. Indeed, there are point
0031-3203/00/$20.00 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S 0 0 3 1 - 3 2 0 3 ( 9 9 ) 0 0 1 2 4 - 7
sets for which the α-shape family does not contain elements representing suitable shapes. Concretely, Fig. 1 shows a point set for which no α-shape detects the hole H while correctly modeling the sparse region. This region is built so that each disc which does not contain any point of the set and which touches two points has a diameter larger than r, r being the diameter of the largest disc contained in H. With the aim of curing these limits of the α-shape, we propose to describe the shape by a family of graphs extracted from the Delaunay triangulation. Each graph is called an A-shape, where A is a finite point set controlling the level of details reflected by this graph [11]. The A-shape finds its basis in transcribing the concept of connected components from the continuous case to the discrete one. The analogue of the A-shape in the continuous case, named the 𝒜-shape, corresponds to observations of a domain D at various scales; the level of details reflected at each observation is controlled by the parameter 𝒜. Concretely, the 𝒜-shape is the border of a domain obtained by a dilation of D. An example of an A-shape is illustrated by Fig. 1(b), where the A-shape corresponds to a suitable level of details for both the dense region and the sparse one.

The implementation of the A-shape requires an adequate choice of the parameter A. In the simple case of a uniformly distributed point set, we define the parameter A relative to a threshold t ≥ 0: the points of A are the vertices of the Voronoi diagram of the set that are centers of Delaunay circles having radii larger than t. The points of A, thus defined, allow the A-shape to detect the holes and the cavities with more and more details as t decreases. In the more general case, where the point sets contain dense and sparse zones, we adapt the construction of the simple case by associating with each point of the set a disc centered at this point. Each disc is an influence zone of the point: the zone is large if the point is in a sparse region and small otherwise. By taking into account the point weights, the points of A are computed in a way similar to that of the simple case.

This article is organized as follows: in Section 2, we present the Voronoi diagram and the α-shape, which are related to the A-shape concept. In the following section, we describe the A-shape and its principal properties. The last section concerns the implementation of the A-shape; it is a matter of computing the suitable A for the case of sets containing sparse and dense regions.

2. Voronoi diagram and α-shape

2.1. Definition of the Voronoi diagram

Let S = {p₁, p₂, …, pₙ} be a set of n distinct points of the plane, and let d(p, q) denote the Euclidean distance between two points p and q. We define the Voronoi polygon of a point pᵢ by

R(S, pᵢ) = {p ∈ ℝ² ; d(p, pᵢ) < d(p, pⱼ) ∀j such that i ≠ j}.

The set V(S) = {R(S, p₁), …, R(S, pₙ)} defines a partition of the plane; this set is called the Voronoi diagram of S. The dual of the Voronoi diagram, obtained by connecting the points pᵢ whose associated polygons are adjacent, is called the Delaunay triangulation of S, which we denote DT(S).

2.2. α-shape of a point set

Edelsbrunner et al. [10] introduced an elegant definition of the shape of a finite point set. They justify their concept by defining it as a generalization of the convex
Fig. 1. Example of a set containing a dense region and a sparse one. (a) The circle corresponds to the largest disc included in the hole H. (b) A-shape of the points set.
hull. This concept is expressed in terms of a family of shapes called α-shapes, α ≥ 0. By varying the parameter α from 0 to +∞, the α-shape covers a finite set of shapes going from "fine shapes" to "coarse shapes". The value α = 0 corresponds to the initial set, and when α is sufficiently large, the α-shape gives the convex hull. The example of Fig. 2 shows various shapes of the point set "A" for different values of α (α = 8, 12, 20, 60). We observe that the shape of Fig. 2(b) corresponds to an adequate level of details. Formally, the α-shape is defined as follows.
It is seen as a polygon which can be concave, convex and its components can be reduced to a point. The a-shape is deduced from the Delaunay triangulation. Indeed, an edge e of D¹(S) is an element of a-shape if and only if a (e))a)a (e), where a (e) and a (e) are two
positive real numbers associated with e and computed from the edges of <(S). This property of a-exposed edges makes it possible to extract the a-shape from D¹(S) with a linear complexity, thus, a-shape is computed without di$culty from D¹(S). a-shape shows its limits, if it is a question of modeling sets containing dense and sparse regions. To allow an adequate modeling of this sets type, we propose an alternative concept which we call Ashape. The following part gives a formal de"nition of this concept.
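This extraction from the Delaunay triangulation can be sketched in Python with SciPy. As a simplification, instead of the interval test α₁(e) ≤ α ≤ α₂(e), the sketch below keeps the boundary edges of the Delaunay triangles whose circumradius is at most α; this α-complex boundary is a common rough stand-in for the α-shape, and all names here are ours, not the authors'.

```python
import numpy as np
from scipy.spatial import Delaunay

def circumradius(a, b, c):
    # Circumradius from the side lengths and (Heron) area of the triangle.
    la, lb, lc = np.linalg.norm(b - c), np.linalg.norm(a - c), np.linalg.norm(a - b)
    s = 0.5 * (la + lb + lc)
    area = max(np.sqrt(max(s * (s - la) * (s - lb) * (s - lc), 0.0)), 1e-12)
    return la * lb * lc / (4.0 * area)

def alpha_shape_edges(points, alpha):
    """Boundary edges of the alpha-complex: a simplified stand-in for the
    interval test alpha1(e) <= alpha <= alpha2(e) described in the text."""
    tri = Delaunay(points)
    edges = set()
    for ia, ib, ic in tri.simplices:
        if circumradius(points[ia], points[ib], points[ic]) <= alpha:
            for e in ((ia, ib), (ib, ic), (ia, ic)):
                # Boundary edges appear in exactly one surviving triangle,
                # so toggling removes the interior (shared) edges.
                edges.symmetric_difference_update({tuple(sorted(e))})
    return edges
```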
3. Presentation of the A-shape concept

The A-shape [11] is a rewriting of the concept of connected components in the discrete case. Indeed, we reformulate the connected components by introducing the concept of the
Fig. 2. α-shapes of the dot pattern "A". For each shape, the value of α is, respectively: (a) α = 8, (b) α = 12, (c) α = 20, (d) α = 60.
𝒜-shape of domains. By varying the parameter 𝒜, the 𝒜-shape of a set of domains D describes views of D at various scales. The image of D at the coarsest scale is its convex hull, and that at the finest scale corresponds to the connected components of D. In what follows, we define the 𝒜-shape and present its discrete analogue, the A-shape.

3.1. The continuous case: 𝒜-shape of domains

Let D be the union of a set of bounded domains D₁, …, Dₙ. Let 𝒜 be a set of bounded curves in the plane such that:

1. the intersection of 𝒜 with D is empty;
2. there is a set of l closed curves C₁, …, C_l included in 𝒜, whose elements are boundaries of domains dom₁, …, dom_l with D ⊂ ∪ᵢ₌₁ˡ domᵢ.

This last condition ensures that D is surrounded by curves extracted from 𝒜. An example of 𝒜 is the boundary of a bounded region containing D.

3.1.1. Definition of the 𝒜-shape

1. B(𝒜, D) is the domain containing D and having 𝒜′ as border, with

𝒜′ = {p ∈ ℝ² − D : d(p, 𝒜) = d(p, D)}.

2. We call the 𝒜-hull of D the set given by the expression

∩_{p∈𝒜′} b̄(p) ∩ B(𝒜, D),

where b̄(p) denotes the closed complement of the disc centered at p ∈ 𝒜′ and having d(p, D) as radius.

3. We call the 𝒜-shape of D the boundary of its 𝒜-hull.

An example of an 𝒜-shape is illustrated by Fig. 3. The 𝒜-shape is a multi-scale description of a set D; the parameter 𝒜 controls the level of details reflected by an observation at a given scale. The 𝒜-shape is the boundary of a set obtained by dilation of D. The "coarsest" level of details, which corresponds to the convex hull of D, is obtained by taking 𝒜 to be a closed curve which, on the one hand,
Fig. 3. Illustration of the 𝒜-hull of a set D. (a) shows the curves 𝒜 and 𝒜′. (b) illustrates the set B(𝒜, D). (c) shows the 𝒜-hull of D.
surrounds D and, on the other hand, is sufficiently far from this domain. The "finest" level of details corresponds to the connected components of D; this level is reached by considering 𝒜 as the skeleton of the complement of D in a square containing D. In the example illustrated by Fig. 4, two different levels of details are represented. Fig. 4(a) corresponds to the "coarse shape" of D obtained by using the indicated 𝒜. That of Fig. 4(b) reflects the adequate level of details, which is the set of the connected components of D. The passage to the discrete case consists in transcribing the 𝒜-shape into its analogue, the A-shape. In this passage, S replaces D and A replaces 𝒜. The A-shape is thus defined as follows.
3.2. The discrete case: A-shape of a point set

Let A be a finite point set. We denote by A′ the set of the vertices of the Voronoi edges separating the points of S from those of A. In what follows, b̄(q) (q ∈ A′) is defined as the complement of the Delaunay disc centered at q.

3.2.1. Definition of the A-shape

1. We call the A-hull of S the set

∩_{q∈A′} b̄(q) ∩ B(A, S),

where

B(A, S) = ∪_{p∈S} R(S ∪ A, p).

2. An edge [pᵢpⱼ] of DT(S ∪ A) is A-exposed if there is a Delaunay circle passing through pᵢ, pⱼ and a point of A.
3. The A-shape of S is the set of A-exposed edges extracted from DT(S ∪ A).

Fig. 5 illustrates an example of an A-shape. The A-shape is a graph which can be seen as a set of polygons; these polygons can be convex, concave or reduced to isolated points. It can be regarded as the boundary of the A-hull where the arcs of circles are replaced by A-exposed edges. The example of Figs. 6 and 7 shows the passage from the continuous case (𝒜-shape) to the discrete one (A-shape).
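Under the definition above, an A-exposed edge of DT(S ∪ A) is an edge between two points of S that shares a Delaunay triangle with a point of A. A minimal Python sketch of this test, our own illustration assuming points in general position:

```python
import numpy as np
from scipy.spatial import Delaunay

def a_shape_edges(S, A):
    """A-exposed edges per definition 3.2.1: edges [pi pj] of DT(S u A)
    with pi, pj in S that lie in a triangle whose third vertex is in A
    (its empty circumcircle then passes through pi, pj and that A-point)."""
    n = len(S)
    pts = np.vstack([S, A])          # indices < n are S, indices >= n are A
    tri = Delaunay(pts)
    edges = set()
    for simplex in tri.simplices:
        in_S = [v for v in simplex if v < n]
        in_A = [v for v in simplex if v >= n]
        if len(in_S) == 2 and len(in_A) == 1:
            edges.add(tuple(sorted(in_S)))
    return edges
```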
Fig. 4. 𝒜-hull of a set D for two choices of 𝒜. (a) This choice of 𝒜 corresponds to a "coarse" shape of D. (b) This choice of 𝒜 corresponds to the exact shape of D.
Fig. 5. A-shape of the point set S. (a) V(S ∪ A). (b) B(A, S). (c) A-hull of S. (d) A-shape of S.
The sets S and A illustrated by Fig. 6 are obtained by sampling the sets D and 𝒜 of Fig. 4(a). Fig. 7 shows two examples of A-shapes obtained from the sampling of the sets D and 𝒜 considered in Fig. 4. We note the similarity of the results obtained in the example illustrated by Fig. 4 and in that illustrated by Fig. 7. In what follows, we present some interesting properties of the A-shape.

3.2.2. Properties

1. The A-shape of S is a subset of DT(S).
2. The set S and its convex hull are elements of the A-shape family of S.
3. Let [pq] be an edge of DT(S), where p and q are neighbors across a Voronoi edge [v₁v₂]. Then [pq] is in the A-shape if and only if A verifies the following conditions:
(i) A ∩ b(v₁) ∩ b(v₂) = ∅.
(ii) b(v₁) ∩ A ≠ ∅ or b(v₂) ∩ A ≠ ∅.
(iii) ∀x ∈ A ∩ b(v₁) (respectively x ∈ A ∩ b(v₂)), A ∩ b(M_x, d(x, M_x)) ∩ b(v₂) = ∅ (respectively A ∩ b(M_x, d(x, M_x)) ∩ b(v₁) = ∅), where M_x is the intersection point between the bisector of [xp] and the edge [v₁v₂].

Property 2 gives the highest and the lowest levels of details reached by the A-shape; they correspond, respectively, to the convex hull of S and to the set S itself. Property 3 gives a necessary and sufficient condition on the points of A for an edge of DT(S) to be in the A-shape. The reader will find the proofs of these properties in
Fig. 6. The example of Fig. 4, corresponding to the continuous case, is considered here in the discrete one, by sampling 𝒜 and D.
the appendix. Fig. 8 illustrates the various sets used in these properties. The implementation of the A-shape concept rests on the determination of the parameter A needed to compute the adequate shape of a given set. In the following section, we propose a method allowing us to build A both for uniformly distributed sets and for sets containing dense and sparse regions.

4. Computing the parameter A

4.1. The case of uniformly distributed point sets

If S is a uniformly distributed set, the parameter A that we propose is the union [12] of the two following sets:

(i) the centers of the Delaunay circles associated to DT(S) and having radii larger than a threshold t ≥ 0;
(ii) for each edge [pq] of the convex hull of S, the point not belonging to the convex hull of S which is the center of a circle passing through p and q and having a sufficiently large radius.

The set defined in (i) makes it possible to detect holes and cavities, and that described in (ii) exhibits the convex zones. When the value of t decreases, the A-shape describes a spectrum of forms expressing gradually more and more details. The convex hull of S is the shape corresponding to a sufficiently large value of t. In the case where t = 0, all Voronoi vertices are considered; the A-shape is then interesting when the aim is to detect forms of curved type, because the points of A correspond to an approximation of the medial axis (see Fig. 9). Another example of an A-shape computed by using this method is illustrated by Fig. 10.
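A sketch of construction (i), assuming SciPy's Delaunay triangulation: the candidate points of A are the circumcenters of the Delaunay triangles whose circumradius exceeds t. Construction (ii), the extra points beyond the hull edges, is omitted, and the function names are ours.

```python
import numpy as np
from scipy.spatial import Delaunay

def circumcenter_radius(a, b, c):
    # Circumcenter from the perpendicular-bisector equations
    # 2*(b - a).x = |b|^2 - |a|^2 etc.; assumes a non-degenerate triangle.
    M = 2.0 * np.array([b - a, c - a])
    rhs = np.array([b @ b - a @ a, c @ c - a @ a])
    center = np.linalg.solve(M, rhs)
    return center, np.linalg.norm(center - a)

def build_A_uniform(S, t):
    """Points of A for a uniformly distributed set S (Section 4.1, item (i)):
    Voronoi vertices, i.e. Delaunay circumcenters, whose circle radius
    exceeds the threshold t."""
    tri = Delaunay(S)
    A = []
    for ia, ib, ic in tri.simplices:
        center, r = circumcenter_radius(S[ia], S[ib], S[ic])
        if r > t:
            A.append(center)
    return np.array(A)
```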
4.2. The case of sets containing dense and sparse regions

We generalize the previous case by proposing a construction of A allowing the A-shape to model point sets containing both dense and sparse regions. To compensate for the low density of sparse regions, we associate with each point of S an influence zone, which is a disc centered at this point having a large radius if the point is in a sparse region and a small radius if it is in a dense one. The points of A are computed by following the same reasoning as in the previous case. This computation is carried out as follows.

For each point pᵢ ∈ S, we denote by pᵢ* ∈ S the point such that d(pᵢ, pᵢ*) = min{d(pᵢ, pⱼ), j ≠ i and j = 1, …, n}. We associate with each point pᵢ a disc denoted b(pᵢ, wᵢ), centered at pᵢ and having wᵢ = d(pᵢ, c pᵢ + (1 − c) pᵢ*), 0 ≤ c ≤ 1, as radius. For a fixed value of c, the disc associated with pᵢ is large if pᵢ is in a sparse region and small if it is in a dense one. In practice, we note that "interesting" values of c lie between 0.5 and 0.8; hereafter, c is fixed at 0.5.

Let us denote by d̃(x, pᵢ) = d(x, pᵢ)² − wᵢ² the power distance between x ∈ ℝ² and the weighted point (pᵢ, wᵢ), and let us suppose known the power diagram [13–15] P(S) of the points (p₁, w₁), (p₂, w₂), …, (pₙ, wₙ). We define A as the union of the two following sets:

1. for each edge [pᵢpⱼ] of the convex hull of S, we consider a point x satisfying the following conditions:
(i) d̃(pᵢ, x) = d̃(pⱼ, x) with d̃(pⱼ, x) sufficiently large, and d̃(p_k, x) ≥ 0 for all k ≥ 1;
(ii) x does not belong to the convex hull of S.
Fig. 7. Two A-shapes of the sampled sets D and 𝒜 of Fig. 4.
2. we consider the vertices x of P(S) which satisfy the following conditions:
(i) there exist points p_l, p_k, p_m ∈ S such that d̃(p_l, x) = d̃(p_k, x) = d̃(p_m, x) ≥ t;
(ii) ∀r ≥ 1, d̃(p_r, x) ≥ 0.

The set A thus built is a generalization of that calculated in the case of uniformly distributed points (cf. Section 4.1). Indeed, if we cancel out the weights wᵢ, we find the set A built for the preceding case. In a similar way, the set A is composed of two sets: the first one makes it possible to detect the convex parts of the shape, and the second reveals its holes and cavities. Thus, to determine the A-shape, we compute V(S) and deduce the weights wᵢ, i = 1, …, n. Then, we compute the power diagram P(S) and build the points of A. The following stage consists in computing V(S ∪ A) and deducing the edges of the A-shape. Note that the case where t = 0 and all vertices are considered as points of A is interesting when it is a question of reconstructing forms of curved type (see Fig. 11). Another example of an A-shape computed by using this method is illustrated by Fig. 12.
5. Conclusion

Determining the shape of a point set by using the α-shape concept requires only the choice of the value of
Fig. 8. [pq] is an edge of the A-shape; the gray region must not contain points of A. In figure (a) the gray region is b(v₁) ∩ b(v₂). In (b) the gray region represents b(v₁) ∩ b(M_x, d(p, M_x)).
one parameter α. For this concept, the uniqueness of this parameter is at the same time an advantage and a disadvantage, especially when it is a question of modeling sets with variable densities. In this article, we proposed an alternative concept to the α-shape, which we call the A-shape. It amounts to a transcription of the concept of connected components from the continuous case to the discrete one. By varying the positions of the points of A, the A-shape describes a collection of graphs which present more and more details of a set. The edges of the A-shape graph of a given set are edges of the Delaunay triangulation of this set; thus the A-shape is deduced without difficulty once the Delaunay triangulation is computed. The implementation of the A-shape rests on the determination of A to calculate the suitable shape of a set; accordingly, we presented a method to determine the positions of the points of A in the case of sets containing dense and sparse regions. The prospective works which follow on from the presented one are given by the following items:

• in certain practical cases, we can find sets whose points are weighted by positive real numbers. The problem consists in calculating the shape by taking the point weights into account. Our objective is to propose a generalization of the A-shape to this type of sets by introducing the weighted A-shape concept;
• in the implementation of the A-shape, it is interesting to develop methods for calculating the adequate A. This work will be a generalization of the method presented in Section 4;
• in the context of surface reconstruction from unorganized points [16–18], it is interesting to see and evaluate the contribution of the A-shape concept when one extends it to sets of three-dimensional points.
6. Summary

This paper introduces a new definition of the shape of a finite planar point set. The presented concept is named the A-shape, where A is an arbitrary finite point set controlling the level of details reflected by the shape. This concept is justified by considering it as a transcription of the shape of domains, which is the boundary of their connected components; this accounts for its ability to achieve a suitable shape for any point set. The implementation of the A-shape requires an adequate choice of the parameter A. In the simple case of a uniformly distributed point set, we define the parameter A relative to a threshold t ≥ 0: the points of A are the vertices of the Voronoi diagram of the set that are centers of the Delaunay circles having radii larger than t. The points thus defined allow the A-shape to detect the holes and the cavities with more and more details as t decreases. In the general case where the point sets contain dense and sparse regions, we adapt the construction of the simple case by associating, with each point of the set, a disc centered at this point. Each disc is an influence zone of its point: the zone is large if the point is in a sparse region and small if
Fig. 9. A-shape of the set shown in (a). (b) V(S ∪ A) (t = 0). (c) A-shape of this set.
not. By taking into account the weights of the points, the points of A are computed in a way similar to that of the simple case.

Appendix A

In this appendix, we present the proofs of the properties described in Section 3.2.2.

Proof of property 1. This property is a direct consequence of the definition of the A-shape.

Proof of property 3. To demonstrate this property, we give the proofs of the following implications.

(a) [pq] is in the A-shape implies that conditions (i), (ii) and (iii) are true. Let us suppose that (i), (ii) or (iii) is false while [pq] belongs to the A-shape. We have, therefore, three possibilities:

• Let us suppose that (i) is false; therefore, there exists a ∈ A such that a ∈ b(v₁) and a ∈ b(v₂). This implies that [v₁v₂] ⊂ R(S ∪ {a}, a), and consequently [pq] is not in the A-
Fig. 10. A-shape of the pattern "A". (b) Represents V(S) with the choice of the points of A (t = 15). (c) Represents V(S ∪ A). (d) The detected shape.
shape, which contradicts the starting assumption. We conclude that (i) is necessarily true.
• Let us suppose that (ii) is false; therefore, the set b(v₁) ∪ b(v₂) does not contain any point of A, and thus there is no point a ∈ A such that a is a neighbor of both p and q in V(S ∪ A). We deduce that [pq] is not in the A-shape, which contradicts the initial assumption.
• Let us suppose that (iii) is false; therefore, there exists x ∈ A ∩ b(v₁) such that A ∩ b(M_x, d(x, M_x)) ∩ b(v₂) ≠ ∅. Let a denote an element of A ∩ b(M_x, d(x, M_x)) ∩ b(v₂); the points x and a satisfy the following relations:

1. v₂ ∈ R(S ∪ {a}, a).
2. M_x ∈ R(S ∪ {a}, a).
3. v₁ ∈ R(S ∪ {x}, x).

The proofs of these three relations are as follows:

a ∈ b(v₂) ⇒ d(a, v₂) < d(q, v₂) = min_{y∈S} d(y, v₂) ⇒ v₂ ∈ R(S ∪ {a}, a),
Fig. 11. A-shape of the set illustrated by (a). (b) Represents the power diagram of the weighted points with the points of A (t = 0). (c) V(S ∪ A). (d) Its A-shape.
a ∈ b(M_x, d(x, M_x)) ⇒ d(a, M_x) < d(x, M_x) = d(p, M_x) = min_{y∈S} d(y, M_x) ⇒ M_x ∈ R(S ∪ {a}, a),

x ∈ b(v₁) ⇒ d(x, v₁) < d(q, v₁) = min_{y∈S} d(y, v₁) ⇒ v₁ ∈ R(S ∪ {x}, x).

From relations 1 and 2, we deduce that [M_x v₂] ⊂ R(S ∪ {a}, a). Relation 3 implies that [M_x v₁] ⊂ R(S ∪ {x}, x). Thus, we deduce that [v₁v₂] ⊂ R(S ∪ {x, a}, a) ∪ R(S ∪ {x, a}, x). Consequently, p and q are not neighbors in V(S ∪ A), whence [pq] is not in the A-shape. This contradicts the starting assumption.

(b) If (i)–(iii) are true then [pq] belongs to the A-shape. Conditions (i) and (iii) guarantee that [v₁v₂] is not included in ∪_{a∈A} R(S ∪ A, a). Consequently, p and q are neighbors in V(S ∪ A). On the other side, condition (ii) implies that there is at least one point a ∈ A such that a is a neighbor of both p and q. On the basis of these facts, we deduce that [pq] belongs to the A-shape.
Fig. 12. A-shape of the set illustrated by (a). (b) Represents the power diagram of the weighted points with the choice of the A points (t = 20). (c) Represents V(S ∪ A). (d) The detected shape.
Proof of property 2. The A-shape of S corresponds to the convex hull of S if we choose the points of A such that, on the one hand, the vertices of the convex hull of S ∪ A are in A and, on the other hand, these points are outside the union of the Delaunay discs associated to DT(S). The A-shape of S gives S itself when all the neighbors of each point of S are in A.
References

[1] R.A. Jarvis, Computing the shape hull of points in the plane, in: Proceedings of the IEEE Computing Society Conference on Pattern Recognition and Image Processing, New York, 1977, pp. 231–241.
[2] N. Ahuja, M. Tuceryan, Extraction of early perceptual structure in dot patterns: integrating region, boundary and component gestalt, Comput. Vision Graphics Image Process. 48 (1989) 304–356.
[3] R. Urquhart, Graph theoretical clustering based on limited neighborhood sets, Pattern Recognition 15 (3) (1982) 173–187.
[4] C.T. Zahn, Graph-theoretical methods for detecting and describing gestalt clusters, IEEE Trans. Comput. C-20 (1971) 68–86.
[5] D. Cheriton, R.E. Tarjan, Finding minimum spanning trees, SIAM J. Comput. 5 (4) (1976) 724–742.
[6] D.W. Matula, R.R. Sokal, Properties of Gabriel graphs relevant to geographic variation research and the clustering of points in the plane, Geograph. Anal. 12 (3) (1980) 205–222.
[7] G.T. Toussaint, The relative neighborhood graph of a finite planar set, Pattern Recognition 12 (1980) 261–268.
[8] J.W. Jaromczyk, G.T. Toussaint, Relative neighborhood graphs and their relatives, Proc. IEEE 80 (9) (1992) 1502–1517.
[9] D.G. Kirkpatrick, J.D. Radke, A framework for computational morphology, in: Computational Geometry, Elsevier, North-Holland, New York, 1985, pp. 234–244.
[10] H. Edelsbrunner, D.G. Kirkpatrick, R. Seidel, On the shape of a set of points in the plane, IEEE Trans. Inform. Theory 29 (1983) 551–559.
[11] M. Melkemi, A-shape of a finite point set, in: Proc. ACM Symposium on Computational Geometry, Nice, 1997, pp. 367–369.
[12] M. Melkemi, L. Melkemi, An algorithm for detecting dot patterns, in: R.A. Earnshaw, J.A. Vince (Eds.), Computer Graphics, Academic Press, Leeds, UK, 1995, pp. 161–169.
[13] J.D. Boissonnat, M. Yvinec, Géométrie algorithmique, Ediscience International, 1995.
[14] F. Aurenhammer, Power diagrams: properties, algorithms and applications, SIAM J. Comput. 16 (1987) 78–96.
[15] H. Imai, M. Iri, K. Murota, Voronoi diagrams in the Laguerre geometry and its applications, SIAM J. Comput. 14 (1985) 93–105.
[16] J.D. Boissonnat, Geometric structures for three-dimensional shape representation, ACM Trans. Graphics 4 (1984) 266–286.
[17] H. Edelsbrunner, E.P. Mücke, Three-dimensional alpha shapes, ACM Trans. Graphics 13 (1) (1994) 43–70.
[18] D. Attali, r-regular shape reconstruction from unorganized points, in: Proc. ACM Symposium on Computational Geometry, Nice, 1997, pp. 248–253.
About the Author - MAHMOUD MELKEMI received a Ph.D. in applied mathematics from the University of Grenoble 1, France. He is a researcher at L.I.G.I.M. (Laboratoire d'Informatique Graphique, Image et Modélisation) at Claude Bernard University, Lyon 1, France. His research work is focused on computer graphics, pattern recognition and computational geometry.

About the Author - MOURAD DJEBALI is an associate professor at L.I.G.I.M. (Laboratoire d'Informatique Graphique, Image et Modélisation) at Claude Bernard University, Lyon 1, France. He received a Ph.D. in computer science from Claude Bernard University, Lyon 1, France. He is interested in computer vision, computer-aided design and pattern recognition.
Pattern Recognition 33 (2000) 1437–1453

Robustness of a multiscale scheme of feature points detection
Jacques Fayolle*, Laurence Riou, Christophe Ducottet
Laboratoire Traitement du Signal et Instrumentation, UMR CNRS n° 5516, 23 rue du Docteur P. Michelon, 42023 Saint-Etienne Cedex, France
Received 19 February 1998; received in revised form 11 June 1999; accepted 11 June 1999

Abstract

We present a new scheme for feature points detection on a grey level image. Its principle is the study of the gradient phase signal along object edges and the characterization of the behavior across scales of the wavelet coefficients of this signal. The feature points are determined as transition points of this signal. In the second part, we study the robustness of the detection scheme against changes of the acquisition parameters: the viewpoint and the zoom of the camera, the object rotation, the luminescence variation and noise. The results show the method's efficiency: most of the points are still detected even when these parameters vary. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Feature points; Wavelet transform; Local curvature; Method efficiency; Acquisition parameter; Viewpoint; Scale variation
1. Introduction

The detection of feature points on grey level images is a common problem in image processing. The main applications are comparisons between human perception and computer vision, the recognition or tracking of objects in a scene, and object recognition in an image database. For example, the knowledge of the positions of feature points allows Talluri to locate a mobile robot in three dimensions [1]. These points are also used in the recognition of objects such as human face profiles [2]. However, the main application of feature points detection is the determination of motion between successive frames. The motion measurement is done through the tracking of these points. The tracking problem is known as the correspondence problem, and many papers address this subject [3–8]. Therefore, it is essential to have an algorithm of feature points detection which gives the same set of points even if the position or the orientation of the object has changed.
* Corresponding author. Tel.: 00-33-4-77-48-5131; fax: 00-33-4-77-48-5120.
E-mail address: [email protected] (J. Fayolle).
We retain in this paper the definition of feature points given by Attneave in 1954 [9]: feature points are high curvature points. This definition is based on the consideration that human perception is more sensitive to high curvature points than to all other points. Following Chen [10], we distinguish two main approaches to detection schemes: the polygonal approximation of edges and the detection of grey level corners. The first one (the more classical) consists in the segmentation of the image, the representation of the object boundary by a chain code and then the detection of corners as points where the direction of the edges changes [11–15]. Obviously, the performance of this kind of algorithm is strongly linked to the quality of the segmentation; unfortunately, for many types of images, the segmentation task is difficult. The second approach avoids this drawback. Indeed, the second class of methods takes into account the grey level images and not the object shape [16–19]. Most of these methods use as criterion the changes of the direction of the grey level gradient. For example, Lucas and Kanade propose a detection scheme which uses a segmentation of the eigenvalues of a contrast matrix [20]. More recent algorithms detect points through the behavior of wavelet coefficients across scales. For example, Zheng proposes a detection scheme through Gabor wavelet decomposition and the search for points for which the
0031-3203/00/$20.00 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S 0 0 3 1 - 3 2 0 3 ( 9 9 ) 0 0 1 3 6 - 3
variation of coefficients across scales is maximum [8]. Similarly, Chen uses the evolution of wavelet coefficients to isolate corner candidates (with a wavelet defined as the first derivative of a Gaussian) [10]. The method proposed in this paper is closely related to this last approach: detection through the behavior of coefficients across scales. More precisely, we determine the gradient phase signal along the edges and extract the feature points from this signal. The feature points are those corresponding to a model of smoothed singularity (named a transition). We characterize these points by the length and the amplitude of the grey level transition. These measurements allow the determination of local curvatures and therefore the detection of robust feature points.

The quality of a scheme of feature points detection is strongly linked to its robustness against changes of object position and orientation. We expect the algorithm to detect the same set of feature points on the object image for any viewpoint. Indeed, this property of the detection scheme is essential both for object tracking and for object recognition in an image database. For instance, consider the following case: we have an image of an object and we do not know how this image was acquired (the camera viewpoint, the ratio between the object size and the object-image size, the lighting direction, …). The aim is to recognize this image in a database very quickly. One method is to test the correlation between a set of feature points detected on this image and the sets of feature points detected on each image of the database; the maximum of this correlation indicates the corresponding image of the database. The first task of this kind of application is to ensure that the same set of feature points is detected even if the acquisition parameters have changed. For object tracking, the problem is the same.

Our purpose is to study the robustness of the proposed detection scheme against some parameters. We have chosen the following experimental protocol: we study images of painting reproductions seen from different viewpoints and we test whether the set of feature points detected on the object of these paintings depends on the acquisition parameters. We consider five parameters:

• the relative position of the camera and the painting reproduction (the object),
• the scale parameter (corresponding to a zoom effect),
• the image rotation (corresponding to a rotation of the camera around its optical axis),
• the luminescence variations,
• the signal-to-noise ratio.

Our aim is to determine the percentage of points which are still detected even if these parameters vary. We will say that the detection is efficient if the feature points detected for two different viewpoints correspond to the same localization on the objects represented in the paintings. This experimental protocol is introduced with the application of image database consultation in mind, but the results obtained are more general. Indeed, the set of tested parameters is large enough to cover many applications such as motion determination or pattern recognition.

The rest of this paper is organized as follows. In Section 2, we present the multiscale scheme used for feature points detection. In particular, we give details on the theoretical variation, along maxima lines, of the modulus of the wavelet transform coefficients across scales; the studied case is the transition between two stable levels of the grey level function. In Section 3, we show some results of detection on test images, and we discuss the robustness, advantages and drawbacks of the method in Section 4. Concluding remarks are given in Section 5.
2. Multiscale detection of feature points

The proposed detection scheme is composed of three main stages: the detection of object edges (we define precisely the notion of edges hereafter), the localization of feature points, and then the characterization of these feature points (in particular, the estimation of the curvature at each feature point). Each step is detailed below.

2.1. Theory of multiscale detection using wavelets

The algorithm of feature points detection proposed in this paper is based on the theory of multiscale edge detection using wavelets. In this section, we first present the principle of multiscale edge detection and we highlight its link with the wavelet transform. Only the one-dimensional (1D) case is considered here (the two-dimensional (2D) case can easily be extrapolated). A more detailed presentation of this theory can be found in the papers of Mallat, Zhong and Hwang [21,22].

The edges of a signal are particular points where this signal has sharp variations. In 1D signals, such points are isolated. In 2D signals, they form continuous lines and provide the location of object contours. Whereas this notion is clear in the case of a binary object, it becomes fuzzier for grey level images. For two-dimensional signals, the edges are made up of points where the gradient vector modulus (of the grey value) is locally maximum in the gradient vector direction. According to this definition, initially proposed by Canny [23], edges correspond to inflection lines in the signal. If the gradient maximum is high, the inflection corresponds to a sharp variation; on the contrary, if it is low, the inflection corresponds to a smooth variation. In the 1D case, edges are particular points where the first derivative modulus is locally maximum.

We describe here the principle of the multiscale detection of edges. We consider a smoothing function φ(x)
whose integral is equal to 1 and which converges to 0 at infinity. A typical example of such a smoothing function is the Gaussian function. In order to obtain smoothed versions of a signal at different scales, we introduce the smoothing function φ_s(x) at scale s defined by

$$\varphi_s(x) = \frac{1}{s}\,\varphi\!\left(\frac{x}{s}\right). \qquad (1)$$
Multiscale edges are defined using smoothed versions of the signal and a first- or a second-order derivative. Here, we choose the first-order operator, for the following reasons. First, we are interested in the inflexion points of the signal, and these points can be detected as the local maxima of the first derivative of the signal. The second reason is linked to the choice of the wavelet used, and we will explain it later. However, all the mathematical developments presented hereafter can be made with the second-order operator (the principle and the results are identical). With the previous choice, multiscale edges are defined as the modulus maxima points of the first derivative of smoothed versions of the signal; they are inflexion points of the filtered signal. If the smoothing function is differentiable, the first derivative of a function f smoothed at scale s can be expressed by

$$\frac{d}{dx}(f * \varphi_s)(x) = \left(f * \frac{d\varphi_s}{dx}\right)(x) = \frac{1}{s^2}\left(f * \frac{d\varphi}{dx}\!\left(\frac{\cdot}{s}\right)\right)(x), \qquad (2)$$

where * denotes the convolution product. This multiscale edge detection can be expressed in terms of the wavelet transform. Let us introduce the wavelet function ψ(x) and the wavelet function ψ_s(x) at scale s:

$$\psi(x) = \frac{d\varphi}{dx}(x), \qquad \psi_s(x) = \frac{1}{s}\,\psi\!\left(\frac{x}{s}\right). \qquad (3)$$

Then,

$$\frac{d}{dx}(f * \varphi_s)(x) = \frac{1}{s}\,(f * \psi_s)(x) = \frac{1}{s}\,Wf(s, x), \qquad (4)$$

where Wf denotes the wavelet transform of the function f, with respect to the wavelet ψ(x), at scale s and position x [24–26]. The function ψ(x) is a wavelet because its integral is equal to 0. Eq. (4) proves that multiscale edges can be detected by means of a wavelet transform. The first step consists in computing the wavelet transform of the signal using a wavelet which is the first derivative of a smoothing function (Eq. (3)). The important parameter is the choice of the wavelet. We must use a wavelet which is the derivative of a smoothing function (other ones, such as Morlet wavelets, are not appropriate for edge detection). But we can choose the derivation order of the
smoothing function: if we choose as wavelet the nth derivative of a smoothing function, then we can detect edges corresponding to singularity orders less than n [22,27]. Here, we are only interested in the edges of the signal, and therefore in singularity orders less than 1; that is why we choose as wavelet the first derivative of a Gaussian. The following algorithm could also be developed for wavelets obtained with the second derivative of a Gaussian, but the interpretation of the extracted points would be more difficult.

The second step of the multiscale edge detection consists in detecting the local maxima of the modulus of the wavelet transform. The wavelet transform used must not be under-sampled, and the simplest way to compute it is to evaluate, in the Fourier space, the convolution product of Eq. (4) [28]. The calculation can be made for any value of the scale parameter s. Multiscale edges can be represented in a scale-space diagram where the horizontal axis corresponds to the space variable x and the vertical axis corresponds to the scale variable s. While the scale varies, maxima points of the wavelet transform modulus are connected along lines called maxima lines. In the scale-space diagram, the maxima lines converge to the inflection points of the original signal when the scale reduces to 0. We will study hereafter the variation of the wavelet transform modulus along these lines; we therefore need many points along them in order to assure a good precision for our algorithm. This is the reason why we use a continuous wavelet transform and not a dyadic one. In the rest of the paper, the wavelet used is defined as the first derivative of a Gaussian function and the algorithm used for the wavelet transform is based on the Fourier transform.

The advantages of the multiscale edge detection are:

• the choice of the scale, and therefore of the level of details retained on each edge,
• obtaining, in addition to the edge localization, the direction and the modulus of the gradient vectors at each edge point (Fig. 1). This additional information is useful for the next step: the detection of feature points along edges.

In summary, the edge detection step consists in the calculation of the wavelet transform of the image at one scale (chosen according to the level of details we want to retain) and the extraction of the maxima of its modulus in the gradient direction.
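A minimal sketch of this Fourier-domain computation for a 1D signal, using Eq. (4) with a first-derivative-of-Gaussian wavelet; the normalization details are our assumptions rather than the paper's exact implementation.

```python
import numpy as np

def dog_wavelet_transform(f, scales):
    """Wf(s, x) = s * d/dx (f * phi_s)(x) (Eq. (4)), with phi a unit-integral
    Gaussian, evaluated by FFT; one row per scale."""
    f = np.asarray(f, dtype=float)
    n = len(f)
    F = np.fft.fft(f)
    deriv = 2j * np.pi * np.fft.fftfreq(n)   # spectral differentiation operator
    x = np.fft.fftfreq(n) * n                # sample grid centred on 0 (circular)
    rows = []
    for s in scales:
        g = np.exp(-0.5 * (x / s) ** 2)
        g /= g.sum()                          # unit-integral smoothing kernel
        W = s * np.fft.ifft(F * np.fft.fft(g) * deriv).real
        rows.append(W)
    return np.array(rows)
```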
Fig. 1. (a) Initial image; (b) detected edges and gradient vectors at each edge point.
2.2. Detection of feature points

Feature points (defined above as high curvature points) correspond obviously to points where the edge direction changes rapidly or, equivalently, to points where the gradient direction changes rapidly. The gradient direction criterion is more powerful. Indeed, the edge detection gives this information in a "nearly continuous space" (because the sampling frequency of the wavelet can be chosen as small as needed), whereas the edge direction is known only on the discrete space of the image. Thus, the edge direction is more sensitive to aliasing and noise defects than the gradient direction. Consequently, we propose the following scheme to detect high curvature points. We localize points corresponding to high variations in the signal of gradient direction. This 1D signal is constructed from the results of the first stage: we follow each edge with an edge route and we record the value of the gradient direction at each abscissa. Then, the detection of feature points is equivalent to the detection of sharp variations in this one-dimensional signal. Some of these variations are not significant (local variations due to noise); on the contrary, other variations correspond to real transitions of the phase signal. These points of transition are retained as feature points. To detect them, we study the behavior across scales of the wavelet transform coefficients of the gradient phase signal. This kind of approach is also used by Chen [10]. It can be split into two subtasks: the localization of sharp variations and then the selection of transition points among all the variations.

The detection of sharp transitions in a one-dimensional signal with wavelet transforms is not an original algorithm: it was first proposed by Mallat [21] and has been used in many applications. The originality of our algorithm is to use this detection scheme on a phase signal, together with the proposed extension of the Mallat algorithm in order to characterize feature points very precisely (Section 2.3). Let us recall the Mallat algorithm. We compute the wavelet transform of the signal for a set of scales and we extract the maxima lines of the wavelet transform modulus. These lines (functions of the scale parameter) point to the transition abscissas when the scale goes down (Fig. 2).
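The first step of this algorithm, extracting the modulus maxima at each scale before chaining them into maxima lines, can be sketched as follows (a sketch of ours; the chaining itself, i.e. linking nearby maxima between consecutive scales, is omitted).

```python
import numpy as np

def modulus_maxima(W):
    """Boolean mask of local maxima of |W| along the position axis,
    one row per scale: the raw material for building maxima lines."""
    M = np.abs(W)
    mask = np.zeros_like(M, dtype=bool)
    mask[:, 1:-1] = (M[:, 1:-1] > M[:, :-2]) & (M[:, 1:-1] >= M[:, 2:])
    return mask
```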
Fig. 2. An original signal (top), its wavelet transform (middle) and the detected maxima lines (bottom).
To discriminate between the different types of variations of the signal, we study the evolution of the wavelet coefficients along the maxima lines. We consider only variations corresponding to peaks or steps, and we represent them by filtered Dirac and filtered Heaviside distributions. The logarithmic slope, at large scales, of the evolution of the wavelet coefficients along maxima lines carries the information about the variation type. This slope is equal to 0 in the step case and to −1 in the Dirac case (Fig. 3). By extension, we consider that a positive slope indicates a filtered step and a negative one a filtered Dirac distribution. The proof of these behaviors can be found in Ref. [21]. For the detection of feature points, we are interested in transitions between two stable levels; therefore, we are interested in variations corresponding to the step case.
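A sketch of this discrimination: fit the logarithmic slope of the modulus along one maxima line at the larger scales and threshold it, with the cut placed between the theoretical values 0 (step) and −1 (Dirac). The cut value −0.5 is our own choice, not the paper's.

```python
import numpy as np

def classify_maxima_line(scales, modulus, cut=-0.5):
    """Log-log slope of the modulus along a maxima line over the larger
    half of the scales: near 0 suggests a filtered step (kept as a feature
    point), near -1 a filtered Dirac (discarded as noise)."""
    s, m = np.log(scales), np.log(modulus)
    half = len(s) // 2
    slope = np.polyfit(s[half:], m[half:], 1)[0]   # slope at large scales
    return "step" if slope > cut else "dirac"
```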
Fig. 3. Logarithmic slope of the evolution of the wavelet coefficient modulus along maxima lines for the step (䊏) and the Dirac distribution (*).

Fig. 4. Estimation of two parameters on a filtered step: the transition length p and the amplitude A.
On the other hand, we associate noise with variations of the Dirac type. In short, feature points are detected as points for which the behavior of the wavelet coefficients of the gradient-phase signal along maxima lines is similar to the transition (step) behavior.

2.3. Characterization of feature points

The knowledge of the feature point type allows us to characterize feature points precisely, and finally to estimate the local curvature. We measure two quantities on a filtered step: a transition length named p and the amplitude of the step named A (Fig. 4). The direct measurement of these quantities from the grey-level image is the main improvement of the proposed algorithm over classical schemes of feature point detection. These two parameters are obtained by fitting the experimental variation of the wavelet transform modulus along maxima lines with theoretical variations. These theoretical variations are computed for a filtered step in the following manner. (The wavelet used is the first derivative of a Gaussian, as in the first stage of the algorithm.)

A filtered step $S_p(x)$, expressed in terms of the Heaviside distribution $H(x)$, can be written as

$S_p(x) = A \cdot H(x) * G_p(x),$  (5)

where

$G_p(x) = \frac{1}{p}\, G\!\left(\frac{x}{p}\right).$  (6)

In these formulae, G is the Gaussian function (with average and standard deviation equal to 0 and 1, respectively) and * denotes the convolution product. Taking into account the expression of $S_p(x)$, the wavelet transform at scale s of this step $S_p$ is defined by

$\tilde{S}_{p,s}(x) = S_p(x) * \psi_s(x) = A \cdot H(x) * G_p(x) * \psi_s(x) = A \cdot H(x) * G_p(x) * \frac{1}{s}\frac{dG}{dx}\!\left(\frac{x}{s}\right),$  (7)

where $\psi_s(x)$ is the wavelet at scale s; since we have chosen this wavelet as the first derivative of a Gaussian, $\psi_s(x) = \frac{1}{s}\frac{dG}{dx}\!\left(\frac{x}{s}\right)$. Since $\frac{dG}{dx}\!\left(\frac{x}{s}\right) = s^2 \frac{dG_s}{dx}(x)$, we obtain

$\tilde{S}_{p,s}(x) = A \cdot H(x) * G_p(x) * s\,\frac{dG_s}{dx}(x) = A s\,\frac{d}{dx}\bigl(H(x) * G_p(x) * G_s(x)\bigr),$  (8)

$\tilde{S}_{p,s}(x) = A s \cdot \delta(x) * G_p(x) * G_s(x) = A s \cdot G_p(x) * G_s(x) = A s \cdot G_{\sqrt{s^2+p^2}}(x).$  (9)

Therefore, the modulus of the wavelet transform is

$|\tilde{S}_{p,s}(x)| = \frac{A s}{\sqrt{2\pi}\,\sqrt{s^2+p^2}}\, e^{-x^2/(2(s^2+p^2))}.$  (10)

A maxima line across scales is given by the maxima of $|\tilde{S}_{p,s}(x)|$ with respect to the variable x. The maximum is obtained at x = 0 whatever the value of s. Thus, the theoretical evolution of the wavelet transform modulus along a maxima line is given by

$m(s) = \frac{A s}{\sqrt{2\pi}\,\sqrt{s^2+p^2}}.$  (11)

The fitting of this theoretical variation to the experimental one allows the measurement of the two quantities p and A (Fig. 5). The simplest estimation of p and A is obtained from the values of m(s) at two different scales $s_1$ and $s_2$ (two equations with two unknowns). This approach gives the following results:

$p^2 = \frac{s_1^2\,\bigl(1 - m^2(s_1)/m^2(s_2)\bigr)}{m^2(s_1)/m^2(s_2) - s_1^2/s_2^2},$  (12)

$A = m(s_1)\,\sqrt{2\pi}\,\sqrt{1 + p^2/s_1^2}.$  (13)

Thus, we only need to know the values of the wavelet coefficients at two different scales to extract these characteristics. Another way to determine these two parameters is to estimate them through least-mean-square optimization. Indeed, we can make a linear fit of the function $s^2/m^2(s)$ in the variable $s^2$.
Fig. 5. Fitting of the experimental variations of the wavelet transform modulus along maxima lines (*) with the theoretical ones (continuous lines) for different values of p.
This fitting gives us the two parameters p and A. Obviously, the least-mean-square estimation is more robust and more precise than the simple two-scale estimation.

Note that, in our case, the original signal represents the evolution of the gradient direction along the edge. Therefore, the two quantities p and A take on a special significance: the transition length p corresponds to an arc length on the edge, and the amplitude A is an angular variation of the gradient direction. Consequently, an estimate of the local curvature at each feature point is given by the ratio between A and p.

Fig. 6 illustrates the feature point detection scheme. We detect the edge of an object (Fig. 6, top) and the signal of the gradient phase along this edge (Fig. 6, middle). Then, we extract the sharp transitions of this signal and, for each of them, we measure the two parameters A and p. This is done through the study of the evolution of the wavelet transform modulus along maxima lines. (Obviously, there is one maxima line per transition detected on the gradient-phase signal.) We have represented the ratio between A and p, which is an estimate of the local curvature (Fig. 6, bottom). For illustration, some feature points detected in this example are marked both on the object and on the curvature signal.

2.4. Selection of points according to the local curvature

The main advantage of our detection scheme is the ability to estimate the local curvature (in addition to the selection by singularity type). This curvature information is not available in the algorithm proposed by Chen [10]. Moreover, the knowledge of the local curvature at each feature point allows an a posteriori selection among these points.
Fig. 6. Detection of feature points on an object (top): the signal of gradient direction and the estimated curvature A/p are represented (middle and bottom). Some feature points are identified for illustration.
Fig. 7. Example of feature point selection as a function of the local curvature criterion. The curvature signal is segmented at different thresholds c: (a) c = 0.2; (b) c = 0.3; (c) c = 0.4.
We can choose to retain only the points for which the curvature is above a given threshold. This curvature information therefore brings our scheme closer to the classical ones (detection on binary images), while keeping the advantages of the multiscale approach. An example of feature point selection as a function of the local curvature criterion is shown in Fig. 7.
3. Experimental results

In this section, we provide the results of two experiments conducted to test the efficiency of the proposed multiscale detection scheme. To illustrate the ability of the algorithm to detect feature points on grey-level images, we apply it to reproductions of famous paintings by Van Gogh and Kandinsky.

The first test image is the famous Van Gogh painting "The Siesta" (Fig. 8). On this image, objects cannot be defined easily; therefore, the classical detection of feature points fails. The proposed multiscale detection, however, allows a correct detection of edges and of feature points along them, even if the interpretation of the obtained points is not obvious (Fig. 9).

The other test is performed on an image of the Kandinsky painting "Signe" (Fig. 8). Unlike the previous image, the object edges are very clear and the definition of each edge is easy. This case is more representative of a realistic scene where objects have well-defined edges; it is therefore important to test the efficiency of our algorithm on this image. The result obtained (Fig. 9) is quite good and the detected feature points have the expected localization.

The edge detection is made at scale 2.5 (i.e., the standard deviation of the derivative of the Gaussian used as the basic wavelet is equal to 2.5). The estimation of the transition length, the amplitude, and therefore the curvature, is made at scales 4.0 and 4.4. These scales are chosen sufficiently high to avoid numerical noise on the gradient-phase signal. Indeed, if these scales are lower, we detect shorter transitions in the signal and therefore points
Fig. 8. The two test images: "The Siesta" by Van Gogh (a) and "Signe" by Kandinsky (b).
corresponding to thinner details. The threshold used on the curvature signal is equal to 0.4 in both cases.

These results are obtained for the following configuration: the painting reproductions are seen from a perpendicular viewpoint (we will refer to this position as the 0° viewpoint). They are taken as reference results for the following study of the robustness of the method.
4. Robustness of the method

We have seen in the previous section that the proposed algorithm is able to detect feature points on grey-level images even if the segmentation of the objects is difficult. We now discuss the robustness of the method with respect to five parameters: additive noise, changes of viewpoint (a zoom or a change of the viewing angle of the camera), the position of the light source, and the rotation of the image. We call these parameters "transformation parameters". This set of parameters is chosen in order to test the ability of the method to detect the same feature points on the objects represented in a 2-D image even if this image is seen from different viewpoints. The aim is to show that the detected feature points are
Fig. 9. Experimental results of feature point detection for the two test images. Feature points are represented by blank disks. These images are obtained for the 0° viewpoint.
representative of the object shape. If this goal is reached, the use of the set of feature points is justified for many applications (recognition of an object in an image database, motion determination or pattern recognition).

The experimental protocol is the following. We acquire an image of the painting reproduction from a given viewpoint, we detect feature points on it, and we apply a numerical transform to these points in order to determine their localization in the coordinate system of the reference image. We can then compare the set of feature points detected for this image with that of the reference image (0° viewpoint). The robustness of the method is given by the percentage of feature points still detected (i.e., feature points corresponding to the same detail of the object seen from different viewpoints). These tests are made for each of the transformation parameters and for both the Van Gogh and Kandinsky paintings. The set of obtained results is a good estimator of the robustness of the method. In addition, we compare our method with the results obtained by other feature point detection schemes [29].

4.1. Evaluation method

We evaluate the robustness of our method through the measurement of an efficiency coefficient. This coefficient is equal to the percentage of the total number of points in the first image that are still detected in the second image (the second image being obtained after the application of one transformation parameter). If we denote by $(P_i)$ and $(Q_i)$ the sets of feature points detected on the two different images of the same object, these sets should be
linked by a transformation matrix H between the two images. Theoretically, we have the following relation:

$(P_i) = H(Q_i).$  (14)

In real situations, even if the points are correctly detected, the points obtained by $H(Q_i)$ are not strictly at the positions of $(P_i)$. Therefore, we use an efficiency coefficient defined by the following ratio:

$\frac{\text{number of points } Q_i \text{ for which } \mathrm{distance}(P_i, HQ_i) \le \varepsilon}{\text{number of points } P_i}.$  (15)

We choose $\varepsilon$ equal to $\sqrt{2}$ pixels. This ratio is then equal to the percentage of feature points that are still detected (in a 3×3-pixel neighborhood around the reference point) when we apply a transformation to the image (change of viewpoint, change of the position of the light source, ...).

Obviously, an important task for the evaluation of the efficiency is the determination of the transformation matrix H. This task is a well-known problem in robotics. Indeed, determining the transformation matrix between two images of the same object is equivalent to determining the transformation matrix between two cameras in stereoscopy. Therefore, we can suppose that two virtual cameras acquire the two images, and we then calibrate these cameras independently. The knowledge of the transformation between these two cameras tells us the apparent transformation of the object.
Fig. 10. Images and feature points detected at 40°.
To determine the transformation matrix, we use a standard calibration technique. The principle of this method is to calibrate each of the virtual cameras, to obtain their positions and their orientations from the known positions of 3-D points of a calibration pattern [30]. Knowing these positions and orientations, a simple calculation gives the transformation matrix H (with respect to the object coordinate system). We present below the results of the robustness evaluation for each transformation parameter.

4.2. Results of the robustness evaluation

4.2.1. Viewpoint

The first parameter tested is the change of viewpoint. We acquire a set of images of the same object (either the Van Gogh or the Kandinsky painting) with different orientations. The camera is fixed and the object is mounted on a micrometric rotation and translation stage. The rotation ranges over [0°, 40°] in steps of 5°. We use the image acquired at 0° as the reference image and we compare the set of points detected on this image with the sets detected on the images at 5°, 10°, ..., up to 40°. Fig. 10 shows the images obtained at 40° for both the Van Gogh and the Kandinsky sequences.

The curves of Fig. 11 show the evolution of the efficiency coefficient for the two sequences. These results indicate that even for a viewpoint change of 40°, around 35% of the feature points are still correctly detected. As soon as the viewpoint changes, the efficiency coefficient falls to about 70%, almost independently of the value of the angle. Indeed, when we increase the viewpoint angle, the coefficient decreases, but this decrease is smaller than the initial one (this behavior is particularly clear for the Van Gogh sequence). Moreover, we obtain a better efficiency for the Van Gogh
Fig. 11. Evolution of the efficiency of the feature point detection according to the viewpoint angle (䉬 Van Gogh sequence, * Kandinsky sequence).
sequence than for the Kandinsky one (which is theoretically simpler). This seemingly strange behavior is explained by the detection, on the Kandinsky painting, of feature points that are not very representative (such as points along vertical lines of color changes). If we retain only the points corresponding to transitions with a large amplitude, we obtain a result similar to the Van Gogh case (with fewer detected points).

4.2.2. Scale

The second parameter tested is the scale change (i.e., a zoom in and a zoom out on the object). To simulate this phenomenon, we change the focal length of the camera lens from 24 to 108 mm. Fig. 12 shows the images obtained for the extreme values for the Van Gogh sequence. The scale ratio between these two images is equal to 4.5.
Fig. 12. Images and feature points detected for the extreme scales: 0.48 (a) and 2.2 (b).
Fig. 13. Evolution of the efficiency of the feature point detection according to a scale change (䉬 Van Gogh sequence, * Kandinsky sequence).
We choose a reference image in the middle of the sequence; thus, we test the effects of both zooming in and zooming out. The following results (Figs. 12 and 13) are presented according to the scale ratio. The evolution of the efficiency coefficient expresses the sensitivity of the method to any scale change. Indeed, even if the results are somewhat better for the Kandinsky sequence than for the Van Gogh one, they are not very satisfying. As for the first parameter, we note a significant decrease for small changes around the reference image, and a less significant decrease afterwards. Apart from this first decrease, the loss of efficiency is linked to the intrinsic change of the edge shape. Indeed, strong variations of the scale directly alter the shape of the objects. For example, in the case of a zoom out, the thin details of the edges are erased and, consequently, the gradient-phase signal along the edges is no longer
the same. Thus, it is normal that only a few points are still correctly detected. However, even for a very large scale change, around 20% of the feature points are correctly detected in the difficult case of the Van Gogh sequence. Moreover, these points are those corresponding to high-amplitude transitions. Furthermore, compared with other methods (such as the Harris detector [18]), our result is slightly better [29]. This is probably due to the multiscale approach used here. We conclude that our method is more robust to scale change than the other ones, but the results are still not satisfying.

A way to improve the results of our method is to retain as feature points only those which remain at a large scale. Indeed, if we increase the scale of the edge detection, we erase many details along the edges. Expressed in terms of the gradient-phase signal, we eliminate the small transitions; only the important variations remain. Obviously, these important transitions have a more robust localization than the small ones (which are more sensitive to noise), and therefore so do the feature points detected from them. For example, in the case of a zoom out, the details along the object edges are lost, and only the feature points detected at a large scale can still be detected. In this case, in order to preserve a good detection efficiency during the zoom out, we should detect only the points corresponding to important features and not to thin details. This is achieved through the choice of a sufficiently high detection scale.

To verify this effect, we increase the scale of the feature point detection (6.0 instead of 4.0) for the reference image. For the other images (taken with a different zoom),
we still detect at scale 4.0. This choice simulates the application of image-database consultation (the reference image is the one belonging to the database and the other images are the samples we try to recognize). For this application, it is legitimate to choose the detection scale for the reference image, whereas all the samples should be processed in the same manner even if they are very different. The results obtained for the two test sequences are shown in Fig. 14. As expected, the detection of feature points at a large scale for the reference image improves the efficiency of the method, in particular for the zoom-out case. The average gain is around 20% for the Van Gogh test and 7% for the Kandinsky test. The improvement is greater in the case of the Van Gogh painting because it contains more points corresponding to very thin details. These points, which disappear during the zoom out, are no longer detected at large scales, and therefore the efficiency of the detection scheme increases.

4.2.3. Scene lighting

We now test the effect of changes of the scene lighting. We distinguish two types of lighting change: a simple one, which is uniform over the entire scene, and a complex one, which corresponds to a variation of the lighting direction. We test both types of effect, using a continuous light source whose power supply can be controlled (a halogen lamp).

For the study of the effect of the uniform variation, we acquire a sequence of images with different values of the lighting power and we express the variation of the lighting through the variation of the average grey level. The lighting varies from 200 lux to around 5000 lux, corresponding to average grey levels of 57–210 for the Van Gogh sequence and 75–215 for the Kandinsky sequence. As in the previous case, the results are presented with respect to a reference image corresponding to an average lighting.

On the whole, these results are satisfactory, in particular for the Kandinsky sequence. Indeed, for this sequence,
more than 80% of the feature points are still detected even for a large variation of the average grey level. For the Van Gogh sequence, the results are not as good. In particular, when the average grey level is higher than the reference grey level, the efficiency is only 40%. This is mainly because of an overexposure of the scene (some objects disappear). The same type of phenomenon also explains the poor efficiency obtained for the low values of the lighting (for the Van Gogh sequence). Indeed, the contrast is very weak (Fig. 15) and the edges of some objects become very fuzzy, almost invisible; the feature points of these objects are not detected. This phenomenon is less important for the Kandinsky sequence because the definition of the object edges is better: even if the contrast is weak, we can localize these edges and detect the corresponding points. In conclusion, the method gives good results if the lighting variation does not induce the apparent elimination of objects (due to over- or underexposure of the CCD array) (Fig. 16).

We also study the effect of the direction of the light source. To test this effect, we move the light source along a portion of a circle around the object. The variation of the lighting direction ranges over [0°, 35°] in steps of 5°. Fig. 17 shows images of the Van Gogh and Kandinsky sequences for this complex variation of lighting. The light source position used as reference (called the "zero-degree position") is the one used for the previous tests: the lighting direction and the object plane are almost perpendicular (the approximation is due to the impossibility of aligning the light source and the camera).

As in the case of uniform variations, the results are good (and better for the Kandinsky sequence). The reason is identical: the objects in the Kandinsky painting are more clearly defined and the effect of overexposure is less important. We can always determine the object edges and therefore we can detect the feature points. For the Kandinsky sequence, we get around 75% efficiency of the detection scheme for light source directions below 30°. For greater values of the light source direction, the
Fig. 14. Evolution of the efficiency of the feature point detection according to a scale change; comparison between detection at a small scale and at a large scale: (a) Van Gogh sequence (䉬 small scale, 䊐 large scale); (b) Kandinsky sequence (䢇 small scale, * large scale).
Fig. 15. Images and feature points detected for the extreme values of the uniform lighting change: 0.43 (a) and 1.57 (b).
Fig. 16. Evolution of the efficiency of the feature point detection according to uniform variations of the lighting (䉬 Van Gogh sequence, * Kandinsky sequence).
light reflection becomes very important and the images are quite overexposed (Fig. 17). The effect of the light direction is more important for the Van Gogh sequence: in this case, the object edges are less clear and are therefore more sensitive to the overexposure effect. On the whole, for changes of the lighting direction that do not induce overexposed images (changes lower than 20°), the results are quite satisfying (Fig. 18).

4.2.4. Image rotation

The fourth parameter tested is the image rotation around the optical axis. It is very difficult to realize this rotation physically because of the poor determination of both the optical center and the optical axis. Therefore, we choose to simulate this transformation numerically: we rotate the image by image processing. In
this case, we do not have to calibrate the acquisition system, since we know exactly the transformation between the two images and can therefore apply the inverse transformation to compare the sets of points. Due to the symmetry of the rotation effect, we test the influence of rotation only for angles ranging over [0°, 90°], sampled in steps of 5°.

The curves (Fig. 19) show the evolution of the efficiency coefficient for both the Van Gogh and the Kandinsky sequences. The two curves follow the same evolution: an initial decrease, then a stabilization around an efficiency coefficient of 80% for the Kandinsky sequence and 55% for the Van Gogh sequence. These results are very satisfying because of the absence of a link between the rotation angle and the value of the efficiency coefficient, which indicates the robustness of the method with respect to this parameter. The percentage of feature points preserved during the transformation is linked to the image type: if the objects are well defined, many points are preserved (80% for the Kandinsky case); on the contrary, for blurred objects, points corresponding to very thin details are not preserved during the rotation.

4.2.5. Noise

Finally, the last parameter tested is the effect of additive noise. As in the previous case, we simulate this transformation numerically. We generate uniform noises with different amplitudes and add them, respectively, to a reference image (either the Van Gogh or the Kandinsky painting). In this way, we obtain a sequence of images with different signal-to-noise ratios (SNR). Fig. 20 shows the two images obtained for the lowest values of
Fig. 17. Images and feature points detected with a light source direction of 35°.
Fig. 18. Evolution of the efficiency of the feature point detection according to complex variations of the lighting (䉬 Van Gogh sequence, * Kandinsky sequence).
Fig. 19. Evolution of the efficiency of the feature point detection according to image rotation (䉬 Van Gogh sequence, * Kandinsky sequence).
the SNR in both cases. We compare the points detected on these images with those detected on the original image. The results (Figs. 20 and 21) are presented according to a signal-to-noise ratio defined as
$\mathrm{SNR} = 10 \log \frac{\text{signal amplitude}}{\text{noise amplitude}}.$  (16)
The behavior of the detection scheme is identical for the Van Gogh and the Kandinsky sequences: the efficiency of the detection decreases with the SNR. This behavior is foreseeable because the objects become more and more blurred as the noise amplitude increases; the loss of some feature points is therefore expected. The decrease is smaller for the Kandinsky case, for which the object edges are well defined. In this case, even if the noise amplitude equals the signal amplitude, 60% of the feature points are still correctly detected. For the Van Gogh sequence, the rate of decrease is larger; however, 40% of the points are still correctly detected for the noise levels studied (except for the last measurement: 25% of the points are still detected when noise and signal have the same amplitude). In conclusion, noise addition induces a decrease of the efficiency coefficient, but this coefficient remains sufficiently high to assure the robustness of the detection scheme. (The points which are still detected are those corresponding to the main features of the images.)
4.3. Discussion of these results

We should also remark that the efficiency curves presented above are not strictly monotonic. This phenomenon is intrinsic to the measurement process. Indeed, we
Fig. 20. Images and feature points detected for the lowest SNR.
Fig. 21. Evolution of the efficiency of the feature point detection according to noise contamination (䉬 Van Gogh sequence, * Kandinsky sequence).
always compare the results obtained for the current values of the transformation parameters with the reference image. For example, for the study of the viewpoint effect, we compare the results obtained for the 5°, 10°, ..., up to 40° viewpoints with the 0° viewpoint. A feature point that has disappeared, for example, in the image at 35° can reappear in the image at 40°, and this induces the oscillations of the efficiency curves. The fact that a feature point can disappear in one image and reappear in the next is linked both to the calibration process and to the detection scheme. Indeed, the calibration process introduces some errors, and it is possible that the two calibrated images are not exactly in correspondence but differ by 1 or 2 pixels; this first phenomenon is particularly important for the rotation effect. On the other hand, the shape of the object edges is altered
by the transformation of the image. Therefore, the gradient-phase signal extracted from one edge can be very different from one test to the other, and different feature points will be detected. This second phenomenon is particularly important for the change of viewpoint and the zoom-in or zoom-out effects. However, the oscillations of the efficiency curves remain weak compared with the main evolution of these curves, and the average variations express the behavior of the feature point detection with respect to the tested set of parameters. In order to remove the perturbations around the average variations, we could apply a statistical approach: measuring the effect of the same parameter many times and averaging all the efficiency curves would give a more robust evaluation of the detection scheme.

4.4. Comparison with other methods

In this section, we compare the results presented above with those of other feature point detection schemes. In [29], Schmid tested many feature point detection algorithms (in particular those of Harris [18], Forstner [31], and Horaud [32]). The experimental protocol used for these tests is very close to ours: the same set of parameters is addressed and the ranges of values tested for each parameter are comparable. Moreover, these tests were also performed on images of the Van Gogh painting. Therefore, the comparison of the different results is meaningful. In order to simplify this comparison, we compare our results only with the best of all the techniques tested by Schmid: an improvement of the
Fig. 22. Comparison between multiscale detection of feature points using wavelets (䉬) and Harris detector (*) for the following parameters: (a) viewpoint, (b) scale, (c) uniform lighting, (d) complex lighting, (e) image rotation.
detector of Harris. Briefly, this detection scheme is based on the study of the following matrix:

$e^{-(x^2+y^2)} \otimes \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix}.$  (17)
In this expression, I represents the grey-value function and the index x or y denotes the derivative of this function in the x- or y-axis direction. If the eigenvalues of this matrix are high, then the point is retained as a feature point. The improvement of the Harris detector consists in smoothing the grey-level function before computing the derivatives; the first derivative is then less sensitive to noise and the obtained results are more robust. (The smoothing function used is a Gaussian.)
We discuss hereafter the efficiency of our method compared with the smoothed Harris detector (Fig. 22). On the whole, the results are quite similar: the robustness of the two methods to changes of viewpoint, scale, lighting, noise or rotation is almost equivalent. Moreover, for both techniques, many points are retained as representative of the object shapes, and we can select the most representative ones among them (through the segmentation of the local curvature). With our approach, however, we have access to a more precise characterization of the feature points: for each point, we know the amplitude and the length of the gradient-phase transition. These parameters are useful. For example, the knowledge of the length of the transition informs us about the smoothness of the object shape.

If we examine the robustness of both methods in detail, we can draw the following conclusions. The viewpoint and lighting direction changes are the worst cases for our detection scheme: the efficiency of the proposed method is slightly lower than that of the smoothed Harris detector. However, the calibration process used by Schmid is not identical to ours, and we cannot separate the effects intrinsic to the detection scheme from those of the calibration method. On the other hand, the scale change is the parameter for which our method performs best relative to the other methods: although the results are not very good, they are better than the other ones (the average efficiency gain is around 10%). For the rotation parameter, there are some differences between the two methods: the results obtained with the Harris detector are better. But we remark that for rotations which are multiples of 22.5°, the results are identical; it therefore seems that the lower efficiency of our algorithm is due to aliasing artifacts introduced during our calibration process. Finally, the effect of the uniform variation of the lighting on the results is very similar for both methods (if we consider only the non-overexposed cases).

The last parameter we have to compare is the noise effect. However, the test carried out by Schmid [29] is not significant for this purpose. Indeed, the experimental protocol for this parameter was to acquire different images with no changes in the acquisition parameters and to test whether the feature points have the same localization; the signal-to-noise ratio thus remains constant during this test. Therefore, the effect of noise cannot be compared with our result.

In conclusion of this comparison, the robustness of the proposed feature point detection scheme is equivalent to that of one of the best techniques proposed elsewhere. It is advantageous to use the proposed algorithm in the case of object scale changes; on the other hand, our method is more sensitive to changes of viewpoint and lighting direction.
5. Conclusion

We have presented in this paper a new feature point detection scheme based on the behavior across scales of wavelet coefficients and on the extraction and characterization of edges. The proposed algorithm can detect feature points directly on grey-level images, even if the object segmentation is not easy (for example, in the case of non-uniform lighting of the background). The main originality of our method comes from the study of the variations of a one-dimensional signal: the direction of the gradient vectors along edges. We select, among all variations, those corresponding to a step. This is done
through the study, along maxima lines, of the logarithmic slope of the wavelet coefficient modulus. We then propose a characterization of feature points by the length and the amplitude of the phase transition; the ratio between these two quantities gives us an estimate of the local curvature. This characterization and the selection of feature points on the curvature criterion are the main advantages of our method over classical detection schemes. (We can retain, among the whole set of feature points, only the points corresponding to very high curvature.)

In the second part of this paper, we presented an experimental study of the robustness of the method against five parameters: noise, image rotation, viewpoint, light source and scale changes. The obtained results show that the proposed method is robust against image rotation, noise contamination and variations of the lighting (uniform variations and/or variations of the direction of the light source). For the two other parameters, the results are not as satisfying: the efficiency of the detection scheme is lower for changes of scale or viewpoint. However, the efficiency is always higher than 40% for the viewpoint transformation and than 20% for the scale change. Moreover, the detection of feature points at a greater scale improves this result: the average gain is around 7% for a "simple" image like the Kandinsky painting and 20% for a "complex" one like the Van Gogh painting. In addition, our results for the scale effect are better than those obtained by other detection schemes. On the whole, the robustness of the method is good (in particular if the object edges are well defined, as in the Kandinsky painting). Therefore, we can use it as the first step of algorithms such as motion determination, pattern recognition or image-database consultation.
References

[1] R. Talluri, J.K. Aggarwal, Mobile robot self-location using model-image feature correspondence, IEEE Trans. Robotics Autom. 12 (1) (1996) 63–77.
[2] K. Yu, X.Y. Jiang, H. Bunke, Robust facial profile recognition, IEEE Int. Conf. Image Process. 3 (1996) 491–494.
[3] J. Fayolle, C. Ducottet, T. Fournel, J.P. Schon, Motion characterization of unrigid objects by detecting and tracking feature points, IEEE Int. Conf. Image Process. 3 (1996) 803–806.
[4] J. Fayolle, C. Ducottet, J.P. Schon, Application of multiscale characterization of edges to motion determination, IEEE Trans. Signal Process. 46 (4) (1998) 1174–1179.
[5] L.S. Shapiro, J.M. Brady, Feature-based correspondence: an eigenvector approach, Image Vision Comput. 10 (5) (1992) 283–288.
[6] I.J. Cox, S.L. Hingorani, An efficient implementation of Reid's multiple hypothesis tracking algorithm and its evaluation for the purpose of visual tracking, IEEE Trans. Pattern Anal. Mach. Intell. 18 (2) (1996) 138–150.
[7] I.K. Sethi, R. Jain, Finding trajectories of feature points in a monocular image sequence, IEEE Trans. Pattern Anal. Mach. Intell. 9 (1) (1987) 56–73.
[8] Q. Zheng, R. Chellappa, Automatic feature point extraction and tracking in image sequences for arbitrary camera motion, Int. J. Comput. Vision 15 (1995) 31–76.
[9] F. Attneave, Some informational aspects of visual perception, Psychol. Rev. 61 (3) (1954) 183–193.
[10] C.H. Chen, J.S. Lee, Y.N. Sun, Wavelet transformation for grey-level corner detection, Pattern Recognition 28 (6) (1995) 853–861.
[11] C.H. Teh, R.T. Chin, On the detection of dominant points on digital curves, IEEE Trans. Pattern Anal. Mach. Intell. 11 (8) (1989) 859–872.
[12] A. Rosenfeld, E. Johnston, Angle detection on digital curves, IEEE Trans. Comput. C-22 (1973) 875–878.
[13] P.V. Sankar, C.V. Sharma, A parallel procedure for the detection of dominant points on a digital closed curve, Comput. Graphics Image Process. 7 (1978) 403–412.
[14] W.Y. Wu, M.J.J. Wang, Detecting the dominant points by the curvature-based polygonal approximation, CVGIP: Graphical Models Image Process. 55 (2) (1993) 79–88.
[15] M.J. Laboure, J. Azema, T. Fournel, Detection of dominant points on a digital curve, Acta Stereologica 11 (2) (1992) 169–174.
[16] L. Kitchen, A. Rosenfeld, Grey-level corner detection, Pattern Recognition Lett. 1 (1982) 95–102.
[17] H. Moravec, Rover visual obstacle avoidance, Proceedings of the Seventh International Joint Conference on Artificial Intelligence, 1981, pp. 785–790.
[18] C. Harris, M. Stephens, A combined corner and edge detector, Proceedings of the Fourth Alvey Vision Conference, 1988, pp. 147–151.
[19] D. Reisfeld, H.J. Wolfson, Y. Yeshurun, Context-free attentional operators: the generalized symmetry transform, Int. J. Comput. Vision 14 (1995) 119–130.
[20] B. Lucas, T. Kanade, An iterative image registration technique with an application to stereo vision, Proceedings of the Seventh International Joint Conference on Artificial Intelligence, 1981, pp. 674–679.
[21] S. Mallat, S. Zhong, Characterization of signals from multiscale edges, IEEE Trans. Pattern Anal. Mach. Intell. 14 (7) (1992) 710–732.
[22] S. Mallat, W.L. Hwang, Singularity detection and processing with wavelets, IEEE Trans. Inf. Theory 38 (2) (1992) 617–643.
[23] J. Canny, A computational approach to edge detection, IEEE Trans. Pattern Anal. Mach. Intell. 8 (6) (1986) 679–698.
[24] I. Daubechies, The wavelet transform, time-frequency localization and signal analysis, IEEE Trans. Inf. Theory 36 (5) (1990) 961–1005.
[25] I. Daubechies, Ten Lectures on Wavelets, SIAM, Philadelphia, 1992.
[26] Y. Meyer, Ondelettes et opérateurs I: ondelettes, Hermann, Paris, 1990.
[27] A. Arneodo et al., Ondelettes, multifractales et turbulences, Diderot Editeur, Arts et Sciences, 1995.
[28] B. Torresani, Analyse continue par ondelettes, InterEditions/CNRS Editions, Paris, 1995.
[29] C. Schmid, Appariement d'images par invariants locaux de niveaux de gris, application à l'indexation d'une base d'objets, Ph.D. thesis, University of Grenoble, 1996.
[30] L. Riou, J. Fayolle, T. Fournel, PIV measurement using multiple cameras: the calibration method, Proceedings of the Eighth International Symposium on Flow Visualization, Sorrento, September 1998.
[31] W. Forstner, A framework for low-level feature extraction, Proceedings of the Third European Conference on Computer Vision, 1991.
[32] R. Horaud, T. Skordas, F. Veillon, Finding geometric and relational structures in an image, Proceedings of the First European Conference on Computer Vision, 1990, pp. 374–384.
About the Author: JACQUES FAYOLLE was born in 1970 in France. He received his postgraduate diploma on "Images" in 1993 and the Ph.D. degree on "Image analysis and image processing" in 1996 from Saint-Etienne University. He is currently a professor at the Institut Universitaire Professionnalisant Télécommunications and at the research laboratory Traitement du Signal et Instrumentation (UMR CNRS 5516) at Saint-Etienne. His current research interests include the determination of motion and deformation. The image processing methods are based on mathematical tools such as the continuous wavelet transform. The main principle of the motion determination algorithms is to track some feature points of the scene. His current research addresses the problem of 3-D motion determination. Other fields of interest are pattern recognition through feature point characterization, camera calibration, segmentation of fuzzy objects and, more generally, the applications of the continuous wavelet transform. The main applications of his research are the determination of displacements in turbulent flows and the quantification of structures in X-ray images.

About the Author: LAURENCE RIOU was born in Saint-Etienne, France, in 1973. She graduated from the Institut Supérieur des Techniques Avancées de Saint-Etienne (ISTASE), France, in 1996 and also received her postgraduate diploma on "Images" in 1996. Since October 1996, she has been a Ph.D. student at the Traitement du Signal et Instrumentation laboratory of the CNRS (UMR 5516), Saint-Etienne, France. Her research interests include the areas of image processing and computer vision applied to camera calibration and 3-D motion determination.

About the Author: CHRISTOPHE DUCOTTET was born in Lyon, France, in 1967. He graduated from the Ecole Nationale Supérieure de Physique de Marseille, France, in 1990 and received the Ph.D. degree in image processing from Saint-Etienne University, France, in 1994. He is currently a professor at the Institut Supérieur des Techniques Avancées de Saint-Etienne (ISTASE), France, and a researcher at the Traitement du Signal et Instrumentation (TSI) laboratory of the CNRS (UMR 5516), Saint-Etienne, France. His research interests are image processing applied to unrigid motion determination, 3-D motion determination and segmentation of fuzzy objects.
Pattern Recognition 33 (2000) 1455–1465

Genetic algorithm-based clustering technique
Ujjwal Maulik, Sanghamitra Bandyopadhyay*
Department of Computer Science, Government Engineering College, Kalyani, Nadia, India
Machine Intelligence Unit, Indian Statistical Institute, 203 B.T. Road, Calcutta 700 035, India
Received 24 June 1998; received in revised form 29 April 1999; accepted 29 April 1999
Abstract

A genetic algorithm-based clustering technique, called GA-clustering, is proposed in this article. The searching capability of genetic algorithms is exploited in order to search for appropriate cluster centres in the feature space such that a similarity metric of the resulting clusters is optimized. The chromosomes, which are represented as strings of real numbers, encode the centres of a fixed number of clusters. The superiority of the GA-clustering algorithm over the commonly used K-means algorithm is extensively demonstrated for four artificial and three real-life data sets. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Genetic algorithms; Clustering metric; K-means algorithm; Real encoding; Euclidean distance
1. Introduction

Genetic algorithms (GAs) [1–4] are randomized search and optimization techniques guided by the principles of evolution and natural genetics, having a large amount of implicit parallelism. GAs perform search in complex, large and multimodal landscapes, and provide near-optimal solutions for the objective or fitness function of an optimization problem. In GAs, the parameters of the search space are encoded in the form of strings (called chromosomes). A collection of such strings is called a population. Initially, a random population is created, which represents different points in the search space. An objective (fitness) function is associated with each string, representing the degree of goodness of the string. Based on the principle of survival of the fittest, a few of the strings are selected and each is assigned a number of copies that go into the mating pool. Biologically inspired operators like crossover and mutation are applied on these strings to yield a new generation of strings. The process of selection, crossover and mutation continues for a fixed number of generations or till a termination condition is satisfied. An excellent survey of GAs, along with the programming structure used, can be found in Ref. [4]. GAs have applications in fields as diverse as VLSI design, image processing, neural networks, machine learning, job-shop scheduling, etc. [5–10].

In the area of pattern recognition, many tasks involved in the process of analyzing/identifying a pattern need appropriate parameter selection and efficient search in complex spaces in order to obtain optimal solutions. Therefore, the application of GAs for solving certain problems of pattern recognition (which need optimization of computation requirements, and robust, fast and close approximate solutions) appears to be appropriate and natural. Research articles in this area have started to come out [11,12]. Recently, an application of GAs has been reported in the area of (supervised) pattern classification in $R^N$ [13,14] for designing a GA-classifier. It attempts to approximate the class boundaries of a given data set with a fixed number (say H) of hyperplanes in such a manner that the associated misclassification of data points is minimized during training.

When the only data available are unlabeled, the classification problems are sometimes referred to as
unsupervised classification. Clustering [15–19] is an important unsupervised classification technique where a set of patterns, usually vectors in a multi-dimensional space, are grouped into clusters in such a way that patterns in the same cluster are similar in some sense and patterns in different clusters are dissimilar in the same sense. For this it is necessary to first define a measure of similarity which will establish a rule for assigning patterns to the domain of a particular cluster centre. One such measure of similarity may be the Euclidean distance D between two patterns x and z, defined by $D = \|x - z\|$. The smaller the distance between x and z, the greater the similarity between the two, and vice versa.

Several clustering techniques are available in the literature [19,20]. Some, like the widely used K-means algorithm [19], optimize the distance criterion either by minimizing the within-cluster spread (as implemented in this article) or by maximizing the inter-cluster separation. Other techniques, like the graph-theoretical approach, the hierarchical approach, etc., perform clustering based on other criteria; these are discussed briefly in Section 2. Extensive studies dealing with the comparative analysis of different clustering methods [21] suggest that there is no general strategy which works equally well in different problem domains. However, it has been found that it is usually beneficial to run schemes that are simpler, and execute them several times, rather than using schemes that are very complex but need to be run only once [21].

Since our aim is to propose a clustering technique based on GAs, a criterion is required whose optimization would provide the final clusters. An intuitively simple criterion is the within-cluster spread, which, as in the K-means algorithm, needs to be minimized for good clustering. However, unlike the K-means algorithm, which may get stuck at values which are not optimal [22], the proposed technique should be able to provide good results irrespective of the starting configuration. It is towards this goal that we have integrated the simplicity of the K-means algorithm with the capability of GAs in avoiding local optima, for developing a GA-based clustering technique called the GA-clustering algorithm. It is known that the elitist model of GAs provides the optimal string as the number of iterations goes to infinity [23], when the probability of going from any population to the one containing the optimal string is greater than zero. Therefore, under limiting conditions, a GA-based clustering technique is also expected to provide an optimal clustering with respect to the clustering metric being considered.

Experimental results comparing the GA-clustering algorithm with the K-means algorithm are provided for several artificial and real-life data sets. Since our purpose is to demonstrate the effectiveness of the proposed technique for a wide variety of data sets, we have chosen artificial and real-life data sets with both overlapping and non-overlapping class boundaries, where the
number of dimensions ranges from two to ten and the number of clusters ranges from two to nine.

Note that we are encoding the centres of the clusters, which are floating-point numbers, in the chromosomes. One way in which this could have been implemented is by performing real representation with a binary encoding [24]. However, in order to keep the mapping between the actual cluster centres and the encoded centres straightforward, for convenience we have implemented real-coded GAs here [3]. (In this context one may note the observations in Ref. [25], where binary and floating-point representations in GAs were compared experimentally; the floating-point representation was found to be faster, more consistent and to provide a higher degree of precision.)
2. Clustering

Clustering in N-dimensional Euclidean space $R^N$ is the process of partitioning a given set of n points into a number, say K, of groups (or clusters) based on some similarity/dissimilarity metric. Let the set of n points $\{x_1, x_2, \ldots, x_n\}$ be represented by the set S and the K clusters be represented by $C_1, C_2, \ldots, C_K$. Then

$C_i \neq \emptyset$ for $i = 1, \ldots, K$,
$C_i \cap C_j = \emptyset$ for $i = 1, \ldots, K$, $j = 1, \ldots, K$ and $i \neq j$,
and $\bigcup_{i=1}^{K} C_i = S$.

Some clustering techniques that are available in the literature are the K-means algorithm [19], the branch and bound procedure [26], the maximum likelihood estimate technique [27] and graph-theoretic approaches [28]. The K-means algorithm, one of the most widely used ones, attempts to solve the clustering problem by optimizing a given metric. The branch and bound procedure uses a tree search technique for searching the entire solution space of classifying a given set of points into a fixed number of clusters, along with a criterion for eliminating subtrees which do not contain the optimum result. In this scheme, the number of nodes to be searched becomes huge when the size of the data set becomes large; in such cases, a proper choice of the criterion for eliminating subtrees becomes crucial [20]. The maximum likelihood estimate technique performs clustering by computing the posterior probabilities of the classes after assuming a particular distribution of the data set. In the graph-theoretic approach, a directed tree is formed among the data set by estimating the density gradient at each point; the clustering is realized by finding the valleys of the density function. It is known that the quality of the result depends wholly on the quality of the estimation technique for the density gradient, particularly in the low-density area of the valley.
Our aim in this article has been to propose a clustering methodology which does not assume any particular underlying distribution of the data set being considered while, as already mentioned in Section 1, being conceptually simple like the K-means algorithm. On the other hand, it should not suffer from the limitation of the K-means algorithm, which is known to provide suboptimal clusterings depending on the choice of the initial clusters. Since the principles of the K-means algorithm are utilized for devising such a technique, along with the capability of GAs for providing the requisite perturbation to bring it out of local optima, we have compared the performance of the former with that of the proposed technique. The steps of the K-means algorithm are therefore first described in brief.

Step 1: Choose K initial cluster centres $z_1, z_2, \ldots, z_K$ randomly from the n points $\{x_1, x_2, \ldots, x_n\}$.

Step 2: Assign point $x_i$, $i = 1, 2, \ldots, n$, to cluster $C_j$, $j \in \{1, 2, \ldots, K\}$, iff $\|x_i - z_j\| < \|x_i - z_p\|$, $p = 1, 2, \ldots, K$, and $j \neq p$. Ties are resolved arbitrarily.

Step 3: Compute new cluster centres $z_1^*, z_2^*, \ldots, z_K^*$ as follows:

$z_i^* = \frac{1}{n_i} \sum_{x_j \in C_i} x_j, \quad i = 1, 2, \ldots, K,$

where $n_i$ is the number of elements belonging to cluster $C_i$.

Step 4: If $z_i^* = z_i$, $i = 1, 2, \ldots, K$, then terminate. Otherwise continue from Step 2.
Note that in case the process does not terminate at Step 4 normally, it is executed for a maximum fixed number of iterations.

It has been shown in Ref. [22] that the K-means algorithm may converge to values that are not optimal. Also, global solutions of large problems cannot be found with a reasonable amount of computational effort [29]. It is because of these factors that several approximate methods have been developed to solve the underlying optimization problem. One such method, using GAs, is described in the next section.

3. Clustering using genetic algorithms

3.1. Basic principle

The searching capability of GAs has been used in this article for the purpose of appropriately determining a fixed number K of cluster centres in $R^N$, thereby suitably clustering the set of n unlabelled points. The clustering metric that has been adopted is the sum of the Euclidean distances of the points from their respective cluster centres. Mathematically, the clustering metric M for the K clusters $C_1, C_2, \ldots, C_K$ is given by

$M(C_1, C_2, \ldots, C_K) = \sum_{i=1}^{K} \sum_{x_j \in C_i} \|x_j - z_i\|.$

The task of the GA is to search for the appropriate cluster centres $z_1, z_2, \ldots, z_K$ such that the clustering metric M is minimized.

3.2. GA-clustering algorithm

The basic steps of GAs, which are also followed in the GA-clustering algorithm, are shown in Fig. 1. These are now described in detail.

Fig. 1. Basic steps in GAs.

3.2.1. String representation

Each string is a sequence of real numbers representing the K cluster centres. For an N-dimensional space, the length of a chromosome is N*K words, where the first N positions (or genes) represent the N dimensions of the first cluster centre, the next N positions represent those of the second cluster centre, and so on. As an illustration, let us consider the following example.

Example 1. Let N = 2 and K = 3, i.e., the space is two-dimensional and the number of clusters being considered is three. Then the chromosome

51.6 72.3 18.3 15.7 29.1 32.2

represents the three cluster centres (51.6, 72.3), (18.3, 15.7) and (29.1, 32.2). Note that each real number in the chromosome is an indivisible gene.

3.2.2. Population initialization

The K cluster centres encoded in each chromosome are initialized to K randomly chosen points from the
data set. This process is repeated for each of the P chromosomes in the population, where P is the size of the population.

3.2.3. Fitness computation

The fitness computation process consists of two phases. In the first phase, the clusters are formed according to the centres encoded in the chromosome under consideration. This is done by assigning each point $x_i$, $i = 1, 2, \ldots, n$, to one of the clusters $C_j$ with centre $z_j$ such that

$\|x_i - z_j\| < \|x_i - z_p\|, \quad p = 1, 2, \ldots, K, \ p \neq j.$

All ties are resolved arbitrarily. After the clustering is done, the cluster centres encoded in the chromosome are replaced by the mean points of the respective clusters. In other words, for cluster $C_i$, the new centre $z_i^*$ is computed as

$z_i^* = \frac{1}{n_i} \sum_{x_j \in C_i} x_j, \quad i = 1, 2, \ldots, K.$

These $z_i^*$s now replace the previous $z_i$s in the chromosome. As an illustration, let us consider the following example.

Example 2. The first cluster centre in the chromosome considered in Example 1 is (51.6, 72.3). With (51.6, 72.3) as centre, let the resulting cluster contain two more points, viz., (50.0, 70.0) and (52.0, 74.0), besides (51.6, 72.3) itself. Hence the newly computed cluster centre becomes ((50.0 + 52.0 + 51.6)/3, (70.0 + 74.0 + 72.3)/3) = (51.2, 72.1). The new cluster centre (51.2, 72.1) now replaces the previous value of (51.6, 72.3).

Subsequently, the clustering metric M is computed as follows:

$M = \sum_{i=1}^{K} M_i$, where $M_i = \sum_{x_j \in C_i} \|x_j - z_i\|.$

The fitness function is defined as f = 1/M, so that maximization of the fitness function leads to minimization of M.
3.2.4. Selection

The selection process selects chromosomes for the mating pool, directed by the survival-of-the-fittest concept of natural genetic systems. In the proportional selection strategy adopted in this article, a chromosome is assigned a number of copies, proportional to its fitness in the population, that go into the mating pool for further genetic operations. Roulette wheel selection is one common technique that implements the proportional selection strategy.

3.2.5. Crossover

Crossover is a probabilistic process that exchanges information between two parent chromosomes to generate two child chromosomes. In this article single-point crossover with a fixed crossover probability μ_c is used. For chromosomes of length l, a random integer, called the crossover point, is generated in the range [1, l - 1]. The portions of the chromosomes lying to the right of the crossover point are exchanged to produce two offspring.

3.2.6. Mutation

Each chromosome undergoes mutation with a fixed probability μ_m. For a binary representation of chromosomes, a bit position (or gene) is mutated by simply flipping its value. Since we are considering a floating-point representation in this article, we use the following mutation. A number δ in the range [0, 1] is generated with uniform distribution. If the value at a gene position is v, after mutation it becomes

v ± 2δv  if v ≠ 0,
v ± 2δ   if v = 0.
The '+' or '-' sign occurs with equal probability. Note that we could have implemented mutation as v ± δv. However, one problem with this form is that if the values at a particular position in all the chromosomes of a population become positive (or negative), then we would never be able to generate a new chromosome having a negative (or positive) value at that position. In order to overcome this limitation, we have incorporated a factor of 2 while implementing mutation. Other forms, like v ± (δ + ε)v with 0 < ε < 1, would also have served our purpose. One may note in this context that similar mutation operators for real encoding have been used mostly in the realm of evolutionary strategies (see Ref. [3], Chapter 8).

3.2.7. Termination criterion

In this article the processes of fitness computation, selection, crossover, and mutation are executed for a maximum number of iterations. The best string seen up to the last generation provides the solution to the clustering problem.
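A sketch of the crossover and mutation operators just described (illustrative; numpy assumed, names ours):

import numpy as np

rng = np.random.default_rng()

def crossover(a, b):
    # single-point crossover: exchange the tails beyond a random point in [1, l-1]
    c = rng.integers(1, len(a))
    return np.r_[a[:c], b[c:]], np.r_[b[:c], a[c:]]

def mutate(chrom, mu_m=0.001):
    # per-gene mutation: v -> v +/- 2*delta*v if v != 0, else v -> v +/- 2*delta
    out = chrom.copy()
    for i, v in enumerate(out):
        if rng.random() < mu_m:
            delta = rng.random()              # uniform in [0, 1]
            sign = rng.choice([-1.0, 1.0])    # '+' or '-' with equal probability
            out[i] = v + sign * 2 * delta * (v if v != 0 else 1.0)
    return out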
We have implemented elitism at each generation by preserving the best string seen up to that generation in a location outside the population; thus, on termination, this location contains the centres of the final clusters. The next section provides the results of implementation of the GA-clustering algorithm, along with a comparison with the performance of the K-means algorithm for several artificial and real-life data sets.
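Before turning to the results, a compact sketch of the whole GA-clustering loop with elitism, built on the decode/fitness/crossover/mutate sketches above (illustrative only; rng is the generator from the mutation sketch, and P is assumed even):

import numpy as np

def ga_clustering(points, K, P=100, generations=100, mu_c=0.8, mu_m=0.001):
    n, N = points.shape
    # initialize each chromosome with K randomly chosen data points
    pop = np.array([points[rng.choice(n, K, replace=False)].ravel()
                    for _ in range(P)])
    best, best_f = None, -np.inf
    for _ in range(generations):
        scored = [fitness(c, points, N, K) for c in pop]
        fits = np.array([f for f, _ in scored])
        pop = np.array([c for _, c in scored])     # centres replaced by means
        if fits.max() > best_f:                    # elitism: keep the best string
            best_f, best = fits.max(), pop[fits.argmax()].copy()
        mates = pop[rng.choice(P, P, p=fits / fits.sum())]  # roulette wheel
        children = []
        for a, b in zip(mates[0::2], mates[1::2]):
            if rng.random() < mu_c:
                a, b = crossover(a, b)
            children += [mutate(a, mu_m), mutate(b, mu_m)]
        pop = np.array(children)
    return best.reshape(K, N), 1.0 / best_f        # final centres and metric M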
4. Implementation results

The experimental results comparing the GA-clustering algorithm with the K-means algorithm are provided for four artificial data sets (Data 1, Data 2, Data 3 and Data 4) and three real-life data sets (Vowel, Iris and Crude Oil). These are first described below.
Fig. 3. Data 3 ('1' denotes points from class 1, '2' denotes points from class 2, …, '9' denotes points from class 9).
4.1. Artificial data sets

Data 1: This is a nonoverlapping two-dimensional data set where the number of clusters is two. It has 10 points. The value of K is chosen to be 2 for this data set.

Data 2: This is a nonoverlapping two-dimensional data set where the number of clusters is three. It has 76 points. The clusters are shown in Fig. 2. The value of K is chosen to be 3 for this data set.

Fig. 2. Data 2.

Data 3: This is an overlapping two-dimensional triangular distribution of data points having nine classes, where all the classes are assumed to have equal a priori probabilities (= 1/9). It has 900 data points. The X-Y ranges for the nine classes are as follows:

Class 1: [-3.3, -0.7] × [0.7, 3.3],
Class 2: [-1.3, 1.3] × [0.7, 3.3],
Class 3: [0.7, 3.3] × [0.7, 3.3],
Class 4: [-3.3, -0.7] × [-1.3, 1.3],
Class 5: [-1.3, 1.3] × [-1.3, 1.3],
Class 6: [0.7, 3.3] × [-1.3, 1.3],
Class 7: [-3.3, -0.7] × [-3.3, -0.7],
Class 8: [-1.3, 1.3] × [-3.3, -0.7],
Class 9: [0.7, 3.3] × [-3.3, -0.7].

Thus the domain of the triangular distribution for each class and each axis is 2.6. Consequently, the height is 1/1.3 (since (1/2) × 2.6 × height = 1). The data set is shown in Fig. 3. The value of K is chosen to be 9 for this data set.

Fig. 4. Triangular distribution along the X-axis.

Data 4: This is an overlapping ten-dimensional data set generated using a triangular distribution of the form shown in Fig. 4 for two classes, 1 and 2. It has 1000 data points. The value of K is chosen to be 2 for this data set. The range for class 1 is [0, 2] × [0, 2] × ⋯ × [0, 2] (10 times), and that for class 2 is [1, 3] × [0, 2] × ⋯ × [0, 2] ([0, 2] repeated 9 times), with the corresponding peaks at (1, 1) and (2, 1).
distribution along the "rst axis (X) for class 1 may be formally quanti"ed as
0
for x)0,
x for 0(x)1, f (x)" 2!x for 1(x)2, 0
for x'2.
for class 1. Similarly for class 2 0
for x)1,
x!1 for 1(x)2, f (x)" 3!x for 2(x)3, 0
for x'3.
The distribution along the other nine axes (> , G i"1, 2,2, 9) for both the classes is for y )0, G y for 0(y )1, G f (y )" G 2!y for 1(y )2, G G 0 for y '2. G 0
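As a sketch, Data 4 can be generated directly with numpy's triangular sampler under the ranges and peaks given above (illustrative code, ours):

import numpy as np

rng = np.random.default_rng(0)

def make_data4(n_per_class=500):
    # class 1: first axis triangular on [0, 2] with mode 1; class 2: [1, 3], mode 2
    x1 = rng.triangular(0, 1, 2, size=(n_per_class, 1))
    x2 = rng.triangular(1, 2, 3, size=(n_per_class, 1))
    # the other nine axes: triangular on [0, 2] with mode 1 for both classes
    y1 = rng.triangular(0, 1, 2, size=(n_per_class, 9))
    y2 = rng.triangular(0, 1, 2, size=(n_per_class, 9))
    return np.vstack([np.hstack([x1, y1]), np.hstack([x2, y2])])

data4 = make_data4()  # 1000 points in 10 dimensions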
4.2. Real-life data sets

Vowel data: This data consists of 871 Indian Telugu vowel sounds [30]. These were uttered in a consonant-vowel-consonant context by three male speakers in the age group of 30-35 years. The data set has three features, F_1, F_2 and F_3, corresponding to the first, second and third vowel formant frequencies, and six overlapping classes {d, a, i, u, e, o}. The value of K is therefore chosen to be 6 for this data. Fig. 5 shows the distribution of the six classes in the F_1-F_2 plane.

Iris data: This data represents different categories of irises having four feature values. The four feature values represent the sepal length, sepal width, petal length and petal width in centimeters [31]. It has three classes (with some overlap between classes 2 and 3) with 50 samples per class. The value of K is therefore chosen to be 3 for this data.

Crude oil data: This overlapping data [32] has 56 data points, 5 features and 3 classes. Hence the value of K is chosen to be 3 for this data set.

GA-clustering is implemented with the following parameters: μ_c = 0.8 and μ_m = 0.001. The population size P is taken to be 10 for Data 1, since it is a very simple data set, while it is taken to be 100 for the others. Note that it is shown in Refs. [15,29] that if exhaustive enumeration is used to solve a clustering problem with n points and K clusters, then one is required to evaluate

(1/K!) Σ_{j=1}^{K} (-1)^{K-j} C(K, j) j^n
partitions. For a data set of size 10 with 2 clusters this value is 2^9 - 1 (= 511), while for one of size 50 with 2 clusters it is 2^49 - 1 (i.e., of the order of 10^14). For the K-means algorithm we fixed a maximum of 1000 iterations in case it does not terminate normally. However, it was observed that in all the experiments the K-means algorithm terminated in far fewer than 1000 iterations.
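The count above is the Stirling number of the second kind S(n, K); a short sketch in exact integer arithmetic confirms the quoted values:

from math import comb, factorial

def n_partitions(n, K):
    # (1/K!) * sum_{j=1..K} (-1)^(K-j) * C(K, j) * j^n
    return sum((-1) ** (K - j) * comb(K, j) * j ** n
               for j in range(1, K + 1)) // factorial(K)

print(n_partitions(10, 2))  # 511
print(n_partitions(50, 2))  # 562949953421311, i.e. about 5.6e14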
Fig. 5. Vowel data in the F_1-F_2 plane.
Table 1
M obtained by the K-means algorithm for five different initial configurations for Data 1 when K = 2

Initial configuration    K-means
1                        5.383132
2                        2.225498
3                        2.225498
4                        5.383132
5                        2.225498

Table 2
M obtained by the GA-clustering algorithm for five different initial populations for Data 1 after 100 iterations when K = 2

Initial population    GA-clustering
1                     2.225498
2                     2.225498
3                     2.225498
4                     2.225498
5                     2.225498

Table 3
M obtained by the K-means algorithm for five different initial configurations for Data 2 when K = 3

Initial configuration    K-means
1                        51.013294
2                        64.646739
3                        67.166768
4                        51.013294
5                        64.725676

Table 4
M obtained by the GA-clustering algorithm for five different initial populations for Data 2 after 100 iterations when K = 3

Initial population    GA-clustering
1                     51.013294
2                     51.013294
3                     51.013294
4                     51.013294
5                     51.013294

Table 5
M obtained by the K-means algorithm for five different initial configurations for Data 3 when K = 9

Initial configuration    K-means
1                        976.235607
2                        976.378990
3                        976.378990
4                        976.564189
5                        976.378990

Table 6
M obtained by the GA-clustering algorithm for five different initial populations for Data 3 after 100 iterations when K = 9

Initial population    GA-clustering
1                     966.350481
2                     966.381601
3                     966.350485
4                     966.312576
5                     966.354085
The results of implementation of the K-means algorithm and the GA-clustering algorithm are shown, respectively, in Tables 1 and 2 for Data 1, Tables 3 and 4 for Data 2, Tables 5 and 6 for Data 3, Tables 7 and 8 for Data 4, Tables 9 and 10 for Vowel, Tables 11 and 12 for Iris, and Tables 13 and 14 for Crude Oil. Both algorithms were run for 100 simulations. For the purpose of demonstration, five different initial configurations of the K-means algorithm and five different initial populations of the GA-clustering algorithm are shown in the tables.

For Data 1 (Tables 1 and 2) it is found that the GA-clustering algorithm provides the optimal value of 2.225498 in all the runs. The K-means algorithm also attains this value most of the time (87% of the total runs); in the other cases, however, it gets stuck at a value of 5.383132. For Data 2 (Tables 3 and 4), GA-clustering attains the best value of 51.013294 in all the runs.
Table 7
M obtained by the K-means algorithm for five different initial configurations for Data 4 when K = 2

Initial configuration    K-means
1                        1246.239153
2                        1246.239153
3                        1246.236680
4                        1246.239153
5                        1246.237127
K-means, on the other hand, attains this value in 51% of the total runs, while in the other runs it gets stuck at different sub-optimal values. Similarly, for Data 3 (Tables 5 and 6) and Data 4 (Tables 7 and 8), the GA-clustering algorithm attains the best values of 966.312576 and 1246.218355 in 20% and 85% of the total runs, respectively.
Table 8
M obtained by the GA-clustering algorithm for five different initial populations for Data 4 after 100 iterations when K = 2

Initial population    GA-clustering
1                     1246.221381
2                     1246.218355
3                     1246.218355
4                     1246.218355
5                     1246.218355

Table 12
M obtained by the GA-clustering algorithm for five different initial populations for Iris after 100 iterations when K = 3

Initial population    GA-clustering
1                     97.10077
2                     97.10077
3                     97.10077
4                     97.10077
5                     97.10077
Table 9
M obtained by the K-means algorithm for five different initial configurations for Vowel when K = 6

Initial configuration    K-means
1                        157460.164831
2                        149394.803983
3                        161094.118096
4                        149373.097180
5                        151605.600107

Table 13
M obtained by the K-means algorithm for five different initial configurations for Crude Oil when K = 3

Initial configuration    K-means
1                        279.743216
2                        279.743216
3                        279.484810
4                        279.597091
5                        279.743216
Table 10
M obtained by the GA-clustering algorithm for five different initial populations for Vowel after 100 iterations when K = 6

Initial population    GA-clustering
1                     149346.490128
2                     149406.851288
3                     149346.152189
4                     149355.823103
5                     149362.780998

Table 14
M obtained by the GA-clustering algorithm for five different initial populations for Crude Oil after 100 iterations when K = 3

Initial population    GA-clustering
1                     278.965150
2                     278.965150
3                     278.965150
4                     278.965150
5                     278.965150
Table 11
M obtained by the K-means algorithm for five different initial configurations for Iris when K = 3

Initial configuration    K-means
1                        97.224869
2                        97.204574
3                        122.946353
4                        124.022373
5                        97.204574
The best values provided by the K-means algorithm for these data sets are 976.235607 (obtained in 20% of the total runs) and 1246.236680 (obtained in 25% of the total runs), respectively. Notably, even the worst values obtained by the GA-clustering algorithm are better than the best values provided by the K-means algorithm for these data sets.

For the Vowel data (Tables 9 and 10), the K-means algorithm attains the best value of 149373.097180 only once (out of 100 runs). The best value obtained by the GA-clustering algorithm is 149346.152189 (obtained in 18% of the total runs), which is better than the best value obtained by the K-means algorithm. Notably, GA-clustering attains values less than 150000 in all the runs, while K-means attains values greater than this in the majority of its runs.

For the Iris (Tables 11 and 12) and Crude Oil (Tables 13 and 14) data sets, the GA-clustering algorithm again attains the best values of 97.10077 and 278.965150, respectively, in all the runs. The K-means algorithm, on the other hand, fails to attain these values in any of its runs.
Table 15
M obtained by the GA-clustering algorithm for five different initial populations for Vowel after 500 iterations when K = 6

Initial population    GA-clustering
1                     149344.229245
2                     149370.762900
3                     149342.990377
4                     149352.289363
5                     149362.661869
The best that the K-means algorithm achieves is 97.204574 (reached 60% of the times) and 279.484810 (reached 30% of the times), respectively.

From Tables 9 and 10 for Vowel, it is found that, unlike the other cases, the GA-clustering algorithm attains one value (149406.851288) that is poorer than the best value of the K-means algorithm (149373.097180). In order to investigate whether the GA-clustering algorithm can improve its clustering performance, it was executed for up to 500 iterations (rather than 100 iterations as was done previously). The results are shown in Table 15. As expected, the performance of GA-clustering improves: the best value that it now attains is 149342.990377 and the worst is 149370.762900, both of which are better than those obtained after 100 iterations. Moreover, its performance in all the runs is now better than the performance of the K-means algorithm in any of the 100 runs.
5. Discussion and conclusions

A genetic algorithm-based clustering algorithm, called GA-clustering, has been developed in this article. A genetic algorithm is used to search for the cluster centres which minimize the clustering metric M. In order to demonstrate the effectiveness of the GA-clustering algorithm in providing optimal clusters, several artificial and real-life data sets with the number of dimensions ranging from two to ten and the number of clusters ranging from two to nine have been considered. The results show that the GA-clustering algorithm provides a performance that is significantly superior to that of the K-means algorithm, a very widely used clustering technique.

Floating-point representation of chromosomes has been adopted in this article, since it is conceptually closest to the problem space and provides a straightforward way of mapping from the encoded cluster centres to the actual ones. In this context, a binary representation may be implemented for the same problem, and the results compared with the present floating-point form. Such an investigation is currently being performed.
Note that the clustering metric M that the GA attempts to minimize is given by the sum of the absolute Euclidean distances of each point from its respective cluster centre. We have also implemented the same algorithm using the sum of the squared Euclidean distances as the minimization criterion. The same conclusions as obtained in this article are still found to hold.

It has been proved in Ref. [23] that an elitist model of GAs will definitely provide the optimal string as the number of iterations goes to infinity, provided the probability of going from any population to the one containing the optimal string is greater than zero. Note that this has been proved for nonzero mutation probability values and is independent of the probability of crossover. However, since the rate of convergence to the optimal string depends on these parameters, a proper choice of their values is imperative for the good performance of the algorithm. Note that the mutation operator as used in this article also allows a nonzero probability of going from any string to any other string. Therefore, our GA-clustering algorithm will also provide the optimal clusters as the number of iterations goes to infinity. A formal proof along these lines, which would serve as a theoretical guarantee of the optimality of the clusters provided by the GA-clustering algorithm, is currently being developed. However, it is imperative to realize once again that for practical purposes a proper choice of the genetic parameters, which may possibly be kept adaptive, is crucial for a good performance of the algorithm. In this context, one may note that although the K-means algorithm got stuck at sub-optimal solutions, even for the simple data sets, the GA-clustering algorithm did not exhibit any such unwanted behaviour.
6. Summary

Clustering is an important unsupervised classification technique in which a set of patterns, usually vectors in a multi-dimensional space, are grouped into clusters in such a way that patterns in the same cluster are similar in some sense and patterns in different clusters are dissimilar in the same sense. For this it is necessary to first define a measure of similarity which will establish a rule for assigning patterns to the domain of a particular cluster centre. One such measure of similarity is the Euclidean distance D between two patterns x and z, given by D = ||x - z||; the smaller the distance between x and z, the greater the similarity between the two, and vice versa.

An intuitively simple and effective clustering technique is the well-known K-means algorithm. However, it is known that the K-means algorithm may get stuck at suboptimal solutions, depending on the choice of the initial cluster centres. In this article, we propose a solution to the clustering problem in which genetic algorithms (GAs) are used to search for the appropriate cluster
centres such that a given metric is optimized. GAs are randomized search and optimization techniques guided by the principles of evolution and natural genetics, and have a large amount of implicit parallelism. GAs perform search in complex, large and multimodal landscapes, and provide near-optimal solutions for the objective or fitness function of an optimization problem. It is known that the elitist model of GAs provides the optimal string as the number of iterations goes to infinity when the probability of going from any population to the one containing the optimal string is greater than zero. Therefore, under limiting conditions, a GA-based clustering technique is also expected to provide an optimal clustering with respect to the clustering metric being considered.

In order to demonstrate the effectiveness of the GA-based clustering algorithm in providing optimal clusters, several artificial and real-life data sets with the number of dimensions ranging from two to ten and the number of clusters ranging from two to nine have been considered. The results show that the GA-clustering algorithm provides a performance that is significantly superior to that of the K-means algorithm for these data sets.
Acknowledgements

This work was carried out when Ms. Sanghamitra Bandyopadhyay held the Dr. K. S. Krishnan fellowship awarded by the Department of Atomic Energy, Govt. of India. The authors acknowledge the reviewer, whose valuable comments helped immensely in improving the quality of the article.
References

[1] D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, New York, 1989.
[2] L. Davis (Ed.), Handbook of Genetic Algorithms, Van Nostrand Reinhold, New York, 1991.
[3] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer, New York, 1992.
[4] J.L.R. Filho, P.C. Treleaven, C. Alippi, Genetic algorithm programming environments, IEEE Comput. 27 (1994) 28-43.
[5] S.K. Pal, D. Bhandari, Selection of optimal set of weights in a layered network using genetic algorithms, Inform. Sci. 80 (1994) 213-234.
[6] S.K. Pal, D. Bhandari, M.K. Kundu, Genetic algorithms for optimal image enhancement, Pattern Recognition Lett. 15 (1994) 261-271.
[7] D. Whitley, T. Starkweather, C. Bogart, Genetic algorithms and neural networks: optimizing connections and connectivity, Parallel Comput. 14 (1990) 347-361.
[8] R.K. Belew, J.B. Booker (Eds.), Proceedings of the Fourth International Conference on Genetic Algorithms, Morgan Kaufmann, San Mateo, 1991.
[9] S. Forrest (Ed.), Proceedings of the Fifth International Conference on Genetic Algorithms, Morgan Kaufmann, San Mateo, 1993.
[10] L.J. Eshelman (Ed.), Proceedings of the Sixth International Conference on Genetic Algorithms, Morgan Kaufmann, San Mateo, 1995.
[11] E.S. Gelsema (Ed.), Special Issue on Genetic Algorithms, Pattern Recognition Letters, vol. 16(8), Elsevier Science Inc., Amsterdam, 1995.
[12] S.K. Pal, P.P. Wang (Eds.), Genetic Algorithms for Pattern Recognition, CRC Press, Boca Raton, 1996.
[13] S. Bandyopadhyay, C.A. Murthy, S.K. Pal, Pattern classification using genetic algorithms, Pattern Recognition Lett. 16 (1995) 801-808.
[14] S.K. Pal, S. Bandyopadhyay, C.A. Murthy, Genetic algorithms for generation of class boundaries, IEEE Trans. Systems, Man Cybernet. 28 (1998) 816-828.
[15] M.R. Anderberg, Cluster Analysis for Application, Academic Press, New York, 1973.
[16] J.A. Hartigan, Clustering Algorithms, Wiley, New York, 1975.
[17] P.A. Devijver, J. Kittler, Pattern Recognition: A Statistical Approach, Prentice-Hall, London, 1982.
[18] A.K. Jain, R.C. Dubes, Algorithms for Clustering Data, Prentice-Hall, Englewood Cliffs, NJ, 1988.
[19] J.T. Tou, R.C. Gonzalez, Pattern Recognition Principles, Addison-Wesley, Reading, 1974.
[20] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, New York, 1990.
[21] R.C. Dubes, A.K. Jain, Clustering techniques: the user's dilemma, Pattern Recognition 8 (1976) 247-260.
[22] S.Z. Selim, M.A. Ismail, K-means type algorithms: a generalized convergence theorem and characterization of local optimality, IEEE Trans. Pattern Anal. Mach. Intell. 6 (1984) 81-87.
[23] D. Bhandari, C.A. Murthy, S.K. Pal, Genetic algorithm with elitist model and its convergence, Int. J. Pattern Recognition Artif. Intell. 10 (1996) 731-747.
[24] A. Homaifar, S.H.Y. Lai, X. Qi, Constrained optimization via genetic algorithms, Simulation 62 (1994) 242-254.
[25] C.Z. Janikow, Z. Michalewicz, An experimental comparison of binary and floating point representations in genetic algorithms, in: R.K. Belew, J.B. Booker (Eds.), Proceedings of the Fourth International Conference on Genetic Algorithms, Morgan Kaufmann, San Mateo, 1991, pp. 31-36.
[26] W.L.G. Koontz, P.M. Narendra, K. Fukunaga, A branch and bound clustering algorithm, IEEE Trans. Comput. C-24 (1975) 908-915.
[27] J.H. Wolfe, Pattern clustering by multivariate mixture analysis, Multivariate Behav. Res. 5 (1970) 329-350.
[28] W.L.G. Koontz, P.M. Narendra, K. Fukunaga, A graph theoretic approach to nonparametric cluster analysis, IEEE Trans. Comput. C-25 (1975) 936-944.
[29] H. Spath, Cluster Analysis Algorithms, Ellis Horwood, Chichester, UK, 1989.
[30] S.K. Pal, D.D. Majumder, Fuzzy sets and decision making approaches in vowel and speaker recognition, IEEE Trans. Systems, Man Cybernet. SMC-7 (1977) 625-629.
[31] R.A. Fisher, The use of multiple measurements in taxonomic problems, Ann. Eugenics 3 (1936) 179-188.
[32] R.A. Johnson, D.W. Wichern, Applied Multivariate Statistical Analysis, Prentice-Hall, Englewood Cliffs, NJ, 1982.
About the Author - UJJWAL MAULIK did his Bachelors in Physics and Computer Science in 1986 and 1989, respectively. Subsequently, he did his Masters and Ph.D. in Computer Science in 1991 and 1997, respectively, from Jadavpur University, India. Dr. Maulik visited the Center for Adaptive Systems Applications, Los Alamos, New Mexico, USA in 1997. He is currently the Head of the Department of Computer Science, Kalyani Engineering College, India. His research interests include Parallel Processing and Interconnection Networks, Natural Language Processing, Evolutionary Computation and Pattern Recognition.

About the Author - SANGHAMITRA BANDYOPADHYAY did her Bachelors in Physics and Computer Science in 1988 and 1991, respectively. Subsequently, she did her Masters in Computer Science from the Indian Institute of Technology, Kharagpur, in 1993 and her Ph.D. in Computer Science from the Indian Statistical Institute, Calcutta, in 1998. Dr. Bandyopadhyay is the recipient of the Dr. Shanker Dayal Sharma Gold Medal and the Institute Silver Medal for being adjudged the best all-round postgraduate performer in 1993. She visited Los Alamos National Laboratory in 1997. She is currently on a postdoctoral assignment at the University of New South Wales, Sydney, Australia. Her research interests include Evolutionary Computation, Pattern Recognition, Parallel Processing and Interconnection Networks.
Pattern Recognition 33 (2000) 1467-1474
An apparent simplicity appearing in pattern classification problems

Manabu Ichino*, Hiroyuki Yaguchi

Tokyo Denki University, Hatoyama, Saitama 350-03, Japan

Received 28 September 1998; accepted 1 June 1999
Abstract

An apparent simplicity appearing in classification problems is described in order to assert the importance of feature selection. If we assume that the sample size is finite, then, by increasing the number of features used to describe each sample pattern, we can improve the interclass distinguishability. On the other hand, increasing the number of features reduces the generality of the class descriptions. Therefore, the enhanced distinguishability may be an apparent simplicity appearing in the interclass structure. We illustrate theoretically and experimentally the existence of the trade-off between the interclass distinguishability and the generality of class descriptions. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Classification; Recognition; Feature selection; Distinguishability; Generality; Relative neighborhood graph; Mutual neighborhood graph
1. Introduction

In many actual pattern classification problems, the class distinguishability can be enhanced by increasing the number of features used to describe the sample patterns, even if the features are only locally effective. In the simple two-dimensional examples in Fig. 1, eight samples are available for each pattern class. In example (a), only feature X_1 is effective to discriminate between the two pattern classes. Several linear classifiers [1,2] and the partitioning methods [3,4] may be applicable in order to solve this problem. Since feature X_2 is clearly ineffective, simpler, and thus more general, class descriptions are achieved by removing this feature. In example (b), only feature X_1 is again effective to discriminate between the two pattern classes. Linear classifiers are not applicable, but the partitioning methods [3,4] are still usable to solve this problem. Feature X_2 should again be removed
* Corresponding author. Tel.: +81-492-96-2911; fax: +81-492-96-5060. E-mail address: [email protected] (M. Ichino).
to achieve better class descriptions. In example (c), X_1 and X_2 are both necessary to discriminate between the two pattern classes. However, we can make a decision rule which is based on alternative uses of the two features. In examples (d) and (e), the simultaneous use of the two features is necessary to discriminate between the two pattern classes. Example (d) is known as a two-dimensional XOR problem. Example (e) is an extension of an XOR problem. Clearly, (e) is harder than (d), because we have to use more local descriptions. Several region-oriented methods [5-11] can solve these problems.

It is important to note that a precise description of class discrimination can be achieved by using very locally effective features under the assumption of a given finite number of design samples. However, the generality of the class descriptions obtained may be very poor for independent test samples. Evidence to support this is found in the following problem we often encounter: if the performance obtained with a given set of features is inadequate, it is reasonable to believe that the performance will improve with the addition of the information offered by more features. Unfortunately, in practice, above a certain number of features, the inclusion of additional features leads to even worse classification results.
Fig. 1. Two-dimensional examples.
To overcome this drawback, it is important to select an optimal subset of features from the global set of features, and to strike a balance between the interclass distinguishability and the generality of class descriptions. Much research has been done on feature selection [1,2,7]. Among this, several papers have attempted to
assert why feature selection is very important and necessary under the assumption that the sample size is finite. For example, Watanabe [12,13] presented his Theorem of the Ugly Duckling in the framework of predicate logic. Hughes [14] showed the peaking phenomenon appearing in the classification performance.
The purpose of this paper is to present a theory that asserts again the importance of feature selection under the assumptions that (1) we have two pattern classes, (2) each sample pattern is represented as a vector composed of d feature values in a d-dimensional Euclidean space (the feature space), and (3) the number of sample patterns given for each pattern class is finite.

In Section 2, we present the relative neighborhood graph (RNG) and the mutual neighborhood graph (MNG) defined in subspaces of the feature space. The RNG for a pattern class describes an intraclass structure based on the given sample patterns, and it exhibits the generality of the class description with respect to a selected set of features. By increasing the number of features used to describe the sample patterns, the RNG approaches a complete graph, which asserts that the generality of the class description becomes minimum. On the other hand, the MNG describes an interclass structure and exhibits the distinguishability between the two pattern classes based on a selected set of features. By increasing the number of features used to describe the sample patterns, the MNG approaches a complete graph, which asserts that complete distinguishability between the classes is achieved. The trade-off between the interclass distinguishability and the generality of class descriptions is summarized as the Apparent Simplicity Theorem in Section 3. A preliminary form of this theorem was reported by the authors [11,15] in the framework of a mathematical model which can manipulate symbolic data [16]. However, we restate the theorem in an improved form by restricting the feature space to a d-dimensional Euclidean space. Section 4 consists of artificial examples to illustrate our theorem. Section 5 is the summary.
2. Neighborhood graphs

2.1. Sample patterns

Let x_k be the feature value of the kth feature X_k, k = 1, 2, …, d, and let a sample pattern be denoted as a vector

x = (x_1, x_2, …, x_d)  (1)

in a d-dimensional Euclidean space R^d, which is called the feature space. A vector x is also called a feature vector. In the following discussion, we treat various subsets of the given set of features. To clarify this, let F_0 be the set of feature numbers, given by

F_0 = {1, 2, …, d},  (2)

and called the feature set. For a feature subset F = {k_1, k_2, …, k_m} of F_0, a feature vector may be given as

x = (x_{k_1}, x_{k_2}, …, x_{k_m}).  (3)
We assume two pattern classes ω_1 and ω_2. Each pattern class, for example ω_i, has N_i training samples represented by the feature set F_0 as follows:

ω_i = {x_i1, x_i2, …, x_iN_i},  i = 1, 2.  (4)

In the following discussion, we assume that all N_1 + N_2 samples are mutually different with respect to the feature set F_0.

2.2. Generality and distinguishability

For the rth feature values of a pair of samples x_ip and x_iq in ω_i, we define a closed interval on the rth feature axis by

I(x_ip, x_iq)_r = [min(x_ipr, x_iqr), max(x_ipr, x_iqr)],  r = 1, 2, …, d,  (5)

where min(a, b) and max(a, b) are the operators which take the minimum and the maximum values of a and b, respectively. Then, based on the closed intervals for a feature subset F = {k_1, k_2, …, k_m} of F_0, we define a rectangular region RECT(x_ip, x_iq | F) in R^m as the Cartesian product set

RECT(x_ip, x_iq | F) = I(x_ip, x_iq)_{k_1} × I(x_ip, x_iq)_{k_2} × ⋯ × I(x_ip, x_iq)_{k_m}.  (6)

Under a feature subset F, let a_i(p, q | F) be the number of samples of ω_i which are included in RECT(x_ip, x_iq | F), where we assume that a_i(p, q | F) excludes the two samples x_ip and x_iq which span the rectangular region based on F. On the other hand, let b_i(p, q | F) be the number of opposite-class samples which are included in RECT(x_ip, x_iq | F). Then, we define a measure of the generality of RECT(x_ip, x_iq | F) as a region to describe class ω_i by

Gen_i(p, q | F) = a_i(p, q | F)/(N_i - 2),  (7)

and a measure of the distinguishability of RECT(x_ip, x_iq | F) from the other class ω_j by

Dis_i(p, q | F) = 1 - b_i(p, q | F)/N_j,  (8)

where it is clear that

0 ≤ Gen_i(p, q | F), Dis_i(p, q | F) ≤ 1.  (9)

The smaller the value of Gen_i(p, q | F), the lower the generality of the region as a description of class ω_i. On the other hand, the greater the value of Dis_i(p, q | F), the higher the distinguishability of RECT(x_ip, x_iq | F) from the other class ω_j with respect to F. The generality becomes lowest when Gen_i(p, q | F) = 0, while perfect distinguishability is achieved when Dis_i(p, q | F) = 1.

Fig. 2 illustrates the generality and the distinguishability in the Euclidean plane. In this figure, we have eight samples for class ω_1 and seven samples for class ω_2. In the feature subspace by X_1, i.e. F = {1}, RECT(x_p, x_q | {1}) includes two ω_1 samples and three ω_2 samples. Hence, we have

Gen_1(p, q | {1}) = 2/(8 - 2) = 1/3,  Dis_1(p, q | {1}) = 1 - 3/7 = 4/7.  (10)

In the feature subspace by X_2, i.e. F = {2}, we have

Gen_1(p, q | {2}) = 1/(8 - 2) = 1/6,  Dis_1(p, q | {2}) = 1 - 2/7 = 5/7.  (11)

On the other hand, in the feature space by X_1 and X_2, i.e. F = {1, 2}, we have

Gen_1(p, q | {1, 2}) = 0,  Dis_1(p, q | {1, 2}) = 6/7.  (12)

Fig. 2. Illustration of generality and distinguishability.

From this example, it is clear that the generality decreases monotonically and the distinguishability increases monotonically with the addition of features to a feature subset F. These monotone properties are based on the use of rectangular regions. The reason may be illustrated by Fig. 3(a). In this figure, sample pattern c is excluded from the rectangular region (i.e., an interval) RECT(a, b | {1}) spanned by sample patterns a and b, and the exclusion of c from the rectangular region RECT(a, b | {1, 2}) is invariable by the addition of another feature X_2, although c is included in RECT(a, b | {2}). This special property is not valid for other types of regions. For example, Fig. 3(b) illustrates the circular region spanned by a and b; our desired property is no longer valid for this type of region.

Fig. 3. Illustrations of the monotone property.

We now summarize the monotone property for the generality and the distinguishability as the following theorem.

Theorem 1 (Monotone Property). Let x_ip and x_iq, p ≠ q, be a pair of samples in ω_i, and let F and F' be two feature subsets such that F ⊆ F'. Then, we have

Gen_i(p, q | F') ≤ Gen_i(p, q | F),  Dis_i(p, q | F) ≤ Dis_i(p, q | F').  (13)

2.3. Relative neighborhood graph and mutual neighborhood graph

Let x_ip and x_iq, p ≠ q, be a pair of samples in ω_i. Then, if Gen_i(p, q | F) = 0, we say that the two samples x_ip and x_iq of ω_i are relative neighbors with respect to feature subset F. The relative neighborhood graph [8], written RNG(ω_i | F), is the graph constructed by joining all pairs of relative neighbors in ω_i with respect to feature subset F. The relative neighborhood graph RNG(ω_i | F) yields relative relationships between the sample patterns in ω_i under the feature subset F. Fig. 4(a) illustrates the relative neighborhood graph in the Euclidean plane. We have omitted edges where each sample pattern is its own relative neighbor.

Let x_ip and x_iq, p ≠ q, be a pair of samples in ω_i. Then, if Dis_i(p, q | F) = 1, we say that the two samples x_ip and x_iq in ω_i are mutual neighbors against the other class ω_j with respect to feature subset F. The mutual neighborhood graph [8], written MNG(ω_i : ω_j | F), is the graph constructed by joining all pairs of sample patterns in ω_i which are mutual neighbors against the other class ω_j with respect to feature subset F. The mutual neighborhood graph describes the interclass structure between the two pattern classes. Fig. 4(b) illustrates the MNG in the Euclidean plane. Again, we have omitted edges where each sample pattern is its own mutual neighbor.

Fig. 4. Illustrations of the relative neighborhood graph and mutual neighborhood graph.

When two pattern classes are well separated under feature subset F (see Fig. 4), the MNG becomes a complete graph or a near-complete graph (e.g., MNG(ω_1 : ω_2 | F) and MNG(ω_2 : ω_1 | F) in Fig. 4(b)). On the other hand, the number of edges of the MNG decreases according to the closeness of the two pattern classes (e.g., MNG(ω_1 : ω_2 | F) in Fig. 5(a)). The shaded regions in Fig. 5(b) are called the silhouettes of the pattern classes. The silhouette S(ω_i | F) describes a region for pattern class ω_i in the feature space R^d, and is defined by

S(ω_i | F) = ∪_p ∪_q RECT(x_ip, x_iq | F),  (14)

where the union is taken over all mutual neighbor pairs in ω_i against the other class ω_j under feature subset F. It should be noted that the MNG and the silhouette are descriptions of a pattern class from the viewpoint of relativity to the other class. However, the MNG is an abstract mathematical description, while the silhouette is an actual description in the feature subspace.

Fig. 5. Illustration of the MNG and the silhouettes.

3. Apparent simplicity theorem

3.1. Theorem of 'As the boy, so the man'

We assume again the data sets in (4) under the feature set F_0. Then we have the following theorem:

Theorem 2 (As the boy, so the man). Once the properties of relative neighbors and mutual neighbors are obtained for a feature subset F of F_0, the properties are invariable even with the addition of new features to F.

Proof. Let a pair of samples x_ip and x_iq, p ≠ q, in ω_i be relative neighbors for a feature subset F. Then, from Theorem 1, we see that Gen_i(p, q | F') = Gen_i(p, q | F) = 0 for any feature subset F' such that F ⊆ F'. This asserts the invariability of relative neighbors. Similarly, let a pair of samples x_ip and x_iq, p ≠ q, in ω_i be mutual neighbors against ω_j with respect to F. Then, again from Theorem 1, we see that Dis_i(p, q | F') = Dis_i(p, q | F) = 1 for any feature subset F' such that F ⊆ F'. This asserts the invariability of mutual neighbors. □

3.2. Apparent simplicity theorem

If we assume that each pattern class has a finite sample size and that, for each pair x_ip and x_iq in ω_i, there exists a set of features by which the rectangular region RECT(x_ip, x_iq | F) excludes any other sample x_ik in ω_i and any sample x_jk in the other class ω_j, we obtain the following theorem:

Theorem 3 (Apparent Simplicity Theorem). By adding features appropriately to a feature subset F we obtain the following facts:

(1) The generality Gen_i(p, q | F) becomes zero and the distinguishability Dis_i(p, q | F) becomes one for each pair x_ip and x_iq in ω_i;
(2) The RNG(ω_i | F) and the MNG(ω_i : ω_j | F) become complete graphs; and
(3) The silhouette S(ω_i | F) discriminates perfectly all sample patterns from the other class ω_j, but has the minimum generality as a description for ω_i.

Proof. Properties (1), and thus (2), are direct conclusions from the theorem of As the boy, so the man. Property (3) is then derived from (1) and (2). □

From this theorem we see that the silhouette S(ω_i | F) forms a connected single cluster and never includes any sample of the other class ω_j, since all sample pairs in ω_i are mutual neighbors. Furthermore, the silhouette S(ω_i | F) yields a very sparse and shallow description of ω_i, with only a very poor covering ability even for other training samples of ω_i, since all sample pairs in class ω_i are also relative neighbors. Therefore, the simplicity of the interclass structure obtained here is superficial, an 'apparent simplicity'. Thus, we should select globally effective features in order to assure a realistic classification performance. As an efficient method for feature selection, Ichino [9,10] presented a method based on modified zero-one integer programming.
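Before turning to the examples, here is a sketch of the quantities they measure (our helper names, numpy assumed; F is a list of 0-based feature indices): Gen and Dis for a sample pair per Eqs. (7) and (8), and the RNG/MNG edge sets built from them:

import numpy as np

def in_rect(points, p, q, F):
    # boolean mask: which rows of `points` fall inside RECT(p, q | F)
    lo, hi = np.minimum(p[F], q[F]), np.maximum(p[F], q[F])
    return np.all((points[:, F] >= lo) & (points[:, F] <= hi), axis=1)

def gen_dis(own, other, p_idx, q_idx, F):
    # Gen_i(p, q | F) and Dis_i(p, q | F) for samples of class `own`
    p, q = own[p_idx], own[q_idx]
    inside = in_rect(own, p, q, F)
    inside[[p_idx, q_idx]] = False          # exclude the spanning pair itself
    a = inside.sum()                        # own-class samples covered
    b = in_rect(other, p, q, F).sum()       # opposite-class samples covered
    return a / (len(own) - 2), 1 - b / len(other)

def rng_mng_edges(own, other, F):
    # RNG joins pairs with Gen = 0; MNG joins pairs with Dis = 1
    rng_e, mng_e = [], []
    for p in range(len(own)):
        for q in range(p + 1, len(own)):
            g, d = gen_dis(own, other, p, q, F)
            if g == 0:
                rng_e.append((p, q))
            if d == 1:
                mng_e.append((p, q))
    return rng_e, mng_e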
4. Artificial examples

4.1. Example 1: a minimum distance classifier

For each pattern class ω_i, i = 1, 2, we prepared N d-dimensional sample patterns in such a way that each component of each sample is a randomly generated number in the unit interval [0, 1]. Therefore, the two pattern classes have the same d-dimensional uniform distribution. We calculated the mean vector for each pattern class and applied the minimum distance classifier to the N design samples and N independent test samples of each class, changing the dimensionality of the samples from one to d. We repeated the experiment two hundred times and obtained the average correct classification rates. In this experiment, we chose N = 10, 20 and d = 11.

Fig. 6 summarizes our experimental results. In this figure, the correct classification rates for the test sets take almost the same values, around 0.5, regardless of the number of features used. However, for the design sets, the correct classification rates are gradually improved by adding new features. This tendency becomes clear when the sample size is small compared to the dimensionality of the samples. This is exactly the apparent simplicity phenomenon we dealt with above.

Fig. 6. Results for Example 1.

4.2. Example 2: Gaussian data

We generated 2N 15-dimensional Gaussian samples, in which the 15 features are mutually independent and identically distributed with zero mean and unit variance. As values of N, in the following experiment, we chose five numbers: 50, 100, 200, 300 and 500. We divided the 2N samples randomly into two sets of N samples, which were used as the design sets for pattern classes ω_1 and ω_2. Therefore, the two pattern classes have completely overlapping distributions in the 15-dimensional feature space. The Apparent Simplicity Theorem asserts that, if we fix N and increase the number of features used as a feature subset F taken from the given 15 features, MNG(ω_1 : ω_2 | F) (MNG(ω_2 : ω_1 | F)) and RNG(ω_1 | F) (RNG(ω_2 | F)) approach complete graphs, because their numbers of edges approach the maximum number N(N - 1)/2.

Fig. 7 summarizes our experimental results. For example, when N = 500, by increasing the number of features in F, the numbers of edges of the MNGs and RNGs approach the maximum number N(N - 1)/2 = 124 750 at around 11 features. This is a remarkable fact. The silhouettes S(ω_1 | F) and S(ω_2 | F) may mutually overlap, but they never include any sample from their counter pattern class. Therefore, we achieved complete distinguishability between the classes in terms of our design sets, although it is only an apparent simplicity from the viewpoint of the assumed class distributions. Furthermore, the silhouette S(ω_k | F) includes all samples of ω_k, but each rectangular region spanned by a mutual neighbor pair of samples in ω_k never includes other samples of ω_k. Therefore, the silhouette S(ω_k | F) has the minimum generality as a description of ω_k. In fact, for new sample patterns, independent of the design samples, the two silhouettes S(ω_1 | F) and S(ω_2 | F) should have the same capability of covering the patterns. This example asserts that we have to use globally effective features to discriminate pattern classes.

Fig. 7. Results for Example 2.

4.3. Example 3

This example is also based on Gaussian distributions. For each pattern class, we prepared 50 d-dimensional design samples, where the d features were mutually independent and identically distributed with unit variance. The mean value of the first feature X_1 is zero in ω_1 and 4.0 in ω_2; all other features have zero mean in both ω_1 and ω_2. Therefore, only the first feature is useful to discriminate ω_1 and ω_2. Fig. 8 illustrates the class distributions in the first five-dimensional subspace of our feature space.
Fig. 8. Class conditional distributions for Example 3.
Fig. 9. Results of Example 3.

Fig. 9 summarizes our experimental results. For each number of features, we counted the number of ω_1 samples covered and calculated the generality for each mutual neighbor (MN) pair of ω_1 against ω_2. Then, as the value of the vertical axis of this 3-D graph, we counted the number of MN pairs that have the same generality to describe class ω_1. The longest horizontal line shows the existence of an MN pair which has the largest generality in the feature subspace spanned by X_1. The addition of features then decreases the generality of the MN pairs and increases the number of MN pairs which have low generality. This fact is shown by the changes of the horizontal and vertical lines. This example asserts again the importance of feature selection in order to achieve a proper classification performance.

5. Concluding remarks

This paper described the Apparent Simplicity Theorem, which asserts that we should select only 'globally effective features' in order to strike a balance between the interclass distinguishability and the generality of class descriptions. By using simple artificial examples, we illustrated our theorem and the importance of feature selection in order to achieve better classification performance.

Acknowledgements

The authors would like to thank Professor Edwin Diday for his helpful discussions.

References

[1] P.A. Devijver, J. Kittler, Pattern Recognition: A Statistical Approach, Prentice-Hall, London, 1982.
[2] S.T. Bow, Pattern Recognition and Image Preprocessing, Marcel Dekker, New York, 1992.
[3] E.G. Henrichon Jr., K.S. Fu, A nonparametric partitioning procedure for pattern classification, IEEE Trans. Comput. C-18 (1969) 614-624.
[4] J.H. Friedman, A recursive partitioning decision rule for nonparametric classification, IEEE Trans. Comput. C-26 (1977) 404-408.
[5] R.S. Michalski, Pattern recognition as rule-guided inductive inference, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-2 (1980) 349-361.
[6] M. Ichino, A nonparametric multiclass pattern classifier, IEEE Trans. Systems Man Cybernet. SMC-9 (1979) 345-352.
[7] M. Ichino, Nonparametric feature selection method based on local interclass structure, IEEE Trans. Systems Man Cybernet. SMC-11 (1981) 289-296.
[8] M. Ichino, J. Sklansky, The relative neighborhood graph for mixed feature variables, Pattern Recognition 18 (2) (1985) 161-167.
[9] M. Ichino, Pattern classification based on the Cartesian join system: a general tool for feature selection, in: Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, Atlanta, 1986, pp. 1420-1424.
[10] M. Ichino, A general pattern classification method for mixed feature problems, Trans. IEICE Japan J-71-D (1988) 92-101 (in Japanese).
[11] M. Ichino, Feature selection for symbolic data classification, in: E. Diday et al. (Eds.), New Approaches in Classification and Data Analysis, Springer, Berlin, 1994.
[12] S. Watanabe, Knowing and Guessing, Wiley, New York, 1969.
[13] S. Watanabe, Pattern Recognition: Human and Mechanical, Wiley-Interscience, New York, 1985.
[14] G.F. Hughes, On the mean accuracy of statistical pattern recognizers, IEEE Trans. Inform. Theory IT-14 (1968) 55-63.
[15] M. Ichino, H. Yaguchi, Symbolic pattern classifiers based on the Cartesian system model, in: C. Hayashi et al. (Eds.), Proceedings of IFCS-96: Data Science, Classification, and Related Methods, Springer, Berlin, 1997.
[16] E. Diday, The symbolic approach in clustering, in: H.H. Bock (Ed.), Classification and Related Methods of Data Analysis, Elsevier, Amsterdam, 1988.
About the Author - MANABU ICHINO was born in Hokkaido, Japan, on 20 February 1944. He received B.E., M.E., and Ph.D. degrees from Tokyo Denki University, Tokyo, Japan, in 1967, 1970 and 1974. He joined Tokyo Denki University in 1967, where he is now a Professor in the Department of Computer and Systems Engineering. During 1982-1983, he was a Visiting Associate Professor in the Department of Electrical Engineering at the University of California, Irvine. His current research is centered on the development of feature selection methods for pattern recognition and symbolic data analysis. Dr. Ichino is a member of the IEEE, the Classification Society of North America, the Pattern Recognition Society, the IEICE Japan and the IPS Japan.
About the Author*HIROYUKI YAGUCHI was born in Ibaraki, Japan on 26 May 1963. He received B.E., M.E., and Ph.D. degrees from Tokyo Denki University, in 1986, 1988 and 1996. He joined the Tokyo Denki University in 1988, where he is now an Assistant Professor in the Department of Computer and Systems Engineering. Dr. Yaguchi is a member of the IEICE Japan.
Pattern Recognition 33 (2000) 1475-1485
Combining multiple classi"ers by averaging or by multiplying? David M.J. Tax *, Martijn van Breukelen , Robert P.W. Duin , Josef Kittler Pattern Recognition Group, Department of Applied Physics, Delft University of Technology, Lorentzweg 1, 2628 CJ Delft, The Netherlands Department of Electronic and Electrical Engineering, University of Surrey, Guildford, Surrey GU2 5XH, UK Received 25 January 1999; received in revised form 1 June 1999; accepted 1 June 1999
Abstract In classi"cation tasks it may be wise to combine observations from di!erent sources. Not only it decreases the training time but it can also increase the robustness and the performance of the classi"cation. Combining is often done by just (weighted) averaging of the outputs of the di!erent classi"ers. Using equal weights for all classi"ers then results in the mean combination rule. This works very well in practice, but the combination strategy lacks a fundamental basis as it cannot readily be derived from the joint probabilities. This contrasts with the product combination rule which can be obtained from the joint probability under the assumption of independency. In this paper we will show di!erences and similarities between this mean combination rule and the product combination rule in theory and in practice. 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. Keywords: Combination of classi"ers; Classi"er fusion; Neural networks; Handwritten digits recognition; Pattern recognition
* Corresponding author. Tel.: 00-31-(0)15-278-1845. E-mail address: [email protected] (D.M.J. Tax). ¹ Now with TNO, Institute of Applied Physics, Delft, The Netherlands.

1. Introduction

Sometimes observations from different independent sources are available for a classification task. Instead of training one large classifier on all data, it may be wise to combine just the outputs of smaller classifiers trained on the individual data sources. Even when only one data source is available, a set of classifiers of different complexity and trained with different training algorithms can be designed and combined. This not only decreases the training time but can also increase the robustness and the performance of the classification [1].

A large number of combining schemes for classification exists. In general, three types of situations in combining classifiers can be identified [2]. In the first type, each classifier outputs a single class label and these labels have to be combined [3]. In the second type, the classifiers output sets of class labels ranked in order of likelihood [4], and the third type involves the combination of real-valued outputs for each class by the respective classifiers (most often posterior probabilities [5], sometimes evidences [6]).

Commonly a combined decision is obtained by just averaging the estimated posterior probabilities. This simple algorithm already gives very good results [7,8]. This result is somewhat surprising, especially considering the fact that averaging of the posterior probabilities is not based on a solid (Bayesian) foundation. When the Bayes theorem is adopted for the combination of different classifiers, a product combination rule automatically appears under the assumption of independence: the outputs of the individual classifiers are multiplied and then normalized (this is also called a logarithmic opinion pool [9]).

In Refs. [10,11] a theoretical framework for combining classifiers is developed. For different types of combination rules, among which are the minimum and maximum rules, weighted averages, mean and product, derivations were obtained. When classifiers are used on identical data representations, the classifiers estimate the same class posterior probability. To suppress the errors in the estimates, the classifier outputs should be averaged. On the other hand, when independent data representations are available, classifier outcomes should be multiplied to gain maximally from the independent representations. In
Ref. [12] comparisons between the average combination rule and the product combination rule are made. It was confirmed that when independent data sets were available, the product combination rule should be used; only in the case of poor posterior probability estimates is the more fault-tolerant mean combination rule to be preferred.

In this paper we summarize the differences and similarities between these two classifier combination rules. The combination rules will be derived from the posterior probabilities estimated by different classifiers, and we will show that for two-class problems there is no difference between the two combination rules, regardless of whether independent or dependent data representations are used. In multi-class problems the product combination rule is shown to be superior to the mean combination rule, though more sensitive to noise. The use of rejection of objects which are classified with low confidence by the combination rules will also be shown. The problem which remains is when to use which combination rule. This choice depends on whether the problem is a two- or a multi-class problem, whether independent representations are available, and how well the classifiers estimate the posterior probabilities.

In Section 2 some ways to obtain posterior probabilities are discussed. The derivation of the two combination rules is shown in Section 3. In Section 4 differences between the mean combination and product combination rules are investigated in the case of two- or multi-class problems, in the case of very noisy posterior probabilities, and in the case of rejection of objects. Section 5 will show how this theory translates into practice. The last section will discuss the results.

2. Probabilities estimated by classifiers

Our goal in training classifiers is to find the classification rule that minimizes the probability of error. The optimal rule in classification is the Bayesian decision rule: assign a pattern to the class with the largest posterior probability. A drawback is that the true posterior probability has to be known. When this posterior probability is known (or is estimated), not only can a classification be obtained but the confidence in this classification can also be assessed. When two classes have almost the same posterior probability, the classification can be rejected due to lack of confidence. This results in more reliable classifications.

Some classifiers immediately offer estimates of posterior probabilities, like the multilayer perceptron trained with backpropagation (see Ref. [13]) or by maximizing the cross-entropy on the network outputs (see Ref. [14]). In other classification methods, probabilities are harder to obtain. Often class probability estimates are only reliable for large training sets, as for instance in the case of the k-nearest-neighbor classifier.

In this paper four different classifiers are used. The first is a Gaussian linear classifier, which assumes for each class a Gaussian probability distribution with equal covariance matrices. The posterior probability for an object x to belong to class ω_j is

f_j(x) = p(x | ω_j) P(ω_j) / Σ_k p(x | ω_k) P(ω_k),  (1)

where p(x | ω_j) is normally distributed.

The second linear classifier is the Fisher linear classifier, where a pseudo-inverse is used when the covariance matrix is close to singular (see also Ref. [15]). This classifier does not give direct estimates of the posterior class probabilities. Therefore we use the sigmoids of the distances to the decision boundary; these are optimally scaled using a logistic approach and used as approximations of the posterior probabilities. In cases involving more classes, for each separate class a classifier is trained between that class and all the other classes combined. Using this method, the posterior probabilities of the classes follow from the classifiers for those classes.

The third classification method is the quadratic classifier. Like the Gaussian linear classifier, it assumes a Gaussian probability distribution for each of the classes, only the covariance matrices do not have to be the same. The posterior probabilities are obtained in the same way.

The last method used is the multilayer perceptron. Networks with different numbers of hidden units are trained with the Matlab Neural Network Toolbox, using a gradient descent method with variable learning rate and a stopping criterion based on the performance on an independent test set. The normalized network output is used as the estimate of the posterior probability.
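A sketch of Eq. (1) for the Gaussian case (illustrative, scipy-based; the fitted means, covariances and priors are assumed given, and for the linear classifier all covariances would be the single pooled estimate):

import numpy as np
from scipy.stats import multivariate_normal

def gaussian_posteriors(x, means, covs, priors):
    # f_j(x) = p(x|w_j) P(w_j) / sum_k p(x|w_k) P(w_k)   -- Eq. (1)
    likes = np.array([multivariate_normal.pdf(x, mean=m, cov=c)
                      for m, c in zip(means, covs)])
    post = likes * np.asarray(priors)
    return post / post.sum()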
3. Combining rules

We assume that all objects are represented by feature vectors $x \in \chi$, each object belonging to one of $C$ classes $\omega_j$, $j = 1, \ldots, C$. When $R$ measurement vectors $x^1, \ldots, x^R$ from feature spaces $\chi^1, \ldots, \chi^R$ are available, the probability $P(\omega_j|x^1, \ldots, x^R)$ has to be approximated to make a classification (see also Ref. [10]). In each of the $R$ feature spaces a classifier can be constructed which approximates the true posterior class probability $P(\omega_j|x^k)$ in $\chi^k$:

$$f_j^k(x^k) = P(\omega_j|x^k) + \epsilon_j^k(x^k), \qquad (2)$$

where $\epsilon_j^k(x^k)$ is the error made by classifier $k$ on the probability estimate that object $x^k$ belongs to class $\omega_j$. A combination rule combines these $f_j^k(x^k)$ to approximate $P(\omega_j|x^1, \ldots, x^R)$ as well as possible. Two extreme cases can be distinguished: the first in which $\chi^1 = \chi^2 = \cdots = \chi^R$, the second where $\chi^1, \ldots, \chi^R$
are different and assumed to be independent. In the first case all the classifiers use the same data $x^1$:

$$P(x^1, \ldots, x^R|\omega_j) = P(x^1|\omega_j)\,\delta(x^1 - x^2) \cdots \delta(x^{R-1} - x^R).$$

This trivially leads to

$$P(\omega_j|x^1, \ldots, x^R) = P(\omega_j|x^k) \quad \text{for any } k,\ 1 \le k \le R. \qquad (3)$$

This $P(\omega_j|x^k)$ is estimated by $f_j^k(x^k)$. When we assume zero-mean error for $\epsilon_j^k(x^k)$ (i.e. zero bias), all $f_j^k(x^k)$'s can be averaged to obtain a less error-sensitive estimate. This leads to the mean combination rule

$$f_j(x^1, \ldots, x^R) = \frac{1}{R} \sum_{k=1}^{R} f_j^k(x^k). \qquad (4)$$
In the second case all feature spaces are different and class conditionally independent. The probabilities can be written as $P(x^1, \ldots, x^R|\omega_j) = P(x^1|\omega_j)\,P(x^2|\omega_j) \cdots P(x^R|\omega_j)$. Using the Bayes rule, we derive

$$P(\omega_j|x^1, \ldots, x^R) = \frac{\prod_k P(\omega_j|x^k) / P(\omega_j)^{R-1}}{\sum_{j'} \prod_{k'} P(\omega_{j'}|x^{k'}) / P(\omega_{j'})^{R-1}}. \qquad (5)$$

In the case of equal a priori class probabilities ($P(\omega_j) = 1/C$), this formula reduces to the product combination rule (Eq. (6)) with $\epsilon_j^k(x^k) = 0$:

$$f_j(x^1, \ldots, x^R) = \frac{\prod_k f_j^k(x^k)}{\sum_{j'} \prod_k f_{j'}^k(x^k)}. \qquad (6)$$
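For concreteness, Eqs. (4) and (6) translate directly into code. The sketch below is ours; the guard `eps` against an all-zero product is an assumption that anticipates the normalization failure discussed in Section 5.2.

```python
import numpy as np

def mean_combine(posteriors):
    """posteriors: array of shape (R, C) -- R classifiers, C classes.
    Eq. (4): average the per-classifier posterior estimates."""
    return np.mean(posteriors, axis=0)

def product_combine(posteriors, eps=1e-12):
    """Eq. (6): multiply the per-classifier posterior estimates
    and renormalize over the classes."""
    prod = np.prod(posteriors, axis=0)
    total = prod.sum()
    if total < eps:          # every class got probability ~0: no outcome
        return None          # (the veto effect discussed in Section 4.3)
    return prod / total

# Example: three classifiers, three classes
f = np.array([[0.6, 0.3, 0.1],
              [0.5, 0.4, 0.1],
              [0.7, 0.2, 0.1]])
print(mean_combine(f))      # mean rule posterior
print(product_combine(f))   # product rule posterior (sharper)
```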
4. Differences between averaging and multiplying

From the derivation of Eqs. (4) and (6), we would expect the two rules to be useful under different conditions. The mean combination rule will be especially useful in the case of identical or very highly correlated feature spaces in which the classifiers make independent errors. The product combination rule is apt for different, class conditionally independent feature spaces where classifiers make small estimation errors. We will investigate this dependence on correlation with a very simple model. In this model the data consist of two classes in an N-dimensional space, each normally distributed (see Fig. 1 for a three-dimensional example). The first class is centered on the origin, the second on $(1, 1, \ldots, 1)/\sqrt{N}$. The covariance matrix can be adjusted: it can be changed from the identity, in which case the data are uncorrelated in each component (left picture in Fig. 1), to complete correlation, in which case all data are perfectly correlated in each component (right picture in Fig. 1). On each single feature a classifier is trained, which means that the number of classifiers $R = N$. Each classifier has to estimate posterior probabilities and a decision boundary. Combining the predictions of the classifiers in the two extremes, perfect correlation and complete independence of the data, will indicate where one combination rule can be preferred over the other.

Fig. 1. Data distribution in three dimensions, (left) uncorrelated, (right) correlated.

4.1. Two-class problems

In Table 1 classification errors for the (three) individual classifiers and the combination rules are shown. The Gaussian classifiers are trained on 20 training patterns and tested on 100 patterns. The average individual test error rates of the classifiers are listed in the second column; the error rates of the combination rules are shown in the third and fourth columns. The values are averages over 50 runs. In these very simple experiments we see that no difference between the combination rules exists, even when a large number of classifiers is used. In the case of two-class problems, we can derive the conditions under which the rules behave the same. Assume we have equal class probabilities. When the product rule classifies some object $x$ to class $\omega_j$ ($j$ can be 1 or 2), then

$$\prod_k f_j^k(x^k) > \prod_k \left(1 - f_j^k(x^k)\right). \qquad (7)$$
Table 1. Results of combining three and ten classifiers with the two combining rules.

Data          Average test error        Test error mean    Test error product   Average
correlation   indiv. classifiers (%)    combination (%)    combination (%)      improvement (%)

R = 3
0.0           0.40±0.06                 0.32±0.04          0.32±0.04            0.00±0.01
0.5           0.37±0.04                 0.32±0.03          0.32±0.03            0.00±0.01
1.0           0.32±0.03                 0.32±0.03          0.32±0.03            0.00±0.00

R = 10
0.0           0.46±0.05                 0.37±0.04          0.37±0.04            0.00±0.01
0.5           0.43±0.05                 0.34±0.03          0.34±0.03            0.00±0.01
1.0           0.32±0.03                 0.32±0.03          0.32±0.03            0.00±0.00
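The agreement between the two rules in Table 1 can be probed numerically before walking through the algebra. The following simulation is our own sketch; the noise level, the number of classifiers and the trial count are arbitrary choices, and the outputs are made zero-mean to mimic the zero-bias assumption used below.

```python
import numpy as np

rng = np.random.default_rng(0)
R = 5
disagree = 0
for _ in range(10000):
    f_mean = rng.uniform(0.05, 0.95)          # average estimate for class 1
    m = rng.normal(0.0, 0.02, R)
    m -= m.mean()                             # zero-mean deviations: sum_k m_k = 0
    f1 = np.clip(f_mean + m, 1e-6, 1 - 1e-6)  # classifier outputs for class 1
    prod_rule = np.prod(f1) > np.prod(1 - f1) # product rule decision, Eq. (7)
    mean_rule = f1.mean() > 0.5               # mean rule decision
    disagree += prod_rule != mean_rule
print(disagree)   # stays very small when the deviations are small
```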
We can rewrite $f_j^k(x^k) = \bar{f}_j(x) + m_j^k$, with $\bar{f}_j(x) = \frac{1}{R} \sum_k f_j^k(x^k)$ and therefore $\sum_k m_j^k = 0$. This is basically the bias-variance decomposition (see Ref. [16]): the different values of $m_j^k$ account for the variance, while $\sum_k m_j^k = 0$ indicates that there is no bias (the posterior probabilities are estimated well). We can expand the terms in Eq. (7):

$$\prod_k f_j^k = \bar{f}_j^R \Bigl(1 + \sum_{k<k'} \frac{m_j^k m_j^{k'}}{\bar{f}_j^2} + \cdots \Bigr) = \bar{f}_j^R + \bar{f}_j^{R-2} \sum_{k<k'} m_j^k m_j^{k'} + \bar{f}_j^{R-3} \sum_{k<k'<k''} m_j^k m_j^{k'} m_j^{k''} + \cdots \qquad (8)$$

For two-class problems $m_1^k = -m_2^k$. All sums over the $m_j^k$'s in the expansions of $\prod_k f_1^k$ and $\prod_k f_2^k$ will be equal, except for the sign in summations over an odd number of classifier outputs and for the factors $\bar{f}_j^{R-n}$. Using this in Eq. (7) results in

$$\bar{f}_j^R + \bar{f}_j^{R-2} \sum_{k<k'} m_j^k m_j^{k'} + \bar{f}_j^{R-3} \sum_{k<k'<k''} m_j^k m_j^{k'} m_j^{k''} + \cdots > (1 - \bar{f}_j)^R + (1 - \bar{f}_j)^{R-2} \sum_{k<k'} m_j^k m_j^{k'} - (1 - \bar{f}_j)^{R-3} \sum_{k<k'<k''} m_j^k m_j^{k'} m_j^{k''} + \cdots \qquad (9)$$

When there are no outliers and the $m_j^k$ are smaller than $\bar{f}_j$, the triple sums $\sum m_j^k m_j^{k'} m_j^{k''}$ stay small. Table 2 shows the size of $\bar{f}_j^{R-3} \sum m_j^k m_j^{k'} m_j^{k''}$ relative to the previous two terms for different numbers of classes; the values shown are the largest absolute values over all classes. The relatively large value of the second term does not influence the classification, because the signs of these terms are equal for all classes. In the case of a two-class problem the third term is very small and can be ignored. This means that when we start with the product combination rule (given by Eq. (7)) and apply the approximation given by Eq. (8), we get the new combination rule: classify object $x$ to class $\omega_j$ ($j$ can be 1 or 2) when

$$\bar{f}_j^{\,R} > (1 - \bar{f}_j)^R. \qquad (10)$$

This is a rescaled version of the mean combination rule for a two-class problem. In the product combination rule the output values are just shifted towards the extremes: towards 0 for $\bar{f}_j < 0.5$ and towards 1 for $\bar{f}_j > 0.5$.

Table 2. Sizes of the higher-order sums in Eq. (9) relative to the first term, using R = 3 classifiers.

No. classes   Term 1   Term 2   Term 3
2             1.0      0.16     0.00016
3             1.0      0.22     0.00971
4             1.0      0.23     0.01862
5             1.0      0.30     0.02231

4.2. Multiclass problems
For multiclass problems the artificial two-class problem from the preceding section is extended. All classes still have the same covariance matrix; only their means are located at $n \cdot (1, 1, \ldots, 1)/\sqrt{R}$, $n = 1, \ldots, C$. Training three Gaussian linear classifiers on these data sets for different numbers of classes results in Fig. 2. When the combination rules are used for problems with more than two classes, differences between the rules appear. Here the situation is much more complicated: not only the mean value of the estimated probabilities matters, but also all the individual class probabilities. For instance, when in the mean combination rule in a two-class problem an object is assigned to a particular class, the mean combined posterior probability is larger than 0.5. In a three-class problem, a mean value of 0.33 does not guarantee that the object will be assigned to
Fig. 2. Results of combining three classifiers for different numbers of classes. (The standard deviation for the individual classifiers is about 4%, for the combination rules about 2%.)
that class. Far away from the third class, the decision boundary between two classes is still where the posterior probabilities of these two classes equal 0.5. For multiclass problems the decision boundaries can thus lie in regions with posterior probabilities between $1/C$ and 0.5. This increase in uncertain areas for a growing number of classes also increases the chance that the classification will change when the product combination rule is used instead of the mean combination rule.

4.3. Error sensitivity

To investigate the error sensitivity of the combination rules, the training data used by one of the three classifiers is contaminated by noise. The results are shown in Fig. 3. In the left picture no noise is added; in the middle figure noise values uniformly distributed between zero and one are added to one feature; in the right picture noise between zero and two is added. It can be observed that when classifiers have poor decision boundaries and poorly estimated probabilities, the mean combination rule is more robust than the product rule. Especially when one of the classifiers is an outlier which outputs probabilities of 0 and 1, the product combination rule acts as a veto and the solution severely deteriorates. The robustness of the mean combination rule with respect to the product combination rule is shown by Kittler [10], in which Eq. (2) is expanded, comparably to the expansion in Eq. (8). In that paper it is shown that the combined classifier using the product combination rule approximates the error-free classifier up to a factor $\prod_k \bigl[1 + \sum_j \epsilon_j^k / P(\omega_j|x^k)\bigr]$, while in the mean combination rule the factor is $\bigl[1 + \sum_j \sum_k \epsilon_j^k / \sum_k P(\omega_j|x^k)\bigr]$. Note that $P(\omega_j|x^k) \le 1$, so errors are amplified by the product rule. In the mean combination rule the errors are divided by
Fig. 3. Results of combining three classifiers for a three-class problem with one feature contaminated with noise, averaged over 10 runs. The left figure shows results without noise contamination, the middle figure with noise between 0 and 1, the right figure with noise between 0 and 2. The data correlation is shown on the x-axis. The dotted line is the average individual classifier error, the dashed line is the average rule, the solid line is the product rule. The standard deviation is about 5% for the classifiers, 2.5% for the improvement.
the sum of the posterior probabilities; especially for the winning class, where these probabilities are large, the errors are severely dampened.

4.4. Rejection

When a higher confidence is required for classification tasks, an acceptance threshold on the class probability can be introduced. Probabilities around the decision boundary are then excluded from classification. When several independent classifiers each assign an object to the same class with probability 0.6, the sum rule also estimates this probability as 0.6, while the product rule increases the confidence in this classification by an amount depending on the number of other classes and the number of classifiers involved. From this observation it can be expected that, when rejection thresholds are introduced, the mean combination rule will reject objects which the product combination rule classifies confidently. In Fig. 4 the error is plotted versus the rejection rate for a two-class problem. Both the mean combination rule and the product rule give approximately the same curves: the amplification of the noise cancels all that has been gained by the increase of the confidence. In the case of a problem with more than two classes (see Fig. 5), an improvement is obtained when the threshold is not set too high. For a rejection rate smaller than 80% the product combination rule outperforms the mean combination rule.
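A rejection mechanism of this kind is straightforward to sketch; the helper names below are our own, and the error/reject bookkeeping mirrors the curves of Figs. 4 and 5 only schematically.

```python
import numpy as np

def classify_with_reject(posterior, threshold):
    """Return the index of the most probable class, or None when the
    winning confidence falls below the acceptance threshold."""
    j = int(np.argmax(posterior))
    return j if posterior[j] >= threshold else None

def error_reject_curve(posteriors, labels, thresholds):
    """Error rate on accepted objects and rejection rate, per threshold
    (the trade-off plotted in Figs. 4 and 5)."""
    curve = []
    for th in thresholds:
        decisions = [classify_with_reject(p, th) for p in posteriors]
        accepted = [(d, y) for d, y in zip(decisions, labels) if d is not None]
        reject_rate = 1.0 - len(accepted) / len(labels)
        errors = sum(d != y for d, y in accepted)
        error_rate = errors / len(accepted) if accepted else 0.0
        curve.append((th, error_rate, reject_rate))
    return curve
```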
5. Experiments

5.1. Data

We tested the combination rules on real data. This data set is also used and explained by Van
Fig. 4. (left) The error rate and the rejection rate versus the threshold on the confidence for the mean combination (dashed) and the product combination (solid) rules for two-class problems; (right) the error rate versus the rejection rate for the mean combination rule (dashed) and the product combination rule (solid).
Fig. 5. (left) The error rate and the rejection rate versus the threshold on the confidence for the mean combination (dashed) and the product combination (solid) rules for 10 classes; (right) the error rate versus the rejection rate for the mean combination rule (dashed) and the product combination rule (solid).
Breukelen [17]. From nine original maps of a Dutch public utility, digits were extracted. The maps represent the position of a conduit system with respect to certain landmarks and were hand-drawn by a large group of drawing engineers over a period of more than 25 years. The data set is composed of separate dimensioning digits. The digits were automatically extracted from a binary image of the map, deskewed and normalized to fit exactly into a 30 by 48 pixel region. Finally, the digits were labelled manually. From the set of 2000 digits four types of feature sets were extracted: Zernike moments, Karhunen-Loève
features, Fourier descriptors and image vectors. Zernike moments are the projection of the image function onto a set of orthogonal basis functions. There are 13 orders of Zernike moments, with 49 moments associated in total. For the feature extraction only the last 11 orders were used, resulting in a subset of 47 Zernike moments. As Zernike moments are rotation invariant, no distinction was made between the digits 6 and 9; thus, only nine classes were available for the Zernike features. The Karhunen-Loève transform is a linear transform and corresponds to the projection of images onto the eigenvectors of a covariance matrix; the covariance matrix is computed from the images of the training data. The fourth feature set is based on a simple transform of the binary image to a pixel vector. To reduce both the number of features and the possible loss of information, the normalized image was divided into tiles of 2×3 pixels, resulting in a total of 240 tiles. Each tile represents an element of the feature vector, and the value corresponding to each tile was calculated simply by counting the number of object pixels within the tile. Although the origin of the objects in all data sets is the same, applying different preprocessing methods results in independent measurements of these objects and thus in independent data sets. Therefore, we might expect the combining rules to improve upon the individual classifiers considerably.
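The pixel (tile-count) feature set described above is simple enough to reconstruct in a few lines; this is a plausible reading of the description, not the authors' code, and the dummy digit is purely illustrative.

```python
import numpy as np

def tile_features(binary_image, tile_h=2, tile_w=3):
    """Count object pixels in non-overlapping tiles of tile_h x tile_w.
    For a 30x48 binary digit image and 2x3 tiles this gives
    (30/2) * (48/3) = 15 * 16 = 240 features, as in the text."""
    h, w = binary_image.shape
    tiles = binary_image.reshape(h // tile_h, tile_h, w // tile_w, tile_w)
    return tiles.sum(axis=(1, 3)).ravel()

digit = np.zeros((30, 48), dtype=int)
digit[5:25, 10:40] = 1              # a dummy "digit"
print(tile_features(digit).shape)   # (240,)
```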
We trained "ve linear Gaussian classi"ers on each separate data set and combined the outputs. All classi"ers are trained using di!erent number of training patterns (at most 100 objects per class, in total 1000 objects for 10 classes) and are tested on a separate test set consisting of 4292 objects in total. The results are shown in Fig. 6. We see that in the Fourier feature set and in the Zernike feature set, classi"cation errors are quite high. This is due to the rotation invariance of these features. The pixel set and Karhunen}Loe`ve set both perform well. The posteriori probabilities seem to be estimated well, by using combination rules the classi"cation performance dramatically improves. The product rule is consequently better than the mean combination rule. The best classi"cation performance achieved is about 2.3% error. In Fig. 7(a) the worst classi"er, i.e. the classi"er on the Fourier data set is removed. Again the product combination rule outperforms the other methods and achieves an error of 2.8%. Although the Fourier classi"er had average classi"cation error of about 25%, it still contributed to obtain a better classi"cation. In Fig. 7(b) the best classi"er, the classi"er trained on the Karhunen}Loe`ve data set, is removed. Surprisingly by removing this data set the product combination rule still works very well. The mean combination rule deteriorates the performance of the classi"er on the Karhunen}Loe`ve data set. The best performance is now about 2.5% error. These examples show that the classi"ers on the independent data sets classify a large fraction of the objects correctly. This gives a stable behaviour when one classi"er is removed. It also shows that the independent views of the di!erent classi"ers can contribute signi"cantly to correct the output for some of the more di$cult objects.
Fig. 7. Performance of the combining rules and the individual Gaussian linear classifiers on three data sets. In the left figure on the data sets Karhunen-Loève, Pixel and Zernike; in the right figure on the data sets Fourier, Pixel and Zernike.
Especially the product combination rule gains from these independent classifications. The mean combination rule, on the other hand, averages these contributions, which can lead to some performance loss; this is especially visible in Fig. 7(b). We can conclude that by using all available information, i.e. all available data sets with a product combination rule, the optimal classification performance is achieved. This experiment was repeated using another classifier, the Fisher linear discriminant (see Fig. 8). We see that here the estimates of the posterior probabilities are worse than in the case of the Gaussian linear classifiers. The combination rules are still useful, but the performance improvement is much lower. Especially in cases where one or two classifiers perform very badly, here for training set sizes smaller than 40, outliers deteriorate the outcome of the combination rules. Both combination rules achieve about the same classification performance, with an error of about 3.4%. In Fig. 9 experiments with quadratic classifiers are shown. The quadratic classifier is regularized a bit (by adding 0.2 times the identity matrix to the covariance matrix) to make inversion of the covariance matrix possible. When we use a sufficient number of training patterns, more than 40 per class, the individual performances on the Pixel and Karhunen-Loève data sets improve with respect to the linear Gaussian classifiers, while the performance on the Fourier data set deteriorates. Combining gives a serious improvement, which is an indication of improved probability estimates. The lowest classification error, for both the mean and the product combination rule, becomes 1.8%. In Fig. 10 combinations of multi-layer perceptron classifiers are shown. For each data set an 8-hidden-unit network is trained and the network outputs are combined.
Fig. 9. Performance of the individual quadratic classifiers and the combining rules.
Fig. 10. Performance of the individual multi-layer perceptrons (eight hidden units) and the combining rules. Standard deviations on the MLPs are about 5%; only on the Pixel data it is about 20%.
Fig. 8. Performance of the individual Fisher linear classifiers and the combining rules.
This is done 5 times. The network outputs are very noisy; especially the network on the Pixel data can have extremely bad performance. Standard deviations on the graphs are 20% for the Pixel data and 5% for the other data sets. This results in a bad performance for the product combination rule; only the more noise-robust mean combination rule improves upon the individual classifiers, with a best performance of 5.0% error. In Fig. 11 the performance of multi-layer perceptrons with 20 hidden units is shown (averaged over 10 runs). Here severe overtraining occurs and the product combination rule breaks down; only the mean combination rule is robust enough to give reasonable results.
Fig. 11. Performance of the individual multi-layer perceptrons (20 hidden units) and the combining rules. Standard deviations are about 20%.
To check the results of Section 4.1, the combination rules are also applied to a two-class problem: the distinction between the classes 6 and 9. In Fig. 12 the performance of the Gaussian linear classifiers and the combination rules is shown for this two-class problem. In the left figure we see that the classifier on the Karhunen-Loève data set performs very well, with a minimum error of 1.0%. On the other hand, the performance on the Zernike data set is as bad as 50%. This is caused by the fact that the Zernike moments are rotation invariant, so no differences between 6 and 9 can be found. The Pixel and Fourier data sets also show bad classification performance. For these last three classifiers poor posterior probabilities are expected, and thus the expansion in Eq. (8) will not hold.
This is confirmed by the performance of the combination rules: the two rules behave completely differently. The mean combination rule gives reasonable results over the complete range of training set sizes, but is far worse than the best classifier. The product combination rule encounters the problem that the posterior probabilities are estimated very badly, and for both classes a probability of zero is obtained. When all classes are given a probability of zero, the final output probabilities cannot be normalized and the product combination rule does not give an outcome. Only for a large number of training objects per class, 90 and 100 objects per class, does the product rule obtain classifications. This is shown in the right figure, which is an enlarged version of the lower part of the left figure. Fig. 13 shows the results when Fisher linear discriminants are used for the same two-class problem, the 6 versus the 9. Again the Zernike data set does not provide useful information, but the classifiers on the Karhunen-Loève and the Pixel data sets work well. For smaller training sets the probability estimates are not very accurate and the combination rules (especially the product combination rule) do not improve classification. For larger training set sizes both combination rules converge to a classification error of 0.5%, which is the same as the individual classifiers on the Pixel and Karhunen-Loève data sets. This confirms that with sufficiently accurate posterior probability estimates in a two-class problem, both combination rules reach the same classification.
6. Conclusions

The main goal of this paper was to investigate the relative merits of simple averaging over classifier outputs and multiplying the outputs. Although taking the
Fig. 12. Performance of the individual Gaussian linear classifiers and the combining rules on the classes 6 and 9.
Fig. 13. Performance of the individual Fisher linear classifiers and the combining rules on the classes 6 and 9.
average is easy to perform and often results in good classification performance, this rule is not based on a solid Bayesian foundation. Under the assumption of independent feature spaces, using the Bayes rule results in a product combination rule. We showed that in the case of a two-class problem in which the posterior probabilities are well estimated (without a large number of extreme posterior probability estimates of one and zero), the mean combination rule and the product combination rule perform the same classification. Also when the rejection of objects with low classification confidence is allowed, the mean and product combination rules do not differ significantly; only in the case of larger estimation errors does the product combination rule deteriorate with respect to the mean combination rule. When the classification problem involves more than two classes, differences between the combination rules start to appear. Combining classifiers which are trained in independent feature spaces results in improved performance for the product rule, while in completely dependent feature spaces the performance is the same. When the rejection option is allowed, this holds for moderate rejection rates. We can conclude that averaging estimated posterior probabilities is to be preferred when the posterior probabilities are not well estimated. Only in the case of problems involving multiple classes with good estimates of the posterior class probabilities does the product combination rule outperform the mean combination rule.
Acknowledgements

This work was partly supported by the Foundation for Applied Sciences (STW), the Foundation for Computer Science in the Netherlands (SION) and the Dutch Organization for Scientific Research (NWO).
References

[1] A. Sharkey, N. Sharkey, How to improve the reliability of artificial neural networks, Technical Report CS-95-11, Department of Computer Science, University of Sheffield, 1995.
[2] L. Xu, A. Krzyzak, C.Y. Suen, Methods of combining multiple classifiers and their applications to handwriting recognition, IEEE Trans. Systems, Man Cybernet. 22 (3) (1992) 418-435.
[3] R. Battiti, A.M. Colla, Democracy in neural nets: voting schemes for classification, Neural Networks 7 (4) (1994) 691-707.
[4] K. Tumer, J. Ghosh, Order statistics combiners for neural classifiers, in: Proceedings of the World Congress on Neural Networks, INNS Press, Washington DC, 1995, pp. I:31-34.
[5] R. Jacobs, Methods for combining experts' probability assessments, Neural Comput. 7 (5) (1995) 867-888.
[6] G. Rogova, Combining the results of several neural network classifiers, Neural Networks 7 (5) (1994) 777-781.
[7] S. Hashem, Optimal linear combinations of neural networks, Neural Networks (1994).
[8] M. Taniguchi, V. Tresp, Averaging regularized estimators, Neural Comput. 9 (1997) 1163-1178.
[9] J.A. Benediktsson, P.H. Swain, Consensus theoretic classification methods, IEEE Trans. Systems, Man Cybernet. 22 (4) (1992) 688-704.
[10] J. Kittler, M. Hatef, R.P.W. Duin, Combining classifiers, in: Proceedings of ICPR'96, 1996, pp. 897-901.
[11] J. Kittler, A. Hojjatoleslami, T. Windeatt, Weighting factors in multiple expert fusion, in: A.F. Clark (Ed.), Proceedings of the Eighth British Machine Vision Conference, University of Essex Printing Service, 1997, pp. 41-50.
[12] D.M.J. Tax, R.P.W. Duin, M. van Breukelen, Comparison between product and mean classifier combination rules, in: P. Pudil, J. Novovicova, J. Grim (Eds.), First International Workshop on Statistical Techniques in Pattern Recognition, Institute of Information Theory and Automation, June 1997, pp. 165-170.
[13] D.W. Ruck, S.K. Rogers, M. Kabrisky, M.E. Oxley, B.W. Suter, The multilayer perceptron as an approximation to a Bayes optimal discriminant function, IEEE Trans. Neural Networks 1 (4) (1990) 296-298.
[14] C.M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, 1995.
[15] S. Raudys, R.P.W. Duin, Expected classification error of the Fisher linear classifier with pseudo-inverse covariance matrix, Pattern Recognition Lett. 19 (5-6) (1998) 385-392.
[16] S. Geman, E. Bienenstock, R. Doursat, Neural networks and the bias/variance dilemma, Neural Comput. 4 (1992) 1-58.
[17] M. van Breukelen, R.P.W. Duin, D.M.J. Tax, Combining classifiers for the recognition of handwritten digits, in: P. Pudil, J. Novovicova, J. Grim (Eds.), First International Workshop on Statistical Techniques in Pattern Recognition, Institute of Information Theory and Automation, June 1997, pp. 13-18.
About the Author: DAVID M.J. TAX received the M.Sc. degree in Physics from the University of Nijmegen, The Netherlands, in 1996. Currently he works in the Pattern Recognition group at the Delft University of Technology. His research interests include pattern recognition, with a focus on neural networks and support vector machines.

About the Author: MARTIJN VAN BREUKELEN received the M.Sc. degree in Applied Physics from the Delft University of Technology in the Netherlands in 1998. Currently he works at the Institute of Applied Physics of the Netherlands Organization for Applied Scientific Research (TNO). His professional interests include the design and development of decision support systems and machine intelligence.

About the Author: ROBERT P.W. DUIN studied Applied Physics at Delft University of Technology in the Netherlands. In 1978 he received the Ph.D. degree for a thesis on the accuracy of statistical pattern recognizers. His research has included various aspects of the automatic interpretation of measurements, learning systems and classifiers. Between 1980 and 1990 he developed and studied hardware architectures and software configurations for interactive image analysis. At present he is an associate professor in the Faculty of Applied Sciences of Delft University of Technology. His main research interest is in the design and evaluation of learning algorithms for pattern recognition applications. This includes, in particular, neural network classifiers, support vector classifiers and classifier combining strategies.

About the Author: J. KITTLER graduated from the University of Cambridge in Electrical Engineering in 1971, where he also obtained his Ph.D. in Pattern Recognition in 1974 and the Sc.D. degree in 1991. He joined the Department of Electronic and Electrical Engineering of Surrey University in 1986, where he is a Professor in charge of the Centre for Vision, Speech and Signal Processing. He has worked on various theoretical aspects of pattern recognition and on many applications including automatic inspection, ECG diagnosis, remote sensing, robotics, speech recognition and document processing. His current research interests include pattern recognition, image processing and computer vision. He has co-authored the book "Pattern Recognition: A Statistical Approach", published by Prentice-Hall, and has published more than 300 papers. He is a member of the editorial boards of Pattern Recognition, Image and Vision Computing, Pattern Recognition Letters, Pattern Recognition and Artificial Intelligence, and Machine Vision and Applications.
Pattern Recognition 33 (2000) 1487-1496
On links between mathematical morphology and rough sets

Isabelle Bloch*

Ecole Nationale Supérieure des Télécommunications, Département TSI - CNRS URA 820, 46 rue Barrault, 75013 Paris, France

Received 28 September 1998; accepted 10 May 1999
Abstract

Based on the observation that rough sets and mathematical morphology both use dual operators sharing similar properties, we investigate more closely the links existing between the two domains. We establish the equivalence between some morphological operators and rough sets defined from either a relation, a pair of dual operators, or a neighborhood system. Then we suggest some extensions using morphological thinning and thickening, and using algebraic operators. We propose to define rough functions and fuzzy rough sets using mathematical morphology on functions and fuzzy mathematical morphology. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Mathematical morphology; Rough sets; Approximation spaces; Topology; Similarity relations; Fuzzy sets; Fuzzy mathematical morphology; Rough functions; Fuzzy rough sets
1. Introduction

Rough set theory was introduced in 1982 [1] as an extension of set theory, mainly in the domain of intelligent systems. The objective was to deal with incomplete information, leading to the idea of indistinguishability of objects in a set. It is therefore related to the concepts of approximation and of granularity of information (in the sense of Zadeh [2]). This theory has been applied successfully in several areas, e.g. information analysis, data analysis and data mining, and knowledge discovery (for instance, discovery of which features are relevant for data description), i.e. all those applications in which a need arises for intelligent decision support. Mathematical morphology is originally also based on set theory. It was introduced in 1964 by Matheron [3,4] in order to study porous media, but the theory evolved rapidly into a general theory of shape and its transformations, and was applied in particular to image processing and pattern recognition [5]. In addition to its set-theoretical foundations, it relies on topology on sets, on
* Tel.: +33-1-45-81-75-85; fax: +33-1-45-81-37-94. E-mail address: [email protected] (I. Bloch).
random sets, on topological algebra, on integral geometry, and on lattice theory. The basic idea in mathematical morphology is to study shapes by transforming them through some interaction with a set called the structuring element, which is chosen by the user (the observer). Rough set theory [1] is an extension of set theory for dealing with coarse information. In this framework, a set $X$ is approximated by two sets, called the upper and lower approximations and denoted by $\bar{A}(X)$ and $\underline{A}(X)$, such that $\underline{A}(X) \subseteq X \subseteq \bar{A}(X)$. On the other hand, mathematical morphology [5,6] provides operators that are either extensive or anti-extensive, such as dilation $D_B$ and erosion $E_B$ (if the origin of the space belongs to the chosen structuring element $B$), or closing $C_B$ and opening $O_B$. We have $E_B(X) \subseteq X \subseteq D_B(X)$ and $O_B(X) \subseteq X \subseteq C_B(X)$, i.e. relations similar to the one for rough sets. One of the basic properties of upper and lower set approximations is duality. A similar property holds in mathematical morphology between dilation and erosion, and between opening and closing; in fact, most morphological operators come in pairs of dual operators. Based on these elementary observations, it is tempting to look for closer links between the two domains. To our knowledge, the only work that puts both domains together is that of Polkowski [7], where a hit-or-miss topology is defined on rough sets, similar to what is used
in mathematical morphology. Here we take another point of view and try to link lower and upper approximations directly to morphological operators. To our knowledge, it is the first time that such links are established. We first recall in Section 2 the basic definitions of rough sets, in particular those based on a similarity relation, and of mathematical morphology, in particular its four basic operators. Then we compare both theories in light of a list of properties that are commonly used in rough set theory (Section 3). We then establish in Section 4 formal links between upper and lower approximations on the one hand, and dilation and erosion (respectively, opening and closing) on the other hand. In Section 5 we take a closer look at some topological aspects. We then propose in Section 6 some extensions of these links, using other operators like thinning and thickening, or algebraic operators. We also extend this work to functions and to fuzzy sets, and show how mathematical morphology on functions and on fuzzy sets can be used to define rough functions and rough fuzzy sets. This brings together three different aspects of the information: vagueness (through fuzzy sets), coarseness (through rough sets) and shape (through mathematical morphology). Finally, we conclude with some insights on the respective contributions of each domain to the other that can be anticipated from this work.
2. Definitions of rough sets and basic morphological operators

2.1. Rough sets from relations

In rough set theory [1], the two sets $\bar{A}(X)$ and $\underline{A}(X)$ such that $\underline{A}(X) \subseteq X \subseteq \bar{A}(X)$ are defined from an equivalence relation. Let $U$ denote the universe of discourse, $X$ being a subset of $U$. Each element of $U$ is known through its attributes $a$. The set of attributes $A$ is a set of functions defined on $U$. Let $\mathrm{Inf}(x)$ be the information vector of $x$:

$$\mathrm{Inf}(x) = \{a(x) \mid a \in A\}. \qquad (1)$$

An equivalence relation $R_A$ is defined with respect to the set of attributes on $U$ as

$$x R_A y \iff \mathrm{Inf}(x) = \mathrm{Inf}(y). \qquad (2)$$

This relation characterizes the elements that are indistinguishable from each other based on the available information. The pair $(U, R_A)$ is called an approximation space. Let $[x]_{R_A}$ denote the class of $x$. Then lower and upper approximations of a subset $X$ of $U$ are defined as

$$\underline{A}(X) = \{x \in U \mid [x]_{R_A} \subseteq X\}, \qquad (3)$$

$$\bar{A}(X) = \{x \in U \mid [x]_{R_A} \cap X \neq \emptyset\}. \qquad (4)$$

A rough set is the pair $(\underline{A}(X), \bar{A}(X))$. Obviously, we have

$$\underline{A}(X) \subseteq X \subseteq \bar{A}(X). \qquad (5)$$

The lower approximation of $X$ contains the elements $x$ such that all the elements that are indistinguishable from $x$ (according to the considered attributes) are in $X$. The upper approximation of $X$ contains the elements $x$ such that at least one element that is indistinguishable from $x$ belongs to $X$. This definition can be extended to any relation $R$, leading to the notion of generalized approximation space (see e.g. Ref. [8]). Let $r(x)$ be the set defined as

$$r(x) = \{y \in U \mid x R y\}. \qquad (6)$$

The lower and upper approximations of $X$ according to $R$ are then defined as

$$\underline{R}(X) = \{x \in U \mid r(x) \subseteq X\}, \qquad (7)$$

$$\bar{R}(X) = \{x \in U \mid r(x) \cap X \neq \emptyset\}. \qquad (8)$$

Conversely, $r(x)$ can be obtained from the upper approximation of $X$ as

$$r(x) = \{y \in U \mid x \in \bar{R}(\{y\})\}. \qquad (9)$$

Obviously, if $R$ is an equivalence relation, $r(x) = [x]_R$ and these definitions are equivalent to the original Pawlak definitions. If $R$ is a tolerance relation (i.e. reflexive and symmetrical), these equations define tolerance rough sets. The properties of $\bar{R}(X)$ and $\underline{R}(X)$ depend on the properties of $R$, as will be seen in Section 3.

2.2. Mathematical morphology: basic operators

Mathematical morphology is basically a set theory [5] that has extensions to functions [5], vectors and fuzzy sets [9]. We just recall here the definitions of the four basic operations for sets. Let $X$ be a subset of $U$, and $B$ a set called the structuring element. The morphological dilation of $X$ by $B$ is defined as

$$D_B(X) = \{x \in U \mid B_x \cap X \neq \emptyset\}, \qquad (10)$$

where $B_x$ denotes the translation of the structuring element at point $x$. The morphological erosion of $X$ by $B$ is defined as

$$E_B(X) = \{x \in U \mid B_x \subseteq X\}. \qquad (11)$$

Morphological opening and closing are defined, respectively, by

$$O_B(X) = D_{\check{B}}[E_B(X)], \qquad (12)$$

$$C_B(X) = E_{\check{B}}[D_B(X)], \qquad (13)$$
where $\check{B}$ denotes the symmetrical of $B$ with respect to the origin of the space. Opening and closing can be rewritten as

$$O_B(X) = \{x \in U \mid \exists y \in U,\ x \in B_y \text{ and } B_y \subseteq X\}, \qquad (14)$$

$$C_B(X) = \{x \in U \mid \forall y \in U,\ x \in B_y \Rightarrow B_y \cap X \neq \emptyset\}. \qquad (15)$$

For any structuring element that contains the origin of the space, the following property holds:

$$E_B(X) \subseteq X \subseteq D_B(X). \qquad (16)$$

For any structuring element (without restriction), the following property holds:

$$O_B(X) \subseteq X \subseteq C_B(X). \qquad (17)$$
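These definitions can be exercised on a toy discrete universe. The sketch below is ours (an arbitrary 8×8 grid, a square object and a cross-shaped structuring element that contains the origin and is symmetrical); it checks the inclusion properties (16) and (17) directly from the set definitions.

```python
# U: a small discrete universe; X: an object; B: structuring element
U = {(i, j) for i in range(8) for j in range(8)}
X = {(i, j) for i in range(2, 6) for j in range(2, 6)}
B = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}   # contains the origin

def translate(B, x):                 # the translated structuring element B_x
    return {(x[0] + bx, x[1] + by) for (bx, by) in B}

def dilation(X, B):                  # Eq. (10): x such that B_x hits X
    return {x for x in U if translate(B, x) & X}

def erosion(X, B):                   # Eq. (11): x such that B_x fits in X
    return {x for x in U if translate(B, x) <= X}

def opening(X, B):                   # Eq. (12) (B symmetric here, so B-check = B)
    return dilation(erosion(X, B), B)

def closing(X, B):                   # Eq. (13)
    return erosion(dilation(X, B), B)

assert erosion(X, B) <= X <= dilation(X, B)     # Eq. (16)
assert opening(X, B) <= X <= closing(X, B)      # Eq. (17)
```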
In the following, we assume that the origin of $U$ belongs to the structuring element $B$.

2.3. First conclusion

A first conclusion that can be drawn from these definitions is the similarity between the operators involved in the two domains. Lower approximations involve subsethood, as do erosion and opening, while upper approximations involve set intersection, as do dilation and closing. Moreover, the inclusion properties are similar. These remarks lead to a first parallel between lower approximation and erosion or opening on the one hand, and between upper approximation and dilation or closing on the other hand. Further similarities call for a closer look at the properties satisfied by the operators in both domains.
3. Comparison of basic properties

3.1. A list of properties of interest

In this section, we list the properties that are of interest in the theory of rough sets. We follow here the presentation provided in Ref. [8]. The satisfaction of these properties, depending on the chosen definition, is detailed in the next subsection.

L1. $\underline{R}(X) = [\bar{R}(X^c)]^c$, where $X^c$ denotes the complement of $X$ in $U$.
L2. $\underline{R}(U) = U$.
L3. $\underline{R}(X \cap Y) = \underline{R}(X) \cap \underline{R}(Y)$.
L4. $\underline{R}(X \cup Y) \supseteq \underline{R}(X) \cup \underline{R}(Y)$.
L5. $X \subseteq Y \Rightarrow \underline{R}(X) \subseteq \underline{R}(Y)$.
L6. $\underline{R}(\emptyset) = \emptyset$.
L7. $\underline{R}(X) \subseteq X$.
L8. $X \subseteq \underline{R}(\bar{R}(X))$.
L9. $\underline{R}(X) \subseteq \underline{R}(\underline{R}(X))$.
L10. $\bar{R}(X) \subseteq \underline{R}(\bar{R}(X))$.
U1. $\bar{R}(X) = [\underline{R}(X^c)]^c$.
U2. $\bar{R}(\emptyset) = \emptyset$.
U3. $\bar{R}(X \cup Y) = \bar{R}(X) \cup \bar{R}(Y)$.
U4. $\bar{R}(X \cap Y) \subseteq \bar{R}(X) \cap \bar{R}(Y)$.
U5. $X \subseteq Y \Rightarrow \bar{R}(X) \subseteq \bar{R}(Y)$.
U6. $\bar{R}(U) = U$.
U7. $X \subseteq \bar{R}(X)$.
U8. $\bar{R}(\underline{R}(X)) \subseteq X$.
U9. $\bar{R}(\bar{R}(X)) \subseteq \bar{R}(X)$.
U10. $\bar{R}(\underline{R}(X)) \subseteq \underline{R}(X)$.
K. $\underline{R}(X^c \cup Y) \subseteq [\underline{R}(X)]^c \cup \underline{R}(Y)$.
LU. $\underline{R}(X) \subseteq \bar{R}(X)$.

Properties L1 and U1 express the duality between lower and upper approximations; these properties allow relations U1-U10 to be derived from relations L1-L10. Properties L2, L6, U2 and U6 express limit conditions for the empty set and the whole space. Compatibility with union and intersection is expressed by L3, L4, U3 and U4. Properties L5 and U5 express increasingness with respect to set inclusion. The basic notions of lower and upper approximation appear in properties L7, U7 and LU. Properties L8-L10 and U8-U10 concern the composition of approximations. Note that if L7 and L9 are simultaneously satisfied, we have, due to L5, $\underline{R}(X) = \underline{R}(\underline{R}(X))$, i.e. the lower approximation is idempotent. In the same way, if U7 and U9 are simultaneously satisfied, then the upper approximation is idempotent. If L7 and L10 are simultaneously satisfied, then we have $\underline{R}(X) = \underline{R}(\bar{R}(X))$, and a similar expression holds for the upper approximation.

3.2. Which properties do rough sets and mathematical morphology have in common?

In Table 1 we compare the properties that are satisfied by the different definitions of rough sets with those satisfied by the four basic morphological operators.

3.3. Second conclusion

From the results in Table 1, it appears clearly that lower approximations share many properties with erosion and with opening, while upper approximations share many properties with dilation and closing. These algebraic properties make rough set algebra similar to mathematical morphology algebra. Having made these observations, we can now establish formal links between set approximations and morphological operators.
Table 1. Comparison between the properties of rough sets, depending on the properties of R, with those of the mathematical morphology operators. A cross (×) indicates that the property is satisfied. The first column contains the properties, according to the notations of Section 3.1; since the L- and U-properties are exchanged by duality (L1, U1), each row covers a dual pair. The next four columns are for rough sets defined from any relation, a tolerance relation, a relation that is reflexive and transitive, and an equivalence relation, respectively. The two last columns are for the morphological operators: erosion and opening for the L-properties (lower approximation), dilation and closing for the U-properties (upper approximation).

Property    Any R   Tolerance   Reflexive+trans.   Equivalence   Erosion/Dilation   Opening/Closing
L1 / U1       ×         ×              ×                ×               ×                  ×
L2 / U2       ×         ×              ×                ×               ×                  ×
L3 / U3       ×         ×              ×                ×               ×             inclusion only
L4 / U4       ×         ×              ×                ×               ×                  ×
L5 / U5       ×         ×              ×                ×               ×                  ×
L6 / U6                 ×              ×                ×               ×                  ×
L7 / U7                 ×              ×                ×               ×                  ×
L8 / U8                 ×                               ×               ×
L9 / U9                                ×                ×                                  ×
L10 / U10                                               ×
K             ×         ×              ×                ×               ×
LU                      ×              ×                ×               ×                  ×
4. Formal links between rough sets and mathematical morphology
Lower and upper approximations can be obtained from erosion and dilation. For a given structuring element $B$, the corresponding relation is defined as

$$x R y \iff y \in B_x. \qquad (18)$$

From $R$, we derive $r(x)$ as

$$\forall x \in U,\quad r(x) = \{y \in U \mid y \in B_x\} = B_x. \qquad (19)$$

We always assume that the origin of $U$ belongs to the structuring element $B$. It follows that

$$\forall x \in U,\quad x \in B_x, \qquad (20)$$

and therefore

$$\forall x \in U,\quad x R x, \qquad (21)$$

i.e. $R$ is reflexive. Moreover, if $B$ is symmetrical (i.e. $B = \check{B}$), we have, for all $(x, y) \in U^2$,

$$x R y \iff y \in B_x \qquad (22)$$
$$\iff y - x \in B \qquad (23)$$
$$\iff x - y \in \check{B}\ (= B) \qquad (24)$$
$$\iff x \in B_y \qquad (25)$$
$$\iff y R x, \qquad (26)$$

which proves that $R$ is symmetrical. It follows that $R$ is a tolerance relation. Let us show that for this relation, erosion and lower approximation coincide:

$$\forall X \subseteq U,\quad \underline{R}(X) = \{x \in U \mid r(x) \subseteq X\} \qquad (27)$$
$$= \{x \in U \mid B_x \subseteq X\} \qquad (28)$$
$$= E_B(X). \qquad (29)$$

In a similar way, dilation and upper approximation coincide, since we have

$$\forall X \subseteq U,\quad \bar{R}(X) = \{x \in U \mid r(x) \cap X \neq \emptyset\} \qquad (30)$$
$$= \{x \in U \mid B_x \cap X \neq \emptyset\} \qquad (31)$$
$$= D_B(X). \qquad (32)$$
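The coincidence proved above is easy to verify computationally. Reusing the same kind of toy universe, the following sketch (ours) builds the relation of Eq. (18), computes the approximations of Eqs. (7) and (8), and checks Eqs. (27)-(32).

```python
# Toy universe, object and symmetric structuring element containing the origin
U = {(i, j) for i in range(8) for j in range(8)}
X = {(i, j) for i in range(2, 6) for j in range(2, 6)}
B = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}

def translate(B, x):
    return {(x[0] + bx, x[1] + by) for (bx, by) in B}

r = {x: translate(B, x) for x in U}           # r(x) = B_x, Eq. (19)

R_lower = {x for x in U if r[x] <= X}         # Eq. (7)
R_upper = {x for x in U if r[x] & X}          # Eq. (8)

E = {x for x in U if translate(B, x) <= X}    # erosion, Eq. (11)
D = {x for x in U if translate(B, x) & X}     # dilation, Eq. (10)

assert R_lower == E                           # Eqs. (27)-(29)
assert R_upper == D                           # Eqs. (30)-(32)
# R is reflexive ((0,0) is in B) and symmetric (B = -B): a tolerance relation.
```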
These results are confirmed by the properties shown in Table 1, which are the same for lower and upper approximations derived from a tolerance relation as for erosion and dilation. These equivalences are in accordance with the operator-oriented view of rough sets [10,8]. Let $L$ and $H$ be two dual operators, such that

$$\forall X \subseteq U,\quad L(X) = [H(X^c)]^c, \qquad (33)$$

and satisfying the following properties:

(1) $H(\emptyset) = \emptyset$;
(2) $H$ commutes with union: $\forall X \subseteq U$, $\forall Y \subseteq U$, $H(X \cup Y) = H(X) \cup H(Y)$.

Then there exists a relation $R$ such that $L(X) = \underline{R}(X)$ and $H(X) = \bar{R}(X)$. This relation is defined by

$$x R y \iff x \in H(\{y\}). \qquad (34)$$

The results proved in this section provide concrete examples of operators $L$ and $H$, namely morphological erosions and dilations. Actually, a whole family of operators is obtained, indexed by the structuring element. The derived relation $R$ is exactly the one introduced in Eq. (18), since we have, for a symmetrical structuring element $B$,

$$x R y \iff x \in H(\{y\}) \iff x \in D_B(\{y\}) \iff x \in B_y \iff y \in B_x. \qquad (35)$$

Let us now consider opening and closing. Taking the operator-oriented point of view, they can be used, respectively, as lower and upper approximations, since they have most of the required properties, as shown in Table 1. However, since closing does not commute in general with union (only an inclusion holds for property U3), the direct derivation of $R$ as in Eq. (34) cannot be applied. Even if it is not as obvious as for dilation and erosion to find an expression for opening and closing based on a relation, these operators have interesting properties that make them good operators for constructing rough sets. In particular, they are idempotent, which is particularly useful if we take the topology-oriented point of view. This is the scope of the next section.
5. Topological aspects

Important notions in topology are interior and closure operators. More local information is given by the notion of neighborhood. These two aspects are dealt with in the two parts of this section.

5.1. Topology and pre-topology

The idea is that lower and upper approximations can be interpreted as interior and closure. Morphological operators receive similar interpretations. Let us consider again two dual operators $L$ and $H$, but satisfying some more axioms:

(1) $H(\emptyset) = \emptyset$;
(2) $H$ commutes with union: $\forall X \subseteq U$, $\forall Y \subseteq U$, $H(X \cup Y) = H(X) \cup H(Y)$;
(3) $H$ is extensive: $\forall X \subseteq U$, $X \subseteq H(X)$;
(4) $H$ is idempotent: $\forall X \subseteq U$, $H(H(X)) = H(X)$;
(5) $\forall X \subseteq U$, $X \subseteq L(H(X))$.

If properties (1)-(4) are satisfied, then the relation $R$ derived from $H$ using Eq. (34) is reflexive and transitive, and this defines a topological approximation space. Indeed, properties (1)-(4) are the properties of a common closure operator. Except for property (2), for which we generally have only an inclusion, properties (1)-(4) are also satisfied by closing (and by the dual operator, opening). Therefore, these morphological operators define a topological approximation space. If properties (1)-(5) are satisfied, then $R$ is an equivalence relation. Property (5) is in general not satisfied by opening and closing. If the set of considered objects is restricted to objects that are open with respect to $B$ (i.e. they do not contain details smaller than $B$), then property (5) holds; however, in such a case the lower approximation does not modify the set. Let us now consider erosion and dilation. They do not satisfy property (4), but satisfy all the others. The loss of idempotence for the closure operator corresponds to a pre-topology [11]. Therefore, using erosion and dilation introduces the notion of a pre-topological approximation space. This may be of interest for pattern recognition purposes, since a non-idempotent closure allows patterns to be aggregated using iterated closure operations. The basic topology on sets in mathematical morphology is the hit-or-miss topology, which is based on the intersection of a closed set with some open sets (the "hit" part) and on the non-intersection of a closed set with some compact sets (the "miss" part) [5]. It appears that the relations defining this topology are the same as the ones defining lower and upper approximations. This leads, as shown in Ref. [7], to a construction of the hit-or-miss topology on rough sets.
5.2. Neighborhood systems

Let us now consider a topology defined through a neighborhood system. Let $n(x)$ be a neighborhood of $x$ and $N(x)$ be a neighborhood system for $x$. Lower and upper approximations are then defined as [12]:

$$\underline{N}(X) = \{x \in U \mid \exists n(x) \in N(x),\ n(x) \subseteq X\}, \qquad (36)$$

$$\bar{N}(X) = \{x \in U \mid \forall n(x) \in N(x),\ n(x) \cap X \neq \emptyset\}. \qquad (37)$$
The definitions presented in Section 2 correspond to the case where only one neighborhood is considered, i.e. $N(x) = \{n(x)\}$. The analogy with mathematical morphology is straightforward if we consider that the structuring element translated at a point $x$ of $U$ is nothing but a neighborhood of $x$. If we set $N(x) = \{B_x\}$, we obtain

$$\underline{N}(X) = E_B(X), \qquad (38)$$

$$\bar{N}(X) = D_B(X). \qquad (39)$$

Moreover, if we consider a family of structuring elements $B^1, \ldots, B^k$, and if we set $N(x) = \{B^1_x, \ldots, B^k_x\}$, the $B^i_x$ being considered as different neighborhoods of $x$ whose union builds the neighborhood system, we obtain

$$\underline{N}(X) = \bigcup_{i=1}^{k} E_{B^i}(X), \qquad (40)$$

$$\bar{N}(X) = \bigcap_{i=1}^{k} D_{B^i}(X). \qquad (41)$$

Let us now consider opening and closing. Similar relations are obtained by setting this time $N(x) = \{B_y \mid y \in U \text{ and } x \in B_y\}$. Then we obtain

$$\underline{N}(X) = O_B(X), \qquad (42)$$

$$\bar{N}(X) = C_B(X). \qquad (43)$$

The proof of these results comes from writing an opening as

$$O_B(X) = \{x \in U \mid \exists y \in U,\ x \in B_y \text{ and } B_y \subseteq X\}. \qquad (44)$$

As for erosion and dilation, we can consider a family of structuring elements for opening and closing. This view is particularly interesting for shape recognition, since in morphological recognition an object often has to be tested or matched against a set of patterns, like directional structuring elements. This set of patterns is interpreted as a neighborhood system.
The proof of these results comes from the writing of an opening as O (X)"+x3U"y3U"x3B and B LX,. (44) W W As for erosion and dilation, we can consider a family of structuring elements for opening and closing. This view is particularly interesting for shape recognition, since in morphological recognition, an object has often to be tested or matched with a set of patterns, like directional structuring elements. This set of patterns is interpreted as a neighborhood system.
6. Extensions In this section, we give some hints on possible extensions of the results we obtained in this paper. These extensions concern the choice of the dual operators and the objects on which they are applied. 6.1. Thinning and thickening Among the dual operators used in mathematical morphology, thinning and thickening are of particular interest, since they allow to perform operations depending on various local con"gurations. The main di!erence with erosion, dilation, opening and closing, is that the structuring element is not only tested against object points
(B LX, B 5XO, etc.), but it is also tested against V V background points (i.e. points of X!). Let us "rst recall the de"nitions of these operations (the reader may refer to Refs. [5,6] for more details). The structuring element is divided into two disjoint parts (¹ , ¹ ), where ¹ is tested against points of X, while ¹ is tested against points of X!. The hit-or-miss transformation is de"ned as HM¹ (X)"E (X)5E (X!). (45) 2 2 2 2 From this operation, thinning and thickening are de"ned as (X)"X!HM¹ (X), (46) 2 2 2 2 ¹hick (X)"X6HM¹ (X). (47) 2 2 2 2 Since ¹ 5¹ ", the origin of the space belongs either to ¹ or to ¹ . In the "rst case, the hit-or-miss trans formation provides a subset of X and it is meaningful to perform thinning. In the second case, the hit-or-miss transformation provides a subset of X! and it is meaningful to perform thickening. The duality that holds between thinning and thickening takes the following form: ¹hin
(X)"[¹hick (X!)]!. (48) 2 2 2 2 Therefore, the possible pairs that can be de"ned from these operators as lower and upper approximations can be of the following types:
¹hin
(1) (¹hin (X), ¹hick (X)) if the origin of U 2 2 2 2 belongs to ¹ , (2) (¹hin (X), ¹hick (X)) if the origin of U 2 2 2 2 belongs to ¹ , (3) (¹hin (X), X) if the origin of U belongs to¹ , 2 2 (4) (X, ¹hick (X)) if the origin of U belongs to ¹ . 2 2 In the following, we restrict our study to the "rst case, where the origin of U belongs to ¹ . The second case is similar. Taking the operator-oriented point of view, the rough sets that can be built from thinning and thickening according to the "rst pair are obtained for the following operators: , H"¹hick . (49) 2 2 2 2 Since thickening generally does not commute with union (except for particular structuring elements where it is equivalent to dilation), it is not possible to derive directly a relation according to which the rough sets are de"ned. Among the properties listed in Table 1, L1, L2, L6, L7, U1, U2, U6, U7 and LU are always satis"ed. This shows that several properties are lost, and therefore we call `generalized rough setsa the pairs obtained from thinning and thickening. The ones obtained using erosion and dilation by a structuring element B are particular cases, corresponding to ¹ de"ned as the family of structuring ¸"¹hin
elements containing the origin and at least another point of $B$, and $T_2 = B \setminus T_1$. Let us now consider the third and fourth possible pairs of approximations. They bring an original aspect, which is a kind of asymmetry between lower and upper approximations. Taking the operator-oriented point of view, we have for the third pair

$$L(X) = Thin_{(T_1, T_2)}(X),\quad H(X) = X, \qquad (50)$$

and for the fourth pair

$$L(X) = X,\quad H(X) = Thick_{(T_1, T_2)}(X). \qquad (51)$$

Here the duality is not directly between the $L$ and $H$ of the third pair, or between the $L$ and $H$ of the fourth pair, but between the $L$ of one pair and the $H$ of the other, since we have

$$L(X^c) = Thin_{(T_1, T_2)}(X^c) = [Thick_{(T_2, T_1)}(X)]^c = [H(X)]^c, \qquad (52)$$

$$L(X^c) = X^c = [H(X)]^c. \qquad (53)$$
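The hit-or-miss transformation and the derived thinning and thickening are easily sketched in the same toy setting; the isolated-point configuration used here anticipates the example discussed just below, and the grid and object are again arbitrary choices of ours.

```python
U = {(i, j) for i in range(8) for j in range(8)}
X = {(i, j) for i in range(2, 6) for j in range(2, 6)}

def translate(T, x):
    return {(x[0] + tx, x[1] + ty) for (tx, ty) in T}

def erosion(Y, T):
    return {x for x in U if translate(T, x) <= Y}

def hit_or_miss(Y, T1, T2):      # Eq. (45): E_T1(Y) intersected with E_T2(Y^c)
    return erosion(Y, T1) & erosion(U - Y, T2)

def thinning(Y, T1, T2):         # Eq. (46)
    return Y - hit_or_miss(Y, T1, T2)

def thickening(Y, T1, T2):       # Eq. (47)
    return Y | hit_or_miss(Y, T1, T2)

# T1 = {origin}, T2 = the four direct neighbours: the hit-or-miss
# transformation then selects points of the set isolated in the background.
T1 = {(0, 0)}
T2 = {(1, 0), (-1, 0), (0, 1), (0, -1)}

X_iso = X | {(6, 6)}                     # X plus one isolated point
print(thinning(X_iso, T1, T2) == X)      # True: the isolated point is removed
```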
The topological interpretation is particularly interesting when using thinning and thickening. Indeed, the pair $(T_1, T_2)$ defines a neighborhood around a point, describing which points of the neighborhood should belong to the set $X$ and which ones should belong to its complement. For instance, taking just the origin for $T_1$ and $B \setminus T_1$ for $T_2$ ($B$ being any structuring element, or neighborhood), the hit-or-miss transformation using $(T_1, T_2)$ selects the points of $X$ that are isolated in the background. The thinning by $(T_1, T_2)$ removes such points, leading to a lower approximation of $X$ that has no isolated points, while the thickening by $(T_2, T_1)$ fills in isolated points of the background, leading to an upper approximation that has no holes constituted by only one point. This shows that very fine operations can be obtained using these operators. Another interesting point is that these operations can be iterated by using families of structuring elements [5] (for instance, rotations of a generic structuring element). In this way, we can use for instance the skeleton as the lower approximation, the convex hull as the upper approximation, etc., which are useful tools in shape representation and recognition. Moreover, several thinnings and thickenings are homotopic operators, i.e. they deform shapes while preserving their homotopy. This leads to homotopic rough sets, which probably deserve a deeper study.

6.2. Algebraic rough sets using algebraic operations

Another possible extension may be derived from algebraic operators. Algebraic erosions and dilations are defined on complete lattices as operators that commute with intersection and union, respectively [5,6]. Therefore L3 and U3 are directly satisfied. Properties L2, L5, L6, U2, U5 and U6 are also satisfied by these operators. Note
that morphological erosions and dilations are particular cases of algebraic operators if translation invariance holds. Algebraic openings and closings are defined as increasing, anti-extensive (respectively extensive) and idempotent operators [5,6]. Therefore properties L5, L7, L9 and U5, U7, U9 are automatically satisfied. The use of algebraic erosion/dilation or opening/closing for defining lower and upper approximations leads to what we call "algebraic rough sets".

6.3. Rough functions

Since mathematical morphology also applies to functions [5,6], we can use the definitions of dilation, erosion, opening and closing on functions to define lower and upper approximations of functions. This seems to be a natural extension of rough sets. Let $f$ be a function defined on $U$, and let $B$ be a subset of $U$ (structuring element), which we take here to be symmetrical and to contain the origin of $U$. Using erosion and dilation, we define lower and upper approximations of $f$ as

$$\forall x \in U,\quad \underline{B}(f)(x) = E_B(f)(x) = \inf_{y \in B_x} f(y), \qquad (54)$$

$$\forall x \in U,\quad \bar{B}(f)(x) = D_B(f)(x) = \sup_{y \in B_x} f(y). \qquad (55)$$

Using opening and closing, we define lower and upper approximations of $f$ as

$$\forall x \in U,\quad \underline{B}(f)(x) = O_B(f)(x) = D_B[E_B(f)](x), \qquad (56)$$

$$\forall x \in U,\quad \bar{B}(f)(x) = C_B(f)(x) = E_B[D_B(f)](x). \qquad (57)$$
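A one-dimensional flat sketch of these rough functions (ours; the window is a symmetric interval and the boundary handling by truncation is our choice) shows the inclusion $\underline{B}(f) \le f \le \bar{B}(f)$:

```python
import numpy as np

def flat_erosion(f, half_width):
    """Eq. (54): lower approximation, inf of f over the window B_x."""
    n = len(f)
    return np.array([f[max(0, i - half_width):i + half_width + 1].min()
                     for i in range(n)])

def flat_dilation(f, half_width):
    """Eq. (55): upper approximation, sup of f over the window B_x."""
    n = len(f)
    return np.array([f[max(0, i - half_width):i + half_width + 1].max()
                     for i in range(n)])

f = np.array([0., 1., 3., 2., 5., 4., 1., 0.])
lo, up = flat_erosion(f, 1), flat_dilation(f, 1)
assert (lo <= f).all() and (f <= up).all()   # window contains x (origin in B)

# Opening / closing as compositions, Eqs. (56)-(57)
opened = flat_dilation(flat_erosion(f, 1), 1)
closed = flat_erosion(flat_dilation(f, 1), 1)
```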
The properties of these rough functions are direct transpositions of those of rough sets:

L1. $\underline{B}(f) = -[\bar{B}(-f)]$.
L2. $\underline{B}(f_c) = f_c$, where $f_c$ is any constant function.
L3. $\underline{B}(\min(f, g)) = \min[\underline{B}(f), \underline{B}(g)]$.
L4. $\underline{B}(\max(f, g)) \ge \max[\underline{B}(f), \underline{B}(g)]$.
L5. $f \le g \Rightarrow \underline{B}(f) \le \underline{B}(g)$.
L6. $\underline{B}(f_0) = f_0$, where $f_0$ is identically zero.
L7. $\underline{B}(f) \le f$.
L8. $f \le \underline{B}(\bar{B}(f))$.
L9. $\underline{B}(f) \le \underline{B}(\underline{B}(f))$.
L10. $\bar{B}(f) \le \underline{B}(\bar{B}(f))$.
U1. $\bar{B}(f) = -[\underline{B}(-f)]$.
U2. $\bar{B}(f_0) = f_0$.
U3. $\bar{B}(\max(f, g)) = \max[\bar{B}(f), \bar{B}(g)]$.
U4. $\bar{B}(\min(f, g)) \le \min[\bar{B}(f), \bar{B}(g)]$.
U5. $f \le g \Rightarrow \bar{B}(f) \le \bar{B}(g)$.
U6. $\bar{B}(f_c) = f_c$.
U7. $f \le \bar{B}(f)$.
U8. $\bar{B}(\underline{B}(f)) \le f$.
U9. $\bar{B}(\bar{B}(f)) \le \bar{B}(f)$.
U10. $\bar{B}(\underline{B}(f)) \le \underline{B}(f)$.
K. $\underline{B}(\max(-f, g)) \le \max[-\underline{B}(f), \underline{B}(g)]$.
LU. $\underline{B}(f) \le \bar{B}(f)$.
Using erosion and dilation, properties L1-L8, U1-U8, K and LU hold, as in the case of sets. Using opening and closing, we usually have an inequality only for L3 and U3; properties L8, L10, U8 and U10 are generally not satisfied, but L9 and U9 are always satisfied. This construction can be further extended using a function $g$ as structuring element:

$$\forall x \in U,\quad \underline{g}(f)(x) = E_g(f)(x) = \inf\{f(y) - g(y - x),\ y \in U\}, \qquad (58)$$

$$\forall x \in U,\quad \bar{g}(f)(x) = D_g(f)(x) = \sup\{f(y) + g(y - x),\ y \in U\}. \qquad (59)$$

Similar properties are obtained.

6.4. Fuzzy rough sets

In Ref. [9], we defined the erosion and dilation of a fuzzy set $\mu$ by a fuzzy structuring element $\nu$ as follows:

$$\forall x \in U,\quad E_\nu(\mu)(x) = \inf_{y \in U} T[c(\nu(y - x)), \mu(y)], \qquad (60)$$

$$\forall x \in U,\quad D_\nu(\mu)(x) = \sup_{y \in U} t[\nu(y - x), \mu(y)], \qquad (61)$$

where $t$ is a t-norm (fuzzy intersection), $T$ a t-conorm (fuzzy union) and $c$ a fuzzy complementation. The reader may refer to Ref. [13] for more details about fuzzy connectives. Fuzzy opening and closing are defined, as for crisp sets, as combinations of erosion and dilation. Fuzzy morphological operations have the same properties as the crisp ones, as shown in Ref. [9]. Most of the properties hold for any t-norm and t-conorm; only the idempotence and extensivity (respectively, anti-extensivity) of closing (respectively, opening) are satisfied for particular t-norms and t-conorms only, for instance the Lukasiewicz operators, defined as $t(a, b) = \max(0, a + b - 1)$ and $T(a, b) = \min(1, a + b)$. Therefore, fuzzy rough sets defined from these morphological operators have exactly the same properties as crisp rough sets, at least for particular t-norms and t-conorms. It turns out that these definitions using fuzzy erosion and dilation are generalizations of the ones proposed in Ref. [14], for $t = \min$ and $T = \max$, in a completely different context, using a fuzzy relation $\mu_R$. The equivalence is obtained, as in the crisp case, by setting

$$\mu_R(x, y) = \nu(y - x). \qquad (62)$$
The interpretation is similar to that in the crisp case: the degree of relation between x and y is equal to the degree to which y!x belongs to the structuring element, i.e. to the degree to which y belongs to the structuring element translated at x. This extension brings together three di!erent aspects of the information: rough sets represent coarseness, fuzzy sets represent vagueness and mathematical morphology brings a geometrical, topological and morphological aspect. The conjunction of vagueness and coarseness had already been pointed out in Ref. [14]. In this paper, we bring an additional morphological point of view, by de"ning fuzzy rough sets using fuzzy mathematical morphology.
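As an illustration of Eqs. (60) and (61), the sketch below implements fuzzy erosion and dilation of a 1-D fuzzy set with the Lukasiewicz connectives quoted above and the complementation c(a) = 1 − a; the discrete universe, the centred structuring element and the border handling are assumptions made for the example.

    import numpy as np

    def t_luka(a, b):
        # Lukasiewicz t-norm (fuzzy intersection).
        return max(0.0, a + b - 1.0)

    def T_luka(a, b):
        # Lukasiewicz t-conorm (fuzzy union).
        return min(1.0, a + b)

    def fuzzy_erode(mu, nu):
        # E_nu(mu)(x) = inf_y T[c(nu(y - x)), mu(y)], Eq. (60), with c(a) = 1 - a.
        n, h = len(mu), len(nu) // 2          # origin of nu assumed at its centre
        out = np.zeros(n)
        for x in range(n):
            out[x] = min(T_luka(1.0 - nu[y - x + h], mu[y])
                         for y in range(max(0, x - h), min(n, x + h + 1)))
        return out

    def fuzzy_dilate(mu, nu):
        # D_nu(mu)(x) = sup_y t[nu(y - x), mu(y)], Eq. (61).
        n, h = len(mu), len(nu) // 2
        out = np.zeros(n)
        for x in range(n):
            out[x] = max(t_luka(nu[y - x + h], mu[y])
                         for y in range(max(0, x - h), min(n, x + h + 1)))
        return out

    mu = np.array([0.1, 0.8, 0.9, 0.4, 0.2])   # fuzzy set on U = {0, ..., 4}
    nu = np.array([0.5, 1.0, 0.5])             # fuzzy structuring element, origin at centre
    lower, upper = fuzzy_erode(mu, nu), fuzzy_dilate(mu, nu)
    assert np.all(lower <= mu) and np.all(mu <= upper)   # anti-extensivity / extensivity

Since ν takes the value 1 at its origin, the anti-extensivity of the erosion and the extensivity of the dilation hold here, so the two outputs do bracket μ as a rough set should.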
7. Discussion

The main result established in this paper is that morphological operators happen to be good tools for defining lower and upper approximations of sets, in the theory of rough sets, with the appropriate properties. Moreover, these operators lead to a generalization of rough sets to functions and to fuzzy sets. Several mathematical aspects are common to both theories, such as set-theoretical, algebraic and topological aspects. From an information point of view, several aspects are merged: coarseness, morphology, and vagueness for the extension to fuzzy sets.

In addition to the formal similarities between both domains, contributions can be brought from each domain to the other. Mathematical morphology brings tools for analyzing shapes, and the approximations it provides contain a regularization and filtering aspect (in particular using opening and closing, or their combinations). Some formulations are particular cases of general lower and upper approximations, but others may bring some generalizations (for instance in the framework of pre-topology instead of topology, or using thinning and thickening). The use of fuzzy morphology for defining fuzzy rough sets provides formulations that are more general than the ones originally proposed in Ref. [14] using fuzzy relations.

Mathematical morphology operators have a lot of properties that can from now on be used also in the theory of rough sets. One example of a useful property is iterativity and combination: for instance, dilating a set n times by a ball of radius 1 is equivalent to dilating the set once by a ball of radius n, as illustrated in the sketch below. Such properties make it possible to perform successive approximations, using the same operator or different ones, in a controlled way, leading to different levels of representation, or of precision. In the same way, compatibility with geometric transformations is useful for spatial applications. Also the choice of the structuring element B has a direct impact on the approximations, which can be more or less strong depending on the size of B.
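The iterativity property mentioned above is easy to check on binary images; the sketch below uses scipy.ndimage, with the 4-connected cross standing in for the discrete 'ball of radius 1' (an assumption of the illustration).

    import numpy as np
    from scipy import ndimage

    # Discrete 'ball of radius 1': the 4-connected cross (assumed for this example).
    ball1 = ndimage.generate_binary_structure(2, 1)

    X = np.zeros((15, 15), dtype=bool)
    X[7, 7] = True                        # a single point to be dilated

    # Dilating n = 3 times by the radius-1 ball ...
    a = ndimage.binary_dilation(X, structure=ball1, iterations=3)
    # ... gives the same upper approximation as one dilation by the radius-3 ball.
    ball3 = ndimage.iterate_structure(ball1, 3)
    b = ndimage.binary_dilation(X, structure=ball3)
    assert np.array_equal(a, b)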
I. Bloch / Pattern Recognition 33 (2000) 1487}1496
Since mathematical morphology arose from a completely different domain than the theory of rough sets, it also brings a large variety of applications, particularly in image processing. Until now, very few attempts have been made to use rough sets in image processing (see Refs. [15,16]). The results shown in this paper might provide a bridge to fill this gap. Conversely, since the theory of rough sets has been mainly developed in the domain of artificial intelligence, it brings a new look to mathematical morphology, in particular for approximate reasoning and logics. What seems to deserve further study is, for instance, building a possibilistic modal logic based on morphological operators. A modal logic based on rough sets is described e.g. in Ref. [8], where the connectives used are, in a classical way, negation, conjunction, disjunction, implication and equivalence, but also two modal operators, necessity □ and possibility ◇, that are defined from lower and upper approximations. The properties of Table 1 have equivalents in logical terms using these connectives, leading to reasoning rules like □p → ◇p, □p → p and several others. If we use morphological operators for defining lower and upper approximations, we have for instance: □X = E_B(X), ◇X = D_B(X). A set X may correspond to a proposition like 'this object is X', or 'this object is in X'. Then □X and ◇X represent, respectively, the area where it is necessary that such a proposition holds, and the area where it is just possible. These two areas represent approximations of the location of X, for instance due to imprecision, incomplete knowledge, etc. The logic derived from rough sets then provides some tools for reasoning under imprecision in a morphological context.
8. Summary

Rough set theory was introduced in 1982, as an extension of set theory, mainly in the domain of intelligent systems. The objective was to deal with incomplete information, leading to the idea of indistinguishability of objects in a set. It is therefore related to the concept of approximation. In this framework, a set is approximated by two sets, called the upper and lower approximations, which respectively contain and are included in the initial set.

Mathematical morphology is originally also based on set theory. It was introduced in 1964 by Matheron, in order to study porous media. But this theory evolved rapidly into a general theory of shape and its transformations, and was applied particularly in image processing and pattern recognition. Mathematical morphology provides operators that are extensive or anti-extensive, such as dilation and erosion (if the origin of the space belongs to the chosen structuring element), or closing and opening.

One of the basic properties of upper and lower set approximations is duality. A similar property holds in mathematical morphology between dilation and erosion,
and between opening and closing. In fact, most of the morphological operators go in pairs of dual operators. Based on these elementary observations, it is tempting to look for closer links between both domains. Here we try to link lower and upper approximations directly to morphological operators. To our knowledge, this is the first time that such links are established.

We first start from the basic definitions of rough sets, in particular those based on a similarity relation, and of mathematical morphology, in particular its four basic operators. Then we compare both theories in the light of a list of properties that are commonly used in rough set theory, and establish formal links between upper and lower approximations on the one hand, and dilation and erosion (respectively opening and closing) on the other hand. We then look more closely at some topological aspects (topology and pre-topology defined from a closure operator, and neighbourhood systems). Next, we propose some extensions of these links, using other operators like thinning and thickening, or algebraic operators. We also extend this work to functions and to fuzzy sets, and show how mathematical morphology on functions and on fuzzy sets can be used for defining rough functions and rough fuzzy sets. This brings together three different aspects of the information: vagueness (through fuzzy sets), coarseness (through rough sets) and shape (through mathematical morphology). Finally we provide some insights on the respective contributions of each domain to the other that can be anticipated from this work.
References

[1] Z. Pawlak, Rough sets, Int. J. Inform. Comput. Sci. 11 (5) (1982) 341-356.
[2] L. Zadeh, Fuzzy sets and information granularity, in: M. Gupta, R. Ragade, R. Yager (Eds.), Advances in Fuzzy Set Theory and Applications, North-Holland, Amsterdam, 1979, pp. 3-18.
[3] G. Matheron, Éléments pour une théorie des milieux poreux, Masson, Paris, 1967.
[4] G. Matheron, Random Sets and Integral Geometry, Wiley, New York, 1975.
[5] J. Serra, Image Analysis and Mathematical Morphology, Academic Press, London, 1982.
[6] J. Serra, in: J. Serra (Ed.), Image Analysis and Mathematical Morphology, Part II: Theoretical Advances, Academic Press, London, 1988.
[7] L. Polkowski, Rough set approach to mathematical morphology: approximate compression of data, in: Information Processing and Management of Uncertainty IPMU'98, Paris, 1998, pp. 1183-1189.
[8] Y.Y. Yao, Two views of the theory of rough sets in finite universes, Int. J. Approximate Reasoning 15 (4) (1996) 291-310.
[9] I. Bloch, H. Maître, Fuzzy mathematical morphologies: a comparative study, Pattern Recognition 28 (9) (1995) 1341-1387.
[10] T.Y. Lin, Q. Liu, Rough approximate operators: axiomatic rough sets theory, in: W.P. Ziarko (Ed.), Rough Sets, Fuzzy Sets and Knowledge Discovery, Springer, London, 1994, pp. 256-260.
[11] H. Emptoz, Modèle prétopologique pour la reconnaissance des formes. Applications en neurophysiologie, Thèse de Doctorat d'État, Univ. Claude Bernard, Lyon I, Lyon, France, 1983.
[12] T.Y. Lin, Neighborhood systems: a qualitative theory for fuzzy and rough sets, in: Second Annual Joint Conference on Information Science, Wrightsville Beach, NC, 1995, pp. 255-258.
[13] D. Dubois, H. Prade, A review of fuzzy set aggregation connectives, Inform. Sci. 36 (1985) 85-121.
[14] D. Dubois, H. Prade, Rough fuzzy sets and fuzzy rough sets, Int. J. General Systems 17 (1990) 191-209.
[15] Z. Wojcik, Rough approximation of shapes in pattern recognition, Comput. Vision Graphics Image Process. 40 (1987) 228-249.
[16] Z.M. Wojcik, Application of rough sets for edge enhancing image filters, in: IEEE International Conference on Image Processing ICIP'94, Vol. II, Austin, Texas, 1994, pp. 525-529.
About the Author: ISABELLE BLOCH is a professor at ENST Paris (Signal and Image Department), where she is in charge of the Image Processing and Interpretation Group. She graduated from Ecole des Mines de Paris in 1986, received a Ph.D. from ENST Paris in 1990, and the 'Habilitation à Diriger des Recherches' from University Paris 5 in 1995. Her research interests include 3D image and object processing, structural pattern recognition, 3D and fuzzy mathematical morphology, decision theory, data fusion in image processing, fuzzy set theory, evidence theory, medical imaging, and aerial and satellite imaging.
Pattern Recognition 33 (2000) 1497-1510
Fast nearest-neighbor search algorithms based on approximation-elimination search

V. Ramasubramanian, Kuldip K. Paliwal*

Computer Systems and Communications Group, Tata Institute of Fundamental Research, Homi Bhabha Road, Bombay 400 005, India
School of Microelectronic Engineering, Griffith University, Brisbane, QLD 4111, Australia

Received 28 July 1997; received in revised form 5 January 1999; accepted 5 January 1999
Abstract

In this paper, we provide an overview of fast nearest-neighbor search algorithms based on an 'approximation-elimination' framework under a class of elimination rules, namely, partial distance elimination, hypercube elimination and absolute-error-inequality elimination derived from approximations of the Euclidean distance. Previous algorithms based on these elimination rules are reviewed in the context of approximation-elimination search. The main emphasis in this paper is a comparative study of these elimination constraints with reference to their approximation-elimination efficiency set within different approximation schemes. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Fast nearest-neighbor search; Approximation-elimination search; Partial-distance search; L1, L∞ constraints; Fast vector quantization encoding
1. Introduction

Nearest-neighbor search consists in finding the closest point to a query point among N points in K-dimensional space. The search is widely used in several areas such as pattern classification, nonparametric estimation, information retrieval from multi-key data bases, and image and speech data compression. Reducing the computational complexity of nearest-neighbor search is of considerable interest in these areas. In this paper, we discuss fast nearest-neighbor search in the context of vector quantization encoding. Vector quantization encoding is a powerful data compression technique used in speech coding, image coding and speech recognition [1-7]. Vector quantization encoding is the minimum-distortion quantization of a vector x = (x_1, x_2, …, x_K) (referred to as the test vector) using a given set of N
* Corresponding author. Tel.: +61-7-3875-6536; fax: +61-7-3875-5198. E-mail address: [email protected] (K.K. Paliwal).
K-dimensional codevectors called the codebook C = {c_i, i = 1, …, N}, of size N, under a given distance measure d(x, y). This involves finding the nearest neighbor of x in C, given by q(x) = c_j : d(x, c_j) ≤ d(x, c_i), i = 1, …, N, which requires N distance computations d(x, c_i) under the exhaustive full search.

1.1. Elimination-based fast nearest-neighbor search

The basic structure of the sequential full search which obtains the nearest-neighbor codevector c_j is as follows:

    d_cur = ∞ (a very large number)
    For i = 1, …, N
        d_i = d(x, c_i)
        if d_i < d_cur then j = i; d_cur = d_i
    next i

Here, at any given stage in the search, c_j is the current nearest neighbor and the current-nearest-neighbor ball b(x, d_cur) is the surface defined by {y : d(x, y) = d_cur}. For the Euclidean distance between c_j = (c_jk, k = 1, …, K) and x = (x_k, k = 1, …, K), given by d(x, c_j) = [Σ_{k=1}^{K} (x_k − c_jk)²]^{1/2}, the current-nearest-neighbor ball is a hypersphere of radius d_cur with center at x.
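In code, the sequential full search reads as the following minimal sketch (numpy is assumed; C is an N × K array of codevectors and x a length-K test vector):

    import numpy as np

    def full_search(x, C):
        # Sequential full search: one distance computation per codevector,
        # each update shrinking the current-nearest-neighbour ball b(x, d_cur).
        j, d_cur = -1, np.inf
        for i, c in enumerate(C):
            d = np.linalg.norm(x - c)      # Euclidean distance d(x, c_i)
            if d < d_cur:                  # update the current nearest neighbour
                j, d_cur = i, d
        return j, d_cur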
The sequential full search finds the nearest neighbor by progressively updating the current nearest neighbor c_j whenever a codevector is found closer to the test vector than the current nearest neighbor, with each update shrinking the current-nearest-neighbor ball radius towards the actual nearest-neighbor distance. The final nearest neighbor is one of the codevectors inside the current-nearest-neighbor ball at any stage of the search and, consequently, the current-nearest-neighbor ball assumes importance as a geometric object in defining the primary search space of interest. In the case of the sequential full search, the location of a codevector is determined with respect to the current-nearest-neighbor ball by computing the distance of each successive codevector to the test vector for comparison with the current-nearest-neighbor distance. As a result, the distances of all N codevectors to the test vector are computed, the search complexity being N distance computations per test vector.

A direct approach to reducing the complexity of the above sequential 'exhaustive full search' is to reject a codevector by a 'quick elimination', i.e., without computing its actual distance to the test vector, by using computationally less expensive rules to determine which codevectors cannot be nearer to the test vector than the current nearest neighbor. This is equivalent to approximating the current-nearest-neighbor ball by geometric constraints which allow easy elimination of codevectors which do not lie inside the current-nearest-neighbor ball. Given the current-nearest-neighbor ball b(x, d_cur), let E(x, d_cur) represent the spatial approximation of the ball by some geometric constraint such that b(x, d_cur) ⊆ E(x, d_cur). Then, codevectors which do not belong to E(x, d_cur) cannot be inside b(x, d_cur), and elimination consists of rejecting these codevectors. The efficiency of elimination depends on two intrinsic factors of the elimination constraint: (i) the relative cost of the elimination computation with respect to the distance computation and (ii) the number of codevectors retained after elimination in comparison to the actual number of codevectors inside the current-nearest-neighbor ball. This is essentially governed by the computational cost of determining whether a codevector does not belong to E(x, d_cur) and the extent to which E(x, d_cur) approximates b(x, d_cur) spatially in terms of volume. Fig. 1 illustrates this with a simple geometric constraint used in one of the earliest elimination-based fast searches [8], which forms the basis of the hypercube-based elimination searches reported subsequently [9-12]. Here, the geometric constraint is simply the projection of the current-nearest-neighbor ball b(x, d_cur) onto a coordinate axis k, given by E(x, d_cur) = {y : |y_k − x_k| < d_cur}, i.e., E(x, d_cur) is the region between the hyperplanes {y : y_k = x_k − d_cur} and {y : y_k = x_k + d_cur}. A direct elimination is achieved by rejecting the codevectors {c_i : c_ik > x_k + d_cur or c_ik < x_k − d_cur}, incurring at most two scalar
Fig. 1. Example of geometric constraint for elimination.
comparisons to reject a codevector, as against a vector distance computation. (This elimination is carried out more efficiently, as in Ref. [8], using a binary search on a list of ordered coordinates of the codevectors to eliminate codevectors whose coordinates fall outside the projections of the current-nearest-neighbor ball b(x, d_cur).) Thus, in the sequential full search, given a current-nearest-neighbor ball, the codebook is reduced by applying the elimination rule C = C − {c_i : c_i ∉ E(x, d_cur)}. Since the current-nearest-neighbor ball shrinks with every update of the current nearest neighbor, i.e., when a codevector closer than the current nearest neighbor has been found, the above elimination need be done only at every update of the current nearest neighbor, using the E(x, d_cur) corresponding to the updated d_cur. The codebook size thus reduces progressively with the aid of the elimination step, resulting in a reduction in the number of distances computed if each elimination step rejects some codevectors.

1.2. Approximation-elimination search

In addition to the intrinsic nature of the geometric constraint in approximating any given current-nearest-neighbor ball volume, another important aspect influencing the elimination efficiency of the constraint is the size of the current-nearest-neighbor ball. Clearly, for a given constraint, the elimination efficiency is better for a smaller current-nearest-neighbor ball. Hence the relative closeness of the current-nearest-neighbor codevector to the test vector, and the sequence in which the codevectors are selected for distance computation, influence the rate at which the codebook size is reduced in the elimination-based search.

In general, in the sequential search with the elimination step, selecting a codevector close to the test vector in the initial stages of the search will help in the elimination
of a large number of codevectors in the early steps of the search, thereby avoiding the distance computations for these codevectors. For instance, considering the codevectors in the order of decreasing distance to the test vector represents a worst-case condition, since no geometric constraint of the current-nearest-neighbor ball can eliminate any codevector at each update. In contrast, selecting the codevectors in the order of increasing distance represents the best possible situation, since the search then starts with the actual nearest neighbor and the subsequent search merely amounts to verifying that no other codevector lies within the given elimination constraint approximating the current-nearest-neighbor ball. In this case, the elimination step rejects the maximum number of codevectors in the first step itself and the subsequent search is performed on a reduced set of codevectors. The resultant overall reduction in search complexity is then the maximum for a given elimination constraint.

Thus, the distances d(x, c_i), i = 1, …, N used in the search provide the exact information for the best ordering in which the codevectors should be considered for search under the elimination-based sequential search. However, getting this exact ordering information amounts to a full search. Therefore, in order to gain any advantage in selecting codevectors (for distance computation) such that the distances to codevectors close to the test vector are computed in the early stages of the search, it is necessary to characterize the closeness of a codevector to the test vector using some approximation of the distance measure; such an explicit 'codevector selection' step is referred to as an 'approximation step' [13]. In order to gain any advantage from the approximation step in reducing the complexity of the search, the approximation criterion has to be computationally less expensive than the actual distance measure being used in the search, and should characterize the spatial organization of the codevectors with respect to the test vector so as to reflect their relative closeness to the test vector.

The approximation-elimination procedure as described above can be given as a fast-search generalization of the sequential full search as follows:

    d_cur = ∞ (a very large number)
    Do until codebook C is empty
        c_i = arg min_{c_l ∈ C} Approx(x, c_l); C = C − c_i   [Approximation]
        d = d(x, c_i)                                        [Distance computation]
        if d < d_cur then
            j = i; d_cur = d
            C = C − {c_m : c_m ∉ E(x, d_cur)}                [Elimination]
        endif
    enddo
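This framework translates into the following skeleton (a sketch, not the authors' implementation); approx is any cheap closeness score and outside decides whether a codevector falls outside the constraint E(x, d_cur), both supplied by the caller:

    import numpy as np

    def approx_elim_search(x, C, approx, outside):
        # Generic approximation-elimination search (sketch of the framework above).
        live = list(range(len(C)))                 # codevectors not yet examined or eliminated
        j, d_cur = -1, np.inf
        while live:
            # Approximation: pick the candidate with the smallest closeness score.
            i = min(live, key=lambda m: approx(x, C[m]))
            live.remove(i)
            d = np.linalg.norm(x - C[i])           # distance computation
            if d < d_cur:
                j, d_cur = i, d
                # Elimination: C = C - {c_m : c_m not in E(x, d_cur)}.
                live = [m for m in live if not outside(x, C[m], d_cur)]
        return j, d_cur

    # Example instantiation with the all-axes hypercube constraint (Section 2.1):
    #   approx  = lambda x, c: np.max(np.abs(x - c))            (L-infinity score)
    #   outside = lambda x, c, d: np.any(np.abs(x - c) >= d)    (outside the hypercube)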
In the above, the approximation and elimination costs incurred in the computation of arg min_{c_l ∈ C} Approx(x, c_l) and of {c_m : c_m ∉ E(x, d_cur)}, in comparison to the number of distance computations saved by these operations, crucially determine the overall reduction in the search complexity. The important issue in achieving fast search under the approximation-elimination framework is to find efficient approximation and elimination criteria which are computationally less expensive than the distance computations but at the same time provide an efficient approximation of the distance measure and the current-nearest-neighbor ball volume.

1.3. Previous work

The main elimination rules which have been employed for fast nearest-neighbor search are partial distance elimination, hypercube elimination, elimination based on the absolute-error-inequality approximation of the Euclidean distance and triangle-inequality-based elimination. Some of the earlier papers which are based on these elimination rules and which have addressed the issue of determining a 'good' search sequence for a given test vector to improve the efficiency of elimination schemes are [8-26].

1.3.1. Partial-distance search

A simple but efficient elimination-based search which offers an improvement over the sequential exhaustive full search is the 'partial distance search' method [10,27]. This is applicable to cases where the partial-vector distance attains the full-vector distance in a monotonically nondecreasing manner with the addition of the vector component distances. This allows a codevector to be rejected on the basis of its partial or accumulated distance, i.e., without completing the total distance computation, the elimination being carried out during the distance computation itself; a sketch is given below.

In an earlier paper [28], we addressed the role of the approximation step in partial-distance-based elimination. There, we proposed an ordering of codevectors according to the sizes of their corresponding clusters, i.e., sequencing the codevectors in the order of decreasing probability of being the nearest neighbor to a given test vector, to improve the savings in computation achieved by the partial distance elimination. The ordering is done in a preprocessing stage and serves as an implicit approximation step in the actual search by providing a favorable sequence of codevectors for the partial-distance-based elimination. It was also noted that the codebooks obtained at the end of the training process using clustering algorithms such as the Linde-Buzo-Gray algorithm [29] have arbitrary orderings and are not guaranteed to be arranged in the favorable order.
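A minimal sketch of the partial-distance elimination described at the beginning of this subsection: the component loop aborts as soon as the accumulated squared error reaches the current best squared distance.

    import numpy as np

    def partial_distance_search(x, C):
        # Full search with partial-distance elimination (sketch).
        j, best = -1, np.inf               # best squared current-nearest-neighbour distance
        for i, c in enumerate(C):
            s, rejected = 0.0, False
            for k in range(len(x)):
                s += (x[k] - c[k]) ** 2    # accumulated (partial) squared distance
                if s >= best:              # monotone nondecreasing: safe to reject now
                    rejected = True
                    break
            if not rejected:               # full distance computed and it improves
                j, best = i, s
        return j, np.sqrt(best)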
In addition, the implications of dimensionality for the additional complexity reduction due to the proposed ordering were brought out using the asymptotic equipartition property of block coding, and it was shown that the proposed favorable ordering would be most useful for codebooks designed on low-entropy distributions, such as when the dimensionality is low.

1.3.2. Triangle-inequality-based fast search

Another important elimination rule which has been used extensively for fast nearest-neighbor search is triangle-inequality-based elimination, applicable when the distance measure is a metric [13-52]. This elimination rule corresponds to a hyperannulus constraint on the search space, where codevectors lying outside the hyperannulus region formed between two concentric hyperspheres centered at an 'anchor point' are eliminated. The efficiency of the triangle-inequality (or hyperannulus) elimination increases with the number of distinct anchor points used, as the intersection volume of the hyperannuli corresponding to the multiple anchor points becomes more localized, consequently retaining fewer codevectors after elimination. The different applications in which triangle-inequality-based elimination has found use and motivated the development of fast algorithms are information retrieval from data bases [31], document retrieval [32], nonparametric classification [33], fast isolated word recognition [13,38,40,50,51], and fast vector quantization of speech and image data [20,41,43].

The search based on triangle-inequality elimination in most algorithms uses interpoint distances computed during a preprocessing phase and is set in a branch-and-bound framework, where the search is organized using a hierarchical decomposition of clustered data into a tree structure [13,31-33,36-39,42,49]. The main algorithmic aspects which have received attention under this framework have been the determination of the clusters and the anchor points within each cluster [31,32], efficient hierarchical decomposition and tree structuring of the given points [33,36,39], improvements providing additional constraints for elimination [37], the issue of the optimum number and location of the fixed anchor points [34,41,44-46,48,52], and the optimum use of a given set of precomputed interpoint distances by means of procedures which approximate the missing interpoint distances [42].

1.4. Organization of the paper

In this paper, we are concerned mainly with the class of elimination rules based on approximations of the Euclidean distance, namely, hypercube elimination and elimination based on the absolute-error-inequality approximation of the Euclidean distance. The main emphasis of this paper is to provide an overview of fast search using these elimination rules under an explicit 'approximation-elimination' framework. In addition,
the paper is particularly oriented towards providing additional geometric insight into the main approximation-elimination schemes in a unified framework, and towards empirical studies characterizing the approximation-elimination efficiency of the different approximation-elimination searches in detail.

In Section 2, algorithms based on the L1, L∞ approximations of the L2 norm are considered with reference to earlier work related to the hypercube constraint on the search space [9,11], the L∞-based 'minmax' algorithm [10] and its subsequent improvement using absolute-error-inequality-based elimination [12]. These algorithms are viewed under the approximation-elimination framework, and geometric details not brought out originally in these algorithms are provided for additional insight. The absolute error inequality is noted to be the general equivalence of the L1 and L2 norms. The L1-, L∞-based elimination criteria are geometrically seen as based on the constrained minimization and maximization of the L1, L∞ norms given the L2 norm. Approximation-elimination based on the L1 constraint is shown to be more efficient than the L∞ constraint, based on a geometric interpretation of the norm-equivalence relationship and the volume ratios of the L1 surface and the L∞ (hypercube) surface for a given L2 ball. The relative efficiencies of these approximation-elimination schemes are characterized and studied with respect to dimensionality.
2. Search based on L∞, L1 approximation

One of the earliest elimination-based fast searches [8] is based on the projections of the codevectors on a coordinate axis and forms the basis of the L∞ (hypercube) search. This algorithm represents a classic example of search under the approximation-elimination framework. The basic principle behind this algorithm, as illustrated in Fig. 1, is the use of the projection of the current-nearest-neighbor ball b(x, d_cur) onto a coordinate axis k as a simple geometric constraint for elimination. This is given by E(x, d_cur) = {y : |y_k − x_k| < d_cur}, i.e., E(x, d_cur) is the region between the hyperplanes {y : y_k = x_k − d_cur} and {y : y_k = x_k + d_cur}. The algorithm given in Ref. [8] can be described under an approximation-elimination framework as follows: In a preprocessing step, the codevector coordinates are ordered on some coordinate axis k and the corresponding indices stored. During the actual search, for a given test vector x, the codevectors are examined in the order of their projected distance from x on axis k, i.e., the approximation criterion is simply arg min_{c_i ∈ C} |x_k − c_ik|. Elimination is done by using two binary searches to truncate the ordered list of codevectors so as to retain only those codevectors whose coordinates fall within the projections of the current-nearest-neighbor ball b(x, d_cur).
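A minimal sketch of this projection-ordered search, in the spirit of Ref. [8], is given below; the coordinate sorted in a preprocessing step makes both the approximation (an outward scan from x_k) and the elimination (truncation of the slab |y_k − x_k| < d_cur) cheap.

    import numpy as np

    def projection_search(x, C, k=0):
        # Preprocessing: codevector indices ordered by coordinate k.
        order = np.argsort(C[:, k])
        coords = C[order, k]
        pos = int(np.searchsorted(coords, x[k]))
        lo, hi = pos - 1, pos                     # scan outwards from x_k
        j, d_cur = -1, np.inf
        while lo >= 0 or hi < len(C):
            # Approximation: next candidate is the one with the smaller projected distance.
            dlo = x[k] - coords[lo] if lo >= 0 else np.inf
            dhi = coords[hi] - x[k] if hi < len(C) else np.inf
            if min(dlo, dhi) >= d_cur:            # elimination: the slab is exhausted
                break
            if dlo <= dhi:
                i = order[lo]; lo -= 1
            else:
                i = order[hi]; hi += 1
            d = np.linalg.norm(x - C[i])          # full distance only for surviving candidates
            if d < d_cur:
                j, d_cur = i, d
        return j, d_cur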
This algorithm can be notionally represented in the approximation-elimination form as follows:

    d_cur = ∞ (a very large number)
    Do until codebook C is empty
        c_i = arg min_{c_l ∈ C} |x_k − c_lk|; C = C − c_i
        d = d(x, c_i)
        if d < d_cur then
            j = i; d_cur = d
            C = C − {c_m : |c_mk − x_k| ≥ d_cur}
        endif
    enddo

Friedman et al. [8] analyze the performance of this algorithm for a uniform distribution and the Euclidean distance and obtain an upper bound for the average number of codevectors examined in finding the k nearest neighbors. This performance efficiency decreases rapidly with increasing dimension K; for instance, the average number of distances computed for K = 8 and N = 1000 is about 600, as reported in the simulation results of Ref. [8]. This is primarily because the use of only one axis for approximation and elimination provides a very poor constraint in higher dimensions.

2.1. Hypercube-based elimination

A direct generalization of the above single-axis-based search is to use all the axes k = 1, …, K to provide an improved geometric constraint around the current-nearest-neighbor ball. This results in the hypercube-based elimination search used in subsequent fast search algorithms [9-12]. The geometric constraint in the hypercube-based elimination is E(x, d_cur) = {y : |y_k − x_k| < d_cur, ∀k = 1, …, K}, i.e., E(x, d_cur) is the smallest hypercube containing the current-nearest-neighbor ball, formed by the 2K hyperplanes {y : y_k = x_k − d_cur} and {y : y_k = x_k + d_cur}, k = 1, …, K. Elimination by the hypercube constraint can be realized within a full-search structure as a part of the squared-error distance computation, by directly checking whether each codevector is within the hypercube. By this, a codevector c_i can be rejected if c_ik < x_k − d_cur or c_ik > x_k + d_cur for any k = 1, …, K, thus requiring 1 to 2K scalar comparisons for the rejection of a codevector. Alternately, a codevector c_i can be rejected, as in Ref. [11], by checking whether any of the component distances |c_ik − x_k| is greater than the current-nearest-neighbor distance d_cur. By this, a codevector can be rejected at a cost of 1 to K scalar subtractions and comparisons prior to a full distance computation. Any codevector passing all the K tests is inside the hypercube inscribing the current-nearest-neighbor ball and is therefore tested by the full distance to determine whether it is inside the current-nearest-neighbor ball.
However, a computationally more elegant and simpler scheme for hypercube-based elimination was proposed much earlier by Yunck [9]. This is based on the observation that the hypercube is defined by HC(x, d_cur) = Π_{k=1}^{K} (x_k − d_cur, x_k + d_cur) and the codevectors contained within the hypercube can be obtained as ∩_{k=1}^{K} S_k, where S_k = {c_i : x_k − d_cur ≤ c_ik ≤ x_k + d_cur}. For a specified hypercube, the sets S_k, k = 1, …, K are determined using two binary searches on ordered codevector indices on each of the axes, and the subset of codevectors within the hypercube, given by their intersection, is determined by a simple and efficient multiset-intersection method in Ref. [9]. The basic algorithm in Ref. [9] based on this procedure is mainly directed towards finding the nearest neighbor under the L∞ metric (or Minkowski 'max metric') and can also be easily applied for search under the other general L_p metrics (Minkowski p-metrics). The L_p metric is given by L_p(x, y) = (Σ_{k=1}^{K} |x_k − y_k|^p)^{1/p}, and the L∞ metric, obtained as lim_{p→∞} L_p(x, y), is given by L∞(x, y) = max_{k=1,…,K} |x_k − y_k|. The isometric surface determined by the L∞ metric is a hypercube of side 2L∞(x, y) centered at x and with the vector y lying on one of its sides. The hypercube-based search constraint for a current-nearest-neighbor ball of radius d_cur centered at x is thus the isometric L∞ surface {y : L∞(x, y) = d_cur} inscribing the current-nearest-neighbor ball. This constraint eliminates all codevectors c_i whose L∞(x, c_i) distance is greater than d_cur. This is essentially a part of the general equivalence between the L2 and L∞ norms, given by

    L2(x, y)/√K ≤ L∞(x, y) ≤ L2(x, y).

This is illustrated in Fig. 2. Fig. 2(a) shows the bounds on L∞(x, y) given L2(x, y) = d. The lower bound L∞(x, y) = d/√K and the upper bound L∞(x, y) = d correspond to y being at A and B, respectively, for a given x. This can also be seen geometrically as in Fig. 2(b), which shows the bounds on L2(x, y) given L∞(x, y) = d. The lower bound L2(x, y) = d and the upper bound L2(x, y) = d√K correspond to y being at A and B, respectively, for a given x. The above inequalities describing the norm equivalence correspond to the constrained minimization and maximization of the L∞ norm given x and {y : L2(x, y) = d}, or the constrained minimization and maximization of the L2 norm given x and {y : L∞(x, y) = d}.

Two metrics d_1 and d_2 on a space X are equivalent if there exist constants 0 < c_1 < c_2 < ∞ such that c_1 d_1(x, y) ≤ d_2(x, y) ≤ c_2 d_1(x, y) ∀x, y ∈ X. Alternatively, the inequality can also be given as: there exist constants 0 < e_1 < e_2 < ∞ such that e_1 d_2(x, y) ≤ d_1(x, y) ≤ e_2 d_2(x, y) ∀x, y ∈ X. The L_p norms satisfy this equivalence in general. The equivalence between the L2 and L∞ norms forming the hypercube constraint, and between the L2 and L1 norms forming the absolute-error-inequality constraint (Section 2.3), directly follow from this.
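These equivalence bounds are easy to verify numerically; the following sketch checks both the hypercube (L2/L∞) and the absolute-error-inequality (L2/L1) relations on random points (the sampling distribution is arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    K = 8
    y = rng.normal(size=(10000, K))           # random points; x is taken as the origin

    L1 = np.abs(y).sum(axis=1)
    L2 = np.sqrt((y ** 2).sum(axis=1))
    Linf = np.abs(y).max(axis=1)

    eps = 1e-12
    # Hypercube constraint: L2/sqrt(K) <= Linf <= L2.
    assert np.all(L2 / np.sqrt(K) <= Linf + eps) and np.all(Linf <= L2 + eps)
    # Absolute-error-inequality constraint: L2 <= L1 <= L2*sqrt(K).
    assert np.all(L2 <= L1 + eps) and np.all(L1 <= L2 * np.sqrt(K) + eps)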
Fig. 2. Equivalence relationship between the L2 and L∞ norms: (a) lower (d/√K) and upper (d) bounds of L∞(x, y) given x and L2(x, y) = d; (b) lower (d) and upper (d√K) bounds of L2(x, y) given x and L∞(x, y) = d.
Finding the nearest neighbor under the L∞ metric thus basically involves finding the smallest hypercube with no codevectors inside it, but with one codevector on one of its sides. In Ref. [9], this is achieved by means of an expanding-hypercube procedure, where the search starts with a small hypercube and expands the hypercube until it contains exactly one codevector on its side and no codevectors inside. For search with the general L_p metrics, the search consists essentially of carrying out a full search within the small subset of codevectors falling within an initially small, expanding hypercube until the current-nearest-neighbor ball is within the hypercube. This is illustrated for search with the L2 norm in Fig. 3. Here, the codevector c_1 is the nearest neighbor of the test vector under the L∞ norm. However, the L2 nearest neighbor is the codevector c_2, which is closer to x than c_1. (This can be understood based on the equivalence relation shown in Fig. 2.) The L2 nearest neighbor is determined in a second step by using the hypercube corresponding to L2(x, c_1). The main step in this algorithm by Yunck [9] is the determination of the subset of codevectors within a specified hypercube using the multiset-intersection procedure described before. An important factor determining the efficiency of the search is the size of the initial hypercube; if no codevectors are found inside the cube, the search incurs additional overheads in appropriately expanding the cube. The main complexity of the search is in this 'cube-finding' step, which is estimated to be proportional to KN in terms of shift operations as implemented in Ref. [9]. The hypercube method in Ref. [10] is an identical search procedure, with the additional feature being an analysis pertaining to finding the optimal initial hypercube size for a uniform distribution. The optimality is with respect to finding an effective trade-off between the
Fig. 3. Approximation by L∞: c_1 = arg min_{c_i ∈ C} L∞(x, c_i); c_2 = arg min_{c_i ∈ C} L2(x, c_i).
two complexities: (i) the overhead complexity in expanding the hypercube until it contains a codevector and (ii) the average number of codevectors contained in the initial hypercube, which determines the main complexity in terms of the number of distances calculated in performing a full search on the subset of codevectors within the hypercube. Here, it is shown that the average number of codevectors for the optimal initial hypercube thus determined is approximately equal to 2^K, the average complexity of the search thus increasing exponentially with dimension. In terms of determining the subset of codevectors within a specified hypercube, Ref. [10] uses a multiset-intersection method similar to that proposed earlier in Ref. [9].
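A sketch of the candidate-subset computation in the style of Yunck's procedure is given below: two binary searches per axis delimit the slab S_k, and plain Python sets stand in for the multiset-intersection method of Ref. [9].

    import numpy as np

    def hypercube_subset(x, C, d, order=None):
        # order[k]: codevector indices sorted by coordinate k (preprocessing step).
        N, K = C.shape
        if order is None:
            order = [np.argsort(C[:, k]) for k in range(K)]
        survivors = None
        for k in range(K):
            col = C[order[k], k]
            lo = np.searchsorted(col, x[k] - d, side='left')    # two binary searches
            hi = np.searchsorted(col, x[k] + d, side='right')   # delimit the slab S_k
            S_k = set(order[k][lo:hi].tolist())
            survivors = S_k if survivors is None else survivors & S_k
            if not survivors:                                   # early exit: cube is empty
                break
        return survivors if survivors else set()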
2.2. Minmax method

In the 'minmax' method [10], the L∞ metric is used explicitly as an approximation criterion in selecting the initial codevector. Subsequent elimination is carried out using the L∞ values computed between the test vector and the codevectors for approximation. This algorithm can be given in the approximation-elimination form as follows:

    d_cur = ∞ (a very large number)
    Do until codebook C is empty
        c_i = arg min_{c_l ∈ C} L∞(x, c_l); C = C − c_i
        d = d(x, c_i)
        if d < d_cur then
            j = i; d_cur = d
            C = C − {c_m : L∞(x, c_m) ≥ d_cur}
        endif
    enddo

The above search is carried out by computing L∞(x, c_i) for i = 1, …, N in a first step. After the first elimination using the L∞ values, the remaining codevectors are ordered according to their L∞ distance to the test vector, so that subsequently the approximation consists in examining the codevectors sequentially in the order of increasing L∞ distances. This can be viewed as a direct generalization of the single-axis projection-based selection procedure of Ref. [8].

The main shortcoming of the minmax method is the high overhead complexity of the L∞-based approximation. The initial L∞-based approximation is essentially a full search using the L∞ distance and has a cost of [0, NK, (N − 1) + N(K − 1)] macs (multiplications, additions, comparisons) per vector. This initial approximation step of the search itself has an additional overhead cost of N(K − 1) comparisons over the L2-distance-based full search, and the search complexity reduction here is mainly with respect to reducing the NK multiplications per test vector to a low value.

2.3. Absolute error inequality

Soleymani and Morgera [12] proposed an algorithm as an improvement over the minmax method of Ref. [10] with two additional features: (i) it reduces the overhead cost of determining the initial codevector by L∞ approximation by using a procedure which is essentially a partial-distance realization of the L∞-based full search, and (ii) it uses an additional elimination rule, termed the 'absolute error inequality', to eliminate codevectors during the distance computation along with the hypercube elimination. The absolute error inequality is based on the observation that

    if Σ_{k=1}^{K} (x_k − c_ik)² < d², then Σ_{k=1}^{K} |x_k − c_ik| < d√K.   (1)
A codevector c_i which does not satisfy Σ_{k=1}^{K} |x_k − c_ik| < d√K, i.e., whose absolute error is greater than d√K, will therefore have its squared-error distance greater than d², and can be rejected on the basis of its absolute error without having to compute the more expensive squared-error distance. This algorithm can be given in terms of the following steps (a sketch in code is given below):

• Select the initial codevector c_i as the L∞ nearest neighbor in a full search with partial-distance realization.
• Distance computation: d_cur = d(x, c_i).
• Perform a full search of the codebook C using (i) the hypercube and (ii) the absolute-error-inequality elimination rules during the distance computation with each codevector c_m in C. Its distance to the test vector d(x, c_m) is computed and compared with the current-nearest-neighbor distance d_cur only if the codevector passes these two tests. The two elimination rules are carried out as follows for each codevector c_m ∈ C:
• Reject c_m if |x_k − c_mk| > d_cur for any k = 1, …, K.
• Reject c_m by the partial-distance method in the computation of the absolute error Σ_{k=1}^{K} |x_k − c_mk|, with the reference absolute-error distance being d_cur√K.

2.4. L1-approximation-based search

In this section, we highlight the 'absolute error inequality'-based elimination in the above algorithm and interpret it geometrically as a space constraint provided by the smallest L1 surface inscribing the current-nearest-neighbor ball, i.e., the given L2 norm. We consider this notion in additional detail and compare the L1- and L∞-based approximation-elimination using a search procedure in the minmax form. The proposed search procedure using the L1 values for approximation-elimination is a more efficient realization of L1-based elimination than that of Ref. [12], and is shown to offer lower complexity than the minmax procedure [10] which uses the L∞ values for approximation-elimination.

The absolute error Σ_{k=1}^{K} |x_k − c_ik| in the inequality used in Ref. [12] is the L1 norm between x and c_i, and inequality (1) is basically the equivalence between the L1 and L2 norms given by

    L2(x, y) ≤ L1(x, y) ≤ L2(x, y)√K.   (2)

This is illustrated in Fig. 4. Fig. 4(a) shows the bounds on L1(x, y) given L2(x, y) = d. The lower bound L1(x, y) = d and the upper bound L1(x, y) = d√K correspond to y being at A and B, respectively, for a given x. This can also be seen geometrically as in Fig. 4(b), which shows the bounds on L2(x, y) given L1(x, y) = d. The lower bound L2(x, y) = d/√K and the upper bound L2(x, y) = d correspond to y being at A and B, respectively, for a given x.
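Returning to the algorithm of Section 2.3, the following sketch combines the hypercube and absolute-error-inequality tests within a full search, in the spirit of Ref. [12] (the L∞-based selection of the initial codevector is omitted for brevity):

    import numpy as np

    def aei_search(x, C):
        K = len(x)
        j, d_cur = -1, np.inf
        for i, c in enumerate(C):
            e = np.abs(x - c)
            if np.any(e >= d_cur):             # hypercube test: L-infinity >= d_cur
                continue
            thr, s, rejected = d_cur * np.sqrt(K), 0.0, False
            for k in range(K):                 # AEI test in partial-distance form
                s += e[k]
                if s >= thr:
                    rejected = True
                    break
            if rejected:
                continue
            d = np.linalg.norm(x - c)          # full distance only for survivors
            if d < d_cur:
                j, d_cur = i, d
        return j, d_cur

Both rejections are safe: any |x_k − c_k| ≥ d_cur implies L2(x, c) ≥ d_cur, and L1(x, c) ≥ d_cur√K implies L2(x, c) ≥ L1(x, c)/√K ≥ d_cur.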
Fig. 4. Equivalence relationship between the L2 and L1 norms: (a) lower (d) and upper (d√K) bounds of L1(x, y) given x and L2(x, y) = d; (b) lower (d/√K) and upper (d) bounds of L2(x, y) given x and L1(x, y) = d.
The above inequalities describing the norm equivalence can be obtained as the constrained minimization and maximization of the L1 norm given x and {y : L2(x, y) = d}, or the constrained minimization and maximization of the L2 norm given x and {y : L1(x, y) = d}. (Soleymani and Morgera [12] obtain the absolute error inequality (2) by induction. Though the inequality is finally shown to be satisfied for general K, we note here that an intermediate step in Ref. [12] is incorrect: this involves proving an alternate inequality √((K − 1)(d² − z_K²)) + z_K < d√K. In showing this, Ref. [12] incorrectly obtains the maximum of the function √((K − 1)(d² − z_K²)) + z_K to be (2K − 1)d/√(4K − 3) at z_K = d/√(4K − 3); the correct maximum of the function is d√K at z_K = d/√K.)

The use of the absolute error inequality in elimination can be seen in Fig. 4(a), as it implies that points lying outside the surface L1(x, y) = d√K cannot lie within the L2(x, y) = d surface and hence cannot be nearer to x than y. The geometric constraint provided by the absolute error inequality around the current-nearest-neighbor ball is thus the isometric surface L1(x, y) = d√K. This is essentially a simplex polytope, which can be viewed as formed by 2^K hyperplanes, each given by {y : Σ_{k=1}^{K} |y_k| = d√K}, with intercepts at d√K on the coordinate axes in each quadrant of the K-dimensional space with the origin translated to x.

Note that the above algorithm essentially obtains a subset of candidate codevectors C′ = {c_j : L1(x, c_j) ≤ d_cur√K and L∞(x, c_j) ≤ d_cur} by employing the upper bounds of both the L1 and L∞ distances corresponding to the initial codevector c_i obtained by the L∞-based approximation in Step 1 (c_i = arg min_{c_j ∈ C} L∞(x, c_j); d_cur = d(x, c_i)). The codevectors in C′ are potential updates for the current nearest neighbor and have their L1(x, c_j) distances bounded by [L1(x, c_i), d_cur√K], as can be seen from Figs. 2(b) and 4(b), corresponding to the initial codevector
c_i obtained by the L∞ approximation and the L1(x, c_i) bound for L2(x, c_i) = d_cur. The algorithm selects candidate codevectors from C′ by L∞ ordering and performs elimination using both the hypercube and absolute-error-inequality constraints.

2.4.1. Comparison of L1 and L∞ elimination constraints

The relative elimination efficiencies of the L1 and L∞ surfaces in providing a space constraint for the search can be seen by comparing their respective volumes for a unit hypersphere, i.e., corresponding to a unit current-nearest-neighbor ball radius L2(x, y) = 1. The volume V(L∞) enclosed by the L∞ surface (hypercube) is 2^K, and the volume V(L1) enclosed by the L1 surface can be shown to be 2^K (√K)^K / K!. It can be seen in Fig. 5 that the use of both the L1 and L∞ surfaces in providing elimination results in an improved elimination efficiency, as the intersection volume (IJKLMNOP) approximates the inscribed hypersphere more closely. The volume V(L1 ∩ L∞) enclosed by the intersection of the L1 and L∞ surfaces can be shown to be 2^K ((√K)^K − K(√K − 1)^K) / K!. Fig. 6(a) shows the volumes V(L∞), V(L1) and V(L1 ∩ L∞) for K = 1-10. From this, the L1 surface can be seen to offer a far superior space constraint in comparison to the L∞ surface. The intersection volume of both L1 and L∞ can be seen to be only marginally better than the L1 constraint alone. This can be seen in Fig. 6(b), which shows the volumes V(L1) and V(L1 ∩ L∞) normalized to the corresponding V(L∞) volume, thereby giving a direct comparison of the relative efficiency of L1 and (L1 ∩ L∞) with respect to the L∞ surface in terms of the volume ratios. This clearly reveals that elimination by L1 is more efficient than the hypercube (L∞) based elimination, with the efficiency increasing with dimension.
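The volume expressions above are evaluated in the following short sketch, which reproduces the ratios plotted in Fig. 6(b):

    from math import factorial, sqrt

    def volumes(K):
        v_inf = 2.0 ** K                                  # hypercube of side 2
        v_1 = 2.0 ** K * sqrt(K) ** K / factorial(K)      # L1 surface, intercepts at sqrt(K)
        v_both = 2.0 ** K * (sqrt(K) ** K - K * (sqrt(K) - 1.0) ** K) / factorial(K)
        return v_inf, v_1, v_both

    for K in (2, 4, 8, 10):
        v_inf, v_1, v_both = volumes(K)
        print(K, v_1 / v_inf, v_both / v_inf)             # ratios of Fig. 6(b)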
The important consideration in the L1-based elimination is the computational cost involved in determining whether a codevector lies inside a given L1 surface. As seen earlier, this amounts to computing the L1 norm between the codevector and the test vector, b_i = Σ_{k=1}^{K} |x_k − c_ik|, and checking whether b_i > d_cur√K. While the hypercube permits realization of the L∞-based elimination using binary search with ordered coordinates and intrinsically simple elimination checks, the L1-based elimination cannot be realized by similar alternate schemes. As a result, one of the direct ways of realizing it is as a part of the distance computation, as done in Ref. [12], with the use of an additional loop which checks the L1 norm value in partial-distance form to eliminate a codevector before computing its squared-error distance.
Fig. 5. Geometric constraint for elimination by the L∞ (ABCD), L1 (EFGH) and L1 ∩ L∞ (IJKLMNOP) approximation surfaces of the current-nearest-neighbor ball.
2.4.2. Approximation-elimination with the L1 constraint

Alternately, consider the following approximation-elimination search algorithm based on using the L1 values. Given a test vector x, the L1 approximation values b_i = L1(x, c_i) = Σ_{k=1}^{K} |x_k − c_ik| for i = 1, …, N are computed. Approximation-elimination search using the L1 approximation can be performed as

    d_cur = ∞ (a very large number)
    Do until codebook C is empty
        c_i = arg min_{c_l ∈ C} b_l; C = C − c_i
        d = d(x, c_i)
        if d < d_cur then
            j = i; d_cur = d
            C = C − {c_m : b_m ≥ d_cur√K}
        endif
    enddo

Here, it can be noted that the b_i = Σ_{k=1}^{K} |x_k − c_ik| values are used for approximation instead of the L∞ values as in Refs. [10,12]. This approximation by the L1 norm is illustrated in Fig. 7. Here, the codevector c_1 is the nearest neighbor of the test vector under the L1 norm. The L2 nearest neighbor is the codevector c_2, which is closer to x than c_1. (This can be understood based on the equivalence relation shown in Fig. 4.) The L2 nearest neighbor is determined in the subsequent approximation-elimination steps using the L1 elimination corresponding to L1(x, c_2).
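A minimal sketch of this L1-based approximation-elimination procedure: the b_i values are computed once and then drive both the candidate ordering and the elimination test b_m ≥ d_cur√K.

    import numpy as np

    def l1_approx_elim_search(x, C):
        K = len(x)
        b = np.abs(C - x).sum(axis=1)              # b_i = L1(x, c_i), computed once
        live = list(np.argsort(b))                 # candidates in increasing L1 order
        j, d_cur = -1, np.inf
        while live:
            i = live.pop(0)                        # approximation: smallest b_i first
            d = np.linalg.norm(x - C[i])
            if d < d_cur:
                j, d_cur = i, d
                thr = d_cur * np.sqrt(K)
                live = [m for m in live if b[m] < thr]   # elimination by the L1 bound
        return j, d_cur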
Fig. 6. Volumes of the L∞, L1 and L1 ∩ L∞ surfaces inscribing the unit hypersphere (L2 = 1): (a) volume enclosed by the L∞, L1 and L1 ∩ L∞ surfaces as a function of dimension K; (b) volume ratios L1/L∞ and (L1 ∩ L∞)/L∞ as a function of dimension K.
Table 1
Comparison of L∞ and L1 approximation-elimination efficiency

          SNR_init   %CER   nc_i^av   nc_i^max   σ(nc_i)   nc^av   nc^max
    L∞      12.0     17.9     8.8       395        19.5     6.1     159
    L1      12.3      9.1     2.2       117         4.1     1.8      49

Dimension K = 8; codebook size N = 1024. Data: 50000 vectors of speech waveform (SNR_fs = 12.5).
Fig. 7. Approximation by L1: c_1 = arg min_{c_i ∈ C} L1(x, c_i); c_2 = arg min_{c_i ∈ C} L2(x, c_i).
The above algorithm essentially obtains a subset of candidate codevectors C′ = {c_j : L1(x, c_j) ≤ d_cur√K} by employing the upper bound of the L1 distances corresponding to the initial codevector c_i obtained by the L1-based approximation in Step 1 (c_i = arg min_{c_j ∈ C} L1(x, c_j); d_cur = d(x, c_i)). The codevectors in C′ are potential updates for the current nearest neighbor and have their L1(x, c_j) distances bounded by [L1(x, c_i)/√K, d_cur√K], as can be seen from Fig. 4(b), corresponding to the initial codevector c_i obtained by the L1 approximation and the L1(x, c_j) bound given L2(x, c_j) = d_cur. The algorithm selects candidate codevectors from C′ by L1 ordering and performs elimination using the b_i = L1(x, c_i) ordering. The precomputed b_i values (used for approximation) are also used for elimination by L1 in the subsequent elimination steps for each current-nearest-neighbor update. This has an intrinsically higher approximation-elimination efficiency than the elimination carried out during the distance computation step within a full search as in Ref. [12], for the following reasons: (i) the L1-based approximation efficiency is higher than the L∞-based approximation, and the search starts with an initial codevector (by L1 approximation) closer to the test vector than that obtained by L∞ approximation; (ii) this in turn results in a faster codebook size reduction during the subsequent elimination steps; (iii) the elimination consists only of scalar comparisons of the form b_i > d_cur√K, which can be realized by binary search on an ordered list of the precomputed b_i.

It should also be noted that the procedure in Ref. [12] does not use precomputed L1 values for repeated approximation-elimination subsequent to the first approximation-elimination, and hence does not represent a typical 'approximation-elimination' search as given above. Use of the L∞ norm in place of L1 in the above procedure makes the search similar to the minmax method [10] and gives a direct comparison of the basic L∞ and L1 approximation-elimination efficiencies. Thus, the L∞ (L1)-based search procedures in the above form essentially consist of using the L∞ (L1) values for approximation and elimination, and provide a comparison of the intrinsic L∞ (L1) approximation-elimination efficiency.
2.5. Simulation results

In Table 1, we compare the approximation-elimination efficiency of the above L∞- and L1-based searches in the context of vector quantization of the speech waveform for dimension K = 8 and codebook size N = 1024, using 50000 vectors. If c_i is the first candidate codevector selected by the approximation criterion from the full codebook, the approximation efficiency can be characterized using measures which quantify the closeness of c_i to the test vector (or to the actual nearest neighbor) when obtained over a large test-vector sequence. The measures used here are: (i) SNR_init, the SNR obtained when using the 'initial' codevector in place of the actual nearest-neighbor codevector; (ii) %CER, the percent 'classification error rate', which is the percentage of times the initial codevector is not the actual nearest neighbor. The ideal limits of these measures are their values corresponding to the actual nearest-neighbor codevector; these limits are SNR_fs for SNR_init, i.e., the SNR corresponding to the actual nearest neighbor determined by full search, and 0 for %CER. In addition to this, the combined approximation-elimination efficiency is characterized by measuring the (average, maximum) number of codevectors (nc_i^av, nc_i^max) retained after elimination with the current-nearest-neighbor ball corresponding to the initial candidate codevector selected by the approximation criterion; σ(nc_i) is the standard deviation of this measure. (nc^av, nc^max) is the (average, maximum) number of distances computed in the overall search, which is the actual complexity of the search.

From this table, it can be seen that L1 has a significantly higher approximation efficiency than L∞, with the SNR_init and %CER values much closer to their respective actual nearest-neighbor limits. The combined approximation-elimination efficiency of L1 can be seen to be much higher than that of L∞ from the smaller (nc_i^av, nc_i^max) values for L1, particularly with the maximum number of
codevectors nc_i^max retained after the first elimination being significantly lower than for L∞. The large difference in the standard deviation σ(nc_i) also indicates the relatively poorer efficiency of L∞ in comparison to L1. The same can be observed with respect to the overall approximation-elimination efficiency given in terms of (nc^av, nc^max), particularly with the worst-case complexity nc^max being very low for search based on L1 in comparison to L∞ approximation-elimination.

In Fig. 8, we show the performance of the L1- and L∞-based searches for dimensions K = 2, 4, 6 and 8 and codebook size N = 1024 in terms of (nc_i^av, nc_i^max), the (average, maximum) number of codevectors retained after the first elimination step (i.e., with the current-nearest-neighbor ball corresponding to the initial candidate codevector), and (nc^av, nc^max), the (average, maximum) number of distances computed in the overall search, which is the actual complexity of the search. (nc_i^av, nc_i^max) represents the complexity of the search if a full search is carried out after the first elimination step, and (nc^av, nc^max) is the complexity under approximation-elimination search with the codebook size reducing progressively with each elimination step. Here, the main points to be noted are: (i) the initial complexity (nc_i^av, nc_i^max) is reduced significantly to (nc^av, nc^max) by the approximation-elimination search, for both the L1 and L∞ searches, consistently for all dimensions; (ii) while the average complexity of both the L1 and L∞ searches is practically the same, the L∞-based search has a significantly higher worst-case complexity than the L1-based search. Moreover, the worst-case complexity of the L∞-based search increases more significantly with increasing dimension than that of the L1-based search. The increasing difference between
L∞ and L1 in the worst-case initial codebook size nc_i^max can be particularly noted. This trend conforms with the general volume relationship shown in Fig. 6.

2.6. Discussion

The main shortcoming of the above search procedures is the high overhead complexity of the L1- or L∞-based approximation. The initial approximation step is essentially a full search using the L1 or L∞ distance. The cost incurred in the initial L1 value (b_i) computation and approximation is [0, NK + N(K − 1), (N − 1)] macs (multiplications, additions, comparisons) per vector. For the L∞-based approximation, this initial approximation cost is [0, NK, (N − 1) + N(K − 1)] macs per vector. In comparison to the L2-distance-based full-search costs, this initial approximation step itself has an additional overhead cost of N(K − 1) comparisons per vector for the L∞-based search and N(2K − 1) additions/subtractions per vector for the L1 approximation.

The other overhead costs in the above searches consist of two main parts:

(i) Initial elimination cost: the cost incurred in the first elimination step, after the computation of the distance between the initial codevector and the test vector. When carried out by direct comparisons in an unordered list, this has a cost of N scalar comparisons for both the L1- and L∞-based searches.

(ii) Subsequent approximation-elimination costs: these mainly depend on the size of the codebook after the first elimination step. As seen in Table 1, nc_i^av, the average number of codevectors retained after the first
Fig. 8. Performance of ¸ - and ¸ -based search for dimensions K"2, 4, 6 and 8 and codebook size N"1024; (a) average complexity, (b) worst-case complexity.
1508
V. Ramasubramanian, K.K. Paliwal / Pattern Recognition 33 (2000) 1497}1510
elimination is very small and hence the average overhead costs in subsequent approximation}elimination is practically negligible. The worst-case costs can however be high as this is proportional to ncY . Here, ¸ based search will incur lower overhead G cost in subsequent search due to the small ncY resultG ing from its higher elimination e$ciency.
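To make the approximation–elimination sequence concrete, the following sketch implements one pass of the search with L_1-based approximation and absolute-error-inequality elimination. It is a minimal reconstruction for illustration, not the authors' code: the codebook, the test vector, and the squared-distance bookkeeping are our assumptions, and the full algorithm re-applies approximation and elimination after every distance computation rather than fixing the candidate order once.

import numpy as np

def l1_approx_elim_search(x, codebook):
    # Approximation: order codevectors by their L1 distance to x (the b_i values).
    K = x.shape[0]
    b = np.abs(codebook - x).sum(axis=1)
    order = np.argsort(b)
    # The initial candidate defines the current nearest-neighbor ball (squared radius r2).
    best = order[0]
    r2 = float(np.sum((codebook[best] - x) ** 2))
    n_dist = 1
    for i in order[1:]:
        # AEI elimination: L2^2 >= L1^2 / K, so a codevector with
        # b^2 > K * r2 cannot lie inside the current nearest-neighbor ball.
        if b[i] ** 2 > K * r2:
            break  # b is non-decreasing along the scan and r2 only shrinks
        d2 = float(np.sum((codebook[i] - x) ** 2))
        n_dist += 1
        if d2 < r2:
            best, r2 = i, d2
    return best, n_dist

rng = np.random.default_rng(0)
cb = rng.normal(size=(1024, 8))        # N = 1024 codevectors, K = 8
x = rng.normal(size=8)
best, n_dist = l1_approx_elim_search(x, cb)
assert best == int(np.argmin(((cb - x) ** 2).sum(axis=1)))
print(best, n_dist)                    # typically far fewer than 1024 distances

The L_∞ (hypercube) variant replaces b with the maximum absolute coordinate difference and the elimination test with b^2 > r2, since L_∞ is a lower bound on L_2.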
The main complexity and overhead costs involved in the L_∞- and L_1-based search for dimension K = 8 and codebook size N = 1024, in terms of the number of macs (multiplications, additions, comparisons) per sample, are as follows. The main complexity (macs) per sample, corresponding to the number of codevector distances nc, is [nc, nc(2K-1)/K, (nc-1)/K]. The average and worst-case macs are obtained by substituting the average and maximum values (nc, nc') from Table 1. Using these values, the (average, worst-case) complexity of L_∞ search is [(6.1, 159), (11, 286), (0.63, 19.7)] and of L_1 search is [(1.8, 49), (8.2, 88), (0.1, 6)]. The overhead complexity obtained using the different overhead costs described above is [0, 1024, 1152] for L_∞ search and [0, 1920, 256] for L_1 search. The full-search macs are [1024, 1920, 127.9]. From this, it can be seen that in terms of main complexity, while the average complexities of L_∞- and L_1-based search are comparable, L_1-based search offers a much lower worst-case complexity; here this can be seen to be about a third of the complexity of L_∞-based search. In terms of the overhead complexity, L_1-based search has a higher cost of additions and subtractions than L_∞ search, but a significantly lower comparison cost. In practical applications where additions, subtractions and comparisons have the same computational cost and are less expensive than multiplications, L_1-based search will offer better complexity reduction than L_∞-based search. Thus, the search complexity reduction for both L_1- and L_∞-based search here is mainly with respect to reducing the NK multiplications per test vector to a low value. Though multiplication is usually considered the dominant operation in actual search, in hardware realizations where all three operations (multiplications, additions and comparisons) have equal weightage, this algorithm will not offer any significant savings, due to the high overhead complexity alone, despite the excellent reduction in the main complexity of the number of distances computed.

The high overhead computational complexity presents the main drawback of the L_1- and L_∞-based approximation–elimination search. This motivates the need to find alternate approximation procedures which incur a lower approximation cost than the procedures considered above. The partial-distance realization (as done in Ref. [12]) for the first L_1 approximation is one such solution, which can also be applied to the L_∞-based approximation. However, this results in the L_1 values of several codevectors being lower than the actual values and will reduce the efficiency of elimination using the L_1 values. For the L_∞ (hypercube)-based search, an alternate and computationally more attractive approximation–elimination procedure has been proposed and studied in detail in Refs. [52,44].
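The mac accounting above can be checked mechanically; the snippet below is our illustration of the cost model, which is exactly the per-sample triple [nc, nc(2K-1)/K, (nc-1)/K] stated in the text.

K, N = 8, 1024

def macs_per_sample(nc):
    # [multiplications, additions, comparisons] per sample for nc distance
    # computations in dimension K, including the nc - 1 comparisons needed
    # to track the running minimum.
    return (nc, nc * (2 * K - 1) / K, (nc - 1) / K)

print(macs_per_sample(N))      # full search: (1024, 1920.0, 127.875)
print(macs_per_sample(6.1))    # average L-infinity case: approx. (6.1, 11.4, 0.64)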
3. Conclusions

In this paper, we have considered algorithms based on elimination rules which use direct properties of the distance measure, such as partial-distance elimination, hypercube elimination, elimination based on the absolute-error-inequality approximation of the Euclidean distance, and triangle-inequality-based elimination, in an 'approximation–elimination' search framework. Here, in addition to using efficient elimination rules to reject codevectors without distance computation, explicit importance is given to the sequence in which the codevectors are examined in the search, as determined approximately by their relative closeness to the test vector. The algorithms proposed and studied in this paper under this framework are based on elimination rules derived from approximations of the Euclidean distance, namely hypercube elimination and elimination based on the absolute-error inequality. Previous algorithms based on these elimination rules are reviewed in the context of approximation–elimination search.

The main contribution of this paper is a comparative study of different approximation schemes and elimination constraints with reference to their approximation–elimination efficiency. The paper is particularly oriented towards providing additional geometric insight into the main approximation–elimination schemes in a unified framework, together with empirical studies characterizing the approximation–elimination efficiency of the different searches in detail. Algorithms based on the L_1 and L_∞ approximations of the L_2 norm have been considered with reference to the earlier work based on the hypercube constraint of the search space [9,11], the L_∞-based 'minmax' algorithm [10] and its subsequent improvement using absolute-error-inequality-based elimination [12]. These algorithms are viewed under the approximation–elimination framework, and geometric details not brought out originally in these algorithms are provided for additional insight. The absolute-error inequality is noted to be the general equivalence of the L_1 and L_2 norms, and the L_1- and L_∞-based elimination criteria are geometrically seen as based on the constrained minimization and maximization of the L_1 or L_∞ norm given the L_2 norm. Elimination based on the L_1 constraint has been shown to be more efficient than the L_∞ constraint, based on a geometric interpretation of the norm equivalence relationship and on the volume ratios of the L_1 surface and the L_∞ (hypercube) surfaces for a given L_2 ball. The relative efficiencies of these eliminations have been characterized and studied with respect to dimensionality in the context of vector quantization of speech waveforms.
4. Summary

In this paper, we provide an overview of fast nearest-neighbor search algorithms based on an 'approximation–elimination' framework under a class of elimination rules derived from approximations of the Euclidean distance. Here the search consists of successive approximation to the actual nearest neighbor using repeated application of two steps: (i) approximation: selecting a candidate codevector for testing as a possible successor to the current nearest neighbor, and (ii) elimination: rejecting codevectors which cannot be nearer to the given test vector than the current nearest neighbor, using elimination rules, without having to compute the distance from the test vector. The role of an explicit approximation step is to select codevectors which are as close as possible to the test vector using a computationally inexpensive criterion; it serves as an efficient alternative to choosing codevectors at random or in a prespecified sequence. Elimination involves the approximation of the current-nearest-neighbor ball by geometric constraints which are simple to compute, so as to allow easy elimination of codevectors which do not lie inside the current nearest-neighbor ball. Previous algorithms based on elimination rules, namely partial-distance elimination, hypercube elimination and absolute-error-inequality elimination, are reviewed in the context of approximation–elimination search. The main emphasis in this paper is a comparative study of these elimination constraints, with reference to their approximation–elimination efficiency, set within different approximation schemes.
References

[1] A. Gersho, V. Cuperman, Vector quantization: a pattern matching technique for speech coding, IEEE Commun. Mag. 21 (1983) 15–21.
[2] R.M. Gray, Vector quantization, IEEE ASSP Mag. 1 (1984) 4–29.
[3] J. Makhoul, S. Roucos, H. Gish, Vector quantization in speech coding, Proc. IEEE 73 (1985) 1555–1588.
[4] H. Abut, Vector Quantization, IEEE, Piscataway, NJ, 1990.
[5] A. Gersho, R.M. Gray, Vector Quantization and Signal Compression, Kluwer, Boston, 1992.
[6] W.B. Kleijn, K.K. Paliwal, Speech Coding and Synthesis, Elsevier, Netherlands, 1995.
[7] M. Barlaud et al., Special issue on vector quantization, IEEE Trans. Image Process. 5 (1996) 197–404.
[8] J.H. Friedman, F. Baskett, L.J. Shustek, An algorithm for finding nearest neighbours, IEEE Trans. Comput. 24 (1975) 1000–1006.
[9] T.P. Yunck, A technique to identify nearest-neighbours, IEEE Trans. System Man Cybernet. SMC-6 (1976) 678–683.
[10] D.Y. Cheng, A. Gersho, B. Ramamurthi, Y. Shoham, Fast search algorithms for vector quantization and pattern matching, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 1, 1984, pp. 9.11.1–9.11.4.
[11] M.R. Soleymani, S.D. Morgera, A high speed search algorithm for vector quantization, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 1987, pp. 1946–1948.
[12] M.R. Soleymani, S.D. Morgera, An efficient nearest-neighbour search method, IEEE Trans. Commun. COM-35 (1987) 677–679.
[13] E. Vidal, An algorithm for finding nearest neighbours in (approximately) constant average time complexity, Pattern Recognition Lett. 4 (1986) 145–157.
[14] R.C.T. Lee, Y.H. Chin, S.C. Chang, Application of principal component analysis to multi-key searching, IEEE Trans. Software Engng SE-2 (1976) 185–193.
[15] C.H. Papadimitriou, J.L. Bentley, A worst-case analysis of nearest-neighbor searching by projection, in: Automata Languages and Programming: Lecture Notes in Computer Science, Vol. 85, Springer, Berlin, 1980, pp. 470–482.
[16] K. Weidong, H. Zheng, Fast search algorithms for vector quantization, Proceedings of the International Conference on Pattern Recognition, 1986, pp. 1007–1009.
[17] S. Adlersberg, V. Cuperman, Transform domain vector quantization for speech signals, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 1987, pp. 1938–1941.
[18] J.S. Koh, J.K. Kim, Fast sliding search algorithm for vector quantisation in image coding, Electron. Lett. 24 (1988) 1082–1083.
[19] J. Bryant, A fast classifier for image data, Pattern Recognition 22 (1989) 45–48.
[20] S.H. Huang, S.H. Chen, Fast encoding algorithm for VQ-based image coding, Electron. Lett. 26 (1990) 1618–1619.
[21] L. Guan, M. Kamel, Equal-average hyperplane partitioning method for vector quantization of image data, Pattern Recognition Lett. 13 (1992) 693–699.
[22] S.W. Ra, J.K. Kim, Fast weight-ordered search algorithm for image vector quantization, Electron. Lett. 27 (1991) 2081–2083.
[23] J. Ngwa-Ndifor, T. Ellis, Predictive partial search algorithm for vector quantization, Electron. Lett. 27 (1991) 1722–1723.
[24] A. Nyeck, H. Mokhtari, A. Tosser-Roussey, An improved fast adaptive search algorithm for vector quantization by progressive codebook arrangement, Pattern Recognition 25 (1992) 799–802.
[25] A.P. Wilton, G.F. Carpenter, Fast search methods for vector lookup in vector quantisers, Electron. Lett. 28 (1992) 2311–2312.
[26] G. Poggi, Fast algorithm for full-search VQ encoding, Electron. Lett. 29 (1993) 1141–1142.
[27] C.D. Bei, R.M. Gray, An improvement of the minimum distortion encoding algorithm for vector quantization, IEEE Trans. Commun. COM-33 (1985) 1132–1133.
[28] K.K. Paliwal, V. Ramasubramanian, Effect of ordering the codebook on the efficiency of the partial distance search algorithm for vector quantization, IEEE Trans. Commun. COM-37 (1989) 538–540.
[29] Y. Linde, A. Buzo, R.M. Gray, An algorithm for vector quantization design, IEEE Trans. Commun. COM-28 (1980) 84–95.
[30] F.P. Fischer, E.A. Patrick, A preprocessing algorithm for nearest neighbour decision rules, Proceedings of the National Electronics Conference, Vol. 26, 1970, pp. 481–485.
[31] W.A. Burkhard, R.M. Keller, Some approaches to best-match file searching, Commun. ACM 16 (1973) 230–236.
[32] C.J. Van Rijsbergen, The best-match problem in document retrieval, Commun. ACM 17 (1974) 648–649.
[33] K. Fukunaga, P.M. Narendra, A branch and bound algorithm for computing k-nearest neighbours, IEEE Trans. Comput. 24 (1975) 750–753.
[34] B.A. Shapiro, The choice of reference points in best-match file searching, Commun. ACM 20 (1977) 339–343.
[35] I.K. Sethi, A fast algorithm for recognizing nearest neighbours, IEEE Trans. System Man Cybernet. SMC-11 (1981) 245–249.
[36] I. Kalantari, G. McDonald, A data structure and an algorithm for the nearest point problem, IEEE Trans. Software Engng. SE-9 (1983) 631–634.
[37] B. Kamgar-Parsi, L.N. Kanal, An improved branch and bound algorithm for computing k-nearest neighbours, Pattern Recognition Lett. 3 (1985) 7–12.
[38] E. Vidal, H.M. Rulot, F. Casacuberta, J.M. Benedi, On the use of a metric-space search algorithm (AESA) for fast DTW-based recognition of isolated words, IEEE Trans. Acoustics, Speech Signal Process. 36 (1988) 651–660.
[39] H. Niemann, R. Goppert, An efficient branch-and-bound nearest-neighbour classifier, Pattern Recognition Lett. 7 (1988) 67–72.
[40] S.H. Chen, J.S. Pan, Fast search algorithm for VQ-based recognition of isolated words, IEE Proceedings, Vol. 136, 1989, pp. 391–396.
[41] V. Ramasubramanian, K.K. Paliwal, An efficient approximation-elimination algorithm for fast nearest-neighbour search based on a spherical distance coordinate formulation, in: L. Torres-Urgell, M.A. Lagunas-Hernandez (Eds.), Signal Processing V: Theories and Applications (Proceedings EUSIPCO-1990, Barcelona, Spain), North-Holland, Amsterdam, 1990, pp. 1323–1326.
[42] D. Shasha, Tsong-Li Wang, New techniques for best match retrieval, ACM Trans. Inform. Systems 8 (1990) 140–158.
[43] M.T. Orchard, A fast nearest-neighbor search algorithm, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Toronto, Canada, 1991, pp. 2297–2300.
[44] V. Ramasubramanian, K.K. Paliwal, An efficient approximation-elimination algorithm for fast nearest-neighbor search, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, San Francisco, California, 1992, pp. I-89–I-92.
[45] M.L. Mico, J. Oncina, E. Vidal, An algorithm for finding nearest neighbors in constant average time with linear space complexity, Proceedings of the International Conference on Pattern Recognition, Hague, 1992, pp. 557–560.
[46] M.L. Mico, J. Oncina, E. Vidal, A new version and improvements of the nearest-neighbor approximating and eliminating search algorithm (AESA) with linear preprocessing time and memory requirements, Pattern Recognition Lett. 15 (1994) 9–18.
[47] E. Vidal, New formulation and improvements of the nearest neighbor approximating and eliminating search algorithm, Pattern Recognition Lett. 15 (1994) 1–8.
[48] M.L. Mico, Nearest-neighbor search algorithms in metric spaces, Ph.D. Thesis, Universidad Politecnica de Valencia, 1996.
[49] M.L. Mico, J. Oncina, E. Vidal, A fast branch and bound nearest neighbor classifier in metric spaces, Pattern Recognition Lett. 17 (1996).
[50] E. Vidal, F. Casacuberta, H. Rulot, Is the DTW 'distance' really a metric? An algorithm reducing the number of DTW comparisons in isolated word recognition, Speech Commun. 4 (1985) 333–344.
[51] E. Vidal, F. Casacuberta, J.M. Benedi, M.J. Lloret, H. Rulot, On the verification of the triangle inequality by DTW dissimilarity measures, Speech Commun. 7 (1988) 67–79.
[52] V. Ramasubramanian, Fast algorithms for nearest-neighbour search and application to vector quantization encoding, Ph.D. Thesis, Computer Systems and Communications Group, TIFR, Bombay, India, 1991.
About the Author: V. RAMASUBRAMANIAN received his B.Sc. degree (in Applied Sciences) in 1981 and his B.E. degree (in Electronics and Communications) from the Indian Institute of Science, Bangalore, India, in 1984. He joined the Computer Systems and Communications Group, Tata Institute of Fundamental Research, Bombay, India, in 1984 as a Research Scholar and obtained his Ph.D. degree (in Computer Science) from the University of Bombay in 1992. He was a visiting scientist at the Department of Informatic Systems and Computation, University of Valencia, Spain, from 1991 to 1992. Since 1993, he has been a Fellow in the Computer Systems and Communications (CSC) Group, TIFR. Presently he is an Invited Researcher at ATR Interpreting Telecommunications Research Laboratories, Kyoto, Japan, for the period 1996–1997. His main research interests are in speech coding, vector quantization-based coding algorithms, complexity reduction of vector quantization encoding, fast nearest-neighbor search algorithms, self-organizing neural networks, pattern recognition in metric spaces, and speech recognition.

About the Author: K.K. PALIWAL received the B.S. degree from Agra University, India, in 1969, the M.S. degree from Aligarh University, India, in 1971 and the Ph.D. degree from Bombay University, India, in 1978. Since 1993, he has been a Professor (Chair, Communication/Information Engineering) at Griffith University, Brisbane, Australia. He has worked at a number of organizations, including the Tata Institute of Fundamental Research, Bombay, India, the Norwegian Institute of Technology, Trondheim, Norway, the University of Keele, U.K., AT&T Bell Laboratories, Murray Hill, NJ, USA, and the Advanced Telecommunication Research (ATR) Laboratories, Kyoto, Japan. He has co-edited two books: Speech Coding and Synthesis (Elsevier, 1995) and Speech and Speaker Recognition: Advanced Topics (Kluwer, 1996). He has published more than 100 papers in international journals. He is a recipient of the 1995 IEEE Signal Processing Society Senior Award. He is an Associate Editor of the IEEE Transactions on Speech and Audio Processing and IEEE Signal Processing Letters. His current research interests include speech processing, image coding and neural networks.
Pattern Recognition 33 (2000) 1511–1524
Weighted matchings for dense stereo correspondence☆

Gabriel Fielding, Moshe Kam*

Data Fusion Laboratory, Department of Electrical and Computer Engineering, Drexel University, 3141 Chestnut St, Philadelphia, PA 19104, USA

Received 8 January 1999; accepted 1 June 1999
Abstract

The calculation of matches between pixels, points, or other features in stereo images is known as the correspondence problem. This problem is ill-posed due to occlusions: not every pixel, point or feature in one stereo image has a match in the other. Minimization of a cost function over some local region and dynamic programming algorithms are two well-known strategies for computing dense correspondences. However, the former approach fails in regions of low texture, while the latter imposes an ordering constraint which is not always satisfied in stereo images. In this study, we present two new techniques for computing dense stereo correspondence. The new methods are based on combinatorial optimization techniques which require polynomial computation time. The first method casts the selection of matches as the assignment problem, solved efficiently by finding a maximum weighted matching on a bipartite graph. The second is a greedy algorithm which computes suboptimal weighted matchings on the bipartite graphs. Both methods use occlusion nodes when no matches exist. The resulting disparity maps have desirable properties such as dense correspondence, while avoiding the drawbacks associated with ordering constraints. Three existing matching approaches are also reviewed for comparative purposes. We test all five techniques on real and synthetic stereo images using performance criteria which specifically measure occlusion detection. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Stereo vision; Dynamic programming; Matching algorithms; Bipartite graph; Weighted matching; Greedy matching
☆ This work was funded by an ONR-sponsored NDSEG research fellowship, by the National Science Foundation through grants ECS 9057587, ECS 9216588, and ECS 9512363, and by the Electric Power Research Institute through grant SF 958.
* Corresponding author. Tel.: +215-895-6920; fax: +215-895-1695. E-mail address: [email protected] (M. Kam).

1. Introduction

A fundamental problem in stereo vision is to determine where corresponding pixels, points, or other features occur in images taken from stereo cameras. Feature-based methods identify particular features (e.g., point, corner, line) in each image and perform feature matching, thus establishing correspondence on a sparse set of points in the image. Correlation-based stereo compares intensity values within local windows in stereo images in order to establish correspondence. The use of windows avoids numerous problems with feature extraction and results in a dense set of range information about the scene.

Also called area-based stereo by some authors [1].

The matching phase of a stereo algorithm often requires that a local feature (or local region) in one image be compared with several candidates in the other image in search of a match. In correlation-based stereo the first step is to get an integer estimate for d in the relation R(y, x_r + d) = L(y, x_l). As the true value of d may not be an integer, there are several algorithms that refine the estimate, using either interpolation [2], iterative schemes such as relaxation [3], or more complicated adaptive
matching techniques [4]. Their success often depends on the closeness of the initial estimate to the correct value of d. Most of these techniques are too slow for real-time stereo, and hence we focus here on deterministic algorithms that provide fast integer estimates of d.

A very popular approach to solving the correspondence problem in stereo is the minimization of sums-of-squared differences [5]. This method is easily implemented and has been used for real-time navigation of mobile robots [6]. Another matching approach [7] uses dynamic programming with a monotonic ordering constraint. Matching is carried out by minimizing a performance index for an entire epipolar line. Thus, locally 'good' matches may be ignored in favor of a matching that is 'better' for the whole line. The monotonic ordering constraint reduces the computational complexity but, as we shall show, it faces difficulties when narrow occluding objects are present.

In this paper we introduce two alternative matching techniques which maximize cost functions along the entire epipolar line. The first technique casts stereo matching as an integer programming problem whose solution can be found in polynomial time through a maximum weighted matching on a bipartite graph [8]. The second technique is a suboptimal greedy matching algorithm whose matching performance is comparable to both dynamic programming and maximum weighted matching, yet has lower computational complexity. The two new matching algorithms are compared to existing matching strategies using both synthetic and real images.
2. Camera geometry and terminology

We use the pinhole camera model [10] and assume that surfaces are Lambertian [11]. When a light ray from a source is reflected off a surface and passes through the pinhole lens of a camera onto the sensor array, we call the subsequent pixel representation of the reflected ray a projection of the surface. Hence, an arbitrary object in space (i.e., a wall, container) will be represented by a collection of pixels in the image. We make the assumption that every pixel is a projection of one surface. A more detailed description of image formation may be found in the literature for image synthesis (e.g., Ref. [12]).
The Hungarian method can also be used, with inferior bounds on complexity; see Ref. [9].

This is not true in general, because the size of the pixel allows it to collect light from a volume in space which may include two or more surfaces at different distances from the cameras. These pixels usually occur on the boundaries of an object's projection in the image and constitute only a small percentage of the total pixels in the image.
Fig. 1 shows the image planes of two cameras in a parallel stereo configuration. The lenses are represented by points f_l and f_r, which are drawn behind the image planes in order to simplify the projections. Suppose that we are looking for the point in space that created the projection in the right image given by point a. Any point along the ray formed by f_r a could have created that projection. Furthermore, we assume that the point in space that created point a has also created some point in the other image plane. The triplet f_r, a and f_l defines a unique plane. The intersection of this plane with the left image and the right image defines two lines known as epipolar lines. The epipolar lines represent the possible locations of corresponding projections. Thus, the set of pixels that need to be checked for similarity in the two images lie along predetermined lines. Given two corresponding epipolar lines, we seek to determine the correspondence for each pixel on the two lines.

A pair of stereo cameras can be registered such that corresponding epipolar lines are known in the images from each camera. This registration requires a calibration procedure (e.g., Refs. [13,14]). Here, we assume that registration has been performed already, and corresponding epipolar lines have been mapped to horizontal rows in the image. We denote the left image L(y_l, x_l) and the right image R(y_r, x_r). The data for an image is arranged as a uniformly spaced array of integers where each integer represents the brightness of the image at that array element (pixel). Since we assume that calibration has been done, we set y = y_r = y_l. The dimensions of the images are M×N (M rows, N columns). We denote the minimum disparity and maximum disparity as d_min and d_max, respectively, with the total range of disparities given by d = |d_max - d_min| + 1.
We de"ne the disparity map D (y, x ) as an integerP P valued array whose elements specify the o!set of a pixel in the right image to its `matcha in the left image (we use the right image as the reference). This means that given
Fig. 1. Epipolar lines (shown as dashed lines) in parallel stereo camera systems.
G. Fielding, M. Kam / Pattern Recognition 33 (2000) 1511}1524
1513
pixel (y, x ) in the right image, the matching pixel in the P left image is located at pixel (y, x #D (y, x )); that is, P P P x "x #D (y, x ). D (y, x ) is unde"ned for pixels which J P P P P P have no match. All matching algorithms in this paper compute integer-valued disparities. Since the true displacement is a real number, interpolation techniques can improve the precision of the disparity estimate [4].
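As a minimal illustration of this indexing convention (the sentinel for undefined entries is our assumption; the paper simply leaves D_r undefined at unmatched pixels):

import numpy as np

UNDEFINED = np.iinfo(np.int32).min   # assumed sentinel for "no match"

def match_in_left(D_r, y, x_r):
    # x_l = x_r + D_r(y, x_r); None where the disparity is undefined
    d = int(D_r[y, x_r])
    return None if d == UNDEFINED else x_r + d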
3. Similarity functions

To quantify the similarity of corresponding pixels in two images, we use the following popular similarity functions.

The normalized correlation (NC):

f_NC(y, x, d) = -\sum_{(i,j)\in W} \frac{(R(y+j, x+i) - \mu_R)\,(L(y+j, x+i+d) - \mu_L)}{\sigma_R \sigma_L}.   (1)

The sum-of-squared differences (SSD):

f_SSD(y, x, d) = \sum_{(i,j)\in W} [R(y+j, x+i) - L(y+j, x+i+d)]^2.   (2)

The sum-of-absolute differences (SAD):

f_SAD(y, x, d) = \sum_{(i,j)\in W} |R(y+j, x+i) - L(y+j, x+i+d)|.   (3)

In Eqs. (1)–(3), W is a window around the pixel (y, x) given by W = {(i, j): -p \le i \le p, -q \le j \le q}. Here, \mu_L, \mu_R are the mean pixel values over the left and right windows, and \sigma_L, \sigma_R are the standard deviations of the pixel values in the left and right windows, respectively. Each of the functions in Eqs. (1)–(3) takes a minimum value when the image intensities in the left and right windows are identical. For Eqs. (2) and (3) the minimum value is zero and for Eq. (1) this value is -1.
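Direct implementations of Eqs. (1)–(3) are straightforward. The sketch below is our illustration; the window half-sizes p, q and the normalization of Eq. (1) by the window norms (which yields the stated minimum of -1 for identical windows) are the only choices we have made.

import numpy as np

def _windows(R, L, y, x, d, p, q):
    wr = R[y - q:y + q + 1, x - p:x + p + 1].astype(float)
    wl = L[y - q:y + q + 1, x + d - p:x + d + p + 1].astype(float)
    return wr, wl

def f_ssd(R, L, y, x, d, p=1, q=1):          # Eq. (2)
    wr, wl = _windows(R, L, y, x, d, p, q)
    return np.sum((wr - wl) ** 2)

def f_sad(R, L, y, x, d, p=1, q=1):          # Eq. (3)
    wr, wl = _windows(R, L, y, x, d, p, q)
    return np.sum(np.abs(wr - wl))

def f_nc(R, L, y, x, d, p=1, q=1):           # Eq. (1): -1 for identical windows
    wr, wl = _windows(R, L, y, x, d, p, q)
    wr, wl = wr - wr.mean(), wl - wl.mean()
    return -np.sum(wr * wl) / np.sqrt(np.sum(wr ** 2) * np.sum(wl ** 2))

rng = np.random.default_rng(1)
L_img = rng.integers(0, 256, size=(32, 32))
R_img = np.roll(L_img, -3, axis=1)           # R(y, x) = L(y, x + 3): true d = 3
print(f_ssd(R_img, L_img, 10, 10, 3))        # 0.0
print(f_nc(R_img, L_img, 10, 10, 3))         # approximately -1.0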
4. Matching algorithms

In Fig. 2 we have arranged the left and right epipolar lines to show one representation of a surface observable in both stereo views. This depiction of matching between epipolar lines was first given in Ref. [15]. Since each element on an epipolar line is discrete because of the sampling process, we may use the coordinate along that line as an index into a two-dimensional array. The coordinates x_l and x_r specify a unique element in the array. We fill each element in the array with the value from the similarity function as described in Section 3. We call this array the cost matrix, c, where, given a similarity function f_m (where m stands for SSD, SAD, or NC) and a row of interest y:

c(x_l, x_r) = f_m(y, x_r, x_l - x_r).   (4)

Fig. 2. Epipolar line matching for stereo.

Any point on a surface in space that is 'matchable' will have an ordered pair (x_l, x_r) of left and right epipolar coordinates that specify the cost of the match through (4). This ordered pair also defines that point's location in world coordinates through the geometry of the stereo cameras. For example, P in Fig. 2 corresponds to the pairing (b, a). Hence, determination of the corresponding pair (x_l, x_r) determines the surface location. Since not every point in space is matchable (e.g., occlusions), some elements of the left and right epipolar lines have no corresponding match. To account for this, we define an occlusion cost that effectively screens out features with no good matches. Computation of this occlusion cost depends on the similarity function being used and is discussed in Appendix A.

We present five matching techniques. The first three (local search, the left–right heuristic, and dynamic programming) are from the stereo-matching literature; the last two (maximum weighted matching and greedy matching) are the contributions of this study.

4.1. Existing matching methods

The existing matching methods we review are local search [5,6], a modification of local search called the left–right heuristic [16,17], and dynamic programming [18,7].

4.1.1. Local search [5,6]

A widely used approach for determining disparity is to compute the values of a similarity function for each candidate pixel and to choose the disparity that corresponds to the minimum of that function. Using the right image as the reference for our disparity map, we compute
the disparity on row y as

D_r(y, x_r) = argmin_{d_min \le d \le d_max} c(x_r + d, x_r).   (5)

The similarity functions are not guaranteed to be unimodal, and the global minimum may not be unique. Furthermore, the correct match may correspond to a value other than a minimum of the similarity function. When there is very little texture in a scene, the similarity function may have a wide shallow shape with a plateau of indistinguishable minima. Various methods of coping with these limitations of the similarity functions have been attempted. One approach [19] is to require that the minimum of the similarity function be 'lower' than any other value by some percentage. This approach fails when the correct match is not the global minimum. In Table 1 we give the pseudo-code description for local search incorporating the threshold for the occlusion cost, C_O. If each pixel has d potential matches then finding the 'best' match through minimization requires d-1 comparisons. With N pixels on the epipolar line, the overall complexity of local search is O(dN).

4.1.2. The left–right heuristic [16,17]

Disparity values for corresponding pixels should be equal in absolute value but have opposite algebraic signs [17]. The search for a local minimum as described in Table 1 yields the minimum in each column of the epipolar cost matrix. Searching for a minimum along the rows of the cost matrix would yield a disparity with respect to the left epipolar line. This suggests an algorithm that requires the left and right disparities to agree; that is, if the minimum in row k occurs in column j, then we require that the minimum in column j be in row k. One stereo algorithm [16] used a strategy to refine disparity estimates by horizontally flipping the left and right images and then re-analyzing them, using the flipped left image as the new right image and the flipped right image as the new left image. Instead of refining estimates, we save those matches that agree between the right and left epipolar lines. We call this method the left–right heuristic (LRH). We define two vectors LeftMate and RightMate whose elements are the index of the matching pixel on the opposite epipolar line. If separate minimizations along rows and columns produced the same match then we would have RightMate[LeftMate[i]] = i for each row i. In Table 2 we give the pseudo-code description of the left–right heuristic, again including the occlusion cost.
Agreement between left and right disparities is less likely when the surface is sharply sloping away from the stereo cameras.
Table 1
Algorithm for local search with occlusion cost threshold

for j = 1 to N do
    if (min_k c(k, j) <= C_O) then
        RightMate[j] = argmin_k c(k, j)
    else
        RightMate[j] = UNMATCHED
Table 2
Left–right heuristic algorithm

for j = 1 to N do
    RightMate[j] = argmin_k c(k, j)
for i = 1 to N do
    LeftMate[i] = argmin_k c(i, k)
for j = 1 to N do
    if (LeftMate[RightMate[j]] != j) or (c(RightMate[j], j) > C_O) then
        RightMate[j] = UNMATCHED
As with local search, each of the N pixels on an epipolar line has d possible matches. For each of these pixels, we perform d-1 comparisons to find the minimum. The left–right heuristic requires minimization for each row and column in the cost matrix and then N comparisons to check for agreement, so the overall complexity is O(dN).

4.1.3. Dynamic programming [7,18]

Dynamic programming (DP) is a method of organizing an optimization problem to exploit the recursive structure of the necessary calculations [20]. Several studies have used this matching approach for stereo matching (e.g., Refs. [7,21]). Central to the use of DP is the monotonic ordering constraint, which states that if x_r^i matches x_l^j, then x_r^{i+1} can only match x_l^{j+k} with k >= 1. Without the monotonic ordering constraint, one is required to perform an exhaustive search. The computational complexity of the dynamic programming approach with the monotonic ordering constraint is O(dN) [7].

Fig. 3 shows a graph on which the shortest path between the vertices labeled S and T can be calculated using DP. The dotted edges allow for occlusions to occur; their weights are given by the occlusion cost C_O. We associate the weight from the epipolar cost matrix, c(x_l, x_r), with the diagonal (solid) edges. While a Dijkstra-type shortest-path algorithm [20] solves this problem, a more efficient algorithm is possible because of the regular structure of the graph. Here we show the solution using DP (see Table 3); for a more detailed discussion see Refs. [7,21]. Each vertex is labeled with its predecessor using the array P(i, j), so we can backtrack from vertex T = (N, N) to find which edges are in the shortest path. A horizontal edge corresponds to an occlusion in the right disparity map and a diagonal edge represents a matching in the disparity map.

Fig. 3. A graph representation of the stereo matching problem where the similarity between pixels is represented as a weight on the diagonal edges of the graph. The dotted edges allow for occlusions to occur and their weights are given by a cost parameter, C_O.

Since DP is effectively searching for a shortest path through a graph, any match declared on an epipolar line affects subsequent matches. Of critical importance in DP solutions to stereo matching is the choice of the cost of occlusion, C_O, since performance can vary significantly when C_O is changed. Setting C_O too large results in no occlusions: the shortest path follows the direct diagonal path from S to T. Setting C_O too low results in selecting occlusions only. In Ref. [7], the authors propose a maximum-likelihood framework for selecting the value of C_O. In Appendix A, we expand on this method to suggest a cost which allows operation with lower SNR than provided for in Ref. [7].

The monotonic ordering constraint effectively keeps local disparities near each other without using a regularization framework [22] or a relaxation technique [3], both of which are time consuming. However, the monotonic ordering constraint is not satisfied where the occluding objects are narrow. Use of DP under these conditions often results in a disparity map that partially detects or completely misses narrow objects. In Fig. 4(a) we show a matching for two epipolar lines. Segment p can be visualized as the surface of a narrow pillar which is visible in both images. The right epipolar line has an occluded region q which has no matches along the left epipolar line. The monotonic ordering constraint forces us to choose either path 1 or path 2 as shown in Fig. 4(b). The surface corresponding to path 1 'detects' the object but fails to correctly match a large portion of the surface further from the cameras. The surface corresponding to path 2 fails to 'detect' the object completely.

Consider the synthetic images in Fig. 5(a) and (b). Here, three narrow occluding objects are present in a fairly textured environment. The right obstacle poses a challenge for DP for the reasons just explained. We compare the true disparity map in Fig. 5(c) with results from a dynamic programming-based matching algorithm in Fig. 6; clearly the right-hand side obstacle is mostly undetected.
Table 3
Algorithm for dynamic programming-based stereo matching (adapted from Ref. [7])

for i = 0 to N do P(i, 0) = 2; V(i, 0) = i*C_O;   (initialize 1st column of vertices)
for j = 0 to N do P(0, j) = 3; V(0, j) = j*C_O;   (initialize 1st row of vertices)
for i = 1 to N do
    for j = 1 to N do
        v1 = V(i-1, j-1) + c(i, j)                (matching edge cost)
        v2 = V(i-1, j) + C_O                      (vertical edge / occlusion)
        v3 = V(i, j-1) + C_O                      (horizontal edge / occlusion)
        V(i, j) = min(v1, v2, v3)                 (choose shortest path)
        P(i, j) = argmin(v1, v2, v3)              (assign predecessor: 1, 2, or 3)
i = N; j = N;
while (j > 0)                                     (backtrack from vertex T)
    if P(i, j) == 1 then RightMate[j] = i; i = i-1; j = j-1;
    else if P(i, j) == 2 then i = i-1;
    else RightMate[j] = UNMATCHED; j = j-1;
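A direct transcription of Table 3 into runnable form is given below. It is a sketch under our own index conventions (a 1-based padding of the cost matrix and -1 for UNMATCHED), not the authors' code:

import numpy as np

UNMATCHED = -1

def dp_match(c, C_O):
    # c is (N+1) x (N+1), 1-indexed: c[i, j] = cost of matching left pixel i
    # with right pixel j; row and column 0 are padding.
    N = c.shape[0] - 1
    V = np.zeros((N + 1, N + 1))
    P = np.zeros((N + 1, N + 1), dtype=int)
    V[:, 0] = np.arange(N + 1) * C_O; P[:, 0] = 2   # 1st column of vertices
    V[0, :] = np.arange(N + 1) * C_O; P[0, :] = 3   # 1st row of vertices
    for i in range(1, N + 1):
        for j in range(1, N + 1):
            v = (V[i - 1, j - 1] + c[i, j],   # 1: matching (diagonal) edge
                 V[i - 1, j] + C_O,           # 2: vertical edge / occlusion
                 V[i, j - 1] + C_O)           # 3: horizontal edge / occlusion
            k = int(np.argmin(v))
            V[i, j], P[i, j] = v[k], k + 1
    right_mate = np.full(N + 1, UNMATCHED)
    i = j = N
    while j > 0:                              # backtrack from vertex T = (N, N)
        if P[i, j] == 1:
            right_mate[j] = i; i -= 1; j -= 1
        elif P[i, j] == 2:
            i -= 1
        else:
            j -= 1                            # right pixel j is occluded
    return right_mate[1:]

rng = np.random.default_rng(3)
c = np.pad(rng.random((5, 5)), ((1, 0), (1, 0)))
print(dp_match(c, C_O=0.4))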
Fig. 4. (a) One epipolar line with two occluding objects and an occluded region q created by the object p; (b) the DP solution has two possible paths: choosing (1) leaves a large region unmatched and choosing (2) leaves the surface undetected.
Fig. 5. A synthetic stereo pair with narrow objects, (a) left camera, (b) right camera, and (c) the true integer disparity map with respect to the right image.
Fig. 6. Disparity map generated via the DP approach presented in Ref. [7] using SSD with 3×3 windows.
4.2. New matching methods

In this section we describe two new matching strategies for calculating dense correspondences in the epipolar array. Both methods maximize a cost function over the epipolar array; the first method computes the maximum weighted matching while the second method uses a greedy strategy to find a near-optimal maximum weighted matching.

4.2.1. Maximum weighted matching and the Hungarian method

Fig. 7. A representation of the correspondence problem as a weighted graph.

Fig. 7 shows a graph representation of the matching problem. Pixels from the left and right epipolar lines are drawn as nodes (circles) and potential matches between those pixels are drawn as edges (lines). With each edge we associate the weight a(x_l, x_r) = C_M - c(x_l, x_r), where C_M is an upper bound on the appropriate similarity function. We have also linked an 'occlusion' node to each node in the right epipolar line with edge cost C_M - C_O. By doing this,
we can include the occlusion cost as a matchable node. The calculation of disparities can then be viewed as a weighted matching problem on a bipartite graph. We desire to find the set of edges for which the sum of the chosen edge weights is maximum while no two edges touch the same vertex (pixel). Finding an optimal matching for bipartite graphs is known in the combinatorial optimization literature as the maximum weighted matching (MWM) problem [23]. Formally, a bipartite graph G = (V, U, E) has two distinct sets of nodes (pixels), V and U, and a set of edges E representing potential matches. Associated with each edge e ∈ E is a cost c_e. This cost is the same as defined earlier, where c_e = a(x_l, x_r) with x_r ∈ V and x_l ∈ U. A matching is defined as a subset of edges Ẽ ⊆ E such that no two edges share a vertex. A maximum weighted matching has \sum_{e∈Ẽ} c_e maximum.

Alternatively, we could have linked the occlusion nodes to the left epipolar line; see the footnote below.

We define an indicator matrix, g(x_l, x_r), where g(x_l, x_r) is 1 if we decide that the pixels (y, x_r) and (y, x_l) match and 0 if they do not. We require that all matches be unique; namely, if g(x_l, x_r) = 1, then g(k, x_r) = 0 for all k ≠ x_l. Likewise, if g(x_l, x_r) = 1, then g(x_l, k) = 0 for all k ≠ x_r. We can cast the problem of choosing assignments for pairs of pixels as the following integer programming problem:

Maximize: \sum_{x_l} \sum_{x_r} [a(x_l, x_r) g(x_l, x_r) + (1 - g(x_l, x_r))(C_M - C_O)]

Subject to: g(x_l, x_r) >= 0 for all x_l, x_r,
\sum_{x_r} g(x_l, x_r) = 0 or 1, for all x_l,
\sum_{x_l} g(x_l, x_r) = 0 or 1, for all x_r.   (6)
In an earlier study [9] we solved this integer programming problem in polynomial time by using the 'Hungarian method' of König and Egerváry (e.g., Ref. [24]). The algorithm is based on the primal-dual methods given by Lawler [8] and terminates with a matching of maximum weight, but not necessarily one of maximum cardinality. The complexity of the algorithm is O(N³), where N is the number of pixels on an epipolar line (see Table 4).
Alternatively, we could have linked the occlusion nodes to the left epipolar line, which would result in a different matching; however, our experiments have shown that the differences in the final matchings using the two linking schemes are insignificant. The same is true for adding both left and right occlusion nodes, with each edge having the occlusion cost C_M - (C_O/2).
Table 4
Maximum weighted matching

create U and V vertices (including occlusion vertices)
assign edge weights, E, between vertices using c(x_l, x_r) or C_O
construct graph, G = (V, U, E)
compute maximum weighted bipartite matching on G
for each node v ∈ V, if mate[v] ∈ occlusion vertices then mate[v] = UNMATCHED
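The paper uses the Hungarian method [9] and an efficient library routine [25]. As an independent sketch (not the authors' implementation), the same matching, including the per-pixel occlusion nodes, can be obtained from SciPy's rectangular assignment solver; the large negative constant standing in for absent edges is our assumption:

import numpy as np
from scipy.optimize import linear_sum_assignment

UNMATCHED = -1
NO_EDGE = -1e12   # stands in for a forbidden pairing

def mwm_match(c, C_M, C_O):
    # Rows are right pixels.  Columns 0..N-1 are left pixels with weight
    # a = C_M - c; columns N..2N-1 are one occlusion node per right pixel
    # with weight C_M - C_O, as in Fig. 7.
    N = c.shape[1]
    a = np.hstack([C_M - c.T, np.full((N, N), NO_EDGE)])
    np.fill_diagonal(a[:, N:], C_M - C_O)
    rows, cols = linear_sum_assignment(a, maximize=True)
    mate = np.full(N, UNMATCHED)
    for r, col in zip(rows, cols):
        if col < N:                 # matched to a real left pixel
            mate[r] = col
    return mate

rng = np.random.default_rng(4)
c = rng.random((6, 6))              # toy cost matrix c[x_l, x_r]
print(mwm_match(c, C_M=1.0, C_O=0.5))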
The integer programming problem in Eq. (6) and the weighted bipartite matching problem are equivalent [24]. We use a graph algorithm library [25] with an efficient implementation of a bipartite maximum weighted matching algorithm whose complexity is O(dN²). Hereafter we shall refer to both of these approaches as maximum weighted matchings (MWMs).

4.2.2. A greedy algorithm for weighted matchings

Greedy algorithms represent a well-known alternative to global optimization procedures [20]. We show how to apply a greedy strategy to stereo matching using the epipolar cost matrix. This method builds a matching by progressively adding the edges with maximum weights in the cost matrix a(x_l, x_r) that satisfy the criteria for a feasible matching. It differs from the maximum weighted matching approach in that the final matching weight is not guaranteed to be maximum. Fig. 8 shows an example of how a greedy matching can differ from a maximum weighted matching. The resulting matching is not guaranteed to be globally optimal; however, the matching given by the greedy algorithm is typically very close to the optimal matching, and the algorithm has lower computational complexity.
Fig. 8. The greedy matching approach. Given the graph on the left with four edges, there are two possible matchings. The matching in the middle is the maximum weighted matching, whose weights total 9. The matching on the right is the greedy matching, with weight 8. A greedy matching strategy would first add the edge with the largest weight (in this case, the edge with value 6). The edges with weights 5 and 4 are considered next but do not satisfy the criterion for a valid matching. This leaves only the edge with weight 2 to be added, for a total matching of weight 8.
Table 5
Greedy algorithm for stereo matching

for j = 1 to N do RightMate[j] = UNMATCHED    (initialize labels)
S = sort(c(x_l, x_r))                         (sort costs by value)
while S is not empty do
    remove maximum edge a(i, j) from S        (pick best edge)
    if (a(i, j) < C_M - C_O) then break        (if below threshold, done)
    if (RightMate[j] == UNMATCHED) then       (otherwise, if not matched)
        RightMate[j] = i                      (assign match)
Given a bipartite graph G = (V, U, E) and an associated cost c_e with each edge e ∈ E, the greedy matching algorithm proceeds as follows. Let Ẽ = ∅ and M = E. While there exists e ∈ M, e ∉ Ẽ, such that Ẽ ∪ {e} is a feasible matching, let e* = argmax{c_e : e ∈ M}, and set Ẽ = Ẽ ∪ {e*} and M = M - {e*}. An implementation of this algorithm is given in Table 5. Since there are about dN edges, the running time of the sorting phase is O(dN log(dN)). The list-scanning operation takes O(dN) operations. So the overall complexity is O(dN log(dN)) for each epipolar line.
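The greedy procedure of Table 5 in runnable form is sketched below; we additionally mark the left endpoint as used so that the result is a valid matching, bookkeeping which Table 5 leaves implicit:

import numpy as np

UNMATCHED = -1

def greedy_match(c, C_M, C_O):
    a = C_M - c                               # edge weights, rows x_l, cols x_r
    order = np.argsort(a, axis=None)[::-1]    # all edges, best first (the sort phase)
    right_mate = np.full(c.shape[1], UNMATCHED)
    left_used = np.zeros(c.shape[0], dtype=bool)
    for e in order:
        i, j = np.unravel_index(e, a.shape)
        if a[i, j] < C_M - C_O:               # below the occlusion weight: stop
            break
        if right_mate[j] == UNMATCHED and not left_used[i]:
            right_mate[j] = i                 # edge (i, j) is feasible: add it
            left_used[i] = True
    return right_mate

rng = np.random.default_rng(5)
c = rng.random((6, 6))
print(greedy_match(c, C_M=1.0, C_O=0.5))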
5. Performance assessment

In order to quantify the performance of the five matching approaches, we have run the algorithms on synthetic 8-bit greyscale stereo images with random intensities and flat surfaces with random disparities. The images are 128×128 pixels with the intensity of background pixels uniformly distributed between 0 and 255. The resulting image is filtered with a 5×5 pixel Gaussian filter [26] to produce a smoothly varying pattern. Each image has between 1 and 10 objects in it. Each object has a random height and width, between 5 and 20 pixels. Each object has a distinct intensity pattern generated in the same way as the background pattern, and the object has a random disparity (between 5 and 20 pixels). The disparity search range for all algorithms is between 0 and 39 pixels (d = 40). All algorithms used 3×3 windows and SSD as the similarity function. The cost of occlusion was set to C_O = 542 as discussed in Appendix A. In Fig. 9 we show a sample of two of the random stereo images that we have used, along with the true disparity map. Needless to say, the synthetic images that we experimented with do not exhaust all possible test data; however, they do provide quantitative measures to assess matching-algorithm performance when subject to noisy stereo data with occlusions present.

Fig. 9. A textured synthetic stereo pair [128×128], (a) left, (b) right, and (c) the true integer disparity map with respect to the right image. Occluded regions are marked in black.

In the computed disparity image, we have two types of labels for each pixel: occluded and non-occluded. Only pixels with a non-occluded label can be associated with a disparity. These two labels can be interpreted as corresponding to the two hypotheses: H_0, occluded, and H_1, not occluded. A match is a binary decision: D_0, the pixel is occluded, and D_1, the pixel is not occluded. For our synthetic stereo images we know the true matching labels for all pixels and we compute the statistics P(D_1|H_1) and P(D_1|H_0). The statistic P(D_1|H_1) is the probability of correct detection (CD) and P(D_1|H_0) is the probability of false alarm (FA). In the case of H_1 and D_1, we compute the mean-squared error (MSE) as a measure of matching accuracy.

In Table 6 we show the performance of the matching algorithms with C_O = 542 (see Appendix A) averaged over 2000 randomly generated stereo pairs. The first column gives the false alarm rate, the second column the correct detection rate and the third column shows the mean-squared error. The fourth column gives the computational complexities of the five algorithms. Local search has the highest false alarm rate as well as the highest detection rate and largest mean-squared error. Compared to local search, the left–right heuristic improves the false alarm rate significantly; the reason is that the left and right disparities disagree in regions of occlusion and the LRH detects these disagreements.
Table 6
Performance of matching algorithms on the textured synthetic stereo test set

Algorithm   False alarm   Correct detection   MSE      Complexity
LS          0.1783        0.9449              4.2920   O(dN)
LRH         0.0404        0.7604              2.2887   O(dN)
DP          0.0735        0.9166              1.4239   O(dN)
MWM         0.0543        0.8832              2.4745   O(dN²)
Greedy      0.0540        0.8882              2.9337   O(dN log(dN))

Detection of occlusions comes at the expense of a lower rate of correct detections, since some correct matching decisions are wrongly labeled occlusions. Dynamic programming has a low probability of false alarm, a high probability of correct detection, and a low MSE. MWM has a lower probability of false alarm than DP but also a lower probability of correct detection. The greedy algorithm's performance is virtually identical to that of MWM.

Since algorithm performance is dependent on variations in the occlusion cost, we have also computed the ROC (receiver operating characteristic) for the matching algorithms as C_O varies, as well as the variation in MSE with C_O. These plots are given in Fig. 10. In Fig. 10(a) we have plotted the probability of correct detection against the false alarm rate with C_O as a parameter. The global matching strategies cluster together, yielding similar performances, while local search (the innermost curve) has the worst performance. LRH initially stays close to the performance of the global matching approaches and then levels off quickly. While its false alarm rate never exceeds 0.10, its correct detection rate also never reaches 0.8. In Fig. 10(b) we show the mean-squared error as a function of the parameter C_O. For values of C_O around the threshold of 542 (computed in Appendix A), DP, LRH, and MWM each have relatively low MSE. LRH has a low MSE over a large range of occlusion costs. The greedy algorithm, while similar to MWM, has a consistently higher MSE for a given C_O. While local search initially has the highest MSE, the MSE for dynamic programming continues to rise with the threshold C_O and eventually surpasses even local search. This is a result of no occlusions being chosen for large C_O; hence, the shortest path becomes the direct path from S to T in Fig. 3. All other algorithms have bounded errors as C_O varies. One conclusion of Table 6 and Fig. 10 is that the two new algorithms will be preferred for values of C_O exceeding 3000.

Fig. 10. Performance curves for the matching algorithms in this paper when applied to images of the type shown in Fig. 9. The receiver operating characteristic in (a) gives probability of correct detection versus probability of false alarm. Mean-squared error versus occlusion cost is shown in (b).

In Table 7 we show performance statistics at a fixed false alarm rate; we also give the occlusion cost for which that false alarm rate was achieved. The left–right heuristic is omitted because it cannot operate in the region specified. In this case, the global matching approaches offer superior performance.

Table 7
Performance measures for a constant false alarm rate of 0.10

Algorithm   False alarm   Correct detection   MSE      Occlusion cost
LS          0.10          0.8355              3.7978   386.9
DP          0.10          0.9282              2.0439   851.2
MWM         0.10          0.9070              3.1332   1173.3
Greedy      0.10          0.9172              3.8388   1178.5

While these statistics provide measures for the performance of different matching algorithms, none of them fully characterizes matching performance. For example, as was pointed out with DP, narrow occluding objects may be completely missed while maintaining an acceptable average correct detection rate. In the next section, we examine the qualitative performance of the matching algorithms using a variety of real and synthetic images.

6. Examples

6.1. Narrow occluding objects

One of the major benefits of dynamic programming is that the ordering constraint is usually satisfied in real-world environments; however, narrow occluding objects in the foreground of the image pose a problem, as discussed in Section 4.1.3. In Fig. 11, we show synthetic stereo images that have two narrow occluding objects. The results of stereo analysis with a disparity range of 40 pixels and C_O = 4875 are shown in Fig. 12. Minimization with occlusion estimation in Fig. 12(a) still mislabels all of the points in the occluded regions, producing 'ghosts'. The left–right heuristic in Fig. 12(b) produces a disparity map with fewer false alarms in the occluded regions yet fewer correct detections of matchable pixels. Dynamic programming in Fig. 12(c) has failed to detect most of the left obstacle. Both MWM and the greedy algorithm, in Figs. 12(d) and (e), respectively, correctly match both obstacles, although they have more 'noise' artifacts as a result of their matching strategy. In situations like these, the new algorithms have a clear advantage over existing techniques.

6.2. Low-texture regions

Correlation-based stereo methods encounter difficulties with regions which have very little texture, because the similarity functions produce multiple minima. In Fig. 13 we present a rendered scene where the floor is relatively featureless except for several shadows. The true disparity map is given in Fig. 13(c) with occluded regions marked in black. In Fig. 14 we show the disparity maps generated using the five methods described in this paper. All algorithms use SSD with 3×3 windows, d = 40, and C_O = 4870. The disparity map generated using local search is given in Fig. 14(a). The majority of the floor in the scene is matched incorrectly. The correct disparity is found where there are shadows cast by the objects (providing intensity variation which allows for a correct matching). Outside these regions, the local search fails
to "nd the correct matches. In Fig. 14(b) the left}right heuristic has eliminated false matches on the #oor. However, we are left with very little disparity information in this region. Dynamic programming has correctly match-
ed the #oor, although artifacts from the matching process result in some incorrect matches on the left and right obstacles. Fig. 14(d) show that MWM correctly matches most of the #oor. The disparity map from the greedy
G. Fielding, M. Kam / Pattern Recognition 33 (2000) 1511}1524 Table 7 Performance measures for a constant false alarm rate of 0.10
LS DP MWM Greedy
False alarm Correct detection
MSE
Occlusion cost
0.10 0.10 0.10 0.10
3.7978 2.0439 3.1332 3.8388
386.9 851.2 1173.3 1178.5
0.8355 0.9282 0.9070 0.9172
1521
algorithm is shown in Fig. 14(e). While similar to MWM, the matching performance on the #oor is not as good as that provided by MWM. In Table 8 we have computed the performance statistics described in the previous section for the disparity maps in Fig. 14. These statistics show that the algorithms o!er similar performance characteristics on these rendered images as for the synthetic images in the previous section. Although LRH has the lowest MSE, it only matches about half of the points in the image. Both MWM and DP have low MSE (relative to local search).
Fig. 11. Synthetic images [128×128 pixels] with two narrow occluding objects: (a) left, (b) right, and (c) truth with occlusions marked in black.
Fig. 12. Disparity maps using: (a) local search, (b) left–right heuristic, (c) dynamic programming, (d) maximum weighted matching, and (e) greedy algorithm.
Fig. 13. A synthetic stereo pair with low-texture regions, (a) left, (b) right, and (c) the true integer disparity map with respect to the right image.
Fig. 14. Disparity maps using: (a) local search, (b) left–right heuristic, (c) dynamic programming, (d) maximum weighted matching, and (e) greedy algorithm.
Table 8
Performance measures for the disparity maps in Fig. 14 compared to the true disparity in Fig. 13

Algorithm   False alarm   Correct detection   MSE
LS          0.57          0.94                28.1
LRH         0.22          0.56                7.6
DP          0.41          0.95                7.7
MWM         0.30          0.90                9.4
Greedy      0.29          0.89                13.0

Fig. 15. Real stereo images (a) left, and (b) right [images are 256×256].
The greedy algorithm has a false alarm rate and a correct detection rate similar to MWM, with a higher MSE.

6.3. Natural images

In Fig. 15, we show two real stereo images [128 pixels square] of a terrain with trees, bushes and stumps. We analyze these images using 7×7 windows and normalized correlation as in Eq. (1), with d = 60 and C_O = 0.4. The matching results are shown in Fig. 16, where all disparity maps have been post-processed with a 5×5 median filter to eliminate minor false matches. Local search, shown in Fig. 16(a), shows good performance, but there are several matching artifacts. Most of these errors are eliminated in Fig. 16(b) using the left–right heuristic; however, the coverage is less complete. DP, shown in Fig. 16(c), has very good performance, as do MWM in Fig. 16(d) and the greedy algorithm shown in Fig. 16(e). In environments such as this, where narrow occluding objects are rare, the similar performance of the global matching methods favors selection of DP because of its speed advantage.

For DP we use C̃_O = 0.2 to elicit 'better' performance. Given C_O for any of the matching algorithms except DP, obtaining a 'comparable' matching requires setting C̃_O = C_O/2. This is because DP can traverse two edges, effectively omitting a match at a given location.
7. Conclusions

We have presented two new matching algorithms for solving the correspondence problem in stereo vision. They are based on formal combinatorial optimization techniques for finding matchings on graphs. These algorithms were compared to three existing stereo matching techniques using a set of performance criteria which address correct detections, false alarms in occluded regions, and matching accuracy (mean-squared error). In general, the global matching schemes (dynamic programming, maximum weighted matching, and the greedy algorithm) show superior performance to the local matching methods (local search and the left-right heuristic). Of the global matching approaches, dynamic programming is the fastest. As long as the ordering constraint is not violated, dynamic programming is the matching algorithm of choice. In unstructured environments where narrow occluding objects may be present, maximum weighted matching and greedy matching are preferred over dynamic programming.
Fig. 16. Disparity maps generated from the stereo images in Fig. 15 using 7×7 windows and the similarity function in Eq. (1), using (a) local search, (b) left-right heuristic, (c) dynamic programming, (d) maximum weighted matching, and (e) greedy algorithm.
Appendix A. Computing the cost of occlusion

In Ref. [27] we describe a method of computing the cost of occlusion. This method is an expansion of a previous study by Cox et al. [7]. We use Eq. (2) and compute the occlusion cost based on the desired probability of detection P_D and the sensor noise σ.

Let z_{1,i} and z_{2,i} be measurements from corresponding epipolar lines in the two cameras. If the measurements are matching (i.e., they are projections of the same point/surface in space), we assume that the difference between the measurements is an independent normal random variable, z_{1,i} − z_{2,i} ~ N(0, σ²). We define x_i = (z_{1,i} − z_{2,i})², so the random variable x_i has a gamma distribution [28],

f_X(x) = (c^β / Γ(β)) · x^(β−1) · e^(−cx),   (A.1)

where β = 1/2 and c = 1/(2σ²). Knowing the underlying distribution (A.1) of the matching cost, the probability P_D of obtaining a matching cost between 0 and Ĉ_O can be calculated through the integral

P_D = ∫₀^{Ĉ_O} (1/√(2πσ²)) · x^(−1/2) · e^(−x/(2σ²)) dx = γ(1/2, Ĉ_O/(2σ²)),   (A.2)
where γ is the incomplete gamma function, defined by

γ(β, x) = (1/Γ(β)) · ∫₀^x t^(β−1) e^(−t) dt.

Conversely, we may fix P_D in (A.2) and compute Ĉ_O through tabulated percentile values of the cumulative gamma distribution or through numerical evaluation of the integral in Eq. (A.2). In this case, each matching instance may be considered a hypothesis test in which we compare the test statistic (the matching cost) to the threshold Ĉ_O determined by P_D. The significance of the hypothesis test would be 1 − P_D.
Let the current epipolar line be j_0. The neighborhood K_{s,i,j_0}(p, q) of a feature z_{s,i,j_0} in camera s, on epipolar line j_0 and at location i, is defined as:

K_{s,i,j_0}(p, q) = { z_{s,i+i',j_0+j'} : −p ≤ i' ≤ p, −q ≤ j' ≤ q }.

For the epipolar line j_0, we compare corresponding features in the neighborhoods K_{1,i_1,j_0}(p, q) and K_{2,i_2,j_0}(p, q). The cost of matching two features is defined as the sum of squared differences over the neighborhood (i.e., a 2p+1 by 2q+1 rectangular window of pixels),

g(i_1, i_2) = Σ_{i'=−p}^{p} Σ_{j'=−q}^{q} ( z_{1,i_1+i',j_0+j'} − z_{2,i_2+i',j_0+j'} )².   (A.3)

The local window has n = (2p+1)(2q+1) elements, and the difference between matching features is assumed to be an independent and identically distributed normal random variable, N(0, σ²). Each term in the sum is a random variable having a gamma distribution with parameters β = 1/2 and c = 1/(2σ²). Hence, the sum in Eq. (A.3) again has a gamma distribution (A.1), with parameters β = n/2 and c = 1/(2σ²). The probability of detection P_D and the cost of occlusion Ĉ_O are related through the incomplete gamma function
P_D = γ( n/2, Ĉ_O/(2σ²) ).   (A.4)
If we specify the probability of detection P_D, then we can compute the cost of occlusion from the percentile values of the gamma distribution function. For example, with n = 9, P_D = 0.99, and σ = 5 we have C_O = 541.65 ≈ 542; with n = 9, P_D = 0.99, and σ = 15 we have C_O = 4874.84 ≈ 4875.
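The numbers above are easy to reproduce: the sketch below computes the occlusion cost as the P_D-percentile of the gamma distribution in (A.1), using SciPy's standard gamma distribution with shape n/2 and scale 2σ² (equivalently, rate 1/(2σ²)). The script is only an illustration of Eq. (A.4), not the authors' code.

    from scipy.stats import gamma

    def occlusion_cost(n, p_d, sigma):
        # Percentile of a gamma distribution with shape n/2 and scale 2*sigma**2,
        # matching beta = n/2 and c = 1/(2*sigma**2) in (A.1).
        return gamma.ppf(p_d, a=n / 2.0, scale=2.0 * sigma ** 2)

    print(occlusion_cost(9, 0.99, 5))   # ~541.65, rounded to 542 in the text
    print(occlusion_cost(9, 0.99, 15))  # ~4874.84, rounded to 4875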
References

[1] U.R. Dhond, J.K. Aggarwal, Structure from stereo: a review, IEEE Trans. Systems Man Cybernet. 19 (6) (1989) 1489-1510.
[2] L. Matthies, R. Szeliski, T. Kanade, Kalman filter-based algorithms for estimating depth from image sequences, Int. J. Comput. Vision 3 (3) (1989) 209-238.
[3] G. Pajares, J.M. Cruz, J. Aranda, Relaxation by Hopfield network in stereo image matching, Pattern Recognition 31 (5) (1998) 561-574.
[4] T. Kanade, M. Okutomi, A stereo matching algorithm with an adaptive window: theory and experiment, IEEE Trans. Pattern Anal. Mach. Intell. 16 (9) (1994) 920-932.
[5] T. Kanade, Development of a video rate stereo machine, Proceedings of the ARPA Image Understanding Workshop, November 1994, pp. 549-558.
[6] B. Ross, A practical stereo vision system, Proceedings of the IEEE Computer Vision and Pattern Recognition, 1993, pp. 148-153.
[7] I. Cox, S. Hingorani, S. Rao, B. Maggs, A maximum likelihood stereo algorithm, CVGIP: Image Understanding 63 (3) (1996) 542-567.
[8] E. Lawler, Combinatorial Optimization, Holt, Rinehart and Winston, New York, NY, 1976.
[9] G. Fielding, M. Kam, Applying the Hungarian method to stereo matching, Proceedings of the IEEE Conference on Decision and Control, December 1997, pp. 549-558.
[10] B.K.P. Horn, Robot Vision, MIT Press, Cambridge, MA, 1983.
[11] D. Ballard, C. Brown, Computer Vision, Prentice-Hall, Englewood Cliffs, NJ, 1982.
[12] A. Watt, 3D Computer Graphics, Addison-Wesley Publishing Co., New York, NY, 1993.
[13] R.Y. Tsai, A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses, IEEE J. Robot. Automat. RA-3 (4) (1987) 323-344.
[14] Guo-Qing Wei, Song De Ma, Implicit and explicit camera calibration: theory and experiments, IEEE Trans. Pattern Anal. Mach. Intell. 16 (5) (1994) 469-480.
[15] M. Drumheller, T.A. Poggio, On parallel stereo, International Conference on Robotics and Automation, 1986, pp. 1439-1448.
[16] M.J. Hannah, SRI's baseline stereo system, Proceedings of the DARPA Image Understanding Workshop, 1985, pp. 149-155.
[17] D.G. Jones, J. Malik, Computational framework for determining stereo correspondence from a set of linear spatial filters, European Conference on Computer Vision, 1992, pp. 395-410.
[18] H.H. Baker, T.O. Binford, Depth from edge and intensity based stereo, International Joint Conference on Artificial Intelligence, 1981, pp. 631-636.
[19] E. Krotkov, M. Hebert, R. Simmons, Stereo perception and dead reckoning for a prototype lunar rover, Autonomous Robots 2 (4) (1995) 313-331.
[20] T. Cormen, C. Leiserson, R. Rivest, Introduction to Algorithms, MIT Press, Cambridge, MA, 1990.
[21] Y. Ohta, T. Kanade, Stereo by intra- and inter-scanline search using dynamic programming, IEEE Trans. Pattern Anal. Mach. Intell. 7 (2) (1985) 139-154.
[22] T.A. Poggio, V. Torre, C. Koch, Computational vision and regularization theory, Nature 317 (1985) 314-319.
[23] R. Tarjan, Data Structures and Network Algorithms, Society for Industrial and Applied Mathematics, Philadelphia, PA, 1983.
[24] C. Papadimitriou, K. Steiglitz, Combinatorial Optimization, Prentice-Hall, Englewood Cliffs, NJ, 1982.
[25] K. Mehlhorn, S. Naher, LEDA: A Platform for Combinatorial and Geometric Computing, Cambridge University Press, Cambridge, 1999.
[26] B. Jahne, Digital Image Processing, 2nd Edition, Springer, New York, NY, 1993.
[27] G. Fielding, M. Kam, Computing the cost of occlusion, Comput. Vision and Image Understanding, 1998, submitted for publication.
[28] A. Papoulis, Probability, Random Variables, and Stochastic Processes, 3rd Edition, McGraw-Hill Book Company, New York, NY, 1991.
About the Author - GABRIEL FIELDING received the M.S. (Electrical Engineering) and Ph.D. degrees from Drexel University, in 1996 and 1999, respectively. From 1995-1998, he was a National Science and Defense Graduate Research Fellow sponsored by the Office of Naval Research. He has worked on computer vision projects with the Advanced Robotics Group at the Naval Research Labs in San Diego, CA. Currently, he is a research scientist with the Eastman Kodak Company in Rochester, New York. His research interests are motion estimation and parallel algorithms for image processing.

About the Author - MOSHE KAM was educated at Tel Aviv University (B.S. 1977) and Drexel University (M.Sc. 1985, Ph.D. 1987). At present he is a Professor of Electrical and Computer Engineering at Drexel, and Director of its Data Fusion Laboratory. He is a recipient of an NSF Presidential Young Investigator Award (1990), the C.H. MacDonald award for the Outstanding Young Electrical Engineering Educator (1991), and the Drexel University Research Award (1998). His research interests are in system theory, detection and estimation (especially distributed detection and decision fusion), robotics, navigation, and control.
Pattern Recognition 33 (2000) 1525-1539
Real-time face location on gray-scale static images

Dario Maio*, Davide Maltoni

Corso di Laurea in Scienze dell'Informazione, Università di Bologna, via Sacchi 3, 47023 Cesena, Italy
DEIS - CSITE-CNR, Facoltà di Ingegneria, Università di Bologna, viale Risorgimento 2, 40136 Bologna, Italy

Received 14 October 1998; accepted 19 May 1999
Abstract

This work presents a new approach to automatic face location on gray-scale static images with complex backgrounds. In a first stage our technique approximately detects the image positions where the probability of finding a face is high; during the second stage the location accuracy of the candidate faces is improved and their existence is verified. The experimentation shows that the algorithm performs very well both in terms of detection rate (just one missed detection on 70 images) and of efficiency (about 13 images/s can be processed on an Intel Pentium II 266 MHz). © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Face location; Directional image; Generalized Hough transform; Ellipse detection; Elliptical fitting; Deformable template
1. Introduction

Automatic face location is a very important task which constitutes the first step of a large area of applications: face recognition, face retrieval by similarity, face tracking, surveillance, etc. (e.g. Ref. [1]). In the opinion of many researchers, face location is the most critical step towards the development of practical face-based biometric systems, since its accuracy and efficiency have a direct impact on system usability. Several factors contribute to making this task very complex, especially for applications required to operate in real time on gray-scale static images. Complex backgrounds, illumination changes, pose and expression changes, head rotation in 3D space and different distances between the subject and the camera are the main sources of difficulty. Many face-location approaches have been proposed in the literature, depending on the type of images (gray-scale images, color images or image sequences) and on the constraints considered (simple or complex background, scale and rotation changes, different illuminations, etc.). Giving a brief summary of the conspicuous
* Corresponding author. Tel.: +39-051-2093547; fax: +39-051-2093540. E-mail address: [email protected] (D. Maio).
number of works requires a pre-classification; unfortunately, due to the large number of different techniques used by researchers, this task is not easy. While we are aware of the unavoidable inaccuracies, we have attempted a tentative classification:

• Methods based on template matching with static masks and heuristic algorithms which use images taken at different resolutions (multiresolution approaches) [2,3].
• Computational approaches based on deformable templates which characterize the human face [4] or internal features [5-8]: eyes, nose, mouth. These methods can be conceived as an evolution of the previous class, since the templates can be adapted to the different shapes characterizing the searched objects. The templates are generally defined in terms of geometric primitives like lines, polygons, circles and arcs; a fitness criterion is employed to determine the degree of matching.
• Face and facial parts detection using dynamic contours or snakes [6,9-11]. These techniques involve a constrained global optimization, which usually gives very accurate results but at the same time is computationally expensive.
• Methods based on elliptical approximation and on face searching via least-squares minimization [12],
incremental ellipse fitting [13] and elliptic region growing [14].
• Approaches based on the Hough transform [7] and the adaptive Hough transform [15].
• Methods based on the search for a significant group of features (triplets, constellations, etc.) in the context considered: for example, two eyes and a mouth suitably located constitute a significant group in the context of a face [7,16-19].
• Face search on the eigenspace determined via PCA [20] and face location approaches based on information theory [21,22].
• Neural network approaches [23-30]. The best results have been obtained by using feed-forward networks to classify image portions normalized with respect to scale and illumination. During the training, examples of face objects and non-face objects are presented to the network. The high computational cost, induced by the need to process all the possible positions of a face in the image at different resolutions, is the main drawback of these methods.
• Face location on color images through segmentation in a color space: YIQ, YES, HSI, HSV, Farnsworth, etc. [27,31-36]. Generally, color information greatly simplifies the localization task: a simple spectrographic analysis shows that the face skin pixels are usually clustered in a color space, and an ad hoc segmentation then allows the face to be isolated from the background or, at least, drastically reduces the amount of information which must be processed during the successive stages.
• Face detection on image sequences using motion information: optical flow, spatio-temporal gradient, etc. [27,33,37].
Since in several applications it is mandatory (or preferable) to deal with static gray-scale images, we believe it is important to develop a method which does not exploit additional information such as color and motion. For example, most of the surveillance cameras installed today in shops, banks and airports are still gray-scale cameras (due to their lower cost), and the electronic processing of mug-shot or identity-card databases may require detecting faces in static gray-scale pictures printed on paper. Unfortunately, if we discard color- and motion-based approaches, the most robust methods are generally time consuming and cannot be used in real-time applications. The aim of this work is to provide a new method which is capable of processing gray-scale static images in real time. The algorithm must operate with structured backgrounds and must tolerate illumination changes, scale variations and small head rotations. Our approach (Fig. 1(a)) is based on a location technique which starts by approximately detecting the image positions (or candidate positions) where the probability of finding a face is high (module AL) and then, for each of them, improves the location accuracy and verifies the presence of a true face (module FLFV). Actually, most of the applications in the field of biometric systems require detection of just one object in the image (i.e. the foreground object): under this hypothesis, a more efficient implementation of our method is reported in Fig. 1(b), where at each step the module AL passes only the most likely position to FLFV, and FLFV continues to request a new position until a valid face is detected or no more candidates are available. It should be noted that, even in this case, the system could be used to detect more faces in an image,
Fig. 1. Two different functional schemes of our approach.
Fig. 2. Two images and the corresponding directional images. The vector lengths are proportional to their moduli.
assuming that the iterative process is not prematurely interrupted. Although AL and FLFV have been implemented in very different manners, both modules work on the same kind of data: the directional image extracted from the starting gray-scale image. In Section 2 the directional image is defined and some comments on its computation are reported. Section 3 describes the module AL, which is based on the search for elliptical blobs in the directional image by means of the generalized Hough transform. In Section 4 we present the dynamic-mask-based technique used for fine location and face verification (module FLFV), and in Section 5 we discuss how to combine AL and FLFV in practice in order to implement the functional schema of Fig. 1(b). Section 6 reports the results of our experimentation over a 70-image database; finally, in Section 7, we present our conclusions and discuss future research.
2. Directional image

Most of the face location approaches perform an initial edge extraction by means of a gradient-like operator; few methods also exploit other additional features like directional information, intensity maxima and minima, etc. Our technique strongly relies on the edge phase-angles contained in a directional image. A directional image is a matrix defined over a discrete grid, superimposed on the gray-scale image, whose elements are in correspondence with the grid nodes. Each element is a vector lying in the xy plane. The vector direction represents the tangent to the image edges in a neighborhood of the node, and its modulus is determined as a weighted sum of the contrast (edge strength) and the consistency (direction reliability) (Fig. 2). In this work the directional image is computed by means of the method proposed by Donahue and Rokhlin [38]. Each directional image element is calculated over a local window where a gradient-type operator is employed to extract several directional estimates (2D sub-vectors), which are averaged by least-squares minimization to control noise. This technique is more robust than the standard operators used for computing the gradient phase angle and enables the contrast and the consistency
The vector direction is unoriented and lies in the range [−90°, +90°[.
to be calculated with a very small overhead. In particular, the contrast is derived from the magnitude of the 2D sub-vectors, and the consistency is in inverse proportion to the residual resulting from the least-squares minimization [38] (in fact, a high reliability corresponds to a low residual, that is, to all the 2D sub-vectors being nearly parallel).
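As a rough, runnable illustration of the data structure just described, the sketch below computes one directional element per grid block. Note that it substitutes the widely used doubled-angle averaging trick for the least-squares scheme of Donahue and Rokhlin [38], and that the grid step and the contrast/consistency weighting are our own assumptions, not the authors' parameters.

    import numpy as np

    def directional_image(img, step=7, w_contrast=0.5, w_consistency=0.5):
        # Returns a dict mapping grid points (x, y) to (direction, modulus).
        img = img.astype(float)
        gy, gx = np.gradient(img)
        mag = np.hypot(gx, gy)
        angle2 = 2.0 * np.arctan2(gy, gx)   # doubled angle: unoriented directions
        out = {}
        h, w = img.shape
        r = step // 2
        for y in range(r, h - r, step):
            for x in range(r, w - r, step):
                sl = np.s_[y - r:y + r + 1, x - r:x + r + 1]
                c = (mag[sl] * np.cos(angle2[sl])).sum()
                s = (mag[sl] * np.sin(angle2[sl])).sum()
                contrast = mag[sl].mean()
                # resultant length is ~1 when all sub-vectors are nearly parallel
                consistency = np.hypot(c, s) / (mag[sl].sum() + 1e-9)
                # edge tangent: averaged gradient direction rotated by 90 degrees
                theta = 0.5 * np.arctan2(s, c) + np.pi / 2.0
                out[(x, y)] = (theta,
                               w_contrast * contrast + w_consistency * consistency)
        return out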
3. AL - approximate location

The analysis of a certain number of directional images suggested the formulation of a simple method for detecting faces. In particular, we noted that when a face is present in an image, the corresponding directional image region is characterized by vectors producing an elliptical blob. For this reason, the module AL is based on the search for ellipses in the directional image. Several techniques could be used for this purpose, for example multiresolution template matching [39] and least-squares ellipse fitting [12]. We adopted a new approach based on the generalized Hough transform [40] which performs very well in terms of efficiency. The method is capable of detecting the approximate position of all the ellipses within a certain range of variation defined according to pre-fixed scale and rotation changes. The basic idea is to perform a generalized Hough transform by using an elliptical annulus C as template. Actually, the directional information allows the transform to be implemented very efficiently, since the template used for updating the accumulator array can be reduced to only two sectors of the elliptical annulus. Formally: let a and b be the lengths of the semi-axes of an ellipse used as reference, and let ρ_1 and ρ_2 be, respectively, the reduction and expansion coefficients defining the scale range (and hence the elliptical annulus C): a_1 = ρ_1 · a, b_1 = ρ_1 · b, a_2 = ρ_2 · a, b_2 = ρ_2 · b
(Fig. 3). Let D be the directional image and let A be the accumulator array; then the algorithm can be sketched as:

Reset A;
∀ vector d ∈ D
{ [x_0, y_0] = origin(d); φ = direction(d); ρ = modulus(d);
  T = current_template([x_0, y_0], φ);
  ∀ pixel [x, y] ∈ T { A[x, y] = A[x, y] + ρ · weight_T([x, y]); }
}

The high-score A cells are good candidates for ellipse centers.
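In runnable form, the accumulator update might look as follows. The directional image is assumed to be the mapping produced by the sketch in Section 2, and current_template is assumed to enumerate (pixel, weight) pairs for the two annulus sectors of Fig. 3; all names are illustrative, not the authors' code.

    import numpy as np

    def hough_ellipses(dir_img, shape, current_template):
        # dir_img: dict mapping grid points (x0, y0) to (direction, modulus).
        # current_template((x0, y0), phi) must yield ((x, y), weight) pairs
        # covering the possible ellipse centers for a vector with direction phi.
        acc = np.zeros(shape)
        for (x0, y0), (phi, rho) in dir_img.items():
            for (x, y), weight in current_template((x0, y0), phi):
                if 0 <= x < shape[1] and 0 <= y < shape[0]:
                    acc[y, x] += rho * weight
        return acc  # high-score cells are candidate ellipse centers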
Fig. 3. The template T (in dark gray) is constituted by those points which are possible centers of ellipses capable of originating in [x_0, y_0] a vector d with direction φ.
The direction(d) and the modulus(d) of the directional elements d are calculated off-line as described in Section 2; current_template([x_0, y_0], φ) determines the current template T as a function of the direction φ of the vector centered in [x_0, y_0]. The points [x_1, y_1] and [x_2, y_2] in Fig. 3 are the only two points where an ellipse tangent to d in [x_0, y_0] with semi-axes a, b could be centered. Since we are interested in all the ellipses whose semi-axes are in the ranges [a_1, a_2] and [b_1, b_2], we must take into account all the points lying on the two segments determined by the intersection between the straight line defined by [x_1, y_1], [x_2, y_2] and the elliptical annulus C. Finally, if we assume a maximum angular variation θ on the directional information (θ is introduced to compensate for both small ellipse rotations and inaccuracies in the computation of φ), the geometric locus T of the possible centers becomes:

T = { [x, y] : ρ_1 ≤ √( ((x − x_0)/a)² + ((y − y_0)/b)² ) ≤ ρ_2 and Δangle( arctg((y − y_0)/(x − x_0)), ψ ) ≤ θ/2 },

where Δangle(α, β): [−90°, +90°[ × [−90°, +90°[ → [0°, 90°] returns the smaller angle determined by the
directions α, β; the angle ψ can be computed as a function of φ by deriving the tangent vector expression from the parametric equation of the ellipse:

ψ = arctg( −b² / (a² · tg(φ)) ).

The function weight_T: T → [0, 1] associates, to each point [x, y] of T, a weight which decreases with the angular distance between the straight line defined by [x, y], [x_0, y_0] and the direction ψ:

weight_T(x, y) = 1 − (2/θ) · Δangle( arctg((y − y_0)/(x − x_0)), ψ ).

Fig. 4 shows a representation of a template T whose elements are associated to gray levels proportional to their weights (the light pixels within the sectors are associated to larger weights).

Fig. 4. (a) Graphic representation of an ideal template T, and (b) its corresponding discrete version induced by the discretization of A.

An efficient implementation of the module AL has been obtained by adopting the following tricks:

• The grid which defines the accumulator array A has been set equal to that defining D. In particular, the granularity used is 7×7 pixels, that is, the directional image elements are calculated every 7×7 pixels. Therefore, the size of both the directional image and the accumulator array associated to an X×Y pixel image is ⌊X/7⌋ × ⌊Y/7⌋.
• The directions of the elements in D have been discretized (256 values).
• The templates T have been pre-computed (in our experiments we chose a = 34, b = 45, ρ_1 = 0.6, ρ_2 = 1.2, θ = 30°); by using relative coordinates with respect to the ellipse center, the number of different templates corresponds to the number of different directions.
• The algorithm has been implemented in integer arithmetic.

Fig. 5 shows an example of ellipse detection as performed by AL: the starting image contains five differently shaped ellipses; the accumulator array resulting from the Hough transform clearly exhibits a local maximum corresponding to each ellipse center.

Fig. 5. An artificial image and the corresponding transform.

4. FLFV - fine location and face verification

Different strategies can be adopted in order to improve the location accuracy and to verify whether an elliptical object is really a face or not. Some of the alternatives we explored are reported in the following:

• Improving the ellipse center location through the AHT (Adaptive Hough Transform) [41,42], which requires the granularity of the hot accumulator cells to be gradually refined.
• Local optimization of the center [x_0, y_0], of the semi-axes a and b, and of the ellipse tilt angle ξ through a local optimization algorithm (steepest descent, the downhill simplex method [43], etc.) which searches for the best-fitting ellipse in the parameter space [x_0, y_0, a, b, ξ].
• Position optimization and face verification through the detection of a symmetry axis [44]; a symmetry axis
can be extracted both from the original and the directional image.
• Projection of the image portion containing the ellipse on the symmetry axis and on its orthogonal one. Several authors [34,35,45,46] demonstrated that these projections are characterized by local intensity minima in the regions corresponding to the eyes and the mouth. The projection method can also be applied to the directional image: in this case local maxima and minima are present in the eye, nose and mouth regions due to the presence of horizontal and vertical vectors (Fig. 6).

Fig. 6. The figure shows the projection, on the vertical axis, of the directional image vectors belonging to the region delimited by the white rectangle. The local minima, generated by the presence of horizontal vectors in the eye and mouth regions, could be used for registration according to the internal feature positions.

In our preliminary work [47] the fine location and face verification sub-tasks were performed sequentially. From the experimental evidence we argued that better results could be achieved by executing both operations simultaneously. In fact, a precise face location cannot always be obtained by using an elliptical template, since particular physiognomies, hairstyles or illuminations sometimes make the face only coarsely elliptical. On the other hand, the reliability of a face verification method (for example the projection approach in Fig. 6) strongly depends on the face location accuracy. Therefore, in order to confer greater robustness on the FLFV module, we developed a global approach which attempts to satisfy the two different aims at the same time. The face is locally searched in a small portion of the directional image, starting from a candidate position resulting from AL; for this purpose, a mask F, describing the global aspect of a human face and defined in terms of directional elements, is employed. The local search is performed through an orientation-based correlation between F and the D portion of interest. Actually, after the
approximate location step, D is locally refined (i.e. recomputed on a 3×3 pixel grid) in a neighborhood of the candidate position according to a multiresolution strategy (in the following, for the sake of exposition, D denotes the new directional image portion). The mask F is defined as a set of directional elements n_i, i = 1, ..., n, each of which is characterized by an origin, an unoriented direction (in the range [−90°, 90°[) and a modulus. The element origins are determined by superimposing a grid having the same granularity as D (3×3) on the template reported in Fig. 7. In particular, an element n_i is defined for each grid point lying within one of the template gray regions (which correspond to the salient face features). Since the template is parametrically defined according to the sizes a and b, whereas the grid granularity is fixed, a different number n of elements is
created varying a and b. As to directions, all the elements within the mouth, eyes and eyebrows regions have horizontal direction, the nose elements have vertical direction, and each element belonging to the border region has the direction of the tangent (at that point) to the external ellipse. Each of the seven regions in Fig. 7 has a global modulus which is equally partitioned among its elements. (Actually, the border region constitutes an exception: in this case we fixed a priori the total number n_b of elements within the region and we created an element every 360/n_b degrees, snapping it to the grid point closest to the border ellipse. In our simulation we chose n_b = 30.) If we consider the common face stereotype, the shape and the position of the nose in Fig. 7 appear a bit unusual; in particular, the nose is too short and moved upward. Actually, this choice is motivated by the fact that in the nostril region the edge directions are not strictly vertical but can be rather chaotic.

Fig. 7. The template used for the construction of the mask F used by FLFV. The sizes and ratios of the template have been calculated by analyzing some face images and by taking into account the indications reported by other authors [19,35,36]. The region global moduli are: border = 390, mouth = 110, nose = 60, eyes = 35 + 35, eyebrows = 25 + 25.

By discretizing a and b in the ranges [a_1, a_2] and [b_1, b_2] it is possible to pre-compute a set F = {F_1, F_2, ..., F_m} of static masks (Fig. 8), where the element positions are given in relative coordinates with respect to the mask center. It should be noted that the set of masks in F does not explicitly model face rotation; in fact, since in our current application only small head rotation is allowed, a set of "vertical" masks proved to be adequate; anyway, more generally, F could be expanded to cope with tilted faces by including two or more rotated copies of each mask. The masks in F allow a correlation degree at each position in D to be efficiently computed. Matching directional elements requires an ad hoc correlation (or distance) operator capable of dealing with the discontinuity (−90° ↔ +90°) in the definition of directions (e.g. Ref. [48]). Good results have been obtained in this work with a distance operator defined as an average sum of direction-difference absolute values. The distance between the mask F_i ∈ F and the portion of D centered in [x, y] is computed as:

Distance(F_i, D, [x, y])
{ Distance = 0; ModuliSum = 0;
  ∀ element n ∈ F_i
  { [x_n, y_n] = origin(n);
    let d ∈ D be the element with origin [x, y] + [x_n, y_n];
    φ_n = direction(n); ρ_n = modulus(n);
    φ_d = direction(d); ρ_d = modulus(d);
    Distance = Distance + ρ_n · ρ_d · Δangle(φ_n, φ_d);
    ModuliSum = ModuliSum + ρ_n · ρ_d;
  }
  Distance = Distance/ModuliSum;
}

Let [x_0, y_0] be a candidate position resulting from the module AL; then FLFV determines the face position [x*, y*] and the semi-axes a* and b* by minimizing the Distance function over a discrete state space:

d_min = min { Distance(F_i, D, [x, y]) : F_i ∈ F, x ∈ [x_0 − Δx, x_0 + Δx], y ∈ [y_0 − Δy, y_0 + Δy] },

where Δx and Δy define a neighborhood of [x_0, y_0], and a*, b* coincide with the semi-axes of the best-fitting mask. In our simulations we set Δx = Δy = 4, and therefore the total number of states is 12×9×9 = 972. An efficient implementation (in integer arithmetic) allowed an exhaustive strategy for determining the optimum to be adopted. In the case of a large number of states, the use of less expensive optimization techniques (steepest descent, the downhill simplex method, etc.) should be investigated. Once the best-fitting position has been determined, the face verification sub-task can be simply performed by comparing the distance d_min with a pre-fixed threshold.
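A compact sketch of the Distance operator and of the exhaustive minimization is given below; the mask and directional-image representations follow the earlier sketches and are assumptions of ours, as is the treatment of positions missing from the refined grid.

    import math

    def delta_angle(a, b):
        # Smaller angle (radians, in [0, pi/2]) between two unoriented
        # directions, handling the wrap-around at +-90 degrees.
        d = abs(a - b) % math.pi
        return min(d, math.pi - d)

    def distance(mask, dir_img, pos):
        # mask: list of ((ox, oy), direction, modulus) triples, mask-relative;
        # dir_img: dict mapping grid points to (direction, modulus).
        num = den = 0.0
        for (ox, oy), phi_n, rho_n in mask:
            phi_d, rho_d = dir_img.get((pos[0] + ox, pos[1] + oy), (0.0, 0.0))
            num += rho_n * rho_d * delta_angle(phi_n, phi_d)
            den += rho_n * rho_d
        return num / den if den else float("inf")

    def best_fit(masks, dir_img, pos0, rx=4, ry=4):
        # Exhaustive minimization over the masks and a (2*rx+1) x (2*ry+1)
        # neighborhood of the candidate position pos0.
        best = (float("inf"), None, None)
        for i, mask in enumerate(masks):
            for x in range(pos0[0] - rx, pos0[0] + rx + 1):
                for y in range(pos0[1] - ry, pos0[1] + ry + 1):
                    d = distance(mask, dir_img, (x, y))
                    if d < best[0]:
                        best = (d, i, (x, y))
        return best  # (d_min, index of best-fitting mask, best position)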
Fig. 9 shows the results of the intermediate steps of a face location example.
5. Combining AL and FLFV

Depending on the application requirements, there are several ways of adjusting and combining the AL and FLFV modules. Since at this stage our aim is to develop a method capable of efficiently detecting the foreground face, we adopted the functional schema of Fig. 1(b). In particular, the algorithm searches for just one face in the image; it returns the face position [x_F, y_F] and sizes a_F, b_F in case of detection, and null otherwise. A pseudo-code version of the whole face detection method is reported:

Compute directional image D;
Perform generalized Hough transform;
[x_c, y_c] = get first candidate position on A;
d_min = ∞;
While ([x_c, y_c] is not NULL) and (d_min > T_1)
{ D = refine the directional image in a neighborhood of [x_c, y_c];
  d = min { Distance(F_i, D, [x, y]) : F_i ∈ F, x ∈ [x_c − Δx, x_c + Δx], y ∈ [y_c − Δy, y_c + Δy] };
  // [x*, y*], a*, b* determine the state corresponding to d
  if (d < d_min) { d_min = d; [x_F, y_F] = [x*, y*]; a_F = a*; b_F = b*; }
  [x_c, y_c] = get next candidate position on A;
}
if (d_min < T_2) { return [x_F, y_F], a_F, b_F; }
else { return null; }

Fig. 8. A graphical representation of the 12 masks constituting the set F (m = 12) used in our experimentation (the length of the mask elements is proportional to their modulus). It should be noted that, since the region global moduli are constant, the element moduli in the smaller masks are generally larger, since their number is lower. The only exception concerns the border region, where the number of elements is constant and therefore their modulus is independent of the mask size.
Two different thresholds T_1 and T_2 (T_1 < T_2) are used by the algorithm: T_1 stops the iterative search process as soon as a good match has been found (d_min ≤ T_1). In case all the candidate positions have been examined (in our experiments the candidate positions are the three best local maxima of A), the iterative process is interrupted, and a valid face is returned if the smallest distance computed is less than T_2. Using a pair of thresholds allows a larger number of candidate positions to be analyzed in case the current one is not sufficiently reliable, and at the same time the temporarily discarded faces can be reconsidered if no more-likely candidates have been found.
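Putting the pieces together, the two-threshold loop can be sketched as follows. Here top_local_maxima and refine_directional_image are hypothetical helpers standing in for the candidate extraction and the 3×3 refinement described above, current_template is the assumed template function from Section 3, and the threshold values are placeholders rather than the authors' T_1 and T_2.

    def detect_face(image, masks, t1=0.5, t2=0.8, n_candidates=3):
        dir_img = directional_image(image)                 # coarse 7x7 grid
        acc = hough_ellipses(dir_img, image.shape, current_template)
        d_min, best = float("inf"), None
        for pos in top_local_maxima(acc, n_candidates):    # assumed helper
            fine = refine_directional_image(image, pos)    # assumed 3x3 refinement
            d, mask_idx, xy = best_fit(masks, fine, pos)
            if d < d_min:
                d_min, best = d, (xy, mask_idx)
            if d_min <= t1:                                # T_1: stop on a good match
                break
        return best if d_min < t2 else None                # T_2: final verification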
6. Experimentation

Experimental results have been produced on a database of 70 images, each of which contains at least one human face (Fig. 11). All the images (384×288 pixels, 256 gray levels) were acquired in offices and laboratories of our department, under different illuminations (sometimes rather critical: backlighting, semi-darkness, ...) and with the subject at different distances from the camera. In 10 images people wear spectacles. The subjects were required to gaze at the camera. Each of the 70 images was manually labeled by indicating with a mouse the eye positions (le and re) and the mouth center (mc) of the face in the foreground (Fig. 10). The following formulae were used to derive the features (center c and semi-axes a, b) of an ellipse approximating a face:

a = 1.1 · ||le − re||;
b = 1.3 · (||le − mc|| + ||re − mc||)/2;
c = [x_c, y_c]; mc = [x_m, y_m];
x_c = x_m + cos ξ · 0.54 · b,
y_c = y_m + sin ξ · 0.53 · b,

where ξ = (α + β)/2, with α = angle(re, mc) and β = angle(le, mc).
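The labeling formulae translate directly into code; in the sketch below, the convention assumed for angle(·, ·) (the direction of the mouth-to-eye segment) is our reading of the construction in Fig. 10, and the naive averaging of the two angles is a simplification.

    import math

    def face_ellipse(le, re, mc):
        # le, re, mc: (x, y) eye centers and mouth center from manual labeling.
        dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
        a = 1.1 * dist(le, re)
        b = 1.3 * (dist(le, mc) + dist(re, mc)) / 2.0
        alpha = math.atan2(re[1] - mc[1], re[0] - mc[0])  # assumed: mouth-to-right-eye
        beta = math.atan2(le[1] - mc[1], le[0] - mc[0])   # assumed: mouth-to-left-eye
        xi = (alpha + beta) / 2.0                         # face-axis direction
        c = (mc[0] + math.cos(xi) * 0.54 * b,
             mc[1] + math.sin(xi) * 0.53 * b)
        return c, a, b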
Fig. 9. The results of the intermediate steps in a face location example: directional image computation, generalized Hough transform and selection of the best candidate position (module AL); directional image local refining, determination of the best-fitting mask and position (module FLFV).
Fig. 10. The geometric construction used for defining the face ellipse (c, a, b) starting from the eye centers le, re and the mouth center mc.
Module AL: The approximate location module detected the correct face position as the global maximum of A (i.e., the first candidate position) in 65 cases (92.86%). In the remaining five images the face position was detected as the second or third candidate position. Fig. 12 shows some images and the corresponding transforms, where the
Fig. 11. The database used in our experimentation.
global maximum determined by the elliptical face shape is clearly visible (an ellipse of semi-axes a and b was superimposed on each maximum). Fig. 13 reports an example where the transform global maximum does not coincide with the face position, but is determined by a different circular object (an apple within a poster). This false alarm, as can be seen in Fig. 14 (first row, third column), is removed by the module FLFV.

The whole approach (AL + FLFV): The whole face detection algorithm has been applied to the 70 images: in 69 cases the foreground face was correctly detected; only in one case no faces were found. Eleven false alarms (in 7 images) generated by AL were correctly discarded by FLFV: if we consider the 69 images, we can conclude that on average about 0.16 false alarms per image are processed and correctly discarded by FLFV before the correct face position is detected.

Fig. 13. A false alarm caused by an elliptical object in the scene. In any case, the local maximum determined by the face is well evident and constitutes the second best choice.
Fig. 12. Some images and the corresponding transforms as computed by the module AL.
Fig. 14. Some examples of face location. The thin-border ellipses denote AL outputs, whereas the best-fitting masks are reported as FLFV outputs.
The location accuracy was measured by considering, for the 69 faces correctly detected, the percentage errors on the center position (c_err) and on the semi-axes lengths (a_err, b_err). Let c, a and b be the center and the semi-axes of the foreground face in an image (as manually indicated during the database labeling), and let [x_F, y_F], a_F and b_F be the center and the semi-axes returned by the algorithm; then:

c_err = ||[x_F, y_F] − c|| / (a + b),
a_err = |a_F − a| / a,
b_err = |b_F − b| / b.

The average values obtained (c_err = 8.70%, a_err = 8.66%, b_err = 10.47%) denote a good location accuracy. It is worth remarking that a face cannot be exactly described by an ellipse using the geometric model of Fig. 10, and therefore the measured errors should not be attributed exclusively to the detection algorithm. In fact, a qualitative analysis of the results confirmed that in most cases the mask positioned by the module FLFV perfectly fits the underlying face, especially in the eye and mouth regions. Fig. 14 shows some examples; the thin-border ellipses denote AL outputs; the false alarms produced by AL can be easily spotted, since no masks are associated with the corresponding ellipses. Fig. 15 shows the only image which produced a missed detection error, probably due to the lateral illumination which hides a significant part of the face, thus inducing the module AL to poorly estimate the face position. The method discussed in this paper is capable of processing a 384×288 gray-scale image in 0.078 s on an Intel Pentium II 266 MHz: in particular, 0.031 s are necessary for the directional image computation, 0.007 s
Fig. 15. The only missed detection error on our database: none of the three candidate positions has been accepted by the module FLFV; the upper candidate position is not close enough to the face to allow FLFV to converge.
for the Hough transform, 0.015 s for the directional image refining and 0.019 s for the final template matching. By increasing the last two terms by 16% (in order to take into account the average number of false alarms generated by AL) and by summing all the contributions, we obtain an average processing time of 0.078 s, which corresponds approximately to 13 frames per second. We disregard the computation of the 256 templates for the Hough transform and the computation of the masks in F, since they are performed off-line in 0.086 s.
7. Conclusions

This work proposes a two-stage approach to face location on gray-scale static images with complex backgrounds. Both modules operate on the elements constituting the directional image, which has proved to be very effective in providing reliable information even in the presence of critical illumination and semi-darkness. The approximate location module searches for the most likely positions in the image by means of a particular implementation of the generalized Hough transform. A great computational saving is obtained with respect to a correlation-based technique. In fact, in the former case just one directional image scan allows the "hot" positions to be extracted, whereas in the latter several elliptic templates (resembling the different elliptical shapes and sizes) would have to be shifted everywhere on the directional image to discover the high-correlation points. Numerically, if n_T is the average number of cells updated by the current template T and n_D is the number of elements in D (in our implementation n_T ≈ 20 and n_D ≈ 2000), O(n_T · n_D) operations are necessary for calculating the Hough transform. If we assume that n_T also denotes the average number of elements in a hypothetical elliptical template, then O(m · n_T · n_D) operations are necessary for the template matching with m templates. According to our parameter choice, a reasonable value for m is 12 (refer to Fig. 8); hence the GHT implementation gives a considerable saving with respect to a correlation-based approach. The fine location and face verification module analyses small refined portions of the directional image, attempting to detect the exact position and size of a face together with a confidence value about its presence. This is performed by means of a set of masks resembling the human face stereotype; since the masks are defined in terms of directions, their matching is robust and very little biased by lighting conditions. Very good results have been achieved both in terms of face detection rate (just one missed detection on 70 images) and efficiency (about 13 images/s can be processed on an Intel Pentium II 266 MHz running Windows 95). Furthermore, a more efficient version
could be obtained by applying an appropriate code optimization. As to future research, we are going to investigate how to combine the modules AL and FLFV with predictive analysis techniques (for example Kalman filtering) in the context of face tracking applications. Furthermore, we are gathering a larger database of images which will allow us to characterize the performance of our approach more precisely and to draw a ROC curve showing the false-detection/missed-detection tradeoff.

References

[1] R. Chellappa, S. Sirohey, C.L. Wilson, C.S. Barnes, Human and machine recognition of faces: a survey, Tech. Report CS-TR-3339, Computer Vision Laboratory, University of Maryland, 1994.
[2] I. Craw, H. Ellis, J.R. Lishman, Automatic extraction of face-features, Pattern Recognition Lett. 5 (1987) 183-187.
[3] G. Yang, T.S. Huang, Human face detection in a complex background, Pattern Recognition 27 (1994) 53-63.
[4] I. Craw, D. Tock, A. Bennet, Finding face features, Proceedings of ECCV, 1992.
[5] A. Yuille, D. Cohen, P. Hallinan, Facial features extraction by deformable templates, Tech. Report 88-2, Harvard Robotics Laboratory, 1988.
[6] C. Huang, C. Chen, Human facial feature extraction for face interpretation and recognition, Pattern Recognition 25 (1992) 1435-1444.
[7] G. Chow, X. Li, Towards a system for automatic facial feature detection, Pattern Recognition 26 (1993) 1739-1755.
[8] K. Lam, H. Yan, Locating and extracting the eye in human face images, Pattern Recognition 29 (1996) 771-779.
[9] A. Lanitis, C.J. Taylor, T.F. Cootes, T. Ahmed, Automatic interpretation of human faces and hand gesture using flexible models, Proceedings of the International Workshop on Automatic Face and Gesture Recognition, Zurich, 1995, pp. 98-103.
[10] R. Funayama, N. Yokoya, H. Iwasa, H. Takemura, Facial component extraction by cooperative active nets with global constraints, Proceedings of the 13th ICPR, v. B, 1996, pp. 300-304.
[11] S.R. Gunn, M.S. Nixon, Snake head boundary extraction using global and local energy minimisation, Proceedings of the 13th ICPR, v. B, 1996, pp. 581-585.
[12] S.A. Sirohey, Human face segmentation and identification, Tech. Report CAR-TR-695, Center for Automation Research, University of Maryland, 1993.
[13] A. Jacquin, A. Eleftheriadis, Automatic location tracking of faces and facial features in video sequences, Proceedings of the International Workshop on Automatic Face and Gesture Recognition, 1995, pp. 142-147.
[14] R. Herpers, H. Kattner, H. Rodax, G. Sommer, GAZE: an attentive processing strategy to detect and analyze the prominent facial regions, Proceedings of the International Workshop on Automatic Face and Gesture Recognition, 1995, pp. 214-220.
[15] X. Li, N. Roeder, Face contour extraction from front-view images, Pattern Recognition 28 (1995) 1167-1179.
[16] V. Govindaraju, S.N. Srihari, D.B. Sher, A computational model for face location, Proceedings of the 3rd ICCV, 1990, pp. 718-721.
[17] H.P. Graf, T. Chen, E. Petajan, E. Cosatto, Locating faces and facial parts, Proceedings of the International Workshop on Automatic Face and Gesture Recognition, 1995, pp. 41-46.
[18] M.C. Burl, T.K. Leung, P. Perona, Face localization via shape statistics, Proceedings of the International Workshop on Automatic Face and Gesture Recognition, 1995, pp. 154-159.
[19] S. Jeng, H.M. Liao, Y. Liu, M. Chern, An efficient approach for facial feature detection using geometrical face model, Proceedings of the 13th ICPR, v. C, 1996, pp. 426-430.
[20] B. Moghaddam, A. Pentland, Maximum likelihood detection of faces and hands, Proceedings of the International Workshop on Automatic Face and Gesture Recognition, 1995, pp. 122-128.
[21] M.S. Lew, N. Huijsmans, Information theory and face detection, Proceedings of the 13th ICPR, v. C, 1996, pp. 601-605.
[22] A.J. Colmenarez, Face and facial feature detection with information-based maximum discrimination, NATO ASI Conference on Faces, 1997.
[23] D. Valentin, H. Abdi, A.J. O'Toole, G.W. Cottrell, Connectionist models of face processing: a survey, Pattern Recognition 27 (1994) 1209-1230.
[24] G. Burel, D. Carel, Detection and localization of faces on digital images, Pattern Recognition Lett. 15 (10) (1994) 963-967.
[25] K. Sung, T. Poggio, Example-based learning for view-based human face detection, A.I. Memo 1521, CBCL Paper 112, MIT, 1994.
[26] H.A. Rowley, S. Baluja, T. Kanade, Human face detection in visual scenes, Tech. Report CMU-CS-95-158R, Carnegie Mellon University, 1995.
[27] B. Schiele, A. Waibel, Gaze tracking based on face color, Proceedings of the International Workshop on Automatic Face and Gesture Recognition, 1995, pp. 344-349.
[28] N. Intrator, D. Reisfeld, Y. Yeshurun, Extraction of facial features for recognition using neural networks, Proceedings of the International Workshop on Automatic Face and Gesture Recognition, 1995, pp. 260-265.
[29] R. Feraud, A conditional ensemble of experts applied to face detection, NATO ASI Conference on Faces, 1997.
[30] F.F. Soulie, Connectionist methods for human face processing, NATO ASI Conference on Faces, 1997.
[31] H. Wu, Q. Chen, M. Yachida, An application of fuzzy theory: face detection, Proceedings of the International Workshop on Automatic Face and Gesture Recognition, 1995, pp. 314-319.
[32] Y. Dai, Y. Nakano, Face-texture model based on SGLD and its application in face detection in a color scene, Pattern Recognition 29 (1996) 1007-1017.
[33] C.H. Lee, J.S. Kim, K.H. Park, Automatic face location in a complex background using motion and color information, Pattern Recognition 29 (1996) 1877-1889.
[34] K. Sobottka, I. Pitas, Extraction of facial regions and features using color and shape information, Proceedings of the 13th ICPR, v. C, 1996, pp. 421-425.
[35] H. Sako, A.V.W. Smith, Real-time expression recognition based on features position and dimension, Proceedings of the 13th ICPR, v. C, 1996, pp. 643-648.
[36] E. Saber, A. Murat Tekalp, Face detection and facial feature extraction using color, shape and symmetry-based cost functions, Proceedings of the 13th ICPR, v. C, 1996, pp. 654-657.
[37] B. Leroy, I.L. Herlin, L.D. Cohen, Face identification by deformation measure, Proceedings of the 13th ICPR, v. C, 1996, pp. 633-637.
[38] M.J. Donahue, S.I. Rokhlin, On the use of level curves in image analysis, Image Understanding 57 (1993) 185-203.
[39] P. Seitz, M. Bichsel, The digital doorkeeper: automatic face recognition with the computer, Proceedings of the 25th IEEE Carnahan Conference on Security Technology, 1991.
[40] D.H. Ballard, Generalizing the Hough transform to detect arbitrary shapes, Pattern Recognition 13 (1981) 111-122.
[41] L.S. Davis, Hierarchical generalized Hough transforms and line-segment based generalized Hough transforms, Pattern Recognition 15 (1982) 277-285.
[42] J. Illingworth, J. Kittler, The adaptive Hough transform, IEEE Trans. Pattern Anal. Mach. Intell. 9 (1987) 690-697.
[43] W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery, Numerical Recipes in C, Cambridge University Press, Cambridge, 1992.
[44] D. Reisfeld, H. Wolfson, Y. Yeshurun, Detection of interest points using symmetry, Proceedings of the 3rd ICCV, 1990, pp. 62-65.
[45] R. Brunelli, T. Poggio, Face recognition: features versus templates, IEEE Trans. Pattern Anal. Mach. Intell. 15 (1993) 1042-1052.
[46] H. Wu, Q. Chen, M. Yachida, Facial features extraction and face verification, Proceedings of the 13th ICPR, v. C, 1996, pp. 484-488.
[47] D. Maio, D. Maltoni, Fast face location in complex backgrounds, NATO ASI Conference on Faces, 1997.
[48] A. Crouzil, L. Massip-Pailhes, S. Castan, A new correlation criterion based on gradient fields similarity, Proceedings of the 13th ICPR, v. A, 1996, pp. 632-636.
About the Author - DARIO MAIO is Full Professor at the Computer Science Department, University of Bologna, Italy. He has published in the fields of distributed computer systems, computer performance evaluation, database design, information systems, neural networks, biometric systems, and autonomous agents. Before joining the Computer Science Department, he received a fellowship from the C.N.R. (Italian National Research Council) for participation in the Air Traffic Control Project. He received the degree in Electronic Engineering from the University of Bologna in 1975. He is an IEEE member. He is with CSITE - C.N.R. and with DEIS; he teaches database and information systems at the Computer Science Dept., Cesena.

About the Author - DAVIDE MALTONI is an Associate Researcher at the Computer Science Department, University of Bologna, Italy. He received the degree in Computer Science from the University of Bologna, Italy, in 1993. In 1998 he received his Ph.D. in Computer Science and Electronic Engineering at DEIS, University of Bologna, with the research theme "Biometric Systems". His research interests also include autonomous agents, pattern recognition and neural nets. He is an IAPR member.
Pattern Recognition 33 (2000) 1541-1553
Digital knots

Akira Nakamura, Azriel Rosenfeld*

Department of Computer Science, Hiroshima-Denki Institute of Technology, Hiroshima 739-0321, Japan
Center for Automation Research, University of Maryland, College Park, MD 20742-3275, USA

Received 23 January 1999; accepted 17 May 1999
Abstract

We define digital (6-)knots in Z³, and discuss how they are related to isothetic polygonal knots in R³. We introduce a class of topology-preserving "simple deformations" (SD) of digital images, and show that isomorphic knots have digitizations that differ by SD, and conversely. We also define "regular position" for isothetic links and digital links, and show that SD can be used to put any digital knot into such a position and to perform "Reidemeister moves" on the resulting projection. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Digital knot; Isothetic knot; Isomorphism; Simple deformation
1. Introduction

Objects in two- and three-dimensional images can be described by a variety of geometrical properties. Of these, topological properties are especially fundamental because they remain invariant under most types of geometric transformations. An object in a two-dimensional image has a simple topological characterization; it consists of a set of connected regions, each of which may have holes, and its topology is completely characterized by the containment relations between the regions and the holes. In three dimensions the situation is much more complicated. An object can have two types of holes: "cavities", which it surrounds completely, and "tunnels" (like the hole in a doughnut), which it only encircles. Moreover, an object that has a tunnel (such as a simple closed space curve) can be knotted, and two or more such objects can be linked. The basic concepts of components and holes (or components, cavities and tunnels) in digital images have been studied by many authors, but little or no effort has been made to study the concepts of knottedness and linkedness for digital objects. This paper is an initial effort at such a study.
* Corresponding author. Tel.: +301-405-4526; fax: +301-314-9115. E-mail address: [email protected] (A. Rosenfeld).
A knot [1] K (more precisely: a tame knot) is a simple closed curve in R³ such that there exists an orientation-preserving homeomorphism of R³ onto itself (in brief: an OPH) that maps K onto a simple polygon. We call a polygon isothetic if its sides are parallel to the coordinate axes. Evidently, for any simple polygon P, there exists an OPH that maps P onto a simple isothetic polygon; hence we can redefine a knot K as a simple closed curve in R³ such that there exists an OPH that maps K onto a simple isothetic polygon. A link is a finite union of disjoint knots. Two knots or links are called (knot-theoretically) isomorphic if there exists an OPH that maps one of them onto the other. A knot K is said to be unknotted (or: an unknot) if there exists an OPH that maps K onto a simple planar polygon. Let P, P' be simple polygons. We say that P and P' differ by an elementary deformation (in brief: a delta-move) if there exist three points p, q, r such that P intersects the triangle (i.e., the planar triangular region) pqr in the line segment pr; P' intersects pqr in the line segments pq and qr; and P, P' are otherwise identical (see Fig. 1). It can be shown [2,3] that two polygonal knots differ by a finite sequence of delta-moves iff they are isomorphic; and an analogous result is true for polygonal links. Let P, P' be isothetic simple polygons. We say that P and P' differ by an isothetic elementary deformation (in brief: a quad-move) if there exist four points p, q, r, s at the
vertices of a rectangle (i.e., a planar rectangular region) R such that either P intersects R in one of its sides and P' intersects it in the other three sides, or else P intersects R in two of its adjacent sides and P' intersects it in the other two sides (see Fig. 2). (Evidently, any quad-move can be accomplished by performing two delta-moves, the first of which produces a non-isothetic polygon.) It can be shown that two isothetic polygonal knots differ by a finite sequence of quad-moves iff they are isomorphic, and similarly for isothetic polygonal links.

Fig. 1. A delta-move: two sides pq and qr of a polygon are replaced by the single side pr, or vice versa.

Fig. 2. The two types of quad-move: (a) three sides pq, qr, rs of an isothetic polygon are replaced by the single side ps, or vice versa; (b) two sides pq, ps of an isothetic polygon are replaced by two "opposite" sides qr, rs, or vice versa.

In this paper we will study digital knots (more precisely: 6-knots) and links in the three-dimensional lattice-point space Z³. We will usually take a digital knot to be a simple 6-curve, and a digital link to be a union of digital knots K_i such that no voxel of any K_i is 6-adjacent to any voxel of any other K_j (in brief: K_i does not touch K_j for any j ≠ i). Before introducing digital links, we will review the concept of "regular position" for a polygonal link, and introduce the concept of "isothetic regular position" for an isothetic polygonal link. If we join the successive lattice points of each digital knot K_i of the digital link L with line segments, we obtain an isothetic polygonal link ⟨L⟩. Conversely, it is not hard to see that any finite union of disjoint isothetic polygons P_i has a digitization that is a digital link L composed of digital knots K_i such that ⟨K_i⟩ and P_i are isomorphic.

We will define a class of operations on digital images called simple deformations, and will show that if the digital links L, L' are such that ⟨L⟩ and ⟨L'⟩ are isomorphic, then L and L' differ by a simple deformation, and conversely. Finally, we will discuss the relationship between quad-moves performed on an isothetic link, simple deformations performed on a digital link, and "Reidemeister moves" performed on the link's regular-position projection.
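Although the development here is purely mathematical, the 6-adjacency convention is easy to state operationally. The sketch below checks that a finite voxel set is a simple closed 6-curve in the local-degree sense (connectivity of the set is assumed); the representation, a list of integer triples, is our own illustration, not part of the paper.

    def six_adjacent(u, v):
        # Two voxels of Z^3 are 6-adjacent iff they differ by 1 in exactly
        # one coordinate (Manhattan distance 1 for integer triples).
        return sum(abs(a - b) for a, b in zip(u, v)) == 1

    def is_simple_closed_6_curve(voxels):
        # Every voxel must be 6-adjacent to exactly two others in the set.
        vox = set(map(tuple, voxels))
        return all(sum(six_adjacent(p, q) for q in vox - {p}) == 2 for p in vox)

    # A unit square in the z = 0 plane is the smallest simple closed 6-curve:
    print(is_simple_closed_6_curve([(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]))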
The parallel projection of a polygonal link onto a plane consists of straight line segments that may intersect one another. We say that L is in regular position with respect to a direction d if the parallel projection L_d of L onto a plane perpendicular to d has the following properties:

(a) For all but finitely many points p of L_d, the preimage of p is a single point of L.
(b) Each of the exceptional points has a preimage that is an interior point of exactly two polygon sides of L.

A simple example is shown in Fig. 3. It is not hard to see that for every polygonal link L, there exist directions d with respect to which L is in regular position. If L is isothetic, and d is parallel to one of the axes, L cannot be in regular position with respect to d unless none of the polygon sides of L is parallel to that axis. However, we can define a modified concept of "regular position" which allows us to use axis-parallel directions for isothetic links. Without loss of generality, we define this concept for the case where d is the z-axis. We say that an isothetic link L is in z-regular position if its parallel projection L_z onto the xy-plane has the following properties (see Fig. 4):

(a) For all but finitely many points p of L_z, the preimage of p is a single point of L.
Fig. 3. A polygonal knot K in regular position with respect to the z-direction, projected on the xy-plane. Every point in the drawing is the projection of exactly one point of K, except for the three crossing points, each of which is the projection of interior points of exactly two sides of K.
Fig. 4. An isothetic polygonal knot K in z-regular position, projected on the xy-plane. Every point in the drawing is the projection of exactly one point of K, except for the three crossing points, each of which is the projection of interior points of exactly two sides of K, and three of the angle points, each of which is the projection of a side of K parallel to the z-axis.
(b) Each of the exceptional points has a preimage that is either a single polygon side of L parallel to the z-axis, or an interior point of exactly two polygon sides of L that are parallel to the x- and y-axes.

It is not difficult to show that any isothetic polygonal link L can be put into z-regular position by performing a finite set of quad-moves on it; these moves are used to displace the sides of L so as to eliminate all the violations of properties (a)-(b). [For example, no violations can exist if no two sides of L parallel to the x-(y-)axis have the same y-(x-)coordinate; no side parallel to the x-(y-)axis has an endpoint with the same x-(y-)coordinate as a side parallel to the y-(x-)axis; and no two sides parallel to the z-axis have both the same x-coordinate and the same y-coordinate.]

Let L be a polygonal link that is in regular position with respect to the z-direction, or an isothetic polygonal link that is in z-regular position. Then in the projection of L onto the xy-plane there are only finitely many points at which two projected line segments cross. We augment the projection by specifying, at each such crossing, which of the line segments of L whose projections are crossing has a higher z-coordinate.

Let K be a polygonal knot in (z-)regular position, and let K_z be the projection of K onto the xy-plane. As a point p moves around K, the projection p_z of p traverses K_z. Evidently, this traversal visits each crossing of K_z twice. The visit in which p_z is on the higher of the two line segments is called the over-crossing, and the visit in which p_z is on the lower segment is called the under-crossing. K can be characterized by the cyclic sequence of over- and under-crossings that occur when K_z is traversed. If no crossings occur, K must be an unknot.

Fig. 5. A Hopf link. At one of the two mixed crossings, A is higher than B, and at the other, B is higher than A.

More generally, let L be a polygonal link in (z-)regular position. As a point moves around each knot K of L, its projection traverses K_z. In the process, it may encounter crossings at which K_z crosses itself (self-crossings) or crosses the projections of other knots of L (mixed crossings). L can be characterized by the sequences of these crossings that occur when the knots of L are traversed. If only self-crossings occur, or if the mixed crossings are all over-crossings or all under-crossings (implying that the knots are splittable, so that they have disjoint ranges of z-coordinates), we call L an unlink. If L consists of two knots, and exactly two mixed crossings occur, one of
them an over-crossing and the other an under-crossing, as in Fig. 5, we say that L is minimally linked. (Such an L is sometimes called an extended Hopf link.)
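The crossing-sequence characterization just described lends itself to a toy encoding (ours, not the paper's): record, for each knot of the link, the over/under tags of the mixed crossings its projection passes through. A minimal sketch in Python:

def is_unlink(components):
    # components: one list of 'O'/'U' tags (at mixed crossings) per knot;
    # all-over or all-under (or no mixed crossings) means an unlink
    return all(len(set(c)) <= 1 for c in components)

def is_minimally_linked(components):
    # two knots, exactly two mixed crossings, one 'O' and one 'U'
    return (len(components) == 2
            and all(len(c) == 2 for c in components)
            and all(set(c) == {"O", "U"} for c in components))

hopf = [["O", "U"], ["U", "O"]]     # the Hopf link of Fig. 5
split = [["O", "O"], ["U", "U"]]    # one knot passes entirely over the other
print(is_unlink(hopf), is_minimally_linked(hopf))   # False True
print(is_unlink(split))                             # True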
3. Digital images and digital links

A digital image I is a mapping from Z³ (the lattice points of R³) into {0, 1}, such that only finitely many lattice points map into 1's. In digital topology, lattice points are called voxels. Two voxels (x, y, z) and (u, v, w) are called 6-adjacent (or 6-neighbors) if |x − u| + |y − v| + |z − w| = 1; they are called 26-adjacent (or 26-neighbors) if max(|x − u|, |y − v|, |z − w|) = 1. [In Z², lattice points are called pixels, and we similarly define 4-neighbors and 8-neighbors.] The 26-neighborhood N(p) of a voxel p consists of p and its 26-neighbors.

The reflexive, transitive closure of 6-(26-)adjacency in the space of voxels is called 6-(26-)connectedness, and a maximal 6-(26-)connected set of voxels is called a 6-(26-)component. If I has only one 6-component S of 1's, but has more than one 26-component of 0's, only one of the components of 0's can be infinite; this component is called the background of S, and the others, if any, are called cavities in S. (Tunnels in S will be discussed below.) For a brief introduction to digital topology (in two and three dimensions) see [4].

A digital 6-(26-)curve is a finite 6-(26-)connected set C of voxels each of which has exactly two 6-(26-)neighbors in C. In this paper, a digital 6-curve will usually be called a digital knot. A digital link is a finite union of digital knots that do not touch each other, i.e., voxels belonging to different knots are never neighbors. A digital link defines a digital image in which the voxels belonging to the curves are 1's and the remaining voxels are 0's; we will sometimes refer to this image as the link.

The voxels of a digital curve C can evidently be arranged in cyclic order; we call two voxels successive if they are consecutive in this order. If we join each pair of successive voxels in a 6-curve, we obtain an isothetic polygon ⟨C⟩; and similarly for a 26-curve, except that the polygon is not isothetic. If L is a digital link, we denote by ⟨L⟩ the union of the polygons constructed in this way from all the knots in L. This process uniquely associates a "real" polygonal link with any digital link, and in the case of a 6-link, this polygonal link is isothetic.

Conversely, if L is a "real" isothetic polygonal link in R³, we can digitize it by partitioning R³ into half-open cubes, and defining the digitization D(L) of L as the set of centers of the cubes that L intersects. Evidently, if we make the cubes small enough, we can ensure that the digitization of each isothetic polygon of L is a 6-curve, and that these 6-curves do not touch each other, so that D(L) is a digital 6-link. Note that ⟨D(L)⟩ may not be the same as the original L; indeed, if the sides of an isothetic polygon are incommensurate, there cannot exist a grid of
lattice points (cube centers) in which every side has integer coordinates. However, it is clear that there exists an OPH that maps ⟨D(L)⟩ onto L, so that L and ⟨D(L)⟩ are isomorphic.

The Euler characteristic [4] of a digital image I can be defined in terms of the numbers of certain simple patterns of 1's in I. We can then define the number of tunnels in I as the number of finite components of 1's and 0's, minus the Euler characteristic. In particular, a 6-curve, which has one finite component of 1's and no finite components of 0's, has exactly one tunnel. It can also be shown that a 6-connected set S has a tunnel iff there exists a 6-curve C contained in S, and a 26-curve C′ contained in S̄ (the complement of S), such that C and C′ are linked (i.e., their union is not an unlink).
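A minimal sketch of these adjacency and connectedness notions (our own illustration; voxels are integer triples, and an image is identified with its set of 1's):

def adj6(p, q):
    # 6-adjacent: the coordinates differ by 1 in exactly one place
    return sum(abs(a - b) for a, b in zip(p, q)) == 1

def adj26(p, q):
    # 26-adjacent: distinct, and each coordinate differs by at most 1
    return p != q and max(abs(a - b) for a, b in zip(p, q)) == 1

def components(voxels, adj):
    # maximal adj-connected subsets of a finite voxel set (BFS)
    voxels, comps = set(voxels), []
    while voxels:
        seed = voxels.pop()
        comp, frontier = {seed}, [seed]
        while frontier:
            p = frontier.pop()
            for q in [v for v in voxels if adj(p, v)]:
                voxels.remove(q)
                comp.add(q)
                frontier.append(q)
        comps.append(comp)
    return comps

square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]  # the smallest 6-curve
print(all(sum(adj6(p, q) for q in square) == 2 for p in square))   # True
diag = [(0, 0, 0), (1, 1, 0)]
print(len(components(diag, adj6)), len(components(diag, adj26)))   # 2 1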
4. Simple voxels and simple deformations

A voxel p of a digital image I is called simple if changing the value of p (from 1 to 0 or 0 to 1) does not change the topology (i.e., the numbers of components, cavities, and tunnels) in N(p). It can be shown that p is simple iff

(a) p is 6-adjacent to exactly one 6-component of 1's in N(p);
(b) p is 26-adjacent to exactly one 26-component of 0's in N(p);
(c) changing the value of p does not change the number of tunnels in N(p).

It is well known [4] that changing the value of a simple voxel does not change the topology of I. Note that p is simple before its value is changed iff it is simple after its value is changed.

I and J are said to differ by a local deformation if J is obtained from I (or vice versa) by changing the value of a single simple voxel. They are said to differ by a simple deformation (in brief: by SD) if J is obtained from I (or vice versa) by repeatedly changing the values of simple voxels. Evidently, if I and J differ by SD, they are topologically equivalent. If I and J differ by SD, and U, V are the sets of 1's of I and J, we will also say that U and V differ by SD.
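Conditions (a) and (b) are easy to test locally; condition (c) needs the tunnel-counting machinery of Ref. [4] and is beyond a short sketch. The following fragment (our own partial illustration, reusing adj6, adj26 and components from the sketch in Section 3) checks only (a) and (b), so a voxel that passes it is not necessarily simple:

from itertools import product

def nbhd26(p):
    # the 26 neighbors of p (N(p) minus p itself)
    return {tuple(a + d for a, d in zip(p, off))
            for off in product((-1, 0, 1), repeat=3) if off != (0, 0, 0)}

def conditions_ab(p, ones):
    box = nbhd26(p)
    ones_in = box & set(ones)
    zeros_in = box - set(ones)
    # (a) exactly one 6-component of 1's in N(p) is 6-adjacent to p
    comps1 = [c for c in components(ones_in, adj6)
              if any(adj6(p, q) for q in c)]
    # (b) exactly one 26-component of 0's in N(p) is 26-adjacent to p
    comps0 = [c for c in components(zeros_in, adj26)
              if any(adj26(p, q) for q in c)]
    return len(comps1) == 1 and len(comps0) == 1

square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(conditions_ab((0, 0, 0), square))  # False: deletion would break the curve
print(conditions_ab((2, 0, 0), square))  # True by (a)-(b); (c) is not checked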
5. Simple deformations and isomorphism

Theorem 1. Let C and C′ be digital 6-knots such that ⟨C⟩ and ⟨C′⟩ differ by a quad-move; then C and C′ differ by SD.

Proof. If ⟨C⟩ and ⟨C′⟩ differ by the first type of quad-move, the move takes (a segment of) a side of ⟨C⟩ into three coplanar sides of ⟨C′⟩ (or vice versa), and neither C nor C′ intersects the interior of the rectangle R defined
by these four line segments. Suppose, without loss of generality, that a side ⟨s⟩ of R is contained in a side of ⟨C⟩. Let ⟨s′⟩ be the opposite side of R; let ⟨t⟩, ⟨t′⟩ be the other two sides; and let s, s′, t, t′ be the sets of voxels of C or C′ that lie on ⟨s⟩, ⟨s′⟩, ⟨t⟩, ⟨t′⟩, respectively. We can perform a series of local deformations of C in which we successively adjoin to s the voxels s_1 that are 4-neighbors of voxels of s and lie in or on R; then the voxels s_2 that are 4-neighbors of voxels of s_1 and lie in or on R; and so on, until we reach the voxels of s′. Readily, each of these local deformations is topology-preserving, since C′ does not intersect the interior of R, and by using a sufficiently fine digitization, we can ensure that C′ also does not touch the interior of the digitization D(R) of R except along the sides of D(R). The result of this series of local deformations is to "dilate" s until it contains all of the voxels in D(R). Next, we successively delete from D(R) the voxels in the interior of s (i.e., the voxels of s other than its endpoints); then the voxels in the interior of s_1; and so on, until we reach s′ (but we do not delete any voxels of s′). Readily, each of these deletions is also a topology-preserving local deformation, and the result is that all of D(R) is deleted except for the voxels that lie on the three sides t, t′, and s′ of D(R).

The proof when ⟨C⟩ and ⟨C′⟩ differ by the second type of quad-move is similar. Suppose the move takes (segments of) successive sides ⟨s⟩ and ⟨t⟩ of ⟨C⟩ into successive sides ⟨s′⟩ and ⟨t′⟩ of ⟨C′⟩, where ⟨s⟩, ⟨t⟩, ⟨s′⟩, ⟨t′⟩ are the sides of rectangle R. As in the previous case, we "dilate" s (or t) until we have added all of the voxels in D(R); and we then "erode" D(R), leaving only the voxels of s′ and t′. Evidently here too, all of these additions and deletions of voxels are topology-preserving; they thus constitute an SD process that transforms C into C′. ∎

Theorem 2. Let C and C′ be digital 6-knots such that ⟨C⟩ and ⟨C′⟩ are isomorphic; then C and C′ differ by SD.

Proof. As observed in Section 1, since ⟨C⟩ and ⟨C′⟩ are isomorphic they differ by a finite sequence of quad-moves. Thus there exists a finite sequence of simple isothetic polygons ⟨C⟩ = P_0, …, P_n = ⟨C′⟩ such that P_i differs from P_{i-1} by a quad-move, 1 ≤ i ≤ n. Since the vertices of the polygons P_1, …, P_{n-1} do not necessarily have integer coordinates, there do not necessarily exist digital 6-knots Q_i such that ⟨Q_i⟩ = P_i, 1 < i < n. However, it is not hard to see that P_1, …, P_{n-1} can be chosen so that the coordinates of their vertices are rational. [Evidently, for any simple isothetic polygon, there exists an OPH that maps it onto a simple isothetic polygon whose vertices have rational coordinates.] This implies that there exists a (sufficiently fine) grid of lattice points in which the vertices of P_0, …, P_n all have integer coordinates; thus in this grid, there do exist
digital 6-knots Q_0, …, Q_n such that ⟨Q_i⟩ = P_i, 0 ≤ i ≤ n. Since P_i and P_{i-1} differ by a quad-move, 1 ≤ i ≤ n, it follows that Q_i and Q_{i-1} differ by SD; hence Q_0 = C and Q_n = C′ differ by SD. ∎

We shall next prove that the converse is also true: If two digital 6-knots C and C′ differ by SD, then ⟨C⟩ and ⟨C′⟩ are isomorphic. This result is more difficult to prove, because if C is a digital 6-knot, and D is a digital image that differs from C by SD, D is not necessarily a digital 6-knot; thus when SD is used to transform C into C′, there may be many intermediate steps that are not digital 6-knots, so it is more difficult to describe how ⟨C⟩ is related to ⟨C′⟩. To deal with this difficulty, we will study the class of digital objects that can be obtained by applying SD to a digital 6-knot.

A subset S of a digital image will be called a K-set if it is 6-connected, has no cavities, and has exactly one tunnel. It is well known that a 6-curve has these properties; thus any digital 6-knot is a K-set. Moreover, SD preserves these properties; hence if a digital image can be obtained from a 6-curve by SD, the set of 1's of that image is a K-set.

We next introduce a method of associating a real solid [I] with the set of 1's in any digital image I. We center a "black" unit cube at each 1 of I, and a slightly larger "white" cube at each 0 of I. A point of R³ is called black if it is not contained in any white cube. Readily, the set [I] of black points of R³ is connected iff the set of 1's of I is 6-connected; and if the set of 1's of I is a K-set, then [I] is connected, has no cavities, and has exactly one tunnel; we call such an [I] a K-solid. For example, a solid torus is a K-solid. In what follows, the surface of [I] will be called its white surface. Suppose the white cubes have side length 1 + d, and consider a set of "gray" cubes of side length 1 + 2d obtained by slightly expanding the white cubes; thus the surface of the union of the gray cubes lies inside the black cubes. We refer to this surface as the gray surface of [I]. Note that for any digital 6-curve C (whether or not knotted), the white or gray surface of [C] is homeomorphic to the surface of a solid torus, and so is orientable.

Let C be a digital 6-curve, and let U be a set obtained from C by SD. We define a meridian polygon of [C] as a rectangle on the white surface of [C], parallel to one of the coordinate planes, whose interior is contained in the interior of [C]. An isothetic polygon Q on the white surface of [U] that is a homeomorphic image of a meridian polygon of [C] is called a meridian polygon of [U]. Let P be an isothetic polygon that lies on the gray surface of [U], and let P and Q be minimally linked. We shall show that P and ⟨C⟩ differ by a finite sequence of quad-moves; thus any such P is isomorphic to ⟨C⟩. In particular, if C and C′ differ by SD, we can evidently construct a P on the gray surface of [C′], parallel to
⟨C′⟩, that differs from ⟨C′⟩ by a finite sequence of quad-moves; hence ⟨C⟩ and ⟨C′⟩ are isomorphic.

We first show that if V is obtained from U by adding a single simple voxel p, then there exist minimally linked isothetic polygons P′, Q′ on the gray and white surfaces of [V], respectively, that differ from P and Q by finite sequences of quad-moves. Let S be the (gray or white) surface of [U]. It is not hard to see that since p is simple, the surface of [p] can be decomposed into two simply connected regions R_1 and R_2, where R_1 is shared with S. Thus when p is added to U, the surface of [V] is obtained by deleting R_1 from S and replacing it with R_2. Let B be the common boundary of R_1 and R_2; evidently, B is a simple closed curve. If P (or Q) does not intersect the interior of R_1, it lies on the surface of [V]. Suppose it does intersect the interior of R_1, say in the isothetic polygonal arcs a_1b_1, …, a_kb_k, where the a's and b's are on B. Since P (or Q) is a simple polygon, these arcs do not intersect one another; hence the pairs of points (a_i, b_i) and (a_j, b_j) cannot separate one another on B. Hence P (or Q) can be diverted by replacing the arcs a_ib_i by isothetic arcs with the same endpoints that lie in the interior of R_2 rather than the interior of R_1; thus the diverted P (or Q) lies on the surface of [V]. Evidently, the diversions of the arcs can be accomplished by finite sequences of quad-moves; in particular, P′ and Q′ are still minimally linked.

The argument is similar if W is obtained from U by deleting a single simple voxel p. Here the roles of R_1 and R_2 are interchanged; the surface of [W] is obtained by deleting R_2 from S and replacing it with R_1. If P (or Q) does not intersect the interior of R_2, it lies on the surface of [W]; if it does, it can be diverted by a finite sequence of quad-moves, just as in the preceding paragraph, so that the diverted P (or Q) lies on the surface of [W], and P′ and Q′ are still minimally linked.

By repeating the arguments in the two preceding paragraphs, we can obtain isothetic polygons P* and Q* that differ from P and Q by finite sequences of quad-moves, where P* lies on the gray surface of [C]; Q* lies on the white surface of [C]; Q* is unknotted; and P* and Q* are minimally linked.

[C] is a union of cubes [C_i] centered at the voxels of C. We can cyclically order these cubes, say [C_1], [C_2], …, [C_m], [C_1]; each pair of successive cubes C_i and C_{i+1} (modulo m) shares a square face s_i. P* may cross a given square s_i many times by passing from C_i to C_{i+1} or vice versa; we will call these crossings "upward" and "downward", respectively. Let the points where P* crosses s_i, in cyclic order on P*, be p_{i1}, …, p_{ik_i}. If two consecutive crossings p_{ij} and p_{i,j+1} are of opposite type, then since P* is a simple polygon, the arc p_{ij}p_{i,j+1} of P* can be shortened, by a finite sequence of quad-moves, until it is entirely eliminated. We can do this for every pair of consecutive crossings of opposite type, until only crossings of the same type remain; and this can be done
for every s_i. The resulting shortened P* passes through every s_i in the same direction (upward or downward) the same number of times, say n, and thus passes through every [C_i] n times. But since P* is still minimally linked with Q*, we must have n = 1; and since P* passes through each [C_i] only once, it can be transformed into ⟨C⟩ by a finite sequence of quad-moves. Hence P*, and thus P, is isomorphic to ⟨C⟩. We have thus proved

Theorem 3. Let C and C′ be digital 6-knots that differ by SD; then ⟨C⟩ and ⟨C′⟩ are isomorphic.

It should be possible to generalize Theorems 2 and 3 to digital links, and show that two digital links L, L′ differ by SD iff the isothetic polygonal links obtained by joining the centers of successive voxels in each digital knot of L and L′ are isomorphic. We will not attempt to do this in detail here; but we will conclude this section by proving a simple result about linked connected sets, where we define two 6-connected sets S, T to be linked if they contain 6-curves C, D such that C and D are linked.

Theorem 4. If the 6-connected sets S, T are (un)linked, the same is true for any sets that can be obtained from S and T by SD.

Proof. Suppose SD could link two unlinked 6-components S, T of 1's. Evidently the step at which this happens cannot be a step at which a voxel is changed from 1 to 0. Let it be the step at which the simple voxel c is changed from 0 to 1. Just after this step, S and T must contain 6-curves C, D that are linked. If c is not a voxel of C or D, they must have existed and been linked before c was changed to 1; hence c must be a voxel of one of them, say C. Thus just before the change of c to 1, there existed a 6-arc C′ = C − {c} of 1's which the change completes into a 6-curve. As in the proof that SD cannot create a tunnel by changing a simple voxel from 0 to 1 (see Proposition 1 in Appendix A), there must exist 1's in the neighborhood of c through which C′ can be completed into a 6-curve C*. Evidently C* is contained in S and is linked with D, so that S was already linked with T.

Similarly, suppose SD could unlink two linked 6-components S, T of 1's. Evidently the step at which this happens cannot be a step at which a voxel is changed from 0 to 1. Let it be the step at which the simple voxel c is changed from 1 to 0. Just before this step, S and T contained 6-curves C, D that were linked. If c is not a voxel of C or D, changing c to 0 evidently cannot unlink them; hence c must be a voxel of one of them, say C. As in the proof that SD cannot destroy a tunnel by changing a simple voxel from 1 to 0, there must exist 1's in the neighborhood of c through which C can be diverted; let the diverted 6-curve be C*. Evidently, C* is
contained in S and is still linked with D, so that S remains linked with T. ∎
6. Isothetic regular position for digital links

As in Section 2, we can define z-regular position for a digital link; but in the digital case we want to impose the additional restriction that neighboring pixels in the projection of L are projections of successive voxels on one of the sides of L, so that the projection looks like a digital line drawing. To ensure this, we can displace the sides that are parallel to the x-(y-)axis so they have even y-(x-)coordinates, and magnify them so their endpoints have even x-(y-)coordinates; thus if these coordinates are all distinct, their projections on the xy-plane cannot be 4-neighbors. By Theorem 1, SD can be used to perform quad-moves on digital links; hence SD can be used to put a digital link into z-regular position. A detailed description of an SD process that puts any digital link into z-regular position is given in Appendix A.

For links in regular position, it is well known [1-3] that knot-theoretic isomorphism can be defined in terms of simple types of "moves", known as Reidemeister moves, that locally modify the link's projection; two regular-position links are isomorphic iff their projections differ by a finite sequence of such moves. Such "moves" can also be defined for isothetic links in z-regular position, or for digital links. The moves can be performed on the z-projection of L by performing suitable quad-moves on L; thus in the case of a digital link, the moves can be performed on the projection by applying suitable SD processes to L. Detailed descriptions of these SD processes are given in Appendix B.
7. Concluding remarks

We have defined digital 6-knots in Z³, and have discussed how they are related to isothetic polygonal knots in R³. We have introduced a class of topology-preserving "simple deformations" (SD) of digital images, and have shown how SD is related to knot-theoretic isomorphism. We have also defined "regular position" for isothetic links and digital links, and have shown that SD can be used to put any digital knot into such a position and to perform "Reidemeister moves" on the resulting projection.

This paper has developed basic parts of a theory of knots and links in three-dimensional digital images. It is hoped that in future papers, these concepts can be applied to the development of algorithms for determining the knottedness or linkedness of real objects (ropes, for example) by analyzing their digital images.
Acknowledgements The authors thank Prof. T.Y. Kong of City University of New York for his comments on an earlier draft of this paper.
Appendix A. Simple deformation, magnification, and regular position

In this appendix we will show that SD can be used to put a digital link into z-regular position. We first prove

Proposition 1. For any positive integer m, SD can be used to magnify a digital image in the x-, y-, or z-direction by the factor m.

Proof. Magnification in (e.g.) the z-direction is done by successively dilating each run of 1's or 0's upward. Let t = m − 1, and let the runs r_i whose uppermost voxels are on level i of the image (where the levels are numbered from bottom to top) be dilated upward by the amount it. Evidently, if the lowest voxel of some r_i is on level h + 1 ≤ i, then when the dilation process reaches level h, r_i is re-eroded by ht (since the run r_h below it is dilated upward by ht); hence at the end of the dilation process, r_i's length has become (i − h) + it − ht = (i − h)(t + 1), which is t + 1 = m times its original length. Each r_i is dilated upward only after the runs r_j, for j > i, have been dilated upward; and for each i, all the runs r_i of 0's are dilated upward before all the runs r_i of 1's.

It is not hard to show (see Ref. [5] for a detailed proof in the two-dimensional case) that the voxels whose values are changed by this dilation process are 6-adjacent to only one 6-component of 1's, and 26-adjacent to only one 26-component of 0's. To complete the proof that the process involves only changes in the values of simple voxels, we must show that the process also cannot create or destroy tunnels. In what follows, we use the following notation for the 26-neighborhood of a voxel C, written as three 3 × 3 planes (the plane above C, the plane of C, and the plane below C, so that e lies directly above C and v directly below C):

abc   jkl   rst
def   mCn   uvw
ghi   opq   xyz
It is not hard to see that our dilation process has the following properties:

(a) When C is changed from 0 to 1, we have e = 0 and v = 1, and if any voxel in the plane containing v is 0, so is the voxel directly above it in the plane containing C.
(b) When C is changed from 1 to 0, we have e = 1 and v = 0, and if any voxel in the plane containing C is 1, so is the voxel directly above it in the plane containing e.
Using these facts, the proof that tunnels are not created or destroyed by the process is as follows:

In order for changing C from 0 to 1 to create a tunnel, just before the change there must exist a 6-arc C′ of 1's between two 6-neighbors of C, such that changing C to 1 completes C′ into a 6-curve that is linked with a 26-curve D of 0's. Since e = 0, the endpoints of C′ must be two of k, m, n, p, v. If neither of them is v, we can complete C′ into a 6-curve C* using the voxels directly below its endpoints (these voxels must be 1's, since the voxels directly above them are 1's) and v. Similarly, if one of the endpoints is v, we can join it to the other endpoint through the voxel directly below that endpoint (which must be 1) to form a 6-curve C*. Evidently, in either case, if C′ ∪ {C} is linked with D, so is C*. Thus changing C from 0 to 1 does not create a new tunnel.

In order for changing C from 0 to 1 to destroy a tunnel, just before the change there must exist a 26-curve D of 0's passing through C that is linked with a 6-curve C′ of 1's. If the predecessor or successor of C on D is in the plane containing v, the voxel directly above it is also 0, and is a 26-neighbor of e; while if the predecessor or successor is in the plane containing C or e, it itself is a 26-neighbor of e. Hence D can be diverted into a 26-curve D* of 0's that passes through e instead of C; and evidently if D is linked with C′, so is D*. Thus changing C from 0 to 1 cannot destroy a tunnel.

In order for changing C from 1 to 0 to create a tunnel, just before the change there must exist a 26-arc D′ of 0's between two 26-neighbors of C, such that changing C to 0 completes D′ into a 26-curve that is linked with a 6-curve C′ of 1's. If either endpoint of D′ is in the plane containing e, the voxel directly below it is also 0, and is a 26-neighbor of v; while if either endpoint is in the plane containing C or v, it itself is a 26-neighbor of v. Hence D′ can be diverted into a 26-arc of 0's whose endpoints are 26-neighbors of v. If we adjoin v to this arc, we obtain a 26-curve D*, and evidently if D′ ∪ {C} is linked with C′, so is D*. Thus changing C from 1 to 0 does not create a new tunnel.

In order for changing C from 1 to 0 to destroy a tunnel, just before the change there must exist a 6-curve C′ of 1's passing through C that is linked with a 26-curve D of 0's. Since v = 0, the predecessor and successor of C on C′ must be two of k, m, n, p, e. If neither of them is e, we can divert C′ into a 6-curve C* that passes through the voxels directly above them (which must be 1's) and through e. Similarly, if one of them is e, we can join it to the other one, through the voxel directly above the other one, to form a 6-curve C*. Evidently, in either case, if C′ is linked with D, so is C*. Thus changing C from 1 to 0 cannot destroy a tunnel. ∎

When a digital 6-curve C is magnified by m, say in the z-direction, the lengths of its z-runs are multiplied by m, but its x- and y-runs are thickened into rectangles, parallel to the xz- or yz-plane, of z-height m. However, as we
shall next see, using additional SD steps, the magnified C can be thinned until it is again a 6-curve whose z-run lengths have been multiplied by m and whose x- and y-run lengths are unchanged. We call the result a magnified 6-curve.

Proposition 2. For any positive integer m, SD can be used to magnify any 6-link, in the x-, y-, or z-direction, by the factor m.

Proof. Let r, s, t, … be a sequence of x- and y-direction runs, preceded and followed by a z-direction run. After the magnification, r, s, t, … becomes a sequence of rectangles of height m in the z-direction. Since the z-magnification was accomplished by upward dilations, we reduce these rectangles to line segments parallel to the xy-plane by upward erosion in the z-direction by the amount m − 1. Since the original 6-curves did not touch, and their neighboring 0's were also magnified, the rectangular regions also do not touch. Hence SD can be used to re-erode each rectangular region so that the final result is a 6-link.

This magnified 6-link is once again a set of 6-curves, each consisting of a succession of runs, where the lengths of the z-runs are multiples of m. To see why the lengths are multiples of m, consider a two-dimensional 4-arc consisting of a z-run r of length h and a z-run t of length k, joined by an x-run s, with r below and t above s. When we magnify by the factor m in the z-direction, s becomes a rectangle S of height m, with a z-run R of length m(h − 1) emanating from the bottom row of S, and a z-run T of length m(k − 1) emanating from the top row of S. If we then erode from below by the amount m − 1, R is shortened to length m(h − 1) − (m − 1), and S is thinned to height 1; thus m − 1 pixels are added to the top of R (these pixels were previously along the edge of S that R emanated from), so that the final length of R is m(h − 1), which is m times its original height (if the pixel common to r and s is not counted). The final length of T is m(k − 1) (erosion from below doesn't affect T since it has no bottom pixels); this is also m times its original height if the pixel common to t and s is not counted. ∎

Using Proposition 2, we can now prove

Theorem 5. SD can be used to put any digital link into z-regular position.

Proof. Let the given link L have extent n in the z-direction (i.e., its 1's all lie on n successive planes perpendicular to the z-axis). We begin by magnifying L by m = 2n in the x- and y-directions, and by 5 in the z-direction. We then re-erode the magnified image, as described in Proposition 2, so that the result is again a link. In the resulting link, the x- and y-runs lie in planes whose z-coordinates are multiples of 5. In each such
plane, the cross-section of the link consists of a set of polygonal arcs composed of alternating x- and y-runs of pixels whose endpoints are at x- or y-coordinates that are multiples of m. Parallel runs are also at coordinates which are multiples of m. The endpoints of the arcs are connected to the rest of the link (in other planes) by z-runs whose lengths are multiples of 5.

We shall now show that SD can be used to shift every x- and y-run so that no two x-runs have the same y-coordinate, and no two y-runs have the same x-coordinate. Specifically, we will shift each x-run in the y-direction by the amount 2z, and similarly we will shift each y-run in the x-direction by the amount 2z, where z was the z-coordinate of the run before the z-magnification by 5. This means that the x-runs will now have y-coordinates, and the y-runs will have x-coordinates, that are of the form hm + 2z. Since m is bigger than all the 2z's, the x- and y-runs are now all at different y- and x-coordinates, respectively.

The shifting will be done by dilation and re-erosion. The amounts of dilation involved are only 2z, which is smaller than m, so the dilated runs do not intersect any other runs except for those that share endpoints with them, and it is clear that this can give rise to no changes in topology, so it can be done using SD. The dilation and re-erosion will be applied to one plane at a time, as will now be described.

On the top plane, each x-run is dilated by 2z in the +y-direction, and each y-run is dilated by 2z in the +x-direction. They are then re-eroded by 2z in the same direction. The result is to shift each x-run by 2z in the +y-direction and each y-run by 2z in the +x-direction. When an x- and a y-run meet at a common point, if they are both runs in the positive direction, the result of the re-erosion is to shorten them by 2z; if they are both in the negative direction, they are both lengthened by 2z; and if one is in the positive direction and the other in the negative direction, the positive one is shortened and the negative one lengthened. (It doesn't matter which is dilated and re-eroded first and which is dilated and re-eroded second.)

It remains to explain what happens at the endpoints of the x- and y-runs, where they meet z-runs. To explain this, we consider two cases: where the two ends of the z-run meet runs of the same type (x or y), and where they meet runs of different types.

Suppose a z-run meets x-runs at both of its ends. Since these runs are in different planes, say z = 5j > 5i, they need to be shifted by different amounts, 2j and 2i, in the y-direction. Thus to preserve topology, the upper end of the z-run needs to be shifted by 2j in the y-direction, and its lower end by 2i in the y-direction. Thus the z-run has to bend, e.g. half-way between the two planes, and run in the y-direction for a distance of 2(j − i). Note that in the "final" image, after all the dilation and re-erosion is complete, this short y-run will not coincide, in the projection
parallel to the z-axis, with any other y-run. This is because its x-coordinate is a multiple of m, and it extends from a point of the form hm + 2j to a point of the form hm + 2i, i.e. its length 2(j − i) is a fraction of m. The original y-runs were all originally at x-coordinates that were multiples of m, but the dilation and re-erosion shifts them so their x-coordinates are of the form hm + 2z, which is not a multiple of m. The only y-runs in the final image that can have x-coordinates that are multiples of m are y-runs resulting from the bending of z-runs as just described. But if two of these have x-coordinates that are the same multiple hm of m, they must have joined endpoints that were in different planes; say the other one involves the planes z = 5v > 5u, where u, v are both greater, or both smaller, than i, j. Hence the run generated by the bending of this z-run extends from the point hm + 2v to hm + 2u; this range is disjoint from the range hm + 2j to hm + 2i.

To achieve the needed bending, when we dilate and re-erode an x-run r in the y-direction, we also dilate and re-erode part of the z-run that extends upward or downward from the endpoint of r; specifically, we dilate and re-erode the half of it that contains the endpoint of r. Thus when we are working on the upper of the two planes that contain the endpoints of the z-run, say the plane z = 5j, the top half of the z-run is shifted by 2j; but it is still connected to the lower half by a y-run of length 2j. When we later work on the lower plane, at z = 5i, we dilate and re-erode the bottom half of the z-run, so that it is shifted by 2i, and joined to the top half by a y-run of length 2(j − i).

Finally, suppose a z-run meets an x-run at its upper end and a y-run at its lower end, or vice versa. Here, when we shift the x-run in the y-direction by dilating and eroding it by 2j, we also dilate and erode the z-run by the same amount. If the y-run in the lower plane is in the +y-direction, we shorten it by 2j; i.e., we erode it when we re-erode the z-run, but we do not dilate it first. If it is in the −y-direction, we lengthen it by 2j; i.e., we dilate it when we dilate the z-run, but we do not re-erode it. Other combinations of positive and negative x- and y-runs in the upper and lower planes are handled analogously.

When the entire process is completed, the magnified link now consists of x-runs and y-runs all of which are at different coordinates, or if they have the same coordinate, are in disjoint positions. (The short runs that are created by bending some of the z-runs may have the same coordinates, but the ones that have the same coordinates are in disjoint positions as already shown. If two long runs have the same coordinates, they must have originally been in the same plane, so were disjoint before they were shifted, and are still disjoint after they are shifted.) The only pixels in the projection which are projections of more than one 1 in the magnified link are pixels where an x-run "crosses" a y-run; since no two x-runs have the same y-coordinate (or if they do, they occupy disjoint
ranges of x-coordinates), and no two y-runs have the same x-coordinate (or if they do, they occupy disjoint ranges of y-coordinates), there cannot be three runs whose projections all pass through the same pixel; i.e., there can be at most one x-run and at most one y-run whose projections pass through a given pixel. Moreover, since the runs whose projections pass through the given pixel are x- and y-runs, they have unit thickness in the z-direction; hence that (x, y) column contains only two runs of length 1.

By using an additional factor of 2 in the x- and y-magnifications, we can ensure that the y-coordinates of the x-runs, and the x-coordinates of the y-runs, are all even; hence their crossings have even coordinates, so that no two crossings can be 4-neighbors. Evidently, a crossing has four 4-neighbor 1's, and a non-crossing has two; these two may be opposite 4-neighbors (if the non-crossing is in the middle of a run) or adjacent 4-neighbors (if it is at an L-junction where two runs meet). Note that when an x- or y-run meets a z-run, the z-run must bend in the y- or x-direction, and this gives rise to an L-junction in the projection; there can be no endpoints (1's with exactly one 4-neighbor) in the projection. ∎
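The run-length bookkeeping in Proposition 1 can be checked with a one-dimensional simulation (our own simplification, which reproduces only the net effect of the dilation schedule, not the individual simple-voxel steps): a run occupying levels h + 1 .. i (length i − h) is dilated by it and re-eroded by ht, ending with length (i − h) + it − ht = (i − h)(t + 1) = (i − h)m, where t = m − 1; equivalently, the voxel on level h ends up filling levels hm .. hm + m − 1.

def magnify_column(col, m):
    # net effect of the run-by-run dilation schedule on one voxel column
    out = []
    for v in col:              # bottom-to-top order is preserved
        out.extend([v] * m)    # each level-h cell fills m cells
    return out

def run_lengths(col):
    # lengths of the maximal runs of equal values, bottom to top
    runs = []
    for v in col:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

col = [0, 1, 1, 0, 0, 1, 0]
print(run_lengths(col))                      # [[0,1],[1,2],[0,2],[1,1],[0,1]]
print(run_lengths(magnify_column(col, 3)))   # every run length tripled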
Appendix B. Using simple deformation to perform "Reidemeister moves"

If a digital link is in z-regular position, we can "augment" its projection onto the xy-plane by specifying, at each crossing, whether the x-run or the y-run has a higher z-coordinate in the link. Fig. 6 shows the three types of "Reidemeister moves" that can be performed on augmented projections of digital links. In this appendix we show that these three types of moves can all be performed on the augmented projection of a digital link in z-regular position by applying SD to the link. (For brevity, we will call these digital Reidemeister moves "R-moves".) Thus if the augmented projections of two digital links in z-regular position differ by a sequence of R-moves, the links differ by SD.

Fig. 6. The three types of "Reidemeister moves" for projections of digital links onto the xy-plane. Given a portion of a link L whose projection looks like (a), (b), or (c), the move transforms L into a portion of a link L′ whose projection looks like (a′), (b′), or (c′), respectively. In case (a), at the crossing, the x-run can be higher than the y-run or vice versa. In case (b), at the two crossings, the x-runs can both be higher or both lower than the y-runs. In cases (c) and (c′), the x-run is higher than the y-run where they cross, and the L-shaped arc is higher than both of them where it crosses them. Note that in all three types of moves, the endpoints of the arcs remain in the same positions; thus a move affects a link only locally: the portions of the link not shown in the diagram remain unchanged.
Theorem 6. R-moves can be performed using SD. In other words, given a portion L of a digital link in z-regular position whose projection looks like (a), (b), or (c) in Fig. 6, SD can be used to transform L into an L′ whose projection looks like (a′), (b′), or (c′).

Proof. As explained in Appendix A, we can assume that in a digital link in z-regular position, x-(y-)runs in different z-planes are in different y-(x-)positions. It follows from this that an x-(y-)run in the projection must be the projection of a single x-(y-)run in the link.

We first show how SD can be used to perform an R-move of type (a). This type of move involves a portion of the projection, as shown in Fig. 6(a), in which a y-run (A) crosses an x-run (B). In transforming (a) into (a′) using SD, we will keep the lower end of A and the left end of B fixed. The upper end of A is joined to the right end of B by a "loop" CD. We will show how to move CD to the left of A, keeping A fixed and shortening B. When we have done this, A and B will no longer cross in the projection, and the result will look like (a′). If A is below B, CD will move above A; if A is above B, it will move below A. (Alternatively, we could move CD above or below B, keeping B fixed.)

In (a), if A is below B, then A, C, D, and B are joined by z-runs U, V, W, at least one of which is upward, since B is above A. If D is not above A, we move it upward, by dilating it upward and eroding it from below, until it is at least two levels above A. (This shortens W, and lengthens or shortens V depending on whether C is below or above D.) We then move D, W, and V leftward, by dilating them leftward and eroding them from the right. We also shorten B and C (by eroding them from the right) so that W and V still join the endpoints of B and C to the endpoints of D. Note that when D passes above A, C has zero length and the endpoint of B is above A. As we continue to shift D leftward, C becomes a leftward run, and the endpoint of B is to the left of A; the configuration now looks like (a′). The case where A is above B is treated analogously, except that we begin by moving D downward (if necessary) until it is below A, and then move D, W, and V leftward below A. Evidently, in both cases, the necessary moves can be done using SD; the dilating and eroding configuration remains simply connected. Thus SD can be used to transform each case of (a) so that its projection looks like (a′), with no crossings.

In the configurations resulting from the two cases of (a) whose projections look like (a′), the x- and y-runs are at different z-heights; but evidently SD can be used to transform either of them into the other by moving the runs upward or downward. Since these SD processes are reversible, SD can thus be used to transform each subcase of (a) into the other. [In general, if two links in z-regular position have the same projection, SD can be used to transform
one into the other. This is true because in z-regular-position links created by the method described in Appendix A, runs in the projection correspond to runs in the link.]

We next show how SD can be used to perform an R-move of type (b). This involves a projection, as shown in Fig. 6(b), in which two pieces of x-run (call them A, C) are linked together by a short y-run (B) and pass under (or over) a long piece of y-run (D). The R-move must shorten the arcs A and C, and shift B, so that ABC lies entirely to the left of D (see (b′)). Similarly, we can lengthen A and C so that they again extend to the right of D, but this time passing over D instead of under it.

The yz-plane that contains arc B also contains two z-runs that join the ends of B to A and C respectively. In this plane, B and these two runs look like the pattern
1       1               1
1       1               1
1 1 1 1 1      or       1 1 1 1 1
                                1
                                1
where the horizontal run is B, and the vertical runs are the z-runs; the free ends of the vertical runs are where they meet the x-runs A and C. Evidently, we can use SD to dilate this pattern by one voxel in the x-direction, along A and C, and then erode it by one voxel in the same direction (which also shortens A and C). This process is repeated until, as seen from above (along the z-axis), A and C have been shortened, and B (and the z-runs that join A and C to B) shifted, so that ABC (and the z-runs) lie to the left of D. This can be done using SD because the space that B and the z-runs sweep through while they are shifted contains no 1's and has no 1's adjacent to it; otherwise they would have been visible in the z-projection shown in (b). During this process, the shifting pattern passes underneath D, which lies in a higher z-plane than A, B, and C. Note that the shifted B is no longer in an x-position (modulo m) proportional to its z-position; but it still does not coincide (in the z-projection) with any other y-run, so that the deformed link is still in z-regular position.

We next raise the pattern in the z-direction until it lies above the z-plane containing D. We do this by dilating the horizontal run (B) by one voxel in the upward (+z-)direction, and then re-eroding it; this shifts B upward by one voxel, and shortens or lengthens the vertical runs by one voxel (a z-run that goes upward from the end of B is shortened; a z-run that goes downward from the end of B is lengthened). If shortening is involved (when one or both of the z-runs extends downward from A or C), then when the length of the shortened z-run reaches zero, B continues to shift upward and the shortened z-run
becomes an upward z-run joining the shortened A or C to the end of B; this run is lengthened as B continues to shift upward. This process continues until B is above the z-plane containing D; the shifted B is joined to the ends of the shortened A and C by upward z-runs. Evidently this process can be carried out using SD, since the space that B and the z-runs pass through contains no other 1's and has no 1's adjacent to it. Note that while the y-run B shifts upward, it remains in the same x-position; thus after it is shifted upward, it is at an x-position (modulo m) proportional to its original z-position, not to its new z-position, but it still does not coincide (in the z-projection) with any other y-run, so that the deformed link is still in z-regular position.

Similarly, we can use SD to shift B back to the right, passing above D, until it returns to its original x-position. This is done by repeatedly dilating B by one voxel and re-eroding it by one voxel in the x-direction, away from the ends of the shortened A and C. This process creates new x-runs A′ and C′, in the same z-plane as B, that join the shifting B to the ends of the z-runs that go upward from the ends of A and C. (In the z-projection, A′ and C′ are extensions of the shortened A and C; the z-runs that join them are invisible.) The process continues until B has returned to its original x-position, but in a higher z-plane; at this point, A ∪ A′ and C ∪ C′ look (in the z-projection) like the original unshortened A and C. Evidently the process can be carried out using SD, since the space that B, A′ and C′ pass through contains no other 1's and has no 1's adjacent to it. Note that the x-position of B is now again proportional (modulo m) to the original z-height of B, but not to its new z-height; but the deformed link is still in z-regular position.

Finally, we show how to perform an R-move of type (c) using SD. As in our discussion of the type (b) R-move, the two visible runs A, B of the "L" are straight runs in 3D also; they may be joined by a z-run Z (invisible in the projection), but the entire pattern is at a higher z-level than the two crossing runs. We will deform the L by moving its vertex into the first quadrant, but leaving its endpoints P and Q fixed (in the second and fourth quadrants, respectively). To do this, we repeatedly dilate Z (alternately) in the +y- and +x-directions; this converts it into an upright
rectangular solid, with one vertical edge along Z; as the solid grows upward and rightward, the left edge of its upper face extends along A, and the bottom edge of its lower face extends along B (or vice versa, if B is in a higher z-plane than A). We continue this growth until the edge of the solid opposite Z reaches the desired position above the first quadrant defined by the two crossing runs. Evidently this dilation can be done using SD, since there are no other 1's in or adjacent to the volume occupied by the growing solid (except for those on A and B).

The solid can then be eroded from the −y- and −x-directions, until nothing remains except for its +y- and +x-faces. The +y face can then be eroded from the −z- and −x-directions until nothing remains except its +z and +x edges, and the +x face can be eroded from the −y- and +z-directions until nothing remains except its −z and +y edges. Note that the +x edge of the +y face is the same as the +y edge of the +x face; this z-run is in the new position Z′ of Z. The +z edge of the +y face is an x-run that joins the upper end of Z′ to P, and the −z edge of the +x face is a y-run that joins the lower end of Z′ to Q. (In this construction we have assumed that A is in a higher z-plane than B; the reverse case is handled analogously.) Again, it is clear that these erosions can all be performed using SD. As in case (b), the y- and x-positions of the new x- and y-runs no longer correspond (modulo m) to their z-values, but the link is still in z-regular position. ∎
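As a small illustration of how such moves act on an augmented projection, the following sketch (our own bookkeeping, not the paper's) encodes one knot's traversal as a cyclic sequence of (crossing id, over/under) visits and performs the unkinking direction of a type-(a) move, which deletes two consecutive visits to the same crossing:

def remove_kinks(code):
    # repeatedly delete cyclically adjacent pairs of visits to the same
    # crossing -- the unkinking direction of the type-(a) move
    changed = True
    while changed and code:
        changed = False
        n = len(code)
        for i in range(n):
            if code[i][0] == code[(i + 1) % n][0]:
                code = [code[j] for j in range(n)
                        if j not in (i, (i + 1) % n)]
                changed = True
                break
    return code

kinked = [(1, "O"), (1, "U"), (2, "U"), (3, "O"), (3, "U"), (2, "O")]
print(remove_kinks(kinked))   # [] : every crossing unkinks; an unknot diagram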
References

[1] R.H. Crowell, R.H. Fox, Introduction to Knot Theory, Springer, New York, 1977.
[2] G. Burde, H. Zieschang, Knots, Walter de Gruyter, Berlin, 1985.
[3] A. Kawauchi, A Survey of Knot Theory, Birkhäuser, Basel, 1996.
[4] T.Y. Kong, A. Rosenfeld, Digital topology: introduction and survey, Comput. Vision Graphics Image Process. 48 (1989) 357–393.
[5] A. Rosenfeld, T.Y. Kong, A. Nakamura, Topology-preserving deformations of two-valued digital pictures, Graph. Models Image Process. 60 (1998) 24–34.
About the Author: AKIRA NAKAMURA received the B.A. degree from Hiroshima University in 1953 and the Doctor of Science degree in mathematics from Nagoya University in 1963. From 1959 to 1964 he was an assistant professor and from 1965 to 1970 an associate professor of Applied Mathematics at Nihon University. From 1966 to 1968 he was a visiting assistant professor at the Department of Information Science of the University of North Carolina at Chapel Hill. From 1970 to 1991 he was a professor of Applied Mathematics at Hiroshima University. In 1977 he was also a visiting professor at the Computer Science Center of the University of Maryland. From 1991 to 1998 he was a professor of Computer Science at Meiji University. He is now a part-time professor in the Department of Computer Science at Hiroshima-Denki Institute of Technology and also a professor emeritus at Hiroshima University. He is working on the theory of digital pictures and on rough sets theory for artificial intelligence. He is an Associate Editor of the journal Information Sciences.
About the Author: AZRIEL ROSENFELD is a tenured Research Professor, a Distinguished University Professor (since 1995), and Director of the Center for Automation Research at the University of Maryland in College Park. He also holds affiliate professorships in the Departments of Computer Science, Electrical Engineering, and Psychology. He holds a Ph.D. in Mathematics from Columbia University (1957), rabbinic ordination (1952) and a Doctor of Hebrew Literature degree (1955) from Yeshiva University, and honorary Doctor of Technology degrees from Linköping University, Sweden (1980) and Oulu University, Finland (1994). Dr. Rosenfeld is widely regarded as the leading researcher in the world in the field of computer image analysis. Over a period of 35 years he has made many fundamental and pioneering contributions to nearly every area of that field. He wrote the first textbook in the field (1969); was founding editor of its first journal (1972); and was co-chairman of its first international conference (1987). He has published over 30 books and nearly 600 book chapters and journal articles, and has directed over 50 Ph.D. dissertations. He is a Fellow of the Institute of Electrical and Electronics Engineers (1971), and won its Emanuel Piore Award in 1985; he is a founding Fellow of the American Association for Artificial Intelligence (1990) and of the Association for Computing Machinery (1993); he is a Fellow of the Washington Academy of Sciences (1988), and won its Mathematics and Computer Science Award in 1988; he was a founding Director of the Machine Vision Association of the Society of Manufacturing Engineers (1985–1988), won its President's Award in 1987, and is a certified Manufacturing Engineer (1988); he was a founding member of the IEEE Computer Society's Technical Committee on Pattern Analysis and Machine Intelligence (1965), served as its Chairman (1985–1987), and received the Society's Meritorious Service Award in 1986, its Harry Goode Memorial Award in 1995, and became a Golden Core member of the Society in 1996; he received the IEEE Systems, Man, and Cybernetics Society's Norbert Wiener Award in 1995; he received an IEEE Standards Medallion in 1990, and the Electronic Imaging International Imager of the Year Award in 1991; he was a founding member of the Governing Board of the International Association for Pattern Recognition (1978–1985), served as its President (1980–1982), won its first K.S. Fu Award in 1988, and became one of its founding Fellows in 1994; he received the Information Science Award from the Association for Intelligent Machinery in 1998; he was a Foreign Member of the Academy of Science of the German Democratic Republic (1988–1992), and is a Corresponding Member of the National Academy of Engineering of Mexico (1982).
Pattern Recognition 33 (2000) 1555–1560
Color image compression using PCA and backpropagation learning

Clifford Clausen, Harry Wechsler*
Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
Received 6 April 1999; accepted 17 May 1999
Abstract The RGB components of a color image contain redundant information that can be reduced using a new hybrid neural-network model based upon Sanger's algorithm for representing an image in terms of principal components and a backpropagation algorithm for restoring the original representation. The PCA method produces a black and white image with the same number of pixels as the original color image, but with each pixel represented by a scalar value instead of a three-dimensional vector of RGB components. Experimental results show that as our hybrid learning method adapts to local (spatial) image characteristics it outperforms the YIQ and YUV standard compression methods. Our experiments also show that it is feasible to apply training results from one image to previously unseen images. © 2000 Published by Elsevier Science Ltd.
Keywords: Principal component analysis (PCA); Color image compression; Backpropagation (BP) learning
1. Introduction
Even though the spectral content of illumination is infinite dimensional, three dimensions are enough to adequately represent color. Judd et al. [1] use principal component analysis of daylight illumination to show that 99% of the variance can be accounted for with only three principal components. Furthermore, 85% of variance can be represented with only two color channels. Today, color images are rendered on computer monitors using only three primary colors, usually red, green and blue (RGB). Therefore, a straightforward way to compress a color image is to compress each of the red, green and blue gray-scale images that compose the image. A common alternative to the RGB representation is the YIQ representation, which is the standard used for television transmission. An RGB represented color image can be converted to YIQ coordinates as follows [2, pp. 45–46]:
$$\begin{pmatrix} Y \\ I \\ Q \end{pmatrix} =
\begin{pmatrix} 0.30 & 0.59 & 0.11 \\ 0.60 & -0.27 & -0.32 \\ 0.21 & -0.52 & 0.31 \end{pmatrix}
\begin{pmatrix} R \\ G \\ B \end{pmatrix}.$$
Here Y is the luminance or brightness, I is the hue, and Q is the saturation or depth of color. Luminance refers to color intensity; hue is the dominant color, such as orange, red, or yellow; and saturation is the amount of white light mixed with a hue. With respect to a television set, Y represents the black and white image and is what is displayed by a black and white television set. I and Q correspond to the two color adjustment knobs found on a color television set and contain the additional information used to produce color images; for this reason, I and Q together are sometimes referred to as chrominance values. An RGB representation can be obtained from a YIQ representation by inverting the above transformation as
follows:

$$\begin{pmatrix} R \\ G \\ B \end{pmatrix} =
\begin{pmatrix} 1.00 & 0.96 & 0.62 \\ 1.00 & -0.27 & -0.65 \\ 1.00 & -1.10 & 1.70 \end{pmatrix}
\begin{pmatrix} Y \\ I \\ Q \end{pmatrix}.$$
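To make the transform pair concrete, here is a small NumPy sketch that applies the forward and inverse matrices per pixel (function names are ours; with the rounded coefficients printed above, the round trip is only approximate):

```python
import numpy as np

# Forward and inverse YIQ matrices as printed above (rounded).
RGB_TO_YIQ = np.array([[0.30,  0.59,  0.11],
                       [0.60, -0.27, -0.32],
                       [0.21, -0.52,  0.31]])
YIQ_TO_RGB = np.array([[1.00,  0.96,  0.62],
                       [1.00, -0.27, -0.65],
                       [1.00, -1.10,  1.70]])

def rgb_to_yiq(img):
    """img: H x W x 3 float RGB array; returns the YIQ representation."""
    return img @ RGB_TO_YIQ.T

def yiq_to_rgb(img):
    """Inverse transform back to (approximate) RGB."""
    return img @ YIQ_TO_RGB.T
```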
Television transmission uses less bandwidth for the hue and saturation components than it does for luminance. Similarly, when compressing color images, the two chrominance values, I and Q, can be decimated by between one-quarter and one-half without perceptible image degradation [2]. It is also possible to decimate Y, as the results in Section 3 below show. Through decimation, the number of bits required to represent the original color image can be reduced substantially before gray-scale encoding begins. Encoding a color image then consists of converting an RGB image to YIQ coordinates, decimating Y, I, and Q, and then applying a gray-scale compression algorithm such as JPEG [3, pp. 347–408] to each of the Y, I, and Q coordinates. To reconstruct a color image, decode each of Y, I, and Q, interpolate them to their original size, and finally use the inverse transformation above to obtain RGB coordinates for the image. Another color transformation is the YUV code used by MPEG-1. Again Y is brightness or luminance, and U and V are chrominance values. The luminance Y is defined as before, but the color information is stored differently. The YUV transformation is defined as follows:
$$\begin{pmatrix} Y \\ U \\ V \end{pmatrix} =
\begin{pmatrix} 0.30 & 0.59 & 0.11 \\ -0.15 & -0.29 & 0.44 \\ 0.61 & -0.52 & -0.10 \end{pmatrix}
\begin{pmatrix} R \\ G \\ B \end{pmatrix}.$$
As before, we can convert a YUV representation to an RGB representation using the inverse transformation as follows:
$$\begin{pmatrix} R \\ G \\ B \end{pmatrix} =
\begin{pmatrix} 1.01 & 0.007 & 1.14 \\ 0.994 & -0.381 & -0.583 \\ 1.00 & 2.02 & 0.006 \end{pmatrix}
\begin{pmatrix} Y \\ U \\ V \end{pmatrix}.$$
We obtain compression from YUV coordinates by decimating Y, U, and V. As pointed out in Ref. [4], there are other color coordinate systems that can be used to represent color, such as the Munsell color order system [5] and CIELUV [6]. These color coordinate systems do not take advantage of the spatial relationships between colors in an image. The colors of adjacent pixels in a natural image are highly correlated and, because of this spatial correlation, further compression is possible. In the next section, a new method for encoding color images, using a learning method based upon principal component analysis that takes advantage of the spatial correlation of color, is presented.

2. Learning method
The general approach is to pre-process a color image by first reducing its dimension using learning based upon principal component analysis (PCA) and then compressing the reduced image. We reduce the dimension of an image by first partitioning it into 2×2 blocks (that is, 4 pixels per block). The color image is assumed to be represented in RGB format, hence each pixel is represented by a three-dimensional vector (r, g, b), where each vector component is represented by one byte. Therefore, each 2×2 block can be represented as a (3 components per pixel) × (4 pixels) = 12-dimensional vector, where each vector component is one byte. We consider an image to be a collection of 12-dimensional vectors, and our objective is to find a partial basis to represent this 12-dimensional vector space. We want the partial basis to minimize the sum of square errors over the entire image. We use a partial basis consisting of four orthogonal vectors, as this works well experimentally. The space represented by this partial basis will be called the reduced image space, and the original image transformed to the reduced image space will be called the reduced image. In matrix notation, we are seeking a matrix $A$ such that $Ax = y$, where $x$ is a 12-dimensional vector in the original image with the mean $\mu$ removed, and $y$ is a 4-dimensional vector in the reduced image. It is the reduced image that we compress using a gray-scale compression method. Let $B = (A^{T}A)^{-1}A^{T}$ be the pseudo-inverse of $A$, so that $\hat{x} = By$. Then $B$ should be such that the square error $J = \sum_{i=0}^{N-1} (By_{i} - x_{i})^{T}(By_{i} - x_{i})$ is minimized, where there are $N$ 2×2 blocks in the image, $x_{i}$ corresponds to the $i$th 2×2 pixel block in the image, and $By_{i}$ is the estimate $\hat{x}_{i}$ of $x_{i}$. We use a neural network learning algorithm based on PCA to determine $A$ and a feed-forward neural network trained with backpropagation to determine $B$. We connect these two networks together to form a hybrid neural network, as depicted in Fig. 1.
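A minimal NumPy sketch of this blocking step, under the assumption of even image dimensions (the function name is ours):

```python
import numpy as np

def image_to_block_vectors(img):
    """Partition an H x W x 3 RGB image (H, W even) into 2x2 blocks and
    return an (N, 12) array of mean-removed block vectors plus the mean."""
    h, w, _ = img.shape
    blocks = (img.reshape(h // 2, 2, w // 2, 2, 3)
                 .transpose(0, 2, 1, 3, 4)   # group each 2x2 block together
                 .reshape(-1, 12)
                 .astype(float))
    mu = blocks.mean(axis=0)                 # the stored 12-d mean vector
    return blocks - mu, mu
```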
Fig. 1. Color reduction network.
The Sanger network [7] in Fig. 1 creates a reduced-dimension image, and the feed-forward network (FFN) produces an estimate of the original color image from the reduced image. The Sanger network is trained using Sanger's unsupervised learning algorithm; the feed-forward network is trained using the supervised backpropagation algorithm [8]. The hybrid network is trained using the 12-dimensional vectors $x$ as input, learning the $A$ matrix by Sanger's rule. The $A$ matrix is used to calculate a four-dimensional vector $y$ for each input vector $x$. The $y$ vectors are then used as input to a feed-forward network with linear elements to learn the $B$ matrix that produces estimates $\hat{x}$ of $x$. Sanger's rule learns weights $a_{ij}$ so that the output is $y_{i} = \sum_{j} a_{ij}x_{j}$. The algorithm begins with an arbitrary initial $A$ and updates the weights according to $\Delta a_{ij} = \eta y_{i}(x_{j} - \sum_{k \leq i} y_{k}a_{kj})$, where $\eta$ is a small learning rate that determines how quickly $A$ changes. The FFN begins with an initial $B$ and learns weights that produce the output $\hat{x} = By$; the update rule for $B$ is $\Delta b_{ij} = \eta(x_{i} - \hat{x}_{i})y_{j}$.

The matrix $A$ need not be stored with the compressed image, as it is not needed for decompression. However, the matrix $B$ must be stored as part of the compressed image so that $\hat{x}$ (the decompressed image) can be produced from $y$ (the decompressed reduced image). Each element of $B$ can be adequately stored in 4 bytes; hence it takes $4 \times 4 \times 12 = 192$ bytes to store $B$. We also store the mean vector $\mu$, which requires 12 bytes (1 byte per component). We store only four principal-component multipliers for each 2×2 pixel block, using 8 bits for the first principal-component multiplier and 6 bits for multipliers 2, 3, and 4; hence we need 26 bits for each 2×2 block. Therefore, for a 256×256 image we need $192(8) + 12(8) + 26(128^{2}) = 427{,}616$ bits, compared with 1,572,864 bits for the original image, a compression ratio of 3.7.

The arrangement of the compressed $y$ components in the reduced image, in preparation for the subsequent application of a gray-scale compression algorithm, is important to the overall results. There are four $y$ components, and each component should be arranged in a different quadrant of the image plane. Fig. 2 depicts the four principal components of a reduced color image with the components arranged in different image quadrants. This arrangement eliminates artificially induced high-frequency content that would result if all four components of $y$ were placed adjacent to each other; high-frequency content makes gray-scale compression more difficult. As can be seen from Fig. 2, each $y$ component is in general quite different from the others, while each individual $y$ component displays high local spatial correlation.
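For concreteness, the following NumPy sketch performs one training step of the hybrid network. The 4 × 12 encoder shape, the random initialization in [−1.0, 1.0], and the learning rate of 1.0E-6 follow the text; the function name and the vectorized form of the updates are our own.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(-1.0, 1.0, (4, 12))    # Sanger network (encoder)
B = rng.uniform(-1.0, 1.0, (12, 4))    # linear feed-forward decoder
eta = 1e-6                             # learning rate used in the paper

def train_step(x, A, B, eta=eta):
    """One update on a single mean-removed 12-d block vector x."""
    y = A @ x                          # reduced 4-d representation
    # Sanger's rule: da_ij = eta * y_i * (x_j - sum_{k<=i} y_k a_kj)
    recon = np.tril(np.ones((4, 4))) @ (y[:, None] * A)
    A += eta * y[:, None] * (x[None, :] - recon)
    # Backpropagation on the linear decoder: db_ij = eta*(x_i - xhat_i)*y_j
    xhat = B @ y
    B += eta * np.outer(x - xhat, y)
    return A, B
```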
3. Results

In this section we present results using each of YIQ, YUV, and the PCA/BP hybrid learning as color reduction methods on the images shown in Fig. 3. Note that these images are color images but are shown here only in black and white. For YIQ reduction, we first transform the RGB image to the YIQ representation, using the best bit allocation found empirically (6, 5, and 5 bits for the Y, I, and Q components, respectively), and then use the inverse transform to convert back to the RGB representation. We perform the same operations for the YUV representation, using 6, 5, and 5 bits, respectively,
Fig. 2. Reduced color image.
Table 1
Color compression results (PSNR values for the red (R), green (G), blue (B), and combined (C) channels)

                     YIQ                        YUV                   PCA/BP learning
Picture     R     G     B     C        R     G     B     C        R     G     B     C
Lady      29.4  41.3  31.0  31.7     31.5  41.3  26.7  30.1     32.2  32.3  35.6  33.1
Baboon    28.2  41.1  32.8  31.5     30.9  39.5  27.3  30.3     34.6  34.7  35.5  34.9
Boy       29.3  41.6  31.3  31.8     30.9  40.8  27.0  30.2     31.5  31.7  34.7  32.4
Cave      28.2  40.2  33.8  31.7     31.4  40.9  27.8  30.9     31.7  36.1  32.8  33.2
Chalk     28.9  40.0  30.7  31.3     29.3  41.0  27.1  29.7     34.9  35.7  34.8  35.1
Coral     29.3  39.4  31.6  31.8     30.6  40.0  27.8  30.6     29.5  28.5  30.2  29.3
Fig. 3. Test images.
for the Y, U, and V components. For each of YIQ and YUV we obtain a compression ratio of $(3)(8)/(6+5+5) = 1.5$. For the learning approach, we learn the transform weights as well as the inverse transform weights as described above, using a value of 1.0E-6 for the learning rate and 20,000 repetitions. Learning takes about 9 s on a 133 MHz Pentium processor. We convert each image to the principal component representation as in Fig. 2 and use 8, 6, 6, and 6 bits, respectively, for the first four principal components. As explained above, this bit allocation results in a compression ratio of 3.7. Next we use the weights for the inverse transform to convert back to the RGB representation. For each image and each color reduction method, we compare the original RGB image
Table 2
Progress of learning (PSNR values; training steps show before/after values and cumulative repetitions before/after)

      Step    Picture    R           G           B           C           Rep
 1.   Train   Chalk       9.3/25.6    9.3/26.9    7.3/25.5    8.5/26.0      0/1000
 2.   Train   Cave       23.9/23.5   25.1/25.7   24.4/24.4   24.4/24.4   1000/2000
 3.   Train   Chalk      25.0/25.8   27.3/27.0   25.0/26.9   25.6/26.5   2000/3000
 4.   Train   Cave       24.4/25.7   25.9/27.2   25.4/26.8   25.2/26.5   3000/4000
 5.   Train   Chalk      25.4/27.3   26.4/29.0   28.2/29.3   26.5/28.4   4000/5000
 6.   Test    Boy        29.5        22.7        29.3        25.9        5000
 7.   Test    Lady       29.5        23.5        28.4        26.3        5000
 8.   Train   Cave       26.2/25.4   27.4/28.1   27.6/28.6   27.0/27.1   5000/6000
 9.   Train   Chalk      25.2/26.8   28.7/30.8   30.0/32.2   27.5/29.3   6000/7000
10.   Test    Boy        30.4        23.9        28.9        26.8        7000
11.   Test    Lady       28.9        24.2        28.6        26.7        7000
12.   Train   Cave       25.5/27.0   28.4/29.6   29.7/30.8   27.4/28.8   7000/8000
13.   Train   Chalk      27.3/31.4   31.5/34.1   31.8/34.5   29.8/33.1   8000/9000
14.   Test    Boy        31.0        24.4        29.1        27.2        9000
15.   Test    Lady       30.5        24.7        28.9        27.3        9000
with the resulting RGB image produced from the reduced image. We calculate four PSNR values for each image in order to make comparisons; these correspond to the noise induced in the red (R), green (G), blue (B), and combined (C) channels. (PSNR is the peak signal-to-noise ratio, $\mathrm{PSNR} = 20\log_{10}(d/\mathrm{RMS})$, where $d$ is the peak signal, usually 255, and RMS is the root-mean-square error between the original and estimated image components. For each of the red, green, and blue PSNR calculations, $\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(p_{i} - \hat{p}_{i})^{2}}$, where $N$ is the number of pixels in the image, $p_{i}$ is the corresponding red, green, or blue value of the $i$th pixel in the original image, and $\hat{p}_{i}$ is the estimate of $p_{i}$. The combined-channel RMS is $\mathrm{RMS}_{c} = \sqrt{\frac{1}{3}(\mathrm{RMS}_{r}^{2} + \mathrm{RMS}_{g}^{2} + \mathrm{RMS}_{b}^{2})}$, where $\mathrm{RMS}_{r}$, $\mathrm{RMS}_{g}$, and $\mathrm{RMS}_{b}$ are the RMS values for the red, green, and blue components, respectively.) Results are shown in Table 1, where the R, G, B, and C columns hold PSNR values for red, green, blue, and combined RGB, respectively. In five of six cases, learning outperforms YIQ and YUV with respect to PSNR, even though in all six cases learning achieves more than twice the compression of the other two methods. The performance of learning is due to two factors. First, the learning method adapts to individual images, whereas YIQ and YUV apply globally, without change, to all images. Second, the learning method takes advantage of the spatial correlation of color in images, which the YIQ and YUV methods do not. The primary disadvantage of the learning approach is the computation time required for learning.
Once learning has taken place, the conversion to the reduced image and back to the new RGB image is very fast. Hence, if the results of learning for one image could be applied to another image, significant improvement in overall compression time would result. To show that learning from one image does apply to other images, we tried a second experiment in which we trained on the Chalk and Cave images in Fig. 3 and applied the training results to the Boy and Lady images. Table 2 shows PSNR results for this experiment, where the Train rows correspond to training exemplars and the Test rows correspond to test images. Learning on the training examples proceeds in steps of 1000 repetitions (approximately 0.5 s) instead of the 20,000 repetitions used for the results in Table 1. The learning rate is again 1.0E-6. In cells where two PSNR values are shown (e.g. 9.3/25.6), the values are for training images: the first number is the PSNR just before training at that step and the second is the PSNR just after. In cells where only one PSNR value is shown, the PSNR is for a test image and no training was conducted at that step. The Rep column shows the number of cumulative training repetitions before and after training at that step; hence, two numbers are shown for the training exemplars and one for the test images. Step 1 shows the results for the Chalk image when the weights are initialized to random values between −1.0 and 1.0. Table 2 shows that, as learning proceeds on the training images, the reproduced image quality of the training images improves, as would be expected. However, we also see that the image quality of the test images improves, even though the test images are not used for training. This experiment shows that it is feasible to
apply training from one image to previously unseen images.
4. Conclusion

This paper introduces a hybrid learning method consisting of PCA and backpropagation for reducing ('compressing') an RGB color image to a black and white image with the same number of pixels as the original color image. This reduction achieves a 3.7 compression ratio while retaining the capability of reproducing an approximation of the original color image that is optimal with respect to the minimum square error. The resulting black and white image can be further compressed using a gray-scale compression method such as JPEG. Experimental results show that as our hybrid learning method adapts to local (spatial) image characteristics, it outperforms the YIQ and YUV standard compression methods. Our experiments also show that it is feasible to apply training results from one image to previously unseen images.
References

[1] D.B. Judd, D.L. MacAdam, G. Wyszecki, Spectral distribution of typical daylight as a function of correlated color temperature, J. Opt. Soc. Amer. A 54 (1964) 1031–1040.
[2] Y. Fisher (Ed.), Fractal Image Compression: Theory and Application, Springer, New York, NY, 1995.
[3] M. Nelson, The Data Compression Book, M & T Books, San Mateo, CA, 1982.
[4] G.E. Healey, S.A. Shafer, L.B. Wolff (Eds.), Physics-Based Vision: Principles and Practice: Color, Vol. 2, A.K. Peters Ltd., 1992.
[5] D.B. Judd, G. Wyszecki, Color in Business, Science, and Industry, Wiley, New York, 1975.
[6] F. Grum, C.J. Bartleson (Eds.), Optical Radiation Measurements, Vol. 2, Academic Press, New York, 1980.
[7] T.D. Sanger, Optimal unsupervised learning in a single-layer linear feedforward neural network, Neural Networks 2 (1989) 459–473.
[8] J. Hertz, A. Krogh, R.G. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, Reading, MA, 1991.
About the Author: HARRY WECHSLER received the Ph.D. in Computer Science from the University of California, Irvine, in 1975, and is presently Professor of Computer Science at George Mason University. His research on intelligent systems has been in the areas of PERCEPTION: Computer Vision, Automatic Target Recognition, Signal and Image Processing; MACHINE INTELLIGENCE: Pattern Recognition, Neural Networks, Machine Learning (ML), Information Retrieval, Data Mining and Knowledge Discovery; EVOLUTIONARY COMPUTATION: Genetic Algorithms and Animats; MULTIMEDIA and VIDEO PROCESSING: Large Image Databases, Document Processing; and HUMAN-COMPUTER INTELLIGENT INTERACTION: Face and Hand Gesture Recognition, Biometrics and Forensics. He was Director of the NATO Advanced Study Institutes (ASI) on "Active Perception and Robot Vision" (Maratea, Italy, 1989), "From Statistics to Neural Networks" (Les Arcs, France, 1993), and "Face Recognition: From Theory to Applications" (Stirling, UK, 1997), and he served as co-Chair of the International Conference on Pattern Recognition held in Vienna, Austria, in 1996. He has authored over 200 scientific papers; his book "Computational Vision" was published by Academic Press in 1990; he is the editor of "Neural Networks for Perception" (Vols. 1 & 2), published by Academic Press in 1991, and co-editor of "Face Recognition: From Theory to Applications", published by Springer-Verlag in 1998. He was elected an IEEE Fellow in 1992 and an International Association for Pattern Recognition (IAPR) Fellow in 1998.

About the Author: CLIFFORD CLAUSEN received the Ph.D. in Information Technology from George Mason University, Fairfax, VA, in 1999. Currently, Clifford works for Unisys Corporation as a software engineer. His research interests include image compression, signal processing, computer vision, neural networks, pattern recognition, medical information systems, information retrieval, learning systems, and data mining.
Pattern Recognition 33 (2000) 1561–1573
Classifying cervix tissue patterns with texture analysis

Qiang Ji*, John Engel, Eric Craine

Department of Computer Science, University of Nevada, Reno, NV 89557, USA
Western Research Company, Tucson, Arizona, USA

Received 16 December 1998; accepted 17 May 1999
Abstract

This paper presents a generalized statistical texture analysis technique for characterizing and recognizing the typical, diagnostically most important, vascular patterns relating to cervical lesions in colposcopic images. The major contributions of this research include the development of a novel generalized statistical texture analysis approach for accurately characterizing cervical textures and the introduction of a set of textural features that capture the specific characteristics of cervical textures as perceived by humans. An experimental study demonstrated the feasibility and promise of the proposed approach in discriminating between cervical texture patterns indicative of different stages of cervical lesions. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Texture analysis; Cervix lesion classification; Colposcopic images; Pattern recognition
1. Introduction

The incidence of cervical cancer mortality has been dramatically reduced since the introduction of the Papanicolaou (Pap) test. However, the Pap test is unable to accurately and consistently identify premalignant and malignant disease of the cervix, and the incidence of false-negative Pap tests has become strikingly high: certain studies [1] show that false-negative Pap tests account for up to 40% of the actual positives in the sample. The high false-negative rate of Pap tests has motivated the use of colposcopy as a standard screening procedure for precancer examination [1]. Another serious problem with Pap smears is that they produce an enormous number of positive findings that need to be followed up; most of these findings turn out to be false positives based on insignificant cellular changes. Computer-supported colposcopy can help reduce the false positives of Pap smears, thereby substantially reducing the costs and patient suffering associated with
unnecessary biopsies. Finally, colposcopy can significantly improve the ability of physicians to perform accurate punch biopsies for histological analysis. Today, regular cytological screening of the cervix via the Pap test, supplemented by colposcopy and colposcopically directed cervical biopsies, has become an established part of modern gynecological practice. One of the major factors hindering the full utilization of colposcopy is the difficulty of training physicians in the recognition of pathology. Colposcopic images contain complex and confusing lesion patterns, and correctly analyzing and classifying the different types of tissue requires substantial training. The average physician in private practice does not examine enough patients to maintain such expertise and may need to rely on consultation with colposcopy experts. Furthermore, there is a rapidly increasing trend for people other than highly trained colposcopists, such as primary care physicians, to be involved in the screening procedure. It is therefore necessary to simplify the use of colposcopy and to enhance its capability so that average physicians can correctly recognize the various tissue patterns in a consistent and uniform fashion. For this reason, we developed an image analysis system to help physicians better interpret the various patterns in colposcopic images and to simplify the operation of colposcopy.
Fig. 1. Typical vascular patterns encountered in cervical lesions (adapted from Ref. [2]). (a) network capillaries in original squamous epithelium; (b) hairpin capillaries in original squamous epithelium; (c) and (d) punctation vessels in dysplasia and carcinoma in situ; and (e) and (f) mosaic vessels as seen in dysplasia and carcinoma in situ.
A careful examination of various cervical images reveals regular and repeatable vascular patterns indicating different stages of dysplasia. In fact, the vascular pattern is the most important diagnostic criterion used by colposcopists for recognizing pathology [2]. For example, two basic types of vascular patterns observable in normal or benign lesions are hairpin and network capillaries; on the other hand, different versions of punctation and mosaic vascular patterns may be observed in areas of dysplasia and carcinoma in situ. Fig. 1 shows typical vascular patterns encountered in cervical lesions. Various texture analysis methods have been developed to analyze and classify patterns like these. Classification based on texture analysis is a well-established technique; many techniques have been developed over the years [3–7] and have been successfully applied in many fields, including remote sensing, medical diagnosis, and product inspection. Applications of texture analysis in biomedicine focus on cell and tissue classification. For example, Wu and Chen [8] proposed a multi-threshold texture vector for liver tissue classification; Houston et al. [9] investigated different texture measures of ultrasound images for diagnosing prostate cancer and identifying cancerous lesions; Lachmann et al. [10] used texture analysis for brain tissue classification from MRI images; and Fortin et al. [11] advocated the use of texture analysis for the segmentation of cardiac images.
These examples demonstrate the importance and utility of texture analysis for tissue classification. However, to the best of our knowledge, the use of texture analysis techniques (or other image processing approaches) for recognizing and classifying cervix lesions has not been reported in the literature. For this project, we introduce a texture analysis technique for recognizing and classifying cervical tissues. Texture analysis results in a set of feature metrics that describe the characteristics of different vascular patterns; for example, a texture analysis of mosaicism will result in a set of feature measurements that characterize the class of tissues exhibiting mosaicism. Texture analysis is a three-layer process. The first layer identifies the texture primitives of which texture patterns are composed; the texture primitives are usually geometric entities such as pixels or line segments. The second layer extracts certain properties of the identified texture primitives; depending on the type of primitive, the properties can be tonal properties, such as intensities for pixels, or geometric attributes, such as length and direction for line segments. The last layer describes the spatial and/or statistical distribution of the primitives in terms of their attributes; the description can be symbolic or statistical. Parallel to the above categorization, there are two basic approaches to the analysis of textures: structural and statistical. In the structural approach, textures are characterized by the texture primitives and the placement rules. The placement rules relate the centroids of various
primitives in terms of their relative positions, such as adjacency or nearness. The structural approach emphasizes the shape aspects of the primitives; all structural methods assume that textures are made up of primitives appearing in a near-regular, repetitive, or periodic spatial arrangement. In statistical methods, the granulation, linearation, and randomness that constitute a texture pattern are characterized by taking statistics on the pixels. Traditional statistical methods assume that the spatial intensity distribution is related to the texture pattern; for example, fine textures have a spectrum rich in high spatial frequencies while coarse textures are rich in low spatial frequencies. Different techniques [4,6,12] have been developed to describe the intensity distribution: texture may be described by the auto-correlation function, which measures the linear dependence between pixels, or by the gray tone run length, which characterizes textures by the statistics of the gray tones of runs. One of the most popular statistical methods is the spatial gray tone co-occurrence matrix (GTCM) [12], whose entries are the second-order joint conditional probabilities of two gray tones occurring at a certain distance apart in a given direction; first- and second-order statistics derived from this matrix are used as measures for texture analysis. Statistical methods are good for analyzing disordered textures.

A careful examination of the textural patterns relating to cervical lesions revealed the following. First, the texture patterns of cervical lesions are primarily due to the vascular patterns; the non-vascular structures in cervical images contribute very little to the formation of texture patterns. Furthermore, the vascular structures are mainly characterized by the geometric shape and spatial distribution of the capillaries; the gray levels and thickness of the capillaries are irrelevant to the vascular patterns, so cervix texture patterns cannot be characterized by the spatial intensity distribution of the capillaries. Second, texture patterns of cervical lesions do not exhibit regular repetitive or periodic structures. Based on these observations, we may conclude that the conventional structural and statistical texture analysis approaches are not directly applicable to characterizing cervical texture patterns. The structural approach is not applicable because it looks for regular repetitive or periodic spatial arrangements, which are not present in cervical texture patterns. The statistical approach, on the other hand, characterizes textures by their intensity distribution; it depends more on the intensity transitions within texture elements than on the structural organization of the texture. To best characterize the texture patterns relating to cervical lesions, we propose a novel generalized statistical method. Recognizing that cervical textures are primarily represented by the vascular structures, we assume that a significant proportion of the texture
information in cervical lesions is contained in the vascular structures, and that the vascular structures can be extracted and approximated by a set of connecting line segments of different lengths and orientations. We therefore chose line segments as the textural primitives. Other researchers have proposed the use of edgels as primitives [13]; we believe that capillaries are better characterized by higher-level line segments than by low-level edgels. With line segments selected as primitives, the length and orientation of the line segments are the natural choices for the primitive properties. In summary, our approach first extracts the vascular structures from the original cervical lesion images and then vectorizes the extracted vascular structures using line segments. Statistical distributions of the line segments are then constructed, and first- and second-order statistics derived from the joint and/or marginal distributions are used as textural measures for cervical lesion classification. The beauty of such a texture characterization is that while it takes full advantage of the statistical approach, it also inherits the power of the structural approach by emphasizing the shape aspects of textures.
2. Algorithm development

In this section, we detail our texture analysis technique for analyzing the textural patterns relating to cervical lesions. Following the three-layer texture characterization process described in the previous section, this section is divided into three subsections: the first focuses on texture primitive (line segment) extraction, the second discusses the properties of the extracted texture primitives, and the third deals with texture feature formulation and extraction.

2.1. Texture primitive extraction

In the preceding section, we introduced our generalized statistical approach for characterizing the texture patterns relating to cervical lesions. The approach captures the cervical textural information contained in the vascular structures using a set of connecting line segments. This section describes our approach for extracting the line segments that approximate the vascular structures. The approach consists of three steps: image preprocessing, skeletonization, and vectorization, as detailed below.

2.1.1. Image preprocessing

Image preprocessing is important for classification since it directly affects the classification accuracy. This is particularly relevant to this project, since vascular structures often coexist with other irrelevant artifacts on the surface of a cervix. Furthermore, the presence of fluids and/or other discharges on the cervix surface causes
non-uniform luminance and contrast in the underlying vascular structures. Fig. 2 shows a colposcopic image of a cervix displaying a mosaic vascular pattern. As a result, the primary purpose of preprocessing is to digitally remove artifacts present on the surface of the cervix and to compensate for the uneven luminance. To this end, a gray-level morphological operation was performed to remove the artifacts, followed by an image normalization operation to adjust for the non-uniform luminance and
Fig. 2. A colposcopic image of a cervix displaying mosaic vascular pattern. Note the presence of artifacts and uneven luminance and contrast in the underlying mosaic vascular pattern.
contrast of the original image, so that vascular structures that differ only in luminance and contrast are not distinguished. Artifacts on the surface of a cervix image are identified and separated from the underlying vascular structures based on the morphological differences between artifacts and capillaries: capillaries are usually much more tortuous than artifacts, whereas artifacts are usually short, straight segments or small dots. These differences in morphology were exploited to separate them using the theory of mathematical morphology. Morphological opening with a rotating structuring element creates an image containing only artifacts, which is then subtracted from the original image, yielding an image containing predominantly vascular structures. The rotating structuring elements are necessary due to the random orientations of the artifacts. A detailed description of this algorithm may be found in Ref. [14]. Image normalization is achieved through background subtraction: the morphological rolling ball algorithm [15] creates a mask image that contains the background, and the mask image is subtracted from the original to obtain a normalized image with enhanced vascular structures. Fig. 3(b) shows an example of a morphologically enhanced image. The radius of the ball varies depending on the vascular patterns being studied and the image scale; for the examples shown in Fig. 3, a radius of 3 pixels was chosen.
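The following OpenCV sketch illustrates the shape of this preprocessing chain. Apart from the 3-pixel ball radius, all sizes and the rotation step are our illustrative assumptions; the actual algorithms are those of Refs. [14,15].

```python
import cv2
import numpy as np

def preprocess(gray, ball_radius=3, n_angles=12, line_len=9):
    """Sketch: (1) estimate straight artifacts by gray-level opening with
    line elements at several orientations and subtract them; (2) normalize
    luminance by subtracting a disk ('rolling ball') opening background."""
    artifacts = np.zeros_like(gray)
    for i in range(n_angles):
        k = np.zeros((line_len, line_len), np.uint8)
        c = line_len // 2
        cv2.line(k, (0, c), (line_len - 1, c), 1)          # horizontal line
        M = cv2.getRotationMatrix2D((c, c), i * 180.0 / n_angles, 1.0)
        k = cv2.warpAffine(k, M, (line_len, line_len),
                           flags=cv2.INTER_NEAREST)        # rotated element
        opened = cv2.morphologyEx(gray, cv2.MORPH_OPEN, k)
        artifacts = np.maximum(artifacts, opened)
    cleaned = cv2.subtract(gray, artifacts)                # remove artifacts
    d = 2 * ball_radius + 1
    ball = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (d, d))
    background = cv2.morphologyEx(cleaned, cv2.MORPH_OPEN, ball)
    return cv2.subtract(cleaned, background)               # normalized image
```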
Fig. 3. (a) original image containing mosaic patterns; (b) morphologically enhanced image; (c) binary image of (a) after adaptive thresholding of (b).
2.1.2. Vascular structure extraction

Given the preprocessed images, the next step is to extract the vascular structures. This is accomplished with a thresholding operation, resulting in a binary image containing primarily vascular structures. Adaptive thresholding is chosen due to variations in local contrast, as shown in Fig. 3(b); such variations make the vascular structures difficult to separate from their surroundings with a global threshold. We therefore implemented the automatic adaptive thresholding technique introduced in Ref. [16]. Fig. 3(c) shows the binary image after adaptive thresholding.

2.1.3. Skeletonization and vectorization

Following preprocessing, the binary images of vascular structures are skeletonized, since we observed that vascular structures can be fully captured by their skeletons. Skeletonization can be regarded as a shape normalization process that makes the shape of capillaries independent of digitization, lighting, scale, and other factors. Furthermore, skeletonization preserves the critical shape information while removing redundant pixels, greatly facilitating feature extraction and improving pattern recognition efficiency. For the skeletonization process, we employed the thinning algorithm of Zhang [17]. The algorithm is simple, fast, and easy to implement; with it, each capillary is thinned to a skeleton of unitary thickness. Fig. 4(a) shows the skeletonized version of the image in Fig. 3(c). Primitive extraction is subsequently accomplished through vectorization, which approximates the thinned vascular structures with connecting line segments. The Hough transform (HT) is used for line segment detection. A drawback of the conventional HT is that it does not provide any information about line segments: it merely clusters collinear points, and points on a line may represent different line segments. A heuristic method was therefore developed through this research to extend the conventional HT to detect line segments.
Speci"cally, the line segment detection method developed through this research consists of two steps subsequent to the conventional HT: the "rst step involves grouping collinear points into line segments based on their proximity. To do so, the collinear points are "rst ordered by their column coordinates if their orientations are greater than 903 and by their raw coordinates otherwise. Two sequential pixels belong to the same line segment if the distance between their end points is less than a pre-speci"ed distance. For each line segment, this continues until the distance to the next ordered pixel exceeds the threshold. This ends the previous line segment and starts a new line segment. In many cases, the "rst step results in many short line segments lying along straight lines, some of which are broken because of erroneous edge directions. These shorter line segments must therefore be replaced by the corresponding longer line segments through line merging. This is accomplished in the second step. A metric is needed for line segment merging. It measures how likely two or more shorter line segments are part of the same longer line segment in the image. The metric we chose consists of the following three criterion: "rst, the line segments to be merged must be collinear. Collinearity is measured by the two HT parameters h and o. Two merging line segments should have the same (or close) h and o. The second criteria is proximity, i.e., they must be adjacent to each other. Proximity (nearness) is measured by the ratio of distance between the end points of the two line segments to the length of the line segments. After merging, line segments shorter than a prespeci"ed threshold are discarded as noises. Fig. 4(b) shows the vectorized image of the image shown in Fig. 3(c).
Fig. 4. (a) The skeletonized image of the image in Fig. 3(c); (b) the vectorized image of (a), in which (a) is approximated by connecting line segments.
2.2. Primitive properties extraction and distribution construction

With the texture primitives extracted, we have available a list of primitives (line segments) that model the vascular structures of the original image data. We can now proceed to the next phase of texture characterization: computing the texture primitive attributes and constructing their distributions.

2.2.1. Primitive properties

What we need to do here is to define and extract properties that best characterize the selected primitives. Since a line segment can be fully described by its length and orientation, these are the natural choice of properties. Given the two end points of a line segment, its length and orientation are easily derived. To be consistent, the orientation of a line segment, a real number, is measured counterclockwise with respect to the positive X-axis and is limited to 0–179° inclusive. Similarly, the length, also a real number, can be computed from the two end points.

2.2.2. Distribution construction

The properties of line segments can be treated as random variables. To study these properties, we need to estimate their probability density functions based on sample observations. Since line segment length and orientation are real-valued, discretization is necessary to study the statistical distributions of line segments. Discretization groups line segments into bins based on their original values. Specifically, the orientations of line segments are uniformly discretized into 180 bins ranging from 0 to 179°. The line segment length is discretized in a similar fashion into $L$ bins, where $L$ was empirically selected as 50. After discretization, the length of each line segment is referred to by the number of the bin it belongs to rather than by its actual length.
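A small sketch of the property computation and binning (N = 180 orientation bins and L = 50 length bins follow the text; mapping real lengths to bins by normalizing with the maximum length is our assumption):

```python
import numpy as np

def segment_properties(p0, p1):
    """Length and counterclockwise orientation in [0, 180) degrees of a
    line segment given its two end points (x, y)."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    return np.hypot(dx, dy), np.degrees(np.arctan2(dy, dx)) % 180.0

def discretize(lengths, angles, n_len=50, n_ang=180):
    """Map real-valued lengths and angles to bin indices."""
    lengths = np.asarray(lengths, float)
    len_bins = np.minimum((lengths / lengths.max() * n_len).astype(int),
                          n_len - 1)
    ang_bins = np.minimum(np.asarray(angles).astype(int), n_ang - 1)
    return len_bins, ang_bins
```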
With the attributes of line segments discretized, we can proceed to construct the density functions (histograms) of the line segments in terms of length and orientation. A total of three distributions were constructed: one joint distribution and two marginal distributions. The joint distribution characterizes the line segment distribution by both length and orientation; each point in the joint distribution represents the probability of a line segment with a particular length and orientation. The marginal distributions represent the line segment distribution with respect to length and orientation separately; each point in a marginal distribution represents the probability of a line segment of a particular length (or orientation). Since line segments are of different lengths, the orientation distribution is weighted by segment length, which results in a more realistic orientation distribution. Fig. 5 shows the two marginal distributions of line segment length and orientation for the vectorized mosaic pattern shown in Fig. 4(b).

2.2.3. Distribution normalization

Since texture features are subsequently extracted from the above distributions, we need to ensure that the distributions are invariant to rotation, translation, and scale. This, in turn, ensures that the extracted texture features are invariant to these transformations, which is an important practical concern for classification. To achieve this, the distributions are normalized. Only rotation invariance needs to be achieved for the orientation distribution, since it is already invariant to translation and scale; similarly, for the length distribution, only scale normalization is needed, since it is invariant to translation and rotation. The length distribution is normalized via discretization, as discussed before.
Fig. 5. Marginal probability density functions for orientation (a) and length (b) of the line segments in the vectorized mosaic pattern shown in Fig. 4(b).
Since a fixed discretization level (50) is used for the line segment lengths, scale affects only the discretization intervals. Given that the lengths of the discretized line segments are referred to by their bin numbers rather than by their actual lengths, the discretization makes the distribution independent of scale. Fig. 6 shows the normalized length distributions for the image with the mosaic pattern at two different scales; it is clear that their shapes are very similar after normalization. For the orientation distribution, a rotation (which is equivalent to adding or subtracting an angle for each line) may not only cause a linear shift of the interior bins (which does not affect the shape of the distribution) but also a circular shift of the boundary bins (bins close to 0° or 179°), as shown in Fig. 7(a), where the local peak at 0° results from a circular shift of the corresponding local peak at about 140° in Fig. 5(a). This alters the shape of the original distribution, yielding incorrect feature values. Therefore, the orientation distribution must be normalized.
The normalization is carried out as follows: identify the peak of each orientation distribution, shift the peak to the 90° bin, and shift all other angles by the same amount. The peak is chosen as the normalization mark, rather than another mark such as the valley, because the peak is less sensitive to noise. Fig. 7 shows the orientation distributions of the mosaic image at two different orientations, before and after normalization.
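A one-function sketch of this peak-shift normalization for a 180-bin orientation histogram (np.roll performs the circular shift):

```python
import numpy as np

def normalize_orientation(hist):
    """Circularly shift the orientation histogram so its peak lands on
    the 90-degree bin, as described above."""
    return np.roll(hist, 90 - int(np.argmax(hist)))
```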
Fig. 6. The normalized length distribution of the image with mosaic patterns in Fig. 5(b) at two different scales: 1:1 for (a) and 5:3 for (b).
Fig. 7. The angle distributions of the mosaic image shown in Fig. 4 (b) with and without normalization: (a) The angle distribution of the mosaic image with a 45-degree rotation (without normalization); (b) the angle distribution of (a) after normalization (its shape is very similar to that of Fig. 5(a)).
2.3. Texture features extraction

Feature extraction is a dimension reduction process. It consists of deriving characteristic measurements from the input image or its transformations. The characteristic measurements describing the cervical lesion patterns are derived from the vectorized images and from the statistical distributions of the extracted line segments. Most of the textural features we suggest are extracted from the distributions of line segments; they characterize the shapes of the line segment distributions. Additional features are derived directly from the vectorized images to describe the spatial complexity and density (vascular concentration) of the texture patterns. Efforts were made during feature extraction to select features that relate to the specific textural characteristics of the cervical texture patterns: some features relate to textural characteristics such as randomness, contrast, and correlation, while others characterize the spatial complexity of the texture patterns. We suggest a set of nine features that can be extracted from each of the two marginal distributions, four features from the joint distribution, and two features directly from the vectorized images, yielding a total of 24 features (2 × 9 + 4 + 2). For illustration, we define 4 of the 24 features in this section and explain their significance in relation to the specific characteristics of cervical textures as perceived by humans; for detailed definitions of all 24 features, refer to Appendix A.

Peak density ($f_{1}$) measures the strength of the local dominant peak in a marginal (length or orientation) distribution: $f_{1} = \max_{i} p(i)$ for $i = 1, 2, \ldots, N$, where $p(i)$ is the probability of the $i$th bin and $N$ is the discretization level.

Entropy ($f_{7}$) measures the randomness or homogeneity of a distribution with respect to length or orientation.

The ratio of the number of intersection points to the number of end points measures the spatial complexity of the textures.

Density measures the coarseness (or fineness) of a texture in terms of the number of edgels per unit area; the average number of edgels per unit area over all pixels is used as the measure.

During the feature design process, every effort was made to devise features that represent the specific characteristics of cervical textures as perceived by humans. Here we analyze some of the proposed textural features and relate them to particular textural characteristics. Peak density measures the strength of the dominant length (or orientation) of the line segment distribution with respect to a particular attribute. Taking orientation as an example, the hairpin texture pattern, with most line segments oriented in one direction as shown in Fig. 1, should have the highest $f_{1}$ value among all cervical texture patterns, while mosaics, with line segment directions scattered in all directions, should have the lowest; similarly, for the length distribution, the hairpin pattern also has the highest peak density. This feature can therefore discriminate between the mosaic and hairpin patterns. Entropy measures the randomness or homogeneity of a distribution with respect to length or orientation, taking a higher value for a more random distribution. Again taking orientation as an example, mosaics take the highest value while hairpin takes the lowest, for the same reason as explained before; this feature can therefore discriminate the hairpin pattern from the other patterns. The ratio of the number of intersection points to the number of end points measures the spatial complexity of a texture: it takes a large value if capillaries interweave. For example, mosaics have the highest value while punctation has the lowest, and the network pattern also has a much higher value than the hairpin and punctation patterns; this feature can therefore discriminate the mosaic and network patterns from the others. Density measures the coarseness (or fineness) of a texture in terms of the number of edgels per unit area: coarse textures have few edgels per unit area while fine textures have many, so it measures the capillary concentration. For example, the network pattern has the highest density while punctation has the lowest; this feature can therefore discriminate the network pattern from punctation. While these features reflect our subjective perception, much more experimentation and analysis should be done to analyze the correlation of the other proposed features with specific characteristics of cervical texture patterns.
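As an illustration, a sketch computing three of the marginal-distribution features defined above and in Appendix A (peak density f1, peak ratio f2, and entropy f7) from a normalized histogram:

```python
import numpy as np

def marginal_features(p):
    """p: normalized marginal distribution (length or orientation)."""
    p = np.asarray(p, float)
    f1 = p.max()                                   # peak density
    below = p[p < f1]
    f2 = below.max() / f1 if below.size else 0.0   # peak ratio
    nz = p[p > 0]
    f7 = -np.sum(nz * np.log(nz))                  # entropy
    return f1, f2, f7
```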
2.4. Feature analysis and selection

Before proceeding to classification, an analysis of the 24 extracted features was conducted to study the effectiveness of each feature. Such an analysis is useful in determining which features may be eliminated due to redundancy and which due to weak discriminatory power; motivations for the analysis include the cost savings associated with data acquisition and processing. Feature analysis started with the removal of correlated features, followed by ranking of the remaining features. Redundant features are features whose presence does not contribute much to the improvement of classification accuracy. One type of redundant feature is a feature that is linearly correlated with another; linearly correlated features can be identified by computing the linear correlation coefficients between every pair of features. The identified correlated features are subsequently removed via feature pruning. The criteria used for feature pruning include consistency, invariance, and discriminatory power: consistency is measured by the within-group variance, invariance measures a feature's invariance to transformations, and discriminatory power determines a feature's capability to discriminate, as discussed below. For example, since angle peak density correlates well (negatively) with angle contrast, the angle contrast feature can be removed, as it has the lower discriminatory power, as shown in Appendix B.
Feature ranking rates the discriminatory power of each feature based on its capability to discriminate among all classes; the ratio of the between-class variance to the within-group variance was used as the criterion. Appendix B shows the result of the feature ranking. Analysis of the inter-class and intra-class variances revealed some of the features to be ineffective, yielding a feature vector of reduced dimensionality. For example, angle kurtosis and skewness are ineffective measures, since they are measures for unimodal distributions while the angle distributions contain multiple modes; on the other hand, length kurtosis and skewness are effective measures, since the length distributions are close to unimodal. Subsequent to the correlation computation and feature ranking, a subset of features can be selected to form a new feature set for classification. This requires retaining those features having the highest discriminatory power and deleting those that are redundant or provide minimum information for class separation: redundant features are deleted by removing correlated features, and features providing minimum information for classification are removed through ranking. We demonstrate this in the experimental section.
2.5. Classifier design

With features extracted, algorithms are needed to take the extracted features as input and output the required class labels. Of the many different classifiers, we chose the minimum distance classifier. To design the minimum distance classifier, we compute the mean feature values for each pattern class. The distance from a given input vector to the mean of each class is computed, and the vector is assigned to the class for which this distance is shortest, since that class has feature measurements most similar to those of the unknown sample. Euclidean distance is used as the measure of 'nearness' to a cluster, although other measures of similarity are also possible. To prevent features with large numerical values from dominating the distance calculations, both the feature values and the distances are normalized. This normalization procedure takes into account the spread of the values within a cluster. For example, in the one-dimensional case, for clusters with means $\mu_{1}$, $\mu_{2}$ and standard deviations $\sigma_{1}$, $\sigma_{2}$, if $(f - \mu_{1})^{2} = (f - \mu_{2})^{2}$ but $\sigma_{1} > \sigma_{2}$, a query feature value $f$ would be classified as belonging to class 1, since class 1 has the larger variability. For each of the six classes, the distance from the unknown sample to the class mean is computed, and the class for which the distance is minimum is the class to which the unknown sample is assigned. The results of this classification are recorded in a confusion matrix.
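A sketch of this variance-normalized minimum distance rule (the exact normalization used in the paper may differ in detail):

```python
import numpy as np

def classify(f, class_means, class_stds):
    """Assign feature vector f to the class with the nearest mean,
    with each feature scaled by the class standard deviation."""
    d = np.sqrt((((f - class_means) / class_stds) ** 2).sum(axis=1))
    return int(np.argmin(d))           # index of the winning class
```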
3. Experimental evaluation

In this section, we present the preliminary results of our studies on the usefulness of the proposed texture features for categorizing a series of typical vascular patterns as shown in Fig. 1. The evaluation starts with a discussion of image data acquisition, followed by textural feature extraction and analysis, classifier design, and finally a presentation and analysis of the classification results.

3.1. Image acquisition and feature extraction

The experiment started with cervical image acquisition and preparation. The prepared images should contain the typical vascular pattern classes characterizing different stages of dysplasia. The typical vascular patterns to be recognized in this project comprise six classes: network capillaries and hairpin capillaries in normal and benign cervical lesions, and two different versions of punctation and mosaic patterns in preinvasive and invasive lesions, as shown in Fig. 1. Sources of images include our existing library of colposcopic images, published photographs, and commercial slides. All images came with diagnosis and biopsy results. To characterize each vascular pattern class accurately, 50 images were collected for each class, resulting in a total of 300 images representing six classes. For each acquired image, we identified and marked a rectangular region corresponding to a known vascular pattern. The identified regions of interest were preprocessed, and the preprocessed images were subsequently skeletonized and vectorized. The line segment distributions were then constructed, followed by the extraction of the 24 texture features. To perform feature selection, 5 images per pattern class were randomly selected from the 300 images. Feature selection and analysis on the selected images yielded the 13 optimal features shown in Table 1. Detailed results of the feature analysis may be found in Appendix B.
Table 1
A subset of 13 optimal features

Joint entropy
Angle entropy
Angle peak ratio
Length peak density
Angle peak density
Length entropy
#Intersections/#Ends
Info entropy
Energy
Length median
Angle contrast
Length median/range
3.2. Classification and results

To study the classification performance of the proposed technique, we divided the remaining 270 images into two sets: one for testing and one for training. Due to the large feature vector dimension (24) and the relatively small number of images (45 for each class pattern), the conventional classification methodology, which requires dividing the original data set into separate training and testing data sets, cannot generate accurate results, since we do not have sufficient data for training. Instead, cross-validation was used for classifier design and evaluation. This method is widely used by investigators in the pattern recognition field when the data set does not contain sufficient samples to have separate training and testing data. The decision boundaries (training) are obtained by leaving out some samples in the original data set; the samples that are left out are then classified (tested). The procedure is repeated for all samples in the data set, with new samples left out each time, to obtain the overall accuracy of the classification scheme. The final classification error is the average over the left-out samples. Specifically, for each training and testing session, 240 (40 for each class) sample images were used as the training data and the 30 (5 for each class) left-out images were used as the testing data. This process iterated 9 times so that every sample image had the chance to be left out and to be in the training set. The total number of correct
classi"cation over the 9 iterations was used to evaluate the classi"cation performance. For each testing, the minimum distance decision rule was used to classify the left-out image into one of six vascular pattern classes. Two experiments were conducted. First, classi"cation was performed using 24 features. Second, the 13 optimal features de"ned in Table 1 were computed and used for classi"cation. The performance of the classi"er was recorded in the confusion matrix as shown in Tables 2 and 3. The left most column of the confusion matrix is labeled with the actual classes, the top row shows the classi"ed classes. Table 2 shows the results using all 24 features while the confusion matrix in Table 3 shows the resultant output, using the optimal 13 features. In summary, we obtained the best discrimination performance (87.03%) by using all 24 features as shown in Table 2. However, by using only the 13 optimal features resulting from feature ranking, our technique experiences a loss in classi"cation accuracy (80.36%). The loss in accuracy seems to be minimal compared to the computational saving (almost 40%). Further experiments are needed to validate this.
4. Conclusion

This paper describes a texture image analysis technique for characterizing and recognizing the typical, diagnostically most important, vascular patterns relating to
Table 2
Confusion matrix for classification using all 24 features

Actual class    Mosaic 1   Mosaic 2   Punctation 1   Punctation 2   Network   Hairpin   % Correct
Mosaic 1           38          7            0              0            0         0        84.44
Mosaic 2            9         36            0              0            0         0        80.00
Punctation 1        0          0           35             10            0         0        77.77
Punctation 2        0          0            3             36            0         6        80.00
Network             0          0            0              0           45         0       100.00
Hairpin             0          0            0              0            0        45       100.00
Total                                                                                      87.03
Table 3
Confusion matrix for classification using the 13 optimal features

Actual class    Mosaic 1   Mosaic 2   Punctation 1   Punctation 2   Network   Hairpin   % Correct
Mosaic 1           31          6            0              0            8         0        68.88
Mosaic 2            9         36            0              0            0         0        80.00
Punctation 1        0          0           32              7            0         6        71.11
Punctation 2        0          0            0             38            0         7        84.44
Network             0          0            0             10           35         0        77.77
Hairpin             0          0            0              0            0        45       100.00
Total                                                                                      80.36
cervical lesions. A preliminary experimental study demonstrated the feasibility of the proposed technique in discriminating between cervical texture patterns indicative of different stages of cervical lesions. A study is currently underway to further characterize the performance of the proposed approach with a larger data set.

The major contributions of this research include the development of a novel generalized statistical texture analysis technique for accurately characterizing cervical textures, and the introduction of a set of textural features that capture specific characteristics of the cervical textures as perceived by humans. These contributions could potentially lead to a system that can accurately recognize typical vascular patterns indicative of different stages of cervical lesions.

Appendix A. Textural features

The following equations define the textural features. Notation: $p(r, c)$ is the $(r, c)$th entry in a normalized joint line-segment distribution with respect to both length and orientation; $p(i)$ is the $i$th entry in a marginal line-segment distribution with respect to either length or orientation; $N$ is the number of distinct discretization levels with respect to either length or orientation; $\mu$ and $\sigma$ are the mean and standard deviation of a marginal line-segment distribution.

A.1. Textural features for marginal density distributions

1. Peak density ($f_1$) measures the strength of the local dominant orientation (or length) of a line-segment distribution:
$$f_1 = \max_i p(i), \quad i = 1, 2, \ldots, N.$$

2. Peak ratio ($f_2$) measures the relative strength of the second dominant orientation (or length) of a line-segment distribution:
$$f_2 = \frac{p_2}{f_1}, \qquad p_2 = \max_i \{p(i) : p(i) < f_1\}.$$

3. Median ($f_3$) measures the homogeneity of a line-segment distribution with respect to length or orientation; it is defined implicitly by
$$\sum_{i=1}^{f_3} p(i) = 0.5.$$

4. Mean ($f_4$) measures the global (average) trend of a line-segment distribution with respect to length or orientation:
$$f_4 = \sum_{i=1}^{N} p(i)\, i.$$
5. Median-to-range ratio ($f_5$) measures the homogeneity of a line-segment distribution with respect to length or orientation:
$$f_5 = \frac{f_3}{N}.$$

6. Contrast ($f_6$) measures the local variation, or contrast, of a distribution around its mean with respect to length or orientation:
$$f_6 = \sum_{i=1}^{N} \left|\frac{i - \mu}{\sigma}\right| p(i).$$

7. Entropy ($f_7$) measures the randomness or homogeneity of a distribution with respect to length or orientation:
$$f_7 = -\sum_{i=1}^{N} p(i)\log p(i).$$

8. Skewness ($f_8$) characterizes the degree of asymmetry of a univariate distribution around its mean:
$$f_8 = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{i-\mu}{\sigma}\right)^3.$$

9. Kurtosis ($f_9$) measures the relative peakedness or flatness of a univariate distribution:
$$f_9 = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{i-\mu}{\sigma}\right)^4.$$

A.2. Textural features for the joint distribution

10. Joint entropy ($f_{10}$) measures the randomness or homogeneity of a joint distribution:
$$f_{10} = -\sum_{r=1}^{N}\sum_{c=1}^{M} p(r, c)\log p(r, c).$$

11. Informational entropy ($f_{11}$) measures the correlation between the two random variables (orientation and length) that is not captured by the linear correlation coefficient:
$$f_{11} = \sqrt{1 - e^{-2(H_{xy} - f_{10})}}, \qquad H_{xy} = -\sum_i \sum_j p(i)\,p(j)\log\big[p(i)\,p(j)\big].$$

12. Correlation ($f_{12}$) measures the linear dependence between the orientation and length of line segments:
$$f_{12} = \frac{\sigma_{la}}{\sigma_l\,\sigma_a},$$
where $\sigma_{la}$ is the covariance of length and orientation, and $\sigma_l$, $\sigma_a$ are the marginal standard deviations of length and orientation, respectively.

13. Energy ($f_{13}$) measures the uniformity of the entries in the joint distribution; it is lowest when all entries are equal:
$$f_{13} = \sum_i \sum_j p(i, j)^2.$$
14. Ratio of the number of intersection points to the number of endpoints ($f_{14}$) measures the spatial complexity of the textures:
$$f_{14} = \frac{\text{number of intersection points}}{\text{number of end points}}.$$

15. Density ($f_{15}$) measures the coarseness (or fineness) of a texture in terms of the amount of edgels per unit area.
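For concreteness, the sketch below evaluates several of the features just listed, exactly as reconstructed above, on a normalized histogram; the function names, dictionary keys and the 1..N bin-index convention are our additions.

```python
import numpy as np

def marginal_features(p):
    """Selected marginal-distribution features (f1, f2, f4, f6-f9) of a
    normalized 1-D line-segment histogram p (p.sum() == 1)."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    i = np.arange(1, n + 1)                       # discretization levels 1..N
    mu = np.sum(p * i)                            # f4: mean
    sigma = np.sqrt(np.sum(p * (i - mu) ** 2))
    f1 = p.max()                                  # f1: peak density
    below = p[p < f1]
    f2 = below.max() / f1 if below.size else 0.0  # f2: peak ratio
    f6 = np.sum(np.abs((i - mu) / sigma) * p)     # f6: contrast
    nz = p[p > 0]
    f7 = -np.sum(nz * np.log(nz))                 # f7: entropy
    f8 = np.sum(((i - mu) / sigma) ** 3) / n      # f8: skewness, as printed
    f9 = np.sum(((i - mu) / sigma) ** 4) / n      # f9: kurtosis, as printed
    return {"peak_density": f1, "peak_ratio": f2, "mean": mu,
            "contrast": f6, "entropy": f7, "skewness": f8, "kurtosis": f9}

def joint_features(P):
    """Joint entropy (f10) and energy (f13) of a normalized 2-D joint
    length-orientation distribution P."""
    P = np.asarray(P, dtype=float)
    nz = P[P > 0]
    return {"joint_entropy": -np.sum(nz * np.log(nz)),
            "energy": np.sum(P ** 2)}
```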
Appendix B. Feature ranking

Feature                     Between-group variance / within-group variance
Joint entropy               125.87
Angle entropy                29.22
Angle peak ratio             25.92
Length peak density          16.98
Angle peak density           15.06
Length entropy               12.42
#Intersections/#Ends         10.87
Info entropy                 10.78
Energy                       10.57
Length median                 9.52
Angle contrast                9.37
Length median/range           5.48
Length skewness               5.42
Length peak ratio             4.67
Length mean                   4.64
Density                       4.37
Length kurtosis               2.84
Correlation                   2.15
Length contrast               1.62
Angle median/range            1.07
Angle kurtosis                0.99
Angle mean                    0.85
Angle median                  0.84
Angle skewness                0.57
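The ranking criterion tabulated above reads as a one-way analysis-of-variance ratio; the sketch below is one plausible reading of "between-group variance / within-group variance", not necessarily the authors' exact normalization.

```python
import numpy as np

def variance_ratio(feature, labels):
    """Between-group over within-group variation of a single feature
    across classes; larger values indicate more discriminative features."""
    classes = np.unique(labels)
    overall_mean = feature.mean()
    between = sum(np.sum(labels == c) *
                  (feature[labels == c].mean() - overall_mean) ** 2
                  for c in classes)
    within = sum(np.sum((feature[labels == c] -
                         feature[labels == c].mean()) ** 2)
                 for c in classes)
    return between / within
```

Ranking the 24 feature columns by this ratio and keeping the top 13 would then yield a subset such as the one in Table 1.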
About the Author—QIANG JI received an M.S. degree in electrical engineering from the University of Arizona in 1993 and his Ph.D. degree in electrical engineering from the University of Washington in 1998. His areas of research include computer vision, image processing, pattern recognition, and robotics. Dr. Ji is currently an assistant professor in the Department of Computer Science at the University of Nevada, Reno. Between May 1993 and May 1995, he was a research engineer with Western Research Company, Tucson, Arizona, where he served as a principal investigator on several NIH-funded research projects to develop computer vision and pattern recognition algorithms for biomedical applications. In summer 1995, he was a visiting technical staff member at the Robotics Institute, Carnegie Mellon University, where he developed computer vision algorithms for industrial inspection. From 1995 to 1998, he worked as a research associate at the Intelligent Systems Laboratory (ISL) at the University of Washington, involved in a Boeing-funded research project on developing computer vision techniques for 3D geometric tolerancing of manufactured parts from their images.
He has published numerous papers in refereed journals and conferences in these areas. His research has been funded by local and federal government agencies and by private companies including Boeing and HONDA. He currently serves as PI for a HONDA-funded project developing a computer vision system for monitoring a driver's vigilance level.

About the Author—Dr. JOHN ENGEL obtained his Ph.D. in physics from the University of California, Los Angeles. He is now a senior engineer with Etec Systems, Inc. Since 1980, he has been involved with software engineering, scientific programming, algorithm development and image processing on micro-, mini-, and mainframe computer systems. He is a member of the Pattern Recognition Society, SPIE, IEEE, the American Physical Society and the American Geophysical Union.

About the Author—Dr. ERIC CRAINE received his Ph.D. in astrophysics from the Ohio State University. He has served as PI or co-PI on over 45 federally and privately funded research programs. He has considerable experience in the management and conduct of all phases of a diverse range of research projects, and has an extensive bibliography of scientific books, papers, and monographs. He has directed a large university electronic imaging project and NASA-supported projects, and has been intimately involved in the design and manufacture of the world's first all-digital colposcopic systems.
Pattern Recognition 33 (2000) 1575–1584
Unsupervised segmentation using a self-organizing map and a noise model estimation in sonar imagery

K.C. Yao*, M. Mignotte, C. Collet, P. Galerne, G. Burel

Laboratoire GTS (Groupe de Traitement du Signal), Ecole Navale, BP 600, 29240 Brest Naval, France
Université de Bretagne Occidentale, LEST-UMR CNRS 6616, 6 av. Le Gorgeu, BP 809, 29285 Brest, France

Received 9 January 1998; received in revised form 11 June 1999; accepted 11 June 1999
Abstract

This work deals with unsupervised sonar image segmentation. We present a new estimation and segmentation procedure for images provided by a high-resolution sonar. The sonar image is segmented into two kinds of regions: shadow (corresponding to a lack of acoustic reverberation behind each object lying on the seabed) and reverberation (due to the reflection of the acoustic wave from the seabed and from the objects). The unsupervised contextual method we propose is defined as a two-step process. First, in the estimation step, iterative conditional estimation is used to estimate the noise model parameters and to obtain accurately the proportion of each class in the maximum-likelihood sense; the learning of a Kohonen self-organizing map (SOM) is then performed directly on the input image to approximate the discriminating functions, i.e. the contextual distribution function of the grey levels. Second, the previously estimated proportion, the contextual information and the trained Kohonen SOM are used in the segmentation step in order to classify each pixel of the input image. This technique has been successfully applied to real sonar images and is compatible with automatic processing of massive amounts of data. 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Kohonen self-organizing map; Segmentation; Parameter estimation; Sonar imagery; Markov random field
1. Introduction

In high-resolution sonar imagery, three kinds of regions can be visualized: echo, shadow and sea-bottom reverberation. The echo information is caused by the reflection of the acoustic wave from the object, while the shadow zone corresponds to a lack of acoustic reverberation behind this object. The remaining information is called the sea-bottom reverberation area. In the pictures provided by a classification sonar, the echo features are generally less discriminant than the shadow shape for the classification of objects lying on the seafloor. For this reason, detection and classification of an object
* Corresponding author. Tel.: 00-33-298-234-018; fax: 00-33-298-233-857. E-mail address: [email protected] (K.C. Yao).
located on the seafloor (such as wrecks, rocks, man-made objects, and so on) are generally based on the extraction and identification of its associated cast shadow [1]. Thus, before any classification step, one must segment the sonar image into shadow areas and reverberation areas. In fact, the sea-bottom reverberation and the echo are considered as a single class. Unfortunately, sonar images contain speckle noise [2], which affects any simple segmentation scheme such as a maximum-likelihood (ML) segmentation. In this simple case, each pixel is classified only from its associated grey-level intensity. In order to cope with speckle noise and to obtain an accurate segmentation map, one solution consists in taking into account the contextual information, i.e. the classes of the neighbouring pixels. This can be done using Markov random field (MRF) models [3], and this is why the Markovian assumption has been proposed in sonar imagery [4]. In this global Bayesian method, pixels are classified
using the whole of the information contained in the observed image simultaneously. Nevertheless, simple spatial MRF models have a limited ability to describe properties at large scales and may not be sufficient to ensure the regularization of the label field when the sonar image contains strong speckle noise. Such a model can be improved by using a larger spatial neighborhood for each pixel [5], or a causal scale-and-space neighborhood [6], but this rapidly increases the complexity of the segmentation algorithm and of the parameter estimation procedure required to make the segmentation unsupervised. Besides, segmentation and estimation with such a priori models require a lot of computing time. Moreover, such a global method does not allow the noise correlation in the sonar image to be taken into account [7].

An alternative approach, adopted here, uses a local method, i.e. takes into account the grey levels of the neighbouring pixels. In this scheme, each pixel is classified from the information contained in its neighborhood. This method, which allows the noise correlation to be taken into account, is divided into two main steps: the model parameter estimation [8], and the segmentation algorithm, which is fed with the previously estimated parameters. In this paper, we adopt for the parameter estimation step an iterative method called iterative conditional estimation (ICE) [9], in order to estimate, in the ML sense, the noise model parameters and especially the proportion of each class (shadow and reverberation). This is followed by the training of a competitive neural network, namely a Kohonen self-organizing map (SOM) [10], in order to approximate the discriminating functions (i.e. the contextual distribution function of the grey levels). For the segmentation step, we develop a contextual segmentation algorithm efficiently exploiting the previously estimated parameters, the input sonar image, and the topology of the trained Kohonen SOM.

This paper is organized as follows. In Section 2, we detail the parameter estimation step based on the ICE procedure (Section 2.1) and the training step of the SOM (Section 2.2). Section 3 presents the segmentation step. Experimental results on both real scenes and synthetic sonar images are presented in Section 3.3, where we compare the results obtained with the proposed scheme, an ML segmentation and a classical monoscale Markovian segmentation. A conclusion is drawn in Section 4.
2. Estimation step

2.1. Iterative conditional estimation

2.1.1. Introduction

We consider a couple of random fields $Z = (X, Y)$, with $Y = \{Y_s, s \in S\}$ the field of observations located on a lattice $S$ of $N$ sites $s$, and $X = \{X_s, s \in S\}$ the label field. Each $Y_s$ takes its value in $\{0, \ldots, 255\}$ and each $X_s$ in $\{e_0 = \text{shadow},\ e_1 = \text{reverberation}\}$. The distribution of $(X, Y)$ is defined, first, by $P_X(x)$, the distribution of $X$, assumed to be stationary and Gibbsian (i.e. Markovian) in this estimation step, and, second, by the site-wise likelihoods $P_{Y_s|X_s}(y_s|x_s)$. In this work, these likelihoods depend on the class label $x_s$. The observation $Y$ is called the incomplete data, whereas $Z$ stands for the complete data.

In this step, we estimate the parameter vector $\Phi_y$ which defines $P_{Y|X}(y|x)$ by using the iterative method of estimation called iterative conditional estimation (ICE) [9]. This method requires an estimator $\hat{\Phi}_y(X, Y)$ for completely observed data. When $X$ is unobservable, the iterative ICE procedure defines $\Phi_y^{[k+1]}$ as the conditional expectation of $\hat{\Phi}_y$ given $Y = y$, computed according to the current value $\Phi_y^{[k]}$; this is the best approximation of $\Phi_y$ in terms of the mean square error [9]. Denoting by $E_k$ the conditional expectation under $\Phi_y^{[k]}$, the iterative procedure is defined as follows:

- Initialize the noise model parameters to $\Phi_y^{[0]}$.
- $\Phi_y^{[k+1]}$ is computed from $\Phi_y^{[k]}$ and $Y = y$ by

$$\Phi_y^{[k+1]} = E_k\big[\hat{\Phi}_y \mid Y = y\big]. \qquad (1)$$

The computation of this expectation is impossible in practice, but we can approximate Eq. (1), owing to the law of large numbers, by

$$\Phi_y^{[k+1]} = \frac{1}{n}\big[\hat{\Phi}_y(x_1, y) + \cdots + \hat{\Phi}_y(x_n, y)\big], \qquad (2)$$

where the $x_i$, $i = 1, \ldots, n$, are realizations of $X$ drawn according to the posterior distribution $P_{X|Y}(x|y)$ under $\Phi_y^{[k]}$. Finally, we can use the ICE procedure for our application because we have:

- an estimator $\hat{\Phi}_y(X, Y)$ for the complete data: we use a maximum-likelihood (ML) estimator for the noise model parameters (see Section 2.1.2);
- an initial value $\Phi_y^{[0]}$ not too far from the optimal parameters (see Section 2.1.3);
- a way of simulating realizations of $X$ according to the posterior distribution by using the Gibbs sampler [11].

For the prior model, we adopt an 8-connexity spatial neighborhood (see Fig. 1), in which $\beta_1, \beta_2, \beta_3, \beta_4$ represent the a priori potentials associated with the horizontal, vertical, right-diagonal and left-diagonal binary cliques, respectively. In our application we want to favour homogeneous regions, so we define potential functions associated with the two-site cliques of the form

$$\beta_{st} = k\,\big[1 - \delta(x_s, x_t)\big], \qquad (3)$$
Fig. 1. Second-order neighborhood and two-site associated cliques.
where b "b , b , b or b according to the type of QR neighboring pair 1s, t2, k is a predetermined positive constant and d(.) is the Kronecker function. 2.1.2. Estimation of the noise model parameters for the complete data The Gaussian law N(k, p), is an appropriate degradation model to describe the luminance y within shadow regions (essentially due to electronic noise) [12]. The most natural choice of the estimator ') (x"e , y) is the W empirical mean and the empirical variance. If N pixels are located in the shadow areas, we have 1 y, (4) k( " Q +* N QZ1VQC 1 p " (y !k( ). (5) +* N !1 Q +* QZ1VQC In order to take into account the speckle noise phenomenon [2] in the reverberation areas, we model the conditional density function of the reverberation class by a shifted Rayleigh law R(min, a) [12]
y !min (y !min) P(y /x "e )" Q exp ! Q . Q Q a a
(6)
The maximum value of the log-likelihood function is used to determine a ML estimator of the complete data. If y( is the minimum grey level in the reverberation
areas and N the number of pixels located within this region, we obtain for ') (x"e , y) the following results W [12]: 1 (y !min ), (7) p " Q +* +* 2N QZ1VQC min Ky( !1. (8) +*
In the two cases, the proportion n of the kth class is I given by the empirical frequency N I n( " with k3+0, 1,. I N #N
(9)
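A direct transcription of estimators (4)–(9) might look as follows; it assumes the complete-data setting in which the label image is known, and the array names are ours.

```python
import numpy as np

def complete_data_estimates(y, x):
    """ML estimates (4)-(9): Gaussian parameters on the shadow class,
    shifted-Rayleigh parameters on the reverberation class.

    y: grey-level image; x: label image, 0 = shadow, 1 = reverberation."""
    shadow = y[x == 0].astype(float)
    reverb = y[x == 1].astype(float)
    mu = shadow.mean()                                    # Eq. (4)
    sigma2 = shadow.var(ddof=1)                           # Eq. (5)
    min_ml = reverb.min() - 1.0                           # Eq. (8)
    alpha = np.sum((reverb - min_ml) ** 2) / (2 * reverb.size)  # Eq. (7)
    n0, n1 = shadow.size, reverb.size
    proportions = (n0 / (n0 + n1), n1 / (n0 + n1))        # Eq. (9)
    return (mu, sigma2), (min_ml, alpha), proportions
```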
2.1.3. Initialisation

The initial parameter values have a significant impact on the rapidity of the convergence of the ICE procedure
and on the quality of the final estimates. In our application, we use the following method: the initial parameters of the noise model, $\Phi_y^{[0]}$, are determined by applying a small non-overlapping sliding window over the image and calculating the sample mean, variance and minimum grey-level estimates. Each estimation calculated over the sliding window gives a 'sample' $x_i$, a three-component vector. These samples $\{x_1, \ldots, x_M\}$ are then clustered into two classes $\{e_0, e_1\}$ using the K-means clustering procedure [13]. This algorithm uses a similarity measure based on the Euclidean distance between samples. The criterion is the minimization of the cost function

$$J = \sum_{i=1}^{K}\sum_{x_l \in C_i} \lVert x_l - c_i \rVert^2, \qquad (10)$$

where the second sum is over all samples in the $i$th cluster and $c_i$ is the center of this cluster. It is easy to show that, for a given set of samples and class assignments, $J$ is minimized by choosing $c_i$ to be the sample mean of the $i$th cluster. Moreover, when $c_i$ is a sample mean, $J$ is minimized by assigning $x_l$ to the class of the cluster with the nearest mean. A number of other criteria are given in Ref. [13]. The complete algorithm is outlined below:

(1) Choose $K$ initial clusters $c_1, \ldots, c_K$. These could be chosen arbitrarily, but are usually defined by
$$c_i = x_i \quad (1 \le i \le K). \qquad (11)$$

(2) At the $k$th step, assign the sample $x_l$ $(1 \le l \le M)$ to cluster $i$ if
$$\lVert x_l - c_i^{[k]} \rVert < \lVert x_l - c_j^{[k]} \rVert \quad (\forall j \ne i). \qquad (12)$$
In effect, we reassign every sample to the cluster with the nearest mean. In the case of equality, we assign $x_l$ arbitrarily to $i$ or $j$.

(3) Let $C_i^{[k]}$ denote the $i$th cluster after step (2). Determine the new cluster centers by
$$c_i^{[k+1]} = \frac{1}{N_i}\sum_{x_l \in C_i^{[k]}} x_l, \qquad (13)$$
where $N_i$ represents the number of samples in $C_i^{[k]}$. Thus, the new cluster center is the mean of the samples in the previous cluster.

(4) Repeat until convergence is achieved, i.e. $c_i^{[k+1]} = c_i^{[k]}$, $\forall i$.

Although it is possible to find pathological cases where convergence never occurs [13], the algorithm converged in all tested examples. The rapidity of convergence depends on the number $K$, the choice of the initial cluster centers, and the order in which the samples are considered. In our application $K = 2$. Fig. 2(a) represents a sonar image, and the result of the K-means clustering algorithm is reported in Fig. 2(b).
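The clustering step itself is standard; a compact sketch of Eqs. (10)–(13) follows, taking as input the (mean, variance, minimum) samples described above. The variable names are ours.

```python
import numpy as np

def kmeans(samples, k=2, max_iter=100):
    """Plain K-means (Eqs. (10)-(13)). samples: (M, 3) array, one
    (mean, variance, minimum) vector per non-overlapping 6x6 window."""
    samples = np.asarray(samples, dtype=float)
    centers = samples[:k].copy()                  # Eq. (11): c_i = x_i
    assign = np.zeros(len(samples), dtype=int)
    for _ in range(max_iter):
        # Eq. (12): reassign every sample to the nearest cluster center.
        d = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        # Eq. (13): each new center is the mean of its cluster's samples.
        new_centers = np.array([samples[assign == i].mean(axis=0)
                                if np.any(assign == i) else centers[i]
                                for i in range(k)])
        if np.allclose(new_centers, centers):     # step (4): convergence
            break
        centers = new_centers
    return centers, assign
```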
Fig. 2. K-means clustering procedure: (a) sonar picture involving an object and a rock shadow; (b) segmentation according to the ML criterion with parameter estimation given by the K-means algorithm.
On the one hand, a small window size increases the accuracy of the segmentation and hence the precision of the distribution mixture estimation. On the other hand, it decreases the number of pixels from which the $x_l$'s are computed and may increase the misclassification error. In our application, good results are obtained with a 6×6 pixel window. The ML estimation is then applied to the K-means segmentation in order to find $\Phi_y^{[0]}$.

2.1.4. Parameter estimation procedure for the incomplete data

We can use the following algorithm to estimate the noise model parameters. Let us recall that this method takes into account the diversity of the laws in the distribution mixture estimation.

- Parameter initialization: the K-means algorithm is used. Let us denote by $\Phi_y^{[0]}$ the obtained result.
- ICE procedure: $\Phi_y^{[k+1]}$ is computed from $\Phi_y^{[k]}$ in the following way:
  - Using the Gibbs sampler, $n$ realizations $x_1, \ldots, x_n$ are simulated according to the posterior distribution with parameter vector $\Phi_y^{[k]}$, with $P_{Y_s|X_s}(y_s|x_s = e_0)$ a Gaussian law for the shadow area and $P_{Y_s|X_s}(y_s|x_s = e_1)$ a shifted Rayleigh law for the reverberation area.
  - For each $x_i$, $i = 1, \ldots, n$, the parameter vector $\Phi_y$ is estimated with the ML estimator on each class.
  - $\Phi_y^{[k+1]}$ is obtained from the $\Phi_y(x_i, y)$, $1 \le i \le n$, by using Eq. (2).

If the sequence $\Phi_y^{[k]}$ becomes steady, the ICE procedure is ended and one proceeds to the segmentation using the estimated parameters. We can use all these estimated parameters in order to get a completely unsupervised Markovian segmentation (see Section 3.2), or use only the proportion of each class in the Kohonen SOM-based unsupervised segmentation described in Section 2.2.
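Putting the pieces together, one ICE run as just described can be organized as below; `sample_posterior` and `estimate_ml` are caller-supplied stand-ins for the Gibbs sampler of [11] and the complete-data ML estimators of Section 2.1.2, since neither is spelled out in code here.

```python
import numpy as np

def ice(y, phi0, sample_posterior, estimate_ml, n=1, n_iter=20, tol=1e-3):
    """Iterative conditional estimation (Section 2.1.4), a sketch.

    sample_posterior(y, phi) -> a label field x drawn from P(x | y) under
    phi; estimate_ml(y, x) -> a flat parameter vector (Eqs. (4)-(9))."""
    phi = np.asarray(phi0, dtype=float)
    for _ in range(n_iter):
        # Eq. (2): average the complete-data ML estimates over n simulated
        # realizations of X (n = 1 in the authors' experiments).
        est = np.mean([estimate_ml(y, sample_posterior(y, phi))
                       for _ in range(n)], axis=0)
        if np.max(np.abs(est - phi)) < tol:   # the sequence became steady
            return est
        phi = est
    return phi
```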
Fig. 3. Image histogram of the picture reported in Fig. 2 with the estimated Gaussian and Rayleigh laws.
Table 1
Estimated parameters for the picture reported in Fig. 2(a). $\pi$ stands for the proportion of the two classes within the sonar image; $\mu$ and $\sigma$ are the Gaussian parameters (shadow area); min and $\alpha$ are the Rayleigh-law parameters (reverberation area). $\Phi_y^{[0]}$ represents the initial parameter estimates, and the final estimates are denoted $\hat{\Phi}_y$.

Initialisation by the K-means procedure:
  shadow:      π = 0.04,  μ = 36,  σ = 55
  sea-bottom:  π = 0.96,  min = 39,  α = 1061

After the ICE procedure:
  shadow:      π = 0.03,  μ = 32,  σ = 17
  sea-bottom:  π = 0.97,  min = 39,  α = 1591
We calibrate the weight of the 'stochastic' aspect of the ICE by choosing $n$: when $n$ increases, the 'stochastic' aspect of the algorithm decreases. The choice of a small value for $n$ ($n = 1$ in our application) can increase its efficiency [14]. Fig. 3 represents the mixture of distributions of the sonar image reported in Fig. 2(a). The obtained results are given in Table 1. The quality of the estimates is difficult to assess in the absence of true values. We can roughly perform such an evaluation by comparing the image histogram with the probability density mixture corresponding to the estimated parameters. Fig. 3 shows the resulting mixture solution in graphical form; the two dashed curves represent the individual components $P_{Y|X}(y|e_m)$, $0 \le m < K$. The histogram is quite close to the mixture density based on the estimated parameters, and a segmentation with these estimates can be performed as shown in the following section.
2.2. Self-organizing map

2.2.1. Introduction

Research in neurobiology has shown that the centers of diverse activities such as thought, speech, vision and hearing lie in specific areas of the cortex, and that these areas are ordered so as to preserve the topological relations between pieces of information while performing a dimensionality reduction of the representation space. Such organization led Kohonen to develop the SOM algorithm [10]. This kind of competitive neural network is composed of a one- or two-dimensional array of processing elements, or neurons, in the input space. All these neurons receive the same inputs from the external world. Learning is accomplished by iterative presentation of unlabeled input data. During the training process, the neurons evolve in the input space in order to approximate the distribution function of the input vectors. After this step, high-dimensional input vectors are, in a sense, projected down onto the one- or two-dimensional map in a way that maintains the natural order of the input data. This dimensionality reduction allows us to visualize, and to use easily on a one- or two-dimensional array, important relationships among the data that might go unnoticed in a high-dimensional space.

The model of SOM used in our application is a one-dimensional array of $n$ nodes. To each neuron $N_i$, a weight vector $w_i = (w_{i1}, w_{i2}, \ldots, w_{ip}) \in \mathbb{R}^p$ is associated. During the learning procedure, an input vector $x \in \mathbb{R}^p$, randomly selected among the vectors of the training set, is presented to all neurons in parallel. The input $x$ is compared with all the neurons, in the Euclidean-distance sense, via the scalar weights $w_{ij}$. At the $k$th step, we assign the vector $x$ to the winning or leader neuron $N_l$ if

$$\lVert x - w_l^{[k]} \rVert = \min_i \lVert x - w_i^{[k]} \rVert. \qquad (14)$$

All the neurons within a certain neighborhood around the leader participate in the weight-update process. Considering random initial values for the $w_i$ $(0 \le i \le n)$, this learning process can be described by the iterative procedure

$$w_i^{[k+1]} = w_i^{[k]} + h_{li}^{[k]}\,\big(x^{[k]} - w_i^{[k]}\big). \qquad (15)$$

The lateral interactions among topographically close elements are modeled by the application of a neighborhood function, or smoothing kernel, defined over the winning neuron [10]. This kernel can be written in terms of the Gaussian function

$$h_{li}^{[k]} = \alpha^{[k]} \exp\!\left(-\frac{d(l, i)^2}{2\,(\sigma^{[k]})^2}\right), \qquad (16)$$

where $d(l, i) = |l - i|$ is the distance between nodes $l$ and $i$ in the array, $\alpha^{[k]}$ is the learning-rate factor and $\sigma^{[k]}$ defines the width of the kernel at iteration $k$. For convergence, it is necessary that $h_{li}^{[k]} \to 0$ as $k \to T$, where $T$ is the total number of steps of the process [10]. Therefore, $\alpha^{[k]}$ should start with a value close to unity and thereafter decrease monotonically [10]. To achieve this, we use

$$\alpha^{[k]} = \alpha^{[0]}\left(1 - \frac{k}{T}\right). \qquad (17)$$

Moreover, as learning proceeds, the size of the neighborhood should be diminished until it encompasses only a single unit. So we apply, for the width of the kernel, the monotonically decreasing function

$$\sigma^{[k]} = \sigma^{[0]}\left(\frac{\sigma^{[T-1]}}{\sigma^{[0]}}\right)^{k/(T-1)}. \qquad (18)$$

The ordering of the map occurs during the first steps, while the remaining steps are only needed for the fine adjustment of the weight values.

2.2.2. Iterative learning step

The learning process is performed directly on the real image to be segmented. An input vector is filled with the grey levels of the pixels contained in a 3×3 window sliding over the image (cf. Fig. 4). Therefore, each neuron has nine weights locating it in the input space. At each step, the location of the window in the image is chosen at random and the weights are modified according to Eq. (15). Experiments have shown that this training strategy provides results as good as an ordered image-scanning process while spending less processing time.

Fig. 4. Model of the SOM used for the segmentation. A 3×3 sliding window is used to feed the SOM.

$\sigma$ has a significant impact on the quality of the convergence. We have to start with a fairly large value to globally order the map; the initial value $\sigma^{[0]}$ can be half the length of the network. During learning, $\sigma$ has to decrease monotonically until it reaches a small value. Experiments have shown that $\sigma^{[T-1]} = 0.1$ is a good choice and provides the minimum quantization error, defined by

$$E_{\text{quant}} = \frac{1}{N}\sum_{i=1}^{N} \lVert x_i - w_l \rVert, \qquad (19)$$

where the summation is over all the $N$ windows of the image and $w_l$ is the weight vector associated with the leader neuron of the input vector $x_i$ after the learning step.
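Read together, Eqs. (14)–(18) amount to the short training loop sketched below. The random weight initialization, the starting rate α⁰ = 0.9 (a value 'close to unity') and the default T = 100 × (number of neurons), taken from the remark in Section 3.1, are our concrete choices; the paper does not prescribe them beyond what the text states.

```python
import numpy as np

def train_som(windows, n_neurons=100, T=None, alpha0=0.9, sigma_T=0.1,
              seed=0):
    """1-D Kohonen SOM trained on 3x3 grey-level windows (Eqs. (14)-(18)).

    windows: (M, 9) array of input vectors extracted from the image."""
    rng = np.random.default_rng(seed)
    if T is None:
        T = 100 * n_neurons            # compromise suggested in the text
    sigma0 = n_neurons / 2.0           # half the length of the network
    w = rng.uniform(windows.min(), windows.max(),
                    size=(n_neurons, windows.shape[1]))
    idx = np.arange(n_neurons)
    for k in range(T):
        x = windows[rng.integers(len(windows))]   # random window location
        l = int(np.argmin(np.linalg.norm(w - x, axis=1)))        # Eq. (14)
        alpha = alpha0 * (1.0 - k / T)                           # Eq. (17)
        sigma = sigma0 * (sigma_T / sigma0) ** (k / (T - 1))     # Eq. (18)
        h = alpha * np.exp(-(idx - l) ** 2 / (2.0 * sigma ** 2)) # Eq. (16)
        w += h[:, None] * (x - w)                                # Eq. (15)
    return w

def quantization_error(windows, w):
    """E_quant of Eq. (19): mean distance to each window's leader neuron."""
    d = np.linalg.norm(windows[:, None, :] - w[None, :, :], axis=2)
    return d.min(axis=1).mean()
```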
3. Segmentation step

3.1. SOM segmentation

The classification task consists in running the sliding window over the image. For each location of the window, the corresponding input vector $x$ is compared with all the neurons using Eq. (14). The winning neuron, the one which leads to the smallest distance, gives the class of the pixel located at the center of the window. However, before any classification task, we have to calibrate the map in order to associate the label shadow or reverberation with each neuron. Assuming that the input vector $x_0 = (0, \ldots, 0)$ represents a window lying on a perfect shadow area, it is useful to define the distance graph representing the Euclidean distance, in the nine-dimensional space, between the point $x_0$ and all the neurons. Such a graph is given in Figs. 5 and 6, respectively before and after learning, for a hundred-neuron network. Both figures show that the maximal distance between two successive cells is much smaller after learning than before. We can deduce that, after learning, neurons that are topologically close in the array are close
in the input space too. As a matter of fact, neurons that are physical neighbors should respond to similar input vectors. The calibration of the map uses this topological property together with the proportion $\pi_0$, estimated in Section 2.1, of the pixels labelled as shadow in the image. This process can be summarized as follows:

(1) Initially, we assign the class reverberation to all neurons.
(2) We seek the most evident prototype of the shadow class. This neuron is the winning unit according to Eq. (14) when the vector $x_0$ is presented; we assign it to the shadow class.
(3) We assign the shadow class to the pixels whose leader neuron belongs to the shadow class. We can deduce the intermediate shadow proportion $\pi_{\text{int}}$ provided by the resulting image.
(4) If $\pi_{\text{int}}$ is smaller than $\pi_0$, we have to select an additional prototype of the shadow class among the neurons of the reverberation class. According to the topology-preserving property of the map, this additional neuron should be a direct neighbor of an already shadow-labelled neuron. Among the possible neighbors, we take the one with the smallest Euclidean distance to the point $x_0$. Go to (3).
(5) If $\pi_{\text{int}}$ is larger than $\pi_0$, we stop the process.

Experiments have shown that $E_{\text{quant}}$ is a monotonically decreasing function of the number of steps and reaches an asymptotic value for large values of $T$. One hundred times the number of network units seems to be a reasonable compromise between speed and quality of learning. In our application, 100 neurons have been chosen for the network.
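Steps (1)–(5), together with the pixel-wise classification, can be sketched as follows; the array layout and names are ours, and `pi_shadow` is the class proportion delivered by the ICE step.

```python
import numpy as np

def calibrate_som(w, pi_shadow, windows):
    """Label each neuron shadow/reverberation (steps (1)-(5) above).

    w: trained weights, (n_neurons, 9); windows: all 3x3 image windows,
    (N, 9). Returns the neuron labels and each window's leader neuron."""
    n_neurons = len(w)
    is_shadow = np.zeros(n_neurons, dtype=bool)  # step (1): all reverberation
    dist_to_black = np.linalg.norm(w, axis=1)    # distance to x0 = (0,...,0)
    is_shadow[int(np.argmin(dist_to_black))] = True      # step (2)
    # Leader neuron of every window (Eq. (14)), computed once.
    leaders = np.argmin(np.linalg.norm(windows[:, None, :] - w[None, :, :],
                                       axis=2), axis=1)
    while np.mean(is_shadow[leaders]) < pi_shadow:       # steps (3)-(5)
        # Step (4): grow the shadow set by the direct neighbour of an
        # already shadow-labelled neuron closest to the all-black vector.
        border = [i for i in range(n_neurons) if not is_shadow[i] and
                  ((i > 0 and is_shadow[i - 1]) or
                   (i + 1 < n_neurons and is_shadow[i + 1]))]
        if not border:
            break
        is_shadow[min(border, key=lambda i: dist_to_black[i])] = True
    return is_shadow, leaders   # pixel s is shadow iff is_shadow[leaders[s]]
```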
Fig. 5. The distance graph between the 100 neurons of the SOM before learning.
Fig. 6. The distance graph between neurons obtained after learning.
3.2. Markovian segmentation

The segmentation of sonar images into two classes can be viewed as a statistical labelling problem within a global Bayesian formulation, in which the posterior distribution $P_{X|Y}(x|y) \propto \exp[-U(x, y)]$ has to be maximized [3]. In our case, the corresponding posterior energy $U(x, y)$ to be minimized is

$$U(x, y) = \underbrace{\sum_{s \in S} \psi_s(x_s, y_s)}_{U_1(x, y)} \;+\; \underbrace{\sum_{\langle s, t \rangle} \beta_{st}\,\big[1 - \delta(x_s, x_t)\big]}_{U_2(x, y)}, \qquad (20)$$
where $U_1$ expresses the adequacy between observations and labels, with $\psi_s(x_s, y_s) = -\ln\big[P_{Y_s|X_s}(y_s|x_s)\big]$, and $U_2$ expresses the energy of the a priori model. In order to minimize this energy function, we use a deterministic relaxation technique, the ICM algorithm [3–5,15].

3.3. Results on real scenes

We compare the segmentation performance of the proposed SOM-based algorithm described in Section 3.1 with an ML segmentation and a classical Markovian segmentation using a deterministic relaxation technique such as the ICM [3]. All the segmentation results exploit the parameter estimation step presented in Section 2.1. This estimation step is used to estimate the noise model parameters for the ML segmentation and the Markovian segmentation, and the proportion of the shadow class for the SOM segmentation.
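A minimal sketch of the ICM relaxation minimizing Eq. (20) is given below; it assumes the site-wise potentials ψ have been precomputed from the estimated Gaussian and Rayleigh models, and uses the single constant k of Eq. (3) (here `beta`) for all clique types.

```python
import numpy as np

def icm(psi, beta=1.0, n_iter=10):
    """ICM minimization of the posterior energy of Eq. (20).

    psi: (H, W, 2) array, psi[r, c, cls] = -ln P(y_s | x_s = cls);
    beta: potential of the second-order (8-connexity) cliques."""
    labels = psi.argmin(axis=2)            # start from the ML labelling
    H, W, _ = psi.shape
    for _ in range(n_iter):
        changed = False
        for r in range(H):
            for c in range(W):
                nb = labels[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
                cost = psi[r, c].copy()
                for cls in (0, 1):
                    # Number of 8-neighbours disagreeing with class cls
                    # (the centre pixel is excluded from the count).
                    cost[cls] += beta * (np.sum(nb != cls)
                                         - (labels[r, c] != cls))
                best = int(cost.argmin())
                if best != labels[r, c]:
                    labels[r, c] = best
                    changed = True
        if not changed:                    # local minimum of U(x, y)
            break
    return labels
```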
Fig. 7. (a) A real sonar image involving a sandy seafloor with the cast shadow of a tyre. Two-class segmentation results obtained on this image using: (b) ML segmentation, (c) Markovian segmentation with a deterministic relaxation technique such as ICM, (d) the SOM-based segmentation (see Table 2 for the estimated parameters). The SOM segmentation exhibits a good robustness against the speckle noise (which induces small false shadow areas in the other approaches).
Figs. 6–9 show the segmentation results obtained with the different methods. Examples of the noise-model
Fig. 8. A real sonar image involving object and rock shadows. Two-class segmentation results obtained with: (b) ML segmentation, (c) Markovian segmentation with the ICM technique, (d) the SOM-based segmentation method (see Table 3 for the estimated parameters). The ML and ICM methods do not totally eliminate the speckle-noise effect (creating mislabelled isolated shadow pixels).
Table 2
Estimated parameters for the picture reported in Fig. 6 (ICE procedure)
  shadow:      π = 0.02,  μ = 36,  σ = 85
  sea-bottom:  π = 0.98,  min = 46,  α = 1878

Table 3
Estimated parameters for the picture reported in Fig. 7 (ICE procedure)
  shadow:      π = 0.03,  μ = 25,  σ = 32
  sea-bottom:  π = 0.97,  min = 35,  α = 1430

Table 4
Estimated parameters for the picture reported in Fig. 8 (ICE procedure)
  shadow:         π = 0.03,  μ = 34,  σ = 39
  reverberation:  π = 0.97,  min = 42,  α = 1412

Fig. 9. (a) A synthetic sonar image of a sphere lying on the seabed. Segmentation results obtained with: (b) ML segmentation, (c) Markovian ICM technique, (d) SOM-based segmentation method.
parameters $\hat{\Phi}_y$ obtained with our scheme are given in Tables 2–4.

Experiments indicate that the SOM segmentation requires less computation than the Markovian segmentation (30 for the SOM estimation–segmentation versus roughly 100 for the unsupervised scale-causal Markovian modelization [16], on an IBM 43P 200 MHz workstation). Besides, the ICM algorithm does not decrease the number of false alarms (wrong detections) due to the speckle-noise effect. The SOM segmentation performs better: it exhibits a good robustness to speckle noise (the false alarms have been eliminated) and preserves the shadow shapes of little rocks. Manufactured-object and rock shadows are better segmented with our method than with the others (cf. Figs. 7–9), and their shapes are close to the result we expected. The cast shadow of a manufactured object (a cylinder), reported in Fig. 10, has a geometric shape (contrary to the cast shadow of the rock) that will be discriminant for the classification step.
References
Fig. 10. Real sonar images of a cylindrical object (a) and of ridges of sand (c). Their corresponding SOM-based segmentation results are depicted, respectively, in (b) and (d).
4. Conclusion

We have described an unsupervised segmentation procedure based on a parameter estimation step (which offers an appropriate estimation of the noise model) and a segmentation step well adapted to the sonar image segmentation problem. The estimation step takes into account the diversity of the laws in the distribution mixture of a sonar image and can be combined with the Kohonen SOM-based segmentation in order to solve the difficult problem of unsupervised sonar image segmentation. This scheme is computationally simple and appears to be an interesting alternative to existing, more complex hierarchical Markovian methods. The method has been validated on several real sonar images, demonstrating the efficiency and robustness of the scheme.
Acknowledgements

The authors thank GESMA (Groupe d'Etude Sous-Marine de l'Atlantique, Brest, France) for having provided the real sonar images, and REGION BRETAGNE for partial financial support of this work.
[1] P. Galerne, K. Yao, G. Burel, Object classification using neural networks in sonar imagery, Proceedings of the SPIE on New Image Processing Techniques and Applications, Vol. 3101, München, June 1997, pp. 306–314.
[2] J.W. Goodman, Some fundamental properties of speckle, J. Opt. Soc. Am. 66 (11) (1976) 1145–1150.
[3] J. Besag, Spatial interaction and the statistical analysis of lattice systems, J. Roy. Statist. Soc. 36 (1974) 192–236.
[4] C. Collet, P. Thourel, P. Pérez, P. Bouthemy, Hierarchical MRF modeling for sonar picture segmentation, Proceedings of the Third IEEE International Conference on Image Processing, Vol. 3, Lausanne, September 1996, pp. 979–982.
[5] P. Thourel, C. Collet, P. Bouthemy, P. Pérez, Multiresolution analysis and MRF modelling applied to the segmentation of shadows in sonar pictures, Proceedings of the Second Asian Conference on Computer Vision, Vol. 2, Singapore, December 1996, pp. 81–85.
[6] M. Mignotte, C. Collet, P. Pérez, P. Bouthemy, Unsupervised hierarchical Markovian segmentation of sonar images, Proceedings of the ICIP, Vol. 3, Santa Barbara, CA, USA, October 1997.
[7] F. Schmitt, L. Bonnaud, C. Collet, Contrast control for sonar pictures, Signal and Image Processing, SPIE'96 Technical Conference on Application of Digital Image Processing XIX, Vol. 2847, August 1996, pp. 70–82.
[8] P. Masson, W. Pieczynski, SEM algorithm and unsupervised statistical segmentation of satellite images, IEEE Trans. Geosci. Remote Sensing 3 (1993) 618–633.
[9] F. Salzenstein, W. Pieczynski, Unsupervised Bayesian segmentation using hidden Markovian fields, Proceedings of the ICASSP, May 1995, pp. 2411–2414.
[10] T. Kohonen, Self-Organizing Maps, Springer, Berlin, 1995.
[11] S. Geman, D. Geman, Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images, IEEE Trans. Pattern Anal. Mach. Intell. 6 (6) (1984) 721–741.
[12] F. Schmitt, M. Mignotte, C. Collet, P. Thourel, Estimation of noise parameters on sonar images, SPIE Statistical and Stochastic Methods for Image Processing, Vol. 2823, Denver, 4–5 August 1996, pp. 1–12.
[13] S. Banks, Signal Processing, Image Processing and Pattern Recognition, Prentice-Hall, Englewood Cliffs, NJ, 1990.
[14] B. Braathen, P. Masson, W. Pieczynski, Global and local methods of unsupervised Bayesian segmentation of images, Machine Graphics Vision 1 (1993) 39–52.
[15] M. Mignotte, C. Collet, P. Pérez, P. Bouthemy, Unsupervised Markovian segmentation of sonar images, Proceedings of the ICASSP, Vol. 4, München, May 1997, pp. 2781–2785.
[16] M. Mignotte, C. Collet, P. Pérez, P. Bouthemy, Sonar image segmentation using an unsupervised hierarchical MRF model, IEEE Trans. Image Process. (1999), in press.
About the Author—PASCAL GALERNE was born in Brest on August 20, 1970. He received the D.E.A. (postgraduate degree) in Electronics, Image and Signal Processing from the Université de Bretagne Occidentale in 1994. He is currently a Ph.D. student in sonar image processing at the Signal Research Center of the French Naval Academy. His current research interests include neural networks, image synthesis, image segmentation and classification algorithms.

About the Author—MAX MIGNOTTE received the Master of Sciences (Electronics and Telecommunications) (1992) and the D.E.A. (postgraduate degree) in Digital Signal, Image and Speech Processing from the Institut National Polytechnique de Grenoble (1993). He is currently a Ph.D. student in computer science (digital signal and image processing) at the Signal Research Center of the French Naval Academy. His current research interests include image segmentation, parameter estimation, hierarchical Markovian models, statistical models and genetic algorithms.

About the Author—KOFFI-CLÉMENT YAO received the Ph.D. degree (1990) in Optical Signal Processing and Computer Science from the Louis Pasteur University of Strasbourg, France. From 1991 to 1992, he worked as an Assistant Professor in the Department of Optics and Communication Systems at the École Nationale Supérieure des Télécommunications de Bretagne, France. He is currently Maître de Conférences at the Université de Bretagne Occidentale and teaches deterministic and statistical signal processing at the École Navale, the French Naval Academy. His present research interests include pattern recognition, neural networks, adaptive filtering and higher-order statistics for blind signal separation in underwater acoustics.

About the Author—CHRISTOPHE COLLET was born in Paris in 1966. He studied at the Universities of Paris-Sud-Orsay and Toulon in the fields of electronics and signal processing, and received the Ph.D. in image processing in 1992. He is currently Maître de Conférences at the Université de Bretagne Occidentale and manages the Signal Research Center of the French Naval Academy. His research interests include nonlinear statistical estimation, Markov random fields, texture analysis, 2-D pattern recognition and hierarchical analysis.

About the Author—GILLES BUREL was born in 1964. He obtained the diploma of the École Supérieure d'Électricité in 1988, and received the Ph.D. degree in 1991 and the Habilitation à Diriger des Recherches in 1996 from the Université de Bretagne Occidentale. From 1988 until 1996 he was with Thomson CSF, Rennes, France, a leading company in professional electronics, where he worked in image processing and digital transmissions. He is now a professor at the Université de Bretagne Occidentale. He is the author of 19 patents and 40 articles in the areas of image processing and digital transmissions.